Find Duplicate Words in a String in Java

Finding duplicate (repeated) words in a Java String is a common interview question. We can find all the duplicate words in several ways, for example with the Collections API or with Java 8 Streams.

1. Problem

Suppose we have a string of names and we want to find which names appear more than once. We may also want to count the occurrences of these duplicate words, or of all the words.

String sentence = "alex brian charles alex charles david eric david";

The above string contains three duplicate words, each occurring twice, and two unique words.

alex=2
charles=2
david=2

brian=1
eric=1

2. Find Duplicate Words using Stream

Java Stream API provides several useful methods to iterate over collections, perform intermediate operations and collect the matching items into new collections.

In the given Java program, we perform the following steps:

  • Split the string on the space character to get all the words in a String[]
  • Convert the String[] to a List containing all the words
  • Iterate over the List using a Stream and find the duplicate words

To determine whether a word is a duplicate, we maintain a HashSet. If the Set.add() method returns false, the word is already present in the set and is therefore a duplicate.

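// Split the sentence on single spaces and collect the words into a List
List<String> wordsList = Arrays.stream(sentence.split(" ")).collect(Collectors.toList());

// Tracks the words seen so far
Set<String> tempSet = new HashSet<>();

// add() returns false for a word already in the set, so only repeats pass the filter
List<String> duplicateWords = wordsList.stream()
    .filter(w -> !tempSet.add(w))
    .collect(Collectors.toList());

System.out.println(duplicateWords);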

Program output.

[alex, charles, david]
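One caveat with the filter above: a word that occurs three or more times passes the filter once for every repeat after the first, so it would appear in the result more than once. The following is a minimal sketch, not part of the original program, that adds distinct() so that each duplicate word is reported only once. The class name DistinctDuplicates, the method name and the sample sentence are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctDuplicates {

    // Returns each repeated word once, in the order of its first repeat.
    // The stateful filter assumes a sequential (non-parallel) stream.
    static List<String> findDistinctDuplicates(String sentence) {
        Set<String> seen = new HashSet<>();
        return Arrays.stream(sentence.split(" "))
                .filter(word -> !seen.add(word)) // passes only words seen before
                .distinct()                      // collapses repeated hits into one
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative input: "alex" occurs three times, "brian" twice.
        System.out.println(findDistinctDuplicates("alex brian alex charles alex brian"));
        // prints [alex, brian]
    }
}
```

Without distinct(), the same input would produce [alex, alex, brian].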

To count the occurrences of each word in the sentence, we can collect the words using toMap(), mapping each word to 1 and adding up the counts of repeated words with Math::addExact.

List<String> wordsList = Arrays.stream(sentence.split(" ")).collect(Collectors.toList());

Map<String, Integer> wordsMapWithCount = wordsList.stream()
        .collect(Collectors.toMap(Function.identity(), word -> 1, Math::addExact));

System.out.println(wordsMapWithCount);

Program output.

{alex=2, eric=1, charles=2, david=2, brian=1}
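The entries in the output above do not follow the order of the words in the sentence, because the three-argument toMap() makes no promise about the kind of Map it returns. If a predictable order matters, one option is the four-argument overload of toMap(), which accepts a map supplier. The sketch below passes LinkedHashMap::new so that entries keep the order in which each word was first seen. The class name OrderedWordCount and the method name are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OrderedWordCount {

    // Counts every word, keeping the order in which words first appear.
    static Map<String, Integer> countWords(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .collect(Collectors.toMap(
                        Function.identity(),  // key: the word itself
                        word -> 1,            // value: one occurrence
                        Math::addExact,       // merge: add counts of repeated words
                        LinkedHashMap::new)); // map type that preserves insertion order
    }

    public static void main(String[] args) {
        String sentence = "alex brian charles alex charles david eric david";
        System.out.println(countWords(sentence));
        // prints {alex=2, brian=1, charles=2, david=2, eric=1}
    }
}
```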

If we want only the duplicate words and their number of occurrences, we can filter() the above Map as follows:

// Keep only the entries whose count is greater than 1 (Entry is java.util.Map.Entry)
Map<String, Integer> dupWordsMapWithCount = wordsMapWithCount.entrySet()
    .stream().filter(e -> e.getValue() > 1)
    .collect(Collectors.toMap(Entry::getKey, Entry::getValue));

System.out.println(dupWordsMapWithCount);

Program output.

{alex=2, charles=2, david=2}

3. Find Duplicate Words using Collections

The process of finding the duplicates using Collections is largely similar to the previous approach.

We start with splitting the string and collecting all words in a List. Then we use the HashSet.add() method to check if the word is unique or duplicate.

List<String> wordsList = Arrays.asList(sentence.split(" "));
Set<String> tempSet = new HashSet<>();
List<String> duplicateWords = new ArrayList<>();

for (String word : wordsList) {
  if (!tempSet.add(word)) {
    duplicateWords.add(word);
  }
}

System.out.println(duplicateWords);

Program output.

[alex, charles, david]

If we are interested in finding the duplicate words along with their count of occurrences in the String, we can use the Collections.frequency(list, item) API, which counts the number of times an item appears in the specified list.

Map<String, Integer> dupWordsMapWithCount = new HashMap<>();

// For each duplicate word, count how many times it occurs in the full word list
for (String word : duplicateWords) {
  dupWordsMapWithCount.put(word, Collections.frequency(wordsList, word));
}

System.out.println(dupWordsMapWithCount);

Program output.

{alex=2, charles=2, david=2}
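Note that Collections.frequency() walks the whole list on every call, so the loop above rescans the list once per duplicate word. For long inputs, a single counting pass may be preferable. The following sketch is one possible alternative, not the approach shown above: it counts all words with Map.merge() and then drops the words that occur only once. The class name SinglePassDuplicateCount and the method name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SinglePassDuplicateCount {

    // Counts all words in one pass, then keeps only those seen more than once.
    static Map<String, Integer> countDuplicates(String sentence) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : sentence.split(" ")) {
            // Inserts 1 for a new word, otherwise adds 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        // Removing from the values() view also removes the matching map entries
        counts.values().removeIf(count -> count == 1);
        return counts;
    }

    public static void main(String[] args) {
        String sentence = "alex brian charles alex charles david eric david";
        System.out.println(countDuplicates(sentence));
        // prints {alex=2, charles=2, david=2}
    }
}
```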

4. Conclusion

In this Java tutorial, we discussed two approaches to find all the duplicate words in a String and the number of times they appear in that String. These Java programs can be adapted to find the unique words in a string too.
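As a rough illustration of the unique-words case, the sketch below keeps only the words whose frequency in the list is exactly 1. It reuses Collections.frequency() from the Collections approach; the class name UniqueWords and the method name are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class UniqueWords {

    // Returns the words that occur exactly once, in their original order.
    static List<String> findUniqueWords(String sentence) {
        List<String> words = Arrays.asList(sentence.split(" "));
        return words.stream()
                .filter(word -> Collections.frequency(words, word) == 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sentence = "alex brian charles alex charles david eric david";
        System.out.println(findUniqueWords(sentence));
        // prints [brian, eric]
    }
}
```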

Happy Learning !!

