Find Duplicate Words in a String in Java

Finding the duplicate or repeated words in a Java String is a very common interview question. We can find all the duplicate words using different methods such as Collections and Java 8 Streams.

1. Problem

Suppose we have a string of names. We want to find which names appear more than once. We may also want to count the occurrences of such duplicate words, as well as of all words.

String sentence = "alex brian charles alex charles david eric david";

The above string contains three duplicate words that each occur twice, and two unique words.

alex=2
charles=2
david=2

brian=1
eric=1

2. Find Duplicate Words using Stream

Java Stream API provides several useful methods to iterate over collections, perform intermediate operations and collect the matching items into new collections.

In the given Java program, we perform the following steps:
Split the string with whitespace to get all words in a String[]
Convert String[] to List containing all the words
Iterate over List using Stream and find duplicate words

To determine whether a word is a duplicate, we maintain a HashSet. If the Set.add() method returns false, the word is already present in the set and is therefore a duplicate.

List<String> wordsList = Arrays.stream(sentence.split(" ")).collect(Collectors.toList());

Set<String> tempSet = new HashSet<>();

List<String> duplicateWords = wordsList.stream()
    .filter(w -> !tempSet.add(w))
    .collect(Collectors.toList());

System.out.println(duplicateWords);

Program output.

[alex, charles, david]
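One detail worth knowing about the filter above: a word that occurs three or more times fails Set.add() on every repeat, so it appears in the result list once per extra occurrence. The sketch below is one possible way to report each duplicate only once, by collecting into a LinkedHashSet instead of a List. The class name DistinctDuplicates and the method findDuplicates are hypothetical names chosen for this illustration, and the filter relies on the stream being sequential.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

class DistinctDuplicates {

  // Returns each repeated word once, in the order in which it was first repeated.
  static Set<String> findDuplicates(String sentence) {
    Set<String> seen = new HashSet<>();
    return Arrays.stream(sentence.trim().split("\\s+"))
        // add() returns false when the word has been seen before
        .filter(word -> !seen.add(word))
        // a Set silently drops the extra hits for words occurring 3+ times
        .collect(Collectors.toCollection(LinkedHashSet::new));
  }

  public static void main(String[] args) {
    // "alex" occurs three times here but is reported once
    System.out.println(findDuplicates("alex brian alex charles alex charles"));
    // prints [alex, charles]
  }
}
```

Splitting on "\\s+" rather than a single space is an assumption made here so that runs of spaces or tabs do not produce empty words.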

Suppose we want to count the occurrences of each word in the sentence. We can collect the words using Collectors.toMap(), mapping each word to an initial count of 1 and merging the counts of repeated words with Math::addExact.

List<String> wordsList = Arrays.stream(sentence.split(" ")).collect(Collectors.toList());

Map<String, Integer> wordsMapWithCount = wordsList.stream()
        .collect(Collectors.toMap(Function.identity(), word -> 1, Math::addExact));

System.out.println(wordsMapWithCount);

Program output.

{alex=2, eric=1, charles=2, david=2, brian=1}
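The order of entries in the output above comes from the default map returned by toMap(), which makes no ordering guarantee. If the words should be printed in the order they first appear, one option is Collectors.groupingBy() with a LinkedHashMap and Collectors.counting(). The following is a minimal sketch under that assumption; WordCounts and countWords are hypothetical names, and note that the counts are Long rather than Integer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCounts {

  // Counts every word, keeping the words in first-seen order.
  static Map<String, Long> countWords(String sentence) {
    return Arrays.stream(sentence.trim().split("\\s+"))
        .collect(Collectors.groupingBy(
            Function.identity(),     // group by the word itself
            LinkedHashMap::new,      // preserve insertion order
            Collectors.counting())); // count the members of each group
  }

  public static void main(String[] args) {
    System.out.println(countWords("alex brian charles alex charles david eric david"));
    // prints {alex=2, brian=1, charles=2, david=2, eric=1}
  }
}
```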

If we want to find only the duplicate words and their number of occurrences, we can filter() the above Map as follows:

Map<String, Integer> dupWordsMapWithCount = wordsMapWithCount.entrySet()
    .stream().filter(e -> e.getValue() > 1)
    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

System.out.println(dupWordsMapWithCount);

Program output.

{alex=2, charles=2, david=2}

3. Find Duplicate Words using Collections

Largely, the process of finding duplicates using Collections is similar to the previous approach.

We start with splitting the string and collecting all words in a List. Then we use the HashSet.add() method to check if the word is unique or duplicate.

List<String> wordsList = Arrays.asList(sentence.split(" "));
Set<String> tempSet = new HashSet<>();
List<String> duplicateWords = new ArrayList<>();

for (String word : wordsList) {
  if (!tempSet.add(word)) {
    duplicateWords.add(word);
  }
}

System.out.println(duplicateWords);

Program output.

[alex, charles, david]

If we are interested in finding the duplicate words along with their number of occurrences in the String, we can use the Collections.frequency(list, item) API, which counts the number of times an item appears in the specified list.

Map<String, Integer> dupWordsMapWithCount = new HashMap<>();

for (String word : duplicateWords) {
  dupWordsMapWithCount.put(word, Collections.frequency(wordsList, word));
}

System.out.println(dupWordsMapWithCount);

Program output.

{alex=2, charles=2, david=2}
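Collections.frequency() scans the whole list on every call, so the loop above walks the list once per duplicate word. For long inputs, a single pass with Map.merge() may be preferable. The sketch below shows one way this could look; DuplicateCounts and duplicateCounts are hypothetical names, and a LinkedHashMap is assumed so that the result keeps first-seen order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DuplicateCounts {

  // Counts all words in one pass, then drops the words that occur only once.
  static Map<String, Integer> duplicateCounts(String sentence) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String word : sentence.trim().split("\\s+")) {
      // insert 1 for a new word, otherwise add 1 to the existing count
      counts.merge(word, 1, Integer::sum);
    }
    // removing from the values view also removes the matching map entries
    counts.values().removeIf(count -> count < 2);
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(duplicateCounts("alex brian charles alex charles david eric david"));
    // prints {alex=2, charles=2, david=2}
  }
}
```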

4. Conclusion

In this Java tutorial, we discussed two approaches to finding all duplicate words in a String and counting how many times they appear in that String. These Java programs can be used to find the unique words in a string too.
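As a rough illustration of that last point, the words that occur exactly once can be selected by reusing Collections.frequency(). This is only a sketch, and UniqueWords and uniqueWords are hypothetical names introduced for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class UniqueWords {

  // Returns the words that occur exactly once, in their original order.
  static List<String> uniqueWords(String sentence) {
    List<String> words = Arrays.asList(sentence.trim().split("\\s+"));
    return words.stream()
        // keep a word only if the list contains it a single time
        .filter(word -> Collections.frequency(words, word) == 1)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(uniqueWords("alex brian charles alex charles david eric david"));
    // prints [brian, eric]
  }
}
```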

Happy Learning !!
