Find Duplicate Words in a String in Java

Learn to find duplicate words in string or sentence along with their counts, using the Collections and Java 8 Stream with Examples.

Finding the duplicate or repeated words in a Java String is a very common interview question. We can find all the duplicate words using different methods such as Collections and Java 8 Streams.

1. Problem

Suppose we have a string with names. We want to count which names appear more than once. We may also want to count the occurences of such duploicate words as well as all words.

String sentence = "alex brian charles alex charles david eric david";

The above string contains 3 duplicate words that occur twice, and two unique words.

alex=2
charles=2
david=2

brian=1
eric=1

2. Find Duplicate Words using Stream

Java Stream API provides several useful methods to iterate over collections, perform intermediate operations and collect the matching items into new collections.

  • In given Java program, we are doing the following steps:
  • Split the string with whitespace to get all words in a String[]
  • Convert String[] to List containing all the words
  • Iterate over List using Stream and find duplicate words

To determine that a word is duplicate, we are mainitaining a HashSet. If the Set.add() method return false, the it means that word is already present in the set and thus it is duplicate.

List<String> wordsList = Arrays.stream(sentence.split(" ")).collect(Collectors.toList());

Set<String> tempSet = new HashSet<>();

List<String> duplicateWords = wordsList.stream()
    .filter(w -> !tempSet.add(w))
    .collect(Collectors.toList());

System.out.println(duplicateWords); 

Program output.

[alex, charles, david]

Suppose we want to count the occurrences of each word in the sentence then we can collect the words using toMap() and count the occurences with Math::addExact.

List<String> wordsList = Arrays.stream(sentence.split(" ")).collect(Collectors.toList());

Map<String, Integer> wordsMapWithCount = wordsList.stream()
        .collect(Collectors.toMap(Function.identity(), word -> 1, Math::addExact));

System.out.println(wordsMapWithCount);

Program output.

{alex=2, eric=1, charles=2, david=2, brian=1}

If we want to find only the duplicate words and their number of occurences then we can filter() the above Map as follows:

Map<String, Integer> dupWordsMapWithCount = wordsMapWithCount.entrySet()
    .stream().filter(e -> e.getValue() > 1)
    .collect(Collectors.toMap(Entry::getKey, Entry::getValue));

System.out.println(dupWordsMapWithCount);

Program output.

{alex=2, charles=2, david=2}

3. Find Duplicate Words using Collections

Largely, the process to find the duplicates using Collections is simlar to previous approach.

We start with splitting the string and collecting all words in a List. Then we use the HashSet.add() method to check if the word is unique or duplicate.

List<String> wordsList = Arrays.asList(sentence.split(" "));
Set<String> tempSet = new HashSet<>();
List<String> duplicateWords = new ArrayList<>();

for (String word : wordsList) {
  if (!tempSet.add(word)) {
    duplicateWords.add(word);
  }
}

System.out.println(duplicateWords);

Program output.

[alex, charles, david]

If we are interested in finding the duplicate words along with their count of occureneces in the String, we can use the Collections.frequency(list, item) API that counts the number of times a item appears in the specified list.

Map<String, Integer> dupWordsMapWithCount = new HashMap<>();

for (String word : duplicateWords) {

  dupWordsMapWithCount.put(word, Collections.frequency(wordsList, word));
}

System.out.println(dupWordsMapWithCount);

Program output.

{alex=2, charles=2, david=2}

4. Conclusion

In this Java tutorial, we discussed the two approches to find all duplicate words in a String and how many number of times they apprear in that String. These Java programs can be used to find the unique words in a string too.

Happy Learning !!

Sourcecode on Github

Leave a Comment

  1. /*To check if the values in a map are greater than 1 using a lambda expression with Java 8, you can iterate over the entries of the map using the forEach method and then apply the condition to each value. Here’s how you can do it:
    */
    String sentence = “alex brian charles alex charles david eric david”;

        Map<String,Integer> ma=new HashMap<>();

        for(String k1:sentence.split(” “)){

          Integer l1=ma.get(k1);
          ma.put(k1,(l1==null) ? 1: l1+1);

        }

        ma.forEach((key, value) -> {
          if (value > 1) {
            System.out.println(“Key: ” + key + “, Value: ” + value + ” is greater than 1.”);
          }
        });
    ————————————————–

    Output :
    Key: alex, Value: 2 is greater than 1.
    Key: charles, Value: 2 is greater than 1.
    Key: david, Value: 2 is greater than 1.

    Reply
  2. If String s = “abacdadbgbc”;

    i need the output
    a
    and i don’t need other repeated character in my output
    I need only first repeated Character as my output

    Reply
    • ANS:
      import java.util.*;
      public class DisplayOnlyDuplicate {

      public static void main(String[] args) {
      ArrayList as=new ArrayList();
      as.add(“call”);
      as.add(“me”);
      as.add(“call”);
      as.add(“you”);
      as.add(“me”);
      Set s1=new HashSet();
      for(String s2:s1)
      {
      if(s1.add(s2)==false)//Will check non duplicate element in the String
      {
      System.out.println(s2);
      }
      }

      }

      }

      Reply
        • package set;
          
          import java.awt.List;
          import java.util.ArrayList;
          import java.util.HashSet;
          import java.util.Set;
          
          public class DisplayOnlyDuplicate {
          
          	public static void main(String[] args) {
          		ArrayList l1 = new ArrayList();
          		l1.add("call");
          		l1.add("me");
          		l1.add("call");
          		l1.add("you");
          		l1.add("me");
          		System.out.println("List: " + l1);
          		
          		Set s1= new HashSet();
          		for(Object s2:l1){
          			if(s1.add(s2)== false){
          				System.out.println(s2);
          			}
          		}
          	}
          
          }
          Reply
      • I hope the statement in above code should be:
        //the first line here

        for(String s2: as){
        if(s1.add(s2)==false){
        System.out.println(s2);

        Reply
  3. Hi Lokesh,

    Your articles are excellent. Your website is a major base for my interview preparation. However for the benefit of people preparing for interviews like me, the complexity of the solution using Collections.frequency is O(N^2) since the complexity of the Collections.frequency by itself is O(N). Hence an alternative solution of putting the words and their count as key value pair in a LinkedHashMap will have a complexity of O(N). (LinkedHashMap is used to maintain the order just incase if you have to recreate the String removing the duplicates).

    Aravind.

    Reply
  4. Hi, One of the interview they asked me to write a program for the below. mentioned logic.

    input : { “abc”,”cba”,”BCa”,”bac”,”xyz”,”zyx”,”123″,”321″…} list of anagrams.

    then output should be : { “abc”,”xyz”,”123″}.

    Could you please helpw the logic behind this.

    Reply
    • I have quickly written some code for this. Hope it helps:

      package com.howtodoinjava;
      
      import java.util.HashSet;
      
      public class AnagramExample
      {
         public static void main(String[] args)
         {
            String[] rawList = {&quot;abc&quot;, &quot;cba&quot;, &quot;BCa&quot;, &quot;bac&quot;, &quot;xyz&quot;, &quot;zyx&quot;, &quot;123&quot;, &quot;321&quot;};
            
            //This will store unique strings
            HashSet&lt;String&gt; uniqueStrings = new HashSet&lt;String&gt;();
            
            //Get all anagrams and sort it's chars and store in set
            for(String s : rawList){
               uniqueStrings.add(sortString(s.toLowerCase()));
            }
            
            //We got unique strings here; Great !!
            System.out.println(uniqueStrings);
         }
         
         //Function to sort all chars inside a string
         private static String sortString(String s){
            char[] c = s.toCharArray();        //Convert to array of chars 
            java.util.Arrays.sort(c);          //Sort chars
            return new String(c);             //Return sorted String
         }
      }
      
      Output:
      
      [abc, 123, xyz]
      
      Reply
          • A very broad question which is not possible to answer in few sentences. I will write my thoughts in any future post dedicated to performance improvements. For now, my only suggestion is “Keep it simple, don’t over-engineer anything”.

          • Hi Lokesh, can you please help me with this problem below. please i eally need your help. I’ll be glad if you response. Thank you very much :)

            Problem: you are about to write a program that reads in a while bunch of words and will spit out the duplicate words. for our purposes, words are to be separated by spaces, commas, ! , ? , and period. Output the duplicate words in line, in lowercase and in sorted order.

            sample input:
            how much wood would a woodchuck if a woodchuck could chuck wood? are you feeling well?

            sample output:
            a
            chuck
            well
            wood
            woodchuck
            you

  5. Hi Lokesh,

    I think this one is better appraoch plz suggest :

    String words[] = { “the”, “cat”, “in”, “the”, “hat” };
    HashMap wordCounts = new HashMap();
    for (String w : words) {
    Integer i = wordCounts.get(w);
    if (i == null)
    wordCounts.put(w, 1);
    else
    wordCounts.put(w, i + 1);
    }

    Reply
  6. In Java language, if you do not want to include any collection class then something like this will work.

    Pseudo code

    String[] words = sentence.split( sentense );
    int length = words.length();
    for(int i = 0; i < length; i++) { for(int j = 0; j < length; j++) { if(words[i].equalsIgnoreCase(words[j])) { System.out.printf("%s %s", word[i], word[j]); } } }

    Reply
    • This will cost you O(N^2)
      a Better solution will be to sort the words list which cost O(NlogN)
      and then to run on the sorted list to find equal strings which cost O(N)
      total cost will be = O(NLOGN) + O(N) = O(NLogN)

      Reply
  7. An approach could be after changing the text string to an arrayList, I will push them to a HashMap with the found word as and a 1 as , and will check if the key exist,if so I will increment the value to value+1. so where the value is greater than 1 are duplicates. Your comments pls..

    Reply
    • This will also work. BUT, text.split(” “) will return you a String array. No need to convert to arraylist. You can iterate over array and push the elements in map, and increment the counter.
      This way we can save at least one extra object in runtime.

      Reply
  8. List list = new ArrayList();

    list.addAll(Arrays.asList(text.split(” “)));

    I don’t think you need to create a new ArrayList when Arrays.asList does this exact thing for you:
    List stooges = Arrays.asList(“Larry”, “Moe”, “Curly”);

    A major part of the programming exercise I ask people to work on during an interview is to supply a unit test for the example code. We are finding that a lot of candidates have never written tests for any of their code and have no clue how to produce testable code. In turn, that lack of experience leads to them producing code that is impossible to maintain over the long term as well as high numbers of easy to find bugs.

    We have also discovered that although many people are able to code these small examples, they are unable to handle the complexity of the real work.

    Reply

Leave a Comment

About Us

HowToDoInJava provides tutorials and how-to guides on Java and related technologies.

It also shares the best practices, algorithms & solutions and frequently asked interview questions.