How to Find Duplicate Words in String in Java

This is very common interview question where you have to find duplicate words in a string or text file. This can be solved using some overly-complex algorithms also, but in this post, I will propose rather an easy way using Java Collections API.

1. Problem

Lets say we have an String/ text like below-

“a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f”;

Above string is the result of some random hits on keyboard to make it complete unpredictable. [For fun only 🙂]

We will use Collections.frequency() method to do this job. This method seems to be here only for this purpose.

Lets go directly to our solution and see it works at all.

2. Java program to find repeated words

Given below is a Java program to find the number of occurrences of each word in a sentence or String.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateWordSearcher {
	@SuppressWarnings("unchecked")
	public static void main(String[] args) {

		String text = "a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f";

		List<String> list = Arrays.asList(text.split(" "));

		Set<String> uniqueWords = new HashSet<String>(list);
		for (String word : uniqueWords) {
			System.out.println(word + ": " + Collections.frequency(list, word));
		}
	}
}

Output:

ft: 1
f: 7
g: 2
d: 5
e: 1
b: 1
c: 1
a: 3
wed: 1
sd: 1
se: 1
j: 3
ws: 1
k: 2
h: 2
w: 1
v: 1
s: 4
r: 1
gh: 1
x: 1

3. Summary

In above example, we get a Java program to count how many times a word appears in a String or find duplicate words. It can help you in to find most frequent words or count repeated words in a string.

This can be a Java program to find unique words in a string, also. Just check the count which will be equal to one for unique words.

Happy Learning !!

Was this post helpful?

Join 7000+ Fellow Programmers

Subscribe to get new post notifications, industry updates, best practices, and much more. Directly into your inbox, for free.

30 thoughts on “How to Find Duplicate Words in String in Java”

  1. Hi
    I have been trying to write a script from last 2 days but can’t think of any logic. Please help me !!!

    I enter name, fon n email in a form and when we click submit it goes and gets displayed on the User List page.

    I want to fetch the data and check if any duplicate name or email address exists on that User List. As new enters the details it gets saved on that web page.

    How do I fetch data from that particular page and say duplicate email.

    I have tried but can’t think of any logic and finding very hard. Please help

    Reply
  2. If String s = “abacdadbgbc”;

    i need the output
    a
    and i don’t need other repeated character in my output
    I need only first repeated Character as my output

    Reply
    • ANS:
      import java.util.*;
      public class DisplayOnlyDuplicate {

      public static void main(String[] args) {
      ArrayList as=new ArrayList();
      as.add(“call”);
      as.add(“me”);
      as.add(“call”);
      as.add(“you”);
      as.add(“me”);
      Set s1=new HashSet();
      for(String s2:s1)
      {
      if(s1.add(s2)==false)//Will check non duplicate element in the String
      {
      System.out.println(s2);
      }
      }

      }

      }

      Reply
        • package set;
          
          import java.awt.List;
          import java.util.ArrayList;
          import java.util.HashSet;
          import java.util.Set;
          
          public class DisplayOnlyDuplicate {
          
          	public static void main(String[] args) {
          		ArrayList l1 = new ArrayList();
          		l1.add("call");
          		l1.add("me");
          		l1.add("call");
          		l1.add("you");
          		l1.add("me");
          		System.out.println("List: " + l1);
          		
          		Set s1= new HashSet();
          		for(Object s2:l1){
          			if(s1.add(s2)== false){
          				System.out.println(s2);
          			}
          		}
          	}
          
          }
          Reply
      • I hope the statement in above code should be:
        //the first line here

        for(String s2: as){
        if(s1.add(s2)==false){
        System.out.println(s2);

        Reply
  3. Hi Lokesh, Really your articles are best to learn , even to understand the core. I am new to java , want to understand about the way to calculate the performance. You have used something like O(N^2). Thanks in advance.

    Reply
  4. Hi Lokesh,

    Your articles are excellent. Your website is a major base for my interview preparation. However for the benefit of people preparing for interviews like me, the complexity of the solution using Collections.frequency is O(N^2) since the complexity of the Collections.frequency by itself is O(N). Hence an alternative solution of putting the words and their count as key value pair in a LinkedHashMap will have a complexity of O(N). (LinkedHashMap is used to maintain the order just incase if you have to recreate the String removing the duplicates).

    Aravind.

    Reply
  5. Hi, One of the interview they asked me to write a program for the below. mentioned logic.

    input : { “abc”,”cba”,”BCa”,”bac”,”xyz”,”zyx”,”123″,”321″…} list of anagrams.

    then output should be : { “abc”,”xyz”,”123″}.

    Could you please helpw the logic behind this.

    Reply
    • I have quickly written some code for this. Hope it helps:

      package com.howtodoinjava;
      
      import java.util.HashSet;
      
      public class AnagramExample
      {
         public static void main(String[] args)
         {
            String[] rawList = {&quot;abc&quot;, &quot;cba&quot;, &quot;BCa&quot;, &quot;bac&quot;, &quot;xyz&quot;, &quot;zyx&quot;, &quot;123&quot;, &quot;321&quot;};
            
            //This will store unique strings
            HashSet&lt;String&gt; uniqueStrings = new HashSet&lt;String&gt;();
            
            //Get all anagrams and sort it's chars and store in set
            for(String s : rawList){
               uniqueStrings.add(sortString(s.toLowerCase()));
            }
            
            //We got unique strings here; Great !!
            System.out.println(uniqueStrings);
         }
         
         //Function to sort all chars inside a string
         private static String sortString(String s){
            char[] c = s.toCharArray();        //Convert to array of chars 
            java.util.Arrays.sort(c);          //Sort chars
            return new String(c);             //Return sorted String
         }
      }
      
      Output:
      
      [abc, 123, xyz]
      
      Reply
          • A very broad question which is not possible to answer in few sentences. I will write my thoughts in any future post dedicated to performance improvements. For now, my only suggestion is “Keep it simple, don’t over-engineer anything”.

          • Hi Lokesh, can you please help me with this problem below. please i eally need your help. I’ll be glad if you response. Thank you very much 🙂

            Problem: you are about to write a program that reads in a while bunch of words and will spit out the duplicate words. for our purposes, words are to be separated by spaces, commas, ! , ? , and period. Output the duplicate words in line, in lowercase and in sorted order.

            sample input:
            how much wood would a woodchuck if a woodchuck could chuck wood? are you feeling well?

            sample output:
            a
            chuck
            well
            wood
            woodchuck
            you

  6. Hi Lokesh,

    I think this one is better appraoch plz suggest :

    String words[] = { “the”, “cat”, “in”, “the”, “hat” };
    HashMap wordCounts = new HashMap();
    for (String w : words) {
    Integer i = wordCounts.get(w);
    if (i == null)
    wordCounts.put(w, 1);
    else
    wordCounts.put(w, i + 1);
    }

    Reply
    • In Java language, if you do not want to include any collection class then something like this will work.

      Pseudo code

      String[] words = sentence.split( sentense );
      int length = words.length();
      for(int i = 0; i < length; i++) { for(int j = 0; j < length; j++) { if(words[i].equalsIgnoreCase(words[j])) { System.out.printf("%s %s", word[i], word[j]); } } }

      Reply
      • This will cost you O(N^2)
        a Better solution will be to sort the words list which cost O(NlogN)
        and then to run on the sorted list to find equal strings which cost O(N)
        total cost will be = O(NLOGN) + O(N) = O(NLogN)

        Reply
  7. List list = new ArrayList();

    list.addAll(Arrays.asList(text.split(” “)));

    I don’t think you need to create a new ArrayList when Arrays.asList does this exact thing for you:
    List stooges = Arrays.asList(“Larry”, “Moe”, “Curly”);

    A major part of the programming exercise I ask people to work on during an interview is to supply a unit test for the example code. We are finding that a lot of candidates have never written tests for any of their code and have no clue how to produce testable code. In turn, that lack of experience leads to them producing code that is impossible to maintain over the long term as well as high numbers of easy to find bugs.

    We have also discovered that although many people are able to code these small examples, they are unable to handle the complexity of the real work.

    Reply
  8. In my dzone link: https://dzone.com/links/how_to_find_duplicate_words_in_a_string_in_java.html. Some asked below question:
    “As interviewer, my next question would be “great, now assume Collections.frequency didn’t exist. How would you do it then?” And strictly speaking, your answer still needs to eliminate words that have frequency 1.”

    My answer was:
    “You can ask. You are the boss… Let me make a wild guess. In my other simplest solution, I will sort the collection so that it makes similar words in sequence. Now a simple iteration will filter out all different words and their frequency.. including one also. You can choose them to ignore. I know this solution can have some performance overhead, but for that large load of data, I will go for some expert algorithm implementation. Otherwise, it is also good to go.”

    What will be your answer???????

    Reply
    • An approach could be after changing the text string to an arrayList, I will push them to a HashMap with the found word as and a 1 as , and will check if the key exist,if so I will increment the value to value+1. so where the value is greater than 1 are duplicates. Your comments pls..

      Reply
      • This will also work. BUT, text.split(” “) will return you a String array. No need to convert to arraylist. You can iterate over array and push the elements in map, and increment the counter.
        This way we can save at least one extra object in runtime.

        Reply

Leave a Comment

HowToDoInJava

A blog about Java and its related technologies, the best practices, algorithms, interview questions, scripting languages, and Python.