HowToDoInJava

  • Python
  • Java
  • Spring Boot
  • Dark Mode
Home / Java / String / How to Find Duplicate Words in String in Java

How to Find Duplicate Words in String in Java

This is very common interview question where you have to find duplicate words in a string or text file. This can be solved using some overly-complex algorithms also, but in this post, I will propose rather an easy way using Java Collections API.

1. Problem

Lets say we have an String/ text like below-

“a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f”;

Above string is the result of some random hits on keyboard to make it complete unpredictable. [For fun only ๐Ÿ™‚]

We will use Collections.frequency() method to do this job. This method seems to be here only for this purpose.

Lets go directly to our solution and see it works at all.

2. Java program to find repeated words

Given below is a Java program to find the number of occurrences of each word in a sentence or String.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateWordSearcher {
	@SuppressWarnings("unchecked")
	public static void main(String[] args) {

		String text = "a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f";

		List<String> list = Arrays.asList(text.split(" "));

		Set<String> uniqueWords = new HashSet<String>(list);
		for (String word : uniqueWords) {
			System.out.println(word + ": " + Collections.frequency(list, word));
		}
	}
}

Output:

ft: 1
f: 7
g: 2
d: 5
e: 1
b: 1
c: 1
a: 3
wed: 1
sd: 1
se: 1
j: 3
ws: 1
k: 2
h: 2
w: 1
v: 1
s: 4
r: 1
gh: 1
x: 1

3. Summary

In above example, we get a Java program to count how many times a word appears in a String or find duplicate words. It can help you in to find most frequent words or count repeated words in a string.

This can be a Java program to find unique words in a string, also. Just check the count which will be equal to one for unique words.

Happy Learning !!

Was this post helpful?

Let us know if you liked the post. Thatโ€™s the only way we can improve.

Share this:

  • Twitter
  • Facebook
  • LinkedIn
  • Reddit

About Lokesh Gupta

A family guy with fun loving nature. Love computers, programming and solving everyday problems. Find me on Facebook and Twitter.

Feedback, Discussion and Comments

  1. Sukhi

    December 7, 2016

    Hi
    I have been trying to write a script from last 2 days but can’t think of any logic. Please help me !!!

    I enter name, fon n email in a form and when we click submit it goes and gets displayed on the User List page.

    I want to fetch the data and check if any duplicate name or email address exists on that User List. As new enters the details it gets saved on that web page.

    How do I fetch data from that particular page and say duplicate email.

    I have tried but can’t think of any logic and finding very hard. Please help

  2. Maujood

    February 27, 2016

    If String s = “abacdadbgbc”;

    i need the output
    a
    and i don’t need other repeated character in my output
    I need only first repeated Character as my output

    • Dhanashekar

      May 26, 2017

      ANS:
      import java.util.*;
      public class DisplayOnlyDuplicate {

      public static void main(String[] args) {
      ArrayList as=new ArrayList();
      as.add(“call”);
      as.add(“me”);
      as.add(“call”);
      as.add(“you”);
      as.add(“me”);
      Set s1=new HashSet();
      for(String s2:s1)
      {
      if(s1.add(s2)==false)//Will check non duplicate element in the String
      {
      System.out.println(s2);
      }
      }

      }

      }

      • sumanta

        July 10, 2018

        what is s2 here

        • Ganesh Taru

          November 20, 2019

          package set;
          
          import java.awt.List;
          import java.util.ArrayList;
          import java.util.HashSet;
          import java.util.Set;
          
          public class DisplayOnlyDuplicate {
          
          	public static void main(String[] args) {
          		ArrayList l1 = new ArrayList();
          		l1.add("call");
          		l1.add("me");
          		l1.add("call");
          		l1.add("you");
          		l1.add("me");
          		System.out.println("List: " + l1);
          		
          		Set s1= new HashSet();
          		for(Object s2:l1){
          			if(s1.add(s2)== false){
          				System.out.println(s2);
          			}
          		}
          	}
          
          }
      • Shaik Babajan

        May 16, 2019

        I hope the statement in above code should be:
        //the first line here

        for(String s2: as){
        if(s1.add(s2)==false){
        System.out.println(s2);

  3. kushal

    January 10, 2016

    how to remove the dublicate word like jaaaaavavaavavav. and show the output java.
    Thanks in advance.

  4. spandana

    January 4, 2016

    Hi Lokesh, Really your articles are best to learn , even to understand the core. I am new to java , want to understand about the way to calculate the performance. You have used something like O(N^2). Thanks in advance.

  5. javalschool

    September 12, 2015

    very easy way sir .clean explanation thank u sir it is very help full for me

  6. Aravind

    November 30, 2014

    Hi Lokesh,

    Your articles are excellent. Your website is a major base for my interview preparation. However for the benefit of people preparing for interviews like me, the complexity of the solution using Collections.frequency is O(N^2) since the complexity of the Collections.frequency by itself is O(N). Hence an alternative solution of putting the words and their count as key value pair in a LinkedHashMap will have a complexity of O(N). (LinkedHashMap is used to maintain the order just incase if you have to recreate the String removing the duplicates).

    Aravind.

    • Lokesh Gupta

      December 1, 2014

      Hi Aravind, thanks for the feedback and suggested solution.

      • Aravind

        December 21, 2014

        Thank you once again Lokesh. Your website really helped me to ace all my interviews and I finally got my dream job.

        Aravind.

        • Lokesh Gupta

          December 21, 2014

          WOW !! It’s great to hear that. Congrats buddy. And thanks for your all contribution to this blog.

  7. Srujana

    September 12, 2014

    Hi, One of the interview they asked me to write a program for the below. mentioned logic.

    input : { “abc”,”cba”,”BCa”,”bac”,”xyz”,”zyx”,”123″,”321″…} list of anagrams.

    then output should be : { “abc”,”xyz”,”123″}.

    Could you please helpw the logic behind this.

    • Lokesh Gupta

      September 12, 2014

      I have quickly written some code for this. Hope it helps:

      package com.howtodoinjava;
      
      import java.util.HashSet;
      
      public class AnagramExample
      {
         public static void main(String[] args)
         {
            String[] rawList = {&quot;abc&quot;, &quot;cba&quot;, &quot;BCa&quot;, &quot;bac&quot;, &quot;xyz&quot;, &quot;zyx&quot;, &quot;123&quot;, &quot;321&quot;};
            
            //This will store unique strings
            HashSet&lt;String&gt; uniqueStrings = new HashSet&lt;String&gt;();
            
            //Get all anagrams and sort it's chars and store in set
            for(String s : rawList){
               uniqueStrings.add(sortString(s.toLowerCase()));
            }
            
            //We got unique strings here; Great !!
            System.out.println(uniqueStrings);
         }
         
         //Function to sort all chars inside a string
         private static String sortString(String s){
            char[] c = s.toCharArray();        //Convert to array of chars 
            java.util.Arrays.sort(c);          //Sort chars
            return new String(c);             //Return sorted String
         }
      }
      
      Output:
      
      [abc, 123, xyz]
      
      • Srujana

        September 12, 2014

        Thanks Lokesh for your response.

        • Srujana

          September 12, 2014

          Hi Lokesh,

          What are the things we need to take care to improve performance of the java web application?

          • Lokesh Gupta

            September 13, 2014

            A very broad question which is not possible to answer in few sentences. I will write my thoughts in any future post dedicated to performance improvements. For now, my only suggestion is “Keep it simple, don’t over-engineer anything”.

            • Yhen Renoblas

              October 17, 2014

              Hi Lokesh, can you please help me with this problem below. please i eally need your help. I’ll be glad if you response. Thank you very much ๐Ÿ™‚

              Problem: you are about to write a program that reads in a while bunch of words and will spit out the duplicate words. for our purposes, words are to be separated by spaces, commas, ! , ? , and period. Output the duplicate words in line, in lowercase and in sorted order.

              sample input:
              how much wood would a woodchuck if a woodchuck could chuck wood? are you feeling well?

              sample output:
              a
              chuck
              well
              wood
              woodchuck
              you

  8. arun

    February 9, 2014

    It’s very easy to find duplicate word.
    thanks

  9. vikas

    January 13, 2014

    Hi Lokesh,

    I think this one is better appraoch plz suggest :

    String words[] = { “the”, “cat”, “in”, “the”, “hat” };
    HashMap wordCounts = new HashMap();
    for (String w : words) {
    Integer i = wordCounts.get(w);
    if (i == null)
    wordCounts.put(w, 1);
    else
    wordCounts.put(w, i + 1);
    }

    • Lokesh Gupta

      January 13, 2014

      I can not predict if it is better, but yes it is another good approach.

  10. sritej

    June 21, 2013

    Can we do the above example without using collection

    • Lokesh Gupta

      June 21, 2013

      In Java language, if you do not want to include any collection class then something like this will work.

      Pseudo code

      String[] words = sentence.split( sentense );
      int length = words.length();
      for(int i = 0; i < length; i++) { for(int j = 0; j < length; j++) { if(words[i].equalsIgnoreCase(words[j])) { System.out.printf("%s %s", word[i], word[j]); } } }

      • Mos

        May 4, 2014

        This will cost you O(N^2)
        a Better solution will be to sort the words list which cost O(NlogN)
        and then to run on the sorted list to find equal strings which cost O(N)
        total cost will be = O(NLOGN) + O(N) = O(NLogN)

  11. brunkb

    November 11, 2012

    List list = new ArrayList();

    list.addAll(Arrays.asList(text.split(” “)));

    I don’t think you need to create a new ArrayList when Arrays.asList does this exact thing for you:
    List stooges = Arrays.asList(“Larry”, “Moe”, “Curly”);

    A major part of the programming exercise I ask people to work on during an interview is to supply a unit test for the example code. We are finding that a lot of candidates have never written tests for any of their code and have no clue how to produce testable code. In turn, that lack of experience leads to them producing code that is impossible to maintain over the long term as well as high numbers of easy to find bugs.

    We have also discovered that although many people are able to code these small examples, they are unable to handle the complexity of the real work.

    • Admin

      November 11, 2012

      Hello, I had written that piece of code for POC purpose with usage of Collections.frequency() in mind. Still, I appreciate your feedback and have corrected the unnecessary function call.

      Further, I do agree with you on unit testing part. It is usually a weak area for programmers. I also had once, but now I write them with greater attention. I have summarized some good practices in this post. You also can take a look if interested.

      https://howtodoinjava.com/best-practices/unit-testing-best-practices-junit-reference-guide/

  12. Admin

    November 11, 2012

    In my dzone link: https://dzone.com/links/how_to_find_duplicate_words_in_a_string_in_java.html. Some asked below question:
    “As interviewer, my next question would be “great, now assume Collections.frequency didn’t exist. How would you do it then?” And strictly speaking, your answer still needs to eliminate words that have frequency 1.”

    My answer was:
    “You can ask. You are the boss… Let me make a wild guess. In my other simplest solution, I will sort the collection so that it makes similar words in sequence. Now a simple iteration will filter out all different words and their frequency.. including one also. You can choose them to ignore. I know this solution can have some performance overhead, but for that large load of data, I will go for some expert algorithm implementation. Otherwise, it is also good to go.”

    What will be your answer???????

    • padmalava

      December 5, 2012

      An approach could be after changing the text string to an arrayList, I will push them to a HashMap with the found word as and a 1 as , and will check if the key exist,if so I will increment the value to value+1. so where the value is greater than 1 are duplicates. Your comments pls..

      • Lokesh Gupta

        December 5, 2012

        This will also work. BUT, text.split(” “) will return you a String array. No need to convert to arraylist. You can iterate over array and push the elements in map, and increment the counter.
        This way we can save at least one extra object in runtime.

Comments are closed on this article!

Search Tutorials

String methods

  • String concat()
  • String hashCode()
  • String contains()
  • String compareTo()
  • String compareToIgnoreCase()
  • String equals()
  • String equalsIgnoreCase()
  • String charAt()
  • String indexOf()
  • String lastIndexOf()
  • String intern()
  • String split()
  • String replace()
  • String replaceFirst()
  • String replaceAll()
  • String substring()
  • String startsWith()
  • String endsWith()
  • String toUpperCase()
  • String toLowerCase()

String examples

  • Convert String to int
  • Convert int to String
  • Convert String to long
  • Convert long to String
  • Convert CSV String to List
  • Java StackTrace to String
  • Convert float to String
  • String – Alignment
  • String – Immutable
  • String – StringJoiner
  • Java – Split string
  • String – Escape HTML
  • String – Unescape HTML
  • String – Convert to title case
  • String – Find duplicate words
  • String – Left pad a string
  • String – Right pad a string
  • String – Reverse recursively
  • String – Leading whitespaces
  • String – Trailing whitespaces
  • String – Remove whitespaces
  • String – Reverse words
  • String – Find duplicate characters
  • String – Check empty string
  • String – Get first 4 characters
  • String – Get last 4 characters
  • String – (123) 456-6789 pattern
  • String – Interview Questions

Java Tutorial

  • Java Introduction
  • Java Keywords
  • Java Flow Control
  • Java OOP
  • Java Inner Class
  • Java String
  • Java Enum
  • Java Collections
  • Java ArrayList
  • Java HashMap
  • Java Array
  • Java Sort
  • Java Clone
  • Java Date Time
  • Java Concurrency
  • Java Generics
  • Java Serialization
  • Java Input Output
  • Java New I/O
  • Java Exceptions
  • Java Annotations
  • Java Reflection
  • Java Garbage collection
  • Java JDBC
  • Java Security
  • Java Regex
  • Java Servlets
  • Java XML
  • Java Puzzles
  • Java Examples
  • Java Libraries
  • Java Resources
  • Java 14
  • Java 12
  • Java 11
  • Java 10
  • Java 9
  • Java 8
  • Java 7

Meta Links

  • About Me
  • Contact Us
  • Privacy policy
  • Advertise
  • Guest and Sponsored Posts

Recommended Reading

  • 10 Life Lessons
  • Secure Hash Algorithms
  • How Web Servers work?
  • How Java I/O Works Internally?
  • Best Way to Learn Java
  • Java Best Practices Guide
  • Microservices Tutorial
  • REST API Tutorial
  • How to Start New Blog

Copyright © 2020 · HowToDoInjava.com · All Rights Reserved. | Sitemap

  • Java 15 New Features
  • Sealed Classes and Interfaces
  • EdDSA (Ed25519 / Ed448)