This is very common interview question where you have to find duplicate words in a string or text file. This can be solved using some overly-complex algorithms also, but in this post, I will propose rather an easy way using Java Collections API.
1. Problem
Lets say we have an String/ text like below-
“a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f”;
Above string is the result of some random hits on keyboard to make it complete unpredictable. [For fun only ๐]
We will use Collections.frequency() method to do this job. This method seems to be here only for this purpose.
Lets go directly to our solution and see it works at all.
2. Java program to find repeated words
Given below is a Java program to find the number of occurrences of each word in a sentence or String.
import java.util.Arrays; import java.util.Collections; import java.util.HashSet; import java.util.List; import java.util.Set; public class DuplicateWordSearcher { @SuppressWarnings("unchecked") public static void main(String[] args) { String text = "a r b k c d se f g a d f s s f d s ft gh f ws w f v x s g h d h j j k f sd j e wed a d f"; List<String> list = Arrays.asList(text.split(" ")); Set<String> uniqueWords = new HashSet<String>(list); for (String word : uniqueWords) { System.out.println(word + ": " + Collections.frequency(list, word)); } } } Output: ft: 1 f: 7 g: 2 d: 5 e: 1 b: 1 c: 1 a: 3 wed: 1 sd: 1 se: 1 j: 3 ws: 1 k: 2 h: 2 w: 1 v: 1 s: 4 r: 1 gh: 1 x: 1
3. Summary
In above example, we get a Java program to count how many times a word appears in a String or find duplicate words. It can help you in to find most frequent words or count repeated words in a string.
This can be a Java program to find unique words in a string, also. Just check the count which will be equal to one for unique words.
Happy Learning !!
Sukhi
Hi
I have been trying to write a script from last 2 days but can’t think of any logic. Please help me !!!
I enter name, fon n email in a form and when we click submit it goes and gets displayed on the User List page.
I want to fetch the data and check if any duplicate name or email address exists on that User List. As new enters the details it gets saved on that web page.
How do I fetch data from that particular page and say duplicate email.
I have tried but can’t think of any logic and finding very hard. Please help
Maujood
If String s = “abacdadbgbc”;
i need the output
a
and i don’t need other repeated character in my output
I need only first repeated Character as my output
Dhanashekar
ANS:
import java.util.*;
public class DisplayOnlyDuplicate {
public static void main(String[] args) {
ArrayList as=new ArrayList();
as.add(“call”);
as.add(“me”);
as.add(“call”);
as.add(“you”);
as.add(“me”);
Set s1=new HashSet();
for(String s2:s1)
{
if(s1.add(s2)==false)//Will check non duplicate element in the String
{
System.out.println(s2);
}
}
}
}
sumanta
what is s2 here
Ganesh Taru
Shaik Babajan
I hope the statement in above code should be:
//the first line here
for(String s2: as){
if(s1.add(s2)==false){
System.out.println(s2);
kushal
how to remove the dublicate word like jaaaaavavaavavav. and show the output java.
Thanks in advance.
spandana
Hi Lokesh, Really your articles are best to learn , even to understand the core. I am new to java , want to understand about the way to calculate the performance. You have used something like O(N^2). Thanks in advance.
javalschool
very easy way sir .clean explanation thank u sir it is very help full for me
Aravind
Hi Lokesh,
Your articles are excellent. Your website is a major base for my interview preparation. However for the benefit of people preparing for interviews like me, the complexity of the solution using Collections.frequency is O(N^2) since the complexity of the Collections.frequency by itself is O(N). Hence an alternative solution of putting the words and their count as key value pair in a LinkedHashMap will have a complexity of O(N). (LinkedHashMap is used to maintain the order just incase if you have to recreate the String removing the duplicates).
Aravind.
Lokesh Gupta
Hi Aravind, thanks for the feedback and suggested solution.
Aravind
Thank you once again Lokesh. Your website really helped me to ace all my interviews and I finally got my dream job.
Aravind.
Lokesh Gupta
WOW !! It’s great to hear that. Congrats buddy. And thanks for your all contribution to this blog.
Srujana
Hi, One of the interview they asked me to write a program for the below. mentioned logic.
input : { “abc”,”cba”,”BCa”,”bac”,”xyz”,”zyx”,”123″,”321″…} list of anagrams.
then output should be : { “abc”,”xyz”,”123″}.
Could you please helpw the logic behind this.
Lokesh Gupta
I have quickly written some code for this. Hope it helps:
Srujana
Thanks Lokesh for your response.
Srujana
Hi Lokesh,
What are the things we need to take care to improve performance of the java web application?
Lokesh Gupta
A very broad question which is not possible to answer in few sentences. I will write my thoughts in any future post dedicated to performance improvements. For now, my only suggestion is “Keep it simple, don’t over-engineer anything”.
Yhen Renoblas
Hi Lokesh, can you please help me with this problem below. please i eally need your help. I’ll be glad if you response. Thank you very much ๐
Problem: you are about to write a program that reads in a while bunch of words and will spit out the duplicate words. for our purposes, words are to be separated by spaces, commas, ! , ? , and period. Output the duplicate words in line, in lowercase and in sorted order.
sample input:
how much wood would a woodchuck if a woodchuck could chuck wood? are you feeling well?
sample output:
a
chuck
well
wood
woodchuck
you
arun
It’s very easy to find duplicate word.
thanks
vikas
Hi Lokesh,
I think this one is better appraoch plz suggest :
String words[] = { “the”, “cat”, “in”, “the”, “hat” };
HashMap wordCounts = new HashMap();
for (String w : words) {
Integer i = wordCounts.get(w);
if (i == null)
wordCounts.put(w, 1);
else
wordCounts.put(w, i + 1);
}
Lokesh Gupta
I can not predict if it is better, but yes it is another good approach.
sritej
Can we do the above example without using collection
Lokesh Gupta
In Java language, if you do not want to include any collection class then something like this will work.
Pseudo code
String[] words = sentence.split( sentense );
int length = words.length();
for(int i = 0; i < length; i++) { for(int j = 0; j < length; j++) { if(words[i].equalsIgnoreCase(words[j])) { System.out.printf("%s %s", word[i], word[j]); } } }
Mos
This will cost you O(N^2)
a Better solution will be to sort the words list which cost O(NlogN)
and then to run on the sorted list to find equal strings which cost O(N)
total cost will be = O(NLOGN) + O(N) = O(NLogN)
brunkb
List list = new ArrayList();
list.addAll(Arrays.asList(text.split(” “)));
I don’t think you need to create a new ArrayList when Arrays.asList does this exact thing for you:
List stooges = Arrays.asList(“Larry”, “Moe”, “Curly”);
A major part of the programming exercise I ask people to work on during an interview is to supply a unit test for the example code. We are finding that a lot of candidates have never written tests for any of their code and have no clue how to produce testable code. In turn, that lack of experience leads to them producing code that is impossible to maintain over the long term as well as high numbers of easy to find bugs.
We have also discovered that although many people are able to code these small examples, they are unable to handle the complexity of the real work.
Admin
Hello, I had written that piece of code for POC purpose with usage of Collections.frequency() in mind. Still, I appreciate your feedback and have corrected the unnecessary function call.
Further, I do agree with you on unit testing part. It is usually a weak area for programmers. I also had once, but now I write them with greater attention. I have summarized some good practices in this post. You also can take a look if interested.
https://howtodoinjava.com/best-practices/unit-testing-best-practices-junit-reference-guide/
Admin
In my dzone link: https://dzone.com/links/how_to_find_duplicate_words_in_a_string_in_java.html. Some asked below question:
“As interviewer, my next question would be “great, now assume Collections.frequency didn’t exist. How would you do it then?” And strictly speaking, your answer still needs to eliminate words that have frequency 1.”
My answer was:
“You can ask. You are the boss… Let me make a wild guess. In my other simplest solution, I will sort the collection so that it makes similar words in sequence. Now a simple iteration will filter out all different words and their frequency.. including one also. You can choose them to ignore. I know this solution can have some performance overhead, but for that large load of data, I will go for some expert algorithm implementation. Otherwise, it is also good to go.”
What will be your answer???????
padmalava
An approach could be after changing the text string to an arrayList, I will push them to a HashMap with the found word as and a 1 as , and will check if the key exist,if so I will increment the value to value+1. so where the value is greater than 1 are duplicates. Your comments pls..
Lokesh Gupta
This will also work. BUT, text.split(” “) will return you a String array. No need to convert to arraylist. You can iterate over array and push the elements in map, and increment the counter.
This way we can save at least one extra object in runtime.