Java Stream – Find, Count and Remove Duplicates

Here are a few simple examples showing how to find and count the duplicates in a Stream, and how to remove them, using the Stream API available since Java 8. We will use a List to provide a Stream of elements that includes duplicates.

1. Stream.distinct() – To Remove Duplicates

1.1. Remove Duplicate Strings

The distinct() method returns a Stream consisting of the distinct elements of the given stream. Equality is checked using the element’s equals() method. For an ordered stream, the first occurrence of each element is the one that is kept.

List<String> list = Arrays.asList("A", "B", "C", "D", "A", "B", "C");

// Get list without duplicates
List<String> distinctItems = list.stream()
                                    .distinct()
                                    .collect(Collectors.toList());

// Let's verify distinct elements
System.out.println(distinctItems);

Program output:

[A, B, C, D]

1.2. Remove Duplicate Custom Objects

The same syntax can be used to remove duplicate objects from a List. To do so, we need to be careful about the class’s equals() method (and the matching hashCode() method), because it decides whether an object is a duplicate or unique.

Consider the below example where two Person instances are considered equal if both have the same id value.

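public class Person 
{
    private Integer id;
    private String fname;
    private String lname;

    // Constructor and getters omitted. Equality is based on id only (uses java.util.Objects).
    @Override
    public boolean equals(Object o) { return o instanceof Person && Objects.equals(id, ((Person) o).id); }
    @Override
    public int hashCode() { return Objects.hashCode(id); }
}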

Let us see an example of how we can remove duplicate Person objects from a List.

//Add some random persons
Collection<Person> list = Arrays.asList(p1, p2, p3, p4, p5, p6);

// Get distinct people by id
List<Person> distinctElements = list.stream()
        .distinct()
        .collect( Collectors.toList() );
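
For readers who want to run the example end to end, here is a minimal sketch. The class name DistinctByIdDemo, the helper distinctById() and the sample data are assumptions made for illustration and are not part of the original example. The nested Person compares instances by id only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctByIdDemo {

    // Hypothetical Person for this sketch; equality is based on id only
    static class Person {
        private final Integer id;
        private final String fname;
        private final String lname;

        Person(Integer id, String fname, String lname) {
            this.id = id;
            this.fname = fname;
            this.lname = lname;
        }

        Integer getId() { return id; }
        String getFname() { return fname; }
        String getLname() { return lname; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            return Objects.equals(id, ((Person) o).id);
        }

        // Must agree with equals(): equal persons produce equal hash codes
        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }

        @Override
        public String toString() {
            return id + ":" + fname + " " + lname;
        }
    }

    // Keeps the first occurrence of each id, in encounter order
    static List<Person> distinctById(List<Person> people) {
        return people.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<Person> people = Arrays.asList(
                new Person(1, "Ann", "Lee"),
                new Person(2, "Bob", "Ray"),
                new Person(1, "Cid", "Fox"),   // same id as the first person
                new Person(3, "Dan", "Kim"));

        System.out.println(distinctById(people));
        // prints [1:Ann Lee, 2:Bob Ray, 3:Dan Kim]
    }
}
```

Because the list is ordered, distinct() keeps the first person seen for each id, so "Cid Fox" is dropped.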

To find unique objects using a different equality condition, we can use the following distinctByKey() helper method. In this example, we find all people who are unique by full name.

//Add some random persons
List<Person> list = Arrays.asList(p1, p2, p3, p4, p5, p6);

// Get distinct people by full name
List<Person> distinctPeople = list.stream()
              .filter( distinctByKey(p -> p.getFname() + " " + p.getLname()) )
              .collect( Collectors.toList() );

//******** The distinctByKey() method needs to be created **********

public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) 
{
  Map<Object, Boolean> map = new ConcurrentHashMap<>();
  return t -> map.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
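
As a runnable sketch of the same idea, the helper can be tried on plain strings, using the lower-cased value as the key. The class name DistinctByKeyDemo, the method distinctIgnoringCase() and the sample words are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {

    // Returns a stateful predicate that accepts an element only the first time its key is seen
    static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    // Hypothetical helper: keeps the first word for each case-insensitive spelling
    static List<String> distinctIgnoringCase(List<String> words) {
        return words.stream()
                .filter(distinctByKey(w -> w.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Apple", "apple", "Banana", "BANANA", "Cherry");
        System.out.println(distinctIgnoringCase(words));
        // prints [Apple, Banana, Cherry]
    }
}
```

The predicate remembers every key it has seen, so a new one should be created for each stream pipeline rather than shared. With a parallel stream, which of the duplicate elements is kept is not deterministic.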

2. Collectors.toSet() – To Remove Duplicates

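Another simple way is to collect all the elements into a Set. Sets, by definition, store only distinct elements. Note that a Set detects duplicates using the objects’ equals() method (and, for hash-based sets, hashCode() as well).

Here, we cannot compare the objects using a custom equality condition. Also, Collectors.toSet() makes no guarantee about the type of the returned Set, so the original order of the elements may not be preserved.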

ArrayList<Integer> numbersList
        = new ArrayList<>(Arrays.asList(1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8));

Set<Integer> setWithoutDuplicates = numbersList.stream()
        .collect(Collectors.toSet());

System.out.println(setWithoutDuplicates);

Program output:

[1, 2, 3, 4, 5, 6, 7, 8]
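
If the order of first appearance matters, one possible variation (not part of the original example) is to collect into a LinkedHashSet using Collectors.toCollection(). The class name OrderedDistinctDemo, the method distinctInOrder() and the sample numbers below are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class OrderedDistinctDemo {

    // Collects into a LinkedHashSet, which iterates in insertion order
    static Set<Integer> distinctInOrder(List<Integer> numbers) {
        return numbers.stream()
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 3, 5, 1, 3, 8);
        System.out.println(distinctInOrder(numbers));
        // prints [5, 3, 1, 8]
    }
}
```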

3. Collectors.toMap() – To Count Duplicates

Sometimes, we are interested in finding out which elements are duplicates and how many times they appeared in the original list. We can use a Map to store this information.

We iterate over the list, use each element as the Map key, and store the number of times it occurs as the Map value.

// ArrayList with duplicate elements
ArrayList<Integer> numbersList
        = new ArrayList<>(Arrays.asList(1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8));

Map<Integer, Long> elementCountMap = numbersList.stream()
        .collect(Collectors.toMap(Function.identity(), v -> 1L, Long::sum));

System.out.println(elementCountMap);

Program output:

{1=2, 2=1, 3=3, 4=1, 5=1, 6=3, 7=1, 8=1}
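
To list only the elements that actually occur more than once, the counts can be filtered afterwards. The sketch below is one way to do it, using Collectors.groupingBy() with Collectors.counting() instead of toMap(). The class name FindDuplicatesDemo and the method countDuplicates() are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FindDuplicatesDemo {

    // Returns each element that occurs at least twice, mapped to its number of occurrences
    static Map<Integer, Long> countDuplicates(List<Integer> numbers) {
        // LinkedHashMap keeps the keys in order of first appearance
        Map<Integer, Long> counts = numbers.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));

        // Drop the elements that occur only once
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8);
        System.out.println(countDuplicates(numbers));
        // prints {1=2, 3=3, 6=3}
    }
}
```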

Happy Learning !!
