Python: Find Most Frequent Element in a List or Array

In data analysis and processing tasks, finding the most frequently occurring items in a sequence (such as a list or an array) is a common requirement. Python offers several efficient ways to do this, each with its own advantages and best use cases. In this tutorial, we’ll explore three of them, see how each works through examples, and learn when to use which.

| Method | When to Use |
|---|---|
| collections.Counter() | In small to medium-sized sequences. |
| pandas | In large sequences (preferably of numbers) where convenience and performance are important. |
| numpy | In large arrays containing numerical data. |

1. Find the Most Frequent Element using collections.Counter()

Python’s collections module provides a convenient data structure called Counter for counting hashable objects in a sequence efficiently.

When initializing a Counter object, you can pass an iterable (such as a list, tuple, or string) or a dictionary as an argument. If an iterable is provided, Counter counts the occurrences of each unique element in it. We can then look up the count of a specific element by key, just like in a dictionary. The most_common(n) method returns the n most frequent elements as a list of (element, count) pairs.

from collections import Counter

sequence = [1, 2, 3, 4, 1, 2, 1, 2, 1]
counter = Counter(sequence)
top_two = counter.most_common(2)  # Get top 2 most common items
print(top_two)

The program output:

[(1, 4), (2, 3)]

Counter is a subclass of dict, so counting takes a single pass over the sequence. It therefore remains efficient even for sequences with a large number of unique elements.
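As a small illustration of this dictionary-style behavior, the sketch below looks up individual counts and extracts only the single most frequent element. The variable names are chosen for illustration only.

```python
from collections import Counter

sequence = [1, 2, 3, 4, 1, 2, 1, 2, 1]
counter = Counter(sequence)

# Look up counts by key, as with a dictionary
count_of_one = counter[1]    # 4
count_of_nine = counter[9]   # 0: a missing key returns 0 instead of raising KeyError

# most_common(1) returns a one-item list of (element, count) pairs
element, count = counter.most_common(1)[0]
print(element, count)        # 1 4
```

When several elements share the highest count, most_common() lists them in the order in which they were first encountered (Python 3.7 and later).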

Be careful not to pass unhashable objects to Counter. Because Counter relies on hashing to count occurrences, every element must be hashable; otherwise, a TypeError is raised.

sequence = [[1], [2], [1], [2], [1]]  # List of lists
counter = Counter(sequence)  # Raises TypeError: unhashable type: 'list'

To prevent this, we can convert the inner lists into tuples, as tuples are hashable in Python.

from collections import Counter

# List of lists
sequence = [[1], [2], [1], [2], [1]]

# Convert inner lists to tuples
sequence_tuples = [tuple(sublist) for sublist in sequence]

# Use Counter to count occurrences
counter = Counter(sequence_tuples)

# Find the two most common elements
top_two = counter.most_common(2)
print(top_two)  # [((1,), 3), ((2,), 2)]
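The same idea extends to other unhashable elements. The following is a minimal sketch, assuming a list of flat dictionaries whose values are themselves hashable; the records list and the choice of frozenset are illustrative, not the only way to do this.

```python
from collections import Counter

# List of dictionaries (dictionaries are unhashable)
records = [{"a": 1}, {"b": 2}, {"a": 1}]

# A frozenset of (key, value) pairs is hashable as long as the values are hashable
keys = [frozenset(record.items()) for record in records]
counter = Counter(keys)

# Convert the winning key back into a dictionary for display
most_common_key, count = counter.most_common(1)[0]
print(dict(most_common_key), count)  # {'a': 1} 2
```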

2. Using Pandas to Find Most Frequent Items

In pandas, the value_counts() method of a Series returns a new Series containing the counts of unique values, sorted in descending order. By default, it excludes NA/null values.

If the sequence contains missing values (NaN), handle them according to your requirements. For example, set the dropna parameter to False to include NA/null values in the counts.

import pandas as pd

sequence_with_nan = [1, 2, 3, 4, 1, 2, 1, 2, 1, None]
series_with_nan = pd.Series(sequence_with_nan)
top_two = series_with_nan.value_counts(dropna=False).head(2)
print(top_two)

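The program output is shown below. The index displays 1.0 and 2.0 because the None entry is converted to NaN, which makes the Series floating-point. Recent pandas versions (2.0 and later) also print Name: count on the last line.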

1.0    4
2.0    3
dtype: int64
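In the example above, the single missing value does not reach the top two, so the effect of dropna is not visible in the output. The sketch below uses a made-up series in which NaN is the most frequent entry, to show the difference between the two settings.

```python
import pandas as pd

# None is converted to NaN, so the Series is floating-point
series = pd.Series([1, None, None, 2, None])

without_nan = series.value_counts()            # NaN is ignored
with_nan = series.value_counts(dropna=False)   # NaN is counted like any other value

print(len(without_nan))   # 2 distinct non-missing values
print(with_nan.iloc[0])   # 3, the count of NaN, which is now the top entry
```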

The value_counts() method is not limited to numerical data; it also works with other hashable values such as strings. If the Series holds unhashable objects (for example, lists), counting fails with a TypeError, so we need to preprocess such data first, for instance by converting the inner lists to tuples.
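A short sketch of value_counts() on string data follows. The sample words are made up for illustration. It also shows Series.mode(), which returns every value tied for the highest count.

```python
import pandas as pd

words = pd.Series(["apple", "banana", "apple", "cherry", "banana", "apple"])

# value_counts() sorts by count, so the first entry is the most frequent value
counts = words.value_counts()
print(counts.index[0], counts.iloc[0])  # apple 3

# mode() returns all values tied for the highest count, as a Series
print(words.mode().tolist())            # ['apple']
```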

3. Using NumPy to Find Most Common Elements in Large Numerical Arrays

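For numerical data, numpy provides efficient array-based operations. The numpy.unique() function, called with return_counts=True, returns the sorted unique elements along with their counts. Sorting the negated counts with numpy.argsort() then gives the positions of the most frequent elements.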

import numpy as np

sequence = np.array([1, 2, 3, 4, 1, 2, 1, 2, 1])
unique, counts = np.unique(sequence, return_counts=True)
most_common_indices = np.argsort(-counts)[:2]
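# .item() converts NumPy scalars to plain Python ints, so the printed output is the same across NumPy versions
most_common = [(unique[i].item(), counts[i].item()) for i in most_common_indices]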
print(most_common)

The program output:

[(1, 4), (2, 3)]
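When only the single most frequent value is needed, a full sort is unnecessary. The sketch below shows two possible approaches: numpy.argmax() on the counts, and numpy.bincount(), which applies only under the assumption that the array holds non-negative integers.

```python
import numpy as np

sequence = np.array([1, 2, 3, 4, 1, 2, 1, 2, 1])

# Option 1: unique values plus counts, then the position of the largest count
unique, counts = np.unique(sequence, return_counts=True)
most_frequent = unique[np.argmax(counts)].item()

# Option 2: bincount counts occurrences by value; valid only for non-negative integers
most_frequent_bincount = int(np.bincount(sequence).argmax())

print(most_frequent, most_frequent_bincount)  # 1 1
```

In both options, a tie is resolved in favor of the smallest value, because numpy.argmax() returns the first maximum and the values are in ascending order.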

4. Conclusion

In this tutorial, we discussed three efficient ways to find the most common items in a sequence in Python. Other approaches exist, but these three cover most practical cases:

  • For small to medium-sized sequences, collections.Counter() offers excellent performance.
  • For large datasets, especially in data analysis tasks, pandas provides a convenient interface with good performance.
  • For large arrays of numerical data, numpy is usually the most efficient choice.
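Actual performance depends on the data size, the element type, and the library versions, so it is worth measuring on your own data. The following is a rough timing sketch, assuming 100,000 random integers as test data; the function names with_counter and with_numpy are hypothetical helpers, and the timings will vary between machines.

```python
import random
import timeit
from collections import Counter

import numpy as np

# Assumed test data: 100,000 random integers between 0 and 99
random.seed(0)
data = [random.randint(0, 99) for _ in range(100_000)]
array = np.array(data)


def with_counter():
    # Returns an (element, count) pair
    return Counter(data).most_common(1)[0]


def with_numpy():
    unique, counts = np.unique(array, return_counts=True)
    i = np.argmax(counts)
    return unique[i].item(), counts[i].item()


# Total time in seconds for 10 runs of each approach
print("Counter:", timeit.timeit(with_counter, number=10))
print("NumPy:  ", timeit.timeit(with_numpy, number=10))
```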

Happy Learning !!

