Python: Finding the Largest or Smallest N Items

In Python data processing, identifying the largest or smallest N items in a collection is a frequent task. This tutorial explores three ways to do it: built-in sorting, the heapq module, and the NumPy library.

For quick reference, this table summarizes the different approaches, indicates when each approach is suitable, and provides a brief example for each scenario.

Approach / Data Type | When to Use? | Example
Sorting | Small to medium-sized datasets, simple code | sorted_list = sorted(my_list); largest_n_items = sorted_list[-3:]; smallest_n_items = sorted_list[:3]
heapq module | Large datasets, performance-critical applications | l_items = heapq.nlargest(3, my_list); s_items = heapq.nsmallest(3, my_list)
NumPy arrays | Numerical data, large datasets | l_items = np.partition(array, -3)[-3:]; s_items = np.partition(array, 3)[:3]

1. Using Sorting to Find the Largest or Smallest Items

Sorting the collection and then selecting the first or last N elements is a straightforward approach for small sequences. In this example, we sort the list and then slice it to obtain the largest or smallest N items.

my_list = [4, 7, 1, 9, 3, 5, 8]

sorted_list = sorted(my_list)

largest_n_items = sorted_list[-3:]  # Replace 3 with the desired N
smallest_n_items = sorted_list[:3]  # Replace 3 with the desired N

print(f"Largest N items: {largest_n_items}")
print(f"Smallest N items: {smallest_n_items}")

The program output:

Largest N items: [7, 8, 9]
Smallest N items: [1, 3, 4]
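
One edge case in the slicing approach is worth a sketch: when N is 0, the slice [-0:] is the same as [0:] and returns the whole list rather than an empty one. The helpers below (largest_n and smallest_n are hypothetical names, not part of any library) guard against that; treat them as a minimal illustration rather than a definitive implementation.

```python
def largest_n(items, n):
    """Return the n largest items in ascending order (hypothetical helper)."""
    if n <= 0:
        # Without this guard, sorted(items)[-0:] would return every item.
        return []
    return sorted(items)[-n:]


def smallest_n(items, n):
    """Return the n smallest items in ascending order (hypothetical helper)."""
    if n <= 0:
        return []
    return sorted(items)[:n]


my_list = [4, 7, 1, 9, 3, 5, 8]

print(largest_n(my_list, 3))    # [7, 8, 9]
print(largest_n(my_list, 0))    # []
print(sorted(my_list)[-0:])     # [1, 3, 4, 5, 7, 8, 9], because -0 == 0
print(smallest_n(my_list, 10))  # N larger than the list: the whole sorted list
```

If N exceeds the length of the list, slicing simply returns everything that is available, so no extra check is needed for that case.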

2. Using heapq for Efficiency

For large datasets and performance-critical applications, the heapq (heap queues, also known as priority queues) module provides a more efficient solution using heaps. A heap is a specialized tree-based data structure that satisfies the heap property.

  • In a max heap, for any given node C with parent P, the value of P is greater than or equal to the value of C.
  • In a min heap, the value of P is less than or equal to the value of C.
  • Python's heapq module implements a min heap.
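
The heap property described above can be checked directly on a list. The sketch below assumes the usual array layout, where the parent of index i sits at index (i - 1) // 2; is_min_heap is a hypothetical helper written for this illustration. It also shows the common idiom of negating values to simulate a max heap, since heapq only provides a min heap.

```python
import heapq


def is_min_heap(heap):
    """Check that every parent is <= its children (hypothetical helper)."""
    return all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))


data = [4, 7, 1, 9, 3, 5, 8]

heap = list(data)    # copy, because heapify rearranges the list in place
heapq.heapify(heap)

print(heap[0])            # 1, the smallest item is always at the root
print(is_min_heap(heap))  # True
print(is_min_heap(data))  # False, the original order violates the property

# Simulating a max heap by storing negated values.
max_heap = [-x for x in data]
heapq.heapify(max_heap)
print(-max_heap[0])       # 9, the largest item
```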

The heapq.nlargest and heapq.nsmallest functions use a heap to find the largest or smallest N items, respectively. They do not need to sort the entire collection, which can be computationally expensive for large inputs. Both return their results in sorted order: nlargest from largest to smallest, and nsmallest from smallest to largest.

import heapq

my_list = [4, 7, 1, 9, 3, 5, 8]

largest_n_items = heapq.nlargest(3, my_list)
smallest_n_items = heapq.nsmallest(3, my_list)

print(f"Largest N items: {largest_n_items}")
print(f"Smallest N items: {smallest_n_items}")

The program output:

Largest N items: [9, 8, 7]
Smallest N items: [1, 3, 4]
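
To see why a heap avoids sorting the whole collection, here is a rough sketch of the underlying idea: keep a min heap of at most N items and replace its smallest entry whenever a larger value arrives. The function n_largest_bounded is a hypothetical name, and this is an illustration of the technique, not the actual implementation inside heapq.

```python
import heapq


def n_largest_bounded(iterable, n):
    """Return the n largest items, largest first (illustrative sketch)."""
    if n <= 0:
        return []
    heap = []  # min heap holding the n largest values seen so far
    for value in iterable:
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # heap[0] is the smallest of the current candidates; swap it out.
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


my_list = [4, 7, 1, 9, 3, 5, 8]
result = n_largest_bounded(my_list, 3)

print(result)  # [9, 8, 7]
```

Each item costs at most one heap operation on a heap of size N, so the work grows roughly with len(items) * log(N) instead of the len(items) * log(len(items)) of a full sort.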

3. Using NumPy Array for Large Datasets

When dealing with numeric data, NumPy provides optimized functions to find the largest or smallest N items. Its partition function does this without fully sorting the array.

The partition function returns a rearranged copy of the array in which the values smaller than a given k-th element appear to its left, and values equal to or larger than it appear to its right. The k-th element itself takes its final sorted position.

When we call np.partition(my_array, 3), the element at index 3 is placed in its sorted position, so the three smallest elements end up on its left and the rest on its right. The elements within the left and right partitions are not necessarily sorted.

import numpy as np

my_array = np.array([4, 7, 1, 9, 3, 5, 8])

largest_n_items = np.partition(my_array, -3)[-3:]  # Replace 3 with the desired N
smallest_n_items = np.partition(my_array, 3)[:3]   # Replace 3 with the desired N

print(f"Largest N items: {largest_n_items}")
print(f"Smallest N items: {smallest_n_items}")

The program output (the order of the items within each slice is not guaranteed, so it may differ):

Largest N items: [7 8 9]
Smallest N items: [1 3 4]
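
Because partition does not order the items inside each side, a small follow-up step is needed when a sorted result matters. The sketch below sorts only the N selected items, which stays cheap when N is small, and checks that the k-th element landed in its sorted position. The variable names are chosen for illustration only.

```python
import numpy as np

my_array = np.array([4, 7, 1, 9, 3, 5, 8])
n = 3

partitioned = np.partition(my_array, n)

# The element at index n is in its final sorted position.
kth_value = partitioned[n]
print(kth_value)  # 5

# Everything to its left is <= it, everything to its right is >= it.
print(bool((partitioned[:n] <= kth_value).all()))      # True
print(bool((partitioned[n + 1:] >= kth_value).all()))  # True

# Sort only the n selected items to get a deterministic order.
smallest_sorted = np.sort(partitioned[:n])
largest_sorted = np.sort(np.partition(my_array, -n)[-n:])[::-1]

print(smallest_sorted)  # [1 3 4]
print(largest_sorted)   # [9 8 7]
```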

4. Conclusion

Python offers several ways to find the largest or smallest N items in a collection: the built-in sorted function, the heap-based functions in heapq, and specialized libraries such as NumPy. The right choice depends on the size of the dataset, the type of data, and whether the result needs to be in sorted order.

Happy Learning!!
