Python NumPy 101: A Beginner’s Guide

NumPy, short for Numerical Python, is an open-source Python library. It is widely used for working with large, multi-dimensional arrays (ndarray) and matrices of numerical data, and it provides fast and efficient numerical operations on them. NumPy is commonly employed in data science and machine learning to perform mathematical operations on large datasets.

By the end of this article, we will have a good understanding of the basics of NumPy. We will learn about the installation, NumPy data types, and why NumPy is used. Finally, we will learn about NumPy arrays in detail, visualizing NumPy data with Matplotlib, and some of NumPy’s limitations.

So, without further ado, let’s get started!

1. Why use NumPy?

NumPy’s core data structure is the array (ndarray). It is similar to the usual Python list, but it can store and operate on numerical data much more efficiently.

Here are some reasons to use NumPy:

  • NumPy operations perform faster than equivalent operations on Python lists. This is because a NumPy array stores elements of a single type in one contiguous block of memory, and the operations run in compiled code.
  • NumPy provides various functions to perform common operations on whole arrays without explicit loops.
  • Code written this way is shorter, which makes it easier to read and understand.
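
As a rough illustration of the speed difference, the sketch below squares 100,000 numbers once with a list comprehension and once with a NumPy array. The names py_list, np_array, square_with_loop, and square_vectorized are made up for this example, and the exact timings will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow


def square_with_loop():
    # Pure Python: one multiplication per loop iteration
    return [x * x for x in py_list]


def square_vectorized():
    # NumPy: one expression that operates on the whole array
    return np_array * np_array


loop_time = timeit.timeit(square_with_loop, number=20)
vector_time = timeit.timeit(square_vectorized, number=20)

print(f"List comprehension: {loop_time:.4f} s")
print(f"NumPy array:        {vector_time:.4f} s")
```

On most machines the NumPy version should be noticeably faster, although the size of the gap depends on the hardware and the array size.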

2. Installing NumPy

To use NumPy, we first need to install it on the system. There are several methods, but let’s take a look at two simple methods.

2.1. Installing NumPy with ‘pip‘

‘pip‘ is Python’s package manager, which makes it easy to install Python libraries and frameworks. If we have Python version 3.4 or higher, pip is included by default. Otherwise, we will need to install pip before installing NumPy.

Now, first, launch the command prompt and type the following command:

pip install numpy

Hit Enter, and NumPy will start installing. Once it finishes, we can use NumPy in Python programs.

After the installation is finished, we have to import NumPy into a Python program. We can either use import numpy or import numpy as np.

import numpy 

# or

import numpy as np
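
To confirm that the installation worked, we can print the installed version. This is only a quick check, and the version number will differ between machines.

```python
import numpy as np

# Prints the installed NumPy version as a string
print(np.__version__)
```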

2.2. Installing NumPy with Anaconda

Another way to install NumPy is to install Anaconda. Anaconda is a Python distribution that provides us with access to different tools.

When we install Anaconda, it installs many popular libraries, including NumPy, automatically. To install Anaconda, first download the installer from its download page.

Now, launch it, but remember to check the following boxes:

Just click on the Install button and wait for the installation to complete. Once Anaconda is installed, we can use NumPy in the Windows command prompt, VS Code editor, or PowerShell prompt (one of the tools available in the Anaconda Navigator).

If we are going to use NumPy, it is a best practice to use it in a Jupyter Notebook. Jupyter Notebook is a web-based, interactive computing notebook.

To use Jupyter, open Anaconda Navigator in our system and open the Jupyter Notebook. We can see the Jupyter Notebook option in the image below.

Just click on the Launch button and the notebook will open on the localhost page, as we can see below.

We can click on the New button and select Python 3. We are now ready to use the Jupyter Notebook.

3. NumPy Data Types

NumPy provides a wider range of numeric data types than Python. The additional data types in NumPy are designed for numerical calculations. Here is the list of NumPy data types and the characters used to represent them:

3.1. i (integer)

The ‘i‘ represents signed integer types. When we do not specify a data type, the number of bits used to store an integer depends on the platform. For example, the default integer is 64 bits wide on most 64-bit systems, while on 32-bit systems (and on Windows with NumPy versions before 2.0) it is 32 bits wide.

If we execute the code below, the output may differ between machines. On most 64-bit machines the output is ‘int64‘, which indicates that the array holds 64-bit signed integers. On other machines, the output may be ‘int32‘. Here, the .dtype attribute is used to get the data type.

import numpy as np

arr = np.array([1, 2, 3])
print(arr.dtype)   # Prints 'int64' on most platforms

We can also explicitly specify the data type when creating an array. For example, the following code will create an array of 32-bit integers.

import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)
print(arr.dtype)  # Prints 'int32'
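
To see what the different integer sizes mean in practice, the sketch below uses np.iinfo() to print the range of each signed integer type and the .itemsize attribute to show how many bytes each element occupies. The variable names small and large are chosen only for this example.

```python
import numpy as np

# Range of values each signed integer type can hold
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(f"{info.dtype}: {info.bits} bits, range {info.min} to {info.max}")

small = np.array([1, 2, 3], dtype=np.int8)
large = np.array([1, 2, 3], dtype=np.int64)

# Bytes used per element
print(small.itemsize, large.itemsize)  # 1 8
```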

3.2. b (boolean)

The ‘b‘ represents Boolean values which can either be True or False. Let’s create an array of Boolean values and print its data type.

import numpy as np

arr = np.array([True, False, True])
print(arr.dtype)  # Prints 'bool'

3.3. u (unsigned integer)

The ‘u‘ represents unsigned integer data types. Unsigned integers can only store non-negative values (zero and positive numbers). As with signed integers, the size depends on the data type we choose. For example, the code below creates a NumPy array of 8-bit unsigned integers.

import numpy as np

arr = np.array([1, 2, 3], dtype=np.uint8)
print(arr.dtype)  # Prints 'uint8'

If we try to include negative values, NumPy complains. NumPy 1.24 and later issue a DeprecationWarning, and NumPy 2.0 and later raise an OverflowError.

import numpy as np

# NumPy 1.24+: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays. The conversion of -4 to uint8 will fail in the future.
# NumPy 2.0+: OverflowError: Python integer -4 out of bounds for uint8
arr = np.array([1, 2, 3, -4], dtype=np.uint8)
print(arr.dtype)
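
A related pitfall is that arithmetic on fixed-size integer arrays wraps around silently when a result does not fit. The following sketch, with example values picked for illustration, shows the valid range of uint8 and what happens when a sum exceeds it.

```python
import numpy as np

info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

a = np.array([250, 255], dtype=np.uint8)
b = np.array([10, 1], dtype=np.uint8)

# The sums 260 and 256 do not fit in 8 bits, so they wrap around modulo 256
print(a + b)  # [4 0]

# Converting to a wider type first avoids the wrap-around
print(a.astype(np.uint16) + b)  # [260 256]
```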

3.4. f (float)

The ‘f‘ represents floating point numbers. NumPy supports 16-bit, 32-bit, and 64-bit floating point numbers. When we create an array from Python floats without specifying a data type, NumPy uses float64, as the code below shows.

import numpy as np

arr = np.array([1.1, 2.2, 3.3])
print(arr.dtype)  # Prints 'float64'

We can also create floating point arrays of a specific size by specifying the data type. In the code below, we create an array of 32-bit floating point numbers.

import numpy as np

arr = np.array([1.1, 2.2, 3.3], dtype="float32")
print(arr.dtype)  # Prints 'float32'
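
The size of a floating point type determines its precision. As a small sketch, np.finfo() reports the number of bits and the approximate number of reliable decimal digits for each type. The names x32 and x64 are used only in this example.

```python
import numpy as np

for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{info.dtype}: {info.bits} bits, about {info.precision} decimal digits")

# The same decimal value is stored with different precision
x32 = np.float32(0.1)
x64 = np.float64(0.1)
print(x32 == x64)  # False, because the float32 value is rounded more coarsely
```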

3.5. c (complex float)

The ‘c‘ represents complex numbers, with the data types complex64 and complex128. Complex numbers have both real and imaginary parts. In complex64, the real and imaginary parts are each represented using a 32-bit floating point number, and in complex128, each part is represented using a 64-bit floating point number.

We can access the real and imaginary parts using the array attributes .real and .imag. Here is an example:

import numpy as np

array1 = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
array2 = np.array([1.5 + 2.5j, -3.5 - 4j], dtype=np.complex128)

print(array1)
print(array2)

print("Real part of Array 1:", array1.real)
print("Imaginary part of Array 1:", array1.imag)

The program output:

[1.+2.j 3.-4.j]
[ 1.5+2.5j -3.5-4.j ]

Real part of Array 1: [1. 3.]
Imaginary part of Array 1: [ 2. -4.]
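
Besides .real and .imag, NumPy offers functions for other common operations on complex numbers. The short sketch below uses np.abs(), np.conj(), and np.angle() on an example array named z.

```python
import numpy as np

z = np.array([3 - 4j, 1 + 2j])

print(np.abs(z))    # Magnitudes; the first one is 5.0
print(np.conj(z))   # Complex conjugates (imaginary parts change sign)
print(np.angle(z))  # Phase angles in radians
```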

3.6. m (timedelta64)

The ‘m‘ represents a time delta, which means a time duration or interval. It also allows us to perform arithmetic operations on time intervals, such as adding different time durations.

In the following code, we create four timedelta objects that represent days, hours, minutes, and seconds.

import numpy as np

time_day = np.timedelta64(4, 'D') 
time_hr = np.timedelta64(7, 'h')
time_min = np.timedelta64(60, 'm')
time_sec = np.timedelta64(120, 's')

print(time_day, time_hr, time_min, time_sec)  # Prints '4 days 7 hours 60 minutes 120 seconds'
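
The arithmetic mentioned above can be sketched as follows. When two durations with different units are added, NumPy expresses the result in the finer unit. Dividing by a one-hour timedelta is one way to convert the result to hours.

```python
import numpy as np

time_hr = np.timedelta64(7, 'h')
time_min = np.timedelta64(60, 'm')

# Adding hours and minutes gives a result in minutes
total = time_hr + time_min
print(total)  # 480 minutes

# Convert the duration to a plain number of hours
print(total / np.timedelta64(1, 'h'))  # 8.0
```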

3.7. M (datetime64)

The ‘M‘ represents date and time. When we create a datetime64 object, it holds the year, month, day, hour, minute, second, and even fractions of a second. For example, consider the following datetime:

np.datetime64('2023-08-18T12:30:45.500')

In this case, the datetime holds the date August 18th, 2023, and the time 12:30:45.500. We can then perform various operations with this datetime object, such as comparing it to other date-times, calculating time intervals, and more.

In the below code, we have calculated the duration by subtracting the start time from the end time.

import numpy as np

start = np.datetime64('2023-08-19T11:00:00')
end = np.datetime64('2023-08-21T16:40:00')

event_duration = end - start
print(event_duration) # Prints '193200 seconds'
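
Building on the same start and end values, the sketch below converts the duration to hours, compares the two datetimes, and builds a range of consecutive dates with np.arange(). The variable names hours and days are chosen for this example.

```python
import numpy as np

start = np.datetime64('2023-08-19T11:00:00')
end = np.datetime64('2023-08-21T16:40:00')

# Express the duration in hours instead of seconds
hours = (end - start) / np.timedelta64(1, 'h')
print(hours)  # about 53.67

# Datetimes can be compared directly
print(start < end)  # True

# A range of consecutive days (the stop date is not included)
days = np.arange('2023-08-19', '2023-08-22', dtype='datetime64[D]')
print(days)  # ['2023-08-19' '2023-08-20' '2023-08-21']
```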

4. Creating Arrays with NumPy

NumPy arrays are similar to Python lists, except that lists can store elements of different data types, whereas all elements in a NumPy array must have the same data type.

There are several ways to create NumPy arrays. We will explore some of these methods below.

4.1. Creating an Empty Array

An empty array allocates memory for the array elements without initializing them to any particular value. It’s crucial to recognize that the array’s elements are uninitialized and may retain previous memory values.

We can create an empty array using the np.empty() function. In the following example, the shape (3, 4) specifies that the array should have 3 rows and 4 columns.

import numpy as np

empty_array = np.empty((3, 4)) # Creates a 3x4 empty array
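
Because the contents of an empty array are arbitrary, we should assign values before reading them. A minimal sketch, using 7.0 as an arbitrary fill value:

```python
import numpy as np

empty_array = np.empty((3, 4))
print(empty_array.shape)  # (3, 4)

# The initial contents are arbitrary, so fill the array before using it
empty_array[:] = 7.0
print(empty_array)
```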

4.2. Creating N-d Arrays

NumPy provides the following built-in functions to create arrays:

  • array()
  • arange()
  • zeros()
  • ones()
  • linspace()

Let us learn how to use these methods in brief.

4.2.1. numpy.array()

The numpy.array() function creates an array from a sequence-like object, such as a list, tuple, or range. For example, the following code creates an array from a list:

import numpy as np

list1 = [1, 2, 3, 4]
array1 = np.array(list1)

print(array1) # Prints: [1 2 3 4]

We can also create multi-dimensional arrays by providing nested lists or tuples:

# Create a 2D NumPy array from a nested Python list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_2d_array = np.array(matrix)

# Create a 3D NumPy array from nested Python lists
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
my_3d_array = np.array(cube)
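
To check what was created, we can inspect the number of dimensions and the shape, and access single elements with one index per dimension. The sketch below repeats the two arrays so that it runs on its own.

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my_3d_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(my_2d_array.ndim, my_2d_array.shape)  # 2 (3, 3)
print(my_3d_array.ndim, my_3d_array.shape)  # 3 (2, 2, 2)

# Indexing uses one index per dimension
print(my_2d_array[1, 2])     # 6
print(my_3d_array[1, 0, 1])  # 6
```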

4.2.2. numpy.arange()

The numpy.arange() function creates one-dimensional arrays of evenly spaced numbers. It takes three arguments:

  • start: The starting value of the array (optional, 0 by default).
  • stop: The ending value of the array, which is not included.
  • step: The step size between elements in the array (optional, 1 by default).

# Create an array of integers from 0 up to, but not including, 10
arr1 = np.arange(10)
print(arr1)  # Output: [0 1 2 3 4 5 6 7 8 9]

# Create an array of even integers from 2 to 10 (exclusive)
arr2 = np.arange(2, 10, 2)
print(arr2)  # Output: [2 4 6 8]

# Create an array of floating-point numbers from 0.0 to 1.0 (exclusive) with a step of 0.1
arr3 = np.arange(0.0, 1.0, 0.1)
print(arr3)  # Output: [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]

We can use arange() with other NumPy functions to create multi-dimensional arrays if desired. For example, we can use reshape() to convert a one-dimensional array to a 2-D array.

# Create a one-dimensional array of 12 elements
arr1d = np.arange(12)

# Reshape it into a 3x4 two-dimensional array
arr2d = arr1d.reshape(3, 4)

print(arr2d)

# Output

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

4.2.3. numpy.zeros()

The numpy.zeros() function creates an array of zeros. Its main argument is shape (the size of the array).

array3 = np.zeros(8)

print(array3) # Output: [0. 0. 0. 0. 0. 0. 0. 0.]

To create multi-dimensional arrays, we can specify the shape by passing a tuple containing the desired dimensions.

# Create a 2D array with dimensions 3x4 filled with zeros
zeros_2d = np.zeros((3, 4))

# Create a 3D array with dimensions 2x3x2 filled with zeros
zeros_3d = np.zeros((2, 3, 2))

# Create a 4D array with dimensions 2x3x2x2 filled with zeros
zeros_4d = np.zeros((2, 3, 2, 2))

4.2.4. numpy.ones()

The numpy.ones() function creates an array of ones. Its main argument is shape (the size of the array).

array4 = np.ones(8)

print(array4) # Output: [1. 1. 1. 1. 1. 1. 1. 1.]

Similar to zeros(), we can specify the dimensions to create multi-dimensional arrays.

# Create a 2D array with dimensions 3x4 filled with ones
ones_2d = np.ones((3, 4))

# Create a 3D array with dimensions 2x3x2 filled with ones
ones_3d = np.ones((2, 3, 2))

# Create a 4D array with dimensions 2x3x2x2 filled with ones
ones_4d = np.ones((2, 3, 2, 2))
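
Both zeros() and ones() create floating point arrays by default, which is why the outputs above show "0." and "1.". They also accept an optional dtype argument, as this short sketch with example names int_zeros and bool_ones shows.

```python
import numpy as np

# The default data type is float64
print(np.zeros(3).dtype)  # float64

# The dtype argument selects another type
int_zeros = np.zeros((2, 3), dtype=int)
bool_ones = np.ones(4, dtype=bool)

print(int_zeros)
print(bool_ones)  # [ True  True  True  True]
```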

4.2.5. numpy.linspace()

The numpy.linspace() function creates an array of evenly spaced numbers over a specified interval. Its most commonly used arguments are:

  • start: The starting value of the array.
  • stop: The ending value of the array, included by default.
  • num: The number of elements in the array (50 by default).
  • endpoint: Whether to include the stop value in the array (True by default).

array5 = np.linspace(0, 1, 5)

print(array5) # Output: [0.   0.25 0.5  0.75 1.  ]

We can use reshape() to convert an array created with linspace() to an N-dimensional array.

# Create a 2D array with 4 rows and 3 columns, with values evenly spaced from 0 to 1 (inclusive)
arr_1d = np.linspace(0, 1, 12)  # 12 values
arr_nd = arr_1d.reshape(4, 3)  # Reshape into a 2D array
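
The effect of the endpoint argument is easiest to see side by side. The sketch below also uses the retstep argument, which makes linspace() return the spacing between values in addition to the array.

```python
import numpy as np

with_stop = np.linspace(0, 1, 5)
without_stop = np.linspace(0, 1, 5, endpoint=False)

print(with_stop)     # [0.   0.25 0.5  0.75 1.  ]
print(without_stop)  # [0.  0.2 0.4 0.6 0.8]

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)  # 0.25
```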

4.3. Loading Arrays from Files

NumPy provides functions such as numpy.loadtxt() and numpy.genfromtxt() for reading data from text and CSV files, and numpy.load() for reading NumPy binary files.

The numpy.load() function loads arrays from ‘.npy’ files (NumPy Binary Files), previously saved with its own binary format using numpy.save().

import numpy as np

arr = np.array([1, 2, 3, 4])
np.save('my_array.npy', arr)

array = np.load("my_array.npy")
print(array) # Output: [1 2 3 4]

The numpy.genfromtxt() and numpy.loadtxt() functions load arrays from text files or CSV files, such as those previously saved using numpy.savetxt(). The genfromtxt() function takes, among others, the following arguments:

  • fname: The name of the file to load.
  • delimiter: The string used to separate the values in the file.
  • dtype: The data type of the values in the file.
  • skip_header: The number of lines to skip at the top of the file (loadtxt() calls this skiprows).
  • comments: A character to identify comment lines in the file.

import numpy as np

arr = np.array([1, 2, 3, 4])
np.savetxt('my_array.txt', arr)

array = np.genfromtxt("my_array.txt", delimiter=",")
print(array) # Output: [1. 2. 3. 4.]
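
For a two-dimensional example, the sketch below writes a small table to a CSV file with a header line and reads it back using the delimiter and skip_header arguments. The file name table.csv and the column names x and y are made up for this example, and a temporary directory is used so no files are left behind.

```python
import os
import tempfile

import numpy as np

table = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "table.csv")

    # comments="" writes the header as "x,y" without a leading "# "
    np.savetxt(csv_path, table, delimiter=",", header="x,y", comments="")

    # Skip the header line and split each row on commas
    loaded = np.genfromtxt(csv_path, delimiter=",", skip_header=1)

print(loaded.shape)  # (3, 2)
print(loaded)
```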

5. Constants and Attributes

Constants are predefined values that can be used without having to define them first. Attributes are properties that can be accessed using the dot notation.

5.1. Constants

NumPy provides several important mathematical constants that can be accessed for various calculations. Here are some of the most commonly used constants:

Constant    Description
np.pi       The mathematical constant pi (π), which is approximately equal to 3.141592653589793
np.e        The mathematical constant e, which is approximately equal to 2.718281828459045
np.inf      Positive infinity
np.nan      Not a Number (NaN)

5.2. Attributes

NumPy arrays have several attributes that provide information about the array’s properties. Here are some of the most important attributes:

Attribute   Description
.shape      Tuple indicating the dimensions of the array
.dtype      Data type of the array elements
.size       Total number of elements in the array
.ndim       Number of dimensions (axes) of the array

Here’s the code:

import numpy as np

print(np.pi)  # 3.141592653589793

print(np.e)  # 2.718281828459045

print(np.inf)  # inf

print(np.nan)  # nan

array = np.array([1, 2, 3])
print(array.dtype)  # int64 on most platforms (int32 on some)

print(array.shape)  # (3,)

print(array.size)  # 3

print(array.ndim)  # 1
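
The same attributes are more informative on a two-dimensional array. The sketch below also shows a typical use of np.pi and two properties of np.nan and np.inf that often surprise beginners. The names matrix and radius are chosen for this example.

```python
import numpy as np

matrix = np.array([[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]])
print(matrix.shape, matrix.size, matrix.ndim, matrix.dtype)  # (2, 3) 6 2 float64

# Area of a circle using np.pi
radius = 2.0
print(np.pi * radius ** 2)

# NaN is not equal to anything, including itself, so use np.isnan() to test for it
print(np.nan == np.nan)  # False
print(np.isnan(np.nan))  # True

# Infinity is larger than any finite float
print(np.inf > 1e308)  # True
```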

6. Working with NumPy Arrays

We can perform many operations on NumPy arrays: arithmetic such as addition, subtraction, and multiplication, statistical functions, comparisons, matrix operations, set operations, and much more.

6.1. Add, Subtract, Multiply and Divide Arrays

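We can perform arithmetic operations on NumPy arrays using the standard arithmetic operators (+, -, *, /). The operations are applied element by element. Addition, subtraction, and multiplication of integer arrays produce integer arrays, while division (/) always produces a floating point array.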

import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

add_res = array1 + array2
subtract_res = array1 - array2
multiply_res = array1 * array2
divide_res = array1 / array2

print(add_res, subtract_res, multiply_res, divide_res)

Here’s the output:

[5 7 9] [-3 -3 -3] [ 4 10 18] [0.25 0.4  0.5 ]
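
The result types can be checked directly. The sketch below also shows that an operation between an array and a single number is applied to every element, and that // performs integer (floor) division.

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

print((array1 + array2).dtype)  # An integer type, same as the inputs
print((array1 / array2).dtype)  # float64

# A single number is applied to every element
print(array1 * 10)  # [10 20 30]
print(array1 ** 2)  # [1 4 9]

# Floor division keeps the integer type
print(array2 // array1)  # [4 2 2]
```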

6.2. Statistical Functions

NumPy has a bunch of statistical functions that can summarize data. Here are some of them:

  • mean(): Returns the mean of an array of values.
  • median(): Returns the middle value in an array when it is sorted in increasing or decreasing order.
  • min(): Returns the smallest value in an array.
  • max(): Returns the largest value in an array.
  • std(): Returns the standard deviation of the values in an array.
  • var(): Returns the variance of the values in an array.

import numpy as np

data = np.array([11, 13, 17, 20, 23])

mini = np.min(data)
maxi = np.max(data)
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)

print(mini)
print(maxi)
print(mean)
print(median)
print(std_dev)
print(variance)

Here’s the output:

11
23   
16.8 
17.0 
4.4  
19.36
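
On multi-dimensional arrays, these functions accept an axis argument that controls whether the statistic is computed over the whole array, per column, or per row. A small sketch with a made-up 2x3 array named scores:

```python
import numpy as np

scores = np.array([[11, 13, 17],
                   [20, 23, 26]])

print(np.mean(scores))          # Mean of all six values
print(np.mean(scores, axis=0))  # Mean of each column: [15.5 18.  21.5]
print(np.mean(scores, axis=1))  # Mean of each row
print(np.max(scores, axis=1))   # Largest value in each row: [17 26]
```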

6.3. Comparing Two Arrays

We can use the comparison operators (==, !=, <, >, <=, >=) to compare two NumPy arrays element-wise. The results of these comparisons will be Boolean arrays.

import numpy as np

arr1 = np.array([2, 3, 4])
arr2 = np.array([4, 3, 2])

more_than = arr1 > arr2
equal_to = arr1 == arr2

print(more_than)
print(equal_to)

Here’s the result:

[False False  True]
[False  True False]
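
Boolean arrays like these are often used as masks to select elements. The sketch below also shows np.array_equal(), np.any(), and np.all(), which reduce a comparison to a single True or False.

```python
import numpy as np

arr1 = np.array([2, 3, 4])
arr2 = np.array([4, 3, 2])

# Keep only the elements of arr1 that are at least as large as their counterpart in arr2
mask = arr1 >= arr2
print(arr1[mask])  # [3 4]

# Are the two arrays identical?
print(np.array_equal(arr1, arr2))  # False

# Is at least one pair equal? Are all pairs equal?
print(np.any(arr1 == arr2), np.all(arr1 == arr2))  # True False
```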

6.4. Manipulating Strings Stored in Arrays

NumPy provides a set of string functions in the np.char module that can be used to manipulate strings stored in arrays. Some functions include:

  • lower(): Converts all the characters in each string to lowercase.
  • upper(): Converts all the characters in each string to uppercase.
  • str_len(): Returns the length of each string.

import numpy as np

names = np.array(['Alice', 'Bob', 'Charlie'])

uppercase = np.char.upper(names)
lowercase = np.char.lower(names)
length = np.char.str_len(names)

print(uppercase)
print(lowercase)
print(length)

Here’s the result:

['ALICE' 'BOB' 'CHARLIE']
['alice' 'bob' 'charlie']
[5 3 7]

6.5. Matrix Operations

NumPy has lots of functions for working with matrices stored as two-dimensional arrays. Some of these are:

  • dot(): calculates the dot product of two arrays (the matrix product for 2-D arrays).
  • transpose() or the .T attribute: transposes a matrix.
  • linalg.inv(): calculates a matrix’s inverse.

import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

result_dot = np.dot(matrix1, matrix2)

print("Dot Product:")
print(result_dot)

transpose_matrix = matrix1.T

print("Transposed Matrix:")
print(transpose_matrix)

inverse_matrix = np.linalg.inv(matrix1)
print("Inverse Matrix:")
print(inverse_matrix)

Here’s the result:

Dot Product:
[[19 22]
 [43 50]]
Transposed Matrix:
[[1 3]
 [2 4]]
Inverse Matrix:   
[[-2.   1. ]      
 [ 1.5 -0.5]] 
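
For two-dimensional arrays, the @ operator gives the same matrix product as np.dot(). The sketch below uses it to check the inverse: multiplying a matrix by its inverse should give the identity matrix, up to floating point rounding, which is why np.allclose() is used for the comparison.

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# The @ operator performs matrix multiplication
print(matrix1 @ matrix2)

# A matrix times its inverse is (approximately) the identity matrix
inverse_matrix = np.linalg.inv(matrix1)
identity = matrix1 @ inverse_matrix
print(np.allclose(identity, np.eye(2)))  # True

# A matrix has an inverse only if its determinant is non-zero
print(np.linalg.det(matrix1))  # approximately -2.0
```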

6.6. Set Operations

We can do set operations on one-dimensional arrays as well. These include:

  • union1d(): returns the sorted union of two arrays.
  • intersect1d(): returns the sorted values that are common to both arrays.
  • setdiff1d(): returns the values in the first array that are not in the second.

import numpy as np

set1 = np.array([1, 2, 3, 4])
set2 = np.array([3, 4, 5, 6])

union = np.union1d(set1, set2)
intersection = np.intersect1d(set1, set2)
difference = np.setdiff1d(set1, set2)

print(union)
print(intersection)
print(difference)

Here’s the result:

[1 2 3 4 5 6]
[3 4]
[1 2]
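
Unlike Python sets, arrays may contain duplicates. The set functions return sorted, unique values, and two related functions, np.unique() and np.isin(), are handy alongside them. The arrays below are made up to include duplicates.

```python
import numpy as np

set1 = np.array([3, 1, 2, 3, 1])
set2 = np.array([3, 3, 5])

# Duplicates are removed and the values are sorted
print(np.unique(set1))         # [1 2 3]
print(np.union1d(set1, set2))  # [1 2 3 5]

# Test each element of set1 for membership in set2
print(np.isin(set1, set2))  # [ True False False  True False]
```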

6.7. Vectorization

Vectorization means applying an operation to a whole array at once instead of looping over its elements in Python. Consider the following example, where we apply the square root and sine functions to each element of the array.

import numpy as np

arr = np.array([4, 9, 16, 25])

result_sqrt = np.sqrt(arr)
result_sin = np.sin(arr)

print("\nElement-wise Square Root:")
print(result_sqrt)
print("\nElement-wise Sine:")
print(result_sin)

Here’s the result:

Element-wise Square Root:
[2. 3. 4. 5.]

Element-wise Sine:
[-0.7568025   0.41211849 -0.28790332 -0.13235175]
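
To make the idea concrete, the sketch below computes the square roots once with an explicit Python loop and once with the vectorized np.sqrt(), and checks that both give the same result. The last line shows that a whole formula can be vectorized in the same way.

```python
import math

import numpy as np

arr = np.array([4, 9, 16, 25])

# Explicit loop: one call to math.sqrt() per element
looped = np.array([math.sqrt(x) for x in arr])

# Vectorized: a single call for the whole array
vectorized = np.sqrt(arr)

print(np.array_equal(looped, vectorized))  # True

# Arithmetic expressions are applied to every element as well
print(arr ** 2 + 2 * arr + 1)  # [ 25 100 289 676]
```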

7. Error Handling

Python and NumPy provide several tools that can be used to handle errors, including:

  • try/except: a Python statement that handles errors that occur in the code.
  • assert: a Python statement that checks a condition that should always be true.
  • isnan(): a NumPy function that checks if a value is NaN (Not a Number).

import numpy as np

# Assert Statement
x = np.array([1, 2, 3])
y = np.array([1, 2, 4])

try:
    assert len(x) == len(y), "Arrays must have the same length"
except AssertionError as e:
    print("Assertion Error:", e)
else:
    print("Lengths are the same")

# isnan() Function
z = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
is_nan = np.isnan(z)

print("\nOriginal Array:")
print(z)
print("\nNaN Mask:")
print(is_nan)

Here’s the result:

Lengths are the same

Original Array:
[ 1. nan  3. nan  5.]

NaN Mask:
[False  True False  True False]
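
NumPy also has its own error-handling tools. By default, a floating point division by zero only produces a warning, but the np.errstate() context manager can turn it into an exception that try/except can catch. The sketch below also shows np.nanmean(), which ignores NaN values. The names numerator, denominator, and caught are chosen for this example.

```python
import numpy as np

numerator = np.array([1.0, 2.0, 3.0])
denominator = np.array([1.0, 0.0, 3.0])

caught = False
try:
    # Raise an exception instead of a warning on division by zero
    with np.errstate(divide="raise"):
        result = numerator / denominator
except FloatingPointError as e:
    caught = True
    print("Floating point error:", e)

z = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# A single NaN makes the ordinary mean NaN
print(np.mean(z))  # nan

# nanmean() skips the NaN values
print(np.nanmean(z))  # 3.0

# The NaN mask can be inverted to keep only the valid values
print(z[~np.isnan(z)])  # [1. 3. 5.]
```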

8. Data Visualization

Next, we will dive into visualizing NumPy data.

First, we need to install Matplotlib (pip install matplotlib) and import it. We should also open a Jupyter notebook so we can run all the code and see the actual visualization. NumPy itself does not draw plots, but Matplotlib works directly with NumPy arrays and has a number of cool ways to show data visually, such as line plots, scatter plots, bar graphs, and histograms. Visualizing data helps us quickly understand large data sets.

8.1. Line Plot

A line plot displays data as a series of points connected by a line. Matplotlib’s plot() function draws a line plot, and here we pass it two arguments: the x-coordinates and the y-coordinates.

Let’s see an example.

import numpy as np
import matplotlib.pyplot as plt

fruit = np.array(["Apple", "Banana", "Orange", "Grapes", "Mango", "Strawberry"])
weight = np.array([150, 120, 180, 85, 200, 50])

plt.plot(fruit, weight)
plt.show()

Here’s the result:

In this case, we’ve used plot() to plot the data. The x and y coordinates are set according to the fruit and weight arrays.

8.2. Scatter Plot

The scatter plot displays data as a collection of points. Use the scatter() function to plot the data.

import numpy as np
import matplotlib.pyplot as plt

fruit = np.array(["Apple", "Banana", "Orange", "Grapes", "Mango", "Strawberry"])
weight = np.array([150, 120, 180, 85, 200, 50])

plt.scatter(fruit, weight)
plt.show()

Here’s the result:

So, in this example, we used the scatter() function to plot the data points we had. We just passed the fruit and weight as the x and y coordinates. But, in a scatter plot, we can also pass the c and s arguments to set the color and size of the points. For example,

import numpy as np
import matplotlib.pyplot as plt

fruit = np.array(["Apple", "Banana", "Orange", "Grapes", "Mango", "Strawberry"])
weight = np.array([150, 120, 180, 85, 200, 50])


colors = np.array([1, 2, 3, 4, 5, 6])
sizes = np.array([21, 41, 61, 81, 101, 121])

plt.scatter(fruit, weight, c=colors, s=sizes)
plt.show()

Here’s the result:

8.3. Bar Graph

A bar graph shows data as rectangular bars whose heights represent the values. Matplotlib has a bar() function that we can use to plot data in a bar graph.

For example,

import numpy as np
import matplotlib.pyplot as plt

fruit = np.array(["Apple", "Banana", "Orange", "Grapes", "Mango", "Strawberry"])
weight = np.array([150, 120, 180, 85, 200, 50])

plt.bar(fruit, weight)
plt.title('Bar Graph')
plt.show()

Here’s the result:

Here, we have used the bar() function to plot the bar graph and passed two arrays, fruit and weight, as its arguments.

8.4. Histogram

Matplotlib uses hist() to create histograms. Here’s an example.

import numpy as np
import matplotlib.pyplot as plt

weight = np.array([0.6, 1.8, 2.2, 2.5])

plt.hist(weight)
plt.show()

Here’s the result:
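
The counts that a histogram displays can also be computed with NumPy alone, using np.histogram(). The sketch below uses the same weight array with bin edges at 0, 1, 2, and 3, which are chosen only for this example. Matplotlib’s hist() picks ten equal-width bins by default, so its bars will look different.

```python
import numpy as np

weight = np.array([0.6, 1.8, 2.2, 2.5])

# Count how many values fall into the bins [0, 1), [1, 2) and [2, 3]
counts, edges = np.histogram(weight, bins=[0, 1, 2, 3])

print(counts)  # [1 1 2]
print(edges)   # [0 1 2 3]
```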

9. Advantages of NumPy

Let’s discuss some of the great advantages of NumPy:

  • NumPy arrays use less memory. For numerical data, NumPy’s arrays are more compact than Python lists.
  • The speed is also great. NumPy arrays perform computations faster than Python lists. For linear algebra operations, NumPy relies on optimized BLAS (Basic Linear Algebra Subprograms) and LAPACK libraries.
  • In comparison with Python lists, NumPy is more efficient and faster at performing mathematical calculations on arrays and matrices.
  • It is open source and all features can be accessed for free.
  • NumPy provides various functions, methods, and attributes that simplify computations on arrays and matrices.
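
The memory claim can be checked with a rough measurement. The sketch below compares a list of 1,000 integers with an equivalent int64 array. The list total adds the size of the list object to the size of every integer object it refers to, and the exact number depends on the Python version.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores references, and every integer is a separate Python object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values back to back: 8 bytes per int64
array_bytes = np_array.nbytes

print("List: ", list_bytes, "bytes")
print("Array:", array_bytes, "bytes")
```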

10. Limitations of NumPy

Apart from advantages, there are also some limitations.

  • NumPy can have a steep learning curve, especially for beginners who are not familiar with array programming concepts.
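  • NumPy arrays are not always the most memory-friendly choice. Every array carries a fixed overhead for metadata such as its shape and data type, which can outweigh the savings for very small arrays, and many operations create temporary copies of the data. This can lead to memory problems, especially in systems with limited memory.
  • NumPy has no general-purpose way to represent missing values. NaN works only for floating point data, so integer and Boolean arrays cannot hold missing values directly. This can be a problem for data analysis tasks requiring missing values handling.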
  • NumPy arrays require all elements to be of the same data type. This can limit their use for handling data structures that contain different types of data.
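
The last two limitations can be seen in a few lines. In this sketch, adding NaN to a list of integers turns the whole array into floats, and mixing numbers with text turns every element into a string unless we ask for the generic object type, which gives up most of NumPy’s speed. The variable names are chosen for this example.

```python
import numpy as np

# NaN is a float, so the integers are converted to floats as well
with_nan = np.array([1, 2, np.nan])
print(with_nan.dtype)  # float64

# Mixed numbers and text: everything becomes a string
mixed = np.array([1, "two", 3.0])
print(mixed.dtype.kind)  # U (Unicode string)

# dtype=object keeps the original Python objects
as_objects = np.array([1, "two", 3.0], dtype=object)
print(as_objects.dtype)  # object
```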

11. Conclusion

In this Python tutorial, we have discussed how to get started with NumPy. We took a look at the data types of NumPy and how to visualize NumPy data with Matplotlib, followed by the advantages and limitations of NumPy.

Happy Learning!
