Dive into Data Science: A Beginner’s Guide to NumPy in Python

Angel B

NumPy (Numerical Python) is an essential library for anyone working with data in Python. Whether you’re dealing with scientific computing, machine learning, or just need to manipulate large datasets, NumPy’s powerful array operations and mathematical functions can make your job a lot easier. This guide covers the basics to get you started with NumPy and help you harness its full potential.

Creating NumPy Arrays: The Foundation of Data Manipulation

Empty Array:

import numpy as np
arr1 = np.array([]) # Creates an empty array
print(arr1, type(arr1)) # Output: [] <class 'numpy.ndarray'>

Array from a List:

list1 = [2, 4, 7, 1, 3]
arr2 = np.array(list1) # Converts a Python list to a NumPy array
print(arr2, type(arr2), arr2.dtype) # Output: [2 4 7 1 3] <class 'numpy.ndarray'> int64 (the default integer dtype can differ by platform)

Understanding NumPy Array Properties

Knowing the properties of arrays helps you efficiently manipulate them. Here are a few important attributes:

  • Size: Total number of elements in the array.
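  • Dimensions (ndim): Number of dimensions, or axes (1 for a 1D array, 2 for a 2D array, and so on).
  • Shape: Tuple giving the length of the array along each dimension, for example (rows, columns) for a 2D array.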

Example:

arr2.size  # Output: 5
arr2.ndim # Output: 1 (since it's a 1D array)
arr2.shape # Output: (5,)

2D Array:

nested_list = [[1, 5], [2, 6], [5, 7]]
arr3 = np.array(nested_list)
print(arr3, arr3.ndim, arr3.shape, arr3.size) # Prints the array, then 2 (3, 2) 6

3D Array:

nested_list2 = [[[2, 4, 5], [3, 6, 1]], [[7, 14, 25], [13, 26, 11]]]
arr4 = np.array(nested_list2)
print(arr4, arr4.ndim, arr4.shape, arr4.size) # Prints the array, then 3 (2, 2, 3) 12

Array Creation Functions: Efficient Data Generation

Using arange():

a1 = np.arange(10)  # Creates an array from 0 to 9
a2 = np.arange(5, 24) # Creates an array from 5 to 23
a3 = np.arange(2, 20, 2) # Creates an array from 2 to 18 with a step size of 2 (the stop value 20 is excluded)

Using linspace():

a4 = np.linspace(1, 3, 50)  # 50 evenly spaced elements from 1 to 3, both endpoints included
a5, step = np.linspace(23, 100, 20, retstep=True) # retstep=True returns a tuple: the 20 elements and the step size
print(a5, step)
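
As a rough comparison of arange() and linspace(), the sketch below builds the same values both ways. arange() takes a step and excludes the stop value, while linspace() takes a count and includes the stop value by default. The variable names are illustrative only.

```python
import numpy as np

by_step = np.arange(0, 10, 2)     # step of 2, stop value 10 is excluded
by_count = np.linspace(0, 8, 5)   # 5 points, stop value 8 is included

print(by_step)    # [0 2 4 6 8]
print(by_count)   # [0. 2. 4. 6. 8.]

# arange() with integer arguments gives integers; linspace() gives floats.
print(by_step.dtype.kind, by_count.dtype.kind)   # i f
```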

Zeros and Ones:

a6 = np.zeros(3, dtype=int)  # 1D array with zeros
a7 = np.zeros((3, 4)) # 2D array of zeros
a8 = np.ones(5, dtype=int) # 1D array with ones
a9 = np.ones((2, 2, 5)) # 3D array of ones

Identity Matrix:

a10 = np.eye(5, dtype=int)  # Identity matrix of size 5x5
a11 = np.eye(3) # Identity matrix of size 3x3

Reshaping Arrays: Tailoring Data Structures to Fit Your Needs

NumPy provides powerful ways to change the shape of your data without changing its contents. Note that when reshaping, the total number of elements must remain constant.

arr3_reshape = arr3.reshape((2, 3))  # Reshape the 3x2 array into a 2x3 array
arr2_reshaped = arr2.reshape((5, 1)) # Convert a 1D array to a 5x1 array
arr4_reshaped = arr4.reshape((1, 3, 4)) # Reshape the 2x2x3 array into a 1x3x4 array
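
The rule that the element count must stay constant can be checked with a small sketch. The array below is made up for illustration; it also shows that passing -1 for one dimension asks NumPy to work out that length itself.

```python
import numpy as np

data = np.arange(12)           # 12 elements: 0..11

grid = data.reshape((3, 4))    # 3 * 4 == 12, so this works
auto = data.reshape((2, -1))   # -1 lets NumPy infer the missing length (6)

print(grid.shape)   # (3, 4)
print(auto.shape)   # (2, 6)

# A shape whose product is not 12 is rejected with a ValueError.
try:
    data.reshape((5, 3))
except ValueError as err:
    print("reshape failed:", err)
```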

Broadcasting: Making Element-Wise Operations Efficient

Broadcasting is a powerful feature in NumPy that allows you to perform element-wise operations on arrays of different shapes. NumPy treats the smaller array as if it were stretched to match the shape of the larger one, without actually copying its data, which keeps these operations fast and memory efficient.

a = np.array([[1, 2, 3], [4, 5, 6]]) # Shape (2, 3)
b = np.array([10, 20, 30]) # Shape (3,)
result = a + b # Broadcasting happens here: b is added to every row of a
print(result) # Output rows: [11 22 33] and [14 25 36]
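
To see shapes that genuinely differ being combined, here is a minimal sketch with made-up arrays: a column of shape (3, 1) plus a row of shape (3,) gives a (3, 3) grid. It also shows that shapes which cannot be aligned raise an error rather than being stretched.

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])              # shape (3,)

grid = column + row                    # both are broadcast to shape (3, 3)
print(grid.shape)   # (3, 3)
print(grid)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Lengths 3 and 2 are incompatible, so this raises a ValueError.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```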

Random Array Generation: Introducing Randomness to Your Data

Sometimes, you need random data for testing or simulations. NumPy makes it easy to generate random arrays:

Using rand():

Generates random values drawn uniformly from the interval [0, 1), so 1 itself is never returned.

a12 = np.random.rand(10)  # 10 random values in [0, 1)
sorted_a12 = np.sort(a12) # Sort the array in ascending order

Using randn():

Generates random values from a standard normal distribution (mean = 0, std = 1).

a13 = np.random.randn(20)  # 20 random values from a standard normal distribution

Using randint():

Generates random integers in a specified range. The lower bound is included and the upper bound is excluded.

a15 = np.random.randint(1, 100, 8)  # 8 random integers from 1 to 99 (100 is excluded)
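
The ranges of these three functions can be checked with a quick sketch. The seed value and the sample size of 1000 are arbitrary choices for illustration; the seed only makes repeated runs produce the same numbers.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so repeated runs give the same values

uniform = np.random.rand(1000)               # uniform on [0, 1)
normal = np.random.randn(1000)               # standard normal
integers = np.random.randint(1, 100, 1000)   # integers from 1 to 99

print(uniform.min() >= 0, uniform.max() < 1)       # True True
print(integers.min() >= 1, integers.max() <= 99)   # True True
print(normal.mean(), normal.std())   # close to 0 and 1, but not exact
```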

Indexing and Slicing: Extracting Data with Precision

1D Array:

arr = np.arange(5, 100, 5)
print(arr[8]) # Access the element at index 8 (the 9th element, since indexing starts at 0): 45
print(arr[4:18:2]) # Extract elements from index 4 up to (but not including) index 18 with a step of 2

Filtering:

arr2 = np.random.randint(1, 100, 20)
filter_array = arr2[arr2 < 50] # Extract elements less than 50

Using where():

indices = np.where(arr2 > 50)  # Tuple of index arrays (one per dimension) where values are greater than 50
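
As a small sketch of what where() returns, the example below uses a fixed, made-up array instead of random data so the results are predictable. It also shows the three-argument form, which chooses between two values element by element.

```python
import numpy as np

values = np.array([12, 75, 40, 88, 51])

(idx,) = np.where(values > 50)   # a tuple with one index array per dimension
print(idx)           # [1 3 4]
print(values[idx])   # [75 88 51]

# With three arguments, where() takes the second value where the
# condition is true and the third value where it is false.
capped = np.where(values > 50, 50, values)
print(capped)        # [12 50 40 50 50]
```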

2D Array Indexing:

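arr5 = np.arange(1, 13) # arr5 is not defined earlier in this guide; any 12-element array works, this one is an assumption
new_arr = arr5.reshape(3, 4)
print(new_arr[1, 3]) # Access element at row index 1, column index 3
print(new_arr[:, 0:2]) # Slice the first two columns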

3D Array Indexing:

num_3d = np.array(nested_list2)
print(num_3d[1, 1, 1]) # Access element at depth 1, row 1, column 1

Fancy Indexing: Accessing Multiple Elements Simultaneously

Fancy indexing allows you to select multiple elements or rows/columns in one go.

reshape_arr = np.arange(1, 21).reshape(5, 4)
print(reshape_arr[[3, 1, 4]]) # Fetch rows 3, 1, and 4
print(reshape_arr[:, [3, 0, 1]]) # Fetch columns 3, 0, and 1
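
Fancy indexing can also pick individual elements when a list of row indices is paired with a list of column indices. The sketch below rebuilds the same 5x4 array under the illustrative name table, and also shows that the result is a copy rather than a view.

```python
import numpy as np

table = np.arange(1, 21).reshape(5, 4)

# Index pairs are matched up position by position:
# (row 0, col 1), (row 2, col 3), (row 4, col 0).
picked = table[[0, 2, 4], [1, 3, 0]]
print(picked)        # [ 2 12 17]

# Fancy indexing returns a copy, so changing it leaves table untouched.
picked[0] = -1
print(table[0, 1])   # 2
```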

Array Operations: Arithmetic and Mathematical Functions

NumPy supports element-wise operations that can be applied directly to arrays. These operations include addition, subtraction, multiplication, and division.

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # Output: [5 7 9]
print(a * b) # Output: [ 4 10 18]

You can also perform operations on the entire array, like sum(), mean(), std(), min(), max(), etc.

print(np.sum(a))  # Sum of all elements in array a
print(np.mean(a)) # Mean of array a
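
For reference, here is a short sketch of the reductions named above applied to the same three-element array. Note that np.std() computes the population standard deviation by default.

```python
import numpy as np

a = np.array([1, 2, 3])

print(np.sum(a))              # 6
print(np.mean(a))             # 2.0
print(np.min(a), np.max(a))   # 1 3
print(np.std(a))              # about 0.816 (population standard deviation)

# The same reductions are also available as array methods.
print(a.sum(), a.mean())      # 6 2.0
```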

Aggregating Functions and Axis Operations

Understanding how NumPy aggregates data along different axes (rows, columns, etc.) is crucial.

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr, axis=0)) # Sum of each column: [5 7 9]
print(np.sum(arr, axis=1)) # Sum of each row: [ 6 15]
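
One way to remember the axis argument is that the named axis is the one that disappears from the result. The sketch below checks this on the same 2x3 array; keepdims=True is an optional flag that keeps the collapsed axis with length 1.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_sums = np.sum(arr, axis=0)   # collapses axis 0 (rows), leaving shape (3,)
row_sums = np.sum(arr, axis=1)   # collapses axis 1 (columns), leaving shape (2,)
row_sums_2d = np.sum(arr, axis=1, keepdims=True)   # shape (2, 1)

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)   # (3,) (2,) (2, 1)
print(np.mean(arr, axis=0))   # [2.5 3.5 4.5]
print(np.max(arr, axis=1))    # [3 6]
```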

Array Manipulation: Stacking, Splitting, and Concatenation

Concatenation:

You can combine arrays using np.concatenate().

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(np.concatenate((arr1, arr2))) # Output: [1 2 3 4 5 6]

Stacking:

Use np.vstack() (vertical stack) or np.hstack() (horizontal stack).

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(np.vstack((arr1, arr2))) # Vertical stack
print(np.hstack((arr1, arr2))) # Horizontal stack

Splitting:

Use np.split() to split an array into multiple sub-arrays.

arr = np.array([1, 2, 3, 4, 5, 6])
print(np.split(arr, 3)) # Split into 3 sub-arrays
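
np.split() with an integer requires the array to divide evenly. As a sketch of the other options, the example below splits at an explicit index and uses np.array_split(), which allows unequal parts.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

left, right = np.split(arr, [2])   # a list means "split before these indices"
print(left, right)   # [1 2] [3 4 5 6]

# 6 elements cannot be divided into 4 equal parts, so this raises ValueError.
try:
    np.split(arr, 4)
except ValueError as err:
    print("split failed:", err)

parts = np.array_split(arr, 4)     # unequal parts are allowed here
print([p.tolist() for p in parts])   # [[1, 2], [3, 4], [5], [6]]
```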

Linear Algebra with NumPy

NumPy provides powerful tools for linear algebra, including dot products, matrix multiplication, and eigenvalues/eigenvectors.

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b)) # Matrix multiplication (for 2D arrays this is the same as a @ b)

For matrix inversion, determinant, etc.:

print(np.linalg.inv(a))  # Inverse of matrix
print(np.linalg.det(a)) # Determinant of matrix
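
Eigenvalues and eigenvectors are mentioned above but not shown, so here is a minimal sketch using np.linalg.eig() on the same 2x2 matrix. Because floating-point results are rarely exact, the checks use np.allclose() and np.isclose() instead of ==.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# A matrix times its inverse should give the identity matrix.
inv_a = np.linalg.inv(a)
print(np.allclose(a @ inv_a, np.eye(2)))   # True

# Each column v of eigenvectors satisfies a @ v == eigenvalue * v.
eigenvalues, eigenvectors = np.linalg.eig(a)
for i in range(2):
    v = eigenvectors[:, i]
    print(np.allclose(a @ v, eigenvalues[i] * v))   # True

# The product of the eigenvalues equals the determinant (up to rounding).
print(np.isclose(np.prod(eigenvalues), np.linalg.det(a)))   # True
```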

Advanced Indexing: Boolean Indexing

In addition to fancy indexing, you can use Boolean arrays to index your data.

arr = np.array([1, 2, 3, 4, 5])
mask = arr > 3 # Boolean mask where condition is true
print(arr[mask]) # Output: [4 5]
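
Masks can be combined and also used for assignment. The sketch below uses the same five-element array; note that conditions are joined with & and | (with parentheses around each condition), not with Python's and/or keywords.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

both = arr[(arr > 1) & (arr < 5)]     # element-wise AND
either = arr[(arr < 2) | (arr > 4)]   # element-wise OR
print(both)      # [2 3 4]
print(either)    # [1 5]

# A mask on the left-hand side assigns to the selected elements.
clipped = arr.copy()
clipped[clipped > 3] = 3
print(clipped)   # [1 2 3 3 3]
```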

Memory Layout and Performance Optimization

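  • NumPy stores the elements of an array, all of one dtype, in a single block of memory that is usually contiguous. This is much more compact and faster to process than a Python list of separate objects. Choosing a smaller dtype reduces memory use, and reshape() usually returns a view of the same data rather than a copy, so changing an array's shape is cheap.
  • Basic slices are views that share memory with the original array, so modifying a slice also modifies the original. Use np.copy() (or the array's copy() method) when you need an independent array. Fancy and Boolean indexing return copies.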
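
The difference between a slice and a copy, and the effect of dtype on memory, can be seen in a short sketch. The array names and sizes are illustrative.

```python
import numpy as np

original = np.arange(6)

view = original[:3]          # basic slice: shares memory with original
view[0] = 99
print(original)              # [99  1  2  3  4  5]

independent = original[:3].copy()   # explicit copy: separate memory
independent[1] = -1
print(original[1])           # 1 (unchanged)

print(np.shares_memory(original, view))          # True
print(np.shares_memory(original, independent))   # False

# A smaller dtype uses fewer bytes per element.
big = np.zeros(1000, dtype=np.float64)
small = big.astype(np.float32)
print(big.nbytes, small.nbytes)   # 8000 4000
```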

NumPy is an incredibly powerful tool for data manipulation in Python. With its rich set of functions and array-handling capabilities, it’s an essential library for anyone working in scientific computing, machine learning, or data analysis. This guide provides a strong foundation, but the true potential of NumPy can only be realized through practice. Keep exploring, and you’ll soon master its vast array of features!
