Python NumPy Basics

Introduction

NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It's an essential library to understand before diving into PyTorch, as PyTorch's tensor operations are heavily inspired by NumPy's array operations. NumPy provides support for large, multi-dimensional arrays and matrices, along with a vast collection of high-level mathematical functions to operate on these arrays efficiently.

In this tutorial, we'll explore the basic concepts of NumPy that will serve as a foundation for your PyTorch journey.

Why NumPy?

Before we dive into NumPy's functionality, let's understand why it's so important:

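Performance: NumPy operations are executed in pre-compiled C code, making them much faster than equivalent pure-Python loops.
Memory Efficiency: NumPy arrays store elements of a single type in one contiguous block of memory, so they are more compact than Python lists, which hold references to separate objects.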
Convenience: NumPy provides powerful tools for array manipulation and mathematical operations.
Foundation for PyTorch: PyTorch's tensor operations mimic NumPy's array operations, making the transition seamless.
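
The performance and memory points above can be checked with a rough sketch. The array size and variable names are arbitrary choices for illustration, and timings depend on the machine, so no specific numbers are shown:

```python
import sys
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The same computation two ways: sum of squares
loop_result = sum(x * x for x in py_list)
vec_result = int(np.sum(np_arr * np_arr))

# Rough timings (machine-dependent)
loop_time = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
vec_time = timeit.timeit(lambda: np.sum(np_arr * np_arr), number=10)
print(f"Python loop: {loop_time:.4f}s, NumPy: {vec_time:.4f}s")

# Memory: the list holds references to separate int objects,
# while the array holds raw 8-byte values in one block
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"List: ~{list_bytes} bytes, array data: {np_arr.nbytes} bytes")
```

Both approaches give the same result; only the speed and the memory footprint differ.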

Getting Started with NumPy

Installation

If you haven't installed NumPy yet, you can install it using pip:

pip install numpy

Importing NumPy

To use NumPy in your Python code, import it like this:

import numpy as np  # 'np' is the conventional alias for NumPy

NumPy Arrays

NumPy's main object is the ndarray (N-dimensional array), which is a fast, flexible container for large datasets. Let's learn how to create and manipulate arrays.

Creating NumPy Arrays

There are several ways to create NumPy arrays:

From Python Lists

import numpy as np

# Create a 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d)
# Output: [1 2 3 4 5]

# Create a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d)
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

Using NumPy Functions

# Create an array of zeros
zeros_arr = np.zeros((3, 4))
print(zeros_arr)
# Output:
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

# Create an array of ones
ones_arr = np.ones((2, 3))
print(ones_arr)
# Output:
# [[1. 1. 1.]
#  [1. 1. 1.]]

# Create an array with a specific value
full_arr = np.full((2, 2), 7)
print(full_arr)
# Output:
# [[7 7]
#  [7 7]]

# Create an identity matrix
identity = np.eye(3)
print(identity)
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Create evenly spaced values
linspace_arr = np.linspace(0, 10, 5)  # 5 evenly spaced values from 0 to 10 (both endpoints included)
print(linspace_arr)
# Output: [ 0.   2.5  5.   7.5 10. ]

# Create arrays with random values
random_arr = np.random.random((2, 2))  # Random floats in [0, 1): 0 included, 1 excluded
print(random_arr)
# Output (example):
# [[0.42365602 0.18741356]
#  [0.79473463 0.53475251]]

# Random integers
random_int = np.random.randint(1, 10, size=(3, 3))  # Random integers from 1 to 9 (the upper bound 10 is excluded)
print(random_int)
# Output (example):
# [[7 2 9]
#  [3 6 1]
#  [8 5 2]]
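
Random output changes on every run, which makes examples hard to reproduce. A minimal sketch using NumPy's Generator interface with a fixed seed is shown below. The seed value 42 is an arbitrary choice, and no sample output is listed because the exact values depend on the NumPy version:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

floats = rng.random((2, 2))              # floats in [0.0, 1.0)
ints = rng.integers(1, 10, size=(3, 3))  # integers from 1 to 9 (10 is excluded)

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(seed=42)
same_floats = rng_again.random((2, 2))
print(np.array_equal(floats, same_floats))
# Output: True
```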

Array Attributes

NumPy arrays have several useful attributes:

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(f"Array shape: {arr.shape}")  # Dimensions of the array
# Output: Array shape: (2, 3)

print(f"Data type: {arr.dtype}")  # Data type of elements
# Output: Data type: int64  (platform-dependent; int32 on Windows with NumPy older than 2.0)

print(f"Number of dimensions: {arr.ndim}")  # Number of dimensions
# Output: Number of dimensions: 2

print(f"Total elements: {arr.size}")  # Total number of elements
# Output: Total elements: 6

print(f"Element size in bytes: {arr.itemsize}")  # Size of each element in bytes
# Output: Element size in bytes: 8

print(f"Total memory used: {arr.nbytes} bytes")  # Total memory used
# Output: Total memory used: 48 bytes
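
The element size and total memory follow directly from the dtype. As a small sketch, the dtype can be requested explicitly instead of relying on the platform default (the variable names here are illustrative):

```python
import numpy as np

# Request a dtype explicitly when creating the array
small = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print(small.dtype, small.itemsize, small.nbytes)
# Output: int32 4 24

# astype returns a converted copy; the original array is unchanged
as_float = small.astype(np.float64)
print(as_float.dtype, as_float.nbytes)
# Output: float64 48
print(small.dtype)
# Output: int32
```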

Array Indexing and Slicing

Basic Indexing

Accessing array elements works similarly to Python lists, but with added dimensions:

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Get element at position (row=1, col=2)
print(arr[1, 2])  # Equivalent to arr[1][2]
# Output: 7

# Get first row
print(arr[0])
# Output: [1 2 3 4]

# Get specific element in the first row
print(arr[0, 2])
# Output: 3
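
Two further indexing patterns come up often alongside the ones above: negative indices and boolean masks. A short sketch on the same array (the names mask and clipped are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Negative indices count from the end, as with Python lists
print(arr[-1, -1])
# Output: 12

# A boolean mask selects the elements where the condition is True
mask = arr > 8
print(arr[mask])
# Output: [ 9 10 11 12]

# Masks also work for assignment (done on a copy to keep arr intact)
clipped = arr.copy()
clipped[clipped > 8] = 8
print(clipped[2])
# Output: [8 8 8 8]
```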

Array Slicing

Slicing works like in Python lists, but extended to multiple dimensions:

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Slice the first 2 rows and columns 1 and 2 (the end index 3 is excluded)
print(arr[0:2, 1:3])
# Output:
# [[2 3]
#  [6 7]]

# All rows, columns 0 and 2 (selected with a list of indices)
print(arr[:, [0, 2]])
# Output:
# [[ 1  3]
#  [ 5  7]
#  [ 9 11]]

# Reverse the order of both the rows and the columns
print(arr[::-1, ::-1])
# Output:
# [[12 11 10  9]
#  [ 8  7  6  5]
#  [ 4  3  2  1]]
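
One difference from Python lists is worth a quick sketch: a basic slice of a NumPy array is a view onto the same data, while indexing with a list of positions produces a copy. The names view and picked are illustrative:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = arr[0:2, 1:3]     # basic slice: a view onto arr's data
view[0, 0] = 99          # writes through to arr
print(arr[0, 1])
# Output: 99

picked = arr[:, [0, 2]]  # list of indices: a new array (copy)
picked[0, 0] = -1        # does not affect arr
print(arr[0, 0])
# Output: 1

print(np.shares_memory(arr, view), np.shares_memory(arr, picked))
# Output: True False
```

Call .copy() on a slice when an independent array is needed.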

Array Manipulation

NumPy provides many functions to manipulate arrays:

Reshaping Arrays

arr = np.arange(12)  # Create array from 0 to 11
print(arr)
# Output: [ 0  1  2  3  4  5  6  7  8  9 10 11]

# Reshape to 3x4 matrix
reshaped = arr.reshape(3, 4)
print(reshaped)
# Output:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Flatten a multi-dimensional array
flattened = reshaped.flatten()
print(flattened)
# Output: [ 0  1  2  3  4  5  6  7  8  9 10 11]
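
Two details of reshaping can be sketched briefly: passing -1 lets NumPy infer one dimension, and flatten() always copies whereas ravel() returns a view when it can. The variable names are illustrative:

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
inferred = arr.reshape(3, -1)
print(inferred.shape)
# Output: (3, 4)

# flatten() always copies; ravel() returns a view when possible
flat_copy = inferred.flatten()
flat_view = inferred.ravel()
print(np.shares_memory(inferred, flat_copy), np.shares_memory(inferred, flat_view))
# Output: False True

# The total number of elements must stay the same
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```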

Joining Arrays

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Stack horizontally (column-wise)
horizontally_stacked = np.hstack((a, b))
print(horizontally_stacked)
# Output:
# [[1 2 5 6]
#  [3 4 7 8]]

# Stack vertically (row-wise)
vertically_stacked = np.vstack((a, b))
print(vertically_stacked)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Concatenate along a specific axis
concatenated = np.concatenate((a, b), axis=0)  # Same as vstack
print(concatenated)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

Splitting Arrays

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Split horizontally (column-wise)
hsplit = np.hsplit(arr, 2)  # Split into 2 equal parts
print(hsplit[0])
# Output:
# [[ 1  2]
#  [ 5  6]
#  [ 9 10]]

print(hsplit[1])
# Output:
# [[ 3  4]
#  [ 7  8]
#  [11 12]]

# Split vertically (row-wise)
vsplit = np.vsplit(arr, 3)  # Split into 3 equal parts
print(vsplit[0])
# Output: [[1 2 3 4]]

print(vsplit[1])
# Output: [[5 6 7 8]]
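
The splits above require the axis to divide evenly. As a sketch of the alternatives, np.array_split accepts uneven parts, and a list of indices gives the split points explicitly:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# hsplit needs the 4 columns to divide evenly, so 3 parts fails
try:
    np.hsplit(arr, 3)
    uneven_failed = False
except ValueError:
    uneven_failed = True
    print("cannot split 4 columns into 3 equal parts")

# array_split allows uneven parts
parts = np.array_split(arr, 3, axis=1)
print([p.shape for p in parts])
# Output: [(3, 2), (3, 1), (3, 1)]

# A list of indices gives the split points: columns [:1] and [1:]
left, right = np.hsplit(arr, [1])
print(left.shape, right.shape)
# Output: (3, 1) (3, 3)
```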

Array Operations

NumPy provides a wide range of mathematical operations that can be performed on arrays.

Basic Operations

a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3, 4])

# Element-wise addition
print(a + b)
# Output: [11 22 33 44]

# Element-wise subtraction
print(a - b)
# Output: [ 9 18 27 36]

# Element-wise multiplication
print(a * b)
# Output: [ 10  40  90 160]

# Element-wise division
print(a / b)
# Output: [10. 10. 10. 10.]

# Element-wise power
print(b ** 2)
# Output: [ 1  4  9 16]

# Universal functions (ufuncs)
print(np.sqrt(a))
# Output: [3.16227766 4.47213595 5.47722558 6.32455532]

print(np.exp(b))
# Output: [ 2.71828183  7.3890561  20.08553692 54.59815003]

Statistical Operations

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(f"Sum of all elements: {np.sum(arr)}")
# Output: Sum of all elements: 45

print(f"Mean value: {np.mean(arr)}")
# Output: Mean value: 5.0

print(f"Standard deviation: {np.std(arr)}")
# Output: Standard deviation: 2.581988897471611

print(f"Min value: {np.min(arr)}")
# Output: Min value: 1

print(f"Max value: {np.max(arr)}")
# Output: Max value: 9

# Operations along a specific axis
print(f"Sum of each column: {np.sum(arr, axis=0)}")
# Output: Sum of each column: [12 15 18]

print(f"Mean of each row: {np.mean(arr, axis=1)}")
# Output: Mean of each row: [2. 5. 8.]
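
The axis argument names the dimension that is collapsed by the operation. A short sketch of how the result shape changes, and how keepdims=True keeps the reduced axis so the result can be combined with the original array (the names row_sums and normalized are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, leaving one value per column
print(arr.sum(axis=0).shape)
# Output: (3,)

# keepdims=True keeps the reduced axis with length 1
row_sums = arr.sum(axis=1, keepdims=True)
print(row_sums.shape)
# Output: (3, 1)

# This makes it easy to divide each row by its own sum,
# so every row then sums to 1 (up to floating-point rounding)
normalized = arr / row_sums
print(normalized.sum(axis=1))
```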

Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing operations:

# Adding a scalar to an array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr + 10)
# Output:
# [[11 12 13]
#  [14 15 16]]

# Adding a vector to each row
row_vector = np.array([10, 20, 30])
print(arr + row_vector)
# Output:
# [[11 22 33]
#  [14 25 36]]

# Adding a column vector to each column
col_vector = np.array([[10], [20]])
print(arr + col_vector)
# Output:
# [[11 12 13]
#  [24 25 26]]
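
The rule behind these examples is that shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A small sketch of the rule, assuming NumPy 1.20 or newer for np.broadcast_shapes:

```python
import numpy as np

# Resulting shapes for the row-vector and column-vector cases
print(np.broadcast_shapes((2, 3), (3,)))
# Output: (2, 3)
print(np.broadcast_shapes((2, 3), (2, 1)))
# Output: (2, 3)

# A column (3, 1) and a row (1, 4) broadcast to a full (3, 4) grid
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
grid = col * 10 + row
print(grid)
# Output:
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

# Incompatible shapes raise an error instead of guessing
try:
    np.ones((2, 3)) + np.ones(2)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True
    print("shapes (2, 3) and (2,) do not broadcast")
```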

Comparison with PyTorch Tensors

Now that you understand NumPy arrays, let's briefly see how they relate to PyTorch tensors:

import numpy as np
import torch

# Create a NumPy array
np_arr = np.array([1, 2, 3, 4, 5])
print(f"NumPy array: {np_arr}")
# Output: NumPy array: [1 2 3 4 5]

# Convert NumPy array to PyTorch tensor (the tensor shares memory with np_arr)
torch_tensor = torch.from_numpy(np_arr)
print(f"PyTorch tensor: {torch_tensor}")
# Output: PyTorch tensor: tensor([1, 2, 3, 4, 5])
# (if the array's dtype is int32, the output also shows dtype=torch.int32)

# Convert PyTorch tensor to NumPy array
back_to_np = torch_tensor.numpy()
print(f"Back to NumPy: {back_to_np}")
# Output: Back to NumPy: [1 2 3 4 5]

Practical Example: Image Processing with NumPy

Let's see a practical example of using NumPy for basic image processing, a common use case in deep learning projects:

import numpy as np
import matplotlib.pyplot as plt

# Create a simple 5x5 image (a small square)
img = np.zeros((5, 5))  # Black background
img[1:4, 1:4] = 1       # White square in the middle

# Display the image
plt.figure(figsize=(3, 3))
plt.imshow(img, cmap='gray')
plt.title("Original Image")
plt.axis('off')
plt.show()

# Flip the image horizontally
flipped_img = np.fliplr(img)

# Rotate the image 90 degrees counterclockwise
rotated_img = np.rot90(img)

# Create a 3x3 filter for edge detection
edge_filter = np.array([[-1, -1, -1], 
                         [-1,  8, -1], 
                         [-1, -1, -1]])

# Apply the filter with a sliding 3x3 window (a simplified convolution with no
# padding, so the 5x5 image gives a 3x3 result)
# In practice, you'd use scipy.signal.convolve2d or similar
filtered_img = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        filtered_img[i, j] = np.sum(img[i:i+3, j:j+3] * edge_filter)

# Display the results
fig, axes = plt.subplots(1, 3, figsize=(9, 3))
axes[0].imshow(flipped_img, cmap='gray')
axes[0].set_title("Flipped Image")
axes[0].axis('off')

axes[1].imshow(rotated_img, cmap='gray')
axes[1].set_title("Rotated Image")
axes[1].axis('off')

axes[2].imshow(filtered_img, cmap='gray')
axes[2].set_title("Edge Detection")
axes[2].axis('off')

plt.tight_layout()
plt.show()

Note: To run the above example, you'll need to have matplotlib installed (pip install matplotlib).
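
The nested loops in the filtering step can also be written without explicit Python loops. The following is a sketch of one alternative, not the only way to do it, and it assumes NumPy 1.20 or newer for sliding_window_view. The plotting code is left out so that it runs without matplotlib:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.zeros((5, 5))
img[1:4, 1:4] = 1
edge_filter = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

# windows[i, j] is the 3x3 patch img[i:i+3, j:j+3]
windows = sliding_window_view(img, (3, 3))
print(windows.shape)
# Output: (3, 3, 3, 3)

# Multiply every patch by the filter and sum over the two patch axes
filtered = (windows * edge_filter).sum(axis=(2, 3))
print(filtered)
# Output:
# [[5. 3. 5.]
#  [3. 0. 3.]
#  [5. 3. 5.]]
```

The response is strongest at the corners of the square, weaker along its edges, and zero at the center, where all nine pixels in the window are equal.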

Summary

In this tutorial, we've covered the fundamentals of NumPy, which forms the foundation for working with PyTorch:

Creating and manipulating NumPy arrays
Indexing and slicing arrays
Array operations and broadcasting
Basic statistical functions
Converting between NumPy arrays and PyTorch tensors
A practical example of image manipulation

Understanding NumPy is crucial for effective use of PyTorch, as PyTorch's tensor operations are largely inspired by NumPy's array operations. The concepts and operations you've learned here will directly translate to working with PyTorch tensors.

Additional Resources

To deepen your understanding of NumPy:

Exercises

Basic Array Manipulation:
- Create a 3x3 array of random integers between 1 and 10
- Extract the diagonal elements
- Reverse the order of rows
Mathematical Operations:
- Create two 4x4 arrays, one with values from 1 to 16 and another with random values
- Calculate the element-wise product
- Find the row-wise and column-wise sums
Image Processing:
- Create a 10x10 "image" with a cross pattern
- Blur the image by replacing each pixel with the average of its neighbors
- Apply different rotations and transformations to the image
PyTorch Conversion:
- Create a multi-dimensional NumPy array (for example, a 3D array)
- Convert it to a PyTorch tensor
- Perform operations in PyTorch and convert back to NumPy

Understanding NumPy well will make your transition to PyTorch much smoother and help you better understand the underlying operations in neural networks.

