Python NumPy Basics
Introduction
NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays. NumPy is the foundation for many Python data science libraries, including Pandas, SciPy, and Scikit-learn.
In this tutorial, we'll explore the basics of NumPy, focusing on:
- Why NumPy is essential for data science
- Creating and manipulating NumPy arrays
- Basic operations and functions
- Broadcasting and vectorization
- Practical applications
Why Use NumPy?
Before diving into NumPy, let's understand why it's preferred over Python's built-in lists:
- Performance: NumPy operations are executed in pre-compiled C code, making them much faster than Python loops
- Memory efficiency: NumPy arrays store elements of a single data type in one contiguous block of memory, which makes them more compact than Python lists
- Convenience: NumPy provides a wide range of mathematical functions and operations
- Vectorization: Allows operations on entire arrays without explicit loops
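The performance and vectorization points above can be checked with a small timing sketch. The function names, the array size of 100,000, and the repeat count of 20 are illustrative choices, not part of the tutorial. Exact timings depend on your machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

# Same data as a Python list and as a NumPy array.
# int64 is requested explicitly so the squares cannot overflow on any platform.
values = list(range(100_000))
arr = np.arange(100_000, dtype=np.int64)


def square_with_loop(data):
    """Square every element with an explicit Python-level loop."""
    return [x * x for x in data]


def square_vectorized(data):
    """Square every element with one vectorized NumPy expression."""
    return data * data


loop_result = square_with_loop(values)
vector_result = square_vectorized(arr)

# Time both approaches; the vectorized version is usually much faster.
loop_time = timeit.timeit(lambda: square_with_loop(values), number=20)
vector_time = timeit.timeit(lambda: square_vectorized(arr), number=20)
print(f"Python loop: {loop_time:.4f} s")
print(f"NumPy:       {vector_time:.4f} s")
```

Both approaches produce the same values; only the place where the loop runs (Python bytecode versus compiled code) differs.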
Getting Started with NumPy
Installation
First, you'll need to install NumPy. Open your terminal or command prompt and run:
pip install numpy
Importing NumPy
In your Python script or Jupyter notebook, import NumPy with:
import numpy as np
The convention is to import NumPy with the alias np for brevity.
Creating NumPy Arrays
From Python Lists
The most straightforward way to create a NumPy array is from a Python list:
# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output:
[1 2 3 4 5]
# Create a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)
Output:
[[1 2 3]
[4 5 6]
[7 8 9]]
Using Built-in NumPy Functions
NumPy provides several functions to create arrays with specific patterns:
Arrays with zeros, ones, or specific values
# Array of zeros
zeros = np.zeros((3, 4)) # 3 rows, 4 columns
print(zeros)
Output:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
# Array of ones
ones = np.ones((2, 3)) # 2 rows, 3 columns
print(ones)
Output:
[[1. 1. 1.]
[1. 1. 1.]]
# Array filled with a specific value
full = np.full((2, 2), 7) # 2x2 array filled with 7
print(full)
Output:
[[7 7]
[7 7]]
Sequential Arrays
# Create a range of values
range_arr = np.arange(0, 10, 2) # start, stop (exclusive), step
print(range_arr)
Output:
[0 2 4 6 8]
# Create evenly spaced values in a range
linspace = np.linspace(0, 1, 5) # Start, stop, number of elements
print(linspace)
Output:
[0. 0.25 0.5 0.75 1. ]
Random Arrays
# Random values between 0 and 1
random_array = np.random.random((2, 2))
print(random_array)
Output (your results will vary):
[[0.42829726 0.16301084]
[0.89231551 0.29416272]]
# Random integers
random_ints = np.random.randint(0, 10, (3, 3)) # low (inclusive), high (exclusive), size
print(random_ints)
Output (your results will vary):
[[5 2 0]
[7 3 9]
[1 4 8]]
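The functions above belong to NumPy's legacy random interface. As a minimal sketch, the same arrays can be produced with the newer Generator interface, which also makes results reproducible when a seed is supplied. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Create a Generator; the seed (42 here, chosen arbitrarily) makes runs repeatable.
rng = np.random.default_rng(seed=42)

random_floats = rng.random((2, 2))                 # floats in [0, 1)
random_ints = rng.integers(0, 10, size=(3, 3))     # integers in [0, 10), high is exclusive
print(random_floats)
print(random_ints)

# A second Generator with the same seed produces the same first draw.
rng_again = np.random.default_rng(seed=42)
same_floats = rng_again.random((2, 2))
print(np.array_equal(random_floats, same_floats))
```

Without a seed, each run produces different values, just like the examples above.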
Array Attributes and Methods
NumPy arrays have several useful attributes and methods:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# Shape (dimensions)
print("Shape:", arr.shape)
# Number of dimensions
print("Dimensions:", arr.ndim)
# Data type
print("Data type:", arr.dtype)
# Total number of elements
print("Size:", arr.size)
Output:
Shape: (2, 4)
Dimensions: 2
Data type: int64
Size: 8
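The default integer type can depend on the platform and NumPy version, so the data type printed above may differ on your machine. The following sketch, which assumes you want a fixed type, shows how to request one explicitly and how the choice affects memory use.

```python
import numpy as np

# Request a specific data type instead of relying on the platform default.
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)
print("Data type:", arr.dtype)             # int64
print("Bytes per element:", arr.itemsize)  # 8
print("Total bytes:", arr.nbytes)          # 64 (8 elements * 8 bytes)

# astype returns a converted copy; the original array is unchanged.
as_float32 = arr.astype(np.float32)
print("Converted type:", as_float32.dtype)  # float32
print("Total bytes:", as_float32.nbytes)    # 32 (8 elements * 4 bytes)
```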
Array Indexing and Slicing
Accessing Elements
Indexing in NumPy arrays is similar to Python lists but extends to multiple dimensions:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# Access a single element (row, column)
print("Element at (1,2):", arr[1, 2]) # row index 1, column index 2 (zero-based)
# Access an entire row
print("First row:", arr[0])
# Access an entire column
print("Second column:", arr[:, 1])
Output:
Element at (1,2): 7
First row: [1 2 3 4]
Second column: [2 6]
Slicing
Slicing uses the syntax start:stop:step, where stop is exclusive:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Slicing [start:stop:step]
print("First 5 elements:", arr[0:5])
print("Every other element:", arr[::2])
print("Reversed array:", arr[::-1])
Output:
First 5 elements: [0 1 2 3 4]
Every other element: [0 2 4 6 8]
Reversed array: [9 8 7 6 5 4 3 2 1 0]
For multi-dimensional arrays:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# Slice rows and columns
print("Submatrix (first 2 rows, last 3 columns):")
print(arr[0:2, 1:4])
Output:
Submatrix (first 2 rows, last 3 columns):
[[2 3 4]
[6 7 8]]
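One detail worth knowing about the slices above: basic slicing returns a view of the original data, not a copy, so writing to a slice changes the source array. This differs from Python lists, where slicing makes a new list. A short sketch of that behavior, using the variable names view and independent for illustration:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Basic slicing returns a view that shares memory with arr.
view = arr[0:2, 1:4]
view[0, 0] = 99
print(arr[0, 1])  # 99, the original array changed too

# Call .copy() when an independent array is needed.
independent = arr[0:2, 1:4].copy()
independent[0, 0] = -1
print(arr[0, 1])  # still 99, the copy does not affect arr

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```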
Array Operations
NumPy provides efficient ways to perform mathematical operations on arrays.
Arithmetic Operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise addition
print("a + b =", a + b)
# Element-wise subtraction
print("a - b =", a - b)
# Element-wise multiplication
print("a * b =", a * b)
# Element-wise division
print("a / b =", a / b)
# Element-wise exponentiation
print("a ** 2 =", a ** 2)
Output:
a + b = [5 7 9]
a - b = [-3 -3 -3]
a * b = [ 4 10 18]
a / b = [0.25 0.4 0.5 ]
a ** 2 = [1 4 9]
Statistical Operations
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Sum
print("Sum of all elements:", arr.sum())
print("Sum of each column:", arr.sum(axis=0))
print("Sum of each row:", arr.sum(axis=1))
# Mean
print("Mean of all elements:", arr.mean())
# Standard deviation
print("Standard deviation:", arr.std())
# Min and max
print("Minimum:", arr.min())
print("Maximum:", arr.max())
Output:
Sum of all elements: 45
Sum of each column: [12 15 18]
Sum of each row: [ 6 15 24]
Mean of all elements: 5.0
Standard deviation: 2.581988897471611
Minimum: 1
Maximum: 9
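The standard deviation printed above is the population form, which is NumPy's default (ddof=0). If the data is a sample and you want the sample standard deviation, pass ddof=1. The sketch below also shows np.median and a per-column mean; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

population_std = arr.std()      # ddof=0 (default): divide by N
sample_std = arr.std(ddof=1)    # ddof=1: divide by N - 1
median = np.median(arr)         # median of all elements
column_means = arr.mean(axis=0) # axis=0 collapses rows, giving one value per column

print("Population std:", population_std)
print("Sample std:", sample_std)
print("Median:", median)              # 5.0
print("Column means:", column_means)  # [4. 5. 6.]
```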
Array Reshaping and Manipulation
NumPy provides several functions to change the shape of arrays:
# Create an array
arr = np.arange(12)
print("Original array:", arr)
# Reshape to 3x4 matrix
reshaped = arr.reshape(3, 4)
print("Reshaped to 3x4:\n", reshaped)
# Flatten a multi-dimensional array
flattened = reshaped.flatten()
print("Flattened array:", flattened)
# Transpose a matrix
transposed = reshaped.T
print("Transposed matrix:\n", transposed)
Output:
Original array: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Reshaped to 3x4:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Flattened array: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Transposed matrix:
[[ 0 4 8]
[ 1 5 9]
[ 2 6 10]
[ 3 7 11]]
Broadcasting
Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing arithmetic operations:
# Add scalar to array
arr = np.array([1, 2, 3, 4])
print("Array + 10:", arr + 10)
# Add 1D array to 2D array
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2x3
b = np.array([10, 20, 30]) # 1D array with shape (3,)
print("2D + 1D:\n", a + b) # b is broadcast to each row
Output:
Array + 10: [11 12 13 14]
2D + 1D:
[[11 22 33]
[14 25 36]]
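Broadcasting compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The following sketch shows a column being broadcast across each row, plus a case that fails. np.broadcast_shapes requires NumPy 1.20 or later; the variable name col is illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
col = np.array([[100], [200]])         # shape (2, 1)

# The length-1 axis of col is stretched to match the 3 columns of a.
column_sum = a + col
print(column_sum)
# [[101 102 103]
#  [204 205 206]]

# Ask NumPy which shape the result will have (NumPy 1.20+).
result_shape = np.broadcast_shapes(a.shape, col.shape)
print(result_shape)  # (2, 3)

# Shapes (2, 3) and (2,) are incompatible: trailing dimensions 3 and 2 differ.
try:
    a + np.array([10, 20])
except ValueError as exc:
    print("Cannot broadcast:", exc)
```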
Practical Example: Image Processing
Let's explore a simple image processing example to see NumPy in action:
import numpy as np
import matplotlib.pyplot as plt
# Create a simple 5x5 image (grayscale)
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
])
# Display the original image
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.title('Original Image')
plt.imshow(image, cmap='gray')
# Apply a filter to detect edges (simplified version)
# We'll use a simple difference operation
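vertical_edges = np.diff(image, axis=0) # Differences between consecutive rows (changes in the vertical direction)
horizontal_edges = np.diff(image, axis=1) # Differences between consecutive columns (changes in the horizontal direction)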
# Combine edges (simplified)
edges = np.zeros_like(image)
edges[:-1, :] = np.maximum(edges[:-1, :], np.abs(vertical_edges))
edges[:, :-1] = np.maximum(edges[:, :-1], np.abs(horizontal_edges))
# Display the edges
plt.subplot(1, 2, 2)
plt.title('Edge Detection')
plt.imshow(edges, cmap='gray')
plt.tight_layout()
plt.show()
This example demonstrates:
- Creating an array to represent a simple image
- Using NumPy's diff() function to find edges
- Manipulating arrays to combine different edge detections
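np.diff returns an array that is one element shorter along the differenced axis, which is why the example writes its results into edges[:-1, :] and edges[:, :-1]. A minimal sketch of that behavior, rebuilding the same 5x5 image with slicing:

```python
import numpy as np

# One row of the image: the value changes at the left and right borders.
row = np.array([0, 1, 1, 1, 0])
row_diff = np.diff(row)   # row[i + 1] - row[i]
print(row_diff)           # [ 1  0  0 -1]

# The same 5x5 image as in the example, built with slicing.
image = np.zeros((5, 5), dtype=int)
image[1:4, 1:4] = 1

# Each result loses one element along the axis that was differenced.
print(np.diff(image, axis=0).shape)  # (4, 5)
print(np.diff(image, axis=1).shape)  # (5, 4)
```

Nonzero entries mark positions where neighboring pixels differ; taking the absolute value treats dark-to-light and light-to-dark transitions alike.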
Real-world Application: Data Analysis
Here's an example of using NumPy to analyze a small sample dataset:
import numpy as np
# Sample data: daily temperatures for a week (in Celsius)
temperatures = np.array([
    # City A, City B, City C, City D
    [22, 25, 21, 19], # Monday
    [24, 27, 20, 22], # Tuesday
    [23, 26, 24, 20], # Wednesday
    [25, 28, 23, 23], # Thursday
    [21, 24, 22, 18], # Friday
    [19, 23, 20, 17], # Saturday
    [20, 25, 21, 19] # Sunday
])
# Basic statistics
print("Average temperature by city:")
city_averages = temperatures.mean(axis=0)
for i, avg in enumerate(city_averages):
    print(f"City {chr(65+i)}: {avg:.1f}°C")
print("\nAverage temperature by day:")
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
day_averages = temperatures.mean(axis=1)
for day, avg in zip(day_names, day_averages):
    print(f"{day}: {avg:.1f}°C")
# Find the hottest day for each city
hottest_days = temperatures.argmax(axis=0)
print("\nHottest day for each city:")
for i, day_idx in enumerate(hottest_days):
    print(f"City {chr(65+i)}: {day_names[day_idx]} ({temperatures[day_idx, i]}°C)")
# Find (day, city) pairs with temperatures above 25°C
hot_days = np.where(temperatures > 25)
print("\nDays with temperatures above 25°C:")
for day_idx, city_idx in zip(hot_days[0], hot_days[1]):
    print(f"{day_names[day_idx]} in City {chr(65+city_idx)}: {temperatures[day_idx, city_idx]}°C")
Output:
Average temperature by city:
City A: 22.0°C
City B: 25.4°C
City C: 21.6°C
City D: 19.7°C
Average temperature by day:
Monday: 21.8°C
Tuesday: 23.2°C
Wednesday: 23.2°C
Thursday: 24.8°C
Friday: 21.2°C
Saturday: 19.8°C
Sunday: 21.2°C
Hottest day for each city:
City A: Thursday (25°C)
City B: Thursday (28°C)
City C: Wednesday (24°C)
City D: Thursday (23°C)
Days with temperatures above 25°C:
Tuesday in City B: 27°C
Wednesday in City B: 26°C
Thursday in City B: 28°C
This example demonstrates how NumPy can be used to:
- Calculate statistics across different dimensions
- Find maximum values and their indices
- Filter data based on conditions
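Filtering on a condition can also be written with a boolean mask instead of np.where. The sketch below reuses the same temperature data; the names mask, hot_values, and cities_with_hot_day are illustrative.

```python
import numpy as np

# Same sample data as above: rows are days (Monday to Sunday), columns are cities A to D.
temperatures = np.array([
    [22, 25, 21, 19],
    [24, 27, 20, 22],
    [23, 26, 24, 20],
    [25, 28, 23, 23],
    [21, 24, 22, 18],
    [19, 23, 20, 17],
    [20, 25, 21, 19],
])

# Comparing an array to a scalar gives a boolean array of the same shape.
mask = temperatures > 25

# Indexing with the mask selects the matching values in row-major order.
hot_values = temperatures[mask]
print(hot_values)  # [27 26 28]

# True counts as 1, so sum() counts the matches.
print(mask.sum())  # 3

# any(axis=0) checks each city (column) for at least one match.
cities_with_hot_day = mask.any(axis=0)
print(cities_with_hot_day)  # [False  True False False]
```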
Summary
In this tutorial, we've covered the basics of NumPy, including:
- Creating arrays using various methods
- Accessing and manipulating array elements
- Performing mathematical and statistical operations
- Reshaping and transforming arrays
- Broadcasting for handling arrays of different shapes
- Real-world examples demonstrating NumPy's capabilities
NumPy is the foundation of the Python data science ecosystem, and mastering it will help you with other libraries like Pandas, Matplotlib, SciPy, and more.
Exercises
- Create a 3x3 identity matrix using NumPy functions.
- Generate an array of 10 random integers between 1 and 100, then find the mean, median, and standard deviation.
- Create a 5x5 checkerboard pattern (alternating 0s and 1s) using NumPy.
- Load a sample dataset using np.loadtxt() and perform basic statistical analysis.
- Write a function that normalizes an array (scales values to be between 0 and 1).
- Use NumPy to solve a system of linear equations.
By completing these exercises, you'll gain practical experience with NumPy's functionality and be well-prepared to tackle more complex data science tasks!