Python NumPy Basics
Introduction
NumPy (Numerical Python) is a fundamental Python library for scientific computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. NumPy serves as the foundation for many data science libraries, including Pandas, which we'll explore in later sections.
In this tutorial, we'll cover the basics of NumPy that will prepare you for working with Pandas. Understanding NumPy is crucial because Pandas is built on top of NumPy, and many Pandas operations rely on NumPy functionality under the hood.
Why NumPy?
Before diving into NumPy, you might wonder why we need it when Python already has lists. Here are some key advantages:
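- Performance: NumPy operations run in pre-compiled C code, which typically makes them much faster than equivalent Python loops over lists.
- Memory efficiency: NumPy arrays store elements of a single data type in one contiguous block, so they usually need less memory than Python lists holding the same values.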
- Convenience: NumPy provides a wide range of mathematical functions that work directly with arrays.
- Vectorization: NumPy allows you to perform operations on entire arrays without explicit loops.
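To make these points concrete, here is a rough comparison you can run yourself. Treat it as a sketch rather than a benchmark: the array size and the variable names (`python_total`, `numpy_total`, `list_bytes`) are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import sys
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The same computation two ways: sum of squares
python_total = sum(x * x for x in py_list)   # explicit Python-level loop
numpy_total = int(np.sum(np_arr * np_arr))   # vectorized; the loop runs in C

print("Results match:", python_total == numpy_total)

# Rough timing over 10 runs each (numbers vary by machine)
loop_time = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
vec_time = timeit.timeit(lambda: np.sum(np_arr * np_arr), number=10)
print(f"Python loop: {loop_time:.4f}s, NumPy: {vec_time:.4f}s")

# Approximate memory: list container plus its int objects vs. raw array data
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("List (container + int objects):", list_bytes, "bytes")
print("NumPy array data:", np_arr.nbytes, "bytes")
```

On most machines the NumPy version should come out noticeably faster and smaller, but measure on your own data before relying on a specific speedup.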
Installation
Let's start by installing NumPy. Open your terminal or command prompt and run:
pip install numpy
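A quick way to confirm the installation worked is to import the library and print its version. The exact version string you see depends on what pip installed.

```python
import numpy as np

# If this runs without an ImportError, NumPy is installed
print("NumPy version:", np.__version__)
```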
Creating NumPy Arrays
Basic Array Creation
Let's explore different ways to create NumPy arrays:
import numpy as np
# Create array from a Python list
arr1 = np.array([1, 2, 3, 4, 5])
print("Array from list:", arr1)
# Create a 2D array (matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D array:\n", arr2)
# Check the dimensions
print("\nShape of arr2:", arr2.shape)
print("Dimensions of arr2:", arr2.ndim)
print("Data type:", arr2.dtype)  # int64 on most platforms; int32 on Windows with NumPy older than 2.0
Output:
Array from list: [1 2 3 4 5]
2D array:
[[1 2 3]
[4 5 6]]
Shape of arr2: (2, 3)
Dimensions of arr2: 2
Data type: int64
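The data type shown above was inferred from the list contents. As a small sketch of how you might control it yourself, the example below passes `dtype` explicitly and converts with `astype`. The variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])                       # dtype inferred from the values
floats = np.array([1, 2, 3], dtype=np.float64)   # dtype requested explicitly
mixed = np.array([1, 2.5, 3])                    # ints are upcast to float64
as_int32 = ints.astype(np.int32)                 # astype returns a converted copy

print(floats.dtype)     # float64
print(mixed.dtype)      # float64
print(as_int32.dtype)   # int32
print(as_int32.itemsize, "bytes per element")   # 4 bytes per element
```

Choosing a smaller dtype reduces memory use, at the cost of a smaller range of representable values.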
Array Creation Functions
NumPy provides convenient functions to create special arrays:
# Create an array of zeros
zeros = np.zeros((3, 4))
print("Array of zeros:\n", zeros)
# Create an array of ones
ones = np.ones((2, 3))
print("\nArray of ones:\n", ones)
# Create an identity matrix
identity = np.eye(3)
print("\nIdentity matrix:\n", identity)
# Create an array with a range of values
range_array = np.arange(0, 10, 2) # start, stop (exclusive), step
print("\nRange array:", range_array)
# Create an array with evenly spaced values
linspace = np.linspace(0, 1, 5) # start, stop (inclusive), num
print("\nLinspace array:", linspace)
Output:
Array of zeros:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
Array of ones:
[[1. 1. 1.]
[1. 1. 1.]]
Identity matrix:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
Range array: [0 2 4 6 8]
Linspace array: [0. 0.25 0.5 0.75 1. ]
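One detail worth seeing side by side is how `arange` and `linspace` treat the stop value. The sketch below also shows a few related creation helpers (`np.full`, `np.ones_like`, and the `dtype` argument); the variable names are just for illustration.

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default
stepped = np.arange(0, 1, 0.25)                    # values 0, 0.25, 0.5, 0.75
spaced = np.linspace(0, 1, 5)                      # values 0, 0.25, 0.5, 0.75, 1
half_open = np.linspace(0, 1, 4, endpoint=False)   # values 0, 0.25, 0.5, 0.75

print("arange:", stepped)
print("linspace:", spaced)
print("linspace without endpoint:", half_open)

# Fill an array with a constant, or create integer zeros instead of floats
sevens = np.full((2, 3), 7)
int_zeros = np.zeros((2, 2), dtype=int)

# *_like functions copy the shape and dtype of an existing array
ones_like_sevens = np.ones_like(sevens)
print("ones_like shape:", ones_like_sevens.shape)
```

With a non-integer step, `linspace` is usually the safer choice because you state the number of points directly instead of relying on floating-point step arithmetic.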
Random Number Generation
NumPy provides functions for generating random numbers:
# Generate random numbers between 0 and 1
random_array = np.random.random((2, 3))
print("Random array:\n", random_array)
# Generate random integers from 1 to 9 (the upper bound, 10, is exclusive)
random_integers = np.random.randint(1, 10, size=(3, 3))
print("\nRandom integers:\n", random_integers)
# Set a random seed for reproducibility
np.random.seed(42)
random_with_seed = np.random.random(5)
print("\nRandom with seed:", random_with_seed)
Output:
Random array:
[[0.82687517 0.40427245 0.07531833]
[0.52454352 0.90237461 0.7156505 ]]
Random integers:
[[7 2 9]
[3 5 1]
[8 6 4]]
Random with seed: [0.37454012 0.95071431 0.73199394 0.59865848 0.15601864]
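The functions above belong to NumPy's older global random interface, which still works. For new code, the NumPy documentation recommends the `Generator` API created with `np.random.default_rng`. Here is a minimal sketch of the equivalent calls; the seed value and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)            # a seeded, independent generator

floats = rng.random((2, 3))                # uniform floats in [0, 1)
ints = rng.integers(1, 10, size=(3, 3))    # integers 1..9; upper bound exclusive
dice = rng.integers(1, 6, size=10, endpoint=True)   # integers 1..6 inclusive

print("Floats:\n", floats)
print("Integers:\n", ints)
print("Dice rolls:", dice)

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(42)
print("Reproducible:", np.array_equal(floats, rng_again.random((2, 3))))
```

Because each generator carries its own state, seeding one does not affect random numbers drawn elsewhere in your program.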
Array Indexing and Slicing
NumPy provides powerful ways to access and modify array elements:
Basic Indexing and Slicing
# Create a sample array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("Original array:\n", arr)
# Accessing elements
print("\nElement at index (1,2):", arr[1, 2]) # Row 1, Column 2 (indices start at 0)
# Slicing arrays
print("\nFirst two rows, all columns:\n", arr[:2, :])
print("\nAll rows, last two columns:\n", arr[:, 2:])
print("\nSubmatrix (rows 1-2, columns 1-2):\n", arr[1:3, 1:3])
Output:
Original array:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
Element at index (1,2): 7
First two rows, all columns:
[[1 2 3 4]
[5 6 7 8]]
All rows, last two columns:
[[ 3 4]
[ 7 8]
[11 12]]
Submatrix (rows 1-2, columns 1-2):
[[ 6 7]
[10 11]]
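Since slicing is also used to modify arrays, it helps to know that a basic slice is a view onto the original data, not a copy. The sketch below demonstrates this; the names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# A basic slice is a view: no data is copied
view = arr[:2, :2]
view[0, 0] = 99
print("After editing the view, arr[0, 0] =", arr[0, 0])   # 99

# Call .copy() when you need an independent array
independent = arr[:2, :2].copy()
independent[0, 0] = -5
print("After editing the copy, arr[0, 0] =", arr[0, 0])   # still 99

print("view shares memory:", np.shares_memory(arr, view))          # True
print("copy shares memory:", np.shares_memory(arr, independent))   # False
```

This differs from Python lists, where slicing always creates a new list.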
Advanced Indexing
# Boolean indexing
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
bool_idx = arr > 4
print("Boolean index:", bool_idx)
print("Elements greater than 4:", arr[bool_idx])
# Or more directly:
print("Elements greater than 4:", arr[arr > 4])
# Integer indexing
idx = np.array([1, 3, 5]) # Get elements at indices 1, 3, and 5
print("\nElements at indices 1, 3, 5:", arr[idx])
# Changing values that meet a condition
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d[arr_2d % 2 == 0] = -1 # Replace even numbers with -1
print("\nArray after replacing even numbers:\n", arr_2d)
Output:
Boolean index: [False False False False True True True True]
Elements greater than 4: [5 6 7 8]
Elements greater than 4: [5 6 7 8]
Elements at indices 1, 3, 5: [2 4 6]
Array after replacing even numbers:
[[ 1 -1 3]
[-1 5 -1]
[ 7 -1 9]]
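The boolean assignment above changes `arr_2d` in place. If you would rather keep the original, `np.where` builds a new array from a condition. The sketch below also combines two conditions with `&`; the names `replaced` and `mask` are illustrative.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# np.where(condition, value_if_true, value_if_false) returns a new array
replaced = np.where(arr_2d % 2 == 0, -1, arr_2d)
print("Replaced copy:\n", replaced)
print("Original is unchanged:\n", arr_2d)

# Combine conditions with & (and) or | (or); the parentheses are required
mask = (arr_2d > 2) & (arr_2d < 8)
print("Values between 2 and 8:", arr_2d[mask])   # [3 4 5 6 7]
```

Note that Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.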
Array Operations
NumPy's power comes from its ability to perform operations on entire arrays efficiently:
Mathematical Operations
# Create arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Basic operations
print("a + b =", a + b)
print("a - b =", a - b)
print("a * b =", a * b) # Element-wise multiplication
print("a / b =", a / b)
print("a ** 2 =", a ** 2) # Squaring each element
# Other operations
print("\nSquare root of a:", np.sqrt(a))
print("Exponential of a:", np.exp(a))
print("Natural log of a:", np.log(a))
print("Sum of a:", np.sum(a))
print("Mean of a:", np.mean(a))
Output:
a + b = [5 7 9]
a - b = [-3 -3 -3]
a * b = [ 4 10 18]
a / b = [0.25 0.4 0.5 ]
a ** 2 = [1 4 9]
Square root of a: [1. 1.41421356 1.73205081]
Exponential of a: [ 2.71828183 7.3890561 20.08553692]
Natural log of a: [0. 0.69314718 1.09861229]
Sum of a: 6
Mean of a: 2.0
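Note that `*` multiplies element by element, which is not the same as matrix multiplication. As a short sketch of the difference on 2D arrays, the example below uses the `@` operator (equivalent to `np.matmul`); the matrices `m` and `n` are made up for illustration.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])

elementwise = m * n   # multiplies matching positions
matrix_product = m @ n   # rows of m combined with columns of n

print("Element-wise product:\n", elementwise)    # [[5, 12], [21, 32]]
print("Matrix product:\n", matrix_product)       # [[19, 22], [43, 50]]
```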
Broadcasting
Broadcasting is a powerful NumPy feature that allows operations between arrays of different but compatible shapes, treating the smaller array as if it were repeated to match the larger one:
# Create a 3x3 array
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Original matrix:\n", matrix)
# Add a scalar to every element
print("\nMatrix + 10:\n", matrix + 10)
# Add a vector to each row
row_vector = np.array([10, 20, 30])
print("\nMatrix + row_vector:\n", matrix + row_vector)
# Add a vector to each column
col_vector = np.array([[100], [200], [300]])
print("\nMatrix + col_vector:\n", matrix + col_vector)
Output:
Original matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
Matrix + 10:
[[11 12 13]
[14 15 16]
[17 18 19]]
Matrix + row_vector:
[[11 22 33]
[14 25 36]
[17 28 39]]
Matrix + col_vector:
[[101 102 103]
[204 205 206]
[307 308 309]]
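The rule behind these results: NumPy compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below checks a few shapes and shows what happens when they do not match. It assumes NumPy 1.20 or newer for `np.broadcast_shapes`, and the variable names are illustrative.

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)
row = np.array([10, 20, 30])               # shape (3,)

# Ask NumPy what shape a broadcast would produce
print(np.broadcast_shapes((3, 3), (3,)))     # (3, 3)
print(np.broadcast_shapes((3, 3), (3, 1)))   # (3, 3)

# np.newaxis turns a 1D array into a column so it broadcasts across columns
col = row[:, np.newaxis]                   # shape (3, 1)
print("First row of matrix + col:", (matrix + col)[0])   # [11 12 13]

# Incompatible shapes raise a ValueError instead of guessing
try:
    matrix + np.array([1, 2])
except ValueError as exc:
    print("Cannot broadcast:", exc)

# A common use: subtract each column's mean from that column
centered = matrix - matrix.mean(axis=0)
print("Column means after centering:", centered.mean(axis=0))
```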
Array Transformation
NumPy provides various functions to reshape and transform arrays:
# Create a 1D array
arr = np.arange(12)
print("Original array:", arr)
# Reshape to a 3x4 matrix
reshaped = arr.reshape(3, 4)
print("\nReshaped to 3x4:\n", reshaped)
# Transpose a matrix
transposed = reshaped.T
print("\nTransposed matrix:\n", transposed)
# Flatten a multi-dimensional array
flattened = reshaped.flatten()
print("\nFlattened array:", flattened)
# Stack arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Vertical stack
v_stack = np.vstack((a, b))
print("\nVertical stack:\n", v_stack)
# Horizontal stack
h_stack = np.hstack((a, b))
print("\nHorizontal stack:\n", h_stack)
Output:
Original array: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Reshaped to 3x4:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Transposed matrix:
[[ 0 4 8]
[ 1 5 9]
[ 2 6 10]
[ 3 7 11]]
Flattened array: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Vertical stack:
[[1 2]
[3 4]
[5 6]
[7 8]]
Horizontal stack:
[[1 2 5 6]
[3 4 7 8]]
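A few related details are easy to try out: `reshape` can infer one dimension when you pass `-1`, `ravel` returns a view where possible while `flatten` always copies, and `np.concatenate` generalizes the stacking functions through its `axis` argument. The following is a small sketch with illustrative variable names.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension (here: 12 / 3 = 4 columns)
auto = arr.reshape(3, -1)
print("Inferred shape:", auto.shape)   # (3, 4)

flat_copy = auto.flatten()   # always an independent copy
flat_view = auto.ravel()     # a view here, because the data is contiguous

flat_view[0] = 100
print("arr[0] after editing the ravel result:", arr[0])   # 100
print("flat_copy[0] is unaffected:", flat_copy[0])        # 0

# concatenate with axis=0 / axis=1 matches vstack / hstack for 2D arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.array_equal(np.concatenate((a, b), axis=0), np.vstack((a, b))))   # True
print(np.array_equal(np.concatenate((a, b), axis=1), np.hstack((a, b))))   # True
```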
Practical Example: Data Analysis with NumPy
Let's work through a simple data analysis example using NumPy. We'll analyze some basic temperature data:
# Daily temperatures (°C) for a week in two cities
city1_temps = np.array([20, 21, 19, 24, 25, 22, 23])
city2_temps = np.array([18, 19, 17, 23, 24, 20, 21])
# Calculate basic statistics
print("City 1 - Mean temperature:", np.mean(city1_temps))
print("City 2 - Mean temperature:", np.mean(city2_temps))
print("City 1 - Temperature range:", np.max(city1_temps) - np.min(city1_temps))
print("City 2 - Temperature range:", np.max(city2_temps) - np.min(city2_temps))
# Temperature difference between cities
temp_diff = city1_temps - city2_temps
print("\nDaily temperature differences:", temp_diff)
print("Average temperature difference:", np.mean(temp_diff))
# Days when city1 was warmer than city2
warmer_days = city1_temps > city2_temps
print("\nDays when City 1 was warmer than City 2:", warmer_days)
print("Number of days City 1 was warmer:", np.sum(warmer_days))
# Combined data analysis
all_temps = np.vstack((city1_temps, city2_temps))
print("\nCombined temperature data:\n", all_temps)
print("Daily average temperatures:", np.mean(all_temps, axis=0))
print("Each city's average temperature:", np.mean(all_temps, axis=1))
Output:
City 1 - Mean temperature: 22.0
City 2 - Mean temperature: 20.285714285714285
City 1 - Temperature range: 6
City 2 - Temperature range: 7
Daily temperature differences: [2 2 2 1 1 2 2]
Average temperature difference: 1.7142857142857142
Days when City 1 was warmer than City 2: [ True True True True True True True]
Number of days City 1 was warmer: 7
Combined temperature data:
[[20 21 19 24 25 22 23]
[18 19 17 23 24 20 21]]
Daily average temperatures: [19. 20. 18. 23.5 24.5 21. 22. ]
Each city's average temperature: [22. 20.28571429]
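You can take the analysis a step further by combining indexing with these statistics. The sketch below reuses the City 1 data. The day labels are an assumption added for illustration (the original data does not say which day the week starts on).

```python
import numpy as np

city1_temps = np.array([20, 21, 19, 24, 25, 22, 23])
# Hypothetical day labels, assuming the week starts on Monday
days = np.array(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

# argmax returns the position of the largest value
hottest_day = days[np.argmax(city1_temps)]
print("Hottest day in City 1:", hottest_day)   # Fri

# Boolean indexing on one array using a condition built from another
above_average = days[city1_temps > city1_temps.mean()]
print("Days above the weekly mean:", above_average)   # Thu, Fri, Sun

# Standard deviation measures how much temperatures vary around the mean
spread = city1_temps.std()
print("Standard deviation:", spread)
```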
Summary
In this tutorial, we've covered the basics of NumPy, a fundamental library for numerical computing in Python. Here's what we learned:
- Creating NumPy arrays using various methods
- Accessing and manipulating array elements through indexing and slicing
- Performing mathematical operations on arrays
- Transforming arrays with reshaping and stacking
- Analyzing data using NumPy's statistical functions
These NumPy skills form the foundation for working with Pandas, which we'll explore in upcoming tutorials. Pandas builds on NumPy's functionality to provide higher-level data structures specifically designed for working with tabular and time-series data.
Exercises
- Create a 4x4 matrix with values ranging from 1 to 16, then extract the submatrix consisting of rows 1 and 2 (inclusive) and columns 1 and 2 (inclusive).
- Generate a 1D array of 10 random integers between 0 and 100. Replace all odd numbers with -1.
- Create a 5x5 identity matrix, add 3 to each element, and then extract the diagonal.
- Given two 1D arrays [1, 2, 3] and [4, 5, 6], compute their dot product.
- Create an array of 20 linearly spaced points between 0 and 1, and then reshape it to a 4x5 matrix. Calculate the mean of each row and each column.
Happy coding!