Python NumPy Basics

Introduction

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays. NumPy is the foundation for many Python data science libraries, including Pandas, SciPy, and Scikit-learn.

In this tutorial, we'll explore the basics of NumPy, focusing on:

  • Why NumPy is essential for data science
  • Creating and manipulating NumPy arrays
  • Basic operations and functions
  • Broadcasting and vectorization
  • Practical applications

Why Use NumPy?

Before diving into NumPy, let's understand why it's preferred over Python's built-in lists:

  1. Performance: NumPy operations are executed in pre-compiled C code, making them much faster than Python loops
  2. Memory efficiency: NumPy arrays store elements of a single type in one contiguous block, so they are typically more compact than Python lists holding the same values
  3. Convenience: NumPy provides a wide range of mathematical functions and operations
  4. Vectorization: Allows operations on entire arrays without explicit loops
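
The performance and vectorization points above can be checked with a rough timing sketch. The array size and the repetition count are arbitrary choices for illustration, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit 64-bit integers to avoid overflow

# Square every element: explicit Python loop vs. one vectorized expression
loop_result = [x * x for x in py_list]
vec_result = np_arr * np_arr

# Both approaches produce the same values
print(np.array_equal(loop_result, vec_result))

# Rough timings in seconds for 10 repetitions (results vary by machine)
loop_time = timeit.timeit(lambda: [x * x for x in py_list], number=10)
vec_time = timeit.timeit(lambda: np_arr * np_arr, number=10)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```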

Getting Started with NumPy

Installation

First, you'll need to install NumPy. Open your terminal or command prompt and run:

bash
pip install numpy

Importing NumPy

In your Python script or Jupyter notebook, import NumPy with:

python
import numpy as np

The convention is to import NumPy with the alias np for brevity.

Creating NumPy Arrays

From Python Lists

The most straightforward way to create a NumPy array is from a Python list:

python
# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])
print(arr)

Output:

[1 2 3 4 5]

python
# Create a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

Output:

[[1 2 3]
 [4 5 6]
 [7 8 9]]
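
One detail worth knowing when building arrays from lists: every element of an array shares a single data type, so mixed inputs are converted to a common type. A small sketch (the variable names are only illustrative):

```python
import numpy as np

# Mixing integers and floats upcasts the whole array to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
print(mixed)        # [1.  2.5 3. ]

# The dtype argument requests a specific type up front
as_float = np.array([1, 2, 3], dtype=float)
print(as_float.dtype)  # float64
```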

Using Built-in NumPy Functions

NumPy provides several functions to create arrays with specific patterns:

Arrays with zeros, ones, or specific values

python
# Array of zeros
zeros = np.zeros((3, 4)) # 3 rows, 4 columns
print(zeros)

Output:

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

python
# Array of ones
ones = np.ones((2, 3)) # 2 rows, 3 columns
print(ones)

Output:

[[1. 1. 1.]
 [1. 1. 1.]]

python
# Array filled with a specific value
full = np.full((2, 2), 7) # 2x2 array filled with 7
print(full)

Output:

[[7 7]
 [7 7]]

Sequential Arrays

python
# Create a range of values
range_arr = np.arange(0, 10, 2) # Start, stop (excluded), step
print(range_arr)

Output:

[0 2 4 6 8]

python
# Create evenly spaced values in a range
linspace = np.linspace(0, 1, 5) # Start, stop (included), number of elements
print(linspace)

Output:

[0.   0.25 0.5  0.75 1.  ]
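
The two functions above treat the stop value differently, which is a common source of off-by-one surprises. The following sketch contrasts them using the same range as the example above:

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default
a = np.arange(0, 1, 0.25)
b = np.linspace(0, 1, 5)
print(a)  # [0.   0.25 0.5  0.75]
print(b)  # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace drop the stop value as well
c = np.linspace(0, 1, 4, endpoint=False)
print(c)  # [0.   0.25 0.5  0.75]
```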

Random Arrays

python
# Random values between 0 and 1
random_array = np.random.random((2, 2))
print(random_array)

Output (your results will vary):

[[0.42829726 0.16301084]
 [0.89231551 0.29416272]]

python
# Random integers
random_ints = np.random.randint(0, 10, (3, 3)) # low (included), high (excluded), size
print(random_ints)

Output (your results will vary):

[[5 2 0]
 [7 3 9]
 [1 4 8]]
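
If you need random results that can be reproduced, one option is a seeded generator created with `np.random.default_rng`. The sketch below assumes a reasonably recent NumPy release that provides this function; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators built from the same (arbitrary) seed produce the same stream
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

floats_a = rng_a.random((2, 2))  # uniform floats in [0, 1)
floats_b = rng_b.random((2, 2))
print(np.array_equal(floats_a, floats_b))  # True

# Random integers in [0, 10); the upper bound is excluded
ints = rng_a.integers(0, 10, size=(3, 3))
print(ints.min() >= 0 and ints.max() < 10)  # True
```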

Array Attributes and Methods

NumPy arrays have several useful attributes and methods:

python
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Shape (dimensions)
print("Shape:", arr.shape)

# Number of dimensions
print("Dimensions:", arr.ndim)

# Data type
print("Data type:", arr.dtype)

# Total number of elements
print("Size:", arr.size)

Output (the data type may show as int32 on some platforms):

Shape: (2, 4)
Dimensions: 2
Data type: int64
Size: 8
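
The data type also determines how much memory each element uses. As a small illustration, the sketch below sets the type explicitly and converts it with `astype`; the `itemsize` and `nbytes` attributes report bytes per element and total bytes.

```python
import numpy as np

# Request 32-bit integers explicitly instead of relying on the default
ints = np.array([1, 2, 3], dtype=np.int32)

# astype returns a new array converted to the requested type
floats = ints.astype(np.float64)

print(ints.dtype, ints.itemsize, ints.nbytes)        # int32 4 12
print(floats.dtype, floats.itemsize, floats.nbytes)  # float64 8 24
```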

Array Indexing and Slicing

Accessing Elements

Indexing in NumPy arrays is similar to Python lists but extends to multiple dimensions:

python
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Access a single element (row, column)
print("Element at (1,2):", arr[1, 2]) # row 1, column 2

# Access an entire row
print("First row:", arr[0])

# Access an entire column
print("Second column:", arr[:, 1])

Output:

Element at (1,2): 7
First row: [1 2 3 4]
Second column: [2 6]
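
Two further indexing forms are often useful alongside the ones above: negative indices, which count from the end as they do for lists, and boolean masks, which select the elements where a condition holds. A brief sketch using the same array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Negative indices count from the end of each axis
print(arr[-1, -1])  # 8
print(arr[:, -1])   # [4 8]

# A boolean mask selects the elements where the condition is True
mask = arr > 4
print(arr[mask])    # [5 6 7 8]
```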

Slicing

Slicing works with the syntax start:stop:step:

python
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Slicing [start:stop:step]
print("First 5 elements:", arr[0:5])
print("Every other element:", arr[::2])
print("Reversed array:", arr[::-1])

Output:

First 5 elements: [0 1 2 3 4]
Every other element: [0 2 4 6 8]
Reversed array: [9 8 7 6 5 4 3 2 1 0]

For multi-dimensional arrays:

python
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Slice rows and columns
print("Submatrix (first 2 rows, last 3 columns):")
print(arr[0:2, 1:4])

Output:

Submatrix (first 2 rows, last 3 columns):
[[2 3 4]
 [6 7 8]]
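
Unlike list slices, basic NumPy slices are views that share memory with the original array, so writing to a slice changes the original. The sketch below shows this behavior and how `.copy()` avoids it:

```python
import numpy as np

arr = np.arange(10)

# A basic slice is a view: it shares memory with arr
view = arr[2:5]
view[0] = 99
print(arr)  # [ 0  1 99  3  4  5  6  7  8  9]

# .copy() creates independent data, so arr is left untouched
independent = arr[2:5].copy()
independent[0] = -1
print(arr[2])  # 99

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```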

Array Operations

NumPy provides efficient ways to perform mathematical operations on arrays.

Arithmetic Operations

python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
print("a + b =", a + b)

# Element-wise subtraction
print("a - b =", a - b)

# Element-wise multiplication
print("a * b =", a * b)

# Element-wise division
print("a / b =", a / b)

# Element-wise exponentiation
print("a ** 2 =", a ** 2)

Output:

a + b = [5 7 9]
a - b = [-3 -3 -3]
a * b = [ 4 10 18]
a / b = [0.25 0.4  0.5 ]
a ** 2 = [1 4 9]
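
Note that `*` multiplies element by element. For a dot product or matrix product, NumPy supports the `@` operator. A short sketch contrasting the two (the 2x2 matrix `m` is just an illustrative value):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise = a * b  # multiplies matching positions
dot = a @ b          # 1*4 + 2*5 + 3*6
print(elementwise)   # [ 4 10 18]
print(dot)           # 32

m = np.array([[1, 2], [3, 4]])
squared_elements = m * m  # each entry squared
matrix_product = m @ m    # rows times columns
print(squared_elements.tolist())  # [[1, 4], [9, 16]]
print(matrix_product.tolist())    # [[7, 10], [15, 22]]
```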

Statistical Operations

python
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Sum
print("Sum of all elements:", arr.sum())
print("Sum of each column:", arr.sum(axis=0))
print("Sum of each row:", arr.sum(axis=1))

# Mean
print("Mean of all elements:", arr.mean())

# Standard deviation
print("Standard deviation:", arr.std())

# Min and max
print("Minimum:", arr.min())
print("Maximum:", arr.max())

Output:

Sum of all elements: 45
Sum of each column: [12 15 18]
Sum of each row: [ 6 15 24]
Mean of all elements: 5.0
Standard deviation: 2.581988897471611
Minimum: 1
Maximum: 9
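
The `axis` argument works the same way for the other reductions. The sketch below applies it to `mean`, uses `keepdims=True` so the result can be subtracted from the original array, and shows the `ddof` argument of `std`, which switches from the population formula (the default) to the sample formula.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_means = arr.mean(axis=0)  # one value per column
row_means = arr.mean(axis=1)  # one value per row
print(col_means)  # [4. 5. 6.]
print(row_means)  # [2. 5. 8.]

# keepdims=True keeps the reduced axis with length 1, giving shape (3, 1)
row_means_col = arr.mean(axis=1, keepdims=True)
centered = arr - row_means_col  # subtract each row's mean from that row
print(centered.tolist())  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]

# std() divides by N by default; ddof=1 divides by N - 1 instead
sample_std = arr.std(ddof=1)
print(sample_std)
```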

Array Reshaping and Manipulation

NumPy provides several functions to change the shape of arrays:

python
# Create an array
arr = np.arange(12)
print("Original array:", arr)

# Reshape to 3x4 matrix
reshaped = arr.reshape(3, 4)
print("Reshaped to 3x4:\n", reshaped)

# Flatten a multi-dimensional array
flattened = reshaped.flatten()
print("Flattened array:", flattened)

# Transpose a matrix
transposed = reshaped.T
print("Transposed matrix:\n", transposed)

Output:

Original array: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped to 3x4:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Flattened array: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Transposed matrix:
 [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
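
Two related details can save some effort: passing `-1` to `reshape` lets NumPy infer that dimension from the array's size, and `ravel()` is an alternative to `flatten()` that returns a view when possible instead of always copying. A small sketch:

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer the dimension: 12 elements / 3 rows = 4 columns
auto = arr.reshape(3, -1)
print(auto.shape)  # (3, 4)

flat_copy = auto.flatten()  # always returns a copy
flat_view = auto.ravel()    # returns a view when the memory layout allows it

print(np.shares_memory(auto, flat_copy))  # False
print(np.shares_memory(auto, flat_view))  # True
```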

Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing arithmetic operations:

python
# Add scalar to array
arr = np.array([1, 2, 3, 4])
print("Array + 10:", arr + 10)

# Add 1D array to 2D array
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2x3
b = np.array([10, 20, 30]) # 1D array with shape (3,)
print("2D + 1D:\n", a + b) # b is broadcast to each row

Output:

Array + 10: [11 12 13 14]
2D + 1D:
 [[11 22 33]
 [14 25 36]]
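
Broadcasting compares shapes from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below adds a column of shape (2, 1) to each column of a 2x3 array, then shows the error raised for an incompatible shape. The values are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
col = np.array([[100], [200]])        # shape (2, 1)

# The length-1 axis of col is stretched across the 3 columns of a
shifted = a + col
print(shifted.tolist())  # [[101, 102, 103], [204, 205, 206]]

# Shape (2,) does not match the last dimension of a (3), so this fails
bad = np.array([10, 20])
failed = False
try:
    a + bad
except ValueError as exc:
    failed = True
    print("Cannot broadcast:", exc)
```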

Practical Example: Image Processing

Let's explore a simple image processing example to see NumPy in action:

python
import numpy as np
import matplotlib.pyplot as plt

# Create a simple 5x5 image (grayscale)
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
])

# Display the original image
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.title('Original Image')
plt.imshow(image, cmap='gray')

# Detect edges with a simple difference between neighboring pixels
# (a simplified stand-in for a real edge-detection filter)
vertical_edges = np.diff(image, axis=0) # Differences between adjacent rows
horizontal_edges = np.diff(image, axis=1) # Differences between adjacent columns

# Combine edges (simplified)
edges = np.zeros_like(image)
edges[:-1, :] = np.maximum(edges[:-1, :], np.abs(vertical_edges))
edges[:, :-1] = np.maximum(edges[:, :-1], np.abs(horizontal_edges))

# Display the edges
plt.subplot(1, 2, 2)
plt.title('Edge Detection')
plt.imshow(edges, cmap='gray')

plt.tight_layout()
plt.show()

This example demonstrates:

  1. Creating an array to represent a simple image
  2. Using NumPy's diff() function to find edges
  3. Manipulating arrays to combine different edge detections
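
To inspect the result without plotting, the same steps can be run on their own and the combined array printed. This sketch repeats the operations from the example above without matplotlib:

```python
import numpy as np

image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
])

# np.diff shortens the differenced axis by one element
vertical_edges = np.diff(image, axis=0)    # shape (4, 5)
horizontal_edges = np.diff(image, axis=1)  # shape (5, 4)

# Combine the absolute differences into one array of the original size
edges = np.zeros_like(image)
edges[:-1, :] = np.maximum(edges[:-1, :], np.abs(vertical_edges))
edges[:, :-1] = np.maximum(edges[:, :-1], np.abs(horizontal_edges))

print(edges)
# [[0 1 1 1 0]
#  [1 0 0 1 0]
#  [1 0 0 1 0]
#  [1 1 1 1 0]
#  [0 0 0 0 0]]
```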

Real-world Application: Data Analysis

Here's a real-world example of using NumPy to analyze a dataset:

python
import numpy as np

# Sample data: daily temperatures for a week (in Celsius)
temperatures = np.array([
    # City A, City B, City C, City D
    [22, 25, 21, 19], # Monday
    [24, 27, 20, 22], # Tuesday
    [23, 26, 24, 20], # Wednesday
    [25, 28, 23, 23], # Thursday
    [21, 24, 22, 18], # Friday
    [19, 23, 20, 17], # Saturday
    [20, 25, 21, 19] # Sunday
])

# Basic statistics
print("Average temperature by city:")
city_averages = temperatures.mean(axis=0)
for i, avg in enumerate(city_averages):
    print(f"City {chr(65+i)}: {avg:.1f}°C")

print("\nAverage temperature by day:")
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
day_averages = temperatures.mean(axis=1)
for day, avg in zip(day_names, day_averages):
    print(f"{day}: {avg:.1f}°C")

# Find the hottest day for each city
hottest_days = temperatures.argmax(axis=0)
print("\nHottest day for each city:")
for i, day_idx in enumerate(hottest_days):
    print(f"City {chr(65+i)}: {day_names[day_idx]} ({temperatures[day_idx, i]}°C)")

# Find every day and city with a temperature above 25°C
hot_days = np.where(temperatures > 25)
print("\nDays with temperatures above 25°C:")
for day_idx, city_idx in zip(hot_days[0], hot_days[1]):
    print(f"{day_names[day_idx]} in City {chr(65+city_idx)}: {temperatures[day_idx, city_idx]}°C")

Output:

Average temperature by city:
City A: 22.0°C
City B: 25.4°C
City C: 21.6°C
City D: 19.7°C

Average temperature by day:
Monday: 21.8°C
Tuesday: 23.2°C
Wednesday: 23.2°C
Thursday: 24.8°C
Friday: 21.2°C
Saturday: 19.8°C
Sunday: 21.2°C

Hottest day for each city:
City A: Thursday (25°C)
City B: Thursday (28°C)
City C: Wednesday (24°C)
City D: Thursday (23°C)

Days with temperatures above 25°C:
Tuesday in City B: 27°C
Wednesday in City B: 26°C
Thursday in City B: 28°C

This example demonstrates how NumPy can be used to:

  1. Calculate statistics across different dimensions
  2. Find maximum values and their indices
  3. Filter data based on conditions
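
The filtering step can also be written with a boolean mask instead of `np.where`. The sketch below reuses the same temperature data; the names `above_25`, `hot_days_per_city`, and `city_range` are introduced here only for illustration.

```python
import numpy as np

temperatures = np.array([
    [22, 25, 21, 19],  # Monday
    [24, 27, 20, 22],  # Tuesday
    [23, 26, 24, 20],  # Wednesday
    [25, 28, 23, 23],  # Thursday
    [21, 24, 22, 18],  # Friday
    [19, 23, 20, 17],  # Saturday
    [20, 25, 21, 19],  # Sunday
])

# Boolean mask: True wherever the temperature is above 25°C
above_25 = temperatures > 25

# Summing a boolean mask counts the True values, here per city (column)
hot_days_per_city = above_25.sum(axis=0)
print(hot_days_per_city)  # [0 3 0 0]

# Indexing with the mask returns the matching values in row-major order
print(temperatures[above_25])  # [27 26 28]

# Weekly temperature range (max - min) for each city
city_range = temperatures.max(axis=0) - temperatures.min(axis=0)
print(city_range)  # [6 5 4 6]
```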

Summary

In this tutorial, we've covered the basics of NumPy, including:

  • Creating arrays using various methods
  • Accessing and manipulating array elements
  • Performing mathematical and statistical operations
  • Reshaping and transforming arrays
  • Broadcasting for handling arrays of different shapes
  • Real-world examples demonstrating NumPy's capabilities

NumPy is the foundation of the Python data science ecosystem, and mastering it will help you with other libraries like Pandas, Matplotlib, SciPy, and more.

Additional Resources

To deepen your understanding of NumPy, work through the exercises below.

Exercises

  1. Create a 3x3 identity matrix using NumPy functions.
  2. Generate an array of 10 random integers between 1 and 100, then find the mean, median, and standard deviation.
  3. Create a 5x5 checkerboard pattern (alternating 0s and 1s) using NumPy.
  4. Load a sample dataset using np.loadtxt() and perform basic statistical analysis.
  5. Write a function that normalizes an array (scales values to be between 0 and 1).
  6. Use NumPy to solve a system of linear equations.

By completing these exercises, you'll gain practical experience with NumPy's functionality and be well-prepared to tackle more complex data science tasks!


