PyTorch Introduction
What is PyTorch?
PyTorch is an open-source machine learning library developed by Facebook's AI Research lab (FAIR). It provides a flexible and intuitive framework for building and training neural networks. Since its release in 2016, PyTorch has gained tremendous popularity among researchers and industry practitioners alike, thanks to its Pythonic design, ease of use, and dynamic computational graph.
Why PyTorch?
There are several key features that make PyTorch stand out:
- Pythonic Interface: PyTorch feels natural to Python programmers, making the learning curve less steep.
- Dynamic Computation Graph: Unlike static-graph frameworks, PyTorch builds the graph on the fly as operations execute (see the sketch after this list).
- GPU Acceleration: Seamless integration with NVIDIA CUDA for fast computation on GPUs.
- Rich Ecosystem: Extensive libraries and tools for various domains, such as computer vision (torchvision) and natural language processing (torchtext).
- Production Ready: TorchScript and TorchServe make deployment easier.
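To make the dynamic graph concrete, here is a minimal sketch (the function f and its values are invented for illustration): ordinary Python control flow decides which operations are recorded, and gradients follow whichever branch actually ran.

import torch

# The graph is built as operations execute, so a plain Python `if`
# can change the computation from one call to the next.
def f(x):
    if x.sum() > 0:   # data-dependent branching, no special graph API needed
        return x * 2
    return x - 1

a = torch.tensor([1.0, 2.0], requires_grad=True)
out = f(a).sum()
out.backward()        # differentiates through the branch that actually ran
print(a.grad)         # tensor([2., 2.]) -- the `x * 2` branch was taken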
Installation
Before we dive into code, let's install PyTorch. You can install it using pip:
pip install torch torchvision torchaudio
For platform-specific builds (such as a particular CUDA version), it's recommended to use the official installation selector on the PyTorch website.
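At the time of writing, the selector emits commands along these lines; treat the index URL below as an illustration, since it depends on your CUDA version:

# Example only: install a CUDA 12.1 build from PyTorch's package index
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121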
PyTorch Basics
Tensors - The Building Blocks
In PyTorch, everything revolves around tensors. Tensors are multi-dimensional arrays similar to NumPy's ndarrays, but with the ability to run on GPUs for accelerated computing.
Let's create our first tensor:
import torch
# Creating a tensor
x = torch.tensor([1, 2, 3, 4, 5])
print(x)
print(f"Type: {x.dtype}")
print(f"Shape: {x.shape}")
Output:
tensor([1, 2, 3, 4, 5])
Type: torch.int64
Shape: torch.Size([5])
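Since tensors mirror NumPy's ndarrays, conversion in both directions is cheap; note that for CPU tensors, torch.from_numpy shares memory with the source array rather than copying it:

import numpy as np

arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)   # shares memory with `arr` (CPU tensors only)
back = t.numpy()            # converts back, again without copying

arr[0] = 99.0
print(t)                    # tensor([99., 2., 3.], dtype=torch.float64)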
Creating Different Types of Tensors
PyTorch provides various functions to create tensors:
# Zero tensor
zeros = torch.zeros(2, 3)
print("Zeros tensor:")
print(zeros)
# Random tensor
random = torch.rand(2, 3)
print("\nRandom tensor:")
print(random)
# Tensor with specific range
arange = torch.arange(0, 10, step=2)
print("\nArange tensor:")
print(arange)
# Like NumPy's linspace
linspace = torch.linspace(0, 10, steps=5)
print("\nLinspace tensor:")
print(linspace)
Output:
Zeros tensor:
tensor([[0., 0., 0.],
        [0., 0., 0.]])
Random tensor:
tensor([[0.8684, 0.5555, 0.0474],
        [0.7352, 0.1019, 0.9756]])
Arange tensor:
tensor([0, 2, 4, 6, 8])
Linspace tensor:
tensor([ 0.0000,  2.5000,  5.0000,  7.5000, 10.0000])
Tensor Operations
PyTorch supports a vast range of operations on tensors:
# Basic operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
# Addition
print(f"a + b = {a + b}") # Equivalent to torch.add(a, b)
# Multiplication (element-wise)
print(f"a * b = {a * b}") # Equivalent to torch.mul(a, b)
# Matrix multiplication
c = torch.tensor([[1, 2], [3, 4]])
d = torch.tensor([[5, 6], [7, 8]])
print(f"c @ d = \n{c @ d}") # Equivalent to torch.matmul(c, d)
Output:
a + b = tensor([5, 7, 9])
a * b = tensor([ 4, 10, 18])
c @ d =
tensor([[19, 22],
        [43, 50]])
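Element-wise operations also broadcast across compatible shapes, much as NumPy does; a quick sketch:

m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
v = torch.tensor([10, 20, 30])
print(m + v)   # `v` is broadcast across both rows:
# tensor([[11, 22, 33],
#         [14, 25, 36]])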
Reshaping Tensors
Changing the shape of tensors is a common operation:
# Create a tensor
x = torch.arange(9)
print(f"Original tensor: {x}")
# Reshape to 3x3 matrix
x_3x3 = x.reshape(3, 3)
print(f"Reshaped to 3x3:\n{x_3x3}")
# View is similar but shares memory with original tensor
x_view = x.view(3, 3)
print(f"View as 3x3:\n{x_view}")
# Transpose
print(f"Transposed:\n{x_3x3.T}")
Output:
Original tensor: tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
Reshaped to 3x3:
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
View as 3x3:
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
Transposed:
tensor([[0, 3, 6],
        [1, 4, 7],
        [2, 5, 8]])
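As the comment above notes, view shares memory with the original tensor, so writing through the view mutates it; a small demonstration:

x = torch.arange(9)
v = x.view(3, 3)
v[0, 0] = 100      # write through the view...
print(x[0])        # tensor(100) -- ...and the original tensor sees it

(reshape, by contrast, returns a view when it can and a copy when the memory layout forces one.)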
GPU Acceleration
One of PyTorch's strongest features is its seamless GPU integration:
# Check if CUDA (GPU) is available
print(f"Is CUDA available? {torch.cuda.is_available()}")
# Create a tensor and move it to GPU (if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
x = torch.tensor([1, 2, 3])
x = x.to(device)
print(f"Tensor on {device}: {x}")
# Move back to CPU if needed
x = x.to("cpu")
print(f"Tensor back on CPU: {x}")
Output (will vary based on your hardware):
Is CUDA available? True
Using device: cuda
Tensor on cuda: tensor([1, 2, 3], device='cuda:0')
Tensor back on CPU: tensor([1, 2, 3])
Autograd: Automatic Differentiation
PyTorch's autograd system is a powerful feature for automatic differentiation, which is crucial for training neural networks:
# Create tensors with gradient tracking
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
# Define a computation
z = x**2 + y**3
# Compute gradients
z.backward()
# Access gradients
print(f"dz/dx: {x.grad}") # Should be 2*x = 2*3 = 6
print(f"dz/dy: {y.grad}") # Should be 3*y^2 = 3*2^2 = 12
Output:
dz/dx: 6.0
dz/dy: 12.0
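One gradient detail worth knowing before the training loops below: backward() accumulates into .grad rather than overwriting it, which is why those loops call optimizer.zero_grad() every iteration. A minimal sketch:

x = torch.tensor(3.0, requires_grad=True)

(x ** 2).backward()
print(x.grad)        # tensor(6.)

(x ** 2).backward()  # accumulates: 6 + 6
print(x.grad)        # tensor(12.)

x.grad.zero_()       # manual reset (optimizers do this via zero_grad())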
A Simple Neural Network Example
Let's put everything together by building a simple neural network:
import torch
import torch.nn as nn
import torch.optim as optim
# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(2, 5)   # 2 input features, 5 hidden units
        self.activation = nn.ReLU()      # ReLU activation
        self.linear2 = nn.Linear(5, 1)   # 5 hidden units, 1 output

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        return x
# Create the model
model = SimpleNN()
print(model)
# Generate some fake data
X = torch.tensor([[0.5, 0.1], [0.3, 0.9], [0.7, 0.2]], dtype=torch.float)
y = torch.tensor([[0.6], [1.2], [0.9]], dtype=torch.float)
# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Training loop
print("\nTraining the model...")
for epoch in range(100):
    # Forward pass
    y_pred = model(X)
    # Compute loss
    loss = criterion(y_pred, y)
    # Zero gradients, backward pass, update weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
# Test the model
with torch.no_grad(): # Disable gradient calculation
    y_pred = model(X)

print("\nFinal predictions vs actual:")
for i in range(len(X)):
    print(f"Input: {X[i]}, Predicted: {y_pred[i].item():.4f}, Actual: {y[i].item():.4f}")
Output:
SimpleNN(
  (linear1): Linear(in_features=2, out_features=5, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=5, out_features=1, bias=True)
)
Training the model...
Epoch 20, Loss: 0.0548
Epoch 40, Loss: 0.0141
Epoch 60, Loss: 0.0047
Epoch 80, Loss: 0.0019
Epoch 100, Loss: 0.0009
Final predictions vs actual:
Input: tensor([0.5000, 0.1000]), Predicted: 0.5867, Actual: 0.6000
Input: tensor([0.3000, 0.9000]), Predicted: 1.1884, Actual: 1.2000
Input: tensor([0.7000, 0.2000]), Predicted: 0.9019, Actual: 0.9000
Practical Application: Image Classification
Let's see how PyTorch can be used for a real-world problem like image classification using the MNIST dataset:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
# Load MNIST dataset (download if not present)
train_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    transform=transform,
    download=True
)
# Create data loader
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
# Define the model
class MNISTClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        return x
# Initialize model, loss function, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MNISTClassifier().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop (just one epoch for demonstration)
print("Training on MNIST dataset...")
model.train()
for batch_idx, (images, labels) in enumerate(train_loader):
    images, labels = images.to(device), labels.to(device)
    # Forward pass
    outputs = model(images)
    loss = criterion(outputs, labels)
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (batch_idx + 1) % 100 == 0:
        print(f"Batch [{batch_idx + 1}/{len(train_loader)}], Loss: {loss.item():.4f}")
    # Break after a few batches for demonstration
    if batch_idx >= 299:
        break
print("Training completed!")
# You would typically evaluate on a test set and save the model
Output:
Training on MNIST dataset...
Batch [100/938], Loss: 0.3127
Batch [200/938], Loss: 0.2155
Batch [300/938], Loss: 0.1477
Training completed!
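To flesh out the closing comment in the code above, here is a hedged sketch of what test-set evaluation and saving might look like; it assumes a hypothetical test_loader built like train_loader but with train=False:

# Assumption: a held-out loader, e.g.
# test_set = torchvision.datasets.MNIST(root='./data', train=False,
#                                       transform=transform, download=True)
# test_loader = DataLoader(test_set, batch_size=64)
model.eval()                     # switch off training-only behaviour
correct, total = 0, 0
with torch.no_grad():            # no gradients needed for evaluation
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct / total:.2%}")

# Save just the learned parameters (the commonly recommended approach)
torch.save(model.state_dict(), "mnist_classifier.pt")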
Summary
In this introduction to PyTorch, we've covered:
- What PyTorch is and its key features
- How to install PyTorch
- Creating and manipulating tensors
- GPU acceleration
- Automatic differentiation with autograd
- Building a simple neural network
- A practical application with image classification
PyTorch's intuitive design and powerful capabilities make it an excellent choice for machine learning projects, from simple models to complex deep learning architectures.
Exercises
- Create a tensor containing the numbers 1 to 10, then reshape it to a 2x5 matrix.
- Write a function that determines whether a tensor is on the CPU or a GPU.
- Build a neural network to solve a simple regression problem with a custom dataset.
- Modify the MNIST classifier to use convolutional layers (hint: look into nn.Conv2d).
- Implement a simple image classifier using a pre-trained model from torchvision.models.