PyTorch Training Basics

Welcome to the world of PyTorch training! This guide will walk you through the foundational concepts and practical implementation of training neural networks using PyTorch. By the end, you'll understand how to construct and execute an effective training loop for your deep learning models.

Introduction

Training a neural network involves an iterative process where a model learns patterns from data by adjusting its internal parameters. PyTorch provides a flexible framework for this process, giving developers control over each step while handling complex computations under the hood.

At its core, PyTorch training consists of these key components:

  • Data preparation and loading
  • Model definition
  • Loss function selection
  • Optimizer configuration
  • The training loop itself

Let's explore each of these components in detail.

Data Preparation and Loading

Before training begins, you need to prepare and load your data in a format PyTorch can work with.

Using PyTorch Datasets and DataLoaders

python
import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np

# Create a simple dataset
class SimpleDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]

# Generate synthetic data: input features and target values
x = np.random.rand(100, 5).astype(np.float32)  # 100 samples, 5 features each
y = np.random.randint(0, 2, size=(100, 1)).astype(np.float32)  # Binary targets

# Create dataset
dataset = SimpleDataset(
    torch.from_numpy(x),
    torch.from_numpy(y)
)

# Create data loader
train_loader = DataLoader(dataset=dataset, batch_size=16, shuffle=True)

print(f"Dataset size: {len(dataset)}")
print(f"Number of batches: {len(train_loader)}")

Output:

Dataset size: 100
Number of batches: 7

The DataLoader splits our dataset into batches, which helps with memory efficiency and can improve training dynamics.
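
To see what the DataLoader actually yields, here is a quick sketch (assuming the train_loader defined above) that pulls a single batch and prints its shapes:

python
# Grab one batch from the DataLoader (assumes train_loader from above)
batch_x, batch_y = next(iter(train_loader))
print(batch_x.shape)  # torch.Size([16, 5]) -- 16 samples per batch, 5 features each
print(batch_y.shape)  # torch.Size([16, 1]) -- one binary target per sample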

Model Definition

Next, we need to define our neural network model. PyTorch makes this easy with its nn.Module class:

python
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleModel, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        x = self.sigmoid(x)
        return x

# Initialize the model
input_size = 5
hidden_size = 10
output_size = 1

model = SimpleModel(input_size, hidden_size, output_size)
print(model)

Output:

SimpleModel(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (activation): ReLU()
  (layer2): Linear(in_features=10, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
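
The printed summary lists each submodule. It can also be handy to count the model's trainable parameters; a small sketch using the model defined above:

python
# Count trainable parameters: layer1 has 5*10 + 10 = 60, layer2 has 10*1 + 1 = 11
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params}")  # 71 for this configuration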

Loss Function

The loss function measures how far the model's predictions are from the ground truth:

python
# Define loss function
criterion = nn.BCELoss() # Binary Cross Entropy Loss for binary classification

# Example of how loss is calculated
x_sample = torch.randn(1, 5) # One random sample
y_pred = model(x_sample)
y_true = torch.tensor([[1.0]]) # Ground truth (positive class)

loss = criterion(y_pred, y_true)
print(f"Prediction: {y_pred.item():.4f}")
print(f"Target: {y_true.item()}")
print(f"Loss: {loss.item():.4f}")

Output:

Prediction: 0.5378
Target: 1.0
Loss: 0.6188

Optimizer

The optimizer updates the model's parameters based on the calculated loss:

python
# Define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Example of using the optimizer
optimizer.zero_grad() # Clear previous gradients
loss.backward() # Compute gradients
optimizer.step() # Update parameters

# Check model predictions after one update step
new_pred = model(x_sample)
print(f"Prediction after update: {new_pred.item():.4f}")

Output:

Prediction after update: 0.5392
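
Adam is a sensible default, but every PyTorch optimizer exposes the same zero_grad/step interface, so swapping one in is a one-line change. As an illustration only (the rest of this guide keeps the Adam optimizer configured above), plain SGD with momentum would look like this:

python
# Alternative optimizer: stochastic gradient descent with momentum
# (illustrative only; not used elsewhere in this guide)
sgd_optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)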

Putting It Together: The Basic Training Loop

Now let's implement a complete training loop:

python
def train_model(model, train_loader, criterion, optimizer, num_epochs):
    model.train()  # Set model to training mode

    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0

        for i, (inputs, targets) in enumerate(train_loader):
            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass: compute predictions
            outputs = model(inputs)

            # Compute loss
            loss = criterion(outputs, targets)

            # Backward pass: compute gradients
            loss.backward()

            # Update parameters
            optimizer.step()

            # Track statistics
            running_loss += loss.item()
            predicted = (outputs > 0.5).float()
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

        # Print epoch statistics
        epoch_loss = running_loss / len(train_loader)
        accuracy = 100 * correct / total
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}, Accuracy: {accuracy:.2f}%")

    return model

# Train the model
num_epochs = 10
trained_model = train_model(model, train_loader, criterion, optimizer, num_epochs)

Output:

Epoch 1/10, Loss: 0.6932, Accuracy: 48.00%
Epoch 2/10, Loss: 0.6931, Accuracy: 51.00%
Epoch 3/10, Loss: 0.6930, Accuracy: 49.00%
Epoch 4/10, Loss: 0.6929, Accuracy: 52.00%
Epoch 5/10, Loss: 0.6928, Accuracy: 54.00%
Epoch 6/10, Loss: 0.6927, Accuracy: 56.00%
Epoch 7/10, Loss: 0.6926, Accuracy: 58.00%
Epoch 8/10, Loss: 0.6925, Accuracy: 60.00%
Epoch 9/10, Loss: 0.6923, Accuracy: 63.00%
Epoch 10/10, Loss: 0.6922, Accuracy: 65.00%

Note: Because we're using randomly generated data without actual patterns, the model won't achieve high accuracy in this example.

Evaluating the Model

After training, we should evaluate our model on separate validation or test data:

python
def evaluate_model(model, data_loader, criterion):
    model.eval()  # Set the model to evaluation mode

    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():  # No gradient computation during evaluation
        for inputs, targets in data_loader:
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            running_loss += loss.item()
            predicted = (outputs > 0.5).float()
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

    avg_loss = running_loss / len(data_loader)
    accuracy = 100 * correct / total

    print(f"Evaluation - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")

    return avg_loss, accuracy

# Evaluate the model on the same data (in a real scenario, use separate test data)
evaluate_model(trained_model, train_loader, criterion)

Output:

Evaluation - Loss: 0.6921, Accuracy: 68.00%
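
As the comment in the code notes, evaluating on the training data only tells you how well the model fits what it has already seen. In practice you would hold out part of the data for validation; a minimal sketch using torch.utils.data.random_split, with an assumed 80/20 split of the synthetic dataset above, looks like this:

python
from torch.utils.data import random_split

# Hold out 20% of the 100-sample synthetic dataset for validation (assumed split)
train_subset, val_subset = random_split(dataset, [80, 20])

train_loader = DataLoader(train_subset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=16, shuffle=False)

# Train on the training split, then evaluate on the held-out split:
# trained_model = train_model(model, train_loader, criterion, optimizer, num_epochs)
# evaluate_model(trained_model, val_loader, criterion)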

Real-World Example: Training an MNIST Classifier

Let's apply these concepts to a real dataset, the classic MNIST handwritten digit classification task:

python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load MNIST dataset
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

# Create data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Define the model
class MNISTModel(nn.Module):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.relu = nn.ReLU()
        self.max_pool = nn.MaxPool2d(2)
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.dropout(self.conv2(x))))
        x = x.view(-1, 320)  # Flatten
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return self.log_softmax(x)

# Initialize the model and move to device
model = MNISTModel().to(device)
print(model)

# Loss function and optimizer
criterion = nn.NLLLoss()  # Negative Log Likelihood Loss
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function with progress tracking
def train_mnist(model, train_loader, criterion, optimizer, epochs=5):
    model.train()

    for epoch in range(epochs):
        running_loss = 0.0
        correct = 0
        total = 0

        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(output.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()

            # Print progress every 100 batches
            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch+1}/{epochs}, Batch: {batch_idx}/{len(train_loader)}, '
                      f'Loss: {loss.item():.4f}, Accuracy: {100 * correct / total:.2f}%')

        # Print epoch statistics
        epoch_loss = running_loss / len(train_loader)
        accuracy = 100 * correct / total
        print(f"Epoch {epoch+1}/{epochs} Summary - Loss: {epoch_loss:.4f}, Accuracy: {accuracy:.2f}%")

    return model

# Train for just 2 epochs to demonstrate (usually more would be better)
trained_model = train_mnist(model, train_loader, criterion, optimizer, epochs=2)

Output:

Using device: cuda
MNISTModel(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (dropout): Dropout2d(p=0.5, inplace=False)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
  (relu): ReLU()
  (max_pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (log_softmax): LogSoftmax(dim=1)
)
Epoch: 1/2, Batch: 0/938, Loss: 2.3023, Accuracy: 9.38%
Epoch: 1/2, Batch: 100/938, Loss: 0.4033, Accuracy: 85.93%
Epoch: 1/2, Batch: 200/938, Loss: 0.2601, Accuracy: 88.23%
Epoch: 1/2, Batch: 300/938, Loss: 0.2376, Accuracy: 89.28%
Epoch: 1/2, Batch: 400/938, Loss: 0.0909, Accuracy: 89.97%
Epoch: 1/2, Batch: 500/938, Loss: 0.0476, Accuracy: 90.47%
Epoch: 1/2, Batch: 600/938, Loss: 0.0384, Accuracy: 90.85%
Epoch: 1/2, Batch: 700/938, Loss: 0.0876, Accuracy: 91.14%
Epoch: 1/2, Batch: 800/938, Loss: 0.0674, Accuracy: 91.37%
Epoch: 1/2, Batch: 900/938, Loss: 0.1365, Accuracy: 91.58%
Epoch 1/2 Summary - Loss: 0.3196, Accuracy: 91.60%
Epoch: 2/2, Batch: 0/938, Loss: 0.0794, Accuracy: 96.88%
Epoch: 2/2, Batch: 100/938, Loss: 0.0547, Accuracy: 95.35%
Epoch: 2/2, Batch: 200/938, Loss: 0.0742, Accuracy: 95.44%
Epoch: 2/2, Batch: 300/938, Loss: 0.0788, Accuracy: 95.50%
Epoch: 2/2, Batch: 400/938, Loss: 0.0151, Accuracy: 95.55%
Epoch: 2/2, Batch: 500/938, Loss: 0.0224, Accuracy: 95.60%
Epoch: 2/2, Batch: 600/938, Loss: 0.0207, Accuracy: 95.64%
Epoch: 2/2, Batch: 700/938, Loss: 0.0153, Accuracy: 95.67%
Epoch: 2/2, Batch: 800/938, Loss: 0.0364, Accuracy: 95.70%
Epoch: 2/2, Batch: 900/938, Loss: 0.0327, Accuracy: 95.72%
Epoch 2/2 Summary - Loss: 0.1033, Accuracy: 95.73%

Testing the Trained Model

Let's evaluate our trained MNIST model:

python
def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    # Note: this divides the sum of per-batch mean losses by the number of test samples,
    # so the reported "average loss" is much smaller than the true per-sample loss;
    # dividing by len(test_loader) instead would give the average per-batch loss.
    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)

    print(f'\nTest set: Average loss: {test_loss:.4f}, '
          f'Accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.2f}%)')

# Test the trained model
test(trained_model, test_loader)

Output:

Test set: Average loss: 0.0002, Accuracy: 9685/10000 (96.85%)

Saving and Loading the Model

Once you've trained a model, you'll want to save it for future use:

python
# Save the trained model
torch.save(trained_model.state_dict(), 'mnist_model.pth')
print("Model saved to mnist_model.pth")

# Load the model (demonstrate how to load)
model = MNISTModel().to(device)
model.load_state_dict(torch.load('mnist_model.pth', map_location=device))
model.eval()
print("Model loaded successfully")

Output:

Model saved to mnist_model.pth
Model loaded successfully
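
Saving only the state_dict is enough for inference. For long training runs you may also want to checkpoint the optimizer state and the current epoch so training can resume where it left off. Here is a sketch of that pattern; the file name checkpoint.pth and the stored fields are just one common convention, not part of the example above:

python
# Save a resumable checkpoint: model weights, optimizer state, and epoch counter
checkpoint = {
    'epoch': 2,
    'model_state_dict': trained_model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

# Resume later: restore both the model and the optimizer before continuing training
checkpoint = torch.load('checkpoint.pth', map_location=device)
model = MNISTModel().to(device)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer = optim.Adam(model.parameters(), lr=0.001)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']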

Summary

In this guide, we've covered the essential components of training neural networks with PyTorch:

  1. Data preparation and loading using Dataset and DataLoader classes
  2. Model definition by creating classes that inherit from nn.Module
  3. Loss functions to measure model performance
  4. Optimizers to update model parameters
  5. Training loops to iteratively improve the model
  6. Evaluation to assess model performance
  7. Saving and loading models for later use

We demonstrated these concepts with both a simple synthetic example and a real-world MNIST classification task.

The PyTorch training loop follows this general pattern:

python
# Training loop
for epoch in range(num_epochs):
    for inputs, targets in dataloader:
        # 1. Zero gradients
        optimizer.zero_grad()

        # 2. Forward pass
        outputs = model(inputs)

        # 3. Compute loss
        loss = criterion(outputs, targets)

        # 4. Backward pass
        loss.backward()

        # 5. Update weights
        optimizer.step()

This pattern is remarkably consistent across different neural network architectures and tasks, making it a fundamental skill for deep learning practitioners.

Exercises

  1. Basic Exercise: Modify the MNIST model to improve its accuracy (hint: try adding more layers or changing the activation functions).

  2. Intermediate Exercise: Implement a validation loop during training to monitor for overfitting.

  3. Advanced Exercise: Implement a training loop with learning rate scheduling and early stopping.

  4. Challenge: Adapt the code to train on the CIFAR-10 dataset, which contains color images in 10 classes.

  5. Project: Build a complete image classification system with data augmentation, model training, evaluation, and a simple interface for making predictions on new images.

Remember that practice is key to mastering these concepts. Happy training!
