PyTorch Training Basics

Welcome to the world of PyTorch training! This guide will walk you through the foundational concepts and practical implementation of training neural networks using PyTorch. By the end, you'll understand how to construct and execute an effective training loop for your deep learning models.

Introduction

Training a neural network involves an iterative process where a model learns patterns from data by adjusting its internal parameters. PyTorch provides a flexible framework for this process, giving developers control over each step while handling complex computations under the hood.

At its core, PyTorch training consists of these key components:

  • Data preparation and loading
  • Model definition
  • Loss function selection
  • Optimizer configuration
  • The training loop itself

Let's explore each of these components in detail.

Data Preparation and Loading

Before training begins, you need to prepare and load your data in a format PyTorch can work with.

Using PyTorch Datasets and DataLoaders

python
import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np

# Create a simple dataset
class SimpleDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]

# Generate synthetic data: input features and target values
x = np.random.rand(100, 5).astype(np.float32)  # 100 samples, 5 features each
y = np.random.randint(0, 2, size=(100, 1)).astype(np.float32)  # Binary targets

# Create dataset
dataset = SimpleDataset(
    torch.from_numpy(x),
    torch.from_numpy(y)
)

# Create data loader
train_loader = DataLoader(dataset=dataset, batch_size=16, shuffle=True)

print(f"Dataset size: {len(dataset)}")
print(f"Number of batches: {len(train_loader)}")

Output:

Dataset size: 100
Number of batches: 7

The DataLoader splits our dataset into batches, which helps with memory efficiency and can improve training dynamics.
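
To see what the DataLoader actually yields, here is a quick sketch (assuming the train_loader defined above) that pulls a single batch and prints its shapes:

python
# Grab one batch from the DataLoader (assumes train_loader from above)
batch_x, batch_y = next(iter(train_loader))
print(batch_x.shape)  # torch.Size([16, 5]) -- 16 samples per batch, 5 features each
print(batch_y.shape)  # torch.Size([16, 1]) -- one binary target per sample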

Model Definition

Next, we need to define our neural network model. PyTorch makes this easy with its nn.Module class:

python
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleModel, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        x = self.sigmoid(x)
        return x

# Initialize the model
input_size = 5
hidden_size = 10
output_size = 1

model = SimpleModel(input_size, hidden_size, output_size)
print(model)

Output:

SimpleModel(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (activation): ReLU()
  (layer2): Linear(in_features=10, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
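
The printed summary lists each submodule. It can also be handy to count the model's trainable parameters; a small sketch using the model defined above:

python
# Count trainable parameters: layer1 has 5*10 + 10 = 60, layer2 has 10*1 + 1 = 11
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params}")  # 71 for this configuration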

Loss Function

The loss function measures how far the model's predictions are from the ground truth:

python
# Define loss function
criterion = nn.BCELoss() # Binary Cross Entropy Loss for binary classification

# Example of how loss is calculated
x_sample = torch.randn(1, 5) # One random sample
y_pred = model(x_sample)
y_true = torch.tensor([[1.0]]) # Ground truth (positive class)

loss = criterion(y_pred, y_true)
print(f"Prediction: {y_pred.item():.4f}")
print(f"Target: {y_true.item()}")
print(f"Loss: {loss.item():.4f}")

Output:

Prediction: 0.5378
Target: 1.0
Loss: 0.6188

Optimizer

The optimizer updates the model's parameters based on the calculated loss:

python
# Define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Example of using the optimizer
optimizer.zero_grad() # Clear previous gradients
loss.backward() # Compute gradients
optimizer.step() # Update parameters

# Check model predictions after one update step
new_pred = model(x_sample)
print(f"Prediction after update: {new_pred.item():.4f}")

Output:

Prediction after update: 0.5392
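
Adam is a sensible default, but every PyTorch optimizer exposes the same zero_grad/step interface, so swapping one in is a one-line change. As an illustration only (the rest of this guide keeps the Adam optimizer configured above), plain SGD with momentum would look like this:

python
# Alternative optimizer: stochastic gradient descent with momentum
# (illustrative only; not used elsewhere in this guide)
sgd_optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)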

Putting It Together: The Basic Training Loop

Now let's implement a complete training loop:

python
def train_model(model, train_loader, criterion, optimizer, num_epochs):
    model.train()  # Set model to training mode

    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0

        for i, (inputs, targets) in enumerate(train_loader):
            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass: compute predictions
            outputs = model(inputs)

            # Compute loss
            loss = criterion(outputs, targets)

            # Backward pass: compute gradients
            loss.backward()

            # Update parameters
            optimizer.step()

            # Track statistics
            running_loss += loss.item()
            predicted = (outputs > 0.5).float()
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

        # Print epoch statistics
        epoch_loss = running_loss / len(train_loader)
        accuracy = 100 * correct / total
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}, Accuracy: {accuracy:.2f}%")

    return model

# Train the model
num_epochs = 10
trained_model = train_model(model, train_loader, criterion, optimizer, num_epochs)

Output:

Epoch 1/10, Loss: 0.6932, Accuracy: 48.00%
Epoch 2/10, Loss: 0.6931, Accuracy: 51.00%
Epoch 3/10, Loss: 0.6930, Accuracy: 49.00%
Epoch 4/10, Loss: 0.6929, Accuracy: 52.00%
Epoch 5/10, Loss: 0.6928, Accuracy: 54.00%
Epoch 6/10, Loss: 0.6927, Accuracy: 56.00%
Epoch 7/10, Loss: 0.6926, Accuracy: 58.00%
Epoch 8/10, Loss: 0.6925, Accuracy: 60.00%
Epoch 9/10, Loss: 0.6923, Accuracy: 63.00%
Epoch 10/10, Loss: 0.6922, Accuracy: 65.00%

Note: Because we're using randomly generated data without actual patterns, the model won't achieve high accuracy in this example.

Evaluating the Model

After training, we should evaluate our model on separate validation or test data:

python
def evaluate_model(model, data_loader, criterion):
    model.eval()  # Set the model to evaluation mode

    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():  # No gradient computation during evaluation
        for inputs, targets in data_loader:
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            running_loss += loss.item()
            predicted = (outputs > 0.5).float()
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

    avg_loss = running_loss / len(data_loader)
    accuracy = 100 * correct / total

    print(f"Evaluation - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")

    return avg_loss, accuracy

# Evaluate the model on the same data (in a real scenario, use separate test data)
evaluate_model(trained_model, train_loader, criterion)

Output:

Evaluation - Loss: 0.6921, Accuracy: 68.00%
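
As the comment in the code notes, evaluating on the training data only tells you how well the model fits what it has already seen. In practice you would hold out part of the data for validation; a minimal sketch using torch.utils.data.random_split, with an assumed 80/20 split of the synthetic dataset above, looks like this:

python
from torch.utils.data import random_split

# Hold out 20% of the 100-sample synthetic dataset for validation (assumed split)
train_subset, val_subset = random_split(dataset, [80, 20])

train_loader = DataLoader(train_subset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=16, shuffle=False)

# Train on the training split, then evaluate on the held-out split:
# trained_model = train_model(model, train_loader, criterion, optimizer, num_epochs)
# evaluate_model(trained_model, val_loader, criterion)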

Real-World Example: Training an MNIST Classifier

Let's apply these concepts to a real dataset, the classic MNIST handwritten digit classification task:

python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load MNIST dataset
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

# Create data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Define the model
class MNISTModel(nn.Module):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.relu = nn.ReLU()
        self.max_pool = nn.MaxPool2d(2)
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.dropout(self.conv2(x))))
        x = x.view(-1, 320)  # Flatten
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return self.log_softmax(x)

# Initialize the model and move to device
model = MNISTModel().to(device)
print(model)

# Loss function and optimizer
criterion = nn.NLLLoss()  # Negative Log Likelihood Loss
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function with progress tracking
def train_mnist(model, train_loader, criterion, optimizer, epochs=5):
    model.train()

    for epoch in range(epochs):
        running_loss = 0.0
        correct = 0
        total = 0

        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(output.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()

            # Print progress every 100 batches
            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch+1}/{epochs}, Batch: {batch_idx}/{len(train_loader)}, '
                      f'Loss: {loss.item():.4f}, Accuracy: {100 * correct / total:.2f}%')

        # Print epoch statistics
        epoch_loss = running_loss / len(train_loader)
        accuracy = 100 * correct / total
        print(f"Epoch {epoch+1}/{epochs} Summary - Loss: {epoch_loss:.4f}, Accuracy: {accuracy:.2f}%")

    return model

# Train for just 2 epochs to demonstrate (usually more would be better)
trained_model = train_mnist(model, train_loader, criterion, optimizer, epochs=2)

Output:

Using device: cuda
MNISTModel(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (dropout): Dropout2d(p=0.5, inplace=False)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
  (relu): ReLU()
  (max_pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (log_softmax): LogSoftmax(dim=1)
)
Epoch: 1/2, Batch: 0/938, Loss: 2.3023, Accuracy: 9.38%
Epoch: 1/2, Batch: 100/938, Loss: 0.4033, Accuracy: 85.93%
Epoch: 1/2, Batch: 200/938, Loss: 0.2601, Accuracy: 88.23%
Epoch: 1/2, Batch: 300/938, Loss: 0.2376, Accuracy: 89.28%
Epoch: 1/2, Batch: 400/938, Loss: 0.0909, Accuracy: 89.97%
Epoch: 1/2, Batch: 500/938, Loss: 0.0476, Accuracy: 90.47%
Epoch: 1/2, Batch: 600/938, Loss: 0.0384, Accuracy: 90.85%
Epoch: 1/2, Batch: 700/938, Loss: 0.0876, Accuracy: 91.14%
Epoch: 1/2, Batch: 800/938, Loss: 0.0674, Accuracy: 91.37%
Epoch: 1/2, Batch: 900/938, Loss: 0.1365, Accuracy: 91.58%
Epoch 1/2 Summary - Loss: 0.3196, Accuracy: 91.60%
Epoch: 2/2, Batch: 0/938, Loss: 0.0794, Accuracy: 96.88%
Epoch: 2/2, Batch: 100/938, Loss: 0.0547, Accuracy: 95.35%
Epoch: 2/2, Batch: 200/938, Loss: 0.0742, Accuracy: 95.44%
Epoch: 2/2, Batch: 300/938, Loss: 0.0788, Accuracy: 95.50%
Epoch: 2/2, Batch: 400/938, Loss: 0.0151, Accuracy: 95.55%
Epoch: 2/2, Batch: 500/938, Loss: 0.0224, Accuracy: 95.60%
Epoch: 2/2, Batch: 600/938, Loss: 0.0207, Accuracy: 95.64%
Epoch: 2/2, Batch: 700/938, Loss: 0.0153, Accuracy: 95.67%
Epoch: 2/2, Batch: 800/938, Loss: 0.0364, Accuracy: 95.70%
Epoch: 2/2, Batch: 900/938, Loss: 0.0327, Accuracy: 95.72%
Epoch 2/2 Summary - Loss: 0.1033, Accuracy: 95.73%

Testing the Trained Model

Let's evaluate our trained MNIST model:

python
def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    # Note: this divides the sum of per-batch mean losses by the number of test samples,
    # so the reported "average loss" is much smaller than the true per-sample loss;
    # dividing by len(test_loader) instead would give the average per-batch loss.
    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)

    print(f'\nTest set: Average loss: {test_loss:.4f}, '
          f'Accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.2f}%)')

# Test the trained model
test(trained_model, test_loader)

Output:

Test set: Average loss: 0.0002, Accuracy: 9685/10000 (96.85%)

Saving and Loading the Model

Once you've trained a model, you'll want to save it for future use:

python
# Save the trained model
torch.save(trained_model.state_dict(), 'mnist_model.pth')
print("Model saved to mnist_model.pth")

# Load the model (demonstrate how to load)
model = MNISTModel().to(device)
model.load_state_dict(torch.load('mnist_model.pth', map_location=device))
model.eval()
print("Model loaded successfully")

Output:

Model saved to mnist_model.pth
Model loaded successfully
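
Saving only the state_dict is enough for inference. For long training runs you may also want to checkpoint the optimizer state and the current epoch so training can resume where it left off. Here is a sketch of that pattern; the file name checkpoint.pth and the stored fields are just one common convention, not part of the example above:

python
# Save a resumable checkpoint: model weights, optimizer state, and epoch counter
checkpoint = {
    'epoch': 2,
    'model_state_dict': trained_model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

# Resume later: restore both the model and the optimizer before continuing training
checkpoint = torch.load('checkpoint.pth', map_location=device)
model = MNISTModel().to(device)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer = optim.Adam(model.parameters(), lr=0.001)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']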

Summary

In this guide, we've covered the essential components of training neural networks with PyTorch:

  1. Data preparation and loading using Dataset and DataLoader classes
  2. Model definition by creating classes that inherit from nn.Module
  3. Loss functions to measure model performance
  4. Optimizers to update model parameters
  5. Training loops to iteratively improve the model
  6. Evaluation to assess model performance
  7. Saving and loading models for later use

We demonstrated these concepts with both a simple synthetic example and a real-world MNIST classification task.

The PyTorch training loop follows this general pattern:

python
# Training loop
for epoch in range(num_epochs):
    for inputs, targets in dataloader:
        # 1. Zero gradients
        optimizer.zero_grad()

        # 2. Forward pass
        outputs = model(inputs)

        # 3. Compute loss
        loss = criterion(outputs, targets)

        # 4. Backward pass
        loss.backward()

        # 5. Update weights
        optimizer.step()

This pattern is remarkably consistent across different neural network architectures and tasks, making it a fundamental skill for deep learning practitioners.

Exercises

  1. Basic Exercise: Modify the MNIST model to improve its accuracy (hint: try adding more layers or changing the activation functions).

  2. Intermediate Exercise: Implement a validation loop during training to monitor for overfitting.

  3. Advanced Exercise: Implement a training loop with learning rate scheduling and early stopping.

  4. Challenge: Adapt the code to train on the CIFAR-10 dataset, which contains color images in 10 classes.

  5. Project: Build a complete image classification system with data augmentation, model training, evaluation, and a simple interface for making predictions on new images.

Remember that practice is key to mastering these concepts. Happy training!
