
PyTorch Model Building

Introduction

Building neural network models is at the heart of deep learning, and PyTorch provides a flexible and intuitive framework for designing and implementing these models. Whether you're creating a simple feedforward network or a complex architecture, understanding how to build models in PyTorch is an essential skill.

In this tutorial, we'll explore how to create neural network models in PyTorch from scratch. We'll start with basic concepts and gradually move to more complex model structures. By the end, you'll be comfortable defining your own neural network architectures and ready to solve real-world problems.

Prerequisites

Before diving in, make sure you have:

  • Basic understanding of Python
  • PyTorch installed (pip install torch torchvision)
  • Familiarity with basic neural network concepts (neurons, layers, activation functions)

The Building Blocks: nn.Module

At the foundation of PyTorch model building is the nn.Module class. This is the base class for all neural network modules in PyTorch.

Understanding nn.Module

nn.Module is a powerful class that provides:

  • Parameter management
  • GPU support
  • Export functionality
  • Component composition

Let's start by creating a simple neural network using nn.Module:

python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNetwork(nn.Module):
    def __init__(self):
        super(SimpleNetwork, self).__init__()
        # Define layers
        self.fc1 = nn.Linear(in_features=10, out_features=50)
        self.fc2 = nn.Linear(in_features=50, out_features=20)
        self.fc3 = nn.Linear(in_features=20, out_features=2)

    def forward(self, x):
        # Define forward pass
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Instantiate the model
model = SimpleNetwork()
print(model)

Output:

SimpleNetwork(
  (fc1): Linear(in_features=10, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=2, bias=True)
)
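
Because SimpleNetwork subclasses nn.Module, the conveniences listed above come for free. Here's a minimal sketch, reusing the model instance created above (the simple_network.pt file name is just illustrative):

python
# Parameter management: nn.Module tracks every layer registered in __init__
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# Saving and loading: state_dict() collects all learnable parameters
torch.save(model.state_dict(), "simple_network.pt")
model.load_state_dict(torch.load("simple_network.pt"))

# GPU support: a single call moves every parameter and buffer
# model.to(torch.device("cuda"))  # uncomment if a GPU is available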

Key Components of a PyTorch Model

Every PyTorch model has two essential methods:

  1. __init__: Where you define the layers and components
  2. forward: Where you specify how data passes through the network
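
Note that you never call forward directly. Calling the model instance itself runs nn.Module's __call__ machinery, which dispatches to forward (and runs any registered hooks). A quick sketch with the SimpleNetwork defined above:

python
x = torch.randn(4, 10)  # a batch of 4 samples with 10 features each
out = model(x)          # preferred: goes through __call__, which calls forward
print(out.shape)        # torch.Size([4, 2])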

Creating Custom Layers

You can create custom layers by extending the nn.Module class:

python
class CustomLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super(CustomLayer, self).__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.activation = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.linear(x)
        x = self.activation(x)
        x = self.dropout(x)
        return x

# Using our custom layer
class NetworkWithCustomLayer(nn.Module):
    def __init__(self):
        super(NetworkWithCustomLayer, self).__init__()
        self.layer1 = CustomLayer(10, 50)
        self.layer2 = CustomLayer(50, 20)
        self.output = nn.Linear(20, 2)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.output(x)
        return x
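
As a quick sanity check (a small sketch with random data), the composed network behaves like any other nn.Module:

python
net = NetworkWithCustomLayer()
dummy = torch.randn(8, 10)   # batch of 8 samples with 10 features each
print(net(dummy).shape)      # torch.Size([8, 2])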

Common Neural Network Architectures

Let's explore implementing some common neural network architectures in PyTorch.

Feedforward Neural Network (Multi-Layer Perceptron)

python
class FeedForwardNN(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate=0.2):
        super(FeedForwardNN, self).__init__()

        # Input layer
        layers = [nn.Linear(input_size, hidden_sizes[0]), nn.ReLU(), nn.Dropout(dropout_rate)]

        # Hidden layers
        for i in range(len(hidden_sizes) - 1):
            layers.append(nn.Linear(hidden_sizes[i], hidden_sizes[i+1]))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))

        # Output layer
        layers.append(nn.Linear(hidden_sizes[-1], output_size))

        # Combine all layers into a sequential model
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

# Example usage
model = FeedForwardNN(
    input_size=28*28,              # e.g., for MNIST
    hidden_sizes=[512, 256, 128],
    output_size=10                 # 10 classes
)

# Create a dummy input
dummy_input = torch.randn(64, 28*28) # Batch size of 64
output = model(dummy_input)
print(f"Input shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")

Output:

Input shape: torch.Size([64, 784])
Output shape: torch.Size([64, 10])
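
When you experiment with different hidden_sizes, it's useful to see how quickly the parameter count grows. A one-line sketch using parameters():

python
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")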

Convolutional Neural Network (CNN)

python
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)

        # Pooling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, num_classes)

        # Dropout
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # First conv block
        x = F.relu(self.conv1(x))
        x = self.pool(x)

        # Second conv block
        x = F.relu(self.conv2(x))
        x = self.pool(x)

        # Flatten
        x = x.view(-1, 64 * 7 * 7)

        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)

        return x

# Example usage for MNIST
cnn = SimpleCNN(num_classes=10)

# Create a dummy input (batch_size, channels, height, width)
dummy_input = torch.randn(64, 1, 28, 28)
output = cnn(dummy_input)
print(f"Input shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")

Output:

Input shape: torch.Size([64, 1, 28, 28])
Output shape: torch.Size([64, 10])
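
The 64 * 7 * 7 in fc1 follows from the input resolution: each padding=1, kernel_size=3 convolution preserves the 28x28 spatial size, while each 2x2 max pool halves it, giving 28 -> 14 -> 7 with 64 channels after conv2. A small sketch (assuming MNIST-sized 28x28 inputs) that traces the shapes:

python
with torch.no_grad():
    feat = torch.randn(1, 1, 28, 28)
    feat = cnn.pool(F.relu(cnn.conv1(feat)))
    print(feat.shape)  # torch.Size([1, 32, 14, 14])
    feat = cnn.pool(F.relu(cnn.conv2(feat)))
    print(feat.shape)  # torch.Size([1, 64, 7, 7])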

Recurrent Neural Network (RNN)

python
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(SimpleRNN, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Recurrent layer (an LSTM, a gated variant of the RNN)
        self.rnn = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True
        )

        # Output layer
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # The LSTM initializes its hidden and cell states to zeros
        # when no initial states are passed in.

        # Forward propagate through the RNN
        # output shape: (batch_size, seq_length, hidden_size)
        output, _ = self.rnn(x)

        # Take the output from the last time step
        # output shape: (batch_size, hidden_size)
        output = output[:, -1, :]

        # Pass through the fully connected layer
        output = self.fc(output)
        return output

# Example usage for sequence data
rnn_model = SimpleRNN(input_size=10, hidden_size=64, output_size=5, num_layers=2)

# Create a dummy sequence input (batch_size, sequence_length, feature_size)
dummy_input = torch.randn(32, 20, 10)
output = rnn_model(dummy_input)
print(f"Input shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")

Output:

Input shape: torch.Size([32, 20, 10])
Output shape: torch.Size([32, 5])
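
If you want control over the initial hidden and cell states rather than relying on the default zeros, you can pass them explicitly. A minimal sketch reusing rnn_model and dummy_input from above (nn.LSTM expects states shaped (num_layers, batch_size, hidden_size)):

python
batch_size = dummy_input.size(0)
h0 = torch.zeros(rnn_model.num_layers, batch_size, rnn_model.hidden_size)
c0 = torch.zeros(rnn_model.num_layers, batch_size, rnn_model.hidden_size)
output, (hn, cn) = rnn_model.rnn(dummy_input, (h0, c0))
print(output.shape)  # torch.Size([32, 20, 64])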

Training Your PyTorch Models

Once you've defined your model architecture, you need to train it. Here's a basic training loop in PyTorch:

python
def train_model(model, train_loader, criterion, optimizer, device, num_epochs=5):
    model.train()  # Set model to training mode

    for epoch in range(num_epochs):
        running_loss = 0.0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            # Backward pass and optimize
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * inputs.size(0)

        epoch_loss = running_loss / len(train_loader.dataset)
        print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}')

# Example usage (assuming we have a DataLoader already set up)
model = FeedForwardNN(input_size=784, hidden_sizes=[512, 256], output_size=10)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Assuming train_loader is defined
# train_model(model, train_loader, criterion, optimizer, device, num_epochs=5)
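
If you don't have a real dataset wired up yet, a throwaway DataLoader built from random tensors (purely a stand-in for real data) is enough to verify the loop runs end to end:

python
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 1,000 random feature vectors with random class labels
fake_inputs = torch.randn(1000, 784)
fake_labels = torch.randint(0, 10, (1000,))
train_loader = DataLoader(TensorDataset(fake_inputs, fake_labels), batch_size=64, shuffle=True)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=2)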

Model Evaluation

After training, you'll want to evaluate your model:

python
def evaluate_model(model, test_loader, criterion, device):
    model.eval()  # Set model to evaluation mode

    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():  # Disable gradient calculation for inference
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            # Calculate statistics
            running_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    test_loss = running_loss / len(test_loader.dataset)
    accuracy = correct / total

    print(f'Test Loss: {test_loss:.4f}, Accuracy: {accuracy:.4f}')
    return test_loss, accuracy

# Example usage (assuming test_loader is defined)
# test_loss, accuracy = evaluate_model(model, test_loader, criterion, device)

Real-World Example: Handwritten Digit Recognition

Let's put everything together with a real-world example using the MNIST dataset:

python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load MNIST dataset
train_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000)

# Define the CNN model
class MNISTClassifier(nn.Module):
    def __init__(self):
        super(MNISTClassifier, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

# Setup model, criterion, and optimizer
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MNISTClassifier().to(device)
criterion = nn.NLLLoss()  # the model already returns log-probabilities (log_softmax), so use NLLLoss rather than CrossEntropyLoss
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

# Training function
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                  f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')

# Testing function
def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # Scale the batch-mean loss by the batch size so the final
            # division by the dataset size gives a per-sample average
            test_loss += criterion(output, target).item() * data.size(0)
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)

    print(f'\nTest set: Average loss: {test_loss:.4f}, '
          f'Accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.0f}%)\n')

# Train and test the model
# for epoch in range(1, 3):
# train(model, device, train_loader, optimizer, epoch)
# test(model, device, test_loader)

# Save the model
# torch.save(model.state_dict(), "mnist_cnn.pt")
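
To reuse the saved weights later (assuming the commented-out torch.save call above was run), create a fresh model instance, load the state_dict back in, and switch to eval mode before inference:

python
# Load the saved weights into a new model instance
loaded_model = MNISTClassifier().to(device)
loaded_model.load_state_dict(torch.load("mnist_cnn.pt", map_location=device))
loaded_model.eval()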

Advanced Model Building Techniques

Once you're comfortable with basic model building, you can explore advanced techniques:

Using Sequential Containers

python
# Build model with nn.Sequential
sequential_model = nn.Sequential(
    nn.Conv2d(1, 32, 3, 1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, 1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(1600, 128),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 10)
)
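
The call convention is the same as for any nn.Module: the container simply runs its children in order. A quick sketch with a random single-channel 28x28 batch (the 1600 in the first Linear comes from 64 channels x 5 x 5 after the two unpadded convolutions and pools):

python
dummy = torch.randn(16, 1, 28, 28)
print(sequential_model(dummy).shape)  # torch.Size([16, 10])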

Using ModuleList and ModuleDict

python
class DynamicNetwork(nn.Module):
    def __init__(self, layer_sizes):
        super(DynamicNetwork, self).__init__()

        self.layers = nn.ModuleList()

        # Add layers dynamically
        for i in range(len(layer_sizes) - 1):
            self.layers.append(nn.Linear(layer_sizes[i], layer_sizes[i+1]))

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            # Apply ReLU to all but the last layer
            if i < len(self.layers) - 1:
                x = F.relu(x)
        return x

# Create a model with custom layer sizes
dynamic_model = DynamicNetwork([784, 512, 256, 128, 10])
print(dynamic_model)
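
nn.ModuleDict works the same way but keys its submodules by name, which is handy when the forward pass picks components dynamically. A minimal sketch (the class name and the selectable activations are just illustrative):

python
class ConfigurableNet(nn.Module):
    def __init__(self, activation="relu"):
        super(ConfigurableNet, self).__init__()
        self.backbone = nn.Linear(784, 128)
        # Registered by name, so parameters are tracked and the choice stays configurable
        self.activations = nn.ModuleDict({
            "relu": nn.ReLU(),
            "tanh": nn.Tanh(),
        })
        self.head = nn.Linear(128, 10)
        self.activation = activation

    def forward(self, x):
        x = self.activations[self.activation](self.backbone(x))
        return self.head(x)

configurable_model = ConfigurableNet(activation="tanh")
print(configurable_model(torch.randn(4, 784)).shape)  # torch.Size([4, 10])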

Weight Initialization

python
def initialize_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

# Apply weight initialization to the model
model = FeedForwardNN(input_size=784, hidden_sizes=[512, 256], output_size=10)
model.apply(initialize_weights)

Summary

In this tutorial, we've covered the essential aspects of building neural network models with PyTorch:

  1. The Foundation: Understanding nn.Module as the base class for all PyTorch models
  2. Building Custom Models: Creating your own network architectures
  3. Common Architectures: Implementing feedforward networks, CNNs, and RNNs
  4. Training and Evaluation: Setting up training loops and evaluation procedures
  5. Real-World Application: Building a digit classifier with the MNIST dataset
  6. Advanced Techniques: Using sequential containers, module lists, and weight initialization

PyTorch's flexible design makes it a powerful tool for building neural networks, from simple prototypes to complex research models. By understanding these building blocks, you can create custom architectures tailored to your specific needs.

Further Resources

To continue your PyTorch journey, explore the official PyTorch documentation and tutorials at pytorch.org.

Exercises

  1. Modify the FeedForwardNN class to include batch normalization after each hidden layer.
  2. Implement a ResNet-style network with skip connections.
  3. Create a model that combines CNN and LSTM layers for image sequence analysis.
  4. Build a simple autoencoder for dimensionality reduction on the MNIST dataset.
  5. Implement transfer learning by using a pre-trained model from torchvision.models and fine-tuning it on a new dataset.

Happy modeling with PyTorch!


