PyTorch Sequential
Introduction
When building neural networks in PyTorch, organizing and connecting layers can sometimes become verbose and difficult to manage. This is where the nn.Sequential
container comes in handy. It provides an elegant way to stack neural network layers sequentially, making your code cleaner and more readable.
In this tutorial, you'll learn:
- What nn.Sequential is and why it's useful
- How to create neural networks using Sequential
- How Sequential compares with the traditional subclassing approach
- Best practices and common patterns
What is PyTorch Sequential?
torch.nn.Sequential is a container that runs its defined layers in sequence, one after another. It's the simplest way to compose a neural network when your architecture follows a linear flow (meaning each layer feeds directly into the next one with no branching or complex connections).
Think of it as placing building blocks in a straight line - the output of one block becomes the input to the next.
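Here's a minimal illustration of that idea (the layer sizes are arbitrary): calling a Sequential is the same as passing the input through each contained layer by hand.
import torch
import torch.nn as nn
# A tiny Sequential with two layers
block = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU()
)
x = torch.randn(2, 4)
# Calling the container is equivalent to chaining the layers manually
out_container = block(x)
out_manual = block[1](block[0](x))
print(torch.equal(out_container, out_manual))  # True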
Creating Your First Sequential Model
Let's start by building a simple neural network that classifies images:
import torch
import torch.nn as nn
import torch.nn.functional as F
# Creating a sequential model
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)
# Let's see what our model looks like
print(model)
Output:
Sequential(
  (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Flatten(start_dim=1, end_dim=-1)
  (7): Linear(in_features=3136, out_features=128, bias=True)
  (8): ReLU()
  (9): Linear(in_features=128, out_features=10, bias=True)
)
As you can see, we've created a Convolutional Neural Network (CNN) for image classification. Each layer is automatically indexed, making it easy to identify components.
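For example, you can index into the model to inspect or reuse individual layers (a quick sketch using the model defined above):
# Access a single layer by its index
print(model[0])               # the first Conv2d layer
print(model[0].weight.shape)  # torch.Size([32, 1, 3, 3])
# Slicing returns a new Sequential containing a subset of the layers
feature_part = model[:6]      # everything before Flatten
print(feature_part)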
Using the Sequential Model
Let's see how we can use this model to process an image:
# Create a random input tensor (batch_size, channels, height, width)
x = torch.randn(1, 1, 28, 28) # Simulating a MNIST image
# Pass the input through the model
output = model(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
print(f"Output values: {output}")
Output:
Input shape: torch.Size([1, 1, 28, 28])
Output shape: torch.Size([1, 10])
Output values: tensor([[ 0.1257, -0.0889, 0.1633, -0.1621, -0.0031, 0.0645, -0.0034,
-0.0138, 0.0091, -0.0254]], grad_fn=<AddmmBackward0>)
The model takes our 28×28 image with 1 channel and outputs 10 values (one for each class).
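These are raw logits; to get class probabilities or a predicted label, you would typically apply softmax or argmax. With an untrained model the prediction is random, but the mechanics are the same after training:
probabilities = torch.softmax(output, dim=1)  # convert logits to probabilities
predicted_class = output.argmax(dim=1)        # index of the largest logit
print(f"Predicted class: {predicted_class.item()}")
print(f"Probabilities sum to: {probabilities.sum().item():.4f}")  # ~1.0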
Named Layers in Sequential
Sometimes, it's helpful to name your layers for better organization and easier access. With Sequential, you can use an OrderedDict to name each layer:
from collections import OrderedDict
named_model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 32, kernel_size=3, padding=1)),
    ('relu1', nn.ReLU()),
    ('pool1', nn.MaxPool2d(kernel_size=2)),
    ('conv2', nn.Conv2d(32, 64, kernel_size=3, padding=1)),
    ('relu2', nn.ReLU()),
    ('pool2', nn.MaxPool2d(kernel_size=2)),
    ('flatten', nn.Flatten()),
    ('fc1', nn.Linear(64 * 7 * 7, 128)),
    ('relu3', nn.ReLU()),
    ('fc2', nn.Linear(128, 10))
]))
print(named_model)
# Accessing a specific layer by name
print(named_model.conv1)
Output:
Sequential(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=3136, out_features=128, bias=True)
  (relu3): ReLU()
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)
Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
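Named layers also make it easy to iterate over the model or adjust a specific part, for example freezing the first convolution (a small sketch using named_model from above):
# Iterate over (name, module) pairs
for name, module in named_model.named_children():
    print(name, type(module).__name__)
# Freeze the parameters of one named layer
for param in named_model.conv1.parameters():
    param.requires_grad = False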
Sequential vs. Subclassing nn.Module
To understand the benefits of nn.Sequential, let's compare it with the traditional approach of subclassing nn.Module:
Using Sequential
sequential_model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 15),
    nn.ReLU(),
    nn.Linear(15, 5)
)
Using nn.Module subclassing
class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 15)
        self.fc3 = nn.Linear(15, 5)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
custom_model = CustomModel()
As you can see, the Sequential approach is more concise. However, nn.Module subclassing offers more flexibility when you need custom operations or architectures that aren't a simple linear stack.
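Both definitions describe the same architecture, which you can confirm by comparing parameter counts and output shapes (the weights are initialized independently, so the actual output values will differ):
x = torch.randn(4, 10)
# Same number of trainable parameters in both models
print(sum(p.numel() for p in sequential_model.parameters()))  # 615
print(sum(p.numel() for p in custom_model.parameters()))      # 615
# Same output shape
print(sequential_model(x).shape)  # torch.Size([4, 5])
print(custom_model(x).shape)      # torch.Size([4, 5])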
Adding Layers Dynamically
One nice feature of Sequential is the ability to add layers after initialization:
model = nn.Sequential()
model.add_module('conv1', nn.Conv2d(1, 32, 3, padding=1))
model.add_module('relu1', nn.ReLU())
model.add_module('pool1', nn.MaxPool2d(2))
# You can even append multiple layers as another Sequential
model.add_module('additional_layers', nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2)
))
print(model)
Output:
Sequential(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (additional_layers): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
)
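Recent PyTorch versions (1.11 and later) also provide an append method, which adds a module to the end without requiring an explicit name:
# append assigns the next numeric index automatically
model.append(nn.Flatten())
print(model)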
Practical Example: MNIST Classifier
Let's build a complete example of training a neural network on the MNIST dataset using nn.Sequential:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
# Load MNIST dataset
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000)
# Create a model using Sequential
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 10)
)
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
def train(model, epochs):
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for batch_idx, (data, target) in enumerate(train_loader):
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if batch_idx % 100 == 99:
                print(f'Epoch: {epoch+1}, Batch: {batch_idx+1}, Loss: {running_loss/100:.4f}')
                running_loss = 0.0

        # Evaluate on test set
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in test_loader:
                output = model(data)
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()
        print(f'Epoch: {epoch+1}, Test Accuracy: {correct/len(test_loader.dataset):.4f}')
# Uncomment to train the model (takes a few minutes)
# train(model, epochs=3)
This example creates a simple fully-connected neural network for MNIST digit classification using Sequential. The model consists of three linear layers with ReLU activations and dropout regularization.
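Once trained, the same model can be used for inference on a single example; here's a minimal sketch (assuming you've run the training loop above):
# Predict the label of one test image
model.eval()
image, label = test_dataset[0]
with torch.no_grad():
    logits = model(image.unsqueeze(0))        # add a batch dimension
    prediction = logits.argmax(dim=1).item()
print(f"Predicted digit: {prediction}, true digit: {label}")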
When to Use Sequential
Use nn.Sequential when:
- Your network has a simple linear topology (each layer feeds into the next)
- You don't need custom operations between layers
- You want concise, readable code
- You're building prototype models quickly
Avoid nn.Sequential when:
- Your architecture has branches or skip connections (see the sketch after this list)
- You need to reuse intermediate outputs
- Your forward pass has conditional logic
- Your layers need to share weights or states
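For instance, a residual (skip) connection reuses the block's input in its output, which a plain Sequential cannot express on its own; here's a minimal sketch:
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # The inner transformation can still be a Sequential
        self.block = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim)
        )

    def forward(self, x):
        # The skip connection requires custom forward logic
        return x + self.block(x)

res_block = ResidualBlock(32)
print(res_block(torch.randn(8, 32)).shape)  # torch.Size([8, 32])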
Sequential Within Custom Modules
For more advanced architectures, you can combine nn.Sequential blocks with a custom nn.Module subclass for even more flexibility:
class AdvancedModel(nn.Module):
    def __init__(self):
        super(AdvancedModel, self).__init__()
        # Feature extraction block using Sequential
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        # Classifier block using Sequential
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        # Use Sequential blocks within a custom forward pass
        features = self.feature_extractor(x)
        # You can do custom operations here
        if self.training:
            # Apply some special processing during training
            features = features * 0.9
        output = self.classifier(features)
        return output
advanced_model = AdvancedModel()
print(advanced_model)
Output:
AdvancedModel(
  (feature_extractor): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=8192, out_features=256, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)
This approach gives you the readability benefits of Sequential while maintaining the flexibility of custom nn.Module subclasses.
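A quick shape check shows how the two blocks fit together; the classifier above expects 32×32 RGB inputs, since the two pooling layers reduce 32 to 8 (a small sketch):
dummy_input = torch.randn(4, 3, 32, 32)  # batch of 4 RGB images, 32x32 pixels
advanced_model.eval()                    # skip the training-only scaling branch
with torch.no_grad():
    output = advanced_model(dummy_input)
print(output.shape)  # torch.Size([4, 10])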
Summary
PyTorch's nn.Sequential is a powerful tool for building neural networks with a linear structure. It offers:
- A clean, concise way to stack layers
- Automatic handling of connections between layers
- The ability to name and access specific layers
- Easy integration with more complex PyTorch models
While Sequential is perfect for straightforward architectures, remember that you can always subclass nn.Module when you need more flexibility.
Exercises
- Create a Sequential model for binary classification of a tabular dataset with 10 features.
- Modify the MNIST example to use convolutional layers instead of fully connected layers.
- Build a model that combines both Sequential blocks and custom forward logic for image segmentation.
- Implement a simple autoencoder using two Sequential blocks (encoder and decoder).
- Create a Sequential model using the OrderedDict approach and access specific layers to modify their parameters.
Happy coding with PyTorch Sequential!
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)