PyTorch Sequential

Introduction

When building neural networks in PyTorch, the code that organizes and connects layers can become verbose and hard to manage. This is where the nn.Sequential container comes in handy. It provides an elegant way to stack neural network layers sequentially, making your code cleaner and more readable.

In this tutorial, you'll learn:

  • What nn.Sequential is and why it's useful
  • How to create neural networks using Sequential
  • How Sequential compares with the traditional nn.Module subclassing approach
  • Best practices and common patterns

What is PyTorch Sequential?

torch.nn.Sequential is a container that runs its defined layers in sequence, one after another. It's the simplest way to compose a neural network when your architecture follows a linear flow (meaning each layer feeds directly into the next one with no branching or complex connections).

Think of it as placing building blocks in a straight line - the output of one block becomes the input to the next.
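
Conceptually, calling a Sequential model is the same as calling each contained layer in order yourself. Here is a tiny sketch (the layer sizes are arbitrary) that illustrates the equivalence:

python
import torch
import torch.nn as nn

# Two toy layers stacked with Sequential
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU())

x = torch.randn(2, 4)

# Sequential simply feeds the output of each module into the next one
y_sequential = net(x)
y_manual = net[1](net[0](x))  # the same computation, written out by hand

print(torch.equal(y_sequential, y_manual))  # True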

Creating Your First Sequential Model

Let's start by building a simple neural network that classifies images:

python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Creating a sequential model
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

# Let's see what our model looks like
print(model)

Output:

Sequential(
  (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Flatten(start_dim=1, end_dim=-1)
  (7): Linear(in_features=3136, out_features=128, bias=True)
  (8): ReLU()
  (9): Linear(in_features=128, out_features=10, bias=True)
)

As you can see, we've created a Convolutional Neural Network (CNN) for image classification. Each layer is automatically indexed, making it easy to identify components.
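
Because each layer gets an integer index, you can also retrieve individual layers, or a slice of the model, directly:

python
# Access layers by index
print(model[0])    # the first Conv2d layer
print(model[-1])   # the final Linear layer

# Slicing returns a new Sequential containing only those layers
feature_part = model[:6]  # just the conv/relu/pool blocks
print(feature_part)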

Using the Sequential Model

Let's see how we can use this model to process an image:

python
# Create a random input tensor (batch_size, channels, height, width)
x = torch.randn(1, 1, 28, 28)  # Simulating an MNIST image

# Pass the input through the model
output = model(x)

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
print(f"Output values: {output}")

Output:

Input shape: torch.Size([1, 1, 28, 28])
Output shape: torch.Size([1, 10])
Output values: tensor([[ 0.1257, -0.0889,  0.1633, -0.1621, -0.0031,  0.0645, -0.0034,
         -0.0138,  0.0091, -0.0254]], grad_fn=<AddmmBackward0>)

The model takes our 28×28 image with 1 channel and outputs 10 values (one for each class).
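
These 10 values are raw scores (logits), not probabilities. To obtain class probabilities or a predicted label, you can apply softmax and argmax:

python
# Convert logits to probabilities and pick the most likely class
probabilities = torch.softmax(output, dim=1)
predicted_class = output.argmax(dim=1)

print(f"Probabilities: {probabilities}")
print(f"Predicted class: {predicted_class.item()}")

Since the model has not been trained yet, this prediction is essentially random at this point.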

Named Layers in Sequential

Sometimes, it's helpful to name your layers for better organization and easier access. With Sequential, you can use an OrderedDict to name each layer:

python
from collections import OrderedDict

named_model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 32, kernel_size=3, padding=1)),
    ('relu1', nn.ReLU()),
    ('pool1', nn.MaxPool2d(kernel_size=2)),
    ('conv2', nn.Conv2d(32, 64, kernel_size=3, padding=1)),
    ('relu2', nn.ReLU()),
    ('pool2', nn.MaxPool2d(kernel_size=2)),
    ('flatten', nn.Flatten()),
    ('fc1', nn.Linear(64 * 7 * 7, 128)),
    ('relu3', nn.ReLU()),
    ('fc2', nn.Linear(128, 10))
]))

print(named_model)

# Accessing a specific layer by name
print(named_model.conv1)

Output:

Sequential(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=3136, out_features=128, bias=True)
  (relu3): ReLU()
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)

Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

Sequential vs. Subclassing nn.Module

To understand the benefits of nn.Sequential, let's compare it with the traditional approach of subclassing nn.Module:

Using Sequential

python
sequential_model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 15),
    nn.ReLU(),
    nn.Linear(15, 5)
)

Using nn.Module subclassing

python
class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 15)
        self.fc3 = nn.Linear(15, 5)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

custom_model = CustomModel()

As you can see, the Sequential approach is more concise. However, subclassing nn.Module offers more flexibility when you need custom operations or non-sequential architectures (branches, skip connections, and so on).
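
Both definitions describe the same stack of layers, so they accept the same inputs and produce outputs of the same shape (the actual values differ because each model is initialized with its own random weights). A quick check using the two models defined above:

python
x = torch.randn(4, 10)  # a batch of 4 samples with 10 features each

print(sequential_model(x).shape)  # torch.Size([4, 5])
print(custom_model(x).shape)      # torch.Size([4, 5])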

Adding Layers Dynamically

One nice feature of Sequential is the ability to add layers after initialization:

python
model = nn.Sequential()
model.add_module('conv1', nn.Conv2d(1, 32, 3, padding=1))
model.add_module('relu1', nn.ReLU())
model.add_module('pool1', nn.MaxPool2d(2))

# You can even append multiple layers as another Sequential
model.add_module('additional_layers', nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2)
))

print(model)

Output:

Sequential(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (additional_layers): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
)
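
In recent PyTorch releases, nn.Sequential also provides an append method, which adds a module to the end of the container without requiring an explicit name. Assuming the same 28×28 inputs as before, the flattened feature size works out to 64 × 7 × 7:

python
# append assigns the next free integer index as the module's name
model.append(nn.Flatten())
model.append(nn.Linear(64 * 7 * 7, 10))
print(model)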

Practical Example: MNIST Classifier

Let's build a complete example of training a neural network on the MNIST dataset using nn.Sequential:

python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load MNIST dataset
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000)

# Create a model using Sequential
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 10)
)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
def train(model, epochs):
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0

        for batch_idx, (data, target) in enumerate(train_loader):
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

            if batch_idx % 100 == 99:
                print(f'Epoch: {epoch+1}, Batch: {batch_idx+1}, Loss: {running_loss/100:.4f}')
                running_loss = 0.0

        # Evaluate on test set
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in test_loader:
                output = model(data)
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        print(f'Epoch: {epoch+1}, Test Accuracy: {correct/len(test_loader.dataset):.4f}')

# Uncomment to train the model (takes a few minutes)
# train(model, epochs=3)

This example creates a simple fully-connected neural network for MNIST digit classification using Sequential. The model consists of three linear layers with ReLU activations and dropout regularization.
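
Because the Sequential model is an ordinary nn.Module, you can save and restore its trained parameters in the usual way. A minimal sketch (the file name is arbitrary):

python
# Save only the learned parameters
torch.save(model.state_dict(), 'mnist_sequential.pt')

# Later: rebuild the same architecture and load the weights back in
restored = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 10)
)
restored.load_state_dict(torch.load('mnist_sequential.pt'))
restored.eval()  # switch to evaluation mode before inference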

When to Use Sequential

Use nn.Sequential when:

  1. Your network has a simple linear topology (each layer feeds into the next)
  2. You don't need custom operations between layers
  3. You want concise, readable code
  4. You're building prototype models quickly

Avoid nn.Sequential when:

  1. Your architecture has branches or skip connections (see the sketch after this list)
  2. You need to reuse intermediate outputs
  3. Your forward pass has conditional logic
  4. Your layers need to share weights or states
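
To see why, consider a residual (skip) connection: the input has to be added back to the block's output, which a plain Sequential cannot express on its own. A minimal sketch of how this looks with nn.Module subclassing (the layer sizes here are arbitrary):

python
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # The main path can still be a Sequential
        self.block = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim)
        )

    def forward(self, x):
        # The skip connection requires custom forward logic
        return x + self.block(x)

residual = ResidualBlock(16)
print(residual(torch.randn(2, 16)).shape)  # torch.Size([2, 16])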

Combining Sequential with Custom Modules

For more complex models, you can embed nn.Sequential blocks inside a custom nn.Module and mix them with your own forward-pass logic for even more flexibility:

python
class AdvancedModel(nn.Module):
    def __init__(self):
        super(AdvancedModel, self).__init__()

        # Feature extraction block using Sequential
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        # Classifier block using Sequential
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        # Use Sequential blocks within a custom forward pass
        features = self.feature_extractor(x)

        # You can do custom operations here
        if self.training:
            # Apply some special processing during training
            features = features * 0.9

        output = self.classifier(features)
        return output

advanced_model = AdvancedModel()
print(advanced_model)

Output:

AdvancedModel(
  (feature_extractor): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=8192, out_features=256, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)

This approach gives you the readability benefits of Sequential while maintaining the flexibility of custom nn.Module subclasses.
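
As a quick sanity check, you can pass a random batch through the combined model. The classifier's first Linear layer expects 128 × 8 × 8 features, which corresponds to 3-channel, 32×32 inputs after the two pooling steps:

python
x = torch.randn(1, 3, 32, 32)  # a batch of one RGB 32x32 image
output = advanced_model(x)
print(output.shape)            # torch.Size([1, 10])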

Summary

PyTorch's nn.Sequential is a powerful tool for building neural networks with a linear structure. It offers:

  • A clean, concise way to stack layers
  • Automatic handling of connections between layers
  • The ability to name and access specific layers
  • Easy integration with more complex PyTorch models

While Sequential is perfect for straightforward architectures, remember that you can always subclass nn.Module when you need more flexibility.

Exercises

  1. Create a Sequential model for binary classification of a tabular dataset with 10 features.
  2. Modify the MNIST example to use convolutional layers instead of fully connected layers.
  3. Build a model that combines both Sequential blocks and custom forward logic for image segmentation.
  4. Implement a simple autoencoder using two Sequential blocks (encoder and decoder).
  5. Create a Sequential model using the OrderedDict approach and access specific layers to modify their parameters.

Happy coding with PyTorch Sequential!


