PyTorch Convolutional Networks
Convolutional Neural Networks (CNNs) are specialized deep learning models that have revolutionized computer vision tasks. In this tutorial, we'll dive into building and training CNNs using PyTorch, exploring their architecture, implementation, and applications.
Introduction to Convolutional Neural Networks
Convolutional Neural Networks are designed to automatically detect important features in images without manual feature extraction. They're inspired by how the visual cortex in animals processes images and have proven extremely effective for tasks like:
- Image classification
- Object detection
- Face recognition
- Image segmentation
Unlike traditional neural networks that use fully connected layers, CNNs leverage three key ideas:
- Local receptive fields: Neurons only connect to a small region of the input
- Shared weights: The same filter is applied across the entire image (see the parameter comparison after this list)
- Pooling: Downsampling operations that reduce spatial dimensions
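To make weight sharing concrete, here is a minimal sketch (the layer sizes are illustrative assumptions) comparing the parameter counts of a fully connected layer and a convolutional layer, both mapping a 3x32x32 image to 16 output maps of the same spatial size:
import torch.nn as nn
# Fully connected: every input pixel connects to every output unit
fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)
# Convolutional: sixteen shared 3x3 filters slide over all spatial positions
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(sum(p.numel() for p in fc.parameters()))    # 50,348,032 parameters
print(sum(p.numel() for p in conv.parameters()))  # 448 parameters (16 * 3 * 3 * 3 weights + 16 biases)
The shared filters give the convolutional layer orders of magnitude fewer parameters for the same input size.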
Core Components of a CNN
1. Convolutional Layers
The convolutional layer is the core building block of a CNN. It applies a set of learnable filters to the input.
import torch
import torch.nn as nn
# Define a simple convolutional layer
conv_layer = nn.Conv2d(
in_channels=3, # RGB input
out_channels=16, # Number of filters
kernel_size=3, # 3x3 filter
stride=1, # Step size
padding=1 # Border padding
)
# Apply it to a random image tensor (batch_size, channels, height, width)
sample_image = torch.randn(1, 3, 28, 28)
output = conv_layer(sample_image)
print(f"Input shape: {sample_image.shape}")
print(f"Output shape: {output.shape}")
Output:
Input shape: torch.Size([1, 3, 28, 28])
Output shape: torch.Size([1, 16, 28, 28])
Notice how the convolutional layer preserved the spatial dimensions (28x28) while changing the number of channels from 3 to 16. This is because a 3x3 kernel with stride 1 and padding 1 produces an output with the same height and width as the input.
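For other settings, the spatial output size follows the standard formula output = floor((input + 2 * padding - kernel_size) / stride) + 1 (assuming a dilation of 1). A small helper (conv_output_size is just an illustrative name) makes this easy to check:
def conv_output_size(size, kernel_size, stride=1, padding=0):
    # floor((size + 2*padding - kernel_size) / stride) + 1, dilation assumed to be 1
    return (size + 2 * padding - kernel_size) // stride + 1
print(conv_output_size(28, kernel_size=3, stride=1, padding=1))  # 28 -> size preserved
print(conv_output_size(28, kernel_size=3, stride=2, padding=1))  # 14 -> size halved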
2. Activation Functions
After convolution, we typically apply a non-linear activation function. ReLU (Rectified Linear Unit) is the most common choice for CNNs.
relu = nn.ReLU()
activated_output = relu(output)
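As a quick sanity check of what ReLU does, negative values are clamped to zero while positive values pass through unchanged:
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000])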
3. Pooling Layers
Pooling layers reduce the spatial dimensions of the data, helping with:
- Computational efficiency
- Controlling overfitting
- Providing a degree of translation invariance
# Max pooling layer
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled_output = max_pool(activated_output)
print(f"Before pooling: {activated_output.shape}")
print(f"After pooling: {pooled_output.shape}")
Output:
Before pooling: torch.Size([1, 16, 28, 28])
After pooling: torch.Size([1, 16, 14, 14])
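Max pooling keeps the largest value in each window; average pooling, which averages each window instead, is a common drop-in alternative:
# Average pooling layer with the same window size
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
print(avg_pool(activated_output).shape)  # torch.Size([1, 16, 14, 14])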
4. Fully Connected Layers
After several convolution and pooling layers, we flatten the output and connect it to fully connected layers for classification.
# Flatten the output
flattened = pooled_output.view(pooled_output.size(0), -1)
print(f"Flattened shape: {flattened.shape}")
# Fully connected layer
fc = nn.Linear(in_features=16*14*14, out_features=10) # 10 classes
final_output = fc(flattened)
print(f"Final output shape: {final_output.shape}")
Output:
Flattened shape: torch.Size([1, 3136])
Final output shape: torch.Size([1, 10])
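As an aside, nn.Flatten performs the same reshaping as the .view call above and can be used as a regular layer inside a model:
flatten = nn.Flatten()  # flattens everything except the batch dimension by default
print(flatten(pooled_output).shape)  # torch.Size([1, 3136])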
Building a Complete CNN in PyTorch
Now, let's put everything together to build a complete CNN for classifying images from the CIFAR-10 dataset:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
# Define the CNN architecture
class SimpleCNN(nn.Module):
def __init__(self):
super(SimpleCNN, self).__init__()
# First convolutional block
self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
self.relu1 = nn.ReLU()
self.pool1 = nn.MaxPool2d(kernel_size=2)
# Second convolutional block
self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
self.relu2 = nn.ReLU()
self.pool2 = nn.MaxPool2d(kernel_size=2)
# Third convolutional block
self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
self.relu3 = nn.ReLU()
self.pool3 = nn.MaxPool2d(kernel_size=2)
# Fully connected layers
# CIFAR-10 images are 32x32
# After 3 pooling layers of size 2, we get 4x4 feature maps
self.fc1 = nn.Linear(128 * 4 * 4, 512)
self.relu4 = nn.ReLU()
self.dropout = nn.Dropout(0.5) # Regularization
self.fc2 = nn.Linear(512, 10) # 10 classes for CIFAR-10
def forward(self, x):
# Apply convolutional blocks
x = self.pool1(self.relu1(self.conv1(x)))
x = self.pool2(self.relu2(self.conv2(x)))
x = self.pool3(self.relu3(self.conv3(x)))
# Flatten
x = x.view(-1, 128 * 4 * 4)
# Apply fully connected layers
x = self.relu4(self.fc1(x))
x = self.dropout(x)
x = self.fc2(x)
return x
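Before training, it is worth running a quick shape check with a dummy batch to confirm that the 128 * 4 * 4 flattened size in fc1 matches the feature maps produced by the last pooling layer:
# Sanity check with a dummy CIFAR-10-sized batch (batch, channels, height, width)
net = SimpleCNN()
dummy_batch = torch.randn(4, 3, 32, 32)
print(net(dummy_batch).shape)  # torch.Size([4, 10])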
Training the CNN
Let's set up the training pipeline for our CNN:
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
# Data preprocessing
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
shuffle=False, num_workers=2)
# Classes in CIFAR-10
classes = ('plane', 'car', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck')
# Initialize the model
model = SimpleCNN().to(device)
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
def train_model(model, epochs=5):
    model.train()  # make sure dropout is active during training
    for epoch in range(epochs):
running_loss = 0.0
for i, data in enumerate(trainloader, 0):
inputs, labels = data
inputs, labels = inputs.to(device), labels.to(device)
# Zero the parameter gradients
optimizer.zero_grad()
# Forward + backward + optimize
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# Print statistics
running_loss += loss.item()
if i % 200 == 199: # Print every 200 mini-batches
print(f'[{epoch + 1}, {i + 1}] loss: {running_loss / 200:.3f}')
running_loss = 0.0
print('Finished Training')
# Train the model
# train_model(model) # Uncomment to train
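Once training has run, you will usually want to persist the learned weights. A minimal sketch using state_dict (the filename simple_cnn.pth is just an assumption):
# Save the trained weights and restore them into a fresh model
torch.save(model.state_dict(), 'simple_cnn.pth')
restored = SimpleCNN().to(device)
restored.load_state_dict(torch.load('simple_cnn.pth', map_location=device))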
Evaluating the CNN
After training, we need to evaluate our model's performance:
def evaluate_model(model):
    model.eval()  # switch off dropout for evaluation
    correct = 0
total = 0
# No need to track gradients during evaluation
with torch.no_grad():
for data in testloader:
images, labels = data
images, labels = images.to(device), labels.to(device)
outputs = model(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
print(f'Accuracy on 10,000 test images: {100 * correct / total:.2f}%')
# Evaluate the model
# evaluate_model(model) # Uncomment to evaluate
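Overall accuracy can hide large differences between classes, so a per-class breakdown is often worth computing as well. Here is a sketch (the function name per_class_accuracy is just illustrative) that reuses classes and testloader from above:
def per_class_accuracy(model):
    model.eval()
    correct = {name: 0 for name in classes}
    total = {name: 0 for name in classes}
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            _, predicted = torch.max(model(images), 1)
            for label, pred in zip(labels, predicted):
                name = classes[label]
                total[name] += 1
                correct[name] += int(label == pred)
    for name in classes:
        print(f'{name}: {100 * correct[name] / total[name]:.1f}%')
# per_class_accuracy(model)  # Uncomment to evaluate per class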
Visualizing Activations
Understanding what your CNN "sees" can provide insights into how it works:
import matplotlib.pyplot as plt
import numpy as np
def visualize_activations(model, image):
# Set model to evaluation mode
model.eval()
# Move image to the same device as model
image = image.to(device)
# Get activations from first convolutional layer
activation = {}
def get_activation(name):
def hook(model, input, output):
activation[name] = output.detach()
return hook
# Register hook
handle = model.conv1.register_forward_hook(get_activation('conv1'))
# Forward pass
output = model(image)
# Remove hook
handle.remove()
# Convert activations to numpy for visualization
act = activation['conv1'].squeeze().cpu().numpy()
# Plot the first 16 feature maps
fig, axes = plt.subplots(4, 4, figsize=(12, 12))
for i, ax in enumerate(axes.flatten()):
if i < len(act):
ax.imshow(act[i], cmap='viridis')
ax.axis('off')
plt.tight_layout()
plt.show()
# Get a sample image
# dataiter = iter(testloader)
# images, _ = next(dataiter)
# visualize_activations(model, images[0:1]) # Uncomment to visualize
Real-World Application: Transfer Learning
In practice, instead of training a CNN from scratch, we often use pre-trained models and fine-tune them for our specific task. This is called transfer learning:
import torchvision.models as models
def create_transfer_learning_model():
    # Load pre-trained ResNet-18
    # (newer torchvision releases use the weights= argument, e.g. weights=models.ResNet18_Weights.DEFAULT)
resnet = models.resnet18(pretrained=True)
# Freeze all parameters
for param in resnet.parameters():
param.requires_grad = False
# Replace the final fully connected layer
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 10) # 10 classes for CIFAR-10
return resnet
# Create transfer learning model
transfer_model = create_transfer_learning_model().to(device)
# Now we only train the final layer
optimizer = optim.Adam(transfer_model.fc.parameters(), lr=0.001)
# Training and evaluation would follow the same pattern as before
# train_model(transfer_model, epochs=3)
# evaluate_model(transfer_model)
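One caveat: ImageNet-pretrained models were trained on larger images with different normalization, so fine-tuning usually works better if you resize the inputs and use the ImageNet statistics. A sketch of a suitable transform (swap it in for transform when building the datasets):
# Resize CIFAR-10 images and normalize with ImageNet statistics
transform_transfer = transforms.Compose([
    transforms.Resize(224),  # ResNet-18 was pretrained on 224x224 crops
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])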
Advanced CNN Architectures
Modern computer vision applications often use more sophisticated CNN architectures:
- ResNet (Residual Networks): Introduces skip connections to solve the vanishing gradient problem in deep networks.
- Inception/GoogLeNet: Uses parallel convolutional blocks with different kernel sizes.
- DenseNet: Creates dense connections between layers to improve gradient flow.
- EfficientNet: Uses neural architecture search to balance network depth, width, and resolution.
Here's how to use pretrained versions in PyTorch:
# Load pretrained models
# (newer torchvision releases use the weights= argument instead of pretrained=True)
resnet = models.resnet50(pretrained=True)
inception = models.inception_v3(pretrained=True)
densenet = models.densenet121(pretrained=True)
# These models can be fine-tuned for your specific task
Common CNN Techniques
Data Augmentation
Data augmentation increases the effective size of your training set by applying random transformations:
# Enhanced data augmentation
transform_train = transforms.Compose([
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(15),
transforms.ColorJitter(brightness=0.2, contrast=0.2),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
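These random transformations belong on the training set only; the test set should keep a deterministic transform so evaluation stays reproducible:
# No augmentation at test time, only deterministic preprocessing
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])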
Batch Normalization
Batch normalization accelerates training and improves stability:
class ImprovedCNN(nn.Module):
def __init__(self):
super(ImprovedCNN, self).__init__()
self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
self.bn1 = nn.BatchNorm2d(32) # Batch normalization
self.relu = nn.ReLU()
# ... other layers
def forward(self, x):
x = self.relu(self.bn1(self.conv1(x)))
# ... rest of the forward pass
return x
Dropout
Dropout helps prevent overfitting:
# Apply dropout after fully connected layers
self.fc1 = nn.Linear(512, 256)
self.dropout = nn.Dropout(0.5) # 50% dropout rate
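Dropout only has an effect in training mode; in evaluation mode it becomes the identity, which is why switching between model.train() and model.eval() matters. A minimal demonstration:
drop = nn.Dropout(0.5)
x = torch.ones(8)
drop.train()
print(drop(x))  # roughly half the entries are zeroed, the survivors are scaled by 2
drop.eval()
print(drop(x))  # identity: all ones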
Summary
In this tutorial, we covered:
- The core components of CNNs: Convolutional layers, activation functions, pooling, and fully connected layers
- Building a complete CNN in PyTorch: From architecture definition to training and evaluation
- Visualizing CNN activations: Understanding what our model "sees"
- Transfer learning: Leveraging pre-trained models for new tasks
- Advanced CNN architectures: Brief overview of modern network designs
- Common CNN techniques: Data augmentation, batch normalization, and dropout
Convolutional Neural Networks have revolutionized computer vision, enabling applications that were once considered impossible. With PyTorch's intuitive design, you can build and experiment with CNNs more easily than ever before.
Exercises
- Basic: Modify the SimpleCNN architecture to include batch normalization layers and observe if it improves performance.
- Intermediate: Implement a custom dataset loader for your own image data and train a CNN on it.
- Advanced: Implement a technique called Grad-CAM to visualize which parts of an image your CNN focuses on when making predictions.
Additional Resources
- PyTorch Vision Models Documentation
- CS231n: Convolutional Neural Networks for Visual Recognition
- Deep Learning for Computer Vision by Andrew Ng
- Visualizing and Understanding Convolutional Networks (Zeiler & Fergus)
- Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG paper)
With the knowledge gained from this tutorial, you're well on your way to building sophisticated computer vision applications with PyTorch!