PyTorch Convolutional Networks

Convolutional Neural Networks (CNNs) are specialized deep learning models that have revolutionized computer vision tasks. In this tutorial, we'll dive into building and training CNNs using PyTorch, exploring their architecture, implementation, and applications.

Introduction to Convolutional Neural Networks

Convolutional Neural Networks are designed to automatically detect important features in images without manual feature extraction. They're inspired by how the visual cortex in animals processes images and have proven extremely effective for tasks like:

  • Image classification
  • Object detection
  • Face recognition
  • Image segmentation

Unlike traditional neural networks that use fully connected layers, CNNs leverage three key ideas:

  • Local receptive fields: Neurons only connect to a small region of the input
  • Shared weights: The same filter is applied across the entire image
  • Pooling: Downsampling operations that reduce spatial dimensions

Core Components of a CNN

1. Convolutional Layers

The convolutional layer is the core building block of a CNN. It applies a set of learnable filters to the input.

python
import torch
import torch.nn as nn

# Define a simple convolutional layer
conv_layer = nn.Conv2d(
    in_channels=3,    # RGB input
    out_channels=16,  # Number of filters
    kernel_size=3,    # 3x3 filter
    stride=1,         # Step size
    padding=1         # Border padding
)

# Apply it to a random image tensor (batch_size, channels, height, width)
sample_image = torch.randn(1, 3, 28, 28)
output = conv_layer(sample_image)

print(f"Input shape: {sample_image.shape}")
print(f"Output shape: {output.shape}")

Output:

Input shape: torch.Size([1, 3, 28, 28])
Output shape: torch.Size([1, 16, 28, 28])

Notice how the convolutional layer preserved the spatial dimensions (28x28) while changing the number of channels from 3 to 16. The dimensions are preserved because a 3x3 kernel with stride 1 and padding 1 leaves the height and width unchanged.
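
More generally, the output size along each spatial dimension is floor((in + 2*padding - kernel_size) / stride) + 1. A quick sanity check with a small helper written for this tutorial:

python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    # Standard convolution output-size formula (dilation = 1)
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, kernel_size=3, stride=1, padding=1))  # 28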

2. Activation Functions

After convolution, we typically apply a non-linear activation function. ReLU (Rectified Linear Unit) is the most common choice for CNNs.

python
relu = nn.ReLU()
activated_output = relu(output)
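
ReLU is stateless, so it can also be applied through the functional API instead of a module:

python
import torch.nn.functional as F

activated_output = F.relu(output)  # same result as nn.ReLU()(output)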

3. Pooling Layers

Pooling layers reduce the spatial dimensions of the data, helping with:

  • Computational efficiency
  • Controlling overfitting
  • Providing translation invariance

python
# Max pooling layer
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled_output = max_pool(activated_output)

print(f"Before pooling: {activated_output.shape}")
print(f"After pooling: {pooled_output.shape}")

Output:

Before pooling: torch.Size([1, 16, 28, 28])
After pooling: torch.Size([1, 16, 14, 14])

4. Fully Connected Layers

After several convolution and pooling layers, we flatten the output and connect it to fully connected layers for classification.

python
# Flatten the output
flattened = pooled_output.view(pooled_output.size(0), -1)
print(f"Flattened shape: {flattened.shape}")

# Fully connected layer
fc = nn.Linear(in_features=16*14*14, out_features=10) # 10 classes
final_output = fc(flattened)
print(f"Final output shape: {final_output.shape}")

Output:

Flattened shape: torch.Size([1, 3136])
Final output shape: torch.Size([1, 10])
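
As an aside, PyTorch provides an nn.Flatten module that performs the same reshaping and can be dropped directly into an nn.Sequential model:

python
flatten = nn.Flatten()  # flattens every dimension except the batch dimension
flattened = flatten(pooled_output)
print(flattened.shape)  # torch.Size([1, 3136])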

Building a Complete CNN in PyTorch

Now, let's put everything together to build a complete CNN for classifying images from the CIFAR-10 dataset:

python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Define the CNN architecture
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # First convolutional block
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2)

        # Second convolutional block
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2)

        # Third convolutional block
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.relu3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d(kernel_size=2)

        # Fully connected layers
        # CIFAR-10 images are 32x32; after three 2x2 pooling layers
        # the feature maps are 4x4
        self.fc1 = nn.Linear(128 * 4 * 4, 512)
        self.relu4 = nn.ReLU()
        self.dropout = nn.Dropout(0.5)  # Regularization
        self.fc2 = nn.Linear(512, 10)   # 10 classes for CIFAR-10

    def forward(self, x):
        # Apply convolutional blocks
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = self.pool3(self.relu3(self.conv3(x)))

        # Flatten
        x = x.view(-1, 128 * 4 * 4)

        # Apply fully connected layers
        x = self.relu4(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)

        return x
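
Before training, a quick forward pass with a dummy input is a cheap way to catch shape mistakes (a throwaway instance, separate from the model trained below):

python
check_model = SimpleCNN()
dummy = torch.randn(1, 3, 32, 32)  # one fake CIFAR-10-sized image
print(check_model(dummy).shape)    # torch.Size([1, 10])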

Training the CNN

Let's set up the training pipeline for our CNN:

python
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)

# Classes in CIFAR-10
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# Initialize the model
model = SimpleCNN().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
def train_model(model, epochs=5):
    model.train()  # Enable training-mode behavior such as dropout
    for epoch in range(epochs):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward + backward + optimize
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # Print statistics
            running_loss += loss.item()
            if i % 200 == 199:  # Print every 200 mini-batches
                print(f'[{epoch + 1}, {i + 1}] loss: {running_loss / 200:.3f}')
                running_loss = 0.0

    print('Finished Training')

# Train the model
# train_model(model) # Uncomment to train

Evaluating the CNN

After training, we need to evaluate our model's performance:

python
def evaluate_model(model):
    model.eval()  # Disable dropout during evaluation
    correct = 0
    total = 0
    # No need to track gradients during evaluation
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy on 10,000 test images: {100 * correct / total:.2f}%')

# Evaluate the model
# evaluate_model(model) # Uncomment to evaluate
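
Overall accuracy can hide weak classes. Here is a minimal per-class breakdown, written for this tutorial, that reuses the classes tuple and testloader defined above:

python
def evaluate_per_class(model):
    model.eval()
    correct = [0] * 10
    total = [0] * 10
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            for label, pred in zip(labels.tolist(), predicted.tolist()):
                total[label] += 1
                correct[label] += int(label == pred)
    for name, c, t in zip(classes, correct, total):
        print(f'{name}: {100 * c / t:.1f}%')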

Visualizing Activations

Understanding what your CNN "sees" can provide insights into how it works:

python
import matplotlib.pyplot as plt
import numpy as np

def visualize_activations(model, image):
    # Set model to evaluation mode
    model.eval()

    # Move image to the same device as model
    image = image.to(device)

    # Get activations from first convolutional layer
    activation = {}

    def get_activation(name):
        def hook(model, input, output):
            activation[name] = output.detach()
        return hook

    # Register hook
    handle = model.conv1.register_forward_hook(get_activation('conv1'))

    # Forward pass
    output = model(image)

    # Remove hook
    handle.remove()

    # Convert activations to numpy for visualization
    act = activation['conv1'].squeeze().cpu().numpy()

    # Plot the first 16 feature maps
    fig, axes = plt.subplots(4, 4, figsize=(12, 12))
    for i, ax in enumerate(axes.flatten()):
        if i < len(act):
            ax.imshow(act[i], cmap='viridis')
        ax.axis('off')

    plt.tight_layout()
    plt.show()

# Get a sample image
# dataiter = iter(testloader)
# images, _ = next(dataiter)
# visualize_activations(model, images[0:1]) # Uncomment to visualize

Real-World Application: Transfer Learning

In practice, instead of training a CNN from scratch, we often use pre-trained models and fine-tune them for our specific task. This is called transfer learning:

python
import torchvision.models as models

def create_transfer_learning_model():
    # Load pre-trained ResNet-18
    # (in torchvision >= 0.13, pretrained=True is deprecated in favor of
    # weights=models.ResNet18_Weights.DEFAULT)
    resnet = models.resnet18(pretrained=True)

    # Freeze all parameters
    for param in resnet.parameters():
        param.requires_grad = False

    # Replace the final fully connected layer
    num_ftrs = resnet.fc.in_features
    resnet.fc = nn.Linear(num_ftrs, 10)  # 10 classes for CIFAR-10

    return resnet

# Create transfer learning model
transfer_model = create_transfer_learning_model().to(device)

# Now we only train the final layer
optimizer = optim.Adam(transfer_model.fc.parameters(), lr=0.001)

# Training and evaluation would follow the same pattern as before
# train_model(transfer_model, epochs=3)
# evaluate_model(transfer_model)
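
One caveat worth flagging: an ImageNet-pretrained ResNet expects larger, ImageNet-normalized inputs, so CIFAR-10's 32x32 images are usually resized before being fed in. A minimal transform for that (the normalization constants below are the standard ImageNet statistics):

python
transform_resnet = transforms.Compose([
    transforms.Resize(224),  # ResNet-18 was pretrained on 224x224 images
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])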

Advanced CNN Architectures

Modern computer vision applications often use more sophisticated CNN architectures:

  1. ResNet (Residual Networks): Introduces skip connections to solve the vanishing gradient problem in deep networks (see the sketch after this list).

  2. Inception/GoogLeNet: Uses parallel convolutional blocks with different kernel sizes.

  3. DenseNet: Creates dense connections between layers to improve gradient flow.

  4. EfficientNet: Uses neural architecture search to balance network depth, width, and resolution.
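
The skip connection that defines ResNet is easy to express directly. Here is a minimal residual block sketch for illustration (simplified from, and not identical to, the torchvision implementation; it reuses the nn import from earlier):

python
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x  # the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # add the input back before activation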

Here's how to use pretrained versions in PyTorch:

python
# Load pretrained models
resnet = models.resnet50(pretrained=True)
inception = models.inception_v3(pretrained=True)  # note: expects 299x299 inputs
densenet = models.densenet121(pretrained=True)

# These models can be fine-tuned for your specific task

Common CNN Techniques

Data Augmentation

Data augmentation increases the effective size of your training set by applying random transformations:

python
# Enhanced data augmentation
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
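
Random augmentation should only be applied to the training set; the test set keeps a deterministic transform so evaluation stays reproducible. Pass transform_train when constructing trainset, and a plain transform like this one for testset:

python
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])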

Batch Normalization

Batch normalization accelerates training and improves stability:

python
class ImprovedCNN(nn.Module):
    def __init__(self):
        super(ImprovedCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)  # Batch normalization
        self.relu = nn.ReLU()
        # ... other layers

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        # ... rest of the forward pass
        return x

Dropout

Dropout helps prevent overfitting:

python
# Apply dropout after fully connected layers
self.fc1 = nn.Linear(512, 256)
self.dropout = nn.Dropout(0.5) # 50% dropout rate
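
Because dropout behaves differently at training and test time, remember to switch modes explicitly (the train_model and evaluate_model functions above already do this):

python
model.train()  # dropout active: activations are randomly zeroed
model.eval()   # dropout disabled: the full network is used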

Summary

In this tutorial, we covered:

  1. The core components of CNNs: Convolutional layers, activation functions, pooling, and fully connected layers
  2. Building a complete CNN in PyTorch: From architecture definition to training and evaluation
  3. Visualizing CNN activations: Understanding what our model "sees"
  4. Transfer learning: Leveraging pre-trained models for new tasks
  5. Advanced CNN architectures: Brief overview of modern network designs
  6. Common CNN techniques: Data augmentation, batch normalization, and dropout

Convolutional Neural Networks have revolutionized computer vision, enabling applications that were once considered impossible. With PyTorch's intuitive design, you can build and experiment with CNNs more easily than ever before.

Exercises

  1. Basic: Modify the SimpleCNN architecture to include batch normalization layers and observe if it improves performance.
  2. Intermediate: Implement a custom dataset loader for your own image data and train a CNN on it.
  3. Advanced: Implement a technique called Grad-CAM to visualize which parts of an image your CNN focuses on when making predictions.

With the knowledge gained from this tutorial, you're well on your way to building sophisticated computer vision applications with PyTorch!


