PyTorch Pretrained Models

Introduction

When starting a computer vision project, building and training models from scratch can be time-consuming and computationally expensive. Fortunately, PyTorch provides a collection of pretrained models through its torchvision.models module. These models have been trained on large datasets like ImageNet and can be used as-is or fine-tuned for specific tasks.

In this tutorial, you'll learn:

  • What pretrained models are and why they're useful
  • How to load and use popular pretrained models in PyTorch
  • How to perform transfer learning by fine-tuning pretrained models
  • How to apply these models to real-world computer vision tasks

What Are Pretrained Models?

Pretrained models are deep learning models that have already been trained on large datasets. Instead of starting with randomly initialized weights, you can leverage these models with weights that have already learned useful features from millions of images.

Benefits of Using Pretrained Models:

  1. Save time and computational resources - Training deep models from scratch can take days or weeks
  2. Better performance - Models trained on large datasets often generalize better
  3. Less data required - Fine-tuning pretrained models requires less training data than training from scratch
  4. Feature extraction - Pretrained models can be used as powerful feature extractors

Loading Pretrained Models in PyTorch

PyTorch's torchvision.models module provides easy access to popular architectures like ResNet, VGG, Inception, and many more.

Let's start by loading some common pretrained models:

python
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

# Check if GPU is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

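# Load a pretrained ResNet50 model.
# torchvision 0.13+ selects weights through the `weights` argument; the older
# pretrained=True flag is deprecated. IMAGENET1K_V1 matches what pretrained=True
# used to load, while ResNet50_Weights.DEFAULT picks the most recent weights.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Load a pretrained VGG16 model
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Load a pretrained MobileNetV2 (a smaller model for mobile applications)
mobilenet = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)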

When you run the code above, PyTorch will download the model weights if they're not already cached on your system.

Using Pretrained Models for Inference

Now that we've loaded a pretrained model, let's use it to make predictions on an image. We'll use ResNet50 to classify an image:

python
# Define the transformation pipeline
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load an image (replace with your own image path).
# convert("RGB") guarantees three channels, even for grayscale or RGBA files.
img = Image.open("cat.jpg").convert("RGB")

# Preprocess the image
img_tensor = preprocess(img)
# Add batch dimension
img_tensor = img_tensor.unsqueeze(0)
# Move to the same device as the model
img_tensor = img_tensor.to(device)

# Put the model in evaluation mode
resnet.to(device)
resnet.eval()

# Make prediction
with torch.no_grad():
    output = resnet(img_tensor)

# Load ImageNet class labels
import json
with open("imagenet_classes.json") as f:
    labels = json.load(f)

# Get the predicted class
_, predicted = torch.max(output, 1)
predicted_class = labels[predicted.item()]

# Display the image and prediction
plt.imshow(np.array(img))
plt.title(f"Predicted class: {predicted_class}")
plt.axis('off')
plt.show()

Output:

Using device: cuda:0
Predicted class: tabby cat

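Note: You'll need an imagenet_classes.json file containing the 1000 ImageNet class labels as a JSON list ordered by class index, because the code looks up each label by its position. You can create one from any published list of the ImageNet-1k class names.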

Understanding Model Architecture

Let's examine the architecture of the pretrained model:

python
# Print the model architecture
print(resnet)

# Print just the final layer
print("\nLast layer of ResNet50:")
print(list(resnet.children())[-1])

Output:

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(...)
  (layer2): Sequential(...)
  (layer3): Sequential(...)
  (layer4): Sequential(...)
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)

Last layer of ResNet50:
Linear(in_features=2048, out_features=1000, bias=True)

As you can see, the final layer is a fully connected layer with 1000 outputs, corresponding to the 1000 ImageNet classes.

Transfer Learning with Pretrained Models

Transfer learning allows you to adapt pretrained models to your specific task. There are two main approaches:

  1. Feature extraction: Use the pretrained model as a fixed feature extractor and only train a new classifier
  2. Fine-tuning: Replace the classifier and also update some or all of the pretrained weights

Let's see how to implement both approaches:

Feature Extraction

python
import torch.nn as nn
import torch.optim as optim

# Freeze all parameters to use as a feature extractor
for param in resnet.parameters():
    param.requires_grad = False

# Replace the final fully connected layer
num_features = resnet.fc.in_features
num_classes = 10 # Example: 10 classes in your dataset
resnet.fc = nn.Linear(num_features, num_classes)

# Now only the parameters of the final layer are being trained
optimizer = optim.SGD(resnet.fc.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Training loop would go here...

Fine-tuning

python
# Unfreeze all parameters for fine-tuning
for param in resnet.parameters():
    param.requires_grad = True

# Replace the final fully connected layer
num_features = resnet.fc.in_features
num_classes = 10  # Example: 10 classes in your dataset
resnet.fc = nn.Linear(num_features, num_classes)

# Select the pretrained backbone by name rather than by position in the parameter list
backbone_params = [p for name, p in resnet.named_parameters() if not name.startswith("fc.")]

# Use a smaller learning rate when fine-tuning
optimizer = optim.SGD([
    {'params': backbone_params, 'lr': 0.0001},  # Smaller learning rate for pretrained parameters
    {'params': resnet.fc.parameters(), 'lr': 0.001}  # Larger learning rate for new parameters
], momentum=0.9)

criterion = nn.CrossEntropyLoss()

# Training loop would go here...

Real-world Application: Image Classification with Custom Dataset

Let's use transfer learning on a custom dataset. For this example, we'll imagine we have a dataset of animal images:

python
import torch.utils.data as data

# Define your dataset and dataloaders
# This is a simplified example; you would normally use a custom dataset class
train_loader = data.DataLoader(
    your_train_dataset,  # Replace with your actual dataset
    batch_size=32,
    shuffle=True,
    num_workers=4
)

test_loader = data.DataLoader(
    your_test_dataset,  # Replace with your actual dataset
    batch_size=32,
    shuffle=False,
    num_workers=4
)

# Load a pretrained model (the `weights` argument replaces the deprecated pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)  # num_classes: number of classes in your dataset
model = model.to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)

    epoch_loss = running_loss / len(train_loader.dataset)
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}")

    # Update learning rate once per epoch
    scheduler.step()

print("Training complete!")

# Evaluation
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Accuracy on test set: {100 * correct / total:.2f}%")

Available Pretrained Models in PyTorch

PyTorch offers a wide variety of pretrained models. Here are some of the most popular ones:

  1. Classification Models:

    • ResNet (18, 34, 50, 101, 152)
    • VGG (11, 13, 16, 19)
    • DenseNet (121, 169, 201, 161)
    • Inception v3
    • MobileNet v2, v3
    • EfficientNet (b0-b7)
  2. Detection Models:

    • Faster R-CNN
    • Mask R-CNN
    • SSD
    • RetinaNet
  3. Segmentation Models:

    • FCN ResNet
    • DeepLabV3

To see all available models, you can check the PyTorch documentation or use:

python
import torchvision.models as models

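# Print all public callables in torchvision.models
# (this includes model classes and helper functions as well as model builders)
for model_name in dir(models):
    if not model_name.startswith("__") and callable(getattr(models, model_name)):
        print(model_name)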

Best Practices for Using Pretrained Models

  1. Choose the right model: Consider the trade-off between accuracy and speed based on your requirements.
  2. Preprocessing: Always use the same preprocessing steps that were used during training.
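  3. Batch normalization layers: Consider keeping these in evaluation mode during fine-tuning, especially with small batches, so the pretrained running statistics are not overwritten by noisy estimates.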
  4. Learning rates: Use smaller learning rates when fine-tuning pretrained layers.
  5. Data augmentation: Use augmentation to prevent overfitting, especially with small datasets.
  6. Monitor validation performance: Watch for overfitting during training.

Summary

In this tutorial, you've learned:

  • How to load pretrained models from PyTorch's model zoo
  • How to use these models for inference on new images
  • How to implement transfer learning through feature extraction and fine-tuning
  • How to apply pretrained models to custom datasets

PyTorch's pretrained models offer an excellent starting point for many computer vision tasks, allowing you to achieve strong results without the need for massive datasets or computational resources.

Exercises

  1. Load a different pretrained model (like VGG16 or MobileNetV2) and compare its performance on the same image.
  2. Implement transfer learning on a simple dataset like CIFAR-10 using a pretrained ResNet model.
  3. Try freezing different layers of a pretrained model and observe how it affects training.
  4. Implement data augmentation techniques to improve the performance of your fine-tuned model.
  5. Research and implement a pretrained model for a different task like object detection or segmentation.

