
PyTorch CI/CD Integration

Introduction

Continuous Integration and Continuous Deployment (CI/CD) are development practices that help teams deliver code changes more frequently and reliably. In a PyTorch project, a CI/CD pipeline tests, validates, and deploys your models automatically, so every change goes through the same checks without manual steps.

In this guide, you'll learn how to:

  • Set up CI/CD pipelines for PyTorch projects
  • Automate testing of PyTorch models
  • Configure deployment workflows for model serving
  • Implement best practices for reliable PyTorch CI/CD

Whether you're developing a simple classification model or a complex deep learning system, integrating CI/CD into your PyTorch workflow will save time and increase reliability.

Why CI/CD Matters for PyTorch Projects

PyTorch projects present unique challenges for CI/CD pipelines:

  1. Computational requirements: Training and testing models often require significant compute resources
  2. Environment dependencies: GPU builds of PyTorch must match the CUDA version and driver available on the machine
  3. Model artifacts: Trained models are large binary files that need special handling
  4. Reproducibility: Results must be consistent across different environments

A proper CI/CD setup addresses these challenges and helps maintain code quality as your project evolves.
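
Of the challenges above, reproducibility is often the cheapest to address in code. The following is a minimal sketch, assuming a helper named set_seed (a hypothetical name, not part of PyTorch) that tests and training scripts call before creating models or data:

```python
import random

import torch


def set_seed(seed: int) -> None:
    """Seed Python's and PyTorch's random number generators (hypothetical helper)."""
    random.seed(seed)
    # torch.manual_seed seeds the generators on all devices
    torch.manual_seed(seed)


set_seed(0)
first = torch.rand(3)
set_seed(0)
second = torch.rand(3)
# Re-seeding reproduces the same random tensor
assert torch.equal(first, second)
```

Seeding alone does not guarantee bit-identical results across hardware or PyTorch versions. If stricter guarantees are needed, torch.use_deterministic_algorithms(True) makes PyTorch raise an error when an operation has no deterministic implementation.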

Getting Started with PyTorch CI/CD

Setting Up Your Project Structure

Before implementing CI/CD, organize your PyTorch project with testability and automation in mind:

pytorch-project/
├── .github/workflows/   # GitHub Actions configurations
├── src/                 # Source code
│   └── models/          # PyTorch models
├── tests/               # Test scripts
├── configs/             # Configuration files
├── requirements.txt     # Dependencies
└── setup.py             # Package setup

Creating Essential Test Cases

First, create basic test cases to verify your PyTorch models work correctly. Here's a simple example using pytest:

python
# tests/test_model.py
import torch
import pytest
from src.models.simple_cnn import SimpleCNN

def test_model_forward_pass():
    # Create a sample input
    batch_size = 2
    channels = 3
    height = 32
    width = 32
    x = torch.randn(batch_size, channels, height, width)

    # Initialize model
    model = SimpleCNN(num_classes=10)
    model.eval()

    # Perform forward pass
    with torch.no_grad():
        output = model(x)

    # Check output dimensions
    assert output.shape == (batch_size, 10), f"Expected shape (2, 10), got {output.shape}"

This test verifies that your model produces outputs with the expected shape.
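
The test imports SimpleCNN from src/models/simple_cnn.py, which this guide does not show. As a rough sketch only, a model that satisfies the test (3-channel 32x32 input, num_classes outputs) could look like the following. The layer sizes are assumptions chosen for illustration:

```python
# src/models/simple_cnn.py (hypothetical sketch; layer sizes are illustrative)
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32 -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # keep the batch dimension
        return self.classifier(x)
```

Any architecture works for the shape test as long as it maps a (batch, 3, 32, 32) tensor to (batch, num_classes).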

GitHub Actions for PyTorch CI/CD

GitHub Actions is a popular choice for setting up CI/CD pipelines. Let's create a workflow file for a PyTorch project.

Basic Testing Workflow

Create a file .github/workflows/test.yml:

yaml
name: PyTorch Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest
          pip install -r requirements.txt

      - name: Run tests
        run: |
          # python -m adds the repository root to sys.path so `from src...` imports resolve
          python -m pytest tests/

Adding GPU Support for Testing

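To run GPU tests, the job needs a runner that actually has a GPU. Standard GitHub-hosted runners such as ubuntu-latest do not provide one, so the workflow below targets a self-hosted runner. The gpu label is a placeholder for whatever label your runner uses:

yaml
name: PyTorch GPU Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test-gpu:
    # Placeholder labels: replace with the labels of your GPU-equipped runner
    runs-on: [self-hosted, gpu]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest
          # Install PyTorch with CUDA support; choose the wheel index that matches the runner's CUDA version
          pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
          pip install -r requirements.txt

      - name: Run GPU tests
        run: |
          # python -m adds the repository root to sys.path so `from src...` imports resolve
          python -m pytest tests/test_gpu.py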
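
The workflow runs tests/test_gpu.py, which is not shown in this guide. One possible shape for it, sketched here with a small stand-in nn.Linear model rather than the project's SimpleCNN, is a test that skips itself when no GPU is present and otherwise compares CPU and GPU outputs. The helper name outputs_match and the tolerance are assumptions:

```python
# tests/test_gpu.py (hypothetical sketch)
import copy

import pytest
import torch
import torch.nn as nn


def outputs_match(model: nn.Module, x: torch.Tensor, device: str, atol: float = 1e-5) -> bool:
    """Run the model on CPU and on `device`, then compare the results on CPU."""
    model = model.eval()
    with torch.no_grad():
        expected = model(x)
        # Copy the model so the CPU original is left untouched
        device_model = copy.deepcopy(model).to(device)
        actual = device_model(x.to(device)).cpu()
    return torch.allclose(expected, actual, atol=atol)


@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA GPU not available")
def test_cpu_and_gpu_outputs_match():
    model = nn.Linear(8, 4)  # stand-in for the project's model
    x = torch.randn(2, 8)
    assert outputs_match(model, x, device="cuda")
```

Because of the skipif marker, the same file can also run on CPU-only runners, where the test is reported as skipped rather than failed.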

Model Testing Best Practices

1. Test Model Loading and Saving

python
# tests/test_model_io.py
import torch
from src.models.simple_cnn import SimpleCNN

def test_model_save_load(tmp_path):
    # Save into pytest's per-test temporary directory so no manual cleanup is needed
    model_path = tmp_path / "model.pth"

    # Create and save a model; eval mode keeps dropout and batch norm
    # from making the two forward passes differ
    model = SimpleCNN(num_classes=10)
    model.eval()
    torch.save(model.state_dict(), model_path)

    # Load model from file
    loaded_model = SimpleCNN(num_classes=10)
    loaded_model.load_state_dict(torch.load(model_path))
    loaded_model.eval()

    # Verify both models produce the same output
    x = torch.randn(1, 3, 32, 32)
    with torch.no_grad():
        output1 = model(x)
        output2 = loaded_model(x)

    assert torch.allclose(output1, output2, atol=1e-7)

2. Test Numerical Stability

python
# tests/test_numerical_stability.py
import torch
from src.models.simple_cnn import SimpleCNN

def test_backward_pass():
    # Create inputs and targets
    x = torch.randn(4, 3, 32, 32, requires_grad=True)
    target = torch.randint(0, 10, (4,))

    # Initialize model and optimizer
    model = SimpleCNN(num_classes=10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()

    # Forward and backward pass
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)
    loss.backward()

    # Check that every parameter received a finite gradient (not NaN or Inf)
    for name, param in model.named_parameters():
        assert param.grad is not None, f"Parameter {name} received no gradient"
        assert torch.isfinite(param.grad).all(), f"Parameter {name} has non-finite gradients"

    # Apply optimizer step
    optimizer.step()

Automating Model Training in CI/CD

For small models, you can include training in your CI/CD pipeline:

yaml
name: Train Model

on:
  schedule:
    - cron: '0 0 * * 0' # Weekly on Sunday at midnight (UTC)
  workflow_dispatch: # Manual trigger

jobs:
  train:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Train model
        run: |
          # Run as a module from the repository root so `from src.models...` imports resolve
          python -m src.train --config configs/training_config.yaml

      - name: Save model artifact
        # v3 of this action has been deprecated by GitHub; v4 is the supported version
        uses: actions/upload-artifact@v4
        with:
          name: trained-model
          path: outputs/model.pth
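
Because model files are large binaries, it can help to record a checksum next to the uploaded artifact so that a later step can confirm it received the same file. A minimal sketch using only the standard library follows. The function name file_sha256 is an assumption for illustration:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read in chunks (1 MiB by default) so large model files are never fully loaded into memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A training step could write file_sha256("outputs/model.pth") to a small text file uploaded with the model, and the deployment step could recompute the digest and compare the two before building an image.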

Example Training Script

python
# src/train.py
import argparse
from pathlib import Path

import yaml
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from src.models.simple_cnn import SimpleCNN

def train(config):
    # Set up device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load data
    transform = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    train_dataset = datasets.CIFAR10(
        root='./data',
        train=True,
        download=True,
        transform=transform
    )

    train_loader = DataLoader(
        train_dataset,
        batch_size=config['batch_size'],
        shuffle=True
    )

    # Initialize model
    model = SimpleCNN(num_classes=10).to(device)

    # Set up training
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'])

    # Training loop
    for epoch in range(config['epochs']):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            if batch_idx % 100 == 0:
                print(f"Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)}] Loss: {loss.item():.6f}")

    # Save model, creating the output directory first if it does not exist
    output_path = Path(config['output_path'])
    output_path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(model.state_dict(), output_path)
    print(f"Model saved to {output_path}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', type=str, required=True, help='Path to config file')
    args = parser.parse_args()

    with open(args.config, 'r') as f:
        config = yaml.safe_load(f)

    train(config)
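
The script reads batch_size, learning_rate, epochs, and output_path from the YAML config but does not check them, so a typo in configs/training_config.yaml only surfaces partway through a CI run. A small validation step can fail fast instead. This is a sketch: the function name validate_config and the specific checks are assumptions, not part of the original script:

```python
def validate_config(config: dict) -> dict:
    """Check that the config has the keys train() reads, with usable values."""
    for key in ("batch_size", "learning_rate", "epochs", "output_path"):
        if key not in config:
            raise ValueError(f"Missing config key: {key}")

    for key in ("batch_size", "epochs"):
        value = config[key]
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")

    lr = config["learning_rate"]
    if isinstance(lr, bool) or not isinstance(lr, (int, float)) or lr <= 0:
        raise ValueError(f"learning_rate must be a positive number, got {lr!r}")

    if not isinstance(config["output_path"], str) or not config["output_path"]:
        raise ValueError("output_path must be a non-empty string")

    return config


# Example values only; they are not taken from the original project
example = {"batch_size": 64, "learning_rate": 0.001, "epochs": 5, "output_path": "outputs/model.pth"}
validate_config(example)
```

In train.py this could be called right after yaml.safe_load(f). One case it catches: PyYAML loads a value written as 1e-3 (no decimal point) as a string rather than a float, which would otherwise fail later inside the optimizer.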

Deploying PyTorch Models

After testing, you'll want to deploy your models to a production environment. Here's how to automate deployment using CI/CD:

Model Deployment with Docker

  1. Create a Dockerfile for your PyTorch application:

dockerfile
FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

# Run as a module from /app so `from src.models...` imports resolve
CMD ["python", "-m", "src.serve"]

  2. Add a model serving script:

python
# src/serve.py
import torch
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from src.models.simple_cnn import SimpleCNN

app = FastAPI(title="PyTorch Model API")

# Load model
model = SimpleCNN(num_classes=10)
model.load_state_dict(torch.load("outputs/model.pth", map_location="cpu"))
model.eval()

class ImageData(BaseModel):
    image: list

@app.post("/predict")
async def predict(data: ImageData):
    # Convert input data to tensor
    image_tensor = torch.tensor(data.image).reshape(1, 3, 32, 32).float()

    # Make prediction
    with torch.no_grad():
        output = model(image_tensor)
        prediction = output.argmax(dim=1).item()

    return {"prediction": prediction}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

  3. Create a deployment workflow:

yaml
name: Deploy Model

on:
  push:
    branches: [ main ]
    paths:
      - 'outputs/model.pth'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Login to DockerHub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          tags: yourusername/pytorch-model:latest
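
Once the image is running, a quick smoke test can confirm that the /predict endpoint answers. The serving script reshapes the payload to (1, 3, 32, 32), so the image field must contain 3 x 32 x 32 = 3072 numbers. The sketch below builds such a request with the standard library only. The function name and the localhost URL are assumptions:

```python
import json
import urllib.request

EXPECTED_VALUES = 3 * 32 * 32  # the server reshapes the input to (1, 3, 32, 32)


def build_predict_request(image, url="http://localhost:8000/predict"):
    """Build a POST request for the /predict endpoint from a flat list of pixel values."""
    if len(image) != EXPECTED_VALUES:
        raise ValueError(f"Expected {EXPECTED_VALUES} values, got {len(image)}")
    body = json.dumps({"image": image}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request([0.0] * EXPECTED_VALUES)
# To send it against a running container:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Checking the payload length on the client side gives a clearer error than the reshape failure the server would otherwise raise.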

Real-World Example: End-to-End PyTorch CI/CD Pipeline

Let's put everything together into a complete CI/CD pipeline for a PyTorch project.

Project Structure

pytorch-project/
├── .github/workflows/
│   ├── test.yml                 # Run tests on PR and push
│   ├── train.yml                # Weekly model training
│   └── deploy.yml               # Deploy when model changes
├── src/
│   ├── models/
│   │   └── resnet.py            # Model definition
│   ├── train.py                 # Training script
│   └── serve.py                 # API server
├── tests/
│   ├── test_model.py            # Model tests
│   └── test_api.py              # API tests
├── configs/
│   └── training_config.yaml     # Training parameters
├── Dockerfile
├── requirements.txt
└── README.md

Complete CI/CD Workflow

  1. A developer pushes code changes
  2. GitHub Actions runs the tests to validate the changes
  3. On the weekly schedule (or a manual trigger), the training workflow trains a new model
  4. The trained model is saved as a workflow artifact
  5. When an updated outputs/model.pth is pushed to main, the deployment workflow builds and pushes a Docker image
  6. The image is deployed to your infrastructure (e.g., Kubernetes)

This approach ensures that your PyTorch models are consistently tested, trained, and deployed without manual intervention.

Monitoring Model Performance in Production

After deployment, it's important to monitor your model's performance. Add logging to your serving script:

python
# src/serve.py (modified)
import torch
import logging
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from src.models.simple_cnn import SimpleCNN

# Configure logging to both a file and the console
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("api.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

app = FastAPI(title="PyTorch Model API")

# Load model
try:
    model = SimpleCNN(num_classes=10)
    model.load_state_dict(torch.load("outputs/model.pth", map_location="cpu"))
    model.eval()
    logger.info("Model loaded successfully")
except Exception as e:
    logger.error(f"Failed to load model: {e}")
    raise

class ImageData(BaseModel):
    image: list

@app.post("/predict")
async def predict(data: ImageData):
    try:
        # Convert input data to tensor
        image_tensor = torch.tensor(data.image).reshape(1, 3, 32, 32).float()

        # Make prediction
        with torch.no_grad():
            output = model(image_tensor)
            prediction = output.argmax(dim=1).item()

        logger.info(f"Prediction made: {prediction}")
        return {"prediction": prediction}
    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
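
Logging predictions is a start, but latency is often one of the first production metrics worth tracking. As a sketch, a small context manager can time any block and log the duration. The name log_latency is an assumption, and it uses only the standard library so it can wrap the model call inside predict():

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_latency(operation: str, log: logging.Logger = logger):
    """Log how long the wrapped block took, in milliseconds, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.info("%s took %.2f ms", operation, elapsed_ms)


# Usage: wrap the part of the request handler you want to time
with log_latency("inference"):
    sum(range(1000))  # stand-in for `output = model(image_tensor)`
```

Because the timing is written through the same logging setup, the durations end up in api.log alongside the prediction messages and can be aggregated later.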

Summary

In this guide, we've explored how to:

  1. Set up CI/CD pipelines specifically for PyTorch projects
  2. Create comprehensive tests for PyTorch models
  3. Automate model training within CI/CD workflows
  4. Deploy models using Docker and CI/CD
  5. Implement a complete end-to-end pipeline

By integrating PyTorch with CI/CD practices, you can ensure your models are consistently tested, validated, and deployed, saving time and reducing errors in your development workflow.

Exercises

  1. Set up a basic CI pipeline for a simple PyTorch model using GitHub Actions.
  2. Create tests that validate model outputs are consistent across CPU and GPU.
  3. Implement automated hyperparameter tuning in your CI/CD pipeline.
  4. Create a Docker container that serves a PyTorch model with FastAPI.
  5. Add performance monitoring to track your model's accuracy in production.

