PyTorch CI/CD Integration
Introduction
Continuous Integration and Continuous Deployment (CI/CD) are development practices that help teams deliver code changes more frequently and reliably. When working with PyTorch projects, proper CI/CD integration ensures that your models are consistently tested, validated, and deployed without manual intervention.
In this guide, you'll learn how to:
- Set up CI/CD pipelines for PyTorch projects
- Automate testing of PyTorch models
- Configure deployment workflows for model serving
- Implement best practices for reliable PyTorch CI/CD
Whether you're developing a simple classification model or a complex deep learning system, integrating CI/CD into your PyTorch workflow will save time and increase reliability.
Why CI/CD Matters for PyTorch Projects
PyTorch projects present unique challenges for CI/CD pipelines:
- Computational requirements: Training and testing models often require significant compute and memory
- Environment dependencies: GPU builds of PyTorch must match the CUDA version and driver available on the machine
- Model artifacts: Models produce large binary files that need special handling
- Reproducibility: Ensuring consistent results across different environments
A proper CI/CD setup addresses these challenges and helps maintain code quality as your project evolves.
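The reproducibility point above is usually the cheapest one to address in tests. A minimal sketch, assuming a helper named `seed_everything` (a hypothetical name, not part of PyTorch) that tests call before creating random tensors or models:

```python
import random

import torch


def seed_everything(seed: int = 0, deterministic: bool = False) -> None:
    """Seed the Python and PyTorch random number generators."""
    random.seed(seed)
    # torch.manual_seed seeds the CPU generator and any CUDA devices
    torch.manual_seed(seed)
    if deterministic:
        # Optional: ask PyTorch to raise an error when an operation
        # has no deterministic implementation
        torch.use_deterministic_algorithms(True)


# Usage: the same seed yields the same random tensor
seed_everything(123)
first = torch.randn(3)
seed_everything(123)
second = torch.randn(3)
print(torch.equal(first, second))
```

Seeding makes runs repeatable on one machine and one PyTorch build; results can still differ slightly across hardware or library versions, which is why the tests in this guide compare with tolerances.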
Getting Started with PyTorch CI/CD
Setting Up Your Project Structure
Before implementing CI/CD, organize your PyTorch project with testability and automation in mind:
pytorch-project/
├── .github/workflows/ # GitHub Actions configurations
├── src/ # Source code
│ └── models/ # PyTorch models
├── tests/ # Test scripts
├── configs/ # Configuration files
├── requirements.txt # Dependencies
└── setup.py # Package setup
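A pipeline can fail early with a clear message if part of this layout is missing. The following is a small sketch under the assumption that the layout above is the one you use; `EXPECTED_PATHS` and `missing_paths` are illustrative names:

```python
from pathlib import Path

# Paths taken from the project layout shown above
EXPECTED_PATHS = (
    ".github/workflows",
    "src/models",
    "tests",
    "configs",
    "requirements.txt",
    "setup.py",
)


def missing_paths(project_root: str) -> list[str]:
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [path for path in EXPECTED_PATHS if not (root / path).exists()]
```

Calling `missing_paths(".")` from the repository root returns an empty list when everything is in place; a CI step could exit with an error when the list is non-empty.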
Creating Essential Test Cases
First, create basic test cases to verify your PyTorch models work correctly. Here's a simple example using pytest:
# tests/test_model.py
import torch
import pytest
from src.models.simple_cnn import SimpleCNN


def test_model_forward_pass():
    # Create a sample input
    batch_size = 2
    channels = 3
    height = 32
    width = 32
    x = torch.randn(batch_size, channels, height, width)

    # Initialize model
    model = SimpleCNN(num_classes=10)
    model.eval()

    # Perform forward pass
    with torch.no_grad():
        output = model(x)

    # Check output dimensions
    assert output.shape == (batch_size, 10), f"Expected shape (2, 10), got {output.shape}"
This test verifies that your model produces outputs with the expected shape.
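The tests in this guide import `SimpleCNN` from `src/models/simple_cnn.py`, but the guide does not show that file. The sketch below is one possible definition that satisfies the test above; the layer sizes are assumptions chosen for 3x32x32 inputs, not the author's model:

```python
# src/models/simple_cnn.py (hypothetical contents)
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    """Small CNN for 3x32x32 inputs; layer sizes are illustrative."""

    def __init__(self, num_classes: int = 10) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        # Flatten everything except the batch dimension
        return self.classifier(torch.flatten(x, start_dim=1))
```

Any model with the same constructor argument and input size would work equally well with the tests that follow.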
GitHub Actions for PyTorch CI/CD
GitHub Actions is a popular choice for setting up CI/CD pipelines. Let's create a workflow file for a PyTorch project.
Basic Testing Workflow
Create a file .github/workflows/test.yml:
name: PyTorch Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest
          pip install -r requirements.txt
      - name: Run tests
        run: |
          # python -m pytest adds the repository root to sys.path,
          # so the tests can import from src
          python -m pytest tests/
Adding GPU Support for Testing
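To test PyTorch models on a GPU, the job has to run on a machine that has one. The standard GitHub-hosted ubuntu-latest runner has no GPU, so the workflow below assumes a self-hosted runner registered with a gpu label (the label name is an example):
name: PyTorch GPU Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test-gpu:
    # Assumes a self-hosted runner with an NVIDIA GPU and a "gpu" label
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest
          # Install PyTorch with CUDA support; pick the CUDA tag that matches the runner's driver
          pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
          pip install -r requirements.txt
      - name: Run GPU tests
        run: |
          python -m pytest tests/test_gpu.py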
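The workflow runs `tests/test_gpu.py`, which the guide does not show. One possible shape for it is sketched below, assuming you want to check that CPU and GPU outputs agree. The helper `outputs_match`, the `requires_cuda` marker name, and the stand-in linear model are illustrative; in a real project you would pass your own model:

```python
# tests/test_gpu.py (hypothetical contents)
import pytest
import torch
import torch.nn as nn

# Skip GPU tests cleanly on machines without CUDA
requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="CUDA device not available"
)


def outputs_match(model: nn.Module, x: torch.Tensor, device: str,
                  atol: float = 1e-4) -> bool:
    """Compare the model's CPU output with its output on `device`."""
    model.eval()
    with torch.no_grad():
        expected = model.cpu()(x.cpu())
        actual = model.to(device)(x.to(device)).cpu()
    model.cpu()  # leave the model on the CPU for later tests
    return torch.allclose(expected, actual, atol=atol)


@requires_cuda
def test_cpu_and_gpu_outputs_agree():
    torch.manual_seed(0)
    # Stand-in model; replace with SimpleCNN(num_classes=10)
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    assert outputs_match(model, torch.randn(2, 3, 32, 32), device="cuda")
```

The tolerance is deliberately looser than in a CPU-only comparison, since floating-point results on different devices are not guaranteed to be bit-identical.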
Model Testing Best Practices
1. Test Model Loading and Saving
# tests/test_model_io.py
import torch
from src.models.simple_cnn import SimpleCNN


def test_model_save_load(tmp_path):
    # Save into pytest's per-test temporary directory, so no cleanup is needed
    checkpoint = tmp_path / "model.pth"

    # Create and save a model; eval mode keeps dropout and batch norm
    # from making the two outputs differ
    model = SimpleCNN(num_classes=10)
    model.eval()
    torch.save(model.state_dict(), checkpoint)

    # Load the weights into a fresh model
    loaded_model = SimpleCNN(num_classes=10)
    loaded_model.load_state_dict(torch.load(checkpoint))
    loaded_model.eval()

    # Verify both models produce the same output
    x = torch.randn(1, 3, 32, 32)
    with torch.no_grad():
        output1 = model(x)
        output2 = loaded_model(x)
    assert torch.allclose(output1, output2, atol=1e-7)
2. Test Numerical Stability
# tests/test_numerical_stability.py
import torch
from src.models.simple_cnn import SimpleCNN


def test_backward_pass():
    # Create inputs and targets
    x = torch.randn(4, 3, 32, 32, requires_grad=True)
    target = torch.randint(0, 10, (4,))

    # Initialize model and optimizer
    model = SimpleCNN(num_classes=10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()

    # Forward and backward pass
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)
    loss.backward()

    # Check if gradients are finite (not NaN or Inf)
    for name, param in model.named_parameters():
        assert torch.isfinite(param.grad).all(), f"Parameter {name} has non-finite gradients"

    # Apply optimizer step
    optimizer.step()
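Finite gradients show that the backward pass runs, but not that the optimizer makes progress. A complementary check, sketched here as an assumption about what you might add rather than part of the original test suite, trains on one fixed batch and confirms the loss goes down. The helper name `loss_decreases` and the stand-in linear model are illustrative:

```python
import torch
import torch.nn as nn


def loss_decreases(model: nn.Module, x: torch.Tensor, target: torch.Tensor,
                   steps: int = 20, lr: float = 0.01) -> bool:
    """Train on one fixed batch and report whether the loss went down."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    initial_loss = criterion(model(x), target).item()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(x), target)
        loss.backward()
        optimizer.step()
    final_loss = criterion(model(x), target).item()
    return final_loss < initial_loss


def test_loss_decreases_on_fixed_batch():
    torch.manual_seed(0)
    # Stand-in linear classifier; swap in SimpleCNN in a real test
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
    x = torch.randn(8, 3, 8, 8)
    target = torch.randint(0, 10, (8,))
    assert loss_decreases(model, x, target)
```

A small batch and a few steps keep this fast enough for a CPU-only CI runner.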
Automating Model Training in CI/CD
For small models, you can include training in your CI/CD pipeline:
name: Train Model

on:
  schedule:
    - cron: '0 0 * * 0' # Weekly on Sunday at midnight (UTC)
  workflow_dispatch: # Manual trigger

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Train model
        run: |
          # Run as a module from the repository root so imports from src resolve
          python -m src.train --config configs/training_config.yaml
      - name: Save model artifact
        uses: actions/upload-artifact@v4
        with:
          name: trained-model
          path: outputs/model.pth
Example Training Script
# src/train.py
import argparse
import os

import yaml
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from src.models.simple_cnn import SimpleCNN


def train(config):
    # Set up device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load data
    transform = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    train_dataset = datasets.CIFAR10(
        root='./data',
        train=True,
        download=True,
        transform=transform
    )
    train_loader = DataLoader(
        train_dataset,
        batch_size=config['batch_size'],
        shuffle=True
    )

    # Initialize model
    model = SimpleCNN(num_classes=10).to(device)

    # Set up training
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'])

    # Training loop
    for epoch in range(config['epochs']):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            if batch_idx % 100 == 0:
                print(f"Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)}] Loss: {loss.item():.6f}")

    # Save model; create the output directory first, since torch.save does not
    os.makedirs(os.path.dirname(config['output_path']) or ".", exist_ok=True)
    torch.save(model.state_dict(), config['output_path'])
    print(f"Model saved to {config['output_path']}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', type=str, required=True, help='Path to config file')
    args = parser.parse_args()
    with open(args.config, 'r') as f:
        config = yaml.safe_load(f)
    train(config)
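The script reads four keys from the config: `batch_size`, `learning_rate`, `epochs`, and `output_path`. In an unattended pipeline it helps to reject a bad config before downloading data. The validator below is a sketch; `REQUIRED_KEYS` and `validate_config` are hypothetical names, and the type expectations are assumptions inferred from how `train()` uses each value:

```python
# Keys read by train(), with the type each one is assumed to have
REQUIRED_KEYS = {
    "batch_size": int,
    "learning_rate": float,
    "epochs": int,
    "output_path": str,
}


def validate_config(config: dict) -> dict:
    """Raise ValueError if a key used by train() is missing or mistyped."""
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            raise ValueError(f"Missing config key: {key}")
        value = config[key]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(
                f"Config key {key!r} should be {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    if config["batch_size"] <= 0 or config["epochs"] <= 0:
        raise ValueError("batch_size and epochs must be positive")
    if config["learning_rate"] <= 0:
        raise ValueError("learning_rate must be positive")
    return config
```

One case this catches: PyYAML loads `learning_rate: 1e-3` as a string because it expects a decimal point in the number (`1.0e-3`), which would otherwise only surface as an error inside the optimizer.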
Deploying PyTorch Models
After testing, you'll want to deploy your models to a production environment. Here's how to automate deployment using CI/CD:
Model Deployment with Docker
- Create a Dockerfile for your PyTorch application:
FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "-m", "src.serve"]
- Add a model serving script:
# src/serve.py
import torch
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

from src.models.simple_cnn import SimpleCNN

app = FastAPI(title="PyTorch Model API")

# Load model
model = SimpleCNN(num_classes=10)
model.load_state_dict(torch.load("outputs/model.pth", map_location="cpu"))
model.eval()


class ImageData(BaseModel):
    image: list


@app.post("/predict")
async def predict(data: ImageData):
    # Convert input data to tensor
    image_tensor = torch.tensor(data.image).reshape(1, 3, 32, 32).float()

    # Make prediction
    with torch.no_grad():
        output = model(image_tensor)
        prediction = output.argmax(dim=1).item()
    return {"prediction": prediction}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
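The endpoint reshapes the incoming list to `(1, 3, 32, 32)`, so a request must carry exactly 3 x 32 x 32 = 3072 numbers, either flat or nested. A payload of any other size fails inside `reshape` with an error that says little to the caller. The sketch below shows one way to check the payload first; `flatten` and `validate_image` are hypothetical helpers, not part of FastAPI or the original script:

```python
from typing import Any

EXPECTED_VALUES = 3 * 32 * 32  # channels * height * width = 3072


def flatten(values: Any) -> list[float]:
    """Flatten arbitrarily nested lists of numbers into one flat list."""
    if isinstance(values, (int, float)):
        return [float(values)]
    if not isinstance(values, list):
        raise ValueError(f"Unsupported element type: {type(values).__name__}")
    flat: list[float] = []
    for item in values:
        flat.extend(flatten(item))
    return flat


def validate_image(image: list) -> list[float]:
    """Return the flattened image, or raise ValueError on a size mismatch."""
    flat = flatten(image)
    if len(flat) != EXPECTED_VALUES:
        raise ValueError(f"Expected {EXPECTED_VALUES} values, got {len(flat)}")
    return flat
```

Inside `predict`, a `ValueError` from `validate_image` could be turned into an HTTP 400 response so that clients can tell a malformed request from a server fault.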
- Create a deployment workflow:
name: Deploy Model

on:
  push:
    branches: [ main ]
    paths:
      - 'outputs/model.pth'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to DockerHub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          tags: yourusername/pytorch-model:latest
Real-World Example: End-to-End PyTorch CI/CD Pipeline
Let's put everything together into a complete CI/CD pipeline for a PyTorch project.
Project Structure
pytorch-project/
├── .github/workflows/
│ ├── test.yml # Run tests on PR and push
│ ├── train.yml # Weekly model training
│ └── deploy.yml # Deploy when model changes
├── src/
│ ├── models/
│ │ └── resnet.py # Model definition
│ ├── train.py # Training script
│ └── serve.py # API server
├── tests/
│ ├── test_model.py # Model tests
│ └── test_api.py # API tests
├── configs/
│ └── training_config.yaml # Training parameters
├── Dockerfile
├── requirements.txt
└── README.md
Complete CI/CD Workflow
- Developer pushes code changes
- GitHub Actions runs tests to validate changes
- On the weekly schedule, or on a manual trigger, the training workflow trains a new model
- The trained model is saved as a workflow artifact
- When a new outputs/model.pth reaches the main branch, the deployment workflow builds and pushes a Docker image
- The image is deployed to your infrastructure (e.g., Kubernetes)
This approach ensures that your PyTorch models are consistently tested, trained, and deployed without manual intervention.
Monitoring Model Performance in Production
After deployment, it's important to monitor your model's performance. Add logging to your serving script:
# src/serve.py (modified)
import torch
import logging
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

from src.models.simple_cnn import SimpleCNN

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("api.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

app = FastAPI(title="PyTorch Model API")

# Load model
try:
    model = SimpleCNN(num_classes=10)
    model.load_state_dict(torch.load("outputs/model.pth", map_location="cpu"))
    model.eval()
    logger.info("Model loaded successfully")
except Exception as e:
    logger.error(f"Failed to load model: {e}")
    raise


class ImageData(BaseModel):
    image: list


@app.post("/predict")
async def predict(data: ImageData):
    try:
        # Convert input data to tensor
        image_tensor = torch.tensor(data.image).reshape(1, 3, 32, 32).float()

        # Make prediction
        with torch.no_grad():
            output = model(image_tensor)
            prediction = output.argmax(dim=1).item()
        logger.info(f"Prediction made: {prediction}")
        return {"prediction": prediction}
    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise
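Log lines record individual predictions; to spot drift you usually also want aggregates, such as how often each class is predicted and how long requests take. The following is a minimal in-memory sketch. `PredictionStats` is a hypothetical class, and a production setup would more likely export these numbers to a metrics system than keep them in process memory:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PredictionStats:
    """In-memory counters for the predictions served by one process."""

    class_counts: Counter = field(default_factory=Counter)
    latencies_ms: list = field(default_factory=list)

    def record(self, prediction: int, latency_ms: float) -> None:
        self.class_counts[prediction] += 1
        self.latencies_ms.append(latency_ms)

    def summary(self) -> dict:
        total = sum(self.class_counts.values())
        mean_latency = sum(self.latencies_ms) / total if total else 0.0
        return {
            "total": total,
            "mean_latency_ms": mean_latency,
            # Share of requests assigned to each predicted class
            "class_share": {c: n / total for c, n in self.class_counts.items()},
        }


# Usage: record a few predictions, then inspect the aggregate
stats = PredictionStats()
stats.record(prediction=3, latency_ms=10.0)
stats.record(prediction=3, latency_ms=20.0)
stats.record(prediction=7, latency_ms=30.0)
print(stats.summary()["total"])
```

In `predict`, you could time the forward pass with `time.perf_counter()` and call `stats.record(...)` next to the existing `logger.info` call. A sudden shift in `class_share` is a cheap early signal that the input distribution has changed.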
Summary
In this guide, we've explored how to:
- Set up CI/CD pipelines specifically for PyTorch projects
- Create comprehensive tests for PyTorch models
- Automate model training within CI/CD workflows
- Deploy models using Docker and CI/CD
- Implement a complete end-to-end pipeline
By integrating PyTorch with CI/CD practices, you can ensure your models are consistently tested, validated, and deployed, saving time and reducing errors in your development workflow.
Additional Resources
- PyTorch Documentation
- GitHub Actions Documentation
- FastAPI for Model Serving
- Docker Documentation
- MLflow for Model Tracking
Exercises
- Set up a basic CI pipeline for a simple PyTorch model using GitHub Actions.
- Create tests that validate model outputs are consistent across CPU and GPU.
- Implement automated hyperparameter tuning in your CI/CD pipeline.
- Create a Docker container that serves a PyTorch model with FastAPI.
- Add performance monitoring to track your model's accuracy in production.