
PyTorch TorchServe

Introduction

When you've finished training your PyTorch model, the next crucial step is deployment: making your model available to users or other systems. This is where TorchServe comes in.

TorchServe is an open-source model serving framework specifically designed for PyTorch models. Developed collaboratively by AWS and Facebook, TorchServe provides a flexible and easy-to-use solution for deploying machine learning models in production environments.

In this tutorial, we'll explore how to use TorchServe to deploy PyTorch models, making them accessible via REST APIs. Whether you're deploying to a cloud service, an edge device, or a local development environment, TorchServe streamlines the process and handles many production concerns for you.

Why Use TorchServe?

Before diving into implementation, let's understand why TorchServe is valuable:

  • Simplicity: Deploy models with minimal code
  • Performance: Optimized for high-throughput and low-latency serving
  • Flexibility: Support for custom pre/post-processing logic
  • Monitoring: Built-in metrics and logging
  • Scalability: Dynamic batching and multi-model serving

Prerequisites

To follow along with this tutorial, you'll need:

  • Python 3.6+
  • PyTorch 1.3+
  • Java 11+ (required for TorchServe)

Installing TorchServe

Let's begin by installing TorchServe and its dependencies:

bash
pip install torch torchserve torch-model-archiver torch-workflow-archiver
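After installation, you can quickly check that the TorchServe CLI is available (assuming a standard pip environment):

bash
torchserve --version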

Understanding the TorchServe Workflow

The TorchServe deployment process follows these key steps:

  1. Create a handler script (for pre/post-processing)
  2. Package your model using the model archiver
  3. Start TorchServe with your model
  4. Send inference requests to the API endpoints

Let's walk through each step with a practical example.

Step 1: Prepare Your Model

First, let's assume we have a simple trained PyTorch model that classifies images. Here's a basic example:

python
import torch
import torch.nn as nn
import torchvision.models as models

# Create a simple model (in a real scenario, you would train this)
model = models.resnet18(pretrained=True)

# Save the model
torch.save(model.state_dict(), "resnet18_model.pth")
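Because we saved only the model's state_dict, the model archiver will also need a small Python file that defines the architecture (referenced as model.py in Step 3). A minimal sketch that matches the keys in resnet18_model.pth:

python
# model.py - model architecture definition used by torch-model-archiver
from torchvision.models.resnet import ResNet, BasicBlock

class ImageClassifier(ResNet):
    """ResNet-18: BasicBlock with a [2, 2, 2, 2] layer configuration."""
    def __init__(self):
        super().__init__(BasicBlock, [2, 2, 2, 2])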

Step 2: Create a Custom Handler

TorchServe uses handlers to manage preprocessing of input data, inference, and postprocessing of model output. You can use built-in handlers or create a custom one.

Let's create a custom handler for our image classification model:

python
# image_classifier.py
import io
import os

import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image
from ts.torch_handler.base_handler import BaseHandler

class ImageClassifierHandler(BaseHandler):
    """
    Custom handler for image classification using ResNet18
    """
    def __init__(self):
        super().__init__()
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])
        self.classes = None

    def initialize(self, context):
        """
        Initialize the model and load the class labels
        """
        super().initialize(context)

        # Extra files are extracted into the model directory,
        # so build the path to the label file from there
        model_dir = context.system_properties.get("model_dir")
        with open(os.path.join(model_dir, "imagenet_classes.txt"), "r") as f:
            self.classes = [line.strip() for line in f.readlines()]

    def preprocess(self, data):
        """
        Preprocess the input data
        """
        images = []
        for row in data:
            # Each request item is a dict; the raw image bytes are under "data" or "body"
            image_bytes = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
            images.append(self.transform(image))

        return torch.stack(images).to(self.device)

    def inference(self, data):
        """
        Make predictions on the preprocessed data
        """
        with torch.no_grad():
            output = self.model(data)
            probabilities = F.softmax(output, dim=1)
        return probabilities

    def postprocess(self, data):
        """
        Post-process the model output
        """
        # Get the top 5 predictions for each image
        results = []
        for probs in data:
            top5_prob, top5_indices = torch.topk(probs, 5)
            top5_prob = top5_prob.tolist()
            top5_indices = top5_indices.tolist()
            top5_classes = [self.classes[idx] for idx in top5_indices]

            results.append([
                {"class": cls, "probability": prob}
                for cls, prob in zip(top5_classes, top5_prob)
            ])

        return results
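If you don't need custom pre/post-processing at all, TorchServe also ships built-in handlers (image_classifier, text_classifier, object_detector, image_segmenter) that you can pass to the archiver by name in the next step. As a rough sketch, the built-in image classifier expects an index_to_name.json label mapping instead of our imagenet_classes.txt:

bash
# Alternative: use the built-in image_classifier handler (no custom script required)
torch-model-archiver --model-name resnet18 --version 1.0 \
    --model-file model.py \
    --serialized-file resnet18_model.pth \
    --handler image_classifier \
    --extra-files index_to_name.json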

Step 3: Package Your Model with the Model Archiver

Now we need to package our model using the torch-model-archiver tool. This creates a .mar file (Model Archive) that bundles the model weights, the model definition (model.py), the handler, and any extra files the handler needs.

bash
torch-model-archiver --model-name resnet18 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file resnet18_model.pth \
    --handler image_classifier.py \
    --extra-files imagenet_classes.txt

This command creates a resnet18.mar file with all the necessary components.
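A .mar file is essentially a zip archive, so you can inspect what was packaged:

bash
# List the contents of the model archive
unzip -l resnet18.mar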

Step 4: Start TorchServe

Next, let's create a simple configuration file for TorchServe:

properties
# config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=model_store

Create a directory for storing model archives:

bash
mkdir -p model_store
mv resnet18.mar model_store/

Now, let's start TorchServe:

bash
torchserve --start --ncs --model-store model_store --models resnet18=resnet18.mar --ts-config config.properties

This command starts TorchServe with our ResNet18 model and makes it available for inference.
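You can verify the deployment through the management API, and stop the server when you're done:

bash
# List all registered models
curl http://localhost:8081/models

# Show details (workers, batch size, status) for resnet18
curl http://localhost:8081/models/resnet18

# Stop TorchServe
torchserve --stop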

Step 5: Make Inference Requests

Now that our model is deployed, we can send inference requests to it using the REST API:

python
import requests
import json
from PIL import Image
import io

# Load an image
image = Image.open("cat.jpg")

# Convert to bytes
img_bytes = io.BytesIO()
image.save(img_bytes, format='JPEG')

# Make prediction request
url = "http://localhost:8080/predictions/resnet18"
response = requests.post(url, data=img_bytes.getvalue(), headers={'Content-Type': 'application/octet-stream'})

# Print the response
predictions = json.loads(response.text)
print(json.dumps(predictions, indent=2))

The output might look like:

json
[
  {
    "class": "Egyptian cat",
    "probability": 0.8132
  },
  {
    "class": "tabby, tabby cat",
    "probability": 0.1023
  },
  {
    "class": "lynx, catamount",
    "probability": 0.0341
  },
  {
    "class": "tiger cat",
    "probability": 0.0201
  },
  {
    "class": "Persian cat",
    "probability": 0.0119
  }
]
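For quick checks, the same request can be made from the command line with curl:

bash
curl http://localhost:8080/predictions/resnet18 -T cat.jpg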

Advanced TorchServe Features

TorchServe offers several advanced features worth exploring:

1. Batch Inference

TorchServe can group multiple requests into a single batch to improve throughput. Batching is configured per model rather than globally, for example in the models section of config.properties:

properties
# config.properties: per-model batching for resnet18
models={"resnet18": {"1.0": {"marName": "resnet18.mar", "batchSize": 4, "maxBatchDelay": 100}}}
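These batching settings can also be supplied when registering a model through the management API, which is the most common way to enable batching for an individual model:

bash
# Register resnet18 with dynamic batching enabled
curl -X POST "http://localhost:8081/models?url=resnet18.mar&batch_size=4&max_batch_delay=100&initial_workers=1"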

2. Model Versioning and Management

TorchServe's management API allows you to:

  • Register new model versions
  • Set a default version
  • Scale models up/down

Example of registering a new model version:

bash
curl -X POST "http://localhost:8081/models?url=resnet18_v2.mar&model_name=resnet18&version=2.0"
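Setting a default version and scaling workers are likewise single calls to the management API:

bash
# Make version 2.0 the default
curl -X PUT "http://localhost:8081/models/resnet18/2.0/set-default"

# Scale resnet18 to at least two workers
curl -X PUT "http://localhost:8081/models/resnet18?min_worker=2"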

3. Metrics and Monitoring

TorchServe exposes Prometheus-compatible metrics:

bash
curl http://localhost:8082/metrics

4. Kubernetes Deployment

For production environments, you can deploy TorchServe on Kubernetes using the Helm chart maintained in the TorchServe GitHub repository (under kubernetes/Helm):

bash
git clone https://github.com/pytorch/serve.git
cd serve/kubernetes/Helm
helm install ts .

Real-World Use Case: Deploying a Sentiment Analysis Model

Let's look at a more complete example of deploying a sentiment analysis model:

  1. First, load a pre-trained BERT model and set it up as a two-class sentiment classifier (in a real scenario, you would fine-tune it on labeled data first):
python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load pre-trained model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Save the model and tokenizer
torch.save(model.state_dict(), "bert_sentiment.pth")
tokenizer.save_pretrained("./bert_tokenizer")
  2. Create a custom handler for sentiment analysis:
python
# sentiment_handler.py
import os

import torch
from ts.torch_handler.base_handler import BaseHandler
from transformers import BertForSequenceClassification, BertTokenizer

class SentimentAnalysisHandler(BaseHandler):
    def __init__(self):
        super().__init__()
        self.tokenizer = None

    def initialize(self, context):
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # We only serialized a state_dict, so rebuild the architecture and load the weights
        self.model = BertForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2
        )
        state_dict = torch.load(
            os.path.join(model_dir, "bert_sentiment.pth"), map_location=self.device
        )
        self.model.load_state_dict(state_dict)
        self.model.to(self.device)
        self.model.eval()

        # Load the tokenizer packaged via --extra-files
        tokenizer_dir = os.path.join(model_dir, "bert_tokenizer")
        if not os.path.isdir(tokenizer_dir):
            # Depending on how the archiver unpacked the extra files,
            # the tokenizer files may sit directly in model_dir
            tokenizer_dir = model_dir
        self.tokenizer = BertTokenizer.from_pretrained(tokenizer_dir)

        self.initialized = True

    def preprocess(self, data):
        text_list = []
        for row in data:
            # Each request item is a dict; the raw text is under "data" or "body"
            text = row.get("data") or row.get("body")
            if isinstance(text, (bytes, bytearray)):
                text = text.decode("utf-8")
            text_list.append(text)

        # Tokenize the input texts
        encodings = self.tokenizer(
            text_list,
            padding=True,
            truncation=True,
            return_tensors="pt",
            max_length=128
        )

        return encodings

    def inference(self, encodings):
        # Move data to device and run inference
        encodings = {k: v.to(self.device) for k, v in encodings.items()}
        with torch.no_grad():
            outputs = self.model(**encodings)
            probabilities = torch.nn.functional.softmax(outputs.logits, dim=1)

        return probabilities

    def postprocess(self, probabilities):
        # Convert probabilities to sentiment labels and scores
        results = []
        for probs in probabilities:
            negative_prob = probs[0].item()
            positive_prob = probs[1].item()
            sentiment = "positive" if positive_prob > negative_prob else "negative"
            confidence = max(positive_prob, negative_prob)

            results.append({
                "sentiment": sentiment,
                "confidence": confidence,
                "positive_probability": positive_prob,
                "negative_probability": negative_prob
            })

        return results
  3. Package and deploy the model:
bash
# Prepare the model archive
torch-model-archiver --model-name bert_sentiment \
    --version 1.0 \
    --serialized-file bert_sentiment.pth \
    --handler sentiment_handler.py \
    --extra-files "bert_tokenizer"

# Move to model store
mkdir -p model_store
mv bert_sentiment.mar model_store/

# Start TorchServe
torchserve --start --model-store model_store --models bert_sentiment=bert_sentiment.mar
  4. Make inference requests:
python
import requests
import json

# Sample text for sentiment analysis
text = "I absolutely loved this product! It exceeded all my expectations."

# Make prediction request
url = "http://localhost:8080/predictions/bert_sentiment"
response = requests.post(url, data=text.encode('utf-8'))

# Print the response
result = json.loads(response.text)
print(json.dumps(result, indent=2))

Output:

json
{
  "sentiment": "positive",
  "confidence": 0.9873,
  "positive_probability": 0.9873,
  "negative_probability": 0.0127
}
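As with the image model, you can also test from the command line by sending the text as the raw request body:

bash
curl -X POST http://localhost:8080/predictions/bert_sentiment \
    -H "Content-Type: text/plain" \
    -d "I absolutely loved this product! It exceeded all my expectations."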

Troubleshooting Common Issues

Model Loading Errors

If your model fails to load, check:

  • The model's format and compatibility
  • File paths in your handler
  • Memory requirements

Request Timeout

If requests time out:

properties
# Increase the backend response timeout (in seconds) in config.properties
default_response_timeout=300

Custom Dependencies

If your handler requires additional packages:

bash
# Create requirements.txt with dependencies
torch-model-archiver --model-name my_model \
--requirements-file requirements.txt \
...
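Note that TorchServe only installs per-model requirements if this behavior is enabled in config.properties:

properties
# Allow TorchServe to install model-specific Python dependencies
install_py_dep_per_model=true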

Summary

In this tutorial, we've covered:

  1. What TorchServe is and why it's useful for PyTorch model deployment
  2. How to install and configure TorchServe
  3. The process of creating custom handlers for pre/post-processing
  4. Packaging models with the model archiver
  5. Starting TorchServe and making inference requests
  6. Advanced features like batch processing and model management
  7. A real-world example of deploying a sentiment analysis model

TorchServe provides a powerful, flexible framework for deploying PyTorch models in production environments. With its robust features and straightforward API, you can quickly make your models available for inference without getting bogged down in deployment details.

Exercises

  1. Deploy a pre-trained image segmentation model (like DeepLabV3) using TorchServe.
  2. Create a custom handler for a text generation model that accepts a prompt and returns generated text.
  3. Implement a TorchServe deployment with dynamic batching and measure the performance difference.
  4. Set up TorchServe with multiple models and create a simple web application that uses them.
  5. Deploy TorchServe in a Docker container and configure it for horizontal scaling.

