
PyTorch TorchServe

Introduction

When you've finished training your PyTorch model, the next crucial step is deployment: making your model available to users or other systems. This is where TorchServe comes in.

TorchServe is an open-source model serving framework specifically designed for PyTorch models. Developed collaboratively by AWS and Facebook, TorchServe provides a flexible and easy-to-use solution for deploying machine learning models in production environments.

In this tutorial, we'll explore how to use TorchServe to deploy PyTorch models, making them accessible via REST APIs. Whether you're deploying to a cloud service, an edge device, or a local development environment, TorchServe streamlines the process and handles many production concerns for you.

Why Use TorchServe?

Before diving into implementation, let's understand why TorchServe is valuable:

  • Simplicity: Deploy models with minimal code
  • Performance: Optimized for high-throughput and low-latency serving
  • Flexibility: Support for custom pre/post-processing logic
  • Monitoring: Built-in metrics and logging
  • Scalability: Dynamic batching and multi-model serving

Prerequisites

To follow along with this tutorial, you'll need:

  • Python 3.6+
  • PyTorch 1.3+
  • Java 11+ (required for TorchServe)

Installing TorchServe

Let's begin by installing TorchServe and its dependencies:

bash
pip install torch torchserve torch-model-archiver torch-workflow-archiver
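After installation, you can quickly check that the TorchServe CLI is available (assuming a standard pip environment):

bash
torchserve --version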

Understanding the TorchServe Workflow

The TorchServe deployment process follows these key steps:

  1. Create a handler script (for pre/post-processing)
  2. Package your model using the model archiver
  3. Start TorchServe with your model
  4. Send inference requests to the API endpoints

Let's walk through each step with a practical example.

Step 1: Prepare Your Model

First, let's assume we have a simple trained PyTorch model that classifies images. Here's a basic example:

python
import torch
import torch.nn as nn
import torchvision.models as models

# Create a simple model (in a real scenario, you would train this)
model = models.resnet18(pretrained=True)

# Save the model
torch.save(model.state_dict(), "resnet18_model.pth")
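Because we saved only the model's state_dict, the model archiver will also need a small Python file that defines the architecture (referenced as model.py in Step 3). A minimal sketch that matches the keys in resnet18_model.pth:

python
# model.py - model architecture definition used by torch-model-archiver
from torchvision.models.resnet import ResNet, BasicBlock

class ImageClassifier(ResNet):
    """ResNet-18: BasicBlock with a [2, 2, 2, 2] layer configuration."""
    def __init__(self):
        super().__init__(BasicBlock, [2, 2, 2, 2])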

Step 2: Create a Custom Handler

TorchServe uses handlers to manage preprocessing of input data, inference, and postprocessing of model output. You can use built-in handlers or create a custom one.

Let's create a custom handler for our image classification model:

python
# image_classifier.py
import io
import os

import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image
from ts.torch_handler.base_handler import BaseHandler

class ImageClassifierHandler(BaseHandler):
    """
    Custom handler for image classification using ResNet18
    """
    def __init__(self):
        super().__init__()
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])
        self.classes = None

    def initialize(self, context):
        """
        Initialize the model and load the class labels
        """
        super().initialize(context)

        # Extra files are extracted into the model directory,
        # so build the path to the label file from there
        model_dir = context.system_properties.get("model_dir")
        with open(os.path.join(model_dir, "imagenet_classes.txt"), "r") as f:
            self.classes = [line.strip() for line in f.readlines()]

    def preprocess(self, data):
        """
        Preprocess the input data
        """
        images = []
        for row in data:
            # Each request item is a dict; the raw image bytes are under "data" or "body"
            image_bytes = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
            images.append(self.transform(image))

        return torch.stack(images).to(self.device)

    def inference(self, data):
        """
        Make predictions on the preprocessed data
        """
        with torch.no_grad():
            output = self.model(data)
            probabilities = F.softmax(output, dim=1)
        return probabilities

    def postprocess(self, data):
        """
        Post-process the model output
        """
        # Get the top 5 predictions for each image
        results = []
        for probs in data:
            top5_prob, top5_indices = torch.topk(probs, 5)
            top5_prob = top5_prob.tolist()
            top5_indices = top5_indices.tolist()
            top5_classes = [self.classes[idx] for idx in top5_indices]

            results.append([
                {"class": cls, "probability": prob}
                for cls, prob in zip(top5_classes, top5_prob)
            ])

        return results
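If you don't need custom pre/post-processing at all, TorchServe also ships built-in handlers (image_classifier, text_classifier, object_detector, image_segmenter) that you can pass to the archiver by name in the next step. As a rough sketch, the built-in image classifier expects an index_to_name.json label mapping instead of our imagenet_classes.txt:

bash
# Alternative: use the built-in image_classifier handler (no custom script required)
torch-model-archiver --model-name resnet18 --version 1.0 \
    --model-file model.py \
    --serialized-file resnet18_model.pth \
    --handler image_classifier \
    --extra-files index_to_name.json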

Step 3: Package Your Model with the Model Archiver

Now we need to package our model using the torch-model-archiver tool. This creates a .mar file (Model Archive) that bundles the model weights, the model definition (model.py), the handler, and any extra files the handler needs.

bash
torch-model-archiver --model-name resnet18 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file resnet18_model.pth \
    --handler image_classifier.py \
    --extra-files imagenet_classes.txt

This command creates a resnet18.mar file with all the necessary components.
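A .mar file is essentially a zip archive, so you can inspect what was packaged:

bash
# List the contents of the model archive
unzip -l resnet18.mar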

Step 4: Start TorchServe

Next, let's create a simple configuration file for TorchServe:

properties
# config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=model_store

Create a directory for storing model archives:

bash
mkdir -p model_store
mv resnet18.mar model_store/

Now, let's start TorchServe:

bash
torchserve --start --ncs --model-store model_store --models resnet18=resnet18.mar --ts-config config.properties

This command starts TorchServe with our ResNet18 model and makes it available for inference.
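You can verify the deployment through the management API, and stop the server when you're done:

bash
# List all registered models
curl http://localhost:8081/models

# Show details (workers, batch size, status) for resnet18
curl http://localhost:8081/models/resnet18

# Stop TorchServe
torchserve --stop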

Step 5: Make Inference Requests

Now that our model is deployed, we can send inference requests to it using the REST API:

python
import requests
import json
from PIL import Image
import io

# Load an image
image = Image.open("cat.jpg")

# Convert to bytes
img_bytes = io.BytesIO()
image.save(img_bytes, format='JPEG')

# Make prediction request
url = "http://localhost:8080/predictions/resnet18"
response = requests.post(url, data=img_bytes.getvalue(), headers={'Content-Type': 'application/octet-stream'})

# Print the response
predictions = json.loads(response.text)
print(json.dumps(predictions, indent=2))

The output might look like:

json
[
  {
    "class": "Egyptian cat",
    "probability": 0.8132
  },
  {
    "class": "tabby, tabby cat",
    "probability": 0.1023
  },
  {
    "class": "lynx, catamount",
    "probability": 0.0341
  },
  {
    "class": "tiger cat",
    "probability": 0.0201
  },
  {
    "class": "Persian cat",
    "probability": 0.0119
  }
]
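For quick checks, the same request can be made from the command line with curl:

bash
curl http://localhost:8080/predictions/resnet18 -T cat.jpg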

Advanced TorchServe Features

TorchServe offers several advanced features worth exploring:

1. Batch Inference

TorchServe can group multiple requests into a single batch to improve throughput. Batching is configured per model rather than globally, for example in the models section of config.properties:

properties
# config.properties: per-model batching for resnet18
models={"resnet18": {"1.0": {"marName": "resnet18.mar", "batchSize": 4, "maxBatchDelay": 100}}}
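These batching settings can also be supplied when registering a model through the management API, which is the most common way to enable batching for an individual model:

bash
# Register resnet18 with dynamic batching enabled
curl -X POST "http://localhost:8081/models?url=resnet18.mar&batch_size=4&max_batch_delay=100&initial_workers=1"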

2. Model Versioning and Management

TorchServe's management API allows you to:

  • Register new model versions
  • Set a default version
  • Scale models up/down

Example of registering a new model version:

bash
curl -X POST "http://localhost:8081/models?url=resnet18_v2.mar&model_name=resnet18&version=2.0"
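Setting a default version and scaling workers are likewise single calls to the management API:

bash
# Make version 2.0 the default
curl -X PUT "http://localhost:8081/models/resnet18/2.0/set-default"

# Scale resnet18 to at least two workers
curl -X PUT "http://localhost:8081/models/resnet18?min_worker=2"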

3. Metrics and Monitoring

TorchServe exposes Prometheus-compatible metrics:

bash
curl http://localhost:8082/metrics

4. Kubernetes Deployment

For production environments, you can deploy TorchServe on Kubernetes using the Helm chart maintained in the TorchServe GitHub repository (under kubernetes/Helm):

bash
git clone https://github.com/pytorch/serve.git
cd serve/kubernetes/Helm
helm install ts .

Real-World Use Case: Deploying a Sentiment Analysis Model

Let's look at a more complete example of deploying a sentiment analysis model:

  1. First, load a pre-trained BERT model and set it up as a two-class sentiment classifier (in a real scenario, you would fine-tune it on labeled data first):
python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load pre-trained model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Save the model and tokenizer
torch.save(model.state_dict(), "bert_sentiment.pth")
tokenizer.save_pretrained("./bert_tokenizer")
  2. Create a custom handler for sentiment analysis:
python
# sentiment_handler.py
import os

import torch
from ts.torch_handler.base_handler import BaseHandler
from transformers import BertForSequenceClassification, BertTokenizer

class SentimentAnalysisHandler(BaseHandler):
    def __init__(self):
        super().__init__()
        self.tokenizer = None

    def initialize(self, context):
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # We only serialized a state_dict, so rebuild the architecture and load the weights
        self.model = BertForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2
        )
        state_dict = torch.load(
            os.path.join(model_dir, "bert_sentiment.pth"), map_location=self.device
        )
        self.model.load_state_dict(state_dict)
        self.model.to(self.device)
        self.model.eval()

        # Load the tokenizer packaged via --extra-files
        tokenizer_dir = os.path.join(model_dir, "bert_tokenizer")
        if not os.path.isdir(tokenizer_dir):
            # Depending on how the archiver unpacked the extra files,
            # the tokenizer files may sit directly in model_dir
            tokenizer_dir = model_dir
        self.tokenizer = BertTokenizer.from_pretrained(tokenizer_dir)

        self.initialized = True

    def preprocess(self, data):
        text_list = []
        for row in data:
            # Each request item is a dict; the raw text is under "data" or "body"
            text = row.get("data") or row.get("body")
            if isinstance(text, (bytes, bytearray)):
                text = text.decode("utf-8")
            text_list.append(text)

        # Tokenize the input texts
        encodings = self.tokenizer(
            text_list,
            padding=True,
            truncation=True,
            return_tensors="pt",
            max_length=128
        )

        return encodings

    def inference(self, encodings):
        # Move data to device and run inference
        encodings = {k: v.to(self.device) for k, v in encodings.items()}
        with torch.no_grad():
            outputs = self.model(**encodings)
            probabilities = torch.nn.functional.softmax(outputs.logits, dim=1)

        return probabilities

    def postprocess(self, probabilities):
        # Convert probabilities to sentiment labels and scores
        results = []
        for probs in probabilities:
            negative_prob = probs[0].item()
            positive_prob = probs[1].item()
            sentiment = "positive" if positive_prob > negative_prob else "negative"
            confidence = max(positive_prob, negative_prob)

            results.append({
                "sentiment": sentiment,
                "confidence": confidence,
                "positive_probability": positive_prob,
                "negative_probability": negative_prob
            })

        return results
  3. Package and deploy the model:
bash
# Prepare the model archive
torch-model-archiver --model-name bert_sentiment \
    --version 1.0 \
    --serialized-file bert_sentiment.pth \
    --handler sentiment_handler.py \
    --extra-files "bert_tokenizer"

# Move to model store
mkdir -p model_store
mv bert_sentiment.mar model_store/

# Start TorchServe
torchserve --start --model-store model_store --models bert_sentiment=bert_sentiment.mar
  4. Make inference requests:
python
import requests
import json

# Sample text for sentiment analysis
text = "I absolutely loved this product! It exceeded all my expectations."

# Make prediction request
url = "http://localhost:8080/predictions/bert_sentiment"
response = requests.post(url, data=text.encode('utf-8'))

# Print the response
result = json.loads(response.text)
print(json.dumps(result, indent=2))

Output:

json
{
  "sentiment": "positive",
  "confidence": 0.9873,
  "positive_probability": 0.9873,
  "negative_probability": 0.0127
}
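As with the image model, you can also test from the command line by sending the text as the raw request body:

bash
curl -X POST http://localhost:8080/predictions/bert_sentiment \
    -H "Content-Type: text/plain" \
    -d "I absolutely loved this product! It exceeded all my expectations."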

Troubleshooting Common Issues

Model Loading Errors

If your model fails to load, check:

  • The model's format and compatibility
  • File paths in your handler
  • Memory requirements

Request Timeout

If requests time out:

properties
# Increase the backend response timeout (in seconds) in config.properties
default_response_timeout=300

Custom Dependencies

If your handler requires additional packages:

bash
# Create requirements.txt with dependencies
torch-model-archiver --model-name my_model \
--requirements-file requirements.txt \
...
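Note that TorchServe only installs per-model requirements if this behavior is enabled in config.properties:

properties
# Allow TorchServe to install model-specific Python dependencies
install_py_dep_per_model=true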

Summary

In this tutorial, we've covered:

  1. What TorchServe is and why it's useful for PyTorch model deployment
  2. How to install and configure TorchServe
  3. The process of creating custom handlers for pre/post-processing
  4. Packaging models with the model archiver
  5. Starting TorchServe and making inference requests
  6. Advanced features like batch processing and model management
  7. A real-world example of deploying a sentiment analysis model

TorchServe provides a powerful, flexible framework for deploying PyTorch models in production environments. With its robust features and straightforward API, you can quickly make your models available for inference without getting bogged down in deployment details.

Exercises

  1. Deploy a pre-trained image segmentation model (like DeepLabV3) using TorchServe.
  2. Create a custom handler for a text generation model that accepts a prompt and returns generated text.
  3. Implement a TorchServe deployment with dynamic batching and measure the performance difference.
  4. Set up TorchServe with multiple models and create a simple web application that uses them.
  5. Deploy TorchServe in a Docker container and configure it for horizontal scaling.

