PyTorch C++ Deployment
Introduction
PyTorch is one of the most popular deep learning frameworks used for building and training neural networks. While Python is excellent for model development and experimentation, there are many scenarios where deploying models in a production environment requires faster execution, lower memory footprint, or integration with existing C++ applications.
This is where PyTorch's C++ API, known as LibTorch, comes in. LibTorch allows you to:
- Deploy models in C++ environments without Python dependencies
- Integrate deep learning capabilities into existing C++ applications
- Achieve potentially better performance for inference tasks
- Create standalone applications that can run on devices without Python
In this guide, we'll walk through the entire process of taking a PyTorch model developed in Python and deploying it using C++.
Prerequisites
Before diving into PyTorch C++ deployment, you should have:
- Basic knowledge of PyTorch in Python
- Familiarity with C++ programming
- A pre-trained PyTorch model you want to deploy
- C++ development environment (compiler, build system)
Setting Up the LibTorch Environment
Step 1: Download LibTorch
First, you need to download the LibTorch library, which is the C++ distribution of PyTorch:
- Visit the PyTorch website
- Select your preferences (stable/nightly, your OS, C++/LibTorch, your CUDA version)
- Download the recommended binary
# Example for downloading LibTorch on Linux with CUDA 11.7
wget -O libtorch-cu117.zip "https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcu117.zip"
unzip libtorch-cu117.zip
Step 2: Configure Your Build System
For a simple CMake configuration:
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(torch_cpp_example)
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
add_executable(example example.cpp)
target_link_libraries(example "${TORCH_LIBRARIES}")
set_property(TARGET example PROPERTY CXX_STANDARD 17)
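With this CMakeLists.txt in place, a typical out-of-source build points CMake at the unzipped LibTorch directory through CMAKE_PREFIX_PATH (the path below is a placeholder for wherever you extracted the archive):
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch ..
cmake --build . --config Release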
Exporting PyTorch Models for C++
Before deploying a model in C++, you need to export it from Python. PyTorch's primary mechanism for this is TorchScript.
Using TorchScript
TorchScript is a way to create serializable and optimizable models from PyTorch code. There are two ways to convert a model to TorchScript:
- Tracing: Runs example inputs through the model and captures the operations
- Scripting: Directly analyzes your model code and converts it
Example: Tracing a Model
import torch
import torchvision
# Load a pretrained model
model = torchvision.models.resnet18(pretrained=True)
model.eval()
# Create an example input
example_input = torch.rand(1, 3, 224, 224)
# Trace the model
traced_script_module = torch.jit.trace(model, example_input)
# Save the model for C++ usage
traced_script_module.save("resnet18_traced.pt")
Example: Scripting a Model
import torch
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.fc(x)
        return torch.nn.functional.softmax(x, dim=1)
# Initialize the model
model = SimpleModel()
model.eval()
# Script the model
scripted_model = torch.jit.script(model)
# Save the model for C++ usage
scripted_model.save("simple_model_scripted.pt")
Loading and Running Models in C++
Now that you have exported your model, let's see how to load and use it in C++.
Basic Model Loading and Inference
#include <torch/script.h>
#include <iostream>

int main() {
    try {
        // Load the model (torch::jit::load returns a torch::jit::script::Module by value)
        torch::jit::script::Module module = torch::jit::load("resnet18_traced.pt");
        // Create an input tensor
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(torch::ones({1, 3, 224, 224}));
        // Execute the model
        torch::Tensor output = module.forward(inputs).toTensor();
        // Process the outputs
        auto max_result = output.argmax(1);
        std::cout << "Predicted class: " << max_result.item<int64_t>() << std::endl;
    }
    catch (const c10::Error& e) {
        std::cerr << "Error loading the model: " << e.what() << std::endl;
        return -1;
    }
    return 0;
}
Expected Output
Predicted class: 761
Optimizing Inference in C++
One of the main reasons to use C++ for deployment is performance. Let's explore how to optimize your model inference:
Batch Processing
Processing multiple inputs at once can improve throughput:
// Create a batch of inputs
auto batch_input = torch::ones({4, 3, 224, 224});
std::vector<torch::jit::IValue> inputs;
inputs.push_back(batch_input);
// Process the batch
torch::Tensor output = module.forward(inputs).toTensor();
// Get predictions for each item in the batch
auto predictions = output.argmax(1);
for (int i = 0; i < predictions.size(0); i++) {
    std::cout << "Batch item " << i << " prediction: " << predictions[i].item<int64_t>() << std::endl;
}
Using GPU Acceleration
If you have a CUDA-enabled GPU, you can accelerate inference:
// Check if CUDA is available
if (torch::cuda::is_available()) {
    // Move the model to GPU
    module.to(torch::kCUDA);
    // Create input tensor on GPU
    auto gpu_input = torch::ones({1, 3, 224, 224}).to(torch::kCUDA);
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(gpu_input);
    // Run inference on GPU
    auto output = module.forward(inputs).toTensor();
    // Move result back to CPU for processing if needed
    auto cpu_output = output.to(torch::kCPU);
    std::cout << "Prediction: " << cpu_output.argmax(1).item<int64_t>() << std::endl;
}
else {
    std::cout << "CUDA is not available, using CPU" << std::endl;
    // Fall back to CPU implementation
}
Quantization for Reduced Memory Usage
You can quantize your model to reduce memory usage and potentially increase speed:
// Note: Quantization is typically done in Python before exporting.
// In C++, you just load the pre-quantized TorchScript model.
torch::jit::script::Module quantized_module = torch::jit::load("quantized_model.pt");
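As a reference for the Python side, here is a minimal sketch of one way to produce such a file, applying dynamic quantization to the linear layers of the SimpleModel defined earlier; the right quantization approach depends on your model and target backend:
import torch

# Assumes the SimpleModel class from the scripting example above
model = SimpleModel()
model.eval()

# Dynamic quantization of linear layers (weights stored as int8)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Export the quantized model as TorchScript for loading in C++
scripted = torch.jit.script(quantized_model)
scripted.save("quantized_model.pt")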
Real-World Application: Image Classification Service
Let's implement a more complete example: a C++ service that loads an image classification model and processes images.
#include <torch/script.h>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <fstream>
#include <vector>
#include <string>

// Preprocess an image for the model
torch::Tensor preprocess_image(const cv::Mat& image) {
    // Resize to model input size
    cv::Mat resized_image;
    cv::resize(image, resized_image, cv::Size(224, 224));
    // OpenCV loads images as BGR; convert to RGB, which the model expects
    cv::Mat rgb_image;
    cv::cvtColor(resized_image, rgb_image, cv::COLOR_BGR2RGB);
    // Convert to float and scale to [0, 1]
    cv::Mat float_image;
    rgb_image.convertTo(float_image, CV_32F, 1.0 / 255);
    // Normalize with ImageNet mean and std
    cv::Scalar mean(0.485, 0.456, 0.406);
    cv::Scalar stddev(0.229, 0.224, 0.225);
    cv::Mat channels[3];
    cv::split(float_image, channels);
    for (int i = 0; i < 3; i++) {
        channels[i] = (channels[i] - mean[i]) / stddev[i];
    }
    cv::merge(channels, 3, float_image);
    // Wrap the cv::Mat data as a tensor in NHWC layout, then permute to NCHW
    torch::Tensor tensor_image = torch::from_blob(float_image.data,
                                                  {1, float_image.rows, float_image.cols, 3},
                                                  torch::kFloat32);
    // Clone so the tensor owns its memory (float_image goes out of scope here)
    return tensor_image.permute({0, 3, 1, 2}).clone();
}
// Load labels from file
std::vector<std::string> load_labels(const std::string& filename) {
    std::vector<std::string> labels;
    std::ifstream file(filename);
    std::string line;
    while (std::getline(file, line)) {
        labels.push_back(line);
    }
    return labels;
}
int main(int argc, char** argv) {
    if (argc < 3) {
        std::cerr << "Usage: " << argv[0] << " <model_path> <image_path> [labels_file]" << std::endl;
        return 1;
    }
    try {
        // Load the model
        torch::jit::script::Module module = torch::jit::load(argv[1]);
        module.eval();
        // Load the image
        cv::Mat image = cv::imread(argv[2]);
        if (image.empty()) {
            std::cerr << "Could not read the image: " << argv[2] << std::endl;
            return 1;
        }
        // Preprocess the image
        torch::Tensor tensor_image = preprocess_image(image);
        // Forward pass
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(tensor_image);
        torch::Tensor output = module.forward(inputs).toTensor();
        // Apply softmax to get probabilities
        auto probabilities = torch::softmax(output, 1);
        // Get the top 5 predictions (torch::topk returns a (values, indices) tuple)
        auto topk = torch::topk(probabilities, 5, 1);
        auto values = std::get<0>(topk);
        auto indices = std::get<1>(topk);
        // Print results
        if (argc > 3) {
            // If a labels file is provided
            auto labels = load_labels(argv[3]);
            for (int i = 0; i < 5; i++) {
                int idx = indices[0][i].item<int>();
                float prob = values[0][i].item<float>();
                std::cout << i + 1 << ". " << labels[idx] << ": " << (prob * 100.0) << "%" << std::endl;
            }
        } else {
            // Just print the class indices
            std::cout << "Top 5 predictions (class indices):" << std::endl;
            for (int i = 0; i < 5; i++) {
                int idx = indices[0][i].item<int>();
                float prob = values[0][i].item<float>();
                std::cout << i + 1 << ". Class " << idx << ": " << (prob * 100.0) << "%" << std::endl;
            }
        }
    }
    catch (const c10::Error& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}
To build this example, you'll need OpenCV installed in addition to LibTorch. Update your CMake file:
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(image_classifier)
find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(classifier classifier.cpp)
target_link_libraries(classifier "${TORCH_LIBRARIES}" "${OpenCV_LIBS}")
set_property(TARGET classifier PROPERTY CXX_STANDARD 17)
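The build steps are the same as before: point CMAKE_PREFIX_PATH at LibTorch (and at OpenCV if it lives in a non-standard location). Once built, the classifier could be invoked as below, where the model, image, and labels file names are placeholders for your own files:
./classifier resnet18_traced.pt cat.jpg imagenet_classes.txt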
Practical Considerations for Production Deployment
When deploying PyTorch models in C++ production environments, consider the following:
Model Versioning and Updates
// Load a model and check its version metadata
torch::jit::script::Module load_model(const std::string& model_path, int expected_version) {
    torch::jit::script::Module module = torch::jit::load(model_path);
    // Check whether this model has the expected version
    // (assuming the model stores its version as an attribute)
    try {
        auto version = module.attr("version").toInt();
        if (version != expected_version) {
            std::cerr << "Warning: Model version mismatch. Expected "
                      << expected_version << " but got " << version << std::endl;
        }
    }
    catch (...) {
        std::cerr << "Warning: Model has no version information" << std::endl;
    }
    return module;
}
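For this check to be meaningful, the exporting side has to embed a version attribute in the TorchScript module. One way to do that in Python, sketched here against the SimpleModel from earlier with a hypothetical version number and file name, is to assign the value as a module attribute before scripting and saving:
import torch

class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(100, 10)
        # Version number stored as a module attribute; it survives scripting
        self.version = 3

    def forward(self, x):
        return torch.nn.functional.softmax(self.fc(x), dim=1)

scripted = torch.jit.script(SimpleModel())
scripted.save("versioned_model.pt")  # readable via module.attr("version") in C++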
Error Handling
try {
    // Load model and perform inference
    auto module = torch::jit::load(model_path);
    auto output = module.forward(inputs).toTensor();
}
catch (const c10::Error& e) {
    std::cerr << "PyTorch error: " << e.what() << std::endl;
    // Handle gracefully, perhaps fall back to a default prediction
}
catch (const std::exception& e) {
    std::cerr << "Standard error: " << e.what() << std::endl;
    // Handle other errors
}
Multi-threading for Parallel Inference
#include <torch/script.h>
#include <thread>
#include <string>
#include <vector>

class InferenceWorker {
public:
    explicit InferenceWorker(const std::string& model_path)
        : module_(torch::jit::load(model_path)) {  // Load the model in the constructor
    }

    void process_images(const std::vector<torch::Tensor>& batch_images,
                        std::vector<torch::Tensor>& results) {
        std::vector<torch::jit::IValue> inputs;
        // Stack images into a batch
        auto batch = torch::stack(batch_images);
        inputs.push_back(batch);
        // Process the batch
        auto output = module_.forward(inputs).toTensor();
        // Store results
        for (int i = 0; i < output.size(0); i++) {
            results.push_back(output[i]);
        }
    }

private:
    torch::jit::script::Module module_;
};
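A minimal sketch of how such workers could be used: each thread owns its own InferenceWorker, and therefore its own copy of the model, so no locking is needed around the forward calls. The model file name and the random input batches are placeholders:
#include <iostream>

int main() {
    // Each worker loads its own module instance, so the threads share no state
    InferenceWorker worker_a("resnet18_traced.pt");
    InferenceWorker worker_b("resnet18_traced.pt");

    // Two placeholder batches of four images each
    std::vector<torch::Tensor> batch_a(4, torch::rand({3, 224, 224}));
    std::vector<torch::Tensor> batch_b(4, torch::rand({3, 224, 224}));
    std::vector<torch::Tensor> results_a, results_b;

    // Run both batches in parallel on separate threads
    std::thread t1([&] { worker_a.process_images(batch_a, results_a); });
    std::thread t2([&] { worker_b.process_images(batch_b, results_b); });
    t1.join();
    t2.join();

    std::cout << "Processed " << results_a.size() + results_b.size()
              << " images" << std::endl;
    return 0;
}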
Summary
In this guide, we've covered the essential aspects of deploying PyTorch models using C++ with LibTorch:
- Setting up the LibTorch environment - Downloading and configuring your build system
- Exporting models from Python - Using tracing and scripting to convert models
- Loading and running models in C++ - Basic inference with exported models
- Optimizing inference - Batch processing, GPU acceleration, and quantization
- Building a real-world application - An image classification service
- Production considerations - Model versioning, error handling, and multi-threading
PyTorch C++ deployment offers a powerful way to leverage your deep learning models in production environments, providing better performance, integration with C++ codebases, and standalone applications without Python dependencies.
Additional Resources
- PyTorch C++ API Documentation
- LibTorch Tutorials
- TorchScript Documentation
- PyTorch Mobile for edge deployment
Exercises
- Basic TorchScript Export: Train a simple neural network in Python (such as an MNIST classifier) and export it using both tracing and scripting methods.
- Model Loading: Write a C++ program that loads your exported model and runs inference on a test image.
- Performance Comparison: Compare the inference speed of your model in Python vs. C++ on the same hardware. Create a benchmark that processes 1000 samples and measures the total time.
- GPU Acceleration: Modify your C++ inference program to use GPU acceleration if available. Compare performance with CPU-only inference.
- Real-World Application: Create a simple REST API service using a C++ web framework (like Crow or Pistache) that accepts images and returns classifications using your deployed model.