PyTorch C++ Deployment
Introduction
PyTorch is one of the most popular deep learning frameworks used for building and training neural networks. While Python is excellent for model development and experimentation, there are many scenarios where deploying models in a production environment requires faster execution, lower memory footprint, or integration with existing C++ applications.
This is where PyTorch's C++ API, known as LibTorch, comes in. LibTorch allows you to:
- Deploy models in C++ environments without Python dependencies
- Integrate deep learning capabilities into existing C++ applications
- Achieve potentially better performance for inference tasks
- Create standalone applications that can run on devices without Python
In this guide, we'll walk through the entire process of taking a PyTorch model developed in Python and deploying it using C++.
Prerequisites
Before diving into PyTorch C++ deployment, you should have:
- Basic knowledge of PyTorch in Python
- Familiarity with C++ programming
- A pre-trained PyTorch model you want to deploy
- C++ development environment (compiler, build system)
Setting Up the LibTorch Environment
Step 1: Download LibTorch
First, you need to download the LibTorch library, which is the C++ distribution of PyTorch:
- Visit the PyTorch website
- Select your preferences (stable/nightly, your OS, C++/LibTorch, your CUDA version)
- Download the recommended binary
# Example for downloading LibTorch on Linux with CUDA 11.7
wget -O libtorch-cu117.zip "https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcu117.zip"
unzip libtorch-cu117.zip
Step 2: Configure Your Build System
For a simple CMake configuration:
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(torch_cpp_example)
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
add_executable(example example.cpp)
target_link_libraries(example "${TORCH_LIBRARIES}")
set_property(TARGET example PROPERTY CXX_STANDARD 17)
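With this CMakeLists.txt in place, a typical out-of-source build points CMake at the unzipped LibTorch directory through CMAKE_PREFIX_PATH (the path below is a placeholder for wherever you extracted the archive):
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch ..
cmake --build . --config Release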
Exporting PyTorch Models for C++
Before deploying a model in C++, you need to export it from Python. PyTorch's primary mechanism for this is TorchScript.
Using TorchScript
TorchScript is a way to create serializable and optimizable models from PyTorch code. There are two ways to convert a model to TorchScript:
- Tracing: Runs example inputs through the model and captures the operations
- Scripting: Directly analyzes your model code and converts it
Example: Tracing a Model
import torch
import torchvision
# Load a pretrained model
model = torchvision.models.resnet18(pretrained=True)
model.eval()
# Create an example input
example_input = torch.rand(1, 3, 224, 224)
# Trace the model
traced_script_module = torch.jit.trace(model, example_input)
# Save the model for C++ usage
traced_script_module.save("resnet18_traced.pt")
Example: Scripting a Model
import torch
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.fc(x)
        return torch.nn.functional.softmax(x, dim=1)
# Initialize the model
model = SimpleModel()
model.eval()
# Script the model
scripted_model = torch.jit.script(model)
# Save the model for C++ usage
scripted_model.save("simple_model_scripted.pt")
Loading and Running Models in C++
Now that you have exported your model, let's see how to load and use it in C++.
Basic Model Loading and Inference
#include <torch/script.h>
#include <iostream>

int main() {
    try {
        // Load the model (torch::jit::load returns a torch::jit::script::Module by value)
        torch::jit::script::Module module = torch::jit::load("resnet18_traced.pt");
        // Create an input tensor
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(torch::ones({1, 3, 224, 224}));
        // Execute the model
        torch::Tensor output = module.forward(inputs).toTensor();
        // Process the outputs
        auto max_result = output.argmax(1);
        std::cout << "Predicted class: " << max_result.item<int64_t>() << std::endl;
    }
    catch (const c10::Error& e) {
        std::cerr << "Error loading the model: " << e.what() << std::endl;
        return -1;
    }
    return 0;
}
Expected Output
Predicted class: 761
Optimizing Inference in C++
One of the main reasons to use C++ for deployment is performance. Let's explore how to optimize your model inference:
Batch Processing
Processing multiple inputs at once can improve throughput:
// Create a batch of inputs
auto batch_input = torch::ones({4, 3, 224, 224});
std::vector<torch::jit::IValue> inputs;
inputs.push_back(batch_input);
// Process the batch
torch::Tensor output = module.forward(inputs).toTensor();
// Get predictions for each item in the batch
auto predictions = output.argmax(1);
for (int i = 0; i < predictions.size(0); i++) {
    std::cout << "Batch item " << i << " prediction: " << predictions[i].item<int64_t>() << std::endl;
}
Using GPU Acceleration
If you have a CUDA-enabled GPU, you can accelerate inference:
// Check if CUDA is available
if (torch::cuda::is_available()) {
    // Move the model to GPU
    module.to(torch::kCUDA);
    // Create input tensor on GPU
    auto gpu_input = torch::ones({1, 3, 224, 224}).to(torch::kCUDA);
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(gpu_input);
    // Run inference on GPU
    auto output = module.forward(inputs).toTensor();
    // Move result back to CPU for processing if needed
    auto cpu_output = output.to(torch::kCPU);
    std::cout << "Prediction: " << cpu_output.argmax(1).item<int64_t>() << std::endl;
}
else {
    std::cout << "CUDA is not available, using CPU" << std::endl;
    // Fall back to CPU implementation
}
Quantization for Reduced Memory Usage
You can quantize your model to reduce memory usage and potentially increase speed:
// Note: Quantization is typically done in Python before exporting.
// In C++, you just load the pre-quantized TorchScript model.
torch::jit::script::Module quantized_module = torch::jit::load("quantized_model.pt");
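As a reference for the Python side, here is a minimal sketch of one way to produce such a file, applying dynamic quantization to the linear layers of the SimpleModel defined earlier; the right quantization approach depends on your model and target backend:
import torch

# Assumes the SimpleModel class from the scripting example above
model = SimpleModel()
model.eval()

# Dynamic quantization of linear layers (weights stored as int8)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Export the quantized model as TorchScript for loading in C++
scripted = torch.jit.script(quantized_model)
scripted.save("quantized_model.pt")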
Real-World Application: Image Classification Service
Let's implement a more complete example: a C++ service that loads an image classification model and processes images.
#include <torch/script.h>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <fstream>
#include <vector>
#include <string>

// Preprocess an image for the model
torch::Tensor preprocess_image(const cv::Mat& image) {
    // Resize to model input size
    cv::Mat resized_image;
    cv::resize(image, resized_image, cv::Size(224, 224));
    // OpenCV loads images as BGR; convert to RGB, which the model expects
    cv::Mat rgb_image;
    cv::cvtColor(resized_image, rgb_image, cv::COLOR_BGR2RGB);
    // Convert to float and scale to [0, 1]
    cv::Mat float_image;
    rgb_image.convertTo(float_image, CV_32F, 1.0 / 255);
    // Normalize with ImageNet mean and std
    cv::Scalar mean(0.485, 0.456, 0.406);
    cv::Scalar stddev(0.229, 0.224, 0.225);
    cv::Mat channels[3];
    cv::split(float_image, channels);
    for (int i = 0; i < 3; i++) {
        channels[i] = (channels[i] - mean[i]) / stddev[i];
    }
    cv::merge(channels, 3, float_image);
    // Wrap the cv::Mat data as a tensor in NHWC layout, then permute to NCHW
    torch::Tensor tensor_image = torch::from_blob(float_image.data,
                                                  {1, float_image.rows, float_image.cols, 3},
                                                  torch::kFloat32);
    // Clone so the tensor owns its memory (float_image goes out of scope here)
    return tensor_image.permute({0, 3, 1, 2}).clone();
}
// Load labels from file
std::vector<std::string> load_labels(const std::string& filename) {
    std::vector<std::string> labels;
    std::ifstream file(filename);
    std::string line;
    while (std::getline(file, line)) {
        labels.push_back(line);
    }
    return labels;
}
int main(int argc, char** argv) {
    if (argc < 3) {
        std::cerr << "Usage: " << argv[0] << " <model_path> <image_path> [labels_file]" << std::endl;
        return 1;
    }
    try {
        // Load the model
        torch::jit::script::Module module = torch::jit::load(argv[1]);
        module.eval();
        // Load the image
        cv::Mat image = cv::imread(argv[2]);
        if (image.empty()) {
            std::cerr << "Could not read the image: " << argv[2] << std::endl;
            return 1;
        }
        // Preprocess the image
        torch::Tensor tensor_image = preprocess_image(image);
        // Forward pass
        std::vector<torch::jit::IValue> inputs;
        inputs.push_back(tensor_image);
        torch::Tensor output = module.forward(inputs).toTensor();
        // Apply softmax to get probabilities
        auto probabilities = torch::softmax(output, 1);
        // Get the top 5 predictions (torch::topk returns a (values, indices) tuple)
        auto topk = torch::topk(probabilities, 5, 1);
        auto values = std::get<0>(topk);
        auto indices = std::get<1>(topk);
        // Print results
        if (argc > 3) {
            // If a labels file is provided
            auto labels = load_labels(argv[3]);
            for (int i = 0; i < 5; i++) {
                int idx = indices[0][i].item<int>();
                float prob = values[0][i].item<float>();
                std::cout << i + 1 << ". " << labels[idx] << ": " << (prob * 100.0) << "%" << std::endl;
            }
        } else {
            // Just print the class indices
            std::cout << "Top 5 predictions (class indices):" << std::endl;
            for (int i = 0; i < 5; i++) {
                int idx = indices[0][i].item<int>();
                float prob = values[0][i].item<float>();
                std::cout << i + 1 << ". Class " << idx << ": " << (prob * 100.0) << "%" << std::endl;
            }
        }
    }
    catch (const c10::Error& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}
To build this example, you'll need OpenCV installed in addition to LibTorch. Update your CMake file:
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(image_classifier)
find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(classifier classifier.cpp)
target_link_libraries(classifier "${TORCH_LIBRARIES}" "${OpenCV_LIBS}")
set_property(TARGET classifier PROPERTY CXX_STANDARD 17)
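The build steps are the same as before: point CMAKE_PREFIX_PATH at LibTorch (and at OpenCV if it lives in a non-standard location). Once built, the classifier could be invoked as below, where the model, image, and labels file names are placeholders for your own files:
./classifier resnet18_traced.pt cat.jpg imagenet_classes.txt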
Practical Considerations for Production Deployment
When deploying PyTorch models in C++ production environments, consider the following:
Model Versioning and Updates
// Load a model and check its version metadata
torch::jit::script::Module load_model(const std::string& model_path, int expected_version) {
    torch::jit::script::Module module = torch::jit::load(model_path);
    // Check whether this model has the expected version
    // (assuming the model stores its version as an attribute)
    try {
        auto version = module.attr("version").toInt();
        if (version != expected_version) {
            std::cerr << "Warning: Model version mismatch. Expected "
                      << expected_version << " but got " << version << std::endl;
        }
    }
    catch (...) {
        std::cerr << "Warning: Model has no version information" << std::endl;
    }
    return module;
}
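For this check to be meaningful, the exporting side has to embed a version attribute in the TorchScript module. One way to do that in Python, sketched here against the SimpleModel from earlier with a hypothetical version number and file name, is to assign the value as a module attribute before scripting and saving:
import torch

class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(100, 10)
        # Version number stored as a module attribute; it survives scripting
        self.version = 3

    def forward(self, x):
        return torch.nn.functional.softmax(self.fc(x), dim=1)

scripted = torch.jit.script(SimpleModel())
scripted.save("versioned_model.pt")  # readable via module.attr("version") in C++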
Error Handling
try {
    // Load model and perform inference
    auto module = torch::jit::load(model_path);
    auto output = module.forward(inputs).toTensor();
}
catch (const c10::Error& e) {
    std::cerr << "PyTorch error: " << e.what() << std::endl;
    // Handle gracefully, perhaps fall back to a default prediction
}
catch (const std::exception& e) {
    std::cerr << "Standard error: " << e.what() << std::endl;
    // Handle other errors
}
Multi-threading for Parallel Inference
#include <torch/script.h>
#include <thread>
#include <string>
#include <vector>

class InferenceWorker {
public:
    explicit InferenceWorker(const std::string& model_path)
        : module_(torch::jit::load(model_path)) {  // Load the model in the constructor
    }

    void process_images(const std::vector<torch::Tensor>& batch_images,
                        std::vector<torch::Tensor>& results) {
        std::vector<torch::jit::IValue> inputs;
        // Stack images into a batch
        auto batch = torch::stack(batch_images);
        inputs.push_back(batch);
        // Process the batch
        auto output = module_.forward(inputs).toTensor();
        // Store results
        for (int i = 0; i < output.size(0); i++) {
            results.push_back(output[i]);
        }
    }

private:
    torch::jit::script::Module module_;
};
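A minimal sketch of how such workers could be used: each thread owns its own InferenceWorker, and therefore its own copy of the model, so no locking is needed around the forward calls. The model file name and the random input batches are placeholders:
#include <iostream>

int main() {
    // Each worker loads its own module instance, so the threads share no state
    InferenceWorker worker_a("resnet18_traced.pt");
    InferenceWorker worker_b("resnet18_traced.pt");

    // Two placeholder batches of four images each
    std::vector<torch::Tensor> batch_a(4, torch::rand({3, 224, 224}));
    std::vector<torch::Tensor> batch_b(4, torch::rand({3, 224, 224}));
    std::vector<torch::Tensor> results_a, results_b;

    // Run both batches in parallel on separate threads
    std::thread t1([&] { worker_a.process_images(batch_a, results_a); });
    std::thread t2([&] { worker_b.process_images(batch_b, results_b); });
    t1.join();
    t2.join();

    std::cout << "Processed " << results_a.size() + results_b.size()
              << " images" << std::endl;
    return 0;
}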
Summary
In this guide, we've covered the essential aspects of deploying PyTorch models using C++ with LibTorch:
- Setting up the LibTorch environment - Downloading and configuring your build system
- Exporting models from Python - Using tracing and scripting to convert models
- Loading and running models in C++ - Basic inference with exported models
- Optimizing inference - Batch processing, GPU acceleration, and quantization
- Building a real-world application - An image classification service
- Production considerations - Model versioning, error handling, and multi-threading
PyTorch C++ deployment offers a powerful way to leverage your deep learning models in production environments, providing better performance, integration with C++ codebases, and standalone applications without Python dependencies.
Additional Resources
- PyTorch C++ API Documentation
- LibTorch Tutorials
- TorchScript Documentation
- PyTorch Mobile for edge deployment
Exercises
- Basic TorchScript Export: Train a simple neural network in Python (such as an MNIST classifier) and export it using both tracing and scripting methods.
- Model Loading: Write a C++ program that loads your exported model and runs inference on a test image.
- Performance Comparison: Compare the inference speed of your model in Python vs. C++ on the same hardware. Create a benchmark that processes 1000 samples and measures the total time.
- GPU Acceleration: Modify your C++ inference program to use GPU acceleration if available. Compare performance with CPU-only inference.
- Real-World Application: Create a simple REST API service using a C++ web framework (like Crow or Pistache) that accepts images and returns classifications using your deployed model.