PyTorch ONNX Conversion
Introduction
When developing machine learning models with PyTorch, you'll often need to deploy these models in production environments that may not support PyTorch directly. This is where the Open Neural Network Exchange (ONNX) format comes in. ONNX provides a way to convert models between different frameworks, allowing you to train a model in PyTorch and deploy it in environments optimized for inference.
In this tutorial, we'll cover:
- What ONNX is and why it's useful
- How to convert PyTorch models to ONNX format
- How to customize the export process
- How to validate and optimize your converted models
- Real-world examples of ONNX deployment
What is ONNX?
ONNX (Open Neural Network Exchange) is an open-source format designed to represent machine learning models. It defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
Key benefits of ONNX include:
- Interoperability: Train models in one framework (like PyTorch) and deploy in another (like TensorFlow or ONNX Runtime)
- Performance optimization: Access to specialized inference engines and hardware accelerators
- Portability: Deploy to various platforms including cloud, edge devices, and mobile applications
- Ecosystem support: Backed by major tech companies like Microsoft, Facebook, and Amazon
Basic PyTorch to ONNX Conversion
Let's start with a simple example of converting a PyTorch model to ONNX format:
import torch
import torch.nn as nn
import torch.onnx
# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(100, 50)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(50, 10)
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
# Initialize model
model = SimpleModel()
model.eval() # Set the model to evaluation mode
# Create dummy input for the model
dummy_input = torch.randn(1, 100) # batch_size=1, input_size=100
# Export the model
torch.onnx.export(
    model,                     # model being run
    dummy_input,               # model input (or a tuple for multiple inputs)
    "simple_model.onnx",       # where to save the model
    export_params=True,        # store the trained parameter weights inside the model file
    opset_version=12,          # the ONNX version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=['input'],     # the model's input names
    output_names=['output'],   # the model's output names
    dynamic_axes={             # variable length axes
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)
print("Model exported to simple_model.onnx")
When you run this code, PyTorch will convert your model to ONNX format and save it as simple_model.onnx.
Understanding the Export Parameters
Let's break down the parameters in the torch.onnx.export function:
- model: The PyTorch model you want to export
- dummy_input: A sample input that helps ONNX trace the model's execution
- f: Path where the ONNX model will be saved (the third positional argument, "simple_model.onnx" above)
- export_params: If True, the model parameters will be stored in the model file
- opset_version: The version of the ONNX operator set to use (higher versions support more operations)
- do_constant_folding: Optimization that replaces constant expressions with their results
- input_names/output_names: Names for the input and output tensors
- dynamic_axes: Specifies which dimensions can vary (such as batch size)
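For example, the effect of dynamic_axes is visible in the exported file itself: the batch dimension is stored as a symbolic name rather than a fixed size. Here is a small sketch using the onnx package (covered in more detail in the next section):
import onnx
model_proto = onnx.load("simple_model.onnx")
# Axis 0 is recorded as the symbolic dim_param "batch_size" instead of a fixed dim_value
print(model_proto.graph.input[0].type.tensor_type.shape)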
Verifying Your ONNX Model
After conversion, it's important to verify that your ONNX model works correctly. You can use the ONNX library to check the model structure and validate it:
import onnx
# Load the ONNX model
onnx_model = onnx.load("simple_model.onnx")
# Check that the model is well formed
onnx.checker.check_model(onnx_model)
# Print a human readable representation of the graph
print(onnx.helper.printable_graph(onnx_model.graph))
This code will validate that your ONNX model is structurally sound and print out a representation of the computational graph.
Running Inference with ONNX Runtime
Once you have your ONNX model, you can use ONNX Runtime for inference:
import onnxruntime as ort
import numpy as np
# Create an ONNX Runtime session
session = ort.InferenceSession("simple_model.onnx")
# Prepare input data as numpy array (matching the input shape expected by the model)
input_data = np.random.randn(1, 100).astype(np.float32)
# Run inference
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
result = session.run([output_name], {input_name: input_data})
print(f"Input shape: {input_data.shape}")
print(f"Output shape: {result[0].shape}")
print(f"Output: {result[0]}")
The output will show the shape and values of your model's predictions based on the random input data.
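The dynamic_axes setting from the export step is what lets the same file handle other batch sizes at inference time. Continuing with the session created above, a quick sketch:
# Run the exported model with two different batch sizes
for batch_size in (1, 4):
    batch = np.random.randn(batch_size, 100).astype(np.float32)
    output = session.run([output_name], {input_name: batch})[0]
    print(f"batch_size={batch_size} -> output shape {output.shape}")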
Converting More Complex Models
CNN Example
Let's see how to convert a more complex model like a CNN for image classification:
import torch
import torch.nn as nn
import torchvision.models as models
# Load a pretrained ResNet model (torchvision >= 0.13; on older versions use models.resnet18(pretrained=True))
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
# Create dummy input (3-channel image)
dummy_input = torch.randn(1, 3, 224, 224)
# Export the model
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    export_params=True,
    opset_version=12,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)
print("ResNet18 model exported to resnet18.onnx")
RNN Example
RNNs and LSTMs can be a bit trickier to export due to their recurrent nature, but ONNX supports them:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)
    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        # Take the output from the last time step
        output = self.linear(lstm_out[:, -1, :])
        return output
# Initialize model
model = LSTMModel(input_size=10, hidden_size=20, output_size=5)
model.eval()
# Create dummy input (sequence length = 15, input features = 10)
dummy_input = torch.randn(1, 15, 10) # [batch_size, seq_len, input_size]
# Export the model
torch.onnx.export(
    model,
    dummy_input,
    "lstm_model.onnx",
    export_params=True,
    opset_version=12,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size', 1: 'seq_length'},
        'output': {0: 'batch_size'}
    }
)
print("LSTM model exported to lstm_model.onnx")
Common Challenges and Solutions
1. Handling Custom Operations
If your PyTorch model includes custom operations, you may need to register them with ONNX:
from torch.onnx import register_custom_op_symbolic
# Define how your custom op should be translated into ONNX nodes
def my_custom_op_symbolic(g, input):
    return g.op("MyCustomOp", input)
# Register the op; the name must include your op's namespace, e.g. 'mynamespace::my_custom_op'
register_custom_op_symbolic('mynamespace::my_custom_op', my_custom_op_symbolic, opset_version=12)
2. Dealing with Unsupported Operations
Sometimes you'll encounter operations that aren't supported by ONNX. In these cases, you might need to:
- Use a higher ONNX opset version (see the probing sketch after this list)
- Reimplement the problematic part of your model using supported operations
- Create a custom implementation of the operation
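A quick way to find the lowest opset that covers your model is simply to retry the export at increasing opset versions and inspect the failures. A minimal sketch, reusing the model and dummy_input from earlier:
# Probe successive opset versions until the export succeeds
for opset in range(9, 18):
    try:
        torch.onnx.export(model, dummy_input, "probe.onnx", opset_version=opset)
        print(f"opset {opset}: export succeeded")
        break
    except Exception as error:
        print(f"opset {opset}: {error}")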
3. Model Size Optimization
For large models, you may want to reduce the ONNX file size. One option is ONNX Runtime's dynamic quantization, which stores the weights as 8-bit integers:
from onnxruntime.quantization import quantize_dynamic, QuantType
# Quantize the weights to int8 to shrink the file (accuracy impact is usually small, but verify)
quantize_dynamic(
    model_input="large_model.onnx",
    model_output="optimized_model.onnx",
    weight_type=QuantType.QInt8
)
Real-World Application: Deploying to Mobile
One common use case for ONNX conversion is deploying PyTorch models to mobile devices. Here's a simplified workflow:
1. Convert your PyTorch model to ONNX:
model = YourPyTorchModel()  # your trained model class
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "mobile_model.onnx", export_params=True)
2. Optimize the model for mobile. ONNX Runtime provides a command-line tool that converts a .onnx file into the compact .ort format used by ONNX Runtime Mobile:
python -m onnxruntime.tools.convert_onnx_models_to_ort mobile_model.onnx
3. Integrate with mobile frameworks:
- For Android: Use ONNX Runtime for Android
- For iOS: Use Core ML (by converting ONNX to Core ML) or ONNX Runtime for iOS
Real-World Application: Cloud Deployment
For cloud deployment, you might want to use ONNX Runtime as a web service:
from flask import Flask, request, jsonify
import onnxruntime as ort
import numpy as np
from PIL import Image
import io
app = Flask(__name__)
# Load the ONNX model
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
@app.route('/predict', methods=['POST'])
def predict():
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'})
    file = request.files['file']
    img = Image.open(io.BytesIO(file.read()))
    img = img.resize((224, 224)).convert('RGB')
    # Preprocess the image
    input_data = np.array(img).transpose(2, 0, 1).astype('float32')
    input_data = input_data / 255.0  # normalize
    input_data = input_data.reshape(1, 3, 224, 224)
    # Run inference
    result = session.run([output_name], {input_name: input_data})
    # Process the result (e.g., get class with highest probability)
    prediction = np.argmax(result[0], axis=1)[0]
    return jsonify({'prediction': int(prediction)})
if __name__ == '__main__':
    app.run(debug=True)
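With the service running, you can exercise the endpoint from Python using the requests library; the image path below is just a placeholder:
import requests
# Post a local image to the prediction endpoint (Flask serves on port 5000 by default)
with open("test_image.jpg", "rb") as f:
    response = requests.post("http://localhost:5000/predict", files={"file": f})
print(response.json())  # {"prediction": <class index>}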
Summary
In this tutorial, we've learned:
- What ONNX is and why it's useful for model deployment
- How to convert simple and complex PyTorch models to ONNX format
- How to verify and run inference with ONNX models
- How to handle common challenges during conversion
- Real-world applications of ONNX models in mobile and cloud environments
ONNX conversion is an essential skill for deploying PyTorch models in production environments, allowing you to leverage the best of both worlds: PyTorch's flexibility during development and specialized runtimes optimized for inference during deployment.
Additional Resources
- Official ONNX GitHub Repository
- PyTorch ONNX Documentation
- ONNX Runtime Documentation
- ONNX Model Zoo - Collection of pre-trained, state-of-the-art models in ONNX format
Exercises
- Convert a pre-trained PyTorch model (like ResNet, BERT, or your own custom model) to ONNX and run inference with ONNX Runtime.
- Compare the inference speed between PyTorch and ONNX Runtime for the same model.
- Create a web service that accepts images and returns predictions using your ONNX model.
- Try converting a model with custom layers and handle any challenges that arise.
- Optimize an ONNX model for deployment on a resource-constrained device like a Raspberry Pi.
By mastering the PyTorch to ONNX conversion process, you'll greatly expand the environments where you can deploy your machine learning models, making your skills more versatile and valuable in production settings.