TensorFlow ONNX Conversion

Introduction

When deploying machine learning models across different platforms and frameworks, you often face compatibility challenges. A model trained in TensorFlow may need to run in an environment that doesn't support TensorFlow natively. This is where the Open Neural Network Exchange (ONNX) format comes in.

ONNX is an open standard designed to represent machine learning models in a framework-agnostic way. By converting your TensorFlow model to ONNX, you can deploy it on a variety of platforms and hardware that may not directly support TensorFlow, but do support ONNX. This includes deployment targets like:

  • Microsoft's Windows ML
  • NVIDIA TensorRT
  • Intel OpenVINO
  • Various mobile and edge devices
  • Other ML frameworks like PyTorch

In this tutorial, we'll walk through the process of converting TensorFlow models to the ONNX format, understand the benefits and limitations, and explore practical applications.

Prerequisites

Before you begin, make sure you have the following installed:

bash
pip install tensorflow
pip install tf2onnx
pip install onnxruntime
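
To confirm the packages are importable, you can print their versions (assuming the usual __version__ attributes; the numbers you see will vary by environment):

python
import tensorflow as tf
import tf2onnx
import onnxruntime

print("TensorFlow:", tf.__version__)
print("tf2onnx:", tf2onnx.__version__)
print("ONNX Runtime:", onnxruntime.__version__)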

Basic Conversion Process

Converting a TensorFlow model to ONNX involves a few key steps. Let's start with a simple example.

Step 1: Create or Load a TensorFlow Model

First, let's create a simple TensorFlow model:

python
import tensorflow as tf

# Create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Reshape((28, 28, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Summary of the model architecture
model.summary()

Step 2: Convert the Model to ONNX

Now, let's convert this TensorFlow model to the ONNX format:

python
import tf2onnx
import onnx

# Convert the Keras model using tf2onnx.convert.from_keras
onnx_model, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=(tf.TensorSpec((None, 28, 28), tf.float32, name="input"),),
    opset=13
)

# Save the model
onnx.save(onnx_model, "model.onnx")

print("Model converted to ONNX format and saved as 'model.onnx'")

Alternatively, you can use the command-line interface:

bash
# For a saved Keras model
python -m tf2onnx.convert --saved-model path/to/saved_model --output model.onnx

# For a Keras H5 file
python -m tf2onnx.convert --keras path/to/model.h5 --output model.onnx
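
The --saved-model variant assumes the model has already been exported in TensorFlow's SavedModel format. A minimal sketch of that export step (the directory name is only an example; the exact save API depends on your Keras version):

python
# Keras 2.x (TF 2.x): saving to a directory writes a SavedModel
model.save("path/to/saved_model")

# Keras 3: use model.export() to produce a SavedModel directory instead
# model.export("path/to/saved_model")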

Step 3: Verify the Converted Model

It's always a good practice to verify that your converted model works as expected:

python
import numpy as np
import onnxruntime as ort

# Create a random input tensor matching your model's input shape
random_input = np.random.rand(1, 28, 28).astype(np.float32)

# Run inference with TensorFlow model
tf_output = model.predict(random_input)

# Run inference with ONNX model
ort_session = ort.InferenceSession("model.onnx")
onnx_output = ort_session.run(None, {"input": random_input})[0]

# Compare the outputs
print("TensorFlow output shape:", tf_output.shape)
print("ONNX output shape:", onnx_output.shape)
print("Outputs match:", np.allclose(tf_output, onnx_output, rtol=1e-5, atol=1e-5))

Advanced Conversion Techniques

Handling Custom Layers and Operations

If your TensorFlow model contains custom layers or operations, you may need to provide custom conversion functions:

python
class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, units=32):
        super(CustomLayer, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='random_normal',
            trainable=True,
        )

    def call(self, inputs):
        # Custom operation: multiply inputs by weights and apply tanh
        return tf.math.tanh(tf.matmul(inputs, self.w))

# Function to handle custom layer conversion
def convert_custom_layer(scope, operator, container):
    # Get input and output tensors
    input_tensor = operator.inputs[0].full_name
    output_tensor = operator.outputs[0].full_name

    # Get the layer's weight matrix (the single weight created in build())
    weights = operator.raw_operator.weights[0]
    weights_tensor = scope.get_unique_variable_name('W')
    container.add_initializer(weights_tensor, onnx.TensorProto.FLOAT,
                              weights.shape, weights.numpy().flatten().tolist())

    # Create MatMul + Tanh nodes
    matmul_output = scope.get_unique_variable_name('matmul_output')
    container.add_node('MatMul', [input_tensor, weights_tensor],
                       [matmul_output], name=scope.get_unique_operator_name('MatMul'))
    container.add_node('Tanh', [matmul_output], [output_tensor],
                       name=scope.get_unique_operator_name('Tanh'))

# Register the custom converter (the exact registration hook depends on your tf2onnx version)
from tf2onnx.handler import tf_op
tf_op("CustomLayer", domain="")(convert_custom_layer)

Converting Models with Dynamic Dimensions

ONNX supports dynamic dimensions, which is useful when your model can handle inputs of varying sizes:

python
# Define a model with dynamic input dimensions
dynamic_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, None, 3)),  # Dynamic height and width
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Convert to ONNX with dynamic dimensions
onnx_model, _ = tf2onnx.convert.from_keras(
    dynamic_model,
    input_signature=(tf.TensorSpec((None, None, None, 3), tf.float32, name="input"),),
    opset=13
)

onnx.save(onnx_model, "dynamic_model.onnx")

Real-World Applications

Example 1: Cross-Platform Image Classification

Let's convert a pre-trained image classification model to ONNX and run it with ONNX Runtime:

python
# Load a pre-trained MobileNetV2 model
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=True,
    weights='imagenet'
)

# Convert to ONNX
onnx_model, _ = tf2onnx.convert.from_keras(
    base_model,
    input_signature=(tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),),
    opset=13
)

onnx.save(onnx_model, "mobilenetv2.onnx")

# Example of using this model for inference
def process_image(image_path):
    from PIL import Image

    # Load the image, force 3 channels (handles grayscale/RGBA files), and resize
    img = Image.open(image_path).convert('RGB').resize((224, 224))
    img_array = np.array(img).astype(np.float32)

    # Add batch dimension and normalize
    img_array = np.expand_dims(img_array, axis=0)
    img_array = tf.keras.applications.mobilenet_v2.preprocess_input(img_array)

    return img_array

# Run inference with ONNX Runtime
def onnx_predict(image_path, onnx_model_path="mobilenetv2.onnx"):
    img_array = process_image(image_path)

    ort_session = ort.InferenceSession(onnx_model_path)
    predictions = ort_session.run(None, {"input": img_array})[0]

    # Decode predictions
    decoded_predictions = tf.keras.applications.mobilenet_v2.decode_predictions(predictions, top=5)[0]

    return decoded_predictions

# Example usage
# results = onnx_predict("dog.jpg")
# for i, (imagenet_id, label, score) in enumerate(results):
#     print(f"{i+1}: {label} ({score:.2f})")

Example 2: Deploying TensorFlow Models on Edge Devices

Many edge devices have limited support for TensorFlow but offer ONNX Runtime:

python
# Create a lightweight model suitable for edge deployment
edge_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation='softmax')  # Binary classification
])

# Convert to ONNX
onnx_model, _ = tf2onnx.convert.from_keras(
    edge_model,
    input_signature=(tf.TensorSpec((None, 128, 128, 3), tf.float32, name="input"),),
    opset=13
)

onnx.save(onnx_model, "edge_model.onnx")

# Optimize the model for inference (reduces model size and improves performance)
import os
from onnxruntime.transformers.optimizer import optimize_model

optimized_model = optimize_model("edge_model.onnx")
optimized_model.save_model_to_file("edge_model_optimized.onnx")

print("Original model size:", os.path.getsize("edge_model.onnx") / (1024 * 1024), "MB")
print("Optimized model size:", os.path.getsize("edge_model_optimized.onnx") / (1024 * 1024), "MB")

Common Challenges and Solutions

Challenge 1: Unsupported Operations

Sometimes, TensorFlow operations don't have direct counterparts in ONNX:

python
# Model with potentially problematic custom operations
import tensorflow_addons as tfa

model_with_custom_ops = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3)),
    tfa.layers.GroupNormalization(groups=4),  # Custom operation
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Try to convert with a higher opset version
try:
    onnx_model, _ = tf2onnx.convert.from_keras(
        model_with_custom_ops,
        input_signature=(tf.TensorSpec((None, 28, 28, 1), tf.float32, name="input"),),
        opset=15  # Try with a higher opset version
    )
    onnx.save(onnx_model, "custom_ops_model.onnx")
    print("Conversion successful!")
except Exception as e:
    print(f"Conversion failed: {e}")
    print("Try implementing a custom converter for the unsupported operation")

Challenge 2: Model Optimization for Inference

After conversion, you can optimize your ONNX model for inference:

python
import os
import time

import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the model to reduce size and improve inference speed
quantized_model_path = "model_quantized.onnx"
quantize_dynamic("model.onnx", quantized_model_path)

print(f"Original model size: {os.path.getsize('model.onnx') / 1024:.2f} KB")
print(f"Quantized model size: {os.path.getsize(quantized_model_path) / 1024:.2f} KB")

# Compare inference speed
def compare_inference_speed(original_model_path, quantized_model_path, input_data):
    # Original model
    session_original = ort.InferenceSession(original_model_path)

    # Quantized model
    session_quantized = ort.InferenceSession(quantized_model_path)

    # Measure original model speed
    start = time.time()
    for _ in range(100):
        session_original.run(None, {"input": input_data})
    original_time = time.time() - start

    # Measure quantized model speed
    start = time.time()
    for _ in range(100):
        session_quantized.run(None, {"input": input_data})
    quantized_time = time.time() - start

    print(f"Original model inference time: {original_time:.4f} seconds")
    print(f"Quantized model inference time: {quantized_time:.4f} seconds")
    print(f"Speed improvement: {original_time/quantized_time:.2f}x")

Summary

In this tutorial, we learned how to convert TensorFlow models to the ONNX format for cross-platform deployment. Here's what we covered:

  1. Basic conversion of TensorFlow models to ONNX
  2. Handling custom layers and operations
  3. Working with dynamic dimensions
  4. Real-world applications and deployment examples
  5. Common challenges and optimization techniques

Converting your TensorFlow models to ONNX allows for greater flexibility in deployment across different platforms and hardware accelerators. Whether you need to run your model on Windows ML, edge devices, or integrate with other frameworks, ONNX provides a standardized format for interoperability.

Exercises

  1. Basic Conversion: Convert a pre-trained TensorFlow model like ResNet50 to ONNX and verify that the outputs match.

  2. Custom Layers: Create a TensorFlow model with a custom layer and implement a custom converter for that layer.

  3. Optimization: Quantize an ONNX model and measure the performance improvement on different hardware (CPU, GPU if available).

  4. Edge Deployment: Convert a TensorFlow Lite model to ONNX and deploy it on a mobile or edge device using ONNX Runtime.

  5. Interoperability: Convert a TensorFlow model to ONNX and then load it in PyTorch using ONNX as the bridge.


