TensorFlow ONNX Conversion
Introduction
When deploying machine learning models across different platforms and frameworks, you often face compatibility challenges. A model trained in TensorFlow may need to run in an environment that doesn't support TensorFlow natively. This is where the Open Neural Network Exchange (ONNX) format comes in.
ONNX is an open standard designed to represent machine learning models in a framework-agnostic way. By converting your TensorFlow model to ONNX, you can deploy it on a variety of platforms and hardware that may not directly support TensorFlow, but do support ONNX. This includes deployment targets like:
- Microsoft's Windows ML
- NVIDIA TensorRT
- Intel OpenVINO
- Various mobile and edge devices
- Other frameworks and tools that can import ONNX models
In this tutorial, we'll walk through the process of converting TensorFlow models to the ONNX format, understand the benefits and limitations, and explore practical applications.
Prerequisites
Before you begin, make sure you have the following installed:
pip install tensorflow
pip install tf2onnx
pip install onnxruntime
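Conversion behavior can differ between releases of these packages, so it may help to record which versions are installed before you start. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    required = ["tensorflow", "tf2onnx", "onnxruntime"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'not installed'}")
```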
Basic Conversion Process
Converting a TensorFlow model to ONNX involves a few key steps. Let's start with a simple example.
Step 1: Create or Load a TensorFlow Model
First, let's create a simple TensorFlow model:
import tensorflow as tf
# Create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Reshape((28, 28, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Summary of the model architecture
model.summary()
Step 2: Convert the Model to ONNX
Now, let's convert this TensorFlow model to the ONNX format:
import tf2onnx
import onnx
# Convert with tf2onnx's from_keras function
onnx_model, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=(tf.TensorSpec((None, 28, 28), tf.float32, name="input"),),
    opset=13
)
# Save the model
onnx.save(onnx_model, "model.onnx")
print("Model converted to ONNX format and saved as 'model.onnx'")
Alternatively, you can use the command-line interface:
# For a TensorFlow SavedModel directory
python -m tf2onnx.convert --saved-model path/to/saved_model --output model.onnx
# For a Keras H5 file
python -m tf2onnx.convert --keras path/to/model.h5 --output model.onnx
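If the conversion is part of a build or deployment script, the same command line can be assembled and run from Python. This is a sketch under assumptions: the helper names `build_convert_command` and `convert_saved_model` are hypothetical, and the paths are the placeholders used above. It relies only on the `--saved-model`, `--output`, and `--opset` flags of the tf2onnx command-line interface.

```python
import subprocess
import sys


def build_convert_command(saved_model_dir, output_path, opset=13):
    """Build the argument list for `python -m tf2onnx.convert`."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", str(saved_model_dir),
        "--output", str(output_path),
        "--opset", str(opset),
    ]


def convert_saved_model(saved_model_dir, output_path, opset=13):
    """Run the conversion; raises CalledProcessError if tf2onnx exits non-zero."""
    command = build_convert_command(saved_model_dir, output_path, opset)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the command here; call convert_saved_model(...) to execute it.
    print(" ".join(build_convert_command("path/to/saved_model", "model.onnx")))
```

Using `sys.executable` keeps the conversion in the same Python environment as the calling script, which avoids picking up a different tf2onnx installation by accident.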
Step 3: Verify the Converted Model
It's always a good practice to verify that your converted model works as expected:
import numpy as np
import onnxruntime as ort
# Create a random input tensor matching your model's input shape
random_input = np.random.rand(1, 28, 28).astype(np.float32)
# Run inference with TensorFlow model
tf_output = model.predict(random_input)
# Run inference with ONNX model
ort_session = ort.InferenceSession("model.onnx")
onnx_output = ort_session.run(None, {"input": random_input})[0]
# Compare the outputs
print("TensorFlow output shape:", tf_output.shape)
print("ONNX output shape:", onnx_output.shape)
print("Outputs match:", np.allclose(tf_output, onnx_output, rtol=1e-5, atol=1e-5))
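When the comparison reports a mismatch, it helps to know how far apart the outputs are. `np.allclose(a, b)` accepts an element when `|a - b| <= atol + rtol * |b|`. The sketch below applies that rule in plain Python so that it is visible step by step; `closeness_report` is a hypothetical helper, not part of NumPy or ONNX Runtime.

```python
def closeness_report(a, b, rtol=1e-5, atol=1e-5):
    """Apply the np.allclose rule |a - b| <= atol + rtol * |b| to flat sequences.

    Returns the largest absolute difference and the number of elements
    that fall outside the tolerance.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    max_abs_diff = 0.0
    violations = 0
    for x, y in zip(a, b):
        diff = abs(x - y)
        max_abs_diff = max(max_abs_diff, diff)
        if diff > atol + rtol * abs(y):
            violations += 1
    return {"max_abs_diff": max_abs_diff, "violations": violations}


if __name__ == "__main__":
    # Illustrative numbers only, not real model outputs.
    print(closeness_report([0.5, 0.25], [0.5, 0.75]))
```

With the arrays from the verification step, the call would be `closeness_report(tf_output.ravel().tolist(), onnx_output.ravel().tolist())`. Small differences are expected because the two runtimes use different kernels, so a few violations at a tight tolerance do not necessarily indicate a broken conversion.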
Advanced Conversion Techniques
Handling Custom Layers and Operations
If your TensorFlow model contains custom layers or operations that tf2onnx cannot map to ONNX operators, you may need to provide custom conversion functions:
class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, units=32):
        super(CustomLayer, self).__init__()
        self.units = units
        
    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='random_normal',
            trainable=True,
        )
        
    def call(self, inputs):
        # Custom operation: multiply inputs by weights and apply tanh
        return tf.math.tanh(tf.matmul(inputs, self.w))
# tf2onnx converts the traced TensorFlow graph op by op, not Keras layer classes.
# CustomLayer lowers to MatMul and Tanh, which both have ONNX equivalents,
# so the model converts without a custom conversion function.
import tf2onnx
import onnx
# 16 input features is an arbitrary choice for this example
custom_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    CustomLayer(units=32),
])
onnx_model, _ = tf2onnx.convert.from_keras(
    custom_model,
    input_signature=(tf.TensorSpec((None, 16), tf.float32, name="input"),),
    opset=13
)
onnx.save(onnx_model, "custom_layer_model.onnx")
# A custom handler is only needed for a TensorFlow op type that tf2onnx cannot map.
# Handlers are registered per op type, either with the @tf_op class decorator from
# tf2onnx.handler or through the custom_op_handlers argument of from_keras.
Converting Models with Dynamic Dimensions
ONNX supports dynamic dimensions, which is useful when your model can handle inputs of varying sizes:
# Define a model with dynamic input dimensions
dynamic_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, None, 3)),  # Dynamic height and width
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax')
])
# Convert to ONNX with dynamic dimensions
onnx_model, _ = tf2onnx.convert.from_keras(
    dynamic_model, 
    input_signature=(tf.TensorSpec((None, None, None, 3), tf.float32, name="input"),),
    opset=13
)
onnx.save(onnx_model, "dynamic_model.onnx")
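After conversion, ONNX Runtime reports the declared input shape through `ort_session.get_inputs()[0].shape`, where fixed dimensions are integers and dynamic dimensions appear as non-integer entries (a symbolic name or `None`). The following sketch checks whether a concrete input fits such a declaration. The helper `shape_is_compatible` and the symbolic names in the example are hypothetical.

```python
def shape_is_compatible(declared_shape, actual_shape):
    """Return True if actual_shape fits declared_shape.

    Integer entries in declared_shape are fixed sizes; any other entry
    (None or a symbolic name string) is treated as a dynamic dimension.
    """
    if len(declared_shape) != len(actual_shape):
        return False
    for declared, actual in zip(declared_shape, actual_shape):
        if isinstance(declared, int) and declared != actual:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical declaration for the dynamic model above: only channels are fixed.
    declared = ["batch", "height", "width", 3]
    print(shape_is_compatible(declared, (1, 64, 48, 3)))    # channels match
    print(shape_is_compatible(declared, (1, 64, 48, 1)))    # channels differ
```

A check like this before calling `run` can turn a runtime shape error into a clearer message at the point where the input is prepared.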
Real-World Applications
Example 1: Cross-Platform Image Classification
Let's build and convert a pre-trained image classification model:
# Load a pre-trained MobileNetV2 model
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=True,
    weights='imagenet'
)
# Convert to ONNX
onnx_model, _ = tf2onnx.convert.from_keras(
    base_model,
    input_signature=(tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),),
    opset=13
)
onnx.save(onnx_model, "mobilenetv2.onnx")
# Example of using this model for inference
def process_image(image_path):
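    from PIL import Image  # requires Pillow: pip install Pillow
    
    # Load the image as 3-channel RGB (handles grayscale and RGBA files) and resize
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    img_array = np.array(img).astype(np.float32)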
    
    # Add batch dimension and normalize
    img_array = np.expand_dims(img_array, axis=0)
    img_array = tf.keras.applications.mobilenet_v2.preprocess_input(img_array)
    
    return img_array
# Run inference with ONNX Runtime
def onnx_predict(image_path, onnx_model_path="mobilenetv2.onnx"):
    img_array = process_image(image_path)
    
    ort_session = ort.InferenceSession(onnx_model_path)
    predictions = ort_session.run(None, {"input": img_array})[0]
    
    # Decode predictions
    decoded_predictions = tf.keras.applications.mobilenet_v2.decode_predictions(predictions, top=5)[0]
    
    return decoded_predictions
# Example usage
# results = onnx_predict("dog.jpg")
# for i, (imagenet_id, label, score) in enumerate(results):
#     print(f"{i+1}: {label} ({score:.2f})")
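The example above still calls TensorFlow for preprocessing and for decoding the predictions, which may not be an option on a target that only ships ONNX Runtime. MobileNetV2's `preprocess_input` scales pixel values from 0..255 to -1..1, and decoding amounts to picking the highest scores. The sketch below shows both steps in plain Python; the helper names and the three-entry label list are hypothetical stand-ins for the real ImageNet class list.

```python
def scale_pixels(pixels):
    """Map 0..255 pixel values to the -1..1 range that MobileNetV2 expects."""
    return [value / 127.5 - 1.0 for value in pixels]


def top_k(scores, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(scale_pixels([0, 127.5, 255]))
    # Hypothetical labels and scores, for illustration only.
    print(top_k([0.1, 0.7, 0.2], ["cat", "dog", "fox"], k=2))
```

With NumPy arrays, the same scaling is `img_array / 127.5 - 1.0`, applied after adding the batch dimension.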
Example 2: Deploying TensorFlow Models on Edge Devices
Many edge devices have limited support for TensorFlow but offer ONNX Runtime:
# Create a lightweight model suitable for edge deployment
edge_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation='softmax')  # Binary classification
])
# Convert to ONNX
onnx_model, _ = tf2onnx.convert.from_keras(
    edge_model,
    input_signature=(tf.TensorSpec((None, 128, 128, 3), tf.float32, name="input"),),
    opset=13
)
onnx.save(onnx_model, "edge_model.onnx")
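# Apply ONNX Runtime's graph optimizations offline and save the result
# (this fuses and simplifies nodes; the file size may change only slightly)
import os
import onnxruntime as ort
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
sess_options.optimized_model_filepath = "edge_model_optimized.onnx"
# Creating the session runs the optimizer and writes the optimized model to disk
ort.InferenceSession("edge_model.onnx", sess_options)
print("Original model size:", os.path.getsize("edge_model.onnx") / (1024 * 1024), "MB")
print("Optimized model size:", os.path.getsize("edge_model_optimized.onnx") / (1024 * 1024), "MB")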
Common Challenges and Solutions
Challenge 1: Unsupported Operations
Sometimes, TensorFlow operations don't have direct counterparts in ONNX:
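# Model with a layer from TensorFlow Addons that may not convert cleanly
# Note: TensorFlow Addons reached end of life in May 2024 and may not install with recent TensorFlow releases
import tensorflow_addons as tfa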
model_with_custom_ops = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3)),
    tfa.layers.GroupNormalization(groups=4),  # Custom operation
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax')
])
# Try to convert with a higher opset version
try:
    onnx_model, _ = tf2onnx.convert.from_keras(
        model_with_custom_ops,
        input_signature=(tf.TensorSpec((None, 28, 28, 1), tf.float32, name="input"),),
        opset=15  # Try with a higher opset version
    )
    onnx.save(onnx_model, "custom_ops_model.onnx")
    print("Conversion successful!")
except Exception as e:
    print(f"Conversion failed: {e}")
    print("Try implementing a custom converter for the unsupported operation")
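When a conversion succeeds but the target runtime rejects the model, a useful first step is to list which operator types the ONNX graph actually contains. Each node in `onnx_model.graph.node` has an `op_type` attribute. The sketch below counts those types and compares them with a set of operators the target supports. The helper names are hypothetical, the nodes are stand-ins, and the supported set must come from your target runtime's documentation.

```python
from collections import Counter
from types import SimpleNamespace


def op_histogram(nodes):
    """Count how often each operator type appears in a sequence of graph nodes."""
    return Counter(node.op_type for node in nodes)


def missing_ops(nodes, supported_ops):
    """Return the sorted operator types used by nodes but absent from supported_ops."""
    return sorted(set(op_histogram(nodes)) - set(supported_ops))


# Stand-in nodes for illustration; in practice pass onnx_model.graph.node.
demo_nodes = [SimpleNamespace(op_type=t) for t in ("Conv", "Relu", "Conv", "MaxPool")]

if __name__ == "__main__":
    print(op_histogram(demo_nodes).most_common())
    print(missing_ops(demo_nodes, supported_ops={"Conv", "Relu"}))
```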
Challenge 2: Model Optimization for Inference
After conversion, you can optimize your ONNX model for inference:
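import os
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType
# Quantize the weights to 8-bit integers to reduce size and, on some hardware, improve inference speed
quantized_model_path = "model_quantized.onnx"
# QUInt8 weights are used here because quantized convolutions with signed
# weights may not be supported by every execution provider
quantize_dynamic("model.onnx", quantized_model_path, weight_type=QuantType.QUInt8)
print(f"Original model size: {os.path.getsize('model.onnx') / 1024:.2f} KB")
print(f"Quantized model size: {os.path.getsize(quantized_model_path) / 1024:.2f} KB")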
# Compare inference speed
import time
def compare_inference_speed(original_model_path, quantized_model_path, input_data, runs=100):
    # Time `runs` inferences for one model, after a warm-up run
    def time_model(model_path):
        session = ort.InferenceSession(model_path)
        session.run(None, {"input": input_data})  # warm-up, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            session.run(None, {"input": input_data})
        return time.perf_counter() - start
    
    original_time = time_model(original_model_path)
    quantized_time = time_model(quantized_model_path)
    
    print(f"Original model inference time: {original_time:.4f} seconds")
    print(f"Quantized model inference time: {quantized_time:.4f} seconds")
    print(f"Speed improvement: {original_time/quantized_time:.2f}x")
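To see why quantization shrinks the model and why it can cost some accuracy, it helps to look at the arithmetic. The sketch below shows a simplified per-tensor 8-bit affine scheme: each float is stored as an integer in 0..255 together with a shared scale and zero point. This illustrates the general idea only; it is not ONNX Runtime's exact implementation, and the function names are hypothetical.

```python
def quantize_uint8(values):
    """Quantize floats to 0..255 integers with one shared scale and zero point."""
    # The range must include 0.0 so that zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = round(-lo / scale)
    quantized = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the stored integers."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-0.5, 0.0, 0.25, 1.0]  # illustrative values, not real model weights
    q, scale, zero_point = quantize_uint8(weights)
    restored = dequantize(q, scale, zero_point)
    print("quantized:", q)
    print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight shrinks from 4 bytes to 1, which is where most of the size reduction comes from. The round-trip error is bounded by roughly one quantization step (`scale`), so tensors with a wide value range lose more precision.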
Summary
In this tutorial, we learned how to convert TensorFlow models to the ONNX format for cross-platform deployment. Here's what we covered:
- Basic conversion of TensorFlow models to ONNX
- Handling custom layers and operations
- Working with dynamic dimensions
- Real-world applications and deployment examples
- Common challenges and optimization techniques
Converting your TensorFlow models to ONNX allows for greater flexibility in deployment across different platforms and hardware accelerators. Whether you need to run your model on Windows ML, edge devices, or integrate with other frameworks, ONNX provides a standardized format for interoperability.
Additional Resources
- Official tf2onnx Documentation
- ONNX Runtime Documentation
- ONNX Model Zoo - Examples of pre-trained models in ONNX format
- TensorFlow to ONNX Conversion Guide
Exercises
- Basic Conversion: Convert a pre-trained TensorFlow model like ResNet50 to ONNX and verify that the outputs match.
- Custom Layers: Create a TensorFlow model with a custom layer, convert it to ONNX, and implement a custom converter for any operation that fails to convert.
- Optimization: Quantize an ONNX model and measure the performance improvement on different hardware (CPU, GPU if available).
- Edge Deployment: Convert a TensorFlow Lite model to ONNX and deploy it on a mobile or edge device using ONNX Runtime.
- Interoperability: Convert a TensorFlow model to ONNX and then load it in PyTorch using ONNX as the bridge. PyTorch has no built-in ONNX importer, so this requires a third-party conversion tool.