TensorFlow Hyperparameters
Introduction
When building machine learning models with TensorFlow, you'll quickly discover that the performance of your model depends heavily on certain configuration settings called hyperparameters. Unlike model parameters (such as weights and biases) that are learned during training, hyperparameters must be set before training begins.
Hyperparameters control various aspects of the learning process, including:
- How quickly the model learns
- How complex the model is
- How to avoid overfitting
- How the optimization process works
In this guide, we'll explore the most important hyperparameters in TensorFlow, how to set them, and strategies for finding optimal values to improve your model's performance.
Key Hyperparameters in TensorFlow
Learning Rate
The learning rate is perhaps the most critical hyperparameter: it controls how large a step the optimizer takes each time it updates the model's parameters during training.
import tensorflow as tf
# Creating a model with a specified learning rate
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Setting the learning rate in the optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

model.compile(
    optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
Effects of learning rate:
- Too high: Model may fail to converge or even diverge
- Too low: Training will be very slow and may get stuck in local minima
- Just right: Model converges efficiently to a good solution
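In practice you don't have to commit to a single fixed value: Keras optimizers also accept a learning rate schedule that decays the rate as training progresses. Here is a minimal sketch using ExponentialDecay; the starting rate and decay settings are illustrative placeholders, not recommendations:

# A minimal sketch: decay the learning rate by 4% every 1,000 steps.
# All values here are illustrative; tune them for your own problem.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=1000,
    decay_rate=0.96
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)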
Batch Size
Batch size determines how many samples are processed before the model's internal parameters are updated.
# Training with a specific batch size
history = model.fit(
    x_train,
    y_train,
    batch_size=32,  # Number of samples per gradient update
    epochs=10,
    validation_data=(x_val, y_val)
)
Trade-offs with batch size:
- Larger batches: More stable gradient estimates but require more memory
- Smaller batches: Faster iterations but with noisier gradients
- Common values: 32, 64, 128, 256
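Batch size also fixes how many gradient updates you get per epoch, which is worth keeping in mind when comparing training runs. A quick back-of-the-envelope calculation (the 50,000-sample figure below is an assumption matching MNIST after a 10,000-sample validation split):

import math

num_samples = 50000  # Assumed training set size for illustration
for batch_size in [32, 64, 128, 256]:
    steps_per_epoch = math.ceil(num_samples / batch_size)
    print(f"batch_size={batch_size}: {steps_per_epoch} gradient updates per epoch")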
Number of Epochs
An epoch represents one complete pass through the entire training dataset.
# Setting the number of epochs
history = model.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=100,  # Number of complete passes through the training data
    validation_data=(x_val, y_val)
)
Too few epochs might cause underfitting, while too many might lead to overfitting. Using early stopping is a common technique to determine the optimal number of epochs automatically:
# Using early stopping to determine optimal epochs
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

history = model.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=100,  # Maximum number of epochs
    callbacks=[early_stopping],
    validation_data=(x_val, y_val)
)
Network Architecture Hyperparameters
These include the number of layers, number of neurons per layer, and activation functions.
# Creating a model with specific architecture hyperparameters
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),   # Number of neurons in the first layer
    tf.keras.layers.Dropout(0.3),                    # Dropout rate
    tf.keras.layers.Dense(128, activation='relu'),   # Number of neurons in the second layer
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])
Regularization Hyperparameters
Regularization helps prevent overfitting by constraining the model's capacity.
# L2 (Ridge) regularization
regularized_layer = tf.keras.layers.Dense(
    128,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.001)  # L2 regularization strength
)
# Dropout - another form of regularization
dropout_layer = tf.keras.layers.Dropout(0.5) # Dropout rate of 0.5 (50%)
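Keras also provides L1 and combined L1+L2 regularizers through the same kernel_regularizer argument. A brief sketch (the strengths below are placeholder values to tune, not recommendations):

# L1 (Lasso) regularization - encourages sparse weights
l1_layer = tf.keras.layers.Dense(
    128,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l1(0.001)
)

# Combined L1 + L2 regularization
l1_l2_layer = tf.keras.layers.Dense(
    128,
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.001, l2=0.0001)
)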
Practical Example: Hyperparameter Tuning
Let's see a complete example of how to tune hyperparameters for a simple classification model on the MNIST dataset:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import numpy as np
# Load and prepare the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0 # Normalize pixel values
# Split training data to create a validation set
val_size = 10000
x_val, y_val = x_train[:val_size], y_train[:val_size]
x_train, y_train = x_train[val_size:], y_train[val_size:]
# Define a function to create and train models with different hyperparameters
def train_model(learning_rate, batch_size, hidden_units, dropout_rate):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(hidden_units, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='val_accuracy',
        patience=3,
        restore_best_weights=True
    )

    history = model.fit(
        x_train, y_train,
        batch_size=batch_size,
        epochs=20,
        validation_data=(x_val, y_val),
        callbacks=[early_stopping],
        verbose=0
    )

    # Evaluate the model on the test set
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return model, test_acc, history
# Try different hyperparameter combinations
hyperparameters = [
    {'learning_rate': 0.001, 'batch_size': 32, 'hidden_units': 128, 'dropout_rate': 0.2},
    {'learning_rate': 0.01, 'batch_size': 64, 'hidden_units': 128, 'dropout_rate': 0.2},
    {'learning_rate': 0.001, 'batch_size': 32, 'hidden_units': 256, 'dropout_rate': 0.3},
]
results = []
for params in hyperparameters:
    print(f"Training with parameters: {params}")
    model, accuracy, history = train_model(**params)
    results.append((params, accuracy))
    print(f"Test accuracy: {accuracy:.4f}")
    print("-" * 50)
# Find the best hyperparameters
best_params, best_accuracy = max(results, key=lambda x: x[1])
print(f"Best hyperparameters: {best_params}")
print(f"Best test accuracy: {best_accuracy:.4f}")
Sample output (exact values will vary from run to run):
Training with parameters: {'learning_rate': 0.001, 'batch_size': 32, 'hidden_units': 128, 'dropout_rate': 0.2}
Test accuracy: 0.9772
--------------------------------------------------
Training with parameters: {'learning_rate': 0.01, 'batch_size': 64, 'hidden_units': 128, 'dropout_rate': 0.2}
Test accuracy: 0.9731
--------------------------------------------------
Training with parameters: {'learning_rate': 0.001, 'batch_size': 32, 'hidden_units': 256, 'dropout_rate': 0.3}
Test accuracy: 0.9803
--------------------------------------------------
Best hyperparameters: {'learning_rate': 0.001, 'batch_size': 32, 'hidden_units': 256, 'dropout_rate': 0.3}
Best test accuracy: 0.9803
Advanced Hyperparameter Tuning with Keras Tuner
For more systematic hyperparameter tuning, you can use Keras Tuner, a separate library that integrates with TensorFlow (install it with pip install keras-tuner):
import keras_tuner as kt
def model_builder(hp):
    """Builds a model with hyperparameters to tune"""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

    # Tune the number of units in the first Dense layer
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(tf.keras.layers.Dense(units=hp_units, activation='relu'))

    # Tune the dropout rate
    hp_dropout = hp.Float('dropout', min_value=0.1, max_value=0.5, step=0.1)
    model.add(tf.keras.layers.Dropout(rate=hp_dropout))

    # Output layer
    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    # Tune the learning rate
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model
# Create a tuner
tuner = kt.Hyperband(
    model_builder,
    objective='val_accuracy',
    max_epochs=10,
    factor=3,
    directory='my_dir',
    project_name='mnist_tuning'
)
# Configure early stopping
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
# Start the search
tuner.search(
    x_train, y_train,
    epochs=50,
    validation_data=(x_val, y_val),
    callbacks=[stop_early]
)
# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"""
The hyperparameter search is complete. The optimal number of units in the first dense layer is {best_hps.get('units')} and the optimal learning rate for the optimizer is {best_hps.get('learning_rate')}.
The optimal dropout rate is {best_hps.get('dropout')}.
""")
# Build the model with the optimal hyperparameters and train it
model = tuner.hypermodel.build(best_hps)
history = model.fit(x_train, y_train, epochs=50, validation_data=(x_val, y_val))
# Evaluate the model
eval_result = model.evaluate(x_test, y_test)
print(f"Test loss: {eval_result[0]}, Test accuracy: {eval_result[1]}")
Common Hyperparameter Guidelines
While optimal hyperparameters vary by dataset and problem, here are some general guidelines:
| Hyperparameter | Common Values | Notes |
| --- | --- | --- |
| Learning rate | 0.1, 0.01, 0.001, 0.0001 | Often start with 0.01 and adjust down if training is unstable |
| Batch size | 32, 64, 128, 256 | Limited by GPU memory; smaller batches can have a regularizing effect |
| Hidden layers | 1-5 for simple problems | Start with fewer layers and increase if underfitting |
| Neurons per layer | Powers of 2 (64, 128, 256, etc.) | Start with fewer neurons and increase if underfitting |
| Dropout rate | 0.1-0.5 | Higher values for larger models to prevent overfitting |
| L1/L2 regularization | 0.01, 0.001, 0.0001 | Start small and increase if overfitting |
Practical Hyperparameter Tuning Strategies
- Manual Search: Start with default values and adjust one hyperparameter at a time.
- Grid Search: Try all combinations from a predefined set of values.
- Random Search: Sample hyperparameter values from defined distributions.
- Bayesian Optimization: Use past evaluations to guide the search for better hyperparameters.
- Population-Based Training: Evolve a population of models, keeping the best ones.
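Several of these strategies are available directly in Keras Tuner. As a minimal sketch of random search, reusing the model_builder function from the previous section (the max_trials budget is an arbitrary illustration):

# Random search: sample hyperparameter combinations at random
random_tuner = kt.RandomSearch(
    model_builder,               # The builder function defined earlier
    objective='val_accuracy',
    max_trials=10,               # Arbitrary search budget for illustration
    directory='my_dir',
    project_name='mnist_random_search'
)
random_tuner.search(x_train, y_train, epochs=10, validation_data=(x_val, y_val))

Keras Tuner also offers a BayesianOptimization tuner with a similar interface if you want a guided search rather than random sampling.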
Visualizing the Impact of Hyperparameters
Visualizing the training process can help you understand the impact of hyperparameters:
import matplotlib.pyplot as plt
def plot_training_history(histories, labels):
    """Plot training and validation metrics for different models"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

    for history, label in zip(histories, labels):
        # Plot training & validation accuracy
        ax1.plot(history.history['accuracy'], label=f'{label} (train)')
        ax1.plot(history.history['val_accuracy'], label=f'{label} (val)')
        # Plot training & validation loss
        ax2.plot(history.history['loss'], label=f'{label} (train)')
        ax2.plot(history.history['val_loss'], label=f'{label} (val)')

    ax1.set_title('Model Accuracy')
    ax1.set_ylabel('Accuracy')
    ax1.set_xlabel('Epoch')
    ax1.legend()

    ax2.set_title('Model Loss')
    ax2.set_ylabel('Loss')
    ax2.set_xlabel('Epoch')
    ax2.legend()

    plt.tight_layout()
    plt.show()
# Example usage (assuming you've collected histories from different model runs)
histories = [history1, history2, history3] # From your model training
labels = ['LR=0.001', 'LR=0.01', 'LR=0.1']
plot_training_history(histories, labels)
Summary
Hyperparameters are critical settings that significantly influence the performance of your TensorFlow models. Key points to remember:
- Hyperparameters must be set before training begins, unlike model parameters
- The most important hyperparameters include learning rate, batch size, network architecture, and regularization settings
- Finding optimal hyperparameters involves systematic experimentation
- Tools like Keras Tuner can automate the search for optimal hyperparameters
- Visualizing the training process helps understand the impact of hyperparameter choices
By understanding and effectively tuning hyperparameters, you can significantly improve the performance of your TensorFlow models.
Additional Resources
- TensorFlow Guide to Hyperparameter Tuning
- Keras Tuner Documentation
- Google Cloud AI Platform for Hyperparameter Tuning
Exercises
- Train a simple neural network on the Fashion MNIST dataset with three different learning rates (0.1, 0.01, 0.001). Plot the training and validation accuracy curves to compare their performance.
- Implement grid search to find the optimal combination of batch size and dropout rate for a simple CNN on the CIFAR-10 dataset.
- Use Keras Tuner to optimize a deep neural network for a regression task on a dataset of your choice.
- Create a visualization that shows the relationship between model complexity (number of layers and neurons) and validation performance.
- Implement early stopping and learning rate scheduling in a TensorFlow model, and compare the results to a model without these techniques.