
TensorFlow Bidirectional RNN

Introduction

Recurrent Neural Networks (RNNs) are powerful tools for processing sequential data like text, time series, or speech. Traditional RNNs process sequences in a single direction, typically from past to future (left to right). However, in many real-world problems, understanding the context from both directions can be crucial.

Bidirectional RNNs (BiRNNs) solve this limitation by processing sequences in both directions—forward and backward—and then combining the results. This architecture allows the network to capture context from both past and future states at any point in the sequence.

In this tutorial, you'll learn:

  • What bidirectional RNNs are and why they're useful
  • How to implement BiRNNs using TensorFlow's high-level Keras API
  • Practical applications and use cases for BiRNNs
  • Best practices for training and evaluating BiRNN models

Understanding Bidirectional RNNs

The Concept of Bidirectionality

A bidirectional RNN consists of two separate RNN layers:

  1. Forward layer: Processes the sequence from start to end (left to right)
  2. Backward layer: Processes the sequence from end to start (right to left)

The outputs from both layers are combined (usually by concatenation, but sometimes by summation, multiplication, or averaging) to form a single output. This approach allows each output state to carry information about the entire sequence, not just the preceding elements.

Here's a simple illustration of how a bidirectional RNN processes a sequence:

Forward RNN:   x₁ →  x₂ →  x₃ →  x₄ →  x₅
                ↓     ↓     ↓     ↓     ↓
               h₁ᶠ   h₂ᶠ   h₃ᶠ   h₄ᶠ   h₅ᶠ

Backward RNN:  x₁ ←  x₂ ←  x₃ ←  x₄ ←  x₅
                ↓     ↓     ↓     ↓     ↓
               h₁ᵇ   h₂ᵇ   h₃ᵇ   h₄ᵇ   h₅ᵇ

Combined:      [h₁ᶠ, h₁ᵇ]  [h₂ᶠ, h₂ᵇ]  [h₃ᶠ, h₃ᵇ]  [h₄ᶠ, h₄ᵇ]  [h₅ᶠ, h₅ᵇ]
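
To make the diagram concrete, here is a minimal, self-contained sketch (using SimpleRNN cells and a toy sequence chosen purely for illustration) of the idea: run one RNN forward, run a second RNN over the reversed sequence, re-align the backward outputs, and concatenate the two at every time step:

python
import numpy as np
import tensorflow as tf

# Toy sequence: 1 example, 5 time steps, 1 feature
x = np.arange(5, dtype="float32").reshape(1, 5, 1)

# The backward half is just an RNN run over the reversed sequence;
# go_backwards=True is how Keras expresses this
fwd = tf.keras.layers.SimpleRNN(3, return_sequences=True)
bwd = tf.keras.layers.SimpleRNN(3, return_sequences=True, go_backwards=True)

h_fwd = fwd(x)                                  # h1_f ... h5_f
h_bwd = tf.reverse(bwd(x), axis=[1])            # re-aligned to h1_b ... h5_b
combined = tf.concat([h_fwd, h_bwd], axis=-1)   # [ht_f, ht_b] at every step
print(combined.shape)  # (1, 5, 6): two 3-unit states concatenated per step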

When to Use BiRNNs

BiRNNs are particularly useful when:

  • The context from future inputs is as important as past inputs
  • You need to understand the entire sequence context for each prediction
  • The sequence can be fully observed before making predictions

Common applications include:

  • Natural Language Processing (NLP) tasks like named entity recognition
  • Speech recognition
  • Protein structure prediction
  • Handwriting recognition

Implementing Bidirectional RNNs in TensorFlow

TensorFlow makes it easy to build bidirectional RNNs using the Bidirectional wrapper from the Keras API.

Basic Implementation

Here's a basic example of a bidirectional LSTM network for sequence classification:

python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Embedding

# Define a simple bidirectional LSTM model
model = Sequential([
    # Input layer: Embedding layer for text data
    Embedding(input_dim=10000, output_dim=128, input_length=100),

    # Bidirectional LSTM layer
    Bidirectional(LSTM(64, return_sequences=False)),

    # Output layer
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Print model summary
model.summary()

When you run this code, you'll see a model summary showing the architecture:

Model: "sequential"
_________________________________________________________________
Layer (type)                    Output Shape              Param #
=================================================================
embedding (Embedding)           (None, 100, 128)          1280000
_________________________________________________________________
bidirectional (Bidirectional)   (None, 128)               98816
_________________________________________________________________
dense (Dense)                   (None, 1)                 129
=================================================================
Total params: 1,378,945
Trainable params: 1,378,945
Non-trainable params: 0
_________________________________________________________________

Note that the bidirectional layer outputs 128 features (64 * 2) because it combines the outputs from both forward and backward LSTMs.
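
If you want to check this yourself, a small standalone snippet (with a dummy input shaped like the embedding output above) shows both behaviors of the wrapper. The return_sequences=True variant emits one 128-dimensional vector per time step, which is exactly what the stacking example in the next section relies on:

python
import numpy as np
from tensorflow.keras.layers import Bidirectional, LSTM

# Dummy input shaped like the embedding output above: (batch, time steps, embedding dim)
x = np.zeros((1, 100, 128), dtype="float32")

last_only = Bidirectional(LSTM(64, return_sequences=False))
per_step = Bidirectional(LSTM(64, return_sequences=True))

print(last_only(x).shape)  # (1, 128): one combined vector per sequence
print(per_step(x).shape)   # (1, 100, 128): one combined vector per time step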

Stacking Bidirectional Layers

You can create deeper networks by stacking multiple bidirectional layers:

python
model = Sequential([
    Embedding(input_dim=10000, output_dim=128, input_length=100),

    # First bidirectional layer, return sequences for stacking
    Bidirectional(LSTM(64, return_sequences=True)),

    # Second bidirectional layer
    Bidirectional(LSTM(32, return_sequences=False)),

    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Using Different Merge Modes

By default, the Bidirectional wrapper concatenates the outputs of the forward and backward RNNs. However, you can choose different merge modes:

python
# Concatenate outputs (default)
Bidirectional(LSTM(64), merge_mode='concat')

# Sum outputs
Bidirectional(LSTM(64), merge_mode='sum')

# Multiply outputs
Bidirectional(LSTM(64), merge_mode='mul')

# Average outputs
Bidirectional(LSTM(64), merge_mode='ave')

# No merging (returns a list of forward and backward outputs)
Bidirectional(LSTM(64), merge_mode=None)
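
The merge mode also changes the output shape. A quick way to see this (using a dummy batch; the shapes below assume 64 units) is to run each mode in a loop:

python
import numpy as np
from tensorflow.keras.layers import Bidirectional, LSTM

x = np.zeros((1, 10, 8), dtype="float32")  # dummy (batch, time steps, features)

for mode in ["concat", "sum", "mul", "ave", None]:
    layer = Bidirectional(LSTM(64), merge_mode=mode)
    out = layer(x)
    if mode is None:
        print(mode, [o.shape for o in out])  # list of forward and backward outputs
    else:
        print(mode, out.shape)  # (1, 128) for concat, (1, 64) for sum/mul/ave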

Different Types of RNN Cells

You can use different RNN cell types with the bidirectional wrapper:

python
# Bidirectional Simple RNN
Bidirectional(tf.keras.layers.SimpleRNN(64))

# Bidirectional LSTM
Bidirectional(tf.keras.layers.LSTM(64))

# Bidirectional GRU
Bidirectional(tf.keras.layers.GRU(64))
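
The main practical difference between these cells is capacity and cost. A rough way to compare them (dummy input shape chosen only for illustration) is to build each bidirectional variant and count its parameters:

python
import numpy as np
from tensorflow.keras.layers import Bidirectional, SimpleRNN, GRU, LSTM

x = np.zeros((1, 100, 128), dtype="float32")  # dummy (batch, time steps, features)

for name, cell in [("SimpleRNN", SimpleRNN), ("GRU", GRU), ("LSTM", LSTM)]:
    layer = Bidirectional(cell(64))
    layer(x)  # call once so the weights are created
    print(f"{name:10s} {layer.count_params():,} parameters")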

Practical Example: Sentiment Analysis

Let's implement a complete sentiment analysis model using a bidirectional LSTM network on the IMDB movie review dataset:

python
import tensorflow as tf
import numpy as np
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense, Dropout

# Load IMDB dataset
max_features = 10000 # Top 10,000 most frequent words
maxlen = 200 # Cut texts after 200 words

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to ensure consistent input size
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

# Build model
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2)))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train model
history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=5,
    validation_split=0.2
)

# Evaluate the model
score = model.evaluate(x_test, y_test, batch_size=32)
print(f"Test accuracy: {score[1]:.4f}")

Expected output (actual values may vary due to random initialization):

Epoch 1/5
625/625 [==============================] - 42s 66ms/step - loss: 0.4998 - accuracy: 0.7582 - val_loss: 0.3729 - val_accuracy: 0.8340
Epoch 2/5
625/625 [==============================] - 40s 64ms/step - loss: 0.3148 - accuracy: 0.8687 - val_loss: 0.3542 - val_accuracy: 0.8486
Epoch 3/5
625/625 [==============================] - 40s 64ms/step - loss: 0.2583 - accuracy: 0.8956 - val_loss: 0.3783 - val_accuracy: 0.8426
Epoch 4/5
625/625 [==============================] - 41s 65ms/step - loss: 0.2248 - accuracy: 0.9115 - val_loss: 0.3967 - val_accuracy: 0.8488
Epoch 5/5
625/625 [==============================] - 40s 64ms/step - loss: 0.1961 - accuracy: 0.9246 - val_loss: 0.4281 - val_accuracy: 0.8308

782/782 [==============================] - 13s 17ms/step - loss: 0.4117 - accuracy: 0.8437
Test accuracy: 0.8437

Visualizing Training Progress

We can visualize the training and validation accuracy to better understand the model's performance:

python
import matplotlib.pyplot as plt

# Plot training & validation accuracy values
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.tight_layout()
plt.show()

Comparing Unidirectional vs. Bidirectional RNNs

To understand the benefits of bidirectional RNNs, let's compare a unidirectional LSTM with a bidirectional LSTM on the same task:

python
# Unidirectional LSTM model
uni_model = Sequential([
    Embedding(max_features, 128, input_length=maxlen),
    LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    Dense(1, activation='sigmoid')
])

uni_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train unidirectional model
uni_history = uni_model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=5,
    validation_split=0.2,
    verbose=0  # Suppress output for clarity
)

# Evaluate both models
uni_score = uni_model.evaluate(x_test, y_test, batch_size=32, verbose=0)
bi_score = model.evaluate(x_test, y_test, batch_size=32, verbose=0)

print(f"Unidirectional LSTM accuracy: {uni_score[1]:.4f}")
print(f"Bidirectional LSTM accuracy: {bi_score[1]:.4f}")
print(f"Improvement: {(bi_score[1] - uni_score[1])*100:.2f}%")

Expected output (values will vary between runs):

Unidirectional LSTM accuracy: 0.8302
Bidirectional LSTM accuracy: 0.8437
Improvement: 1.35%

Real-World Applications of Bidirectional RNNs

1. Named Entity Recognition (NER)

BiRNNs are excellent for NER because recognizing entities often requires understanding context from both directions.

python
# Simple BiLSTM model for Named Entity Recognition
def create_ner_model(vocab_size, embedding_dim, max_len, num_tags):
    model = Sequential([
        Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_len),
        Bidirectional(LSTM(100, return_sequences=True)),
        Dense(num_tags, activation='softmax')
    ])
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    return model
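
As a quick sanity check, the helper can be instantiated with placeholder values (the numbers below are purely illustrative, not tied to a particular dataset):

python
# Run after defining create_ner_model above; all values are placeholders
ner_model = create_ner_model(vocab_size=20000, embedding_dim=100, max_len=50, num_tags=9)
ner_model.summary()
# Because return_sequences=True, the model predicts one tag distribution per token:
# output shape (None, 50, 9). With categorical_crossentropy, the tag labels
# should be one-hot encoded per token.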

2. Machine Translation

BiRNNs form a critical component in encoder-decoder models for machine translation:

python
# Simplified encoder part of a translation model
def create_encoder(vocab_size, embedding_dim, hidden_units):
    encoder_inputs = tf.keras.layers.Input(shape=(None,))
    encoder_embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)(encoder_inputs)

    # Bidirectional encoder
    encoder_bilstm = Bidirectional(LSTM(hidden_units, return_state=True))
    encoder_outputs, forward_h, forward_c, backward_h, backward_c = encoder_bilstm(encoder_embedding)

    # Concatenate the states from both directions
    state_h = tf.keras.layers.Concatenate()([forward_h, backward_h])
    state_c = tf.keras.layers.Concatenate()([forward_c, backward_c])

    encoder_states = [state_h, state_c]

    return tf.keras.Model(encoder_inputs, encoder_states)
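
Because the forward and backward states are concatenated, a decoder that consumes encoder_states must use an LSTM with 2 * hidden_units units. Here is a minimal matching decoder sketch (simplified, with no attention mechanism):

python
def create_decoder(vocab_size, embedding_dim, hidden_units):
    decoder_inputs = tf.keras.layers.Input(shape=(None,))
    # The encoder states are concatenations of both directions,
    # so the decoder LSTM needs 2 * hidden_units units to accept them
    initial_h = tf.keras.layers.Input(shape=(2 * hidden_units,))
    initial_c = tf.keras.layers.Input(shape=(2 * hidden_units,))

    decoder_embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)(decoder_inputs)
    decoder_outputs = LSTM(2 * hidden_units, return_sequences=True)(
        decoder_embedding, initial_state=[initial_h, initial_c]
    )
    decoder_outputs = Dense(vocab_size, activation='softmax')(decoder_outputs)

    return tf.keras.Model([decoder_inputs, initial_h, initial_c], decoder_outputs)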

3. Speech Recognition

BiRNNs are frequently used in automatic speech recognition systems:

python
# Speech recognition model architecture
def create_speech_recognition_model(input_shape, num_classes):
    model = Sequential([
        # Input layer for spectrograms or MFCCs
        tf.keras.layers.Input(shape=input_shape),

        # Convolutional feature extraction
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),

        # Collapse the remaining spatial dimensions into a single pseudo-time axis,
        # with the 64 channels becoming the features at each step
        tf.keras.layers.Reshape((-1, 64)),

        # Bidirectional RNN layers
        Bidirectional(LSTM(128, return_sequences=True)),
        Bidirectional(LSTM(64)),

        # Output layer
        Dense(num_classes, activation='softmax')
    ])

    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model
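
As a quick shape check, the model can be built with a hypothetical spectrogram size (the numbers below are placeholders, not from a specific dataset):

python
# Hypothetical input: 128 time frames x 64 frequency bins x 1 channel
asr_model = create_speech_recognition_model(input_shape=(128, 64, 1), num_classes=30)
asr_model.summary()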

Best Practices and Considerations

When working with bidirectional RNNs, keep these practices in mind:

1. Performance Considerations

  • Memory usage: BiRNNs use approximately twice the memory of their unidirectional counterparts
  • Training time: They require more computational resources due to processing sequences in both directions
  • Parameter count: A BiRNN has roughly twice as many parameters as a unidirectional RNN (see the short check below)
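
The parameter-count point is easy to confirm directly:

python
import numpy as np
from tensorflow.keras.layers import Bidirectional, LSTM

x = np.zeros((1, 100, 128), dtype="float32")  # dummy (batch, time steps, features)

uni = LSTM(64)
bi = Bidirectional(LSTM(64))
uni(x)  # call once so the weights are created
bi(x)

print(f"Unidirectional: {uni.count_params():,} parameters")
print(f"Bidirectional:  {bi.count_params():,} parameters")  # roughly double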

2. When to Use BiRNNs vs. Unidirectional RNNs

  • Use BiRNNs when:

    • You have access to the entire sequence during inference time
    • Both past and future context are important (text classification, NER)
    • You need maximum accuracy and have the computational resources
  • Use unidirectional RNNs when:

    • You're working with real-time sequential data where future context isn't available
    • You need to generate sequences (language modeling, text generation)
    • You have computational constraints

3. Hyperparameter Tuning

  • Number of units: Start with a modest number (32-128) and increase if needed
  • Merge mode: Try different merge modes (concat, sum, mul, ave) to see what works best
  • Dropout: Usually between 0.2-0.5 helps prevent overfitting

Common Issues and Solutions

Vanishing/Exploding Gradients

BiRNNs can still suffer from vanishing/exploding gradients, especially with longer sequences:

python
# Using gradient clipping to help with exploding gradients
model = Sequential([
    Embedding(max_features, 128, input_length=maxlen),
    Bidirectional(LSTM(64, return_sequences=True)),
    Bidirectional(LSTM(64)),
    Dense(1, activation='sigmoid')
])

# Use clipnorm or clipvalue
optimizer = tf.keras.optimizers.Adam(clipnorm=1.0) # Clip gradients to a maximum norm of 1
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

Overfitting

BiRNNs have more parameters and thus are more prone to overfitting:

python
model = Sequential([
    Embedding(max_features, 128, input_length=maxlen),
    Bidirectional(LSTM(64, dropout=0.3, recurrent_dropout=0.3)),
    # Add regularization
    Dense(64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.001)),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
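
In addition to dropout and weight regularization, early stopping on the validation loss is a common safeguard. A sketch using the standard Keras callback (continuing the sentiment-analysis setup from earlier) looks like this:

python
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',         # watch validation loss
    patience=2,                 # stop after 2 epochs without improvement
    restore_best_weights=True   # roll back to the best weights seen
)

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=10,
    validation_split=0.2,
    callbacks=[early_stopping]
)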

Summary

Bidirectional RNNs are powerful extensions of traditional RNNs that process sequences in both directions, providing richer context for each element in the sequence. Key points to remember:

  • BiRNNs consist of two RNNs processing data in opposite directions
  • They're particularly useful for tasks where future context is as important as past context
  • TensorFlow's Bidirectional wrapper makes them easy to implement
  • They typically outperform unidirectional RNNs on tasks like sentiment analysis, NER, and speech recognition
  • They require more computational resources and are not suitable for real-time sequence generation

By understanding when and how to use bidirectional RNNs, you can significantly improve your model's performance on many sequential data tasks.

Additional Resources and Exercises

Resources

  1. TensorFlow Documentation on Bidirectional Layers
  2. Understanding Bidirectional RNN in PyTorch
  3. Original BiRNN Paper by Schuster & Paliwal (1997)

Exercises

  1. Comparative Analysis: Implement both unidirectional and bidirectional RNNs for a text classification task and compare their performance.

  2. Hyperparameter Exploration: Experiment with different merge modes for a BiLSTM and analyze how they affect model performance.

  3. Advanced Implementation: Build a name origin classifier using BiRNNs that can predict the nationality of a person based on their name.

  4. Real-World Application: Implement a BiRNN model for part-of-speech tagging on a publicly available dataset like Penn Treebank.

  5. Visualization Project: Create a visualization tool that shows how both directions of a BiRNN contribute to the classification of different inputs.


