
TensorFlow Bayesian Models

Introduction

Bayesian modeling represents a powerful approach to machine learning that focuses on quantifying uncertainty in predictions. Unlike traditional machine learning approaches that provide point estimates, Bayesian methods give us complete probability distributions over possible outcomes. This is particularly valuable when making critical decisions where understanding the confidence in predictions matters.

In this tutorial, we'll explore how to implement Bayesian models using TensorFlow Probability (TFP), an extension of TensorFlow designed specifically for probabilistic reasoning. By the end, you'll understand how to:

  • Build basic Bayesian models in TensorFlow
  • Perform Bayesian inference
  • Quantify uncertainty in your predictions
  • Apply Bayesian techniques to practical problems

Prerequisites

Before diving in, you should have:

  • Basic understanding of TensorFlow
  • Familiarity with probability concepts
  • Python programming knowledge

Let's start by installing the necessary libraries:

python
# Install TensorFlow and TensorFlow Probability
!pip install tensorflow tensorflow-probability

Understanding Bayesian Thinking

Bayesian modeling is based on Bayes' theorem, which provides a way to update our beliefs as we observe new data. The core formula is:

P(\theta|D) = \frac{P(D|\theta) \times P(\theta)}{P(D)}

Where:

  • P(\theta|D) is the posterior probability (what we want to estimate)
  • P(D|\theta) is the likelihood (how well the model explains the data)
  • P(\theta) is the prior probability (our initial belief)
  • P(D) is the evidence (a normalization factor)

In TensorFlow Probability, we can encode this thinking into models that learn and update systematically.
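
To make this concrete, here is a minimal sketch of Bayesian updating with TFP's distributions: a Beta prior over a coin's bias, which is conjugate to the Bernoulli likelihood, so the posterior has a closed form (the observation counts are made up for illustration):

python
# A Beta prior over a coin's bias, updated after observing
# 7 heads and 3 tails (illustrative numbers)
import tensorflow_probability as tfp

tfd = tfp.distributions

prior = tfd.Beta(concentration1=2., concentration0=2.)  # P(theta)

# Beta is conjugate to the Bernoulli likelihood, so the posterior is
# again a Beta, with the observed counts added to the concentrations
heads, tails = 7., 3.
posterior = tfd.Beta(concentration1=2. + heads, concentration0=2. + tails)

print(f"Prior mean:     {prior.mean().numpy():.3f}")      # 0.500
print(f"Posterior mean: {posterior.mean().numpy():.3f}")  # 0.643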

Setting Up Your First Bayesian Model

Let's start with a simple Bayesian linear regression model. First, import the necessary libraries:

python
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt

tfd = tfp.distributions # Shorthand for tensorflow_probability.distributions

Creating Synthetic Data

We'll generate some simple data for our example:

python
# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Generate synthetic data
N = 100 # Number of data points
true_slope = 0.5
true_intercept = 2.0
# Cast to float32 to match TFP's default dtype
x_data = np.linspace(-5, 5, N).astype(np.float32)
noise = np.random.normal(0, 1, N).astype(np.float32)
y_data = true_slope * x_data + true_intercept + noise

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(x_data, y_data, alpha=0.7)
plt.title("Synthetic Data for Bayesian Linear Regression")
plt.xlabel("x")
plt.ylabel("y")
plt.grid(True)
plt.show()

This will generate and display a scatterplot of our synthetic data points that follow a linear pattern with some noise.

Building a Bayesian Linear Regression Model

Now, let's build a Bayesian linear regression model:

python
def build_bayesian_linear_regression_model(x_data, y_data):
    # Define priors for model parameters
    slope_prior = tfd.Normal(loc=0., scale=1.)
    intercept_prior = tfd.Normal(loc=0., scale=10.)
    noise_prior = tfd.LogNormal(loc=0., scale=1.)

    # Define the model as a coroutine: priors first, likelihood last.
    # y_data itself is conditioned on later, via log_prob during inference.
    def model():
        # Sample from priors (Root marks distributions with no parents)
        slope = yield tfd.JointDistributionCoroutine.Root(slope_prior)
        intercept = yield tfd.JointDistributionCoroutine.Root(intercept_prior)
        noise = yield tfd.JointDistributionCoroutine.Root(noise_prior)

        # Calculate the mean prediction of the model
        mean = slope * x_data + intercept

        # Likelihood: treat the N observations as a single event
        yield tfd.Independent(tfd.Normal(loc=mean, scale=noise),
                              reinterpreted_batch_ndims=1)

    # Create the joint distribution over (slope, intercept, noise, y)
    return tfd.JointDistributionCoroutine(model)

Performing Bayesian Inference

Now let's perform inference to get our posterior distributions:

python
# Create the model
model = build_bayesian_linear_regression_model(x_data, y_data)

# MCMC settings
num_burnin_steps = 1000
num_samples = 3000

# Initialize starting state
initial_state = [
    tf.zeros([], dtype=tf.float32),  # slope
    tf.zeros([], dtype=tf.float32),  # intercept
    tf.ones([], dtype=tf.float32),   # noise
]

# The target density is the joint log-probability with y pinned to the
# observed data. (Negative noise proposals get an invalid log-probability
# and are simply rejected; wrapping the kernel in a
# TransformedTransitionKernel with an Exp bijector would sample the
# noise scale more efficiently.)
def target_log_prob_fn(slope, intercept, noise):
    return model.log_prob((slope, intercept, noise, y_data))

# Define the Hamiltonian Monte Carlo kernel
kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob_fn,
    step_size=0.1,
    num_leapfrog_steps=10
)

# Add adaptive step size
adaptive_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    kernel,
    num_adaptation_steps=int(num_burnin_steps * 0.8),
    target_accept_prob=0.75
)

# Run the MCMC chain
@tf.function
def run_mcmc():
    samples, stats = tfp.mcmc.sample_chain(
        num_results=num_samples,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=adaptive_kernel,
        trace_fn=lambda _, pkr: pkr.inner_results.is_accepted
    )
    return samples, stats

samples, stats = run_mcmc()

Analyzing the Results

Let's examine the posterior distributions we obtained:

python
# Extract the samples as NumPy arrays for analysis and plotting
slope_samples = samples[0].numpy()
intercept_samples = samples[1].numpy()
noise_samples = samples[2].numpy()

# Calculate acceptance rate
acceptance_rate = np.mean(stats.numpy())
print(f"Acceptance rate: {acceptance_rate:.3f}")

# Summary statistics
slope_mean = np.mean(slope_samples)
slope_std = np.std(slope_samples)
intercept_mean = np.mean(intercept_samples)
intercept_std = np.std(intercept_samples)
noise_mean = np.mean(noise_samples)

print(f"Slope: {slope_mean:.3f} ± {slope_std:.3f} (true value: {true_slope})")
print(f"Intercept: {intercept_mean:.3f} ± {intercept_std:.3f} (true value: {true_intercept})")
print(f"Noise scale: {noise_mean:.3f}")

# Create plots of the posterior distributions
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.hist(slope_samples, bins=30, density=True, alpha=0.7)
plt.axvline(true_slope, color='red', linestyle='dashed')
plt.title(f'Slope Posterior\nMean: {slope_mean:.3f}, True: {true_slope}')

plt.subplot(1, 3, 2)
plt.hist(intercept_samples, bins=30, density=True, alpha=0.7)
plt.axvline(true_intercept, color='red', linestyle='dashed')
plt.title(f'Intercept Posterior\nMean: {intercept_mean:.3f}, True: {true_intercept}')

plt.subplot(1, 3, 3)
plt.hist(noise_samples, bins=30, density=True, alpha=0.7)
plt.axvline(1.0, color='red', linestyle='dashed') # True noise std was 1.0
plt.title(f'Noise Posterior\nMean: {noise_mean:.3f}, True: 1.0')

plt.tight_layout()
plt.show()

# Plot the regression line with uncertainty
plt.figure(figsize=(10, 6))
plt.scatter(x_data, y_data, alpha=0.4, label='Data')

# Plot mean regression line
x_range = np.linspace(-6, 6, 100)
y_mean = slope_mean * x_range + intercept_mean
plt.plot(x_range, y_mean, color='blue', label='Mean Posterior Prediction')

# Sample from the posterior to show uncertainty
num_posterior_samples = 100
posterior_idx = np.random.choice(num_samples, num_posterior_samples)
for i in posterior_idx:
    y_sample = slope_samples[i] * x_range + intercept_samples[i]
    plt.plot(x_range, y_sample, color='blue', alpha=0.05)

# Add true line
y_true = true_slope * x_range + true_intercept
plt.plot(x_range, y_true, color='red', linestyle='dashed', label='True Line')

plt.title('Bayesian Linear Regression')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()

The output will show:

  1. The acceptance rate of our MCMC algorithm
  2. The posterior means and standard deviations for our parameters
  3. Histograms of the posterior distributions
  4. A plot showing the uncertainty in our regression line
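
The raw samples make other summaries cheap as well. As a quick sketch, here is a credible interval for the slope plus an effective-sample-size diagnostic (the 94% level is an arbitrary choice):

python
# A 94% credible interval for the slope, straight from the samples
ci_low, ci_high = np.percentile(slope_samples, [3, 97])
print(f"94% credible interval for slope: [{ci_low:.3f}, {ci_high:.3f}]")

# Effective sample size: roughly how many independent draws the
# (autocorrelated) chain is worth
ess = tfp.mcmc.effective_sample_size(samples[0])
print(f"Effective sample size (slope): {ess.numpy():.0f}")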

A Practical Example: Bayesian Neural Network

Now, let's implement a more complex model: a Bayesian neural network for classification, built from TFP's variational Keras layers. This demonstrates how to handle uncertainty in deep learning:

python
# Load a simple dataset (MNIST digits, but we'll use a small subset)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[:1000].reshape(-1, 784).astype('float32') / 255.0
x_test = x_test[:100].reshape(-1, 784).astype('float32') / 255.0
y_train = y_train[:1000]
y_test = y_test[:100]

# Convert labels to one-hot
y_train_onehot = tf.one_hot(y_train, 10)
y_test_onehot = tf.one_hot(y_test, 10)

# Define a function to build a Bayesian Neural Network
def build_bayesian_neural_network():
    # Scale each layer's KL term by the dataset size so the total loss
    # matches the (negative) evidence lower bound
    kl_divergence_function = (
        lambda q, p, _: tfd.kl_divergence(q, p) / x_train.shape[0]
    )

    # DenseFlipout layers learn a mean-field normal posterior over the
    # weights and sample fresh weights on every forward pass
    model = tf.keras.Sequential([
        tfp.layers.DenseFlipout(
            units=20,
            kernel_divergence_fn=kl_divergence_function,
            activation='relu',
            input_shape=(784,)
        ),
        tfp.layers.DenseFlipout(
            units=10,
            kernel_divergence_fn=kl_divergence_function,
            activation='softmax'
        )
    ])

    # Compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    return model

# Create and train the model
bayesian_nn = build_bayesian_neural_network()
bayesian_nn.summary()

history = bayesian_nn.fit(
    x_train, y_train_onehot,
    epochs=10,
    batch_size=64,
    validation_split=0.15,
    verbose=1
)
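
Before turning to uncertainty, a quick sanity check on held-out accuracy is useful. Note that each forward pass samples fresh weights from the posterior, so this evaluates a single draw:

python
# Evaluate on the held-out test set (one stochastic weight sample)
test_loss, test_acc = bayesian_nn.evaluate(x_test, y_test_onehot, verbose=0)
print(f"Test accuracy (single weight sample): {test_acc:.3f}")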

Evaluating Uncertainty in Predictions

One of the key advantages of Bayesian neural networks is the ability to quantify uncertainty. Let's see how:

python
# Make multiple stochastic forward passes to capture weight uncertainty
def predict_with_uncertainty(model, x, num_samples=100):
    # Each call samples new weights from the variational posterior
    predictions = np.stack([model(x, training=False).numpy()
                            for _ in range(num_samples)])
    return predictions

# Generate predictions with uncertainty
predictions = predict_with_uncertainty(bayesian_nn, x_test)

# Calculate mean and standard deviation for each prediction
mean_prediction = np.mean(predictions, axis=0)
std_prediction = np.std(predictions, axis=0)

# Calculate the overall prediction and confidence
predicted_class = np.argmax(mean_prediction, axis=1)
confidence = np.max(mean_prediction, axis=1)
uncertainty = np.mean(std_prediction, axis=1)

# Display some results
plt.figure(figsize=(15, 10))

for i in range(10):
    # Original image
    plt.subplot(2, 5, i+1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')

    # Format the title to show prediction, confidence and uncertainty
    plt.title(f"Pred: {predicted_class[i]}, True: {y_test[i]}\n"
              f"Conf: {confidence[i]:.2f}, Uncert: {uncertainty[i]:.3f}")
    plt.axis('off')

plt.tight_layout()
plt.show()

This code will display 10 test images with their predicted class, true class, confidence score, and uncertainty measure.
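
A natural next step is to act on the uncertainty scores, for example by flagging the least confident predictions for human review. A small sketch (the cutoff of five examples is arbitrary):

python
# Surface the most uncertain test examples for manual review
most_uncertain = np.argsort(uncertainty)[-5:]
for idx in most_uncertain:
    print(f"Index {idx}: pred={predicted_class[idx]}, true={y_test[idx]}, "
          f"uncertainty={uncertainty[idx]:.3f}")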

Advantages of Bayesian Models

Bayesian models offer several key advantages:

  1. Uncertainty Quantification: Instead of single point predictions, Bayesian models provide probability distributions.

  2. Regularization: The prior distributions help prevent overfitting by regularizing parameters (see the note after this list).

  3. Incorporating Domain Knowledge: Prior distributions can encode expert knowledge into the model.

  4. Handling Small Data: Bayesian methods work well even with limited data.

  5. Decision Making: Uncertainty estimates help in making more informed decisions.

Real-World Applications

Bayesian models are especially valuable in domains where uncertainty is critical:

Medical Diagnosis

python
# Example of a simplified Bayesian logistic-regression model for
# medical risk prediction
def build_medical_risk_model(patient_features, outcomes):
    # Define priors for model parameters (simplified)
    weight_priors = [tfd.Normal(loc=0., scale=1.)
                     for _ in range(patient_features.shape[1])]
    intercept_prior = tfd.Normal(loc=0., scale=1.)

    # outcomes is conditioned on later, via log_prob during inference
    def medical_model():
        # Sample weights from priors ('yield' is not allowed inside a
        # list comprehension, so use an explicit loop)
        weights = []
        for prior in weight_priors:
            w = yield tfd.JointDistributionCoroutine.Root(prior)
            weights.append(w)
        intercept = yield tfd.JointDistributionCoroutine.Root(intercept_prior)

        # Compute logits
        logits = intercept + tf.reduce_sum(
            [w * patient_features[:, i] for i, w in enumerate(weights)],
            axis=0
        )

        # Likelihood: one Bernoulli outcome per patient
        yield tfd.Independent(tfd.Bernoulli(logits=logits),
                              reinterpreted_batch_ndims=1)

    return tfd.JointDistributionCoroutine(medical_model)
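
As a quick check that the model builds, you can draw a joint sample and evaluate its density on hypothetical synthetic data (the feature count and sample size below are made up):

python
# Hypothetical synthetic patient data, just to exercise the model
patient_features = np.random.normal(size=(50, 3)).astype(np.float32)
outcomes = np.random.binomial(1, 0.3, size=50).astype(np.int32)

risk_model = build_medical_risk_model(patient_features, outcomes)
*params, y = risk_model.sample()                # one draw of (weights, intercept, y)
print(risk_model.log_prob(list(params) + [y]))  # joint log-density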

Financial Forecasting

One way to sketch this in TFP is with the structural time series (tfp.sts) module, fitting an approximate posterior with variational inference; dates here are assumed to be numeric day indices:

python
# Example sketch of a Bayesian time series model for financial
# forecasting, using TFP's structural time series (sts) module
def build_stock_price_model(prices, dates):
    # Create time features (simplified)
    time_features = np.column_stack([
        np.ones_like(dates),              # Intercept
        dates,                            # Linear trend
        np.sin(dates * 2 * np.pi / 365),  # Yearly seasonality
        np.cos(dates * 2 * np.pi / 365)   # Yearly seasonality
    ]).astype(np.float32)

    # A linear regression on the time features, plus observation noise
    regression = tfp.sts.LinearRegression(design_matrix=time_features)
    model = tfp.sts.Sum([regression], observed_time_series=prices)

    # Fit an approximate posterior with variational inference
    surrogate_posterior = tfp.sts.build_factored_surrogate_posterior(model)
    losses = tfp.vi.fit_surrogate_posterior(
        target_log_prob_fn=model.joint_distribution(
            observed_time_series=prices).log_prob,
        surrogate_posterior=surrogate_posterior,
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
        num_steps=200
    )

    # Sample model parameters from the fitted posterior
    samples = surrogate_posterior.sample(50)

    return model, samples, time_features
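
With the fitted parameter samples in hand, forecasting ahead is straightforward via tfp.sts.forecast. A sketch, assuming numeric prices and dates arrays as described above (the 30-step horizon is arbitrary):

python
# Fit the model, then forecast 30 steps beyond the observed series
model, samples, time_features = build_stock_price_model(prices, dates)

forecast_dist = tfp.sts.forecast(
    model, observed_time_series=prices,
    parameter_samples=samples, num_steps_forecast=30)
forecast_mean = forecast_dist.mean().numpy()[..., 0]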

Summary

In this tutorial, we've explored Bayesian modeling with TensorFlow Probability. We've learned:

  1. The fundamentals of Bayesian thinking and inference
  2. How to implement a Bayesian linear regression model
  3. Building Bayesian neural networks for classification
  4. Quantifying and visualizing uncertainty in predictions
  5. Applying Bayesian models to real-world problems

Bayesian models provide a principled approach to handling uncertainty, making them invaluable for critical applications. TensorFlow Probability makes these sophisticated techniques accessible to developers and data scientists.

Exercises

To reinforce your learning, try these exercises:

  1. Modify the Bayesian Linear Regression model to include quadratic terms and see how it affects predictions.
  2. Implement a Bayesian logistic regression model for a binary classification problem.
  3. Use the Bayesian Neural Network to identify digits with high uncertainty and analyze why the model is uncertain.
  4. Create a Bayesian model to forecast temperature for your city using historical weather data.
  5. Compare the performance of Bayesian and non-Bayesian approaches on a dataset with limited samples.


If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)