TensorFlow Probability
Introduction
TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning. It's designed to help developers and researchers build sophisticated models that can represent uncertainty in data, make predictions that include confidence levels, and learn from small datasets.
In traditional machine learning, we typically produce point estimates (single values) as predictions. However, in many real-world scenarios, understanding the uncertainty or confidence in our predictions is just as important as the predictions themselves. This is where probabilistic modeling comes in, and TensorFlow Probability provides the tools to implement these models efficiently.
Why Use TensorFlow Probability?
- Express uncertainty: Quantify confidence in predictions
- Bayesian inference: Incorporate prior knowledge into models
- Robust learning: Better performance with small or noisy datasets
- Complex models: Build hierarchical and composite models
- Probabilistic programming: Specify models in an intuitive, declarative way
Installation
TensorFlow Probability can be installed using pip:
pip install tensorflow-probability
Make sure you have TensorFlow already installed in your environment.
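A quick way to verify the installation is to import both packages and print their versions (the exact numbers will depend on your environment):
import tensorflow as tf
import tensorflow_probability as tfp
print("TensorFlow version:", tf.__version__)
print("TensorFlow Probability version:", tfp.__version__)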
Key Components of TensorFlow Probability
TensorFlow Probability has several modules that provide different functionalities:
- Distributions: Probability distributions and sampling operations
- Layers: Neural network layers with uncertainty
- MCMC: Markov Chain Monte Carlo methods for sampling from posterior distributions
- Optimizers: Optimizers for probabilistic models
- Stats: Statistical operations
Let's explore some of these components with examples.
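Throughout the examples below, the modules are imported under short aliases, following the convention used in the TFP documentation:
import tensorflow_probability as tfp

tfd = tfp.distributions   # probability distributions
tfpl = tfp.layers         # probabilistic neural network layers
# MCMC utilities live under tfp.mcmc and statistical operations under tfp.stats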
Working with Distributions
Distributions are at the core of probabilistic modeling. TFP provides a wide range of probability distributions that can be used as building blocks for more complex models.
import tensorflow as tf
import tensorflow_probability as tfp
# Create a normal distribution
tfd = tfp.distributions
normal_dist = tfd.Normal(loc=0., scale=1.) # Standard normal distribution
# Sample from the distribution
samples = normal_dist.sample(10)
print("Samples:", samples.numpy())
# Calculate the probability density function (PDF)
x = tf.constant([0.0, 1.0, 2.0])
log_prob = normal_dist.log_prob(x)
prob = tf.exp(log_prob)
print("PDF at x =", x.numpy(), ":", prob.numpy())
Example output (sampled values will vary from run to run):
Samples: [-1.2341, 0.5432, 0.7563, -0.3421, 1.9876, -0.2341, 0.0012, -0.6745, 0.8912, 1.3456]
PDF at x = [0. 1. 2.] : [0.3989, 0.2419, 0.0539]
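Distributions also compose into richer objects. As a quick illustration (the mixture weights and component parameters below are arbitrary), a two-component Gaussian mixture can be assembled directly from simpler distributions:
# Two-component Gaussian mixture: 30% weight on N(-1, 0.5), 70% on N(1, 0.5)
mixture = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-1., 1.], scale=[0.5, 0.5]))
print("Mixture mean:", mixture.mean().numpy())
print("Mixture samples:", mixture.sample(5).numpy())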
Building a Simple Bayesian Model
Let's build a simple Bayesian linear regression model to predict housing prices based on square footage.
import tensorflow as tf
import tensorflow_probability as tfp
import matplotlib.pyplot as plt
import numpy as np
tfd = tfp.distributions
# Generate some synthetic data
true_slope = 2.0
true_intercept = 1.0
num_samples = 100
x_data = np.linspace(0, 10, num_samples).astype(np.float32)  # float32 to match TensorFlow's default dtype
noise = np.random.normal(0, 1, num_samples).astype(np.float32)
y_data = true_intercept + true_slope * x_data + noise
# Define priors for the model parameters
slope_prior = tfd.Normal(loc=0., scale=10.)
intercept_prior = tfd.Normal(loc=0., scale=10.)
noise_prior = tfd.InverseGamma(concentration=0.1, scale=0.1)
# Define the trainable parameters once, outside the loss function,
# so the optimizer updates the same variables on every step
slope = tf.Variable(tf.random.normal([]), name='slope')
intercept = tf.Variable(tf.random.normal([]), name='intercept')
noise_scale = tf.Variable(1., name='noise_scale')  # assumed to stay positive during optimization
variables = [slope, intercept, noise_scale]
# Define the loss function (negative unnormalized log posterior:
# the log likelihood plus the log prior of each parameter)
def loss_fn():
    predicted = intercept + slope * x_data
    likelihood = tfd.Normal(loc=predicted, scale=noise_scale)
    return -(tf.reduce_sum(likelihood.log_prob(y_data)) +
             slope_prior.log_prob(slope) +
             intercept_prior.log_prob(intercept) +
             noise_prior.log_prob(noise_scale))
# Optimize to find the MAP estimate
optimizer = tf.optimizers.Adam(learning_rate=0.1)
for i in range(1000):
    optimizer.minimize(loss_fn, var_list=variables)
    if i % 100 == 0:
        print(f"Step {i}, Loss: {loss_fn().numpy()}")
# Extract the optimized parameters
estimated_slope = slope.numpy()
estimated_intercept = intercept.numpy()
print(f"True slope: {true_slope}, Estimated slope: {estimated_slope}")
print(f"True intercept: {true_intercept}, Estimated intercept: {estimated_intercept}")
# Plot the results
plt.scatter(x_data, y_data, alpha=0.5)
plt.plot(x_data, estimated_intercept + estimated_slope * x_data, 'r-',
label=f'y = {estimated_intercept:.2f} + {estimated_slope:.2f}x')
plt.legend()
plt.xlabel('Square footage')
plt.ylabel('Price')
plt.title('Bayesian Linear Regression for Housing Prices')
plt.show()
This example demonstrates a Bayesian approach to linear regression, where we:
- Define prior distributions for our model parameters
- Define a likelihood function based on our data
- Optimize to find the Maximum A Posteriori (MAP) estimate
- Interpret the results
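The same model can also be written declaratively as a joint distribution, which pairs naturally with the MCMC tools covered later. Below is a minimal sketch (it fixes the observation noise at 1.0 as a simplifying assumption and reuses x_data and y_data from above):
# Joint distribution over the parameters and the observations
joint = tfd.JointDistributionNamed(dict(
    slope=tfd.Normal(loc=0., scale=10.),
    intercept=tfd.Normal(loc=0., scale=10.),
    y=lambda slope, intercept: tfd.Independent(
        tfd.Normal(loc=intercept + slope * x_data, scale=1.),
        reinterpreted_batch_ndims=1)))

# Unnormalized log posterior of the parameters given the observed y_data
def unnormalized_log_posterior(slope, intercept):
    return joint.log_prob(dict(slope=slope, intercept=intercept, y=y_data))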
Using Probabilistic Layers
TensorFlow Probability provides neural network layers that incorporate uncertainty. Let's build a simple neural network with probabilistic layers:
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfpl = tfp.layers
# Create a probabilistic neural network for MNIST classification
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10 + 10),  # parameters for the mean and log standard deviation
    tfpl.DistributionLambda(
        lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :10], scale=tf.exp(t[..., 10:])),
            reinterpreted_batch_ndims=1))
])
# Define a negative log-likelihood loss function
def negative_log_likelihood(y_true, y_pred):
    return -y_pred.log_prob(y_true)
# Compile the model
model.compile(optimizer='adam', loss=negative_log_likelihood)
# Load and prepare the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Convert labels to one-hot encoding
y_train = tf.one_hot(y_train, 10)
y_test = tf.one_hot(y_test, 10)
# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=32)
# Evaluate the model
loss = model.evaluate(x_test, y_test)
print(f"Test loss: {loss}")
# Make predictions with uncertainty
predictions = model(x_test[:10])
print("Predicted means:")
print(predictions.mean().numpy())
print("Predicted standard deviations:")
print(predictions.stddev().numpy())
This example creates a neural network for MNIST classification whose output is not a point prediction but a full distribution: an independent Normal over the 10-dimensional one-hot label vector, with a learned mean and standard deviation for each dimension.
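For a classification problem, a more conventional probabilistic head is a categorical distribution over the ten classes. Here is a minimal sketch of that variation using tfp.layers.OneHotCategorical, reusing the same loss function as above:
categorical_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    # one logit per class; params_size returns how many outputs the layer needs
    tf.keras.layers.Dense(tfpl.OneHotCategorical.params_size(10)),
    tfpl.OneHotCategorical(10)  # outputs a distribution over one-hot class labels
])
categorical_model.compile(optimizer='adam', loss=negative_log_likelihood)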
Markov Chain Monte Carlo (MCMC)
MCMC methods allow us to sample from complex posterior distributions. Let's use TFP's MCMC capabilities to perform Bayesian inference:
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions
# Generate synthetic data from a normal distribution
true_mean = 5.0
true_std = 2.0
num_samples = 100
data = np.random.normal(true_mean, true_std, num_samples).astype(np.float32)  # float32 for TF
# Define the joint log probability function
def joint_log_prob(data, mean, std):
    # Prior distributions
    prior_mean = tfd.Normal(loc=0., scale=10.)
    prior_std = tfd.InverseGamma(concentration=1.0, scale=1.0)
    # Likelihood
    likelihood = tfd.Normal(loc=mean, scale=std)
    # Joint log probability is the sum of the log priors and the log likelihood
    return (prior_mean.log_prob(mean) +
            prior_std.log_prob(std) +
            tf.reduce_sum(likelihood.log_prob(data)))
# Create a closure over the data
def target_log_prob_fn(mean, std):
    return joint_log_prob(data, mean, std)
# Set up MCMC sampling
num_results = 1000
num_burnin_steps = 500
# Initialize states using current estimates
initial_mean = tf.constant(np.mean(data), dtype=tf.float32)
initial_std = tf.constant(np.std(data), dtype=tf.float32)
# Define the HMC transition kernel for MCMC
# (note: `std` is left unconstrained here; in practice you would often wrap the
# kernel in tfp.mcmc.TransformedTransitionKernel with a Softplus bijector so the
# scale parameter stays positive)
kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob_fn,
    step_size=0.1,
    num_leapfrog_steps=3)
# Add step-size adaptation for better sampling
adaptive_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    kernel,
    num_adaptation_steps=int(num_burnin_steps * 0.8))
# Run the chain
@tf.function
def run_mcmc():
    return tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=[initial_mean, initial_std],
        kernel=adaptive_kernel,
        trace_fn=None)
# Sample from the posterior
samples = run_mcmc()
mean_samples, std_samples = samples
# Display results
print(f"True mean: {true_mean}")
print(f"True std: {true_std}")
print(f"Inferred mean: {tf.reduce_mean(mean_samples).numpy():.3f}")
print(f"Inferred std: {tf.reduce_mean(std_samples).numpy():.3f}")
# Plot posterior distributions
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(mean_samples.numpy(), bins=30, density=True)
plt.axvline(x=true_mean, color='r', linestyle='--')
plt.title('Posterior for mean')
plt.xlabel('Mean')
plt.subplot(1, 2, 2)
plt.hist(std_samples.numpy(), bins=30, density=True)
plt.axvline(x=true_std, color='r', linestyle='--')
plt.title('Posterior for standard deviation')
plt.xlabel('Standard deviation')
plt.tight_layout()
plt.show()
This example demonstrates:
- How to define a joint probability distribution
- How to use MCMC to sample from the posterior
- How to visualize and interpret the results
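A practical payoff of having posterior samples rather than a single point estimate is that summaries such as credible intervals come almost for free. A small follow-up sketch using tfp.stats.percentile on the samples from above:
# 95% credible interval for the mean, computed from the posterior samples
lower, upper = tfp.stats.percentile(mean_samples, [2.5, 97.5])
print(f"95% credible interval for the mean: "
      f"[{lower.numpy():.3f}, {upper.numpy():.3f}]")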
Real-World Application: Anomaly Detection
Let's apply TensorFlow Probability to a practical problem: detecting anomalies in time series data.
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions
# Generate synthetic time series data with anomalies
np.random.seed(42)
n_samples = 1000
time = np.arange(n_samples, dtype=np.float32)
# Normal pattern: seasonal component + trend
seasonal = 10 * np.sin(2 * np.pi * time / 50)
trend = 0.01 * time
noise = np.random.normal(0, 1, n_samples)
# Add some anomalies
anomaly_indices = [100, 200, 300, 400, 700, 800]
anomaly_values = [15, -15, 20, -20, 25, -25]
anomalies = np.zeros(n_samples)
for idx, val in zip(anomaly_indices, anomaly_values):
    anomalies[idx] = val
# Combine all components
data = (seasonal + trend + noise + anomalies).astype(np.float32)  # float32 to match TF variables
# Build a probabilistic model for the time series
# Priors for the model parameters
trend_prior = tfd.Normal(loc=0., scale=0.1)
amplitude_prior = tfd.Normal(loc=0., scale=5.)
noise_prior = tfd.Normal(loc=0., scale=1.)
# Trainable parameters, created once so the optimizer updates the same variables
trend_coef = tf.Variable(0.01, name='trend_coefficient')
amplitude = tf.Variable(10., name='amplitude')
frequency = tf.Variable(2 * np.pi / 50, dtype=tf.float32, name='frequency')
phase = tf.Variable(0., name='phase')
noise_scale = tf.Variable(1., name='noise_scale')
variables = [trend_coef, amplitude, frequency, phase, noise_scale]
# Distribution of observations given the current parameter values
def make_observation_dist():
    # Trend component
    trend_component = trend_coef * time
    # Seasonal component (single sinusoid with learned amplitude, frequency, phase)
    seasonal_component = amplitude * tf.sin(frequency * time + phase)
    # Expected value = trend + seasonal
    expected_value = trend_component + seasonal_component
    return tfd.Normal(loc=expected_value, scale=noise_scale)
# Loss function: negative unnormalized log posterior (likelihood plus priors;
# frequency and phase are left with implicit flat priors)
def loss_fn():
    observation_dist = make_observation_dist()
    log_prior = (trend_prior.log_prob(trend_coef) +
                 amplitude_prior.log_prob(amplitude) +
                 noise_prior.log_prob(noise_scale))
    return -(tf.reduce_sum(observation_dist.log_prob(data)) + log_prior)
# Optimize parameters
optimizer = tf.optimizers.Adam(learning_rate=0.01)
for i in range(1000):
    optimizer.minimize(loss_fn, var_list=variables)
    if i % 100 == 0:
        print(f"Step {i}, Loss: {loss_fn().numpy()}")
# Get the fitted model
observation_dist = make_observation_dist()
# Calculate anomaly scores (negative log probabilities)
log_probs = observation_dist.log_prob(data).numpy()
anomaly_scores = -log_probs
# Define a threshold for anomaly detection
threshold = np.percentile(anomaly_scores, 98) # 98th percentile as threshold
detected_anomalies = anomaly_scores > threshold
# Visualize results
plt.figure(figsize=(15, 10))
plt.subplot(3, 1, 1)
plt.plot(time, data)
plt.title('Raw Time Series Data')
plt.ylabel('Value')
plt.subplot(3, 1, 2)
plt.plot(time, anomaly_scores)
plt.axhline(y=threshold, color='r', linestyle='--')
plt.title('Anomaly Scores with Threshold')
plt.ylabel('Anomaly Score')
plt.subplot(3, 1, 3)
plt.plot(time, data)
plt.scatter(time[detected_anomalies], data[detected_anomalies], color='r', marker='o')
for idx in anomaly_indices:
    plt.axvline(x=idx, color='g', linestyle='--', alpha=0.5)
plt.title('Detected Anomalies (red) vs True Anomalies (green lines)')
plt.ylabel('Value')
plt.xlabel('Time')
plt.tight_layout()
plt.show()
# Evaluate detection accuracy
true_anomaly_mask = np.zeros(n_samples, dtype=bool)
true_anomaly_mask[anomaly_indices] = True
# Calculate precision and recall
true_positives = np.sum(detected_anomalies & true_anomaly_mask)
false_positives = np.sum(detected_anomalies & ~true_anomaly_mask)
false_negatives = np.sum(~detected_anomalies & true_anomaly_mask)
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
print(f"Anomaly Detection Results:")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {2 * precision * recall / (precision + recall):.2f}")
This application shows how to use TensorFlow Probability to:
- Model time series data with trend, seasonal, and noise components
- Identify anomalies using probabilistic measures
- Evaluate the performance of anomaly detection
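As a variation on the thresholding step (a sketch reusing the fitted observation_dist from the code above), anomalies can also be flagged as points falling outside a central 99% predictive interval of the model:
# Flag points outside the model's central 99% predictive interval
lower_bound = observation_dist.quantile(0.005).numpy()
upper_bound = observation_dist.quantile(0.995).numpy()
interval_anomalies = (data < lower_bound) | (data > upper_bound)
print("Points outside the 99% predictive interval:", np.sum(interval_anomalies))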
Summary
TensorFlow Probability is a powerful library that extends TensorFlow's capabilities to probabilistic modeling and statistical inference. It enables you to:
- Work with probability distributions
- Build Bayesian models with uncertainty quantification
- Perform MCMC sampling from complex distributions
- Create probabilistic neural networks
- Apply probabilistic methods to real-world problems like anomaly detection
By incorporating uncertainty into your models, you can make more robust predictions, especially in situations with limited data or where understanding confidence is critical.
Additional Resources
- TensorFlow Probability Official Documentation
- TFP Tutorials on GitHub
- Probabilistic Machine Learning: An Introduction by Kevin Murphy
- Bayesian Methods for Hackers by Cameron Davidson-Pilon
Exercises
- Basic Probability: Create different probability distributions (Normal, Poisson, Beta) and visualize their probability density functions.
- Bayesian Linear Regression: Implement a Bayesian linear regression model on a dataset of your choice and compare its performance with traditional linear regression.
- MCMC Sampling: Use MCMC methods to sample from a mixture of Gaussian distributions and visualize the results.
- Classification with Uncertainty: Build a probabilistic neural network for a classification task and analyze the uncertainty in its predictions.
- Time Series Forecasting: Apply TensorFlow Probability to forecast a time series dataset with uncertainty bands around the predictions.