TensorFlow Probability
Introduction
TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning. It's designed to help developers and researchers build sophisticated models that can represent uncertainty in data, make predictions that include confidence levels, and learn from small datasets.
In traditional machine learning, we typically produce point estimates (single values) as predictions. However, in many real-world scenarios, understanding the uncertainty or confidence in our predictions is just as important as the predictions themselves. This is where probabilistic modeling comes in, and TensorFlow Probability provides the tools to implement these models efficiently.
Why Use TensorFlow Probability?
- Express uncertainty: Quantify confidence in predictions
- Bayesian inference: Incorporate prior knowledge into models
- Robust learning: Better performance with small or noisy datasets
- Complex models: Build hierarchical and composite models
- Probabilistic programming: Specify models in an intuitive, declarative way
Installation
TensorFlow Probability can be installed using pip:
pip install tensorflow-probability
Make sure you have TensorFlow already installed in your environment.
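A quick way to verify the installation is to import both packages and print their versions (the exact numbers will depend on your environment):
import tensorflow as tf
import tensorflow_probability as tfp
print("TensorFlow version:", tf.__version__)
print("TensorFlow Probability version:", tfp.__version__)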
Key Components of TensorFlow Probability
TensorFlow Probability has several modules that provide different functionalities:
- Distributions: Probability distributions and sampling operations
- Layers: Neural network layers with uncertainty
- MCMC: Markov Chain Monte Carlo methods for sampling from posterior distributions
- Optimizers: Optimizers for probabilistic models
- Stats: Statistical operations
Let's explore some of these components with examples.
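Throughout the examples below, the modules are imported under short aliases, following the convention used in the TFP documentation:
import tensorflow_probability as tfp

tfd = tfp.distributions   # probability distributions
tfpl = tfp.layers         # probabilistic neural network layers
# MCMC utilities live under tfp.mcmc and statistical operations under tfp.stats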
Working with Distributions
Distributions are at the core of probabilistic modeling. TFP provides a wide range of probability distributions that can be used as building blocks for more complex models.
import tensorflow as tf
import tensorflow_probability as tfp
# Create a normal distribution
tfd = tfp.distributions
normal_dist = tfd.Normal(loc=0., scale=1.) # Standard normal distribution
# Sample from the distribution
samples = normal_dist.sample(10)
print("Samples:", samples.numpy())
# Calculate the probability density function (PDF)
x = tf.constant([0.0, 1.0, 2.0])
log_prob = normal_dist.log_prob(x)
prob = tf.exp(log_prob)
print("PDF at x =", x.numpy(), ":", prob.numpy())
Example output (sampled values will vary from run to run):
Samples: [-1.2341, 0.5432, 0.7563, -0.3421, 1.9876, -0.2341, 0.0012, -0.6745, 0.8912, 1.3456]
PDF at x = [0. 1. 2.] : [0.3989, 0.2419, 0.0539]
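Distributions also compose into richer objects. As a quick illustration (the mixture weights and component parameters below are arbitrary), a two-component Gaussian mixture can be assembled directly from simpler distributions:
# Two-component Gaussian mixture: 30% weight on N(-1, 0.5), 70% on N(1, 0.5)
mixture = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-1., 1.], scale=[0.5, 0.5]))
print("Mixture mean:", mixture.mean().numpy())
print("Mixture samples:", mixture.sample(5).numpy())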
Building a Simple Bayesian Model
Let's build a simple Bayesian linear regression model to predict housing prices based on square footage.
import tensorflow as tf
import tensorflow_probability as tfp
import matplotlib.pyplot as plt
import numpy as np
tfd = tfp.distributions
# Generate some synthetic data
true_slope = 2.0
true_intercept = 1.0
num_samples = 100
x_data = np.linspace(0, 10, num_samples).astype(np.float32)  # float32 to match TensorFlow's default dtype
noise = np.random.normal(0, 1, num_samples).astype(np.float32)
y_data = true_intercept + true_slope * x_data + noise
# Define priors for the model parameters
slope_prior = tfd.Normal(loc=0., scale=10.)
intercept_prior = tfd.Normal(loc=0., scale=10.)
noise_prior = tfd.InverseGamma(concentration=0.1, scale=0.1)
# Define the trainable parameters once, outside the loss function,
# so the optimizer updates the same variables on every step
slope = tf.Variable(tf.random.normal([]), name='slope')
intercept = tf.Variable(tf.random.normal([]), name='intercept')
noise_scale = tf.Variable(1., name='noise_scale')  # assumed to stay positive during optimization
variables = [slope, intercept, noise_scale]
# Define the loss function (negative unnormalized log posterior:
# the log likelihood plus the log prior of each parameter)
def loss_fn():
    predicted = intercept + slope * x_data
    likelihood = tfd.Normal(loc=predicted, scale=noise_scale)
    return -(tf.reduce_sum(likelihood.log_prob(y_data)) +
             slope_prior.log_prob(slope) +
             intercept_prior.log_prob(intercept) +
             noise_prior.log_prob(noise_scale))
# Optimize to find the MAP estimate
optimizer = tf.optimizers.Adam(learning_rate=0.1)
for i in range(1000):
    optimizer.minimize(loss_fn, var_list=variables)
    if i % 100 == 0:
        print(f"Step {i}, Loss: {loss_fn().numpy()}")
# Extract the optimized parameters
estimated_slope = slope.numpy()
estimated_intercept = intercept.numpy()
print(f"True slope: {true_slope}, Estimated slope: {estimated_slope}")
print(f"True intercept: {true_intercept}, Estimated intercept: {estimated_intercept}")
# Plot the results
plt.scatter(x_data, y_data, alpha=0.5)
plt.plot(x_data, estimated_intercept + estimated_slope * x_data, 'r-',
label=f'y = {estimated_intercept:.2f} + {estimated_slope:.2f}x')
plt.legend()
plt.xlabel('Square footage')
plt.ylabel('Price')
plt.title('Bayesian Linear Regression for Housing Prices')
plt.show()
This example demonstrates a Bayesian approach to linear regression, where we:
- Define prior distributions for our model parameters
- Define a likelihood function based on our data
- Optimize to find the Maximum A Posteriori (MAP) estimate
- Interpret the results
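The same model can also be written declaratively as a joint distribution, which pairs naturally with the MCMC tools covered later. Below is a minimal sketch (it fixes the observation noise at 1.0 as a simplifying assumption and reuses x_data and y_data from above):
# Joint distribution over the parameters and the observations
joint = tfd.JointDistributionNamed(dict(
    slope=tfd.Normal(loc=0., scale=10.),
    intercept=tfd.Normal(loc=0., scale=10.),
    y=lambda slope, intercept: tfd.Independent(
        tfd.Normal(loc=intercept + slope * x_data, scale=1.),
        reinterpreted_batch_ndims=1)))

# Unnormalized log posterior of the parameters given the observed y_data
def unnormalized_log_posterior(slope, intercept):
    return joint.log_prob(dict(slope=slope, intercept=intercept, y=y_data))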
Using Probabilistic Layers
TensorFlow Probability provides neural network layers that incorporate uncertainty. Let's build a simple neural network with probabilistic layers:
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfpl = tfp.layers
# Create a probabilistic neural network for MNIST classification
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10 + 10),  # parameters for the mean and log standard deviation
    tfpl.DistributionLambda(
        lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :10], scale=tf.exp(t[..., 10:])),
            reinterpreted_batch_ndims=1))
])
# Define a negative log-likelihood loss function
def negative_log_likelihood(y_true, y_pred):
    return -y_pred.log_prob(y_true)
# Compile the model
model.compile(optimizer='adam', loss=negative_log_likelihood)
# Load and prepare the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Convert labels to one-hot encoding
y_train = tf.one_hot(y_train, 10)
y_test = tf.one_hot(y_test, 10)
# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=32)
# Evaluate the model
loss = model.evaluate(x_test, y_test)
print(f"Test loss: {loss}")
# Make predictions with uncertainty
predictions = model(x_test[:10])
print("Predicted means:")
print(predictions.mean().numpy())
print("Predicted standard deviations:")
print(predictions.stddev().numpy())
This example creates a neural network for MNIST classification whose output is not a point prediction but a full distribution: an independent Normal over the 10-dimensional one-hot label vector, with a learned mean and standard deviation for each dimension.
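For a classification problem, a more conventional probabilistic head is a categorical distribution over the ten classes. Here is a minimal sketch of that variation using tfp.layers.OneHotCategorical, reusing the same loss function as above:
categorical_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    # one logit per class; params_size returns how many outputs the layer needs
    tf.keras.layers.Dense(tfpl.OneHotCategorical.params_size(10)),
    tfpl.OneHotCategorical(10)  # outputs a distribution over one-hot class labels
])
categorical_model.compile(optimizer='adam', loss=negative_log_likelihood)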
Markov Chain Monte Carlo (MCMC)
MCMC methods allow us to sample from complex posterior distributions. Let's use TFP's MCMC capabilities to perform Bayesian inference:
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions
# Generate synthetic data from a normal distribution
true_mean = 5.0
true_std = 2.0
num_samples = 100
data = np.random.normal(true_mean, true_std, num_samples).astype(np.float32)  # float32 for TF
# Define the joint log probability function
def joint_log_prob(data, mean, std):
    # Prior distributions
    prior_mean = tfd.Normal(loc=0., scale=10.)
    prior_std = tfd.InverseGamma(concentration=1.0, scale=1.0)
    # Likelihood
    likelihood = tfd.Normal(loc=mean, scale=std)
    # Joint log probability is the sum of the log priors and the log likelihood
    return (prior_mean.log_prob(mean) +
            prior_std.log_prob(std) +
            tf.reduce_sum(likelihood.log_prob(data)))
# Create a closure over the data
def target_log_prob_fn(mean, std):
    return joint_log_prob(data, mean, std)
# Set up MCMC sampling
num_results = 1000
num_burnin_steps = 500
# Initialize states using current estimates
initial_mean = tf.constant(np.mean(data), dtype=tf.float32)
initial_std = tf.constant(np.std(data), dtype=tf.float32)
# Define the HMC transition kernel for MCMC
# (note: `std` is left unconstrained here; in practice you would often wrap the
# kernel in tfp.mcmc.TransformedTransitionKernel with a Softplus bijector so the
# scale parameter stays positive)
kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob_fn,
    step_size=0.1,
    num_leapfrog_steps=3)
# Add step-size adaptation for better sampling
adaptive_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    kernel,
    num_adaptation_steps=int(num_burnin_steps * 0.8))
# Run the chain
@tf.function
def run_mcmc():
    return tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=[initial_mean, initial_std],
        kernel=adaptive_kernel,
        trace_fn=None)
# Sample from the posterior
samples = run_mcmc()
mean_samples, std_samples = samples
# Display results
print(f"True mean: {true_mean}")
print(f"True std: {true_std}")
print(f"Inferred mean: {tf.reduce_mean(mean_samples).numpy():.3f}")
print(f"Inferred std: {tf.reduce_mean(std_samples).numpy():.3f}")
# Plot posterior distributions
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(mean_samples.numpy(), bins=30, density=True)
plt.axvline(x=true_mean, color='r', linestyle='--')
plt.title('Posterior for mean')
plt.xlabel('Mean')
plt.subplot(1, 2, 2)
plt.hist(std_samples.numpy(), bins=30, density=True)
plt.axvline(x=true_std, color='r', linestyle='--')
plt.title('Posterior for standard deviation')
plt.xlabel('Standard deviation')
plt.tight_layout()
plt.show()
This example demonstrates:
- How to define a joint probability distribution
- How to use MCMC to sample from the posterior
- How to visualize and interpret the results
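A practical payoff of having posterior samples rather than a single point estimate is that summaries such as credible intervals come almost for free. A small follow-up sketch using tfp.stats.percentile on the samples from above:
# 95% credible interval for the mean, computed from the posterior samples
lower, upper = tfp.stats.percentile(mean_samples, [2.5, 97.5])
print(f"95% credible interval for the mean: "
      f"[{lower.numpy():.3f}, {upper.numpy():.3f}]")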
Real-World Application: Anomaly Detection
Let's apply TensorFlow Probability to a practical problem: detecting anomalies in time series data.
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions
# Generate synthetic time series data with anomalies
np.random.seed(42)
n_samples = 1000
time = np.arange(n_samples, dtype=np.float32)
# Normal pattern: seasonal component + trend
seasonal = 10 * np.sin(2 * np.pi * time / 50)
trend = 0.01 * time
noise = np.random.normal(0, 1, n_samples)
# Add some anomalies
anomaly_indices = [100, 200, 300, 400, 700, 800]
anomaly_values = [15, -15, 20, -20, 25, -25]
anomalies = np.zeros(n_samples)
for idx, val in zip(anomaly_indices, anomaly_values):
    anomalies[idx] = val
# Combine all components
data = (seasonal + trend + noise + anomalies).astype(np.float32)  # float32 to match TF variables
# Build a probabilistic model for the time series
# Priors for the model parameters
trend_prior = tfd.Normal(loc=0., scale=0.1)
amplitude_prior = tfd.Normal(loc=0., scale=5.)
noise_prior = tfd.Normal(loc=0., scale=1.)
# Trainable parameters, created once so the optimizer updates the same variables
trend_coef = tf.Variable(0.01, name='trend_coefficient')
amplitude = tf.Variable(10., name='amplitude')
frequency = tf.Variable(2 * np.pi / 50, dtype=tf.float32, name='frequency')
phase = tf.Variable(0., name='phase')
noise_scale = tf.Variable(1., name='noise_scale')
variables = [trend_coef, amplitude, frequency, phase, noise_scale]
# Distribution of observations given the current parameter values
def make_observation_dist():
    # Trend component
    trend_component = trend_coef * time
    # Seasonal component (single sinusoid with learned amplitude, frequency, phase)
    seasonal_component = amplitude * tf.sin(frequency * time + phase)
    # Expected value = trend + seasonal
    expected_value = trend_component + seasonal_component
    return tfd.Normal(loc=expected_value, scale=noise_scale)
# Loss function: negative unnormalized log posterior (likelihood plus priors;
# frequency and phase are left with implicit flat priors)
def loss_fn():
    observation_dist = make_observation_dist()
    log_prior = (trend_prior.log_prob(trend_coef) +
                 amplitude_prior.log_prob(amplitude) +
                 noise_prior.log_prob(noise_scale))
    return -(tf.reduce_sum(observation_dist.log_prob(data)) + log_prior)
# Optimize parameters
optimizer = tf.optimizers.Adam(learning_rate=0.01)
for i in range(1000):
    optimizer.minimize(loss_fn, var_list=variables)
    if i % 100 == 0:
        print(f"Step {i}, Loss: {loss_fn().numpy()}")
# Get the fitted model
observation_dist = make_observation_dist()
# Calculate anomaly scores (negative log probabilities)
log_probs = observation_dist.log_prob(data).numpy()
anomaly_scores = -log_probs
# Define a threshold for anomaly detection
threshold = np.percentile(anomaly_scores, 98) # 98th percentile as threshold
detected_anomalies = anomaly_scores > threshold
# Visualize results
plt.figure(figsize=(15, 10))
plt.subplot(3, 1, 1)
plt.plot(time, data)
plt.title('Raw Time Series Data')
plt.ylabel('Value')
plt.subplot(3, 1, 2)
plt.plot(time, anomaly_scores)
plt.axhline(y=threshold, color='r', linestyle='--')
plt.title('Anomaly Scores with Threshold')
plt.ylabel('Anomaly Score')
plt.subplot(3, 1, 3)
plt.plot(time, data)
plt.scatter(time[detected_anomalies], data[detected_anomalies], color='r', marker='o')
for idx in anomaly_indices:
    plt.axvline(x=idx, color='g', linestyle='--', alpha=0.5)
plt.title('Detected Anomalies (red) vs True Anomalies (green lines)')
plt.ylabel('Value')
plt.xlabel('Time')
plt.tight_layout()
plt.show()
# Evaluate detection accuracy
true_anomaly_mask = np.zeros(n_samples, dtype=bool)
true_anomaly_mask[anomaly_indices] = True
# Calculate precision and recall
true_positives = np.sum(detected_anomalies & true_anomaly_mask)
false_positives = np.sum(detected_anomalies & ~true_anomaly_mask)
false_negatives = np.sum(~detected_anomalies & true_anomaly_mask)
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
print(f"Anomaly Detection Results:")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {2 * precision * recall / (precision + recall):.2f}")
This application shows how to use TensorFlow Probability to:
- Model time series data with trend, seasonal, and noise components
- Identify anomalies using probabilistic measures
- Evaluate the performance of anomaly detection
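As a variation on the thresholding step (a sketch reusing the fitted observation_dist from the code above), anomalies can also be flagged as points falling outside a central 99% predictive interval of the model:
# Flag points outside the model's central 99% predictive interval
lower_bound = observation_dist.quantile(0.005).numpy()
upper_bound = observation_dist.quantile(0.995).numpy()
interval_anomalies = (data < lower_bound) | (data > upper_bound)
print("Points outside the 99% predictive interval:", np.sum(interval_anomalies))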
Summary
TensorFlow Probability is a powerful library that extends TensorFlow's capabilities to probabilistic modeling and statistical inference. It enables you to:
- Work with probability distributions
- Build Bayesian models with uncertainty quantification
- Perform MCMC sampling from complex distributions
- Create probabilistic neural networks
- Apply probabilistic methods to real-world problems like anomaly detection
By incorporating uncertainty into your models, you can make more robust predictions, especially in situations with limited data or where understanding confidence is critical.
Additional Resources
- TensorFlow Probability Official Documentation
- TFP Tutorials on GitHub
- Probabilistic Machine Learning: An Introduction by Kevin Murphy
- Bayesian Methods for Hackers by Cameron Davidson-Pilon
Exercises
- Basic Probability: Create different probability distributions (Normal, Poisson, Beta) and visualize their probability density functions.
- Bayesian Linear Regression: Implement a Bayesian linear regression model on a dataset of your choice and compare its performance with traditional linear regression.
- MCMC Sampling: Use MCMC methods to sample from a mixture of Gaussian distributions and visualize the results.
- Classification with Uncertainty: Build a probabilistic neural network for a classification task and analyze the uncertainty in its predictions.
- Time Series Forecasting: Apply TensorFlow Probability to forecast a time series dataset with uncertainty bands around the predictions.