TensorFlow Distributions
Introduction
Probability distributions are fundamental building blocks of statistics and machine learning. They help us model uncertainty and make predictions from incomplete information. TensorFlow Probability (TFP) provides a rich set of distributions through its tfp.distributions module, which lets you incorporate probabilistic components into your TensorFlow models.
In this tutorial, you'll learn:
- What probability distributions are in the context of TensorFlow
- How to create and manipulate distributions
- How to sample from distributions and compute probabilities
- How to use distributions in practical machine learning applications
Understanding Probability Distributions
A probability distribution describes the likelihood of different outcomes in a random experiment. For example, a fair coin toss follows a distribution where heads and tails each have 0.5 probability.
TensorFlow Probability provides implementations of many common distributions, making it easy to:
- Generate random samples
- Compute probabilities and log probabilities
- Calculate statistics like mean and variance
- Build complex probabilistic models
Getting Started with TensorFlow Distributions
First, let's install TensorFlow Probability and import the necessary libraries:
# Install TensorFlow Probability
!pip install tensorflow-probability
# Import libraries
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions # Convenient alias for distributions
Creating Basic Distributions
Let's start by creating some common distributions:
# Create a normal (Gaussian) distribution with mean=0 and standard deviation=1
normal_dist = tfd.Normal(loc=0.0, scale=1.0)
# Create a uniform distribution between 0 and 5
uniform_dist = tfd.Uniform(low=0.0, high=5.0)
# Create a Bernoulli distribution (coin flip) with probability of success 0.7
bernoulli_dist = tfd.Bernoulli(probs=0.7)
print("Created distributions:")
print(f"Normal: {normal_dist}")
print(f"Uniform: {uniform_dist}")
print(f"Bernoulli: {bernoulli_dist}")
Output:
Created distributions:
Normal: tfp.distributions.Normal("Normal", batch_shape=[], event_shape=[], dtype=float32)
Uniform: tfp.distributions.Uniform("Uniform", batch_shape=[], event_shape=[], dtype=float32)
Bernoulli: tfp.distributions.Bernoulli("Bernoulli", batch_shape=[], event_shape=[], dtype=float32)
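The printed representation shows each distribution's batch_shape and event_shape. Passing vector parameters creates a batch of independent distributions that share the same sampling and probability methods. Here is a small sketch (the variable name batched_normal and the shapes in the comments are illustrative expectations, not output from the run above):
# Passing vector parameters creates a batch of three independent Normals
batched_normal = tfd.Normal(loc=[0.0, 1.0, 2.0], scale=[1.0, 1.0, 2.0])
print(batched_normal.batch_shape)      # expected: (3,)
print(batched_normal.sample(4).shape)  # expected: (4, 3) -- 4 samples from each of the 3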
Sampling from Distributions
Once you have created a distribution, you can easily generate samples from it:
# Generate 5 samples from each distribution
normal_samples = normal_dist.sample(5)
uniform_samples = uniform_dist.sample(5)
bernoulli_samples = bernoulli_dist.sample(5)
print("Normal samples:", normal_samples.numpy())
print("Uniform samples:", uniform_samples.numpy())
print("Bernoulli samples:", bernoulli_samples.numpy())
Output:
Normal samples: [ 0.6763483 -0.28694868 -0.09973579 0.8896322 -0.7025261 ]
Uniform samples: [2.7003956 0.8336139 3.8260238 4.1877055 3.1253445]
Bernoulli samples: [1. 1. 1. 0. 1.]
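You can also pass a shape to sample to draw a whole tensor of independent samples, and a seed for reproducibility. A brief sketch (the shapes in the comments are what we would expect, not captured output):
# Draw a 3x2 tensor of independent samples from the normal distribution
grid_samples = normal_dist.sample([3, 2])
print(grid_samples.shape)  # expected: (3, 2)
# Pass a seed for reproducible draws
seeded_samples = normal_dist.sample(5, seed=42)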
Computing Probabilities
You can compute the probability density function (PDF) or probability mass function (PMF) at specific values:
# Compute the PDF for the normal distribution at x=0
normal_pdf_at_0 = normal_dist.prob(0.0)
print(f"Normal PDF at x=0: {normal_pdf_at_0.numpy()}")
# Compute the PDF for the uniform distribution at x=2.5
uniform_pdf_at_2_5 = uniform_dist.prob(2.5)
print(f"Uniform PDF at x=2.5: {uniform_pdf_at_2_5.numpy()}")
# Compute the PMF for the Bernoulli distribution at x=1
bernoulli_pmf_at_1 = bernoulli_dist.prob(1.0)
print(f"Bernoulli PMF at x=1: {bernoulli_pmf_at_1.numpy()}")
Output:
Normal PDF at x=0: 0.3989423
Uniform PDF at x=2.5: 0.2
Bernoulli PMF at x=1: 0.7
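In practice you will usually prefer log probabilities, which are more numerically stable, and many distributions also expose a cumulative distribution function. A short sketch using the distributions above (the numbers in the comments are analytic values, not captured output):
# Log probability (log of the PDF/PMF), preferred for numerical stability
print(normal_dist.log_prob(0.0).numpy())  # expected: about -0.9189 (log of 0.3989)
# Cumulative distribution function: P(X <= 0) = 0.5 for a standard normal
print(normal_dist.cdf(0.0).numpy())       # expected: 0.5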
Distribution Statistics
TensorFlow Probability distributions provide methods to compute various statistical properties:
# Mean and variance of distributions
print(f"Normal mean: {normal_dist.mean().numpy()}, variance: {normal_dist.variance().numpy()}")
print(f"Uniform mean: {uniform_dist.mean().numpy()}, variance: {uniform_dist.variance().numpy()}")
print(f"Bernoulli mean: {bernoulli_dist.mean().numpy()}, variance: {bernoulli_dist.variance().numpy()}")
# Standard deviation
print(f"Normal standard deviation: {normal_dist.stddev().numpy()}")
Output:
Normal mean: 0.0, variance: 1.0
Uniform mean: 2.5, variance: 2.0833333
Bernoulli mean: 0.7, variance: 0.21000001
Normal standard deviation: 1.0
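Distributions also expose other summaries such as entropy, and you can compare two distributions with tfd.kl_divergence. A short sketch (the numbers in the comments are the analytic values):
# Differential entropy of the standard normal is 0.5 * log(2 * pi * e), about 1.4189
print(normal_dist.entropy().numpy())
# KL divergence between two unit-variance normals: KL(N(0,1) || N(1,1)) = 0.5
other_normal = tfd.Normal(loc=1.0, scale=1.0)
print(tfd.kl_divergence(normal_dist, other_normal).numpy())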
Working with Multivariate Distributions
TensorFlow Probability also supports multivariate distributions:
# Create a 2D multivariate normal distribution
mv_normal = tfd.MultivariateNormalDiag(
    loc=[0., 1.],        # Mean vector
    scale_diag=[1., 2.]  # Per-dimension standard deviations (diagonal of the scale matrix)
)
# Sample from the distribution
samples = mv_normal.sample(5)
print("Multivariate normal samples:")
print(samples.numpy())
# Compute the log probability
log_prob = mv_normal.log_prob(samples)
print("Log probabilities:")
print(log_prob.numpy())
Output:
Multivariate normal samples:
[[ 1.7672778 3.3224177 ]
[ 0.8642473 -0.8449592 ]
[ 1.428548 0.8098803 ]
[ 0.7707763 4.784366 ]
[-0.35763764 1.1392013 ]]
Log probabilities:
[-4.398999 -4.1543307 -2.5179086 -6.6900334 -2.3207066]
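MultivariateNormalDiag assumes the dimensions are uncorrelated. To model correlations you can use tfd.MultivariateNormalTriL, which is parameterized by a lower-triangular Cholesky factor of the covariance matrix. A minimal sketch (the covariance values and variable names are illustrative assumptions):
# Full-covariance multivariate normal via the Cholesky factor of the covariance
cov = [[1.0, 0.6],
       [0.6, 2.0]]
scale_tril = tf.linalg.cholesky(cov)
mv_normal_full = tfd.MultivariateNormalTriL(loc=[0., 1.], scale_tril=scale_tril)
print(mv_normal_full.covariance().numpy())  # should approximately recover the matrix above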
Parameterizing Distributions with TensorFlow Variables
One powerful feature of TensorFlow Probability is the ability to parameterize distributions with trainable variables:
# Create a trainable Normal distribution
loc = tf.Variable(0.0)
scale = tf.Variable(1.0)
trainable_normal = tfd.Normal(loc=loc, scale=scale)
# Define a loss function
def loss_fn():
    # Negative mean log probability of the observed data
    return -tf.reduce_mean(trainable_normal.log_prob([0.5, 0.2, -0.1, 0.0]))
# Optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
# Training loop
for i in range(100):
    optimizer.minimize(loss_fn, var_list=[loc, scale])
    if i % 20 == 0:
        print(f"Step {i}: loc = {loc.numpy():.4f}, scale = {scale.numpy():.4f}, loss = {loss_fn().numpy():.4f}")
print(f"Final: loc = {loc.numpy():.4f}, scale = {scale.numpy():.4f}, loss = {loss_fn().numpy():.4f}")
Output:
Step 0: loc = 0.0750, scale = 0.9456, loss = 1.0928
Step 20: loc = 0.1443, scale = 0.2785, loss = 0.6631
Step 40: loc = 0.1498, scale = 0.2572, loss = 0.6173
Step 60: loc = 0.1500, scale = 0.2541, loss = 0.6119
Step 80: loc = 0.1500, scale = 0.2528, loss = 0.6099
Final: loc = 0.1500, scale = 0.2524, loss = 0.6093
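One caveat with this example: a plain tf.Variable for the scale can drift to a non-positive value during optimization, which would make the Normal invalid. A common remedy, sketched below with tfp.util.TransformedVariable (the variable names are illustrative), is to optimize an unconstrained variable and map it through a Softplus bijector so the scale stays positive:
# Keep scale positive by optimizing an unconstrained variable through Softplus
positive_scale = tfp.util.TransformedVariable(1.0, bijector=tfp.bijectors.Softplus())
safe_normal = tfd.Normal(loc=tf.Variable(0.0), scale=positive_scale)
You can then pass safe_normal.trainable_variables as the var_list and train exactly as before.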
Practical Examples
Example 1: Gaussian Mixture Model
Let's create a simple Gaussian mixture model with two components:
# Define two Gaussian components
mix_probs = [0.3, 0.7] # Component weights
component_1 = tfd.Normal(loc=-2.0, scale=0.5)
component_2 = tfd.Normal(loc=1.0, scale=1.0)
# Create the mixture distribution
mixture = tfd.Mixture(
    cat=tfd.Categorical(probs=mix_probs),
    components=[component_1, component_2]
)
# Generate samples and a grid of points for plotting the density
# (use float32 to match the distributions' default dtype)
x = np.linspace(-5, 5, 1000, dtype=np.float32)
samples = mixture.sample(5000)
# Plot the results
plt.figure(figsize=(10, 6))
plt.hist(samples.numpy(), bins=50, density=True, alpha=0.5, label='Samples')
plt.plot(x, mixture.prob(x).numpy(), 'r-', label='PDF')
plt.title('Gaussian Mixture Model')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
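Because both components here are Normals, the same model can also be written with tfd.MixtureSameFamily, which takes the components as a single batched distribution and is usually the more efficient choice. A sketch of the equivalent two-component mixture (variable names are illustrative):
# Equivalent mixture written with a single batched Normal for the components
same_family_mixture = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-2.0, 1.0], scale=[0.5, 1.0])
)
print(same_family_mixture.mean().numpy())  # expected: 0.3 * (-2.0) + 0.7 * 1.0 = 0.1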
Example 2: Bayesian Linear Regression
Now let's implement a simple Bayesian linear regression example:
# Generate synthetic data (cast to float32 to match TFP's default dtype)
true_slope = 0.5
true_intercept = 1.2
true_noise = 0.3
NUM_EXAMPLES = 100
x = np.linspace(-3, 3, NUM_EXAMPLES).astype(np.float32)
y = (true_intercept + true_slope * x
     + np.random.normal(0, true_noise, NUM_EXAMPLES)).astype(np.float32)
We place standard normal priors on the slope and intercept, a half-normal prior on the noise scale, and then fit a mean-field surrogate posterior with variational inference:
# Define the model: the joint log probability of the priors and the likelihood
def target_log_prob_fn(slope, intercept, noise):
    # Priors
    log_prob = tfd.Normal(loc=0., scale=1.).log_prob(slope)
    log_prob += tfd.Normal(loc=0., scale=1.).log_prob(intercept)
    log_prob += tfd.HalfNormal(scale=1.).log_prob(noise)
    # Likelihood of the observed data under the linear model
    y_pred = intercept + slope * x
    log_prob += tf.reduce_sum(tfd.Normal(loc=y_pred, scale=noise).log_prob(y))
    return log_prob
# Create a factored (mean-field) surrogate posterior over the three scalar parameters
surrogate_posterior = tfp.experimental.vi.build_factored_surrogate_posterior(
    event_shape=[[], [], []],  # slope, intercept, and noise are all scalars
    constraining_bijectors=[   # this argument is named `bijector` in newer TFP releases
        tfp.bijectors.Identity(),  # slope
        tfp.bijectors.Identity(),  # intercept
        tfp.bijectors.Softplus(),  # noise (must be positive)
    ]
)
# Set up optimizer
optimizer = tf.optimizers.Adam(learning_rate=0.05)
# Run variational inference; fit_surrogate_posterior runs the whole training loop
# and returns the per-step losses (negative ELBO values)
num_steps = 500
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn,
    surrogate_posterior,
    optimizer=optimizer,
    num_steps=num_steps
)
print(f'Final loss: {losses[-1].numpy():.4f}')
# Summarize the learned posterior by drawing samples from the surrogate
slope_samples, intercept_samples, noise_samples = surrogate_posterior.sample(1000)
print(f'Learned slope: mean = {tf.reduce_mean(slope_samples).numpy():.3f}, '
      f'std = {tf.math.reduce_std(slope_samples).numpy():.3f}')
print(f'Learned intercept: mean = {tf.reduce_mean(intercept_samples).numpy():.3f}, '
      f'std = {tf.math.reduce_std(intercept_samples).numpy():.3f}')
print(f'Learned noise: mean = {tf.reduce_mean(noise_samples).numpy():.3f}, '
      f'std = {tf.math.reduce_std(noise_samples).numpy():.3f}')
print(f'True values: slope = {true_slope}, intercept = {true_intercept}, noise = {true_noise}')
Transformations and Compositions
TensorFlow Probability also allows you to transform distributions and create complex compositions:
# Create a normal distribution
base_dist = tfd.Normal(loc=0.0, scale=1.0)
# Apply a transformation to create a log-normal distribution
log_normal = tfd.TransformedDistribution(
    distribution=base_dist,
    bijector=tfp.bijectors.Exp()
)
# Sample from both distributions
normal_samples = base_dist.sample(1000)
log_normal_samples = log_normal.sample(1000)
# Compare the samples
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(normal_samples.numpy(), bins=30, alpha=0.7)
plt.title('Normal Distribution')
plt.subplot(1, 2, 2)
plt.hist(log_normal_samples.numpy(), bins=30, alpha=0.7)
plt.title('Log-Normal Distribution')
plt.tight_layout()
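Note that many common transformed distributions are also available directly; for example tfd.LogNormal is equivalent to exponentiating a Normal. Bijectors can also be chained. The sketch below (names are illustrative) builds an affine-transformed Normal by scaling and then shifting:
# Chain applies bijectors right-to-left: first scale by 2, then shift by 1
affine_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0.0, scale=1.0),
    bijector=tfp.bijectors.Chain([tfp.bijectors.Shift(1.0), tfp.bijectors.Scale(2.0)])
)
print(affine_normal.sample(3).numpy())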
Common Distributions in TensorFlow Probability
Here's a reference of some commonly used distributions in TFP:
- Continuous Distributions
  - tfd.Normal: Gaussian distribution
  - tfd.Uniform: Uniform distribution
  - tfd.Beta: Beta distribution
  - tfd.Gamma: Gamma distribution
  - tfd.Exponential: Exponential distribution
  - tfd.StudentT: Student's t-distribution
  - tfd.Laplace: Laplace distribution
- Discrete Distributions
  - tfd.Bernoulli: Bernoulli distribution (binary outcomes)
  - tfd.Categorical: Categorical distribution (discrete outcomes)
  - tfd.Poisson: Poisson distribution (count data)
  - tfd.Binomial: Binomial distribution (number of successes)
  - tfd.NegativeBinomial: Negative binomial distribution
- Multivariate Distributions
  - tfd.MultivariateNormalDiag: Multivariate normal with diagonal covariance
  - tfd.MultivariateNormalTriL: Multivariate normal with full covariance
  - tfd.Dirichlet: Dirichlet distribution
  - tfd.Multinomial: Multinomial distribution
- Joint Distributions
  - tfd.JointDistributionSequential: Builds joint distributions from sequences
  - tfd.JointDistributionNamed: Builds joint distributions using dictionaries (see the sketch after this list)
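As a brief illustration of the joint-distribution classes listed above, here is a sketch of a tiny two-level model written with tfd.JointDistributionNamed (the variable names rate and counts are illustrative assumptions):
# A small hierarchical model: a rate drawn from a Gamma prior, then a Poisson count
joint = tfd.JointDistributionNamed(dict(
    rate=tfd.Gamma(concentration=2.0, rate=1.0),
    counts=lambda rate: tfd.Poisson(rate=rate),
))
sample = joint.sample()
print(sample)                          # dict with 'rate' and 'counts' entries
print(joint.log_prob(sample).numpy())  # joint log density of the sampled values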
Summary
In this tutorial, you've learned about TensorFlow Probability's distributions module, which provides powerful tools for working with probability distributions in your machine learning models. We covered:
- Creating basic distributions
- Sampling from distributions
- Computing probabilities and statistics
- Working with multivariate distributions
- Parameterizing distributions with trainable variables
- Building mixture models and Bayesian regression
- Transforming and composing distributions
Probability distributions are fundamental to many advanced machine learning techniques, including Bayesian inference, generative modeling, and reinforcement learning. TensorFlow Probability makes these techniques more accessible by providing optimized implementations that integrate seamlessly with the broader TensorFlow ecosystem.
Additional Resources
- TensorFlow Probability Documentation
- TFP Distributions Guide
- Probabilistic ML for Hackers
- Book: "Bayesian Methods for Hackers" by Cameron Davidson-Pilon
Exercises
- Create a Beta distribution and visualize its PDF for different parameters.
- Build a Poisson distribution to model count data. Sample from it and compute the mean and variance.
- Implement a simple Bayesian model to estimate the bias of a coin from observed flips.
- Create a mixture of three normal distributions and visualize its probability density function.
- Use tfp.distributions.Categorical to implement a simple discrete choice model.