TensorFlow Distributions
Introduction
Probability distributions are fundamental building blocks of statistics and machine learning. They help us model uncertainty and make predictions from incomplete information. TensorFlow Probability (TFP) provides a rich set of distributions through its tfp.distributions module, which lets you incorporate probabilistic components into your TensorFlow models.
In this tutorial, you'll learn:
- What probability distributions are in the context of TensorFlow
- How to create and manipulate distributions
- How to sample from distributions and compute probabilities
- How to use distributions in practical machine learning applications
Understanding Probability Distributions
A probability distribution describes the likelihood of different outcomes in a random experiment. For example, a fair coin toss follows a distribution where heads and tails each have 0.5 probability.
TensorFlow Probability provides implementations of many common distributions, making it easy to:
- Generate random samples
- Compute probabilities and log probabilities
- Calculate statistics like mean and variance
- Build complex probabilistic models
Getting Started with TensorFlow Distributions
First, let's install TensorFlow Probability and import the necessary libraries:
# Install TensorFlow Probability
!pip install tensorflow-probability
# Import libraries
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions # Convenient alias for distributions
Creating Basic Distributions
Let's start by creating some common distributions:
# Create a normal (Gaussian) distribution with mean=0 and standard deviation=1
normal_dist = tfd.Normal(loc=0.0, scale=1.0)
# Create a uniform distribution between 0 and 5
uniform_dist = tfd.Uniform(low=0.0, high=5.0)
# Create a Bernoulli distribution (coin flip) with probability of success 0.7
bernoulli_dist = tfd.Bernoulli(probs=0.7)
print("Created distributions:")
print(f"Normal: {normal_dist}")
print(f"Uniform: {uniform_dist}")
print(f"Bernoulli: {bernoulli_dist}")
Output:
Created distributions:
Normal: tfp.distributions.Normal("Normal", batch_shape=[], event_shape=[], dtype=float32)
Uniform: tfp.distributions.Uniform("Uniform", batch_shape=[], event_shape=[], dtype=float32)
Bernoulli: tfp.distributions.Bernoulli("Bernoulli", batch_shape=[], event_shape=[], dtype=float32)
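The printed representation shows each distribution's batch_shape and event_shape. Passing vector parameters creates a batch of independent distributions that share the same sampling and probability methods. Here is a small sketch (the variable name batched_normal and the shapes in the comments are illustrative expectations, not output from the run above):
# Passing vector parameters creates a batch of three independent Normals
batched_normal = tfd.Normal(loc=[0.0, 1.0, 2.0], scale=[1.0, 1.0, 2.0])
print(batched_normal.batch_shape)      # expected: (3,)
print(batched_normal.sample(4).shape)  # expected: (4, 3) -- 4 samples from each of the 3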
Sampling from Distributions
Once you have created a distribution, you can easily generate samples from it:
# Generate 5 samples from each distribution
normal_samples = normal_dist.sample(5)
uniform_samples = uniform_dist.sample(5)
bernoulli_samples = bernoulli_dist.sample(5)
print("Normal samples:", normal_samples.numpy())
print("Uniform samples:", uniform_samples.numpy())
print("Bernoulli samples:", bernoulli_samples.numpy())
Output:
Normal samples: [ 0.6763483 -0.28694868 -0.09973579 0.8896322 -0.7025261 ]
Uniform samples: [2.7003956 0.8336139 3.8260238 4.1877055 3.1253445]
Bernoulli samples: [1. 1. 1. 0. 1.]
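You can also pass a shape to sample to draw a whole tensor of independent samples, and a seed for reproducibility. A brief sketch (the shapes in the comments are what we would expect, not captured output):
# Draw a 3x2 tensor of independent samples from the normal distribution
grid_samples = normal_dist.sample([3, 2])
print(grid_samples.shape)  # expected: (3, 2)
# Pass a seed for reproducible draws
seeded_samples = normal_dist.sample(5, seed=42)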
Computing Probabilities
You can compute the probability density function (PDF) or probability mass function (PMF) at specific values:
# Compute the PDF for the normal distribution at x=0
normal_pdf_at_0 = normal_dist.prob(0.0)
print(f"Normal PDF at x=0: {normal_pdf_at_0.numpy()}")
# Compute the PDF for the uniform distribution at x=2.5
uniform_pdf_at_2_5 = uniform_dist.prob(2.5)
print(f"Uniform PDF at x=2.5: {uniform_pdf_at_2_5.numpy()}")
# Compute the PMF for the Bernoulli distribution at x=1
bernoulli_pmf_at_1 = bernoulli_dist.prob(1.0)
print(f"Bernoulli PMF at x=1: {bernoulli_pmf_at_1.numpy()}")
Output:
Normal PDF at x=0: 0.3989423
Uniform PDF at x=2.5: 0.2
Bernoulli PMF at x=1: 0.7
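In practice you will usually prefer log probabilities, which are more numerically stable, and many distributions also expose a cumulative distribution function. A short sketch using the distributions above (the numbers in the comments are analytic values, not captured output):
# Log probability (log of the PDF/PMF), preferred for numerical stability
print(normal_dist.log_prob(0.0).numpy())  # expected: about -0.9189 (log of 0.3989)
# Cumulative distribution function: P(X <= 0) = 0.5 for a standard normal
print(normal_dist.cdf(0.0).numpy())       # expected: 0.5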
Distribution Statistics
TensorFlow Probability distributions provide methods to compute various statistical properties:
# Mean and variance of distributions
print(f"Normal mean: {normal_dist.mean().numpy()}, variance: {normal_dist.variance().numpy()}")
print(f"Uniform mean: {uniform_dist.mean().numpy()}, variance: {uniform_dist.variance().numpy()}")
print(f"Bernoulli mean: {bernoulli_dist.mean().numpy()}, variance: {bernoulli_dist.variance().numpy()}")
# Standard deviation
print(f"Normal standard deviation: {normal_dist.stddev().numpy()}")
Output:
Normal mean: 0.0, variance: 1.0
Uniform mean: 2.5, variance: 2.0833333
Bernoulli mean: 0.7, variance: 0.21000001
Normal standard deviation: 1.0
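Distributions also expose other summaries such as entropy, and you can compare two distributions with tfd.kl_divergence. A short sketch (the numbers in the comments are the analytic values):
# Differential entropy of the standard normal is 0.5 * log(2 * pi * e), about 1.4189
print(normal_dist.entropy().numpy())
# KL divergence between two unit-variance normals: KL(N(0,1) || N(1,1)) = 0.5
other_normal = tfd.Normal(loc=1.0, scale=1.0)
print(tfd.kl_divergence(normal_dist, other_normal).numpy())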
Working with Multivariate Distributions
TensorFlow Probability also supports multivariate distributions:
# Create a 2D multivariate normal distribution
mv_normal = tfd.MultivariateNormalDiag(
    loc=[0., 1.],        # Mean vector
    scale_diag=[1., 2.]  # Per-dimension standard deviations (diagonal of the scale matrix)
)
# Sample from the distribution
samples = mv_normal.sample(5)
print("Multivariate normal samples:")
print(samples.numpy())
# Compute the log probability
log_prob = mv_normal.log_prob(samples)
print("Log probabilities:")
print(log_prob.numpy())
Output:
Multivariate normal samples:
[[ 1.7672778 3.3224177 ]
[ 0.8642473 -0.8449592 ]
[ 1.428548 0.8098803 ]
[ 0.7707763 4.784366 ]
[-0.35763764 1.1392013 ]]
Log probabilities:
[-4.398999 -4.1543307 -2.5179086 -6.6900334 -2.3207066]
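MultivariateNormalDiag assumes the dimensions are uncorrelated. To model correlations you can use tfd.MultivariateNormalTriL, which is parameterized by a lower-triangular Cholesky factor of the covariance matrix. A minimal sketch (the covariance values and variable names are illustrative assumptions):
# Full-covariance multivariate normal via the Cholesky factor of the covariance
cov = [[1.0, 0.6],
       [0.6, 2.0]]
scale_tril = tf.linalg.cholesky(cov)
mv_normal_full = tfd.MultivariateNormalTriL(loc=[0., 1.], scale_tril=scale_tril)
print(mv_normal_full.covariance().numpy())  # should approximately recover the matrix above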
Parameterizing Distributions with TensorFlow Variables
One powerful feature of TensorFlow Probability is the ability to parameterize distributions with trainable variables:
# Create a trainable Normal distribution
loc = tf.Variable(0.0)
scale = tf.Variable(1.0)
trainable_normal = tfd.Normal(loc=loc, scale=scale)
# Define a loss function
def loss_fn():
    # Negative mean log probability of the observed data
    return -tf.reduce_mean(trainable_normal.log_prob([0.5, 0.2, -0.1, 0.0]))
# Optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
# Training loop
for i in range(100):
    optimizer.minimize(loss_fn, var_list=[loc, scale])
    if i % 20 == 0:
        print(f"Step {i}: loc = {loc.numpy():.4f}, scale = {scale.numpy():.4f}, loss = {loss_fn().numpy():.4f}")
print(f"Final: loc = {loc.numpy():.4f}, scale = {scale.numpy():.4f}, loss = {loss_fn().numpy():.4f}")
Output:
Step 0: loc = 0.0750, scale = 0.9456, loss = 1.0928
Step 20: loc = 0.1443, scale = 0.2785, loss = 0.6631
Step 40: loc = 0.1498, scale = 0.2572, loss = 0.6173
Step 60: loc = 0.1500, scale = 0.2541, loss = 0.6119
Step 80: loc = 0.1500, scale = 0.2528, loss = 0.6099
Final: loc = 0.1500, scale = 0.2524, loss = 0.6093
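One caveat with this example: a plain tf.Variable for the scale can drift to a non-positive value during optimization, which would make the Normal invalid. A common remedy, sketched below with tfp.util.TransformedVariable (the variable names are illustrative), is to optimize an unconstrained variable and map it through a Softplus bijector so the scale stays positive:
# Keep scale positive by optimizing an unconstrained variable through Softplus
positive_scale = tfp.util.TransformedVariable(1.0, bijector=tfp.bijectors.Softplus())
safe_normal = tfd.Normal(loc=tf.Variable(0.0), scale=positive_scale)
You can then pass safe_normal.trainable_variables as the var_list and train exactly as before.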
Practical Examples
Example 1: Gaussian Mixture Model
Let's create a simple Gaussian mixture model with two components:
# Define two Gaussian components
mix_probs = [0.3, 0.7] # Component weights
component_1 = tfd.Normal(loc=-2.0, scale=0.5)
component_2 = tfd.Normal(loc=1.0, scale=1.0)
# Create the mixture distribution
mixture = tfd.Mixture(
    cat=tfd.Categorical(probs=mix_probs),
    components=[component_1, component_2]
)
# Generate samples and a grid of points for plotting the density
# (use float32 to match the distributions' default dtype)
x = np.linspace(-5, 5, 1000, dtype=np.float32)
samples = mixture.sample(5000)
# Plot the results
plt.figure(figsize=(10, 6))
plt.hist(samples.numpy(), bins=50, density=True, alpha=0.5, label='Samples')
plt.plot(x, mixture.prob(x).numpy(), 'r-', label='PDF')
plt.title('Gaussian Mixture Model')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
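Because both components here are Normals, the same model can also be written with tfd.MixtureSameFamily, which takes the components as a single batched distribution and is usually the more efficient choice. A sketch of the equivalent two-component mixture (variable names are illustrative):
# Equivalent mixture written with a single batched Normal for the components
same_family_mixture = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-2.0, 1.0], scale=[0.5, 1.0])
)
print(same_family_mixture.mean().numpy())  # expected: 0.3 * (-2.0) + 0.7 * 1.0 = 0.1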
Example 2: Bayesian Linear Regression
Now let's implement a simple Bayesian linear regression example:
# Generate synthetic data (cast to float32 to match TFP's default dtype)
true_slope = 0.5
true_intercept = 1.2
true_noise = 0.3
NUM_EXAMPLES = 100
x = np.linspace(-3, 3, NUM_EXAMPLES).astype(np.float32)
y = (true_intercept + true_slope * x
     + np.random.normal(0, true_noise, NUM_EXAMPLES)).astype(np.float32)
We place standard normal priors on the slope and intercept, a half-normal prior on the noise scale, and then fit a mean-field surrogate posterior with variational inference:
# Define the model: the joint log probability of the priors and the likelihood
def target_log_prob_fn(slope, intercept, noise):
    # Priors
    log_prob = tfd.Normal(loc=0., scale=1.).log_prob(slope)
    log_prob += tfd.Normal(loc=0., scale=1.).log_prob(intercept)
    log_prob += tfd.HalfNormal(scale=1.).log_prob(noise)
    # Likelihood of the observed data under the linear model
    y_pred = intercept + slope * x
    log_prob += tf.reduce_sum(tfd.Normal(loc=y_pred, scale=noise).log_prob(y))
    return log_prob
# Create a factored (mean-field) surrogate posterior over the three scalar parameters
surrogate_posterior = tfp.experimental.vi.build_factored_surrogate_posterior(
    event_shape=[[], [], []],  # slope, intercept, and noise are all scalars
    constraining_bijectors=[   # this argument is named `bijector` in newer TFP releases
        tfp.bijectors.Identity(),  # slope
        tfp.bijectors.Identity(),  # intercept
        tfp.bijectors.Softplus(),  # noise (must be positive)
    ]
)
# Set up optimizer
optimizer = tf.optimizers.Adam(learning_rate=0.05)
# Run variational inference; fit_surrogate_posterior runs the whole training loop
# and returns the per-step losses (negative ELBO values)
num_steps = 500
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn,
    surrogate_posterior,
    optimizer=optimizer,
    num_steps=num_steps
)
print(f'Final loss: {losses[-1].numpy():.4f}')
# Summarize the learned posterior by drawing samples from the surrogate
slope_samples, intercept_samples, noise_samples = surrogate_posterior.sample(1000)
print(f'Learned slope: mean = {tf.reduce_mean(slope_samples).numpy():.3f}, '
      f'std = {tf.math.reduce_std(slope_samples).numpy():.3f}')
print(f'Learned intercept: mean = {tf.reduce_mean(intercept_samples).numpy():.3f}, '
      f'std = {tf.math.reduce_std(intercept_samples).numpy():.3f}')
print(f'Learned noise: mean = {tf.reduce_mean(noise_samples).numpy():.3f}, '
      f'std = {tf.math.reduce_std(noise_samples).numpy():.3f}')
print(f'True values: slope = {true_slope}, intercept = {true_intercept}, noise = {true_noise}')
Transformations and Compositions
TensorFlow Probability also allows you to transform distributions and create complex compositions:
# Create a normal distribution
base_dist = tfd.Normal(loc=0.0, scale=1.0)
# Apply a transformation to create a log-normal distribution
log_normal = tfd.TransformedDistribution(
    distribution=base_dist,
    bijector=tfp.bijectors.Exp()
)
# Sample from both distributions
normal_samples = base_dist.sample(1000)
log_normal_samples = log_normal.sample(1000)
# Compare the samples
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(normal_samples.numpy(), bins=30, alpha=0.7)
plt.title('Normal Distribution')
plt.subplot(1, 2, 2)
plt.hist(log_normal_samples.numpy(), bins=30, alpha=0.7)
plt.title('Log-Normal Distribution')
plt.tight_layout()
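Note that many common transformed distributions are also available directly; for example tfd.LogNormal is equivalent to exponentiating a Normal. Bijectors can also be chained. The sketch below (names are illustrative) builds an affine-transformed Normal by scaling and then shifting:
# Chain applies bijectors right-to-left: first scale by 2, then shift by 1
affine_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0.0, scale=1.0),
    bijector=tfp.bijectors.Chain([tfp.bijectors.Shift(1.0), tfp.bijectors.Scale(2.0)])
)
print(affine_normal.sample(3).numpy())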
Common Distributions in TensorFlow Probability
Here's a reference of some commonly used distributions in TFP:
- Continuous Distributions
  - tfd.Normal: Gaussian distribution
  - tfd.Uniform: Uniform distribution
  - tfd.Beta: Beta distribution
  - tfd.Gamma: Gamma distribution
  - tfd.Exponential: Exponential distribution
  - tfd.StudentT: Student's t-distribution
  - tfd.Laplace: Laplace distribution
- Discrete Distributions
  - tfd.Bernoulli: Bernoulli distribution (binary outcomes)
  - tfd.Categorical: Categorical distribution (discrete outcomes)
  - tfd.Poisson: Poisson distribution (count data)
  - tfd.Binomial: Binomial distribution (number of successes)
  - tfd.NegativeBinomial: Negative binomial distribution
- Multivariate Distributions
  - tfd.MultivariateNormalDiag: Multivariate normal with diagonal covariance
  - tfd.MultivariateNormalTriL: Multivariate normal with full covariance
  - tfd.Dirichlet: Dirichlet distribution
  - tfd.Multinomial: Multinomial distribution
- Joint Distributions
  - tfd.JointDistributionSequential: Builds joint distributions from sequences
  - tfd.JointDistributionNamed: Builds joint distributions using dictionaries (see the sketch after this list)
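As a brief illustration of the joint-distribution classes listed above, here is a sketch of a tiny two-level model written with tfd.JointDistributionNamed (the variable names rate and counts are illustrative assumptions):
# A small hierarchical model: a rate drawn from a Gamma prior, then a Poisson count
joint = tfd.JointDistributionNamed(dict(
    rate=tfd.Gamma(concentration=2.0, rate=1.0),
    counts=lambda rate: tfd.Poisson(rate=rate),
))
sample = joint.sample()
print(sample)                          # dict with 'rate' and 'counts' entries
print(joint.log_prob(sample).numpy())  # joint log density of the sampled values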
Summary
In this tutorial, you've learned about TensorFlow Probability's distributions module, which provides powerful tools for working with probability distributions in your machine learning models. We covered:
- Creating basic distributions
- Sampling from distributions
- Computing probabilities and statistics
- Working with multivariate distributions
- Parameterizing distributions with trainable variables
- Building mixture models and Bayesian regression
- Transforming and composing distributions
Probability distributions are fundamental to many advanced machine learning techniques, including Bayesian inference, generative modeling, and reinforcement learning. TensorFlow Probability makes these techniques more accessible by providing optimized implementations that integrate seamlessly with the broader TensorFlow ecosystem.
Additional Resources
- TensorFlow Probability Documentation
- TFP Distributions Guide
- Probabilistic ML for Hackers
- Book: "Bayesian Methods for Hackers" by Cameron Davidson-Pilon
Exercises
- Create a Beta distribution and visualize its PDF for different parameters.
- Build a Poisson distribution to model count data. Sample from it and compute the mean and variance.
- Implement a simple Bayesian model to estimate the bias of a coin from observed flips.
- Create a mixture of three normal distributions and visualize its probability density function.
- Use tfp.distributions.Categorical to implement a simple discrete choice model.