PyTorch Bayesian Neural Networks
Introduction
Traditional neural networks provide point estimates of predictions without quantifying uncertainty. Bayesian Neural Networks (BNNs) solve this limitation by treating model weights as probability distributions rather than fixed values. This approach enables us to quantify uncertainty in predictions, which is crucial for applications where decision-making relies on model confidence, such as medical diagnostics, autonomous driving, and financial modeling.
In this tutorial, you'll learn:
- The fundamentals of Bayesian Neural Networks
- How to implement BNNs using PyTorch
- Ways to visualize uncertainty in predictions
- Practical applications of BNNs
Prerequisites
To follow this tutorial, you should have:
- Basic understanding of PyTorch and neural networks
- Familiarity with probability concepts (Bayesian inference)
- Python environment with PyTorch installed
If you need to install PyTorch:
pip install torch torchvision
Understanding Bayesian Neural Networks
Traditional Neural Networks vs. Bayesian Neural Networks
In traditional neural networks, weights are treated as fixed parameters determined during training. In contrast, Bayesian Neural Networks treat weights as random variables with probability distributions.
Here's the key difference:
- Traditional NN: output = f(input; w), with a single learned weight vector w
- Bayesian NN: w ~ p(w | data), and the prediction is itself a distribution, obtained by averaging f(input; w) over sampled weights (a minimal sketch of this idea follows the list of advantages below)
This probabilistic approach offers several advantages:
- Uncertainty quantification in predictions
- Automatic regularization
- Robustness to overfitting
- Better handling of small datasets
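To make this concrete, here is a toy sketch, assuming a single scalar weight and a linear "model" (values chosen arbitrarily), of how a distribution over weights induces a distribution over predictions:

import torch

# One scalar weight with an assumed posterior N(w_mu, w_sigma^2) instead of a point value
torch.manual_seed(0)
w_mu, w_sigma = 2.0, 0.5
x = 3.0

w_samples = w_mu + w_sigma * torch.randn(1000)   # sample many plausible weights
predictions = w_samples * x                      # f(x; w) = w * x for each sampled weight
print(predictions.mean().item(), predictions.std().item())   # predictive mean and spread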
Implementing Bayesian Neural Networks in PyTorch
We'll implement BNNs in PyTorch using two approaches: first a lightweight approximation based on Monte Carlo dropout, then a layer that places explicit probability distributions over its weights.
Approach 1: Using Bayesian Layers with MC Dropout
Monte Carlo Dropout is the simplest way to approximate a Bayesian Neural Network:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MCDropout(nn.Module):
    def __init__(self, p=0.5):
        super(MCDropout, self).__init__()
        self.p = p

    def forward(self, x):
        # Keep dropout active even at inference time (training=True), so
        # repeated forward passes give different stochastic outputs
        return F.dropout(x, p=self.p, training=True)

class BayesianNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout_prob=0.5):
        super(BayesianNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.dropout1 = MCDropout(dropout_prob)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.dropout2 = MCDropout(dropout_prob)
        self.fc3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x
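Because MCDropout keeps dropout active during inference, you can approximate the predictive distribution with repeated stochastic forward passes. A minimal sketch (the sizes and number of passes below are arbitrary choices for illustration):

# Hypothetical usage: a small MC-dropout network and a dummy input batch
mc_model = BayesianNN(input_size=10, hidden_size=32, output_size=1)
x = torch.randn(4, 10)

with torch.no_grad():
    # Each forward pass applies a different dropout mask
    preds = torch.stack([mc_model(x) for _ in range(100)])   # shape: (100, 4, 1)

mean_pred = preds.mean(dim=0)   # predictive mean
std_pred = preds.std(dim=0)     # predictive uncertainty per output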
Approach 2: Implementing a Bayesian Layer with Weight Distributions
For a more formal Bayesian treatment, we can implement layers with explicit weight distributions:
class BayesianLinear(nn.Module):
    def __init__(self, in_features, out_features, prior_sigma_1=1.0, prior_sigma_2=0.1, prior_pi=0.5):
        super(BayesianLinear, self).__init__()
        # Define parameters
        self.in_features = in_features
        self.out_features = out_features
        # Weight mean parameters
        self.weight_mu = nn.Parameter(torch.Tensor(out_features, in_features).normal_(0, 0.1))
        # Weight rho parameters (sigma = softplus(rho) keeps the std positive)
        self.weight_rho = nn.Parameter(torch.Tensor(out_features, in_features).uniform_(-3, -2))
        # Bias mean parameters
        self.bias_mu = nn.Parameter(torch.Tensor(out_features).normal_(0, 0.1))
        # Bias rho parameters (for variance)
        self.bias_rho = nn.Parameter(torch.Tensor(out_features).uniform_(-3, -2))
        # Scale-mixture prior hyperparameters
        self.prior_sigma_1 = prior_sigma_1
        self.prior_sigma_2 = prior_sigma_2
        self.prior_pi = prior_pi
        # Log probabilities, filled in on each forward pass
        self.log_variational_posterior = 0.0
        self.log_prior = 0.0

    def forward(self, x):
        # Sample weights from the variational posterior (reparameterization trick)
        weight_epsilon = torch.randn_like(self.weight_mu)
        weight_sigma = torch.log1p(torch.exp(self.weight_rho))
        weight = self.weight_mu + weight_epsilon * weight_sigma
        # Sample bias from the variational posterior
        bias_epsilon = torch.randn_like(self.bias_mu)
        bias_sigma = torch.log1p(torch.exp(self.bias_rho))
        bias = self.bias_mu + bias_epsilon * bias_sigma
        # Record log prior / log posterior of the sampled weights (used for the KL term)
        self._calculate_kl(weight, bias, weight_sigma, bias_sigma)
        # Linear transformation
        return F.linear(x, weight, bias)

    def _calculate_kl(self, weight, bias, weight_sigma, bias_sigma):
        # Sum the element-wise log probabilities so each term is a scalar
        weight_log_posterior = log_gaussian(weight, self.weight_mu, weight_sigma).sum()
        weight_log_prior = log_gaussian_mixture(weight, 0.0, self.prior_sigma_1, self.prior_sigma_2, self.prior_pi).sum()
        bias_log_posterior = log_gaussian(bias, self.bias_mu, bias_sigma).sum()
        bias_log_prior = log_gaussian_mixture(bias, 0.0, self.prior_sigma_1, self.prior_sigma_2, self.prior_pi).sum()
        self.log_variational_posterior = weight_log_posterior + bias_log_posterior
        self.log_prior = weight_log_prior + bias_log_prior
import math

# Helper functions for log probabilities
def log_gaussian(x, mu, sigma):
    # Element-wise log density of a Gaussian; sigma may be a tensor or a Python float
    sigma = torch.as_tensor(sigma, dtype=x.dtype)
    return -0.5 * math.log(2 * math.pi) - torch.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_gaussian_mixture(x, mu, sigma1, sigma2, pi):
    # Log density of a two-component (scale mixture) Gaussian prior
    first_gaussian = torch.exp(log_gaussian(x, mu, sigma1)) * pi
    second_gaussian = torch.exp(log_gaussian(x, mu, sigma2)) * (1 - pi)
    return torch.log(first_gaussian + second_gaussian)
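Since the layer draws fresh weights on every call, two forward passes over the same input will generally differ; a quick sanity check (layer sizes chosen arbitrarily):

layer = BayesianLinear(in_features=5, out_features=3)
x = torch.randn(2, 5)
out1 = layer(x)
out2 = layer(x)
print(torch.allclose(out1, out2))   # usually False: new weights were sampled each time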
Training a Bayesian Neural Network
Training a BNN uses variational inference: instead of minimizing a plain data loss, we maximize the evidence lower bound (ELBO), which trades off how well the sampled network fits the data against how far the variational posterior q(w) drifts from the prior p(w). Equivalently, we minimize loss = -E_q[log p(D | w)] + KL(q(w) || p(w)), estimating both terms with Monte Carlo samples of the weights:
# Full BNN model using our Bayesian layers
class BNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(BNN, self).__init__()
        self.layer1 = BayesianLinear(input_size, hidden_size)
        self.layer2 = BayesianLinear(hidden_size, hidden_size)
        self.layer3 = BayesianLinear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.layer3(x)
        return x

    def log_prior(self):
        return self.layer1.log_prior + self.layer2.log_prior + self.layer3.log_prior

    def log_variational_posterior(self):
        return (self.layer1.log_variational_posterior
                + self.layer2.log_variational_posterior
                + self.layer3.log_variational_posterior)

    def sample_elbo(self, input, target, samples=1):
        outputs = torch.zeros(samples, target.shape[0], target.shape[1])
        log_priors = torch.zeros(samples)
        log_variational_posteriors = torch.zeros(samples)
        for i in range(samples):
            outputs[i] = self(input)
            log_priors[i] = self.log_prior()
            log_variational_posteriors[i] = self.log_variational_posterior()
        # Gaussian log-likelihood (up to a constant) is the negative sum-of-squares error
        log_likelihood = -F.mse_loss(outputs.mean(0), target, reduction='sum')
        # ELBO = log-likelihood minus the Monte Carlo estimate of KL(q || p)
        elbo = log_likelihood - (log_variational_posteriors - log_priors).mean()
        return elbo
Here's how to train the model:
import torch.optim as optim
import numpy as np

# Generate some dummy data (random targets, just to exercise the training loop)
np.random.seed(42)
X = torch.FloatTensor(np.random.normal(0, 1, (100, 10)))
y_true = torch.FloatTensor(np.random.normal(0, 1, (100, 1)))

# Initialize model
model = BNN(input_size=10, hidden_size=20, output_size=1)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(500):
    optimizer.zero_grad()
    # Minimize the negative ELBO
    loss = -model.sample_elbo(X, y_true, samples=3)
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")
# Output (example):
# Epoch 0, Loss: -623.4569702148438
# Epoch 100, Loss: -726.1539916992188
# Epoch 200, Loss: -782.8802490234375
# Epoch 300, Loss: -812.89306640625
# Epoch 400, Loss: -834.6373291015625
Making Predictions with Uncertainty Estimates
Once your BNN is trained, you can generate predictions with uncertainty estimates:
def predict_with_uncertainty(model, input_data, num_samples=100):
    predictions = []
    model.eval()  # Evaluation mode; weight sampling (or MC dropout) stays stochastic
    with torch.no_grad():
        for _ in range(num_samples):
            prediction = model(input_data)
            predictions.append(prediction)
    # Stack predictions into a tensor of shape (num_samples, batch, output)
    predictions = torch.stack(predictions)
    # Calculate mean and standard deviation across the sampled predictions
    mean_prediction = predictions.mean(dim=0)
    std_prediction = predictions.std(dim=0)
    return mean_prediction, std_prediction
# Generate a test point
x_test = torch.FloatTensor(np.random.normal(0, 1, (1, 10)))
# Predict with uncertainty
mean, std = predict_with_uncertainty(model, x_test)
print(f"Mean prediction: {mean.item():.4f}")
print(f"Standard deviation (uncertainty): {std.item():.4f}")
# Output (example):
# Mean prediction: 0.1245
# Standard deviation (uncertainty): 0.3281
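Epistemic uncertainty should generally be larger for inputs unlike the training data. As a rough check (exact numbers will vary, and the effect may be modest with the random training targets used above), compare an in-distribution point with one far outside the training range:

# In-distribution input: drawn from the same N(0, 1) as the training features
x_in = torch.FloatTensor(np.random.normal(0, 1, (1, 10)))
# Out-of-distribution input: far from anything seen during training
x_out = torch.FloatTensor(np.random.normal(10, 1, (1, 10)))

_, std_in = predict_with_uncertainty(model, x_in)
_, std_out = predict_with_uncertainty(model, x_out)
print(f"Uncertainty (in-distribution): {std_in.item():.4f}")
print(f"Uncertainty (out-of-distribution): {std_out.item():.4f}")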
Visualizing Uncertainty
Let's visualize how our BNN's predictions change for different input values:
import matplotlib.pyplot as plt
# Generate test data points (1D for easy visualization)
x = torch.linspace(-3, 3, 100).reshape(-1, 1)
# Simple 1D BNN for visualization
simple_bnn = BNN(input_size=1, hidden_size=20, output_size=1)
optimizer = optim.Adam(simple_bnn.parameters(), lr=0.01)
# Generate training data (sine function with noise)
x_train = torch.linspace(-3, 3, 50).reshape(-1, 1)
y_train = torch.sin(x_train) + 0.1 * torch.randn(x_train.size())
# Train for a few epochs
for epoch in range(1000):
    optimizer.zero_grad()
    loss = -simple_bnn.sample_elbo(x_train, y_train, samples=3)
    loss.backward()
    optimizer.step()
# Generate predictions with uncertainty
means = []
stds = []
for i in range(len(x)):
    mean, std = predict_with_uncertainty(simple_bnn, x[i:i+1], num_samples=100)
    means.append(mean.item())
    stds.append(std.item())
# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(x.numpy(), means, 'b-', label='Prediction')
plt.fill_between(x.numpy().flatten(),
                 np.array(means) - 2 * np.array(stds),
                 np.array(means) + 2 * np.array(stds),
                 alpha=0.2, color='b', label='Uncertainty (2σ)')
plt.plot(x_train.numpy(), y_train.numpy(), 'ro', label='Training Data')
plt.plot(x.numpy(), np.sin(x.numpy()), 'g--', label='True Function')
plt.legend()
plt.title('BNN Prediction with Uncertainty')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
Practical Applications
1. Medical Image Classification with Uncertainty
When diagnosing diseases from medical images, uncertainty quantification is critical:
# Example of calling a model for medical image diagnosis with uncertainty
def diagnose_with_confidence(image, model, threshold=0.7):
    mean_pred, std_pred = predict_with_uncertainty(model, image, num_samples=50)
    # Convert the mean logit to a probability
    prob = torch.sigmoid(mean_pred)
    # Decision with certainty measure
    if prob > 0.5:
        diagnosis = "Positive"
        confidence = prob.item()
    else:
        diagnosis = "Negative"
        confidence = (1 - prob).item()
    # Flag for seeking an additional opinion
    needs_review = std_pred.item() > threshold or confidence < 0.8
    return {
        "diagnosis": diagnosis,
        "confidence": confidence,
        "uncertainty": std_pred.item(),
        "needs_expert_review": needs_review
    }
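A hypothetical call, reusing the regression BNN from above as a stand-in for a real image classifier (in practice `image` would be preprocessed features from a medical scan, and the model would be trained on labeled diagnoses):

# Stand-in for preprocessed image features with the model's expected input size
image = torch.FloatTensor(np.random.normal(0, 1, (1, 10)))
result = diagnose_with_confidence(image, model)
print(result)   # dict with diagnosis, confidence, uncertainty, and a review flag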
2. Active Learning with BNNs
Bayesian Neural Networks are excellent for active learning, where models request labels for the most informative examples:
def select_samples_for_labeling(unlabeled_pool, model, n_samples=10):
    """Select the most uncertain samples for labeling"""
    uncertainties = []
    for sample in unlabeled_pool:
        _, std = predict_with_uncertainty(model, sample.unsqueeze(0))
        uncertainties.append(std.item())
    # Find indices of the n most uncertain samples
    indices = np.argsort(uncertainties)[-n_samples:]
    return indices
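A sketch of one active-learning round under assumed names (`unlabeled_pool` is a tensor of candidate inputs; the labeling step is left as a comment because it depends on your data source):

# Hypothetical pool of unlabeled candidates with the model's input size
unlabeled_pool = torch.FloatTensor(np.random.normal(0, 1, (200, 10)))

query_indices = select_samples_for_labeling(unlabeled_pool, model, n_samples=10)
queried = unlabeled_pool[torch.as_tensor(query_indices)]
# labels = obtain_labels(queried)   # hypothetical labeling step (human or oracle)
# ...then add (queried, labels) to the training set and retrain the BNN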
3. Reinforcement Learning with Risk Assessment
In reinforcement learning with high stakes (like autonomous driving), BNNs help avoid risky actions:
def safe_action_selection(state, model, risk_tolerance=0.1):
    """Choose actions considering both expected reward and uncertainty"""
    actions = torch.tensor([[0.0], [0.5], [1.0]])  # Possible actions
    means = []
    stds = []
    # Evaluate each action
    for action in actions:
        state_action = torch.cat([state, action.unsqueeze(0)], dim=1)
        mean, std = predict_with_uncertainty(model, state_action)
        means.append(mean.item())
        stds.append(std.item())
    # Risk-adjusted score: penalize uncertain actions (a pessimistic, lower confidence bound)
    risk_adjusted_scores = np.array(means) - risk_tolerance * np.array(stds)
    # Choose the action with the highest risk-adjusted score
    best_action_idx = np.argmax(risk_adjusted_scores)
    return actions[best_action_idx], risk_adjusted_scores[best_action_idx]
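A hypothetical call, again reusing the 10-input BNN from earlier so that a 9-dimensional state plus a 1-dimensional action matches its input size (in a real setting the model would be a trained value or reward model):

state = torch.FloatTensor(np.random.normal(0, 1, (1, 9)))   # 9 state features + 1 action = 10 inputs
action, score = safe_action_selection(state, model)
print(f"Chosen action: {action.item():.1f}, risk-adjusted score: {score:.4f}")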
Summary
Bayesian Neural Networks provide a powerful framework for uncertainty quantification in deep learning. In this tutorial, you learned:
- The fundamental difference between traditional and Bayesian neural networks
- How to implement Bayesian layers in PyTorch
- Training strategies for BNNs using variational inference
- Making predictions with uncertainty estimates
- Visualizing uncertainty in predictions
- Real-world applications of BNNs
By incorporating uncertainty into your models, you can build more robust and reliable AI systems for critical applications where understanding confidence is essential.
Further Resources and Exercises
Additional Resources
- Bayesian Methods for Machine Learning on Coursera
- Weight Uncertainty in Neural Networks paper by Blundell et al.
- PyTorch Bayesian Layers Documentation from Pyro
Exercises
- Basic Exercise: Implement a Bayesian Neural Network for the MNIST dataset and compare its uncertainty on in-distribution vs. out-of-distribution samples.
- Intermediate Exercise: Create a BNN for regression that captures both aleatoric (data) and epistemic (model) uncertainty.
- Advanced Exercise: Build an active learning system using your BNN that selects which data points to label based on uncertainty.
- Challenge: Implement a Bayesian Convolutional Neural Network and visualize uncertainty in image segmentation tasks.
Happy learning, and remember that uncertainty is not a bug but a feature of good machine learning systems!