PyTorch Bayesian Neural Networks
Introduction
Traditional neural networks provide point estimates of predictions without quantifying uncertainty. Bayesian Neural Networks (BNNs) solve this limitation by treating model weights as probability distributions rather than fixed values. This approach enables us to quantify uncertainty in predictions, which is crucial for applications where decision-making relies on model confidence, such as medical diagnostics, autonomous driving, and financial modeling.
In this tutorial, you'll learn:
- The fundamentals of Bayesian Neural Networks
- How to implement BNNs using PyTorch
- Ways to visualize uncertainty in predictions
- Practical applications of BNNs
Prerequisites
To follow this tutorial, you should have:
- Basic understanding of PyTorch and neural networks
- Familiarity with probability concepts (Bayesian inference)
- Python environment with PyTorch installed
If you need to install PyTorch:
pip install torch torchvision
Understanding Bayesian Neural Networks
Traditional Neural Networks vs. Bayesian Neural Networks
In traditional neural networks, weights are treated as fixed parameters determined during training. In contrast, Bayesian Neural Networks treat weights as random variables with probability distributions.
Here's the key difference:
- Traditional NN: output = f(input; w), with a single learned weight vector w
- Bayesian NN: w ~ p(w | data), and the prediction is itself a distribution, obtained by averaging f(input; w) over sampled weights (a minimal sketch of this idea follows the list of advantages below)
This probabilistic approach offers several advantages:
- Uncertainty quantification in predictions
- Automatic regularization
- Robustness to overfitting
- Better handling of small datasets
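To make this concrete, here is a toy sketch, assuming a single scalar weight and a linear "model" (values chosen arbitrarily), of how a distribution over weights induces a distribution over predictions:

import torch

# One scalar weight with an assumed posterior N(w_mu, w_sigma^2) instead of a point value
torch.manual_seed(0)
w_mu, w_sigma = 2.0, 0.5
x = 3.0

w_samples = w_mu + w_sigma * torch.randn(1000)   # sample many plausible weights
predictions = w_samples * x                      # f(x; w) = w * x for each sampled weight
print(predictions.mean().item(), predictions.std().item())   # predictive mean and spread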
Implementing Bayesian Neural Networks in PyTorch
We'll implement BNNs in PyTorch using two approaches: first a lightweight approximation based on Monte Carlo dropout, then a layer that places explicit probability distributions over its weights.
Approach 1: Using Bayesian Layers with MC Dropout
Monte Carlo Dropout is the simplest way to approximate a Bayesian Neural Network:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MCDropout(nn.Module):
    def __init__(self, p=0.5):
        super(MCDropout, self).__init__()
        self.p = p

    def forward(self, x):
        # Keep dropout active even at inference time (training=True), so
        # repeated forward passes give different stochastic outputs
        return F.dropout(x, p=self.p, training=True)

class BayesianNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout_prob=0.5):
        super(BayesianNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.dropout1 = MCDropout(dropout_prob)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.dropout2 = MCDropout(dropout_prob)
        self.fc3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x
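Because MCDropout keeps dropout active during inference, you can approximate the predictive distribution with repeated stochastic forward passes. A minimal sketch (the sizes and number of passes below are arbitrary choices for illustration):

# Hypothetical usage: a small MC-dropout network and a dummy input batch
mc_model = BayesianNN(input_size=10, hidden_size=32, output_size=1)
x = torch.randn(4, 10)

with torch.no_grad():
    # Each forward pass applies a different dropout mask
    preds = torch.stack([mc_model(x) for _ in range(100)])   # shape: (100, 4, 1)

mean_pred = preds.mean(dim=0)   # predictive mean
std_pred = preds.std(dim=0)     # predictive uncertainty per output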
Approach 2: Implementing a Bayesian Layer with Weight Distributions
For a more formal Bayesian treatment, we can implement layers with explicit weight distributions:
class BayesianLinear(nn.Module):
    def __init__(self, in_features, out_features, prior_sigma_1=1.0, prior_sigma_2=0.1, prior_pi=0.5):
        super(BayesianLinear, self).__init__()
        # Define parameters
        self.in_features = in_features
        self.out_features = out_features
        # Weight mean parameters
        self.weight_mu = nn.Parameter(torch.Tensor(out_features, in_features).normal_(0, 0.1))
        # Weight rho parameters (sigma = softplus(rho) keeps the std positive)
        self.weight_rho = nn.Parameter(torch.Tensor(out_features, in_features).uniform_(-3, -2))
        # Bias mean parameters
        self.bias_mu = nn.Parameter(torch.Tensor(out_features).normal_(0, 0.1))
        # Bias rho parameters (for variance)
        self.bias_rho = nn.Parameter(torch.Tensor(out_features).uniform_(-3, -2))
        # Scale-mixture prior hyperparameters
        self.prior_sigma_1 = prior_sigma_1
        self.prior_sigma_2 = prior_sigma_2
        self.prior_pi = prior_pi
        # Log probabilities, filled in on each forward pass
        self.log_variational_posterior = 0.0
        self.log_prior = 0.0

    def forward(self, x):
        # Sample weights from the variational posterior (reparameterization trick)
        weight_epsilon = torch.randn_like(self.weight_mu)
        weight_sigma = torch.log1p(torch.exp(self.weight_rho))
        weight = self.weight_mu + weight_epsilon * weight_sigma
        # Sample bias from the variational posterior
        bias_epsilon = torch.randn_like(self.bias_mu)
        bias_sigma = torch.log1p(torch.exp(self.bias_rho))
        bias = self.bias_mu + bias_epsilon * bias_sigma
        # Record log prior / log posterior of the sampled weights (used for the KL term)
        self._calculate_kl(weight, bias, weight_sigma, bias_sigma)
        # Linear transformation
        return F.linear(x, weight, bias)

    def _calculate_kl(self, weight, bias, weight_sigma, bias_sigma):
        # Sum the element-wise log probabilities so each term is a scalar
        weight_log_posterior = log_gaussian(weight, self.weight_mu, weight_sigma).sum()
        weight_log_prior = log_gaussian_mixture(weight, 0.0, self.prior_sigma_1, self.prior_sigma_2, self.prior_pi).sum()
        bias_log_posterior = log_gaussian(bias, self.bias_mu, bias_sigma).sum()
        bias_log_prior = log_gaussian_mixture(bias, 0.0, self.prior_sigma_1, self.prior_sigma_2, self.prior_pi).sum()
        self.log_variational_posterior = weight_log_posterior + bias_log_posterior
        self.log_prior = weight_log_prior + bias_log_prior
import math

# Helper functions for log probabilities
def log_gaussian(x, mu, sigma):
    # Element-wise log density of a Gaussian; sigma may be a tensor or a Python float
    sigma = torch.as_tensor(sigma, dtype=x.dtype)
    return -0.5 * math.log(2 * math.pi) - torch.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_gaussian_mixture(x, mu, sigma1, sigma2, pi):
    # Log density of a two-component (scale mixture) Gaussian prior
    first_gaussian = torch.exp(log_gaussian(x, mu, sigma1)) * pi
    second_gaussian = torch.exp(log_gaussian(x, mu, sigma2)) * (1 - pi)
    return torch.log(first_gaussian + second_gaussian)
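Since the layer draws fresh weights on every call, two forward passes over the same input will generally differ; a quick sanity check (layer sizes chosen arbitrarily):

layer = BayesianLinear(in_features=5, out_features=3)
x = torch.randn(2, 5)
out1 = layer(x)
out2 = layer(x)
print(torch.allclose(out1, out2))   # usually False: new weights were sampled each time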
Training a Bayesian Neural Network
Training a BNN uses variational inference: instead of minimizing a plain data loss, we maximize the evidence lower bound (ELBO), which trades off how well the sampled network fits the data against how far the variational posterior q(w) drifts from the prior p(w). Equivalently, we minimize loss = -E_q[log p(D | w)] + KL(q(w) || p(w)), estimating both terms with Monte Carlo samples of the weights:
# Full BNN model using our Bayesian layers
class BNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(BNN, self).__init__()
        self.layer1 = BayesianLinear(input_size, hidden_size)
        self.layer2 = BayesianLinear(hidden_size, hidden_size)
        self.layer3 = BayesianLinear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.layer3(x)
        return x

    def log_prior(self):
        return self.layer1.log_prior + self.layer2.log_prior + self.layer3.log_prior

    def log_variational_posterior(self):
        return (self.layer1.log_variational_posterior
                + self.layer2.log_variational_posterior
                + self.layer3.log_variational_posterior)

    def sample_elbo(self, input, target, samples=1):
        outputs = torch.zeros(samples, target.shape[0], target.shape[1])
        log_priors = torch.zeros(samples)
        log_variational_posteriors = torch.zeros(samples)
        for i in range(samples):
            outputs[i] = self(input)
            log_priors[i] = self.log_prior()
            log_variational_posteriors[i] = self.log_variational_posterior()
        # Gaussian log-likelihood (up to a constant) is the negative sum-of-squares error
        log_likelihood = -F.mse_loss(outputs.mean(0), target, reduction='sum')
        # ELBO = log-likelihood minus the Monte Carlo estimate of KL(q || p)
        elbo = log_likelihood - (log_variational_posteriors - log_priors).mean()
        return elbo
Here's how to train the model:
import torch.optim as optim
import numpy as np

# Generate some dummy data (random targets, just to exercise the training loop)
np.random.seed(42)
X = torch.FloatTensor(np.random.normal(0, 1, (100, 10)))
y_true = torch.FloatTensor(np.random.normal(0, 1, (100, 1)))

# Initialize model
model = BNN(input_size=10, hidden_size=20, output_size=1)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(500):
    optimizer.zero_grad()
    # Minimize the negative ELBO
    loss = -model.sample_elbo(X, y_true, samples=3)
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")
# Output (example):
# Epoch 0, Loss: -623.4569702148438
# Epoch 100, Loss: -726.1539916992188
# Epoch 200, Loss: -782.8802490234375
# Epoch 300, Loss: -812.89306640625
# Epoch 400, Loss: -834.6373291015625
Making Predictions with Uncertainty Estimates
Once your BNN is trained, you can generate predictions with uncertainty estimates:
def predict_with_uncertainty(model, input_data, num_samples=100):
    predictions = []
    model.eval()  # Evaluation mode; weight sampling (or MC dropout) stays stochastic
    with torch.no_grad():
        for _ in range(num_samples):
            prediction = model(input_data)
            predictions.append(prediction)
    # Stack predictions into a tensor of shape (num_samples, batch, output)
    predictions = torch.stack(predictions)
    # Calculate mean and standard deviation across the sampled predictions
    mean_prediction = predictions.mean(dim=0)
    std_prediction = predictions.std(dim=0)
    return mean_prediction, std_prediction
# Generate a test point
x_test = torch.FloatTensor(np.random.normal(0, 1, (1, 10)))
# Predict with uncertainty
mean, std = predict_with_uncertainty(model, x_test)
print(f"Mean prediction: {mean.item():.4f}")
print(f"Standard deviation (uncertainty): {std.item():.4f}")
# Output (example):
# Mean prediction: 0.1245
# Standard deviation (uncertainty): 0.3281
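Epistemic uncertainty should generally be larger for inputs unlike the training data. As a rough check (exact numbers will vary, and the effect may be modest with the random training targets used above), compare an in-distribution point with one far outside the training range:

# In-distribution input: drawn from the same N(0, 1) as the training features
x_in = torch.FloatTensor(np.random.normal(0, 1, (1, 10)))
# Out-of-distribution input: far from anything seen during training
x_out = torch.FloatTensor(np.random.normal(10, 1, (1, 10)))

_, std_in = predict_with_uncertainty(model, x_in)
_, std_out = predict_with_uncertainty(model, x_out)
print(f"Uncertainty (in-distribution): {std_in.item():.4f}")
print(f"Uncertainty (out-of-distribution): {std_out.item():.4f}")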
Visualizing Uncertainty
Let's visualize how our BNN's predictions change for different input values:
import matplotlib.pyplot as plt
# Generate test data points (1D for easy visualization)
x = torch.linspace(-3, 3, 100).reshape(-1, 1)
# Simple 1D BNN for visualization
simple_bnn = BNN(input_size=1, hidden_size=20, output_size=1)
optimizer = optim.Adam(simple_bnn.parameters(), lr=0.01)
# Generate training data (sine function with noise)
x_train = torch.linspace(-3, 3, 50).reshape(-1, 1)
y_train = torch.sin(x_train) + 0.1 * torch.randn(x_train.size())
# Train for a few epochs
for epoch in range(1000):
    optimizer.zero_grad()
    loss = -simple_bnn.sample_elbo(x_train, y_train, samples=3)
    loss.backward()
    optimizer.step()
# Generate predictions with uncertainty
means = []
stds = []
for i in range(len(x)):
    mean, std = predict_with_uncertainty(simple_bnn, x[i:i+1], num_samples=100)
    means.append(mean.item())
    stds.append(std.item())
# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(x.numpy(), means, 'b-', label='Prediction')
plt.fill_between(x.numpy().flatten(),
                 np.array(means) - 2 * np.array(stds),
                 np.array(means) + 2 * np.array(stds),
                 alpha=0.2, color='b', label='Uncertainty (2σ)')
plt.plot(x_train.numpy(), y_train.numpy(), 'ro', label='Training Data')
plt.plot(x.numpy(), np.sin(x.numpy()), 'g--', label='True Function')
plt.legend()
plt.title('BNN Prediction with Uncertainty')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
Practical Applications
1. Medical Image Classification with Uncertainty
When diagnosing diseases from medical images, uncertainty quantification is critical:
# Example of calling a model for medical image diagnosis with uncertainty
def diagnose_with_confidence(image, model, threshold=0.7):
    mean_pred, std_pred = predict_with_uncertainty(model, image, num_samples=50)
    # Convert the mean logit to a probability
    prob = torch.sigmoid(mean_pred)
    # Decision with certainty measure
    if prob > 0.5:
        diagnosis = "Positive"
        confidence = prob.item()
    else:
        diagnosis = "Negative"
        confidence = (1 - prob).item()
    # Flag for seeking an additional opinion
    needs_review = std_pred.item() > threshold or confidence < 0.8
    return {
        "diagnosis": diagnosis,
        "confidence": confidence,
        "uncertainty": std_pred.item(),
        "needs_expert_review": needs_review
    }
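A hypothetical call, reusing the regression BNN from above as a stand-in for a real image classifier (in practice `image` would be preprocessed features from a medical scan, and the model would be trained on labeled diagnoses):

# Stand-in for preprocessed image features with the model's expected input size
image = torch.FloatTensor(np.random.normal(0, 1, (1, 10)))
result = diagnose_with_confidence(image, model)
print(result)   # dict with diagnosis, confidence, uncertainty, and a review flag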
2. Active Learning with BNNs
Bayesian Neural Networks are excellent for active learning, where models request labels for the most informative examples:
def select_samples_for_labeling(unlabeled_pool, model, n_samples=10):
    """Select the most uncertain samples for labeling"""
    uncertainties = []
    for sample in unlabeled_pool:
        _, std = predict_with_uncertainty(model, sample.unsqueeze(0))
        uncertainties.append(std.item())
    # Find indices of the n most uncertain samples
    indices = np.argsort(uncertainties)[-n_samples:]
    return indices
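A sketch of one active-learning round under assumed names (`unlabeled_pool` is a tensor of candidate inputs; the labeling step is left as a comment because it depends on your data source):

# Hypothetical pool of unlabeled candidates with the model's input size
unlabeled_pool = torch.FloatTensor(np.random.normal(0, 1, (200, 10)))

query_indices = select_samples_for_labeling(unlabeled_pool, model, n_samples=10)
queried = unlabeled_pool[torch.as_tensor(query_indices)]
# labels = obtain_labels(queried)   # hypothetical labeling step (human or oracle)
# ...then add (queried, labels) to the training set and retrain the BNN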
3. Reinforcement Learning with Risk Assessment
In reinforcement learning with high stakes (like autonomous driving), BNNs help avoid risky actions:
def safe_action_selection(state, model, risk_tolerance=0.1):
    """Choose actions considering both expected reward and uncertainty"""
    actions = torch.tensor([[0.0], [0.5], [1.0]])  # Possible actions
    means = []
    stds = []
    # Evaluate each action
    for action in actions:
        state_action = torch.cat([state, action.unsqueeze(0)], dim=1)
        mean, std = predict_with_uncertainty(model, state_action)
        means.append(mean.item())
        stds.append(std.item())
    # Risk-adjusted score: penalize uncertain actions (a pessimistic, lower confidence bound)
    risk_adjusted_scores = np.array(means) - risk_tolerance * np.array(stds)
    # Choose the action with the highest risk-adjusted score
    best_action_idx = np.argmax(risk_adjusted_scores)
    return actions[best_action_idx], risk_adjusted_scores[best_action_idx]
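A hypothetical call, again reusing the 10-input BNN from earlier so that a 9-dimensional state plus a 1-dimensional action matches its input size (in a real setting the model would be a trained value or reward model):

state = torch.FloatTensor(np.random.normal(0, 1, (1, 9)))   # 9 state features + 1 action = 10 inputs
action, score = safe_action_selection(state, model)
print(f"Chosen action: {action.item():.1f}, risk-adjusted score: {score:.4f}")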
Summary
Bayesian Neural Networks provide a powerful framework for uncertainty quantification in deep learning. In this tutorial, you learned:
- The fundamental difference between traditional and Bayesian neural networks
- How to implement Bayesian layers in PyTorch
- Training strategies for BNNs using variational inference
- Making predictions with uncertainty estimates
- Visualizing uncertainty in predictions
- Real-world applications of BNNs
By incorporating uncertainty into your models, you can build more robust and reliable AI systems for critical applications where understanding confidence is essential.
Further Resources and Exercises
Additional Resources
- Bayesian Methods for Machine Learning on Coursera
- Weight Uncertainty in Neural Networks paper by Blundell et al.
- PyTorch Bayesian Layers Documentation from Pyro
Exercises
- Basic Exercise: Implement a Bayesian Neural Network for the MNIST dataset and compare its uncertainty on in-distribution vs. out-of-distribution samples.
- Intermediate Exercise: Create a BNN for regression that captures both aleatoric (data) and epistemic (model) uncertainty.
- Advanced Exercise: Build an active learning system using your BNN that selects which data points to label based on uncertainty.
- Challenge: Implement a Bayesian Convolutional Neural Network and visualize uncertainty in image segmentation tasks.
Happy learning, and remember that uncertainty is not a bug but a feature of good machine learning systems!