PyTorch Jacobian
Introduction
The Jacobian matrix is a fundamental concept in multivariate calculus that plays a crucial role in deep learning and optimization algorithms. In PyTorch, computing the Jacobian matrix is an important operation when working with vector-valued functions or when you need to understand how multiple outputs change with respect to multiple inputs.
This tutorial will guide you through the process of understanding and computing Jacobian matrices in PyTorch using its autograd functionality. We'll start with the basic concepts and gradually move toward more practical applications.
What is a Jacobian Matrix?
Before diving into code, let's understand what a Jacobian matrix is:
The Jacobian matrix represents all first-order partial derivatives of a vector-valued function. If you have a function F that maps from Rⁿ to Rᵐ (takes n inputs and produces m outputs), the Jacobian is an m×n matrix where each element (i,j) is the partial derivative of the i-th output with respect to the j-th input.
Mathematically, for a function F(x) = [F₁(x), F₂(x), ..., Fₘ(x)] where x = [x₁, x₂, ..., xₙ], the Jacobian J is:
J = [
[∂F₁/∂x₁, ∂F₁/∂x₂, ..., ∂F₁/∂xₙ],
[∂F₂/∂x₁, ∂F₂/∂x₂, ..., ∂F₂/∂xₙ],
...
[∂Fₘ/∂x₁, ∂Fₘ/∂x₂, ..., ∂Fₘ/∂xₙ]
]
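For example, for F(x₁, x₂) = [x₁²·x₂, sin(x₁)], the Jacobian is the 2×2 matrix
J = [
[2x₁x₂, x₁²],
[cos(x₁), 0]
]
Each row corresponds to one output and each column to one input.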
Computing Jacobian in PyTorch
PyTorch doesn't attach a jacobian() method to tensors themselves, but its autograd machinery gives us several ways to compute the Jacobian matrix:
Method 1: Manual Computation with grad
The most basic approach is to compute the gradient of each output with respect to all inputs:
import torch
def compute_jacobian_manually(inputs, outputs):
"""
Compute the Jacobian matrix manually.
Args:
inputs: Input tensor that requires gradients
outputs: Output tensor
Returns:
Jacobian matrix of shape (outputs_size, inputs_size)
"""
jacobian = []
for i in range(outputs.size(0)):
# Zero all previous gradients
if inputs.grad is not None:
inputs.grad.zero_()
# Compute gradient of the i-th output with respect to inputs
grad_output = torch.zeros_like(outputs)
grad_output[i] = 1.0
outputs.backward(grad_output, retain_graph=True)
# Store the gradients
jacobian.append(inputs.grad.clone())
# Stack all gradients to form the Jacobian matrix
return torch.stack(jacobian)
# Example usage
x = torch.randn(3, requires_grad=True) # Input vector of size 3
y = torch.stack([x[0]**2, x[1]**3, x[2]**4])  # Output vector of size 3; torch.stack keeps the autograd graph (torch.tensor([...]) would detach it)
jacobian = compute_jacobian_manually(x, y)
print("Input x:", x)
print("Output y:", y)
print("Jacobian matrix:")
print(jacobian)
Example Output:
Input x: tensor([-0.5366, 0.9893, -0.2870], requires_grad=True)
Output y: tensor([ 0.2879, 0.9682, 0.0068])
Jacobian matrix:
tensor([[-1.0731, 0.0000, 0.0000],
[ 0.0000, 2.9362, 0.0000],
[ 0.0000, 0.0000, -0.0946]])
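Note that this manual approach needs one backward pass per output element, so it becomes expensive for functions with many outputs. It is still a useful way to see exactly what the convenience functions below are doing.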
Method 2: Using torch.autograd.functional.jacobian
Since PyTorch 1.5.0, a more convenient way to compute the Jacobian is the torch.autograd.functional.jacobian function:
import torch
from torch.autograd.functional import jacobian
# Define a vector-valued function
def f(x):
    return torch.stack([x[0]**2, x[1]**3, x[2]**4])  # torch.stack keeps the output connected to x for autograd
# Create an input tensor
x = torch.randn(3)
print("Input x:", x)
# Compute the Jacobian matrix
j = jacobian(f, x)
print("Jacobian matrix using torch.autograd.functional.jacobian:")
print(j)
Example Output:
Input x: tensor([0.8579, 0.3352, 0.6698])
Jacobian matrix using torch.autograd.functional.jacobian:
tensor([[1.7158, 0.0000, 0.0000],
[0.0000, 0.3371, 0.0000],
[0.0000, 0.0000, 1.2020]])
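The functional API is more flexible than this example shows. If you pass a tuple of input tensors, it returns one Jacobian per input, and newer PyTorch versions (roughly 1.8 onward) also accept a vectorize=True flag that batches the underlying backward passes. The function g below is just an illustrative sketch, not part of the example above:
import torch
from torch.autograd.functional import jacobian
# A function of two separate tensor arguments (hypothetical example)
def g(a, b):
    return a * b + torch.sin(a)
a = torch.randn(3)
b = torch.randn(3)
# A tuple of inputs yields a tuple of Jacobians, one per argument
ja, jb = jacobian(g, (a, b))
print(ja.shape, jb.shape)  # torch.Size([3, 3]) torch.Size([3, 3])
# vectorize=True batches the backward passes, trading memory for speed
ja_fast, jb_fast = jacobian(g, (a, b), vectorize=True)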
Method 3: Using torch.autograd.grad Directly
For more complex scenarios, for example when the Jacobian itself must remain differentiable (create_graph=True) or when you need finer control over each backward pass, we can call torch.autograd.grad directly:
import torch
def compute_jacobian_with_grad(outputs, inputs):
"""
Compute Jacobian matrix using torch.autograd.grad.
Args:
outputs: Output tensor
inputs: Input tensor that requires gradients
Returns:
Jacobian matrix
"""
jacobian = []
for i in range(outputs.size(0)):
# Get gradients of the i-th output with respect to inputs
gradient = torch.autograd.grad(
outputs=outputs[i],
inputs=inputs,
retain_graph=True,
create_graph=True,
allow_unused=True
)[0]
jacobian.append(gradient)
return torch.stack(jacobian)
# Create input with gradients enabled
x = torch.randn(3, requires_grad=True)
# Define a vector-valued function
y = torch.stack([
    torch.sin(x[0]),
    torch.cos(x[1]),
    torch.exp(x[2])
])  # torch.stack keeps y connected to x; torch.tensor([...]) would detach it
# Compute Jacobian
jacobian = compute_jacobian_with_grad(y, x)
print("Input x:", x)
print("Output y:", y)
print("Jacobian matrix:")
print(jacobian)
Example Output:
Input x: tensor([ 0.6028, -0.0233, 0.4080], requires_grad=True)
Output y: tensor([0.5670, 0.9997, 1.5039])
Jacobian matrix:
tensor([[ 0.8238, 0.0000, 0.0000],
[ 0.0000, 0.0233, 0.0000],
[ 0.0000, 0.0000, 1.5039]])
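If you are on PyTorch 2.0 or newer, the torch.func module (previously shipped separately as functorch) offers yet another route: jacrev builds a Jacobian function via reverse-mode autodiff, and vmap maps it over a batch to get per-sample Jacobians. A minimal sketch, assuming PyTorch 2.0+:
import torch
from torch.func import jacrev, vmap
def f(x):
    return torch.stack([x[0]**2, x[1]**3, x[2]**4])
# Jacobian at a single point
x = torch.randn(3)
j_single = jacrev(f)(x)           # shape (3, 3)
# Per-sample Jacobians for a whole batch in one call
batch = torch.randn(16, 3)
j_batch = vmap(jacrev(f))(batch)  # shape (16, 3, 3)
print(j_single.shape, j_batch.shape)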
Real-World Applications
1. Neural Network Sensitivity Analysis
The Jacobian can help us understand how sensitive each output of a neural network is to each input:
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian
# Define a simple neural network
class SimpleNN(nn.Module):
def __init__(self):
super(SimpleNN, self).__init__()
self.linear1 = nn.Linear(2, 3)
self.activation = nn.ReLU()
self.linear2 = nn.Linear(3, 2)
def forward(self, x):
x = self.linear1(x)
x = self.activation(x)
x = self.linear2(x)
return x
# Initialize the model and a sample input
model = SimpleNN()
x = torch.tensor([1.0, 2.0], requires_grad=True)
# Get Jacobian to analyze sensitivity
j = jacobian(model, x)
print("Model input:", x)
print("Jacobian matrix for the model:")
print(j)
print("\nSensitivity analysis: The magnitude of each element shows")
print("how much each output changes when each input changes.")
print("Maximum sensitivity:", torch.abs(j).max().item())
2. Inverse Kinematics
In robotics and computer graphics, the Jacobian is used for inverse kinematics calculations:
import torch
from torch.autograd.functional import jacobian
# Define a 2-joint robotic arm forward kinematics
def arm_forward(joint_angles):
l1, l2 = 1.0, 0.8 # Arm segment lengths
theta1, theta2 = joint_angles
# Position of the first joint
x1 = l1 * torch.cos(theta1)
y1 = l1 * torch.sin(theta1)
# Position of the end effector (second joint)
x2 = x1 + l2 * torch.cos(theta1 + theta2)
y2 = y1 + l2 * torch.sin(theta1 + theta2)
    return torch.stack([x2, y2])  # stack (not torch.tensor) so autograd can trace back to joint_angles
# Initial joint angles (in radians)
joint_angles = torch.tensor([0.5, 0.5], requires_grad=True)
# Compute end effector position
end_position = arm_forward(joint_angles)
# Compute Jacobian
j = jacobian(arm_forward, joint_angles)
print("Joint angles:", joint_angles)
print("End effector position:", end_position)
print("Jacobian matrix:")
print(j)
# Use Jacobian for inverse kinematics (simplified example)
target_position = torch.tensor([1.5, 0.5])
current_position = arm_forward(joint_angles)
# Error between current and target positions
position_error = target_position - current_position
# Pseudo-inverse of the Jacobian for inverse kinematics
j_pinv = torch.pinverse(j)
# Calculate joint angle adjustments
delta_theta = j_pinv @ position_error
print("\nTarget position:", target_position)
print("Position error:", position_error)
print("Suggested joint angle adjustments:", delta_theta)
print("New joint angles:", joint_angles + delta_theta)
3. Optimization Problems
Computing the Jacobian is crucial in optimization algorithms such as Newton's and Gauss-Newton methods. The example below takes damped Gauss-Newton-style steps, using the pseudo-inverse of the Jacobian to drive a vector of residuals toward its least-squares minimum:
import torch
from torch.autograd.functional import jacobian
import matplotlib.pyplot as plt
# Define a function to minimize
def function_to_minimize(x):
    return torch.stack([
        (x[0] - 2)**2 + (x[1] - 3)**2,  # First component
        (x[0] - 1)**2 + (x[1] - 1)**2   # Second component
    ])  # torch.stack keeps the graph so jacobian() sees nonzero derivatives
# Initial guess
x = torch.tensor([0.0, 0.0], requires_grad=True)
# Newton's method implementation
learning_rate = 0.1
iterations = 10
trajectory = [x.detach().clone()]
for i in range(iterations):
# Compute function value
f_value = function_to_minimize(x)
# Compute Jacobian
j = jacobian(function_to_minimize, x)
    # Damped Gauss-Newton-style update using the Jacobian pseudo-inverse
with torch.no_grad():
x -= learning_rate * torch.mm(torch.pinverse(j), f_value.unsqueeze(1)).squeeze()
# Store trajectory
    trajectory.append(x.detach().clone())
print(f"Iteration {i+1}, x = {x.tolist()}, f(x) = {function_to_minimize(x).tolist()}")
# Plot optimization trajectory
trajectory = torch.stack(trajectory)
plt.figure(figsize=(10, 6))
plt.plot(trajectory[:, 0], trajectory[:, 1], 'o-', markersize=8)
plt.plot(1.5, 2.0, 'r*', markersize=15) # Optimal point
plt.xlabel('x[0]')
plt.ylabel('x[1]')
plt.title("Optimization Trajectory with Newton's Method")
plt.grid(True)
plt.show()
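For comparison, you could also minimize the scalar sum of squared residuals with an ordinary gradient-based optimizer and let autograd supply the derivatives, instead of forming the Jacobian explicitly. A minimal sketch (the learning rate and iteration count are arbitrary choices):
import torch
def residuals(x):
    return torch.stack([
        (x[0] - 2)**2 + (x[1] - 3)**2,
        (x[0] - 1)**2 + (x[1] - 1)**2
    ])
x = torch.tensor([0.0, 0.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.01)
for _ in range(200):
    optimizer.zero_grad()
    loss = residuals(x).pow(2).sum()  # ||F(x)||^2
    loss.backward()
    optimizer.step()
print("Gradient-descent solution:", x.detach())  # should land near (1.5, 2.0)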
Summary
In this tutorial, you've learned:
- What a Jacobian matrix is and why it's important in machine learning and optimization
- Multiple ways to compute the Jacobian in PyTorch:
- Manually using multiple backward passes
- Using torch.autograd.functional.jacobian
- Using torch.autograd.grad for more complex cases
- Real-world applications of Jacobian matrices:
- Neural network sensitivity analysis
- Inverse kinematics for robotics
- Optimization algorithms like Newton's method
The Jacobian matrix provides crucial information about how a function's outputs change with respect to its inputs, making it an essential tool for many machine learning and optimization tasks.
Additional Resources
- PyTorch Autograd Documentation
- torch.autograd.functional API
- Mathematics for Machine Learning - A good resource for understanding the mathematical concepts behind Jacobians and their applications
Exercises
- Compute the Jacobian of a simple neural network with respect to its weights instead of its inputs.
- Implement a custom optimizer that uses the Jacobian to determine step directions.
- Write a function that visualizes the Jacobian matrix as a heatmap to understand which inputs most strongly affect which outputs.
- Extend the inverse kinematics example to a 3-joint robotic arm and solve for the joint angles required to reach a specific target position.
- Use the Jacobian to analyze the sensitivity of a pre-trained image classification model to various input perturbations.