PyTorch Jacobian
Introduction
The Jacobian matrix is a fundamental concept in multivariate calculus that plays a crucial role in deep learning and optimization algorithms. In PyTorch, computing the Jacobian matrix is an important operation when working with vector-valued functions or when you need to understand how multiple outputs change with respect to multiple inputs.
This tutorial will guide you through the process of understanding and computing Jacobian matrices in PyTorch using its autograd functionality. We'll start with the basic concepts and gradually move toward more practical applications.
What is a Jacobian Matrix?
Before diving into code, let's understand what a Jacobian matrix is:
The Jacobian matrix represents all first-order partial derivatives of a vector-valued function. If you have a function F that maps from Rⁿ to Rᵐ (takes n inputs and produces m outputs), the Jacobian is an m×n matrix where each element (i,j) is the partial derivative of the i-th output with respect to the j-th input.
Mathematically, for a function F(x) = [F₁(x), F₂(x), ..., Fₘ(x)] where x = [x₁, x₂, ..., xₙ], the Jacobian J is:
J = [
[∂F₁/∂x₁, ∂F₁/∂x₂, ..., ∂F₁/∂xₙ],
[∂F₂/∂x₁, ∂F₂/∂x₂, ..., ∂F₂/∂xₙ],
...
[∂Fₘ/∂x₁, ∂Fₘ/∂x₂, ..., ∂Fₘ/∂xₙ]
]
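For example, for F(x₁, x₂) = [x₁²·x₂, sin(x₁)], the Jacobian is the 2×2 matrix
J = [
[2x₁x₂, x₁²],
[cos(x₁), 0]
]
Each row corresponds to one output and each column to one input.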
Computing Jacobian in PyTorch
PyTorch doesn't attach a jacobian() method to tensors themselves, but its autograd machinery gives us several ways to compute the Jacobian matrix:
Method 1: Manual Computation with grad
The most basic approach is to compute the gradient of each output with respect to all inputs:
import torch
def compute_jacobian_manually(inputs, outputs):
"""
Compute the Jacobian matrix manually.
Args:
inputs: Input tensor that requires gradients
outputs: Output tensor
Returns:
Jacobian matrix of shape (outputs_size, inputs_size)
"""
jacobian = []
for i in range(outputs.size(0)):
# Zero all previous gradients
if inputs.grad is not None:
inputs.grad.zero_()
# Compute gradient of the i-th output with respect to inputs
grad_output = torch.zeros_like(outputs)
grad_output[i] = 1.0
outputs.backward(grad_output, retain_graph=True)
# Store the gradients
jacobian.append(inputs.grad.clone())
# Stack all gradients to form the Jacobian matrix
return torch.stack(jacobian)
# Example usage
x = torch.randn(3, requires_grad=True) # Input vector of size 3
y = torch.stack([x[0]**2, x[1]**3, x[2]**4])  # Output vector of size 3; torch.stack keeps the autograd graph (torch.tensor([...]) would detach it)
jacobian = compute_jacobian_manually(x, y)
print("Input x:", x)
print("Output y:", y)
print("Jacobian matrix:")
print(jacobian)
Example Output:
Input x: tensor([-0.5366, 0.9893, -0.2870], requires_grad=True)
Output y: tensor([ 0.2879, 0.9682, 0.0068])
Jacobian matrix:
tensor([[-1.0731, 0.0000, 0.0000],
[ 0.0000, 2.9362, 0.0000],
[ 0.0000, 0.0000, -0.0946]])
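Note that this manual approach needs one backward pass per output element, so it becomes expensive for functions with many outputs. It is still a useful way to see exactly what the convenience functions below are doing.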
Method 2: Using torch.autograd.functional.jacobian
Since PyTorch 1.5.0, a more convenient way to compute the Jacobian is the torch.autograd.functional.jacobian function:
import torch
from torch.autograd.functional import jacobian
# Define a vector-valued function
def f(x):
    return torch.stack([x[0]**2, x[1]**3, x[2]**4])  # torch.stack keeps the output connected to x for autograd
# Create an input tensor
x = torch.randn(3)
print("Input x:", x)
# Compute the Jacobian matrix
j = jacobian(f, x)
print("Jacobian matrix using torch.autograd.functional.jacobian:")
print(j)
Example Output:
Input x: tensor([0.8579, 0.3352, 0.6698])
Jacobian matrix using torch.autograd.functional.jacobian:
tensor([[1.7158, 0.0000, 0.0000],
[0.0000, 0.3371, 0.0000],
[0.0000, 0.0000, 1.2020]])
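The functional API is more flexible than this example shows. If you pass a tuple of input tensors, it returns one Jacobian per input, and newer PyTorch versions (roughly 1.8 onward) also accept a vectorize=True flag that batches the underlying backward passes. The function g below is just an illustrative sketch, not part of the example above:
import torch
from torch.autograd.functional import jacobian
# A function of two separate tensor arguments (hypothetical example)
def g(a, b):
    return a * b + torch.sin(a)
a = torch.randn(3)
b = torch.randn(3)
# A tuple of inputs yields a tuple of Jacobians, one per argument
ja, jb = jacobian(g, (a, b))
print(ja.shape, jb.shape)  # torch.Size([3, 3]) torch.Size([3, 3])
# vectorize=True batches the backward passes, trading memory for speed
ja_fast, jb_fast = jacobian(g, (a, b), vectorize=True)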
Method 3: Using torch.autograd.grad Directly
For more complex scenarios, for example when the Jacobian itself must remain differentiable (create_graph=True) or when you need finer control over each backward pass, we can call torch.autograd.grad directly:
import torch
def compute_jacobian_with_grad(outputs, inputs):
"""
Compute Jacobian matrix using torch.autograd.grad.
Args:
outputs: Output tensor
inputs: Input tensor that requires gradients
Returns:
Jacobian matrix
"""
jacobian = []
for i in range(outputs.size(0)):
# Get gradients of the i-th output with respect to inputs
gradient = torch.autograd.grad(
outputs=outputs[i],
inputs=inputs,
retain_graph=True,
create_graph=True,
allow_unused=True
)[0]
jacobian.append(gradient)
return torch.stack(jacobian)
# Create input with gradients enabled
x = torch.randn(3, requires_grad=True)
# Define a vector-valued function
y = torch.stack([
    torch.sin(x[0]),
    torch.cos(x[1]),
    torch.exp(x[2])
])  # torch.stack keeps y connected to x; torch.tensor([...]) would detach it
# Compute Jacobian
jacobian = compute_jacobian_with_grad(y, x)
print("Input x:", x)
print("Output y:", y)
print("Jacobian matrix:")
print(jacobian)
Example Output:
Input x: tensor([ 0.6028, -0.0233, 0.4080], requires_grad=True)
Output y: tensor([0.5670, 0.9997, 1.5039])
Jacobian matrix:
tensor([[ 0.8238, 0.0000, 0.0000],
[ 0.0000, 0.0233, 0.0000],
[ 0.0000, 0.0000, 1.5039]])
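If you are on PyTorch 2.0 or newer, the torch.func module (previously shipped separately as functorch) offers yet another route: jacrev builds a Jacobian function via reverse-mode autodiff, and vmap maps it over a batch to get per-sample Jacobians. A minimal sketch, assuming PyTorch 2.0+:
import torch
from torch.func import jacrev, vmap
def f(x):
    return torch.stack([x[0]**2, x[1]**3, x[2]**4])
# Jacobian at a single point
x = torch.randn(3)
j_single = jacrev(f)(x)           # shape (3, 3)
# Per-sample Jacobians for a whole batch in one call
batch = torch.randn(16, 3)
j_batch = vmap(jacrev(f))(batch)  # shape (16, 3, 3)
print(j_single.shape, j_batch.shape)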
Real-World Applications
1. Neural Network Sensitivity Analysis
The Jacobian can help us understand how sensitive each output of a neural network is to each input:
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian
# Define a simple neural network
class SimpleNN(nn.Module):
def __init__(self):
super(SimpleNN, self).__init__()
self.linear1 = nn.Linear(2, 3)
self.activation = nn.ReLU()
self.linear2 = nn.Linear(3, 2)
def forward(self, x):
x = self.linear1(x)
x = self.activation(x)
x = self.linear2(x)
return x
# Initialize the model and a sample input
model = SimpleNN()
x = torch.tensor([1.0, 2.0], requires_grad=True)
# Get Jacobian to analyze sensitivity
j = jacobian(model, x)
print("Model input:", x)
print("Jacobian matrix for the model:")
print(j)
print("\nSensitivity analysis: The magnitude of each element shows")
print("how much each output changes when each input changes.")
print("Maximum sensitivity:", torch.abs(j).max().item())
2. Inverse Kinematics
In robotics and computer graphics, the Jacobian is used for inverse kinematics calculations:
import torch
from torch.autograd.functional import jacobian
# Define a 2-joint robotic arm forward kinematics
def arm_forward(joint_angles):
l1, l2 = 1.0, 0.8 # Arm segment lengths
theta1, theta2 = joint_angles
# Position of the first joint
x1 = l1 * torch.cos(theta1)
y1 = l1 * torch.sin(theta1)
# Position of the end effector (second joint)
x2 = x1 + l2 * torch.cos(theta1 + theta2)
y2 = y1 + l2 * torch.sin(theta1 + theta2)
    return torch.stack([x2, y2])  # stack (not torch.tensor) so autograd can trace back to joint_angles
# Initial joint angles (in radians)
joint_angles = torch.tensor([0.5, 0.5], requires_grad=True)
# Compute end effector position
end_position = arm_forward(joint_angles)
# Compute Jacobian
j = jacobian(arm_forward, joint_angles)
print("Joint angles:", joint_angles)
print("End effector position:", end_position)
print("Jacobian matrix:")
print(j)
# Use Jacobian for inverse kinematics (simplified example)
target_position = torch.tensor([1.5, 0.5])
current_position = arm_forward(joint_angles)
# Error between current and target positions
position_error = target_position - current_position
# Pseudo-inverse of the Jacobian for inverse kinematics
j_pinv = torch.pinverse(j)
# Calculate joint angle adjustments
delta_theta = j_pinv @ position_error
print("\nTarget position:", target_position)
print("Position error:", position_error)
print("Suggested joint angle adjustments:", delta_theta)
print("New joint angles:", joint_angles + delta_theta)
3. Optimization Problems
Computing the Jacobian is crucial in optimization algorithms such as Newton's and Gauss-Newton methods. The example below takes damped Gauss-Newton-style steps, using the pseudo-inverse of the Jacobian to drive a vector of residuals toward its least-squares minimum:
import torch
from torch.autograd.functional import jacobian
import matplotlib.pyplot as plt
# Define a function to minimize
def function_to_minimize(x):
    return torch.stack([
        (x[0] - 2)**2 + (x[1] - 3)**2,  # First component
        (x[0] - 1)**2 + (x[1] - 1)**2   # Second component
    ])  # torch.stack keeps the graph so jacobian() sees nonzero derivatives
# Initial guess
x = torch.tensor([0.0, 0.0], requires_grad=True)
# Newton's method implementation
learning_rate = 0.1
iterations = 10
trajectory = [x.detach().clone()]
for i in range(iterations):
# Compute function value
f_value = function_to_minimize(x)
# Compute Jacobian
j = jacobian(function_to_minimize, x)
    # Damped Gauss-Newton-style update using the Jacobian pseudo-inverse
with torch.no_grad():
x -= learning_rate * torch.mm(torch.pinverse(j), f_value.unsqueeze(1)).squeeze()
# Store trajectory
    trajectory.append(x.detach().clone())
print(f"Iteration {i+1}, x = {x.tolist()}, f(x) = {function_to_minimize(x).tolist()}")
# Plot optimization trajectory
trajectory = torch.stack(trajectory)
plt.figure(figsize=(10, 6))
plt.plot(trajectory[:, 0], trajectory[:, 1], 'o-', markersize=8)
plt.plot(1.5, 2.0, 'r*', markersize=15) # Optimal point
plt.xlabel('x[0]')
plt.ylabel('x[1]')
plt.title("Optimization Trajectory with Newton's Method")
plt.grid(True)
plt.show()
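For comparison, you could also minimize the scalar sum of squared residuals with an ordinary gradient-based optimizer and let autograd supply the derivatives, instead of forming the Jacobian explicitly. A minimal sketch (the learning rate and iteration count are arbitrary choices):
import torch
def residuals(x):
    return torch.stack([
        (x[0] - 2)**2 + (x[1] - 3)**2,
        (x[0] - 1)**2 + (x[1] - 1)**2
    ])
x = torch.tensor([0.0, 0.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.01)
for _ in range(200):
    optimizer.zero_grad()
    loss = residuals(x).pow(2).sum()  # ||F(x)||^2
    loss.backward()
    optimizer.step()
print("Gradient-descent solution:", x.detach())  # should land near (1.5, 2.0)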
Summary
In this tutorial, you've learned:
- What a Jacobian matrix is and why it's important in machine learning and optimization
- Multiple ways to compute the Jacobian in PyTorch:
- Manually using multiple backward passes
- Using torch.autograd.functional.jacobian
- Using torch.autograd.grad for more complex cases
- Real-world applications of Jacobian matrices:
- Neural network sensitivity analysis
- Inverse kinematics for robotics
- Optimization algorithms like Newton's method
The Jacobian matrix provides crucial information about how a function's outputs change with respect to its inputs, making it an essential tool for many machine learning and optimization tasks.
Additional Resources
- PyTorch Autograd Documentation
- torch.autograd.functional API
- Mathematics for Machine Learning - A good resource for understanding the mathematical concepts behind Jacobians and their applications
Exercises
- Compute the Jacobian of a simple neural network with respect to its weights instead of its inputs.
- Implement a custom optimizer that uses the Jacobian to determine step directions.
- Write a function that visualizes the Jacobian matrix as a heatmap to understand which inputs most strongly affect which outputs.
- Extend the inverse kinematics example to a 3-joint robotic arm and solve for the joint angles required to reach a specific target position.
- Use the Jacobian to analyze the sensitivity of a pre-trained image classification model to various input perturbations.