Python Error Handling

Introduction

When you're writing Python code, especially for complex applications like those using PyTorch, things won't always go according to plan. Your program might encounter unexpected situations like missing files, network issues, invalid inputs, or memory limitations. Without proper error handling, these issues can cause your program to crash abruptly, leaving users confused and frustrated.

Error handling is a programming technique that anticipates potential problems and deals with them gracefully. In Python, this is primarily done through a mechanism called "exceptions." Rather than letting errors terminate your program, you can catch these exceptions and decide how to respond—whether that's displaying a helpful message, trying an alternative approach, or safely shutting down.

In this tutorial, we'll explore Python's exception handling system and learn how to make our PyTorch applications more robust and user-friendly.

Understanding Exceptions in Python

What are Exceptions?

In Python, exceptions are events that disrupt the normal flow of a program's instructions. When an error occurs during execution, Python creates an exception object. If this exception isn't handled, the program terminates and displays an error message.

Here's a simple example of an exception:

python
# This will cause a ZeroDivisionError
result = 10 / 0
print("This line won't be reached")

Output:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

Common Python Exceptions

Before diving into handling exceptions, let's look at some common exceptions you might encounter:

Exception          | Description
-------------------|--------------------------------------------------------------------------
SyntaxError        | Raised when the parser encounters a syntax error
NameError          | Raised when a local or global name is not found
TypeError          | Raised when an operation is applied to an object of inappropriate type
ValueError         | Raised when a function gets an argument of the correct type but an inappropriate value
IndexError         | Raised when an index is out of range
KeyError           | Raised when a dictionary key is not found
FileNotFoundError  | Raised when a file or directory is requested but doesn't exist
ImportError        | Raised when an import statement fails
ZeroDivisionError  | Raised when division or modulo by zero is encountered

PyTorch also defines its own exceptions, such as torch.cuda.OutOfMemoryError, which is raised when your GPU runs out of memory.
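If you're curious how this exception fits into Python's built-in hierarchy, a quick check is shown below. This is a minimal sketch assuming a recent PyTorch release, where torch.cuda.OutOfMemoryError is a subclass of RuntimeError, so a generic RuntimeError handler also catches GPU out-of-memory failures:

python
import torch

# In recent PyTorch releases, torch.cuda.OutOfMemoryError subclasses RuntimeError,
# so an except RuntimeError handler also catches GPU out-of-memory failures.
print(issubclass(torch.cuda.OutOfMemoryError, RuntimeError))  # Expected: True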

Basic Exception Handling

The try-except Block

The fundamental construct for handling exceptions in Python is the try-except block:

python
try:
    # Code that might cause an exception
    result = 10 / 0
except ZeroDivisionError:
    # Code to handle the specific exception
    print("Cannot divide by zero!")

Output:

Cannot divide by zero!

The program continues running instead of crashing, which is much more user-friendly.

Handling Multiple Exceptions

You can handle different exceptions in different ways:

python
try:
    # This could cause different types of exceptions
    number = int(input("Enter a number: "))
    result = 10 / number
    print(f"Result: {result}")
except ZeroDivisionError:
    print("Cannot divide by zero!")
except ValueError:
    print("You must enter a valid number!")

If the user enters "0":

Enter a number: 0
Cannot divide by zero!

If the user enters "hello":

Enter a number: hello
You must enter a valid number!

Catching Multiple Exceptions with One Handler

You can also handle multiple exceptions with the same code:

python
try:
    # Code that might raise exceptions
    file = open("nonexistent_file.txt", "r")
    content = file.read()
    print(content)
except (FileNotFoundError, PermissionError):
    print("There was a problem accessing the file.")

Output:

There was a problem accessing the file.

The Catch-All Exception Handler

While it's generally better to catch specific exceptions, sometimes you might want to catch any potential exception:

python
try:
    # Some risky code
    x = 1 / 0
except Exception as e:
    print(f"An error occurred: {e}")

Output:

An error occurred: division by zero
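When you catch a broad Exception like this, the message alone is often not enough to debug the problem. Here is a minimal sketch using the standard library's traceback module to keep the full stack trace while still handling the error gracefully:

python
import traceback

try:
    x = 1 / 0
except Exception as e:
    # Show a short message to the user, but keep the full stack trace for debugging
    print(f"An error occurred: {e}")
    traceback.print_exc()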

Advanced Exception Handling

The else Clause

The else clause runs if no exceptions were raised in the try block:

python
try:
    number = int(input("Enter a positive number: "))
    if number <= 0:
        raise ValueError("That's not a positive number!")
except ValueError as err:
    print(f"Error: {err}")
else:
    print(f"You entered {number}, which is a valid positive number.")

If the user enters "5":

Enter a positive number: 5
You entered 5, which is a valid positive number.

If the user enters "-2":

Enter a positive number: -2
Error: That's not a positive number!

The finally Clause

The finally clause runs regardless of whether an exception occurred or not, making it perfect for cleanup operations:

python
try:
    file = open("sample_data.txt", "r")
    content = file.read()
    # Process content...
except FileNotFoundError:
    print("The file was not found.")
finally:
    # This code always runs
    try:
        file.close()
        print("File closed successfully.")
    except NameError:
        # open() failed, so the name 'file' was never assigned
        print("No file to close.")

Raising Exceptions

Sometimes you might want to trigger exceptions manually using the raise statement:

python
def validate_age(age):
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age > 120:
        raise ValueError("Age is too high")
    return True

try:
    validate_age(150)
except ValueError as e:
    print(f"Validation error: {e}")

Output:

Validation error: Age is too high
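You can also re-raise an exception after inspecting it, or wrap it in a more descriptive one. The sketch below uses a hypothetical load_config helper purely for illustration: raise ... from e attaches the original exception as the cause, while a bare raise inside an except block (used in the PyTorch fallback example later) simply re-raises the current exception unchanged.

python
def load_config(path):
    try:
        with open(path, "r") as f:
            return f.read()
    except FileNotFoundError as e:
        # Wrap the low-level error in a more descriptive one and
        # keep the original exception attached as the cause
        raise RuntimeError(f"Could not load configuration from {path}") from e

try:
    load_config("missing_config.yaml")
except RuntimeError as e:
    print(f"{e} (caused by: {e.__cause__})")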

Creating Custom Exceptions

For specialized error handling, you can create your own exception classes:

python
class ModelError(Exception):
    """Exception raised for errors in the ML model."""
    pass

class DataShapeError(ModelError):
    """Exception raised when data doesn't match expected shape."""
    def __init__(self, expected_shape, actual_shape):
        self.expected_shape = expected_shape
        self.actual_shape = actual_shape
        self.message = f"Expected data shape {expected_shape}, got {actual_shape}"
        super().__init__(self.message)

# Using our custom exception
try:
    expected = (3, 224, 224)
    actual = (1, 128, 128)
    if expected != actual:
        raise DataShapeError(expected, actual)
except DataShapeError as e:
    print(f"Model input error: {e}")

Output:

Model input error: Expected data shape (3, 224, 224), got (1, 128, 128)
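Because DataShapeError inherits from ModelError, a handler for the base class also catches the subclass. Here is a minimal sketch reusing the two classes defined above to show how one except clause can cover a whole family of model-related errors:

python
try:
    raise DataShapeError((3, 224, 224), (1, 128, 128))
except ModelError as e:
    # DataShapeError is a ModelError, so this handler catches it too
    print(f"Model error: {e}")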

Error Handling in PyTorch

PyTorch operations can raise various exceptions, especially when you're manipulating tensors, running GPU operations, or training models. Let's look at some examples specific to PyTorch:

Handling CUDA Errors

When working with GPUs, you might encounter out-of-memory errors:

python
import torch

try:
    # Allocate an extremely large tensor directly on the GPU, so the failure
    # is a CUDA out-of-memory error rather than a CPU allocation error
    huge_tensor = torch.ones(1000000, 1000000, device="cuda")
except torch.cuda.OutOfMemoryError:
    print("Not enough GPU memory for this operation!")
    # Maybe try with a smaller tensor or use the CPU instead
    small_tensor = torch.ones(1000, 1000).cpu()
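A common recovery pattern, sketched below with a hypothetical allocate_with_fallback helper, is to release the memory cached by PyTorch's CUDA allocator and fall back to the CPU. This assumes a CUDA-capable GPU is present; on a machine with no GPU at all, the allocation fails with a plain RuntimeError instead.

python
import torch

def allocate_with_fallback(rows, cols):
    """Try to allocate on the GPU; fall back to the CPU if GPU memory runs out."""
    try:
        return torch.ones(rows, cols, device="cuda")
    except torch.cuda.OutOfMemoryError:
        # Release cached blocks held by PyTorch's CUDA allocator, then use the CPU
        torch.cuda.empty_cache()
        return torch.ones(rows, cols)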

Shape Mismatch Errors

A common issue in PyTorch involves tensor shape mismatches:

python
import torch

try:
    # Create tensors with incompatible shapes
    tensor_a = torch.randn(10, 20)
    tensor_b = torch.randn(30, 40)

    # This will raise a RuntimeError due to the shape mismatch
    result = torch.matmul(tensor_a, tensor_b)
except RuntimeError as e:
    print(f"Matrix operation error: {e}")
    print(f"Shape of tensor_a: {tensor_a.shape}")
    print(f"Shape of tensor_b: {tensor_b.shape}")
    print("For matrix multiplication, the inner dimensions must match.")

Output:

Matrix operation error: mat1 and mat2 shapes cannot be multiplied (10x20 and 30x40)
Shape of tensor_a: torch.Size([10, 20])
Shape of tensor_b: torch.Size([30, 40])
For matrix multiplication, the inner dimensions must match.

Graceful Fallbacks

A robust PyTorch application might include fallback mechanisms:

python
import torch

def train_model(use_gpu=True):
    try:
        if use_gpu and torch.cuda.is_available():
            device = torch.device("cuda")
            print("Using GPU for training")
        else:
            device = torch.device("cpu")
            print("Using CPU for training")

        # Create a simple model and move it to the selected device
        model = torch.nn.Linear(10, 1).to(device)

        # Generate some dummy data
        inputs = torch.randn(100, 10).to(device)
        targets = torch.randn(100, 1).to(device)

        # Define optimizer
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        # Training loop
        for epoch in range(5):
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = torch.nn.functional.mse_loss(outputs, targets)
            loss.backward()
            optimizer.step()
            print(f"Epoch {epoch+1}/5, Loss: {loss.item():.4f}")

        return model

    except RuntimeError as e:
        if "CUDA" in str(e):
            print(f"GPU error: {e}")
            print("Falling back to CPU...")
            return train_model(use_gpu=False)
        else:
            raise  # Re-raise if it's not a CUDA error

model = train_model()

Best Practices for Error Handling

Here are some guidelines to make your error handling more effective:

  1. Be specific: Catch specific exceptions rather than using bare except clauses
  2. Don't silence exceptions: Avoid empty except blocks that hide errors
  3. Log errors: In production code, log exceptions with context for debugging
  4. Clean up resources: Use try-finally or context managers (e.g., with statements)
  5. Provide helpful error messages: Make error messages informative for users
  6. Fail early: Validate inputs at the beginning of functions
  7. Don't use exceptions for flow control: Exceptions are for exceptional situations

Example of Good Error Handling

Here's a more complete example demonstrating good error handling practices:

python
import torch
import logging
from typing import Optional

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def load_model(model_path: str) -> Optional[torch.nn.Module]:
    """
    Load a PyTorch model from a file with robust error handling.

    Args:
        model_path: Path to the saved model

    Returns:
        The loaded model or None if loading failed
    """
    try:
        # Check if the file exists
        import os
        if not os.path.exists(model_path):
            raise FileNotFoundError(f"Model file not found at {model_path}")

        # Determine device
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        logger.info(f"Using device: {device}")

        # Load model
        model = torch.load(model_path, map_location=device)
        logger.info(f"Model successfully loaded from {model_path}")

        return model

    except FileNotFoundError as e:
        logger.error(f"File error: {e}")
        return None
    except RuntimeError as e:
        if "CUDA" in str(e):
            logger.warning(f"CUDA error: {e}")
            logger.info("Attempting to load on CPU instead...")
            return load_model_cpu_only(model_path)
        logger.error(f"Error loading model: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        logger.exception("Stack trace:")
        return None

def load_model_cpu_only(model_path: str) -> Optional[torch.nn.Module]:
    """Fallback function to load model on CPU only."""
    try:
        model = torch.load(model_path, map_location="cpu")
        logger.info("Model loaded successfully on CPU")
        return model
    except Exception as e:
        logger.error(f"Failed to load model on CPU: {e}")
        return None

# Usage example
model = load_model("path/to/model.pth")
if model is not None:
    print("Model loaded successfully, ready for inference")
else:
    print("Failed to load model, please check the logs for details")

Summary

Error handling is a crucial aspect of writing robust Python applications, especially when working with PyTorch for machine learning tasks. In this tutorial, we've covered:

  • The basics of Python exceptions and how they work
  • Using try-except blocks to catch and handle errors
  • Advanced features like else and finally clauses
  • Creating and raising custom exceptions
  • Specific error handling scenarios in PyTorch
  • Best practices for effective error handling

By implementing proper error handling in your PyTorch projects, you can create applications that gracefully handle unexpected situations, provide helpful feedback to users, and ensure resources are properly managed.

Exercises

To practice your error handling skills, try these exercises:

  1. Write a function that loads a dataset and uses error handling to deal with missing files or corrupted data.
  2. Create a custom exception class for a specific error that might occur in your PyTorch model training.
  3. Modify an existing PyTorch training loop to include proper error handling for out-of-memory errors.
  4. Write a function that validates tensor shapes before performing operations and raises appropriate exceptions.
  5. Implement a context manager using __enter__ and __exit__ for resource management in a PyTorch application.
