
FastAPI Deployment Monitoring

Introduction

When you deploy a FastAPI application to production, it's not enough to simply get it running. You need to know how your application is performing, detect issues before they become critical, and understand user behavior. This is where monitoring comes in.

Monitoring your FastAPI application is crucial for:

  • Identifying performance bottlenecks
  • Detecting and diagnosing errors
  • Understanding application usage patterns
  • Ensuring high availability and reliability
  • Planning for scaling and improvements

In this guide, we'll explore various aspects of monitoring FastAPI applications, from basic logging to advanced observability tools, providing you with the knowledge to keep your applications healthy in production.

Basic Logging in FastAPI

Setting Up Logging

FastAPI is built on top of Starlette and uses Python's standard logging module. Let's start with a basic logging setup:

python
import logging
from fastapi import FastAPI

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)

app = FastAPI()

@app.get("/")
async def read_root():
    logger.info("Root endpoint accessed")
    return {"Hello": "World"}

This configuration will:

  • Set the logging level to INFO
  • Format logs with timestamp, logger name, level, and message
  • Output logs to both a file (app.log) and the console
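
If you serve the app with Uvicorn, note that Uvicorn manages its own loggers (uvicorn, uvicorn.error, uvicorn.access). Here is a minimal sketch for routing its access logs through the same root handlers, assuming you run under Uvicorn:

python
import logging

# Assumption: the app is served by Uvicorn, which logs under these names.
for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
    uvicorn_logger = logging.getLogger(name)
    uvicorn_logger.handlers = []     # drop Uvicorn's default handlers
    uvicorn_logger.propagate = True  # let records flow to the root handlers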

Middleware for Request Logging

To log information about each request, you can use middleware:

python
from fastapi import FastAPI, Request
import logging
import time

app = FastAPI()
logger = logging.getLogger(__name__)

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start_time = time.time()

    # Get client IP address (honour X-Forwarded-For when behind a proxy)
    forwarded_for = request.headers.get("X-Forwarded-For")
    ip = forwarded_for.split(",")[0] if forwarded_for else request.client.host

    logger.info(f"Request started: {request.method} {request.url} from {ip}")

    response = await call_next(request)

    process_time = time.time() - start_time
    logger.info(
        f"Request completed: {request.method} {request.url} - "
        f"Status: {response.status_code} - Duration: {process_time:.4f}s"
    )

    return response

This middleware logs information about each request, including:

  • The HTTP method and URL
  • The client's IP address
  • The response status code
  • The time taken to process the request

Health Checks

Health checks are endpoints that indicate whether your application is running correctly. They're used by load balancers, container orchestrators, and monitoring systems.

Basic Health Check

python
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

Advanced Health Check

A more sophisticated health check might verify database connections and other dependencies:

python
from fastapi import FastAPI, Depends, HTTPException, status
from sqlalchemy import text
from sqlalchemy.orm import Session
from database import get_db

app = FastAPI()

@app.get("/health")
async def health_check(db: Session = Depends(get_db)):
    try:
        # Check database connection (text() is required for raw SQL
        # in SQLAlchemy 2.x)
        db.execute(text("SELECT 1"))

        # You could check other services here
        # e.g., Redis, external APIs, etc.

        return {
            "status": "healthy",
            "details": {
                "database": "connected"
            }
        }
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail=f"Service unhealthy: {str(e)}"
        )

Metrics Collection with Prometheus

Prometheus is a powerful monitoring system that collects and stores metrics as time series data. Let's integrate it with FastAPI using the prometheus_fastapi_instrumentator package:

bash
pip install prometheus-fastapi-instrumentator

Here's how to set it up:

python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Setup Prometheus instrumentation
@app.on_event("startup")
async def startup():
    Instrumentator().instrument(app).expose(app)

@app.get("/")
async def root():
    return {"message": "Hello World"}

This sets up the following metrics automatically:

  • Request count
  • Request duration
  • Request size
  • Response size
  • Exceptions

You can access these metrics at the /metrics endpoint, which will output something like:

# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{handler="/",method="GET",status="200",le="0.01"} 1
http_request_duration_seconds_bucket{handler="/",method="GET",status="200",le="0.025"} 1
...
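
To inspect the endpoint locally (assuming the app listens on port 8000):

bash
curl http://localhost:8000/metrics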

Custom Metrics

You can also create custom metrics to track specific aspects of your application:

python
from fastapi import FastAPI
from prometheus_client import Counter, Gauge
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Create custom metrics
ACTIVE_USERS = Gauge("active_users", "Number of active users")
FAILED_LOGINS = Counter("failed_logins_total", "Total count of failed logins")

# Setup Prometheus instrumentation
@app.on_event("startup")
async def startup():
    Instrumentator().instrument(app).expose(app)

@app.get("/")
async def root():
    # Update the active users metric (this would normally be dynamic)
    ACTIVE_USERS.set(42)
    return {"message": "Hello World"}

@app.post("/login")
async def login(username: str, password: str):
    # Hardcoded credentials for demonstration only
    if username == "admin" and password == "password":
        return {"result": "success"}
    else:
        # Increment the failed logins counter
        FAILED_LOGINS.inc()
        return {"result": "failure"}

Distributed Tracing with OpenTelemetry

Distributed tracing helps you understand the flow of requests through your system, especially in microservices architectures. Let's integrate OpenTelemetry with FastAPI:

bash
pip install opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-fastapi

Basic setup:

python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, BatchSpanProcessor
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Set up tracing
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

# Configure exporter (in production, you'd use a proper exporter like Jaeger or Zipkin)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

app = FastAPI()

# Instrument FastAPI
FastAPIInstrumentor.instrument_app(app)

@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    # Create a custom span for this operation
    with tracer.start_as_current_span("process-item"):
        # Simulate some processing work
        result = process_item(item_id)
        return {"item_id": item_id, "result": result}

def process_item(item_id: int):
    # This function would normally do something more complex
    return f"Processed item {item_id}"

In a real-world scenario, you would configure OpenTelemetry to send traces to a system like Jaeger, Zipkin, or a cloud observability platform.
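
For example, here is a minimal sketch that swaps the console exporter for an OTLP exporter, assuming the opentelemetry-exporter-otlp package is installed and a collector (or Jaeger with OTLP enabled) is listening on localhost:4317:

bash
pip install opentelemetry-exporter-otlp

python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Assumption: an OpenTelemetry Collector is reachable at localhost:4317.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)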

Error Tracking

To catch and track errors effectively, you can integrate dedicated error tracking tools. Here's an example using Sentry:

bash
pip install sentry-sdk

python
import sentry_sdk
from fastapi import FastAPI, HTTPException
from sentry_sdk.integrations.asgi import SentryAsgiMiddleware

# Initialize Sentry
sentry_sdk.init(
    dsn="https://[email protected]/0",  # Replace with your DSN
    traces_sample_rate=1.0  # Adjust in production
)

app = FastAPI()

# Add Sentry middleware
app.add_middleware(SentryAsgiMiddleware)

@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.get("/error")
async def trigger_error():
    # This will be captured by Sentry
    raise ValueError("This is an example error")

@app.get("/handled-error")
async def handled_error():
    try:
        # Some operation that might fail
        result = 1 / 0
    except Exception as e:
        # Capture the exception with additional context
        with sentry_sdk.push_scope() as scope:
            scope.set_tag("operation", "division")
            sentry_sdk.capture_exception(e)
        raise HTTPException(status_code=500, detail="An error occurred")
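
Tags set inside the push_scope block apply only to events captured within that block, so the extra context stays scoped to this one exception.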

Real-world Example: Comprehensive Monitoring Setup

Let's put everything together in a more complete example that integrates logging, metrics, tracing, and error tracking:

python
import logging
import time
import uuid

from fastapi import FastAPI, Request, Depends, HTTPException
from prometheus_fastapi_instrumentator import Instrumentator
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
import sentry_sdk
from sentry_sdk.integrations.asgi import SentryAsgiMiddleware

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

# Initialize Sentry for error tracking
sentry_sdk.init(
    dsn="https://[email protected]/0",  # Replace with your DSN
    traces_sample_rate=0.2  # Adjust based on traffic
)

# Set up OpenTelemetry for distributed tracing
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))
tracer = trace.get_tracer(__name__)

# Create FastAPI app
app = FastAPI()

# Add Sentry middleware
app.add_middleware(SentryAsgiMiddleware)

# Database dependency (placeholder)
def get_db():
    db = None
    try:
        # In a real app, you'd get a database connection here
        db = "database_connection"
        yield db
    finally:
        # And close it here
        pass

# Request logging middleware
@app.middleware("http")
async def log_requests(request: Request, call_next):
    request_id = uuid.uuid4().hex  # unique ID for correlating log lines
    start_time = time.time()

    logger.info(f"[{request_id}] Request started: {request.method} {request.url}")

    try:
        response = await call_next(request)
        process_time = time.time() - start_time
        logger.info(
            f"[{request_id}] Request completed: {request.method} {request.url} - "
            f"Status: {response.status_code} - Duration: {process_time:.4f}s"
        )
        return response
    except Exception as e:
        process_time = time.time() - start_time
        logger.error(
            f"[{request_id}] Request failed: {request.method} {request.url} - "
            f"Error: {str(e)} - Duration: {process_time:.4f}s"
        )
        raise

# Setup Prometheus metrics
@app.on_event("startup")
async def startup():
    Instrumentator().instrument(app).expose(app)
    # Also initialize FastAPI instrumentation for OpenTelemetry
    FastAPIInstrumentor.instrument_app(app)
    logger.info("Application startup complete")

@app.on_event("shutdown")
async def shutdown():
    logger.info("Application shutdown")

# Health check endpoint
@app.get("/health")
async def health_check(db=Depends(get_db)):
    try:
        # In a real app, you'd check database connectivity here
        return {
            "status": "healthy",
            "details": {
                "database": "connected"
            }
        }
    except Exception as e:
        logger.error(f"Health check failed: {str(e)}")
        raise HTTPException(status_code=503, detail=f"Service unhealthy: {str(e)}")

# Example endpoints
@app.get("/")
async def read_root():
    logger.info("Root endpoint accessed")
    return {"message": "Hello World"}

@app.get("/items/{item_id}")
async def read_item(item_id: int, db=Depends(get_db)):
    with tracer.start_as_current_span("process-item"):
        logger.info(f"Processing item {item_id}")
        # Simulate processing
        if item_id == 0:
            logger.error(f"Invalid item_id: {item_id}")
            raise HTTPException(status_code=400, detail="Item ID cannot be zero")
        return {"item_id": item_id, "name": f"Item {item_id}"}

@app.get("/error")
async def trigger_error():
    logger.error("Error endpoint accessed")
    raise ValueError("This is a test error")

To run this application, you would also need:

  1. A Prometheus server to collect metrics
  2. OpenTelemetry Collector to receive and forward traces
  3. A Sentry account for error tracking

In a real-world deployment, you would typically run this behind a reverse proxy like Nginx, and potentially use container orchestration with Kubernetes.
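
For instance, a minimal sketch for serving the app with several Uvicorn workers, assuming the code above is saved as main.py:

bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4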

Monitoring Visualization

Once you've collected all this monitoring data, you need a way to visualize it. Here are some common tools:

  1. Grafana: For visualizing Prometheus metrics
  2. Jaeger UI: For exploring distributed traces
  3. Kibana: If you forward logs to Elasticsearch
  4. Sentry Dashboard: For error tracking and analysis

Best Practices for FastAPI Monitoring

  1. Use structured logging: Make your logs machine-parsable with consistent fields (see the sketch after this list)
  2. Focus on key metrics: Track request rate, error rate, and duration (RED method)
  3. Set up alerts: Don't just collect data—set up alerts for abnormal conditions
  4. Monitor dependencies: Track the health of databases, caches, and third-party services
  5. Implement proper error handling: Catch and log exceptions properly
  6. Use request IDs: Include a unique identifier in logs to trace requests across your system
  7. Monitor resource usage: Track CPU, memory, disk, and network usage
  8. Use sampling for high-traffic applications: Don't trace or log everything in production
  9. Review and act on monitoring data: Regularly analyze your monitoring data to improve your application
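
As referenced in the first practice above, here is a minimal structured-logging sketch using only the standard library; the field names are illustrative assumptions, not a fixed schema:

python
import json
import logging
import uuid

class JSONFormatter(logging.Formatter):
    """Render each record as one JSON object per line (machine-parsable)."""
    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Illustrative: carry a request ID when the caller provides one
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logger = logging.getLogger(__name__)
logger.info("Request completed", extra={"request_id": uuid.uuid4().hex})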

Summary

Monitoring your FastAPI application is essential for ensuring reliability, performance, and a good user experience in production. In this guide, we've covered:

  • Basic logging and request tracking
  • Health checks for infrastructure integration
  • Metrics collection with Prometheus
  • Distributed tracing with OpenTelemetry
  • Error tracking with Sentry
  • A comprehensive real-world monitoring setup

By implementing these monitoring approaches, you'll gain visibility into your FastAPI application's behavior and be able to quickly detect and resolve issues before they impact your users.

Exercises

  1. Set up basic logging for a FastAPI application and observe how different types of requests are logged.
  2. Implement a health check endpoint that verifies connectivity to a database.
  3. Integrate Prometheus metrics and create a custom metric that tracks a business-relevant value.
  4. Set up distributed tracing for a multi-service application using OpenTelemetry.
  5. Create a Grafana dashboard that visualizes the key performance metrics for your FastAPI application.
  6. Implement proper error handling with Sentry integration and trigger test errors to verify it's working.

