
FastAPI Deployment Monitoring

Introduction

When you deploy a FastAPI application to production, it's not enough to simply get it running. You need to know how your application is performing, detect issues before they become critical, and understand user behavior. This is where monitoring comes in.

Monitoring your FastAPI application is crucial for:

  • Identifying performance bottlenecks
  • Detecting and diagnosing errors
  • Understanding application usage patterns
  • Ensuring high availability and reliability
  • Planning for scaling and improvements

In this guide, we'll explore various aspects of monitoring FastAPI applications, from basic logging to advanced observability tools, providing you with the knowledge to keep your applications healthy in production.

Basic Logging in FastAPI

Setting Up Logging

FastAPI is built on top of Starlette and uses Python's standard logging module. Let's start with a basic logging setup:

python
import logging
from fastapi import FastAPI

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)

app = FastAPI()

@app.get("/")
async def read_root():
    logger.info("Root endpoint accessed")
    return {"Hello": "World"}

This configuration will:

  • Set the logging level to INFO
  • Format logs with timestamp, logger name, level, and message
  • Output logs to both a file (app.log) and the console
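
If you serve the app with Uvicorn, note that Uvicorn manages its own loggers (uvicorn, uvicorn.error, uvicorn.access). Here is a minimal sketch for routing its access logs through the same root handlers, assuming you run under Uvicorn:

python
import logging

# Assumption: the app is served by Uvicorn, which logs under these names.
for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
    uvicorn_logger = logging.getLogger(name)
    uvicorn_logger.handlers = []     # drop Uvicorn's default handlers
    uvicorn_logger.propagate = True  # let records flow to the root handlers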

Middleware for Request Logging

To log information about each request, you can use middleware:

python
from fastapi import FastAPI, Request
import logging
import time

app = FastAPI()
logger = logging.getLogger(__name__)

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start_time = time.time()

    # Get client IP address (honour X-Forwarded-For when behind a proxy)
    forwarded_for = request.headers.get("X-Forwarded-For")
    ip = forwarded_for.split(",")[0] if forwarded_for else request.client.host

    logger.info(f"Request started: {request.method} {request.url} from {ip}")

    response = await call_next(request)

    process_time = time.time() - start_time
    logger.info(
        f"Request completed: {request.method} {request.url} - "
        f"Status: {response.status_code} - Duration: {process_time:.4f}s"
    )

    return response

This middleware logs information about each request, including:

  • The HTTP method and URL
  • The client's IP address
  • The response status code
  • The time taken to process the request

Health Checks

Health checks are endpoints that indicate whether your application is running correctly. They're used by load balancers, container orchestrators, and monitoring systems.

Basic Health Check

python
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

Advanced Health Check

A more sophisticated health check might verify database connections and other dependencies:

python
from fastapi import FastAPI, Depends, HTTPException, status
from sqlalchemy import text
from sqlalchemy.orm import Session
from database import get_db

app = FastAPI()

@app.get("/health")
async def health_check(db: Session = Depends(get_db)):
    try:
        # Check database connection (text() is required for raw SQL
        # in SQLAlchemy 2.x)
        db.execute(text("SELECT 1"))

        # You could check other services here
        # e.g., Redis, external APIs, etc.

        return {
            "status": "healthy",
            "details": {
                "database": "connected"
            }
        }
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail=f"Service unhealthy: {str(e)}"
        )

Metrics Collection with Prometheus

Prometheus is a powerful monitoring system that collects and stores metrics as time series data. Let's integrate it with FastAPI using the prometheus_fastapi_instrumentator package:

bash
pip install prometheus-fastapi-instrumentator

Here's how to set it up:

python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Setup Prometheus instrumentation
@app.on_event("startup")
async def startup():
    Instrumentator().instrument(app).expose(app)

@app.get("/")
async def root():
    return {"message": "Hello World"}

This sets up the following metrics automatically:

  • Request count
  • Request duration
  • Request size
  • Response size
  • Exceptions

You can access these metrics at the /metrics endpoint, which will output something like:

# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{handler="/",method="GET",status="200",le="0.01"} 1
http_request_duration_seconds_bucket{handler="/",method="GET",status="200",le="0.025"} 1
...
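
To inspect the endpoint locally (assuming the app listens on port 8000):

bash
curl http://localhost:8000/metrics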

Custom Metrics

You can also create custom metrics to track specific aspects of your application:

python
from fastapi import FastAPI
from prometheus_client import Counter, Gauge
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Create custom metrics
ACTIVE_USERS = Gauge("active_users", "Number of active users")
FAILED_LOGINS = Counter("failed_logins_total", "Total count of failed logins")

# Setup Prometheus instrumentation
@app.on_event("startup")
async def startup():
    Instrumentator().instrument(app).expose(app)

@app.get("/")
async def root():
    # Update the active users metric (this would normally be dynamic)
    ACTIVE_USERS.set(42)
    return {"message": "Hello World"}

@app.post("/login")
async def login(username: str, password: str):
    # Hardcoded credentials for demonstration only
    if username == "admin" and password == "password":
        return {"result": "success"}
    else:
        # Increment the failed logins counter
        FAILED_LOGINS.inc()
        return {"result": "failure"}

Distributed Tracing with OpenTelemetry

Distributed tracing helps you understand the flow of requests through your system, especially in microservices architectures. Let's integrate OpenTelemetry with FastAPI:

bash
pip install opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-fastapi

Basic setup:

python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, BatchSpanProcessor
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Set up tracing
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

# Configure exporter (in production, you'd use a proper exporter like Jaeger or Zipkin)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

app = FastAPI()

# Instrument FastAPI
FastAPIInstrumentor.instrument_app(app)

@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    # Create a custom span for this operation
    with tracer.start_as_current_span("process-item"):
        # Simulate some processing work
        result = process_item(item_id)
        return {"item_id": item_id, "result": result}

def process_item(item_id: int):
    # This function would normally do something more complex
    return f"Processed item {item_id}"

In a real-world scenario, you would configure OpenTelemetry to send traces to a system like Jaeger, Zipkin, or a cloud observability platform.
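
For example, here is a minimal sketch that swaps the console exporter for an OTLP exporter, assuming the opentelemetry-exporter-otlp package is installed and a collector (or Jaeger with OTLP enabled) is listening on localhost:4317:

bash
pip install opentelemetry-exporter-otlp

python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Assumption: an OpenTelemetry Collector is reachable at localhost:4317.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)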

Error Tracking

To catch and track errors effectively, you can integrate dedicated error tracking tools. Here's an example using Sentry:

bash
pip install sentry-sdk

python
import sentry_sdk
from fastapi import FastAPI, HTTPException
from sentry_sdk.integrations.asgi import SentryAsgiMiddleware

# Initialize Sentry
sentry_sdk.init(
    dsn="https://[email protected]/0",  # Replace with your DSN
    traces_sample_rate=1.0  # Adjust in production
)

app = FastAPI()

# Add Sentry middleware
app.add_middleware(SentryAsgiMiddleware)

@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.get("/error")
async def trigger_error():
    # This will be captured by Sentry
    raise ValueError("This is an example error")

@app.get("/handled-error")
async def handled_error():
    try:
        # Some operation that might fail
        result = 1 / 0
    except Exception as e:
        # Capture the exception with additional context
        with sentry_sdk.push_scope() as scope:
            scope.set_tag("operation", "division")
            sentry_sdk.capture_exception(e)
        raise HTTPException(status_code=500, detail="An error occurred")
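
Tags set inside the push_scope block apply only to events captured within that block, so the extra context stays scoped to this one exception.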

Real-world Example: Comprehensive Monitoring Setup

Let's put everything together in a more complete example that integrates logging, metrics, tracing, and error tracking:

python
import logging
import time
import uuid

from fastapi import FastAPI, Request, Depends, HTTPException
from prometheus_fastapi_instrumentator import Instrumentator
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
import sentry_sdk
from sentry_sdk.integrations.asgi import SentryAsgiMiddleware

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

# Initialize Sentry for error tracking
sentry_sdk.init(
    dsn="https://[email protected]/0",  # Replace with your DSN
    traces_sample_rate=0.2  # Adjust based on traffic
)

# Set up OpenTelemetry for distributed tracing
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))
tracer = trace.get_tracer(__name__)

# Create FastAPI app
app = FastAPI()

# Add Sentry middleware
app.add_middleware(SentryAsgiMiddleware)

# Database dependency (placeholder)
def get_db():
    db = None
    try:
        # In a real app, you'd get a database connection here
        db = "database_connection"
        yield db
    finally:
        # And close it here
        pass

# Request logging middleware
@app.middleware("http")
async def log_requests(request: Request, call_next):
    request_id = uuid.uuid4().hex  # unique ID for correlating log lines
    start_time = time.time()

    logger.info(f"[{request_id}] Request started: {request.method} {request.url}")

    try:
        response = await call_next(request)
        process_time = time.time() - start_time
        logger.info(
            f"[{request_id}] Request completed: {request.method} {request.url} - "
            f"Status: {response.status_code} - Duration: {process_time:.4f}s"
        )
        return response
    except Exception as e:
        process_time = time.time() - start_time
        logger.error(
            f"[{request_id}] Request failed: {request.method} {request.url} - "
            f"Error: {str(e)} - Duration: {process_time:.4f}s"
        )
        raise

# Setup Prometheus metrics
@app.on_event("startup")
async def startup():
    Instrumentator().instrument(app).expose(app)
    # Also initialize FastAPI instrumentation for OpenTelemetry
    FastAPIInstrumentor.instrument_app(app)
    logger.info("Application startup complete")

@app.on_event("shutdown")
async def shutdown():
    logger.info("Application shutdown")

# Health check endpoint
@app.get("/health")
async def health_check(db=Depends(get_db)):
    try:
        # In a real app, you'd check database connectivity here
        return {
            "status": "healthy",
            "details": {
                "database": "connected"
            }
        }
    except Exception as e:
        logger.error(f"Health check failed: {str(e)}")
        raise HTTPException(status_code=503, detail=f"Service unhealthy: {str(e)}")

# Example endpoints
@app.get("/")
async def read_root():
    logger.info("Root endpoint accessed")
    return {"message": "Hello World"}

@app.get("/items/{item_id}")
async def read_item(item_id: int, db=Depends(get_db)):
    with tracer.start_as_current_span("process-item"):
        logger.info(f"Processing item {item_id}")
        # Simulate processing
        if item_id == 0:
            logger.error(f"Invalid item_id: {item_id}")
            raise HTTPException(status_code=400, detail="Item ID cannot be zero")
        return {"item_id": item_id, "name": f"Item {item_id}"}

@app.get("/error")
async def trigger_error():
    logger.error("Error endpoint accessed")
    raise ValueError("This is a test error")

To run this application, you would also need:

  1. A Prometheus server to collect metrics
  2. OpenTelemetry Collector to receive and forward traces
  3. A Sentry account for error tracking

In a real-world deployment, you would typically run this behind a reverse proxy like Nginx, and potentially use container orchestration with Kubernetes.
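
For instance, a minimal sketch for serving the app with several Uvicorn workers, assuming the code above is saved as main.py:

bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4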

Monitoring Visualization

Once you've collected all this monitoring data, you need a way to visualize it. Here are some common tools:

  1. Grafana: For visualizing Prometheus metrics
  2. Jaeger UI: For exploring distributed traces
  3. Kibana: If you forward logs to Elasticsearch
  4. Sentry Dashboard: For error tracking and analysis

Best Practices for FastAPI Monitoring

  1. Use structured logging: Make your logs machine-parsable with consistent fields (see the sketch after this list)
  2. Focus on key metrics: Track request rate, error rate, and duration (RED method)
  3. Set up alerts: Don't just collect data—set up alerts for abnormal conditions
  4. Monitor dependencies: Track the health of databases, caches, and third-party services
  5. Implement proper error handling: Catch and log exceptions properly
  6. Use request IDs: Include a unique identifier in logs to trace requests across your system
  7. Monitor resource usage: Track CPU, memory, disk, and network usage
  8. Use sampling for high-traffic applications: Don't trace or log everything in production
  9. Review and act on monitoring data: Regularly analyze your monitoring data to improve your application
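
As referenced in the first practice above, here is a minimal structured-logging sketch using only the standard library; the field names are illustrative assumptions, not a fixed schema:

python
import json
import logging
import uuid

class JSONFormatter(logging.Formatter):
    """Render each record as one JSON object per line (machine-parsable)."""
    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Illustrative: carry a request ID when the caller provides one
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logger = logging.getLogger(__name__)
logger.info("Request completed", extra={"request_id": uuid.uuid4().hex})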

Summary

Monitoring your FastAPI application is essential for ensuring reliability, performance, and a good user experience in production. In this guide, we've covered:

  • Basic logging and request tracking
  • Health checks for infrastructure integration
  • Metrics collection with Prometheus
  • Distributed tracing with OpenTelemetry
  • Error tracking with Sentry
  • A comprehensive real-world monitoring setup

By implementing these monitoring approaches, you'll gain visibility into your FastAPI application's behavior and be able to quickly detect and resolve issues before they impact your users.

Exercises

  1. Set up basic logging for a FastAPI application and observe how different types of requests are logged.
  2. Implement a health check endpoint that verifies connectivity to a database.
  3. Integrate Prometheus metrics and create a custom metric that tracks a business-relevant value.
  4. Set up distributed tracing for a multi-service application using OpenTelemetry.
  5. Create a Grafana dashboard that visualizes the key performance metrics for your FastAPI application.
  6. Implement proper error handling with Sentry integration and trigger test errors to verify it's working.

