
FastAPI Production Server

When moving your FastAPI application from development to production, you need a proper ASGI server configuration. The single-process Uvicorn setup typically used during development isn't designed to handle production workloads efficiently. In this guide, we'll explore how to set up a production-grade server for your FastAPI applications.

Introduction to Production Servers

During development, running FastAPI with a simple uvicorn main:app --reload command works fine. However, this setup lacks features necessary for production environments:

  • Limited concurrency
  • No automatic restarts after crashes
  • Inefficient use of system resources
  • No load balancing

Production servers solve these issues by providing:

  • Worker management
  • Process monitoring
  • Better resource utilization
  • Enhanced security
  • Load balancing

Let's explore the main options for deploying FastAPI in production.

Uvicorn in Production Mode

While Uvicorn is often used during development, it can also serve as a production server with the right configuration.

Basic Production Configuration

bash
# Without the --reload flag
uvicorn main:app --host 0.0.0.0 --port 8000

Using Multiple Workers

bash
# Run with 4 worker processes
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

The number of workers should be tuned to the number of CPU cores available. A common starting formula is:

workers = (2 * num_cores) + 1
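
If you want to compute that value on the deployment host itself, a short Python snippet (a sketch, mirroring the formula above and the gunicorn.conf.py shown later in this guide) can do it:

python
# suggest_workers.py -- compute the suggested worker count for this machine
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1
print(workers)  # pass this value to the --workers flag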

Advanced Uvicorn Configuration

bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4 --log-level warning --no-access-log --limit-concurrency 1000

This command:

  • Runs 4 worker processes
  • Sets the log level to warning
  • Disables access logs (reduces I/O)
  • Limits concurrent connections to 1000
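
The same settings can also be applied programmatically, which is convenient when the server is started from a Python entry point. The sketch below assumes the app lives in main.py; the keyword arguments map one-to-one onto the CLI flags above.

python
# run_server.py -- programmatic equivalent of the advanced CLI invocation (a sketch)
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",              # import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,
        log_level="warning",
        access_log=False,        # equivalent of --no-access-log
        limit_concurrency=1000,  # equivalent of --limit-concurrency 1000
    )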

Gunicorn with Uvicorn Workers

Gunicorn (Green Unicorn) is a mature WSGI server that can act as a process manager for Uvicorn worker processes, giving you more robust worker supervision than running Uvicorn on its own.

Installation

bash
pip install gunicorn uvicorn

Basic Configuration

bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

Where:

  • -w 4: Run 4 worker processes
  • -k uvicorn.workers.UvicornWorker: Use Uvicorn's worker class
  • -b 0.0.0.0:8000: Bind to all interfaces on port 8000

Using a Configuration File

For more complex configurations, create a gunicorn.conf.py file:

python
# gunicorn.conf.py
import multiprocessing

# Server socket
bind = "0.0.0.0:8000"

# Worker processes
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"

# Server mechanics
daemon = False
pidfile = "/tmp/gunicorn.pid"

# Logging
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
loglevel = "info"

# Process naming
proc_name = "fastapi_app"

# Maximum number of simultaneous clients
worker_connections = 1000

# Timeout
timeout = 30

# Maximum requests
max_requests = 10000
max_requests_jitter = 1000

Run Gunicorn with this configuration file:

bash
gunicorn -c gunicorn.conf.py main:app
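
If you prefer to launch Gunicorn from Python code instead of the CLI (for example from a custom launcher script), Gunicorn's documented custom-application pattern can be used. The sketch below assumes the same main:app layout and is only one way to wire it up:

python
# launcher.py -- embed Gunicorn using its custom-application pattern (a sketch)
from gunicorn.app.base import BaseApplication

from main import app  # the FastAPI instance


class StandaloneApplication(BaseApplication):
    def __init__(self, application, options=None):
        self.options = options or {}
        self.application = application
        super().__init__()

    def load_config(self):
        # Copy recognised options into Gunicorn's configuration object
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)

    def load(self):
        return self.application


if __name__ == "__main__":
    options = {
        "bind": "0.0.0.0:8000",
        "workers": 4,
        "worker_class": "uvicorn.workers.UvicornWorker",
    }
    StandaloneApplication(app, options).run()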

Hypercorn for ASGI and HTTP/2 Support

Hypercorn is another ASGI server that supports HTTP/2 and WebSockets.

Installation

bash
pip install hypercorn

Basic Usage

bash
hypercorn main:app --bind 0.0.0.0:8000 --workers 4

Configuration File

Create a hypercorn_config.py file:

python
# hypercorn_config.py
bind = ["0.0.0.0:8000"]
workers = 4
worker_class = "asyncio"
keep_alive_timeout = 65
graceful_timeout = 30
accesslog = "-" # Log to stdout
errorlog = "-" # Log to stderr
loglevel = "INFO"

Run with:

bash
hypercorn --config file:hypercorn_config.py main:app
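
Hypercorn can also be started from Python code via its asyncio API, which is handy for embedding the server in an existing entry point. A minimal sketch, assuming the app is importable from main.py (note that the programmatic API runs a single worker process):

python
# serve.py -- run the app with Hypercorn's asyncio API (a sketch)
import asyncio

from hypercorn.asyncio import serve
from hypercorn.config import Config

from main import app  # the FastAPI instance

config = Config()
config.bind = ["0.0.0.0:8000"]
config.keep_alive_timeout = 65
config.graceful_timeout = 30

if __name__ == "__main__":
    asyncio.run(serve(app, config))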

Docker Deployment

For containerized deployments, you can use Docker to package your FastAPI application with a production server.

Sample Dockerfile

dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

# Run using Gunicorn with Uvicorn workers
CMD ["gunicorn", "main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:8000"]

Build and run the Docker container:

bash
docker build -t fastapi-app .
docker run -p 8000:8000 fastapi-app

Real-World Example: Complete FastAPI Application with Production Setup

Let's create a simple FastAPI application and prepare it for production deployment:

Project Structure

my_fastapi_app/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── models.py
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── items.py
│   │   └── users.py
│   └── dependencies.py
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── gunicorn.conf.py

Application Code (app/main.py)

python
from fastapi import FastAPI
from app.routers import items, users
import uvicorn

app = FastAPI(
    title="MyAPI",
    description="A production-ready FastAPI application",
    version="1.0.0",
)

# Include routers
app.include_router(items.router)
app.include_router(users.router)

@app.get("/")
async def root():
    return {"message": "Welcome to the API"}

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

if __name__ == "__main__":
    # For development only
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)

Sample Router (app/routers/items.py)

python
from typing import Optional

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter(
    prefix="/items",
    tags=["items"],
)

class Item(BaseModel):
    id: int
    name: str
    description: Optional[str] = None
    price: float

items_db = {}

@router.get("/")
async def read_items():
    return items_db

@router.get("/{item_id}")
async def read_item(item_id: int):
    if item_id not in items_db:
        raise HTTPException(status_code=404, detail="Item not found")
    return items_db[item_id]

@router.post("/")
async def create_item(item: Item):
    items_db[item.id] = item
    return item

Docker Compose Setup (docker-compose.yml)

yaml
version: '3'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./logs:/var/log/gunicorn
    restart: always
    environment:
      - ENV=production

Production Start Script (start.sh)

bash
#!/bin/bash

# Create log directories
mkdir -p /var/log/gunicorn

# Run with Gunicorn
exec gunicorn -c gunicorn.conf.py app.main:app

Make the script executable:

bash
chmod +x start.sh

Monitoring and Health Checks

It's important to implement health checks to monitor your FastAPI application in production:

Health Check Endpoint

python
from datetime import datetime

@app.get("/health")
async def health_check():
    # You can add database connection checks or other service checks here
    return {
        "status": "healthy",
        "version": "1.0.0",
        "timestamp": datetime.now().isoformat(),
    }
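
As a concrete starting point, the sketch below extends the endpoint (using the same app object as above) with a database check. check_database is a hypothetical helper; replace its body with a trivial query (for example SELECT 1) against whatever connection pool your application actually uses.

python
from datetime import datetime

async def check_database() -> bool:
    """Hypothetical helper: return True if a trivial query succeeds."""
    try:
        # e.g. run "SELECT 1" against your connection pool here
        return True
    except Exception:
        return False

@app.get("/health")
async def health_check():
    db_ok = await check_database()
    return {
        "status": "healthy" if db_ok else "degraded",
        "database": "up" if db_ok else "down",
        "version": "1.0.0",
        "timestamp": datetime.now().isoformat(),
    }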

External Monitoring

You can use services like:

  • Prometheus for metrics (see the sketch after this list)
  • Grafana for visualization
  • Sentry for error tracking
  • Datadog or New Relic for application performance monitoring
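
For the Prometheus option, one minimal approach (a sketch, assuming the prometheus_client package is installed) is to mount its ASGI metrics app under /metrics and count requests in a middleware:

python
# metrics.py -- expose Prometheus metrics from FastAPI (a sketch)
from fastapi import FastAPI, Request
from prometheus_client import Counter, make_asgi_app

app = FastAPI()

# Count every HTTP request handled by the application, by method and path
REQUEST_COUNT = Counter("app_requests_total", "Total HTTP requests", ["method", "path"])

@app.middleware("http")
async def count_requests(request: Request, call_next):
    REQUEST_COUNT.labels(method=request.method, path=request.url.path).inc()
    return await call_next(request)

# Serve metrics in Prometheus text format at /metrics
app.mount("/metrics", make_asgi_app())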

Performance Optimization Tips

  1. Use async efficiently: Make sure I/O-bound operations are properly awaited
  2. Connection pooling: Use connection pools for databases
  3. Caching: Implement Redis or other caching mechanisms
  4. Database optimization: Use proper indexes and optimize queries
  5. Use background tasks: For processing heavy operations (see the sketch after this list)
  6. Limit request body size: Prevent excessive memory usage
  7. Implement rate limiting: Protect against DoS attacks
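
For tip 5, FastAPI ships a BackgroundTasks helper that defers work until after the response is sent. A small sketch (send_welcome_email is a hypothetical placeholder for any slow operation):

python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def send_welcome_email(address: str) -> None:
    # Placeholder for a slow operation (email, report generation, etc.)
    print(f"Sending welcome email to {address}")

@app.post("/signup")
async def signup(email: str, background_tasks: BackgroundTasks):
    # The task runs after the response has been sent to the client
    background_tasks.add_task(send_welcome_email, email)
    return {"message": "Signup received"}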

Scaling Strategies

Horizontal Scaling

Deploy multiple instances behind a load balancer:

[Client] → [Load Balancer] → [FastAPI Instance 1]
                           → [FastAPI Instance 2]
                           → [FastAPI Instance 3]

Vertical Scaling

Increase resources (CPU, memory) for your server.

Summary

Setting up a production server for FastAPI involves:

  1. Choosing the right ASGI server (Uvicorn, Gunicorn with Uvicorn workers, or Hypercorn)
  2. Configuring appropriate worker counts and connection limits
  3. Setting up logging and monitoring
  4. Using containers for deployment consistency
  5. Implementing health checks and performance optimizations

By following these best practices, your FastAPI application will be well-equipped to handle production traffic with improved reliability, performance, and scalability.


Exercises

  1. Set up a FastAPI application with Gunicorn and Uvicorn workers using a configuration file.
  2. Create a Docker container for your FastAPI application with a production server.
  3. Implement a comprehensive health check endpoint that checks database connectivity.
  4. Configure logging to rotate log files daily and archive them after a week.
  5. Implement a load test to determine the optimal number of workers for your application.

