API Monitoring with Prometheus
Introduction
API monitoring is a critical part of modern application observability. As applications increasingly rely on microservices and distributed architectures, APIs serve as the communication channels between services. Effective API monitoring helps ensure that these channels remain healthy, performant, and reliable.
In this guide, we'll explore how Prometheus can be used to implement comprehensive API monitoring strategies. You'll learn about key metrics to track, how to configure Prometheus for API monitoring, and practical examples that demonstrate real-world applications.
Why Monitor APIs?
Before diving into the implementation details, let's understand why API monitoring is essential:
- Service Health: APIs are the lifelines of your applications. If they fail, your entire system may become unavailable.
- Performance Insights: Monitoring provides data about response times, helping you identify and address bottlenecks.
- Usage Patterns: Understanding how your APIs are used helps in capacity planning and resource allocation.
- Error Detection: Monitoring helps detect errors and failures quickly, minimizing impact on users.
- SLA Compliance: Many organizations have Service Level Agreements (SLAs) that require tracking API performance metrics.
Key API Metrics to Monitor
When monitoring APIs with Prometheus, consider tracking these essential metrics:
1. Request Rate
The number of requests your API receives over time.
rate(http_requests_total[5m])
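Under the hood, rate() takes counter samples inside the window and divides the increase by the elapsed time. A simplified sketch of that calculation in plain JavaScript (sample timestamps and values are invented; real Prometheus also extrapolates to the window boundaries and handles counter resets):

```javascript
// Approximate what rate(http_requests_total[5m]) computes: the
// per-second increase of a counter over the samples in the window.
function simpleRate(samples) {
  // samples: [{ t: unixSeconds, v: counterValue }, ...] sorted by t
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  return (last.v - first.v) / (last.t - first.t);
}

// Four scrapes 15s apart; the counter grew by 90 requests in 45s.
const samples = [
  { t: 0, v: 100 },
  { t: 15, v: 130 },
  { t: 30, v: 160 },
  { t: 45, v: 190 },
];
console.log(simpleRate(samples)); // 2 requests/second
```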
2. Error Rate
The percentage of requests that result in errors.
rate(http_requests_total{status_code=~"5.."}[5m]) / rate(http_requests_total[5m])
3. Response Time
How long it takes your API to respond to requests.
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
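histogram_quantile works on cumulative bucket counters: it finds the bucket containing the target rank and interpolates linearly between that bucket's bounds. A rough illustration in plain JavaScript (the bucket bounds and counts are made up for the example):

```javascript
// Estimate a quantile from cumulative histogram buckets, roughly the
// way histogram_quantile does: locate the bucket holding the target
// rank, then interpolate linearly inside it.
function histogramQuantile(q, buckets) {
  // buckets: [{ le: upperBound, count: cumulativeCount }, ...] ascending
  const total = buckets[buckets.length - 1].count;
  const rank = q * total;
  let lowerBound = 0;
  let lowerCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      const fraction = (rank - lowerCount) / (b.count - lowerCount);
      return lowerBound + (b.le - lowerBound) * fraction;
    }
    lowerBound = b.le;
    lowerCount = b.count;
  }
  return lowerBound;
}

// 100 observations: 60 under 0.1s, 90 under 0.3s, all under 0.5s.
const buckets = [
  { le: 0.1, count: 60 },
  { le: 0.3, count: 90 },
  { le: 0.5, count: 100 },
];
console.log(histogramQuantile(0.95, buckets)); // ~0.4 seconds
```

This is also why bucket boundaries matter: the estimate can never be more precise than the bucket the quantile falls into.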
4. Resource Utilization
CPU, memory, and network usage associated with your API service.
sum(rate(process_cpu_seconds_total{job="api-service"}[5m])) by (instance)
5. Availability
Whether your API is up and responding.
up{job="api-service"}
Setting Up API Monitoring with Prometheus
Let's walk through the process of configuring Prometheus to monitor an API:
Step 1: Instrument Your API
First, you need to instrument your API code to expose metrics. Here's an example using a Node.js Express application with the prom-client library:
const express = require('express');
const promClient = require('prom-client');

// Create a Registry to register metrics
const register = new promClient.Registry();
promClient.collectDefaultMetrics({ register });

// Create custom metrics
const httpRequestDurationSeconds = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});

const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

// Register the metrics
register.registerMetric(httpRequestDurationSeconds);
register.registerMetric(httpRequestsTotal);

const app = express();

// Middleware to collect metrics
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    // Prefer the route pattern (e.g. /api/users/:id) over the raw URL
    // to keep label cardinality bounded
    const route = req.route ? req.route.path : req.path;
    httpRequestDurationSeconds
      .labels(req.method, route, res.statusCode)
      .observe(duration / 1000); // Convert milliseconds to seconds
    httpRequestsTotal
      .labels(req.method, route, res.statusCode)
      .inc();
  });
  next();
});

// Endpoint to expose metrics
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// API endpoints
app.get('/api/users', (req, res) => {
  res.json({ users: ['Alice', 'Bob', 'Charlie'] });
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});
Step 2: Configure Prometheus to Scrape Your API
Create or update your prometheus.yml configuration file to include your API service:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'api-service'
    static_configs:
      - targets: ['localhost:3000']
Step 3: Visualize and Alert on API Metrics
After collecting metrics, you can create dashboards in Grafana to visualize them. Here's a simple Prometheus query to monitor API request rates by endpoint:
sum(rate(http_requests_total[5m])) by (route)
Practical Example: Monitoring a RESTful API
Let's walk through a complete example of monitoring a RESTful API that provides user data.
Architecture Diagram
Key Components:
- REST API Service: Exposes endpoints for user data and a /metrics endpoint
- Prometheus Server: Scrapes metrics from the API service
- Grafana: Provides visualization of API metrics
- Alert Manager: Sends alerts when API metrics exceed thresholds
Alert Rules Example
Create api_alerts.yml to define alert rules:
groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High API Error Rate"
          description: "Error rate is above 5% for the last 5 minutes ({{ $value | printf \"%.2f\" }})"
      - alert: SlowResponseTime
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow API Response Time"
          description: "95th percentile of API response time is above 1 second ({{ $value | printf \"%.2f\" }}s)"
      - alert: HighRequestRate
        expr: sum(rate(http_requests_total[5m])) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API Request Rate"
          description: "API is receiving more than 100 requests per second ({{ $value | printf \"%.2f\" }})"
Add this to your Prometheus configuration:
rule_files:
  - "api_alerts.yml"
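The for: 5m clause means the alert expression must stay true for the whole duration before the alert fires; until then it is only pending. That pending/firing behavior can be mimicked in plain JavaScript (thresholds match the HighErrorRate rule above; this evaluator is a simplified stand-in for what Prometheus does each evaluation interval):

```javascript
// Minimal 'for' clause semantics: the condition must hold continuously
// for `forSeconds` before the alert moves from pending to firing.
function makeAlertEvaluator(threshold, forSeconds) {
  let pendingSince = null;
  return function evaluate(value, nowSeconds) {
    if (value <= threshold) {
      pendingSince = null; // condition cleared: reset the timer
      return 'inactive';
    }
    if (pendingSince === null) pendingSince = nowSeconds;
    return nowSeconds - pendingSince >= forSeconds ? 'firing' : 'pending';
  };
}

// HighErrorRate: error ratio > 5% sustained for 300 seconds.
const highErrorRate = makeAlertEvaluator(0.05, 300);
console.log(highErrorRate(0.10, 0));   // 'pending'
console.log(highErrorRate(0.10, 300)); // 'firing'
console.log(highErrorRate(0.01, 315)); // 'inactive'
```

A brief spike in error rate therefore never pages anyone, which is exactly the point of the for clause.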
Recording Rules for API Metrics
Recording rules can pre-compute frequently used or complex queries. Create api_recording_rules.yml:
groups:
  - name: api_recording_rules
    rules:
      - record: api:request_rate:5m
        expr: sum(rate(http_requests_total[5m])) by (route)
      - record: api:error_rate:5m
        expr: sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (route) / sum(rate(http_requests_total[5m])) by (route)
      - record: api:request_duration_p95:5m
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))
Red Flags and Common Issues
When monitoring APIs with Prometheus, watch out for:
- High Cardinality: Be careful with label combinations that can create too many time series. For example, avoid using user IDs or session IDs as labels.
- Missing Instrumentation: Ensure all critical API endpoints are instrumented.
- Misleading Averages: Don't rely solely on average response times; use percentiles (histogram_quantile) for a more accurate picture.
- Blind Spots: Make sure to monitor both successful requests and failures.
- Alert Fatigue: Configure meaningful alert thresholds to avoid unnecessary notifications.
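The cardinality point is worth making concrete: the number of time series a metric produces is the product of the distinct values of each label. A quick back-of-the-envelope helper (the label-value counts are hypothetical):

```javascript
// Series count grows multiplicatively with label values.
function seriesCount(labelValueCounts) {
  return labelValueCounts.reduce((product, n) => product * n, 1);
}

// 5 methods x 50 routes x 10 status codes = 2,500 series: manageable.
console.log(seriesCount([5, 50, 10])); // 2500

// Add a user_id label with 100,000 values and it explodes.
console.log(seriesCount([5, 50, 10, 100000])); // 250000000
```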
Advanced API Monitoring Techniques
Circuit Breaker Metrics
If your API implements circuit breakers, monitor their states:
const circuitBreakerState = new promClient.Gauge({
  name: 'circuit_breaker_state',
  help: 'Circuit breaker state (0=open, 1=half-open, 2=closed)',
  labelNames: ['service']
});

// Register and update the metric when the circuit breaker state changes
register.registerMetric(circuitBreakerState);
circuitBreakerState.set({ service: 'database' }, 2); // Closed
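To make that gauge meaningful, something has to drive the state transitions. A toy circuit breaker using the same 0/1/2 encoding (the failure threshold and reset timeout are illustrative; in a real service each transition is where you would call circuitBreakerState.set):

```javascript
// Encoding matches the gauge above: 0=open, 1=half-open, 2=closed.
const STATE = { OPEN: 0, HALF_OPEN: 1, CLOSED: 2 };

class CircuitBreaker {
  constructor(failureThreshold, resetTimeoutMs) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = STATE.CLOSED;
    this.openedAt = 0;
  }
  recordSuccess() {
    this.failures = 0;
    this.state = STATE.CLOSED;
  }
  recordFailure(nowMs) {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.state = STATE.OPEN;
      this.openedAt = nowMs;
    }
  }
  // Call before each request; may move open -> half-open after the timeout.
  currentState(nowMs) {
    if (this.state === STATE.OPEN && nowMs - this.openedAt >= this.resetTimeoutMs) {
      this.state = STATE.HALF_OPEN;
    }
    return this.state;
  }
}

const breaker = new CircuitBreaker(3, 10000);
breaker.recordFailure(0);
breaker.recordFailure(1);
breaker.recordFailure(2);                 // third failure trips the breaker
console.log(breaker.currentState(3));     // 0 (open)
console.log(breaker.currentState(10002)); // 1 (half-open after timeout)
breaker.recordSuccess();
console.log(breaker.currentState(10003)); // 2 (closed)
```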
Rate Limiting Metrics
Monitor rate limiting to understand API usage patterns:
const rateLimitHits = new promClient.Counter({
  name: 'rate_limit_hits_total',
  help: 'Total number of rate limit hits',
  labelNames: ['client_id', 'endpoint']
});
register.registerMetric(rateLimitHits);

// Increment when a client hits the rate limit.
// Only label by client_id if the set of clients is small and bounded;
// otherwise this runs into the high-cardinality problem noted above.
rateLimitHits.inc({ client_id: 'client123', endpoint: '/api/users' });
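A token-bucket limiter is a common source for that counter. A minimal version in plain JavaScript (the capacity and refill rate are made-up numbers; a real limiter would call rateLimitHits.inc on each rejection):

```javascript
// Token bucket: refill at `ratePerSec`, reject when the bucket is empty.
class TokenBucket {
  constructor(capacity, ratePerSec) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity;
    this.lastRefill = 0;
  }
  allow(nowSec) {
    const elapsed = nowSec - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerSec);
    this.lastRefill = nowSec;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false;  // rate limit hit: increment rate_limit_hits_total here
  }
}

// A 2-token bucket refilling 1 token per second.
const bucket = new TokenBucket(2, 1);
console.log(bucket.allow(0)); // true
console.log(bucket.allow(0)); // true
console.log(bucket.allow(0)); // false (bucket empty)
console.log(bucket.allow(1)); // true  (one token refilled)
```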
Summary
Effective API monitoring with Prometheus involves:
- Instrumenting your API code to expose relevant metrics
- Configuring Prometheus to scrape these metrics
- Setting up dashboards to visualize API performance
- Creating alert rules to notify when issues occur
- Using recording rules to optimize query performance
By implementing these practices, you can ensure your APIs remain reliable, performant, and meet your service level objectives.
Exercises
- Basic Setup: Instrument a simple API with Prometheus metrics and set up a scrape configuration.
- Dashboard Creation: Create a Grafana dashboard showing key API metrics.
- Alert Configuration: Configure alerts for high error rates and slow response times.
- Custom Metric: Implement a custom metric for tracking business-specific API information.
- High-Cardinality Challenge: Solve a high-cardinality problem by reconsidering label usage.