Docker Monitoring Stack

Introduction

When running applications in Docker containers, monitoring becomes essential to ensure optimal performance, detect issues early, and maintain system reliability. A Docker monitoring stack provides visibility into container health, resource usage, and application performance metrics.

In this tutorial, we'll build a complete monitoring solution for Docker environments using popular open-source tools. By the end, you'll have a functional monitoring stack that tracks container metrics, visualizes data, and alerts you when things go wrong.

Why Monitor Docker Containers?

Docker containers are ephemeral by nature - they can be created, destroyed, or replaced frequently. This dynamic nature makes traditional monitoring approaches insufficient. Here's why dedicated container monitoring is crucial:

  • Resource Utilization: Track CPU, memory, network, and disk usage to prevent resource starvation
  • Container Health: Monitor container states, restarts, and lifecycle events
  • Application Performance: Measure how your containerized applications perform under load
  • Troubleshooting: Quickly identify and resolve issues in complex container environments
  • Capacity Planning: Make informed decisions about infrastructure scaling based on historical data

Components of a Docker Monitoring Stack

A comprehensive Docker monitoring stack typically combines a metrics collector, time-series storage, a visualization layer, and an alerting system. In our implementation, we'll use:

  • Prometheus: For collecting and storing metrics
  • Node Exporter: For host-level metrics
  • cAdvisor: For container-level metrics
  • Grafana: For visualization and dashboards
  • Alertmanager: For handling alerts

Setting Up the Monitoring Stack

Let's implement our monitoring stack step by step using Docker Compose.

Step 1: Create the Project Structure

First, create a directory for your monitoring stack project:

```bash
mkdir docker-monitoring-stack
cd docker-monitoring-stack
```

Create the necessary subdirectories:

```bash
mkdir -p prometheus/config
mkdir -p grafana/provisioning/dashboards
mkdir -p grafana/provisioning/datasources
mkdir -p alertmanager
```

Step 2: Configure Prometheus

Create a Prometheus configuration file at prometheus/config/prometheus.yml:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
```

Create an alert rules file at prometheus/config/alert_rules.yml:

```yaml
groups:
  - name: example
    rules:
      - alert: HighContainerCPU
        expr: (sum by(name) (rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100) > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "Container {{ $labels.name }} is using high CPU ({{ $value }}%)"

      - alert: HighContainerMemory
        expr: (container_memory_usage_bytes{image!=""} / container_spec_memory_limit_bytes{image!=""} * 100) > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Container {{ $labels.name }} is using high memory ({{ $value }}%)"
```
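The memory alert is a plain usage/limit percentage. As a quick sanity check of that arithmetic, here is a small shell sketch; the variable names mirror the Prometheus metrics, but the byte values are made up:

```shell
# Hypothetical sample values standing in for the metrics
# container_memory_usage_bytes and container_spec_memory_limit_bytes.
usage_bytes=450000000
limit_bytes=500000000

# Same formula as the alert expression: usage / limit * 100
pct=$(awk -v u="$usage_bytes" -v l="$limit_bytes" 'BEGIN { printf "%d", u / l * 100 }')
echo "memory usage: ${pct}%"

# The alert only fires once this stays above 80 for the full 'for: 1m' window.
if [ "$pct" -gt 80 ]; then
  echo "HighContainerMemory would fire"
fi
```

With these sample numbers the container sits at 90% of its limit, so the threshold check passes and the alert would move to pending, then firing after one minute.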

Step 3: Configure Alertmanager

Create an Alertmanager configuration file at alertmanager/config.yml:

```yaml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:5001/'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
```

Step 4: Configure Grafana

Create a Grafana datasource configuration file at grafana/provisioning/datasources/datasource.yml:

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
```
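The grafana/provisioning/dashboards directory created in Step 1 can hold a dashboard provider definition so dashboards load automatically at startup. A minimal sketch at grafana/provisioning/dashboards/dashboard.yml (the provider name and path are assumptions you can adjust):

```yaml
apiVersion: 1

providers:
  - name: 'default'   # arbitrary provider name
    orgId: 1
    folder: ''        # load into the General folder
    type: file
    options:
      path: /var/lib/grafana/dashboards   # dashboard JSON files placed here are picked up
```

With this in place, any dashboard JSON files mounted into /var/lib/grafana/dashboards inside the Grafana container are imported automatically, which is handy for keeping dashboards in version control.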

Step 5: Create the Docker Compose File

Create a docker-compose.yml file in the root directory:

```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/config:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    volumes:
      - ./alertmanager:/etc/alertmanager
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
    ports:
      - "9093:9093"
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data:
  grafana_data:
```

Step 6: Start the Monitoring Stack

Launch the entire monitoring stack with the following command:

```bash
docker-compose up -d
```

This will start all the services defined in your Docker Compose file. You can check the status with:

```bash
docker-compose ps
```

Expected output:

```
     Name                    Command                 State            Ports
--------------------------------------------------------------------------------------
alertmanager     /bin/alertmanager --config ...     Up      0.0.0.0:9093->9093/tcp
cadvisor         /usr/bin/cadvisor -logtostderr     Up      0.0.0.0:8080->8080/tcp
grafana          /run.sh                            Up      0.0.0.0:3000->3000/tcp
node-exporter    /bin/node_exporter --path. ...     Up      0.0.0.0:9100->9100/tcp
prometheus       /bin/prometheus --config.f ...     Up      0.0.0.0:9090->9090/tcp
```

Accessing the Monitoring Tools

After starting the stack, you can access each component through your web browser:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000
  • cAdvisor: http://localhost:8080
  • Alertmanager: http://localhost:9093
  • Node Exporter metrics: http://localhost:9100/metrics

Using Prometheus for Metrics Collection

Prometheus collects time-series data from your containers and host system. Navigate to http://localhost:9090 to access the Prometheus web interface.

Exploring Available Metrics

To see all available metrics, go to the Prometheus UI and click on the "Graph" tab. You can then use the dropdown to browse metrics or start typing to search.

Some useful container metrics to explore:

  • container_cpu_usage_seconds_total: Total CPU time consumed
  • container_memory_usage_bytes: Current memory usage
  • container_network_receive_bytes_total: Network bytes received
  • container_network_transmit_bytes_total: Network bytes transmitted

Try a simple query to see CPU usage for all containers:

```
rate(container_cpu_usage_seconds_total{image!=""}[1m]) * 100
```
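Under the hood, rate() is approximately the change in the counter divided by the window length. A rough shell sketch with made-up counter samples illustrates the arithmetic:

```shell
# Two hypothetical samples of container_cpu_usage_seconds_total,
# taken 60 seconds apart (values are cumulative CPU-seconds consumed).
prev=120.0
curr=150.0

# rate(...[1m]) is roughly (curr - prev) / 60; multiplying by 100
# converts the per-second CPU fraction to a percentage.
cpu_pct=$(awk -v p="$prev" -v c="$curr" 'BEGIN { printf "%.1f", (c - p) / 60 * 100 }')
echo "${cpu_pct}% CPU"
```

A container that burned 30 CPU-seconds over a 60-second window was using half a core on average, i.e. 50% in this query's units.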

Visualizing Data with Grafana

Grafana provides powerful visualization capabilities for your monitoring data. Let's create a simple dashboard for container metrics.

  1. Log in to Grafana at http://localhost:3000 with username admin and password admin
  2. Create a new dashboard by clicking on "+ Create" > "Dashboard"
  3. Add a new panel by clicking "Add panel"
  4. Select "Prometheus" as the data source
  5. Enter the following query for container CPU usage:
    sum by(name) (rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100
  6. Configure the panel title, axes, and visualization settings
  7. Save the dashboard

Example Dashboard Configuration

Here's an example of a basic container monitoring dashboard with multiple panels:

  1. Container CPU Usage:

    • Query: sum by(name) (rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100
    • Format: Graph
    • Units: Percent
    • Title: "Container CPU Usage (%)"
  2. Container Memory Usage:

    • Query: container_memory_usage_bytes{image!=""} / 1024 / 1024
    • Format: Graph
    • Units: Megabytes
    • Title: "Container Memory Usage (MB)"
  3. Container Network Usage:

    • Queries (add one series for received and one for transmitted traffic):
      sum by(name) (rate(container_network_receive_bytes_total{image!=""}[1m]))
      sum by(name) (rate(container_network_transmit_bytes_total{image!=""}[1m]))
    • Format: Graph
    • Units: Bytes/second
    • Title: "Container Network Traffic"
  4. Container Status:

    • Query: sum by(name, state) (container_tasks_state{image!=""})
    • Format: Table
    • Title: "Container States"

Setting Up Alerts

Prometheus and Alertmanager work together to provide alerting capabilities. We've already configured some basic alerts in the prometheus/config/alert_rules.yml file.

Let's examine how these alerts work:

  1. HighContainerCPU alert:

    • Triggers when a container uses more than 80% CPU for 1 minute
    • Expression: (sum by(name) (rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100) > 80
  2. HighContainerMemory alert:

    • Triggers when a container uses more than 80% of its memory limit for 1 minute
    • Expression: (container_memory_usage_bytes{image!=""} / container_spec_memory_limit_bytes{image!=""} * 100) > 80

You can view the status of your alerts in Prometheus by navigating to the "Alerts" tab. When an alert fires, it will be sent to Alertmanager, which handles notification routing.

In a production environment, you'd configure Alertmanager to send notifications via email, Slack, PagerDuty, or other channels.
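As a sketch of what that might look like, here is a Slack receiver for alertmanager/config.yml; the webhook URL and channel are placeholders you would replace with your own:

```yaml
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder incoming-webhook URL
        channel: '#alerts'
        send_resolved: true
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

route:
  receiver: 'slack-notifications'
```

Setting send_resolved: true makes Alertmanager also notify when an alert clears, which keeps the channel self-documenting.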

Extending Your Monitoring Stack

The basic monitoring stack we've set up provides a solid foundation, but there are several ways to enhance it:

  1. Application-specific metrics: Instrument your applications to expose Prometheus metrics for business-specific KPIs
  2. Additional exporters: Add exporters for databases, message queues, and other services
  3. High availability: Configure Prometheus and Alertmanager for high availability
  4. Remote storage: Configure Prometheus to use long-term storage solutions
  5. Service discovery: Use Prometheus service discovery for dynamic environments
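For the service-discovery point, recent Prometheus versions ship a Docker service-discovery mechanism that can replace hand-maintained static_configs. A minimal sketch of such a scrape job, assuming the Docker socket is mounted into the Prometheus container:

```yaml
scrape_configs:
  - job_name: 'docker'
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Use the discovered container name as the instance label
      - source_labels: [__meta_docker_container_name]
        target_label: instance
```

With this, new containers are discovered automatically instead of requiring a config edit and reload for every target.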

Adding Redis Monitoring Example

Let's add Redis monitoring to our stack as an example of extending the monitoring capabilities:

  1. Add the Redis exporter to your docker-compose.yml:

```yaml
  redis-exporter:
    image: oliver006/redis_exporter:latest
    container_name: redis-exporter
    restart: unless-stopped
    command:
      - '--redis.addr=redis://redis:6379'
    ports:
      - "9121:9121"
    networks:
      - monitoring
```

  2. Add a Redis service:

```yaml
  redis:
    image: redis:latest
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    networks:
      - monitoring
```

  3. Update the Prometheus configuration to scrape Redis metrics:

```yaml
  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['redis-exporter:9121']
```

  4. Restart the stack to apply changes:

```bash
docker-compose up -d
```

Best Practices for Docker Monitoring

When implementing a monitoring solution for Docker environments, keep these best practices in mind:

  1. Monitor both containers and applications: Collect system-level and application-level metrics
  2. Set meaningful alerts: Focus on actionable alerts that indicate real problems
  3. Use labels consistently: Apply consistent labels to your containers for better filtering
  4. Retain historical data: Configure appropriate data retention based on your needs
  5. Document dashboards: Add documentation to dashboards so others can understand them
  6. Test your alerting: Regularly test that alerts are triggered and notifications are sent
  7. Monitor the monitoring: Set up monitoring for your monitoring stack itself

Troubleshooting Common Issues

Here are solutions to common issues you might encounter:

Issue: Prometheus can't scrape targets

  • Check network connectivity between containers
  • Verify target endpoints are accessible
  • Check Prometheus configuration for errors

Issue: High resource usage by monitoring stack

  • Adjust scrape intervals
  • Configure appropriate retention periods
  • Use recording rules for frequently used queries
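On the last point, a recording rule precomputes an expensive expression at evaluation time so dashboards query the stored result instead. A sketch that could live alongside alert_rules.yml; the rule name follows the conventional level:metric:operations pattern:

```yaml
groups:
  - name: container_recording_rules
    rules:
      # Precompute the per-container CPU percentage used throughout this tutorial
      - record: name:container_cpu_usage:rate1m
        expr: sum by(name) (rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100
```

Dashboards and alerts can then reference name:container_cpu_usage:rate1m directly, which is both faster and keeps the expression in one place.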

Issue: Missing container metrics

  • Verify cAdvisor is running properly
  • Check container labels and filtering rules
  • Ensure Docker is configured to expose metrics

Summary

In this tutorial, we've built a comprehensive Docker monitoring stack using Prometheus, Grafana, cAdvisor, Node Exporter, and Alertmanager. This stack provides visibility into container and host metrics, visualization capabilities, and alerting for problematic conditions.

Key takeaways:

  • Container monitoring requires specialized tools due to the ephemeral nature of containers
  • Prometheus provides a powerful platform for metrics collection and alerting
  • Grafana enables creation of informative dashboards
  • A complete monitoring solution includes metrics collection, storage, visualization, and alerting
  • Extending the stack with additional exporters allows monitoring of specific applications

Additional Resources

To deepen your knowledge about Docker monitoring, the official documentation for each tool is the best next stop:

  • Prometheus documentation
  • Grafana documentation
  • cAdvisor project README
  • Node Exporter project README
  • Alertmanager documentation

Exercises

  1. Add a new exporter for MySQL or PostgreSQL to the monitoring stack
  2. Create a custom Grafana dashboard showing container health metrics
  3. Configure Alertmanager to send notifications to a webhook endpoint
  4. Implement a recording rule in Prometheus for a frequently used query
  5. Create an alert for containers that restart frequently

By completing these exercises, you'll gain hands-on experience with the different components of the Docker monitoring stack and be well-prepared to implement monitoring in your own container environments.


