Prometheus Introduction
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit that has become a cornerstone of modern infrastructure observability. Originally built at SoundCloud in 2012, Prometheus was later donated to the Cloud Native Computing Foundation (CNCF) where it became the second project to graduate, after Kubernetes.
Prometheus excels at monitoring highly dynamic container environments, microservices architectures, and cloud-native applications. It provides powerful querying capabilities, a reliable alerting system, and basic built-in graphing, with richer dashboards typically handled by tools such as Grafana.
Core Concepts
Before diving into practical examples, let's understand some fundamental Prometheus concepts:
Time Series Data
Prometheus collects and stores all data as time series - streams of timestamped values belonging to the same metric with the same set of labeled dimensions. Each time series is uniquely identified by:
- Its metric name (e.g., `http_requests_total`)
- A set of key-value pairs called labels (e.g., `{method="GET", endpoint="/api/users"}`)
Metrics Types
Prometheus supports four core metric types (a short instrumentation sketch follows the list):
- Counter: A cumulative metric that can only increase or be reset to zero (e.g., number of requests served)
- Gauge: A metric that can increase and decrease (e.g., memory usage)
- Histogram: Samples observations and counts them in configurable buckets (e.g., request durations)
- Summary: Similar to histogram, but also calculates configurable quantiles
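The hands-on example later in this section only uses a counter and a histogram, so here is a minimal sketch of the other two types using the same prom-client library used below; the metric names and sample values are invented purely for illustration.

```javascript
const promClient = require('prom-client');

// Gauge: a value that can go up and down, e.g. jobs currently waiting in a queue
// (metric name is hypothetical)
const queueDepth = new promClient.Gauge({
  name: 'app_queue_depth',
  help: 'Number of jobs currently waiting in the queue'
});
queueDepth.set(12);   // set to an absolute value
queueDepth.inc();     // now 13
queueDepth.dec(3);    // now 10

// Summary: observes values and reports configurable quantiles computed client-side
const jobDuration = new promClient.Summary({
  name: 'app_job_duration_seconds',
  help: 'Duration of background jobs in seconds',
  percentiles: [0.5, 0.9, 0.99]
});
jobDuration.observe(0.42);
```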
Pull-Based Architecture
Unlike many monitoring systems that push metrics to a central server, Prometheus follows a pull-based approach: the Prometheus server regularly scrapes (pulls) metrics over HTTP from instrumented applications or services. This approach offers several advantages:
- You can run a Prometheus server on your laptop or in a test environment and scrape the same targets while developing changes
- You can more easily tell whether a target is down, since every scrape either succeeds or fails (see the `up` query below)
- Multiple Prometheus servers can scrape the same targets without causing issues
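One practical consequence of pulling is that Prometheus records the outcome of every scrape in a built-in `up` metric, so spotting a down target is a one-line query (the `node-app` job name matches the example configuration later in this section):

```promql
# 1 if the most recent scrape of a target succeeded, 0 if it failed
up

# Targets of the example "node-app" job that are currently down
up{job="node-app"} == 0
```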
Key Components of Prometheus
The Prometheus ecosystem consists of multiple components:
- Prometheus Server: The core component that scrapes and stores time series data
- Client Libraries: Libraries for instrumenting application code
- Pushgateway: For supporting short-lived jobs
- Exporters: Tools that expose metrics from existing systems
- Alertmanager: Handles alerts sent by the Prometheus server
- Visualization Tools: Such as Grafana or the built-in Prometheus web UI
Getting Started with Prometheus
Let's start with a basic setup to demonstrate how Prometheus works.
Installing Prometheus
The easiest way to get started with Prometheus is to use Docker:
```bash
docker run -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
```
Basic Configuration
Prometheus uses YAML for configuration. Here's a minimal `prometheus.yml` file:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```
This configuration tells Prometheus to:
- Scrape targets every 15 seconds
- Monitor itself (Prometheus exposes its own metrics on port 9090)
Exploring the Prometheus UI
Once Prometheus is running, you can access the web UI at http://localhost:9090. The interface provides several tabs:
- Graph: For querying and visualizing metrics
- Status: Shows runtime information, targets, and service discovery
- Alerts: Displays configured alerts and their current state
- Help: Documentation and configuration information
Instrumenting an Application
Let's create a simple example of how to instrument a Node.js application with Prometheus, using the express and prom-client packages:
```javascript
const express = require('express');
const promClient = require('prom-client');

// Create a Registry to register the metrics
const register = new promClient.Registry();
promClient.collectDefaultMetrics({ register });

// Create a counter for HTTP requests
const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'path', 'status'],
  registers: [register]
});

// Create a histogram for request durations
const httpRequestDurationMs = new promClient.Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'path', 'status'],
  buckets: [10, 50, 100, 500, 1000, 5000],
  registers: [register]
});

const app = express();

// Middleware to measure request duration
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    httpRequestsTotal.inc({
      method: req.method,
      path: req.path,
      status: res.statusCode
    });
    httpRequestDurationMs.observe(
      { method: req.method, path: req.path, status: res.statusCode },
      duration
    );
  });
  next();
});

// Sample endpoint
app.get('/api/users', (req, res) => {
  setTimeout(() => {
    res.json({ users: ['Alice', 'Bob', 'Charlie'] });
  }, Math.floor(Math.random() * 100));
});

// Expose metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});
```
How This Works:
- We created two metrics: a counter (`http_requests_total`) to track the total number of requests, and a histogram (`http_request_duration_ms`) to measure request durations
- We added middleware to measure each request and update our metrics
- We exposed a `/metrics` endpoint that Prometheus can scrape (sample output below)
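For reference, the `/metrics` endpoint returns plain text in the Prometheus exposition format. The values below are illustrative, and most series (other histogram buckets, the default Node.js process metrics) are omitted:

```
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/users",status="200"} 42

# HELP http_request_duration_ms Duration of HTTP requests in ms
# TYPE http_request_duration_ms histogram
http_request_duration_ms_bucket{method="GET",path="/api/users",status="200",le="100"} 37
http_request_duration_ms_sum{method="GET",path="/api/users",status="200"} 1853
http_request_duration_ms_count{method="GET",path="/api/users",status="200"} 42
# ... remaining buckets (le="10", "50", ..., "+Inf") and default metrics omitted
```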
Prometheus Configuration to Scrape Our Application
To make Prometheus scrape our application, we need to add a job to our `prometheus.yml`:
```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-app'
    static_configs:
      - targets: ['localhost:3000']
```
Note that if Prometheus itself runs in a Docker container, `localhost` refers to that container; in that case you may need to point the target at the host machine instead (for example, `host.docker.internal:3000` on Docker Desktop).
Querying Prometheus with PromQL
Prometheus has its own query language called PromQL (Prometheus Query Language). Here are some basic query examples:
Simple Queries
```promql
# Get the total number of HTTP requests
http_requests_total

# Filter by label
http_requests_total{method="GET"}

# Rate of requests over the last 5 minutes
rate(http_requests_total[5m])
```
Aggregation
```promql
# Total requests across all endpoints
sum(http_requests_total)

# Requests per method
sum by(method) (http_requests_total)

# 95th percentile request duration for each endpoint
histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[5m])) by (path, le))
```
Alerting Basics
Prometheus can trigger alerts when certain conditions are met. Alerting rules are defined in a separate file, typically called `alerts.yml`:
```yaml
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[5m])) by (path, le)) > 200
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High request latency on {{ $labels.path }}"
          description: "95th percentile latency is above 200ms (current value: {{ $value }}ms)"
```
This alert will fire if the 95th percentile of request latency exceeds 200ms for 5 minutes.
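For Prometheus to actually evaluate this rule, the file must be listed under `rule_files` in `prometheus.yml`, and firing alerts are forwarded to an Alertmanager for routing and delivery. A minimal sketch, assuming an Alertmanager running locally on its default port 9093:

```yaml
rule_files:
  - 'alerts.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']  # assumed local Alertmanager instance
```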
Real-World Use Cases
Prometheus is used in various scenarios:
Container Monitoring
In Kubernetes environments, Prometheus can monitor the following (see the example scrape configuration after this list):
- Node-level metrics (CPU, memory, disk, network)
- Pod and container metrics
- Application-specific metrics
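Rather than static target lists, Kubernetes setups typically use Prometheus' built-in Kubernetes service discovery. The sketch below shows one common pattern, not the only one: it discovers all pods and keeps only those that opt in via the conventional (not built-in) `prometheus.io/scrape` annotation.

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod          # discover every pod in the cluster
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
```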
Microservices Monitoring
For microservices architectures, Prometheus helps track:
- Service health and availability
- Request rates, error rates, and latencies
- Inter-service dependencies and communication patterns
Database Monitoring
Using exporters, Prometheus can monitor databases such as the following (see the example scrape job after this list):
- PostgreSQL
- MySQL
- MongoDB
- Redis
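Each exporter is a small process that translates a database's internal statistics into Prometheus metrics, and Prometheus scrapes the exporter like any other target. A sketch for PostgreSQL, assuming a postgres_exporter instance listening on its usual default port 9187:

```yaml
scrape_configs:
  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']  # assumed postgres_exporter listen address
```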
Best Practices
When implementing Prometheus monitoring:
- Choose meaningful metric names that follow conventions:
  - Use prefixes for application-specific metrics (`app_*`)
  - Use suffixes that describe the unit (`*_seconds`, `*_bytes`)
- Use labels effectively but avoid high cardinality:
  - Good: `http_requests_total{method="GET", status="200", endpoint="/api/users"}`
  - Bad: `http_requests_total{user_id="12345"}` (too many possible values)
- Monitor the Four Golden Signals:
  - Latency (response time)
  - Traffic (requests per second)
  - Errors (rate of failed requests)
  - Saturation (how "full" your system is)
- Set up recording rules for frequently used or complex queries to improve performance (see the sketch after this list)
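As an example, the 95th-percentile latency query used earlier is a reasonable candidate for a recording rule, since it is relatively expensive to re-evaluate on every dashboard refresh. The rule name below follows the common `level:metric:operations` naming convention but is otherwise arbitrary:

```yaml
groups:
  - name: latency_recording_rules
    rules:
      # Precompute the 95th percentile request duration per path
      - record: path:http_request_duration_ms:p95
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[5m])) by (path, le))
```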
Summary
Prometheus provides a powerful monitoring solution with:
- A multi-dimensional data model based on time series data
- A flexible query language (PromQL)
- A pull-based architecture that's reliable and simple
- No reliance on distributed storage
- A range of client libraries and integrations
Through this introduction, you've learned about Prometheus' core concepts, how to set it up, instrument an application, write queries, and configure basic alerts. Prometheus forms the foundation of modern observability stacks and is an essential tool for anyone working with cloud-native applications.
Additional Resources
Exercises
- Install Prometheus using Docker and configure it to monitor itself
- Create a simple web application and instrument it with Prometheus metrics
- Write PromQL queries to:
- Calculate the error rate (percentage of HTTP status ≥ 400)
- Find the three endpoints with the highest latency
- Monitor the rate of incoming requests per second
- Configure an alert for when your application's error rate exceeds 5%