PromQL Introduction

What is PromQL?

PromQL (Prometheus Query Language) is the query language built specifically for Prometheus, a powerful open-source monitoring and alerting system. PromQL allows you to select and aggregate time series data in real time, enabling you to:

  • Explore and visualize metrics
  • Create alerts based on metric thresholds
  • Generate dashboards for monitoring system behavior
  • Perform complex calculations on collected metrics

As a domain-specific language designed for time series data, PromQL makes it possible to extract valuable insights from your metrics with concise, expressive queries.

Understanding the Prometheus Data Model

Before diving into PromQL syntax, let's understand how Prometheus stores data:

  1. Time Series: Prometheus stores all data as time series - streams of timestamped values that belong to the same metric and set of labeled dimensions.

  2. Metrics: Each time series is uniquely identified by its metric name and optional key-value pairs called labels.

  3. Labels: These provide multi-dimensional data capabilities, allowing you to identify specific instances or characteristics of what's being measured.

A typical metric in Prometheus looks like this:

http_requests_total{status="200", method="GET", instance="10.0.0.1:5000"}

Here:

  • http_requests_total is the metric name
  • status="200", method="GET", and instance="10.0.0.1:5000" are labels
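To make the identity rule concrete, here is a minimal Python sketch (not Prometheus code) that models a series identity as an unordered set of label pairs. It relies on the fact that Prometheus stores the metric name as a reserved label called `__name__`; the helper name `series_id` is made up for this illustration.

```python
def series_id(name, **labels):
    """Return a hashable identity for a time series.

    The metric name is folded in as the reserved ``__name__`` label, so the
    identity is simply the full, order-independent set of label pairs.
    """
    return frozenset({"__name__": name, **labels}.items())


a = series_id("http_requests_total", status="200", method="GET")
b = series_id("http_requests_total", method="GET", status="200")
c = series_id("http_requests_total", status="500", method="GET")

print(a == b)  # True: label order does not matter
print(a == c)  # False: a different label value is a different series
```

Changing any label value produces a new series, which is why labels with many distinct values multiply the number of series Prometheus has to store.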

PromQL Data Types

PromQL works with four main data types:

  1. Instant Vector: A set of time series containing a single sample for each time series, all sharing the same timestamp.

  2. Range Vector: A set of time series containing a range of data points over time for each time series.

  3. Scalar: A simple numeric floating point value.

  4. String: A simple string value (currently unused).

For most queries, you'll work primarily with instant vectors and range vectors.

Basic PromQL Queries

Selecting Metrics

The simplest query selects a metric by name:

promql
http_requests_total

This returns an instant vector containing every time series whose metric name is http_requests_total.

Filtering with Labels

You can filter time series using label selectors:

promql
http_requests_total{status="200"}

This selects only time series whose status label is exactly the string "200". Label values are always strings, which is why the value is quoted.

You can use multiple label matchers:

promql
http_requests_total{status="200", method="GET"}

Label Matching Operators

PromQL supports several matching operators:

  • =: Exact match
  • !=: Not equal
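  • =~: Regex match (the pattern is fully anchored, so it must match the entire label value)
  • !~: Regex not match (anchored in the same way)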

Examples:

promql
# All HTTP requests except those with status code 200
http_requests_total{status!="200"}

# All HTTP requests with status codes starting with "4" (client errors)
http_requests_total{status=~"4.."}

# All HTTP requests except those with status codes starting with "5" (server errors)
http_requests_total{status!~"5.."}
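The matcher semantics can be sketched in Python as follows. This is an illustrative model, not Prometheus code: the function name `matches` and the tuple layout are assumptions, and Python's `re` module stands in for the RE2 syntax that Prometheus uses. `re.fullmatch` mirrors the fact that regex matchers must match the whole label value.

```python
import re


def matches(labels, matchers):
    """Return True if a label set satisfies every (name, op, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


series = {"status": "404", "method": "GET"}
print(matches(series, [("status", "=~", "4..")]))  # True
print(matches(series, [("status", "=~", "4")]))    # False: the pattern is anchored
print(matches(series, [("status", "!~", "5.."), ("method", "=", "GET")]))  # True
```

Because of the anchoring, a pattern such as "4" only matches a label value that is exactly "4"; to match any value beginning with 4 you would write "4.*".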

Time Windows with Range Vectors

Range vectors select data over a time period. Add a time range selector [time] after a metric:

promql
http_requests_total{status="200"}[5m]

This returns a range vector: for each matching time series, every sample recorded during the 5 minutes before the evaluation time.
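As a rough model of what a range selector does for a single series, the Python sketch below keeps only the samples that fall inside the look-back window. The function name `select_range` and the sample data are hypothetical. The sketch assumes a left-open window; boundary handling has differed between Prometheus versions, so treat that detail as an assumption.

```python
def select_range(samples, eval_time, window):
    """Return the (timestamp, value) samples inside the look-back window.

    Assumption: the window is left-open, (eval_time - window, eval_time].
    """
    start = eval_time - window
    return [(ts, v) for ts, v in samples if start < ts <= eval_time]


# One series scraped every 60 s; timestamps are in seconds.
samples = [(0, 10.0), (60, 12.0), (120, 15.0), (180, 15.0),
           (240, 21.0), (300, 24.0), (360, 30.0), (420, 31.0)]

window = select_range(samples, eval_time=430, window=300)  # like [5m] at t=430
print(window)
# [(180, 15.0), (240, 21.0), (300, 24.0), (360, 30.0), (420, 31.0)]
```

The result still holds several samples per series, which is why a range vector is normally passed to a function such as rate() before it is graphed.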

Common time units:

  • s - seconds
  • m - minutes
  • h - hours
  • d - days
  • w - weeks
  • y - years

Basic Aggregation Operations

Range Vector Functions

These functions take a range vector and return an instant vector, which makes them useful for analyzing how a metric changes over time:

rate(): Calculates the per-second average rate of increase of the time series in a range vector.

promql
rate(http_requests_total{status="200"}[5m])

This shows the rate of 200 status requests per second, averaged over 5 minutes.
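The arithmetic behind rate() can be approximated with a short Python sketch. It is deliberately simplified: the real function also extrapolates towards the edges of the window, which is omitted here, and the name `simple_rate` and the sample values are made up for illustration.

```python
def simple_rate(samples):
    """Per-second average increase of a counter over the given samples.

    Simplification: Prometheus also extrapolates towards the window edges,
    which this sketch leaves out.
    """
    if len(samples) < 2:
        return None  # not enough data to compute a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example, by a restart),
        # so the new value counts as the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# (timestamp in seconds, counter value); the counter resets before t=180.
window = [(0, 100.0), (60, 160.0), (120, 220.0), (180, 60.0), (240, 120.0)]
print(simple_rate(window))  # 1.0 request per second
```

The reset handling is the reason counters should be wrapped in rate() before they are summed: subtracting raw values across a restart would give a large negative jump.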

irate(): Calculates an instant rate based on the last two data points.

promql
irate(http_requests_total{status="200"}[5m])

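Use irate() when graphing fast-moving, volatile counters. Prefer rate() for alerts and slow-moving counters, because irate() reacts to every brief spike and ignores all but the last two samples in the window.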
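A simplified Python model shows how a rate based on the last two samples differs from an average over the whole window. The function name `simple_irate` and the data are hypothetical.

```python
def simple_irate(samples):
    """Per-second rate computed from only the last two samples."""
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    # Treat a drop in value as a counter reset.
    delta = v_last - v_prev if v_last >= v_prev else v_last
    return delta / (t_last - t_prev)


# Steady traffic followed by a burst in the final scrape interval.
window = [(0, 0.0), (60, 60.0), (120, 120.0), (180, 180.0), (240, 540.0)]

print(simple_irate(window))                   # 6.0, driven by the final burst
print((window[-1][1] - window[0][1]) / 240)   # 2.25, the average over the window
```

The burst dominates the instant rate but is smoothed out in the window average, which is the trade-off between responsiveness and stability.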

Aggregation Operators

PromQL provides aggregation operators that combine the series of an instant vector into fewer series:

sum: Add values

promql
sum(rate(http_requests_total[5m])) by (status)

This adds up request rates grouped by status code.
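Grouping with by can be pictured as a group-by over label values. The Python sketch below is an illustration only; `sum_by` and the input vector are assumptions, and each element pairs a label set with a per-second rate.

```python
from collections import defaultdict


def sum_by(vector, by):
    """Sum sample values, grouping series by the given label names."""
    groups = defaultdict(float)
    for labels, value in vector:
        # Only the labels listed in `by` survive in the result.
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] += value
    return dict(groups)


# A hypothetical instant vector of per-second request rates.
vector = [
    ({"status": "200", "instance": "a"}, 12.0),
    ({"status": "200", "instance": "b"}, 8.0),
    ({"status": "500", "instance": "a"}, 0.5),
]

print(sum_by(vector, by=["status"]))
# {(('status', '200'),): 20.0, (('status', '500'),): 0.5}
```

The instance label disappears from the output because it is not listed in by; the two status="200" series collapse into a single result.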

avg: Calculate average

promql
avg(rate(http_requests_total[5m])) by (instance)

min/max: Find minimum/maximum values

promql
max(rate(node_cpu_seconds_total{mode="system"}[5m])) by (instance)

count: Count number of elements

promql
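count(up == 1)

Counts how many targets are currently up. The comparison matters: a plain count(up) counts every target that has an up series, including targets whose last scrape failed (up == 0).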
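Note that count counts series, not values, so it does not distinguish healthy targets from failed ones by itself. The small Python model below, built on hypothetical `up` samples, compares three ways of counting.

```python
# Hypothetical `up` samples: 1 means the last scrape succeeded, 0 that it failed.
up = {"10.0.0.1:5000": 1, "10.0.0.2:5000": 0, "10.0.0.3:5000": 1}

count_all = len(up)                                     # like count(up)
count_healthy = sum(1 for v in up.values() if v == 1)   # like count(up == 1)
sum_up = sum(up.values())                               # like sum(up)

print(count_all, count_healthy, sum_up)  # 3 2 2
```

Both count(up == 1) and sum(up) give the number of reachable targets, while count(up) gives the number of configured targets.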

Operators in PromQL

PromQL supports various arithmetic and comparison operators:

Arithmetic Operators

  • + (addition)
  • - (subtraction)
  • * (multiplication)
  • / (division)
  • % (modulo)
  • ^ (power)

Example:

promql
# Calculate HTTP error percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

Comparison Operators

  • == (equal)
  • != (not equal)
  • > (greater than)
  • < (less than)
  • >= (greater than or equal)
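  • <= (less than or equal)

By default, a comparison between an instant vector and a scalar acts as a filter: series for which the comparison is false are dropped from the result. Adding the bool modifier (for example, > bool 0.5) returns 0 or 1 for every series instead.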

Practical Examples

Example 1: Monitoring CPU Usage

To monitor the CPU usage of your instances:

promql
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

This calculates the percentage of CPU that is not idle (i.e., used) per instance.
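The arithmetic can be checked by hand with a few lines of Python. The per-core idle rates below are invented for the example; each value is the number of seconds per second that a core spent idle.

```python
# Hypothetical per-core idle rates for one instance, as returned by
# rate(node_cpu_seconds_total{mode="idle"}[5m]).
idle_rates = {"cpu0": 0.875, "cpu1": 0.75, "cpu2": 0.625, "cpu3": 0.75}

avg_idle = sum(idle_rates.values()) / len(idle_rates)  # avg by (instance)
cpu_used_percent = 100 - avg_idle * 100

print(cpu_used_percent)  # 25.0
```

Averaging across cores first is what keeps the result between 0 and 100 on multi-core machines.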

Example 2: Memory Usage

To check memory usage percentage:

promql
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

Example 3: High Latency Requests

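Find series whose 90th-percentile latency exceeds 0.5 seconds. This assumes http_request_duration_seconds is a summary metric that exposes a quantile label: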

promql
http_request_duration_seconds{quantile="0.9"} > 0.5

Example 4: Error Rate Alert Condition

An alert condition for when error rate exceeds 5%:

promql
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job) 
/
sum(rate(http_requests_total[5m])) by (job) > 0.05
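Two things happen in this expression: the division pairs up series that carry the same job label, and the comparison then keeps only the ratios above the threshold. The Python sketch below models both steps under simplifying assumptions; `ratio_above` and the per-job rates are hypothetical.

```python
def ratio_above(numerator, denominator, threshold):
    """Divide two vectors keyed by job and keep ratios above the threshold.

    Series without a partner on the other side produce no result, and
    ratios that fail the comparison are dropped, as with a PromQL filter.
    """
    result = {}
    for job, errors in numerator.items():
        total = denominator.get(job)
        if not total:
            # No matching series, or zero traffic. (PromQL would yield NaN
            # or Inf for a zero denominator; this sketch simply skips it.)
            continue
        ratio = errors / total
        if ratio > threshold:
            result[job] = ratio
    return result


# Hypothetical per-job rates in requests per second.
errors_5xx = {"api": 3.0, "web": 0.2}
all_requests = {"api": 40.0, "web": 50.0, "batch": 5.0}

print(ratio_above(errors_5xx, all_requests, 0.05))  # {'api': 0.075}
```

Only the api job is returned: web is below the threshold, and batch has no 5xx series to divide, so it never appears in the result. In an alerting rule, each series that survives the filter becomes a firing alert.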

Visualizing the Query Evaluation Process

[Figure: how PromQL evaluates a common query]
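One way to picture the evaluation of a query such as sum by (status) (rate(http_requests_total[5m])) is as a four-stage pipeline: select series, cut out the time window, compute a rate per series, then aggregate. The Python sketch below walks through those stages on invented data. It is a mental model under stated simplifications (no counter-reset handling, no extrapolation, a left-open window), not a description of the Prometheus engine; `SERIES` and `evaluate` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical stored series: (labels, [(timestamp_seconds, counter_value), ...]).
SERIES = [
    ({"__name__": "http_requests_total", "status": "200", "instance": "a"},
     [(0, 0.0), (150, 300.0), (300, 600.0)]),
    ({"__name__": "http_requests_total", "status": "200", "instance": "b"},
     [(0, 0.0), (150, 150.0), (300, 300.0)]),
    ({"__name__": "http_requests_total", "status": "500", "instance": "a"},
     [(0, 0.0), (150, 15.0), (300, 30.0)]),
    ({"__name__": "up", "instance": "a"},
     [(0, 1.0), (150, 1.0), (300, 1.0)]),
]


def evaluate(metric, window, eval_time, by):
    totals = defaultdict(float)
    for labels, samples in SERIES:
        # Stage 1: selector, keep only series with the requested metric name.
        if labels["__name__"] != metric:
            continue
        # Stage 2: range selector, keep samples inside the look-back window.
        start = eval_time - window
        in_window = [(t, v) for t, v in samples if start < t <= eval_time]
        if len(in_window) < 2:
            continue  # a rate needs at least two samples
        # Stage 3: simplified rate(), first to last sample in the window.
        (t0, v0), (t1, v1) = in_window[0], in_window[-1]
        per_second = (v1 - v0) / (t1 - t0)
        # Stage 4: sum by (...), group on the requested label and add up.
        totals[labels.get(by, "")] += per_second
    return dict(totals)


print(evaluate("http_requests_total", window=300, eval_time=300, by="status"))
# {'200': 3.0, '500': 0.1}
```

Reading a query from the innermost expression outwards, in this same order, is usually the easiest way to debug it in the expression browser.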

Best Practices for Writing PromQL

  1. Start simple: Begin with basic queries before building complexity
  2. Use clear metric names: Follow Prometheus naming conventions
  3. Use labels effectively: Don't overuse labels, but use enough for proper filtering
  4. Consider cardinality: Be cautious of high-cardinality labels
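  5. Use rate() for counters: A counter's raw value only ever increases and resets to zero on restart, so query counters through rate(), irate(), or increase() rather than graphing or alerting on the raw value
  6. Use appropriate time windows: Choose a range that spans several scrape intervals (a common rule of thumb is at least four times the scrape interval), because rate() needs at least two samples in the window to return a value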
  7. Test with Prometheus UI: Verify queries in the Prometheus expression browser before using them in dashboards or alerts

Summary

PromQL is a powerful query language designed specifically for time-series data in Prometheus. In this introduction, we've covered:

  • The Prometheus data model and how metrics are stored
  • Basic data types in PromQL
  • How to select and filter metrics
  • Working with time ranges using range vectors
  • Common aggregation operations and functions
  • Practical examples for real-world monitoring scenarios

With these fundamentals, you're ready to start writing your own queries to extract valuable insights from your metrics.

Further Learning

Here are some exercises to practice your PromQL skills:

  1. Write a query to show the 95th percentile of HTTP request durations across all your services.
  2. Create a query that shows the top 5 endpoints with the highest error rates.
  3. Develop a query that shows the rate of increase in memory usage over the last hour.
  4. Write an expression that could be used for alerting when disk space is running low (below 10% free).

As you continue learning PromQL, remember that the official Prometheus documentation is an excellent resource for deeper exploration of advanced functions and techniques.


