PromQL Introduction
What is PromQL?
PromQL (Prometheus Query Language) is the query language built specifically for Prometheus, a powerful open-source monitoring and alerting system. PromQL allows you to select and aggregate time series data in real time, enabling you to:
- Explore and visualize metrics
- Create alerts based on metric thresholds
- Generate dashboards for monitoring system behavior
- Perform complex calculations on collected metrics
As a domain-specific language designed for time series data, PromQL makes it possible to extract valuable insights from your metrics with concise, expressive queries.
Understanding the Prometheus Data Model
Before diving into PromQL syntax, let's understand how Prometheus stores data:
- Time Series: Prometheus stores all data as time series - streams of timestamped values that belong to the same metric and set of labeled dimensions.
- Metrics: Each time series is uniquely identified by its metric name and optional key-value pairs called labels.
- Labels: These provide multi-dimensional data capabilities, allowing you to identify specific instances or characteristics of what's being measured.
A typical time series identifier in Prometheus looks like this:
http_requests_total{status="200", method="GET", instance="10.0.0.1:5000"}
Here:
- http_requests_total is the metric name
- status="200", method="GET", and instance="10.0.0.1:5000" are labels
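To make the idea of series identity concrete, here is a minimal Python sketch. It is not how Prometheus stores data internally; the helper series_id is a hypothetical name introduced only for illustration. It shows that a series is identified by its metric name together with its full label set, and that label order is irrelevant.

```python
# Hypothetical model: a series identity is the metric name plus a set of labels.
def series_id(name, **labels):
    """Return a hashable identity for a time series."""
    return (name, frozenset(labels.items()))

a = series_id("http_requests_total", status="200", method="GET", instance="10.0.0.1:5000")
b = series_id("http_requests_total", method="GET", instance="10.0.0.1:5000", status="200")
c = series_id("http_requests_total", status="500", method="GET", instance="10.0.0.1:5000")

same_series = a == b       # same labels in a different order: the same series
different_series = a != c  # one label value differs: a separate series
print(same_series, different_series)
```

Changing any single label value produces a new series, which is why label choice has a direct effect on how many series Prometheus has to track.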
PromQL Data Types
PromQL works with four main data types:
- Instant Vector: A set of time series containing a single sample for each time series, all sharing the same timestamp.
- Range Vector: A set of time series containing a range of data points over time for each time series.
- Scalar: A simple numeric floating point value.
- String: A simple string value (currently only used for internal purposes).
For most queries, you'll work primarily with instant vectors and range vectors.
Basic PromQL Queries
Selecting Metrics
The simplest query selects a metric by name:
http_requests_total
This returns an instant vector containing every time series with the metric name http_requests_total.
Filtering with Labels
You can filter time series using label selectors:
http_requests_total{status="200"}
This selects only time series where the status label equals "200".
You can use multiple label matchers:
http_requests_total{status="200", method="GET"}
Label Matching Operators
PromQL supports several matching operators:
- =: Exact match
- !=: Not equal
- =~: Regex match
- !~: Regex not match
Examples:
# All HTTP requests except those with status code 200
http_requests_total{status!="200"}
# All HTTP requests with status codes starting with "4" (client errors)
http_requests_total{status=~"4.."}
# All HTTP requests except those with status codes starting with "5" (server errors)
http_requests_total{status!~"5.."}
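The matcher semantics can be sketched in Python as follows. This is an illustrative model, not Prometheus code: the function matches and the sample series are assumptions made for the example. Two details it tries to capture are that regex matchers are fully anchored (the pattern must match the whole label value) and that a missing label behaves like an empty string. Prometheus uses RE2 regex syntax, so Python's re module is only an approximation for simple patterns like these.

```python
import re

def matches(labels, name, op, value):
    """Apply one label matcher to a dict of labels.

    A missing label is treated as the empty string, and regular
    expressions must match the whole value (they are fully anchored).
    """
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")

# Hypothetical label sets standing in for http_requests_total series
series = [
    {"status": "200", "method": "GET"},
    {"status": "404", "method": "GET"},
    {"status": "503", "method": "POST"},
]

client_errors = [s for s in series if matches(s, "status", "=~", "4..")]
not_server_errors = [s for s in series if matches(s, "status", "!~", "5..")]
print(client_errors)
print(not_server_errors)
```

Because of the anchoring, status=~"4.." matches "404" but not "4040"; to match a prefix you would write "4.*" instead.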
Time Windows with Range Vectors
Range vectors select data over a time period. Add a time range selector [time] after a metric:
http_requests_total{status="200"}[5m]
This returns every sample recorded over the last 5 minutes for each matching time series.
Common time units:
- s - seconds
- m - minutes
- h - hours
- d - days
- w - weeks
- y - years
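As a rough illustration of what these units mean, the Python sketch below converts a duration string to seconds. The function duration_seconds is a hypothetical helper, not part of any Prometheus client. It assumes the fixed conversions PromQL uses (a day is 24h, a week is 7d, a year is 365d) and simplifies in two ways: it ignores the ms unit and does not check that units in a combined duration appear from largest to smallest.

```python
import re

# Seconds per unit: a day is 24h, a week is 7d, a year is 365d.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def duration_seconds(text):
    """Convert a duration such as "5m" or "1h30m" to seconds."""
    if not re.fullmatch(r"(\d+[smhdwy])+", text):
        raise ValueError(f"not a valid duration: {text!r}")
    parts = re.findall(r"(\d+)([smhdwy])", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)

print(duration_seconds("5m"))
print(duration_seconds("1h30m"))
```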
Basic Aggregation Operations
Range Vector Functions
These functions take a range vector and analyze how each series changes over time:
rate(): Calculates the per-second average rate of increase of the time series in a range vector.
rate(http_requests_total{status="200"}[5m])
This shows the rate of 200 status requests per second, averaged over 5 minutes.
irate(): Calculates an instant rate based on the last two data points.
irate(http_requests_total{status="200"}[5m])
Use irate() only for graphing fast-moving, volatile counters. Because it looks at just the last two samples, prefer rate() for alerts and slow-moving counters.
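The difference between the two functions is easier to see with numbers. The Python sketch below is a simplified model, not the actual Prometheus implementation: the real rate() also extrapolates to the edges of the time window, which this version skips. The function names and sample data are assumptions for illustration. It does model one important behavior, which is that a drop in a counter's value is treated as a reset to zero rather than a negative increase.

```python
def increase_with_resets(samples):
    """Total increase of a counter, treating any drop in value as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total

def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        return None  # not enough data in the window
    elapsed = samples[-1][0] - samples[0][0]
    return increase_with_resets(samples) / elapsed

def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    if len(samples) < 2:
        return None
    return simple_rate(samples[-2:])

# (timestamp in seconds, counter value), scraped every 60 seconds
window = [(0, 100), (60, 160), (120, 220), (180, 280), (240, 580)]
print(simple_rate(window))   # averaged over the whole window
print(simple_irate(window))  # dominated by the final spike
```

With this data the averaged rate smooths out the burst in the last interval, while the instant rate reports only that burst. That is why irate() graphs look spikier and why it is a poor fit for alert conditions.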
Aggregation Operators
PromQL provides aggregation operators that combine values across multiple time series:
sum: Add values
sum(rate(http_requests_total[5m])) by (status)
This adds up request rates grouped by status code.
avg: Calculate average
avg(rate(http_requests_total[5m])) by (instance)
min/max: Find minimum/maximum values
max(rate(node_cpu_seconds_total{mode="system"}[5m])) by (instance)
count: Count number of elements
count(up)
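Counts how many targets Prometheus is scraping, whether or not they are healthy, because up is 1 for a reachable target and 0 for an unreachable one. To count only the targets that are up, use count(up == 1) or sum(up).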
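The grouping behavior of the by clause can be sketched in Python. This is an illustrative model under stated assumptions: the function aggregate and the sample vector are hypothetical, and an instant vector is represented as a list of (labels, value) pairs. Series are grouped by the listed labels, every other label is dropped, and each group is reduced to one value.

```python
from collections import defaultdict

def aggregate(vector, by, func):
    """Group samples by the listed labels and reduce each group with func."""
    groups = defaultdict(list)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: func(values) for key, values in groups.items()}

# A hypothetical instant vector: (labels, per-second request rate)
vector = [
    ({"status": "200", "instance": "a"}, 12.0),
    ({"status": "200", "instance": "b"}, 8.0),
    ({"status": "500", "instance": "a"}, 0.5),
]

sum_by_status = aggregate(vector, ["status"], sum)
avg_by_instance = aggregate(vector, ["instance"], lambda v: sum(v) / len(v))
count_all = aggregate(vector, [], len)  # no "by" clause: one group for everything
print(sum_by_status)
print(avg_by_instance)
print(count_all)
```

Note that the result of sum by (status) carries only the status label; the instance label is gone, so it cannot be used in later filtering or matching.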
Operators in PromQL
PromQL supports various arithmetic and comparison operators:
Arithmetic Operators
- + (addition)
- - (subtraction)
- * (multiplication)
- / (division)
- % (modulo)
- ^ (power)
Example:
# Calculate HTTP error percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
Comparison Operators
- == (equal)
- != (not equal)
- > (greater than)
- < (less than)
- >= (greater than or equal)
- <= (less than or equal)
Practical Examples
Example 1: Monitoring CPU Usage
To monitor the CPU usage of your instances:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
This calculates the percentage of CPU that is not idle (i.e., used) per instance.
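The arithmetic behind this query can be checked by hand. For each CPU core, the rate of the idle counter is the fraction of each second that core spent idle, a number between 0 and 1. The Python sketch below works through one instance; the four core names and their idle fractions are made-up values for illustration.

```python
# Hypothetical idle-time rates for one instance with four CPU cores.
idle_rates = {"cpu0": 0.90, "cpu1": 0.70, "cpu2": 0.80, "cpu3": 0.60}

# avg by (instance): average the idle fraction across the cores
avg_idle = sum(idle_rates.values()) / len(idle_rates)

# 100 - (idle fraction * 100): the percentage of CPU time that was not idle
cpu_used_percent = 100 - avg_idle * 100
print(round(cpu_used_percent, 2))
```

Averaging rather than summing keeps the result on a 0 to 100 scale regardless of how many cores the instance has.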
Example 2: Memory Usage
To check memory usage percentage:
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
Example 3: High Latency Requests
Find series whose 90th-percentile latency is over 0.5s (this assumes the metric is a summary that exposes a quantile label):
http_request_duration_seconds{quantile="0.9"} > 0.5
Example 4: Error Rate Alert Condition
An alert condition for when error rate exceeds 5%:
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
/
sum(rate(http_requests_total[5m])) by (job) > 0.05
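Two behaviors make this expression work, and both can be sketched in Python. First, a binary operator between two vectors pairs up elements with identical label sets, so each job's error rate is divided by the same job's total rate. Second, a comparison against a number acts as a filter: series that fail the test are dropped rather than returned as false. The functions divide and greater_than and the sample rates are assumptions for illustration, and the sketch assumes no denominator is zero.

```python
def divide(numerator, denominator):
    """Divide two vectors element by element, pairing entries with the same labels.

    Entries present on only one side produce no result.
    Assumes no denominator value is zero.
    """
    return {
        key: numerator[key] / denominator[key]
        for key in numerator
        if key in denominator
    }

def greater_than(vector, threshold):
    """Comparison against a scalar acts as a filter: failing series are dropped."""
    return {key: value for key, value in vector.items() if value > threshold}

# Hypothetical per-job request rates after sum(...) by (job)
errors = {("job", "api"): 3.0, ("job", "web"): 0.2}
total = {("job", "api"): 40.0, ("job", "web"): 50.0, ("job", "batch"): 5.0}

error_ratio = divide(errors, total)
firing = greater_than(error_ratio, 0.05)
print(firing)
```

Here only the api job exceeds 5%, so it is the only series left in the result. The batch job has no 5xx series at all, so it never appears in the ratio.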
Visualizing the Query Evaluation Process
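PromQL evaluates a nested query from the inside out. Taking sum(rate(http_requests_total[5m])) by (status) as an example:
- http_requests_total[5m] selects a range vector: the last 5 minutes of samples for every matching series.
- rate(...) reduces each series to a single per-second value, producing an instant vector.
- sum(...) by (status) adds those values together within each group of series that share the same status label.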
Best Practices for Writing PromQL
- Start simple: Begin with basic queries before building complexity
- Use clear metric names: Follow Prometheus naming conventions
- Use labels effectively: Don't overuse labels, but use enough for proper filtering
- Consider cardinality: Be cautious of high-cardinality labels
- Use rate() for counters: Wrap counters in rate(), irate(), or increase() rather than graphing or alerting on their raw, ever-growing values
- Use appropriate time windows: Choose a range that spans several scrape intervals, since rate() needs at least two samples in the window, and that suits how quickly your data changes
- Test with Prometheus UI: Verify queries in the Prometheus expression browser before using them in dashboards or alerts
Summary
PromQL is a powerful query language designed specifically for time-series data in Prometheus. In this introduction, we've covered:
- The Prometheus data model and how metrics are stored
- Basic data types in PromQL
- How to select and filter metrics
- Working with time ranges using range vectors
- Common aggregation operations and functions
- Practical examples for real-world monitoring scenarios
With these fundamentals, you're ready to start writing your own queries to extract valuable insights from your metrics.
Further Learning
Here are some exercises to practice your PromQL skills:
- Write a query to show the 95th percentile of HTTP request durations across all your services.
- Create a query that shows the top 5 endpoints with the highest error rates.
- Develop a query that shows the rate of increase in memory usage over the last hour.
- Write an expression that could be used for alerting when disk space is running low (below 10% free).
As you continue learning PromQL, remember that the official Prometheus documentation is an excellent resource for deeper exploration of advanced functions and techniques.