PromQL Basics

Introduction

PromQL (Prometheus Query Language) is the query language of Prometheus. Grafana uses it to retrieve and manipulate time-series data from Prometheus data sources, so if you build Grafana dashboards or alerts on top of Prometheus, understanding PromQL is essential.

In this guide, we'll explore the fundamentals of PromQL, its syntax, and how to construct queries that help you extract meaningful insights from your metrics data.

What is PromQL?

PromQL is a functional query language specifically designed for time-series data. It allows you to:

Select and filter time-series data
Perform mathematical operations on data
Aggregate data across multiple time series
Calculate rates of change
Create complex expressions for monitoring and alerting

Let's dive into the basics of writing PromQL queries.

PromQL Data Types

Before we start writing queries, it's important to understand the four types that a PromQL expression can evaluate to:

Instant Vector - A set of time series containing a single sample for each time series, all sharing the same timestamp
Range Vector - A set of time series containing a range of data points over time for each time series
Scalar - A simple numeric floating-point value
String - A simple string value (rarely used in PromQL)
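As a quick illustration, each of the queries below produces a different type. This is a minimal sketch that reuses the http_requests_total counter from this guide; any metric in your own Prometheus server would behave the same way.

```promql
# Instant vector: the most recent sample of every series named http_requests_total
http_requests_total

# Range vector: every sample from the last 5 minutes for each of those series
http_requests_total[5m]

# Scalar: a plain number, or a function such as time() that returns one
42
time()
```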

Basic Query Syntax

Selecting Metrics

The most basic PromQL query is simply the name of a metric:

http_requests_total

This query returns an instant vector containing all time series with the metric name http_requests_total.
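Under the hood, the metric name is stored in a special label called __name__, so the bare metric name is shorthand for a label matcher. The sketch below assumes the same http_requests_total metric; the second query shows how this lets you select several metrics by name pattern.

```promql
# Selects exactly the same series as the bare query http_requests_total
{__name__="http_requests_total"}

# Every metric whose name starts with http_
{__name__=~"http_.*"}
```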

Using Labels and Label Matchers

To filter time series, you can use label matchers:

http_requests_total{status="200", method="GET"}

This query selects only the time series with the metric name http_requests_total where status equals "200" and method equals "GET".

PromQL supports several matching operators:

=: Exact match
!=: Not equal
=~: Regex match
!~: Does not match regex

Example with regex matching:

http_requests_total{status=~"5.."}

This matches all HTTP requests with status codes starting with 5 (5xx errors).
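The other operators work the same way. One detail worth knowing: regular expressions use RE2 syntax and are fully anchored, so {status=~"5"} matches only the exact value "5", not "500". The examples below are a sketch that assumes the method and status labels used above.

```promql
# Every method except GET
http_requests_total{method!="GET"}

# Either GET or POST (alternation inside an anchored regex)
http_requests_total{method=~"GET|POST"}

# Every response that is not a 2xx
http_requests_total{status!~"2.."}
```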

Range Vectors

To select data points over time, you can use range vectors by appending a time range selector:

http_requests_total[5m]

This selects, for each matching series, all samples recorded in the 5 minutes up to the query's evaluation time.

Common time units:

ms - milliseconds
s - seconds
m - minutes
h - hours
d - days (always 24 hours)
w - weeks (always 7 days)
y - years (always 365 days)
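Units can be combined into a single duration, as long as they are ordered from longest to shortest. Note also that a raw range vector cannot be drawn as a graph: it must first be reduced to one value per series, for example by rate() or an *_over_time() function. The sketch below assumes node_load1, the 1-minute load average gauge exported by node_exporter.

```promql
# Combined duration: a 90-minute range
rate(http_requests_total[1h30m])

# max_over_time() turns each series' 10 minutes of samples into a single value
max_over_time(node_load1[10m])
```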

Operators and Functions

Arithmetic Operators

PromQL supports basic arithmetic operators:

node_memory_MemTotal_bytes - node_memory_MemFree_bytes

This subtracts free memory from total memory. Note that the result includes memory used for buffers and page cache, not only memory used by applications.
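Arithmetic also works between a vector and a scalar, and between two vectors. In the vector-vector case, PromQL pairs up series that have identical label sets. The following sketch assumes standard node_exporter metric names.

```promql
# Vector-scalar: convert total memory of every instance from bytes to GiB
node_memory_MemTotal_bytes / 1024 / 1024 / 1024

# Vector-vector: available space as a percentage of each filesystem's size
node_filesystem_avail_bytes / node_filesystem_size_bytes * 100
```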

Aggregation Operators

To combine multiple time series, you can use aggregation operators:

sum(http_requests_total) by (status)

This sums the cumulative request counters across all series, producing one result per status code.

Common aggregation operators:

sum
min
max
avg
count
topk
bottomk
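The grouping clause can be written either before or after the parenthesized expression, and without (...) is the inverse of by (...). topk and bottomk additionally take the number of series to return as their first argument. The examples below are a sketch reusing http_requests_total; job is the label Prometheus attaches to every scraped target, and up is the per-target health metric Prometheus records itself.

```promql
# by (...) keeps only the listed labels in the result
sum by (status) (rate(http_requests_total[5m]))

# without (...) keeps every label except the listed ones
sum without (instance) (rate(http_requests_total[5m]))

# The 3 series with the highest request rate
topk(3, rate(http_requests_total[5m]))

# Number of healthy scrape targets per job
count by (job) (up == 1)
```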

Rate Function

One of the most commonly used functions is rate(), which calculates the per-second average rate of increase of a counter over a time range:

rate(http_requests_total[5m])

This gives you the per-second rate of HTTP requests over the last 5 minutes.
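When you combine rate() with aggregation, the usual pattern is to apply rate() to the raw counters first and aggregate afterwards, because rate() detects counter resets on each individual series. The sketch below assumes the method label used earlier; the second query shows irate(), a related function.

```promql
# Per-second request rate, summed per HTTP method
sum by (method) (rate(http_requests_total[5m]))

# irate() uses only the last two samples in the range, so it reacts
# faster to changes but produces a noisier graph than rate()
irate(http_requests_total[5m])
```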

Practical Examples

Let's look at some real-world examples of PromQL queries that you might use in Grafana dashboards:

Example 1: Error Rate Percentage

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

This query calculates the percentage of HTTP 5xx errors relative to all requests over the last 5 minutes.
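The query above produces a single number for everything Prometheus scrapes. If you want one error percentage per service, a variant that keeps the job label on both sides of the division might look like this (a sketch; it assumes each service is scraped under its own job):

```promql
# 5xx error percentage, computed separately for each job
sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
  /
sum by (job) (rate(http_requests_total[5m]))
  * 100
```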

Example 2: CPU Usage by Node

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

This query calculates the CPU usage percentage for each instance: it averages the per-second idle time across all CPU cores of an instance, converts it to a percentage, and subtracts that from 100.
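A common variant uses rate() instead, which averages over the whole 5-minute window and gives a smoother line. Dropping the mode filter and grouping by mode as well gives a per-mode breakdown. Both queries below are sketches that assume node_exporter's node_cpu_seconds_total metric.

```promql
# Smoother CPU usage percentage per instance
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Percentage of CPU time spent in each mode (user, system, iowait, ...)
avg by (instance, mode) (rate(node_cpu_seconds_total[5m])) * 100
```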

Example 3: Memory Usage Percentage

(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100

This calculates the memory usage percentage by determining how much memory is neither free nor cached.
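On Linux, node_exporter also exposes node_memory_MemAvailable_bytes, the kernel's own estimate of how much memory is available to new applications, including reclaimable cache and buffers. A simpler alternative based on that metric (a sketch, assuming a Linux host) is:

```promql
# Memory in use as a percentage, based on the kernel's MemAvailable estimate
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
```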

Complex Patterns

Delta vs Rate

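For counters, you'll usually choose between rate() and increase(). delta() looks similar, but it is meant for gauges:

rate() - calculates the per-second average rate of increase of a counter
increase() - calculates the total increase of a counter over the range
delta() - calculates the difference between the first and last value of a gauge over the range; don't use it with counters, because it does not handle counter resets

Example:

# Per-second rate of CPU time spent in user mode
rate(node_cpu_seconds_total{mode="user"}[5m])

# Total increase in CPU time spent in user mode over 5 minutes
increase(node_cpu_seconds_total{mode="user"}[5m])

# Change in a gauge (here, a hardware temperature) over the last 2 hours
delta(node_hwmon_temp_celsius[2h])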
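For counters, increase() is defined as rate() multiplied by the number of seconds in the range, so the two expressions below should return the same values. This is a small sketch reusing the http_requests_total counter from earlier sections.

```promql
# Total requests over the last 5 minutes
increase(http_requests_total[5m])

# The same value, computed as per-second rate times 300 seconds
rate(http_requests_total[5m]) * 300
```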

Predicting Resource Exhaustion

You can use linear prediction to estimate when a resource might run out:

predict_linear(node_filesystem_free_bytes[1h], 4 * 3600)

This predicts the amount of free disk space 4 hours (4 * 3600 seconds) in the future, using simple linear regression over the samples from the last hour.
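Adding a comparison turns the prediction into a filter that returns only the filesystems at risk, which is a common shape for an alerting expression. The sketch below assumes node_exporter's fstype label and excludes tmpfs mounts as an illustrative choice.

```promql
# Filesystems whose free space is predicted to drop below zero within 4 hours
predict_linear(node_filesystem_free_bytes{fstype!="tmpfs"}[1h], 4 * 3600) < 0
```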

PromQL in Grafana

When using PromQL in Grafana:

Select your Prometheus data source
Choose Builder or Code mode in the query editor (in Code mode, the Metrics browser helps you find metric and label names)
Enter your PromQL expression
Use the time range selector in Grafana to adjust the query time frame
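Grafana can also substitute variables into a query before sending it to Prometheus. In the sketch below, $job is a hypothetical dashboard template variable you would define yourself, while $__rate_interval is a Grafana built-in for Prometheus data sources that picks a rate() window of at least four times the configured scrape interval.

```promql
# Per-instance request rate for the job(s) selected in the $job variable
sum by (instance) (rate(http_requests_total{job=~"$job"}[$__rate_interval]))
```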

Common Pitfalls and Best Practices

Pitfalls to Avoid

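Using rate() or increase() over very short ranges - The range must contain at least two samples, so choose a range that spans several scrape intervals (a common rule of thumb is at least four times the scrape interval)
Comparing metrics with different labels - Binary operators only pair up series with identical label sets; use on() or ignoring() when the label sets differ
Forgetting to use rate functions for counters - Raw counter values only ever increase and reset to zero when a process restarts, so use rate(), irate(), or increase() instead of graphing the raw value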
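For example, when one side of a division carries a label that the other side lacks, ignoring() tells PromQL to match on the remaining labels, and group_left allows several left-hand series to match one right-hand series. This sketch reuses http_requests_total and computes each 5xx status code's share of a job's traffic.

```promql
# Left side: one series per (job, status); right side: one series per job
sum by (job, status) (rate(http_requests_total{status=~"5.."}[5m]))
  / ignoring(status) group_left
sum by (job) (rate(http_requests_total[5m]))
```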

Best Practices

Start simple - Begin with basic queries and gradually add complexity
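Use comments - PromQL ignores everything after a # on a line, so you can document complex queries inline
Watch for cardinality - Metrics with many unique label combinations (for example, a label holding user IDs) increase memory usage and slow down queries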
Use template variables - In Grafana, leverage template variables to make queries reusable
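To see which metrics contribute the most series, a commonly used query counts series per metric name. It also shows an inline comment. Treat it as a diagnostic sketch: it touches every series on the server, so it can be slow on a large Prometheus instance.

```promql
# Ten metric names with the most time series
topk(10, count by (__name__) ({__name__=~".+"}))
```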

Visualizing PromQL Query Execution

[Diagram: how a PromQL query is processed (image not available)]

Summary

PromQL is a powerful query language that enables you to extract meaningful insights from your time-series data in Grafana. In this guide, we've covered:

PromQL data types and basic syntax
Filtering metrics using labels
Working with range vectors
Using operators and functions
Creating practical queries for real-world scenarios

By mastering these fundamentals, you'll be able to create effective Grafana dashboards and alerts that provide valuable insights into your systems' performance.

Additional Resources

Exercises

Write a PromQL query to show the rate of HTTP requests per second, grouped by endpoint.
Create a query to calculate the 95th percentile response time for your application.
Develop a query that alerts when disk usage exceeds 80% and is predicted to reach 100% within 24 hours.
Write a query to show the top 5 processes consuming the most CPU.
Create a query to calculate the request success rate (non-5xx responses) as a percentage.
