PromQL Operators

Introduction

PromQL (Prometheus Query Language) is the powerful query language used in Prometheus, a leading open-source monitoring and alerting toolkit. Operators are fundamental building blocks in PromQL that allow you to manipulate, combine, and transform time series data. Understanding operators is crucial for crafting effective queries to analyze metrics and create meaningful visualizations or alerts.

In this guide, we'll explore the various types of operators in PromQL, their syntax, and how to apply them in real-world monitoring scenarios.

Operator Types in PromQL

PromQL operators can be categorized into several groups:

  1. Arithmetic Operators: Perform mathematical calculations on time series
  2. Comparison Operators: Compare values and filter time series
  3. Logical/Set Operators: Combine different time series
  4. Vector Matching Operators: Control how time series are matched during operations
  5. Aggregation Operators: Summarize and reduce time series data

Let's explore each category in detail.

Arithmetic Operators

Arithmetic operators perform mathematical operations on time series data. These operators can be applied between:

  • Two scalars (yielding a scalar)
  • A scalar and an instant vector (applying the operation to every sample in the vector)
  • Two instant vectors (applying the operation to series whose label sets match)

Basic Arithmetic Operators

| Operator | Description | Example |
|----------|-------------|---------|
| + | Addition | node_memory_MemFree_bytes + node_memory_Cached_bytes |
| - | Subtraction | node_memory_MemTotal_bytes - node_memory_MemFree_bytes |
| * | Multiplication | node_network_transmit_bytes_total * 8 (convert bytes to bits) |
| / | Division | node_cpu_seconds_total / 60 (convert seconds to minutes) |
| % | Modulo | node_cpu_seconds_total % 60 |
| ^ | Exponentiation | node_disk_read_bytes_total ^ 2 |
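
Counters such as node_network_transmit_bytes_total only ever increase, so in practice arithmetic is usually applied to their per-second rate rather than to the raw value. A minimal sketch, assuming the standard node_exporter metric name and a 5-minute window chosen purely for illustration:

```promql
# Per-second transmit throughput in bits per second.
# rate() first turns the ever-increasing counter into bytes per second;
# the scalar 8 is then applied to every series in the result.
rate(node_network_transmit_bytes_total[5m]) * 8
```

The result keeps the labels of the input series (for example instance and device), but the metric name is dropped, as it is for every arithmetic operation.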

Example: Calculating Memory Usage Percentage

promql
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes) / node_memory_MemTotal_bytes * 100

This query calculates the percentage of memory used by:

  1. Subtracting free memory from total memory to get used memory
  2. Dividing used memory by total memory
  3. Multiplying by 100 to get a percentage
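
Free memory excludes page cache and buffers that the kernel can reclaim, so a calculation based on free memory tends to overstate usage on Linux. A hedged alternative, assuming the host exposes node_exporter's node_memory_MemAvailable_bytes metric:

```promql
# Percentage of memory in use, treating reclaimable cache as available.
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
```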

Comparison Operators

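Comparison operators compare sample values. Between an instant vector and a scalar, or between two instant vectors, the default behavior is to filter: series for which the comparison is false are dropped, and the remaining series keep their original values. This makes them useful for thresholding and alert expressions.

Available Comparison Operators

| Operator | Description | Example |
|----------|-------------|---------|
| == | Equal | node_cpu_seconds_total == 0 |
| != | Not equal | node_cpu_seconds_total != 0 |
| > | Greater than | http_requests_total > 100 |
| < | Less than | http_requests_total < 10 |
| >= | Greater than or equal | node_memory_usage_percentage >= 90 |
| <= | Less than or equal | node_memory_usage_percentage <= 10 |

Using Comparison with bool Modifier

The bool modifier turns off filtering. Instead of dropping series, the comparison returns every series with the value 1 if the comparison is true and 0 if it is false:

promql
http_requests_total > bool 100

This returns one sample for every http_requests_total series: 1 where the value is greater than 100, and 0 otherwise. When both operands are scalars, the bool modifier is required.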
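
One place where the bool modifier matters is counting. The sketch below shows two separate queries, assuming the built-in up metric and a job label on each target. Both count healthy targets per job, but they behave differently when no target in a job is healthy:

```promql
# Filtering form: unhealthy series are dropped before counting,
# so a job with zero healthy targets returns no result at all.
count by (job) (up == 1)

# bool form: every series is kept with value 1 or 0,
# so a job with zero healthy targets returns 0.
sum by (job) (up == bool 1)
```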

Example: Finding High CPU Usage

promql
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80

This query:

  1. Calculates the CPU idle rate over 5 minutes
  2. Converts to percentage and subtracts from 100 to get CPU usage percentage
  3. Filters instances where CPU usage is above 80%

Logical/Set Operators

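Logical/set operators combine two instant vectors based on which label sets are present on each side; the sample values themselves are not compared. They are defined only between instant vectors.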

Binary Logical Operators

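| Operator | Description | Example |
|----------|-------------|---------|
| and | Intersection: left-side series that have a matching label set on the right | up == 1 and rate(http_requests_total[5m]) > 0 |
| or | Union: all left-side series, plus right-side series with no match on the left | node_filesystem_avail_bytes{mountpoint="/"} < 10737418240 or node_filesystem_avail_bytes{mountpoint="/data"} < 21474836480 |
| unless | Complement: left-side series that have no matching label set on the right | http_requests_total unless on(instance) node_boot_time_seconds < 1600000000 |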
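
Because a set operator looks only at which label sets exist, or is often used to supply a fallback value when a query returns nothing. A minimal sketch, assuming a hypothetical errors_total counter and the built-in vector() function:

```promql
# If no errors_total series exist yet, the left side is empty and
# the result falls back to a single sample with value 0.
sum(rate(errors_total[5m])) or vector(0)
```

This works because sum() without a by clause and vector(0) both produce a series with no labels, so the two sides share the same (empty) label set and the left side is used whenever it exists.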

Example: Detecting Problems in Production

promql
(instance:requests:rate5m > 100) and (instance:errors:rate5m / instance:requests:rate5m > 0.05)

This query identifies instances that have both high request rates (over 100 per second) and an error rate exceeding 5%.

Vector Matching Operators

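When performing operations between two vectors, PromQL needs to know how to match the time series. By default, two series match only if they have exactly the same label sets (the metric name is ignored). Vector matching keywords change this behavior.

One-to-One Matching

For operations where each series in the left vector matches exactly one series in the right vector. Label selectors only filter series; they do not remove labels, so both sides must carry identical label sets for the default matching to work:

promql
request_success_count{job="api"} / request_count{job="api"}

Many-to-One and One-to-Many Matching

When several series on one side match a single series on the other side, the side with more series must be declared with group_left or group_right. Here each per-status request rate is divided by the total rate for its job:

promql
sum by (job, status) (rate(http_requests_total[5m])) / on(job) group_left sum by (job) (rate(http_requests_total[5m]))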

Vector Matching Keywords

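| Keyword | Description | Example |
|---------|-------------|---------|
| on | Match only on the specified labels | request_count / on(job) request_success_count |
| ignoring | Ignore the specified labels when matching | request_count{job="api", path="/"} / ignoring(path) sum without (path) (request_count{job="api"}) |
| group_left | Many-to-one matching (the left side has more series) | request_count{job="api"} / ignoring(path) group_left sum without (path) (request_count{job="api"}) |
| group_right | One-to-many matching (the right side has more series) | sum without (path) (request_count{job="api"}) / ignoring(path) group_right request_count{job="api"} |

The on example assumes that each metric has exactly one series per job. If either side has several series with the same job value, the query fails unless group_left or group_right is added.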
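
A common use of group_left is to copy labels from an "info" style metric onto another metric. The sketch below assumes node_exporter's node_uname_info metric, which has a constant value of 1 and carries a nodename label. The job="node" selector and the choice of up on the left side are assumptions made only for illustration:

```promql
# Attach the nodename label to each up series of the same instance.
# Multiplying by 1 leaves the value unchanged; group_left(nodename)
# copies the nodename label from the right side into the result.
up{job="node"} * on(instance) group_left(nodename) node_uname_info
```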

Example: Calculating Error Ratio with Labels

promql
sum(rate(http_requests_total{status="error"}[5m])) by (job, handler) 
/
sum(rate(http_requests_total[5m])) by (job, handler)

This query:

  1. Calculates the rate of error requests over 5 minutes, grouped by job and handler
  2. Calculates the total rate of requests over 5 minutes, grouped by job and handler
  3. Divides to get the error ratio for each job and handler combination

Aggregation Operators

Aggregation operators combine multiple time series into fewer time series based on labels.

Available Aggregation Operators

| Operator | Description | Example |
|----------|-------------|---------|
| sum | Sum of all values | sum(http_requests_total) |
| min | Minimum value | min(node_cpu_seconds_total) |
| max | Maximum value | max(node_cpu_seconds_total) |
| avg | Average value | avg(node_cpu_seconds_total) |
| stddev | Population standard deviation | stddev(node_cpu_seconds_total) |
| stdvar | Population standard variance | stdvar(node_cpu_seconds_total) |
| count | Number of series | count(up == 1) |
| count_values | Number of series having each distinct value | count_values("version", build_version) |
| bottomk | The k series with the smallest values | bottomk(3, node_cpu_seconds_total) |
| topk | The k series with the largest values | topk(3, node_cpu_seconds_total) |
| quantile | φ-quantile (0 ≤ φ ≤ 1) across series | quantile(0.95, http_request_duration_seconds) |

Modifying Aggregations with by and without

  • by: Keep only specified labels
  • without: Remove specified labels

promql
# Sum requests by job
sum by (job) (http_requests_total)

# Sum requests, removing path label
sum without (path) (http_requests_total)
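
The two modifiers are complementary. As an illustration, assume every http_requests_total series carries exactly the labels job, instance, and path (a hypothetical label set). Under that assumption these two separate queries return the same result:

```promql
# Keep only the job label ...
sum by (job) (http_requests_total)

# ... which here is the same as removing every other label.
sum without (instance, path) (http_requests_total)
```

The difference shows up when a new label is introduced later: by would aggregate it away, while without would preserve it.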

Example: Finding Top CPU-Consuming Instances

promql
topk(5, 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100))

This query:

  1. Calculates CPU usage percentage for each instance (100 minus the idle percentage)
  2. Returns the 5 instances with the highest CPU usage

Operator Precedence

Like in mathematics, PromQL operators follow a precedence order, from highest to lowest:

  1. Grouping: (...)
  2. Function calls, aggregations
  3. Exponentiation: ^
  4. Multiplication, division, modulo: *, /, %
  5. Addition, subtraction: +, -
  6. Comparison: ==, !=, <=, <, >=, >
  7. Set operators: and, unless
  8. Set operator: or

Operators on the same level are left-associative, except ^, which is right-associative.
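
A few small expressions make the ordering concrete. Each one is a separate query, and the names a, b, and c are placeholders rather than real metrics:

```promql
# Multiplication binds tighter than addition: evaluates to 7, not 9.
1 + 2 * 3

# ^ is right-associative: parsed as 2 ^ (3 ^ 2), which is 512.
2 ^ 3 ^ 2

# and binds tighter than or: parsed as a or (b and c).
a or b and c
```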

Example: Operator Precedence

promql
sum(rate(http_requests_total[5m])) by (job) > 10 or sum(rate(errors_total[5m])) by (job) > 5

This evaluates as:

  1. Calculate rate(http_requests_total[5m]) and rate(errors_total[5m])
  2. Apply sum...by aggregations
  3. Apply > comparisons
  4. Combine with or

Real-world Examples

Example 1: Monitoring Service Health

promql
# Alert when error rate is over 5% in the last 5 minutes
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
/
sum(rate(http_requests_total[5m])) by (service) > 0.05
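
Ratios become noisy when a service receives very little traffic. One hedged refinement is to require a minimum request rate using and; the threshold of 1 request per second is chosen arbitrarily for illustration:

```promql
# Error ratio above 5% ...
sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
/
sum by (service) (rate(http_requests_total[5m])) > 0.05
# ... and only for services handling more than 1 request per second.
and
sum by (service) (rate(http_requests_total[5m])) > 1
```

No extra parentheses are needed here: division is evaluated first, then the two comparisons, and and last. Both sides of and carry only the service label, so they match one-to-one.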

Example 2: Disk Space Prediction

promql
# Fire when a filesystem is predicted to run out of space within 24 hours, based on the last 6 hours
predict_linear(node_filesystem_avail_bytes[6h], 24 * 3600) < 0
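
predict_linear extrapolates a straight line through the samples in the range, so a short burst of writes can trigger it even on a mostly empty disk. A sketch that also requires the filesystem to be low on space right now, assuming node_exporter's node_filesystem_size_bytes metric and an arbitrary 20% threshold:

```promql
# Predicted to be full within 24 hours ...
predict_linear(node_filesystem_avail_bytes[6h], 24 * 3600) < 0
# ... and already below 20% free space.
and
node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.2
```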

Example 3: Calculating Percentiles for Request Durations

promql
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, handler))
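
The le label has to survive the aggregation because histogram_quantile uses the bucket boundaries to estimate the quantile. For comparison, the mean request duration can be derived from the same histogram with plain arithmetic. This sketch assumes the histogram also exposes the conventional _sum and _count series:

```promql
# Average request duration per handler over the last 5 minutes.
sum by (handler) (rate(http_request_duration_seconds_sum[5m]))
/
sum by (handler) (rate(http_request_duration_seconds_count[5m]))
```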

Visualizing PromQL Operator Flow

[Diagram: how operators transform data in a typical PromQL query]

PromQL Operator Cheat Sheet

Here's a quick reference for PromQL operators:

| Category | Operators | Usage |
|----------|-----------|-------|
| Arithmetic | +, -, *, /, %, ^ | node_memory_MemTotal_bytes - node_memory_MemFree_bytes |
| Comparison | ==, !=, >, <, >=, <= | http_requests_total > 100 |
| Logical/Set | and, or, unless | up == 1 and rate(http_requests_total[5m]) > 0 |
| Vector Matching | on, ignoring, group_left, group_right | request_count / on(job) request_success_count |
| Aggregation | sum, min, max, avg, stddev, count, topk, etc. | sum by (job) (http_requests_total) |

Summary

PromQL operators are powerful tools for transforming and analyzing time series data in Prometheus. By mastering these operators, you can:

  • Perform mathematical calculations on metrics
  • Filter time series based on conditions
  • Combine metrics from different sources
  • Aggregate data to reduce dimensionality
  • Create meaningful visualizations and alerts

Remember that effective PromQL queries often combine multiple operators to extract valuable insights from your monitoring data.

Exercises

  1. Write a PromQL query to calculate the percentage of disk space used for each filesystem.
  2. Create a query to find the 3 busiest CPUs across all your instances.
  3. Write a query to calculate the ratio of HTTP 500 errors to total requests, grouped by service and endpoint.
  4. Use vector matching to join metrics from two different sources based on common labels.
  5. Create an alert expression that fires when memory usage is high AND disk space is running low.
