
PromQL Functions

Introduction

Functions are a fundamental component of PromQL (Prometheus Query Language) that allow you to transform, aggregate, and manipulate time series data. In this guide, we'll explore the various functions available in PromQL, understand their syntax, and learn how to use them effectively in your monitoring queries.

PromQL functions enhance your ability to extract meaningful insights from your metrics data. They can help you calculate rates, aggregate values, predict trends, and transform data into more useful formats. Understanding these functions is essential for building powerful monitoring dashboards and alerting rules.

Function Categories

PromQL functions can be broadly categorized into several types based on their purpose:

  1. Aggregation Functions: Combine multiple time series into fewer series
  2. Counter Functions: Work with counter metrics (values that only increase, except when they reset)
  3. Mathematical Functions: Perform mathematical operations on time series data
  4. Histogram Functions: Estimate quantiles from histogram buckets
  5. Time Functions: Manipulate timestamps or perform time-based calculations
  6. Label Manipulation: Modify, add, or remove labels from time series

Let's explore each category in detail.

Aggregation Functions

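Aggregation functions (which PromQL formally calls aggregation operators) combine the series of an instant vector into fewer series. Without a by or without clause, they aggregate across all labels and return a single series.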

sum

The sum function adds the values of multiple time series together.

sum(http_requests_total)

This query adds together the http_requests_total values of every matching series, whatever their labels, and returns a single time series.

You can also aggregate by specific labels:

sum by (job) (http_requests_total)

This groups the metrics by the job label, summing the values for each job.
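
To see what grouping does to the data, here is a minimal Python model of sum by. It assumes an instant vector represented as a list of (labels, value) pairs. The helper name sum_by and the sample values are hypothetical. The model only shows how series that share the same grouping labels are added together, and how every other label disappears from the result.

```python
from collections import defaultdict

def sum_by(series, by_labels):
    """Simplified model of `sum by (...)` over an instant vector.

    `series` is a list of (labels, value) pairs. Series are grouped on the
    listed labels only; every other label is dropped from the result.
    With an empty `by_labels` list, everything collapses into one group,
    like a plain `sum(...)`.
    """
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in by_labels)
        groups[key] += value
    return dict(groups)

# Hypothetical instant vector: one sample per (job, instance) pair.
vector = [
    ({"job": "api", "instance": "a:8080"}, 10.0),
    ({"job": "api", "instance": "b:8080"}, 5.0),
    ({"job": "web", "instance": "c:8080"}, 2.0),
]

print(sum_by(vector, ["job"]))  # {(('job', 'api'),): 15.0, (('job', 'web'),): 2.0}
print(sum_by(vector, []))       # {(): 17.0}
```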

avg

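The avg function calculates the average value across multiple time series. Because node_cpu_seconds_total is a counter, apply rate first. The result is the average fraction of time the CPUs spent idle:

avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))

As with sum, you can use by or without to control grouping:

avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))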

Other Aggregation Functions

PromQL includes several other aggregation operators:

  • min: Selects the smallest value
  • max: Selects the largest value
  • count: Counts the number of elements in the vector
  • stddev: Calculates the population standard deviation
  • stdvar: Calculates the population variance
  • topk: Selects the k largest elements
  • bottomk: Selects the k smallest elements
  • quantile: Calculates the φ-quantile (0 ≤ φ ≤ 1)

Example using topk:

# Find the 3 busiest HTTP endpoints
topk(3, sum by (path) (rate(http_requests_total[5m])))

Counter Functions

Counter functions are designed to work with counter metrics, whose values never decrease (except when they reset to zero, for example when a process restarts).

rate

The rate function calculates the per-second average rate of increase over a time window, automatically correcting for counter resets:

rate(http_requests_total[5m])

This calculates the per-second rate of HTTP requests over the last 5 minutes.
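
The following Python sketch models the two core ideas behind rate under simplifying assumptions. First, a counter reset is detected as a drop between consecutive samples. Second, the rate is the total increase divided by the elapsed time. Real Prometheus also extrapolates to the edges of the window, and this sketch leaves that step out. The helper names and the 15-second scrape data are hypothetical.

```python
def counter_increase(samples):
    """Increase of a counter across (timestamp, value) samples.

    Any drop between consecutive samples is treated as a reset: the counter
    restarted from zero, so the value after the drop counts in full.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += curr if curr < prev else curr - prev
    return total

def simple_rate(samples):
    """Per-second average rate between the first and last sample.

    Unlike PromQL's rate, this does not extrapolate to the window edges.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed

# Hypothetical scrapes every 15 s; the process restarts between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 30), (60, 60)]
print(counter_increase(samples))  # 120.0
print(simple_rate(samples))       # 2.0
```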

irate

The irate function calculates the per-second instant rate from the last two samples in the time window:

irate(http_requests_total[5m])

irate is more responsive to recent changes but can be more volatile than rate.
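
A small Python sketch shows why irate reacts faster. It looks only at the last two samples, while an average over the whole window smooths a late burst away. The helper name and data are hypothetical. The reset rule mirrors the one irate applies (if the last value dropped, the last value is the whole increase), but this is a model, not Prometheus's own code.

```python
def simple_irate(samples):
    """Per-second rate from only the last two (timestamp, value) samples.

    If the last value is lower than the one before it, the counter is assumed
    to have reset, so the last value itself is the increase.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)

# Hypothetical counter that speeds up sharply at the end of the window.
samples = [(0, 100), (15, 115), (30, 130), (45, 190)]

average_rate = (samples[-1][1] - samples[0][1]) / (samples[-1][0] - samples[0][0])
print(average_rate)           # 2.0 (no resets in this data)
print(simple_irate(samples))  # 4.0
```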

increase

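The increase function calculates how much a counter increased over a time window. It is equivalent to rate multiplied by the number of seconds in the window, so it also corrects for counter resets and extrapolates to the window edges:

increase(http_requests_total[1h])

This gives the approximate total number of requests over the past hour. Because of the extrapolation, the result can be a non-integer even though requests are counted in whole numbers.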
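
Why can increase report a fractional number of requests? Prometheus extrapolates the increase it observes between the first and last samples out toward the edges of the range window. The Python sketch below models only the simplest case, which is assumed here: there are no counter resets, and the first and last samples sit within about one scrape interval of the window edges. In that case the observed increase is scaled up to the full window length. The helper name and data are hypothetical.

```python
def approx_increase(samples, window_seconds):
    """Simplified model of increase() over a range window.

    Assumes no counter resets and that the first and last (timestamp, value)
    samples lie close to the window edges, so the observed increase is scaled
    linearly from the sampled span to the full window length.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    observed = v_last - v_first
    sampled_seconds = t_last - t_first
    return observed * window_seconds / sampled_seconds

# Hypothetical 60 s window starting at t=0, scraped every 15 s from t=3.
samples = [(3, 100), (18, 103), (33, 106), (48, 110)]
estimate = approx_increase(samples, 60)
print(round(estimate, 2))       # 13.33, although the counter rose by exactly 10
print(round(estimate / 60, 4))  # 0.2222, the matching per-second rate
```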

When to use which counter function?

  • Use rate for regular graphing and alerting on counters.
  • Use irate when graphing volatile, fast-moving counters where you need to see rapid changes. Prefer rate in alerting rules, where brief spikes can cause alerts to flap.
  • Use increase when you want the absolute increase rather than a per-second rate.

Mathematical Functions

PromQL offers various mathematical functions to transform your data.

abs, ceil, floor, round

Basic mathematical operations:

abs(temperature - 273.15)  # Convert Kelvin to Celsius, then take the magnitude (the sign is lost)
ceil(cpu_usage) # Round up CPU usage
floor(memory_fraction) # Round down memory usage
round(response_time_seconds) # Round to nearest integer

Trigonometric and Other Mathematical Functions

PromQL supports several advanced mathematical functions:

  • sqrt: Square root
  • ln, log2, log10: Logarithms
  • exp: Exponential function
  • sin, cos, tan: Trigonometric functions

Example:

sqrt(process_resident_memory_bytes / 1024 / 1024)

This calculates the square root of the resident memory usage in MiB (1024 × 1024 bytes).

Histogram Functions

Prometheus often stores latency and size metrics as histograms. PromQL provides functions to analyze these histograms.

histogram_quantile

The histogram_quantile function calculates quantiles from histogram metrics:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

This query estimates the 95th percentile of HTTP request durations from the bucket counts of the last 5 minutes.
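
To see where the estimate comes from, here is a simplified Python model of histogram_quantile. It finds the bucket containing the requested rank and interpolates linearly inside it. When the rank falls into the +Inf bucket, it returns the highest finite bucket bound instead. It assumes sorted cumulative buckets, a lowest bucket that starts at 0, and 0 < q < 1. The bucket values are hypothetical.

```python
import math

def histogram_quantile(q, buckets):
    """Simplified model of PromQL's histogram_quantile.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with (math.inf, total_count), like the `le`
    buckets of a Prometheus histogram. Assumes 0 < q < 1, a positive total,
    and that the lowest bucket starts at 0.
    """
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lands in the +Inf bucket: return the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count

# Hypothetical request-duration buckets (seconds), 100 requests in total.
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))            # 0.1
print(round(histogram_quantile(0.9, buckets), 4))  # 0.4167, inside the 0.25 to 0.5 bucket

# If the rank falls into the +Inf bucket, the highest finite bound is returned.
tail_heavy = [(0.5, 90), (1.0, 96), (math.inf, 100)]
print(histogram_quantile(0.99, tail_heavy))        # 1.0
```

The interpolation step explains why accuracy depends on the bucket layout: every request inside a bucket is assumed to be spread evenly between its bounds.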

Time and Date Functions

These functions help with time-based calculations.

time

The time function returns the query's evaluation time as a scalar, in seconds since the Unix epoch:

time()

day_of_week, day_of_month, day_of_year

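These functions return the day of the week (0 to 6), the day of the month (1 to 31), and the day of the year (1 to 365, or 366 in leap years), all in UTC. They take an instant vector of Unix timestamps and default to the current evaluation time when called with no argument. Passing time() directly does not work, because it returns a scalar:

day_of_week()  # 0 = Sunday, 6 = Saturday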

hour, minute, month, year

Extract components of a timestamp (in UTC). Like the day functions, these default to the current evaluation time:

hour()  # 0-23
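
PromQL numbers weekdays from Sunday, while Python's datetime.weekday() starts at Monday. That mismatch is an easy source of off-by-one errors when you check results outside Prometheus. This Python sketch reproduces PromQL's UTC numbering for day_of_week and hour from a Unix timestamp. It is a cross-checking aid, not how Prometheus computes these values.

```python
from datetime import datetime, timezone

def day_of_week(epoch_seconds):
    """Day of the week in PromQL's numbering (0 = Sunday ... 6 = Saturday), in UTC."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Python's weekday() counts 0 = Monday, so shift everything by one day.
    return (dt.weekday() + 1) % 7

def hour(epoch_seconds):
    """Hour of the day (0-23) in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).hour

# 1970-01-01 00:00:00 UTC was a Thursday.
print(day_of_week(0), hour(0))                          # 4 0
# 1700000000 is 2023-11-14 22:13:20 UTC, a Tuesday.
print(day_of_week(1_700_000_000), hour(1_700_000_000))  # 2 22
```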

Vector Matching Functions

These tools help when combining multiple time series. Strictly speaking, only vector is a function. The on and ignoring keywords are modifiers for binary operators.

vector

Returns a scalar as a vector with a single element and no labels:

vector(1)

on and ignoring

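These modifiers control which labels are used to match series on the two sides of a binary operator. on(...) matches only on the listed labels, while ignoring(...) matches on all labels except the listed ones. Here the two sides differ in their method label, so it must be left out of the match. Matching on(instance, method) would find no pairs at all:

api_requests_total{method="GET"} / ignoring(method) api_requests_total{method="POST"}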
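
A simplified Python model of one-to-one matching shows why the label choice matters. Each side is a hypothetical dict from a frozenset of (label, value) pairs to a value. Series match when their labels agree after the ignored labels are removed. The helper name divide_ignoring is made up for this example, and the model skips PromQL's error checks for ambiguous matches.

```python
def divide_ignoring(left, right, ignoring):
    """Simplified model of `left / ignoring(...) right` (one-to-one matching).

    Each side maps a frozenset of (label, value) pairs to a sample value.
    Two series match when their label sets are equal after the ignored labels
    are removed; unmatched series are dropped from the result. Unlike PromQL,
    this does not report an error when several series share a signature, and
    it raises ZeroDivisionError where PromQL would return Inf or NaN.
    """
    def signature(labels):
        return frozenset((k, v) for k, v in labels if k not in ignoring)

    right_by_signature = {signature(labels): value for labels, value in right.items()}
    result = {}
    for labels, value in left.items():
        sig = signature(labels)
        if sig in right_by_signature:
            result[sig] = value / right_by_signature[sig]
    return result

# Hypothetical GET and POST values for two instances.
get = {
    frozenset({("instance", "a"), ("method", "GET")}): 300.0,
    frozenset({("instance", "b"), ("method", "GET")}): 120.0,
}
post = {
    frozenset({("instance", "a"), ("method", "POST")}): 100.0,
    frozenset({("instance", "b"), ("method", "POST")}): 60.0,
}

print(divide_ignoring(get, post, {"method"}))  # one ratio per instance: 3.0 for a, 2.0 for b
print(divide_ignoring(get, post, set()))       # {} because the method labels never match
```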

Practical Examples

Let's look at some real-world examples of combining PromQL functions to answer common monitoring questions.

Example 1: Error Rate Calculation

Calculate the percentage of HTTP requests that resulted in 5xx errors:

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

Example 2: CPU Usage per Instance

Calculate the percentage of CPU time each instance spends on non-idle work, averaged across its cores:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Example 3: Predicting Resource Exhaustion

Detect filesystems that are low on space and likely to run out within a day, based on the recent usage trend:

(node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.10
and
predict_linear(node_filesystem_avail_bytes[6h], 24 * 3600) < 0

This alerts when:

  1. Less than 10% disk space is available, AND
  2. The disk is predicted to fill up within the next 24 hours based on the last 6 hours of data.
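
predict_linear fits a straight line to the samples in the range using simple linear regression. It then evaluates that line the given number of seconds past the evaluation time. The following Python sketch models that calculation with ordinary least squares on hypothetical hourly disk-space samples. It ignores how Prometheus selects samples from the range, and the helper name is made up for this example.

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Simplified model of predict_linear using ordinary least squares.

    `samples` are (timestamp_seconds, value) pairs from the range window.
    Fits value = intercept + slope * t and evaluates the line at
    eval_time + seconds_ahead.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    return mean_v + slope * (eval_time + seconds_ahead - mean_t)

# Hypothetical free space in GB, sampled hourly over 6 h, shrinking by 1 GB per hour.
hours = range(7)
samples = [(h * 3600, 20.0 - h) for h in hours]
forecast = predict_linear(samples, eval_time=6 * 3600, seconds_ahead=24 * 3600)
print(round(forecast, 6))  # -10.0, so the disk is predicted to fill within 24 h
```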

Example 4: Detecting Service Degradation

Detect when a service's 95th percentile latency exceeds a threshold:

histogram_quantile(0.95, sum(rate(api_request_duration_seconds_bucket[5m])) by (le)) > 0.5

This alerts when the 95th percentile of API request duration exceeds 500ms.

Common Function Combinations

Some function combinations are frequently used together:

Rate then Sum

sum(rate(http_requests_total[5m])) by (path)

This pattern first calculates the rate for each time series, then sums them by path.

Aggregation then Histogram Quantile

histogram_quantile(0.95, sum(rate(request_duration_seconds_bucket[5m])) by (le, job))

This calculates request duration quantiles per job.

Function Pitfalls and Gotchas

Extrapolation Issues with predict_linear

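The predict_linear function fits a straight line to the samples in the range (simple linear regression) and extends it into the future. Real-world usage often has daily cycles, bursts, or step changes that a straight line cannot capture, so treat long-range predictions with caution.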

Aggregation Order Matters

The order of operations can significantly affect results:

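sum(rate(counter[5m]))  # Calculate rate first, then sum (correct)
rate(sum(counter)[5m:])  # Sum first, then calculate rate (usually incorrect)

The second form needs subquery syntax ([5m:]) because a range selector can only be applied to a plain series selector, not to the result of sum. Even when written this way, it gives wrong results: if any single counter resets, the sum drops, and rate misreads that drop as a reset of the whole sum.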
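
A minimal Python simulation shows why summing before computing the rate goes wrong when one counter resets. It uses two hypothetical counters and a simplified reset rule (any drop counts as a reset). It compares total increases rather than per-second rates, since dividing both by the same window length would not change the comparison.

```python
def increase_with_resets(values):
    """Increase of a counter across scrapes, treating any drop as a reset."""
    total = 0
    for prev, curr in zip(values, values[1:]):
        total += curr if curr < prev else curr - prev
    return total

# Hypothetical counters from two instances, scraped at the same four times.
# counter_b restarts between the second and third scrape.
counter_a = [1000, 1010, 1020, 1030]
counter_b = [500, 510, 5, 15]

# Correct order: compute each counter's increase, then sum the results.
per_series_then_sum = increase_with_resets(counter_a) + increase_with_resets(counter_b)

# Wrong order: sum the raw counters, then compute the increase of the sum.
summed = [a + b for a, b in zip(counter_a, counter_b)]
sum_then_increase = increase_with_resets(summed)

print(per_series_then_sum)  # 55
print(sum_then_increase)    # 1065: the drop in the sum looks like a reset
```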

Histogram Quantiles Are Approximate

The histogram_quantile function provides an approximation based on bucket boundaries. The accuracy depends on how your histogram buckets are defined.

Summary

PromQL functions form the backbone of effective monitoring with Prometheus. They enable you to:

  • Transform raw metrics into meaningful insights
  • Calculate rates of change and trends
  • Aggregate data across multiple dimensions
  • Make predictions based on historical data
  • Create powerful alerting rules

By combining these functions appropriately, you can build comprehensive monitoring solutions that help you understand system behavior and detect problems early.

Additional Resources

To deepen your understanding of PromQL functions:

  • Experiment with the Prometheus expression browser in your own environment
  • Try writing queries that answer specific questions about your systems
  • Refer to the official Prometheus documentation for the complete function reference

Exercises

  1. Write a PromQL query to find the 3 most CPU-intensive processes on your system.
  2. Create a query to calculate the memory usage growth rate and predict when you might run out of memory.
  3. Write a query to detect when the ratio of errors to total requests exceeds 5% over a 5-minute window.
  4. Create a dashboard panel showing the 95th percentile request latency broken down by endpoint.
  5. Write an alert expression that triggers when any instance has less than 10% disk space remaining and is predicted to run out within 48 hours.

