# PromQL Time-based Functions
Time is a fundamental dimension in monitoring and observability. In Prometheus, the ability to analyze metrics across different time ranges is essential for effective monitoring. This guide explores PromQL's time-based functions that allow you to manipulate the time dimension of your metrics.
## Introduction to Time-based Functions
PromQL (Prometheus Query Language) provides powerful time-based functions to:
- Calculate rates of change
- Analyze data over specific time windows
- Predict future values based on historical trends
- Compare current metrics with past values
These functions are crucial for both real-time monitoring and retrospective analysis of system performance.
## Core Time-based Functions
### rate() and irate()

The `rate()` function calculates the per-second average rate of increase of a counter over a specified time range.

```promql
rate(http_requests_total[5m])
```

This returns the per-second rate of HTTP requests over the last 5 minutes.
Example:
For a counter with these values:
- t=0s: 100 requests
- t=60s: 160 requests
- t=120s: 220 requests
- t=180s: 300 requests
- t=240s: 350 requests
- t=300s: 420 requests
```promql
rate(http_requests_total[5m])
```

Output at t=300s: 1.067 requests/second (320 requests ÷ 300 seconds)

The related function `irate()` calculates the instant rate based only on the last two data points:

```promql
irate(http_requests_total[5m])
```

Output at t=300s: 1.167 requests/second (70 requests ÷ 60 seconds)
When to use which: Use `rate()` for regular graphing and alerting on counters with a predictable increase. Use `irate()` for highly volatile counters or when you need to see brief spikes that might be averaged out by `rate()`.
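The arithmetic behind those two outputs can be sketched in Python. This is a simplification: the real `rate()` also handles counter resets and extrapolates to the window boundaries.

```python
# Sample (timestamp_seconds, counter_value) pairs from the example above.
samples = [(0, 100), (60, 160), (120, 220), (180, 300), (240, 350), (300, 420)]

def simple_rate(points):
    """Per-second average rate across the whole window: (last - first) / elapsed."""
    (t0, v0), (tn, vn) = points[0], points[-1]
    return (vn - v0) / (tn - t0)

def simple_irate(points):
    """Instant rate based only on the last two points."""
    (t0, v0), (t1, v1) = points[-2], points[-1]
    return (v1 - v0) / (t1 - t0)

print(round(simple_rate(samples), 3))   # 1.067
print(round(simple_irate(samples), 3))  # 1.167
```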
### increase()

The `increase()` function shows the total increase in a counter over a time period:

```promql
increase(http_requests_total[1h])
```

This shows the total number of requests received in the last hour.
Example:
If a service handled 3,600 requests in the last hour:
```promql
increase(http_requests_total[1h])
```

Output: 3600
Tip: You can think of `increase(x[5m])` as being equivalent to `5 * 60 * rate(x[5m])`.
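As a quick sanity check of that equivalence, here is a tiny Python sketch (the rate value is hypothetical):

```python
# increase over a window ≈ per-second rate × window length in seconds.
window_seconds = 5 * 60        # the [5m] range
per_second_rate = 1.2          # hypothetical result of rate(x[5m])

increase = per_second_rate * window_seconds
print(increase)  # 360.0
```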
### delta()

The `delta()` function calculates the difference between the first and last value of a gauge metric in a time range:

```promql
delta(cpu_temperature_celsius[1h])
```

This shows how much the CPU temperature has changed over the last hour.
Example:
If the CPU temperature readings were:
- t=0min: 65°C
- t=15min: 68°C
- t=30min: 72°C
- t=45min: 70°C
- t=60min: 67°C
```promql
delta(cpu_temperature_celsius[1h])
```

Output: 2 (67 - 65 = 2)
### idelta()

Similar to `delta()`, but `idelta()` only considers the last two points in the specified range:

```promql
idelta(cpu_temperature_celsius[1h])
```

Using the same example data, the output would be: -3 (67 - 70 = -3)
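A Python sketch of both calculations on the readings above (simplified: the real `delta()` also extrapolates to the window boundaries):

```python
# (minutes, °C) readings from the temperature example.
readings = [(0, 65), (15, 68), (30, 72), (45, 70), (60, 67)]

delta = readings[-1][1] - readings[0][1]    # last - first
idelta = readings[-1][1] - readings[-2][1]  # last - second-to-last
print(delta, idelta)  # 2 -3
```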
## Time Shift Functions

### offset

The `offset` modifier allows you to look back in time relative to the current query time:

```promql
http_requests_total offset 1h
```

This returns the value of `http_requests_total` from 1 hour ago.
### Comparing Current vs Past Values
One powerful application is comparing current metrics with historical ones:
```promql
(http_requests_total / http_requests_total offset 1d) * 100 - 100
```
This calculates the percentage change in requests compared to the same time yesterday.
Example:
If you have:
- Current request count: 15,000
- Request count 24h ago: 12,000
```promql
(http_requests_total / http_requests_total offset 1d) * 100 - 100
```

Output: 25 (meaning a 25% increase from yesterday)
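The percentage arithmetic can be verified with the hypothetical counts above:

```python
# Hypothetical counts for the offset comparison.
current = 15_000       # http_requests_total now
yesterday = 12_000     # http_requests_total offset 1d

pct_change = (current / yesterday) * 100 - 100
print(pct_change)  # 25.0
```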
## Prediction and Trend Analysis

### predict_linear()

The `predict_linear()` function predicts the value of a time series at a future point based on a linear regression:

```promql
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600)
```

This predicts how much disk space will be free in 4 hours based on the trend of the last hour.
Example:
For a disk that's filling up steadily:
```promql
predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24 * 3600) < 0
```
This alerts if the disk is predicted to run out of space within 24 hours.
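Conceptually, `predict_linear()` fits a least-squares line to the samples in the range and extrapolates it. A Python sketch of that idea, on hypothetical free-space samples losing 1 GiB per hour:

```python
GIB = 1024 ** 3

def predict_linear(points, seconds_ahead):
    """Least-squares line through (t, v) samples, extrapolated past the last sample."""
    n = len(points)
    ts = [t for t, _ in points]
    vs = [v for _, v in points]
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in points)
             / sum((t - t_mean) ** 2 for t in ts))
    intercept = v_mean - slope * t_mean
    return slope * (ts[-1] + seconds_ahead) + intercept

# Hypothetical samples: 10 GiB free at t=0, losing 1 GiB per hour for 6 hours.
samples = [(h * 3600, (10 - h) * GIB) for h in range(7)]

# With 4 GiB left and a steady loss rate, the disk is empty in 4 more hours.
print(predict_linear(samples, 4 * 3600) / GIB)  # ≈ 0.0
```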
### deriv()

The `deriv()` function calculates the per-second derivative of a gauge metric's value:

```promql
deriv(process_resident_memory_bytes[10m])
```

This shows the rate at which memory usage is changing.
## Time Aggregation Functions

### <aggregation>_over_time()

These functions perform calculations across time ranges:

```promql
avg_over_time(node_cpu_seconds_total{mode="idle"}[5m])
```

Available aggregation functions include:

- `avg_over_time`: Average value over the time range
- `min_over_time`: Minimum value within the time range
- `max_over_time`: Maximum value within the time range
- `sum_over_time`: Sum of all values in the time range
- `count_over_time`: Count of data points in the time range
- `stddev_over_time`: Standard deviation of values
- `stdvar_over_time`: Standard variance of values
- `last_over_time`: Last value in the time range
- `present_over_time`: Returns 1 if the metric exists in the time range
Example:
For a service with varying response times over the last 10 minutes:
```promql
max_over_time(http_request_duration_seconds[10m])
```
This shows the maximum request duration observed in the last 10 minutes.
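To make the semantics concrete, here is a small Python sketch applying a few of these aggregations to hypothetical raw samples inside a window:

```python
# Hypothetical request durations (seconds) sampled inside a 10m window.
durations = [0.12, 0.34, 0.09, 1.50, 0.28]

avg_over_time = sum(durations) / len(durations)   # ≈ 0.466
max_over_time = max(durations)                    # 1.5
count_over_time = len(durations)                  # 5
last_over_time = durations[-1]                    # 0.28

print(max_over_time)  # 1.5
```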
## Resets and Changes

### resets()

The `resets()` function counts counter resets (when a counter goes down instead of up) within a time range:

```promql
resets(app_crashes_total[1d])
```

This counts how many times the application crashed and restarted (causing the counter to reset) in the last day.
### changes()

The `changes()` function counts the number of times a value changed within the time range:

```promql
changes(app_status{job="api-server"}[1h])
```

This shows how many times the API server status changed in the last hour.
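Both counts boil down to comparing consecutive samples, as this minimal Python sketch with made-up sample values shows:

```python
# resets() counts drops in a counter; changes() counts any value change.
counter = [10, 25, 40, 3, 18, 2, 9]   # hypothetical: two resets (40→3, 18→2)
status = [1, 1, 0, 0, 1, 1, 1]        # hypothetical: two changes (1→0, 0→1)

resets = sum(1 for a, b in zip(counter, counter[1:]) if b < a)
changes = sum(1 for a, b in zip(status, status[1:]) if b != a)
print(resets, changes)  # 2 2
```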
## Time Window Syntax

In PromQL, time windows are specified using these duration units:

- `s` - seconds
- `m` - minutes
- `h` - hours
- `d` - days
- `w` - weeks
- `y` - years

```promql
rate(http_requests_total[5m])    # 5-minute window
increase(errors_total[1h])       # 1-hour window
avg_over_time(cpu_usage[7d])     # 7-day window
```
## Practical Examples

### Detecting Service Degradation

Detecting whether a service's error rate is increasing over the last 10 minutes:

```promql
rate(api_http_errors_total[10m]) / rate(api_http_requests_total[10m]) > 0.05
```

This alerts when more than 5% of requests are resulting in errors.
### Capacity Planning

Predicting when disk space will run out based on usage trends:

```promql
predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 7 * 24 * 3600) / 1024 / 1024 / 1024
```

This shows how many GiB of disk space will be left in 7 days at the current usage rate.
### Comparing Day-over-Day Performance

```promql
(sum(rate(http_requests_total[1h])) / sum(rate(http_requests_total[1h] offset 1d))) * 100 - 100
```

This calculates the percentage change in HTTP request rate compared to the same hour yesterday.
### SLA Compliance Checking

```promql
sum_over_time(up{job="api-service"}[30d]) / count_over_time(up{job="api-service"}[30d]) * 100 < 99.9
```

This checks whether a service's uptime over the last 30 days is below the 99.9% SLA threshold. (The ratio of `sum_over_time` to `count_over_time` is equivalent to `avg_over_time(up[30d])`.)
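The uptime arithmetic can be sketched in Python: `up` is 1 for a successful scrape and 0 for a failed one, so the mean over the window is the uptime fraction (the sample counts below are hypothetical):

```python
# Hypothetical: 5 failed scrapes out of 10,000 in the 30-day window.
up_samples = [1] * 9995 + [0] * 5

uptime_pct = 100 * sum(up_samples) / len(up_samples)
print(uptime_pct)           # 99.95
print(uptime_pct < 99.9)    # False: the 99.9% SLA is met
```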
## Common Visualization Patterns

### Heatmap of Weekly Patterns

```promql
sum(rate(http_requests_total[5m])) by (day_of_week, hour)
```

When visualized as a heatmap, this shows traffic patterns by day of week and hour. (This assumes the metric carries `day_of_week` and `hour` labels, which typically have to be added by the instrumentation or a recording rule.)
### 95th Percentile Over Time

```promql
histogram_quantile(0.95, sum(rate(http_request_duration_bucket[5m])) by (le))
```

This shows the 95th percentile of HTTP request durations over time.
## Summary

PromQL's time-based functions provide powerful tools for analyzing metrics across time dimensions. These functions allow you to:

- Calculate rates of change with `rate()` and `irate()`
- Measure increases with `increase()` and changes with `delta()`
- Compare current metrics with historical values using `offset`
- Predict future trends with `predict_linear()`
- Perform time-based aggregations with `*_over_time()` functions
- Detect anomalies and resets with `resets()` and `changes()`

Mastering these functions lets you build more effective monitoring dashboards and alerts that can detect trends and issues before they become critical problems.
## Exercises
- Write a PromQL query to calculate the average CPU usage over the last hour.
- Create a query that compares today's error rate with yesterday's at the same time.
- Write a query to predict when memory usage will exceed 90% if the current trend continues.
- Create an alert query that triggers when the service response time has increased by more than 50% compared to the last hour.
- Write a query to find the busiest hour of the day based on request rates.