# PromQL Time-based Functions
Time is a fundamental dimension in monitoring and observability. In Prometheus, the ability to analyze metrics across different time ranges is essential for effective monitoring. This guide explores PromQL's time-based functions that allow you to manipulate the time dimension of your metrics.
## Introduction to Time-based Functions
PromQL (Prometheus Query Language) provides powerful time-based functions to:
- Calculate rates of change
- Analyze data over specific time windows
- Predict future values based on historical trends
- Compare current metrics with past values
These functions are crucial for both real-time monitoring and retrospective analysis of system performance.
## Core Time-based Functions
### rate() and irate()

The `rate()` function calculates the per-second average rate of increase of a counter over a specified time range.

```promql
rate(http_requests_total[5m])
```

This returns the per-second rate of HTTP requests over the last 5 minutes.
Example:
For a counter with these values:
- t=0s: 100 requests
- t=60s: 160 requests
- t=120s: 220 requests
- t=180s: 300 requests
- t=240s: 350 requests
- t=300s: 420 requests
```promql
rate(http_requests_total[5m])
```

Output at t=300s: 1.067 requests/second (320 requests ÷ 300 seconds)

The related function `irate()` calculates the instant rate based only on the last two data points:

```promql
irate(http_requests_total[5m])
```

Output at t=300s: 1.167 requests/second (70 requests ÷ 60 seconds)
When to use which: Use `rate()` for regular graphing and alerting on counters with a predictable increase. Use `irate()` for highly volatile counters or when you need to see brief spikes that might be averaged out by `rate()`.
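The arithmetic behind those two outputs can be sketched in Python. This is a simplification: the real `rate()` also handles counter resets and extrapolates to the window boundaries.

```python
# Sample (timestamp_seconds, counter_value) pairs from the example above.
samples = [(0, 100), (60, 160), (120, 220), (180, 300), (240, 350), (300, 420)]

def simple_rate(points):
    """Per-second average rate across the whole window: (last - first) / elapsed."""
    (t0, v0), (tn, vn) = points[0], points[-1]
    return (vn - v0) / (tn - t0)

def simple_irate(points):
    """Instant rate based only on the last two points."""
    (t0, v0), (t1, v1) = points[-2], points[-1]
    return (v1 - v0) / (t1 - t0)

print(round(simple_rate(samples), 3))   # 1.067
print(round(simple_irate(samples), 3))  # 1.167
```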
### increase()

The `increase()` function shows the total increase in a counter over a time period:

```promql
increase(http_requests_total[1h])
```

This shows the total number of requests received in the last hour.
Example:
If a service handled 3,600 requests in the last hour:
```promql
increase(http_requests_total[1h])
```

Output: 3600
Tip: You can think of `increase(x[5m])` as being equivalent to `5 * 60 * rate(x[5m])`.
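As a quick sanity check of that equivalence, here is a tiny Python sketch (the rate value is hypothetical):

```python
# increase over a window ≈ per-second rate × window length in seconds.
window_seconds = 5 * 60        # the [5m] range
per_second_rate = 1.2          # hypothetical result of rate(x[5m])

increase = per_second_rate * window_seconds
print(increase)  # 360.0
```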
### delta()

The `delta()` function calculates the difference between the first and last value of a gauge metric in a time range:

```promql
delta(cpu_temperature_celsius[1h])
```

This shows how much the CPU temperature has changed over the last hour.
Example:
If the CPU temperature readings were:
- t=0min: 65°C
- t=15min: 68°C
- t=30min: 72°C
- t=45min: 70°C
- t=60min: 67°C
```promql
delta(cpu_temperature_celsius[1h])
```

Output: 2 (67 - 65 = 2)
### idelta()

Similar to `delta()`, but `idelta()` only considers the last two points in the specified range:

```promql
idelta(cpu_temperature_celsius[1h])
```

Using the same example data, the output would be: -3 (67 - 70 = -3)
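A Python sketch of both calculations on the readings above (simplified: the real `delta()` also extrapolates to the window boundaries):

```python
# (minutes, °C) readings from the temperature example.
readings = [(0, 65), (15, 68), (30, 72), (45, 70), (60, 67)]

delta = readings[-1][1] - readings[0][1]    # last - first
idelta = readings[-1][1] - readings[-2][1]  # last - second-to-last
print(delta, idelta)  # 2 -3
```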
## Time Shift Functions

### offset

The `offset` modifier allows you to look back in time relative to the current query time:

```promql
http_requests_total offset 1h
```

This returns the value of `http_requests_total` from 1 hour ago.
### Comparing Current vs Past Values
One powerful application is comparing current metrics with historical ones:
```promql
(http_requests_total / http_requests_total offset 1d) * 100 - 100
```
This calculates the percentage change in requests compared to the same time yesterday.
Example:
If you have:
- Current request count: 15,000
- Request count 24h ago: 12,000
```promql
(http_requests_total / http_requests_total offset 1d) * 100 - 100
```

Output: 25 (meaning a 25% increase from yesterday)
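The percentage arithmetic can be verified with the hypothetical counts above:

```python
# Hypothetical counts for the offset comparison.
current = 15_000       # http_requests_total now
yesterday = 12_000     # http_requests_total offset 1d

pct_change = (current / yesterday) * 100 - 100
print(pct_change)  # 25.0
```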
## Prediction and Trend Analysis

### predict_linear()

The `predict_linear()` function predicts the value of a time series at a future point based on a linear regression:

```promql
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600)
```

This predicts how much disk space will be free in 4 hours based on the trend of the last hour.
Example:
For a disk that's filling up steadily:
```promql
predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24 * 3600) < 0
```
This alerts if the disk is predicted to run out of space within 24 hours.
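Conceptually, `predict_linear()` fits a least-squares line to the samples in the range and extrapolates it. A Python sketch of that idea, on hypothetical free-space samples losing 1 GiB per hour:

```python
GIB = 1024 ** 3

def predict_linear(points, seconds_ahead):
    """Least-squares line through (t, v) samples, extrapolated past the last sample."""
    n = len(points)
    ts = [t for t, _ in points]
    vs = [v for _, v in points]
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in points)
             / sum((t - t_mean) ** 2 for t in ts))
    intercept = v_mean - slope * t_mean
    return slope * (ts[-1] + seconds_ahead) + intercept

# Hypothetical samples: 10 GiB free at t=0, losing 1 GiB per hour for 6 hours.
samples = [(h * 3600, (10 - h) * GIB) for h in range(7)]

# With 4 GiB left and a steady loss rate, the disk is empty in 4 more hours.
print(predict_linear(samples, 4 * 3600) / GIB)  # ≈ 0.0
```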
### deriv()

The `deriv()` function calculates the per-second derivative of a gauge metric's value:

```promql
deriv(process_resident_memory_bytes[10m])
```

This shows the rate at which memory usage is changing.
## Time Aggregation Functions

### <aggregation>_over_time()

These functions perform calculations across time ranges:

```promql
avg_over_time(node_cpu_seconds_total{mode="idle"}[5m])
```

Available aggregation functions include:

- `avg_over_time`: Average value over the time range
- `min_over_time`: Minimum value within the time range
- `max_over_time`: Maximum value within the time range
- `sum_over_time`: Sum of all values in the time range
- `count_over_time`: Count of data points in the time range
- `stddev_over_time`: Standard deviation of values
- `stdvar_over_time`: Standard variance of values
- `last_over_time`: Last value in the time range
- `present_over_time`: Returns 1 if the metric exists in the time range
Example:
For a service with varying response times over the last 10 minutes:
```promql
max_over_time(http_request_duration_seconds[10m])
```
This shows the maximum request duration observed in the last 10 minutes.
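To make the semantics concrete, here is a small Python sketch applying a few of these aggregations to hypothetical raw samples inside a window:

```python
# Hypothetical request durations (seconds) sampled inside a 10m window.
durations = [0.12, 0.34, 0.09, 1.50, 0.28]

avg_over_time = sum(durations) / len(durations)   # ≈ 0.466
max_over_time = max(durations)                    # 1.5
count_over_time = len(durations)                  # 5
last_over_time = durations[-1]                    # 0.28

print(max_over_time)  # 1.5
```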
## Resets and Changes

### resets()

The `resets()` function counts counter resets (when a counter goes down instead of up) within a time range:

```promql
resets(app_crashes_total[1d])
```

This counts how many times the application crashed and restarted (causing the counter to reset) in the last day.
### changes()

The `changes()` function counts the number of times a value changed within the time range:

```promql
changes(app_status{job="api-server"}[1h])
```

This shows how many times the API server status changed in the last hour.
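Both counts boil down to comparing consecutive samples, as this minimal Python sketch with made-up sample values shows:

```python
# resets() counts drops in a counter; changes() counts any value change.
counter = [10, 25, 40, 3, 18, 2, 9]   # hypothetical: two resets (40→3, 18→2)
status = [1, 1, 0, 0, 1, 1, 1]        # hypothetical: two changes (1→0, 0→1)

resets = sum(1 for a, b in zip(counter, counter[1:]) if b < a)
changes = sum(1 for a, b in zip(status, status[1:]) if b != a)
print(resets, changes)  # 2 2
```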
## Time Window Syntax

In PromQL, time windows are specified using these duration units:

- `s` - seconds
- `m` - minutes
- `h` - hours
- `d` - days
- `w` - weeks
- `y` - years

```promql
rate(http_requests_total[5m])    # 5-minute window
increase(errors_total[1h])       # 1-hour window
avg_over_time(cpu_usage[7d])     # 7-day window
```
## Practical Examples

### Detecting Service Degradation

Detecting whether a service's error rate is increasing over the last 10 minutes:

```promql
rate(api_http_errors_total[10m]) / rate(api_http_requests_total[10m]) > 0.05
```

This alerts when more than 5% of requests are resulting in errors.
### Capacity Planning

Predicting when disk space will run out based on usage trends:

```promql
predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 7 * 24 * 3600) / 1024 / 1024 / 1024
```

This shows how many GiB of disk space will be left in 7 days at the current usage rate.
### Comparing Day-over-Day Performance

```promql
(sum(rate(http_requests_total[1h])) / sum(rate(http_requests_total[1h] offset 1d))) * 100 - 100
```

This calculates the percentage change in HTTP request rate compared to the same hour yesterday.
### SLA Compliance Checking

```promql
sum_over_time(up{job="api-service"}[30d]) / count_over_time(up{job="api-service"}[30d]) * 100 < 99.9
```

This checks whether a service's uptime over the last 30 days is below the 99.9% SLA threshold. (The ratio of `sum_over_time` to `count_over_time` is equivalent to `avg_over_time(up[30d])`.)
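The uptime arithmetic can be sketched in Python: `up` is 1 for a successful scrape and 0 for a failed one, so the mean over the window is the uptime fraction (the sample counts below are hypothetical):

```python
# Hypothetical: 5 failed scrapes out of 10,000 in the 30-day window.
up_samples = [1] * 9995 + [0] * 5

uptime_pct = 100 * sum(up_samples) / len(up_samples)
print(uptime_pct)           # 99.95
print(uptime_pct < 99.9)    # False: the 99.9% SLA is met
```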
## Common Visualization Patterns

### Heatmap of Weekly Patterns

```promql
sum(rate(http_requests_total[5m])) by (day_of_week, hour)
```

When visualized as a heatmap, this shows traffic patterns by day of week and hour. (This assumes the metric carries `day_of_week` and `hour` labels, which typically have to be added by the instrumentation or a recording rule.)
### 95th Percentile Over Time

```promql
histogram_quantile(0.95, sum(rate(http_request_duration_bucket[5m])) by (le))
```

This shows the 95th percentile of HTTP request durations over time.
## Summary

PromQL's time-based functions provide powerful tools for analyzing metrics across time dimensions. These functions allow you to:

- Calculate rates of change with `rate()` and `irate()`
- Measure increases with `increase()` and changes with `delta()`
- Compare current metrics with historical values using `offset`
- Predict future trends with `predict_linear()`
- Perform time-based aggregations with `*_over_time()` functions
- Detect anomalies and resets with `resets()` and `changes()`

Mastering these functions lets you build more effective monitoring dashboards and alerts that can detect trends and issues before they become critical problems.
## Exercises
- Write a PromQL query to calculate the average CPU usage over the last hour.
- Create a query that compares today's error rate with yesterday's at the same time.
- Write a query to predict when memory usage will exceed 90% if the current trend continues.
- Create an alert query that triggers when the service response time has increased by more than 50% compared to the last hour.
- Write a query to find the busiest hour of the day based on request rates.