PromQL Time-based Functions

Time is a fundamental dimension in monitoring and observability. In Prometheus, the ability to analyze metrics across different time ranges is essential for effective monitoring. This guide explores PromQL's time-based functions that allow you to manipulate the time dimension of your metrics.

Introduction to Time-based Functions

PromQL (Prometheus Query Language) provides powerful time-based functions to:

  • Calculate rates of change
  • Analyze data over specific time windows
  • Predict future values based on historical trends
  • Compare current metrics with past values

These functions are crucial for both real-time monitoring and retrospective analysis of system performance.

Core Time-based Functions

rate() and irate()

The rate() function calculates the per-second average rate of increase for a counter over a specified time range.

promql
rate(http_requests_total[5m])

This returns the per-second rate of HTTP requests over the last 5 minutes.

Example:

For a counter with these values:

  • t=0s: 100 requests
  • t=60s: 160 requests
  • t=120s: 220 requests
  • t=180s: 300 requests
  • t=240s: 350 requests
  • t=300s: 420 requests
promql
rate(http_requests_total[5m])

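Output at t=300s: approximately 1.067 requests/second ((420 - 100) = 320 requests ÷ 300 seconds). Prometheus extrapolates to the edges of the window, so real results can differ slightly from this hand calculation.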

The related function irate() calculates the instant rate based only on the last two data points:

promql
irate(http_requests_total[5m])

Output at t=300s: 1.167 requests/second (70 requests ÷ 60 seconds)

When to use which: Use rate() for alerting and for slow-moving counters, because averaging over the whole window smooths out short-lived noise. Use irate() only when graphing volatile, fast-moving counters where you need to see brief spikes that rate() would average out.
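The arithmetic in the example above can be reproduced with a small Python sketch. It is a simplified model rather than Prometheus's implementation: the function names are illustrative, and it assumes no counter resets and no extrapolation to the window boundaries.

```python
# Simplified model of rate() and irate() on (timestamp_seconds, value) samples.
# Assumption: no counter resets and no boundary extrapolation, unlike real Prometheus.

def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)


# The counter values from the example above.
samples = [(0, 100), (60, 160), (120, 220), (180, 300), (240, 350), (300, 420)]

print(round(simple_rate(samples), 3))   # 1.067
print(round(simple_irate(samples), 3))  # 1.167
```

The two results differ because irate() only sees the final 60-second step, while rate() averages over all five steps.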

increase()

The increase() function shows the total increase in a counter over a time period:

promql
increase(http_requests_total[1h])

This shows the total number of requests received in the last hour.

Example:

If a service handled 3,600 requests in the last hour:

promql
increase(http_requests_total[1h])

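Output: approximately 3600 (increase() extrapolates to the window boundaries, so the result is not always an integer)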

Tip: You can think of increase(x[5m]) as being equivalent to 5 * 60 * rate(x[5m]).
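As a rough illustration of what increase() measures, the sketch below sums the growth between consecutive counter samples and treats any drop as a counter reset. It is a simplified model, not Prometheus's implementation: it ignores extrapolation to the window boundaries, and the function name and sample values are hypothetical.

```python
# Simplified model of increase() over a list of raw counter values.
# Assumption: a drop in value means the counter restarted from zero.

def simple_increase(values):
    """Total growth across consecutive samples, compensating for resets."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            # Counter reset: everything counted since the restart is new growth.
            total += cur
        else:
            total += cur - prev
    return total


steady = [100, 160, 220, 300, 350, 420]
with_reset = [100, 160, 220, 30, 90]  # hypothetical restart after the third sample

print(simple_increase(steady))      # 320.0
print(simple_increase(with_reset))  # 210.0
```

Without the reset handling, the second series would appear to have decreased by 10, which is why increase() and rate() should only be applied to counters.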

delta()

The delta() function calculates the difference between the first and last value of a gauge metric in a time range:

promql
delta(cpu_temperature_celsius[1h])

This shows how much the CPU temperature has changed over the last hour.

Example:

If the CPU temperature readings were:

  • t=0min: 65°C
  • t=15min: 68°C
  • t=30min: 72°C
  • t=45min: 70°C
  • t=60min: 67°C
promql
delta(cpu_temperature_celsius[1h])

Output: 2 (67 - 65 = 2)

idelta()

Similar to delta(), but only considers the last two points in the specified range:

promql
idelta(cpu_temperature_celsius[1h])

Using the same example data, the output would be: -3 (67 - 70 = -3)
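The difference between delta() and idelta() on the temperature readings can be checked with a short Python sketch. The function names are illustrative, and the sketch ignores the boundary extrapolation that delta() performs in Prometheus.

```python
# Simplified model of delta() and idelta() on (minute, value) gauge samples.
# Assumption: no extrapolation to the window boundaries.

def simple_delta(samples):
    """Difference between the last and the first value in the window."""
    return samples[-1][1] - samples[0][1]


def simple_idelta(samples):
    """Difference between the last two values in the window."""
    return samples[-1][1] - samples[-2][1]


# The CPU temperature readings from the example above, in degrees Celsius.
readings = [(0, 65.0), (15, 68.0), (30, 72.0), (45, 70.0), (60, 67.0)]

print(simple_delta(readings))   # 2.0
print(simple_idelta(readings))  # -3.0
```

Note that neither value reveals the 72°C peak in the middle of the window; max_over_time() is the better tool for that question.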

Time Shift Functions

offset

The offset modifier allows you to look back in time relative to the current query time:

promql
http_requests_total offset 1h

This returns the value of http_requests_total from 1 hour ago.

Comparing Current vs Past Values

One powerful application is comparing current metrics with historical ones. Because http_requests_total is a counter, compare its increase over a window rather than its raw cumulative value:

promql
(increase(http_requests_total[1h]) / increase(http_requests_total[1h] offset 1d)) * 100 - 100

This calculates the percentage change in requests over the last hour compared to the same hour yesterday.

Example:

If you have:

  • Requests in the last hour: 15,000
  • Requests in the same hour 24h ago: 12,000
promql
(increase(http_requests_total[1h]) / increase(http_requests_total[1h] offset 1d)) * 100 - 100

Output: 25 (meaning a 25% increase from yesterday)

Prediction and Trend Analysis

predict_linear()

The predict_linear() function predicts the value of a time series at a future point based on a linear regression:

promql
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600)

This predicts how much disk space will be free in 4 hours based on the trend of the last hour.

Example:

For a disk that's filling up steadily:

promql
predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24 * 3600) < 0

This expression returns a result when the disk is predicted to run out of space within 24 hours, which makes it suitable as an alert condition.
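To make the linear-regression idea concrete, here is a minimal Python sketch that fits a least-squares line to samples and evaluates it at a future time. The function names, the sample data (free space falling by 1 GiB per hour), and the evaluation time are all hypothetical; this illustrates the technique and is not Prometheus's code.

```python
# Least-squares line fit over (timestamp_seconds, value) samples,
# then extrapolation to a point in the future.

def fit_line(samples):
    """Return (slope, intercept) of the least-squares line through the samples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    return slope, mean_v - slope * mean_t


def simple_predict_linear(samples, eval_time, seconds_ahead):
    """Value of the fitted line at eval_time + seconds_ahead."""
    slope, intercept = fit_line(samples)
    return slope * (eval_time + seconds_ahead) + intercept


GIB = 1024 ** 3
# Hypothetical data: 10 GiB free at t=0, dropping 1 GiB per hour, sampled every 10 minutes.
samples = [(t, 10 * GIB - GIB * t / 3600) for t in range(0, 3601, 600)]

predicted = simple_predict_linear(samples, eval_time=3600, seconds_ahead=4 * 3600)
print(round(predicted / GIB, 3))  # 5.0
```

Because the fit is linear, the prediction is only as good as the assumption that the recent trend continues; a longer range smooths out short bursts at the cost of reacting more slowly.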

deriv()

The deriv() function calculates the per-second derivative of a gauge metric's value, using simple linear regression over the samples in the range:

promql
deriv(process_resident_memory_bytes[10m])

This shows the rate at which memory usage is changing.

Time Aggregation Functions

<aggregation>_over_time()

These functions aggregate the samples of each series over a time range. They are most meaningful on gauges; for counters, apply rate() or increase() instead:

promql
avg_over_time(node_load1[5m])

Available aggregation functions include:

  • avg_over_time: Average value over the time range
  • min_over_time: Minimum value within the time range
  • max_over_time: Maximum value within the time range
  • sum_over_time: Sum of all values in the time range
  • count_over_time: Count of data points in the time range
  • stddev_over_time: Population standard deviation of values in the time range
  • stdvar_over_time: Population variance of values in the time range
  • last_over_time: Most recent value in the time range
  • present_over_time: Returns 1 for any series that has at least one sample in the time range

Example:

For a service with varying response times over the last 10 minutes, assuming http_request_duration_seconds is exposed as a gauge:

promql
max_over_time(http_request_duration_seconds[10m])

This shows the maximum request duration observed in the last 10 minutes.
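The behaviour of the *_over_time() family can be pictured as ordinary statistics applied to the samples that fall inside the window. The Python sketch below uses hypothetical gauge samples; the dictionary keys simply mirror the PromQL function names for readability.

```python
import statistics

# Hypothetical gauge samples (seconds) that fall inside one query window.
window = [0.12, 0.30, 0.25, 0.90, 0.18]

# Each entry mirrors what the PromQL function of the same name computes
# for a single series over this window.
aggregations = {
    "avg_over_time": statistics.fmean(window),
    "min_over_time": min(window),
    "max_over_time": max(window),
    "sum_over_time": sum(window),
    "count_over_time": len(window),
    "stddev_over_time": statistics.pstdev(window),    # population standard deviation
    "stdvar_over_time": statistics.pvariance(window),  # population variance
    "last_over_time": window[-1],
}

for name, value in aggregations.items():
    print(f"{name}: {value:.4f}")
```

Each function is evaluated per series, so the output of a real query has one result for every series matched by the selector.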

Resets and Changes

resets()

The resets() function counts counter resets (when a counter goes down instead of up) within a time range:

promql
resets(app_crashes_total[1d])

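This counts how many times the counter was reset in the last day. Counters usually reset when the process exposing them restarts, so the result approximates the number of restarts rather than the number of crashes the counter itself records.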

changes()

The changes() function counts the number of times a value changed within the time range:

promql
changes(app_status{job="api-server"}[1h])

This shows how many times the API server status changed in the last hour.
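Both functions compare each sample with the one before it. The following Python sketch shows the counting logic on hypothetical sample values; the function names are illustrative.

```python
# Simplified model of resets() and changes() over a list of raw sample values.

def count_resets(values):
    """Number of times a value is lower than the sample before it."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


def count_changes(values):
    """Number of times a value differs from the sample before it."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


counter = [5, 9, 12, 0, 3, 7, 1, 4]  # hypothetical counter with two restarts
status = [1, 1, 0, 0, 1, 1, 1]       # hypothetical status gauge that flaps once

print(count_resets(counter))  # 2
print(count_changes(status))  # 2
```

A status that goes down and comes back up counts as two changes, which is worth remembering when setting alert thresholds on changes().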

Time Window Syntax

In PromQL, time windows are specified as a number followed by a unit:

  • ms - milliseconds
  • s - seconds
  • m - minutes
  • h - hours
  • d - days (always 24h)
  • w - weeks (always 7d)
  • y - years (always 365d)
promql
rate(http_requests_total[5m])    # 5-minute window
increase(errors_total[1h])       # 1-hour window
avg_over_time(cpu_usage[7d])     # 7-day window
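When checking window sizes by hand, it can help to convert a duration string into seconds. The Python sketch below is a hypothetical helper, not part of any Prometheus client library. It accepts concatenated durations such as 1h30m, but unlike Prometheus it does not enforce that units appear once and in descending order.

```python
import re

# Seconds per unit; days, weeks, and years use fixed lengths (24h, 7d, 365d).
UNIT_SECONDS = {
    "ms": 0.001,
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 604800,
    "y": 31536000,
}

# "ms" is listed before "m" so that "5ms" is not read as "5m" followed by "s".
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def duration_to_seconds(text):
    """Convert a duration such as '5m' or '1h30m' into seconds."""
    if not text:
        raise ValueError("empty duration")
    position, total = 0, 0.0
    while position < len(text):
        match = _PART.match(text, position)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        position = match.end()
    return total


print(duration_to_seconds("5m"))     # 300.0
print(duration_to_seconds("1h30m"))  # 5400.0
print(duration_to_seconds("7d"))     # 604800.0
```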

Practical Examples

Detecting Service Degradation

Detecting whether a service's error ratio over the last 10 minutes exceeds a threshold:

promql
rate(api_http_errors_total[10m]) / rate(api_http_requests_total[10m]) > 0.05

This returns a result, and can therefore fire an alert, when more than 5% of requests in the last 10 minutes resulted in errors.

Capacity Planning

Estimating how much disk space will remain in a week, based on usage trends:

promql
predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 7 * 24 * 3600) / 1024 / 1024 / 1024

This shows how many GiB of disk space will be left in 7 days if the trend of the last 6 hours continues. A negative result means the disk is predicted to fill up before then.

Comparing Day-over-Day Performance

promql
(sum(rate(http_requests_total[1h])) / sum(rate(http_requests_total[1h] offset 1d))) * 100 - 100

This calculates the percentage change in HTTP request rate compared to the same hour yesterday.

SLA Compliance Checking

promql
sum_over_time(up{job="api-service"}[30d]) / count_over_time(up{job="api-service"}[30d]) * 100 < 99.9

This checks if a service's uptime over the last 30 days is below the 99.9% SLA threshold.
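The ratio in this query is the fraction of scrapes in which the target reported up == 1. A short Python sketch makes the arithmetic explicit and also shows how much downtime a 99.9% target permits over 30 days. The scrape counts and function names are hypothetical.

```python
# Availability arithmetic behind sum_over_time(up) / count_over_time(up).

def availability_percent(up_samples):
    """Share of scrapes where the target was up, as a percentage."""
    return 100.0 * sum(up_samples) / len(up_samples)


def allowed_downtime_minutes(sla_percent, window_days):
    """Downtime budget in minutes for a given SLA over a window."""
    return window_days * 24 * 60 * (1 - sla_percent / 100.0)


# Hypothetical window: 2880 scrapes, 2 of which failed.
up_samples = [1] * 2878 + [0] * 2

print(round(availability_percent(up_samples), 3))     # 99.931
print(round(allowed_downtime_minutes(99.9, 30), 1))   # 43.2
```

Note that this measures availability as seen by the scraper: periods in which no scrape was attempted contribute no samples and are therefore not counted as downtime.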

Common Visualization Patterns

Heatmap of Weekly Patterns

promql
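sum(rate(http_requests_total[5m]))

Prometheus does not attach day_of_week or hour labels to series, so grouping by them would not split the result. Instead, graph this query over several weeks and let the visualization layer bucket the points by day of week and hour; the resulting heatmap shows weekly traffic patterns. Within PromQL itself, the day_of_week() and hour() functions (both evaluated in UTC) can restrict a query to particular times.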

95th Percentile Over Time

promql
histogram_quantile(0.95, sum(rate(http_request_duration_bucket[5m])) by (le))

This shows the estimated 95th percentile of HTTP request durations over time, interpolated from the histogram buckets.
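The estimate comes from interpolating inside the bucket that contains the requested rank. The Python sketch below is a simplified model of that idea for classic cumulative buckets, not Prometheus's implementation. It assumes 0 < q <= 1, at least two buckets, a non-zero total count, and a lower bound of 0 for the first bucket; the bucket data is hypothetical.

```python
import math

# Simplified quantile estimation from cumulative histogram buckets.
# buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
# with the last bucket having an upper bound of +Inf.

def simple_histogram_quantile(q, buckets):
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical per-window bucket counts for le="0.1", "0.5", "1", "+Inf".
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]

print(round(simple_histogram_quantile(0.95, buckets), 4))  # 0.8125
```

The result depends on the bucket layout: with wide buckets the true 95th percentile could lie anywhere between 0.5 and 1.0 seconds in this example, so bucket boundaries should be chosen close to the latencies that matter.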

Summary

PromQL's time-based functions provide powerful tools for analyzing metrics across time dimensions. These functions allow you to:

  1. Calculate rates of change with rate() and irate()
  2. Measure increases with increase() and changes with delta()
  3. Compare current metrics with historical values using offset
  4. Predict future trends with predict_linear()
  5. Perform time-based aggregations with *_over_time() functions
  6. Detect counter resets and value changes with resets() and changes()

Mastering these functions lets you build more effective monitoring dashboards and alerts that can detect trends and issues before they become critical problems.

Exercises

  1. Write a PromQL query to calculate the average CPU usage over the last hour.
  2. Create a query that compares today's error rate with yesterday's at the same time.
  3. Write a query to predict when memory usage will exceed 90% if the current trend continues.
  4. Create an alert query that triggers when the service response time has increased by more than 50% compared to the last hour.
  5. Write a query to find the busiest hour of the day based on request rates.

