# PromQL Rate Function

## Introduction
The `rate()` function is one of the most fundamental and frequently used functions in PromQL (Prometheus Query Language). It's essential for analyzing how counter metrics change over time, allowing you to calculate the per-second average rate of increase of a time series within a specified time window.

In monitoring systems like Prometheus, many metrics are stored as counters: values that only increase over time (except when they reset, typically after a process restart). Examples include total HTTP requests received, bytes sent, or errors encountered. While raw counter values tell you the total count since the start, the rate of change often provides more actionable insights.

This is where the `rate()` function comes in: it transforms monotonically increasing counter values into per-second rates that help you understand system behavior over time.
## Syntax and Basic Usage
The basic syntax of the `rate()` function is:

```promql
rate(counter_metric[time_range])
```

Where:

- `counter_metric` is a counter-type metric
- `time_range` is the time window (or "lookback window") used to calculate the rate
The `rate()` function:
- Takes a range vector as input (a time series with values over a time range)
- Calculates the per-second average rate of increase over that time range
- Returns an instant vector with the calculated rate values
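To make the input and output types concrete, here is a minimal sketch using the `http_requests_total` counter that appears throughout this guide:

```promql
# Range vector selector: the raw counter samples from the last 5 minutes
http_requests_total[5m]

# rate() consumes that range vector and returns an instant vector:
# one per-second rate value per time series
rate(http_requests_total[5m])
```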
## How Rate Calculation Works
The `rate()` function uses the following approach to calculate the per-second rate:
- It takes the first and last data points within the specified time range
- Calculates the difference between these values
- Divides this difference by the time difference in seconds
- Accounts for counter resets (when a counter goes back to zero after a process restart)
The formula can be represented as:

```
rate ≈ (last_value - first_value) / time_difference_in_seconds
```

There is additional handling for counter resets, plus extrapolation when the first and last samples do not align exactly with the window boundaries.
### Example: Basic Rate Calculation
Consider a counter metric `http_requests_total` that tracks the total number of HTTP requests. To calculate the per-second rate of HTTP requests over the last 5 minutes:

```promql
rate(http_requests_total[5m])
```
If the counter had these values:
- 100 at t=0 seconds
- 160 at t=300 seconds (5 minutes)
The calculation would be:

```
rate = (160 - 100) / 300 = 0.2 requests per second
```
## Counter Resets and How Rate Handles Them
One important feature of `rate()` is its ability to handle counter resets. When a service restarts, its counters typically reset to zero. The `rate()` function detects these resets and still calculates the rate correctly.
For example, if a counter had these values:
- 100 at t=0 seconds
- 0 at t=150 seconds (after a service restart)
- 50 at t=300 seconds
The `rate()` function detects the reset (the drop from 100 to 0), compensates by treating the samples as if the counter had kept increasing (effectively 100, 100, 150), and calculates:

```
rate = ((100 - 100) + (50 - 0)) / 300 ≈ 0.17 requests per second
```

Only the increase actually observed inside the window, 50 after the restart, counts toward the rate.
This ability to handle counter resets makes `rate()` robust for real-world monitoring scenarios where services may restart.
## Best Practices for Time Range Selection
The time range you select affects the sensitivity and accuracy of your rate calculations:
- Too short (e.g., `[30s]`): more responsive to sudden changes, but more susceptible to noise and scrape gaps
- Too long (e.g., `[1h]`): smoother, but might mask important short-term variations
General guidelines:
- For high-frequency metrics: 1-5 minutes
- For standard metrics: 5-15 minutes
- For slow-changing metrics: 15+ minutes
A common starting point is `[5m]`, which balances responsiveness and stability:

```promql
rate(http_requests_total[5m])
```
## `rate()` vs. `irate()`
PromQL offers two main functions for calculating rates:

- `rate()`: Calculates the per-second average rate over the entire time range
- `irate()`: Calculates the per-second instant rate using only the last two data points
Here's a comparison:
| Function | Calculation Method | Use Case | Advantages | Disadvantages |
|---|---|---|---|---|
| `rate()` | Average over the entire range | General monitoring, dashboards | Smooths out spikes, better for graphing | May miss short-lived spikes |
| `irate()` | Only the last two samples | Alerting, detecting sudden changes | More responsive to sudden changes | Noisier, less stable |
For most dashboard visualizations, `rate()` is preferred because it provides a more stable signal.
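As a quick side-by-side on the same metric used throughout this guide, the two queries differ only in the function name, yet they can return noticeably different values during traffic spikes:

```promql
# Average per-second rate over the last 5 minutes: smooth and dashboard-friendly
rate(http_requests_total[5m])

# Per-second rate from only the last two samples in the window: reacts quickly, but noisier
irate(http_requests_total[5m])
```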
## Real-World Examples

### Example 1: HTTP Request Rate by Endpoint

This query calculates the per-second rate of HTTP requests for each endpoint:

```promql
rate(http_requests_total{job="api-server"}[5m])
```

You can add label filters to focus on specific endpoints:

```promql
rate(http_requests_total{job="api-server", endpoint="/api/users"}[5m])
```
### Example 2: Error Rates and Success Rates

Calculate the per-second rate of errors:

```promql
rate(http_errors_total[5m])
```

Calculate the error percentage (combining two rate calculations):

```promql
rate(http_errors_total[5m]) / rate(http_requests_total[5m]) * 100
```
### Example 3: Network Traffic Throughput

Calculate network throughput in MB/s:

```promql
rate(network_bytes_transferred{interface="eth0"}[5m]) / (1024 * 1024)
```
### Example 4: CPU Usage Rate

Calculate CPU usage from a counter of total CPU seconds consumed:

```promql
rate(process_cpu_seconds_total{job="app-server"}[5m]) * 100
```

This gives the process's CPU usage as a percentage of a single CPU core (it can exceed 100% for multi-threaded processes).
## Visualizing Rate Data

Rate data is typically visualized on time-series graphs, where turning an ever-increasing counter into a per-second rate makes trends and anomalies much easier to spot.
A typical dashboard might include the following panels (a sketch of matching queries follows this list):

- Raw request count (using the counter directly)
- Request rate (using `rate()`)
- Error rate (using `rate()` on error counters)
- Success percentage (calculated from rates)
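Here is a minimal sketch of what those panel queries could look like, reusing the `http_requests_total` and `http_errors_total` counters from the examples above and assuming `http_errors_total` counts failed requests:

```promql
# Raw request count: the counter value itself
http_requests_total

# Request rate
rate(http_requests_total[5m])

# Error rate
rate(http_errors_total[5m])

# Success percentage
(1 - rate(http_errors_total[5m]) / rate(http_requests_total[5m])) * 100
```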
## Common Pitfalls and Solutions

### 1. Using Rate with Non-Counter Metrics
The `rate()` function is designed specifically for counter metrics. Using it with gauge metrics will produce incorrect results.

❌ Incorrect:

```promql
rate(node_memory_MemFree_bytes[5m]) # MemFree is a gauge, not a counter
```
✅ Correct approach for gauges: use functions like `delta()` or `deriv()` instead:

```promql
delta(node_memory_MemFree_bytes[5m]) # Change over 5m, not a per-second rate
```
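Since `deriv()` is mentioned above but not shown, here is a brief sketch of that alternative; it returns a per-second slope for a gauge, computed with simple linear regression over the window:

```promql
# Per-second rate of change of a gauge over the last 5 minutes
deriv(node_memory_MemFree_bytes[5m])
```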
### 2. Time Range Too Small
If your time range is too small, the window may not contain the two or more data points `rate()` needs, especially if your scrape interval is close to the range.
❌ Potential issue:

```promql
rate(http_requests_total[10s]) # If scrape interval is 15s, this may not work reliably
```
✅ Better approach:

```promql
rate(http_requests_total[1m]) # Ensure multiple data points in the range
```
The general rule is to use a time range at least 4 times your scrape interval.
### 3. Alerting on Spurious Spikes
Alerting on `rate()` can sometimes trigger false alarms due to temporary spikes.
❌ Sensitive to spikes:

```promql
rate(http_errors_total[1m]) > 5
```
✅ More robust alerting:

```promql
rate(http_errors_total[5m]) > 5
```
For alerting specifically on spikes, `irate()` might be appropriate with properly chosen thresholds.
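Another way to make alerts more robust, building on the error-ratio query from Example 2, is to alert on the error percentage rather than the absolute error rate; the 5% threshold here is only an illustrative assumption:

```promql
# Fire when more than 5% of requests fail, averaged over the last 5 minutes
rate(http_errors_total[5m]) / rate(http_requests_total[5m]) > 0.05
```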
## Advanced Usage: Combining with Other Functions
The `rate()` function is often combined with other PromQL functions for more sophisticated analyses:
### Aggregating Rates Across Instances

```promql
sum by (instance) (rate(http_requests_total[5m]))
```

This calculates the per-second request rate for every matching time series, then sums those rates so you get one total per instance (for example, summing across the endpoints served by the same instance).
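A couple of related aggregations help show how the `by` clause controls the grouping; these reuse the same metric and the `endpoint` label from Example 1:

```promql
# One total request rate across all instances and label combinations
sum(rate(http_requests_total[5m]))

# One series per endpoint instead of per instance
sum by (endpoint) (rate(http_requests_total[5m]))
```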
### Moving Averages of Rates

```promql
avg_over_time(rate(http_requests_total[5m])[1h:15m])
```

This subquery evaluates the 5-minute rate every 15 minutes over the last hour, and `avg_over_time` then averages those values into a 1-hour moving average.
### Predicting Future Values

```promql
predict_linear(rate(http_requests_total[6h])[1h:], 3600)
```

This fits a linear trend to the last hour of rate values (each computed over a 6-hour window) and predicts what the rate will be 1 hour (3600 seconds) from now.
## Summary
The `rate()` function is a cornerstone of PromQL that transforms counter metrics into more actionable per-second rates. Key points to remember:

- Use `rate()` only with counter metrics
- Select an appropriate time range that balances responsiveness and stability
- Remember that `rate()` handles counter resets automatically
- Use `rate()` for visualization and general monitoring; consider `irate()` for alerting on sudden changes
- Combine it with aggregation and other functions for more sophisticated analyses
By mastering the `rate()` function, you'll be able to extract meaningful insights from your time-series data and build effective monitoring dashboards.
## Exercises
- Calculate the per-second rate of HTTP requests over the last 10 minutes.
- Compare the error rates between different service endpoints.
- Create a query that shows the percentage of CPU usage per container.
- Build a query that calculates the ratio of errors to total requests over 5 minutes.
- Implement a query that predicts what your request rate will be in 4 hours based on the current trend.