PromQL Rate Function

Introduction

The rate() function is one of the most fundamental and frequently used functions in PromQL (Prometheus Query Language). It's essential for analyzing how counter metrics change over time, allowing you to calculate the per-second average rate of increase of time series within a specified time window.

In monitoring systems like Prometheus, many metrics are stored as counters - values that only increase over time (except when they reset or restart). Examples include total HTTP requests received, bytes sent, or errors encountered. While raw counter values tell you the total count since the start, the rate of change often provides more actionable insights.

This is where the rate() function comes in - it transforms monotonically increasing counter values into per-second rates that help you understand system behavior over time.

Syntax and Basic Usage

The basic syntax of the rate() function is:

promql
rate(counter_metric[time_range])

Where:

  • counter_metric is a counter type metric
  • time_range is the time window (or "lookback window") for calculating the rate

The rate() function:

  1. Takes a range vector as input (a time series with values over a time range)
  2. Calculates the per-second average rate of increase over that time range
  3. Returns an instant vector with the calculated rate values

How Rate Calculation Works

The rate() function uses the following approach to calculate the per-second rate:

  1. It takes the first and last data points within the specified time range
  2. Calculates the difference between these values
  3. Divides this difference by the time difference in seconds
  4. Accounts for counter resets (when a counter goes back to zero after a process restart)

The formula can be represented as:

rate = (last_value - first_value) / time_difference_in_seconds

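In practice, rate() also corrects for counter resets and extrapolates toward the edges of the window, so its result can differ slightly from this simple formula when the samples do not line up exactly with the window boundaries.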

Example: Basic Rate Calculation

Consider a counter metric http_requests_total that tracks the total number of HTTP requests. To calculate the per-second rate of HTTP requests over the last 5 minutes:

promql
rate(http_requests_total[5m])

If the counter had these values:

  • 100 at t=0 seconds
  • 160 at t=300 seconds (5 minutes)

The calculation would be:

rate = (160 - 100) / 300 = 0.2 requests per second
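The arithmetic above can be checked with a minimal Python sketch. The `simple_rate` helper and the `(timestamp, value)` sample layout are illustrative assumptions, not Prometheus internals, and the sketch deliberately ignores counter resets and the extrapolation that the real rate() performs.

```python
def simple_rate(samples):
    """Per-second rate from the first and last (timestamp_s, value) samples.

    Simplified on purpose: no counter-reset handling and no extrapolation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# The two samples from the example: 100 at t=0 s, 160 at t=300 s.
requests_per_second = simple_rate([(0, 100), (300, 160)])
print(requests_per_second)
```

Because the two samples sit exactly on the window boundaries here, the simplified result matches what the example describes.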

Counter Resets and How Rate Handles Them

One important feature of rate() is its ability to handle counter resets. When a service restarts, counters typically reset to zero. The rate() function detects these resets and correctly calculates the rate despite them.

For example, if a counter had these values:

  • 100 at t=0 seconds
  • 0 at t=150 seconds (after a service restart)
  • 50 at t=300 seconds

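The rate() function would treat the drop from 100 to 0 as a reset and count only the increase observed after it:

rate = ((100 - 100) + (50 - 0)) / 300 = 50 / 300 ≈ 0.17 requests per second

Any requests handled between the last sample before the restart and the restart itself were never scraped, so they cannot be counted.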

This ability to handle counter resets makes rate() robust for real-world monitoring scenarios where services may restart.
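A minimal Python sketch of reset-aware counting, assuming that any decrease between consecutive samples marks a reset to zero. The helper name `increase_with_resets` is hypothetical, and the extrapolation step of the real function is left out.

```python
def increase_with_resets(values):
    """Total increase of a counter, treating any drop as a reset to zero."""
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        if curr < prev:
            # Counter reset: only the growth since the restart is observable.
            total += curr
        else:
            total += curr - prev
    return total


values = [100, 0, 50]   # samples at t=0 s, t=150 s, t=300 s
window_seconds = 300
increase = increase_with_resets(values)
rate = increase / window_seconds
print(increase, round(rate, 3))
```

The first sample (100) only serves as the baseline for the window. The observed increase is the 50 requests counted after the restart, which gives roughly 0.17 requests per second.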

Best Practices for Time Range Selection

The time range you select affects the sensitivity and accuracy of your rate calculations:

  • Too short (e.g., [30s]): More responsive to sudden changes but more susceptible to noise and scrape gaps
  • Too long (e.g., [1h]): Smoother but might mask important short-term variations

General guidelines:

  • For high-frequency metrics: 1-5 minutes
  • For standard metrics: 5-15 minutes
  • For slow-changing metrics: 15+ minutes

A common starting point is [5m], which balances responsiveness and stability:

promql
rate(http_requests_total[5m])
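How the window length trades responsiveness for smoothness can be illustrated with a small Python simulation. The counter data, the 15-second scrape interval, and the `windowed_rate` helper are all hypothetical, and the helper uses the simplified first/last formula rather than the full rate() behavior.

```python
def windowed_rate(samples, now, window_s):
    """Average per-second rate over samples with now - window_s <= t <= now."""
    in_window = [(t, v) for t, v in samples if now - window_s <= t <= now]
    if len(in_window) < 2:
        return None  # not enough samples to compute a rate
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter scraped every 15 s: 1 request/s for most of an hour,
# then a 60-second burst of 10 requests/s at the very end.
samples, value = [], 0
for t in range(0, 3601, 15):
    samples.append((t, value))
    value += 15 * (10 if 3540 <= t < 3600 else 1)

for window_s in (60, 300, 3600):
    print(window_s, round(windowed_rate(samples, 3600, window_s), 2))
```

In this simulated data the 1-minute window reports the burst at full strength, the 5-minute window shows a diluted value, and the 1-hour window barely moves, which is the trade-off described above.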

Rate vs. irate

PromQL offers two main functions for calculating rates:

  • rate(): Calculates the per-second average rate over the entire time range
  • irate(): Calculates the per-second instant rate using only the last two data points

Here's a comparison:

| Function | Calculation Method | Use Case | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| rate() | Average over entire range | General monitoring, dashboards, alerting | Smooths out spikes, better for graphing | May miss short-lived spikes |
| irate() | Only last two samples | Graphing volatile, fast-moving counters | More responsive to sudden changes | More noisy, less stable |

For most dashboard visualizations, rate() is preferred because it provides a more stable signal.
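The difference between the two calculations can be sketched in Python. Both helpers are simplified stand-ins (no reset handling, no extrapolation), and the sample data is hypothetical.

```python
def avg_rate(samples):
    """rate()-style: average slope between the first and last samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def instant_rate(samples):
    """irate()-style: slope between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter scraped every 15 s; a burst arrives in the final interval.
samples = [(0, 0), (15, 15), (30, 30), (45, 45), (60, 195)]
print(avg_rate(samples))      # (195 - 0) / 60
print(instant_rate(samples))  # (195 - 45) / 15
```

The averaged value stays moderate because the burst is spread over the whole window, while the last-two-samples value jumps to the burst rate immediately.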

Real-World Examples

Example 1: HTTP Request Rate by Endpoint

This query calculates the per-second rate of HTTP requests for each endpoint:

promql
rate(http_requests_total{job="api-server"}[5m])

You can add label filters to focus on specific endpoints:

promql
rate(http_requests_total{job="api-server", endpoint="/api/users"}[5m])

Example 2: Error Rates and Success Rates

Calculate the per-second rate of errors:

promql
rate(http_errors_total[5m])

Calculate error percentage (combining two rate calculations):

promql
rate(http_errors_total[5m]) / rate(http_requests_total[5m]) * 100
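The same arithmetic, written out as a small Python sketch. The counter increases are made-up numbers and `error_percentage` is a hypothetical helper. Returning `None` for zero traffic is a choice made for this sketch; a plain floating-point 0 / 0 in PromQL evaluates to NaN.

```python
def error_percentage(error_rate, request_rate):
    """Errors as a percentage of requests; None when there is no traffic."""
    if request_rate == 0:
        return None  # sketch-only convention; PromQL would give NaN for 0 / 0
    return error_rate / request_rate * 100


# Hypothetical increases over a 5-minute (300 s) window.
errors_per_s = 30 / 300       # 30 errors in the window
requests_per_s = 1500 / 300   # 1500 requests in the window
print(round(error_percentage(errors_per_s, requests_per_s), 2))
```

In PromQL the division pairs up series with identical label sets by default, so both counters need matching labels for the ratio to return data.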

Example 3: Network Traffic Throughput

Calculate network throughput in MiB/s (bytes per second divided by 1024 * 1024):

promql
rate(network_bytes_transferred{interface="eth0"}[5m]) / (1024 * 1024)

Example 4: CPU Usage Rate

Calculate CPU usage rate from a counter of total CPU seconds:

promql
rate(process_cpu_seconds_total{job="app-server"}[5m]) * 100

This gives the percentage of a single CPU core used by the process.

Visualizing Rate Data

Rate data is typically visualized on time-series graphs to show trends.

A typical dashboard might include:

  1. Raw request count (using the counter directly)
  2. Request rate (using rate())
  3. Error rate (using rate() on error counters)
  4. Success percentage (calculated from rates)

Common Pitfalls and Solutions

1. Using Rate with Non-Counter Metrics

The rate() function is designed specifically for counter metrics. Applied to a gauge, it treats every decrease as a counter reset, so the result does not reflect how the gauge actually changed.

Incorrect:

promql
rate(node_memory_MemFree_bytes[5m])  # MemFree is a gauge, not a counter

Correct approach for gauges: Use functions like delta() or deriv() instead:

promql
delta(node_memory_MemFree_bytes[5m])  # Change over 5m, not per-second rate
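Why counter logic misreads a gauge can be shown with a short Python sketch. Both helpers are simplified illustrations (the real delta() also extrapolates), and the free-memory values are hypothetical.

```python
def reset_aware_increase(values):
    """Counter-style increase: every drop is treated as a reset to zero."""
    total = 0
    for prev, curr in zip(values, values[1:]):
        total += curr if curr < prev else curr - prev
    return total


def gauge_delta(values):
    """Gauge-style change: last value minus first value."""
    return values[-1] - values[0]


# Hypothetical free-memory gauge (in MiB) that dips and then partly recovers.
free_mib = [100, 80, 90]
print(reset_aware_increase(free_mib))  # counter logic applied to a gauge
print(gauge_delta(free_mib))           # gauge logic
```

The counter logic interprets the dip from 100 to 80 as a restart and reports a large increase, even though free memory actually fell by 10 MiB over the window.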

2. Time Range Too Small

If your time range is too small, the window may contain fewer than the two samples that rate() needs, especially if your scrape interval is close to the range. In that case the query returns no data.

Potential issue:

promql
rate(http_requests_total[10s])  # If scrape interval is 15s, this may not work reliably

Better approach:

promql
rate(http_requests_total[1m])  # Ensure multiple data points in the range

The general rule is to use a time range at least 4 times your scrape interval.
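The reasoning behind a wider range can be sketched in Python. This is an idealized model that assumes perfectly regular scrapes with none missed; the helper names are hypothetical.

```python
def min_rate_window(scrape_interval_s, factor=4):
    """Suggested minimum range in seconds: a multiple of the scrape interval."""
    return scrape_interval_s * factor


def guaranteed_samples(window_s, scrape_interval_s):
    """Fewest samples any window of this length contains with regular scrapes."""
    return window_s // scrape_interval_s


scrape_interval_s = 15
for window_s in (10, 15, 30, min_rate_window(scrape_interval_s)):
    print(window_s, guaranteed_samples(window_s, scrape_interval_s))
```

With a 15-second scrape interval, a 10-second window can contain no samples at all, and a 30-second window is only just guaranteed the two samples a rate needs, so a single missed scrape breaks it. A 60-second window leaves room for a missed scrape or two.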

3. Alerting on Spurious Spikes

Alerting on rate() can sometimes trigger false alarms due to temporary spikes.

Sensitive to spikes:

promql
rate(http_errors_total[1m]) > 5

More robust alerting:

promql
rate(http_errors_total[5m]) > 5

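irate() is even more sensitive to brief spikes, so it is a poor fit for alerting rules. The Prometheus documentation recommends rate() for alerts and reserves irate() for graphing volatile, fast-moving counters.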

Advanced Usage: Combining with Other Functions

The rate() function is often combined with other PromQL functions for more sophisticated analyses:

Aggregating Rates Across Instances

promql
sum by (instance) (rate(http_requests_total[5m]))

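This calculates the request rate for each series, then sums the rates of all series that share the same instance label, giving one total per instance. To get a single total across all instances, aggregate by a broader label instead, for example sum by (job).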
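The grouping step behind sum by (instance) can be sketched in Python. The series, label values, and rates below are hypothetical, and `sum_by` is an illustrative helper rather than anything Prometheus exposes.

```python
from collections import defaultdict


def sum_by(label, series_rates):
    """Sum the rates of all series that share the same value for `label`."""
    totals = defaultdict(float)
    for labels, rate in series_rates:
        totals[labels[label]] += rate
    return dict(totals)


# Hypothetical per-series rates, as rate(http_requests_total[5m]) might return.
series_rates = [
    ({"instance": "app-1", "endpoint": "/api/users"}, 2.0),
    ({"instance": "app-1", "endpoint": "/api/orders"}, 0.5),
    ({"instance": "app-2", "endpoint": "/api/users"}, 1.5),
]
print(sum_by("instance", series_rates))
```

Every label other than the grouping label is dropped from the result, which is why the endpoint breakdown disappears after aggregation.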

Moving Averages of Rates

promql
avg_over_time(rate(http_requests_total[5m])[1h:15m])

This creates a 1-hour moving average of the 5-minute rates, sampled every 15 minutes.

Predicting Future Values

promql
predict_linear(rate(http_requests_total[6h])[1h:], 3600)

This evaluates the 6-hour rate at each step over the last hour, fits a linear regression to those values, and extrapolates the rate 1 hour (3600 seconds) ahead.
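predict_linear() is documented as using simple linear regression. The Python sketch below shows that idea under stated assumptions: it extrapolates from the timestamp of the last sample (Prometheus extrapolates from the evaluation time), and the data points are hypothetical.

```python
def predict_linear_sketch(points, seconds_ahead):
    """Fit a least-squares line to (t, value) points and extrapolate it.

    Assumes at least two points with distinct timestamps.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (points[-1][0] + seconds_ahead)


# Hypothetical rate values sampled once a minute for an hour,
# rising steadily by 0.01 requests/s per minute.
points = [(60 * i, 2.0 + 0.01 * i) for i in range(61)]
print(round(predict_linear_sketch(points, 3600), 3))
```

Because the sample data lies on a straight line, the prediction simply continues that line for another hour. Real rate data is noisier, and the fit then reflects the average trend across the window.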

Summary

The rate() function is a cornerstone of PromQL that transforms counter metrics into more actionable per-second rates. Key points to remember:

  • Use rate() only with counter metrics
  • Select an appropriate time range that balances responsiveness and stability
  • Remember that rate() handles counter resets automatically
  • Use rate() for visualization, general monitoring, and alerting; reserve irate() for graphing volatile, fast-moving counters
  • Combine with aggregation and other functions for more sophisticated analyses

By mastering the rate() function, you'll be able to extract meaningful insights from your time-series data and build effective monitoring dashboards.

Exercises

  1. Calculate the per-second rate of HTTP requests over the last 10 minutes.
  2. Compare the error rates between different service endpoints.
  3. Create a query that shows the percentage of CPU usage per container.
  4. Build a query that calculates the ratio of errors to total requests over 5 minutes.
  5. Implement a query that predicts what your request rate will be in 4 hours based on the current trend.
