Rate Function
Introduction
The rate() function is a fundamental operation in LogQL metrics queries that lets you calculate how quickly a counter metric is increasing over time. It is particularly useful for analyzing time-series data where you need to understand the velocity of change rather than just the raw accumulated values.
In monitoring and observability scenarios, the rate() function helps you answer questions like:
- How many requests per second is my application processing?
- What is the rate of error occurrences over time?
- How quickly is disk space being consumed?
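As a quick illustration of the first question, the simplest form of rate() operates directly on a log stream and returns the per-second rate of matching log lines. The app label below is just a placeholder; assuming each request produces one log line, this gives requests per second averaged over the last five minutes:
rate({app="frontend"}[5m])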
This guide will walk you through the purpose, syntax, and practical applications of the rate() function in LogQL, which is similar to its counterpart in PromQL (Prometheus Query Language).
Understanding Counter Metrics
Before diving into the rate() function, it's important to understand what counter metrics are:
- Counter: A metric that only increases over time (or resets to zero when the process restarts)
- Examples: Total HTTP requests, error counts, bytes sent
Counters continuously accumulate values, so looking at raw counter values often isn't useful. What's more meaningful is the rate of change of these counters, which is exactly what the rate() function calculates.
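As a rough sketch of the difference (the app label and the http_requests_total field are hypothetical), compare querying the raw counter with rating it. The first query only tells you an ever-growing total:
last_over_time({app="frontend"} | json | unwrap http_requests_total[5m])
while the second turns the same counter into a per-second rate you can actually reason about:
rate({app="frontend"} | json | unwrap http_requests_total[5m])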
Syntax and Usage
The basic syntax of the rate() function in LogQL is:
rate(metric_expression[time_range])
Where:
- metric_expression is a LogQL expression that returns a counter metric
- time_range is the lookback window over which to calculate the rate
Parameters
- time_range: Specifies the time window for rate calculation (e.g., 5m, 1h, 30s)
- Longer ranges produce smoother graphs with less noise
- Shorter ranges show more detail but can be noisier (compare the two windows shown below)
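For instance, here is the same (hypothetical) request counter rated over two different windows; the 30-second version reacts quickly but tends to look spiky, while the 1-hour version gives a smooth trend that is slower to reflect change:
rate({app="frontend"} | json | unwrap request_count[30s])
rate({app="frontend"} | json | unwrap request_count[1h])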
Examples
Basic Usage
Here's a simple example that calculates the rate of HTTP requests per second over a 5-minute window:
rate({app="frontend"}
| json | __error__=""
| unwrap request_count[5m])
This query:
- Selects logs from the frontend application
- Parses them as JSON
- Filters out parsing errors
- Extracts the request_count metric
- Calculates the per-second rate of increase over 5-minute windows
Visualizing Request Rate by Status Code
To see the rate of HTTP requests by status code:
rate({app="web-server", job="nginx"}
| pattern `<_> - - <_> "<method> <_> <_>" <status> <_>`
| status=~"5.."
| unwrap count_over_time({app="web-server", job="nginx"}[1m])
by (status))
This will show you the rate of 5xx errors per second, broken down by specific status code.
Calculating CPU Usage Rate
To monitor how quickly CPU usage is changing:
rate({app="system-metrics"}
| json
| unwrap cpu_seconds_total[2m])
This calculates the per-second rate at which CPU seconds are being consumed.
Real-World Applications
Alerting on Sudden Spikes
One practical application is setting up alerts for abnormal increases in error rates:
rate({app="payment-service"}
| json
| level="error"
| unwrap error_count[5m]) > 10
This alert would trigger if the application starts generating more than 10 errors per second over a 5-minute window.
Capacity Planning
You can use the rate() function to analyze growth trends and plan for capacity needs:
avg_over_time(
rate({app="database"}
| json
| unwrap storage_bytes[1h])[24h:1h]
)
This query calculates the average hourly growth rate of database storage over the past 24 hours, which can help predict when you'll need to add more storage.
Service Level Objective (SLO) Monitoring
To track whether your service is meeting its performance objectives:
sum(
rate({app="api-gateway"}
| json
| response_time > 0.5
| unwrap request_count[5m])
)
/
sum(
rate({app="api-gateway"}
| json
| unwrap request_count[5m])
)
This calculates the ratio of slow requests (response time > 500ms) to total requests, helping you monitor your SLO compliance.
Common Pitfalls and Best Practices
Handling Counter Resets
When a counter resets (e.g., when a service restarts), the rate() function automatically handles this by detecting and compensating for the drop back to zero.
Choosing the Right Time Range
- Too short: May produce noisy, spiky graphs
- Too long: May smooth out important short-term variations
- Rule of thumb: Use a range at least 4x the scrape interval for reliable results
Memory Usage Considerations
Be cautious with very long time ranges as they can consume significant memory in Loki. For long-range analysis, consider using recording rules to pre-compute common expressions.
Comparing with Other Functions
| Function | Purpose | When to Use |
| --- | --- | --- |
| rate() | Per-second average rate of increase | For regular monitoring dashboards, general trending |
| irate() | Instant rate based on last two samples | When you need to see rapid changes in real-time |
| increase() | Total increase over a time period | When you want the absolute increase rather than per-second rate |
Summary
The rate() function is an essential tool in LogQL metrics that transforms raw counter values into meaningful rates of change. By calculating how quickly metrics are increasing per second, it provides valuable insights into system behavior, performance trends, and potential issues.
Key takeaways:
- Use rate() to convert counter metrics into per-second rates
- Choose an appropriate time range for your specific monitoring needs
- Remember that rate() automatically handles counter resets
- The function is invaluable for alerting, capacity planning, and SLO monitoring
Exercises
- Create a query that shows the rate of 4xx errors across different service instances
- Build a dashboard that compares the rate of successful transactions versus failed ones
- Set up an alert that triggers when the rate of database connections exceeds a threshold
- Calculate and visualize the rate at which log volume is increasing for a specific application
Additional Resources
- Grafana Loki Documentation
- LogQL Metrics Overview
- Prometheus Rate Function (similar to Loki's implementation)