
Rate Function

Introduction

The rate() function is a fundamental operation in LogQL metric queries that calculates a per-second rate over a time window. Applied to a log range, it returns the number of log lines per second; applied to values extracted with unwrap, it returns the sum of those values divided by the window length in seconds. It is particularly useful when you need to understand the velocity of change rather than raw accumulated values.

In monitoring and observability scenarios, the rate() function helps you answer questions like:

  • How many requests per second is my application processing?
  • What is the rate of error occurrences over time?
  • How quickly is disk space being consumed?

This guide walks through the purpose, syntax, and practical applications of the rate() function in LogQL. It resembles its counterpart in PromQL (Prometheus Query Language), but it works on log lines and the values extracted from them rather than on scraped counter samples.

Understanding Counter Metrics

Before diving into the rate() function, it's important to understand what counter metrics are:

  • Counter: A metric that only increases over time (or resets to zero when the process restarts)
  • Examples: Total HTTP requests, error counts, bytes sent

Counters continuously accumulate values, so the raw value is rarely useful on its own. The rate of change is more meaningful. In LogQL, rate() computes it from log lines or from per-line values, while rate_counter() is the variant that treats unwrapped values as a running counter.

Syntax and Usage

The basic syntax of the rate() function in LogQL is:

sql
rate(metric_expression[time_range])

Where:

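  • metric_expression is a log stream selector with an optional pipeline; end the pipeline with | unwrap <label> to use a numeric value from each line instead of counting lines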
  • time_range is the lookback window over which to calculate the rate
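The arithmetic behind the two forms can be sketched in Python. This is only a model of the calculation, not Loki's implementation; the function names, the sample data, and the exact window boundaries are assumptions made for illustration.

```python
def rate_of_lines(timestamps, now, window_s):
    """Log lines per second in the window (now - window_s, now]."""
    in_window = [t for t in timestamps if now - window_s < t <= now]
    return len(in_window) / window_s


def rate_of_values(samples, now, window_s):
    """Sum of unwrapped values in the window, divided by the window length."""
    total = sum(v for t, v in samples if now - window_s < t <= now)
    return total / window_s


# Hypothetical data: timestamps in seconds, one entry per log line.
timestamps = [10, 70, 130, 190, 250, 290]
samples = [(10, 30), (70, 45), (130, 15), (190, 60), (250, 90), (290, 60)]

# Six lines in a 300-second window.
print(rate_of_lines(timestamps, now=300, window_s=300))
# Values summing to 300 in a 300-second window.
print(rate_of_values(samples, now=300, window_s=300))
```

The first call models rate() over a plain log range, the second models rate() over an unwrapped range.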

Parameters

  • time_range: Specifies the time window for rate calculation (e.g., 5m, 1h, 30s)
    • Longer ranges produce smoother graphs with less noise
    • Shorter ranges show more detail but can be noisier

Examples

Basic Usage

Here's a simple example that calculates the rate of HTTP requests per second over a 5-minute window:

sql
rate({app="frontend"}
| json | __error__=""
| unwrap request_count[5m])

This query:

  1. Selects logs from the frontend application
  2. Parses them as JSON
  3. Filters out parsing errors
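  4. Extracts request_count from each line as the sample value
  5. Sums those values over each 5-minute window and divides by 300 seconds, giving requests per second (this assumes request_count is the number of requests each line reports, not a running total)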
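The steps above can be mirrored in a few lines of Python to show what each stage contributes. This is a rough sketch rather than how Loki evaluates the query; the function name and the sample log lines are hypothetical.

```python
import json


def rate_unwrapped(lines, field, window_s):
    """Per-second rate of a numeric JSON field across the lines in one window."""
    total = 0.0
    for line in lines:
        try:
            record = json.loads(line)      # parse the line as JSON
            value = float(record[field])   # "unwrap" the numeric field
        except (ValueError, KeyError, TypeError):
            continue                       # drop lines that fail to parse
        total += value
    return total / window_s                # per-second rate over the window


# Hypothetical lines collected during one 5-minute window.
lines = [
    '{"request_count": 120}',
    'not json at all',
    '{"request_count": 180}',
    '{"other_field": 1}',
]

print(rate_unwrapped(lines, "request_count", window_s=300))
```

Only the two well-formed lines contribute, so the result is their sum divided by 300 seconds.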

Visualizing Request Rate by Status Code

To see the rate of HTTP 5xx responses by status code:

sql
sum by (status) (
rate({app="web-server", job="nginx"}
| pattern `<_> - - <_> "<method> <_> <_>" <status> <_>`
| status=~"5.."
[1m])
)

This will show you the rate of 5xx errors per second, broken down by specific status code.
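As an illustration of the grouping, the following Python sketch counts 5xx lines per status code and divides by the window length. The regular expression and the sample access-log lines are assumptions modelled on the common nginx log layout, not part of the query itself.

```python
import re
from collections import Counter

# Assumed access-log layout: ip - - [time] "METHOD path proto" status bytes
LINE_RE = re.compile(
    r'^\S+ - - \[[^\]]*\] "(?P<method>\S+) \S+ \S+" (?P<status>\d{3}) '
)


def error_rate_by_status(lines, window_s):
    """Per-second rate of 5xx lines, keyed by status code."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status").startswith("5"):
            counts[m.group("status")] += 1
    return {status: n / window_s for status, n in counts.items()}


# Hypothetical lines seen during a 1-minute window.
lines = [
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 500 12',
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 34',
    '10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "POST /c HTTP/1.1" 503 0',
    '10.0.0.4 - - [01/Jan/2024:00:00:04 +0000] "GET /d HTTP/1.1" 500 7',
]

print(error_rate_by_status(lines, window_s=60))
```

Each distinct status code becomes its own series, which is what grouping by status produces in a dashboard.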

Calculating CPU Usage Rate

To monitor how quickly CPU usage is changing:

sql
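rate_counter({app="system-metrics"}
| json
| unwrap cpu_seconds_total[2m])

This calculates the per-second rate at which CPU seconds are being consumed. Because cpu_seconds_total is logged as a running total, the query uses rate_counter(), which treats unwrapped values as a counter; plain rate() would add the logged totals together.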

Real-World Applications

Alerting on Sudden Spikes

One practical application is setting up alerts for abnormal increases in error rates:

sql
rate({app="payment-service"} 
| json
| level="error"
| unwrap error_count[5m]) > 10

This alert fires when the service logs more than 10 errors per second on average over a 5-minute window.

Capacity Planning

You can use the rate() function to analyze growth trends and plan for capacity needs:

sql
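rate({app="database"}
| json
| unwrap storage_bytes[24h]) * 3600

This query calculates the average hourly growth of database storage over the past 24 hours, which can help predict when you'll need to add more storage. It assumes storage_bytes records the bytes added by each logged operation; multiplying the per-second rate by 3600 converts it to bytes per hour. LogQL has no PromQL-style subqueries, so the rate is taken directly over the full 24-hour range.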
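To show how an average hourly growth rate feeds a capacity estimate, here is a small Python sketch. The helper names, the 2 GiB-per-hour figures, and the 96 GiB of free space are all made up for the example.

```python
GIB = 1024 ** 3


def hourly_growth(bytes_added, window_hours):
    """Average bytes added per hour over the window."""
    return sum(bytes_added) / window_hours


def hours_until_full(free_bytes, growth_per_hour):
    """Hours until free space runs out, or None if storage is not growing."""
    if growth_per_hour <= 0:
        return None
    return free_bytes / growth_per_hour


# Hypothetical: 2 GiB added in each of the last 24 hours, 96 GiB still free.
bytes_added = [2 * GIB] * 24
growth = hourly_growth(bytes_added, window_hours=24)

print(growth / GIB)                         # growth in GiB per hour
print(hours_until_full(96 * GIB, growth))   # hours of headroom left
```

A linear projection like this is only as good as the assumption that growth stays constant, so it is best treated as a first estimate.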

Service Level Objective (SLO) Monitoring

To track whether your service is meeting its performance objectives:

sql
sum(
rate({app="api-gateway"}
| json
| response_time > 0.5
| unwrap request_count[5m])
)
/
sum(
rate({app="api-gateway"}
| json
| unwrap request_count[5m])
)

This calculates the ratio of slow requests (response time above 500 ms, assuming response_time is logged in seconds) to total requests. Comparing that ratio with your error budget shows whether the service is meeting its SLO.
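The ratio itself can be sketched in Python. Because the numerator and denominator use the same window, the window length cancels out and only the summed request counts matter. The record layout and values below are hypothetical.

```python
def slow_request_ratio(records, threshold_s=0.5):
    """Share of requests slower than threshold_s, weighted by request_count."""
    total = sum(r["request_count"] for r in records)
    slow = sum(
        r["request_count"] for r in records if r["response_time"] > threshold_s
    )
    return slow / total if total else 0.0


# Hypothetical parsed log records from one 5-minute window.
records = [
    {"response_time": 0.2, "request_count": 90},
    {"response_time": 0.7, "request_count": 10},
]

print(slow_request_ratio(records))
```

With an SLO such as "99% of requests complete within 500 ms", the target for this ratio would be at most 0.01.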

Common Pitfalls and Best Practices

Handling Counter Resets

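When a counter resets (e.g., when a service restarts), its value drops back toward zero. rate() over unwrapped values simply sums whatever values are logged, so it does not detect resets. For values logged as a running total, use rate_counter(), which treats the samples as a counter metric the way PromQL's rate() does.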
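Reset compensation can be sketched as follows: whenever a sample is lower than the previous one, the counter is assumed to have restarted from zero, and the new value is counted as the increase since the restart. This Python model is a simplification that ignores the boundary extrapolation real query engines apply; the function names and samples are hypothetical.

```python
def counter_increase(values):
    """Total increase of a counter, compensating for resets to zero."""
    increase = 0.0
    prev = None
    for v in values:
        if prev is not None:
            # A drop means the counter restarted; count the new value from zero.
            increase += (v - prev) if v >= prev else v
        prev = v
    return increase


def counter_rate(values, window_s):
    """Per-second rate of a counter over a window of window_s seconds."""
    return counter_increase(values) / window_s


# Hypothetical samples: the service restarts between 160 and 20.
values = [100, 130, 160, 20, 50]

print(values[-1] - values[0])          # naive difference goes negative
print(counter_increase(values))        # reset-aware increase
print(counter_rate(values, window_s=60))
```

The naive last-minus-first difference is negative here, while the reset-aware version recovers the real growth on both sides of the restart.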

Choosing the Right Time Range

  • Too short: May produce noisy, spiky graphs
  • Too long: May smooth out important short-term variations
  • Rule of thumb: Use a range that covers several log lines per stream; if values are logged at a fixed interval, a range of at least 4x that interval is a reasonable starting point
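The trade-off is easy to see with a sliding window over a short burst. In this Python sketch, the per-second event counts are invented, and the helper is a plain moving average rather than anything Loki-specific.

```python
def windowed_rates(per_second_counts, window_s):
    """Rate for every full window of window_s seconds, sliding by one second."""
    rates = []
    for end in range(window_s, len(per_second_counts) + 1):
        window = per_second_counts[end - window_s:end]
        rates.append(sum(window) / window_s)
    return rates


# Hypothetical traffic: 20 seconds of silence except one burst of 50 events.
counts = [0] * 10 + [50] + [0] * 9

short = windowed_rates(counts, window_s=2)
long = windowed_rates(counts, window_s=10)

print(max(short))  # the short window shows a tall, brief spike
print(max(long))   # the long window spreads the same burst out
```

The same 50 events produce a peak of 25 per second with a 2-second window but only 5 per second with a 10-second window, which is the smoothing effect described above.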

Memory Usage Considerations

Be cautious with very long time ranges as they can consume significant memory in Loki. For long-range analysis, consider using recording rules to pre-compute common expressions.

Comparing with Other Functions

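| Function | Purpose | When to Use |
| --- | --- | --- |
| rate() | Per-second average rate of log lines, or of the sum of unwrapped values | For regular monitoring dashboards, general trending |
| rate_counter() | Per-second rate of unwrapped values treated as a counter | When the logged value is a running total |
| count_over_time() | Number of log lines in the time range | When you want the absolute count rather than a per-second rate |
| sum_over_time() | Sum of unwrapped values in the time range | When you want the total increase rather than a per-second rate |

Note that irate() and increase() are PromQL functions; LogQL does not provide them.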
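The per-second and absolute views in the table differ only by the window length, as this short Python sketch shows. The helper names and the figure of 1500 events are illustrative assumptions.

```python
def per_second_rate(total_in_window, window_s):
    """Convert an absolute total over a window into a per-second rate."""
    return total_in_window / window_s


def total_from_rate(rate, window_s):
    """Convert a per-second rate back into the absolute total for the window."""
    return rate * window_s


# Hypothetical: 1500 events observed in a 5-minute (300-second) window.
rate = per_second_rate(1500, window_s=300)

print(rate)                                  # events per second
print(total_from_rate(rate, window_s=300))   # back to the absolute total
```

Per-second rates stay comparable across dashboards with different ranges, whereas absolute totals are usually easier to read in reports.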

Summary

The rate() function is an essential tool in LogQL metric queries that turns log lines, and the values extracted from them, into meaningful per-second rates. By showing how quickly events occur or values accumulate, it provides valuable insights into system behavior, performance trends, and potential issues.

Key takeaways:

  • Use rate() to convert log lines or unwrapped values into per-second rates
  • Choose an appropriate time range for your specific monitoring needs
  • Remember that rate_counter(), not rate(), is the variant that treats unwrapped values as a counter
  • The function is invaluable for alerting, capacity planning, and SLO monitoring

Exercises

  1. Create a query that shows the rate of 4xx errors across different service instances
  2. Build a dashboard that compares the rate of successful transactions versus failed ones
  3. Set up an alert that triggers when the rate of database connections exceeds a threshold
  4. Calculate and visualize the rate at which log volume is increasing for a specific application
