Rate Function
Introduction
The rate() function is a fundamental operation in LogQL metrics queries that lets you calculate how quickly a counter metric is increasing over time. It is particularly useful for analyzing time-series data where you need to understand the velocity of change rather than just the raw accumulated values.
In monitoring and observability scenarios, the rate() function helps you answer questions like:
- How many requests per second is my application processing?
- What is the rate of error occurrences over time?
- How quickly is disk space being consumed?
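As a quick illustration of the first question, the simplest form of rate() operates directly on a log stream and returns the per-second rate of matching log lines. The app label below is just a placeholder; assuming each request produces one log line, this gives requests per second averaged over the last five minutes:
rate({app="frontend"}[5m])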
This guide will walk you through the purpose, syntax, and practical applications of the rate() function in LogQL, which is similar to its counterpart in PromQL (Prometheus Query Language).
Understanding Counter Metrics
Before diving into the rate() function, it's important to understand what counter metrics are:
- Counter: A metric that only increases over time (or resets to zero when the process restarts)
- Examples: Total HTTP requests, error counts, bytes sent
Counters continuously accumulate values, so looking at raw counter values often isn't useful. What's more meaningful is the rate of change of these counters, which is exactly what the rate() function calculates.
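As a rough sketch of the difference (the app label and the http_requests_total field are hypothetical), compare querying the raw counter with rating it. The first query only tells you an ever-growing total:
last_over_time({app="frontend"} | json | unwrap http_requests_total[5m])
while the second turns the same counter into a per-second rate you can actually reason about:
rate({app="frontend"} | json | unwrap http_requests_total[5m])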
Syntax and Usage
The basic syntax of the rate() function in LogQL is:
rate(metric_expression[time_range])
Where:
- metric_expression is a LogQL expression that returns a counter metric
- time_range is the lookback window over which to calculate the rate
Parameters
- time_range: Specifies the time window for rate calculation (e.g., 5m, 1h, 30s)
- Longer ranges produce smoother graphs with less noise
- Shorter ranges show more detail but can be noisier (compare the two windows shown below)
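For instance, here is the same (hypothetical) request counter rated over two different windows; the 30-second version reacts quickly but tends to look spiky, while the 1-hour version gives a smooth trend that is slower to reflect change:
rate({app="frontend"} | json | unwrap request_count[30s])
rate({app="frontend"} | json | unwrap request_count[1h])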
Examples
Basic Usage
Here's a simple example that calculates the rate of HTTP requests per second over a 5-minute window:
rate({app="frontend"}
| json | __error__=""
| unwrap request_count[5m])
This query:
- Selects logs from the frontend application
- Parses them as JSON
- Filters out parsing errors
- Extracts the request_count metric
- Calculates the per-second rate of increase over 5-minute windows
Visualizing Request Rate by Status Code
To see the rate of HTTP requests by status code:
rate({app="web-server", job="nginx"}
| pattern `<_> - - <_> "<method> <_> <_>" <status> <_>`
| status=~"5.."
| unwrap count_over_time({app="web-server", job="nginx"}[1m])
by (status))
This will show you the rate of 5xx errors per second, broken down by specific status code.
Calculating CPU Usage Rate
To monitor how quickly CPU usage is changing:
rate({app="system-metrics"}
| json
| unwrap cpu_seconds_total[2m])
This calculates the per-second rate at which CPU seconds are being consumed.
Real-World Applications
Alerting on Sudden Spikes
One practical application is setting up alerts for abnormal increases in error rates:
rate({app="payment-service"}
| json
| level="error"
| unwrap error_count[5m]) > 10
This alert would trigger if the application starts generating more than 10 errors per second over a 5-minute window.
Capacity Planning
You can use the rate() function to analyze growth trends and plan for capacity needs:
avg_over_time(
rate({app="database"}
| json
| unwrap storage_bytes[1h])[24h:1h]
)
This query calculates the average hourly growth rate of database storage over the past 24 hours, which can help predict when you'll need to add more storage.
Service Level Objective (SLO) Monitoring
To track whether your service is meeting its performance objectives:
sum(
rate({app="api-gateway"}
| json
| response_time > 0.5
| unwrap request_count[5m])
)
/
sum(
rate({app="api-gateway"}
| json
| unwrap request_count[5m])
)
This calculates the ratio of slow requests (response time > 500ms) to total requests, helping you monitor your SLO compliance.
Common Pitfalls and Best Practices
Handling Counter Resets
When a counter resets (e.g., when a service restarts), the rate() function automatically handles this by detecting and compensating for the drop back to zero.
Choosing the Right Time Range
- Too short: May produce noisy, spiky graphs
- Too long: May smooth out important short-term variations
- Rule of thumb: Use a range at least 4x the scrape interval for reliable results
Memory Usage Considerations
Be cautious with very long time ranges as they can consume significant memory in Loki. For long-range analysis, consider using recording rules to pre-compute common expressions.
Comparing with Other Functions
| Function | Purpose | When to Use |
| --- | --- | --- |
| rate() | Per-second average rate of increase | For regular monitoring dashboards, general trending |
| irate() | Instant rate based on last two samples | When you need to see rapid changes in real-time |
| increase() | Total increase over a time period | When you want the absolute increase rather than per-second rate |
Summary
The rate() function is an essential tool in LogQL metrics that transforms raw counter values into meaningful rates of change. By calculating how quickly metrics are increasing per second, it provides valuable insights into system behavior, performance trends, and potential issues.
Key takeaways:
- Use rate() to convert counter metrics into per-second rates
- Choose an appropriate time range for your specific monitoring needs
- Remember that rate() automatically handles counter resets
- The function is invaluable for alerting, capacity planning, and SLO monitoring
Exercises
- Create a query that shows the rate of 4xx errors across different service instances
- Build a dashboard that compares the rate of successful transactions versus failed ones
- Set up an alert that triggers when the rate of database connections exceeds a threshold
- Calculate and visualize the rate at which log volume is increasing for a specific application
Additional Resources
- Grafana Loki Documentation
- LogQL Metrics Overview
- Prometheus Rate Function (similar to Loki's implementation)