Logs to Metrics
Introduction
When monitoring applications and infrastructure, we often have two types of data: logs and metrics. Logs are text-based records of events that happened in your system, while metrics are numerical measurements of system behavior over time.
While these data types serve different purposes, there's tremendous value in being able to extract metrics from logs. This process, known as "Logs to Metrics," allows you to:
- Quantify patterns in your log data
- Create visualizations based on log content
- Set up alerts based on log-derived metrics
- Correlate log events with performance metrics
In this guide, we'll explore how Grafana Loki allows you to transform logs into metrics using LogQL, Loki's query language. This powerful capability bridges the gap between qualitative log data and quantifiable metrics for monitoring and alerting.
Understanding Logs to Metrics Conversion
What Makes a Good Log-Based Metric?
Before diving into the technical implementation, let's understand what types of information in logs make good metrics:
- Numerical values - Response times, error codes, request sizes
- Countable events - Login attempts, API calls, errors
- Categorical data - HTTP status codes, error types, user actions
Basic Concepts
When converting logs to metrics in Loki, we generally follow this process:
- Filter the relevant logs using LogQL
- Extract the values you care about (using patterns or parsing)
- Transform these values into metrics using aggregation functions
- Visualize the resulting metrics in Grafana dashboards
Let's see how this works in practice.
Basic Log Metric Extraction with LogQL
Loki's LogQL language provides operators that allow you to extract and aggregate numerical data from your logs.
Counting Log Lines
The simplest metric you can extract is a count of log occurrences. For example, to count error logs:
count_over_time({app="myapp"} |= "error" [5m])
This query:
- Selects logs from the "myapp" application
- Filters for logs containing the word "error"
- Counts occurrences within 5-minute windows
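Note that count_over_time returns one series per matching log stream. If you want a single total across all streams (for example, across several pods of the same app), wrap the query in a sum() vector aggregation:
sum(count_over_time({app="myapp"} |= "error" [5m]))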
Extracting Numerical Values
To extract specific numerical values from logs, we use regex capture groups:
{app="webserver"} | regexp "request_time=(?P<request_time>[0-9.]+)" | unwrap request_time
This pipeline:
- Selects logs from the "webserver" application
- Uses a regex capture group to extract the request time value
- Unwraps the extracted value so it can be used as a numeric sample
Note that unwrap on its own does not produce a metric; it only prepares the value, and the pipeline must sit inside a range aggregation (shown next) to form a complete metric query.
Aggregating Extracted Values
Once values are extracted, you wrap the whole pipeline in a range aggregation function:
avg_over_time(
  {app="webserver"}
    | regexp "request_time=(?P<request_time>[0-9.]+)"
    | unwrap request_time
  [5m]
)
This calculates the average request time in 5-minute windows.
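Averages can hide outliers, so percentiles are often more useful. As a variation on the same pipeline, quantile_over_time computes a 95th-percentile request time:
quantile_over_time(0.95,
  {app="webserver"}
    | regexp "request_time=(?P<request_time>[0-9.]+)"
    | unwrap request_time
  [5m]
)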
Practical Examples
Let's go through some real-world examples of extracting metrics from different types of logs.
Example 1: HTTP Response Times from Nginx Logs
Nginx access logs can include response times (for example, when $request_time is added to the log_format). Let's extract and visualize these:
{job="nginx"}
| regexp `.*HTTP/1\.\d" \d+ \d+ (?P<response_time>\d+\.\d+).*`
| unwrap response_time
| avg_over_time[1m]
This query:
- Selects Nginx logs
- Extracts the response time using regex
- Calculates 1-minute average response times
In Grafana, you could visualize this as a time series graph showing how response times change over time.
Example 2: Error Rate from Application Logs
Let's calculate an error rate from application logs:
sum(count_over_time({app="payment-service"} |= "ERROR" [1m]))
/
sum(count_over_time({app="payment-service"} [1m]))
This query:
- Counts error logs in 1-minute windows
- Divides by the total number of log lines in the same windows
- Returns a ratio between 0 and 1 showing what fraction of log lines are errors (multiply by 100 for a percentage)
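If your streams carry a label that identifies where the logs came from, the same ratio can be computed per label value. A sketch, assuming the streams have an instance label:
sum by (instance) (count_over_time({app="payment-service"} |= "ERROR" [1m]))
/
sum by (instance) (count_over_time({app="payment-service"} [1m]))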
Example 3: Parsing JSON Logs
Many modern applications output logs in JSON format. Here's how to extract metrics from them:
{app="user-service"}
| json
| duration > 100ms
| unwrap response_time_ms
| sum_over_time[5m]
This query:
- Selects logs from the user service
- Parses them as JSON
- Filters for slow responses (>100ms)
- Extracts the response time
- Sums these times over 5-minute windows
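Because the json parser turns fields into labels, you can also break an unwrapped aggregation down by one of them. A sketch, assuming each log line carries a route field:
avg_over_time(
  {app="user-service"}
    | json
    | unwrap response_time_ms
  [5m]
) by (route)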
Creating Log-Based Metrics Dashboards
Once you've defined your log-based metrics, you can visualize them in Grafana dashboards. Here's how:
- Create a new dashboard in Grafana
- Add a new panel
- Select Loki as the data source
- Enter your LogQL query
- Configure visualization options (graph, gauge, etc.)
Let's see an example dashboard structure:
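A dashboard for a web service built entirely from log-derived metrics might contain panels such as (the queries are illustrative and reuse the examples above):
- Request rate - time series panel using rate({job="nginx"} [5m])
- Error ratio - stat or gauge panel using the error-rate division query from Example 2
- Response time - time series panels using avg_over_time and quantile_over_time over the unwrapped response_time
- Status code breakdown - time series or bar chart using sum by (status_code) (count_over_time({app="api"} | json | __error__="" [5m]))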
Setting Up Alerts on Log-Derived Metrics
One of the most powerful features of converting logs to metrics is the ability to set up alerts based on log patterns.
For example, to alert when error rates exceed 5%:
- Create a Grafana alert rule
- Use your error rate LogQL query
- Set a threshold condition (> 0.05)
- Configure notification channels
Example alert definition:
# Alert when error rate exceeds 5%
groups:
  - name: payment-service
    rules:
      - alert: HighErrorRate
        expr: sum(count_over_time({app="payment-service"} |= "ERROR" [5m])) / sum(count_over_time({app="payment-service"} [5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: High error rate detected
          description: Payment service error rate is above 5% for more than 10 minutes
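This follows the Prometheus-style rule file format understood by the Loki ruler; the same LogQL expression can also be used as the condition of a Grafana-managed alert rule with Loki as the data source.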
Advanced Techniques
Using Vector Aggregation
LogQL supports vector aggregations similar to PromQL, allowing for more complex metric calculations:
sum by (status_code) (count_over_time({app="api"} | json | __error__="" [5m]))
This query counts log occurrences grouped by status code, creating a metric for each status code.
Rate vs. Count
The rate function can be used instead of count_over_time to normalize the result to a per-second value (it is equivalent to count_over_time divided by the length of the range in seconds):
rate({app="api"} |= "error" [5m])
Custom Labels and Metrics
You can create custom labels for your metrics using the label_format operator:
{app="webserver"}
| json
| label_format service_level=`{{.status}}xx`
| unwrap response_time
This would create metrics with service level labels like "20x", "40x", etc.
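The same formatted label works for counting as well. A minimal sketch that counts requests per class of status code:
sum by (service_level) (
  count_over_time(
    {app="webserver"}
      | json
      | label_format service_level=`{{ .status | substr 0 1 }}xx`
    [5m]
  )
)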
Best Practices
When implementing logs to metrics conversions, consider these best practices:
- Focus on actionable metrics - Convert logs to metrics that help you make decisions
- Consider cardinality - Too many unique label values can cause performance issues
- Balance precision and performance - More detailed metrics require more processing
- Start simple - Begin with basic counts before complex extractions
- Document your queries - LogQL queries can become complex; documentation helps
LogQL Cheat Sheet for Metrics Extraction
Here's a quick reference of useful LogQL functions for metrics extraction:
| Function | Description | Example |
| --- | --- | --- |
| count_over_time | Count log lines over a time period | count_over_time({app="auth"} \|= "failed" [5m]) |
| rate | Per-second rate of log lines | rate({app="api"} [5m]) |
| unwrap | Extract a value as a sample | {app="web"} \| json \| unwrap duration_ms |
| avg_over_time | Average of unwrapped values over time | avg_over_time({job="nginx"} \| regexp "<pattern>" \| unwrap time [5m]) |
| sum_over_time | Sum of unwrapped values over time | sum_over_time({app="orders"} \| json \| unwrap order_total [5m]) |
| max_over_time | Maximum unwrapped value over time | max_over_time({app="api"} \| regexp "<pattern>" \| unwrap response_time [5m]) |
Summary
Converting logs to metrics bridges the gap between qualitative log data and quantifiable metrics. With Grafana Loki and LogQL, you can:
- Extract numerical data from your logs
- Aggregate this data into meaningful metrics
- Visualize patterns and trends in your log data
- Set up alerts based on log-derived metrics
This capability provides a more complete monitoring solution, especially for applications that don't directly expose metrics or when you need to correlate logs with performance data.
Further Learning
To deepen your understanding of logs to metrics conversion in Grafana Loki, try these exercises:
- Extract and visualize response times from your application logs
- Create an error rate dashboard for your services
- Set up an alert for unusual patterns in your logs
- Experiment with different aggregation functions
- Try combining log-derived metrics with traditional metrics
Remember that effective monitoring combines both logs and metrics for a complete picture of your system's health and performance.