Common Metric Patterns

Introduction

LogQL, Loki's query language, provides powerful capabilities not just for retrieving logs but also for extracting metrics from your log data. This ability to transform logs into metrics bridges the gap between logging and monitoring systems, enabling you to derive quantitative insights from qualitative log data.

In this guide, we'll explore common metric patterns in LogQL that help you extract valuable metrics from your logs. These patterns are essential for effective monitoring, alerting, and visualization in Grafana Loki.

Understanding LogQL Metric Queries

Before diving into specific patterns, let's understand the basic structure of a LogQL metric query:

metric_function(stream_selector | pipeline_stages [range])

A metric query in LogQL consists of three main components:

Stream selector: Selects the log streams to query (e.g., {app="frontend"})
Pipeline stages: Optional transformations to prepare log lines (e.g., | json | line_format "{{.message}}")
Metric extraction: A function that wraps the log query and converts the lines in each time range into numeric values (e.g., rate(... [5m]))
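As a minimal sketch of how the three components fit together, the query below annotates each part. The label value and the "timeout" filter string are illustrative assumptions, not taken from a real deployment.

```logql
# The metric function wraps the whole log query; the range goes last, inside the parentheses.
rate(
  {app="frontend"}      # stream selector
    |= "timeout"        # pipeline stage: line filter
    | json              # pipeline stage: parse JSON fields into labels
  [5m]                  # range: the look-back window for each data point
)
```

The range in square brackets belongs to the log query, not to the function name, which is why it sits inside the parentheses.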

Now, let's explore common metric patterns you'll use regularly.

Common Metric Patterns

1. Rate of Log Lines

The most basic metric pattern is measuring the rate of log occurrences. This is useful for tracking error rates, activity levels, or any event frequency.

Example: Error Rate Monitoring

logql
rate({app="payment-service"} |= "error" [1m])

This query:

Selects logs from the payment service
Filters for lines containing "error"
Calculates the rate of these errors per second over a 1-minute window

Output Visualization

When visualized in Grafana, this produces a time series showing error frequency, allowing you to spot spikes or abnormal patterns.

2. Counting Specific Events

The count_over_time function lets you count occurrences within specific time windows.

Example: Login Attempts

logql
count_over_time({app="auth-service"} |= "login attempt" [5m])

This counts login attempts over 5-minute windows, which is useful for authentication monitoring.

Difference from Rate

While rate() gives per-second values, count_over_time() provides absolute counts within each time window.
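To make the difference concrete, here is a short sketch that runs both functions over the same selector and window. These are two separate queries shown together for comparison; the selector and filter string are the ones used in the login example and are illustrative.

```logql
# Absolute number of matching lines in each 5-minute window
count_over_time({app="auth-service"} |= "login attempt" [5m])

# Per-second rate over the same window: the count above divided by 300 seconds
rate({app="auth-service"} |= "login attempt" [5m])
```

For instance, 600 matching lines in a 5-minute window give a count of 600 and a rate of 2 per second.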

3. Extracting and Aggregating Numeric Values

A powerful pattern is extracting numeric values from logs and aggregating them.

Example: Average Response Time

logql
avg_over_time(
  {app="web-server"}
    | regexp `response_time=(?P<response_time>[0-9]+)`
    | unwrap response_time
  [1m]
)

This query:

Selects web server logs
Extracts response time using a regular expression
Unwraps the extracted label so that its numeric value, rather than the line count, becomes the sample
Calculates the average over 1-minute windows
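Lines whose extracted value cannot be converted to a number are flagged through the __error__ label and can make an unwrap query fail. A hedged variant of the query above, assuming the same response_time field, filters those lines out:

```logql
avg_over_time(
  {app="web-server"}
    | regexp `response_time=(?P<response_time>[0-9]+)`
    | unwrap response_time
    | __error__=""      # drop lines whose value could not be converted
  [1m]
)
```

Placing the __error__ filter after unwrap means it also catches conversion errors raised by the unwrap step itself.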

4. Calculating Percentiles

For performance monitoring, percentiles are often more useful than averages.

Example: 95th Percentile Response Time

logql
quantile_over_time(0.95,
  {app="api-gateway"}
    | json
    | duration > 0
    | unwrap duration
  [5m]
)

This query extracts the 95th percentile of API response durations over 5-minute windows.
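Percentiles are often more useful when split by a dimension. The sketch below computes the 99th percentile per route; the route field is a hypothetical JSON key assumed for illustration, and duration is assumed to be a plain number.

```logql
quantile_over_time(0.99,
  {app="api-gateway"}
    | json
    | unwrap duration
    | __error__=""
  [5m]
) by (route)
```

If the field holds a Go-style duration string such as 250ms, `| unwrap duration(duration)` would be the form to try instead, since it converts the string to seconds.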

5. Creating Histograms

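A distribution view shows how values are spread rather than a single summary number. LogQL does not provide PromQL's histogram_quantile function; quantiles over unwrapped values are computed with quantile_over_time instead.

Example: Response Time Distribution

logql
quantile_over_time(0.5,
  {app="web-server"}
    | regexp `response_time=(?P<response_time>[0-9]+)`
    | unwrap response_time
  [$__range]
)

This calculates the median (50th percentile) of response times over the dashboard's selected time range ($__range is a Grafana variable). Repeating the query with other quantiles, such as 0.9 and 0.99, gives a picture of the distribution.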

6. Group-Based Metrics

Grouping metrics by labels allows for comparative analysis.

Example: Error Rates by Service

logql
sum by (service_name) (
  rate(
    {environment="production"} |= "ERROR"
      | label_format service_name="{{.service}}"
    [5m]
  )
)

This query:

Selects production error logs
Copies the value of the service label into a service_name label
Calculates error rates
Sums rates by service
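When there are many services, a grouped result can be narrowed to the noisiest ones. This is a sketch that assumes service already exists as a label on the streams:

```logql
# The three services with the highest error rate
topk(3,
  sum by (service) (
    rate({environment="production"} |= "ERROR" [5m])
  )
)
```

In a range query topk is evaluated at every step, so a graph may show more than three series over the whole time range.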

7. Absent Metrics (Detecting Missing Logs)

Sometimes the absence of logs is as important as their presence.

Example: Detecting Service Silence

logql
absent_over_time({app="critical-service"} [10m])

This returns a value of 1 when the critical service has no logs for 10 minutes, useful for detecting silent failures.

Applying Metrics in Real-World Scenarios

Let's explore how these patterns apply to real-world monitoring scenarios.

Application Performance Monitoring

logql
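sum by (endpoint) (
  rate({app="ecommerce"} | json [1m])
)

This tracks request rates across different endpoints in an e-commerce application, assuming each log line represents one request and the JSON payload contains an endpoint field. To track latency instead, unwrap the request time with | unwrap duration(request_time) and aggregate with avg_over_time or quantile_over_time.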

Error Budgeting

logql
1 - (
  sum(rate({app="payment-gateway"} |= "success" [1h]))
  /
  sum(rate({app="payment-gateway"} [1h]))
)

This calculates the error rate as a proportion of total requests, useful for SLO monitoring.

Security Monitoring

logql
sum by (user) (
  count_over_time(
    {app="auth-service"}
      |= "failed login"
      | json
      | label_format user="{{.username}}"
    [5m]
  )
)

This counts failed login attempts per user, helping identify potential brute force attacks.
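For alerting, a comparison against a threshold keeps only the users who exceed it. The sketch below groups directly on the username field extracted by the json stage; the threshold of 10 is an arbitrary assumption.

```logql
# Users with more than 10 failed logins in the last 5 minutes
sum by (username) (
  count_over_time(
    {app="auth-service"} |= "failed login" | json [5m]
  )
) > 10
```

Series that do not satisfy the comparison are dropped from the result, so an empty result means no user crossed the threshold.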

Visualizing Metrics in Grafana

These LogQL metric patterns can be visualized in Grafana using:

Time series panels: For rate, count, and numeric metrics over time
Gauge panels: For current values against thresholds
Bar charts: For comparing metrics across different services/components
Heatmaps: For visualizing histogram data

Common Pitfalls and Optimization

Potential Issues

High cardinality: Be cautious with high-cardinality labels in group operations
Resource consumption: Complex metric queries can be resource-intensive
Time window selection: Windows that are too small produce noisy series; windows that are too large hide short-lived changes

Optimizations

logql
# Instead of this (parses every line as JSON before filtering)
rate({app="busy-service"} | json | status=~"5.." [1m])

# Do this (a cheap line filter discards most lines before parsing)
rate({app="busy-service"} |= `"status":5` | json | status=~"5.." [1m])

Always filter logs as early as possible in your query to improve performance.

Combining Metric Patterns

You can combine patterns for more sophisticated monitoring, for example by dividing one grouped rate by another to obtain a ratio per service.
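As one possible combination, the sketch below joins the rate, filtering, and grouping patterns to produce the share of log lines containing "ERROR" for each service. It assumes service is a label on the production streams.

```logql
# Fraction of log lines that are errors, per service
  sum by (service) (rate({environment="production"} |= "ERROR" [5m]))
/
  sum by (service) (rate({environment="production"} [5m]))
```

Both sides are grouped by the same label, so the division matches each service's error rate with its own total rate.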

Summary

LogQL metric patterns transform your logs into valuable metrics, enabling:

Rate-based monitoring for events and errors
Extraction and analysis of numeric values
Distribution analysis through percentiles and histograms
Comparative analysis through grouping
Detection of missing logs

By mastering these patterns, you can build comprehensive monitoring solutions that leverage your log data beyond traditional log analysis.

Exercises

Create a query to monitor the 99th percentile response time for a web application
Develop a query that tracks error rates by HTTP status code
Build a query to detect when a critical service hasn't logged any activity for 5 minutes
Create a dashboard showing the top 5 users by login frequency
