# Pattern Parsing in LogQL

## Introduction
Pattern parsing is a powerful feature in LogQL that allows you to extract structured data from unstructured log messages. When working with logs, you'll often encounter semi-structured text that contains valuable information embedded within strings. Pattern parsing provides a way to identify, extract, and work with this data without having to pre-process your logs.
In this guide, we'll explore how LogQL's pattern parsing capabilities work, various parsing methods available, and how to use them effectively to query and analyze your log data.
## Understanding Pattern Parsing
Pattern parsing transforms unstructured log lines into structured data by extracting fields based on patterns. This enables you to:
- Filter logs based on the content of extracted fields
- Create metrics from extracted values
- Group and aggregate logs using extracted labels
- Perform operations on extracted numeric values
LogQL supports multiple parsing methods to extract data from logs:
- Regular expressions with named capture groups
- JSON parsing
- Logfmt parsing
- Pattern parsing with custom formats
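Each of these is applied as a stage in the query pipeline. As a quick orientation, here is one minimal sketch per method (the `{app="myapp"}` stream and the field names are hypothetical):

```logql
{app="myapp"} | regexp `user=(?P<user>\w+)`   # named capture group
{app="myapp"} | json                          # JSON fields become labels
{app="myapp"} | logfmt                        # key=value pairs become labels
{app="myapp"} | pattern `<ip> - <status> <_>` # positional extraction
```

Each form is covered in detail below.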
## Basic Pattern Extraction

The most common way to extract patterns in LogQL is by using the `|` pipe operator followed by a parser expression.

### Using Regular Expressions
Regular expressions provide a flexible way to match patterns in your logs:
```logql
{app="myapp"} |= "ERROR" | regexp `error: (?P<error_message>.*)`
```
In this example:

- We first filter the stream `app="myapp"` to lines that contain the string "ERROR"
- We then extract the error message using a regular expression with a named capture group, `error_message`
The captured value becomes available as a label that you can use in your queries:
```logql
{app="myapp"} |= "ERROR"
| regexp `error: (?P<error_message>.*)`
| label_format severity="error"
```
### JSON Parsing

For logs in JSON format, LogQL provides a dedicated JSON parser:

```logql
{app="payment-service"} | json
```
This extracts all top-level fields from JSON log lines. You can also extract specific fields:
```logql
{app="payment-service"}
| json transaction_id="transaction.id", amount="transaction.amount", currency="transaction.currency"
```
The extracted values are added as labels to your log entries.
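For instance, given a hypothetical log line like the one below, the query above would add the labels `transaction_id="tx-789"`, `amount="49.99"`, and `currency="USD"`:

```
{"transaction": {"id": "tx-789", "amount": 49.99, "currency": "USD"}}
```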
## Advanced Pattern Parsing

### Nested Extraction
You can chain multiple extraction operations to parse complex logs:
```logql
{app="myapp"}
| json
| line_format "{{.message}}"
| logfmt
```

This first extracts the JSON fields, rewrites the log line to just the extracted `message` field, then parses that line with the logfmt parser. Parsers always operate on the current log line, so `line_format` is the bridge that lets you re-parse an extracted field.
Custom Pattern Formats
LogQL supports custom pattern formats using the pattern
parser:
```logql
{app="auth-service"}
| pattern `<time> <_> <_> <level> <_> [<trace_id>] <message>`
```
This extracts structured fields based on their position in the log line. The `<_>` placeholder marks fields to skip.
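As an illustration, a hypothetical line like the following would match the pattern above, yielding `level="INFO"`, `trace_id="trace-abc123"`, and `message="Authentication succeeded"`:

```
2023-06-15T12:34:56Z host1 app INFO worker [trace-abc123] Authentication succeeded
```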
### Unpack Parsing

The `unpack` parser handles log lines produced by Promtail's `pack` stage, which wraps the original line together with extra labels in a JSON object:

```logql
{app="api-gateway"} | unpack
```

This promotes the embedded keys to labels and restores the original log line from the `_entry` field.
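For instance, given a hypothetical packed entry like the following, `unpack` would add a `user="alice"` label and replace the log line with the `_entry` value:

```
{"_entry": "GET /api/v1/orders 200", "user": "alice"}
```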
## Practical Examples

### Example 1: Parsing Application Logs
Let's say we have logs in this format:
```
2023-06-15T12:34:56Z INFO [request-123] User login successful: user_id=456 source_ip=203.0.113.42
```
We can extract structured data with:
```logql
{app="auth-service"} |= "User login"
| regexp `(?P<timestamp>\S+) (?P<level>\S+) \[(?P<request_id>[^\]]+)\] (?P<message>User login .*): user_id=(?P<user_id>\d+) source_ip=(?P<source_ip>\S+)`
```
This extracts:

- `timestamp`
- `level`
- `request_id`
- `message`
- `user_id`
- `source_ip`
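Once extracted, these fields can drive metric queries. A sketch that reuses the `source_ip` capture to rank the most active login sources over the last hour:

```logql
topk(5,
  sum by (source_ip) (
    count_over_time(
      {app="auth-service"} |= "User login"
      | regexp `source_ip=(?P<source_ip>\S+)`
      [1h]
    )
  )
)
```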
### Example 2: Parsing Error Logs and Creating Metrics
We can parse error logs and create metrics from them:
```logql
sum by (error_type) (
  count_over_time(
    {app="payment-service"} |= "ERROR"
    | json error_type="error.type", error_code="error.code"
    [5m]
  )
)
```
This creates a count of errors grouped by error type over a 5-minute window.
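The same pipeline yields a per-second error rate if you swap `count_over_time` for `rate`:

```logql
sum by (error_type) (
  rate(
    {app="payment-service"} |= "ERROR"
    | json error_type="error.type"
    [5m]
  )
)
```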
### Example 3: Combining Multiple Parser Types
For complex logs, you may need to combine parser types:
```logql
{app="orders"}
| json
| line_format "{{.message}}"
| logfmt
```

This extracts the JSON fields, rewrites the log line to just the `message` field, and parses that with logfmt. A numeric field extracted this way, such as a request duration, can then be converted and aggregated with `unwrap` in a metric query.
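Building on this, assuming the logfmt step produces a `duration` label with values like `150ms`, a metric query can unwrap and aggregate it; for example, the 99th-percentile duration:

```logql
quantile_over_time(0.99,
  {app="orders"}
  | json
  | line_format "{{.message}}"
  | logfmt
  | unwrap duration(duration)
  [5m]
) by (app)
```

Here the `duration()` conversion function turns the human-readable value into seconds before aggregation.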
## Performance Considerations
Pattern parsing operations can be computationally expensive, especially on large volumes of logs. To optimize performance:
- Filter logs as much as possible before applying parsers
- Use the most specific parser for your log format
- Extract only the fields you need
- Consider parsing logs at ingestion time (for example, with Promtail pipeline stages) rather than at query time
## Advanced Techniques

### Using Extracted Fields in Filters
Once you've extracted fields using pattern parsing, you can filter on them:
```logql
{app="web-server"}
| json method="req.method", path="req.path", status="resp.status"
| status=~"5.." and method="POST"
```
This finds all 500-level errors for POST requests.
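Label filters are not limited to string matchers: values that parse as numbers, durations, or byte sizes can be compared directly. A sketch, assuming the service also logs a `latency` field with values like `300ms`:

```logql
{app="web-server"}
| json status="resp.status", latency="resp.latency"
| status >= 500 and latency > 250ms
```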
### Creating Dynamic Labels
You can transform extracted fields into new labels:
```logql
{app="myapp"}
| json
| label_format api_path=`{{ regexReplaceAll "^/api/v1" .path "" }}`
```

This strips a leading "/api/v1" from the extracted `path` value and stores the result in a new `api_path` label.
### Formatting Output

The `line_format` directive lets you create custom log lines from extracted fields:
```logql
{app="payment-service"}
| json
| line_format "{{.timestamp}} [{{.transaction_id}}] Amount: {{.amount}} {{.currency}}"
```
## Troubleshooting Pattern Parsing
If your pattern parsing isn't working as expected:
- Test with smaller datasets: limit your query to a small time range
- Debug with `line_format`: use `line_format "{{.}}"` to see all extracted fields
- Check your regex: validate your regular expressions with a testing tool
- Examine raw logs: compare your patterns against raw log samples
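Parse failures are also surfaced in the results themselves: when a parser cannot process a line, LogQL attaches an `__error__` label rather than dropping the entry. Filtering on it is a quick way to find the lines your pattern fails to match:

```logql
{app="myapp"} | json | __error__ != ""
```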
## Summary
Pattern parsing in LogQL provides powerful capabilities for extracting structured data from logs. With the various parsing methods available (regex, JSON, logfmt, and custom patterns), you can transform unstructured logs into structured data for analysis and visualization.
By mastering pattern parsing, you'll be able to:
- Extract valuable information from logs
- Create meaningful metrics from log data
- Filter and aggregate logs based on their content
- Build insightful dashboards
## Additional Resources
- Practice extracting fields from different log formats
- Experiment with different regex patterns
- Try combining multiple parsers in a single query
- Build a dashboard using extracted fields
## Exercises
- Write a LogQL query to extract HTTP status codes, methods, and response times from web server logs
- Create a query that finds the 10 slowest API requests using extracted duration fields
- Parse logs containing JSON within string fields
- Build a dashboard panel showing error rates by component using pattern parsing