
Pattern Parsing in LogQL

Introduction

Pattern parsing is a powerful feature in LogQL that allows you to extract structured data from unstructured log messages. When working with logs, you'll often encounter semi-structured text that contains valuable information embedded within strings. Pattern parsing provides a way to identify, extract, and work with this data without having to pre-process your logs.

In this guide, we'll explore how LogQL's pattern parsing capabilities work, various parsing methods available, and how to use them effectively to query and analyze your log data.

Understanding Pattern Parsing

Pattern parsing transforms unstructured log lines into structured data by extracting fields based on patterns. This enables you to:

  1. Filter logs based on the content of extracted fields
  2. Create metrics from extracted values
  3. Group and aggregate logs using extracted labels
  4. Perform operations on extracted numeric values
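As a quick taste of what this looks like in practice (the app label and field name here are illustrative), a single query can filter on a field extracted at query time:

logql
{app="web"} | logfmt | duration > 250ms

Here the logfmt parser extracts key=value pairs from each line, and the label filter keeps only entries whose extracted duration exceeds 250 milliseconds.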

LogQL supports multiple parsing methods to extract data from logs:

  • Regular expressions with named capture groups
  • JSON parsing
  • Logfmt parsing
  • Pattern parsing with custom formats

Basic Pattern Extraction

The most common way to extract patterns in LogQL is by using the | pipe operator followed by a parser expression.

Using Regular Expressions

Regular expressions provide a flexible way to match patterns in your logs:

logql
{app="myapp"} |= "ERROR" | regexp `error: (?P<error_message>.*)`

In this example:

  • We first filter logs for the label app="myapp" that contain the string "ERROR"
  • Then we extract the error message using a regular expression with a named capture group error_message

The captured value becomes available as a label for later stages of the pipeline, and you can add or rewrite labels with label_format:

logql
{app="myapp"} |= "ERROR" 
| regexp `error: (?P<error_message>.*)`
| label_format severity="error"

JSON Parsing

For logs in JSON format, LogQL provides a dedicated JSON parser:

logql
{app="payment-service"} | json

This extracts all top-level fields from JSON log lines. You can also extract specific fields:

logql
{app="payment-service"} 
| json transaction_id="transaction.id", amount="transaction.amount", currency="transaction.currency"

The extracted values are added as labels to your log entries.
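For example, a (hypothetical) log line like the following would yield the labels transaction_id="tx-789", amount="42.5", and currency="USD":

{"transaction": {"id": "tx-789", "amount": 42.5, "currency": "USD"}}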

Advanced Pattern Parsing

Nested Extraction

You can chain multiple extraction operations to parse complex logs:

logql
{app="myapp"} 
| json
| line_format "{{.message}}"
| logfmt

This first extracts JSON fields, then rewrites the log line to the extracted message field so the logfmt parser can process its contents.

Custom Pattern Formats

LogQL supports custom pattern formats using the pattern parser:

logql
{app="auth-service"} 
| pattern `<time> <_> <_> <level> <_> [<trace_id>] <message>`

This extracts structured fields based on position in the log line. Underscores (<_>) indicate fields to skip.
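For instance, that pattern would pull level=INFO, trace_id=abc123, and message=login accepted from a line shaped like this (an illustrative example):

2023-06-15T12:34:56Z host1 authd INFO worker-2 [abc123] login accepted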

Unpack Parsing

The unpack parser is designed for logs shipped through Promtail's pack stage, which wraps the original log line and selected fields in a JSON envelope:

logql
{app="api-gateway"} 
| unpack

This lifts the packed fields into labels and restores the original log line from the special _entry key.
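A packed line typically looks like this (values are illustrative); after unpack, the log line becomes GET /api/orders 200, and pod and namespace become labels:

{"_entry": "GET /api/orders 200", "pod": "gateway-7d9f", "namespace": "prod"}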

Practical Examples

Example 1: Parsing Application Logs

Let's say we have logs in this format:

2023-06-15T12:34:56Z INFO [request-123] User login successful: user_id=456 source_ip=203.0.113.42

We can extract structured data with:

logql
{app="auth-service"} |= "User login" 
| regexp `(?P<timestamp>\S+) (?P<level>\S+) \[(?P<request_id>[^\]]+)\] (?P<message>User login .*): user_id=(?P<user_id>\d+) source_ip=(?P<source_ip>\S+)`

This extracts:

  • timestamp
  • level
  • request_id
  • message
  • user_id
  • source_ip

Example 2: Parsing Error Logs and Creating Metrics

We can parse error logs and create metrics from them:

logql
sum by (error_type) (
count_over_time(
{app="payment-service"} |= "ERROR"
| json error_type="error.type", error_code="error.code"
[5m]
)
)

This creates a count of errors grouped by error type over a 5-minute window.

Example 3: Combining Multiple Parser Types

For complex logs, you may need to combine parser types:

logql
{app="orders"} 
| json
| line_format "{{.message}}"
| logfmt
| duration > 500ms

This extracts JSON fields, rewrites the line to just the message field, parses that with logfmt, and then filters on the extracted duration field (LogQL interprets values such as 500ms as durations in label comparisons).

Performance Considerations

Pattern parsing operations can be computationally expensive, especially on large volumes of logs. To optimize performance:

  1. Filter logs as much as possible before applying parsers
  2. Use the most specific parser for your log format
  3. Extract only the fields you need
  4. Consider parsing logs at ingestion time (for example with Promtail pipeline stages) so fields are already attached when you query
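For point 1 in particular, placing a line filter before the parser lets Loki discard non-matching lines cheaply. This (illustrative) query only parses lines containing "ERROR":

logql
{app="payment-service"} |= "ERROR" | json

By contrast, {app="payment-service"} | json | level="error" must run the JSON parser on every line before it can filter anything out.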

Advanced Techniques

Using Extracted Fields in Filters

Once you've extracted fields using pattern parsing, you can filter on them:

logql
{app="web-server"} 
| json method="req.method", path="req.path", status="resp.status"
| status=~"5.." and method="POST"

This finds all 500-level errors for POST requests.

Creating Dynamic Labels

You can transform extracted fields into new labels:

logql
{app="myapp"} 
| json
| label_format api_path=`{{ trimPrefix "/api/v1" .path }}`

This strips the "/api/v1" prefix from the extracted path field and stores the result in a new api_path label.

Formatting Output

The line_format directive lets you create custom log lines from extracted fields:

logql
{app="payment-service"} 
| json
| line_format "{{.timestamp}} [{{.transaction_id}}] Amount: {{.amount}} {{.currency}}"
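Given a (made-up) input line such as {"timestamp": "2023-06-15T12:34:56Z", "transaction_id": "tx-789", "amount": 42.5, "currency": "USD"}, this query rewrites each log line to:

2023-06-15T12:34:56Z [tx-789] Amount: 42.5 USD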

Troubleshooting Pattern Parsing

If your pattern parsing isn't working as expected:

  1. Test with smaller datasets: Limit your query to a short time range
  2. Debug with line_format: Print the fields you expect, for example line_format "{{.level}} {{.message}}", to confirm they were extracted
  3. Check your regex: Validate your regular expressions with a testing tool
  4. Examine raw logs: Compare your patterns against raw log samples
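Parse failures can also be queried directly: Loki records the failure reason in the __error__ label (for example JSONParserErr), so the following shows only the lines the JSON parser could not handle:

logql
{app="myapp"} | json | __error__ != ""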

Summary

Pattern parsing in LogQL provides powerful capabilities for extracting structured data from logs. With the various parsing methods available (regex, JSON, logfmt, and custom patterns), you can transform unstructured logs into structured data for analysis and visualization.

By mastering pattern parsing, you'll be able to:

  • Extract valuable information from logs
  • Create meaningful metrics from log data
  • Filter and aggregate logs based on their content
  • Build insightful dashboards

Next Steps

  • Practice extracting fields from different log formats
  • Experiment with different regex patterns
  • Try combining multiple parsers in a single query
  • Build a dashboard using extracted fields

Exercises

  1. Write a LogQL query to extract HTTP status codes, methods, and response times from web server logs
  2. Create a query that finds the 10 slowest API requests using extracted duration fields
  3. Parse logs containing JSON within string fields
  4. Build a dashboard panel showing error rates by component using pattern parsing

