
Regular Expressions in LogQL

Regular expressions (regex) are a powerful pattern-matching tool that allows you to search for specific patterns within your logs. In LogQL, Grafana Loki's query language, regular expressions enable you to create sophisticated queries to extract valuable information from your log data.

Introduction to Regular Expressions

Regular expressions are sequences of characters that define a search pattern. They're like a specialized mini-language for describing patterns in text. When used in LogQL, regular expressions help you:

  • Filter logs based on complex patterns
  • Extract specific fields from unstructured logs
  • Transform log data into structured metrics

LogQL uses Go's RE2 regular expression syntax. It is similar to other regex flavors such as PCRE, but it deliberately leaves out a few features so that matching always runs in linear time.

Basic Regex Syntax in LogQL

Let's start with the fundamental regex concepts as they apply to LogQL:

Literal Characters

The simplest regex patterns match exact text:

logql
{app="frontend"} |~ "login"

This query matches log lines from the frontend app that contain the string "login" anywhere, including inside longer words such as "relogin".

Regex Operators in LogQL

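LogQL has four line filter operators. Two of them take a regular expression, and the other two search for a literal string. Regex line filters are unanchored: the pattern may match anywhere in the line.

  • |~ (regex match): keeps lines that match the pattern. Example: {app="frontend"} |~ "error"
  • !~ (regex non-match): keeps lines that do not match the pattern. Example: {app="frontend"} !~ "debug"
  • |= (contains): keeps lines that contain the exact, case-sensitive string; faster than a regex. Example: {app="frontend"} |= "error"
  • != (does not contain): drops lines that contain the string. Example: {app="frontend"} != "healthy"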
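LogQL regexes use the same RE2 syntax as Go's standard `regexp` package, so Go is a convenient place to check how the four operators differ. The sketch below is a hypothetical `keepLine` helper, not Loki's implementation. It models `|=` and `!=` as case-sensitive substring checks and `|~` and `!~` as unanchored regex matches.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keepLine is a hypothetical helper that mimics how a single LogQL line
// filter decides whether to keep a log line. It is a sketch, not Loki's code.
func keepLine(op, arg, line string) (bool, error) {
	switch op {
	case "|=": // literal substring, case-sensitive
		return strings.Contains(line, arg), nil
	case "!=":
		return !strings.Contains(line, arg), nil
	case "|~", "!~": // RE2 regex, unanchored: may match anywhere in the line
		// Compiling on every call is wasteful; a real filter compiles once.
		re, err := regexp.Compile(arg)
		if err != nil {
			return false, err
		}
		if op == "|~" {
			return re.MatchString(line), nil
		}
		return !re.MatchString(line), nil
	}
	return false, fmt.Errorf("unknown operator %q", op)
}

func main() {
	line := `level=error msg="relogin failed"`
	for _, f := range [][2]string{{"|=", "login"}, {"!=", "healthy"}, {"|~", "err(or)?"}, {"!~", "debug"}} {
		ok, err := keepLine(f[0], f[1], line)
		fmt.Println(f[0], f[1], ok, err)
	}
}
```

Note that `|= "login"` keeps the line because "relogin" contains "login". The same unanchored behavior applies to `|~`.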

Common Regex Patterns

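Here are some frequently used regex patterns in LogQL. Examples whose patterns contain backslashes are written in backtick strings, which LogQL treats as raw. Inside a double-quoted string every backslash must be doubled ("\\d" instead of "\d").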

Character Classes

  • . matches any single character. Example: {app="frontend"} |~ "l.gin" matches "login", "l3gin", etc.
  • \d matches any digit (0-9). Example: {app="api"} |~ `status: \d\d\d` matches three-digit status codes.
  • \w matches a word character (a-z, A-Z, 0-9, _). Example: {app="api"} |~ `\w+Error`
  • [abc] matches any one character in the brackets. Example: {app="api"} |~ `HTTP/1\.[01]` matches HTTP/1.0 or HTTP/1.1.
  • [^abc] matches any one character not in the brackets. Example: {app="api"} |~ `status=[^2]\d\d` matches status codes that do not start with 2.
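Because LogQL uses Go's RE2 syntax, the character-class patterns above can be tried locally with Go's `regexp` package before they go into a query. Go raw strings (backticks) behave like LogQL backtick strings, so the patterns are copied unchanged. The `classPatterns` map and `matches` helper are illustrative names only.

```go
package main

import (
	"fmt"
	"regexp"
)

// The character-class examples, written as Go raw strings (backticks),
// which, like LogQL backtick strings, need no extra backslash escaping.
var classPatterns = map[string]string{
	"dot":     `l.gin`,
	"digit":   `status: \d\d\d`,
	"word":    `\w+Error`,
	"set":     `HTTP/1\.[01]`,
	"negated": `status=[^2]\d\d`,
}

// matches reports whether the named pattern matches anywhere in line,
// the same unanchored semantics as a |~ line filter.
func matches(name, line string) bool {
	return regexp.MustCompile(classPatterns[name]).MatchString(line)
}

func main() {
	fmt.Println(matches("dot", "l3gin ok"))
	fmt.Println(matches("set", "GET / HTTP/1.1"))
	fmt.Println(matches("negated", "status=200"))
	fmt.Println(matches("negated", "status=503"))
}
```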

Quantifiers

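  • * matches 0 or more of the previous item. Example: {app="api"} |~ "ERROR.*timeout"
  • + matches 1 or more of the previous item. Example: {app="api"} |~ `IP: \d+\.\d+\.\d+\.\d+`
  • ? matches 0 or 1 of the previous item. Example: {app="api"} |~ "https?" matches http or https.
  • {n} matches exactly n of the previous item. Example: {app="api"} |~ `\d{3}` matches a run of three digits; because line filters are unanchored, it also matches inside longer numbers such as 12345.
  • {n,m} matches between n and m of the previous item. Example: {app="api"} |~ `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` matches IPv4-style addresses.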
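The "exactly n" quantifier only limits the length of one match; it does not stop the pattern from matching part of a longer number. The Go sketch below (RE2 syntax, as in LogQL) shows this. It also shows one possible remedy, assumed here rather than taken from the text above: adding `\b` word boundaries, which RE2 supports.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unanchored: finds the leftmost run of three digits, even inside 12345.
	threeDigits = regexp.MustCompile(`\d{3}`)
	// \b word boundaries require the three digits to stand alone.
	exactlyThree = regexp.MustCompile(`\b\d{3}\b`)
	// {n,m} bounds each octet's length, not its value.
	ipv4Like = regexp.MustCompile(`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`)
)

func main() {
	fmt.Println(threeDigits.FindString("took 12345ms"))
	fmt.Println(exactlyThree.MatchString("took 12345ms"))
	fmt.Println(exactlyThree.MatchString("status 404 sent"))
	fmt.Println(ipv4Like.FindString("client 10.0.0.12 connected"))
}
```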

Anchors

  • ^ matches the start of the line. Example: {app="api"} |~ "^ERROR" matches lines that begin with ERROR.
  • $ matches the end of the line. Example: {app="api"} |~ "completed$" matches lines that end with completed.
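Anchors are what turn an unanchored line filter into a "starts with" or "ends with" check. In RE2 syntax, which Go's `regexp` shares with LogQL, `^` and `$` refer to the start and end of the whole text unless the `(?m)` flag is set. That matters only for a log entry that contains embedded newlines. A small Go sketch with a made-up log line:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	startsWithError = regexp.MustCompile(`^ERROR`)
	containsError   = regexp.MustCompile(`ERROR`)
	endsCompleted   = regexp.MustCompile(`completed$`)
	// (?m) makes ^ also match right after each newline inside the text.
	multiLine = regexp.MustCompile(`(?m)^ERROR`)
)

func main() {
	line := "2024-05-01 ERROR job completed"
	fmt.Println(startsWithError.MatchString(line)) // ERROR is not at the start
	fmt.Println(containsError.MatchString(line))
	fmt.Println(endsCompleted.MatchString(line))
	fmt.Println(multiLine.MatchString("start\nERROR here"))
}
```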

Grouping and Alternation

  • (...) groups patterns together. Example: {app="api"} |~ "(GET|POST|PUT|DELETE)"
  • | means alternation (OR). Example: {app="api"} |~ "error|warning|critical"
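Grouping matters because `|` has the lowest precedence in a regex. Without parentheses it splits the entire pattern into alternatives. This Go sketch uses RE2 syntax, as LogQL does, and a made-up message to show the difference:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Without a group, | splits the whole pattern: "ERROR: disk" OR "memory full".
	ungrouped = regexp.MustCompile(`ERROR: disk|memory full`)
	// With a group, the alternation is limited to the word in parentheses.
	grouped = regexp.MustCompile(`ERROR: (disk|memory) full`)
)

func main() {
	line := "WARN: memory full, retrying"
	fmt.Println(ungrouped.MatchString(line)) // matches via the "memory full" branch
	fmt.Println(grouped.MatchString(line))   // requires the "ERROR: " prefix
}
```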

Real-World Examples

Let's look at practical examples of using regex in LogQL:

Example 1: Finding Error Patterns

logql
{app="payment-service"} |~ "(?i)(error|exception|fail|timeout)"

This query searches the payment-service logs for any error-related term. The (?i) flag at the start of the pattern makes the match case-insensitive.
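The `(?i)` flag works the same way in Go's RE2-based `regexp` package, so the pattern can be checked there. The sketch below also shows a side effect of unanchored matching: "fail" matches inside "failover", which may or may not be wanted. The sample lines are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// (?i) at the start makes the whole pattern case-insensitive.
var errorTerms = regexp.MustCompile(`(?i)(error|exception|fail|timeout)`)

func main() {
	lines := []string{
		"Payment FAILED for order",
		"NullPointerException in handler",
		"payment accepted",
		"starting failover", // also matches: "fail" is a substring of "failover"
	}
	for _, l := range lines {
		fmt.Println(l, errorTerms.MatchString(l))
	}
}
```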

Example 2: Extracting Status Codes

logql
{app="nginx"} |~ `HTTP/\d\.\d" \d{3}`
| regexp `HTTP/\d\.\d" (?P<status_code>\d{3})`

This extracts HTTP status codes from nginx access logs into a status_code label. The |~ line filter is optional; it discards lines without a status code, so the regexp parser only runs on relevant lines.
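The regexp parser turns each named group `(?P<name>...)` into a label. The Go sketch below imitates that step with RE2 named groups, which Go supports with the same syntax. `extract` is a hypothetical helper, and returning an empty label set for a non-matching line is this sketch's choice. The sample access-log line is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// extract is a hypothetical helper that mimics a LogQL regexp parser stage:
// every named group in re becomes a label. In this sketch, a line that does
// not match yields no labels.
func extract(re *regexp.Regexp, line string) map[string]string {
	labels := map[string]string{}
	m := re.FindStringSubmatch(line)
	if m == nil {
		return labels
	}
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have empty names
			labels[name] = m[i]
		}
	}
	return labels
}

var statusRe = regexp.MustCompile(`HTTP/\d\.\d" (?P<status_code>\d{3})`)

func main() {
	line := `10.0.0.1 - - [01/May/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 404 153`
	fmt.Println(extract(statusRe, line))
}
```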

Example 3: JSON Error Analysis

logql
{app="api"} |~ "error"
| json
| status_code =~ "5.."

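This keeps API log lines that contain "error", parses them as JSON, and keeps only entries whose extracted status_code label matches 5.. (a regex label filter must match the whole value, so this means exactly three characters beginning with 5).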

Example 4: Tracking Failed Login Attempts

logql
sum by (username) (count_over_time(
  {app="auth-service"}
    |= "login failed"
    | regexp `login failed.*user=(?P<username>[^ ]+)`
  [1h]
))

This tracks failed login attempts by username over the last hour.
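Counting failed logins per username comes down to two steps: extract the username from each matching line, then count lines per extracted value within the time window. The Go sketch below does this for a single window with the same RE2 pattern. `countByUser` is a hypothetical name and the sample lines are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

var failedLogin = regexp.MustCompile(`login failed.*user=(?P<username>[^ ]+)`)

// countByUser is a hypothetical sketch of grouping a count by an extracted
// label: it counts matching lines per username for one time window.
func countByUser(lines []string) map[string]int {
	counts := map[string]int{}
	idx := failedLogin.SubexpIndex("username")
	for _, l := range lines {
		if m := failedLogin.FindStringSubmatch(l); m != nil {
			counts[m[idx]]++
		}
	}
	return counts
}

func main() {
	lines := []string{
		"login failed user=alice",
		"login failed reason=bad_password user=bob",
		"login ok user=carol", // not a failure, not counted
		"login failed user=alice",
	}
	fmt.Println(countByUser(lines))
}
```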

Advanced Techniques

Using Regex with Label Filters

You can use regex to filter on labels:

logql
{app=~"api|backend", namespace="production", level=~"error|warn"}

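This selects streams whose app label is exactly "api" or "backend", in the production namespace, with level "error" or "warn". Regex label matchers are anchored at both ends, so app=~"api|backend" does not match a stream labeled app="api-gateway"; use app=~"api.*|backend" for a prefix match.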
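The difference between an anchored label matcher and an unanchored line filter can be reproduced in Go by wrapping the pattern in `^(?:...)$`, the way Prometheus-style label matchers anchor their regexes. The two helpers below are hypothetical names used only for this comparison.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelMatches mimics a label matcher (=~): the pattern must match the
// entire value, enforced here by wrapping it in ^(?:...)$.
func labelMatches(pattern, value string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(value)
}

// lineMatches mimics a line filter (|~), which may match anywhere.
func lineMatches(pattern, line string) bool {
	return regexp.MustCompile(pattern).MatchString(line)
}

func main() {
	for _, app := range []string{"api", "backend", "api-gateway"} {
		fmt.Println(app, labelMatches("api|backend", app), lineMatches("api|backend", app))
	}
}
```

The non-capturing group `(?:...)` matters here. Without it, `^api|backend$` would mean "starts with api OR ends with backend".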

Regex with Log Parsing

Combine regex with LogQL's parsing functions:

logql
{app="database"} |= "query took"
| regexp "query took (?P<duration>[0-9.]+)ms"
| duration > 100

This extracts query durations and finds slow queries (>100ms).
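A filter like `duration > 100` is a numeric comparison on an extracted string. The conversion step matters, because compared as text "99.5" would sort after "100". The Go sketch below extracts, converts, and compares. `slowQuery` is a hypothetical helper, and dropping lines whose capture is not a valid number is this sketch's own choice.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var queryTook = regexp.MustCompile(`query took (?P<duration>[0-9.]+)ms`)

// slowQuery is a hypothetical sketch of `| regexp ... | duration > threshold`:
// extract the captured text, convert it to a number, then compare numerically.
func slowQuery(line string, thresholdMs float64) bool {
	m := queryTook.FindStringSubmatch(line)
	if m == nil {
		return false
	}
	ms, err := strconv.ParseFloat(m[queryTook.SubexpIndex("duration")], 64)
	if err != nil {
		return false // e.g. "1.2.3" matches [0-9.]+ but is not a number
	}
	return ms > thresholdMs
}

func main() {
	for _, l := range []string{"query took 250.5ms", "query took 99.5ms", "cache hit"} {
		fmt.Println(l, slowQuery(l, 100))
	}
}
```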

Creating Metrics from Logs Using Regex

Transform logs into metrics with regex:

logql
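sum by (status_code) (count_over_time(
  {app="frontend"}
    |~ `status=\d{3}`
    | regexp `status=(?P<status_code>\d{3})`
  [5m]
))

This counts log lines per HTTP status code over 5-minute windows. The status_code label comes from the regexp parser; a named group inside a |~ line filter only matches and does not extract anything.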

Performance Considerations

When using regex in LogQL, keep these performance tips in mind:

  1. Use label matching first: Filter with labels before applying regex.
  2. Prefer exact matching: Use |= instead of |~ when possible.
  3. Anchor patterns: Use ^ and $ to limit where the pattern can match.
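  4. No catastrophic backtracking: RE2 does not backtrack, so matching time grows linearly with line length and a pattern cannot blow up exponentially as it can in PCRE-style engines.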
  5. Keep patterns simple: Complex regex can slow down query performance.

RE2 Limitations in LogQL

LogQL uses Google's RE2 regex engine, which has some limitations:

  • No backreferences (\1, \2, etc.)
  • No lookahead/lookbehind assertions
  • No atomic grouping or possessive quantifiers

These limitations are by design to guarantee linear-time matching performance.
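Go's `regexp` package implements the same RE2 syntax, so it rejects each unsupported construct at compile time. A LogQL query that uses one of them should likewise fail when its regex is compiled. The sketch below checks a few patterns with a hypothetical `supported` helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// supported reports whether RE2 (via Go's regexp) accepts the pattern.
func supported(p string) bool {
	_, err := regexp.Compile(p)
	return err == nil
}

func main() {
	patterns := []string{
		`(\w+) \1`,             // backreference: rejected
		`error(?=:)`,           // lookahead: rejected
		`(?<!debug )error`,     // lookbehind: rejected
		`(?>a+)b`,              // atomic group: rejected
		`a++b`,                 // possessive quantifier: rejected
		`(?P<word>\w+): error`, // named capture group: accepted
	}
	for _, p := range patterns {
		fmt.Println(p, supported(p))
	}
}
```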

Common Regex Patterns for LogQL

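Here is a reference list of useful patterns for LogQL queries. Put them in backtick strings so their backslashes do not need escaping:

  • IP addresses: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} (also accepts invalid octets such as 999)
  • Timestamps (ISO 8601): \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
  • UUIDs: [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} (lowercase only; prefix with (?i) to accept uppercase)
  • URLs: https?://[^\s]+
  • Email addresses: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
  • JSON string values: "[^"]*":\s*"[^"]*" (only pairs whose value is a quoted string)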
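Before relying on one of these reference patterns, it helps to confirm what it does and does not catch. The Go sketch below compiles them with Go's RE2-based `regexp` package and runs them against an invented log line. The map keys are illustrative names only.

```go
package main

import (
	"fmt"
	"regexp"
)

// The reference patterns, compiled with Go's RE2-based regexp package.
var reference = map[string]*regexp.Regexp{
	"ip":        regexp.MustCompile(`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`),
	"timestamp": regexp.MustCompile(`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}`),
	"uuid":      regexp.MustCompile(`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`),
	"url":       regexp.MustCompile(`https?://[^\s]+`),
	"email":     regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`),
	"jsonPair":  regexp.MustCompile(`"[^"]*":\s*"[^"]*"`),
}

func main() {
	line := `2024-05-01T10:00:00Z req=3f2b8c1e-9a4d-4e5f-8b6a-1c2d3e4f5a6b from 10.0.0.7 "user":"ann@example.com" see https://example.com/x`
	// Map iteration order is random, so the output order varies between runs.
	for name, re := range reference {
		fmt.Printf("%-9s %q\n", name, re.FindString(line))
	}
}
```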

Practice Examples

Let's walk through building a few regex patterns for common log analysis tasks:

Finding Authentication Failures

logql
{app="auth"} |~ "(?i)authentication (failed|error|invalid)"

This finds all authentication failure messages regardless of capitalization.

Extracting API Endpoints

logql
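sum by (endpoint) (count_over_time(
  {app="api"}
    |~ `"(GET|POST|PUT|DELETE) /api/v\d+/`
    | regexp `"(GET|POST|PUT|DELETE) /api/v\d+/(?P<endpoint>[^/ ?"]+)`
  [30m]
))

This extracts the first path segment after the API version and counts requests per endpoint over 30-minute windows. The endpoint capture stops at a slash, space, question mark, or closing quote, so "GET /api/v2/users HTTP/1.1" yields users.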
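The endpoint capture needs care. A capture of `[^/]+` only stops at a slash, so in a typical request line it runs past the path into the protocol. The Go sketch below (RE2 syntax, as in LogQL) compares that capture with a tighter `[^/ ?"]+`. `endpoint` is a hypothetical helper and the request lines are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Stops only at '/', so it swallows " HTTP" from "HTTP/1.1".
	loose = regexp.MustCompile(`"(GET|POST|PUT|DELETE) /api/v\d+/(?P<endpoint>[^/]+)`)
	// Also stops at a space, '?' or closing quote.
	tight = regexp.MustCompile(`"(GET|POST|PUT|DELETE) /api/v\d+/(?P<endpoint>[^/ ?"]+)`)
)

// endpoint returns the text captured by the named group, or "" if no match.
func endpoint(re *regexp.Regexp, line string) string {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[re.SubexpIndex("endpoint")]
}

func main() {
	line := `"GET /api/v2/users HTTP/1.1" 200`
	fmt.Printf("%q %q\n", endpoint(loose, line), endpoint(tight, line))
}
```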

Isolating Specific Error Types

logql
{app="backend"} |~ "error" !~ "rate limit"
| json
| error_type =~ "(?i)(database|connection|timeout)"

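This keeps backend lines that contain "error" but not "rate limit", parses them as JSON, and keeps entries whose error_type is database, connection, or timeout in any letter case. Because label filters must match the whole value, an error_type such as "database_timeout" would not match; use (?i).*(database|connection|timeout).* to match those words anywhere in the value.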

Summary

Regular expressions are an essential tool in LogQL that allow you to:

  1. Create powerful, flexible log searches
  2. Extract structured data from unstructured logs
  3. Transform log data into metrics for analysis

While regular expressions can be complex, mastering the basics will significantly enhance your ability to work with logs in Grafana Loki. Remember to balance the power of regex with performance considerations, and start with simpler patterns before moving to more complex ones.


Exercises

  1. Write a LogQL query that finds all 4xx and 5xx HTTP status codes in nginx logs.
  2. Create a pattern to extract JSON error messages from logs.
  3. Build a query that finds logs containing email addresses.
  4. Write a pattern that extracts the duration from logs in the format "operation completed in 123.45ms".
  5. Create a query that shows a rate of exceptions per service over time.

