Regular Expressions in LogQL
Regular expressions (regex) are a powerful pattern-matching tool that allows you to search for specific patterns within your logs. In LogQL, Grafana Loki's query language, regular expressions enable you to create sophisticated queries to extract valuable information from your log data.
Introduction to Regular Expressions
Regular expressions are sequences of characters that define a search pattern. They're like a specialized mini-language for describing patterns in text. When used in LogQL, regular expressions help you:
- Filter logs based on complex patterns
- Extract specific fields from unstructured logs
- Transform log data into structured metrics
LogQL uses RE2 syntax for regular expressions, which is similar to other regex implementations but with some specific limitations for performance reasons.
Basic Regex Syntax in LogQL
Let's start with the fundamental regex concepts as they apply to LogQL:
Literal Characters
The simplest regex patterns match exact text:
{app="frontend"} |~ "login"
This query matches logs from the frontend app that contain the string "login" anywhere in the line.
Regex Operators in LogQL
LogQL provides several regex operators:
Operator | Description | Example |
---|---|---|
`|~` | Keeps lines that match a regex | {app="frontend"} |~ "error" |
`!~` | Drops lines that match a regex | {app="frontend"} !~ "debug" |
`|=` | Keeps lines that contain an exact string (faster than regex) | {app="frontend"} |= "error" |
`!=` | Drops lines that contain an exact string | {app="frontend"} != "healthy" |
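The four operators differ only in whether they test a substring or a regex, and whether they keep or drop the line. A minimal Python sketch of that behaviour follows. The `line_filter` helper is hypothetical and is not Loki code, and Python's `re` module stands in for RE2 here (the patterns used are valid in both).

```python
import re

def line_filter(op: str, pattern: str, line: str) -> bool:
    """Return True if `line` survives the filter (illustrative helper, not Loki's implementation)."""
    if op == "|=":   # keep lines containing the exact string
        return pattern in line
    if op == "!=":   # drop lines containing the exact string
        return pattern not in line
    if op == "|~":   # keep lines where the regex matches anywhere
        return re.search(pattern, line) is not None
    if op == "!~":   # drop lines where the regex matches anywhere
        return re.search(pattern, line) is None
    raise ValueError(f"unknown operator: {op}")

lines = ["error: disk full", "debug: cache warm", "healthy"]
kept_by_regex = [l for l in lines if line_filter("|~", "err(or)?", l)]
kept_by_exclusion = [l for l in lines if line_filter("!=", "healthy", l)]
```

Note that both regex operators match anywhere in the line, so no leading or trailing `.*` is needed.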
Common Regex Patterns
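Here are some frequently used regex patterns in LogQL. One quoting rule applies throughout: in a double-quoted LogQL string every backslash must be doubled ("\\d"), while a backtick-quoted string takes the pattern exactly as written (`\d`).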
Character Classes
Pattern | Matches | Example |
---|---|---|
. | Any single character | {app="frontend"} |~ "l.gin" matches "login", "l3gin", etc. |
\d | Any digit (0-9) | {app="api"} |~ `status: \d\d\d` matches three-digit status codes |
\w | Word character (a-z, A-Z, 0-9, _) | {app="api"} |~ `\w+Error` |
[abc] | Any character in the brackets | {app="api"} |~ `HTTP/1\.[01]` matches HTTP/1.0 or HTTP/1.1 |
[^abc] | Any character NOT in the brackets | {app="api"} |~ `status: [^2]\d\d` matches status codes that do not start with 2 |
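Character classes are easy to check offline before running a query. The sketch below uses Python's `re` module as a stand-in for RE2, which is an assumption of convenience since the two engines agree on this syntax. The sample log lines are invented for illustration, and Python raw strings play the role of LogQL backtick strings.

```python
import re

# pattern -> invented sample line it should match
samples = {
    r"l.gin": "user login ok",
    r"status: \d\d\d": "status: 404",
    r"\w+Error": "raised ValueError in handler",
    r"HTTP/1\.[01]": "GET / HTTP/1.1",
    r"status: [^2]\d\d": "status: 503",
}

# Record the exact text each pattern matched.
matched = {p: re.search(p, text).group(0) for p, text in samples.items()}

# A negated class rejects the excluded character: a 2xx status does not match.
rejects_2xx = re.search(r"status: [^2]\d\d", "status: 200") is None
```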
Quantifiers
Pattern | Matches | Example |
---|---|---|
* | 0 or more of previous item | {app="api"} |~ "ERROR.*timeout" |
+ | 1 or more of previous item | {app="api"} |~ `IP: \d+\.\d+\.\d+\.\d+` |
? | 0 or 1 of previous item | {app="api"} |~ "https?" matches http or https |
{n} | Exactly n of previous item | {app="api"} |~ `\d{3}` matches 3 consecutive digits |
{n,m} | Between n and m of previous item | {app="api"} |~ `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` |
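One subtlety with counted quantifiers is that an unanchored pattern can match inside a longer run. The Python sketch below (again using `re` as a stand-in for RE2, with invented sample strings) shows `\d{3}` matching the first three digits of a five-digit number, and how the word boundary `\b`, which RE2 also supports, restricts the match to standalone three-digit numbers.

```python
import re

# Unanchored: finds three digits inside a longer number.
loose = re.search(r"\d{3}", "id=12345")

# With word boundaries: the five-digit number no longer qualifies.
strict = re.search(r"\b\d{3}\b", "id=12345")

# A standalone three-digit status code still matches.
code = re.search(r"\b\d{3}\b", "status 404 returned")

# The optional quantifier accepts both schemes.
schemes = [re.search(r"https?", s).group(0) for s in ("http://a", "https://b")]
```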
Anchors
Pattern | Matches | Example |
---|---|---|
^ | Start of a line | {app="api"} |~ "^ERROR" matches logs starting with ERROR |
$ | End of a line | {app="api"} |~ "completed$" matches logs ending with completed |
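Anchors apply to the whole log entry, which matters when an entry spans several lines. In RE2, `^` and `$` match only at the start and end of the text unless the `(?m)` flag is set. The Python sketch below behaves the same way for these inputs. The sample entry is invented, and how Loki treats multi-line entries in a given deployment is an assumption worth verifying.

```python
import re

# One log entry that happens to contain an embedded newline.
entry = "INFO starting\nERROR failed to bind"

# Without (?m), ^ only matches at the very start of the entry.
plain = re.search(r"^ERROR", entry)

# With (?m), ^ also matches after each newline.
multiline = re.search(r"(?m)^ERROR", entry)

# $ anchors the match to the end of the entry.
ends_with = re.search(r"completed$", "job 42 completed")
not_at_end = re.search(r"completed$", "job completed with warnings")
```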
Grouping and Alternation
Pattern | Matches | Example |
---|---|---|
(...) | Groups patterns together | {app="api"} |~ "(GET|POST|PUT|DELETE)" |
`|` | Alternation (OR) | {app="api"} |~ "error|warning|critical" |
Real-World Examples
Let's look at practical examples of using regex in LogQL:
Example 1: Finding Error Patterns
{app="payment-service"} |~ "(?i)(error|exception|fail|timeout)"
This query searches logs from the payment-service app for any of several error-related terms. The (?i) flag makes the match case-insensitive.
Example 2: Extracting Status Codes
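{app="nginx"} |~ `HTTP/\d\.\d" \d{3}`
| regexp `HTTP/\d\.\d" (?P<status_code>\d{3})`
This keeps nginx log lines that contain an HTTP version followed by a three-digit status, then extracts that status into the status_code label. Both patterns are written in backticks so that the backslashes and the embedded double quote need no extra escaping.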
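The extraction step can be approximated offline to check which text a named group captures. This is a minimal sketch: `extract_labels` is a hypothetical helper rather than Loki's parser, the access-log line is invented, and Python's `re` shares the `(?P<name>...)` syntax with RE2.

```python
import re

# Same pattern as the query above; a raw string mirrors LogQL backtick quoting.
STATUS = re.compile(r'HTTP/\d\.\d" (?P<status_code>\d{3})')

def extract_labels(line: str) -> dict:
    """Return the named groups as a label dict, or {} when the line does not match."""
    m = STATUS.search(line)
    return m.groupdict() if m else {}

line = '10.0.0.1 - - "GET /index.html HTTP/1.1" 200 612'
labels = extract_labels(line)
no_labels = extract_labels("upstream timed out")
```

Lines that do not match yield no labels, which is why a line filter placed before the parser is useful.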
Example 3: JSON Error Analysis
{app="api"} |~ "error"
| json
| status_code =~ "5.."
This finds API errors, parses JSON logs, and filters for 5xx status codes.
Example 4: Tracking Failed Login Attempts
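sum by (username) (
  count_over_time(
    {app="auth-service"}
      |~ "login failed.*user="
      | regexp "login failed.*user=(?P<username>[^ ]+)"
    [1h]
  )
)
This counts failed login attempts per username over the last hour. The regexp parser is what creates the username label; a named group inside a |~ line filter only filters lines and does not extract anything.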
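The grouping logic can be rehearsed on a handful of lines before running it against Loki. The sketch below is a rough Python analogue: it ignores the one-hour time window entirely, the log lines are invented, and `Counter` stands in for the per-label aggregation.

```python
import re
from collections import Counter

FAILED = re.compile(r"login failed.*user=(?P<username>[^ ]+)")

logs = [
    "login failed reason=bad_password user=alice ip=10.0.0.5",
    "login ok user=bob",
    "login failed reason=locked user=alice ip=10.0.0.9",
    "login failed reason=bad_password user=carol ip=10.0.0.7",
]

# Count one failure per matching line, keyed by the captured username.
counts = Counter(m["username"] for line in logs if (m := FAILED.search(line)))
```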
Advanced Techniques
Using Regex with Label Filters
You can use regex to filter on labels:
{app=~"api|backend", namespace="production", level=~"error|warn"}
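This matches logs whose app label is exactly "api" or "backend", in the production namespace, with a level of error or warn. Unlike line filters, label matcher regexes are fully anchored: app=~"api|backend" does not match a value such as "api-gateway" (use app=~"api.*" for that).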
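Label matchers are fully anchored (the regex must match the entire label value), whereas line filters match anywhere in the line. In Python terms this is the difference between `re.fullmatch` and `re.search`. The two helper functions below are hypothetical and only illustrate that distinction.

```python
import re

def label_matches(pattern: str, value: str) -> bool:
    """Anchored match, as used for label matchers such as app=~"..."."""
    return re.fullmatch(pattern, value) is not None

def line_matches(pattern: str, line: str) -> bool:
    """Unanchored match, as used for line filters such as |~ "..."."""
    return re.search(pattern, line) is not None

exact = label_matches("api|backend", "api")
partial = label_matches("api|backend", "api-gateway")
widened = label_matches("api.*", "api-gateway")
in_line = line_matches("api|backend", "api-gateway")
```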
Regex with Log Parsing
Combine regex with LogQL's parsing functions:
{app="database"} |= "query took"
| regexp "query took (?P<duration>[0-9.]+)ms"
| duration > 100
This extracts query durations with the regexp parser and keeps only slow queries (over 100 ms). The cheap |= filter discards unrelated lines before the regex runs.
Creating Metrics from Logs Using Regex
Transform logs into metrics with regex:
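sum by (status_code) (
  count_over_time(
    {app="frontend"}
      |~ `status=\d{3}`
      | regexp `status=(?P<status_code>\d{3})`
    [5m]
  )
)
This counts log lines per HTTP status code over 5-minute windows. The |~ filter keeps only lines that carry a status, and the regexp parser turns the captured digits into the status_code label that sum by groups on.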
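To make the windowed count concrete, here is a simplified Python analogue. It uses fixed, non-overlapping 5-minute buckets, which is a deliberate simplification: Loki evaluates a range at each query step rather than bucketing this way. Timestamps are plain seconds and the entries are invented.

```python
import re
from collections import Counter

STATUS = re.compile(r"status=(?P<status_code>\d{3})")
WINDOW = 300  # seconds, mirroring the [5m] range

# (timestamp in seconds, log line)
entries = [
    (0, "GET /home status=200"),
    (120, "GET /cart status=500"),
    (290, "GET /home status=200"),
    (310, "GET /home status=200"),
    (400, "healthcheck ok"),  # no status, so it is not counted
]

counts = Counter()
for ts, line in entries:
    m = STATUS.search(line)
    if m:
        bucket_start = ts // WINDOW * WINDOW
        counts[(bucket_start, m["status_code"])] += 1
```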
Performance Considerations
When using regex in LogQL, keep these performance tips in mind:
- Use label matching first: Filter with labels before applying regex.
- Prefer exact matching: Use |= instead of |~ when possible.
- Anchor patterns: Use ^ and $ to limit where the pattern can match.
- Don't rely on backtracking features: LogQL uses RE2, which never backtracks and therefore does not support constructs such as backreferences or lookarounds.
- Keep patterns simple: Complex regex can slow down query performance.
RE2 Limitations in LogQL
LogQL uses Google's RE2 regex engine, which has some limitations:
- No backreferences (\1, \2, etc.)
- No lookahead/lookbehind assertions
- No atomic grouping or possessive quantifiers
These limitations are by design to guarantee linear-time matching performance.
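A pattern that would normally need a negative lookahead can usually be rewritten as a positive filter chained with a negative one, for example |~ "error" !~ "rate limit". The Python sketch below checks that the two formulations agree on a few invented lines. Python's `re` accepts the lookahead form, which is exactly why it is useful for the comparison; RE2 would reject that pattern.

```python
import re

lines = [
    "error: connection refused",
    "error: rate limit exceeded",
    "info: rate limit reset",
    "info: started",
]

# Lookahead form: valid in Python, NOT valid in RE2/LogQL.
LOOKAHEAD = re.compile(r"^(?!.*rate limit).*error")

def chained(line: str) -> bool:
    """RE2-friendly equivalent: one positive filter, then one negative filter."""
    return re.search("error", line) is not None and re.search("rate limit", line) is None

with_lookahead = [l for l in lines if LOOKAHEAD.search(l)]
with_chain = [l for l in lines if chained(l)]
```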
Common Regex Patterns for LogQL
Here's a reference table of useful patterns for LogQL queries:
To match | Pattern |
---|---|
IP addresses | \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} |
Timestamps (ISO8601) | \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2} |
UUIDs | [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} |
URLs | https?://[^\s]+ |
Email addresses | [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} |
JSON values | "[^"]*":\s*"[^"]*" |
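These reference patterns can be sanity-checked against a sample line before they go into a query. The sketch below uses Python's `re` as a stand-in for RE2 and an invented log line. It also shows that the IP pattern is deliberately loose: it accepts any dotted groups of one to three digits, including out-of-range octets.

```python
import re

PATTERNS = {
    "ip": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "timestamp": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
    "uuid": r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    "url": r"https?://[^\s]+",
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
}

line = (
    "2024-05-01T12:30:45 req=123e4567-e89b-12d3-a456-426614174000 "
    "from 192.168.1.10 by ops@example.com via https://example.com/health"
)

# First match of each pattern in the sample line.
found = {name: m.group(0) for name, p in PATTERNS.items() if (m := re.search(p, line))}

# The IP pattern does not validate octet ranges.
accepts_invalid_ip = re.search(PATTERNS["ip"], "999.999.999.999") is not None
```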
Practice Examples
Let's walk through building a few regex patterns for common log analysis tasks:
Finding Authentication Failures
{app="auth"} |~ "(?i)authentication (failed|error|invalid)"
This finds all authentication failure messages regardless of capitalization.
Extracting API Endpoints
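sum by (endpoint) (
  count_over_time(
    {app="api"}
      |~ `"(GET|POST|PUT|DELETE) /api/v\d+/`
      | regexp `"(?:GET|POST|PUT|DELETE) /api/v\d+/(?P<endpoint>[^/ ]+)`
    [30m]
  )
)
This extracts the first path segment after the API version as the endpoint label and counts requests per endpoint over 30 minutes. The character class excludes spaces as well as slashes so that the capture stops at the end of the path instead of running into the protocol field.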
Isolating Specific Error Types
{app="backend"} |~ "error" !~ "rate limit"
| json
| error_type =~ "(?i)(database|connection|timeout)"
This filters for backend errors excluding rate limits, then parses JSON and filters for specific error types.
Summary
Regular expressions are an essential tool in LogQL that allow you to:
- Create powerful, flexible log searches
- Extract structured data from unstructured logs
- Transform log data into metrics for analysis
While regular expressions can be complex, mastering the basics will significantly enhance your ability to work with logs in Grafana Loki. Remember to balance the power of regex with performance considerations, and start with simpler patterns before moving to more complex ones.
Additional Resources
- LogQL Official Documentation
- RE2 Syntax Reference
- Regular Expressions 101 - Test your regex patterns
Exercises
- Write a LogQL query that finds all 4xx and 5xx HTTP status codes in nginx logs.
- Create a pattern to extract JSON error messages from logs.
- Build a query that finds logs containing email addresses.
- Write a pattern that extracts the duration from logs in the format "operation completed in 123.45ms".
- Create a query that shows a rate of exceptions per service over time.