Best Practices for Labels
Introduction
Labels are a fundamental concept in Grafana Loki that allow you to organize, query, and filter your logs efficiently. Unlike traditional logging systems that index the full content of logs, Loki only indexes the labels associated with your log streams. This approach makes Loki more resource-efficient but requires careful consideration of how you use labels.
In this guide, we'll explore best practices for designing and implementing an effective labeling strategy for your Loki deployment. We'll cover key concepts like cardinality, common labeling patterns, and optimization techniques to help you build a scalable and performant logging system.
Understanding Label Cardinality
Before diving into best practices, it's essential to understand the concept of cardinality in the context of Loki.
What is Cardinality?
Cardinality refers to the number of unique label combinations in your logging system. For example, if you have labels for:
- environment (prod, dev, staging)
- application (app1, app2, app3)
- instance (instance-1, instance-2, ..., instance-n)
The potential cardinality is the product of all possible values: 3 environments × 3 applications × n instances. With just 100 instances, that is already 900 potential streams.
High cardinality can significantly impact Loki's performance and resource usage. Let's examine why this matters and how to manage it.
Label Best Practices
1. Keep Cardinality Under Control
DO:
- Use labels for identifying the source of logs (service name, environment)
- Limit high-cardinality data to a small number of labels
DON'T:
- Add labels with unbounded values (user IDs, request IDs)
- Create labels for data that changes frequently
Example: Bad Practice
scrape_configs:
  - job_name: app_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: app_logs
          user_id: "{{.user_id}}"       # HIGH CARDINALITY!
          request_id: "{{.request_id}}" # HIGH CARDINALITY!
          environment: prod
Example: Good Practice
scrape_configs:
  - job_name: app_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: app_logs
          service: payment-api
          environment: prod
          # user_id and request_id should be IN the log content
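Keeping those identifiers in the log line does not make them hard to find. As a quick sketch (assuming the application writes something like user_id=12345 into each message), you can still isolate one user's activity with a line filter at query time:

{job="app_logs", service="payment-api", environment="prod"} |= "user_id=12345"

The stream selector only uses the low-cardinality labels defined above; the user-specific filtering happens over the log content.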
2. Use Static Labels for Fixed Dimensions
Static labels work best for dimensions that don't change or change very infrequently.
Good candidates for labels:
- environment (prod, staging, dev)
- region or datacenter (us-west, eu-central)
- service or component name (auth-service, payment-api)
- instance type (web, worker, database)
labels:
  environment: production
  region: us-west-2
  service: payment-processing
  tier: web
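Because each of these values is fixed for a given deployment, the resulting streams are stable and easy to select. Using the label values from the snippet above, a query scoped to this service's web tier in production is simply:

{environment="production", region="us-west-2", service="payment-processing", tier="web"}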
3. Follow Consistent Naming Conventions
A consistent naming pattern helps with clarity and usability.
DO:
- Use lowercase names
- Separate words with underscores
- Be descriptive but concise
- Use consistent plural/singular forms
Example:
# Consistent naming
labels:
  environment: production
  service_name: auth_api
  instance_type: worker
  log_level: error
4. Choose Labels for Query Efficiency
Design your labels based on how you expect to query your logs.
Example: Common Query Patterns
If you frequently need to find all errors across services:
# Label structure optimized for querying errors
labels:
  app: myapp
  component: api
  severity: error  # Good label for filtering
Then your LogQL query becomes straightforward:
{app="myapp", severity="error"} |= "failed to connect"
5. Use Dynamic Labels Sparingly
Dynamic labels that are derived from the log content can be useful but should be used cautiously.
Example: Using a Pipeline Stage to Extract Labels
pipeline_stages:
  - json:
      expressions:
        level: level
        component: component
  - labels:
      level:
      component:
This extracts level and component from JSON logs and converts them to labels, but be careful that these fields don't have too many possible values.
6. Balance Between Labels and Log Content
Not everything needs to be a label. High-cardinality information should be kept within the log content.
Example: Proper Balance
# As labels (low cardinality)
labels:
  app: shopping_cart
  environment: production

# Within log content (high cardinality)
# "user_id=12345 request_id=abc-123 action=add_item item_id=56789"
7. Use Template Labels Effectively
When using Prometheus-style service discovery, relabeling rules (relabel_configs) act as label templates that map discovered metadata onto a consistent set of labels.
Example:
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
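With those relabel rules in place, the discovered metadata is queryable like any other label. For instance, assuming a namespace named payments and pods labeled app=frontend (both hypothetical values), you could select their logs with:

{namespace="payments", app="frontend"}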
Real-World Examples
Example 1: Microservice Architecture
In a microservice environment, you might use labels to categorize logs by service boundaries and infrastructure components:
labels:
  # Core identifiers
  environment: production
  region: us-east-1
  # Service information
  service: payment-processing
  component: transaction-validator
  # Infrastructure
  kubernetes_namespace: payments
  kubernetes_pod_name: transaction-validator-67d8fb7b59-2njx4
  # Runtime context
  version: v2.3.1
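A label set like this lets you slice by any of those dimensions. As a sketch, pulling error lines for this component in production (the "error" substring is an assumption about what the application logs):

{environment="production", service="payment-processing", component="transaction-validator"} |= "error"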
Example 2: Monitoring Multiple Applications
When monitoring several applications with shared infrastructure:
scrape_configs:
  - job_name: application_logs
    static_configs:
      - targets:
          - localhost
        labels:
          environment: production
          tenant: customer_a
          app: frontend
      - targets:
          - localhost
        labels:
          environment: production
          tenant: customer_a
          app: backend
      - targets:
          - localhost
        labels:
          environment: staging
          tenant: internal
          app: frontend
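Sharing the environment and tenant labels across the per-app targets pays off at query time: a single selector covers both of customer_a's production applications, for example:

{tenant="customer_a", environment="production"}

Adding app="frontend" to the selector narrows it back down to one service.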
Example 3: Structured Logging with Dynamic Label Extraction
Using Loki's pipeline stages to extract important fields from structured logs:
scrape_configs:
  - job_name: application_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: app_logs
          environment: production
    pipeline_stages:
      - json:
          expressions:
            service: service
            component: component
            level: level
      - labels:
          service:
          component:
          level:
With corresponding structured log:
{"timestamp":"2023-07-25T15:04:05Z","service":"auth-service","component":"oauth","level":"error","message":"Failed to validate token","token_id":"abcd1234","error":"token expired"}
Only service, component, and level become labels, while high-cardinality fields like token_id remain in the log content.
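Those high-cardinality fields are still reachable when you need them; you simply pay for the parsing at query time rather than in the index. A sketch, reusing the token ID from the sample line above:

{service="auth-service", level="error"} | json | token_id="abcd1234"

The json parser runs only over the streams matched by the selector, so looking up a single token stays cheap even though token_id never became a label.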
Performance Impact of Labels
Measuring Cardinality
You can monitor how many active streams (unique label combinations) Loki is tracking by graphing its own Prometheus metrics, for example the ingesters' in-memory stream gauge:
sum(loki_ingester_memory_streams)
Storage Implications
Each unique label combination creates a new stream in Loki. More streams mean:
- More index entries
- Smaller individual chunks
- Less efficient compression
- Higher query overhead
A good rule of thumb is to keep your active streams (unique label combinations) under 10,000 per tenant for small deployments, though larger deployments can handle more with proper scaling.
Common Mistakes to Avoid
- Using labels for high-cardinality data:
  - User IDs
  - Session IDs
  - Request IDs
  - Timestamps as label values
- Creating too many label dimensions:
  - Every new label multiplies the potential combinations
- Inconsistent labeling across services:
  - Using different naming patterns
  - Representing the same concept with different labels
- Over-relying on labels instead of log content:
  - Remember that Loki's filtering on log content is still fast
Summary
Effective label management in Grafana Loki requires balancing between queryability and cardinality. By following these best practices, you can build a scalable, efficient logging system:
- Keep cardinality under control by limiting high-cardinality data
- Use static labels for fixed dimensions
- Follow consistent naming conventions
- Design labels around your query patterns
- Use dynamic labels sparingly
- Balance between labels and log content
- Leverage template labels for consistency
Remember that labels in Loki are for identifying and selecting log streams, not for storing all metadata. The log content itself should contain the detailed information that changes frequently or has high cardinality.
Exercises
- Analyze your current logging setup and identify any high-cardinality labels that could be moved to log content.
- Design a labeling scheme for a microservice architecture with 5-10 services across development, staging, and production environments.
- Write LogQL queries that would efficiently find:
- All errors across production services
- All logs from a specific service in the last hour
- All database-related warnings in staging
Further Learning
- Explore dynamic pipeline stages for extracting labels
- Learn about LogQL and how label selection affects query performance
- Study Loki's internal architecture to understand how labels impact storage