Loki Federation
Introduction
Loki Federation is an advanced feature that allows you to distribute and scale your Loki deployment across multiple clusters or regions. Federation enables you to query logs from multiple Loki instances through a single query endpoint, making it possible to build global observability systems while maintaining performance and data locality requirements.
In this guide, we'll explore how Loki Federation works, when to use it, and how to set up a federated Loki deployment.
What is Federation?
Federation in Loki refers to the ability to configure a Loki instance to forward queries to other Loki instances and merge the results. This creates a hierarchy where:
- Frontend Loki instances receive queries from users or Grafana
- These queries are executed locally and also forwarded to configured downstream Loki instances
- Results are merged and returned to the user as a single response
When to Use Federation
Loki Federation is particularly useful in several scenarios:
- Multi-Region Deployments: When you have logs distributed across different geographic regions
- Large-Scale Environments: When a single Loki instance can't handle the query load
- Organizational Boundaries: When different teams or departments maintain their own Loki instances
- Data Sovereignty: When regulatory requirements mandate keeping certain data in specific regions
How Federation Works
Federation in Loki operates through a parent-child relationship between Loki instances. The parent Loki (often called the "frontend") forwards queries to one or more child Loki instances and combines their results.
The key components involved in federation are:
- Query Frontend: Receives and coordinates queries
- Query Scheduler: Distributes queries to local and federated backends
- Querier: Executes queries against the local Loki instance and federates to remote instances
Configuring Loki Federation
Let's walk through setting up a basic federated Loki deployment.
Prerequisites
- Multiple Loki instances running in different environments
- Network connectivity between Loki instances
- Proper authentication/authorization between instances
Configuration Steps
1. Frontend Loki Configuration
In your frontend Loki configuration, you need to define the federation settings in the query_range
section:
query_range:
results_cache:
cache:
embedded_cache:
enabled: true
max_size_mb: 100
# Federation configuration
queryFrontend:
federate:
enabled: true
# Configure downstream Loki instances
downstream_urls:
- url: http://loki-us-east.example.com/loki/api/v1/query_range
tenant_id: tenant-1
- url: http://loki-us-west.example.com/loki/api/v1/query_range
tenant_id: tenant-1
- url: http://loki-eu-west.example.com/loki/api/v1/query_range
tenant_id: tenant-1
2. Configure Ingester Limits
When using federation, you might want to adjust limits to handle the increased query load:
limits_config:
max_query_parallelism: 32
max_query_length: 12h
max_queries_per_tenant: 10
3. Configure Downstream Loki Instances
Each downstream instance needs standard Loki configuration but doesn't need any special settings for federation as they're simply responding to normal queries.
Example: Multi-Region Log Query
Let's see how federation works in practice with a simple example. Imagine you have applications running in three different regions, each with its own Loki instance:
loki-us-east
: Contains logs from US East regionloki-us-west
: Contains logs from US West regionloki-eu-west
: Contains logs from EU West region
With federation configured, you can query across all regions with a single request:
{app="payment-service", env="production"} |= "error"
Here's what happens behind the scenes:
- You send the query to the frontend Loki instance
- The frontend executes the query on its local data (if any)
- The frontend forwards the same query to all configured downstream instances
- Each downstream instance processes the query against its local data
- The frontend merges all results and returns them to you
The result is a unified view of logs from all three regions, without requiring you to query each region separately.
Authentication in Federated Setups
Federation requires proper authentication between Loki instances. Here's how to configure basic authentication:
queryFrontend:
federate:
enabled: true
downstream_urls:
- url: http://loki-us-east.example.com/loki/api/v1/query_range
tenant_id: tenant-1
headers:
X-Scope-OrgID: tenant-1
Authorization: Basic YWRtaW46cGFzc3dvcmQ= # Base64 encoded "admin:password"
For production environments, you should use more secure authentication methods like OAuth or mTLS.
Monitoring Federation Performance
To ensure your federated setup is performing optimally, monitor these metrics:
loki_query_frontend_queries_total{backend="federation"}
: Number of federated queriesloki_query_frontend_query_duration_seconds
: Duration of federated queriesloki_query_frontend_downstream_responses_total
: Number of responses from downstream instances
These metrics can help identify performance bottlenecks in your federation setup.
Advanced Federation Patterns
Hierarchical Federation
You can create multi-level hierarchies where regional Loki instances federate to country-level instances, which then federate to a global instance:
Query Splitting
For very large deployments, you can optimize federation by splitting queries by time range:
query_range:
split_queries_by_interval: 24h
align_queries_with_step: true
This helps distribute the query load more evenly across your infrastructure.
Troubleshooting Federation
Common issues with federation and how to solve them:
Problem: Missing Results from Some Regions
Possible Causes:
- Network connectivity issues
- Authentication problems
- Timeouts due to high query complexity
Solutions:
- Check network connectivity between Loki instances
- Verify authentication configurations
- Increase timeout settings in the frontend configuration
queryFrontend:
federate:
default_timeout: 30s
Problem: Slow Federated Queries
Possible Causes:
- Too many downstream instances
- Inefficient queries
- Resource constraints
Solutions:
- Optimize your LogQL queries
- Add query splitting by time range
- Increase resources for querier components
- Implement caching at the frontend level
Best Practices for Loki Federation
- Minimize Federation Depth: Keep federation hierarchies as shallow as possible to reduce latency
- Use Regional Caching: Implement caching at each level to reduce repeated queries
- Implement Request Timeouts: Set appropriate timeouts to prevent slow instances from blocking queries
- Monitor Federation Metrics: Keep an eye on federation-specific metrics
- Plan for Failure: Design your federation topology to handle instance failures gracefully
Summary
Loki Federation is a powerful feature that enables querying across multiple Loki instances, making it possible to build truly global observability systems. Key takeaways:
- Federation allows querying logs across multiple Loki instances through a single endpoint
- It's ideal for multi-region deployments, large-scale environments, and scenarios with organizational boundaries
- Setting up federation requires configuring frontend Loki instances to forward queries to downstream instances
- Authentication and proper performance monitoring are crucial for successful federation
- Advanced patterns like hierarchical federation can help scale to even larger deployments
Exercises
- Set up a basic federated Loki deployment with two instances running locally
- Configure authentication between federated Loki instances
- Write a LogQL query that retrieves logs from multiple federated instances
- Set up Grafana to visualize logs from your federated Loki deployment
- Monitor the performance of your federated setup using Prometheus metrics
Additional Resources
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)