Skip to main content

Loki Federation

Introduction

Loki Federation is an advanced feature that allows you to distribute and scale your Loki deployment across multiple clusters or regions. Federation enables you to query logs from multiple Loki instances through a single query endpoint, making it possible to build global observability systems while maintaining performance and data locality requirements.

In this guide, we'll explore how Loki Federation works, when to use it, and how to set up a federated Loki deployment.

What is Federation?

Federation in Loki refers to the ability to configure a Loki instance to forward queries to other Loki instances and merge the results. This creates a hierarchy where:

  • Frontend Loki instances receive queries from users or Grafana
  • These queries are executed locally and also forwarded to configured downstream Loki instances
  • Results are merged and returned to the user as a single response

When to Use Federation

Loki Federation is particularly useful in several scenarios:

  1. Multi-Region Deployments: When you have logs distributed across different geographic regions
  2. Large-Scale Environments: When a single Loki instance can't handle the query load
  3. Organizational Boundaries: When different teams or departments maintain their own Loki instances
  4. Data Sovereignty: When regulatory requirements mandate keeping certain data in specific regions

How Federation Works

Federation in Loki operates through a parent-child relationship between Loki instances. The parent Loki (often called the "frontend") forwards queries to one or more child Loki instances and combines their results.

The key components involved in federation are:

  1. Query Frontend: Receives and coordinates queries
  2. Query Scheduler: Distributes queries to local and federated backends
  3. Querier: Executes queries against the local Loki instance and federates to remote instances

Configuring Loki Federation

Let's walk through setting up a basic federated Loki deployment.

Prerequisites

  • Multiple Loki instances running in different environments
  • Network connectivity between Loki instances
  • Proper authentication/authorization between instances

Configuration Steps

1. Frontend Loki Configuration

In your frontend Loki configuration, you need to define the federation settings in the query_range section:

yaml
query_range:
results_cache:
cache:
embedded_cache:
enabled: true
max_size_mb: 100

# Federation configuration
queryFrontend:
federate:
enabled: true

# Configure downstream Loki instances
downstream_urls:
- url: http://loki-us-east.example.com/loki/api/v1/query_range
tenant_id: tenant-1
- url: http://loki-us-west.example.com/loki/api/v1/query_range
tenant_id: tenant-1
- url: http://loki-eu-west.example.com/loki/api/v1/query_range
tenant_id: tenant-1

2. Configure Ingester Limits

When using federation, you might want to adjust limits to handle the increased query load:

yaml
limits_config:
max_query_parallelism: 32
max_query_length: 12h
max_queries_per_tenant: 10

3. Configure Downstream Loki Instances

Each downstream instance needs standard Loki configuration but doesn't need any special settings for federation as they're simply responding to normal queries.

Example: Multi-Region Log Query

Let's see how federation works in practice with a simple example. Imagine you have applications running in three different regions, each with its own Loki instance:

  • loki-us-east: Contains logs from US East region
  • loki-us-west: Contains logs from US West region
  • loki-eu-west: Contains logs from EU West region

With federation configured, you can query across all regions with a single request:

{app="payment-service", env="production"} |= "error"

Here's what happens behind the scenes:

  1. You send the query to the frontend Loki instance
  2. The frontend executes the query on its local data (if any)
  3. The frontend forwards the same query to all configured downstream instances
  4. Each downstream instance processes the query against its local data
  5. The frontend merges all results and returns them to you

The result is a unified view of logs from all three regions, without requiring you to query each region separately.

Authentication in Federated Setups

Federation requires proper authentication between Loki instances. Here's how to configure basic authentication:

yaml
queryFrontend:
federate:
enabled: true
downstream_urls:
- url: http://loki-us-east.example.com/loki/api/v1/query_range
tenant_id: tenant-1
headers:
X-Scope-OrgID: tenant-1
Authorization: Basic YWRtaW46cGFzc3dvcmQ= # Base64 encoded "admin:password"

For production environments, you should use more secure authentication methods like OAuth or mTLS.

Monitoring Federation Performance

To ensure your federated setup is performing optimally, monitor these metrics:

  1. loki_query_frontend_queries_total{backend="federation"}: Number of federated queries
  2. loki_query_frontend_query_duration_seconds: Duration of federated queries
  3. loki_query_frontend_downstream_responses_total: Number of responses from downstream instances

These metrics can help identify performance bottlenecks in your federation setup.

Advanced Federation Patterns

Hierarchical Federation

You can create multi-level hierarchies where regional Loki instances federate to country-level instances, which then federate to a global instance:

Query Splitting

For very large deployments, you can optimize federation by splitting queries by time range:

yaml
query_range:
split_queries_by_interval: 24h
align_queries_with_step: true

This helps distribute the query load more evenly across your infrastructure.

Troubleshooting Federation

Common issues with federation and how to solve them:

Problem: Missing Results from Some Regions

Possible Causes:

  • Network connectivity issues
  • Authentication problems
  • Timeouts due to high query complexity

Solutions:

  • Check network connectivity between Loki instances
  • Verify authentication configurations
  • Increase timeout settings in the frontend configuration
yaml
queryFrontend:
federate:
default_timeout: 30s

Problem: Slow Federated Queries

Possible Causes:

  • Too many downstream instances
  • Inefficient queries
  • Resource constraints

Solutions:

  • Optimize your LogQL queries
  • Add query splitting by time range
  • Increase resources for querier components
  • Implement caching at the frontend level

Best Practices for Loki Federation

  1. Minimize Federation Depth: Keep federation hierarchies as shallow as possible to reduce latency
  2. Use Regional Caching: Implement caching at each level to reduce repeated queries
  3. Implement Request Timeouts: Set appropriate timeouts to prevent slow instances from blocking queries
  4. Monitor Federation Metrics: Keep an eye on federation-specific metrics
  5. Plan for Failure: Design your federation topology to handle instance failures gracefully

Summary

Loki Federation is a powerful feature that enables querying across multiple Loki instances, making it possible to build truly global observability systems. Key takeaways:

  • Federation allows querying logs across multiple Loki instances through a single endpoint
  • It's ideal for multi-region deployments, large-scale environments, and scenarios with organizational boundaries
  • Setting up federation requires configuring frontend Loki instances to forward queries to downstream instances
  • Authentication and proper performance monitoring are crucial for successful federation
  • Advanced patterns like hierarchical federation can help scale to even larger deployments

Exercises

  1. Set up a basic federated Loki deployment with two instances running locally
  2. Configure authentication between federated Loki instances
  3. Write a LogQL query that retrieves logs from multiple federated instances
  4. Set up Grafana to visualize logs from your federated Loki deployment
  5. Monitor the performance of your federated setup using Prometheus metrics

Additional Resources



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)