Managing Multiple Teams

Introduction

In a production environment, Grafana Loki is rarely used by just one team. As organizations scale, multiple teams need to work with the same Loki instance while maintaining isolation, security, and resource efficiency. This is where multi-tenancy becomes crucial.

Multi-tenancy in Grafana Loki allows multiple teams to use the same Loki installation without interfering with each other's logs or queries. This document explores how to effectively manage multiple teams in a Loki deployment through tenant IDs, access controls, and resource allocation strategies.

Understanding Multi-Tenancy in Loki

At its core, Loki's multi-tenancy is built around the concept of tenant IDs. When authentication is enabled, every request to Loki must include a tenant ID, which serves as a namespace that logically separates data between different teams or applications.

Key Concepts

Tenant ID: A unique identifier that segregates logs between different teams
X-Scope-OrgID: The HTTP header used to specify the tenant ID
Resource Isolation: Ensuring one team's ingestion or queries don't degrade performance for other teams
Authentication & Authorization: Managing who can access which logs (handled outside Loki, for example by a reverse proxy)

Setting Up Multi-Tenancy

Basic Configuration

The simplest way to enable multi-tenancy is through Loki's configuration file:

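# Minimal single-binary Loki config with multi-tenancy enabled
auth_enabled: true

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

# Loki refuses to start without a schema; this assumes Loki 3.x
# with the TSDB index and schema v13.
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_entries_limit_per_query: 5000
  max_query_parallelism: 32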

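The critical setting here is auth_enabled: true. With it, Loki rejects any request that lacks an X-Scope-OrgID header and keeps each tenant's data separate. With auth_enabled: false, Loki runs in single-tenant mode and stores everything under the tenant ID fake.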

Tenant ID Management

When sending logs to Loki, you must include a tenant ID through the X-Scope-OrgID HTTP header:

curl -H "X-Scope-OrgID: team-frontend" -H "Content-Type: application/json" \
  -XPOST -s "http://localhost:3100/loki/api/v1/push" \
  --data-raw '{"streams": [{"stream": {"job": "frontend", "level": "info"}, "values": [ ["1620000000000000000", "This is a log line from the frontend team"] ]}]}'
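Teams that push logs from their own tooling rather than curl can build the same request in code. The following is a minimal Python sketch using only the standard library; the helper name build_push_request is hypothetical, and the request is built but not sent so the sketch runs without a Loki instance.

```python
import json
import urllib.request

LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"


def build_push_request(tenant_id, labels, lines):
    """Build (but do not send) a Loki push request for a single stream.

    `lines` holds (unix_timestamp_ns, message) pairs; Loki expects each
    timestamp as a string of nanoseconds since the Unix epoch.
    """
    payload = {
        "streams": [
            {
                "stream": labels,
                "values": [[str(ts_ns), msg] for ts_ns, msg in lines],
            }
        ]
    }
    return urllib.request.Request(
        LOKI_PUSH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Loki stores these lines under this tenant only. urllib
            # re-capitalizes header names, which is harmless because
            # HTTP header names are case-insensitive.
            "X-Scope-OrgID": tenant_id,
        },
        method="POST",
    )


req = build_push_request(
    "team-frontend",
    {"job": "frontend", "level": "info"},
    [(1620000000000000000, "This is a log line from the frontend team")],
)
# urllib.request.urlopen(req) would send it to a running Loki.
```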

When querying Loki, you must use the same tenant ID:

curl -H "X-Scope-OrgID: team-frontend" \
  -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="frontend"}' \
  --data-urlencode 'start=1620000000000000000' \
  --data-urlencode 'end=1620100000000000000'
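The query side can be sketched the same way. This minimal example assumes the local URL from the curl call above; the helper name is hypothetical. The tenant header is what scopes the query: the same LogQL sent with a different X-Scope-OrgID would not see team-frontend's streams.

```python
import urllib.parse
import urllib.request

LOKI_QUERY_RANGE_URL = "http://localhost:3100/loki/api/v1/query_range"


def build_query_range_request(tenant_id, logql, start_ns, end_ns, limit=100):
    """Build (but do not send) a query_range request scoped to one tenant."""
    params = urllib.parse.urlencode({
        "query": logql,
        "start": str(start_ns),  # nanosecond Unix timestamps, as in the curl example
        "end": str(end_ns),
        "limit": str(limit),     # maximum number of log entries to return
    })
    return urllib.request.Request(
        f"{LOKI_QUERY_RANGE_URL}?{params}",
        headers={"X-Scope-OrgID": tenant_id},
        method="GET",
    )


req = build_query_range_request(
    "team-frontend", '{job="frontend"}', 1620000000000000000, 1620100000000000000
)
```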

Practical Multi-Team Setups

Let's look at how to manage multiple teams in real-world scenarios:

Scenario 1: Development Teams Structure

Consider an organization with three development teams: Frontend, Backend, and Infrastructure.

Each team uses its own tenant ID for pushing and querying logs, set through the tenant_id option of its Promtail client:

# Promtail configuration for Frontend Team
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-frontend

# Promtail configuration for Backend Team
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-backend

# Promtail configuration for Infrastructure Team
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-infra
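Because the three snippets differ only in tenant_id, one option is to generate them from a single team-to-tenant table so they cannot drift apart. This is a minimal sketch; the TEAMS mapping and the promtail_clients_block helper are hypothetical, and the output covers only the clients section of each Promtail config.

```python
# Hypothetical single source of truth: team name -> Loki tenant ID.
TEAMS = {
    "frontend": "team-frontend",
    "backend": "team-backend",
    "infra": "team-infra",
}

LOKI_PUSH_URL = "http://loki:3100/loki/api/v1/push"


def promtail_clients_block(tenant_id, url=LOKI_PUSH_URL):
    """Return the `clients:` section of a Promtail config for one tenant."""
    return (
        "clients:\n"
        f"  - url: {url}\n"
        f"    tenant_id: {tenant_id}\n"
    )


for team, tenant in TEAMS.items():
    print(f"# Promtail configuration for the {team} team")
    print(promtail_clients_block(tenant))
```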

Scenario 2: Environment-Based Structure

Another approach is to organize by environments, giving each environment (rather than each team) its own tenant ID.

Implementing Access Controls

Using Auth Proxy

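Loki does not authenticate callers itself: with auth_enabled: true it only requires an X-Scope-OrgID header and trusts whatever value it receives. In production, put an authenticating reverse proxy such as NGINX in front of Loki so that clients cannot read or write another team's tenant: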

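server {
    listen 3100;

    # Protect every Loki API path (push and query), not just push.
    location /loki/api/ {
        auth_request /auth;
        # Assumption: the auth service returns the caller's tenant in a
        # hypothetical X-Tenant-ID response header. Forwarding it as
        # X-Scope-OrgID stops clients from choosing their own tenant.
        auth_request_set $tenant $upstream_http_x_tenant_id;
        proxy_set_header X-Scope-OrgID $tenant;
        # No URI part: the original request URI is passed through unchanged.
        proxy_pass http://loki:3100;
    }

    location = /auth {
        internal;
        proxy_pass http://auth-service/validate;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}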
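The proxy delegates the decision to an auth service at auth-service/validate. Below is a minimal standard-library sketch of such a service. It assumes clients send a bearer token, that tokens map to tenants through a hypothetical in-memory table, and that the service reports the tenant in a hypothetical X-Tenant-ID response header; a real deployment would check an identity provider instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical token table; replace with a lookup against your identity provider.
TOKEN_TO_TENANT = {
    "frontend-secret-token": "team-frontend",
    "backend-secret-token": "team-backend",
    "infra-secret-token": "team-infra",
}


def resolve_tenant(authorization_header):
    """Map an 'Authorization: Bearer <token>' value to a tenant ID, or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return TOKEN_TO_TENANT.get(token.strip())


class ValidateHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        tenant = resolve_tenant(self.headers.get("Authorization"))
        if tenant is None:
            # NGINX's auth_request passes a 401 straight back to the client.
            self.send_response(401)
            self.end_headers()
            return
        self.send_response(200)
        # Hypothetical header the proxy copies into X-Scope-OrgID.
        self.send_header("X-Tenant-ID", tenant)
        self.end_headers()

    # Answer the subrequest whichever method it arrives with.
    do_POST = do_GET


def run(port=8080):
    """Serve the validation endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), ValidateHandler).serve_forever()
```

Calling run() starts the service; the token strings above are placeholders and must never be used as real credentials.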

Integration with Grafana

When integrating with Grafana, you can map Grafana organizations to Loki tenant IDs:

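# Grafana datasource provisioning: one Loki data source per Grafana organization
apiVersion: 1
datasources:
  - name: Loki (frontend)
    type: loki
    access: proxy
    orgId: 1            # Grafana organization that owns this data source
    url: http://loki:3100
    jsonData:
      httpHeaderName1: "X-Scope-OrgID"
    secureJsonData:
      httpHeaderValue1: "team-frontend"
  - name: Loki (backend)
    type: loki
    access: proxy
    orgId: 2
    url: http://loki:3100
    jsonData:
      httpHeaderName1: "X-Scope-OrgID"
    secureJsonData:
      httpHeaderValue1: "team-backend"

This setup gives each Grafana organization (selected by orgId) its own Loki data source that always sends a fixed tenant ID, so users in an organization query only their own team's logs.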

Resource Limits and Quotas

Preventing one team from consuming all resources is crucial in a multi-tenant setup.

Per-Tenant Limits

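Set default limits in Loki's configuration file and point Loki at a runtime configuration file for per-tenant overrides:

# File holding per-tenant overrides
runtime_config:
  file: /etc/loki/tenant-limits.yaml

limits_config:
  # Default limits for every tenant
  ingestion_rate_mb: 4
  ingestion_burst_size_mb: 6
  max_query_parallelism: 16
  max_query_series: 500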

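Then define team-specific limits in tenant-limits.yaml. Loki re-reads this file periodically, so changes take effect without a restart, and any limit a tenant does not override falls back to the default from limits_config: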

overrides:
  "team-frontend":
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 15
    max_query_series: 1000
  
  "team-backend":
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 12
    
  "team-infra":
    ingestion_rate_mb: 20
    ingestion_burst_size_mb: 25
    max_query_parallelism: 32
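To make the fallback behavior concrete, this minimal Python sketch models how each tenant's effective limits come out of the defaults and overrides above: an override replaces only the fields it lists, and a tenant with no entry (the hypothetical team-new) gets the defaults unchanged. It models the documented behavior and is not Loki code.

```python
# Defaults from limits_config.
DEFAULT_LIMITS = {
    "ingestion_rate_mb": 4,
    "ingestion_burst_size_mb": 6,
    "max_query_parallelism": 16,
    "max_query_series": 500,
}

# Per-tenant overrides from tenant-limits.yaml.
OVERRIDES = {
    "team-frontend": {"ingestion_rate_mb": 10, "ingestion_burst_size_mb": 15, "max_query_series": 1000},
    "team-backend": {"ingestion_rate_mb": 8, "ingestion_burst_size_mb": 12},
    "team-infra": {"ingestion_rate_mb": 20, "ingestion_burst_size_mb": 25, "max_query_parallelism": 32},
}


def effective_limits(tenant_id):
    """Start from the defaults, then apply only the fields the tenant overrides."""
    limits = dict(DEFAULT_LIMITS)
    limits.update(OVERRIDES.get(tenant_id, {}))
    return limits


for tenant in ["team-frontend", "team-backend", "team-infra", "team-new"]:
    print(tenant, effective_limits(tenant))
```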

Best Practices

Use Meaningful Tenant IDs: Choose tenant IDs that clearly represent teams or applications.
Document Your Multi-Tenancy Strategy: Create clear documentation about which tenant IDs exist and who should use them.
Implement Graduated Limits: Assign larger resource quotas to teams with greater logging needs.
Monitor Tenant Usage: Set up monitoring dashboards to track each tenant's resource consumption.
Regular Audits: Periodically review tenant IDs to remove unused ones and optimize resource allocation.
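Meaningful tenant IDs also have to be valid ones. The sketch below checks a proposed ID against the rules given in Loki's multi-tenancy documentation at the time of writing: at most 150 bytes, only alphanumerics and the characters ! - _ . * ' ( ), and not "." or "..". Treat these rules as an assumption and confirm them for your Loki version; the function name is hypothetical.

```python
import re

# Allowed characters per Loki's documented tenant ID rules (verify for your version).
_ALLOWED = re.compile(r"[A-Za-z0-9!\-_.*'()]+")
MAX_TENANT_ID_BYTES = 150


def is_valid_tenant_id(tenant_id):
    """Return True if tenant_id satisfies the length and character rules."""
    if tenant_id in (".", ".."):
        return False
    if len(tenant_id.encode("utf-8")) > MAX_TENANT_ID_BYTES:
        return False
    return _ALLOWED.fullmatch(tenant_id) is not None


for candidate in ["team-frontend", "team frontend", "team|prod"]:
    print(candidate, is_valid_tenant_id(candidate))
```

The "|" character fails the check on purpose, since Loki uses it to separate tenants in multi-tenant queries.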

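To monitor per-tenant ingestion, you can chart Loki's own metrics. Assuming Prometheus scrapes Loki, this PromQL query shows each tenant's ingestion rate in bytes per second (the per-tenant label on this metric is tenant):

sum by (tenant) (rate(loki_distributor_bytes_received_total[5m]))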
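Outside Grafana, for example in a usage report or audit script, the same query can be sent to the Prometheus HTTP API. This minimal sketch assumes a Prometheus server at a hypothetical address and that the per-tenant label is named tenant; it builds the instant-query URL and turns the standard vector response into a tenant-to-bytes-per-second mapping.

```python
import json
import urllib.parse

PROMETHEUS_URL = "http://prometheus:9090"  # hypothetical Prometheus address
QUERY = "sum by (tenant) (rate(loki_distributor_bytes_received_total[5m]))"


def instant_query_url(base_url, expr):
    """URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def bytes_per_second_by_tenant(response_body):
    """Convert a Prometheus vector response into {tenant: bytes per second}."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise RuntimeError(f"query failed: {doc.get('error')}")
    return {
        # Each sample's value is [unix_time, "number as string"].
        sample["metric"].get("tenant", ""): float(sample["value"][1])
        for sample in doc["data"]["result"]
    }
```

Fetching instant_query_url(PROMETHEUS_URL, QUERY) with urllib.request.urlopen and passing the decoded body to bytes_per_second_by_tenant yields one entry per tenant.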

Troubleshooting Multi-Tenant Setups

Common Issues and Solutions

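Missing Tenant ID:
- Symptom: Loki rejects the request with HTTP 401 and the message no org id
- Solution: Ensure every request includes the X-Scope-OrgID header (for Promtail, set tenant_id on the client)
Unauthorized Tenant Access:
- Symptom: The auth proxy rejects the request, typically with HTTP 401 or 403 (Loki itself does not check whether a caller may use a tenant ID)
- Solution: Check the proxy's authentication configuration and the caller's tenant permissions
Tenant Exceeding Limits:
- Symptom: Pushes fail with HTTP 429 and an error such as ingestion rate limit exceeded for user <tenant>
- Solution: Raise that tenant's limits in the overrides file or reduce its logging volume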
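The three cases above can be folded into a small helper for log shippers or CI checks that report push failures. This is a minimal sketch; the function name is hypothetical, and it assumes the status codes and message text described in the list above, which may differ between Loki versions and proxy setups.

```python
def diagnose_loki_error(status, body):
    """Map a failed Loki (or auth proxy) response to its likely multi-tenancy cause."""
    text = body.lower()
    if status == 401 and "no org id" in text:
        return "missing tenant ID: add the X-Scope-OrgID header"
    if status in (401, 403):
        return "rejected by the auth proxy: check credentials and tenant permissions"
    if status == 429:
        return "tenant over its limits: reduce log volume or raise the tenant's limits"
    return f"unrecognized error (HTTP {status})"


print(diagnose_loki_error(401, "no org id"))
```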

Summary

Managing multiple teams in Grafana Loki through multi-tenancy allows organizations to efficiently share a single Loki installation while maintaining isolation and security. By properly configuring tenant IDs, access controls, and resource limits, you can create a scalable logging infrastructure that serves multiple teams effectively.

Proper multi-tenancy setup ensures:

Log data isolation between teams
Appropriate access controls
Fair resource allocation
Predictable performance for all teams

Additional Resources

Read the Loki Multi-tenancy documentation
Explore tenant isolation best practices
Practice implementing per-tenant rate limits

Exercises

Set up a local Loki instance with two tenant IDs and configure Promtail to send different application logs to each tenant.
Create a custom limits configuration file with different quotas for three imaginary teams.
Implement an authentication proxy that validates tenant IDs before forwarding requests to Loki.

