Tenant Configuration

Introduction

In a multi-tenant Grafana Loki deployment, proper tenant configuration is essential for keeping apart the different users or organizations that share the same Loki infrastructure. Tenant configuration lets administrators define resource limits and retention policies on a per-tenant basis, which ensures fair resource allocation. The tenant ID attached to every request keeps each tenant's data isolated.

This guide will walk you through the process of configuring tenants in Loki, helping you understand the key parameters and best practices for managing a multi-tenant environment.

Understanding Tenant IDs in Loki

In Loki, each tenant is identified by a unique identifier known as a tenant ID. This ID is used throughout Loki's internal systems to segregate data and apply tenant-specific configurations.

The X-Scope-OrgID Header

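Loki uses the X-Scope-OrgID HTTP header to identify tenants in API requests. When multi-tenancy is enabled (auth_enabled: true in the main configuration), every request to Loki must include this header to specify which tenant's data is being read or written. With auth_enabled: false, Loki ignores the header and stores all data under a single tenant named fake. Loki trusts the header as sent and does not authenticate it, so multi-tenant deployments typically place an authenticating reverse proxy in front of Loki to set it.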

For example, when pushing logs to Loki:

curl -H "Content-Type: application/json" \
     -H "X-Scope-OrgID: tenant-123" \
     -X POST \
     -d '{
       "streams": [
         {
           "stream": {
             "job": "app",
             "level": "info"
           },
           "values": [
             [ "1630000000000000000", "log message for tenant-123" ]
           ]
         }
       ]
     }' \
     http://localhost:3100/loki/api/v1/push

In this example, the log data is associated with the tenant identified by tenant-123.
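
For clients written in Python, a minimal sketch of the same push using only the standard library might look like this. The helper name build_push_request is hypothetical, and the function only builds the request without sending it:

```python
import json
import time
import urllib.request


def build_push_request(base_url, tenant_id, labels, line, ts_ns=None):
    """Build (without sending) a Loki push request for one log line.

    The tenant is chosen with the X-Scope-OrgID header, and the timestamp is
    sent as a string of nanoseconds since the Unix epoch.
    """
    if ts_ns is None:
        ts_ns = time.time_ns()  # current time in nanoseconds
    payload = {
        "streams": [
            {
                "stream": labels,                # label set identifying the stream
                "values": [[str(ts_ns), line]],  # [timestamp_ns, log line]
            }
        ]
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Scope-OrgID": tenant_id,  # selects the tenant
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen(req) sends it. Against a multi-tenant Loki, the X-Scope-OrgID value decides which tenant stores the line.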

Tenant Configuration in Loki's Runtime Configuration

Tenant-specific configurations are defined in Loki's runtime configuration file, typically specified with the -runtime-config.file flag when starting Loki.

Here's an example of a runtime configuration file with tenant-specific settings:

# runtime-config.yaml

overrides:
  tenant-123:
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20
    max_streams_per_user: 10000
    max_chunks_per_query: 1000000
    retention_period: 744h
  
  tenant-456:
    ingestion_rate_mb: 5
    ingestion_burst_size_mb: 10
    max_streams_per_user: 5000
    max_chunks_per_query: 500000
    retention_period: 168h

Key Tenant Configuration Parameters

Let's explore the main parameters you can configure for each tenant:

Ingestion Limits

ingestion_rate_mb: 10
ingestion_burst_size_mb: 20
max_line_size: 256000
max_streams_per_user: 10000
max_global_streams_per_user: 10000

ingestion_rate_mb: Maximum sustained ingestion rate per tenant, in MB per second
ingestion_burst_size_mb: Maximum ingestion burst size per tenant, in MB
max_line_size: Maximum size of a single log line, in bytes; longer lines are rejected unless max_line_size_truncate is enabled, in which case they are truncated
max_streams_per_user: Maximum number of active streams per tenant on each ingester (0 disables this local limit)
max_global_streams_per_user: Maximum number of active streams per tenant across the whole cluster
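
ingestion_rate_mb and ingestion_burst_size_mb behave like a token bucket. Tokens refill at the sustained rate up to the burst size, and each push consumes tokens equal to its size. The sketch below is a simplified single-process model: the class name is hypothetical, and it assumes 1 MB = 1,048,576 bytes. In a real cluster the limit may also be shared across distributor replicas, depending on ingestion_rate_strategy.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: "MB" in these limits means 2^20 bytes


class TokenBucket:
    """Simplified model of a per-tenant ingestion rate limiter."""

    def __init__(self, rate_mb, burst_mb, now=0.0):
        self.rate = rate_mb * BYTES_PER_MB    # refill rate, bytes per second
        self.burst = burst_mb * BYTES_PER_MB  # bucket capacity, bytes
        self.tokens = float(self.burst)       # the bucket starts full
        self.last = now                       # time of the last refill, seconds

    def allow(self, n_bytes, now):
        """Return True if a push of n_bytes is accepted at time `now`."""
        # Refill for the elapsed time, never beyond the burst capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if n_bytes <= self.tokens:
            self.tokens -= n_bytes
            return True
        return False  # Loki would answer such a push with HTTP 429
```

In this model a single push larger than the burst size can never be accepted, however long the client waits, so the burst should comfortably exceed the largest batch your clients send.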

Query Limits

max_chunks_per_query: 1000000
max_query_series: 500
max_query_lookback: 720h
max_query_length: 24h
max_query_parallelism: 32
query_timeout: 1m
split_queries_by_interval: 30m

max_chunks_per_query: Maximum number of chunks that can be fetched by a single query
max_query_series: Maximum number of unique series a metric query can return
max_query_lookback: How far into the past queries can look; older data is not queried even if it is still stored
max_query_length: Maximum time range (end minus start) of a single query
max_query_parallelism: Maximum number of sub-queries the query frontend schedules in parallel for one query
query_timeout: Maximum duration for a query to execute
split_queries_by_interval: Interval by which the query frontend splits long queries into smaller sub-queries that can run in parallel
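
To make max_query_length and max_query_lookback concrete, here is a simplified model of how the two limits constrain a query's time range. It assumes the lookback limit moves the query start forward rather than rejecting the query, and that the length check then runs on the adjusted range. The function and exception names are hypothetical, not Loki internals.

```python
from datetime import datetime, timedelta, timezone


class QueryLimitError(ValueError):
    """Raised when a query range violates a tenant's query limits."""


def apply_query_limits(start, end, now, max_query_length, max_query_lookback):
    """Return the (start, end) range a query may use, or None if nothing is in reach.

    Simplified model: the start is clamped to the lookback window, then the
    remaining range is checked against max_query_length.
    """
    if end <= start:
        raise QueryLimitError("end must be after start")
    oldest_allowed = now - max_query_lookback
    if end <= oldest_allowed:
        return None  # the whole range lies beyond max_query_lookback
    start = max(start, oldest_allowed)  # clamp to the lookback window
    if end - start > max_query_length:
        raise QueryLimitError(
            f"query length {end - start} exceeds max_query_length {max_query_length}"
        )
    return start, end


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # The last 12 hours, checked against tenant limits of 24h length and 720h lookback.
    print(apply_query_limits(now - timedelta(hours=12), now, now,
                             timedelta(hours=24), timedelta(hours=720)))
```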

Retention Settings

retention_period: 744h

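retention_period: How long log data is kept before being deleted (e.g., 744h = 31 days). Retention is enforced by the compactor, which only deletes data when retention_enabled: true is set in the compactor configuration.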
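
Retention periods in this guide are written as hour durations. A small helper for checking the conversions could look like this. It assumes only the h, m and s units used here (Loki's own duration parsing may accept more forms), and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)([hms])")


def parse_duration(text):
    """Parse durations such as '744h' or '1h30m' (only h, m and s units)."""
    seconds, pos = 0, 0
    for match in _PART.finditer(text):
        if match.start() != pos:  # reject unparsed characters between parts
            raise ValueError(f"invalid duration: {text!r}")
        seconds += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # reject empty input and trailing junk
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=seconds)


def retention_days(retention_period):
    """Convert a retention_period string to (possibly fractional) days."""
    return parse_duration(retention_period) / timedelta(days=1)


def is_past_retention(entry_time, now, retention_period):
    """True if an entry is older than the tenant's retention period."""
    return now - entry_time > parse_duration(retention_period)


if __name__ == "__main__":
    print(retention_days("744h"))  # 31.0
    now = datetime.now(timezone.utc)
    print(is_past_retention(now - timedelta(days=8), now, "168h"))  # True
```

The compactor deletes expired data in periodic passes, so an entry can remain stored for a while after it crosses this boundary.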

Cardinality Limits

cardinality_limit: 100000
max_label_name_length: 1024
max_label_value_length: 2048
max_label_names_per_series: 30

cardinality_limit: Maximum cardinality (number of series) a single index query may match
max_label_name_length: Maximum length for label names
max_label_value_length: Maximum length for label values
max_label_names_per_series: Maximum number of label names per series
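
Loki rejects pushes whose streams break these label limits. Below is a sketch of an equivalent pre-flight check that a custom client could run before sending. It uses the example values above as defaults, and the function name is hypothetical.

```python
def validate_labels(labels, max_label_name_length=1024,
                    max_label_value_length=2048, max_label_names_per_series=30):
    """Return a list of limit violations for one stream's labels (empty if valid).

    Lengths are measured in UTF-8 bytes (assumption: this mirrors how Loki,
    written in Go, measures string length).
    """
    problems = []
    if len(labels) > max_label_names_per_series:
        problems.append(
            f"{len(labels)} label names exceed max_label_names_per_series={max_label_names_per_series}"
        )
    for name, value in labels.items():
        if len(name.encode("utf-8")) > max_label_name_length:
            problems.append(f"label name {name[:32]!r}... exceeds max_label_name_length")
        if len(value.encode("utf-8")) > max_label_value_length:
            problems.append(f"value of label {name!r} exceeds max_label_value_length")
    return problems
```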

Default Tenant Configuration

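Loki applies the limits defined in the limits_config section of the main configuration to every tenant. A tenant's block under overrides replaces only the fields it sets. Any field it omits, and every limit of a tenant with no override block at all, falls back to these defaults: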

limits_config:
  ingestion_rate_mb: 4
  ingestion_burst_size_mb: 6
  max_streams_per_user: 10000
  max_chunks_per_query: 1000000
  retention_period: 744h
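
Resolving a tenant's effective limits can be modeled as a dictionary merge: start from limits_config and let the tenant's override block replace the fields it sets. This is a sketch of the resolution rule, not Loki's code, and DEFAULT_LIMITS and effective_limits are illustrative names.

```python
# Defaults copied from the limits_config example above.
DEFAULT_LIMITS = {
    "ingestion_rate_mb": 4,
    "ingestion_burst_size_mb": 6,
    "max_streams_per_user": 10000,
    "max_chunks_per_query": 1000000,
    "retention_period": "744h",
}


def effective_limits(tenant_id, defaults, overrides):
    """Resolve a tenant's limits: defaults first, then the tenant's override fields."""
    limits = dict(defaults)                      # copy so the defaults stay untouched
    limits.update(overrides.get(tenant_id, {}))  # replace only the fields the tenant sets
    return limits
```

For example, a tenant whose override sets only ingestion_rate_mb: 10 keeps the default ingestion_burst_size_mb of 6, so raise the burst together with the rate.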

Dynamic Tenant Configuration

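Loki supports dynamic runtime configuration, so you can update tenant settings without restarting the service. Loki re-reads the runtime configuration file periodically, every 10 seconds by default (configurable with the period setting of the runtime_config block or the -runtime-config.reload-period flag), so edits to the file take effect on the next reload. To check which configuration is currently applied, query the /runtime_config endpoint:

curl http://localhost:3100/runtime_config
curl "http://localhost:3100/runtime_config?mode=diff"  # only values that differ from the defaults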
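
Because Loki re-reads the runtime configuration file while it is running, it could in principle pick up a partially written file if you edit it in place. A common remedy is to write a temporary file in the same directory and atomically rename it over the original. The sketch below assumes you manage the file directly (a Kubernetes ConfigMap mount is updated by the kubelet instead). It writes JSON, which YAML parsers accept, to avoid a third-party YAML dependency; write_runtime_config is a hypothetical helper.

```python
import json
import os
import tempfile


def write_runtime_config(path, overrides):
    """Atomically replace a runtime config file with {"overrides": overrides}."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".runtime-config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"overrides": overrides}, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the Loki process read it
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        try:
            os.unlink(tmp_path)  # clean up the temporary file on failure
        except FileNotFoundError:
            pass
        raise
```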

Practical Example: Multi-Environment Tenant Configuration

Let's look at a practical example where we configure Loki for multiple environments within an organization:

# runtime-config.yaml
overrides:
  # Production environment with higher limits
  prod:
    ingestion_rate_mb: 20
    ingestion_burst_size_mb: 30
    max_streams_per_user: 50000
    max_chunks_per_query: 2000000
    retention_period: 2160h  # 90 days
    max_query_length: 72h
    max_query_parallelism: 32
    query_timeout: 5m
    
  # Staging environment with moderate limits
  staging:
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 15
    max_streams_per_user: 20000
    max_chunks_per_query: 1000000
    retention_period: 720h  # 30 days
    max_query_length: 48h
    max_query_parallelism: 16
    query_timeout: 2m
    
  # Development environment with lower limits
  dev:
    ingestion_rate_mb: 5
    ingestion_burst_size_mb: 8
    max_streams_per_user: 10000
    max_chunks_per_query: 500000
    retention_period: 168h  # 7 days
    max_query_length: 24h
    max_query_parallelism: 8
    query_timeout: 1m
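
The three blocks above follow a tiered pattern, which becomes tedious to maintain by hand as tenants multiply. Here is a minimal sketch of generating the overrides section from tier templates. The tier names large, medium and small and the helper build_overrides are hypothetical.

```python
# Hypothetical tier templates; values mirror the prod/staging/dev example above.
TIER_TEMPLATES = {
    "large": {
        "ingestion_rate_mb": 20, "ingestion_burst_size_mb": 30,
        "max_streams_per_user": 50000, "max_chunks_per_query": 2000000,
        "retention_period": "2160h", "max_query_length": "72h",
        "max_query_parallelism": 32, "query_timeout": "5m",
    },
    "medium": {
        "ingestion_rate_mb": 10, "ingestion_burst_size_mb": 15,
        "max_streams_per_user": 20000, "max_chunks_per_query": 1000000,
        "retention_period": "720h", "max_query_length": "48h",
        "max_query_parallelism": 16, "query_timeout": "2m",
    },
    "small": {
        "ingestion_rate_mb": 5, "ingestion_burst_size_mb": 8,
        "max_streams_per_user": 10000, "max_chunks_per_query": 500000,
        "retention_period": "168h", "max_query_length": "24h",
        "max_query_parallelism": 8, "query_timeout": "1m",
    },
}


def build_overrides(tenant_tiers, templates=TIER_TEMPLATES, tweaks=None):
    """Build an overrides mapping from {tenant_id: tier} plus optional per-tenant tweaks."""
    tweaks = tweaks or {}
    overrides = {}
    for tenant_id, tier in tenant_tiers.items():
        if tier not in templates:
            raise KeyError(f"unknown tier {tier!r} for tenant {tenant_id!r}")
        limits = dict(templates[tier])  # copy so the shared template is never mutated
        limits.update(tweaks.get(tenant_id, {}))
        overrides[tenant_id] = limits
    return overrides
```

The returned dictionary has the same shape as the overrides: block and can be serialized into the runtime configuration file.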


Configuring Promtail for Multi-Tenancy

When sending logs from Promtail (Loki's log collection agent) to a multi-tenant Loki setup, you need to configure Promtail to use the appropriate tenant ID:

# promtail-config.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: prod
    basic_auth:
      username: loki
      password: secret

For more complex scenarios, you can set the tenant ID per log entry with Promtail's tenant pipeline stage, taking the value from a label that relabeling populates:

clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: default  # fallback for entries whose pipeline does not set a tenant

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
    pipeline_stages:
      - tenant:
          label: namespace

In this example, the Kubernetes namespace is used as the tenant ID, giving each namespace its own tenant.

Best Practices for Tenant Configuration

Start with Conservative Limits: Begin with conservative resource limits and gradually increase them as needed based on actual usage patterns.
Align Tenants with Organizational Structure: Map tenants to your organizational structure, for example by department, team, or environment (prod, staging, dev).
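Monitor Tenant Resource Usage: Track each tenant's resource consumption, for example with Loki's loki_distributor_bytes_received_total and loki_discarded_samples_total metrics (both labeled by tenant), to identify when limits need adjustment.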
Use Federation for Large-Scale Deployments: For very large deployments, consider running separate Loki instances for high-traffic tenants and using federation for querying across them.
Document Tenant Configurations: Maintain clear documentation of tenant configurations, including why specific limits were chosen.
Consider Automation: For environments with many tenants, use automation to generate and update tenant configurations based on templates.
Implement Regular Reviews: Regularly review tenant configurations to ensure they remain appropriate as usage patterns evolve.

Troubleshooting Tenant Issues

Rate Limiting Errors

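If clients receive 429 Too Many Requests responses, the tenant has exceeded one of its ingestion limits, most often the ingestion rate. The error looks similar to the following (the exact wording varies between Loki and client versions):

Error: server returned HTTP status 429 Too Many Requests: Ingestion rate limit (10.00MB/sec) exceeded

Loki also returns 429 when a tenant exceeds its active stream limit (max_streams_per_user or max_global_streams_per_user) or its per-stream rate limit.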

Solutions:

Increase the ingestion_rate_mb and ingestion_burst_size_mb values for the affected tenant
Implement backpressure handling in your log shipping configuration
Consider log sampling for high-volume sources
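
Promtail already retries failed pushes, including 429 responses, according to its client backoff_config. For a custom shipper, a minimal sketch of the same idea follows. RateLimited and push_with_backoff are hypothetical names, and send stands for whatever function performs the HTTP push and raises RateLimited on a 429.

```python
import random
import time


class RateLimited(Exception):
    """Raised by a send function when Loki answers with HTTP 429."""


def push_with_backoff(send, batch, max_retries=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call send(batch); on RateLimited, retry with capped exponential backoff.

    Each wait is drawn uniformly from [0, min(max_delay, base_delay * 2**attempt)]
    ("full jitter"), so many throttled clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up and surface the 429 to the caller
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```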

Query Performance Issues

If queries are timing out or performing poorly:

Solutions:

Adjust max_chunks_per_query or max_query_series limits
Increase query_timeout for complex queries
Use more specific label selectors to reduce the scope of queries
Split large time ranges into smaller queries
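
Splitting a large range on the client side can be as simple as walking it in fixed windows, for instance windows no longer than the tenant's max_query_length, and issuing one query per window. Here is a sketch; split_range is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def split_range(start, end, max_window):
    """Yield consecutive (window_start, window_end) pairs covering [start, end),
    each at most max_window long."""
    if max_window <= timedelta(0):
        raise ValueError("max_window must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_window, end)
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    end = datetime(2024, 1, 31, tzinfo=timezone.utc)
    # A 72h range split into windows that fit a 24h max_query_length.
    for window in split_range(end - timedelta(hours=72), end, timedelta(hours=24)):
        print(window)
```

Each window is then sent as a separate query, and the results are merged in whatever order your application expects.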

Summary

Proper tenant configuration is crucial for managing a multi-tenant Loki deployment. By understanding and applying tenant-specific limits for ingestion, querying, and retention, you can ensure fair resource allocation and data isolation between different users or organizations sharing the same Loki infrastructure.

Key takeaways:

Each tenant has a unique ID specified via the X-Scope-OrgID header
Tenant configurations are defined in Loki's runtime configuration file
Important parameters include ingestion limits, query limits, and retention periods
Dynamic configuration allows updating tenant settings without service restarts
Tenant configurations should align with your organizational structure and usage patterns

Additional Resources and Exercises

Exercises

Set up a Loki instance with three different tenants, each with distinct resource limits and retention periods.
Configure Promtail to send logs to Loki using different tenant IDs based on the source application.
Create a monitoring dashboard to track resource usage per tenant, helping identify when limits need adjustment.
Implement a script that generates tenant configurations based on a template and a list of tenant properties.
