Per-Tenant Quotas

Introduction

In a multi-tenant Grafana Loki environment, different teams, applications, or customers (tenants) share the same Loki infrastructure. While this approach is cost-effective, it introduces a challenge: how do you prevent one tenant from consuming excessive resources and impacting others? This is where per-tenant quotas come in.

Per-tenant quotas allow you to set specific limits on resource consumption for each tenant, ensuring fair usage and system stability. These quotas help you:

  • Prevent resource monopolization by any single tenant
  • Maintain consistent performance for all users
  • Plan capacity and resource allocation effectively
  • Implement tiered service levels for different customer groups

Understanding Quota Types in Loki

Loki supports several types of quotas that can be configured on a per-tenant basis:

1. Ingestion Quotas

These quotas limit how much data a tenant can send to Loki:

  • Ingestion rate (`ingestion_rate_mb`): limits the volume of log data, in MB per second, that a tenant can ingest
  • Ingestion burst size (`ingestion_burst_size_mb`): allows short spikes above the sustained rate
  • Maximum label names per series (`max_label_names_per_series`): controls the complexity of tenant log metadata
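
Conceptually, the rate/burst pair behaves like a token bucket: the bucket holds up to the burst size and refills at the sustained rate. Here is a minimal sketch of that behavior using Go's `golang.org/x/time/rate` package; the sizes mirror the example limits used later in this article and are purely illustrative:

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Sustained rate of 10 MB/s with a 20 MB burst, mirroring
	// ingestion_rate_mb: 10 and ingestion_burst_size_mb: 20.
	const mb = 1 << 20
	limiter := rate.NewLimiter(rate.Limit(10*mb), 20*mb)

	// A 15 MB push fits inside the initially full burst bucket...
	fmt.Println(limiter.AllowN(time.Now(), 15*mb)) // true

	// ...but an immediate second 15 MB push exceeds the remaining tokens.
	fmt.Println(limiter.AllowN(time.Now(), 15*mb)) // false
}
```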

2. Query Quotas

These quotas control how tenants can query their data:

  • Query timeout (`query_timeout`): maximum duration a query can run before being terminated
  • Query parallelism (`max_query_parallelism`): how many split sub-queries the query frontend will schedule in parallel for a tenant
  • Query series limit (`max_query_series`): maximum number of unique series a single query may match
  • Maximum query lookback (`max_query_lookback`): the oldest data a tenant is allowed to query
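
A query timeout is just a deadline on the request. The sketch below shows the same pattern from a client's perspective using Go's `context` package; `queryLoki` is a hypothetical helper standing in for a real Loki client call:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// queryLoki is a hypothetical stand-in for a real Loki client call;
// here it simply pretends the query needs two seconds of work.
func queryLoki(ctx context.Context, tenant, logql string) error {
	select {
	case <-time.After(2 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// A one-second budget, analogous to query_timeout: 1s for this tenant.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	if err := queryLoki(ctx, "tenant2", `{app="checkout"}`); err != nil {
		fmt.Println("query aborted:", err) // context deadline exceeded
	}
}
```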

3. API Quotas

These quotas affect API usage:

  • Maximum requests per second: controls how frequently a tenant can call Loki APIs; note that open-source Loki does not ship a general per-tenant request-rate limit, so this is usually enforced by a gateway or reverse proxy in front of Loki, keyed on the `X-Scope-OrgID` header

Configuring Per-Tenant Quotas

Loki allows you to configure quotas through configuration files or runtime overrides. Let's look at both approaches:

Configuration File Method

Quotas can be specified in Loki's configuration file. Here's an example configuration that sets up different quota limits for two tenants:

```yaml
# Main Loki configuration
limits_config:
  # Global defaults that apply to all tenants
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_label_names_per_series: 30
  max_label_name_length: 1024
  max_label_value_length: 2048

  # Per-tenant overrides live in a separate file
  per_tenant_override_config: /etc/loki/tenant-quotas.yaml
```

```yaml
# Contents of /etc/loki/tenant-quotas.yaml
overrides:
  tenant1:
    ingestion_rate_mb: 20
    ingestion_burst_size_mb: 30
    max_query_parallelism: 16

  tenant2:
    ingestion_rate_mb: 5
    ingestion_burst_size_mb: 10
    max_query_parallelism: 8
    max_query_series: 5000
```

This configuration:

  • Sets global defaults for all tenants
  • Provides tenant1 with higher ingestion limits (20MB/s) and query parallelism (16)
  • Restricts tenant2 to lower ingestion limits (5MB/s) and query parallelism (8)

Updating Quotas at Runtime

Open-source Loki does not expose a write API for quotas, but it doesn't need one: the per-tenant override file is polled and reloaded periodically (controlled by `per_tenant_override_period`, 10s by default), so edits take effect without restarting Loki. You can verify what is currently loaded through the read-only `/runtime_config` endpoint:

```bash
# Edit the overrides file; Loki reloads it automatically
# within per_tenant_override_period (10s by default)
vim /etc/loki/tenant-quotas.yaml

# Inspect the currently loaded runtime configuration
curl http://loki:3100/runtime_config

# Show only the values that differ from the defaults
curl "http://loki:3100/runtime_config?mode=diff"
```

Monitoring Quota Usage

To effectively manage quotas, you need visibility into how tenants are using their allocated resources. Loki exports several metrics that help monitor quota usage:

```promql
# Grafana dashboard query examples

# Ingestion rate by tenant
sum by (tenant) (rate(loki_distributor_bytes_received_total[5m]))

# Log lines discarded because a tenant hit its rate limit
sum by (tenant) (rate(loki_discarded_samples_total{reason="rate_limited"}[5m]))
```

These queries can be assembled into a Grafana dashboard that tracks per-tenant quota usage over time.
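
To pull the same numbers outside Grafana (for reports or automated checks), you can hit the standard Prometheus HTTP API directly. A sketch, assuming Prometheus is reachable at `prometheus:9090`:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Instant query against the standard Prometheus HTTP API.
	q := `sum by (tenant) (rate(loki_distributor_bytes_received_total[5m]))`
	u := "http://prometheus:9090/api/v1/query?query=" + url.QueryEscape(q)

	resp, err := http.Get(u)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON result: one sample per tenant
}
```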

Implementing a Progressive Quota Strategy

When implementing quotas, consider a progressive approach that balances system protection with tenant experience:

Step 1: Monitor Usage Patterns

Before setting strict limits, monitor how your tenants naturally use the system:

```promql
# Prometheus query to identify each tenant's peak ingestion over 30 days
max_over_time(sum by (tenant) (rate(loki_distributor_bytes_received_total[5m]))[30d:1h])
```

Step 2: Set Generous Initial Quotas

Start with quotas well above observed usage to avoid disrupting workloads:

```yaml
overrides:
  tenant1:
    # ~200% of the observed peak (assuming a peak of roughly 10 MB/s)
    ingestion_rate_mb: 20
    max_query_parallelism: 32  # generous initial value
```

Step 3: Gradually Refine Quotas

Tighten quotas gradually as you understand usage patterns better:

```yaml
# Month 1: 200% of observed peak
# Month 2: 150% of observed peak
# Month 3: 120% of observed peak + buffer
```
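
This schedule is easy to automate. A small sketch of the arithmetic; the peak value and safety factors are placeholders:

```go
package main

import (
	"fmt"
	"math"
)

// nextQuotaMB applies a safety factor to the observed peak (MB/s)
// and rounds up so quotas stay whole megabytes.
func nextQuotaMB(observedPeakMB, factor float64) int {
	return int(math.Ceil(observedPeakMB * factor))
}

func main() {
	peak := 9.3 // MB/s, e.g. from the 30-day max_over_time query above
	for i, factor := range []float64{2.0, 1.5, 1.2} {
		fmt.Printf("month %d: ingestion_rate_mb: %d\n", i+1, nextQuotaMB(peak, factor))
	}
}
```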

Real-World Example: E-commerce Application

Let's examine how per-tenant quotas would work in a real-world scenario with an e-commerce platform:

| Tenant | Description | Quota Strategy |
| --- | --- | --- |
| Orders Service | Handles order processing | Higher ingestion during sales events |
| User Service | Manages user accounts | Consistent, predictable load |
| Analytics | Business intelligence | Higher query limits, lower priority |

Here's how you might configure quotas for this scenario:

```yaml
overrides:
  orders-service:
    ingestion_rate_mb: 15
    ingestion_burst_size_mb: 30  # allows for sales spikes
    max_query_parallelism: 8

  user-service:
    ingestion_rate_mb: 5
    ingestion_burst_size_mb: 8
    max_query_parallelism: 4

  analytics:
    ingestion_rate_mb: 2
    max_query_parallelism: 16
    query_timeout: 5m  # longer timeout for complex analytics
```

Handling Quota Breaches

When tenants hit their quotas, Loki rejects the offending requests; rate-limit breaches surface as HTTP 429 (Too Many Requests) responses. You can implement additional handling on top of this:

1. Graceful Degradation

Configure your application to handle quota breaches gracefully:

```go
// Example of client-side handling in Go. Log, sendLogs, isQuotaError,
// and sampleCriticalLogs are placeholders for your own client code.
package client

import (
	"math"
	"time"
)

func sendLogsWithRetry(logs []Log, tenant string) error {
	for attempt := 0; attempt < 3; attempt++ {
		err := sendLogs(logs, tenant)
		if err == nil {
			return nil
		}
		if !isQuotaError(err) {
			return err // non-quota errors are not retried
		}
		// Quota error: wait with exponential backoff (1s, 2s, 4s).
		time.Sleep(time.Second * time.Duration(math.Pow(2, float64(attempt))))
	}
	// All attempts failed: fall back to sending only the critical logs.
	return sendLogs(sampleCriticalLogs(logs), tenant)
}
```
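
Note that the backoff here is deliberately simple; in production you would usually add jitter so that many clients backing off in lockstep don't re-overwhelm the distributor the moment quota headroom frees up.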

2. Alerting on Quota Breaches

Set up alerts to notify when tenants approach or exceed their quotas:

```yaml
# Prometheus alerting rule example. The ratio assumes Loki's
# overrides-exporter is running, so that each tenant's limits are
# exported as the loki_overrides metric (labelled by "user").
groups:
  - name: loki_quota_alerts
    rules:
      - alert: TenantNearIngestionQuota
        expr: |
          sum by (tenant) (rate(loki_distributor_bytes_received_total[5m]))
            / on (tenant) group_left
          label_replace(loki_overrides{limit_name="ingestion_rate_mb"} * 1048576, "tenant", "$1", "user", "(.*)")
          > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} approaching ingestion quota"
          description: "Tenant is using {{ $value | humanizePercentage }} of allocated ingestion quota"
```

Implementing Tiered Service Levels

Per-tenant quotas can be used to implement different service tiers:

```yaml
overrides:
  # Free tier
  tier-free:
    ingestion_rate_mb: 1
    max_query_parallelism: 2
    retention_period: 7d  # per-tenant retention requires compactor retention to be enabled

  # Standard tier
  tier-standard:
    ingestion_rate_mb: 10
    max_query_parallelism: 8
    retention_period: 30d

  # Enterprise tier
  tier-enterprise:
    ingestion_rate_mb: 50
    max_query_parallelism: 32
    retention_period: 90d
```

This approach allows you to offer differentiated service levels based on customer needs and pricing.
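
Since tier definitions tend to live in a billing system rather than in hand-edited YAML, it can help to generate the overrides file from code. A sketch using `gopkg.in/yaml.v3`; the tier values, tenant assignments, and file path are assumptions:

```go
package main

import (
	"os"

	"gopkg.in/yaml.v3"
)

// Limits mirrors the subset of Loki's per-tenant limits managed here.
type Limits struct {
	IngestionRateMB     int    `yaml:"ingestion_rate_mb"`
	MaxQueryParallelism int    `yaml:"max_query_parallelism"`
	RetentionPeriod     string `yaml:"retention_period"`
}

var tiers = map[string]Limits{
	"free":       {IngestionRateMB: 1, MaxQueryParallelism: 2, RetentionPeriod: "7d"},
	"standard":   {IngestionRateMB: 10, MaxQueryParallelism: 8, RetentionPeriod: "30d"},
	"enterprise": {IngestionRateMB: 50, MaxQueryParallelism: 32, RetentionPeriod: "90d"},
}

func main() {
	// Tenant -> tier assignments, e.g. pulled from a billing system.
	assignments := map[string]string{
		"tenant1": "enterprise",
		"tenant2": "free",
	}

	overrides := map[string]Limits{}
	for tenant, tier := range assignments {
		overrides[tenant] = tiers[tier]
	}

	out, err := yaml.Marshal(map[string]any{"overrides": overrides})
	if err != nil {
		panic(err)
	}
	// Loki picks the new file up within per_tenant_override_period.
	if err := os.WriteFile("/etc/loki/tenant-quotas.yaml", out, 0o644); err != nil {
		panic(err)
	}
}
```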

Common Pitfalls and Best Practices

When implementing per-tenant quotas, be aware of these common challenges:

Pitfalls to Avoid

  1. Setting quotas too low: Can cause legitimate traffic to be rejected
  2. Ignoring burst patterns: Many applications have periodic spikes in logging
  3. One-size-fits-all approach: Different tenants have different needs
  4. Lack of monitoring: Without visibility, quota issues go undetected

Best Practices

  1. Start with monitoring: Understand usage before setting limits
  2. Communicate clearly: Ensure tenants understand their limits
  3. Implement gradual enforcement: Warn before enforcing strictly
  4. Plan for exceptions: Have a process for temporary quota increases
  5. Regular reviews: Periodically review and adjust quotas

Summary

Per-tenant quotas are a critical component of multi-tenant Loki deployments, allowing you to:

  • Allocate resources fairly across tenants
  • Protect system stability and performance
  • Implement tiered service levels
  • Manage capacity planning effectively

By carefully configuring ingestion, query, and API quotas, you can ensure that your Loki deployment serves all tenants reliably while preventing resource monopolization. Remember to start with monitoring, set reasonable initial limits, and gradually refine your quota strategy as you gain more insights into tenant usage patterns.

Exercise: Building a Quota Management Strategy

  1. Monitor a multi-tenant Loki deployment for 1 week without quotas
  2. Identify the p95 and p99 usage patterns for each tenant
  3. Design appropriate quota levels for each tenant
  4. Implement the quotas with a 2-week warning period
  5. Monitor the effects after full enforcement
