
Horizontal Scaling

Introduction

Horizontal scaling is a critical approach when deploying Grafana Loki in production environments that need to handle large volumes of logs. Unlike vertical scaling (adding more resources to a single server), horizontal scaling involves adding more instances of components to distribute the workload across multiple servers.

In this guide, you'll learn how to horizontally scale Grafana Loki to handle increased log volumes while maintaining performance and reliability. This is particularly important when your logging needs grow beyond what a single instance can handle efficiently.

Understanding Loki's Components

Before diving into scaling strategies, let's understand Loki's microservice architecture, which is designed to be horizontally scalable from the ground up.

The main components that can be horizontally scaled include (see the sketch after this list for how each one is selected):

  • Distributors: Handle incoming log data and distribute it to ingesters
  • Ingesters: Process and store log data temporarily before flushing to long-term storage
  • Query Frontend: Splits and schedules queries across multiple queriers
  • Queriers: Execute queries against both ingesters and storage
  • Compactor: Compacts index files and applies retention; unlike the components above, it is normally run as a single instance rather than scaled out
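
Each of these is the same Loki binary started with a different `-target` value, which is what makes independent scaling possible. The fragment below is illustrative only (the config file path is an assumption); it simply shows how the target flag selects each component:

yaml
# Illustrative only: every component is the same grafana/loki binary,
# selected with a different -target flag. The config path is an assumption.
distributor:
  args: ["-target=distributor", "-config.file=/etc/loki/config.yaml"]
ingester:
  args: ["-target=ingester", "-config.file=/etc/loki/config.yaml"]
query-frontend:
  args: ["-target=query-frontend", "-config.file=/etc/loki/config.yaml"]
querier:
  args: ["-target=querier", "-config.file=/etc/loki/config.yaml"]
compactor:
  args: ["-target=compactor", "-config.file=/etc/loki/config.yaml"]

In Kubernetes, each of these args lists typically goes into its own Deployment (or a StatefulSet for ingesters), as the distributor example later in this guide shows.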

When to Scale Horizontally

You should consider horizontally scaling your Loki deployment when:

  1. Increased log volume: Your applications are generating more logs than your current setup can handle
  2. Query performance degradation: Queries are taking longer to complete
  3. High resource utilization: Your current instances are consistently at high CPU/memory utilization
  4. Need for higher availability: You want to improve fault tolerance and eliminate single points of failure

Scaling with Kubernetes

Kubernetes is the most common platform for deploying scaled Loki instances. Let's look at how you can configure horizontal scaling with Kubernetes.

Basic Kubernetes Deployment Example

Here's a simplified example of a Kubernetes deployment for Loki components:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-distributor
spec:
  replicas: 3  # Start with 3 replicas
  selector:
    matchLabels:
      app: loki
      component: distributor
  template:
    metadata:
      labels:
        app: loki
        component: distributor
    spec:
      containers:
        - name: distributor
          image: grafana/loki:2.8.0
          args:
            - "-target=distributor"
            - "-config.file=/etc/loki/config.yaml"
          ports:
            - containerPort: 3100
              name: http
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1
              memory: 1Gi

Using Horizontal Pod Autoscaler (HPA)

For automatic scaling based on metrics, you can use Kubernetes' HPA:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loki-distributor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loki-distributor
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This configuration automatically scales the distributor deployment between 2 and 10 replicas, targeting an average CPU utilization of 70%.
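
If scale-down flapping becomes a problem, the same HPA can also carry a behavior section under spec (part of the autoscaling/v2 API). The values below are only a starting-point sketch, not tuned recommendations:

yaml
# Optional fragment to add under spec of the HPA above.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 1                      # remove at most one pod per period
        periodSeconds: 60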

Configuring Loki for Horizontal Scaling

Scaling Loki horizontally also depends on its own configuration. Here's a sample focusing on the key parameters for horizontal scaling:

yaml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 3  # Number of replicas for each log stream
  chunk_idle_period: 30m
  max_transfer_retries: 0
  wal:
    enabled: true
    dir: /loki/wal

memberlist:
  join_members:
    - loki-memberlist

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_global_streams_per_user: 10000

schema_config:
  configs:
    - from: 2020-07-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: s3
  aws:
    s3: s3://access_key:secret_key@region/bucket_name

Key configuration aspects for scaling:

  1. Replication Factor: Higher values (e.g., replication_factor: 3) increase reliability but require more resources
  2. Ring Configuration: Using memberlist for service discovery helps with dynamic scaling (a sketch of the Service backing the join address follows this list)
  3. Shared Storage: All components must use the same backend storage (S3 in this example)
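
The memberlist block above joins via the DNS name loki-memberlist, so something has to resolve that name to all Loki pods. Below is a minimal sketch of such a headless Service; it assumes your pods carry the app: loki label and use Loki's default memberlist port 7946:

yaml
apiVersion: v1
kind: Service
metadata:
  name: loki-memberlist
spec:
  clusterIP: None          # headless: DNS returns every pod IP for gossip discovery
  selector:
    app: loki              # assumption: all Loki pods share this label
  ports:
    - name: memberlist
      port: 7946
      targetPort: 7946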

Best Practices for Horizontal Scaling

1. Start with Component-Level Planning

Scale each component separately based on its specific resource needs (a sketch of related querier concurrency settings follows this list):

  • Distributors: CPU-bound; scale based on incoming log volume
  • Ingesters: Memory-bound; scale based on the number of active streams and chunk storage
  • Queriers: CPU-bound for query processing; scale based on query load
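
Adding queriers only helps if the query frontend can hand work to them, so review the querier-side settings at the same time. The snippet below is a hedged sketch: the values are illustrative, and the frontend address assumes a Service named loki-query-frontend exposing Loki's default gRPC port 9095:

yaml
querier:
  max_concurrent: 4                            # queries each querier runs in parallel

frontend_worker:
  frontend_address: loki-query-frontend:9095   # assumption: Service name and gRPC port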

2. Resource Allocation

yaml
resources:
  requests:
    cpu: 2
    memory: 10Gi
  limits:
    cpu: 4
    memory: 16Gi

Start with conservative estimates and adjust based on monitoring data.

3. Implement Proper Monitoring

Monitor key metrics for each component to identify scaling needs (a sample Prometheus rules sketch follows this list):

  • Distributors: Request rate, request latency, queue length
  • Ingesters: Memory usage, active series, flush rate, WAL length
  • Queriers: Query rate, query latency, query queue length
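
Loki exposes these signals in Prometheus format. The rules file below is a sketch of how you might track a couple of them; the metric names follow Loki 2.x conventions and may differ between versions:

yaml
groups:
  - name: loki-scaling-signals
    rules:
      # Ingestion rate across all distributors (bytes per second)
      - record: loki:distributor_bytes_received:rate5m
        expr: sum(rate(loki_distributor_bytes_received_total[5m]))
      # Total number of active streams held in ingester memory
      - record: loki:ingester_memory_streams:sum
        expr: sum(loki_ingester_memory_streams)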

4. Use Affinity Rules for Distribution

For high availability, distribute components across different nodes:

yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - loki
            - key: component
              operator: In
              values:
                - ingester
        topologyKey: "kubernetes.io/hostname"

5. Implement Load Balancing

For ingestion traffic, set up a load balancer in front of distributors:

yaml
apiVersion: v1
kind: Service
metadata:
  name: loki-distributor
spec:
  selector:
    app: loki
    component: distributor
  ports:
    - port: 3100
      targetPort: 3100
  type: LoadBalancer

Real-World Example: Scaling for 100GB/day Log Volume

Let's look at a practical example for an environment processing approximately 100GB of logs per day:

Initial Setup:

  • 3 distributors (2 vCPU, 4GB RAM each)
  • 3 ingesters (4 vCPU, 16GB RAM each)
  • 2 query frontends (2 vCPU, 4GB RAM each)
  • 4 queriers (4 vCPU, 8GB RAM each)
  • 1 compactor (2 vCPU, 8GB RAM)

Scaling Decision Points:

  1. When ingestion latency exceeds 500ms: Add distributor replicas
  2. When ingester memory utilization exceeds 70%: Add ingester replicas
  3. When query latency exceeds 3 seconds: Add querier replicas (see the alert sketch below)
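
These thresholds translate naturally into Prometheus alerts. The rule below is a sketch for the third decision point only; the route label values are an assumption and may need adjusting for your Loki version:

yaml
groups:
  - name: loki-scaling-alerts
    rules:
      - alert: LokiQueryLatencyHigh
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m]))
          ) > 3
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 query latency above 3s; consider adding querier replicas"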

Configuration for High Availability:

yaml
distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 3
    final_sleep: 0s
  chunk_idle_period: 30m
  chunk_retain_period: 1m
  wal:
    enabled: true
    dir: /loki/wal

Troubleshooting Scaling Issues

Common issues and solutions when scaling Loki:

| Issue | Symptoms | Solution |
| --- | --- | --- |
| Ingesters OOMing | Ingesters crashing with out-of-memory errors | Increase memory limits or decrease max_chunk_age |
| High query latency | Slow query response times | Add more queriers or optimize queries by adding more specific label filters |
| Replication lag | Inconsistent query results | Ensure sufficient network bandwidth between ingesters |
| Compaction bottlenecks | Increasing storage usage | Scale the compactor or adjust the compaction interval |
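
For the compaction row, the relevant knobs live in the compactor block. A minimal sketch, assuming the boltdb-shipper/S3 setup used earlier in this guide:

yaml
compactor:
  working_directory: /loki/compactor   # local scratch space for compaction
  shared_store: s3                     # must match the object store used by the ingesters
  compaction_interval: 10m             # how often compaction runs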

Alternative: Using Grafana Enterprise Logs (GEL)

For organizations with enterprise requirements, Grafana Labs offers Grafana Enterprise Logs (GEL), the commercial distribution of Loki, which provides:

  • Autoscaling capabilities
  • Enhanced monitoring
  • Support for multi-tenancy
  • Simplified operations

Summary

Horizontal scaling is essential for running Loki in production environments with high log volumes. The key points to remember:

  1. Scale individual components based on their specific resource requirements
  2. Configure proper replication for reliability
  3. Use shared object storage like S3 or GCS
  4. Implement comprehensive monitoring to identify scaling needs
  5. Use Kubernetes and HPA for automated scaling
  6. Plan for gradual scaling as your log volume increases

By following these best practices, you can build a Loki deployment that scales efficiently with your growing logging needs while maintaining performance and reliability.

Exercises

  1. Deploy a basic horizontally scaled Loki setup with 2 distributors and 2 ingesters using the provided YAML templates.
  2. Configure Horizontal Pod Autoscaler for your Loki components and test with varying load.
  3. Monitor key metrics and identify which component needs scaling first in your environment.
  4. Calculate the appropriate replication factor for your availability requirements.
