Scaling Guidelines
Introduction
Grafana Loki is designed to be horizontally scalable, allowing you to start small and grow your deployment as your logging needs increase. Whether you're running Loki on a single machine or across a distributed Kubernetes cluster, understanding how to effectively scale your implementation is crucial for maintaining performance and reliability.
This guide covers essential scaling considerations and best practices to help you grow your Loki deployment efficiently. We'll explore component-specific scaling approaches, resource optimization techniques, and architectural patterns that enable Loki to handle increasing log volumes.
Scaling Fundamentals
Before diving into specific scaling strategies, let's understand the fundamental aspects that influence Loki's scalability.
Key Scaling Dimensions
Loki scales across several dimensions:
- Query load: The number and complexity of queries
- Ingest volume: The amount of log data being sent to Loki
- Retention period: How long data is stored
- Tenant count: Number of separate organizations/projects using the same Loki instance
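As an illustration of the retention dimension, retention in single-store Loki is driven by the compactor together with `limits_config`. The snippet below is a minimal sketch; exact option names and defaults vary between Loki versions, so check the configuration reference for your release:

```yaml
# Sketch: compactor-driven retention (available in Loki 2.3+).
compactor:
  working_directory: /data/loki/compactor
  retention_enabled: true      # actively delete chunks past the retention period
limits_config:
  retention_period: 744h       # keep roughly 31 days of logs globally
```

Longer retention periods increase object-storage footprint and the index range queries must cover, so retention interacts directly with the query-load dimension above.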
Monolithic vs. Microservices Deployment
Loki supports two primary deployment modes:
- Monolithic mode: All Loki components run in a single process
- Microservices mode: Components are separated and can be scaled independently
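The mode is selected with the `target` setting (also available as the `-target` command-line flag). A sketch:

```yaml
# Monolithic mode: run every Loki component in a single process
target: all

# In microservices mode, each process instead runs exactly one component, e.g.:
# target: distributor
# target: ingester
# target: querier
```

All processes share the same configuration file; only the `target` differs per deployment.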
Scaling for Different Deployment Sizes
Let's examine scaling guidelines based on deployment size.
Small Deployments (Up to 100GB/day)
For small environments, a monolithic deployment is typically sufficient:
```yaml
loki:
  config: |
    auth_enabled: false
    server:
      http_listen_port: 3100
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
      chunk_idle_period: 15m
      chunk_retain_period: 30s
      max_transfer_retries: 0
    schema_config:
      configs:
        - from: 2020-05-15
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/index
        cache_location: /data/loki/cache
        cache_ttl: 24h
        shared_store: filesystem
      filesystem:
        directory: /data/loki/chunks
    limits_config:
      ingestion_rate_mb: 10
      ingestion_burst_size_mb: 20
      max_global_streams_per_user: 5000
```
Resource Guidelines:
- CPU: 2-4 cores
- Memory: 4-8GB
- Storage: SSD for index data
- Network: 1Gbps
Medium Deployments (100GB-1TB/day)
For medium-sized deployments, consider moving to microservices mode:
```yaml
# distributor configuration
distributor:
  replicas: 2
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 500Mi

# ingester configuration
ingester:
  replicas: 3
  resources:
    limits:
      cpu: 2
      memory: 8Gi
    requests:
      cpu: 1
      memory: 4Gi

# querier configuration
querier:
  replicas: 2
  resources:
    limits:
      cpu: 2
      memory: 4Gi
    requests:
      cpu: 1
      memory: 2Gi
```
Key Considerations:
- Use replicated ingesters (replication_factor: 2-3)
- Implement separate object storage (S3, GCS, etc.)
- Add query frontends with query caching
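For the object-storage and caching points above, a hedged sketch of the relevant Loki config is shown below. The bucket name and memcached host are placeholders; verify the option layout against your Loki version's configuration reference:

```yaml
storage_config:
  boltdb_shipper:
    shared_store: s3
  aws:
    s3: s3://us-east-1/my-loki-chunks   # placeholder region/bucket

query_range:
  cache_results: true                   # cache query-range results in the frontend
  results_cache:
    cache:
      memcached_client:
        host: memcached.loki.svc        # placeholder memcached service name
        service: memcached
```

With results caching enabled, the query frontend also splits long time-range queries into smaller intervals, so repeated dashboard queries hit the cache instead of the queriers.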
Large Deployments (1TB+/day)
For enterprise-scale deployments:
```yaml
# Additional specialized microservice components
queryFrontend:
  replicas: 3
compactor:
  replicas: 2
indexGateway:
  replicas: 3
ruler:
  replicas: 2
```
Advanced Scaling Techniques:
- Implement tenant isolation with per-tenant resource limits
- Separate the read and write paths so ingest and query capacity scale independently
- Add index caching layers (for example, memcached in front of the index gateway)
- Evaluate newer index stores such as TSDB; the single-store approach (boltdb-shipper and its successors) replaces the legacy Cortex-style chunks storage
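Per-tenant isolation is typically expressed through a runtime overrides file. The tenant IDs below are hypothetical, and the file path is an assumption for illustration:

```yaml
# Main Loki config: point runtime_config at the overrides file
# runtime_config:
#   file: /etc/loki/overrides.yaml

# overrides.yaml — per-tenant limits that override limits_config defaults
overrides:
  tenant-a:                          # hypothetical high-volume tenant
    ingestion_rate_mb: 20
    max_global_streams_per_user: 10000
  tenant-b:                          # hypothetical low-volume tenant
    ingestion_rate_mb: 5
```

Loki reloads this file periodically, so per-tenant limits can be adjusted without restarting any component.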