Parallelism Settings
Introduction
Parallelism lets a distributed system like Grafana Loki execute operations simultaneously across multiple processors, cores, or machines. Properly configured parallelism settings can significantly improve performance for both log ingestion and query execution.
This guide explains how parallelism works in Loki, the key settings you need to understand, and how to optimize them for your specific deployment.
Understanding Parallelism in Loki
Loki's architecture is designed to scale horizontally, allowing it to handle massive volumes of log data by distributing work across multiple instances or processes. Parallelism settings control how this distribution happens.
Key Concepts
- Parallelism: The number of concurrent operations that can be performed
- Worker Pools: Groups of worker processes that handle specific tasks
- Query Sharding: Breaking down queries to be executed in parallel
- Ingesters: Components that receive, process, and store incoming logs
Important Parallelism Settings
Let's explore the most important parallelism settings in Loki:
1. Query Frontend Worker Configuration
The query frontend acts as a coordinator for incoming queries. It can distribute query load across multiple query workers.
```yaml
query_frontend:
  parallelism: 16
  querier_worker_concurrency: 8
```
What these settings do:
- `parallelism`: Controls how many splits a single query can be broken into
- `querier_worker_concurrency`: Determines how many queries a worker can process simultaneously
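To make the interaction concrete, here is the same block annotated with a worked example (the 16-hour query is hypothetical; the actual split count also depends on your split interval):

```yaml
query_frontend:
  parallelism: 16               # a 16h range query can fan out into 16 x 1h sub-queries
  querier_worker_concurrency: 8 # each worker keeps up to 8 sub-queries in flight at once
```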
2. Querier Parallelism
Queriers execute the actual log queries. Their performance can be tuned with these settings:
```yaml
querier:
  max_concurrent: 10
  worker_parallelism: 2
  frontend_worker_concurrency: 4
```
What these settings do:
- `max_concurrent`: Maximum number of concurrent queries per querier
- `worker_parallelism`: Number of worker processes within each querier
- `frontend_worker_concurrency`: Number of queries handled by a single worker
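Fleet-wide query capacity is the product of these per-instance limits and your replica count. A back-of-envelope sketch, assuming 5 querier replicas (the replica count is hypothetical):

```yaml
querier:
  max_concurrent: 10  # per replica
# 5 replicas x max_concurrent 10 = up to 50 sub-queries in flight fleet-wide
```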
3. Ingester Parallelism
Ingesters handle the incoming log data and are critical for ingestion performance:
```yaml
ingester:
  concurrent_flushes: 4
  flush_op_timeout: "2m"
  max_transfer_retries: 3
```
What these settings do:
- `concurrent_flushes`: Number of concurrent chunk flush operations
- `flush_op_timeout`: Maximum time allowed for a flush operation
- `max_transfer_retries`: Number of retries when transferring chunks between ingesters
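As a rough mental model of flush throughput (all numbers here are made up for illustration): if one flush writes a ~1.5 MB chunk and the object-store round trip takes ~250 ms, then:

```yaml
ingester:
  concurrent_flushes: 4
# 4 in-flight flushes / 0.25 s per flush = ~16 chunks/s
# ~16 chunks/s x ~1.5 MB per chunk      = ~24 MB/s flushed per ingester
```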
Practical Example: Tuning Query Performance
Let's look at a common scenario: you have a Loki deployment that's experiencing slow query performance, especially for queries spanning large time ranges.
Initial Configuration
```yaml
query_frontend:
  parallelism: 4
  querier_worker_concurrency: 2
querier:
  max_concurrent: 5
  worker_parallelism: 1
```
Problem Analysis
With these settings:
- Each query is split into at most 4 parts (`parallelism: 4`)
- Each worker processes at most 2 queries at a time (`querier_worker_concurrency: 2`)
- Each querier runs at most 5 queries simultaneously (`max_concurrent: 5`)
- There is no parallelism within workers (`worker_parallelism: 1`)
Optimized Configuration
```yaml
query_frontend:
  parallelism: 16
  querier_worker_concurrency: 8
querier:
  max_concurrent: 10
  worker_parallelism: 2
```
Performance Improvement
With this optimization:
- Queries can now be split into 16 parts, allowing better distribution
- Workers can handle up to 8 queries simultaneously
- Each querier can run up to 10 concurrent queries
- Each worker uses 2 processes for better CPU utilization
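A back-of-envelope comparison, assuming 3 querier replicas (the replica count is hypothetical):

```yaml
# Before: 3 queriers x max_concurrent 5  = 15 sub-queries in flight,
#         and a query split 4 ways uses at most 4 of them.
# After:  3 queriers x max_concurrent 10 = 30 sub-queries in flight,
#         and a query split 16 ways can use 16 of them at once,
#         so large range queries can finish roughly 4x sooner (I/O permitting).
```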
Real-World Example: Log Ingestion Scaling
Consider a scenario where your application suddenly starts generating 3x more logs:
Before: Limited Parallelism
```yaml
ingester:
  concurrent_flushes: 2
  max_chunk_age: "1h"
```
With these settings, ingesters might struggle to flush chunks to storage fast enough, causing:
- Memory pressure as chunks accumulate
- Potential data loss or delays
- Poor query performance
After: Improved Parallelism
```yaml
ingester:
  concurrent_flushes: 8
  max_chunk_age: "1h"
```
With increased parallel flush operations:
- Chunks are flushed to storage more rapidly
- Memory usage remains stable even under higher ingestion loads
- Query performance stays consistent
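Using the same made-up flush model from earlier (~1.5 MB chunks, ~250 ms per flush), this change roughly quadruples the per-ingester flush ceiling:

```yaml
# before: 2 / 0.25 s x 1.5 MB ≈ 12 MB/s per ingester
# after:  8 / 0.25 s x 1.5 MB ≈ 48 MB/s per ingester
# The higher ceiling absorbs a 3x ingestion spike that would
# previously have caused chunks to pile up in memory.
```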
Best Practices for Parallelism Tuning
- Start Conservative: Begin with lower parallelism values and increase gradually.
- Monitor Resource Usage: Watch CPU, memory, and network usage as you adjust settings.
- Consider Hardware Limitations:
  - CPU cores determine optimal worker parallelism
  - Memory limits how many concurrent operations are possible
  - Network bandwidth can become a bottleneck with high parallelism
- Testing Under Load: Simulate multiple concurrent queries with `logcli`, for example by launching several queries in parallel from a shell:

  ```bash
  for i in $(seq 1 20); do
    logcli query --addr=http://loki:3100 --since=2h '{app="frontend"}' &
  done
  wait
  ```

- Balanced Approach: Extremely high parallelism settings can cause resource contention. Find the sweet spot for your environment.
Advanced Configuration: Query Sharding
For large Loki deployments, query sharding provides additional parallelism by breaking queries into time-based shards:
```yaml
query_scheduler:
  max_outstanding_requests_per_tenant: 2048
query_frontend:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache_size_mb: 1024
```
This configuration:
- Allows more outstanding requests per tenant
- Aligns queries with step intervals for better caching
- Enables result caching to reduce duplicate work
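Time-based splitting itself is governed by the split interval; in recent Loki versions this lives under `limits_config` (a sketch; verify the key against your version's documentation):

```yaml
limits_config:
  split_queries_by_interval: 30m  # each 30m slice of a query's range becomes its own sub-query
```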
Troubleshooting Parallelism Issues
Problem: High CPU Usage with Little Throughput Gain
Symptoms:
- CPU usage is near 100%
- Increasing parallelism doesn't improve performance
Solution:
- Reduce parallelism to avoid context switching overhead
- Check for other bottlenecks (disk I/O, network)
```yaml
querier:
  worker_parallelism: 1  # Reduced from a higher value
  max_concurrent: 8      # Maintained for concurrent query handling
```
Problem: Memory Pressure During High Ingestion
Symptoms:
- OOM (Out of Memory) errors
- Slow query response during high ingestion
Solution:
- Adjust concurrent flushes and chunk settings
```yaml
ingester:
  concurrent_flushes: 6
  max_chunk_age: "30m"        # Reduced from 1h to flush more frequently
  chunk_target_size: 1572864  # Target smaller chunks (1.5 MB)
```
Summary
Parallelism settings in Grafana Loki provide powerful tools for scaling and performance optimization. Key takeaways:
- Query parallelism controls how queries are split and processed
- Ingester parallelism affects how efficiently logs are received and stored
- Proper configuration depends on your hardware resources and workload patterns
- Start with conservative settings and adjust based on monitoring
- Balance parallelism to avoid resource contention
By properly tuning parallelism settings, you can significantly improve both ingestion and query performance in your Loki deployment, allowing it to handle larger volumes of logs while maintaining responsive query capabilities.
Exercises
- Basic Configuration: Create a Loki configuration file with parallelism settings optimized for a system with 8 CPU cores and 16 GB of RAM (a starter sketch follows this list).
- Performance Testing: Use `logcli` to test query performance with different parallelism settings and document the results.
- Scaling Exercise: Design parallelism settings for a Loki deployment that needs to handle:
  - 100 GB of logs per day
  - A peak of 20 concurrent users running queries
  - 5 querier instances
  - 3 ingester instances
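For the first exercise, here is one possible starting point (illustrative values, not tested recommendations; tune from monitoring as described in the best practices above):

```yaml
query_frontend:
  parallelism: 8                # no more splits than CPU cores to start
  querier_worker_concurrency: 4
querier:
  max_concurrent: 8             # roughly one in-flight query per core
  worker_parallelism: 1
ingester:
  concurrent_flushes: 4         # leave headroom for query work
```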