Benchmarking Tools
Introduction
Benchmarking is a critical practice when working with Grafana Loki to ensure your logging system performs optimally under various workloads. In this guide, we'll explore the benchmarking tools available for Grafana Loki, understand how to use them effectively, and interpret their results to make informed optimization decisions.
Benchmarking allows you to:
- Establish performance baselines
- Identify bottlenecks before they affect production
- Compare different configuration options
- Plan capacity for future growth
- Validate performance improvements after tuning
Loki's Built-in Benchmarking Tools
Grafana Loki comes with several built-in tools to help you measure various aspects of its performance. Let's explore the most important ones.
Using loki-bench
The loki-bench tool is specifically designed to benchmark Loki's write and query performance. It simulates log ingestion and querying at scale to help you understand how your configuration performs under load.
Installation
To install loki-bench:
go install github.com/grafana/loki/cmd/loki-bench@latest
Basic Usage
Here's a simple example to benchmark write performance:
loki-bench write --addr=http://localhost:3100 --batches=100 --batch-size=10 --batch-wait=1s
This command will:
- Send 100 batches of logs to your Loki instance
- Each batch contains 10 log entries
- Wait 1 second between batches
The output will look similar to:
Write benchmark results:
Total logs sent: 1000
Total duration: 2m15.4s
Average throughput: 7.4 logs/second
Success rate: 99.8%
Errors: 2
For query benchmarking:
loki-bench query --addr=http://localhost:3100 --query='{app="test"}' --iterations=50 --query-wait=2s
This will:
- Run the query {app="test"} 50 times
- Wait 2 seconds between queries
- Report on query performance
Advanced Configuration
For more complex benchmarking scenarios, you can use a configuration file:
loki-bench --config=bench-config.yaml
Example bench-config.yaml:
write:
  addr: http://localhost:3100
  tenants: 5
  batches: 200
  batch-size: 20
  streams: 100
  labels: '{app="benchmark", env="production", level="info"}'
  workers: 10
query:
  addr: http://localhost:3100
  queries:
    - query: '{app="benchmark"} |= "error"'
      range: 1h
    - query: 'rate({app="benchmark"}[5m])'
      range: 6h
  iterations: 25
Using logcli bench
The logcli tool, which is Loki's command-line client, also includes a benchmarking capability specifically focused on query performance.
Installation
go install github.com/grafana/loki/cmd/logcli@latest
Basic Usage
logcli bench --addr=http://localhost:3100 --query='{job="varlogs"}' --limit=1000 --iterations=10
This command will:
- Connect to your Loki instance
- Run the specified query 10 times
- Return up to 1000 log lines per query
- Calculate average query time and other performance metrics
Sample output:
Benchmarking query: {job="varlogs"}
Iterations: 10
Average query time: 0.567 seconds
Min query time: 0.432 seconds
Max query time: 0.789 seconds
95th percentile: 0.765 seconds
Query throughput: 1.76 queries per second
Third-Party Benchmarking Tools
In addition to Loki's built-in tools, several third-party solutions can be used to benchmark different aspects of Loki's performance.
k6 for HTTP Endpoint Testing
k6 is an open-source load testing tool that's particularly effective for benchmarking Loki's HTTP API endpoints.
Sample k6 Script for Loki
import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  vus: 10, // 10 virtual users
  duration: '30s',
};

export default function () {
  // Benchmark pushing logs
  const pushUrl = 'http://localhost:3100/loki/api/v1/push';
  const payload = JSON.stringify({
    streams: [
      {
        stream: {
          app: 'benchmark-app',
          environment: 'production',
        },
        values: [
          [`${Date.now() * 1000000}`, 'This is a benchmark log line from k6'],
        ],
      },
    ],
  });
  const params = {
    headers: {
      'Content-Type': 'application/json',
    },
  };
  const pushRes = http.post(pushUrl, payload, params);
  check(pushRes, {
    'push status is 204': (r) => r.status === 204,
  });

  // Benchmark queries: query_range expects its parameters in the URL query string
  const queryUrl = 'http://localhost:3100/loki/api/v1/query_range';
  const query = encodeURIComponent('{app="benchmark-app"}');
  const start = (Date.now() - 3600000) * 1000000; // 1 hour ago in nanoseconds
  const end = Date.now() * 1000000; // now in nanoseconds
  const queryRes = http.get(`${queryUrl}?query=${query}&start=${start}&end=${end}&limit=100`);
  check(queryRes, {
    'query status is 200': (r) => r.status === 200,
    'query returns results': (r) => {
      const body = JSON.parse(r.body);
      return body.data && body.data.result && body.data.result.length > 0;
    },
  });

  sleep(1);
}
Run this with:
k6 run loki-benchmark.js
Prometheus for Long-term Performance Monitoring
While not a benchmarking tool per se, Prometheus is invaluable for monitoring Loki's performance metrics over time, which complements your benchmarking efforts.
Configure Prometheus to scrape Loki's metrics endpoint:
scrape_configs:
  - job_name: 'loki'
    static_configs:
      - targets: ['localhost:3100']
    metrics_path: '/metrics'
Key metrics to monitor:
- loki_distributor_bytes_received_total: Total bytes received per tenant
- loki_ingester_chunk_size_bytes: Size of chunks persisted to storage
- loki_ingester_chunk_utilization: Percentage of chunk space utilized
- loki_query_frontend_query_seconds_bucket: Query latency histogram
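As a quick way to check these during or after a benchmark run, you can query them through the Prometheus HTTP API. The snippet below is a minimal sketch, assuming Prometheus is reachable at localhost:9090 and that your Loki version exposes the metric and label names listed above (they can vary between releases):
# 95th percentile query latency over the last 5 minutes, built from the
# loki_query_frontend_query_seconds histogram listed above
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(loki_query_frontend_query_seconds_bucket[5m])) by (le))'

# Per-tenant ingest rate in bytes/second during a benchmark run
# (the tenant label name may differ in your Loki version)
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(loki_distributor_bytes_received_total[1m])) by (tenant)'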
Benchmarking Best Practices
For reliable benchmark results, follow these best practices:
1. Isolate Your Testing Environment
Ensure your benchmarking is performed in an environment that closely resembles production but is isolated from it:
docker run -d --name loki-benchmark -p 3100:3100 -v $(pwd)/loki-config.yaml:/etc/loki/local-config.yaml grafana/loki:latest
2. Create Realistic Test Data
Generate log data that resembles your actual workloads:
for i in {1..10000}; do
  timestamp=$(date +%s%N)
  echo "{\"streams\": [{\"stream\": {\"app\": \"test-app\", \"env\": \"prod\"}, \"values\": [[\"$timestamp\", \"User id-$RANDOM logged in from IP 192.168.$((RANDOM % 255)).$((RANDOM % 255))\"]]}]}" | \
    curl -H "Content-Type: application/json" -X POST -d @- http://localhost:3100/loki/api/v1/push
  sleep 0.01
done
3. Progressive Load Testing
Start with a small load and progressively increase it to find the breaking point:
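A simple shell loop works well for this. The sketch below reuses the loki-bench flags shown earlier and steps the batch size up between runs; adjust the values and add intermediate steps to match your environment:
# Step up the load until the success rate or latency starts to degrade,
# saving each run's output for later comparison
for batch_size in 10 50 100 500 1000; do
  echo "=== batch-size=$batch_size ==="
  loki-bench write --addr=http://localhost:3100 --batches=100 \
    --batch-size=$batch_size --workers=5 | tee results_batch_${batch_size}.txt
done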
4. Measure Multiple Dimensions
Don't focus only on throughput; measure:
- Latency (average, 95th, 99th percentiles)
- Error rates
- Resource usage (CPU, memory, disk I/O)
- Query performance
- Cardinality handling
5. Benchmark with Different Configurations
Compare performance across different Loki configurations:
# Run benchmark with default config
loki-bench write --addr=http://localhost:3100 --batches=100 --batch-size=100 > default_results.txt
# Update configuration and restart Loki
# ...
# Run identical benchmark with new config
loki-bench write --addr=http://localhost:3100 --batches=100 --batch-size=100 > tuned_results.txt
# Compare results
diff default_results.txt tuned_results.txt
Real-World Benchmarking Scenarios
Let's explore some practical benchmarking scenarios you might encounter when working with Grafana Loki.
Scenario 1: Sizing for Log Volume Growth
If you expect your log volume to grow significantly, you need to benchmark how your Loki deployment will handle the increased load:
# Benchmark current log volume (current_daily_logs is a placeholder for your measured daily log count)
current_daily_logs=1000000
loki-bench write --addr=http://localhost:3100 --batches=100 --batch-size=$((current_daily_logs / 1000)) --workers=5
# Benchmark projected log volume (e.g., 5x growth)
loki-bench write --addr=http://localhost:3100 --batches=100 --batch-size=$((current_daily_logs * 5 / 1000)) --workers=5
Scenario 2: Query Performance Tuning
If your users are experiencing slow queries, benchmark different query optimization techniques:
# Benchmark baseline query performance
logcli bench --addr=http://localhost:3100 --query='{app="frontend"} |= "error"' --iterations=20 --range=1h
# After implementing index improvements, chunk caching, etc.
logcli bench --addr=http://localhost:3100 --query='{app="frontend"} |= "error"' --iterations=20 --range=1h
Scenario 3: Comparing Storage Backends
Benchmark to compare the performance impact of different storage backends:
# Configure Loki with local filesystem backend
# Run benchmark
loki-bench write --addr=http://localhost:3100 --batches=200 --batch-size=50 > filesystem_results.txt
# Reconfigure Loki with S3 storage backend
# Run identical benchmark
loki-bench write --addr=http://localhost:3100 --batches=200 --batch-size=50 > s3_results.txt
# Compare results
diff filesystem_results.txt s3_results.txt
Interpreting Benchmark Results
Understanding your benchmark results is as important as running the benchmarks. Here's what to look for:
Throughput Analysis
Look at how many logs per second your system can handle:
Write throughput: 15,000 logs/second
Read throughput: 50 queries/second
Latency Breakdown
Pay close attention to latency percentiles, not just averages:
Average query time: 250ms
95th percentile: 450ms
99th percentile: 780ms
If there's a large gap between average and high percentiles, it indicates inconsistent performance that will affect user experience.
Resource Utilization
Monitor system resources during benchmarks:
# Use tools like top, htop, or prometheus metrics
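For example, if Loki is running in the loki-benchmark container started in the isolation step above, you can sample its CPU and memory while a benchmark runs. This is a rough sketch; adjust the container name, benchmark parameters, and sampling interval to your setup:
# Run a write benchmark in the background and sample the container's
# resource usage every 5 seconds until the benchmark finishes
loki-bench write --addr=http://localhost:3100 --batches=500 --batch-size=100 > bench_output.txt &
BENCH_PID=$!
while kill -0 "$BENCH_PID" 2>/dev/null; do
  docker stats --no-stream --format '{{.Name}}: CPU={{.CPUPerc}} MEM={{.MemUsage}}' loki-benchmark
  sleep 5
done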
Look for:
- CPU saturation
- Memory growth
- Disk I/O bottlenecks
- Network saturation
Identify Breaking Points
Document when and how your system begins to fail:
System performs well up to 20,000 logs/second
At 25,000 logs/second:
- Error rate increases to 5%
- Query latency increases by 3x
- Memory usage grows unbounded
Common Benchmarking Pitfalls
Be aware of these common mistakes when benchmarking Loki:
- Not warming up the system: Always run a warm-up phase before recording benchmark results (see the sketch after this list).
- Ignoring cardinality: Test with realistic label cardinality; low-cardinality benchmarks won't reveal real-world performance.
- Neglecting cleanup: In long-running benchmarks, ensure proper cleanup to avoid skewed results due to disk space issues.
- Single-dimension focus: Don't optimize only for ingest performance at the expense of query performance, or vice versa.
- Unrealistic data: Using synthetic data that doesn't match your production patterns can lead to misleading results.
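For the warm-up point in particular, a common pattern is to run the same benchmark twice and only record the second run. A minimal sketch using the loki-bench invocation from earlier:
# Warm-up pass: discard the output so caches and connections are primed
loki-bench write --addr=http://localhost:3100 --batches=50 --batch-size=100 > /dev/null
# Measured pass: record these results
loki-bench write --addr=http://localhost:3100 --batches=100 --batch-size=100 > measured_results.txt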
Summary
Benchmarking tools are essential for understanding, optimizing, and maintaining Grafana Loki performance. By following a systematic approach to benchmarking, you can:
- Establish reliable performance baselines
- Identify potential bottlenecks before they impact users
- Make data-driven decisions about configuration changes
- Plan effectively for future growth
- Validate the impact of your performance tuning efforts
Remember that benchmarking is not a one-time activity but an ongoing process that should be integrated into your development and operations workflows.
Exercises
- Set up a basic Loki instance and use loki-bench to establish baseline performance metrics.
- Create a custom k6 script that simulates your specific log patterns and query patterns.
- Compare Loki's performance with different chunk sizes and see how it affects both ingest and query performance.
- Benchmark how increasing label cardinality affects your Loki deployment.
- Create a Grafana dashboard that displays the key performance metrics from your Loki benchmarks.