Compaction Strategies
Introduction
Compaction is a critical process in log storage systems like Grafana Loki that helps optimize storage usage, improve query performance, and reduce operational costs. In this guide, we'll explore what compaction is, why it's necessary, and the different strategies you can implement to maintain an efficient Loki deployment.
Log data tends to accumulate rapidly in monitoring systems. Without proper management, this can lead to storage inefficiency, slower queries, and higher costs. Compaction addresses these challenges by reorganizing and optimizing how log data is stored.
What is Compaction?
Compaction is the process of merging multiple smaller files into larger ones, eliminating duplicate data, and organizing the data in a way that improves read efficiency. In Loki, compaction specifically refers to the consolidation of chunks and indexes to optimize storage and retrieval.
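As a mental model, you can think of compaction as merging several small, sorted files into one larger file while dropping duplicates. Here is a toy sketch of that idea; it is an illustration only, not how Loki's compactor is actually implemented (Loki operates on its own chunk and index formats):

```javascript
// Toy model of compaction: merge several small "files" of timestamped
// entries into one larger file, dropping exact duplicates.
function compact(files) {
  const seen = new Set();
  return files
    .flat()                                    // combine all small files
    .sort((a, b) => a.ts - b.ts)               // keep entries in time order
    .filter((entry) => {
      const key = `${entry.ts}:${entry.line}`; // dedupe identical entries
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
}

// Three small files become one compacted file of four unique entries
const compacted = compact([
  [{ ts: 1, line: "a" }, { ts: 3, line: "c" }],
  [{ ts: 2, line: "b" }, { ts: 3, line: "c" }], // duplicate of ts=3 "c"
  [{ ts: 4, line: "d" }],
]);
console.log(compacted.length); // 4
```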
Why Compaction Matters
Compaction provides several important benefits for a Loki deployment:
- Reduced Storage Costs: By eliminating duplicates and optimizing storage formats, compaction reduces the overall storage footprint.
- Improved Query Performance: Fewer, well-organized files mean faster query execution.
- Better Resource Utilization: Compaction reduces the overhead of managing many small files.
- Optimized Retention Policies: Makes it easier to implement and enforce data retention rules.
Compaction Strategies in Loki
1. Time-Based Compaction
Time-based compaction organizes and merges chunks based on time boundaries. This is the default strategy in Loki and works well for most deployments.
Configuration Example
```yaml
compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
```
In this configuration:
- `compaction_interval` determines how frequently the compaction process runs
- `retention_enabled` activates the retention policies alongside compaction
- `retention_delete_delay` sets a safety buffer before data marked for deletion is actually removed
2. Size-Based Compaction
Size-based strategies cut and flush chunks when they reach a size threshold. In Loki these knobs live in the ingester configuration rather than on the compactor, but they directly shape the chunks the compactor later consolidates, and tuning them helps maintain consistent performance regardless of ingestion rate.
Configuration Example
```yaml
ingester:
  max_chunk_age: 1h
  chunk_target_size: 1536000
  chunk_idle_period: 30m
```
Here:
- `max_chunk_age` defines the maximum time a chunk can stay open before it is flushed
- `chunk_target_size` specifies the target size in bytes at which a chunk is cut and flushed
- `chunk_idle_period` sets how long a chunk can sit idle (receiving no new entries) before it is closed and flushed
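To get a feel for how these settings interact, consider a single stream ingesting about 500 bytes of logs per second (an assumed rate, purely for illustration): it would take roughly 51 minutes to fill a 1,536,000-byte chunk, so `max_chunk_age` and `chunk_idle_period` often trigger the flush first for quieter streams. A quick sketch of the arithmetic:

```javascript
// Rough estimate of how long a single stream takes to fill a chunk
// (the ingestion rate is an assumed figure for illustration)
const chunkTargetBytes = 1536000; // chunk_target_size
const bytesPerSecond = 500;       // assumed per-stream ingestion rate

const secondsToFill = chunkTargetBytes / bytesPerSecond;
console.log(`~${(secondsToFill / 60).toFixed(1)} minutes to fill`); // ~51.2 minutes

// With max_chunk_age: 1h and chunk_idle_period: 30m, a quiet stream
// is usually flushed by age or idleness before hitting the target size
```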
3. Hybrid Compaction
Many production environments benefit from a hybrid approach that considers both time and size factors. This provides the best balance between storage efficiency and query performance.
Configuration Example
```yaml
compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true

ingester:
  max_chunk_age: 2h
  chunk_target_size: 1536000
  chunk_idle_period: 45m

schema_config:
  configs:
    - from: 2022-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v12
      index:
        prefix: loki_index_
        period: 24h
```
This configuration combines time-based compaction settings with size thresholds for a more balanced approach.
Implementing a Compaction Strategy
Let's walk through the process of implementing a compaction strategy for a medium-sized Loki deployment:
Step 1: Assess Your Log Patterns
Before choosing a compaction strategy, understand your log patterns:
```javascript
// Example log volume analysis ('client' is an assumed Prometheus-style
// query client; adapt these calls to your metrics stack)

// Per-tenant ingestion rate over the last hour
const hourlyVolumes = await client.query(
  'sum(rate(loki_distributor_bytes_received_total[1h])) by (tenant)'
);

// Ingestion sampled across a day to find peak hours ("hour" is not a
// metric label, so use a range query with a 1h step instead)
const dailyPatterns = await client.rangeQuery(
  'sum(rate(loki_distributor_bytes_received_total[5m]))',
  { start: Date.now() - 24 * 3600 * 1000, end: Date.now(), step: '1h' }
);
```
Step 2: Configure Basic Compaction
Start with a basic configuration that you can adjust based on observation:
```yaml
compactor:
  working_directory: /loki/compactor
  compaction_interval: 15m
  retention_enabled: true
  retention_delete_delay: 3h

ingester:
  max_chunk_age: 2h
  chunk_target_size: 1048576 # 1MB
  chunk_idle_period: 30m
```
Step 3: Monitor and Adjust
Implement metrics to track compaction effectiveness:
```javascript
// Example dashboard queries to monitor compaction ('client' as above;
// metric names vary by Loki version -- check your /metrics endpoint)

// Average compaction duration over the last 5 minutes
const compactionDuration = await client.query(
  'sum(rate(loki_compactor_duration_seconds_sum[5m])) / ' +
  'sum(rate(loki_compactor_duration_seconds_count[5m]))'
);

// Bytes compacted over the last hour
const compactedBytes = await client.query(
  'sum(increase(loki_compactor_compacted_bytes_total[1h]))'
);
```
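Beyond dashboards, it's worth alerting when compaction runs fail outright. A minimal Prometheus alerting rule sketch, assuming your Loki version exposes the `loki_compactor_runs_failed_total` counter (check your /metrics endpoint for the exact name):

```yaml
groups:
  - name: loki-compactor
    rules:
      - alert: LokiCompactionFailing
        # Fires if any compaction run has failed in the last hour
        expr: increase(loki_compactor_runs_failed_total[1h]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki compactor runs are failing"
```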
Step 4: Iterate on Your Strategy
Based on the metrics, adjust your configuration:
```yaml
# Improved configuration after monitoring
compactor:
  working_directory: /loki/compactor
  compaction_interval: 20m # Adjusted based on performance
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 100 # Adjusted based on delete queue

ingester:
  max_chunk_age: 3h # Increased so chunks are flushed less often
  chunk_target_size: 2097152 # Increased to 2MB for better performance
  chunk_idle_period: 45m # Increased based on log patterns
```
Real-World Example: E-Commerce Application
Let's consider a real-world example of implementing compaction strategies for an e-commerce application:
Scenario
- High traffic during business hours
- Periodic sales events create traffic spikes
- Different log volumes for different services
- Cost sensitivity for storage
Solution: Custom Compaction Strategy
```yaml
compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: s3
  compaction_interval: 30m
  # Retention settings sized for high-volume periods
  retention_enabled: true
  retention_delete_delay: 4h
  retention_delete_worker_count: 200

schema_config:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: s3
      schema: v12
      index:
        prefix: index_
        period: 24h # Daily index rotation for easier compaction

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
    cache_ttl: 24h # How long downloaded index files stay cached locally
    shared_store: s3
```
Results
- 42% reduction in storage costs
- 65% improvement in query performance for 7-day historical queries
- More consistent performance during sales events
Common Compaction Issues and Troubleshooting
Issue 1: Compaction Isn't Happening
If you notice compaction isn't occurring:
```bash
# Check compactor logs
kubectl logs -l app=loki -c compactor | grep "compaction"

# Verify compactor metrics
curl -s http://loki:3100/metrics | grep compactor
```
Fix: Ensure the compactor component is properly configured, is running as a single instance (Loki expects exactly one compactor, especially when retention is enabled), and has access to the storage backend.
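In microservices or Helm deployments you can check this directly; a quick sketch (pod label names depend on your chart):

```bash
# Check that a single compactor pod exists (label names vary by chart)
kubectl get pods -l app=loki,component=compactor

# Confirm compaction runs are starting and completing
curl -s http://loki:3100/metrics | grep loki_compactor_runs
```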
Issue 2: High Disk Usage Despite Compaction
If storage usage remains high despite compaction:
```yaml
# Adjust retention settings
compactor:
  retention_enabled: true
  retention_delete_delay: 1h # Reduce delay
  retention_delete_worker_count: 200 # Increase workers

# Add a specific retention period
limits_config:
  retention_period: 168h # 7 days retention
```
Issue 3: Slow Queries After Compaction
If queries become slow after compaction:
```yaml
# Optimize chunk cache and query settings
chunk_store_config:
  max_look_back_period: 168h

query_range:
  split_queries_by_interval: 12h # Split long queries

limits_config:
  max_query_parallelism: 16 # Increase parallelism
```
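To verify whether these adjustments actually help, watch query latency over time. For example, with the same Prometheus-style client used earlier (this assumes Loki's `loki_request_duration_seconds` histogram; the exact `route` label values vary by version):

```javascript
// 95th percentile latency for Loki range queries over the last 5 minutes
const p95QueryLatency = await client.query(
  'histogram_quantile(0.95, sum by (le) ' +
  '(rate(loki_request_duration_seconds_bucket{route=~".*query_range.*"}[5m])))'
);
```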
Advanced Compaction Techniques
1. Multi-Tenant Compaction Strategies
For environments with multiple tenants or teams:
```yaml
limits_config:
  per_tenant_override_config: /etc/loki/overrides.yaml
```

```yaml
# In overrides.yaml
overrides:
  tenant1:
    retention_period: 336h # 14 days for tenant1
    chunk_target_size: 2097152
  tenant2:
    retention_period: 72h # 3 days for tenant2
    chunk_target_size: 1048576
```
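The tenant names in overrides.yaml correspond to the X-Scope-OrgID header that clients send when pushing or querying logs. For example:

```bash
# Requests are attributed to a tenant via the X-Scope-OrgID header,
# which is what the override keys above match against
curl -G -s -H "X-Scope-OrgID: tenant1" \
  "http://loki:3100/loki/api/v1/query" \
  --data-urlencode 'query=sum(count_over_time({job=~".+"}[1h]))'
```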
2. Time-Shifting Compaction
For environments with cyclical load patterns, you may want compaction to run more aggressively during quiet hours. Loki's compactor does not support dynamic intervals or cron-style schedules natively, so treat the following as a conceptual sketch built on hypothetical configuration keys; in practice you would approximate it by tuning compaction_interval or by driving the compactor from an external scheduler:

```yaml
# Conceptual sketch only: the 'dynamic://' interval, compaction_window,
# and the schedule block below are hypothetical, not real Loki options
compactor:
  working_directory: /loki/compactor
  compaction_interval: dynamic://15m,30m,1h
  compaction_window: 1h
  # Concentrate heavy work in quiet windows, run lighter during the day
  schedule:
    - cron: "0 2-5 * * *" # Run between 2-5 AM
      parallelism: 10
    - cron: "0 14-16 * * *" # A lighter run between 2-4 PM
      parallelism: 5
```
Summary
Compaction strategies are essential for maintaining an efficient and cost-effective Grafana Loki deployment. By carefully selecting and tuning your compaction approach based on your specific log patterns and requirements, you can significantly improve query performance while reducing storage costs.
The key takeaways from this guide:
- Compaction merges smaller chunks into larger ones to optimize storage and improve query efficiency
- Different strategies (time-based, size-based, and hybrid) suit different use cases
- Regular monitoring and adjustment are crucial for maintaining optimal performance
- Real-world implementations should be tailored to your specific log patterns and requirements
Exercise: Design Your Compaction Strategy
As an exercise, design a compaction strategy for one of these scenarios:
- A high-volume microservices architecture with 50+ services
- A security monitoring system that needs to retain logs for compliance
- A low-volume IoT application with occasional traffic spikes
For each scenario, define:
- Compaction interval
- Chunk size and age settings
- Retention policies
- Monitoring approach
Through proper compaction strategies, you'll be able to build a more efficient, performant, and cost-effective logging system with Grafana Loki.