Common Issues in Grafana Loki
Introduction
Grafana Loki is a powerful log aggregation system designed to be cost-effective and easy to operate. However, like any complex system, users often encounter various challenges during setup, configuration, and operation. This guide covers the most common issues you might face when working with Grafana Loki and provides detailed solutions to help you troubleshoot effectively.
Connection Issues
Cannot Connect to Loki
One of the most basic issues is the inability to connect to your Loki instance.
Symptoms
- "Data source is not working" error in Grafana
- Empty query results despite logs being present
- Connection timeout errors
Troubleshooting Steps
- Check Loki Service Status:

  ```bash
  # For Docker installations
  docker ps | grep loki

  # For Kubernetes
  kubectl get pods -n <namespace> | grep loki
  ```

- Verify Network Access:

  ```bash
  # Test connection using curl
  curl -v http://loki:3100/ready

  # Or for Kubernetes
  kubectl port-forward -n <namespace> svc/loki 3100:3100
  curl -v http://localhost:3100/ready
  ```

- Check Grafana Data Source Configuration: Ensure Grafana is configured with the correct URL for Loki (see the provisioning sketch after this list). Common mistakes include:
  - Using the wrong protocol (http vs. https)
  - Missing port number
  - Incorrect service name in Kubernetes environments

- Authentication Issues: If you have configured authentication, verify that the credentials in the Grafana data source configuration are correct.
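If you provision the Loki data source from a file, the URL, protocol, and credentials are easy to review in one place. Below is a minimal sketch of such a provisioning file; the file path, the `loki` service name, the user, and the `LOKI_PASSWORD` environment variable are placeholders for your own values.

```yaml
# e.g. /etc/grafana/provisioning/datasources/loki.yaml (path is an example)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100          # protocol, host, and port must all match your deployment
    basicAuth: true                # only if Loki sits behind an authenticating proxy
    basicAuthUser: grafana
    secureJsonData:
      basicAuthPassword: ${LOKI_PASSWORD}   # injected from the environment at startup
```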
Query Problems
No Logs Found
Sometimes Loki appears to be connected properly, but queries return no results.
Symptoms
- "No logs found" message
- Empty query results despite knowing logs exist
Troubleshooting Steps
- Check Time Range: A common mistake is querying for logs outside your selected time range. Try:
  - Extending your time range
  - Using the "Last 24 hours" preset to see if any logs appear

- Verify Log Labels: Start with a simple label query to verify that logs exist, and use the Grafana "Explore" view to browse available labels and their values (see the logcli sketch after this list):

  ```logql
  {job="my-job"}
  ```

- Check Log Retention: Logs might have been deleted due to retention policies:

  ```bash
  # Check retention period configuration
  grep -A 5 "retention_period" /path/to/loki-config.yaml
  ```

- Inspect Index Status:

  ```bash
  curl -s -G http://loki:3100/loki/api/v1/index/stats --data-urlencode 'query={job="my-job"}' | jq
  ```
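If you prefer the command line, logcli (shipped with Loki) can confirm that labels and streams exist without going through Grafana. A minimal sketch, assuming Loki is reachable on localhost:3100 and that a `job="my-job"` stream exists:

```bash
export LOKI_ADDR=http://localhost:3100

# List label names, then the values of the "job" label
logcli labels
logcli labels job

# Pull a few recent lines from one stream to confirm ingestion
logcli query '{job="my-job"}' --since=24h --limit=20
```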
Malformed Queries
LogQL syntax issues can prevent your queries from working correctly.
Common Query Errors and Solutions
- Invalid Label Matcher:

  ```logql
  # Wrong - label value must be a quoted string
  {job=app}

  # Correct
  {job="app"}
  ```
- Missing Closing Brackets:

  ```logql
  # Wrong
  {job="app" | json

  # Correct
  {job="app"} | json
  ```
- Incorrect Filter Expression:

  ```logql
  # Wrong - bare words are not valid values
  {job="app"} | logfmt | level = error

  # Correct - quote string values
  {job="app"} | logfmt | level = "error"
  ```
- Escape Special Characters:

  ```logql
  # Wrong - unescaped quotes in filter
  {job="app"} |~ "error: "user not found""

  # Correct
  {job="app"} |~ "error: \"user not found\""
  ```
Performance Issues
Slow Queries
Loki queries can sometimes be slow to execute, especially with large data volumes.
Symptoms
- Queries taking several seconds or timing out
- Error messages about query timeouts
Solutions
- Refine Label Filters: Always include label filters to narrow the search scope:

  ```logql
  # Too broad - Loki rejects a completely empty selector anyway
  {} |= "error"

  # Better
  {job="app", env="production"} |= "error"
  ```

- Optimize Regular Expressions: Complex regex patterns can significantly slow down queries:

  ```logql
  # Slow regex
  {job="app"} |~ ".*error.*in.*process.*"

  # More efficient
  {job="app"} |= "error" |= "process"
  ```

- Reduce Time Range: Query the smallest time range necessary for your analysis.

- Use Line Filters Before Parsing: Apply line filters before JSON or logfmt parsing so fewer lines reach the parser:

  ```logql
  # Less efficient
  {job="app"} | json | error != ""

  # More efficient
  {job="app"} |= "error" | json | error != ""
  ```
High Memory Usage
Loki can sometimes use excessive memory, especially in high-volume environments.
Symptoms
- OOM (Out of Memory) errors
- Loki service restarts
- General performance degradation
Solutions
- Adjust Query Limits:

  ```yaml
  # In Loki configuration
  limits_config:
    max_entries_limit_per_query: 5000
    max_query_series: 500
  ```

- Limit Query Look-back: Cap how far back the chunk store will be queried:

  ```yaml
  chunk_store_config:
    max_look_back_period: 168h
  ```

- Scale Out Query Handling: For large deployments, consider query sharding and a dedicated query scheduler:

  ```yaml
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
  ```

- Monitor Resource Usage: Use Prometheus metrics to monitor Loki's resource consumption (a few more example queries follow this list):

  ```promql
  rate(loki_distributor_bytes_received_total[5m])
  ```
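Beyond ingest volume, a few generic process and container metrics help correlate memory pressure with load. A sketch, assuming Loki is scraped with `job="loki"` and that cAdvisor metrics are available; adjust the selectors to your environment:

```promql
# Go heap currently in use by the Loki process
go_memstats_heap_inuse_bytes{job="loki"}

# Working-set memory of Loki pods (Kubernetes / cAdvisor)
container_memory_working_set_bytes{pod=~"loki.*"}

# Ingested lines per second, useful to correlate with memory growth
sum(rate(loki_distributor_lines_received_total[5m]))
```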
Series Cardinality Problems
High cardinality is one of the most common issues affecting Loki performance.
Symptoms
- Error messages about "too many series"
- Slow query performance
- Increased memory usage
Understanding Cardinality
Cardinality refers to the number of unique label combinations in your logs. High cardinality can occur when:
- Using highly variable labels (like request IDs or timestamps)
- Having too many label values
- Logging unique identifiers as labels
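As an illustration of the last point, the hypothetical Promtail snippet below promotes `user_id` to a label, creating one stream per user; the field and path names are made up for the example.

```yaml
scrape_configs:
  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          job: app
          __path__: /var/log/app/*.log   # example path
    pipeline_stages:
      - json:
          expressions:
            user_id: user_id
      # Anti-pattern: turning a unique identifier into a label creates one
      # stream per user. Keep user_id in the log line and filter at query
      # time instead, e.g. {job="app"} | json | user_id = "1234".
      - labels:
          user_id:
```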
Solutions
- Identify High-Cardinality Labels: Use logcli to see how many streams each label contributes:

  ```bash
  logcli series '{job=~".+"}' --analyze-labels
  ```

- Reduce Label Cardinality:
  - Remove high-cardinality labels from log streams
  - Use derived fields in Grafana instead of labels for high-cardinality data (see the provisioning sketch after this list)
  - Keep dynamic values in the log message, not in labels

- Configure Cardinality Limits:

  ```yaml
  limits_config:
    cardinality_limit: 100000
    max_label_name_length: 1024
    max_label_value_length: 2048
  ```

- Aggregate by Low-Cardinality Labels: Grouping by a per-stream label multiplies the number of series a query returns:

  ```logql
  # Potentially high cardinality
  sum by(stream) (rate({app="frontend"}[5m]))

  # Better
  sum by(job, instance) (rate({app="frontend"}[5m]))
  ```
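To keep identifiers such as trace IDs out of labels while still making them clickable in Grafana, a derived field on the Loki data source is one option. A provisioning sketch; the regex, the Tempo-style URL, and the field name are assumptions for a log line containing `trace_id=<value>`:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'
          # the doubled $$ escapes provisioning variable interpolation
          url: 'http://tempo.example.com/trace/$${__value.raw}'
```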
Storage Issues
Logs Not Persisting
Sometimes logs may appear briefly but then disappear unexpectedly.
Symptoms
- Logs visible for recent time ranges but missing for older periods
- Inconsistent query results
Solutions
- Check Retention Configuration:

  ```yaml
  # In Loki configuration
  limits_config:
    retention_period: 744h  # 31 days
  ```

- Inspect Storage Backend: For object storage backends (S3, GCS):

  ```bash
  # For AWS S3
  aws s3 ls s3://loki-bucket/chunks/ --recursive | head

  # For GCS
  gsutil ls gs://loki-bucket/chunks/ | head
  ```

- Verify Compactor Operation:

  ```bash
  # Check compactor logs
  kubectl logs -n <namespace> -l app=loki,component=compactor
  ```
Disk Space Issues
Local storage can fill up quickly with high-volume log ingestion.
Solutions
- Monitor Disk Usage:

  ```bash
  # Check disk usage
  df -h /path/to/loki/data
  ```

  ```promql
  # Example Prometheus alert expression (the recording rule name is illustrative)
  disk_free:node_filesystem_avail_bytes:ratio < 0.10
  ```

- Configure Retention and Compaction:

  ```yaml
  compactor:
    working_directory: /loki/compactor
    shared_store: s3
    retention_enabled: true  # required for the compactor to apply retention
  ```

- Migrate to Object Storage: For production environments, consider using object storage such as S3, GCS, or Azure Blob Storage instead of local disk; a minimal configuration sketch follows this list.
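A minimal sketch of what S3-backed storage with the boltdb-shipper index can look like; the bucket, region, and local paths are placeholders, and the exact block names vary between Loki versions:

```yaml
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    # region and bucket name are placeholders
    s3: s3://us-east-1/loki-bucket
    # for S3-compatible stores, configure endpoint, access_key_id, and
    # secret_access_key here instead of the s3 URL
```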
Ingestion Issues
Log Lines Not Being Ingested
Sometimes logs sent to Loki don't appear in query results.
Symptoms
- Logs not appearing despite successful Promtail/agent responses
- Missing specific log patterns
Troubleshooting Steps
- Check Promtail/Agent Status:

  ```bash
  # View Promtail logs
  kubectl logs -n <namespace> -l app=promtail

  # Check Promtail targets
  curl http://promtail:9080/targets
  ```

- Monitor Ingestion Rate:

  ```promql
  # Loki metrics in Prometheus
  rate(loki_distributor_lines_received_total[5m])
  ```

- Verify Rate Limits:

  ```yaml
  # In Loki configuration
  limits_config:
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20
  ```

- Check for Rejected Logs:

  ```promql
  # Discarded samples by reason (rate limited, too old, line too long, ...)
  sum by (reason) (rate(loki_discarded_samples_total[5m]))
  ```
Too Many Label Values Error
Loki has limits on label value combinations to prevent performance issues.
Solutions
- Identify Problematic Labels:

  ```bash
  # Count the values of a given label
  curl -s http://loki:3100/loki/api/v1/label/job/values | jq '.data | length'
  ```

- Configure Higher Limits (if appropriate for your environment):

  ```yaml
  limits_config:
    max_label_names_per_series: 30
    max_label_value_length: 2048
  ```

- Relabel or Drop High-Cardinality Labels: In the Promtail configuration:

  ```yaml
  scrape_configs:
    - job_name: app
      relabel_configs:
        # labeldrop removes the label; "action: drop" would discard the whole entry
        - action: labeldrop
          regex: high_cardinality_label
  ```
Authentication and Authorization Issues
Access Denied Errors
Security configuration can sometimes prevent legitimate access to logs.
Symptoms
- "Access denied" or "Unauthorized" errors
- Authentication failures
Solutions
- Check Tenant ID Configuration:

  ```yaml
  auth_enabled: true
  server:
    http_listen_port: 3100
  ```

- Verify Auth Tokens:

  ```bash
  # Test with curl
  curl -H "X-Scope-OrgID: tenant1" http://loki:3100/loki/api/v1/labels
  ```

- RBAC Configuration (when using Kubernetes): Ensure the ServiceAccount used by your log collector has the proper permissions:

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: loki-role
  rules:
    - apiGroups: [""]
      resources: ["pods", "nodes"]
      verbs: ["get", "list", "watch"]
  ```
Multi-tenancy Problems
Cross-tenant Query Issues
In multi-tenant setups, isolation between tenants is important.
Symptoms
- Seeing logs from other tenants
- Missing logs that should be visible
Solutions
- Configure Proper Tenant IDs:

  ```yaml
  # In Promtail config
  clients:
    - url: http://loki:3100/loki/api/v1/push
      tenant_id: tenant1
  ```

- Test Tenant Isolation:

  ```bash
  # Query as tenant1
  curl -G -H "X-Scope-OrgID: tenant1" http://loki:3100/loki/api/v1/query_range \
    --data-urlencode 'query={job="app"}' -d 'start=1625000000' -d 'end=1625001000'

  # Query as tenant2
  curl -G -H "X-Scope-OrgID: tenant2" http://loki:3100/loki/api/v1/query_range \
    --data-urlencode 'query={job="app"}' -d 'start=1625000000' -d 'end=1625001000'
  ```

- Check Tenant Configurations:

  ```yaml
  limits_config:
    per_tenant_override_config: /etc/loki/tenant-overrides.yaml
  ```
Upgrading Issues
Version Compatibility
Upgrading Loki can sometimes lead to compatibility issues.
Symptoms
- Service failures after upgrading
- Missing logs after version change
- Schema incompatibility errors
Solutions
- Check Release Notes: Always review the release notes before upgrading: https://github.com/grafana/loki/releases

- Back Up Configuration:

  ```bash
  # Back up config files
  cp /path/to/loki-config.yaml /path/to/loki-config.yaml.backup
  ```

- Migrate the Schema Gradually: When moving to a new schema version, add a new entry with a future start date instead of modifying the existing one, so older data stays readable under its original schema:

  ```yaml
  schema_config:
    configs:
      - from: 2020-07-01
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: index_
          period: 24h
      - from: 2022-06-01
        store: boltdb-shipper
        object_store: s3
        schema: v12
        index:
          prefix: index_
          period: 24h
  ```

- Test in a Staging Environment: Always test upgrades in a non-production environment first, and confirm the running version before and after (see the sketch below).
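A small sketch for recording the running version before and after an upgrade, using Loki's build-info endpoint (adjust the host and port to your deployment):

```bash
# Confirm the running Loki version via the HTTP API
curl -s http://loki:3100/loki/api/v1/status/buildinfo | jq '.version, .revision'
```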
Summary
This guide covered the most common issues you might encounter when working with Grafana Loki:
- Connection problems and how to diagnose them
- Query issues and syntax errors
- Performance optimization techniques
- High cardinality management
- Storage configuration and troubleshooting
- Log ingestion problems
- Authentication and multi-tenancy issues
- Upgrade challenges
Remember that Loki is designed to be simple and cost-effective, but proper configuration is key to avoiding these common pitfalls. Regular monitoring of your Loki instance can help catch issues before they become critical.
Additional Resources
- Official Loki Troubleshooting Guide
- LogQL Query Language Reference
- Loki Best Practices
- Community Forums
Exercises
- Diagnostic Challenge: Set up a local Loki instance and intentionally misconfigure it. Then use the troubleshooting techniques in this guide to identify and fix the issues.

- Query Optimization: Take a complex LogQL query and optimize it for performance using the strategies outlined in this guide.

- Cardinality Analysis: Analyze a set of logs and identify potential high-cardinality labels. Develop a strategy to reduce cardinality while maintaining useful information.

- Monitoring Setup: Create a Grafana dashboard to monitor the health and performance of your Loki instance using the metrics mentioned in this guide.