RabbitMQ Log Analysis
Introduction
Log analysis is a critical skill for effectively maintaining and troubleshooting RabbitMQ deployments. As a message broker handling communication between different parts of your system, RabbitMQ generates detailed logs that provide insights into its operations, errors, and performance. Understanding how to access, interpret, and analyze these logs is essential for identifying issues, optimizing performance, and ensuring the reliability of your messaging infrastructure.
In this guide, we'll explore how to locate, understand, and analyze RabbitMQ logs to solve common problems and monitor your system's health.
Understanding RabbitMQ Log Structure
Log Locations
Before analyzing logs, you need to know where to find them. RabbitMQ writes logs to different locations depending on your installation method and operating system:
- Debian/Ubuntu (from package):
/var/log/rabbitmq/
- RPM-based distributions:
/var/log/rabbitmq/
- Windows:
%APPDATA%\RabbitMQ\logs\
- Generic Unix:
$RABBITMQ_HOME/var/log/rabbitmq/
- Docker containers: Usually output to standard output/error
You can verify the location by running:
rabbitmqctl status | grep Log
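For Docker-based deployments, where logs go to standard output, the same information is available through the container runtime (assuming a container named rabbitmq; substitute your own container name):
docker logs rabbitmq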
Default Log Files
RabbitMQ typically produces several log files:
- rabbit@[hostname].log - The main RabbitMQ log file
- rabbit@[hostname]-sasl.log - Contains SASL (System Architecture Support Libraries) events
- [hostname]-crash.log - Created when RabbitMQ crashes unexpectedly
Log Format
RabbitMQ logs follow a specific format that includes:
- Timestamp
- Log level (debug, info, warning, error)
- Connection information (if applicable)
- Message content
Example log entry:
2023-05-15 14:23:45.123 [info] <0.684.0> accepting AMQP connection <0.684.0> (127.0.0.1:54321 -> 127.0.0.1:5672)
Breaking this down:
- 2023-05-15 14:23:45.123: Timestamp
- [info]: Log level
- <0.684.0>: Erlang process ID
- Remaining text: The actual log message
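As an illustration, here is a small Python sketch (the field names are our own choice, not anything RabbitMQ defines) that splits the example entry above into those four parts:
import re

# Hypothetical one-off parse of the example log entry shown above
LOG_LINE = ("2023-05-15 14:23:45.123 [info] <0.684.0> accepting AMQP connection "
            "<0.684.0> (127.0.0.1:54321 -> 127.0.0.1:5672)")

pattern = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "  # timestamp
    r"\[(?P<level>\w+)\] "                                          # log level
    r"(?P<pid><[\d.]+>) "                                           # Erlang process ID
    r"(?P<message>.*)$"                                             # remaining message text
)

match = pattern.match(LOG_LINE)
if match:
    print(match.groupdict())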
Configuring Log Levels
Adjusting log levels helps balance between having enough information and preventing log files from growing too large.
Available Log Levels
RabbitMQ uses the following log levels, from most to least verbose:
- debug - Detailed information for debugging
- info - General operational information
- warning - Potential issues that don't affect core functionality
- error - Errors that impact functionality
- critical - Severe errors that might cause system failure
Setting Log Levels
You can configure log levels in your rabbitmq.conf file:
log.file.level = info
log.console.level = warning
For specific categories, you can use:
log.file.level.connection = debug
log.file.level.channel = warning
Common log categories include:
- connection - Connection-related events
- channel - Channel operations
- queue - Queue operations
- mirroring - Cluster synchronization events
- federation - Federation events
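Log levels set in rabbitmq.conf take effect at node start. If you need to raise verbosity temporarily on a running node, recent RabbitMQ versions also provide a runtime switch (the change is not persisted across restarts):
rabbitmqctl set_log_level debug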
Common Log Patterns to Monitor
Connection Issues
2023-05-15 14:25:36.456 [warning] <0.789.0> closing AMQP connection <0.789.0> (127.0.0.1:54325 -> 127.0.0.1:5672, vhost: '/', user: 'guest'): client unexpectedly closed TCP connection
This indicates a client disconnected unexpectedly. Common causes include:
- Network issues
- Client application crashes
- Timeout configurations
Queue Problems
2023-05-15 14:30:12.789 [error] <0.831.0> Error on AMQP connection <0.831.0> (127.0.0.1:54330 -> 127.0.0.1:5672, vhost: '/', user: 'guest'), channel 1: {amqp_error,resource_locked, "queue 'important_queue' in vhost '/' in exclusive use", 'basic.publish'}
This shows an attempt to publish to a queue that's locked for exclusive use.
Memory Alerts
2023-05-15 15:05:22.123 [warning] <0.456.0> Memory alarm set on node rabbit@hostname
This warning indicates that RabbitMQ has reached its memory high watermark; connections that publish messages will be blocked until memory usage falls back below the threshold.
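The watermark that triggers this alarm defaults to 40% of available RAM and can be adjusted in rabbitmq.conf, for example to 60%:
vm_memory_high_watermark.relative = 0.6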
Disk Space Alerts
2023-05-15 15:10:45.678 [warning] <0.457.0> Disk free space monitor alarm set on node rabbit@hostname. Free disk space: 246MB. Free disk space limit: 250MB
RabbitMQ will stop accepting messages when disk space is low.
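The free disk space limit is also configurable; the example above evidently uses a 250MB limit. A higher absolute limit can be set in rabbitmq.conf:
disk_free_limit.absolute = 2GB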
Step-by-Step Log Analysis Process
Let's walk through a systematic approach to analyzing RabbitMQ logs:
1. Identify the time period of interest: Focus on logs around when issues were reported.
2. Filter logs by severity: Look for errors and warnings first.
   grep -E '\[(error|warning)\]' /var/log/rabbitmq/rabbit@hostname.log
3. Look for patterns: Are there recurring errors? Do they correlate with specific events? (A quick hourly breakdown appears after this list.)
4. Check for connection issues: Monitor connection establishment and termination.
   grep "connection" /var/log/rabbitmq/rabbit@hostname.log | grep -v "accepting"
5. Examine queue operations: Look for queue creation, deletion, and binding events.
   grep "queue" /var/log/rabbitmq/rabbit@hostname.log
6. Correlate with system events: Compare log timestamps with deployment times, high load periods, etc.
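For a quick feel for when errors cluster, a one-liner like the following groups error entries by hour (the first 13 characters of each line are the date plus the hour; the Python script later in this guide produces a similar breakdown):
grep '\[error\]' /var/log/rabbitmq/rabbit@hostname.log | cut -c1-13 | sort | uniq -c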
Practical Example: Troubleshooting High Memory Usage
Let's walk through troubleshooting a common scenario: RabbitMQ consuming excessive memory.
Scenario
Your application users report message delivery delays. Upon investigation, you notice RabbitMQ memory usage is high.
Analysis Process
- Check for memory alarms in logs:
grep "Memory alarm" /var/log/rabbitmq/[email protected]
Result:
2023-05-16 09:15:32.456 [warning] <0.567.0> Memory alarm set on node rabbit@hostname
- Identify what's consuming memory using RabbitMQ management tools:
rabbitmqctl report > rabbitmq_report.txt
- Check for queue lengths:
rabbitmqctl list_queues name messages
Result:
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name messages
error_queue 1543267
normal_queue 126
priority_queue 15
- Examine consumer activity:
rabbitmqctl list_queues name messages consumers consumer_utilisation
Result:
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name messages consumers consumer_utilisation
error_queue 1543267 0 0.0
normal_queue 126 5 0.98
priority_queue 15 2 0.95
- Analysis of the logs and reports shows:
  - No consumers on error_queue
  - A very high message count in error_queue
  - The memory alarm has been triggered
- Solution: Set up consumers for error_queue (a minimal sketch follows) or implement a dead-letter policy.
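As a sketch of the first option, a minimal Python consumer for error_queue might look like the following (this assumes the pika client library and a broker on localhost with default credentials; the processing logic is a placeholder):
import pika

# Connect to the broker (host and credentials are assumptions for this sketch)
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Limit unacknowledged messages per consumer so the backlog drains steadily
channel.basic_qos(prefetch_count=100)

def handle_error_message(ch, method, properties, body):
    # Placeholder: archive, re-route, or log the failed message here
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="error_queue", on_message_callback=handle_error_message)
channel.start_consuming()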
Advanced Log Analysis Techniques
Using Structured Logging
RabbitMQ 3.9+ supports JSON (structured) formatting for logs, making them easier to parse programmatically:
log.file.formatter = json
Example output:
{"time":"2023-05-16T10:23:45.123Z","level":"info","pid":"<0.684.0>","module":"rabbit_connection_tracking","message":"accepting AMQP connection <0.684.0> (127.0.0.1:54321 -> 127.0.0.1:5672)"}
Aggregating Logs
For multi-node RabbitMQ clusters, centralize logs using tools like:
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Graylog
- Fluentd
Here's a simple Logstash configuration for RabbitMQ logs:
input {
  file {
    path => "/var/log/rabbitmq/*.log"
    type => "rabbitmq"
  }
}

filter {
  if [type] == "rabbitmq" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:log_level}\] %{GREEDYDATA:log_message}" }
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "rabbitmq-logs-%{+YYYY.MM.dd}"
  }
}
Creating Monitoring Dashboards
Visualize log patterns using Kibana or Grafana dashboards to track:
- Connection rates
- Error frequencies
- Queue operations
- Memory/disk alarms
Log Analysis for Common Scenarios
Scenario 1: Consumer Not Receiving Messages
Relevant logs to check:
2023-05-16 11:30:25.456 [info] <0.789.0> accepting AMQP connection <0.789.0> (127.0.0.1:54325 -> 127.0.0.1:5672)
2023-05-16 11:30:25.457 [info] <0.789.0> connection <0.789.0> (127.0.0.1:54325 -> 127.0.0.1:5672): user 'guest' authenticated and granted access to vhost '/'
2023-05-16 11:30:25.458 [info] <0.790.0> channel created
If messages aren't being delivered despite seeing these logs, check:
- Queue bindings (correct exchange and routing keys)
- Consumer acknowledgment settings
- Queue policy settings
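The first two items can be checked directly from the command line; for example, on the node itself:
rabbitmqctl list_bindings
rabbitmqctl list_consumers
The first command shows which exchanges and routing keys each queue is bound to; the second shows which consumers are attached to each queue and whether they acknowledge messages.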
Scenario 2: Cluster Partition
2023-05-16 12:45:10.123 [warning] <0.567.0> Mnesia(rabbit@hostname): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@othernode}
This indicates a network partition between nodes. Actions to take:
- Check network connectivity between nodes
- Verify cluster partition handling policy
- Decide whether to restart nodes or apply specific recovery procedures
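To confirm the partition from the command line, check the cluster status (partitions are listed in the output), and consider setting an explicit partition handling strategy in rabbitmq.conf as a preventative measure:
rabbitmqctl cluster_status
cluster_partition_handling = pause_minority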
Scenario 3: SSL Certificate Issues
2023-05-16 14:30:45.789 [error] <0.345.0> TLS server: In state certify at ssl_handshake.erl:1700 generated SERVER ALERT: Fatal - Handshake Failure
This shows an SSL handshake failure. Check:
- Certificate validity dates
- Certificate trust chain
- TLS version compatibility
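Two openssl commands are useful for these checks (assuming a TLS listener on port 5671 and a local copy of the server certificate; adjust host, port, and path to your setup):
openssl s_client -connect localhost:5671 < /dev/null
openssl x509 -in /path/to/server_certificate.pem -noout -dates -issuer -subject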
Integration with External Monitoring
For comprehensive monitoring, integrate RabbitMQ log analysis with:
- Prometheus + Grafana: Using the RabbitMQ Prometheus plugin
rabbitmq-plugins enable rabbitmq_prometheus
- Health checks: Simple HTTP API requests to monitor node status
curl -u guest:guest http://localhost:15672/api/healthchecks/node
- Automated alerts: Configure alerts based on log patterns
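Once the Prometheus plugin is enabled, the node exposes metrics on port 15692 at /metrics by default. A minimal Prometheus scrape job, assuming a single local node, could look like:
scrape_configs:
  - job_name: 'rabbitmq'
    static_configs:
      - targets: ['localhost:15692']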
Practical Example: Building a Log Analysis Script
Here's a simple Python script to analyze RabbitMQ logs and extract key information:
import re
import sys
from collections import Counter
from datetime import datetime


def analyze_rabbitmq_logs(log_file):
    # Patterns to match
    connection_pattern = re.compile(
        r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(info|warning|error)\] .*(accepting|closing) AMQP connection'
    )
    error_pattern = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[error\]')
    memory_alarm_pattern = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[warning\] .* Memory alarm')

    # Counters
    connections_opened = 0
    connections_closed = 0
    errors = 0
    memory_alarms = 0
    hourly_errors = Counter()

    with open(log_file, 'r') as file:
        for line in file:
            # Check for connections
            conn_match = connection_pattern.search(line)
            if conn_match:
                timestamp, level, action = conn_match.groups()
                if 'accepting' in action:
                    connections_opened += 1
                else:
                    connections_closed += 1

            # Check for errors
            error_match = error_pattern.search(line)
            if error_match:
                timestamp = error_match.group(1)
                dt = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S.%f')
                hour = dt.hour
                hourly_errors[hour] += 1
                errors += 1

            # Check for memory alarms
            if memory_alarm_pattern.search(line):
                memory_alarms += 1

    # Print results
    print(f"Connections opened: {connections_opened}")
    print(f"Connections closed: {connections_closed}")
    print(f"Total errors: {errors}")
    print(f"Memory alarms: {memory_alarms}")
    print("\nHourly error distribution:")
    for hour in sorted(hourly_errors.keys()):
        print(f"  Hour {hour}: {hourly_errors[hour]} errors")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python analyze_rabbitmq_logs.py <log_file>")
        sys.exit(1)

    analyze_rabbitmq_logs(sys.argv[1])
Run this script on your RabbitMQ log file:
python analyze_rabbitmq_logs.py /var/log/rabbitmq/rabbit@hostname.log
Sample output:
Connections opened: 1245
Connections closed: 1240
Total errors: 37
Memory alarms: 2
Hourly error distribution:
Hour 2: 3 errors
Hour 9: 12 errors
Hour 14: 15 errors
Hour 18: 7 errors
This gives you a quick summary of connection activity, errors, and their distribution throughout the day.
Summary
Effective RabbitMQ log analysis is essential for maintaining a healthy messaging system. We've covered:
- Understanding RabbitMQ log structure and locations
- Configuring appropriate log levels
- Recognizing common log patterns
- Systematic approaches to log analysis
- Troubleshooting real-world scenarios
- Advanced techniques for log aggregation and visualization
- Integration with monitoring systems
- Practical scripts for automating log analysis
By mastering these techniques, you'll be better equipped to:
- Identify and resolve issues quickly
- Optimize your RabbitMQ configuration
- Prevent problems before they impact users
- Understand your message broker's behavior under different conditions
Additional Resources
For further learning:
- RabbitMQ Official Documentation on Logging
- RabbitMQ Management HTTP API
- Monitoring RabbitMQ with Prometheus and Grafana
Exercises
- Configure RabbitMQ to use JSON-formatted logs and create a simple parser to extract connection information.
- Set up an ELK stack to centralize logs from a multi-node RabbitMQ cluster.
- Create alerts for critical conditions like persistent memory alarms or high message backlogs.
- Extend the log analysis script to:
- Track queue creation and deletion events
- Identify clients with frequent disconnections
- Calculate average connection lifetimes