Database Monitoring
Introduction
Database monitoring is a critical aspect of maintaining reliable applications and services. As the backbone of most modern applications, databases require constant monitoring to ensure optimal performance, prevent outages, and quickly troubleshoot issues when they arise. This guide will walk you through how to implement effective database monitoring using Grafana.
Whether you're working with relational databases like MySQL, PostgreSQL, and SQL Server, or NoSQL solutions like MongoDB and Redis, the principles and patterns outlined here will help you build comprehensive monitoring dashboards to keep your databases healthy and performant.
Why Monitor Databases?
Databases often become bottlenecks in application performance for several reasons:
- They handle increasingly large volumes of data
- They process complex queries from multiple sources
- They're subject to varying load patterns and peak usage times
- They're critical infrastructure that directly impacts user experience
Effective monitoring helps you:
- Identify performance degradation before users notice
- Plan capacity based on growth trends
- Optimize query performance and resource utilization
- Troubleshoot issues quickly when they occur
- Ensure high availability and data integrity
Key Metrics to Monitor
Connection Metrics
- Active connections: Number of clients currently connected
- Connection rate: New connections per second
- Connection errors: Failed connection attempts
- Connection pooling efficiency: For applications using connection pools
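For pooled applications, pool efficiency can be summarized from raw counters. A minimal Python sketch — the field names (in_use, size, wait_count, checkout_count) are illustrative, not tied to any specific driver:

```python
# Sketch: summarizing connection-pool efficiency from raw counters. Field
# names (in_use, size, wait_count, checkout_count) are illustrative only.
def pool_efficiency(in_use: int, size: int, wait_count: int, checkout_count: int) -> dict:
    """Utilization plus the share of checkouts that had to wait for a connection."""
    return {
        "utilization": in_use / size if size else 0.0,
        "wait_ratio": wait_count / checkout_count if checkout_count else 0.0,
    }

stats = pool_efficiency(in_use=18, size=20, wait_count=5, checkout_count=1000)
# A high utilization with a rising wait_ratio suggests the pool is undersized.
```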
Performance Metrics
- Query throughput: Queries per second
- Query latency: Average and percentile response times (95th, 99th)
- Query errors: Failed queries and error rates
- Slow queries: Number and details of queries exceeding time thresholds
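The percentile latencies above can be computed from raw samples with a nearest-rank calculation. In practice, monitoring systems estimate percentiles from histogram buckets, but a Python sketch shows the idea:

```python
import math

# Sketch: nearest-rank percentile over raw latency samples. Real monitoring
# stacks usually estimate percentiles from histogram buckets instead.
def percentile(samples, p):
    """Smallest sample value that covers at least p percent of all samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 230, 14, 13, 16, 12, 480, 15]
p95 = percentile(latencies_ms, 95)  # tail latency dominated by the outliers
```

Note how the 95th percentile (480 ms here) tells a very different story from the median: averages hide exactly the slow queries you want to see.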
Resource Utilization
- CPU usage: Overall and per-query CPU consumption
- Memory usage: Buffer/cache utilization, working set size
- Disk I/O: Read/write operations, latency, throughput
- Network I/O: Data transfer rates, packet loss, retransmits
Storage Metrics
- Database size: Total size and growth rate
- Index size: Space consumed by indexes
- Table size: Size of individual tables
- Temporary space usage: Disk space used for temporary operations
Replication Metrics (for clustered databases)
- Replication lag: Delay between primary and replicas
- Replication errors: Failed replication operations
- Replication throughput: Rate of data replication
Setting Up Database Monitoring in Grafana
Prerequisites
Before you begin, ensure you have:
- A running Grafana instance (v9.0 or later recommended)
- Access to your database with monitoring permissions
- Prometheus, Telegraf, or another metrics collector installed
- Required data source plugins installed in Grafana
Step 1: Configure Data Sources
First, let's connect Grafana to your database metrics source:
- Navigate to Configuration > Data Sources in Grafana
- Click "Add data source"
- Select the appropriate source type:
  - Prometheus for metric-based monitoring
  - Database-specific plugins (MySQL, PostgreSQL, etc.) for direct SQL queries
  - Loki for database logs
For Prometheus, configure the connection:
URL: http://your-prometheus-server:9090
Access: Server (default)
Scrape interval: 15s
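The same data source can also be created programmatically through Grafana's POST /api/datasources endpoint. A hedged Python sketch of the request payload — the URL is a placeholder, and "proxy" is the API name for the "Server" access mode:

```python
import json

# Sketch: the JSON body for Grafana's POST /api/datasources endpoint when
# adding a Prometheus data source. The URL below is a placeholder.
def prometheus_datasource(url: str, scrape_interval: str = "15s") -> dict:
    return {
        "name": "Prometheus",
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # API name for the "Server" access mode in the UI
        "jsonData": {"timeInterval": scrape_interval},  # scrape interval hint
    }

payload = json.dumps(prometheus_datasource("http://your-prometheus-server:9090"))
```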
Step 2: Create a Dashboard
Create a new dashboard by clicking on "+" and selecting "Dashboard":
- Add a new panel for each key metric group
- Organize panels in logical sections
- Use appropriate visualization types:
  - Time series for trends over time
  - Gauge for current utilization
  - Table for query details
  - Stat for key numbers
Step 3: Configure Basic Database Overview Panels
Let's create a panel for active connections:
-- For PostgreSQL direct connection
SELECT
  now() AS time,
  count(*) AS connections
FROM
  pg_stat_activity
WHERE
  state != 'idle'
For Prometheus with a database exporter:
sum(pg_stat_activity_count{state!="idle"})
Step 4: Query Performance Monitoring
Let's create a panel for average query latency. Exact metric names depend on your exporter configuration; note that mean_exec_time is already an average (a gauge), so read it directly rather than wrapping it in rate():

avg(pg_stat_statements_mean_exec_time{db="your_database"})
And for slow queries:
-- For MySQL direct connection (requires slow_query_log=ON and log_output='TABLE')
SELECT
  query_time,
  sql_text,
  rows_examined,
  rows_sent
FROM
  mysql.slow_log
WHERE
  start_time > NOW() - INTERVAL 1 DAY
ORDER BY
  query_time DESC
LIMIT 10
Database-Specific Monitoring Patterns
MySQL Monitoring Pattern
For MySQL databases, focus on these additional metrics:
- Buffer pool utilization: Track the InnoDB buffer pool efficiency
- Table locks: Monitor lock contention issues
- Handler statistics: Understand how data is being accessed
Example Prometheus query for buffer pool hit ratio:
(
rate(mysql_global_status_innodb_buffer_pool_read_requests[5m]) -
rate(mysql_global_status_innodb_buffer_pool_reads[5m])
) /
rate(mysql_global_status_innodb_buffer_pool_read_requests[5m])
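The same hit ratio can be computed offline from two counter deltas, for example values sampled five minutes apart via SHOW GLOBAL STATUS. A Python sketch:

```python
# Sketch: buffer pool hit ratio from two counter deltas (read_requests and
# reads sampled a few minutes apart, e.g. via SHOW GLOBAL STATUS).
def buffer_pool_hit_ratio(read_requests_delta: int, disk_reads_delta: int) -> float:
    """Fraction of logical read requests served from memory."""
    if read_requests_delta == 0:
        return 1.0  # no read activity: nothing missed the buffer pool
    return (read_requests_delta - disk_reads_delta) / read_requests_delta

ratio = buffer_pool_hit_ratio(read_requests_delta=1_000_000, disk_reads_delta=2_000)
# A ratio persistently below ~0.99 on a hot OLTP workload often means the
# buffer pool is too small for the working set.
```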
PostgreSQL Monitoring Pattern
For PostgreSQL, these metrics are particularly important:
- VACUUM activity: Track autovacuum operations and tables needing vacuum
- Bloat: Monitor table and index bloat
- Transaction ID wraparound: Prevent database freezing
Example query for tracking tables needing vacuum:
SELECT
  schemaname || '.' || relname AS table_name,
  n_dead_tup AS dead_tuples,
  n_live_tup AS live_tuples,
  round(n_dead_tup::numeric / (n_live_tup + n_dead_tup) * 100, 2) AS dead_ratio
FROM
  pg_stat_user_tables
WHERE
  n_live_tup > 0
ORDER BY
  dead_ratio DESC
LIMIT 10
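The dead-tuple ratio from this query translates directly into a simple vacuum check. A Python sketch with an illustrative 20% threshold (tune it to your autovacuum settings):

```python
# Sketch: flag tables whose dead-tuple ratio crosses a threshold. The 20%
# default is illustrative; tune it to your autovacuum configuration.
def needs_vacuum(n_dead_tup: int, n_live_tup: int, threshold_pct: float = 20.0) -> bool:
    total = n_dead_tup + n_live_tup
    if total == 0:
        return False  # empty table, nothing to vacuum
    return n_dead_tup / total * 100 >= threshold_pct

flagged = needs_vacuum(n_dead_tup=4_000, n_live_tup=12_000)  # 25% dead
```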
MongoDB Monitoring Pattern
For MongoDB, focus on:
- Operation counts: Queries, inserts, updates, deletes
- WiredTiger cache: Cache utilization and eviction rates
- Collection metrics: Document counts and sizes
Example Prometheus query for MongoDB operation rates:
sum by (type) (rate(mongodb_op_counters_total[5m]))
Creating Effective Alerts
Set up alerts to notify you of potential database issues:
- High connection count alert:

    sum(mysql_global_status_threads_connected) > 100

- Replication lag alert:

    max(mysql_slave_status_seconds_behind_master) > 300

- Disk space alert:

    (node_filesystem_size_bytes{mountpoint="/var/lib/mysql"} -
     node_filesystem_free_bytes{mountpoint="/var/lib/mysql"}) /
     node_filesystem_size_bytes{mountpoint="/var/lib/mysql"} * 100 > 85

- Slow query rate alert:

    increase(mysql_global_status_slow_queries[5m]) > 10
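Evaluated outside Grafana, these four rules reduce to plain threshold checks. A Python sketch — the metric dictionary keys are illustrative; real values would come from your collector:

```python
# Sketch: the four alert rules above as plain threshold checks. The metric
# dict keys are illustrative; values would come from your collector.
def evaluate_alerts(metrics: dict) -> list:
    rules = [
        ("HighConnectionCount", metrics["threads_connected"] > 100),
        ("ReplicationLag", metrics["seconds_behind_master"] > 300),
        ("DiskSpace", metrics["disk_used_percent"] > 85),
        ("SlowQueryRate", metrics["slow_queries_5m"] > 10),
    ]
    return [name for name, firing in rules if firing]

firing = evaluate_alerts({
    "threads_connected": 140,    # over the 100-connection threshold
    "seconds_behind_master": 12,
    "disk_used_percent": 91,     # over the 85% threshold
    "slow_queries_5m": 3,
})
```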
Real-World Example: E-commerce Database Monitoring
Let's walk through a practical example of monitoring a database for an e-commerce application:
Dashboard Layout Example
- Overview Row:
  - Database status summary
  - Active connections trend
  - Key performance indicators
- Query Performance Row:
  - Top 5 slow queries
  - Query latency by type
  - Error rate trend
- Resource Utilization Row:
  - CPU usage trend
  - Memory utilization
  - Disk I/O metrics
- Business Impact Row:
  - Correlation between database metrics and business KPIs
  - Database performance during peak hours
  - Historical performance during sales events
Example Variables for the Dashboard
Create template variables to make your dashboard flexible:
$database: mysql_performance_schema, mysql_information_schema
$time_range: 1h, 6h, 12h, 24h, 7d
$instance: db-prod-1, db-prod-2, db-replica-1
Best Practices for Database Monitoring
- Balance detail with clarity: Too many metrics can be overwhelming; focus on actionable data
- Correlate metrics: Look at relationships between metrics for better insights. PromQL has no correlation() function, so plot related series on one panel and compare their shapes, e.g. rate(mysql_global_status_questions[5m]) alongside rate(node_cpu_seconds_total{mode="user"}[5m])
- Establish baselines: Understand normal performance to detect anomalies
- Use annotations: Mark deployment times, configuration changes, and maintenance events
- Implement proper retention policies: Keep high-resolution data short-term and aggregated data long-term
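The retention practice above hinges on downsampling: aggregating high-resolution samples into coarser buckets before long-term storage. A Python sketch of bucketed averaging:

```python
# Sketch: downsampling high-resolution samples into fixed-width bucket
# averages -- the aggregation behind "raw short-term, aggregated long-term".
def downsample(points, bucket_seconds):
    """points: iterable of (unix_ts, value); returns {bucket_start: mean}."""
    buckets = {}
    for ts, value in points:
        start = ts // bucket_seconds * bucket_seconds  # align to bucket boundary
        buckets.setdefault(start, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Four 30-second samples collapse into two one-minute averages:
minute_avgs = downsample([(0, 10), (30, 20), (60, 40), (90, 60)], 60)
```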
- Segment by query type: Different queries have different performance profiles
- Create dedicated dashboards for different user roles:
  - DBA dashboard with technical details
  - Developer dashboard focusing on query performance
  - Executive dashboard showing system health and business impact
Troubleshooting Common Database Issues
High CPU Usage
- Identify resource-intensive queries:

    SELECT
      query,
      calls,
      total_time,
      mean_time
    FROM
      pg_stat_statements
    ORDER BY
      total_time DESC
    LIMIT 10

  (On PostgreSQL 13+, these columns are named total_exec_time and mean_exec_time.)

- Check for missing indexes with high table scans:

    rate(mysql_global_status_handler_read_rnd_next[5m])
Connection Spikes
- Identify connection sources:

    SELECT
      usename,
      count(*)
    FROM
      pg_stat_activity
    GROUP BY
      usename
    ORDER BY
      count(*) DESC

- Check for connection leaks in application code
Disk I/O Bottlenecks
- Monitor read/write latency:

    rate(node_disk_read_time_seconds_total{device="sda"}[1m]) / rate(node_disk_reads_completed_total{device="sda"}[1m])

- Check for queries causing excessive disk activity:

    topk(10, sum by (file_name) (rate(mysql_file_summary_by_instance_total_read[10m])))
Advanced Techniques
Query Performance Analysis
Use Grafana's query performance visualization to compare similar queries:
-- For PostgreSQL
SELECT
  substring(query, 1, 50) AS query_sample,
  calls,
  total_time / calls AS avg_time,
  rows / calls AS avg_rows,
  shared_blks_hit / calls AS avg_cache_hits,
  shared_blks_read / calls AS avg_cache_misses
FROM
  pg_stat_statements
WHERE
  query ILIKE '%SELECT%customer%'
ORDER BY
  total_time DESC
LIMIT 20
Predictive Analysis
Set up predictive alerts based on trend analysis:
predict_linear(mysql_global_status_innodb_buffer_pool_pages_free[6h], 24 * 3600) < 1000
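predict_linear fits a least-squares line over the sample window and extrapolates it into the future. A Python sketch of the same idea applied to the free-pages gauge:

```python
# Sketch of predict_linear's idea: least-squares fit over a sample window,
# then extrapolate the gauge horizon_seconds past the last sample.
def predict_linear(samples, horizon_seconds):
    """samples: list of (unix_ts, value) pairs; returns the extrapolated value."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / \
        sum((t - mean_t) ** 2 for t, _ in samples)
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + horizon_seconds) + intercept

# Buffer pool free pages shrinking by 100/hour over a 6h window:
window = [(h * 3600, 10_000 - 100 * h) for h in range(7)]
forecast = predict_linear(window, 24 * 3600)  # where the trend lands in 24h
```

Alerting on the forecast rather than the current value buys you lead time: you learn the buffer pool will run out of free pages tomorrow, not that it already has.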
Summary
Effective database monitoring with Grafana enables you to:
- Proactively identify issues before they impact users
- Optimize database performance through data-driven decisions
- Correlate database metrics with application performance
- Plan for growth by understanding resource utilization trends
- Troubleshoot efficiently with detailed performance data
By following the patterns outlined in this guide, you'll create comprehensive monitoring dashboards that give you deep visibility into your database's health and performance.
Additional Resources
- Practice exercises: Set up monitoring for a test database and simulate load
- Advanced learning: Explore specialized monitoring for your specific database
- Community dashboards: Import and customize pre-built database monitoring dashboards
Remember that effective monitoring is an iterative process. Start with basic metrics, add detail as needed, and continuously refine your dashboards to provide the most relevant insights.