PostgreSQL Troubleshooting

Introduction

Even the most well-configured PostgreSQL databases can encounter issues. The ability to quickly identify, diagnose, and resolve problems is an essential skill for any database administrator. This guide will walk you through common PostgreSQL problems and their solutions, providing you with practical troubleshooting techniques to keep your databases running smoothly.

Connection Issues

Connection problems are among the most common issues PostgreSQL administrators face.

Issue: Connection Refused

psql: error: could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 5432?

Potential causes and solutions:

PostgreSQL service is not running

Check if PostgreSQL is running:
```
sudo systemctl status postgresql
```
If not running, start it:
```
sudo systemctl start postgresql
```

PostgreSQL not listening on expected address

Check postgresql.conf for the listen_addresses setting:

# Verify the current setting
sudo grep listen_addresses /etc/postgresql/13/main/postgresql.conf

# Update it if needed (to listen on all interfaces)
listen_addresses = '*'

Firewall blocking connections

Allow connections through the firewall:
```
sudo ufw allow 5432/tcp
```

Client authentication configuration

Check the pg_hba.conf file:

# Example entry for local connections
host    all             all             127.0.0.1/32            md5

Issue: Authentication Failed

psql: error: FATAL: password authentication failed for user "postgres"

Solutions:

Verify username and password

Ensure you're using the correct credentials.

Reset PostgreSQL password

# Switch to postgres user
sudo -u postgres psql

# Change password
ALTER USER postgres WITH PASSWORD 'newpassword';

Performance Problems

Slow Queries

Identify slow queries using the pg_stat_statements extension:

Enable the extension
```
CREATE EXTENSION pg_stat_statements;
```

Find slow queries

SELECT query, calls, total_exec_time, rows, 
       100.0 * total_exec_time / sum(total_exec_time) OVER() AS percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Analyze a problematic query with EXPLAIN ANALYZE

EXPLAIN ANALYZE SELECT * FROM large_table WHERE non_indexed_column = 'value';

Create indexes for better performance

CREATE INDEX idx_large_table_column ON large_table(non_indexed_column);

High CPU Usage

If PostgreSQL is consuming excessive CPU:

Check for long-running queries

SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;

Terminate a problematic query if necessary

SELECT pg_cancel_backend(pid);  -- Cancel query
SELECT pg_terminate_backend(pid);  -- Terminate connection

Optimize database configuration

Update shared_buffers and work_mem in postgresql.conf:

shared_buffers = 1GB  # 25% of RAM for dedicated servers
work_mem = 10MB       # Depends on your workload

Disk Space Issues

Database Size Growth

Monitor database and table sizes:

-- Check database sizes
SELECT pg_database.datname, 
       pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;

-- Check table sizes
SELECT relname AS table_name,
       pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_relation_size(relid) DESC;

Solutions for Disk Space Issues

VACUUM to reclaim space

-- Basic vacuum
VACUUM;

-- Aggressive vacuum
VACUUM FULL;  -- Note: Takes locks and rewrites the whole table

Identify and remove bloat

-- Find bloated tables
SELECT schemaname, tablename, 
       pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size,
       pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size,
       pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename) - 
                      pg_relation_size(schemaname || '.' || tablename)) as bloat_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;

Configure autovacuum properly

# postgresql.conf settings
autovacuum = on
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05

Data Corruption Issues

Detecting Corruption

Use pg_dump with verification

pg_dump -Fc -v -f backup.dump database_name

Check table corruption with consistency checks

-- Basic check - will return rows if problem exists
SELECT * FROM my_table WHERE ctid IS NULL;

Recovering from Corruption

Use Point-in-Time Recovery (PITR)

# Restore from a base backup
pg_basebackup -D /var/lib/postgresql/13/main/ -U postgres -P -Xs -R

Restore from backup
```
pg_restore -d database_name backup.dump
```

In extreme cases, try to salvage data

-- Create a new table with the structure
CREATE TABLE recovered_table AS 
SELECT * FROM corrupted_table 
WHERE ctid IS NOT NULL;

Replication Issues

Replication Lag

Check replication status and lag:

-- On primary
SELECT client_addr, state, sent_lsn, write_lsn, 
       flush_lsn, replay_lsn, 
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

-- On replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

Common Replication Problems and Solutions

Replication slot issues

-- Check slots
SELECT * FROM pg_replication_slots;

-- Create a new slot
SELECT pg_create_physical_replication_slot('replica_slot_name');

-- Drop an unused slot
SELECT pg_drop_replication_slot('replica_slot_name');

WAL files accumulation

Check WAL directory size:
```
du -sh /var/lib/postgresql/13/main/pg_wal/
```
Adjust max_wal_size if needed:
```
max_wal_size = 1GB
```

Transaction ID Wraparound

Monitoring Transaction ID Usage

-- Check current transaction ID usage
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

Preventing Wraparound

Run VACUUM FREEZE before reaching critical levels:

-- Aggressively freeze old XID
VACUUM FREEZE;

-- Check tables needing freeze
SELECT schemaname, relname, n_dead_tup, last_vacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC;

Troubleshooting Workflow

The following diagram shows a systematic approach to PostgreSQL troubleshooting:

PostgreSQL Log Analysis

PostgreSQL logs are invaluable for troubleshooting. Find your log file location:

SHOW log_directory;
SHOW log_filename;

Common log levels to check:

# in postgresql.conf
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = 1000  # log queries taking more than 1 second

Common error patterns to look for:

ERROR: deadlock detected
ERROR: out of memory
FATAL: terminating connection due to administrator command
WARNING: transaction ID wraparound

Summary

Effective PostgreSQL troubleshooting requires a systematic approach:

Identify the problem - Use logs, monitoring tools, and user reports
Isolate the cause - Connection, performance, disk, corruption, or replication issues
Implement a solution - Apply appropriate fixes based on the root cause
Prevent recurrence - Update configurations, implement monitoring, and create maintenance procedures

Regular maintenance, monitoring, and a solid backup strategy are the best ways to prevent serious PostgreSQL problems.

Additional Resources

Practice Exercises

Set up a test PostgreSQL database and intentionally fill the disk to practice handling disk space issues.
Create a simple replication setup and practice troubleshooting common replication problems.
Write scripts to monitor your PostgreSQL database for common issues and send alerts.
Practice performance tuning by analyzing slow queries and implementing optimizations.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Connection Issues​

Issue: Connection Refused​

Issue: Authentication Failed​

Performance Problems​

Slow Queries​

High CPU Usage​

Disk Space Issues​

Database Size Growth​

Solutions for Disk Space Issues​

Data Corruption Issues​

Detecting Corruption​

Recovering from Corruption​

Replication Issues​

Replication Lag​

Common Replication Problems and Solutions​

Transaction ID Wraparound​

Monitoring Transaction ID Usage​

Preventing Wraparound​

Troubleshooting Workflow​

PostgreSQL Log Analysis​

Summary​

Additional Resources​

Practice Exercises​