PostgreSQL Troubleshooting
Introduction
Even the most well-configured PostgreSQL databases can encounter issues. The ability to quickly identify, diagnose, and resolve problems is an essential skill for any database administrator. This guide will walk you through common PostgreSQL problems and their solutions, providing you with practical troubleshooting techniques to keep your databases running smoothly.
Connection Issues
Connection problems are among the most common issues PostgreSQL administrators face.
Issue: Connection Refused
psql: error: could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 5432?
Potential causes and solutions:
- 
PostgreSQL service is not running Check if PostgreSQL is running: sudo systemctl status postgresqlIf not running, start it: sudo systemctl start postgresql
- 
PostgreSQL not listening on expected address Check postgresql.conffor thelisten_addressessetting:# Verify the current setting
 sudo grep listen_addresses /etc/postgresql/13/main/postgresql.conf
 # Update it if needed (to listen on all interfaces)
 listen_addresses = '*'
- 
Firewall blocking connections Allow connections through the firewall: sudo ufw allow 5432/tcp
- 
Client authentication configuration Check the pg_hba.conffile:# Example entry for local connections
 host all all 127.0.0.1/32 md5
Issue: Authentication Failed
psql: error: FATAL: password authentication failed for user "postgres"
Solutions:
- 
Verify username and password Ensure you're using the correct credentials. 
- 
Reset PostgreSQL password # Switch to postgres user
 sudo -u postgres psql
 # Change password
 ALTER USER postgres WITH PASSWORD 'newpassword';
Performance Problems
Slow Queries
Identify slow queries using the pg_stat_statements extension:
- 
Enable the extension CREATE EXTENSION pg_stat_statements;
- 
Find slow queries SELECT query, calls, total_exec_time, rows,
 100.0 * total_exec_time / sum(total_exec_time) OVER() AS percent
 FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 10;
- 
Analyze a problematic query with EXPLAIN ANALYZE EXPLAIN ANALYZE SELECT * FROM large_table WHERE non_indexed_column = 'value';
- 
Create indexes for better performance CREATE INDEX idx_large_table_column ON large_table(non_indexed_column);
High CPU Usage
If PostgreSQL is consuming excessive CPU:
- 
Check for long-running queries SELECT pid, now() - query_start AS duration, query
 FROM pg_stat_activity
 WHERE state = 'active'
 ORDER BY duration DESC;
- 
Terminate a problematic query if necessary SELECT pg_cancel_backend(pid); -- Cancel query
 SELECT pg_terminate_backend(pid); -- Terminate connection
- 
Optimize database configuration Update shared_buffersandwork_meminpostgresql.conf:shared_buffers = 1GB # 25% of RAM for dedicated servers
 work_mem = 10MB # Depends on your workload
Disk Space Issues
Database Size Growth
Monitor database and table sizes:
-- Check database sizes
SELECT pg_database.datname, 
       pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;
-- Check table sizes
SELECT relname AS table_name,
       pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_relation_size(relid) DESC;
Solutions for Disk Space Issues
- 
VACUUM to reclaim space -- Basic vacuum
 VACUUM;
 -- Aggressive vacuum
 VACUUM FULL; -- Note: Takes locks and rewrites the whole table
- 
Identify and remove bloat -- Find bloated tables
 SELECT schemaname, tablename,
 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size,
 pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size,
 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename) -
 pg_relation_size(schemaname || '.' || tablename)) as bloat_size
 FROM pg_tables
 WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
 LIMIT 10;
- 
Configure autovacuum properly # postgresql.conf settings
 autovacuum = on
 autovacuum_vacuum_threshold = 50
 autovacuum_analyze_threshold = 50
 autovacuum_vacuum_scale_factor = 0.1
 autovacuum_analyze_scale_factor = 0.05
Data Corruption Issues
Detecting Corruption
- 
Use pg_dump with verification pg_dump -Fc -v -f backup.dump database_name
- 
Check table corruption with consistency checks -- Basic check - will return rows if problem exists
 SELECT * FROM my_table WHERE ctid IS NULL;
Recovering from Corruption
- 
Use Point-in-Time Recovery (PITR) # Restore from a base backup
 pg_basebackup -D /var/lib/postgresql/13/main/ -U postgres -P -Xs -R
- 
Restore from backup pg_restore -d database_name backup.dump
- 
In extreme cases, try to salvage data -- Create a new table with the structure
 CREATE TABLE recovered_table AS
 SELECT * FROM corrupted_table
 WHERE ctid IS NOT NULL;
Replication Issues
Replication Lag
Check replication status and lag:
-- On primary
SELECT client_addr, state, sent_lsn, write_lsn, 
       flush_lsn, replay_lsn, 
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
-- On replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
Common Replication Problems and Solutions
- 
Replication slot issues -- Check slots
 SELECT * FROM pg_replication_slots;
 -- Create a new slot
 SELECT pg_create_physical_replication_slot('replica_slot_name');
 -- Drop an unused slot
 SELECT pg_drop_replication_slot('replica_slot_name');
- 
WAL files accumulation Check WAL directory size: du -sh /var/lib/postgresql/13/main/pg_wal/Adjust max_wal_sizeif needed:max_wal_size = 1GB
Transaction ID Wraparound
Monitoring Transaction ID Usage
-- Check current transaction ID usage
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
Preventing Wraparound
Run VACUUM FREEZE before reaching critical levels:
-- Aggressively freeze old XID
VACUUM FREEZE;
-- Check tables needing freeze
SELECT schemaname, relname, n_dead_tup, last_vacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC;
Troubleshooting Workflow
The following diagram shows a systematic approach to PostgreSQL troubleshooting:
PostgreSQL Log Analysis
PostgreSQL logs are invaluable for troubleshooting. Find your log file location:
SHOW log_directory;
SHOW log_filename;
Common log levels to check:
# in postgresql.conf
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = 1000  # log queries taking more than 1 second
Common error patterns to look for:
- ERROR: deadlock detected
- ERROR: out of memory
- FATAL: terminating connection due to administrator command
- WARNING: transaction ID wraparound
Summary
Effective PostgreSQL troubleshooting requires a systematic approach:
- Identify the problem - Use logs, monitoring tools, and user reports
- Isolate the cause - Connection, performance, disk, corruption, or replication issues
- Implement a solution - Apply appropriate fixes based on the root cause
- Prevent recurrence - Update configurations, implement monitoring, and create maintenance procedures
Regular maintenance, monitoring, and a solid backup strategy are the best ways to prevent serious PostgreSQL problems.
Additional Resources
Practice Exercises
- Set up a test PostgreSQL database and intentionally fill the disk to practice handling disk space issues.
- Create a simple replication setup and practice troubleshooting common replication problems.
- Write scripts to monitor your PostgreSQL database for common issues and send alerts.
- Practice performance tuning by analyzing slow queries and implementing optimizations.
💡 Found a typo or mistake? Click "Edit this page" to suggest a correction. Your feedback is greatly appreciated!