PostgreSQL Troubleshooting
Introduction
Even the most well-configured PostgreSQL databases can encounter issues. The ability to quickly identify, diagnose, and resolve problems is an essential skill for any database administrator. This guide will walk you through common PostgreSQL problems and their solutions, providing you with practical troubleshooting techniques to keep your databases running smoothly.
Connection Issues
Connection problems are among the most common issues PostgreSQL administrators face.
Issue: Connection Refused
psql: error: could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
Potential causes and solutions:
-
PostgreSQL service is not running
Check if PostgreSQL is running:
bashsudo systemctl status postgresql
If not running, start it:
bashsudo systemctl start postgresql
-
PostgreSQL not listening on expected address
Check
postgresql.conf
for thelisten_addresses
setting:bash# Verify the current setting
sudo grep listen_addresses /etc/postgresql/13/main/postgresql.conf
# Update it if needed (to listen on all interfaces)
listen_addresses = '*' -
Firewall blocking connections
Allow connections through the firewall:
bashsudo ufw allow 5432/tcp
-
Client authentication configuration
Check the
pg_hba.conf
file:bash# Example entry for local connections
host all all 127.0.0.1/32 md5
Issue: Authentication Failed
psql: error: FATAL: password authentication failed for user "postgres"
Solutions:
-
Verify username and password
Ensure you're using the correct credentials.
-
Reset PostgreSQL password
bash# Switch to postgres user
sudo -u postgres psql
# Change password
ALTER USER postgres WITH PASSWORD 'newpassword';
Performance Problems
Slow Queries
Identify slow queries using the pg_stat_statements
extension:
-
Enable the extension
sqlCREATE EXTENSION pg_stat_statements;
-
Find slow queries
sqlSELECT query, calls, total_exec_time, rows,
100.0 * total_exec_time / sum(total_exec_time) OVER() AS percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10; -
Analyze a problematic query with EXPLAIN ANALYZE
sqlEXPLAIN ANALYZE SELECT * FROM large_table WHERE non_indexed_column = 'value';
-
Create indexes for better performance
sqlCREATE INDEX idx_large_table_column ON large_table(non_indexed_column);
High CPU Usage
If PostgreSQL is consuming excessive CPU:
-
Check for long-running queries
sqlSELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC; -
Terminate a problematic query if necessary
sqlSELECT pg_cancel_backend(pid); -- Cancel query
SELECT pg_terminate_backend(pid); -- Terminate connection -
Optimize database configuration
Update
shared_buffers
andwork_mem
inpostgresql.conf
:shared_buffers = 1GB # 25% of RAM for dedicated servers
work_mem = 10MB # Depends on your workload
Disk Space Issues
Database Size Growth
Monitor database and table sizes:
-- Check database sizes
SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;
-- Check table sizes
SELECT relname AS table_name,
pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_relation_size(relid) DESC;
Solutions for Disk Space Issues
-
VACUUM to reclaim space
sql-- Basic vacuum
VACUUM;
-- Aggressive vacuum
VACUUM FULL; -- Note: Takes locks and rewrites the whole table -
Identify and remove bloat
sql-- Find bloated tables
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename) -
pg_relation_size(schemaname || '.' || tablename)) as bloat_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10; -
Configure autovacuum properly
# postgresql.conf settings
autovacuum = on
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
Data Corruption Issues
Detecting Corruption
-
Use pg_dump with verification
bashpg_dump -Fc -v -f backup.dump database_name
-
Check table corruption with consistency checks
sql-- Basic check - will return rows if problem exists
SELECT * FROM my_table WHERE ctid IS NULL;
Recovering from Corruption
-
Use Point-in-Time Recovery (PITR)
bash# Restore from a base backup
pg_basebackup -D /var/lib/postgresql/13/main/ -U postgres -P -Xs -R -
Restore from backup
bashpg_restore -d database_name backup.dump
-
In extreme cases, try to salvage data
sql-- Create a new table with the structure
CREATE TABLE recovered_table AS
SELECT * FROM corrupted_table
WHERE ctid IS NOT NULL;
Replication Issues
Replication Lag
Check replication status and lag:
-- On primary
SELECT client_addr, state, sent_lsn, write_lsn,
flush_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
-- On replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
Common Replication Problems and Solutions
-
Replication slot issues
sql-- Check slots
SELECT * FROM pg_replication_slots;
-- Create a new slot
SELECT pg_create_physical_replication_slot('replica_slot_name');
-- Drop an unused slot
SELECT pg_drop_replication_slot('replica_slot_name'); -
WAL files accumulation
Check WAL directory size:
bashdu -sh /var/lib/postgresql/13/main/pg_wal/
Adjust
max_wal_size
if needed:max_wal_size = 1GB
Transaction ID Wraparound
Monitoring Transaction ID Usage
-- Check current transaction ID usage
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
Preventing Wraparound
Run VACUUM FREEZE before reaching critical levels:
-- Aggressively freeze old XID
VACUUM FREEZE;
-- Check tables needing freeze
SELECT schemaname, relname, n_dead_tup, last_vacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC;
Troubleshooting Workflow
The following diagram shows a systematic approach to PostgreSQL troubleshooting:
PostgreSQL Log Analysis
PostgreSQL logs are invaluable for troubleshooting. Find your log file location:
SHOW log_directory;
SHOW log_filename;
Common log levels to check:
# in postgresql.conf
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = 1000 # log queries taking more than 1 second
Common error patterns to look for:
ERROR: deadlock detected
ERROR: out of memory
FATAL: terminating connection due to administrator command
WARNING: transaction ID wraparound
Summary
Effective PostgreSQL troubleshooting requires a systematic approach:
- Identify the problem - Use logs, monitoring tools, and user reports
- Isolate the cause - Connection, performance, disk, corruption, or replication issues
- Implement a solution - Apply appropriate fixes based on the root cause
- Prevent recurrence - Update configurations, implement monitoring, and create maintenance procedures
Regular maintenance, monitoring, and a solid backup strategy are the best ways to prevent serious PostgreSQL problems.
Additional Resources
Practice Exercises
- Set up a test PostgreSQL database and intentionally fill the disk to practice handling disk space issues.
- Create a simple replication setup and practice troubleshooting common replication problems.
- Write scripts to monitor your PostgreSQL database for common issues and send alerts.
- Practice performance tuning by analyzing slow queries and implementing optimizations.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)