PostgreSQL Backup Strategies
Database backups are a critical aspect of any production system. In this guide, we'll explore various PostgreSQL backup strategies to help you implement a robust data protection plan for your databases.
Introduction
No matter how reliable your hardware or how skilled your team, data loss can happen to anyone. Hardware failures, human errors, software bugs, or even natural disasters can compromise your data. Having a solid backup strategy isn't just good practice—it's essential for business continuity and data protection.
PostgreSQL offers several built-in tools and approaches for creating backups, each with its own advantages and use cases. Let's explore these options to help you choose the right strategy for your needs.
Understanding Backup Types
Before diving into specific tools, let's understand the main types of PostgreSQL backups:
Logical vs Physical Backups
Logical backups extract database content into a script file or other format that contains SQL commands. These backups are portable, flexible, and version-independent but generally slower for large databases.
Physical backups (also called binary backups) copy the raw database files directly. They're typically faster and produce exact replicas of your database but are often version-specific and less flexible.
Full vs Incremental Backups
Full backups contain your entire database and can be restored independently.
Incremental backups only contain data that has changed since the previous backup, making them faster and smaller but dependent on previous backups for restoration.
Backup Methods in PostgreSQL
Let's explore the primary tools and methods for backing up PostgreSQL databases.
1. pg_dump: Logical Backups of a Single Database
pg_dump is PostgreSQL's built-in utility for creating logical backups of a single database. It generates a file containing SQL commands that, when executed, recreate the database in the state it was in at the time of the backup.
Basic Usage
pg_dump -U username -d database_name -f backup_file.sql
Example with Options
# Create a compressed backup in custom format
pg_dump -U postgres -d myapp_production -F c -f myapp_backup.dump
# Create a plain SQL backup excluding the large_data_table
pg_dump -U postgres -d myapp_production -T large_data_table -f myapp_backup.sql
Key Features and Options
- Format options (-F):
  - Plain text (-F p, the default): a script of SQL commands
  - Custom (-F c): compressed, PostgreSQL-specific format that supports selective restores
  - Directory (-F d): one file per table; supports parallel dump and restore
  - Tar (-F t): tar archive format
- Compression (-Z): specify the compression level (0-9)
- Table inclusion/exclusion (-t/-T): specify tables to include or exclude
- Schema inclusion/exclusion (-n/-N): specify schemas to include or exclude
- Data-only or schema-only (-a/-s): back up just the data or just the schema
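The practical payoff of the custom format is selective restore. A hedged sketch of the workflow (file, database, and table names are placeholders; requires a PostgreSQL installation and an existing custom-format dump):

```shell
# List the contents of a custom-format dump as a table-of-contents file
pg_restore -l myapp_backup.dump > backup_contents.txt

# Edit backup_contents.txt to keep only the entries you want, then
# restore just those objects into an existing database
pg_restore -U postgres -d myapp_production -L backup_contents.txt myapp_backup.dump

# Or restore a single table directly
pg_restore -U postgres -d myapp_production -t customers myapp_backup.dump
```

This is also why custom format is usually preferred over plain SQL for anything but trivial databases: you can pick what to restore after the fact.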
2. pg_dumpall: Backup Multiple Databases
While pg_dump handles a single database, pg_dumpall can back up an entire PostgreSQL cluster, including all databases and global objects such as roles and tablespaces.
# Backup all databases and global objects
pg_dumpall -U postgres -f full_backup.sql
# Backup only global objects (roles, tablespaces)
pg_dumpall -U postgres --globals-only -f globals_backup.sql
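Because pg_dumpall writes plain SQL, you restore it with psql rather than pg_restore. A sketch, assuming the file names from the commands above:

```shell
# Restore a full-cluster dump; connect to the postgres maintenance database,
# the script itself creates and reconnects to each database it restores
psql -U postgres -f full_backup.sql postgres

# Restore only the global objects (roles, tablespaces)
psql -U postgres -f globals_backup.sql postgres
```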
3. Continuous Archiving & Point-in-Time Recovery (PITR)
For production systems requiring minimal data loss, PostgreSQL offers Continuous Archiving using Write-Ahead Log (WAL) files, enabling Point-in-Time Recovery.
How it Works
- PostgreSQL continuously records all changes in WAL files
- These WAL files are archived to a safe location
- Periodically, a base backup (physical backup) is taken
- Recovery combines the base backup with WAL files to restore to any point in time
Setting Up WAL Archiving
First, configure your postgresql.conf file:
# Enable WAL archiving
wal_level = replica
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'
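The bare cp command above will happily overwrite an already-archived segment. A slightly safer pattern is a small wrapper script that fails on duplicates, so PostgreSQL keeps the segment and retries instead. This is a sketch for illustration; the /tmp paths and the ARCHIVE_DIR variable are placeholders, not a production layout:

```shell
# Hypothetical archive_command wrapper; wire it up in postgresql.conf as:
#   archive_command = '/tmp/archive_wal.sh %p %f'
cat > /tmp/archive_wal.sh << 'EOF'
#!/bin/bash
SRC="$1"                                    # %p: path to the WAL segment
DEST="${ARCHIVE_DIR:-/tmp/wal_archive}/$2"  # %f: segment file name
mkdir -p "$(dirname "$DEST")"
# A non-zero exit makes PostgreSQL keep the segment and retry later,
# rather than silently overwriting an already-archived file.
[ -f "$DEST" ] && exit 1
cp "$SRC" "$DEST"
EOF
chmod +x /tmp/archive_wal.sh
```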
Then restart PostgreSQL (enabling archive_mode requires a restart) and take a base backup:
# Create a base backup
pg_basebackup -D /backup/base/directory -Ft -z
Recovery Process
To recover using PITR:
- Create a recovery configuration in your data directory. On PostgreSQL 11 and earlier this goes in a recovery.conf file; on PostgreSQL 12 and later, add the same settings to postgresql.conf and create an empty recovery.signal file:
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_time = '2023-04-15 10:30:00'
- Start PostgreSQL, which will replay archived WAL and recover to the specified point in time
4. File System Level Backup
File system backups involve copying PostgreSQL's data directory directly. This method requires that PostgreSQL be shut down or put in backup mode to ensure consistency.
# First, stop PostgreSQL (or put the server in backup mode with
# pg_backup_start()/pg_backup_stop(); pg_start_backup() before version 15)
sudo systemctl stop postgresql
# Copy the data directory
sudo cp -R /var/lib/postgresql/data /backup/location
# Restart PostgreSQL
sudo systemctl start postgresql
5. Logical Replication for Online Backups
Logical replication allows you to replicate specific tables or databases to another server, which can serve as a hot backup.
-- On the publisher (source) server
CREATE PUBLICATION my_publication FOR TABLE customers, orders;
-- On the subscriber (backup) server
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher dbname=mydb user=postgres'
PUBLICATION my_publication;
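Once the subscription is created, it is worth confirming that data is actually flowing before relying on the replica as a backup. A sketch, assuming psql access to both servers and the database name from the example above:

```shell
# On the subscriber: check the subscription worker and the last LSN received
psql -U postgres -d mydb \
  -c "SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;"

# On the publisher: confirm a walsender process is serving the subscription
psql -U postgres -d mydb \
  -c "SELECT application_name, state FROM pg_stat_replication;"
```

Keep in mind that logical replication copies row changes, not DDL: schema changes must be applied to the subscriber separately.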
Implementing a Comprehensive Backup Strategy
A robust backup strategy typically combines multiple methods. Here's an example approach:
Example Implementation Plan
- Daily Logical Backups
# Create a backup script
cat > /etc/cron.daily/pg_backup.sh << 'EOF'
#!/bin/bash
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/var/backups/postgresql"
DB_USER="postgres"
DB_NAME="myapp_production"
# Create backup directory if it doesn't exist
mkdir -p ${BACKUP_DIR}
# Perform the backup
pg_dump -U ${DB_USER} -d ${DB_NAME} -F c -f ${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.dump
# Delete backups older than 30 days
find ${BACKUP_DIR} -name "*.dump" -type f -mtime +30 -delete
EOF
# Make it executable
chmod +x /etc/cron.daily/pg_backup.sh
- Continuous WAL Archiving
Edit postgresql.conf:
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/archive/%f && cp %p /var/lib/postgresql/archive/%f'
- Weekly Base Backup
# Create a weekly backup script
cat > /etc/cron.weekly/pg_base_backup.sh << 'EOF'
#!/bin/bash
TIMESTAMP=$(date +%Y%m%d)
BACKUP_DIR="/var/backups/postgresql/basebackups"
# Create backup directory if it doesn't exist
mkdir -p ${BACKUP_DIR}
# Perform base backup
pg_basebackup -D ${BACKUP_DIR}/${TIMESTAMP} -Ft -z -U postgres
# Delete backups older than 8 weeks
find ${BACKUP_DIR} -mindepth 1 -maxdepth 1 -type d -mtime +56 -exec rm -rf {} +
EOF
# Make it executable
chmod +x /etc/cron.weekly/pg_base_backup.sh
Backup Verification and Testing
A backup is only as good as your ability to restore from it. Regular testing of your backups is crucial.
# Script to test pg_dump restoration
cat > test_restore.sh << 'EOF'
#!/bin/bash
BACKUP_FILE=$1
TEST_DB="restore_test"
# Drop test database if it exists
dropdb --if-exists -U postgres ${TEST_DB}
# Create test database
createdb -U postgres ${TEST_DB}
# Restore backup
pg_restore -U postgres -d ${TEST_DB} ${BACKUP_FILE}
# Run simple test query
psql -U postgres -d ${TEST_DB} -c "SELECT COUNT(*) FROM users;"
# Clean up
dropdb -U postgres ${TEST_DB}
EOF
chmod +x test_restore.sh
Monitoring Backup Operations
Implement monitoring to ensure your backups complete successfully:
# Script to check backup status
cat > check_backup.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/var/backups/postgresql"
ALERT_EMAIL="[email protected]"
# Check if any backups were created in the last 24 hours
if [ $(find ${BACKUP_DIR} -name "*.dump" -type f -mtime -1 | wc -l) -eq 0 ]; then
echo "WARNING: No PostgreSQL backups found in the last 24 hours!" | mail -s "Backup Alert" ${ALERT_EMAIL}
exit 1
fi
# Check backup size
LATEST_BACKUP=$(find ${BACKUP_DIR} -name "*.dump" -type f -mtime -1 | sort | tail -1)
if [ -f "${LATEST_BACKUP}" ]; then
SIZE=$(du -k "${LATEST_BACKUP}" | cut -f1)
if [ ${SIZE} -lt 1000 ]; then # Less than 1MB
echo "WARNING: Latest backup file is suspiciously small (${SIZE}KB)!" | mail -s "Backup Alert" ${ALERT_EMAIL}
exit 1
fi
fi
exit 0
EOF
chmod +x check_backup.sh
Best Practices for PostgreSQL Backups
- Diversify backup methods: Combine logical and physical backups
- Follow the 3-2-1 rule: Keep 3 copies, on 2 different media, with 1 offsite
- Automate everything: Backup creation, testing, and cleanup
- Document procedures: Include both backup and restoration procedures
- Monitor and alert: Be notified of backup failures
- Test restores regularly: Schedule monthly restore drills
- Balance backup frequency with impact: Consider performance impacts on production
- Encrypt sensitive backups: Protect data in transit and at rest
- Version control backup scripts: Track changes to backup procedures
- Backup configuration too: Save postgresql.conf and pg_hba.conf
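For the encryption point above, one simple approach is symmetric encryption with OpenSSL after the dump completes. This is a sketch using throwaway /tmp files in place of a real backup and a properly protected key file:

```shell
# Throwaway key and stand-in dump file, for demonstration only
echo "replace-with-real-key-material" > /tmp/pg_backup.key
echo "pretend dump contents" > /tmp/myapp_backup.dump

# Encrypt the backup (AES-256-CBC with PBKDF2 key derivation)
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in /tmp/myapp_backup.dump -out /tmp/myapp_backup.dump.enc \
  -pass file:/tmp/pg_backup.key

# Decrypt before restoring
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in /tmp/myapp_backup.dump.enc -out /tmp/myapp_backup.decrypted \
  -pass file:/tmp/pg_backup.key
```

In a real pipeline you would pipe pg_dump output straight into the encryption step and store the key outside the backup location, readable only by the backup user.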
Restoration Procedures
Knowing how to restore is as important as taking backups. Here are the basic restoration commands:
Restoring a pg_dump Backup
# For SQL format
psql -U postgres -d database_name -f backup_file.sql
# For custom format
pg_restore -U postgres -d database_name backup_file.dump
# For directory format
pg_restore -U postgres -d database_name -j 4 backup_directory
Restoring from a Base Backup with WAL
# Stop PostgreSQL
sudo systemctl stop postgresql
# Clear data directory
sudo rm -rf /var/lib/postgresql/data/*
# Restore base backup
sudo tar -xzf /path/to/base_backup.tar.gz -C /var/lib/postgresql/data
# Configure recovery (PostgreSQL 12+: put these lines in postgresql.conf and
# create an empty recovery.signal file; PostgreSQL 11 and earlier: recovery.conf)
cat > /var/lib/postgresql/data/recovery.conf << EOF
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_time = '2023-04-15 10:30:00'
EOF
# Fix permissions
sudo chown -R postgres:postgres /var/lib/postgresql/data
sudo chmod 700 /var/lib/postgresql/data
# Start PostgreSQL
sudo systemctl start postgresql
Summary
PostgreSQL offers multiple backup strategies to meet different recovery needs:
- pg_dump/pg_dumpall: Logical backups suitable for smaller databases and when portability matters
- Continuous Archiving (PITR): For minimal data loss in production environments
- File System Backups: Simple but requires database downtime
- Replication: For high-availability and online backups
A comprehensive backup strategy should include a combination of these methods, regular testing, and proper monitoring. By implementing the practices outlined in this guide, you'll be well-prepared to recover from potential data loss scenarios.
Additional Resources and Exercises
Exercises
- Exercise 1: Set up daily pg_dump backups for a test database
- Create a simple database with a few tables
- Write a script to automate backups
- Test the restoration process
- Exercise 2: Configure WAL archiving
- Modify your PostgreSQL configuration to enable archiving
- Test recovery to a specific point in time
- Exercise 3: Design a complete backup strategy
- Based on a scenario (e.g., e-commerce site with 100GB database)
- Include backup frequency, retention, and testing procedures
- Document the restoration process for different failure scenarios