Skip to main content

PostgreSQL Point-in-Time Recovery

Introduction

Database disasters happen—whether from hardware failures, human errors, or software bugs. As your applications grow in complexity and importance, so does the value of your data. PostgreSQL Point-in-Time Recovery (PITR) is an advanced feature that allows you to restore your database to any specific moment in time, rather than just to the state of your last full backup.

In this guide, we'll explore how PITR works in PostgreSQL, why it's crucial for production environments, and how to implement it step by step. By the end, you'll understand how to protect your data against various failure scenarios and recover with minimal data loss.

What is Point-in-Time Recovery?

Point-in-Time Recovery combines base backups with continuous archiving of Write-Ahead Log (WAL) files to enable restoration of a PostgreSQL database to any point in time since the base backup was taken.

The system works through two key components:

  1. Base Backup: A snapshot of the database files at a specific moment
  2. WAL Archiving: Continuous recording of all changes made to the database after the base backup

When implementing PITR, PostgreSQL records every transaction in WAL files, which can later be replayed to recreate the exact state of the database at any desired point.

Why Use Point-in-Time Recovery?

PITR provides several critical benefits:

  • Precise Recovery: Restore to any specific second in time, not just to your last backup
  • Minimal Data Loss: Recover right up to the moment before a disaster occurred
  • Protection Against Various Failures:
    • Accidental data deletion or corruption
    • Failed application deployments
    • Human errors (like dropping the wrong table)
    • Hardware failures

Setting Up PITR in PostgreSQL

Let's walk through setting up PITR in your PostgreSQL environment.

Prerequisites

  • PostgreSQL server (9.6 or newer recommended)
  • Sufficient disk space for WAL archives
  • Administrative access to the database server

Step 1: Configure PostgreSQL for WAL Archiving

First, we need to modify PostgreSQL's configuration to enable WAL archiving. Edit your postgresql.conf file:

sql
# Enable WAL archiving
wal_level = replica # or 'logical' for PostgreSQL 10+
archive_mode = on # enables archiving
archive_command = 'cp %p /path/to/archive/%f' # command to archive a WAL file
archive_timeout = 60 # maximum time between WAL switches

The archive_command is a shell command that copies WAL files to your archive location. This can be a local directory, a network share, or a cloud storage solution.

Step 2: Set Up a Base Backup

Next, create a base backup that will serve as the starting point for recovery:

bash
pg_basebackup -D /path/to/backup -Ft -z -P

This command creates a compressed tar file of your PostgreSQL data directory. The options mean:

  • -D: Destination directory
  • -Ft: Output in tar format
  • -z: Compress with gzip
  • -P: Show progress

Step 3: Create a Recovery Configuration

For PostgreSQL 11 and earlier, create a recovery.conf file in your data directory. For PostgreSQL 12 and later, create a postgresql.auto.conf or use the standard postgresql.conf file, and create a recovery.signal file to indicate recovery mode.

For PostgreSQL 12+:

sql
# In postgresql.conf or postgresql.auto.conf
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_time = '2023-04-15 14:30:00'

For PostgreSQL 11 and earlier:

# In recovery.conf
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_time = '2023-04-15 14:30:00'

The restore_command tells PostgreSQL how to fetch archived WAL files during recovery.

Step 4: Perform the Recovery

To perform a recovery:

  1. Stop the PostgreSQL server if it's running:

    bash
    pg_ctl stop -D /path/to/data
  2. If needed, clear or move the existing data directory and extract your base backup:

    bash
    rm -rf /path/to/data/*
    tar -xzf /path/to/backup/base.tar.gz -C /path/to/data
  3. For PostgreSQL 12+, create the recovery signal file:

    bash
    touch /path/to/data/recovery.signal
  4. Start the server to begin recovery:

    bash
    pg_ctl start -D /path/to/data

PostgreSQL will begin recovering by first restoring the base backup and then applying WAL files up to the specified recovery target time.

Real-World Example: Recovering from Accidental Table Deletion

Let's walk through a complete example of using PITR to recover from an accidental table deletion.

Scenario

  • You have a production database with continuous archiving enabled
  • A developer accidentally runs DROP TABLE important_customers; at 2:45 PM
  • You need to recover the database to 2:44 PM, just before the mistake

Solution

  1. Stop the database server:

    bash
    sudo systemctl stop postgresql
  2. Move or rename the current data directory:

    bash
    mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_old
  3. Restore the most recent base backup:

    bash
    mkdir /var/lib/postgresql/14/main
    tar -xzf /backup/base_backup_latest.tar.gz -C /var/lib/postgresql/14/main
    chown -R postgres:postgres /var/lib/postgresql/14/main
  4. Create recovery configuration (PostgreSQL 14):

    bash
    # Edit postgresql.auto.conf
    echo "restore_command = 'cp /path/to/archive/%f %p'" >> /var/lib/postgresql/14/main/postgresql.auto.conf
    echo "recovery_target_time = '2023-04-15 14:44:00'" >> /var/lib/postgresql/14/main/postgresql.auto.conf
    echo "recovery_target_action = 'promote'" >> /var/lib/postgresql/14/main/postgresql.auto.conf

    # Create recovery signal file
    touch /var/lib/postgresql/14/main/recovery.signal
    chown postgres:postgres /var/lib/postgresql/14/main/recovery.signal
  5. Start the server to begin recovery:

    bash
    sudo systemctl start postgresql
  6. Monitor the recovery process:

    bash
    sudo -u postgres psql -c "SELECT pg_is_in_recovery();"

    When this returns f (false), the recovery is complete.

  7. Verify that the important_customers table exists and has the correct data:

    bash
    sudo -u postgres psql -d production -c "SELECT count(*) FROM important_customers;"

Best Practices for PITR

To ensure successful recovery when needed:

  1. Test Recovery Regularly: Simulate disasters and practice recovery to verify your process works

  2. Monitor Archiving: Ensure WAL files are being archived correctly

    bash
    # Check for archive failures
    grep -i "archive command failed" /var/log/postgresql/postgresql.log
  3. Secure Your Archives: Protect WAL archives with appropriate permissions and consider encryption

  4. Maintain Adequate Storage: Monitor archive storage and ensure sufficient space

  5. Document Recovery Procedures: Create clear, step-by-step recovery documentation for your team

Common Issues and Troubleshooting

Archive Command Failures

If your archive command fails, WAL files won't be archived properly. Check:

bash
# Verify archive command works manually
cp postgresql_wal_file /path/to/archive/postgresql_wal_file

Ensure the destination directory exists and has proper permissions.

Missing WAL Files During Recovery

If you see errors about missing WAL files during recovery:

FATAL:  could not find file "00000001000000A3000000FF": No such file or directory

This usually means there's a gap in your WAL archive. You might need to:

  • Use an older base backup
  • Adjust your recovery target time to before the gap
  • Restore missing WAL files from another source if available

Performance Considerations

Archiving WAL files can impact performance. Consider:

  • Using a dedicated disk for WAL files
  • Tuning archive_timeout based on your recovery point objective (RPO)
  • Using compression to reduce storage requirements

Summary

PostgreSQL Point-in-Time Recovery is a powerful feature that enables precise database restoration, minimizing data loss in disaster scenarios. By combining base backups with continuous WAL archiving, you can recover your database to any specific moment in time.

We've covered:

  • The concepts behind PITR
  • Setting up WAL archiving
  • Creating base backups
  • Configuring and performing recovery
  • A real-world recovery example
  • Best practices and troubleshooting

Implementing PITR should be a standard practice for any production PostgreSQL database where data loss is unacceptable.

Additional Resources

Exercises

  1. Set up WAL archiving on a test PostgreSQL instance
  2. Create a base backup of your test database
  3. Make some changes to your database, then deliberately "crash" it
  4. Restore the database to a specific point in time before the crash
  5. Create a shell script to automate the backup and archiving process
  6. Experiment with different archive_command options, including sending WAL files to remote storage

Remember that mastering Point-in-Time Recovery is an essential skill for database administrators and developers working with mission-critical data.



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)