PostgreSQL Point-in-Time Recovery

Introduction

Database disasters happen—whether from hardware failures, human errors, or software bugs. As your applications grow in complexity and importance, so does the value of your data. PostgreSQL Point-in-Time Recovery (PITR) is an advanced feature that allows you to restore your database to any specific moment in time, rather than just to the state of your last full backup.

In this guide, we'll explore how PITR works in PostgreSQL, why it's crucial for production environments, and how to implement it step by step. By the end, you'll understand how to protect your data against various failure scenarios and recover with minimal data loss.

What is Point-in-Time Recovery?

Point-in-Time Recovery combines base backups with continuous archiving of Write-Ahead Log (WAL) files to enable restoration of a PostgreSQL database to any point in time since the base backup was taken.

The system works through two key components:

Base Backup: A snapshot of the database files at a specific moment
WAL Archiving: Continuous recording of all changes made to the database after the base backup

When implementing PITR, PostgreSQL records every transaction in WAL files, which can later be replayed to recreate the exact state of the database at any desired point.

Why Use Point-in-Time Recovery?

PITR provides several critical benefits:

Precise Recovery: Restore to any specific second in time, not just to your last backup
Minimal Data Loss: Recover right up to the moment before a disaster occurred
Protection Against Various Failures:
- Accidental data deletion or corruption
- Failed application deployments
- Human errors (like dropping the wrong table)
- Hardware failures

Setting Up PITR in PostgreSQL

Let's walk through setting up PITR in your PostgreSQL environment.

Prerequisites

PostgreSQL server (9.6 or newer recommended)
Sufficient disk space for WAL archives
Administrative access to the database server

Step 1: Configure PostgreSQL for WAL Archiving

First, we need to modify PostgreSQL's configuration to enable WAL archiving. Edit your postgresql.conf file:

# Enable WAL archiving
wal_level = replica                 # or 'logical' for PostgreSQL 10+
archive_mode = on                   # enables archiving
archive_command = 'cp %p /path/to/archive/%f'  # command to archive a WAL file
archive_timeout = 60                # maximum time between WAL switches

The archive_command is a shell command that copies WAL files to your archive location. This can be a local directory, a network share, or a cloud storage solution.

Step 2: Set Up a Base Backup

Next, create a base backup that will serve as the starting point for recovery:

pg_basebackup -D /path/to/backup -Ft -z -P

This command creates a compressed tar file of your PostgreSQL data directory. The options mean:

-D: Destination directory
-Ft: Output in tar format
-z: Compress with gzip
-P: Show progress

Step 3: Create a Recovery Configuration

For PostgreSQL 11 and earlier, create a recovery.conf file in your data directory. For PostgreSQL 12 and later, create a postgresql.auto.conf or use the standard postgresql.conf file, and create a recovery.signal file to indicate recovery mode.

For PostgreSQL 12+:

# In postgresql.conf or postgresql.auto.conf
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_time = '2023-04-15 14:30:00'

For PostgreSQL 11 and earlier:

# In recovery.conf
restore_command = 'cp /path/to/archive/%f %p'
recovery_target_time = '2023-04-15 14:30:00'

The restore_command tells PostgreSQL how to fetch archived WAL files during recovery.

Step 4: Perform the Recovery

To perform a recovery:

Stop the PostgreSQL server if it's running:
```
pg_ctl stop -D /path/to/data
```
If needed, clear or move the existing data directory and extract your base backup:
```
rm -rf /path/to/data/*
tar -xzf /path/to/backup/base.tar.gz -C /path/to/data
```
For PostgreSQL 12+, create the recovery signal file:
```
touch /path/to/data/recovery.signal
```
Start the server to begin recovery:
```
pg_ctl start -D /path/to/data
```

PostgreSQL will begin recovering by first restoring the base backup and then applying WAL files up to the specified recovery target time.

Real-World Example: Recovering from Accidental Table Deletion

Let's walk through a complete example of using PITR to recover from an accidental table deletion.

Scenario

You have a production database with continuous archiving enabled
A developer accidentally runs DROP TABLE important_customers; at 2:45 PM
You need to recover the database to 2:44 PM, just before the mistake

Solution

Stop the database server:
```
sudo systemctl stop postgresql
```

Move or rename the current data directory:

mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main_old

Restore the most recent base backup:

mkdir /var/lib/postgresql/14/main
tar -xzf /backup/base_backup_latest.tar.gz -C /var/lib/postgresql/14/main
chown -R postgres:postgres /var/lib/postgresql/14/main

Create recovery configuration (PostgreSQL 14):

# Edit postgresql.auto.conf
echo "restore_command = 'cp /path/to/archive/%f %p'" >> /var/lib/postgresql/14/main/postgresql.auto.conf
echo "recovery_target_time = '2023-04-15 14:44:00'" >> /var/lib/postgresql/14/main/postgresql.auto.conf
echo "recovery_target_action = 'promote'" >> /var/lib/postgresql/14/main/postgresql.auto.conf

# Create recovery signal file
touch /var/lib/postgresql/14/main/recovery.signal
chown postgres:postgres /var/lib/postgresql/14/main/recovery.signal

Start the server to begin recovery:
```
sudo systemctl start postgresql
```
Monitor the recovery process:
```
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
```
When this returns f (false), the recovery is complete.

Verify that the important_customers table exists and has the correct data:

sudo -u postgres psql -d production -c "SELECT count(*) FROM important_customers;"

Best Practices for PITR

To ensure successful recovery when needed:

Test Recovery Regularly: Simulate disasters and practice recovery to verify your process works

Monitor Archiving: Ensure WAL files are being archived correctly

# Check for archive failures
grep -i "archive command failed" /var/log/postgresql/postgresql.log

Secure Your Archives: Protect WAL archives with appropriate permissions and consider encryption
Maintain Adequate Storage: Monitor archive storage and ensure sufficient space
Document Recovery Procedures: Create clear, step-by-step recovery documentation for your team

Common Issues and Troubleshooting

Archive Command Failures

If your archive command fails, WAL files won't be archived properly. Check:

# Verify archive command works manually
cp postgresql_wal_file /path/to/archive/postgresql_wal_file

Ensure the destination directory exists and has proper permissions.

Missing WAL Files During Recovery

If you see errors about missing WAL files during recovery:

FATAL:  could not find file "00000001000000A3000000FF": No such file or directory

This usually means there's a gap in your WAL archive. You might need to:

Use an older base backup
Adjust your recovery target time to before the gap
Restore missing WAL files from another source if available

Performance Considerations

Archiving WAL files can impact performance. Consider:

Using a dedicated disk for WAL files
Tuning archive_timeout based on your recovery point objective (RPO)
Using compression to reduce storage requirements

Summary

PostgreSQL Point-in-Time Recovery is a powerful feature that enables precise database restoration, minimizing data loss in disaster scenarios. By combining base backups with continuous WAL archiving, you can recover your database to any specific moment in time.

We've covered:

The concepts behind PITR
Setting up WAL archiving
Creating base backups
Configuring and performing recovery
A real-world recovery example
Best practices and troubleshooting

Implementing PITR should be a standard practice for any production PostgreSQL database where data loss is unacceptable.

Additional Resources

Exercises

Set up WAL archiving on a test PostgreSQL instance
Create a base backup of your test database
Make some changes to your database, then deliberately "crash" it
Restore the database to a specific point in time before the crash
Create a shell script to automate the backup and archiving process
Experiment with different archive_command options, including sending WAL files to remote storage

Remember that mastering Point-in-Time Recovery is an essential skill for database administrators and developers working with mission-critical data.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

What is Point-in-Time Recovery?​

Why Use Point-in-Time Recovery?​

Setting Up PITR in PostgreSQL​

Prerequisites​

Step 1: Configure PostgreSQL for WAL Archiving​

Step 2: Set Up a Base Backup​

Step 3: Create a Recovery Configuration​

Step 4: Perform the Recovery​

Real-World Example: Recovering from Accidental Table Deletion​

Scenario​

Solution​

Best Practices for PITR​

Common Issues and Troubleshooting​

Archive Command Failures​

Missing WAL Files During Recovery​

Performance Considerations​

Summary​

Additional Resources​

Exercises​