PostgreSQL pg_restore

Introduction

In PostgreSQL's suite of database management tools, pg_restore stands as a powerful utility designed to restore PostgreSQL databases from archive files created by pg_dump. This tool is essential for database administrators and developers who need to recover data, migrate databases between environments, or set up testing scenarios with production-like data.

Unlike simple SQL script restoration, pg_restore provides advanced capabilities for handling custom-format and directory-format archives, offering features like parallel processing, selective restoration, and flexible ordering of restored objects. This makes it particularly valuable for managing large databases efficiently.

Understanding pg_restore

What is pg_restore?

pg_restore is a PostgreSQL utility that reads archive files (created by pg_dump using custom or directory format) and restores them to a PostgreSQL database. It's particularly useful because it can:

Selectively restore specific database objects
Reorder restoration sequences
Process multiple jobs in parallel
Handle large archives efficiently

When to Use pg_restore

You should use pg_restore when:

You have archives in custom or directory format created by pg_dump
You need to restore only specific objects from a backup
You want to leverage parallel processing for faster restoration
You need to modify objects during restoration (e.g., change ownership)

For plain-text SQL script backups, you would use psql instead.

Basic Usage

The simplest form of the pg_restore command looks like this:

pg_restore -d target_database archive_file

For example, to restore an archive named mydb_backup.dump to a database named mydb:

pg_restore -d mydb mydb_backup.dump

Common Options

Here are some commonly used options with pg_restore:

Option	Description
`-d dbname`	Connect to database `dbname`
`-h host`	Database server host
`-p port`	Database server port
`-U username`	Connect as specified user
`-W`	Force password prompt
`-c`	Clean (drop) database objects before recreating
`-C`	Create the database before restoring
`-e`	Exit if an error occurs (default is to continue)
`-j jobs`	Use this many parallel jobs for restore
`-v`	Verbose mode
`-n schema`	Restore only objects in this schema
`-t table`	Restore only this table

Step-by-Step Examples

Let's walk through some practical examples of using pg_restore in different scenarios.

Example 1: Basic Full Database Restore

Let's restore a complete database from a custom-format backup:

pg_restore -d target_db -v backup_file.dump

The output would look something like:

pg_restore: connecting to database "target_db"
pg_restore: creating SCHEMA "public"
pg_restore: creating TABLE "public.users"
pg_restore: creating TABLE "public.products"
pg_restore: creating INDEX "users_email_idx"
pg_restore: creating CONSTRAINT "products_pkey"
pg_restore: setting owner and privileges for TABLE "public.users"
pg_restore: setting owner and privileges for TABLE "public.products"

Example 2: Restore with Cleanup

In this example, we'll clean (drop) existing objects before restoring:

pg_restore -c -d target_db backup_file.dump

This is particularly useful when you want to ensure that the resulting database matches exactly what's in the backup, without any leftover objects.

Example 3: Parallel Restoration for Large Databases

For large databases, you can significantly speed up the restoration process by using parallel jobs:

pg_restore -d target_db -j 4 backup_file.dump

This command uses 4 parallel jobs to restore the database, which can be much faster on multi-core systems with sufficient I/O capacity.

Example 4: Selective Restoration

One of the most powerful features of pg_restore is the ability to selectively restore objects:

# Restore only the 'customers' table
pg_restore -d target_db -t customers backup_file.dump

# Restore only objects in the 'sales' schema
pg_restore -d target_db -n sales backup_file.dump

Example 5: Create Database During Restoration

If the target database doesn't exist yet, you can create it during restoration:

pg_restore -C -d postgres backup_file.dump

This creates the database with the same name as in the backup before restoring its contents. Note that the -d postgres part connects to the default postgres database first.

Advanced Usage

List Contents of an Archive

Before restoring, you might want to see what's in the archive:

pg_restore -l backup_file.dump > backup_contents.lst

This creates a list file that shows all the objects in the archive.

Modifying the Restoration Order

You can edit the list file to:

Remove objects you don't want to restore
Change the order of restoration

Then use it with the -L option:

pg_restore -d target_db -L backup_contents.lst backup_file.dump

Handling Errors

By default, pg_restore continues even if errors occur. To make it stop at the first error:

pg_restore -d target_db -e backup_file.dump

Changing Object Ownership

You can change the ownership of objects during restoration:

pg_restore -d target_db --no-owner backup_file.dump

This command restores objects without setting ownership, so they'll be owned by the user running the restore.

Understanding the Restoration Process

When pg_restore runs, it processes the backup in several phases:

Pre-data Phase: Creates schemas, tables, functions, etc.
Data Phase: Loads the actual data into tables
Post-data Phase: Creates indexes, triggers, constraints, etc.
Final Phase: Sets permissions and ownership

This staged approach optimizes the restoration process, especially for large databases.

Common Issues and Troubleshooting

Error: Database Already Exists

pg_restore: [archiver] Error while PROCESSING TOC:
pg_restore: [archiver] Error from TOC entry 3; 1262 16384 DATABASE mydb postgres
pg_restore: [archiver] could not execute query: ERROR:  database "mydb" already exists

Solution: Use the -c option to drop objects first, or manually drop the database before restoring.

Permissions Issues

pg_restore: [archiver] could not execute query: ERROR:  permission denied for schema public

Solution: Ensure the user performing the restoration has sufficient permissions, or use --no-owner to avoid permission issues.

Version Compatibility

pg_restore: [archiver] unsupported version (1.13) in file header

Solution: Make sure you're using a version of pg_restore compatible with the version of pg_dump that created the archive.

Integration with Scripts and Workflows

Automating Database Restoration

Here's a simple shell script to automate database restoration:

#!/bin/bash
# Database restoration script

DB_NAME="production_clone"
BACKUP_FILE="/path/to/backups/latest.dump"
LOG_FILE="/path/to/logs/restore_$(date +%Y%m%d).log"

echo "Starting database restoration at $(date)" > $LOG_FILE

# Drop database if it exists
psql -c "DROP DATABASE IF EXISTS $DB_NAME;" postgres

# Create fresh database
psql -c "CREATE DATABASE $DB_NAME;" postgres

# Restore from backup
pg_restore -v -d $DB_NAME $BACKUP_FILE >> $LOG_FILE 2>&1

echo "Restoration completed at $(date)" >> $LOG_FILE

# Check for errors
if grep -q "ERROR:" $LOG_FILE; then
  echo "Errors occurred during restoration. Check $LOG_FILE for details."
  exit 1
else
  echo "Restoration completed successfully!"
  exit 0
fi

Scheduled Backups and Restorations

For regular testing or development database refreshes, you might set up a cron job:

# Restore production backup to test environment every Sunday at 1 AM
0 1 * * 0 /path/to/restore_script.sh

Best Practices

When working with pg_restore, consider these best practices:

Test Restorations Regularly: Don't wait until a disaster to test your backup and restore procedures.
Use Parallel Jobs Wisely: While -j can speed up restoration, too many parallel jobs might overwhelm your system.
Consider Network Limitations: When restoring to remote servers, network bandwidth may become a bottleneck.
Monitor Disk Space: Ensure you have enough disk space for the restored database and any temporary files.
Use Version-Compatible Tools: Always use a pg_restore version compatible with your PostgreSQL server.
Secure Your Backups: Archives may contain sensitive data; ensure they're properly secured.
Document Your Procedures: Maintain clear documentation of your backup and restore procedures.

Summary

PostgreSQL's pg_restore utility is an essential tool for database administrators and developers who need to restore PostgreSQL databases from archive files. Its flexibility in handling different restoration scenarios, from full database restoration to selective object recovery, makes it incredibly valuable in various situations:

Database migration between environments
Recovery from failures
Setting up test environments with production-like data
Selective data restoration for specific needs

By understanding the various options and techniques available with pg_restore, you can efficiently manage your database restoration needs while minimizing downtime and resource usage.

Additional Resources

Official PostgreSQL Documentation on pg_restore
PostgreSQL Administration Cookbook
PostgreSQL Backup and Restore Strategies (various online tutorials)

Exercises

Create a backup of a sample database using pg_dump with the custom format, then restore it to a different database using pg_restore.
Practice selective restoration by restoring only specific tables from your backup.
Experiment with parallel restoration using different numbers of jobs (-j option) and observe the performance differences.
Create a list file from your backup, modify it to exclude certain objects, and perform a restoration using this modified list.
Write a shell script that automates the backup and restoration process for a development environment.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Understanding pg_restore​

What is pg_restore?​

When to Use pg_restore​

Basic Usage​

Common Options​

Step-by-Step Examples​

Example 1: Basic Full Database Restore​

Example 2: Restore with Cleanup​

Example 3: Parallel Restoration for Large Databases​

Example 4: Selective Restoration​

Example 5: Create Database During Restoration​

Advanced Usage​

List Contents of an Archive​

Modifying the Restoration Order​

Handling Errors​

Changing Object Ownership​

Understanding the Restoration Process​

Common Issues and Troubleshooting​

Error: Database Already Exists​

Permissions Issues​

Version Compatibility​

Integration with Scripts and Workflows​

Automating Database Restoration​

Scheduled Backups and Restorations​

Best Practices​

Summary​

Additional Resources​

Exercises​