# PostgreSQL External Tables

## Introduction
External tables in PostgreSQL allow you to access data stored outside your database as if it were a regular table. Unlike traditional tables that store data directly in the database, external tables act as interfaces to external data sources, providing a seamless way to query and manipulate external data without importing it into your database.
This approach offers several advantages:
- Data remains at the source - No need to duplicate large datasets
- Real-time access - Query the most recent data directly from its source
- Reduced storage requirements - Only store what you need in your database
- Unified querying - Use standard SQL to query both internal and external data
In this guide, we'll explore how to set up and use external tables in PostgreSQL through the Foreign Data Wrapper (FDW) mechanism.
## Understanding Foreign Data Wrappers
PostgreSQL implements external tables through a feature called Foreign Data Wrappers (FDWs). Based on the SQL/MED (Management of External Data) standard, FDWs provide a standardized way to access external data sources.
The architecture involves several components:
- Foreign Data Wrapper - The driver that knows how to communicate with a specific type of external data source
- Foreign Server - Configuration for connecting to a specific external data source
- Foreign Table - Maps to a table, file, or other data structure in the external data source
- User Mapping - Stores credentials needed to access the external source
## Popular Foreign Data Wrappers
PostgreSQL comes with several built-in FDWs, and many more are available as extensions:
| FDW | Purpose | Sources |
|---|---|---|
| `postgres_fdw` | Connect to other PostgreSQL databases | PostgreSQL databases |
| `file_fdw` | Read data from files on the database server | CSV, text files |
| `dblink` | Execute queries on remote PostgreSQL servers (a contrib module, not a true FDW) | PostgreSQL databases |
| `mysql_fdw` | Connect to MySQL databases | MySQL databases |
| `mongo_fdw` | Connect to MongoDB collections | MongoDB databases |
| `oracle_fdw` | Connect to Oracle databases | Oracle databases |
| `elasticsearch_fdw` | Query Elasticsearch indexes | Elasticsearch |
| `s3_fdw` | Access files in Amazon S3 buckets | S3 storage |
In this guide, we'll focus on two of the most common FDWs: `file_fdw` for accessing local files and `postgres_fdw` for connecting to other PostgreSQL databases.
## Setting Up File-Based External Tables
Let's start with a common use case: accessing CSV data stored as files on your server.
### Step 1: Enable the `file_fdw` Extension
First, we need to enable the `file_fdw` extension:

```sql
CREATE EXTENSION file_fdw;
```
### Step 2: Create a Foreign Server
Next, we create a server object that will represent our file system:

```sql
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
```
### Step 3: Create a Foreign Table
Now, we can create a foreign table that maps to our CSV file:

```sql
CREATE FOREIGN TABLE employees_external (
    id INTEGER,
    name TEXT,
    department TEXT,
    salary NUMERIC
)
SERVER file_server
OPTIONS (
    filename '/path/to/employees.csv',
    format 'csv',
    header 'true',
    delimiter ','
);
```
### Step 4: Query the External Table
Once set up, you can query the external table just like any regular table:

```sql
SELECT * FROM employees_external;
```
Output:

```text
 id |    name    | department | salary
----+------------+------------+--------
  1 | John Smith | Sales      |  65000
  2 | Mary Jones | Marketing  |  78000
  3 | David Lee  | IT         |  82000
  4 | Lisa Wong  | Finance    |  76000
```
You can also join external tables with regular tables:

```sql
SELECT e.name, e.department, p.project_name
FROM employees_external e
JOIN projects p ON e.id = p.employee_id;
```
## Working with Remote PostgreSQL Databases
Another common scenario is connecting to tables in another PostgreSQL database.
### Step 1: Enable the `postgres_fdw` Extension

```sql
CREATE EXTENSION postgres_fdw;
```
### Step 2: Create a Foreign Server

```sql
CREATE SERVER remote_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
    host 'remote-server.example.com',
    port '5432',
    dbname 'remote_database'
);
```
### Step 3: Create a User Mapping

This stores the credentials needed to connect to the remote database:

```sql
CREATE USER MAPPING FOR local_user
SERVER remote_db
OPTIONS (
    user 'remote_user',
    password 'remote_password'
);
```
### Step 4: Create a Foreign Table

```sql
CREATE FOREIGN TABLE remote_products (
    product_id INTEGER,
    product_name TEXT,
    price NUMERIC,
    category TEXT
)
SERVER remote_db
OPTIONS (
    schema_name 'public',
    table_name 'products'
);
```
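Declaring each column by hand works, but when the remote server is another PostgreSQL instance you can also let `postgres_fdw` read the remote table definitions for you with `IMPORT FOREIGN SCHEMA` (available since PostgreSQL 9.5). A minimal sketch, assuming a remote `orders` table exists and a local schema named `remote_tables` has already been created:

```sql
-- Import selected tables from the remote 'public' schema.
-- LIMIT TO restricts the import to the named tables; omit it
-- to import every table in the remote schema.
IMPORT FOREIGN SCHEMA public
    LIMIT TO (products, orders)
    FROM SERVER remote_db
    INTO remote_tables;
```

This keeps the local foreign-table definitions in sync with the remote column types and avoids typos in hand-written column lists.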
### Step 5: Query the Remote Data

```sql
SELECT * FROM remote_products WHERE price < 100;
```
## Creating an Import/Export Workflow
External tables are particularly useful for ETL (Extract, Transform, Load) operations. Here's a practical example of how to use them in an import workflow:
### Example: Monthly Sales Data Import
Suppose you receive monthly sales data as CSV files, and you need to import them into your analytics database:
```sql
-- Create a staging foreign table
CREATE FOREIGN TABLE monthly_sales_staging (
    date DATE,
    product_id INTEGER,
    quantity INTEGER,
    unit_price NUMERIC,
    total_amount NUMERIC
)
SERVER file_server
OPTIONS (
    filename '/path/to/incoming/sales_2023_10.csv',
    format 'csv',
    header 'true',
    delimiter ','
);

-- Create a permanent table for aggregated sales data
CREATE TABLE IF NOT EXISTS sales_summary (
    month DATE,
    product_id INTEGER,
    total_quantity INTEGER,
    revenue NUMERIC,
    CONSTRAINT pk_sales_summary PRIMARY KEY (month, product_id)
);

-- Insert the data with transformation
INSERT INTO sales_summary (month, product_id, total_quantity, revenue)
SELECT
    DATE_TRUNC('month', date) AS month,
    product_id,
    SUM(quantity) AS total_quantity,
    SUM(total_amount) AS revenue
FROM monthly_sales_staging
GROUP BY DATE_TRUNC('month', date), product_id
ON CONFLICT (month, product_id) DO UPDATE
SET total_quantity = sales_summary.total_quantity + EXCLUDED.total_quantity,
    revenue = sales_summary.revenue + EXCLUDED.revenue;
```
This workflow allows you to import data without permanently storing the raw file contents in your database.
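Because the source file is just a table option, the same staging table can be reused for subsequent months by repointing it with `ALTER FOREIGN TABLE` rather than recreating it. A sketch, with a hypothetical path for the next month's file:

```sql
-- Repoint the staging table at next month's file; the path is
-- hypothetical and would follow whatever naming scheme you use.
ALTER FOREIGN TABLE monthly_sales_staging
    OPTIONS (SET filename '/path/to/incoming/sales_2023_11.csv');
```

After the `ALTER`, the same `INSERT ... SELECT` transformation can be run again unchanged.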
## Advanced Techniques
### Creating a View to Simplify Complex External Data
You can create views on top of foreign tables to simplify access and add business logic:
```sql
CREATE VIEW customer_sales AS
SELECT
    c.customer_id,
    c.customer_name,
    c.region,
    SUM(s.total_amount) AS total_sales
FROM customers c
JOIN sales_external s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.customer_name, c.region;
```
### Using COPY for Bulk Importing from External Tables
You can use the `COPY` command to quickly move data from external sources into local tables:

```sql
-- Create a local table with the same structure
CREATE TABLE employees_local (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary NUMERIC
);

-- Pull data from a remote server into the local table
COPY employees_local FROM PROGRAM
    'psql -h remote-server -U user -d database -c "COPY (SELECT * FROM employees) TO STDOUT"';
```

Note that `COPY ... FROM PROGRAM` runs a shell command on the database server, so it requires superuser privileges or membership in the `pg_execute_server_program` role. For data already exposed through a foreign table, a plain `INSERT INTO employees_local SELECT * FROM employees_external;` achieves the same result without elevated privileges.
### Handling Access Permissions
To control who can access external data, you need to manage permissions:
```sql
-- Grant select permission on the foreign table
GRANT SELECT ON employees_external TO analyst_role;

-- Revoke usage permission on the foreign server
REVOKE USAGE ON FOREIGN SERVER remote_db FROM restricted_role;
```
## Limitations and Considerations
When working with external tables, keep these limitations in mind:
- Performance: Querying external data is typically slower than querying local tables
- Transaction support: External tables may not fully participate in transactions
- Function support: Not all PostgreSQL functions can be pushed down to external sources
- Write support: Some wrappers are read-only; `file_fdw`, for example, only supports reading
- Security: Credentials for external sources need careful management
- Feature differences: External sources may not support all PostgreSQL features
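For `postgres_fdw` in particular, `EXPLAIN (VERBOSE)` is a quick way to check pushdown behavior: its `Remote SQL` line shows exactly which clauses were sent to the remote server versus evaluated locally. A sketch using the `remote_products` foreign table from earlier:

```sql
-- The VERBOSE output includes a "Remote SQL:" line for foreign scans.
-- If "price < 100" appears there, the filter runs on the remote server;
-- if it appears as a local Filter instead, every row is fetched first.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM remote_products WHERE price < 100;
```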
## Best Practices
- Use for appropriate scenarios: External tables work best for data that changes frequently at the source or is too large to duplicate
- Consider materialized views: For frequent queries, consider creating materialized views from external data
- Monitor performance: External queries may need optimization differently than regular queries
- Handle errors gracefully: External sources can become unavailable, so implement error handling
- Secure credentials: Use encrypted password storage or connection pooling solutions
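The materialized-view suggestion above can be sketched against the `employees_external` table from earlier; `employees_cache` is a hypothetical name:

```sql
-- Cache the external data locally; queries hit the local copy,
-- not the external source, so repeated reads are fast.
CREATE MATERIALIZED VIEW employees_cache AS
SELECT * FROM employees_external;

-- Re-run on whatever schedule fits your freshness requirements
-- (e.g. from cron or a scheduling extension).
REFRESH MATERIALIZED VIEW employees_cache;
```

This trades freshness for speed: between refreshes the cache can drift from the source, which is often acceptable for reporting workloads.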
## Practical Example: Building a Data Pipeline
Let's create a complete example of a data pipeline that:

1. Reads data from a CSV file
2. Transforms it
3. Stores the results in the local database
```sql
-- Step 1: Set up the external table
CREATE EXTENSION IF NOT EXISTS file_fdw;

-- Note: CREATE SERVER does not support IF NOT EXISTS;
-- skip this statement if file_server already exists.
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE raw_weather_data (
    station_id TEXT,
    observation_date DATE,
    temperature NUMERIC,
    precipitation NUMERIC,
    humidity NUMERIC,
    wind_speed NUMERIC
)
SERVER file_server
OPTIONS (
    filename '/data/weather_readings.csv',
    format 'csv',
    header 'true',
    delimiter ','
);

-- Step 2: Create local tables for processed data
CREATE TABLE IF NOT EXISTS weather_stations (
    station_id TEXT PRIMARY KEY,
    location TEXT,
    elevation NUMERIC
);

CREATE TABLE IF NOT EXISTS weather_summary (
    station_id TEXT REFERENCES weather_stations(station_id),
    month DATE,
    avg_temperature NUMERIC,
    total_precipitation NUMERIC,
    avg_humidity NUMERIC,
    avg_wind_speed NUMERIC,
    PRIMARY KEY (station_id, month)
);

-- Step 3: Transform and load the data
INSERT INTO weather_summary (
    station_id,
    month,
    avg_temperature,
    total_precipitation,
    avg_humidity,
    avg_wind_speed
)
SELECT
    station_id,
    DATE_TRUNC('month', observation_date) AS month,
    AVG(temperature) AS avg_temperature,
    SUM(precipitation) AS total_precipitation,
    AVG(humidity) AS avg_humidity,
    AVG(wind_speed) AS avg_wind_speed
FROM raw_weather_data
GROUP BY station_id, DATE_TRUNC('month', observation_date)
ON CONFLICT (station_id, month) DO UPDATE
SET avg_temperature = EXCLUDED.avg_temperature,
    total_precipitation = EXCLUDED.total_precipitation,
    avg_humidity = EXCLUDED.avg_humidity,
    avg_wind_speed = EXCLUDED.avg_wind_speed;
```
This pipeline reads raw weather data, aggregates it by month, and stores the summary in a local table.
## Summary
PostgreSQL external tables, implemented through Foreign Data Wrappers, provide a powerful way to integrate external data sources into your database workflow. They allow you to:
- Access data from various sources (files, other databases, web services)
- Query external data using standard SQL
- Join external and local data seamlessly
- Build efficient data pipelines and ETL processes
While external tables have some limitations compared to local tables, they offer significant benefits in scenarios involving large datasets, frequently changing data, or distributed data architectures.
## Additional Resources
To learn more about PostgreSQL external tables and Foreign Data Wrappers:
- PostgreSQL Official Documentation on Foreign Data
- PostgreSQL Foreign Data Wrapper Interface
- List of available Foreign Data Wrappers
## Exercises
1. Create a foreign table that accesses a CSV file containing customer data.
2. Set up a foreign table that connects to another PostgreSQL database and queries product inventory.
3. Build a reporting solution that combines data from a local sales table with customer data from an external source.
4. Create a data validation workflow that compares data in your local database with an external reference database.
5. Implement a logging system that archives log data from your database to external files using foreign tables.