PostgreSQL CSV Import Export

Introduction

Working with data often involves transferring information between different systems and formats. CSV (Comma-Separated Values) is one of the most common formats for exchanging data due to its simplicity and widespread support. PostgreSQL, a powerful open-source relational database, provides robust tools for importing data from CSV files and exporting query results to CSV format.

In this tutorial, we'll explore how to:

  • Import CSV data into PostgreSQL tables
  • Export PostgreSQL data to CSV files
  • Handle common challenges and edge cases
  • Use practical techniques in real-world scenarios

Understanding CSV Format

Before diving into import and export operations, let's understand the CSV format:

  • CSV files store tabular data as plain text
  • Each line represents a row of data
  • Fields within a row are separated by commas (or sometimes other delimiters)
  • Fields may be enclosed in quotes, especially if they contain commas or line breaks

A simple CSV file might look like this:

id,name,email,age
1,John Doe,john.doe@example.com,32
2,Jane Smith,jane.smith@example.com,28
3,Bob Johnson,"bob.johnson@example.com",45

Importing CSV Data into PostgreSQL

PostgreSQL offers several methods to import CSV data, each with its own advantages:

Method 1: Using the COPY Command

The COPY command is PostgreSQL's native method for fast data import and export. It reads and writes files on the database server itself, so the server process must be able to access the path; since PostgreSQL 11, using COPY with files also requires superuser rights or membership in the pg_read_server_files / pg_write_server_files roles.

Basic Syntax:

sql
COPY table_name(column1, column2, ...)
FROM '/path/to/your/file.csv'
WITH (FORMAT csv, HEADER true);

Example:

Let's create a table and import data from a CSV file:

sql
-- Create a table to store user data
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    age INTEGER
);

-- Import data from CSV file
COPY users(id, name, email, age)
FROM '/tmp/users.csv'
WITH (FORMAT csv, HEADER true);

This reads /tmp/users.csv from the server's filesystem, skips the header row, and inserts the rows into the users table.

Method 2: Using \copy in psql

The \copy command in psql is similar to COPY, but it doesn't require server file system access or special privileges. Instead, psql reads the file on the client machine and streams the data to the server over the connection, so relative paths are resolved on the client.

Basic Syntax:

sql
\copy table_name(column1, column2, ...) FROM '/path/to/your/file.csv' WITH (FORMAT csv, HEADER true);

Example:

sql
\copy users(id, name, email, age) FROM '/tmp/users.csv' WITH (FORMAT csv, HEADER true);

Method 3: Using COPY FROM STDIN

For applications that need programmatic control, you can use COPY FROM STDIN:

sql
COPY users(id, name, email, age) FROM STDIN WITH (FORMAT csv, HEADER true);
id,name,email,age
4,Alice Brown,alice@example.com,37
\.

Everything after the COPY line is read as CSV data (the first line is skipped as the header), and a backslash-period on a line by itself ends the input. Client libraries expose the same mechanism for streaming data without an intermediate file.

Handling Common Import Challenges

Dealing with Custom Delimiters

If your CSV uses a delimiter other than a comma (such as tabs or semicolons):

sql
COPY users FROM '/tmp/users.csv' WITH (FORMAT csv, DELIMITER ';', HEADER true);

Handling NULL Values

Specify how NULL values are represented in your CSV:

sql
COPY users FROM '/tmp/users.csv' WITH (
    FORMAT csv,
    HEADER true,
    NULL 'NA'
);

Dealing with Quoted Fields

For CSV files with quoted fields (the double quote is the default quote character; QUOTE lets you override it):

sql
COPY users FROM '/tmp/users.csv' WITH (
    FORMAT csv,
    HEADER true,
    QUOTE '"'
);
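
If the file escapes embedded quote characters with a backslash instead of doubling them (the CSV default), the ESCAPE option handles that:

sql
-- Accept fields like "Bob \"The Builder\" Johnson"
COPY users FROM '/tmp/users.csv' WITH (
    FORMAT csv,
    HEADER true,
    QUOTE '"',
    ESCAPE '\'
);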

Handling Date Formats

When importing dates, you might need to convert them:

sql
-- Create a table with a date column
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    event_name VARCHAR(100),
    event_date DATE
);

-- Stage the raw values in a temp table, then convert explicitly
CREATE TEMP TABLE temp_events (
    id INTEGER,
    event_name VARCHAR(100),
    event_date VARCHAR(20) -- store as text initially
);

COPY temp_events FROM '/tmp/events.csv' WITH (FORMAT csv, HEADER true);

-- Convert and insert into the main table
INSERT INTO events
SELECT id, event_name, TO_DATE(event_date, 'MM/DD/YYYY')
FROM temp_events;

DROP TABLE temp_events;
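
When the only difficulty is the day/month order, a lighter alternative is to set the session's datestyle so COPY can parse the dates directly and the staging table isn't needed:

sql
-- Read ambiguous dates such as 09/15/2023 as month/day/year
SET datestyle = 'ISO, MDY';

COPY events(id, event_name, event_date)
FROM '/tmp/events.csv'
WITH (FORMAT csv, HEADER true);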

Exporting PostgreSQL Data to CSV

Now let's look at how to export data from PostgreSQL to CSV format.

Method 1: Using COPY TO

The COPY TO command exports query results to a file:

Basic Syntax:

sql
COPY (SELECT column1, column2, ... FROM table_name) 
TO '/path/to/output.csv'
WITH (FORMAT csv, HEADER true);

Example:

sql
-- Export all users
COPY (SELECT * FROM users)
TO '/tmp/exported_users.csv'
WITH (FORMAT csv, HEADER true);

-- Export specific columns or filtered data
COPY (
    SELECT name, email
    FROM users
    WHERE age > 30
)
TO '/tmp/filtered_users.csv'
WITH (FORMAT csv, HEADER true);

Method 2: Using \copy in psql

Similar to importing, \copy can be used to export data to the client machine:

sql
\copy (SELECT * FROM users) TO '/tmp/exported_users.csv' WITH (FORMAT csv, HEADER true);

Customizing Export Format

Changing Delimiters

sql
COPY (SELECT * FROM users) 
TO '/tmp/exported_users.csv'
WITH (FORMAT csv, DELIMITER ';', HEADER true);

Custom NULL Representation

sql
COPY (SELECT * FROM users) 
TO '/tmp/exported_users.csv'
WITH (FORMAT csv, HEADER true, NULL 'N/A');

Including or Excluding Headers

sql
-- With headers
COPY (SELECT * FROM users)
TO '/tmp/with_headers.csv'
WITH (FORMAT csv, HEADER true);

-- Without headers
COPY (SELECT * FROM users)
TO '/tmp/without_headers.csv'
WITH (FORMAT csv);
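
Forcing Quotes

Some consumers expect every value to be quoted; the FORCE_QUOTE option quotes all non-NULL values on export:

sql
-- Quote every non-NULL value, not just the ones that require it
COPY (SELECT * FROM users)
TO '/tmp/quoted_users.csv'
WITH (FORMAT csv, HEADER true, FORCE_QUOTE *);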

Real-World Examples

Let's explore some practical applications of CSV import and export in PostgreSQL.

Example 1: Data Migration Between Systems

Suppose you need to migrate user data from an old system to a new PostgreSQL database:

sql
-- Step 1: Create a table in the new system
CREATE TABLE new_users (
    id SERIAL PRIMARY KEY,
    full_name VARCHAR(100),
    email VARCHAR(100),
    signup_date DATE,
    status VARCHAR(20)
);

-- Step 2: Import data from the old system's export
COPY new_users(id, full_name, email, signup_date, status)
FROM '/tmp/old_system_users.csv'
WITH (FORMAT csv, HEADER true);

-- Step 3: Verify the import
SELECT COUNT(*) FROM new_users;
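
One gotcha: because the file supplies explicit values for the SERIAL id column, the sequence behind it is not advanced, so later inserts could collide with imported ids. Resynchronize it after the load:

sql
-- Point the id sequence past the highest imported value
SELECT setval(pg_get_serial_sequence('new_users', 'id'),
              (SELECT MAX(id) FROM new_users));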

Example 2: Data Transformation During Import

Sometimes you need to transform data during the import process:

sql
-- Create a temporary table matching the CSV structure
CREATE TEMP TABLE temp_sales (
    date_str VARCHAR(20),
    product_id INTEGER,
    quantity INTEGER,
    price NUMERIC(10,2)
);

-- Import raw data
COPY temp_sales FROM '/tmp/sales.csv' WITH (FORMAT csv, HEADER true);

-- Transform and insert into the main table
INSERT INTO sales (sale_date, product_id, quantity, price, total_amount)
SELECT
    TO_DATE(date_str, 'YYYY-MM-DD'),
    product_id,
    quantity,
    price,
    quantity * price AS total_amount
FROM temp_sales;

-- Clean up
DROP TABLE temp_sales;

Example 3: Periodic Data Export for Reporting

For a reporting system that needs daily data exports:

sql
-- Create a function to export daily sales data
CREATE OR REPLACE FUNCTION export_daily_sales(report_date DATE) RETURNS void AS $$
BEGIN
    EXECUTE format('COPY (
        SELECT
            product_name,
            category,
            SUM(quantity) AS units_sold,
            SUM(price * quantity) AS total_revenue
        FROM sales
        JOIN products ON sales.product_id = products.id
        WHERE sale_date = %L
        GROUP BY product_name, category
        ORDER BY total_revenue DESC
    ) TO ''/tmp/sales_report_%s.csv'' WITH (FORMAT csv, HEADER true)',
    report_date, to_char(report_date, 'YYYY_MM_DD'));
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT export_daily_sales('2023-09-15');

Handling Large Datasets

When dealing with very large CSV files, consider these strategies:

1. Using COPY with FREEZE

For bulk loading into a new or truncated table:

sql
COPY large_table FROM '/path/to/large_file.csv' WITH (FORMAT csv, HEADER true, FREEZE);

The FREEZE option writes rows as already frozen, sparing a later VACUUM from having to rewrite them; it only takes effect if the table was created or truncated earlier in the same transaction.
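
A minimal sketch that satisfies FREEZE's requirement, assuming large_table already exists and can be emptied:

sql
BEGIN;
-- FREEZE only works if the table was created or truncated in this transaction
TRUNCATE large_table;
COPY large_table FROM '/path/to/large_file.csv' WITH (FORMAT csv, HEADER true, FREEZE);
COMMIT;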

2. Batch Processing

For extremely large files, consider splitting them into smaller chunks:

bash
# In a shell script
split -l 1000000 large_file.csv chunk_

Then import each chunk:

sql
-- Create a temporary table for staging
CREATE TEMP TABLE staging_table (LIKE target_table);

-- Import each chunk; only the first one (chunk_aa) still carries the
-- original header row, so drop HEADER true for the remaining chunks
COPY staging_table FROM '/path/to/chunk_aa' WITH (FORMAT csv, HEADER true);
INSERT INTO target_table SELECT * FROM staging_table;
TRUNCATE staging_table;

-- Repeat for each chunk...
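
If a chunk could be loaded twice, say after a failed run is retried, an upsert-style insert keeps the target free of duplicates. This sketch assumes target_table has a primary key or unique constraint on id:

sql
-- Skip rows whose id already exists in the target
INSERT INTO target_table
SELECT * FROM staging_table
ON CONFLICT (id) DO NOTHING;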

3. Using a Foreign Data Wrapper

For very large files that you don't want to fully import:

sql
-- Create foreign data wrapper extension
CREATE EXTENSION file_fdw;

-- Create server
CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;

-- Create foreign table
CREATE FOREIGN TABLE foreign_users (
    id INTEGER,
    name VARCHAR(100),
    email VARCHAR(100),
    age INTEGER
) SERVER csv_server
OPTIONS (
    filename '/path/to/large_file.csv',
    format 'csv',
    header 'true'
);

-- Query directly from CSV
SELECT * FROM foreign_users WHERE age > 30;
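
If a filtered subset turns out to be worth keeping, it can be materialized from the foreign table with a plain CREATE TABLE AS (the table name here is illustrative):

sql
-- Pull only the rows of interest into a regular table
CREATE TABLE older_users AS
SELECT * FROM foreign_users WHERE age > 30;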

Performance Tips

To optimize CSV operations:

  1. Disable Constraints and Indexes Before Import:

    sql
    -- Disable triggers (this also skips foreign key enforcement,
    -- so only do it with data you trust)
    ALTER TABLE users DISABLE TRIGGER ALL;

    -- Drop indexes
    DROP INDEX users_email_idx;

    -- After import
    ALTER TABLE users ENABLE TRIGGER ALL;
    CREATE INDEX users_email_idx ON users(email);
  2. Increase Memory Settings:

    sql
    SET work_mem = '256MB';              -- helps sorts in transform queries
    SET maintenance_work_mem = '512MB';  -- speeds up rebuilding indexes after the load
  3. Use COPY Instead of INSERT for bulk operations (see the sketch after this list).

  4. Consider Using UNLOGGED Tables for staging:

    sql
    CREATE UNLOGGED TABLE staging_users (LIKE users);
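
A minimal sketch combining the last two tips: load through an UNLOGGED staging table, which skips write-ahead logging, then move the rows with one set-based INSERT. Table names follow the earlier examples:

sql
-- UNLOGGED skips WAL writes, so the COPY is faster; the trade-off is
-- that the table is emptied after a crash, which is fine for staging
CREATE UNLOGGED TABLE staging_users (LIKE users);

COPY staging_users FROM '/tmp/users.csv' WITH (FORMAT csv, HEADER true);

-- One set-based INSERT instead of many row-by-row INSERTs
INSERT INTO users SELECT * FROM staging_users;

DROP TABLE staging_users;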

Troubleshooting Common Issues

Error: Permission Denied

When using COPY:

ERROR: could not open file "/path/to/file.csv" for reading: Permission denied

Solution: Either make the file readable by the operating-system user the PostgreSQL server runs as, or use \copy, which reads the file as the client:

sql
\copy users FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true);

Error: Invalid Input Syntax

When data types don't match:

ERROR: invalid input syntax for type integer: "N/A"

Solution: Handle NULL values and type conversions:

sql
COPY users FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true, NULL 'N/A');

Error: Extra Data After Last Expected Column

When CSV has more columns than the table:

ERROR: extra data after last expected column

Solution: Specify the columns explicitly:

sql
COPY users(id, name, email) FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true);

Summary

PostgreSQL offers powerful tools for working with CSV data:

  • Use COPY FROM or \copy to import CSV data into PostgreSQL tables
  • Use COPY TO or \copy to export query results to CSV files
  • Handle special cases like custom delimiters, NULL values, and quoted fields
  • Consider performance optimizations for large datasets
  • Troubleshoot common issues with permissions and data types

These techniques enable efficient data exchange between PostgreSQL and other systems, making it easier to integrate databases with various applications and data processing workflows.

Exercises

  1. Create a table for storing product information and import data from a CSV file.
  2. Export filtered data from a PostgreSQL table to a CSV file with a custom delimiter.
  3. Create a script that automatically exports daily sales data to CSV files.
  4. Write a function that imports a CSV file while performing data transformations.
  5. Create a solution for incrementally importing large CSV files without duplicating data.

