PostgreSQL CSV Import and Export
Introduction
Working with data often involves transferring information between different systems and formats. CSV (Comma-Separated Values) is one of the most common formats for exchanging data due to its simplicity and widespread support. PostgreSQL, a powerful open-source relational database, provides robust tools for importing data from CSV files and exporting query results to CSV format.
In this tutorial, we'll explore how to:
- Import CSV data into PostgreSQL tables
- Export PostgreSQL data to CSV files
- Handle common challenges and edge cases
- Use practical techniques in real-world scenarios
Understanding CSV Format
Before diving into import and export operations, let's understand the CSV format:
- CSV files store tabular data as plain text
- Each line represents a row of data
- Fields within a row are separated by commas (or sometimes other delimiters)
- Fields may be enclosed in quotes, especially if they contain commas or line breaks
A simple CSV file might look like this:
id,name,email,age
1,John Doe,[email protected],32
2,Jane Smith,[email protected],28
3,Bob Johnson,"[email protected]",45
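These rules can be checked with any standards-aware CSV parser. As a minimal illustration (a sketch using Python's standard csv module on the sample file above, held in memory):

```python
import csv
import io

# The sample file from above, held in memory for the demonstration
sample = (
    "id,name,email,age\n"
    "1,John Doe,[email protected],32\n"
    "2,Jane Smith,[email protected],28\n"
    '3,Bob Johnson,"[email protected]",45\n'
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)   # first line is the header row
rows = list(reader)     # remaining lines are data rows

print(header)    # ['id', 'name', 'email', 'age']
print(rows[2])   # the quotes around the third email are stripped by the parser
```

Note that the quoting is part of the file encoding, not the data: the parsed third row contains the bare email value, which is exactly how PostgreSQL's CSV mode treats quoted fields.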
Importing CSV Data into PostgreSQL
PostgreSQL offers several methods to import CSV data, each with its own advantages:
Method 1: Using the COPY Command
The COPY command is PostgreSQL's native method for fast data import and export. It runs on the server, so the file path must be readable by the PostgreSQL server process, and reading from a file requires superuser privileges or membership in the pg_read_server_files role.
Basic Syntax:
COPY table_name(column1, column2, ...)
FROM '/path/to/your/file.csv'
WITH (FORMAT csv, HEADER true);
Example:
Let's create a table and import data from a CSV file:
-- Create a table to store user data
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
age INTEGER
);
-- Import data from CSV file
COPY users(id, name, email, age)
FROM '/tmp/users.csv'
WITH (FORMAT csv, HEADER true);
This reads the CSV file at /tmp/users.csv, skips the header row, and inserts the data into the users table. Because the CSV supplies explicit id values, resynchronize the serial sequence afterwards, for example with SELECT setval(pg_get_serial_sequence('users', 'id'), (SELECT MAX(id) FROM users));
Method 2: Using \copy in psql
The \copy command in psql is similar to COPY, but it doesn't require server file system access. Instead, psql reads the file on the client machine and streams the data to the server.
Basic Syntax:
\copy table_name(column1, column2, ...) FROM '/path/to/your/file.csv' WITH (FORMAT csv, HEADER true);
Example:
\copy users(id, name, email, age) FROM '/tmp/users.csv' WITH (FORMAT csv, HEADER true);
Method 3: Using COPY FROM STDIN
For applications that need programmatic control, you can use COPY FROM STDIN:
COPY users(id, name, email, age) FROM STDIN WITH (FORMAT csv, HEADER true);
-- Here you would provide the CSV data
-- End with a backslash-period on a line by itself
\.
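From application code, the same statement can be driven through a database driver. The following is a minimal sketch (not the only way to do this) of a helper that streams any file-like object to the server, assuming a psycopg2-style connection whose cursors expose copy_expert and the users table from the earlier example:

```python
def load_users_via_stdin(conn, fileobj):
    """Stream CSV rows from any file-like object to the server.

    Assumes a connection whose cursors expose psycopg2-style
    copy_expert(); the users table matches the example above.
    """
    with conn.cursor() as cur:
        # The driver sends the statement, then streams fileobj as STDIN
        cur.copy_expert(
            "COPY users(id, name, email, age) "
            "FROM STDIN WITH (FORMAT csv, HEADER true)",
            fileobj,
        )
    conn.commit()
```

For example, load_users_via_stdin(psycopg2.connect('dbname=mydb'), open('/tmp/users.csv')) would load the same file as Method 2, but with the application in full control of where the rows come from (a file, a network stream, or generated data).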
Handling Common Import Challenges
Dealing with Custom Delimiters
If your CSV uses a delimiter other than a comma, such as a semicolon or a tab (written as E'\t'):
COPY users FROM '/tmp/users.csv' WITH (FORMAT csv, DELIMITER ';', HEADER true);
Handling NULL Values
Specify how NULL values are represented in your CSV:
COPY users FROM '/tmp/users.csv' WITH (
FORMAT csv,
HEADER true,
NULL 'NA'
);
Dealing with Quoted Fields
For CSV files with quoted fields:
COPY users FROM '/tmp/users.csv' WITH (
FORMAT csv,
HEADER true,
QUOTE '"'
);
Handling Date Formats
When importing dates, you might need to convert them:
-- Create a table with a date column
CREATE TABLE events (
id SERIAL PRIMARY KEY,
event_name VARCHAR(100),
event_date DATE
);
-- Import with explicit date format conversion
CREATE TEMP TABLE temp_events (
id INTEGER,
event_name VARCHAR(100),
event_date VARCHAR(20) -- Store as string initially
);
COPY temp_events FROM '/tmp/events.csv' WITH (FORMAT csv, HEADER true);
-- Convert and insert into the main table
INSERT INTO events
SELECT id, event_name, TO_DATE(event_date, 'MM/DD/YYYY')
FROM temp_events;
DROP TABLE temp_events;
Exporting PostgreSQL Data to CSV
Now let's look at how to export data from PostgreSQL to CSV format.
Method 1: Using COPY TO
The COPY TO command exports query results to a file:
Basic Syntax:
COPY (SELECT column1, column2, ... FROM table_name)
TO '/path/to/output.csv'
WITH (FORMAT csv, HEADER true);
Example:
-- Export all users
COPY (SELECT * FROM users)
TO '/tmp/exported_users.csv'
WITH (FORMAT csv, HEADER true);
-- Export specific columns or filtered data
COPY (
SELECT name, email
FROM users
WHERE age > 30
)
TO '/tmp/filtered_users.csv'
WITH (FORMAT csv, HEADER true);
Method 2: Using \copy in psql
Similar to importing, \copy can be used to export data to a file on the client machine:
\copy (SELECT * FROM users) TO '/tmp/exported_users.csv' WITH (FORMAT csv, HEADER true);
Customizing Export Format
Changing Delimiters
COPY (SELECT * FROM users)
TO '/tmp/exported_users.csv'
WITH (FORMAT csv, DELIMITER ';', HEADER true);
Custom NULL Representation
COPY (SELECT * FROM users)
TO '/tmp/exported_users.csv'
WITH (FORMAT csv, HEADER true, NULL 'N/A');
Including or Excluding Headers
-- With headers
COPY (SELECT * FROM users)
TO '/tmp/with_headers.csv'
WITH (FORMAT csv, HEADER true);
-- Without headers
COPY (SELECT * FROM users)
TO '/tmp/without_headers.csv'
WITH (FORMAT csv);
Real-World Examples
Let's explore some practical applications of CSV import and export in PostgreSQL.
Example 1: Data Migration Between Systems
Suppose you need to migrate user data from an old system to a new PostgreSQL database:
-- Step 1: Create a table in the new system
CREATE TABLE new_users (
id SERIAL PRIMARY KEY,
full_name VARCHAR(100),
email VARCHAR(100),
signup_date DATE,
status VARCHAR(20)
);
-- Step 2: Import data from the old system's export
COPY new_users(id, full_name, email, signup_date, status)
FROM '/tmp/old_system_users.csv'
WITH (FORMAT csv, HEADER true);
-- Step 3: Verify the import
SELECT COUNT(*) FROM new_users;
Example 2: Data Transformation During Import
Sometimes you need to transform data during the import process:
-- Create a temporary table matching the CSV structure
CREATE TEMP TABLE temp_sales (
date_str VARCHAR(20),
product_id INTEGER,
quantity INTEGER,
price NUMERIC(10,2)
);
-- Import raw data
COPY temp_sales FROM '/tmp/sales.csv' WITH (FORMAT csv, HEADER true);
-- Transform and insert into the main table
INSERT INTO sales (sale_date, product_id, quantity, price, total_amount)
SELECT
TO_DATE(date_str, 'YYYY-MM-DD'),
product_id,
quantity,
price,
quantity * price AS total_amount
FROM temp_sales;
-- Clean up
DROP TABLE temp_sales;
Example 3: Periodic Data Export for Reporting
For a reporting system that needs daily data exports:
-- Create a function to export daily sales data
CREATE OR REPLACE FUNCTION export_daily_sales(report_date DATE) RETURNS void AS $$
BEGIN
EXECUTE format('COPY (
SELECT
product_name,
category,
SUM(quantity) AS units_sold,
SUM(price * quantity) AS total_revenue
FROM
sales
JOIN products ON sales.product_id = products.id
WHERE
sale_date = %L
GROUP BY
product_name, category
ORDER BY
total_revenue DESC
) TO ''/tmp/sales_report_%s.csv'' WITH (FORMAT csv, HEADER true)',
report_date, to_char(report_date, 'YYYY_MM_DD'));
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT export_daily_sales('2023-09-15');
Handling Large Datasets
When dealing with very large CSV files, consider these strategies:
1. Using COPY with FREEZE
For bulk loading into a table that was created or truncated earlier in the same transaction:
BEGIN;
TRUNCATE large_table;
COPY large_table FROM '/path/to/large_file.csv' WITH (FORMAT csv, HEADER true, FREEZE);
COMMIT;
The FREEZE option writes rows as already frozen, avoiding later VACUUM freezing work; it raises an error unless the table was created or truncated in the current (sub)transaction.
2. Batch Processing
For extremely large files, consider splitting them into smaller chunks:
# In a shell script: strip the header row so every chunk is pure data
tail -n +2 large_file.csv | split -l 1000000 - chunk_
Then import each chunk:
-- Create a temporary table for staging
CREATE TEMP TABLE staging_table (LIKE target_table);
-- Import each chunk
COPY staging_table FROM '/path/to/chunk_aa' WITH (FORMAT csv);
INSERT INTO target_table SELECT * FROM staging_table;
TRUNCATE staging_table;
-- Repeat for each chunk...
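The split-and-load loop can be sketched end to end in shell. This is a minimal sketch with illustrative file names and a tiny sample file; the psql invocation is commented out and would need real connection details for your environment:

```shell
# Build a tiny sample file (header + 5 data rows) for demonstration
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n' > large_file.csv

# Strip the header, then split into 2-line chunks (use -l 1000000 for real files)
tail -n +2 large_file.csv | split -l 2 - chunk_

# Import each chunk in turn; uncomment the psql line against a real database
for f in chunk_*; do
    echo "would import $f"
    # psql mydb -c "\copy staging_table FROM '$f' WITH (FORMAT csv)"
done
```

Because the header was stripped before splitting, every chunk can be imported with the same plain FORMAT csv options, with no special case for the first file.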
3. Using Foreign Data Wrapper
For very large files that you don't want to fully import:
-- Create foreign data wrapper extension
CREATE EXTENSION file_fdw;
-- Create server
CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;
-- Create foreign table
CREATE FOREIGN TABLE foreign_users (
id INTEGER,
name VARCHAR(100),
email VARCHAR(100),
age INTEGER
) SERVER csv_server
OPTIONS (
filename '/path/to/large_file.csv',
format 'csv',
header 'true'
);
-- Query directly from CSV
SELECT * FROM foreign_users WHERE age > 30;
Performance Tips
To optimize CSV operations:
1. Disable constraints and indexes before import:
-- Disable triggers
ALTER TABLE users DISABLE TRIGGER ALL;
-- Drop indexes
DROP INDEX users_email_idx;
-- After import
ALTER TABLE users ENABLE TRIGGER ALL;
CREATE INDEX users_email_idx ON users(email);
2. Increase memory settings for the post-load index rebuild:
SET maintenance_work_mem = '256MB'; -- Adjust based on your server capacity
3. Use COPY instead of INSERT for bulk operations.
4. Consider using UNLOGGED tables for staging:
CREATE UNLOGGED TABLE staging_users (LIKE users);
Troubleshooting Common Issues
Error: Permission Denied
When using COPY:
ERROR: could not open file "/path/to/file.csv" for reading: Permission denied
Solution: Either give the PostgreSQL server process permission to read the file, or use \copy instead:
\copy users FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true);
Error: Invalid Input Syntax
When data types don't match:
ERROR: invalid input syntax for type integer: "N/A"
Solution: Handle NULL values and type conversions:
COPY users FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true, NULL 'N/A');
Error: Extra Data After Last Expected Column
When CSV has more columns than the table:
ERROR: extra data after last expected column
Solution: Specify the columns explicitly:
COPY users(id, name, email) FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true);
Summary
PostgreSQL offers powerful tools for working with CSV data:
- Use COPY FROM or \copy to import CSV data into PostgreSQL tables
- Use COPY TO or \copy to export query results to CSV files
- Handle special cases like custom delimiters, NULL values, and quoted fields
- Consider performance optimizations for large datasets
- Troubleshoot common issues with permissions and data types
- Consider performance optimizations for large datasets
- Troubleshoot common issues with permissions and data types
These techniques enable efficient data exchange between PostgreSQL and other systems, making it easier to integrate databases with various applications and data processing workflows.
Exercises
- Create a table for storing product information and import data from a CSV file.
- Export filtered data from a PostgreSQL table to a CSV file with a custom delimiter.
- Create a script that automatically exports daily sales data to CSV files.
- Write a function that imports a CSV file while performing data transformations.
- Create a solution for incrementally importing large CSV files without duplicating data.