Pandas CSV Export
Introduction
Exporting data to CSV (Comma-Separated Values) files is one of the most common tasks when working with pandas. CSV files are widely used because they're simple, portable, and can be opened with a variety of applications, from spreadsheet software like Excel to text editors. In this tutorial, we'll explore how to export pandas DataFrames to CSV files using the to_csv()
method, along with various options to customize the output according to your needs.
Basic CSV Export
Let's start with a simple example of exporting a DataFrame to a CSV file:
import pandas as pd
# Create a sample DataFrame
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 34, 29, 42],
'City': ['New York', 'Paris', 'Berlin', 'London'],
'Salary': [50000, 60000, 55000, 70000]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Export to CSV
df.to_csv('employees.csv')
print("\nDataFrame exported to 'employees.csv'")
Output:
Original DataFrame:
Name Age City Salary
0 John 28 New York 50000
1 Anna 34 Paris 60000
2 Peter 29 Berlin 55000
3 Linda 42 London 70000
DataFrame exported to 'employees.csv'
If you open the generated employees.csv
file, you'll notice that pandas automatically includes an index column:
,Name,Age,City,Salary
0,John,28,New York,50000
1,Anna,34,Paris,60000
2,Peter,29,Berlin,55000
3,Linda,42,London,70000
Customizing CSV Export
Excluding the Index Column
Often, you don't want to include the DataFrame's index in the exported CSV. You can exclude it using the index
parameter:
# Export without index
df.to_csv('employees_no_index.csv', index=False)
Now the CSV file will look like this:
Name,Age,City,Salary
John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000
Changing the Delimiter
While CSV stands for "Comma-Separated Values," you might want to use a different delimiter:
# Use semicolon as delimiter
df.to_csv('employees_semicolon.csv', sep=';', index=False)
The output file will use semicolons instead of commas:
Name;Age;City;Salary
John;28;New York;50000
Anna;34;Paris;60000
Peter;29;Berlin;55000
Linda;42;London;70000
Customizing Headers
You can choose to exclude column headers or specify custom ones:
# Without headers
df.to_csv('employees_no_header.csv', index=False, header=False)
# With custom headers
df.to_csv('employees_custom_header.csv', index=False,
header=['FullName', 'Years', 'Location', 'Income'])
Output without headers:
John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000
Output with custom headers:
FullName,Years,Location,Income
John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000
Handling Missing Values
Pandas allows you to specify how missing values should be represented in the CSV file:
# Create DataFrame with missing values
df_with_na = pd.DataFrame({
'Name': ['John', 'Anna', 'Peter', None],
'Age': [28, None, 29, 42],
'City': ['New York', 'Paris', None, 'London'],
'Salary': [50000, 60000, None, 70000]
})
print("DataFrame with missing values:")
print(df_with_na)
# Export with default NA representation
df_with_na.to_csv('employees_with_na.csv', index=False)
# Export with custom NA representation
df_with_na.to_csv('employees_custom_na.csv', index=False, na_rep='MISSING')
Output:
DataFrame with missing values:
Name Age City Salary
0 John 28.0 New York 50000.0
1 Anna NaN Paris 60000.0
2 Peter 29.0 None NaN
3 None 42.0 London 70000.0
Default NA representation (empty fields):
Name,Age,City,Salary
John,28.0,New York,50000.0
Anna,,Paris,60000.0
Peter,29.0,,
,42.0,London,70000.0
Custom NA representation:
Name,Age,City,Salary
John,28.0,New York,50000.0
Anna,MISSING,Paris,60000.0
Peter,29.0,MISSING,MISSING
MISSING,42.0,London,70000.0
Working with Date and Time
When exporting data that includes dates and times, you might want to control the format:
import pandas as pd
from datetime import datetime
# Create a DataFrame with dates
date_df = pd.DataFrame({
'Name': ['John', 'Anna', 'Peter'],
'Date Joined': [datetime(2020, 1, 15), datetime(2019, 5, 20), datetime(2021, 3, 10)],
'Last Login': pd.to_datetime(['2023-07-15 08:30:00', '2023-07-14 15:45:00', '2023-07-16 12:15:00'])
})
print("DataFrame with dates:")
print(date_df)
# Export with default date format
date_df.to_csv('employees_dates.csv', index=False)
# Export with custom date format
date_df.to_csv('employees_custom_dates.csv', index=False, date_format='%Y-%m-%d')
Output:
DataFrame with dates:
Name Date Joined Last Login
0 John 2020-01-15 2023-07-15 08:30:00
1 Anna 2019-05-20 2023-07-14 15:45:00
2 Peter 2021-03-10 2023-07-16 12:15:00
Custom date format output:
Name,Date Joined,Last Login
John,2020-01-15,2023-07-15
Anna,2019-05-20,2023-07-14
Peter,2021-03-10,2023-07-16
Handling File Encoding
When your data contains special characters or non-ASCII text, you need to consider file encoding:
# Create a DataFrame with special characters
special_df = pd.DataFrame({
'Name': ['José', 'Søren', 'François', 'Natalía'],
'Country': ['Spain', 'Denmark', 'France', 'Russia']
})
print("DataFrame with special characters:")
print(special_df)
# Export with UTF-8 encoding (recommended for international text)
special_df.to_csv('international_names.csv', index=False, encoding='utf-8')
# For older systems that might need a different encoding
special_df.to_csv('international_names_legacy.csv', index=False, encoding='latin-1')
Output:
DataFrame with special characters:
Name Country
0 José Spain
1 Søren Denmark
2 François France
3 Natalía Russia
Real-World Example: Exporting Filtered Data
Let's see a practical example where we filter data before exporting it:
import pandas as pd
# Load sample data (sales data)
sales_data = {
'Date': pd.date_range(start='2023-01-01', periods=20),
'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Laptop',
'Phone', 'Laptop', 'Tablet', 'Monitor', 'Phone',
'Laptop', 'Phone', 'Tablet', 'Laptop', 'Monitor',
'Phone', 'Laptop', 'Tablet', 'Phone', 'Monitor'],
'Region': ['North', 'South', 'East', 'West', 'North',
'East', 'West', 'North', 'South', 'East',
'West', 'North', 'South', 'East', 'West',
'North', 'South', 'East', 'West', 'North'],
'Units': [10, 25, 15, 8, 12, 30, 14, 18, 7, 22,
9, 28, 13, 11, 6, 24, 16, 19, 27, 5],
'Price': [1200, 800, 500, 300, 1200,
750, 1250, 550, 320, 820,
1180, 780, 520, 1220, 310,
790, 1150, 530, 810, 330]
}
sales_df = pd.DataFrame(sales_data)
sales_df['Total'] = sales_df['Units'] * sales_df['Price']
print("Original Sales Data:")
print(sales_df.head())
# Filter data: Get only laptop sales
laptop_sales = sales_df[sales_df['Product'] == 'Laptop']
print("\nLaptop Sales:")
print(laptop_sales)
# Export filtered data
laptop_sales.to_csv('laptop_sales.csv', index=False)
# Filter by multiple conditions: Get high-value sales (>$10000)
high_value_sales = sales_df[sales_df['Total'] > 10000]
print("\nHigh Value Sales:")
print(high_value_sales)
# Export with specific columns
high_value_sales.to_csv('high_value_sales.csv',
index=False,
columns=['Date', 'Product', 'Region', 'Total'])
# Group data and export summary
summary = sales_df.groupby('Product')['Total'].sum().reset_index()
summary.to_csv('product_sales_summary.csv', index=False)
print("\nSales Summary by Product:")
print(summary)
Advanced CSV Export Options
Exporting a Subset of Columns
You can specify which columns to include in the export:
# Export only specific columns
df.to_csv('employees_subset.csv', columns=['Name', 'Salary'], index=False)
Output:
Name,Salary
John,50000
Anna,60000
Peter,55000
Linda,70000
Float Formatting
Control how floating-point numbers are displayed:
import numpy as np
# Create DataFrame with floating point numbers
float_df = pd.DataFrame({
'Name': ['Product A', 'Product B', 'Product C'],
'Price': [19.9999, 45.5, 32.33333],
'Rating': [4.7, 3.9, 4.2],
'Score': [0.967452, 0.782541, 0.845123]
})
print("DataFrame with floating point numbers:")
print(float_df)
# Export with default formatting
float_df.to_csv('products_default.csv', index=False)
# Export with specified float format
float_df.to_csv('products_formatted.csv', index=False, float_format='%.2f')
Output:
DataFrame with floating point numbers:
Name Price Rating Score
0 Product A 19.99990 4.7 0.967452
1 Product B 45.50000 3.9 0.782541
2 Product C 32.33333 4.2 0.845123
Formatted output:
Name,Price,Rating,Score
Product A,20.00,4.70,0.97
Product B,45.50,3.90,0.78
Product C,32.33,4.20,0.85
Quoting Options
Control how text fields are quoted:
import csv
# Create DataFrame with text that needs quoting
quote_df = pd.DataFrame({
'Product': ['Basic Laptop', 'Pro Laptop, with accessories', 'Tablet "Mini"'],
'Description': ['Standard model', 'Includes keyboard, mouse, and stand', 'Compact, 8" screen']
})
print("DataFrame with text requiring quotes:")
print(quote_df)
# Default quoting
quote_df.to_csv('products_default_quotes.csv', index=False)
# Quote all fields
quote_df.to_csv('products_all_quoted.csv', index=False, quoting=csv.QUOTE_ALL)
# Quote only non-numeric fields
quote_df.to_csv('products_nonnumeric_quoted.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
Compression
You can directly export to a compressed CSV file:
# Export to a gzipped CSV file
df.to_csv('employees_compressed.csv.gz', index=False, compression='gzip')
# Export to a zip file
df.to_csv('employees_compressed.csv.zip', index=False, compression='zip')
Writing to a Buffer Instead of a File
Sometimes you might want to write to a buffer instead of directly to a file:
from io import StringIO
# Write to string buffer
buffer = StringIO()
df.to_csv(buffer, index=False)
# Get the CSV as a string
csv_string = buffer.getvalue()
print("CSV as a string:")
print(csv_string[:100] + "...") # Showing just the first 100 chars
Summary
In this tutorial, we've explored how to export pandas DataFrames to CSV files using the to_csv()
method. We've covered:
- Basic export syntax with
df.to_csv(filename)
- Customizing export options (index, delimiter, headers)
- Handling missing values with
na_rep
- Working with date and time formats
- Managing file encodings for international text
- Real-world examples with filtering and data preparation
- Advanced options like column selection, float formatting, quoting, and compression
The ability to export data to CSV files is an essential skill for any data professional. CSV files provide a universal format that can be easily shared and imported into various applications.
Additional Resources
- Pandas to_csv() official documentation
- CSV file format specification
- Common CSV issues and solutions
Exercises
- Create a DataFrame with at least 5 columns and 10 rows of sample data, then export it to a CSV file without the index.
- Import a CSV file, filter the data based on a condition, and export the filtered data to a new CSV file.
- Create a DataFrame with some missing values and export it with custom NA representation.
- Generate a DataFrame with date information and export it with a custom date format.
- Create a DataFrame with numerical data and export it with custom float formatting that shows exactly 3 decimal places.
If you spot any mistakes on this website, please let me know at feedback@compilenrun.com. I’d greatly appreciate your feedback! :)