Skip to main content

Pandas CSV Export

Introduction

Exporting data to CSV (Comma-Separated Values) files is one of the most common tasks when working with pandas. CSV files are widely used because they're simple, portable, and can be opened with a variety of applications, from spreadsheet software like Excel to text editors. In this tutorial, we'll explore how to export pandas DataFrames to CSV files using the to_csv() method, along with various options to customize the output according to your needs.

Basic CSV Export

Let's start with a simple example of exporting a DataFrame to a CSV file:

python
import pandas as pd

# Create a sample DataFrame
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 34, 29, 42],
'City': ['New York', 'Paris', 'Berlin', 'London'],
'Salary': [50000, 60000, 55000, 70000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Export to CSV
df.to_csv('employees.csv')
print("\nDataFrame exported to 'employees.csv'")

Output:

Original DataFrame:
Name Age City Salary
0 John 28 New York 50000
1 Anna 34 Paris 60000
2 Peter 29 Berlin 55000
3 Linda 42 London 70000

DataFrame exported to 'employees.csv'

If you open the generated employees.csv file, you'll notice that pandas automatically includes an index column:

,Name,Age,City,Salary
0,John,28,New York,50000
1,Anna,34,Paris,60000
2,Peter,29,Berlin,55000
3,Linda,42,London,70000

Customizing CSV Export

Excluding the Index Column

Often, you don't want to include the DataFrame's index in the exported CSV. You can exclude it using the index parameter:

python
# Export without index
df.to_csv('employees_no_index.csv', index=False)

Now the CSV file will look like this:

Name,Age,City,Salary
John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000

Changing the Delimiter

While CSV stands for "Comma-Separated Values," you might want to use a different delimiter:

python
# Use semicolon as delimiter
df.to_csv('employees_semicolon.csv', sep=';', index=False)

The output file will use semicolons instead of commas:

Name;Age;City;Salary
John;28;New York;50000
Anna;34;Paris;60000
Peter;29;Berlin;55000
Linda;42;London;70000

Customizing Headers

You can choose to exclude column headers or specify custom ones:

python
# Without headers
df.to_csv('employees_no_header.csv', index=False, header=False)

# With custom headers
df.to_csv('employees_custom_header.csv', index=False,
header=['FullName', 'Years', 'Location', 'Income'])

Output without headers:

John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000

Output with custom headers:

FullName,Years,Location,Income
John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000

Handling Missing Values

Pandas allows you to specify how missing values should be represented in the CSV file:

python
# Create DataFrame with missing values
df_with_na = pd.DataFrame({
'Name': ['John', 'Anna', 'Peter', None],
'Age': [28, None, 29, 42],
'City': ['New York', 'Paris', None, 'London'],
'Salary': [50000, 60000, None, 70000]
})

print("DataFrame with missing values:")
print(df_with_na)

# Export with default NA representation
df_with_na.to_csv('employees_with_na.csv', index=False)

# Export with custom NA representation
df_with_na.to_csv('employees_custom_na.csv', index=False, na_rep='MISSING')

Output:

DataFrame with missing values:
Name Age City Salary
0 John 28.0 New York 50000.0
1 Anna NaN Paris 60000.0
2 Peter 29.0 None NaN
3 None 42.0 London 70000.0

Default NA representation (empty fields):

Name,Age,City,Salary
John,28.0,New York,50000.0
Anna,,Paris,60000.0
Peter,29.0,,
,42.0,London,70000.0

Custom NA representation:

Name,Age,City,Salary
John,28.0,New York,50000.0
Anna,MISSING,Paris,60000.0
Peter,29.0,MISSING,MISSING
MISSING,42.0,London,70000.0

Working with Date and Time

When exporting data that includes dates and times, you might want to control the format:

python
import pandas as pd
from datetime import datetime

# Create a DataFrame with dates
date_df = pd.DataFrame({
'Name': ['John', 'Anna', 'Peter'],
'Date Joined': [datetime(2020, 1, 15), datetime(2019, 5, 20), datetime(2021, 3, 10)],
'Last Login': pd.to_datetime(['2023-07-15 08:30:00', '2023-07-14 15:45:00', '2023-07-16 12:15:00'])
})

print("DataFrame with dates:")
print(date_df)

# Export with default date format
date_df.to_csv('employees_dates.csv', index=False)

# Export with custom date format
date_df.to_csv('employees_custom_dates.csv', index=False, date_format='%Y-%m-%d')

Output:

DataFrame with dates:
Name Date Joined Last Login
0 John 2020-01-15 2023-07-15 08:30:00
1 Anna 2019-05-20 2023-07-14 15:45:00
2 Peter 2021-03-10 2023-07-16 12:15:00

Custom date format output:

Name,Date Joined,Last Login
John,2020-01-15,2023-07-15
Anna,2019-05-20,2023-07-14
Peter,2021-03-10,2023-07-16

Handling File Encoding

When your data contains special characters or non-ASCII text, you need to consider file encoding:

python
# Create a DataFrame with special characters
special_df = pd.DataFrame({
'Name': ['José', 'Søren', 'François', 'Natalía'],
'Country': ['Spain', 'Denmark', 'France', 'Russia']
})

print("DataFrame with special characters:")
print(special_df)

# Export with UTF-8 encoding (recommended for international text)
special_df.to_csv('international_names.csv', index=False, encoding='utf-8')

# For older systems that might need a different encoding
special_df.to_csv('international_names_legacy.csv', index=False, encoding='latin-1')

Output:

DataFrame with special characters:
Name Country
0 José Spain
1 Søren Denmark
2 François France
3 Natalía Russia

Real-World Example: Exporting Filtered Data

Let's see a practical example where we filter data before exporting it:

python
import pandas as pd

# Load sample data (sales data)
sales_data = {
'Date': pd.date_range(start='2023-01-01', periods=20),
'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Laptop',
'Phone', 'Laptop', 'Tablet', 'Monitor', 'Phone',
'Laptop', 'Phone', 'Tablet', 'Laptop', 'Monitor',
'Phone', 'Laptop', 'Tablet', 'Phone', 'Monitor'],
'Region': ['North', 'South', 'East', 'West', 'North',
'East', 'West', 'North', 'South', 'East',
'West', 'North', 'South', 'East', 'West',
'North', 'South', 'East', 'West', 'North'],
'Units': [10, 25, 15, 8, 12, 30, 14, 18, 7, 22,
9, 28, 13, 11, 6, 24, 16, 19, 27, 5],
'Price': [1200, 800, 500, 300, 1200,
750, 1250, 550, 320, 820,
1180, 780, 520, 1220, 310,
790, 1150, 530, 810, 330]
}

sales_df = pd.DataFrame(sales_data)
sales_df['Total'] = sales_df['Units'] * sales_df['Price']

print("Original Sales Data:")
print(sales_df.head())

# Filter data: Get only laptop sales
laptop_sales = sales_df[sales_df['Product'] == 'Laptop']
print("\nLaptop Sales:")
print(laptop_sales)

# Export filtered data
laptop_sales.to_csv('laptop_sales.csv', index=False)

# Filter by multiple conditions: Get high-value sales (>$10000)
high_value_sales = sales_df[sales_df['Total'] > 10000]
print("\nHigh Value Sales:")
print(high_value_sales)

# Export with specific columns
high_value_sales.to_csv('high_value_sales.csv',
index=False,
columns=['Date', 'Product', 'Region', 'Total'])

# Group data and export summary
summary = sales_df.groupby('Product')['Total'].sum().reset_index()
summary.to_csv('product_sales_summary.csv', index=False)
print("\nSales Summary by Product:")
print(summary)

Advanced CSV Export Options

Exporting a Subset of Columns

You can specify which columns to include in the export:

python
# Export only specific columns
df.to_csv('employees_subset.csv', columns=['Name', 'Salary'], index=False)

Output:

Name,Salary
John,50000
Anna,60000
Peter,55000
Linda,70000

Float Formatting

Control how floating-point numbers are displayed:

python
import numpy as np

# Create DataFrame with floating point numbers
float_df = pd.DataFrame({
'Name': ['Product A', 'Product B', 'Product C'],
'Price': [19.9999, 45.5, 32.33333],
'Rating': [4.7, 3.9, 4.2],
'Score': [0.967452, 0.782541, 0.845123]
})

print("DataFrame with floating point numbers:")
print(float_df)

# Export with default formatting
float_df.to_csv('products_default.csv', index=False)

# Export with specified float format
float_df.to_csv('products_formatted.csv', index=False, float_format='%.2f')

Output:

DataFrame with floating point numbers:
Name Price Rating Score
0 Product A 19.99990 4.7 0.967452
1 Product B 45.50000 3.9 0.782541
2 Product C 32.33333 4.2 0.845123

Formatted output:

Name,Price,Rating,Score
Product A,20.00,4.70,0.97
Product B,45.50,3.90,0.78
Product C,32.33,4.20,0.85

Quoting Options

Control how text fields are quoted:

python
import csv

# Create DataFrame with text that needs quoting
quote_df = pd.DataFrame({
'Product': ['Basic Laptop', 'Pro Laptop, with accessories', 'Tablet "Mini"'],
'Description': ['Standard model', 'Includes keyboard, mouse, and stand', 'Compact, 8" screen']
})

print("DataFrame with text requiring quotes:")
print(quote_df)

# Default quoting
quote_df.to_csv('products_default_quotes.csv', index=False)

# Quote all fields
quote_df.to_csv('products_all_quoted.csv', index=False, quoting=csv.QUOTE_ALL)

# Quote only non-numeric fields
quote_df.to_csv('products_nonnumeric_quoted.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)

Compression

You can directly export to a compressed CSV file:

python
# Export to a gzipped CSV file
df.to_csv('employees_compressed.csv.gz', index=False, compression='gzip')

# Export to a zip file
df.to_csv('employees_compressed.csv.zip', index=False, compression='zip')

Writing to a Buffer Instead of a File

Sometimes you might want to write to a buffer instead of directly to a file:

python
from io import StringIO

# Write to string buffer
buffer = StringIO()
df.to_csv(buffer, index=False)

# Get the CSV as a string
csv_string = buffer.getvalue()
print("CSV as a string:")
print(csv_string[:100] + "...") # Showing just the first 100 chars

Summary

In this tutorial, we've explored how to export pandas DataFrames to CSV files using the to_csv() method. We've covered:

  • Basic export syntax with df.to_csv(filename)
  • Customizing export options (index, delimiter, headers)
  • Handling missing values with na_rep
  • Working with date and time formats
  • Managing file encodings for international text
  • Real-world examples with filtering and data preparation
  • Advanced options like column selection, float formatting, quoting, and compression

The ability to export data to CSV files is an essential skill for any data professional. CSV files provide a universal format that can be easily shared and imported into various applications.

Additional Resources

Exercises

  1. Create a DataFrame with at least 5 columns and 10 rows of sample data, then export it to a CSV file without the index.
  2. Import a CSV file, filter the data based on a condition, and export the filtered data to a new CSV file.
  3. Create a DataFrame with some missing values and export it with custom NA representation.
  4. Generate a DataFrame with date information and export it with a custom date format.
  5. Create a DataFrame with numerical data and export it with custom float formatting that shows exactly 3 decimal places.


If you spot any mistakes on this website, please let me know at feedback@compilenrun.com. I’d greatly appreciate your feedback! :)