Pandas CSV Export

Introduction

Exporting data to CSV (Comma-Separated Values) files is one of the most common tasks when working with pandas. CSV files are widely used because they're simple, portable, and can be opened with a variety of applications, from spreadsheet software like Excel to text editors. In this tutorial, we'll explore how to export pandas DataFrames to CSV files using the to_csv() method, along with various options to customize the output according to your needs.

Basic CSV Export

Let's start with a simple example of exporting a DataFrame to a CSV file:

python
import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [50000, 60000, 55000, 70000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Export to CSV
df.to_csv('employees.csv')
print("\nDataFrame exported to 'employees.csv'")

Output:

Original DataFrame:
    Name  Age     City  Salary
0   John   28  New York   50000
1   Anna   34     Paris   60000
2  Peter   29    Berlin   55000
3  Linda   42    London   70000

DataFrame exported to 'employees.csv'

If you open the generated employees.csv file, you'll notice that pandas automatically includes an index column:

,Name,Age,City,Salary
0,John,28,New York,50000
1,Anna,34,Paris,60000
2,Peter,29,Berlin,55000
3,Linda,42,London,70000

Customizing CSV Export

Excluding the Index Column

Often, you don't want to include the DataFrame's index in the exported CSV. You can exclude it using the index parameter:

python
# Export without index
df.to_csv('employees_no_index.csv', index=False)

Now the CSV file will look like this:

Name,Age,City,Salary
John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000

Changing the Delimiter

While CSV stands for "Comma-Separated Values," you might want to use a different delimiter:

python
# Use semicolon as delimiter
df.to_csv('employees_semicolon.csv', sep=';', index=False)

The output file will use semicolons instead of commas:

Name;Age;City;Salary
John;28;New York;50000
Anna;34;Paris;60000
Peter;29;Berlin;55000
Linda;42;London;70000

Customizing Headers

You can choose to exclude column headers or specify custom ones:

python
# Without headers
df.to_csv('employees_no_header.csv', index=False, header=False)

# With custom headers
df.to_csv('employees_custom_header.csv', index=False, 
          header=['FullName', 'Years', 'Location', 'Income'])

Output without headers:

John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000

Output with custom headers:

FullName,Years,Location,Income
John,28,New York,50000
Anna,34,Paris,60000
Peter,29,Berlin,55000
Linda,42,London,70000

Handling Missing Values

Pandas allows you to specify how missing values should be represented in the CSV file:

python
# Create DataFrame with missing values
df_with_na = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', None],
    'Age': [28, None, 29, 42],
    'City': ['New York', 'Paris', None, 'London'],
    'Salary': [50000, 60000, None, 70000]
})

print("DataFrame with missing values:")
print(df_with_na)

# Export with default NA representation
df_with_na.to_csv('employees_with_na.csv', index=False)

# Export with custom NA representation
df_with_na.to_csv('employees_custom_na.csv', index=False, na_rep='MISSING')

Output:

DataFrame with missing values:
    Name   Age     City  Salary
0   John  28.0  New York  50000.0
1   Anna   NaN     Paris  60000.0
2  Peter  29.0      None      NaN
3   None  42.0    London  70000.0

Default NA representation (empty fields):

Name,Age,City,Salary
John,28.0,New York,50000.0
Anna,,Paris,60000.0
Peter,29.0,,
,42.0,London,70000.0

Custom NA representation:

Name,Age,City,Salary
John,28.0,New York,50000.0
Anna,MISSING,Paris,60000.0
Peter,29.0,MISSING,MISSING
MISSING,42.0,London,70000.0

Working with Date and Time

When exporting data that includes dates and times, you might want to control the format:

python
import pandas as pd
from datetime import datetime

# Create a DataFrame with dates
date_df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Date Joined': [datetime(2020, 1, 15), datetime(2019, 5, 20), datetime(2021, 3, 10)],
    'Last Login': pd.to_datetime(['2023-07-15 08:30:00', '2023-07-14 15:45:00', '2023-07-16 12:15:00'])
})

print("DataFrame with dates:")
print(date_df)

# Export with default date format
date_df.to_csv('employees_dates.csv', index=False)

# Export with custom date format
date_df.to_csv('employees_custom_dates.csv', index=False, date_format='%Y-%m-%d')

Output:

DataFrame with dates:
    Name Date Joined           Last Login
0   John  2020-01-15 2023-07-15 08:30:00
1   Anna  2019-05-20 2023-07-14 15:45:00
2  Peter  2021-03-10 2023-07-16 12:15:00

Custom date format output:

Name,Date Joined,Last Login
John,2020-01-15,2023-07-15
Anna,2019-05-20,2023-07-14
Peter,2021-03-10,2023-07-16

Handling File Encoding

When your data contains special characters or non-ASCII text, you need to consider file encoding:

python
# Create a DataFrame with special characters
special_df = pd.DataFrame({
    'Name': ['José', 'Søren', 'François', 'Natalía'],
    'Country': ['Spain', 'Denmark', 'France', 'Russia']
})

print("DataFrame with special characters:")
print(special_df)

# Export with UTF-8 encoding (recommended for international text)
special_df.to_csv('international_names.csv', index=False, encoding='utf-8')

# For older systems that might need a different encoding
special_df.to_csv('international_names_legacy.csv', index=False, encoding='latin-1')

Output:

DataFrame with special characters:
       Name  Country
0      José    Spain
1     Søren  Denmark
2  François   France
3   Natalía   Russia

Real-World Example: Exporting Filtered Data

Let's see a practical example where we filter data before exporting it:

python
import pandas as pd

# Load sample data (sales data)
sales_data = {
    'Date': pd.date_range(start='2023-01-01', periods=20),
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Laptop',
               'Phone', 'Laptop', 'Tablet', 'Monitor', 'Phone',
               'Laptop', 'Phone', 'Tablet', 'Laptop', 'Monitor',
               'Phone', 'Laptop', 'Tablet', 'Phone', 'Monitor'],
    'Region': ['North', 'South', 'East', 'West', 'North',
              'East', 'West', 'North', 'South', 'East',
              'West', 'North', 'South', 'East', 'West',
              'North', 'South', 'East', 'West', 'North'],
    'Units': [10, 25, 15, 8, 12, 30, 14, 18, 7, 22, 
             9, 28, 13, 11, 6, 24, 16, 19, 27, 5],
    'Price': [1200, 800, 500, 300, 1200, 
             750, 1250, 550, 320, 820, 
             1180, 780, 520, 1220, 310, 
             790, 1150, 530, 810, 330]
}

sales_df = pd.DataFrame(sales_data)
sales_df['Total'] = sales_df['Units'] * sales_df['Price']

print("Original Sales Data:")
print(sales_df.head())

# Filter data: Get only laptop sales
laptop_sales = sales_df[sales_df['Product'] == 'Laptop']
print("\nLaptop Sales:")
print(laptop_sales)

# Export filtered data
laptop_sales.to_csv('laptop_sales.csv', index=False)

# Filter by multiple conditions: Get high-value sales (>$10000)
high_value_sales = sales_df[sales_df['Total'] > 10000]
print("\nHigh Value Sales:")
print(high_value_sales)

# Export with specific columns
high_value_sales.to_csv('high_value_sales.csv', 
                       index=False,
                       columns=['Date', 'Product', 'Region', 'Total'])

# Group data and export summary
summary = sales_df.groupby('Product')['Total'].sum().reset_index()
summary.to_csv('product_sales_summary.csv', index=False)
print("\nSales Summary by Product:")
print(summary)

Advanced CSV Export Options

Exporting a Subset of Columns

You can specify which columns to include in the export:

python
# Export only specific columns
df.to_csv('employees_subset.csv', columns=['Name', 'Salary'], index=False)

Output:

Name,Salary
John,50000
Anna,60000
Peter,55000
Linda,70000

Float Formatting

Control how floating-point numbers are displayed:

python
import numpy as np

# Create DataFrame with floating point numbers
float_df = pd.DataFrame({
    'Name': ['Product A', 'Product B', 'Product C'],
    'Price': [19.9999, 45.5, 32.33333],
    'Rating': [4.7, 3.9, 4.2],
    'Score': [0.967452, 0.782541, 0.845123]
})

print("DataFrame with floating point numbers:")
print(float_df)

# Export with default formatting
float_df.to_csv('products_default.csv', index=False)

# Export with specified float format
float_df.to_csv('products_formatted.csv', index=False, float_format='%.2f')

Output:

DataFrame with floating point numbers:
        Name     Price  Rating     Score
0  Product A  19.99990     4.7  0.967452
1  Product B  45.50000     3.9  0.782541
2  Product C  32.33333     4.2  0.845123

Formatted output:

Name,Price,Rating,Score
Product A,20.00,4.70,0.97
Product B,45.50,3.90,0.78
Product C,32.33,4.20,0.85

Quoting Options

Control how text fields are quoted:

python
import csv

# Create DataFrame with text that needs quoting
quote_df = pd.DataFrame({
    'Product': ['Basic Laptop', 'Pro Laptop, with accessories', 'Tablet "Mini"'],
    'Description': ['Standard model', 'Includes keyboard, mouse, and stand', 'Compact, 8" screen']
})

print("DataFrame with text requiring quotes:")
print(quote_df)

# Default quoting
quote_df.to_csv('products_default_quotes.csv', index=False)

# Quote all fields
quote_df.to_csv('products_all_quoted.csv', index=False, quoting=csv.QUOTE_ALL)

# Quote only non-numeric fields
quote_df.to_csv('products_nonnumeric_quoted.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)

Compression

You can directly export to a compressed CSV file:

python
# Export to a gzipped CSV file
df.to_csv('employees_compressed.csv.gz', index=False, compression='gzip')

# Export to a zip file
df.to_csv('employees_compressed.csv.zip', index=False, compression='zip')

Writing to a Buffer Instead of a File

Sometimes you might want to write to a buffer instead of directly to a file:

python
from io import StringIO

# Write to string buffer
buffer = StringIO()
df.to_csv(buffer, index=False)

# Get the CSV as a string
csv_string = buffer.getvalue()
print("CSV as a string:")
print(csv_string[:100] + "...")  # Showing just the first 100 chars

Summary

In this tutorial, we've explored how to export pandas DataFrames to CSV files using the to_csv() method. We've covered:

Basic export syntax with df.to_csv(filename)
Customizing export options (index, delimiter, headers)
Handling missing values with na_rep
Working with date and time formats
Managing file encodings for international text
Real-world examples with filtering and data preparation
Advanced options like column selection, float formatting, quoting, and compression

The ability to export data to CSV files is an essential skill for any data professional. CSV files provide a universal format that can be easily shared and imported into various applications.

Additional Resources

Exercises

Create a DataFrame with at least 5 columns and 10 rows of sample data, then export it to a CSV file without the index.
Import a CSV file, filter the data based on a condition, and export the filtered data to a new CSV file.
Create a DataFrame with some missing values and export it with custom NA representation.
Generate a DataFrame with date information and export it with a custom date format.
Create a DataFrame with numerical data and export it with custom float formatting that shows exactly 3 decimal places.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Basic CSV Export​

Customizing CSV Export​

Excluding the Index Column​

Changing the Delimiter​

Customizing Headers​

Handling Missing Values​

Working with Date and Time​

Handling File Encoding​

Real-World Example: Exporting Filtered Data​

Advanced CSV Export Options​

Exporting a Subset of Columns​

Float Formatting​

Quoting Options​

Compression​

Writing to a Buffer Instead of a File​

Summary​

Additional Resources​

Exercises​

Introduction

Basic CSV Export

Customizing CSV Export

Excluding the Index Column

Changing the Delimiter

Customizing Headers

Handling Missing Values

Working with Date and Time

Handling File Encoding

Real-World Example: Exporting Filtered Data

Advanced CSV Export Options

Exporting a Subset of Columns

Float Formatting

Quoting Options

Compression

Writing to a Buffer Instead of a File

Summary

Additional Resources

Exercises