Skip to main content

Pandas Export Formatting

When exporting data from Pandas DataFrames to external files, you often need to control how the data is formatted. This guide covers various formatting options available when exporting Pandas DataFrames to different file formats like CSV, Excel, and JSON.

Introduction to Export Formatting

Pandas provides powerful options to customize the appearance, precision, and structure of your exported data. Whether you need to format dates, control decimal places, adjust column widths, or structure JSON output, Pandas offers numerous parameters to achieve the desired formatting.

Proper formatting ensures that:

  • Your data is readable and presentable
  • Numeric precision meets your requirements
  • Dates and times appear in the desired format
  • Files can be properly imported by other systems

Let's explore the formatting options available for different export methods.

Formatting CSV Exports

When exporting to CSV files using to_csv(), several parameters control the formatting of your output.

Basic CSV Formatting

python
import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Salary': [50000.75, 65000.50, 75000.25],
'Hired': pd.to_datetime(['2021-05-12', '2020-03-15', '2019-11-24'])
})

# Basic CSV export
df.to_csv('basic_export.csv')

Controlling Number Formats

You can control how numbers are formatted in CSV files:

python
# Format with 2 decimal places
df.to_csv('formatted_numbers.csv', float_format='%.2f')

Output in formatted_numbers.csv:

,Name,Age,Salary,Hired
0,Alice,25,50000.75,2021-05-12 00:00:00
1,Bob,30,65000.50,2020-03-15 00:00:00
2,Charlie,35,75000.25,2019-11-24 00:00:00

Date Formatting

Format dates when exporting:

python
# Format dates as MM/DD/YYYY
df.to_csv('formatted_dates.csv', date_format='%m/%d/%Y')

Output in formatted_dates.csv:

,Name,Age,Salary,Hired
0,Alice,25,50000.75,05/12/2021
1,Bob,30,65000.5,03/15/2020
2,Charlie,35,75000.25,11/24/2019

Other CSV Formatting Options

python
# Customize CSV export with multiple formatting options
df.to_csv('custom_format.csv',
float_format='%.1f', # One decimal place for floats
date_format='%Y-%m-%d', # ISO date format
index=False, # Don't include index
decimal=',', # Use comma as decimal separator
quoting=1) # Quote all non-numeric fields

Excel Export Formatting

Excel exports offer even more formatting options through the to_excel() method.

Basic Excel Formatting

python
# Basic Excel export
df.to_excel('basic_export.xlsx', sheet_name='Employee Data')

Number Formatting in Excel

python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
'Percentage': [0.25, 0.5, 0.75],
'Dollars': [1234.56, 2345.67, 3456.78],
'Scientific': [0.000001, 0.0000001, 0.00000001]
})

# Create a Pandas Excel writer using XlsxWriter as the engine
with pd.ExcelWriter('formatted_numbers.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Sheet1')

# Get the workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Create format objects
percent_format = workbook.add_format({'num_format': '0.0%'})
currency_format = workbook.add_format({'num_format': '$#,##0.00'})
scientific_format = workbook.add_format({'num_format': '0.00E+00'})

# Apply formats to columns
worksheet.set_column('B:B', 10, percent_format) # Percentage column
worksheet.set_column('C:C', 12, currency_format) # Dollar column
worksheet.set_column('D:D', 14, scientific_format) # Scientific column

In the Excel file, the numbers will be formatted as:

  • Percentage column: 25.0%, 50.0%, 75.0%
  • Dollars column: 1,234.56,1,234.56, 2,345.67, $3,456.78
  • Scientific column: 1.00E-06, 1.00E-07, 1.00E-08

Styling and Conditional Formatting

For more advanced Excel formatting, you can use the styler API:

python
import pandas as pd

df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Score': [85, 92, 78, 95],
'Grade': ['B', 'A', 'C', 'A']
})

# Apply styling
styled_df = df.style.background_gradient(subset=['Score'], cmap='viridis')

# Export with styling
styled_df.to_excel('styled_export.xlsx', engine='openpyxl')

JSON Export Formatting

When exporting to JSON, you can control the structure and indentation of your output.

Basic JSON Formatting

python
# Basic JSON export
df.to_json('basic_export.json')

Formatting JSON Structure

python
# Format as records (list of dictionaries)
df.to_json('records_format.json', orient='records')

Output in records_format.json:

json
[
{"Name":"Alice","Age":25,"Salary":50000.75,"Hired":"2021-05-12T00:00:00.000Z"},
{"Name":"Bob","Age":30,"Salary":65000.50,"Hired":"2020-03-15T00:00:00.000Z"},
{"Name":"Charlie","Age":35,"Salary":75000.25,"Hired":"2019-11-24T00:00:00.000Z"}
]

Pretty-Printed JSON

python
# Pretty-printed JSON with indentation
df.to_json('pretty_export.json', orient='records', indent=4)

Output in pretty_export.json:

json
[
{
"Name": "Alice",
"Age": 25,
"Salary": 50000.75,
"Hired": "2021-05-12T00:00:00.000Z"
},
{
"Name": "Bob",
"Age": 30,
"Salary": 65000.5,
"Hired": "2020-03-15T00:00:00.000Z"
},
{
"Name": "Charlie",
"Age": 35,
"Salary": 75000.25,
"Hired": "2019-11-24T00:00:00.000Z"
}
]

Date Formatting in JSON

python
# Format dates in ISO format
df.to_json('date_format.json', orient='records', date_format='iso')

# Format dates as epoch timestamps (milliseconds since 1970-01-01)
df.to_json('epoch_format.json', orient='records', date_format='epoch')

HTML Export Formatting

Pandas can also export DataFrames to HTML with various styling options.

python
# Basic HTML export
html = df.to_html()

# Export with styling
styled_html = df.style.highlight_max(axis=0).to_html()

# Save to file
with open('styled_table.html', 'w') as f:
f.write(styled_html)

Real-World Application: Creating a Financial Report

Let's create a comprehensive financial report with proper formatting across multiple formats:

python
import pandas as pd
import numpy as np
from datetime import datetime

# Create financial data
np.random.seed(42)
dates = pd.date_range(start='2023-01-01', periods=12, freq='M')
data = {
'Date': dates,
'Revenue': np.random.randint(50000, 100000, 12),
'Expenses': np.random.randint(30000, 60000, 12),
'Growth_Rate': np.random.random(12) * 0.2, # 0-20% growth
}

financial_df = pd.DataFrame(data)
financial_df['Profit'] = financial_df['Revenue'] - financial_df['Expenses']
financial_df['Profit_Margin'] = financial_df['Profit'] / financial_df['Revenue']

# Create Excel report with formatting
with pd.ExcelWriter('financial_report.xlsx', engine='xlsxwriter') as writer:
financial_df.to_excel(writer, sheet_name='Monthly Financials', index=False)

workbook = writer.book
worksheet = writer.sheets['Monthly Financials']

# Define formats
currency_format = workbook.add_format({'num_format': '$#,##0', 'align': 'right'})
date_format = workbook.add_format({'num_format': 'mmm-yyyy', 'align': 'center'})
percent_format = workbook.add_format({'num_format': '0.0%', 'align': 'right'})
header_format = workbook.add_format({
'bold': True,
'font_color': 'white',
'bg_color': '#4F81BD',
'align': 'center',
'border': 1
})

# Apply formats
worksheet.set_column('A:A', 12, date_format)
worksheet.set_column('B:D', 15, currency_format)
worksheet.set_column('E:E', 15, percent_format)

# Apply header format
for col_num, value in enumerate(financial_df.columns.values):
worksheet.write(0, col_num, value, header_format)

# Add chart
chart = workbook.add_chart({'type': 'line'})
chart.add_series({
'name': 'Revenue',
'categories': ['Monthly Financials', 1, 0, 12, 0],
'values': ['Monthly Financials', 1, 1, 12, 1],
})
chart.add_series({
'name': 'Expenses',
'values': ['Monthly Financials', 1, 2, 12, 2],
})
chart.set_title({'name': 'Revenue vs Expenses'})
chart.set_x_axis({'name': 'Month'})
chart.set_y_axis({'name': 'Amount ($)'})
worksheet.insert_chart('G2', chart)

# Create CSV report
financial_df.to_csv('financial_report.csv',
float_format='%.2f',
date_format='%b-%Y',
index=False)

# Create JSON report
financial_df.to_json('financial_report.json',
orient='records',
date_format='iso',
indent=4)

This example creates a financial report in three formats:

  1. An Excel file with proper currency, date and percentage formatting, plus a chart
  2. A CSV file with controlled decimal places and date formatting
  3. A JSON file with a readable structure and proper date format

Summary

Proper formatting when exporting Pandas DataFrames ensures your data is presented clearly and accurately. The formatting options vary by output format:

  • CSV exports: Control decimal precision, date formats, and various delimiters
  • Excel exports: Apply number formats, cell styling, conditional formatting, and charts
  • JSON exports: Control structure orientation, indentation, and date representation
  • HTML exports: Apply styling and highlighting for web presentation

By mastering these formatting options, you can create professional and readable data exports suited for various purposes, from data sharing to report generation.

Additional Resources

Exercises

  1. Create a DataFrame with different numeric columns (integers, decimals, percentages) and export it to CSV with appropriate formatting for each column type.

  2. Export a DataFrame to Excel with conditional formatting that highlights values above a certain threshold in green and below another threshold in red.

  3. Create a financial dashboard in Excel using Pandas and XlsxWriter with at least three different chart types.

  4. Compare the file size and readability of JSON exports using different orient parameters (records, split, index) and determine which is best for your specific use case.



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)