Pandas Export Formatting
When exporting data from Pandas DataFrames to external files, you often need to control how the data is formatted. This guide covers various formatting options available when exporting Pandas DataFrames to different file formats like CSV, Excel, and JSON.
Introduction to Export Formatting
Pandas provides powerful options to customize the appearance, precision, and structure of your exported data. Whether you need to format dates, control decimal places, adjust column widths, or structure JSON output, Pandas offers numerous parameters to achieve the desired formatting.
Proper formatting ensures that:
- Your data is readable and presentable
- Numeric precision meets your requirements
- Dates and times appear in the desired format
- Files can be properly imported by other systems
Let's explore the formatting options available for different export methods.
Formatting CSV Exports
When exporting to CSV files using to_csv()
, several parameters control the formatting of your output.
Basic CSV Formatting
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Salary': [50000.75, 65000.50, 75000.25],
'Hired': pd.to_datetime(['2021-05-12', '2020-03-15', '2019-11-24'])
})
# Basic CSV export
df.to_csv('basic_export.csv')
Controlling Number Formats
You can control how numbers are formatted in CSV files:
# Format with 2 decimal places
df.to_csv('formatted_numbers.csv', float_format='%.2f')
Output in formatted_numbers.csv
:
,Name,Age,Salary,Hired
0,Alice,25,50000.75,2021-05-12 00:00:00
1,Bob,30,65000.50,2020-03-15 00:00:00
2,Charlie,35,75000.25,2019-11-24 00:00:00
Date Formatting
Format dates when exporting:
# Format dates as MM/DD/YYYY
df.to_csv('formatted_dates.csv', date_format='%m/%d/%Y')
Output in formatted_dates.csv
:
,Name,Age,Salary,Hired
0,Alice,25,50000.75,05/12/2021
1,Bob,30,65000.5,03/15/2020
2,Charlie,35,75000.25,11/24/2019
Other CSV Formatting Options
# Customize CSV export with multiple formatting options
df.to_csv('custom_format.csv',
float_format='%.1f', # One decimal place for floats
date_format='%Y-%m-%d', # ISO date format
index=False, # Don't include index
decimal=',', # Use comma as decimal separator
quoting=1) # Quote all non-numeric fields
Excel Export Formatting
Excel exports offer even more formatting options through the to_excel()
method.
Basic Excel Formatting
# Basic Excel export
df.to_excel('basic_export.xlsx', sheet_name='Employee Data')
Number Formatting in Excel
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'Percentage': [0.25, 0.5, 0.75],
'Dollars': [1234.56, 2345.67, 3456.78],
'Scientific': [0.000001, 0.0000001, 0.00000001]
})
# Create a Pandas Excel writer using XlsxWriter as the engine
with pd.ExcelWriter('formatted_numbers.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Sheet1')
# Get the workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Create format objects
percent_format = workbook.add_format({'num_format': '0.0%'})
currency_format = workbook.add_format({'num_format': '$#,##0.00'})
scientific_format = workbook.add_format({'num_format': '0.00E+00'})
# Apply formats to columns
worksheet.set_column('B:B', 10, percent_format) # Percentage column
worksheet.set_column('C:C', 12, currency_format) # Dollar column
worksheet.set_column('D:D', 14, scientific_format) # Scientific column
In the Excel file, the numbers will be formatted as:
- Percentage column: 25.0%, 50.0%, 75.0%
- Dollars column: 2,345.67, $3,456.78
- Scientific column: 1.00E-06, 1.00E-07, 1.00E-08
Styling and Conditional Formatting
For more advanced Excel formatting, you can use the styler
API:
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Score': [85, 92, 78, 95],
'Grade': ['B', 'A', 'C', 'A']
})
# Apply styling
styled_df = df.style.background_gradient(subset=['Score'], cmap='viridis')
# Export with styling
styled_df.to_excel('styled_export.xlsx', engine='openpyxl')
JSON Export Formatting
When exporting to JSON, you can control the structure and indentation of your output.
Basic JSON Formatting
# Basic JSON export
df.to_json('basic_export.json')
Formatting JSON Structure
# Format as records (list of dictionaries)
df.to_json('records_format.json', orient='records')
Output in records_format.json
:
[
{"Name":"Alice","Age":25,"Salary":50000.75,"Hired":"2021-05-12T00:00:00.000Z"},
{"Name":"Bob","Age":30,"Salary":65000.50,"Hired":"2020-03-15T00:00:00.000Z"},
{"Name":"Charlie","Age":35,"Salary":75000.25,"Hired":"2019-11-24T00:00:00.000Z"}
]
Pretty-Printed JSON
# Pretty-printed JSON with indentation
df.to_json('pretty_export.json', orient='records', indent=4)
Output in pretty_export.json
:
[
{
"Name": "Alice",
"Age": 25,
"Salary": 50000.75,
"Hired": "2021-05-12T00:00:00.000Z"
},
{
"Name": "Bob",
"Age": 30,
"Salary": 65000.5,
"Hired": "2020-03-15T00:00:00.000Z"
},
{
"Name": "Charlie",
"Age": 35,
"Salary": 75000.25,
"Hired": "2019-11-24T00:00:00.000Z"
}
]
Date Formatting in JSON
# Format dates in ISO format
df.to_json('date_format.json', orient='records', date_format='iso')
# Format dates as epoch timestamps (milliseconds since 1970-01-01)
df.to_json('epoch_format.json', orient='records', date_format='epoch')
HTML Export Formatting
Pandas can also export DataFrames to HTML with various styling options.
# Basic HTML export
html = df.to_html()
# Export with styling
styled_html = df.style.highlight_max(axis=0).to_html()
# Save to file
with open('styled_table.html', 'w') as f:
f.write(styled_html)
Real-World Application: Creating a Financial Report
Let's create a comprehensive financial report with proper formatting across multiple formats:
import pandas as pd
import numpy as np
from datetime import datetime
# Create financial data
np.random.seed(42)
dates = pd.date_range(start='2023-01-01', periods=12, freq='M')
data = {
'Date': dates,
'Revenue': np.random.randint(50000, 100000, 12),
'Expenses': np.random.randint(30000, 60000, 12),
'Growth_Rate': np.random.random(12) * 0.2, # 0-20% growth
}
financial_df = pd.DataFrame(data)
financial_df['Profit'] = financial_df['Revenue'] - financial_df['Expenses']
financial_df['Profit_Margin'] = financial_df['Profit'] / financial_df['Revenue']
# Create Excel report with formatting
with pd.ExcelWriter('financial_report.xlsx', engine='xlsxwriter') as writer:
financial_df.to_excel(writer, sheet_name='Monthly Financials', index=False)
workbook = writer.book
worksheet = writer.sheets['Monthly Financials']
# Define formats
currency_format = workbook.add_format({'num_format': '$#,##0', 'align': 'right'})
date_format = workbook.add_format({'num_format': 'mmm-yyyy', 'align': 'center'})
percent_format = workbook.add_format({'num_format': '0.0%', 'align': 'right'})
header_format = workbook.add_format({
'bold': True,
'font_color': 'white',
'bg_color': '#4F81BD',
'align': 'center',
'border': 1
})
# Apply formats
worksheet.set_column('A:A', 12, date_format)
worksheet.set_column('B:D', 15, currency_format)
worksheet.set_column('E:E', 15, percent_format)
# Apply header format
for col_num, value in enumerate(financial_df.columns.values):
worksheet.write(0, col_num, value, header_format)
# Add chart
chart = workbook.add_chart({'type': 'line'})
chart.add_series({
'name': 'Revenue',
'categories': ['Monthly Financials', 1, 0, 12, 0],
'values': ['Monthly Financials', 1, 1, 12, 1],
})
chart.add_series({
'name': 'Expenses',
'values': ['Monthly Financials', 1, 2, 12, 2],
})
chart.set_title({'name': 'Revenue vs Expenses'})
chart.set_x_axis({'name': 'Month'})
chart.set_y_axis({'name': 'Amount ($)'})
worksheet.insert_chart('G2', chart)
# Create CSV report
financial_df.to_csv('financial_report.csv',
float_format='%.2f',
date_format='%b-%Y',
index=False)
# Create JSON report
financial_df.to_json('financial_report.json',
orient='records',
date_format='iso',
indent=4)
This example creates a financial report in three formats:
- An Excel file with proper currency, date and percentage formatting, plus a chart
- A CSV file with controlled decimal places and date formatting
- A JSON file with a readable structure and proper date format
Summary
Proper formatting when exporting Pandas DataFrames ensures your data is presented clearly and accurately. The formatting options vary by output format:
- CSV exports: Control decimal precision, date formats, and various delimiters
- Excel exports: Apply number formats, cell styling, conditional formatting, and charts
- JSON exports: Control structure orientation, indentation, and date representation
- HTML exports: Apply styling and highlighting for web presentation
By mastering these formatting options, you can create professional and readable data exports suited for various purposes, from data sharing to report generation.
Additional Resources
- Pandas to_csv documentation
- Pandas to_excel documentation
- Pandas to_json documentation
- XlsxWriter documentation
Exercises
-
Create a DataFrame with different numeric columns (integers, decimals, percentages) and export it to CSV with appropriate formatting for each column type.
-
Export a DataFrame to Excel with conditional formatting that highlights values above a certain threshold in green and below another threshold in red.
-
Create a financial dashboard in Excel using Pandas and XlsxWriter with at least three different chart types.
-
Compare the file size and readability of JSON exports using different orient parameters (
records
,split
,index
) and determine which is best for your specific use case.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)