Pandas Excel Export
Introduction
When working with data in Python, exporting your processed data to Excel is a common requirement. Excel remains one of the most widely used tools for data analysis and sharing across organizations. Pandas, Python's powerful data manipulation library, provides robust functionality to export DataFrame objects to Excel files with various customization options.
In this tutorial, you'll learn how to export your Pandas DataFrames to Excel files, apply formatting, work with multiple sheets, and handle various export configurations to create professional spreadsheets.
Prerequisites
Before we dive into exporting to Excel, make sure you have the following libraries installed:
# Install required libraries
# pip install pandas openpyxl xlsxwriter
You'll need either openpyxl
or xlsxwriter
as the Excel engine. Both have their advantages, but we'll primarily use openpyxl
in this tutorial.
Basic Excel Export with Pandas
Let's start with the simplest way to export a DataFrame to Excel:
import pandas as pd
# Create a sample DataFrame
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 34, 29, 42],
'City': ['New York', 'Paris', 'Berlin', 'London'],
'Salary': [75000, 80000, 65000, 90000]
}
df = pd.DataFrame(data)
# Basic export to Excel
df.to_excel('employees.xlsx', index=False)
This creates a file named 'employees.xlsx' with our data. The index=False
parameter prevents the DataFrame index from being included in the Excel file, which is often what you want unless the index contains meaningful information.
Configuring the Excel Export
The to_excel()
method offers many customization options:
# Export with more configuration options
df.to_excel(
'employees_config.xlsx',
sheet_name='Employee Data', # Custom sheet name
index=False, # Don't include index
freeze_panes=(1, 0), # Freeze the first row
engine='openpyxl' # Specify the Excel engine
)
Here's what each parameter does:
sheet_name
: Sets the name of the worksheetfreeze_panes
: Freezes rows and columns for better navigation of large datasetsengine
: Specifies which Excel writer to use ('openpyxl' or 'xlsxwriter')
Working with Excel Writers for More Control
For more advanced formatting, you can use Excel writers explicitly:
# Create an Excel writer
with pd.ExcelWriter('employees_formatted.xlsx', engine='openpyxl') as writer:
# Write the DataFrame to Excel
df.to_excel(writer, sheet_name='Employees', index=False)
# Get the xlsxwriter worksheet object
worksheet = writer.sheets['Employees']
# Get the dimensions of the DataFrame
(max_row, max_col) = df.shape
# Set column widths
for i, col in enumerate(df.columns):
column_len = max(df[col].astype(str).map(len).max(), len(col) + 2)
worksheet.column_dimensions[chr(65 + i)].width = column_len
This approach gives you access to the underlying Excel worksheet, allowing you to adjust column widths to fit your data better.
Exporting Multiple DataFrames to Different Sheets
You can export multiple DataFrames to different worksheets in the same Excel file:
import pandas as pd
# Create sample DataFrames
employees_df = pd.DataFrame({
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 34, 29, 42],
'Department': ['IT', 'HR', 'IT', 'Finance']
})
salaries_df = pd.DataFrame({
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Salary': [75000, 80000, 65000, 90000],
'Bonus': [5000, 6000, 3000, 10000]
})
# Export to multiple sheets
with pd.ExcelWriter('company_data.xlsx', engine='openpyxl') as writer:
employees_df.to_excel(writer, sheet_name='Employees', index=False)
salaries_df.to_excel(writer, sheet_name='Salaries', index=False)
This creates an Excel file with two sheets: 'Employees' and 'Salaries'.
Advanced Formatting with XlsxWriter
If you need more advanced formatting, XlsxWriter
is a great option:
import pandas as pd
# Create a sample DataFrame
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 34, 29, 42],
'City': ['New York', 'Paris', 'Berlin', 'London'],
'Salary': [75000, 80000, 65000, 90000]
}
df = pd.DataFrame(data)
# Create a Pandas Excel writer using XlsxWriter as the engine
with pd.ExcelWriter('styled_employees.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Employees', index=False)
# Get workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Employees']
# Create a format for the header
header_format = workbook.add_format({
'bold': True,
'text_wrap': True,
'valign': 'top',
'fg_color': '#D7E4BC',
'border': 1
})
# Apply the header format to the first row
for col_num, value in enumerate(df.columns.values):
worksheet.write(0, col_num, value, header_format)
# Add a number format for the salary column
money_format = workbook.add_format({'num_format': '$#,##0'})
worksheet.set_column('D:D', None, money_format) # Apply to salary column
# Auto-fit columns
for i, col in enumerate(df.columns):
column_len = max(df[col].astype(str).map(len).max(), len(col) + 2)
worksheet.set_column(i, i, column_len)
This example creates an Excel file with:
- Formatted headers with background color and borders
- Currency formatting for the salary column
- Auto-fitted columns
Real-World Example: Sales Report Export
Let's create a more comprehensive example of a sales report that might be used in a business context:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
# Create sample sales data
np.random.seed(42) # For reproducible results
dates = pd.date_range(start='2023-01-01', periods=100, freq='D')
products = ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard']
regions = ['North', 'South', 'East', 'West']
sales_data = []
for _ in range(500):
date = np.random.choice(dates)
product = np.random.choice(products)
region = np.random.choice(regions)
quantity = np.random.randint(1, 10)
price = np.random.choice([899, 699, 499, 349, 79])
sales_data.append({
'Date': date,
'Product': product,
'Region': region,
'Quantity': quantity,
'Unit Price': price,
'Revenue': quantity * price
})
# Create DataFrame
sales_df = pd.DataFrame(sales_data)
# Create summary DataFrames
product_summary = sales_df.groupby('Product').agg({
'Quantity': 'sum',
'Revenue': 'sum'
}).reset_index()
region_summary = sales_df.groupby('Region').agg({
'Quantity': 'sum',
'Revenue': 'sum'
}).reset_index()
monthly_summary = sales_df.copy()
monthly_summary['Month'] = monthly_summary['Date'].dt.strftime('%Y-%m')
monthly_summary = monthly_summary.groupby('Month').agg({
'Quantity': 'sum',
'Revenue': 'sum'
}).reset_index()
# Export to Excel with formatting
with pd.ExcelWriter('sales_report.xlsx', engine='xlsxwriter') as writer:
# Write DataFrames to different sheets
sales_df.to_excel(writer, sheet_name='Raw Data', index=False)
product_summary.to_excel(writer, sheet_name='By Product', index=False)
region_summary.to_excel(writer, sheet_name='By Region', index=False)
monthly_summary.to_excel(writer, sheet_name='Monthly Trends', index=False)
# Get workbook and add formatting
workbook = writer.book
# Add formats
header_format = workbook.add_format({
'bold': True,
'bg_color': '#4F81BD',
'font_color': 'white',
'border': 1
})
money_format = workbook.add_format({'num_format': '$#,##0.00'})
date_format = workbook.add_format({'num_format': 'yyyy-mm-dd'})
# Format each sheet
for sheet_name in ['Raw Data', 'By Product', 'By Region', 'Monthly Trends']:
worksheet = writer.sheets[sheet_name]
df = sales_df if sheet_name == 'Raw Data' else (
product_summary if sheet_name == 'By Product' else (
region_summary if sheet_name == 'By Region' else monthly_summary
))
# Format headers
for col_num, value in enumerate(df.columns.values):
worksheet.write(0, col_num, value, header_format)
# Set column formats
if 'Revenue' in df.columns:
revenue_col = df.columns.get_loc('Revenue')
worksheet.set_column(revenue_col, revenue_col, None, money_format)
if 'Unit Price' in df.columns:
price_col = df.columns.get_loc('Unit Price')
worksheet.set_column(price_col, price_col, None, money_format)
if 'Date' in df.columns:
date_col = df.columns.get_loc('Date')
worksheet.set_column(date_col, date_col, None, date_format)
# Auto-fit columns
for i, col in enumerate(df.columns):
if col != 'Date': # Skip date column as it's already formatted
column_len = max(df[col].astype(str).map(len).max(), len(col) + 2)
worksheet.set_column(i, i, column_len)
else:
worksheet.set_column(i, i, 12) # Fixed width for date column
# Add charts to the Product summary sheet
product_chart = workbook.add_chart({'type': 'column'})
product_chart.add_series({
'name': 'Revenue by Product',
'categories': ['By Product', 1, 0, len(product_summary), 0],
'values': ['By Product', 1, 2, len(product_summary), 2],
})
product_chart.set_title({'name': 'Revenue by Product'})
product_chart.set_size({'width': 500, 'height': 300})
writer.sheets['By Product'].insert_chart('E2', product_chart)
This example creates a comprehensive sales report with:
- Raw data sheet with all sales records
- Summary sheets for products, regions, and monthly trends
- Proper formatting for headers, currency values, and dates
- A chart showing revenue by product
Common Issues and Solutions
Issue 1: Missing Excel Engine
If you encounter an error like ImportError: Missing optional dependency 'openpyxl'
, you need to install the appropriate Excel engine:
# Install the required engine
# pip install openpyxl # For .xlsx files
# pip install xlwt # For .xls files (legacy)
Issue 2: Excel File Already Exists
If you're running your code repeatedly, you might encounter a PermissionError
if the Excel file is still open. Always close the file before running your code again or use a different filename.
Issue 3: Date and Time Formatting Issues
Excel can sometimes misinterpret date formats. Use explicit formatting:
with pd.ExcelWriter('dates_example.xlsx', engine='xlsxwriter') as writer:
df_with_dates.to_excel(writer, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
date_format = workbook.add_format({'num_format': 'yyyy-mm-dd'})
date_col_idx = df_with_dates.columns.get_loc('date_column')
worksheet.set_column(date_col_idx, date_col_idx, None, date_format)
Summary
Exporting Pandas DataFrames to Excel is a versatile process that allows you to:
- Create basic Excel exports with minimal code
- Add formatting and styling to your Excel files
- Export multiple DataFrames to different sheets in the same file
- Create professional reports with charts and formatting
- Customize exports to match your exact requirements
Whether you're creating a simple data export or a complex formatted report, Pandas provides the tools you need to generate Excel files for any purpose.
Additional Resources
Exercises
- Basic Export: Create a DataFrame with student grades and export it to Excel without the index.
- Multiple Sheets: Create two DataFrames (one for students, one for courses) and export them to different sheets in the same Excel file.
- Formatted Report: Create a sales report DataFrame and export it with proper formatting (headers, currency format for prices).
- Conditional Formatting: Export a DataFrame with conditional formatting that highlights cells above a certain threshold.
- Charts and Graphs: Export a DataFrame and add a bar chart representation of the data.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)