Pandas Styling
When working with data in pandas, it's often necessary to present your findings in a clear, visually appealing way. Pandas provides a rich set of styling tools that allow you to format your DataFrames for better visualization and communication of insights. This guide will walk you through the various styling capabilities in pandas, from basic techniques to advanced customization.
Introduction to Pandas Styling
The .style accessor in pandas allows you to format and style your DataFrames without changing the underlying data. This is particularly useful when you want to:
- Highlight specific values or patterns
- Format numbers appropriately (currencies, percentages, etc.)
- Create visually appealing reports
- Add color scales to represent data magnitude
- Generate publication-quality tables
Let's dive into how you can leverage these styling capabilities!
Basic Styling Techniques
Getting Started with Styling
To begin styling a DataFrame, access the .style property of your DataFrame:
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
'A': np.linspace(1, 10, 5),
'B': np.linspace(10, 20, 5),
'C': ['a', 'b', 'c', 'd', 'e']
})
# Access the style property
styled_df = df.style
styled_df
Number Formatting
One of the most common styling needs is to format numbers with specific precision or as percentages, currencies, etc.
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
'Price': [1.2345, 2.3456, 3.4567],
'Percentage': [0.1234, 0.2345, 0.3456],
'Large Number': [1234567, 2345678, 3456789]
})
# Format with different styles
styled_df = df.style.format({
'Price': '${:.2f}',
'Percentage': '{:.2%}',
'Large Number': '{:,.0f}'
})
styled_df
The output shows:
- 'Price' column formatted as currency with 2 decimal places
- 'Percentage' column formatted as percentage with 2 decimal places
- 'Large Number' column formatted with thousand separators
Conditional Formatting
Highlight Max and Min Values
You can easily highlight the maximum and minimum values in your DataFrame. By default (axis=0), the maximum and minimum are found separately for each column:
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame(np.random.randn(5, 3),
columns=['A', 'B', 'C'])
# Highlight each column's maximum in light green and its minimum in pink
styled_df = df.style.highlight_max(color='lightgreen').highlight_min(color='pink')
styled_df
Custom Conditional Formatting
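You can define custom highlighting conditions with the .applymap() and .apply() methods. .applymap() calls your function once per cell, while .apply() passes it a whole column, row, or table at a time. In pandas 2.1 and later, .applymap() is deprecated in favor of the equivalent .map():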
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3),
columns=['A', 'B', 'C'])
# Define a function for conditional styling
def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the CSS property `'color: red'` for negative
    values, black otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return f'color: {color}'
# Apply the styling function to every cell of the DataFrame
# (on pandas >= 2.1, df.style.map is the non-deprecated spelling)
styled_df = df.style.applymap(color_negative_red)
styled_df
Color Scales and Gradients
Background Color Scales
You can apply color scales to visualize the distribution of values:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 3),
columns=['A', 'B', 'C'])
# Apply a color background gradient
styled_df = df.style.background_gradient(cmap='viridis')
styled_df
Applying Color Scales by Columns
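The axis argument controls how the gradient is scaled. With axis=0 (the default), each column is scaled independently, which keeps columns with very different magnitudes readable; axis=1 scales each row, and axis=None scales the whole table at once: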
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 3) * [10, 100, 1000],
columns=['A', 'B', 'C'])
# Apply color background gradient by column
styled_df = df.style.background_gradient(cmap='coolwarm', axis=0)
styled_df
Bar Charts and Visual Representations
Adding Bar Charts
Pandas styling allows you to add in-cell bar charts for numerical comparisons:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': np.random.randint(0, 100, 5),
'B': np.random.randint(0, 100, 5),
'C': np.random.randint(0, 100, 5)
})
# Add bar charts to visualize data
styled_df = df.style.bar(subset=['A', 'B', 'C'], color='#d65f5f')
styled_df
Table Styling for Presentation
Table Properties
The .set_properties() method applies the same CSS properties to every data cell in the table (header cells are not affected):
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3),
columns=['A', 'B', 'C'])
# Style the entire table
styled_df = df.style.set_properties(**{
'background-color': '#f5f5f5',
'border': '1px solid black',
'text-align': 'center',
'font-weight': 'bold'
})
styled_df
Adding Captions and Headers
Add captions and style your table headers:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3),
columns=['A', 'B', 'C'])
# Add caption and style headers
styled_df = df.style.set_caption("Sample Data Analysis")\
.set_table_styles([
{'selector': 'th',
'props': [('background-color', '#007bff'),
('color', 'white'),
('font-weight', 'bold')]},
{'selector': 'caption',
'props': [('caption-side', 'top'),
('font-size', '18px'),
('font-weight', 'bold')]}
])
styled_df
Combining Multiple Styling Features
You can chain multiple styling methods to create comprehensive visualizations:
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({
'A': np.linspace(1, 10, 10),
'B': np.linspace(11, 20, 10) + np.random.randn(10),
'C': np.random.uniform(20, 30, 10)
})
# Apply multiple styles
styled_df = df.style\
.format('{:.2f}')\
.background_gradient(subset=['A'], cmap='Reds')\
.background_gradient(subset=['B'], cmap='Blues')\
.background_gradient(subset=['C'], cmap='Greens')\
.highlight_max(color='yellow', axis=0)\
.set_caption('Combined Styling Example')\
.set_table_styles([
{'selector': 'th',
'props': [('background-color', '#4CAF50'),
('color', 'white'),
('font-weight', 'bold')]}
])
styled_df
Real-world Application: Financial Dashboard
Let's create a more comprehensive example showing how styling can be used in a financial context:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create sample financial data
np.random.seed(42)
index = pd.date_range('2023-01-01', periods=10, freq='B')
data = {
'Open': np.random.uniform(100, 150, 10),
'High': np.random.uniform(150, 200, 10),
'Low': np.random.uniform(50, 100, 10),
'Close': np.random.uniform(100, 150, 10),
'Volume': np.random.randint(1000000, 5000000, 10),
'Change %': np.random.uniform(-0.05, 0.05, 10)
}
df = pd.DataFrame(data, index=index)
# Fix the data to ensure High >= Open >= Low and High >= Close >= Low
df['High'] = df[['Open', 'Close', 'High']].max(axis=1)
df['Low'] = df[['Open', 'Close', 'Low']].min(axis=1)
# Create a styled financial table
def color_change(val):
    # Green for gains; red for losses and unchanged values
    color = 'green' if val > 0 else 'red'
    return f'color: {color}; font-weight: bold'
styled_df = df.style\
.format({
'Open': '${:.2f}',
'High': '${:.2f}',
'Low': '${:.2f}',
'Close': '${:.2f}',
'Volume': '{:,.0f}',
'Change %': '{:.2%}'
})\
.applymap(color_change, subset=['Change %'])\
.background_gradient(subset=['Volume'], cmap='Blues')\
.bar(subset=['High', 'Low'], color=['lightgreen', 'pink'])\
.set_caption('Stock Price Analysis - January 2023')\
.set_table_styles([
{'selector': 'th',
'props': [('background-color', '#2c3e50'),
('color', 'white'),
('font-weight', 'bold'),
('text-align', 'center')]},
{'selector': 'caption',
'props': [('caption-side', 'top'),
('font-size', '16px'),
('font-weight', 'bold'),
('color', '#2c3e50')]}
])
styled_df
Exporting Styled DataFrames
Converting to HTML
You can export your styled DataFrame to HTML for web applications:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3),
columns=['A', 'B', 'C'])
# Style the DataFrame
styled_df = df.style.format('{:.2f}')\
.background_gradient(cmap='viridis')
# Export to HTML (can be saved to a file)
html = styled_df.to_html()
print(html[:300] + "...") # Showing first 300 characters
Output:
<style type="text/css">
#T_12345_row0_col0, #T_12345_row0_col1, #T_12345_row0_col2, #T_12345_row1_col0, #T_12345_row1_col1, #T_12345_row1_col2, #T_12345_row2_col0, #T_12345_row2_col1, #T_12345_row2_col2, #T_12345_row3_col0, #T_12345_row3_col1, #T_12345_row3_col2, #T_12345_row4_col0, #T_12345_row4_col1, #T_12345_row4_col2 {
background-color: ...
Summary
Pandas styling functionality offers a powerful way to enhance your DataFrames for better visualization and communication. In this guide, we covered:
- Basic number formatting and styling
- Conditional formatting and highlighting
- Color scales and gradients
- In-cell bar charts
- Table styling for presentations
- Combining multiple styling features
- Real-world applications
- Exporting styled DataFrames
With these tools, you can transform plain DataFrames into informative, visually appealing presentations that effectively communicate your data insights.
Additional Resources and Exercises
Additional Resources
- Official Pandas Styling Documentation
- Pandas Styling Cookbook
- Python for Data Analysis (Book by Wes McKinney)
Exercises
- Basic Styling Exercise: Create a DataFrame with student grades and format it to show percentages. Highlight scores above 90% in green and below 60% in red.
- Financial Data Exercise: Generate a mock stock portfolio and create a styled table showing gains and losses with appropriate color coding.
- Sales Dashboard Challenge: Create a DataFrame with monthly sales data and style it to include:
  - Currency formatting
  - Color gradient for sales values
  - Bar charts comparing monthly performance
  - Highlighted top and bottom months
  - Custom header styling
- Custom Style Function: Create a custom styling function that applies a 'traffic light' color scheme (red, yellow, green) based on threshold values.
- Export Challenge: Style a DataFrame with various formatting features and export it to HTML for use in a web page.
By mastering pandas styling, you'll be able to create more effective data presentations that highlight key insights and make your analysis more impactful.