Pandas Multiple Subplots
When analyzing data with pandas, you often need to compare multiple visualizations side by side. Creating multiple subplots allows you to present different aspects of your data in a single, organized figure. This tutorial will teach you how to create and customize multiple subplots using Pandas and its integration with Matplotlib.
Introduction to Subplots
Subplots are separate plotting areas arranged in a grid within a single figure. They're especially useful when you want to:
- Compare different variables or relationships
- Show the same data with different visualization types
- Present before/after scenarios
- Display related but distinct data views
Pandas leverages Matplotlib's subplot capabilities, giving you powerful options to create complex visualization layouts.
Basic Syntax for Creating Subplots
There are two main approaches to creating subplots with Pandas:
- Using Pandas' built-in plotting methods with the subplots parameter
- Creating Matplotlib figure and axes objects explicitly and passing them to Pandas
Let's explore both methods.
Method 1: Using Pandas' Built-in Subplot Parameter
Pandas DataFrame's plot() method includes a subplots parameter that, when set to True, creates a separate subplot for each column in your DataFrame.
Example: Basic Column Subplots
Let's create a simple DataFrame and plot each column as a separate subplot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Set a consistent style for better visuals
plt.style.use('seaborn-v0_8')
# Create sample data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=100)
df = pd.DataFrame({
'Temperature': np.random.normal(25, 5, 100),
'Humidity': np.random.normal(60, 10, 100),
'Wind Speed': np.random.normal(15, 7, 100),
'Rainfall': np.random.exponential(5, 100)
}, index=dates)
# Create subplots for each column
# (with subplots=True, df.plot returns an array of Axes, not a (fig, axes) pair)
axes = df.plot(subplots=True, figsize=(12, 10), layout=(2, 2))
plt.tight_layout()
plt.show()
In this example:
- We created a DataFrame with weather-related metrics
- df.plot(subplots=True) creates a separate subplot for each column
- figsize=(12, 10) sets the overall figure size
- layout=(2, 2) arranges the subplots in a 2×2 grid
- plt.tight_layout() automatically adjusts subplot parameters for optimal spacing
The resulting figure contains four separate line plots, one for each weather metric.
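It can help to look at what df.plot(subplots=True) hands back. The sketch below uses a small made-up DataFrame (the column names a to d are placeholders, not part of the weather data) and suggests that the call returns a NumPy array of Matplotlib Axes shaped like the layout, with the parent figure reachable from any one of them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: four columns of random numbers
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))

# With subplots=True, pandas returns an array of Axes shaped like `layout`
axes = demo.plot(subplots=True, layout=(2, 2), figsize=(8, 6))
print(type(axes).__name__, axes.shape)

# The parent figure is reachable from any single Axes
fig = axes[0, 0].get_figure()
fig.suptitle("Demo columns")

# Individual subplots can then be customized by grid position
axes[1, 0].set_ylabel("value of c")
```

Indexing the returned array by row and column is what makes per-subplot tweaks possible after the one-line plot call.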
Customizing Subplot Layouts
You can customize various aspects of the subplots:
axes = df.plot(
    subplots=True,
    figsize=(14, 10),
    layout=(2, 2),
    sharex=True,              # Share the x-axis among subplots
    title=list(df.columns),   # One title per subplot (a single string would become the figure title)
    legend=True,
    fontsize=12,
    rot=45                    # Rotate x-axis labels
)
# Customize the overall figure
plt.suptitle('Weather Data Analysis', fontsize=16, y=1.02)
plt.tight_layout()
plt.show()
The sharex=True parameter ensures all subplots share the same x-axis scale, making it easier to compare trends across metrics.
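One way to see what sharing an axis means in practice is to change the limits on one subplot and read them back from another. This is a minimal sketch on placeholder data (columns a to d and a plain integer index are assumptions made for brevity):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with a simple 0..49 integer index
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))

axes = demo.plot(subplots=True, layout=(2, 2), sharex=True, figsize=(8, 6))

# Zoom the x-axis on the top-left subplot only...
axes[0, 0].set_xlim(10, 20)

# ...and the bottom-right subplot reports the same limits
print(axes[1, 1].get_xlim())

# The y-axes were not shared, so each subplot keeps its own vertical range
print(axes[0, 0].get_ylim(), axes[1, 1].get_ylim())
```

Passing sharey=True as well would link the vertical ranges, which is usually only helpful when the columns are measured in the same units.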
Method 2: Using Matplotlib's Subplot System
For more control over your subplot layout, you can create Matplotlib figure and axes objects explicitly:
# Create a figure and axes explicitly
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Flatten the axes array for easier indexing
axs = axes.flatten()
# Plot different visualizations on each subplot
df['Temperature'].plot(ax=axs[0], title='Temperature Over Time', color='red')
df['Humidity'].plot(ax=axs[1], title='Humidity Over Time', color='blue')
# Create a histogram on the third subplot
df['Wind Speed'].plot(kind='hist', ax=axs[2], bins=20, title='Wind Speed Distribution', color='green')
# Create a boxplot on the fourth subplot
df.boxplot(column=['Rainfall'], ax=axs[3])
axs[3].set_title('Rainfall Distribution')
plt.tight_layout()
plt.show()
This approach allows:
- Different types of plots in each subplot
- Precise control over which data appears in each subplot
- Custom styling for individual subplots
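When every subplot follows the same pattern, the explicit approach also works well in a loop. The following sketch pairs each axes object with a column and a color; the weather DataFrame here is a small stand-in built with made-up numbers, and the color list is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in weather data (values are random, for illustration only)
rng = np.random.default_rng(42)
weather = pd.DataFrame({
    "Temperature": rng.normal(25, 5, 100),
    "Humidity": rng.normal(60, 10, 100),
    "Wind Speed": rng.normal(15, 7, 100),
    "Rainfall": rng.exponential(5, 100),
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
colors = ["red", "blue", "green", "purple"]

# axes.flat walks the 2x2 grid row by row, so zip pairs each cell with one column
for ax, column, color in zip(axes.flat, weather.columns, colors):
    weather[column].plot(ax=ax, color=color, title=column)
    ax.set_xlabel("Day")

fig.tight_layout()
```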
Creating Mixed Visualization Types
One advantage of creating the axes yourself is the ability to display different visualization types together:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Line plot
df['Temperature'].plot(ax=axes[0, 0], title='Temperature Trend', color='red')
# Scatter plot
df.plot.scatter(x='Temperature', y='Humidity', ax=axes[0, 1],
title='Temperature vs Humidity', c='blue', alpha=0.5)
# Histogram
df['Rainfall'].plot.hist(ax=axes[1, 0], bins=15, title='Rainfall Distribution')
# Box plot
columns_to_plot = ['Temperature', 'Humidity', 'Wind Speed']
df[columns_to_plot].plot.box(ax=axes[1, 1], title='Weather Metrics Comparison')
plt.suptitle('Multi-dimensional Weather Analysis', fontsize=16)
plt.tight_layout()
plt.show()
This example demonstrates how to combine different visualization types (line, scatter, histogram, and boxplot) in a single figure for comprehensive data analysis.
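The number of plots does not always fill the grid. As a rough sketch, the plot kind for each panel can be kept in a small list and any leftover cells hidden; the panels list and the stand-in weather data below are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in weather data (values are random, for illustration only)
rng = np.random.default_rng(42)
weather = pd.DataFrame({
    "Temperature": rng.normal(25, 5, 100),
    "Humidity": rng.normal(60, 10, 100),
    "Rainfall": rng.exponential(5, 100),
})

# Hypothetical panel spec: (column, plot kind) for each subplot
panels = [("Temperature", "line"), ("Humidity", "hist"), ("Rainfall", "box")]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, (column, kind) in zip(axes.flat, panels):
    weather[column].plot(kind=kind, ax=ax, title=f"{column} ({kind})")

# Three panels in a 2x2 grid leave one cell unused; hide it instead of showing empty axes
for ax in axes.flat[len(panels):]:
    ax.set_visible(False)

fig.tight_layout()
```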
Real-world Example: Stock Market Analysis
Let's look at a practical example analyzing multiple stock prices (simulated here with random data):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
# Generate sample stock data
np.random.seed(42)
date_today = datetime.now()
dates = pd.date_range(date_today - timedelta(days=365), date_today, freq='B')
stocks = pd.DataFrame({
'AAPL': 150 + np.cumsum(np.random.normal(0.001, 0.02, len(dates))),
'MSFT': 250 + np.cumsum(np.random.normal(0.001, 0.025, len(dates))),
'GOOG': 2800 + np.cumsum(np.random.normal(0.001, 0.03, len(dates))),
'AMZN': 3300 + np.cumsum(np.random.normal(0.0005, 0.035, len(dates)))
}, index=dates)
# Calculate daily returns
returns = stocks.pct_change().dropna()
# Create a 2x2 subplot layout
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
# Plot 1: Stock prices over time
stocks.plot(ax=axes[0, 0], title='Stock Prices Over Time')
axes[0, 0].set_ylabel('Price ($)')
axes[0, 0].legend(loc='upper left')
# Plot 2: Daily returns
returns.plot(ax=axes[0, 1], title='Daily Returns', alpha=0.5)
axes[0, 1].set_ylabel('Daily Return')
axes[0, 1].legend(loc='upper left')
# Plot 3: Return distributions
returns.plot(kind='hist', bins=50, alpha=0.5, ax=axes[1, 0], title='Return Distribution')
axes[1, 0].set_xlabel('Daily Return')
axes[1, 0].legend(loc='upper left')
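# Plot 4: Correlation scatter plot for one pair of stocks
# (scatter_matrix draws its own grid of axes and would clear this figure, so a single scatter is used here)
returns.plot.scatter(x='AAPL', y='MSFT', ax=axes[1, 1], alpha=0.5, title='AAPL vs MSFT Daily Returns')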
plt.suptitle('Stock Market Analysis', fontsize=16)
plt.tight_layout()
plt.subplots_adjust(top=0.95)
plt.show()
This example creates a comprehensive stock market analysis dashboard with:
- Line plots showing raw stock prices
- Daily returns over time
- Return distribution histograms
- Correlation scatter plots between different stocks
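To compare every pair of stocks at once, pd.plotting.scatter_matrix is an option, but it builds its own grid with one cell per pair of columns, so it is usually given a figure of its own and not a single cell of an existing layout. A minimal sketch, using made-up return data with placeholder tickers A, B and C:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily returns for three placeholder tickers
rng = np.random.default_rng(1)
returns_demo = pd.DataFrame(
    rng.normal(0, 0.01, size=(250, 3)), columns=["A", "B", "C"]
)

# scatter_matrix creates an n x n grid of Axes and returns it as an array
# (diagonal="hist" avoids the SciPy dependency that diagonal="kde" needs)
grid = pd.plotting.scatter_matrix(
    returns_demo, diagonal="hist", alpha=0.5, figsize=(8, 8)
)
print(grid.shape)

# The numeric counterpart of the picture: pairwise correlation coefficients
corr = returns_demo.corr()
print(corr.round(2))
```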
Advanced Layout Techniques
For more complex layouts or when you need different sized subplots, you can use GridSpec:
import matplotlib.gridspec as gridspec
# Create figure with custom grid layout
fig = plt.figure(figsize=(15, 10))
gs = gridspec.GridSpec(2, 3) # 2 rows, 3 columns
# Create axes with different sizes
ax1 = plt.subplot(gs[0, :2]) # First row, span first two columns
ax2 = plt.subplot(gs[0, 2]) # First row, third column
ax3 = plt.subplot(gs[1, 0]) # Second row, first column
ax4 = plt.subplot(gs[1, 1:]) # Second row, span second and third columns
# Plot data on each subplot
df['Temperature'].plot(ax=ax1, title='Temperature Trend', color='red')
df['Humidity'].plot.hist(ax=ax2, bins=15, title='Humidity Distribution', color='blue')
df.plot.scatter(x='Temperature', y='Humidity', ax=ax3, title='Temp vs Humidity')
df[['Wind Speed', 'Rainfall']].plot(ax=ax4, title='Wind and Rain')
plt.tight_layout()
plt.show()
This creates a layout with different sized subplots, allowing you to emphasize certain visualizations over others.
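Spanning cells is one way to size subplots; row and column ratios are another. The sketch below uses Figure.add_gridspec with height_ratios so the top row is about twice as tall as the bottom row. The stand-in weather data and the 2:1 ratio are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in weather data (values are random, for illustration only)
rng = np.random.default_rng(42)
weather = pd.DataFrame({
    "Temperature": rng.normal(25, 5, 100),
    "Humidity": rng.normal(60, 10, 100),
    "Rainfall": rng.exponential(5, 100),
})

fig = plt.figure(figsize=(10, 6))

# 2 rows x 2 columns; the first row gets twice the height of the second
gs = fig.add_gridspec(2, 2, height_ratios=[2, 1])

ax_top = fig.add_subplot(gs[0, :])    # wide panel across the whole first row
ax_left = fig.add_subplot(gs[1, 0])   # small panel, bottom left
ax_right = fig.add_subplot(gs[1, 1])  # small panel, bottom right

weather["Temperature"].plot(ax=ax_top, title="Temperature Trend", color="red")
weather["Humidity"].plot.hist(ax=ax_left, bins=15, title="Humidity Distribution")
weather["Rainfall"].plot.box(ax=ax_right, title="Rainfall Spread")

fig.tight_layout()
```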
Working with Time Series Data
Time series data is particularly well-suited for subplot analysis. Here's an example showing different time aggregations:
# Create a time series dataset
ts_data = pd.DataFrame({
'Sales': np.random.normal(1000, 200, 365) +
np.sin(np.linspace(0, 2*np.pi, 365)) * 300
}, index=pd.date_range('2023-01-01', periods=365))
# Create figure with 3 subplots
fig, axes = plt.subplots(3, 1, figsize=(12, 10), sharex=False)
# Daily data
ts_data.plot(ax=axes[0], title='Daily Sales')
# Weekly resampled data
ts_data.resample('W').mean().plot(ax=axes[1], title='Weekly Average Sales')
# Monthly resampled data
# ('MS' = month start; the older 'M' alias is deprecated since pandas 2.2 in favor of 'ME')
monthly = ts_data.resample('MS').mean()
monthly.plot(kind='bar', ax=axes[2], title='Monthly Average Sales')
axes[2].set_xticklabels([d.strftime('%b') for d in monthly.index])
plt.tight_layout()
plt.show()
This visualization shows the same sales data at different time scales (daily, weekly, and monthly), helping identify patterns that might not be visible at a single scale.
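If all three scales are drawn as lines, they can also share one date axis. The following sketch plots through Matplotlib's ax.plot directly, which keeps every series on the same date scale; the sales figures are random placeholders and the 'MS' (month start) alias is used for the monthly bins.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily sales for one year
rng = np.random.default_rng(2)
idx = pd.date_range("2023-01-01", periods=365)
sales = pd.Series(1000 + rng.normal(0, 200, 365), index=idx, name="Sales")

# Same data at coarser time scales
weekly = sales.resample("W").mean()
monthly = sales.resample("MS").mean()
print(len(sales), len(weekly), len(monthly))

# sharex=True links the date axis across the three rows
fig, axes = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
scales = [("Daily", sales), ("Weekly mean", weekly), ("Monthly mean", monthly)]
for ax, (label, series) in zip(axes, scales):
    ax.plot(series.index, series.values)
    ax.set_title(label)

fig.autofmt_xdate()  # tilt the date labels on the bottom axis
fig.tight_layout()
```

Because the axis is shared, zooming into a date range on one row zooms the other two as well, which makes it easier to line up a dip in the daily data with the weekly and monthly averages.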
Summary
Multiple subplots are a powerful way to present different aspects of your data in a single, organized figure. In this tutorial, you learned:
- How to create basic subplots using Pandas' built-in plot(subplots=True) parameter
- How to create custom subplot layouts with Matplotlib's plt.subplots()
- Techniques for displaying different visualization types in a single figure
- Advanced layout customization using GridSpec
- Practical applications for financial data and time series
By combining multiple visualizations in a single figure, you can create rich, informative dashboards that help tell a more complete story about your data.
Exercises
- Create a 2×2 subplot for the iris dataset (available via sklearn.datasets.load_iris()) showing: scatter plot of sepal length vs. width, scatter plot of petal length vs. width, histogram of all features, and a boxplot of all features.
- Load a CSV of your choice and create a dashboard with at least three different visualization types arranged in subplots.
- Create a subplot for stock data that includes: price over time, a 20-day moving average, trading volume, and relative strength index (RSI).
Additional Resources
- Matplotlib Subplot Documentation
- Pandas Visualization Guide
- Seaborn: Statistical Data Visualization
- Plotly: Interactive Visualization Library
Happy plotting!