Pandas Time Series Visualization
Time series data is everywhere: stock prices, weather patterns, website traffic, and sales figures all change over time. Visualizing this data effectively is crucial for understanding trends, identifying patterns, and making data-driven decisions. In this lesson, we'll explore how to create powerful visualizations for time series data using Pandas and matplotlib.
Introduction to Time Series Visualization
Time series visualization helps us understand how data changes over time, spot seasonal patterns, identify outliers, and communicate findings effectively. Pandas makes this relatively simple: its .plot() methods are built on matplotlib, and when a Series or DataFrame has a DatetimeIndex, the x-axis is automatically formatted as dates.
Before we start with visualization, let's ensure we have the necessary libraries imported:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Make plots look better
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 6)
Creating Sample Time Series Data
Let's start by creating some sample time series data that we'll use throughout this lesson:
# Create a date range
date_rng = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
# Create a DataFrame with random data
np.random.seed(42) # For reproducibility
df = pd.DataFrame(date_rng, columns=['date'])
df['sales'] = np.random.randint(50, 500, size=(len(date_rng)))
df['temperature'] = np.random.normal(70, 15, size=(len(date_rng)))
df['website_visits'] = np.random.randint(1000, 5000, size=(len(date_rng)))
# Add some seasonality to make it more realistic
df['sales'] = df['sales'] + df.index % 100
df['temperature'] = df['temperature'] + 20 * np.sin(np.pi * df.index / 180)
# Set the date as index
df.set_index('date', inplace=True)
# Display the first few rows
print(df.head())
Output:
            sales  temperature  website_visits
date
2023-01-01    166    73.246719            2283
2023-01-02    117    71.888558            2225
2023-01-03    168    87.281043            4271
2023-01-04    219    66.741789            4774
2023-01-05    270    89.992927            1625
Basic Line Plots for Time Series
The simplest way to visualize time series data is using a line plot:
# Simple line plot for sales
plt.figure(figsize=(12, 6))
df['sales'].plot()
plt.title('Daily Sales Over Time')
plt.ylabel('Sales')
plt.grid(True)
plt.tight_layout()
plt.show()
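Because the index is a DatetimeIndex, you can zoom into a specific window with partial-string slicing before plotting. A minimal sketch, assuming a synthetic stand-in for the sales column (generated with NumPy's default_rng, so the values differ from the lesson's data):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for df['sales'] (assumption: values differ from the lesson's data)
idx = pd.date_range('2023-01-01', '2023-12-31', freq='D')
rng = np.random.default_rng(42)
sales = pd.Series(rng.integers(50, 500, len(idx)), index=idx, name='sales')

# Partial-string slicing on a DatetimeIndex: June through August, both ends inclusive
summer = sales.loc['2023-06':'2023-08']

ax = summer.plot(figsize=(12, 6), marker='.')
ax.set_title('Daily Sales, June to August 2023')
ax.set_ylabel('Sales')
plt.tight_layout()
plt.show()
```

Unlike integer slicing, label-based date slicing includes the end of the final month, so the window runs through August 31.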
To plot multiple time series on the same plot:
# Plot multiple columns
df[['sales', 'website_visits']].plot(figsize=(12, 6))
plt.title('Sales and Website Visits Over Time')
plt.ylabel('Value')
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()
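However, this doesn't work well when the scales are very different: here, sales stay in the hundreds while website visits run into the thousands, so the sales line gets flattened. A secondary y-axis solves this by giving each series its own scale: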
# Using secondary y-axis for different scales
fig, ax1 = plt.subplots(figsize=(12, 6))
color = 'tab:blue'
ax1.set_xlabel('Date')
ax1.set_ylabel('Sales', color=color)
ax1.plot(df.index, df['sales'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx() # Create a second y-axis sharing the same x-axis
color = 'tab:red'
ax2.set_ylabel('Website Visits', color=color)
ax2.plot(df.index, df['website_visits'], color=color)
ax2.tick_params(axis='y', labelcolor=color)
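ax1.set_title('Sales and Website Visits Over Time')
ax2.grid(False)  # keep only the primary axis grid so the two grids don't clash
fig.tight_layout()  # call after adding the title so the layout accounts for it
plt.show()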
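Pandas can build the same twin-axis chart with less code through the secondary_y argument of .plot(), which creates the right-hand axis for you and exposes it as ax.right_ax. A sketch, assuming a synthetic stand-in for df with the same column names:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the lesson's df (assumption: same column names, different values)
idx = pd.date_range('2023-01-01', '2023-12-31', freq='D')
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'sales': rng.integers(50, 500, len(idx)),
    'website_visits': rng.integers(1000, 5000, len(idx)),
}, index=idx)

# Columns listed in secondary_y are drawn against a right-hand y-axis
ax = df.plot(secondary_y=['website_visits'], figsize=(12, 6))
ax.set_ylabel('Sales')
ax.right_ax.set_ylabel('Website Visits')  # the twin axis that pandas created
ax.set_title('Sales and Website Visits Over Time')
plt.tight_layout()
plt.show()
```

By default pandas appends "(right)" to the legend label of each secondary-axis series; pass mark_right=False to turn that off.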
Customizing Line Plots
We can customize our plots in many ways:
plt.figure(figsize=(12, 6))
df['temperature'].plot(
color='orange',
linestyle='-',
linewidth=2,
marker='o',
markersize=3,
alpha=0.7
)
plt.title('Daily Temperature Over Time', fontsize=16)
plt.ylabel('Temperature (°F)', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
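Another common customization is controlling the date ticks themselves, for example one tick per month labeled "Jan", "Feb", and so on. For regular-frequency indexes, pandas uses its own date-tick logic by default; passing x_compat=True switches to standard matplotlib date handling so that matplotlib.dates locators and formatters behave as expected. A sketch, assuming a synthetic temperature series shaped like the lesson's:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Synthetic stand-in for df['temperature'] (assumption: similar shape, different values)
idx = pd.date_range('2023-01-01', '2023-12-31', freq='D')
rng = np.random.default_rng(42)
days = np.arange(len(idx))
temperature = pd.Series(70 + 20 * np.sin(np.pi * days / 180) + rng.normal(0, 15, len(idx)),
                        index=idx, name='temperature')

# x_compat=True makes pandas plot plain matplotlib dates on the x-axis
ax = temperature.plot(figsize=(12, 6), color='orange', x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())         # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))  # abbreviated month names
ax.set_title('Daily Temperature Over Time')
ax.set_ylabel('Temperature (°F)')
plt.tight_layout()
plt.show()
```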
Resampling for Clearer Visualizations
For large datasets, plotting every single point can create cluttered visualizations. Resampling helps create clearer plots by aggregating data:
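# Resample to monthly averages ('ME' = month end; on pandas older than 2.2, use 'M')
monthly_data = df.resample('ME').mean()
plt.figure(figsize=(12, 6))
ax = monthly_data['sales'].plot(kind='bar', color='skyblue')
# Bar plots treat the index as categories, so label the bars with month names
# instead of full timestamps such as '2023-01-31 00:00:00'
ax.set_xticklabels(monthly_data.index.strftime('%b'))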
plt.title('Average Monthly Sales')
plt.ylabel('Sales')
plt.xlabel('Month')
plt.xticks(rotation=45)
plt.grid(True, axis='y')
plt.tight_layout()
plt.show()
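Resampling does not have to discard the daily detail. A variation is to compute several aggregates per period with .agg() and draw the weekly mean on top of the faded daily data, with a shaded band for each week's minimum-to-maximum range. A sketch, assuming a synthetic stand-in for the sales column:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for df['sales'] (assumption: similar construction, different values)
idx = pd.date_range('2023-01-01', '2023-12-31', freq='D')
rng = np.random.default_rng(42)
sales = pd.Series(rng.integers(50, 500, len(idx)) + np.arange(len(idx)) % 100,
                  index=idx, name='sales')

# Weekly mean, min and max in one pass; 'W' bins end on Sundays and are
# labeled with each bin's end date
weekly = sales.resample('W').agg(['mean', 'min', 'max'])

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(sales.index, sales, color='grey', alpha=0.3, label='Daily')
ax.fill_between(weekly.index, weekly['min'], weekly['max'],
                alpha=0.2, label='Weekly min-max range')
ax.plot(weekly.index, weekly['mean'], linewidth=2, label='Weekly mean')
ax.set_title('Daily Sales with Weekly Summary')
ax.set_ylabel('Sales')
ax.legend(loc='upper left')
plt.tight_layout()
plt.show()
```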
Area Plots for Cumulative or Stacked Visualization
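Area plots are useful for showing cumulative totals or how components add up to a whole. In a stacked area plot, the top edge of the stack is the sum of all plotted columns:
# Create some additional columns for demonstration
# (returns are clipped at 0, since a day cannot have negative returns)
df['returns'] = df['sales'] * np.clip(np.random.normal(0.1, 0.05, size=len(df)), 0, None)
df['profit'] = df['sales'] - df['returns']
# Resample to weekly totals for clearer visualization
weekly_data = df[['sales', 'returns', 'profit']].resample('W').sum()
# Stack only the components (profit + returns = sales) so the top edge of the
# stack equals total weekly sales instead of counting sales twice
weekly_data[['profit', 'returns']].plot.area(figsize=(12, 6), alpha=0.5)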
plt.title('Weekly Sales Breakdown')
plt.ylabel('Amount')
plt.grid(True)
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()
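For the "cumulative" use case, a running total from .cumsum() plotted as an area shows how the year-to-date figure builds up. A minimal sketch, assuming a synthetic stand-in for the sales column:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for df['sales'] (assumption: values differ from the lesson's data)
idx = pd.date_range('2023-01-01', '2023-12-31', freq='D')
rng = np.random.default_rng(42)
sales = pd.Series(rng.integers(50, 500, len(idx)), index=idx, name='sales')

# Running total: each point is the sum of all sales up to and including that day
ytd_sales = sales.cumsum()

ax = ytd_sales.plot.area(figsize=(12, 6), alpha=0.5, color='tab:green')
ax.set_title('Cumulative (Year-to-Date) Sales')
ax.set_ylabel('Total Sales')
plt.tight_layout()
plt.show()
```

The slope of the area's edge is the daily sales rate, so flatter stretches mark slower periods.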
Seasonal Plots
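Seasonal decomposition splits a series into a trend, a repeating seasonal component, and leftover residuals. Plotting each part separately makes seasonal patterns easier to see:
from statsmodels.tsa.seasonal import seasonal_decompose
# Decompose the time series.
# seasonal_decompose needs at least two complete cycles (2 * period observations),
# so period=365 would raise a ValueError on a single year of daily data.
# A 30-day period works here; the slow annual swing then shows up in the trend.
result = seasonal_decompose(df['temperature'], model='additive', period=30)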
# Plot the decomposition
fig = plt.figure(figsize=(12, 10))
plt.subplot(411)
plt.plot(result.observed, label='Observed')
plt.legend(loc='upper left')
plt.grid(True)
plt.subplot(412)
plt.plot(result.trend, label='Trend')
plt.legend(loc='upper left')
plt.grid(True)
plt.subplot(413)
plt.plot(result.seasonal, label='Seasonality')
plt.legend(loc='upper left')
plt.grid(True)
plt.subplot(414)
plt.plot(result.resid, label='Residuals')
plt.legend(loc='upper left')
plt.grid(True)
plt.tight_layout()
plt.show()
Heatmaps for Time Series Data
Heatmaps lay values out on a grid (here, day of the month against month), which makes it easy to see when high or low values cluster together:
# Build a grid with days of the month as rows and months as columns
pivot_data = df['temperature'].copy()
pivot_data.index = pd.MultiIndex.from_arrays([
pivot_data.index.month,
pivot_data.index.day
], names=['month', 'day'])
pivot_table = pivot_data.unstack(level=0)
plt.figure(figsize=(12, 8))
sns.heatmap(pivot_table, cmap='YlOrRd', linewidths=0.1)
plt.title('Temperature Heatmap by Month and Day')
plt.xlabel('Month')
plt.ylabel('Day')
plt.tight_layout()
plt.show()
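Averaging before plotting gives a heatmap better suited to spotting weekly cycles, for example mean website visits for each weekday in each month. A sketch using groupby and unstack, assuming a synthetic stand-in for the website_visits column (the values are random, so expect a fairly uniform grid here):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for df['website_visits'] (assumption: values differ from the lesson's data)
idx = pd.date_range('2023-01-01', '2023-12-31', freq='D')
rng = np.random.default_rng(42)
visits = pd.Series(rng.integers(1000, 5000, len(idx)), index=idx, name='website_visits')

# Average visits for each (weekday, month) combination
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']
grid = (visits.groupby([visits.index.day_name(), visits.index.month])
              .mean()
              .unstack()                # months become columns
              .reindex(weekday_order))  # calendar order instead of alphabetical

plt.figure(figsize=(12, 5))
sns.heatmap(grid, cmap='YlGnBu', annot=True, fmt='.0f')
plt.title('Average Website Visits by Weekday and Month')
plt.xlabel('Month')
plt.ylabel('Day of Week')
plt.tight_layout()
plt.show()
```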
Subplots for Multiple Time Series
To compare multiple time series side-by-side:
fig, axes = plt.subplots(3, 1, figsize=(12, 12), sharex=True)
df['sales'].plot(ax=axes[0], title='Daily Sales', color='blue')
axes[0].set_ylabel('Sales')
axes[0].grid(True)
df['temperature'].plot(ax=axes[1], title='Daily Temperature', color='red')
axes[1].set_ylabel('Temperature (°F)')
axes[1].grid(True)
df['website_visits'].plot(ax=axes[2], title='Daily Website Visits', color='green')
axes[2].set_ylabel('Visits')
axes[2].grid(True)
plt.tight_layout()
plt.show()
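For a quick version of the same layout, DataFrame.plot accepts subplots=True, which gives every column its own axes in a single call. A sketch, assuming a synthetic stand-in for df with the lesson's three columns:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the lesson's df (assumption: same columns, different values)
idx = pd.date_range('2023-01-01', '2023-12-31', freq='D')
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'sales': rng.integers(50, 500, len(idx)),
    'temperature': rng.normal(70, 15, len(idx)),
    'website_visits': rng.integers(1000, 5000, len(idx)),
}, index=idx)

# One axes per column; sharex keeps the dates aligned across panels
axes = df.plot(subplots=True, figsize=(12, 12), sharex=True,
               color=['blue', 'red', 'green'], grid=True)
for ax, label in zip(axes, ['Sales', 'Temperature (°F)', 'Visits']):
    ax.set_ylabel(label)
plt.tight_layout()
plt.show()
```

The explicit plt.subplots approach above gives finer control over each panel, while subplots=True is convenient for quick exploration.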
Interactive Visualization (Bonus)
This lesson focuses on matplotlib, but for interactive visualizations with hover tooltips, zooming, and panning, a library such as Plotly works well:
import plotly.express as px
# Creating an interactive time series plot
fig = px.line(df.reset_index(), x='date', y='sales',
title='Interactive Sales Data Visualization')
fig.update_layout(xaxis_title='Date', yaxis_title='Sales')
# Display the interactive plot (renders inline in a Jupyter notebook;
# in a plain script, Plotly opens it in a web browser)
fig.show()
Real-World Example: Stock Price Visualization
Let's put these techniques together in a more realistic example: visualizing stock price data. To keep the example self-contained, we generate synthetic prices instead of downloading real ones:
# Generate synthetic stock data (real prices could be downloaded instead,
# for example with pandas_datareader)
dates = pd.date_range(start='2022-01-01', end='2022-12-31', freq='B')
n = len(dates)  # number of business days; avoids hard-coding the array length
stock_data = pd.DataFrame({
    'date': dates,
    'open': np.random.normal(100, 5, n),
    'high': np.random.normal(105, 5, n),
    'low': np.random.normal(95, 5, n),
    'close': np.random.normal(100, 5, n),
    'volume': np.random.normal(1000000, 200000, n)
})
# Make the data more realistic by building each day from the previous close
for i in range(1, n):
    # Each day's open is close to the previous day's close
    stock_data.loc[i, 'open'] = stock_data.loc[i-1, 'close'] * (1 + np.random.normal(0, 0.01))
    stock_data.loc[i, 'close'] = stock_data.loc[i, 'open'] * (1 + np.random.normal(0, 0.01))
    # High and low must bracket both the open and the close
    day_max = max(stock_data.loc[i, 'open'], stock_data.loc[i, 'close'])
    day_min = min(stock_data.loc[i, 'open'], stock_data.loc[i, 'close'])
    stock_data.loc[i, 'high'] = day_max * (1 + np.random.uniform(0, 0.02))
    stock_data.loc[i, 'low'] = day_min * (1 - np.random.uniform(0, 0.02))
    stock_data.loc[i, 'volume'] = abs(stock_data.loc[i-1, 'volume'] * (1 + np.random.normal(0, 0.1)))
stock_data.set_index('date', inplace=True)
# Visualization using line plot for closing prices
plt.figure(figsize=(12, 6))
stock_data['close'].plot(color='blue')
plt.title('Stock Closing Prices')
plt.ylabel('Price ($)')
plt.grid(True)
plt.tight_layout()
plt.show()
Now let's add a volume subplot and a moving average line:
# Calculate 20-day moving average
stock_data['MA20'] = stock_data['close'].rolling(window=20).mean()
# Create a figure with two subplots (price and volume)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), gridspec_kw={'height_ratios': [3, 1]}, sharex=True)
# Plot price and moving average on the first subplot
ax1.plot(stock_data.index, stock_data['close'], label='Close', color='blue', alpha=0.7)
ax1.plot(stock_data.index, stock_data['MA20'], label='20-Day MA', color='red', linestyle='-')
ax1.set_ylabel('Price ($)')
ax1.set_title('Stock Price with 20-Day Moving Average')
ax1.legend(loc='upper left')
ax1.grid(True)
# Plot volume on the second subplot
ax2.bar(stock_data.index, stock_data['volume'], color='green', alpha=0.5)
ax2.set_ylabel('Volume')
ax2.set_xlabel('Date')
ax2.grid(True)
plt.tight_layout()
plt.show()
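Note that the red moving-average line starts a few weeks into the year: by default, rolling(window=20) needs 20 observations before it produces a value, so the first 19 entries of MA20 are NaN. If you would rather have the line start immediately (averaging fewer points at first), min_periods relaxes that requirement. A small sketch on a synthetic random-walk price series (an assumption standing in for stock_data['close']):

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (assumption: a simple random walk, not the lesson's data)
dates = pd.date_range('2022-01-03', periods=60, freq='B')
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

ma_strict = close.rolling(window=20).mean()                  # NaN until 20 points exist
ma_relaxed = close.rolling(window=20, min_periods=1).mean()  # averages whatever is available

print(ma_strict.isna().sum())   # 19
print(ma_relaxed.isna().sum())  # 0
```

Keep in mind that the early values of the relaxed average rest on only a handful of points, so they are noisier than the full 20-day average.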
Summary
In this lesson, we explored various techniques for visualizing time series data using Pandas and matplotlib:
- Basic Line Plots: The foundation of time series visualization
- Multiple Series Plotting: How to plot and compare multiple time series
- Customization: Ways to enhance your plots with colors, line styles, markers, transparency, and font sizes
- Resampling: Techniques to aggregate data for clearer visualizations
- Area Plots: For showing cumulative changes or composition
- Seasonal Decomposition: For identifying trends, seasonality, and residuals
- Heatmaps: For visualizing patterns across different time dimensions
- Subplots: For comparing multiple time series side-by-side
- Real-World Example: Stock price visualization with volume and moving averages
Time series visualization is a powerful tool for understanding temporal data patterns and communicating insights effectively.
Exercises
- Create a time series visualization of daily temperature data that includes both the actual temperatures and a 7-day moving average.
- Download some real stock data using pandas_datareader and create a candlestick chart.
- Create a heatmap showing website traffic by hour of day and day of week.
- Visualize a time series with seasonal patterns and add vertical lines to mark the beginning of each season.
- Create a dashboard-style layout with multiple time series visualizations for a retail business (sales, inventory, customer traffic, etc.).
Additional Resources
- Pandas Visualization Documentation
- Matplotlib Time Series Plotting
- Seaborn Time Series Visualization
- Plotly Time Series Documentation
- Book: "Python for Data Analysis" by Wes McKinney (Creator of Pandas)
Understanding how to visualize time series data effectively is a critical skill for data analysis, particularly for business, finance, and scientific applications. These techniques will help you extract meaningful insights from your temporal data and communicate those insights clearly to others.