Pandas Rolling Windows

When working with time series data, we often want to analyze trends and patterns by looking at data points over a specific window of time rather than individual data points. This technique is called rolling window analysis or moving window analysis. Pandas provides powerful functionality for this through its rolling window operations.

What are Rolling Windows?

A rolling window (or moving window) is a fixed-size window that slides over your time series data. For each position of the window, a calculation is performed using only the data points that fall within that window. As the window moves, older data points are dropped, and newer ones are included.

Some common uses of rolling windows include:

Computing moving averages to smooth out noise
Calculating rolling standard deviations to measure volatility
Finding rolling maximum or minimum values
Applying custom functions to windows of data
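To make the sliding behaviour concrete, here is a minimal sketch on a five-element series. The values and the window size of 3 are chosen purely for illustration; each result is the mean of the current value and the two before it.

```python
import pandas as pd

# Toy series; values chosen only to make the arithmetic easy to follow
s = pd.Series([1, 2, 3, 4, 5])

# Window of 3 observations: positions 0 and 1 have no full window yet
means = s.rolling(window=3).mean()
print(means.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```

The first two results are NaN because fewer than three observations are available at those positions.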

Basic Rolling Window Operations

Let's start by creating a simple time series dataset to demonstrate rolling windows:

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a time series dataset
dates = pd.date_range(start='2023-01-01', periods=30, freq='D')
data = pd.Series(np.random.randn(30).cumsum(), index=dates)

print(data.head())

Output (no random seed is set, so your values will differ):

2023-01-01   -0.087688
2023-01-02   -1.343966
2023-01-03   -1.787641
2023-01-04   -2.452995
2023-01-05   -2.140078
dtype: float64

Simple Moving Average (SMA)

The most common rolling operation is calculating the moving average. Let's calculate a 7-day moving average:

python
# Calculate 7-day moving average
rolling_mean = data.rolling(window=7).mean()

print(rolling_mean.head(10))

Output:

2023-01-01         NaN
2023-01-02         NaN
2023-01-03         NaN
2023-01-04         NaN
2023-01-05         NaN
2023-01-06         NaN
2023-01-07   -1.754673
2023-01-08   -1.593573
2023-01-09   -1.413036
2023-01-10   -1.182968
dtype: float64

Notice that the first 6 values are NaN. This is because we need at least 7 data points to calculate a 7-day moving average. Let's visualize our original data and the moving average:

python
plt.figure(figsize=(10, 6))
plt.plot(data, label='Original Data')
plt.plot(rolling_mean, label='7-day Moving Average', color='red')
plt.legend()
plt.title('Original Data vs 7-day Moving Average')
plt.grid(True)
plt.tight_layout()
plt.show()
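As a sanity check on what rolling(window=7).mean() computes, the sketch below compares one rolling result with a mean taken by hand over the same seven observations. The series name demo and the seed are assumptions made here for reproducibility; they are not part of the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
dates = pd.date_range(start='2023-01-01', periods=30, freq='D')
demo = pd.Series(rng.standard_normal(30).cumsum(), index=dates)

sma = demo.rolling(window=7).mean()

# The value at position 9 should equal the mean of positions 3..9 inclusive
manual = demo.iloc[3:10].mean()
matches = np.isclose(sma.iloc[9], manual)
print(matches)
```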

Configuring the Rolling Window

The rolling() method accepts several parameters to customize how the window operates:

Window Size

The window parameter defines the size of the rolling window. It can be:

An integer representing the number of observations in each window
An offset string like '7D' for 7 days or '90D' for 90 days (requires a datetime-like index). The offset must be a fixed duration, so calendar-based offsets such as a month are not accepted

python
# Using an integer window
rolling_mean_7 = data.rolling(window=7).mean()

# Using a time-based window (7 days)
rolling_mean_7d = data.rolling(window='7D').mean()
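The two kinds of window only agree when the data is evenly spaced. The following sketch uses a small series with a gap in its dates to show the difference. The dates and values are made up for illustration.

```python
import pandas as pd

# Irregular index: 2023-01-03 and 2023-01-04 are missing
idx = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05', '2023-01-06'])
s = pd.Series([1, 2, 3, 4], index=idx)

# Integer window: always the last 2 rows, regardless of the dates
by_count = s.rolling(window=2).sum()

# Offset window: only rows within the 2 days up to and including each timestamp
by_time = s.rolling(window='2D').sum()

print(by_count.tolist())  # [nan, 3.0, 5.0, 7.0]
print(by_time.tolist())   # [1.0, 3.0, 3.0, 7.0]
```

On 2023-01-05 the integer window still reaches back to 2023-01-02, while the 2-day window contains only that day's value. The offset window also produces a value for the first row, because min_periods defaults to 1 for offset-based windows.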

Window Types

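Pandas supports weighted windows through the win_type parameter, which relies on SciPy's window functions (so SciPy must be installed). Exponential weighting is not a win_type; it is provided by the separate ewm() method: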

python
# Gaussian weighted moving average
gaussian_weighted = data.rolling(window=7, win_type='gaussian').mean(std=2)

# Exponentially weighted moving average (using a different method)
exp_weighted = data.ewm(span=7).mean()

plt.figure(figsize=(10, 6))
plt.plot(data, label='Original Data', alpha=0.5)
plt.plot(rolling_mean_7, label='Simple Moving Average (7)', color='red')
plt.plot(gaussian_weighted, label='Gaussian Weighted', color='green')
plt.plot(exp_weighted, label='Exponentially Weighted', color='purple')
plt.legend()
plt.title('Different Types of Moving Averages')
plt.grid(True)
plt.tight_layout()
plt.show()
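For the exponentially weighted average, span maps to a smoothing factor alpha = 2 / (span + 1). The sketch below checks that relationship against a hand-written recursion. It passes adjust=False, which selects the simple recursive form; the ewm(span=7) call above uses the default adjust=True, which normalizes the weights and gives slightly different early values. The input values are arbitrary.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0])
span = 7
alpha = 2 / (span + 1)  # 0.25 for span=7

# Recursive form: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
manual = [x.iloc[0]]
for value in x.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * value)

recursive = x.ewm(span=span, adjust=False).mean()
agrees = np.allclose(recursive.to_numpy(), manual)
print(agrees)
```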

Handling Window Edges

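By default, an integer window must be full before rolling() produces a value: min_periods defaults to the window size (for offset-based windows such as '7D' it defaults to 1). You can relax this requirement with the min_periods parameter: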

python
# Only require 1 valid observation in window
flexible_rolling = data.rolling(window=7, min_periods=1).mean()
print(flexible_rolling.head(7))

Output:

2023-01-01   -0.087688
2023-01-02   -0.715827
2023-01-03   -1.073098
2023-01-04   -1.418072
2023-01-05   -1.562474
2023-01-06   -1.611246
2023-01-07   -1.754673
dtype: float64

Now we get values for all dates, not just after accumulating 7 data points.
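min_periods can take any value between 1 and the window size. As a small illustration with made-up numbers, the sketch below requires at least 2 observations in a window of 4, so only the very first position stays NaN and the early results are means over a partial window.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

# Window of 4, but emit a value as soon as 2 observations are available
partial = s.rolling(window=4, min_periods=2).mean()
print(partial.tolist())  # [nan, 1.5, 2.0, 2.5, 3.5, 4.5]
```

Keep in mind that the early values are averaged over fewer points, so they are noisier than the values computed from full windows.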

Common Rolling Window Functions

Pandas provides many built-in methods for rolling windows:

python
# Calculate rolling standard deviation (volatility)
rolling_std = data.rolling(window=7).std()

# Calculate rolling minimum
rolling_min = data.rolling(window=7).min()

# Calculate rolling maximum
rolling_max = data.rolling(window=7).max()

# Calculate rolling sum
rolling_sum = data.rolling(window=7).sum()

Let's visualize some of these statistics:

python
plt.figure(figsize=(12, 8))

plt.subplot(2, 1, 1)
plt.plot(data, label='Original Data')
plt.plot(rolling_mean, label='Rolling Mean', color='red')
plt.fill_between(
    rolling_max.index,
    rolling_min,
    rolling_max,
    color='lightgray',
    label='Min-Max Range'
)
plt.legend()
plt.title('Rolling Mean with Min-Max Range')
plt.grid(True)

plt.subplot(2, 1, 2)
plt.plot(rolling_std, label='Rolling Std Dev', color='green')
plt.legend()
plt.title('Rolling Standard Deviation (Volatility)')
plt.grid(True)

plt.tight_layout()
plt.show()
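When several statistics are needed over the same window, one option is to pass a list of function names to agg() on the rolling object instead of making separate calls. The sketch below uses a toy series; the name summary is chosen here for illustration.

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# One column per statistic, all computed over the same 3-observation window
summary = s.rolling(window=3).agg(['mean', 'min', 'max', 'sum'])
print(summary)
```

The result is a DataFrame whose columns are named after the requested statistics.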

Applying Custom Functions to Rolling Windows

You can apply custom functions to rolling windows using the apply() method:

python
# Custom function to calculate range (max - min)
def range_func(x):
    return x.max() - x.min()

# Apply custom function to rolling window
rolling_range = data.rolling(window=7).apply(range_func)

print("Rolling range for the first 10 days:")
print(rolling_range.head(10))

Output (illustrative; exact values depend on the random data):

Rolling range for the first 10 days:
2023-01-01         NaN
2023-01-02         NaN
2023-01-03         NaN
2023-01-04         NaN
2023-01-05         NaN
2023-01-06         NaN
2023-01-07    2.365307
2023-01-08    2.440310
2023-01-09    2.273697
2023-01-10    2.365307
dtype: float64
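By default apply() hands each window to the function as a Series. Passing raw=True hands over a NumPy array instead, which is usually faster for simple numeric functions. The sketch below also cross-checks the custom range against the built-in rolling max and min. The series demo and its seed are assumptions made for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed
demo = pd.Series(rng.standard_normal(30).cumsum())

def range_func(window):
    # Works for both a Series and a NumPy array
    return window.max() - window.min()

via_apply = demo.rolling(window=7).apply(range_func, raw=True)
via_builtin = demo.rolling(window=7).max() - demo.rolling(window=7).min()

same = np.allclose(via_apply.dropna(), via_builtin.dropna())
print(same)
```

When a built-in combination like this exists, it is generally preferable to apply(), which calls the Python function once per window.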

Rolling Windows with DataFrames

Rolling operations work with DataFrames as well. Let's create a DataFrame with multiple columns:

python
# Create a DataFrame with multiple columns
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randn(30).cumsum(),
    'B': np.random.randn(30).cumsum() * 1.5,
    'C': np.random.randn(30).cumsum() * 0.5
}, index=pd.date_range(start='2023-01-01', periods=30, freq='D'))

# Apply rolling mean to all columns
rolling_means = df.rolling(window=7).mean()

print(rolling_means.head(10))

# Plot all columns and their moving averages
plt.figure(figsize=(12, 8))

for col in df.columns:
    plt.plot(df[col], label=f'{col} Original', alpha=0.5)
    plt.plot(rolling_means[col], label=f'{col} Rolling Mean', linestyle='--')

plt.legend()
plt.title('Multiple Time Series with Rolling Means')
plt.grid(True)
plt.tight_layout()
plt.show()
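A rolling DataFrame does not have to use the same statistic for every column. As a sketch with made-up data, agg() also accepts a dictionary that maps column names to function names:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [10.0, 20.0, 30.0, 40.0],
})

# Rolling mean for column A, rolling sum for column B
mixed = df_demo.rolling(window=2).agg({'A': 'mean', 'B': 'sum'})
print(mixed)
```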

Real-World Application: Stock Price Analysis

Let's look at a real-world example of using rolling windows to analyze stock price data:

python
# This example requires yfinance package
# pip install yfinance
import yfinance as yf

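# Download stock data for Microsoft (requires network access)
# Note: newer yfinance releases may return MultiIndex columns even for a
# single ticker; if msft['Close'] is a DataFrame, flatten the columns first
msft = yf.download('MSFT', start='2022-01-01', end='2023-01-01')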

# Calculate different moving averages
msft['SMA_20'] = msft['Close'].rolling(window=20).mean()  # 20-day moving average
msft['SMA_50'] = msft['Close'].rolling(window=50).mean()  # 50-day moving average
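# 200-day moving average: NaN for the first 199 rows, so one year of data
# (roughly 250 trading days) leaves only a short tail of values
msft['SMA_200'] = msft['Close'].rolling(window=200).mean()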

# Calculate rolling volatility (20-day standard deviation)
msft['Volatility'] = msft['Close'].rolling(window=20).std()

# Plot the data
plt.figure(figsize=(14, 10))

# Plot closing price and moving averages
plt.subplot(2, 1, 1)
plt.plot(msft['Close'], label='MSFT Close', alpha=0.5)
plt.plot(msft['SMA_20'], label='20-day SMA', linestyle='--')
plt.plot(msft['SMA_50'], label='50-day SMA', linestyle='-.')
plt.plot(msft['SMA_200'], label='200-day SMA', linestyle=':')
plt.title('Microsoft Stock Price with Moving Averages')
plt.legend()
plt.grid(True)

# Plot volatility
plt.subplot(2, 1, 2)
plt.plot(msft['Volatility'], label='20-day Volatility', color='red')
plt.title('Rolling Volatility (20-day)')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
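If yfinance or network access is unavailable, the same calculations can be tried offline. The sketch below substitutes a synthetic random walk for the downloaded prices. The price path, the seed, and the name frame are hypothetical stand-ins, not real MSFT data. It also shows how many usable values each window length leaves.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)  # arbitrary seed
days = pd.bdate_range(start='2022-01-03', periods=250)

# Hypothetical closing prices: a random walk starting near 300
close = pd.Series(300 + rng.standard_normal(250).cumsum(), index=days, name='Close')

frame = close.to_frame()
frame['SMA_20'] = frame['Close'].rolling(window=20).mean()
frame['SMA_200'] = frame['Close'].rolling(window=200).mean()
frame['Volatility'] = frame['Close'].rolling(window=20).std()

# Non-NaN values per column: a window of size w leaves len - (w - 1) values
valid = frame.notna().sum()
print(valid)
```

With 250 rows, the 20-day columns have 231 values and the 200-day average only 51, which is why long windows need a longer price history to be informative.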

Rolling Window Statistical Methods

Beyond the mean and standard deviation, rolling objects expose further statistical methods such as median, variance, skewness, kurtosis, and quantiles:

python
# Using our original time series data
rolling_stats = pd.DataFrame({
    'Original': data,
    'Mean': data.rolling(window=7).mean(),
    'Median': data.rolling(window=7).median(),
    'Std Dev': data.rolling(window=7).std(),
    'Variance': data.rolling(window=7).var(),
    'Skew': data.rolling(window=7).skew(),
    'Kurt': data.rolling(window=7).kurt(),
    'Quantile 0.9': data.rolling(window=7).quantile(0.9)
})

print(rolling_stats.tail())
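Several of these statistics are related, which gives a quick way to check results. The sketch below, on an assumed random series named demo, confirms that the rolling variance is the square of the rolling standard deviation and that the 0.5 quantile matches the rolling median.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # arbitrary seed
demo = pd.Series(rng.standard_normal(30).cumsum())

roll = demo.rolling(window=7)

# Variance is the squared standard deviation
var_ok = np.allclose(roll.var().dropna(), roll.std().dropna() ** 2)

# The 0.5 quantile (default linear interpolation) is the median
median_ok = np.allclose(roll.quantile(0.5).dropna(), roll.median().dropna())

print(var_ok, median_ok)
```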

Center-aligned Windows

By default, rolling windows are right-aligned, meaning each result is labeled with the last (rightmost) observation in its window. Passing center=True labels the result with the middle observation of the window instead:

python
# Right-aligned (default)
right_aligned = data.rolling(window=7).mean()

# Center-aligned
center_aligned = data.rolling(window=7, center=True).mean()

plt.figure(figsize=(10, 6))
plt.plot(data, label='Original Data', alpha=0.7)
plt.plot(right_aligned, label='Right Aligned', color='red')
plt.plot(center_aligned, label='Center Aligned', color='green')
plt.legend()
plt.title('Right-aligned vs Center-aligned Rolling Mean')
plt.grid(True)
plt.tight_layout()
plt.show()
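For an odd integer window, centering amounts to shifting the right-aligned result back by half the window. The sketch below shows this on a toy series with a window of 3, where the shift is one position.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

right = s.rolling(window=3).mean()
centered = s.rolling(window=3, center=True).mean()

print(right.tolist())     # [nan, nan, 2.0, 3.0, 4.0, 5.0, 6.0]
print(centered.tolist())  # [nan, 2.0, 3.0, 4.0, 5.0, 6.0, nan]

# Centered result equals the right-aligned result shifted back by (3 - 1) // 2
same = np.allclose(centered, right.shift(-1), equal_nan=True)
print(same)
```

Centered windows remove the lag of a trailing average, but each value then depends on future observations, so they are unsuitable for forecasting or trading signals.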

Summary

Rolling windows are a powerful feature in Pandas for time series analysis:

They provide a way to smooth data and identify trends by looking at data over time windows
The basic function is rolling() which creates a rolling window object
Common operations include calculating moving averages, standard deviations, and other statistics
You can customize window size, alignment, edge handling, and window types
Custom functions can be applied to rolling windows using the apply() method
Rolling windows work with both Series and DataFrame objects

Rolling windows are essential tools for time series analysis, especially in financial applications, signal processing, and forecasting.

Exercises

Download a stock price dataset of your choice and calculate the 5-day, 20-day, and 50-day moving averages.
Create a financial trading signal based on moving average crossovers.
Implement a custom function that calculates the "Bollinger Bands" (mean ± 2 × standard deviation) using rolling windows.
Compare the simple moving average with exponential moving average using different window sizes on a real dataset.
Analyze hourly temperature data and use rolling windows to identify daily patterns and anomalies.
