Pandas Rolling Aggregation

Introduction

Rolling aggregations (also called moving or rolling window calculations) are essential tools in data analysis, particularly for time series data. These calculations involve computing metrics over a sliding window of data points rather than the entire dataset at once. In pandas, the rolling() method allows us to easily implement these operations to identify trends, smooth out noise, and generate new features from time-dependent data.

In this tutorial, you'll learn:

What rolling aggregations are and why they're useful
How to use pandas' rolling() method
Common rolling window calculations
Real-world applications of rolling aggregations

Understanding Rolling Windows

A rolling window is a fixed-size subset of data that "rolls" or "slides" through your dataset. For each position of this window, an aggregation function (like mean, sum, or standard deviation) is applied to the data points within the window.
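To make the sliding idea concrete, the sketch below computes a 3-point rolling mean by hand with a plain loop and compares it to the built-in result. The series `values` and the names `manual` and `builtin` are illustrative only, and the loop is meant to show the mechanics, not how pandas implements the calculation internally.

```python
import pandas as pd

values = pd.Series([10, 11, 9, 13, 14], dtype="float64")  # illustrative data
window = 3

# Slide a window of `window` points across the series by hand.
manual = []
for end in range(len(values)):
    start = end - window + 1
    if start < 0:
        # Fewer than `window` points are available so far.
        manual.append(float("nan"))
    else:
        manual.append(values.iloc[start:end + 1].mean())

manual = pd.Series(manual, index=values.index)
builtin = values.rolling(window=window).mean()

print(pd.DataFrame({"manual": manual, "rolling": builtin}))
```

Both columns should agree, including the missing values at the start where the window is not yet full.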

Why Use Rolling Aggregations?

Smooth out noise: Rolling averages reduce random fluctuations
Identify trends: Makes patterns more visible by reducing short-term variability
Feature engineering: Create new variables for machine learning models
Technical analysis: Calculate moving averages and other indicators for financial data

Basic Rolling Window Operations

Let's start by creating a simple time series dataset:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a simple time series dataset
dates = pd.date_range('20230101', periods=10)
data = pd.Series([10, 11, 9, 13, 14, 12, 15, 16, 14, 18], index=dates)
print(data)

The output would look like this:

2023-01-01    10
2023-01-02    11
2023-01-03     9
2023-01-04    13
2023-01-05    14
2023-01-06    12
2023-01-07    15
2023-01-08    16
2023-01-09    14
2023-01-10    18
Freq: D, dtype: int64

Simple Rolling Mean

The most common rolling operation is the moving average. Let's calculate a 3-day moving average:

# Calculate 3-day rolling average
rolling_mean = data.rolling(window=3).mean()
print(rolling_mean)

Output (values rounded to two decimal places for readability):

2023-01-01     NaN
2023-01-02     NaN
2023-01-03    10.0
2023-01-04    11.0
2023-01-05    12.0
2023-01-06    13.0
2023-01-07    13.67
2023-01-08    14.33
2023-01-09    15.0
2023-01-10    16.0
Freq: D, dtype: float64

Notice that the first two values are NaN. By default, min_periods equals the window size, so pandas returns a value only once a full window of 3 observations is available, which is not yet the case for the first two dates.
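As a quick check of this behaviour, the following sketch counts the missing values for a few window sizes on the same data and then drops the incomplete windows. The loop and the name `complete_only` are illustrative additions, not part of the original example.

```python
import pandas as pd

dates = pd.date_range("20230101", periods=10)
data = pd.Series([10, 11, 9, 13, 14, 12, 15, 16, 14, 18], index=dates)

# With default settings, a window of size n leaves n - 1 leading NaN values.
for window in (2, 3, 5):
    result = data.rolling(window=window).mean()
    leading_nans = int(result.isna().sum())
    print(f"window={window}: {leading_nans} leading NaN values")

# Drop the incomplete windows when they are not needed downstream.
complete_only = data.rolling(window=3).mean().dropna()
print(complete_only.index[0])
```

Whether to drop, keep, or fill these leading values depends on what the result feeds into.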

Let's visualize this:

plt.figure(figsize=(10, 6))
plt.plot(data, label='Original Data')
plt.plot(rolling_mean, label='3-Day Rolling Average')
plt.legend()
plt.title('Original Data vs. 3-Day Rolling Average')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True)
plt.show()

Common Rolling Aggregation Methods

After calling rolling(), you can chain various aggregation methods:

# Different types of rolling aggregations
rolling_sum = data.rolling(window=3).sum()
rolling_max = data.rolling(window=3).max()
rolling_min = data.rolling(window=3).min()
rolling_std = data.rolling(window=3).std()

# Display results in a DataFrame for comparison
results = pd.DataFrame({
    'Original': data,
    'Rolling Mean': rolling_mean,
    'Rolling Sum': rolling_sum,
    'Rolling Max': rolling_max,
    'Rolling Min': rolling_min,
    'Rolling Std': rolling_std
})
print(results)

Output would be a table showing all these calculations side by side.
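The same comparison table can also be built in a single call by passing a list of function names to agg() on the rolling object. This is a compact alternative sketch; the name `summary` is illustrative.

```python
import pandas as pd

dates = pd.date_range("20230101", periods=10)
data = pd.Series([10, 11, 9, 13, 14, 12, 15, 16, 14, 18], index=dates)

# One column per aggregation, all computed over the same 3-day window.
summary = data.rolling(window=3).agg(["mean", "sum", "max", "min", "std"])
print(summary.round(2))
```

Note that std() here is the sample standard deviation (ddof=1), which is the pandas default.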

Customizing the Rolling Window

Window Size

The window size determines the number of observations used in each calculation:

# Compare different window sizes
rolling_mean_2 = data.rolling(window=2).mean()
rolling_mean_4 = data.rolling(window=4).mean()

plt.figure(figsize=(10, 6))
plt.plot(data, label='Original Data')
plt.plot(rolling_mean_2, label='2-Day Rolling Mean')
plt.plot(rolling_mean_4, label='4-Day Rolling Mean')
plt.legend()
plt.title('Comparison of Different Window Sizes')
plt.show()

Window Types

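By default, rolling() gives every observation in the window equal weight. For weights that decay exponentially with age, pandas provides the separate ewm() method (weighted fixed-size windows are also available through rolling()'s win_type parameter, which requires SciPy):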

# Exponential weighted window - gives more weight to recent observations
exp_weighted_avg = data.ewm(span=3).mean()

plt.figure(figsize=(10, 6))
plt.plot(data, label='Original Data')
plt.plot(rolling_mean, label='Simple 3-Day Rolling Mean')
plt.plot(exp_weighted_avg, label='Exponential Weighted Mean (span=3)')
plt.legend()
plt.title('Simple vs Exponential Weighted Moving Average')
plt.show()
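To show what the exponential weighting does, the sketch below reproduces ewm() by hand for the simpler adjust=False variant, where each value is a blend of the new observation and the previous average with alpha = 2 / (span + 1). The call above uses the default adjust=True, which normalises the weights differently, so its early values differ from this recursion. The names `manual`, `recursive`, and `adjusted` are illustrative.

```python
import pandas as pd

data = pd.Series([10, 11, 9, 13, 14, 12, 15, 16, 14, 18], dtype="float64")
span = 3
alpha = 2 / (span + 1)  # smoothing factor implied by the span

# Recursive form: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
manual = [data.iloc[0]]
for x in data.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)
manual = pd.Series(manual, index=data.index)

recursive = data.ewm(span=span, adjust=False).mean()
adjusted = data.ewm(span=span).mean()  # default adjust=True

print(pd.DataFrame({
    "manual": manual,
    "adjust=False": recursive,
    "adjust=True": adjusted,
}).round(3))
```

The first two columns should match, while the adjust=True column starts out slightly different and converges as more observations accumulate.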

Min Periods

You can specify the minimum number of observations in the window required to have a value:

# Require at least 2 observations instead of the full window of 3
flexible_rolling_mean = data.rolling(window=3, min_periods=2).mean()
print(flexible_rolling_mean)

Output (values rounded to two decimal places for readability):

2023-01-01     NaN
2023-01-02    10.5
2023-01-03    10.0
2023-01-04    11.0
2023-01-05    12.0
2023-01-06    13.0
2023-01-07    13.67
2023-01-08    14.33
2023-01-09    15.0
2023-01-10    16.0
Freq: D, dtype: float64

Notice that now we have a value for 2023-01-02 which was previously NaN.
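Taking this one step further, setting min_periods=1 produces a value from the very first observation, with the window simply growing until it reaches its full size. The sketch below compares that with the default behaviour; `partial` and `strict` are illustrative names.

```python
import pandas as pd

dates = pd.date_range("20230101", periods=10)
data = pd.Series([10, 11, 9, 13, 14, 12, 15, 16, 14, 18], index=dates)

# min_periods=1: average whatever is available, up to 3 observations.
partial = data.rolling(window=3, min_periods=1).mean()
# Default: min_periods equals the window size.
strict = data.rolling(window=3).mean()

comparison = pd.DataFrame({"min_periods=1": partial, "default": strict})
print(comparison.head(4))
```

Keep in mind that the early values are averages over fewer points, so they are noisier than the rest of the series.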

Center Parameter

By default, the label used is the right edge of the window. You can use the center parameter to set the label at the center of the window:

# Center the window
centered_rolling_mean = data.rolling(window=3, center=True).mean()
print(centered_rolling_mean)
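One way to see the effect of centering: for a window of 3, the centered result is the right-aligned result moved up by one position. The sketch below checks this relationship; `right`, `centered`, `shifted`, and `same` are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20230101", periods=10)
data = pd.Series([10, 11, 9, 13, 14, 12, 15, 16, 14, 18], index=dates)

right = data.rolling(window=3).mean()                  # label at right edge
centered = data.rolling(window=3, center=True).mean()  # label at the middle

# For window=3 the centre sits one step before the right edge.
shifted = right.shift(-1)
same = np.allclose(centered, shifted, equal_nan=True)

print(pd.DataFrame({"right": right, "centered": centered}).round(2))
print("centered equals right-aligned shifted by one:", same)
```

A centered window uses observations that come after the labelled date, so it suits smoothing historical data better than anything that must avoid looking ahead, such as forecasting features.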

Rolling Aggregation with DataFrames

Rolling operations work with DataFrames too, applying the operation to each column:

# Create a DataFrame with multiple columns
df = pd.DataFrame({
    'A': [10, 11, 9, 13, 14, 12, 15, 16, 14, 18],
    'B': [5, 8, 7, 9, 10, 8, 12, 14, 10, 15]
}, index=dates)

# Apply rolling mean to each column
df_rolling = df.rolling(window=3).mean()
print(df_rolling)
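Rolling windows can also relate two columns to each other. As a sketch, the following computes a 5-day rolling correlation between columns A and B and cross-checks the first complete window with NumPy. The window size of 5 and the names `rolling_corr` and `check` are choices made for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20230101", periods=10)
df = pd.DataFrame({
    "A": [10, 11, 9, 13, 14, 12, 15, 16, 14, 18],
    "B": [5, 8, 7, 9, 10, 8, 12, 14, 10, 15],
}, index=dates)

# Pearson correlation between A and B within each 5-day window.
rolling_corr = df["A"].rolling(window=5).corr(df["B"])
print(rolling_corr.round(3))

# Cross-check the first complete window against NumPy.
check = np.corrcoef(df["A"].iloc[:5], df["B"].iloc[:5])[0, 1]
print(round(check, 3))
```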

Custom Aggregation Functions

You can use the apply() method to implement custom rolling window functions:

# Define a custom function that returns range (max - min)
def rolling_range(x):
    return x.max() - x.min()

# Apply custom function to rolling window
rolling_range_values = data.rolling(window=4).apply(rolling_range)
print(rolling_range_values)
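By default, apply() passes each window to the function as a Series. Passing raw=True hands over a NumPy array instead, which is typically faster for simple numeric functions. The sketch below runs the range calculation both ways and compares the results with the equivalent built-in expression; `rolling_range_raw`, `as_series`, `as_array`, and `builtin` are illustrative names.

```python
import pandas as pd

dates = pd.date_range("20230101", periods=10)
data = pd.Series([10, 11, 9, 13, 14, 12, 15, 16, 14, 18], index=dates)

def rolling_range_raw(a):
    # With raw=True, `a` is a NumPy array rather than a Series.
    return a.max() - a.min()

# Default (raw=False): each window arrives as a Series.
as_series = data.rolling(window=4).apply(lambda x: x.max() - x.min())
# raw=True: each window arrives as a NumPy array.
as_array = data.rolling(window=4).apply(rolling_range_raw, raw=True)
# Same quantity from built-in aggregations, no custom function needed.
builtin = data.rolling(window=4).max() - data.rolling(window=4).min()

print(pd.DataFrame({
    "apply": as_series,
    "apply raw": as_array,
    "built-in": builtin,
}))
```

When a calculation can be expressed with built-in aggregations, as in the last column, that is usually the better choice, with apply() reserved for logic that has no built-in equivalent.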

Practical Example: Stock Price Analysis

Let's look at a common real-world use case, technical analysis of stock prices, using a small set of sample prices:

# Generate some sample stock price data
stock_prices = pd.Series(
    [100, 102, 104, 103, 105, 107, 108, 109, 110, 112, 
     111, 113, 114, 116, 115, 114, 116, 118, 117, 120],
    index=pd.date_range('2023-01-01', periods=20)
)

# Technical indicators
short_ma = stock_prices.rolling(window=5).mean()  # Short-term moving average
long_ma = stock_prices.rolling(window=10).mean()  # Long-term moving average
volatility = stock_prices.rolling(window=5).std()  # Volatility

# Plot the data
plt.figure(figsize=(12, 8))

plt.subplot(2, 1, 1)
plt.plot(stock_prices, label='Stock Price')
plt.plot(short_ma, label='5-Day MA')
plt.plot(long_ma, label='10-Day MA')
plt.legend()
plt.title('Stock Price with Moving Averages')
plt.grid(True)

plt.subplot(2, 1, 2)
plt.plot(volatility, color='red', label='5-Day Volatility')
plt.legend()
plt.title('Stock Price Volatility (5-Day Rolling Standard Deviation)')
plt.grid(True)

plt.tight_layout()
plt.show()

In this example:

The 5-day moving average (short_ma) shows short-term trends
The 10-day moving average (long_ma) shows longer-term trends
When short_ma crosses above long_ma, it might indicate a bullish signal
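The rolling standard deviation serves as a simple proxy for volatility over time (in practice, volatility is usually computed on returns rather than on price levels)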
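The crossover idea can be sketched in code as follows. Because the sample prices above rise almost steadily, this illustration uses a separate, hypothetical price series that falls and then recovers, with short windows of 2 and 4 days chosen only to keep the example small. It is a minimal detection sketch, not a trading strategy.

```python
import pandas as pd

# Hypothetical prices: a decline followed by a recovery.
prices = pd.Series(
    [10, 9, 8, 7, 6, 5, 6, 7, 8, 9, 10, 11],
    index=pd.date_range("2023-01-01", periods=12),
    dtype="float64",
)

short_ma = prices.rolling(window=2).mean()
long_ma = prices.rolling(window=4).mean()

# Only compare dates where both averages exist.
both_valid = short_ma.notna() & long_ma.notna()
above = (short_ma > long_ma) & both_valid

# Bullish crossover: short MA is above the long MA today, was not above
# it yesterday, and both averages already existed yesterday.
prev_above = above.shift(1, fill_value=False).astype(bool)
prev_valid = both_valid.shift(1, fill_value=False).astype(bool)
bullish_cross = above & ~prev_above & prev_valid

print(prices.index[bullish_cross].tolist())
```

Requiring both averages to exist on the previous day avoids flagging the first comparable date as a crossover.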

Rolling Window with Time-Based Periods

For time series data, you can specify time-based windows instead of a fixed number of observations:

# Create a time series with irregular frequency
irregular_ts = pd.Series(
    [10, 15, 12, 18, 14, 17, 20, 22, 19, 25],
    index=pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05', 
                          '2023-01-07', '2023-01-08', '2023-01-12',
                          '2023-01-15', '2023-01-18', '2023-01-20',
                          '2023-01-25'])
)

# 5-day rolling mean using time-based window
time_based_rolling = irregular_ts.rolling('5D').mean()
print(time_based_rolling)

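This calculates the mean of all data points that fall within the 5 days ending at each timestamp (by default the window includes the current timestamp and excludes the point exactly 5 days earlier). Because the window is defined by elapsed time rather than by a count of rows, the number of observations per window varies, which is more appropriate for irregularly spaced time series.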
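To see how many observations each time-based window actually contains, the sketch below adds a rolling count next to the rolling mean for the same irregular series. The names `counts`, `means`, and `overview` are illustrative.

```python
import pandas as pd

irregular_ts = pd.Series(
    [10, 15, 12, 18, 14, 17, 20, 22, 19, 25],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05",
                          "2023-01-07", "2023-01-08", "2023-01-12",
                          "2023-01-15", "2023-01-18", "2023-01-20",
                          "2023-01-25"]),
)

# Number of observations and their mean within each 5-day window.
counts = irregular_ts.rolling("5D").count()
means = irregular_ts.rolling("5D").mean()

overview = pd.DataFrame({
    "value": irregular_ts,
    "n_obs": counts.astype(int),
    "mean_5D": means,
})
print(overview)
```

Unlike integer windows, time-based windows default to min_periods=1, so no NaN values appear at the start of the result.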

Summary

Rolling aggregations in pandas provide a powerful way to analyze time series data by computing statistics over sliding windows. Key points to remember:

The rolling() method creates a rolling window view of the data
Common methods include .mean(), .sum(), .min(), .max(), .std()
You can customize windows with parameters like window, min_periods, and center
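For time series, use time-based windows with fixed-frequency offset strings like '5D' or '12h'; offsets of variable length, such as calendar months, are not accepted by rolling()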
Custom functions can be applied using .apply()

Rolling aggregations are frequently used in financial analysis, signal processing, anomaly detection, and any field that works with time series data. They're excellent for smoothing noisy data and identifying underlying patterns.

Exercises

To solidify your understanding of rolling aggregations, try these exercises:

Generate a random time series and apply different window sizes (3, 7, and 14) for rolling means. Compare the results visually.
Create a custom rolling function that calculates the median absolute deviation within each window.
Using real-world stock price data (from yfinance or another source), create a trading strategy based on short and long-term moving averages.
Implement a rolling window operation that detects outliers (values more than 2 standard deviations from the rolling mean).
Compare the performance of different window types (simple, exponential, Gaussian) for smoothing a noisy time series.

Happy data analyzing with pandas rolling aggregations!
