Pandas Window Functions
Introduction
Window functions are powerful tools in data analysis that allow you to perform calculations across a set of rows that are somehow related to the current row. Unlike regular aggregations that collapse data into a single value, window functions preserve the number of rows in the result while enabling calculations that depend on values from other rows.
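To make the distinction concrete, here is a tiny contrast (using a small made-up Series): a plain aggregation collapses the data to one number, while a window method returns one value per row.
import pandas as pd

s = pd.Series([1, 2, 3, 4])
print(s.sum())              # 10 -- a single scalar
print(s.expanding().sum())  # 1.0, 3.0, 6.0, 10.0 -- one cumulative value per row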
In pandas, window functions are implemented through methods such as rolling(), expanding(), and ewm(), and they can be combined with groupby() for group-wise window calculations. These functions are essential for time-series analysis, financial modeling, and many other data science applications where you need moving averages, cumulative statistics, or other rolling computations.
This guide will walk you through the major types of window functions in pandas and show you how to apply them to solve real-world problems.
Rolling Window Functions
Rolling window functions compute metrics over a sliding window of a specific size. They're particularly useful for smoothing time-series data and identifying trends.
Basic Rolling Window Operations
Let's start with a simple example of calculating a moving average:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create a sample dataframe with date index
dates = pd.date_range('20230101', periods=10)
df = pd.DataFrame({
    'value': [10, 15, 13, 17, 20, 22, 19, 25, 23, 30]
}, index=dates)
# Calculate a 3-day moving average
df['rolling_mean_3d'] = df['value'].rolling(window=3).mean()
print(df.head(7))
Output:
value rolling_mean_3d
2023-01-01 10 NaN
2023-01-02 15 NaN
2023-01-03 13 12.666667
2023-01-04 17 15.000000
2023-01-05 20 16.666667
2023-01-06 22 19.666667
2023-01-07 19 20.333333
Notice that the first two values in the rolling_mean_3d column are NaN because at least 3 data points are needed to calculate a 3-day moving average.
Common Rolling Window Functions
Rolling windows support many statistical operations:
# Multiple window functions at once
rolling_stats = df['value'].rolling(window=3).agg(['mean', 'std', 'min', 'max', 'sum'])
print(rolling_stats.head(5))
Output:
mean std min max sum
2023-01-01 NaN NaN NaN NaN NaN
2023-01-02 NaN NaN NaN NaN NaN
2023-01-03 12.666667 2.516611 10.0 15.0 38.0
2023-01-04 15.000000 2.000000 13.0 17.0 45.0
2023-01-05 16.666667 3.511885 13.0 20.0 50.0
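If you only need a single statistic, the rolling object also exposes individual methods directly; for example (a short sketch on the same df), median() and quantile() are useful for summaries that are robust to outliers:
# Rolling median and 90th percentile over the same 3-day window
df['rolling_median_3d'] = df['value'].rolling(window=3).median()
df['rolling_q90_3d'] = df['value'].rolling(window=3).quantile(0.9)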
Customizing Rolling Windows
You can customize how the rolling window behaves by using additional parameters:
# Specify a minimum number of observations required
df['rolling_mean_3d_min2'] = df['value'].rolling(window=3, min_periods=2).mean()
# Center the window instead of using past values
df['rolling_mean_3d_centered'] = df['value'].rolling(window=3, center=True).mean()
print(df[['value', 'rolling_mean_3d', 'rolling_mean_3d_min2', 'rolling_mean_3d_centered']].head(5))
Output:
            value  rolling_mean_3d  rolling_mean_3d_min2  rolling_mean_3d_centered
2023-01-01     10              NaN                   NaN                       NaN
2023-01-02     15              NaN             12.500000                 12.666667
2023-01-03     13        12.666667             12.666667                 15.000000
2023-01-04     17        15.000000             15.000000                 16.666667
2023-01-05     20        16.666667             16.666667                 19.666667
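Because df has a DatetimeIndex, the window can also be specified as a time offset rather than a fixed row count. A brief sketch (same df as above): each window covers the current day and the two preceding calendar days, which is especially handy when observations are irregularly spaced.
# Time-based window: all rows within the last 3 calendar days
# (min_periods defaults to 1 for offset windows, so there are no leading NaNs)
df['rolling_mean_3D_offset'] = df['value'].rolling('3D').mean()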
Expanding Window Functions
Unlike rolling windows with fixed size, expanding windows grow as they incorporate more data points. They're useful for calculating cumulative statistics.
# Calculate cumulative statistics
df['expanding_mean'] = df['value'].expanding().mean()
df['expanding_sum'] = df['value'].expanding().sum()
print(df[['value', 'expanding_mean', 'expanding_sum']].head(6))
Output:
value expanding_mean expanding_sum
2023-01-01 10 10.000000 10
2023-01-02 15 12.500000 25
2023-01-03 13 12.666667 38
2023-01-04 17 13.750000 55
2023-01-05 20 15.000000 75
2023-01-06 22 16.166667 97
Expanding windows are particularly useful in financial applications for calculating cumulative returns or analyzing how metrics evolve over time.
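For example (a brief sketch on the same df), an expanding maximum tracks the running record high of the series, and an expanding standard deviation shows how the variability estimate stabilizes as more observations accumulate:
df['running_max'] = df['value'].expanding().max()
df['expanding_std'] = df['value'].expanding().std()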
Exponentially Weighted Window Functions
Exponentially weighted windows assign higher weights to more recent observations and less weight to older ones. This is useful when recent data is more important than older data.
# Calculate exponentially weighted moving average with different spans
df['ewm_span5'] = df['value'].ewm(span=5).mean()
df['ewm_span10'] = df['value'].ewm(span=10).mean()
print(df[['value', 'ewm_span5', 'ewm_span10']].head(6))
Output:
            value  ewm_span5  ewm_span10
2023-01-01     10  10.000000   10.000000
2023-01-02     15  13.000000   12.750000
2023-01-03     13  13.000000   12.850498
2023-01-04     17  14.661538   14.217574
2023-01-05     20  16.710900   15.877551
2023-01-06     22  18.643609   17.467763
The span parameter determines how quickly the weights decay: a smaller span means more weight on recent observations and faster decay of older values.
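Under the hood, span is translated into a smoothing factor alpha = 2 / (span + 1), so specifying the span or the equivalent alpha gives the same weights. A small sketch to make the relationship concrete (equal up to floating-point rounding):
ewm_by_span = df['value'].ewm(span=5).mean()
ewm_by_alpha = df['value'].ewm(alpha=2 / (5 + 1)).mean()
print(np.allclose(ewm_by_span, ewm_by_alpha))  # True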
Window Functions with GroupBy
One of the most powerful features of pandas is combining window functions with the groupby() operation. This allows you to perform window calculations within specific groups of data.
Let's create a more complex dataset to demonstrate this:
# Create a sample sales dataset
data = {
    'date': pd.date_range('20230101', periods=12),
    'store': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'sales': [100, 120, 90, 110, 95, 130, 105, 125, 115, 140, 110, 135]
}
sales_df = pd.DataFrame(data)
# Calculate rolling average sales for each store
sales_df['rolling_avg_sales'] = (
    sales_df.groupby('store')['sales']
            .rolling(window=3)
            .mean()
            .reset_index(level=0, drop=True)
)
print(sales_df.head(8))
Output:
date store sales rolling_avg_sales
0 2023-01-01 A 100 NaN
1 2023-01-02 B 120 NaN
2 2023-01-03 A 90 NaN
3 2023-01-04 B 110 NaN
4 2023-01-05 A 95 95.000000
5 2023-01-06 B 130 120.000000
6 2023-01-07 A 105 96.666667
7 2023-01-08 B 125 121.666667
This calculates a 3-day moving average of sales for each store separately.
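An alternative pattern (shown here as a sketch, not the only way) is groupby().transform(), which returns a result that is already aligned with the original rows and avoids the reset_index step:
# Same per-store 3-day moving average, aligned to the original index
sales_df['rolling_avg_sales_alt'] = (
    sales_df.groupby('store')['sales']
            .transform(lambda s: s.rolling(window=3).mean())
)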
Real-World Applications
Stock Price Analysis
Window functions are extensively used in financial analysis. Let's see a practical example with stock data:
# Let's simulate stock price data
np.random.seed(42)
dates = pd.date_range('20230101', periods=30)
stock_data = pd.DataFrame({
    'price': 100 + np.cumsum(np.random.normal(0.1, 1, 30))
}, index=dates)
# Calculate common technical indicators
stock_data['SMA_5'] = stock_data['price'].rolling(window=5).mean() # Simple Moving Average
stock_data['EMA_5'] = stock_data['price'].ewm(span=5).mean() # Exponential Moving Average
stock_data['Upper_Band'] = stock_data['SMA_5'] + 2*stock_data['price'].rolling(window=5).std()
stock_data['Lower_Band'] = stock_data['SMA_5'] - 2*stock_data['price'].rolling(window=5).std()
# Calculate daily returns
stock_data['daily_return'] = stock_data['price'].pct_change()
# Calculate cumulative returns
stock_data['cumulative_return'] = (1 + stock_data['daily_return']).cumprod() - 1
print(stock_data.head(7))
Output:
price SMA_5 EMA_5 Upper_Band Lower_Band daily_return cumulative_return
2023-01-01 99.7818 NaN 99.781800 NaN NaN NaN NaN
2023-01-02 99.7784 NaN 99.779332 NaN NaN -0.0003 -0.000340
2023-01-03 100.9642 NaN 100.270669 NaN NaN 0.0119 0.011845
2023-01-04 100.5109 NaN 100.358135 NaN NaN -0.0045 0.007309
2023-01-05 101.9812 100.6033 101.036534 103.303168 97.903426 0.0146 0.022039
2023-01-06 101.6037 100.9677 101.265928 103.509686 98.425696 -0.0037 0.018253
2023-01-07 102.7713 101.5663 101.854220 104.434559 98.698001 0.0115 0.029960
This example shows how to calculate several important financial indicators:
- Simple moving average (SMA): indicates the average price over a period
- Exponential moving average (EMA): gives more weight to recent prices
- Bollinger Bands (Upper_Band and Lower_Band): indicate price volatility
- Cumulative Return: shows the total return over time
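Another indicator that follows the same pattern is rolling volatility, the rolling standard deviation of daily returns. A sketch using a 5-day window to match the short sample (in practice, 20 or 21 trading days is more common):
# Rolling volatility of daily returns, annualized with sqrt(252) trading days
stock_data['volatility_5d'] = stock_data['daily_return'].rolling(window=5).std() * np.sqrt(252)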
Sales Data Analysis
Window functions are also useful for detecting trends in sales data:
# Create more realistic sales data with seasonality and trend
dates = pd.date_range('20230101', periods=90)
base_trend = np.linspace(100, 150, 90) # Increasing trend
weekly_pattern = 15 * np.sin(np.arange(90) * (2 * np.pi / 7)) # Weekly seasonality
noise = np.random.normal(0, 5, 90) # Random noise
sales_data = pd.DataFrame({
    'date': dates,
    'sales': base_trend + weekly_pattern + noise
})
# Calculate various window metrics
sales_data['7d_avg'] = sales_data['sales'].rolling(7).mean() # Weekly average
sales_data['30d_avg'] = sales_data['sales'].rolling(30).mean() # Monthly average
sales_data['7d_vs_prev'] = sales_data['sales'].rolling(7).mean().diff(7) # Week-over-week change
# Print recent data
print(sales_data.tail(7))
Output:
date sales 7d_avg 30d_avg 7d_vs_prev
83 2023-03-25 143.93173 126.89578 125.793734 2.029874
84 2023-03-26 140.71028 129.65700 126.323997 1.496822
85 2023-03-27 155.96685 133.91193 126.907697 3.283950
86 2023-03-28 153.25078 139.46409 127.459770 6.100780
87 2023-03-29 137.49872 142.66988 127.960977 9.090680
88 2023-03-30 156.30304 146.93978 128.508834 12.877987
89 2023-03-31 155.82115 149.06894 129.043845 20.209194
This analysis allows you to:
- Smooth out day-to-day noise with rolling averages
- Compare short-term (7-day) vs. long-term (30-day) trends
- Identify week-over-week performance changes
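Building on the short-term vs. long-term comparison, a simple illustrative way to flag the current trend direction is to check whether the 7-day average sits above the 30-day average (rows where either average is still NaN compare as False):
# True while the short-term average is above the long-term average
sales_data['uptrend'] = sales_data['7d_avg'] > sales_data['30d_avg']
print(sales_data['uptrend'].tail(7))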
Custom Window Functions
While pandas provides many built-in window functions, you can also apply your own custom functions to windows using .apply().
# Define a custom window function
def custom_metric(window_values):
    """Calculate the ratio between the max and min values in the window."""
    return window_values.max() / window_values.min() if window_values.min() > 0 else np.nan
# Apply the custom function to a rolling window
df['volatility_ratio'] = df['value'].rolling(window=3).apply(custom_metric)
print(df[['value', 'volatility_ratio']].head(7))
Output:
value volatility_ratio
2023-01-01 10 NaN
2023-01-02 15 NaN
2023-01-03 13 1.500000
2023-01-04 17 1.307692
2023-01-05 20 1.538462
2023-01-06 22 1.294118
2023-01-07 19 1.157895
This custom function calculates a simple volatility ratio by dividing the maximum value by the minimum value in each window.
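By default, .apply() passes each window to your function as a pandas Series. Passing raw=True hands it a plain NumPy array instead, which is usually faster for purely numerical functions like this one; a sketch of the same calculation:
# Same ratio, but each window arrives as a NumPy array
df['volatility_ratio_raw'] = df['value'].rolling(window=3).apply(
    lambda a: a.max() / a.min() if a.min() > 0 else np.nan,
    raw=True
)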
Performance Considerations
Window functions can be computationally intensive, especially with large datasets. Here are some tips to improve performance:
- Pass engine='numba' to supported aggregation methods (for example mean(), sum(), and std()) to accelerate calculation. Note that the engine argument belongs to the aggregation method, not to rolling() itself:
# Only works if numba is installed
df['fast_rolling'] = df['value'].rolling(window=3).mean(engine='numba')
- Pre-sort your data if using time-based windows to avoid unnecessary sorting operations.
- Consider using smaller windows or downsampling your data if working with very long time series.
- Use numba-optimized custom functions for complex custom window operations, and pass raw=True so each window arrives as a plain NumPy array:
from numba import jit

@jit(nopython=True)
def optimized_range(values):
    # Example implementation: the max-min range of the window
    return values.max() - values.min()

df['fast_range'] = df['value'].rolling(window=3).apply(optimized_range, raw=True)
Summary
Window functions in pandas are essential tools for time series analysis and sophisticated data manipulation. In this guide, we've covered:
- Rolling windows for calculating metrics over fixed-size sliding windows
- Expanding windows for cumulative calculations
- Exponentially weighted windows that give more importance to recent observations
- GroupBy with windows for group-specific window calculations
- Real-world applications in finance and sales analysis
- Custom window functions for specialized metrics
- Performance considerations for working with large datasets
Mastering window functions will significantly enhance your data analysis capabilities in pandas, enabling you to detect patterns, smooth noisy data, and extract insights that would otherwise be hidden in raw data.
Further Resources and Exercises
Additional Resources
- Pandas Official Documentation on Window Operations
- Time Series Analysis with Pandas
- Financial Analysis with Python and Pandas
Exercises
- Basic Rolling Windows: Create a time series of daily temperatures for a month and calculate a 7-day moving average. Plot both the raw data and the smoothed line.
- Window Function with GroupBy: Use the Kaggle Superstore Dataset to calculate a 3-month rolling sum of sales by product category.
- Custom Window Function: Implement a rolling window function that calculates the percentage of values in the window that exceed the window's median.
- Financial Analysis: Download stock data for a company of your choice and implement a trading strategy based on when the 20-day moving average crosses above the 50-day moving average.
- Advanced Exercise: Implement a rolling window function that detects outliers by flagging values that are more than 2 standard deviations from the rolling mean.