Pandas Resampling

Time series data often arrives at a different frequency than the one you want to analyze. You may have readings collected every minute but need hourly figures, or daily stock prices when you want to see monthly trends. This is where pandas resampling becomes useful.

What is Resampling?

Resampling is the process of changing the frequency of your time series data. This can be:

  • Downsampling: Converting data from a higher frequency to a lower one (e.g., minutes to hours)
  • Upsampling: Converting data from a lower frequency to a higher one (e.g., days to hours)

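The resample() method in pandas provides a convenient way to perform frequency conversion and time-based aggregation. It requires a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or a datetime-like column passed through the on parameter.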

Basic Resampling Syntax

The basic syntax for resampling is:

python
df.resample(rule).aggregation_method()

Where:

  • rule is the time frequency you want to convert to (e.g., 'D' for day, 'ME' for month end; pandas versions before 2.2 spell month end as 'M')
  • aggregation_method is how you want to aggregate the data (e.g., mean(), sum(), count())
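
As a minimal sketch of this syntax, the snippet below downsamples ten one-minute readings into 5-minute totals. The names readings and five_min_totals are made up for illustration.

```python
import pandas as pd

# Ten one-minute readings with values 0..9 (illustrative data)
idx = pd.date_range('2023-01-01 00:00', periods=10, freq='min')
readings = pd.Series(range(10), index=idx)

# rule='5min', aggregation method = sum()
five_min_totals = readings.resample('5min').sum()
print(five_min_totals)
```

Each 5-minute bin is labeled with its start time, so the first bin covers 00:00 through 00:04 and the second covers 00:05 through 00:09.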

Common Frequency Aliases

Before we dive into examples, let's look at some common frequency aliases used in pandas:

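Alias           Description
'D'             Calendar day
'W'             Weekly (weeks end on Sunday by default)
'ME'            Month end ('M' before pandas 2.2)
'QE'            Quarter end ('Q' before pandas 2.2)
'YE'            Year end ('Y' before pandas 2.2)
'h'             Hourly ('H' before pandas 2.2)
'min'           Minute ('T' is a deprecated synonym)
's'             Second ('S' before pandas 2.2)

Pandas 2.2 renamed several aliases and emits a deprecation warning for the old spellings, so pick the form that matches your installed version. You can also use multiples like '2D' (every 2 days) or '4h' (every 4 hours).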
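
To make the multiples concrete, here is a small sketch that counts hourly observations per 4-hour and per 2-day bin. The series is synthetic (every value is 1), so each sum is simply the number of rows that fell into the bin.

```python
import pandas as pd

# 48 hourly observations, each with the value 1 (illustrative data)
hourly = pd.Series(1, index=pd.date_range('2023-01-01', periods=48, freq='h'))

four_hourly = hourly.resample('4h').sum()  # 12 bins of 4 observations each
two_daily = hourly.resample('2D').sum()    # 1 bin holding all 48 observations

print(four_hourly.head())
print(two_daily)
```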

Downsampling Example

Let's start with a simple example of downsampling. Imagine we have temperature readings taken every hour, and we want to find the daily average temperature.

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a time series of hourly temperature data covering five full days
# (ending at 23:00 so the last day has 24 readings rather than one)
dates = pd.date_range('2023-01-01', '2023-01-05 23:00', freq='h')
hourly_temp = pd.Series(np.random.normal(15, 5, len(dates)), index=dates)
print("Original hourly data (first 5 rows):")
print(hourly_temp.head())

Output (exact values will differ on each run because the data is random):

Original hourly data (first 5 rows):
2023-01-01 00:00:00 14.194255
2023-01-01 01:00:00 17.511866
2023-01-01 02:00:00 13.145039
2023-01-01 03:00:00 17.157833
2023-01-01 04:00:00 10.142336
Freq: H, dtype: float64

Now, let's resample this hourly data to daily averages:

python
# Resample to daily frequency by taking the mean
daily_avg_temp = hourly_temp.resample('D').mean()
print("\nResampled daily average temperatures:")
print(daily_avg_temp)

Output:

Resampled daily average temperatures:
2023-01-01 15.125065
2023-01-02 14.523831
2023-01-03 15.376299
2023-01-04 14.912478
2023-01-05 14.260212
Freq: D, dtype: float64

We can visualize both the original and resampled data:

python
plt.figure(figsize=(12, 6))
hourly_temp.plot(label='Hourly', alpha=0.5)
daily_avg_temp.plot(label='Daily Average', linewidth=2)
plt.title('Hourly vs Daily Average Temperatures')
plt.legend()
plt.tight_layout()
plt.show()

Multiple Aggregation Methods

Often, you'll want more than one statistic when resampling. For example, you might want to know the minimum, maximum, and average temperature for each day:

python
# Multiple aggregations
daily_stats = hourly_temp.resample('D').agg(['mean', 'min', 'max', 'std'])
print("\nDaily temperature statistics:")
print(daily_stats)

Output:

Daily temperature statistics:
                 mean       min        max       std
2023-01-01  15.125065  5.437970  25.993872  5.073413
2023-01-02  14.523831  3.384390  24.701195  5.033228
2023-01-03  15.376299  5.156941  25.637072  5.083605
2023-01-04  14.912478  3.947211  24.953857  5.260543
2023-01-05  14.260212  3.775066  24.581462  6.288110
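
Built-in statistics can also be combined with your own. The sketch below is one possible approach, not the only one: it derives a daily temperature range from the min and max columns, then computes the same quantity by passing a custom function to apply(). The function name temp_range and the seeded random data are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Three full days of synthetic hourly temperatures (seeded for repeatability)
rng = np.random.default_rng(0)
idx = pd.date_range('2023-01-01', periods=72, freq='h')
temps = pd.Series(rng.normal(15, 5, len(idx)), index=idx)

# Option 1: aggregate with built-ins, then derive a new column
daily_stats = temps.resample('D').agg(['min', 'max'])
daily_stats['range'] = daily_stats['max'] - daily_stats['min']

# Option 2: pass a custom function that receives each day's values
def temp_range(values):
    """Spread between the warmest and coldest reading in one bin."""
    return values.max() - values.min()

daily_range = temps.resample('D').apply(temp_range)

print(daily_stats)
print(daily_range)
```

Both options give the same range per day. The first is usually faster because it relies only on built-in aggregations.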

Upsampling Example

Upsampling converts data from a lower frequency to a higher one. Since we're creating data points where none existed before, we need to decide how to fill these new points.

Let's start with daily data and upsample to hourly:

python
# Start with daily data
daily_data = pd.Series([10, 12, 15, 18, 20],
                       index=pd.date_range('2023-01-01', '2023-01-05', freq='D'))
print("Original daily data:")
print(daily_data)

# Upsample to hourly frequency
# Note that we need to specify how to fill the newly created records
hourly_data_ffill = daily_data.resample('h').ffill() # Forward fill

Output:

Original daily data:
2023-01-01 10
2023-01-02 12
2023-01-03 15
2023-01-04 18
2023-01-05 20
Freq: D, dtype: int64

Let's check the first few rows of our upsampled data:

python
print("\nUpsampled to hourly data (first 10 rows with forward fill):")
print(hourly_data_ffill.head(10))

Output:

Upsampled to hourly data (first 10 rows with forward fill):
2023-01-01 00:00:00 10
2023-01-01 01:00:00 10
2023-01-01 02:00:00 10
2023-01-01 03:00:00 10
2023-01-01 04:00:00 10
2023-01-01 05:00:00 10
2023-01-01 06:00:00 10
2023-01-01 07:00:00 10
2023-01-01 08:00:00 10
2023-01-01 09:00:00 10
Freq: H, dtype: int64

You can see that each day's value is repeated for all hours of that day.
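
To see what upsampling produces before any filling, you can call asfreq(), which leaves the new slots as NaN. The sketch below also uses the limit argument of ffill() to cap how far a value is carried forward. The two-day series and the 6-hour target frequency are chosen only to keep the output short.

```python
import pandas as pd

# Two daily observations (illustrative data)
daily = pd.Series([10, 12], index=pd.date_range('2023-01-01', periods=2, freq='D'))

# asfreq() only inserts the new timestamps; the new slots stay NaN
gaps = daily.resample('6h').asfreq()

# ffill(limit=2) carries each value forward at most two slots
limited = daily.resample('6h').ffill(limit=2)

print(gaps)
print(limited)
```

Here the 18:00 slot stays NaN in limited because it is three steps away from the last known value.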

Fill Methods for Upsampling

When upsampling, you need to decide how to fill the newly created time periods. Common methods include:

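  • ffill: Fill with the value from the previous valid observation (the older alias pad was removed from the resampler in pandas 2.0)
  • bfill: Fill with the value from the next valid observation (the older alias backfill was removed from the resampler in pandas 2.0)
  • interpolate: Use interpolation (linear by default) to estimate values between known points

python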
# Compare different fill methods
hourly_data_bfill = daily_data.resample('h').bfill() # Back fill
hourly_data_interp = daily_data.resample('h').interpolate() # Linear interpolation

# Plot to compare
plt.figure(figsize=(12, 6))
daily_data.plot(marker='o', markersize=10, linestyle='-', label='Original Daily')
hourly_data_ffill.plot(alpha=0.5, label='Forward Fill')
hourly_data_bfill.plot(alpha=0.5, label='Backward Fill')
hourly_data_interp.plot(alpha=0.5, label='Interpolate')

plt.title('Comparison of Upsampling Methods')
plt.legend()
plt.tight_layout()
plt.show()
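
If you prefer numbers to a plot, the following sketch puts the three fill methods side by side for a tiny series. The values 10 and 14 and the 6-hour target frequency are arbitrary choices for illustration.

```python
import pandas as pd

# Two daily observations (illustrative data)
daily = pd.Series([10.0, 14.0], index=pd.date_range('2023-01-01', periods=2, freq='D'))

comparison = pd.DataFrame({
    'ffill': daily.resample('6h').ffill(),              # repeat the previous value
    'bfill': daily.resample('6h').bfill(),              # repeat the next value
    'interpolate': daily.resample('6h').interpolate(),  # straight line between points
})
print(comparison)
```

Forward fill holds 10 until the next observation, backward fill jumps to 14 immediately, and linear interpolation moves from 10 to 14 in equal steps.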

Resampling with DataFrames

So far, we've used Series examples, but resampling works with DataFrames too, applying the operation to each column:

python
# Create a DataFrame with multiple columns
df = pd.DataFrame({
    'Temperature': np.random.normal(15, 5, len(dates)),
    'Humidity': np.random.normal(60, 10, len(dates)),
    'Pressure': np.random.normal(1013, 5, len(dates))
}, index=dates)

print("Original DataFrame (first 5 rows):")
print(df.head())

# Resample to 6-hour intervals
resampled_df = df.resample('6h').mean()
print("\nResampled DataFrame (6-hourly means):")
print(resampled_df.head())
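
When the timestamps live in a regular column rather than the index, the on parameter of resample() can point at that column. A minimal sketch, with a made-up events table and column names:

```python
import pandas as pd

# Timestamps stored in a column, not in the index (illustrative data)
events = pd.DataFrame({
    'timestamp': pd.date_range('2023-01-01', periods=6, freq='12h'),
    'sales': [1, 2, 3, 4, 5, 6],
})

# Resample on the 'timestamp' column and total the sales per day
daily_sales = events.resample('D', on='timestamp')['sales'].sum()
print(daily_sales)
```

This avoids a separate set_index() call when you only need the resampled result.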

Real-world Application: Stock Market Analysis

Let's look at a realistic example using simulated stock price data. We'll generate daily prices with a random walk, then resample them to see weekly and monthly trends.

python
# Let's create some sample stock data
dates = pd.date_range('2022-01-01', '2022-12-31', freq='D')
np.random.seed(42) # For reproducibility
stock_price = 100 + np.cumsum(np.random.normal(0.1, 1, len(dates))) # Random walk
stock_volume = np.random.randint(1000, 10000, len(dates))

stock_df = pd.DataFrame({
    'Price': stock_price,
    'Volume': stock_volume
}, index=dates)

# Daily returns
stock_df['Daily_Return'] = stock_df['Price'].pct_change() * 100

print("Stock data (first 5 rows):")
print(stock_df.head())

# Resample to weekly data
weekly_data = stock_df.resample('W').agg({
    'Price': 'mean',
    'Volume': 'sum',
    'Daily_Return': 'mean'
})

print("\nWeekly resampled data (first 5 rows):")
print(weekly_data.head())

# Resample to monthly data ('ME' = month end; use 'M' on pandas older than 2.2)
monthly_data = stock_df.resample('ME').agg({
    'Price': ['mean', 'min', 'max'],
    'Volume': 'sum',
    'Daily_Return': 'mean'
})

print("\nMonthly resampled data (first 3 rows):")
print(monthly_data.head(3))

The monthly resampling gives us a concise summary of each month's performance, including average, minimum, and maximum prices, total trading volume, and average daily return.
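
For price series in particular, the resampler's ohlc() method summarizes each period as open, high, low, and close. The sketch below uses its own small simulated price series (four full weeks starting on a Monday) and also derives a week-over-week return from each week's last price. The variable names and the random-walk data are illustrative assumptions, not real market data.

```python
import numpy as np
import pandas as pd

# Four full weeks of simulated daily prices; 2022-01-03 is a Monday
rng = np.random.default_rng(42)
idx = pd.date_range('2022-01-03', periods=28, freq='D')
price = pd.Series(100 + np.cumsum(rng.normal(0.1, 1, len(idx))), index=idx)

# Open/high/low/close per week ('W' weeks end on Sunday)
weekly_ohlc = price.resample('W').ohlc()

# Week-over-week percentage return based on each week's last price
weekly_return = price.resample('W').last().pct_change() * 100

print(weekly_ohlc)
print(weekly_return)
```

Computing the return from period-end prices is usually more meaningful than averaging daily returns, because it reflects the actual change over the whole period.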

Working with Custom Business Periods

Sometimes regular calendar periods don't align with your business needs. For example, you might want to work in business days, or analyze data by fiscal quarters that don't align with calendar quarters.

Pandas provides business-calendar frequency aliases that work with resample():

python
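# Business day resampling ('B' for business days)
# Note: weekend readings fall into the preceding business day's bin
business_day_data = hourly_temp.resample('B').mean()

# Business month end ('BME' on pandas 2.2+, 'BM' on older versions)
business_month_end = hourly_temp.resample('BME').mean()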

print("Business day resampled data (first 5 rows):")
print(business_day_data.head())
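
Fiscal quarters can be handled by passing an offset object instead of a string alias. The sketch below assumes a hypothetical fiscal year whose quarters end in January, April, July, and October, and uses pd.offsets.QuarterEnd(startingMonth=1) to express that. The daily series holds the value 1 for every day of 2022, so each quarterly sum is the number of days in that quarter.

```python
import pandas as pd

# One observation per day of 2022, each with the value 1 (illustrative data)
idx = pd.date_range('2022-01-01', '2022-12-31', freq='D')
daily_sales = pd.Series(1, index=idx)

# Quarters ending in Jan/Apr/Jul/Oct (assumed fiscal calendar)
fiscal_quarter = pd.offsets.QuarterEnd(startingMonth=1)
fiscal_totals = daily_sales.resample(fiscal_quarter).sum()
print(fiscal_totals)
```

Using the offset object sidesteps the alias renaming between pandas versions, since the class name has stayed the same.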

Advanced Resampling: Origin and Closed Parameters

You can control exactly how periods are defined using the closed and origin parameters:

python
# Default resampling: for 'D', bins are closed on the left
# (each day includes its 00:00 start and excludes the next midnight)
default_daily = hourly_temp.resample('D').mean()

# Changing the closed parameter
left_closed = hourly_temp.resample('D', closed='left').mean() # Include left boundary, exclude right
right_closed = hourly_temp.resample('D', closed='right').mean() # Include right boundary, exclude left

print("Comparing closed parameter effects:")
print(pd.concat([default_daily.head(3), left_closed.head(3), right_closed.head(3)], axis=1,
                keys=['default', 'left', 'right']))
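
The effect of closed is easier to see on data where every value is 1, so that each sum is a row count. This is a small illustrative sketch with a made-up two-day hourly series.

```python
import pandas as pd

# 48 hourly observations over two days, each with the value 1
ones = pd.Series(1, index=pd.date_range('2023-01-01', periods=48, freq='h'))

# closed='left' (the default for 'D'): midnight belongs to the day it starts
left = ones.resample('D', closed='left').sum()

# closed='right': midnight belongs to the day it ends, so the very first
# reading lands in a bin labeled with the previous day
right = ones.resample('D', closed='right').sum()

print(left)
print(right)
```

With closed='right' the result gains an extra row for 2022-12-31 that holds only the 00:00 reading, and the last day ends up with 23 readings instead of 24.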

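The origin parameter sets the reference timestamp from which fixed-frequency bins (such as '6h' or '15min') are aligned. By default ('start_day'), bins are anchored to midnight of the first day in the data. Changing it is useful when your reporting periods start at some other time of day.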
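
As a sketch of origin in action, the example below resamples one day of hourly data into 6-hour bins twice: once with the default alignment and once with bins anchored at 03:00. The anchor time and the series values (0 through 23) are arbitrary choices for illustration.

```python
import pandas as pd

# One day of hourly data with values 0..23 (illustrative data)
hourly = pd.Series(range(24), index=pd.date_range('2023-01-01', periods=24, freq='h'))

# Default origin='start_day': bin edges at 00:00, 06:00, 12:00, 18:00
default_bins = hourly.resample('6h').sum()

# Custom origin: bin edges at 03:00, 09:00, 15:00, 21:00
shifted_bins = hourly.resample('6h', origin=pd.Timestamp('2023-01-01 03:00')).sum()

print(default_bins)
print(shifted_bins)
```

The shifted version starts with a partial bin labeled 2022-12-31 21:00, because the readings from 00:00 to 02:00 fall before the first 03:00 edge.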

Summary

Resampling is a powerful feature in pandas that allows you to:

  1. Downsample: Reduce frequency and aggregate data (e.g., minute → hour → day)
  2. Upsample: Increase frequency and fill in missing data (e.g., month → day)
  3. Apply various aggregation methods: mean, sum, min, max, etc.
  4. Work with custom business periods: Business days, quarters, etc.

When resampling time series data, remember:

  • Choose appropriate aggregation methods based on your data and analysis needs
  • For upsampling, decide how to fill gaps (forward fill, backward fill, interpolation)
  • Consider whether you need custom period boundaries using closed and origin

Mastering resampling will help you analyze time-based data at different granularities, making it an essential tool in any data analyst's toolkit.

Exercises

  1. Take daily temperature data and resample it to weekly and monthly averages.
  2. Use stock market data to calculate monthly returns from daily prices.
  3. Upsample monthly sales data to daily values using different filling methods, and compare which gives the most realistic results.
  4. Create hourly website traffic data and resample it to show patterns by hour of day and day of week.
  5. Take 15-minute interval exercise data (e.g., heart rate) and downsample it to hourly summaries.
