
Pandas Seasonal Decomposition

Time series data often contains multiple underlying patterns that contribute to the values we observe. To better understand these patterns and make more accurate predictions, we can decompose a time series into its fundamental components. In this tutorial, we'll explore seasonal decomposition of Pandas time series with the seasonal_decompose function from statsmodels, a technique for breaking a series down into those components.

Introduction to Time Series Decomposition

Time series decomposition splits a time series into several component parts, typically:

  1. Trend - The long-term progression of the series (upward or downward)
  2. Seasonality - Regular patterns that repeat at fixed intervals
  3. Residual (or irregular) - Random fluctuations that cannot be attributed to trend or seasonality

Separating these components helps analysts see what drives the data, remove unwanted components such as seasonality, and build more accurate forecasting models.

Prerequisites

Before we begin, make sure you have the following libraries installed:

python
# Install required packages if needed
# !pip install pandas numpy matplotlib statsmodels

Let's import the necessary libraries for our examples:

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

Basic Seasonal Decomposition

The seasonal_decompose function from the statsmodels library makes it easy to decompose a time series into its components. Let's create a simple example with synthetic data:

python
# Create a date range
dates = pd.date_range(start='2020-01-01', periods=730, freq='D')

# Create a time series with trend and seasonality
trend = np.linspace(10, 30, 730) # Increasing trend
seasonality = 5 * np.sin(np.arange(730) * (2 * np.pi / 365)) # Yearly seasonality
noise = np.random.normal(0, 1, 730) # Random noise

# Combine components
ts_data = trend + seasonality + noise

# Create a pandas Series
time_series = pd.Series(ts_data, index=dates)

# Display the first few values
print(time_series.head())

Output:

2020-01-01    10.090039
2020-01-02 10.549292
2020-01-03 10.167876
2020-01-04 10.882820
2020-01-05 10.601168
Freq: D, dtype: float64

Your exact values will differ because the noise is random and no seed is set. Let's visualize our time series:

python
plt.figure(figsize=(12, 6))
plt.plot(time_series)
plt.title('Synthetic Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True)
plt.show()

Now, let's decompose this time series into its components:

python
decomposition = seasonal_decompose(time_series, model='additive', period=365)

# Plot the decomposed components
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(12, 12))
decomposition.observed.plot(ax=ax1)
ax1.set_title('Observed')
decomposition.trend.plot(ax=ax2)
ax2.set_title('Trend')
decomposition.seasonal.plot(ax=ax3)
ax3.set_title('Seasonality')
decomposition.resid.plot(ax=ax4)
ax4.set_title('Residuals')
plt.tight_layout()
plt.show()

Understanding the Components

Let's examine what each component represents:

1. Trend Component

The trend component shows the long-term progression of your time series. It answers questions like:

  • Is the data generally increasing or decreasing over time?
  • Are there any long-term cycles or patterns?

python
print("First 5 values of the trend component:")
print(decomposition.trend.head())
print("\nLast 5 values of the trend component:")
print(decomposition.trend.tail())

Output:

First 5 values of the trend component:
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 NaN
2020-01-05 NaN
Freq: D, dtype: float64

Last 5 values of the trend component:
2021-12-26 NaN
2021-12-27 NaN
2021-12-28 NaN
2021-12-29 NaN
2021-12-30 NaN
Freq: D, dtype: float64

Notice that every value printed here is NaN. The trend is calculated with a centered moving average whose window equals the period, so with period=365 the first 182 and last 182 values have no complete window around them. Use decomposition.trend.dropna() to inspect the valid values in the middle of the series.

2. Seasonal Component

The seasonal component captures recurring patterns at fixed intervals:

python
print("Seasonal component pattern (first 14 days):")
print(decomposition.seasonal.head(14))

Output:

Seasonal component pattern (first 14 days):
2020-01-01 0.000000
2020-01-02 0.086083
2020-01-03 0.172055
2020-01-04 0.257805
2020-01-05 0.343225
2020-01-06 0.428204
2020-01-07 0.512634
2020-01-08 0.596405
2020-01-09 0.679411
2020-01-10 0.761544
2020-01-11 0.842701
2020-01-12 0.922776
2020-01-13 1.001670
2020-01-14 1.079282
Freq: D, dtype: float64

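The seasonal component repeats with the period specified (365 days in our example), showing how values fluctuate within each cycle. Treat the printed values as illustrative: the component is estimated from noisy data, so your output will not trace a perfectly smooth curve.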

3. Residual Component

Residuals are what is left after removing trend and seasonality, the "unexplained" part of your data:

python
print("Residual component statistics:")
print(f"Mean: {decomposition.resid.mean()}")
print(f"Standard Deviation: {decomposition.resid.std()}")

Output:

Residual component statistics:
Mean: -0.0024627243333999917
Standard Deviation: 0.9986236321877903

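In an ideal decomposition, residuals should look like random noise with no discernible pattern. Treat the figures above as illustrative: the series is generated without a fixed random seed, so your numbers will differ. In addition, with only two cycles of data almost every position in the cycle is estimated from a single detrended observation, so seasonal_decompose absorbs most of the noise into the seasonal component and reports residuals with a much smaller spread. Decomposing several full cycles gives residuals closer to the noise that was injected.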

Additive vs. Multiplicative Decomposition

There are two main models for time series decomposition:

  1. Additive: Y = Trend + Seasonality + Residual

    • Use when seasonal variations are consistent in magnitude over time
    • Our example above used an additive model
  2. Multiplicative: Y = Trend * Seasonality * Residual

    • Use when seasonal variations increase/decrease proportionally with the trend

Let's see how to use a multiplicative model:

python
# Create data with multiplicative seasonality
trend_mult = np.linspace(10, 50, 730)
seasonality_mult = 1 + 0.3 * np.sin(np.arange(730) * (2 * np.pi / 365))
noise_mult = np.random.normal(1, 0.05, 730)

# Combine components multiplicatively
ts_data_mult = trend_mult * seasonality_mult * noise_mult

# Create a pandas Series
time_series_mult = pd.Series(ts_data_mult, index=dates)

# Decompose using multiplicative model
decomposition_mult = seasonal_decompose(time_series_mult, model='multiplicative', period=365)

# Plot the decomposed components
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(12, 12))
decomposition_mult.observed.plot(ax=ax1)
ax1.set_title('Observed (Multiplicative)')
decomposition_mult.trend.plot(ax=ax2)
ax2.set_title('Trend')
decomposition_mult.seasonal.plot(ax=ax3)
ax3.set_title('Seasonality')
decomposition_mult.resid.plot(ax=ax4)
ax4.set_title('Residuals')
plt.tight_layout()
plt.show()

Notice how in the multiplicative model:

  • The seasonal component is expressed as factors (around 1.0)
  • The amplitude of seasonal variations increases with the trend level

Real-World Example: Analyzing Weather Data

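Let's apply seasonal decomposition to a more realistic scenario. We'll use monthly temperature data, simulated here so the example runs without a download:

python
# Simulate monthly temperature data (a synthetic stand-in for real measurements)
# In real applications, you would load your own data
np.random.seed(42)
# 'MS' (month start) avoids the 'M' alias, which newer pandas versions deprecate
dates = pd.date_range('2010-01-01', '2019-12-31', freq='MS')
months = np.arange(len(dates))
temperatures = (
    20                                        # baseline temperature
    + 10 * np.sin(months * (2 * np.pi / 12))  # yearly cycle
    + np.linspace(0, 3, len(dates))           # slow warming trend
    + np.random.normal(0, 2, len(dates))      # random noise
)
temp_data = pd.Series(temperatures, index=dates)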

# Plot the temperature data
plt.figure(figsize=(12, 6))
temp_data.plot()
plt.title('Monthly Temperature Data (2010-2019)')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.grid(True)
plt.show()

# Decompose the temperature data
temp_decomposition = seasonal_decompose(temp_data, model='additive', period=12)

# Plot the decomposition
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(12, 10))
temp_decomposition.observed.plot(ax=ax1)
ax1.set_title('Observed Temperature')
temp_decomposition.trend.plot(ax=ax2)
ax2.set_title('Temperature Trend')
temp_decomposition.seasonal.plot(ax=ax3)
ax3.set_title('Temperature Seasonality')
temp_decomposition.resid.plot(ax=ax4)
ax4.set_title('Temperature Residuals')
plt.tight_layout()
plt.show()

Analyzing the Results

From our temperature decomposition, we can observe:

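  1. Trend Component: Shows a gradual warming of roughly 3 °C over the decade, which matches the np.linspace(0, 3, ...) term used to build the data.
  2. Seasonal Component: Shows a clear 12-month temperature cycle. Because the simulated sine wave starts at zero in January, the peak falls around April and the trough around October rather than in midsummer and midwinter.
  3. Residual Component: Captures the random noise. In real measurements it would also contain unusual weather events.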

Handling Missing Values

Time series data often contains missing values. Let's see how to handle them:

python
# Create a copy of our time series with some missing values
ts_with_missing = time_series.copy()
# Randomly set 5% of the values to NaN
random_indices = np.random.choice(len(ts_with_missing), size=int(len(ts_with_missing) * 0.05), replace=False)
ts_with_missing.iloc[random_indices] = np.nan

print(f"Number of missing values: {ts_with_missing.isna().sum()}")

# Fill missing values using forward fill; bfill() covers a gap at the very start
# (fillna(method='ffill') is deprecated in recent pandas versions)
ts_filled = ts_with_missing.ffill().bfill()

# Now we can decompose as before
decomposition_filled = seasonal_decompose(ts_filled, model='additive', period=365)

# Plot to verify the results
plt.figure(figsize=(12, 6))
plt.plot(time_series, label='Original Data')
plt.plot(ts_filled[ts_with_missing.isna()], 'r.', label='Filled-in Values (originally missing)', alpha=0.5)
plt.plot(ts_filled, 'g--', label='Filled Data')
plt.legend()
plt.title('Handling Missing Values in Time Series Data')
plt.show()

Practical Applications of Seasonal Decomposition

  1. Forecasting: Removing seasonality can improve forecasting models.
  2. Anomaly Detection: Unexpectedly large residuals can indicate unusual events.
  3. Seasonal Adjustment: Removing seasonality helps compare values across different time periods.
  4. Understanding Business Patterns: Identifying predictable cycles helps with planning.

Let's demonstrate an example of anomaly detection:

python
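# Create a longer series (5 yearly cycles) to hold the anomaly. With only two
# cycles, almost every position in the cycle has a single detrended observation,
# so a spike would be absorbed into the seasonal component instead of the residuals.
long_dates = pd.date_range(start='2018-01-01', periods=5 * 365, freq='D')
n = len(long_dates)
long_values = (np.linspace(10, 30, n)
               + 5 * np.sin(np.arange(n) * (2 * np.pi / 365))
               + np.random.normal(0, 1, n))
anomaly_ts = pd.Series(long_values, index=long_dates)
# Add an anomaly at a specific date
anomaly_date = pd.Timestamp('2020-07-15')
anomaly_ts[anomaly_date] += 15 # Add a spike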

# Decompose the series
anomaly_decomp = seasonal_decompose(anomaly_ts, model='additive', period=365)

# Check if we can detect the anomaly in the residuals
plt.figure(figsize=(12, 6))
plt.plot(anomaly_decomp.resid)
plt.axhline(y=3*anomaly_decomp.resid.std(), color='r', linestyle='--', label='3σ Threshold')
plt.axhline(y=-3*anomaly_decomp.resid.std(), color='r', linestyle='--')
plt.scatter(anomaly_date, anomaly_decomp.resid[anomaly_date], color='red', s=100, label='Anomaly')
plt.title('Anomaly Detection using Residuals')
plt.legend()
plt.grid(True)
plt.show()

print(f"Residual value at anomaly date: {anomaly_decomp.resid[anomaly_date]}")
print(f"3-sigma threshold: {3*anomaly_decomp.resid.std()}")

Advanced Tips for Better Decomposition

  1. Choosing the Right Period: Selecting the correct period is crucial for accurate decomposition.
  2. Handling Trend Changes: Consider using piecewise decomposition for data with trend shifts.
  3. Filtering: Pre-filtering data can sometimes improve decomposition results.
  4. Alternative Methods: For complex time series, consider STL decomposition (Seasonal and Trend decomposition using LOESS).

Example of using STL decomposition:

python
from statsmodels.tsa.seasonal import STL

stl = STL(time_series, period=365)
result = stl.fit()

# Plot the results
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(12, 10))
ax1.plot(result.observed)
ax1.set_title('Original Series')
ax2.plot(result.trend)
ax2.set_title('Trend (STL)')
ax3.plot(result.seasonal)
ax3.set_title('Seasonal (STL)')
ax4.plot(result.resid)
ax4.set_title('Residual (STL)')
plt.tight_layout()
plt.show()
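For the first tip, choosing the period, one rough check when the cycle length is not known in advance is to look for the lag with the highest autocorrelation. The sketch below applies pandas' Series.autocorr to a simulated monthly series. The data, the seed, and the candidate range of 2 to 18 lags are assumptions for illustration; the range stops before 24 so that the second multiple of the true period cannot compete:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
dates = pd.date_range("2010-01-01", periods=120, freq="MS")
months = np.arange(len(dates))
temp = pd.Series(
    20
    + 10 * np.sin(months * 2 * np.pi / 12)
    + np.linspace(0, 3, len(dates))
    + rng.normal(0, 2, len(dates)),
    index=dates,
)

# Correlation between the series and a copy of itself shifted by each candidate lag
candidate_lags = range(2, 19)
autocorr = pd.Series({lag: temp.autocorr(lag=lag) for lag in candidate_lags})

best_lag = int(autocorr.idxmax())
print(autocorr.round(2))
print(f"Lag with the highest autocorrelation: {best_lag}")
```

This works here because the trend is small compared with the seasonal swing. A strong trend raises the autocorrelation at every lag, so detrending or differencing first is usually advisable, and domain knowledge (12 for monthly data with a yearly cycle, 7 for daily data with a weekly cycle) remains the most reliable guide.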

Summary

Seasonal decomposition is a powerful technique for understanding time series data by breaking it down into its fundamental components:

  • Trend: The long-term progression of the data
  • Seasonality: Regular, recurring patterns
  • Residual: Random variations and noise

Using the seasonal_decompose function from statsmodels, we can easily perform this analysis in Python. The key benefits include:

  • Better understanding of underlying patterns
  • Improved forecasting by handling each component separately
  • Anomaly detection using residual analysis
  • Ability to remove seasonal effects for clearer trend analysis

Remember to choose between additive and multiplicative models based on whether seasonal variations are consistent (additive) or proportional to the trend level (multiplicative).

Exercises

  1. Download a dataset of monthly retail sales and decompose it using both additive and multiplicative models. Which model seems more appropriate and why?

  2. Create a synthetic time series with a complex seasonal pattern (e.g., both weekly and yearly cycles). Can seasonal decomposition still identify these patterns?

  3. Implement an anomaly detection system using seasonal decomposition that flags data points whose residuals exceed a certain threshold.

  4. Compare the results of seasonal decomposition and STL decomposition on a real-world dataset. What are the advantages and disadvantages of each method?

  5. Choose a time series with missing values. Compare different imputation methods (mean, median, forward-fill, etc.) and assess how they affect the decomposition results.


