
Pandas Time Shifting

Time shifting is a powerful feature in Pandas that allows you to move data values forward or backward in time. This technique is essential for time series analysis, particularly when working with financial data, forecasting, or comparing current values with past or future periods.

Introduction to Time Shifting

Time shifting refers to the process of moving data points along the time axis. In practical terms, this means accessing values from previous or future time periods relative to each current observation. Common applications include:

  • Calculating period-over-period changes
  • Creating lag variables for forecasting models
  • Analyzing leading or trailing indicators
  • Computing moving comparisons (e.g., year-over-year growth)

Pandas provides shift() as its main tool for time shifting. Related helpers such as diff() and pct_change() build on the same idea.

Basic Time Shifting with shift()

The shift() function moves values by a specified number of periods. Positive values shift data forward in time (creating lagged values), while negative values shift data backward (creating lead values).

Let's see a basic example:

python
import pandas as pd
import numpy as np

# Create a simple time series
dates = pd.date_range('2023-01-01', periods=5, freq='D')
data = pd.Series(range(1, 6), index=dates)
print("Original data:")
print(data)

# Shift forward by 1 period (lag)
print("\nShifted forward by 1 (lag):")
print(data.shift(1))

# Shift backward by 1 period (lead)
print("\nShifted backward by 1 (lead):")
print(data.shift(-1))

Output:

Original data:
2023-01-01 1
2023-01-02 2
2023-01-03 3
2023-01-04 4
2023-01-05 5
Freq: D, dtype: int64

Shifted forward by 1 (lag):
2023-01-01 NaN
2023-01-02 1.0
2023-01-03 2.0
2023-01-04 3.0
2023-01-05 4.0
Freq: D, dtype: float64

Shifted backward by 1 (lead):
2023-01-01 2.0
2023-01-02 3.0
2023-01-03 4.0
2023-01-04 5.0
2023-01-05 NaN
Freq: D, dtype: float64

Notice that when we shift data, NaN values are introduced at the beginning (for forward shifts) or at the end (for backward shifts) of the series because those values don't exist in the original data. Because NaN is a floating-point value, the dtype also changes from int64 to float64.
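The edge behaviour described above can be handled in more than one way. The following is a minimal sketch, rebuilding the same data series so it runs on its own, that shows dropping the incomplete row and, as an alternative, using the nullable Int64 dtype to keep integer values. Which option suits a given analysis is left as an assumption.

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=5, freq='D')
data = pd.Series(range(1, 6), index=dates)

# A forward shift leaves the first position without a value
lagged = data.shift(1)

# Option 1: drop the row that has no lagged value
trimmed = lagged.dropna()

# Option 2: the nullable integer dtype stores the gap as <NA>
# instead of forcing the whole series to float
nullable_lagged = data.astype('Int64').shift(1)

print(lagged.dtype)           # float dtype after the NaN is introduced
print(len(trimmed))           # one row fewer than the original
print(nullable_lagged.dtype)  # still an integer dtype
```

Dropping rows is usually the simpler choice before model fitting, while the nullable dtype is convenient when the shifted column must stay integer-valued.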

Advanced shift() Parameters

The shift() function accepts several parameters to customize the shifting behavior:

Frequency-Based Shifting

You can shift by a specific time frequency instead of by index position:

python
# Create a time series with irregular dates
dates = pd.DatetimeIndex(['2023-01-01', '2023-01-03', '2023-01-06', '2023-01-10'])
irregular_data = pd.Series([10, 20, 30, 40], index=dates)

print("Original irregular data:")
print(irregular_data)

# Shift by 2 days (not by index position)
print("\nShifted by 2 days (freq='2D'):")
print(irregular_data.shift(periods=1, freq='2D'))

Output:

Original irregular data:
2023-01-01 10
2023-01-03 20
2023-01-06 30
2023-01-10 40
dtype: int64

Shifted by 2 days (freq='2D'):
2023-01-03 10
2023-01-05 20
2023-01-08 30
2023-01-12 40
dtype: int64

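Notice how the dates themselves are shifted by 2 days, but the values remain associated with their original observations. Because only the index moves, no NaN values are introduced. The total offset is periods multiplied by freq, so periods=2 with freq='2D' would move the dates by 4 days.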
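To make the contrast with positional shifting concrete, here is a small sketch on the same irregular series. It assumes that passing a pd.Timedelta as freq is acceptable for your pandas version (the freq parameter is documented as accepting offsets, timedeltas, or strings); the variable names are illustrative only.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2023-01-01', '2023-01-03', '2023-01-06', '2023-01-10'])
irregular_data = pd.Series([10, 20, 30, 40], index=dates)

# Positional shift: each row receives the previous observation,
# regardless of how many days separate the two timestamps
previous_obs = irregular_data.shift(1)

# Frequency shift: timestamps move, values stay attached, no NaN is created
two_days_later = irregular_data.shift(freq=pd.Timedelta(days=2))

# A negative period moves the timestamps earlier instead
two_days_earlier = irregular_data.shift(-1, freq='2D')

print(previous_obs)
print(two_days_later)
print(two_days_earlier)
```

On irregular data the positional version answers "what was the last reading?", while the frequency version answers "where does this reading land if moved by a fixed span of time?".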

Fill Value

You can specify a fill value to replace NaN values created during shifting:

python
print("Shift with custom fill value:")
print(data.shift(1, fill_value=0))

Output:

Shift with custom fill value:
2023-01-01 0
2023-01-02 1
2023-01-03 2
2023-01-04 3
2023-01-05 4
Freq: D, dtype: int64
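The choice of fill value changes what downstream calculations mean. The sketch below, which rebuilds the same data series, compares filling with 0 against filling with the first observation when computing a day-over-day change. Neither is presented as the correct default; the right value depends on the analysis.

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=5, freq='D')
data = pd.Series(range(1, 6), index=dates)

# Filling with 0 makes the first "change" equal to the first value itself
change_zero_fill = data - data.shift(1, fill_value=0)

# Filling with the first observation makes the first change 0 instead
first_value = data.iloc[0]
change_first_fill = data - data.shift(1, fill_value=first_value)

print(change_zero_fill)
print(change_first_fill)
```

If neither interpretation is meaningful for the first row, leaving the NaN in place and dropping that row later is often the safer option.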

Time Shifting in DataFrames

The shift() function works on DataFrames as well, shifting all columns by default:

python
# Create a DataFrame with time series data
df = pd.DataFrame({
    'Sales': [100, 150, 200, 250, 300],
    'Costs': [80, 100, 120, 150, 190]
}, index=pd.date_range('2023-01-01', periods=5, freq='D'))

print("Original DataFrame:")
print(df)

print("\nShifted DataFrame (1 period):")
print(df.shift(1))

Output:

Original DataFrame:
            Sales  Costs
2023-01-01    100     80
2023-01-02    150    100
2023-01-03    200    120
2023-01-04    250    150
2023-01-05    300    190

Shifted DataFrame (1 period):
            Sales  Costs
2023-01-01    NaN    NaN
2023-01-02  100.0   80.0
2023-01-03  150.0  100.0
2023-01-04  200.0  120.0
2023-01-05  250.0  150.0

Shifting Specific Columns

You can shift only specific columns of a DataFrame:

python
# Shift only the Sales column
df_shifted = df.copy()
df_shifted['Sales_Lag1'] = df['Sales'].shift(1)
print("\nDataFrame with lagged Sales column:")
print(df_shifted)

Output:

DataFrame with lagged Sales column:
            Sales  Costs  Sales_Lag1
2023-01-01    100     80         NaN
2023-01-02    150    100       100.0
2023-01-03    200    120       150.0
2023-01-04    250    150       200.0
2023-01-05    300    190       250.0
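Column-wise lagging needs extra care when one DataFrame stacks several entities on top of each other. The following sketch uses a hypothetical panel of two stores (the store names and figures are invented for illustration) and shows how a plain shift() leaks the last value of one store into the first row of the next, whereas groupby(...).shift() keeps each store separate.

```python
import pandas as pd

# Hypothetical panel data: two stores, three days each
panel = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'] * 2),
    'sales': [100, 150, 200, 10, 20, 30],
})

# Plain shift ignores store boundaries
panel['naive_lag'] = panel['sales'].shift(1)

# Grouped shift restarts the lag for every store
panel['store_lag'] = panel.groupby('store')['sales'].shift(1)

print(panel)
```

In the printed frame, the first row of store B carries store A's last value in naive_lag but is missing in store_lag, which is the behaviour a per-store lag feature normally needs. The rows are assumed to be sorted by date within each store.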

Time Shifting vs. Index Shifting

It's important to understand the difference between:

  1. shift() - Shifts the values but keeps the same index
  2. shift() with freq (formerly tshift()) - Shifts the index but keeps the same values

Let's see the difference:

python
ts = pd.Series(range(3), index=pd.date_range('2023-01-01', periods=3, freq='D'))
print("Original:")
print(ts)

print("\nUsing shift() - values shift, index remains the same:")
print(ts.shift(1))

# tshift() was removed in pandas 2.0; shift(freq=...) gives the same result
print("\nUsing shift(freq='D') - index shifts, values remain the same:")
print(ts.shift(1, freq='D'))

Output:

Original:
2023-01-01 0
2023-01-02 1
2023-01-03 2
Freq: D, dtype: int64

Using shift() - values shift, index remains the same:
2023-01-01 NaN
2023-01-02 0.0
2023-01-03 1.0
Freq: D, dtype: float64

Using shift(freq='D') - index shifts, values remain the same:
2023-01-02 0
2023-01-03 1
2023-01-04 2
Freq: D, dtype: int64

Note: tshift() was deprecated in pandas 1.1 and removed in pandas 2.0. On older versions, ts.tshift(1) produces the same result as ts.shift(1, freq='D').

Practical Applications

Let's explore some common real-world applications of time shifting:

Calculating Day-Over-Day Changes

Time shifting is extremely useful for calculating changes over time periods:

python
# Daily stock prices
stock_prices = pd.Series([100, 101, 103, 99, 105, 102, 107],
                         index=pd.date_range('2023-07-01', periods=7, freq='B'))

print("Stock Prices:")
print(stock_prices)

# Calculate day-over-day change in a DataFrame
# (a Series cannot hold the extra columns, so wrap the prices first)
changes = pd.DataFrame({'Price': stock_prices})
changes['Previous_Day'] = changes['Price'].shift(1)
changes['Daily_Change'] = changes['Price'] - changes['Previous_Day']
changes['Percent_Change'] = (changes['Daily_Change'] / changes['Previous_Day'] * 100).round(2)

print("\nWith Day-over-Day Changes:")
print(changes)

Output:

Stock Prices:
2023-07-03 100
2023-07-04 101
2023-07-05 103
2023-07-06 99
2023-07-07 105
2023-07-10 102
2023-07-11 107
Freq: B, dtype: int64

With Day-over-Day Changes:
            Price  Previous_Day  Daily_Change  Percent_Change
2023-07-03    100           NaN           NaN             NaN
2023-07-04    101         100.0           1.0            1.00
2023-07-05    103         101.0           2.0            1.98
2023-07-06     99         103.0          -4.0           -3.88
2023-07-07    105          99.0           6.0            6.06
2023-07-10    102         105.0          -3.0           -2.86
2023-07-11    107         102.0           5.0            4.90
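The shift-based arithmetic above has built-in shortcuts. As a quick check, this sketch rebuilds the same price series and compares the manual calculation with diff() and pct_change(), which are existing pandas methods. The variable names are illustrative.

```python
import pandas as pd

stock_prices = pd.Series([100, 101, 103, 99, 105, 102, 107],
                         index=pd.date_range('2023-07-01', periods=7, freq='B'))

# Manual versions built on shift()
manual_change = stock_prices - stock_prices.shift(1)
manual_pct = manual_change / stock_prices.shift(1) * 100

# Built-in equivalents
builtin_change = stock_prices.diff()
builtin_pct = stock_prices.pct_change() * 100

# Both pairs agree up to floating-point rounding
pd.testing.assert_series_equal(manual_change, builtin_change)
pd.testing.assert_series_equal(manual_pct, builtin_pct)
print(builtin_pct.round(2))
```

The explicit shift() form remains useful when the previous value itself is needed as a column, for example as a model input.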

Creating Features for Time Series Forecasting

Time shifting is essential when creating lag features for forecasting models:

python
# Monthly sales data
monthly_sales = pd.Series([10000, 12000, 9800, 11500, 14000, 16300, 15200, 16700, 18500, 19200, 18700, 22300],
                          index=pd.date_range('2023-01-01', periods=12, freq='MS'))

# Create a dataframe with lagged features
sales_df = pd.DataFrame({'Sales': monthly_sales})
sales_df['Sales_Lag1'] = monthly_sales.shift(1)  # Previous month
sales_df['Sales_Lag2'] = monthly_sales.shift(2)  # Two months ago
# Same month last year (all NaN here, because only 12 months of data exist)
sales_df['Sales_Lag12'] = monthly_sales.shift(12)

print("Sales Data with Lag Features:")
print(sales_df.head(7))

Output:

Sales Data with Lag Features:
            Sales  Sales_Lag1  Sales_Lag2  Sales_Lag12
2023-01-01  10000         NaN         NaN          NaN
2023-02-01  12000     10000.0         NaN          NaN
2023-03-01   9800     12000.0     10000.0          NaN
2023-04-01  11500      9800.0     12000.0          NaN
2023-05-01  14000     11500.0      9800.0          NaN
2023-06-01  16300     14000.0     11500.0          NaN
2023-07-01  15200     16300.0     14000.0          NaN
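Lag columns usually serve as model inputs, and a forecasting setup also needs a target. The sketch below is one possible way to assemble both: make_supervised is a hypothetical helper (not a pandas function) that adds the requested lags with positive shifts and a one-step-ahead target with a negative shift, then drops rows that are incomplete at either edge.

```python
import pandas as pd

monthly_sales = pd.Series(
    [10000, 12000, 9800, 11500, 14000, 16300, 15200, 16700, 18500, 19200, 18700, 22300],
    index=pd.date_range('2023-01-01', periods=12, freq='MS'),
)

def make_supervised(series, lags, name='Sales'):
    """Hypothetical helper: lagged inputs plus a one-step-ahead target."""
    frame = pd.DataFrame({name: series})
    for lag in lags:
        # Positive shift: bring an earlier value onto the current row
        frame[f'{name}_Lag{lag}'] = series.shift(lag)
    # Negative shift: bring next month's value onto the current row
    frame['Target_Next'] = series.shift(-1)
    return frame

features = make_supervised(monthly_sales, lags=[1, 2, 3])

# Rows without a full set of lags (start) or without a target (end) are dropped
model_ready = features.dropna()
print(model_ready.head())
```

The target column must never be used as an input feature, since it contains information from the future relative to each row.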

Year-over-Year Comparison

Time shifting can help analyze seasonal patterns by comparing with the same period in previous years:

python
# Quarterly revenue data over multiple years
# ('Q' means quarter-end; pandas 2.2+ prefers the alias 'QE' and warns on 'Q')
quarters = pd.date_range('2021-01-01', periods=12, freq='Q')
quarterly_revenue = pd.Series([75, 85, 100, 120, 80, 95, 110, 130, 90, 100, 115, 140], index=quarters)

yoy_comparison = pd.DataFrame({'Revenue': quarterly_revenue})
yoy_comparison['Last_Year'] = quarterly_revenue.shift(4) # Same quarter last year
yoy_comparison['YoY_Change'] = yoy_comparison['Revenue'] - yoy_comparison['Last_Year']
yoy_comparison['YoY_Growth'] = (yoy_comparison['YoY_Change'] / yoy_comparison['Last_Year'] * 100).round(1)

print("Year-over-Year Comparison:")
print(yoy_comparison)

Output:

Year-over-Year Comparison:
            Revenue  Last_Year  YoY_Change  YoY_Growth
2021-03-31       75        NaN         NaN         NaN
2021-06-30       85        NaN         NaN         NaN
2021-09-30      100        NaN         NaN         NaN
2021-12-31      120        NaN         NaN         NaN
2022-03-31       80       75.0         5.0         6.7
2022-06-30       95       85.0        10.0        11.8
2022-09-30      110      100.0        10.0        10.0
2022-12-31      130      120.0        10.0         8.3
2023-03-31       90       80.0        10.0        12.5
2023-06-30      100       95.0         5.0         5.3
2023-09-30      115      110.0         5.0         4.5
2023-12-31      140      130.0        10.0         7.7
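Shifting by 4 rows equals "one year back" only when no quarter is missing. The following sketch uses a hypothetical revenue series with one quarter deliberately left out, and compares the positional shift with a calendar-based shift using pd.DateOffset(years=1) as the freq argument. The data and variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical quarterly revenue with 2021-09-30 missing
revenue = pd.Series(
    [75, 85, 120, 80, 95, 110, 130],
    index=pd.to_datetime([
        '2021-03-31', '2021-06-30', '2021-12-31',
        '2022-03-31', '2022-06-30', '2022-09-30', '2022-12-31',
    ]),
)

# Positional shift: 4 rows back is no longer one year back
by_position = revenue.shift(4)

# Calendar shift: move every timestamp forward one year,
# then line the result up with the original dates
one_year_later = revenue.shift(freq=pd.DateOffset(years=1))
by_calendar = one_year_later.reindex(revenue.index)

comparison = pd.DataFrame({
    'Revenue': revenue,
    'Last_Year_By_Position': by_position,
    'Last_Year_By_Calendar': by_calendar,
})
print(comparison)
```

For 2022-06-30 the positional version picks up the value from 2021-03-31, while the calendar version correctly finds 2021-06-30 and leaves 2022-09-30 empty because its counterpart does not exist.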

Technical Analysis with Shifting

Time shifting is extensively used in financial technical analysis to calculate indicators:

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Simulating daily stock price data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=60, freq='B')
prices = pd.Series(np.cumsum(np.random.randn(60)*0.1) + 100, index=dates)

# Calculate simple moving average
def calculate_sma(data, window):
    return data.rolling(window=window).mean()

# Calculate RSI (Relative Strength Index)
def calculate_rsi(data, window=14):
    # Calculate price changes
    delta = data.diff()

    # Separate gains and losses
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)

    # Calculate average gain and loss
    avg_gain = gain.rolling(window=window).mean()
    avg_loss = loss.rolling(window=window).mean()

    # Calculate RS and RSI
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))

    return rsi

# Create technical indicators using shifting
tech_df = pd.DataFrame({'Close': prices})
tech_df['SMA_20'] = calculate_sma(prices, 20)
tech_df['RSI_14'] = calculate_rsi(prices)
tech_df['Prev_Close'] = prices.shift(1)
tech_df['Price_Change'] = tech_df['Close'] - tech_df['Prev_Close']
# MACD is conventionally the 12-period EMA minus the 26-period EMA
tech_df['MACD'] = (prices.ewm(span=12, adjust=False).mean()
                   - prices.ewm(span=26, adjust=False).mean())

print("Technical Analysis DataFrame:")
print(tech_df.tail())

# Simple visualization
plt.figure(figsize=(12, 6))
plt.plot(tech_df['Close'], label='Close Price')
plt.plot(tech_df['SMA_20'], label='20-day SMA')
plt.title('Stock Price with 20-day Simple Moving Average')
plt.legend()
plt.tight_layout()
plt.show()

This would generate a technical analysis DataFrame and plot with a 20-day simple moving average.
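Shifting also helps turn an indicator into discrete signals. As a rough sketch, the code below flags the rows where a price moves from at-or-below its moving average to above it (and the reverse) by comparing each row's state with the previous row's state via shift(1). The tiny price series and the 3-period window are assumptions chosen so the result is easy to verify by hand.

```python
import pandas as pd

# Small illustrative price series
close = pd.Series([1, 2, 3, 2, 1, 2, 4, 5], dtype=float)
sma_3 = close.rolling(window=3).mean()

# True where the price is above its moving average
# (rows where the average is still NaN compare as False)
above = close > sma_3

# State on the previous row; the first row has no predecessor, so treat it as False
was_above = above.shift(1, fill_value=False).astype(bool)

# A cross happens where the state differs from the previous row
cross_up = above & ~was_above
cross_down = ~above & was_above

print(pd.DataFrame({'Close': close, 'SMA_3': sma_3,
                    'Cross_Up': cross_up, 'Cross_Down': cross_down}))
```

Note that the first upward cross is flagged on the first row where the moving average becomes available, since the earlier state is unknown rather than truly below. A stricter version could mask those warm-up rows.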

Summary

Time shifting is a fundamental technique in time series analysis with Pandas. Key takeaways:

  1. Use shift() to move data values forward (lag) or backward (lead) in time
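  2. Forward shifts (positive periods) place earlier observations on the current row, which is how lag features and period-over-period changes are built
  3. Backward shifts (negative periods) place later observations on the current row, which is useful for building prediction targets from current conditions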
  4. When shifting, NaN values are introduced at the edges, which can be replaced with custom fill values
  5. Time shifting works on both Series and DataFrame objects
  6. The technique is essential for calculating period-over-period changes and creating lag features for forecasting

Time shifting enables powerful analyses like year-over-year comparisons, technical indicators for financial data, and feature engineering for predictive models.

Exercises

To practice your time shifting skills, try these exercises:

  1. Load a dataset with daily temperature readings and calculate the 7-day temperature change.
  2. Create a weekly sales dataset and compute the percentage growth compared to the same week last year.
  3. Using stock price data, implement a trading strategy that buys when the 5-day moving average crosses above the 20-day moving average.
  4. Create a lag feature matrix for a time series with lags 1, 7, and 30 to predict future values.
  5. Implement a seasonal decomposition that compares each month's value to its average over the past three years.

Additional Resources

With these techniques, you'll be well-equipped to manipulate time series data and extract valuable insights from temporal patterns.


