Pandas Time Shifting
Time shifting is a powerful feature in Pandas that allows you to move data values forward or backward in time. This technique is essential for time series analysis, particularly when working with financial data, forecasting, or comparing current values with past or future periods.
Introduction to Time Shifting
Time shifting refers to the process of moving data points along the time axis. In practical terms, this means accessing values from previous or future time periods relative to each current observation. Common applications include:
- Calculating period-over-period changes
- Creating lag variables for forecasting models
- Analyzing leading or trailing indicators
- Computing moving comparisons (e.g., year-over-year growth)
Pandas provides several powerful functions for time shifting, with shift() being the most commonly used.
Basic Time Shifting with shift()
The shift() function moves values by a specified number of periods. Positive values shift data forward in time (creating lagged values), while negative values shift data backward (creating lead values).
Let's see a basic example:
import pandas as pd
import numpy as np
# Create a simple time series
dates = pd.date_range('2023-01-01', periods=5, freq='D')
data = pd.Series(range(1, 6), index=dates)
print("Original data:")
print(data)
# Shift forward by 1 period (lag)
print("\nShifted forward by 1 (lag):")
print(data.shift(1))
# Shift backward by 1 period (lead)
print("\nShifted backward by 1 (lead):")
print(data.shift(-1))
Output:
Original data:
2023-01-01 1
2023-01-02 2
2023-01-03 3
2023-01-04 4
2023-01-05 5
Freq: D, dtype: int64
Shifted forward by 1 (lag):
2023-01-01 NaN
2023-01-02 1.0
2023-01-03 2.0
2023-01-04 3.0
2023-01-05 4.0
Freq: D, dtype: float64
Shifted backward by 1 (lead):
2023-01-01 2.0
2023-01-02 3.0
2023-01-03 4.0
2023-01-04 5.0
2023-01-05 NaN
Freq: D, dtype: float64
Notice that when we shift data, NaN values are introduced at the beginning (for forward shifts) or at the end (for backward shifts) of the series because those values don't exist in the original data.
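The output above also shows the dtype changing from int64 to float64, because NaN is a floating-point value. The following is a minimal sketch on the same five-day series (the variable names lagged and manual_change are illustrative) showing that change, and that subtracting a shifted series gives the same result as diff():

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=5, freq='D')
data = pd.Series(range(1, 6), index=dates)

lagged = data.shift(1)
print(data.dtype, "->", lagged.dtype)

# Subtracting the lagged series gives the period-over-period change,
# which is what Series.diff() computes.
manual_change = data - lagged
print(manual_change.equals(data.diff()))

# Drop the edge row whose lagged value does not exist
print(manual_change.dropna())
```

Dropping the edge rows is usually safer than treating the NaN as a real observation.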
Advanced shift() Parameters
The shift() function accepts several parameters to customize the shifting behavior:
Frequency-Based Shifting
You can shift by a specific time frequency instead of by index position:
# Create a time series with irregular dates
dates = pd.DatetimeIndex(['2023-01-01', '2023-01-03', '2023-01-06', '2023-01-10'])
irregular_data = pd.Series([10, 20, 30, 40], index=dates)
print("Original irregular data:")
print(irregular_data)
# Shift by 2 days (not by index position)
print("\nShifted by 2 days (freq='2D'):")
print(irregular_data.shift(periods=1, freq='2D'))
Output:
Original irregular data:
2023-01-01 10
2023-01-03 20
2023-01-06 30
2023-01-10 40
dtype: int64
Shifted by 2 days (freq='2D'):
2023-01-03 10
2023-01-05 20
2023-01-08 30
2023-01-12 40
dtype: int64
Notice how the dates themselves are shifted by 2 days, but the values remain associated with their original observations.
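To make the contrast with positional shifting concrete, here is a small sketch on the same irregular series. The names by_position and by_frequency are illustrative; the calls are the standard shift() with and without freq.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2023-01-01', '2023-01-03', '2023-01-06', '2023-01-10'])
irregular_data = pd.Series([10, 20, 30, 40], index=dates)

by_position = irregular_data.shift(1)               # previous row, whatever the gap in days
by_frequency = irregular_data.shift(1, freq='2D')   # timestamps moved forward by 2 days

print(by_position.isna().sum())    # a NaN appears at the start
print(by_frequency.isna().sum())   # no NaN: every value is kept

# One step of '2D' moves the index as far as two steps of 'D'
print(by_frequency.equals(irregular_data.shift(2, freq='D')))
```

On irregular data, a positional shift pairs each row with the previous observation even when the gap between them varies, so a frequency-based shift is often the better fit when the calendar distance matters.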
Fill Value
You can specify a fill value to replace NaN values created during shifting:
print("Shift with custom fill value:")
print(data.shift(1, fill_value=0))
Output:
Shift with custom fill value:
2023-01-01 0
2023-01-02 1
2023-01-03 2
2023-01-04 3
2023-01-05 4
Freq: D, dtype: int64
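Because no NaN is introduced, fill_value also keeps the original integer dtype. The sketch below compares it with one possible alternative, back-filling the gap after the shift. Which option is appropriate depends on the analysis, and the variable names are illustrative.

```python
import pandas as pd

data = pd.Series(range(1, 6),
                 index=pd.date_range('2023-01-01', periods=5, freq='D'))

filled_zero = data.shift(1, fill_value=0)   # integer dtype is preserved
filled_first = data.shift(1).bfill()        # float dtype; first row copies the next value

print(filled_zero.dtype, filled_first.dtype)
print(filled_first)
```

Keep in mind that a filled value such as 0 is indistinguishable from a real observation in later calculations, for example when computing percentage changes.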
Time Shifting in DataFrames
The shift() function works on DataFrames as well, shifting all columns by default:
# Create a DataFrame with time series data
df = pd.DataFrame({
'Sales': [100, 150, 200, 250, 300],
'Costs': [80, 100, 120, 150, 190]
}, index=pd.date_range('2023-01-01', periods=5, freq='D'))
print("Original DataFrame:")
print(df)
print("\nShifted DataFrame (1 period):")
print(df.shift(1))
Output:
Original DataFrame:
Sales Costs
2023-01-01 100 80
2023-01-02 150 100
2023-01-03 200 120
2023-01-04 250 150
2023-01-05 300 190
Shifted DataFrame (1 period):
Sales Costs
2023-01-01 NaN NaN
2023-01-02 100.0 80.0
2023-01-03 150.0 100.0
2023-01-04 200.0 120.0
2023-01-05 250.0 150.0
Shifting Specific Columns
You can shift only specific columns of a DataFrame:
# Shift only the Sales column
df_shifted = df.copy()
df_shifted['Sales_Lag1'] = df['Sales'].shift(1)
print("\nDataFrame with lagged Sales column:")
print(df_shifted)
Output:
DataFrame with lagged Sales column:
Sales Costs Sales_Lag1
2023-01-01 100 80 NaN
2023-01-02 150 100 100.0
2023-01-03 200 120 150.0
2023-01-04 250 150 200.0
2023-01-05 300 190 250.0
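When a DataFrame stacks several entities (stores, tickers, sensors) in one column, a plain shift() would leak the last value of one entity into the first row of the next. A hedged sketch of one way to avoid that uses groupby() before shifting. The Store column and its values are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical long-format data: two stores, three days each
panel = pd.DataFrame({
    'Store': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'] * 2),
    'Sales': [100, 150, 200, 10, 20, 30],
}).sort_values(['Store', 'Date'])

# Shift within each store so lags never cross store boundaries
panel['Sales_Lag1'] = panel.groupby('Store')['Sales'].shift(1)
print(panel)
```

Each store starts with its own NaN, which is the expected result for a per-entity lag.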
Time Shifting vs. Index Shifting
It's important to understand the difference between:
- shift(periods) - Shifts the values but keeps the same index
- shift(periods, freq=...) - Shifts the index but keeps the same values (older Pandas versions exposed this as tshift())
Let's see the difference:
ts = pd.Series(range(3), index=pd.date_range('2023-01-01', periods=3, freq='D'))
print("Original:")
print(ts)
print("\nUsing shift() - values shift, index remains the same:")
print(ts.shift(1))
# tshift() was removed in Pandas 2.0; shift() with freq is the replacement
print("\nUsing shift(freq='D') - index shifts, values remain the same:")
print(ts.shift(1, freq='D'))
Output:
Original:
2023-01-01 0
2023-01-02 1
2023-01-03 2
Freq: D, dtype: int64
Using shift() - values shift, index remains the same:
2023-01-01 NaN
2023-01-02 0.0
2023-01-03 1.0
Freq: D, dtype: float64
Using shift(freq='D') - index shifts, values remain the same:
2023-01-02 0
2023-01-03 1
2023-01-04 2
Freq: D, dtype: int64
Note: tshift() was deprecated in Pandas 1.1 and removed in Pandas 2.0. Use shift(periods, freq=...) instead; on older versions, ts.tshift(1) gives the same result as ts.shift(1, freq='D') here.
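The two kinds of shift are closely related. As a small sketch (variable names are illustrative), realigning an index-shifted series to the original dates reproduces the value-shifted series on a regular daily index:

```python
import pandas as pd

ts = pd.Series(range(3), index=pd.date_range('2023-01-01', periods=3, freq='D'))

value_shift = ts.shift(1)             # same index, values moved down one row
index_shift = ts.shift(1, freq='D')   # same values, index moved forward one day

# Look up each original date in the index-shifted series;
# dates with no match (here the first one) become NaN.
realigned = index_shift.reindex(ts.index)
print(realigned.equals(value_shift))
```

This equivalence relies on the index being regular. With gaps in the dates, the two approaches give different results, because the positional shift ignores how much time separates neighbouring rows.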
Practical Applications
Let's explore some common real-world applications of time shifting:
Calculating Day-Over-Day Changes
Time shifting is extremely useful for calculating changes over time periods:
# Daily stock prices
stock_prices = pd.Series([100, 101, 103, 99, 105, 102, 107],
index=pd.date_range('2023-07-01', periods=7, freq='B'))
print("Stock Prices:")
print(stock_prices)
# Calculate day-over-day change in a DataFrame (a Series cannot hold extra columns)
price_df = stock_prices.to_frame(name='Price')
price_df['Previous_Day'] = price_df['Price'].shift(1)
price_df['Daily_Change'] = price_df['Price'] - price_df['Previous_Day']
price_df['Percent_Change'] = (price_df['Daily_Change'] / price_df['Previous_Day'] * 100).round(2)
print("\nWith Day-over-Day Changes:")
print(price_df)
Output:
Stock Prices:
2023-07-03 100
2023-07-04 101
2023-07-05 103
2023-07-06 99
2023-07-07 105
2023-07-10 102
2023-07-11 107
Freq: B, dtype: int64
With Day-over-Day Changes:
Price Previous_Day Daily_Change Percent_Change
2023-07-03 100 NaN NaN NaN
2023-07-04 101 100.0 1.0 1.00
2023-07-05 103 101.0 2.0 1.98
2023-07-06 99 103.0 -4.0 -3.88
2023-07-07 105 99.0 6.0 6.06
2023-07-10 102 105.0 -3.0 -2.86
2023-07-11 107 102.0 5.0 4.90
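Pandas also provides shortcuts for these two calculations: diff() for the absolute change and pct_change() for the relative change. A brief sketch on the same prices (the result names are illustrative):

```python
import pandas as pd

stock_prices = pd.Series([100, 101, 103, 99, 105, 102, 107],
                         index=pd.date_range('2023-07-01', periods=7, freq='B'))

daily_change = stock_prices.diff()                            # price minus previous price
percent_change = (stock_prices.pct_change() * 100).round(2)   # relative change in percent

print(pd.DataFrame({'Daily_Change': daily_change,
                    'Percent_Change': percent_change}))
```

Writing the calculation with shift() by hand is still useful when you need the previous value itself as a column, or when the comparison is less standard than a simple difference.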
Creating Features for Time Series Forecasting
Time shifting is essential when creating lag features for forecasting models:
# Monthly sales data
monthly_sales = pd.Series([10000, 12000, 9800, 11500, 14000, 16300, 15200, 16700, 18500, 19200, 18700, 22300],
index=pd.date_range('2023-01-01', periods=12, freq='MS'))
# Create a dataframe with lagged features
sales_df = pd.DataFrame({'Sales': monthly_sales})
sales_df['Sales_Lag1'] = monthly_sales.shift(1) # Previous month
sales_df['Sales_Lag2'] = monthly_sales.shift(2) # Two months ago
sales_df['Sales_Lag12'] = monthly_sales.shift(12) # Same month last year (all NaN here: only 12 months of data)
print("Sales Data with Lag Features:")
print(sales_df.head(7))
Output:
Sales Data with Lag Features:
Sales Sales_Lag1 Sales_Lag2 Sales_Lag12
2023-01-01 10000 NaN NaN NaN
2023-02-01 12000 10000.0 NaN NaN
2023-03-01 9800 12000.0 10000.0 NaN
2023-04-01 11500 9800.0 12000.0 NaN
2023-05-01 14000 11500.0 9800.0 NaN
2023-06-01 16300 14000.0 11500.0 NaN
2023-07-01 15200 16300.0 14000.0 NaN
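Before lag features can be passed to a model, rows with incomplete lags normally have to be removed. The sketch below builds the lag columns in a loop and drops those rows. The choice of lags and the Target column name are assumptions made for illustration.

```python
import pandas as pd

monthly_sales = pd.Series(
    [10000, 12000, 9800, 11500, 14000, 16300, 15200, 16700, 18500, 19200, 18700, 22300],
    index=pd.date_range('2023-01-01', periods=12, freq='MS'))

lags = [1, 2, 3]  # illustrative choice of lags
features = pd.DataFrame({f'Sales_Lag{k}': monthly_sales.shift(k) for k in lags})
features['Target'] = monthly_sales

# Rows without a full set of lags cannot be used for fitting
model_data = features.dropna()
print(model_data.shape)
print(model_data.head(3))
```

The largest lag determines how many rows are lost at the start, so very long lags on short series can leave little or no usable data.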
Year-over-Year Comparison
Time shifting can help analyze seasonal patterns by comparing with the same period in previous years:
# Quarterly revenue data over multiple years
quarters = pd.date_range('2021-01-01', periods=12, freq='Q') # Quarter-end dates; Pandas 2.2+ prefers the alias 'QE'
quarterly_revenue = pd.Series([75, 85, 100, 120, 80, 95, 110, 130, 90, 100, 115, 140], index=quarters)
yoy_comparison = pd.DataFrame({'Revenue': quarterly_revenue})
yoy_comparison['Last_Year'] = quarterly_revenue.shift(4) # Same quarter last year
yoy_comparison['YoY_Change'] = yoy_comparison['Revenue'] - yoy_comparison['Last_Year']
yoy_comparison['YoY_Growth'] = (yoy_comparison['YoY_Change'] / yoy_comparison['Last_Year'] * 100).round(1)
print("Year-over-Year Comparison:")
print(yoy_comparison)
Output:
Year-over-Year Comparison:
Revenue Last_Year YoY_Change YoY_Growth
2021-03-31 75 NaN NaN NaN
2021-06-30 85 NaN NaN NaN
2021-09-30 100 NaN NaN NaN
2021-12-31 120 NaN NaN NaN
2022-03-31 80 75.0 5.0 6.7
2022-06-30 95 85.0 10.0 11.8
2022-09-30 110 100.0 10.0 10.0
2022-12-31 130 120.0 10.0 8.3
2023-03-31 90 80.0 10.0 12.5
2023-06-30 100 95.0 5.0 5.3
2023-09-30 115 110.0 5.0 4.5
2023-12-31 140 130.0 10.0 7.7
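The same growth figures can be obtained in one step with pct_change(periods=4), which compares each quarter with the value four rows earlier. This sketch uses a quarterly PeriodIndex instead of quarter-end dates, purely as an assumption to keep the example independent of date alias changes between Pandas versions.

```python
import pandas as pd

quarters = pd.period_range('2021Q1', periods=12, freq='Q')
quarterly_revenue = pd.Series(
    [75, 85, 100, 120, 80, 95, 110, 130, 90, 100, 115, 140], index=quarters)

# Growth relative to the same quarter one year (4 periods) earlier
yoy_growth = (quarterly_revenue.pct_change(periods=4) * 100).round(1)
print(yoy_growth)
```

The first four quarters are NaN because there is no prior year to compare against.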
Technical Analysis with Shifting
Time shifting is extensively used in financial technical analysis to calculate indicators:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Simulating daily stock price data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=60, freq='B')
prices = pd.Series(np.cumsum(np.random.randn(60)*0.1) + 100, index=dates)
# Calculate simple moving average
def calculate_sma(data, window):
    return data.rolling(window=window).mean()
# Calculate RSI (Relative Strength Index)
def calculate_rsi(data, window=14):
    # Calculate price changes
    delta = data.diff()
    # Separate gains and losses
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    # Calculate average gain and loss
    avg_gain = gain.rolling(window=window).mean()
    avg_loss = loss.rolling(window=window).mean()
    # Calculate RS and RSI
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    return rsi
# Create technical indicators using shifting
tech_df = pd.DataFrame({'Close': prices})
tech_df['SMA_20'] = calculate_sma(prices, 20)
tech_df['RSI_14'] = calculate_rsi(prices)
tech_df['Prev_Close'] = prices.shift(1)
tech_df['Price_Change'] = tech_df['Close'] - tech_df['Prev_Close']
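# MACD line: 12-period EMA minus 26-period EMA (MACD is conventionally based on exponential, not simple, averages)
tech_df['MACD'] = prices.ewm(span=12, adjust=False).mean() - prices.ewm(span=26, adjust=False).mean()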
print("Technical Analysis DataFrame:")
print(tech_df.tail())
# Simple visualization
plt.figure(figsize=(12, 6))
plt.plot(tech_df['Close'], label='Close Price')
plt.plot(tech_df['SMA_20'], label='20-day SMA')
plt.title('Stock Price with 20-day Simple Moving Average')
plt.legend()
plt.tight_layout()
plt.show()
This prints the last rows of the technical analysis DataFrame and plots the closing price together with its 20-day simple moving average.
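A common use of shift() in this setting is detecting crossovers, where today's state is compared with yesterday's. The following is a minimal sketch under the same simulated prices. The names above, was_above, cross_up, and cross_down are illustrative, and the rule is a simple price-versus-SMA comparison, not a recommended trading signal.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=60, freq='B')
prices = pd.Series(np.cumsum(np.random.randn(60) * 0.1) + 100, index=dates)
sma_20 = prices.rolling(window=20).mean()

# Comparisons against NaN are False, so the SMA warm-up period counts as "not above"
above = prices > sma_20
# Yesterday's state; fill_value=False keeps the series boolean
was_above = above.shift(1, fill_value=False)

cross_up = above & ~was_above      # price moved above the SMA today
cross_down = ~above & was_above    # price moved below the SMA today

print("Upward crossings:")
print(prices[cross_up])
print("Downward crossings:")
print(prices[cross_down])
```

Note that the first day with a valid SMA is reported as an upward crossing if the price is already above the average, so in practice you may want to discard signals from the warm-up boundary.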
Summary
Time shifting is a fundamental technique in time series analysis with Pandas. Key takeaways:
- Use shift() to move data values forward (lag) or backward (lead) in time
- Forward shifts (positive periods) are useful for analyzing past influences on current values
- Backward shifts (negative periods) align each row with a future value, which is useful for building prediction targets from current conditions
- When shifting, NaN values are introduced at the edges, which can be replaced with custom fill values
- Time shifting works on both Series and DataFrame objects
- The technique is essential for calculating period-over-period changes and creating lag features for forecasting
Time shifting enables powerful analyses like year-over-year comparisons, technical indicators for financial data, and feature engineering for predictive models.
Exercises
To practice your time shifting skills, try these exercises:
- Load a dataset with daily temperature readings and calculate the 7-day temperature change.
- Create a weekly sales dataset and compute the percentage growth compared to the same week last year.
- Using stock price data, implement a trading strategy that buys when the 5-day moving average crosses above the 20-day moving average.
- Create a lag feature matrix for a time series with lags 1, 7, and 30 to predict future values.
- Implement a seasonal decomposition that compares each month's value to its average over the past three years.
With these techniques, you'll be well-equipped to manipulate time series data and extract valuable insights from temporal patterns.