Pandas Datetime Indexing
Time series data is everywhere - from stock prices and weather measurements to website traffic and sales figures. Working effectively with time-based data in pandas requires understanding datetime indexing, which is what we'll explore in this tutorial.
What is Datetime Indexing?
Datetime indexing is a powerful pandas feature that lets you use dates and times as the primary index for your DataFrame or Series. This unlocks specialized time series functionality, making it easier to:
- Select data based on date ranges
- Resample time series data to different frequencies
- Perform time-based operations efficiently
- Handle time zones and calendar-based calculations
Creating a DataFrame with Datetime Index
Let's start by creating a simple DataFrame with a datetime index:
import pandas as pd
import numpy as np
from datetime import datetime
# Create a date range as the index
date_rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')
# Create a DataFrame with the date range as index
df = pd.DataFrame(
data=np.random.randn(len(date_rng), 2),
index=date_rng,
columns=['Value 1', 'Value 2']
)
print(df.head())
This outputs:
Value 1 Value 2
2023-01-01 0.422081 -0.128437
2023-01-02 -0.573528 -0.145933
2023-01-03 0.925940 -1.102207
2023-01-04 0.553123 0.176009
2023-01-05 1.360425 -0.372772
The pd.date_range()
function is perfect for creating datetime indices. It accepts various arguments like:
start
: The starting dateend
: The ending dateperiods
: Number of periods (instead of specifying end)freq
: Frequency string (D for daily, H for hourly, etc.)
Converting Existing Columns to Datetime Index
Often, your data already has date information in a column. You can convert this column to a datetime index:
# Create sample data with a date column
data = {
'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
'value': [10, 11, 9, 13, 14]
}
# Create DataFrame
df = pd.DataFrame(data)
# Convert the date column to datetime type
df['date'] = pd.to_datetime(df['date'])
# Set the date column as index
df_indexed = df.set_index('date')
print(df_indexed)
Output:
value
date
2023-01-01 10
2023-01-02 11
2023-01-03 9
2023-01-04 13
2023-01-05 14
Selecting Data Using Datetime Index
Once you have a datetime index, you can use powerful selection methods:
Selecting by string representation
# Select data for a specific date
print(df_indexed.loc['2023-01-03'])
# Select data for a date range
print(df_indexed.loc['2023-01-02':'2023-01-04'])
Output for the specific date:
value 9
Name: 2023-01-03 00:00:00, dtype: int64
Output for the date range:
value
date
2023-01-02 11
2023-01-03 9
2023-01-04 13
Using datetime objects for selection
# Select using a datetime object
start_date = pd.Timestamp('2023-01-02')
end_date = pd.Timestamp('2023-01-04')
print(df_indexed.loc[start_date:end_date])
Output:
value
date
2023-01-02 11
2023-01-03 9
2023-01-04 13
Datetime Components and Partial String Indexing
Pandas supports partial string indexing, letting you select data based on parts of dates:
# Create a longer date range with different frequency
date_rng = pd.date_range(start='2022-01-01', end='2023-12-31', freq='MS') # MS = Month Start
# Create a DataFrame
sales_data = pd.DataFrame(
data=np.random.randint(100, 1000, size=len(date_rng)),
index=date_rng,
columns=['Sales']
)
# Select all data from 2023
print("Sales for 2023:")
print(sales_data.loc['2023'])
# Select data for March across years
print("\nSales for March across all years:")
print(sales_data.loc[sales_data.index.month == 3])
# Select data for first quarter of 2023
print("\nSales for Q1 2023:")
print(sales_data.loc['2023-01':'2023-03'])
Datetime Attributes and Methods
Datetime indices provide useful attributes for analysis and filtering:
# Create a DataFrame with datetime index
date_rng = pd.date_range(start='2023-01-01', periods=365, freq='D')
daily_data = pd.DataFrame(
data=np.random.randn(len(date_rng)),
index=date_rng,
columns=['Value']
)
# Get day of week (0 is Monday, 6 is Sunday)
daily_data['day_of_week'] = daily_data.index.dayofweek
# Is this a weekend?
daily_data['is_weekend'] = daily_data['day_of_week'] > 4
# Month and quarter
daily_data['month'] = daily_data.index.month
daily_data['quarter'] = daily_data.index.quarter
print(daily_data.head())
# Filter for weekends only
weekend_data = daily_data[daily_data['is_weekend']]
print(f"\nNumber of weekend days: {len(weekend_data)}")
# Average value by month
monthly_avg = daily_data.groupby('month')['Value'].mean()
print("\nMonthly averages:")
print(monthly_avg)
Real-world Example: Analyzing Stock Data
Let's use datetime indexing to analyze stock price data:
import pandas as pd
import matplotlib.pyplot as plt
# Sample stock data (in practice, you might use yfinance or another data source)
data = {
'Date': [
'2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10',
'2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14', '2023-01-15',
'2023-01-16', '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20'
],
'Close': [
150.5, 152.3, 151.8, 153.2, 155.7,
156.9, 157.2, 156.3, 158.1, 157.5,
159.8, 160.2, 161.5, 162.3, 160.8,
159.7, 163.2, 165.1, 164.7, 166.2
],
'Volume': [
1200000, 1350000, 980000, 1150000, 1430000,
1320000, 890000, 910000, 1560000, 1670000,
1340000, 1230000, 1450000, 1080000, 970000,
1120000, 1690000, 1740000, 1320000, 1490000
]
}
# Create a DataFrame
stock_data = pd.DataFrame(data)
# Convert Date column to datetime and set as index
stock_data['Date'] = pd.to_datetime(stock_data['Date'])
stock_data = stock_data.set_index('Date')
# Calculate 5-day moving average
stock_data['MA5'] = stock_data['Close'].rolling(window=5).mean()
# Plot the data
plt.figure(figsize=(12, 6))
plt.plot(stock_data.index, stock_data['Close'], label='Close Price')
plt.plot(stock_data.index, stock_data['MA5'], label='5-Day MA', linestyle='--')
plt.title('Stock Price with 5-Day Moving Average')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.tight_layout()
# Identify days with above-average volume
avg_volume = stock_data['Volume'].mean()
high_volume_days = stock_data[stock_data['Volume'] > avg_volume]
print(f"Days with above-average volume:")
print(high_volume_days)
# Analyze weekly performance
stock_data['Weekly_Return'] = stock_data['Close'].pct_change(periods=5)
print("\nWeekly returns:")
print(stock_data['Weekly_Return'].tail())
Advanced Datetime Indexing Features
Using between_time()
for Time-of-day Analysis
If your datetime index includes time information, you can select rows based on time ranges:
# Create hourly data for a week
hourly_index = pd.date_range(
start='2023-01-01 00:00:00',
end='2023-01-07 23:00:00',
freq='H'
)
hourly_data = pd.DataFrame(
data={'value': np.random.randn(len(hourly_index))},
index=hourly_index
)
# Select data between 9 AM and 5 PM (business hours)
business_hours = hourly_data.between_time('9:00', '17:00')
print(f"Total hours: {len(hourly_data)}")
print(f"Business hours: {len(business_hours)}")
Handling Time Zones
Pandas datetime indices can work with time zones:
# Create a time zone aware datetime index
tz_index = pd.date_range(
start='2023-01-01',
periods=5,
freq='D',
tz='UTC'
)
tz_data = pd.DataFrame(
data={'value': range(5)},
index=tz_index
)
print("UTC Time:")
print(tz_data)
# Convert to US Eastern Time
us_eastern = tz_data.tz_convert('US/Eastern')
print("\nUS Eastern Time:")
print(us_eastern)
Summary
Datetime indexing in pandas transforms how you work with time series data by providing:
- Intuitive slicing and selection based on dates and times
- Built-in methods for time-based calculations
- Easy filtering by date components (month, day of week, etc.)
- Seamless integration with pandas' powerful grouping and resampling operations
By mastering datetime indexing, you can perform complex time series analysis with clean, readable code.
Further Learning
Exercises
- Create a DataFrame with daily temperature data for a year and find the average temperature by month.
- Download stock price data for your favorite company and analyze how prices differ between days of the week.
- Create a dataset with hourly website visits and identify peak usage times.
Additional Resources
- Pandas Time Series documentation
- Pandas DatetimeIndex API reference
- For real-world time series analysis, explore libraries like Prophet or statsmodels
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)