Skip to main content

Pandas Datetime Indexing

Time series data is everywhere - from stock prices and weather measurements to website traffic and sales figures. Working effectively with time-based data in pandas requires understanding datetime indexing, which is what we'll explore in this tutorial.

What is Datetime Indexing?

Datetime indexing is a powerful pandas feature that lets you use dates and times as the primary index for your DataFrame or Series. This unlocks specialized time series functionality, making it easier to:

  • Select data based on date ranges
  • Resample time series data to different frequencies
  • Perform time-based operations efficiently
  • Handle time zones and calendar-based calculations

Creating a DataFrame with Datetime Index

Let's start by creating a simple DataFrame with a datetime index:

python
import pandas as pd
import numpy as np
from datetime import datetime

# Create a date range as the index
date_rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')

# Create a DataFrame with the date range as index
df = pd.DataFrame(
data=np.random.randn(len(date_rng), 2),
index=date_rng,
columns=['Value 1', 'Value 2']
)

print(df.head())

This outputs:

                Value 1    Value 2
2023-01-01 0.422081 -0.128437
2023-01-02 -0.573528 -0.145933
2023-01-03 0.925940 -1.102207
2023-01-04 0.553123 0.176009
2023-01-05 1.360425 -0.372772

The pd.date_range() function is perfect for creating datetime indices. It accepts various arguments like:

  • start: The starting date
  • end: The ending date
  • periods: Number of periods (instead of specifying end)
  • freq: Frequency string (D for daily, H for hourly, etc.)

Converting Existing Columns to Datetime Index

Often, your data already has date information in a column. You can convert this column to a datetime index:

python
# Create sample data with a date column
data = {
'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
'value': [10, 11, 9, 13, 14]
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert the date column to datetime type
df['date'] = pd.to_datetime(df['date'])

# Set the date column as index
df_indexed = df.set_index('date')

print(df_indexed)

Output:

            value
date
2023-01-01 10
2023-01-02 11
2023-01-03 9
2023-01-04 13
2023-01-05 14

Selecting Data Using Datetime Index

Once you have a datetime index, you can use powerful selection methods:

Selecting by string representation

python
# Select data for a specific date
print(df_indexed.loc['2023-01-03'])

# Select data for a date range
print(df_indexed.loc['2023-01-02':'2023-01-04'])

Output for the specific date:

value    9
Name: 2023-01-03 00:00:00, dtype: int64

Output for the date range:

            value
date
2023-01-02 11
2023-01-03 9
2023-01-04 13

Using datetime objects for selection

python
# Select using a datetime object
start_date = pd.Timestamp('2023-01-02')
end_date = pd.Timestamp('2023-01-04')

print(df_indexed.loc[start_date:end_date])

Output:

            value
date
2023-01-02 11
2023-01-03 9
2023-01-04 13

Datetime Components and Partial String Indexing

Pandas supports partial string indexing, letting you select data based on parts of dates:

python
# Create a longer date range with different frequency
date_rng = pd.date_range(start='2022-01-01', end='2023-12-31', freq='MS') # MS = Month Start

# Create a DataFrame
sales_data = pd.DataFrame(
data=np.random.randint(100, 1000, size=len(date_rng)),
index=date_rng,
columns=['Sales']
)

# Select all data from 2023
print("Sales for 2023:")
print(sales_data.loc['2023'])

# Select data for March across years
print("\nSales for March across all years:")
print(sales_data.loc[sales_data.index.month == 3])

# Select data for first quarter of 2023
print("\nSales for Q1 2023:")
print(sales_data.loc['2023-01':'2023-03'])

Datetime Attributes and Methods

Datetime indices provide useful attributes for analysis and filtering:

python
# Create a DataFrame with datetime index
date_rng = pd.date_range(start='2023-01-01', periods=365, freq='D')
daily_data = pd.DataFrame(
data=np.random.randn(len(date_rng)),
index=date_rng,
columns=['Value']
)

# Get day of week (0 is Monday, 6 is Sunday)
daily_data['day_of_week'] = daily_data.index.dayofweek

# Is this a weekend?
daily_data['is_weekend'] = daily_data['day_of_week'] > 4

# Month and quarter
daily_data['month'] = daily_data.index.month
daily_data['quarter'] = daily_data.index.quarter

print(daily_data.head())

# Filter for weekends only
weekend_data = daily_data[daily_data['is_weekend']]
print(f"\nNumber of weekend days: {len(weekend_data)}")

# Average value by month
monthly_avg = daily_data.groupby('month')['Value'].mean()
print("\nMonthly averages:")
print(monthly_avg)

Real-world Example: Analyzing Stock Data

Let's use datetime indexing to analyze stock price data:

python
import pandas as pd
import matplotlib.pyplot as plt

# Sample stock data (in practice, you might use yfinance or another data source)
data = {
'Date': [
'2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10',
'2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14', '2023-01-15',
'2023-01-16', '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20'
],
'Close': [
150.5, 152.3, 151.8, 153.2, 155.7,
156.9, 157.2, 156.3, 158.1, 157.5,
159.8, 160.2, 161.5, 162.3, 160.8,
159.7, 163.2, 165.1, 164.7, 166.2
],
'Volume': [
1200000, 1350000, 980000, 1150000, 1430000,
1320000, 890000, 910000, 1560000, 1670000,
1340000, 1230000, 1450000, 1080000, 970000,
1120000, 1690000, 1740000, 1320000, 1490000
]
}

# Create a DataFrame
stock_data = pd.DataFrame(data)

# Convert Date column to datetime and set as index
stock_data['Date'] = pd.to_datetime(stock_data['Date'])
stock_data = stock_data.set_index('Date')

# Calculate 5-day moving average
stock_data['MA5'] = stock_data['Close'].rolling(window=5).mean()

# Plot the data
plt.figure(figsize=(12, 6))
plt.plot(stock_data.index, stock_data['Close'], label='Close Price')
plt.plot(stock_data.index, stock_data['MA5'], label='5-Day MA', linestyle='--')
plt.title('Stock Price with 5-Day Moving Average')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.tight_layout()

# Identify days with above-average volume
avg_volume = stock_data['Volume'].mean()
high_volume_days = stock_data[stock_data['Volume'] > avg_volume]
print(f"Days with above-average volume:")
print(high_volume_days)

# Analyze weekly performance
stock_data['Weekly_Return'] = stock_data['Close'].pct_change(periods=5)
print("\nWeekly returns:")
print(stock_data['Weekly_Return'].tail())

Advanced Datetime Indexing Features

Using between_time() for Time-of-day Analysis

If your datetime index includes time information, you can select rows based on time ranges:

python
# Create hourly data for a week
hourly_index = pd.date_range(
start='2023-01-01 00:00:00',
end='2023-01-07 23:00:00',
freq='H'
)

hourly_data = pd.DataFrame(
data={'value': np.random.randn(len(hourly_index))},
index=hourly_index
)

# Select data between 9 AM and 5 PM (business hours)
business_hours = hourly_data.between_time('9:00', '17:00')
print(f"Total hours: {len(hourly_data)}")
print(f"Business hours: {len(business_hours)}")

Handling Time Zones

Pandas datetime indices can work with time zones:

python
# Create a time zone aware datetime index
tz_index = pd.date_range(
start='2023-01-01',
periods=5,
freq='D',
tz='UTC'
)

tz_data = pd.DataFrame(
data={'value': range(5)},
index=tz_index
)

print("UTC Time:")
print(tz_data)

# Convert to US Eastern Time
us_eastern = tz_data.tz_convert('US/Eastern')
print("\nUS Eastern Time:")
print(us_eastern)

Summary

Datetime indexing in pandas transforms how you work with time series data by providing:

  • Intuitive slicing and selection based on dates and times
  • Built-in methods for time-based calculations
  • Easy filtering by date components (month, day of week, etc.)
  • Seamless integration with pandas' powerful grouping and resampling operations

By mastering datetime indexing, you can perform complex time series analysis with clean, readable code.

Further Learning

Exercises

  1. Create a DataFrame with daily temperature data for a year and find the average temperature by month.
  2. Download stock price data for your favorite company and analyze how prices differ between days of the week.
  3. Create a dataset with hourly website visits and identify peak usage times.

Additional Resources



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)