Pandas Date Ranges
Introduction
When working with time series data in pandas, one of the most fundamental tools you'll need is a way to create sequences of dates and times. Pandas provides a function called date_range() that generates these sequences with control over the start, end, length, and frequency.
Whether you're analyzing stock market data, tracking weather patterns, or processing sensor readings, understanding how to create and manipulate date ranges will be essential to your data analysis workflow.
In this tutorial, we'll explore the capabilities of pd.date_range() and learn how to create various types of date sequences for your time series analysis needs.
Understanding pd.date_range()
The pd.date_range() function returns a DatetimeIndex of evenly spaced timestamps at a specified frequency. It's similar to Python's built-in range() function but designed for datetime values.
Basic Syntax
pd.date_range(start=None, end=None, periods=None, freq=None, **kwargs)
Where:
- start: The starting date/time
- end: The end date/time
- periods: Number of periods to generate
- freq: Frequency string indicating the interval between dates (e.g., 'D' for days, 'M' for month-end)
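One detail the signature does not show: pandas expects exactly three of these four parameters, and when freq is omitted with fewer than three given, it falls back to 'D'. The following is a minimal sketch of both sides of that rule; the variable names are illustrative only.

```python
import pandas as pd

# start + end + periods, no freq: the points are spaced evenly between the ends.
spaced = pd.date_range(start="2023-01-01", end="2023-01-05", periods=3)
print(spaced)  # three timestamps: Jan 1, Jan 3, Jan 5

# Supplying all four parameters is rejected with a ValueError.
try:
    pd.date_range(start="2023-01-01", end="2023-01-05", periods=3, freq="D")
    all_four_ok = True
except ValueError as exc:
    all_four_ok = False
    print("ValueError:", exc)
```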
Creating Basic Date Ranges
Let's start with some simple examples:
Example 1: Date Range With Start and End Dates
import pandas as pd
# Create a date range from January 1, 2023 to January 10, 2023
date_range = pd.date_range(start='2023-01-01', end='2023-01-10')
print(date_range)
Output:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
'2023-01-09', '2023-01-10'],
dtype='datetime64[ns]', freq='D')
By default, the frequency is daily ('D').
Example 2: Date Range With Start and Number of Periods
# Create 5 dates starting from January 1, 2023
date_range = pd.date_range(start='2023-01-01', periods=5)
print(date_range)
Output:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05'],
dtype='datetime64[ns]', freq='D')
Example 3: Date Range With End and Number of Periods
# Create 5 dates ending at January 10, 2023
date_range = pd.date_range(end='2023-01-10', periods=5)
print(date_range)
Output:
DatetimeIndex(['2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09',
'2023-01-10'],
dtype='datetime64[ns]', freq='D')
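The three calling styles above are interchangeable ways of pinning down the same sequence. As a small sketch (variable names are made up for illustration), the same five days can be described in all three ways and compared:

```python
import pandas as pd

by_start_end = pd.date_range(start="2023-01-06", end="2023-01-10")
by_start_periods = pd.date_range(start="2023-01-06", periods=5)
by_end_periods = pd.date_range(end="2023-01-10", periods=5)

# Index.equals compares the two indexes element by element.
same_as_start_periods = by_start_end.equals(by_start_periods)
same_as_end_periods = by_start_end.equals(by_end_periods)
print(same_as_start_periods, same_as_end_periods)
```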
Working with Different Frequencies
One of the most powerful features of date_range()
is the ability to specify different frequencies.
Common Frequency Aliases
Alias | Description |
---|---|
D | Calendar day |
B | Business day |
H | Hourly |
T, min | Minute |
S | Second |
M | Month end |
MS | Month start |
W | Weekly |
Q | Quarter end |
QS | Quarter start |
A, Y | Year end |
AS, YS | Year start |
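To see what an alias actually resolves to, one option is to pass it through to_offset from pandas.tseries.frequencies and generate a few dates with the result. This sketch sticks to aliases whose spelling has stayed the same across pandas versions; the samples dictionary is just a convenience for illustration.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

samples = {}
for alias in ["D", "B", "W", "MS", "QS"]:
    offset = to_offset(alias)  # turn the string into a DateOffset object
    idx = pd.date_range(start="2023-01-01", periods=3, freq=offset)
    samples[alias] = [d.strftime("%Y-%m-%d") for d in idx]
    print(alias, repr(offset), samples[alias])
```

Because 2023-01-01 is a Sunday, the 'B' sample starts on the following Monday, while 'D', 'W', and 'MS' all start on January 1 itself.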
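A note on versions: pandas 2.2 deprecated several of these spellings in favor of 'h' (hourly), 'min' (minute), 's' (second), 'ME' (month end), 'QE' (quarter end), and 'YE' (year end). The examples in this tutorial use the older aliases, so newer versions may emit a FutureWarning or report a different freq string in the output.
Let's see some examples: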
Example 4: Weekly Frequency
# Create weekly dates in January 2023 ('W' anchors on Sundays by default)
weekly_range = pd.date_range(start='2023-01-01', end='2023-01-31', freq='W')
print(weekly_range)
Output:
DatetimeIndex(['2023-01-01', '2023-01-08', '2023-01-15', '2023-01-22',
'2023-01-29'],
dtype='datetime64[ns]', freq='W-SUN')
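The freq='W-SUN' in the output shows that plain 'W' anchors each week on Sunday. If Mondays are what you want, an anchored alias such as 'W-MON' is one way to get them. A short sketch:

```python
import pandas as pd

# Anchor the weekly frequency on Monday instead of the default Sunday.
mondays = pd.date_range(start="2023-01-01", end="2023-01-31", freq="W-MON")
print(mondays)
print(mondays.day_name().unique())
```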
Example 5: Business Days
# Create a range of business days (excluding weekends)
business_days = pd.date_range(start='2023-01-01', periods=10, freq='B')
print(business_days)
Output:
DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06', '2023-01-09', '2023-01-10', '2023-01-11',
'2023-01-12', '2023-01-13'],
dtype='datetime64[ns]', freq='B')
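Notice that the range starts on Monday, January 2, because January 1, 2023 falls on a Sunday, and that the weekend of January 7 and 8 is skipped.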
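The 'B' frequency knows about weekends but not about holidays. If you also need to skip specific dates, pd.offsets.CustomBusinessDay accepts a holidays list. In the sketch below, treating 2023-01-02 as a holiday is an assumption made purely for illustration, not a calendar taken from this tutorial.

```python
import pandas as pd

# Assumption for illustration: Monday 2023-01-02 is a non-trading holiday.
holiday_aware = pd.offsets.CustomBusinessDay(holidays=["2023-01-02"])

# The Sunday start date rolls forward to the first valid business day.
trading_days = pd.date_range(start="2023-01-01", periods=5, freq=holiday_aware)
print(trading_days)
```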
Example 6: Month-End Frequency
# Create a range of month-end dates for the year 2023
month_ends = pd.date_range(start='2023-01-31', periods=12, freq='M')
print(month_ends)
Output:
DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
'2023-05-31', '2023-06-30', '2023-07-31', '2023-08-31',
'2023-09-30', '2023-10-31', '2023-11-30', '2023-12-31'],
dtype='datetime64[ns]', freq='M')
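Month-end frequencies follow the calendar, so leap years are handled for you. The sketch below passes a pd.offsets.MonthEnd() object instead of a string alias, which sidesteps differences in alias spelling between pandas versions; the choice of 2024 is only to show February 29.

```python
import pandas as pd

# 2024 is a leap year, so the February month end is the 29th.
month_ends_2024 = pd.date_range(
    start="2024-01-31", periods=3, freq=pd.offsets.MonthEnd()
)
print(month_ends_2024)
```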
Example 7: Hourly Frequency
# Create hourly timestamps for January 1, 2023
hourly_range = pd.date_range(start='2023-01-01', periods=24, freq='H')
print(hourly_range)
Output:
DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 01:00:00',
'2023-01-01 02:00:00', '2023-01-01 03:00:00',
'2023-01-01 04:00:00', '2023-01-01 05:00:00',
'2023-01-01 06:00:00', '2023-01-01 07:00:00',
'2023-01-01 08:00:00', '2023-01-01 09:00:00',
'2023-01-01 10:00:00', '2023-01-01 11:00:00',
'2023-01-01 12:00:00', '2023-01-01 13:00:00',
'2023-01-01 14:00:00', '2023-01-01 15:00:00',
'2023-01-01 16:00:00', '2023-01-01 17:00:00',
'2023-01-01 18:00:00', '2023-01-01 19:00:00',
'2023-01-01 20:00:00', '2023-01-01 21:00:00',
'2023-01-01 22:00:00', '2023-01-01 23:00:00'],
dtype='datetime64[ns]', freq='H')
Custom Frequency Strings
You can create more complex frequencies by using custom frequency strings:
Example 8: Every 2 Hours
# Create a range with 2-hour intervals
two_hourly = pd.date_range(start='2023-01-01', periods=12, freq='2H')
print(two_hourly)
Output:
DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 02:00:00',
'2023-01-01 04:00:00', '2023-01-01 06:00:00',
'2023-01-01 08:00:00', '2023-01-01 10:00:00',
'2023-01-01 12:00:00', '2023-01-01 14:00:00',
'2023-01-01 16:00:00', '2023-01-01 18:00:00',
'2023-01-01 20:00:00', '2023-01-01 22:00:00'],
dtype='datetime64[ns]', freq='2H')
Example 9: Every 15 Minutes
# Create a range with 15-minute intervals
fifteen_min = pd.date_range(start='2023-01-01 00:00:00', periods=8, freq='15min')
print(fifteen_min)
Output:
DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 00:15:00',
'2023-01-01 00:30:00', '2023-01-01 00:45:00',
'2023-01-01 01:00:00', '2023-01-01 01:15:00',
'2023-01-01 01:30:00', '2023-01-01 01:45:00'],
dtype='datetime64[ns]', freq='15T')
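Besides strings with a numeric multiplier, freq also accepts a pd.Timedelta, which can be handy when the interval is computed at runtime or does not map neatly onto one alias. A minimal sketch with a 90-minute step (the interval is an arbitrary choice for illustration):

```python
import pandas as pd

# A fixed 90-minute step expressed as a Timedelta rather than a string alias.
every_90_min = pd.date_range(
    start="2023-01-01", periods=4, freq=pd.Timedelta(minutes=90)
)
print(every_90_min)
```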
Example 10: Every 3 Months
# Create a range of every 3 months (quarterly)
quarterly = pd.date_range(start='2023-01-01', periods=4, freq='3M')
print(quarterly)
Output:
DatetimeIndex(['2023-01-31', '2023-04-30', '2023-07-31', '2023-10-31'],
dtype='datetime64[ns]', freq='3M')
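Note that "every third month end" is not the same as calendar quarter ends: the start date first rolls forward to the nearest month end (January 31 here), and the step counts from there. The sketch below contrasts the two using offset objects, which avoids alias differences between pandas versions; the variable names are illustrative.

```python
import pandas as pd

# Every third month end, counted from the first month end on or after the start.
every_third_month_end = pd.date_range(
    start="2023-01-01", periods=4, freq=pd.offsets.MonthEnd(3)
)

# Calendar quarter ends for quarters closing in March, June, September, December.
calendar_quarter_ends = pd.date_range(
    start="2023-01-01", periods=4, freq=pd.offsets.QuarterEnd(startingMonth=3)
)

print(every_third_month_end)
print(calendar_quarter_ends)
```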
Practical Applications
Let's explore some real-world applications of date ranges in data analysis.
Application 1: Creating a Time Series DataFrame
import numpy as np
# Create a date range for daily temperatures in January 2023
dates = pd.date_range(start='2023-01-01', end='2023-01-31', freq='D')
# Generate random temperature data (in Celsius)
temperatures = np.random.normal(loc=5, scale=3, size=len(dates))
# Create the DataFrame
temp_df = pd.DataFrame({'date': dates, 'temperature': temperatures})
temp_df.set_index('date', inplace=True)
print(temp_df.head())
Output:
temperature
date
2023-01-01 5.432431
2023-01-02 4.216436
2023-01-03 2.178246
2023-01-04 8.752985
2023-01-05 2.905871
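The temperatures above are random, so your numbers will differ on each run. One possible variation is to seed a NumPy generator for repeatability and build a Series directly on the date range, which also makes label-based slicing by date available. The seed value and variable names here are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2023-01-01", end="2023-01-31", freq="D")

# Seeded generator so repeated runs produce the same values (seed is arbitrary).
rng = np.random.default_rng(seed=0)
temps = pd.Series(
    rng.normal(loc=5, scale=3, size=len(dates)), index=dates, name="temperature"
)

# Slicing a DatetimeIndex by label includes both endpoints.
second_week = temps.loc["2023-01-08":"2023-01-14"]
print(second_week)
```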
Application 2: Resampling Time Series Data
# Create hourly data
hourly_data = pd.date_range(start='2023-01-01', periods=48, freq='H')
values = np.random.normal(loc=10, scale=2, size=len(hourly_data))
hourly_df = pd.DataFrame({'value': values}, index=hourly_data)
# Resample to daily average
daily_avg = hourly_df.resample('D').mean()
print("Hourly data (first 5 rows):")
print(hourly_df.head())
print("\nDaily averages:")
print(daily_avg)
Output:
Hourly data (first 5 rows):
value
2023-01-01 00:00:00 10.523452
2023-01-01 01:00:00 9.632145
2023-01-01 02:00:00 10.987638
2023-01-01 03:00:00 7.125896
2023-01-01 04:00:00 11.367425
Daily averages:
value
2023-01-01 9.854621
2023-01-02 9.721536
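Since 48 hourly timestamps starting at midnight span exactly two calendar days, the daily resample yields two rows. Replacing the random values with a simple counter makes this easy to check by hand; the sketch below is a deterministic variant, not the tutorial's original data.

```python
import numpy as np
import pandas as pd

# 48 hourly timestamps: hours 0-23 fall on Jan 1, hours 24-47 on Jan 2.
hours = pd.date_range(start="2023-01-01", periods=48, freq=pd.Timedelta(hours=1))
counter = pd.Series(np.arange(48), index=hours)

daily_mean = counter.resample("D").mean()
print(daily_mean)  # mean of 0..23 is 11.5, mean of 24..47 is 35.5
```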
Application 3: Creating a Business Day Calendar for Stock Analysis
# Create business days for January 2023
business_days = pd.date_range(start='2023-01-01', end='2023-01-31', freq='B')
# Simulate stock prices
np.random.seed(42) # For reproducibility
initial_price = 100
daily_returns = np.random.normal(loc=0.001, scale=0.02, size=len(business_days))
stock_prices = initial_price * (1 + daily_returns).cumprod()
# Create DataFrame
stock_df = pd.DataFrame({'date': business_days, 'price': stock_prices})
stock_df.set_index('date', inplace=True)
print(stock_df.head())
# Calculate weekly average prices
weekly_avg = stock_df.resample('W').mean()
print("\nWeekly average prices:")
print(weekly_avg)
Output:
price
date
2023-01-02 100.835868
2023-01-03 99.659915
2023-01-04 100.859357
2023-01-05 102.582601
2023-01-06 104.906752
Weekly average prices:
price
date
2023-01-08 101.768898
2023-01-15 105.637448
2023-01-22 103.456743
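2023-01-29 106.892510
Note: January 30 and 31 fall in the week ending Sunday, February 5, so the full result also contains a fifth row labeled 2023-02-05, which is not shown here.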
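When averaging by week, it helps to know how many observations land in each bin, because a partial week carries as much weight in the output as a full one. A rough sketch that counts the business days per weekly bin for January 2023 (the marker Series is a hypothetical stand-in for real prices):

```python
import pandas as pd

business_days = pd.date_range(start="2023-01-01", end="2023-01-31", freq="B")

# One placeholder value per business day, so size() counts days per bin.
marker = pd.Series(1, index=business_days)
days_per_week = marker.resample("W").size()
print(days_per_week)
```

Each bin is labeled with the Sunday that closes the week, which is why the last label falls in February.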
Advanced Date Range Features
Time Zones
You can specify time zones in your date range:
# Create a date range in a specific time zone
nyc_dates = pd.date_range(
start='2023-01-01',
periods=5,
freq='D',
tz='America/New_York'
)
print(nyc_dates)
# Convert to another time zone
london_dates = nyc_dates.tz_convert('Europe/London')
print("\nConverted to London time:")
print(london_dates)
Output:
DatetimeIndex(['2023-01-01 00:00:00-05:00', '2023-01-02 00:00:00-05:00',
'2023-01-03 00:00:00-05:00', '2023-01-04 00:00:00-05:00',
'2023-01-05 00:00:00-05:00'],
dtype='datetime64[ns, America/New_York]', freq='D')
Converted to London time:
DatetimeIndex(['2023-01-01 05:00:00+00:00', '2023-01-02 05:00:00+00:00',
'2023-01-03 05:00:00+00:00', '2023-01-04 05:00:00+00:00',
'2023-01-05 05:00:00+00:00'],
dtype='datetime64[ns, Europe/London]', freq='D')
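If you already have a range without a time zone, tz_localize attaches one, while tz_convert changes the zone in which existing instants are displayed. The sketch below is one way to see the difference; the choice of UTC and Asia/Tokyo is arbitrary.

```python
import pandas as pd

naive = pd.date_range(start="2023-01-01", periods=3, freq="D")

# tz_localize attaches a zone; the wall-clock values stay the same.
utc = naive.tz_localize("UTC")

# tz_convert keeps the same instants but shows them in another zone.
tokyo = utc.tz_convert("Asia/Tokyo")

print(tokyo)
same_instants = bool((utc == tokyo).all())
print(same_instants)
```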
The normalize Parameter
The normalize parameter snaps the start and end to midnight before the range is generated:
# Create a date range without normalizing
dates1 = pd.date_range(start='2023-01-01 10:30:00', periods=5, freq='D')
print("Without normalize:")
print(dates1)
# Create a date range with normalizing
dates2 = pd.date_range(start='2023-01-01 10:30:00', periods=5, freq='D', normalize=True)
print("\nWith normalize (all times set to midnight):")
print(dates2)
Output:
Without normalize:
DatetimeIndex(['2023-01-01 10:30:00', '2023-01-02 10:30:00',
'2023-01-03 10:30:00', '2023-01-04 10:30:00',
'2023-01-05 10:30:00'],
dtype='datetime64[ns]', freq='D')
With normalize (all times set to midnight):
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05'],
dtype='datetime64[ns]', freq='D')
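The same idea is available after the fact: a DatetimeIndex has a normalize() method that resets the time of day to midnight on every element. A brief sketch with made-up timestamps:

```python
import pandas as pd

# Hypothetical event timestamps at irregular times of day.
stamps = pd.DatetimeIndex(
    ["2023-01-01 10:30:00", "2023-01-01 18:45:00", "2023-01-02 07:15:00"]
)

midnights = stamps.normalize()  # keep the date, reset the time to 00:00:00
print(midnights)
print(midnights.unique())       # the distinct calendar days
```

This can be useful for grouping events by calendar day without changing the index's dtype.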
The inclusive Parameter
The inclusive parameter (available since pandas 1.4, where it replaced the older closed parameter) controls whether the start and end points are included in the range:
# Default behavior includes both start and end
dates1 = pd.date_range(start='2023-01-01', end='2023-01-05')
print("Default (inclusive='both'):")
print(dates1)
# Include only the start date
dates2 = pd.date_range(start='2023-01-01', end='2023-01-05', inclusive='left')
print("\nInclusive='left':")
print(dates2)
# Include only the end date
dates3 = pd.date_range(start='2023-01-01', end='2023-01-05', inclusive='right')
print("\nInclusive='right':")
print(dates3)
# Include neither start nor end date
dates4 = pd.date_range(start='2023-01-01', end='2023-01-05', inclusive='neither')
print("\nInclusive='neither':")
print(dates4)
Output:
Default (inclusive='both'):
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05'],
dtype='datetime64[ns]', freq='D')
Inclusive='left':
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
dtype='datetime64[ns]', freq='D')
Inclusive='right':
DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
dtype='datetime64[ns]', freq='D')
Inclusive='neither':
DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04'],
dtype='datetime64[ns]', freq='D')
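A common use for inclusive='left' is building half-open intervals, so that a boundary timestamp such as midnight belongs to only one day. The sketch below generates the start times of six-hour blocks within a single day; it assumes pandas 1.4 or later, and the block length is an arbitrary choice.

```python
import pandas as pd

# Start times of 6-hour blocks on Jan 1; midnight of Jan 2 is excluded.
six_hour_starts = pd.date_range(
    start="2023-01-01",
    end="2023-01-02",
    freq=pd.Timedelta(hours=6),
    inclusive="left",
)
print(six_hour_starts)
```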
Summary
In this tutorial, we've explored the pd.date_range() function in pandas, which is essential for time series analysis:
- We learned how to create date ranges with various parameters including start, end, and periods
- We explored different frequency options from daily to monthly to custom intervals
- We saw how to apply date ranges in practical scenarios such as creating time series datasets and resampling data
- We covered advanced features like time zones, normalization, and inclusive endpoints
Understanding date ranges in pandas opens up many possibilities for time series analysis and allows you to effectively work with time-based data across various domains.
Additional Exercises
- Create a date range for every other Friday in the year 2023.
- Generate a time series of hourly temperatures for a week, with higher temperatures during the day (8am-6pm) and lower temperatures at night.
- Create a DataFrame with daily stock prices for a year, then resample it to find monthly maximum, minimum, and average prices.
- Create a date range for the last business day of each month in 2023.
- Generate a date range with 15-minute intervals for a trading day (9:30 AM to 4:00 PM), then create a DataFrame with random stock price movements.
Additional Resources
- Pandas Official Documentation on date_range
- Pandas Time Series Documentation
- Pandas Frequency Aliases
Happy time series analysis with pandas!