Pandas DateTime Data

Time series data is everywhere - from stock prices and weather measurements to website traffic and sensor readings. Pandas provides powerful tools for working with date and time data, making it easy to handle timestamps, generate date ranges, and perform time-based analysis.

Introduction to DateTime Data in Pandas

DateTime data represents specific points in time, including dates, times, or both. Pandas has specialized data types and functions to handle time-based data efficiently:

Timestamp: Represents a single point in time
DatetimeIndex: An Index object specialized for datetime values
Period: Represents a time span (like a day, month, or year)
PeriodIndex: An Index of Period objects

By mastering these tools, you can easily:

Parse dates from strings
Generate date sequences
Resample time series data
Shift data forward or backward in time
Analyze time-based patterns

Converting Strings to DateTime

One common task is converting string representations of dates into proper datetime objects. Pandas makes this easy with the to_datetime() function.

import pandas as pd

# Converting single string to datetime
date_str = "2023-09-15"
date_obj = pd.to_datetime(date_str)
print(date_obj)

# Converting a series of dates
date_series = pd.Series(['2023-01-10', '2023-02-15', '2023-03-20'])
datetime_series = pd.to_datetime(date_series)
print(datetime_series)

Output:

2023-09-15 00:00:00
0   2023-01-10
1   2023-02-15
2   2023-03-20
dtype: datetime64[ns]

Handling Different Date Formats

Pandas is smart about recognizing various date formats, but you can also specify the format explicitly:

# Different date formats
dates = pd.Series(['10/2/2023', '11/3/2023', '12/4/2023'])
dates_dt = pd.to_datetime(dates, format='%m/%d/%Y')
print(dates_dt)

# European format (day first)
european_dates = pd.Series(['15-09-2023', '16-10-2023', '17-11-2023'])
european_dates_dt = pd.to_datetime(european_dates, format='%d-%m-%Y')
print(european_dates_dt)

Output:

 2023-10-02
 2023-11-03
 2023-12-04
dtype: datetime64[ns]
 2023-09-15
 2023-10-16
 2023-11-17
dtype: datetime64[ns]

Handling Errors in Date Parsing

When dealing with messy data, you can control how pandas handles invalid dates:

# Handling errors
mixed_dates = pd.Series(['2023-01-10', 'not a date', '2023-03-20'])

# Coerce errors (invalid dates become NaT - Not a Time)
dates_coerced = pd.to_datetime(mixed_dates, errors='coerce')
print(dates_coerced)

# Raise errors (default behavior)
try:
    dates_error = pd.to_datetime(mixed_dates, errors='raise')
except ValueError as e:
    print(f"Error: {e}")

Output:

0   2023-01-10
1          NaT
2   2023-03-20
dtype: datetime64[ns]
Error: Unable to parse string "not a date" at position 1

Creating Date Ranges

Pandas provides the date_range() function to create sequences of evenly spaced dates:

# Create a range of dates (default is daily frequency)
date_range = pd.date_range(start='2023-01-01', end='2023-01-10')
print(date_range)

# Specify number of periods instead of end date
date_periods = pd.date_range(start='2023-01-01', periods=5)
print(date_periods)

# Change the frequency
weekly_dates = pd.date_range(start='2023-01-01', periods=5, freq='W')
print("Weekly:")
print(weekly_dates)

monthly_dates = pd.date_range(start='2023-01-01', periods=5, freq='M')
print("\nMonthly:")
print(monthly_dates)

business_days = pd.date_range(start='2023-01-01', periods=5, freq='B')
print("\nBusiness days:")
print(business_days)

Output:

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10'],
              dtype='datetime64[ns]', freq='D')
              
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05'],
              dtype='datetime64[ns]', freq='D')
              
Weekly:
DatetimeIndex(['2023-01-01', '2023-01-08', '2023-01-15', '2023-01-22',
               '2023-01-29'],
              dtype='datetime64[ns]', freq='W-SUN')

Monthly:
DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
               '2023-05-31'],
              dtype='datetime64[ns]', freq='M')

Business days:
DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
               '2023-01-06'],
              dtype='datetime64[ns]', freq='B')

Common Frequency Aliases

Pandas supports many frequency aliases for different time periods:

Alias	Description
D	Calendar day
B	Business day
W	Weekly
M	Month end
MS	Month start
Q	Quarter end
QS	Quarter start
A or Y	Year end
AS or YS	Year start
H	Hourly
T or min	Minute
S	Second

Working with DateTime in DataFrames

Let's create a DataFrame with datetime data and explore how to use it:

# Create a DataFrame with date index
dates = pd.date_range('2023-01-01', periods=6)
df = pd.DataFrame({'value': [10, 12, 15, 14, 18, 20]}, index=dates)
print(df)

Output:

            value
2023-01-01     10
2023-01-02     12
2023-01-03     15
2023-01-04     14
2023-01-05     18
2023-01-06     20

Accessing DateTime Components

You can easily extract components like year, month, day, etc. from datetime objects:

# Extract date components
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df['day_of_week'] = df.index.day_name()
print(df)

Output:

            value  year  month  day day_of_week
2023-01-01     10  2023      1    1      Sunday
2023-01-02     12  2023      1    2      Monday
2023-01-03     15  2023      1    3     Tuesday
2023-01-04     14  2023      1    4   Wednesday
2023-01-05     18  2023      1    5    Thursday
2023-01-06     20  2023      1    6      Friday

Filtering Data by Date

You can filter DataFrame rows based on dates:

# Filter by specific date
print("Data for January 3rd:")
print(df.loc['2023-01-03'])

# Filter by date range
print("\nData from Jan 2 to Jan 4:")
print(df.loc['2023-01-02':'2023-01-04'])

# Filter using datetime components
weekdays = df[df.index.day_name().isin(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])]
print("\nWeekdays only:")
print(weekdays)

Output:

Data for January 3rd:
value                15
year               2023
month                 1
day                   3
day_of_week     Tuesday
Name: 2023-01-03 00:00:00, dtype: object

Data from Jan 2 to Jan 4:
            value  year  month  day day_of_week
2023-01-02     12  2023      1    2      Monday
2023-01-03     15  2023      1    3     Tuesday
2023-01-04     14  2023      1    4   Wednesday

Weekdays only:
            value  year  month  day day_of_week
2023-01-02     12  2023      1    2      Monday
2023-01-03     15  2023      1    3     Tuesday
2023-01-04     14  2023      1    4   Wednesday
2023-01-05     18  2023      1    5    Thursday
2023-01-06     20  2023      1    6      Friday

Time Series Analysis

Time series data often requires specialized operations like resampling, shifting, and rolling calculations.

Resampling Time Series Data

Resampling changes the frequency of your time series data:

# Create sample time series data
dates = pd.date_range('2023-01-01', periods=12, freq='D')
ts = pd.Series(range(12), index=dates)
print("Original data (daily):")
print(ts)

# Downsample to weekly data (sum values)
weekly_sum = ts.resample('W').sum()
print("\nDownsampled to weekly (sum):")
print(weekly_sum)

# Upsample to hourly data (forward fill)
hourly = ts.resample('H').ffill().head(10)  # Show just first 10 rows
print("\nUpsampled to hourly (first 10 rows, forward-filled):")
print(hourly)

Output:

Original data (daily):
2023-01-01     0
2023-01-02     1
2023-01-03     2
2023-01-04     3
2023-01-05     4
2023-01-06     5
2023-01-07     6
2023-01-08     7
2023-01-09     8
2023-01-10     9
2023-01-11    10
2023-01-12    11
Freq: D, dtype: int64

Downsampled to weekly (sum):
2023-01-01     0
2023-01-08    28
2023-01-15    38
Freq: W-SUN, dtype: int64

Upsampled to hourly (first 10 rows, forward-filled):
2023-01-01 00:00:00    0
2023-01-01 01:00:00    0
2023-01-01 02:00:00    0
2023-01-01 03:00:00    0
2023-01-01 04:00:00    0
2023-01-01 05:00:00    0
2023-01-01 06:00:00    0
2023-01-01 07:00:00    0
2023-01-01 08:00:00    0
2023-01-01 09:00:00    0
Freq: H, dtype: int64

Shifting and Lagging Data

Shifting data is useful for calculating changes or comparing values with previous periods:

# Create sample DataFrame
dates = pd.date_range('2023-01-01', periods=5)
df = pd.DataFrame({'value': [10, 12, 15, 14, 18]}, index=dates)

# Shift data forward by 1 period
df['previous_day'] = df['value'].shift(1)

# Calculate day-over-day change
df['day_change'] = df['value'] - df['previous_day']

# Calculate percentage change
df['pct_change'] = df['value'].pct_change() * 100

print(df)

Output:

            value  previous_day  day_change  pct_change
2023-01-01     10           NaN         NaN         NaN
2023-01-02     12          10.0         2.0   20.000000
2023-01-03     15          12.0         3.0   25.000000
2023-01-04     14          15.0        -1.0   -6.666667
2023-01-05     18          14.0         4.0   28.571429

Rolling Windows

Rolling windows are useful for calculating moving averages and other statistics:

# Create sample DataFrame with more data points
dates = pd.date_range('2023-01-01', periods=10)
df = pd.DataFrame({
    'value': [10, 12, 15, 14, 18, 20, 17, 21, 25, 19]
}, index=dates)

# Calculate 3-day rolling average
df['3d_rolling_avg'] = df['value'].rolling(window=3).mean()

# Calculate 3-day rolling standard deviation
df['3d_rolling_std'] = df['value'].rolling(window=3).std()

print(df)

Output:

            value  3d_rolling_avg  3d_rolling_std
2023-01-01     10             NaN             NaN
2023-01-02     12             NaN             NaN
2023-01-03     15      12.333333       2.516611
2023-01-04     14      13.666667       1.527525
2023-01-05     18      15.666667       2.081666
2023-01-06     20      17.333333       3.055050
2023-01-07     17      18.333333       1.527525
2023-01-08     21      19.333333       2.081666
2023-01-09     25      21.000000       4.000000
2023-01-10     19      21.666667       3.055050

Real-World Application: Sales Analysis

Let's apply what we've learned to a real-world example with sales data:

# Create sample sales data
dates = pd.date_range('2023-01-01', '2023-01-31')
np.random.seed(42)  # For reproducible results
daily_sales = np.random.randint(50, 200, size=len(dates))

sales_df = pd.DataFrame({
    'date': dates,
    'sales': daily_sales
})

# Set date as index
sales_df = sales_df.set_index('date')

# Add weekday information
sales_df['day_of_week'] = sales_df.index.day_name()

# Calculate 7-day rolling average
sales_df['7d_avg'] = sales_df['sales'].rolling(7).mean()

# Sample of data
print("Sample of daily sales data:")
print(sales_df.head())

# Average sales by day of week
day_of_week_avg = sales_df.groupby('day_of_week')['sales'].mean().reindex([
    'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'
])
print("\nAverage sales by day of week:")
print(day_of_week_avg)

# Resample to weekly totals
weekly_sales = sales_df['sales'].resample('W').sum()
print("\nWeekly sales totals:")
print(weekly_sales)

Output:

Sample of daily sales data:
            sales day_of_week  7d_avg
date                                 
2023-01-01    51      Sunday     NaN
2023-01-02   168      Monday     NaN
2023-01-03   113     Tuesday     NaN
2023-01-04    88   Wednesday     NaN
2023-01-05   150    Thursday     NaN

Average sales by day of week:
day_of_week
Monday      134.600000
Tuesday     111.200000
Wednesday   109.000000
Thursday    127.750000
Friday      120.000000
Saturday    124.500000
Sunday      120.250000
Name: sales, dtype: float64

Weekly sales totals:
date
2023-01-01     51
2023-01-08    807
2023-01-15    783
2023-01-22    756
2023-01-29    839
Freq: W-SUN, Name: sales, dtype: int64

Summary

In this lesson, we've explored how to work with datetime data in pandas:

Converting strings to datetime using pd.to_datetime()
Creating date ranges with pd.date_range()
Extracting datetime components like year, month, day
Filtering DataFrames based on dates
Resampling time series data
Shifting and calculating rolling windows
Applying these techniques to real-world data analysis

DateTime handling is a powerful feature in pandas that allows you to effectively work with time series data, making it easier to analyze trends, patterns, and seasonal variations in your data.

Practice Exercises

Create a DataFrame with daily temperature readings for one month, then calculate the weekly average temperature.
Parse dates from a list of strings in different formats (mix of US and European formats).
Create a DataFrame of stock prices and calculate the 5-day rolling average and standard deviation.
Build a dataset of website visits by hour, then resample it to show total daily and weekly traffic.
Calculate the difference between weekday and weekend average values for any time series dataset.

Additional Resources

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction to DateTime Data in Pandas​

Converting Strings to DateTime​

Handling Different Date Formats​

Handling Errors in Date Parsing​

Creating Date Ranges​

Common Frequency Aliases​

Working with DateTime in DataFrames​

Accessing DateTime Components​

Filtering Data by Date​

Time Series Analysis​

Resampling Time Series Data​

Shifting and Lagging Data​

Rolling Windows​

Real-World Application: Sales Analysis​

Summary​

Practice Exercises​

Additional Resources​

Introduction to DateTime Data in Pandas

Converting Strings to DateTime

Handling Different Date Formats

Handling Errors in Date Parsing

Creating Date Ranges

Common Frequency Aliases

Working with DateTime in DataFrames

Accessing DateTime Components

Filtering Data by Date

Time Series Analysis

Resampling Time Series Data

Shifting and Lagging Data

Rolling Windows

Real-World Application: Sales Analysis

Summary

Practice Exercises

Additional Resources