Skip to main content

Pandas DateTime Data

Time series data is everywhere - from stock prices and weather measurements to website traffic and sensor readings. Pandas provides powerful tools for working with date and time data, making it easy to handle timestamps, generate date ranges, and perform time-based analysis.

Introduction to DateTime Data in Pandas

DateTime data represents specific points in time, including dates, times, or both. Pandas has specialized data types and functions to handle time-based data efficiently:

  • Timestamp: Represents a single point in time
  • DatetimeIndex: An Index object specialized for datetime values
  • Period: Represents a time span (like a day, month, or year)
  • PeriodIndex: An Index of Period objects

By mastering these tools, you can easily:

  • Parse dates from strings
  • Generate date sequences
  • Resample time series data
  • Shift data forward or backward in time
  • Analyze time-based patterns

Converting Strings to DateTime

One common task is converting string representations of dates into proper datetime objects. Pandas makes this easy with the to_datetime() function.

python
import pandas as pd

# Converting single string to datetime
date_str = "2023-09-15"
date_obj = pd.to_datetime(date_str)
print(date_obj)

# Converting a series of dates
date_series = pd.Series(['2023-01-10', '2023-02-15', '2023-03-20'])
datetime_series = pd.to_datetime(date_series)
print(datetime_series)

Output:

2023-09-15 00:00:00
0 2023-01-10
1 2023-02-15
2 2023-03-20
dtype: datetime64[ns]

Handling Different Date Formats

Pandas is smart about recognizing various date formats, but you can also specify the format explicitly:

python
# Different date formats
dates = pd.Series(['10/2/2023', '11/3/2023', '12/4/2023'])
dates_dt = pd.to_datetime(dates, format='%m/%d/%Y')
print(dates_dt)

# European format (day first)
european_dates = pd.Series(['15-09-2023', '16-10-2023', '17-11-2023'])
european_dates_dt = pd.to_datetime(european_dates, format='%d-%m-%Y')
print(european_dates_dt)

Output:

0   2023-10-02
1 2023-11-03
2 2023-12-04
dtype: datetime64[ns]
0 2023-09-15
1 2023-10-16
2 2023-11-17
dtype: datetime64[ns]

Handling Errors in Date Parsing

When dealing with messy data, you can control how pandas handles invalid dates:

python
# Handling errors
mixed_dates = pd.Series(['2023-01-10', 'not a date', '2023-03-20'])

# Coerce errors (invalid dates become NaT - Not a Time)
dates_coerced = pd.to_datetime(mixed_dates, errors='coerce')
print(dates_coerced)

# Raise errors (default behavior)
try:
dates_error = pd.to_datetime(mixed_dates, errors='raise')
except ValueError as e:
print(f"Error: {e}")

Output:

0   2023-01-10
1 NaT
2 2023-03-20
dtype: datetime64[ns]
Error: Unable to parse string "not a date" at position 1

Creating Date Ranges

Pandas provides the date_range() function to create sequences of evenly spaced dates:

python
# Create a range of dates (default is daily frequency)
date_range = pd.date_range(start='2023-01-01', end='2023-01-10')
print(date_range)

# Specify number of periods instead of end date
date_periods = pd.date_range(start='2023-01-01', periods=5)
print(date_periods)

# Change the frequency
weekly_dates = pd.date_range(start='2023-01-01', periods=5, freq='W')
print("Weekly:")
print(weekly_dates)

monthly_dates = pd.date_range(start='2023-01-01', periods=5, freq='M')
print("\nMonthly:")
print(monthly_dates)

business_days = pd.date_range(start='2023-01-01', periods=5, freq='B')
print("\nBusiness days:")
print(business_days)

Output:

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
'2023-01-09', '2023-01-10'],
dtype='datetime64[ns]', freq='D')

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05'],
dtype='datetime64[ns]', freq='D')

Weekly:
DatetimeIndex(['2023-01-01', '2023-01-08', '2023-01-15', '2023-01-22',
'2023-01-29'],
dtype='datetime64[ns]', freq='W-SUN')

Monthly:
DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
'2023-05-31'],
dtype='datetime64[ns]', freq='M')

Business days:
DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06'],
dtype='datetime64[ns]', freq='B')

Common Frequency Aliases

Pandas supports many frequency aliases for different time periods:

AliasDescription
DCalendar day
BBusiness day
WWeekly
MMonth end
MSMonth start
QQuarter end
QSQuarter start
A or YYear end
AS or YSYear start
HHourly
T or minMinute
SSecond

Working with DateTime in DataFrames

Let's create a DataFrame with datetime data and explore how to use it:

python
# Create a DataFrame with date index
dates = pd.date_range('2023-01-01', periods=6)
df = pd.DataFrame({'value': [10, 12, 15, 14, 18, 20]}, index=dates)
print(df)

Output:

            value
2023-01-01 10
2023-01-02 12
2023-01-03 15
2023-01-04 14
2023-01-05 18
2023-01-06 20

Accessing DateTime Components

You can easily extract components like year, month, day, etc. from datetime objects:

python
# Extract date components
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df['day_of_week'] = df.index.day_name()
print(df)

Output:

            value  year  month  day day_of_week
2023-01-01 10 2023 1 1 Sunday
2023-01-02 12 2023 1 2 Monday
2023-01-03 15 2023 1 3 Tuesday
2023-01-04 14 2023 1 4 Wednesday
2023-01-05 18 2023 1 5 Thursday
2023-01-06 20 2023 1 6 Friday

Filtering Data by Date

You can filter DataFrame rows based on dates:

python
# Filter by specific date
print("Data for January 3rd:")
print(df.loc['2023-01-03'])

# Filter by date range
print("\nData from Jan 2 to Jan 4:")
print(df.loc['2023-01-02':'2023-01-04'])

# Filter using datetime components
weekdays = df[df.index.day_name().isin(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])]
print("\nWeekdays only:")
print(weekdays)

Output:

Data for January 3rd:
value 15
year 2023
month 1
day 3
day_of_week Tuesday
Name: 2023-01-03 00:00:00, dtype: object

Data from Jan 2 to Jan 4:
value year month day day_of_week
2023-01-02 12 2023 1 2 Monday
2023-01-03 15 2023 1 3 Tuesday
2023-01-04 14 2023 1 4 Wednesday

Weekdays only:
value year month day day_of_week
2023-01-02 12 2023 1 2 Monday
2023-01-03 15 2023 1 3 Tuesday
2023-01-04 14 2023 1 4 Wednesday
2023-01-05 18 2023 1 5 Thursday
2023-01-06 20 2023 1 6 Friday

Time Series Analysis

Time series data often requires specialized operations like resampling, shifting, and rolling calculations.

Resampling Time Series Data

Resampling changes the frequency of your time series data:

python
# Create sample time series data
dates = pd.date_range('2023-01-01', periods=12, freq='D')
ts = pd.Series(range(12), index=dates)
print("Original data (daily):")
print(ts)

# Downsample to weekly data (sum values)
weekly_sum = ts.resample('W').sum()
print("\nDownsampled to weekly (sum):")
print(weekly_sum)

# Upsample to hourly data (forward fill)
hourly = ts.resample('H').ffill().head(10) # Show just first 10 rows
print("\nUpsampled to hourly (first 10 rows, forward-filled):")
print(hourly)

Output:

Original data (daily):
2023-01-01 0
2023-01-02 1
2023-01-03 2
2023-01-04 3
2023-01-05 4
2023-01-06 5
2023-01-07 6
2023-01-08 7
2023-01-09 8
2023-01-10 9
2023-01-11 10
2023-01-12 11
Freq: D, dtype: int64

Downsampled to weekly (sum):
2023-01-01 0
2023-01-08 28
2023-01-15 38
Freq: W-SUN, dtype: int64

Upsampled to hourly (first 10 rows, forward-filled):
2023-01-01 00:00:00 0
2023-01-01 01:00:00 0
2023-01-01 02:00:00 0
2023-01-01 03:00:00 0
2023-01-01 04:00:00 0
2023-01-01 05:00:00 0
2023-01-01 06:00:00 0
2023-01-01 07:00:00 0
2023-01-01 08:00:00 0
2023-01-01 09:00:00 0
Freq: H, dtype: int64

Shifting and Lagging Data

Shifting data is useful for calculating changes or comparing values with previous periods:

python
# Create sample DataFrame
dates = pd.date_range('2023-01-01', periods=5)
df = pd.DataFrame({'value': [10, 12, 15, 14, 18]}, index=dates)

# Shift data forward by 1 period
df['previous_day'] = df['value'].shift(1)

# Calculate day-over-day change
df['day_change'] = df['value'] - df['previous_day']

# Calculate percentage change
df['pct_change'] = df['value'].pct_change() * 100

print(df)

Output:

            value  previous_day  day_change  pct_change
2023-01-01 10 NaN NaN NaN
2023-01-02 12 10.0 2.0 20.000000
2023-01-03 15 12.0 3.0 25.000000
2023-01-04 14 15.0 -1.0 -6.666667
2023-01-05 18 14.0 4.0 28.571429

Rolling Windows

Rolling windows are useful for calculating moving averages and other statistics:

python
# Create sample DataFrame with more data points
dates = pd.date_range('2023-01-01', periods=10)
df = pd.DataFrame({
'value': [10, 12, 15, 14, 18, 20, 17, 21, 25, 19]
}, index=dates)

# Calculate 3-day rolling average
df['3d_rolling_avg'] = df['value'].rolling(window=3).mean()

# Calculate 3-day rolling standard deviation
df['3d_rolling_std'] = df['value'].rolling(window=3).std()

print(df)

Output:

            value  3d_rolling_avg  3d_rolling_std
2023-01-01 10 NaN NaN
2023-01-02 12 NaN NaN
2023-01-03 15 12.333333 2.516611
2023-01-04 14 13.666667 1.527525
2023-01-05 18 15.666667 2.081666
2023-01-06 20 17.333333 3.055050
2023-01-07 17 18.333333 1.527525
2023-01-08 21 19.333333 2.081666
2023-01-09 25 21.000000 4.000000
2023-01-10 19 21.666667 3.055050

Real-World Application: Sales Analysis

Let's apply what we've learned to a real-world example with sales data:

python
# Create sample sales data
dates = pd.date_range('2023-01-01', '2023-01-31')
np.random.seed(42) # For reproducible results
daily_sales = np.random.randint(50, 200, size=len(dates))

sales_df = pd.DataFrame({
'date': dates,
'sales': daily_sales
})

# Set date as index
sales_df = sales_df.set_index('date')

# Add weekday information
sales_df['day_of_week'] = sales_df.index.day_name()

# Calculate 7-day rolling average
sales_df['7d_avg'] = sales_df['sales'].rolling(7).mean()

# Sample of data
print("Sample of daily sales data:")
print(sales_df.head())

# Average sales by day of week
day_of_week_avg = sales_df.groupby('day_of_week')['sales'].mean().reindex([
'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'
])
print("\nAverage sales by day of week:")
print(day_of_week_avg)

# Resample to weekly totals
weekly_sales = sales_df['sales'].resample('W').sum()
print("\nWeekly sales totals:")
print(weekly_sales)

Output:

Sample of daily sales data:
sales day_of_week 7d_avg
date
2023-01-01 51 Sunday NaN
2023-01-02 168 Monday NaN
2023-01-03 113 Tuesday NaN
2023-01-04 88 Wednesday NaN
2023-01-05 150 Thursday NaN

Average sales by day of week:
day_of_week
Monday 134.600000
Tuesday 111.200000
Wednesday 109.000000
Thursday 127.750000
Friday 120.000000
Saturday 124.500000
Sunday 120.250000
Name: sales, dtype: float64

Weekly sales totals:
date
2023-01-01 51
2023-01-08 807
2023-01-15 783
2023-01-22 756
2023-01-29 839
Freq: W-SUN, Name: sales, dtype: int64

Summary

In this lesson, we've explored how to work with datetime data in pandas:

  • Converting strings to datetime using pd.to_datetime()
  • Creating date ranges with pd.date_range()
  • Extracting datetime components like year, month, day
  • Filtering DataFrames based on dates
  • Resampling time series data
  • Shifting and calculating rolling windows
  • Applying these techniques to real-world data analysis

DateTime handling is a powerful feature in pandas that allows you to effectively work with time series data, making it easier to analyze trends, patterns, and seasonal variations in your data.

Practice Exercises

  1. Create a DataFrame with daily temperature readings for one month, then calculate the weekly average temperature.

  2. Parse dates from a list of strings in different formats (mix of US and European formats).

  3. Create a DataFrame of stock prices and calculate the 5-day rolling average and standard deviation.

  4. Build a dataset of website visits by hour, then resample it to show total daily and weekly traffic.

  5. Calculate the difference between weekday and weekend average values for any time series dataset.

Additional Resources



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)