Pandas Time Series Basics

Time series data is a sequence of data points collected over time intervals, making it one of the most common data types in data science and analytics. Whether you're analyzing stock prices, website traffic, or climate data, understanding how to work with time series in pandas is an essential skill for any data practitioner.

Introduction to Time Series in Pandas

Pandas provides powerful capabilities for working with time-based data through its DatetimeIndex and time series functionality. These features let you:

  • Convert between different date and time representations
  • Slice and select data based on dates
  • Resample time series to different frequencies
  • Perform time-based operations like shifting and lagging
  • Handle time zones and daylight saving time

In this guide, we'll explore the basics of working with time series in pandas, starting with creating time series objects and moving on to essential operations.

Setting Up Your Environment

Before we begin, make sure you have pandas installed and imported:

python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Optional but useful for better visualizations
plt.style.use('seaborn-v0_8')

# Display settings for better readability
pd.set_option('display.max_rows', 10)

Creating Time Series Data

Creating a DatetimeIndex

The foundation of time series in pandas is the DatetimeIndex. Let's start by creating one:

python
# Creating a DatetimeIndex
dates = pd.date_range(start='2023-01-01', end='2023-01-10')
print(dates)

Output:

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
'2023-01-09', '2023-01-10'],
dtype='datetime64[ns]', freq='D')

Creating a Series with a DatetimeIndex

Let's create a simple Series with a DatetimeIndex:

python
# Creating a Series with DatetimeIndex
temperatures = pd.Series([32, 30, 31, 28, 33, 35, 29, 30, 31, 32], index=dates)
print(temperatures)

Output:

2023-01-01    32
2023-01-02    30
2023-01-03    31
2023-01-04    28
2023-01-05    33
2023-01-06    35
2023-01-07    29
2023-01-08    30
2023-01-09    31
2023-01-10    32
Freq: D, dtype: int64

Creating a DataFrame with DatetimeIndex

For more complex data, a DataFrame might be more appropriate:

python
# Creating a DataFrame with DatetimeIndex
weather_data = pd.DataFrame({
'Temperature': [32, 30, 31, 28, 33, 35, 29, 30, 31, 32],
'Humidity': [80, 82, 78, 77, 85, 84, 80, 81, 79, 78]
}, index=dates)
print(weather_data)

Output:

            Temperature  Humidity
2023-01-01           32        80
2023-01-02           30        82
2023-01-03           31        78
2023-01-04           28        77
2023-01-05           33        85
2023-01-06           35        84
2023-01-07           29        80
2023-01-08           30        81
2023-01-09           31        79
2023-01-10           32        78

Date Ranges and Frequency

The date_range() function is incredibly versatile. You can specify different frequencies and create various types of date ranges:

python
# Daily frequency (default)
daily = pd.date_range('2023-01-01', periods=5)
print("Daily dates:")
print(daily)

# Monthly frequency ('M' is deprecated since pandas 2.2; use 'ME' there)
monthly = pd.date_range('2023-01-01', periods=5, freq='M')
print("\nMonthly dates (month ends):")
print(monthly)

# Business day frequency
business = pd.date_range('2023-01-01', periods=5, freq='B')
print("\nBusiness days:")
print(business)

# Hourly frequency ('H' is deprecated since pandas 2.2; use 'h' there)
hourly = pd.date_range('2023-01-01', periods=5, freq='H')
print("\nHourly:")
print(hourly)

Output:

Daily dates:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05'],
dtype='datetime64[ns]', freq='D')

Monthly dates (month ends):
DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
'2023-05-31'],
dtype='datetime64[ns]', freq='M')

Business days:
DatetimeIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06'],
dtype='datetime64[ns]', freq='B')

Hourly:
DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 01:00:00',
'2023-01-01 02:00:00', '2023-01-01 03:00:00',
'2023-01-01 04:00:00'],
dtype='datetime64[ns]', freq='H')

Here are some common frequency aliases:

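Alias     Description
D         Calendar day
B         Business day
W         Weekly
M         Month end
MS        Month start
Q         Quarter end
QS        Quarter start
A, Y      Year end
AS, YS    Year start
H         Hourly
T, min    Minutely
S         Secondly

Note: pandas 2.2 deprecated several of these aliases. In recent versions, use ME, QE, and YE for month, quarter, and year end, YS for year start, and lowercase h, min, and s for hourly, minutely, and secondly.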
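
Aliases can also be combined with a multiplier or an anchor, and because the accepted alias strings differ between pandas versions, offset objects from pd.offsets are one way to avoid depending on them. The sketch below is illustrative only; the variable names are made up for this example.

```python
import pandas as pd

# A multiple of a base frequency: every 2 calendar days
every_two_days = pd.date_range('2023-01-01', periods=3, freq='2D')

# An anchored weekly frequency: one date per week, falling on Monday
mondays = pd.date_range('2023-01-01', periods=3, freq='W-MON')

# Offset objects are an alternative to string aliases
month_ends = pd.date_range('2023-01-01', periods=3, freq=pd.offsets.MonthEnd())
hours = pd.date_range('2023-01-01', periods=3, freq=pd.offsets.Hour())

print(every_two_days)
print(mondays)
print(month_ends)
print(hours)
```

Since 2023-01-01 is a Sunday, the W-MON range starts on 2023-01-02.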

Converting to Datetime

Often, your data will come with dates as strings. Converting them to datetime objects is crucial for time series analysis:

python
# Converting strings to datetime
date_strings = ['01/01/2023', '01/02/2023', '01/03/2023']
dates = pd.to_datetime(date_strings)
print(dates)

# Different format
date_strings = ['Jan 1, 2023', 'Jan 2, 2023', 'Jan 3, 2023']
dates = pd.to_datetime(date_strings)
print("\nDifferent format:")
print(dates)

# Custom format
date_strings = ['01.01.2023', '02.01.2023', '03.01.2023']
dates = pd.to_datetime(date_strings, format='%d.%m.%Y')
print("\nCustom format:")
print(dates)

Output:

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]')

Different format:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]')

Custom format:
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]')
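
Real data often contains unparseable entries or ambiguous day/month ordering. As a minimal sketch (the sample strings are invented for illustration), the errors and dayfirst parameters of to_datetime() cover both cases:

```python
import pandas as pd

# Unparseable entries become NaT instead of raising an error
raw = ['2023-01-01', 'not a date', '2023-01-03']
parsed = pd.to_datetime(raw, errors='coerce')
print(parsed)

# '02/01/2023' is ambiguous: month-first by default, day-first on request
month_first = pd.to_datetime('02/01/2023')
day_first = pd.to_datetime('02/01/2023', dayfirst=True)
print(month_first)  # February 1, 2023
print(day_first)    # January 2, 2023
```

When the layout of the strings is known in advance, passing an explicit format, as in the custom format example above, removes the ambiguity altogether.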

Converting Columns in a DataFrame

Often you'll need to convert a column in a DataFrame to datetime:

python
# Creating a DataFrame with date strings
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-02', '2023-01-03'],
'value': [100, 102, 104]
})

# Converting to datetime
df['date'] = pd.to_datetime(df['date'])
print(df)
print(f"\nData type of 'date' column: {df['date'].dtype}")

# Setting the date column as index
df.set_index('date', inplace=True)
print("\nWith datetime index:")
print(df)

Output:

        date  value
0 2023-01-01    100
1 2023-01-02    102
2 2023-01-03    104

Data type of 'date' column: datetime64[ns]

With datetime index:
            value
date
2023-01-01    100
2023-01-02    102
2023-01-03    104
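
When the data comes from a CSV file, the conversion and the index assignment can happen while reading. The following is a sketch under the assumption that the file has a column named date; the CSV text here is a hypothetical stand-in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """date,value
2023-01-01,100
2023-01-02,102
2023-01-03,104
"""

# parse_dates converts the column; index_col makes it the index
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')
print(df_csv)
print(type(df_csv.index))
```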

Time Series Indexing and Slicing

One of the most powerful features of pandas time series is the ability to select data based on dates.

Basic Date Selection

python
# Using our temperature series from earlier
print("Original series:")
print(temperatures)

# Selecting a specific date
print("\nTemperature on January 5th, 2023:")
print(temperatures['2023-01-05'])

# Selecting a range of dates
print("\nTemperatures from January 3rd to January 7th, 2023:")
print(temperatures['2023-01-03':'2023-01-07'])

Output:

Original series:
2023-01-01 32
2023-01-02 30
2023-01-03 31
2023-01-04 28
2023-01-05 33
2023-01-06 35
2023-01-07 29
2023-01-08 30
2023-01-09 31
2023-01-10 32
Freq: D, dtype: int64

Temperature on January 5th, 2023:
33

Temperatures from January 3rd to January 7th, 2023:
2023-01-03 31
2023-01-04 28
2023-01-05 33
2023-01-06 35
2023-01-07 29
Freq: D, dtype: int64
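
Note that the slice above returned January 7th as well: unlike positional slicing, slicing by date labels includes both endpoints. The same selection works on a DataFrame through .loc, sketched here with the weather data rebuilt so the example runs on its own:

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', end='2023-01-10')
weather = pd.DataFrame({
    'Temperature': [32, 30, 31, 28, 33, 35, 29, 30, 31, 32],
    'Humidity': [80, 82, 78, 77, 85, 84, 80, 81, 79, 78],
}, index=dates)

# Label-based slicing with dates includes both endpoints
subset = weather.loc['2023-01-03':'2023-01-07']
print(len(subset))  # 5 rows: January 3 through January 7

# Select rows by date and a single column at the same time
humidity = weather.loc['2023-01-03':'2023-01-05', 'Humidity']
print(humidity)

# Timestamp objects work as labels too
jan5_temp = weather.loc[pd.Timestamp('2023-01-05'), 'Temperature']
print(jan5_temp)  # 33
```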

Partial String Indexing

Pandas allows you to select dates using partial string matching:

python
# Select all dates in January 2023
print("All readings in January:")
print(temperatures['2023-01'])

# Select first 5 days
print("\nFirst 5 days:")
print(temperatures['2023-01-01':'2023-01-05'])

Output:

All readings in January:
2023-01-01 32
2023-01-02 30
2023-01-03 31
2023-01-04 28
2023-01-05 33
2023-01-06 35
2023-01-07 29
2023-01-08 30
2023-01-09 31
2023-01-10 32
Freq: D, dtype: int64

First 5 days:
2023-01-01 32
2023-01-02 30
2023-01-03 31
2023-01-04 28
2023-01-05 33
Freq: D, dtype: int64
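
Partial strings are easier to appreciate on data that spans more than one month. The sketch below uses a hypothetical 90-day series (not the temperature data above) and shows that a partial string can also serve as a slice endpoint:

```python
import pandas as pd

# Hypothetical series: one reading per day for the first 90 days of 2023
idx = pd.date_range('2023-01-01', periods=90)
readings = pd.Series(range(90), index=idx)

# A year-month string selects every day in that month
february = readings.loc['2023-02']
print(len(february))  # 28

# Partial strings and full dates can be mixed in a slice
feb_to_mid_march = readings.loc['2023-02':'2023-03-15']
print(feb_to_mid_march.index[0], feb_to_mid_march.index[-1])
```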

Datetime Components and Properties

You can easily extract various components from datetime indices:

python
# Create a daily time series with random values
dates = pd.date_range('2023-01-01', periods=5, freq='D')
ts = pd.Series(np.random.randn(len(dates)), index=dates)

print("Original series:")
print(ts)

# Extracting components
print("\nYear:")
print(ts.index.year)

print("\nMonth:")
print(ts.index.month)

print("\nDay:")
print(ts.index.day)

print("\nDay of week (0=Monday, 6=Sunday):")
print(ts.index.dayofweek)

print("\nDay name:")
print(ts.index.day_name())

Output:

Original series:
2023-01-01 0.283124
2023-01-02 0.699674
2023-01-03 -0.290013
2023-01-04 -0.776091
2023-01-05 -0.129797
Freq: D, dtype: float64

Year:
Index([2023, 2023, 2023, 2023, 2023], dtype='int32')

Month:
Index([1, 1, 1, 1, 1], dtype='int32')

Day:
Index([1, 2, 3, 4, 5], dtype='int32')

Day of week (0=Monday, 6=Sunday):
Index([6, 0, 1, 2, 3], dtype='int32')

Day name:
Index(['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'], dtype='object')

You can use these components for filtering:

python
# Filter to get only weekdays
weekdays = ts[ts.index.dayofweek < 5] # 0-4 are Monday to Friday
print("Weekdays only:")
print(weekdays)

Output:

Weekdays only:
2023-01-02 0.699674
2023-01-03 -0.290013
2023-01-04 -0.776091
2023-01-05 -0.129797
Freq: D, dtype: float64
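
When the dates sit in a regular column rather than in the index, the same components are available through the .dt accessor. A minimal sketch with a hypothetical events table (column names invented for illustration):

```python
import pandas as pd

# Hypothetical DataFrame where the dates live in a column, not the index
events = pd.DataFrame({
    'when': pd.to_datetime(['2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09']),
    'visits': [120, 80, 75, 130],
})

# Columns expose the same components through the .dt accessor
events['day_name'] = events['when'].dt.day_name()
events['is_weekend'] = events['when'].dt.dayofweek >= 5  # 5=Saturday, 6=Sunday

print(events)

# Average visits on weekdays (False) versus weekends (True)
weekend_means = events.groupby('is_weekend')['visits'].mean()
print(weekend_means)
```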

Real-World Example: Stock Price Analysis

Let's work through a practical example using stock price data:

python
# Create some sample stock price data
dates = pd.date_range('2023-01-01', periods=10, freq='B')
prices = pd.Series([150, 152, 151, 153, 154, 153, 155, 157, 156, 158], index=dates)
print("Stock prices:")
print(prices)

# Calculate daily returns
daily_returns = prices.pct_change().dropna()
print("\nDaily returns:")
print(daily_returns)

# Calculate average return by day of the week
day_returns = daily_returns.groupby(daily_returns.index.day_name()).mean()
print("\nAverage return by day of week:")
print(day_returns)

# Plot the stock prices (matplotlib was imported during setup)
plt.figure(figsize=(10, 6))
prices.plot(title='Stock Prices - January 2023')
plt.grid(True)
plt.tight_layout()
plt.show()

Output:

Stock prices:
2023-01-02 150
2023-01-03 152
2023-01-04 151
2023-01-05 153
2023-01-06 154
2023-01-09 153
2023-01-10 155
2023-01-11 157
2023-01-12 156
2023-01-13 158
Freq: B, dtype: int64

Daily returns:
2023-01-03 0.013333
2023-01-04 -0.006579
2023-01-05 0.013245
2023-01-06 0.006536
2023-01-09 -0.006494
2023-01-10 0.013072
2023-01-11 0.012903
2023-01-12 -0.006369
2023-01-13 0.012821
Freq: B, dtype: float64

Average return by day of week:
Friday 0.009678
Monday -0.006494
Thursday 0.003438
Tuesday 0.013203
Wednesday 0.003162
dtype: float64
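
Because groupby() sorts its keys, the day names come back in alphabetical order rather than calendar order. One way to restore weekday order, sketched here with the sample prices rebuilt so the block runs on its own, is to reindex the result against an explicit list:

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=10, freq='B')
prices = pd.Series([150, 152, 151, 153, 154, 153, 155, 157, 156, 158], index=dates)
daily_returns = prices.pct_change().dropna()

# Group keys are sorted alphabetically: Friday, Monday, Thursday, ...
by_day = daily_returns.groupby(daily_returns.index.day_name()).mean()

# Reindex against an explicit list to get calendar order
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
by_day_ordered = by_day.reindex(weekday_order)
print(by_day_ordered)
```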

Time Zone Handling

Pandas provides functionality to work with time zones:

python
# Create a time series with a specific timezone
dates = pd.date_range('2023-01-01', periods=3, tz='US/Eastern')
ts = pd.Series([1, 2, 3], index=dates)
print("Time series with Eastern timezone:")
print(ts)

# Convert to a different timezone
ts_pacific = ts.tz_convert('US/Pacific')
print("\nConverted to Pacific timezone:")
print(ts_pacific)

# Localize a naive timestamp to a timezone
naive_dates = pd.date_range('2023-01-01', periods=3)
ts_naive = pd.Series([1, 2, 3], index=naive_dates)
ts_localized = ts_naive.tz_localize('UTC')
print("\nNaive timestamps localized to UTC:")
print(ts_localized)

Output:

Time series with Eastern timezone:
2023-01-01 00:00:00-05:00 1
2023-01-02 00:00:00-05:00 2
2023-01-03 00:00:00-05:00 3
Freq: D, dtype: int64

Converted to Pacific timezone:
2022-12-31 21:00:00-08:00 1
2023-01-01 21:00:00-08:00 2
2023-01-02 21:00:00-08:00 3
Freq: D, dtype: int64

Naive timestamps localized to UTC:
2023-01-01 00:00:00+00:00 1
2023-01-02 00:00:00+00:00 2
2023-01-03 00:00:00+00:00 3
Freq: D, dtype: int64
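
Time zone conversion also accounts for daylight saving time. The sketch below converts four hourly UTC timestamps around the US spring-forward change on March 12, 2023, then localizes a wall-clock time that the change skipped. It uses America/New_York, the IANA name for the zone also known as US/Eastern; the variable names are illustrative.

```python
import pandas as pd

# Four hourly UTC timestamps spanning the US spring-forward change on 2023-03-12
utc_hours = pd.date_range('2023-03-12 05:00', periods=4,
                          freq=pd.offsets.Hour(), tz='UTC')
ts_utc = pd.Series([1, 2, 3, 4], index=utc_hours)

# Local time jumps from 01:00 (UTC-5) straight to 03:00 (UTC-4)
ts_ny = ts_utc.tz_convert('America/New_York')
print(ts_ny)

# 02:30 did not exist that day; shift it forward to the next valid time
skipped = pd.Timestamp('2023-03-12 02:30')
localized = skipped.tz_localize('America/New_York', nonexistent='shift_forward')
print(localized)
```

Without the nonexistent argument, localizing the skipped time raises an error, which is often the safer default when bad timestamps should not pass silently.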

Time Periods and Period Index

Pandas also provides Period objects that represent time spans:

python
# Create periods
periods = pd.period_range('2023-01', periods=3, freq='M')
print("Monthly periods:")
print(periods)

# Create a series with periods
period_series = pd.Series([100, 110, 120], index=periods)
print("\nSeries with period index:")
print(period_series)

# Convert timestamps to periods
dates = pd.date_range('2023-01-01', periods=3)
ts = pd.Series([1, 2, 3], index=dates)
ts_periods = ts.to_period('M')
print("\nTimestamps converted to monthly periods:")
print(ts_periods)

Output:

Monthly periods:
PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')

Series with period index:
2023-01 100
2023-02 110
2023-03 120
Freq: M, dtype: int64

Timestamps converted to monthly periods:
2023-01 1
2023-01 2
2023-01 3
Freq: M, dtype: int64
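
Because a Period represents a span rather than an instant, it has a start and an end, supports simple arithmetic, and can be converted back to timestamps. A short sketch, with names chosen for illustration:

```python
import pandas as pd

jan = pd.Period('2023-01', freq='M')
print(jan.start_time)  # first instant of January 2023
print(jan.end_time)    # last instant of January 2023
print(jan + 1)         # the following month

# Convert a period-indexed series back to timestamps (start of each period)
period_series = pd.Series([100, 110, 120],
                          index=pd.period_range('2023-01', periods=3, freq='M'))
stamped = period_series.to_timestamp()
print(stamped)
```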

Summary

In this guide, we've covered the basics of working with time series data in pandas:

  • Creating date ranges with date_range()
  • Converting strings to datetime using to_datetime()
  • Indexing and slicing time series data
  • Extracting date components like year, month, and day
  • Working with time zones
  • Using period indexes

These fundamental skills form the foundation for more advanced time series operations like resampling, shifting, and time series analysis, which we'll explore in future guides.
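
As a brief preview of those operations, here is a minimal sketch on a hypothetical two-week daily series. It is only meant to show the shape of the API, not a full treatment.

```python
import pandas as pd

# Hypothetical daily series covering 2023-01-01 through 2023-01-14
idx = pd.date_range('2023-01-01', periods=14)
values = pd.Series(range(14), index=idx, dtype='float64')

# Resample: aggregate daily values into weekly bins ending on Sunday
weekly_mean = values.resample('W').mean()

# Shift: move values forward by one day (the first entry becomes NaN)
previous_day = values.shift(1)

# Rolling: 3-day moving average (the first two entries are NaN)
rolling_3 = values.rolling(window=3).mean()

print(weekly_mean)
print(previous_day.head(3))
print(rolling_3.head(4))
```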

Additional Resources

To deepen your understanding of pandas time series:

  1. Official pandas documentation on Time Series / Date functionality
  2. pandas Time Series Exercises on GitHub
  3. Real-world time series datasets from Kaggle

Practice Exercises

  1. Create a time series of daily temperatures for one month and calculate the 7-day rolling average.
  2. Download historical stock price data using pandas-datareader and analyze the trends.
  3. Create a time series with hourly data and convert it to daily, weekly, and monthly frequencies.
  4. Extract all Mondays from a year-long daily time series and calculate summary statistics.
  5. Create a time series with multiple time zones and practice converting between them.

Happy coding!


