Pandas Period

Introduction

In data analysis, working with time-based data is extremely common. Whether you're analyzing sales trends, monitoring website traffic, or processing sensor data, understanding how to represent and manipulate time periods is essential.

Pandas, a powerful data manipulation library in Python, provides a specialized data structure called Period for representing time spans such as days, months, quarters, or years. Unlike timestamp objects that represent specific points in time, Period objects represent time spans or intervals, making them perfect for time-based grouping and analysis.

In this tutorial, we'll explore how to create and work with Pandas Period objects, understand their unique properties, and see how they fit into time series analysis.

What is a Pandas Period?

A Period represents a span of time, such as a day, month, quarter, or year. It differs from a timestamp in that it covers an entire interval, with a defined start and end, rather than a single moment. This makes Period objects particularly useful for:

  • Financial analysis (like quarterly or monthly reporting)
  • Seasonal comparisons
  • Any analysis where you want to group by time intervals

Creating Period Objects

Let's start by creating some basic Period objects:

python
import pandas as pd

# Create a period representing January 2023
jan_2023 = pd.Period('2023-01', freq='M')
print(jan_2023)

# Create a period representing the year 2023
# ('Y' is the yearly alias; the older 'A' alias is deprecated in recent pandas)
year_2023 = pd.Period('2023', freq='Y')
print(year_2023)

# Create a period representing Q1 2023
q1_2023 = pd.Period('2023Q1', freq='Q')
print(q1_2023)

# Create a period representing March 15, 2023
day_period = pd.Period('2023-03-15', freq='D')
print(day_period)

Output:

2023-01
2023
2023Q1
2023-03-15

As you can see, the freq parameter is crucial when creating Period objects. It specifies the frequency or time span that the Period represents.
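
One way to see what freq controls is to inspect the boundaries of a period. The following is a minimal sketch using the start_time and end_time attributes of Period; the variable names are only illustrative:

```python
import pandas as pd

# The same date string yields different spans depending on freq
month = pd.Period('2023-03-15', freq='M')
day = pd.Period('2023-03-15', freq='D')

print(month, month.start_time, month.end_time)
print(day, day.start_time, day.end_time)

# A timestamp falls inside a period if it lies between the two boundaries
ts = pd.Timestamp('2023-03-20 09:00')
in_month = month.start_time <= ts <= month.end_time
in_day = day.start_time <= ts <= day.end_time
print(in_month, in_day)
```

The monthly period contains the March 20 timestamp, while the daily period for March 15 does not.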

Common Frequency Aliases

Here are some common frequency aliases used with Period objects:

  • D: Calendar day
  • B: Business day
  • W: Week
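  • M: Month
  • Q: Quarter (with the year ending in December by default)
  • Y (older alias: A): Year
  • H: Hour (spelled h in pandas 2.2 and later)
  • T or min: Minute (min is the preferred spelling in pandas 2.2 and later)
  • S: Second (spelled s in pandas 2.2 and later)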

You can also specify custom frequencies with multiples, like 2D for every two days or 3M for every three months.
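
As a rough illustration of multiplied frequencies, the sketch below builds a range that steps three months at a time and shows that adding 1 to a period with a multiplied frequency advances it by one full step. The variable names are made up for this example:

```python
import pandas as pd

# Every third month, starting from January 2023
every_third_month = pd.period_range(start='2023-01', periods=4, freq='3M')
print(every_third_month)

# With a multiplied frequency, adding 1 moves forward by one step (two months here)
two_month_period = pd.Period('2023-01', freq='2M')
print(two_month_period + 1)
```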

Converting Between Different Frequencies

You can convert a Period from one frequency to another using the asfreq() method:

python
# Start with a monthly period
monthly_period = pd.Period('2023-01', freq='M')
print(f"Monthly period: {monthly_period}")

# Convert to a daily period (defaults to the last day of the month)
daily_period = monthly_period.asfreq('D')
print(f"Daily period (end of month): {daily_period}")

# Convert to a daily period at the start of the month
daily_period_start = monthly_period.asfreq('D', how='start')
print(f"Daily period (start of month): {daily_period_start}")

# Convert to quarterly
quarterly_period = monthly_period.asfreq('Q')
print(f"Quarterly period: {quarterly_period}")

Output:

Monthly period: 2023-01
Daily period (end of month): 2023-01-31
Daily period (start of month): 2023-01-01
Quarterly period: 2023Q1
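
The how argument matters whenever the target frequency is finer than the source. The sketch below, which assumes the default calendar-year quarters, converts a quarter to its first and last month and then applies asfreq() to a whole PeriodIndex:

```python
import pandas as pd

quarter = pd.Period('2023Q1', freq='Q')

# A quarter spans three months, so `how` picks which one to keep
first_month = quarter.asfreq('M', how='start')
last_month = quarter.asfreq('M', how='end')
print(first_month, last_month)

# asfreq() also works element-wise on a PeriodIndex
months = pd.period_range('2023-01', periods=6, freq='M')
quarters = months.asfreq('Q')
print(quarters)
```

Converting to a coarser frequency, as in the last step, maps each month to the quarter that contains it, so several months share the same label.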

Performing Arithmetic with Periods

You can perform addition and subtraction with Period objects:

python
# Start with January 2023
current_period = pd.Period('2023-01', freq='M')
print(f"Current period: {current_period}")

# Add 3 months
next_quarter = current_period + 3
print(f"Three months later: {next_quarter}")

# Subtract 1 year
previous_year = current_period - 12
print(f"One year earlier: {previous_year}")

# Find the difference between two periods
# (recent pandas versions return an offset such as <17 * MonthEnds>; its .n attribute holds the count)
future_period = pd.Period('2024-06', freq='M')
months_between = (future_period - current_period).n
print(f"Months between {current_period} and {future_period}: {months_between}")

Output:

Current period: 2023-01
Three months later: 2023-04
One year earlier: 2022-01
Months between 2023-01 and 2024-06: 17
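
Arithmetic between two periods only works when both share the same frequency. The sketch below shows what happens otherwise and one way around it, assuming a recent pandas version in which the difference of two periods is an offset object with an .n attribute:

```python
import pandas as pd

month = pd.Period('2023-01', freq='M')
day = pd.Period('2023-03-15', freq='D')

# Subtracting periods with different frequencies is rejected
try:
    day - month
except ValueError as exc:
    print(f"Cannot subtract: {exc}")

# Converting to a common frequency first makes the difference well defined
months_apart = (day.asfreq('M') - month).n
print(months_apart)
```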

Creating Period Ranges

Similar to date_range() for timestamps, Pandas offers period_range() for creating sequences of Period objects:

python
# Create a range of monthly periods for the year 2023
months_2023 = pd.period_range(start='2023-01', end='2023-12', freq='M')
print("Monthly periods for 2023:")
print(months_2023)

# Create 4 quarters starting from Q1 2023
quarters = pd.period_range(start='2023Q1', periods=4, freq='Q')
print("\nQuarterly periods:")
print(quarters)

# Create business days for January 2023
business_days = pd.period_range(start='2023-01-01', end='2023-01-31', freq='B')
print(f"\nNumber of business days in January 2023: {len(business_days)}")
print(business_days[:5]) # Show first 5 business days

Output:

Monthly periods for 2023:
PeriodIndex(['2023-01', '2023-02', '2023-03', '2023-04', '2023-05', '2023-06',
'2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12'],
dtype='period[M]')

Quarterly periods:
PeriodIndex(['2023Q1', '2023Q2', '2023Q3', '2023Q4'], dtype='period[Q-DEC]')

Number of business days in January 2023: 22
PeriodIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06'],
dtype='period[B]')

Using PeriodIndex in DataFrames

One of the most powerful applications of Period objects is to use them as an index in a DataFrame, which enables easy time-based grouping and analysis:

python
import numpy as np

# Create a DataFrame with PeriodIndex
months = pd.period_range('2023-01', periods=12, freq='M')
data = {
    'sales': np.random.randint(100, 500, size=12),
    'expenses': np.random.randint(50, 300, size=12)
}

df = pd.DataFrame(data, index=months)
print(df.head())

# Calculate profit
df['profit'] = df['sales'] - df['expenses']

# Group by quarter
quarterly_data = df.resample('Q').sum()
print("\nQuarterly data:")
print(quarterly_data)

# Group into two-month bins
bimonthly_data = df.resample('2M').sum()
print("\nBi-monthly data:")
print(bimonthly_data)

Output (Note: Your random values will differ):

         sales  expenses
2023-01    345       176
2023-02    269       128
2023-03    478       271
2023-04    394       191
2023-05    153        68

Quarterly data:
        sales  expenses  profit
2023Q1   1092       575     517
2023Q2    869       400     469
2023Q3    938       457     481
2023Q4    861       400     461

Bi-monthly data:
         sales  expenses  profit
2023-01    345       176     169
2023-03    747       399     348
2023-05    547       259     288
2023-07    441       198     243
2023-09    497       259     238
2023-11    364       141     223
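
An alternative that avoids resample() altogether, and so does not depend on how a given pandas version handles resampling of a PeriodIndex, is to group on the index converted to the coarser frequency. This is only a sketch with a small made-up DataFrame so the sums are easy to verify by hand:

```python
import pandas as pd

# Small deterministic frame (values are invented for illustration)
months = pd.period_range('2023-01', periods=6, freq='M')
df = pd.DataFrame({'sales': [100, 200, 300, 400, 500, 600]}, index=months)

# Map each month to its quarter, then aggregate per quarter
quarterly = df.groupby(df.index.asfreq('Q')).sum()
print(quarterly)
```

Here the first quarter sums January through March and the second sums April through June.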

Real-World Example: Monthly Sales Analysis

Let's see a more complete example of analyzing monthly sales data:

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create sample sales data
np.random.seed(42) # For reproducible results
date_range = pd.period_range(start='2020-01', end='2023-12', freq='M')
sales = 1000 + np.random.normal(0, 100, len(date_range)) + \
np.sin(np.arange(len(date_range)) * 2 * np.pi / 12) * 300 # Adding seasonality

# Create DataFrame
sales_data = pd.DataFrame({'sales': sales}, index=date_range)

# Calculate year-over-year growth
sales_data['yoy_growth'] = sales_data['sales'].pct_change(12) * 100

# Calculate rolling 3-month average
sales_data['3M_avg'] = sales_data['sales'].rolling(window=3).mean()

# Print summary
print(sales_data.head())

# Group by year
yearly_sales = sales_data['sales'].groupby(sales_data.index.year).sum()
print("\nYearly Sales:")
print(yearly_sales)

# Group by quarter
quarterly_sales = sales_data['sales'].groupby([sales_data.index.year, sales_data.index.quarter]).sum()
print("\nQuarterly Sales (First 4 quarters):")
print(quarterly_sales.head(4))

# Plot the data (you'd need to run this in a Jupyter notebook or script to see the plot)
plt.figure(figsize=(12, 6))
sales_data['sales'].plot(label='Monthly Sales')
sales_data['3M_avg'].plot(label='3-Month Moving Average', linewidth=2)
plt.title('Monthly Sales with 3-Month Moving Average')
plt.legend()
plt.grid(True)
# plt.show()

Output (partial; the figures below are illustrative and may not match your run exactly):

           sales  yoy_growth     3M_avg
2020-01   931.60         NaN        NaN
2020-02   809.37         NaN        NaN
2020-03   915.33         NaN   885.4334
2020-04  1175.98         NaN   966.8947
2020-05  1263.15         NaN  1118.1533

Yearly Sales:
2020    13319.124132
2021    12436.450829
2022    12739.972705
2023    13232.977411
Name: sales, dtype: float64

Quarterly Sales (First 4 quarters):
2020  1    2656.304298
      2    3337.044867
      3    3739.226160
      4    3586.548808
Name: sales, dtype: float64

In this example, we:

  1. Created monthly sales data with seasonal patterns
  2. Calculated year-over-year growth and moving averages
  3. Grouped data by different time periods (year, quarter)
  4. Created a visualization of the trends

This demonstrates how Period objects enable powerful time-series analysis and make it easy to work with time-based groupings.
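
To make the year-over-year step easier to check, here is a small sketch with invented numbers in which the second year is exactly 10% above the first. It relies on pct_change(12) comparing each month with the value twelve rows earlier, which on a monthly index is the same month of the previous year:

```python
import pandas as pd

# Two years of made-up monthly sales: the second year is 10% above the first
months = pd.period_range('2022-01', periods=24, freq='M')
first_year = [100.0 + 10 * i for i in range(12)]
second_year = [value * 1.1 for value in first_year]
sales = pd.Series(first_year + second_year, index=months)

# Compare each month with the same month one year earlier
yoy = sales.pct_change(12) * 100

# The first 12 months have no prior-year value, so their growth is NaN
print(yoy.iloc[:12].isna().all())
print(yoy.iloc[12:].round(2).unique())
```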

PeriodIndex vs. DatetimeIndex

You might be wondering when to use PeriodIndex versus DatetimeIndex. Here's a quick comparison:

  • PeriodIndex: Represents time spans (e.g., January 2023, Q1 2023)
    • Best for: Financial reporting, seasonal analysis, and when you want to think in terms of "months" or "quarters" rather than specific dates
    • Frequency is explicit in the index itself
  • DatetimeIndex: Represents specific points in time (e.g., 2023-01-15 14:30:00)
    • Best for: Event data, time-stamped logs, and high-frequency data
    • More flexible for irregular time series

python
# PeriodIndex example - Monthly data
period_df = pd.DataFrame({
    'value': [1, 2, 3]
}, index=pd.period_range('2023-01', periods=3, freq='M'))

# DatetimeIndex example - Specific dates
datetime_df = pd.DataFrame({
    'value': [1, 2, 3]
}, index=pd.date_range('2023-01-01', periods=3, freq='MS'))

print("PeriodIndex DataFrame:")
print(period_df)
print("\nDatetimeIndex DataFrame:")
print(datetime_df)

Output:

PeriodIndex DataFrame:
value
2023-01 1
2023-02 2
2023-03 3

DatetimeIndex DataFrame:
value
2023-01-01 1
2023-02-01 2
2023-03-01 3

Converting Between Timestamps and Periods

You can convert between datetime timestamps and periods:

python
# Convert from timestamps to periods
# (for date_range, 'M' means month end; pandas 2.2 and later prefer the alias 'ME')
dates = pd.date_range('2023-01-01', periods=3, freq='M')
periods = dates.to_period('M')
print(f"Dates: {dates}")
print(f"Periods: {periods}")

# Convert from periods to timestamps (start of period)
periods = pd.period_range('2023-01', periods=3, freq='M')
start_dates = periods.to_timestamp(how='start')
print(f"\nPeriods: {periods}")
print(f"Start dates: {start_dates}")

# Convert from periods to timestamps (end of period)
end_dates = periods.to_timestamp(how='end')
print(f"End dates: {end_dates}")

Output:

Dates: DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31'], dtype='datetime64[ns]', freq='M')
Periods: PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')

Periods: PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')
Start dates: DatetimeIndex(['2023-01-01', '2023-02-01', '2023-03-01'], dtype='datetime64[ns]', freq=None)
End dates: DatetimeIndex(['2023-01-31 23:59:59.999999999', '2023-02-28 23:59:59.999999999', '2023-03-31 23:59:59.999999999'], dtype='datetime64[ns]', freq=None)
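
The same conversions work on single values. The sketch below shows that any moment inside a month maps to the same monthly period, and that normalize() can be chained after to_timestamp(how='end') when only the calendar date of the period end is wanted:

```python
import pandas as pd

# Any moment inside January 2023 maps to the same monthly period
ts = pd.Timestamp('2023-01-15 14:30:00')
period = ts.to_period('M')
print(period)

# how='end' points at the end of each period;
# normalize() resets the time of day to midnight, leaving just the date
periods = pd.period_range('2023-01', periods=3, freq='M')
end_dates = periods.to_timestamp(how='end').normalize()
print(end_dates)
```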

Summary

Pandas Period objects provide a powerful way to represent and work with time spans in your data analysis:

  • Period Objects represent time spans like days, months, quarters, or years
  • Creating Periods involves specifying a date string and frequency
  • Period Arithmetic lets you add/subtract time spans and find differences
  • PeriodIndex enables time-based grouping and aggregation in DataFrames
  • Frequency Conversion allows you to move between different time spans
  • Resampling supports aggregating data at different time frequencies

By using Periods effectively, your time series analysis becomes more intuitive, especially when working with financial data, seasonal patterns, or any data that naturally groups into time intervals.

Exercises

To reinforce your learning, try these exercises:

  1. Create a DataFrame with monthly sales data for 3 years, and calculate the month-over-month percentage change.
  2. Using a PeriodIndex with quarterly frequency, find the best and worst performing quarters.
  3. Convert a dataset with daily data to monthly summaries using period frequency conversion.
  4. Create a custom business quarter system where Q1 starts in February instead of January.
  5. Compare the average values of your data by day of week using Period objects.
