Pandas Period
Introduction
In data analysis, working with time-based data is extremely common. Whether you're analyzing sales trends, monitoring website traffic, or processing sensor data, understanding how to represent and manipulate time periods is essential.
Pandas, a powerful data manipulation library in Python, provides a specialized data structure called Period for representing time spans such as days, months, quarters, or years. Unlike Timestamp objects, which represent specific points in time, Period objects represent time spans or intervals, which makes them well suited to time-based grouping and analysis.
In this tutorial, we'll explore how to create and work with Pandas Period objects, understand their unique properties, and see how they fit into time series analysis.
What is a Pandas Period?
A Period represents a span of time, such as a day, month, quarter, or year. It differs from a timestamp in that it represents a duration rather than a specific moment. This makes Period objects particularly useful for:
- Financial analysis (like quarterly or monthly reporting)
- Seasonal comparisons
- Any analysis where you want to group by time intervals
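To make the span-versus-moment distinction concrete, here is a minimal sketch. The variable names are illustrative only; it relies on the start_time and end_time attributes that every Period exposes.

```python
import pandas as pd

# A Timestamp is a single instant; a Period covers everything
# between its start_time and its end_time.
moment = pd.Timestamp('2023-01-15 14:30')
january = pd.Period('2023-01', freq='M')

print(january.start_time)  # first instant of the month
print(january.end_time)    # last instant of the month

# The timestamp falls inside the span covered by the period
print(january.start_time <= moment <= january.end_time)
```

Any timestamp in January 2023 falls inside the same monthly Period, which is what makes periods convenient as grouping keys.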
Creating Period Objects
Let's start by creating some basic Period objects:
import pandas as pd
# Create a period representing January 2023
jan_2023 = pd.Period('2023-01', freq='M')
print(jan_2023)
# Create a period representing the year 2023
# ('A' is the older annual alias; pandas 2.2+ prefers 'Y')
year_2023 = pd.Period('2023', freq='A')
print(year_2023)
# Create a period representing Q1 2023
q1_2023 = pd.Period('2023Q1', freq='Q')
print(q1_2023)
# Create a period representing March 15, 2023
day_period = pd.Period('2023-03-15', freq='D')
print(day_period)
Output:
2023-01
2023
2023Q1
2023-03-15
As you can see, the freq parameter is crucial when creating Period objects. It specifies the frequency or time span that the Period represents.
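As a rough illustration of why freq matters, the sketch below builds several periods from the same date string and changes only the frequency. The loop and the variable names are assumptions made for this example.

```python
import pandas as pd

date_string = '2023-03-15'

# Same input string, different frequencies, different spans
labels = {}
for freq in ['D', 'M', 'Q', 'Y']:
    period = pd.Period(date_string, freq=freq)
    labels[freq] = str(period)
    print(f"{freq}: {period} covers {period.start_time.date()} to {period.end_time.date()}")
```

The string only locates the period on the calendar; the frequency decides how wide the span around that date is.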
Common Frequency Aliases
Here are some common frequency aliases used with Period objects:
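- D: Calendar day
- B: Business day
- W: Week
- M: Month
- Q: Quarter (ending in December by default)
- A or Y: Year (ending in December by default)
- H: Hour
- T or min: Minute
- S: Second
In pandas 2.2 and later, the aliases A, H, T, and S are deprecated in favor of Y, h, min, and s.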
You can also specify custom frequencies with multiples, like 2D for every two days or 3M for every three months.
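A short sketch of a multiple frequency in use, assuming a 3M step chosen purely for illustration:

```python
import pandas as pd

# Four periods spaced three months apart
every_three_months = pd.period_range(start='2023-01', periods=4, freq='3M')
print(every_three_months)

# Integer arithmetic moves in units of the multiple
step = pd.Period('2023-01', freq='3M')
print(step + 1)
```

Adding 1 to a period with a 3M frequency advances it by three months, not one.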
Converting Between Different Frequencies
You can convert a Period from one frequency to another using the asfreq() method:
# Start with a monthly period
monthly_period = pd.Period('2023-01', freq='M')
print(f"Monthly period: {monthly_period}")
# Convert to a daily period (defaults to the last day of the month)
daily_period = monthly_period.asfreq('D')
print(f"Daily period (end of month): {daily_period}")
# Convert to a daily period at the start of the month
daily_period_start = monthly_period.asfreq('D', how='start')
print(f"Daily period (start of month): {daily_period_start}")
# Convert to quarterly
quarterly_period = monthly_period.asfreq('Q')
print(f"Quarterly period: {quarterly_period}")
Output:
Monthly period: 2023-01
Daily period (end of month): 2023-01-31
Daily period (start of month): 2023-01-01
Quarterly period: 2023Q1
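The how argument matters whenever you convert to a finer frequency, because a longer period contains several shorter ones. A minimal sketch with a quarterly period (names are illustrative):

```python
import pandas as pd

quarter = pd.Period('2023Q1', freq='Q')

# A quarter contains three months; 'how' picks which one to keep
first_month = quarter.asfreq('M', how='start')
last_month = quarter.asfreq('M', how='end')
print(first_month, last_month)

# Converting back up to a coarser frequency needs no 'how':
# each month belongs to exactly one quarter
print(last_month.asfreq('Q'))
```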
Performing Arithmetic with Periods
You can perform addition and subtraction with Period objects:
# Start with January 2023
current_period = pd.Period('2023-01', freq='M')
print(f"Current period: {current_period}")
# Add 3 months
next_quarter = current_period + 3
print(f"Three months later: {next_quarter}")
# Subtract 1 year
previous_year = current_period - 12
print(f"One year earlier: {previous_year}")
# Find the difference between two periods
# (recent pandas versions return an offset object; .n extracts the count)
future_period = pd.Period('2024-06', freq='M')
months_between = (future_period - current_period).n
print(f"Months between {current_period} and {future_period}: {months_between}")
Output:
Current period: 2023-01
Three months later: 2023-04
One year earlier: 2022-01
Months between 2023-01 and 2024-06: 17
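Two details of period arithmetic are easy to trip over. The sketch below assumes a recent pandas version, where subtracting two periods returns an offset object rather than a plain integer, and where periods with different frequencies cannot be subtracted directly.

```python
import pandas as pd

start = pd.Period('2023-01', freq='M')
end = pd.Period('2024-06', freq='M')

# The difference is an offset; its .n attribute holds the count
difference = end - start
print(type(difference).__name__)
print(difference.n)

# Periods with different frequencies are not directly comparable
try:
    pd.Period('2023', freq='Y') - start
    mismatch_raised = False
except ValueError as error:
    mismatch_raised = True
    print(f"Cannot subtract: {error}")
```

If you need to combine periods of different frequencies, convert one of them with asfreq() first.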
Creating Period Ranges
Similar to date_range() for timestamps, Pandas offers period_range() for creating sequences of Period objects:
# Create a range of monthly periods for the year 2023
months_2023 = pd.period_range(start='2023-01', end='2023-12', freq='M')
print("Monthly periods for 2023:")
print(months_2023)
# Create 4 quarters starting from Q1 2023
quarters = pd.period_range(start='2023Q1', periods=4, freq='Q')
print("\nQuarterly periods:")
print(quarters)
# Create business days for January 2023
business_days = pd.period_range(start='2023-01-01', end='2023-01-31', freq='B')
print(f"\nNumber of business days in January 2023: {len(business_days)}")
print(business_days[:5]) # Show first 5 business days
Output:
Monthly periods for 2023:
PeriodIndex(['2023-01', '2023-02', '2023-03', '2023-04', '2023-05', '2023-06',
'2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12'],
dtype='period[M]')
Quarterly periods:
PeriodIndex(['2023Q1', '2023Q2', '2023Q3', '2023Q4'], dtype='period[Q-DEC]')
Number of business days in January 2023: 22
PeriodIndex(['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06'],
dtype='period[B]')
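A PeriodIndex also exposes calendar attributes that can be used for filtering. The following sketch selects the second-quarter months from a monthly range; the variable names are illustrative.

```python
import pandas as pd

months_2023 = pd.period_range(start='2023-01', end='2023-12', freq='M')

# Attributes such as .quarter work element-wise, so they can build a mask
second_quarter = months_2023[months_2023.quarter == 2]
print(second_quarter)

# Number of days in each of the first three months
print(months_2023.days_in_month[:3])
```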
Using PeriodIndex in DataFrames
One of the most powerful applications of Period objects is to use them as an index in a DataFrame, which enables easy time-based grouping and analysis:
import numpy as np
# Create a DataFrame with PeriodIndex
months = pd.period_range('2023-01', periods=12, freq='M')
data = {
'sales': np.random.randint(100, 500, size=12),
'expenses': np.random.randint(50, 300, size=12)
}
df = pd.DataFrame(data, index=months)
print(df.head())
# Calculate profit
df['profit'] = df['sales'] - df['expenses']
# Group by quarter
quarterly_data = df.resample('Q').sum()
print("\nQuarterly data:")
print(quarterly_data)
# Group into two-month blocks
bimonthly_data = df.resample('2M').sum()
print("\nBi-monthly data:")
print(bimonthly_data)
Output (Note: Your random values will differ):
sales expenses
2023-01 345 176
2023-02 269 128
2023-03 478 271
2023-04 394 191
2023-05 153 68
Quarterly data:
sales expenses profit
2023Q1 1092 575 517
2023Q2 869 400 469
2023Q3 938 457 481
2023Q4 861 400 461
Bi-monthly data:
sales expenses profit
2023-01 345 176 169
2023-03 747 399 348
2023-05 547 259 288
2023-07 441 198 243
2023-09 497 259 238
2023-11 364 141 223
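As an alternative sketch for quarterly aggregation, you can convert the index itself with asfreq() and group on the result. This is not the approach used above, just another way to reach the same kind of summary. The data here is fixed rather than random so the totals are easy to check by hand.

```python
import pandas as pd

months = pd.period_range('2023-01', periods=12, freq='M')
df = pd.DataFrame({'sales': [100 * n for n in range(1, 13)]}, index=months)

# Map every month to the quarter that contains it, then group on that label
quarter_labels = df.index.asfreq('Q')
quarterly = df.groupby(quarter_labels).sum()
print(quarterly)
```

Because each month maps to exactly one quarter, the grouping keys are unambiguous and the result is indexed by quarterly periods.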
Real-World Example: Monthly Sales Analysis
Let's see a more complete example of analyzing monthly sales data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create sample sales data
np.random.seed(42) # For reproducible results
date_range = pd.period_range(start='2020-01', end='2023-12', freq='M')
sales = 1000 + np.random.normal(0, 100, len(date_range)) + \
np.sin(np.arange(len(date_range)) * 2 * np.pi / 12) * 300 # Adding seasonality
# Create DataFrame
sales_data = pd.DataFrame({'sales': sales}, index=date_range)
# Calculate year-over-year growth
sales_data['yoy_growth'] = sales_data['sales'].pct_change(12) * 100
# Calculate rolling 3-month average
sales_data['3M_avg'] = sales_data['sales'].rolling(window=3).mean()
# Print summary
print(sales_data.head())
# Group by year
yearly_sales = sales_data['sales'].groupby(sales_data.index.year).sum()
print("\nYearly Sales:")
print(yearly_sales)
# Group by quarter
quarterly_sales = sales_data['sales'].groupby([sales_data.index.year, sales_data.index.quarter]).sum()
print("\nQuarterly Sales (First 4 quarters):")
print(quarterly_sales.head(4))
# Plot the data (you'd need to run this in a Jupyter notebook or script to see the plot)
plt.figure(figsize=(12, 6))
sales_data['sales'].plot(label='Monthly Sales')
sales_data['3M_avg'].plot(label='3-Month Moving Average', linewidth=2)
plt.title('Monthly Sales with 3-Month Moving Average')
plt.legend()
plt.grid(True)
# plt.show()
Output (partial; illustrative values, your exact figures may differ):
sales yoy_growth 3M_avg
2020-01 931.60 NaN NaN
2020-02 809.37 NaN NaN
2020-03 915.33 NaN 885.4334
2020-04 1175.98 NaN 966.8947
2020-05 1263.15 NaN 1118.1533
Yearly Sales:
2020 13319.124132
2021 12436.450829
2022 12739.972705
2023 13232.977411
Name: sales, dtype: float64
Quarterly Sales (First 4 quarters):
2020 1 2656.304298
2 3337.044867
3 3739.226160
4 3586.548808
Name: sales, dtype: float64
In this example, we:
- Created monthly sales data with seasonal patterns
- Calculated year-over-year growth and moving averages
- Grouped data by different time periods (year, quarter)
- Created a visualization of the trends
This demonstrates how Period objects enable powerful time-series analysis and make it easy to work with time-based groupings.
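To check the year-over-year logic on numbers you can verify by hand, here is a small sketch with made-up, deterministic data (a series that grows by 1 each month). It also shows selecting a whole year from a PeriodIndex with a partial string.

```python
import numpy as np
import pandas as pd

months = pd.period_range(start='2020-01', end='2021-12', freq='M')
sales = pd.Series(100.0 + np.arange(len(months)), index=months)

# Partial string selection: every month that falls in 2021
sales_2021 = sales.loc['2021']
print(len(sales_2021))

# pct_change(12) compares each month with the same month one year earlier
yoy = sales.pct_change(12)
jan_2021 = pd.Period('2021-01', freq='M')
print(round(yoy.loc[jan_2021], 4))  # (112 - 100) / 100
```

The first twelve values of yoy are NaN because there is no earlier year to compare against, which matches the NaN entries in the yoy_growth column above.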
PeriodIndex vs. DatetimeIndex
You might be wondering when to use PeriodIndex versus DatetimeIndex. Here's a quick comparison:
- PeriodIndex: Represents time spans (e.g., January 2023, Q1 2023)
  - Best for: Financial reporting, seasonal analysis, and when you want to think in terms of "months" or "quarters" rather than specific dates
  - Frequency is explicit in the index itself
- DatetimeIndex: Represents specific points in time (e.g., 2023-01-15 14:30:00)
  - Best for: Event data, time-stamped logs, and high-frequency data
  - More flexible for irregular time series
# PeriodIndex example - Monthly data
period_df = pd.DataFrame({
'value': [1, 2, 3]
}, index=pd.period_range('2023-01', periods=3, freq='M'))
# DatetimeIndex example - Specific dates
datetime_df = pd.DataFrame({
'value': [1, 2, 3]
}, index=pd.date_range('2023-01-01', periods=3, freq='MS'))
print("PeriodIndex DataFrame:")
print(period_df)
print("\nDatetimeIndex DataFrame:")
print(datetime_df)
Output:
PeriodIndex DataFrame:
value
2023-01 1
2023-02 2
2023-03 3
DatetimeIndex DataFrame:
value
2023-01-01 1
2023-02-01 2
2023-03-01 3
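The two index types often work together: irregular events arrive with timestamps and are then bucketed into periods for reporting. A minimal sketch with made-up event times:

```python
import pandas as pd

# Irregular event timestamps (hypothetical data for illustration)
events = pd.DatetimeIndex(['2023-01-03 09:15', '2023-01-27 18:40', '2023-02-11 07:05'])
event_log = pd.Series(1, index=events)

# Bucket each event into the month that contains it, then count
events_per_month = event_log.groupby(events.to_period('M')).sum()
print(events_per_month)
```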
Converting Between Timestamps and Periods
You can convert between datetime timestamps and periods:
# Convert from timestamps to periods
# ('M' is the older month-end alias for date_range; pandas 2.2+ prefers 'ME')
dates = pd.date_range('2023-01-01', periods=3, freq='M')
periods = dates.to_period('M')
print(f"Dates: {dates}")
print(f"Periods: {periods}")
# Convert from periods to timestamps (start of period)
periods = pd.period_range('2023-01', periods=3, freq='M')
start_dates = periods.to_timestamp(how='start')
print(f"\nPeriods: {periods}")
print(f"Start dates: {start_dates}")
# Convert from periods to timestamps (end of period)
end_dates = periods.to_timestamp(how='end')
print(f"End dates: {end_dates}")
Output:
Dates: DatetimeIndex(['2023-01-31', '2023-02-28', '2023-03-31'], dtype='datetime64[ns]', freq='M')
Periods: PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')
Periods: PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')
Start dates: DatetimeIndex(['2023-01-01', '2023-02-01', '2023-03-01'], dtype='datetime64[ns]', freq=None)
End dates: DatetimeIndex(['2023-01-31 23:59:59.999999999', '2023-02-28 23:59:59.999999999', '2023-03-31 23:59:59.999999999'], dtype='datetime64[ns]', freq=None)
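Keep in mind that converting a timestamp to a period discards the detail below the period's frequency. A short sketch of a round trip, using an arbitrary example timestamp:

```python
import pandas as pd

stamp = pd.Timestamp('2023-02-17 10:30')

# Timestamp -> Period keeps only the month
month = stamp.to_period('M')

# Period -> Timestamp returns the start of the period by default
back = month.to_timestamp()
print(month, back)

# The day and time of the original timestamp are gone
print(back == stamp)
```

If you need the original timestamps later, keep them in a separate column before converting.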
Summary
Pandas Period objects provide a powerful way to represent and work with time spans in your data analysis:
- Period Objects represent time spans like days, months, quarters, or years
- Creating Periods involves specifying a date string and frequency
- Period Arithmetic lets you add/subtract time spans and find differences
- PeriodIndex enables time-based grouping and aggregation in DataFrames
- Frequency Conversion allows you to move between different time spans
- Resampling supports aggregating data at different time frequencies
By using Periods effectively, your time series analysis becomes more intuitive, especially when working with financial data, seasonal patterns, or any data that naturally groups into time intervals.
Exercises
To reinforce your learning, try these exercises:
- Create a DataFrame with monthly sales data for 3 years, and calculate the month-over-month percentage change.
- Using a PeriodIndex with quarterly frequency, find the best and worst performing quarters.
- Convert a dataset with daily data to monthly summaries using period frequency conversion.
- Create a custom business quarter system where Q1 starts in February instead of January.
- Compare the average values of your data by day of week using Period objects.