Pandas Pivot

When working with data in pandas, you'll often need to reorganize or restructure it to make it more suitable for analysis or visualization. The pivot() function is a powerful tool that allows you to reshape your data by specifying which columns should become the new index, columns, and values.

What is Pivot?

The pivot operation allows you to transform your data from a "long" format to a "wide" format. It's similar to creating pivot tables in spreadsheet applications like Excel. This operation is particularly useful when you want to:

Reorganize your data for better visualization
Prepare your data for specific types of analysis
Convert stacked data into a more readable tabular format

Basic Syntax

The basic syntax of the pivot() function is:

python
DataFrame.pivot(index=None, columns=None, values=None)

Where:

index: Column to use for the resulting table's index
columns: Column to use for the resulting table's columns
values: Column(s) to use for populating the values in the resulting table

Simple Example

Let's start with a basic example to understand how pivoting works:

python
import pandas as pd
import numpy as np

# Create a simple DataFrame
data = {
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 120, 180]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
         date product  sales
0  2023-01-01       A    100
1  2023-01-01       B    150
2  2023-01-02       A    120
3  2023-01-02       B    180

Now, let's pivot this data to have dates as rows, products as columns, and sales as values:

python
pivoted_df = df.pivot(index='date', columns='product', values='sales')
print("\nPivoted DataFrame:")
print(pivoted_df)

Output:

Pivoted DataFrame:
product           A      B
date                      
2023-01-01  100.0  150.0
2023-01-02  120.0  180.0

The pivot operation has restructured our data so that:

The unique values from the date column became the index
The unique values from the product column became the column headers
The corresponding values from the sales column filled the table

Handling Duplicate Values

One limitation of the pivot() function is that it requires unique combinations of index and column values. If there are duplicate combinations, you'll get a ValueError.

Let's see what happens if we have duplicate data:

python
# Create a DataFrame with duplicate combinations
data_with_duplicates = {
    'date': ['2023-01-01', '2023-01-01', '2023-01-01', '2023-01-02'],
    'product': ['A', 'A', 'B', 'B'],
    'sales': [100, 110, 150, 180]
}

df_duplicates = pd.DataFrame(data_with_duplicates)
print("DataFrame with duplicates:")
print(df_duplicates)

Output:

DataFrame with duplicates:
         date product  sales
0  2023-01-01       A    100
1  2023-01-01       A    110
2  2023-01-01       B    150
3  2023-01-02       B    180

If we try to pivot this data:

python
try:
    df_duplicates.pivot(index='date', columns='product', values='sales')
except ValueError as e:
    print(f"\nError: {e}")

Output:

Error: Index contains duplicate entries, cannot reshape

To handle duplicates, you can use the pivot_table() function instead, which can aggregate duplicate values:

python
pivot_table_df = df_duplicates.pivot_table(index='date', columns='product', values='sales', aggfunc='mean')
print("\nPivot Table (with mean aggregation):")
print(pivot_table_df)

Output:

Pivot Table (with mean aggregation):
product           A      B
date                      
2023-01-01  105.0  150.0
2023-01-02    NaN  180.0

Multiple Value Columns

You can also pivot multiple value columns at once:

python
# Create DataFrame with multiple value columns
data_multi = {
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 120, 180],
    'units': [10, 15, 12, 18]
}

df_multi = pd.DataFrame(data_multi)
print("DataFrame with multiple value columns:")
print(df_multi)

Output:

DataFrame with multiple value columns:
         date product  sales  units
0  2023-01-01       A    100     10
1  2023-01-01       B    150     15
2  2023-01-02       A    120     12
3  2023-01-02       B    180     18

python
# To pivot with multiple value columns, we need to use pivot_table
multi_pivot = df_multi.pivot_table(index='date', columns='product')
print("\nPivoted DataFrame with multiple value columns:")
print(multi_pivot)

Output:

Pivoted DataFrame with multiple value columns:
              sales          units       
product           A      B      A     B
date                                    
2023-01-01  100.0  150.0   10.0  15.0
2023-01-02  120.0  180.0   12.0  18.0

Notice how this creates a hierarchical column structure with sales and units as the top level, and the product types as the second level.

Real-World Example: Temperature Analysis

Let's look at a more practical example. Suppose you have temperature data for different cities over several days, and you want to analyze and visualize it effectively.

python
# Temperature data for different cities
temp_data = {
    'date': ['2023-01-01', '2023-01-01', '2023-01-01',
             '2023-01-02', '2023-01-02', '2023-01-02',
             '2023-01-03', '2023-01-03', '2023-01-03'],
    'city': ['New York', 'Chicago', 'San Francisco',
             'New York', 'Chicago', 'San Francisco',
             'New York', 'Chicago', 'San Francisco'],
    'temperature': [32, 28, 60, 30, 25, 59, 31, 27, 58]
}

temp_df = pd.DataFrame(temp_data)
print("Temperature Data:")
print(temp_df)

Output:

Temperature Data:
         date          city  temperature
2023-01-01      New York           32
2023-01-01       Chicago           28
2023-01-01  San Francisco          60
2023-01-02      New York           30
2023-01-02       Chicago           25
2023-01-02  San Francisco          59
2023-01-03      New York           31
2023-01-03       Chicago           27
2023-01-03  San Francisco          58

Let's pivot this to get a clearer view of temperature trends by city:

python
temp_pivoted = temp_df.pivot(index='date', columns='city', values='temperature')
print("\nPivoted Temperature Data:")
print(temp_pivoted)

Output:

Pivoted Temperature Data:
city        Chicago  New York  San Francisco
date                                        
2023-01-01       28        32             60
2023-01-02       25        30             59
2023-01-03       27        31             58

Now you can easily compare temperatures across cities and days. You might visualize this data with:

python
import matplotlib.pyplot as plt

# Plot temperature trends
temp_pivoted.plot(kind='line', marker='o', figsize=(10, 6))
plt.title('Temperature Trends by City')
plt.ylabel('Temperature (°F)')
plt.grid(True)
plt.tight_layout()

Reshaping Back: From Wide to Long Format

After pivoting data from long to wide format, you might want to convert it back. You can use the melt() function for this purpose, which is essentially the reverse operation of pivot:

python
# Starting with our pivoted temperature data
print("Pivoted (wide) format:")
print(temp_pivoted)

# Melt back to long format
melted_df = temp_pivoted.reset_index().melt(id_vars='date', 
                                           value_name='temperature',
                                           var_name='city')
print("\nMelted (long) format:")
print(melted_df)

Output:

Pivoted (wide) format:
city        Chicago  New York  San Francisco
date                                        
2023-01-01       28        32             60
2023-01-02       25        30             59
2023-01-03       27        31             58

Melted (long) format:
         date          city  temperature
0  2023-01-01       Chicago           28
1  2023-01-02       Chicago           25
2  2023-01-03       Chicago           27
3  2023-01-01      New York           32
4  2023-01-02      New York           30
5  2023-01-03      New York           31
6  2023-01-01  San Francisco           60
7  2023-01-02  San Francisco           59
8  2023-01-03  San Francisco           58

Practical Use Case: Sales Analysis

Let's explore a more complex example analyzing sales data:

python
# Create sales data
sales_data = {
    'date': pd.date_range(start='2023-01-01', periods=12, freq='D').repeat(3),
    'product': ['Laptop', 'Phone', 'Tablet'] * 12,
    'region': ['North', 'South', 'East', 'West'] * 9,
    'sales': np.random.randint(5, 50, size=36),
    'units': np.random.randint(1, 10, size=36)
}

sales_df = pd.DataFrame(sales_data)
print("Sales Data Sample:")
print(sales_df.head(10))

We can pivot this data to analyze sales by date and product:

python
# Pivot to analyze daily sales by product
daily_product_sales = sales_df.pivot_table(
    index='date', 
    columns='product', 
    values='sales',
    aggfunc='sum'
)

print("\nDaily Product Sales:")
print(daily_product_sales.head())

# Pivot to analyze regional sales by product
regional_product_sales = sales_df.pivot_table(
    index='region', 
    columns='product', 
    values=['sales', 'units'],
    aggfunc='sum'
)

print("\nRegional Product Sales:")
print(regional_product_sales)

This type of reshaping makes it much easier to spot trends, compare performance across categories, and create meaningful visualizations.

Summary

The pivot() function in pandas is a powerful tool for reshaping data from long to wide format. It allows you to:

Reorganize your data for better readability and analysis
Transform data for visualization
Create cross-tabulations similar to Excel pivot tables

Key points to remember:

Basic syntax: DataFrame.pivot(index, columns, values)
Requires unique combinations of index and column values
Use pivot_table() when you have duplicate entries that need aggregation
Can handle multiple value columns, creating hierarchical column structures
Pairs well with melt() for converting back from wide to long format

Exercises

Practice pivoting a dataset with temperature data for different cities over time.
Create a dataset with sales information and use pivot to analyze sales by product and region.
Use pivot_table() with different aggregation functions (sum, mean, max) on a dataset with duplicate entries.
Pivot a dataset with multiple value columns and analyze the resulting hierarchical column structure.
Use pivot to prepare data for a visualization showing trends over time.

Additional Resources

Happy pivoting!

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

What is Pivot?​

Basic Syntax​

Simple Example​

Handling Duplicate Values​

Multiple Value Columns​

Real-World Example: Temperature Analysis​

Reshaping Back: From Wide to Long Format​

Practical Use Case: Sales Analysis​

Summary​

Exercises​

Additional Resources​

What is Pivot?

Basic Syntax

Simple Example

Handling Duplicate Values

Multiple Value Columns

Real-World Example: Temperature Analysis

Reshaping Back: From Wide to Long Format

Practical Use Case: Sales Analysis

Summary

Exercises

Additional Resources