Pandas Bar Plots
Bar plots are one of the most common and effective ways to visualize categorical data. They're especially useful for comparing values across different categories. In this tutorial, we'll explore how to create stunning bar plots using Pandas' built-in plotting capabilities.
Introduction to Bar Plots in Pandas
Pandas provides simple yet powerful functionality for creating bar plots directly from DataFrames and Series. Under the hood, Pandas leverages Matplotlib to generate these visualizations, but offers a more streamlined API that's perfect for quick data exploration.
Bar plots are ideal for:
- Comparing values across categories
- Displaying frequencies or counts
- Showing distribution of categorical data
- Visualizing before/after scenarios
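As a quick illustration of the "frequencies or counts" use case, here is a minimal sketch. The survey answers are made-up data, and the variable names are only for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical survey answers; the values are invented for this sketch.
answers = pd.Series(["Yes", "No", "Yes", "Maybe", "Yes", "No"], name="answer")

# value_counts() tallies each category, ordered from most to least frequent.
counts = answers.value_counts()

# The Series index becomes the category labels, the counts the bar heights.
fig, ax = plt.subplots(figsize=(6, 4))
counts.plot(kind="bar", ax=ax, rot=0, title="Survey answers")
ax.set_ylabel("Count")
```

Call plt.show() afterwards to display the figure when running this as a script.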
Let's dive in and learn how to create these useful visualizations!
Basic Bar Plot with Pandas
To get started, we'll need to import the necessary libraries and create some sample data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Set style for better aesthetics
plt.style.use('seaborn-v0_8')
# For Jupyter notebooks, use this to show plots inline
%matplotlib inline
# Create sample data
data = {'Product': ['Laptop', 'Phone', 'Monitor', 'Keyboard', 'Mouse'],
        'Sales': [300, 400, 150, 80, 120]}
df = pd.DataFrame(data)
print(df)
This will output:
    Product  Sales
0    Laptop    300
1     Phone    400
2   Monitor    150
3  Keyboard     80
4     Mouse    120
Now, let's create a simple bar plot showing the sales of each product:
# Create a basic bar plot
df.plot(kind='bar', x='Product', y='Sales', figsize=(10, 6))
plt.title('Product Sales Comparison')
plt.ylabel('Sales (units)')
plt.xlabel('Products')
plt.show()
In this example:
- kind='bar' specifies that we want a bar plot
- x='Product' sets the x-axis labels
- y='Sales' determines the height of each bar
- figsize=(10, 6) sets the figure dimensions
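The same plot can also be written with the plot.bar accessor, and the labels can be set on the Axes object that Pandas returns rather than through pyplot. The following is one possible sketch that rebuilds the sample data so it runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Monitor", "Keyboard", "Mouse"],
    "Sales": [300, 400, 150, 80, 120],
})

# df.plot.bar(...) is equivalent to df.plot(kind="bar", ...).
ax = df.plot.bar(x="Product", y="Sales", figsize=(10, 6), legend=False)

# Labelling through the returned Axes avoids relying on pyplot's
# implicit "current figure" state.
ax.set_title("Product Sales Comparison")
ax.set_xlabel("Products")
ax.set_ylabel("Sales (units)")
```

Working with the returned Axes tends to be easier to follow once a script creates more than one figure.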
Horizontal Bar Plots
Sometimes horizontal bar plots are more effective, especially when you have long category names. You can create them using kind='barh':
# Create a horizontal bar plot
df.plot(kind='barh', x='Product', y='Sales', figsize=(10, 6))
plt.title('Product Sales Comparison')
plt.xlabel('Sales (units)')
plt.ylabel('Products')
plt.show()
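Horizontal bar plots are often easier to read when the bars are ordered by value. The sketch below sorts the sample data before plotting; because barh draws the first row at the bottom, an ascending sort places the largest bar at the top.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Monitor", "Keyboard", "Mouse"],
    "Sales": [300, 400, 150, 80, 120],
})

# Sort ascending so the largest value ends up as the topmost bar.
df_sorted = df.sort_values("Sales")

ax = df_sorted.plot.barh(x="Product", y="Sales", figsize=(10, 6), legend=False)
ax.set_title("Product Sales Comparison (sorted)")
ax.set_xlabel("Sales (units)")
ax.set_ylabel("Products")
```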
Customizing Bar Plots
Let's enhance our bar plot with customizations:
# Create a customized bar plot
ax = df.plot(kind='bar', x='Product', y='Sales', figsize=(12, 7),
             color='skyblue', edgecolor='black', width=0.7)
# Add title and labels with custom font sizes
plt.title('Product Sales Comparison', fontsize=16)
plt.ylabel('Sales (units)', fontsize=14)
plt.xlabel('Products', fontsize=14)
# Add data values on top of each bar
for i, v in enumerate(df['Sales']):
    ax.text(i, v + 5, str(v), ha='center', fontsize=12)
# Customize grid
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Customize ticks
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
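As an alternative to placing each value with ax.text, Matplotlib 3.4 and later provide Axes.bar_label, which positions the labels automatically. This is a sketch assuming such a Matplotlib version is installed; the padding and font size are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Monitor", "Keyboard", "Mouse"],
    "Sales": [300, 400, 150, 80, 120],
})

ax = df.plot.bar(x="Product", y="Sales", figsize=(12, 7), color="skyblue",
                 edgecolor="black", width=0.7, legend=False, rot=45)

# ax.containers[0] holds the bars of the single plotted column;
# bar_label writes each bar's value just above it.
labels = ax.bar_label(ax.containers[0], padding=3, fontsize=12)

ax.grid(axis="y", linestyle="--", alpha=0.7)
ax.figure.tight_layout()
```

Compared with the manual loop, this avoids hard-coding the vertical offset (the "+ 5" above), which would need adjusting if the data scale changed.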
Grouped Bar Plots
Grouped bar plots allow you to compare multiple variables across categories:
# Create sample data for grouped bar plot
data = {'Product': ['Laptop', 'Phone', 'Monitor', 'Keyboard', 'Mouse'],
        'Sales_2021': [300, 400, 150, 80, 120],
        'Sales_2022': [350, 450, 200, 70, 140]}
df_grouped = pd.DataFrame(data)
print(df_grouped)
# Create grouped bar plot
df_grouped.plot(kind='bar', x='Product', y=['Sales_2021', 'Sales_2022'],
                figsize=(12, 7), width=0.7)
plt.title('Product Sales Comparison: 2021 vs 2022', fontsize=16)
plt.ylabel('Sales (units)', fontsize=14)
plt.xlabel('Products', fontsize=14)
plt.legend(['2021', '2022'])
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
The print(df_grouped) call outputs:
    Product  Sales_2021  Sales_2022
0    Laptop         300         350
1     Phone         400         450
2   Monitor         150         200
3  Keyboard          80          70
4     Mouse         120         140
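Another way to get a grouped plot is to move the category column into the index and let Pandas plot every remaining column. In this sketch the columns are also renamed so the legend labels come straight from the data instead of being passed to plt.legend by hand; the name wide is just illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

df_grouped = pd.DataFrame({
    "Product": ["Laptop", "Phone", "Monitor", "Keyboard", "Mouse"],
    "Sales_2021": [300, 400, 150, 80, 120],
    "Sales_2022": [350, 450, 200, 70, 140],
})

# Index -> x-axis categories; each remaining column -> one bar per group.
wide = (df_grouped
        .set_index("Product")
        .rename(columns={"Sales_2021": "2021", "Sales_2022": "2022"}))

ax = wide.plot.bar(figsize=(12, 7), width=0.7, rot=0)
ax.set_title("Product Sales Comparison: 2021 vs 2022")
ax.set_xlabel("Products")
ax.set_ylabel("Sales (units)")
```

Because the legend is derived from the column names, it cannot drift out of sync with the order of the plotted columns.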
Stacked Bar Plots
Stacked bar plots are useful for showing the composition of categories:
# Create stacked bar plot
df_grouped.plot(kind='bar', x='Product', y=['Sales_2021', 'Sales_2022'],
                figsize=(12, 7), stacked=True, width=0.7)
plt.title('Product Sales Stacked: 2021 vs 2022', fontsize=16)
plt.ylabel('Total Sales (units)', fontsize=14)
plt.xlabel('Products', fontsize=14)
plt.legend(['2021', '2022'])
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
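In a stacked plot, the full height of each bar is the row total, and each segment starts where the previous one ends. The sketch below makes that explicit by computing the totals and writing them above the bars; the offset of 5 units is an arbitrary choice for this data.

```python
import pandas as pd
import matplotlib.pyplot as plt

wide = pd.DataFrame(
    {"2021": [300, 400, 150, 80, 120], "2022": [350, 450, 200, 70, 140]},
    index=["Laptop", "Phone", "Monitor", "Keyboard", "Mouse"],
)

# Row sums give the total height of each stacked bar.
totals = wide.sum(axis=1)

ax = wide.plot.bar(stacked=True, figsize=(12, 7), width=0.7, rot=0)

# Write each total slightly above its bar.
for i, total in enumerate(totals):
    ax.text(i, total + 5, str(total), ha="center")

ax.set_ylabel("Total Sales (units)")
```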
Real-world Example: Sales Data Analysis
Let's work through a more comprehensive example analyzing monthly sales data:
# Create more realistic sample data
np.random.seed(42)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
data = {
    'Month': months,
    'Electronics': np.random.randint(50000, 100000, 12),
    'Clothing': np.random.randint(30000, 70000, 12),
    'Home & Kitchen': np.random.randint(20000, 50000, 12),
    'Books': np.random.randint(10000, 30000, 12)
}
sales_df = pd.DataFrame(data)
print(sales_df.head())
# Calculate total sales
sales_df['Total'] = sales_df[['Electronics', 'Clothing', 'Home & Kitchen', 'Books']].sum(axis=1)
# Find top 3 months by total sales
top_months = sales_df.sort_values('Total', ascending=False).head(3)['Month'].values
print(f"Top 3 months by sales: {', '.join(top_months)}")
Output:
   Month  Electronics  Clothing  Home & Kitchen  Books
0    Jan        51658     39771           25506  17269
1    Feb        92959     42415           25663  28287
2    Mar        56762     50366           47834  15367
3    Apr        69021     49401           22234  23236
4    May        56307     53194           30846  29437
Top 3 months by sales: Feb, Jul, Oct
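The sort-then-head step used to find the top months can also be expressed with DataFrame.nlargest. Here is a small sketch on fixed, made-up numbers (not the random data above) so the result is easy to check by hand.

```python
import pandas as pd

# Small hypothetical data set with fixed values.
sales = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr"],
    "Electronics": [500, 900, 700, 600],
    "Books": [100, 300, 150, 400],
})
sales["Total"] = sales[["Electronics", "Books"]].sum(axis=1)

# nlargest(n, column) returns the n rows with the highest values,
# already ordered from largest to smallest.
top = sales.nlargest(2, "Total")["Month"].tolist()
print(top)  # ['Feb', 'Apr']
```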
Now, let's create an insightful visualization:
import seaborn as sns  # required for sns.barplot below

# Melt the dataframe to get it into the right format for plotting
plot_df = sales_df.melt(id_vars=['Month'],
                        value_vars=['Electronics', 'Clothing', 'Home & Kitchen', 'Books'],
                        var_name='Category', value_name='Sales')
# Create a grouped bar plot
plt.figure(figsize=(14, 8))
chart = sns.barplot(data=plot_df, x='Month', y='Sales', hue='Category')
# Highlight top months with a translucent band behind their bars
for month in top_months:
    idx = months.index(month)
    plt.axvspan(idx-0.4, idx+0.4, alpha=0.1, color='red')
# Add title and labels
plt.title('Monthly Sales by Product Category', fontsize=16)
plt.ylabel('Sales ($)', fontsize=14)
plt.xlabel('Month', fontsize=14)
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Add a text annotation for top months
plt.text(0.5, 0.95, f"Top months highlighted: {', '.join(top_months)}",
transform=plt.gca().transAxes, ha='center',
bbox=dict(facecolor='white', alpha=0.5))
plt.tight_layout()
plt.show()
The example above relies on Seaborn for the grouped bars, so make sure it is installed and imported before running the plotting code:
import seaborn as sns
Percentage Bar Plots
Sometimes, you want to show the relative proportions rather than absolute values:
# Calculate percentage contribution for each category
category_cols = ['Electronics', 'Clothing', 'Home & Kitchen', 'Books']
for col in category_cols:
    sales_df[f'{col}_pct'] = sales_df[col] / sales_df['Total'] * 100
# Create percentage stacked bar plot
pct_cols = [f'{col}_pct' for col in category_cols]
sales_df.plot(kind='bar', x='Month', y=pct_cols,
              figsize=(14, 8), stacked=True, width=0.8)
plt.title('Monthly Sales Composition by Category (%)', fontsize=16)
plt.ylabel('Percentage of Total Sales', fontsize=14)
plt.xlabel('Month', fontsize=14)
plt.xticks(rotation=45)
plt.legend(labels=category_cols)
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Add percentage signs to y-axis
from matplotlib.ticker import PercentFormatter
plt.gca().yaxis.set_major_formatter(PercentFormatter())
plt.tight_layout()
plt.show()
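The percentage columns can also be computed in one step by dividing every row by its own total with DataFrame.div. The sketch below uses a small made-up table rather than the random data above, so each row's shares can be verified by hand.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Hypothetical sales figures, indexed by month.
sales = pd.DataFrame({
    "Month": ["Jan", "Feb"],
    "Electronics": [600, 200],
    "Clothing": [300, 100],
    "Books": [100, 100],
}).set_index("Month")

# Divide each row by its row total (axis=0 aligns the totals on the index),
# then scale to percentages. Every row now sums to 100.
shares = sales.div(sales.sum(axis=1), axis=0) * 100

ax = shares.plot.bar(stacked=True, figsize=(8, 5), width=0.8, rot=0)
ax.yaxis.set_major_formatter(PercentFormatter())
ax.set_ylabel("Percentage of Total Sales")
```

This keeps the original DataFrame free of extra _pct columns, and the legend labels are the plain category names.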
Summary
In this tutorial, we've explored the versatility of Pandas for creating bar plots. We covered:
- Basic vertical and horizontal bar plots
- Customizing bar plots with colors, labels, and annotations
- Creating grouped and stacked bar plots
- Working through real-world examples with sales data
- Visualizing percentage contributions
Bar plots are one of the most effective ways to compare values across categories, and Pandas makes it remarkably easy to create them from your data.
Additional Resources and Exercises
Additional Resources
- Pandas Visualization Documentation
- Matplotlib Bar Plot Documentation
- Data Visualization with Python and Matplotlib
Practice Exercises
- Basic Exercise: Create a bar plot showing the population of the top 10 most populous countries.
- Intermediate Exercise: Create a grouped bar plot comparing the quarterly revenue and profit for a company over the last 3 years.
- Advanced Exercise: Create a stacked percentage bar plot showing the market share of different smartphone manufacturers over time.
- Challenge: Create a bar plot with error bars showing average temperature by month with standard deviation indicators.
Remember, the best way to master bar plots is through practice. Try to incorporate them into your data analysis projects to gain more experience!
Happy plotting!