Pandas Plot Method

Introduction

Data visualization is a crucial part of data analysis. It helps us understand patterns, identify trends, and communicate findings effectively. While there are dedicated visualization libraries like Matplotlib and Seaborn, Pandas comes with its own convenient plotting capabilities through the plot() method, which is built on top of Matplotlib.

The Pandas plot() method provides a simple interface for creating common plots directly from DataFrame and Series objects. Because plotting is a method on the data object itself, you can explore data visually during analysis without reshaping it or switching to a separate plotting API.

In this tutorial, we'll explore how to use Pandas' plotting functionality to create various types of visualizations that can help you better understand your data.

Basic Plotting with Pandas

Before we start, let's import the necessary libraries and create some sample data:

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

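# Jupyter notebooks only: render plots inline (remove this line in a plain .py script)
%matplotlib inline
# For better-looking plots (this style name requires Matplotlib 3.6 or newer)
plt.style.use('seaborn-v0_8-whitegrid')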

Creating a Simple Line Plot

The most basic plot you can create with Pandas is a line plot. Let's create a simple DataFrame with some time series data:

python
# Create a DataFrame with some sample time-series data
# freq='M' means month-end; pandas 2.2+ deprecates it in favor of freq='ME'
dates = pd.date_range('2023-01-01', periods=12, freq='M')
df = pd.DataFrame({
    'Sales': np.random.randint(100, 200, size=12),
    'Revenue': np.random.randint(1000, 2000, size=12)
}, index=dates)

print(df.head())

Output (your values will differ, since no random seed is set):

            Sales  Revenue
2023-01-31    124     1426
2023-02-28    142     1718
2023-03-31    168     1847
2023-04-30    118     1291
2023-05-31    187     1508

Now let's create a simple line plot:

python
# Create a line plot
df.plot()
plt.title('Monthly Sales and Revenue')
plt.ylabel('Value')
plt.xlabel('Date')
plt.show()

[Figure: Basic Line Plot]

In the example above, Pandas automatically used the DataFrame's index as the x-axis and created a line for each column in the DataFrame.
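Because every column shares one y-axis by default, a column with much larger values (Revenue here) can flatten the others. The following is a minimal sketch of one way to handle that, using the secondary_y parameter of plot(). The `demo` DataFrame is a stand-in built only for this example, and it uses month-start dates (freq='MS') rather than the month-end dates above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data for illustration (not the tutorial's df)
np.random.seed(0)
dates = pd.date_range('2023-01-01', periods=12, freq='MS')
demo = pd.DataFrame({
    'Sales': np.random.randint(100, 200, size=12),
    'Revenue': np.random.randint(1000, 2000, size=12),
}, index=dates)

# Draw Revenue against a second y-axis on the right-hand side
ax = demo.plot(secondary_y=['Revenue'])
ax.set_ylabel('Sales')
ax.right_ax.set_ylabel('Revenue')  # pandas attaches the secondary axis as right_ax
plt.show()
```

The index is still used for the x-axis and each column still gets its own line; only the scale that Revenue is drawn against changes.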

Plot Types Available in Pandas

The plot() method supports multiple plot types through the kind parameter. The accepted values are:

  • 'line': Line plot (default)
  • 'bar': Vertical bar plot
  • 'barh': Horizontal bar plot
  • 'hist': Histogram
  • 'box': Box plot
  • 'kde': Kernel Density Estimate plot
  • 'density': Same as 'kde'
  • 'area': Area plot
  • 'pie': Pie plot
  • 'scatter': Scatter plot (DataFrame only)
  • 'hexbin': Hexagonal bin plot (DataFrame only)
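Each kind value also has a matching accessor method, so plot(kind='bar') and plot.bar() should produce the same chart. Here is a small sketch of that equivalence; the `scores` Series is made up for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data for illustration
scores = pd.Series([3, 7, 5], index=['A', 'B', 'C'], name='Score')

fig, (left, right) = plt.subplots(ncols=2, figsize=(10, 4))

# Two spellings of the same request
scores.plot(kind='bar', ax=left, title="plot(kind='bar')")
scores.plot.bar(ax=right, title='plot.bar()')

plt.tight_layout()
plt.show()
```

Which form to use is mostly a matter of taste; the accessor form tends to work better with editor autocompletion.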

Let's explore some of these plot types with examples.

Bar Plot

Bar plots are useful for comparing quantities between different categories:

python
# Create a bar plot
monthly_sales = df['Sales']
monthly_sales.plot(kind='bar', figsize=(10, 5), color='skyblue')
plt.title('Monthly Sales')
plt.ylabel('Sales')
plt.xlabel('Month')
plt.show()

[Figure: Bar Plot]
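When a bar plot is drawn from a DataFrame with several columns, Pandas groups the bars side by side for each index label, and the stacked=True option stacks them instead. The sketch below compares the two; the `quarterly` figures are invented purely for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical quarterly figures, for illustration only
quarterly = pd.DataFrame({
    'Online': [120, 135, 150, 160],
    'In_Store': [80, 75, 90, 85],
}, index=['Q1', 'Q2', 'Q3', 'Q4'])

fig, (ax_grouped, ax_stacked) = plt.subplots(ncols=2, figsize=(12, 4))

# Default: one bar per column, placed side by side
quarterly.plot(kind='bar', ax=ax_grouped, title='Grouped', rot=0)

# stacked=True: columns are stacked on top of each other
quarterly.plot(kind='bar', stacked=True, ax=ax_stacked, title='Stacked', rot=0)

plt.tight_layout()
plt.show()
```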

Histogram

Histograms help visualize the distribution of a dataset:

python
# Create a histogram
df['Revenue'].plot(kind='hist', bins=10, figsize=(10, 5), color='lightgreen', alpha=0.7)
plt.title('Revenue Distribution')
plt.xlabel('Revenue')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
plt.show()

[Figure: Histogram]

Scatter Plot

Scatter plots are great for visualizing relationships between two variables:

python
# Create a scatter plot
df.plot(kind='scatter', x='Sales', y='Revenue', figsize=(10, 5),
        color='purple', alpha=0.7, s=100)
plt.title('Sales vs Revenue')
plt.grid(True)
plt.show()

[Figure: Scatter Plot]

Box Plot

Box plots show the distribution of your data and highlight potential outliers:

python
# Create a box plot
df.plot(kind='box', figsize=(10, 5))
plt.title('Distribution of Sales and Revenue')
plt.ylabel('Value')
plt.grid(True)
plt.show()

[Figure: Box Plot]
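Since Sales and Revenue sit on very different scales, a shared y-axis can squash the Sales box. One possible workaround, sketched here with a stand-in `demo` DataFrame, is to pass subplots=True so that each column gets its own axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data for illustration (not the tutorial's df)
np.random.seed(0)
demo = pd.DataFrame({
    'Sales': np.random.randint(100, 200, size=12),
    'Revenue': np.random.randint(1000, 2000, size=12),
})

# One box plot per column, each with an independent y-axis
axes = demo.plot(kind='box', subplots=True, sharey=False, figsize=(10, 5),
                 title='Sales and Revenue on separate axes')
plt.tight_layout()
plt.show()
```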

Pie Chart

Pie charts are useful for showing percentages of a whole:

python
# Let's create some category data for a pie chart
category_sales = pd.Series([15000, 12000, 8000, 7500, 3000],
                           index=['Electronics', 'Clothing', 'Home', 'Books', 'Other'],
                           name='Sales')

# Create a pie chart
category_sales.plot(kind='pie', figsize=(8, 8), autopct='%1.1f%%',
                    startangle=90, shadow=True, explode=(0.1, 0, 0, 0, 0))
plt.title('Sales by Category')
plt.ylabel('') # Hide the y-label
plt.show()

[Figure: Pie Chart]

Customizing Plots

Pandas' plot() method accepts many parameters to customize your visualizations. Here are some common customizations:

Adjusting Figure Size and Layout

python
# Change figure size
df.plot(figsize=(12, 6))
plt.show()

Setting Colors and Styles

python
# Customize colors and styles
df.plot(style=['--', ':'], color=['blue', 'red'], linewidth=2)
plt.show()

Adding a Grid, Legend, and Labels

python
# Add grid, legend, and labels
df.plot(grid=True)
plt.legend(loc='best') # 'best' automatically places the legend in an optimal position
plt.title('Sales and Revenue Over Time')
plt.ylabel('Value')
plt.show()
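The title and axis labels can also be passed straight to plot() instead of being set through separate plt calls. The sketch below assumes pandas 1.1 or newer (where the xlabel and ylabel parameters were added) and uses a stand-in `demo` DataFrame.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data for illustration (not the tutorial's df)
np.random.seed(0)
dates = pd.date_range('2023-01-01', periods=12, freq='MS')
demo = pd.DataFrame({
    'Sales': np.random.randint(100, 200, size=12),
    'Revenue': np.random.randint(1000, 2000, size=12),
}, index=dates)

# Grid, legend, title, and axis labels in a single call
ax = demo.plot(grid=True, legend=True,
               title='Sales and Revenue Over Time',
               xlabel='Date', ylabel='Value')
plt.show()
```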

Creating Subplots

You can create multiple plots in a single figure by creating the axes with plt.subplots() and passing each one to plot() through the ax parameter:

python
# Create subplots
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(10, 8))

df['Sales'].plot(ax=axes[0], title='Monthly Sales')
df['Revenue'].plot(ax=axes[1], title='Monthly Revenue')

plt.tight_layout()
plt.show()

[Figure: Subplots]
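plot() also has its own subplots parameter, which creates one set of axes per column without calling plt.subplots() yourself. This is a minimal sketch using a stand-in `demo` DataFrame; passing a list to title to get one title per subplot is a convenience that, as far as I know, requires the list length to match the number of columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data for illustration (not the tutorial's df)
np.random.seed(0)
dates = pd.date_range('2023-01-01', periods=12, freq='MS')
demo = pd.DataFrame({
    'Sales': np.random.randint(100, 200, size=12),
    'Revenue': np.random.randint(1000, 2000, size=12),
}, index=dates)

# One subplot per column, stacked vertically and sharing the x-axis
axes = demo.plot(subplots=True, sharex=True, figsize=(10, 8),
                 title=['Monthly Sales', 'Monthly Revenue'])
plt.tight_layout()
plt.show()
```

The call returns an array of Axes objects, one per column, which can be customized individually.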

Real-World Example: Analyzing E-commerce Data

Let's apply Pandas plotting capabilities to analyze a realistic e-commerce dataset:

python
# Create a more realistic e-commerce dataset
np.random.seed(42)
date_range = pd.date_range('2023-01-01', '2023-12-31', freq='D')
n_days = len(date_range)

ecommerce_data = pd.DataFrame({
    'Date': date_range,
    'Orders': np.random.normal(100, 20, n_days).astype(int),
    'Revenue': np.random.normal(5000, 1000, n_days),
    'Visitors': np.random.normal(1500, 300, n_days).astype(int),
    'Conversion_Rate': np.random.normal(6, 1, n_days) / 100
})

# Add a weekend flag for analysis
ecommerce_data['Weekday'] = ecommerce_data['Date'].dt.day_name()
ecommerce_data['Is_Weekend'] = ecommerce_data['Weekday'].isin(['Saturday', 'Sunday'])

# Show the first few rows
print(ecommerce_data.head())

Example output (illustrative values):

        Date  Orders    Revenue  Visitors  Conversion_Rate    Weekday  Is_Weekend
0 2023-01-01      85  4108.2729      1252         0.069345     Sunday        True
1 2023-01-02     108  5318.6794      1557         0.053511     Monday       False
2 2023-01-03      93  4209.0677      1717         0.064755    Tuesday       False
3 2023-01-04      96  5086.9307      1162         0.044285  Wednesday       False
4 2023-01-05     114  5860.2232      1562         0.063800   Thursday       False

Now, let's create some meaningful visualizations to analyze this data:

1. Weekly Orders Trend

python
# Resample data to weekly frequency and plot
weekly_orders = ecommerce_data.set_index('Date')['Orders'].resample('W').mean()

weekly_orders.plot(figsize=(12, 6), color='blue', marker='o', linestyle='-')
plt.title('Average Weekly Orders')
plt.ylabel('Orders')
plt.grid(True)
plt.show()

[Figure: Weekly Orders Trend]

2. Revenue vs. Visitors Scatter Plot

python
# Scatter plot: marker size scales with orders, color encodes conversion rate
ecommerce_data.plot(kind='scatter', x='Visitors', y='Revenue', figsize=(10, 6),
                    alpha=0.6, s=ecommerce_data['Orders']/2, c='Conversion_Rate',
                    cmap='viridis', colorbar=True)
plt.title('Relationship between Visitors and Revenue')
plt.xlabel('Number of Visitors')
plt.ylabel('Revenue ($)')
plt.grid(True)
plt.show()

[Figure: Revenue vs Visitors]

3. Weekday vs. Weekend Comparison

python
# Group data by weekday and calculate averages
weekday_stats = ecommerce_data.groupby('Weekday')[['Orders', 'Revenue', 'Visitors', 'Conversion_Rate']].mean()
# Reorder days of the week
weekday_stats = weekday_stats.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])

# Create a bar plot for orders by day of the week
weekday_stats['Orders'].plot(kind='bar', figsize=(12, 6), color='lightblue')
plt.title('Average Orders by Day of Week')
plt.ylabel('Average Orders')
plt.grid(axis='y')
plt.show()

[Figure: Weekday Comparison]
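The Is_Weekend flag created earlier can be used for a direct weekday versus weekend comparison. The sketch below rebuilds only the columns it needs in a stand-in `orders` DataFrame so that it runs on its own; the 'Weekday' and 'Weekend' labels are names chosen for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the e-commerce data: only the columns this comparison needs
np.random.seed(42)
dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
orders = pd.DataFrame({
    'Date': dates,
    'Orders': np.random.normal(100, 20, len(dates)).astype(int),
})
orders['Is_Weekend'] = orders['Date'].dt.day_name().isin(['Saturday', 'Sunday'])

# Average orders for weekdays vs. weekends, with readable labels
weekend_avg = orders.groupby('Is_Weekend')['Orders'].mean()
weekend_avg = weekend_avg.rename(index={False: 'Weekday', True: 'Weekend'})

fig, ax = plt.subplots(figsize=(6, 5))
weekend_avg.plot(kind='bar', ax=ax, rot=0, color='lightblue')
ax.set_title('Average Orders: Weekday vs. Weekend')
ax.set_xlabel('')
ax.set_ylabel('Average Orders')
plt.show()
```

Because the data here is random noise around a fixed mean, the two bars should come out nearly equal; with real data, this view makes any weekend effect easy to spot.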

4. Monthly Revenue Box Plot

python
# Extract the month from the date as an ordered categorical, so the boxes
# appear in calendar order rather than alphabetical order
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
ecommerce_data['Month'] = pd.Categorical(ecommerce_data['Date'].dt.month_name(),
                                         categories=month_order, ordered=True)

# Create a box plot of revenue by month
# (boxplot() with by= creates its own figure, so no plt.figure() call is needed)
ecommerce_data.boxplot(column=['Revenue'], by='Month', grid=True,
                       rot=45, fontsize=10, figsize=(14, 7))
plt.title('Monthly Revenue Distribution', fontsize=14)
plt.suptitle('') # Remove the default suptitle
plt.ylabel('Revenue ($)', fontsize=12)
plt.tight_layout()
plt.show()

[Figure: Monthly Revenue Box Plot]

Summary

The Pandas plot() method provides a convenient and powerful interface for creating visualizations directly from your DataFrame or Series objects. Key takeaways include:

  1. Pandas plotting is built on top of Matplotlib, providing a simpler interface for common plots
  2. The plot() method supports numerous chart types through the kind parameter
  3. You can customize plots with parameters like figsize, color, style, and more
  4. For complex visualizations, you can access the underlying Matplotlib functionality
  5. Creating plots directly from your DataFrame or Series keeps your data analysis workflow smooth and efficient
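As an illustration of the fourth point, plot() returns a Matplotlib Axes object, so any Axes method can be applied to the result. This is a small sketch with a stand-in `demo` DataFrame; the reference line and its label are choices made for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data for illustration (not the tutorial's df)
np.random.seed(0)
dates = pd.date_range('2023-01-01', periods=12, freq='MS')
demo = pd.DataFrame({
    'Sales': np.random.randint(100, 200, size=12),
    'Revenue': np.random.randint(1000, 2000, size=12),
}, index=dates)

# plot() hands back the Matplotlib Axes it drew on
ax = demo.plot(figsize=(10, 5))

# From here on, this is plain Matplotlib
ax.axhline(demo['Sales'].mean(), color='gray', linestyle='--', label='Mean sales')
ax.set_ylim(bottom=0)
ax.legend(loc='best')
plt.show()
```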

While Pandas' plotting functionality is great for quick exploratory data analysis, you might want to use specialized libraries like Matplotlib or Seaborn for more complex or publication-quality visualizations.

Additional Resources

To further enhance your data visualization skills with Pandas:

Practice Exercises

  1. Create a dataset of your choice and visualize it using at least three different plot types.
  2. Take a real-world dataset (e.g., from Kaggle) and create visualizations to explore and analyze the data.
  3. Create a dashboard-like layout with multiple subplots showing different aspects of your data.
  4. Experiment with customizing your plots by changing colors, styles, and adding annotations.
  5. Try recreating a visualization you've seen in a publication or online article using Pandas' plotting capabilities.
