
Pandas Line Plots

Introduction

Line plots are among the most common ways to visualize data that changes over time or follows some other ordered sequence. They show trends, patterns, and relationships between variables along a continuous axis, which makes them especially useful for time series such as stock prices, temperature readings, or any measurement collected at regular intervals.

Pandas, the popular data manipulation library for Python, provides built-in plotting through its integration with Matplotlib. This lets you create line plots directly from a DataFrame or Series without setting up Matplotlib figures and axes by hand.

In this tutorial, we'll explore how to create various types of line plots using pandas, customize their appearance, and apply them to realistic (simulated) datasets.

Basic Line Plot with Pandas

Creating a line plot with pandas is as simple as calling the .plot() method on a DataFrame or Series. Let's start with a basic example:

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set the style for better visuals
plt.style.use('ggplot')

# Create some sample data
# 'MS' means month start; the older month-end alias 'M' is deprecated since pandas 2.2
dates = pd.date_range('2023-01-01', periods=12, freq='MS')
data = pd.Series(np.random.randn(12).cumsum(), index=dates)

# Create a basic line plot
ax = data.plot(figsize=(10, 6))
ax.set_title('Basic Line Plot in Pandas')
ax.set_ylabel('Value')
ax.set_xlabel('Date')

plt.show()

This will generate a line plot where:

  • The x-axis represents the dates from our DatetimeIndex
  • The y-axis shows our cumulative random values
  • The figure is 10×6 inches, as set by figsize=(10, 6)

The output is a single line connecting the data points in chronological order, which makes the trend easy to spot. Because the values are random and unseeded, the exact shape differs on every run.
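
To see what .plot() hands back, the following sketch inspects the returned matplotlib Axes. It assumes a small hand-written Series (the values are made up for illustration) and selects the non-interactive Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values, not real measurements
series = pd.Series([1.0, 3.0, 2.0, 5.0], index=[0, 1, 2, 3], name="demo")

# kind='line' is the default, so no extra argument is needed
ax = series.plot(figsize=(6, 4))

# The returned object is a regular matplotlib Axes holding one Line2D
line = ax.lines[0]
x_plotted = list(line.get_xdata())
y_plotted = list(line.get_ydata())
print(type(ax).__name__, x_plotted, y_plotted)
```

Because the result is an ordinary Axes, any matplotlib method such as set_title or set_ylabel can be applied to it afterwards.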

Multiple Line Plots from DataFrame

When working with a DataFrame containing multiple columns, pandas makes it easy to plot several lines on the same chart:

python
# Create a DataFrame with multiple columns
df = pd.DataFrame({
    'Product A': np.random.randn(12).cumsum(),
    'Product B': np.random.randn(12).cumsum(),
    'Product C': np.random.randn(12).cumsum()
}, index=dates)

# Plot all columns
ax = df.plot(figsize=(10, 6), title='Sales Performance of Different Products')
ax.set_ylabel('Sales')
ax.set_xlabel('Month')
plt.legend(loc='best')

plt.show()

In this example, each column in the DataFrame is plotted as a separate line with a different color. The legend is automatically created based on the column names.
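
If only some columns should appear, one option is to select them before plotting. The sketch below assumes seeded random data with the same column names as above and checks that each drawn line and each legend entry is labeled with its column name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df = pd.DataFrame(
    rng.standard_normal((12, 3)).cumsum(axis=0),
    columns=["Product A", "Product B", "Product C"],
)

# Select the columns of interest, then plot; one line is drawn per column
subset = df[["Product A", "Product C"]]
ax = subset.plot(figsize=(8, 4), title="Selected products")

# Both the lines and the legend entries are labeled by column name
line_labels = [line.get_label() for line in ax.lines]
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(line_labels, legend_labels)
```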

Customizing Line Plots

Pandas forwards keyword arguments it does not handle itself to matplotlib, giving you extensive control over the appearance of your plots:

python
# Customize line styles, colors, and markers
ax = df.plot(
    figsize=(12, 6),
    # One matplotlib format string per column: line style plus marker.
    # The marker keyword takes a single marker, not a list, so
    # per-line markers are set through style instead.
    style=['-o', '--s', '-.^'],      # solid+circle, dashed+square, dash-dot+triangle
    color=['blue', 'red', 'green'],  # Custom colors
    markersize=8,
    linewidth=2,
    alpha=0.7,                       # Transparency
    grid=True,
    title='Customized Line Plot'
)

ax.set_ylabel('Value', fontsize=12)
ax.set_xlabel('Date', fontsize=12)
plt.legend(loc='upper left', fontsize=10)

plt.show()

Here's what each customization does:

  • style: Sets the line style (solid, dashed, dash-dot) and the marker (circle, square, triangle) for each line
  • color: Specifies the color for each line
  • marker: Not used above, because it accepts only one marker shared by all lines (for example marker='o'); per-line markers go in style
  • markersize: Sets the size of markers
  • linewidth: Controls line thickness
  • alpha: Sets transparency (0-1)
  • grid: Shows grid lines
  • Font sizes can be specified for labels and legends
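
Since .plot() returns a matplotlib Axes, individual lines can also be restyled after plotting. The following is a minimal sketch of that approach, assuming a small made-up DataFrame; it assigns a different marker to each line through the Line2D objects in ax.lines.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [2, 1, 4, 3],
    "C": [3, 3, 2, 5],
})

ax = df.plot(figsize=(8, 4), linewidth=2)

# ax.lines holds one Line2D per column, in column order
markers = ["o", "s", "^"]  # circle, square, triangle
for line, marker in zip(ax.lines, markers):
    line.set_marker(marker)
    line.set_markersize(8)

ax.legend()  # rebuild the legend so it shows the new markers

applied = [line.get_marker() for line in ax.lines]
print(applied)
```

This route is handy when the styling depends on something computed after the plot exists, such as highlighting the column with the largest final value.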

Handling Missing Data

Real-world datasets often contain missing values. Let's see how pandas handles them in line plots:

python
# Create data with missing values
df_missing = df.copy()
df_missing.iloc[3:5, 0] = np.nan # Set some values to NaN
df_missing.iloc[7:9, 1] = np.nan

# Compare two ways of drawing the missing values
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Default behavior: NaN values break the line, leaving a gap
df_missing.plot(ax=axes[0], title='Default: Gaps at Missing Values')
axes[0].set_ylabel('Value')

# Fill the gaps by linear interpolation before plotting
df_missing.interpolate().plot(ax=axes[1], title='After interpolate()')
axes[1].set_ylabel('Value')

plt.tight_layout()
plt.show()

By default, the line is broken wherever a value is NaN, which is honest about data availability. The plot method has no interpolate option; to draw a continuous line, fill the gaps first, for example with DataFrame.interpolate() (linear by default), and keep in mind that the filled segments are estimates rather than measurements.
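
The effect of missing values is easiest to check on a tiny Series. The sketch below uses made-up values with one NaN, plots the raw data (matplotlib leaves a gap at the NaN), and then plots a linearly interpolated copy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up values with one missing observation
s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

fig, (ax_gap, ax_filled) = plt.subplots(1, 2, figsize=(10, 4))

# Left: the NaN stays in the data, so the line is interrupted there
s.plot(ax=ax_gap, marker="o", title="NaN left in place")

# Right: fill the gap by linear interpolation, then plot a continuous line
filled = s.interpolate()
filled.plot(ax=ax_filled, marker="o", title="After interpolate()")

plt.tight_layout()
print(filled.tolist())
```

Other fills, such as s.ffill() to carry the last observation forward, follow the same pattern: transform the data first, then plot.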

Practical Example: Visualizing Stock Prices

Let's apply our line plotting skills to a realistic scenario: stock prices, simulated here so that the example runs offline:

python
# Simulated stock data (in a real scenario, you'd load prices from an API such as yfinance)
def get_stock_data():
    """Return a DataFrame of simulated daily prices for four tickers."""
    dates = pd.date_range('2022-01-01', periods=252, freq='B')  # Business days

    np.random.seed(42)  # For reproducibility

    # Simulate price movements
    base_price = 100
    volatility = 0.01

    # Random walk for stock prices
    movements = np.random.normal(0, volatility, len(dates))
    price_movements = (1 + movements).cumprod()
    prices = base_price * price_movements

    # Create a DataFrame (each ticker is the same walk scaled by a constant)
    stocks = pd.DataFrame({
        'AAPL': prices * 1.2,
        'MSFT': prices * 0.9,
        'GOOG': prices * 1.1,
        'AMZN': prices * 0.95
    }, index=dates)

    return stocks

stock_data = get_stock_data()

# Plot stock prices
ax = stock_data.plot(figsize=(12, 6), linewidth=1.5)
ax.set_title('Stock Price Evolution', fontsize=14)
ax.set_ylabel('Price ($)', fontsize=12)
ax.set_xlabel('Date', fontsize=12)
ax.grid(True, alpha=0.3)
plt.legend(loc='upper left')

# Add horizontal line for reference
plt.axhline(y=100, color='gray', linestyle='--', alpha=0.7)
plt.text(stock_data.index[0], 101, 'Reference ($100)', fontsize=10)

plt.tight_layout()
plt.show()

This example plots four simulated price series over time, with a horizontal reference line and axis labels for clarity. Because all four series are scaled copies of a single random walk, they move in lockstep; real stocks would not.
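
When stocks trade at different price levels, a common way to compare them is to plot the percentage change from the first day rather than the absolute price. The sketch below is one way to do that, assuming independent simulated random walks; the ticker names and starting prices are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded for reproducibility
dates = pd.date_range("2022-01-03", periods=252, freq="B")  # business days

# Hypothetical tickers and starting prices, for illustration only
tickers = ["STOCK_A", "STOCK_B", "STOCK_C"]
start_prices = np.array([120.0, 90.0, 110.0])

# One independent random walk per ticker
daily_returns = rng.normal(0, 0.01, size=(len(dates), len(tickers)))
prices = pd.DataFrame(
    start_prices * (1 + daily_returns).cumprod(axis=0),
    index=dates,
    columns=tickers,
)

# Rebase every column to its first value and express the change in percent
pct_from_start = (prices / prices.iloc[0] - 1) * 100

ax = pct_from_start.plot(figsize=(10, 5), linewidth=1.5)
ax.axhline(y=0, color="gray", linestyle="--", alpha=0.7)  # reference line at 0%
ax.set_title("Change Since First Trading Day")
ax.set_ylabel("Change (%)")
ax.set_xlabel("Date")
plt.tight_layout()
```

Every line starts at 0%, so the vertical position of each series at any date reads directly as its cumulative gain or loss.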

Line Plot Variations

Pandas supports several variations of line plots that serve different analytical purposes:

Log Scale

For data with exponential growth or large value ranges, a log scale can be helpful:

python
# Create data with exponential growth
dates = pd.date_range('2023-01-01', periods=20)
data = pd.DataFrame({
    'Linear': np.linspace(1, 10, 20),
    'Exponential': np.exp(np.linspace(0, 3, 20))
}, index=dates)

# Draw both versions side by side so tight_layout() applies to one figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Plot with regular scale
data.plot(ax=ax1, title='Regular Scale')

# Plot with log scale on y-axis
data.plot(ax=ax2, logy=True, title='Log Scale (Y-axis)')

plt.tight_layout()
plt.show()

On a log scale, equal vertical distances represent equal percentage changes, so the exponential series appears as a straight line and proportional relationships are easier to compare.
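
The straight-line behavior can be verified numerically. In this sketch, which assumes the same exponential series as above on a plain integer index, the logarithm of the values rises by a constant step, which is exactly what a straight line on a log-scaled y-axis means.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 3, 20)
data = pd.DataFrame({"Exponential": np.exp(x)})

ax = data.plot(logy=True, figsize=(6, 4), title="Log Scale (Y-axis)")

# On a log axis the plotted height is log(value); constant steps in
# log(value) therefore mean the points lie on a straight line
log_steps = np.diff(np.log(data["Exponential"].to_numpy()))
print(ax.get_yscale(), log_steps.min(), log_steps.max())
```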

Area Plot

We can also fill the area below the line to emphasize the magnitude:

python
# Create an area plot
ax = df.plot.area(figsize=(10, 6), alpha=0.5, stacked=False)
ax.set_title('Area Plot')
ax.set_ylabel('Value')
ax.set_xlabel('Date')
plt.legend(loc='best')

plt.show()

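Setting stacked=True (the default for area plots) stacks the areas on top of each other, which is useful for showing part-to-whole relationships. Stacking requires each column to be entirely non-negative or entirely non-positive, so it would likely raise an error on the random-walk data used here.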
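
For comparison, here is a minimal sketch of a stacked area plot. It assumes made-up, non-negative sales figures so that stacking is valid; the top edge of the stack then corresponds to the row totals.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Made-up, non-negative values so the columns can be stacked
df = pd.DataFrame({
    "Product A": [3, 4, 5, 6],
    "Product B": [2, 2, 3, 3],
    "Product C": [1, 2, 2, 4],
})

# stacked=True is the default for area plots
ax = df.plot.area(figsize=(8, 4), alpha=0.5)
ax.set_title("Stacked Area Plot")
ax.set_ylabel("Sales")

# The upper boundary of the stack equals the sum across columns
row_totals = df.sum(axis=1)
print(row_totals.tolist())
```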

Best Practices for Line Plots

Here are some tips for creating effective line plots:

  1. Keep it simple: Don't plot too many lines on the same graph (ideally 3-5 max)
  2. Use appropriate scales: Consider log scales for exponential growth or large ranges
  3. Add context: Always include clear titles, axis labels, and legends
  4. Highlight important information: Use color strategically to emphasize key lines
  5. Consider smoothing: For noisy data, consider using rolling averages

Let's implement some of these practices:

python
# Create noisy data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=100)
noisy_data = pd.Series(np.random.normal(0, 1, 100).cumsum(), index=dates)

# Calculate a 7-day rolling average
smooth_data = noisy_data.rolling(window=7).mean()

# Plot both the original and smoothed data
fig, ax = plt.subplots(figsize=(12, 6))

noisy_data.plot(ax=ax, alpha=0.3, color='gray', label='Daily data')
smooth_data.plot(ax=ax, linewidth=2.5, color='blue', label='7-day average')

ax.set_title('Smoothing Noisy Data with Rolling Average')
ax.set_ylabel('Value')
ax.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

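This example shows how smoothing can help reveal underlying trends in noisy data. Note that the first six values of the rolling average are NaN, because a full 7-day window is not yet available, so the smoothed line starts later than the raw one.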
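
How the window is aligned changes where the smoothed line starts and ends. The sketch below compares three variants of a 7-point rolling mean on a short made-up Series: the trailing default, a version with min_periods=1 that averages whatever is available, and a centered window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values 1..10, so the averages are easy to check by hand
values = pd.Series(range(1, 11), dtype=float)

# Trailing window: NaN until 7 observations are available
trailing = values.rolling(window=7).mean()

# Partial windows allowed: no NaN, but early averages use fewer points
partial = values.rolling(window=7, min_periods=1).mean()

# Centered window: NaN at both ends, no lag relative to the raw data
centered = values.rolling(window=7, center=True).mean()

fig, ax = plt.subplots(figsize=(8, 4))
values.plot(ax=ax, alpha=0.3, color="gray", label="Raw")
trailing.plot(ax=ax, label="Trailing")
partial.plot(ax=ax, linestyle="--", label="min_periods=1")
centered.plot(ax=ax, linestyle=":", label="Centered")
ax.legend()
```

A trailing window uses only past values, which matters when the plot should reflect what was known at each date; a centered window tracks the data more closely but looks ahead.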

Summary

Pandas line plots offer a powerful and flexible way to visualize time series and sequential data with minimal code. In this tutorial, we covered:

  • Creating basic line plots from Series and DataFrames
  • Plotting multiple lines on the same chart
  • Customizing line appearance (style, color, markers)
  • Handling missing data
  • Applying line plots to real-world examples like stock data
  • Creating variations like log-scale plots and area plots
  • Best practices for effective line plot visualization

Line plots are an essential tool in any data analyst's toolkit, particularly for analyzing trends over time. Pandas makes creating these visualizations straightforward, allowing you to focus on interpreting the data rather than writing complex plotting code.

Additional Resources and Exercises

Exercises

  1. Basic Line Plot: Download a CSV file containing historical weather data and create a line plot showing temperature changes over time.

  2. Multiple Lines: Create a line plot showing both maximum and minimum temperatures on the same chart. Add appropriate labels and a legend.

  3. Customization Challenge: Take the weather data and create a visually appealing line plot with custom colors, markers, and styling. Add a title and axis labels.

  4. Missing Data Handling: Introduce some missing values into your dataset and experiment with different ways to handle them in your line plot.

  5. Advanced Project: Find a financial dataset with multiple stock prices. Create a line plot showing the percentage change from the starting point rather than absolute prices. Include a reference line at y=0 and add annotations for significant events.

By completing these exercises, you'll gain hands-on experience with pandas line plots and develop practical data visualization skills!


