Pandas Series

Introduction

In pandas, a Series is one of the fundamental data structures that serves as a building block for working with data. You can think of a Series as a one-dimensional labeled array capable of holding data of any type (integers, strings, floating-point numbers, Python objects, etc.). It's similar to a column in a spreadsheet or a database table, or a single variable in your dataset.

A Series is characterized by:

An index that labels each element in the Series
A collection of data values
The ability to hold any data type (even mixed types)
A variety of built-in methods for data manipulation and analysis
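As a quick illustration of the "mixed types" point above, the sketch below compares the dtype pandas infers for same-typed values with the dtype it uses for mixed values. The variable names are only illustrative.

```python
import pandas as pd

# Values of one numeric type get a specific numeric dtype.
numbers = pd.Series([1, 2, 3])
print(numbers.dtype)

# Mixed values are stored as generic Python objects.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)  # object
```

Object-dtype Series are flexible, but most numeric methods work best (and fastest) when every value shares one type.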

In this guide, we'll explore how to create, manipulate, and work with pandas Series objects.

Prerequisites

To follow along with this tutorial, make sure you have pandas installed:

pip install pandas

Let's begin by importing pandas:

import pandas as pd
import numpy as np  # We'll use numpy in some examples

Creating a Series

From a List

The simplest way to create a pandas Series is from a list:

# Create a Series from a list
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
4    50
dtype: int64

Notice that pandas automatically assigned an index (0 to 4) to our Series.

With Custom Index

You can specify your own index labels when creating a Series:

# Create a Series with custom indices
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=index)
print(s)

Output:

a    10
b    20
c    30
d    40
e    50
dtype: int64

From a Dictionary

When creating a Series from a dictionary, the keys become the index:

# Create a Series from a dictionary
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
s = pd.Series(data)
print(s)

Output:

a    10
b    20
c    30
d    40
e    50
dtype: int64
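If you pass both a dictionary and an explicit index, pandas looks up each index label in the dictionary instead of using the keys in their original order. The following sketch (with a made-up label "z") shows the effect:

```python
import pandas as pd

data = {"a": 10, "b": 20, "c": 30}

# Labels are looked up in the dictionary; "z" has no entry, so it becomes NaN.
# Key "b" is not requested, so it is left out.
s = pd.Series(data, index=["c", "a", "z"])
print(s)
```

Because NaN is a floating-point value, the resulting Series has a float dtype even though the dictionary holds integers.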

With Scalar Value

You can create a Series from a single scalar value, which is repeated for every index label:

# Create a Series with a scalar value
s = pd.Series(5, index=['a', 'b', 'c', 'd', 'e'])
print(s)

Output:

a    5
b    5
c    5
d    5
e    5
dtype: int64

From NumPy Arrays

Series can also be created from NumPy arrays:

# Create a Series from a NumPy array
data = np.array([10, 20, 30, 40, 50])
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
4    50
dtype: int64

Series Attributes

Let's explore some important attributes of a Series:

# Create a sample Series
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# 1. Values
print("Series values:")
print(s.values)

# 2. Index
print("\nSeries index:")
print(s.index)

# 3. Data type
print("\nSeries data type:")
print(s.dtype)

# 4. Shape
print("\nSeries shape:")
print(s.shape)

# 5. Size
print("\nSeries size:")
print(s.size)

# 6. Name (if set)
s.name = "My Series"
print("\nSeries name:")
print(s.name)

Output:

Series values:
[10 20 30 40 50]

Series index:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

Series data type:
int64

Series shape:
(5,)

Series size:
5

Series name:
My Series

Accessing Series Elements

There are multiple ways to access elements in a Series:

By Position (Integer Location)

Using iloc for integer-based indexing:

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Access single element by position
print(s.iloc[0])  # First element
print(s.iloc[-1])  # Last element

# Slicing by position
print(s.iloc[1:4])  # Elements at positions 1, 2, and 3

Output:

10
50
b    20
c    30
d    40
dtype: int64

By Label (Index)

Using loc for label-based indexing:

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Access single element by label
print(s.loc['a'])

# Slicing by label (inclusive of end label)
print(s.loc['b':'d'])

Output:

10
b    20
c    30
d    40
dtype: int64
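The difference between loc and iloc matters most when the index labels are themselves integers. The sketch below uses a small made-up Series whose labels start at 1 to show how the two accessors diverge:

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=[1, 2, 3])

by_label = s.loc[1]       # label 1 -> 100
by_position = s.iloc[1]   # position 1 (second element) -> 200

label_slice = s.loc[1:2]  # labels 1 and 2, end label included
pos_slice = s.iloc[1:2]   # position 1 only, end position excluded

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```

Choosing loc or iloc explicitly avoids any ambiguity about which of the two meanings is intended.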

Direct Indexing

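You can also index directly with square brackets, which accept a label, a list of labels, or a boolean condition. Using plain integers as positions with [] on a non-integer index is deprecated in recent pandas versions, so prefer iloc for positional access: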

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# By label
print(s['a'])

# By multiple labels
print(s[['a', 'c', 'e']])

# By boolean condition
print(s[s > 30])

Output:

10
a    10
c    30
e    50
dtype: int64
d    40
e    50
dtype: int64
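Boolean conditions can also be combined. The following sketch filters the same Series with "and", "or", and "not" conditions; the result names are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# Wrap each condition in parentheses; use & for "and", | for "or", ~ for "not".
middle = s[(s > 10) & (s < 50)]
extremes = s[(s == 10) | (s == 50)]
not_large = s[~(s > 30)]

print(middle)
print(extremes)
print(not_large)
```

Python's own `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operator forms are used here.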

Series Operations

Series support various operations that make data manipulation easy:

Mathematical Operations

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Addition
print(s + 5)

# Multiplication
print(s * 2)

# Power
print(s ** 2)

Output:

a    15
b    25
c    35
d    45
e    55
dtype: int64
a     20
b     40
c     60
d     80
e    100
dtype: int64
a     100
b     400
c     900
d    1600
e    2500
dtype: int64

Operations Between Series

s1 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
s2 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Addition of two Series
print(s1 + s2)

# With different indices
s3 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'f', 'g'])
print(s1 + s3)  # Note the NaN values where indices don't align

Output:

a    11
b    22
c    33
d    44
e    55
dtype: int64
a    11.0
b    22.0
c    33.0
d    NaN
e    NaN
f    NaN
g    NaN
dtype: float64
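If NaN is not what you want for labels that appear in only one Series, the add method takes a fill_value argument. This sketch reuses s1 and s3 from above and treats a missing label as 0, which is one possible choice rather than a universal default:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])
s3 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "f", "g"])

# A label missing from one side is treated as 0 instead of producing NaN.
total = s1.add(s3, fill_value=0)
print(total)
```

The result still covers the union of both indexes, but every label now has a numeric value.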

Applying Functions

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Apply a function to each element
print(s.apply(lambda x: x * 2))

# Apply NumPy functions
print(np.sqrt(s))

Output:

a     20
b     40
c     60
d     80
e    100
dtype: int64
a    3.162278
b    4.472136
c    5.477226
d    6.324555
e    7.071068
dtype: float64
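For simple arithmetic, apply and a vectorized expression give the same values; the vectorized form is usually preferred because it avoids a Python-level call per element. The sketch below checks that equivalence and also shows map with a dictionary (the "low"/"high" labels are made up for illustration):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

doubled_apply = s.apply(lambda x: x * 2)  # calls the lambda once per element
doubled_vectorized = s * 2                # one array-level operation
print((doubled_apply == doubled_vectorized).all())

# map translates values through a dictionary; values without an entry become NaN.
levels = s.map({10: "low", 50: "high"})
print(levels)
```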

Common Series Methods

Here are some commonly used methods for Series:

Statistical Methods

s = pd.Series([10, 20, 30, 40, 50])

print(f"Mean: {s.mean()}")
print(f"Median: {s.median()}")
print(f"Standard deviation: {s.std()}")
print(f"Min: {s.min()}")
print(f"Max: {s.max()}")
print(f"Sum: {s.sum()}")
print(f"Description: \n{s.describe()}")

Output:

Mean: 30.0
Median: 30.0
Standard deviation: 15.811388300841896
Min: 10
Max: 50
Sum: 150
Description: 
count     5.000000
mean     30.000000
std      15.811388
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
dtype: float64

Data Transformation

s = pd.Series([10, 20, 30, 40, 50])

# Cumulative sum
print("Cumulative sum:")
print(s.cumsum())

# Percentage change
print("\nPercentage change:")
print(s.pct_change())

# Shift values (move values by 1)
print("\nShifted values:")
print(s.shift(1))

# Replace values
print("\nReplaced values (30 -> 300):")
print(s.replace(30, 300))

Output:

Cumulative sum:
0     10
1     30
2     60
3    100
4    150
dtype: int64

Percentage change:
0         NaN
1    1.000000
2    0.500000
3    0.333333
4    0.250000
dtype: float64

Shifted values:
0     NaN
1    10.0
2    20.0
3    30.0
4    40.0
dtype: float64

Replaced values (30 -> 300):
0     10
1     20
2    300
3     40
4     50
dtype: int64

Filtering and Sorting

s = pd.Series([30, 10, 50, 20, 40], index=['a', 'b', 'c', 'd', 'e'])

# Filtering
print("Values greater than 30:")
print(s[s > 30])

# Checking for values
print("\nIs 50 in the Series?")
print(50 in s.values)

# Sorting by value
print("\nSorted by value:")
print(s.sort_values())

# Sorting by index
print("\nSorted by index:")
print(s.sort_index())

Output:

Values greater than 30:
c    50
e    40
dtype: int64

Is 50 in the Series?
True

Sorted by value:
b    10
d    20
a    30
e    40
c    50
dtype: int64

Sorted by index:
a    30
b    10
c    50
d    20
e    40
dtype: int64
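The example above checks s.values because the in operator applied to a Series itself tests the index labels, not the data. A short sketch of the difference, plus isin for element-wise membership tests:

```python
import pandas as pd

s = pd.Series([30, 10, 50, 20, 40], index=["a", "b", "c", "d", "e"])

label_found = "a" in s         # True: "a" is an index label
value_as_label = 50 in s       # False: 50 is a value, not a label
value_found = 50 in s.values   # True: searches the underlying data

# isin returns a boolean mask with one entry per element.
mask = s.isin([10, 50])
print(label_found, value_as_label, value_found)
print(s[mask])
```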

Handling Missing Data

s = pd.Series([10, np.nan, 30, np.nan, 50], index=['a', 'b', 'c', 'd', 'e'])
print("Original Series with NaN values:")
print(s)

# Check for null values
print("\nNull value check:")
print(s.isnull())

# Drop null values
print("\nSeries with NaN values dropped:")
print(s.dropna())

# Fill null values
print("\nSeries with NaN values filled with 0:")
print(s.fillna(0))

# Fill null values with forward fill method
print("\nSeries with NaN values forward filled:")
print(s.ffill())  # Replaces the older s.fillna(method='ffill'), deprecated in pandas 2.1+

Output:

Original Series with NaN values:
a    10.0
b     NaN
c    30.0
d     NaN
e    50.0
dtype: float64

Null value check:
a    False
b     True
c    False
d     True
e    False
dtype: bool

Series with NaN values dropped:
a    10.0
c    30.0
e    50.0
dtype: float64

Series with NaN values filled with 0:
a    10.0
b     0.0
c    30.0
d     0.0
e    50.0
dtype: float64

Series with NaN values forward filled:
a    10.0
b    10.0
c    30.0
d    30.0
e    50.0
dtype: float64
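Filling with 0 or the previous value is not always appropriate. Two other common options, sketched here on the same data, are filling with the mean of the observed values and linear interpolation between neighbors. Which one fits depends on the data, so treat this as an illustration rather than a recommendation.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30, np.nan, 50], index=["a", "b", "c", "d", "e"])

# mean() skips NaN by default, so this is the mean of 10, 30, and 50.
filled_with_mean = s.fillna(s.mean())

# Linear interpolation treats the values as evenly spaced.
interpolated = s.interpolate()

print(filled_with_mean)
print(interpolated)
```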

Practical Examples

Let's look at some practical examples of using Series in real-world scenarios:

Example 1: Stock Prices Analysis

# Daily closing prices of a stock for one week
dates = pd.date_range('2023-01-01', periods=5, freq='D')
prices = pd.Series([150.5, 152.3, 151.9, 153.7, 155.2], index=dates)
print("Stock prices:")
print(prices)

# Calculate daily returns
daily_returns = prices.pct_change()
print("\nDaily returns:")
print(daily_returns)

# Calculate statistics
print("\nSummary statistics:")
print(daily_returns.describe())

# Find days with positive returns
print("\nDays with positive returns:")
print(prices[daily_returns > 0])

Output:

Stock prices:
2023-01-01    150.5
2023-01-02    152.3
2023-01-03    151.9
2023-01-04    153.7
2023-01-05    155.2
Freq: D, dtype: float64

Daily returns:
2023-01-01         NaN
2023-01-02    0.011960
2023-01-03   -0.002626
2023-01-04    0.011850
2023-01-05    0.009759
Freq: D, dtype: float64

Summary statistics:
count    4.000000
mean     0.007736
std      0.006982
min     -0.002626
25%      0.006663
50%      0.010805
75%      0.011877
max      0.011960
dtype: float64

Days with positive returns:
2023-01-02    152.3
2023-01-04    153.7
2023-01-05    155.2
dtype: float64
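As a possible extension of this example, daily returns can be compounded into a cumulative return. The sketch below assumes the first day's missing return should count as 0; the name cumulative_returns is not part of the original example.

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=5, freq="D")
prices = pd.Series([150.5, 152.3, 151.9, 153.7, 155.2], index=dates)
daily_returns = prices.pct_change()

# Compound the daily returns; the first day's NaN is treated as a 0% return.
cumulative_returns = (1 + daily_returns.fillna(0)).cumprod() - 1
print(cumulative_returns)

# The final value matches the overall change from first to last price.
total_return = prices.iloc[-1] / prices.iloc[0] - 1
print(total_return)
```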

Example 2: Sales Data Analysis

# Monthly sales data for a year
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales = pd.Series([10500, 12600, 14800, 13900, 15200, 16800, 
                  17500, 18100, 16400, 15300, 14700, 19200], index=months)

print("Monthly sales:")
print(sales)

# Calculate total yearly sales
print(f"\nTotal yearly sales: ${sales.sum():,}")

# Find the month with highest sales
print(f"\nMonth with highest sales: {sales.idxmax()} (${sales.max():,})")

# Find the month with lowest sales
print(f"\nMonth with lowest sales: {sales.idxmin()} (${sales.min():,})")

# Calculate quarterly sales
q1 = sales.iloc[0:3].sum()
q2 = sales.iloc[3:6].sum()
q3 = sales.iloc[6:9].sum()
q4 = sales.iloc[9:12].sum()

quarterly_sales = pd.Series([q1, q2, q3, q4], index=['Q1', 'Q2', 'Q3', 'Q4'])
print("\nQuarterly sales:")
print(quarterly_sales)

Output:

Monthly sales:
Jan     10500
Feb     12600
Mar     14800
Apr     13900
May     15200
Jun     16800
Jul     17500
Aug     18100
Sep     16400
Oct     15300
Nov     14700
Dec     19200
dtype: int64

Total yearly sales: $185,000

Month with highest sales: Dec ($19,200)

Month with lowest sales: Jan ($10,500)

Quarterly sales:
Q1    37900
Q2    45900
Q3    52000
Q4    49200
dtype: int64
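The four iloc slices can also be written as a single groupby. This is an alternative sketch, not the approach used above: it assumes the Series holds exactly twelve months in calendar order and assigns every three consecutive positions to one group.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales = pd.Series([10500, 12600, 14800, 13900, 15200, 16800,
                   17500, 18100, 16400, 15300, 14700, 19200], index=months)

# Positions 0-2 map to group 0, 3-5 to group 1, and so on.
quarter_ids = np.arange(len(sales)) // 3
quarterly_sales = sales.groupby(quarter_ids).sum()
quarterly_sales.index = ["Q1", "Q2", "Q3", "Q4"]
print(quarterly_sales)
```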

Example 3: Customer Survey Ratings

# Customer satisfaction ratings (1-5 scale)
ratings = pd.Series([5, 4, 4, 5, 3, 2, 5, 5, 4, 3, 5, 4, 5, 3, 4])

# Count of each rating
rating_counts = ratings.value_counts().sort_index()
print("Rating distribution:")
print(rating_counts)

# Percentage of each rating
rating_percent = ratings.value_counts(normalize=True).sort_index() * 100
print("\nPercentage distribution:")
print(rating_percent.map('{:.1f}%'.format))

# Average rating
print(f"\nAverage rating: {ratings.mean():.2f} out of 5")

# Percentage of satisfied customers (rating 4 or 5)
satisfied = ((ratings >= 4).sum() / ratings.size) * 100
print(f"\nPercentage of satisfied customers: {satisfied:.1f}%")

Output:

Rating distribution:
2    1
3    3
4    5
5    6
dtype: int64

Percentage distribution:
2     6.7%
3    20.0%
4    33.3%
5    40.0%
dtype: object

Average rating: 4.07 out of 5

Percentage of satisfied customers: 73.3%

Series vs. Other Python Data Structures

It's helpful to understand how pandas Series compares to other Python data structures:

Feature	pandas Series	Python List	NumPy Array	Python Dictionary
Labeled index	✓	✗	✗	✓ (keys)
Homogeneous data	Recommended but not required	✗	✓	✗
Math operations	✓ (vectorized)	✗	✓ (vectorized)	✗
Built-in data analysis	✓	✗	Limited	✗
Missing data handling	✓	✗	Limited	✗

Summary

In this guide, we've covered the pandas Series object in detail:

A Series is a one-dimensional labeled array capable of holding any data type
Series objects can be created from various data structures like lists, dictionaries, and NumPy arrays
Series have a flexible indexing system that allows access by position or label
Series support vectorized operations and built-in methods for data analysis
Series provide extensive functionality for handling missing data and data manipulation

The Series is the fundamental building block in pandas that, along with the DataFrame, enables powerful and efficient data analysis in Python. Understanding how to work with Series is essential for any data analysis workflow using pandas.

Exercises

To practice your Series skills, try these exercises:

Create a Series containing the temperatures (in Celsius) for a week and convert them to Fahrenheit (F = C * 9/5 + 32).
Given a Series of monthly expenses, calculate the total, average, minimum, and maximum expenses.
Create a Series of rainfall data with some missing values, then calculate the average rainfall after filling missing values with the mean.
Create a Series of test scores and calculate what percentage of students scored above the average.
Create a Series of stock prices and calculate the daily percentage change, then identify the day with the highest price increase.

Additional Resources

Happy coding with pandas Series!
