Pandas Series

Introduction

In pandas, a Series is one of the fundamental data structures that serves as a building block for working with data. You can think of a Series as a one-dimensional labeled array capable of holding data of any type (integers, strings, floating-point numbers, Python objects, etc.). It's similar to a column in a spreadsheet or a database table, or a single variable in your dataset.

A Series is characterized by:

  • An index that labels each element in the Series
  • A collection of data values
  • The ability to hold any data type (mixed types are allowed, but they are stored with the generic object dtype)
  • A variety of built-in methods for data manipulation and analysis

In this guide, we'll explore how to create, manipulate, and work with pandas Series objects.

Prerequisites

To follow along with this tutorial, make sure you have pandas installed:

bash
pip install pandas

Let's begin by importing pandas:

python
import pandas as pd
import numpy as np # We'll use numpy in some examples

Creating a Series

From a List

The simplest way to create a pandas Series is from a list:

python
# Create a Series from a list
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
4    50
dtype: int64

Notice that pandas automatically assigned an index (0 to 4) to our Series.

With Custom Index

You can specify your own index labels when creating a Series:

python
# Create a Series with custom indices
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=index)
print(s)

Output:

a    10
b    20
c    30
d    40
e    50
dtype: int64

From a Dictionary

When creating a Series from a dictionary, the keys become the index:

python
# Create a Series from a dictionary
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
s = pd.Series(data)
print(s)

Output:

a    10
b    20
c    30
d    40
e    50
dtype: int64
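
When a dictionary is combined with an explicit index, pandas looks up each requested label among the keys. The sketch below uses made-up data to show this: the index controls the order, and a label with no matching key (here the hypothetical 'z') gets NaN.

```python
import pandas as pd

data = {'a': 10, 'b': 20, 'c': 30}

# Request labels in a custom order, including one ('z') that is not a key
s = pd.Series(data, index=['c', 'a', 'z'])
print(s)

# 'z' has no matching key, so its value is NaN and the dtype becomes float64
print(s.isna().tolist())   # [False, False, True]
```

Because NaN is a floating-point value, the integers are converted to floats in the result.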

With Scalar Value

You can create a Series with a single scalar value that gets repeated for each index:

python
# Create a Series with a scalar value
s = pd.Series(5, index=['a', 'b', 'c', 'd', 'e'])
print(s)

Output:

a    5
b    5
c    5
d    5
e    5
dtype: int64

From NumPy Arrays

Series can also be created from NumPy arrays:

python
# Create a Series from a NumPy array
data = np.array([10, 20, 30, 40, 50])
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
4    50
dtype: int64
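
Whichever source you start from, the constructor also accepts optional dtype and name arguments. The following is a small sketch with illustrative values and names; it also shows what happens when the data mixes types.

```python
import pandas as pd

# Ask for a specific dtype and give the Series a name at construction time
floats = pd.Series([10, 20, 30], dtype='float64', name='measurements')
print(floats.dtype)   # float64
print(floats.name)    # measurements

# Mixing types falls back to the generic object dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)    # object
```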

Series Attributes

Let's explore some important attributes of a Series:

python
# Create a sample Series
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# 1. Values
print("Series values:")
print(s.values)

# 2. Index
print("\nSeries index:")
print(s.index)

# 3. Data type
print("\nSeries data type:")
print(s.dtype)

# 4. Shape
print("\nSeries shape:")
print(s.shape)

# 5. Size
print("\nSeries size:")
print(s.size)

# 6. Name (if set)
s.name = "My Series"
print("\nSeries name:")
print(s.name)

Output:

Series values:
[10 20 30 40 50]

Series index:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

Series data type:
int64

Series shape:
(5,)

Series size:
5

Series name:
My Series
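
Besides the values attribute, a Series offers the to_numpy() method, which the pandas documentation recommends when you need the data as a NumPy array. A brief sketch, reusing the sample data from above:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

arr = s.to_numpy()          # NumPy array of the values
labels = s.index.tolist()   # plain Python list of the labels

print(type(arr).__name__, arr.tolist())
print(labels)
print(s.ndim, s.empty)      # a Series always has exactly one dimension
```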

Accessing Series Elements

There are multiple ways to access elements in a Series:

By Position (Integer Location)

Using iloc for integer-based indexing:

python
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Access single element by position
print(s.iloc[0]) # First element
print(s.iloc[-1]) # Last element

# Slicing by position
print(s.iloc[1:4]) # Elements at positions 1, 2, and 3

Output:

10
50
b    20
c    30
d    40
dtype: int64

By Label (Index)

Using loc for label-based indexing:

python
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Access single element by label
print(s.loc['a'])

# Slicing by label (inclusive of end label)
print(s.loc['b':'d'])

Output:

10
b    20
c    30
d    40
dtype: int64
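
The difference between loc and iloc matters most when the index labels are themselves integers. A minimal sketch, using a made-up Series whose labels deliberately differ from the positions:

```python
import pandas as pd

# Integer labels that deliberately do not match the positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s.loc[2])    # label 2    -> 10
print(s.iloc[2])   # position 2 -> 30

# Label slices include the end label; position slices exclude the end position
print(s.loc[2:1].tolist())    # [10, 20]
print(s.iloc[0:1].tolist())   # [10]
```

Being explicit with loc or iloc avoids any ambiguity about which of the two meanings is intended.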

Direct Indexing

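You can also index a Series directly with square brackets. Direct indexing accepts a label, a list of labels, or a boolean mask. With a non-integer index such as this one, older pandas versions also treated an integer key as a position, but that fallback is deprecated in recent releases, so prefer iloc for positional access: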

python
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# By label
print(s['a'])

# By multiple labels
print(s[['a', 'c', 'e']])

# By boolean condition
print(s[s > 30])

Output:

10
a    10
c    30
e    50
dtype: int64
d    40
e    50
dtype: int64
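
Direct indexing raises a KeyError when the label does not exist. As a sketch of two related techniques, the example below uses get() to supply a fallback and combines two boolean conditions in one mask; the label 'z' is a hypothetical missing label.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# s['z'] would raise KeyError; get() returns a fallback instead
print(s.get('a'))               # 10
print(s.get('z'))               # None
print(s.get('z', default=-1))   # -1

# Combine boolean conditions with & and |, wrapping each in parentheses
middle = s[(s > 10) & (s < 50)]
print(middle.tolist())          # [20, 30, 40]
```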

Series Operations

Series support various operations that make data manipulation easy:

Mathematical Operations

python
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Addition
print(s + 5)

# Multiplication
print(s * 2)

# Power
print(s ** 2)

Output:

a    15
b    25
c    35
d    45
e    55
dtype: int64
a     20
b     40
c     60
d     80
e    100
dtype: int64
a     100
b     400
c     900
d    1600
e    2500
dtype: int64

Operations Between Series

python
s1 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
s2 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Addition of two Series
print(s1 + s2)

# With different indices
s3 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'f', 'g'])
print(s1 + s3) # Note the NaN values where indices don't align

Output:

a    11
b    22
c    33
d    44
e    55
dtype: int64
a    11.0
b    22.0
c    33.0
d     NaN
e     NaN
f     NaN
g     NaN
dtype: float64
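
If NaN is not what you want for unmatched labels, there are a couple of common options. The sketch below reuses s1 and s3 from above and shows the add() method with fill_value, plus dropping the unmatched labels afterwards.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
s3 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'f', 'g'])

# Treat a label missing from one side as 0 instead of producing NaN
total = s1.add(s3, fill_value=0)
print(total)

# Alternatively, keep only the labels present in both Series
common = (s1 + s3).dropna()
print(common)
```

Which option is appropriate depends on whether a missing label really means "zero" in your data.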

Applying Functions

python
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Apply a function to each element
print(s.apply(lambda x: x * 2))

# Apply NumPy functions
print(np.sqrt(s))

Output:

a     20
b     40
c     60
d     80
e    100
dtype: int64
a    3.162278
b    4.472136
c    5.477226
d    6.324555
e    7.071068
dtype: float64
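
apply() is most useful for logic that cannot be written as plain arithmetic. The following sketch uses a hypothetical helper, label_size, with an arbitrary threshold, and also checks that simple arithmetic gives the same result with or without apply().

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

def label_size(value):
    """Classify a value as 'low' or 'high' (the threshold 30 is arbitrary)."""
    return 'high' if value >= 30 else 'low'

# apply() accepts a named function as well as a lambda
labels = s.apply(label_size)
print(labels)

# For plain arithmetic, the vectorized form gives the same result as apply()
same = (s * 2).equals(s.apply(lambda x: x * 2))
print(same)
```

When both forms are possible, the vectorized expression is usually the shorter and faster choice.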

Common Series Methods

Here are some commonly used methods for Series:

Statistical Methods

python
s = pd.Series([10, 20, 30, 40, 50])

print(f"Mean: {s.mean()}")
print(f"Median: {s.median()}")
print(f"Standard deviation: {s.std()}")
print(f"Min: {s.min()}")
print(f"Max: {s.max()}")
print(f"Sum: {s.sum()}")
print(f"Description: \n{s.describe()}")

Output:

Mean: 30.0
Median: 30.0
Standard deviation: 15.811388300841896
Min: 10
Max: 50
Sum: 150
Description:
count     5.000000
mean     30.000000
std      15.811388
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
dtype: float64
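
When you only need a few statistics, you can request them together instead of calling each method separately. A short sketch using agg() and quantile() on the same data:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Compute several statistics in one call; the result is itself a Series
summary = s.agg(['min', 'max', 'mean'])
print(summary)

# Any quantile between 0 and 1 can be requested (linear interpolation by default)
p90 = s.quantile(0.9)
print(p90)   # 46.0
```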

Data Transformation

python
s = pd.Series([10, 20, 30, 40, 50])

# Cumulative sum
print("Cumulative sum:")
print(s.cumsum())

# Percentage change
print("\nPercentage change:")
print(s.pct_change())

# Shift values (move values by 1)
print("\nShifted values:")
print(s.shift(1))

# Replace values
print("\nReplaced values (30 -> 300):")
print(s.replace(30, 300))

Output:

Cumulative sum:
0     10
1     30
2     60
3    100
4    150
dtype: int64

Percentage change:
0         NaN
1    1.000000
2    0.500000
3    0.333333
4    0.250000
dtype: float64

Shifted values:
0     NaN
1    10.0
2    20.0
3    30.0
4    40.0
dtype: float64

Replaced values (30 -> 300):
0     10
1     20
2    300
3     40
4     50
dtype: int64
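
The shifted Series is the building block behind several of these transformations. As a sketch of the relationship, the example below recomputes the difference and the percentage change by hand from shift(1) and compares them with the built-in diff() and pct_change() methods.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# shift(1) moves each value down one row, so each row sees the previous value
previous = s.shift(1)

# Difference and percentage change computed by hand
manual_diff = s - previous
manual_pct = s / previous - 1

# Compare with the built-in methods (NaN in the first row on both sides)
diff_matches = np.allclose(manual_diff, s.diff(), equal_nan=True)
pct_matches = np.allclose(manual_pct, s.pct_change(), equal_nan=True)
print(diff_matches, pct_matches)
```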

Filtering and Sorting

python
s = pd.Series([30, 10, 50, 20, 40], index=['a', 'b', 'c', 'd', 'e'])

# Filtering
print("Values greater than 30:")
print(s[s > 30])

# Checking for values
print("\nIs 50 in the Series?")
print(50 in s.values)

# Sorting by value
print("\nSorted by value:")
print(s.sort_values())

# Sorting by index
print("\nSorted by index:")
print(s.sort_index())

Output:

Values greater than 30:
c    50
e    40
dtype: int64

Is 50 in the Series?
True

Sorted by value:
b    10
d    20
a    30
e    40
c    50
dtype: int64

Sorted by index:
a    30
b    10
c    50
d    20
e    40
dtype: int64
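
The membership check above uses s.values for a reason: applying in to the Series itself tests the index labels, not the data. The sketch below illustrates this, along with isin() for element-wise membership and two ways to get the largest values first.

```python
import pandas as pd

s = pd.Series([30, 10, 50, 20, 40], index=['a', 'b', 'c', 'd', 'e'])

# `in` on a Series tests the index labels, not the values
print('a' in s)    # True
print(50 in s)     # False

# isin() tests each value for membership
matches = s[s.isin([10, 50])]
print(matches.index.tolist())   # ['b', 'c']

# Largest values first
descending = s.sort_values(ascending=False)
print(descending.index.tolist())   # ['c', 'e', 'a', 'd', 'b']
top_two = s.nlargest(2)
print(top_two.tolist())            # [50, 40]
```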

Handling Missing Data

python
s = pd.Series([10, np.nan, 30, np.nan, 50], index=['a', 'b', 'c', 'd', 'e'])
print("Original Series with NaN values:")
print(s)

# Check for null values
print("\nNull value check:")
print(s.isnull())

# Drop null values
print("\nSeries with NaN values dropped:")
print(s.dropna())

# Fill null values
print("\nSeries with NaN values filled with 0:")
print(s.fillna(0))

# Fill null values with forward fill method
print("\nSeries with NaN values forward filled:")
print(s.ffill())  # Equivalent to s.fillna(method='ffill'), which is deprecated in recent pandas versions

Output:

Original Series with NaN values:
a    10.0
b     NaN
c    30.0
d     NaN
e    50.0
dtype: float64

Null value check:
a    False
b     True
c    False
d     True
e    False
dtype: bool

Series with NaN values dropped:
a    10.0
c    30.0
e    50.0
dtype: float64

Series with NaN values filled with 0:
a    10.0
b     0.0
c    30.0
d     0.0
e    50.0
dtype: float64

Series with NaN values forward filled:
a    10.0
b    10.0
c    30.0
d    30.0
e    50.0
dtype: float64
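
Filling with a constant or the previous value is not always the best choice. As a sketch of two other options on the same data, the example below fills the gaps with the mean of the observed values and with linear interpolation.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30, np.nan, 50], index=['a', 'b', 'c', 'd', 'e'])

# Count missing values; mean() skips NaN by default
missing_count = int(s.isna().sum())
print(missing_count)   # 2
print(s.mean())        # 30.0

# Fill gaps with the mean of the observed values
mean_filled = s.fillna(s.mean())
print(mean_filled.tolist())    # [10.0, 30.0, 30.0, 30.0, 50.0]

# Linear interpolation between neighbouring values
interpolated = s.interpolate()
print(interpolated.tolist())   # [10.0, 20.0, 30.0, 40.0, 50.0]
```

Interpolation assumes the values change smoothly between observations, which may or may not hold for your data.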

Practical Examples

Let's look at some practical examples of using Series in real-world scenarios:

Example 1: Stock Prices Analysis

python
# Daily closing prices of a stock over five days
dates = pd.date_range('2023-01-01', periods=5, freq='D')
prices = pd.Series([150.5, 152.3, 151.9, 153.7, 155.2], index=dates)
print("Stock prices:")
print(prices)

# Calculate daily returns
daily_returns = prices.pct_change()
print("\nDaily returns:")
print(daily_returns)

# Calculate statistics
print("\nSummary statistics:")
print(daily_returns.describe())

# Find days with positive returns
print("\nDays with positive returns:")
print(prices[daily_returns > 0])

Output:

Stock prices:
2023-01-01    150.5
2023-01-02    152.3
2023-01-03    151.9
2023-01-04    153.7
2023-01-05    155.2
Freq: D, dtype: float64

Daily returns:
2023-01-01         NaN
2023-01-02    0.011960
2023-01-03   -0.002626
2023-01-04    0.011850
2023-01-05    0.009759
Freq: D, dtype: float64

Summary statistics:
count    4.000000
mean     0.007736
std      0.006982
min     -0.002626
25%      0.006663
50%      0.010805
75%      0.011877
max      0.011960
dtype: float64

Days with positive returns:
2023-01-02    152.3
2023-01-04    153.7
2023-01-05    155.2
dtype: float64
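
A few further calculations follow naturally from the same price Series. The sketch below is one possible extension, not part of the original analysis: it computes the return over the whole period, finds the day with the largest gain, and adds a 3-day moving average (the window size is an arbitrary choice).

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=5, freq='D')
prices = pd.Series([150.5, 152.3, 151.9, 153.7, 155.2], index=dates)

# Return over the whole period: last price relative to the first
total_return = prices.iloc[-1] / prices.iloc[0] - 1
print(f"Total return: {total_return:.2%}")

# Day with the largest single-day gain
daily_returns = prices.pct_change()
best_day = daily_returns.idxmax()
print(best_day.date())

# 3-day moving average (the first two entries are NaN)
moving_avg = prices.rolling(window=3).mean()
print(moving_avg)
```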

Example 2: Sales Data Analysis

python
# Monthly sales data for a year
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales = pd.Series([10500, 12600, 14800, 13900, 15200, 16800,
                   17500, 18100, 16400, 15300, 14700, 19200], index=months)

print("Monthly sales:")
print(sales)

# Calculate total yearly sales
print(f"\nTotal yearly sales: ${sales.sum():,}")

# Find the month with highest sales
print(f"\nMonth with highest sales: {sales.idxmax()} (${sales.max():,})")

# Find the month with lowest sales
print(f"\nMonth with lowest sales: {sales.idxmin()} (${sales.min():,})")

# Calculate quarterly sales
q1 = sales.iloc[0:3].sum()
q2 = sales.iloc[3:6].sum()
q3 = sales.iloc[6:9].sum()
q4 = sales.iloc[9:12].sum()

quarterly_sales = pd.Series([q1, q2, q3, q4], index=['Q1', 'Q2', 'Q3', 'Q4'])
print("\nQuarterly sales:")
print(quarterly_sales)

Output:

Monthly sales:
Jan    10500
Feb    12600
Mar    14800
Apr    13900
May    15200
Jun    16800
Jul    17500
Aug    18100
Sep    16400
Oct    15300
Nov    14700
Dec    19200
dtype: int64

Total yearly sales: $185,000

Month with highest sales: Dec ($19,200)

Month with lowest sales: Jan ($10,500)

Quarterly sales:
Q1    37900
Q2    45900
Q3    52000
Q4    49200
dtype: int64
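
The four slices above can also be replaced by a single grouping step. The following is an alternative sketch, not the approach used in the example: it assigns each month a quarter number from its position and sums within each group. The name quarter_ids is illustrative.

```python
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales = pd.Series([10500, 12600, 14800, 13900, 15200, 16800,
                   17500, 18100, 16400, 15300, 14700, 19200], index=months)

# Positions 0-2 map to 0, 3-5 to 1, and so on: one group id per quarter
quarter_ids = np.arange(len(sales)) // 3

# Sum the sales within each group, then relabel the groups
quarterly_sales = sales.groupby(quarter_ids).sum()
quarterly_sales.index = ['Q1', 'Q2', 'Q3', 'Q4']
print(quarterly_sales)
```

This produces the same totals as the slicing version and scales to other group sizes by changing the divisor.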

Example 3: Customer Survey Ratings

python
# Customer satisfaction ratings (1-5 scale)
ratings = pd.Series([5, 4, 4, 5, 3, 2, 5, 5, 4, 3, 5, 4, 5, 3, 4])

# Count of each rating
rating_counts = ratings.value_counts().sort_index()
print("Rating distribution:")
print(rating_counts)

# Percentage of each rating
rating_percent = ratings.value_counts(normalize=True).sort_index() * 100
print("\nPercentage distribution:")
print(rating_percent.map('{:.1f}%'.format))

# Average rating
print(f"\nAverage rating: {ratings.mean():.2f} out of 5")

# Percentage of satisfied customers (rating 4 or 5)
satisfied = ((ratings >= 4).sum() / ratings.size) * 100
print(f"\nPercentage of satisfied customers: {satisfied:.1f}%")

Output:

Rating distribution:
2    1
3    3
4    5
5    6
dtype: int64

Percentage distribution:
2     6.7%
3    20.0%
4    33.3%
5    40.0%
dtype: object

Average rating: 4.07 out of 5

Percentage of satisfied customers: 73.3%
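
Note that no customer gave a rating of 1, so that value is absent from the distribution. As a small sketch, reindex() can add the missing scale point with a count of zero, and mode() reports the most common rating.

```python
import pandas as pd

ratings = pd.Series([5, 4, 4, 5, 3, 2, 5, 5, 4, 3, 5, 4, 5, 3, 4])

# Include every point on the 1-5 scale, even ratings nobody gave
full_counts = ratings.value_counts().reindex(range(1, 6), fill_value=0)
print(full_counts.tolist())   # [0, 1, 3, 5, 6]

# Most common rating (mode() returns a Series because ties are possible)
most_common = ratings.mode()
print(most_common.tolist())   # [5]
```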

Series vs. Other Python Data Structures

It's helpful to understand how pandas Series compares to other Python data structures:

Feature                | pandas Series                | Python List | NumPy Array    | Python Dictionary
Labeled index          | ✓                            | ✗           | ✗              | ✓ (keys)
Homogeneous data       | Recommended but not required | ✗           | ✓              | ✗
Math operations        | ✓ (vectorized)               | ✗           | ✓ (vectorized) | ✗
Built-in data analysis | ✓                            | ✗           | Limited        | ✗
Missing data handling  | ✓                            | ✗           | Limited        | ✗
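
To make the comparison concrete, here is a brief sketch with made-up values that applies the same operations to a list, a NumPy array, a dictionary, and a Series.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

# Multiplying a list repeats it; it is not element-wise arithmetic
repeated = values * 2
print(repeated)                            # [10, 20, 30, 10, 20, 30]

# NumPy arrays and Series are vectorized
doubled_array = np.array(values) * 2
print(doubled_array.tolist())              # [20, 40, 60]
series = pd.Series(values, index=['a', 'b', 'c'])
doubled_series = series * 2
print(doubled_series.tolist())             # [20, 40, 60]

# Like a dictionary, a Series supports lookup by label
as_dict = {'a': 10, 'b': 20, 'c': 30}
print(as_dict['b'], series['b'])           # 20 20
```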

Summary

In this guide, we've covered the pandas Series object in detail:

  • A Series is a one-dimensional labeled array capable of holding any data type
  • Series objects can be created from various data structures like lists, dictionaries, and NumPy arrays
  • Series have a flexible indexing system that allows access by position or label
  • Series support vectorized operations and built-in methods for data analysis
  • Series provide extensive functionality for handling missing data and data manipulation

The Series is the fundamental building block in pandas that, along with the DataFrame, enables powerful and efficient data analysis in Python. Understanding how to work with Series is essential for any data analysis workflow using pandas.

Exercises

To practice your Series skills, try these exercises:

  1. Create a Series containing the temperatures (in Celsius) for a week and convert them to Fahrenheit (F = C * 9/5 + 32).
  2. Given a Series of monthly expenses, calculate the total, average, minimum, and maximum expenses.
  3. Create a Series of rainfall data with some missing values, then calculate the average rainfall after filling missing values with the mean.
  4. Create a Series of test scores and calculate what percentage of students scored above the average.
  5. Create a Series of stock prices and calculate the daily percentage change, then identify the day with the highest price increase.

Happy coding with pandas Series!