Pandas Mathematical Functions
Introduction
Mathematical operations are a fundamental part of data analysis and transformation. Pandas, one of the most widely used data manipulation libraries in Python, provides a broad set of mathematical functions for performing calculations on your datasets efficiently.
In this tutorial, we'll explore how Pandas handles basic arithmetic, statistical calculations, and more complex mathematical transformations on Series and DataFrame objects.
Basic Mathematical Operations
Arithmetic Operations
Pandas allows you to perform arithmetic operations (+, -, *, /) directly on Series and DataFrame objects.
import pandas as pd
import numpy as np
# Create a simple DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})
print("Original DataFrame:")
print(df)
# Addition
print("\nAdding 5 to all values:")
print(df + 5)
# Multiplication
print("\nMultiplying all values by 2:")
print(df * 2)
# Division
print("\nDividing all values by 10:")
print(df / 10)
Output:
Original DataFrame:
A B C
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400
4 5 50 500
Adding 5 to all values:
A B C
0 6 15 105
1 7 25 205
2 8 35 305
3 9 45 405
4 10 55 505
Multiplying all values by 2:
A B C
0 2 20 200
1 4 40 400
2 6 60 600
3 8 80 800
4 10 100 1000
Dividing all values by 10:
A B C
0 0.1 1.0 10.0
1 0.2 2.0 20.0
2 0.3 3.0 30.0
3 0.4 4.0 40.0
4 0.5 5.0 50.0
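The same element-wise behaviour applies to the other arithmetic operators, and each operator also has a method counterpart. The following is a minimal sketch rather than an exhaustive list; the variable names are illustrative and not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
})

# The + operator and the add() method produce the same result
added = df + 5
added_method = df.add(5)
print(added.equals(added_method))

# Power, floor division, and modulo are also applied element by element
squared = df ** 2
floor_div = df // 3
remainder = df % 3

print(squared)
print(floor_div)
print(remainder)
```

The method forms (add, sub, mul, div) become useful when you need extra options such as fill_value or axis, which the operators cannot accept.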
Operations Between DataFrames
You can perform operations between different DataFrames as well:
# Create two DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
df2 = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60]
})
print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)
# Addition of two DataFrames
print("\nAddition of two DataFrames:")
print(df1 + df2)
# Subtraction of two DataFrames
print("\nSubtraction (df2 - df1):")
print(df2 - df1)
Output:
DataFrame 1:
A B
0 1 4
1 2 5
2 3 6
DataFrame 2:
A B
0 10 40
1 20 50
2 30 60
Addition of two DataFrames:
A B
0 11 44
1 22 55
2 33 66
Subtraction (df2 - df1):
A B
0 9 36
1 18 45
2 27 54
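The example above works cleanly because both DataFrames share the same index and column labels. Pandas aligns operands by label, not by position, so labels that appear in only one operand produce NaN. The sketch below uses two small hypothetical frames (left and right) with partly overlapping indexes to show this, along with the fill_value option of add():

```python
import pandas as pd

# Hypothetical frames whose indexes overlap only on labels 1 and 2
left = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'A': [10, 20, 30]}, index=[1, 2, 3])

# Labels found in only one frame yield NaN in the result
aligned_sum = left + right
print(aligned_sum)

# add() with fill_value=0 treats the missing side as 0 instead
filled_sum = left.add(right, fill_value=0)
print(filled_sum)
```

Note that the result is converted to float in both cases, because the alignment step introduces missing values.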
Aggregate Mathematical Functions
Pandas provides various aggregate functions to compute descriptive statistics.
Basic Statistical Functions
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})
print("DataFrame:")
print(df)
# Sum of all columns
print("\nSum of all columns:")
print(df.sum())
# Mean of all columns
print("\nMean of all columns:")
print(df.mean())
# Standard deviation
print("\nStandard deviation:")
print(df.std())
# Min and Max values
print("\nMinimum values:")
print(df.min())
print("\nMaximum values:")
print(df.max())
Output:
DataFrame:
A B C
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400
4 5 50 500
Sum of all columns:
A 15
B 150
C 1500
dtype: int64
Mean of all columns:
A 3.0
B 30.0
C 300.0
dtype: float64
Standard deviation:
A 1.581139
B 15.811388
C 158.113883
dtype: float64
Minimum values:
A 1
B 10
C 100
dtype: int64
Maximum values:
A 5
B 50
C 500
dtype: int64
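Several statistics can be computed in one call with agg(). It is also worth knowing that std() computes the sample standard deviation by default (ddof=1), which is why column A above gives 1.581139 rather than 1.414214. A short sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
})

# One row per statistic, one column per original column
summary = df.agg(['sum', 'mean', 'min', 'max'])
print(summary)

# Sample standard deviation (default, divides by n - 1)
sample_std = df['A'].std()

# Population standard deviation (divides by n)
population_std = df['A'].std(ddof=0)

print(sample_std, population_std)
```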
Row-wise and Column-wise Operations
You can apply functions along rows or columns using the axis parameter:
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})
print("DataFrame:")
print(df)
# Column-wise sum (default)
print("\nColumn-wise sum:")
print(df.sum())
# Row-wise sum
print("\nRow-wise sum:")
print(df.sum(axis=1))
# Column-wise mean
print("\nColumn-wise mean:")
print(df.mean())
# Row-wise mean
print("\nRow-wise mean:")
print(df.mean(axis=1))
Output:
DataFrame:
A B C
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400
4 5 50 500
Column-wise sum:
A 15
B 150
C 1500
dtype: int64
Row-wise sum:
0 111
1 222
2 333
3 444
4 555
dtype: int64
Column-wise mean:
A 3.0
B 30.0
C 300.0
dtype: float64
Row-wise mean:
0 37.0
1 74.0
2 111.0
3 148.0
4 185.0
dtype: float64
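The axis parameter also matters when an aggregate is combined with the original DataFrame. As a sketch of one common case, the code below subtracts the column means and then the row means from the data; col_centered and row_centered are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# df.mean() is indexed by column label, so plain subtraction
# removes each column's mean from that column
col_centered = df - df.mean()
print(col_centered)

# df.mean(axis=1) is indexed by row label; sub(..., axis=0)
# matches it against the rows instead of the columns
row_centered = df.sub(df.mean(axis=1), axis=0)
print(row_centered)
```

Writing df - df.mean(axis=1) without axis=0 would try to match the row labels against the column labels and fill the result with NaN.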
Advanced Mathematical Functions
Element-wise Functions
Pandas objects can be passed directly to NumPy's universal functions (ufuncs), which operate element-wise and return a pandas object with the same labels:
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [0.1, 0.2, 0.3, 0.4, 0.5]
})
print("DataFrame:")
print(df)
# Square root
print("\nSquare root:")
print(np.sqrt(df))
# Exponential
print("\nExponential (e^x):")
print(np.exp(df))
# Logarithm (natural log)
print("\nNatural logarithm:")
print(np.log(df))
# Rounding (halves round to the nearest even value, so 0.5 becomes 0.0)
print("\nRounded to nearest integer:")
print(np.round(df))
Output:
DataFrame:
A B
0 1 0.1
1 2 0.2
2 3 0.3
3 4 0.4
4 5 0.5
Square root:
A B
0 1.000000 0.316228
1 1.414214 0.447214
2 1.732051 0.547723
3 2.000000 0.632456
4 2.236068 0.707107
Exponential (e^x):
A B
0 2.718282 1.105171
1 7.389056 1.221403
2 20.085537 1.349859
3 54.598150 1.491825
4 148.413159 1.648721
Natural logarithm:
A B
0 0.000000 -2.302585
1 0.693147 -1.609438
2 1.098612 -1.203973
3 1.386294 -0.916291
4 1.609438 -0.693147
Rounded to nearest integer:
A B
0 1 0.0
1 2 0.0
2 3 0.0
3 4 0.0
4 5 0.0
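Functions such as np.log are only defined for positive inputs: a zero yields -inf and a negative value yields NaN, each with a runtime warning. The sketch below uses a small hypothetical Series to show this, and np.log1p (which computes log(1 + x)) as one common way to handle data that contains zeros:

```python
import numpy as np
import pandas as pd

# Hypothetical data containing a zero
s = pd.Series([0.0, 1.0, 10.0])

# Suppress the divide-by-zero warning for this demonstration only
with np.errstate(divide='ignore'):
    logged = np.log(s)
print(logged)

# log1p maps 0 to 0 and stays finite for non-negative input
logged_1p = np.log1p(s)
print(logged_1p)
```

Whether log1p is an appropriate substitute depends on the analysis, since it changes the values being transformed.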
Custom Functions with apply()
The apply() method allows you to apply custom functions to your data:
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})
print("DataFrame:")
print(df)
# Apply custom function to each column
def double_and_add_one(col):
    return col * 2 + 1
print("\nApplying double_and_add_one to each column:")
print(df.apply(double_and_add_one))
# Apply custom function to each row
def range_of_values(row):
    return row.max() - row.min()
print("\nRange of values in each row:")
print(df.apply(range_of_values, axis=1))
# Using lambda functions
print("\nSquare of each value:")
print(df.apply(lambda x: x ** 2))
Output:
DataFrame:
A B C
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400
4 5 50 500
Applying double_and_add_one to each column:
A B C
0 3 21 201
1 5 41 401
2 7 61 601
3 9 81 801
4 11 101 1001
Range of values in each row:
0 99
1 198
2 297
3 396
4 495
dtype: int64
Square of each value:
A B C
0 1 100 10000
1 4 400 40000
2 9 900 90000
3 16 1600 160000
4 25 2500 250000
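Each of the apply() calls above has a direct vectorized equivalent, which is usually shorter and faster on large data. The following sketch checks that both forms agree; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Column-wise custom function versus plain arithmetic
via_apply = df.apply(lambda col: col * 2 + 1)
vectorized = df * 2 + 1
print(via_apply.equals(vectorized))

# Row-wise custom function versus built-in aggregates along axis=1
range_via_apply = df.apply(lambda row: row.max() - row.min(), axis=1)
range_vectorized = df.max(axis=1) - df.min(axis=1)
print(range_vectorized)
```

apply() remains the better choice when the logic cannot be expressed with built-in operations.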
Practical Examples
Example 1: Financial Analysis
Let's calculate some basic financial metrics using Pandas:
# Stock price data
stock_data = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=5),
    'Open': [100, 102, 104, 103, 105],
    'High': [105, 107, 108, 106, 110],
    'Low': [98, 100, 102, 101, 103],
    'Close': [102, 104, 103, 105, 108]
})
stock_data.set_index('Date', inplace=True)
print("Stock price data:")
print(stock_data)
# Calculate daily returns
stock_data['Daily_Return'] = stock_data['Close'].pct_change() * 100
# Calculate trading range
stock_data['Range'] = stock_data['High'] - stock_data['Low']
# Calculate moving average of closing price (window of 3 days)
stock_data['MA3'] = stock_data['Close'].rolling(window=3).mean()
print("\nStock data with calculated metrics:")
print(stock_data)
print("\nSummary statistics:")
print(stock_data.describe())
Output:
Stock price data:
Open High Low Close
Date
2023-01-01 100 105 98 102
2023-01-02 102 107 100 104
2023-01-03 104 108 102 103
2023-01-04 103 106 101 105
2023-01-05 105 110 103 108
Stock data with calculated metrics:
Open High Low Close Daily_Return Range MA3
Date
2023-01-01 100 105 98 102 NaN 7 NaN
2023-01-02 102 107 100 104 1.960784 7 NaN
2023-01-03 104 108 102 103 -0.961538 6 103.000000
2023-01-04 103 106 101 105 1.941748 5 104.000000
2023-01-05 105 110 103 108 2.857143 7 105.333333
Summary statistics:
Open High Low Close Daily_Return Range MA3
count 5.000000 5.000000 5.000000 5.000000 4.000000 5.000000 3.000000
mean 102.800000 107.200000 100.800000 104.400000 1.449534 6.400000 104.111111
std 1.923538 1.923538 1.923538 2.302173 1.663158 0.894427 1.170628
min 100.000000 105.000000 98.000000 102.000000 -0.961538 5.000000 103.000000
25% 102.000000 106.000000 100.000000 103.000000 1.215926 6.000000 103.500000
50% 103.000000 107.000000 101.000000 104.000000 1.951266 7.000000 104.000000
75% 104.000000 108.000000 102.000000 105.000000 2.184874 7.000000 104.666667
max 105.000000 110.000000 103.000000 108.000000 2.857143 7.000000 105.333333
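To make the two derived columns less opaque, the sketch below reproduces pct_change() and the last rolling mean by hand using the closing prices from this example. It shows why the first return and the first two moving-average values are NaN: there is no earlier row to compare against or not enough rows to fill the window.

```python
import pandas as pd

close = pd.Series(
    [102, 104, 103, 105, 108],
    index=pd.date_range(start='2023-01-01', periods=5),
    name='Close',
)

# pct_change() compares each value with the previous row
pct = close.pct_change() * 100
manual_pct = (close / close.shift(1) - 1) * 100
print(pd.DataFrame({'pct_change': pct, 'manual': manual_pct}))

# The last 3-day moving average is the mean of the last three closes
rolling_mean = close.rolling(window=3).mean()
manual_last_ma = close.iloc[-3:].mean()
print(rolling_mean.iloc[-1], manual_last_ma)
```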
Example 2: Data Normalization
Normalization is a common preprocessing step in machine learning:
# Sample dataset
data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [100, 150, 200, 250, 300],
    'Feature3': [1000, 2000, 3000, 4000, 5000]
})
print("Original dataset:")
print(data)
# Min-Max Normalization (scaling features to range [0,1])
def min_max_normalize(x):
    return (x - x.min()) / (x.max() - x.min())
# Z-score normalization (standardization)
def z_score_normalize(x):
    return (x - x.mean()) / x.std()
normalized_data_minmax = data.apply(min_max_normalize)
print("\nMin-Max Normalized data (scale 0 to 1):")
print(normalized_data_minmax)
normalized_data_zscore = data.apply(z_score_normalize)
print("\nZ-Score Normalized data:")
print(normalized_data_zscore)
Output:
Original dataset:
Feature1 Feature2 Feature3
0 10 100 1000
1 20 150 2000
2 30 200 3000
3 40 250 4000
4 50 300 5000
Min-Max Normalized data (scale 0 to 1):
Feature1 Feature2 Feature3
0 0.00 0.00 0.00
1 0.25 0.25 0.25
2 0.50 0.50 0.50
3 0.75 0.75 0.75
4 1.00 1.00 1.00
Z-Score Normalized data:
Feature1 Feature2 Feature3
0 -1.264911 -1.264911 -1.264911
1 -0.632456 -0.632456 -0.632456
2 0.000000 0.000000 0.000000
3 0.632456 0.632456 0.632456
4 1.264911 1.264911 1.264911
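Two details of these formulas are easy to miss. First, x.std() uses the sample standard deviation (ddof=1), while some tools standardize with the population standard deviation (ddof=0), which gives slightly different z-scores. Second, min-max scaling divides by zero when a column is constant. The sketch below illustrates both; the helper names and the Constant column are hypothetical, and returning zeros for a constant column is only one possible convention:

```python
import pandas as pd

data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Constant': [7, 7, 7, 7, 7],  # hypothetical column with no spread
})

def z_score_population(x):
    # Same formula as above, but dividing by the population std
    return (x - x.mean()) / x.std(ddof=0)

def min_max_safe(x):
    span = x.max() - x.min()
    if span == 0:
        # Assumption: map a constant column to all zeros
        return x * 0.0
    return (x - x.min()) / span

z_pop = z_score_population(data['Feature1'])
print(z_pop)

scaled = data.apply(min_max_safe)
print(scaled)
```

With ddof=0, the first z-score for Feature1 is about -1.414214 instead of the -1.264911 shown above.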
Summary
In this tutorial, we explored Pandas' comprehensive set of mathematical functions that enable efficient data transformation and analysis. We covered:
- Basic Arithmetic Operations: Addition, subtraction, multiplication, and division on DataFrames
- Aggregate Functions: Computing statistics like sum, mean, standard deviation
- Element-wise Functions: Applying mathematical functions to each element
- Custom Functions: Using apply() to implement your own operations
- Practical Examples: Financial analysis and data normalization
With these tools, you can perform a wide range of mathematical transformations on your datasets, making Pandas an indispensable tool for data analysis and preparation for machine learning.
Additional Resources and Exercises
Further Reading
- Pandas Documentation on Computation
- NumPy Universal Functions
- Python for Data Analysis by Wes McKinney (creator of Pandas)
Exercises
- Basic Operations:
  - Create a DataFrame with employee information (name, salary, bonus) and calculate their total compensation (salary + bonus).
  - Calculate percentage increase in their compensation if everyone gets a 5% raise.
- Statistical Analysis:
  - Create a DataFrame with student exam scores for different subjects and calculate:
    - Average score for each student
    - Average score for each subject
    - Identify highest and lowest performing student in each subject
- Advanced Transformation:
  - Create a dataset with temperature readings in Celsius for a week.
  - Convert the temperatures to Fahrenheit (F = C * 9/5 + 32)
  - Calculate the 3-day moving average of temperatures
  - Identify days with temperatures above the weekly average
- Real-world Data:
  - Download a real dataset (e.g., stock prices, weather data)
  - Apply at least three different mathematical transformations
  - Create visualizations to show the results of your transformations
By practicing these exercises, you'll gain practical experience with Pandas' mathematical functions and develop your data transformation skills.