Pandas Mathematical Functions

Introduction

Mathematical operations are a fundamental part of data analysis and transformation. Pandas provides a broad set of mathematical functions that operate on whole Series and DataFrame objects at once, so you can run calculations on a dataset without writing explicit loops.

In this tutorial, we'll work through basic arithmetic, aggregate statistics, element-wise transformations with NumPy functions, and custom functions with apply(), and then use them in two practical examples.

Basic Mathematical Operations

Arithmetic Operations

Pandas allows you to perform arithmetic operations (+, -, *, /) directly on Series and DataFrame objects. When one operand is a scalar, the operation is applied to every element.

python
import pandas as pd
import numpy as np

# Create a simple DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

print("Original DataFrame:")
print(df)

# Addition
print("\nAdding 5 to all values:")
print(df + 5)

# Multiplication
print("\nMultiplying all values by 2:")
print(df * 2)

# Division
print("\nDividing all values by 10:")
print(df / 10)

Output:

Original DataFrame:
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500

Adding 5 to all values:
    A   B    C
0   6  15  105
1   7  25  205
2   8  35  305
3   9  45  405
4  10  55  505

Multiplying all values by 2:
    A    B     C
0   2   20   200
1   4   40   400
2   6   60   600
3   8   80   800
4  10  100  1000

Dividing all values by 10:
     A    B     C
0  0.1  1.0  10.0
1  0.2  2.0  20.0
2  0.3  3.0  30.0
3  0.4  4.0  40.0
4  0.5  5.0  50.0
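
Each operator also has a method form, and the second operand does not have to be a scalar. The following is a minimal sketch, assuming the same df as above; the `scale` Series and its divisors are illustrative values, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Method form of the + operator: df.add(5) computes the same values as df + 5
added = df.add(5)
print(added.equals(df + 5))

# A Series whose index matches the column labels is applied column by column
scale = pd.Series({'A': 1, 'B': 10, 'C': 100})  # illustrative divisors
scaled = df / scale
print(scaled)
```

The method forms (add, sub, mul, div) are mainly useful when you need their extra parameters, such as axis or fill_value.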

Operations Between DataFrames

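You can also perform operations between DataFrames. Pandas aligns the operands on their index and column labels, so values are combined label by label rather than by position; a label that exists in only one DataFrame produces NaN in the result.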

python
# Create two DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

df2 = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60]
})

print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)

# Addition of two DataFrames
print("\nAddition of two DataFrames:")
print(df1 + df2)

# Subtraction of two DataFrames
print("\nSubtraction (df2 - df1):")
print(df2 - df1)

Output:

DataFrame 1:
   A  B
0  1  4
1  2  5
2  3  6

DataFrame 2:
    A   B
0  10  40
1  20  50
2  30  60

Addition of two DataFrames:
    A   B
0  11  44
1  22  55
2  33  66

Subtraction (df2 - df1):
    A   B
0   9  36
1  18  45
2  27  54
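
The two DataFrames above share the same labels, so every cell has a partner. As a small sketch of what happens when the labels only partly overlap (the `left` and `right` frames here are made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'A': [10, 20, 30]}, index=[1, 2, 3])

# Labels 0 and 3 exist in only one frame, so the plain sum is NaN there
total = left + right
print(total)

# fill_value=0 treats a value missing from one side as 0 before adding
filled = left.add(right, fill_value=0)
print(filled)
```

Whether NaN or a filled value is the right outcome depends on what a missing row means in your data, so fill_value is worth choosing deliberately.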

Aggregate Mathematical Functions

Pandas provides various aggregate functions to compute descriptive statistics.

Basic Statistical Functions

python
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

print("DataFrame:")
print(df)

# Sum of all columns
print("\nSum of all columns:")
print(df.sum())

# Mean of all columns
print("\nMean of all columns:")
print(df.mean())

# Standard deviation
print("\nStandard deviation:")
print(df.std())

# Min and Max values
print("\nMinimum values:")
print(df.min())
print("\nMaximum values:")
print(df.max())

Output:

DataFrame:
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500

Sum of all columns:
A      15
B     150
C    1500
dtype: int64

Mean of all columns:
A      3.0
B     30.0
C    300.0
dtype: float64

Standard deviation:
A      1.581139
B     15.811388
C    158.113883
dtype: float64

Minimum values:
A      1
B     10
C    100
dtype: int64

Maximum values:
A      5
B     50
C    500
dtype: int64
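
One detail worth knowing about the standard deviation above: pandas computes the sample standard deviation (ddof=1) by default, whereas NumPy's np.std defaults to the population version (ddof=0). A short sketch of the difference, using a standalone Series with the same values as column A:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

sample_std = s.std()            # pandas default: ddof=1, divides by n - 1
population_std = s.std(ddof=0)  # divides by n instead

print(sample_std)
print(population_std)

# np.std on the raw array uses ddof=0, so it matches the population value
print(np.isclose(population_std, np.std(s.to_numpy())))
```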

Row-wise and Column-wise Operations

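Aggregate functions take an axis parameter that controls the direction of the calculation. With axis=0 (the default) the function runs down each column and returns one value per column; with axis=1 it runs across each row and returns one value per row: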

python
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

print("DataFrame:")
print(df)

# Column-wise sum (default)
print("\nColumn-wise sum:")
print(df.sum())

# Row-wise sum
print("\nRow-wise sum:")
print(df.sum(axis=1))

# Column-wise mean
print("\nColumn-wise mean:")
print(df.mean())

# Row-wise mean
print("\nRow-wise mean:")
print(df.mean(axis=1))

Output:

DataFrame:
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500

Column-wise sum:
A      15
B     150
C    1500
dtype: int64

Row-wise sum:
0    111
1    222
2    333
3    444
4    555
dtype: int64

Column-wise mean:
A      3.0
B     30.0
C    300.0
dtype: float64

Row-wise mean:
0     37.0
1     74.0
2    111.0
3    148.0
4    185.0
dtype: float64
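
The axis argument also accepts the names 'index' and 'columns', which some readers find easier to remember than 0 and 1. A minimal sketch on a smaller two-row frame (chosen only to keep the output short), which also shows agg() computing several aggregates in one call:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]})

# axis='index' (same as axis=0): collapse the rows, one result per column
col_totals = df.sum(axis='index')

# axis='columns' (same as axis=1): collapse the columns, one result per row
row_totals = df.sum(axis='columns')

# agg() with a list of function names returns one row per function
summary = df.agg(['sum', 'mean'])

print(col_totals)
print(row_totals)
print(summary)
```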

Advanced Mathematical Functions

Element-wise Functions

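Pandas objects work with NumPy's universal functions (ufuncs) such as np.sqrt, np.exp, and np.log. The function is applied to every element, and the result is a DataFrame with the same index and column labels: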

python
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [0.1, 0.2, 0.3, 0.4, 0.5]
})

print("DataFrame:")
print(df)

# Square root
print("\nSquare root:")
print(np.sqrt(df))

# Exponential
print("\nExponential (e^x):")
print(np.exp(df))

# Logarithm (natural log)
print("\nNatural logarithm:")
print(np.log(df))

# Rounding
print("\nRounded to nearest integer:")
print(np.round(df))

Output:

DataFrame:
   A    B
0  1  0.1
1  2  0.2
2  3  0.3
3  4  0.4
4  5  0.5

Square root:
          A         B
0  1.000000  0.316228
1  1.414214  0.447214
2  1.732051  0.547723
3  2.000000  0.632456
4  2.236068  0.707107

Exponential (e^x):
            A         B
0    2.718282  1.105171
1    7.389056  1.221403
2   20.085537  1.349859
3   54.598150  1.491825
4  148.413159  1.648721

Natural logarithm:
          A         B
0  0.000000 -2.302585
1  0.693147 -1.609438
2  1.098612 -1.203973
3  1.386294 -0.916291
4  1.609438 -0.693147

Rounded to nearest integer:
   A    B
0  1  0.0
1  2  0.0
2  3  0.0
3  4  0.0
4  5  0.0
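
Note that 0.5 rounds to 0.0 in the output above. NumPy rounds exact halves to the nearest even value rather than always rounding up. The sketch below illustrates this, along with what np.log returns for zero; the input values are chosen only for demonstration.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 1.5, 2.5, 3.5])

# Exact halves go to the nearest even integer: 0.5 -> 0.0, 1.5 -> 2.0, ...
rounded = np.round(s)
print(rounded.tolist())

# The log of 0 is -inf; errstate silences the divide-by-zero warning
with np.errstate(divide='ignore'):
    logs = np.log(pd.Series([1.0, 0.0]))
print(logs.tolist())
```

If your data can contain zeros or negative numbers, it is worth checking for -inf and NaN after a log transform before computing further statistics.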

Custom Functions with `apply()`

The apply() method calls a custom function on each column (the default, axis=0) or on each row (axis=1). The function receives the column or row as a Series:

python
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

print("DataFrame:")
print(df)

# Apply custom function to each column
def double_and_add_one(col):
    return col * 2 + 1

print("\nApplying double_and_add_one to each column:")
print(df.apply(double_and_add_one))

# Apply custom function to each row
def range_of_values(row):
    return row.max() - row.min()

print("\nRange of values in each row:")
print(df.apply(range_of_values, axis=1))

# Using lambda functions
print("\nSquare of each value:")
print(df.apply(lambda x: x ** 2))

Output:

DataFrame:
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500

Applying double_and_add_one to each column:
    A    B     C
0   3   21   201
1   5   41   401
2   7   61   601
3   9   81   801
4  11  101  1001

Range of values in each row:
0     99
1    198
2    297
3    396
4    495
dtype: int64

Square of each value:
    A     B       C
0   1   100   10000
1   4   400   40000
2   9   900   90000
3  16  1600  160000
4  25  2500  250000
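
When a custom function consists only of arithmetic or built-in aggregations, the same result can usually be written without apply(). The sketch below compares both forms for two of the functions above; treating the vectorized form as preferable is a general rule of thumb rather than something measured here.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# double_and_add_one, written with apply() and as plain arithmetic
via_apply = df.apply(lambda col: col * 2 + 1)
vectorized = df * 2 + 1

# range_of_values, written with apply(axis=1) and with built-in aggregations
range_apply = df.apply(lambda row: row.max() - row.min(), axis=1)
range_vectorized = df.max(axis=1) - df.min(axis=1)

print(vectorized)
print(range_vectorized)
```

apply() remains the right tool when the logic cannot be expressed with whole-column operations, for example when it needs branching on several values of a row.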

Practical Examples

Example 1: Financial Analysis

Let's calculate some basic financial metrics using Pandas:

python
# Stock price data
stock_data = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=5),
    'Open': [100, 102, 104, 103, 105],
    'High': [105, 107, 108, 106, 110],
    'Low': [98, 100, 102, 101, 103],
    'Close': [102, 104, 103, 105, 108]
})

stock_data.set_index('Date', inplace=True)
print("Stock price data:")
print(stock_data)

# Calculate daily returns
stock_data['Daily_Return'] = stock_data['Close'].pct_change() * 100

# Calculate trading range
stock_data['Range'] = stock_data['High'] - stock_data['Low']

# Calculate moving average of closing price (window of 3 days)
stock_data['MA3'] = stock_data['Close'].rolling(window=3).mean()

print("\nStock data with calculated metrics:")
print(stock_data)

print("\nSummary statistics:")
print(stock_data.describe())

Output:

Stock price data:
            Open  High  Low  Close
Date                              
2023-01-01   100   105   98    102
2023-01-02   102   107  100    104
2023-01-03   104   108  102    103
2023-01-04   103   106  101    105
2023-01-05   105   110  103    108

Stock data with calculated metrics:
            Open  High  Low  Close  Daily_Return  Range        MA3
Date                                                              
2023-01-01   100   105   98    102           NaN      7        NaN
2023-01-02   102   107  100    104     1.960784      7        NaN
2023-01-03   104   108  102    103    -0.961538      6  103.000000
2023-01-04   103   106  101    105     1.941748      5  104.000000
2023-01-05   105   110  103    108     2.857143      7  105.333333

Summary statistics:
             Open        High         Low       Close  Daily_Return     Range         MA3
count    5.000000    5.000000    5.000000    5.000000      4.000000  5.000000    3.000000
mean   102.800000  107.200000  100.800000  104.400000      1.449534  6.400000  104.111111
std      1.923538    1.923538    1.923538    2.302173      1.663158  0.894427    1.170628
min    100.000000  105.000000   98.000000  102.000000     -0.961538  5.000000  103.000000
25%    102.000000  106.000000  100.000000  103.000000      1.215926  6.000000  103.500000
50%    103.000000  107.000000  101.000000  104.000000      1.951266  7.000000  104.000000
75%    104.000000  108.000000  102.000000  105.000000      2.184874  7.000000  104.666667
max    105.000000  110.000000  103.000000  108.000000      2.857143  7.000000  105.333333
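
To make clear what pct_change() and rolling().mean() compute, the sketch below rebuilds both by hand with shift() and compares them with the built-in results. It uses only the closing prices from the example; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

close = pd.Series([102, 104, 103, 105, 108], name='Close')

# Daily return: today's close relative to the previous close, in percent
manual_return = (close / close.shift(1) - 1) * 100
builtin_return = close.pct_change() * 100

# 3-day moving average: mean of today's close and the two before it
manual_ma3 = (close + close.shift(1) + close.shift(2)) / 3
builtin_ma3 = close.rolling(window=3).mean()

# equal_nan=True because the first rows have no earlier data and are NaN
print(np.allclose(manual_return, builtin_return, equal_nan=True))
print(np.allclose(manual_ma3, builtin_ma3, equal_nan=True))
```

The leading NaN values explain why describe() reports a count of 4 for Daily_Return and 3 for MA3 in the summary above.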

Example 2: Data Normalization

Normalization rescales features so that columns with very different magnitudes become comparable. It is a common preprocessing step in machine learning:

python
# Sample dataset
data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [100, 150, 200, 250, 300],
    'Feature3': [1000, 2000, 3000, 4000, 5000]
})

print("Original dataset:")
print(data)

# Min-Max Normalization (scaling features to range [0,1])
def min_max_normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# Z-score normalization (standardization)
# Note: x.std() is the sample standard deviation (ddof=1)
def z_score_normalize(x):
    return (x - x.mean()) / x.std()

normalized_data_minmax = data.apply(min_max_normalize)
print("\nMin-Max Normalized data (scale 0 to 1):")
print(normalized_data_minmax)

normalized_data_zscore = data.apply(z_score_normalize)
print("\nZ-Score Normalized data:")
print(normalized_data_zscore)

Output:

Original dataset:
   Feature1  Feature2  Feature3
0        10       100      1000
1        20       150      2000
2        30       200      3000
3        40       250      4000
4        50       300      5000

Min-Max Normalized data (scale 0 to 1):
   Feature1  Feature2  Feature3
0      0.00      0.00      0.00
1      0.25      0.25      0.25
2      0.50      0.50      0.50
3      0.75      0.75      0.75
4      1.00      1.00      1.00

Z-Score Normalized data:
   Feature1  Feature2  Feature3
0 -1.264911 -1.264911 -1.264911
1 -0.632456 -0.632456 -0.632456
2  0.000000  0.000000  0.000000
3  0.632456  0.632456  0.632456
4  1.264911  1.264911  1.264911
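
Both normalizations can also be written directly on the DataFrame, because subtracting or dividing by a per-column Series aligns on the column labels. The following sketch uses a two-column subset of the data and checks the properties each method is expected to have:

```python
import pandas as pd

data = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [100, 150, 200, 250, 300],
})

# data.min(), data.max(), data.mean(), data.std() each return one value per column
minmax = (data - data.min()) / (data.max() - data.min())
zscore = (data - data.mean()) / data.std()

# Min-max output spans exactly [0, 1] in every column
print(minmax.min().tolist(), minmax.max().tolist())

# Z-score output has mean 0 and sample standard deviation 1 (up to rounding)
print(zscore.mean().round(10).tolist(), zscore.std().round(10).tolist())
```

A column in which every value is identical has a range and standard deviation of zero, so both formulas would produce NaN for it; such columns need to be handled separately.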

Summary

In this tutorial, we explored Pandas' comprehensive set of mathematical functions that enable efficient data transformation and analysis. We covered:

- Basic Arithmetic Operations: Addition, subtraction, multiplication, and division on DataFrames
- Aggregate Functions: Computing statistics like sum, mean, standard deviation
- Element-wise Functions: Applying mathematical functions to each element
- Custom Functions: Using apply() to implement your own operations
- Practical Examples: Financial analysis and data normalization

With these tools, you can perform a wide range of mathematical transformations on your datasets, making Pandas an indispensable tool for data analysis and preparation for machine learning.

Additional Resources and Exercises

Exercises

Basic Operations:
- Create a DataFrame with employee information (name, salary, bonus) and calculate their total compensation (salary + bonus).
- Calculate the percentage increase in their total compensation if everyone gets a 5% raise.
Statistical Analysis:
- Create a DataFrame with student exam scores for different subjects and calculate:
  - Average score for each student
  - Average score for each subject
  - The highest- and lowest-performing student in each subject
Advanced Transformation:
- Create a dataset with temperature readings in Celsius for a week.
- Convert the temperatures to Fahrenheit (F = C * 9/5 + 32)
- Calculate the 3-day moving average of temperatures
- Identify days with temperatures above the weekly average
Real-world Data:
- Download a real dataset (e.g., stock prices, weather data)
- Apply at least three different mathematical transformations
- Create visualizations to show the results of your transformations

By practicing these exercises, you'll gain practical experience with Pandas' mathematical functions and develop your data transformation skills.
