Pandas Apply Functions
Introduction
When working with data in Pandas, you'll often need to transform values or perform calculations across your DataFrame or Series objects. While basic operations and built-in methods can handle many tasks, sometimes you need to apply custom logic to your data. This is where Pandas' apply functions come in.
In this tutorial, we'll explore how to use the apply(), applymap(), and map() functions in Pandas to transform data efficiently. These functions allow you to apply custom operations to your data without writing explicit loops, making your code more concise and often more performant.
The Apply Family of Functions
Pandas provides several functions for applying operations to your data:
- apply() - Works on both DataFrame and Series objects to apply a function along an axis
- applymap() - Works on DataFrame objects to apply a function to every element
- map() - Works on Series objects to map values to other values
Let's explore each of these functions with examples.
Using apply() with Series
The apply() method for Series allows you to apply a function to each element in the Series. This is perfect when you need to transform each value independently.
Basic Example
import pandas as pd
# Create a simple Series
numbers = pd.Series([1, 2, 3, 4, 5])
# Apply a square function to each element
squared = numbers.apply(lambda x: x**2)
print("Original Series:")
print(numbers)
print("\nAfter applying square function:")
print(squared)
Output:
Original Series:
0 1
1 2
2 3
3 4
4 5
dtype: int64
After applying square function:
0 1
1 4
2 9
3 16
4 25
dtype: int64
Using Named Functions
You can also use named functions instead of lambda functions:
import pandas as pd
import numpy as np
# Create a Series with some missing values
data = pd.Series([10, 20, np.nan, 30, np.nan, 40])
def replace_missing(x):
    return 0 if pd.isna(x) else x
# Apply the function to replace missing values
cleaned_data = data.apply(replace_missing)
print("Original Series:")
print(data)
print("\nAfter replacing missing values:")
print(cleaned_data)
Output:
Original Series:
0 10.0
1 20.0
2 NaN
3 30.0
4 NaN
5 40.0
dtype: float64
After replacing missing values:
0 10.0
1 20.0
2 0.0
3 30.0
4 0.0
5 40.0
dtype: float64
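As an aside, this particular transformation also has a built-in, vectorized equivalent: Series.fillna(). A minimal sketch using the same Series:
import pandas as pd
import numpy as np
data = pd.Series([10, 20, np.nan, 30, np.nan, 40])
# fillna() replaces every missing value in one vectorized call,
# giving the same result as the apply() version above
cleaned_data = data.fillna(0)
print(cleaned_data)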
Using apply() with DataFrames
When working with DataFrames, apply() operates on entire rows or columns at once, depending on the axis parameter.
Applying to Columns (Default: axis=0)
import pandas as pd
import numpy as np
# Create a DataFrame with student scores
data = {
    'Math': [85, 90, 70, 95, 80],
    'Science': [90, 85, 95, 88, 92],
    'English': [75, 85, 80, 90, 85]
}
df = pd.DataFrame(data)
# Calculate the average score for each subject
avg_scores = df.apply(np.mean)
print("Student Scores DataFrame:")
print(df)
print("\nAverage score for each subject:")
print(avg_scores)
Output:
Student Scores DataFrame:
Math Science English
0 85 90 75
1 90 85 85
2 70 95 80
3 95 88 90
4 80 92 85
Average score for each subject:
Math 84.0
Science 90.0
English 83.0
dtype: float64
Applying to Rows (axis=1)
import pandas as pd
# Create a DataFrame with student scores
data = {
    'Math': [85, 90, 70, 95, 80],
    'Science': [90, 85, 95, 88, 92],
    'English': [75, 85, 80, 90, 85]
}
df = pd.DataFrame(data)
# Calculate the average score for each student
df['Average'] = df.apply(lambda row: row.mean(), axis=1)
# Determine if the student passed (average >= 80)
df['Passed'] = df['Average'].apply(lambda x: 'Yes' if x >= 80 else 'No')
print("Student Scores with Average and Pass Status:")
print(df)
Output:
Student Scores with Average and Pass Status:
Math Science English Average Passed
0 85 90 75 83.33 Yes
1 90 85 85 86.67 Yes
2 70 95 80 81.67 Yes
3 95 88 90 91.00 Yes
4 80 92 85 85.67 Yes
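For comparison, the same two columns can be built without apply() at all, using vectorized operations. A brief sketch with the same data:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Math': [85, 90, 70, 95, 80],
    'Science': [90, 85, 95, 88, 92],
    'English': [75, 85, 80, 90, 85]
})
# mean(axis=1) computes the row-wise average directly
df['Average'] = df.mean(axis=1)
# np.where turns a vectorized comparison into the Yes/No column
df['Passed'] = np.where(df['Average'] >= 80, 'Yes', 'No')
print(df)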
Using applymap() for Element-wise Operations
The applymap() function applies a function to each element in a DataFrame, making it ideal for element-wise transformations.
import pandas as pd
import numpy as np
# Create a DataFrame with some float values
data = {
    'A': [1.23456, 2.34567, 3.45678],
    'B': [4.56789, 5.67890, 6.78901],
    'C': [7.89012, 8.90123, 9.01234]
}
df = pd.DataFrame(data)
# Round all values to 2 decimal places
rounded_df = df.applymap(lambda x: round(x, 2))
print("Original DataFrame:")
print(df)
print("\nAfter rounding all values:")
print(rounded_df)
Output:
Original DataFrame:
A B C
0 1.23456 4.56789 7.89012
1 2.34567 5.67890 8.90123
2 3.45678 6.78901 9.01234
After rounding all values:
A B C
0 1.23 4.57 7.89
1 2.35 5.68 8.90
2 3.46 6.79 9.01
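Note that in Pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which performs the same element-wise operation. If you are on a recent version, the example above can be written as:
# Pandas 2.1+ spelling of the same element-wise transformation
rounded_df = df.map(lambda x: round(x, 2))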
Using map() for Value Substitution
The map() function works on Series and is perfect for value substitution or mapping values from one domain to another.
Basic Mapping
import pandas as pd
# Create a Series with fruit names
fruits = pd.Series(['apple', 'banana', 'orange', 'grape', 'apple', 'orange'])
# Create a mapping dictionary
fruit_prices = {
    'apple': 1.2,
    'banana': 0.5,
    'orange': 0.8,
    'grape': 2.5
}
# Map the fruits to their prices
fruit_prices_series = fruits.map(fruit_prices)
print("Fruits:")
print(fruits)
print("\nMapped Prices:")
print(fruit_prices_series)
Output:
Fruits:
0 apple
1 banana
2 orange
3 grape
4 apple
5 orange
dtype: object
Mapped Prices:
0 1.2
1 0.5
2 0.8
3 2.5
4 1.2
5 0.8
dtype: float64
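One detail worth remembering: any value that does not appear in the mapping dictionary becomes NaN in the result, which makes unexpected entries easy to spot. A tiny sketch:
import pandas as pd
# 'kiwi' has no entry in the price dictionary
fruits = pd.Series(['apple', 'kiwi'])
prices = fruits.map({'apple': 1.2})
print(prices)
# 0    1.2
# 1    NaN
# dtype: float64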
Mapping with a Function
You can also use a function with map():
import pandas as pd
# Create a Series with string values
data = pd.Series(['PYTHON', 'pandas', 'DATA', 'analysis'])
# Apply a function to standardize the strings
standardized = data.map(lambda x: x.capitalize())
print("Original Series:")
print(data)
print("\nAfter standardizing:")
print(standardized)
Output:
Original Series:
0 PYTHON
1 pandas
2 DATA
3 analysis
dtype: object
After standardizing:
0 Python
1 Pandas
2 Data
3 Analysis
dtype: object
Real-world Examples
Example 1: Cleaning and Transforming Customer Data
import pandas as pd
import numpy as np
# Sample customer data
data = {
    'customer_id': [101, 102, 103, 104, 105],
    'name': ['John Smith', 'JANE DOE', 'robert johnson', 'Sarah Williams', 'mike brown'],
    'email': ['[email protected]', 'jane@example', '[email protected]', '', '[email protected]'],
    'purchase_amount': [125.50, 200.75, np.nan, 350.25, 175.00],
    'purchase_date': ['2023-01-15', '2023-01-20', '2023-01-25', '2023-02-01', '2023-02-10']
}
df = pd.DataFrame(data)
# Data cleaning and transformation
# 1. Standardize names (first letter capitalized)
df['name'] = df['name'].apply(lambda x: ' '.join([word.capitalize() for word in x.split()]))
# 2. Validate emails
def validate_email(email):
    if not email or '@' not in email or '.' not in email.split('@')[1]:
        return 'Invalid Email'
    return email
df['email'] = df['email'].apply(validate_email)
# 3. Fill missing purchase amounts with average
avg_purchase = df['purchase_amount'].mean()
df['purchase_amount'] = df['purchase_amount'].apply(lambda x: avg_purchase if pd.isna(x) else x)
# 4. Convert dates to datetime and extract month
df['purchase_date'] = pd.to_datetime(df['purchase_date'])
df['purchase_month'] = df['purchase_date'].apply(lambda x: x.strftime('%B'))
print("Cleaned and transformed customer data:")
print(df)
Output:
Cleaned and transformed customer data:
customer_id name email purchase_amount purchase_date purchase_month
0 101 John Smith [email protected] 125.50 2023-01-15 January
1 102 Jane Doe Invalid Email 200.75 2023-01-20 January
2 103 Robert Johnson [email protected] 212.88 2023-01-25 January
3 104 Sarah Williams Invalid Email 350.25 2023-02-01 February
4 105 Mike Brown [email protected] 175.00 2023-02-10 February
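As an aside, several of these cleanup steps have vectorized counterparts. A brief sketch of what steps 1, 3, and 4 could look like if written without apply() (using the same df, run in place of the apply() versions above):
# Vectorized alternatives to steps 1, 3 and 4 above
df['name'] = df['name'].str.title()                                   # string accessor instead of apply()
df['purchase_amount'] = df['purchase_amount'].fillna(df['purchase_amount'].mean())
df['purchase_date'] = pd.to_datetime(df['purchase_date'])
df['purchase_month'] = df['purchase_date'].dt.month_name()            # datetime accessor instead of strftime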
Example 2: Analyzing Financial Data
import pandas as pd
import numpy as np
# Sample stock data
data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'stock_a': [100, 102, 104, 103, 105, 107, 108, 106, 104, 105],
    'stock_b': [50, 52, 51, 53, 54, 52, 51, 50, 51, 52],
    'stock_c': [200, 198, 195, 197, 201, 203, 205, 202, 200, 205]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
# Calculate daily returns
def calculate_return(column):
    return column.pct_change() * 100
daily_returns = df.apply(calculate_return)
# Calculate volatility (standard deviation of returns)
volatility = daily_returns.apply(np.std)
# Calculate cumulative returns
def calculate_cumulative_return(column):
    return ((column.iloc[-1] - column.iloc[0]) / column.iloc[0]) * 100
cumulative_returns = df.apply(calculate_cumulative_return)
# Create a summary DataFrame
summary = pd.DataFrame({
    'Starting Price': df.iloc[0],
    'Ending Price': df.iloc[-1],
    'Cumulative Return (%)': cumulative_returns,
    'Volatility (%)': volatility
})
print("Stock Price Summary:")
print(summary)
Output:
Stock Price Summary:
Starting Price Ending Price Cumulative Return (%) Volatility (%)
stock_a 100 105 5.00 1.25
stock_b 50 52 4.00 1.37
stock_c 200 205 2.50 1.30
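Since pct_change() and std() are themselves DataFrame methods, the same statistics can also be computed without wrapping them in apply(). A condensed sketch using the same df:
# pct_change() works column-wise on the whole DataFrame
daily_returns = df.pct_change() * 100
# std(ddof=0) gives the population standard deviation (np.std's default)
volatility = daily_returns.std(ddof=0)
# cumulative return from the first and last rows of each column
cumulative_returns = (df.iloc[-1] - df.iloc[0]) / df.iloc[0] * 100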
Tips and Best Practices
- Use vectorized operations when possible: Before reaching for apply(), check if there's a built-in Pandas function that can do the job more efficiently.
- Consider performance: For large DataFrames, apply() can be slower than vectorized operations. If performance is critical, consider alternatives like NumPy operations.
- Choose the right function:
  - Use apply() when working with rows or columns as a whole
  - Use applymap() for element-wise operations on DataFrames
  - Use map() for simple value substitutions on Series
- Pass additional arguments to your function using partial from functools (apply() can also forward extra arguments itself; see the short example after this list):
from functools import partial

def custom_function(x, multiplier):
    return x * multiplier

# Apply with a specific multiplier
df.apply(partial(custom_function, multiplier=2))
- Combine with method chaining for cleaner code:
result = (df
          .dropna()
          .apply(custom_function)
          .sort_values())
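As a complement to the partial tip above, apply() can also forward extra positional and keyword arguments directly to your function via args and **kwargs. A small sketch with a hypothetical Series s:
import pandas as pd

def custom_function(x, multiplier):
    return x * multiplier

s = pd.Series([1, 2, 3])

# keyword arguments after the function are passed straight through
print(s.apply(custom_function, multiplier=2))

# positional arguments can be supplied with args=
print(s.apply(custom_function, args=(3,)))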
Summary
Pandas apply functions provide a powerful way to transform data in DataFrames and Series:
- apply() lets you apply functions to entire rows or columns
- applymap() applies a function to every element in a DataFrame
- map() is great for value substitution in a Series
These functions help you avoid explicit loops and make your data transformation code more concise and readable. While they may not always be the most performant option for large datasets, they strike a good balance between readability and efficiency for most data analysis tasks.
Exercises
- Create a DataFrame with employee data (name, department, salary) and use apply() to calculate a bonus for each employee based on their department and salary.
- Given a Series of dates, use apply() to extract the day of the week for each date.
- Create a DataFrame with product information and use applymap() to format all string columns to title case and all numeric columns to two decimal places.
- Use map() to convert a Series of country codes to full country names using a dictionary.
- Analyze a dataset of your choice using the apply functions to transform and extract meaningful information.