Pandas Column Operations
When working with data in Pandas, column operations are among the most common tasks you'll perform. Whether you're selecting specific columns for analysis, creating new columns from existing data, or transforming column values, mastering column operations is essential for effective data manipulation.
Introduction to Pandas Columns
In a Pandas DataFrame, columns represent variables or features of your dataset. Each column has a name (label) and contains data of a specific type. Understanding how to work with columns efficiently is crucial for data analysis and preprocessing.
Let's start by importing Pandas and creating a sample DataFrame:
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [24, 27, 22, 32, 28],
    'City': ['New York', 'Boston', 'Chicago', 'Seattle', 'Miami'],
    'Salary': [72000, 65000, 58000, 90000, 76000]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age     City  Salary
0    Alice   24  New York   72000
1      Bob   27    Boston   65000
2  Charlie   22   Chicago   58000
3    David   32   Seattle   90000
4     Emma   28     Miami   76000
Basic Column Selection
Selecting a Single Column
You can select a single column from a DataFrame in two ways:
# Method 1: Using square brackets - returns a pandas Series
ages = df['Age']
print("Type:", type(ages))
print(ages)
# Method 2: Using dot notation (only works for simple column names without spaces or special characters)
ages_dot = df.Age
print("Type:", type(ages_dot))
print(ages_dot)
Output:
Type: <class 'pandas.core.series.Series'>
0    24
1    27
2    22
3    32
4    28
Name: Age, dtype: int64
Type: <class 'pandas.core.series.Series'>
0    24
1    27
2    22
3    32
4    28
Name: Age, dtype: int64
Selecting Multiple Columns
To select multiple columns, you can provide a list of column names:
# Select Name and Salary columns
name_salary = df[['Name', 'Salary']]
print(type(name_salary))
print(name_salary)
Output:
<class 'pandas.core.frame.DataFrame'>
      Name  Salary
0    Alice   72000
1      Bob   65000
2  Charlie   58000
3    David   90000
4     Emma   76000
Adding and Creating New Columns
Adding a New Column with a Constant Value
# Add a department column with the same value for all rows
df['Department'] = 'Engineering'
print(df)
Output:
      Name  Age     City  Salary   Department
0    Alice   24  New York   72000  Engineering
1      Bob   27    Boston   65000  Engineering
2  Charlie   22   Chicago   58000  Engineering
3    David   32   Seattle   90000  Engineering
4     Emma   28     Miami   76000  Engineering
Creating a New Column Based on Existing Columns
You can create new columns by performing operations on existing columns:
# Calculate bonus as 10% of salary
df['Bonus'] = df['Salary'] * 0.1
print(df)
Output:
      Name  Age     City  Salary   Department    Bonus
0    Alice   24  New York   72000  Engineering   7200.0
1      Bob   27    Boston   65000  Engineering   6500.0
2  Charlie   22   Chicago   58000  Engineering   5800.0
3    David   32   Seattle   90000  Engineering   9000.0
4     Emma   28     Miami   76000  Engineering   7600.0
Using apply() to Create Columns
For more complex operations, you can use the apply() function:
# Create a new column with custom logic
def calculate_experience(age):
    return max(0, age - 21)
df['Experience'] = df['Age'].apply(calculate_experience)
print(df)
Output:
      Name  Age     City  Salary   Department    Bonus  Experience
0    Alice   24  New York   72000  Engineering   7200.0           3
1      Bob   27    Boston   65000  Engineering   6500.0           6
2  Charlie   22   Chicago   58000  Engineering   5800.0           1
3    David   32   Seattle   90000  Engineering   9000.0          11
4     Emma   28     Miami   76000  Engineering   7600.0           7
Using Lambda Functions with Apply
You can also use lambda functions for simpler operations:
# Create a column for total compensation
df['Total_Comp'] = df.apply(lambda row: row['Salary'] + row['Bonus'], axis=1)
print(df[['Name', 'Salary', 'Bonus', 'Total_Comp']])
Output:
      Name  Salary    Bonus  Total_Comp
0    Alice   72000   7200.0      79200.0
1      Bob   65000   6500.0      71500.0
2  Charlie   58000   5800.0      63800.0
3    David   90000   9000.0      99000.0
4     Emma   76000   7600.0      83600.0
Renaming Columns
Using the rename() Method
# Rename specific columns
renamed_df = df.rename(columns={'Name': 'Employee', 'City': 'Location'})
print(renamed_df.columns)
Output:
Index(['Employee', 'Age', 'Location', 'Salary', 'Department', 'Bonus', 'Experience',
       'Total_Comp'],
      dtype='object')
Renaming All Columns at Once
# Create a smaller dataframe for demo
mini_df = df[['Name', 'Age', 'Salary']].copy()
# Rename all columns
mini_df.columns = ['Employee', 'Years', 'Annual_Pay']
print(mini_df)
Output:
  Employee  Years  Annual_Pay
0    Alice     24       72000
1      Bob     27       65000
2  Charlie     22       58000
3    David     32       90000
4     Emma     28       76000
Deleting Columns
Using drop() Method
# Remove the Department column
df_no_dept = df.drop('Department', axis=1)
print(df_no_dept.columns)
# Remove multiple columns
df_simplified = df.drop(['Bonus', 'Experience', 'Total_Comp'], axis=1)
print(df_simplified.columns)
Output:
Index(['Name', 'Age', 'City', 'Salary', 'Bonus', 'Experience', 'Total_Comp'], dtype='object')
Index(['Name', 'Age', 'City', 'Salary', 'Department'], dtype='object')
Using del Statement
# Create a copy to avoid modifying our main DataFrame
df_copy = df.copy()
# Delete a column
del df_copy['Bonus']
print(df_copy.columns)
Output:
Index(['Name', 'Age', 'City', 'Salary', 'Department', 'Experience', 'Total_Comp'], dtype='object')
Column Operations and Transformations
Applying Math Operations to Columns
# Increase all salaries by 5%
df['New_Salary'] = df['Salary'] * 1.05
print(df[['Name', 'Salary', 'New_Salary']])
Output:
      Name  Salary  New_Salary
0    Alice   72000     75600.0
1      Bob   65000     68250.0
2  Charlie   58000     60900.0
3    David   90000     94500.0
4     Emma   76000     79800.0
String Operations on Text Columns
Pandas provides .str accessor for string operations:
# Convert names to uppercase
df['Name_Upper'] = df['Name'].str.upper()
# Get the first 3 characters of each city
df['City_Short'] = df['City'].str[:3]
print(df[['Name', 'Name_Upper', 'City', 'City_Short']])
Output:
      Name Name_Upper     City City_Short
0    Alice     ALICE  New York        New
1      Bob       BOB    Boston        Bos
2  Charlie   CHARLIE   Chicago        Chi
3    David     DAVID   Seattle        Sea
4     Emma      EMMA     Miami        Mia
Conditional Column Operations
You can create columns based on conditions using numpy.where() or pandas.Series.map():
# Using numpy.where
df['Salary_Category'] = np.where(df['Salary'] >= 70000, 'High', 'Standard')
# More complex conditions using multiple where statements
conditions = [
    (df['Age'] < 25),
    (df['Age'] >= 25) & (df['Age'] < 30),
    (df['Age'] >= 30)
]
choices = ['Junior', 'Mid-level', 'Senior']
df['Experience_Level'] = np.select(conditions, choices, default='Unknown')
print(df[['Name', 'Age', 'Salary', 'Salary_Category', 'Experience_Level']])
Output:
      Name  Age  Salary Salary_Category Experience_Level
0    Alice   24   72000           High          Junior
1      Bob   27   65000       Standard       Mid-level
2  Charlie   22   58000       Standard          Junior
3    David   32   90000           High          Senior
4     Emma   28   76000           High       Mid-level
Using map() and replace()
# Create a mapping dictionary
city_region = {
    'New York': 'East',
    'Boston': 'East',
    'Chicago': 'Midwest',
    'Seattle': 'West',
    'Miami': 'South'
}
# Apply mapping
df['Region'] = df['City'].map(city_region)
print(df[['City', 'Region']])
Output:
      City  Region
0  New York    East
1    Boston    East
2   Chicago Midwest
3   Seattle    West
4     Miami   South
Real-World Example: Customer Data Analysis
Let's look at a practical example where we analyze customer data to derive insights:
# Create a customer dataset
customer_data = {
    'customer_id': range(1001, 1011),
    'name': ['John Smith', 'Mary Johnson', 'Robert Brown', 'Patricia Davis', 
             'Michael Miller', 'Jennifer Wilson', 'William Moore', 'Elizabeth Taylor', 
             'David Anderson', 'Susan Thomas'],
    'order_total': [120.50, 55.75, 310.00, 45.25, 78.90, 210.30, 15.50, 180.75, 90.20, 325.15],
    'order_date': ['2023-01-15', '2023-01-17', '2023-01-20', '2023-01-22', '2023-01-25', 
                   '2023-02-02', '2023-02-05', '2023-02-10', '2023-02-15', '2023-02-20'],
    'payment_method': ['Credit Card', 'PayPal', 'Credit Card', 'Debit Card', 'Credit Card', 
                       'PayPal', 'Debit Card', 'Credit Card', 'PayPal', 'Credit Card']
}
customers_df = pd.DataFrame(customer_data)
# Convert date string to datetime objects
customers_df['order_date'] = pd.to_datetime(customers_df['order_date'])
# Extract month from the date
customers_df['order_month'] = customers_df['order_date'].dt.month_name()
# Calculate tax (8%)
customers_df['tax'] = customers_df['order_total'] * 0.08
# Calculate total with tax
customers_df['total_with_tax'] = customers_df['order_total'] + customers_df['tax']
# Create discount column based on order amount
conditions = [
    (customers_df['order_total'] < 50),
    (customers_df['order_total'] >= 50) & (customers_df['order_total'] < 200),
    (customers_df['order_total'] >= 200)
]
discount_rates = [0.00, 0.05, 0.10]
customers_df['discount_rate'] = np.select(conditions, discount_rates)
# Calculate final amount after discount
customers_df['final_amount'] = customers_df['total_with_tax'] * (1 - customers_df['discount_rate'])
# Round currency values
for col in ['tax', 'total_with_tax', 'final_amount']:
    customers_df[col] = customers_df[col].round(2)
# Display results
print(customers_df[['name', 'order_total', 'order_month', 'tax', 'discount_rate', 'final_amount']])
Output:
              name  order_total order_month    tax  discount_rate  final_amount
0      John Smith       120.50    January   9.64          0.05       123.88
1    Mary Johnson        55.75    January   4.46          0.05        57.20
2    Robert Brown       310.00    January  24.80          0.10       301.32
3   Patricia Davis        45.25    January   3.62          0.00        48.87
4   Michael Miller        78.90    January   6.31          0.05        80.97
5  Jennifer Wilson       210.30   February  16.82          0.10       204.52
6    William Moore        15.50   February   1.24          0.00        16.74
7  Elizabeth Taylor      180.75   February  14.46          0.05       185.20
8    David Anderson       90.20   February   7.22          0.05        92.45
9     Susan Thomas       325.15   February  26.01          0.10       315.74
Summary
In this tutorial, you've learned how to perform various operations on columns in Pandas DataFrames:
- Basic Column Selection: Selecting single and multiple columns
 - Adding and Creating Columns: Adding constant values and creating derived columns
 - Renaming Columns: Using 
rename()and direct assignment tocolumnsattribute - Deleting Columns: Using 
drop()anddel - Column Transformations:
- Applying mathematical operations
 - String manipulations with 
.straccessor - Conditional operations with 
np.where()andnp.select() - Mapping values with 
map()andreplace() 
 
These operations form the foundation of data manipulation in Pandas and will help you efficiently transform and analyze data in your projects.
Exercises
To reinforce your learning, try these exercises:
- Create a DataFrame with student data (name, scores in different subjects) and calculate the average score for each student.
 - Using the customer DataFrame from our example, add a new column that categorizes customers into 'New' (February) and 'Returning' (January) based on their order month.
 - Create a column that extracts the first name only from the customer names.
 - Add a 'preferred_customer' column that marks customers as True if they spent more than $200 and used Credit Card, otherwise False.
 - Create a 'days_since_order' column that calculates the number of days between the order date and February 28, 2023.
 
Additional Resources
- Pandas Documentation: Working with Text Data
 - Pandas Documentation: Indexing and Selecting Data
 - 10 Minutes to Pandas - Column Operations
 - Real Python: Pandas DataFrame Tutorial
 
With these skills, you'll be well-equipped to handle data manipulation tasks in your data analysis projects!
💡 Found a typo or mistake? Click "Edit this page" to suggest a correction. Your feedback is greatly appreciated!