Skip to main content

Pandas Slicing

Slicing is one of the most powerful features in pandas for extracting specific portions of your data. Similar to how you can slice Python lists, pandas provides enhanced functionality to slice both Series and DataFrame objects in a more intuitive and flexible way.

Introduction to Slicing in Pandas

Slicing in pandas allows you to select a subset of your data by specifying start and end points. Unlike Python's standard slicing, pandas slicing works on both index labels and integer positions, giving you more flexibility when working with your data.

There are two main approaches for slicing in pandas:

  • Index-based slicing: Using .iloc[] to slice based on integer positions
  • Label-based slicing: Using .loc[] to slice based on index labels

Let's understand each approach with examples.

Index-based Slicing with .iloc[]

.iloc[] is used for integer-location based indexing. It works similarly to Python's standard list slicing.

Basic Syntax

python
# Basic syntax
df.iloc[row_start:row_end, column_start:column_end]

# For just rows
df.iloc[row_start:row_end]

# For a single row
df.iloc[row_index]

# For a single column
df.iloc[:, column_index]

Examples of .iloc[] Slicing

Let's create a simple DataFrame to demonstrate:

python
import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'Age': [25, 30, 35, 40, 45],
'Salary': [50000, 60000, 70000, 80000, 90000],
'Department': ['HR', 'IT', 'Finance', 'IT', 'Marketing']
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Salary Department
0 Alice 25 50000 HR
1 Bob 30 60000 IT
2 Charlie 35 70000 Finance
3 David 40 80000 IT
4 Eva 45 90000 Marketing

Now, let's slice this DataFrame using .iloc[]:

python
# Select the first two rows
first_two_rows = df.iloc[0:2]
print("First two rows:")
print(first_two_rows)

# Select rows 1 through 3
middle_rows = df.iloc[1:4]
print("\nRows 1 through 3:")
print(middle_rows)

# Select specific columns (columns 1 and 2)
specific_cols = df.iloc[:, 1:3]
print("\nColumns 1 and 2 (Age and Salary):")
print(specific_cols)

# Select a subset of rows and columns
subset = df.iloc[1:4, 0:2]
print("\nRows 1-3 and columns 0-1:")
print(subset)

Output:

First two rows:
Name Age Salary Department
0 Alice 25 50000 HR
1 Bob 30 60000 IT

Rows 1 through 3:
Name Age Salary Department
1 Bob 30 60000 IT
2 Charlie 35 70000 Finance
3 David 40 80000 IT

Columns 1 and 2 (Age and Salary):
Age Salary
0 25 50000
1 30 60000
2 35 70000
3 40 80000
4 45 90000

Rows 1-3 and columns 0-1:
Name Age
1 Bob 30
2 Charlie 35
3 David 40

Label-based Slicing with .loc[]

.loc[] is used for label-based indexing. It allows you to select data by the names of rows and columns rather than their integer positions.

Basic Syntax

python
# Basic syntax
df.loc[row_label_start:row_label_end, column_label_start:column_label_end]

# For just rows
df.loc[row_label_start:row_label_end]

# For a single row
df.loc[row_label]

# For a single column
df.loc[:, column_label]

Examples of .loc[] Slicing

Let's create a DataFrame with custom index labels:

python
# Create a DataFrame with custom row indices
df_indexed = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])
print(df_indexed)

Output:

       Name  Age  Salary Department
A Alice 25 50000 HR
B Bob 30 60000 IT
C Charlie 35 70000 Finance
D David 40 80000 IT
E Eva 45 90000 Marketing

Now, let's slice using .loc[]:

python
# Select rows by label
subset_rows = df_indexed.loc['B':'D']
print("Rows B through D:")
print(subset_rows)

# Select specific columns
subset_cols = df_indexed.loc[:, 'Name':'Age']
print("\nColumns 'Name' and 'Age':")
print(subset_cols)

# Select a subset of rows and columns
subset = df_indexed.loc['B':'D', 'Name':'Salary']
print("\nRows B-D and columns Name through Salary:")
print(subset)

# Select specific rows and columns using lists
specific_cells = df_indexed.loc[['A', 'C', 'E'], ['Name', 'Department']]
print("\nSpecific rows and columns:")
print(specific_cells)

Output:

Rows B through D:
Name Age Salary Department
B Bob 30 60000 IT
C Charlie 35 70000 Finance
D David 40 80000 IT

Columns 'Name' and 'Age':
Name Age
A Alice 25
B Bob 30
C Charlie 35
D David 40
E Eva 45

Rows B-D and columns Name through Salary:
Name Age Salary
B Bob 30 60000
C Charlie 35 70000
D David 40 80000

Specific rows and columns:
Name Department
A Alice HR
C Charlie Finance
E Eva Marketing

Slicing a Pandas Series

Slicing a Series works in a similar way to a DataFrame, but since a Series is one-dimensional, we only need to specify the row indexer.

python
# Create a Series
ages = df['Age']
print("Original Series:")
print(ages)

# Slice with integer position
print("\nFirst three ages using iloc:")
print(ages.iloc[0:3])

# Slice with labels (if the Series has a custom index)
ages_indexed = df_indexed['Age']
print("\nAges from B to D using loc:")
print(ages_indexed.loc['B':'D'])

Output:

Original Series:
0 25
1 30
2 35
3 40
4 45
Name: Age, dtype: int64

First three ages using iloc:
0 25
1 30
2 35
Name: Age, dtype: int64

Ages from B to D using loc:
B 30
C 35
D 40
Name: Age, dtype: int64

Advanced Slicing Techniques

Boolean Indexing with Slicing

You can combine boolean indexing with slicing for more powerful selections:

python
# Get rows where Age > 30 and then select specific columns
filtered_subset = df.loc[df['Age'] > 30, 'Name':'Salary']
print("Rows with Age > 30, columns from Name to Salary:")
print(filtered_subset)

Output:

Rows with Age > 30, columns from Name to Salary:
Name Age Salary
2 Charlie 35 70000
3 David 40 80000
4 Eva 45 90000

Using iloc and loc with Lists

You can use lists of positions or labels for non-contiguous selections:

python
# Select specific rows and columns using iloc with lists
non_contiguous_iloc = df.iloc[[0, 2, 4], [0, 2]]
print("Non-contiguous selection with iloc:")
print(non_contiguous_iloc)

# Select specific rows and columns using loc with lists
non_contiguous_loc = df_indexed.loc[['A', 'C', 'E'], ['Name', 'Salary']]
print("\nNon-contiguous selection with loc:")
print(non_contiguous_loc)

Output:

Non-contiguous selection with iloc:
Name Salary
0 Alice 50000
2 Charlie 70000
4 Eva 90000

Non-contiguous selection with loc:
Name Salary
A Alice 50000
C Charlie 70000
E Eva 90000

Practical Example: Data Analysis with Slicing

Let's see how slicing can be used in a real-world analysis scenario. Imagine we have a dataset of daily temperatures for a month:

python
# Create a date range for a month
dates = pd.date_range(start='2023-01-01', periods=31, freq='D')

# Create temperature data
np.random.seed(42) # For reproducibility
temps = np.random.normal(10, 5, size=31) # Mean 10°C, std 5°C
temps_df = pd.DataFrame({'Temperature': temps}, index=dates)

print("Temperature data for January 2023:")
print(temps_df.head())

Output:

Temperature data for January 2023:
Temperature
2023-01-01 13.098047
2023-01-02 8.855402
2023-01-03 11.157983
2023-01-04 9.189796
2023-01-05 10.132882

Now, let's perform some analysis using slicing:

python
# Get the temperatures for the first week
first_week = temps_df.loc['2023-01-01':'2023-01-07']
print("\nFirst week temperatures:")
print(first_week)

# Calculate the average temperature for the first week
first_week_avg = first_week['Temperature'].mean()
print(f"\nAverage temperature for the first week: {first_week_avg:.2f}°C")

# Get temperatures for the second half of the month
second_half = temps_df.iloc[15:]
print("\nSecond half of January:")
print(second_half.head())

# Find the coldest and warmest days
coldest_day = temps_df.loc[temps_df['Temperature'].idxmin()]
warmest_day = temps_df.loc[temps_df['Temperature'].idxmax()]

print(f"\nColdest day: {temps_df['Temperature'].idxmin()}, Temperature: {coldest_day['Temperature']:.2f}°C")
print(f"Warmest day: {temps_df['Temperature'].idxmax()}, Temperature: {warmest_day['Temperature']:.2f}°C")

Output:

First week temperatures:
Temperature
2023-01-01 13.098047
2023-01-02 8.855402
2023-01-03 11.157983
2023-01-04 9.189796
2023-01-05 10.132882
2023-01-06 14.873587
2023-01-07 12.165912

Average temperature for the first week: 11.35°C

Second half of January:
Temperature
2023-01-16 7.118012
2023-01-17 10.256441
2023-01-18 9.346381
2023-01-19 10.488328
2023-01-20 5.766287

Coldest day: 2023-01-23, Temperature: 1.20°C
Warmest day: 2023-01-06, Temperature: 14.87°C

Common Pitfalls and Best Practices

  1. Inclusive vs. Exclusive Endpoints:

    • With .iloc[], the end point is exclusive (like regular Python slicing)
    • With .loc[], the end point is inclusive (different from regular Python slicing)
  2. Chained Indexing: Avoid chained indexing like df['col1']['col2'] as it can lead to unexpected behavior. Instead use df.loc[:, ['col1', 'col2']].

  3. Copying vs. Views: Be aware that slices might return views, not copies. Use .copy() if you need to modify the slice without affecting the original:

python
# Creating a copy to avoid modifying the original
subset_copy = df.iloc[0:2].copy()
subset_copy['Age'] = 0 # Won't affect the original DataFrame
  1. Performance Considerations:
    • .loc[] and .iloc[] are generally faster than boolean indexing
    • Accessing a single column with df['column_name'] is faster than df.loc[:, 'column_name']

Summary

In this tutorial, we learned how to use pandas slicing to extract specific data from DataFrames and Series:

  1. Index-based slicing using .iloc[] for integer position-based selection
  2. Label-based slicing using .loc[] for label-based selection
  3. Series slicing for one-dimensional data
  4. Advanced techniques combining boolean indexing with slicing
  5. Practical applications of slicing in data analysis

Mastering pandas slicing is essential for effective data manipulation and analysis, allowing you to focus on exactly the data you need.

Exercises

To practice what you've learned, try these exercises:

  1. Create a DataFrame with 10 rows and 5 columns, then:

    • Select rows 2-5 and columns 1-3
    • Select odd-numbered rows and even-numbered columns
    • Select the first and last rows, and the first and last columns
  2. Using the temperature dataset example:

    • Find the average temperature for each week of the month
    • Identify days where the temperature was above the monthly average
    • Create a new column showing the temperature difference from the previous day
  3. Create a DataFrame with custom row and column labels, then practice slicing with .loc[] to retrieve different subsets of data.

Additional Resources

Happy slicing!



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)