
Pandas Position Selection

In Pandas, there are multiple ways to select data from DataFrames and Series. While label-based selection (using loc) allows you to access data by row and column names, position-based selection gives you the ability to access data by its integer position. This approach is similar to how you would access elements in a Python list.

Introduction to Position-Based Selection

Position-based selection in Pandas is primarily done with the iloc indexer, whose name stands for "integer location." Unlike a regular method, it is used with square brackets (df.iloc[...]), and it selects data by the numerical position of rows and columns rather than by their labels.

Let's explore how to use position-based selection methods in Pandas with clear examples.

Basic Position Selection with iloc

The iloc indexer allows you to select data by integer-based positions.
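The difference between position and label is easiest to see when the index labels are integers that do not match the row order. The following is a minimal sketch; the `scores` Series and its labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data: integer labels that are deliberately out of order
scores = pd.Series([88, 92, 79], index=[2, 0, 1])

# iloc counts from the top of the Series, ignoring the labels
first_by_position = scores.iloc[0]

# loc looks up the element whose label is 0, which sits in the second slot
by_label_zero = scores.loc[0]

print(first_by_position)  # 88
print(by_label_zero)      # 92
```

With the default 0, 1, 2, ... index the two indexers return the same element, which is why the distinction is easy to overlook until the data has been sorted or filtered.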

Selecting a Single Value

python
import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'Salary': [50000, 60000, 75000, 80000, 65000]
})

# Select the value at row 1, column 2
value = df.iloc[1, 2]
print(value)

Output:

London

In this example, df.iloc[1, 2] selects the element at the 2nd row (index 1) and 3rd column (index 2).

Selecting a Row

python
# Select the entire second row
row = df.iloc[1]
print(row)

Output:

Name         Bob
Age           30
City      London
Salary     60000
Name: 1, dtype: object

Selecting a Column

python
# Select the third column
column = df.iloc[:, 2]
print(column)

Output:

0    New York
1      London
2       Paris
3       Tokyo
4      Sydney
Name: City, dtype: object

Slicing with iloc

You can use slices to select ranges of rows and columns:

python
# Select rows 1-3 and columns 0-2
subset = df.iloc[1:4, 0:3]
print(subset)

Output:

      Name  Age    City
1      Bob   30  London
2  Charlie   35   Paris
3    David   40   Tokyo

Remember that iloc slices follow Python's slicing convention: the start position is included and the end position is excluded, so 1:4 selects positions 1, 2, and 3.
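The end-exclusive rule is worth contrasting with loc, whose label slices include the end label. The sketch below uses a small made-up DataFrame (`df_demo`, with letter labels) and also shows negative positions and a step, both of which iloc slices accept just as Python lists do.

```python
import pandas as pd

# Hypothetical frame with letter labels so positions and labels cannot be confused
df_demo = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=list("abcde"))

pos_slice = df_demo.iloc[1:3]        # positions 1 and 2 only (end excluded)
label_slice = df_demo.loc["b":"d"]   # labels b, c, and d (end included)
last_two = df_demo.iloc[-2:]         # negative positions count from the end
every_other = df_demo.iloc[::2]      # a step of 2 keeps positions 0, 2, 4

print(list(pos_slice.index))    # ['b', 'c']
print(list(label_slice.index))  # ['b', 'c', 'd']
print(list(last_two.index))     # ['d', 'e']
print(list(every_other.index))  # ['a', 'c', 'e']
```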

Using Lists with iloc

You can pass lists to iloc to select specific rows or columns by position:

python
# Select rows 0, 2, and 4, and columns 0 and 2
subset = df.iloc[[0, 2, 4], [0, 2]]
print(subset)

Output:

      Name      City
0    Alice  New York
2  Charlie     Paris
4     Emma    Sydney
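A list of positions also controls the order of the result, and a position may appear more than once. The following sketch illustrates this with a small made-up frame named `letters`.

```python
import pandas as pd

# Hypothetical 3x3 frame used only to show ordering
letters = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Rows and columns come back in the order the positions are listed
reordered = letters.iloc[[2, 0], [2, 0]]
print(reordered)

# Repeating a position repeats the row
repeated = letters.iloc[[0, 0]]
print(len(repeated))  # 2
```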

Boolean Indexing with iloc

You can combine iloc with boolean arrays:

python
# Create a boolean mask
mask = df['Age'] > 30
print(mask)

# Use boolean mask with iloc
selected_rows = df.iloc[mask.values]
print(selected_rows)

Output:

0    False
1    False
2     True
3     True
4     True
Name: Age, dtype: bool

      Name  Age    City  Salary
2  Charlie   35   Paris   75000
3    David   40   Tokyo   80000
4     Emma   45  Sydney   65000

Notice the .values call: iloc does not accept a boolean Series as a mask, so the Series must first be converted to a plain NumPy array.
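As an alternative to .values, the pandas documentation recommends Series.to_numpy() for this kind of conversion. The sketch below, which uses a made-up `people` frame, shows that spelling, plus two equivalent routes: turning the mask into integer positions with np.flatnonzero, and passing the Series directly to loc, which does accept it.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
mask = people["Age"] > 28

# Option 1: convert the boolean Series to a NumPy array
by_bool_array = people.iloc[mask.to_numpy()]

# Option 2: convert the mask to the integer positions where it is True
positions = np.flatnonzero(mask.to_numpy())
by_positions = people.iloc[positions]

# Option 3: loc accepts the boolean Series as-is
by_loc = people.loc[mask]

print(positions)  # [1 2]
print(by_bool_array.equals(by_positions) and by_bool_array.equals(by_loc))  # True
```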

Fast Scalar Access with iat

If you need to access a single value and speed is critical, iat is faster than iloc:

python
# Using iat for fast scalar access
value = df.iat[0, 1] # First row, second column
print(value)

Output:

25

The iat indexer accepts only a single row position and a single column position, both integers. For slices, lists, or masks, use iloc.
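One way to check the speed difference on your own machine is a quick timeit comparison. This is only a rough sketch: the `df_t` frame is made up, and the absolute numbers depend on the pandas version and hardware, so no expected timings are shown.

```python
import timeit

import pandas as pd

# Hypothetical small frame; the gap is per call, so frame size matters little
df_t = pd.DataFrame({"Age": [25, 30, 35], "Salary": [50000, 60000, 75000]})

# Both indexers return the same scalar
assert df_t.iat[0, 1] == df_t.iloc[0, 1]

# Time 10,000 scalar lookups with each indexer
iloc_seconds = timeit.timeit(lambda: df_t.iloc[0, 1], number=10_000)
iat_seconds = timeit.timeit(lambda: df_t.iat[0, 1], number=10_000)

print(f"iloc: {iloc_seconds:.4f} s, iat: {iat_seconds:.4f} s")
```

The difference only matters inside tight loops; for whole-column work, a vectorized operation is usually a better choice than either indexer.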

Real-World Application: Data Analysis

Let's see how position selection can be used in a data analysis workflow:

python
# Create a sample sales dataset
sales_data = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'Product': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C'],
    'Quantity': [10, 15, 8, 12, 20, 14, 9, 11, 16, 13],
    'Price': [100, 200, 100, 150, 200, 100, 150, 200, 100, 150]
})

# Calculate revenue
sales_data['Revenue'] = sales_data['Quantity'] * sales_data['Price']

# Select the first 5 days of data
first_week = sales_data.iloc[:5]
print("First Week Data:")
print(first_week)

# Calculate average revenue for the first 5 days
avg_revenue_first_week = first_week['Revenue'].mean()
print(f"\nAverage Revenue in First Week: ${avg_revenue_first_week:.2f}")

# Select specific columns for analysis (Product, Quantity, Revenue)
analysis_data = sales_data.iloc[:, [1, 2, 4]]
print("\nAnalysis Data:")
print(analysis_data)

Output:

First Week Data:
        Date Product  Quantity  Price  Revenue
0 2023-01-01       A        10    100     1000
1 2023-01-02       B        15    200     3000
2 2023-01-03       A         8    100      800
3 2023-01-04       C        12    150     1800
4 2023-01-05       B        20    200     4000

Average Revenue in First Week: $2120.00

Analysis Data:
  Product  Quantity  Revenue
0       A        10     1000
1       B        15     3000
2       A         8      800
3       C        12     1800
4       B        20     4000
5       A        14     1400
6       C         9     1350
7       B        11     2200
8       A        16     1600
9       C        13     1950

Common Pitfalls and Tips

  1. Zero-based indexing: Remember that Pandas uses zero-based indexing, so the first row or column is accessed with index 0.

  2. Out of bounds errors: Using a single integer position outside the valid range raises an IndexError:

python
# Row position 10 does not exist, so iloc raises IndexError
try:
    value = df.iloc[10, 0]
except IndexError as e:
    print(f"Error: {e}")

  3. Mixing iloc and loc: Don't confuse iloc (integer-position based) with loc (label-based). For example:

python
print(df.columns)
print(df.iloc[:, 1])  # Selects the second column (Age) by position
try:
    print(df.loc[:, 1])  # No column is labeled 1
except KeyError as e:
    print(f"KeyError: {e}")

  4. Chained indexing: Avoid chained indexing when making assignments:

python
# Don't do this:
df.iloc[0]['Age'] = 26  # Assigns to a temporary row object; df may be left unchanged

# Do this instead:
df.iloc[0, 1] = 26  # A single iloc call writes to df itself
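Chained indexing is often tempting because you know the row by position but the column by name. One way to avoid it is to look up the column's position with Index.get_loc and pass both positions to a single iloc call. The sketch below uses a made-up `staff` frame.

```python
import pandas as pd

# Hypothetical data for illustration
staff = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Translate the column label into its integer position
age_pos = staff.columns.get_loc("Age")

# One iloc call with row position and column position: no chained indexing
staff.iloc[0, age_pos] = 26

print(age_pos)              # 1
print(staff.loc[0, "Age"])  # 26
```

The reverse also works: staff.loc[staff.index[0], "Age"] converts the row position into a label and performs the assignment through loc.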

Summary

Position-based selection in Pandas is a powerful way to access data based on integer indices rather than labels. Here's what we covered:

  • Using iloc for integer-based indexing of rows and columns
  • Selecting single values, entire rows, or columns
  • Slicing ranges of rows and columns
  • Using lists with iloc for non-contiguous selection
  • Using iat for faster scalar access
  • Applying position-based selection in real-world data analysis

Position-based selection is particularly useful when:

  • You don't know or care about the labels of your data
  • You want to select data based on its order in the DataFrame
  • You need to perform operations on specific portions of your data by position

Practice Exercises

  1. Create a DataFrame with 10 rows and 5 columns, and select:

    • The first 3 rows and all columns
    • The last 2 rows and the first 3 columns
    • Every other row and column
  2. Write a function that takes a DataFrame and returns a new DataFrame containing:

    • The first and last row
    • Every column except the first one
  3. Create a Pandas Series with 20 random numbers and use iloc to:

    • Select the 5 highest values
    • Calculate the mean of every third value

Additional Resources

By mastering position-based selection in Pandas, you'll have greater flexibility and control when working with your data for analysis, cleaning, or transformation tasks.


