Pandas GeoPandas Introduction

What is GeoPandas?

GeoPandas is an open-source Python library that extends the functionality of Pandas to enable geospatial operations in Python. It combines the capabilities of Pandas for data manipulation with the geospatial tools of libraries like Shapely, Fiona, and matplotlib.

If you're familiar with Pandas DataFrames, GeoPandas will feel natural. Its central data structure, the GeoDataFrame, is a Pandas DataFrame with one or more geometry columns and built-in geospatial operations.

Why Use GeoPandas?

  • Seamless Pandas Integration: Works with the Pandas ecosystem you already know
  • Spatial Operations: Perform spatial joins, overlays, and distance calculations
  • Visualization: Easily create maps directly from your data
  • Coordinate System Management: Handle projections and transformations
  • File Format Support: Read and write various geospatial formats like Shapefiles, GeoJSON, etc.

Installation

Before we begin, you'll need to install GeoPandas. It depends on compiled libraries (GEOS, GDAL, and PROJ) that can be tricky to install, so the recommended way is via conda:

bash
conda install -c conda-forge geopandas

Alternatively, you can use pip:

bash
pip install geopandas

Basic Concepts in GeoPandas

GeoDataFrame and GeoSeries

The two main data structures in GeoPandas are:

  1. GeoSeries: A vector of geometric objects (similar to a Pandas Series)
  2. GeoDataFrame: A tabular data structure with a GeoSeries column (similar to a Pandas DataFrame)

Let's create a simple GeoDataFrame:

python
import geopandas as gpd
from shapely.geometry import Point

# Create some point geometries
geometry = [Point(xy) for xy in zip([-73.9857, -74.0060, -73.9701],
                                    [40.7484, 40.7128, 40.7831])]

# Create a GeoDataFrame
gdf = gpd.GeoDataFrame(
    {'name': ['Empire State', 'One World Trade', 'Columbia University'],
     'geometry': geometry},
    crs="EPSG:4326"  # WGS84 coordinate reference system
)

print(gdf)

Output:

                  name                    geometry
0         Empire State  POINT (-73.98570 40.74840)
1      One World Trade  POINT (-74.00600 40.71280)
2  Columbia University  POINT (-73.97010 40.78310)

Let's break down what we did:

  1. We created point geometries using Shapely's Point class
  2. We created a GeoDataFrame with a name column and a geometry column
  3. We specified the coordinate reference system (CRS) as WGS84 (common for GPS data)
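
Because a GeoDataFrame is still a DataFrame, ordinary Pandas indexing works alongside the geometry accessors. The sketch below rebuilds the same three points and shows a few of these; the latitude threshold of 40.74 is an arbitrary value chosen for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"name": ["Empire State", "One World Trade", "Columbia University"]},
    geometry=[
        Point(-73.9857, 40.7484),
        Point(-74.0060, 40.7128),
        Point(-73.9701, 40.7831),
    ],
    crs="EPSG:4326",
)

# The active geometry column is a GeoSeries
geometry_type = type(gdf.geometry).__name__
print(geometry_type)

# For point geometries, .x and .y return plain Pandas Series of coordinates
gdf["lon"] = gdf.geometry.x
gdf["lat"] = gdf.geometry.y

# Boolean filtering works as in Pandas; the result is still a GeoDataFrame
# (40.74 is an arbitrary illustrative threshold)
northern = gdf[gdf["lat"] > 40.74]
print(northern[["name", "lat"]])
```

Keeping coordinates in ordinary columns like this is optional; the geometry column remains the source of truth for spatial operations.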

Reading Geospatial Data

One of the most common tasks is reading existing geospatial data:

python
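# Reading a bundled shapefile of world countries
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0,
# so this call requires an older GeoPandas release
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))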

# Examine the first few rows
print(world.head())

Output:

     pop_est      continent                      name  iso_a3  gdp_md_est  \
0     920938        Oceania                      Fiji     FJI      8374.0
1   53950935         Africa                  Tanzania     TZA    150600.0
2     603253         Africa                 W. Sahara     ESH       906.5
3   35623680  North America                    Canada     CAN   1674000.0
4  326625791  North America  United States of America     USA  18560000.0

                                            geometry
0  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Basic Visualization

One of the strengths of GeoPandas is the easy visualization of geospatial data:

python
# Basic plot
world.plot(figsize=(10, 6));

# Plot countries colored by continent
world.plot(column='continent',
           categorical=True,
           legend=True,
           figsize=(10, 6));

The first plot will show a simple world map, while the second will color-code the countries by continent.

Working with Geometry and Coordinate Reference Systems

Understanding Coordinate Reference Systems (CRS)

A coordinate reference system defines how the coordinates in your data relate to locations on the Earth's surface. GeoPandas uses the industry-standard pyproj library to handle CRS transformations.

python
# Check the CRS of our world dataset
print(world.crs)

# Reproject to a different CRS (Mercator projection)
world_mercator = world.to_crs(epsg=3395)
print(world_mercator.crs)

Output:

EPSG:4326
EPSG:3395
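
Reprojecting changes the coordinate values and their units, not the locations they describe. The sketch below, which uses a single point so it runs without the world dataset, reprojects from WGS84 to the same World Mercator CRS and compares the resulting x coordinate with the Mercator relationship x = a · longitude in radians, where a = 6378137 m is the WGS84 semi-major axis.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

points = gpd.GeoSeries([Point(-73.9857, 40.7484)], crs="EPSG:4326")
mercator = points.to_crs(epsg=3395)

# Geographic CRS: coordinates are angles; projected CRS: coordinates are lengths
print(points.crs.is_geographic, points.crs.axis_info[0].unit_name)
print(mercator.crs.is_projected, mercator.crs.axis_info[0].unit_name)

# Mercator easting is the semi-major axis times the longitude in radians
expected_x = 6378137.0 * math.radians(-73.9857)
actual_x = mercator.iloc[0].x
print(actual_x, expected_x)
```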

Geometric Operations

GeoPandas offers various geometric operations through the GeoSeries object:

python
# Create two polygons
from shapely.geometry import Polygon
polygon1 = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
polygon2 = Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)])

# Create a GeoSeries
gs = gpd.GeoSeries([polygon1, polygon2])

# Calculate the area
print("Areas:")
print(gs.area)

# Find the intersection
intersection = gs[0].intersection(gs[1])
print("\nIntersection:", intersection)

# Check if polygons contain a specific point
point = Point(0.75, 0.75)
print("\nPolygon 1 contains point:", polygon1.contains(point))
print("Polygon 2 contains point:", polygon2.contains(point))

Output:

Areas:
0    1.0
1    1.0
dtype: float64

Intersection: POLYGON ((0.5 0.5, 1 0.5, 1 1, 0.5 1, 0.5 0.5))

Polygon 1 contains point: True
Polygon 2 contains point: True
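
The calls above mix GeoSeries methods with Shapely methods on individual geometries. As a small sketch on the same two squares, the GeoSeries versions run element-wise over every geometry, while the Shapely versions combine two geometries into one. The point `(3, 0.5)` used for the distance check is an arbitrary illustrative value.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

polygon1 = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
polygon2 = Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)])
gs = gpd.GeoSeries([polygon1, polygon2])

# Element-wise operations return one result per geometry in the GeoSeries
contains_point = gs.contains(Point(0.75, 0.75))
centroids = gs.centroid
distances = gs.distance(Point(3, 0.5))  # arbitrary point to the right of both squares

# Shapely set operations combine two individual geometries
union = polygon1.union(polygon2)
only_in_first = polygon1.difference(polygon2)

print(contains_point.tolist())
print(distances.tolist())
print(union.area, only_in_first.area)
```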

Spatial Operations with GeoPandas

Spatial Joins

Spatial joins allow you to combine attributes from different datasets based on their spatial relationships:

python
# Create GeoDataFrame of cities
cities = gpd.GeoDataFrame(
    {'name': ['New York', 'Paris', 'Tokyo', 'Cairo'],
     'country': ['United States', 'France', 'Japan', 'Egypt']},
    geometry=[
        Point(-74.0060, 40.7128),  # New York
        Point(2.3522, 48.8566),    # Paris
        Point(139.6917, 35.6895),  # Tokyo
        Point(31.2357, 30.0444)    # Cairo
    ],
    crs="EPSG:4326"
)

# Perform spatial join to find which countries contain these cities
joined = gpd.sjoin(cities, world, how="left", predicate="within")
print(joined[['name_left', 'name_right']])

Output:

  name_left                name_right
0  New York  United States of America
1     Paris                    France
2     Tokyo                     Japan
3     Cairo                     Egypt
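
To see how a left join treats points that fall outside every polygon, here is a minimal sketch with two made-up square regions and three made-up sites. All names and coordinates are hypothetical. Because the two frames share no column names, no `_left`/`_right` suffixes are added.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two adjacent unit squares (hypothetical regions)
regions = gpd.GeoDataFrame(
    {"region": ["west", "east"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Site "c" lies outside both regions
sites = gpd.GeoDataFrame(
    {"site": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# how="left" keeps every site; unmatched sites get NaN in the region columns
joined = gpd.sjoin(sites, regions, how="left", predicate="within")
result = joined.set_index("site")["region"]
unmatched = result[result.isna()].index.tolist()

print(result)
print(unmatched)
```

With `how="inner"` the unmatched site would be dropped instead of kept with missing values.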

Buffering and Overlay Operations

GeoPandas allows you to create buffers and perform overlay operations:

python
# Create buffers around cities. The CRS is WGS84, so the distance is in degrees,
# and GeoPandas warns that buffers in a geographic CRS are likely distorted.
cities_buffered = cities.copy()
cities_buffered['geometry'] = cities.buffer(1)  # 1 degree buffer

# Plot the cities with their buffers
ax = world.plot(figsize=(12, 8), color='lightgrey')
cities_buffered.plot(ax=ax, alpha=0.5, color='red')
cities.plot(ax=ax, color='black', markersize=10)

This will plot the world map with cities as black points and red circular buffers around them.
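
For buffers measured in meters, one common approach is to reproject to a projected CRS first. The sketch below assumes a local UTM zone is appropriate (it is picked with `estimate_utm_crs`), buffers two New York landmarks by 3 km, and then uses `gpd.overlay` to extract the area where the two buffers intersect. The 3 km radius is an arbitrary illustrative value.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

sites = gpd.GeoDataFrame(
    {"name": ["Empire State", "One World Trade"]},
    geometry=[Point(-73.9857, 40.7484), Point(-74.0060, 40.7128)],
    crs="EPSG:4326",
)

# Pick a UTM zone that fits the data, so coordinates are in meters
utm_crs = sites.estimate_utm_crs()
sites_utm = sites.to_crs(utm_crs)

# Buffer by 3000 m (arbitrary radius for illustration)
radius_m = 3000
buffers = sites_utm.copy()
buffers["geometry"] = sites_utm.buffer(radius_m)

# Overlay: keep only the area covered by both buffers
first = buffers.iloc[[0]]
second = buffers.iloc[[1]]
overlap = gpd.overlay(first, second, how="intersection")

overlap_area_sqkm = overlap.area.iloc[0] / 10**6
print(utm_crs.to_epsg())
print(overlap.columns.tolist())
print(overlap_area_sqkm)
```

Other `how` values for `gpd.overlay` include `"union"`, `"difference"`, and `"symmetric_difference"`.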

Real-world Example: Analyzing Population Density

Let's combine what we've learned to analyze global population density:

python
# Load our world dataset again (gpd.datasets requires GeoPandas < 1.0)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Reproject to an equal-area projection (Eckert IV), then compute area in sq km
world_equal_area = world.to_crs('+proj=eck4')
world_equal_area['area_sqkm'] = world_equal_area.area / 10**6  # sq m to sq km

# Calculate population density
world_equal_area['pop_density'] = world_equal_area['pop_est'] / world_equal_area['area_sqkm']

# Plot population density in quantile classes (scheme= requires the mapclassify package)
world_equal_area.plot(column='pop_density',
                      legend=True,
                      # With scheme=, the legend is a matplotlib Legend, which takes 'title', not 'label'
                      legend_kwds={'title': 'Population per sq km'},
                      figsize=(12, 8),
                      scheme='quantiles',
                      cmap='YlOrRd')

This code:

  1. Loads the world dataset
  2. Transforms it to an equal-area projection (to get accurate area calculations)
  3. Calculates the area in square kilometers
  4. Computes population density
  5. Creates a choropleth map showing population density
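
Step 2 matters because areas computed directly from longitude and latitude are in "square degrees" and ignore how meridians converge toward the poles. As a rough sketch, the two made-up 1° × 1° cells below have identical areas in degrees, but in the same Eckert IV projection the cell at 60° N covers roughly half the area of the one at the equator.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# Two hypothetical 1-degree cells: one at the equator, one at 60 degrees north
cells = gpd.GeoDataFrame(
    {"cell": ["equator", "lat_60"]},
    geometry=[box(0, 0, 1, 1), box(0, 60, 1, 61)],
    crs="EPSG:4326",
)

# Area in a geographic CRS is in square degrees; GeoPandas warns about this
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    cells["area_deg2"] = cells.area

# Area after reprojecting to an equal-area CRS is in square meters
equal_area = cells.to_crs("+proj=eck4")
cells["area_sqkm"] = equal_area.area / 10**6

ratio = cells.loc[1, "area_sqkm"] / cells.loc[0, "area_sqkm"]
print(cells[["cell", "area_deg2", "area_sqkm"]])
print(ratio)
```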

Summary

In this introduction to GeoPandas, we've covered:

  • The basics of GeoDataFrame and GeoSeries data structures
  • Reading and visualizing geospatial data
  • Working with coordinate reference systems
  • Performing spatial operations like joins, buffers, and overlays
  • A practical example with population density analysis

GeoPandas seamlessly integrates with Pandas, making it an excellent tool for combining traditional data analysis with geospatial capabilities. Whether you're working with city planning, environmental science, logistics, or just about any field that involves location data, GeoPandas provides a powerful toolkit for geospatial analysis in Python.

Exercises

  1. Create a GeoDataFrame with the locations of five major cities and visualize them on a world map.
  2. Load a shapefile of countries and calculate the centroids for each country.
  3. Find all countries that share a border with Brazil using spatial joins.
  4. Create a map showing the 10 most populated countries with a color gradient.
  5. Calculate the distance between Paris and Tokyo using GeoPandas and appropriate projections.

