Execution Plans

Introduction

When you run a SQL query on a database, the database engine doesn't just immediately start fetching data. Instead, it first creates a roadmap for how to retrieve the data most efficiently. This roadmap is called an execution plan.

Think of an execution plan as the GPS navigation system for your database query. Just as a GPS might suggest different routes depending on traffic conditions, the database optimizer creates an execution plan that represents what it believes is the most efficient way to execute your query.

Understanding execution plans is crucial for writing performant database queries and troubleshooting slow operations. In this article, we'll explore what execution plans are, how to read them, and how to use them to optimize your database queries.

What is an Execution Plan?

An execution plan (or query plan) is a sequence of steps that the database engine follows to retrieve or modify data. The database query optimizer generates this plan by:

Parsing the SQL query
Analyzing available indexes
Considering table statistics (size, data distribution)
Evaluating different access methods
Choosing what it calculates to be the optimal plan

Execution plans show you exactly how the database will execute your query - whether it will use indexes, perform table scans, the order of operations, and how tables will be joined.

Types of Execution Plans

Most database systems offer two main types of execution plans:

Estimated Execution Plan

Generated without actually running the query
Based on statistics and estimates
Useful for quickly testing query performance without affecting the database

Actual Execution Plan

Generated by actually executing the query
Contains runtime metrics like actual row counts
More accurate but requires running the query

How to View Execution Plans

Different database systems have different methods to view execution plans. Here are examples for some popular databases:

SQL Server

-- Estimated execution plan
EXPLAIN SELECT * FROM customers WHERE customer_id = 123;

-- Actual execution plan
SET STATISTICS PROFILE ON;
SELECT * FROM customers WHERE customer_id = 123;
SET STATISTICS PROFILE OFF;

-- Or using Management Studio, press Ctrl+M before running your query

MySQL

EXPLAIN SELECT * FROM customers WHERE customer_id = 123;

-- For more detailed information
EXPLAIN ANALYZE SELECT * FROM customers WHERE customer_id = 123;

PostgreSQL

EXPLAIN SELECT * FROM customers WHERE customer_id = 123;

-- With execution statistics
EXPLAIN ANALYZE SELECT * FROM customers WHERE customer_id = 123;

Oracle

EXPLAIN PLAN FOR SELECT * FROM customers WHERE customer_id = 123;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Reading an Execution Plan

Let's look at a sample execution plan from PostgreSQL:

EXPLAIN ANALYZE SELECT customers.name, orders.order_date 
FROM customers JOIN orders ON customers.id = orders.customer_id
WHERE orders.total > 100;

Example output:

Hash Join  (cost=33.38..62.84 rows=485 width=40) (actual time=0.362..0.364 rows=15 loops=1)
  Hash Cond: (orders.customer_id = customers.id)
  ->  Seq Scan on orders  (cost=0.00..25.70 rows=485 width=16) (actual time=0.025..0.026 rows=15 loops=1)
        Filter: (total > 100)
        Rows Removed by Filter: 5
  ->  Hash  (cost=21.30..21.30 rows=961 width=36) (actual time=0.249..0.249 rows=20 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on customers  (cost=0.00..21.30 rows=961 width=36) (actual time=0.011..0.012 rows=20 loops=1)

Let's break down this execution plan:

Reading Direction: Execution plans are read from the innermost (most indented) operations to the outermost operations, and from bottom to top.
Common Operations:
- Seq Scan: A sequential scan reads every row in a table (similar to a full table scan).
- Index Scan: Uses an index to find specific rows.
- Hash Join: Creates a hash table from one dataset and probes it with another dataset.
- Nested Loop: For each row in the outer table, scans the inner table for matches.
- Sort: Sorts data, often for ORDER BY or GROUP BY operations.
Cost Metrics:
- cost: Estimated computational effort (not in time units)
- rows: Estimated number of rows
- actual time: Actual execution time in milliseconds
- actual rows: Actual number of rows processed

In our example, the database:

Performs a sequential scan on the customers table (bottom operation)
Creates a hash table from the customers data
Performs a sequential scan on the orders table, filtering for orders with total > 100
Performs a hash join, matching customer_id in orders with id in customers

Visualizing Execution Plans

Execution plans can be visualized as trees, making them easier to understand:

Common Issues Identified by Execution Plans

Execution plans can help you identify several common performance issues:

1. Table Scans Instead of Index Scans

Problem: When you see a "Seq Scan" (sequential scan) on a large table, it means the database is reading every row instead of using an index.

Example:

-- Query
SELECT * FROM customers WHERE last_name = 'Smith';

-- Execution plan showing a problem
Seq Scan on customers  (cost=0.00..155.00 rows=15 width=540)
  Filter: (last_name = 'Smith'::text)

Solution: Create an index on the column being filtered:

CREATE INDEX idx_customers_last_name ON customers(last_name);

After creating the index, the execution plan might change to:

Index Scan using idx_customers_last_name on customers  (cost=0.29..8.31 rows=15 width=540)
  Index Cond: (last_name = 'Smith'::text)

2. Inefficient Joins

Problem: Nested loop joins on large tables can be expensive.

Example:

-- Query
SELECT c.name, o.order_date FROM customers c, orders o 
WHERE c.id = o.customer_id;

-- Problematic execution plan
Nested Loop  (cost=0.00..20433.00 rows=10000 width=16)
  ->  Seq Scan on customers c  (cost=0.00..155.00 rows=10000 width=8)
  ->  Filter: (customer_id = c.id)
      ->  Seq Scan on orders o  (cost=0.00..155.00 rows=1 width=12)

Solution: Ensure both join columns are indexed and use explicit join syntax:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Improved query
SELECT c.name, o.order_date 
FROM customers c JOIN orders o ON c.id = o.customer_id;

3. Unnecessary Sorting

Problem: Sorting large datasets is expensive.

Example:

-- Query
SELECT * FROM orders ORDER BY order_date;

-- Execution plan showing expensive sort
Sort  (cost=358.13..383.13 rows=10000 width=50)
  Sort Key: order_date
  ->  Seq Scan on orders  (cost=0.00..115.00 rows=10000 width=50)

Solution: Create an index on the sorting column:

CREATE INDEX idx_orders_order_date ON orders(order_date);

The new execution plan might be:

Index Scan using idx_orders_order_date on orders  (cost=0.29..283.29 rows=10000 width=50)

Optimizing Based on Execution Plans

Let's go through a real-world example of optimizing a query using execution plans:

Original Query

SELECT p.product_name, c.category_name, SUM(od.quantity) as total_sold
FROM products p
JOIN order_details od ON p.product_id = od.product_id
JOIN categories c ON p.category_id = c.category_id
WHERE od.order_date > '2023-01-01'
GROUP BY p.product_name, c.category_name
ORDER BY total_sold DESC;

Original Execution Plan

Sort  (cost=15432.34..15482.34 rows=20000 width=68) (actual time=354.182..354.192 rows=537 loops=1)
  Sort Key: (sum(od.quantity)) DESC
  ->  HashAggregate  (cost=14782.61..14982.61 rows=20000 width=68) (actual time=353.856..354.023 rows=537 loops=1)
        Group Key: p.product_name, c.category_name
        ->  Hash Join  (cost=1465.12..12782.61 rows=200000 width=36) (actual time=15.107..351.290 rows=6819 loops=1)
              Hash Cond: (od.product_id = p.product_id)
              ->  Seq Scan on order_details od  (cost=0.00..8541.32 rows=200000 width=12) (actual time=0.018..337.566 rows=6819 loops=1)
                    Filter: (order_date > '2023-01-01'::date)
                    Rows Removed by Filter: 193181
              ->  Hash  (cost=958.87..958.87 rows=40499 width=40) (actual time=15.065..15.066 rows=1000 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 4MB
                    ->  Hash Join  (cost=26.82..958.87 rows=40499 width=40) (actual time=0.161..14.224 rows=1000 loops=1)
                          Hash Cond: (p.category_id = c.category_id)
                          ->  Seq Scan on products p  (cost=0.00..722.99 rows=40499 width=20) (actual time=0.009..13.048 rows=1000 loops=1)
                          ->  Hash  (cost=15.30..15.30 rows=919 width=36) (actual time=0.130..0.130 rows=10 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                ->  Seq Scan on categories c  (cost=0.00..15.30 rows=919 width=36) (actual time=0.013..0.116 rows=10 loops=1)

Issues Identified

Sequential scan on order_details table with a date filter
Sequential scan on products table
Inefficient joins due to missing indexes

Optimized Solution

Add an index on the order_date column to improve filtering:

CREATE INDEX idx_order_details_order_date ON order_details(order_date);

Add indexes on the join columns:

CREATE INDEX idx_order_details_product_id ON order_details(product_id);
CREATE INDEX idx_products_category_id ON products(category_id);

Create a covering index for the sort operation:

CREATE INDEX idx_order_details_quantity ON order_details(product_id, quantity);

Results After Optimization

The optimized execution plan might look like:

Sort  (cost=2243.34..2293.34 rows=20000 width=68) (actual time=74.182..74.192 rows=537 loops=1)
  Sort Key: (sum(od.quantity)) DESC
  ->  HashAggregate  (cost=1593.61..1793.61 rows=20000 width=68) (actual time=73.856..74.023 rows=537 loops=1)
        Group Key: p.product_name, c.category_name
        ->  Hash Join  (cost=465.12..1082.61 rows=51000 width=36) (actual time=5.107..71.290 rows=6819 loops=1)
              Hash Cond: (od.product_id = p.product_id)
              ->  Index Scan using idx_order_details_order_date on order_details od  (cost=0.42..541.32 rows=51000 width=12) (actual time=0.018..57.566 rows=6819 loops=1)
                    Index Cond: (order_date > '2023-01-01'::date)
              ->  Hash  (cost=358.87..358.87 rows=40499 width=40) (actual time=5.065..5.066 rows=1000 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 128kB
                    ->  Hash Join  (cost=16.82..358.87 rows=40499 width=40) (actual time=0.121..4.224 rows=1000 loops=1)
                          Hash Cond: (p.category_id = c.category_id)
                          ->  Index Scan using idx_products_category_id on products p  (cost=0.29..222.99 rows=40499 width=20) (actual time=0.009..3.048 rows=1000 loops=1)
                          ->  Hash  (cost=15.30..15.30 rows=919 width=36) (actual time=0.100..0.100 rows=10 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                ->  Seq Scan on categories c  (cost=0.00..15.30 rows=919 width=36) (actual time=0.013..0.086 rows=10 loops=1)

After optimization, we've dramatically reduced the cost and execution time of the query, as shown in the improved execution plan.

Best Practices for Working with Execution Plans

Focus on the most expensive operations first: Look for operations with the highest cost or time values.
Compare estimated vs. actual rows: Large discrepancies indicate outdated statistics.
Look for sequential scans on large tables: These are often opportunities for indexing.
Examine join operations: Nested loops on large datasets are usually inefficient.

Update statistics regularly: Ensure the optimizer has accurate data to work with:

-- In SQL Server
UPDATE STATISTICS table_name;

-- In PostgreSQL
ANALYZE table_name;

-- In MySQL
ANALYZE TABLE table_name;

Test in a development environment: Never add indexes or make changes directly in production without testing.
Consider the trade-offs: Adding indexes speeds up SELECT queries but can slow down INSERT, UPDATE, and DELETE operations.

Summary

Execution plans are powerful tools that reveal how the database engine processes your queries. By understanding and analyzing these plans, you can:

Identify performance bottlenecks in your queries
Make informed decisions about index creation
Optimize your database schema and queries
Troubleshoot slow-running operations

Remember that the database optimizer doesn't always get it right. Statistics might be outdated, or the optimizer might not understand your data distribution well. By learning to read and interpret execution plans, you gain the power to take control of your query performance.

Further Learning

To deepen your understanding of execution plans:

Learn about statistics and how they affect the optimizer's decisions
Explore query hints and how they can influence execution plans
Study index types (B-tree, hash, bitmap) and their impact on different operations
Practice reading execution plans with increasingly complex queries

Exercises

Generate and compare execution plans for the following queries:

SELECT * FROM customers WHERE last_name = 'Smith';
SELECT * FROM customers WHERE customer_id = 100;

Which one is more efficient and why?

Write a query that joins three tables of your choice, then generate its execution plan. Identify at least two ways to optimize it.
Find a query in your existing codebase that runs slowly. Generate its execution plan and suggest improvements.
Experiment with different index types on a test table and observe how they affect execution plans for various query patterns.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

What is an Execution Plan?​

Types of Execution Plans​

Estimated Execution Plan​

Actual Execution Plan​

How to View Execution Plans​

SQL Server​

MySQL​

PostgreSQL​

Oracle​

Reading an Execution Plan​

Visualizing Execution Plans​

Common Issues Identified by Execution Plans​

1. Table Scans Instead of Index Scans​

2. Inefficient Joins​

3. Unnecessary Sorting​

Optimizing Based on Execution Plans​

Original Query​

Original Execution Plan​

Issues Identified​

Optimized Solution​

Results After Optimization​

Best Practices for Working with Execution Plans​

Summary​

Further Learning​

Exercises​

Introduction

What is an Execution Plan?

Types of Execution Plans

Estimated Execution Plan

Actual Execution Plan

How to View Execution Plans

SQL Server

MySQL

PostgreSQL

Oracle

Reading an Execution Plan

Visualizing Execution Plans

Common Issues Identified by Execution Plans

1. Table Scans Instead of Index Scans

2. Inefficient Joins

3. Unnecessary Sorting

Optimizing Based on Execution Plans

Original Query

Original Execution Plan

Issues Identified

Optimized Solution

Results After Optimization

Best Practices for Working with Execution Plans

Summary

Further Learning

Exercises