PostgreSQL Recursive Queries
Introduction
Have you ever needed to query hierarchical data like an organizational chart, file system, or category tree? Or perhaps traverse a graph of relationships? These tasks can be challenging with standard SQL queries, but PostgreSQL makes them elegant and efficient with recursive queries.
Recursive queries allow you to start with a base result set and then repeatedly apply additional queries to that result until you've retrieved all the data you need. They're implemented in PostgreSQL using a special form of Common Table Expressions (CTEs) and are incredibly powerful for working with hierarchical or graph-structured data.
In this tutorial, we'll explore:
- What recursive queries are and when to use them
- The syntax and structure of recursive CTEs
- Step-by-step examples with real-world applications
- Best practices and optimization techniques
Understanding Recursive CTEs
A recursive Common Table Expression (CTE) has two main parts:
- A non-recursive term: The initial query that forms the base result set
- A recursive term: A query that references the CTE itself and is applied repeatedly
The syntax follows this pattern:
WITH RECURSIVE cte_name AS (
-- non-recursive term: initial query to get base result
SELECT ...
UNION [ALL]
-- recursive term: query that references cte_name
SELECT ... FROM ... JOIN cte_name ON ...
)
SELECT * FROM cte_name;
The query execution works like this:
- Execute the non-recursive term to create the initial working table
- While the working table is not empty:
- Execute the recursive term, substituting the current working table for the recursive self-reference
- Append results to the result set
- Replace the working table with the new results
Let's visualize this process:
Basic Example: Generating a Sequence
Let's start with a simple example to generate a sequence of numbers from 1 to 10:
WITH RECURSIVE numbers AS (
-- non-recursive term: start with 1
SELECT 1 AS n
UNION
-- recursive term: add 1 to previous number, stop at 10
SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT * FROM numbers;
Output:
n
---
1
2
3
4
5
6
7
8
9
10
In this example:
- The non-recursive term returns a single row with the value 1
- The recursive term adds 1 to each number and includes it if less than 10
- The process continues until no more rows are returned (when n reaches 10)
Real-World Example 1: Employee Hierarchy
Let's explore a practical example with an organizational hierarchy. Consider this employees
table:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT NULL,
FOREIGN KEY (manager_id) REFERENCES employees(employee_id)
);
INSERT INTO employees VALUES
(1, 'Sarah Johnson', NULL), -- CEO
(2, 'Michael Chen', 1), -- CTO, reports to CEO
(3, 'Emily Davis', 1), -- CFO, reports to CEO
(4, 'Alex Rodriguez', 2), -- Dev Manager, reports to CTO
(5, 'Lisa Wang', 2), -- QA Manager, reports to CTO
(6, 'David Kim', 4), -- Developer, reports to Dev Manager
(7, 'Priya Patel', 4), -- Developer, reports to Dev Manager
(8, 'James Wilson', 5); -- QA Tester, reports to QA Manager
Now, let's write a recursive query to display all employees under a specific manager, including their reporting level:
WITH RECURSIVE org_hierarchy AS (
-- non-recursive term: start with the selected manager
SELECT employee_id, name, manager_id, 0 AS level
FROM employees
WHERE employee_id = 2 -- Michael Chen (CTO)
UNION ALL
-- recursive term: find all employees who report to someone in the current result
SELECT e.employee_id, e.name, e.manager_id, h.level + 1
FROM employees e
JOIN org_hierarchy h ON e.manager_id = h.employee_id
)
SELECT
level,
REPEAT(' ', level) || name AS employee,
employee_id,
manager_id
FROM org_hierarchy
ORDER BY level, name;
Output:
level | employee | employee_id | manager_id
-------|-----------------------|-------------|------------
0 | Michael Chen | 2 | 1
1 | Alex Rodriguez | 4 | 2
1 | Lisa Wang | 5 | 2
2 | David Kim | 6 | 4
2 | James Wilson | 8 | 5
2 | Priya Patel | 7 | 4
This query:
- Starts with Michael Chen (the CTO)
- Finds all employees who report directly to Michael
- Then finds all employees who report to those direct reports
- Continues until all levels of the hierarchy are found
- Uses the
level
field to track the depth in the organization
Real-World Example 2: Category Hierarchy
Another common use case for recursive queries is working with category hierarchies, like product categories in an e-commerce system:
CREATE TABLE categories (
category_id INT PRIMARY KEY,
name VARCHAR(100),
parent_id INT NULL,
FOREIGN KEY (parent_id) REFERENCES categories(category_id)
);
INSERT INTO categories VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Gaming Laptops', 3),
(5, 'Business Laptops', 3),
(6, 'Desktop Computers', 2),
(7, 'Smartphones', 1),
(8, 'Android Phones', 7),
(9, 'iOS Phones', 7);
Let's write a query to display the full category path for each category:
WITH RECURSIVE category_paths AS (
-- non-recursive term: start with top-level categories
SELECT
category_id,
name,
parent_id,
name AS path
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- recursive term: join each category with its parent's path
SELECT
c.category_id,
c.name,
c.parent_id,
cp.path || ' > ' || c.name
FROM categories c
JOIN category_paths cp ON c.parent_id = cp.category_id
)
SELECT category_id, name, path FROM category_paths
ORDER BY path;
Output:
category_id | name | path
-------------|-----------------|--------------------------------
1 | Electronics | Electronics
2 | Computers | Electronics > Computers
7 | Smartphones | Electronics > Smartphones
3 | Laptops | Electronics > Computers > Laptops
6 | Desktop Computers | Electronics > Computers > Desktop Computers
8 | Android Phones | Electronics > Smartphones > Android Phones
9 | iOS Phones | Electronics > Smartphones > iOS Phones
4 | Gaming Laptops | Electronics > Computers > Laptops > Gaming Laptops
5 | Business Laptops | Electronics > Computers > Laptops > Business Laptops
This query:
- Starts with top-level categories (those with NULL parent_id)
- For each category, joins with its parent to build the full path
- Continues recursively until all categories have been processed
Real-World Example 3: Graph Traversal
Recursive queries can also be used for graph traversal problems, such as finding all possible routes between cities:
CREATE TABLE routes (
departure_city VARCHAR(100),
arrival_city VARCHAR(100),
distance INT,
PRIMARY KEY (departure_city, arrival_city)
);
INSERT INTO routes VALUES
('New York', 'Boston', 215),
('New York', 'Washington', 225),
('Boston', 'Chicago', 960),
('Boston', 'Toronto', 550),
('Washington', 'Miami', 1095),
('Washington', 'Chicago', 700),
('Chicago', 'Los Angeles', 2015),
('Toronto', 'Vancouver', 3370),
('Miami', 'Los Angeles', 2735);
Now, let's find all possible routes from New York to Los Angeles, with a maximum of 3 connections:
WITH RECURSIVE possible_routes AS (
-- non-recursive term: direct flights from New York
SELECT
departure_city,
arrival_city,
ARRAY[departure_city, arrival_city] AS route,
distance,
1 AS connections
FROM routes
WHERE departure_city = 'New York'
UNION ALL
-- recursive term: add connections where possible
SELECT
r.departure_city,
r.arrival_city,
pr.route || r.arrival_city,
pr.distance + r.distance,
pr.connections + 1
FROM routes r
JOIN possible_routes pr ON r.departure_city = pr.arrival_city
WHERE
pr.connections < 3 AND -- limit to 3 connections
NOT r.arrival_city = ANY(pr.route) -- avoid cycles
)
SELECT
route,
distance,
connections
FROM possible_routes
WHERE arrival_city = 'Los Angeles'
ORDER BY distance;
Output:
route | distance | connections
-------------------------------------------|----------|------------
{New York,Washington,Chicago,Los Angeles} | 2940 | 3
{New York,Boston,Chicago,Los Angeles} | 3190 | 3
This query:
- Starts with all direct flights from New York
- Recursively adds connections where the departure city matches the previous arrival city
- Tracks the complete route using an array
- Avoids cycles by ensuring we don't revisit cities
- Limits to a maximum of 3 connections
- Finally filters for routes ending in Los Angeles
Advanced Techniques and Optimizations
1. Cycle Detection
In graph traversal queries, you might encounter cycles where you revisit the same node. The example above used the NOT x = ANY(array)
technique, but PostgreSQL also provides a built-in cycle detection mechanism:
WITH RECURSIVE search_graph(id, link, path, cycle) AS (
SELECT g.id, g.link, ARRAY[g.id], false
FROM graph g
WHERE id = 'start_node'
UNION ALL
SELECT g.id, g.link, path || g.id, g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
The cycle
column becomes true if we revisit a node, allowing us to stop recursion.
2. Limiting Recursion Depth
The connections < 3
condition in our routes example limits recursion depth. This is crucial for performance in large graphs:
WITH RECURSIVE limited_recursion AS (
SELECT ..., 1 AS depth
FROM ...
UNION ALL
SELECT ..., lr.depth + 1
FROM ... JOIN limited_recursion lr ON ...
WHERE lr.depth < 10 -- maximum recursion depth
)
3. Using UNION vs. UNION ALL
UNION
eliminates duplicates but requires additional sorting and comparisonUNION ALL
keeps all rows including duplicates but is faster- Use
UNION ALL
when you're sure duplicates won't occur or don't matter - Use
UNION
when you need to ensure unique results
4. Optimizing Join Operations
Join operations in the recursive term can be performance-intensive. Consider:
- Adding appropriate indexes
- Limiting the working set size early
- Using more specific join conditions
Common Pitfalls and Solutions
1. Infinite Recursion
Without proper termination conditions, recursive queries can run indefinitely:
-- Dangerous! No termination condition
WITH RECURSIVE infinite AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM infinite
)
SELECT * FROM infinite; -- Will run until hitting resource limits
Always include a termination condition:
WITH RECURSIVE safe AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM safe WHERE n < 100 -- Clear termination
)
SELECT * FROM safe;
2. Performance Issues with Deep Recursion
For very deep hierarchies, performance can degrade. Consider:
- Limiting recursion depth
- Using materialized paths or nested set models for truly large hierarchies
- Breaking complex queries into multiple simpler recursive queries
3. Order of Operations
The order of operations in recursive queries can affect both results and performance. Be careful with:
- WHERE clauses (apply them early)
- JOIN operations (choose the right join type)
- Aggregations (usually need to be outside the recursive CTE)
Summary
Recursive queries in PostgreSQL are a powerful tool for working with hierarchical and graph-structured data. They allow you to:
- Navigate parent-child relationships
- Build paths through complex data structures
- Generate sequences and series
- Solve problems that would be difficult or impossible with standard SQL
Key concepts to remember:
- A recursive CTE consists of a non-recursive term and a recursive term connected by UNION or UNION ALL
- The non-recursive term forms the base result set
- The recursive term refers to the CTE itself and expands the result set
- Always include a termination condition to prevent infinite recursion
- Consider performance implications for large or deeply nested data
Exercises
- Basic: Write a recursive query to generate Fibonacci numbers up to 100.
- Intermediate: Using the employees table from our example, write a query to find all employees at exactly level 2 under the CEO.
- Advanced: Extend the routes example to find all possible routes between two cities with the fewest connections, regardless of distance.
Additional Resources
- PostgreSQL Documentation on Recursive Queries
- PostgreSQL: Introduction to WITH Queries (CTEs)
- PostgreSQL Wiki: Recursive Queries
By mastering recursive queries in PostgreSQL, you'll be able to solve complex data problems with elegant, efficient SQL - a valuable skill for any database developer!
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)