Common Table Expressions
Introduction
Common Table Expressions (CTEs) are a powerful feature in SQL that allow you to write complex queries in a more readable and maintainable way. Think of them as temporary result sets or named subqueries that exist only for the duration of a query. CTEs are defined using the WITH
clause and can significantly improve how you structure and organize your SQL code.
If you've ever found yourself struggling with deeply nested subqueries or repeating the same subquery multiple times, CTEs offer an elegant solution to these problems.
Basic Syntax
The basic syntax of a CTE looks like this:
WITH cte_name AS (
-- Your SELECT statement goes here
)
SELECT * FROM cte_name;
Let's break down the components:
- The
WITH
keyword signals the start of a CTE definition cte_name
is the name you assign to your CTE- Inside the parentheses, you write a SELECT statement that defines your temporary result set
- After the CTE definition, you write your main query that can reference the CTE by name
Simple CTE Example
Let's start with a basic example. Imagine we have an employees
table and we want to find employees who earn above the average salary:
-- Without CTE (using subquery)
SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- The same query using a CTE
WITH AverageSalary AS (
SELECT AVG(salary) AS avg_salary
FROM employees
)
SELECT e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e, AverageSalary a
WHERE e.salary > a.avg_salary;
Both queries produce the same result, but the CTE version is more readable and explicitly names the concept we're working with (the average salary).
Multiple CTEs
You can define multiple CTEs in a single query by separating them with commas:
WITH
HighSalaryEmployees AS (
SELECT * FROM employees WHERE salary > 75000
),
MarketingDepartment AS (
SELECT * FROM departments WHERE department_name = 'Marketing'
)
SELECT e.*
FROM HighSalaryEmployees e
JOIN MarketingDepartment d ON e.department_id = d.department_id;
This query finds high-earning employees in the Marketing department by combining two CTEs.
CTEs vs. Subqueries
While CTEs and subqueries can often accomplish the same tasks, CTEs offer several advantages:
- Readability: CTEs make complex queries more readable by breaking them into named, logical parts
- Reusability: A CTE can be referenced multiple times in the main query without rewriting the subquery
- Modularity: CTEs help you build complex queries step by step
- Recursive capabilities: CTEs can be recursive, which allows them to solve problems that can't be addressed with regular subqueries
Let's see an example of reusing a CTE multiple times:
WITH ProductStats AS (
SELECT
product_id,
COUNT(*) AS order_count,
SUM(quantity) AS total_quantity,
AVG(unit_price) AS avg_price
FROM order_details
GROUP BY product_id
)
SELECT
p.product_name,
ps.order_count,
ps.total_quantity,
ps.avg_price
FROM products p
JOIN ProductStats ps ON p.product_id = ps.product_id
WHERE ps.order_count > 10
OR ps.total_quantity > 100
OR ps.avg_price > 50;
By defining the ProductStats
CTE once, we can reference it multiple times in our WHERE conditions without having to rewrite the aggregation logic.
Recursive CTEs
One of the most powerful features of CTEs is their ability to be recursive. A recursive CTE references itself, allowing you to work with hierarchical or graph-like data structures, such as organizational charts, bill of materials, or category trees.
The syntax for a recursive CTE includes:
- An anchor member (the non-recursive part)
- A recursive member (the part that references the CTE itself)
- A UNION ALL operator connecting the two parts
Here's an example of using a recursive CTE to traverse an employee hierarchy:
WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor member: select the CEO (employee with no manager)
SELECT employee_id, first_name, last_name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: select all direct reports
SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, eh.level + 1
FROM employees e
JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT
employee_id,
first_name,
last_name,
level,
REPEAT(' ', level - 1) || first_name || ' ' || last_name AS hierarchy
FROM EmployeeHierarchy
ORDER BY level, first_name;
This query starts with the CEO (the employee with no manager) and recursively finds all employees who report to already-found employees, building a complete organizational hierarchy.
Output example:
employee_id | first_name | last_name | level | hierarchy
------------+------------+-----------+-------+------------------
1 | John | Smith | 1 | John Smith
3 | Alice | Johnson | 2 | Alice Johnson
5 | David | Brown | 2 | David Brown
8 | Emma | Davis | 3 | Emma Davis
12 | Michael | Wilson | 3 | Michael Wilson
15 | Sophia | Moore | 4 | Sophia Moore
Note: The RECURSIVE
keyword is required in some database systems like PostgreSQL and SQLite, while in others like SQL Server, it's implied and not needed.
Real-World Applications
CTEs are extremely useful in many practical scenarios. Here are some common applications:
1. Data Analysis and Reporting
CTEs can help structure complex analytical queries by breaking them into logical steps:
WITH
MonthlySales AS (
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
SUM(amount) AS total_sales
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
),
MonthlyAverage AS (
SELECT
year,
AVG(total_sales) AS avg_monthly_sales
FROM MonthlySales
GROUP BY year
)
SELECT
ms.year,
ms.month,
ms.total_sales,
ma.avg_monthly_sales,
(ms.total_sales - ma.avg_monthly_sales) / ma.avg_monthly_sales * 100 AS percent_diff
FROM MonthlySales ms
JOIN MonthlyAverage ma ON ms.year = ma.year
ORDER BY ms.year, ms.month;
This query calculates monthly sales, yearly average sales, and the percentage difference between each month's sales and the yearly average.
2. Finding Gaps in Sequences
CTEs can help identify gaps in sequential data, like missing dates or IDs:
WITH
DateSequence AS (
SELECT generate_series(
'2023-01-01'::date,
'2023-01-31'::date,
'1 day'::interval
) AS date
),
SalesData AS (
SELECT DISTINCT sale_date
FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
)
SELECT ds.date
FROM DateSequence ds
LEFT JOIN SalesData sd ON ds.date = sd.sale_date
WHERE sd.sale_date IS NULL
ORDER BY ds.date;
This query finds all dates in January 2023 that don't have any sales records.
3. Running Totals and Moving Averages
CTEs can be used to calculate running totals and moving averages:
WITH DailySales AS (
SELECT
sale_date,
SUM(amount) AS daily_total
FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY sale_date
),
MovingAverage AS (
SELECT
sale_date,
daily_total,
AVG(daily_total) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_avg
FROM DailySales
)
SELECT * FROM MovingAverage
ORDER BY sale_date;
This query calculates the 7-day moving average of daily sales.
Performance Considerations
While CTEs improve query readability and maintainability, they may have performance implications:
- CTEs are typically materialized, meaning the database might create temporary results for each CTE
- In some database systems, the optimizer may not be able to push predicates into the CTE
- For complex queries, it's always a good idea to compare the execution plan of a CTE-based query with an equivalent subquery-based one
Most modern database systems are intelligent enough to optimize CTE usage, but be cautious with very large datasets or highly complex queries.
Common Table Expressions Across Database Systems
CTEs are supported by most major database systems, but there are slight syntax variations:
Database System | Support | Notes |
---|---|---|
PostgreSQL | Full | Requires RECURSIVE keyword for recursive CTEs |
MySQL (8.0+) | Full | Added support for recursive CTEs in version 8.0 |
SQL Server | Full | RECURSIVE keyword is optional |
Oracle | Full | Uses WITH clause; recursive CTEs supported since Oracle 11g R2 |
SQLite (3.8.3+) | Full | Requires RECURSIVE keyword for recursive CTEs |
Summary
Common Table Expressions (CTEs) are a powerful SQL feature that improve query readability, modularity, and maintainability. They allow you to:
- Write complex queries in a more structured and logical way
- Break down complicated logic into named, understandable parts
- Reuse subquery logic without duplicating code
- Handle recursive hierarchical data structures
Whether you're writing analytical queries, generating reports, or working with hierarchical data, CTEs can help you write cleaner, more maintainable SQL code.
Practice Exercises
To solidify your understanding of CTEs, try these exercises:
- Write a CTE to find the top 5 customers by total purchase amount
- Create a recursive CTE to generate a sequence of dates for the next 30 days
- Use multiple CTEs to analyze sales data by both product category and customer region
- Write a recursive CTE to find all subordinates of a specific manager in an employee hierarchy
- Create a CTE to calculate year-over-year growth percentages for monthly sales
Additional Resources
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)