Skip to main content

Common Table Expressions

Introduction

Common Table Expressions (CTEs) are a powerful feature in SQL that allow you to write complex queries in a more readable and maintainable way. Think of them as temporary result sets or named subqueries that exist only for the duration of a query. CTEs are defined using the WITH clause and can significantly improve how you structure and organize your SQL code.

If you've ever found yourself struggling with deeply nested subqueries or repeating the same subquery multiple times, CTEs offer an elegant solution to these problems.

Basic Syntax

The basic syntax of a CTE looks like this:

sql
WITH cte_name AS (
-- Your SELECT statement goes here
)
SELECT * FROM cte_name;

Let's break down the components:

  1. The WITH keyword signals the start of a CTE definition
  2. cte_name is the name you assign to your CTE
  3. Inside the parentheses, you write a SELECT statement that defines your temporary result set
  4. After the CTE definition, you write your main query that can reference the CTE by name

Simple CTE Example

Let's start with a basic example. Imagine we have an employees table and we want to find employees who earn above the average salary:

sql
-- Without CTE (using subquery)
SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- The same query using a CTE
WITH AverageSalary AS (
SELECT AVG(salary) AS avg_salary
FROM employees
)
SELECT e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e, AverageSalary a
WHERE e.salary > a.avg_salary;

Both queries produce the same result, but the CTE version is more readable and explicitly names the concept we're working with (the average salary).

Multiple CTEs

You can define multiple CTEs in a single query by separating them with commas:

sql
WITH 
HighSalaryEmployees AS (
SELECT * FROM employees WHERE salary > 75000
),
MarketingDepartment AS (
SELECT * FROM departments WHERE department_name = 'Marketing'
)
SELECT e.*
FROM HighSalaryEmployees e
JOIN MarketingDepartment d ON e.department_id = d.department_id;

This query finds high-earning employees in the Marketing department by combining two CTEs.

CTEs vs. Subqueries

While CTEs and subqueries can often accomplish the same tasks, CTEs offer several advantages:

  1. Readability: CTEs make complex queries more readable by breaking them into named, logical parts
  2. Reusability: A CTE can be referenced multiple times in the main query without rewriting the subquery
  3. Modularity: CTEs help you build complex queries step by step
  4. Recursive capabilities: CTEs can be recursive, which allows them to solve problems that can't be addressed with regular subqueries

Let's see an example of reusing a CTE multiple times:

sql
WITH ProductStats AS (
SELECT
product_id,
COUNT(*) AS order_count,
SUM(quantity) AS total_quantity,
AVG(unit_price) AS avg_price
FROM order_details
GROUP BY product_id
)
SELECT
p.product_name,
ps.order_count,
ps.total_quantity,
ps.avg_price
FROM products p
JOIN ProductStats ps ON p.product_id = ps.product_id
WHERE ps.order_count > 10
OR ps.total_quantity > 100
OR ps.avg_price > 50;

By defining the ProductStats CTE once, we can reference it multiple times in our WHERE conditions without having to rewrite the aggregation logic.

Recursive CTEs

One of the most powerful features of CTEs is their ability to be recursive. A recursive CTE references itself, allowing you to work with hierarchical or graph-like data structures, such as organizational charts, bill of materials, or category trees.

The syntax for a recursive CTE includes:

  1. An anchor member (the non-recursive part)
  2. A recursive member (the part that references the CTE itself)
  3. A UNION ALL operator connecting the two parts

Here's an example of using a recursive CTE to traverse an employee hierarchy:

sql
WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor member: select the CEO (employee with no manager)
SELECT employee_id, first_name, last_name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL

UNION ALL

-- Recursive member: select all direct reports
SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, eh.level + 1
FROM employees e
JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT
employee_id,
first_name,
last_name,
level,
REPEAT(' ', level - 1) || first_name || ' ' || last_name AS hierarchy
FROM EmployeeHierarchy
ORDER BY level, first_name;

This query starts with the CEO (the employee with no manager) and recursively finds all employees who report to already-found employees, building a complete organizational hierarchy.

Output example:

employee_id | first_name | last_name | level | hierarchy
------------+------------+-----------+-------+------------------
1 | John | Smith | 1 | John Smith
3 | Alice | Johnson | 2 | Alice Johnson
5 | David | Brown | 2 | David Brown
8 | Emma | Davis | 3 | Emma Davis
12 | Michael | Wilson | 3 | Michael Wilson
15 | Sophia | Moore | 4 | Sophia Moore

Note: The RECURSIVE keyword is required in some database systems like PostgreSQL and SQLite, while in others like SQL Server, it's implied and not needed.

Real-World Applications

CTEs are extremely useful in many practical scenarios. Here are some common applications:

1. Data Analysis and Reporting

CTEs can help structure complex analytical queries by breaking them into logical steps:

sql
WITH 
MonthlySales AS (
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
SUM(amount) AS total_sales
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
),
MonthlyAverage AS (
SELECT
year,
AVG(total_sales) AS avg_monthly_sales
FROM MonthlySales
GROUP BY year
)
SELECT
ms.year,
ms.month,
ms.total_sales,
ma.avg_monthly_sales,
(ms.total_sales - ma.avg_monthly_sales) / ma.avg_monthly_sales * 100 AS percent_diff
FROM MonthlySales ms
JOIN MonthlyAverage ma ON ms.year = ma.year
ORDER BY ms.year, ms.month;

This query calculates monthly sales, yearly average sales, and the percentage difference between each month's sales and the yearly average.

2. Finding Gaps in Sequences

CTEs can help identify gaps in sequential data, like missing dates or IDs:

sql
WITH 
DateSequence AS (
SELECT generate_series(
'2023-01-01'::date,
'2023-01-31'::date,
'1 day'::interval
) AS date
),
SalesData AS (
SELECT DISTINCT sale_date
FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
)
SELECT ds.date
FROM DateSequence ds
LEFT JOIN SalesData sd ON ds.date = sd.sale_date
WHERE sd.sale_date IS NULL
ORDER BY ds.date;

This query finds all dates in January 2023 that don't have any sales records.

3. Running Totals and Moving Averages

CTEs can be used to calculate running totals and moving averages:

sql
WITH DailySales AS (
SELECT
sale_date,
SUM(amount) AS daily_total
FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY sale_date
),
MovingAverage AS (
SELECT
sale_date,
daily_total,
AVG(daily_total) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_avg
FROM DailySales
)
SELECT * FROM MovingAverage
ORDER BY sale_date;

This query calculates the 7-day moving average of daily sales.

Performance Considerations

While CTEs improve query readability and maintainability, they may have performance implications:

  1. CTEs are typically materialized, meaning the database might create temporary results for each CTE
  2. In some database systems, the optimizer may not be able to push predicates into the CTE
  3. For complex queries, it's always a good idea to compare the execution plan of a CTE-based query with an equivalent subquery-based one

Most modern database systems are intelligent enough to optimize CTE usage, but be cautious with very large datasets or highly complex queries.

Common Table Expressions Across Database Systems

CTEs are supported by most major database systems, but there are slight syntax variations:

Database SystemSupportNotes
PostgreSQLFullRequires RECURSIVE keyword for recursive CTEs
MySQL (8.0+)FullAdded support for recursive CTEs in version 8.0
SQL ServerFullRECURSIVE keyword is optional
OracleFullUses WITH clause; recursive CTEs supported since Oracle 11g R2
SQLite (3.8.3+)FullRequires RECURSIVE keyword for recursive CTEs

Summary

Common Table Expressions (CTEs) are a powerful SQL feature that improve query readability, modularity, and maintainability. They allow you to:

  • Write complex queries in a more structured and logical way
  • Break down complicated logic into named, understandable parts
  • Reuse subquery logic without duplicating code
  • Handle recursive hierarchical data structures

Whether you're writing analytical queries, generating reports, or working with hierarchical data, CTEs can help you write cleaner, more maintainable SQL code.

Practice Exercises

To solidify your understanding of CTEs, try these exercises:

  1. Write a CTE to find the top 5 customers by total purchase amount
  2. Create a recursive CTE to generate a sequence of dates for the next 30 days
  3. Use multiple CTEs to analyze sales data by both product category and customer region
  4. Write a recursive CTE to find all subordinates of a specific manager in an employee hierarchy
  5. Create a CTE to calculate year-over-year growth percentages for monthly sales

Additional Resources



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)