Skip to main content

PostgreSQL Common Table Expressions

Introduction

Common Table Expressions (CTEs) are a powerful feature in PostgreSQL that allow you to write auxiliary statements for use in larger queries. Think of CTEs as temporary result sets or virtual tables that exist only during the execution of your query. They're defined using the WITH clause and can significantly improve the readability and maintainability of complex SQL queries.

CTEs are particularly useful when:

  • You need to reference the same subquery multiple times
  • You want to make complex queries more readable by breaking them down
  • You need to create recursive queries
  • You want to improve query performance in certain scenarios

Let's dive into how CTEs work and explore their various applications in PostgreSQL.

Basic Syntax

The basic syntax for a CTE is:

sql
WITH cte_name AS (
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
SELECT * FROM cte_name;

Here's how it works:

  1. The WITH keyword starts the CTE definition
  2. cte_name is what you'll use to reference this temporary result set
  3. Inside the parentheses, you write a valid SELECT query
  4. After closing the CTE definition, you write your main query that can reference the CTE

Simple CTE Example

Let's start with a basic example. Suppose we have a sales table with customer purchases:

sql
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
customer_id INTEGER,
product_id INTEGER,
sale_date DATE,
amount DECIMAL(10, 2)
);

INSERT INTO sales (customer_id, product_id, sale_date, amount)
VALUES
(1, 101, '2023-01-15', 150.00),
(1, 102, '2023-01-20', 75.50),
(2, 101, '2023-01-16', 150.00),
(3, 103, '2023-01-18', 200.25),
(2, 104, '2023-01-21', 50.75),
(1, 103, '2023-01-25', 200.25);

Now, let's use a CTE to find customers who have spent more than $300 in total:

sql
WITH customer_totals AS (
SELECT
customer_id,
SUM(amount) AS total_spent
FROM
sales
GROUP BY
customer_id
)
SELECT
customer_id,
total_spent
FROM
customer_totals
WHERE
total_spent > 300;

Output:

 customer_id | total_spent
-------------+-------------
1 | 425.75

In this example, we first create a CTE named customer_totals that calculates the sum of amounts for each customer. Then, in our main query, we select from this CTE and filter for customers who spent more than $300.

Multiple CTEs

You can define multiple CTEs separated by commas within a single WITH clause:

sql
WITH customer_totals AS (
SELECT
customer_id,
SUM(amount) AS total_spent
FROM
sales
GROUP BY
customer_id
),
big_spenders AS (
SELECT
customer_id,
total_spent
FROM
customer_totals
WHERE
total_spent > 200
)
SELECT
customer_id,
total_spent,
'Big Spender' AS customer_type
FROM
big_spenders;

Output:

 customer_id | total_spent | customer_type
-------------+-------------+---------------
1 | 425.75 | Big Spender
2 | 200.75 | Big Spender

CTEs Referencing Other CTEs

One CTE can reference another CTE defined earlier in the same WITH clause:

sql
WITH monthly_sales AS (
SELECT
EXTRACT(MONTH FROM sale_date) AS month,
SUM(amount) AS monthly_total
FROM
sales
GROUP BY
EXTRACT(MONTH FROM sale_date)
),
average_monthly_sales AS (
SELECT AVG(monthly_total) AS avg_monthly
FROM monthly_sales
)
SELECT
month,
monthly_total,
avg_monthly,
monthly_total - avg_monthly AS difference
FROM
monthly_sales,
average_monthly_sales
ORDER BY
month;

Output (assuming all data is from January):

 month | monthly_total | avg_monthly | difference
-------+---------------+-------------+------------
1 | 826.75 | 826.75 | 0.00

Recursive CTEs

One of the most powerful features of CTEs is their ability to be recursive. Recursive CTEs allow you to reference the CTE itself within its definition, which is excellent for working with hierarchical or graph-like data.

The syntax for a recursive CTE is:

sql
WITH RECURSIVE cte_name AS (
-- Non-recursive term (base case)
SELECT ...

UNION [ALL]

-- Recursive term (references cte_name)
SELECT ... FROM cte_name ...
)
SELECT * FROM cte_name;

Let's create an example with an employee hierarchy:

sql
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
manager_id INTEGER NULL
);

INSERT INTO employees (id, name, manager_id)
VALUES
(1, 'John (CEO)', NULL),
(2, 'Sarah', 1),
(3, 'Michael', 1),
(4, 'David', 2),
(5, 'Lisa', 2),
(6, 'Robert', 3),
(7, 'Jennifer', 3),
(8, 'Thomas', 4),
(9, 'Emma', 7);

Now, let's use a recursive CTE to find all employees under Sarah (ID 2), including those several levels down:

sql
WITH RECURSIVE employee_hierarchy AS (
-- Base case: start with Sarah (ID 2)
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE id = 2

UNION

-- Recursive case: find all employees reporting to someone in our CTE
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT
id,
LPAD('', (level - 1) * 4, ' ') || name AS employee,
level
FROM
employee_hierarchy
ORDER BY
level, id;

Output:

 id |     employee     | level
----+------------------+-------
2 | Sarah | 1
4 | David | 2
5 | Lisa | 2
8 | Thomas | 3

This example shows the power of recursive CTEs for traversing hierarchical data. The query first selects Sarah as the base case, then recursively finds all employees who report to anyone already in the results.

Real-World Applications

1. Analyzing Customer Purchase Patterns

CTEs can be useful for cohort analysis and understanding customer behavior over time:

sql
WITH first_purchase AS (
SELECT
customer_id,
MIN(sale_date) AS first_purchase_date
FROM
sales
GROUP BY
customer_id
),
purchase_frequency AS (
SELECT
s.customer_id,
COUNT(*) AS num_purchases,
MAX(sale_date) - MIN(sale_date) AS date_range
FROM
sales s
GROUP BY
s.customer_id
HAVING
COUNT(*) > 1
)
SELECT
fp.customer_id,
fp.first_purchase_date,
pf.num_purchases,
pf.date_range AS days_between_first_and_last,
CASE
WHEN pf.date_range > 0 THEN
(pf.num_purchases::FLOAT - 1) / (pf.date_range / 30)
ELSE 0
END AS monthly_purchase_rate
FROM
first_purchase fp
JOIN
purchase_frequency pf ON fp.customer_id = pf.customer_id
ORDER BY
monthly_purchase_rate DESC;

2. Filling Date Gaps with a Recursive CTE

Generating a complete date series to fill gaps in data:

sql
WITH RECURSIVE date_series AS (
-- Base case: start with the earliest date
SELECT MIN(sale_date) AS date
FROM sales

UNION ALL

-- Recursive case: add one day until we reach the end date
SELECT date + 1
FROM date_series
WHERE date < (SELECT MAX(sale_date) FROM sales)
)
SELECT
ds.date,
COALESCE(SUM(s.amount), 0) AS daily_sales
FROM
date_series ds
LEFT JOIN
sales s ON ds.date = s.sale_date
GROUP BY
ds.date
ORDER BY
ds.date;

3. Finding Median Values

PostgreSQL doesn't have a built-in median function, but you can calculate it using a CTE:

sql
WITH numbered_rows AS (
SELECT
amount,
ROW_NUMBER() OVER (ORDER BY amount) AS row_num,
COUNT(*) OVER () AS total_count
FROM
sales
)
SELECT
AVG(amount) AS median
FROM
numbered_rows
WHERE
row_num BETWEEN total_count/2.0 AND total_count/2.0 + 1
OR
(total_count % 2 = 1 AND row_num = (total_count+1)/2);

Performance Considerations

While CTEs make queries more readable, there are some performance considerations to keep in mind:

  1. Materialization: In PostgreSQL versions before 12, CTEs were always materialized, meaning they were evaluated once and the results stored in memory. This could sometimes lead to worse performance compared to using subqueries.

  2. Optimization: Starting with PostgreSQL 12, the optimizer can sometimes inline CTEs (treating them like views or subqueries) when appropriate, potentially improving performance.

  3. Explicit Materialization: If you need to force materialization for performance reasons, you can use the MATERIALIZED keyword (in PostgreSQL 12+):

sql
WITH my_cte AS MATERIALIZED (
-- complex query here
)
SELECT * FROM my_cte;
  1. Or prevent materialization:
sql
WITH my_cte AS NOT MATERIALIZED (
-- query here
)
SELECT * FROM my_cte;

Debugging Complex Queries with CTEs

CTEs are excellent tools for debugging complex queries. You can break down a complicated query into smaller parts and check the output of each part:

sql
WITH step1 AS (
-- First part of your complex query
SELECT product_id, COUNT(*) AS sales_count
FROM sales
GROUP BY product_id
),
step2 AS (
-- Second part
SELECT product_id, sales_count,
RANK() OVER (ORDER BY sales_count DESC) AS sales_rank
FROM step1
)
-- Use this to debug step1
-- SELECT * FROM step1;

-- Use this to debug step2
-- SELECT * FROM step2;

-- Final query
SELECT * FROM step2 WHERE sales_rank <= 3;

Just uncomment the SELECT statements for the steps you want to debug.

Summary

Common Table Expressions (CTEs) are a powerful feature in PostgreSQL that help you:

  • Write more readable and maintainable complex queries
  • Break down complex logic into simpler, named components
  • Create recursive queries for hierarchical data
  • Reference the same subquery multiple times without duplication
  • Debug complex queries step by step

CTEs follow a simple syntax pattern with the WITH keyword, but they enable sophisticated query capabilities that would otherwise be difficult to express. Whether you're working with hierarchical data, need to perform multi-step calculations, or simply want to make your queries more readable, CTEs are an essential tool in your PostgreSQL toolkit.

Exercises

  1. Create a CTE to find products that have been purchased by more than 2 different customers.
  2. Write a recursive CTE to generate a sequence of dates for the next 30 days.
  3. Use CTEs to rank customers by their total purchases and then find the top 20% of customers.
  4. Create a CTE to find which day of the week has the highest average sales amount.
  5. Write a query with multiple CTEs to analyze which products are often purchased together.

Additional Resources



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)