MySQL Common Table Expressions
Common Table Expressions (CTEs) are one of the most powerful features in MySQL that can help you write cleaner, more maintainable SQL code. Introduced in MySQL 8.0, CTEs allow you to define temporary result sets that exist just for the duration of a query, making complex queries more readable and easier to understand.
What Are Common Table Expressions?
A Common Table Expression (CTE) is a named temporary result set that exists within the scope of a single statement and can be referenced within that statement. Think of a CTE as a temporary view that exists only during the execution of your query.
CTEs are defined using the WITH
clause and have the following general syntax:
WITH cte_name AS (
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
SELECT * FROM cte_name;
Why Use CTEs?
Before diving into examples, let's understand the key benefits of using CTEs:
- Improved Readability: CTEs make complex queries more readable by breaking them into logical, named components
- Code Maintainability: Easier to maintain and modify queries as components are clearly separated
- Reusability: A CTE can be referenced multiple times within a query
- Recursive Capabilities: Support for recursive queries (hierarchical data)
- Alternative to Subqueries: Often more readable than nested subqueries
Basic CTE Example
Let's start with a simple example. Imagine we have an employees
table and want to find all employees earning above the average salary:
-- Without CTE
SELECT emp_id, first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- With CTE
WITH AvgSalaryCTE AS (
SELECT AVG(salary) as avg_salary
FROM employees
)
SELECT e.emp_id, e.first_name, e.last_name, e.salary
FROM employees e, AvgSalaryCTE
WHERE e.salary > AvgSalaryCTE.avg_salary;
While this simple example might not show a huge advantage, the benefits become more apparent in complex queries.
Multiple CTEs
You can define multiple CTEs in a single query by separating them with commas:
WITH
DepartmentSalaries AS (
SELECT dept_id, AVG(salary) as avg_dept_salary
FROM employees
GROUP BY dept_id
),
TopDepartments AS (
SELECT dept_id
FROM DepartmentSalaries
ORDER BY avg_dept_salary DESC
LIMIT 3
)
SELECT d.dept_name, ds.avg_dept_salary
FROM departments d
JOIN DepartmentSalaries ds ON d.dept_id = ds.dept_id
WHERE d.dept_id IN (SELECT dept_id FROM TopDepartments);
This query identifies the top 3 departments with the highest average salaries and returns their names and average salaries.
Using CTEs for Data Analysis
CTEs are particularly useful for data analysis tasks. Here's an example of using a CTE to calculate the percentage of sales by product category:
WITH CategorySales AS (
SELECT
c.category_name,
SUM(o.quantity * o.unit_price) as total_sales
FROM order_details o
JOIN products p ON o.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
GROUP BY c.category_name
),
TotalSales AS (
SELECT SUM(total_sales) as grand_total
FROM CategorySales
)
SELECT
cs.category_name,
cs.total_sales,
(cs.total_sales / ts.grand_total * 100) as percentage
FROM CategorySales cs, TotalSales ts
ORDER BY percentage DESC;
This query calculates the total sales by product category and then computes the percentage each category contributes to the overall sales.
Recursive CTEs
One of the most powerful features of CTEs is their ability to reference themselves, creating recursive queries. This is particularly useful for hierarchical data like organizational structures, bill of materials, or category trees.
Here's an example of a recursive CTE that traverses an employee hierarchy:
WITH RECURSIVE EmployeeHierarchy AS (
-- Base case: start with CEO (manager_id is NULL)
SELECT emp_id, first_name, last_name, manager_id, 0 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive part: join with employees who have managers
SELECT e.emp_id, e.first_name, e.last_name, e.manager_id, eh.level + 1
FROM employees e
JOIN EmployeeHierarchy eh ON e.manager_id = eh.emp_id
)
SELECT
emp_id,
first_name,
last_name,
level,
CONCAT(REPEAT(' ', level), first_name, ' ', last_name) as hierarchy
FROM EmployeeHierarchy
ORDER BY level, first_name;
This query starts with the CEO (assuming they have a NULL
manager_id) and recursively finds all employees that report up through the hierarchy, displaying their level in the organization and a formatted hierarchical view.
The output might look something like:
emp_id first_name last_name level hierarchy
1 Sarah Johnson 0 Sarah Johnson
2 Michael Smith 1 Michael Smith
5 Emily Brown 1 Emily Brown
3 David Wilson 2 David Wilson
6 Jessica Davis 2 Jessica Davis
4 Jennifer Taylor 3 Jennifer Taylor
CTEs for Simplifying Complex Joins
CTEs can make complex join operations more readable by breaking them down:
WITH CustomerOrders AS (
SELECT
c.customer_id,
c.customer_name,
COUNT(o.order_id) as order_count,
SUM(o.total_amount) as total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
),
CustomerCategories AS (
SELECT
customer_id,
customer_name,
order_count,
total_spent,
CASE
WHEN total_spent > 10000 THEN 'Platinum'
WHEN total_spent > 5000 THEN 'Gold'
WHEN total_spent > 1000 THEN 'Silver'
ELSE 'Bronze'
END as customer_category
FROM CustomerOrders
)
SELECT
cc.customer_name,
cc.order_count,
cc.total_spent,
cc.customer_category,
COUNT(*) OVER (PARTITION BY cc.customer_category) as customers_in_category
FROM CustomerCategories cc
ORDER BY cc.total_spent DESC;
This query first aggregates customer order data, then categorizes customers based on their spending, and finally adds information about how many customers are in each category.
Practical Example: Sales Analysis
Let's look at a practical example where CTEs can help perform a complex sales analysis:
WITH MonthlySales AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') as month,
SUM(total_amount) as monthly_revenue
FROM orders
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH)
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
),
SalesGrowth AS (
SELECT
current.month,
current.monthly_revenue,
prev.monthly_revenue as prev_month_revenue,
(current.monthly_revenue - prev.monthly_revenue) / prev.monthly_revenue * 100 as growth_percent
FROM MonthlySales current
LEFT JOIN MonthlySales prev ON current.month = DATE_FORMAT(DATE_ADD(STR_TO_DATE(prev.month, '%Y-%m'), INTERVAL 1 MONTH), '%Y-%m')
)
SELECT
month,
monthly_revenue,
prev_month_revenue,
ROUND(growth_percent, 2) as growth_percent,
CASE
WHEN growth_percent > 10 THEN 'Strong Growth'
WHEN growth_percent BETWEEN 0 AND 10 THEN 'Moderate Growth'
WHEN growth_percent BETWEEN -10 AND 0 THEN 'Slight Decline'
ELSE 'Significant Decline'
END as performance_category
FROM SalesGrowth
ORDER BY month;
This analysis:
- Calculates monthly sales revenue for the past 12 months
- Compares each month to the previous month to calculate growth percentage
- Categorizes monthly performance based on growth rates
CTE vs. Subqueries or Temporary Tables
To understand why CTEs are valuable, let's compare them with alternatives:
CTE vs. Subquery
-- Using a subquery
SELECT
p.product_name,
p.unit_price,
(SELECT AVG(unit_price) FROM products) as avg_price
FROM products p
WHERE p.unit_price > (SELECT AVG(unit_price) FROM products);
-- Using a CTE
WITH AvgPrice AS (
SELECT AVG(unit_price) as avg_price
FROM products
)
SELECT
p.product_name,
p.unit_price,
ap.avg_price
FROM products p, AvgPrice ap
WHERE p.unit_price > ap.avg_price;
The CTE version avoids calculating the average price twice and provides a named reference that makes the query intent clearer.
CTE vs. Temporary Table
-- Using a temporary table
CREATE TEMPORARY TABLE top_customers AS
SELECT customer_id, SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 100;
SELECT c.customer_name, tc.total_spent
FROM customers c
JOIN top_customers tc ON c.customer_id = tc.customer_id;
DROP TEMPORARY TABLE top_customers;
-- Using a CTE
WITH top_customers AS (
SELECT customer_id, SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 100
)
SELECT c.customer_name, tc.total_spent
FROM customers c
JOIN top_customers tc ON c.customer_id = tc.customer_id;
The CTE version eliminates the need for creating and dropping temporary tables, making the query more concise and self-contained.
Best Practices for Using CTEs
- Use Meaningful Names: Choose clear, descriptive names for your CTEs
- Keep CTEs Focused: Each CTE should have a single logical purpose
- Consider Performance: CTEs are not always optimized the same way as subqueries
- Limit Recursion Depth: For recursive CTEs, consider adding a depth limit to prevent infinite recursion
- Documentation: Add comments to explain complex CTEs
- Test with Real Data: CTEs may perform differently with various data volumes
CTE Limitations in MySQL
While CTEs are powerful, they have some limitations in MySQL:
- CTEs exist only for the duration of the query
- You cannot create indexes on CTEs
- Recursive CTEs have a default recursion limit (set by
cte_max_recursion_depth
, default is 1000) - MySQL's query optimizer may not handle all CTE patterns optimally
Summary
Common Table Expressions (CTEs) are a powerful feature in MySQL that allow you to write more readable, maintainable, and expressive SQL queries. They are particularly valuable for:
- Breaking down complex queries into manageable, named components
- Recursive querying for hierarchical data
- Simplifying queries with multiple references to the same derived table
- Making complex analytical queries more readable
By mastering CTEs, you'll be able to write cleaner SQL code and tackle more complex data problems with confidence.
Exercises
To practice using CTEs, try these exercises:
- Write a CTE that finds customers who have placed more than the average number of orders.
- Create a recursive CTE to generate a sequence of dates for the next 30 days from today.
- Use CTEs to find products that have been sold in all regions.
- Write a query with multiple CTEs to analyze which day of the week has the highest average order value.
- Create a recursive CTE to display a category hierarchy (assuming categories can have subcategories).
Additional Resources
- MySQL Official Documentation on CTEs
- MySQL Recursive CTE Examples
- "SQL Cookbook" by Anthony Molinaro (contains many patterns that can be implemented using CTEs)
- "SQL Performance Explained" by Markus Winand (discusses optimization aspects of different SQL constructs including CTEs)
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)