Correlated Subqueries

Introduction

When working with databases, you'll often need to write queries that depend on data from other parts of your database. Correlated subqueries are a powerful SQL technique that allows you to reference columns from the outer query within an inner query. Unlike regular subqueries that can be executed independently, correlated subqueries are dependent on the outer query and must be re-evaluated for each row processed by the outer query.

Think of correlated subqueries as a way to perform row-by-row processing in SQL, where each row in the outer query is analyzed in relation to other rows or tables.

Basic Concept

A correlated subquery is a subquery that references columns from the outer query. This creates a dependency between the inner and outer queries, causing the inner query to execute once for each row processed by the outer query.

The basic structure looks like this:

SELECT column1, column2, ...
FROM table1 outer
WHERE column1 operator (
    SELECT column1, column2, ...
    FROM table2
    WHERE outer.column = table2.column
);

Notice the reference to outer.column in the inner query - this is what makes it "correlated" with the outer query.

How Correlated Subqueries Work

Let's break down the execution process:

For each row in the outer query:

The database engine selects a row from the outer query
The inner query executes using values from the current outer query row
The result of the inner query is used to determine if the current outer row meets the conditions
The process repeats for each row in the outer table

Common Use Cases

One common use is to find records that have specific related records in another table.

For example, find all employees who manage at least one department:

SELECT employee_id, first_name, last_name
FROM employees emp
WHERE EXISTS (
    SELECT 1
    FROM departments dept
    WHERE dept.manager_id = emp.employee_id
);

2. Comparing Against Aggregates Within Groups

Correlated subqueries excel at comparing values against aggregates within groups.

For example, find employees who earn more than the average salary in their department:

SELECT employee_id, first_name, last_name, salary, department_id
FROM employees emp
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = emp.department_id
);

Correlated subqueries can be used in UPDATE statements to modify data based on relationships.

UPDATE products p
SET price = price * 1.1
WHERE EXISTS (
    SELECT 1
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    WHERE oi.product_id = p.product_id
    AND o.order_date >= '2023-01-01'
);

This query increases the price by 10% for all products that were ordered since January 1, 2023.

Real-World Example: E-commerce Database

Let's consider a practical example using an e-commerce database with tables for customers, orders, and products.

Database Schema

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100),
    signup_date DATE
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10,2)
);

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    unit_price DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Example 1: Finding VIP Customers

Let's identify customers who have placed orders with a total value higher than the average order value:

SELECT c.customer_id, c.customer_name, o.order_id, o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount > (
    SELECT AVG(total_amount)
    FROM orders
    WHERE customer_id = c.customer_id
);

Example 2: Identifying Popular Products

Find products that have been ordered more frequently than the average product in their category:

SELECT p.product_id, p.product_name, p.category,
    (SELECT COUNT(*) 
     FROM order_items oi 
     WHERE oi.product_id = p.product_id) AS order_count
FROM products p
WHERE (
    SELECT COUNT(*) 
    FROM order_items oi 
    WHERE oi.product_id = p.product_id
) > (
    SELECT AVG(product_order_count)
    FROM (
        SELECT COUNT(*) AS product_order_count
        FROM order_items oi
        JOIN products p2 ON oi.product_id = p2.product_id
        WHERE p2.category = p.category
        GROUP BY oi.product_id
    ) AS category_stats
);

Example 3: Finding Customers with No Recent Orders

Identify customers who haven't placed an order in the last 90 days:

SELECT customer_id, customer_name, email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date > CURRENT_DATE - INTERVAL '90' DAY
);

Performance Considerations

Correlated subqueries can be powerful but may impact performance since the inner query runs once for each row processed by the outer query. Consider these best practices:

Use indexes on the columns referenced in the join conditions between the outer and inner queries
Consider alternatives like JOINs or window functions when possible
Limit the scope of the outer query if you only need to process a subset of rows
Test performance with EXPLAIN PLAN to understand how the database executes your query

Here's a comparison between a correlated subquery and an equivalent JOIN:

-- Correlated subquery
SELECT e.employee_id, e.first_name, e.last_name
FROM employees e
WHERE EXISTS (
    SELECT 1
    FROM departments d
    WHERE d.manager_id = e.employee_id
);

-- Equivalent JOIN
SELECT DISTINCT e.employee_id, e.first_name, e.last_name
FROM employees e
JOIN departments d ON d.manager_id = e.employee_id;

Both queries return the same result, but the JOIN approach might perform better in many database systems.

Common Operators Used with Correlated Subqueries

Several operators are commonly used with correlated subqueries:

EXISTS / NOT EXISTS: Checks if the subquery returns any rows
IN / NOT IN: Compares a value to a set of values returned by the subquery
ANY / SOME: Returns true if any of the subquery values meet the condition
ALL: Returns true if all of the subquery values meet the condition
Comparison operators (=, >, <, etc.): When the subquery returns a single value

Example with EXISTS:

SELECT product_name
FROM products p
WHERE EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
    AND oi.quantity > 10
);

This returns all products that have been ordered in quantities greater than 10 at least once.

Summary

Correlated subqueries are a powerful SQL feature that allows you to:

Reference outer query columns within an inner query
Perform row-by-row analysis and comparisons
Solve complex data problems that require contextual analysis
Filter, update, or delete data based on related information

While they can impact performance compared to other SQL constructs, they provide a flexible way to express complex logic and are essential for certain types of queries that would be difficult to write otherwise.

Exercises

To practice your understanding of correlated subqueries, try these exercises:

Write a query to find all products that have never been ordered.
Find customers who have placed orders for all products in a specific category.
Identify departments where all employees earn above the company's average salary.
Find the most expensive product in each category.
Identify customers who have spent more in the current year than they did in the previous year.

Additional Resources

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Basic Concept​

How Correlated Subqueries Work​

Common Use Cases​

1. Finding Records with Related Data​

2. Comparing Against Aggregates Within Groups​

3. Updating Records Based on Related Data​

Real-World Example: E-commerce Database​

Database Schema​

Example 1: Finding VIP Customers​

Example 2: Identifying Popular Products​

Example 3: Finding Customers with No Recent Orders​

Performance Considerations​

Common Operators Used with Correlated Subqueries​

Example with EXISTS:​

Summary​

Exercises​

Additional Resources​