Subqueries
Introduction
Subqueries (also known as nested queries or inner queries) are a powerful SQL feature that allows you to use the results of one query within another query. They enable you to break down complex problems into simpler steps and often provide alternative ways to perform operations that would otherwise require complex joins or unions.
Think of subqueries as queries within queries - like nesting one set of parentheses inside another in a mathematical expression. They can help you retrieve data that would be difficult or impossible to get with a single, simple query.
What Are Subqueries?
A subquery is a SELECT statement that is nested within another SQL statement (which could be a SELECT, INSERT, UPDATE, or DELETE). The inner query executes first and its results are used by the outer query.
SELECT column_name
FROM table_name
WHERE column_name operator (SELECT column_name FROM table_name WHERE condition);
In this structure:
- The query inside the parentheses is the subquery (inner query)
- The query containing the subquery is the main query (outer query)
Types of Subqueries
Subqueries can be categorized based on their relationship with the main query:
1. Single-Row Subqueries
These return exactly one row and one column. They're typically used with single-row comparison operators like =
, >
, <
, >=
, <=
, <>
.
2. Multiple-Row Subqueries
These return multiple rows but only one column. They're used with multiple-row operators like IN
, ANY
, ALL
.
3. Correlated Subqueries
These reference columns from the outer query. The subquery is executed once for each row processed by the outer query.
4. Scalar Subqueries
These return a single value (one row, one column) and can be used almost anywhere in SQL where a single value is expected.
Basic Subquery Examples
Example 1: Finding employees who earn more than average
-- Find employees who earn more than the average salary
SELECT first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
Input: The employees table with salary information Output:
first_name | last_name | salary
------------|------------|--------
John | Smith | 75000
Sarah | Johnson | 82000
Michael | Williams | 78000
In this example:
- The subquery
(SELECT AVG(salary) FROM employees)
calculates the average salary - The main query selects employees whose salary is greater than this average
Example 2: Using IN with a subquery
-- Find all orders placed by customers in New York
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id IN (
SELECT customer_id
FROM customers
WHERE city = 'New York'
);
Input: Orders and customers tables Output:
order_id | order_date | total_amount
---------|------------|-------------
1001 | 2023-05-15 | 250.99
1008 | 2023-05-18 | 175.50
1015 | 2023-05-22 | 99.95
This example:
- Uses a subquery to get all customer IDs from New York
- Finds all orders placed by these customers
Subqueries in Different Clauses
Subqueries can be used in various parts of SQL statements:
In the SELECT Clause
-- Display each department with its average salary
SELECT
department_name,
(SELECT AVG(salary) FROM employees WHERE employees.department_id = departments.department_id) AS avg_salary
FROM departments;
Input: Departments and employees tables Output:
department_name | avg_salary
----------------|------------
Engineering | 85000.00
Marketing | 67500.00
Finance | 72000.00
Human Resources | 58000.00
In the FROM Clause
When used in the FROM clause, a subquery creates a derived table (also called an inline view):
-- Count the number of employees in each salary band
SELECT salary_band, COUNT(*) as employee_count
FROM (
SELECT
CASE
WHEN salary < 50000 THEN 'Low'
WHEN salary BETWEEN 50000 AND 80000 THEN 'Medium'
ELSE 'High'
END AS salary_band
FROM employees
) AS salary_bands
GROUP BY salary_band;
Input: Employees table with various salaries Output:
salary_band | employee_count
------------|---------------
Low | 12
Medium | 24
High | 8
In the WHERE Clause
This is the most common use of subqueries:
-- Find products that have never been ordered
SELECT product_name
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_details
);
Input: Products and order_details tables Output:
product_name
--------------
Wireless Mouse
Bluetooth Speaker
USB-C Hub
In the HAVING Clause
-- Find departments with average salary higher than the company average
SELECT
department_id,
AVG(salary) as dept_avg_salary
FROM
employees
GROUP BY
department_id
HAVING
AVG(salary) > (SELECT AVG(salary) FROM employees);
Input: Employees table with department and salary information Output:
department_id | dept_avg_salary
--------------|----------------
1 | 85000.00
3 | 72000.00
Correlated Subqueries
A correlated subquery is a subquery that references columns from the outer query. It's executed once for each row processed by the outer query.
-- Find employees who earn more than the average salary in their department
SELECT e1.employee_id, e1.first_name, e1.last_name, e1.salary, e1.department_id
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
Input: Employees table with department and salary information Output:
employee_id | first_name | last_name | salary | department_id
------------|------------|-----------|---------|---------------
101 | John | Smith | 85000 | 1
105 | Sarah | Johnson | 75000 | 2
110 | Michael | Williams | 90000 | 3
How this works:
- For each row in the outer query, the subquery is executed
- The subquery calculates the average salary for that employee's department
- The outer query compares the employee's salary to this average
Exists and Not Exists
The EXISTS
operator is used to check whether a subquery returns any rows.
-- Find customers who have placed at least one order
SELECT customer_id, first_name, last_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
Input: Customers and orders tables Output:
customer_id | first_name | last_name
------------|------------|----------
1 | Alice | Johnson
3 | Bob | Smith
4 | Carol | Williams
Similarly, NOT EXISTS
can be used to find records that don't have related records:
-- Find customers who haven't placed any orders
SELECT customer_id, first_name, last_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
Subqueries with ANY, ALL, and SOME
These operators are used with subqueries that return multiple values:
ANY
Returns TRUE if any of the subquery values meet the condition.
-- Find products that cost more than any product in the 'Electronics' category
SELECT product_name, price
FROM products
WHERE price > ANY (
SELECT price
FROM products
WHERE category = 'Electronics'
);
ALL
Returns TRUE only if all the subquery values meet the condition.
-- Find products that cost more than all products in the 'Books' category
SELECT product_name, price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category = 'Books'
);
Practical Examples
Example 1: Finding duplicates in a table
-- Find duplicate email addresses in the customers table
SELECT email, COUNT(email) as count
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;
Input: Customers table with email addresses Output:
email | count
----------------------|------
[email protected] | 2
[email protected] | 3
Example 2: Finding the second highest salary
-- Find the employee with the second highest salary
SELECT first_name, last_name, salary
FROM employees
WHERE salary = (
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
);
Input: Employees table with salary information Output:
first_name | last_name | salary
-----------|-----------|-------
Sarah | Johnson | 95000
Example 3: Comparing current data with historical data
-- Find products whose price has increased since last month
SELECT
p.product_id,
p.product_name,
p.current_price,
(SELECT price FROM price_history ph
WHERE ph.product_id = p.product_id
AND month = 'April'
AND year = 2023) as last_month_price
FROM
products p
WHERE
p.current_price > (
SELECT price
FROM price_history ph
WHERE ph.product_id = p.product_id
AND month = 'April'
AND year = 2023
);
Input: Products table and price_history table Output:
product_id | product_name | current_price | last_month_price
-----------|-------------------|---------------|----------------
101 | Wireless Keyboard | 59.99 | 49.99
105 | USB-C Cable | 15.99 | 12.99
110 | Bluetooth Speaker | 79.99 | 69.99
Common Subquery Patterns
Filtering with Multiple Conditions
-- Find customers who have placed orders over $1000 and have a premium membership
SELECT customer_id, first_name, last_name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE total_amount > 1000
)
AND customer_id IN (
SELECT customer_id
FROM memberships
WHERE type = 'Premium'
);
Finding Records in One Table But Not Another
-- Find products that exist in inventory but haven't been ordered yet
SELECT product_id, product_name
FROM products
WHERE product_id IN (
SELECT product_id
FROM inventory
)
AND product_id NOT IN (
SELECT product_id
FROM order_details
);
Top N Per Group
-- Find the top 3 highest paid employees in each department
SELECT *
FROM (
SELECT
department_id,
employee_id,
first_name,
last_name,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as salary_rank
FROM
employees
) ranked_employees
WHERE salary_rank <= 3;
Performance Considerations
While subqueries are powerful, they can impact performance if not used carefully:
-
Execution Order: Remember that subqueries usually execute first, and their results are used by the outer query.
-
Alternatives to Subqueries: In many cases, joins can be more efficient than subqueries. Consider:
-- Using a subquery (might be less efficient)
SELECT first_name, last_name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location_id = 1700
);
-- Using a join (often more efficient)
SELECT e.first_name, e.last_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE d.location_id = 1700;
-
Correlated Subqueries: These can be especially performance-intensive as they run once for each row in the outer query.
-
Indexing: Ensure that columns used in join conditions and WHERE clauses are properly indexed.
Summary
Subqueries are a powerful SQL feature that allow you to:
- Break down complex queries into simpler, more manageable parts
- Perform operations that would be difficult or impossible with simple queries
- Create more readable and maintainable SQL code
They can be used in SELECT, FROM, WHERE, and HAVING clauses and can return single values, single columns with multiple rows, or even entire result sets.
While powerful, subqueries should be used thoughtfully, as they can impact performance. In many cases, joins might provide a more efficient alternative.
Practice Exercises
-
Write a query to find employees who earn more than their manager.
-
Find all customers who have placed orders on all days when the total daily sales exceeded $10,000.
-
Write a query to find departments that have a higher average salary than the company average.
-
Find products that have been ordered by all customers who live in California.
-
Write a query to find the most expensive product in each category.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)