MySQL Subqueries
Introduction
Subqueries, also known as nested queries or inner queries, are powerful SQL constructs that allow you to use the results of one query within another query. They enable you to perform complex operations that would otherwise require multiple separate queries or complex joins.
In this tutorial, you'll learn:
- What subqueries are and how they work
- Different types of subqueries in MySQL
- When and how to use subqueries effectively
- Common subquery patterns and best practices
Mastering subqueries will significantly enhance your ability to retrieve and manipulate data in MySQL databases.
What Are Subqueries?
A subquery is a query nested inside another query. The inner query executes first and produces a result that is then used by the outer query to complete its execution.
Here's the basic structure:
SELECT column_name(s)
FROM table_name
WHERE column_name OPERATOR (SELECT column_name FROM table_name WHERE condition);
The subquery (inner query) is enclosed in parentheses and typically appears on the right side of a comparison operator.
Types of Subqueries in MySQL
MySQL supports several types of subqueries, each with different characteristics and use cases:
1. Scalar Subqueries
A scalar subquery returns a single value (one row, one column) and can be used anywhere a single value is expected.
Example:
SELECT product_name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);
In this example, the subquery (SELECT AVG(price) FROM products)
returns a single value—the average price of all products. The outer query then returns all products with prices greater than this average.
Output:
+------------------+--------+
| product_name | price |
+------------------+--------+
| Premium Laptop | 1200.00|
| Gaming Console | 499.99 |
| DSLR Camera | 899.50 |
+------------------+--------+
2. Row Subqueries
A row subquery returns a single row with multiple columns and can be used with row constructors.
Example:
SELECT *
FROM employees
WHERE (department_id, salary) =
(SELECT department_id, MAX(salary)
FROM employees
WHERE department_id = 3);
This query finds employees who have the same department_id and salary as the employee with the highest salary in department 3.
3. Column Subqueries
A column subquery returns a single column with multiple rows. These are often used with the IN, ANY, or ALL operators.
Example:
SELECT product_name, category_id
FROM products
WHERE category_id IN (SELECT category_id FROM categories WHERE parent_category = 'Electronics');
This query finds all products in categories that belong to the 'Electronics' parent category.
Output:
+-------------------+-------------+
| product_name | category_id |
+-------------------+-------------+
| Premium Laptop | 1 |
| Smartphone | 1 |
| Wireless Earbuds | 2 |
| LED TV | 3 |
+-------------------+-------------+
4. Table Subqueries
A table subquery returns multiple columns and multiple rows, essentially acting as a derived table.
Example:
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN (
SELECT customer_id, customer_name
FROM customers
WHERE country = 'USA'
) c ON o.customer_id = c.customer_id;
In this example, the subquery creates a temporary table of USA-based customers that is then joined with the orders table.
Subquery Locations in SQL Statements
Subqueries can appear in different parts of a SQL statement:
1. In the WHERE Clause
SELECT product_name, unit_price
FROM products
WHERE unit_price > (SELECT AVG(unit_price) FROM products);
2. In the FROM Clause (Derived Tables)
SELECT avg_prices.category_id, avg_prices.avg_price, p.product_name
FROM (
SELECT category_id, AVG(unit_price) as avg_price
FROM products
GROUP BY category_id
) avg_prices
JOIN products p ON p.category_id = avg_prices.category_id;
3. In the SELECT Clause
SELECT
employee_name,
salary,
(SELECT AVG(salary) FROM employees) as avg_company_salary,
salary - (SELECT AVG(salary) FROM employees) as salary_difference
FROM employees;
Output:
+--------------+--------+-------------------+------------------+
| employee_name| salary | avg_company_salary| salary_difference|
+--------------+--------+-------------------+------------------+
| John Smith | 60000 | 55000 | 5000 |
| Jane Doe | 70000 | 55000 | 15000 |
| Bob Johnson | 45000 | 55000 | -10000 |
| Alice Brown | 55000 | 55000 | 0 |
| Mike Wilson | 45000 | 55000 | -10000 |
+--------------+--------+-------------------+------------------+
4. In the HAVING Clause
SELECT category_id, AVG(unit_price) as avg_price
FROM products
GROUP BY category_id
HAVING AVG(unit_price) > (
SELECT AVG(unit_price) FROM products
);
Subquery Operators
MySQL offers several operators specifically designed to work with subqueries:
IN Operator
Checks if a value matches any value in the subquery result.
SELECT product_name, unit_price
FROM products
WHERE category_id IN (
SELECT category_id
FROM categories
WHERE category_name LIKE '%Beverages%'
);
NOT IN Operator
Checks if a value does not match any value in the subquery result.
SELECT product_name
FROM products
WHERE supplier_id NOT IN (
SELECT supplier_id
FROM suppliers
WHERE country = 'Germany'
);
EXISTS Operator
Checks if the subquery returns any rows.
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id AND o.order_date >= '2023-01-01'
);
This query returns customers who have placed at least one order since January 1, 2023.
ANY and SOME Operators
Returns TRUE if the comparison is TRUE for ANY of the values in the subquery.
SELECT product_name, unit_price
FROM products
WHERE unit_price > ANY (
SELECT unit_price
FROM products
WHERE category_id = 2
);
This query finds products with a unit price greater than at least one product in category 2.
ALL Operator
Returns TRUE if the comparison is TRUE for ALL values in the subquery.
SELECT product_name, unit_price
FROM products
WHERE unit_price > ALL (
SELECT unit_price
FROM products
WHERE category_id = 3
);
This query finds products with a unit price greater than all products in category 3.
Correlated Subqueries
A correlated subquery is a subquery that depends on the outer query for its values. The subquery is executed once for each row processed by the outer query.
Example:
SELECT e1.employee_id, e1.employee_name, e1.salary
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
This query finds employees who earn more than the average salary in their department. Notice how the inner query references a column (e1.department_id
) from the outer query.
Practical Examples
Example 1: Finding Top Customers
Find customers who have placed orders with a total value higher than the average order value:
SELECT c.customer_id, c.customer_name, SUM(o.total_amount) as total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING SUM(o.total_amount) > (
SELECT AVG(total_amount)
FROM orders
)
ORDER BY total_spent DESC;
Example 2: Finding Products that Have Never Been Ordered
SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_details od
WHERE od.product_id = p.product_id
);
Example 3: Finding Departments with Above-Average Salary Expenses
SELECT d.department_name, SUM(e.salary) as total_salary
FROM departments d
JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name
HAVING SUM(e.salary) > (
SELECT AVG(dept_salary)
FROM (
SELECT SUM(salary) as dept_salary
FROM employees
GROUP BY department_id
) as avg_dept_salaries
);
Best Practices and Performance Considerations
-
Use EXISTS for checking existence: When you just need to check if records exist, use EXISTS instead of IN as it's often more efficient.
-
Avoid subqueries in loops: Subqueries that run repeatedly inside a loop can severely impact performance.
-
Consider alternatives: In many cases, JOINs can be used instead of subqueries and may offer better performance.
-
Limit the scope: Keep your subqueries as focused as possible to improve performance.
-
Index key columns: Ensure that columns used in the subquery conditions are properly indexed.
-
Consider query rewrites: Sometimes rewriting a query with a subquery as a join can improve performance.
-
Test and compare: Always test different approaches to see which one performs better with your specific data.
Summary
Subqueries are powerful tools that enable you to write complex, flexible SQL queries in MySQL. They can be used in various parts of SQL statements, including SELECT, FROM, WHERE, and HAVING clauses.
Key points to remember:
- Subqueries execute inside out—inner queries execute before outer queries
- They can return scalar values, rows, columns, or tables
- Correlated subqueries reference the outer query and run once per outer query row
- Various operators like IN, EXISTS, ANY, and ALL can be used with subqueries
- While powerful, consider performance implications, especially with large datasets
By mastering subqueries, you'll significantly expand your ability to extract meaningful information from your MySQL databases.
Exercises
-
Write a query to find all products that are more expensive than the average price in their respective categories.
-
Write a query to find customers who have placed more orders than the average number of orders per customer.
-
Write a query using EXISTS to find suppliers who supply at least one product with a unit price greater than $100.
-
Write a query to find employees who earn more than their department manager.
-
Write a query to find categories that have no products.
Additional Resources
Happy querying!
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)