PostgreSQL Subqueries

Introduction

Subqueries (also known as nested queries or inner queries) are a powerful feature in PostgreSQL that allow you to use the result of one query within another query. They enable you to perform complex operations that might otherwise require multiple separate queries or complex joins.

Think of subqueries as queries within queries - like nesting one SQL statement inside another. This capability allows you to break down complex problems into smaller, more manageable pieces, making your SQL code more readable and often more efficient.

In this tutorial, we'll explore how subqueries work in PostgreSQL, when to use them, and practical examples to demonstrate their power.

Understanding Subqueries

A subquery is enclosed in parentheses and can appear in various parts of a SQL statement:

In the SELECT clause
In the FROM clause
In the WHERE clause
In the HAVING clause

Let's start with a simple example. Imagine we have two tables: employees and departments:

CREATE TABLE departments (
    department_id SERIAL PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL
);

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    salary NUMERIC(10, 2) NOT NULL,
    department_id INTEGER REFERENCES departments(department_id)
);

Types of Subqueries

There are several types of subqueries in PostgreSQL:

1. Scalar Subqueries

A scalar subquery returns a single value and can be used wherever a single value expression is expected.

Example: Find employees who earn more than the average salary:

SELECT name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary) 
    FROM employees
);

Output:

    name    |  salary  
------------+----------
 John Smith | 85000.00
 Lisa Wong  | 78500.00
 David Kim  | 92000.00

In this example, the subquery (SELECT AVG(salary) FROM employees) calculates the average salary, and the outer query finds all employees with salaries higher than that average.

2. Row Subqueries

A row subquery returns a single row with multiple columns.

Example: Find an employee who has the same department and salary as employee 'John Smith':

SELECT name
FROM employees
WHERE (department_id, salary) = (
    SELECT department_id, salary
    FROM employees
    WHERE name = 'John Smith'
) AND name != 'John Smith';

Output:

    name    
------------
 Emily Chen

3. Table Subqueries

A table subquery returns multiple rows and multiple columns. It's typically used in the FROM clause.

Example: Calculating department statistics:

SELECT d.department_name, stats.avg_salary, stats.employee_count
FROM departments d
JOIN (
    SELECT department_id, 
           AVG(salary) as avg_salary, 
           COUNT(*) as employee_count
    FROM employees
    GROUP BY department_id
) stats ON d.department_id = stats.department_id;

Output:

 department_name | avg_salary | employee_count 
-----------------+------------+----------------
 Engineering     | 78333.33   | 3
 Marketing       | 65000.00   | 2
 Finance         | 72500.00   | 2
 Human Resources | 58000.00   | 1

4. Correlated Subqueries

A correlated subquery refers to columns from the outer query. The subquery is executed once for each row processed by the outer query.

Example: Find employees who earn the highest salary in their department:

SELECT name, department_id, salary
FROM employees e1
WHERE salary = (
    SELECT MAX(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

Output:

    name    | department_id |  salary  
------------+---------------+----------
 John Smith | 1             | 85000.00
 Lisa Wong  | 2             | 78500.00
 David Kim  | 3             | 92000.00
 Alex Park  | 4             | 58000.00

Subqueries with Operators

EXISTS Operator

The EXISTS operator checks if a subquery returns any rows.

Example: Find departments that have at least one employee with a salary over $70,000:

SELECT department_name
FROM departments d
WHERE EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.department_id
    AND e.salary > 70000
);

Output:

 department_name 
-----------------
 Engineering
 Marketing
 Finance

IN Operator

The IN operator checks if a value matches any value in a list returned by a subquery.

Example: Find employees who work in the Engineering or Marketing departments:

SELECT name
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE department_name IN ('Engineering', 'Marketing')
);

Output:

    name    
------------
 John Smith
 Emily Chen
 Michael Lee
 Lisa Wong
 Sarah Johnson

ANY and ALL Operators

The ANY operator compares a value to each value returned by a subquery. The ALL operator compares a value to all values returned by a subquery.

Example with ANY: Find employees whose salary is greater than ANY of the Marketing department salaries:

SELECT name, salary
FROM employees
WHERE salary > ANY (
    SELECT salary
    FROM employees
    WHERE department_id = (
        SELECT department_id
        FROM departments
        WHERE department_name = 'Marketing'
    )
);

Output:

    name    |  salary  
------------+----------
 John Smith | 85000.00
 David Kim  | 92000.00
 Robert Lee | 73000.00

Example with ALL: Find employees whose salary is greater than ALL of the Marketing department salaries:

SELECT name, salary
FROM employees
WHERE salary > ALL (
    SELECT salary
    FROM employees
    WHERE department_id = (
        SELECT department_id
        FROM departments
        WHERE department_name = 'Marketing'
    )
);

Output:

    name    |  salary  
------------+----------
 John Smith | 85000.00
 David Kim  | 92000.00

Common Table Expressions (CTEs) vs. Subqueries

While subqueries are powerful, PostgreSQL also offers Common Table Expressions (CTEs) as an alternative approach. Let's compare:

Example using a CTE instead of a subquery:

WITH dept_stats AS (
    SELECT department_id, 
           AVG(salary) as avg_salary, 
           COUNT(*) as employee_count
    FROM employees
    GROUP BY department_id
)
SELECT d.department_name, s.avg_salary, s.employee_count
FROM departments d
JOIN dept_stats s ON d.department_id = s.department_id;

CTEs are often more readable for complex queries, especially when you need to reference the same subquery multiple times.

Practical Real-World Examples

Example 1: Finding Outliers

Identifying employees whose salaries are significantly above the average for their department:

SELECT e.name, e.salary, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > (
    SELECT AVG(salary) * 1.5
    FROM employees
    WHERE department_id = e.department_id
);

Example 2: Report Generation

Creating a department performance report:

SELECT 
    d.department_name,
    (SELECT COUNT(*) FROM employees WHERE department_id = d.department_id) as employee_count,
    (SELECT MAX(salary) FROM employees WHERE department_id = d.department_id) as max_salary,
    (SELECT MIN(salary) FROM employees WHERE department_id = d.department_id) as min_salary,
    (SELECT AVG(salary) FROM employees WHERE department_id = d.department_id) as avg_salary
FROM departments d
ORDER BY avg_salary DESC;

Example 3: Data Cleanup

Finding duplicate employee records:

SELECT e1.employee_id, e1.name
FROM employees e1
WHERE EXISTS (
    SELECT 1
    FROM employees e2
    WHERE e2.employee_id != e1.employee_id
    AND e2.name = e1.name
    AND e2.department_id = e1.department_id
);

Performance Considerations

Subqueries are powerful but can impact performance. Here are some tips:

Avoid correlated subqueries when possible: They run once per outer query row
Consider indexes: Ensure columns used in subqueries are properly indexed
Use CTEs for complex logic: Sometimes more readable and can be optimized differently
EXPLAIN ANALYZE: Use PostgreSQL's EXPLAIN ANALYZE to check query execution plans

For example, compare these two approaches:

-- Using a correlated subquery (potentially slower)
SELECT e.name, 
       (SELECT department_name FROM departments WHERE department_id = e.department_id) as department
FROM employees e;

-- Using a JOIN (often faster)
SELECT e.name, d.department_name as department
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

The JOIN approach is typically more efficient for this scenario.

Summary

PostgreSQL subqueries are a powerful tool that allows you to:

Break down complex queries into smaller, more manageable pieces
Use the results of one query inside another
Create more flexible and dynamic database operations
Solve problems that would be difficult to express with simple joins

Key points to remember:

Subqueries can appear in SELECT, FROM, WHERE, and HAVING clauses
They can return single values, rows, or tables
Correlated subqueries reference columns from the outer query
Common operators used with subqueries include EXISTS, IN, ANY, and ALL
Consider performance implications when using subqueries

Practice Exercises

Write a query to find employees who earn more than the average salary of their department.
Find departments that have no employees.
List employees who have the same salary as at least one employee in a different department.
Find the department with the highest average salary.
Create a query that returns each employee's name along with their salary percentile within their department.

Additional Resources

Remember that mastering subqueries takes practice, so experiment with different approaches and analyze query performance to become proficient in writing efficient PostgreSQL queries.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Understanding Subqueries​

Types of Subqueries​

1. Scalar Subqueries​

2. Row Subqueries​

3. Table Subqueries​

4. Correlated Subqueries​

Subqueries with Operators​

EXISTS Operator​

IN Operator​

ANY and ALL Operators​

Common Table Expressions (CTEs) vs. Subqueries​

Practical Real-World Examples​

Example 1: Finding Outliers​

Example 2: Report Generation​

Example 3: Data Cleanup​

Performance Considerations​

Summary​

Practice Exercises​

Additional Resources​

Introduction

Understanding Subqueries

Types of Subqueries

1. Scalar Subqueries

2. Row Subqueries

3. Table Subqueries

4. Correlated Subqueries

Subqueries with Operators

EXISTS Operator

IN Operator

ANY and ALL Operators

Common Table Expressions (CTEs) vs. Subqueries

Practical Real-World Examples

Example 1: Finding Outliers

Example 2: Report Generation

Example 3: Data Cleanup

Performance Considerations

Summary

Practice Exercises

Additional Resources