PostgreSQL Lateral Joins
Introduction
PostgreSQL LATERAL joins are one of the database's most powerful yet often overlooked features. Introduced in PostgreSQL 9.3, LATERAL joins allow you to reference columns from preceding tables or subqueries in the FROM clause, creating correlated subqueries that execute once per outer row.
Think of LATERAL joins as a "for each" loop in your SQL queries - for each row in the outer query, the LATERAL subquery runs using values from that outer row. This enables complex query patterns that would otherwise require multiple queries or procedural code.
In this tutorial, we'll explore:
- What LATERAL joins are and how they differ from regular joins
- When to use LATERAL joins
- Practical examples of LATERAL in action
- Common use cases and performance considerations
Prerequisites
Before diving into LATERAL joins, you should be familiar with:
- Basic SQL queries
- PostgreSQL regular joins (INNER, LEFT, RIGHT, FULL)
- Subqueries in PostgreSQL
Understanding LATERAL Joins
The Problem LATERAL Solves
In standard SQL, a subquery in the FROM clause cannot reference columns from other tables in the same FROM clause. This limitation makes certain query patterns difficult to express.
For example, imagine you want to find the three most recent orders for each customer. Without LATERAL, you'd need to write a procedural solution or complex window functions.
How LATERAL Works
The LATERAL keyword allows a subquery in the FROM clause to reference columns from preceding FROM items. This creates a correlated subquery that executes once for each row from the preceding FROM items.
The basic syntax looks like this:
SELECT ...
FROM table_reference [AS] alias,
LATERAL (subquery that can reference columns from table_reference) [AS] alias
Basic LATERAL JOIN Examples
Let's start with a simple example using two tables: customers
and orders
.
First, let's create our sample tables:
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
customer_name VARCHAR(100),
city VARCHAR(100)
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
order_date DATE,
amount DECIMAL(10, 2)
);
-- Insert sample data
INSERT INTO customers (customer_name, city)
VALUES
('John Smith', 'New York'),
('Jane Doe', 'Los Angeles'),
('Bob Johnson', 'Chicago');
INSERT INTO orders (customer_id, order_date, amount)
VALUES
(1, '2023-01-15', 150.00),
(1, '2023-02-20', 89.99),
(1, '2023-03-10', 45.50),
(1, '2023-04-05', 200.00),
(2, '2023-01-10', 120.00),
(2, '2023-02-15', 75.50),
(3, '2023-03-20', 300.00),
(3, '2023-04-25', 150.00);
Example 1: Finding the Latest Order for Each Customer
Let's use a LATERAL join to find the most recent order for each customer:
SELECT c.customer_id, c.customer_name, o.order_id, o.order_date, o.amount
FROM customers c,
LATERAL (
SELECT order_id, order_date, amount
FROM orders
WHERE orders.customer_id = c.customer_id
ORDER BY order_date DESC
LIMIT 1
) o;
Result:
customer_id | customer_name | order_id | order_date | amount
-------------+---------------+----------+-------------+--------
1 | John Smith | 4 | 2023-04-05 | 200.00
2 | Jane Doe | 6 | 2023-02-15 | 75.50
3 | Bob Johnson | 8 | 2023-04-25 | 150.00
How This Works:
- For each row in the
customers
table, PostgreSQL executes the LATERAL subquery - The subquery can reference the
c.customer_id
from the current customer row - It finds the most recent order for that specific customer
- The results are joined with the customer information
Example 2: Finding the Top 3 Most Recent Orders for Each Customer
Now let's extend our example to retrieve the three most recent orders for each customer:
SELECT c.customer_id, c.customer_name, o.order_id, o.order_date, o.amount
FROM customers c
LEFT JOIN LATERAL (
SELECT order_id, order_date, amount
FROM orders
WHERE orders.customer_id = c.customer_id
ORDER BY order_date DESC
LIMIT 3
) o ON true
ORDER BY c.customer_id, o.order_date DESC;
Result:
customer_id | customer_name | order_id | order_date | amount
-------------+---------------+----------+-------------+--------
1 | John Smith | 4 | 2023-04-05 | 200.00
1 | John Smith | 3 | 2023-03-10 | 45.50
1 | John Smith | 2 | 2023-02-20 | 89.99
2 | Jane Doe | 6 | 2023-02-15 | 75.50
2 | Jane Doe | 5 | 2023-01-10 | 120.00
3 | Bob Johnson | 8 | 2023-04-25 | 150.00
3 | Bob Johnson | 7 | 2023-03-20 | 300.00
Note about JOIN LATERAL vs LATERAL
In the examples above:
customers c, LATERAL (...)
is equivalent tocustomers c CROSS JOIN LATERAL (...)
- We used
LEFT JOIN LATERAL
in the second example to ensure customers with fewer than 3 orders still appear in the results
LATERAL with VALUES
LATERAL can also be used with the VALUES clause to generate multiple rows from a single input row:
SELECT c.customer_id, c.customer_name, v.month
FROM customers c
CROSS JOIN LATERAL (
VALUES ('January'), ('February'), ('March')
) AS v(month)
ORDER BY c.customer_id, v.month;
Result:
customer_id | customer_name | month
-------------+---------------+----------
1 | John Smith | February
1 | John Smith | January
1 | John Smith | March
2 | Jane Doe | February
2 | Jane Doe | January
2 | Jane Doe | March
3 | Bob Johnson | February
3 | Bob Johnson | January
3 | Bob Johnson | March
This is particularly useful for generating reports or transforming data.
Real-World Applications
Example 1: Calculating Running Totals for Each Customer
Let's calculate the running total of order amounts for each customer:
SELECT c.customer_id, c.customer_name,
o.order_id, o.order_date, o.amount,
sum_o.running_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN LATERAL (
SELECT SUM(o2.amount) AS running_total
FROM orders o2
WHERE o2.customer_id = c.customer_id
AND o2.order_date <= o.order_date
) sum_o ON true
ORDER BY c.customer_id, o.order_date;
Result:
customer_id | customer_name | order_id | order_date | amount | running_total
-------------+---------------+----------+-------------+--------+--------------
1 | John Smith | 1 | 2023-01-15 | 150.00 | 150.00
1 | John Smith | 2 | 2023-02-20 | 89.99 | 239.99
1 | John Smith | 3 | 2023-03-10 | 45.50 | 285.49
1 | John Smith | 4 | 2023-04-05 | 200.00 | 485.49
2 | Jane Doe | 5 | 2023-01-10 | 120.00 | 120.00
2 | Jane Doe | 6 | 2023-02-15 | 75.50 | 195.50
3 | Bob Johnson | 7 | 2023-03-20 | 300.00 | 300.00
3 | Bob Johnson | 8 | 2023-04-25 | 150.00 | 450.00
Example 2: Finding the First and Last Order for Each Customer
SELECT c.customer_id, c.customer_name,
first_o.order_date AS first_order_date, first_o.amount AS first_order_amount,
last_o.order_date AS last_order_date, last_o.amount AS last_order_amount
FROM customers c
LEFT JOIN LATERAL (
SELECT order_date, amount
FROM orders
WHERE orders.customer_id = c.customer_id
ORDER BY order_date ASC
LIMIT 1
) first_o ON true
LEFT JOIN LATERAL (
SELECT order_date, amount
FROM orders
WHERE orders.customer_id = c.customer_id
ORDER BY order_date DESC
LIMIT 1
) last_o ON true;
Result:
customer_id | customer_name | first_order_date | first_order_amount | last_order_date | last_order_amount
-------------+---------------+------------------+--------------------+-----------------+-------------------
1 | John Smith | 2023-01-15 | 150.00 | 2023-04-05 | 200.00
2 | Jane Doe | 2023-01-10 | 120.00 | 2023-02-15 | 75.50
3 | Bob Johnson | 2023-03-20 | 300.00 | 2023-04-25 | 150.00
Example 3: Aggregating Data with Different Criteria in a Single Query
Suppose we want to get various metrics for each customer in one query:
SELECT c.customer_id, c.customer_name,
total.order_count,
total.total_amount,
recent.recent_order_count,
recent.recent_amount
FROM customers c
LEFT JOIN LATERAL (
SELECT COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM orders
WHERE orders.customer_id = c.customer_id
) total ON true
LEFT JOIN LATERAL (
SELECT COUNT(*) AS recent_order_count, SUM(amount) AS recent_amount
FROM orders
WHERE orders.customer_id = c.customer_id
AND order_date >= '2023-03-01'
) recent ON true;
Result:
customer_id | customer_name | order_count | total_amount | recent_order_count | recent_amount
-------------+---------------+-------------+--------------+--------------------+---------------
1 | John Smith | 4 | 485.49 | 2 | 245.50
2 | Jane Doe | 2 | 195.50 | 0 | 0.00
3 | Bob Johnson | 2 | 450.00 | 2 | 450.00
Advanced Usage: LATERAL with Table Functions
One of the most powerful applications of LATERAL is using it with table functions. PostgreSQL allows functions to return sets of rows, and LATERAL lets us call these functions with parameters from the outer query.
Example: Using LATERAL with a Custom Table Function
Let's create a simple table function that generates a series of dates and use it with LATERAL:
CREATE OR REPLACE FUNCTION generate_date_series(
start_date DATE,
end_date DATE
) RETURNS TABLE (date_value DATE) AS $$
BEGIN
RETURN QUERY
SELECT generate_series(start_date, end_date, '1 day'::interval)::date;
END;
$$ LANGUAGE plpgsql;
-- Now use it with LATERAL
SELECT c.customer_id, c.customer_name, d.date_value
FROM customers c
CROSS JOIN LATERAL generate_date_series('2023-01-01', '2023-01-03') d
ORDER BY c.customer_id, d.date_value;
Result:
customer_id | customer_name | date_value
-------------+---------------+------------
1 | John Smith | 2023-01-01
1 | John Smith | 2023-01-02
1 | John Smith | 2023-01-03
2 | Jane Doe | 2023-01-01
2 | Jane Doe | 2023-01-02
2 | Jane Doe | 2023-01-03
3 | Bob Johnson | 2023-01-01
3 | Bob Johnson | 2023-01-02
3 | Bob Johnson | 2023-01-03
Performance Considerations
While LATERAL joins are powerful, they come with performance implications:
- Execution Model: LATERAL subqueries execute once per outer row, which can be expensive for large datasets
- Index Usage: Properly indexing columns used in the join condition is crucial
- Alternative Approaches: For some use cases, window functions might offer better performance
- Query Planning: Use
EXPLAIN ANALYZE
to understand how PostgreSQL executes your LATERAL queries
Example of analyzing query performance:
EXPLAIN ANALYZE
SELECT c.customer_id, c.customer_name, o.order_id, o.order_date
FROM customers c
LEFT JOIN LATERAL (
SELECT order_id, order_date
FROM orders
WHERE orders.customer_id = c.customer_id
ORDER BY order_date DESC
LIMIT 1
) o ON true;
When to Use LATERAL Joins
LATERAL joins are particularly useful when:
- You need to reference columns from preceding tables in a subquery
- You want to limit the number of rows returned from a related table (e.g., "top N per group")
- You need to perform complex calculations based on the outer row's values
- You're working with table functions that take parameters from the outer query
- You want to avoid multiple round trips to the database
When Not to Use LATERAL Joins
LATERAL might not be the best choice when:
- Simple joins or window functions can accomplish the same task with better performance
- You're working with very large tables and performance is critical
- You need to optimize for readability and maintainability (LATERAL can be more difficult to understand)
Summary
PostgreSQL LATERAL joins provide a powerful way to reference columns from preceding tables in subqueries, enabling complex query patterns that would otherwise require multiple queries or procedural code.
Key takeaways:
- LATERAL joins act like a "for each" loop in SQL, executing a subquery once per outer row
- They excel at "top N per group" queries, row generation, and working with table functions
- LATERAL can be used with CROSS JOIN, INNER JOIN, or LEFT JOIN depending on your needs
- Performance considerations are important, especially for large datasets
With LATERAL joins, you can write more efficient and expressive queries that solve complex data problems in a single database operation.
Exercises
- Create a query that finds the top 2 most expensive orders for each customer, along with the customer's total order count.
- Write a query using LATERAL to calculate what percentage each order represents of the customer's total spending.
- Use LATERAL with the PostgreSQL
unnest()
function to expand an array column into multiple rows. - Create a report showing each customer's monthly spending for the first 3 months of 2023, using LATERAL to generate the months.
Additional Resources
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)