PostgreSQL Window Functions
Introduction
Window functions are a powerful feature in PostgreSQL that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions that group rows into a single output row, window functions retain all rows in the result while adding calculated values based on a set of rows related to each row.
Window functions are perfect when you need to:
- Calculate running totals or moving averages
- Rank rows within a partition
- Access data from preceding or following rows
- Compare values across related rows
If you've ever needed to compare each row to others or calculate cumulative values without losing the detail of individual rows, window functions are the solution you've been looking for.
Basic Syntax
The general syntax for a window function is:
SELECT
column1,
column2,
window_function(arg1, arg2, ...) OVER (
PARTITION BY partition_expression
ORDER BY sort_expression
frame_clause
)
FROM table_name;
Let's break down the components:
window_function
: The function to apply (likeSUM
,AVG
,ROW_NUMBER
, etc.)OVER
clause: Defines the "window" of rows to operate onPARTITION BY
: Optional; divides rows into groups or partitionsORDER BY
: Optional; defines the order of rows within each partitionframe_clause
: Optional; further restricts the set of rows within the partition
Common Window Functions
Row Numbering Functions
PostgreSQL provides several functions for numbering rows:
ROW_NUMBER()
Assigns a unique number to each row in the result set.
SELECT
product_name,
category,
price,
ROW_NUMBER() OVER (ORDER BY price DESC) as row_num
FROM products;
Output:
product_name | category | price | row_num
----------------+-----------+-------+---------
Smartphone X | Electronics| 999.99| 1
Laptop Pro | Electronics| 849.50| 2
Coffee Machine | Appliances | 129.99| 3
Desk Lamp | Home | 45.50| 4
Tea Kettle | Appliances | 29.99| 5
RANK() and DENSE_RANK()
These functions assign ranks to rows, with RANK()
leaving gaps for ties and DENSE_RANK()
assigning consecutive ranks.
SELECT
product_name,
category,
price,
RANK() OVER (ORDER BY price DESC) as rank,
DENSE_RANK() OVER (ORDER BY price DESC) as dense_rank
FROM products;
Output:
product_name | category | price | rank | dense_rank
----------------+-----------+-------+------+------------
Smartphone X | Electronics| 999.99| 1 | 1
Laptop Pro | Electronics| 849.50| 2 | 2
Coffee Machine | Appliances | 129.99| 3 | 3
Desk Lamp | Home | 45.50| 4 | 4
Tea Kettle | Appliances | 29.99| 5 | 5
If we add a product with the same price as another:
SELECT
product_name,
category,
price,
RANK() OVER (ORDER BY price DESC) as rank,
DENSE_RANK() OVER (ORDER BY price DESC) as dense_rank
FROM products;
Output with ties:
product_name | category | price | rank | dense_rank
----------------+-----------+-------+------+------------
Smartphone X | Electronics| 999.99| 1 | 1
Laptop Pro | Electronics| 849.50| 2 | 2
Tablet Mini | Electronics| 129.99| 3 | 3
Coffee Machine | Appliances | 129.99| 3 | 3
Desk Lamp | Home | 45.50| 5 | 4
Tea Kettle | Appliances | 29.99| 6 | 5
Notice how RANK()
skips rank 4 after the tie, while DENSE_RANK()
uses consecutive ranks.
NTILE(n)
This function divides rows into a specified number of approximately equal groups.
SELECT
product_name,
category,
price,
NTILE(3) OVER (ORDER BY price DESC) as price_tier
FROM products;
Output:
product_name | category | price | price_tier
----------------+-----------+-------+------------
Smartphone X | Electronics| 999.99| 1
Laptop Pro | Electronics| 849.50| 1
Tablet Mini | Electronics| 129.99| 2
Coffee Machine | Appliances | 129.99| 2
Desk Lamp | Home | 45.50| 3
Tea Kettle | Appliances | 29.99| 3
Aggregate Window Functions
These functions perform calculations across rows within the window.
SUM(), AVG(), MIN(), MAX()
SELECT
category,
product_name,
price,
SUM(price) OVER (PARTITION BY category) as category_total,
AVG(price) OVER (PARTITION BY category) as category_avg,
price / SUM(price) OVER (PARTITION BY category) * 100 as pct_of_category
FROM products
ORDER BY category, price DESC;
Output:
category | product_name | price | category_total | category_avg | pct_of_category
-------------+---------------+-------+----------------+--------------+----------------
Appliances | Coffee Machine| 129.99| 159.98 | 79.99 | 81.25
Appliances | Tea Kettle | 29.99| 159.98 | 79.99 | 18.75
Electronics | Smartphone X | 999.99| 1979.48 | 659.83 | 50.52
Electronics | Laptop Pro | 849.50| 1979.48 | 659.83 | 42.92
Electronics | Tablet Mini | 129.99| 1979.48 | 659.83 | 6.56
Home | Desk Lamp | 45.50| 45.50 | 45.50 | 100.00
This query shows:
- Total price of all products in each category
- Average price of products in each category
- Each product's price as a percentage of its category total
Running Totals and Moving Calculations
Window functions can calculate running totals or moving averages:
SELECT
order_date,
order_amount,
SUM(order_amount) OVER (ORDER BY order_date) as running_total,
AVG(order_amount) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3days
FROM orders
ORDER BY order_date;
Output:
order_date | order_amount | running_total | moving_avg_3days
------------+--------------+---------------+------------------
2023-01-01 | 100.00 | 100.00 | 100.00
2023-01-02 | 150.00 | 250.00 | 125.00
2023-01-03 | 200.00 | 450.00 | 150.00
2023-01-04 | 125.00 | 575.00 | 158.33
2023-01-05 | 175.00 | 750.00 | 166.67
Working with Window Frame Clauses
The frame clause defines which rows within a partition are included in the calculation. The default frame depends on whether you have an ORDER BY
clause:
- With
ORDER BY
: Default isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- Without
ORDER BY
: Default isRANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Common Frame Clauses
-- Current row and all previous rows
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Current row and one previous row
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
-- Current row, one previous, and one following row
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
-- All rows in the partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Example: Moving Averages
SELECT
date,
daily_sales,
AVG(daily_sales) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as seven_day_average
FROM daily_sales
ORDER BY date;
Output:
date | daily_sales | seven_day_average
------------+-------------+-------------------
2023-01-01 | 1200.00| 1200.00
2023-01-02 | 1350.00| 1275.00
2023-01-03 | 1100.00| 1216.67
2023-01-04 | 1450.00| 1275.00
2023-01-05 | 1300.00| 1280.00
2023-01-06 | 1050.00| 1241.67
2023-01-07 | 1100.00| 1221.43
2023-01-08 | 1400.00| 1250.00
Working with PARTITION BY
The PARTITION BY
clause divides the result set into partitions to which the window function is applied separately.
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_salary_rank,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
MAX(salary) OVER (PARTITION BY department) as dept_max_salary
FROM employees
ORDER BY department, dept_salary_rank;
Output:
department | employee_name | salary | dept_salary_rank | dept_avg_salary | dept_max_salary
------------+---------------+---------+------------------+-----------------+----------------
Engineering| John Smith | 95000.00| 1 | 82500.00 | 95000.00
Engineering| Lisa Jones | 90000.00| 2 | 82500.00 | 95000.00
Engineering| Mike Johnson | 85000.00| 3 | 82500.00 | 95000.00
Engineering| Sarah Williams| 60000.00| 4 | 82500.00 | 95000.00
Marketing | David Lee | 78000.00| 1 | 68000.00 | 78000.00
Marketing | Emily Chen | 72000.00| 2 | 68000.00 | 78000.00
Marketing | Robert Garcia | 65000.00| 3 | 68000.00 | 78000.00
Marketing | Mary Brown | 57000.00| 4 | 68000.00 | 78000.00
Sales | James Wilson | 82000.00| 1 | 73250.00 | 82000.00
Sales | Patricia Moore| 80000.00| 2 | 73250.00 | 82000.00
Sales | Thomas Taylor | 75000.00| 3 | 73250.00 | 82000.00
Sales | Jennifer Lewis| 56000.00| 4 | 73250.00 | 82000.00
Advanced Window Function Examples
Percentiles
SELECT
product_name,
category,
price,
PERCENT_RANK() OVER (ORDER BY price) as percent_rank,
CUME_DIST() OVER (ORDER BY price) as cumulative_distribution
FROM products;
Output:
product_name | category | price | percent_rank | cumulative_distribution
----------------+-----------+-------+--------------+-------------------------
Tea Kettle | Appliances| 29.99| 0.0 | 0.1667
Desk Lamp | Home | 45.50| 0.25 | 0.3333
Coffee Machine | Appliances| 129.99| 0.5 | 0.6667
Laptop Pro | Electronics| 849.50| 0.75 | 0.8333
Smartphone X | Electronics| 999.99| 1.0 | 1.0000
Lag and Lead Functions
These functions access data from preceding or following rows without requiring a self-join.
SELECT
order_date,
order_amount,
LAG(order_amount, 1) OVER (ORDER BY order_date) as previous_day_amount,
order_amount - LAG(order_amount, 1) OVER (ORDER BY order_date) as day_over_day_change,
LEAD(order_amount, 1) OVER (ORDER BY order_date) as next_day_amount
FROM daily_orders
ORDER BY order_date;
Output:
order_date | order_amount | previous_day_amount | day_over_day_change | next_day_amount
------------+--------------+---------------------+---------------------+-----------------
2023-01-01 | 1200.00 | NULL | NULL | 1350.00
2023-01-02 | 1350.00 | 1200.00 | 150.00 | 1100.00
2023-01-03 | 1100.00 | 1350.00 | -250.00 | 1450.00
2023-01-04 | 1450.00 | 1100.00 | 350.00 | 1300.00
2023-01-05 | 1300.00 | 1450.00 | -150.00 | NULL
First_Value and Last_Value
These functions retrieve values from the first or last row in the window frame.
SELECT
department,
employee_name,
salary,
FIRST_VALUE(employee_name) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as highest_paid_employee,
LAST_VALUE(employee_name) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lowest_paid_employee
FROM employees
ORDER BY department, salary DESC;
Output:
department | employee_name | salary | highest_paid_employee | lowest_paid_employee
------------+---------------+---------+-----------------------+----------------------
Engineering| John Smith | 95000.00| John Smith | Sarah Williams
Engineering| Lisa Jones | 90000.00| John Smith | Sarah Williams
Engineering| Mike Johnson | 85000.00| John Smith | Sarah Williams
Engineering| Sarah Williams| 60000.00| John Smith | Sarah Williams
Marketing | David Lee | 78000.00| David Lee | Mary Brown
Marketing | Emily Chen | 72000.00| David Lee | Mary Brown
Marketing | Robert Garcia | 65000.00| David Lee | Mary Brown
Marketing | Mary Brown | 57000.00| David Lee | Mary Brown
Sales | James Wilson | 82000.00| James Wilson | Jennifer Lewis
Sales | Patricia Moore| 80000.00| James Wilson | Jennifer Lewis
Sales | Thomas Taylor | 75000.00| James Wilson | Jennifer Lewis
Sales | Jennifer Lewis| 56000.00| James Wilson | Jennifer Lewis
Real-World Applications
Analyzing Sales Data
Window functions are perfect for sales analysis:
SELECT
product_category,
DATE_TRUNC('month', order_date) as month,
SUM(order_amount) as monthly_sales,
SUM(SUM(order_amount)) OVER (
PARTITION BY product_category
ORDER BY DATE_TRUNC('month', order_date)
ROWS UNBOUNDED PRECEDING
) as cumulative_sales,
SUM(order_amount) / SUM(SUM(order_amount)) OVER (
PARTITION BY product_category
) * 100 as percentage_of_category_total
FROM sales
GROUP BY product_category, DATE_TRUNC('month', order_date)
ORDER BY product_category, month;
Employee Salary Analysis
SELECT
e.employee_id,
e.name,
e.department,
e.salary,
e.hire_date,
ROUND(e.salary / AVG(e.salary) OVER (PARTITION BY e.department) * 100, 2) as percent_of_dept_avg,
RANK() OVER (PARTITION BY e.department ORDER BY e.salary DESC) as dept_salary_rank,
DENSE_RANK() OVER (ORDER BY e.salary DESC) as company_salary_rank,
FIRST_VALUE(e.name) OVER (
PARTITION BY e.department
ORDER BY e.hire_date
ROWS UNBOUNDED PRECEDING
) as dept_most_senior,
LAG(e.salary) OVER (
PARTITION BY e.department
ORDER BY e.hire_date
) as previous_hire_salary
FROM employees e
ORDER BY e.department, e.salary DESC;
Time Series Analysis
Window functions excel at time series analysis:
SELECT
date,
stock_price,
AVG(stock_price) OVER (
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as moving_avg_5day,
stock_price - LAG(stock_price, 5) OVER (ORDER BY date) as price_change_5day,
(stock_price / LAG(stock_price, 5) OVER (ORDER BY date) - 1) * 100 as percent_change_5day
FROM stock_prices
ORDER BY date;
Combining Multiple Window Functions
You can use multiple window functions in a single query:
SELECT
department,
employee_name,
salary,
RANK() OVER dept_salary_window as salary_rank,
AVG(salary) OVER dept_window as dept_avg,
MAX(salary) OVER dept_window as dept_max,
MIN(salary) OVER dept_window as dept_min,
salary - AVG(salary) OVER dept_window as diff_from_avg
FROM employees
WINDOW
dept_window AS (PARTITION BY department),
dept_salary_window AS (PARTITION BY department ORDER BY salary DESC)
ORDER BY department, salary_rank;
This approach uses the WINDOW
clause to define window specifications once and reuse them, making the query more readable.
Performance Considerations
Window functions are powerful but can be resource-intensive. Here are some tips:
- Use indexes: Ensure the columns in
PARTITION BY
andORDER BY
clauses are indexed. - Limit partitions: Excessively large numbers of partitions can degrade performance.
- Consider materialized views: For frequently used window calculations, consider materialized views.
- Monitor execution plans: Use
EXPLAIN ANALYZE
to understand how PostgreSQL executes your window functions.
Summary
Window functions in PostgreSQL are a powerful tool for analytics and reporting. They allow you to:
- Perform calculations across related rows without collapsing them into a single output row
- Rank and number rows within partitions
- Create running totals and moving averages
- Compare current rows with preceding or following rows
- Calculate percentiles and distributions
By mastering window functions, you can write more efficient queries and perform complex analytics directly in your database, reducing the need for application-side processing.
Additional Resources and Exercises
Resources
Exercises
-
Basic Window Functions: Write a query that ranks products by price within each category.
-
Running Totals: Calculate the running total of sales for each month of the year.
-
Moving Averages: Calculate a 7-day moving average of daily website visits.
-
Employee Analysis: For each employee, show their salary, the department average, and how much their salary differs from the department average as a percentage.
-
Advanced Challenge: Write a query that identifies products whose price is higher than the average price of its category, and show by what percentage it exceeds that average.
Solutions to these exercises can be found by applying the concepts learned in this tutorial!
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)