PostgreSQL Window Functions
Introduction
Window functions are one of PostgreSQL's most powerful features, yet many beginners don't fully understand their potential. Unlike regular aggregate functions that group rows into a single output row, window functions perform calculations across a set of table rows that are somehow related to the current row. This allows you to compare values from one row to another without collapsing multiple rows into one.
Window functions are particularly useful when you need to:
- Calculate running totals or moving averages
- Rank items within groups
- Compare a row's value with values from other rows
- Access both detail and summary data simultaneously
Basic Syntax
The general syntax for a window function is:
SELECT column1, column2,
window_function() OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression [ASC | DESC]]
[frame_clause]
)
FROM table_name;
Let's break down the key components:
window_function()
: The function to apply (likeSUM
,AVG
,ROW_NUMBER
, etc.)OVER
: The clause that defines the windowPARTITION BY
: Divides the result set into partitions (optional)ORDER BY
: Defines the logical order of rows within each partition (optional)frame_clause
: Defines which rows to include in the window frame (optional)
Common Window Functions
Row Numbering Functions
These functions assign unique numbers to rows within a partition:
ROW_NUMBER()
: Assigns a unique sequential integer to rowsRANK()
: Assigns the same rank to ties, with gapsDENSE_RANK()
: Assigns the same rank to ties, without gapsNTILE(n)
: Divides rows into n approximately equal groups
Let's see them in action with an example:
-- Create a sample table
CREATE TABLE employees (
emp_id SERIAL PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(100),
salary NUMERIC(10, 2)
);
-- Insert sample data
INSERT INTO employees (name, department, salary) VALUES
('Alice', 'Engineering', 70000),
('Bob', 'Engineering', 80000),
('Charlie', 'Engineering', 80000),
('David', 'Marketing', 65000),
('Emily', 'Marketing', 90000),
('Frank', 'Finance', 95000),
('Grace', 'Finance', 95000);
-- Using row numbering functions
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;
Result:
name | department | salary | row_num | rank | dense_rank |
---|---|---|---|---|---|
Bob | Engineering | 80000.00 | 1 | 1 | 1 |
Charlie | Engineering | 80000.00 | 2 | 1 | 1 |
Alice | Engineering | 70000.00 | 3 | 3 | 2 |
Emily | Marketing | 90000.00 | 1 | 1 | 1 |
David | Marketing | 65000.00 | 2 | 2 | 2 |
Frank | Finance | 95000.00 | 1 | 1 | 1 |
Grace | Finance | 95000.00 | 2 | 1 | 1 |
Notice how:
ROW_NUMBER()
always gives unique numbers, even for tiesRANK()
gives the same rank for ties but skips the next rankDENSE_RANK()
gives the same rank for ties without skipping
Aggregate Window Functions
These functions perform calculations across rows:
SUM()
: Calculates the sumAVG()
: Calculates the averageCOUNT()
: Counts the number of rowsMIN()
: Finds the minimum valueMAX()
: Finds the maximum value
Let's see an example of using these functions:
SELECT
name,
department,
salary,
SUM(salary) OVER (PARTITION BY department) as dept_total,
AVG(salary) OVER (PARTITION BY department) as dept_avg,
salary / SUM(salary) OVER (PARTITION BY department) * 100 as pct_of_dept_total
FROM employees
ORDER BY department, salary DESC;
Result:
name | department | salary | dept_total | dept_avg | pct_of_dept_total |
---|---|---|---|---|---|
Frank | Finance | 95000.00 | 190000.00 | 95000.00 | 50.0000000000 |
Grace | Finance | 95000.00 | 190000.00 | 95000.00 | 50.0000000000 |
Bob | Engineering | 80000.00 | 230000.00 | 76666.67 | 34.7826086957 |
Charlie | Engineering | 80000.00 | 230000.00 | 76666.67 | 34.7826086957 |
Alice | Engineering | 70000.00 | 230000.00 | 76666.67 | 30.4347826087 |
Emily | Marketing | 90000.00 | 155000.00 | 77500.00 | 58.0645161290 |
David | Marketing | 65000.00 | 155000.00 | 77500.00 | 41.9354838710 |
This query shows:
- Total salary for each department
- Average salary in each department
- Each employee's salary as a percentage of their department's total
Value Functions
These functions access values from other rows:
LAG()
: Accesses the value from a previous rowLEAD()
: Accesses the value from a subsequent rowFIRST_VALUE()
: Gets the first value in the window frameLAST_VALUE()
: Gets the last value in the window frameNTH_VALUE()
: Gets the nth value in the window frame
Example:
-- Create a sample table for time series data
CREATE TABLE sales (
sale_date DATE,
amount NUMERIC(10, 2)
);
-- Insert sample data
INSERT INTO sales VALUES
('2023-01-01', 1200),
('2023-01-02', 1500),
('2023-01-03', 900),
('2023-01-04', 1600),
('2023-01-05', 1100),
('2023-01-06', 2000),
('2023-01-07', 1800);
-- Using value functions
SELECT
sale_date,
amount,
LAG(amount, 1) OVER (ORDER BY sale_date) as prev_day_amount,
amount - LAG(amount, 1) OVER (ORDER BY sale_date) as day_over_day_change,
LEAD(amount, 1) OVER (ORDER BY sale_date) as next_day_amount,
FIRST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as first_in_3days,
LAST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) as last_in_3days
FROM sales;
Result (abbreviated for clarity):
sale_date | amount | prev_day_amount | day_over_day_change | next_day_amount | first_in_3days | last_in_3days |
---|---|---|---|---|---|---|
2023-01-01 | 1200.00 | NULL | NULL | 1500.00 | 1200.00 | 900.00 |
2023-01-02 | 1500.00 | 1200.00 | 300.00 | 900.00 | 1200.00 | 1600.00 |
2023-01-03 | 900.00 | 1500.00 | -600.00 | 1600.00 | 1200.00 | 1100.00 |
2023-01-04 | 1600.00 | 900.00 | 700.00 | 1100.00 | 900.00 | 2000.00 |
... | ... | ... | ... | ... | ... | ... |
This example shows how you can:
- Compare today's sales with yesterday's
- Calculate day-over-day changes
- View a rolling window of 3 days
Window Frame Clause
The frame clause determines which rows are included in the window for each current row. The syntax is:
ROWS | RANGE BETWEEN frame_start AND frame_end
Where frame_start
and frame_end
can be:
CURRENT ROW
: The current rown PRECEDING
: n rows before the current rown FOLLOWING
: n rows after the current rowUNBOUNDED PRECEDING
: All rows from the start of the partitionUNBOUNDED FOLLOWING
: All rows to the end of the partition
Let's see an example of using frame clauses:
-- Calculate running totals and moving averages
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) as running_total,
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as three_day_sum,
AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) as five_day_avg
FROM sales;
Result:
sale_date | amount | running_total | three_day_sum | five_day_avg |
---|---|---|---|---|
2023-01-01 | 1200.00 | 1200.00 | 1200.00 | 1240.00 |
2023-01-02 | 1500.00 | 2700.00 | 2700.00 | 1300.00 |
2023-01-03 | 900.00 | 3600.00 | 3600.00 | 1420.00 |
2023-01-04 | 1600.00 | 5200.00 | 4000.00 | 1500.00 |
2023-01-05 | 1100.00 | 6300.00 | 3600.00 | 1480.00 |
2023-01-06 | 2000.00 | 8300.00 | 4700.00 | 1625.00 |
2023-01-07 | 1800.00 | 10100.00 | 4900.00 | 1633.33 |
This example shows:
- Running total of sales over time
- A 3-day rolling sum (current day plus 2 previous days)
- A 5-day moving average (2 days before, current day, and 2 days after)
Named Window Definitions
If you need to use the same window definition multiple times, you can define it once using the WINDOW
clause:
SELECT
name,
department,
salary,
ROW_NUMBER() OVER dept_window as row_num,
RANK() OVER dept_window as rank,
DENSE_RANK() OVER dept_window as dense_rank
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC);
This produces the same result as our earlier row numbering example but with cleaner, more maintainable code.
Real-World Applications
Application 1: Finding Top N Per Group
A common requirement is to find the top N items in each group:
-- Find the highest paid employee in each department
WITH ranked_employees AS (
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE rank = 1;
Result:
name | department | salary |
---|---|---|
Bob | Engineering | 80000.00 |
Charlie | Engineering | 80000.00 |
Emily | Marketing | 90000.00 |
Frank | Finance | 95000.00 |
Grace | Finance | 95000.00 |
Application 2: Calculating Percentiles
Window functions can be used to calculate percentiles:
-- Create a larger dataset
CREATE TABLE test_scores (
student_id SERIAL PRIMARY KEY,
subject VARCHAR(100),
score INT
);
-- Insert sample data (abbreviated)
INSERT INTO test_scores (subject, score) VALUES
('Math', 85), ('Math', 92), ('Math', 78), -- more rows...
('Science', 88), ('Science', 76), ('Science', 94), -- more rows...
('History', 81), ('History', 89), ('History', 67); -- more rows...
-- Calculate percentile ranks
SELECT
student_id,
subject,
score,
PERCENT_RANK() OVER (PARTITION BY subject ORDER BY score) * 100 as percentile
FROM test_scores
ORDER BY subject, score DESC;
This helps teachers understand how students perform relative to their peers.
Application 3: Time Series Analysis
Window functions excel at time series analysis:
-- Analyzing stock prices
CREATE TABLE stock_prices (
trading_date DATE,
closing_price NUMERIC(10, 2)
);
-- Calculate various metrics
SELECT
trading_date,
closing_price,
closing_price - LAG(closing_price, 1) OVER (ORDER BY trading_date) as daily_change,
(closing_price / LAG(closing_price, 1) OVER (ORDER BY trading_date) - 1) * 100 as daily_pct_change,
AVG(closing_price) OVER (ORDER BY trading_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as five_day_ma,
MAX(closing_price) OVER (ORDER BY trading_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) as twenty_day_high
FROM stock_prices;
This type of analysis is crucial for finance professionals and algorithmic traders.
Performance Considerations
While window functions are powerful, they can impact performance with large datasets. Keep these tips in mind:
- Indexing: Ensure columns used in
PARTITION BY
andORDER BY
clauses are properly indexed - Limit window size: Use specific frame clauses instead of the default when possible
- Materialized views: Consider using materialized views for frequently run window function queries
- Common Table Expressions: Break complex window function queries into CTEs for better readability and potentially better performance
Summary
PostgreSQL window functions are an essential tool for advanced data analysis. They allow you to:
- Perform calculations across rows related to the current row
- Maintain detail data while accessing aggregate information
- Simplify complex queries that would otherwise require self-joins or subqueries
- Handle time series data elegantly
- Solve common business problems like ranking, percentiles, and running totals
By mastering window functions, you'll be able to write more efficient and powerful queries, extracting deeper insights from your data.
Additional Resources and Exercises
Resources
Exercises
-
Basic Ranking: Create a table of products with categories and prices. Write a query to find the top 3 most expensive products in each category.
-
Running Totals: Using the sales table from our examples, calculate a running total of sales for each month of the year.
-
Moving Averages: For a time series dataset of your choice, calculate a 7-day moving average and compare it to the daily values.
-
Percentile Analysis: Create a table of student test scores and calculate which quartile each student falls into for each subject.
-
Gap Detection: Using a sequence of dates or numbers, write a query that identifies gaps in the sequence using window functions.
Remember, mastering window functions takes practice. Start with simple examples and gradually work your way up to more complex scenarios. Happy querying!
If you spot any mistakes on this website, please let me know at feedback@compilenrun.com. I’d greatly appreciate your feedback! :)