PostgreSQL Window Functions

Introduction

Window functions are one of PostgreSQL's most powerful features, yet many beginners don't fully understand their potential. Unlike regular aggregate functions that group rows into a single output row, window functions perform calculations across a set of table rows that are somehow related to the current row. This allows you to compare values from one row to another without collapsing multiple rows into one.

Window functions are particularly useful when you need to:

Calculate running totals or moving averages
Rank items within groups
Compare a row's value with values from other rows
Access both detail and summary data simultaneously

Basic Syntax

The general syntax for a window function is:

sql
SELECT column1, column2,
       window_function() OVER (
         [PARTITION BY partition_expression]
         [ORDER BY sort_expression [ASC | DESC]]
         [frame_clause]
       )
FROM table_name;

Let's break down the key components:

window_function(): The function to apply (like SUM, AVG, ROW_NUMBER, etc.)
OVER: The clause that defines the window
PARTITION BY: Divides the result set into partitions (optional)
ORDER BY: Defines the logical order of rows within each partition (optional)
frame_clause: Defines which rows to include in the window frame (optional)

Common Window Functions

Row Numbering Functions

These functions assign unique numbers to rows within a partition:

ROW_NUMBER(): Assigns a unique sequential integer to rows
RANK(): Assigns the same rank to ties, with gaps
DENSE_RANK(): Assigns the same rank to ties, without gaps
NTILE(n): Divides rows into n approximately equal groups

Let's see them in action with an example:

sql
-- Create a sample table
CREATE TABLE employees (
  emp_id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  department VARCHAR(100),
  salary NUMERIC(10, 2)
);

-- Insert sample data
INSERT INTO employees (name, department, salary) VALUES
  ('Alice', 'Engineering', 70000),
  ('Bob', 'Engineering', 80000),
  ('Charlie', 'Engineering', 80000),
  ('David', 'Marketing', 65000),
  ('Emily', 'Marketing', 90000),
  ('Frank', 'Finance', 95000),
  ('Grace', 'Finance', 95000);

-- Using row numbering functions
SELECT 
  name,
  department,
  salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;

Result:

name	department	salary	row_num	rank	dense_rank
Bob	Engineering	80000.00	1	1	1
Charlie	Engineering	80000.00	2	1	1
Alice	Engineering	70000.00	3	3	2
Emily	Marketing	90000.00	1	1	1
David	Marketing	65000.00	2	2	2
Frank	Finance	95000.00	1	1	1
Grace	Finance	95000.00	2	1	1

Notice how:

ROW_NUMBER() always gives unique numbers, even for ties
RANK() gives the same rank for ties but skips the next rank
DENSE_RANK() gives the same rank for ties without skipping

Aggregate Window Functions

These functions perform calculations across rows:

SUM(): Calculates the sum
AVG(): Calculates the average
COUNT(): Counts the number of rows
MIN(): Finds the minimum value
MAX(): Finds the maximum value

Let's see an example of using these functions:

sql
SELECT 
  name,
  department,
  salary,
  SUM(salary) OVER (PARTITION BY department) as dept_total,
  AVG(salary) OVER (PARTITION BY department) as dept_avg,
  salary / SUM(salary) OVER (PARTITION BY department) * 100 as pct_of_dept_total
FROM employees
ORDER BY department, salary DESC;

Result:

name	department	salary	dept_total	dept_avg	pct_of_dept_total
Frank	Finance	95000.00	190000.00	95000.00	50.0000000000
Grace	Finance	95000.00	190000.00	95000.00	50.0000000000
Bob	Engineering	80000.00	230000.00	76666.67	34.7826086957
Charlie	Engineering	80000.00	230000.00	76666.67	34.7826086957
Alice	Engineering	70000.00	230000.00	76666.67	30.4347826087
Emily	Marketing	90000.00	155000.00	77500.00	58.0645161290
David	Marketing	65000.00	155000.00	77500.00	41.9354838710

This query shows:

Total salary for each department
Average salary in each department
Each employee's salary as a percentage of their department's total

Value Functions

These functions access values from other rows:

LAG(): Accesses the value from a previous row
LEAD(): Accesses the value from a subsequent row
FIRST_VALUE(): Gets the first value in the window frame
LAST_VALUE(): Gets the last value in the window frame
NTH_VALUE(): Gets the nth value in the window frame

Example:

sql
-- Create a sample table for time series data
CREATE TABLE sales (
  sale_date DATE,
  amount NUMERIC(10, 2)
);

-- Insert sample data
INSERT INTO sales VALUES
  ('2023-01-01', 1200),
  ('2023-01-02', 1500),
  ('2023-01-03', 900),
  ('2023-01-04', 1600),
  ('2023-01-05', 1100),
  ('2023-01-06', 2000),
  ('2023-01-07', 1800);

-- Using value functions
SELECT 
  sale_date,
  amount,
  LAG(amount, 1) OVER (ORDER BY sale_date) as prev_day_amount,
  amount - LAG(amount, 1) OVER (ORDER BY sale_date) as day_over_day_change,
  LEAD(amount, 1) OVER (ORDER BY sale_date) as next_day_amount,
  FIRST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as first_in_3days,
  LAST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) as last_in_3days
FROM sales;

Result (abbreviated for clarity):

sale_date	amount	prev_day_amount	day_over_day_change	next_day_amount	first_in_3days	last_in_3days
2023-01-01	1200.00	NULL	NULL	1500.00	1200.00	900.00
2023-01-02	1500.00	1200.00	300.00	900.00	1200.00	1600.00
2023-01-03	900.00	1500.00	-600.00	1600.00	1200.00	1100.00
2023-01-04	1600.00	900.00	700.00	1100.00	900.00	2000.00
...	...	...	...	...	...	...

This example shows how you can:

Compare today's sales with yesterday's
Calculate day-over-day changes
View a rolling window of 3 days

Window Frame Clause

The frame clause determines which rows are included in the window for each current row. The syntax is:

sql
ROWS | RANGE BETWEEN frame_start AND frame_end

Where frame_start and frame_end can be:

CURRENT ROW: The current row
n PRECEDING: n rows before the current row
n FOLLOWING: n rows after the current row
UNBOUNDED PRECEDING: All rows from the start of the partition
UNBOUNDED FOLLOWING: All rows to the end of the partition

Let's see an example of using frame clauses:

sql
-- Calculate running totals and moving averages
SELECT 
  sale_date,
  amount,
  SUM(amount) OVER (ORDER BY sale_date) as running_total,
  SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as three_day_sum,
  AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) as five_day_avg
FROM sales;

Result:

sale_date	amount	running_total	three_day_sum	five_day_avg
2023-01-01	1200.00	1200.00	1200.00	1240.00
2023-01-02	1500.00	2700.00	2700.00	1300.00
2023-01-03	900.00	3600.00	3600.00	1420.00
2023-01-04	1600.00	5200.00	4000.00	1500.00
2023-01-05	1100.00	6300.00	3600.00	1480.00
2023-01-06	2000.00	8300.00	4700.00	1625.00
2023-01-07	1800.00	10100.00	4900.00	1633.33

This example shows:

Running total of sales over time
A 3-day rolling sum (current day plus 2 previous days)
A 5-day moving average (2 days before, current day, and 2 days after)

Named Window Definitions

If you need to use the same window definition multiple times, you can define it once using the WINDOW clause:

sql
SELECT 
  name,
  department,
  salary,
  ROW_NUMBER() OVER dept_window as row_num,
  RANK() OVER dept_window as rank,
  DENSE_RANK() OVER dept_window as dense_rank
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC);

This produces the same result as our earlier row numbering example but with cleaner, more maintainable code.

Real-World Applications

Application 1: Finding Top N Per Group

A common requirement is to find the top N items in each group:

sql
-- Find the highest paid employee in each department
WITH ranked_employees AS (
  SELECT 
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
  FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE rank = 1;

Result:

name	department	salary
Bob	Engineering	80000.00
Charlie	Engineering	80000.00
Emily	Marketing	90000.00
Frank	Finance	95000.00
Grace	Finance	95000.00

Application 2: Calculating Percentiles

Window functions can be used to calculate percentiles:

sql
-- Create a larger dataset
CREATE TABLE test_scores (
  student_id SERIAL PRIMARY KEY,
  subject VARCHAR(100),
  score INT
);

-- Insert sample data (abbreviated)
INSERT INTO test_scores (subject, score) VALUES
  ('Math', 85), ('Math', 92), ('Math', 78), -- more rows...
  ('Science', 88), ('Science', 76), ('Science', 94), -- more rows...
  ('History', 81), ('History', 89), ('History', 67); -- more rows...

-- Calculate percentile ranks
SELECT 
  student_id,
  subject,
  score,
  PERCENT_RANK() OVER (PARTITION BY subject ORDER BY score) * 100 as percentile
FROM test_scores
ORDER BY subject, score DESC;

This helps teachers understand how students perform relative to their peers.

Application 3: Time Series Analysis

Window functions excel at time series analysis:

sql
-- Analyzing stock prices
CREATE TABLE stock_prices (
  trading_date DATE,
  closing_price NUMERIC(10, 2)
);

-- Calculate various metrics
SELECT 
  trading_date,
  closing_price,
  closing_price - LAG(closing_price, 1) OVER (ORDER BY trading_date) as daily_change,
  (closing_price / LAG(closing_price, 1) OVER (ORDER BY trading_date) - 1) * 100 as daily_pct_change,
  AVG(closing_price) OVER (ORDER BY trading_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as five_day_ma,
  MAX(closing_price) OVER (ORDER BY trading_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) as twenty_day_high
FROM stock_prices;

This type of analysis is crucial for finance professionals and algorithmic traders.

Performance Considerations

While window functions are powerful, they can impact performance with large datasets. Keep these tips in mind:

Indexing: Ensure columns used in PARTITION BY and ORDER BY clauses are properly indexed
Limit window size: Use specific frame clauses instead of the default when possible
Materialized views: Consider using materialized views for frequently run window function queries
Common Table Expressions: Break complex window function queries into CTEs for better readability and potentially better performance

Summary

PostgreSQL window functions are an essential tool for advanced data analysis. They allow you to:

Perform calculations across rows related to the current row
Maintain detail data while accessing aggregate information
Simplify complex queries that would otherwise require self-joins or subqueries
Handle time series data elegantly
Solve common business problems like ranking, percentiles, and running totals

By mastering window functions, you'll be able to write more efficient and powerful queries, extracting deeper insights from your data.

Additional Resources and Exercises

Resources

Exercises

Basic Ranking: Create a table of products with categories and prices. Write a query to find the top 3 most expensive products in each category.
Running Totals: Using the sales table from our examples, calculate a running total of sales for each month of the year.
Moving Averages: For a time series dataset of your choice, calculate a 7-day moving average and compare it to the daily values.
Percentile Analysis: Create a table of student test scores and calculate which quartile each student falls into for each subject.
Gap Detection: Using a sequence of dates or numbers, write a query that identifies gaps in the sequence using window functions.

Remember, mastering window functions takes practice. Start with simple examples and gradually work your way up to more complex scenarios. Happy querying!

If you spot any mistakes on this website, please let me know at feedback@compilenrun.com. I’d greatly appreciate your feedback! :)

Introduction​

Basic Syntax​

Common Window Functions​

Row Numbering Functions​

Aggregate Window Functions​

Value Functions​

Window Frame Clause​

Named Window Definitions​

Real-World Applications​

Application 1: Finding Top N Per Group​

Application 2: Calculating Percentiles​

Application 3: Time Series Analysis​

Performance Considerations​

Summary​

Additional Resources and Exercises​

Resources​

Exercises​

Introduction

Basic Syntax

Common Window Functions

Row Numbering Functions

Aggregate Window Functions

Value Functions

Window Frame Clause

Named Window Definitions

Real-World Applications

Application 1: Finding Top N Per Group

Application 2: Calculating Percentiles

Application 3: Time Series Analysis

Performance Considerations

Summary

Additional Resources and Exercises

Resources

Exercises