Skip to main content

PostgreSQL Window Functions

Introduction

Window functions are one of PostgreSQL's most powerful features, yet many beginners don't fully understand their potential. Unlike regular aggregate functions that group rows into a single output row, window functions perform calculations across a set of table rows that are somehow related to the current row. This allows you to compare values from one row to another without collapsing multiple rows into one.

Window functions are particularly useful when you need to:

  • Calculate running totals or moving averages
  • Rank items within groups
  • Compare a row's value with values from other rows
  • Access both detail and summary data simultaneously

Basic Syntax

The general syntax for a window function is:

sql
SELECT column1, column2,
window_function() OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression [ASC | DESC]]
[frame_clause]
)
FROM table_name;

Let's break down the key components:

  • window_function(): The function to apply (like SUM, AVG, ROW_NUMBER, etc.)
  • OVER: The clause that defines the window
  • PARTITION BY: Divides the result set into partitions (optional)
  • ORDER BY: Defines the logical order of rows within each partition (optional)
  • frame_clause: Defines which rows to include in the window frame (optional)

Common Window Functions

Row Numbering Functions

These functions assign unique numbers to rows within a partition:

  • ROW_NUMBER(): Assigns a unique sequential integer to rows
  • RANK(): Assigns the same rank to ties, with gaps
  • DENSE_RANK(): Assigns the same rank to ties, without gaps
  • NTILE(n): Divides rows into n approximately equal groups

Let's see them in action with an example:

sql
-- Create a sample table
CREATE TABLE employees (
emp_id SERIAL PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(100),
salary NUMERIC(10, 2)
);

-- Insert sample data
INSERT INTO employees (name, department, salary) VALUES
('Alice', 'Engineering', 70000),
('Bob', 'Engineering', 80000),
('Charlie', 'Engineering', 80000),
('David', 'Marketing', 65000),
('Emily', 'Marketing', 90000),
('Frank', 'Finance', 95000),
('Grace', 'Finance', 95000);

-- Using row numbering functions
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;

Result:

namedepartmentsalaryrow_numrankdense_rank
BobEngineering80000.00111
CharlieEngineering80000.00211
AliceEngineering70000.00332
EmilyMarketing90000.00111
DavidMarketing65000.00222
FrankFinance95000.00111
GraceFinance95000.00211

Notice how:

  • ROW_NUMBER() always gives unique numbers, even for ties
  • RANK() gives the same rank for ties but skips the next rank
  • DENSE_RANK() gives the same rank for ties without skipping

Aggregate Window Functions

These functions perform calculations across rows:

  • SUM(): Calculates the sum
  • AVG(): Calculates the average
  • COUNT(): Counts the number of rows
  • MIN(): Finds the minimum value
  • MAX(): Finds the maximum value

Let's see an example of using these functions:

sql
SELECT 
name,
department,
salary,
SUM(salary) OVER (PARTITION BY department) as dept_total,
AVG(salary) OVER (PARTITION BY department) as dept_avg,
salary / SUM(salary) OVER (PARTITION BY department) * 100 as pct_of_dept_total
FROM employees
ORDER BY department, salary DESC;

Result:

namedepartmentsalarydept_totaldept_avgpct_of_dept_total
FrankFinance95000.00190000.0095000.0050.0000000000
GraceFinance95000.00190000.0095000.0050.0000000000
BobEngineering80000.00230000.0076666.6734.7826086957
CharlieEngineering80000.00230000.0076666.6734.7826086957
AliceEngineering70000.00230000.0076666.6730.4347826087
EmilyMarketing90000.00155000.0077500.0058.0645161290
DavidMarketing65000.00155000.0077500.0041.9354838710

This query shows:

  • Total salary for each department
  • Average salary in each department
  • Each employee's salary as a percentage of their department's total

Value Functions

These functions access values from other rows:

  • LAG(): Accesses the value from a previous row
  • LEAD(): Accesses the value from a subsequent row
  • FIRST_VALUE(): Gets the first value in the window frame
  • LAST_VALUE(): Gets the last value in the window frame
  • NTH_VALUE(): Gets the nth value in the window frame

Example:

sql
-- Create a sample table for time series data
CREATE TABLE sales (
sale_date DATE,
amount NUMERIC(10, 2)
);

-- Insert sample data
INSERT INTO sales VALUES
('2023-01-01', 1200),
('2023-01-02', 1500),
('2023-01-03', 900),
('2023-01-04', 1600),
('2023-01-05', 1100),
('2023-01-06', 2000),
('2023-01-07', 1800);

-- Using value functions
SELECT
sale_date,
amount,
LAG(amount, 1) OVER (ORDER BY sale_date) as prev_day_amount,
amount - LAG(amount, 1) OVER (ORDER BY sale_date) as day_over_day_change,
LEAD(amount, 1) OVER (ORDER BY sale_date) as next_day_amount,
FIRST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as first_in_3days,
LAST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) as last_in_3days
FROM sales;

Result (abbreviated for clarity):

sale_dateamountprev_day_amountday_over_day_changenext_day_amountfirst_in_3dayslast_in_3days
2023-01-011200.00NULLNULL1500.001200.00900.00
2023-01-021500.001200.00300.00900.001200.001600.00
2023-01-03900.001500.00-600.001600.001200.001100.00
2023-01-041600.00900.00700.001100.00900.002000.00
.....................

This example shows how you can:

  • Compare today's sales with yesterday's
  • Calculate day-over-day changes
  • View a rolling window of 3 days

Window Frame Clause

The frame clause determines which rows are included in the window for each current row. The syntax is:

sql
ROWS | RANGE BETWEEN frame_start AND frame_end

Where frame_start and frame_end can be:

  • CURRENT ROW: The current row
  • n PRECEDING: n rows before the current row
  • n FOLLOWING: n rows after the current row
  • UNBOUNDED PRECEDING: All rows from the start of the partition
  • UNBOUNDED FOLLOWING: All rows to the end of the partition

Let's see an example of using frame clauses:

sql
-- Calculate running totals and moving averages
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) as running_total,
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as three_day_sum,
AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) as five_day_avg
FROM sales;

Result:

sale_dateamountrunning_totalthree_day_sumfive_day_avg
2023-01-011200.001200.001200.001240.00
2023-01-021500.002700.002700.001300.00
2023-01-03900.003600.003600.001420.00
2023-01-041600.005200.004000.001500.00
2023-01-051100.006300.003600.001480.00
2023-01-062000.008300.004700.001625.00
2023-01-071800.0010100.004900.001633.33

This example shows:

  • Running total of sales over time
  • A 3-day rolling sum (current day plus 2 previous days)
  • A 5-day moving average (2 days before, current day, and 2 days after)

Named Window Definitions

If you need to use the same window definition multiple times, you can define it once using the WINDOW clause:

sql
SELECT 
name,
department,
salary,
ROW_NUMBER() OVER dept_window as row_num,
RANK() OVER dept_window as rank,
DENSE_RANK() OVER dept_window as dense_rank
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC);

This produces the same result as our earlier row numbering example but with cleaner, more maintainable code.

Real-World Applications

Application 1: Finding Top N Per Group

A common requirement is to find the top N items in each group:

sql
-- Find the highest paid employee in each department
WITH ranked_employees AS (
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE rank = 1;

Result:

namedepartmentsalary
BobEngineering80000.00
CharlieEngineering80000.00
EmilyMarketing90000.00
FrankFinance95000.00
GraceFinance95000.00

Application 2: Calculating Percentiles

Window functions can be used to calculate percentiles:

sql
-- Create a larger dataset
CREATE TABLE test_scores (
student_id SERIAL PRIMARY KEY,
subject VARCHAR(100),
score INT
);

-- Insert sample data (abbreviated)
INSERT INTO test_scores (subject, score) VALUES
('Math', 85), ('Math', 92), ('Math', 78), -- more rows...
('Science', 88), ('Science', 76), ('Science', 94), -- more rows...
('History', 81), ('History', 89), ('History', 67); -- more rows...

-- Calculate percentile ranks
SELECT
student_id,
subject,
score,
PERCENT_RANK() OVER (PARTITION BY subject ORDER BY score) * 100 as percentile
FROM test_scores
ORDER BY subject, score DESC;

This helps teachers understand how students perform relative to their peers.

Application 3: Time Series Analysis

Window functions excel at time series analysis:

sql
-- Analyzing stock prices
CREATE TABLE stock_prices (
trading_date DATE,
closing_price NUMERIC(10, 2)
);

-- Calculate various metrics
SELECT
trading_date,
closing_price,
closing_price - LAG(closing_price, 1) OVER (ORDER BY trading_date) as daily_change,
(closing_price / LAG(closing_price, 1) OVER (ORDER BY trading_date) - 1) * 100 as daily_pct_change,
AVG(closing_price) OVER (ORDER BY trading_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as five_day_ma,
MAX(closing_price) OVER (ORDER BY trading_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) as twenty_day_high
FROM stock_prices;

This type of analysis is crucial for finance professionals and algorithmic traders.

Performance Considerations

While window functions are powerful, they can impact performance with large datasets. Keep these tips in mind:

  1. Indexing: Ensure columns used in PARTITION BY and ORDER BY clauses are properly indexed
  2. Limit window size: Use specific frame clauses instead of the default when possible
  3. Materialized views: Consider using materialized views for frequently run window function queries
  4. Common Table Expressions: Break complex window function queries into CTEs for better readability and potentially better performance

Summary

PostgreSQL window functions are an essential tool for advanced data analysis. They allow you to:

  • Perform calculations across rows related to the current row
  • Maintain detail data while accessing aggregate information
  • Simplify complex queries that would otherwise require self-joins or subqueries
  • Handle time series data elegantly
  • Solve common business problems like ranking, percentiles, and running totals

By mastering window functions, you'll be able to write more efficient and powerful queries, extracting deeper insights from your data.

Additional Resources and Exercises

Resources

Exercises

  1. Basic Ranking: Create a table of products with categories and prices. Write a query to find the top 3 most expensive products in each category.

  2. Running Totals: Using the sales table from our examples, calculate a running total of sales for each month of the year.

  3. Moving Averages: For a time series dataset of your choice, calculate a 7-day moving average and compare it to the daily values.

  4. Percentile Analysis: Create a table of student test scores and calculate which quartile each student falls into for each subject.

  5. Gap Detection: Using a sequence of dates or numbers, write a query that identifies gaps in the sequence using window functions.

Remember, mastering window functions takes practice. Start with simple examples and gradually work your way up to more complex scenarios. Happy querying!



If you spot any mistakes on this website, please let me know at feedback@compilenrun.com. I’d greatly appreciate your feedback! :)