Window Functions
Introduction
Window functions are a powerful feature in SQL that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions that group rows into a single output row, window functions retain all rows in the result while adding calculated values based on a specified "window" of rows.
Think of window functions as giving you the ability to look through a "window" of related rows while processing each row individually. This enables complex analytical queries that would otherwise require multiple self-joins or subqueries.
Why Use Window Functions?
Window functions solve common data analysis problems like:
- Calculating running totals or moving averages
- Ranking rows within partitions of data
- Comparing values to previous or following rows
- Determining percentiles within groups
Before window functions, these calculations required complex, inefficient SQL code. Window functions make these operations simpler, more readable, and more efficient.
Basic Syntax
The general syntax for window functions looks like this:
SELECT
column1,
column2,
WINDOW_FUNCTION() OVER (
[PARTITION BY column_list]
[ORDER BY column_list]
[frame_clause]
) AS new_column
FROM table_name;
Let's break down the key components:
WINDOW_FUNCTION()
: The actual function being applied (e.g.,SUM()
,AVG()
,ROW_NUMBER()
)OVER
: Indicates we're using a window functionPARTITION BY
: Divides rows into groups (optional)ORDER BY
: Defines the logical order of rows within each partition (optional)frame_clause
: Specifies which rows to include in the window (optional)
Common Window Functions
Ranking Functions
ROW_NUMBER()
Assigns a unique sequential integer to each row within a partition.
SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
Example Output:
employee_name | department | salary | rank |
---|---|---|---|
Alice | IT | 85000 | 1 |
Bob | IT | 78000 | 2 |
Charlie | IT | 65000 | 3 |
Diana | HR | 72000 | 1 |
Eve | HR | 68000 | 2 |
Frank | HR | 62000 | 3 |
In this example, each employee is ranked by salary within their department.
RANK() and DENSE_RANK()
While ROW_NUMBER()
always gives sequential numbers, RANK()
and DENSE_RANK()
handle ties differently:
SELECT
student_name,
score,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM test_scores;
Example Output:
student_name | score | rank | dense_rank |
---|---|---|---|
Alice | 95 | 1 | 1 |
Bob | 95 | 1 | 1 |
Charlie | 90 | 3 | 2 |
Diana | 85 | 4 | 3 |
Eve | 85 | 4 | 3 |
Frank | 80 | 6 | 4 |
Notice how:
RANK()
leaves gaps after ties (1,1,3,4,4,6)DENSE_RANK()
doesn't leave gaps (1,1,2,3,3,4)
NTILE()
Divides rows into a specified number of approximately equal groups:
SELECT
student_name,
score,
NTILE(4) OVER (ORDER BY score DESC) AS quartile
FROM test_scores;
Example Output:
student_name | score | quartile |
---|---|---|
Alice | 95 | 1 |
Bob | 95 | 1 |
Charlie | 90 | 2 |
Diana | 85 | 2 |
Eve | 85 | 3 |
Frank | 80 | 3 |
Grace | 75 | 4 |
Heidi | 70 | 4 |
This divides students into quartiles based on their scores.
Aggregate Window Functions
SUM(), AVG(), COUNT(), MIN(), MAX()
These familiar aggregate functions can be used as window functions:
SELECT
order_date,
order_amount,
SUM(order_amount) OVER (ORDER BY order_date) AS running_total,
AVG(order_amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average_3day
FROM orders;
Example Output:
order_date | order_amount | running_total | moving_average_3day |
---|---|---|---|
2023-01-01 | 100 | 100 | 100.00 |
2023-01-02 | 150 | 250 | 125.00 |
2023-01-03 | 200 | 450 | 150.00 |
2023-01-04 | 120 | 570 | 156.67 |
2023-01-05 | 180 | 750 | 166.67 |
Here we calculate:
- A running total of order amounts
- A 3-day moving average (current day and 2 preceding days)
Value Functions
LAG() and LEAD()
These functions access data from previous or subsequent rows:
SELECT
stock_date,
closing_price,
LAG(closing_price, 1) OVER (ORDER BY stock_date) AS previous_day_price,
closing_price - LAG(closing_price, 1) OVER (ORDER BY stock_date) AS price_change
FROM stock_prices;
Example Output:
stock_date | closing_price | previous_day_price | price_change |
---|---|---|---|
2023-01-01 | 150.25 | NULL | NULL |
2023-01-02 | 152.30 | 150.25 | 2.05 |
2023-01-03 | 148.75 | 152.30 | -3.55 |
2023-01-04 | 153.50 | 148.75 | 4.75 |
2023-01-05 | 155.20 | 153.50 | 1.70 |
Similarly, LEAD()
lets you access subsequent rows:
SELECT
product_name,
release_date,
price,
LEAD(price, 1) OVER (PARTITION BY product_category ORDER BY release_date) AS next_model_price
FROM products;
FIRST_VALUE() and LAST_VALUE()
These functions get the first or last value in a window frame:
SELECT
department,
employee_name,
salary,
FIRST_VALUE(employee_name) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_paid_employee
FROM employees;
Window Frame Specification
Window frames let you control exactly which rows are included in your window:
SELECT
order_date,
order_amount,
AVG(order_amount) OVER (
ORDER BY order_date
ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING
) AS centered_average
FROM orders;
Frame clauses include:
ROWS BETWEEN <start> AND <end>
: Based on physical positionsRANGE BETWEEN <start> AND <end>
: Based on logical values
Where <start>
and <end>
can be:
UNBOUNDED PRECEDING
: Start from the first row of the partitionn PRECEDING
: n rows before the current rowCURRENT ROW
: The current rown FOLLOWING
: n rows after the current rowUNBOUNDED FOLLOWING
: End at the last row of the partition
Practical Examples
Example 1: Sales Analysis
Imagine you have a sales table and want to analyze performance:
SELECT
salesperson,
sale_date,
sale_amount,
SUM(sale_amount) OVER (PARTITION BY salesperson) AS total_sales_by_person,
SUM(sale_amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total,
ROUND(
sale_amount / SUM(sale_amount) OVER (PARTITION BY salesperson) * 100,
2
) AS percentage_of_total
FROM sales
ORDER BY salesperson, sale_date;
Example Output:
salesperson | sale_date | sale_amount | total_sales_by_person | running_total | percentage_of_total |
---|---|---|---|---|---|
Alice | 2023-01-05 | 1200 | 5400 | 1200 | 22.22 |
Alice | 2023-01-12 | 800 | 5400 | 2000 | 14.81 |
Alice | 2023-01-18 | 1500 | 5400 | 3500 | 27.78 |
Alice | 2023-01-25 | 1900 | 5400 | 5400 | 35.19 |
Bob | 2023-01-03 | 950 | 4100 | 950 | 23.17 |
Bob | 2023-01-09 | 1300 | 4100 | 2250 | 31.71 |
Bob | 2023-01-20 | 1850 | 4100 | 4100 | 45.12 |
This query gives each salesperson's:
- Total sales for the month
- Running total as the month progresses
- Each sale as a percentage of their monthly total
Example 2: Identifying Outliers
You can use window functions to identify values that deviate significantly from group averages:
SELECT
product_category,
product_name,
price,
AVG(price) OVER (PARTITION BY product_category) AS avg_category_price,
ROUND(
(price - AVG(price) OVER (PARTITION BY product_category)) /
AVG(price) OVER (PARTITION BY product_category) * 100,
2
) AS percentage_diff_from_avg,
CASE
WHEN price > 1.5 * AVG(price) OVER (PARTITION BY product_category) THEN 'Expensive'
WHEN price < 0.75 * AVG(price) OVER (PARTITION BY product_category) THEN 'Bargain'
ELSE 'Average Price'
END AS price_category
FROM products;
This query helps identify products that are unusually expensive or cheap within their category.
Example 3: Time Series Analysis
Window functions are powerful for time series analysis:
SELECT
reading_date,
temperature,
AVG(temperature) OVER (
ORDER BY reading_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_avg,
temperature - LAG(temperature, 7) OVER (ORDER BY reading_date) AS change_from_week_ago,
MAX(temperature) OVER (
ORDER BY reading_date
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
) AS month_high
FROM weather_readings
ORDER BY reading_date;
This query analyzes temperature data with:
- A 7-day moving average
- Day-over-day change
- Rolling 30-day high temperature
Database Support and Differences
Window functions are supported in many database systems, but with variations:
- PostgreSQL: Excellent support for all standard window functions
- MySQL: Added full support in version 8.0
- SQL Server: Strong support since SQL Server 2012
- Oracle: Supports window functions through the OVER clause
- SQLite: Added support in version 3.25.0
Check your specific database documentation for implementation details.
Common Mistakes and Pitfalls
Confusing PARTITION BY with GROUP BY
PARTITION BY
divides rows for calculation but keeps all rows in output, while GROUP BY
consolidates rows into one output row per group.
Incorrect Frame Specifications
If not specified, the default frame for ordered windows is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
, which can lead to unexpected results, especially with duplicates.
Performance Considerations
Complex window functions can be resource-intensive. Consider:
- Adding appropriate indexes for columns used in
PARTITION BY
andORDER BY
- Limiting the data set before applying window functions
- Breaking complex queries into temporary tables or CTEs
Summary
Window functions are a powerful SQL feature that enable complex analytical queries with elegant, readable syntax. They allow you to:
- Calculate aggregates while preserving individual rows
- Perform ranking and row numbering operations
- Access data from related rows without complex joins
- Create running totals, moving averages, and other time-series calculations
By mastering window functions, you'll write more efficient queries and unlock advanced data analysis capabilities in your database applications.
Exercises
- Write a query to rank products by price within each category.
- Calculate a 3-month moving average of monthly sales.
- For each employee, show their salary, department average, and how much they earn relative to their department's average (as a percentage).
- Identify the top 3 customers by purchase amount in each region.
- Calculate quarter-over-quarter growth in revenue for each product line.
Additional Resources
- PostgreSQL Window Functions Documentation
- SQL Server Window Functions
- "SQL Window Functions Explained" on Mode Analytics Blog
- "Practical SQL" by Anthony DeBarros (Chapter on Window Functions)
- Interactive SQL exercises on window functions on popular learning platforms
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)