Skip to main content

MySQL Window Functions

Introduction

Window functions are a powerful feature in MySQL (introduced in version 8.0) that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions that collapse results into a single output row, window functions maintain the original table rows while adding calculated fields based on the specified "window" of rows.

Think of a window function as a way to look through a "window" of rows and calculate values based on what you see, without changing the number of rows in your result set. This makes window functions incredibly useful for analytical queries, running totals, moving averages, and ranking operations.

Basic Concepts of Window Functions

The general syntax for a window function looks like this:

sql
function_name() OVER (
[PARTITION BY column_list]
[ORDER BY column_list]
[frame_clause]
)

Let's break down these components:

  • function_name(): The function to apply (like SUM, AVG, ROW_NUMBER, etc.)
  • OVER: The keyword that identifies this as a window function
  • PARTITION BY: Optional - divides the result set into partitions (groups)
  • ORDER BY: Optional - defines the logical order of rows within each partition
  • frame_clause: Optional - further refines which rows are included in the window calculation

Common Window Functions in MySQL

1. ROW_NUMBER()

ROW_NUMBER() assigns a sequential integer to each row within the partition.

sql
SELECT 
product_name,
category,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM products;

Example result:

| product_name   | category      | price  | price_rank |
|----------------|---------------|--------|------------|
| Premium Laptop | Electronics | 1200.00| 1 |
| Basic Laptop | Electronics | 800.00 | 2 |
| Tablet | Electronics | 400.00 | 3 |
| Designer Jeans | Clothing | 120.00 | 1 |
| Winter Jacket | Clothing | 95.00 | 2 |
| T-shirt | Clothing | 25.00 | 3 |

In this example, ROW_NUMBER() starts over at 1 for each category, giving us the rank of products by price within each category.

2. RANK() and DENSE_RANK()

These functions assign rankings, but handle ties differently:

  • RANK() leaves gaps after ties (like 1, 1, 3, 4)
  • DENSE_RANK() doesn't leave gaps after ties (like 1, 1, 2, 3)
sql
SELECT 
student_name,
score,
RANK() OVER (ORDER BY score DESC) AS normal_rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM exam_results;

Example result:

| student_name | score | normal_rank | dense_rank |
|--------------|-------|-------------|------------|
| Alice | 95 | 1 | 1 |
| Bob | 95 | 1 | 1 |
| Charlie | 92 | 3 | 2 |
| Diana | 88 | 4 | 3 |
| Eve | 88 | 4 | 3 |
| Frank | 85 | 6 | 4 |

Notice how normal_rank skips from 1 to 3 after the tie, while dense_rank continues with 2.

3. Aggregate Window Functions

You can use standard aggregate functions as window functions:

sql
SELECT 
order_date,
order_id,
order_amount,
SUM(order_amount) OVER (ORDER BY order_date) AS running_total,
AVG(order_amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3days
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-01-10';

Example result:

| order_date | order_id | order_amount | running_total | moving_avg_3days |
|------------|----------|--------------|---------------|------------------|
| 2023-01-01 | 1001 | 120.50 | 120.50 | 120.50 |
| 2023-01-02 | 1002 | 95.25 | 215.75 | 107.88 |
| 2023-01-03 | 1003 | 200.00 | 415.75 | 138.58 |
| 2023-01-05 | 1004 | 150.75 | 566.50 | 148.67 |
| 2023-01-07 | 1005 | 85.00 | 651.50 | 145.25 |
| 2023-01-09 | 1006 | 175.25 | 826.75 | 137.00 |

This example shows both a running total of order amounts and a 3-day moving average.

4. NTILE() Function

NTILE() divides the rows into a specified number of approximately equal groups:

sql
SELECT 
product_name,
price,
NTILE(4) OVER (ORDER BY price) AS price_quartile
FROM products;

Example result:

| product_name   | price   | price_quartile |
|----------------|---------|----------------|
| T-shirt | 25.00 | 1 |
| Socks | 10.00 | 1 |
| Winter Jacket | 95.00 | 2 |
| Designer Jeans | 120.00 | 2 |
| Tablet | 400.00 | 3 |
| Basic Laptop | 800.00 | 3 |
| Premium Laptop | 1200.00 | 4 |
| Smartphone | 900.00 | 4 |

This assigns products to price quartiles (quarters).

5. LAG() and LEAD() Functions

These functions let you access data from previous or subsequent rows without using self-joins:

sql
SELECT 
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS previous_month_revenue,
LEAD(revenue, 1) OVER (ORDER BY month) AS next_month_revenue
FROM monthly_sales;

Example result:

| month     | revenue | previous_month_revenue | next_month_revenue |
|-----------|---------|------------------------|-------------------|
| 2023-01 | 45000 | NULL | 52000 |
| 2023-02 | 52000 | 45000 | 48500 |
| 2023-03 | 48500 | 52000 | 61000 |
| 2023-04 | 61000 | 48500 | 57500 |
| 2023-05 | 57500 | 61000 | NULL |

Window Frame Specifications

A frame clause defines which rows are included in the window for each calculation. It's specified using:

sql
{ROWS | RANGE} {frame_start | BETWEEN frame_start AND frame_end}

Common frame specifications include:

  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: All rows from the start of the partition up to current row
  • ROWS BETWEEN n PRECEDING AND CURRENT ROW: Current row and the n previous rows
  • ROWS BETWEEN CURRENT ROW AND n FOLLOWING: Current row and the next n rows
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: All rows in the partition

Let's see an example:

sql
SELECT 
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_average_4days
FROM daily_sales;

Example result:

| sale_date  | amount | running_total | moving_average_4days |
|------------|--------|---------------|----------------------|
| 2023-05-01 | 120 | 120 | 120.00 |
| 2023-05-02 | 85 | 205 | 102.50 |
| 2023-05-03 | 95 | 300 | 100.00 |
| 2023-05-04 | 105 | 405 | 101.25 |
| 2023-05-05 | 150 | 555 | 108.75 |
| 2023-05-06 | 110 | 665 | 115.00 |

Real-world Application Examples

Example 1: Sales Analysis Dashboard

Imagine you're building a sales dashboard and need to show:

  1. Monthly sales
  2. Year-to-date (YTD) sales
  3. Comparison with previous month
  4. Rank of months by sales
sql
SELECT 
DATE_FORMAT(sale_date, '%Y-%m') AS month,
SUM(sale_amount) AS monthly_sales,
SUM(SUM(sale_amount)) OVER (
PARTITION BY YEAR(sale_date)
ORDER BY DATE_FORMAT(sale_date, '%Y-%m')
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS ytd_sales,
LAG(SUM(sale_amount), 1) OVER (
ORDER BY DATE_FORMAT(sale_date, '%Y-%m')
) AS prev_month_sales,
RANK() OVER (
PARTITION BY YEAR(sale_date)
ORDER BY SUM(sale_amount) DESC
) AS month_rank
FROM sales
GROUP BY DATE_FORMAT(sale_date, '%Y-%m');

Example 2: Employee Salary Analysis

HR might want to analyze salaries to check for fairness:

sql
SELECT 
e.employee_id,
e.department,
e.job_title,
e.salary,
AVG(e.salary) OVER (
PARTITION BY e.department
) AS dept_avg_salary,
e.salary - AVG(e.salary) OVER (
PARTITION BY e.department
) AS diff_from_dept_avg,
PERCENT_RANK() OVER (
PARTITION BY e.department
ORDER BY e.salary
) AS percentile_in_dept
FROM employees e;

Example 3: Website User Engagement Metrics

For tracking user engagement over time:

sql
SELECT 
user_id,
event_date,
actions_count,
SUM(actions_count) OVER (
PARTITION BY user_id
ORDER BY event_date
) AS cumulative_actions,
AVG(actions_count) OVER (
PARTITION BY user_id
ORDER BY event_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_avg
FROM user_activity;

Best Practices and Performance Considerations

  1. Place indexes wisely: Window functions often use ORDER BY, so having indexes on those columns can improve performance.

  2. Consider materialization: For complex reports, consider materializing intermediate results in temporary tables if the same window calculations are used multiple times.

  3. Limit window size: When possible, use a specific frame clause rather than an unbounded one to limit the amount of data processed.

  4. Use CTEs for readability: Common Table Expressions can make queries with window functions more readable:

sql
WITH sales_with_windows AS (
SELECT
sale_date,
product_id,
amount,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY amount DESC) AS price_rank
FROM sales
)
SELECT * FROM sales_with_windows WHERE price_rank <= 3;
  1. Mind the order: Window functions are processed after most other parts of the query, including WHERE, GROUP BY, and HAVING clauses.

Summary

MySQL window functions provide powerful capabilities for analytical queries that were previously difficult to express in SQL. They allow you to:

  • Calculate rankings within groups
  • Compute running totals and moving averages
  • Compare values across rows without complex self-joins
  • Divide data into equal groups (quartiles, deciles, etc.)
  • Analyze trends and patterns over time

These functions open up many possibilities for data analysis directly within your SQL queries, reducing the need to export data to external tools for analysis.

Additional Resources

To strengthen your understanding of MySQL window functions, try these exercises:

  1. Calculate the running total of sales by month for each product category.
  2. Find the salary difference between each employee and the next highest-paid employee in the same department.
  3. Identify customers who have spent more than the average amount for their region in each quarter.
  4. Calculate a 3-month moving average of product sales.
  5. Rank products by sales volume within their category, and only show the top 3 in each category.

Further Reading

With these powerful tools in your MySQL toolkit, you'll be able to perform sophisticated data analysis directly within your database queries!



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)