MySQL Aggregate Functions

Introduction

Aggregate functions are a powerful feature in MySQL that allow you to perform calculations on sets of values. Instead of operating on individual rows, aggregate functions work across multiple rows to calculate summary statistics. These functions are essential for data analysis, reporting, and extracting meaningful insights from your databases.

In this tutorial, we'll explore the most commonly used MySQL aggregate functions, understand their syntax, and see how they can be applied in real-world scenarios.

Core Aggregate Functions in MySQL

MySQL provides several built-in aggregate functions that help you analyze and summarize your data efficiently. Let's explore the most important ones:

1. COUNT() Function

The COUNT() function returns the number of rows that match a specified criterion.

Syntax

sql
COUNT(expression)

Examples

Count all rows in a table:

sql
SELECT COUNT(*) FROM employees;

Output:

+----------+
| COUNT(*) |
+----------+
|      100 |
+----------+

Count non-NULL values in a specific column:

sql
SELECT COUNT(email) FROM employees;

Output:

+--------------+
| COUNT(email) |
+--------------+
|           98 |
+--------------+

Count distinct values:

sql
SELECT COUNT(DISTINCT department) FROM employees;

Output:

+-------------------------+
| COUNT(DISTINCT department) |
+-------------------------+
|                       5 |
+-------------------------+

2. SUM() Function

The SUM() function calculates the total of all values in a numeric column.

Syntax

sql
SUM(expression)

Examples

Calculate total salary of all employees:

sql
SELECT SUM(salary) AS total_salary FROM employees;

Output:

+--------------+
| total_salary |
+--------------+
|     5750000  |
+--------------+

Calculate total salary by department:

sql
SELECT department, SUM(salary) AS department_salary 
FROM employees
GROUP BY department;

Output:

+------------+------------------+
| department | department_salary|
+------------+------------------+
| IT         |          1250000 |
| Sales      |          1450000 |
| Marketing  |           950000 |
| HR         |           850000 |
| Finance    |          1250000 |
+------------+------------------+

3. AVG() Function

The AVG() function calculates the average (mean) value of a numeric column.

Syntax

sql
AVG(expression)

Examples

Calculate average salary of all employees:

sql
SELECT AVG(salary) AS average_salary FROM employees;

Output:

+----------------+
| average_salary |
+----------------+
|         57500  |
+----------------+

Calculate average salary by department:

sql
SELECT department, AVG(salary) AS avg_dept_salary 
FROM employees
GROUP BY department;

Output:

+------------+----------------+
| department | avg_dept_salary|
+------------+----------------+
| IT         |         65789  |
| Sales      |         58000  |
| Marketing  |         52777  |
| HR         |         48857  |
| Finance    |         62500  |
+------------+----------------+

4. MAX() Function

The MAX() function returns the largest value in a set of non-NULL values.

Syntax

sql
MAX(expression)

Examples

Find the highest salary:

sql
SELECT MAX(salary) AS highest_salary FROM employees;

Output:

+----------------+
| highest_salary |
+----------------+
|         120000 |
+----------------+

Find the highest salary in each department:

sql
SELECT department, MAX(salary) AS highest_dept_salary 
FROM employees
GROUP BY department;

Output:

+------------+--------------------+
| department | highest_dept_salary|
+------------+--------------------+
| IT         |             120000 |
| Sales      |             110000 |
| Marketing  |              95000 |
| HR         |              78000 |
| Finance    |             115000 |
+------------+--------------------+

5. MIN() Function

The MIN() function returns the smallest value in a set of non-NULL values.

Syntax

sql
MIN(expression)

Examples

Find the lowest salary:

sql
SELECT MIN(salary) AS lowest_salary FROM employees;

Output:

+---------------+
| lowest_salary |
+---------------+
|         35000 |
+---------------+

Find the lowest salary in each department:

sql
SELECT department, MIN(salary) AS lowest_dept_salary 
FROM employees
GROUP BY department;

Output:

+------------+-------------------+
| department | lowest_dept_salary|
+------------+-------------------+
| IT         |             42000 |
| Sales      |             35000 |
| Marketing  |             38000 |
| HR         |             36000 |
| Finance    |             45000 |
+------------+-------------------+

Using GROUP BY with Aggregate Functions

The GROUP BY clause is often used with aggregate functions to group results by one or more columns. This allows you to perform calculations on each group rather than on the entire table.

Basic Syntax

sql
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

Examples

Count employees by department:

sql
SELECT department, COUNT(*) AS employee_count 
FROM employees 
GROUP BY department;

Output:

+------------+----------------+
| department | employee_count |
+------------+----------------+
| IT         |             19 |
| Sales      |             25 |
| Marketing  |             18 |
| HR         |             14 |
| Finance    |             20 |
+------------+----------------+

Calculate salary statistics by department:

sql
SELECT 
    department, 
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    SUM(salary) AS total_salary
FROM employees 
GROUP BY department
ORDER BY avg_salary DESC;

Output:

+------------+----------------+------------+------------+------------+--------------+
| department | employee_count | avg_salary | min_salary | max_salary | total_salary |
+------------+----------------+------------+------------+------------+--------------+
| IT         |             19 |      65789 |      42000 |     120000 |      1250000 |
| Finance    |             20 |      62500 |      45000 |     115000 |      1250000 |
| Sales      |             25 |      58000 |      35000 |     110000 |      1450000 |
| Marketing  |             18 |      52777 |      38000 |      95000 |       950000 |
| HR         |             14 |      48857 |      36000 |      78000 |       850000 |
+------------+----------------+------------+------------+------------+--------------+

The HAVING Clause

The HAVING clause is used to filter groups based on aggregate function results. While the WHERE clause filters rows before grouping, the HAVING clause filters groups after they are created.

Syntax

sql
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);

Examples

Find departments with average salary greater than 55000:

sql
SELECT 
    department, 
    AVG(salary) AS avg_salary
FROM employees 
GROUP BY department
HAVING AVG(salary) > 55000
ORDER BY avg_salary DESC;

Output:

+------------+------------+
| department | avg_salary |
+------------+------------+
| IT         |      65789 |
| Finance    |      62500 |
| Sales      |      58000 |
+------------+------------+

Find departments with more than 20 employees:

sql
SELECT 
    department, 
    COUNT(*) AS employee_count
FROM employees 
GROUP BY department
HAVING COUNT(*) > 20;

Output:

+------------+----------------+
| department | employee_count |
+------------+----------------+
| Sales      |             25 |
+------------+----------------+

Real-World Applications

1. Sales Analysis Dashboard

Imagine you're building a sales dashboard for an e-commerce company. You can use aggregate functions to calculate key metrics:

sql
SELECT 
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    COUNT(*) AS total_orders,
    COUNT(DISTINCT customer_id) AS unique_customers,
    SUM(order_amount) AS total_revenue,
    AVG(order_amount) AS average_order_value,
    SUM(order_amount)/COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
WHERE order_status = 'completed'
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

2. Product Performance Analysis

For product managers wanting to understand product performance:

sql
SELECT 
    p.category,
    p.product_name,
    COUNT(o.order_id) AS times_ordered,
    SUM(o.quantity) AS total_quantity_sold,
    AVG(o.quantity) AS avg_quantity_per_order,
    SUM(o.quantity * p.price) AS total_revenue
FROM products p
JOIN order_items o ON p.product_id = o.product_id
GROUP BY p.category, p.product_name
ORDER BY total_revenue DESC
LIMIT 10;

3. Customer Segmentation

For marketing teams looking to segment customers by purchase behavior:

sql
SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(order_amount) AS total_spent,
    AVG(order_amount) AS avg_order_value,
    MIN(order_date) AS first_purchase,
    MAX(order_date) AS last_purchase,
    DATEDIFF(MAX(order_date), MIN(order_date)) AS days_as_customer
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY total_spent DESC;

Advanced Aggregate Functions

1. GROUP_CONCAT()

The GROUP_CONCAT() function concatenates string values from multiple rows into a single string.

sql
SELECT 
    department,
    GROUP_CONCAT(employee_name ORDER BY employee_name ASC SEPARATOR ', ') AS employees
FROM employees
GROUP BY department;

Output:

+------------+--------------------------------------------------------------+
| department | employees                                                     |
+------------+--------------------------------------------------------------+
| IT         | Alex, David, James, John, Mark, Robert, Sarah, ...            |
| Sales      | Amanda, Emma, Jessica, Melissa, Michael, Paul, ...            |
| Marketing  | Brian, Chris, Jennifer, Kevin, Lisa, Nicole, ...              |
| HR         | Andrew, Daniel, Emily, Karen, Patricia, Sandra, ...           |
| Finance    | Adam, George, Helen, Jason, Linda, Matthew, ...               |
+------------+--------------------------------------------------------------+

2. STD() and VARIANCE()

These functions calculate standard deviation and variance of numeric values:

sql
SELECT 
    department,
    AVG(salary) AS avg_salary,
    STD(salary) AS salary_std_dev,
    VARIANCE(salary) AS salary_variance
FROM employees
GROUP BY department;

Output:

+------------+------------+----------------+------------------+
| department | avg_salary | salary_std_dev | salary_variance  |
+------------+------------+----------------+------------------+
| IT         |      65789 |       20124.85 |      405000000.0 |
| Sales      |      58000 |       18965.61 |      359695200.0 |
| Marketing  |      52777 |       16108.92 |      259497225.4 |
| HR         |      48857 |       12542.67 |      157318452.7 |
| Finance    |      62500 |       19624.06 |      385104168.2 |
+------------+------------+----------------+------------------+

Common Mistakes and Best Practices

Mistakes to Avoid

Mixing aggregate and non-aggregate columns: When using GROUP BY, select only grouped columns or aggregate functions:

sql
-- INCORRECT
SELECT department, employee_name, AVG(salary) 
FROM employees
GROUP BY department;

-- CORRECT
SELECT department, AVG(salary) 
FROM employees
GROUP BY department;

Using WHERE instead of HAVING for filtering aggregates:

sql
-- INCORRECT
SELECT department, COUNT(*) AS emp_count 
FROM employees
WHERE COUNT(*) > 20
GROUP BY department;

-- CORRECT
SELECT department, COUNT(*) AS emp_count 
FROM employees
GROUP BY department
HAVING COUNT(*) > 20;

Best Practices

Index columns used in GROUP BY for better performance
Use aliases for aggregate results to improve readability
Filter data with WHERE before grouping when possible, as it's more efficient
Consider LIMIT for large result sets to improve query performance

Summary

MySQL aggregate functions are powerful tools for data analysis that allow you to:

Count rows with COUNT()
Calculate sums with SUM()
Find averages with AVG()
Determine maximum values with MAX()
Find minimum values with MIN()
Group data with GROUP BY
Filter grouped data with HAVING

These functions are essential for transforming raw data into meaningful insights and are widely used in reporting, analytics, and dashboard applications.

Exercises

To strengthen your understanding, try these exercises:

Write a query to find the total sales amount by month for the year 2023.
Calculate the average order value by customer and find the top 5 highest-spending customers.
For each product category, find how many products are in stock and what's the total inventory value.
Write a query that shows the number of orders placed by hour of day, identifying peak ordering times.
Find departments where the highest salary is at least twice the lowest salary.

Additional Resources

By mastering MySQL aggregate functions, you'll be able to extract valuable insights from your data and build more sophisticated database applications.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Core Aggregate Functions in MySQL​

1. COUNT() Function​

Syntax​

Examples​

2. SUM() Function​

Syntax​

Examples​

3. AVG() Function​

Syntax​

Examples​

4. MAX() Function​

Syntax​

Examples​

5. MIN() Function​

Syntax​

Examples​

Using GROUP BY with Aggregate Functions​

Basic Syntax​

Examples​

The HAVING Clause​

Syntax​

Examples​

Real-World Applications​

1. Sales Analysis Dashboard​

2. Product Performance Analysis​

3. Customer Segmentation​

Advanced Aggregate Functions​

1. GROUP_CONCAT()​

2. STD() and VARIANCE()​

Common Mistakes and Best Practices​

Mistakes to Avoid​

Best Practices​

Summary​

Exercises​

Additional Resources​

Introduction

Core Aggregate Functions in MySQL

1. COUNT() Function

Syntax

Examples

2. SUM() Function

Syntax

Examples

3. AVG() Function

Syntax

Examples

4. MAX() Function

Syntax

Examples

5. MIN() Function

Syntax

Examples

Using GROUP BY with Aggregate Functions

Basic Syntax

Examples

The HAVING Clause

Syntax

Examples

Real-World Applications

1. Sales Analysis Dashboard

2. Product Performance Analysis

3. Customer Segmentation

Advanced Aggregate Functions

1. GROUP_CONCAT()

2. STD() and VARIANCE()

Common Mistakes and Best Practices

Mistakes to Avoid

Best Practices

Summary

Exercises

Additional Resources