MySQL Aggregate Functions

Introduction

Aggregate functions in MySQL perform calculations on sets of values and return a single value. They are essential tools for data analysis and reporting in databases. These functions allow you to summarize data, find totals, averages, counts, and more across multiple rows.

In this tutorial, you'll learn about the most commonly used MySQL aggregate functions and how to apply them in real-world scenarios. These functions form the foundation of data analysis in MySQL databases.

Common Aggregate Functions

Let's explore the core aggregate functions available in MySQL:

COUNT()

The COUNT() function returns the number of rows that match a specified criterion.

Syntax:

sql
COUNT(expression)

Examples:

Count all rows in a table:

sql
SELECT COUNT(*) FROM employees;

Output:

+----------+
| COUNT(*) |
+----------+
| 300      |
+----------+

Count non-NULL values in a column:

sql
SELECT COUNT(email) FROM employees;

Output:

+--------------+
| COUNT(email) |
+--------------+
| 295          |
+--------------+

Count distinct values:

sql
SELECT COUNT(DISTINCT department) FROM employees;

Output:

+-------------------------+
| COUNT(DISTINCT department) |
+-------------------------+
| 8                       |
+-------------------------+

SUM()

The SUM() function returns the total sum of a numeric column.

Syntax:

sql
SUM(expression)

Examples:

Calculate total salary budget:

sql
SELECT SUM(salary) FROM employees;

Output:

+-------------+
| SUM(salary) |
+-------------+
| 18450000    |
+-------------+

Sum with condition:

sql
SELECT SUM(salary) FROM employees WHERE department = 'Engineering';

Output:

+-------------+
| SUM(salary) |
+-------------+
| 5320000     |
+-------------+

AVG()

The AVG() function returns the average value of a numeric column.

Syntax:

sql
AVG(expression)

Examples:

Calculate average salary:

sql
SELECT AVG(salary) FROM employees;

Output:

+-------------+
| AVG(salary) |
+-------------+
| 61500.00    |
+-------------+

Average with condition:

sql
SELECT AVG(salary) FROM employees WHERE experience_years > 5;

Output:

+-------------+
| AVG(salary) |
+-------------+
| 78250.75    |
+-------------+

MIN()

The MIN() function returns the smallest value in a column.

Syntax:

sql
MIN(expression)

Examples:

Find the lowest salary:

sql
SELECT MIN(salary) FROM employees;

Output:

+-------------+
| MIN(salary) |
+-------------+
| 35000       |
+-------------+

Find the earliest hire date:

sql
SELECT MIN(hire_date) FROM employees;

Output:

+----------------+
| MIN(hire_date) |
+----------------+
| 2010-03-15     |
+----------------+

MAX()

The MAX() function returns the largest value in a column.

Syntax:

sql
MAX(expression)

Examples:

Find the highest salary:

sql
SELECT MAX(salary) FROM employees;

Output:

+-------------+
| MAX(salary) |
+-------------+
| 185000      |
+-------------+

Find the most recent hire:

sql
SELECT MAX(hire_date) FROM employees;

Output:

+----------------+
| MAX(hire_date) |
+----------------+
| 2023-09-01     |
+----------------+

GROUP BY Clause

The GROUP BY clause groups rows that have the same values into summary rows. It's commonly used with aggregate functions to perform calculations on each group.

Syntax:

sql
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

Examples:

Count employees by department:

sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Output:

+---------------+----------------+
| department    | employee_count |
+---------------+----------------+
| Engineering   | 85             |
| HR            | 15             |
| Marketing     | 32             |
| Sales         | 68             |
| Finance       | 25             |
| IT            | 40             |
| Customer Svc  | 30             |
| Operations    | 5              |
+---------------+----------------+

Calculate average salary by department:

sql
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

Output:

+---------------+------------+
| department    | avg_salary |
+---------------+------------+
| Engineering   | 78500.50   |
| IT            | 72300.25   |
| Finance       | 68900.75   |
| Sales         | 65200.00   |
| Marketing     | 61800.50   |
| Operations    | 58700.25   |
| Customer Svc  | 52400.75   |
| HR            | 49800.50   |
+---------------+------------+

Count employees by department and job title:

sql
SELECT department, job_title, COUNT(*) AS employee_count
FROM employees
GROUP BY department, job_title
ORDER BY department, employee_count DESC;

HAVING Clause

The HAVING clause was added to SQL because the WHERE clause cannot be used with aggregate functions. It filters groups based on aggregate function results.

Syntax:

sql
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING aggregate_function_condition
ORDER BY column_name(s);

Examples:

Find departments with more than 50 employees:

sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 50;

Output:

+---------------+----------------+
| department    | employee_count |
+---------------+----------------+
| Engineering   | 85             |
| Sales         | 68             |
+---------------+----------------+

Find departments with average salary above 70000:

sql
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000
ORDER BY avg_salary DESC;

Output:

+---------------+------------+
| department    | avg_salary |
+---------------+------------+
| Engineering   | 78500.50   |
| IT            | 72300.25   |
+---------------+------------+

Complex example with WHERE and HAVING:

sql
SELECT department, job_title, AVG(salary) AS avg_salary, COUNT(*) AS employee_count
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department, job_title
HAVING COUNT(*) >= 3 AND AVG(salary) > 60000
ORDER BY avg_salary DESC;

Combining Multiple Aggregate Functions

You can use multiple aggregate functions in a single query:

sql
SELECT 
    COUNT(*) AS total_employees,
    SUM(salary) AS total_salary,
    AVG(salary) AS average_salary,
    MIN(salary) AS lowest_salary,
    MAX(salary) AS highest_salary
FROM employees;

Output:

+-----------------+--------------+----------------+---------------+----------------+
| total_employees | total_salary | average_salary | lowest_salary | highest_salary |
+-----------------+--------------+----------------+---------------+----------------+
| 300             | 18450000     | 61500.00       | 35000         | 185000         |
+-----------------+--------------+----------------+---------------+----------------+

Common Use Cases and Practical Examples

Example 1: Sales Data Analysis

Let's analyze a sales table to get insights:

sql
-- Total sales by year and month
SELECT 
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    SUM(amount) AS total_sales,
    COUNT(*) AS number_of_orders,
    AVG(amount) AS average_order_value
FROM sales
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

Example 2: Employee Performance Dashboard

Create a performance dashboard for employees:

sql
-- Employee performance metrics
SELECT 
    e.employee_id,
    CONCAT(e.first_name, ' ', e.last_name) AS employee_name,
    COUNT(s.sale_id) AS total_sales,
    SUM(s.amount) AS sales_amount,
    AVG(s.amount) AS avg_sale_value,
    MAX(s.amount) AS largest_sale
FROM employees e
LEFT JOIN sales s ON e.employee_id = s.employee_id
WHERE YEAR(s.sale_date) = 2023
GROUP BY e.employee_id, employee_name
HAVING COUNT(s.sale_id) > 0
ORDER BY sales_amount DESC;

Example 3: Product Performance Analysis

Analyze product performance:

sql
-- Product performance analysis
SELECT 
    p.category,
    p.product_name,
    COUNT(s.sale_id) AS times_sold,
    SUM(s.quantity) AS total_units_sold,
    SUM(s.quantity * s.unit_price) AS total_revenue
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE s.sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY p.category, p.product_name
HAVING total_units_sold > 100
ORDER BY total_revenue DESC;

Handling NULL Values

Aggregate functions generally ignore NULL values, except for COUNT(*):

sql
-- Create a test table with NULL values
CREATE TABLE test_nulls (
    id INT,
    value INT
);

INSERT INTO test_nulls VALUES (1, 10), (2, NULL), (3, 30), (4, NULL), (5, 50);

-- Test with nulls
SELECT 
    COUNT(*) AS count_rows,
    COUNT(value) AS count_non_null,
    SUM(value) AS sum_value,
    AVG(value) AS avg_value
FROM test_nulls;

Output:

+------------+---------------+-----------+-----------+
| count_rows | count_non_null | sum_value | avg_value |
+------------+---------------+-----------+-----------+
| 5          | 3             | 90        | 30.0000   |
+------------+---------------+-----------+-----------+

Note that:

COUNT(*) counts all rows (5)
COUNT(value) counts only non-NULL values (3)
SUM and AVG calculate based only on non-NULL values

Aggregate Functions with DISTINCT

All aggregate functions can be used with the DISTINCT keyword to consider only unique values:

sql
SELECT 
    COUNT(DISTINCT department) AS unique_departments,
    COUNT(DISTINCT job_title) AS unique_job_titles,
    AVG(DISTINCT salary) AS avg_of_unique_salaries
FROM employees;

Output:

+--------------------+-------------------+------------------------+
| unique_departments | unique_job_titles | avg_of_unique_salaries |
+--------------------+-------------------+------------------------+
| 8                  | 35                | 73625.50               |
+--------------------+-------------------+------------------------+

Performance Considerations

When working with aggregate functions:

Indexing: Ensure columns used in GROUP BY clauses are indexed for better performance.
Filtering: Apply WHERE clauses before GROUP BY to reduce the amount of data processed.
HAVING vs WHERE: Use WHERE for row-level filtering and HAVING for group-level filtering.
Large Datasets: Be cautious with large tables as aggregation operations can be memory-intensive.

Summary

MySQL aggregate functions are powerful tools for data analysis and reporting. In this tutorial, we covered:

Core aggregate functions: COUNT(), SUM(), AVG(), MIN(), and MAX()
Using the GROUP BY clause to group data for aggregation
Filtering groups with the HAVING clause
Combining multiple aggregate functions in a single query
Real-world examples of data analysis using aggregate functions
Handling NULL values in aggregations
Using DISTINCT with aggregate functions
Performance considerations

Mastering these functions will enable you to extract meaningful insights from your data and generate powerful reports for business intelligence.

Exercises

Write a query to find the department with the highest average salary.
Create a query that shows the count of employees hired each year.
Find products that have been ordered more than 100 times and calculate their total revenue.
Calculate the percentage of total sales contributed by each product category.
Write a query to find months with sales above the yearly average.

Additional Resources

MySQL Official Documentation on Aggregate Functions
MySQL Window Functions (for more advanced aggregation)
MySQL GROUP BY Optimization

Happy querying!

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Common Aggregate Functions​

COUNT()​

SUM()​

AVG()​

MIN()​

MAX()​

GROUP BY Clause​

HAVING Clause​

Combining Multiple Aggregate Functions​

Common Use Cases and Practical Examples​

Example 1: Sales Data Analysis​

Example 2: Employee Performance Dashboard​

Example 3: Product Performance Analysis​

Handling NULL Values​

Aggregate Functions with DISTINCT​

Performance Considerations​

Summary​

Exercises​

Additional Resources​

Introduction

Common Aggregate Functions

COUNT()

SUM()

AVG()

MIN()

MAX()

GROUP BY Clause

HAVING Clause

Combining Multiple Aggregate Functions

Common Use Cases and Practical Examples

Example 1: Sales Data Analysis

Example 2: Employee Performance Dashboard

Example 3: Product Performance Analysis

Handling NULL Values

Aggregate Functions with DISTINCT

Performance Considerations

Summary

Exercises

Additional Resources