Skip to main content

MySQL Aggregate Functions

Introduction

Aggregate functions in MySQL perform calculations on sets of values and return a single value. They are essential tools for data analysis and reporting in databases. These functions allow you to summarize data, find totals, averages, counts, and more across multiple rows.

In this tutorial, you'll learn about the most commonly used MySQL aggregate functions and how to apply them in real-world scenarios. These functions form the foundation of data analysis in MySQL databases.

Common Aggregate Functions

Let's explore the core aggregate functions available in MySQL:

COUNT()

The COUNT() function returns the number of rows that match a specified criterion.

Syntax:

sql
COUNT(expression)

Examples:

  1. Count all rows in a table:

    sql
    SELECT COUNT(*) FROM employees;

    Output:

    +----------+
    | COUNT(*) |
    +----------+
    | 300 |
    +----------+
  2. Count non-NULL values in a column:

    sql
    SELECT COUNT(email) FROM employees;

    Output:

    +--------------+
    | COUNT(email) |
    +--------------+
    | 295 |
    +--------------+
  3. Count distinct values:

    sql
    SELECT COUNT(DISTINCT department) FROM employees;

    Output:

    +-------------------------+
    | COUNT(DISTINCT department) |
    +-------------------------+
    | 8 |
    +-------------------------+

SUM()

The SUM() function returns the total sum of a numeric column.

Syntax:

sql
SUM(expression)

Examples:

  1. Calculate total salary budget:

    sql
    SELECT SUM(salary) FROM employees;

    Output:

    +-------------+
    | SUM(salary) |
    +-------------+
    | 18450000 |
    +-------------+
  2. Sum with condition:

    sql
    SELECT SUM(salary) FROM employees WHERE department = 'Engineering';

    Output:

    +-------------+
    | SUM(salary) |
    +-------------+
    | 5320000 |
    +-------------+

AVG()

The AVG() function returns the average value of a numeric column.

Syntax:

sql
AVG(expression)

Examples:

  1. Calculate average salary:

    sql
    SELECT AVG(salary) FROM employees;

    Output:

    +-------------+
    | AVG(salary) |
    +-------------+
    | 61500.00 |
    +-------------+
  2. Average with condition:

    sql
    SELECT AVG(salary) FROM employees WHERE experience_years > 5;

    Output:

    +-------------+
    | AVG(salary) |
    +-------------+
    | 78250.75 |
    +-------------+

MIN()

The MIN() function returns the smallest value in a column.

Syntax:

sql
MIN(expression)

Examples:

  1. Find the lowest salary:

    sql
    SELECT MIN(salary) FROM employees;

    Output:

    +-------------+
    | MIN(salary) |
    +-------------+
    | 35000 |
    +-------------+
  2. Find the earliest hire date:

    sql
    SELECT MIN(hire_date) FROM employees;

    Output:

    +----------------+
    | MIN(hire_date) |
    +----------------+
    | 2010-03-15 |
    +----------------+

MAX()

The MAX() function returns the largest value in a column.

Syntax:

sql
MAX(expression)

Examples:

  1. Find the highest salary:

    sql
    SELECT MAX(salary) FROM employees;

    Output:

    +-------------+
    | MAX(salary) |
    +-------------+
    | 185000 |
    +-------------+
  2. Find the most recent hire:

    sql
    SELECT MAX(hire_date) FROM employees;

    Output:

    +----------------+
    | MAX(hire_date) |
    +----------------+
    | 2023-09-01 |
    +----------------+

GROUP BY Clause

The GROUP BY clause groups rows that have the same values into summary rows. It's commonly used with aggregate functions to perform calculations on each group.

Syntax:

sql
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

Examples:

  1. Count employees by department:

    sql
    SELECT department, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department;

    Output:

    +---------------+----------------+
    | department | employee_count |
    +---------------+----------------+
    | Engineering | 85 |
    | HR | 15 |
    | Marketing | 32 |
    | Sales | 68 |
    | Finance | 25 |
    | IT | 40 |
    | Customer Svc | 30 |
    | Operations | 5 |
    +---------------+----------------+
  2. Calculate average salary by department:

    sql
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY avg_salary DESC;

    Output:

    +---------------+------------+
    | department | avg_salary |
    +---------------+------------+
    | Engineering | 78500.50 |
    | IT | 72300.25 |
    | Finance | 68900.75 |
    | Sales | 65200.00 |
    | Marketing | 61800.50 |
    | Operations | 58700.25 |
    | Customer Svc | 52400.75 |
    | HR | 49800.50 |
    +---------------+------------+
  3. Count employees by department and job title:

    sql
    SELECT department, job_title, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department, job_title
    ORDER BY department, employee_count DESC;

HAVING Clause

The HAVING clause was added to SQL because the WHERE clause cannot be used with aggregate functions. It filters groups based on aggregate function results.

Syntax:

sql
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING aggregate_function_condition
ORDER BY column_name(s);

Examples:

  1. Find departments with more than 50 employees:

    sql
    SELECT department, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department
    HAVING COUNT(*) > 50;

    Output:

    +---------------+----------------+
    | department | employee_count |
    +---------------+----------------+
    | Engineering | 85 |
    | Sales | 68 |
    +---------------+----------------+
  2. Find departments with average salary above 70000:

    sql
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 70000
    ORDER BY avg_salary DESC;

    Output:

    +---------------+------------+
    | department | avg_salary |
    +---------------+------------+
    | Engineering | 78500.50 |
    | IT | 72300.25 |
    +---------------+------------+
  3. Complex example with WHERE and HAVING:

    sql
    SELECT department, job_title, AVG(salary) AS avg_salary, COUNT(*) AS employee_count
    FROM employees
    WHERE hire_date > '2020-01-01'
    GROUP BY department, job_title
    HAVING COUNT(*) >= 3 AND AVG(salary) > 60000
    ORDER BY avg_salary DESC;

Combining Multiple Aggregate Functions

You can use multiple aggregate functions in a single query:

sql
SELECT 
COUNT(*) AS total_employees,
SUM(salary) AS total_salary,
AVG(salary) AS average_salary,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees;

Output:

+-----------------+--------------+----------------+---------------+----------------+
| total_employees | total_salary | average_salary | lowest_salary | highest_salary |
+-----------------+--------------+----------------+---------------+----------------+
| 300 | 18450000 | 61500.00 | 35000 | 185000 |
+-----------------+--------------+----------------+---------------+----------------+

Common Use Cases and Practical Examples

Example 1: Sales Data Analysis

Let's analyze a sales table to get insights:

sql
-- Total sales by year and month
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
SUM(amount) AS total_sales,
COUNT(*) AS number_of_orders,
AVG(amount) AS average_order_value
FROM sales
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

Example 2: Employee Performance Dashboard

Create a performance dashboard for employees:

sql
-- Employee performance metrics
SELECT
e.employee_id,
CONCAT(e.first_name, ' ', e.last_name) AS employee_name,
COUNT(s.sale_id) AS total_sales,
SUM(s.amount) AS sales_amount,
AVG(s.amount) AS avg_sale_value,
MAX(s.amount) AS largest_sale
FROM employees e
LEFT JOIN sales s ON e.employee_id = s.employee_id
WHERE YEAR(s.sale_date) = 2023
GROUP BY e.employee_id, employee_name
HAVING COUNT(s.sale_id) > 0
ORDER BY sales_amount DESC;

Example 3: Product Performance Analysis

Analyze product performance:

sql
-- Product performance analysis
SELECT
p.category,
p.product_name,
COUNT(s.sale_id) AS times_sold,
SUM(s.quantity) AS total_units_sold,
SUM(s.quantity * s.unit_price) AS total_revenue
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE s.sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY p.category, p.product_name
HAVING total_units_sold > 100
ORDER BY total_revenue DESC;

Handling NULL Values

Aggregate functions generally ignore NULL values, except for COUNT(*):

sql
-- Create a test table with NULL values
CREATE TABLE test_nulls (
id INT,
value INT
);

INSERT INTO test_nulls VALUES (1, 10), (2, NULL), (3, 30), (4, NULL), (5, 50);

-- Test with nulls
SELECT
COUNT(*) AS count_rows,
COUNT(value) AS count_non_null,
SUM(value) AS sum_value,
AVG(value) AS avg_value
FROM test_nulls;

Output:

+------------+---------------+-----------+-----------+
| count_rows | count_non_null | sum_value | avg_value |
+------------+---------------+-----------+-----------+
| 5 | 3 | 90 | 30.0000 |
+------------+---------------+-----------+-----------+

Note that:

  • COUNT(*) counts all rows (5)
  • COUNT(value) counts only non-NULL values (3)
  • SUM and AVG calculate based only on non-NULL values

Aggregate Functions with DISTINCT

All aggregate functions can be used with the DISTINCT keyword to consider only unique values:

sql
SELECT 
COUNT(DISTINCT department) AS unique_departments,
COUNT(DISTINCT job_title) AS unique_job_titles,
AVG(DISTINCT salary) AS avg_of_unique_salaries
FROM employees;

Output:

+--------------------+-------------------+------------------------+
| unique_departments | unique_job_titles | avg_of_unique_salaries |
+--------------------+-------------------+------------------------+
| 8 | 35 | 73625.50 |
+--------------------+-------------------+------------------------+

Performance Considerations

When working with aggregate functions:

  1. Indexing: Ensure columns used in GROUP BY clauses are indexed for better performance.
  2. Filtering: Apply WHERE clauses before GROUP BY to reduce the amount of data processed.
  3. HAVING vs WHERE: Use WHERE for row-level filtering and HAVING for group-level filtering.
  4. Large Datasets: Be cautious with large tables as aggregation operations can be memory-intensive.

Summary

MySQL aggregate functions are powerful tools for data analysis and reporting. In this tutorial, we covered:

  • Core aggregate functions: COUNT(), SUM(), AVG(), MIN(), and MAX()
  • Using the GROUP BY clause to group data for aggregation
  • Filtering groups with the HAVING clause
  • Combining multiple aggregate functions in a single query
  • Real-world examples of data analysis using aggregate functions
  • Handling NULL values in aggregations
  • Using DISTINCT with aggregate functions
  • Performance considerations

Mastering these functions will enable you to extract meaningful insights from your data and generate powerful reports for business intelligence.

Exercises

  1. Write a query to find the department with the highest average salary.
  2. Create a query that shows the count of employees hired each year.
  3. Find products that have been ordered more than 100 times and calculate their total revenue.
  4. Calculate the percentage of total sales contributed by each product category.
  5. Write a query to find months with sales above the yearly average.

Additional Resources

Happy querying!



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)