MySQL Aggregate Functions
Introduction
Aggregate functions in MySQL perform calculations on sets of values and return a single value. They are essential tools for data analysis and reporting in databases. These functions allow you to summarize data, find totals, averages, counts, and more across multiple rows.
In this tutorial, you'll learn about the most commonly used MySQL aggregate functions and how to apply them in real-world scenarios. These functions form the foundation of data analysis in MySQL databases.
Common Aggregate Functions
Let's explore the core aggregate functions available in MySQL:
COUNT()
The COUNT()
function returns the number of rows that match a specified criterion.
Syntax:
COUNT(expression)
Examples:
-
Count all rows in a table:
sqlSELECT COUNT(*) FROM employees;
Output:
+----------+
| COUNT(*) |
+----------+
| 300 |
+----------+ -
Count non-NULL values in a column:
sqlSELECT COUNT(email) FROM employees;
Output:
+--------------+
| COUNT(email) |
+--------------+
| 295 |
+--------------+ -
Count distinct values:
sqlSELECT COUNT(DISTINCT department) FROM employees;
Output:
+-------------------------+
| COUNT(DISTINCT department) |
+-------------------------+
| 8 |
+-------------------------+
SUM()
The SUM()
function returns the total sum of a numeric column.
Syntax:
SUM(expression)
Examples:
-
Calculate total salary budget:
sqlSELECT SUM(salary) FROM employees;
Output:
+-------------+
| SUM(salary) |
+-------------+
| 18450000 |
+-------------+ -
Sum with condition:
sqlSELECT SUM(salary) FROM employees WHERE department = 'Engineering';
Output:
+-------------+
| SUM(salary) |
+-------------+
| 5320000 |
+-------------+
AVG()
The AVG()
function returns the average value of a numeric column.
Syntax:
AVG(expression)
Examples:
-
Calculate average salary:
sqlSELECT AVG(salary) FROM employees;
Output:
+-------------+
| AVG(salary) |
+-------------+
| 61500.00 |
+-------------+ -
Average with condition:
sqlSELECT AVG(salary) FROM employees WHERE experience_years > 5;
Output:
+-------------+
| AVG(salary) |
+-------------+
| 78250.75 |
+-------------+
MIN()
The MIN()
function returns the smallest value in a column.
Syntax:
MIN(expression)
Examples:
-
Find the lowest salary:
sqlSELECT MIN(salary) FROM employees;
Output:
+-------------+
| MIN(salary) |
+-------------+
| 35000 |
+-------------+ -
Find the earliest hire date:
sqlSELECT MIN(hire_date) FROM employees;
Output:
+----------------+
| MIN(hire_date) |
+----------------+
| 2010-03-15 |
+----------------+
MAX()
The MAX()
function returns the largest value in a column.
Syntax:
MAX(expression)
Examples:
-
Find the highest salary:
sqlSELECT MAX(salary) FROM employees;
Output:
+-------------+
| MAX(salary) |
+-------------+
| 185000 |
+-------------+ -
Find the most recent hire:
sqlSELECT MAX(hire_date) FROM employees;
Output:
+----------------+
| MAX(hire_date) |
+----------------+
| 2023-09-01 |
+----------------+
GROUP BY Clause
The GROUP BY
clause groups rows that have the same values into summary rows. It's commonly used with aggregate functions to perform calculations on each group.
Syntax:
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
Examples:
-
Count employees by department:
sqlSELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;Output:
+---------------+----------------+
| department | employee_count |
+---------------+----------------+
| Engineering | 85 |
| HR | 15 |
| Marketing | 32 |
| Sales | 68 |
| Finance | 25 |
| IT | 40 |
| Customer Svc | 30 |
| Operations | 5 |
+---------------+----------------+ -
Calculate average salary by department:
sqlSELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;Output:
+---------------+------------+
| department | avg_salary |
+---------------+------------+
| Engineering | 78500.50 |
| IT | 72300.25 |
| Finance | 68900.75 |
| Sales | 65200.00 |
| Marketing | 61800.50 |
| Operations | 58700.25 |
| Customer Svc | 52400.75 |
| HR | 49800.50 |
+---------------+------------+ -
Count employees by department and job title:
sqlSELECT department, job_title, COUNT(*) AS employee_count
FROM employees
GROUP BY department, job_title
ORDER BY department, employee_count DESC;
HAVING Clause
The HAVING
clause was added to SQL because the WHERE
clause cannot be used with aggregate functions. It filters groups based on aggregate function results.
Syntax:
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING aggregate_function_condition
ORDER BY column_name(s);
Examples:
-
Find departments with more than 50 employees:
sqlSELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 50;Output:
+---------------+----------------+
| department | employee_count |
+---------------+----------------+
| Engineering | 85 |
| Sales | 68 |
+---------------+----------------+ -
Find departments with average salary above 70000:
sqlSELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000
ORDER BY avg_salary DESC;Output:
+---------------+------------+
| department | avg_salary |
+---------------+------------+
| Engineering | 78500.50 |
| IT | 72300.25 |
+---------------+------------+ -
Complex example with WHERE and HAVING:
sqlSELECT department, job_title, AVG(salary) AS avg_salary, COUNT(*) AS employee_count
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department, job_title
HAVING COUNT(*) >= 3 AND AVG(salary) > 60000
ORDER BY avg_salary DESC;
Combining Multiple Aggregate Functions
You can use multiple aggregate functions in a single query:
SELECT
COUNT(*) AS total_employees,
SUM(salary) AS total_salary,
AVG(salary) AS average_salary,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees;
Output:
+-----------------+--------------+----------------+---------------+----------------+
| total_employees | total_salary | average_salary | lowest_salary | highest_salary |
+-----------------+--------------+----------------+---------------+----------------+
| 300 | 18450000 | 61500.00 | 35000 | 185000 |
+-----------------+--------------+----------------+---------------+----------------+
Common Use Cases and Practical Examples
Example 1: Sales Data Analysis
Let's analyze a sales table to get insights:
-- Total sales by year and month
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
SUM(amount) AS total_sales,
COUNT(*) AS number_of_orders,
AVG(amount) AS average_order_value
FROM sales
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;
Example 2: Employee Performance Dashboard
Create a performance dashboard for employees:
-- Employee performance metrics
SELECT
e.employee_id,
CONCAT(e.first_name, ' ', e.last_name) AS employee_name,
COUNT(s.sale_id) AS total_sales,
SUM(s.amount) AS sales_amount,
AVG(s.amount) AS avg_sale_value,
MAX(s.amount) AS largest_sale
FROM employees e
LEFT JOIN sales s ON e.employee_id = s.employee_id
WHERE YEAR(s.sale_date) = 2023
GROUP BY e.employee_id, employee_name
HAVING COUNT(s.sale_id) > 0
ORDER BY sales_amount DESC;
Example 3: Product Performance Analysis
Analyze product performance:
-- Product performance analysis
SELECT
p.category,
p.product_name,
COUNT(s.sale_id) AS times_sold,
SUM(s.quantity) AS total_units_sold,
SUM(s.quantity * s.unit_price) AS total_revenue
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE s.sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY p.category, p.product_name
HAVING total_units_sold > 100
ORDER BY total_revenue DESC;
Handling NULL Values
Aggregate functions generally ignore NULL values, except for COUNT(*)
:
-- Create a test table with NULL values
CREATE TABLE test_nulls (
id INT,
value INT
);
INSERT INTO test_nulls VALUES (1, 10), (2, NULL), (3, 30), (4, NULL), (5, 50);
-- Test with nulls
SELECT
COUNT(*) AS count_rows,
COUNT(value) AS count_non_null,
SUM(value) AS sum_value,
AVG(value) AS avg_value
FROM test_nulls;
Output:
+------------+---------------+-----------+-----------+
| count_rows | count_non_null | sum_value | avg_value |
+------------+---------------+-----------+-----------+
| 5 | 3 | 90 | 30.0000 |
+------------+---------------+-----------+-----------+
Note that:
COUNT(*)
counts all rows (5)COUNT(value)
counts only non-NULL values (3)SUM
andAVG
calculate based only on non-NULL values
Aggregate Functions with DISTINCT
All aggregate functions can be used with the DISTINCT
keyword to consider only unique values:
SELECT
COUNT(DISTINCT department) AS unique_departments,
COUNT(DISTINCT job_title) AS unique_job_titles,
AVG(DISTINCT salary) AS avg_of_unique_salaries
FROM employees;
Output:
+--------------------+-------------------+------------------------+
| unique_departments | unique_job_titles | avg_of_unique_salaries |
+--------------------+-------------------+------------------------+
| 8 | 35 | 73625.50 |
+--------------------+-------------------+------------------------+
Performance Considerations
When working with aggregate functions:
- Indexing: Ensure columns used in GROUP BY clauses are indexed for better performance.
- Filtering: Apply WHERE clauses before GROUP BY to reduce the amount of data processed.
- HAVING vs WHERE: Use WHERE for row-level filtering and HAVING for group-level filtering.
- Large Datasets: Be cautious with large tables as aggregation operations can be memory-intensive.
Summary
MySQL aggregate functions are powerful tools for data analysis and reporting. In this tutorial, we covered:
- Core aggregate functions:
COUNT()
,SUM()
,AVG()
,MIN()
, andMAX()
- Using the
GROUP BY
clause to group data for aggregation - Filtering groups with the
HAVING
clause - Combining multiple aggregate functions in a single query
- Real-world examples of data analysis using aggregate functions
- Handling NULL values in aggregations
- Using DISTINCT with aggregate functions
- Performance considerations
Mastering these functions will enable you to extract meaningful insights from your data and generate powerful reports for business intelligence.
Exercises
- Write a query to find the department with the highest average salary.
- Create a query that shows the count of employees hired each year.
- Find products that have been ordered more than 100 times and calculate their total revenue.
- Calculate the percentage of total sales contributed by each product category.
- Write a query to find months with sales above the yearly average.
Additional Resources
- MySQL Official Documentation on Aggregate Functions
- MySQL Window Functions (for more advanced aggregation)
- MySQL GROUP BY Optimization
Happy querying!
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)