MySQL Aggregate Functions
Introduction
Aggregate functions are a powerful feature in MySQL that allow you to perform calculations on sets of values. Instead of operating on individual rows, aggregate functions work across multiple rows to calculate summary statistics. These functions are essential for data analysis, reporting, and extracting meaningful insights from your databases.
In this tutorial, we'll explore the most commonly used MySQL aggregate functions, understand their syntax, and see how they can be applied in real-world scenarios.
Core Aggregate Functions in MySQL
MySQL provides several built-in aggregate functions that help you analyze and summarize your data efficiently. Let's explore the most important ones:
1. COUNT() Function
The COUNT()
function returns the number of rows that match a specified criterion.
Syntax
COUNT(expression)
Examples
Count all rows in a table:
SELECT COUNT(*) FROM employees;
Output:
+----------+
| COUNT(*) |
+----------+
| 100 |
+----------+
Count non-NULL values in a specific column:
SELECT COUNT(email) FROM employees;
Output:
+--------------+
| COUNT(email) |
+--------------+
| 98 |
+--------------+
Count distinct values:
SELECT COUNT(DISTINCT department) FROM employees;
Output:
+-------------------------+
| COUNT(DISTINCT department) |
+-------------------------+
| 5 |
+-------------------------+
2. SUM() Function
The SUM()
function calculates the total of all values in a numeric column.
Syntax
SUM(expression)
Examples
Calculate total salary of all employees:
SELECT SUM(salary) AS total_salary FROM employees;
Output:
+--------------+
| total_salary |
+--------------+
| 5750000 |
+--------------+
Calculate total salary by department:
SELECT department, SUM(salary) AS department_salary
FROM employees
GROUP BY department;
Output:
+------------+------------------+
| department | department_salary|
+------------+------------------+
| IT | 1250000 |
| Sales | 1450000 |
| Marketing | 950000 |
| HR | 850000 |
| Finance | 1250000 |
+------------+------------------+
3. AVG() Function
The AVG()
function calculates the average (mean) value of a numeric column.
Syntax
AVG(expression)
Examples
Calculate average salary of all employees:
SELECT AVG(salary) AS average_salary FROM employees;
Output:
+----------------+
| average_salary |
+----------------+
| 57500 |
+----------------+
Calculate average salary by department:
SELECT department, AVG(salary) AS avg_dept_salary
FROM employees
GROUP BY department;
Output:
+------------+----------------+
| department | avg_dept_salary|
+------------+----------------+
| IT | 65789 |
| Sales | 58000 |
| Marketing | 52777 |
| HR | 48857 |
| Finance | 62500 |
+------------+----------------+
4. MAX() Function
The MAX()
function returns the largest value in a set of non-NULL values.
Syntax
MAX(expression)
Examples
Find the highest salary:
SELECT MAX(salary) AS highest_salary FROM employees;
Output:
+----------------+
| highest_salary |
+----------------+
| 120000 |
+----------------+
Find the highest salary in each department:
SELECT department, MAX(salary) AS highest_dept_salary
FROM employees
GROUP BY department;
Output:
+------------+--------------------+
| department | highest_dept_salary|
+------------+--------------------+
| IT | 120000 |
| Sales | 110000 |
| Marketing | 95000 |
| HR | 78000 |
| Finance | 115000 |
+------------+--------------------+
5. MIN() Function
The MIN()
function returns the smallest value in a set of non-NULL values.
Syntax
MIN(expression)
Examples
Find the lowest salary:
SELECT MIN(salary) AS lowest_salary FROM employees;
Output:
+---------------+
| lowest_salary |
+---------------+
| 35000 |
+---------------+
Find the lowest salary in each department:
SELECT department, MIN(salary) AS lowest_dept_salary
FROM employees
GROUP BY department;
Output:
+------------+-------------------+
| department | lowest_dept_salary|
+------------+-------------------+
| IT | 42000 |
| Sales | 35000 |
| Marketing | 38000 |
| HR | 36000 |
| Finance | 45000 |
+------------+-------------------+
Using GROUP BY with Aggregate Functions
The GROUP BY
clause is often used with aggregate functions to group results by one or more columns. This allows you to perform calculations on each group rather than on the entire table.
Basic Syntax
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
Examples
Count employees by department:
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
Output:
+------------+----------------+
| department | employee_count |
+------------+----------------+
| IT | 19 |
| Sales | 25 |
| Marketing | 18 |
| HR | 14 |
| Finance | 20 |
+------------+----------------+
Calculate salary statistics by department:
SELECT
department,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
SUM(salary) AS total_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
Output:
+------------+----------------+------------+------------+------------+--------------+
| department | employee_count | avg_salary | min_salary | max_salary | total_salary |
+------------+----------------+------------+------------+------------+--------------+
| IT | 19 | 65789 | 42000 | 120000 | 1250000 |
| Finance | 20 | 62500 | 45000 | 115000 | 1250000 |
| Sales | 25 | 58000 | 35000 | 110000 | 1450000 |
| Marketing | 18 | 52777 | 38000 | 95000 | 950000 |
| HR | 14 | 48857 | 36000 | 78000 | 850000 |
+------------+----------------+------------+------------+------------+--------------+
The HAVING Clause
The HAVING
clause is used to filter groups based on aggregate function results. While the WHERE
clause filters rows before grouping, the HAVING
clause filters groups after they are created.
Syntax
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
Examples
Find departments with average salary greater than 55000:
SELECT
department,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 55000
ORDER BY avg_salary DESC;
Output:
+------------+------------+
| department | avg_salary |
+------------+------------+
| IT | 65789 |
| Finance | 62500 |
| Sales | 58000 |
+------------+------------+
Find departments with more than 20 employees:
SELECT
department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 20;
Output:
+------------+----------------+
| department | employee_count |
+------------+----------------+
| Sales | 25 |
+------------+----------------+
Real-World Applications
1. Sales Analysis Dashboard
Imagine you're building a sales dashboard for an e-commerce company. You can use aggregate functions to calculate key metrics:
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(order_amount) AS total_revenue,
AVG(order_amount) AS average_order_value,
SUM(order_amount)/COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
WHERE order_status = 'completed'
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;
2. Product Performance Analysis
For product managers wanting to understand product performance:
SELECT
p.category,
p.product_name,
COUNT(o.order_id) AS times_ordered,
SUM(o.quantity) AS total_quantity_sold,
AVG(o.quantity) AS avg_quantity_per_order,
SUM(o.quantity * p.price) AS total_revenue
FROM products p
JOIN order_items o ON p.product_id = o.product_id
GROUP BY p.category, p.product_name
ORDER BY total_revenue DESC
LIMIT 10;
3. Customer Segmentation
For marketing teams looking to segment customers by purchase behavior:
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_amount) AS total_spent,
AVG(order_amount) AS avg_order_value,
MIN(order_date) AS first_purchase,
MAX(order_date) AS last_purchase,
DATEDIFF(MAX(order_date), MIN(order_date)) AS days_as_customer
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY total_spent DESC;
Advanced Aggregate Functions
1. GROUP_CONCAT()
The GROUP_CONCAT()
function concatenates string values from multiple rows into a single string.
SELECT
department,
GROUP_CONCAT(employee_name ORDER BY employee_name ASC SEPARATOR ', ') AS employees
FROM employees
GROUP BY department;
Output:
+------------+--------------------------------------------------------------+
| department | employees |
+------------+--------------------------------------------------------------+
| IT | Alex, David, James, John, Mark, Robert, Sarah, ... |
| Sales | Amanda, Emma, Jessica, Melissa, Michael, Paul, ... |
| Marketing | Brian, Chris, Jennifer, Kevin, Lisa, Nicole, ... |
| HR | Andrew, Daniel, Emily, Karen, Patricia, Sandra, ... |
| Finance | Adam, George, Helen, Jason, Linda, Matthew, ... |
+------------+--------------------------------------------------------------+
2. STD() and VARIANCE()
These functions calculate standard deviation and variance of numeric values:
SELECT
department,
AVG(salary) AS avg_salary,
STD(salary) AS salary_std_dev,
VARIANCE(salary) AS salary_variance
FROM employees
GROUP BY department;
Output:
+------------+------------+----------------+------------------+
| department | avg_salary | salary_std_dev | salary_variance |
+------------+------------+----------------+------------------+
| IT | 65789 | 20124.85 | 405000000.0 |
| Sales | 58000 | 18965.61 | 359695200.0 |
| Marketing | 52777 | 16108.92 | 259497225.4 |
| HR | 48857 | 12542.67 | 157318452.7 |
| Finance | 62500 | 19624.06 | 385104168.2 |
+------------+------------+----------------+------------------+
Common Mistakes and Best Practices
Mistakes to Avoid
-
Mixing aggregate and non-aggregate columns: When using
GROUP BY
, select only grouped columns or aggregate functions:sql-- INCORRECT
SELECT department, employee_name, AVG(salary)
FROM employees
GROUP BY department;
-- CORRECT
SELECT department, AVG(salary)
FROM employees
GROUP BY department; -
Using
WHERE
instead ofHAVING
for filtering aggregates:sql-- INCORRECT
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE COUNT(*) > 20
GROUP BY department;
-- CORRECT
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 20;
Best Practices
- Index columns used in GROUP BY for better performance
- Use aliases for aggregate results to improve readability
- Filter data with WHERE before grouping when possible, as it's more efficient
- Consider LIMIT for large result sets to improve query performance
Summary
MySQL aggregate functions are powerful tools for data analysis that allow you to:
- Count rows with
COUNT()
- Calculate sums with
SUM()
- Find averages with
AVG()
- Determine maximum values with
MAX()
- Find minimum values with
MIN()
- Group data with
GROUP BY
- Filter grouped data with
HAVING
These functions are essential for transforming raw data into meaningful insights and are widely used in reporting, analytics, and dashboard applications.
Exercises
To strengthen your understanding, try these exercises:
- Write a query to find the total sales amount by month for the year 2023.
- Calculate the average order value by customer and find the top 5 highest-spending customers.
- For each product category, find how many products are in stock and what's the total inventory value.
- Write a query that shows the number of orders placed by hour of day, identifying peak ordering times.
- Find departments where the highest salary is at least twice the lowest salary.
Additional Resources
- MySQL Official Documentation on Aggregate Functions
- MySQL GROUP BY Optimization
- MySQL Performance Tuning for Aggregate Queries
By mastering MySQL aggregate functions, you'll be able to extract valuable insights from your data and build more sophisticated database applications.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)