MySQL GROUP BY
The GROUP BY
clause is one of the most powerful features in MySQL that allows you to organize and summarize your data by grouping rows based on column values. When combined with aggregate functions, it becomes an essential tool for data analysis and reporting.
Introduction to GROUP BY
In database queries, we often need to analyze data by categories or groups rather than looking at individual records. The GROUP BY
clause allows us to:
- Group rows that have the same values in specified columns
- Apply aggregate functions to each group
- Create summary reports
- Calculate statistics per category
The basic syntax of the GROUP BY
clause is:
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
Basic GROUP BY Usage
Let's start with a simple example. Suppose we have a sales
table with the following structure:
CREATE TABLE sales (
id INT AUTO_INCREMENT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
amount DECIMAL(10,2),
sale_date DATE
);
Example 1: Counting Records per Group
To count how many sales we have for each product category:
SELECT
category,
COUNT(*) AS total_sales
FROM
sales
GROUP BY
category;
Output:
+----------------+-------------+
| category | total_sales |
+----------------+-------------+
| Electronics | 15 |
| Clothing | 23 |
| Home & Kitchen | 18 |
| Books | 12 |
+----------------+-------------+
In this query:
- We select the
category
column - We apply the
COUNT(*)
aggregate function to count rows in each group - We group the results by
category
Common Aggregate Functions with GROUP BY
The true power of GROUP BY
comes when combining it with aggregate functions. Here are the most common ones:
Function | Description |
---|---|
COUNT() | Counts the number of rows |
SUM() | Calculates the sum of values |
AVG() | Calculates the average of values |
MIN() | Returns the minimum value |
MAX() | Returns the maximum value |
Example 2: Using Multiple Aggregate Functions
Let's analyze sales data using multiple aggregations:
SELECT
category,
COUNT(*) AS total_sales,
SUM(amount) AS total_revenue,
AVG(amount) AS average_sale,
MIN(amount) AS min_sale,
MAX(amount) AS max_sale
FROM
sales
GROUP BY
category;
Output:
+----------------+-------------+---------------+--------------+----------+----------+
| category | total_sales | total_revenue | average_sale | min_sale | max_sale |
+----------------+-------------+---------------+--------------+----------+----------+
| Electronics | 15 | 7500.00 | 500.00 | 99.99 | 1299.99 |
| Clothing | 23 | 1380.00 | 60.00 | 19.99 | 149.99 |
| Home & Kitchen | 18 | 2340.00 | 130.00 | 24.99 | 499.99 |
| Books | 12 | 240.00 | 20.00 | 9.99 | 49.99 |
+----------------+-------------+---------------+--------------+----------+----------+
This single query gives us comprehensive sales analytics for each product category.
Grouping by Multiple Columns
You can group by multiple columns to create more detailed analysis:
Example 3: Multi-Column Grouping
Let's group sales by both category and year to analyze yearly performance:
SELECT
category,
YEAR(sale_date) AS sale_year,
COUNT(*) AS total_sales,
SUM(amount) AS total_revenue
FROM
sales
GROUP BY
category,
YEAR(sale_date)
ORDER BY
category,
sale_year;
Output:
+----------------+-----------+-------------+---------------+
| category | sale_year | total_sales | total_revenue |
+----------------+-----------+-------------+---------------+
| Books | 2021 | 5 | 99.95 |
| Books | 2022 | 7 | 140.05 |
| Clothing | 2021 | 10 | 599.90 |
| Clothing | 2022 | 13 | 780.10 |
| Electronics | 2021 | 7 | 3500.00 |
| Electronics | 2022 | 8 | 4000.00 |
| Home & Kitchen | 2021 | 8 | 1040.00 |
| Home & Kitchen | 2022 | 10 | 1300.00 |
+----------------+-----------+-------------+---------------+
The query groups sales first by category
and then by the year extracted from the sale_date
.
HAVING Clause with GROUP BY
While the WHERE
clause filters individual rows before grouping, the HAVING
clause filters groups after they are formed. This is essential when you need to filter based on aggregate values.
Example 4: Filtering Groups with HAVING
Let's find categories that have generated more than $2000 in revenue:
SELECT
category,
SUM(amount) AS total_revenue
FROM
sales
GROUP BY
category
HAVING
total_revenue > 2000;
Output:
+----------------+---------------+
| category | total_revenue |
+----------------+---------------+
| Electronics | 7500.00 |
| Home & Kitchen | 2340.00 |
+----------------+---------------+
WITH ROLLUP Modifier
The WITH ROLLUP
modifier generates super-aggregate rows that contain grand totals:
Example 5: Using WITH ROLLUP
SELECT
category,
YEAR(sale_date) AS sale_year,
SUM(amount) AS total_revenue
FROM
sales
GROUP BY
category,
YEAR(sale_date)
WITH ROLLUP;
Output:
+----------------+-----------+---------------+
| category | sale_year | total_revenue |
+----------------+-----------+---------------+
| Books | 2021 | 99.95 |
| Books | 2022 | 140.05 |
| Books | NULL | 240.00 | -- Subtotal for Books
| Clothing | 2021 | 599.90 |
| Clothing | 2022 | 780.10 |
| Clothing | NULL | 1380.00 | -- Subtotal for Clothing
| Electronics | 2021 | 3500.00 |
| Electronics | 2022 | 4000.00 |
| Electronics | NULL | 7500.00 | -- Subtotal for Electronics
| Home & Kitchen | 2021 | 1040.00 |
| Home & Kitchen | 2022 | 1300.00 |
| Home & Kitchen | NULL | 2340.00 | -- Subtotal for Home & Kitchen
| NULL | NULL | 11460.00 | -- Grand Total
+----------------+-----------+---------------+
This provides subtotals for each category as well as a grand total of all sales.
Common Pitfalls and Best Practices
1. Non-Aggregated Columns
In MySQL, any column in the SELECT list that is not enclosed in an aggregate function must be included in the GROUP BY clause. Consider this example:
-- This will cause an error
SELECT
category,
product_name,
SUM(amount)
FROM
sales
GROUP BY
category;
The query above would fail because product_name
is not in the GROUP BY clause. To fix it:
SELECT
category,
product_name,
SUM(amount) AS total_amount
FROM
sales
GROUP BY
category,
product_name;
2. Group By vs. Distinct
Beginners often confuse GROUP BY
with DISTINCT
. While both can eliminate duplicates, they serve different purposes:
-- Using DISTINCT to get unique categories
SELECT DISTINCT category FROM sales;
-- Using GROUP BY to get unique categories
SELECT category FROM sales GROUP BY category;
Both queries would return the same result in this simple case, but GROUP BY
allows you to use aggregate functions while DISTINCT
does not.
Real-World Applications
Example 6: Monthly Sales Report
A common business requirement is to generate monthly sales reports:
SELECT
DATE_FORMAT(sale_date, '%Y-%m') AS month,
COUNT(*) AS transaction_count,
SUM(amount) AS monthly_revenue,
ROUND(AVG(amount), 2) AS average_transaction
FROM
sales
WHERE
sale_date >= '2022-01-01'
GROUP BY
DATE_FORMAT(sale_date, '%Y-%m')
ORDER BY
month;
Output:
+----------+------------------+-----------------+---------------------+
| month | transaction_count| monthly_revenue | average_transaction |
+----------+------------------+-----------------+---------------------+
| 2022-01 | 10 | 950.75 | 95.08 |
| 2022-02 | 12 | 1230.50 | 102.54 |
| 2022-03 | 15 | 1575.25 | 105.02 |
| 2022-04 | 11 | 1100.00 | 100.00 |
+----------+------------------+-----------------+---------------------+
Example 7: Customer Segmentation
Let's segment customers based on their purchase behaviors:
SELECT
CASE
WHEN SUM(amount) < 100 THEN 'Low Spender'
WHEN SUM(amount) BETWEEN 100 AND 500 THEN 'Medium Spender'
ELSE 'High Spender'
END AS customer_segment,
COUNT(*) AS customer_count,
ROUND(AVG(SUM(amount)), 2) AS average_spending
FROM
sales
GROUP BY
customer_id
GROUP BY
customer_segment;
Output:
+----------------+---------------+------------------+
| customer_segment | customer_count | average_spending |
+----------------+---------------+------------------+
| Low Spender | 45 | 57.35 |
| Medium Spender | 32 | 287.42 |
| High Spender | 18 | 742.88 |
+----------------+---------------+------------------+
Summary
The MySQL GROUP BY
clause is an essential tool for data analysis that allows you to:
- Group rows with similar values
- Apply aggregate functions to each group
- Generate summary statistics
- Create sophisticated reports
- Perform data analysis by categories
When combined with aggregate functions and other SQL features like HAVING
, WITH ROLLUP
, and conditional expressions, GROUP BY
becomes incredibly powerful for mining insights from your data.
Practice Exercises
- Given a
students
table, write a query to find the average grade for each course. - Using an
orders
table, find the total sales per customer, but only include customers who have spent more than $1000. - Create a sales report that shows quarterly revenue for each product category for the past year, including subtotals.
- Write a query to find the top 3 selling products by quantity in each category.
- Generate a report showing customer counts by country and city, with appropriate subtotals using
WITH ROLLUP
.
Additional Resources
- MySQL Documentation: GROUP BY (Aggregate) Functions
- MySQL Documentation: GROUP BY Modifiers
- MySQL Documentation: MySQL Aggregate Functions
Understanding GROUP BY
is a critical step in mastering SQL and will significantly enhance your ability to analyze data and generate valuable business insights from your MySQL databases.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)