Skip to main content

MySQL GROUP BY

The GROUP BY clause is one of the most powerful features in MySQL that allows you to organize and summarize your data by grouping rows based on column values. When combined with aggregate functions, it becomes an essential tool for data analysis and reporting.

Introduction to GROUP BY

In database queries, we often need to analyze data by categories or groups rather than looking at individual records. The GROUP BY clause allows us to:

  • Group rows that have the same values in specified columns
  • Apply aggregate functions to each group
  • Create summary reports
  • Calculate statistics per category

The basic syntax of the GROUP BY clause is:

sql
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

Basic GROUP BY Usage

Let's start with a simple example. Suppose we have a sales table with the following structure:

sql
CREATE TABLE sales (
id INT AUTO_INCREMENT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
amount DECIMAL(10,2),
sale_date DATE
);

Example 1: Counting Records per Group

To count how many sales we have for each product category:

sql
SELECT 
category,
COUNT(*) AS total_sales
FROM
sales
GROUP BY
category;

Output:

+----------------+-------------+
| category | total_sales |
+----------------+-------------+
| Electronics | 15 |
| Clothing | 23 |
| Home & Kitchen | 18 |
| Books | 12 |
+----------------+-------------+

In this query:

  1. We select the category column
  2. We apply the COUNT(*) aggregate function to count rows in each group
  3. We group the results by category

Common Aggregate Functions with GROUP BY

The true power of GROUP BY comes when combining it with aggregate functions. Here are the most common ones:

FunctionDescription
COUNT()Counts the number of rows
SUM()Calculates the sum of values
AVG()Calculates the average of values
MIN()Returns the minimum value
MAX()Returns the maximum value

Example 2: Using Multiple Aggregate Functions

Let's analyze sales data using multiple aggregations:

sql
SELECT 
category,
COUNT(*) AS total_sales,
SUM(amount) AS total_revenue,
AVG(amount) AS average_sale,
MIN(amount) AS min_sale,
MAX(amount) AS max_sale
FROM
sales
GROUP BY
category;

Output:

+----------------+-------------+---------------+--------------+----------+----------+
| category | total_sales | total_revenue | average_sale | min_sale | max_sale |
+----------------+-------------+---------------+--------------+----------+----------+
| Electronics | 15 | 7500.00 | 500.00 | 99.99 | 1299.99 |
| Clothing | 23 | 1380.00 | 60.00 | 19.99 | 149.99 |
| Home & Kitchen | 18 | 2340.00 | 130.00 | 24.99 | 499.99 |
| Books | 12 | 240.00 | 20.00 | 9.99 | 49.99 |
+----------------+-------------+---------------+--------------+----------+----------+

This single query gives us comprehensive sales analytics for each product category.

Grouping by Multiple Columns

You can group by multiple columns to create more detailed analysis:

Example 3: Multi-Column Grouping

Let's group sales by both category and year to analyze yearly performance:

sql
SELECT 
category,
YEAR(sale_date) AS sale_year,
COUNT(*) AS total_sales,
SUM(amount) AS total_revenue
FROM
sales
GROUP BY
category,
YEAR(sale_date)
ORDER BY
category,
sale_year;

Output:

+----------------+-----------+-------------+---------------+
| category | sale_year | total_sales | total_revenue |
+----------------+-----------+-------------+---------------+
| Books | 2021 | 5 | 99.95 |
| Books | 2022 | 7 | 140.05 |
| Clothing | 2021 | 10 | 599.90 |
| Clothing | 2022 | 13 | 780.10 |
| Electronics | 2021 | 7 | 3500.00 |
| Electronics | 2022 | 8 | 4000.00 |
| Home & Kitchen | 2021 | 8 | 1040.00 |
| Home & Kitchen | 2022 | 10 | 1300.00 |
+----------------+-----------+-------------+---------------+

The query groups sales first by category and then by the year extracted from the sale_date.

HAVING Clause with GROUP BY

While the WHERE clause filters individual rows before grouping, the HAVING clause filters groups after they are formed. This is essential when you need to filter based on aggregate values.

Example 4: Filtering Groups with HAVING

Let's find categories that have generated more than $2000 in revenue:

sql
SELECT 
category,
SUM(amount) AS total_revenue
FROM
sales
GROUP BY
category
HAVING
total_revenue > 2000;

Output:

+----------------+---------------+
| category | total_revenue |
+----------------+---------------+
| Electronics | 7500.00 |
| Home & Kitchen | 2340.00 |
+----------------+---------------+

WITH ROLLUP Modifier

The WITH ROLLUP modifier generates super-aggregate rows that contain grand totals:

Example 5: Using WITH ROLLUP

sql
SELECT 
category,
YEAR(sale_date) AS sale_year,
SUM(amount) AS total_revenue
FROM
sales
GROUP BY
category,
YEAR(sale_date)
WITH ROLLUP;

Output:

+----------------+-----------+---------------+
| category | sale_year | total_revenue |
+----------------+-----------+---------------+
| Books | 2021 | 99.95 |
| Books | 2022 | 140.05 |
| Books | NULL | 240.00 | -- Subtotal for Books
| Clothing | 2021 | 599.90 |
| Clothing | 2022 | 780.10 |
| Clothing | NULL | 1380.00 | -- Subtotal for Clothing
| Electronics | 2021 | 3500.00 |
| Electronics | 2022 | 4000.00 |
| Electronics | NULL | 7500.00 | -- Subtotal for Electronics
| Home & Kitchen | 2021 | 1040.00 |
| Home & Kitchen | 2022 | 1300.00 |
| Home & Kitchen | NULL | 2340.00 | -- Subtotal for Home & Kitchen
| NULL | NULL | 11460.00 | -- Grand Total
+----------------+-----------+---------------+

This provides subtotals for each category as well as a grand total of all sales.

Common Pitfalls and Best Practices

1. Non-Aggregated Columns

In MySQL, any column in the SELECT list that is not enclosed in an aggregate function must be included in the GROUP BY clause. Consider this example:

sql
-- This will cause an error
SELECT
category,
product_name,
SUM(amount)
FROM
sales
GROUP BY
category;

The query above would fail because product_name is not in the GROUP BY clause. To fix it:

sql
SELECT 
category,
product_name,
SUM(amount) AS total_amount
FROM
sales
GROUP BY
category,
product_name;

2. Group By vs. Distinct

Beginners often confuse GROUP BY with DISTINCT. While both can eliminate duplicates, they serve different purposes:

sql
-- Using DISTINCT to get unique categories
SELECT DISTINCT category FROM sales;

-- Using GROUP BY to get unique categories
SELECT category FROM sales GROUP BY category;

Both queries would return the same result in this simple case, but GROUP BY allows you to use aggregate functions while DISTINCT does not.

Real-World Applications

Example 6: Monthly Sales Report

A common business requirement is to generate monthly sales reports:

sql
SELECT 
DATE_FORMAT(sale_date, '%Y-%m') AS month,
COUNT(*) AS transaction_count,
SUM(amount) AS monthly_revenue,
ROUND(AVG(amount), 2) AS average_transaction
FROM
sales
WHERE
sale_date >= '2022-01-01'
GROUP BY
DATE_FORMAT(sale_date, '%Y-%m')
ORDER BY
month;

Output:

+----------+------------------+-----------------+---------------------+
| month | transaction_count| monthly_revenue | average_transaction |
+----------+------------------+-----------------+---------------------+
| 2022-01 | 10 | 950.75 | 95.08 |
| 2022-02 | 12 | 1230.50 | 102.54 |
| 2022-03 | 15 | 1575.25 | 105.02 |
| 2022-04 | 11 | 1100.00 | 100.00 |
+----------+------------------+-----------------+---------------------+

Example 7: Customer Segmentation

Let's segment customers based on their purchase behaviors:

sql
SELECT 
CASE
WHEN SUM(amount) < 100 THEN 'Low Spender'
WHEN SUM(amount) BETWEEN 100 AND 500 THEN 'Medium Spender'
ELSE 'High Spender'
END AS customer_segment,
COUNT(*) AS customer_count,
ROUND(AVG(SUM(amount)), 2) AS average_spending
FROM
sales
GROUP BY
customer_id
GROUP BY
customer_segment;

Output:

+----------------+---------------+------------------+
| customer_segment | customer_count | average_spending |
+----------------+---------------+------------------+
| Low Spender | 45 | 57.35 |
| Medium Spender | 32 | 287.42 |
| High Spender | 18 | 742.88 |
+----------------+---------------+------------------+

Summary

The MySQL GROUP BY clause is an essential tool for data analysis that allows you to:

  • Group rows with similar values
  • Apply aggregate functions to each group
  • Generate summary statistics
  • Create sophisticated reports
  • Perform data analysis by categories

When combined with aggregate functions and other SQL features like HAVING, WITH ROLLUP, and conditional expressions, GROUP BY becomes incredibly powerful for mining insights from your data.

Practice Exercises

  1. Given a students table, write a query to find the average grade for each course.
  2. Using an orders table, find the total sales per customer, but only include customers who have spent more than $1000.
  3. Create a sales report that shows quarterly revenue for each product category for the past year, including subtotals.
  4. Write a query to find the top 3 selling products by quantity in each category.
  5. Generate a report showing customer counts by country and city, with appropriate subtotals using WITH ROLLUP.

Additional Resources

Understanding GROUP BY is a critical step in mastering SQL and will significantly enhance your ability to analyze data and generate valuable business insights from your MySQL databases.



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)