Skip to main content

GROUP BY Clause

The GROUP BY clause is one of the most powerful features in SQL, allowing you to organize and summarize your data in meaningful ways. When you need to analyze data by categories or perform calculations across groups of records, GROUP BY becomes an essential tool in your SQL toolkit.

What is the GROUP BY Clause?

The GROUP BY clause groups rows that have the same values in specified columns into summary rows. It's commonly used with aggregate functions (like COUNT, MAX, AVG, SUM) to perform calculations on each group of rows rather than on the entire table.

Think of GROUP BY like sorting items into different buckets - each bucket represents a group with the same value for the column(s) you specify.

Basic Syntax

sql
SELECT column1, column2, ..., aggregate_function(column)
FROM table_name
GROUP BY column1, column2, ...;

Where:

  • The columns listed after GROUP BY are the ones you want to group your data by
  • The SELECT clause typically includes the columns you're grouping by and aggregate functions

Simple Example

Let's start with a basic example. Imagine we have a sales table:

sql
CREATE TABLE sales (
id INT PRIMARY KEY,
product_name VARCHAR(50),
category VARCHAR(30),
price DECIMAL(10,2),
sale_date DATE
);

INSERT INTO sales VALUES
(1, 'Laptop', 'Electronics', 1200.00, '2023-01-15'),
(2, 'Headphones', 'Electronics', 250.00, '2023-01-16'),
(3, 'Coffee Maker', 'Kitchen', 89.99, '2023-01-16'),
(4, 'Blender', 'Kitchen', 75.50, '2023-01-17'),
(5, 'Smartphone', 'Electronics', 800.00, '2023-01-18'),
(6, 'Toaster', 'Kitchen', 45.99, '2023-01-19');

If we want to know how many products we have in each category, we can use GROUP BY:

sql
SELECT category, COUNT(*) AS product_count
FROM sales
GROUP BY category;

Result:

category     | product_count
-------------|--------------
Electronics | 3
Kitchen | 3

This query groups all rows with the same category together and counts how many rows are in each group.

Using Multiple Aggregate Functions

One of the strengths of GROUP BY is that you can use multiple aggregate functions in a single query:

sql
SELECT 
category,
COUNT(*) AS product_count,
AVG(price) AS average_price,
SUM(price) AS total_value,
MIN(price) AS min_price,
MAX(price) AS max_price
FROM sales
GROUP BY category;

Result:

category    | product_count | average_price | total_value | min_price | max_price
------------|---------------|---------------|-------------|-----------|----------
Electronics | 3 | 750.00 | 2250.00 | 250.00 | 1200.00
Kitchen | 3 | 70.49 | 211.48 | 45.99 | 89.99

This gives us a comprehensive summary of each product category.

Grouping by Multiple Columns

You can group by multiple columns to create more specific groups:

sql
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_category VARCHAR(30),
order_date DATE,
amount DECIMAL(10,2)
);

INSERT INTO orders VALUES
(101, 1, 'Electronics', '2023-01-10', 1200.00),
(102, 2, 'Kitchen', '2023-01-10', 89.99),
(103, 1, 'Kitchen', '2023-01-11', 45.99),
(104, 3, 'Electronics', '2023-01-11', 800.00),
(105, 2, 'Electronics', '2023-01-12', 250.00),
(106, 3, 'Kitchen', '2023-01-12', 75.50);

Now we can group by both date and category:

sql
SELECT 
order_date,
product_category,
COUNT(*) AS order_count,
SUM(amount) AS total_sales
FROM orders
GROUP BY order_date, product_category
ORDER BY order_date, product_category;

Result:

order_date | product_category | order_count | total_sales
-----------|------------------|-------------|------------
2023-01-10 | Electronics | 1 | 1200.00
2023-01-10 | Kitchen | 1 | 89.99
2023-01-11 | Electronics | 1 | 800.00
2023-01-11 | Kitchen | 1 | 45.99
2023-01-12 | Electronics | 1 | 250.00
2023-01-12 | Kitchen | 1 | 75.50

This gives us a breakdown of sales by both date and product category.

Common Aggregate Functions Used with GROUP BY

Here are the most commonly used aggregate functions with GROUP BY:

  1. COUNT(): Counts the number of rows in each group
  2. SUM(): Calculates the sum of values in a column for each group
  3. AVG(): Calculates the average of values in a column for each group
  4. MIN(): Finds the minimum value in a column for each group
  5. MAX(): Finds the maximum value in a column for each group

The HAVING Clause

When you need to filter groups based on aggregate results, you can't use WHERE (which filters individual rows before grouping). Instead, you use the HAVING clause:

sql
SELECT 
product_category,
COUNT(*) AS order_count,
SUM(amount) AS total_sales
FROM orders
GROUP BY product_category
HAVING SUM(amount) > 500;

Result:

product_category | order_count | total_sales
-----------------|-------------|------------
Electronics | 3 | 2250.00

This query only shows product categories with total sales greater than $500.

Common Mistakes and Gotchas

1. Selecting Non-Grouped Columns

A common mistake is trying to select columns that are not part of the GROUP BY clause and not used in an aggregate function:

sql
-- This will produce an error in most SQL databases
SELECT product_name, category, COUNT(*)
FROM sales
GROUP BY category;

This fails because the database doesn't know which product_name to show for each category group. Every column in the SELECT statement must either:

  • Be included in the GROUP BY clause, or
  • Be used with an aggregate function

2. Forgetting that NULL is a Group

NULL values form their own group in a GROUP BY clause:

sql
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
department VARCHAR(50)
);

INSERT INTO employees VALUES
(1, 'Alice', 'Engineering'),
(2, 'Bob', 'Marketing'),
(3, 'Charlie', NULL),
(4, 'Dave', 'Engineering'),
(5, 'Eve', NULL);

SELECT
department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Result:

department   | employee_count
-------------|---------------
Engineering | 2
Marketing | 1
NULL | 2

Real-World Applications

Sales Analysis by Time Period

GROUP BY is perfect for analyzing sales trends over time:

sql
CREATE TABLE monthly_sales (
id INT PRIMARY KEY,
product_id INT,
sale_date DATE,
quantity INT,
revenue DECIMAL(10,2)
);

INSERT INTO monthly_sales VALUES
(1, 101, '2023-01-15', 5, 500.00),
(2, 102, '2023-01-20', 3, 300.00),
(3, 101, '2023-02-10', 7, 700.00),
(4, 103, '2023-02-15', 2, 150.00),
(5, 102, '2023-03-05', 4, 400.00),
(6, 101, '2023-03-20', 6, 600.00);

-- Monthly sales analysis
SELECT
EXTRACT(MONTH FROM sale_date) AS month,
EXTRACT(YEAR FROM sale_date) AS year,
SUM(revenue) AS total_revenue,
SUM(quantity) AS total_units_sold
FROM monthly_sales
GROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
ORDER BY year, month;

Result:

month | year | total_revenue | total_units_sold
------|------|---------------|------------------
1 | 2023 | 800.00 | 8
2 | 2023 | 850.00 | 9
3 | 2023 | 1000.00 | 10

Customer Segmentation

GROUP BY helps categorize customers based on their purchasing behavior:

sql
CREATE TABLE customer_purchases (
id INT PRIMARY KEY,
customer_id INT,
purchase_date DATE,
amount DECIMAL(10,2)
);

INSERT INTO customer_purchases VALUES
(1, 1, '2023-01-10', 120.00),
(2, 2, '2023-01-15', 85.50),
(3, 1, '2023-01-20', 200.00),
(4, 3, '2023-01-25', 50.00),
(5, 2, '2023-02-05', 150.00),
(6, 1, '2023-02-10', 75.00);

-- Categorizing customers by total spend
SELECT
customer_id,
COUNT(*) AS total_purchases,
SUM(amount) AS total_spend,
CASE
WHEN SUM(amount) > 300 THEN 'High Value'
WHEN SUM(amount) > 150 THEN 'Medium Value'
ELSE 'Low Value'
END AS customer_segment
FROM customer_purchases
GROUP BY customer_id
ORDER BY total_spend DESC;

Result:

customer_id | total_purchases | total_spend | customer_segment
------------|-----------------|-------------|-----------------
1 | 3 | 395.00 | High Value
2 | 2 | 235.50 | Medium Value
3 | 1 | 50.00 | Low Value

GROUP BY with ROLLUP (Advanced)

Some databases support ROLLUP, which adds extra rows for subtotals and grand totals:

sql
-- This syntax works in MySQL and some other databases
SELECT
IFNULL(category, 'All Categories') AS category,
COUNT(*) AS product_count,
SUM(price) AS total_value
FROM sales
GROUP BY category WITH ROLLUP;

Result:

category        | product_count | total_value
----------------|---------------|------------
Electronics | 3 | 2250.00
Kitchen | 3 | 211.48
All Categories | 6 | 2461.48

The last row provides totals across all categories.

Summary

The GROUP BY clause is a fundamental SQL feature that allows you to:

  • Group related data together based on common values
  • Perform aggregate calculations on each group of data
  • Create meaningful summaries and reports
  • Analyze patterns and trends in your data

When combined with aggregate functions and the HAVING clause, GROUP BY becomes an incredibly powerful tool for data analysis and reporting.

Practice Exercises

  1. Create a table representing student grades in different subjects. Write a query to find the average grade for each student across all subjects.

  2. Using the same table, find which subjects have the highest and lowest average grades.

  3. Create a sales table with product, salesperson, and region columns. Write a query to find the top-selling product in each region.

  4. Advanced: Write a query that shows sales by quarter and provides quarter-over-quarter percentage changes.



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)