SQL GROUP BY
Introduction
The GROUP BY
clause is one of the most powerful features in SQL, allowing you to organize your data into groups and perform calculations on each group separately. When you need to analyze data at a summarized level rather than individual rows, GROUP BY
is your go-to tool.
In this tutorial, you'll learn:
- What the
GROUP BY
clause does and why it's useful - How to use
GROUP BY
with aggregate functions - Common use cases and examples
- Best practices and common pitfalls
Understanding GROUP BY
Basic Concept
The GROUP BY
clause divides the rows in a result set into groups based on the values in one or more columns. It's typically used with aggregate functions like COUNT()
, SUM()
, AVG()
, MIN()
, and MAX()
to perform calculations on each group.
Think of GROUP BY
as a way to organize your data into buckets, where each bucket contains rows that share the same value in the specified column(s).
Syntax
SELECT column1, column2, ... , aggregate_function(column)
FROM table_name
GROUP BY column1, column2, ...;
Where:
column1, column2, ...
are the columns you want to group byaggregate_function(column)
is an optional aggregate function applied to each group
GROUP BY with Aggregate Functions
Let's look at a simple example using a sales
table:
CREATE TABLE sales (
id INT PRIMARY KEY,
product_name VARCHAR(50),
category VARCHAR(50),
price DECIMAL(10, 2),
quantity INT,
sale_date DATE
);
INSERT INTO sales VALUES
(1, 'Laptop', 'Electronics', 999.99, 1, '2023-01-15'),
(2, 'Mouse', 'Electronics', 24.99, 3, '2023-01-15'),
(3, 'Keyboard', 'Electronics', 59.99, 2, '2023-01-16'),
(4, 'Coffee Mug', 'Kitchen', 12.99, 4, '2023-01-17'),
(5, 'Blender', 'Kitchen', 79.99, 1, '2023-01-18'),
(6, 'Headphones', 'Electronics', 149.99, 2, '2023-01-19'),
(7, 'Plate Set', 'Kitchen', 39.99, 1, '2023-01-20'),
(8, 'Mouse', 'Electronics', 24.99, 5, '2023-01-21');
Counting Items in Each Category
To count how many different products we have in each category:
SELECT category, COUNT(*) AS product_count
FROM sales
GROUP BY category;
Output:
| category | product_count |
|-------------|---------------|
| Electronics | 4 |
| Kitchen | 3 |
The GROUP BY
clause grouped all rows with the same category
value, and the COUNT(*)
function counted the number of rows in each group.
Calculating Totals by Category
Let's calculate the total revenue (price × quantity) for each category:
SELECT
category,
SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY category;
Output:
| category | total_revenue |
|-------------|---------------|
| Electronics | 1485.90 |
| Kitchen | 172.95 |
Multiple Aggregations
You can use multiple aggregate functions in the same query:
SELECT
category,
COUNT(*) AS transaction_count,
SUM(quantity) AS total_items_sold,
AVG(price) AS average_price,
MIN(price) AS lowest_price,
MAX(price) AS highest_price
FROM sales
GROUP BY category;
Output:
| category | transaction_count | total_items_sold | average_price | lowest_price | highest_price |
|-------------|-------------------|------------------|---------------|--------------|---------------|
| Electronics | 4 | 11 | 307.49 | 24.99 | 999.99 |
| Kitchen | 3 | 6 | 44.32 | 12.99 | 79.99 |
Grouping by Multiple Columns
You can group by multiple columns to create more specific groups:
SELECT
category,
product_name,
SUM(quantity) AS total_quantity
FROM sales
GROUP BY category, product_name;
Output:
| category | product_name | total_quantity |
|-------------|--------------|----------------|
| Electronics | Headphones | 2 |
| Electronics | Keyboard | 2 |
| Electronics | Laptop | 1 |
| Electronics | Mouse | 8 |
| Kitchen | Blender | 1 |
| Kitchen | Coffee Mug | 4 |
| Kitchen | Plate Set | 1 |
This groups the data first by category
, and then by product_name
within each category.
GROUP BY with HAVING
The HAVING
clause allows you to filter groups based on aggregate results. It works like the WHERE
clause but operates on grouped data:
SELECT
category,
SUM(quantity) AS total_quantity
FROM sales
GROUP BY category
HAVING SUM(quantity) > 5;
Output:
| category | total_quantity |
|-------------|----------------|
| Electronics | 11 |
| Kitchen | 6 |
GROUP BY with ORDER BY
You can use ORDER BY
to sort your grouped results:
SELECT
category,
SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY category
ORDER BY total_revenue DESC;
Output:
| category | total_revenue |
|-------------|---------------|
| Electronics | 1485.90 |
| Kitchen | 172.95 |
Real-World Applications
Sales Analysis by Date
Let's say you want to analyze sales trends by date:
SELECT
sale_date,
COUNT(*) AS transactions,
SUM(price * quantity) AS daily_revenue
FROM sales
GROUP BY sale_date
ORDER BY sale_date;
Output:
| sale_date | transactions | daily_revenue |
|------------|--------------|---------------|
| 2023-01-15 | 2 | 1074.96 |
| 2023-01-16 | 1 | 119.98 |
| 2023-01-17 | 1 | 51.96 |
| 2023-01-18 | 1 | 79.99 |
| 2023-01-19 | 1 | 299.98 |
| 2023-01-20 | 1 | 39.99 |
| 2023-01-21 | 1 | 124.95 |
Calculating Product Popularity
SELECT
product_name,
SUM(quantity) AS units_sold,
COUNT(*) AS transaction_count
FROM sales
GROUP BY product_name
ORDER BY units_sold DESC;
Output:
| product_name | units_sold | transaction_count |
|--------------|------------|-------------------|
| Mouse | 8 | 2 |
| Coffee Mug | 4 | 1 |
| Headphones | 2 | 1 |
| Keyboard | 2 | 1 |
| Blender | 1 | 1 |
| Laptop | 1 | 1 |
| Plate Set | 1 | 1 |
Group by Year and Month
Let's extract the year and month from the sale date and group by them:
SELECT
EXTRACT(YEAR FROM sale_date) AS year,
EXTRACT(MONTH FROM sale_date) AS month,
SUM(price * quantity) AS monthly_revenue
FROM sales
GROUP BY
EXTRACT(YEAR FROM sale_date),
EXTRACT(MONTH FROM sale_date)
ORDER BY year, month;
Output:
| year | month | monthly_revenue |
|------|-------|-----------------|
| 2023 | 1 | 1791.81 |
In this example, all sales were in January 2023, but in a real-world scenario, this query would show revenue trends over time.
Common Pitfalls and Best Practices
The SELECT Clause Rule
In a GROUP BY
query, every column in the SELECT
clause must either:
- Be included in the
GROUP BY
clause, or - Be used within an aggregate function
This incorrect query will generate an error:
-- Incorrect: product_name is not in GROUP BY
SELECT category, product_name, COUNT(*)
FROM sales
GROUP BY category;
The correct version:
SELECT category, product_name, COUNT(*)
FROM sales
GROUP BY category, product_name;
Order of Operations
The SQL execution order for a query with GROUP BY
is:
FROM
clauseWHERE
clauseGROUP BY
clauseHAVING
clauseSELECT
clauseORDER BY
clause
Understanding this order is important. For example, you can't use an alias created in the SELECT
clause within the GROUP BY
or HAVING
clause.
GROUP BY vs. DISTINCT
While both GROUP BY
and DISTINCT
can remove duplicates, they serve different purposes:
DISTINCT
simply removes duplicate rowsGROUP BY
organizes data into groups for aggregate calculations
-- Using DISTINCT
SELECT DISTINCT category FROM sales;
-- Using GROUP BY
SELECT category FROM sales GROUP BY category;
Both queries return the same result in this case, but GROUP BY
allows you to perform calculations on each group.
Visual Representation
Here's a diagram showing how GROUP BY
works:
Summary
The GROUP BY
clause is an essential SQL feature for data analysis and reporting. Here's what we've covered:
GROUP BY
organizes rows into groups based on column values- It's typically used with aggregate functions like
COUNT()
,SUM()
,AVG()
,MIN()
, andMAX()
- You can group by multiple columns for more specific grouping
- The
HAVING
clause filters groups based on aggregate results - Every column in the
SELECT
list must either be in theGROUP BY
clause or used within an aggregate function
By mastering GROUP BY
, you'll be able to transform raw data into meaningful insights and summaries that help you understand your data better.
Exercises
- Write a query to find the total quantity sold for each product in the sales table.
- Write a query to find the average price of products in each category.
- Find the date with the highest total sales revenue.
- Count how many sales transactions occurred on each day of the week.
- Find categories where the average price is greater than $50.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)