PostgreSQL HAVING
Introduction
When working with databases, you'll often need to analyze grouped data - like finding average sales by product category, identifying departments with high employee turnover, or determining customers who have placed multiple orders. While PostgreSQL's GROUP BY
clause allows you to organize data into groups, the HAVING
clause gives you the power to filter those groups based on aggregate conditions.
In this tutorial, you'll learn:
- What the
HAVING
clause is and why it's necessary - How
HAVING
differs fromWHERE
- Syntax and usage patterns
- Practical real-world examples
- Common mistakes and how to avoid them
Understanding the HAVING Clause
The HAVING
clause works in conjunction with GROUP BY
to filter grouped data based on aggregate functions like COUNT()
, SUM()
, AVG()
, MAX()
, or MIN()
.
Basic Syntax
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1
HAVING condition;
The key difference between WHERE
and HAVING
is:
WHERE
filters individual rows before they are groupedHAVING
filters groups after theGROUP BY
operation
WHERE vs. HAVING: Key Differences
Let's examine this crucial distinction with a simple example:
-- Using WHERE (filters rows before grouping)
SELECT department_id, AVG(salary)
FROM employees
WHERE salary > 50000
GROUP BY department_id;
-- Using HAVING (filters groups after grouping)
SELECT department_id, AVG(salary)
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000;
These queries produce different results:
- The first query calculates the average salary only for employees who individually earn more than $50,000
- The second query calculates the average salary for all employees in each department, then shows only departments where that average exceeds $50,000
Basic HAVING Examples
Let's explore some basic examples to understand how HAVING
works:
Example 1: Finding Products with Multiple Orders
SELECT product_id, COUNT(order_id) AS order_count
FROM order_items
GROUP BY product_id
HAVING COUNT(order_id) > 10;
Output:
product_id | order_count
------------+-------------
156 | 24
42 | 15
107 | 12
88 | 11
This query shows products that appear in more than 10 orders.
Example 2: Identifying High-Value Customer Segments
SELECT customer_segment, SUM(purchase_amount) AS total_sales
FROM sales
GROUP BY customer_segment
HAVING SUM(purchase_amount) > 100000;
Output:
customer_segment | total_sales
------------------+--------------
Enterprise | 345621.50
Small Business | 124500.75
This query identifies customer segments that have generated more than $100,000 in total sales.
Working with Multiple Aggregates in HAVING
You can use multiple aggregate functions in your HAVING
clause to create more complex filtering conditions:
SELECT category,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING COUNT(*) >= 5 AND AVG(price) < 50;
Output:
category | product_count | avg_price
-------------+---------------+-----------
Electronics | 8 | 42.75
Kitchen | 12 | 28.50
Books | 15 | 15.25
This query finds product categories that have at least 5 products with an average price below $50.
Using HAVING with Joins
The HAVING
clause works seamlessly with joined tables, allowing you to filter grouped data across relationships:
SELECT c.customer_name,
COUNT(o.order_id) AS order_count,
SUM(o.order_total) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name
HAVING COUNT(o.order_id) >= 3 AND SUM(o.order_total) > 1000;
Output:
customer_name | order_count | total_spent
------------------+-------------+-------------
Johnson Company | 5 | 2450.50
Smith Industries | 3 | 1850.75
Tech Solutions | 7 | 5320.25
This query identifies valuable repeat customers who have placed at least 3 orders and spent more than $1,000 in total.
Practical Applications
Let's explore some real-world applications of the HAVING
clause:
Finding Anomalies in Data
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 2 * (SELECT AVG(salary) FROM employees);
This query identifies departments with an average salary more than twice the company-wide average - potentially highlighting budget allocation issues.
Analyzing Product Performance
SELECT p.product_name,
COUNT(s.sale_id) AS sale_count,
SUM(s.quantity) AS units_sold
FROM products p
JOIN sales s ON p.product_id = s.product_id
GROUP BY p.product_name
HAVING COUNT(s.sale_id) < 5 AND SUM(s.quantity) > 100;
This query finds products with few sales transactions but high quantities per order - potentially indicating bulk purchasing patterns.
Identifying Active Users
SELECT user_id,
COUNT(DISTINCT date(login_time)) AS active_days
FROM user_logins
WHERE login_time >= current_date - interval '30 days'
GROUP BY user_id
HAVING COUNT(DISTINCT date(login_time)) >= 20;
This query finds power users who have logged in on at least 20 different days in the past month.
Query Execution Order
Understanding the execution order of SQL clauses helps in writing effective queries:
Note that:
WHERE
filters rows before groupingGROUP BY
creates groupsHAVING
filters groups after they're createdSELECT
determines which columns appear in the results
Common Mistakes and How to Avoid Them
Mistake 1: Using HAVING Without GROUP BY
-- This will cause an error
SELECT product_name
FROM products
HAVING price > 100;
Correct approach:
-- Use WHERE instead
SELECT product_name
FROM products
WHERE price > 100;
Mistake 2: Using HAVING for Non-Aggregate Filtering
-- Inefficient approach
SELECT category, COUNT(*) AS count
FROM products
GROUP BY category
HAVING category != 'Accessories';
Better approach:
-- Filter non-aggregated columns with WHERE
SELECT category, COUNT(*) AS count
FROM products
WHERE category != 'Accessories'
GROUP BY category;
Mistake 3: Forgetting That HAVING Works on Grouped Data
-- This won't work as expected
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING salary > 50000;
Correct approach:
-- Use aggregate functions in HAVING
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
Summary
The PostgreSQL HAVING
clause is a powerful tool for filtering grouped data based on aggregate conditions. Remember these key points:
- Use
WHERE
to filter individual rows before grouping - Use
HAVING
to filter groups after theGROUP BY
operation HAVING
works with aggregate functions likeCOUNT()
,SUM()
, andAVG()
HAVING
is executed afterGROUP BY
but beforeSELECT
By mastering the HAVING
clause, you can create sophisticated queries that answer complex business questions about your data.
Exercises
To reinforce your understanding, try these exercises:
-
Write a query to find departments with more than 10 employees and an average tenure greater than 5 years.
-
Create a query to identify products that have been ordered by at least 5 different customers with a total quantity exceeding 100 units.
-
Develop a query to find customer segments that have both high average order values (>$200) and high repeat purchase rates (>3 orders per customer on average).
Additional Resources
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)