PostgreSQL Aggregate Functions
Introduction
Aggregate functions are a powerful feature in PostgreSQL that allow you to perform calculations across multiple rows of data, returning a single summarized value. They're essential tools for data analysis, reporting, and statistical operations in databases.
Unlike regular functions that process one row at a time, aggregate functions work on sets of rows, making them perfect for tasks like:
- Calculating sums, averages, and other statistical measures
- Finding minimum and maximum values
- Counting records that match specific criteria
- Grouping data for meaningful summaries
In this guide, we'll explore PostgreSQL's aggregate functions, learn how they work, and see practical examples of how to use them in real-world scenarios.
Basic Aggregate Functions
PostgreSQL offers several built-in aggregate functions for common statistical operations. Let's explore these one by one.
COUNT()
The COUNT()
function returns the number of rows that match a specified condition.
-- Count all rows in a table
SELECT COUNT(*) FROM employees;
-- Count non-null values in a specific column
SELECT COUNT(email) FROM employees;
-- Count unique values
SELECT COUNT(DISTINCT department) FROM employees;
Example output:
count
-------
42
SUM()
The SUM()
function calculates the total of numeric values in a column.
-- Calculate total salary of all employees
SELECT SUM(salary) FROM employees;
-- Calculate total salary per department
SELECT department, SUM(salary) FROM employees GROUP BY department;
Example output:
department | sum
--------------+------------
Engineering | 1250000.00
Marketing | 680000.00
Sales | 925000.00
AVG()
The AVG()
function computes the average (mean) of values in a column.
-- Calculate average salary across the company
SELECT AVG(salary) FROM employees;
-- Calculate average salary by job title
SELECT job_title, AVG(salary) FROM employees GROUP BY job_title;
Example output:
avg
------------------
67842.85714285714
MIN() and MAX()
These functions find the minimum and maximum values in a column, respectively.
-- Find the lowest and highest salaries
SELECT MIN(salary), MAX(salary) FROM employees;
-- Find salary range per department
SELECT department, MIN(salary), MAX(salary) FROM employees GROUP BY department;
Example output:
min | max
----------+----------
42000.00 | 150000.00
Grouping Data with GROUP BY
The true power of aggregate functions emerges when combined with the GROUP BY
clause, which allows you to categorize data before applying the aggregate function.
-- Count employees in each department
SELECT department, COUNT(*) FROM employees GROUP BY department;
-- Find average salaries by department and job title
SELECT department, job_title, AVG(salary)
FROM employees
GROUP BY department, job_title;
Example output:
department | count
--------------+-------
Engineering | 18
Marketing | 8
Sales | 12
HR | 4
Filtering Grouped Results with HAVING
While the WHERE
clause filters rows before grouping, the HAVING
clause filters the results of aggregate functions.
-- Find departments with more than 10 employees
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
-- Find departments with average salary greater than 70000
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000;
Example output:
department | count
--------------+-------
Engineering | 18
Sales | 12
Advanced Aggregate Functions
PostgreSQL provides several more sophisticated aggregate functions for statistical analysis and complex data manipulation.
ARRAY_AGG()
This function aggregates values into an array.
-- Collect all employee names by department
SELECT department, ARRAY_AGG(full_name) AS employees
FROM employees
GROUP BY department;
Example output:
department | employees
------------+------------------------------------------------------------------------
Sales | {John Smith,Maria Garcia,Robert Johnson,Sarah Thompson}
Engineering| {David Wilson,Michael Brown,Lisa Davis,James Miller,Jennifer Wilson}
STRING_AGG()
This function concatenates strings from multiple rows, with an optional delimiter.
-- Combine all employee names into a comma-separated list by department
SELECT department, STRING_AGG(full_name, ', ') AS employee_list
FROM employees
GROUP BY department;
Example output:
department | employee_list
------------+----------------------------------------------------------------
Sales | John Smith, Maria Garcia, Robert Johnson, Sarah Thompson
Engineering| David Wilson, Michael Brown, Lisa Davis, James Miller
JSONB_AGG()
This function aggregates values into a JSON array.
-- Create a JSON array of employee information by department
SELECT department, JSONB_AGG(
jsonb_build_object(
'name', full_name,
'salary', salary,
'hire_date', hire_date
)
) AS team_details
FROM employees
GROUP BY department;
Example output:
department | team_details
------------+----------------------------------------------------------------
Sales | [{"name": "John Smith", "salary": 65000, "hire_date": "2022-03-15"}, {"name": "Maria Garcia", "salary": 67000, "hire_date": "2021-11-02"}, ...]
Statistical Functions
PostgreSQL includes aggregate functions for statistical analysis:
-- Calculate various statistics for salaries
SELECT
department,
COUNT(*) AS count,
AVG(salary) AS mean,
STDDEV(salary) AS standard_deviation,
VAR_SAMP(salary) AS variance,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees
GROUP BY department;
Example output:
department | count | mean | standard_deviation | variance | median
------------+-------+---------+--------------------+----------+--------
Engineering| 18 | 85000.50| 15250.45 | 232575122| 82500.00
Marketing | 8 | 72500.00| 12750.55 | 162575752| 70000.00
Window Functions vs. Aggregate Functions
While both window and aggregate functions perform calculations across sets of rows, they differ in a fundamental way:
- Aggregate functions combine multiple rows into a single result row
- Window functions perform calculations across rows but keep all rows in the result
Here's a comparison:
-- Aggregate function: Returns one row with the average
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
-- Window function: Returns all rows with the average added as a column
SELECT
full_name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;
Example output for the window function:
full_name | department | salary | dept_avg_salary
--------------+------------+---------+----------------
John Smith | Sales | 65000.00| 68750.00
Maria Garcia | Sales | 67000.00| 68750.00
Robert Lee | Sales | 72000.00| 68750.00
Sarah Johnson| Sales | 71000.00| 68750.00
David Wilson | Engineering| 85000.00| 82500.00
Michael Brown| Engineering| 92000.00| 82500.00
Aggregate Functions with Conditional Logic
You can use CASE
statements within aggregate functions to perform conditional aggregation.
Conditional COUNT
-- Count employees by salary range
SELECT
COUNT(*) AS total_employees,
COUNT(CASE WHEN salary < 50000 THEN 1 END) AS junior_level,
COUNT(CASE WHEN salary BETWEEN 50000 AND 80000 THEN 1 END) AS mid_level,
COUNT(CASE WHEN salary > 80000 THEN 1 END) AS senior_level
FROM employees;
Example output:
total_employees | junior_level | mid_level | senior_level
-----------------+--------------+-----------+-------------
42 | 7 | 20 | 15
Conditional SUM
-- Calculate total salary by experience level
SELECT
SUM(salary) AS total_salary,
SUM(CASE WHEN experience_years < 2 THEN salary ELSE 0 END) AS junior_salary,
SUM(CASE WHEN experience_years BETWEEN 2 AND 5 THEN salary ELSE 0 END) AS mid_salary,
SUM(CASE WHEN experience_years > 5 THEN salary ELSE 0 END) AS senior_salary
FROM employees;
Example output:
total_salary | junior_salary | mid_salary | senior_salary
--------------+---------------+------------+--------------
2850000.00 | 310000.00 | 1240000.00 | 1300000.00
Performance Considerations
When using aggregate functions with large datasets, keep these performance tips in mind:
- Indexes: Ensure columns used in
GROUP BY
clauses are properly indexed - Filter early: Use
WHERE
clauses to reduce the dataset before aggregation - Materialized Views: Consider using materialized views for frequently run aggregate queries
- Partial Aggregation: For very large datasets, consider breaking calculations into steps
Practical Examples
Example 1: Sales Analysis
Let's analyze a sales database to generate a monthly sales report:
-- Monthly sales summary
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS order_count,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(order_total) AS total_revenue,
AVG(order_total) AS average_order_value,
SUM(order_total) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Example output:
month | order_count | unique_customers | total_revenue | average_order_value | revenue_per_customer
--------------+-------------+------------------+---------------+---------------------+---------------------
2023-01-01 | 245 | 180 | 24680.50 | 100.74 | 137.11
2023-02-01 | 268 | 195 | 27850.25 | 103.92 | 142.82
...
Example 2: Employee Performance Dashboard
Create a comprehensive view of department performance:
-- Department performance dashboard
SELECT
d.department_name,
COUNT(e.employee_id) AS employee_count,
ROUND(AVG(e.salary), 2) AS avg_salary,
ROUND(AVG(p.performance_score), 1) AS avg_performance,
COUNT(CASE WHEN p.performance_score >= 4 THEN 1 END) AS top_performers,
STRING_AGG(CASE WHEN p.performance_score >= 4.5 THEN e.full_name ELSE NULL END, ', ') AS star_employees
FROM employees e
JOIN departments d ON e.department_id = d.department_id
JOIN performance p ON e.employee_id = p.employee_id
WHERE p.review_date >= (CURRENT_DATE - INTERVAL '1 year')
GROUP BY d.department_name
ORDER BY avg_performance DESC;
Example output:
department_name | employee_count | avg_salary | avg_performance | top_performers | star_employees
-----------------+----------------+------------+-----------------+----------------+----------------
Engineering | 18 | 85000.50 | 4.2 | 12 | David Wilson, Lisa Davis, James Miller
Sales | 12 | 68750.00 | 3.9 | 7 | Maria Garcia, Sarah Thompson
Marketing | 8 | 72500.00 | 3.7 | 4 | Nicole Adams
HR | 4 | 62500.00 | 3.5 | 1 |
Example 3: Product Inventory Analysis
Analyze your inventory status across categories:
-- Inventory analysis by category
SELECT
c.category_name,
COUNT(p.product_id) AS product_count,
SUM(i.quantity) AS total_stock,
ROUND(AVG(i.quantity), 0) AS avg_stock_per_product,
MIN(i.quantity) AS min_stock,
MAX(i.quantity) AS max_stock,
COUNT(CASE WHEN i.quantity < 10 THEN 1 END) AS low_stock_products,
STRING_AGG(CASE WHEN i.quantity < 5 THEN p.product_name ELSE NULL END, ', ') AS critical_stock_items
FROM products p
JOIN categories c ON p.category_id = c.category_id
JOIN inventory i ON p.product_id = i.product_id
GROUP BY c.category_name
ORDER BY low_stock_products DESC;
Example output:
category_name | product_count | total_stock | avg_stock_per_product | min_stock | max_stock | low_stock_products | critical_stock_items
---------------+---------------+-------------+-----------------------+-----------+-----------+--------------------+---------------------
Electronics | 28 | 1245 | 44 | 2 | 182 | 7 | Wireless Earbuds, Smart Watch
Home Goods | 35 | 2850 | 81 | 15 | 320 | 0 |
Clothing | 42 | 1680 | 40 | 4 | 120 | 3 | Winter Jacket
Summary
PostgreSQL aggregate functions are powerful tools for data summarization and analysis that allow you to:
- Calculate statistical measures across groups of data
- Combine aggregate functions with
GROUP BY
to categorize results - Filter grouped results using
HAVING
- Use advanced aggregation functions for complex data manipulation
- Perform conditional aggregation with
CASE
statements
Mastering these functions is essential for database reporting, data analysis, and building meaningful insights from your PostgreSQL databases.
Additional Exercises
To practice your understanding of PostgreSQL aggregate functions, try these exercises:
-
Calculate the total, average, minimum, and maximum order values for each customer who has placed at least 5 orders.
-
For each product category, find the number of products, average price, and total inventory value (price × quantity).
-
Generate a monthly report showing the number of new user registrations and the retention rate (users who returned from the previous month).
-
Create a query that shows departments where the average salary is below the company-wide average.
-
Find products that have higher than average ratings and how many reviews they have received.
Further Resources
To deepen your knowledge of PostgreSQL aggregate functions:
- PostgreSQL Official Documentation on Aggregate Functions
- PostgreSQL Tutorial: GROUP BY
- Advanced SQL and PostgreSQL for Data Analysis
Remember that aggregate functions are most powerful when combined with other PostgreSQL features like window functions, common table expressions (CTEs), and advanced filtering techniques.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)