PostgreSQL Aggregate Functions

Introduction

Aggregate functions are a powerful feature in PostgreSQL that allow you to perform calculations across multiple rows of data, returning a single summarized value. They're essential tools for data analysis, reporting, and statistical operations in databases.

Unlike regular functions that process one row at a time, aggregate functions work on sets of rows, making them perfect for tasks like:

Calculating sums, averages, and other statistical measures
Finding minimum and maximum values
Counting records that match specific criteria
Grouping data for meaningful summaries

In this guide, we'll explore PostgreSQL's aggregate functions, learn how they work, and see practical examples of how to use them in real-world scenarios.

Basic Aggregate Functions

PostgreSQL offers several built-in aggregate functions for common statistical operations. Let's explore these one by one.

COUNT()

The COUNT() function returns the number of rows that match a specified condition.

sql
-- Count all rows in a table
SELECT COUNT(*) FROM employees;

-- Count non-null values in a specific column
SELECT COUNT(email) FROM employees;

-- Count unique values
SELECT COUNT(DISTINCT department) FROM employees;

Example output:

 count
-------
    42

SUM()

The SUM() function calculates the total of numeric values in a column.

sql
-- Calculate total salary of all employees
SELECT SUM(salary) FROM employees;

-- Calculate total salary per department
SELECT department, SUM(salary) FROM employees GROUP BY department;

Example output:

  department  |    sum
--------------+------------
 Engineering  | 1250000.00
 Marketing    |  680000.00
 Sales        |  925000.00

AVG()

The AVG() function computes the average (mean) of values in a column.

sql
-- Calculate average salary across the company
SELECT AVG(salary) FROM employees;

-- Calculate average salary by job title
SELECT job_title, AVG(salary) FROM employees GROUP BY job_title;

Example output:

       avg
------------------
 67842.85714285714

MIN() and MAX()

These functions find the minimum and maximum values in a column, respectively.

sql
-- Find the lowest and highest salaries
SELECT MIN(salary), MAX(salary) FROM employees;

-- Find salary range per department
SELECT department, MIN(salary), MAX(salary) FROM employees GROUP BY department;

Example output:

   min    |   max
----------+----------
 42000.00 | 150000.00

Grouping Data with GROUP BY

The true power of aggregate functions emerges when combined with the GROUP BY clause, which allows you to categorize data before applying the aggregate function.

sql
-- Count employees in each department
SELECT department, COUNT(*) FROM employees GROUP BY department;

-- Find average salaries by department and job title
SELECT department, job_title, AVG(salary) 
FROM employees 
GROUP BY department, job_title;

Example output:

  department  | count
--------------+-------
 Engineering  |    18
 Marketing    |     8
 Sales        |    12
 HR           |     4

Filtering Grouped Results with HAVING

While the WHERE clause filters rows before grouping, the HAVING clause filters the results of aggregate functions.

sql
-- Find departments with more than 10 employees
SELECT department, COUNT(*) 
FROM employees 
GROUP BY department 
HAVING COUNT(*) > 10;

-- Find departments with average salary greater than 70000
SELECT department, AVG(salary) 
FROM employees 
GROUP BY department 
HAVING AVG(salary) > 70000;

Example output:

  department  | count
--------------+-------
 Engineering  |    18
 Sales        |    12

Advanced Aggregate Functions

PostgreSQL provides several more sophisticated aggregate functions for statistical analysis and complex data manipulation.

ARRAY_AGG()

This function aggregates values into an array.

sql
-- Collect all employee names by department
SELECT department, ARRAY_AGG(full_name) AS employees
FROM employees
GROUP BY department;

Example output:

 department |                               employees
------------+------------------------------------------------------------------------
 Sales      | {John Smith,Maria Garcia,Robert Johnson,Sarah Thompson}
 Engineering| {David Wilson,Michael Brown,Lisa Davis,James Miller,Jennifer Wilson}

STRING_AGG()

This function concatenates strings from multiple rows, with an optional delimiter.

sql
-- Combine all employee names into a comma-separated list by department
SELECT department, STRING_AGG(full_name, ', ') AS employee_list
FROM employees
GROUP BY department;

Example output:

 department |                         employee_list
------------+----------------------------------------------------------------
 Sales      | John Smith, Maria Garcia, Robert Johnson, Sarah Thompson
 Engineering| David Wilson, Michael Brown, Lisa Davis, James Miller

JSONB_AGG()

This function aggregates values into a JSON array.

sql
-- Create a JSON array of employee information by department
SELECT department, JSONB_AGG(
  jsonb_build_object(
    'name', full_name,
    'salary', salary,
    'hire_date', hire_date
  )
) AS team_details
FROM employees
GROUP BY department;

Example output:

 department |                         team_details
------------+----------------------------------------------------------------
 Sales      | [{"name": "John Smith", "salary": 65000, "hire_date": "2022-03-15"}, {"name": "Maria Garcia", "salary": 67000, "hire_date": "2021-11-02"}, ...]

Statistical Functions

PostgreSQL includes aggregate functions for statistical analysis:

sql
-- Calculate various statistics for salaries
SELECT 
  department,
  COUNT(*) AS count,
  AVG(salary) AS mean,
  STDDEV(salary) AS standard_deviation,
  VAR_SAMP(salary) AS variance,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees
GROUP BY department;

Example output:

 department | count |  mean   | standard_deviation | variance | median
------------+-------+---------+--------------------+----------+--------
 Engineering|  18   | 85000.50|     15250.45       | 232575122| 82500.00
 Marketing  |   8   | 72500.00|     12750.55       | 162575752| 70000.00

Window Functions vs. Aggregate Functions

While both window and aggregate functions perform calculations across sets of rows, they differ in a fundamental way:

Aggregate functions combine multiple rows into a single result row
Window functions perform calculations across rows but keep all rows in the result

Here's a comparison:

sql
-- Aggregate function: Returns one row with the average
SELECT department, AVG(salary) 
FROM employees 
GROUP BY department;

-- Window function: Returns all rows with the average added as a column
SELECT 
  full_name, 
  department, 
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;

Example output for the window function:

  full_name   | department | salary  | dept_avg_salary
--------------+------------+---------+----------------
 John Smith   | Sales      | 65000.00|      68750.00
 Maria Garcia | Sales      | 67000.00|      68750.00
 Robert Lee   | Sales      | 72000.00|      68750.00
 Sarah Johnson| Sales      | 71000.00|      68750.00
 David Wilson | Engineering| 85000.00|      82500.00
 Michael Brown| Engineering| 92000.00|      82500.00

Aggregate Functions with Conditional Logic

You can use CASE statements within aggregate functions to perform conditional aggregation.

Conditional COUNT

sql
-- Count employees by salary range
SELECT 
  COUNT(*) AS total_employees,
  COUNT(CASE WHEN salary < 50000 THEN 1 END) AS junior_level,
  COUNT(CASE WHEN salary BETWEEN 50000 AND 80000 THEN 1 END) AS mid_level,
  COUNT(CASE WHEN salary > 80000 THEN 1 END) AS senior_level
FROM employees;

Example output:

 total_employees | junior_level | mid_level | senior_level
-----------------+--------------+-----------+-------------
              42 |            7 |        20 |          15

Conditional SUM

sql
-- Calculate total salary by experience level
SELECT 
  SUM(salary) AS total_salary,
  SUM(CASE WHEN experience_years < 2 THEN salary ELSE 0 END) AS junior_salary,
  SUM(CASE WHEN experience_years BETWEEN 2 AND 5 THEN salary ELSE 0 END) AS mid_salary,
  SUM(CASE WHEN experience_years > 5 THEN salary ELSE 0 END) AS senior_salary
FROM employees;

Example output:

 total_salary | junior_salary | mid_salary | senior_salary
--------------+---------------+------------+--------------
   2850000.00 |     310000.00 | 1240000.00 |    1300000.00

Performance Considerations

When using aggregate functions with large datasets, keep these performance tips in mind:

Indexes: Ensure columns used in GROUP BY clauses are properly indexed
Filter early: Use WHERE clauses to reduce the dataset before aggregation
Materialized Views: Consider using materialized views for frequently run aggregate queries
Partial Aggregation: For very large datasets, consider breaking calculations into steps

Practical Examples

Example 1: Sales Analysis

Let's analyze a sales database to generate a monthly sales report:

sql
-- Monthly sales summary
SELECT 
  DATE_TRUNC('month', order_date) AS month,
  COUNT(*) AS order_count,
  COUNT(DISTINCT customer_id) AS unique_customers,
  SUM(order_total) AS total_revenue,
  AVG(order_total) AS average_order_value,
  SUM(order_total) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Example output:

    month     | order_count | unique_customers | total_revenue | average_order_value | revenue_per_customer
--------------+-------------+------------------+---------------+---------------------+---------------------
 2023-01-01   |        245  |             180  |     24680.50  |             100.74  |               137.11
 2023-02-01   |        268  |             195  |     27850.25  |             103.92  |               142.82
 ...

Example 2: Employee Performance Dashboard

Create a comprehensive view of department performance:

sql
-- Department performance dashboard
SELECT 
  d.department_name,
  COUNT(e.employee_id) AS employee_count,
  ROUND(AVG(e.salary), 2) AS avg_salary,
  ROUND(AVG(p.performance_score), 1) AS avg_performance,
  COUNT(CASE WHEN p.performance_score >= 4 THEN 1 END) AS top_performers,
  STRING_AGG(CASE WHEN p.performance_score >= 4.5 THEN e.full_name ELSE NULL END, ', ') AS star_employees
FROM employees e
JOIN departments d ON e.department_id = d.department_id
JOIN performance p ON e.employee_id = p.employee_id
WHERE p.review_date >= (CURRENT_DATE - INTERVAL '1 year')
GROUP BY d.department_name
ORDER BY avg_performance DESC;

Example output:

 department_name | employee_count | avg_salary | avg_performance | top_performers | star_employees
-----------------+----------------+------------+-----------------+----------------+----------------
 Engineering     |             18 |   85000.50 |             4.2 |             12 | David Wilson, Lisa Davis, James Miller
 Sales           |             12 |   68750.00 |             3.9 |              7 | Maria Garcia, Sarah Thompson
 Marketing       |              8 |   72500.00 |             3.7 |              4 | Nicole Adams
 HR              |              4 |   62500.00 |             3.5 |              1 | 

Example 3: Product Inventory Analysis

Analyze your inventory status across categories:

sql
-- Inventory analysis by category
SELECT 
  c.category_name,
  COUNT(p.product_id) AS product_count,
  SUM(i.quantity) AS total_stock,
  ROUND(AVG(i.quantity), 0) AS avg_stock_per_product,
  MIN(i.quantity) AS min_stock,
  MAX(i.quantity) AS max_stock,
  COUNT(CASE WHEN i.quantity < 10 THEN 1 END) AS low_stock_products,
  STRING_AGG(CASE WHEN i.quantity < 5 THEN p.product_name ELSE NULL END, ', ') AS critical_stock_items
FROM products p
JOIN categories c ON p.category_id = c.category_id
JOIN inventory i ON p.product_id = i.product_id
GROUP BY c.category_name
ORDER BY low_stock_products DESC;

Example output:

 category_name | product_count | total_stock | avg_stock_per_product | min_stock | max_stock | low_stock_products | critical_stock_items
---------------+---------------+-------------+-----------------------+-----------+-----------+--------------------+---------------------
 Electronics   |            28 |        1245 |                    44 |         2 |       182 |                  7 | Wireless Earbuds, Smart Watch
 Home Goods    |            35 |        2850 |                    81 |        15 |       320 |                  0 | 
 Clothing      |            42 |        1680 |                    40 |         4 |       120 |                  3 | Winter Jacket

Summary

PostgreSQL aggregate functions are powerful tools for data summarization and analysis that allow you to:

Calculate statistical measures across groups of data
Combine aggregate functions with GROUP BY to categorize results
Filter grouped results using HAVING
Use advanced aggregation functions for complex data manipulation
Perform conditional aggregation with CASE statements

Mastering these functions is essential for database reporting, data analysis, and building meaningful insights from your PostgreSQL databases.

Additional Exercises

To practice your understanding of PostgreSQL aggregate functions, try these exercises:

Calculate the total, average, minimum, and maximum order values for each customer who has placed at least 5 orders.
For each product category, find the number of products, average price, and total inventory value (price × quantity).
Generate a monthly report showing the number of new user registrations and the retention rate (users who returned from the previous month).
Create a query that shows departments where the average salary is below the company-wide average.
Find products that have higher than average ratings and how many reviews they have received.

Further Resources

To deepen your knowledge of PostgreSQL aggregate functions:

Remember that aggregate functions are most powerful when combined with other PostgreSQL features like window functions, common table expressions (CTEs), and advanced filtering techniques.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Basic Aggregate Functions​

COUNT()​

SUM()​

AVG()​

MIN() and MAX()​

Grouping Data with GROUP BY​

Filtering Grouped Results with HAVING​

Advanced Aggregate Functions​

ARRAY_AGG()​

STRING_AGG()​

JSONB_AGG()​

Statistical Functions​

Window Functions vs. Aggregate Functions​

Aggregate Functions with Conditional Logic​

Conditional COUNT​

Conditional SUM​

Performance Considerations​

Practical Examples​

Example 1: Sales Analysis​

Example 2: Employee Performance Dashboard​

Example 3: Product Inventory Analysis​

Summary​

Additional Exercises​

Further Resources​

Introduction

Basic Aggregate Functions

COUNT()

SUM()

AVG()

MIN() and MAX()

Grouping Data with GROUP BY

Filtering Grouped Results with HAVING

Advanced Aggregate Functions

ARRAY_AGG()

STRING_AGG()

JSONB_AGG()

Statistical Functions

Window Functions vs. Aggregate Functions

Aggregate Functions with Conditional Logic

Conditional COUNT

Conditional SUM

Performance Considerations

Practical Examples

Example 1: Sales Analysis

Example 2: Employee Performance Dashboard

Example 3: Product Inventory Analysis

Summary

Additional Exercises

Further Resources