PostgreSQL Aggregate Functions

Introduction

Aggregate functions are a powerful feature in PostgreSQL that allow you to perform calculations across multiple rows of a query result and return a single output value. They're essential for data analysis, reporting, and summarizing large datasets.

Unlike regular functions that operate on each row individually, aggregate functions work on groups of rows to compute statistics like averages, sums, counts, and more. These functions are fundamental to analytical queries and form the backbone of business intelligence applications.

Basic Aggregate Functions

Let's explore the most commonly used aggregate functions in PostgreSQL:

COUNT()

The COUNT() function returns the number of rows that match a specified condition.

sql
-- Count all rows in a table
SELECT COUNT(*) FROM employees;

-- Count non-null values in a specific column
SELECT COUNT(email) FROM employees;

-- Count employees by department
SELECT department_id, COUNT(*) 
FROM employees
GROUP BY department_id;

Example output:

 count 
-------
    25

SUM()

The SUM() function calculates the total sum of a numeric column.

sql
-- Calculate total salary budget
SELECT SUM(salary) FROM employees;

-- Calculate sum of salaries by department
SELECT department_id, SUM(salary) 
FROM employees 
GROUP BY department_id;

Example output:

   sum   
---------
 1582500

AVG()

The AVG() function calculates the average value of a numeric column.

sql
-- Calculate average salary
SELECT AVG(salary) FROM employees;

-- Calculate average salary by department
SELECT department_id, AVG(salary) 
FROM employees 
GROUP BY department_id;

Example output:

          avg          
-----------------------
 63300.0000000000000000

MIN() and MAX()

These functions return the minimum and maximum values in a column.

sql
-- Find lowest and highest salaries
SELECT MIN(salary), MAX(salary) FROM employees;

-- Find salary range by department
SELECT department_id, MIN(salary), MAX(salary) 
FROM employees 
GROUP BY department_id;

Example output:

  min  |  max  
-------+-------
 42000 | 95000

Advanced Aggregate Functions

PostgreSQL offers several advanced aggregate functions for more complex analyses:

ARRAY_AGG()

The ARRAY_AGG() function aggregates values into an array.

sql
-- Collect all employee names in a department into an array
SELECT department_id, ARRAY_AGG(employee_name) 
FROM employees 
GROUP BY department_id;

Example output:

 department_id |                array_agg                
---------------+----------------------------------------
             1 | {John,Mary,Robert,Susan}
             2 | {James,Patricia,Michael,Jennifer,David}

STRING_AGG()

The STRING_AGG() function concatenates values into a string with a delimiter.

sql
-- Create a comma-separated list of employees in each department
SELECT department_id, STRING_AGG(employee_name, ', ') 
FROM employees 
GROUP BY department_id;

Example output:

 department_id |              string_agg              
---------------+-------------------------------------
             1 | John, Mary, Robert, Susan
             2 | James, Patricia, Michael, Jennifer, David

JSON_AGG() and JSONB_AGG()

These functions aggregate values into JSON or JSONB arrays.

sql
-- Create JSON array of employee records
SELECT department_id, JSON_AGG(ROW_TO_JSON(e)) 
FROM employees e
GROUP BY department_id;

Example output:

 department_id |                                 json_agg
---------------+--------------------------------------------------------------------------
             1 | [{"id":1,"name":"John","salary":65000}, {"id":2,"name":"Mary","salary":72000}]

Statistical Aggregate Functions

PostgreSQL includes several statistical aggregate functions:

STDDEV(), VARIANCE()

Calculate the standard deviation and variance of a dataset.

sql
-- Calculate salary standard deviation
SELECT STDDEV(salary), VARIANCE(salary) FROM employees;

Example output:

       stddev       |     variance     
--------------------+------------------
 12456.7890123456   | 155171543.21

PERCENTILE_CONT() and PERCENTILE_DISC()

These functions calculate percentiles of a dataset.

sql
-- Find median salary (50th percentile)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) 
FROM employees;

-- Find 25th, 50th and 75th percentiles
SELECT 
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS "25th",
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS "50th",
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS "75th"
FROM employees;

Example output:

 50th  
-------
 62500

Using Aggregate Functions with GROUP BY

The GROUP BY clause is often used with aggregate functions to group rows that have the same values in specified columns:

sql
-- Count employees and average salary by department and job title
SELECT 
  department_id,
  job_title,
  COUNT(*) AS employee_count,
  AVG(salary) AS avg_salary
FROM 
  employees
GROUP BY 
  department_id, 
  job_title
ORDER BY 
  department_id, 
  job_title;

Example output:

 department_id |   job_title   | employee_count |    avg_salary    
---------------+---------------+----------------+------------------
             1 | Developer     |              3 | 72000.00000000
             1 | Manager       |              1 | 85000.00000000
             2 | Analyst       |              4 | 65000.00000000
             2 | Designer      |              2 | 68000.00000000

Filtering Groups with HAVING

The HAVING clause allows you to filter the results of aggregate functions:

sql
-- Find departments with average salary greater than 70000
SELECT 
  department_id, 
  AVG(salary) AS avg_salary
FROM 
  employees
GROUP BY 
  department_id
HAVING 
  AVG(salary) > 70000;

Example output:

 department_id |    avg_salary    
---------------+------------------
             1 | 75666.66666666667
             4 | 82500.00000000000

Common Use Cases

Here are some real-world applications of aggregate functions:

Sales Reporting

sql
-- Monthly sales totals
SELECT 
  EXTRACT(YEAR FROM order_date) AS year,
  EXTRACT(MONTH FROM order_date) AS month,
  SUM(order_total) AS monthly_sales
FROM 
  orders
GROUP BY 
  EXTRACT(YEAR FROM order_date),
  EXTRACT(MONTH FROM order_date)
ORDER BY 
  year, month;

Customer Analysis

sql
-- Top 5 customers by total purchase amount
SELECT 
  customer_id,
  COUNT(order_id) AS order_count,
  SUM(order_total) AS total_spent
FROM 
  orders
GROUP BY 
  customer_id
ORDER BY 
  total_spent DESC
LIMIT 5;

Product Performance

sql
-- Product sales statistics
SELECT 
  product_id,
  COUNT(*) AS times_sold,
  SUM(quantity) AS units_sold,
  AVG(price) AS avg_price,
  SUM(quantity * price) AS total_revenue
FROM 
  order_items
GROUP BY 
  product_id
ORDER BY 
  total_revenue DESC;

Handling NULL Values

Aggregate functions typically ignore NULL values in their calculations:

sql
-- Table: employees(id, name, department, bonus)
-- Where some employees have NULL bonus values

-- This counts only employees with non-NULL bonus values
SELECT COUNT(bonus) FROM employees;  

-- To count all rows including NULL values
SELECT COUNT(*) FROM employees;

-- To handle NULL values in calculations
SELECT 
  AVG(COALESCE(bonus, 0)) AS avg_bonus_including_nulls
FROM 
  employees;

Window Functions vs. Aggregate Functions

While related to aggregate functions, window functions (like ROW_NUMBER(), RANK(), etc.) calculate results across a set of table rows related to the current row. Unlike regular aggregate functions, window functions don't collapse rows:

sql
-- Regular aggregate - returns 1 row per department
SELECT 
  department_id, 
  AVG(salary) 
FROM employees 
GROUP BY department_id;

-- Window function - returns all rows, with department average for each
SELECT 
  employee_id,
  name,
  department_id,
  salary,
  AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
FROM 
  employees;

Combining Multiple Aggregate Functions

You can use multiple aggregate functions in a single query:

sql
-- Comprehensive department statistics
SELECT 
  department_id,
  COUNT(*) AS employee_count,
  MIN(salary) AS min_salary,
  MAX(salary) AS max_salary,
  AVG(salary) AS avg_salary,
  SUM(salary) AS salary_budget
FROM 
  employees
GROUP BY 
  department_id;

Aggregate Function Flow

Here's a visual representation of how aggregate functions work:

Performance Considerations

When using aggregate functions, keep these performance tips in mind:

Indexing: Ensure columns used in GROUP BY clauses are indexed.
Materialized Views: For frequently run aggregation queries, consider using materialized views.
LIMIT after aggregation: Apply LIMIT after aggregation is complete.
Filter first: Apply WHERE conditions before grouping to reduce the amount of data processed.

sql
-- Less efficient (processes all rows before filtering)
SELECT 
  department_id, 
  AVG(salary) 
FROM 
  employees 
GROUP BY 
  department_id 
HAVING 
  department_id IN (1, 2, 3);

-- More efficient (filters rows before grouping)
SELECT 
  department_id, 
  AVG(salary) 
FROM 
  employees 
WHERE 
  department_id IN (1, 2, 3)
GROUP BY 
  department_id;

Summary

PostgreSQL aggregate functions are powerful tools for data analysis that allow you to:

Summarize data across multiple rows
Perform statistical calculations
Group data by common attributes
Filter grouped results
Combine with other SQL features for complex analyses

Mastering aggregate functions is essential for anyone working with databases, especially in data analysis, business intelligence, or reporting applications.

Exercises

To practice your understanding of aggregate functions, try these exercises:

Calculate the total, average, minimum, and maximum sales amount by product category.
Find the departments where the average salary is higher than the company-wide average.
Generate a report showing monthly order counts and revenue for the past year.
Create a list of customers with their total number of orders and average order value.
Find products that have sold more than 100 units in total.

Additional Resources

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Basic Aggregate Functions​

COUNT()​

SUM()​

AVG()​

MIN() and MAX()​

Advanced Aggregate Functions​

ARRAY_AGG()​

STRING_AGG()​

JSON_AGG() and JSONB_AGG()​

Statistical Aggregate Functions​

STDDEV(), VARIANCE()​

PERCENTILE_CONT() and PERCENTILE_DISC()​

Using Aggregate Functions with GROUP BY​

Filtering Groups with HAVING​

Common Use Cases​

Sales Reporting​

Customer Analysis​

Product Performance​

Handling NULL Values​

Window Functions vs. Aggregate Functions​

Combining Multiple Aggregate Functions​

Aggregate Function Flow​

Performance Considerations​

Summary​

Exercises​

Additional Resources​

Introduction

Basic Aggregate Functions

COUNT()

SUM()

AVG()

MIN() and MAX()

Advanced Aggregate Functions

ARRAY_AGG()

STRING_AGG()

JSON_AGG() and JSONB_AGG()

Statistical Aggregate Functions

STDDEV(), VARIANCE()

PERCENTILE_CONT() and PERCENTILE_DISC()

Using Aggregate Functions with GROUP BY

Filtering Groups with HAVING

Common Use Cases

Sales Reporting

Customer Analysis

Product Performance

Handling NULL Values

Window Functions vs. Aggregate Functions

Combining Multiple Aggregate Functions

Aggregate Function Flow

Performance Considerations

Summary

Exercises

Additional Resources