Skip to main content

PostgreSQL DISTINCT

Introduction

When querying databases, you often need to find unique values in a dataset without duplicates. PostgreSQL's DISTINCT clause provides a powerful way to filter out duplicate rows from your query results, returning only unique values.

In this tutorial, you'll learn how to use the DISTINCT clause in PostgreSQL to retrieve unique values from database tables. This is an essential SQL concept that will help you write more efficient and meaningful queries.

Understanding DISTINCT

The DISTINCT clause is used in the SELECT statement to remove duplicate rows from the result set. When you add DISTINCT to your query, PostgreSQL will only return unique values for the column(s) you specify.

Basic Syntax

sql
SELECT DISTINCT column_name
FROM table_name;

Using DISTINCT with a Single Column

Let's start with a basic example. Suppose we have a products table that contains products from various categories:

sql
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10, 2)
);

INSERT INTO products (name, category, price) VALUES
('Laptop', 'Electronics', 999.99),
('Smartphone', 'Electronics', 699.99),
('Headphones', 'Electronics', 149.99),
('Coffee Maker', 'Kitchen', 79.99),
('Blender', 'Kitchen', 49.99),
('Toaster', 'Kitchen', 29.99),
('Running Shoes', 'Clothing', 89.99),
('T-shirt', 'Clothing', 19.99),
('Jeans', 'Clothing', 59.99);

If we want to see all the unique categories in our products table, we would use:

sql
SELECT DISTINCT category
FROM products;

Result:

  category
-------------
Electronics
Kitchen
Clothing

Instead of returning 9 rows (one for each product), the query returns only 3 rows, representing the unique categories.

Using DISTINCT with Multiple Columns

DISTINCT can also be applied to multiple columns. In this case, PostgreSQL will return rows with unique combinations of the specified columns.

Consider a table of customer_orders:

sql
CREATE TABLE customer_orders (
id SERIAL PRIMARY KEY,
customer_id INT,
product_id INT,
order_date DATE,
quantity INT
);

INSERT INTO customer_orders (customer_id, product_id, order_date, quantity) VALUES
(1, 101, '2023-01-15', 2),
(1, 102, '2023-01-15', 1),
(1, 101, '2023-01-20', 1),
(2, 101, '2023-01-16', 3),
(2, 103, '2023-01-16', 2),
(3, 102, '2023-01-18', 1),
(3, 101, '2023-01-18', 1),
(1, 103, '2023-01-20', 2);

To find unique combinations of customers and products:

sql
SELECT DISTINCT customer_id, product_id
FROM customer_orders;

Result:

 customer_id | product_id
-------------+------------
1 | 101
1 | 102
1 | 103
2 | 101
2 | 103
3 | 101
3 | 102

This returns each unique customer-product pair, regardless of how many times a customer ordered that product.

DISTINCT ON Expression

PostgreSQL offers a powerful extension to the standard DISTINCT clause called DISTINCT ON. This allows you to specify which column(s) to check for distinctness while returning data from other columns.

The syntax is:

sql
SELECT DISTINCT ON (column1) column1, column2, column3...
FROM table_name
ORDER BY column1, column2...;

For example, to find the most recent order for each customer:

sql
SELECT DISTINCT ON (customer_id) 
customer_id,
product_id,
order_date
FROM customer_orders
ORDER BY customer_id, order_date DESC;

Result:

 customer_id | product_id | order_date
-------------+------------+------------
1 | 103 | 2023-01-20
2 | 101 | 2023-01-16
3 | 101 | 2023-01-18

This selects the most recent order for each customer. The DISTINCT ON (customer_id) keeps only one row for each customer_id, and the ORDER BY customer_id, order_date DESC ensures we get the latest order for each customer.

COUNT with DISTINCT

A common use case is counting the number of unique values in a column:

sql
SELECT COUNT(DISTINCT category) AS unique_categories
FROM products;

Result:

 unique_categories
------------------
3

This tells us there are three unique categories in our products table.

Real-World Applications

Finding Unique Visitors

Imagine you have a website analytics table:

sql
CREATE TABLE page_views (
id SERIAL PRIMARY KEY,
visitor_ip VARCHAR(15),
page_url VARCHAR(255),
visit_date DATE
);

INSERT INTO page_views (visitor_ip, page_url, visit_date) VALUES
('192.168.1.1', '/home', '2023-02-01'),
('192.168.1.1', '/products', '2023-02-01'),
('192.168.1.2', '/home', '2023-02-01'),
('192.168.1.3', '/contact', '2023-02-01'),
('192.168.1.1', '/home', '2023-02-02'),
('192.168.1.4', '/products', '2023-02-02'),
('192.168.1.2', '/contact', '2023-02-02');

To count unique visitors per day:

sql
SELECT 
visit_date,
COUNT(DISTINCT visitor_ip) AS unique_visitors
FROM page_views
GROUP BY visit_date
ORDER BY visit_date;

Result:

 visit_date | unique_visitors
------------+----------------
2023-02-01 | 3
2023-02-02 | 3

Analyzing Customer Behavior

sql
SELECT 
customer_id,
COUNT(DISTINCT product_id) AS unique_products_ordered
FROM customer_orders
GROUP BY customer_id
ORDER BY unique_products_ordered DESC;

Result:

 customer_id | unique_products_ordered
-------------+------------------------
1 | 3
2 | 2
3 | 2

This analysis shows how many different products each customer has ordered, helping identify customers with diverse purchasing habits.

Performance Considerations

While DISTINCT is very useful, it can impact query performance on large datasets because it requires sorting or hashing operations. Consider these best practices:

  1. Use DISTINCT only when necessary
  2. Apply filters (WHERE clauses) before DISTINCT when possible
  3. Consider creating indexes on columns frequently used with DISTINCT
  4. For large datasets, consider alternative approaches like using subqueries or CTEs

Summary

The PostgreSQL DISTINCT clause is a powerful tool for filtering duplicate data from your query results. In this tutorial, you've learned:

  • How to use DISTINCT with single and multiple columns
  • The special PostgreSQL DISTINCT ON syntax for more complex filtering
  • How to combine COUNT with DISTINCT for aggregate analysis
  • Real-world applications for unique value filtering
  • Performance considerations when using DISTINCT

By mastering the DISTINCT clause, you've added an important technique to your SQL toolkit that will help you retrieve more meaningful and concise data from your databases.

Exercises

  1. Create a table of employees with departments and locations, then write a query to find all unique department-location combinations.
  2. Write a query that counts how many different products each customer has ordered, but only for customers who have ordered at least two different products.
  3. Using the page_views table, find the single most viewed page (URL) for each day.
  4. Create a query that shows each visitor's first page visited, based on the earliest visit_date.

Additional Resources



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)