PostgreSQL DISTINCT
Introduction
When querying databases, you often need to find unique values in a dataset without duplicates. PostgreSQL's DISTINCT
clause provides a powerful way to filter out duplicate rows from your query results, returning only unique values.
In this tutorial, you'll learn how to use the DISTINCT
clause in PostgreSQL to retrieve unique values from database tables. This is an essential SQL concept that will help you write more efficient and meaningful queries.
Understanding DISTINCT
The DISTINCT
clause is used in the SELECT
statement to remove duplicate rows from the result set. When you add DISTINCT
to your query, PostgreSQL will only return unique values for the column(s) you specify.
Basic Syntax
SELECT DISTINCT column_name
FROM table_name;
Using DISTINCT with a Single Column
Let's start with a basic example. Suppose we have a products
table that contains products from various categories:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10, 2)
);
INSERT INTO products (name, category, price) VALUES
('Laptop', 'Electronics', 999.99),
('Smartphone', 'Electronics', 699.99),
('Headphones', 'Electronics', 149.99),
('Coffee Maker', 'Kitchen', 79.99),
('Blender', 'Kitchen', 49.99),
('Toaster', 'Kitchen', 29.99),
('Running Shoes', 'Clothing', 89.99),
('T-shirt', 'Clothing', 19.99),
('Jeans', 'Clothing', 59.99);
If we want to see all the unique categories in our products table, we would use:
SELECT DISTINCT category
FROM products;
Result:
category
-------------
Electronics
Kitchen
Clothing
Instead of returning 9 rows (one for each product), the query returns only 3 rows, representing the unique categories.
Using DISTINCT with Multiple Columns
DISTINCT
can also be applied to multiple columns. In this case, PostgreSQL will return rows with unique combinations of the specified columns.
Consider a table of customer_orders
:
CREATE TABLE customer_orders (
id SERIAL PRIMARY KEY,
customer_id INT,
product_id INT,
order_date DATE,
quantity INT
);
INSERT INTO customer_orders (customer_id, product_id, order_date, quantity) VALUES
(1, 101, '2023-01-15', 2),
(1, 102, '2023-01-15', 1),
(1, 101, '2023-01-20', 1),
(2, 101, '2023-01-16', 3),
(2, 103, '2023-01-16', 2),
(3, 102, '2023-01-18', 1),
(3, 101, '2023-01-18', 1),
(1, 103, '2023-01-20', 2);
To find unique combinations of customers and products:
SELECT DISTINCT customer_id, product_id
FROM customer_orders;
Result:
customer_id | product_id
-------------+------------
1 | 101
1 | 102
1 | 103
2 | 101
2 | 103
3 | 101
3 | 102
This returns each unique customer-product pair, regardless of how many times a customer ordered that product.
DISTINCT ON Expression
PostgreSQL offers a powerful extension to the standard DISTINCT
clause called DISTINCT ON
. This allows you to specify which column(s) to check for distinctness while returning data from other columns.
The syntax is:
SELECT DISTINCT ON (column1) column1, column2, column3...
FROM table_name
ORDER BY column1, column2...;
For example, to find the most recent order for each customer:
SELECT DISTINCT ON (customer_id)
customer_id,
product_id,
order_date
FROM customer_orders
ORDER BY customer_id, order_date DESC;
Result:
customer_id | product_id | order_date
-------------+------------+------------
1 | 103 | 2023-01-20
2 | 101 | 2023-01-16
3 | 101 | 2023-01-18
This selects the most recent order for each customer. The DISTINCT ON (customer_id)
keeps only one row for each customer_id, and the ORDER BY customer_id, order_date DESC
ensures we get the latest order for each customer.
COUNT with DISTINCT
A common use case is counting the number of unique values in a column:
SELECT COUNT(DISTINCT category) AS unique_categories
FROM products;
Result:
unique_categories
------------------
3
This tells us there are three unique categories in our products table.
Real-World Applications
Finding Unique Visitors
Imagine you have a website analytics table:
CREATE TABLE page_views (
id SERIAL PRIMARY KEY,
visitor_ip VARCHAR(15),
page_url VARCHAR(255),
visit_date DATE
);
INSERT INTO page_views (visitor_ip, page_url, visit_date) VALUES
('192.168.1.1', '/home', '2023-02-01'),
('192.168.1.1', '/products', '2023-02-01'),
('192.168.1.2', '/home', '2023-02-01'),
('192.168.1.3', '/contact', '2023-02-01'),
('192.168.1.1', '/home', '2023-02-02'),
('192.168.1.4', '/products', '2023-02-02'),
('192.168.1.2', '/contact', '2023-02-02');
To count unique visitors per day:
SELECT
visit_date,
COUNT(DISTINCT visitor_ip) AS unique_visitors
FROM page_views
GROUP BY visit_date
ORDER BY visit_date;
Result:
visit_date | unique_visitors
------------+----------------
2023-02-01 | 3
2023-02-02 | 3
Analyzing Customer Behavior
SELECT
customer_id,
COUNT(DISTINCT product_id) AS unique_products_ordered
FROM customer_orders
GROUP BY customer_id
ORDER BY unique_products_ordered DESC;
Result:
customer_id | unique_products_ordered
-------------+------------------------
1 | 3
2 | 2
3 | 2
This analysis shows how many different products each customer has ordered, helping identify customers with diverse purchasing habits.
Performance Considerations
While DISTINCT
is very useful, it can impact query performance on large datasets because it requires sorting or hashing operations. Consider these best practices:
- Use
DISTINCT
only when necessary - Apply filters (WHERE clauses) before
DISTINCT
when possible - Consider creating indexes on columns frequently used with
DISTINCT
- For large datasets, consider alternative approaches like using subqueries or CTEs
Summary
The PostgreSQL DISTINCT
clause is a powerful tool for filtering duplicate data from your query results. In this tutorial, you've learned:
- How to use
DISTINCT
with single and multiple columns - The special PostgreSQL
DISTINCT ON
syntax for more complex filtering - How to combine
COUNT
withDISTINCT
for aggregate analysis - Real-world applications for unique value filtering
- Performance considerations when using
DISTINCT
By mastering the DISTINCT
clause, you've added an important technique to your SQL toolkit that will help you retrieve more meaningful and concise data from your databases.
Exercises
- Create a table of employees with departments and locations, then write a query to find all unique department-location combinations.
- Write a query that counts how many different products each customer has ordered, but only for customers who have ordered at least two different products.
- Using the
page_views
table, find the single most viewed page (URL) for each day. - Create a query that shows each visitor's first page visited, based on the earliest visit_date.
Additional Resources
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)