PostgreSQL Normalization

Introduction

Database normalization is a systematic approach to organizing data in a relational database like PostgreSQL. It's a fundamental design technique that helps eliminate data redundancy, avoid update anomalies, and ensure data integrity. For beginners working with PostgreSQL, understanding normalization is crucial for building efficient, maintainable, and scalable database systems.

In this guide, we'll explore the concept of normalization in PostgreSQL, walk through the different normal forms, and provide practical examples to demonstrate how proper normalization can significantly improve your database design.

Why Normalize Your PostgreSQL Database?

Before diving into the specifics, let's understand why normalization matters:

Reduce data redundancy: Store data in one place to avoid duplication
Improve data integrity: Prevent inconsistencies when updating, inserting, or deleting data
Optimize query performance: Well-normalized databases often perform better for complex queries
Simplify database maintenance: Make your schema easier to understand and modify

Understanding Database Normalization Forms

Normalization follows a series of steps called "normal forms." Each form builds upon the previous one to further refine your database structure.

First Normal Form (1NF)

A table is in First Normal Form when:

It has a primary key (a column or set of columns that uniquely identifies each row)
All columns contain atomic (indivisible) values
No repeating groups of columns exist

Example: Before 1NF

Let's say we're building an e-commerce database and start with this problematic table:

CREATE TABLE orders_unnormalized (
  order_id INT,
  customer_name VARCHAR(100),
  products TEXT, -- contains comma-separated product IDs
  quantities TEXT, -- contains comma-separated quantities
  order_date DATE
);

INSERT INTO orders_unnormalized VALUES 
  (1, 'John Smith', '101,102,103', '2,1,3', '2023-05-10'),
  (2, 'Jane Doe', '102,104', '1,2', '2023-05-11');

This design violates 1NF because:

The products and quantities columns contain multiple values
We can't easily query individual products within an order

Example: After 1NF

CREATE TABLE orders_1nf (
  order_id INT,
  customer_name VARCHAR(100),
  product_id INT,
  quantity INT,
  order_date DATE,
  PRIMARY KEY (order_id, product_id)
);

INSERT INTO orders_1nf VALUES 
  (1, 'John Smith', 101, 2, '2023-05-10'),
  (1, 'John Smith', 102, 1, '2023-05-10'),
  (1, 'John Smith', 103, 3, '2023-05-10'),
  (2, 'Jane Doe', 102, 1, '2023-05-11'),
  (2, 'Jane Doe', 104, 2, '2023-05-11');

Now each row represents a single order-product combination, and every field contains atomic values.

Second Normal Form (2NF)

A table is in Second Normal Form when:

It is already in 1NF
All non-key attributes are fully functionally dependent on the entire primary key

This means removing attributes that depend only on part of the composite primary key.

Example: Before 2NF

Our orders_1nf table violates 2NF because:

customer_name and order_date depend only on order_id, not on the full primary key (order_id, product_id)

Example: After 2NF

-- Table for order headers
CREATE TABLE orders_2nf (
  order_id INT PRIMARY KEY,
  customer_name VARCHAR(100),
  order_date DATE
);

-- Table for order details
CREATE TABLE order_items_2nf (
  order_id INT,
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id) REFERENCES orders_2nf(order_id)
);

-- Insert data
INSERT INTO orders_2nf VALUES 
  (1, 'John Smith', '2023-05-10'),
  (2, 'Jane Doe', '2023-05-11');

INSERT INTO order_items_2nf VALUES 
  (1, 101, 2),
  (1, 102, 1),
  (1, 103, 3),
  (2, 102, 1),
  (2, 104, 2);

Now we have two tables:

orders_2nf contains data dependent on the order
order_items_2nf contains data dependent on the specific product within an order

Third Normal Form (3NF)

A table is in Third Normal Form when:

It is already in 2NF
All attributes depend directly on the primary key, not on other non-key attributes

This means removing transitive dependencies.

Example: Before 3NF

Let's expand our example by adding customer address information directly to the orders table:

CREATE TABLE orders_not_3nf (
  order_id INT PRIMARY KEY,
  customer_id INT,
  customer_name VARCHAR(100),
  customer_email VARCHAR(100),
  customer_address TEXT,
  order_date DATE
);

This violates 3NF because customer_name, customer_email, and customer_address depend on customer_id, not directly on order_id.

Example: After 3NF

-- Customers table
CREATE TABLE customers_3nf (
  customer_id INT PRIMARY KEY,
  customer_name VARCHAR(100),
  customer_email VARCHAR(100),
  customer_address TEXT
);

-- Orders table (now in 3NF)
CREATE TABLE orders_3nf (
  order_id INT PRIMARY KEY,
  customer_id INT,
  order_date DATE,
  FOREIGN KEY (customer_id) REFERENCES customers_3nf(customer_id)
);

-- Order items table (unchanged from 2NF)
CREATE TABLE order_items_3nf (
  order_id INT,
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id) REFERENCES orders_3nf(order_id)
);

Now customer information is stored once in the customers_3nf table, eliminating redundancy.

Boyce-Codd Normal Form (BCNF)

BCNF is a slightly stronger version of 3NF. A table is in BCNF when:

It is in 3NF
For every non-trivial functional dependency X → Y, X must be a superkey

This means that every determinant (a column that determines other columns) must be a candidate key.

Let's see an example where 3NF isn't enough:

Example: Before BCNF

Consider a database tracking which professors teach which subjects in which rooms:

CREATE TABLE class_schedule (
  student_id INT,
  subject_id INT,
  professor_id INT,
  room_id INT,
  PRIMARY KEY (student_id, subject_id)
);

Assume each subject is taught by exactly one professor, and each professor teaches in exactly one room. This creates dependencies:

subject_id → professor_id (a subject determines the professor)
professor_id → room_id (a professor determines the room)

This violates BCNF because subject_id and professor_id are determinants but not candidate keys.

Example: After BCNF

-- Table for professor-subject assignments
CREATE TABLE subject_professors (
  subject_id INT PRIMARY KEY,
  professor_id INT,
  UNIQUE (professor_id, subject_id)
);

-- Table for professor-room assignments
CREATE TABLE professor_rooms (
  professor_id INT PRIMARY KEY,
  room_id INT
);

-- Table for student enrollments
CREATE TABLE enrollments (
  student_id INT,
  subject_id INT,
  PRIMARY KEY (student_id, subject_id),
  FOREIGN KEY (subject_id) REFERENCES subject_professors(subject_id)
);

This design eliminates the transitive dependencies and achieves BCNF.

Visualizing Database Normalization

Let's visualize the transformation from an unnormalized database to a fully normalized one:

Real-World PostgreSQL Normalization Example

Let's walk through a complete example of normalizing a database for a small blog application.

Step 1: Identify the Initial Requirements

We need to store:

Blog posts with title, content, publish date
Authors with name, email, bio
Categories for posts
Tags for posts
Comments on posts

Step 2: Create an Unnormalized Design

CREATE TABLE blog_unnormalized (
  post_id INT,
  title VARCHAR(200),
  content TEXT,
  publish_date TIMESTAMP,
  author_name VARCHAR(100),
  author_email VARCHAR(100),
  author_bio TEXT,
  categories TEXT, -- comma-separated category names
  tags TEXT, -- comma-separated tag names
  comment_ids TEXT, -- comma-separated comment IDs
  comment_authors TEXT, -- comma-separated comment author names
  comment_texts TEXT, -- comma-separated comment texts
  comment_dates TEXT -- comma-separated comment dates
);

Step 3: Apply Normalization

First Normal Form (1NF)

First, let's eliminate the comma-separated values:

CREATE TABLE posts_1nf (
  post_id INT,
  title VARCHAR(200),
  content TEXT,
  publish_date TIMESTAMP,
  author_name VARCHAR(100),
  author_email VARCHAR(100),
  author_bio TEXT
);

CREATE TABLE post_categories_1nf (
  post_id INT,
  category_name VARCHAR(50),
  PRIMARY KEY (post_id, category_name)
);

CREATE TABLE post_tags_1nf (
  post_id INT,
  tag_name VARCHAR(50),
  PRIMARY KEY (post_id, tag_name)
);

CREATE TABLE comments_1nf (
  comment_id INT PRIMARY KEY,
  post_id INT,
  author_name VARCHAR(100),
  comment_text TEXT,
  comment_date TIMESTAMP
);

Second Normal Form (2NF)

The tables are already in 2NF because they don't have composite primary keys with partial dependencies.

Third Normal Form (3NF)

Let's extract the author information to eliminate transitive dependencies:

CREATE TABLE authors_3nf (
  author_id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100) UNIQUE,
  bio TEXT
);

CREATE TABLE posts_3nf (
  post_id INT PRIMARY KEY,
  title VARCHAR(200),
  content TEXT,
  publish_date TIMESTAMP,
  author_id INT,
  FOREIGN KEY (author_id) REFERENCES authors_3nf(author_id)
);

CREATE TABLE categories_3nf (
  category_id INT PRIMARY KEY,
  name VARCHAR(50) UNIQUE
);

CREATE TABLE post_categories_3nf (
  post_id INT,
  category_id INT,
  PRIMARY KEY (post_id, category_id),
  FOREIGN KEY (post_id) REFERENCES posts_3nf(post_id),
  FOREIGN KEY (category_id) REFERENCES categories_3nf(category_id)
);

CREATE TABLE tags_3nf (
  tag_id INT PRIMARY KEY,
  name VARCHAR(50) UNIQUE
);

CREATE TABLE post_tags_3nf (
  post_id INT,
  tag_id INT,
  PRIMARY KEY (post_id, tag_id),
  FOREIGN KEY (post_id) REFERENCES posts_3nf(post_id),
  FOREIGN KEY (tag_id) REFERENCES tags_3nf(tag_id)
);

CREATE TABLE comments_3nf (
  comment_id INT PRIMARY KEY,
  post_id INT,
  author_name VARCHAR(100),
  comment_text TEXT,
  comment_date TIMESTAMP,
  FOREIGN KEY (post_id) REFERENCES posts_3nf(post_id)
);

This fully normalized schema is now in 3NF (and also satisfies BCNF requirements).

Step 4: Create the Final Schema with PostgreSQL Data Types and Constraints

-- Authors table
CREATE TABLE authors (
  author_id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL,
  bio TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Posts table
CREATE TABLE posts (
  post_id SERIAL PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  slug VARCHAR(200) UNIQUE NOT NULL,
  content TEXT NOT NULL,
  excerpt TEXT,
  publish_date TIMESTAMP,
  status VARCHAR(20) DEFAULT 'draft' CHECK (status IN ('draft', 'published', 'archived')),
  author_id INT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (author_id) REFERENCES authors(author_id)
);

-- Categories table
CREATE TABLE categories (
  category_id SERIAL PRIMARY KEY,
  name VARCHAR(50) UNIQUE NOT NULL,
  slug VARCHAR(50) UNIQUE NOT NULL,
  description TEXT
);

-- Post-Category relationship table
CREATE TABLE post_categories (
  post_id INT NOT NULL,
  category_id INT NOT NULL,
  PRIMARY KEY (post_id, category_id),
  FOREIGN KEY (post_id) REFERENCES posts(post_id) ON DELETE CASCADE,
  FOREIGN KEY (category_id) REFERENCES categories(category_id) ON DELETE CASCADE
);

-- Tags table
CREATE TABLE tags (
  tag_id SERIAL PRIMARY KEY,
  name VARCHAR(50) UNIQUE NOT NULL,
  slug VARCHAR(50) UNIQUE NOT NULL
);

-- Post-Tag relationship table
CREATE TABLE post_tags (
  post_id INT NOT NULL,
  tag_id INT NOT NULL,
  PRIMARY KEY (post_id, tag_id),
  FOREIGN KEY (post_id) REFERENCES posts(post_id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id) REFERENCES tags(tag_id) ON DELETE CASCADE
);

-- Comments table
CREATE TABLE comments (
  comment_id SERIAL PRIMARY KEY,
  post_id INT NOT NULL,
  author_name VARCHAR(100) NOT NULL,
  author_email VARCHAR(100),
  comment_text TEXT NOT NULL,
  is_approved BOOLEAN DEFAULT FALSE,
  parent_comment_id INT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (post_id) REFERENCES posts(post_id) ON DELETE CASCADE,
  FOREIGN KEY (parent_comment_id) REFERENCES comments(comment_id) ON DELETE SET NULL
);

This final schema incorporates PostgreSQL-specific features:

SERIAL for auto-incrementing IDs
Appropriate constraints (NOT NULL, UNIQUE, CHECK)
Timestamps for tracking creation and updates
ON DELETE CASCADE for maintaining referential integrity
Self-referencing foreign key in the comments table for threaded comments

Common Denormalization Scenarios

While normalization is generally recommended, there are cases where controlled denormalization can improve performance:

When to Consider Denormalization

Read-heavy workloads: If your application mostly reads data and seldom updates it
Performance bottlenecks: When joins between normalized tables become too expensive
Reporting and analytics: For complex aggregations and historical data

Example: Denormalized Post View Count

-- Add a denormalized view count to the posts table
ALTER TABLE posts ADD COLUMN view_count INT DEFAULT 0;

-- Instead of maintaining a separate table of views:
-- CREATE TABLE post_views (
--   post_id INT,
--   viewer_ip VARCHAR(45),
--   view_timestamp TIMESTAMP,
--   FOREIGN KEY (post_id) REFERENCES posts(post_id)
-- );

-- Update the view count with a simple UPDATE
UPDATE posts SET view_count = view_count + 1 WHERE post_id = 42;

PostgreSQL Tools for Maintaining Normalized Data

PostgreSQL offers several features that help maintain data integrity in normalized databases:

Foreign Keys

As shown in our examples, foreign keys enforce relationships between tables:

CREATE TABLE orders (
  order_id SERIAL PRIMARY KEY,
  customer_id INT NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Constraints

PostgreSQL supports various constraints to enforce data integrity:

CREATE TABLE products (
  product_id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  price NUMERIC(10,2) CHECK (price > 0),
  sku VARCHAR(50) UNIQUE NOT NULL
);

Triggers

Triggers can automate complex integrity rules:

CREATE OR REPLACE FUNCTION update_post_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = CURRENT_TIMESTAMP;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_updated_at
BEFORE UPDATE ON posts
FOR EACH ROW
EXECUTE FUNCTION update_post_updated_at();

Views

Views can simplify working with normalized data:

CREATE VIEW post_details AS
SELECT 
  p.post_id,
  p.title,
  p.content,
  p.publish_date,
  a.name AS author_name,
  a.email AS author_email,
  string_agg(DISTINCT c.name, ', ') AS categories,
  string_agg(DISTINCT t.name, ', ') AS tags,
  COUNT(DISTINCT cm.comment_id) AS comment_count
FROM posts p
JOIN authors a ON p.author_id = a.author_id
LEFT JOIN post_categories pc ON p.post_id = pc.post_id
LEFT JOIN categories c ON pc.category_id = c.category_id
LEFT JOIN post_tags pt ON p.post_id = pt.post_id
LEFT JOIN tags t ON pt.tag_id = t.tag_id
LEFT JOIN comments cm ON p.post_id = cm.post_id
WHERE p.status = 'published'
GROUP BY p.post_id, a.author_id;

Common Normalization Mistakes and How to Avoid Them

Overengineering

Don't split tables unnecessarily if there isn't a clear dependency issue.

Underestimating Performance Impact

Heavily normalized databases may require complex joins, which can impact performance.

Ignoring Concurrency

When using PostgreSQL, consider how concurrent access affects your normalized structure.

Not Using Proper Indexing

In normalized databases, proper indexes are crucial:

-- Add index to foreign keys for better join performance
CREATE INDEX idx_posts_author_id ON posts(author_id);
CREATE INDEX idx_comments_post_id ON comments(post_id);

Summary

Database normalization is a foundational concept for designing efficient PostgreSQL databases:

First Normal Form (1NF): Eliminate repeating groups and ensure atomic values
Second Normal Form (2NF): Remove partial dependencies on the primary key
Third Normal Form (3NF): Remove transitive dependencies
BCNF and beyond: Further refinement for specific cases

By following these normalization principles, you'll create PostgreSQL databases that are:

Easier to maintain and update
More resilient against data inconsistencies
Better structured for complex queries
More flexible for future changes

Remember that while normalization is generally recommended, strategic denormalization might be appropriate in specific performance-critical scenarios.

Exercises

Take an unnormalized table containing customer orders with repeated customer information and normalize it to 3NF.

Identify normalization problems in this table design:

CREATE TABLE products (
  product_id INT PRIMARY KEY,
  name VARCHAR(100),
  category VARCHAR(50),
  category_description TEXT,
  supplier_name VARCHAR(100),
  supplier_contact VARCHAR(100)
);

Design a normalized database schema for a library management system with books, authors, borrowers, and loans.

Additional Resources

PostgreSQL Documentation on Constraints
Database Design for Mere Mortals by Michael J. Hernandez
Practical PostgreSQL by Joshua D. Drake and John C. Worsley

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Why Normalize Your PostgreSQL Database?​

Understanding Database Normalization Forms​

First Normal Form (1NF)​

Example: Before 1NF​

Example: After 1NF​

Second Normal Form (2NF)​

Example: Before 2NF​

Example: After 2NF​

Third Normal Form (3NF)​

Example: Before 3NF​

Example: After 3NF​

Boyce-Codd Normal Form (BCNF)​

Example: Before BCNF​

Example: After BCNF​

Visualizing Database Normalization​

Real-World PostgreSQL Normalization Example​

Step 1: Identify the Initial Requirements​

Step 2: Create an Unnormalized Design​

Step 3: Apply Normalization​

First Normal Form (1NF)​

Second Normal Form (2NF)​

Third Normal Form (3NF)​

Step 4: Create the Final Schema with PostgreSQL Data Types and Constraints​

Common Denormalization Scenarios​

When to Consider Denormalization​

Example: Denormalized Post View Count​

PostgreSQL Tools for Maintaining Normalized Data​

Foreign Keys​

Constraints​

Triggers​

Views​

Common Normalization Mistakes and How to Avoid Them​

Overengineering​

Underestimating Performance Impact​

Ignoring Concurrency​

Not Using Proper Indexing​

Summary​

Exercises​

Additional Resources​

Introduction

Why Normalize Your PostgreSQL Database?

Understanding Database Normalization Forms

First Normal Form (1NF)

Example: Before 1NF

Example: After 1NF

Second Normal Form (2NF)

Example: Before 2NF

Example: After 2NF

Third Normal Form (3NF)

Example: Before 3NF

Example: After 3NF

Boyce-Codd Normal Form (BCNF)

Example: Before BCNF

Example: After BCNF

Visualizing Database Normalization

Real-World PostgreSQL Normalization Example

Step 1: Identify the Initial Requirements

Step 2: Create an Unnormalized Design

Step 3: Apply Normalization

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Step 4: Create the Final Schema with PostgreSQL Data Types and Constraints

Common Denormalization Scenarios

When to Consider Denormalization

Example: Denormalized Post View Count

PostgreSQL Tools for Maintaining Normalized Data

Foreign Keys

Constraints

Triggers

Views

Common Normalization Mistakes and How to Avoid Them

Overengineering

Underestimating Performance Impact

Ignoring Concurrency

Not Using Proper Indexing

Summary

Exercises

Additional Resources