Physical Database Design

Introduction

Physical database design is the process of deciding how to implement the logical database design in a specific database management system (DBMS). It transforms the logical data model into a physical data model that can be implemented in a database system. This phase focuses on how data is stored, accessed, and manipulated at the hardware level.

While logical database design deals with what data to store, physical database design deals with how to store it efficiently. Good physical design ensures optimal performance, storage efficiency, and scalability of your database.

The Relationship Between Logical and Physical Design

Before diving deeper into physical design, let's understand its relationship with logical design:

As shown in the diagram, physical design is the fourth step in the database design process, following logical design and preceding implementation.

Key Components of Physical Database Design

1. File Organization and Storage Structures

This involves deciding how data will be physically stored on disk:

Heap Files: Unordered collections of records
Sorted Files: Records stored in sorted order based on a key
Hash Files: Records distributed using a hash function
B-Trees: Self-balancing tree structures used for efficient indexing

Example of how different file organizations affect query performance:

-- Query on a heap file (unordered)
-- Full table scan required
SELECT * FROM customers WHERE customer_id = 1001;

-- Same query on a B-Tree indexed file
-- Significantly faster execution
SELECT * FROM customers WHERE customer_id = 1001;

2. Indexing Strategies

Indexes are data structures that improve the speed of data retrieval operations. Choosing the right indexes is crucial for database performance.

Types of Indexes:

Primary Index: Created on the primary key
Secondary Index: Created on non-primary key fields
Clustered Index: Determines the physical order of data
Non-clustered Index: Contains pointers to the actual data rows
Composite Index: Created on multiple columns

Creating an index in SQL:

-- Creating a basic index
CREATE INDEX idx_last_name ON customers(last_name);

-- Creating a composite index
CREATE INDEX idx_name ON customers(last_name, first_name);

When to use indexes:

Columns frequently used in WHERE clauses
Columns used in JOIN operations
Columns used in ORDER BY or GROUP BY clauses

When to avoid indexes:

Small tables where full scans are fast
Columns with low selectivity (many duplicate values)
Tables with frequent large-batch INSERT operations

3. Partitioning

Partitioning divides large tables into smaller, more manageable pieces while maintaining their logical appearance as a single table.

Types of Partitioning:

Horizontal Partitioning (Sharding): Splits rows across multiple tables
Vertical Partitioning: Splits columns across multiple tables
Range Partitioning: Based on ranges of values
List Partitioning: Based on lists of discrete values
Hash Partitioning: Based on a hash function

Example of range partitioning in PostgreSQL:

CREATE TABLE sales (
    sale_id INT,
    sale_date DATE,
    amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);

-- Create partitions for each quarter
CREATE TABLE sales_q1_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');

CREATE TABLE sales_q2_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');

Benefits of partitioning:

Improved query performance
Easier maintenance
Better scalability
Reduced contention

4. Denormalization

While normalization is a key concept in logical design, physical design may involve denormalization—deliberately introducing redundancy to improve performance.

Example of denormalization:

-- Normalized approach (two tables)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE
);

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100)
);

-- Denormalized approach (customer info duplicated in orders)
CREATE TABLE denormalized_orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    customer_name VARCHAR(100), -- Duplicated from customers
    customer_email VARCHAR(100), -- Duplicated from customers
    order_date DATE
);

When to consider denormalization:

When joins are expensive
For frequently accessed read-only data
When performance is crucial and storage is less of a concern

5. Tablespaces and Storage Parameters

Tablespaces allow you to control where and how database objects are stored on disk.

Example in PostgreSQL:

-- Create a tablespace
CREATE TABLESPACE fast_reads
    LOCATION '/ssd_storage/postgres';

-- Create a table in the tablespace
CREATE TABLE frequent_access_table (
    id SERIAL PRIMARY KEY,
    data TEXT
) TABLESPACE fast_reads;

Physical Design Process

The physical design process typically follows these steps:

Analyze Workload: Understand query patterns and data access requirements
Choose Storage Structures: Select appropriate file organizations
Design Indexes: Identify which columns need indexing
Consider Partitioning: Determine if and how to partition tables
Evaluate Denormalization: Decide if denormalization would improve performance
Configure Tablespaces: Set up storage locations for database objects
Implement Security Measures: Set up appropriate access controls
Test and Tune: Measure performance and adjust design as needed

Physical Design Considerations

Performance vs. Storage Tradeoffs

Almost every physical design decision involves a tradeoff between:

Query performance
Storage efficiency
Maintenance overhead
Development complexity

DBMS-Specific Features

Different database systems offer different physical design options:

PostgreSQL: TOAST storage, table inheritance, comprehensive indexing
MySQL: Storage engines (InnoDB, MyISAM), spatial indexing
Oracle: Materialized views, bitmap indexes, advanced partitioning
SQL Server: Columnstore indexes, memory-optimized tables, partitioning

Hardware Considerations

Physical design should take into account:

Available memory
CPU resources
Disk I/O capabilities
Network bandwidth (for distributed databases)

Real-World Example: E-Commerce Database

Let's consider a simplified e-commerce database with these tables:

Products
Customers
Orders
OrderItems

Physical Design Decisions:

Indexing Strategy:
- Primary keys: B-tree indexes on all ID fields
- Foreign keys: Indexes on customer_id in Orders, product_id in OrderItems
- Search optimization: Index on product_name, price range
Partitioning:
- Orders table: Range-partitioned by order_date
- Products table: List-partitioned by category
Denormalization:
- Add product_name to OrderItems to avoid joins for order summaries
- Store order_total in Orders rather than calculating it each time
Storage Allocation:
- Recent orders: Faster SSD storage
- Historical data: Cheaper HDD storage

Here's a simplified implementation for MySQL:

-- Products table with indexing
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10,2),
    stock_quantity INT,
    INDEX idx_product_name (product_name),
    INDEX idx_price (price),
    INDEX idx_category (category)
);

-- Orders table with partitioning
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    order_total DECIMAL(10,2),
    INDEX idx_customer_id (customer_id),
    INDEX idx_order_date (order_date)
) PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- OrderItems with denormalization
CREATE TABLE order_items (
    item_id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    product_name VARCHAR(100), -- Denormalized field
    quantity INT,
    unit_price DECIMAL(10,2),
    INDEX idx_order_id (order_id),
    INDEX idx_product_id (product_id)
);

Performance Tuning and Monitoring

Physical design is an iterative process that requires ongoing monitoring and tuning:

Identify Slow Queries: Use tools like EXPLAIN to analyze query execution plans
Monitor Index Usage: Check which indexes are used and which are not
Analyze I/O Patterns: Identify disk bottlenecks
Adjust Configurations: Modify memory allocation, buffer sizes, etc.
Reassess Design Choices: Revise indexing, partitioning strategies as data grows

Example of examining a query plan:

EXPLAIN SELECT * FROM orders 
WHERE customer_id = 1001 
AND order_date BETWEEN '2023-01-01' AND '2023-12-31';

Physical Design Best Practices

Design for the Most Common Queries: Optimize for the 80% case
Consider Data Growth: Plan for how data will scale over time
Balance Read vs. Write Performance: Different strategies work best for read-heavy vs. write-heavy workloads
Start Simple: Begin with a simple design and add complexity as needed
Regular Maintenance: Plan for index rebuilding, statistics updates, etc.
Document Design Decisions: Record why specific physical design choices were made

Summary

Physical database design translates logical data models into efficient, performant physical implementations. It involves making strategic decisions about:

File organization and storage
Indexing strategies
Partitioning approaches
Denormalization techniques
Storage allocation

These decisions have profound impacts on database performance, storage efficiency, and scalability. Physical design must balance competing requirements and make appropriate tradeoffs.

Unlike logical design, physical design is closely tied to the specific DBMS being used and must take advantage of the features it offers while working within its constraints.

Exercises

Design an appropriate indexing strategy for a social media database with tables for users, posts, comments, and likes.
Compare the performance implications of range partitioning vs. hash partitioning for a large table of financial transactions.
Identify situations where denormalization would be appropriate in a library management system.
Create a physical design plan for a time-series database that will store IoT sensor readings (millions per day).
Analyze the query execution plan for a complex join query and suggest physical design improvements.

Additional Resources

Database System Concepts by Silberschatz, Korth, and Sudarshan
Physical Database Design by Sam Lightstone, Toby Teorey, and Tom Nadeau
PostgreSQL Documentation on Indexing
MySQL Performance Tuning and Optimization Guide
Oracle Database Administrator's Guide

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

The Relationship Between Logical and Physical Design​

Key Components of Physical Database Design​

1. File Organization and Storage Structures​

2. Indexing Strategies​

Types of Indexes:​

3. Partitioning​

Types of Partitioning:​

4. Denormalization​

5. Tablespaces and Storage Parameters​

Physical Design Process​

Physical Design Considerations​

Performance vs. Storage Tradeoffs​

DBMS-Specific Features​

Hardware Considerations​

Real-World Example: E-Commerce Database​

Physical Design Decisions:​

Performance Tuning and Monitoring​

Physical Design Best Practices​

Summary​

Exercises​

Additional Resources​