SQL Refactoring

Introduction

SQL refactoring is the process of restructuring existing SQL code to improve its internal structure without changing its external behavior. Just like refactoring in other programming languages, SQL refactoring aims to enhance code quality, performance, and maintainability while preserving functionality.

As databases grow and applications evolve, SQL queries that once performed adequately may become bottlenecks. Refactoring helps address these issues proactively, ensuring your database interactions remain efficient and your code remains clear and maintainable.

Why Refactor SQL?

Refactoring SQL provides several important benefits:

Improved performance: Optimized queries execute faster and consume fewer resources
Enhanced readability: Well-structured SQL is easier to understand and maintain
Reduced technical debt: Regular refactoring prevents accumulation of problematic code
Better scalability: Efficient queries handle growing data volumes more effectively
Easier maintenance: Clean SQL is simpler to debug and modify

Common SQL Refactoring Techniques

1. Simplifying Complex Queries

Complex queries with multiple nested subqueries can be difficult to understand and optimize. Breaking them down into simpler components often improves both readability and performance.

Before Refactoring:

sql
SELECT customer_name, 
       (SELECT SUM(order_total) 
        FROM orders 
        WHERE orders.customer_id = customers.customer_id 
        AND order_date BETWEEN '2023-01-01' AND '2023-12-31') as yearly_total
FROM customers
WHERE customer_id IN 
      (SELECT customer_id 
       FROM orders 
       GROUP BY customer_id 
       HAVING COUNT(*) > 5);

After Refactoring:

sql
WITH active_customers AS (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) > 5
),
yearly_totals AS (
    SELECT customer_id, SUM(order_total) as yearly_total
    FROM orders
    WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
    GROUP BY customer_id
)
SELECT c.customer_name, COALESCE(yt.yearly_total, 0) as yearly_total
FROM customers c
JOIN active_customers ac ON c.customer_id = ac.customer_id
LEFT JOIN yearly_totals yt ON c.customer_id = yt.customer_id;

The refactored version uses Common Table Expressions (CTEs) to break the logic into distinct, understandable steps, making the query more maintainable and often more efficient.

2. Replacing Subqueries with JOINs

In many cases, subqueries can be replaced with JOINs, which typically perform better, especially with larger datasets.

Before Refactoring:

sql
SELECT product_name, product_price
FROM products
WHERE category_id = (
    SELECT category_id 
    FROM categories 
    WHERE category_name = 'Electronics'
);

After Refactoring:

sql
SELECT p.product_name, p.product_price
FROM products p
JOIN categories c ON p.category_id = c.category_id
WHERE c.category_name = 'Electronics';

The JOIN version is often more efficient because the database can optimize the execution plan better than with subqueries.

3. Eliminating Redundant Conditions

Redundant or overlapping WHERE conditions can complicate queries without adding value.

Before Refactoring:

sql
SELECT * FROM employees
WHERE (hire_date >= '2020-01-01' AND hire_date <= '2023-12-31')
AND (department_id = 5 OR department_id = 6)
AND (hire_date >= '2019-06-01')
AND (salary > 50000);

After Refactoring:

sql
SELECT * FROM employees
WHERE hire_date BETWEEN '2020-01-01' AND '2023-12-31'
AND department_id IN (5, 6)
AND salary > 50000;

The refactored query is cleaner and eliminates the redundant date condition.

4. Using Proper Indexing

While not strictly a code refactoring technique, ensuring proper indexes exist to support your queries is essential for performance.

Example Index Creation:

sql
-- Before refactoring: frequent query without proper index
SELECT * FROM orders
WHERE customer_id = 123 AND order_date > '2023-01-01';

-- Add a composite index to support this query pattern
CREATE INDEX idx_orders_customer_date 
ON orders(customer_id, order_date);

5. Replacing UNION with UNION ALL When Appropriate

When you don't need to eliminate duplicates, using UNION ALL instead of UNION can significantly improve performance.

Before Refactoring:

sql
SELECT product_id, product_name FROM featured_products
UNION
SELECT product_id, product_name FROM new_arrivals;

After Refactoring (when duplicates are acceptable):

sql
SELECT product_id, product_name FROM featured_products
UNION ALL
SELECT product_id, product_name FROM new_arrivals;

UNION ALL skips the resource-intensive process of checking for and removing duplicates.

SQL Refactoring Process

Follow these steps for effective SQL refactoring:

Step 1: Identify Problem Areas

Look for SQL queries that are:

Running slowly
Hard to understand
Frequently modified
Causing application bottlenecks

Step 2: Analyze Query Execution Plan

Use database tools to examine how your query is currently executing:

sql
-- In MySQL/MariaDB
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- In PostgreSQL
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

-- In SQL Server
SET STATISTICS IO, TIME ON;
GO
SELECT * FROM orders WHERE customer_id = 123;
GO

Step 3: Create Test Cases

Before refactoring, create test cases that verify the current behavior:

sql
-- Create a temporary table with the original query results
CREATE TEMPORARY TABLE original_results AS
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 10;

Step 4: Refactor the Query

Apply appropriate refactoring techniques while keeping the intended functionality intact.

Step 5: Verify Results Match

Compare the results of the original and refactored queries:

sql
-- Compare refactored results with original
WITH refactored_results AS (
    SELECT c.customer_id, COUNT(o.order_id) as order_count
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id
    HAVING COUNT(o.order_id) > 10
)
SELECT * FROM original_results
EXCEPT
SELECT * FROM refactored_results;

-- If this returns no rows, the results match

Step 6: Measure Performance Improvement

Compare the execution time and resource usage of the original and refactored queries.

Step 7: Document Changes

Document what was changed and why, including performance metrics.

Advanced SQL Refactoring Techniques

1. Denormalizing for Read-Heavy Operations

In some cases, controlled denormalization can improve read performance:

Before:

sql
SELECT o.order_id, c.customer_name, p.product_name, oi.quantity
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date > '2023-01-01';

After (with denormalized order_summary table):

sql
-- Create a denormalized view or materialized view
CREATE VIEW order_summary AS
SELECT o.order_id, o.order_date, 
       c.customer_id, c.customer_name,
       p.product_id, p.product_name, 
       oi.quantity
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;

-- Then query becomes simpler
SELECT order_id, customer_name, product_name, quantity
FROM order_summary
WHERE order_date > '2023-01-01';

2. Refactoring String Operations

String operations in SQL are often expensive. When possible, refactor them to use more efficient approaches:

Before:

sql
SELECT * FROM customers
WHERE UPPER(first_name) = 'JOHN';

After:

sql
-- Create a computed column or use a case-insensitive collation
SELECT * FROM customers
WHERE first_name = 'John'
   OR first_name = 'JOHN'
   OR first_name = 'john';

Better yet, in many databases you can use:

sql
SELECT * FROM customers
WHERE first_name ILIKE 'john';  -- PostgreSQL
-- or
WHERE first_name COLLATE SQL_Latin1_General_CP1_CI_AS = 'john';  -- SQL Server

3. Using Window Functions

Window functions can replace complex self-joins and subqueries:

Before:

sql
SELECT e.employee_id, e.employee_name, e.salary,
       (SELECT AVG(salary) FROM employees 
        WHERE department_id = e.department_id) as dept_avg_salary
FROM employees e;

After:

sql
SELECT employee_id, employee_name, salary,
       AVG(salary) OVER (PARTITION BY department_id) as dept_avg_salary
FROM employees;

Window functions perform calculations across a set of rows related to the current row, often resulting in cleaner, more efficient code.

Real-World SQL Refactoring Example

Let's walk through a complete refactoring example for a complex reporting query:

Original Query:

sql
SELECT 
    p.product_name,
    (SELECT category_name FROM categories WHERE category_id = p.category_id) as category,
    p.unit_price,
    (SELECT SUM(quantity) FROM order_items WHERE product_id = p.product_id) as total_quantity,
    (SELECT COUNT(DISTINCT o.customer_id) 
     FROM orders o 
     JOIN order_items oi ON o.order_id = oi.order_id 
     WHERE oi.product_id = p.product_id) as unique_customers,
    CASE 
        WHEN (SELECT SUM(quantity) FROM order_items WHERE product_id = p.product_id) > 100 THEN 'High'
        WHEN (SELECT SUM(quantity) FROM order_items WHERE product_id = p.product_id) > 50 THEN 'Medium'
        ELSE 'Low'
    END as demand_level
FROM products p
WHERE p.discontinued = 0
AND p.unit_price > 20
ORDER BY (SELECT SUM(quantity) FROM order_items WHERE product_id = p.product_id) DESC;

Refactored Query:

sql
WITH product_sales AS (
    SELECT 
        oi.product_id,
        SUM(oi.quantity) as total_quantity,
        COUNT(DISTINCT o.customer_id) as unique_customers
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    GROUP BY oi.product_id
)
SELECT 
    p.product_name,
    c.category_name as category,
    p.unit_price,
    COALESCE(ps.total_quantity, 0) as total_quantity,
    COALESCE(ps.unique_customers, 0) as unique_customers,
    CASE 
        WHEN COALESCE(ps.total_quantity, 0) > 100 THEN 'High'
        WHEN COALESCE(ps.total_quantity, 0) > 50 THEN 'Medium'
        ELSE 'Low'
    END as demand_level
FROM products p
LEFT JOIN categories c ON p.category_id = c.category_id
LEFT JOIN product_sales ps ON p.product_id = ps.product_id
WHERE p.discontinued = 0
AND p.unit_price > 20
ORDER BY COALESCE(ps.total_quantity, 0) DESC;

This refactoring:

Eliminates redundant subqueries that calculated the same values multiple times
Uses CTEs to organize the logic clearly
Replaces correlated subqueries with JOINs
Handles NULL cases with COALESCE
Maintains the exact same results as the original query

Best Practices for SQL Refactoring

Start with a backup: Always have a backup of your database and original queries before beginning.
Test thoroughly: Verify the refactored query returns identical results to the original.
Refactor incrementally: Make small, targeted changes rather than rewriting entire queries at once.
Use database-specific features wisely: Leverage your database's unique features, but be careful about portability.
Consider query plans: Check execution plans before and after refactoring to confirm performance improvements.
Document changes: Keep records of what was changed and why, especially performance metrics.
Standardize SQL formatting: Adopt consistent formatting practices for better readability:

sql
-- Consistent formatting example
SELECT 
    customer_id,
    first_name,
    last_name,
    COUNT(order_id) AS order_count
FROM 
    customers
LEFT JOIN 
    orders ON customers.customer_id = orders.customer_id
WHERE 
    customer_status = 'Active'
    AND registration_date >= '2022-01-01'
GROUP BY 
    customer_id,
    first_name,
    last_name
HAVING 
    COUNT(order_id) > 0
ORDER BY 
    order_count DESC;

Common SQL Refactoring Pitfalls

Premature optimization: Don't refactor what isn't broken or what isn't a performance bottleneck.
Over-complication: Sometimes a simple query with a good index outperforms a complex "optimized" query.
Ignoring database differences: What works well in PostgreSQL might not work the same in MySQL or SQL Server.
Forgetting about transactions: Refactored code must maintain the same transaction boundaries and isolation levels.
Neglecting edge cases: Test with boundary conditions and special cases, not just typical data.

Tools for SQL Refactoring

Several tools can assist in the SQL refactoring process:

Database-specific execution plan analyzers:
- MySQL: EXPLAIN and EXPLAIN ANALYZE
- PostgreSQL: EXPLAIN (ANALYZE, BUFFERS)
- SQL Server: Query Execution Plans and Statistics
SQL formatting tools:
- SQL Formatter online tools
- IDE plugins (DataGrip, VS Code SQL extensions)
SQL quality checkers:
- SonarQube with SQL plugins
- SQLFluff

Summary

SQL refactoring is a valuable practice that helps maintain database performance and code quality as your application grows. By applying structured techniques like simplifying complex queries, replacing subqueries with JOINs, and using CTEs, you can transform difficult-to-maintain SQL into cleaner, more efficient code.

Remember that good SQL refactoring preserves existing functionality while enhancing performance, readability, and maintainability. Always verify results match, test thoroughly, and document your changes.

Exercises

Refactor this query to use CTEs instead of nested subqueries:

sql
SELECT product_name, 
       (SELECT AVG(unit_price) FROM products) as avg_price,
       unit_price - (SELECT AVG(unit_price) FROM products) as price_difference
FROM products
WHERE unit_price > (SELECT AVG(unit_price) FROM products);

Refactor this query to eliminate the self-join by using window functions:

sql
SELECT e1.employee_name, e1.department, e1.salary,
       e2.average_salary,
       e1.salary - e2.average_salary as difference
FROM employees e1
JOIN (
    SELECT department, AVG(salary) as average_salary
    FROM employees
    GROUP BY department
) e2 ON e1.department = e2.department;

Identify and fix performance issues in this query:

sql
SELECT 
    o.order_date, 
    c.customer_name,
    SUM(oi.quantity * p.unit_price) as order_total
FROM orders o
JOIN customers c ON UPPER(o.customer_id) = UPPER(c.customer_id)
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE YEAR(o.order_date) = 2023
GROUP BY o.order_date, c.customer_name;

Additional Resources

SQL Performance Explained - An excellent book on SQL optimization
Database Refactoring Patterns - Catalog of refactoring techniques
SQL Style Guide - Guidelines for SQL formatting and conventions
SQL Server Query Tuning - Microsoft's official guide
PostgreSQL Performance Optimization - From the PostgreSQL documentation

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Why Refactor SQL?​

Common SQL Refactoring Techniques​

1. Simplifying Complex Queries​

2. Replacing Subqueries with JOINs​

3. Eliminating Redundant Conditions​

4. Using Proper Indexing​

5. Replacing UNION with UNION ALL When Appropriate​

SQL Refactoring Process​

Step 1: Identify Problem Areas​

Step 2: Analyze Query Execution Plan​

Step 3: Create Test Cases​

Step 4: Refactor the Query​

Step 5: Verify Results Match​

Step 6: Measure Performance Improvement​

Step 7: Document Changes​

Advanced SQL Refactoring Techniques​

1. Denormalizing for Read-Heavy Operations​

2. Refactoring String Operations​

3. Using Window Functions​

Real-World SQL Refactoring Example​

Best Practices for SQL Refactoring​

Common SQL Refactoring Pitfalls​

Tools for SQL Refactoring​

Summary​

Exercises​

Additional Resources​

Introduction

Why Refactor SQL?

Common SQL Refactoring Techniques

1. Simplifying Complex Queries

2. Replacing Subqueries with JOINs

3. Eliminating Redundant Conditions

4. Using Proper Indexing

5. Replacing UNION with UNION ALL When Appropriate

SQL Refactoring Process

Step 1: Identify Problem Areas

Step 2: Analyze Query Execution Plan

Step 3: Create Test Cases

Step 4: Refactor the Query

Step 5: Verify Results Match

Step 6: Measure Performance Improvement

Step 7: Document Changes

Advanced SQL Refactoring Techniques

1. Denormalizing for Read-Heavy Operations

2. Refactoring String Operations

3. Using Window Functions

Real-World SQL Refactoring Example

Best Practices for SQL Refactoring

Common SQL Refactoring Pitfalls

Tools for SQL Refactoring

Summary

Exercises

Additional Resources