Distributed Concurrency Control

Introduction

When multiple users or applications access a database simultaneously, they may try to read or modify the same data at the same time. In a distributed database system, where data is spread across multiple locations, this challenge becomes even more complex. Distributed Concurrency Control refers to the techniques and protocols used to ensure that concurrent transactions in a distributed database environment maintain data consistency and integrity.

In this article, we'll explore how distributed systems manage concurrent access to data, the challenges unique to distributed environments, and the common approaches used to solve these problems.

What is Concurrency Control?

Before diving into distributed concurrency control, let's understand the basic concept of concurrency control.

Concurrency control ensures that database transactions (a sequence of operations performed as a single logical unit of work) are executed in a way that preserves the ACID properties:

Atomicity: A transaction is treated as a single, indivisible unit that either completely succeeds or completely fails.
Consistency: A transaction brings the database from one valid state to another valid state.
Isolation: Concurrent transactions should not interfere with each other.
Durability: Once a transaction is committed, its effects persist even in the event of system failures.

In a distributed environment, these challenges are magnified because:

Data is physically spread across multiple locations
Network delays can affect the order of operations
Partial system failures can occur
Clock synchronization between nodes is imperfect

The Challenge of Distributed Concurrency

Let's illustrate the challenge with a simple example:

javascript
// Transaction 1 (running on Node A)
function transfer(fromAccount, toAccount, amount) {
  const fromBalance = readBalance(fromAccount); // Read operation
  const toBalance = readBalance(toAccount);     // Read operation
  
  if (fromBalance >= amount) {
    writeBalance(fromAccount, fromBalance - amount); // Write operation
    writeBalance(toAccount, toBalance + amount);     // Write operation
    return true;
  }
  return false;
}

// Meanwhile, Transaction 2 (running on Node B)
function addInterest(account, rate) {
  const balance = readBalance(account);        // Read operation
  writeBalance(account, balance * (1 + rate)); // Write operation
}

If these transactions execute concurrently on different nodes without proper concurrency control:

Transaction 1 might read an account balance
Transaction 2 might update the same balance
Transaction 1 then writes based on the old balance value

This is known as the lost update problem and can lead to inconsistent data.

Fundamental Approaches to Distributed Concurrency Control

There are three main approaches to handling concurrency in distributed systems:

1. Pessimistic Concurrency Control

This approach assumes conflicts between transactions are likely and prevents them beforehand.

Two-Phase Locking (2PL)

The most common pessimistic approach is Two-Phase Locking (2PL), which works as follows:

Growing Phase: A transaction acquires all needed locks but doesn't release any.
Shrinking Phase: The transaction releases locks and cannot acquire new ones.

For distributed systems, we typically use a variant called Distributed 2PL:

javascript
// Simplified example of Distributed 2PL
async function transferWithLocking(fromAccount, toAccount, amount) {
  try {
    // Growing phase - acquire locks
    await lockManager.lock(fromAccount);
    await lockManager.lock(toAccount);
    
    // Perform operations
    const fromBalance = await readBalance(fromAccount);
    const toBalance = await readBalance(toAccount);
    
    if (fromBalance >= amount) {
      await writeBalance(fromAccount, fromBalance - amount);
      await writeBalance(toAccount, toBalance + amount);
      
      // Shrinking phase - release locks (in commit phase)
      await lockManager.unlock(fromAccount);
      await lockManager.unlock(toAccount);
      return true;
    }
    
    // Shrinking phase - release locks (in abort phase)
    await lockManager.unlock(fromAccount);
    await lockManager.unlock(toAccount);
    return false;
  } catch (error) {
    // Handle errors and release locks
    await lockManager.unlock(fromAccount);
    await lockManager.unlock(toAccount);
    throw error;
  }
}

Distributed Deadlock Detection

A challenge with locking in distributed systems is deadlock detection. A deadlock can occur when:

Transaction 1 (on Node A) holds a lock on resource X and requests resource Y
Transaction 2 (on Node B) holds a lock on resource Y and requests resource X

Distributed systems use techniques like:

Timeout-based detection: Abort transactions that wait too long for locks
Wait-for graphs: Track which transaction is waiting for which resource
Global deadlock detection: Periodically combine local wait-for graphs to detect cycles

2. Timestamp-Based Concurrency Control

This approach assigns a unique timestamp to each transaction and processes operations in timestamp order.

javascript
// Simplified Timestamp-based Concurrency Control example
function timestampOrderedRead(transaction, dataItem) {
  if (transaction.timestamp < dataItem.writeTimestamp) {
    // This transaction is trying to read data that was written by a "future" transaction
    // Abort and restart this transaction with a new timestamp
    abortAndRestart(transaction);
  } else {
    // Read operation can proceed
    dataItem.readTimestamp = Math.max(dataItem.readTimestamp, transaction.timestamp);
    return dataItem.value;
  }
}

function timestampOrderedWrite(transaction, dataItem, newValue) {
  if (transaction.timestamp < dataItem.readTimestamp || 
      transaction.timestamp < dataItem.writeTimestamp) {
    // This transaction is trying to write data that was already read or written 
    // by a "future" transaction
    abortAndRestart(transaction);
  } else {
    // Write operation can proceed
    dataItem.writeTimestamp = transaction.timestamp;
    dataItem.value = newValue;
  }
}

Timestamp ordering in distributed systems requires coordinated clock synchronization, which can be achieved through:

Logical clocks (e.g., Lamport timestamps)
Vector clocks
Hybrid approaches

3. Optimistic Concurrency Control (OCC)

This approach assumes conflicts are rare and validates transactions only at commit time.

OCC works in three phases:

Read Phase: Transaction reads data without acquiring locks and records this data in a private workspace.
Validation Phase: Before committing, check if any conflicts occurred.
Write Phase: If validation succeeds, make changes permanent; otherwise, abort and retry.

Here's a simplified code example:

javascript
async function optimisticTransfer(fromAccount, toAccount, amount) {
  // Read Phase - work in local workspace
  const readSet = new Map();
  const writeSet = new Map();
  
  // Read current balances
  readSet.set(fromAccount, await readBalance(fromAccount));
  readSet.set(toAccount, await readBalance(toAccount));
  
  // Perform calculations in local workspace
  const fromBalance = readSet.get(fromAccount);
  const toBalance = readSet.get(toAccount);
  
  if (fromBalance >= amount) {
    writeSet.set(fromAccount, fromBalance - amount);
    writeSet.set(toAccount, toBalance + amount);
    
    // Validation Phase
    const validationResult = await validateTransaction(readSet);
    
    if (validationResult.success) {
      // Write Phase - commit changes
      for (const [account, newBalance] of writeSet.entries()) {
        await writeBalance(account, newBalance);
      }
      return true;
    } else {
      // Conflict detected, retry the transaction
      console.log("Conflict detected, retrying transaction");
      return optimisticTransfer(fromAccount, toAccount, amount);
    }
  }
  return false;
}

async function validateTransaction(readSet) {
  // Check if any values in the read set have changed
  for (const [key, value] of readSet.entries()) {
    const currentValue = await readBalance(key);
    if (currentValue !== value) {
      return { success: false, reason: `Value for ${key} has changed` };
    }
  }
  return { success: true };
}

Advanced Distributed Concurrency Control Techniques

Two-Phase Commit (2PC)

Two-Phase Commit is a protocol that ensures all nodes in a distributed system agree to commit a transaction or abort it entirely. It works in two phases:

Prepare Phase: The coordinator asks all participants if they can commit.
Commit Phase: If all participants vote "yes," the coordinator tells everyone to commit; otherwise, it tells everyone to abort.

Three-Phase Commit (3PC)

An extension of 2PC that addresses some of its limitations by adding a pre-commit phase to better handle coordinator failures.

Distributed MVCC (Multi-Version Concurrency Control)

MVCC keeps multiple versions of data items, allowing readers to see a consistent snapshot without being blocked by writers.

javascript
// Simplified MVCC example
function mvccRead(transaction, dataItem) {
  // Find the most recent version of dataItem that was created
  // before this transaction started
  const version = dataItem.versions
    .filter(v => v.timestamp <= transaction.startTimestamp)
    .sort((a, b) => b.timestamp - a.timestamp)[0];
  
  return version ? version.value : null;
}

function mvccWrite(transaction, dataItem, newValue) {
  // Create a new version of the data item
  dataItem.versions.push({
    value: newValue,
    timestamp: transaction.commitTimestamp,
    transactionId: transaction.id
  });
}

Real-World Applications

Case Study 1: Banking System

Consider a distributed banking system where accounts are stored across multiple data centers:

Challenge: Ensuring a transfer between accounts remains consistent even if some nodes fail.
Solution: Use Two-Phase Commit to ensure all parts of the transaction succeed or fail together.

javascript
// Two-Phase Commit example for a banking system
async function distributedTransfer(fromAccount, toAccount, amount) {
  const coordinator = new TransactionCoordinator();
  const participants = [
    new AccountNode("node1", fromAccount),
    new AccountNode("node2", toAccount)
  ];
  
  // Phase 1: Prepare
  let allReady = true;
  for (const participant of participants) {
    const ready = await coordinator.askPrepare(participant, {
      fromAccount, toAccount, amount
    });
    if (!ready) {
      allReady = false;
      break;
    }
  }
  
  // Phase 2: Commit or Abort
  if (allReady) {
    for (const participant of participants) {
      await coordinator.sendCommit(participant);
    }
    return true;
  } else {
    for (const participant of participants) {
      await coordinator.sendAbort(participant);
    }
    return false;
  }
}

Case Study 2: E-commerce Platform

An e-commerce platform that handles thousands of concurrent orders:

Challenge: Managing inventory updates across distributed warehouses.
Solution: Use Optimistic Concurrency Control for better performance under high read loads.

javascript
async function reserveInventory(productId, quantity, userId) {
  let maxRetries = 3;
  let attempts = 0;
  
  while (attempts < maxRetries) {
    attempts++;
    
    // Read current inventory (Optimistic approach)
    const productVersion = await readProductWithVersion(productId);
    const { inventory, version } = productVersion;
    
    if (inventory >= quantity) {
      // Try to update with version check
      const updateSuccess = await updateInventoryIfVersionMatches(
        productId, 
        inventory - quantity, 
        version
      );
      
      if (updateSuccess) {
        // Successfully reserved inventory
        await createReservation(userId, productId, quantity);
        return true;
      }
      // Version conflict, retry
      console.log(`Optimistic concurrency conflict, retrying (attempt ${attempts})`);
    } else {
      // Not enough inventory
      return false;
    }
  }
  
  // Exceeded retry limit
  throw new Error("Failed to reserve inventory after multiple attempts");
}

Performance Considerations

Different concurrency control methods have different performance characteristics:

Pessimistic approaches (like 2PL):
- Pros: Prevents conflicts completely
- Cons: Higher overhead due to lock management, potential for deadlocks
Optimistic approaches (like OCC):
- Pros: Better performance in read-heavy workloads with few conflicts
- Cons: Wasted work when conflicts occur and transactions must be restarted
Timestamp-based approaches:
- Pros: No deadlocks
- Cons: Potential for unnecessary aborts, clock synchronization overhead

Choosing the right approach depends on your specific workload:

Summary

Distributed concurrency control is essential for maintaining data consistency when multiple users access a distributed database simultaneously. We've explored three main approaches:

Pessimistic Concurrency Control: Prevents conflicts before they happen using locks (2PL)
Timestamp-Based Concurrency Control: Orders transactions based on their timestamps
Optimistic Concurrency Control: Assumes conflicts are rare and validates at commit time

Each approach has its strengths and weaknesses, and the choice depends on your specific use case, data access patterns, and performance requirements.

Exercises

Implement a simple two-phase locking mechanism for a distributed bank transfer system.
Design an optimistic concurrency control system for an online shopping cart.
Compare the performance of pessimistic and optimistic approaches in a simulated environment with varying conflict rates.
Modify the MVCC example to handle transaction rollbacks.
Implement a simple distributed deadlock detection algorithm.

Additional Resources

Books:
- "Transaction Processing: Concepts and Techniques" by Jim Gray and Andreas Reuter
- "Principles of Distributed Database Systems" by M. Tamer Özsu and Patrick Valduriez
Online Courses:
- MIT OpenCourseWare: Distributed Systems
- Stanford's Databases: Advanced Topics in Database Systems
Research Papers:
- "A Critique of ANSI SQL Isolation Levels" by Berenson et al.
- "Concurrency Control in Distributed Database Systems" by Bernstein and Goodman

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

What is Concurrency Control?​

The Challenge of Distributed Concurrency​

Fundamental Approaches to Distributed Concurrency Control​

1. Pessimistic Concurrency Control​

Two-Phase Locking (2PL)​

Distributed Deadlock Detection​

2. Timestamp-Based Concurrency Control​

3. Optimistic Concurrency Control (OCC)​

Advanced Distributed Concurrency Control Techniques​

Two-Phase Commit (2PC)​

Three-Phase Commit (3PC)​

Distributed MVCC (Multi-Version Concurrency Control)​

Real-World Applications​

Case Study 1: Banking System​

Case Study 2: E-commerce Platform​

Performance Considerations​

Summary​

Exercises​

Additional Resources​