Distributed Algorithms
Introduction
A distributed algorithm is a set of rules and protocols that lets multiple processes, running on different nodes of a distributed system, work together to solve a common problem. Unlike traditional algorithms that run on a single processor, distributed algorithms operate in environments where:
- Computation is distributed across multiple machines
- Communication between machines occurs through message passing
- No single machine has complete information about the system state
- Machines may operate at different speeds
- Components may fail independently
As distributed systems continue to grow in importance across cloud computing, big data processing, and decentralized applications, understanding how distributed algorithms work becomes crucial for building reliable and scalable applications.
Key Properties of Distributed Algorithms
1. Fault Tolerance
Distributed algorithms must continue to function correctly even when some components fail.
2. Scalability
The algorithm should remain efficient as the system size increases.
3. Decentralization
There should be no single point of failure or control.
4. Concurrency
Multiple processes need to execute simultaneously without conflicts.
5. Message Passing
Processes communicate by sending messages to each other rather than sharing memory.
Types of Distributed Algorithms
Let's explore some fundamental types of distributed algorithms that form the building blocks of distributed systems.
Consensus Algorithms
Consensus algorithms allow distributed processes to agree on a single value, even in the presence of failures.
Example: Simple Majority Voting
class Node {
  constructor(id, nodes) {
    this.id = id;
    this.nodes = nodes;    // List of all node IDs in the network
    this.value = null;     // Local value
    this.proposals = {};   // Received proposals
    this.decided = false;  // Whether a decision has been made
  }

  // Propose a value
  propose(value) {
    this.value = value;
    this.proposals[this.id] = value; // Count our own proposal too
    // Send proposal to all other nodes
    for (const nodeId of this.nodes) {
      if (nodeId !== this.id) {
        this.sendMessage(nodeId, { type: 'PROPOSE', from: this.id, value });
      }
    }
  }

  // Receive a message from another node
  receiveMessage(message) {
    if (this.decided || message.type !== 'PROPOSE') return;
    this.proposals[message.from] = message.value;
    // Check if we have enough proposals to make a decision
    if (Object.keys(this.proposals).length > this.nodes.length / 2) {
      // Count occurrences of each proposed value
      const counts = {};
      for (const nodeId in this.proposals) {
        const val = this.proposals[nodeId];
        counts[val] = (counts[val] || 0) + 1;
      }
      // Find the value with the most votes
      let maxCount = 0;
      let majorityValue = null;
      for (const value in counts) {
        if (counts[value] > maxCount) {
          maxCount = counts[value];
          majorityValue = value;
        }
      }
      // If a strict majority agrees, decide on that value
      if (maxCount > this.nodes.length / 2) {
        this.decided = true;
        this.value = majorityValue;
        console.log(`Node ${this.id} decided on value: ${majorityValue}`);
      }
    }
  }

  sendMessage(nodeId, message) {
    // In a real implementation, this would send a message over the network;
    // the receiving node would call receiveMessage
    console.log(`Node ${this.id} sending to Node ${nodeId}: ${JSON.stringify(message)}`);
  }
}
Input:
// Set up a network with 5 nodes
const nodeIds = [1, 2, 3, 4, 5];
const nodes = nodeIds.map(id => new Node(id, nodeIds));

// Nodes propose different values
nodes[0].propose("A");
nodes[1].propose("B");
nodes[2].propose("A");
nodes[3].propose("A");
nodes[4].propose("B");

// Simulate message delivery (in a real system, this happens over the network)
// Each node receives every other node's proposal
for (const sender of nodes) {
  for (const receiver of nodes) {
    if (sender.id !== receiver.id) {
      receiver.receiveMessage({
        type: 'PROPOSE',
        from: sender.id,
        value: sender.value
      });
    }
  }
}
Output:
Node 1 sending to Node 2: {"type":"PROPOSE","from":1,"value":"A"}
Node 1 sending to Node 3: {"type":"PROPOSE","from":1,"value":"A"}
...
Node 4 decided on value: A
Node 1 decided on value: A
Node 2 decided on value: A
Node 3 decided on value: A
Node 5 decided on value: A
This simple majority voting algorithm illustrates the basic principle of consensus, though real-world consensus algorithms like Paxos and Raft are considerably more complex so they can handle scenarios like network partitions and node failures.
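A useful rule of thumb behind majority-based consensus is the quorum size: any two strict majorities of n nodes must overlap in at least one node, which is what prevents two different values from being decided. A minimal sketch of the arithmetic (function names are illustrative):

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of n nodes; any two such quorums overlap."""
    return n // 2 + 1

def crash_faults_tolerated(n: int) -> int:
    """Crash failures a majority-quorum system of n nodes can survive."""
    return (n - 1) // 2

# A 5-node cluster needs 3 votes per decision and survives 2 crashed nodes
print(quorum_size(5), crash_faults_tolerated(5))  # 3 2
```

Note that even-sized clusters gain nothing in fault tolerance over the next smaller odd size, which is why production deployments typically use 3 or 5 nodes.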
Leader Election Algorithms
Leader election algorithms choose a single coordinator among multiple processes.
The Bully Algorithm
class Node:
    def __init__(self, id, nodes):
        self.id = id
        self.nodes = nodes  # List of all node IDs in the system
        self.is_leader = False
        self.election_in_progress = False

    def start_election(self):
        print(f"Node {self.id} starting election")
        self.election_in_progress = True
        # Find nodes with higher IDs
        higher_nodes = [n for n in self.nodes if n > self.id]
        if not higher_nodes:
            # This is the node with the highest ID
            self.become_leader()
            return
        # Send election messages to all nodes with higher IDs
        for node_id in higher_nodes:
            self.send_election_message(node_id)

    def receive_election_message(self, from_id):
        print(f"Node {self.id} received election message from {from_id}")
        # Reply to the sender, knocking it out of the race
        self.send_reply(from_id)
        # Start a new election (only if not already in one)
        if not self.election_in_progress:
            self.start_election()

    def receive_reply(self, from_id):
        print(f"Node {self.id} received reply from {from_id}")
        # A reply from a higher node means this node will not be the leader
        self.election_in_progress = False

    def receive_coordinator_message(self, from_id):
        print(f"Node {self.id} received coordinator message from {from_id}")
        self.is_leader = False
        self.election_in_progress = False
        print(f"Node {self.id} acknowledges Node {from_id} as leader")

    def become_leader(self):
        print(f"Node {self.id} becomes the leader")
        self.is_leader = True
        self.election_in_progress = False
        # Announce leadership to all other nodes
        for node_id in self.nodes:
            if node_id != self.id:
                self.send_coordinator_message(node_id)

    # These methods would send actual network messages in a real implementation
    def send_election_message(self, to_id):
        print(f"Node {self.id} sending election message to Node {to_id}")

    def send_reply(self, to_id):
        print(f"Node {self.id} sending reply to Node {to_id}")

    def send_coordinator_message(self, to_id):
        print(f"Node {self.id} sending coordinator message to Node {to_id}")
Example Execution:
# Create nodes with IDs 1, 5, 9
all_ids = [1, 5, 9]
nodes = {id: Node(id, all_ids) for id in all_ids}
# Let's say node 5 detects the leader (node 9) has failed
# Node 5 starts an election
nodes[5].start_election()
# Simulation of message passing
# Node 9 receives the election message; having the highest ID, it wins
# the election inside receive_election_message and announces itself
nodes[9].receive_election_message(5)
nodes[5].receive_reply(9)

# Nodes 1 and 5 receive the coordinator messages
nodes[1].receive_coordinator_message(9)
nodes[5].receive_coordinator_message(9)
Output:
Node 5 starting election
Node 5 sending election message to Node 9
Node 9 received election message from 5
Node 9 sending reply to Node 5
Node 9 starting election
Node 9 becomes the leader
Node 9 sending coordinator message to Node 1
Node 9 sending coordinator message to Node 5
Node 5 received reply from 9
Node 1 received coordinator message from 9
Node 1 acknowledges Node 9 as leader
Node 5 received coordinator message from 9
Node 5 acknowledges Node 9 as leader
Time Synchronization
Clock synchronization is crucial in distributed systems where events must be ordered correctly.
Simplified Network Time Protocol (NTP)
class ClockSync {
    private long localClock;
    private String nodeId;

    public ClockSync(String nodeId, long initialTime) {
        this.nodeId = nodeId;
        this.localClock = initialTime;
    }

    // Get current local time
    public long getTime() {
        return localClock;
    }

    // Update local time
    private void updateTime(long newTime) {
        System.out.println(nodeId + ": Adjusting clock from " + localClock + " to " + newTime);
        localClock = newTime;
    }

    // Simulate time passing
    public void tick(long amount) {
        localClock += amount;
    }

    // Request time from server
    public void synchronizeWithServer(ClockSync server) {
        // T1: client sends the request
        long t1 = this.getTime();
        // T2: server receives the request
        long t2 = server.getTime();
        // Server processes the request (simulated delay on the server clock)
        server.tick(5);
        // T3: server sends the response
        long t3 = server.getTime();
        // Response travels back (simulated delay on the client clock)
        this.tick(5);
        // T4: client receives the response
        long t4 = this.getTime();

        // Round-trip delay, excluding the server's processing time
        long roundTripDelay = (t4 - t1) - (t3 - t2);
        // Estimated offset between the client and server clocks
        long offset = ((t2 - t1) + (t3 - t4)) / 2;

        System.out.println(nodeId + ": Round trip delay: " + roundTripDelay);
        System.out.println(nodeId + ": Clock offset: " + offset);

        // Adjust local clock
        this.updateTime(this.getTime() + offset);
    }
}
Example Usage:
public static void main(String[] args) {
    // Create a time server with time 1000
    ClockSync timeServer = new ClockSync("Server", 1000);
    // Create a client with time 900 (100 units behind)
    ClockSync client = new ClockSync("Client", 900);

    System.out.println("Before synchronization:");
    System.out.println("Server time: " + timeServer.getTime());
    System.out.println("Client time: " + client.getTime());

    // Client synchronizes with server
    client.synchronizeWithServer(timeServer);

    System.out.println("\nAfter synchronization:");
    System.out.println("Server time: " + timeServer.getTime());
    System.out.println("Client time: " + client.getTime());
}
Output:
Before synchronization:
Server time: 1000
Client time: 900
Client: Round trip delay: 0
Client: Clock offset: 100
Client: Adjusting clock from 905 to 1005
After synchronization:
Server time: 1005
Client time: 1005
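The offset and delay formulas used above generalize to any four timestamps. A small helper (the function name is illustrative) makes the arithmetic explicit:

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """NTP-style clock offset and round-trip delay.

    t1: client send time, t2: server receive time,
    t3: server send time, t4: client receive time.
    """
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return offset, delay

# Client clock 45 units behind the server, 5 units of network delay each way
offset, delay = ntp_offset_and_delay(100, 150, 152, 112)
print(offset, delay)  # 45.0 10
```

The offset estimate assumes the network delay is symmetric; asymmetric paths introduce an error of up to half the delay difference, which is one reason real NTP samples repeatedly and filters the results.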
Distributed Mutual Exclusion
In distributed systems, ensuring only one process can access a shared resource at a time requires special algorithms.
Token Ring Algorithm
class Node:
    def __init__(self, id, total_nodes):
        self.id = id
        self.total_nodes = total_nodes
        self.has_token = False
        self.wants_access = False
        self.is_in_critical_section = False

    def request_critical_section(self):
        print(f"Node {self.id} wants to enter critical section")
        self.wants_access = True
        if self.has_token:
            self.enter_critical_section()

    def enter_critical_section(self):
        if not self.wants_access:
            return
        print(f"Node {self.id} entering critical section")
        self.is_in_critical_section = True
        self.wants_access = False
        # Simulate doing work in the critical section; the node keeps the
        # token until exit_critical_section is called
        print(f"Node {self.id} working in critical section")

    def exit_critical_section(self):
        print(f"Node {self.id} exiting critical section")
        self.is_in_critical_section = False
        # Pass token to the next node in the ring
        next_node = (self.id + 1) % self.total_nodes
        print(f"Node {self.id} passing token to Node {next_node}")
        self.has_token = False
        return next_node

    def receive_token(self):
        print(f"Node {self.id} received the token")
        self.has_token = True
        if self.wants_access:
            self.enter_critical_section()
Example Execution:
# Create 4 nodes
nodes = [Node(i, 4) for i in range(4)]

# Give the token to node 0
nodes[0].has_token = True
print("Initial state: Node 0 has the token")

# Nodes 1 and 3 want to enter critical section
nodes[1].request_critical_section()
nodes[3].request_critical_section()

# Simulate the token circulating around the ring
current_token_holder = 0
for _ in range(8):  # Simulate 8 token passes
    if nodes[current_token_holder].has_token:
        next_node = nodes[current_token_holder].exit_critical_section()
        current_token_holder = next_node
        nodes[current_token_holder].receive_token()
Output:
Initial state: Node 0 has the token
Node 1 wants to enter critical section
Node 3 wants to enter critical section
Node 0 exiting critical section
Node 0 passing token to Node 1
Node 1 received the token
Node 1 entering critical section
Node 1 working in critical section
Node 1 exiting critical section
Node 1 passing token to Node 2
Node 2 received the token
Node 2 exiting critical section
Node 2 passing token to Node 3
Node 3 received the token
Node 3 entering critical section
Node 3 working in critical section
Node 3 exiting critical section
Node 3 passing token to Node 0
Node 0 received the token
Node 0 exiting critical section
...
Practical Applications
Distributed algorithms power many systems we use daily:
1. Distributed Databases
Systems like Cassandra and MongoDB use replication protocols and distributed consensus to maintain data consistency across multiple nodes.
// Simplified example of a distributed key-value store
class DistributedKVStore {
  constructor(nodeId, nodes) {
    this.nodeId = nodeId;
    this.nodes = nodes; // IDs of all nodes in the cluster
    this.data = {};
    this.versionNumbers = {};
  }

  // Set a value with vector clock versioning
  set(key, value) {
    // Increment our component of the key's vector clock
    this.versionNumbers[key] = this.versionNumbers[key] || {};
    this.versionNumbers[key][this.nodeId] =
      (this.versionNumbers[key][this.nodeId] || 0) + 1;

    // Store data with version
    this.data[key] = {
      value: value,
      version: {...this.versionNumbers[key]}
    };

    // Replicate to other nodes
    for (const node of this.nodes) {
      if (node !== this.nodeId) {
        this.sendReplication(node, key, value, this.versionNumbers[key]);
      }
    }
    return true;
  }

  // Receive replication message
  receiveReplication(key, value, incomingVersion) {
    // Initialize if this is a new key
    if (!this.data[key]) {
      this.data[key] = { value: null, version: {} };
      this.versionNumbers[key] = {};
    }

    const currentVersion = this.versionNumbers[key] || {};

    // Compare vector clocks: is the incoming version ahead anywhere, and is
    // the current version also ahead anywhere (a conflict)?
    let isNewer = false;
    let isConflict = false;
    for (const nodeId in incomingVersion) {
      if (!currentVersion[nodeId] ||
          incomingVersion[nodeId] > currentVersion[nodeId]) {
        isNewer = true;
      }
    }
    for (const nodeId in currentVersion) {
      if (!incomingVersion[nodeId] ||
          currentVersion[nodeId] > incomingVersion[nodeId]) {
        if (isNewer) {
          isConflict = true;
        }
      }
    }

    // Update data if the incoming version is newer
    if (isNewer) {
      if (isConflict) {
        console.log(`Node ${this.nodeId}: Conflict detected for key ${key}, resolving...`);
        // In a real system, we would apply conflict resolution strategies;
        // for simplicity, we just take the incoming version
      }
      this.data[key] = {
        value: value,
        version: {...incomingVersion}
      };
      // Update our version numbers
      this.versionNumbers[key] = {...incomingVersion};
      console.log(`Node ${this.nodeId}: Updated key ${key} to value ${value}`);
    }
  }

  // Get value for a key
  get(key) {
    return this.data[key] ? this.data[key].value : null;
  }

  // Send replication message to another node
  sendReplication(nodeId, key, value, version) {
    // In a real system, this would send over the network
    console.log(`Node ${this.nodeId} sending replication of key ${key} to Node ${nodeId}`);
  }
}
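The version check inside receiveReplication is a vector-clock comparison. The same logic can be written as a standalone helper; this is a sketch (the function name is illustrative), here in Python for brevity:

```python
def compare_vector_clocks(a: dict, b: dict) -> str:
    """Compare two vector clocks (mappings of node id -> counter).

    Returns 'after' if a dominates b, 'before' if b dominates a,
    'concurrent' if each is ahead somewhere, and 'equal' otherwise.
    """
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in a)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in b)
    if a_ahead and b_ahead:
        return "concurrent"  # a true conflict: neither update saw the other
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"
```

Only the "concurrent" case is a genuine conflict requiring resolution; dominated versions can simply be overwritten.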
2. Blockchain Networks
Blockchain technologies like Bitcoin and Ethereum use distributed consensus to agree on the state of a ledger.
3. Content Delivery Networks (CDNs)
CDNs use distributed caching and routing algorithms to efficiently deliver content to users.
4. Distributed File Systems
Systems like HDFS (Hadoop Distributed File System) use distributed algorithms for file replication and access.
Challenges in Distributed Algorithms
1. Partial Failures
Unlike centralized systems where components either work entirely or fail entirely, distributed systems can experience partial failures where some components work while others fail.
2. Message Delays and Losses
Network communication is unreliable, and messages can be delayed, lost, or delivered out of order.
3. Clock Drift
Physical clocks on different machines drift at different rates, making time synchronization challenging.
4. Network Partitions
Network failures can isolate groups of nodes from each other, creating "split-brain" scenarios.
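To get a feel for how quickly clock drift (challenge 3 above) accumulates, here is a toy calculation; the numbers are illustrative:

```python
def accumulated_drift(seconds: int, drift_ppm: float) -> float:
    """Error accumulated by a clock running drift_ppm parts-per-million fast."""
    return seconds * drift_ppm / 1_000_000

# A typical 50 ppm quartz crystal drifts about 4.3 seconds per day
print(accumulated_drift(86_400, 50))  # 4.32
```

This is why periodic resynchronization (as in the NTP example earlier) is essential: without it, even small drift rates put nodes seconds apart within days.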
CAP Theorem
The CAP theorem states that a distributed system cannot simultaneously provide all three of these guarantees:
- Consistency: All nodes see the same data at the same time
- Availability: Every request receives a response (success or failure)
- Partition Tolerance: The system continues to operate despite network partitions
In practice, when a network partition occurs, you must choose between consistency and availability.
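That trade-off can be made concrete: during a partition, a write handler must either refuse writes it cannot replicate to a quorum (favoring consistency, "CP") or accept them locally and reconcile later (favoring availability, "AP"). A minimal sketch, with all names illustrative:

```python
def handle_write(reachable: int, total: int, prefer_consistency: bool) -> str:
    """Decide a write's fate when only `reachable` of `total` replicas answer."""
    quorum = total // 2 + 1
    if reachable >= quorum:
        return "committed"         # quorum reachable: consistency and availability both hold
    if prefer_consistency:
        return "rejected"          # CP: stay consistent, sacrifice availability
    return "accepted-locally"      # AP: stay available, reconcile divergence later

print(handle_write(3, 5, True))    # committed
print(handle_write(2, 5, True))    # rejected
print(handle_write(2, 5, False))   # accepted-locally
```

Note that the choice only matters while a partition is in effect; when a quorum is reachable, a system can offer both consistency and availability.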
Summary
Distributed algorithms solve complex coordination problems in distributed systems, enabling reliable and scalable applications. Key types include:
- Consensus algorithms - Allow nodes to agree on values
- Leader election - Select a coordinator
- Time synchronization - Coordinate clocks across nodes
- Mutual exclusion - Control access to shared resources
Understanding these algorithms is crucial for building modern distributed applications. They address fundamental challenges like partial failures, message delays, and network partitions, allowing systems to maintain reliability and consistency in unpredictable environments.
Additional Resources
Books:
- "Distributed Algorithms" by Nancy Lynch
- "Designing Data-Intensive Applications" by Martin Kleppmann
Online Tutorials:
- MIT's Distributed Systems Course
- Coursera's Cloud Computing Specialization
Exercises
- Implement a simplified version of the Raft consensus algorithm with leader election and log replication.
- Design a distributed counter that maintains consistency across multiple nodes.
- Implement a distributed mutual exclusion algorithm and test it under different network conditions.
- Create a simulation of clock synchronization between several nodes with random clock drift.
- Design a conflict resolution strategy for a distributed key-value store when two nodes update the same key simultaneously.