Distributed Architecture
Introduction
Distributed architecture forms the backbone of modern distributed database systems. Unlike traditional monolithic databases where all components reside on a single machine, distributed databases spread data, processing, and management across multiple nodes in a network. This approach offers significant advantages in terms of scalability, reliability, and performance, but also introduces unique challenges.
In this guide, we'll explore the core concepts of distributed architecture, examine common patterns used in distributed database systems, and understand how these principles apply in real-world scenarios.
Core Concepts of Distributed Architecture
What Makes an Architecture "Distributed"?
A distributed architecture divides a system into separate components that run on different nodes (computers) across a network. These components work together to function as a single coherent system.
Key characteristics include:
- Distribution of Data: Data is partitioned and stored across multiple nodes
- Distribution of Processing: Computational tasks are spread across multiple nodes
- Communication: Components communicate via network protocols
- Coordination: Mechanisms ensure consistent operation despite physical separation
Basic Distributed Architecture Models
1. Master-Slave Architecture
In this model, a single master node handles all writes and coordinates the cluster, while multiple slave nodes maintain copies of the data and typically serve read queries.
Characteristics:
- The master handles all write operations; read operations can be offloaded to the slaves
- Slaves replicate data from the master
- Simple but has a single point of failure (the master)
Example use case: Traditional MySQL replication setups
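To make this concrete, a client or proxy in a master-slave setup usually routes writes to the master and spreads reads across the slaves. The following Python sketch illustrates only that routing decision; the node names and round-robin read policy are assumptions for the example, not part of any particular database driver.
import itertools

class MasterSlaveRouter:
    """Route writes to the master and reads to the slaves (round-robin)."""

    def __init__(self, master, slaves):
        self.master = master
        self._reads = itertools.cycle(slaves)  # rotate reads across slaves

    def route(self, query):
        # Writes must go to the single master; reads can be offloaded.
        if query.strip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.master
        return next(self._reads)

# Example usage with made-up node names
router = MasterSlaveRouter("master-1", ["slave-1", "slave-2"])
print(router.route("INSERT INTO users VALUES (1, 'Alice')"))  # master-1
print(router.route("SELECT * FROM users"))                    # slave-1
print(router.route("SELECT * FROM users"))                    # slave-2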
2. Peer-to-Peer Architecture
All nodes have equal roles and responsibilities, with no centralized control point.
Characteristics:
- No single point of failure
- Horizontally scalable
- Complex coordination required
- Eventually consistent by default
Example systems: Apache Cassandra, BitTorrent
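To see how "equal roles" works in practice, peer-to-peer stores such as Cassandra hash each key onto a ring of nodes so that every node owns a slice of the key space. The Python sketch below is a minimal illustration of that idea; the node names and the use of MD5 are assumptions for the example, not Cassandra's actual partitioner.
import hashlib
from bisect import bisect

class HashRing:
    """Minimal hash ring: each node owns the segment preceding its position."""

    def __init__(self, nodes):
        # Place every node on the ring at a position derived from its name.
        self.ring = sorted((self._position(n), n) for n in nodes)

    @staticmethod
    def _position(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def owner(self, key):
        # Walk clockwise from the key's position to the next node on the ring.
        positions = [pos for pos, _ in self.ring]
        index = bisect(positions, self._position(key)) % len(self.ring)
        return self.ring[index][1]

# Example usage with made-up node names
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.owner(key))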
3. Sharded Architecture
Data is horizontally partitioned (sharded) across multiple nodes, with each node responsible for a subset of the data.
Characteristics:
- Improved write performance
- Horizontally scalable
- Complex to rebalance
- Requires careful sharding strategy
Example systems: MongoDB, sharded MySQL deployments
Key Components in Distributed Database Architecture
Partitioning (Sharding)
Partitioning divides the data into smaller, more manageable pieces distributed across multiple nodes.
Horizontal Partitioning (Sharding) divides rows across nodes:
Database Table (Users)
┌─────────┬───────┬───────────────────┐
│ User ID │ Name  │ Email             │
├─────────┼───────┼───────────────────┤
│ 1       │ Alice │ [email protected] │
│ 2       │ Bob   │ [email protected]   │
│ ...     │ ...   │ ...               │
│ 1000    │ Zack  │ [email protected]  │
└─────────┴───────┴───────────────────┘

          ↓ Horizontal Partitioning ↓

Shard 1 (Node A)              Shard 2 (Node B)
┌──────────┬──────┐           ┌──────────┬──────┐
│ User ID  │ Data │           │ User ID  │ Data │
├──────────┼──────┤           ├──────────┼──────┤
│ 1-500    │ ...  │           │ 501-1000 │ ...  │
└──────────┴──────┘           └──────────┴──────┘
Vertical Partitioning divides columns across nodes:
Node A                        Node B
┌─────────┬───────┐           ┌─────────┬─────────┐
│ User ID │ Name  │           │ User ID │ Email   │
├─────────┼───────┤           ├─────────┼─────────┤
│ 1-1000  │ ...   │           │ 1-1000  │ ...     │
└─────────┴───────┘           └─────────┴─────────┘
Common Sharding Strategies:
- Range-based sharding: Divides data based on ranges of a key (e.g., UserIDs 1-1000 on Server A)
- Hash-based sharding: Uses a hash function on the key to determine placement
- Directory-based sharding: Maintains a lookup service to track data location
Let's implement a simple hash-based sharding function in JavaScript:
function determineShardId(userId, totalShards) {
  // Simple hash: take the user ID modulo the total number of shards
  return userId % totalShards;
}

// Example usage
const userId = 42;
const totalShards = 4;
const shardId = determineShardId(userId, totalShards);
console.log(`User ${userId} should be stored on shard ${shardId}`);
// Output: User 42 should be stored on shard 2
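For comparison, a range-based strategy keeps an explicit map of key ranges to shards. The Python sketch below assumes a hypothetical two-shard layout matching the earlier diagram; real systems usually store this mapping in a catalog or configuration service.
# Hypothetical range-to-shard map matching the diagram above
SHARD_RANGES = [
    (1, 500, "shard-1"),     # user IDs 1-500 live on Node A
    (501, 1000, "shard-2"),  # user IDs 501-1000 live on Node B
]

def determine_shard_by_range(user_id):
    for low, high, shard in SHARD_RANGES:
        if low <= user_id <= high:
            return shard
    raise ValueError(f"No shard covers user ID {user_id}")

print(determine_shard_by_range(42))   # shard-1
print(determine_shard_by_range(742))  # shard-2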
Replication
Replication creates and maintains copies of data across multiple nodes to improve availability and durability.
Types of Replication:
- Synchronous Replication: Write operations complete only after all replicas confirm successful update
- Asynchronous Replication: Primary node acknowledges writes immediately, replicas update later
- Semi-synchronous Replication: At least one replica must confirm before the write is acknowledged
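The practical difference between these modes is when the primary acknowledges the write to the client. The Python sketch below is a toy model that assumes replicas never fail; it is meant only to show the acknowledgement timing, not any particular database's replication protocol.
def replicate_write(value, primary, replicas, mode="synchronous"):
    """Toy model of acknowledgement timing; assumes replicas always succeed."""
    primary.append(value)  # the primary applies the write first in every mode

    if mode == "synchronous":
        for replica in replicas:       # wait for every replica to apply the write
            replica.append(value)
        return "ack after all replicas confirm"
    if mode == "semi-synchronous":
        replicas[0].append(value)      # wait for one replica; others catch up later
        return "ack after one replica confirms"
    return "ack immediately; replicas catch up later"  # asynchronous

primary, replicas = [], [[], []]
print(replicate_write("row-1", primary, replicas, mode="synchronous"))
print(replicate_write("row-2", primary, replicas, mode="asynchronous"))
print(primary, replicas)  # "row-2" is on the primary but not yet on the replicas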
Consistency Models
Distributed systems cannot simultaneously guarantee consistency, availability, and partition tolerance (the CAP theorem); when a network partition occurs, a system must trade consistency against availability.
Common Consistency Models:
- Strong Consistency: All nodes see the same data at the same time
- Eventual Consistency: Given enough time without updates, all nodes will converge to the same state
- Causal Consistency: Causally related operations appear in the same order to all processes
- Session Consistency: A client's reads reflect its previous writes within a session
Let's see how eventual consistency might manifest in a simple Python implementation:
# Simplified representation of three replicas of the same record
nodes = [
    {"name": "user_profile", "value": "Initial value", "timestamp": 0},
    {"name": "user_profile", "value": "Initial value", "timestamp": 0},
    {"name": "user_profile", "value": "Initial value", "timestamp": 0},
]

def update_node(node_id, new_value, timestamp):
    """Update a node only if the incoming timestamp is newer than its current one"""
    if timestamp > nodes[node_id]["timestamp"]:
        nodes[node_id]["value"] = new_value
        nodes[node_id]["timestamp"] = timestamp
        print(f"Node {node_id} updated to: {new_value}")
    else:
        print(f"Node {node_id} ignored update: {new_value} (older timestamp)")

# Client updates node 0
update_node(0, "Updated profile", 1)   # Succeeds

# Network propagation (simulated as direct calls)
update_node(1, "Updated profile", 1)   # Succeeds
update_node(2, "Updated profile", 1)   # Succeeds

# Later update with higher timestamp
update_node(1, "New profile info", 2)  # Succeeds

# Try to apply an older update
update_node(2, "Outdated info", 1)     # Ignored - node 2 already has data at least as new

# Check system state
for i, node in enumerate(nodes):
    print(f"Node {i}: {node['value']} (timestamp: {node['timestamp']})")
# Output:
# Node 0: Updated profile (timestamp: 1)
# Node 1: New profile info (timestamp: 2)
# Node 2: Updated profile (timestamp: 1)
# System is eventually consistent once all updates propagate
Consensus Algorithms
Consensus algorithms help distributed systems agree on shared values despite node failures or network issues.
Popular Consensus Algorithms:
- Paxos: Classic algorithm for reaching consensus in a network of unreliable processors
- Raft: Designed to be more understandable than Paxos, using leader election
- ZAB (Zookeeper Atomic Broadcast): Used in Apache ZooKeeper
- Byzantine Fault Tolerance: Can handle malicious nodes in the system
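A common thread in these algorithms is quorum (majority) agreement: a value or leader is accepted once more than half of the cluster votes for it. The Python sketch below shows only that majority-counting step; it deliberately omits terms, logs, retries, and failure handling that real Paxos or Raft implementations require.
def has_quorum(votes, cluster_size):
    """A proposal or candidate wins once a strict majority accepts it."""
    return votes > cluster_size // 2

def elect_leader(votes_by_candidate, cluster_size):
    """Return the candidate holding a majority of votes, or None on a split vote."""
    for candidate, votes in votes_by_candidate.items():
        if has_quorum(votes, cluster_size):
            return candidate
    return None

# Example: a 5-node cluster needs at least 3 votes to elect a leader
print(elect_leader({"node-a": 2, "node-b": 3}, cluster_size=5))  # node-b
print(elect_leader({"node-a": 2, "node-b": 2}, cluster_size=5))  # None (split vote)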
Common Distributed Database Architectures
Shared-Nothing Architecture
Each node operates independently with its own CPU, memory, and storage.
Advantages:
- Highly scalable
- No resource contention between nodes
- Node failures affect only a portion of the data
Disadvantages:
- Complex coordination
- Joins across shards can be expensive
Example systems: Amazon DynamoDB, Google Bigtable, Apache Cassandra
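The cost of cross-shard work is easiest to see with a scatter-gather query: because no node holds all the data, a coordinator must fan the request out to every shard and merge the partial results. The Python sketch below uses in-memory dictionaries as stand-in shards and is purely illustrative.
# Stand-in shards: each node owns only its own slice of the data
shards = {
    "node-a": {1: "Alice", 2: "Bob"},
    "node-b": {501: "Carol", 502: "Dave"},
    "node-c": {901: "Zack"},
}

def scatter_gather(predicate):
    """Query every shard independently, then merge the partial results."""
    results = []
    for node, data in shards.items():  # scatter: one request per node
        for user_id, name in data.items():
            if predicate(name):
                results.append((node, user_id, name))
    return results                     # gather: combined result set

# Example: find every user whose name ends with "e" (hits two different shards)
print(scatter_gather(lambda name: name.endswith("e")))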
Shared-Disk Architecture
Nodes share a common storage system but have their own CPU and memory.
Advantages:
- Easier data access across nodes
- Simplified backup and recovery
Disadvantages:
- Storage becomes a potential bottleneck
- Limited horizontal scalability
Example systems: Oracle RAC, IBM Db2 pureScale