
Distributed Shared Memory

Introduction

Distributed Shared Memory (DSM) is a memory architecture that creates an abstraction of shared memory across physically distributed systems. In traditional computing, processes running on the same machine can easily share memory. However, in distributed systems where computations run across multiple machines connected by a network, sharing memory becomes challenging. DSM solves this problem by providing a virtual shared address space that spans multiple computers.

Think of DSM as creating the illusion of a single, unified memory space even though the actual memory is physically distributed across different machines. This allows programmers to use the familiar shared-memory programming model even in distributed environments.

Understanding Distributed Shared Memory

The Basic Concept

In a traditional single computer system, multiple processes can access the same memory:

Process A      Process B      Process C
    │              │              │
    ▼              ▼              ▼
┌─────────────────────────────────────┐
│            Shared Memory            │
└─────────────────────────────────────┘

But in a distributed system, each machine has its own physically separate memory space. DSM adds a layer of abstraction on top of these separate memories so that, to the application, they appear as one shared address space.

How DSM Works

DSM systems typically operate through these core mechanisms:

  1. Memory Mapping: Creating a virtual address space that maps to physical memory across machines.
  2. Access Detection: Monitoring memory accesses to detect when distributed data needs to be updated.
  3. Consistency Maintenance: Ensuring that all nodes see a consistent view of the shared memory.
  4. Data Transfer: Moving or replicating data between machines when needed.

Implementation Approaches

Page-Based DSM

In page-based DSM, memory is divided into fixed-size pages (similar to virtual memory in operating systems). When a process accesses a page that's not available locally, a "page fault" occurs, and the system fetches the required page from another machine.

```javascript
// Conceptual example (not actual code)
function accessSharedMemory(address) {
  const page = getPageFromAddress(address);

  if (!isPageLocallyAvailable(page)) {
    // Page fault! Need to fetch from the remote machine
    requestPageFromRemote(page);
    waitForPageArrival(page);
  }

  // Now we can access the data
  return readMemory(address);
}
```

Object-Based DSM

Object-based DSM systems manage memory at the granularity of programming language objects rather than fixed-size pages. This approach often aligns better with how applications actually use memory.

```java
// Example of object-based DSM in a Java-like language
@Distributed
class SharedCounter {
    private int value = 0;

    public synchronized void increment() {
        value++;
    }

    public int getValue() {
        return value;
    }
}

// Usage across multiple machines
SharedCounter counter = DSM.get("globalCounter");
counter.increment(); // This might happen on any machine in the cluster
```

Consistency Models

One of the most challenging aspects of DSM is maintaining consistency—ensuring that all processes have the same view of shared memory. Several consistency models exist:

Strict Consistency

Every read operation returns the most recently written value. This is the most intuitive model, but it is effectively impossible to implement in a distributed system: it presumes a global clock and instantaneous propagation of writes, neither of which exists across a network.

Sequential Consistency

All processes see the same sequence of memory operations, but not necessarily in real-time order. It preserves the program order of operations from each process.

Process 1: Write(x, 1) -> Write(y, 2)
Process 2: Read(y) -> Read(x)

Possible sequentially consistent interleaving:
Write(x, 1) -> Write(y, 2) -> Read(y) returns 2 -> Read(x) returns 1
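
We can check this mechanically. The short Python sketch below enumerates every interleaving of the two processes that preserves each one's program order and collects the possible read results (the operation encoding here is our own, not a standard API):

```python
from itertools import combinations

# Each operation is (kind, var, value); all variables start at 0.
P1 = [("w", "x", 1), ("w", "y", 2)]
P2 = [("r", "y", None), ("r", "x", None)]

def interleavings(a, b):
    """All merges of a and b that preserve each list's internal order."""
    n, m = len(a), len(b)
    for picks in combinations(range(n + m), n):
        out, ai, bi = [], 0, 0
        for i in range(n + m):
            if i in picks:
                out.append(a[ai]); ai += 1
            else:
                out.append(b[bi]); bi += 1
        yield out

outcomes = set()
for order in interleavings(P1, P2):
    mem = {"x": 0, "y": 0}
    reads = []
    for kind, var, val in order:
        if kind == "w":
            mem[var] = val
        else:
            reads.append(mem[var])
    outcomes.add(tuple(reads))  # (value read for y, value read for x)

print(sorted(outcomes))  # [(0, 0), (0, 1), (2, 1)]
```

Note what is missing: no interleaving yields (y=2, x=0). If Read(y) returns 2, then Write(y, 2) already happened, so Write(x, 1) happened even earlier, and Read(x) must return 1. A DSM that ever lets a process see y=2 but x=0 is not sequentially consistent.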

Causal Consistency

Less strict than sequential consistency: only causally related operations must be seen in the same order by all processes. For example, if a process reads a value and then writes based on it, every process must observe the original write before the dependent one; concurrent writes with no causal relationship may be observed in different orders.

Weak Consistency

Synchronization operations are strictly ordered, but data operations are not. This allows for better performance at the cost of a more complex programming model.
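
A minimal sketch of the idea: ordinary writes go into a local buffer and only become globally visible at a synchronization point. The names below (WeakNode, sync) are illustrative, not from any real DSM API:

```python
# Toy model of weak consistency: writes are buffered locally and only
# published to the shared backing store when the node synchronizes.
class WeakNode:
    def __init__(self, global_memory):
        self.globals = global_memory   # shared backing store (one dict here)
        self.pending = {}              # writes not yet visible to other nodes

    def write(self, var, value):
        self.pending[var] = value      # buffered: other nodes cannot see it yet

    def read(self, var):
        # A node always sees its own pending writes first.
        return self.pending.get(var, self.globals.get(var))

    def sync(self):
        # Synchronization flushes all buffered writes at once.
        self.globals.update(self.pending)
        self.pending.clear()

memory = {}
a, b = WeakNode(memory), WeakNode(memory)
a.write("x", 1)
print(b.read("x"))  # None: a has not reached a synchronization point yet
a.sync()
print(b.read("x"))  # 1: the write became visible at the sync point
```

Batching writes like this is exactly where the performance win comes from: the system pays communication costs once per synchronization instead of once per write.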

Practical DSM Example: Distributed Matrix Multiplication

Let's implement a simple matrix multiplication using a DSM system. We'll partition the matrices across multiple machines.

```python
# Conceptual example of matrix multiplication with DSM
def matrix_multiply_dsm(A, B, result_matrix):
    # Assume matrices A and B are already distributed
    rows_A = A.shape[0]
    cols_B = B.shape[1]

    # Each node computes a contiguous band of rows of the result matrix
    node_id = get_current_node_id()
    rows_per_node = rows_A // total_nodes
    start_row = node_id * rows_per_node
    end_row = start_row + rows_per_node if node_id < total_nodes - 1 else rows_A

    for i in range(start_row, end_row):
        for j in range(cols_B):
            result_matrix[i][j] = 0
            for k in range(A.shape[1]):
                # This may trigger a remote memory access if B[k][j] lives on another node
                result_matrix[i][j] += A[i][k] * B[k][j]

    # Synchronize to ensure all nodes complete their work
    dsm_barrier()
```

When running this code, each machine computes a portion of the result matrix. The DSM system handles the complexity of accessing matrix elements that might be stored on remote machines.

Advantages of DSM

  1. Simplified Programming Model: Developers can use the familiar shared memory paradigm.
  2. Transparency: The distribution of memory is hidden from application code.
  3. Resource Sharing: Memory resources across multiple machines can be pooled together.
  4. Scalability: Additional machines can be added to increase memory capacity.

Challenges and Limitations

  1. Network Latency: Accessing remote memory is significantly slower than local memory.
  2. Consistency Overhead: Maintaining memory consistency across nodes requires complex protocols.
  3. Fault Tolerance: Handling node failures requires careful design.
  4. Limited Scalability: DSM systems often struggle to scale beyond a few dozen nodes.
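
The latency point deserves a back-of-envelope calculation. The numbers below are illustrative order-of-magnitude figures, not measurements:

```python
# Rough effective access time for a page-based DSM, assuming illustrative
# latencies: ~100 ns for local DRAM, ~50 us round trip for a remote page fetch.
local_ns = 100
remote_ns = 50_000

def effective_access_ns(local_hit_ratio):
    """Average memory access time given the fraction of accesses served locally."""
    return local_hit_ratio * local_ns + (1 - local_hit_ratio) * remote_ns

for hit in (0.99, 0.999, 0.9999):
    print(f"hit ratio {hit}: {effective_access_ns(hit):.0f} ns")
# hit ratio 0.99: 599 ns
# hit ratio 0.999: 150 ns
# hit ratio 0.9999: 105 ns
```

Even one remote access per thousand makes the average latency roughly 50% worse than local memory, which is why DSM systems work so hard to maximize locality.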

Real-World Applications

Scientific Computing

Large-scale simulations and computations can be distributed across multiple machines while maintaining a simple programming model.

```python
# Example: Distributed physics simulation
# Each node handles a region of the simulation space
def simulate_region(region_id, shared_simulation_state):
    while not simulation_complete:
        # Update the local region based on its current state
        update_local_region(region_id)

        # Update boundary conditions based on neighboring regions
        # (these might be on other machines, handled by DSM)
        update_boundaries(region_id, shared_simulation_state)

        # Wait for all regions to complete this step
        dsm_barrier()
```

In-Memory Databases

Distributed in-memory databases can use DSM to provide a unified view of data spread across multiple machines.

```java
// Conceptual example of a DSM-based distributed database operation
Transaction tx = database.beginTransaction();
try {
    // These records might be on different machines, but DSM hides this complexity
    Customer customer = database.get(Customer.class, customerId);
    Order newOrder = new Order(customer, items);
    database.save(newOrder);

    // Update inventory (potentially on different machines)
    for (Item item : items) {
        Inventory inv = database.get(Inventory.class, item.getId());
        inv.decreaseStock(item.getQuantity());
        database.update(inv);
    }

    tx.commit();
} catch (Exception e) {
    tx.rollback();
    throw e;
}
```

Parallel Computing Frameworks

Modern frameworks often incorporate DSM concepts even if they don't use the term: Apache Spark's resilient distributed datasets and Ray's distributed object store both present data scattered across a cluster through a single logical view.

Implementing a Simple DSM System

Let's build a very basic DSM system using sockets for communication. This is a simplified example to demonstrate the concepts:

```python
# Simplified DSM implementation (conceptual)
PAGE_SIZE = 4096  # bytes per page (chosen arbitrarily for this sketch)

class SimpleDSM:
    def __init__(self, node_id, node_addresses):
        self.node_id = node_id
        self.node_addresses = node_addresses
        self.local_memory = {}
        self.page_owners = {}  # Track which node owns which page

        # Start a server to listen for page requests from other nodes
        self.start_server()

    def read(self, address):
        page_id = address // PAGE_SIZE

        if page_id in self.local_memory:
            # We have the page locally
            return self.local_memory[page_id][address % PAGE_SIZE]
        else:
            # Need to fetch the page from a remote node
            owner_node = self.find_page_owner(page_id)
            page_data = self.request_page_from_node(owner_node, page_id)
            self.local_memory[page_id] = page_data
            return page_data[address % PAGE_SIZE]

    def write(self, address, value):
        page_id = address // PAGE_SIZE

        if page_id not in self.local_memory:
            # Fetch the page first if we don't have it
            self.read(address)

        # Update the local copy
        self.local_memory[page_id][address % PAGE_SIZE] = value

        # Invalidate this page on other nodes (a simplistic write-invalidate scheme)
        self.broadcast_invalidation(page_id)

        # Mark ourselves as the owner
        self.page_owners[page_id] = self.node_id

    def find_page_owner(self, page_id):
        if page_id in self.page_owners:
            return self.page_owners[page_id]
        else:
            # Ask the other nodes who owns this page
            # (simplified; a real system needs a more robust protocol)
            return self.broadcast_owner_query(page_id)

    # Methods for the server loop and network communication
    # (start_server, request_page_from_node, broadcast_invalidation,
    # broadcast_owner_query) would be implemented here.
```

This example shows a simplified page-based DSM implementation. In practice, DSM systems are much more complex, with sophisticated protocols for consistency, fault tolerance, and performance optimization.
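
To see the write-invalidate idea in isolation, here is a self-contained toy with two in-process "nodes" and no networking at all; the class and method names are ours, chosen only for illustration:

```python
# Toy write-invalidate protocol: a write removes every other node's cached
# copy of the page, so their next read must re-fetch the current value.
class ToyNode:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster          # list of all nodes, including self
        self.cache = {}                 # locally cached pages
        cluster.append(self)

    def write(self, page, value):
        self.cache[page] = value
        # Invalidate the page everywhere else (the "broadcast").
        for node in self.cluster:
            if node is not self:
                node.cache.pop(page, None)

    def read(self, page):
        if page not in self.cache:
            # "Page fault": fetch from whichever node currently has the page.
            for node in self.cluster:
                if page in node.cache:
                    self.cache[page] = node.cache[page]
                    break
        return self.cache.get(page)

cluster = []
n1, n2 = ToyNode("n1", cluster), ToyNode("n2", cluster)
n1.write("p0", "hello")
print(n2.read("p0"))     # hello (fetched from n1 on first access)
n2.write("p0", "world")
print("p0" in n1.cache)  # False (n2's write invalidated n1's copy)
print(n1.read("p0"))     # world
```

The trade-off this illustrates: reads of a cached page are free, but every write pays an invalidation cost proportional to the number of nodes holding a copy. Write-update protocols make the opposite choice, pushing new values to all copies instead of invalidating them.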

Summary

Distributed Shared Memory provides a powerful abstraction that bridges the gap between the simplicity of shared-memory programming and the scalability of distributed systems. By creating the illusion of a single address space across multiple machines, DSM systems allow developers to focus on their application logic rather than the complexities of distributed communication.

While DSM systems face significant challenges in terms of performance and scalability, they remain valuable in specific domains like scientific computing, in-memory databases, and parallel computing frameworks.

Exercises

  1. Implement a simple counter that can be incremented from multiple machines using the DSM concept.
  2. Compare the performance of a DSM-based matrix multiplication with one that uses explicit message passing.
  3. Design a consistency protocol that would be suitable for a collaborative text editor using DSM.
  4. Analyze how network latency affects the performance of different DSM implementation approaches.
  5. Extend the simple DSM implementation provided above to include basic fault tolerance.

Further Reading

  • "Distributed Shared Memory: Concepts and Systems" by Jelica Protić, Milo Tomašević, and Veljko Milutinović
  • "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs" by Leslie Lamport (the paper that defines sequential consistency)
  • Research papers on specific DSM systems like IVY, TreadMarks, and Munin
  • Modern distributed computing frameworks like Apache Spark and Ray, which incorporate DSM-like concepts
