Persistent Data Structures
Introduction
When you modify a regular data structure, such as adding an element to an array, the original structure is altered and its previous state is lost. But what if you want to keep track of the previous state? This is where persistent data structures come into play.
Persistent data structures are a special class of data structures that preserve previous versions of themselves when modified. Instead of overwriting the existing data, they create a new version while keeping the old one intact. This property makes them particularly valuable in functional programming, version control systems, and any application where historical data access is important.
Understanding Persistence
What is Persistence?
In the context of data structures, persistence refers to the ability to access previous versions of a data structure after it has been modified. This is different from simply storing data persistently on disk; it's about maintaining multiple versions of the same data structure in memory.
There are several levels of persistence:
- Partial Persistence: Allows access to all past versions but only allows modifications to the newest version
- Full persistence: Allows both access and modifications to any version
- Confluent persistence: Allows versions to be merged in addition to modification and access
- Functional persistence: Always preserves the previous version when the structure is modified
Let's visualize the difference between ephemeral (regular) and persistent data structures:
Implementation Techniques
Let's explore how to implement persistent data structures.
Path Copying
Path copying is a straightforward technique where we create a new copy of the nodes that are affected by an update operation, along with all nodes that reference them directly or indirectly.
Here's a simple example with a persistent linked list:
class PersistentNode {
constructor(value, next = null) {
this.value = value;
this.next = next;
}
}
class PersistentList {
constructor(head = null) {
this.head = head;
}
// Add element to the front - returns a new list
cons(value) {
return new PersistentList(new PersistentNode(value, this.head));
}
// Return the first element
head() {
if (!this.head) throw new Error("Empty list");
return this.head.value;
}
// Return list without the first element - returns a new list
tail() {
if (!this.head) throw new Error("Empty list");
return new PersistentList(this.head.next);
}
// Convert to array for display
toArray() {
const result = [];
let current = this.head;
while (current) {
result.push(current.value);
current = current.next;
}
return result;
}
}
// Usage example
const emptyList = new PersistentList();
const list1 = emptyList.cons(1); // list1: [1]
const list2 = list1.cons(2); // list2: [2, 1]
const list3 = list2.cons(3); // list3: [3, 2, 1]
console.log(list1.toArray()); // Output: [1]
console.log(list2.toArray()); // Output: [2, 1]
console.log(list3.toArray()); // Output: [3, 2, 1]
In this example, each time we add a new element with cons()
, we create a new list with its own head, rather than modifying the original. The original list remains unchanged and accessible.
Structural Sharing
Creating complete copies can be inefficient. Structural sharing is an optimization technique where unmodified parts of the data structure are reused in the new version.
The diagram above shows how only the modified nodes (and their path to the root) are copied, while unmodified subtrees are shared between versions.
Fat Nodes
Another approach is to use "fat nodes" that store multiple values corresponding to different versions of the data structure:
class FatNode {
constructor(initialValue) {
this.versions = new Map();
this.versions.set(0, {
value: initialValue,
next: null
});
this.latestVersion = 0;
}
getVersion(version) {
return this.versions.get(version);
}
addVersion(version, value, next) {
this.versions.set(version, { value, next });
this.latestVersion = version;
}
}
Practical Examples
Example 1: Persistent Array
Let's implement a simple persistent array using path copying:
class PersistentArray {
constructor(data = []) {
this.data = [...data]; // Create a copy of the array
}
get(index) {
if (index < 0 || index >= this.data.length) {
throw new Error("Index out of bounds");
}
return this.data[index];
}
set(index, value) {
if (index < 0 || index >= this.data.length) {
throw new Error("Index out of bounds");
}
// Create a new array with the updated value
const newData = [...this.data];
newData[index] = value;
// Return a new persistent array
return new PersistentArray(newData);
}
push(value) {
const newData = [...this.data, value];
return new PersistentArray(newData);
}
length() {
return this.data.length;
}
}
// Usage
const arr1 = new PersistentArray([1, 2, 3]);
const arr2 = arr1.set(1, 99); // Update index 1
const arr3 = arr2.push(4); // Add new element
console.log(arr1.data); // Output: [1, 2, 3]
console.log(arr2.data); // Output: [1, 99, 3]
console.log(arr3.data); // Output: [1, 99, 3, 4]
Example 2: Persistent Stack
A stack is naturally suited for a persistent implementation:
class PersistentStack {
constructor(items = []) {
this.items = items;
}
push(item) {
return new PersistentStack([item, ...this.items]);
}
pop() {
if (this.isEmpty()) {
throw new Error("Cannot pop from empty stack");
}
return new PersistentStack(this.items.slice(1));
}
peek() {
if (this.isEmpty()) {
throw new Error("Cannot peek at empty stack");
}
return this.items[0];
}
isEmpty() {
return this.items.length === 0;
}
size() {
return this.items.length;
}
}
// Usage
const stack1 = new PersistentStack();
const stack2 = stack1.push("A");
const stack3 = stack2.push("B");
const stack4 = stack3.pop();
console.log(stack2.peek()); // Output: "A"
console.log(stack3.peek()); // Output: "B"
console.log(stack4.peek()); // Output: "A" (after popping "B")
Real-World Applications
1. Undo/Redo Functionality
Persistent data structures are ideal for implementing undo/redo functionality in applications:
class TextEditor {
constructor(text = "") {
this.currentText = text;
this.history = [text];
this.currentIndex = 0;
}
edit(newText) {
this.currentText = newText;
// Remove any forward history
this.history = this.history.slice(0, this.currentIndex + 1);
this.history.push(newText);
this.currentIndex = this.history.length - 1;
}
undo() {
if (this.currentIndex > 0) {
this.currentIndex--;
this.currentText = this.history[this.currentIndex];
}
return this.currentText;
}
redo() {
if (this.currentIndex < this.history.length - 1) {
this.currentIndex++;
this.currentText = this.history[this.currentIndex];
}
return this.currentText;
}
}
// Usage
const editor = new TextEditor("Hello");
editor.edit("Hello World");
editor.edit("Hello World!");
console.log(editor.currentText); // Output: "Hello World!"
console.log(editor.undo()); // Output: "Hello World"
console.log(editor.undo()); // Output: "Hello"
console.log(editor.redo()); // Output: "Hello World"
2. State Management in Web Applications
Libraries like Redux in React utilize principles of immutability and persistence to manage application state:
// Simplified Redux-like implementation
function reducer(state = { count: 0 }, action) {
switch (action.type) {
case 'INCREMENT':
return { ...state, count: state.count + 1 };
case 'DECREMENT':
return { ...state, count: state.count - 1 };
default:
return state;
}
}
// Usage
let state1 = reducer(undefined, { type: 'INIT' }); // { count: 0 }
let state2 = reducer(state1, { type: 'INCREMENT' }); // { count: 1 }
let state3 = reducer(state2, { type: 'INCREMENT' }); // { count: 2 }
console.log(state1); // Still { count: 0 }
console.log(state2); // { count: 1 }
console.log(state3); // { count: 2 }
3. Version Control Systems
Git and other version control systems use persistent data structures to efficiently store different versions of files:
// Simplified representation of a Git-like storage
class Repository {
constructor() {
this.commits = new Map();
this.HEAD = null;
}
commit(id, files, parentId = null) {
this.commits.set(id, {
files: new Map(files),
parent: parentId
});
this.HEAD = id;
}
getFilesAt(commitId) {
return this.commits.get(commitId).files;
}
}
// Usage
const repo = new Repository();
// Initial commit
repo.commit("commit1", [
["file1.txt", "Initial content"]
]);
// Second commit
repo.commit("commit2", [
["file1.txt", "Updated content"],
["file2.txt", "New file"]
], "commit1");
console.log(repo.getFilesAt("commit1").get("file1.txt")); // "Initial content"
console.log(repo.getFilesAt("commit2").get("file1.txt")); // "Updated content"
Performance Considerations
Persistent data structures come with some trade-offs:
-
Space Efficiency: Due to structural sharing, persistent structures often use less memory than naive copy-on-write approaches, but more memory than mutable structures.
-
Time Efficiency: Operations on persistent data structures generally have logarithmic complexity O(log n) compared to constant time O(1) for many operations on mutable structures.
-
Complexity: Implementation is often more complex, requiring careful design to maintain persistence and efficiency.
Here's a simple comparison:
Operation | Array (Mutable) | Persistent Array |
---|---|---|
Access | O(1) | O(log n) |
Update | O(1) | O(log n) |
Insert | O(n) | O(log n) |
Delete | O(n) | O(log n) |
Popular Persistent Data Structure Libraries
Several libraries provide ready-to-use persistent data structures:
- Immutable.js (JavaScript): Provides many persistent data structures
- Immer (JavaScript): Creates immutable state by writing mutable code
- Vavr (Java): Functional library with persistent data structures
- Clojure's data structures: Built with persistence in mind
Summary
Persistent data structures offer a powerful approach to handling data that needs to maintain its history. By preserving previous versions when modifications occur, they enable:
- Efficient undo/redo operations
- Safe concurrent access without locks
- Simplified reasoning about state changes
- Better debugging and time-travel capabilities
While they may introduce some overhead compared to mutable structures, the benefits often outweigh the costs in many applications, particularly those dealing with complex state management or historical data access.
Exercises
-
Implement a persistent binary search tree that maintains its previous state when nodes are inserted or deleted.
-
Extend the persistent array example to include methods for
insert
anddelete
that preserve the original array. -
Create a persistent hash map that can store key-value pairs and maintains previous versions when entries are added, updated, or removed.
-
Implement a text editor that uses a persistent data structure to store the document and support undo/redo operations efficiently.
-
Research and compare the performance characteristics of your own implementations versus established libraries like Immutable.js.
Additional Resources
-
Books:
- "Purely Functional Data Structures" by Chris Okasaki
- "Functional Programming in JavaScript" by Luis Atencio
-
Online Courses:
- Functional Programming Principles (Coursera)
- Advanced Data Structures (MIT OpenCourseWare)
-
Libraries:
By mastering persistent data structures, you'll be better equipped to handle complex state management in your applications, whether you're working in functional programming paradigms or simply need to track changes to your data over time.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)