Hash Tables
Introduction
Hash tables are one of the most powerful and widely used data structures in computer science. At their core, hash tables provide a way to store key-value pairs for efficient retrieval. Unlike arrays where you access elements by their index position, hash tables allow you to access values using keys that have meaning to your application.
The magic of hash tables lies in their ability to retrieve data in O(1) constant time on average, regardless of how much data they store. This incredible efficiency makes them indispensable for countless applications.
In this tutorial, we'll explore how hash tables work, how to implement them, and when to use them in your projects.
What is a Hash Table?
A hash table (also called a hash map) is a data structure that maps keys to values using a special function called a hash function. The hash function converts keys into array indices where the corresponding values are stored.
Basic Structure
A hash table consists of:
- An underlying array for storage
- A hash function that transforms keys into array indices
- A method for handling collisions (when two keys map to the same index)
Here's a visual representation of a hash table:
How Hash Functions Work
The hash function is the heart of a hash table. It takes a key and converts it into an array index. A good hash function should:
- Be fast to compute
- Distribute keys uniformly across the array
- Generate the same index for the same key consistently
Here's a simple example of a hash function for string keys:
function simpleHash(key, arraySize) {
let hashCode = 0;
for (let i = 0; i < key.length; i++) {
// Sum the character codes
hashCode += key.charCodeAt(i);
}
// Ensure the hash is within the array bounds
return hashCode % arraySize;
}
// Example usage
const index = simpleHash("apple", 10); // Might return something like 7
Handling Collisions
A collision occurs when two different keys hash to the same array index. Since this is inevitable (especially as the hash table fills up), we need strategies to handle collisions. The two most common approaches are:
1. Chaining
In chaining, each array position stores a linked list of all key-value pairs that hash to that index.
2. Open Addressing (Linear Probing)
With open addressing, when a collision occurs, we look for the next available slot in the array.
function findSlot(key, hashTable) {
let index = simpleHash(key, hashTable.length);
// If the slot is occupied and doesn't contain our key
while (hashTable[index] !== null && hashTable[index].key !== key) {
// Move to the next slot (linear probing)
index = (index + 1) % hashTable.length;
}
return index;
}
Implementing a Hash Table
Let's implement a simple hash table in JavaScript using the chaining approach:
class HashTable {
constructor(size = 53) {
this.table = new Array(size).fill(null).map(() => []);
this.size = size;
}
_hash(key) {
let total = 0;
const PRIME = 31;
// Only hash the first 100 characters for longer strings
for (let i = 0; i < Math.min(key.length, 100); i++) {
const char = key[i];
const value = char.charCodeAt(0) - 96; // 'a' is 1
total = (total * PRIME + value) % this.size;
}
return total;
}
set(key, value) {
const index = this._hash(key);
const bucket = this.table[index];
// Check if key already exists
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket[i][1] = value; // Update value
return;
}
}
// Key doesn't exist, add new key-value pair
bucket.push([key, value]);
}
get(key) {
const index = this._hash(key);
const bucket = this.table[index];
// Look for the key in the bucket
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
return bucket[i][1]; // Return value
}
}
return undefined; // Key not found
}
remove(key) {
const index = this._hash(key);
const bucket = this.table[index];
// Find and remove the key-value pair
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket.splice(i, 1);
return true;
}
}
return false; // Key not found
}
keys() {
const keysArray = [];
// Collect all keys from the hash table
for (let i = 0; i < this.table.length; i++) {
if (this.table[i].length > 0) {
for (let j = 0; j < this.table[i].length; j++) {
keysArray.push(this.table[i][j][0]);
}
}
}
return keysArray;
}
values() {
const valuesArray = [];
// Collect all values from the hash table
for (let i = 0; i < this.table.length; i++) {
if (this.table[i].length > 0) {
for (let j = 0; j < this.table[i].length; j++) {
valuesArray.push(this.table[i][j][1]);
}
}
}
return valuesArray;
}
}
Let's use our hash table implementation:
// Create a new hash table
const fruitColors = new HashTable();
// Add some key-value pairs
fruitColors.set("apple", "red");
fruitColors.set("banana", "yellow");
fruitColors.set("grape", "purple");
fruitColors.set("orange", "orange");
// Retrieve values
console.log(fruitColors.get("apple")); // Output: "red"
console.log(fruitColors.get("grape")); // Output: "purple"
console.log(fruitColors.get("mango")); // Output: undefined
// Get all keys and values
console.log(fruitColors.keys()); // Output: ["apple", "banana", "grape", "orange"]
console.log(fruitColors.values()); // Output: ["red", "yellow", "purple", "orange"]
// Remove a key-value pair
fruitColors.remove("banana");
console.log(fruitColors.get("banana")); // Output: undefined
Time Complexity
Let's analyze the time complexity of hash table operations:
Operation | Average Case | Worst Case |
---|---|---|
Insert | O(1) | O(n) |
Lookup | O(1) | O(n) |
Delete | O(1) | O(n) |
The worst-case scenario occurs when all keys hash to the same index, creating a long chain. However, with a good hash function and appropriate table size, the average case of O(1) is what makes hash tables so powerful.
Real-World Applications
Hash tables are used in countless applications due to their efficiency:
1. Caching Systems
Web browsers use hash tables to store recently visited websites. The URL serves as the key, and the webpage content is the value.
const cache = new HashTable();
function fetchPage(url) {
if (cache.get(url)) {
return cache.get(url); // Return cached page
}
// Fetch the page from the server
const pageContent = fetchFromServer(url);
// Store in cache for future requests
cache.set(url, pageContent);
return pageContent;
}
2. Database Indexing
Databases use hash indices to quickly locate records based on key values.
3. Language Dictionaries
Spell checkers and language dictionaries use hash tables to quickly look up words.
const dictionary = new HashTable();
// Populate the dictionary
["apple", "banana", "computer", "programming", /* thousands more */].forEach(word => {
dictionary.set(word, true);
});
function checkSpelling(word) {
return dictionary.get(word.toLowerCase()) !== undefined;
}
console.log(checkSpelling("computer")); // Output: true
console.log(checkSpelling("compooter")); // Output: false
4. Counting Frequencies
Hash tables are perfect for counting frequencies of items:
function countWords(text) {
const wordCounts = new HashTable();
const words = text.toLowerCase().match(/\w+/g);
words.forEach(word => {
const count = wordCounts.get(word) || 0;
wordCounts.set(word, count + 1);
});
return wordCounts;
}
const text = "The quick brown fox jumps over the lazy dog. The dog was not pleased.";
const counts = countWords(text);
console.log(counts.get("the")); // Output: 2
console.log(counts.get("dog")); // Output: 2
console.log(counts.get("fox")); // Output: 1
When to Use Hash Tables
Hash tables are ideal when:
- You need fast lookups by key
- You need to ensure uniqueness of keys
- You need to count frequencies or track occurrences
- You're implementing a cache or dictionary
When Not to Use Hash Tables
Hash tables may not be the best choice when:
- You need to maintain a specific order of elements
- Memory space is extremely limited
- You need to find the minimum or maximum value quickly
- You need range queries (finding all values between X and Y)
Common Interview Questions
Hash tables are a popular topic in coding interviews. Here are some common questions:
- Implement a hash table from scratch
- Design a hash function for a specific type of key
- Solve the "Two Sum" problem using a hash table
- Find the first non-repeating character in a string
Let's solve the "Two Sum" problem as an example:
function twoSum(nums, target) {
const numMap = {};
for (let i = 0; i < nums.length; i++) {
const complement = target - nums[i];
if (numMap[complement] !== undefined) {
return [numMap[complement], i];
}
numMap[nums[i]] = i;
}
return null;
}
console.log(twoSum([2, 7, 11, 15], 9)); // Output: [0, 1]
Summary
Hash tables are incredibly powerful data structures that provide efficient key-value storage and retrieval. The key benefits include:
- Fast operations: O(1) average time complexity for insertions, lookups, and deletions
- Flexible keys: Various data types can be used as keys
- Memory efficient: Only stores what you need
We've covered how hash tables work, implementation strategies, handling collisions, and real-world applications. With this knowledge, you should be able to recognize when a hash table is the right tool for your programming challenges.
Exercises
- Implement a hash table using open addressing instead of chaining
- Create a function that finds all duplicate values in an array using a hash table
- Implement a simple spell checker using a hash table
- Use a hash table to determine if two strings are anagrams of each other
- Create a function that returns the first recurring character in a string
Additional Resources
- Books: "Introduction to Algorithms" by Cormen, Leiserson, Rivest, and Stein
- Online Courses: Check out courses on algorithms and data structures on platforms like Coursera, edX, or Udemy
- Practice: Solve hash table problems on coding platforms like LeetCode, HackerRank, or CodeSignal
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)