MongoDB Hashed Indexes
Introduction
Hashed indexes are a specialized type of index in MongoDB that indexes the hash of a field's value instead of the actual value itself. They're particularly useful for facilitating equality matches and supporting sharding in MongoDB distributed systems.
Unlike regular indexes that store the actual field values, hashed indexes store hash values computed from the indexed field values. This approach offers unique advantages, especially when working with high-cardinality fields or implementing sharded clusters.
In this guide, we'll explore what hashed indexes are, when to use them, how to create them, and their benefits and limitations.
Understanding Hashed Indexes
What Is a Hashed Index?
A hashed index uses a hashing function to convert field values into hash codes - fixed-size values that represent the original data. MongoDB then indexes these hash values instead of the original data.
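To make the idea concrete, here is a minimal sketch of a hashed index in plain JavaScript. It is a toy model, not MongoDB's implementation: the `fnv1a` hash function, the `ToyHashedIndex` class, and the sample documents are all assumptions chosen for illustration.

```javascript
// Toy 32-bit FNV-1a string hash (an assumption; MongoDB uses its own internal hash).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical index: maps hash(value) -> documents whose field has that hash.
class ToyHashedIndex {
  constructor(field) {
    this.field = field;
    this.buckets = new Map();
  }

  insert(doc) {
    const key = fnv1a(String(doc[this.field]));
    if (!this.buckets.has(key)) this.buckets.set(key, []);
    this.buckets.get(key).push(doc);
  }

  // Equality lookup: hash the query value, then re-check the real value,
  // because two different values can share a hash.
  findEqual(value) {
    const candidates = this.buckets.get(fnv1a(String(value))) || [];
    return candidates.filter((doc) => doc[this.field] === value);
  }
}

const index = new ToyHashedIndex("userId");
index.insert({ userId: "user123", name: "Ada" });
index.insert({ userId: "user456", name: "Grace" });

console.log(index.findEqual("user123")); // the one matching document
console.log(index.findEqual("user999")); // no match, so an empty array
```

The index only ever sees the hash, so the original value has to be compared again after the lookup to rule out collisions.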
Key Features
- Optimized for equality queries: Hashed indexes excel at exact match queries (`field: value`)
- Random distribution: Hash values distribute randomly, which helps with even data distribution across shards
- Reduced index size: In some cases, hashed indexes can be smaller than regular indexes
- No support for range queries: Hashed indexes cannot support range-based operations
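The contrast between equality and range queries can be sketched with a toy model. Everything here is an assumption for illustration (the `fnv1a` hash, the `key` field, and the helper names); it counts how many entries each kind of query has to examine when only hashes are indexed.

```javascript
// Toy 32-bit FNV-1a string hash (an assumption; not MongoDB's hash function).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// 1,000 hypothetical documents with keys "key0000" .. "key0999".
const docs = Array.from({ length: 1000 }, (_, i) => ({
  key: "key" + String(i).padStart(4, "0"),
}));

// Toy hashed index: hash(key) -> documents.
const hashed = new Map();
for (const doc of docs) {
  const h = fnv1a(doc.key);
  if (!hashed.has(h)) hashed.set(h, []);
  hashed.get(h).push(doc);
}

// Equality: one hash computation and one bucket probe.
function findEqual(key) {
  const bucket = hashed.get(fnv1a(key)) || [];
  return { examined: bucket.length, matches: bucket.filter((d) => d.key === key) };
}

// Range: hash(lo) and hash(hi) say nothing about the keys in between,
// so the toy index cannot narrow the search and every document is checked.
function findRange(lo, hi) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined++;
    if (doc.key >= lo && doc.key <= hi) matches.push(doc);
  }
  return { examined, matches };
}

const eq = findEqual("key0042");
const range = findRange("key0040", "key0049");
console.log(eq.matches.length, "match,", eq.examined, "examined");
console.log(range.matches.length, "matches,", range.examined, "examined");
```

In this model the range query falls back to scanning all documents, which mirrors why a field that needs range queries is better served by a regular index.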
Creating a Hashed Index
Creating a hashed index in MongoDB is simple. Use the `createIndex()` method with the `hashed` index type:
```javascript
db.collection.createIndex({ fieldName: "hashed" })
```
Example - Creating a Hashed Index on User ID
Let's create a hashed index on the `userId` field in a `users` collection:
```javascript
db.users.createIndex({ userId: "hashed" })
```
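In the legacy `mongo` shell, MongoDB returns a result document like the following (the newer `mongosh` returns only the name of the new index, `userId_hashed`):
```javascript
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}
```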
When to Use Hashed Indexes
1. Sharding Based on a High-Cardinality Field
Hashed indexes are especially useful when configuring sharding with a high-cardinality key (a field with many unique values). They ensure more even data distribution across shards.
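```javascript
// First, create a hashed index. This is required if the collection already
// holds data; for an empty collection, sh.shardCollection() creates it for you.
db.products.createIndex({ productId: "hashed" })
// Then, use it as a shard key
sh.shardCollection("mystore.products", { productId: "hashed" })
```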
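The effect on data distribution can be sketched with a small simulation. This is a toy model under stated assumptions: the `fnv1a` hash, the four shards, and the range boundaries are hypothetical, and routing by `hash % SHARDS` is a simplification (MongoDB divides the hash space into chunk ranges rather than taking a modulo).

```javascript
// Toy 32-bit FNV-1a string hash (an assumption; not MongoDB's hash function).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const SHARDS = 4;

// Ranged sharding (toy): each shard owns a fixed slice of the key space.
// The boundaries are hypothetical; the last shard owns everything from 3000 up.
function rangedShard(id) {
  return Math.min(Math.floor(id / 1000), SHARDS - 1);
}

// Hashed sharding (toy): route by the hash of the key instead of the key itself.
function hashedShard(id) {
  return fnv1a(String(id)) % SHARDS;
}

const ranged = new Array(SHARDS).fill(0);
const hashedCounts = new Array(SHARDS).fill(0);

// Insert 1,000 new documents with monotonically increasing IDs.
for (let id = 3000; id < 4000; id++) {
  ranged[rangedShard(id)]++;
  hashedCounts[hashedShard(id)]++;
}

console.log("ranged:", ranged); // every insert lands on the last shard
console.log("hashed:", hashedCounts); // inserts are spread over all four shards
```

With increasing IDs, the ranged scheme sends every new document to the same shard, while the hashed scheme spreads the same inserts across all of them.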
2. Optimizing Equality Queries on Large String Fields
If your application frequently performs equality matches on large string fields, hashed indexes can improve performance by reducing the index size.
```javascript
// Create a hashed index on the email field
db.customers.createIndex({ email: "hashed" })
// Query using the hashed index (placeholder address)
db.customers.find({ email: "user@example.com" })
```
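As a rough back-of-the-envelope sketch of the size argument, the snippet below compares the raw bytes of long string keys with the 8 bytes of a 64-bit hash per entry. The key format and the count of 1,000 are hypothetical, and the estimate deliberately ignores per-entry overhead and storage-engine compression, so real index sizes will differ.

```javascript
// Hypothetical long string keys (email-like identifiers, 32 ASCII characters each).
const keys = Array.from(
  { length: 1000 },
  (_, i) => `customer-${String(i).padStart(6, "0")}@mail.example.com`
);

const HASH_BYTES = 8; // a hashed index key is a 64-bit integer

// UTF-8 byte length of every raw key, summed.
const encoder = new TextEncoder();
const rawBytes = keys.reduce((sum, k) => sum + encoder.encode(k).length, 0);
const hashedBytes = keys.length * HASH_BYTES;

console.log({ rawBytes, hashedBytes }); // { rawBytes: 32000, hashedBytes: 8000 }
```

Treat this only as intuition. To see the actual effect on your data, compare the sizes reported under `indexSizes` in `db.collection.stats()` for a regular and a hashed index on the same field.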
Practical Example: User Session Management System
Let's create a practical example of using hashed indexes in a session management system:
1. Setting Up the Collection
```javascript
// Create a sessions collection
db.createCollection("sessions")
// Add some sample session data
db.sessions.insertMany([
  {
    sessionId: "a1b2c3d4e5f6g7h8i9j0",
    userId: "user123",
    lastActive: new Date(),
    device: "mobile"
  },
  {
    sessionId: "b2c3d4e5f6g7h8i9j0k1",
    userId: "user456",
    lastActive: new Date(),
    device: "desktop"
  },
  // ... imagine thousands more sessions
])
```
2. Creating a Hashed Index
Let's add a hashed index on the `sessionId` field, which is a good candidate because:
- It's a high-cardinality field (many unique values)
- We'll primarily query it for exact matches
- Session IDs are typically long strings, so hashing can save space
```javascript
db.sessions.createIndex({ sessionId: "hashed" })
```
3. Querying with the Hashed Index
When a user sends a request with their session ID, we can quickly look up their session:
```javascript
// This query will use the hashed index
const session = db.sessions.findOne({ sessionId: "a1b2c3d4e5f6g7h8i9j0" })
```
4. Verifying Index Usage
To confirm the hashed index is being used, we can examine the query execution plan:
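```javascript
db.sessions.explain().find({ sessionId: "a1b2c3d4e5f6g7h8i9j0" })
```
In the output, the winning plan should contain an `IXSCAN` stage whose `indexName` is `sessionId_hashed` (the default name for this index), which confirms that MongoDB is using our hashed index for this query. A `COLLSCAN` stage would mean the index is not being used.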
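If you want to check this programmatically, a small helper can walk the plan and collect the index names used by `IXSCAN` stages. This is a sketch under assumptions: `indexesUsed` is a hypothetical helper, and `samplePlan` is an abbreviated, hand-written object shaped like typical `explain()` output rather than real server output.

```javascript
// Recursively collect the index names used by IXSCAN stages in a plan tree.
function indexesUsed(node, found = []) {
  if (node === null || typeof node !== "object") return found;
  if (node.stage === "IXSCAN" && node.indexName) found.push(node.indexName);
  for (const value of Object.values(node)) indexesUsed(value, found);
  return found;
}

// Abbreviated, hand-written sample (an assumption, not captured from a server).
const samplePlan = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: {
        stage: "IXSCAN",
        keyPattern: { sessionId: "hashed" },
        indexName: "sessionId_hashed",
      },
    },
  },
};

console.log(indexesUsed(samplePlan.queryPlanner.winningPlan)); // [ 'sessionId_hashed' ]
```

Passing only the winning plan keeps index scans from rejected plans out of the result. Against a live deployment, the same helper could presumably be given `db.sessions.find({ ... }).explain().queryPlanner.winningPlan`, though the exact plan shape varies between MongoDB versions.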
Benefits and Limitations
Benefits
- Even data distribution: Hashed indexes distribute data evenly, making them ideal for shard keys
- Equality query performance: They excel at exact match queries
- Storage efficiency: For long string fields, the hash may be smaller than the original value
Limitations
- No range queries: You can't use hashed indexes for range-based queries (`$gt`, `$lt`, etc.)
- No multikey support: Cannot be used on array fields
- Limited compound index support: Since MongoDB 4.4, a compound index can include at most one hashed field; earlier versions do not allow hashed fields in compound indexes at all
- No text searches: Not suitable for text search operations
- No geospatial indexing: Cannot be used for geospatial operations
Comparing Regular vs. Hashed Indexes
Let's look at a side-by-side comparison:
Feature | Regular Index | Hashed Index |
---|---|---|
Storage | Stores actual values | Stores hash of values |
Range queries | Supported | Not supported |
Equality matches | Supported | Supported |
Sorting | Supported | Not supported |
Compound index | Can be part of compound index | At most one hashed field per compound index (MongoDB 4.4+) |
Multikey (arrays) | Supported | Not supported |
Distribution | Preserves order | Random distribution |
Best use case | General purpose | Sharding, equality matches |
Practical Tips
- Use for sharding high-cardinality fields: Hashed indexes are excellent for sharding on fields like UUIDs, emails, or product IDs
- Test performance: Before implementing in production, compare the performance of hashed vs. regular indexes for your specific use case
- Avoid for range queries: If your application needs range-based queries on a field, don't use hashed indexes for that field
- Consider hybrid sharding strategies: In some cases, using compound shard keys (with one hashed component) can give you the benefits of both approaches
Summary
Hashed indexes in MongoDB provide an efficient way to index fields when you need equality matches or even data distribution for sharding. They work by storing a hash of the field values rather than the values themselves.
Key takeaways:
- Use hashed indexes for equality queries and shard keys
- They provide even data distribution across shards
- They don't support range queries or array fields, and a compound index can include at most one hashed field (MongoDB 4.4+)
- They're especially useful for high-cardinality fields like UUIDs, long strings, or randomly distributed values
As you design your MongoDB schema and indexing strategy, consider whether certain fields might benefit from hashed indexing, particularly if you're implementing a sharded cluster.
Exercises
- Create a collection of products with random product IDs and implement a hashed index on the product ID field.
- Compare the performance of equality queries on a regular vs. hashed index for a large collection with string identifiers.
- Implement a sharded collection using a hashed index and verify the distribution of data across shards.
- Create a simple API that uses hashed indexes to optimize lookups of resources by their UUID.
- Experiment with different field types and query patterns to determine when hashed indexes perform better than regular indexes.