MongoDB Anti-Patterns
In software development, an anti-pattern is a common but ineffective approach to solving a problem. MongoDB, with its flexible schema and document-oriented structure, gives developers a lot of freedom—but this freedom can lead to poor design choices if you're not careful. This guide explores common MongoDB anti-patterns and how to avoid them.
Introduction
MongoDB's flexible document model offers significant advantages over traditional relational databases for many use cases. However, this flexibility can be a double-edged sword. Without a good understanding of MongoDB's strengths and limitations, developers can inadvertently create data models that perform poorly, are difficult to maintain, or don't scale well.
Understanding these anti-patterns will help you build more robust and efficient MongoDB applications.
Common MongoDB Anti-Patterns
1. Massive Arrays
Anti-Pattern: Storing large arrays within documents that grow without bounds.
Problem
// Document with an unbounded array that could grow indefinitely
{
"_id": "user123",
"username": "john_doe",
"posts": [
{ "title": "Post 1", "content": "..." },
{ "title": "Post 2", "content": "..." },
// Potentially thousands more posts
]
}
This approach causes several issues:
- Document growth can lead to expensive relocations in storage
- Reading the document means loading all array elements into memory
- Updates to any single array element require updating the entire document
Better Solution
// User document
{
"_id": "user123",
"username": "john_doe"
}
// Separate posts collection
{
"_id": "post1",
"user_id": "user123",
"title": "Post 1",
"content": "..."
}
{
"_id": "post2",
"user_id": "user123",
"title": "Post 2",
"content": "..."
}
By separating large arrays into their own collections, you can:
- Query and update individual items more efficiently
- Avoid document size limits (16MB per document)
- Enable pagination and indexing on array elements
2. Deeply Nested Documents
Anti-Pattern: Creating complex, deeply nested document structures.
Problem
{
"_id": "order123",
"customer": {
"name": "Jane Smith",
"address": {
"street": "123 Main St",
"city": "New York",
"country": {
"name": "United States",
"code": "US",
"region": {
"name": "North America",
"subregion": {
// More nesting...
}
}
}
},
"orders": [
{
"items": [
{
"product": {
// Deep nesting continues...
}
}
]
}
]
}
}
Issues with deeply nested documents:
- Difficult to query specific nested fields
- Hard to update specific elements without loading the entire document
- Complex aggregation pipelines required for analysis
- Poor readability and maintainability
Better Solution
Flatten the structure and use references:
// Customer document
{
"_id": "customer456",
"name": "Jane Smith",
"address_id": "addr789",
"country_id": "US"
}
// Address document
{
"_id": "addr789",
"customer_id": "customer456",
"street": "123 Main St",
"city": "New York"
}
// Country document
{
"_id": "US",
"name": "United States",
"region_id": "na001"
}
// Order document
{
"_id": "order123",
"customer_id": "customer456",
"items": ["item1", "item2"]
}
3. Inappropriate Indexing
Anti-Pattern: Either over-indexing or under-indexing your collections.
Problem
Under-indexing:
// With millions of records and no index on 'email'
db.users.find({email: "[email protected]"})
Over-indexing:
// Creating too many indexes
db.users.createIndex({firstName: 1})
db.users.createIndex({lastName: 1})
db.users.createIndex({email: 1})
db.users.createIndex({age: 1})
db.users.createIndex({city: 1})
db.users.createIndex({state: 1})
db.users.createIndex({zipCode: 1})
// And many more...
Issues:
- Under-indexing leads to full collection scans and slow queries
- Over-indexing slows down write operations and uses more storage
- Unused indexes waste resources
Better Solution
Create indexes based on actual query patterns:
// If you frequently query by email
db.users.createIndex({email: 1})
// If you often search for users by name and filter by age
db.users.createIndex({lastName: 1, firstName: 1, age: 1})
// Use MongoDB's Index Statistics to identify unused indexes
db.users.aggregate([{$indexStats: {}}])
4. Schema-less but Not Planning-less
Anti-Pattern: Treating MongoDB's flexible schema as an excuse not to design your data model.
Problem
// Inconsistent document structure
{
"_id": "user1",
"name": "John",
"age": 30
}
{
"_id": "user2",
"userName": "Jane",
"userAge": 28
}
{
"_id": "user3",
"user": {
"name": "Bob",
"years": 42
}
}
This makes querying, indexing, and maintaining your application extremely difficult.
Better Solution
Even with a flexible schema, establish consistent document patterns:
// Consistent structure with schema validation
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "age"],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
age: {
bsonType: "int",
minimum: 0,
description: "must be a positive integer and is required"
}
}
}
}
})
5. Inappropriately Modeling Relationships
Anti-Pattern: Always embedding related data or always using references, regardless of the use case.
Problem
Either always embedding:
// Everything embedded - can cause large documents
{
"_id": "store123",
"name": "Grocery Store",
"products": [
{
"name": "Milk",
"price": 3.99,
"reviews": [
{
"user": {
"name": "John",
"email": "[email protected]",
// All user data embedded
},
"rating": 5,
"comment": "Great product!"
},
// Potentially hundreds of reviews
]
},
// Potentially thousands of products
]
}
Or always using references:
// Everything referenced - requires multiple queries
{
"_id": "user123",
"name": "John",
"address_id": "addr456",
"preferences_id": "pref789",
"settings_id": "set101"
}
Better Solution
Use a hybrid approach based on:
- How often the data is accessed together
- Update frequency
- Data size
- The "belongs to" vs "has many" relationship
// Example hybrid approach for a blog
{
"_id": "post123",
"title": "MongoDB Best Practices",
"content": "...",
"author": {
"_id": "user456",
"name": "Jane Smith"
// Frequently accessed author info embedded
},
"comments": [
{
"user_id": "user789", // Reference to user
"text": "Great article!"
}
],
"category_id": "cat001" // Reference to category
}
Real-world Examples and Solutions
Example 1: E-commerce Product Catalog
Anti-pattern implementation:
// Anti-pattern: All product variations in a single document
{
"_id": "tshirt1",
"name": "Cotton T-Shirt",
"description": "Comfortable cotton t-shirt",
"variants": [
{
"color": "Red",
"sizes": [
{
"size": "S",
"inventory": 25,
"price": 19.99
},
{
"size": "M",
"inventory": 30,
"price": 19.99
},
// Many more size variations
]
},
{
"color": "Blue",
"sizes": [
// Many more size variations
]
},
// Potentially dozens of color variations
]
}
Problems:
- Updating inventory requires updating the entire document
- Difficult to query for specific variants
- Document could grow beyond the 16MB limit with many variants
Better solution:
// Main product document
{
"_id": "tshirt1",
"name": "Cotton T-Shirt",
"description": "Comfortable cotton t-shirt",
"basePrice": 19.99
}
// Separate variants collection
{
"_id": "variant_red_s",
"product_id": "tshirt1",
"color": "Red",
"size": "S",
"inventory": 25,
"price": 19.99
}
{
"_id": "variant_red_m",
"product_id": "tshirt1",
"color": "Red",
"size": "M",
"inventory": 30,
"price": 19.99
}
This approach allows:
- Atomic updates to inventory numbers
- Easy queries for specific variants
- Efficient indexing on product_id, color, and size
Example 2: Social Media Feed
Anti-pattern implementation:
// Anti-pattern: Storing all user posts in the user document
{
"_id": "user123",
"name": "John Doe",
"posts": [
{
"id": "post1",
"content": "Hello world!",
"likes": 42,
"comments": [
{
"user": "user456",
"text": "Great post!",
"timestamp": ISODate("2023-01-15T12:30:45Z")
},
// Potentially hundreds of comments
]
},
// Potentially thousands of posts
]
}
Problems:
- Document grows without bounds
- Loading the user profile loads all posts and comments
- Difficult to paginate through posts
Better solution:
// User document
{
"_id": "user123",
"name": "John Doe",
"profile": {
"bio": "Software developer",
"joinedDate": ISODate("2022-05-10")
}
}
// Posts collection
{
"_id": "post1",
"user_id": "user123",
"content": "Hello world!",
"likes": 42,
"timestamp": ISODate("2023-01-15T10:30:00Z")
}
// Comments collection
{
"_id": "comment1",
"post_id": "post1",
"user_id": "user456",
"text": "Great post!",
"timestamp": ISODate("2023-01-15T12:30:45Z")
}
This allows:
- Efficient pagination of posts
- Loading only relevant data
- Atomic updates to likes count without loading all posts
Data Modeling Decision Framework
When making data modeling decisions, consider these factors:
Summary
MongoDB's flexibility is powerful but requires thoughtful design. By avoiding these common anti-patterns, you can create data models that are performant, maintainable, and scalable:
- Avoid unbounded arrays - Split large arrays into separate collections when appropriate
- Minimize document nesting - Flatten your schema when deep nesting becomes problematic
- Index thoughtfully - Create indexes based on actual query patterns
- Plan your schema - Even with flexibility, consistency matters
- Use the right relationship model - Choose embedding or referencing based on access patterns
Remember that good MongoDB design is about understanding your data access patterns and making appropriate trade-offs.
Additional Resources
- MongoDB Documentation: Data Modeling Introduction
- Book: "MongoDB: The Definitive Guide" by Kristina Chodorow
- MongoDB University: M320: Data Modeling
Exercises
- Refactor the following blog post schema to avoid the massive array anti-pattern:
{
"_id": "blog123",
"title": "My Blog",
"posts": [
// Imagine 1000+ blog posts here
]
}
- Identify the anti-patterns in this e-commerce schema and propose a better solution:
{
"_id": "user123",
"name": "Jane Smith",
"addresses": [
// Multiple addresses
],
"orders": [
{
"date": ISODate("2023-01-05"),
"items": [
{
"product": {
"name": "Laptop",
"description": "...",
"specs": {
// Deeply nested specs
}
},
"quantity": 1
}
]
}
// Potentially hundreds of orders
],
"cart": [
// Current shopping cart
],
"wishlist": [
// Wishlist items
]
}
- Create a MongoDB schema for a library management system that avoids the anti-patterns discussed in this lesson.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)