MongoDB Anti-Patterns

In software development, an anti-pattern is a common but ineffective approach to solving a problem. MongoDB, with its flexible schema and document-oriented structure, gives developers a lot of freedom—but this freedom can lead to poor design choices if you're not careful. This guide explores common MongoDB anti-patterns and how to avoid them.

Introduction

MongoDB's flexible document model offers significant advantages over traditional relational databases for many use cases. However, this flexibility can be a double-edged sword. Without a good understanding of MongoDB's strengths and limitations, developers can inadvertently create data models that perform poorly, are difficult to maintain, or don't scale well.

Understanding these anti-patterns will help you build more robust and efficient MongoDB applications.

Common MongoDB Anti-Patterns

1. Massive Arrays

Anti-Pattern: Storing large arrays within documents that grow without bounds.

Problem

// Document with an unbounded array that could grow indefinitely
{
  "_id": "user123",
  "username": "john_doe",
  "posts": [
    { "title": "Post 1", "content": "..." },
    { "title": "Post 2", "content": "..." },
    // Potentially thousands more posts
  ]
}

This approach causes several issues:

Document growth can lead to expensive relocations in storage
Reading the document means loading all array elements into memory
Updates to any single array element require updating the entire document

Better Solution

// User document
{
  "_id": "user123",
  "username": "john_doe"
}

// Separate posts collection
{
  "_id": "post1",
  "user_id": "user123",
  "title": "Post 1",
  "content": "..."
}

{
  "_id": "post2",
  "user_id": "user123",
  "title": "Post 2",
  "content": "..."
}

By separating large arrays into their own collections, you can:

Query and update individual items more efficiently
Avoid document size limits (16MB per document)
Enable pagination and indexing on array elements

2. Deeply Nested Documents

Anti-Pattern: Creating complex, deeply nested document structures.

Problem

{
  "_id": "order123",
  "customer": {
    "name": "Jane Smith",
    "address": {
      "street": "123 Main St",
      "city": "New York",
      "country": {
        "name": "United States",
        "code": "US",
        "region": {
          "name": "North America",
          "subregion": {
            // More nesting...
          }
        }
      }
    },
    "orders": [
      {
        "items": [
          {
            "product": {
              // Deep nesting continues...
            }
          }
        ]
      }
    ]
  }
}

Issues with deeply nested documents:

Difficult to query specific nested fields
Hard to update specific elements without loading the entire document
Complex aggregation pipelines required for analysis
Poor readability and maintainability

Better Solution

Flatten the structure and use references:

// Customer document
{
  "_id": "customer456",
  "name": "Jane Smith",
  "address_id": "addr789",
  "country_id": "US"
}

// Address document
{
  "_id": "addr789",
  "customer_id": "customer456",
  "street": "123 Main St",
  "city": "New York"
}

// Country document
{
  "_id": "US",
  "name": "United States",
  "region_id": "na001"
}

// Order document
{
  "_id": "order123",
  "customer_id": "customer456",
  "items": ["item1", "item2"]
}

3. Inappropriate Indexing

Anti-Pattern: Either over-indexing or under-indexing your collections.

Problem

Under-indexing:

// With millions of records and no index on 'email'
db.users.find({email: "[email protected]"})

Over-indexing:

// Creating too many indexes
db.users.createIndex({firstName: 1})
db.users.createIndex({lastName: 1})
db.users.createIndex({email: 1})
db.users.createIndex({age: 1})
db.users.createIndex({city: 1})
db.users.createIndex({state: 1})
db.users.createIndex({zipCode: 1})
// And many more...

Issues:

Under-indexing leads to full collection scans and slow queries
Over-indexing slows down write operations and uses more storage
Unused indexes waste resources

Better Solution

Create indexes based on actual query patterns:

// If you frequently query by email
db.users.createIndex({email: 1})

// If you often search for users by name and filter by age
db.users.createIndex({lastName: 1, firstName: 1, age: 1})

// Use MongoDB's Index Statistics to identify unused indexes
db.users.aggregate([{$indexStats: {}}])

4. Schema-less but Not Planning-less

Anti-Pattern: Treating MongoDB's flexible schema as an excuse not to design your data model.

Problem

// Inconsistent document structure
{
  "_id": "user1",
  "name": "John",
  "age": 30
}

{
  "_id": "user2",
  "userName": "Jane",
  "userAge": 28
}

{
  "_id": "user3",
  "user": {
    "name": "Bob",
    "years": 42
  }
}

This makes querying, indexing, and maintaining your application extremely difficult.

Better Solution

Even with a flexible schema, establish consistent document patterns:

// Consistent structure with schema validation
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "age"],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        age: {
          bsonType: "int",
          minimum: 0,
          description: "must be a positive integer and is required"
        }
      }
    }
  }
})

5. Inappropriately Modeling Relationships

Anti-Pattern: Always embedding related data or always using references, regardless of the use case.

Problem

Either always embedding:

// Everything embedded - can cause large documents
{
  "_id": "store123",
  "name": "Grocery Store",
  "products": [
    {
      "name": "Milk",
      "price": 3.99,
      "reviews": [
        {
          "user": {
            "name": "John",
            "email": "[email protected]",
            // All user data embedded
          },
          "rating": 5,
          "comment": "Great product!"
        },
        // Potentially hundreds of reviews
      ]
    },
    // Potentially thousands of products
  ]
}

Or always using references:

// Everything referenced - requires multiple queries
{
  "_id": "user123",
  "name": "John",
  "address_id": "addr456",
  "preferences_id": "pref789",
  "settings_id": "set101"
}

Better Solution

Use a hybrid approach based on:

How often the data is accessed together
Update frequency
Data size
The "belongs to" vs "has many" relationship

// Example hybrid approach for a blog
{
  "_id": "post123",
  "title": "MongoDB Best Practices",
  "content": "...",
  "author": {
    "_id": "user456",
    "name": "Jane Smith"
    // Frequently accessed author info embedded
  },
  "comments": [
    {
      "user_id": "user789", // Reference to user
      "text": "Great article!"
    }
  ],
  "category_id": "cat001" // Reference to category
}

Real-world Examples and Solutions

Example 1: E-commerce Product Catalog

Anti-pattern implementation:

// Anti-pattern: All product variations in a single document
{
  "_id": "tshirt1",
  "name": "Cotton T-Shirt",
  "description": "Comfortable cotton t-shirt",
  "variants": [
    {
      "color": "Red",
      "sizes": [
        {
          "size": "S",
          "inventory": 25,
          "price": 19.99
        },
        {
          "size": "M",
          "inventory": 30,
          "price": 19.99
        },
        // Many more size variations
      ]
    },
    {
      "color": "Blue",
      "sizes": [
        // Many more size variations
      ]
    },
    // Potentially dozens of color variations
  ]
}

Problems:

Updating inventory requires updating the entire document
Difficult to query for specific variants
Document could grow beyond the 16MB limit with many variants

Better solution:

// Main product document
{
  "_id": "tshirt1",
  "name": "Cotton T-Shirt",
  "description": "Comfortable cotton t-shirt",
  "basePrice": 19.99
}

// Separate variants collection
{
  "_id": "variant_red_s",
  "product_id": "tshirt1",
  "color": "Red",
  "size": "S",
  "inventory": 25,
  "price": 19.99
}

{
  "_id": "variant_red_m",
  "product_id": "tshirt1",
  "color": "Red",
  "size": "M",
  "inventory": 30,
  "price": 19.99
}

This approach allows:

Atomic updates to inventory numbers
Easy queries for specific variants
Efficient indexing on product_id, color, and size

Anti-pattern implementation:

// Anti-pattern: Storing all user posts in the user document
{
  "_id": "user123",
  "name": "John Doe",
  "posts": [
    {
      "id": "post1",
      "content": "Hello world!",
      "likes": 42,
      "comments": [
        {
          "user": "user456",
          "text": "Great post!",
          "timestamp": ISODate("2023-01-15T12:30:45Z")
        },
        // Potentially hundreds of comments
      ]
    },
    // Potentially thousands of posts
  ]
}

Problems:

Document grows without bounds
Loading the user profile loads all posts and comments
Difficult to paginate through posts

Better solution:

// User document
{
  "_id": "user123",
  "name": "John Doe",
  "profile": {
    "bio": "Software developer",
    "joinedDate": ISODate("2022-05-10")
  }
}

// Posts collection
{
  "_id": "post1",
  "user_id": "user123",
  "content": "Hello world!",
  "likes": 42,
  "timestamp": ISODate("2023-01-15T10:30:00Z")
}

// Comments collection
{
  "_id": "comment1",
  "post_id": "post1",
  "user_id": "user456",
  "text": "Great post!",
  "timestamp": ISODate("2023-01-15T12:30:45Z")
}

This allows:

Efficient pagination of posts
Loading only relevant data
Atomic updates to likes count without loading all posts

Data Modeling Decision Framework

When making data modeling decisions, consider these factors:

Summary

MongoDB's flexibility is powerful but requires thoughtful design. By avoiding these common anti-patterns, you can create data models that are performant, maintainable, and scalable:

Avoid unbounded arrays - Split large arrays into separate collections when appropriate
Minimize document nesting - Flatten your schema when deep nesting becomes problematic
Index thoughtfully - Create indexes based on actual query patterns
Plan your schema - Even with flexibility, consistency matters
Use the right relationship model - Choose embedding or referencing based on access patterns

Remember that good MongoDB design is about understanding your data access patterns and making appropriate trade-offs.

Additional Resources

MongoDB Documentation: Data Modeling Introduction
Book: "MongoDB: The Definitive Guide" by Kristina Chodorow
MongoDB University: M320: Data Modeling

Exercises

Refactor the following blog post schema to avoid the massive array anti-pattern:

{
  "_id": "blog123",
  "title": "My Blog",
  "posts": [
    // Imagine 1000+ blog posts here
  ]
}

Identify the anti-patterns in this e-commerce schema and propose a better solution:

{
  "_id": "user123",
  "name": "Jane Smith",
  "addresses": [
    // Multiple addresses
  ],
  "orders": [
    {
      "date": ISODate("2023-01-05"),
      "items": [
        {
          "product": {
            "name": "Laptop",
            "description": "...",
            "specs": {
              // Deeply nested specs
            }
          },
          "quantity": 1
        }
      ]
    }
    // Potentially hundreds of orders
  ],
  "cart": [
    // Current shopping cart
  ],
  "wishlist": [
    // Wishlist items
  ]
}

Create a MongoDB schema for a library management system that avoids the anti-patterns discussed in this lesson.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Common MongoDB Anti-Patterns​

1. Massive Arrays​

Problem​

Better Solution​

2. Deeply Nested Documents​

Problem​

Better Solution​

3. Inappropriate Indexing​

Problem​

Better Solution​

4. Schema-less but Not Planning-less​

Problem​

Better Solution​

5. Inappropriately Modeling Relationships​

Problem​

Better Solution​

Real-world Examples and Solutions​

Example 1: E-commerce Product Catalog​

Example 2: Social Media Feed​

Data Modeling Decision Framework​

Summary​

Additional Resources​

Exercises​

Introduction

Common MongoDB Anti-Patterns

1. Massive Arrays

Problem

Better Solution

2. Deeply Nested Documents

Problem

Better Solution

3. Inappropriate Indexing

Problem

Better Solution

4. Schema-less but Not Planning-less

Problem

Better Solution

5. Inappropriately Modeling Relationships

Problem

Better Solution

Real-world Examples and Solutions

Example 1: E-commerce Product Catalog

Example 2: Social Media Feed

Data Modeling Decision Framework

Summary

Additional Resources

Exercises