Skip to main content

MongoDB $lookup Stage

Introduction

In relational databases, joins are a fundamental operation that allows combining data from multiple tables. MongoDB, being a NoSQL database, has a different approach to data modeling that often involves embedding related data. However, there are situations where you need to reference data across collections. This is where the $lookup stage in MongoDB's aggregation pipeline comes in.

The $lookup stage performs a left outer join operation on two collections in the same database. It adds a new field to each document in the "from" collection that contains documents from the "joined" collection that match the specified condition.

Let's explore how to use the $lookup stage to connect data across collections in MongoDB.

Basic Syntax

The basic syntax of the $lookup stage is:

javascript
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}

Parameters:

  • from: The collection to join with the current pipeline
  • localField: The field from the input documents
  • foreignField: The field from the documents of the "from" collection
  • as: The name of the new array field to be added to the input documents

Simple Example: Authors and Books

Let's start with a simple example. Imagine we have two collections: authors and books.

Here's our sample data:

javascript
// authors collection
[
{ _id: 1, name: "Jane Austen", country: "United Kingdom" },
{ _id: 2, name: "George Orwell", country: "United Kingdom" },
{ _id: 3, name: "Ernest Hemingway", country: "USA" }
]

// books collection
[
{ _id: 101, title: "Pride and Prejudice", authorId: 1, year: 1813 },
{ _id: 102, title: "Sense and Sensibility", authorId: 1, year: 1811 },
{ _id: 103, title: "1984", authorId: 2, year: 1949 },
{ _id: 104, title: "Animal Farm", authorId: 2, year: 1945 },
{ _id: 105, title: "The Old Man and the Sea", authorId: 3, year: 1952 }
]

Let's say we want to get all authors along with their books. We can use the $lookup stage as follows:

javascript
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
}
])

The result would be:

javascript
[
{
_id: 1,
name: "Jane Austen",
country: "United Kingdom",
books: [
{ _id: 101, title: "Pride and Prejudice", authorId: 1, year: 1813 },
{ _id: 102, title: "Sense and Sensibility", authorId: 1, year: 1811 }
]
},
{
_id: 2,
name: "George Orwell",
country: "United Kingdom",
books: [
{ _id: 103, title: "1984", authorId: 2, year: 1949 },
{ _id: 104, title: "Animal Farm", authorId: 2, year: 1945 }
]
},
{
_id: 3,
name: "Ernest Hemingway",
country: "USA",
books: [
{ _id: 105, title: "The Old Man and the Sea", authorId: 3, year: 1952 }
]
}
]

Notice that each author document now has a books field containing an array of books written by that author.

Working with Lookup Results

The $lookup stage adds an array field to each document. In many cases, you'll want to process these results further. Let's look at some common operations:

Filtering Joined Documents

You can combine $lookup with other stages to filter the joined documents:

javascript
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
},
{
$project: {
name: 1,
country: 1,
books: {
$filter: {
input: "$books",
as: "book",
cond: { $gt: ["$$book.year", 1900] }
}
}
}
}
])

This query returns authors with only their books published after 1900.

You might want to count the number of related documents:

javascript
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
},
{
$project: {
name: 1,
country: 1,
bookCount: { $size: "$books" }
}
}
])

Result:

javascript
[
{ _id: 1, name: "Jane Austen", country: "United Kingdom", bookCount: 2 },
{ _id: 2, name: "George Orwell", country: "United Kingdom", bookCount: 2 },
{ _id: 3, name: "Ernest Hemingway", country: "USA", bookCount: 1 }
]

Unwind and Group

Sometimes you might need to unwind the lookup results and then group them back:

javascript
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
},
{ $unwind: "$books" },
{
$group: {
_id: "$country",
totalBooks: { $sum: 1 },
authors: { $addToSet: "$name" }
}
}
])

Result:

javascript
[
{
_id: "United Kingdom",
totalBooks: 4,
authors: ["Jane Austen", "George Orwell"]
},
{
_id: "USA",
totalBooks: 1,
authors: ["Ernest Hemingway"]
}
]

Advanced $lookup: Using Expressions (MongoDB 3.6+)

Starting from MongoDB 3.6, the $lookup stage supports a more advanced form that allows you to specify a pipeline to run on the joined collection:

javascript
{
$lookup:
{
from: <collection to join>,
let: { <variable1>: <expression1>, ... },
pipeline: [ <pipeline to run on the joined collection> ],
as: <output array field>
}
}

This form gives you more flexibility for complex join conditions.

Example with Let and Pipeline

Let's say we want to find authors who have written books in a specific time period:

javascript
db.authors.aggregate([
{
$lookup: {
from: "books",
let: { authorId: "$_id" },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$authorId", "$$authorId"] },
{ $gte: ["$year", 1900] },
{ $lte: ["$year", 1950] }
]
}
}
}
],
as: "booksInPeriod"
}
},
{
$match: { booksInPeriod: { $ne: [] } }
}
])

This query returns authors who have written books between 1900 and 1950. The let clause defines variables that can be used in the pipeline, and $expr allows us to reference those variables using the $$ prefix.

Real-world Example: E-commerce Application

Let's consider a more complex real-world example for an e-commerce application. We have three collections:

  1. orders: Contains order information
  2. customers: Contains customer details
  3. products: Contains product information

Sample data:

javascript
// customers collection
[
{
_id: 1,
name: "Alice Johnson",
email: "[email protected]",
address: {
street: "123 Main St",
city: "New York",
zipcode: "10001"
}
},
// More customers...
]

// products collection
[
{
_id: 101,
name: "Smartphone",
price: 699.99,
category: "Electronics",
inStock: true
},
// More products...
]

// orders collection
[
{
_id: "ORD001",
customerId: 1,
date: ISODate("2023-04-15"),
items: [
{ productId: 101, quantity: 1 },
{ productId: 102, quantity: 2 }
],
status: "shipped"
},
// More orders...
]

Let's create a comprehensive order report that includes customer information and product details:

javascript
db.orders.aggregate([
// Stage 1: Lookup customer information
{
$lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customerInfo"
}
},
// Stage 2: Unwind the customerInfo array (it will have only one element)
{ $unwind: "$customerInfo" },

// Stage 3: Unwind the items array to process each order item separately
{ $unwind: "$items" },

// Stage 4: Lookup product information for each item
{
$lookup: {
from: "products",
localField: "items.productId",
foreignField: "_id",
as: "productInfo"
}
},

// Stage 5: Unwind the productInfo array
{ $unwind: "$productInfo" },

// Stage 6: Calculate item total
{
$addFields: {
"items.subtotal": { $multiply: ["$items.quantity", "$productInfo.price"] }
}
},

// Stage 7: Group back to order level
{
$group: {
_id: "$_id",
customerName: { $first: "$customerInfo.name" },
customerEmail: { $first: "$customerInfo.email" },
date: { $first: "$date" },
status: { $first: "$status" },
items: {
$push: {
product: "$productInfo.name",
quantity: "$items.quantity",
price: "$productInfo.price",
subtotal: "$items.subtotal"
}
},
totalAmount: { $sum: "$items.subtotal" }
}
}
])

This query generates a comprehensive order report with customer information, product details, and calculated totals. The pipeline demonstrates how $lookup can be used as part of a larger aggregation pipeline to create rich reports from multiple collections.

Optimizing $lookup Performance

While $lookup is powerful, it can be expensive to execute, especially on large collections. Here are some tips to optimize its performance:

  1. Create indexes: Ensure that both the localField and foreignField are indexed
  2. Limit documents before lookup: Use $match before $lookup to reduce the number of documents
  3. Project only needed fields: Use $project to select only necessary fields
  4. Consider embedding for frequently accessed data: For data accessed together frequently, embedding might be more efficient

Limitations

It's important to understand the limitations of $lookup:

  1. The from collection must be in the same database
  2. There's a 100MB document size limit for resulting documents
  3. The operation might be slower compared to embedded data models
  4. Can't handle complex joins involving multiple conditions efficiently without using the advanced form

Summary

The $lookup stage is a powerful feature in MongoDB's aggregation framework that allows you to perform left outer joins between collections. It bridges the gap between the document model of MongoDB and relational data requirements. By understanding how to use $lookup effectively, you can model your data more flexibly and create complex reports and views that combine data from multiple collections.

With the advanced form introduced in MongoDB 3.6, you can perform even more sophisticated join operations using expressions and subpipelines.

Practice Exercises

  1. Create two collections: students and courses. Design a schema and use $lookup to find all courses taken by each student.

  2. Extend the e-commerce example by adding a reviews collection and use $lookup to find all products with their reviews.

  3. Try using nested $lookup operations to join three collections in a single pipeline.

Additional Resources

By mastering the $lookup stage, you'll be able to handle complex data relationships in MongoDB without sacrificing the flexibility of the document model.



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)