MongoDB $lookup Stage
Introduction
In relational databases, joins are a fundamental operation that allows combining data from multiple tables. MongoDB, being a NoSQL database, has a different approach to data modeling that often involves embedding related data. However, there are situations where you need to reference data across collections. This is where the $lookup
stage in MongoDB's aggregation pipeline comes in.
The $lookup
stage performs a left outer join operation on two collections in the same database. It adds a new field to each document in the "from" collection that contains documents from the "joined" collection that match the specified condition.
Let's explore how to use the $lookup
stage to connect data across collections in MongoDB.
Basic Syntax
The basic syntax of the $lookup
stage is:
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
Parameters:
- from: The collection to join with the current pipeline
- localField: The field from the input documents
- foreignField: The field from the documents of the "from" collection
- as: The name of the new array field to be added to the input documents
Simple Example: Authors and Books
Let's start with a simple example. Imagine we have two collections: authors
and books
.
Here's our sample data:
// authors collection
[
{ _id: 1, name: "Jane Austen", country: "United Kingdom" },
{ _id: 2, name: "George Orwell", country: "United Kingdom" },
{ _id: 3, name: "Ernest Hemingway", country: "USA" }
]
// books collection
[
{ _id: 101, title: "Pride and Prejudice", authorId: 1, year: 1813 },
{ _id: 102, title: "Sense and Sensibility", authorId: 1, year: 1811 },
{ _id: 103, title: "1984", authorId: 2, year: 1949 },
{ _id: 104, title: "Animal Farm", authorId: 2, year: 1945 },
{ _id: 105, title: "The Old Man and the Sea", authorId: 3, year: 1952 }
]
Let's say we want to get all authors along with their books. We can use the $lookup
stage as follows:
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
}
])
The result would be:
[
{
_id: 1,
name: "Jane Austen",
country: "United Kingdom",
books: [
{ _id: 101, title: "Pride and Prejudice", authorId: 1, year: 1813 },
{ _id: 102, title: "Sense and Sensibility", authorId: 1, year: 1811 }
]
},
{
_id: 2,
name: "George Orwell",
country: "United Kingdom",
books: [
{ _id: 103, title: "1984", authorId: 2, year: 1949 },
{ _id: 104, title: "Animal Farm", authorId: 2, year: 1945 }
]
},
{
_id: 3,
name: "Ernest Hemingway",
country: "USA",
books: [
{ _id: 105, title: "The Old Man and the Sea", authorId: 3, year: 1952 }
]
}
]
Notice that each author document now has a books
field containing an array of books written by that author.
Working with Lookup Results
The $lookup
stage adds an array field to each document. In many cases, you'll want to process these results further. Let's look at some common operations:
Filtering Joined Documents
You can combine $lookup
with other stages to filter the joined documents:
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
},
{
$project: {
name: 1,
country: 1,
books: {
$filter: {
input: "$books",
as: "book",
cond: { $gt: ["$$book.year", 1900] }
}
}
}
}
])
This query returns authors with only their books published after 1900.
Counting Related Documents
You might want to count the number of related documents:
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
},
{
$project: {
name: 1,
country: 1,
bookCount: { $size: "$books" }
}
}
])
Result:
[
{ _id: 1, name: "Jane Austen", country: "United Kingdom", bookCount: 2 },
{ _id: 2, name: "George Orwell", country: "United Kingdom", bookCount: 2 },
{ _id: 3, name: "Ernest Hemingway", country: "USA", bookCount: 1 }
]
Unwind and Group
Sometimes you might need to unwind the lookup results and then group them back:
db.authors.aggregate([
{
$lookup: {
from: "books",
localField: "_id",
foreignField: "authorId",
as: "books"
}
},
{ $unwind: "$books" },
{
$group: {
_id: "$country",
totalBooks: { $sum: 1 },
authors: { $addToSet: "$name" }
}
}
])
Result:
[
{
_id: "United Kingdom",
totalBooks: 4,
authors: ["Jane Austen", "George Orwell"]
},
{
_id: "USA",
totalBooks: 1,
authors: ["Ernest Hemingway"]
}
]
Advanced $lookup: Using Expressions (MongoDB 3.6+)
Starting from MongoDB 3.6, the $lookup
stage supports a more advanced form that allows you to specify a pipeline to run on the joined collection:
{
$lookup:
{
from: <collection to join>,
let: { <variable1>: <expression1>, ... },
pipeline: [ <pipeline to run on the joined collection> ],
as: <output array field>
}
}
This form gives you more flexibility for complex join conditions.
Example with Let and Pipeline
Let's say we want to find authors who have written books in a specific time period:
db.authors.aggregate([
{
$lookup: {
from: "books",
let: { authorId: "$_id" },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$authorId", "$$authorId"] },
{ $gte: ["$year", 1900] },
{ $lte: ["$year", 1950] }
]
}
}
}
],
as: "booksInPeriod"
}
},
{
$match: { booksInPeriod: { $ne: [] } }
}
])
This query returns authors who have written books between 1900 and 1950. The let
clause defines variables that can be used in the pipeline, and $expr
allows us to reference those variables using the $$
prefix.
Real-world Example: E-commerce Application
Let's consider a more complex real-world example for an e-commerce application. We have three collections:
orders
: Contains order informationcustomers
: Contains customer detailsproducts
: Contains product information
Sample data:
// customers collection
[
{
_id: 1,
name: "Alice Johnson",
email: "[email protected]",
address: {
street: "123 Main St",
city: "New York",
zipcode: "10001"
}
},
// More customers...
]
// products collection
[
{
_id: 101,
name: "Smartphone",
price: 699.99,
category: "Electronics",
inStock: true
},
// More products...
]
// orders collection
[
{
_id: "ORD001",
customerId: 1,
date: ISODate("2023-04-15"),
items: [
{ productId: 101, quantity: 1 },
{ productId: 102, quantity: 2 }
],
status: "shipped"
},
// More orders...
]
Let's create a comprehensive order report that includes customer information and product details:
db.orders.aggregate([
// Stage 1: Lookup customer information
{
$lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customerInfo"
}
},
// Stage 2: Unwind the customerInfo array (it will have only one element)
{ $unwind: "$customerInfo" },
// Stage 3: Unwind the items array to process each order item separately
{ $unwind: "$items" },
// Stage 4: Lookup product information for each item
{
$lookup: {
from: "products",
localField: "items.productId",
foreignField: "_id",
as: "productInfo"
}
},
// Stage 5: Unwind the productInfo array
{ $unwind: "$productInfo" },
// Stage 6: Calculate item total
{
$addFields: {
"items.subtotal": { $multiply: ["$items.quantity", "$productInfo.price"] }
}
},
// Stage 7: Group back to order level
{
$group: {
_id: "$_id",
customerName: { $first: "$customerInfo.name" },
customerEmail: { $first: "$customerInfo.email" },
date: { $first: "$date" },
status: { $first: "$status" },
items: {
$push: {
product: "$productInfo.name",
quantity: "$items.quantity",
price: "$productInfo.price",
subtotal: "$items.subtotal"
}
},
totalAmount: { $sum: "$items.subtotal" }
}
}
])
This query generates a comprehensive order report with customer information, product details, and calculated totals. The pipeline demonstrates how $lookup
can be used as part of a larger aggregation pipeline to create rich reports from multiple collections.
Optimizing $lookup Performance
While $lookup
is powerful, it can be expensive to execute, especially on large collections. Here are some tips to optimize its performance:
- Create indexes: Ensure that both the
localField
andforeignField
are indexed - Limit documents before lookup: Use
$match
before$lookup
to reduce the number of documents - Project only needed fields: Use
$project
to select only necessary fields - Consider embedding for frequently accessed data: For data accessed together frequently, embedding might be more efficient
Limitations
It's important to understand the limitations of $lookup
:
- The
from
collection must be in the same database - There's a 100MB document size limit for resulting documents
- The operation might be slower compared to embedded data models
- Can't handle complex joins involving multiple conditions efficiently without using the advanced form
Summary
The $lookup
stage is a powerful feature in MongoDB's aggregation framework that allows you to perform left outer joins between collections. It bridges the gap between the document model of MongoDB and relational data requirements. By understanding how to use $lookup
effectively, you can model your data more flexibly and create complex reports and views that combine data from multiple collections.
With the advanced form introduced in MongoDB 3.6, you can perform even more sophisticated join operations using expressions and subpipelines.
Practice Exercises
-
Create two collections:
students
andcourses
. Design a schema and use$lookup
to find all courses taken by each student. -
Extend the e-commerce example by adding a
reviews
collection and use$lookup
to find all products with their reviews. -
Try using nested
$lookup
operations to join three collections in a single pipeline.
Additional Resources
- MongoDB Documentation on $lookup
- MongoDB Aggregation Pipeline Optimization
- MongoDB Data Modeling Strategies
By mastering the $lookup
stage, you'll be able to handle complex data relationships in MongoDB without sacrificing the flexibility of the document model.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)