MongoDB $match Stage
Introduction
The $match stage is one of the most fundamental and frequently used stages in MongoDB's aggregation framework. Think of it as the bouncer at a club who decides which documents get to continue down your aggregation pipeline: only documents that match the specified conditions pass to the next stage.
The $match stage takes the same query conditions you would pass to the find() method, with the same syntax, so it will feel familiar if you've written MongoDB queries before.
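As a small sketch of that equivalence, the same filter object can be handed to either find() or a $match stage. The snippet below only builds the objects in plain JavaScript; the students collection and its fields are the ones used in the examples later in this guide, and the commented calls are what you would run in mongosh.

```javascript
// One filter object, reusable in both query styles.
const filter = { major: "Computer Science", gpa: { $gt: 3.5 } };

// The aggregation pipeline simply wraps the filter in a $match stage.
const pipeline = [{ $match: filter }];

// In mongosh, these two calls would select the same documents:
//   db.students.find(filter)
//   db.students.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```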
Basic Syntax
Here's the basic syntax of the $match stage:
db.collection.aggregate([
  {
    $match: {
      <field1>: <value1>,
      <field2>: <value2>,
      ...
    }
  }
])
How $match Works
The $match stage:
- Takes documents from the input collection or previous stage
- Filters them based on the specified conditions
- Passes only matching documents to the next stage
- Discards non-matching documents
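The steps above can be pictured with a tiny in-memory model. This is only a simplified sketch in plain JavaScript, not how MongoDB implements $match: the helper names (matches, matchStage) are made up for illustration, and only top-level equality plus a handful of comparison operators are supported.

```javascript
// Simplified model of $match. NOT MongoDB's implementation.
// Supports top-level equality and the operators listed here only.
const operators = {
  $gt: (value, operand) => value > operand,
  $gte: (value, operand) => value >= operand,
  $lt: (value, operand) => value < operand,
  $lte: (value, operand) => value <= operand,
  $ne: (value, operand) => value !== operand,
};

// True when a condition looks like { $gte: 20 } rather than a plain value.
function isOperatorObject(condition) {
  if (condition === null || typeof condition !== "object" || Array.isArray(condition)) {
    return false;
  }
  const keys = Object.keys(condition);
  return keys.length > 0 && keys.every((key) => key.startsWith("$"));
}

// Checks one document against every condition (implicit AND).
function matches(doc, conditions) {
  return Object.entries(conditions).every(([field, condition]) => {
    if (!isOperatorObject(condition)) {
      return doc[field] === condition; // plain equality
    }
    return Object.entries(condition).every(([op, operand]) => {
      if (!operators[op]) throw new Error(`Unsupported operator in this sketch: ${op}`);
      return operators[op](doc[field], operand);
    });
  });
}

// Takes the input documents and passes on only the matching ones.
function matchStage(docs, conditions) {
  return docs.filter((doc) => matches(doc, conditions));
}

const input = [
  { name: "Alice", age: 21 },
  { name: "Bob", age: 19 },
  { name: "Diana", age: 20 },
];
const output = matchStage(input, { age: { $gte: 20 } });
console.log(output.map((doc) => doc.name)); // [ 'Alice', 'Diana' ]
```

Bob is discarded because his age fails the condition; the other two documents continue to whatever stage would come next.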
Simple Examples
Example 1: Basic Filtering
Let's say we have a collection of students:
// Sample collection
db.students.insertMany([
{ name: "Alice", age: 21, major: "Computer Science", gpa: 3.8 },
{ name: "Bob", age: 19, major: "Mathematics", gpa: 3.5 },
{ name: "Charlie", age: 22, major: "Physics", gpa: 3.2 },
{ name: "Diana", age: 20, major: "Computer Science", gpa: 3.9 },
{ name: "Edward", age: 19, major: "History", gpa: 3.3 }
])
To find all Computer Science students:
db.students.aggregate([
  {
    $match: {
      major: "Computer Science"
    }
  }
])
Output:
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]
Example 2: Multiple Conditions
To find Computer Science students with a GPA greater than 3.5:
db.students.aggregate([
  {
    $match: {
      major: "Computer Science",
      gpa: { $gt: 3.5 }
    }
  }
])
Output:
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]
Using Comparison Operators
You can use all of MongoDB's comparison operators in the $match stage:
- $eq: Equal to
- $gt: Greater than
- $gte: Greater than or equal to
- $lt: Less than
- $lte: Less than or equal to
- $ne: Not equal to
- $in: Equal to any value in a given array
- $nin: Equal to none of the values in a given array
Example: Age Range
To find students between 20 and 22 years old, inclusive:
db.students.aggregate([
  {
    $match: {
      age: {
        $gte: 20,
        $lte: 22
      }
    }
  }
])
Output:
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Charlie", "age": 22, "major": "Physics", "gpa": 3.2 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]
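The set-membership operators from the list above follow the same pattern. The sketch below builds two pipelines as plain JavaScript objects (the variable names are illustrative); the commented calls show how they would be run in mongosh against the sample students collection.

```javascript
// $in: the field must equal one of the listed values.
const inPipeline = [
  { $match: { major: { $in: ["Physics", "History"] } } },
];

// $nin: the field must equal none of the listed values.
// Note that $nin also matches documents where the field is missing.
const ninPipeline = [
  { $match: { major: { $nin: ["Physics", "History"] } } },
];

// In mongosh, against the sample data:
//   db.students.aggregate(inPipeline)   // should match Charlie and Edward
//   db.students.aggregate(ninPipeline)  // should match Alice, Bob, and Diana
console.log(JSON.stringify(inPipeline));
console.log(JSON.stringify(ninPipeline));
```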
Logical Operators
You can combine multiple conditions using logical operators:
- $and: Logical AND
- $or: Logical OR
- $not: Logical NOT
- $nor: Logical NOR
Example: Using $or
Find students who either study Computer Science or have a GPA above 3.4:
db.students.aggregate([
  {
    $match: {
      $or: [
        { major: "Computer Science" },
        { gpa: { $gt: 3.4 } }
      ]
    }
  }
])
Output:
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Bob", "age": 19, "major": "Mathematics", "gpa": 3.5 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]
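The remaining logical operators can be sketched the same way. The pipelines below are plain JavaScript objects with illustrative variable names; the expected matches in the comments are worked out by hand from the five sample students.

```javascript
// Explicit $and is needed when the same operator appears twice,
// for example two separate $or clauses.
const andPipeline = [
  {
    $match: {
      $and: [
        { $or: [{ major: "Computer Science" }, { major: "Mathematics" }] },
        { $or: [{ age: { $lt: 20 } }, { gpa: { $gte: 3.9 } }] },
      ],
    },
  },
];

// $nor: documents that fail every listed condition.
const norPipeline = [
  { $match: { $nor: [{ major: "Computer Science" }, { gpa: { $lt: 3.4 } }] } },
];

// $not: negates the operator expression applied to a single field.
const notPipeline = [{ $match: { gpa: { $not: { $gt: 3.5 } } } }];

// In mongosh, against the sample data:
//   db.students.aggregate(andPipeline)  // should match Bob and Diana
//   db.students.aggregate(norPipeline)  // should match Bob
//   db.students.aggregate(notPipeline)  // should match Bob, Charlie, and Edward
console.log(JSON.stringify(andPipeline));
```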
$match in a Pipeline
One of the most powerful aspects of the $match stage is how it can be combined with other stages in an aggregation pipeline. Let's look at some examples:
Example 1: $match with $group
Find the average GPA of students by major for students who are at least 20 years old:
db.students.aggregate([
  {
    $match: {
      age: { $gte: 20 }
    }
  },
  {
    $group: {
      _id: "$major",
      averageGPA: { $avg: "$gpa" }
    }
  }
])
Output:
[
{ "_id": "Computer Science", "averageGPA": 3.85 },
{ "_id": "Physics", "averageGPA": 3.2 }
]
Example 2: Using $match Multiple Times
You can use $match at multiple points in your pipeline. It's good practice to place a $match early in the pipeline to reduce the number of documents processed by subsequent stages, and then filter again on fields that are computed along the way.
db.students.aggregate([
  // First match to get only CS students
  {
    $match: {
      major: "Computer Science"
    }
  },
  // Some processing stage
  {
    $project: {
      name: 1,
      gpa: 1,
      standing: {
        $cond: {
          if: { $gte: ["$gpa", 3.5] },
          then: "Good Standing",
          else: "Regular"
        }
      }
    }
  },
  // Second match to filter further
  {
    $match: {
      standing: "Good Standing"
    }
  }
])
Output:
[
{ "_id": ObjectId("..."), "name": "Alice", "gpa": 3.8, "standing": "Good Standing" },
{ "_id": ObjectId("..."), "name": "Diana", "gpa": 3.9, "standing": "Good Standing" }
]
Regex in $match
You can use regular expressions in the $match stage for pattern matching on string fields.
Example: Name Starting with a Specific Letter
db.students.aggregate([
  {
    $match: {
      name: { $regex: /^A/, $options: "i" } // Names starting with 'A', case-insensitive
    }
  }
])
Output:
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 }
]
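To see what the anchored, case-insensitive pattern accepts, it can be tried against a few strings in plain JavaScript. This is only an approximation: MongoDB evaluates $regex with its own regular expression engine, though for a simple pattern like this one the behavior should agree. The lowercase name "alan" is a made-up extra value to show the effect of the "i" option.

```javascript
const names = ["Alice", "Bob", "Charlie", "Diana", "Edward", "alan"];

// ^ anchors the pattern to the start of the string; i ignores case.
const startsWithA = /^A/i;
const matched = names.filter((name) => startsWithA.test(name));
console.log(matched); // [ 'Alice', 'alan' ]

// Two ways to write the same condition in a $match stage:
const stageWithOperator = { $match: { name: { $regex: "^A", $options: "i" } } };
const stageWithLiteral = { $match: { name: /^A/i } };
console.log(stageWithLiteral.$match.name.flags); // i
```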
Real-world Applications
Example 1: E-commerce Order Analysis
Imagine you have a collection of orders and want to analyze orders from a specific region that have a certain minimum value:
db.orders.aggregate([
  {
    $match: {
      "shipping.region": "Northeast",
      orderTotal: { $gte: 100 },
      status: "Completed"
    }
  },
  {
    $group: {
      _id: "$customer.id",
      totalSpent: { $sum: "$orderTotal" },
      orderCount: { $sum: 1 }
    }
  },
  {
    $match: {
      orderCount: { $gt: 1 } // Only customers with more than one order
    }
  },
  {
    $sort: { totalSpent: -1 }
  }
])
This query:
- First filters orders to include only completed orders from the Northeast region with a value of at least $100
- Groups by customer to calculate total spending and order count
- Filters again to include only customers with more than one order
- Sorts by total spending in descending order
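The first $match reaches into an embedded document with the dotted key "shipping.region", which is why that key is quoted. The sketch below shows roughly how such a path maps onto a nested document. The order document's shape and values are assumed for illustration, and resolvePath is a hypothetical helper that ignores the array traversal MongoDB also performs.

```javascript
// Hypothetical order document; the real schema may differ.
const order = {
  customer: { id: "c-1001" },
  shipping: { region: "Northeast" },
  orderTotal: 149.5,
  status: "Completed",
};

// Walks a dotted path one key at a time; returns undefined if any step is missing.
function resolvePath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

console.log(resolvePath(order, "shipping.region")); // Northeast
console.log(resolvePath(order, "customer.id"));     // c-1001
console.log(resolvePath(order, "shipping.zip"));    // undefined
```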
Example 2: Log Analysis
If you're storing application logs in MongoDB, you might want to analyze errors:
db.logs.aggregate([
  {
    $match: {
      level: "ERROR",
      timestamp: {
        $gte: ISODate("2023-01-01T00:00:00Z"),
        $lt: ISODate("2023-02-01T00:00:00Z")
      }
    }
  },
  {
    $group: {
      _id: "$errorCode",
      count: { $sum: 1 },
      examples: { $push: { message: "$message", timestamp: "$timestamp" } }
    }
  },
  {
    $project: {
      errorCode: "$_id",
      _id: 0,
      count: 1,
      recentExamples: { $slice: ["$examples", 3] }
    }
  },
  {
    $sort: { count: -1 }
  }
])
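This pipeline summarizes the error codes logged in January 2023, with a count and up to three example messages for each. The examples are the first three pushed for each code, so they are the most recent ones only if the documents reach $group sorted by timestamp in descending order (for example, via a $sort stage placed after the $match).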
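The date filter uses a half-open range: $gte on the first instant of the month and $lt on the first instant of the next, so no timestamp on the last day is missed. A minimal sketch of building such a range in JavaScript follows; monthRange is a hypothetical helper, and it relies on the fact that JavaScript Date objects can be used for date comparisons in mongosh.

```javascript
// Returns a half-open UTC range covering one calendar month.
// monthIndex is zero-based (0 = January), as in the JavaScript Date API.
function monthRange(year, monthIndex) {
  return {
    $gte: new Date(Date.UTC(year, monthIndex, 1)),
    // Date.UTC rolls month 12 over to January of the following year.
    $lt: new Date(Date.UTC(year, monthIndex + 1, 1)),
  };
}

const january2023 = monthRange(2023, 0);
const matchStage = { $match: { level: "ERROR", timestamp: january2023 } };

console.log(january2023.$gte.toISOString()); // 2023-01-01T00:00:00.000Z
console.log(january2023.$lt.toISOString());  // 2023-02-01T00:00:00.000Z
```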
Performance Considerations
- Place $match Early: Put $match stages as early as possible in your pipeline to reduce the number of documents that later stages have to process.
- Use Indexes: A $match at the start of the pipeline can use an index on the filtered fields, just as find() would, which can significantly speed up execution. A $match that follows a stage such as $group filters computed results and cannot use the collection's indexes.
- Avoid Complex $match Expressions: While $match supports complex expressions, simpler conditions tend to be more efficient.
- Combine $match Stages: When two $match stages sit next to each other, write them as a single stage. MongoDB's optimizer already merges adjacent $match stages, so the main benefit is a pipeline that is easier to read.
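As a sketch of what combining stages means, the hypothetical helper below merges neighboring $match stages in a pipeline array by wrapping their filters in $and. It is plain JavaScript for illustration only, not something you need to run yourself, since the server performs a similar merge on its own.

```javascript
// Merges each run of adjacent $match stages into a single $match.
// Wrapping in $and avoids one filter overwriting a field used by the other.
function mergeAdjacentMatches(pipeline) {
  const merged = [];
  for (const stage of pipeline) {
    const previous = merged[merged.length - 1];
    if (stage.$match && previous && previous.$match) {
      merged[merged.length - 1] = {
        $match: { $and: [previous.$match, stage.$match] },
      };
    } else {
      merged.push(stage);
    }
  }
  return merged;
}

const pipeline = [
  { $match: { major: "Computer Science" } },
  { $match: { gpa: { $gte: 3.5 } } },
  { $group: { _id: "$major", count: { $sum: 1 } } },
];

const result = mergeAdjacentMatches(pipeline);
console.log(JSON.stringify(result[0]));
// {"$match":{"$and":[{"major":"Computer Science"},{"gpa":{"$gte":3.5}}]}}
```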
Common Mistakes and Pitfalls
- Missing Documents: If your $match conditions are too restrictive, you might filter out more documents than intended.
- Case Sensitivity: String matching in MongoDB is case-sensitive by default. Use $options: "i" with $regex for case-insensitive matching.
- Type Mismatch: Ensure the types in your $match condition match the types in your documents. For example, comparing a string value to a numeric field will not match.
- Null Handling: Be careful when matching for null values or when fields might not exist in all documents. A condition such as { field: null } matches documents where the field is null as well as documents where the field is missing.
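The last two pitfalls are easier to see side by side. The pipelines below are plain JavaScript objects with illustrative names, written against the sample students collection; none of the sample documents actually has a null or missing gpa, so the comments describe which kinds of documents each filter would select.

```javascript
// Matches documents where gpa is null AND documents with no gpa field at all.
const nullOrMissing = [{ $match: { gpa: null } }];

// Matches only documents that have no gpa field.
const missingOnly = [{ $match: { gpa: { $exists: false } } }];

// Matches only documents where gpa is explicitly set to null.
const explicitNullOnly = [{ $match: { gpa: { $type: "null" } } }];

// Type mismatch: age is stored as a number in the sample data,
// so the string "21" matches nothing, while the number 21 matches Alice.
const wrongType = [{ $match: { age: "21" } }];
const rightType = [{ $match: { age: 21 } }];

// In mongosh: db.students.aggregate(nullOrMissing), and so on.
console.log(typeof wrongType[0].$match.age, typeof rightType[0].$match.age); // string number
```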
Summary
The $match stage is a versatile and essential tool in MongoDB aggregation pipelines that allows you to filter documents based on specific conditions. Key points to remember:
- $match uses the same syntax as the query conditions in the find() method
- It should be placed early in pipelines to improve performance
- It accepts MongoDB's standard query operators, including comparison and logical operators
- It can be used multiple times in a single pipeline
- $match can leverage indexes for better performance
By mastering the $match stage, you gain precise control over which documents flow through your aggregation pipeline, enabling more efficient and targeted data analysis.
Exercises
- Create a $match query that finds all students with a GPA above 3.0 who are not studying Computer Science.
- Write an aggregation pipeline that uses $match to find orders placed in the last 30 days with a value greater than $50, then group them by customer and calculate the average order value.
- Create a pipeline that uses $match twice: first to filter blog posts by a specific category, then after some transformations, to filter only posts with more than 5 comments.
Further Resources
By implementing the examples and completing the exercises in this guide, you'll be well on your way to mastering the $match stage in MongoDB's aggregation framework.