
MongoDB $match Stage

Introduction

The $match stage is one of the most fundamental and frequently used stages in MongoDB's aggregation framework. Think of it as the bouncer at a club who decides which documents get to continue down your aggregation pipeline. It filters the documents to allow only those that match the specified conditions to pass to the next stage.

The $match stage takes a query document in the same syntax as the filter you pass to the find() method, so it is familiar and easy to adopt if you've written MongoDB queries before. Like find(), it does not evaluate raw aggregation expressions; wrap them in $expr if you need them.
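To make the overlap with find() concrete, here is a minimal sketch that reuses one filter document in both places. The variable names are illustrative, and the two shell calls appear only as comments because they assume a running MongoDB instance with a students collection.

```javascript
// One filter document, written once.
const filter = { major: "Computer Science", gpa: { $gt: 3.5 } };

// The same object serves as the body of a $match stage.
const pipeline = [{ $match: filter }];

// In mongosh, these two calls would select the same documents:
//   db.students.find(filter)
//   db.students.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```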

Basic Syntax

Here's the basic syntax of the $match stage:

javascript
db.collection.aggregate([
  {
    $match: {
      <field1>: <value1>,
      <field2>: <value2>,
      ...
    }
  }
])

How $match Works

The $match stage:

  1. Takes documents from the input collection or previous stage
  2. Filters them based on the specified conditions
  3. Passes only matching documents to the next stage
  4. Discards non-matching documents
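The four steps above can be pictured with a toy model in plain JavaScript. This is only an illustration of the filtering behaviour, not how the server implements $match; the matchEquals helper is hypothetical and handles top-level equality conditions only.

```javascript
// Toy model of a $match stage: keep documents whose fields equal the given values.
function matchEquals(docs, conditions) {
  return docs.filter((doc) =>
    Object.entries(conditions).every(([field, value]) => doc[field] === value)
  );
}

// Step 1: documents arriving from the collection or a previous stage.
const input = [
  { name: "Alice", major: "Computer Science" },
  { name: "Bob", major: "Mathematics" },
  { name: "Diana", major: "Computer Science" },
];

// Steps 2-4: filter, pass on the matches, drop the rest.
const passed = matchEquals(input, { major: "Computer Science" });
console.log(passed.map((d) => d.name)); // [ 'Alice', 'Diana' ]
```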

Simple Examples

Example 1: Basic Filtering

Let's say we have a collection of students:

javascript
// Sample collection
db.students.insertMany([
  { name: "Alice", age: 21, major: "Computer Science", gpa: 3.8 },
  { name: "Bob", age: 19, major: "Mathematics", gpa: 3.5 },
  { name: "Charlie", age: 22, major: "Physics", gpa: 3.2 },
  { name: "Diana", age: 20, major: "Computer Science", gpa: 3.9 },
  { name: "Edward", age: 19, major: "History", gpa: 3.3 }
])

To find all Computer Science students:

javascript
db.students.aggregate([
  {
    $match: {
      major: "Computer Science"
    }
  }
])

Output:

javascript
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]

Example 2: Multiple Conditions

To find Computer Science students with a GPA greater than 3.5:

javascript
db.students.aggregate([
  {
    $match: {
      major: "Computer Science",
      gpa: { $gt: 3.5 }
    }
  }
])

Output:

javascript
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]

Using Comparison Operators

You can use all of MongoDB's comparison operators in the $match stage:

  • $eq: Equal to
  • $gt: Greater than
  • $gte: Greater than or equal to
  • $lt: Less than
  • $lte: Less than or equal to
  • $ne: Not equal to
  • $in: Matches any of the values in a given array
  • $nin: Matches none of the values in a given array

Example: Age Range

To find students between 20 and 22 years old:

javascript
db.students.aggregate([
  {
    $match: {
      age: {
        $gte: 20,
        $lte: 22
      }
    }
  }
])

Output:

javascript
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Charlie", "age": 22, "major": "Physics", "gpa": 3.2 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]
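The range example covers $gte and $lte; $in and $nin work on a list of allowed or disallowed values instead. The sketch below builds both stages and, so that it runs without a database, uses plain Array.filter calls to show which of the sample students each stage would keep. The in-memory filters are stand-ins for this data set only.

```javascript
const students = [
  { name: "Alice", major: "Computer Science" },
  { name: "Bob", major: "Mathematics" },
  { name: "Charlie", major: "Physics" },
  { name: "Diana", major: "Computer Science" },
  { name: "Edward", major: "History" },
];
const majors = ["Mathematics", "Physics"];

// Stages you would pass to db.students.aggregate([...]) in mongosh.
const inStage = { $match: { major: { $in: majors } } };
const ninStage = { $match: { major: { $nin: majors } } };

// Plain-JavaScript equivalents for this data.
const inNames = students.filter((s) => majors.includes(s.major)).map((s) => s.name);
const ninNames = students.filter((s) => !majors.includes(s.major)).map((s) => s.name);

console.log(inNames);  // [ 'Bob', 'Charlie' ]
console.log(ninNames); // [ 'Alice', 'Diana', 'Edward' ]
```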

Logical Operators

You can combine multiple conditions using logical operators:

  • $and: Logical AND
  • $or: Logical OR
  • $not: Logical NOT
  • $nor: Logical NOR

Example: Using $or

Find students who either study Computer Science or have a GPA above 3.4:

javascript
db.students.aggregate([
  {
    $match: {
      $or: [
        { major: "Computer Science" },
        { gpa: { $gt: 3.4 } }
      ]
    }
  }
])

Output:

javascript
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 },
{ "_id": ObjectId("..."), "name": "Bob", "age": 19, "major": "Mathematics", "gpa": 3.5 },
{ "_id": ObjectId("..."), "name": "Diana", "age": 20, "major": "Computer Science", "gpa": 3.9 }
]
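Listing several fields in one $match document already behaves as an AND, so an explicit $and is mostly needed when the same key would otherwise appear twice. The sketch below shows one such case, assuming you want to combine two separate $or groups; the filters themselves are made up for illustration.

```javascript
// Implicit AND: fine while every key is different.
const implicitAnd = { major: "Computer Science", gpa: { $gt: 3.5 } };

// Pitfall: a JavaScript object literal keeps only the last duplicate key,
// so the first $or disappears before MongoDB ever sees it.
const broken = {
  $or: [{ major: "Computer Science" }, { major: "Physics" }],
  $or: [{ age: { $lt: 20 } }, { gpa: { $gt: 3.4 } }],
};

// Explicit $and keeps both groups of alternatives.
const fixed = {
  $and: [
    { $or: [{ major: "Computer Science" }, { major: "Physics" }] },
    { $or: [{ age: { $lt: 20 } }, { gpa: { $gt: 3.4 } }] },
  ],
};

console.log(Object.keys(broken)); // [ '$or' ]
console.log(fixed.$and.length);   // 2
```

Either filter object can then be used as the body of a $match stage.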

$match in a Pipeline

One of the most powerful aspects of the $match stage is how it can be combined with other stages in an aggregation pipeline. Let's look at some examples:

Example 1: $match + $group

Find the average GPA of students by major for students who are at least 20 years old:

javascript
db.students.aggregate([
  {
    $match: {
      age: { $gte: 20 }
    }
  },
  {
    $group: {
      _id: "$major",
      averageGPA: { $avg: "$gpa" }
    }
  }
])

Output:

javascript
[
{ "_id": "Computer Science", "averageGPA": 3.85 },
{ "_id": "Physics", "averageGPA": 3.2 }
]

Example 2: Using $match Multiple Times

You can use $match at multiple points in your pipeline. It's a good practice to place $match early in your pipeline to reduce the number of documents processed by subsequent stages.

javascript
db.students.aggregate([
  // First match to get only CS students
  {
    $match: {
      major: "Computer Science"
    }
  },
  // Some processing stage
  {
    $project: {
      name: 1,
      gpa: 1,
      standing: {
        $cond: {
          if: { $gte: ["$gpa", 3.5] },
          then: "Good Standing",
          else: "Regular"
        }
      }
    }
  },
  // Second match to filter further
  {
    $match: {
      standing: "Good Standing"
    }
  }
])

Output:

javascript
[
{ "_id": ObjectId("..."), "name": "Alice", "gpa": 3.8, "standing": "Good Standing" },
{ "_id": ObjectId("..."), "name": "Diana", "gpa": 3.9, "standing": "Good Standing" }
]

Regex in $match

You can use regular expressions in the $match stage for pattern matching on string fields.

Example: Name Starting with a Specific Letter

javascript
db.students.aggregate([
  {
    $match: {
      name: { $regex: /^A/, $options: "i" } // Names starting with 'A', case-insensitive
    }
  }
])

Output:

javascript
[
{ "_id": ObjectId("..."), "name": "Alice", "age": 21, "major": "Computer Science", "gpa": 3.8 }
]
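When the prefix comes from user input rather than a literal, it is usually worth escaping regex metacharacters before building the pattern. The following is a sketch under that assumption: escapeRegex and nameStartsWith are hypothetical helpers, and the local check uses JavaScript's own RegExp, which behaves the same as the server's regex engine for a simple anchored pattern like this one.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a $match stage for "name starts with <prefix>", ignoring case.
function nameStartsWith(prefix) {
  return {
    $match: { name: { $regex: "^" + escapeRegex(prefix), $options: "i" } },
  };
}

const stage = nameStartsWith("a.");
console.log(stage.$match.name.$regex); // ^a\.

// Local sanity check of the generated pattern.
const re = new RegExp(stage.$match.name.$regex, stage.$match.name.$options);
console.log(re.test("A. Smith")); // true
console.log(re.test("Alice"));    // false (the dot is matched literally)
```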

Real-world Applications

Example 1: E-commerce Order Analysis

Imagine you have a collection of orders and want to analyze orders from a specific region that have a certain minimum value:

javascript
db.orders.aggregate([
  {
    $match: {
      "shipping.region": "Northeast",
      orderTotal: { $gte: 100 },
      status: "Completed"
    }
  },
  {
    $group: {
      _id: "$customer.id",
      totalSpent: { $sum: "$orderTotal" },
      orderCount: { $sum: 1 }
    }
  },
  {
    $match: {
      orderCount: { $gt: 1 } // Only customers with more than one order
    }
  },
  {
    $sort: { totalSpent: -1 }
  }
])

This query:

  1. First filters orders to include only completed orders from the Northeast region with a value of at least $100
  2. Groups by customer to calculate total spending and order count
  3. Filters again to include only customers with more than one order
  4. Sorts by total spending in descending order
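If the region and threshold vary between reports, the four steps above can be wrapped in a small builder. This is a sketch only: buildRepeatCustomerPipeline is a hypothetical helper, and the field names are the ones assumed by the example above.

```javascript
// Hypothetical helper: builds the order-analysis pipeline for a region and minimum total.
function buildRepeatCustomerPipeline(region, minTotal) {
  return [
    // 1. Filter raw orders first, while indexes on these fields can still help.
    { $match: { "shipping.region": region, orderTotal: { $gte: minTotal }, status: "Completed" } },
    // 2. Summarize per customer.
    { $group: { _id: "$customer.id", totalSpent: { $sum: "$orderTotal" }, orderCount: { $sum: 1 } } },
    // 3. Filter on a field that only exists after $group.
    { $match: { orderCount: { $gt: 1 } } },
    // 4. Biggest spenders first.
    { $sort: { totalSpent: -1 } },
  ];
}

const pipeline = buildRepeatCustomerPipeline("Northeast", 100);
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// [ '$match', '$group', '$match', '$sort' ]

// In mongosh: db.orders.aggregate(pipeline)
```

The second $match cannot be moved ahead of $group, because orderCount does not exist until the grouping has run.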

Example 2: Log Analysis

If you're storing application logs in MongoDB, you might want to analyze errors:

javascript
db.logs.aggregate([
  {
    $match: {
      level: "ERROR",
      timestamp: {
        $gte: ISODate("2023-01-01T00:00:00Z"),
        $lt: ISODate("2023-02-01T00:00:00Z")
      }
    }
  },
  {
    $group: {
      _id: "$errorCode",
      count: { $sum: 1 },
      examples: { $push: { message: "$message", timestamp: "$timestamp" } }
    }
  },
  {
    $project: {
      errorCode: "$_id",
      _id: 0,
      count: 1,
      // Keeps the first 3 examples collected; sort by timestamp before $group to control which ones
      recentExamples: { $slice: ["$examples", 3] }
    }
  },
  {
    $sort: { count: -1 }
  }
])

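This query summarizes the error codes logged in January 2023, with a count and up to three example messages for each. The examples are collected in whatever order the documents reach $group, so they are the most recent ones only if you sort by timestamp before grouping.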
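The date filter in this example is a half-open range: it includes the first instant of January and excludes the first instant of February. A small helper can generate such a range for any month. This is a sketch, with monthRange as a hypothetical name; it relies on JavaScript Date values, which mongosh and the Node.js driver store as BSON dates.

```javascript
// Hypothetical helper: half-open UTC range [start of month, start of next month).
function monthRange(year, month) {
  // month is 1-12; Date.UTC expects 0-11 and rolls month index 12 over into the next year.
  return {
    $gte: new Date(Date.UTC(year, month - 1, 1)),
    $lt: new Date(Date.UTC(year, month, 1)),
  };
}

const january = monthRange(2023, 1);
console.log(january.$gte.toISOString()); // 2023-01-01T00:00:00.000Z
console.log(january.$lt.toISOString());  // 2023-02-01T00:00:00.000Z

// Drop the range into the $match stage.
const matchStage = { $match: { level: "ERROR", timestamp: january } };
```

Using $lt with the start of the next month avoids having to work out the last millisecond of the current one.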

Performance Considerations

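  1. Place $match Early: Place $match stages as early as possible in your pipeline to reduce the number of documents that subsequent stages have to process.

  2. Use Indexes: A $match at the start of the pipeline can use indexes on the filtered fields, just as find() would. A $match that filters on a field computed by an earlier stage cannot, so do as much filtering as you can before transforming documents.

  3. Avoid Complex $match Expressions: While $match supports complex expressions, simpler conditions tend to be more efficient. For example, an unanchored or case-insensitive regex generally cannot use an index as effectively as an equality or range condition.

  4. Combine $match Stages: Adjacent $match stages can be written as a single stage. MongoDB's pipeline optimizer also coalesces adjacent $match stages itself, so the main gain from combining them by hand is readability.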
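To illustrate what combining $match stages amounts to, here is a minimal sketch that folds adjacent $match stages into one using $and. The mergeAdjacentMatches helper is hypothetical and only shows the idea; it is not the server's optimizer, which performs its own coalescing.

```javascript
// Hypothetical helper: folds runs of adjacent $match stages into one stage using $and.
function mergeAdjacentMatches(pipeline) {
  const merged = [];
  for (const stage of pipeline) {
    const previous = merged[merged.length - 1];
    if (stage.$match && previous && previous.$match) {
      // Replace the previous $match with one that requires both filters.
      merged[merged.length - 1] = { $match: { $and: [previous.$match, stage.$match] } };
    } else {
      merged.push(stage);
    }
  }
  return merged;
}

const pipeline = [
  { $match: { major: "Computer Science" } },
  { $match: { gpa: { $gt: 3.5 } } },
  { $group: { _id: "$major", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }, // not adjacent to the first two, so left alone
];

const result = mergeAdjacentMatches(pipeline);
console.log(result.length); // 3
console.log(JSON.stringify(result[0]));
// {"$match":{"$and":[{"major":"Computer Science"},{"gpa":{"$gt":3.5}}]}}
```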

Common Mistakes and Pitfalls

  1. Missing Documents: If your $match conditions are too restrictive, you might filter out more documents than intended.

  2. Case Sensitivity: String matching in MongoDB is case-sensitive by default. Use the $options: "i" parameter with regex for case-insensitive matching.

  3. Type Mismatch: Ensure the types in your $match condition match the types in your documents. For example, { age: "21" } compares against a string and will not match a document that stores age as the number 21.

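  4. Null Handling: A condition such as { field: null } matches documents where the field is null and documents where the field is missing altogether. Use $exists or $type when you need to tell the two cases apart.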
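The null-handling pitfall is easier to see with three small documents. The sketch below lists filters you could place inside $match and, so that it runs without a database, uses plain JavaScript predicates that mimic the documented matching behaviour for this data. The email field and the sample documents are assumptions made for the example.

```javascript
const docs = [
  { name: "Alice", email: "alice@example.com" },
  { name: "Bob", email: null },
  { name: "Charlie" }, // no email field at all
];

// Filters you could place inside $match:
const nullOrMissing = { email: null };             // would match Bob and Charlie
const missingOnly = { email: { $exists: false } }; // would match Charlie only
const explicitNull = { email: { $type: "null" } }; // would match Bob only

// Plain-JavaScript predicates imitating those three filters.
const names = (predicate) => docs.filter(predicate).map((d) => d.name);
console.log(names((d) => !("email" in d) || d.email === null)); // [ 'Bob', 'Charlie' ]
console.log(names((d) => !("email" in d)));                     // [ 'Charlie' ]
console.log(names((d) => "email" in d && d.email === null));    // [ 'Bob' ]
```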

Summary

The $match stage is a versatile and essential tool in MongoDB aggregation pipelines that allows you to filter documents based on specific conditions. Key points to remember:

  • $match uses the same syntax as the query conditions in the find() method
  • It should be placed early in pipelines to improve performance
  • It can use all MongoDB query operators including comparison and logical operators
  • It can be used multiple times in a single pipeline
  • $match can leverage indexes for better performance

By mastering the $match stage, you gain precise control over which documents flow through your aggregation pipeline, enabling more efficient and targeted data analysis.

Exercises

  1. Create a $match query that finds all students with a GPA above 3.0 who are not studying Computer Science.

  2. Write an aggregation pipeline that uses $match to find orders placed in the last 30 days with a value greater than $50, then group them by customer and calculate the average order value.

  3. Create a pipeline that uses $match twice: first to filter blog posts by a specific category, then after some transformations, to filter only posts with more than 5 comments.

Further Resources

By implementing the examples and completing the exercises in this guide, you'll be well on your way to mastering the $match stage in MongoDB's aggregation framework.


