MongoDB $group Stage
The $group stage is one of the most powerful and commonly used stages in MongoDB's aggregation framework. It groups documents by a specified field or expression and then performs operations on each group, such as counting documents, calculating averages, and finding minimum and maximum values.
What is the $group Stage?
The $group stage is similar to the GROUP BY statement in SQL. It divides the input documents into groups based on one or more specified expressions, called the group key. For each unique group key value, it outputs a document containing:
- The unique _id field with the group key value
- Optional accumulators that perform operations on the grouped documents
Basic Syntax
{
  $group: {
    _id: <expression>, // Group key
    <field1>: { <accumulator1>: <expression1> },
    <field2>: { <accumulator2>: <expression2> },
    ...
  }
}
Where:
- _id: Specifies the group key. If you set this to null, all documents will be grouped into a single group.
- <field>: The name you want to give to the computed field in the output document.
- <accumulator>: One of MongoDB's accumulator operators (like $sum, $avg, etc.).
- <expression>: The field or expression to accumulate.
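To make the roles of _id, the output field names, and the accumulators concrete, here is a minimal sketch in plain JavaScript that mimics the stage in memory. It is a mental model only, not how the server implements $group; the helper name groupDocs and the init/step shape of an accumulator are assumptions made up for this illustration.

```javascript
// Conceptual model of $group (hypothetical helper, not MongoDB's implementation).
function groupDocs(docs, keyFn, accumulators) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    // Serialize the key so compound (object) keys compare by value.
    const id = JSON.stringify(key);
    if (!groups.has(id)) {
      const fresh = { _id: key };
      for (const [field, acc] of Object.entries(accumulators)) {
        fresh[field] = acc.init;
      }
      groups.set(id, fresh);
    }
    const out = groups.get(id);
    for (const [field, acc] of Object.entries(accumulators)) {
      out[field] = acc.step(out[field], doc);
    }
  }
  return [...groups.values()];
}

const docs = [
  { category: "Electronics", price: 1200 },
  { category: "Furniture", price: 500 },
  { category: "Electronics", price: 800 }
];

// Roughly: { $group: { _id: "$category", count: { $sum: 1 } } }
const result = groupDocs(docs, (d) => d.category, {
  count: { init: 0, step: (acc) => acc + 1 }
});

console.log(result); // Electronics has count 2, Furniture has count 1
```

Unlike this sketch, which returns groups in first-seen order, MongoDB makes no promise about the order of $group output.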
Common Accumulators
MongoDB provides many accumulator operators to use with the $group stage:
Accumulator | Description |
---|---|
$sum | Calculates the sum of numeric values |
$avg | Calculates the average of numeric values |
$min | Returns the minimum value |
$max | Returns the maximum value |
$push | Creates an array of all values |
$addToSet | Creates an array of unique values |
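$first | Returns the value of the expression from the first document in each group (meaningful only when the input order is defined, for example after $sort) |
$last | Returns the value of the expression from the last document in each group (meaningful only when the input order is defined, for example after $sort) |
$count | Returns the number of documents in each group (available as a $group accumulator since MongoDB 5.0; equivalent to { $sum: 1 }) |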
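$first and $last pick a value from the first or last document that reaches each group, so their result depends on the order of the input. The sketch below imitates a $sort by price followed by a $group in plain JavaScript; the output field names cheapest and priciest are hypothetical and the data is a subset invented for the illustration.

```javascript
// Plain-JavaScript illustration of $first / $last semantics (not MongoDB itself).
const orders = [
  { product: "Laptop", category: "Electronics", price: 1200 },
  { product: "Phone", category: "Electronics", price: 800 },
  { product: "Table", category: "Furniture", price: 500 },
  { product: "Chair", category: "Furniture", price: 150 }
];

// Equivalent of a preceding { $sort: { price: 1 } } stage.
const sorted = [...orders].sort((a, b) => a.price - b.price);

// Roughly: { $group: { _id: "$category",
//                      cheapest: { $first: "$product" },
//                      priciest: { $last: "$product" } } }
const byCategory = {};
for (const doc of sorted) {
  if (!(doc.category in byCategory)) {
    // First document seen for this group: this is what $first keeps.
    byCategory[doc.category] = { cheapest: doc.product };
  }
  // Overwritten on every document, so the last one wins: this is $last.
  byCategory[doc.category].priciest = doc.product;
}

console.log(byCategory);
// Electronics: cheapest "Phone", priciest "Laptop"
// Furniture:   cheapest "Chair", priciest "Table"
```

Without the sort step, the same grouping would return whichever documents happened to arrive first and last.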
Basic Examples
Let's work with a sample orders collection:
[
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 }
]
Example 1: Group by Category and Count Documents
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      count: { $sum: 1 }
    }
  }
])
Output:
[
  { "_id": "Electronics", "count": 4 },
  { "_id": "Furniture", "count": 2 }
]
In this example:
- We group documents by the category field.
- For each group, we count the number of documents using $sum: 1.
Example 2: Group by Category and Calculate Total Revenue
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
])
Output:
[
  { "_id": "Electronics", "totalRevenue": 4900 },
  { "_id": "Furniture", "totalRevenue": 1100 }
]
In this example:
- We group documents by the category field.
- We calculate the total revenue for each category by multiplying price by quantity and summing the results.
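The per-category arithmetic can be checked by hand. The following plain JavaScript sketch repeats the multiply-then-sum step on the sample orders without a database; it only mirrors the calculation and is not a substitute for running the pipeline.

```javascript
// Recompute totalRevenue per category from the sample orders.
const orders = [
  { category: "Electronics", price: 1200, quantity: 1 },
  { category: "Electronics", price: 800, quantity: 2 },
  { category: "Electronics", price: 200, quantity: 3 },
  { category: "Furniture", price: 500, quantity: 1 },
  { category: "Furniture", price: 150, quantity: 4 },
  { category: "Electronics", price: 1500, quantity: 1 }
];

const totalRevenue = {};
for (const { category, price, quantity } of orders) {
  // Same as $sum over { $multiply: ["$price", "$quantity"] }
  totalRevenue[category] = (totalRevenue[category] || 0) + price * quantity;
}

console.log(totalRevenue);
// Electronics: 1200*1 + 800*2 + 200*3 + 1500*1 = 4900
// Furniture:   500*1 + 150*4 = 1100
```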
Example 3: Group by Multiple Fields
You can group by multiple fields by using an object as the _id:
db.orders.aggregate([
  {
    $group: {
      _id: {
        category: "$category",
        product: "$product"
      },
      totalQuantity: { $sum: "$quantity" }
    }
  }
])
Output:
[
  { "_id": { "category": "Electronics", "product": "Laptop" }, "totalQuantity": 2 },
  { "_id": { "category": "Electronics", "product": "Phone" }, "totalQuantity": 2 },
  { "_id": { "category": "Electronics", "product": "Headphones" }, "totalQuantity": 3 },
  { "_id": { "category": "Furniture", "product": "Table" }, "totalQuantity": 1 },
  { "_id": { "category": "Furniture", "product": "Chair" }, "totalQuantity": 4 }
]
Advanced Examples
Example 4: Group All Documents and Calculate Statistics
db.orders.aggregate([
  {
    $group: {
      _id: null,
      totalProducts: { $sum: 1 },
      avgPrice: { $avg: "$price" },
      minPrice: { $min: "$price" },
      maxPrice: { $max: "$price" },
      totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
])
Output:
[
  {
    "_id": null,
    "totalProducts": 6,
    "avgPrice": 725,
    "minPrice": 150,
    "maxPrice": 1500,
    "totalRevenue": 6000
  }
]
Example 5: Create Arrays of Values with $push
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      products: { $push: "$product" }
    }
  }
])
Output:
[
  { "_id": "Electronics", "products": ["Laptop", "Phone", "Headphones", "Laptop"] },
  { "_id": "Furniture", "products": ["Table", "Chair"] }
]
Example 6: Create Arrays of Unique Values with $addToSet
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      uniqueProducts: { $addToSet: "$product" }
    }
  }
])
Output:
[
  { "_id": "Electronics", "uniqueProducts": ["Laptop", "Phone", "Headphones"] },
  { "_id": "Furniture", "uniqueProducts": ["Table", "Chair"] }
]
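Notice how $addToSet includes "Laptop" only once, unlike $push, which included it twice. The order of elements in an $addToSet array is unspecified, so your output may list the products in a different order.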
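The difference between the two accumulators is the same as the difference between appending to an array and adding to a set. A minimal sketch in plain JavaScript, using the Electronics product names from the sample data:

```javascript
// Products of the Electronics orders, in insertion order.
const electronics = ["Laptop", "Phone", "Headphones", "Laptop"];

const pushed = [];          // behaves like $push: keeps every value
const unique = new Set();   // behaves like $addToSet: keeps distinct values

for (const product of electronics) {
  pushed.push(product);
  unique.add(product);
}

const addToSetResult = [...unique];

console.log(pushed.length);         // 4 ("Laptop" appears twice)
console.log(addToSetResult.length); // 3 ("Laptop" appears once)
```

A JavaScript Set happens to keep insertion order; that is a property of this sketch, not something to rely on with $addToSet.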
Real-World Applications
Sales Reporting
You can use $group to generate sales reports by different dimensions:
db.orders.aggregate([
  {
    $group: {
      _id: {
        year: { $year: "$orderDate" },
        month: { $month: "$orderDate" }
      },
      salesCount: { $sum: 1 },
      totalRevenue: { $sum: "$totalAmount" },
      avgOrderValue: { $avg: "$totalAmount" }
    }
  },
  { $sort: { "_id.year": 1, "_id.month": 1 } }
])
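This query generates a monthly sales report with order counts, revenue, and average order values. It assumes each order document has an orderDate field (a date) and a totalAmount field, which the sample collection above does not include.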
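To see what the monthly report computes, here is a sketch of the same grouping in plain JavaScript. The three orders and their amounts are invented for the illustration. By default $year and $month read the date in UTC, and $month is 1-based, which is why the sketch uses the UTC getters and adds 1 to the month.

```javascript
// Hypothetical orders with the fields the report expects.
const orders = [
  { orderDate: new Date("2024-01-15T10:00:00Z"), totalAmount: 100 },
  { orderDate: new Date("2024-01-20T12:00:00Z"), totalAmount: 300 },
  { orderDate: new Date("2024-02-03T09:30:00Z"), totalAmount: 50 }
];

const monthly = new Map();
for (const { orderDate, totalAmount } of orders) {
  const year = orderDate.getUTCFullYear();
  const month = orderDate.getUTCMonth() + 1; // JavaScript months are 0-based
  const key = `${year}-${month}`;
  const group =
    monthly.get(key) || { _id: { year, month }, salesCount: 0, totalRevenue: 0 };
  group.salesCount += 1;
  group.totalRevenue += totalAmount;
  monthly.set(key, group);
}

// Derive the average and sort by year, then month (the $sort stage).
const report = [...monthly.values()]
  .map((g) => ({ ...g, avgOrderValue: g.totalRevenue / g.salesCount }))
  .sort((a, b) => a._id.year - b._id.year || a._id.month - b._id.month);

console.log(report);
// January 2024:  salesCount 2, totalRevenue 400, avgOrderValue 200
// February 2024: salesCount 1, totalRevenue 50,  avgOrderValue 50
```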
Customer Analysis
You can use $group to analyze customer behavior:
db.orders.aggregate([
  {
    $group: {
      _id: "$customerId",
      orderCount: { $sum: 1 },
      totalSpent: { $sum: "$totalAmount" },
      averageOrderValue: { $avg: "$totalAmount" },
      firstPurchase: { $min: "$orderDate" },
      lastPurchase: { $max: "$orderDate" },
      purchasedProducts: { $addToSet: "$product" }
    }
  },
  { $sort: { "totalSpent": -1 } }
])
This query generates one profile per customer with purchase history and behavior metrics, sorted by total spend. It assumes customerId, orderDate, and totalAmount fields on each order.
Common Patterns with $group
Two-Stage Grouping
Sometimes, you need to perform grouping in multiple stages:
db.orders.aggregate([
  // First grouping: Get sales by product and category
  {
    $group: {
      _id: {
        product: "$product",
        category: "$category"
      },
      totalSales: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  },
  // Second grouping: Get total sales by category
  {
    $group: {
      _id: "$_id.category",
      totalCategorySales: { $sum: "$totalSales" },
      productCount: { $sum: 1 }
    }
  }
])
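The point of the second stage is that it counts groups from the first stage, not original orders, so productCount is the number of distinct products per category. A plain JavaScript sketch of both stages on the sample orders (an in-memory imitation, not the pipeline itself):

```javascript
const orders = [
  { product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { product: "Laptop", category: "Electronics", price: 1500, quantity: 1 }
];

// Stage 1: one group per (product, category) pair.
const perProduct = new Map();
for (const o of orders) {
  const key = JSON.stringify([o.product, o.category]);
  const g =
    perProduct.get(key) ||
    { _id: { product: o.product, category: o.category }, totalSales: 0 };
  g.totalSales += o.price * o.quantity;
  perProduct.set(key, g);
}

// Stage 2: regroup the stage-1 output by category.
const perCategory = new Map();
for (const g of perProduct.values()) {
  const c =
    perCategory.get(g._id.category) ||
    { _id: g._id.category, totalCategorySales: 0, productCount: 0 };
  c.totalCategorySales += g.totalSales;
  c.productCount += 1; // counts distinct products, not orders
  perCategory.set(g._id.category, c);
}

const result = [...perCategory.values()];
console.log(result);
// Electronics: totalCategorySales 4900, productCount 3 (two Laptop orders merge)
// Furniture:   totalCategorySales 1100, productCount 2
```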
Using $group with Other Stages
The $group stage is commonly used with other aggregation stages:
db.orders.aggregate([
  // Filter documents before grouping
  { $match: { price: { $gte: 500 } } },
  // Group by category
  {
    $group: {
      _id: "$category",
      averagePrice: { $avg: "$price" },
      count: { $sum: 1 }
    }
  },
  // Sort results
  { $sort: { averagePrice: -1 } },
  // Add a new field
  {
    $addFields: {
      categoryLabel: { $concat: ["Category: ", "$_id"] }
    }
  }
])
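Tracing the four stages on the sample orders shows how each one narrows or reshapes the data. The sketch below reproduces them with array methods in plain JavaScript; it is a walk-through of the logic under the sample data, not a replacement for the pipeline.

```javascript
const orders = [
  { category: "Electronics", price: 1200 },
  { category: "Electronics", price: 800 },
  { category: "Electronics", price: 200 },
  { category: "Furniture", price: 500 },
  { category: "Furniture", price: 150 },
  { category: "Electronics", price: 1500 }
];

// $match: keep orders with price >= 500 (four of the six remain).
const matched = orders.filter((o) => o.price >= 500);

// $group: track sum and count per category so the average can be derived.
const groups = new Map();
for (const { category, price } of matched) {
  const g = groups.get(category) || { _id: category, sum: 0, count: 0 };
  g.sum += price;
  g.count += 1;
  groups.set(category, g);
}

// $sort by averagePrice descending, then $addFields for the label.
const result = [...groups.values()]
  .map((g) => ({ _id: g._id, averagePrice: g.sum / g.count, count: g.count }))
  .sort((a, b) => b.averagePrice - a.averagePrice)
  .map((g) => ({ ...g, categoryLabel: "Category: " + g._id }));

console.log(result);
// Electronics: averagePrice 3500 / 3 (about 1166.67), count 3
// Furniture:   averagePrice 500, count 1
```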
Best Practices and Considerations
- Memory Limitations: The $group stage can use up to 100 MB of RAM. If a stage exceeds this limit, the aggregation fails unless it is allowed to write temporary files to disk (for example with the allowDiskUse: true option). Use $match before $group to reduce the amount of data being grouped.
- Performance: Grouping operations can be resource-intensive. Filter your data with $match before $group whenever possible to minimize the number of documents being processed.
- Group Key: The _id field is mandatory in a $group stage. If you want to group all documents together, use _id: null.
- Order: The $group stage does not guarantee the order of its output documents. If you need a particular order, add a $sort stage after $group.
- Indexes: The $group stage generally does not use indexes itself, but stages before it (like $match or $sort) can benefit from indexes when they appear at the start of the pipeline.
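Several of these points can be combined in one pipeline: filter first, group, sort explicitly afterwards, and allow spilling to disk for large inputs. The following is a sketch under the assumption that an orders collection like the sample one exists; the pipeline and options are built as plain objects so they can be inspected before being passed to aggregate in mongosh.

```javascript
// Pipeline that follows the practices above.
const pipeline = [
  // Filter early so $group sees fewer documents (and an index can help here).
  { $match: { category: "Electronics" } },
  // _id is required; here it is the product name.
  { $group: { _id: "$product", totalQuantity: { $sum: "$quantity" } } },
  // $group output has no guaranteed order, so sort explicitly.
  { $sort: { totalQuantity: -1 } }
];

// Lets memory-heavy stages write temporary files instead of failing.
const options = { allowDiskUse: true };

console.log(JSON.stringify(pipeline, null, 2));

// In mongosh (assumes an `orders` collection exists):
// db.orders.aggregate(pipeline, options)
```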
Summary
The $group stage is a powerful tool in MongoDB's aggregation framework that allows you to:
- Group documents by one or more fields
- Perform calculations on grouped data using accumulator operators
- Generate statistical summaries and reports
- Create arrays of values for each group
By mastering the $group stage, you'll be able to perform complex data aggregations and analyses on your MongoDB collections, similar to what you might do with SQL's GROUP BY clause but with more flexibility and power.
Exercises
- Given a collection of students with fields name, age, and grade, write an aggregation query to find the average grade for each age group.
- Using the orders collection from our examples, write an aggregation query to find the most expensive product in each category.
- Write an aggregation query to group orders by product and calculate the total revenue, average price, and total quantity sold for each product.
Additional Resources
Remember that the power of the $group stage comes from combining it with other aggregation stages in a pipeline to transform your data in meaningful ways.