
MongoDB $group Stage

The $group stage is one of the most powerful and commonly used stages in MongoDB's aggregation framework. It groups documents by a specified field or expression and then performs operations on each group, such as counting documents, calculating averages, and finding minimum and maximum values.

What is the $group Stage?

The $group stage is similar to the GROUP BY statement in SQL. It divides the input documents into groups based on one or more specified expressions, called the group key. For each unique group key value, it outputs a document containing:

  1. The unique _id field with the group key value
  2. Optional fields computed by accumulator operators over the documents in that group

Basic Syntax

```javascript
{
  $group: {
    _id: <expression>, // Group key
    <field1>: { <accumulator1>: <expression1> },
    <field2>: { <accumulator2>: <expression2> },
    ...
  }
}
```

Where:

  • _id: Specifies the group key. If you set this to null, all documents will be grouped into a single group.
  • <field>: The name you want to give to the computed field in the output document.
  • <accumulator>: One of MongoDB's accumulator operators (like $sum, $avg, etc.).
  • <expression>: The field or expression to accumulate.
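To make these semantics concrete, here is a plain-JavaScript sketch of what a $group stage computes. It is an in-memory illustration, not how MongoDB implements the stage: `groupBy` is a hypothetical helper whose `keyFn` stands in for the `_id` expression and whose `accumulators` map stands in for the `<field>: { <accumulator>: <expression> }` entries.

```javascript
// Hypothetical helper: an in-memory model of what a $group stage computes.
// keyFn plays the role of the _id expression; each accumulator receives the
// documents of one group and returns the value of its output field.
function groupBy(docs, keyFn, accumulators) {
  const groups = new Map(); // serialized key -> { key, members }
  for (const doc of docs) {
    // MongoDB groups documents with a missing or null key under _id: null
    const key = keyFn(doc) ?? null;
    const mapKey = JSON.stringify(key); // lets object keys compare by value
    if (!groups.has(mapKey)) groups.set(mapKey, { key, members: [] });
    groups.get(mapKey).members.push(doc);
  }
  // One output document per distinct key, with the key stored in _id
  return [...groups.values()].map(({ key, members }) => {
    const out = { _id: key };
    for (const [field, accumulate] of Object.entries(accumulators)) {
      out[field] = accumulate(members);
    }
    return out;
  });
}

// Roughly { $group: { _id: "$category", count: { $sum: 1 } } }
const result = groupBy(
  [{ category: "A" }, { category: "B" }, { category: "A" }, { name: "no category" }],
  (doc) => doc.category,
  { count: (members) => members.length }
);
console.log(result);
// [{ _id: "A", count: 2 }, { _id: "B", count: 1 }, { _id: null, count: 1 }]
```

Because a `Map` preserves insertion order, the sketch emits groups in the order their keys first appear; a real $group stage makes no such promise.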

Common Accumulators

MongoDB provides many accumulator operators to use with the $group stage:

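| Accumulator | Description |
|---|---|
| $sum | Returns the sum of numeric values (non-numeric values are ignored) |
| $avg | Returns the average of numeric values (non-numeric values are ignored) |
| $min | Returns the minimum value |
| $max | Returns the maximum value |
| $push | Returns an array of the expression's value for every document in the group |
| $addToSet | Returns an array of the unique values (element order is not guaranteed) |
| $first | Returns the value from the first document in the group (meaningful only when the input is sorted) |
| $last | Returns the value from the last document in the group (meaningful only when the input is sorted) |
| $count | Returns the number of documents in the group, written `{ $count: {} }` (MongoDB 5.0 and later) |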
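To make each accumulator concrete, the plain-JavaScript snippet below computes what it would return for a single group: the Electronics prices from the sample orders collection below, in input order. It is a sketch of the semantics only; for example, MongoDB's `$sum` and `$avg` skip non-numeric values, which this version does not handle.

```javascript
// Prices of the Electronics documents from the sample orders collection,
// in input order (documents 1, 2, 3 and 6).
const prices = [1200, 800, 200, 1500];

const total = prices.reduce((a, b) => a + b, 0);
const accumulated = {
  sum: total,                      // $sum
  avg: total / prices.length,      // $avg
  min: Math.min(...prices),        // $min
  max: Math.max(...prices),        // $max
  push: [...prices],               // $push: every value, duplicates kept
  addToSet: [...new Set(prices)],  // $addToSet: each value at most once
  first: prices[0],                // $first: value from the first document
  last: prices[prices.length - 1], // $last: value from the last document
  count: prices.length,            // $count: {}
};
console.log(accumulated); // sum 3700, avg 925, min 200, max 1500, count 4
```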

Basic Examples

Let's work with a sample collection of orders:

```javascript
db.orders.insertMany([
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 }
])
```

Example 1: Group by Category and Count Documents

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      count: { $sum: 1 }
    }
  }
])
```

Output:

```javascript
[
  { "_id": "Electronics", "count": 4 },
  { "_id": "Furniture", "count": 2 }
]
```

In this example:

  • We group documents by the category field.
  • For each group, we count the number of documents using $sum: 1.
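To sanity-check the expected output, the same count can be reproduced in plain JavaScript over an in-memory copy of the six sample documents. This only illustrates the arithmetic and is not a substitute for running the pipeline; `$sum: 1` corresponds to adding 1 for each document in the group.

```javascript
// In-memory copy of the sample orders collection.
const orders = [
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 },
];

// Roughly { $group: { _id: "$category", count: { $sum: 1 } } }
const counts = {};
for (const order of orders) {
  counts[order.category] = (counts[order.category] ?? 0) + 1; // add 1 per document
}
console.log(counts); // { Electronics: 4, Furniture: 2 }
```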

Example 2: Group by Category and Calculate Total Revenue

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
])
```

Output:

```javascript
[
  { "_id": "Electronics", "totalRevenue": 4900 },
  { "_id": "Furniture", "totalRevenue": 1100 }
]
```

In this example:

  • We group documents by the category field.
  • We calculate the total revenue for each category by multiplying price by quantity and summing the results.
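The revenue arithmetic can be checked the same way. This plain-JavaScript sketch multiplies `price` by `quantity` for each sample document and adds the results per category, which is what `{ $sum: { $multiply: ["$price", "$quantity"] } }` does inside each group. It is an illustration, not MongoDB's implementation.

```javascript
// In-memory copy of the sample orders collection.
const orders = [
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 },
];

// Per category: sum of price * quantity
const revenue = {};
for (const { category, price, quantity } of orders) {
  revenue[category] = (revenue[category] ?? 0) + price * quantity;
}
console.log(revenue); // { Electronics: 4900, Furniture: 1100 }
```

For Electronics this is 1200 + 1600 + 600 + 1500 = 4900.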

Example 3: Group by Multiple Fields

You can group by multiple fields by using an object as the _id:

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: {
        category: "$category",
        product: "$product"
      },
      totalQuantity: { $sum: "$quantity" }
    }
  }
])
```

Output:

```javascript
[
  { "_id": { "category": "Electronics", "product": "Laptop" }, "totalQuantity": 2 },
  { "_id": { "category": "Electronics", "product": "Phone" }, "totalQuantity": 2 },
  { "_id": { "category": "Electronics", "product": "Headphones" }, "totalQuantity": 3 },
  { "_id": { "category": "Furniture", "product": "Table" }, "totalQuantity": 1 },
  { "_id": { "category": "Furniture", "product": "Chair" }, "totalQuantity": 4 }
]
```
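A compound `_id` behaves like a single key built from several fields. In the plain-JavaScript sketch below, the `{ category, product }` pair is serialized with `JSON.stringify` so it can serve as a `Map` key; that encoding is a device of the sketch, not how MongoDB compares group keys.

```javascript
// In-memory copy of the sample orders collection.
const orders = [
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 },
];

// Roughly _id: { category: "$category", product: "$product" }
// with totalQuantity: { $sum: "$quantity" }
const groups = new Map();
for (const { category, product, quantity } of orders) {
  const key = JSON.stringify({ category, product }); // compound key as a string
  const group = groups.get(key) ?? { _id: { category, product }, totalQuantity: 0 };
  group.totalQuantity += quantity;
  groups.set(key, group);
}
const result = [...groups.values()];
console.log(result.length); // 5 groups; the two Laptop orders share one
```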

Advanced Examples

Example 4: Group All Documents and Calculate Statistics

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: null,
      totalProducts: { $sum: 1 },
      avgPrice: { $avg: "$price" },
      minPrice: { $min: "$price" },
      maxPrice: { $max: "$price" },
      totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
])
```

Output:

```javascript
[
  {
    "_id": null,
    "totalProducts": 6,
    "avgPrice": 725,
    "minPrice": 150,
    "maxPrice": 1500,
    "totalRevenue": 6000
  }
]
```
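With `_id: null`, every input document falls into one group, so each accumulator runs over the whole collection. The plain-JavaScript sketch below recomputes the statistics from an in-memory copy of the sample documents, a quick way to double-check figures such as the average price and total revenue.

```javascript
// In-memory copy of the sample orders collection.
const orders = [
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 },
];

// One group containing every document
const prices = orders.map((o) => o.price);
const stats = {
  _id: null,
  totalProducts: orders.length,                                           // $sum: 1
  avgPrice: prices.reduce((a, b) => a + b, 0) / prices.length,            // $avg
  minPrice: Math.min(...prices),                                          // $min
  maxPrice: Math.max(...prices),                                          // $max
  totalRevenue: orders.reduce((sum, o) => sum + o.price * o.quantity, 0), // $sum of $multiply
};
console.log(stats); // avgPrice 725 (4350 / 6), totalRevenue 6000 (4900 + 1100)
```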

Example 5: Create Arrays of Values with $push

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      products: { $push: "$product" }
    }
  }
])
```

Output:

```javascript
[
  { "_id": "Electronics", "products": ["Laptop", "Phone", "Headphones", "Laptop"] },
  { "_id": "Furniture", "products": ["Table", "Chair"] }
]
```

Example 6: Create Arrays of Unique Values with $addToSet

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      uniqueProducts: { $addToSet: "$product" }
    }
  }
])
```

Output:

```javascript
[
  { "_id": "Electronics", "uniqueProducts": ["Laptop", "Phone", "Headphones"] },
  { "_id": "Furniture", "uniqueProducts": ["Table", "Chair"] }
]
```

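Notice that $addToSet includes "Laptop" only once, whereas $push included it twice. Also note that $addToSet does not guarantee the order of elements in the resulting array, so your output may list them differently.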
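The difference between the two accumulators maps neatly onto a JavaScript array versus a `Set`. In this illustrative sketch over the sample documents, the array keeps every value, including the duplicate "Laptop", while the `Set` keeps each value once. A JavaScript `Set` happens to preserve insertion order, unlike `$addToSet`, so do not read the sketch as a guarantee about element order.

```javascript
// The category and product fields of the sample orders, in input order.
const orders = [
  { product: "Laptop", category: "Electronics" },
  { product: "Phone", category: "Electronics" },
  { product: "Headphones", category: "Electronics" },
  { product: "Table", category: "Furniture" },
  { product: "Chair", category: "Furniture" },
  { product: "Laptop", category: "Electronics" },
];

const pushed = {}; // like $push
const unique = {}; // like $addToSet
for (const { category, product } of orders) {
  if (!pushed[category]) pushed[category] = [];
  pushed[category].push(product);  // every value, duplicates kept
  if (!unique[category]) unique[category] = new Set();
  unique[category].add(product);   // each value at most once
}
console.log(pushed.Electronics);      // 4 entries, "Laptop" twice
console.log([...unique.Electronics]); // 3 entries
```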

Real-World Applications

Sales Reporting

You can use $group to generate sales reports by different dimensions:

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: {
        year: { $year: "$orderDate" },
        month: { $month: "$orderDate" }
      },
      salesCount: { $sum: 1 },
      totalRevenue: { $sum: "$totalAmount" },
      avgOrderValue: { $avg: "$totalAmount" }
    }
  },
  { $sort: { "_id.year": 1, "_id.month": 1 } }
])
```

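This query assumes each order has an orderDate date field and a totalAmount field, which the sample collection above does not have. It returns one document per year and month with the number of orders, the total revenue, and the average order value, sorted chronologically.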
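The sales-report pipeline relies on `$year` and `$month` to derive the group key from a date. The plain-JavaScript sketch below mimics that on a few hypothetical orders; the `orderDate` and `totalAmount` fields and their values are assumptions for illustration. Like `$year` and `$month` without a `timezone` argument, it reads dates in UTC, and it reports months as 1 to 12, as `$month` does.

```javascript
// Hypothetical orders: field names and values are assumptions.
const orders = [
  { orderDate: new Date("2024-01-15T10:00:00Z"), totalAmount: 100 },
  { orderDate: new Date("2024-01-20T12:00:00Z"), totalAmount: 300 },
  { orderDate: new Date("2024-02-03T09:30:00Z"), totalAmount: 50 },
];

const report = new Map();
for (const { orderDate, totalAmount } of orders) {
  const year = orderDate.getUTCFullYear();
  const month = orderDate.getUTCMonth() + 1; // JS months are 0-based; $month is 1-12
  const key = `${year}-${month}`;
  const row = report.get(key) ?? { _id: { year, month }, salesCount: 0, totalRevenue: 0 };
  row.salesCount += 1;             // $sum: 1
  row.totalRevenue += totalAmount; // $sum: "$totalAmount"
  report.set(key, row);
}
// $avg: "$totalAmount" equals totalRevenue / salesCount within each group
const rows = [...report.values()].map((r) => ({ ...r, avgOrderValue: r.totalRevenue / r.salesCount }));
// $sort: { "_id.year": 1, "_id.month": 1 }
rows.sort((a, b) => a._id.year - b._id.year || a._id.month - b._id.month);
console.log(rows); // January: 2 orders, 400 total, 200 average; February: 1 order, 50
```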

Customer Analysis

You can use $group to analyze customer behavior:

```javascript
db.orders.aggregate([
  {
    $group: {
      _id: "$customerId",
      orderCount: { $sum: 1 },
      totalSpent: { $sum: "$totalAmount" },
      averageOrderValue: { $avg: "$totalAmount" },
      firstPurchase: { $min: "$orderDate" },
      lastPurchase: { $max: "$orderDate" },
      purchasedProducts: { $addToSet: "$product" }
    }
  },
  { $sort: { "totalSpent": -1 } }
])
```

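This query assumes customerId, totalAmount, orderDate, and product fields. It returns one profile per customer with the order count, total and average spend, first and last purchase dates, and the set of products bought, sorted by total spend in descending order.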
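The customer-profile pipeline combines several accumulators, including `$min` and `$max` on dates. This plain-JavaScript sketch applies the same logic to three hypothetical orders; the `customerId`, `product`, `totalAmount`, and `orderDate` fields and all values are assumptions made for illustration.

```javascript
// Hypothetical orders: field names and values are assumptions.
const orders = [
  { customerId: "c1", product: "Laptop", totalAmount: 1200, orderDate: new Date("2024-03-01T00:00:00Z") },
  { customerId: "c2", product: "Chair", totalAmount: 600, orderDate: new Date("2024-03-05T00:00:00Z") },
  { customerId: "c1", product: "Phone", totalAmount: 1600, orderDate: new Date("2024-02-10T00:00:00Z") },
];

const profiles = new Map();
for (const o of orders) {
  const p = profiles.get(o.customerId) ?? {
    _id: o.customerId, orderCount: 0, totalSpent: 0,
    firstPurchase: o.orderDate, lastPurchase: o.orderDate, purchasedProducts: new Set(),
  };
  p.orderCount += 1;                                                // $sum: 1
  p.totalSpent += o.totalAmount;                                    // $sum: "$totalAmount"
  if (o.orderDate < p.firstPurchase) p.firstPurchase = o.orderDate; // $min
  if (o.orderDate > p.lastPurchase) p.lastPurchase = o.orderDate;   // $max
  p.purchasedProducts.add(o.product);                               // $addToSet
  profiles.set(o.customerId, p);
}
const result = [...profiles.values()]
  .map((p) => ({ ...p, averageOrderValue: p.totalSpent / p.orderCount, purchasedProducts: [...p.purchasedProducts] }))
  .sort((a, b) => b.totalSpent - a.totalSpent); // $sort: { totalSpent: -1 }
console.log(result[0]); // c1: 2 orders, 2800 spent, first purchase 2024-02-10
```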

Common Patterns with $group

Two-Stage Grouping

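Sometimes you need to group the output of an earlier $group. In the example below, the first stage computes sales per product and the second rolls those totals up by category. Because each first-stage output document represents one product, productCount: { $sum: 1 } in the second stage counts distinct products per category: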

```javascript
db.orders.aggregate([
  // First grouping: get sales by product and category
  {
    $group: {
      _id: {
        product: "$product",
        category: "$category"
      },
      totalSales: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  },
  // Second grouping: get total sales by category
  {
    $group: {
      _id: "$_id.category",
      totalCategorySales: { $sum: "$totalSales" },
      productCount: { $sum: 1 }
    }
  }
])
```
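Two-stage grouping is easier to follow with concrete numbers. The plain-JavaScript sketch below repeats both stages on an in-memory copy of the sample documents: the first pass builds one row per (product, category) pair, and the second rolls those rows up by category. It illustrates the logic only and is not how MongoDB executes the pipeline.

```javascript
// In-memory copy of the sample orders collection.
const orders = [
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 },
];

// Stage 1: total sales per (product, category) pair
const stage1 = new Map();
for (const { product, category, price, quantity } of orders) {
  const key = JSON.stringify({ product, category });
  const row = stage1.get(key) ?? { _id: { product, category }, totalSales: 0 };
  row.totalSales += price * quantity;
  stage1.set(key, row);
}

// Stage 2: roll stage-1 rows up by _id.category
const stage2 = new Map();
for (const row of stage1.values()) {
  const category = row._id.category;
  const out = stage2.get(category) ?? { _id: category, totalCategorySales: 0, productCount: 0 };
  out.totalCategorySales += row.totalSales; // $sum: "$totalSales"
  out.productCount += 1;                    // $sum: 1 counts stage-1 rows (distinct products)
  stage2.set(category, out);
}
const result = [...stage2.values()];
console.log(result); // Electronics: 4900 across 3 products; Furniture: 1100 across 2
```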

Using $group with Other Stages

The $group stage is commonly used with other aggregation stages:

```javascript
db.orders.aggregate([
  // Filter documents before grouping
  { $match: { price: { $gte: 500 } } },

  // Group by category
  {
    $group: {
      _id: "$category",
      averagePrice: { $avg: "$price" },
      count: { $sum: 1 }
    }
  },

  // Sort results
  { $sort: { averagePrice: -1 } },

  // Add a new field
  {
    $addFields: {
      categoryLabel: { $concat: ["Category: ", "$_id"] }
    }
  }
])
```
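The whole pipeline can be traced step by step in plain JavaScript over the sample documents. This sketch only mirrors the logic of each stage (filter, group, sort, add a field) to show what each one contributes; it is not how MongoDB executes a pipeline.

```javascript
// In-memory copy of the sample orders collection.
const orders = [
  { _id: 1, product: "Laptop", category: "Electronics", price: 1200, quantity: 1 },
  { _id: 2, product: "Phone", category: "Electronics", price: 800, quantity: 2 },
  { _id: 3, product: "Headphones", category: "Electronics", price: 200, quantity: 3 },
  { _id: 4, product: "Table", category: "Furniture", price: 500, quantity: 1 },
  { _id: 5, product: "Chair", category: "Furniture", price: 150, quantity: 4 },
  { _id: 6, product: "Laptop", category: "Electronics", price: 1500, quantity: 1 },
];

// $match: { price: { $gte: 500 } }
const matched = orders.filter((o) => o.price >= 500);

// $group: _id "$category", averagePrice { $avg: "$price" }, count { $sum: 1 }
const groups = new Map();
for (const { category, price } of matched) {
  const g = groups.get(category) ?? { _id: category, total: 0, count: 0 };
  g.total += price;
  g.count += 1;
  groups.set(category, g);
}
const grouped = [...groups.values()].map(({ _id, total, count }) => ({ _id, averagePrice: total / count, count }));

// $sort: { averagePrice: -1 }
grouped.sort((a, b) => b.averagePrice - a.averagePrice);

// $addFields: { categoryLabel: { $concat: ["Category: ", "$_id"] } }
const result = grouped.map((doc) => ({ ...doc, categoryLabel: "Category: " + doc._id }));
console.log(result); // Electronics (3 orders) first, then Furniture (1 order)
```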

Best Practices and Considerations

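  1. Memory Limitations: Each $group stage can use up to 100 MB of RAM. If it needs more, MongoDB returns an error unless the stage is allowed to write temporary files to disk through the allowDiskUse option (on by default starting with MongoDB 6.0). Using $match before $group to reduce the data size also helps.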

  2. Performance: Grouping operations can be resource-intensive. Always try to filter your data using $match before using $group to minimize the amount of data being processed.

  3. Group Key: The _id field is mandatory in a $group stage. If you want to group all documents together, use _id: null.

  4. Order: $group does not guarantee the order of its output documents. If you need a particular order, add a $sort stage after $group.

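  5. Indexes: $group itself generally does not use indexes, but stages before it, such as $match and $sort, can. In some cases, a pipeline that sorts on the group key and uses only $first in its $group can use a matching index to find the first document of each group quickly.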

Summary

The $group stage is a powerful tool in MongoDB's aggregation framework that allows you to:

  • Group documents by one or more fields
  • Perform calculations on grouped data using accumulator operators
  • Generate statistical summaries and reports
  • Create arrays of values for each group

By mastering the $group stage, you'll be able to perform complex data aggregations and analyses on your MongoDB collections, similar to what you might do with SQL's GROUP BY clause but with more flexibility and power.

Exercises

  1. Given a collection of students with fields name, age, and grade, write an aggregation query to find the average grade for each age group.

  2. Using the orders collection from our examples, write an aggregation query to find the most expensive product in each category.

  3. Write an aggregation query to group orders by product and calculate the total revenue, average price, and total quantity sold for each product.

Additional Resources

Remember that the power of the $group stage comes from combining it with other aggregation stages in a pipeline to transform your data in meaningful ways.


