MongoDB Aggregation Accumulators
Introduction
In MongoDB's aggregation framework, accumulators are special operators that maintain their state as documents process through the pipeline stages. They are primarily used within the $group
stage to perform calculations on grouped data, but some can also be used in other stages like $project
and $addFields
.
Accumulators work by collecting values from multiple documents and combining them to produce a single result. Think of them as functions that aggregate values across a group of documents, similar to how aggregate functions work in SQL (like SUM, AVG, COUNT, etc.).
In this tutorial, we'll explore the most commonly used MongoDB accumulators, understand how they work, and learn when and how to apply them in real-world scenarios.
Basic Accumulator Syntax
Accumulators are typically used inside the $group
stage of an aggregation pipeline. The basic structure looks like this:
{
$group: {
_id: <grouping expression>,
<field1>: { <accumulator1>: <expression1> },
<field2>: { <accumulator2>: <expression2> },
...
}
}
Where:
_id
defines how documents are grouped- Each additional field uses an accumulator to compute values across the grouped documents
Common Accumulators
Let's explore the most frequently used accumulators in MongoDB:
1. $sum
The $sum
accumulator adds numeric values together for each group.
Basic Usage:
{ $sum: <expression> }
Example: Count total items by category
Let's say we have a collection of products:
db.products.insertMany([
{ name: "Laptop", category: "Electronics", price: 999, quantity: 5 },
{ name: "Smartphone", category: "Electronics", price: 699, quantity: 10 },
{ name: "Coffee Maker", category: "Appliances", price: 89, quantity: 8 },
{ name: "Toaster", category: "Appliances", price: 49, quantity: 12 },
{ name: "Headphones", category: "Electronics", price: 199, quantity: 15 }
]);
We can use $sum
to calculate the total quantity of items in each category:
db.products.aggregate([
{
$group: {
_id: "$category",
totalQuantity: { $sum: "$quantity" }
}
}
]);
Output:
[
{ _id: "Electronics", totalQuantity: 30 },
{ _id: "Appliances", totalQuantity: 20 }
]
You can also use $sum
to count documents by passing the value 1:
db.products.aggregate([
{
$group: {
_id: "$category",
count: { $sum: 1 }
}
}
]);
Output:
[
{ _id: "Electronics", count: 3 },
{ _id: "Appliances", count: 2 }
]
2. $avg
The $avg
accumulator calculates the average of numeric values.
Basic Usage:
{ $avg: <expression> }
Example: Average product price by category
db.products.aggregate([
{
$group: {
_id: "$category",
averagePrice: { $avg: "$price" }
}
}
]);
Output:
[
{ _id: "Electronics", averagePrice: 632.33 },
{ _id: "Appliances", averagePrice: 69 }
]
3. $min
and $max
These accumulators find the minimum and maximum values in a group.
Basic Usage:
{ $min: <expression> }
{ $max: <expression> }
Example: Find min and max priced products by category
db.products.aggregate([
{
$group: {
_id: "$category",
lowestPrice: { $min: "$price" },
highestPrice: { $max: "$price" }
}
}
]);
Output:
[
{ _id: "Electronics", lowestPrice: 199, highestPrice: 999 },
{ _id: "Appliances", lowestPrice: 49, highestPrice: 89 }
]
4. $first
and $last
These accumulators return the first or last value in a group. The order depends on the documents' order in the pipeline.
Basic Usage:
{ $first: <expression> }
{ $last: <expression> }
Example: Get the first and last product in each category (by name)
db.products.aggregate([
{ $sort: { name: 1 } },
{
$group: {
_id: "$category",
firstProduct: { $first: "$name" },
lastProduct: { $last: "$name" }
}
}
]);
Output:
[
{ _id: "Electronics", firstProduct: "Headphones", lastProduct: "Smartphone" },
{ _id: "Appliances", firstProduct: "Coffee Maker", lastProduct: "Toaster" }
]
5. $push
and $addToSet
These accumulators collect values from documents into arrays.
$push
: Creates an array with all values, including duplicates.$addToSet
: Creates an array with unique values (no duplicates).
Basic Usage:
{ $push: <expression> }
{ $addToSet: <expression> }
Example: Collect all product names by category
db.products.aggregate([
{
$group: {
_id: "$category",
productNames: { $push: "$name" }
}
}
]);
Output:
[
{
_id: "Electronics",
productNames: ["Laptop", "Smartphone", "Headphones"]
},
{
_id: "Appliances",
productNames: ["Coffee Maker", "Toaster"]
}
]
Example: Collect unique price points by category
db.products.aggregate([
{
$group: {
_id: "$category",
uniquePrices: { $addToSet: "$price" }
}
}
]);
Output:
[
{ _id: "Electronics", uniquePrices: [999, 699, 199] },
{ _id: "Appliances", uniquePrices: [89, 49] }
]
6. $stdDevPop
and $stdDevSamp
These accumulators calculate the population standard deviation and sample standard deviation respectively.
Basic Usage:
{ $stdDevPop: <expression> }
{ $stdDevSamp: <expression> }
Example: Calculate price standard deviation by category
db.products.aggregate([
{
$group: {
_id: "$category",
priceStdDev: { $stdDevSamp: "$price" }
}
}
]);
Output (approximate values):
[
{ _id: "Electronics", priceStdDev: 404.96 },
{ _id: "Appliances", priceStdDev: 28.28 }
]
Advanced Accumulators
MongoDB also offers more specialized accumulators for complex calculations:
1. $mergeObjects
Combines multiple documents into a single document.
Example: Merging product details
db.products.aggregate([
{
$group: {
_id: "$category",
categoryInfo: {
$mergeObjects: {
count: { $sum: 1 },
totalValue: { $sum: { $multiply: ["$price", "$quantity"] } }
}
}
}
}
]);
2. $top
and $bottom
These accumulators (available in MongoDB 5.2+) return the top or bottom N values from a group based on a specified sort criteria.
{
$top: {
output: <output>,
n: <n>,
sortBy: <sortBy>
}
}
Real-World Applications
Let's look at some real-world scenarios where accumulators are particularly useful:
Sales Analytics
Suppose we have a sales collection with order data:
db.sales.insertMany([
{ date: new Date("2023-01-15"), product: "Laptop", category: "Electronics", quantity: 2, price: 999 },
{ date: new Date("2023-01-15"), product: "Headphones", category: "Electronics", quantity: 5, price: 199 },
{ date: new Date("2023-01-16"), product: "Coffee Maker", category: "Appliances", quantity: 3, price: 89 },
{ date: new Date("2023-01-17"), product: "Smartphone", category: "Electronics", quantity: 1, price: 699 },
{ date: new Date("2023-01-17"), product: "Laptop", category: "Electronics", quantity: 1, price: 999 },
{ date: new Date("2023-01-18"), product: "Toaster", category: "Appliances", quantity: 4, price: 49 }
]);
Let's create a comprehensive sales analysis by category:
db.sales.aggregate([
// Add a calculated field for total sale amount
{
$addFields: {
saleAmount: { $multiply: ["$price", "$quantity"] }
}
},
// Group by category
{
$group: {
_id: "$category",
totalSales: { $sum: "$saleAmount" },
averageSale: { $avg: "$saleAmount" },
itemsSold: { $sum: "$quantity" },
numberOfTransactions: { $sum: 1 },
products: { $addToSet: "$product" },
highestSingleSale: { $max: "$saleAmount" },
lowestSingleSale: { $min: "$saleAmount" }
}
},
// Sort by total sales
{
$sort: { totalSales: -1 }
}
]);
Output:
[
{
_id: "Electronics",
totalSales: 5091,
averageSale: 1273.75,
itemsSold: 9,
numberOfTransactions: 4,
products: ["Laptop", "Headphones", "Smartphone"],
highestSingleSale: 1998,
lowestSingleSale: 699
},
{
_id: "Appliances",
totalSales: 463,
averageSale: 231.5,
itemsSold: 7,
numberOfTransactions: 2,
products: ["Coffee Maker", "Toaster"],
highestSingleSale: 267,
lowestSingleSale: 196
}
]
Customer Spending Analysis
Imagine we have a customer orders collection:
db.orders.insertMany([
{ customer: "John", date: new Date("2023-01-05"), amount: 120, items: 3, category: "Groceries" },
{ customer: "Emma", date: new Date("2023-01-07"), amount: 75, items: 2, category: "Clothing" },
{ customer: "John", date: new Date("2023-01-10"), amount: 200, items: 1, category: "Electronics" },
{ customer: "Sarah", date: new Date("2023-01-12"), amount: 50, items: 1, category: "Books" },
{ customer: "Emma", date: new Date("2023-01-15"), amount: 150, items: 2, category: "Groceries" },
{ customer: "John", date: new Date("2023-01-20"), amount: 90, items: 3, category: "Clothing" },
{ customer: "Sarah", date: new Date("2023-01-22"), amount: 300, items: 1, category: "Electronics" }
]);
Let's analyze spending patterns by customer:
db.orders.aggregate([
{
$group: {
_id: "$customer",
totalSpent: { $sum: "$amount" },
averageOrderValue: { $avg: "$amount" },
totalItemsPurchased: { $sum: "$items" },
orderCount: { $sum: 1 },
categories: { $addToSet: "$category" },
largestPurchase: { $max: "$amount" },
smallestPurchase: { $min: "$amount" }
}
},
{
$addFields: {
averageItemPrice: { $divide: ["$totalSpent", "$totalItemsPurchased"] }
}
},
{
$sort: { totalSpent: -1 }
}
]);
Output:
[
{
_id: "John",
totalSpent: 410,
averageOrderValue: 136.67,
totalItemsPurchased: 7,
orderCount: 3,
categories: ["Groceries", "Electronics", "Clothing"],
largestPurchase: 200,
smallestPurchase: 90,
averageItemPrice: 58.57
},
{
_id: "Sarah",
totalSpent: 350,
averageOrderValue: 175,
totalItemsPurchased: 2,
orderCount: 2,
categories: ["Books", "Electronics"],
largestPurchase: 300,
smallestPurchase: 50,
averageItemPrice: 175
},
{
_id: "Emma",
totalSpent: 225,
averageOrderValue: 112.5,
totalItemsPurchased: 4,
orderCount: 2,
categories: ["Clothing", "Groceries"],
largestPurchase: 150,
smallestPurchase: 75,
averageItemPrice: 56.25
}
]
Using Accumulators Outside of $group
While accumulators are most commonly used in the $group
stage, several accumulators can also be used in stages like $project
, $addFields
, and $set
. These are known as "accumulator expressions."
Example: Using $sum
in $project
db.sales.aggregate([
{
$project: {
product: 1,
totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
}
}
]);
Best Practices
-
Use the right accumulator for the job: Choose the most appropriate accumulator for your use case. Don't use
$push
when$addToSet
would be more appropriate for collecting unique values. -
Be mindful of memory usage: Accumulators like
$push
and$addToSet
can consume a lot of memory if your groups contain many documents or if the arrays grow very large. -
Consider the order of operations: When using
$first
and$last
, remember that the order depends on the document order in the pipeline. Add a$sort
stage before$group
if a specific order is needed. -
Process data in smaller batches: If you're working with large datasets, consider breaking up your aggregation into smaller chunks or use the
allowDiskUse
option. -
Combine accumulators for complex analysis: Don't hesitate to use multiple accumulators in the same
$group
stage to create richer analytics.
Summary
MongoDB's accumulator operators are powerful tools for performing calculations across groups of documents in the aggregation pipeline. They allow you to:
- Calculate sums, averages, minimums, and maximums
- Collect values into arrays (with or without duplicates)
- Find the first or last value in a group
- Calculate standard deviations and other statistical measures
- Combine multiple documents into a single document
Understanding how and when to use accumulators is essential for creating advanced analytics and reports in MongoDB. They form the foundation of data aggregation and enable you to transform raw data into meaningful insights.
Practice Exercises
-
Basic Exercise: Using the products collection, find the total inventory value (price × quantity) for each category.
-
Intermediate Exercise: Create an analysis of product pricing that shows for each category: the average price, the price range (max - min), and how many products fall below the average price.
-
Advanced Exercise: Using the sales collection, create a daily sales report that shows for each date: total sales, number of transactions, unique products sold, and the most popular product (by quantity).
Additional Resources
Happy aggregating!
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)