Skip to main content

MongoDB Aggregation Accumulators

Introduction

In MongoDB's aggregation framework, accumulators are special operators that maintain their state as documents process through the pipeline stages. They are primarily used within the $group stage to perform calculations on grouped data, but some can also be used in other stages like $project and $addFields.

Accumulators work by collecting values from multiple documents and combining them to produce a single result. Think of them as functions that aggregate values across a group of documents, similar to how aggregate functions work in SQL (like SUM, AVG, COUNT, etc.).

In this tutorial, we'll explore the most commonly used MongoDB accumulators, understand how they work, and learn when and how to apply them in real-world scenarios.

Basic Accumulator Syntax

Accumulators are typically used inside the $group stage of an aggregation pipeline. The basic structure looks like this:

javascript
{
$group: {
_id: <grouping expression>,
<field1>: { <accumulator1>: <expression1> },
<field2>: { <accumulator2>: <expression2> },
...
}
}

Where:

  • _id defines how documents are grouped
  • Each additional field uses an accumulator to compute values across the grouped documents

Common Accumulators

Let's explore the most frequently used accumulators in MongoDB:

1. $sum

The $sum accumulator adds numeric values together for each group.

Basic Usage:

javascript
{ $sum: <expression> }

Example: Count total items by category

Let's say we have a collection of products:

javascript
db.products.insertMany([
{ name: "Laptop", category: "Electronics", price: 999, quantity: 5 },
{ name: "Smartphone", category: "Electronics", price: 699, quantity: 10 },
{ name: "Coffee Maker", category: "Appliances", price: 89, quantity: 8 },
{ name: "Toaster", category: "Appliances", price: 49, quantity: 12 },
{ name: "Headphones", category: "Electronics", price: 199, quantity: 15 }
]);

We can use $sum to calculate the total quantity of items in each category:

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
totalQuantity: { $sum: "$quantity" }
}
}
]);

Output:

javascript
[
{ _id: "Electronics", totalQuantity: 30 },
{ _id: "Appliances", totalQuantity: 20 }
]

You can also use $sum to count documents by passing the value 1:

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
count: { $sum: 1 }
}
}
]);

Output:

javascript
[
{ _id: "Electronics", count: 3 },
{ _id: "Appliances", count: 2 }
]

2. $avg

The $avg accumulator calculates the average of numeric values.

Basic Usage:

javascript
{ $avg: <expression> }

Example: Average product price by category

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
averagePrice: { $avg: "$price" }
}
}
]);

Output:

javascript
[
{ _id: "Electronics", averagePrice: 632.33 },
{ _id: "Appliances", averagePrice: 69 }
]

3. $min and $max

These accumulators find the minimum and maximum values in a group.

Basic Usage:

javascript
{ $min: <expression> }
{ $max: <expression> }

Example: Find min and max priced products by category

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
lowestPrice: { $min: "$price" },
highestPrice: { $max: "$price" }
}
}
]);

Output:

javascript
[
{ _id: "Electronics", lowestPrice: 199, highestPrice: 999 },
{ _id: "Appliances", lowestPrice: 49, highestPrice: 89 }
]

4. $first and $last

These accumulators return the first or last value in a group. The order depends on the documents' order in the pipeline.

Basic Usage:

javascript
{ $first: <expression> }
{ $last: <expression> }

Example: Get the first and last product in each category (by name)

javascript
db.products.aggregate([
{ $sort: { name: 1 } },
{
$group: {
_id: "$category",
firstProduct: { $first: "$name" },
lastProduct: { $last: "$name" }
}
}
]);

Output:

javascript
[
{ _id: "Electronics", firstProduct: "Headphones", lastProduct: "Smartphone" },
{ _id: "Appliances", firstProduct: "Coffee Maker", lastProduct: "Toaster" }
]

5. $push and $addToSet

These accumulators collect values from documents into arrays.

  • $push: Creates an array with all values, including duplicates.
  • $addToSet: Creates an array with unique values (no duplicates).

Basic Usage:

javascript
{ $push: <expression> }
{ $addToSet: <expression> }

Example: Collect all product names by category

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
productNames: { $push: "$name" }
}
}
]);

Output:

javascript
[
{
_id: "Electronics",
productNames: ["Laptop", "Smartphone", "Headphones"]
},
{
_id: "Appliances",
productNames: ["Coffee Maker", "Toaster"]
}
]

Example: Collect unique price points by category

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
uniquePrices: { $addToSet: "$price" }
}
}
]);

Output:

javascript
[
{ _id: "Electronics", uniquePrices: [999, 699, 199] },
{ _id: "Appliances", uniquePrices: [89, 49] }
]

6. $stdDevPop and $stdDevSamp

These accumulators calculate the population standard deviation and sample standard deviation respectively.

Basic Usage:

javascript
{ $stdDevPop: <expression> }
{ $stdDevSamp: <expression> }

Example: Calculate price standard deviation by category

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
priceStdDev: { $stdDevSamp: "$price" }
}
}
]);

Output (approximate values):

javascript
[
{ _id: "Electronics", priceStdDev: 404.96 },
{ _id: "Appliances", priceStdDev: 28.28 }
]

Advanced Accumulators

MongoDB also offers more specialized accumulators for complex calculations:

1. $mergeObjects

Combines multiple documents into a single document.

Example: Merging product details

javascript
db.products.aggregate([
{
$group: {
_id: "$category",
categoryInfo: {
$mergeObjects: {
count: { $sum: 1 },
totalValue: { $sum: { $multiply: ["$price", "$quantity"] } }
}
}
}
}
]);

2. $top and $bottom

These accumulators (available in MongoDB 5.2+) return the top or bottom N values from a group based on a specified sort criteria.

javascript
{
$top: {
output: <output>,
n: <n>,
sortBy: <sortBy>
}
}

Real-World Applications

Let's look at some real-world scenarios where accumulators are particularly useful:

Sales Analytics

Suppose we have a sales collection with order data:

javascript
db.sales.insertMany([
{ date: new Date("2023-01-15"), product: "Laptop", category: "Electronics", quantity: 2, price: 999 },
{ date: new Date("2023-01-15"), product: "Headphones", category: "Electronics", quantity: 5, price: 199 },
{ date: new Date("2023-01-16"), product: "Coffee Maker", category: "Appliances", quantity: 3, price: 89 },
{ date: new Date("2023-01-17"), product: "Smartphone", category: "Electronics", quantity: 1, price: 699 },
{ date: new Date("2023-01-17"), product: "Laptop", category: "Electronics", quantity: 1, price: 999 },
{ date: new Date("2023-01-18"), product: "Toaster", category: "Appliances", quantity: 4, price: 49 }
]);

Let's create a comprehensive sales analysis by category:

javascript
db.sales.aggregate([
// Add a calculated field for total sale amount
{
$addFields: {
saleAmount: { $multiply: ["$price", "$quantity"] }
}
},
// Group by category
{
$group: {
_id: "$category",
totalSales: { $sum: "$saleAmount" },
averageSale: { $avg: "$saleAmount" },
itemsSold: { $sum: "$quantity" },
numberOfTransactions: { $sum: 1 },
products: { $addToSet: "$product" },
highestSingleSale: { $max: "$saleAmount" },
lowestSingleSale: { $min: "$saleAmount" }
}
},
// Sort by total sales
{
$sort: { totalSales: -1 }
}
]);

Output:

javascript
[
{
_id: "Electronics",
totalSales: 5091,
averageSale: 1273.75,
itemsSold: 9,
numberOfTransactions: 4,
products: ["Laptop", "Headphones", "Smartphone"],
highestSingleSale: 1998,
lowestSingleSale: 699
},
{
_id: "Appliances",
totalSales: 463,
averageSale: 231.5,
itemsSold: 7,
numberOfTransactions: 2,
products: ["Coffee Maker", "Toaster"],
highestSingleSale: 267,
lowestSingleSale: 196
}
]

Customer Spending Analysis

Imagine we have a customer orders collection:

javascript
db.orders.insertMany([
{ customer: "John", date: new Date("2023-01-05"), amount: 120, items: 3, category: "Groceries" },
{ customer: "Emma", date: new Date("2023-01-07"), amount: 75, items: 2, category: "Clothing" },
{ customer: "John", date: new Date("2023-01-10"), amount: 200, items: 1, category: "Electronics" },
{ customer: "Sarah", date: new Date("2023-01-12"), amount: 50, items: 1, category: "Books" },
{ customer: "Emma", date: new Date("2023-01-15"), amount: 150, items: 2, category: "Groceries" },
{ customer: "John", date: new Date("2023-01-20"), amount: 90, items: 3, category: "Clothing" },
{ customer: "Sarah", date: new Date("2023-01-22"), amount: 300, items: 1, category: "Electronics" }
]);

Let's analyze spending patterns by customer:

javascript
db.orders.aggregate([
{
$group: {
_id: "$customer",
totalSpent: { $sum: "$amount" },
averageOrderValue: { $avg: "$amount" },
totalItemsPurchased: { $sum: "$items" },
orderCount: { $sum: 1 },
categories: { $addToSet: "$category" },
largestPurchase: { $max: "$amount" },
smallestPurchase: { $min: "$amount" }
}
},
{
$addFields: {
averageItemPrice: { $divide: ["$totalSpent", "$totalItemsPurchased"] }
}
},
{
$sort: { totalSpent: -1 }
}
]);

Output:

javascript
[
{
_id: "John",
totalSpent: 410,
averageOrderValue: 136.67,
totalItemsPurchased: 7,
orderCount: 3,
categories: ["Groceries", "Electronics", "Clothing"],
largestPurchase: 200,
smallestPurchase: 90,
averageItemPrice: 58.57
},
{
_id: "Sarah",
totalSpent: 350,
averageOrderValue: 175,
totalItemsPurchased: 2,
orderCount: 2,
categories: ["Books", "Electronics"],
largestPurchase: 300,
smallestPurchase: 50,
averageItemPrice: 175
},
{
_id: "Emma",
totalSpent: 225,
averageOrderValue: 112.5,
totalItemsPurchased: 4,
orderCount: 2,
categories: ["Clothing", "Groceries"],
largestPurchase: 150,
smallestPurchase: 75,
averageItemPrice: 56.25
}
]

Using Accumulators Outside of $group

While accumulators are most commonly used in the $group stage, several accumulators can also be used in stages like $project, $addFields, and $set. These are known as "accumulator expressions."

Example: Using $sum in $project

javascript
db.sales.aggregate([
{
$project: {
product: 1,
totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
}
}
]);

Best Practices

  1. Use the right accumulator for the job: Choose the most appropriate accumulator for your use case. Don't use $push when $addToSet would be more appropriate for collecting unique values.

  2. Be mindful of memory usage: Accumulators like $push and $addToSet can consume a lot of memory if your groups contain many documents or if the arrays grow very large.

  3. Consider the order of operations: When using $first and $last, remember that the order depends on the document order in the pipeline. Add a $sort stage before $group if a specific order is needed.

  4. Process data in smaller batches: If you're working with large datasets, consider breaking up your aggregation into smaller chunks or use the allowDiskUse option.

  5. Combine accumulators for complex analysis: Don't hesitate to use multiple accumulators in the same $group stage to create richer analytics.

Summary

MongoDB's accumulator operators are powerful tools for performing calculations across groups of documents in the aggregation pipeline. They allow you to:

  • Calculate sums, averages, minimums, and maximums
  • Collect values into arrays (with or without duplicates)
  • Find the first or last value in a group
  • Calculate standard deviations and other statistical measures
  • Combine multiple documents into a single document

Understanding how and when to use accumulators is essential for creating advanced analytics and reports in MongoDB. They form the foundation of data aggregation and enable you to transform raw data into meaningful insights.

Practice Exercises

  1. Basic Exercise: Using the products collection, find the total inventory value (price × quantity) for each category.

  2. Intermediate Exercise: Create an analysis of product pricing that shows for each category: the average price, the price range (max - min), and how many products fall below the average price.

  3. Advanced Exercise: Using the sales collection, create a daily sales report that shows for each date: total sales, number of transactions, unique products sold, and the most popular product (by quantity).

Additional Resources

Happy aggregating!



If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)