MongoDB $skip Stage

In MongoDB's aggregation framework, the $skip stage discards a specified number of documents and passes the rest on to the next stage of the pipeline. This is particularly useful for pagination and when you need to process documents in batches.

Introduction

The $skip stage is one of the fundamental stages in MongoDB's aggregation pipeline. Similar to the SQL OFFSET clause, it allows you to exclude the first n documents from the results returned by the aggregation pipeline.

The $skip stage takes a single parameter that specifies how many documents to skip:

javascript
{ $skip: <positive integer> }

Basic Usage

The basic syntax of the $skip stage is straightforward:

javascript
db.collection.aggregate([
{ $skip: 10 }
])

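This aggregation operation skips the first 10 documents it receives and returns everything from the 11th document onward. Without a preceding $sort stage, the order in which documents arrive is not guaranteed, so which 10 documents are skipped may vary between runs.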

How $skip Works

When MongoDB processes a $skip stage:

  1. It counts documents as they flow through the pipeline
  2. It discards the specified number of documents
  3. It passes the remaining documents to the next stage in the pipeline

Consider this simple example with a collection of numbers:

javascript
// Sample data
db.numbers.insertMany([
{ value: 1 }, { value: 2 }, { value: 3 },
{ value: 4 }, { value: 5 }, { value: 6 },
{ value: 7 }, { value: 8 }, { value: 9 },
{ value: 10 }
])

// Skip the first 5 documents
db.numbers.aggregate([
{ $skip: 5 }
])

Output:

javascript
{ "_id": ObjectId("..."), "value": 6 }
{ "_id": ObjectId("..."), "value": 7 }
{ "_id": ObjectId("..."), "value": 8 }
{ "_id": ObjectId("..."), "value": 9 }
{ "_id": ObjectId("..."), "value": 10 }
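To make the counting behavior concrete, here is a minimal plain-JavaScript model of the three steps listed earlier. It is only an illustration that runs without a database: `skipStage` is a hypothetical helper, not part of any MongoDB driver, and it does not describe how the server implements the stage internally.

```javascript
// Hypothetical stand-in for a $skip stage operating on any iterable of documents.
function* skipStage(docs, n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("skip count must be a non-negative integer");
  }
  let seen = 0;
  for (const doc of docs) {
    if (seen < n) {
      seen += 1; // count and discard the first n documents
      continue;
    }
    yield doc; // pass every later document on to the next stage
  }
}

// Same sample data as in the example, without the _id field
const numbers = Array.from({ length: 10 }, (_, i) => ({ value: i + 1 }));
const remaining = [...skipStage(numbers, 5)];
console.log(remaining.map((doc) => doc.value)); // values 6 through 10
```

Skipping more documents than exist is not an error in this model; it simply yields nothing.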

Common Use Case: Pagination

One of the most common applications of the $skip stage is implementing pagination in web applications. By combining $skip with $limit, you can build a simple pagination system:

javascript
const pageSize = 10;  // Number of documents per page
const pageNumber = 3; // Page number (1-based)
const skip = pageSize * (pageNumber - 1);

db.products.aggregate([
  // Any filtering or matching can go here
  { $sort: { _id: 1 } }, // Sort first so page boundaries are stable between queries
  { $skip: skip },
  { $limit: pageSize }
])

This will retrieve the third page of products, skipping the first 20 documents (10 per page × (3-1) pages) and returning the next 10 documents.
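The offset arithmetic can be wrapped in a small helper that also guards against invalid input. The sketch below is one possible shape, not a prescribed API: `buildPagePipeline` is a hypothetical name, and the default sort on `_id` is an assumption chosen only because every document has that field.

```javascript
// Hypothetical helper: returns the stages for one page of results.
function buildPagePipeline(pageNumber, pageSize, sort = { _id: 1 }) {
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new RangeError("pageNumber must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return [
    { $sort: sort },                        // stable order before paging
    { $skip: pageSize * (pageNumber - 1) }, // documents on earlier pages
    { $limit: pageSize }                    // documents on this page
  ];
}

// Page 3 with 10 documents per page skips 20 documents
const pagePipeline = buildPagePipeline(3, 10);
console.log(JSON.stringify(pagePipeline));
```

In mongosh, the result could be passed directly to `db.products.aggregate(pagePipeline)`.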

Best Practices and Performance Considerations

While $skip is useful, there are some important considerations to keep in mind:

  1. Performance Impact: Using large values for $skip can be inefficient as MongoDB must still process all the skipped documents. As $skip values increase, performance may degrade.

  2. Position with $sort: When using $skip and $limit with $sort, always place the $sort stage before the $skip and $limit stages to ensure consistent results.

  3. Alternative for Large Datasets: For large datasets, consider cursor-based pagination instead of using $skip, which involves using field values from the last document of the current page to query the next page.

javascript
// Instead of skip-based pagination for large collections:
db.products.aggregate([
{ $match: { price: { $gt: lastSeenPrice } } },
{ $sort: { price: 1 } },
{ $limit: pageSize }
])
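One caveat with the query above is that prices are rarely unique, so a plain `$gt` comparison can skip documents that share the last seen price. A common remedy is to add a unique tie-breaker to both the filter and the sort. The sketch below assumes `_id` serves as that tie-breaker; `buildKeysetPipeline` and the shape of `lastSeen` are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: builds a cursor-based page query on price,
// using _id as a tie-breaker so documents with equal prices are not lost.
function buildKeysetPipeline(lastSeen, pageSize) {
  const pipeline = [];
  if (lastSeen) {
    // Only documents strictly after the last one of the previous page
    pipeline.push({
      $match: {
        $or: [
          { price: { $gt: lastSeen.price } },
          { price: lastSeen.price, _id: { $gt: lastSeen._id } }
        ]
      }
    });
  }
  pipeline.push({ $sort: { price: 1, _id: 1 } }, { $limit: pageSize });
  return pipeline;
}

// First page has no filter; later pages pass the last document they saw
const firstPage = buildKeysetPipeline(null, 10);
const nextPage = buildKeysetPipeline({ price: 19.99, _id: 42 }, 10);
console.log(JSON.stringify(nextPage));
```

An index covering `price` and `_id` would be the natural companion to this query, though whether one exists depends on the collection.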

Advanced Example: Report Generation with Skip

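Imagine you're generating reports for an e-commerce platform and want to exclude the first week of data from a month. Because $skip counts documents rather than days, this example assumes the sales collection holds exactly one summary document per day:

javascript
db.sales.aggregate([
  // Half-open range, so all of July 31 is included regardless of time of day
  { $match: {
      date: {
        $gte: ISODate("2023-07-01"),
        $lt: ISODate("2023-08-01")
      }
    }
  },
  { $sort: { date: 1 } },
  // Skip the first 7 documents (the first 7 days, given one document per day)
  { $skip: 7 },
  { $group: {
      _id: null,
      totalSales: { $sum: "$amount" },
      averageSale: { $avg: "$amount" },
      count: { $sum: 1 }
    }
  }
])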

This aggregation pipeline:

  1. Matches sales from July 2023
  2. Sorts them by date
  3. Skips the first 7 documents, which corresponds to the first week only when there is one document per day
  4. Groups the remaining data to calculate total sales, average sale, and count
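Since $skip counts documents and not days, a collection that stores individual sales (several per day) would need a different approach. One alternative, sketched here as an assumption rather than as part of the original pipeline, is to move the cut-off into the $match stage as a date boundary. `buildReportPipeline` is a hypothetical helper; the `date` and `amount` field names follow the example.

```javascript
// Hypothetical helper: monthly report that excludes the first `skipDays` days
// by date rather than by document count. `month` is 1-based (7 = July).
function buildReportPipeline(year, month, skipDays) {
  const from = new Date(Date.UTC(year, month - 1, 1 + skipDays)); // first day kept
  const to = new Date(Date.UTC(year, month, 1));                  // start of next month
  return [
    { $match: { date: { $gte: from, $lt: to } } },
    {
      $group: {
        _id: null,
        totalSales: { $sum: "$amount" },
        averageSale: { $avg: "$amount" },
        count: { $sum: 1 }
      }
    }
  ];
}

// July 2023 without its first 7 days: from July 8 up to, but not including, August 1
const julyReport = buildReportPipeline(2023, 7, 7);
console.log(julyReport[0].$match.date.$gte.toISOString());
```

No $sort stage is needed in this variant, because the $group result does not depend on document order.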

Combining with Other Stages

$skip becomes even more powerful when combined with other aggregation stages. Here's how it fits into a more complex pipeline:

javascript
db.orders.aggregate([
  // Stage 1: Filter orders by status
  { $match: { status: "completed" } },

  // Stage 2: Sort by order date
  { $sort: { orderDate: -1 } },

  // Stage 3: Skip the first 50 orders
  { $skip: 50 },

  // Stage 4: Limit to 10 orders
  { $limit: 10 },

  // Stage 5: Project only necessary fields
  { $project: {
      orderId: 1,
      customerName: 1,
      totalAmount: 1,
      shippingAddress: 1,
      orderDate: 1,
      _id: 0
    }
  }
])

This pipeline retrieves the 51st to 60th most recent completed orders with only the specified fields.

Common Mistakes to Avoid

  1. Skipping a negative number of documents: The $skip value cannot be negative.
javascript
// This will throw an error
db.collection.aggregate([
{ $skip: -5 }
])

  2. Using $skip without $sort for pagination: Without sorting, the documents skipped may not be consistent between queries.

  3. Placing $skip after $limit: This sequence won't work as expected since $limit would restrict documents before $skip can process them.

javascript
// Incorrect order
db.collection.aggregate([
{ $limit: 10 },
{ $skip: 5 } // This will only get 5 documents (10-5)
])

// Correct order
db.collection.aggregate([
{ $skip: 5 },
{ $limit: 10 } // This will skip 5 and get the next 10
])
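The difference between the two orderings is easy to check with plain arrays. The snippet below is a simplified model that runs without MongoDB: `skip` and `limit` are hypothetical stand-ins built on `Array.prototype.slice`, and the numbers 1 to 20 stand for documents in a fixed order.

```javascript
// Array-based stand-ins for the two stages (illustration only)
const skip = (docs, n) => docs.slice(n);
const limit = (docs, n) => docs.slice(0, n);

// Twenty "documents", numbered by position
const docs = Array.from({ length: 20 }, (_, i) => i + 1);

// $limit first: only 10 documents survive, then 5 of those are dropped
const limitThenSkip = skip(limit(docs, 10), 5);

// $skip first: 5 documents are dropped, then the next 10 are kept
const skipThenLimit = limit(skip(docs, 5), 10);

console.log(limitThenSkip.length, skipThenLimit.length); // 5 and 10
```

Both orderings start at the 6th document; the incorrect one simply ends at the 10th instead of the 15th.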

Real-World Application: Blog Post Pagination

Let's implement a complete blog post pagination system:

javascript
function getBlogPosts(page, postsPerPage, category = null) {
  const pipeline = [];

  // Apply category filter if provided
  if (category) {
    pipeline.push({ $match: { categories: category } });
  }

  // Sort by publication date, newest first
  pipeline.push({ $sort: { publishedDate: -1 } });

  // Apply pagination
  pipeline.push(
    { $skip: (page - 1) * postsPerPage },
    { $limit: postsPerPage }
  );

  // Project only necessary fields
  pipeline.push({
    $project: {
      title: 1,
      slug: 1,
      excerpt: 1,
      author: 1,
      publishedDate: 1,
      readTimeMinutes: 1,
      tags: 1,
      _id: 0
    }
  });

  return db.blogPosts.aggregate(pipeline);
}

// Usage:
// getBlogPosts(2, 10, "technology")

This function will:

  1. Optionally filter posts by category
  2. Sort by publication date (newest first)
  3. Implement pagination using $skip and $limit
  4. Return only the fields needed for the blog listing
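A listing page usually also needs to know how many pages exist. One way to get that in the same query is to wrap the paging stages in a $facet stage next to a $count stage. The following is a sketch under assumptions rather than part of the function above: `buildPagedWithTotal` and `totalPages` are hypothetical helpers, the `_id` tie-breaker in the sort is an addition, and the field names come from the example.

```javascript
// Hypothetical helper: one pipeline that yields the requested page of posts
// and the total number of matching posts.
function buildPagedWithTotal(page, postsPerPage, category = null) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(postsPerPage) || postsPerPage < 1) {
    throw new RangeError("postsPerPage must be an integer >= 1");
  }
  const pipeline = [];
  if (category) {
    pipeline.push({ $match: { categories: category } });
  }
  // _id breaks ties between posts published at the same moment
  pipeline.push({ $sort: { publishedDate: -1, _id: 1 } });
  pipeline.push({
    $facet: {
      posts: [{ $skip: (page - 1) * postsPerPage }, { $limit: postsPerPage }],
      total: [{ $count: "count" }]
    }
  });
  return pipeline;
}

// Number of pages needed to show `totalCount` posts
function totalPages(totalCount, postsPerPage) {
  return Math.ceil(totalCount / postsPerPage);
}

const facetPipeline = buildPagedWithTotal(2, 10, "technology");
console.log(JSON.stringify(facetPipeline));
```

Run through `db.blogPosts.aggregate(facetPipeline)`, this returns a single document with a `posts` array and a `total` array. Because $facet packs its output into one document, the page is subject to the 16 MB document size limit, which is rarely a concern for small page sizes.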

Summary

The $skip stage in MongoDB's aggregation pipeline is an essential tool for:

  • Implementing pagination in applications
  • Excluding a specific number of documents from processing
  • Creating batch processing jobs
  • Working with data subsets

While $skip is powerful and easy to use, remember that skipping large numbers of documents can impact performance. For optimal results, especially with large datasets, consider using cursor-based pagination or other techniques that utilize indexes more effectively.

Further Learning

To practice your understanding of the $skip stage, try these exercises:

  1. Create a collection with 100 documents and practice retrieving different pages using $skip and $limit
  2. Implement a cursor-based pagination system and compare its performance with skip-based pagination
  3. Use $skip in a complex aggregation pipeline that includes $match, $sort, $group, and $project stages

With these tools and techniques, you can efficiently control which documents flow through your aggregation pipelines and build powerful, performant MongoDB applications.