Skip to content

Basic Aggregation

$match

{ $match: { <query> } }
  • $match uses the MongoDB query language query operators to express queries

  • A $match stage may contain a $textquery operator, but it must be the first stage in a pipeline

  • $match should come early in an aggregation pipeline
  • cannot use $where with $match
  • $match uses the same query syntax as find

$project

{ $project: { <specification(s)> } }
  • Once we specify one field to retain, we must specify all fields we want to retain.(except _id)
  • Beyond simply removing and retaining fields, $project lets us add new fields
  • $projectcan be used as many times as required within an Aggregation pipeline
  • $project can be used to reassign values to existing field names and to derive entirely new fields

Core Aggregation

$group

  • _id is where to specify what incoming documents should be grouped on
  • Can use all accumulator expressions within $group
  • $group can be used multiple times within a pipeline
  • It may be necessary to sanitize incoming data

$unwind

{ $unwind: <field path> }
  • $unwind only works on array values
  • There are two forms for unwind, short form and long form
  • Using unwind on large collections with big documents may lead to performance issue

$lookup

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}
  • The from collection cannot be shared
  • The from collection must be in the same database
  • The values in localField and foreignField are matched on equality
  • as can be any names, but if it exists in the working document that field will be overwritten

$bucket

  • Must always specify at least 2 values to boundaries
  • boundaries must all be of the same general type
  • count is inserted by default with no output, but removed when output is specified

$bucketAuto

  • Cardinality of groupBy expression may impact even distribution and number of buckets
  • Specifying a granularity requires the expression to groupBy to resolve to a numeric value

$sortByCount

  • Is equivalent to a group stage to count occurence, and then sorting in descending order

Miscellaneous Aggregation

$redact

  • $$KEEP and $$PRUNE automatically apply to all levels below the evaluated level
  • $$DESCEND retains the current level and evaluates the next level down
  • $redact is not for restricting access to a collection

$out

  • Will create a new collection or overwrite an existing collection if specified
  • Honors indexes on existing collections
  • Will not create or overwrite data if pipeline errors
  • Create collections in the same databases as the source collection
  • $out must be the last stage in a pipeline, and is not allowed within a $facet stage

$merge

$merge allows us to :

  • output the documents from an Aggregation Pipeline to a collection in the same or different database.

  • output the documents from an Aggregation Pipeline to a sharded collection.

  • merge documents from an Aggregation and the target collection - we can even specify how we want these documents to be merged.
{ $merge: {
     into: <collection> -or- { db: <db>, coll: <collection> },
     on: <identifier field> -or- [ <identifier field1>, ...],  // Optional
     let: <variables>,                                         // Optional
     whenMatched: <replace|keepExisting|merge|fail|pipeline>,  // Optional
     whenNotMatched: <insert|discard|fail>                     // Optional
} }

Views

  • Views contain no data themselves. They are created on demand and reflect the data in the source collection
  • Views are read only. Write Operations to Views will error
  • Views have some restrictions:
  • No write operations
  • No mapReduce
  • No index operations
  • No $text
  • No renaming
  • No geoNear or $geoNear
  • Collation restrictions
  • find() operations with projection operators are not permitted:

    • $
    • $elemMatch
    • $slice
    • $meta
  • Horizontal slicing is performed with the $match stage, reducing the number of documents that are returned

  • Vertical slicing is performed with $project or other shaping stage, modifing individual documents

Aggregation Performance and Pipeline Optimization

Performance

"Realtime" Processing:

  • Provide data for applications
  • Query performance is more important

Batch Processing:

  • Provide data for analytics
  • Query performance is less important

Memory Constraints

  • Results are subject to 16MB document limit
  • Use $limit and $project
  • 100MB of RAM per stage
  • Use indexes
  • db.orders.aggregate([...], {allowDiskUse: true})
    • Doesn't work with $graphLookup

Optimization

  • Avoid unnecessary stages, the Aggregation Framework can project fields automatically if final shape of the output document can be determined from initial input.
  • Use accumulator expressions, $map, $reduce and $filter in project before an $unwind if possible
  • Every high order array function can be implemented with $reduce if the provided expressions do not meet your needs.