Aggregations

The aggregation framework in MongoDB allows you to define a series (called a pipeline) of operations (called stages) against the data in a collection. These pipelines can be used for analytics, or they can be used to convert your data from one form to another. This guide will not go into the details of how aggregation works, however; the official MongoDB documentation has extensive tutorials on such details. Rather, this guide will focus on the Morphia API. The examples shown here are taken from the tests in Morphia itself.

Writing an aggregation pipeline starts just like writing a standard query. As with querying, we start with the Datastore:

datastore.aggregate(Book.class).pipeline(
    group(id("author"))
      .field("books", push("$title")),
    sort()
       .ascending("name"))
    .execute(Author.class);

aggregate() takes a Class literal, which tells Morphia which collection to run the aggregation against. Because of the transformational operations available in the aggregation pipeline, Morphia cannot validate as much as it can with querying, so take care to ensure that document fields actually exist when you reference them in your pipeline.

The Pipeline

Aggregation pipelines are composed of a series of stages. Our example here starts with the group() stage. This method is the Morphia equivalent of the $group operator. This stage, as the name suggests, groups documents together based on various criteria. In this example, we define the group ID as the author field, which collects all the books by each author together.

The next step defines a new field, books, composed of the titles of the books found in each document. (For reference, this example is the Morphia equivalent of an example found in the aggregation tutorials.) This results in a series of documents that look like this:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
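To make the grouping concrete, here is a minimal plain-Java sketch of what the group ID and the pushed books field compute. It does not use Morphia or MongoDB at all: the Book class, the booksByAuthor() helper, and the sample data are hypothetical stand-ins chosen to mirror the documents shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupPushSketch {
    /** Hypothetical stand-in for a mapped Book entity; only the two fields the pipeline reads. */
    static final class Book {
        final String author;
        final String title;

        Book(String author, String title) {
            this.author = author;
            this.title = title;
        }
    }

    /** Groups by author (the group ID) and appends each title to that group's list. */
    static Map<String, List<String>> booksByAuthor(List<Book> books) {
        // LinkedHashMap keeps groups in first-seen order so the printed output is stable.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Book book : books) {
            grouped.computeIfAbsent(book.author, key -> new ArrayList<>()).add(book.title);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
            new Book("Homer", "The Odyssey"),
            new Book("Dante", "The Banquet"),
            new Book("Homer", "Iliad"),
            new Book("Dante", "Divine Comedy"),
            new Book("Dante", "Eclogues"));

        // One line per group: the key plays the role of _id, the list plays the role of books.
        booksByAuthor(books).forEach((author, titles) ->
            System.out.println("_id=" + author + " books=" + titles));
    }
}
```

On the server the same work is done by the $group stage; the sketch only illustrates the shape of the result, not how MongoDB executes it.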

Executing the Pipeline

Once your pipeline is complete, you can execute it via the execute() method. This method optionally takes a Class reference for the target type of your aggregation. Given this type, Morphia will map each document in the results and return it. You can also pass options to execute(), using the various settings on the AggregationOptions class to configure how the pipeline runs.
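As a rough illustration of what mapping each result document to a target type means, the following plain-Java sketch converts one result document by hand. It is not Morphia's mapper, which is driven by your entity mappings; the Author class here, its field names, and the assumption that _id holds the author's name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResultMappingSketch {
    /** Hypothetical target type; the real Author entity is not shown in this guide. */
    static final class Author {
        final String name;
        final List<String> books;

        Author(String name, List<String> books) {
            this.name = name;
            this.books = books;
        }
    }

    /** Hand-written stand-in for the document-to-object mapping step. */
    static Author toAuthor(Map<String, Object> document) {
        // Assumption: the group ID (_id) carries the author's name.
        String name = (String) document.get("_id");
        List<String> books = new ArrayList<>();
        Object raw = document.get("books");
        if (raw instanceof List<?>) {
            for (Object title : (List<?>) raw) {
                books.add(String.valueOf(title));
            }
        }
        return new Author(name, books);
    }

    public static void main(String[] args) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("_id", "Homer");
        document.put("books", List.of("The Odyssey", "Iliad"));

        Author author = toAuthor(document);
        System.out.println(author.name + " wrote " + author.books);
    }
}
```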

$out

Depending on your use case, you might not want to return the results of your aggregation but simply output them to another collection. That’s where $out comes in. $out is an operator that allows the results of a pipeline to be stored into a named collection. This collection cannot be sharded or a capped collection, however. If the collection does not exist, it will be created upon execution of the pipeline.

Any existing data in the collection will be replaced by the output of the aggregation.

An example aggregation using the $out stage looks like this:

datastore.aggregate(Book.class).pipeline(
    group(id("$author"))
        .field("books",
            push().single("$title")),
    out(Author.class))
    .execute();

You’ll note that out() is the final stage. $out and $merge must be the final stage in a pipeline. We pass a type to out() that reflects the collection we want to write our output to. Morphia will use the type-to-collection mapping you’ve defined when mapping your entities to determine that collection name. You may also pass a String with the collection name if the target collection does not correspond to a mapped entity.
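The two behaviours of $out described in this section (the target collection is created if missing, and any existing contents are replaced) can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the nested maps stand in for a database and its collections, and the out() helper and the collection names are hypothetical, not part of Morphia or the driver.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutSketch {
    /**
     * Models $out: the named collection ends up holding exactly the pipeline results.
     * A missing collection is created; an existing one loses its previous documents.
     */
    static void out(Map<String, Map<String, List<String>>> database,
                    String collection,
                    Map<String, List<String>> results) {
        database.put(collection, new LinkedHashMap<>(results));
    }

    public static void main(String[] args) {
        // Hypothetical database with one existing collection holding one document.
        Map<String, Map<String, List<String>>> database = new LinkedHashMap<>();
        Map<String, List<String>> existing = new LinkedHashMap<>();
        existing.put("Virgil", List.of("Aeneid"));
        database.put("authors", existing);

        Map<String, List<String>> results = new LinkedHashMap<>();
        results.put("Homer", List.of("The Odyssey", "Iliad"));

        out(database, "authors", results);   // replaces the Virgil document
        out(database, "archive", results);   // creates a collection that did not exist

        System.out.println(database);
    }
}
```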

$merge

$merge is a very similar option with some major differences. The biggest difference is that $merge can write to existing collections without destroying the existing documents. $out overwrites any existing documents and replaces them with the results of the pipeline. $merge, however, can deposit these new results alongside existing data and update existing data.

Using $merge might look something like this:

aggregation.pipeline(
    group(id()
        .field("fiscal_year", "$fiscal_year")
        .field("dept", "$dept"))
        .field("salaries", sum("$salary")),
    merge("reporting", "budgets")
        .on("_id")
        .whenMatched(REPLACE)
        .whenNotMatched(INSERT))
    .execute();

Much like out() above, for merge() we pass in collection information, but here we are also passing in which database to find or create the collection in. A merge is slightly more complex and so has more options to consider. In this example, we’re merging into the budgets collection in the reporting database and matching any existing documents based on the _id field, as denoted using the on() method. Because there may be existing data in the collection, we need to instruct the operation how to handle those cases. In this example, when documents match we’re choosing to replace them, and when they don’t we’re instructing the operation to insert the new documents into the collection. Other options are defined on the com.mongodb.client.model.MergeOptions type provided by the Java driver.
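The matched and not-matched handling can be sketched in plain Java as follows. This is an in-memory illustration only, not Morphia or driver code: the map keys stand in for the _id used by on(), the string keys such as "2018/A" are a hypothetical flattening of the compound fiscal_year/dept ID, and the two enums are local stand-ins that model just two choices each.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeSketch {
    /** Local stand-ins for the merge choices; only two of each are modelled here. */
    enum WhenMatched { REPLACE, KEEP_EXISTING }
    enum WhenNotMatched { INSERT, DISCARD }

    /** Applies each result to the target, keyed on the map key (the stand-in for _id). */
    static void merge(Map<String, Integer> target,
                      Map<String, Integer> results,
                      WhenMatched whenMatched,
                      WhenNotMatched whenNotMatched) {
        for (Map.Entry<String, Integer> result : results.entrySet()) {
            boolean matched = target.containsKey(result.getKey());
            if (matched && whenMatched == WhenMatched.REPLACE) {
                target.put(result.getKey(), result.getValue());
            } else if (!matched && whenNotMatched == WhenNotMatched.INSERT) {
                target.put(result.getKey(), result.getValue());
            }
            // KEEP_EXISTING and DISCARD leave the target untouched for this result.
        }
    }

    public static void main(String[] args) {
        // Existing "budgets" documents: key = fiscal year / dept, value = total salaries.
        Map<String, Integer> budgets = new LinkedHashMap<>();
        budgets.put("2017/A", 100);
        budgets.put("2018/A", 200);

        // Pipeline results: one matches an existing document, one is new.
        Map<String, Integer> results = new LinkedHashMap<>();
        results.put("2018/A", 250);
        results.put("2019/A", 300);

        merge(budgets, results, WhenMatched.REPLACE, WhenNotMatched.INSERT);
        System.out.println(budgets); // prints {2017/A=100, 2018/A=250, 2019/A=300}
    }
}
```

Note that the 2017/A document survives the merge, which is the key contrast with $out.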

Supported Operators

Every effort is made to provide 100% coverage of all the operators offered by MongoDB. A select handful of operators have been excluded for reasons of suitability in Morphia. In short, some operators just don’t make sense in Morphia. Listed below are all the currently supported operators. If an operator is missing and you think it should be included, please file an issue for that operator.

Table 1. Stages

Operator            Docs
$addFields          AddFields#addFields()
$bucket             Bucket#bucket()
$bucketAuto         AutoBucket#autoBucket()
$changeStream       ChangeStream#changeStream()
$collStats          CollectionStats#collStats()
$count              Count#count(String)
$currentOp          CurrentOp#currentOp()
$densify            Densify#densify(String,Range)
$documents          Documents#documents(DocumentExpression…)
$facet              Facet#facet()
$fill               Fill#fill()
$geoNear
$graphLookup
$group
$indexStats         IndexStats#indexStats()
$limit              Limit#limit(long)
$lookup
$match              Match#match(Filter…)
$merge
$out
$planCacheStats     PlanCacheStats#planCacheStats()
$project            Projection#project()
$redact             Redact#redact(Expression)
$replaceRoot        ReplaceRoot#replaceRoot()
$replaceWith        ReplaceWith#replaceWith()
$sample             Sample#sample(long)
$set                Set#set()
$setWindowFields    SetWindowFields#setWindowFields()
$skip               Skip#skip(long)
$sort               Sort#sort()
$sortByCount        SortByCount#sortByCount(Object)
$unionWith
$unset              Unset#unset(String,String…)
$unwind             Unwind#unwind(String)

Table 2. Expressions

Operator            Docs
$abs                MathExpressions#abs(Object)
$accumulator        AccumulatorExpressions#accumulator(String,String,List,String)
$acos               TrigonometryExpressions#acos(Object)
$acosh              TrigonometryExpressions#acosh(Object)
$add                MathExpressions#add(Object,Object…)
$addToSet           AccumulatorExpressions#addToSet(Object)
$allElementsTrue    SetExpressions#allElementsTrue(Object,Object…)
$and
$anyElementTrue     SetExpressions#anyElementTrue(Object,Object…)
$arrayElemAt        ArrayExpressions#elementAt(Object,Object)
$arrayToObject      ArrayExpressions#arrayToObject(Object)
$asin               TrigonometryExpressions#asin(Object)
$asinh              TrigonometryExpressions#asinh(Object)
$atan               TrigonometryExpressions#atan(Object)
$atan2              TrigonometryExpressions#atan2(Object,Object)
$atanh              TrigonometryExpressions#atanh(Object)
$avg                AccumulatorExpressions#avg(Object,Object…)
$binarySize         DataSizeExpressions#binarySize(Object)
$bitAnd             MathExpressions#bitAnd(Object,Object)
$bitNot             MathExpressions#bitNot(Object)
$bitOr              MathExpressions#bitOr(Object,Object)
$bitXor             MathExpressions#bitXor(Object,Object)
$bottom             AccumulatorExpressions#bottom(Object,Sort…)
$bottomN            AccumulatorExpressions#bottomN(Object,Object,Sort…)
$bsonSize           DataSizeExpressions#bsonSize(Object)
$ceil               MathExpressions#ceil(Object)
$cmp                ComparisonExpressions#cmp(Object,Object)
$concat             StringExpressions#concat(Object,Object…)
$concatArrays       ArrayExpressions#concatArrays(Object,Object…)
$cond               ConditionalExpressions#condition(Object,Object,Object)
$convert            TypeExpressions#convert(Object,ConvertType)
$cos                TrigonometryExpressions#cos(Object)
$cosh               TrigonometryExpressions#cosh(Object)
$count              AccumulatorExpressions#count()
$covariancePop      WindowExpressions#covariancePop(Object,Object)
$covarianceSamp     WindowExpressions#covarianceSamp(Object,Object)
$dateAdd            DateExpressions#dateAdd(Object,long,TimeUnit)
$dateDiff           DateExpressions#dateDiff(Object,Object,TimeUnit)
$dateFromParts      DateExpressions#dateFromParts()
$dateFromString     DateExpressions#dateFromString()
$dateSubtract       DateExpressions#dateSubtract(Object,long,TimeUnit)
$dateToParts        DateExpressions#dateToParts(Object)
$dateToString       DateExpressions#dateToString()
$dateTrunc          DateExpressions#dateTrunc(Object,TimeUnit)
$dayOfMonth         DateExpressions#dayOfMonth(Object)
$dayOfWeek          DateExpressions#dayOfWeek(Object)
$dayOfYear          DateExpressions#dayOfYear(Object)
$degreesToRadians   TrigonometryExpressions#degreesToRadians(Object)
$denseRank          WindowExpressions#denseRank()
$derivative         WindowExpressions#derivative(Object)
$divide             MathExpressions#divide(Object,Object)
$documentNumber     WindowExpressions#documentNumber()
$eq                 ComparisonExpressions#eq(Object,Object)
$exp                MathExpressions#exp(Object)
$expMovingAvg
$filter             Expressions#filter(Object,Object)
$first              AccumulatorExpressions#first(Object)
$firstN             AccumulatorExpressions#firstN(Object,Object)
$floor              MathExpressions#floor(Object)
$function           AccumulatorExpressions#function(String,Object…)
$getField
$gt                 ComparisonExpressions#gt(Object,Object)
$gte                ComparisonExpressions#gte(Object,Object)
$hour               DateExpressions#hour(Object)
$ifNull             ConditionalExpressions#ifNull()
$in                 ArrayExpressions#in(Object,Object)
$indexOfArray       ArrayExpressions#indexOfArray(Object,Object)
$indexOfBytes       StringExpressions#indexOfBytes(Object,Object)
$indexOfCP          StringExpressions#indexOfCP(Object,Object)
$integral           WindowExpressions#integral(Object)
$isArray            ArrayExpressions#isArray(Object)
$isNumber           TypeExpressions#isNumber(Object)
$isoDayOfWeek       DateExpressions#isoDayOfWeek(Object)
$isoWeek            DateExpressions#isoWeek(Object)
$isoWeekYear        DateExpressions#isoWeekYear(Object)
$last               AccumulatorExpressions#last(Object)
$lastN              AccumulatorExpressions#lastN(Object,Object)
$let                VariableExpressions#let(Expression)
$linearFill         WindowExpressions#linearFill(Object)
$literal            Expressions#literal(Object)
$ln                 MathExpressions#ln(Object)
$locf               WindowExpressions#locf(Object)
$log                MathExpressions#log(Object,Object)
$log10              MathExpressions#log10(Object)
$lt                 ComparisonExpressions#lt(Object,Object)
$lte                ComparisonExpressions#lte(Object,Object)
$ltrim              StringExpressions#ltrim(Object)
$map                ArrayExpressions#map(Object,Object)
$max                AccumulatorExpressions#max(Object,Object…)
$maxN               AccumulatorExpressions#maxN(Object,Object)
$median             MathExpressions#median(Object)
$mergeObjects       ObjectExpressions#mergeObjects()
$meta
$millisecond        DateExpressions#milliseconds(Object)
$min                AccumulatorExpressions#min(Object,Object…)
$minN               AccumulatorExpressions#minN(Object,Object)
$minute             DateExpressions#minute(Object)
$mod                MathExpressions#mod(Object,Object)
$month              DateExpressions#month(Object)
$multiply           MathExpressions#multiply(Object,Object…)
$ne                 ComparisonExpressions#ne(Object,Object)
$not                BooleanExpressions#not(Object)
$objectToArray      ArrayExpressions#objectToArray(Object)
$or
$percentile
$pow                MathExpressions#pow(Object,Object)
$push
$radiansToDegrees   TrigonometryExpressions#radiansToDegrees(Object)
$rand               Miscellaneous#rand()
$range
$rank               WindowExpressions#rank()
$reduce             ArrayExpressions#reduce(Object,Object,Object)
$regexFind          StringExpressions#regexFind(Object)
$regexFindAll       StringExpressions#regexFindAll(Object)
$regexMatch         StringExpressions#regexMatch(Object)
$replaceAll         StringExpressions#replaceAll(Object,Object,Object)
$replaceOne         StringExpressions#replaceOne(Object,Object,Object)
$reverseArray       ArrayExpressions#reverseArray(Object)
$round              MathExpressions#round(Object,Object)
$rtrim              StringExpressions#rtrim(Object)
$sampleRate         Miscellaneous#sampleRate(double)
$second             DateExpressions#second(Object)
$setDifference      SetExpressions#setDifference(Object,Object)
$setEquals          SetExpressions#setEquals(Object,Object…)
$setField           Miscellaneous#setField(Object,Object,Object)
$setIntersection    SetExpressions#setIntersection(Object,Object…)
$setIsSubset        SetExpressions#setIsSubset(Object,Object)
$setUnion           SetExpressions#setUnion(Object,Object…)
$shift              WindowExpressions#shift(Object,long,Object)
$sin                TrigonometryExpressions#sin(Object)
$sinh               TrigonometryExpressions#sinh(Object)
$size               ArrayExpressions#size(Object)
$slice              ArrayExpressions#slice(Object,int)
$sortArray          ArrayExpressions#sortArray(Object,Sort…)
$split              StringExpressions#split(Object,Object)
$sqrt               MathExpressions#sqrt(Object)
$stdDevPop          WindowExpressions#stdDevPop(Object,Object…)
$stdDevSamp         WindowExpressions#stdDevSamp(Object,Object…)
$strLenBytes        StringExpressions#strLenBytes(Object)
$strLenCP           StringExpressions#strLenCP(Object)
$strcasecmp         StringExpressions#strcasecmp(Object,Object)
$substrBytes
$substrCP           StringExpressions#substrCP(Object,Object,Object)
$subtract           MathExpressions#subtract(Object,Object)
$sum                AccumulatorExpressions#sum(Object,Object…)
$switch             ConditionalExpressions#switchExpression()
$tan                TrigonometryExpressions#tan(Object)
$tanh               TrigonometryExpressions#tanh(Object)
$toBool             TypeExpressions#toBool(Object)
$toDate             DateExpressions#toDate(Object)
$toDecimal          TypeExpressions#toDecimal(Object)
$toDouble           TypeExpressions#toDouble(Object)
$toInt              TypeExpressions#toInt(Object)
$toLong             TypeExpressions#toLong(Object)
$toLower            StringExpressions#toLower(Object)
$toObjectId         TypeExpressions#toObjectId(Object)
$toString
$toUpper            StringExpressions#toUpper(Object)
$top                AccumulatorExpressions#top(Object,Sort…)
$topN               AccumulatorExpressions#topN(Object,Object,Sort…)
$trim               StringExpressions#trim(Object)
$trunc
$tsIncrement        DateExpressions#tsIncrement(Object)
$tsSecond           DateExpressions#tsSecond(Object)
$type               TypeExpressions#type(Object)
$unsetField         Miscellaneous#unsetField(Object,Object)
$week               DateExpressions#week(Object)
$year               DateExpressions#year(Object)
$zip                ArrayExpressions#zip(Object…)