$bucket (聚合)
定义
注意事项
$bucket
和内存限制
在 $bucket
阶段,内存限制为100兆字节。默认情况下,如果该阶段超过此限制,$bucket
将返回错误。为了允许阶段处理有更多空间,使用allowDiskUse 选项来启用聚合管道阶段写入临时文件。
语法
{ $bucket: { groupBy: <expression>, boundaries: [ <lowerbound1>, <lowerbound2>, ... ], default: <literal>, output: { <output1>: { <$accumulator expression> }, ... <outputN>: { <$accumulator expression> } } } }
$bucket
文档包含以下字段
字段 | 类型 | 描述 | |||
---|---|---|---|---|---|
expression | 一个用于按文档分组的 expression。要指定 字段路径,请使用美元符号 除非 | ||||
数组 | |||||
字面量 | |||||
文档 | 可选。一个文档,用于指定在输出文档中包含的字段,除了
如果没有指定 如果指定了 |
行为
$bucket
需要满足以下条件之一或操作会抛出错误
每个输入文档将
groupBy
表达式解析为boundaries
指定的桶范围之一中的值,或者指定了一个
default
值,用于将groupBy
值超出boundaries
或与boundaries
中的值不同类型的文档进行分类。
如果 groupBy
表达式解析为数组或文档,则 $bucket
使用 $sort
中的比较逻辑来安排输入文档到桶中。
示例
按年份分组并按分组结果过滤
在mongosh
中,创建一个名为artists
的示例集合,包含以下文档
db.artists.insertMany([ { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" }, { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" }, { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" }, { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" }, { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" }, { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" }, { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" }, { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" } ])
以下操作根据year_born
字段将文档分组到桶中,并基于桶中文档的数量进行过滤
db.artists.aggregate( [ // First Stage { $bucket: { groupBy: "$year_born", // Field to group by boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets default: "Other", // Bucket ID for documents which do not fall into a bucket output: { // Output for each bucket "count": { $sum: 1 }, "artists" : { $push: { "name": { $concat: [ "$first_name", " ", "$last_name"] }, "year_born": "$year_born" } } } } }, // Second Stage { $match: { count: {$gt: 3} } } ] )
- 第一阶段
$bucket
阶段根据year_born
字段将文档分组到桶中。桶的边界如下:[1840, 1850) 上限包含
1840
,下限不包含1850
。[1850, 1860) 上限包含
1850
,下限不包含1860
。[1860, 1870) 上限包含
1860
,下限不包含1870
。[1870, 1880) 上限包含
1870
,下限不包含1880
。如果文档没有包含
year_born
字段或其year_born
字段超出了上述范围,则将其放置在具有_id
值"Other"
的默认桶中。
该阶段包括输出文档以确定要返回的字段
字段描述_id
桶的包含下界。count
桶中文档的数量。artists
包含桶中每个艺术家信息的文档数组。每个文档包含艺术家的
name
,它是由艺术家的first_name
和last_name
连接而成的(即$concat
)。year_born
此阶段将以下文档传递到下一阶段
{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] } { "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 }, { "name" : "Edvard Diriks", "year_born" : 1855 } ] } { "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 }, { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 }, { "name" : "Alfred Maurer", "year_born" : 1868 }, { "name" : "Edvard Munch", "year_born" : 1863 } ] } { "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] } - 第二阶段
$match
阶段会过滤来自前一阶段的输出,仅返回包含超过3个文档的桶。操作返回以下文档
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 }, { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 }, { "name" : "Alfred Maurer", "year_born" : 1868 }, { "name" : "Edvard Munch", "year_born" : 1863 } ] }
使用$bucket与$facet按多个字段分桶
您可以使用$facet
阶段在一个阶段内执行多个$bucket
聚合操作。
在mongosh
中,创建一个名为artwork
的样本集合,包含以下文档
db.artwork.insertMany([ { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "price" : NumberDecimal("199.99") }, { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "price" : NumberDecimal("280.00") }, { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "price" : NumberDecimal("76.04") }, { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "price" : NumberDecimal("167.30") }, { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "price" : NumberDecimal("483.00") }, { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "price" : NumberDecimal("385.00") }, { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 /* No price*/ }, { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "price" : NumberDecimal("118.42") } ])
以下操作在一个$facet
阶段中使用两个$bucket
阶段,创建两个分组,一个按price
分组,另一个按year
分组
db.artwork.aggregate( [ { $facet: { // Top-level $facet stage "price": [ // Output field 1 { $bucket: { groupBy: "$price", // Field to group by boundaries: [ 0, 200, 400 ], // Boundaries for the buckets default: "Other", // Bucket ID for documents which do not fall into a bucket output: { // Output for each bucket "count": { $sum: 1 }, "artwork" : { $push: { "title": "$title", "price": "$price" } }, "averagePrice": { $avg: "$price" } } } } ], "year": [ // Output field 2 { $bucket: { groupBy: "$year", // Field to group by boundaries: [ 1890, 1910, 1920, 1940 ], // Boundaries for the buckets default: "Unknown", // Bucket ID for documents which do not fall into a bucket output: { // Output for each bucket "count": { $sum: 1 }, "artwork": { $push: { "title": "$title", "year": "$year" } } } } } ] } } ] )
- 第一个Facet
第一个Facet按
price
对输入文档进行分组。桶的界限如下[0, 200) ,包含下界
0
和排他上界200
。[200, 400) ,包含下界
200
和排他上界400
。"Other",包含没有价格或价格在上述范围内的文档的
默认
桶。
$bucket
阶段包含输出文档以确定要返回的字段字段描述_id
桶的包含下界。count
桶中文档的数量。artwork
包含桶中每个艺术品信息的文档数组。averagePrice
使用$avg
运算符来显示该桶中所有艺术品的价格平均值。- 第二方面
第二方面将输入文档按
年份
分组。桶的边界如下[1890, 1910) 包含下界
1890
和排除上界1910
。[1910, 1920) 包含下界
1910
和排除上界1920
。[1920, 1940) 包含下界
1910
和排除上界1940
。"未知",包含没有年份或年份不在上述范围内的文档的默认桶。
$bucket
阶段包含输出文档以确定要返回的字段字段描述count
桶中文档的数量。artwork
包含桶中每个艺术品信息的文档数组。- 输出
操作返回以下文档
{ "price" : [ // Output of first facet { "_id" : 0, "count" : 4, "artwork" : [ { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") }, { "title" : "Dancer", "price" : NumberDecimal("76.04") }, { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") }, { "title" : "Blue Flower", "price" : NumberDecimal("118.42") } ], "averagePrice" : NumberDecimal("140.4375") }, { "_id" : 200, "count" : 2, "artwork" : [ { "title" : "Melancholy III", "price" : NumberDecimal("280.00") }, { "title" : "Composition VII", "price" : NumberDecimal("385.00") } ], "averagePrice" : NumberDecimal("332.50") }, { // Includes documents without prices and prices greater than 400 "_id" : "Other", "count" : 2, "artwork" : [ { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") }, { "title" : "The Scream" } ], "averagePrice" : NumberDecimal("483.00") } ], "year" : [ // Output of second facet { "_id" : 1890, "count" : 2, "artwork" : [ { "title" : "Melancholy III", "year" : 1902 }, { "title" : "The Scream", "year" : 1893 } ] }, { "_id" : 1910, "count" : 2, "artwork" : [ { "title" : "Composition VII", "year" : 1913 }, { "title" : "Blue Flower", "year" : 1918 } ] }, { "_id" : 1920, "count" : 3, "artwork" : [ { "title" : "The Pillars of Society", "year" : 1926 }, { "title" : "Dancer", "year" : 1925 }, { "title" : "The Persistence of Memory", "year" : 1931 } ] }, { // Includes documents without a year "_id" : "Unknown", "count" : 1, "artwork" : [ { "title" : "The Great Wave off Kanagawa" } ] } ] }