$bucket (aggregation)


Definition

$bucket

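Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value specifies the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.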

$bucket only produces output documents for buckets that contain at least one input document.
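To make the definition concrete, here is a minimal plain-JavaScript sketch of how values map to buckets. `bucketCounts` is a hypothetical helper, not part of MongoDB or mongosh; it assumes plain numeric values, silently skips values outside the boundaries (real $bucket would use default or throw), and mimics the default count-only output.

```javascript
// Hypothetical model of $bucket assignment for plain numbers (not MongoDB code).
// Each value lands in the bucket [boundaries[i], boundaries[i + 1]).
function bucketCounts(values, boundaries) {
  const counts = new Map(); // inclusive lower bound -> number of values
  for (const v of values) {
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (boundaries[i] <= v && v < boundaries[i + 1]) {
        counts.set(boundaries[i], (counts.get(boundaries[i]) || 0) + 1);
        break;
      }
    }
    // Values outside every range are skipped in this sketch; $bucket itself
    // would route them to the default bucket or throw an error.
  }
  // Only non-empty buckets are emitted, each keyed by its inclusive lower bound.
  return boundaries
    .filter((b) => counts.has(b))
    .map((b) => ({ _id: b, count: counts.get(b) }));
}

// The bucket [5, 10) receives no values, so it does not appear in the result.
console.log(bucketCounts([1, 4, 12, 17], [0, 5, 10, 15, 20]));
```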

Considerations

`$bucket` and Memory Restrictions

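The $bucket stage has a memory limit of 100 megabytes. By default, if the stage exceeds this limit, $bucket returns an error. To give the stage more room, use the allowDiskUse option so that aggregation pipeline stages can write data to temporary files.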

Tip

See also:

Aggregation Pipeline Limits

Syntax

{
  $bucket: {
      groupBy: <expression>,
      boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
      default: <literal>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
         <outputN>: { <$accumulator expression> }
      }
   }
}

The $bucket document contains the following fields:

Field

Type

Description

groupBy

expression

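An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

Unless $bucket includes a default specification, each input document must resolve the groupBy field path or expression to a value that falls within one of the ranges specified by boundaries.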

boundaries

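array

An array of values based on the groupBy expression that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary for the bucket. You must specify at least two boundaries.

The specified values must be in ascending order and all of the same type. The exception is if the values are of mixed numeric types, such as:

[ 10, NumberLong(20), NumberInt(30) ]

For example, an array of [ 0, 5, 10 ] creates two buckets:

[0, 5) with inclusive lower bound 0 and exclusive upper bound 5.
[5, 10) with inclusive lower bound 5 and exclusive upper bound 10.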

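default

literal

Optional. A literal that specifies the _id of an additional bucket that contains all documents whose groupBy expression result does not fall into a bucket specified by boundaries.

If unspecified, each input document must resolve the groupBy expression to a value within one of the bucket ranges specified by boundaries, or the operation throws an error.

The default value must be less than the lowest boundaries value, or greater than or equal to the highest boundaries value.

The default value can be of a different type than the entries in boundaries.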

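output

document

Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify each field to include, you must use accumulator expressions: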

<outputfield1>: { <accumulator>: <expression1> },
...
<outputfieldN>: { <accumulator>: <expressionN> }

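If you do not specify an output document, the operation returns a count field containing the number of documents in each bucket.

If you specify an output document, only the fields specified in the document are returned; that is, the count field is not returned unless it is explicitly included in the output document.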
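The boundaries and default rules in the field table above can be expressed as a small check. This is a hypothetical plain-JavaScript sketch (`checkBucketSpec` is not a MongoDB function); it only considers plain numbers and strings, so the mixed numeric type exception (NumberLong, NumberInt) is not modeled, and it assumes boundaries must be strictly ascending.

```javascript
// Hypothetical validator for the boundaries/default rules (not MongoDB code).
function checkBucketSpec(boundaries, defaultId) {
  if (!Array.isArray(boundaries) || boundaries.length < 2) {
    throw new Error("boundaries must contain at least two values");
  }
  const type = typeof boundaries[0];
  for (let i = 1; i < boundaries.length; i++) {
    // All boundaries share one type (mixed numeric BSON types not modeled).
    if (typeof boundaries[i] !== type) {
      throw new Error("all boundaries must be of the same type");
    }
    // Assumption of this sketch: strictly ascending order.
    if (!(boundaries[i - 1] < boundaries[i])) {
      throw new Error("boundaries must be in ascending order");
    }
  }
  // A default of a different type is accepted without comparison; a default
  // of the same type must be below the lowest or at/above the highest boundary.
  if (defaultId !== undefined && typeof defaultId === type) {
    const low = boundaries[0];
    const high = boundaries[boundaries.length - 1];
    if (defaultId >= low && defaultId < high) {
      throw new Error("default must be < lowest or >= highest boundary");
    }
  }
  return true;
}

console.log(checkBucketSpec([0, 5, 10], "Other"));
```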

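Behavior

$bucket requires at least one of the following conditions to be met, or the operation throws an error:

Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
A default value is specified to bucket documents whose groupBy values are outside of the boundaries or of a different type than the values in boundaries.

If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.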
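The error-or-default rule can be sketched in plain JavaScript as follows. `assignBucket` is a hypothetical helper for illustration only: it handles numbers, treats a missing value as `undefined`, sends any non-numeric value to the default, and does not model the $sort comparison logic used for arrays or documents.

```javascript
// Hypothetical model of the Behavior rules (not MongoDB's implementation).
function assignBucket(value, boundaries, defaultId) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (boundaries[i] <= value && value < boundaries[i + 1]) {
        return boundaries[i]; // the inclusive lower bound becomes the _id
      }
    }
  }
  // Out of range, missing, or a different type: use default if one exists.
  if (defaultId === undefined) {
    throw new Error(`value ${value} does not fall into any bucket and no default was given`);
  }
  return defaultId;
}

const bounds = [1840, 1850, 1860, 1870, 1880];
console.log(assignBucket(1868, bounds, "Other")); // lands in [1860, 1870)
console.log(assignBucket(1880, bounds, "Other")); // 1880 is an exclusive upper bound
```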

Examples

Bucket by Year and Filter by Bucket Results

In mongosh, create a sample collection named artists with the following documents:

db.artists.insertMany([
  { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
  { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
  { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
  { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
  { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
  { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
  { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
  { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])

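The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets: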

db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket ID for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )

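First Stage

The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:

[1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
[1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
[1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
[1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
If a document does not contain the year_born field, or its year_born value falls outside the ranges above, it is placed in the default bucket with the _id value "Other".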
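The assignment rule above can be written as a piecewise function. In this sketch, b_0 < b_1 < ... < b_{n-1} are the boundaries (here b = (1840, 1850, 1860, 1870, 1880), so n = 5) and d is the default _id ("Other"); a missing year_born also maps to d.

```latex
\mathrm{bucket}(x) =
\begin{cases}
  b_i & \text{if } b_i \le x < b_{i+1} \text{ for some } 0 \le i \le n-2, \\
  d   & \text{otherwise (valid only when a default is specified)}
\end{cases}
\qquad
\mathrm{bucket}(1868) = 1860, \quad \mathrm{bucket}(1880) = \text{"Other"}
```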

The stage uses the output document to determine which fields to return:

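Field	Description
`_id`	Inclusive lower bound of the bucket.
`count`	Number of documents in the bucket.
`artists`	Array of documents containing information on each artist in the bucket. Each document contains the artist's `name`, a concatenation (using `$concat`) of the artist's `first_name` and `last_name`, and the artist's `year_born`.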

This stage passes the following documents to the next stage:

{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
                                           { "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
                                           { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
                                           { "name" : "Alfred Maurer", "year_born" : 1868 },
                                           { "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }

Second Stage

The $match stage filters the output from the previous stage to return only buckets that contain more than 3 documents.

The operation returns the following document:

{ "_id" : 1860, "count" : 4, "artists" :
  [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 }
  ]
}
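The result above can be reproduced with a plain-JavaScript model of the two stages. This is an illustrative sketch only: the `artists` array is trimmed to the fields the pipeline uses, `bucketArtists` is a hypothetical helper, and bucket order in the model follows first appearance, which is not guaranteed to match MongoDB's output order.

```javascript
// Sample data mirroring the artists collection (trimmed to the used fields).
const artists = [
  { _id: 1, first_name: "Emil", last_name: "Bernard", year_born: 1868 },
  { _id: 2, first_name: "Joszef", last_name: "Rippl-Ronai", year_born: 1861 },
  { _id: 3, first_name: "Anna", last_name: "Ostroumova", year_born: 1871 },
  { _id: 4, first_name: "Vincent", last_name: "Van Gogh", year_born: 1853 },
  { _id: 5, first_name: "Alfred", last_name: "Maurer", year_born: 1868 },
  { _id: 6, first_name: "Edvard", last_name: "Munch", year_born: 1863 },
  { _id: 7, first_name: "Odilon", last_name: "Redon", year_born: 1840 },
  { _id: 8, first_name: "Edvard", last_name: "Diriks", year_born: 1855 },
];
const boundaries = [1840, 1850, 1860, 1870, 1880];

// First stage: hypothetical model of the $bucket stage with its output spec.
function bucketArtists(docs) {
  const buckets = new Map(); // bucket _id -> { _id, count, artists }
  for (const doc of docs) {
    const y = doc.year_born;
    let id = "Other"; // default bucket
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (typeof y === "number" && boundaries[i] <= y && y < boundaries[i + 1]) {
        id = boundaries[i];
        break;
      }
    }
    if (!buckets.has(id)) buckets.set(id, { _id: id, count: 0, artists: [] });
    const b = buckets.get(id);
    b.count += 1; // like { $sum: 1 }
    // like $push of { name: $concat(first_name, " ", last_name), year_born }
    b.artists.push({ name: `${doc.first_name} ${doc.last_name}`, year_born: y });
  }
  return [...buckets.values()];
}

// Second stage: like { $match: { count: { $gt: 3 } } }
const stageOne = bucketArtists(artists);
const matched = stageOne.filter((b) => b.count > 3);
console.log(JSON.stringify(matched, null, 2));
```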

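Use $bucket with $facet to Bucket by Multiple Fields

You can use the $facet stage to perform multiple $bucket aggregations in a single stage.

In mongosh, create a sample collection named artwork with the following documents: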

db.artwork.insertMany([
  { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
      "price" : NumberDecimal("199.99") },
  { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
      "price" : NumberDecimal("280.00") },
  { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
      "price" : NumberDecimal("76.04") },
  { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
      "price" : NumberDecimal("167.30") },
  { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
      "price" : NumberDecimal("483.00") },
  { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
      "price" : NumberDecimal("385.00") },
  { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
      /* No price*/ },
  { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
      "price" : NumberDecimal("118.42") }
])

The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:

db.artwork.aggregate( [
  {
    $facet: {                               // Top-level $facet stage
      "price": [                            // Output field 1
        {
          $bucket: {
              groupBy: "$price",            // Field to group by
              boundaries: [ 0, 200, 400 ],  // Boundaries for the buckets
              default: "Other",             // Bucket ID for documents which do not fall into a bucket
              output: {                     // Output for each bucket
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } },
                "averagePrice": { $avg: "$price" }
              }
          }
        }
      ],
      "year": [                                      // Output field 2
        {
          $bucket: {
            groupBy: "$year",                        // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ],  // Boundaries for the buckets
            default: "Unknown",                      // Bucket ID for documents which do not fall into a bucket
            output: {                                // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

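First Facet

The first facet groups the input documents by price. The buckets have the following boundaries:

[0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
[200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
"Other", the default bucket that contains documents without prices or with prices outside the ranges above.

The $bucket stage uses the output document to determine which fields to return: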

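Field	Description
`_id`	Inclusive lower bound of the bucket.
`count`	Number of documents in the bucket.
`artwork`	Array of documents containing information on each artwork in the bucket.
`averagePrice`	Uses the `$avg` operator to display the average price of all artwork in the bucket.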

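Second Facet

The second facet groups the input documents by year. The buckets have the following boundaries:

[1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
[1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
[1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
"Unknown", the default bucket that contains documents without years or with years outside the ranges above.

The $bucket stage uses the output document to determine which fields to return:

Field	Description
`count`	Number of documents in the bucket.
`artwork`	Array of documents containing information on each artwork in the bucket.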

Output

The operation returns the following document:

{
  "price" : [ // Output of first facet
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
        { "title" : "Dancer", "price" : NumberDecimal("76.04") },
        { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
        { "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
      ],
      "averagePrice" : NumberDecimal("140.4375")
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
        { "title" : "Composition VII", "price" : NumberDecimal("385.00") }
      ],
      "averagePrice" : NumberDecimal("332.50")
    },
    {
      // Includes documents without prices and prices greater than 400
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
        { "title" : "The Scream" }
      ],
      "averagePrice" : NumberDecimal("483.00")
    }
  ],
  "year" : [ // Output of second facet
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "year" : 1902 },
        { "title" : "The Scream", "year" : 1893 }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        { "title" : "Composition VII", "year" : 1913 },
        { "title" : "Blue Flower", "year" : 1918 }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        { "title" : "The Pillars of Society", "year" : 1926 },
        { "title" : "Dancer", "year" : 1925 },
        { "title" : "The Persistence of Memory", "year" : 1931 }
      ]
    },
    {
      // Includes documents without a year
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        { "title" : "The Great Wave off Kanagawa" }
      ]
    }
  ]
}
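The counts and averages above can be cross-checked with a plain-JavaScript model of the $facet example, in which the same input runs through two independent bucketing passes. This sketch is illustrative only: `bucketBy` and `avg` are hypothetical helpers, prices are plain numbers instead of NumberDecimal (so averages are floating-point approximations), and the per-bucket artwork arrays are omitted for brevity.

```javascript
// Sample data mirroring the artwork collection (prices as plain numbers).
const artwork = [
  { _id: 1, title: "The Pillars of Society", year: 1926, price: 199.99 },
  { _id: 2, title: "Melancholy III", year: 1902, price: 280.0 },
  { _id: 3, title: "Dancer", year: 1925, price: 76.04 },
  { _id: 4, title: "The Great Wave off Kanagawa", price: 167.3 }, // no year
  { _id: 5, title: "The Persistence of Memory", year: 1931, price: 483.0 },
  { _id: 6, title: "Composition VII", year: 1913, price: 385.0 },
  { _id: 7, title: "The Scream", year: 1893 }, // no price
  { _id: 8, title: "Blue Flower", year: 1918, price: 118.42 },
];

// Groups docs by a numeric field using [lower, upper) ranges plus a default.
function bucketBy(docs, field, boundaries, defaultId) {
  const buckets = new Map(); // bucket _id -> array of docs
  for (const doc of docs) {
    const v = doc[field];
    let id = defaultId;
    if (typeof v === "number") {
      for (let i = 0; i < boundaries.length - 1; i++) {
        if (boundaries[i] <= v && v < boundaries[i + 1]) { id = boundaries[i]; break; }
      }
    }
    if (!buckets.has(id)) buckets.set(id, []);
    buckets.get(id).push(doc);
  }
  return buckets;
}

// Like $avg: missing or non-numeric values are ignored; null if none remain.
function avg(values) {
  const nums = values.filter((v) => typeof v === "number");
  return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
}

// Each facet is an independent pass over the same input documents.
const facets = {
  price: [...bucketBy(artwork, "price", [0, 200, 400], "Other")].map(([id, docs]) => ({
    _id: id,
    count: docs.length,
    averagePrice: avg(docs.map((d) => d.price)),
  })),
  year: [...bucketBy(artwork, "year", [1890, 1910, 1920, 1940], "Unknown")].map(([id, docs]) => ({
    _id: id,
    count: docs.length,
  })),
};
console.log(JSON.stringify(facets, null, 2));
```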

Tip

See also:

$bucketAuto
