Transform Your Data with Aggregation

Overview

In this guide, you can learn how to use PyMongo to perform aggregation operations.

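Aggregation operations process data in your MongoDB collections and return computed results. The MongoDB Aggregation framework, which is part of the Query API, is modeled on the concept of data processing pipelines. Documents enter a pipeline that contains one or more stages, and the pipeline transforms the documents into an aggregated result.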

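An aggregation operation is similar to a car factory. A car factory has an assembly line, which contains assembly stations with specialized tools, such as drills and welders, that do specific jobs. Raw parts enter the factory, and the assembly line transforms and assembles them into a finished product.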

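The aggregation pipeline is the assembly line, aggregation stages are the assembly stations, and operator expressions are the specialized tools.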
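To make the analogy concrete, the following plain-Python sketch models a pipeline as an ordered list of stage functions, where each stage receives the documents produced by the previous one. It is only a mental model that assumes an in-memory list of dictionaries; MongoDB runs real pipelines on the server, and the names match_doors, rename_color, and run_pipeline are hypothetical.

```python
from functools import reduce

# Hypothetical in-memory "collection", used only for illustration.
parts = [
    {"part": "door", "color": "red"},
    {"part": "wheel", "color": "black"},
    {"part": "door", "color": "blue"},
]


def match_doors(docs):
    # Like a $match stage: keep only documents that pass a filter.
    return [d for d in docs if d["part"] == "door"]


def rename_color(docs):
    # Like a $project stage: reshape each document (here, rename a field).
    return [{"part": d["part"], "paint": d["color"]} for d in docs]


def run_pipeline(docs, stages):
    # Each stage consumes the output of the previous stage, in order.
    return reduce(lambda acc, stage: stage(acc), stages, docs)


result = run_pipeline(parts, [match_doors, rename_color])
print(result)
# [{'part': 'door', 'paint': 'red'}, {'part': 'door', 'paint': 'blue'}]
```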

Aggregation Versus Find Operations

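You can use find operations to perform the following actions:

Select which documents to return
Select which fields to return
Sort the results

You can use aggregation operations to perform the following actions:

Perform find operations
Rename fields
Calculate fields
Summarize data
Group values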
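As a sketch of the difference, the following pipeline, written as a plain Python list, begins with stages that mirror what a find operation can do (filter, sort, and select fields) and then uses stages that only aggregation supports: renaming a field and computing a new one. The field names restaurantName and nameLength are illustrative choices, and the $strLenCP expression assumes that every matched document has a string name field.

```python
# A roughly equivalent find() call, shown as a comment because it needs a
# live `collection` object:
#   collection.find({"cuisine": "Bakery"},
#                   {"_id": 0, "name": 1, "borough": 1}).sort("name", 1)

pipeline = [
    # Select documents, like the filter passed to find()
    {"$match": {"cuisine": "Bakery"}},
    # Sort the results, like find().sort()
    {"$sort": {"name": 1}},
    # Select fields like a find() projection, but also rename
    # "name" to "restaurantName" (aggregation only)
    {"$project": {"_id": 0, "restaurantName": "$name", "borough": 1}},
    # Compute a new field from an existing one (aggregation only);
    # assumes restaurantName is always a string
    {"$addFields": {"nameLength": {"$strLenCP": "$restaurantName"}}},
]
```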

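Limitations

Consider the following limitations when performing aggregation operations:

Returned documents must not violate the BSON document size limit of 16 megabytes.
Pipeline stages have a memory limit of 100 megabytes by default. You can exceed this limit by setting the allowDiskUse parameter of the aggregate() method.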
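For example, the following minimal sketch passes allowDiskUse as a keyword argument to aggregate(); PyMongo forwards it to the server's aggregate command. The wrapper name aggregate_allowing_disk_use is hypothetical, and collection is assumed to be a PyMongo Collection.

```python
def aggregate_allowing_disk_use(collection, pipeline):
    """Run an aggregation whose stages may write temporary files to disk.

    `collection` is assumed to be a pymongo.collection.Collection.
    """
    # allowDiskUse=True lets stages that exceed the 100 MB memory limit
    # spill to temporary files instead of failing. ($graphLookup ignores it.)
    return collection.aggregate(pipeline, allowDiskUse=True)


# Example usage (requires a live `collection`):
#   cursor = aggregate_allowing_disk_use(collection, [{"$sort": {"name": 1}}])
```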

Important

$graphLookup Exception

The $graphLookup stage has a strict memory limit of 100 megabytes and ignores the allowDiskUse parameter.

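Aggregation Example

Note

This example uses the sample_restaurants.restaurants collection from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started with PyMongo.

To perform an aggregation, pass a list of aggregation stages to the collection.aggregate() method.

The following code example counts the number of bakeries in each borough of New York. To do so, it uses an aggregation pipeline with the following stages:

A $match stage that filters for documents whose cuisine field contains the value "Bakery".
A $group stage that groups the matching documents by the borough field and accumulates a count of documents for each distinct value.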
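Conceptually, these two stages do the same work as the following client-side Python loop, shown here over a few hypothetical in-memory documents. This sketch only illustrates the semantics; the real aggregation, shown next, does this work on the server, which avoids transferring every matching document to the client.

```python
from collections import Counter

# Hypothetical sample documents; real documents have many more fields.
restaurants = [
    {"name": "A", "cuisine": "Bakery", "borough": "Queens"},
    {"name": "B", "cuisine": "Pizza", "borough": "Queens"},
    {"name": "C", "cuisine": "Bakery", "borough": "Bronx"},
    {"name": "D", "cuisine": "Bakery", "borough": "Queens"},
]

# $match: keep only documents whose cuisine is "Bakery".
bakeries = [doc for doc in restaurants if doc.get("cuisine") == "Bakery"]

# $group on borough with {"$sum": 1}: count documents per distinct borough.
# A missing borough is counted under None, similar to $group's null key.
counts = Counter(doc.get("borough") for doc in bakeries)

for borough, count in counts.items():
    print({"_id": borough, "count": count})
```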

# Define an aggregation pipeline with a match stage and a group stage
pipeline = [
    { "$match": { "cuisine": "Bakery" } },
    { "$group": { "_id": "$borough", "count": { "$sum": 1 } } }
]
# Execute the aggregation
aggCursor = collection.aggregate(pipeline)
# Print the aggregated results
for document in aggCursor:
    print(document)

The preceding code example produces output similar to the following:

{'_id': 'Bronx', 'count': 71}
{'_id': 'Brooklyn', 'count': 173}
{'_id': 'Missing', 'count': 2}
{'_id': 'Manhattan', 'count': 221}
{'_id': 'Queens', 'count': 204}
{'_id': 'Staten Island', 'count': 20}
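The order of the preceding output is not guaranteed, because the $group stage does not order its output documents. If you need a stable order, one option is to append a $sort stage, as in the following sketch; sorting by count in descending order with _id as a tiebreaker is an illustrative choice.

```python
# Same stages as the preceding example, plus a $sort stage.
pipeline = [
    {"$match": {"cuisine": "Bakery"}},
    {"$group": {"_id": "$borough", "count": {"$sum": 1}}},
    # Largest counts first; ties are broken by borough name.
    {"$sort": {"count": -1, "_id": 1}},
]

# With a live `collection` (not defined here), you would run:
#   for document in collection.aggregate(pipeline):
#       print(document)
```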

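Explain an Aggregation

To view information about how MongoDB executes your operation, you can instruct MongoDB to explain it. When MongoDB explains an operation, it returns execution plans and performance statistics. An execution plan is a potential way that MongoDB can complete an operation. When you instruct MongoDB to explain an operation, it returns both the plan MongoDB executed and any rejected execution plans.

To explain an aggregation operation, you can use either the PyMongoExplain library or a database command. The following examples show each approach.

To use PyMongoExplain, first install the pymongoexplain library by using pip, as shown in the following example:

python3 -m pip install pymongoexplain

The following code example uses PyMongoExplain to run the preceding aggregation example and prints the explanation that MongoDB returns: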

from pymongoexplain import ExplainableCollection

# Define an aggregation pipeline with a match stage and a group stage
pipeline = [
    { "$match": { "cuisine": "Bakery" } },
    { "$group": { "_id": "$borough", "count": { "$sum": 1 } } }
]
# Execute the operation and print the explanation
result = ExplainableCollection(collection).aggregate(pipeline)
print(result)

...
'winningPlan': {'queryPlan': {'stage': 'GROUP',
                                      'planNodeId': 3,
                                      'inputStage': {'stage': 'COLLSCAN',
                                                     'planNodeId': 1,
                                                     'filter': {'cuisine': {'$eq': 'Bakery'}},
                                                     'direction': 'forward'}},
                                                    ...

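To use a database command instead, run the aggregate command with the explain option. The following code example runs the preceding aggregation example by calling database.command() and prints the explanation that MongoDB returns:

# Define an aggregation pipeline with a match stage and a group stage
pipeline = [
    { "$match": { "cuisine": "Bakery" } },
    { "$group": { "_id": "$borough", "count": { "$sum": 1 } } }
]
# Execute the operation and print the explanation.
# "collection" is the name of the target collection; replace it with
# the name of your collection.
result = database.command("aggregate", "collection", pipeline=pipeline, explain=True)
print(result)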

...
'command': {'aggregate': 'collection',
  'pipeline': [{'$match': {'cuisine': 'Bakery'}},
               {'$group': {'_id': '$borough',
                           'count': {'$sum': 1}}}],
  'explain': True,
...
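To control how much detail the explanation includes, one approach is to wrap the aggregate command in MongoDB's explain command, which accepts a verbosity field set to "queryPlanner", "executionStats", or "allPlansExecution". The following sketch builds that command document; build_explain_command is a hypothetical helper, and the collection name "restaurants" is taken from the sample dataset used earlier.

```python
def build_explain_command(collection_name, pipeline, verbosity="queryPlanner"):
    """Hypothetical helper: wrap an aggregate command in an explain command."""
    return {
        # The command name must be the first key in the command document.
        "explain": {
            "aggregate": collection_name,
            "pipeline": pipeline,
            "cursor": {},
        },
        # One of "queryPlanner", "executionStats", or "allPlansExecution".
        "verbosity": verbosity,
    }


pipeline = [
    {"$match": {"cuisine": "Bakery"}},
    {"$group": {"_id": "$borough", "count": {"$sum": 1}}},
]
command = build_explain_command("restaurants", pipeline, verbosity="executionStats")

# With a live Database object (not defined here), you would run:
#   result = database.command(command)
```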

Tip

You can use Python's pprint module to make explain results easier to read:

import pprint
...
pprint.pp(result)
