Schema Examples

This guide shows how to use PyMongoArrow schemas in some common situations.

Nested Data with a Schema

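When performing find or aggregate operations, you can provide a schema for nested data by using a PyArrow struct type. Sub-documents can contain field names that conflict with names in their parent documents. In the following example, both the top-level document and the prop sub-document have a start field, and the two fields have different types.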

>>> from pymongo import MongoClient
>>> from pymongoarrow.api import Schema, find_arrow_all
>>> from pyarrow import struct, field, int32
>>> coll = MongoClient().db.coll
>>> coll.insert_many(
...     [
...         {"start": "string", "prop": {"name": "foo", "start": 0}},
...         {"start": "string", "prop": {"name": "bar", "start": 10}},
...     ]
... )
>>> arrow_table = find_arrow_all(
...     coll, {}, schema=Schema({"start": str, "prop": struct([field("start", int32())])})
... )
>>> print(arrow_table)
pyarrow.Table
start: string
prop: struct<start: int32>
  child 0, start: int32
----
start: [["string","string"]]
prop: [
  -- is_valid: all not null
  -- child 0 type: int32
[0,10]]

You can apply the same schema when using pandas and NumPy.

>>> from pymongoarrow.api import find_pandas_all
>>> df = find_pandas_all(
...     coll, {}, schema=Schema({"start": str, "prop": struct([field("start", int32())])})
... )
>>> print(df)
    start           prop
0  string   {'start': 0}
1  string  {'start': 10}

Nested Data with Projections

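You can also use projections to flatten the data before it is passed to PyMongoArrow. The following example shows how to do this with a very simple nested document structure.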

>>> from pymongoarrow.types import ObjectIdType
>>> df = find_pandas_all(
...     coll,
...     {
...         "prop.start": {
...             "$gte": 0,
...             "$lte": 10,
...         }
...     },
...     projection={"propName": "$prop.name", "propStart": "$prop.start"},
...     schema=Schema({"_id": ObjectIdType(), "propStart": int, "propName": str}),
... )
>>> print(df)
                                 _id  propStart propName
0  b'c\xec2\x98R(\xc9\x1e@#\xcc\xbb'          0      foo
1  b'c\xec2\x98R(\xc9\x1e@#\xcc\xbc'         10      bar
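Projection values such as "$prop.name" are dotted field paths that the server resolves for each matching document. Aggregation expressions in find projections are supported starting in MongoDB 4.4. The sketch below is only a rough mental model and does not reflect how MongoDB or PyMongoArrow implement projections. Its hypothetical helpers, resolve_path and apply_flattening_projection, resolve each $-prefixed path in plain Python and keep _id, as a find projection does by default. The integer _id values stand in for ObjectIds.

```python
from typing import Any


def resolve_path(doc: dict, path: str) -> Any:
    """Hypothetical helper: follow a dotted path such as "prop.start" through nested dicts."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # simplification: MongoDB omits missing fields instead
        value = value[key]
    return value


def apply_flattening_projection(doc: dict, projection: dict) -> dict:
    """Hypothetical helper: map each new field name to the "$path" it references."""
    out = {"_id": doc.get("_id")}  # find projections include _id unless excluded
    for new_name, expr in projection.items():
        out[new_name] = resolve_path(doc, expr[1:])  # strip the leading "$"
    return out


docs = [
    {"_id": 1, "start": "string", "prop": {"name": "foo", "start": 0}},
    {"_id": 2, "start": "string", "prop": {"name": "bar", "start": 10}},
]
projection = {"propName": "$prop.name", "propStart": "$prop.start"}
for d in docs:
    print(apply_flattening_projection(d, projection))
# {'_id': 1, 'propName': 'foo', 'propStart': 0}
# {'_id': 2, 'propName': 'bar', 'propStart': 10}
```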

When performing aggregate operations, you can flatten the fields by using the $project stage, as shown in the following example.

>>> from pymongoarrow.api import aggregate_pandas_all
>>> df = aggregate_pandas_all(
...     coll,
...     pipeline=[
...         {"$match": {"prop.start": {"$gte": 0, "$lte": 10}}},
...         {
...             "$project": {
...                 "propStart": "$prop.start",
...                 "propName": "$prop.name",
...             }
...         },
...     ],
... )
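When several sub-fields need flattening, writing the $project stage by hand becomes repetitive. The hypothetical helper build_flatten_pipeline is not part of PyMongoArrow. It generates the same $match and $project stages as the example above from a parent field name, a list of sub-fields, and a range. Its camelCase naming rule is an assumption, chosen so that it reproduces propStart and propName.

```python
def build_flatten_pipeline(
    parent: str, subfields: list[str], range_field: str, lo: int, hi: int
) -> list[dict]:
    """Hypothetical helper: build $match + $project stages that flatten parent.<sub> fields."""
    match = {"$match": {f"{parent}.{range_field}": {"$gte": lo, "$lte": hi}}}
    # Name each output field parent + capitalized sub-field, e.g. "prop" + "Start".
    project = {
        "$project": {
            f"{parent}{sub[:1].upper()}{sub[1:]}": f"${parent}.{sub}" for sub in subfields
        }
    }
    return [match, project]


pipeline = build_flatten_pipeline("prop", ["start", "name"], "start", 0, 10)
print(pipeline)
```

The resulting list can be passed as pipeline=pipeline to aggregate_pandas_all, in the same way as the literal pipeline above.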
