$regexFindAll (aggregation)

Definition

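$regexFindAll: Provides regular expression (regex) pattern matching capability in aggregation expressions. The operator returns an array of documents that contains information on each match. If no match is found, it returns an empty array.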

Syntax

The $regexFindAll operator has the following syntax:

{ $regexFindAll: { input: <expression> , regex: <expression>, options: <expression> } }

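Field

Description

input

The string on which to apply the regex pattern. Can be a string or any valid expression that resolves to a string.

regex

The regex pattern to apply. Can be any valid expression that resolves to either a string or a regex pattern /<pattern>/. When using the regex /<pattern>/, you can also specify the regex options i and m (but not the s or x options):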

"pattern"
/<pattern>/
/<pattern>/<options>

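Alternatively, you can specify the regex options with the options field. To specify the s or x options, you must use the options field.

You cannot specify options in both the regex and the options field.

options

Optional. The following <options> are available for use with regular expressions.

You cannot specify options in both the regex and the options field.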

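Option	Description
`i`	Case-insensitive matching, so that the pattern matches both upper and lower case. You can specify this option in the `options` field or as part of the regex field.
`m`	For patterns that include anchors (e.g. `^` for the start, `$` for the end), match at the beginning or end of each line for strings with multiline values. Without this option, these anchors match at the beginning or end of the string. If the pattern contains no anchors or if the string value has no newline characters (e.g. `\n`), the `m` option has no effect.
`x`	"Extended" capability to ignore all white space characters in the pattern unless they are escaped or included in a character class. Additionally, it ignores characters between an unescaped hash (`#`) character and the next new line, so that you can include comments in complicated patterns. This only applies to data characters; white space characters may never appear within special character sequences in a pattern. The `x` option does not affect the handling of the VT character (i.e. code 11). You can specify this option only in the `options` field.
`s`	Allows the dot character (i.e. `.`) to match all characters, including newline characters. You can specify this option only in the `options` field.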

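Returns

The operator returns an array:

If the operator does not find a match, it returns an empty array.
If the operator finds a match, it returns an array of documents, each of which contains the following information:
- the matching string in the input,
- the code point index (not the byte index) of the matching string in the input, and
- an array that corresponds to the groups captured by the matching string. Capture groups are specified with unescaped parentheses () in the regex pattern.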
```
[ { "match" : <string>, "idx" : <num>, "captures" : <array of strings> }, ... ]
```
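
As a rough illustration of this result shape, the following plain JavaScript sketch approximates what the operator returns for a single input string. It is not MongoDB code: `regexFindAllSketch` is a hypothetical helper, and it relies on JavaScript's own regex engine rather than PCRE2, so edge cases may differ.

```javascript
// Hypothetical helper: approximates the $regexFindAll result shape with
// JavaScript's RegExp engine (not PCRE2, so edge cases may differ).
function regexFindAllSketch(input, regex) {
  // matchAll requires the global flag; add it if the caller left it out.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);

  return Array.from(input.matchAll(globalRegex), (m) => ({
    match: m[0],
    // m.index counts UTF-16 code units; convert it to a code point index.
    idx: Array.from(input.slice(0, m.index)).length,
    // Groups that did not take part in the match become null.
    captures: m.slice(1).map((c) => (c === undefined ? null : c)),
  }));
}

console.log(regexFindAllSketch("Carol", /(C(ar)*)ol/));
console.log(regexFindAllSketch("Colleen", /(C(ar)*)ol/));
console.log(regexFindAllSketch("Luna", /(C(ar)*)ol/));
```

The sketch returns one document per match and an empty array when nothing matches, mirroring the two cases described above.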

Behavior

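PCRE Library

Starting in version 6.1, MongoDB uses the PCRE2 (Perl Compatible Regular Expressions) library to implement regular expression pattern matching. To learn more about PCRE2, see the PCRE documentation.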

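`$regexFindAll` and Collation

$regexFindAll ignores the collation specified for the collection, for db.collection.aggregate(), and for the index, if one is used.

For example, create a sample collection with collation strength 1 (i.e. compare base characters only and ignore other differences such as case and diacritics):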

db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )

Insert the following documents:

db.myColl.insertMany([
   { _id: 1, category: "café" },
   { _id: 2, category: "cafe" },
   { _id: 3, category: "cafE" }
])

Using the collection's collation, the following operation performs a case-insensitive and diacritic-insensitive match:

db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )

The operation returns the following 3 documents:

{ "_id" : 1, "category" : "café" }
{ "_id" : 2, "category" : "cafe" }
{ "_id" : 3, "category" : "cafE" }

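However, the aggregation expression $regexFindAll ignores collation; that is, the following regex pattern matching examples are case-sensitive and diacritic-sensitive: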

db.myColl.aggregate( [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ }  } } } ] )
db.myColl.aggregate(
   [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ }  } } } ],
   { collation: { locale: "fr", strength: 1 } }       // Ignored in the $regexFindAll
)

Both operations return the following:

{ "_id" : 1, "category" : "café", "results" : [ ] }
{ "_id" : 2, "category" : "cafe", "results" : [ { "match" : "cafe", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 3, "category" : "cafE", "results" : [ ] }

To perform case-insensitive regex pattern matching, use the i option instead. For an example, see the `i` Option section.

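`captures` Output Behavior

If your regex pattern contains capture groups and the pattern finds a match in the input, the captures array in the results corresponds to the groups captured by the matching string. Capture groups are specified with unescaped parentheses () in the regex pattern. The length of the captures array equals the number of capture groups in the pattern, and the order of its elements matches the order in which the capture groups appear.

Create a sample collection named contacts with the following documents: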

db.contacts.insertMany([
  { "_id": 1, "fname": "Carol", "lname": "Smith", "phone": "718-555-0113" },
  { "_id": 2, "fname": "Daryl", "lname": "Doe", "phone": "212-555-8832" },
  { "_id": 3, "fname": "Polly", "lname": "Andrews", "phone": "208-555-1932" },
  { "_id": 4, "fname": "Colleen", "lname": "Duncan", "phone": "775-555-0187" },
  { "_id": 5, "fname": "Luna", "lname": "Clarke", "phone": "917-555-4414" }
])

The following pipeline applies the regex pattern /(C(ar)*)ol/ to the fname field:

db.contacts.aggregate([
  {
    $project: {
      returnObject: {
        $regexFindAll: { input: "$fname", regex: /(C(ar)*)ol/ }
      }
    }
  }
])

The regex pattern finds a match in the fname values Carol and Colleen:

{ "_id" : 1, "returnObject" : [ { "match" : "Carol", "idx" : 0, "captures" : [ "Car", "ar" ] } ] }
{ "_id" : 2, "returnObject" : [ ] }
{ "_id" : 3, "returnObject" : [ ] }
{ "_id" : 4, "returnObject" : [ { "match" : "Col", "idx" : 0, "captures" : [ "C", null ] } ] }
{ "_id" : 5, "returnObject" : [ ] }

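The pattern contains the capture group (C(ar)*), which contains the nested group (ar). The elements in the captures array correspond to the two capture groups. If a group does not capture anything for a match (e.g. Colleen and the group (ar)), $regexFindAll replaces the group with a null placeholder.

As shown in the previous example, the captures array contains one element for each capture group (using null for groups that did not capture). Consider the following example, which searches for phone numbers with New York City area codes by applying a logical or of capture groups to the phone field. Each group represents a New York City area code: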

db.contacts.aggregate([
  {
    $project: {
      nycContacts: {
        $regexFindAll: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ }
      }
    }
  }
])

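For documents that the regex pattern matches, the captures array includes the capture group that matched and replaces each group that did not capture with null: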

{ "_id" : 1, "nycContacts" : [ { "match" : "718-555-0113", "idx" : 0, "captures" : [ "718", null, null ] } ] }
{ "_id" : 2, "nycContacts" : [ { "match" : "212-555-8832", "idx" : 0, "captures" : [ null, "212", null ] } ] }
{ "_id" : 3, "nycContacts" : [ ] }
{ "_id" : 4, "nycContacts" : [ ] }
{ "_id" : 5, "nycContacts" : [ { "match" : "917-555-4414", "idx" : 0, "captures" : [ null, null, "917" ] } ] }
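
Because exactly one alternative matches in this pattern, each captures array holds a single non-null element. As a small client-side sketch in plain JavaScript (not an aggregation stage), the matched area code could be picked out of result documents like those above; the variable names here are illustrative assumptions.

```javascript
// Result documents copied from the output above (a subset, for brevity).
const nycResults = [
  { _id: 1, nycContacts: [{ match: "718-555-0113", idx: 0, captures: ["718", null, null] }] },
  { _id: 3, nycContacts: [] },
  { _id: 5, nycContacts: [{ match: "917-555-4414", idx: 0, captures: [null, null, "917"] }] },
];

// For each match, keep the one capture that is not null (the area code).
const areaCodeSummary = nycResults.map((doc) => ({
  _id: doc._id,
  areaCodes: doc.nycContacts.map((m) => m.captures.find((c) => c !== null)),
}));

console.log(areaCodeSummary);
```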

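Examples

`$regexFindAll` and Its Options

To illustrate the behavior of the $regexFindAll operator discussed in this example, create a sample collection products with the following documents: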

db.products.insertMany([
   { _id: 1, description: "Single LINE description." },
   { _id: 2, description: "First lines\nsecond line" },
   { _id: 3, description: "Many spaces before     line" },
   { _id: 4, description: "Multiple\nline descriptions" },
   { _id: 5, description: "anchors, links and hyperlinks" },
   { _id: 6, description: "métier work vocation" }
])

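By default, $regexFindAll performs a case-sensitive match. For example, the following aggregation performs a case-sensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping: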

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/ } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{
   "_id" : 6,
   "description" : "métier work vocation",
   "returnObject" : [ ]
}

The following regex pattern /lin(e|k)/ specifies a grouping (e|k) in the pattern:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k)/ } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject": [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{
   "_id" : 6,
   "description" : "métier work vocation",
   "returnObject" : [ ]
}

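In the returned documents, the idx field is the code point index and not the byte index. To illustrate, consider the following example that uses the regex pattern /tier/: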

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /tier/ } } } }
])

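The operation returns the following, where only the last record matches the pattern and the returned idx is 2 (instead of 3, as it would be with a byte index):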

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation",
             "returnObject" : [ { "match" : "tier", "idx" : 2, "captures" : [ ] } ] }

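The difference between the two indexes comes from é occupying one code point but two bytes in UTF-8. The following plain JavaScript sketch (not MongoDB code) computes both offsets for the same string; it assumes the é is stored in its precomposed form, U+00E9.

```javascript
// "\u00e9" is the precomposed form of é, which takes two bytes in UTF-8.
const tierDescription = "m\u00e9tier work vocation";

// Position of the match in UTF-16 code units, then the text before it.
const unitIndex = tierDescription.indexOf("tier");
const prefix = tierDescription.slice(0, unitIndex);

// Code point index: count the code points that precede the match.
const codePointIndex = Array.from(prefix).length;
// Byte index: count the UTF-8 bytes that precede the match.
const byteIndex = new TextEncoder().encode(prefix).length;

console.log({ codePointIndex, byteIndex });
```
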
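`i` Option

Note

You cannot specify options in both the regex and the options field.

To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field: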

// Specify i as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/i } }
// Specify i in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "i" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "i" } }

For example, the following aggregation performs a case-insensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/i } } } }
])

The operation returns the following documents:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ { "match" : "LINE", "idx" : 7, "captures" : [ ] } ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }

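`m` Option

Note

You cannot specify options in both the regex and the options field.

To match the specified anchors (e.g. ^, $) on each line of a multiline string, include the m option as part of the regex field or in the options field: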

// Specify m as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/m } }
// Specify m in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "m" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "m" } }

The following example includes both the i and the m options to match, in multiline strings, lines that start with the letter s or S:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /^s/im } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ { "match" : "S", "idx" : 0, "captures" : [ ] } ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "s", "idx" : 12, "captures" : [ ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
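
The effect of the m option on the ^ anchor can be reproduced outside MongoDB. The following plain JavaScript sketch runs the same pattern with and without m against the second description; JavaScript's regex engine stands in for PCRE2 here, which is an assumption that holds for this simple pattern.

```javascript
const twoLineText = "First lines\nsecond line";

// Without m, ^ anchors only at the start of the whole string.
const withoutM = Array.from(twoLineText.matchAll(/^s/gi), (m) => m.index);
// With m, ^ also anchors right after each newline.
const withM = Array.from(twoLineText.matchAll(/^s/gim), (m) => m.index);

console.log(withoutM); // no match: the string starts with "F"
console.log(withM);    // one match, at the start of the second line
```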

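`x` Option

Note

You cannot specify options in both the regex and the options field.

To have the pattern ignore all unescaped white space characters and comments (denoted by an unescaped hash # character and the next newline character), include the x option in the options field: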

// Specify x in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "x" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "x" } }

The following example includes the x option to skip unescaped white space and comments:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }

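`s` Option

Note

You cannot specify options in both the regex and the options field.

To allow the dot character (i.e. .) in the pattern to match all characters, including the newline character, include the s option in the options field: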

// Specify s in the options field
{ $regexFindAll: { input: "$description", regex: /m.*line/, options: "s" } }
{ $regexFindAll: { input: "$description", regex: "m.*line", options: "s" } }

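The following example includes the s option, to allow the dot character (i.e. .) to match all characters including newlines, as well as the i option, to perform a case-insensitive match: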

db.products.aggregate([
   { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex:/m.*line/, options: "si"  } } } }
])

The operation returns the following:

{
   "_id" : 1,
   "description" : "Single LINE description.",
   "returnObject" : [ ]
}
{
   "_id" : 2,
   "description" : "First lines\nsecond line",
   "returnObject" : [ ]
}
{
   "_id" : 3,
   "description" : "Many spaces before     line",
   "returnObject" : [ { "match" : "Many spaces before     line", "idx" : 0, "captures" : [ ] } ]
}
{
   "_id" : 4,
   "description" : "Multiple\nline descriptions",
   "returnObject" : [ { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } ]
}
{
   "_id" : 5,
   "description" : "anchors, links and hyperlinks",
   "returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
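
The same dot-matches-newline behavior can be observed with JavaScript's s (dotAll) flag. The following plain JavaScript sketch is only an approximation of the PCRE2 behavior, applied to the fourth description.

```javascript
const multiLineText = "Multiple\nline descriptions";

// Without s, the dot stops at the newline, so "m.*line" cannot span lines.
const withoutS = multiLineText.match(/m.*line/i);
// With s, the dot also matches "\n", so the match crosses the line break.
const withS = multiLineText.match(/m.*line/is);

console.log(withoutS);                     // null
console.log(JSON.stringify(withS[0]));     // "Multiple\nline"
```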

Use `$regexFindAll` to Parse Email from String

Create a sample collection feedback with the following documents:

db.feedback.insertMany([
   { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com"  },
   { "_id" : 2, comment: "I wanted to concatenate a string" },
   { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
   { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])

The following aggregation uses $regexFindAll to extract all email addresses from the comment field (case-insensitive):

db.feedback.aggregate( [
    { $addFields: {
       "email": { $regexFindAll: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }
    } },
    { $set: { email: "$email.match"} }
] )

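First Stage

The stage uses the $addFields stage to add a new field email to the document. The new field is an array that contains the result of performing the $regexFindAll on the comment field: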

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ ] } ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } ] }

Second Stage

The stage uses the $set stage to reset the email array elements to the "email.match" values. If the current value of email is null, the new value of email is set to null.

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ "aunt.arc.tica@example.com" ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ "cam@mongodb.com", "c.dia@mongodb.com" ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : [ "fred@MongoDB.com" ] }

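Use Captured Groupings to Parse User Name

Create a sample collection feedback with the following documents (skip this step if the collection already exists from the previous example, because inserting the same _id values again fails with a duplicate key error):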

db.feedback.insertMany([
   { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com"  },
   { "_id" : 2, comment: "I wanted to concatenate a string" },
   { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
   { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])

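To reply to the feedback, assume you want to parse the local part of each email address to use as the name in the greeting. Using the captures field returned in the $regexFindAll results, you can parse out the local part of each email address: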

db.feedback.aggregate( [
    { $addFields: {
       "names": { $regexFindAll: { input: "$comment", regex: /([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } },
    } },
    { $set: { names: { $reduce: { input:  "$names.captures", initialValue: [ ], in: { $concatArrays: [ "$$value", "$$this" ] } } } } }
] )

First Stage

The stage uses the $addFields stage to add a new field names to the document. The new field contains the result of performing the $regexFindAll on the comment field:

{
   "_id" : 1,
   "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
   "names" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ "aunt.arc.tica" ] } ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
   "_id" : 3,
   "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
   "names" : [
      { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ "cam" ] },
      { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ "c.dia" ] }
    ]
}
{
   "_id" : 4,
   "comment" : "It's just me. I'm testing.  fred@MongoDB.com",
   "names" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ "fred" ] } ]
}

Second Stage

The stage uses the $set stage with the $reduce operator to reset names to an array that contains the "$names.captures" elements:

{
   "_id" : 1,
   "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
   "names" : [ "aunt.arc.tica" ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
   "_id" : 3,
   "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
   "names" : [ "cam", "c.dia" ]
}
{
   "_id" : 4,
   "comment" : "It's just me. I'm testing.  fred@MongoDB.com",
   "names" : [ "fred" ]
}
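
To make the $reduce step more concrete, the following plain JavaScript sketch (not an aggregation stage) performs the equivalent flattening on the first-stage value of names for the document with _id 3. The variable names are illustrative assumptions.

```javascript
// First-stage value of "names" for _id 3, copied from the output above.
const namesField = [
  { match: "cam@mongodb.com", idx: 56, captures: ["cam"] },
  { match: "c.dia@mongodb.com", idx: 75, captures: ["c.dia"] },
];

// "$names.captures" resolves to an array holding each match's captures array.
const capturesPerMatch = namesField.map((n) => n.captures);

// Like $reduce with $concatArrays: start from [ ] and append each array.
const flattenedNames = capturesPerMatch.reduce(
  (value, current) => value.concat(current),
  []
);

console.log(flattenedNames);
```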

Tip

See also:

For more information on the behavior of the captures array and additional examples, see `captures` Output Behavior.

$regexFind
