$regexFindAll (aggregation)
Definition
Syntax
The $regexFindAll operator has the following syntax:
{ $regexFindAll: { input: <expression> , regex: <expression>, options: <expression> } }
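Field | Description
---|---
input | The string on which to apply the regex pattern. Can be a string or any valid expression that resolves to a string.
regex | The regex pattern to apply. Can be a regex literal (/<pattern>/) or a string, as in the examples on this page.
options | Optional. The options i, m, x, and s are covered in the examples on this page. You cannot specify options in both the regex and the options fields.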
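Returns
The operator returns an array. If the pattern finds no match, the array is empty. Otherwise, the array contains one document per match with the fields match (the matching string), idx (the code point index of the match in the input), and captures (an array of the strings captured by the groups in the pattern).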
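Behavior
PCRE Library
Starting in version 6.1, MongoDB uses the PCRE2 (Perl Compatible Regular Expressions) library to implement regular expression pattern matching. To learn more about PCRE2, see the PCRE documentation.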
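$regexFindAll and Collation
$regexFindAll ignores the collation specified for the collection, for db.collection.aggregate(), and for the index, if used.
For example, create a sample collection with collation strength 1 (that is, compare base characters only and ignore other differences such as case and diacritics):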
db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )
Insert the following documents:
db.myColl.insertMany([
  { _id: 1, category: "café" },
  { _id: 2, category: "cafe" },
  { _id: 3, category: "cafE" }
])
Using the collection's collation, the following operation performs a case-insensitive and diacritic-insensitive match:
db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )
The operation returns the following 3 documents:
{ "_id" : 1, "category" : "café" }
{ "_id" : 2, "category" : "cafe" }
{ "_id" : 3, "category" : "cafE" }
However, the aggregation expression $regexFindAll ignores collation; that is, the following regex pattern matching examples are case-sensitive and diacritic-sensitive:
db.myColl.aggregate( [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ] )
db.myColl.aggregate(
  [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ],
  { collation: { locale: "fr", strength: 1 } } // Ignored in the $regexFindAll
)
Both operations return the following:
{ "_id" : 1, "category" : "café", "results" : [ ] }
{ "_id" : 2, "category" : "cafe", "results" : [ { "match" : "cafe", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 3, "category" : "cafE", "results" : [ ] }
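The contrast between collation-aware equality and regex matching can be reproduced outside the database. The following is a minimal sketch in plain JavaScript (Node.js), not MongoDB code. It assumes that Intl.Collator with sensitivity "base" is a reasonable stand-in for a strength-1 collation and that the built-in RegExp engine is a stand-in for PCRE2; the variable names are illustrative only.

```javascript
// Sample values from the myColl collection above ("\u00e9" is "é").
const categories = ["caf\u00e9", "cafe", "cafE"];

// Assumption: sensitivity "base" approximates collation strength 1
// (case and diacritics are ignored when comparing).
const collator = new Intl.Collator("fr", { sensitivity: "base" });

// Collation-style equality: compare each value with "cafe".
const equalUnderCollation = categories.filter(
  (value) => collator.compare(value, "cafe") === 0
);

// Regex matching compares characters exactly, with no collation involved.
const matchedByRegex = categories.filter((value) => /cafe/.test(value));

console.log(equalUnderCollation); // values equal to "cafe" under the collator
console.log(matchedByRegex);      // values matched by /cafe/
```

To get a case-insensitive match from $regexFindAll itself, use the i option described later on this page rather than a collation.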
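captures Output Behavior
If your regex pattern contains capture groups and the pattern finds a match in the input, the captures array in the results corresponds to the groups captured by the matching string. Capture groups are specified with unescaped parentheses () in the regex pattern. The length of the captures array equals the number of capture groups in the pattern, and the order of the array matches the order in which the capture groups appear.
Create a sample collection named contacts with the following documents: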
db.contacts.insertMany([
  { "_id": 1, "fname": "Carol", "lname": "Smith", "phone": "718-555-0113" },
  { "_id": 2, "fname": "Daryl", "lname": "Doe", "phone": "212-555-8832" },
  { "_id": 3, "fname": "Polly", "lname": "Andrews", "phone": "208-555-1932" },
  { "_id": 4, "fname": "Colleen", "lname": "Duncan", "phone": "775-555-0187" },
  { "_id": 5, "fname": "Luna", "lname": "Clarke", "phone": "917-555-4414" }
])
The following pipeline applies the regex pattern /(C(ar)*)ol/ to the fname field:
db.contacts.aggregate([ { $project: { returnObject: { $regexFindAll: { input: "$fname", regex: /(C(ar)*)ol/ } } } } ])
The regex pattern finds a match in the fname values Carol and Colleen:
{ "_id" : 1, "returnObject" : [ { "match" : "Carol", "idx" : 0, "captures" : [ "Car", "ar" ] } ] }
{ "_id" : 2, "returnObject" : [ ] }
{ "_id" : 3, "returnObject" : [ ] }
{ "_id" : 4, "returnObject" : [ { "match" : "Col", "idx" : 0, "captures" : [ "C", null ] } ] }
{ "_id" : 5, "returnObject" : [ ] }
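The pattern contains the capture group (C(ar)*), which contains the nested group (ar). The elements in the captures array correspond to the two capture groups. If a group captures nothing in a matching document (for example, Colleen and the group (ar)), $regexFindAll replaces the group with a null placeholder.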
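The result shape described above can be approximated locally. This is a minimal sketch, assuming JavaScript's RegExp as a stand-in for PCRE2; regexFindAllSketch is a hypothetical helper written for illustration, not a MongoDB API.

```javascript
// Hypothetical helper: mimics the shape of the $regexFindAll result
// using JavaScript's RegExp (MongoDB itself uses PCRE2).
function regexFindAllSketch(input, pattern, flags = "") {
  // matchAll requires the g flag, so add it if it is missing.
  const regex = new RegExp(pattern, flags.includes("g") ? flags : flags + "g");
  return Array.from(input.matchAll(regex), (m) => ({
    match: m[0],
    // Convert the UTF-16 offset into a code point index.
    idx: Array.from(input.slice(0, m.index)).length,
    // Groups that captured nothing are undefined in JavaScript;
    // map them to null to mirror the placeholder described above.
    captures: m.slice(1).map((group) => (group === undefined ? null : group)),
  }));
}

console.log(JSON.stringify(regexFindAllSketch("Carol", "(C(ar)*)ol")));
console.log(JSON.stringify(regexFindAllSketch("Colleen", "(C(ar)*)ol")));
console.log(JSON.stringify(regexFindAllSketch("Daryl", "(C(ar)*)ol")));
```

For Colleen, the nested group (ar) repeats zero times, so its slot in captures holds null while the outer group still reports "C".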
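As the previous example shows, the captures array contains one element for each capture group (using null for groups that captured nothing). Consider the following example, which searches for phone numbers with New York City area codes by applying a logical or of capture groups to the phone field. Each group represents one New York City area code: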
db.contacts.aggregate([ { $project: { nycContacts: { $regexFindAll: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ } } } } ])
For documents that match the regex pattern, the captures array includes the capture group that matched and replaces every group that did not capture with null:
{ "_id" : 1, "nycContacts" : [ { "match" : "718-555-0113", "idx" : 0, "captures" : [ "718", null, null ] } ] }
{ "_id" : 2, "nycContacts" : [ { "match" : "212-555-8832", "idx" : 0, "captures" : [ null, "212", null ] } ] }
{ "_id" : 3, "nycContacts" : [ ] }
{ "_id" : 4, "nycContacts" : [ ] }
{ "_id" : 5, "nycContacts" : [ { "match" : "917-555-4414", "idx" : 0, "captures" : [ null, null, "917" ] } ] }
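When only the matched area code matters, one alternative (not part of the original example) is to place the alternation inside a single capture group, so that captures has exactly one element and no null placeholders. The sketch below defines such a pipeline as a plain JavaScript value and checks the capture layout locally with JavaScript's RegExp, which is assumed to behave like the server-side engine for this simple pattern. The variable names are illustrative.

```javascript
// Alternative pipeline: one capture group that contains the alternation.
const pipeline = [
  { $project: { nycContacts: { $regexFindAll: { input: "$phone", regex: /^(718|212|917).*/ } } } },
];

// Local check of the capture layout with the same pattern.
const regex = pipeline[0].$project.nycContacts.$regexFindAll.regex;
const phones = ["718-555-0113", "212-555-8832", "208-555-1932"];
const captures = phones.map((phone) => {
  const m = phone.match(regex);
  return m === null ? [] : m.slice(1);
});

console.log(captures); // one-element captures for matches, empty array otherwise
```

In mongosh, the same pipeline value could be passed to db.contacts.aggregate(pipeline).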
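Examples
$regexFindAll and Its Options
To illustrate the behavior of the $regexFindAll operator discussed in this example, create a sample collection products with the following documents: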
db.products.insertMany([
  { _id: 1, description: "Single LINE description." },
  { _id: 2, description: "First lines\nsecond line" },
  { _id: 3, description: "Many spaces before     line" },
  { _id: 4, description: "Multiple\nline descriptions" },
  { _id: 5, description: "anchors, links and hyperlinks" },
  { _id: 6, description: "métier work vocation" }
])
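By default, $regexFindAll performs a case-sensitive match. For example, the following aggregation performs a case-sensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping: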
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/ } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
The following regex pattern /lin(e|k)/ specifies a grouping (e|k) in the pattern:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k)/ } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ] }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
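In the returned results, the idx field is the code point index, not the byte index. To illustrate, consider the following example that uses the regex pattern /tier/: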
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /tier/ } } } } ])
The operation returns the following, where only the last document matches the pattern and the returned idx is 2 (instead of 3, which a byte index would give):
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ { "match" : "tier", "idx" : 2, "captures" : [ ] } ] }
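The difference between the two indexes comes from the UTF-8 encoding of é, which takes two bytes but counts as one code point. A minimal sketch in plain JavaScript, assuming the precomposed form of é (U+00E9) and the TextEncoder global available in Node.js and browsers:

```javascript
const input = "m\u00e9tier work vocation"; // "métier work vocation"
const m = input.match(/tier/);

// Code point index: count the code points before the match.
const codePointIdx = Array.from(input.slice(0, m.index)).length;

// UTF-8 byte index: count the encoded bytes before the match.
const byteIdx = new TextEncoder().encode(input.slice(0, m.index)).length;

console.log(codePointIdx, byteIdx); // 2 3
```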
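i Option
Note
You cannot specify options in both the regex and the options fields.
To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field: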
// Specify i as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/i } }
// Specify i in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "i" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "i" } }
For example, the following aggregation performs a case-insensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/i } } } } ])
The operation returns the following documents:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ { "match" : "LINE", "idx" : 7, "captures" : [ ] } ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
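Because options may appear in only one of the two places, application code that assembles the expression dynamically may want to check this before sending the pipeline. The following is a sketch of such a client-side check; buildRegexFindAll is a hypothetical helper, not a MongoDB API, and the error message is made up for illustration.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $regexFindAll expression
// and rejects options given in both places, mirroring the note above.
function buildRegexFindAll(input, regex, options) {
  const hasInlineFlags = regex instanceof RegExp && regex.flags !== "";
  if (hasInlineFlags && options !== undefined) {
    throw new Error("Specify options in either the regex or the options field, not both");
  }
  const spec = { input, regex };
  if (options !== undefined) spec.options = options;
  return { $regexFindAll: spec };
}

// Flags inside the regex literal only.
console.log(buildRegexFindAll("$description", /line/i));
// Flags in the options field only.
console.log(buildRegexFindAll("$description", "line", "i"));
```

Calling buildRegexFindAll("$description", /line/i, "i") throws in this sketch, since both places carry options.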
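m Option
Note
You cannot specify options in both the regex and the options fields.
To match the specified anchors (for example, ^ and $) for each line of a multiline string, include the m option as part of the regex field or in the options field: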
// Specify m as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/m } }
// Specify m in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "m" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "m" } }
The following example includes both the i and the m options to match lines that start with the letter s or S in multiline strings:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /^s/im } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ { "match" : "S", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "s", "idx" : 12, "captures" : [ ] } ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
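The effect of the m option on the ^ anchor can be seen in isolation. This is a minimal sketch using JavaScript's RegExp, whose m flag is assumed to behave like the PCRE2 option for newline-separated input; the g flag is added only because matchAll requires it.

```javascript
const description = "First lines\nsecond line";

// Without m, ^ matches only at the start of the whole string.
const withoutM = Array.from(description.matchAll(/^s/gi), (m) => m.index);

// With m, ^ also matches right after each newline.
const withM = Array.from(description.matchAll(/^s/gim), (m) => m.index);

console.log(withoutM, withM); // [] [ 12 ]
```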
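x Option
Note
You cannot specify options in both the regex and the options fields.
To ignore all unescaped white space characters and comments (denoted by the unescaped hash # character and the next newline character) in the pattern, include the x option in the options field: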
// Specify x in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "x" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "x" } }
The following example includes the x option to skip unescaped white space and comments:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ] }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
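JavaScript's RegExp has no x flag, so the effect of the option can only be approximated locally. The sketch below uses a hypothetical helper, stripExtended, that removes unescaped white space and # comments from a pattern string before compiling it. It is a simplification that does not treat character classes specially, and it is not how MongoDB or PCRE2 implements the option.

```javascript
// Hypothetical helper: a simplified emulation of the x option.
// It drops unescaped whitespace and "#" comments up to the next newline.
// Simplification: character classes such as [ #] are not treated specially.
function stripExtended(pattern) {
  let result = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      result += ch + pattern[i + 1]; // keep escaped characters as written
      i++;
    } else if (/\s/.test(ch)) {
      continue; // skip unescaped whitespace
    } else if (ch === "#") {
      while (i < pattern.length && pattern[i] !== "\n") i++; // skip the comment
    } else {
      result += ch;
    }
  }
  return result;
}

const stripped = stripExtended("lin(e|k) # matches line or link");
const regex = new RegExp(stripped, "g");
const indexes = Array.from(
  "anchors, links and hyperlinks".matchAll(regex),
  (m) => m.index
);

console.log(stripped, indexes); // lin(e|k) [ 9, 24 ]
```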
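s Option
Note
You cannot specify options in both the regex and the options fields.
To allow the dot character (that is, .) in the pattern to match all characters, including the newline character, include the s option in the options field: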
// Specify s in the options field
{ $regexFindAll: { input: "$description", regex: /m.*line/, options: "s" } }
{ $regexFindAll: { input: "$description", regex: "m.*line", options: "s" } }
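The following example includes the s option to allow the dot character (that is, .) to match all characters, including the newline character, as well as the i option to perform a case-insensitive match: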
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex:/m.*line/, options: "si" } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ { "match" : "Many spaces before     line", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
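The role of the s option in the result for document 4 can be checked in isolation. A minimal sketch with JavaScript's RegExp, whose s (dotAll) flag is assumed to match the behavior described here:

```javascript
const description = "Multiple\nline descriptions";

// Without s, the dot stops at the newline, so there is no match.
const withoutS = description.match(/m.*line/i);

// With s, the dot also matches the newline.
const withS = description.match(/m.*line/is);

console.log(withoutS);                   // null
console.log(JSON.stringify(withS[0]));   // "Multiple\nline"
```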
Use $regexFindAll to Parse Email from String
Create a sample collection feedback with the following documents:
db.feedback.insertMany([
  { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
  { "_id" : 2, comment: "I wanted to concatenate a string" },
  { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
  { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])
The following aggregation uses $regexFindAll to extract all emails from the comment field (case-insensitive):
db.feedback.aggregate( [ { $addFields: { "email": { $regexFindAll: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } } } }, { $set: { email: "$email.match"} } ] )
- First Stage
The stage uses the $addFields stage to add a new field email to the document. The new field is an array that contains the result of performing $regexFindAll on the comment field:
{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ ] } ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } ] }
- Second Stage
The stage uses the $set stage to reset the email array elements to the "email.match" value(s). If the current value of email is null, the new value of email is set to null.
{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ "aunt.arc.tica@example.com" ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ "cam@mongodb.com", "c.dia@mongodb.com" ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : [ "fred@MongoDB.com" ] }
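The two stages can be mirrored in plain JavaScript to see what each one contributes. This is a sketch only: it uses JavaScript's RegExp in place of the server-side engine, and the g flag is added because matchAll requires it.

```javascript
const comment =
  "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com";

// Same pattern as the pipeline, plus the g flag that matchAll requires.
const emailPattern = /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/gi;

// First stage equivalent: one { match, idx } object per match.
const matches = Array.from(comment.matchAll(emailPattern), (m) => ({
  match: m[0],
  idx: m.index, // equals the code point index for this ASCII-only input
}));

// Second stage equivalent: keep only the match value of each element,
// which is what the field path "$email.match" yields for an array of documents.
const emails = matches.map((entry) => entry.match);

console.log(emails); // [ 'cam@mongodb.com', 'c.dia@mongodb.com' ]
```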
Use Captured Groupings to Parse User Name
Create a sample collection feedback with the following documents:
db.feedback.insertMany([
  { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
  { "_id" : 2, comment: "I wanted to concatenate a string" },
  { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
  { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])
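To reply to the feedback, assume you want to parse the local part of each email address to use as the name in the greeting. Using the captures field returned in the $regexFindAll results, you can parse out the local part of each email address: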
db.feedback.aggregate( [ { $addFields: { "names": { $regexFindAll: { input: "$comment", regex: /([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }, } }, { $set: { names: { $reduce: { input: "$names.captures", initialValue: [ ], in: { $concatArrays: [ "$$value", "$$this" ] } } } } } ] )
- First Stage
The stage uses the $addFields stage to add a new field names to the document. The new field contains the result of performing $regexFindAll on the comment field:
{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "names" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ "aunt.arc.tica" ] } ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "names" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ "cam" ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ "c.dia" ] } ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "names" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ "fred" ] } ] }
- Second Stage
The stage uses the $set stage with the $reduce operator to reset names to an array that contains the "$names.captures" elements.
{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "names" : [ "aunt.arc.tica" ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "names" : [ "cam", "c.dia" ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "names" : [ "fred" ] }
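The $reduce step is needed because "$names.captures" resolves to an array of arrays, one inner array per match. The following plain JavaScript sketch mirrors that flattening for document 3; the data is copied from the first-stage output above, and the variable names are illustrative.

```javascript
// Shape of the names field after the first stage for document 3.
const names = [
  { match: "cam@mongodb.com", idx: 56, captures: ["cam"] },
  { match: "c.dia@mongodb.com", idx: 75, captures: ["c.dia"] },
];

// "$names.captures" resolves to an array of arrays: [["cam"], ["c.dia"]].
const capturesPerMatch = names.map((entry) => entry.captures);

// $reduce with $concatArrays folds those arrays into one flat array.
const flattened = capturesPerMatch.reduce(
  (value, current) => value.concat(current),
  [] // plays the role of initialValue
);

console.log(flattened); // [ 'cam', 'c.dia' ]
```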