$strLenBytes (聚合)

在本页

定义

行为
示例

定义

$strLenBytes

返回指定字符串中UTF-8编码的字节数。

$strLenBytes 具有以下操作表达式语法:

{ $strLenBytes: <string expression> }

参数可以是任何有效的表达式，只要它解析为字符串。有关表达式的更多信息，请参阅表达式运算符。

如果参数解析为 null 值或引用缺失的字段，则 $strLenBytes 返回错误。

行为

$strLenBytes 运算符计算字符串中UTF-8编码字节数，其中每个字符可能使用1到4个字节。

例如，US-ASCII字符使用一个字节编码。带重音符号的字符和额外的拉丁字母字符（英语字母表之外的拉丁字符）使用两个字节编码。中文、日文和韩文字符通常需要三个字节，而其他Unicode平面（表情符号、数学符号等）需要四个字节。

运算符 $strLenBytes 与 $strLenCP 运算符不同，后者计算指定字符串中码点的数量，而不管每个字符占用多少字节。

示例

结果

注意

{ $strLenBytes: "abcde" }

5

每个字符使用一个字节进行编码。

{ $strLenBytes: "Hello World!" }

12

每个字符使用一个字节进行编码。

{ $strLenBytes: "cafeteria" }

9

每个字符使用一个字节进行编码。

{ $strLenBytes: "cafétéria" }

11

é 使用两个字节进行编码。

{ $strLenBytes: "" }

0

空字符串返回 0。

{ $strLenBytes: "$€λG" }

7

€ 使用三个字节进行编码。 λ 使用两个字节。

{ $strLenBytes: "寿司" }

6

每个字符使用三个字节进行编码。

示例

单字节和多字节字符集

创建以下文档的 food 集合

db.food.insertMany(
 [
    { "_id" : 1, "name" : "apple" },
    { "_id" : 2, "name" : "banana" },
    { "_id" : 3, "name" : "éclair" },
    { "_id" : 4, "name" : "hamburger" },
    { "_id" : 5, "name" : "jalapeño" },
    { "_id" : 6, "name" : "pizza" },
    { "_id" : 7, "name" : "tacos" },
    { "_id" : 8, "name" : "寿司" }
 ]
)

以下操作使用 $strLenBytes 运算符来计算每个 name 值的长度

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "length": { $strLenBytes: "$name" }
      }
    }
  ]
)

该操作返回以下结果

{ "_id" : 1, "name" : "apple", "length" : 5 }
{ "_id" : 2, "name" : "banana", "length" : 6 }
{ "_id" : 3, "name" : "éclair", "length" : 7 }
{ "_id" : 4, "name" : "hamburger", "length" : 9 }
{ "_id" : 5, "name" : "jalapeño", "length" : 9 }
{ "_id" : 6, "name" : "pizza", "length" : 5 }
{ "_id" : 7, "name" : "tacos", "length" : 5 }
{ "_id" : 8, "name" : "寿司", "length" : 6 }

文档中具有 _id: 3 和 _id: 5 的记录分别包含一个带重音符号的字符（分别是 é 和 ñ），需要两个字节进行编码。具有 _id: 8 的文档包含两个使用三个字节进行编码的日语字符。这使得文档中 length 的长度大于 name 的字符数，对于具有 _id: 3、_id: 5 和 _id: 8 的文档。

提示

另请参阅

字符串比较（ASCII码）

字符串长度（代码点）