Case Insensitive Sorting in MongoDB

Last modified: March 12, 2022

1. Overview

By default, the MongoDB engine considers character case when sorting extracted data. It’s possible to execute case insensitive sorting queries by specifying Aggregations or Collations.

In this short tutorial, we’ll look at the two solutions using both MongoDB Shell and Java.

2. Setting up an Environment

First of all, we need to run a MongoDB server. Let’s use a Docker image:

$ docker run -d -p 27017:27017 --name example-mongo mongo:latest

This creates and starts a Docker container named “example-mongo” in the background and publishes port 27017 on the host. Now, we need to create a basic Mongo database with the data we need to test the solution.

First, let’s open a Mongo Shell inside the container:

$ docker exec -it example-mongo mongosh

Once we’re in the shell, let’s switch the context and enter the database named “sorting“:

> use sorting

Finally, let’s insert some data to try our sort operations on:

> db.users.insertMany([
  {name: "ben", surname: "ThisField" },
  {name: "aen", surname: "Does" },
  {name: "Aen", surname: "Not" },
  {name: "Ben", surname: "Matter" },
])

We’ve inserted similar values in some of the documents’ name fields. The only difference is the case of the first letter. At this point, the database is created and data inserted appropriately, so we’re ready for action.

3. Default Sorting

Let’s run the standard query without customization:

> db.getCollection('users').find({}).sort({name:1})

The returned data is ordered with case taken into account. By default, MongoDB compares strings using a simple binary comparison, so every uppercase letter sorts before every lowercase one. This means, for example, that the uppercase character “B” comes before the lowercase character “a”:

[
  {
    _id: ..., name: 'Aen', surname: 'Not'
  },
  {
    _id: ..., name: 'Ben', surname: 'Matter'
  },
  {
    _id: ..., name: 'aen', surname: 'Does'
  },
  {
    _id: ..., name: 'ben', surname: 'ThisField'
  }
]
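
The ordering above can be reproduced in memory with plain Java. The following is only a minimal sketch, assuming that String’s natural ordering (a comparison of UTF-16 code units) is a fair stand-in for MongoDB’s default comparison on these ASCII names. The class and method names are hypothetical, and no MongoDB API is involved:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only: it does not talk to MongoDB.
final class DefaultOrderSketch {

    // Sorts a copy of the names by String's natural order, which compares
    // UTF-16 code units, so 'A'..'Z' (65..90) come before 'a'..'z' (97..122).
    static List<String> sortByCodeUnit(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Same names as in the users collection, in insertion order
        System.out.println(sortByCodeUnit(Arrays.asList("ben", "aen", "Aen", "Ben")));
    }
}
```

Because the comparison only looks at character codes, both capitalized names end up ahead of both lowercase ones, which matches the shell output.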

Let’s now look at how we can make our sorts case-insensitive so that Ben and ben would appear together.

4. Case Insensitive Sorting in Mongo Shell

4.1. Sorting Using Collation

Let’s try using MongoDB Collation. Only available in MongoDB 3.4 and subsequent versions, it enables language-specific rules for string comparison.

The collation’s locale parameter, an ICU locale, determines the rules the database uses for sorting. Let’s use the “en” (English) locale:

> db.getCollection('users').find({}).collation({locale: "en"}).sort({name:1})

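With the “en” locale at its default strength of 3, case only acts as a tie-breaker (specifying strength: 2 would ignore it altogether), so this produces output where the names are clustered by letter: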

[
  {
    _id: ..., name: 'aen', surname: 'Does'
  },
  {
    _id: ..., name: 'Aen', surname: 'Not'
  },
  {
    _id: ..., name: 'ben', surname: 'ThisField'
  },
  {
    _id: ..., name: 'Ben', surname: 'Matter'
  }
]
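
To get a feel for how a collation strength changes string comparison, here is a rough in-memory analogy in plain Java. It relies on the JDK’s own java.text.Collator rather than on MongoDB or its ICU-based collation, so it is only a sketch of the idea. The class and method names are hypothetical, and SECONDARY strength is chosen here because, in the JDK collator, case is a tertiary difference:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only: JDK collation, not MongoDB's.
final class CollatorSketch {

    // English collator that ignores case: SECONDARY strength drops tertiary
    // differences, and case is a tertiary difference in java.text.Collator.
    static Collator caseInsensitiveEnglish() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.SECONDARY);
        return collator;
    }

    // Sorts a copy of the names; Collator implements Comparator<Object>.
    // The sort is stable, so names that compare as equal keep their input order.
    static List<String> sortIgnoringCase(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(caseInsensitiveEnglish());
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(caseInsensitiveEnglish().compare("ben", "Ben"));
        System.out.println(sortIgnoringCase(Arrays.asList("ben", "aen", "Aen", "Ben")));
    }
}
```

At this strength “ben” and “Ben” compare as equal, so their relative position comes only from the input order. The shell query above behaves differently in that respect, since at the default strength case still decides between otherwise equal names.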

4.2. Sorting Using Aggregation

Let’s now use an aggregation pipeline:

> db.getCollection('users').aggregate([{
        "$project": {
            "name": 1,
            "surname": 1,
            "lowerName": {
                "$toLower": "$name"
            }
        }
    },
    {
        "$sort": {
            "lowerName": 1
        }
    }
])

Using the $project stage, we add a lowerName field holding the lowercase version of the name field, and the $sort stage then sorts on that field. The result documents carry the additional field and come back in the desired order (documents that share the same lowerName, such as “aen” and “Aen”, may appear in either order relative to each other):

[
  {
    _id: ..., name: 'aen', surname: 'Does', lowerName: 'aen'
  },
  {
    _id: ..., name: 'Aen', surname: 'Not', lowerName: 'aen'
  },
  {
    _id: ..., name: 'ben', surname: 'ThisField', lowerName: 'ben'
  },
  {
    _id: ..., name: 'Ben', surname: 'Matter', lowerName: 'ben'
  }
]
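
The same “derive a lowercase key, then sort on it” idea can be sketched in plain Java. This is an in-memory illustration only, with hypothetical class and method names, and it uses the JDK’s String.toLowerCase as a stand-in for $toLower:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only: it does not talk to MongoDB.
final class LowerKeySketch {

    // Sorts a copy of the names by a derived lowercase key, mirroring the
    // $project (compute lowerName) + $sort (order by lowerName) pipeline.
    static List<String> sortByLowercaseKey(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        // Locale.ROOT keeps the lowercase mapping independent of the JVM's default locale.
        sorted.sort(Comparator.comparing((String name) -> name.toLowerCase(Locale.ROOT)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByLowercaseKey(Arrays.asList("ben", "aen", "Aen", "Ben")));
    }
}
```

Java’s list sort is stable, so names with the same key keep their input order here. MongoDB makes no such promise for documents with equal sort keys.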

5. Case Insensitive Sorting with Java

Let’s try to implement the same methods in Java.

5.1. Configuration Boilerplate Code

Let’s first add the mongo-java-driver dependency:

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.12.10</version>
</dependency>

Then, let’s connect using the MongoClient:

MongoClient mongoClient = new MongoClient();
MongoDatabase db = mongoClient.getDatabase("sorting");
MongoCollection<Document> collection = db.getCollection("users");

5.2. Sorting Using Collation in Java

Let’s see how it’s possible to implement the “Collation” solution in Java:

FindIterable<Document> nameDoc = collection.find().sort(ascending("name"))
  .collation(Collation.builder().locale("en").build());

Here, we built the collation using the “en” locale. Then, we passed the created Collation object to the collation method of the FindIterable object.

Next, let’s read the results one by one using the MongoCursor:

MongoCursor<Document> cursor = nameDoc.cursor();
// Expected order for the four documents inserted earlier
List<String> expectedNamesOrdering = Arrays.asList("aen", "Aen", "ben", "Ben");
List<String> actualNamesOrdering = new ArrayList<>();
while (cursor.hasNext()) {
    Document document = cursor.next();
    actualNamesOrdering.add(document.get("name").toString());
}
assertEquals(expectedNamesOrdering, actualNamesOrdering);

5.3. Sorting Using Aggregations in Java

We can also sort the collection using Aggregation. Let’s recreate our command-line version using the Java API.

First, we rely on the project method to create a Bson object. This object will also include the lowerName field that is computed by transforming every character of the name into lowercase using the Projections class:

Bson projectBson = project(
  Projections.fields(
    Projections.include("name","surname"),
    Projections.computed("lowerName", Projections.computed("$toLower", "$name"))));

Next, we pass the aggregate method a list containing the Bson from the previous snippet and the stage built by the sort method:

AggregateIterable<Document> nameDoc = collection.aggregate(
  Arrays.asList(projectBson,
  sort(Sorts.ascending("lowerName"))));

In this case, as in the previous one, we could easily read the results using the MongoCursor.

6. Conclusion

In this article, we’ve seen how to perform a simple case-insensitive sorting of a MongoDB collection.

We used the Aggregation and Collation approaches in the MongoDB shell. Finally, we translated those queries into a simple Java implementation using the mongo-java-driver library.

As always, the full source code of the article is available over on GitHub.
