Querying Couchbase with MapReduce Views

Last modified: January 31, 2017

1. Overview

In this tutorial, we will introduce some simple MapReduce views and demonstrate how to query them using the Couchbase Java SDK.

2. Maven Dependency

To work with Couchbase in a Maven project, import the Couchbase SDK into your pom.xml:

<dependency>
    <groupId>com.couchbase.client</groupId>
    <artifactId>java-client</artifactId>
    <version>2.4.0</version>
</dependency>

You can find the latest version on Maven Central.

3. MapReduce Views

In Couchbase, a MapReduce view is a type of index that can be used to query a data bucket. It is defined using a JavaScript map function and an optional reduce function.

3.1. The map Function

When a view is created, the map function runs once against each document in the bucket, and the results are stored in the bucket.

Once a view is created, the map function is run only against newly inserted or updated documents in order to update the view incrementally.

Because the map function’s results are stored in the data bucket, queries against a view exhibit low latencies.

Let’s look at an example of a map function that creates an index on the name field of all documents in the bucket whose type field is equal to “StudentGrade”:

function (doc, meta) {
    if(doc.type == "StudentGrade" && doc.name) {    
        emit(doc.name, null);
    }
}

The emit function tells Couchbase which data field(s) to store in the index key (first parameter) and what value (second parameter) to associate with the indexed document.

In this case, we are storing only the document name property in the index key. And since we are not interested in associating any particular value with each entry, we pass null as the value parameter.

As Couchbase processes the view, it creates an index of the keys that are emitted by the map function, associating each key with all documents for which that key was emitted.

For example, if three documents have the name property set to “John Doe”, then the index key “John Doe” would be associated with those three documents.
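
To make the key-to-documents association concrete, here is a minimal in-memory sketch in plain Java. It is only a model of the idea, not how Couchbase stores a view: the ViewIndexSketch class, its Doc type, and the document ids are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a view index; not part of the Couchbase SDK.
class ViewIndexSketch {

    // Minimal stand-in for a stored document (only the fields the map function reads).
    static final class Doc {
        final String id;
        final String type;
        final String name;

        Doc(String id, String type, String name) {
            this.id = id;
            this.type = type;
            this.name = name;
        }
    }

    // Mirrors the map function: emit doc.name for StudentGrade documents that have a name.
    static Map<String, List<String>> buildIndex(List<Doc> docs) {
        Map<String, List<String>> index = new TreeMap<>(); // keys are kept in sorted order
        for (Doc doc : docs) {
            if ("StudentGrade".equals(doc.type) && doc.name != null && !doc.name.isEmpty()) {
                index.computeIfAbsent(doc.name, k -> new ArrayList<>()).add(doc.id);
            }
        }
        return index;
    }

    // Returns the ids of all documents that emitted the given key.
    static List<String> lookup(Map<String, List<String>> index, String key) {
        return index.getOrDefault(key, Collections.emptyList());
    }

    // Made-up sample data: three grades for John Doe, one for Jane Roe, two skipped documents.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc("g1", "StudentGrade", "John Doe"),
            new Doc("g2", "StudentGrade", "Jane Roe"),
            new Doc("g3", "StudentGrade", "John Doe"),
            new Doc("g4", "StudentGrade", "John Doe"),
            new Doc("p1", "Profile", "John Doe"),   // wrong type, never emitted
            new Doc("g5", "StudentGrade", null));   // no name, never emitted
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = buildIndex(sampleDocs());
        System.out.println(lookup(index, "John Doe")); // prints [g1, g3, g4]
    }
}
```

The single key “John Doe” ends up associated with three document ids, while the document of another type and the one without a name never reach the index.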

3.2. The reduce Function

The reduce function performs aggregate calculations on the results of a map function. The Couchbase Admin UI provides an easy way to apply the built-in reduce functions “_count”, “_sum”, and “_stats” to your map function.

You can also write your own reduce functions for more complex aggregations. We will see examples of using the built-in reduce functions later in the tutorial.

4. Working With Views and Queries

4.1. Organizing the Views

Views are organized into one or more design documents per bucket. In theory, there is no limit to the number of views per design document. However, for optimal performance, it has been suggested that you limit each design document to fewer than ten views.

When you first create a view within a design document, Couchbase designates it as a development view. You can run queries against a development view to test its functionality. Once you are satisfied with the view, you would publish the design document, and the view becomes a production view.

4.2. Constructing Queries

In order to construct a query against a Couchbase view, you need to provide its design document name and view name to create a ViewQuery object:

ViewQuery query = ViewQuery.from("design-document-name", "view-name");

When executed, this query will return all rows of the view. We will see in later sections how to restrict the result set based on the key values.

To construct a query against a development view, you can apply the development() method when creating the query:

ViewQuery query 
  = ViewQuery.from("design-doc-name", "view-name").development();

4.3. Executing the Query

Once we have a ViewQuery object, we can execute the query to obtain a ViewResult:

ViewResult result = bucket.query(query);

4.4. Processing Query Results

And now that we have a ViewResult, we can iterate over the rows to get the document ids and/or content:

for(ViewRow row : result.allRows()) {
    JsonDocument doc = row.document();
    String id = doc.id();
    String json = doc.content().toString();
}

5. Sample Application

For the remainder of the tutorial, we will write MapReduce views and queries for a set of student grade documents having the following format, with grades constrained to the range 0 to 100:

{ 
    "type": "StudentGrade",
    "name": "John Doe",
    "course": "History",
    "hours": 3,
    "grade": 95
}

We will store these documents in the “baeldung-tutorial” bucket and all views in a design document named “studentGrades.” Let’s look at the code needed to open the bucket so that we can query it:

Bucket bucket = CouchbaseCluster.create("127.0.0.1")
  .openBucket("baeldung-tutorial");

6. Exact Match Queries

Suppose you want to find all student grades for a particular course or set of courses. Let’s write a view called “findByCourse” using the following map function:

function (doc, meta) {
    // "!= null" keeps a grade of 0, which a plain truthiness check would skip
    if(doc.type == "StudentGrade" && doc.course && doc.grade != null) {
        emit(doc.course, null);
    }
}

Note that in this simple view, we only need to emit the course field.

6.1. Matching on a Single Key

To find all grades for the History course, we apply the key method to our base query:

ViewQuery query 
  = ViewQuery.from("studentGrades", "findByCourse").key("History");

6.2. Matching on Multiple Keys

If you want to find all grades for Math and Science courses, you can apply the keys method to the base query, passing it an array of key values:

ViewQuery query = ViewQuery
  .from("studentGrades", "findByCourse")
  .keys(JsonArray.from("Math", "Science"));

7. Range Queries

In order to query for documents containing a range of values for one or more fields, we need a view that emits the field(s) we are interested in, and we must specify a lower and/or upper bound for the query.

Let’s take a look at how to perform range queries involving a single field and multiple fields.

7.1. Queries Involving a Single Field

To find all documents with a range of grade values regardless of the value of the course field, we need a view that emits only the grade field. Let’s write the map function for the “findByGrade” view:

function (doc, meta) {
    // "!= null" keeps a grade of 0, which a plain truthiness check would skip
    if(doc.type == "StudentGrade" && doc.grade != null) {
        emit(doc.grade, null);
    }
}

Let’s write a query in Java using this view to find all grades equivalent to a “B” letter grade (80 to 89 inclusive):

ViewQuery query = ViewQuery.from("studentGrades", "findByGrade")
  .startKey(80)
  .endKey(89)
  .inclusiveEnd(true);

Note that the start key value in a range query is always treated as inclusive.

And if all the grades are known to be integers, then the following query will yield the same results:

ViewQuery query = ViewQuery.from("studentGrades", "findByGrade")
  .startKey(80)
  .endKey(90)
  .inclusiveEnd(false);
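
The equivalence of the two queries above is easy to check with a small model. The following sketch treats the view as a sorted map from grade to the number of documents that emitted it, using java.util.TreeMap. The GradeRangeSketch class and its sample grades are assumptions made for illustration; a null bound stands for a bound that was not specified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of the "findByGrade" index; not part of the Couchbase SDK.
class GradeRangeSketch {

    // Builds a sorted index: grade -> number of documents that emitted that grade.
    static NavigableMap<Integer, Integer> buildIndex(int... grades) {
        NavigableMap<Integer, Integer> index = new TreeMap<>();
        for (int grade : grades) {
            index.merge(grade, 1, Integer::sum);
        }
        return index;
    }

    // Returns one row per matching document, in ascending key order.
    // The start key is always inclusive; the end key is inclusive only on request.
    static List<Integer> range(NavigableMap<Integer, Integer> index,
                               Integer startKey, Integer endKey, boolean inclusiveEnd) {
        NavigableMap<Integer, Integer> slice = index;
        if (startKey != null) {
            slice = slice.tailMap(startKey, true);
        }
        if (endKey != null) {
            slice = slice.headMap(endKey, inclusiveEnd);
        }
        List<Integer> rows = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : slice.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                rows.add(entry.getKey());
            }
        }
        return rows;
    }

    // Made-up sample grades, including a duplicate.
    static NavigableMap<Integer, Integer> sampleIndex() {
        return buildIndex(55, 79, 80, 85, 85, 89, 90, 97);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, Integer> index = sampleIndex();
        System.out.println(range(index, 80, 89, true));    // prints [80, 85, 85, 89]
        System.out.println(range(index, 80, 90, false));   // prints [80, 85, 85, 89]
        System.out.println(range(index, 90, null, true));  // prints [90, 97]
        System.out.println(range(index, null, 60, false)); // prints [55]
    }
}
```

Note that the two “B” ranges only agree because every grade in the sample is an integer; a grade of 89.5 would match the second range but not the first.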

To find all “A” grades (90 and above), we only need to specify the lower bound:

ViewQuery query = ViewQuery
  .from("studentGrades", "findByGrade")
  .startKey(90);

And to find all failing grades (below 60), we only need to specify the upper bound:

ViewQuery query = ViewQuery
  .from("studentGrades", "findByGrade")
  .endKey(60)
  .inclusiveEnd(false);

7.2. Queries Involving Multiple Fields

Now, suppose we want to find all students in a specific course whose grade falls into a certain range. This query requires a new view that emits both the course and grade fields.

With multi-field views, each index key is emitted as an array of values. Since our query involves a fixed value for course and a range of grade values, we will write the map function to emit each key as an array of the form [course, grade].

Let’s look at the map function for the view “findByCourseAndGrade”:

function (doc, meta) {
    // "!= null" keeps a grade of 0, which a plain truthiness check would skip
    if(doc.type == "StudentGrade" && doc.course && doc.grade != null) {
        emit([doc.course, doc.grade], null);
    }
}

When this view is populated in Couchbase, the index entries are sorted by course and grade. Here’s a subset of keys in the “findByCourseAndGrade” view shown in their natural sort order:

["History", 80]
["History", 90]
["History", 94]
["Math", 82]
["Math", 88]
["Math", 97]
["Science", 78]
["Science", 86]
["Science", 92]

Since the keys in this view are arrays, you would also use arrays of this format when specifying the lower and upper bounds of a range query against this view.

This means that in order to find all students who got a “B” grade (80 to 89) in the Math course, you would set the lower bound to:

["Math", 80]

and the upper bound to:

["Math", 89]

Let’s write the range query in Java:

ViewQuery query = ViewQuery
  .from("studentGrades", "findByCourseAndGrade")
  .startKey(JsonArray.from("Math", 80))
  .endKey(JsonArray.from("Math", 89))
  .inclusiveEnd(true);
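
One way to see why both bounds carry the course value is to model the index as a sorted set of [course, grade] pairs. The sketch below does this with java.util.TreeSet and the nine sample keys listed earlier. CompositeKeySketch and its Key class are hypothetical, and comparing course names with String.compareTo is a simplification that happens to give the same order as the view for these three names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of the "findByCourseAndGrade" index; not part of the Couchbase SDK.
class CompositeKeySketch {

    // A [course, grade] key, ordered by course first and by grade second.
    static final class Key implements Comparable<Key> {
        final String course;
        final int grade;

        Key(String course, int grade) {
            this.course = course;
            this.grade = grade;
        }

        @Override
        public int compareTo(Key other) {
            int byCourse = course.compareTo(other.course);
            return byCourse != 0 ? byCourse : Integer.compare(grade, other.grade);
        }

        @Override
        public String toString() {
            return course + ":" + grade;
        }
    }

    // The nine sample keys from the listing above (no duplicate keys, so a set is enough here).
    static NavigableSet<Key> sampleIndex() {
        String[] courses = {"History", "Math", "Science"};
        int[][] grades = {{80, 90, 94}, {82, 88, 97}, {78, 86, 92}};
        NavigableSet<Key> index = new TreeSet<>();
        for (int c = 0; c < courses.length; c++) {
            for (int grade : grades[c]) {
                index.add(new Key(courses[c], grade));
            }
        }
        return index;
    }

    // Inclusive start key; a null end key means "no upper bound".
    static List<String> range(NavigableSet<Key> index, Key startKey, Key endKey, boolean inclusiveEnd) {
        NavigableSet<Key> slice = index.tailSet(startKey, true);
        if (endKey != null) {
            slice = slice.headSet(endKey, inclusiveEnd);
        }
        List<String> rows = new ArrayList<>();
        for (Key key : slice) {
            rows.add(key.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableSet<Key> index = sampleIndex();
        // "B" grades in Math: both bounds name the course.
        System.out.println(range(index, new Key("Math", 80), new Key("Math", 89), true));
        // prints [Math:82, Math:88]

        // Without an upper bound, the scan runs past Math into Science.
        System.out.println(range(index, new Key("Math", 90), null, true));
        // prints [Math:97, Science:78, Science:86, Science:92]
    }
}
```

The second call shows what an open-ended range does on a compound key: every key that sorts after the start key is returned, regardless of course.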

If we want to find all students who received an “A” grade (90 and above) in Math, then we would write:

ViewQuery query = ViewQuery
  .from("studentGrades", "findByCourseAndGrade")
  .startKey(JsonArray.from("Math", 90))
  .endKey(JsonArray.from("Math", 100));

Note that because we are fixing the course value to “Math”, we have to include an upper bound with the highest possible grade value. Otherwise, our result set would also include all documents whose course value is lexicographically greater than “Math”.

And to find all failing Math grades (below 60):

ViewQuery query = ViewQuery
  .from("studentGrades", "findByCourseAndGrade")
  .startKey(JsonArray.from("Math", 0))
  .endKey(JsonArray.from("Math", 60))
  .inclusiveEnd(false);

Much like the previous example, we must specify a lower bound with the lowest possible grade. Otherwise, our result set would also include all grades where the course value is lexicographically less than “Math”.

Finally, to find the five highest Math grades (barring any ties), you can tell Couchbase to perform a descending sort and to limit the size of the result set:

ViewQuery query = ViewQuery
  .from("studentGrades", "findByCourseAndGrade")
  .descending()
  .startKey(JsonArray.from("Math", 100))
  .endKey(JsonArray.from("Math", 0))
  .inclusiveEnd(true)
  .limit(5);
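
The order of operations in a descending query can be sketched in plain Java: sort first, then keep the keys between startKey and endKey in that sort order, then apply the limit. DescendingQuerySketch, its Key class, and the sample grades below are hypothetical names and data chosen for illustration, not SDK code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of a descending range query with a limit; not part of the Couchbase SDK.
class DescendingQuerySketch {

    static final class Key {
        final String course;
        final int grade;

        Key(String course, int grade) {
            this.course = course;
            this.grade = grade;
        }

        @Override
        public String toString() {
            return course + ":" + grade;
        }
    }

    // Natural (ascending) order of [course, grade] keys.
    static final Comparator<Key> ASCENDING =
        Comparator.comparing((Key k) -> k.course).thenComparingInt(k -> k.grade);

    // Sorts first, then keeps keys from startKey to endKey in that order, then applies the limit.
    static List<String> query(List<Key> index, Key startKey, Key endKey,
                              boolean descending, int limit) {
        Comparator<Key> order = descending ? ASCENDING.reversed() : ASCENDING;
        return index.stream()
            .sorted(order)
            .filter(k -> order.compare(k, startKey) >= 0 && order.compare(k, endKey) <= 0)
            .limit(limit)
            .map(Key::toString)
            .collect(Collectors.toList());
    }

    // Made-up sample keys: seven Math grades plus one History and one Science grade.
    static List<Key> sampleIndex() {
        return Arrays.asList(
            new Key("History", 94), new Key("Math", 60), new Key("Math", 75),
            new Key("Math", 82), new Key("Math", 88), new Key("Math", 91),
            new Key("Math", 97), new Key("Math", 99), new Key("Science", 92));
    }

    public static void main(String[] args) {
        // Reversed bounds: start at the highest possible key, end at the lowest.
        System.out.println(query(sampleIndex(), new Key("Math", 100), new Key("Math", 0), true, 5));
        // prints [Math:99, Math:97, Math:91, Math:88, Math:82]

        // Ascending-style bounds on a descending query match nothing.
        System.out.println(query(sampleIndex(), new Key("Math", 0), new Key("Math", 100), true, 5));
        // prints []
    }
}
```

In this model, leaving the bounds in ascending order on a descending query produces an empty result, which is why the start and end keys have to be swapped.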

Note that when performing a descending sort, the startKey and endKey values are reversed: Couchbase applies the sort order first, then reads the index from startKey to endKey in that order, and only then applies the limit.

8. Aggregate Queries

A major strength of MapReduce views is that they are highly efficient for running aggregate queries against large datasets. In our student grades dataset, for example, we can easily calculate the following aggregates:

  • number of students in each course
  • sum of credit hours for each student
  • grade point average for each student across all courses

Let’s build a view and query for each of these calculations using built-in reduce functions.

8.1. Using the count() Function

First, let’s write the map function for a view to count the number of students in each course:

function (doc, meta) {
    if(doc.type == "StudentGrade" && doc.course && doc.name) {
        emit([doc.course, doc.name], null);
    }
}

We’ll call this view “countStudentsByCourse” and designate that it is to use the built-in “_count” function. And since we are only performing a simple count, we can still emit null as the value for each entry.

To count the number of students in each course:

ViewQuery query = ViewQuery
  .from("studentGrades", "countStudentsByCourse")
  .reduce()
  .groupLevel(1);

Extracting data from aggregate queries is different from what we’ve seen up to this point. Instead of extracting a matching Couchbase document for each row in the result, we are extracting the aggregate keys and results.

Let’s run the query and extract the counts into a java.util.Map:

ViewResult result = bucket.query(query);
Map<String, Long> numStudentsByCourse = new HashMap<>();
for(ViewRow row : result.allRows()) {
    JsonArray keyArray = (JsonArray) row.key();
    String course = keyArray.getString(0);
    long count = Long.valueOf(row.value().toString());
    numStudentsByCourse.put(course, count);
}
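
To see what “_count” combined with groupLevel(1) computes, we can reproduce the grouping in plain Java: each emitted key is a [course, name] array, and a group level of 1 means that only the first element decides which group a row belongs to. GroupLevelCountSketch and its sample keys are hypothetical and only model the result, not how Couchbase evaluates the reduce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "_count" with a group level of 1; not part of the Couchbase SDK.
class GroupLevelCountSketch {

    // Groups the emitted [course, name] keys by their first element and counts each group.
    static Map<String, Long> countAtGroupLevelOne(List<String[]> emittedKeys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String[] key : emittedKeys) {
            counts.merge(key[0], 1L, Long::sum);
        }
        return counts;
    }

    // Made-up keys as the "countStudentsByCourse" map function would emit them.
    static List<String[]> sampleKeys() {
        return Arrays.asList(
            new String[] {"History", "Jane Roe"},
            new String[] {"History", "John Doe"},
            new String[] {"Math", "Bob Ray"},
            new String[] {"Math", "Jane Roe"},
            new String[] {"Math", "John Doe"},
            new String[] {"Science", "Jane Roe"});
    }

    public static void main(String[] args) {
        System.out.println(countAtGroupLevelOne(sampleKeys()));
        // prints {History=2, Math=3, Science=1}
    }
}
```

This is also why the query code reads the course from position 0 of the key array: with a group level of 1, each result key is an array holding just the first element of the emitted key.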

8.2. Using the sum() Function

Next, let’s write a view that calculates the sum of each student’s credit hours attempted. We’ll call this view “sumHoursByStudent” and designate that it is to use the built-in “_sum” function:

function (doc, meta) {
    if(doc.type == "StudentGrade"
         && doc.name
         && doc.course
         && doc.hours) {
        emit([doc.name, doc.course], doc.hours);
    }
}

Note that when applying the “_sum” function, we have to emit the value to be summed (in this case, the number of credit hours) for each entry.

Let’s write a query to find the total number of credit hours for each student:

ViewQuery query = ViewQuery
  .from("studentGrades", "sumHoursByStudent")
  .reduce()
  .groupLevel(1);

And now, let’s run the query and extract the aggregated sums into a java.util.Map:

ViewResult result = bucket.query(query);
Map<String, Long> hoursByStudent = new HashMap<>();
for(ViewRow row : result.allRows()) {
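    // the view emits [name, course], so at group level 1 the key is a one-element array
    JsonArray keyArray = (JsonArray) row.key();
    String name = keyArray.getString(0);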
    long sum = Long.valueOf(row.value().toString());
    hoursByStudent.put(name, sum);
}

8.3. Calculating Grade Point Averages

Suppose we want to calculate each student’s grade point average (GPA) across all courses, using the conventional grade point scale based on the grades obtained and the number of credit hours that the course is worth (A=4 points per credit hour, B=3 points per credit hour, C=2 points per credit hour, and D=1 point per credit hour).

There is no built-in reduce function to calculate average values, so we’ll combine the output from two views to compute the GPA.

We already have the “sumHoursByStudent” view that sums the number of credit hours each student attempted. Now we need the total number of grade points each student earned.

Let’s create a view called “sumGradePointsByStudent” that calculates the number of grade points earned for each course taken. We’ll use the built-in “_sum” function to reduce the following map function:

function (doc, meta) {
    // "!= null" keeps a grade of 0, which a plain truthiness check would skip
    if(doc.type == "StudentGrade"
         && doc.name
         && doc.hours
         && doc.grade != null) {
        if(doc.grade >= 90) {
            emit(doc.name, 4*doc.hours);
        }
        else if(doc.grade >= 80) {
            emit(doc.name, 3*doc.hours);
        }
        else if(doc.grade >= 70) {
            emit(doc.name, 2*doc.hours);
        }
        else if(doc.grade >= 60) {
            emit(doc.name, doc.hours);
        }
        else {
            emit(doc.name, 0);
        }
    }
}

Now let’s query this view and extract the sums into a java.util.Map:

ViewQuery query = ViewQuery.from(
  "studentGrades",
  "sumGradePointsByStudent")
  .reduce()
  .groupLevel(1);
ViewResult result = bucket.query(query);

Map<String, Long> gradePointsByStudent = new HashMap<>();
for(ViewRow row : result.allRows()) {
    // this view emits doc.name as a plain string key
    String name = (String) row.key();
    long sum = Long.valueOf(row.value().toString());
    gradePointsByStudent.put(name, sum);
}

Finally, let’s combine the two Maps in order to calculate GPA for each student:

Map<String, Float> result = new HashMap<>();
for(Entry<String, Long> creditHoursEntry : hoursByStudent.entrySet()) {
    String name = creditHoursEntry.getKey();
    long totalHours = creditHoursEntry.getValue();
    long totalGradePoints = gradePointsByStudent.get(name);
    result.put(name, ((float) totalGradePoints / totalHours));
}
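
To check the arithmetic end to end without a running cluster, the whole calculation can be reproduced in plain Java. The sketch below mirrors the grade point scale of the map function and the division above. GpaSketch, its Grade class, and the sample data are hypothetical, and the result is a double rather than a float.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java check of the GPA arithmetic; not part of the Couchbase SDK.
class GpaSketch {

    // One student grade: only the fields the two views read.
    static final class Grade {
        final String name;
        final int hours;
        final int grade;

        Grade(String name, int hours, int grade) {
            this.name = name;
            this.hours = hours;
            this.grade = grade;
        }
    }

    // Same scale as the map function: A=4, B=3, C=2, D=1 points per credit hour, otherwise 0.
    static int gradePoints(int grade, int hours) {
        if (grade >= 90) return 4 * hours;
        if (grade >= 80) return 3 * hours;
        if (grade >= 70) return 2 * hours;
        if (grade >= 60) return hours;
        return 0;
    }

    // Sums hours and grade points per student (the two "_sum" views), then divides.
    static Map<String, Double> gpaByStudent(List<Grade> grades) {
        Map<String, Long> hoursByStudent = new TreeMap<>();
        Map<String, Long> gradePointsByStudent = new TreeMap<>();
        for (Grade g : grades) {
            hoursByStudent.merge(g.name, (long) g.hours, Long::sum);
            gradePointsByStudent.merge(g.name, (long) gradePoints(g.grade, g.hours), Long::sum);
        }
        Map<String, Double> gpa = new TreeMap<>();
        for (Map.Entry<String, Long> entry : hoursByStudent.entrySet()) {
            long totalHours = entry.getValue();
            long totalPoints = gradePointsByStudent.getOrDefault(entry.getKey(), 0L);
            gpa.put(entry.getKey(), (double) totalPoints / totalHours);
        }
        return gpa;
    }

    // Made-up sample: John Doe earns 12 + 12 points over 7 hours, Jane Roe 6 + 0 over 6 hours.
    static List<Grade> sampleGrades() {
        return Arrays.asList(
            new Grade("John Doe", 3, 95),
            new Grade("John Doe", 4, 85),
            new Grade("Jane Roe", 3, 72),
            new Grade("Jane Roe", 3, 58));
    }

    public static void main(String[] args) {
        Map<String, Double> gpa = gpaByStudent(sampleGrades());
        System.out.println(gpa.get("Jane Roe")); // prints 1.0
        System.out.println(gpa.get("John Doe")); // 24 points over 7 hours, about 3.43
    }
}
```

A failing grade contributes zero grade points but still counts toward the credit hours attempted, which is what pulls Jane Roe’s average down to 1.0 in this sample.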

9. Conclusion

We have demonstrated how to write some basic MapReduce views in Couchbase, how to construct and execute queries against those views, and how to extract the results.

The code presented in this tutorial can be found in the GitHub project.

You can learn more about MapReduce views and how to query them in Java at the official Couchbase developer documentation site.
