A Simple Tagging Implementation with Elasticsearch

Last modified: February 4, 2018

1. Overview

Tagging is a common design pattern that allows us to categorize and filter items in our data model.

In this article, we’ll implement tagging using Spring and Elasticsearch. We’ll be using both Spring Data and the Elasticsearch API.

We aren’t going to cover the basics of getting started with Elasticsearch and Spring Data here; those are worth exploring separately first.

2. Adding Tags

The simplest implementation of tagging is an array of strings. We can implement this by adding a new field to our data model like this:

@Document(indexName = "blog", type = "article")
public class Article {

    // ...

    @Field(type = FieldType.Keyword)
    private String[] tags;

    // ...
}

Notice the use of the Keyword field type. We only want exact matches of our tags to filter a result. This allows us to use similar but separate tags like elasticsearchIsAwesome and elasticsearchIsTerrible.

Analyzed fields would also return partial matches, which is the wrong behavior in this case.
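
To make the difference concrete, here is a rough plain-Java simulation of the two behaviors. It is only a sketch: the analyze method is a simplified stand-in that lowercases and splits on non-alphanumeric characters, not Elasticsearch’s actual analyzer, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; it does not call Elasticsearch.
final class TagMatchingSketch {

    // Keyword-style matching: the whole stored value must equal the search term.
    static boolean keywordMatch(String storedTag, String term) {
        return storedTag.equals(term);
    }

    // Simplified stand-in for an analyzer: lowercase the text and split it
    // into tokens on any run of non-alphanumeric characters.
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Analyzed-style matching: a hit as soon as the stored value and the
    // query share at least one token.
    static boolean analyzedMatch(String storedTag, String query) {
        Set<String> shared = analyze(storedTag);
        shared.retainAll(analyze(query));
        return !shared.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(keywordMatch("spring data", "spring"));   // false
        System.out.println(analyzedMatch("spring data", "spring"));  // true
    }
}
```

With the sample tag spring data, a search for spring matches in the analyzed simulation but not in the keyword one, which is the kind of partial hit we want to avoid for tags.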

3. Building Queries

Tags allow us to manipulate our queries in interesting ways. We can search across them like any other field, or we can use them to filter our results on match_all queries. We can also use them with other queries to tighten our results.

3.1. Searching Tags

The new tags field we created on our model is just like every other field in our index. We can search for any entity that has a specific tag like this:

@Query("{\"bool\": {\"must\": [{\"match\": {\"tags\": \"?0\"}}]}}")
Page<Article> findByTagUsingDeclaredQuery(String tag, Pageable pageable);

This example uses a Spring Data repository to construct our query, but we could just as easily use a RestTemplate to query the Elasticsearch cluster manually.

Similarly, we can use the Elasticsearch API:

boolQuery().must(termQuery("tags", "elasticsearch"));

Assume we use the following documents in our index:

[
    {
        "id": 1,
        "title": "Spring Data Elasticsearch",
        "authors": [ { "name": "John Doe" }, { "name": "John Smith" } ],
        "tags": [ "elasticsearch", "spring data" ]
    },
    {
        "id": 2,
        "title": "Search engines",
        "authors": [ { "name": "John Doe" } ],
        "tags": [ "search engines", "tutorial" ]
    },
    {
        "id": 3,
        "title": "Second Article About Elasticsearch",
        "authors": [ { "name": "John Smith" } ],
        "tags": [ "elasticsearch", "spring data" ]
    },
    {
        "id": 4,
        "title": "Elasticsearch Tutorial",
        "authors": [ { "name": "John Doe" } ],
        "tags": [ "elasticsearch" ]
    }
]

Now we can use this query:

Page<Article> articleByTags 
  = articleService.findByTagUsingDeclaredQuery("elasticsearch", PageRequest.of(0, 10));

// articleByTags will contain 3 articles [ 1, 3, 4]
assertThat(articleByTags, containsInAnyOrder(
 hasProperty("id", is(1)),
 hasProperty("id", is(3)),
 hasProperty("id", is(4)))
);

3.2. Filtering All Documents

A common design pattern is to create a Filtered List View in the UI that shows all entities, but also allows the user to filter based on different criteria.

Let’s say we want to return all articles filtered by whatever tag the user selects:

@Query("{\"bool\": {\"must\": " +
  "{\"match_all\": {}}, \"filter\": {\"term\": {\"tags\": \"?0\" }}}}")
Page<Article> findByFilteredTagQuery(String tag, Pageable pageable);

Once again, we’re using Spring Data to construct our declared query.

The query we’re using is split into two pieces. The first clause, must, holds the scoring query, which in this case is match_all. The second clause, filter, tells Elasticsearch which results to discard.

Here is how we use this query:

Page<Article> articleByTags =
  articleService.findByFilteredTagQuery("elasticsearch", PageRequest.of(0, 10));

// articleByTags will contain 3 articles [ 1, 3, 4]
assertThat(articleByTags, containsInAnyOrder(
  hasProperty("id", is(1)),
  hasProperty("id", is(3)),
  hasProperty("id", is(4)))
);

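It is important to realize that although this returns the same results as our example above, this query will perform better: the filter clause only checks whether each document matches, so Elasticsearch skips relevance scoring for it and can cache the result.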

3.3. Filtering Queries

Sometimes a search returns too many results to be usable. In that case, it’s nice to expose a filtering mechanism that can rerun the same search, just with the results narrowed down.

Here’s an example where we narrow down the articles an author has written, to just the ones with a specific tag:

@Query("{\"bool\": {\"must\": " + 
  "{\"match\": {\"authors.name\": \"?0\"}}, " +
  "\"filter\": {\"term\": {\"tags\": \"?1\" }}}}")
Page<Article> findByAuthorsNameAndFilteredTagQuery(
  String name, String tag, Pageable pageable);

Again, Spring Data is doing all the work for us.

Let’s also look at how to construct this query ourselves:

QueryBuilder builder = boolQuery().must(
  nestedQuery("authors", boolQuery().must(termQuery("authors.name", "doe")), ScoreMode.None))
  .filter(termQuery("tags", "elasticsearch"));

We can, of course, use this same technique to filter on any other field in the document. But tags lend themselves particularly well to this use case.

Here is how to use the above query:

SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(builder)
  .build();
List<Article> articles = 
  elasticsearchTemplate.queryForList(searchQuery, Article.class);

// articles contains [ 1, 4 ]
assertThat(articles, containsInAnyOrder(
 hasProperty("id", is(1)),
 hasProperty("id", is(4)))
);
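
To see why only articles 1 and 4 come back, we can replay the same logic against the sample documents in plain Java. This is a minimal in-memory sketch, not how Elasticsearch evaluates the query: the class, its methods, and the whole-word author check are hypothetical stand-ins for the scoring clause and the tag filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory replay of the author + tag query; no Elasticsearch involved.
final class FilteredSearchSketch {

    static final class Article {
        final int id;
        final List<String> authors;
        final List<String> tags;

        Article(int id, List<String> authors, List<String> tags) {
            this.id = id;
            this.authors = authors;
            this.tags = tags;
        }
    }

    // The four sample documents from section 3.1 (titles omitted).
    static List<Article> sampleArticles() {
        return Arrays.asList(
            new Article(1, Arrays.asList("John Doe", "John Smith"),
                Arrays.asList("elasticsearch", "spring data")),
            new Article(2, Arrays.asList("John Doe"),
                Arrays.asList("search engines", "tutorial")),
            new Article(3, Arrays.asList("John Smith"),
                Arrays.asList("elasticsearch", "spring data")),
            new Article(4, Arrays.asList("John Doe"),
                Arrays.asList("elasticsearch")));
    }

    // Stand-in for the scoring clause: true when any author name contains
    // the given lowercase term as a whole word.
    static boolean authorMatches(Article article, String lowercaseTerm) {
        for (String author : article.authors) {
            String[] words = author.toLowerCase(Locale.ROOT).split("\\s+");
            if (Arrays.asList(words).contains(lowercaseTerm)) {
                return true;
            }
        }
        return false;
    }

    // Stand-in for the filter clause: an exact, yes/no tag check with no score.
    static boolean tagMatches(Article article, String tag) {
        return article.tags.contains(tag);
    }

    // Returns the ids of articles that pass both the author clause and the tag filter.
    static List<Integer> search(String lowercaseAuthorTerm, String tag) {
        List<Integer> ids = new ArrayList<>();
        for (Article article : sampleArticles()) {
            if (authorMatches(article, lowercaseAuthorTerm) && tagMatches(article, tag)) {
                ids.add(article.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(search("doe", "elasticsearch")); // [1, 4]
    }
}
```

Article 2 is written by John Doe but lacks the elasticsearch tag, and article 3 has the tag but a different author, so the filter leaves only articles 1 and 4.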

4. Filter Context

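When we build a query, we need to differentiate between the Query Context and the Filter Context. Every Elasticsearch query runs in a Query Context by default, where clauses contribute to the relevance score, so this is the context we’ve been using all along. The Filter Context only answers whether a document matches and does not affect scoring.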

Not every query type supports the Filter Context. Therefore, if we want to filter on tags, we need to know which query types we can use.

The bool query has two ways to access the Filter Context. The first is the filter parameter, which we used above. The second is the must_not parameter, which also runs in the Filter Context and excludes the documents it matches.

The next query type we can filter with is constant_score. This is useful when we want to replace the Query Context with the results of the Filter and assign each result the same score.
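
As a rough illustration of the request shape, the helper below assembles the JSON body of a constant_score query that wraps a term filter on tags. The class and method names are hypothetical, and the sketch assumes the tag contains no characters that need JSON escaping. With the Elasticsearch Java API, the equivalent builder call should be constantScoreQuery(termQuery("tags", tag)).

```java
// Hypothetical helper that only builds a request body string; it sends nothing.
final class ConstantScoreBodySketch {

    // Builds a constant_score query whose filter is a term query on "tags".
    // Every matching document receives the given boost as its score.
    // Assumption: the tag needs no JSON escaping.
    static String constantScoreByTag(String tag, double boost) {
        return "{\"query\":{\"constant_score\":{"
            + "\"filter\":{\"term\":{\"tags\":\"" + tag + "\"}},"
            + "\"boost\":" + boost
            + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(constantScoreByTag("elasticsearch", 1.0));
    }
}
```

Because the term query sits under filter, no relevance score is computed for it; each hit simply gets the boost value.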

The last place we can filter on tags is the filter aggregation. It narrows the set of documents that an aggregation runs over, so we can compute aggregation results for just the articles that carry a given tag.
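
The following in-memory sketch mimics what such an aggregation reports for our sample documents. It is an assumption-laden illustration rather than Elasticsearch code: filterBucketCount plays the role of a filter aggregation, which yields a single bucket, while countsPerTag shows the per-tag grouping that a terms aggregation on tags would produce. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory analogue of tag aggregations over the sample documents.
final class TagAggregationSketch {

    // Tags of the four sample documents from section 3.1, in id order.
    static List<List<String>> sampleTags() {
        return Arrays.asList(
            Arrays.asList("elasticsearch", "spring data"),
            Arrays.asList("search engines", "tutorial"),
            Arrays.asList("elasticsearch", "spring data"),
            Arrays.asList("elasticsearch"));
    }

    // Filter-aggregation analogue: one bucket counting the documents
    // that pass the tag filter.
    static long filterBucketCount(String tag) {
        long count = 0;
        for (List<String> tags : sampleTags()) {
            if (tags.contains(tag)) {
                count++;
            }
        }
        return count;
    }

    // Terms-aggregation analogue: one bucket per distinct tag, sorted by tag name.
    static Map<String, Long> countsPerTag() {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> tags : sampleTags()) {
            for (String tag : tags) {
                counts.merge(tag, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(filterBucketCount("elasticsearch")); // 3
        System.out.println(countsPerTag());
        // {elasticsearch=3, search engines=1, spring data=2, tutorial=1}
    }
}
```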

5. Advanced Tagging

So far, we have only talked about tagging using the most basic implementation. The next logical step is to create tags that are themselves key-value pairs. This would allow us to get even fancier with our queries and filters.

For example, we could change our tag field into this:

@Field(type = FieldType.Nested)
private List<Tag> tags;

Then we’d just change our filters to use nestedQuery types.
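
The article does not define the Tag class, so the sketch below assumes a hypothetical one with key and value fields. It simulates in plain Java why the nested type matters: a nested match requires key and value to belong to the same tag, whereas a plain object array is flattened, so a key from one tag can pair with a value from another.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of nested versus flattened matching; no Elasticsearch involved.
final class NestedTagSketch {

    // Assumed shape of a key-value tag; the field names are not from the article.
    static final class Tag {
        final String key;
        final String value;

        Tag(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Nested-style matching: key and value must match within the same Tag.
    static boolean nestedMatch(List<Tag> tags, String key, String value) {
        for (Tag tag : tags) {
            if (tag.key.equals(key) && tag.value.equals(value)) {
                return true;
            }
        }
        return false;
    }

    // Flattened-style matching: keys and values are checked independently,
    // so they may come from different Tag objects.
    static boolean flattenedMatch(List<Tag> tags, String key, String value) {
        boolean keyFound = false;
        boolean valueFound = false;
        for (Tag tag : tags) {
            keyFound |= tag.key.equals(key);
            valueFound |= tag.value.equals(value);
        }
        return keyFound && valueFound;
    }

    public static void main(String[] args) {
        List<Tag> tags = Arrays.asList(
            new Tag("language", "java"), new Tag("level", "beginner"));
        System.out.println(nestedMatch(tags, "language", "beginner"));    // false
        System.out.println(flattenedMatch(tags, "language", "beginner")); // true
    }
}
```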

Once we understand how to use key-value pairs, it is a small step to using complex objects as our tags. Not many implementations will need a full object as a tag, but it’s good to know we have this option should we require it.

6. Conclusion

In this article, we’ve covered the basics of implementing tagging using Elasticsearch.

As always, examples can be found over on GitHub.
