Elasticsearch Queries with Spring Data

Last modified: March 18, 2016


1. Introduction


In a previous article, we demonstrated how to configure and use Spring Data Elasticsearch for a project. In this article, we will examine several query types offered by Elasticsearch and we’ll also talk about field analyzers and their impact on search results.

2. Analyzers


All stored string fields are, by default, processed by an analyzer. An analyzer consists of one tokenizer and several token filters, and is usually preceded by one or more character filters.


The default (standard) analyzer splits the string at common word separators (such as spaces or punctuation) and lowercases every token. It can also be configured to drop common English stop words, although recent Elasticsearch versions leave them in by default.
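To make the tokenizing step concrete, here is a minimal plain-Java sketch of what such an analyzer does to a string. It is only an approximation for illustration: the class AnalyzerSketch and its analyze method are hypothetical names, not part of Elasticsearch or Spring Data, and the real analyzer follows more detailed Unicode segmentation rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not an Elasticsearch or Spring Data class.
final class AnalyzerSketch {

    // Split on anything that is not a letter or digit, then lowercase each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) { // a leading separator yields one empty part
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [spring, data, elasticsearch]
        System.out.println(analyze("Spring Data Elasticsearch"));
    }
}
```

The tokens produced here are what a query is later compared against, which is why case and punctuation in the original string do not matter for an analyzed field.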


Elasticsearch can also be configured to regard a field as analyzed and not-analyzed at the same time.


For example, in an Article class, suppose we store the title field as a standard analyzed field. The same field with the suffix verbatim will be stored as a not-analyzed field:


@MultiField(
  mainField = @Field(type = Text, fielddata = true),
  otherFields = {
      @InnerField(suffix = "verbatim", type = Keyword)
  }
)
private String title;

Here, we apply the @MultiField annotation to tell Spring Data that we would like this field to be indexed in several ways. The main field will use the name title and will be analyzed according to the rules described above.

But we also provide a second annotation, @InnerField, which describes an additional indexing of the title field. We use FieldType.Keyword to indicate that we do not want an analyzer applied during this additional indexing, and that the value should be stored in a nested field with the suffix verbatim.


2.1. Analyzed Fields


Let’s look at an example. Suppose an article with the title “Spring Data Elasticsearch” is added to our index. The default analyzer will break up the string at the space characters and produce lowercase tokens: “spring”, “data”, and “elasticsearch”.

Now we may use any combination of these terms to match a document:


NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
  .withQuery(matchQuery("title", "elasticsearch data"))
  .build();

2.2. Non-analyzed Fields


A non-analyzed field is not tokenized, so it can only be matched as a whole when using match or term queries:


NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
  .withQuery(matchQuery("title.verbatim", "Second Article About Elasticsearch"))
  .build();

Using a match query, we may only search by the full title, which is also case-sensitive.
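The contrast between the two kinds of field can be sketched in plain Java. This is only a simplified model under stated assumptions: VerbatimMatchSketch and its methods are hypothetical names, whitespace splitting stands in for the analyzer, and the analyzed case uses the default "or" behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not an Elasticsearch or Spring Data class.
final class VerbatimMatchSketch {

    // Analyzed field: a hit if any query token equals a token of the stored value.
    static boolean matchesAnalyzed(String stored, String query) {
        Set<String> storedTokens = tokens(stored);
        for (String token : tokens(query)) {
            if (storedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Non-analyzed field: the whole string must be equal, including case.
    static boolean matchesVerbatim(String stored, String query) {
        return stored.equals(query);
    }

    // Whitespace splitting plus lowercasing stands in for the analyzer.
    private static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }
}
```

With the stored title “Second Article About Elasticsearch”, the analyzed comparison accepts a query such as "elasticsearch data", while the verbatim comparison rejects both a partial title and a lowercase copy of the full title.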


3. Match Query


A match query accepts text, numbers and dates.


There are three types of “match” query:

  • boolean
  • phrase
  • phrase_prefix

In this section, we will explore the boolean match query.


3.1. Matching With Boolean Operators


boolean is the default type of a match query; you can specify which boolean operator to use (or is the default):


NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
  .withQuery(matchQuery("title", "Search engines").operator(Operator.AND))
  .build();
SearchHits<Article> articles = elasticsearchTemplate()
  .search(searchQuery, Article.class, IndexCoordinates.of("blog"));

This query returns the article titled “Search engines” because the and operator requires both terms, and both appear in the title. But what happens if we search with the default (or) operator when only one of the terms matches?

NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
  .withQuery(matchQuery("title", "Engines Solutions"))
  .build();
SearchHits<Article> articles = elasticsearchTemplate()
  .search(searchQuery, Article.class, IndexCoordinates.of("blog"));
assertEquals(1, articles.getTotalHits());
assertEquals("Search engines", articles.getSearchHit(0).getContent().getTitle());

The “Search engines” article is still matched, but it will have a lower score because not all of the terms matched.


The scores of the individual matching terms add up to the total score of each resulting document.


There may be situations in which a document containing a rare term entered in the query will have higher rank than a document that contains several common terms.
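The difference between the two operators can be illustrated with a small plain-Java sketch. It models only whether a document matches and how many query terms were found, not the actual relevance score. BooleanMatchSketch, Op and the method names are hypothetical, and whitespace splitting stands in for the analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not an Elasticsearch or Spring Data class.
final class BooleanMatchSketch {

    enum Op { OR, AND }

    // Whitespace splitting plus lowercasing stands in for the analyzer.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Number of distinct query terms that occur in the title.
    static long matchedTerms(String title, String query) {
        Set<String> titleTokens = new HashSet<>(tokens(title));
        return tokens(query).stream().distinct().filter(titleTokens::contains).count();
    }

    // OR needs at least one matching term, AND needs all of them.
    static boolean matches(String title, String query, Op op) {
        long matched = matchedTerms(title, query);
        long total = tokens(query).stream().distinct().count();
        return op == Op.AND ? matched == total : matched > 0;
    }
}
```

For the title “Search engines”, the query "Engines Solutions" matches under OR with one of two terms found, and does not match under AND.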


3.2. Fuzziness


When the user makes a typo in a word, it is still possible to match it with a search by specifying a fuzziness parameter, which allows inexact matching.

For string fields, fuzziness means the edit distance: the number of one-character changes that need to be made to one string to make it the same as another string.


NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
  .withQuery(matchQuery("title", "spring date elasticsearch")
    .fuzziness(Fuzziness.ONE).prefixLength(3))
  .build();

The prefix_length parameter is used to improve performance. In this case, we require that the first three characters should match exactly, which reduces the number of possible combinations.
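As a rough illustration of how edit distance and a required prefix interact, here is a plain-Java sketch. It uses the classic Levenshtein distance, which is an assumption made for simplicity. Elasticsearch's own fuzzy matching is implemented differently and, by default, also counts a transposition of two adjacent characters as a single edit. FuzzySketch and its method names are hypothetical.

```java
// Hypothetical helper for illustration; not an Elasticsearch or Spring Data class.
final class FuzzySketch {

    // Classic Levenshtein distance: insertions, deletions and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // The first prefixLength characters must match exactly before the distance is checked.
    static boolean fuzzyMatches(String queryTerm, String indexedTerm, int maxEdits, int prefixLength) {
        String prefix = queryTerm.substring(0, Math.min(prefixLength, queryTerm.length()));
        if (!indexedTerm.startsWith(prefix)) {
            return false;
        }
        return editDistance(queryTerm, indexedTerm) <= maxEdits;
    }
}
```

Here "date" and "data" share the prefix "dat" and differ by one substitution, so they match with one allowed edit and a prefix length of three. "date" and "gate" are also one edit apart but fail the prefix check.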


5. Phrase Search


Phrase search is stricter, although you can control it with the slop parameter. This parameter tells the phrase query how far apart terms are allowed to be while still considering the document a match.


In other words, it represents the number of times you need to move a term in order to make the query and document match:


NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
  .withQuery(matchPhraseQuery("title", "spring elasticsearch").slop(1))
  .build();

Here the query will match the document with the title “Spring Data Elasticsearch” because we set the slop to one.
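One way to picture slop is to measure how far each phrase term sits from its expected position in the document. The plain-Java sketch below is a simplified model under stated assumptions: each phrase term occurs at most once in the title, whitespace splitting stands in for the analyzer, and SlopSketch and its methods are hypothetical names. It is not how the search engine implements phrase matching internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not an Elasticsearch or Spring Data class.
final class SlopSketch {

    // Smallest slop for which the phrase matches, or -1 if a phrase term is missing.
    static int requiredSlop(String title, String phrase) {
        List<String> doc = Arrays.asList(title.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        String[] terms = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < terms.length; i++) {
            int pos = doc.indexOf(terms[i]);
            if (pos < 0) {
                return -1;
            }
            int shift = pos - i; // offset between document position and phrase position
            min = Math.min(min, shift);
            max = Math.max(max, shift);
        }
        return max - min; // all terms line up exactly when every shift is equal
    }

    static boolean matches(String title, String phrase, int slop) {
        int required = requiredSlop(title, phrase);
        return required >= 0 && required <= slop;
    }
}
```

For the title “Spring Data Elasticsearch” and the phrase "spring elasticsearch", the required slop in this model is one, so the phrase matches with slop(1) and fails with a slop of zero.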

6. Multi Match Query


When you want to search across multiple fields, you can use QueryBuilders#multiMatchQuery(), specifying all the fields to match:


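NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
  .withQuery(multiMatchQuery("tutorial", "title", "tags") // search text is illustrative
    .type(MultiMatchQueryBuilder.Type.BEST_FIELDS))
  .build();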

Here we search the title and tags fields for a match.


Notice that here we use the “best fields” scoring strategy. It will take the maximum score among the fields as a document score.
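The "maximum among the fields" rule can be written down in a few lines of plain Java. This is only a sketch of the rule as described here: the per-field scores are made-up inputs, BestFieldsSketch is a hypothetical name, and any tie-breaking contribution from the other fields is ignored.

```java
import java.util.Map;

// Hypothetical helper for illustration; not an Elasticsearch or Spring Data class.
final class BestFieldsSketch {

    // Document score = highest score among the searched fields (0.0 if none matched).
    static double bestFieldsScore(Map<String, Double> fieldScores) {
        return fieldScores.values().stream()
            .mapToDouble(Double::doubleValue)
            .max()
            .orElse(0.0);
    }
}
```

With hypothetical scores of 0.4 for title and 1.2 for tags, the document score under this rule is 1.2.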

7. Aggregations


In our Article class we have also defined a tags field, which is non-analyzed. We could easily create a tag cloud by using an aggregation.


Remember that, because the field is non-analyzed, the tags will not be tokenized:


TermsAggregationBuilder aggregation = AggregationBuilders.terms("top_tags")
  .field("tags");
SearchSourceBuilder builder = new SearchSourceBuilder().aggregation(aggregation);

SearchRequest searchRequest = 
  new SearchRequest().indices("blog").types("article").source(builder);
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

Map<String, Aggregation> results = response.getAggregations().asMap();
StringTerms topTags = (StringTerms) results.get("top_tags");

List<String> keys = topTags.getBuckets()
  .stream()
  .map(b -> b.getKeyAsString())
  .collect(toList());
assertEquals(asList("elasticsearch", "spring data", "search engines", "tutorial"), keys);
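Conceptually, a terms aggregation over a non-analyzed field counts how often each whole value occurs. The plain-Java sketch below mimics that on an in-memory list. TagCloudSketch is a hypothetical name, the sample tag lists are made up, and ordering by descending count with ties broken by key is an assumption of this sketch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not an Elasticsearch or Spring Data class.
final class TagCloudSketch {

    // Count whole tag values (no tokenizing) and list them by descending count, ties by key.
    static List<String> topTags(List<List<String>> tagsPerArticle) {
        Map<String, Long> counts = tagsPerArticle.stream()
            .flatMap(List::stream)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> sample = List.of(
            List.of("elasticsearch", "spring data"),
            List.of("elasticsearch", "search engines"),
            List.of("elasticsearch", "spring data", "tutorial"));
        // prints [elasticsearch, spring data, search engines, tutorial]
        System.out.println(topTags(sample));
    }
}
```

Because the values are never tokenized, a tag such as "spring data" is counted as one bucket rather than as separate "spring" and "data" buckets.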

8. Summary


In this article, we discussed the difference between analyzed and non-analyzed fields, and how this distinction affects search.


We also learned about several types of queries provided by Elasticsearch, such as the match query, phrase match query, full-text search query, and boolean query.


Elasticsearch provides many other types of queries, such as geo queries, script queries and compound queries. You can read about them in the Elasticsearch documentation and explore the Spring Data Elasticsearch API in order to use these queries in your code.

You can find a project containing the examples used in this article in the GitHub repository.