Introduction to Hibernate Search

Last modified: December 11, 2017

1. Overview

In this article, we’ll discuss the basics of Hibernate Search, show how to configure it, and implement some simple queries.

2. Basics of Hibernate Search

Whenever we have to implement full-text search functionality, using tools we’re already well-versed with is always a plus.

If we’re already using Hibernate and JPA for ORM, we’re only one step away from Hibernate Search.

Hibernate Search integrates Apache Lucene, a high-performance and extensible full-text search-engine library written in Java. This combines the power of Lucene with the simplicity of Hibernate and JPA.

Simply put, we just have to add some additional annotations to our domain classes, and the tool will take care of things like database/index synchronization.

Hibernate Search also provides an Elasticsearch integration; however, as it’s still in an experimental stage, we’ll focus on Lucene here.

3. Configurations

3.1. Maven Dependencies

Before getting started, we first need to add the necessary dependencies to our pom.xml:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>5.8.2.Final</version>
</dependency>

For the sake of simplicity, we’ll use H2 as our database:

<dependency>
    <groupId>com.h2database</groupId> 
    <artifactId>h2</artifactId>
    <version>1.4.196</version>
</dependency>

3.2. Configurations

We also have to specify where Lucene should store the index.

This can be done via the property hibernate.search.default.directory_provider.

We’ll choose filesystem, which is the most straightforward option for our use case. More options are listed in the official documentation. The filesystem-master/filesystem-slave and infinispan providers are noteworthy for clustered applications, where the index has to be synchronized between nodes.

We also have to define a default base directory where indexes will be stored:

hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /data/index/default

4. The Model Classes

After the configuration, we’re now ready to specify our model.

On top of the JPA annotations @Entity and @Table, we have to add an @Indexed annotation. It tells Hibernate Search that the Product entity should be indexed.

After that, we have to define the required attributes as searchable by adding a @Field annotation:

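import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TermVector;

@Entity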
@Indexed
@Table(name = "product")
public class Product {

    @Id
    private int id;

    @Field(termVector = TermVector.YES)
    private String productName;

    @Field(termVector = TermVector.YES)
    private String description;

    @Field
    private int memory;

    // getters, setters, and constructors
}

The termVector = TermVector.YES attribute will be required for the “More Like This” query later.

5. Building the Lucene Index

Before running any actual queries, we have to trigger the initial build of the Lucene index:

FullTextEntityManager fullTextEntityManager 
  = Search.getFullTextEntityManager(entityManager);
fullTextEntityManager.createIndexer().startAndWait();
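
To make "building the index" more concrete, here is a toy inverted index in plain Java. It only sketches the idea Lucene is built on, a map from each analyzed term to the documents that contain it. It does not reproduce Lucene's actual data structures. The analyze() method is a hypothetical, heavily simplified analyzer that splits on non-alphanumeric characters and lowercases. ToyInvertedIndex is likewise a hypothetical name.

```java
import java.util.*;

// Toy inverted index: each analyzed term maps to the IDs of the documents that contain it.
// A conceptual sketch only; Lucene's index is far more elaborate.
final class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Hypothetical, simplified analyzer: split on non-alphanumeric characters, then lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Adds every term of one field value to the postings of the given document.
    void add(int docId, String fieldText) {
        for (String term : analyze(fieldText)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Keyword lookup: analyze the search word the same way, then read its postings.
    Set<Integer> lookup(String keyword) {
        List<String> terms = analyze(keyword);
        if (terms.isEmpty()) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(
            postings.getOrDefault(terms.get(0), Collections.emptySet()));
    }
}
```

Because the same analysis runs at indexing time and at query time, a lowercase keyword such as "iphone" can find a product named "iPhone X". This assumes the configured analyzer lowercases, as Lucene's StandardAnalyzer does.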

After this initial build, Hibernate Search will take care of keeping the index up to date. That is, we can create, manipulate, and delete entities via the EntityManager as usual.

Note: we have to make sure that entities are fully committed to the database before they can be discovered and indexed by Lucene. (This is also why, in our example code, the initial test data import comes in a dedicated JUnit test case annotated with @Commit.)

6. Building and Executing Queries

Now, we’re ready to create our first query.

In the following section, we’ll show the general workflow for preparing and executing a query.

After that, we’ll create some example queries for the most important query types.

6.1. General Workflow for Creating and Executing a Query

Preparing and executing a query in general consists of four steps:

In step 1, we have to get a JPA FullTextEntityManager and from that a QueryBuilder:

FullTextEntityManager fullTextEntityManager 
  = Search.getFullTextEntityManager(entityManager);

QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory() 
  .buildQueryBuilder()
  .forEntity(Product.class)
  .get();

In step 2, we create a Lucene query via the Hibernate Search query DSL:

org.apache.lucene.search.Query query = queryBuilder
  .keyword()
  .onField("productName")
  .matching("iphone")
  .createQuery();

In step 3, we’ll wrap the Lucene query into a Hibernate query:

org.hibernate.search.jpa.FullTextQuery jpaQuery
  = fullTextEntityManager.createFullTextQuery(query, Product.class);

Finally, in step 4 we’ll execute the query:

List<Product> results = jpaQuery.getResultList();

Note: by default, Lucene sorts the results by relevance.
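
As a rough illustration of relevance ordering, the sketch below ranks documents for a single term. It uses a simplified TF-IDF score: the square root of the term frequency, times 1 + ln(N / (docFreq + 1)). The formula is loosely modeled on Lucene's classic TF-IDF similarity, but it leaves out field-length norms, boosts, and query normalization. Treat it as an approximation of the idea, not of Lucene's actual scores. RelevanceSketch is a hypothetical name.

```java
import java.util.*;
import java.util.stream.Collectors;

// Simplified TF-IDF ranking for a single query term (no norms, boosts or query normalization).
final class RelevanceSketch {
    static double score(List<String> docTerms, String term, int numDocs, int docFreq) {
        long freq = docTerms.stream().filter(term::equals).count();
        double tf = Math.sqrt(freq);                                    // repeats count, with diminishing returns
        double idf = 1.0 + Math.log(numDocs / (double) (docFreq + 1));  // rarer terms weigh more
        return tf * idf;
    }

    // Returns the indices of matching documents, most relevant first.
    static List<Integer> rank(List<List<String>> docs, String term) {
        int docFreq = (int) docs.stream().filter(d -> d.contains(term)).count();
        Map<Integer, Double> scores = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            double s = score(docs.get(i), term, docs.size(), docFreq);
            if (s > 0) {
                scores.put(i, s);  // documents without the term are not hits
            }
        }
        return scores.entrySet().stream()
            .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```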

Steps 1, 3 and 4 are the same for all query types.

In the following, we will focus on step 2, i.e., how to create different types of queries.

6.2. Keyword Queries

The most basic use case is searching for a specific word.

This is what we actually did already in the previous section:

Query keywordQuery = queryBuilder
  .keyword()
  .onField("productName")
  .matching("iphone")
  .createQuery();

Here, keyword() specifies that we are looking for one specific word, onField() tells Lucene where to look, and matching() tells it what to look for.

6.3. Fuzzy Queries

Fuzzy queries work like keyword queries, except that we can define a limit of “fuzziness” up to which Lucene still accepts two terms as matching.

With withEditDistanceUpTo(), we can define how much a term may deviate from the other. It can be set to 0, 1, or 2, and the default value is 2 (note: this limitation comes from Lucene’s implementation).

With withPrefixLength(), we can define the length of a prefix that is excluded from the fuzziness, i.e., that has to match exactly:

Query fuzzyQuery = queryBuilder
  .keyword()
  .fuzzy()
  .withEditDistanceUpTo(2)
  .withPrefixLength(0)
  .onField("productName")
  .matching("iPhaen")
  .createQuery();
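
To see what the edit distance means for the "iPhaen" example, here is a sketch of the matching rule. It assumes that Lucene's fuzzy matching counts a swap of two adjacent characters as a single edit (optimal string alignment distance). Under that rule, "iphaen" reaches "iphone" in 2 edits: one substitution (a→o) plus one transposition (en→ne). Plain Levenshtein distance would give 3. The lowercasing step stands in for the analyzer. FuzzySketch, osaDistance(), and fuzzyMatches() are hypothetical helpers, not Hibernate Search API.

```java
import java.util.Locale;

// Sketch of a fuzzy term match: exact prefix, then a bounded edit distance on the rest.
final class FuzzySketch {
    // Optimal string alignment distance: insert, delete, substitute, or swap two
    // adjacent characters, each costing one edit.
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);  // adjacent transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // maxEdits mirrors withEditDistanceUpTo(), prefixLength mirrors withPrefixLength().
    static boolean fuzzyMatches(String query, String term, int maxEdits, int prefixLength) {
        String q = query.toLowerCase(Locale.ROOT);  // stand-in for the analyzer
        String t = term.toLowerCase(Locale.ROOT);
        int p = Math.min(prefixLength, q.length());
        if (!t.startsWith(q.substring(0, p))) {
            return false;                           // the prefix must match exactly
        }
        return osaDistance(q.substring(p), t.substring(p)) <= maxEdits;
    }
}
```

With a prefix length of 3, only terms starting with "iph" are compared at all. This is also why a prefix length makes fuzzy queries cheaper on large indexes.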

6.4. Wildcard Queries

Hibernate Search also enables us to execute wildcard queries, i.e., queries for which a part of a word is unknown.

For this, we can use “?” for a single character, and “*” for any character sequence:

Query wildcardQuery = queryBuilder
  .keyword()
  .wildcard()
  .onField("productName")
  .matching("Z*")
  .createQuery();
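
The wildcard semantics can be sketched with a small dynamic-programming matcher. Here "?" consumes exactly one character and "*" consumes any sequence, including the empty one. The pattern is matched against a whole indexed term, not against the full field text. The matcher is case-sensitive on purpose. As far as we know, Hibernate Search does not pass wildcard patterns through the analyzer, so a pattern such as "Z*" may not match terms that the analyzer has lowercased. WildcardSketch.matches() is a hypothetical helper.

```java
// Sketch of wildcard matching against a single indexed term.
final class WildcardSketch {
    // dp[i][j] is true when the first i pattern characters match the first j term characters.
    static boolean matches(String pattern, String term) {
        int p = pattern.length();
        int t = term.length();
        boolean[][] dp = new boolean[p + 1][t + 1];
        dp[0][0] = true;
        for (int i = 1; i <= p; i++) {
            // a leading run of '*' can match the empty string
            dp[i][0] = pattern.charAt(i - 1) == '*' && dp[i - 1][0];
        }
        for (int i = 1; i <= p; i++) {
            char c = pattern.charAt(i - 1);
            for (int j = 1; j <= t; j++) {
                if (c == '*') {
                    dp[i][j] = dp[i - 1][j] || dp[i][j - 1];  // '*' matches nothing, or one more char
                } else if (c == '?' || c == term.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];              // '?' or a literal consumes one char
                }
            }
        }
        return dp[p][t];
    }
}
```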

6.5. Phrase Queries

If we want to search for more than one word, we can use phrase queries. We can look for exact or approximate sentences using phrase() and, if necessary, withSlop(). The slop factor defines the number of other words permitted in between the words of the sentence:

Query phraseQuery = queryBuilder
  .phrase()
  .withSlop(1)
  .onField("description")
  .sentence("with wireless charging")
  .createQuery();
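
The slop can be pictured as follows. Take the position of each phrase word in the document and subtract the word's offset within the phrase. Then measure how far apart the smallest and largest of those values are. With slop 0, all words must appear exactly in phrase order; with slop 1, one extra word may appear in between, and so on. The sketch below is a simplified model of sloppy phrase matching under stated assumptions: tokens are already analyzed, no stop words are removed, and repeated phrase words get no special treatment. PhraseSlopSketch is a hypothetical name.

```java
import java.util.*;

// Simplified model of sloppy phrase matching over pre-tokenized text.
final class PhraseSlopSketch {
    // Smallest slop that lets the phrase match the document, or -1 if a phrase word is missing.
    static int requiredSlop(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        List<List<Integer>> positions = new ArrayList<>();
        for (String word : phrase) {
            List<Integer> found = new ArrayList<>();
            for (int i = 0; i < doc.size(); i++) {
                if (doc.get(i).equals(word)) {
                    found.add(i);
                }
            }
            if (found.isEmpty()) {
                return -1;
            }
            positions.add(found);
        }
        return search(positions, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Tries every combination of occurrences (fine for a toy, exponential in general),
    // tracking the min and max of (position in document - offset in phrase).
    private static int search(List<List<Integer>> positions, int k, int min, int max) {
        if (k == positions.size()) {
            return max - min;
        }
        int best = Integer.MAX_VALUE;
        for (int pos : positions.get(k)) {
            int shifted = pos - k;
            best = Math.min(best,
                search(positions, k + 1, Math.min(min, shifted), Math.max(max, shifted)));
        }
        return best;
    }

    static boolean matches(List<String> doc, List<String> phrase, int slop) {
        int needed = requiredSlop(doc, phrase);
        return needed >= 0 && needed <= slop;
    }
}
```

In this model, reversing two adjacent words costs 2. This agrees with Lucene's documented behavior that switching the order of two words in a phrase needs a slop of 2.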

6.6. Simple Query String Queries

With the previous query types, we had to specify the query type explicitly.

If we want to give more power to the user, we can use simple query string queries: this way, users can define their own queries at runtime.

The following query types are supported:

  • boolean (AND using “+”, OR using “|”, NOT using “-”)
  • prefix (prefix*)
  • phrase (“some phrase”)
  • precedence (using parentheses)
  • fuzzy (fuzy~2)
  • near operator for phrase queries (“some phrase”~3)

The following example would combine fuzzy, phrase and boolean queries:

Query simpleQueryStringQuery = queryBuilder
  .simpleQueryString()
  .onFields("productName", "description")
  .matching("Aple~2 + \"iPhone X\" + (256 | 128)")
  .createQuery();

6.7. Range Queries

Range queries search for a value in between given boundaries. This can be applied to numbers, dates, timestamps, and strings:

Query rangeQuery = queryBuilder
  .range()
  .onField("memory")
  .from(64).to(256)
  .createQuery();
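
A range check is simple in itself. The sketch below spells out the boundary semantics we assume for from(64).to(256): both limits are inclusive unless one is explicitly excluded, which Hibernate Search's range DSL supports via excludeLimit(). It also shows a pitfall of string ranges: strings compare lexicographically, so "100" sorts before "64". RangeSketch.inRange() is a hypothetical helper.

```java
// Sketch of range matching with optionally exclusive bounds.
final class RangeSketch {
    static <T extends Comparable<? super T>> boolean inRange(
            T value, T from, T to, boolean excludeFrom, boolean excludeTo) {
        int lower = value.compareTo(from);
        int upper = value.compareTo(to);
        boolean aboveFrom = excludeFrom ? lower > 0 : lower >= 0;  // inclusive unless excluded
        boolean belowTo = excludeTo ? upper < 0 : upper <= 0;
        return aboveFrom && belowTo;
    }
}
```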

6.8. More Like This Queries

Our last query type is the “More Like This” query. For this, we provide an entity, and Hibernate Search returns a list of similar entities, each with a similarity score.

As mentioned before, the termVector = TermVector.YES attribute in our model class is required for this case: it tells Lucene to store the frequency for each term during indexing.

Based on this, the similarity will be calculated at query execution time:

Query moreLikeThisQuery = queryBuilder
  .moreLikeThis()
  .comparingField("productName").boostedTo(10f)
  .andField("description").boostedTo(1f)
  .toEntity(entity)
  .createQuery();
List<Object[]> results = (List<Object[]>) fullTextEntityManager
  .createFullTextQuery(moreLikeThisQuery, Product.class)
  .setProjection(ProjectionConstants.THIS, ProjectionConstants.SCORE)
  .getResultList();
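
Lucene-style "More Like This" works differently from the sketch below. It picks the most characteristic terms of the source document (weighted by TF-IDF) and turns them into a query, and the sketch does not reproduce that. It only illustrates, with a plain cosine similarity between term-frequency vectors, why per-term frequencies (what term vectors store) and the boosts set with boostedTo() influence the result. Field values are passed as arrays in the same order as the boosts. SimilaritySketch and its methods are hypothetical.

```java
import java.util.*;

// Toy "similar documents" score: per-field cosine similarity of term-frequency vectors,
// weighted by field boosts. Not Lucene's More Like This algorithm.
final class SimilaritySketch {
    // Term -> frequency within one field value (roughly what a term vector records).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Field i of source and candidate is compared and weighted by boosts[i].
    static double similarity(String[] source, String[] candidate, float[] boosts) {
        double score = 0;
        for (int i = 0; i < boosts.length; i++) {
            score += boosts[i] * cosine(termFrequencies(source[i]), termFrequencies(candidate[i]));
        }
        return score;
    }
}
```

With the boosts from the query above (10 for productName, 1 for description), a product from the same line can outrank one whose description is identical to the source's.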

6.9. Searching More Than One Field

Until now, we only created queries for searching one attribute, using onField().

Depending on the use case, we can also search two or more attributes:

Query luceneQuery = queryBuilder
  .keyword()
  .onFields("productName", "description")
  .matching(text)
  .createQuery();
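
Conceptually, and as a simplification, a keyword query over several fields behaves like one optional clause per field. A product is a hit if any field contains the term, and every matching field adds to the score, weighted by that field's boost (1.0 unless boosted). The sketch below models just that with a simple lowercasing tokenizer. It does not use Lucene's real scoring, and MultiFieldSketch is a hypothetical name.

```java
import java.util.*;

// Toy multi-field keyword match: each field that contains the keyword adds its boost.
final class MultiFieldSketch {
    static double score(Map<String, String> document, Map<String, Float> fieldBoosts, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        double score = 0;
        for (Map.Entry<String, Float> field : fieldBoosts.entrySet()) {
            String text = document.getOrDefault(field.getKey(), "");
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
                if (token.equals(needle)) {
                    score += field.getValue();  // count each matching field once
                    break;
                }
            }
        }
        return score;  // 0 means "not a hit"
    }
}
```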

Moreover, we can specify each attribute to be searched separately, e.g., if we want to define a boost for one attribute:

Query luceneQuery = queryBuilder
  .keyword()
  .onField("productName").boostedTo(3) // matches in productName weigh three times as much
  .andField("description")
  .matching(text)
  .createQuery();

6.10. Combining Queries

Finally, Hibernate Search also supports combining queries using various strategies:

  • SHOULD: the query should contain the matching elements of the subquery
  • MUST: the query must contain the matching elements of the subquery
  • MUST NOT: the query must not contain the matching elements of the subquery

These clauses are similar to the boolean operators AND, OR, and NOT. However, the names are different to emphasize that they also have an impact on the relevance.

For example, a SHOULD between two queries is similar to boolean OR: if one of the two queries has a match, this match will be returned.

However, if both queries match, the match will have a higher relevance compared to if only one query matches:

Query combinedQuery = queryBuilder
  .bool()
  .must(queryBuilder.keyword()
    .onField("productName").matching("apple")
    .createQuery())
  .must(queryBuilder.range()
    .onField("memory").from(64).to(256)
    .createQuery())
  .should(queryBuilder.phrase()
    .onField("description").sentence("face id")
    .createQuery())
  .must(queryBuilder.keyword()
    .onField("productName").matching("samsung")
    .createQuery())
  .not() // negates the preceding must(): products matching "samsung" are excluded
  .createQuery();
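
The matching rules of these clauses can be summarized in a small sketch. A document is rejected if any MUST clause fails or any MUST NOT clause matches. Otherwise, its score is the sum of the weights of the matching MUST and SHOULD clauses, and a document with no matching MUST or SHOULD clause is not a hit. The Phone type and exampleQuery() are hypothetical and mirror the combined query above. Real Lucene scores are computed differently; only the effect "more matching clauses, higher score" carries over.

```java
import java.util.*;
import java.util.function.Predicate;

// Toy model of MUST / SHOULD / MUST NOT combination with additive scoring.
final class BooleanSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause<T> {
        final Occur occur;
        final Predicate<T> matches;
        final double weight;

        Clause(Occur occur, Predicate<T> matches, double weight) {
            this.occur = occur;
            this.matches = matches;
            this.weight = weight;
        }
    }

    // Returns the document's score, or empty if the combined query rejects it.
    static <T> OptionalDouble score(T doc, List<Clause<T>> clauses) {
        double score = 0;
        boolean anyScoringMatch = false;
        for (Clause<T> clause : clauses) {
            boolean matched = clause.matches.test(doc);
            switch (clause.occur) {
                case MUST:
                    if (!matched) return OptionalDouble.empty();
                    score += clause.weight;
                    anyScoringMatch = true;
                    break;
                case MUST_NOT:
                    if (matched) return OptionalDouble.empty();
                    break;
                case SHOULD:
                    if (matched) {
                        score += clause.weight;  // optional, but raises relevance
                        anyScoringMatch = true;
                    }
                    break;
            }
        }
        // At least one MUST or SHOULD clause has to match; pure MUST NOT queries match nothing.
        return anyScoringMatch ? OptionalDouble.of(score) : OptionalDouble.empty();
    }

    // Hypothetical product type and a query mirroring combinedQuery above.
    static final class Phone {
        final String name;
        final int memory;
        final String description;

        Phone(String name, int memory, String description) {
            this.name = name;
            this.memory = memory;
            this.description = description;
        }
    }

    static List<Clause<Phone>> exampleQuery() {
        return Arrays.asList(
            new Clause<Phone>(Occur.MUST, p -> p.name.toLowerCase(Locale.ROOT).contains("apple"), 1.0),
            new Clause<Phone>(Occur.MUST, p -> p.memory >= 64 && p.memory <= 256, 1.0),
            new Clause<Phone>(Occur.SHOULD, p -> p.description.toLowerCase(Locale.ROOT).contains("face id"), 1.0),
            new Clause<Phone>(Occur.MUST_NOT, p -> p.name.toLowerCase(Locale.ROOT).contains("samsung"), 1.0));
    }
}
```

Here the SHOULD clause never decides whether an Apple phone is returned. It only lifts the phones whose description mentions Face ID above those whose description doesn't.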

7. Conclusion

In this article, we discussed the basics of Hibernate Search and showed how to implement the most important query types. More advanced topics can be found in the official documentation.

As always, the full source code of the examples is available over on GitHub.
