1. Overview
In this article, we’re going to dive into some key concepts related to full-text search engines, with a special focus on Elasticsearch.
As this is a Java-oriented article, we’re not going to give a detailed step-by-step tutorial on how to set up Elasticsearch or explain how it works under the hood. Instead, we’re going to focus on the Java client and how to use its main features: index, delete, get and search.
2. Setup
For the sake of simplicity, we’ll use a Docker image for our Elasticsearch instance, though any Elasticsearch instance listening on port 9200 will do.
We start by firing up our Elasticsearch instance:
docker run -d --name es762 -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.6.2
By default, Elasticsearch listens on port 9200 for incoming HTTP requests. We can verify that it has launched successfully by opening the http://localhost:9200/ URL in a browser:
{
"name" : "M4ojISw",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "CNnjvDZzRqeVP-B04D3CmA",
"version" : {
"number" : "7.6.2",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "2f4c224",
"build_date" : "2020-03-18T23:22:18.622755Z",
"build_snapshot" : false,
"lucene_version" : "8.4.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.8.0-beta1"
},
"tagline" : "You Know, for Search"
}
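The same check can be done from Java without any Elasticsearch dependency. The following is a minimal sketch using the JDK's built-in HTTP client (Java 11+); the ClusterPing class and its method names are hypothetical helpers introduced here for illustration only:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the Elasticsearch client libraries.
final class ClusterPing {

    // Builds a GET request for the root endpoint, e.g. http://localhost:9200/
    static HttpRequest buildPingRequest(String host, int port) {
        return HttpRequest.newBuilder()
          .uri(URI.create("http://" + host + ":" + port + "/"))
          .GET()
          .build();
    }

    // Sends the request and returns the raw JSON body describing the cluster.
    // Requires a running instance; fails with an IOException otherwise.
    static String fetchClusterInfo(String host, int port)
      throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
          .send(buildPingRequest(host, port), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling ClusterPing.fetchClusterInfo("localhost", 9200) against the container started above should return the same JSON document as the browser.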
3. Maven Configuration
Now that we have our basic Elasticsearch cluster up and running, let’s jump straight to the Java client. First of all, we need to have the following Maven dependency declared in our pom.xml file:
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>7.6.2</version>
</dependency>
You can always check the latest version hosted on Maven Central.
4. Java API
Before we look at how to use the main Java API features, we need to create the RestHighLevelClient. Note that the ClientConfiguration and RestClients helpers used here come from Spring Data Elasticsearch, so that library needs to be on the classpath as well:
ClientConfiguration clientConfiguration =
ClientConfiguration.builder().connectedTo("localhost:9200").build();
RestHighLevelClient client = RestClients.create(clientConfiguration).rest();
4.1. Indexing Documents
The index() method allows us to store an arbitrary JSON document and make it searchable:
@Test
public void givenJsonString_whenJavaObject_thenIndexDocument() throws IOException {
String jsonObject = "{\"age\":10,\"dateOfBirth\":1471466076564,"
+"\"fullName\":\"John Doe\"}";
IndexRequest request = new IndexRequest("people");
request.source(jsonObject, XContentType.JSON);
IndexResponse response = client.index(request, RequestOptions.DEFAULT);
String index = response.getIndex();
long version = response.getVersion();
assertEquals(Result.CREATED, response.getResult());
assertEquals(1, version);
assertEquals("people", index);
}
Note that we can use any Java JSON library to create and process our documents. If you’re not familiar with any of them, you can use the Elasticsearch helpers to generate your own JSON documents:
XContentBuilder builder = XContentFactory.jsonBuilder()
.startObject()
.field("fullName", "Test")
.field("dateOfBirth", new Date())
.field("age", 10)
.endObject();
IndexRequest indexRequest = new IndexRequest("people");
indexRequest.source(builder);
IndexResponse response = client.index(indexRequest, RequestOptions.DEFAULT);
assertEquals(Result.CREATED, response.getResult());
4.2. Querying Indexed Documents
Now that we have a searchable JSON document indexed, we can search for it using the search() method:
SearchRequest searchRequest = new SearchRequest();
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] searchHits = response.getHits().getHits();
List<Person> results =
Arrays.stream(searchHits)
.map(hit -> JSON.parseObject(hit.getSourceAsString(), Person.class))
.collect(Collectors.toList());
The results returned by the search() method are called Hits. Each Hit refers to a JSON document matching the search request.
In this case, the request carries no query, so it matches every document stored in the cluster, although Elasticsearch returns only the first 10 hits by default. Note that in this example we’re using the FastJson library to convert JSON strings to Java objects.
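The Person class used in the mapping step is not shown in this article. A minimal sketch of what it might look like is given below; the field names simply mirror the keys of the JSON document indexed earlier, and modelling dateOfBirth as a java.util.Date (rather than a long holding epoch milliseconds) is an assumption about how the JSON mapper is configured:

```java
import java.util.Date;

// Assumed shape of the Person class; field names mirror the indexed JSON keys.
class Person {

    private int age;
    private Date dateOfBirth;   // assumption: the mapper converts epoch millis to Date
    private String fullName;

    // JSON mappers typically need a no-arg constructor plus getters and setters.
    public Person() {
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public Date getDateOfBirth() {
        return dateOfBirth;
    }

    public void setDateOfBirth(Date dateOfBirth) {
        this.dateOfBirth = dateOfBirth;
    }

    public String getFullName() {
        return fullName;
    }

    public void setFullName(String fullName) {
        this.fullName = fullName;
    }
}
```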
We can refine the request by adding parameters that customize the query, using the QueryBuilders methods:
SearchSourceBuilder builder = new SearchSourceBuilder()
.postFilter(QueryBuilders.rangeQuery("age").from(5).to(15));
SearchRequest searchRequest = new SearchRequest();
searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
searchRequest.source(builder);
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
4.3. Retrieving and Deleting Documents
The get() and delete() methods allow us to get or delete a JSON document from the cluster using its id:
GetRequest getRequest = new GetRequest("people");
getRequest.id(id);
GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
// process fields
DeleteRequest deleteRequest = new DeleteRequest("people");
deleteRequest.id(id);
DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
The syntax is pretty straightforward: we just need to specify the index alongside the object’s id.
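Under the hood, the client translates these calls into HTTP requests against the document endpoint, which in Elasticsearch 7 has the form /<index>/_doc/<id>. As a rough sketch of that mapping, the hypothetical DocumentEndpoints helper below builds the equivalent requests with the JDK's HTTP client; it is an illustration only and assumes ids that need no URL encoding:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that mirrors GetRequest/DeleteRequest as plain HTTP requests.
final class DocumentEndpoints {

    // Assumes the id contains no characters that require URL encoding.
    private static URI documentUri(String baseUrl, String index, String id) {
        return URI.create(baseUrl + "/" + index + "/_doc/" + id);
    }

    // Equivalent of client.get(...): GET /<index>/_doc/<id>
    static HttpRequest getById(String baseUrl, String index, String id) {
        return HttpRequest.newBuilder(documentUri(baseUrl, index, id)).GET().build();
    }

    // Equivalent of client.delete(...): DELETE /<index>/_doc/<id>
    static HttpRequest deleteById(String baseUrl, String index, String id) {
        return HttpRequest.newBuilder(documentUri(baseUrl, index, id)).DELETE().build();
    }
}
```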
5. QueryBuilders Examples
The QueryBuilders class provides a variety of static methods used as dynamic matchers to find specific entries in the cluster. While using the search() method to look for specific JSON documents in the cluster, we can use query builders to customize the search results.
Here’s a list of the most common uses of the QueryBuilders API.
The matchAllQuery() method returns a QueryBuilder object that matches all documents in the cluster:
QueryBuilder matchAllQuery = QueryBuilders.matchAllQuery();
The rangeQuery() method matches documents where a field’s value is within a certain range:
QueryBuilder matchDocumentsWithinRange = QueryBuilders
.rangeQuery("price").from(15).to(100);
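With from() and to() used as above, both bounds are inclusive by default, so a price of exactly 15 or 100 still matches. The plain-Java sketch below models that behavior on an in-memory list; RangeSemantics is a hypothetical class used only to illustrate the semantics and does not talk to Elasticsearch:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory model of an inclusive range filter.
final class RangeSemantics {

    // Keeps every value v with from <= v <= to (both bounds inclusive).
    static List<Integer> withinRange(List<Integer> values, int from, int to) {
        return values.stream()
          .filter(v -> v >= from && v <= to)
          .collect(Collectors.toList());
    }
}
```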
Given a field name, e.g. fullName, and a corresponding value, e.g. John Doe, the matchQuery() method matches documents whose field contains that text. The value is analyzed first, so by default a document matches if it contains any of the resulting terms rather than the exact string:
QueryBuilder matchSpecificFieldQuery = QueryBuilders
.matchQuery("fullName", "John Doe");
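To get an intuition for how a match query treats its input, the sketch below models it in plain Java: both the field value and the query text are lowercased and split into terms, and a document matches when it shares at least one term with the query. This is a deliberately simplified stand-in for the real analysis chain and scoring, and MatchSemantics is a hypothetical class:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, heavily simplified model of analyzed full-text matching.
final class MatchSemantics {

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static Set<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
          .filter(token -> !token.isEmpty())
          .collect(Collectors.toSet());
    }

    // True when the field shares at least one term with the query (OR semantics).
    static boolean matches(String fieldValue, String queryText) {
        Set<String> fieldTerms = analyze(fieldValue);
        return analyze(queryText).stream().anyMatch(fieldTerms::contains);
    }
}
```

In this model, the query text John Doe matches a document whose fullName is Jane Doe, because both contain the term doe.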
We can also use the multiMatchQuery() method to build a multi-field version of the match query:
QueryBuilder multiMatchFieldsQuery = QueryBuilders.multiMatchQuery(
"Text I am looking for", "field_1", "field_2^3", "*_field_wildcard");
We can use the caret symbol (^) to boost specific fields.
In our example, field_2 has a boost value of three, making it more important than the other fields. Note that it’s possible to use wildcards and regex queries, but beware of memory consumption and response-time delays when dealing with wildcards, because a pattern like *_apples may have a huge impact on performance.
The boost acts as a weighting factor on the relevance score, which determines the order of the hits returned by the search() method.
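As a rough illustration of how a boost influences ranking, the sketch below multiplies a per-field score by its boost and keeps the best result, which is the general idea behind the default best_fields behavior of a multi-match query. The raw scores are made-up inputs, the real relevance computation is far more involved, and BoostSketch is a hypothetical class:

```java
import java.util.Map;

// Hypothetical model: each field score is weighted by its boost, the best one wins.
final class BoostSketch {

    // rawFieldScores: field name -> unboosted score of the document in that field.
    // boosts: field name -> boost factor; fields without an entry default to 1.0.
    static double score(Map<String, Double> rawFieldScores, Map<String, Double> boosts) {
        return rawFieldScores.entrySet().stream()
          .mapToDouble(e -> e.getValue() * boosts.getOrDefault(e.getKey(), 1.0))
          .max()
          .orElse(0.0);
    }
}
```

With a boost of 3 on field_2, a document with a weak match in field_2 can outrank a document with a stronger match in field_1.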
If you’re more familiar with the Lucene query syntax, you can use the simpleQueryStringQuery() method to customize search queries:
QueryBuilder simpleStringQuery = QueryBuilders
.simpleQueryStringQuery("+John -Doe | Janette");
As you can probably guess, this lets us build simple yet powerful queries with a syntax derived from Lucene’s Query Parser. Keep in mind that the simple query string syntax expresses Boolean logic with symbols such as +, | and -, whereas the AND/OR/NOT keywords are only interpreted by the full syntax available through queryStringQuery(). Here are some basic operators that can be used to build search queries:
- The required operator (+): requires that a specific piece of text exists somewhere in fields of a document.
- The prohibit operator (-): excludes all documents that contain a keyword declared after the (-) symbol.
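The effect of the two operators listed above can be modelled in a few lines of plain Java. The sketch below only understands the + and - prefixes and ignores every other token, so it is an illustration of the idea rather than a reimplementation of the query parser; RequiredProhibitedSketch is a hypothetical class:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the required (+) and prohibit (-) operators only.
final class RequiredProhibitedSketch {

    static boolean matches(String document, String query) {
        // Lowercase the document and split it into a set of words.
        Set<String> words = Arrays.stream(document.toLowerCase(Locale.ROOT).split("\\s+"))
          .collect(Collectors.toSet());

        for (String token : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.startsWith("+") && !words.contains(token.substring(1))) {
                return false; // a required term is missing
            }
            if (token.startsWith("-") && words.contains(token.substring(1))) {
                return false; // a prohibited term is present
            }
            // Tokens without a prefix are ignored in this simplified model.
        }
        return true;
    }
}
```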
6. Conclusion
In this quick article, we’ve seen how to use Elasticsearch’s Java API to perform some common operations related to full-text search engines.
You can check out the example provided in this article in the GitHub project.