1. Overview
In this article, we’ll look at different ways to search an array for a specified value.
We’ll also compare how these perform using JMH (the Java Microbenchmark Harness) to determine which method works best.
2. Setup
For our examples, we’ll use an array that contains randomly generated Strings for each test:
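import java.util.Random;

// Fills an array with the decimal string form of random ints.
// Declared static so that SearchData's static initializer can call it.
static String[] seedArray(int length) {
    String[] strings = new String[length];
    Random value = new Random();
    for (int i = 0; i < length; i++) {
        strings[i] = String.valueOf(value.nextInt());
    }
    return strings;
}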
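Because new Random() is seeded differently on every run, the array contents change between runs. If reproducible data is preferred, one option is to pass a fixed seed to the Random constructor. The following is a minimal sketch of that variant, not part of the original benchmark; the class name SeededData and the seed values are hypothetical:

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper: same idea as seedArray(), but with a caller-supplied seed
class SeededData {
    static String[] seedArray(int length, long seed) {
        String[] strings = new String[length];
        // A fixed seed makes Random produce the same sequence on every run
        Random value = new Random(seed);
        for (int i = 0; i < length; i++) {
            strings[i] = String.valueOf(value.nextInt());
        }
        return strings;
    }

    public static void main(String[] args) {
        String[] first = seedArray(5, 42L);
        String[] second = seedArray(5, 42L);
        // Same seed, same contents
        System.out.println(Arrays.equals(first, second)); // true
    }
}
```

Either way, every element is the decimal form of an int, so a search key such as "T" never matches and each search in the benchmarks is a miss.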
To reuse the array in each benchmark, we’ll declare a static nested class that holds the array and the search count, annotated with @State so that we can declare its scope to JMH:
@State(Scope.Benchmark)
public static class SearchData {
    static int count = 1000;
    static String[] strings = seedArray(count);
}
3. Basic Search
Three common ways to search an array are to wrap it in a List, copy it into a Set, or loop over it, examining each element until a match is found.
Let’s start with one method for each approach:
boolean searchList(String[] strings, String searchString) {
    // Arrays.asList() wraps the array in a fixed-size List view
    return Arrays.asList(strings)
      .contains(searchString);
}

boolean searchSet(String[] strings, String searchString) {
    // Copies every element into a new HashSet before the lookup
    Set<String> stringSet = new HashSet<>(Arrays.asList(strings));
    return stringSet.contains(searchString);
}

boolean searchLoop(String[] strings, String searchString) {
    // Linear scan: compare each element until a match is found
    for (String string : strings) {
        if (string.equals(searchString)) {
            return true;
        }
    }
    return false;
}
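Before timing anything, it can help to confirm that the three approaches return the same answers for both hits and misses. The following is a minimal, JMH-free sketch; the class name SearchCheck and the sample data are hypothetical, and the methods are made static so they can be called from main:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone check that the three search strategies agree
class SearchCheck {
    static boolean searchList(String[] strings, String searchString) {
        return Arrays.asList(strings).contains(searchString);
    }

    static boolean searchSet(String[] strings, String searchString) {
        Set<String> stringSet = new HashSet<>(Arrays.asList(strings));
        return stringSet.contains(searchString);
    }

    static boolean searchLoop(String[] strings, String searchString) {
        for (String string : strings) {
            if (string.equals(searchString)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] strings = {"42", "-7", "1999"};
        for (String key : new String[] {"42", "1999", "T"}) {
            boolean expected = searchLoop(strings, key);
            // All three strategies should agree on hits and misses
            boolean agree = searchList(strings, key) == expected
                && searchSet(strings, key) == expected;
            System.out.println(key + " found=" + expected + " agree=" + agree);
        }
    }
}
```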
We’ll use these class-level annotations to tell JMH to report the average time per operation in microseconds and to run five warmup iterations, which gives the JIT compiler time to optimize the code before measurement starts:
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
Each benchmark then repeats its search in a loop:
@Benchmark
public void searchArrayLoop() {
    for (int i = 0; i < SearchData.count; i++) {
        searchLoop(SearchData.strings, "T");
    }
}

@Benchmark
public void searchArrayAllocNewList() {
    for (int i = 0; i < SearchData.count; i++) {
        searchList(SearchData.strings, "T");
    }
}

@Benchmark
public void searchArrayAllocNewSet() {
    for (int i = 0; i < SearchData.count; i++) {
        searchSet(SearchData.strings, "T");
    }
}
When we run 1,000 searches per benchmark invocation, our results look something like this:
SearchArrayTest.searchArrayAllocNewList avgt 20 937.851 ± 14.226 us/op
SearchArrayTest.searchArrayAllocNewSet avgt 20 14309.122 ± 193.844 us/op
SearchArrayTest.searchArrayLoop avgt 20 758.060 ± 9.433 us/op
The loop search is the fastest of the three. But this is at least partly because of how we’re using the collections.
We create a new List with each call to searchList(), and a new List plus a new HashSet with each call to searchSet(). Arrays.asList() only wraps the existing array, so that allocation is cheap, but building the HashSet means hashing and inserting all 1,000 elements on every call. Looping over the array directly avoids both costs.
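The difference between the two allocations is easy to observe: Arrays.asList() returns a fixed-size List backed by the original array, so changes to the array show through the list, whereas new HashSet<>(...) copies every element into a separate hash table. A small sketch of this behavior (the class and method names are our own):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AsListVsHashSet {
    // Returns {listSeesChange, setSeesChange} after overwriting strings[0]
    static boolean[] checkAfterMutation(String[] strings, String newValue) {
        // A view: no elements are copied, the list reads the array directly
        List<String> view = Arrays.asList(strings);
        // A copy: every element is hashed and inserted into a new table
        Set<String> copy = new HashSet<>(view);

        strings[0] = newValue; // modify the underlying array

        return new boolean[] {view.contains(newValue), copy.contains(newValue)};
    }

    public static void main(String[] args) {
        boolean[] seen = checkAfterMutation(new String[] {"a", "b", "c"}, "z");
        System.out.println("List sees change: " + seen[0]); // true
        System.out.println("Set sees change: " + seen[1]);  // false
    }
}
```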
4. More Efficient Search
What happens when we create single instances of List and Set and then reuse them for each search?
Let’s give it a try:
@Benchmark
public void searchArrayReuseList() {
    // Build the List view once, then reuse it for every search
    List<String> asList = Arrays.asList(SearchData.strings);
    for (int i = 0; i < SearchData.count; i++) {
        asList.contains("T");
    }
}

@Benchmark
public void searchArrayReuseSet() {
    // Build the HashSet once, then reuse it for every search
    Set<String> asSet = new HashSet<>(Arrays.asList(SearchData.strings));
    for (int i = 0; i < SearchData.count; i++) {
        asSet.contains("T");
    }
}
We’ll run these methods with the same JMH annotations as above, and include the results for the simple loop for comparison.
We see very different results:
SearchArrayTest.searchArrayLoop avgt 20 758.060 ± 9.433 us/op
SearchArrayTest.searchArrayReuseList avgt 20 837.265 ± 11.283 us/op
SearchArrayTest.searchArrayReuseSet avgt 20 14.030 ± 0.197 us/op
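While searching the List is only somewhat faster than before (837 versus 938 us/op), the Set search drops to under 2 percent of the time required for the loop (14.0 versus 758.1 us/op), even though that time still includes building the HashSet once per 1,000 searches!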
Now that we’ve removed the cost of creating new collections from each search, these results make sense.
Searching a hash table, the structure underlying a HashSet, takes O(1) time on average, while List.contains() on the array-backed list returned by Arrays.asList() checks the elements one by one, which is O(n).
5. Binary Search
Another method for searching an array is a binary search. While very efficient, a binary search requires the array to be sorted in advance.
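Arrays.binarySearch() returns the index of the key when it is found and a negative value, -(insertion point) - 1, when it is not, so a contains-style check compares the result with zero. The sketch below is our own illustration (the class and method names are hypothetical): it sorts a copy once, leaving the caller’s array untouched, and then searches the sorted copy:

```java
import java.util.Arrays;

class BinarySearchContains {
    // Sort once, O(n log n); the original array is not modified
    static String[] sortedCopy(String[] strings) {
        String[] sorted = Arrays.copyOf(strings, strings.length);
        Arrays.sort(sorted);
        return sorted;
    }

    // O(log n) per search; the result is only defined if 'sorted' is sorted
    static boolean contains(String[] sorted, String key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        String[] sorted = sortedCopy(new String[] {"e", "a", "c"});
        System.out.println(Arrays.binarySearch(sorted, "c")); // 1
        System.out.println(Arrays.binarySearch(sorted, "b")); // -2 (insertion point 1)
        System.out.println(contains(sorted, "e"));            // true
    }
}
```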
Let’s sort the array and try the binary search:
@Benchmark
public void searchArrayBinarySearch() {
    // Sorts the shared array in place; the sort is part of the measured time
    Arrays.sort(SearchData.strings);
    for (int i = 0; i < SearchData.count; i++) {
        Arrays.binarySearch(SearchData.strings, "T");
    }
}
SearchArrayTest.searchArrayBinarySearch avgt 20 26.527 ± 0.376 us/op
Binary search is very fast, although less efficient than the HashSet: a binary search needs O(log n) comparisons in the worst case, which places its performance between that of a linear array search, O(n), and a hash table lookup, O(1). Note that this measurement also includes the call to Arrays.sort().
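To see the O(n) versus O(log n) difference without a benchmark, we can count comparisons for a key that is not in the array, as with "T" above. This sketch is our own illustration (the class name ComparisonCount is hypothetical); it counts equals() calls in a linear scan and comparator calls made by Arrays.binarySearch(T[], T, Comparator):

```java
import java.util.Arrays;
import java.util.Comparator;

class ComparisonCount {
    // Number of equals() calls a linear scan makes before matching or giving up
    static int linearComparisons(String[] strings, String key) {
        int count = 0;
        for (String s : strings) {
            count++;
            if (s.equals(key)) {
                break;
            }
        }
        return count;
    }

    // Number of comparator calls Arrays.binarySearch makes on a sorted array
    static int binaryComparisons(String[] sorted, String key) {
        int[] count = {0};
        Comparator<String> counting = (a, b) -> {
            count[0]++;
            return a.compareTo(b);
        };
        Arrays.binarySearch(sorted, key, counting);
        return count[0];
    }

    public static void main(String[] args) {
        String[] strings = new String[1000];
        for (int i = 0; i < strings.length; i++) {
            strings[i] = String.valueOf(i);
        }
        Arrays.sort(strings); // lexicographic order, required for binarySearch
        System.out.println(linearComparisons(strings, "T")); // 1000
        System.out.println(binaryComparisons(strings, "T")); // 10
    }
}
```

For 1,000 elements, the linear scan makes 1,000 comparisons for a miss, while binary search needs about log2(1000), or 10.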
6. Conclusion
We’ve seen several methods of searching through an array.
Based on our results, a HashSet works best for repeatedly searching through a list of values. However, we need to build the Set in advance and reuse it across searches; constructing a new one for every lookup is far slower than a simple loop.
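One way to follow this advice is to build the Set once and keep it alongside the data. The following is a minimal sketch, assuming the array does not change after construction; the class name PrebuiltLookup is hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: pay the O(n) HashSet construction once,
// then answer each lookup in O(1) average time
final class PrebuiltLookup {
    private final Set<String> values;

    PrebuiltLookup(String[] strings) {
        // Copied once; later changes to the array are NOT reflected here
        this.values = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(strings)));
    }

    boolean contains(String searchString) {
        return values.contains(searchString);
    }

    public static void main(String[] args) {
        PrebuiltLookup lookup = new PrebuiltLookup(new String[] {"42", "-7", "1999"});
        System.out.println(lookup.contains("-7")); // true
        System.out.println(lookup.contains("T"));  // false
    }
}
```

Because the set is a copy, this design trades memory and a one-time build cost for fast lookups; if the underlying array changes, the lookup object has to be rebuilt.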
As always, the full source code of the examples is available over on GitHub.