1. Overview
In this article, we'll cover the advantages of binary search over a simple linear search and walk through its implementation in Java.
2. Need for Efficient Search
Let’s say we’re in the wine-selling business and millions of buyers are visiting our application every day.
Through our app, a customer can filter for items priced at or below n dollars, select a bottle from the search results, and add it to their cart. We have millions of users searching for wines within a price limit every second, so the results need to be fast.
On the backend, our algorithm runs a linear search through the entire list of wines comparing the price limit entered by the customer with the price of every wine bottle in the list.
Then, it returns items which have a price less than or equal to the price limit. This linear search has a time complexity of O(n).
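As a rough illustration of the linear approach described above, here is a minimal sketch. The class and method names (LinearPriceFilter, pricesUpTo) and the use of whole-dollar int prices are assumptions made for this example, not part of the application described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only
class LinearPriceFilter {

    // Checks every price once and keeps those at or below the limit: O(n)
    static List<Integer> pricesUpTo(int[] prices, int limit) {
        List<Integer> matches = new ArrayList<>();
        for (int price : prices) {
            if (price <= limit) {
                matches.add(price);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] prices = {25, 8, 40, 12, 19};
        System.out.println(pricesUpTo(prices, 20)); // prints [8, 12, 19]
    }
}
```

Every element is visited regardless of where the matches are, which is why the running time grows with the size of the list.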
This means that the more wine bottles we have in our system, the longer the search takes. The search time grows in proportion to the number of items in the list.
If we start saving items in sorted order and search for them using binary search, we can achieve a complexity of O(log n).
With binary search, the search time still grows with the size of the dataset, but logarithmically rather than proportionately: doubling the number of items adds roughly one extra comparison.
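One way this could look for the price-limit scenario is sketched below: if prices are kept in ascending order, a binary search can locate the boundary between matching and non-matching prices. The names SortedPriceFilter and countUpTo are hypothetical, and this is only a sketch under the assumption that prices are whole-dollar ints.

```java
// Hypothetical helper, for illustration only
class SortedPriceFilter {

    // Returns how many entries of sortedPrices (ascending) are <= limit.
    // Each iteration halves the remaining range, so about log2(n) steps are needed.
    static int countUpTo(int[] sortedPrices, int limit) {
        int low = 0;
        int high = sortedPrices.length; // exclusive upper bound
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (sortedPrices[mid] <= limit) {
                low = mid + 1; // mid matches, so the boundary lies to its right
            } else {
                high = mid;    // mid is too expensive, so the boundary is at mid or to its left
            }
        }
        return low; // indexes 0 .. low - 1 hold the matching prices
    }
}
```

For the sorted prices {8, 12, 19, 25, 40} and a limit of 20, countUpTo returns 3, meaning the first three entries match.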
3. Binary Search
Simply put, the algorithm compares the key with the middle element of the array. If they are unequal, the half that cannot contain the key is eliminated, and the search continues in the remaining half until the key is found or no elements remain.
Remember – the key aspect here is that the array is already sorted.
If the search ends with the remaining half being empty, the key is not in the array.
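To see the halving in action, the following sketch records which indexes get probed during a search. BinarySearchTrace and probedIndexes are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only
class BinarySearchTrace {

    // Returns the indexes examined, in order, while searching for key
    static List<Integer> probedIndexes(int[] sortedArray, int key) {
        List<Integer> probes = new ArrayList<>();
        int low = 0;
        int high = sortedArray.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            probes.add(mid);
            if (sortedArray[mid] == key) {
                break;          // found: stop searching
            } else if (sortedArray[mid] < key) {
                low = mid + 1;  // discard the lower half
            } else {
                high = mid - 1; // discard the upper half
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11, 13};
        System.out.println(probedIndexes(sorted, 11)); // prints [3, 5]
        System.out.println(probedIndexes(sorted, 4));  // prints [3, 1, 2]
    }
}
```

In the second call the key 4 is absent, so the search stops once the remaining range is empty.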
3.1. Iterative Implementation
public int runBinarySearchIteratively(
  int[] sortedArray, int key, int low, int high) {
    // Sentinel value returned when the key is not found
    int index = Integer.MAX_VALUE;

    while (low <= high) {
        // Overflow-safe midpoint of the current range
        int mid = low + ((high - low) / 2);
        if (sortedArray[mid] < key) {
            low = mid + 1;  // key can only be in the upper half
        } else if (sortedArray[mid] > key) {
            high = mid - 1; // key can only be in the lower half
        } else {
            index = mid;    // key found
            break;
        }
    }
    return index;
}
The runBinarySearchIteratively method takes a sortedArray, a key, and the low and high indexes of the sortedArray as arguments. On the first call, low is the first index of the sortedArray, 0, and high is its last index, length - 1. If the key is not found, this implementation returns Integer.MAX_VALUE.
Here, mid is the middle index of the current search range. The algorithm runs a while loop, comparing the key with the array value at mid on each iteration and narrowing the range accordingly.
Notice how the middle index is computed (int mid = low + ((high - low) / 2)). This accommodates extremely large arrays. If the middle index were computed simply as int mid = (low + high) / 2, an overflow could occur for an array containing 2^30 or more elements, as the sum low + high could exceed the maximum positive int value.
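The overflow can be reproduced without allocating a huge array by feeding large index values directly into both formulas. The class MidpointOverflow and its two methods are hypothetical names used only for this sketch.

```java
// Hypothetical helper, for illustration only
class MidpointOverflow {

    static int naiveMid(int low, int high) {
        return (low + high) / 2; // low + high can wrap around to a negative int
    }

    static int safeMid(int low, int high) {
        return low + ((high - low) / 2); // high - low cannot overflow when 0 <= low <= high
    }

    public static void main(String[] args) {
        int low = 1_500_000_000;
        int high = 2_000_000_000;
        System.out.println(naiveMid(low, high)); // prints -397483648
        System.out.println(safeMid(low, high));  // prints 1750000000
    }
}
```

A negative midpoint used as an array index would throw an ArrayIndexOutOfBoundsException, which is why the subtraction-based form is the safer choice.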
3.2. Recursive Implementation
Now, let’s have a look at a simple, recursive implementation as well:
public int runBinarySearchRecursively(
  int[] sortedArray, int key, int low, int high) {
    // Base case: the range is empty, so the key is not present
    if (high < low) {
        return -1;
    }

    // Overflow-safe midpoint of the current range
    int middle = low + ((high - low) / 2);
    if (key == sortedArray[middle]) {
        return middle;
    } else if (key < sortedArray[middle]) {
        // Continue in the lower half
        return runBinarySearchRecursively(
          sortedArray, key, low, middle - 1);
    } else {
        // Continue in the upper half
        return runBinarySearchRecursively(
          sortedArray, key, middle + 1, high);
    }
}
The runBinarySearchRecursively method accepts a sortedArray, a key, and the low and high indexes of the sortedArray. It returns the index of the key, or -1 once the range becomes empty without a match.
3.3. Using Arrays.binarySearch()
int index = Arrays.binarySearch(sortedArray, key);
A sortedArray and an int key, which is to be searched for in the array of integers, are passed as arguments to the binarySearch method of the Java Arrays class.
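It is worth knowing what the call returns when the key is absent: Arrays.binarySearch then returns (-(insertion point) - 1), a negative value. The sketch below shows both outcomes; the class name and the insertionPoint helper are made up for this example.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only
class ArraysBinarySearchDemo {

    // Converts a negative "not found" result into the index where the key would be inserted
    static int insertionPoint(int searchResult) {
        return -searchResult - 1;
    }

    public static void main(String[] args) {
        int[] sortedArray = {8, 12, 19, 25, 40};

        int found = Arrays.binarySearch(sortedArray, 19);
        System.out.println(found); // prints 2

        int missing = Arrays.binarySearch(sortedArray, 20);
        System.out.println(missing);                 // prints -4
        System.out.println(insertionPoint(missing)); // prints 3
    }
}
```

Checking for a result >= 0 is therefore the usual way to test whether the key was found.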
3.4. Using Collections.binarySearch()
int index = Collections.binarySearch(sortedList, key);
A sortedList and an Integer key, which is to be searched for in the list of Integer objects, are passed as arguments to the binarySearch method of the Java Collections class.
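Collections.binarySearch follows the same return convention as the array version and also has an overload that takes a Comparator, which is useful when the list is sorted in a non-natural order. The wrapper names below (CollectionsBinarySearchDemo, indexOf, indexOfDescending) are assumptions for this sketch.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only
class CollectionsBinarySearchDemo {

    // The list must be sorted in ascending (natural) order
    static int indexOf(List<Integer> sortedList, Integer key) {
        return Collections.binarySearch(sortedList, key);
    }

    // The list must be sorted by the same comparator that is used for the search
    static int indexOfDescending(List<Integer> descendingList, Integer key) {
        return Collections.binarySearch(descendingList, key, Comparator.reverseOrder());
    }

    public static void main(String[] args) {
        System.out.println(indexOf(List.of(8, 12, 19, 25, 40), 25));           // prints 3
        System.out.println(indexOf(List.of(8, 12, 19, 25, 40), 20));           // prints -4
        System.out.println(indexOfDescending(List.of(40, 25, 19, 12, 8), 12)); // prints 3
    }
}
```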
3.5. Performance
Whether to use a recursive or an iterative approach for writing the algorithm is mostly a matter of personal preference. Still, here are a few points we should be aware of:
1. Recursion can be slower due to the overhead of maintaining a call stack, and it usually takes up more memory
2. Recursion is not stack-friendly. Very deep recursion may cause a StackOverflowError, although a binary search only recurses about log2(n) levels deep
3. Recursion can add clarity to the code, as it tends to be shorter than the iterative approach
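To put the stack-depth concern in perspective, the sketch below counts how many nested calls a recursive binary search makes. RecursionDepth and callsNeeded are hypothetical names; the method mirrors the usual recursive halving but returns the number of calls that examined an element instead of an index.

```java
import java.util.stream.IntStream;

// Hypothetical helper, for illustration only
class RecursionDepth {

    // Counts the nested calls that examine an element while searching for key
    static int callsNeeded(int[] sortedArray, int key, int low, int high) {
        if (high < low) {
            return 0; // empty range: nothing left to examine
        }
        int middle = low + ((high - low) / 2);
        if (key == sortedArray[middle]) {
            return 1;
        } else if (key < sortedArray[middle]) {
            return 1 + callsNeeded(sortedArray, key, low, middle - 1);
        } else {
            return 1 + callsNeeded(sortedArray, key, middle + 1, high);
        }
    }

    public static void main(String[] args) {
        int[] sorted = IntStream.range(0, 1_000_000).toArray();
        // Searching for a key larger than every element
        System.out.println(callsNeeded(sorted, 1_000_000, 0, sorted.length - 1)); // prints 20
    }
}
```

Even for a million elements the recursion stays around 20 levels deep, so for binary search specifically the overhead per call tends to matter more than the risk of exhausting the stack.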
Ideally, a binary search performs fewer comparisons than a linear search for large values of n. For smaller values of n, the linear search could perform better than a binary search.
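A simple way to get a feel for the difference is to count how many elements each approach examines. The following is only a sketch with invented names (ComparisonCount, linearProbes, binaryProbes); it counts probes rather than measuring time.

```java
import java.util.stream.IntStream;

// Hypothetical helper, for illustration only
class ComparisonCount {

    // Number of elements a linear search examines before finding key (or giving up)
    static int linearProbes(int[] array, int key) {
        int probes = 0;
        for (int value : array) {
            probes++;
            if (value == key) {
                break;
            }
        }
        return probes;
    }

    // Number of elements a binary search examines before finding key (or giving up)
    static int binaryProbes(int[] sortedArray, int key) {
        int probes = 0;
        int low = 0;
        int high = sortedArray.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            probes++;
            if (sortedArray[mid] == key) {
                break;
            } else if (sortedArray[mid] < key) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] sorted = IntStream.range(0, 1_000_000).toArray();
        System.out.println(linearProbes(sorted, 999_999)); // prints 1000000
        System.out.println(binaryProbes(sorted, 999_999)); // prints 20
        System.out.println(linearProbes(sorted, 0));       // prints 1
    }
}
```

The last line shows the flip side: when the key happens to sit near the front, or the array is tiny, a linear scan can finish in fewer steps.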
Note that this analysis is theoretical, and actual results may vary depending on the context.
Also, the binary search algorithm needs a sorted data set, which has its costs too. If we use a merge sort algorithm to sort the data, an additional cost of O(n log n) is added to our code.
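The sorting cost is easier to justify when it is paid once and followed by many lookups. The sketch below illustrates that pattern with hypothetical names (SortThenSearch, containsAll). Note that it uses Arrays.sort rather than a hand-written merge sort; for int arrays the JDK uses a dual-pivot quicksort, which is a different algorithm from the merge sort mentioned above.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only
class SortThenSearch {

    // Sorts a copy of the data once, then answers each lookup with a binary search
    static boolean[] containsAll(int[] unsorted, int[] keys) {
        int[] sorted = unsorted.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);             // one-time sorting cost

        boolean[] found = new boolean[keys.length];
        for (int i = 0; i < keys.length; i++) {
            // A non-negative result means the key is present
            found[i] = Arrays.binarySearch(sorted, keys[i]) >= 0;
        }
        return found;
    }

    public static void main(String[] args) {
        boolean[] result = containsAll(new int[] {25, 8, 40, 12, 19}, new int[] {12, 20});
        System.out.println(Arrays.toString(result)); // prints [true, false]
    }
}
```

If only a single lookup is ever needed on unsorted data, a plain linear scan avoids the sorting step altogether.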
So we first need to analyze our requirements well and then decide which search algorithm suits them best.
4. Conclusion
This tutorial demonstrated a binary search algorithm implementation and a scenario where it would be preferable to use it instead of a linear search.
Please find the code for the tutorial over on GitHub.