1. Overview
In this tutorial, we’ll implement different solutions to the problem of finding the k largest elements in an array with Java. To describe time complexity, we’ll use Big-O notation.
2. Brute-Force Solution
The brute-force solution to this problem is to iterate through the given array k times. In each iteration, we find the largest remaining value, remove it from the array, and put it into the output list:
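// Requires java.util.ArrayList and java.util.List
public List<Integer> findTopK(List<Integer> input, int k) {
    // Work on a copy so the caller's list is not modified
    List<Integer> array = new ArrayList<>(input);
    List<Integer> topKList = new ArrayList<>();
    // Never extract more elements than the input holds
    int limit = Math.min(k, array.size());
    for (int i = 0; i < limit; i++) {
        // Find the index of the largest remaining value
        int maxIndex = 0;
        for (int j = 1; j < array.size(); j++) {
            if (array.get(j) > array.get(maxIndex)) {
                maxIndex = j;
            }
        }
        // Move the largest value to the output list
        topKList.add(array.remove(maxIndex));
    }
    return topKList;
}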
If n is the size of the given array, the time complexity of this solution is O(n * k). This is the least efficient of the solutions in this tutorial, especially as k grows toward n, where it degrades to O(n^2).
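To make the O(n * k) cost concrete, here is a minimal sketch that runs the brute-force approach on a small sample and counts the comparisons it performs. The class name `BruteForceTopKDemo` and the `comparisons` counter are hypothetical additions for illustration only; they aren't part of the solution itself.

```java
import java.util.ArrayList;
import java.util.List;

class BruteForceTopKDemo {
    // Hypothetical counter, added only to make the cost visible
    static long comparisons = 0;

    static List<Integer> findTopK(List<Integer> input, int k) {
        List<Integer> array = new ArrayList<>(input);
        List<Integer> topKList = new ArrayList<>();
        int limit = Math.min(k, array.size());
        for (int i = 0; i < limit; i++) {
            int maxIndex = 0;
            for (int j = 1; j < array.size(); j++) {
                comparisons++;
                if (array.get(j) > array.get(maxIndex)) {
                    maxIndex = j;
                }
            }
            topKList.add(array.remove(maxIndex));
        }
        return topKList;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(3, 7, 1, 2, 8, 10, 4, 5, 6, 9);
        System.out.println(findTopK(input, 3)); // [10, 9, 8]
        System.out.println(comparisons);        // 9 + 8 + 7 = 24
    }
}
```

With n = 10 and k = 3, each pass scans the remaining elements, so the total number of comparisons is roughly n * k.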
3. Java Collections Approach
However, more efficient solutions to this problem exist. In this section, we’ll explain two of them using Java Collections.
3.1. TreeSet
TreeSet is backed by a Red-Black Tree. As a result, adding a value to this set costs O(log n). TreeSet is also a sorted collection. Therefore, we can put all the values into a TreeSet that uses a reverse-order comparator and extract the first k of them:
public List<Integer> findTopK(List<Integer> input, int k) {
    Set<Integer> sortedSet = new TreeSet<>(Comparator.reverseOrder());
    sortedSet.addAll(input);
    return sortedSet.stream().limit(k).collect(Collectors.toList());
}
The time complexity of this solution is O(n * log n). Asymptotically, this is more efficient than the brute-force approach whenever k grows faster than log n.
It’s important to remember that TreeSet contains no duplicates. As a result, the solution works only for an input array with distinct values.
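The following sketch illustrates the duplicates caveat. It runs the TreeSet approach on an input that contains a repeated value and compares it with a plain descending sort of a copy. The sort-based method is a swapped-in alternative for comparison, not one of the tutorial's solutions, and the names `TreeSetDuplicatesDemo`, `findTopKWithTreeSet`, and `findTopKWithSort` are hypothetical. Both methods assume k is non-negative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TreeSetDuplicatesDemo {
    // The TreeSet approach: duplicates collapse into a single entry
    static List<Integer> findTopKWithTreeSet(List<Integer> input, int k) {
        Set<Integer> sortedSet = new TreeSet<>(Comparator.reverseOrder());
        sortedSet.addAll(input);
        return sortedSet.stream().limit(k).collect(Collectors.toList());
    }

    // Alternative for comparison (an assumption, not from the tutorial):
    // sort a copy in descending order and keep the first k, duplicates included
    static List<Integer> findTopKWithSort(List<Integer> input, int k) {
        List<Integer> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(5, 9, 9, 3, 7);
        System.out.println(findTopKWithTreeSet(input, 3)); // [9, 7, 5]
        System.out.println(findTopKWithSort(input, 3));    // [9, 9, 7]
    }
}
```

The second 9 is silently dropped by the set, so the TreeSet result differs from the true top three values whenever the input repeats one of them.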
3.2. PriorityQueue
PriorityQueue is Java’s implementation of the heap data structure. With its help, we can achieve an O(n * log k) solution. Since k never exceeds the size of the array n, log k ≤ log n, so O(n * log k) is never worse than the O(n * log n) of the previous solution, and it’s noticeably better when k is much smaller than n.
The algorithm iterates once through the given array. At each iteration, we add the current element to the heap. We also keep the size of the heap less than or equal to k: whenever it grows beyond k, we remove the head of the heap. Since a PriorityQueue orders its elements in ascending natural order by default, the head is always the smallest element. As a result, after iterating through the array, the heap contains the k largest values:
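public List<Integer> findTopK(List<Integer> input, int k) {
    // PriorityQueue is a min-heap by default: poll() removes the smallest element
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    input.forEach(number -> {
        minHeap.add(number);
        if (minHeap.size() > k) {
            minHeap.poll();
        }
    });
    // Copying the heap does not yield a sorted list, so sort explicitly
    List<Integer> topKList = new ArrayList<>(minHeap);
    topKList.sort(Comparator.reverseOrder());
    return topKList;
}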
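One detail worth keeping in mind: according to the PriorityQueue Javadoc, its iterator isn't guaranteed to traverse the elements in any particular order, so copying the heap into a list doesn't by itself produce a sorted result. The sketch below is one possible variant that drains the heap with poll() instead, which returns the elements in ascending order, and then reverses the list. The class name `HeapTopKDemo` is a hypothetical name chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class HeapTopKDemo {
    static List<Integer> findTopK(List<Integer> input, int k) {
        // Min-heap by default: the head is the smallest retained element
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int number : input) {
            minHeap.add(number);
            if (minHeap.size() > k) {
                minHeap.poll(); // evict the smallest, keeping the k largest seen so far
            }
        }
        // poll() yields ascending order, so drain and then reverse
        List<Integer> topKList = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            topKList.add(minHeap.poll());
        }
        Collections.reverse(topKList);
        return topKList;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(3, 7, 1, 2, 8, 10, 4, 5, 6, 9);
        System.out.println(findTopK(input, 3)); // [10, 9, 8]
    }
}
```

Draining the heap costs an extra O(k * log k), which doesn't change the overall O(n * log k) bound.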
4. Selection Algorithm
There are many other approaches to solving this problem. Although a full treatment is beyond the scope of this tutorial, a selection algorithm such as Quickselect is asymptotically the best option because it runs in linear time on average.
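For readers who want a starting point, here is a rough sketch of one selection-based approach, Quickselect with a random pivot. It's an illustration under stated assumptions rather than a reference implementation: the class name `QuickselectTopK` and its helper methods are hypothetical, and this simple partition scheme runs in expected linear time on inputs with mostly distinct values but can degrade toward O(n^2) when the input contains many duplicates. The final sort of the k selected values adds O(k * log k) and is only there to return them in descending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class QuickselectTopK {
    private static final Random RANDOM = new Random();

    static List<Integer> findTopK(List<Integer> input, int k) {
        int[] a = input.stream().mapToInt(Integer::intValue).toArray();
        k = Math.min(k, a.length);
        if (k <= 0) {
            return new ArrayList<>();
        }
        int lo = 0;
        int hi = a.length - 1;
        // Narrow [lo, hi] until the k largest values occupy a[0..k-1]
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k - 1) {
                break;
            } else if (p < k - 1) {
                lo = p + 1;
            } else {
                hi = p - 1;
            }
        }
        List<Integer> result = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            result.add(a[i]);
        }
        // Optional: order the selected values from largest to smallest
        result.sort(Comparator.reverseOrder());
        return result;
    }

    // Descending Lomuto partition: values greater than the pivot end up
    // left of the returned index, values less than or equal to it on the right
    private static int partition(int[] a, int lo, int hi) {
        int pivotIndex = lo + RANDOM.nextInt(hi - lo + 1);
        swap(a, pivotIndex, hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] > pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(3, 7, 1, 2, 8, 10, 4, 5, 6, 9);
        System.out.println(findTopK(input, 3)); // [10, 9, 8]
    }
}
```

Unlike the heap-based solution, this sketch copies the whole input into an array and rearranges it, so it trades O(n) extra memory for the better average running time.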
5. Conclusion
In this tutorial, we’ve described several solutions for finding the k largest elements in an array.
As usual, the example code is available over on GitHub.