1. Introduction
In this tutorial, we’ll see how Heap Sort works, and we’ll implement it in Java.
Heap Sort is based on the Heap data structure. To understand Heap Sort properly, we’ll first dig into Heaps and how they are implemented.
2. Heap Data Structure
A Heap is a specialized tree-based data structure, so it’s composed of nodes. We assign the elements to nodes: every node contains exactly one element.
Nodes can also have children. If a node doesn’t have any children, we call it a leaf.
Two rules make a Heap special:
- every node’s value must be less than or equal to all values stored in its children
- it’s a complete tree, which means it has the least possible height
Because of the 1st rule, the smallest element is always at the root of the tree.
How we enforce these rules is implementation-dependent.
Heaps are usually used to implement priority queues because a Heap extracts the least (or greatest) element very efficiently.
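To see the priority-queue behavior before we build our own Heap, here is a minimal sketch that uses the JDK’s java.util.PriorityQueue, which is documented as being based on a priority heap. The class name PriorityQueueDemo and the helper drain are only illustrative names, not part of this tutorial’s implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueDemo {

    // Empties a PriorityQueue built from the input; poll() always removes the least element.
    static List<Integer> drain(List<Integer> input) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(input);
        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(queue.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(drain(Arrays.asList(3, 5, 1, 4, 2))); // prints [1, 2, 3, 4, 5]
    }
}
```

Draining the queue yields the elements in ascending order, which is the same idea Heap Sort relies on.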
2.1. Heap Variants
Heaps have many variants, which differ in some implementation details.
For example, what we described above is a Min-Heap, because a parent is never greater than any of its children. Alternatively, we could have defined a Max-Heap, in which a parent is never less than its children. Hence, the greatest element will be in the root node.
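The difference between the two variants comes down to the ordering we use. As a rough illustration, the sketch below uses java.util.PriorityQueue with its natural ordering as a Min-Heap and with Comparator.reverseOrder() as a Max-Heap. The names HeapVariantsDemo, minRoot and maxRoot are hypothetical, and both helpers assume a non-empty input.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class HeapVariantsDemo {

    // Root of a Min-Heap-ordered queue: the least element (input assumed non-empty).
    static int minRoot(List<Integer> input) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(input);
        return minHeap.peek();
    }

    // Root of a Max-Heap-ordered queue: the greatest element (input assumed non-empty).
    static int maxRoot(List<Integer> input) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.addAll(input);
        return maxHeap.peek();
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 5, 1, 4, 2);
        System.out.println(minRoot(input)); // prints 1
        System.out.println(maxRoot(input)); // prints 5
    }
}
```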
We can choose from many tree implementations. The most straightforward is a Binary Tree. In a Binary Tree, every node can have at most two children. We call them the left child and the right child.
The easiest way to enforce the 2nd rule is to use a Full Binary Tree (this shape is more commonly called a complete binary tree). A Full Binary Tree follows some simple rules:
- if a node has only one child, that should be its left child
- at most one node can have exactly one child, and it must sit on the level just above the deepest one
- leaves can only be on the two deepest levels
Let’s see these rules with some examples:
1      2    3        4          5        6          7          8        9          10
()     ()   ()       ()         ()       ()         ()         ()       ()         ()
      /       \     /  \       /  \     /  \       /  \       /        /          /  \
     ()       ()   ()  ()     ()  ()   ()  ()     ()  ()     ()       ()         ()  ()
                             /           \       /  \       /  \     /          /  \
                            ()           ()     ()  ()     ()  ()   ()         ()  ()
                                                                              /
                                                                             ()
Trees 1, 2, 4, 5 and 7 follow the rules.
Trees 3 and 6 violate the 1st rule, 8 and 9 violate the 2nd rule, and 10 violates the 3rd rule.
In this tutorial, we’ll focus on a Min-Heap with a Binary Tree implementation.
2.2. Inserting Elements
We should implement all operations in a way that keeps the Heap invariants. This way, we can build the Heap with repeated insertions, so we’ll focus on the single insert operation.
We can insert an element with the following steps:
- create a new leaf in the leftmost available slot on the deepest level and store the item in that node
- if the element is less than its parent, we swap them
- continue with step 2 until the element is no longer less than its parent or it becomes the new root
Note that step 2 won’t violate the Heap rule: if we replace a node’s value with a smaller one, it will still be less than its children.
Let’s see an example! We want to insert 4 into this Heap:
     2
    / \
   /   \
  3     6
 / \
5   7
The first step is to create a new leaf which stores 4:
     2
    / \
   /   \
  3     6
 / \   /
5   7 4
Since 4 is less than its parent, 6, we swap them:
     2
    / \
   /   \
  3     4
 / \   /
5   7 6
Now we check whether 4 is less than its new parent. Since its parent is 2, which is smaller, we stop. The Heap is still valid, and we inserted number 4.
Let’s insert 1:
     2
    / \
   /   \
  3     4
 / \   / \
5   7 6   1
We have to swap 1 and 4:
     2
    / \
   /   \
  3     1
 / \   / \
5   7 6   4
Now we should swap 1 and 2:
     1
    / \
   /   \
  3     2
 / \   / \
5   7 6   4
Since 1 is the new root, we stop.
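The two insertions above can be replayed in a few lines of code. This is only a sketch under some assumptions: it keeps the Heap in a List<Integer> read level by level (the layout introduced in the next section), and InsertTrace and insert are illustrative names rather than the implementation we build later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InsertTrace {

    // Appends value as the last leaf, then swaps it upward while it is smaller than its parent.
    static void insert(List<Integer> heap, int value) {
        heap.add(value);
        int index = heap.size() - 1;
        while (index > 0) {
            int parent = (index - 1) / 2;
            if (heap.get(index) >= heap.get(parent)) {
                break; // the parent is not greater, so the Heap rule holds again
            }
            Integer tmp = heap.get(index);
            heap.set(index, heap.get(parent));
            heap.set(parent, tmp);
            index = parent;
        }
    }

    public static void main(String[] args) {
        // the example Heap, listed level by level
        List<Integer> heap = new ArrayList<>(Arrays.asList(2, 3, 6, 5, 7));
        insert(heap, 4);
        System.out.println(heap); // prints [2, 3, 4, 5, 7, 6]
        insert(heap, 1);
        System.out.println(heap); // prints [1, 3, 2, 5, 7, 6, 4]
    }
}
```

Reading each printed list level by level gives the same trees as the diagrams above.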
3. Heap Implementation in Java
Since we use a Full Binary Tree, we can implement it with an array: an element in the array will be a node in the tree. We number the nodes with array indices from left to right and from top to bottom, in the following way:
     0
    / \
   /   \
  1     2
 / \   /
3   4 5
The only thing we need is to keep track of how many elements we store in the tree. This way, the index of the next element we want to insert will be the current number of stored elements.
Using this indexing, we can calculate the index of the parent and child nodes:
- parent: (index - 1) / 2, using integer division
- left child: 2 * index + 1
- right child: 2 * index + 2
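As a quick sanity check of these formulas, the sketch below computes a few indices for the numbered tree above and uses the parent formula to verify the Min-Heap rule on a plain int array. The class HeapIndexDemo and the method isMinHeap are hypothetical helpers added for illustration.

```java
class HeapIndexDemo {

    static int parentIndex(int index) {
        return (index - 1) / 2;
    }

    static int leftChildIndex(int index) {
        return 2 * index + 1;
    }

    static int rightChildIndex(int index) {
        return 2 * index + 2;
    }

    // Checks the Min-Heap rule on a level-by-level array: no element is smaller than its parent.
    static boolean isMinHeap(int[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[parentIndex(i)]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parentIndex(5));     // prints 2
        System.out.println(leftChildIndex(1));  // prints 3
        System.out.println(rightChildIndex(1)); // prints 4
        System.out.println(isMinHeap(new int[] {1, 3, 2, 5, 7, 6, 4})); // prints true
        System.out.println(isMinHeap(new int[] {4, 3, 2, 5, 7, 6}));    // prints false
    }
}
```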
Since we don’t want to bother with array reallocation, we’ll simplify the implementation even more and use an ArrayList.
A basic Binary Tree implementation looks like this:
import java.util.ArrayList;
import java.util.List;

class BinaryTree<E> {

    // nodes stored level by level: index 0 is the root
    List<E> elements = new ArrayList<>();

    void add(E e) {
        elements.add(e);
    }

    boolean isEmpty() {
        return elements.isEmpty();
    }

    E elementAt(int index) {
        return elements.get(index);
    }

    int parentIndex(int index) {
        return (index - 1) / 2;
    }

    int leftChildIndex(int index) {
        return 2 * index + 1;
    }

    int rightChildIndex(int index) {
        return 2 * index + 2;
    }
}
The code above only adds the new element to the end of the tree. Therefore, we need to move the new element up if necessary. We can do it with the following code:
class Heap<E extends Comparable<E>> {

    // ...

    void add(E e) {
        elements.add(e);
        int elementIndex = elements.size() - 1;
        // move the new element up until it is the root or its parent is not greater
        while (!isRoot(elementIndex) && !isCorrectChild(elementIndex)) {
            int parentIndex = parentIndex(elementIndex);
            swap(elementIndex, parentIndex);
            elementIndex = parentIndex;
        }
    }

    boolean isRoot(int index) {
        return index == 0;
    }

    boolean isCorrectChild(int index) {
        return isCorrect(parentIndex(index), index);
    }

    boolean isCorrect(int parentIndex, int childIndex) {
        if (!isValidIndex(parentIndex) || !isValidIndex(childIndex)) {
            return true;
        }
        // "less than or equal to": equal elements don't need to be swapped
        return elementAt(parentIndex).compareTo(elementAt(childIndex)) <= 0;
    }

    boolean isValidIndex(int index) {
        return index < elements.size();
    }

    void swap(int index1, int index2) {
        E element1 = elementAt(index1);
        E element2 = elementAt(index2);
        elements.set(index1, element2);
        elements.set(index2, element1);
    }

    // ...
}
Note that since we need to compare the elements, they need to implement java.lang.Comparable.
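To make this requirement concrete, here is a small sketch of an element type that could be stored in such a Heap. The class PrioritizedTask and the helper ComparableDemo.isCorrect are made up for this illustration; the helper only mimics the kind of parent/child comparison the Heap performs.

```java
class PrioritizedTask implements Comparable<PrioritizedTask> {

    final String name;
    final int priority;

    PrioritizedTask(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    // A lower priority number means "smaller", so it ends up closer to the root of a Min-Heap.
    @Override
    public int compareTo(PrioritizedTask other) {
        return Integer.compare(priority, other.priority);
    }
}

class ComparableDemo {

    // A parent/child pair is in order when the parent is not greater than the child.
    static <E extends Comparable<E>> boolean isCorrect(E parent, E child) {
        return parent.compareTo(child) <= 0;
    }

    public static void main(String[] args) {
        PrioritizedTask urgent = new PrioritizedTask("fix outage", 1);
        PrioritizedTask later = new PrioritizedTask("update docs", 5);
        System.out.println(isCorrect(urgent, later)); // prints true
        System.out.println(isCorrect(later, urgent)); // prints false
    }
}
```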
4. Heap Sort
Since the root of the Heap always contains the smallest element, the idea behind Heap Sort is pretty simple: repeatedly remove the root node until the Heap becomes empty.
The only thing we need is a remove operation that keeps the Heap in a consistent state. We must ensure that we don’t violate the structure of the Binary Tree or the Heap property.
To keep the structure, we can’t delete any node except the rightmost leaf on the deepest level. So the idea is to remove the element from the root node and move the element of the rightmost leaf into the root node.
But this operation will most certainly violate the Heap property. So if the new root is greater than any of its child nodes, we swap it with its least child node. Since the least child node is less than or equal to all other child nodes, it doesn’t violate the Heap property.
We keep swapping until the element becomes a leaf or is no longer greater than any of its children.
Let’s delete the root from this tree:
     1
    / \
   /   \
  3     2
 / \   / \
5   7 6   4
First, we place the last leaf in the root:
     4
    / \
   /   \
  3     2
 / \   /
5   7 6
Then, since it’s greater than both of its children, we swap it with its least child, which is 2:
     2
    / \
   /   \
  3     4
 / \   /
5   7 6
4 is less than its only child, 6, so we stop.
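The removal we just walked through can be traced in code as well. The sketch below assumes the Heap is kept in a List<Integer> in level order and is non-empty; RemoveTrace and removeRoot are illustrative names, not the pop method of the Heap class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RemoveTrace {

    // Removes and returns the root of a non-empty heap, then moves the former
    // last leaf down from the root until no child is smaller than it.
    static int removeRoot(List<Integer> heap) {
        int root = heap.get(0);
        int last = heap.remove(heap.size() - 1); // removes by index
        if (heap.isEmpty()) {
            return root;
        }
        heap.set(0, last);
        int index = 0;
        while (true) {
            int left = 2 * index + 1;
            int right = left + 1;
            if (left >= heap.size()) {
                break; // a leaf has no children
            }
            int smaller = left;
            if (right < heap.size() && heap.get(right) < heap.get(left)) {
                smaller = right;
            }
            if (heap.get(index) <= heap.get(smaller)) {
                break; // no child is smaller, so the Heap rule holds
            }
            Collections.swap(heap, index, smaller);
            index = smaller;
        }
        return root;
    }

    public static void main(String[] args) {
        List<Integer> heap = new ArrayList<>(Arrays.asList(1, 3, 2, 5, 7, 6, 4));
        System.out.println(removeRoot(heap)); // prints 1
        System.out.println(heap);             // prints [2, 3, 4, 5, 7, 6]
    }
}
```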
5. Heap Sort Implementation in Java
With all we have, removing the root (popping) looks like this:
class Heap<E extends Comparable<E>> {

    // ...

    E pop() {
        if (isEmpty()) {
            throw new IllegalStateException("You cannot pop from an empty heap");
        }
        E result = elementAt(0);
        // move the last leaf into the root and drop the old root
        int lastElementIndex = elements.size() - 1;
        swap(0, lastElementIndex);
        elements.remove(lastElementIndex);
        // move the new root down until it is a leaf or no child is smaller
        int elementIndex = 0;
        while (!isLeaf(elementIndex) && !isCorrectParent(elementIndex)) {
            int smallerChildIndex = smallerChildIndex(elementIndex);
            swap(elementIndex, smallerChildIndex);
            elementIndex = smallerChildIndex;
        }
        return result;
    }

    boolean isLeaf(int index) {
        return !isValidIndex(leftChildIndex(index));
    }

    boolean isCorrectParent(int index) {
        return isCorrect(index, leftChildIndex(index)) && isCorrect(index, rightChildIndex(index));
    }

    int smallerChildIndex(int index) {
        int leftChildIndex = leftChildIndex(index);
        int rightChildIndex = rightChildIndex(index);
        if (!isValidIndex(rightChildIndex)) {
            return leftChildIndex;
        }
        if (elementAt(leftChildIndex).compareTo(elementAt(rightChildIndex)) < 0) {
            return leftChildIndex;
        }
        return rightChildIndex;
    }

    // ...
}
As we said before, sorting is just creating a Heap and removing the root repeatedly:
class Heap<E extends Comparable<E>> {

    // ...

    static <E extends Comparable<E>> List<E> sort(Iterable<E> elements) {
        Heap<E> heap = of(elements);
        List<E> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.pop());
        }
        return result;
    }

    static <E extends Comparable<E>> Heap<E> of(Iterable<E> elements) {
        Heap<E> result = new Heap<>();
        for (E element : elements) {
            result.add(element);
        }
        return result;
    }

    // ...
}
We can verify that it works with the following test:
@Test
void givenNotEmptyIterable_whenSortCalled_thenItShouldReturnElementsInSortedList() {
    // given
    List<Integer> elements = Arrays.asList(3, 5, 1, 4, 2);

    // when
    List<Integer> sortedElements = Heap.sort(elements);

    // then
    assertThat(sortedElements).isEqualTo(Arrays.asList(1, 2, 3, 4, 5));
}
Note that we could provide an implementation that sorts in place, which means it returns the result in the same array that held the input elements. This way, we don’t need any intermediate memory allocation. However, that implementation would be a bit harder to understand.
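For the curious, here is one possible sketch of such an in-place variant. Note that it departs from the Heap class above in two ways: it works on a plain int[] and it uses a Max-Heap, built bottom-up, so that each removed maximum can be parked at the end of the same array. The names InPlaceHeapSort and siftDown are our own choices for this illustration.

```java
import java.util.Arrays;

class InPlaceHeapSort {

    static void sort(int[] values) {
        int n = values.length;
        // Build a Max-Heap bottom-up: sift down every node that has at least one child.
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(values, i, n);
        }
        // Repeatedly move the current maximum right behind the shrinking heap.
        for (int end = n - 1; end > 0; end--) {
            swap(values, 0, end);
            siftDown(values, 0, end);
        }
    }

    // Moves values[index] down until no child inside [0, size) is greater than it.
    static void siftDown(int[] values, int index, int size) {
        while (true) {
            int left = 2 * index + 1;
            if (left >= size) {
                return; // reached a leaf
            }
            int greater = left;
            int right = left + 1;
            if (right < size && values[right] > values[left]) {
                greater = right;
            }
            if (values[index] >= values[greater]) {
                return;
            }
            swap(values, index, greater);
            index = greater;
        }
    }

    static void swap(int[] values, int i, int j) {
        int tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
    }

    public static void main(String[] args) {
        int[] values = {3, 5, 1, 4, 2};
        sort(values);
        System.out.println(Arrays.toString(values)); // prints [1, 2, 3, 4, 5]
    }
}
```

The Max-Heap is what makes the in-place trick work: the heap occupies the front of the array while the sorted part grows from the back.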
6. Time Complexity
Heap Sort consists of two key steps: inserting an element and removing the root node. Both steps have O(log n) complexity.
Since we repeat both steps n times, the overall sorting complexity is O(n log n).
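The bound can be spelled out a little more. As a rough sketch, assume that an insert or a removal on a Heap holding k elements needs at most ⌊log₂ k⌋ swaps, which is the height of a complete tree with k nodes. Summing over all n operations of one kind gives:

```latex
\sum_{k=1}^{n} \lfloor \log_2 k \rfloor
  \;\le\; \sum_{k=1}^{n} \log_2 n
  \;=\; n \log_2 n
```

Both phases (n insertions, then n removals) stay within this bound, so the total is at most about 2 n log₂ n swaps, which is O(n log n).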
Note that we didn’t mention the cost of array reallocation, but since it’s O(n), it doesn’t affect the overall complexity. Also, as we mentioned before, it’s possible to implement in-place sorting, which means no array reallocation is necessary.
It’s also worth mentioning that about 50% of the elements are leaves, and about 75% of the elements are on the two bottommost levels. Therefore, on randomly ordered input, most insert operations won’t take more than two steps.
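The share of leaves is easy to check with the index formulas from earlier: in the level-by-level layout, a node is a leaf exactly when its left child index falls outside the stored elements. The helper below is a hypothetical sketch (LeafShareDemo and countLeaves are not part of the Heap class).

```java
class LeafShareDemo {

    // Counts the nodes without a left child in a level-by-level layout of n elements.
    static int countLeaves(int n) {
        int leaves = 0;
        for (int i = 0; i < n; i++) {
            if (2 * i + 1 >= n) {
                leaves++;
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        System.out.println(countLeaves(7));    // prints 4
        System.out.println(countLeaves(1000)); // prints 500
    }
}
```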
Note that on real-world data, Quicksort is usually faster than Heap Sort. The silver lining is that Heap Sort always has O(n log n) time complexity, even in the worst case.
7. Conclusion
In this tutorial, we saw an implementation of a Binary Heap and Heap Sort.
Even though its time complexity is O(n log n), in most cases it isn’t the best algorithm on real-world data.
As usual, the examples are available over on GitHub.