1. Overview
1.概述
In this tutorial, we’ll talk about the performance of different collections from the Java Collection API. When we talk about collections, we usually think about the List, Map, and Set data structures, as well as their common implementations.
在本教程中,我们将讨论来自 Java Collection API 的不同集合的性能。当我们谈论集合时,我们通常会想到List、Map和Set数据结构,以及它们的常见实现。
First, we’ll look at Big-O complexity insights for common operations. Then we’ll show the real numbers of some collection operations’ running times.
首先,我们会看一下常见操作的Big-O复杂性的见解。然后,我们将展示一些收集操作的运行时间的真实数字。
2. Time Complexity
2.时间的复杂性
Usually, when we talk about time complexity, we refer to Big-O notation. Simply put, the notation describes how the time to perform the algorithm grows with the input size.
通常,当我们谈论时间复杂度时,我们指的是Big-O符号。简单地说,该符号描述了执行算法的时间是如何随着输入大小而增长的。
Useful write-ups are available to learn more about Big-O notation theory and practical Java examples.
有一些有用的写法,可以了解更多关于Big-O符号理论和实际Java例子。
3. List
3.列表
Let’s start with a simple list, which is an ordered collection.
让我们从一个简单的列表开始,它是一个有序的集合。
Here we’ll look at a performance overview of the ArrayList, LinkedList, and CopyOnWriteArrayList implementations.
这里我们将看看ArrayList、LinkedList、和CopyOnWriteArrayList实现的性能概述。
3.1. ArrayList
3.1.ArrayList
The ArrayList in Java is backed by an array. This helps to understand the internal logic of its implementation. A more comprehensive guide for the ArrayList is available in this article.
Java中的ArrayList是由一个数组支持的。这有助于理解其实现的内部逻辑。关于ArrayList的更全面的指南可在这篇文章中找到。
So let’s focus first on the time complexity of the common operations at a high level:
因此,让我们首先关注一下高水平的普通操作的时间复杂性。
- add() – takes O(1) time; however, worst-case scenario, when a new array has to be created and all the elements copied to it, it’s O(n)
- add(index, element) – on average runs in O(n) time
- get() – is always a constant time O(1) operation
- remove() – runs in linear O(n) time. We have to iterate the entire array to find the element qualifying for removal.
- indexOf() – also runs in linear time. It iterates through the internal array and checks each element one by one, so the time complexity for this operation always requires O(n) time.
- contains() – implementation is based on indexOf(), so it’ll also run in O(n) time.
3.2. CopyOnWriteArrayList
3.2.CopyOnWriteArrayList
This implementation of the List interface is beneficial when working with multi-threaded applications. It’s thread-safe and explained well in this guide here.
这个List接口的实现在处理多线程应用程序时是有益的。它是线程安全的,并且在这里的指南中得到了很好的解释。
Here’s the Big-O notation performance overview for CopyOnWriteArrayList:
下面是CopyOnWriteArrayList的Big-O记号性能概述。
- add() – depends on the position we add value, so the complexity is O(n)
- get() – is O(1) constant time operation
- remove() – takes O(n) time
- contains() – likewise, the complexity is O(n)
As we can see, using this collection is very expensive because of the performance characteristics of the add() method.
我们可以看到,由于add()方法的性能特点,使用这个集合是非常昂贵的。
3.3. LinkedList
3.3.LinkedList
LinkedList is a linear data structure that consists of nodes holding a data field and a reference to another node. For more LinkedList features and capabilities, have a look at this article here.
LinkedList是一种线性数据结构,由持有一个数据字段的节点和另一个节点的引用组成。关于更多的LinkedList特性和功能,请看这里的文章。
Let’s present the average estimate of time we need to perform some basic operations:
让我们介绍一下我们进行一些基本操作所需的平均估计时间。
- add() – appends an element to the end of the list. It only updates a tail, and therefore, it’s O(1) constant-time complexity.
- add(index, element) – on average runs in O(n) time
- get() – searching for an element takes O(n) time.
- remove(element) – to remove an element, we first need to find it. This operation is O(n).
- remove(index) – to remove an element by index, we first need to follow the links from the beginning; therefore, the overall complexity is O(n).
- contains() – also has O(n) time complexity
3.4. Warming Up the JVM
3.4.预热JVM
Now, to prove the theory, let’s play with actual data. To be more precise, we’ll present the JMH (Java Microbenchmark Harness) test results of the most common collection operations.
现在,为了证明这一理论,让我们用实际数据来玩玩。为了更精确,我们将介绍最常见的集合操作的JMH(Java Microbenchmark Harness)测试结果。
If we’re not familiar with the JMH tool, we can check out this useful guide.
如果我们对JMH工具不熟悉,可以看看这个有用的指南。
First, we’ll present the main parameters of our benchmark tests:
首先,我们将介绍我们基准测试的主要参数。
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 10)
public class ArrayListBenchmark {
}
Then we’ll set the warm-up iterations number to 10. Note that we also want to see the average running time of our results displayed in microseconds.
然后我们将预热迭代次数设置为10。注意,我们还想看到以微秒为单位显示的结果的平均运行时间。
3.5. Benchmark Tests
3.5.基准测试
Now it’s time to run our performance tests. First, we’ll start with the ArrayList:
现在是时候运行我们的性能测试了。首先,我们将从ArrayList开始。
@State(Scope.Thread)
public static class MyState {
List<Employee> employeeList = new ArrayList<>();
long iterations = 100000;
Employee employee = new Employee(100L, "Harry");
int employeeIndex = -1;
@Setup(Level.Trial)
public void setUp() {
for (long i = 0; i < iterations; i++) {
employeeList.add(new Employee(i, "John"));
}
employeeList.add(employee);
employeeIndex = employeeList.indexOf(employee);
}
}
Inside our ArrayListBenchmark, we add the State class to hold the initial data.
在我们的ArrayListBenchmark中,我们添加了State类来保存初始数据。
Here, we create an ArrayList of Employee objects. Then we initialize it with 100.000 items inside of the setUp() method. The @State indicates that the @Benchmark tests have full access to the variables declared in it within the same thread.
在这里,我们创建一个Employee对象的ArrayList。然后我们在setUp()方法中用100.000项来初始化它。@State表示@Benchmark测试可以在同一线程中完全访问其中声明的变量。
Finally, it’s time to add the benchmark tests for the add(), contains(), indexOf(), remove(), and get() methods:
最后,是时候为add(), contains(), indexOf(), remove(), 和get()方法添加基准测试了。
@Benchmark
public void testAdd(ArrayListBenchmark.MyState state) {
state.employeeList.add(new Employee(state.iterations + 1, "John"));
}
@Benchmark
public void testAddAt(ArrayListBenchmark.MyState state) {
state.employeeList.add((int) (state.iterations), new Employee(state.iterations, "John"));
}
@Benchmark
public boolean testContains(ArrayListBenchmark.MyState state) {
return state.employeeList.contains(state.employee);
}
@Benchmark
public int testIndexOf(ArrayListBenchmark.MyState state) {
return state.employeeList.indexOf(state.employee);
}
@Benchmark
public Employee testGet(ArrayListBenchmark.MyState state) {
return state.employeeList.get(state.employeeIndex);
}
@Benchmark
public boolean testRemove(ArrayListBenchmark.MyState state) {
return state.employeeList.remove(state.employee);
}
3.6. Test Results
3.6.测试结果
All of the results are presented in microseconds:
所有的结果都是以微秒为单位。
Benchmark Mode Cnt Score Error
ArrayListBenchmark.testAdd avgt 20 2.296 ± 0.007
ArrayListBenchmark.testAddAt avgt 20 101.092 ± 14.145
ArrayListBenchmark.testContains avgt 20 709.404 ± 64.331
ArrayListBenchmark.testGet avgt 20 0.007 ± 0.001
ArrayListBenchmark.testIndexOf avgt 20 717.158 ± 58.782
ArrayListBenchmark.testRemove avgt 20 624.856 ± 51.101
From the results, we learn that the testContains() and testIndexOf() methods run at approximately the same time. We can also clearly see the huge difference between the testAdd() and testGet() method scores from the rest of the results. Adding an element takes 2.296 microseconds, and getting one is a 0.007-microsecond operation.
从结果中,我们得知testContains()和testIndexOf()方法的运行时间大致相同。我们还可以清楚地看到testAdd()和testGet()方法的得分与其他结果的巨大差异。添加一个元素需要2.296微秒,而获取一个元素是0.007微秒的操作。
Furthermore, searching or removing an element costs roughly 700 microseconds. These numbers are proof of the theoretical part, where we learned that add(), and get() have O(1) time complexity, and the other methods are O(n). n=10.000 elements in our example.
此外,搜索或删除一个元素大约需要700微秒。这些数字是理论部分的证明,我们了解到add(), 和get()的时间复杂度为O(1),其他方法为O(n)。在我们的例子中,n=10.000元素。
Similarly, we can write the same tests for the CopyOnWriteArrayList collection. All we need to do is replace the ArrayList in employeeList with the CopyOnWriteArrayList instance.
同样地,我们可以为CopyOnWriteArrayList集合编写同样的测试。我们所要做的就是用CopyOnWriteArrayList实例替换EmployeeList中的ArrayList。
Here are the results of the benchmark test:
以下是基准测试的结果。
Benchmark Mode Cnt Score Error
CopyOnWriteBenchmark.testAdd avgt 20 652.189 ± 36.641
CopyOnWriteBenchmark.testAddAt avgt 20 897.258 ± 35.363
CopyOnWriteBenchmark.testContains avgt 20 537.098 ± 54.235
CopyOnWriteBenchmark.testGet avgt 20 0.006 ± 0.001
CopyOnWriteBenchmark.testIndexOf avgt 20 547.207 ± 48.904
CopyOnWriteBenchmark.testRemove avgt 20 648.162 ± 138.379
Here, again, the numbers confirm the theory. As we can see, testGet() on average runs in 0.006 ms, which we can consider as O(1). Comparing to ArrayList, we also notice the significant difference between the testAdd() method results, as here we have O(n) complexity for the add() method versus ArrayList’s O(1).
这里,数字再次证实了理论。我们可以看到,testGet()的平均运行时间为0.006毫秒,我们可以认为是O(1)。与ArrayList相比,我们也注意到了testAdd()方法结果的显著差异,因为在这里我们的add()方法的复杂度为O(n),而ArrayList的复杂度为O(1)。
We can clearly see the linear growth of time, as the performance numbers are 878.166 compared to 0.051.
我们可以清楚地看到时间的线性增长,因为性能数字是878.166,而0.051。
Now it’s LinkedList time:
现在是LinkedList时间。
Benchmark Cnt Score Error
testAdd 20 2.580 ± 0.003
testContains 20 1808.102 ± 68.155
testGet 20 1561.831 ± 70.876
testRemove 20 0.006 ± 0.001
We can see from the scores that adding and removing elements in LinkedList is quite fast.
我们可以从分数中看到,在LinkedList中添加和删除元素是相当快的。
Furthermore, there’s a significant performance gap between add/remove and get/contains operations.
此外,在添加/删除和获取/包含操作之间存在着明显的性能差距。
4. Map
4.地图
With the latest JDK versions, we’re witnessing significant performance improvement for Map implementations, such as replacing the LinkedList with the balanced tree node structure in HashMap, and LinkedHashMap internal implementations. This shortens the element lookup worst-case scenario from O(n) to O(log(n)) time during the HashMap collisions.
在最新的JDK版本中,我们见证了Map实现的显著性能改进,例如在HashMap,和LinkedHashMap内部实现中,用平衡树节点结构取代了LinkedList。这将元素查找的最坏情况从O(n)缩短到O(log(n))时间,在HashMap碰撞中。
However, if we implement proper .equals() and .hashcode() methods, collisions are unlikely.
然而,如果我们实现了适当的.equals()和.hashcode()方法,碰撞就不可能发生。
To learn more about HashMap collisions, check out this write-up. From the write-up, we’ll also learn that storing and retrieving elements from the HashMap takes constant O(1) time.
要了解有关HashMap碰撞的更多信息,请查看这个写作。从该文章中,我们还将了解到,从HashMap中存储和检索元素需要恒定的O(1)时间。
4.1. Testing O(1) Operations
4.1.测试O(1)操作
Now let’s see some actual numbers. First, the HashMap:
现在我们来看看一些实际的数字。首先是HashMap。
Benchmark Mode Cnt Score Error
HashMapBenchmark.testContainsKey avgt 20 0.009 ± 0.002
HashMapBenchmark.testGet avgt 20 0.011 ± 0.001
HashMapBenchmark.testPut avgt 20 0.019 ± 0.002
HashMapBenchmark.testRemove avgt 20 0.010 ± 0.001
As we can see, the numbers prove the O(1) constant time for running the methods listed above. Now let’s compare the HashMap test scores with the other Map instance scores.
正如我们所看到的,这些数字证明了运行上述方法的O(1)恒定时间。现在让我们将HashMap测试得分与其他Map实例得分进行比较。
For all of the listed methods, we have O(1) for HashMap, LinkedHashMap, IdentityHashMap, WeakHashMap, EnumMap and ConcurrentHashMap.
对于所有列出的方法,我们对HashMap、LinkedHashMap、IdentityHashMap、WeakHashMap、EnumMap和ConcurrentHashMap有O(1)。
Let’s present the results of the remaining test scores in the form of a table:
让我们以表格的形式介绍一下其余考试成绩的结果。
Benchmark LinkedHashMap IdentityHashMap WeakHashMap ConcurrentHashMap
testContainsKey 0.008 0.009 0.014 0.011
testGet 0.011 0.109 0.019 0.012
testPut 0.020 0.013 0.020 0.031
testRemove 0.011 0.115 0.021 0.019
From the output numbers, we can confirm the claims of O(1) time complexity.
从输出数字来看,我们可以确认O(1)时间复杂性的说法。
4.2. Testing O(log(n)) Operations
4.2.测试O(log(n))操作
For the tree structure TreeMap and ConcurrentSkipListMap, the put(), get(), remove(), and containsKey() operations time is O(log(n)).
对于树状结构TreeMap和ConcurrentSkipListMap,的put(), get(), remove(), 和 containsKey() 操作时间为O(log(n))。
Here we want to make sure that our performance tests will run approximately in logarithmic time. For this reason, we’ll initialize the maps with n=1000, 10,000, 100,000, 1,000,000 items continuously.
在这里,我们要确保我们的性能测试将大约以对数时间运行。出于这个原因,我们将用n=1000、10,000、100,000、1,000,000个项目连续初始化地图。
In this case, we’re interested in the total time of execution:
在这种情况下,我们对总的执行时间感兴趣:。
items count (n) 1000 10,000 100,000 1,000,000
all tests total score 00:03:17 00:03:17 00:03:30 00:05:27
When n=1000, we have a total of 00:03:17 milliseconds execution time. At n=10,000, the time is almost unchanged, 00:03:18 ms. n=100,000 has a minor increase at 00:03:30. And finally, when n=1,000,000, the run completes in 00:05:27 ms.
当n=1000,我们共有00:03:17毫秒的执行时间。在n=10,000,时,时间几乎没有变化,00:03:18毫秒。n=100,000在00:03:30时有一个小的增长。最后,当n=1,000,000,时,运行在00:05:27 ms完成。
After comparing the runtime numbers with the log(n) function of each n, we can confirm that the correlation of both functions matches.
在将运行时间数字与每个n的log(n)函数进行比较后,我们可以确认这两个函数的相关性是相符的。
5. Set
5.设置
Generally, Set is a collection of unique elements. Here we’re going to examine the HashSet, LinkedHashSet, EnumSet, TreeSet, CopyOnWriteArraySet, and ConcurrentSkipListSet implementations of the Set interface.
一般来说,Set是一个独特元素的集合。在这里,我们将研究HashSet、LinkedHashSet、EnumSet、TreeSet、CopyOnWriteArraySet和ConcurrentSkipListSet接口的实现。
To better understand the internals of the HashSet, this guide is here to help.
为了更好地了解HashSet的内部情况,本指南将为您提供帮助。
Now let’s jump ahead to present the time complexity numbers. For HashSet, LinkedHashSet, and EnumSet, the add(), remove() and contains() operations cost constant O(1) time thanks to the internal HashMap implementation.
现在让我们跳到前面来介绍一下时间复杂性的数字。对于HashSet、LinkedHashSet、和EnumSet,的add()、remove()和contains()操作花费了恒定的O(1)时间,这得益于内部HashMap实现。
Likewise, the TreeSet has O(log(n)) time complexity for the operations listed in the previous group. This is because of the TreeMap implementation. The time complexity for ConcurrentSkipListSet is also O(log(n)) time, as it’s based in skip list data structure.
同样,TreeSet对于前一组列出的操作,其时间复杂性为O(log(n))。这是因为TreeMap的实现。ConcurrentSkipListSet的时间复杂性也是O(log(n))时间,因为它是基于跳过列表数据结构的。
For CopyOnWriteArraySet, the add(), remove() and contains() methods have O(n) average time complexity.
对于CopyOnWriteArraySet,add()、remove()和contains()方法的平均时间复杂性为O(n)。
5.1. Test Methods
5.1.测试方法
Now let’s jump to our benchmark tests:
现在让我们跳到我们的基准测试。
@Benchmark
public boolean testAdd(SetBenchMark.MyState state) {
return state.employeeSet.add(state.employee);
}
@Benchmark
public Boolean testContains(SetBenchMark.MyState state) {
return state.employeeSet.contains(state.employee);
}
@Benchmark
public boolean testRemove(SetBenchMark.MyState state) {
return state.employeeSet.remove(state.employee);
}
We’ll leave the remaining benchmark configurations as they are.
我们将保持其余的基准配置不变。
5.2. Comparing the Numbers
5.2.数字的比较
Let’s see the behavior of the runtime execution score for HashSet and LinkedHashSet having n = 1000; 10,000; 100,000 items.
让我们看看HashSet和LinkedHashSet的运行时执行分数的行为,其中n = 1000; 10,000; 100,000项。
For the HashSet, the numbers are:
对于HashSet,数字是。
Benchmark 1000 10,000 100,000
.add() 0.026 0.023 0.024
.remove() 0.009 0.009 0.009
.contains() 0.009 0.009 0.010
Similarly, the results for the LinkedHashSet are:
同样地,LinkedHashSet的结果是。
Benchmark 1000 10,000 100,000
.add() 0.022 0.026 0.027
.remove() 0.008 0.012 0.009
.contains() 0.008 0.013 0.009
As we can see, the scores remain almost the same for each operation. When we compare them with the HashMap test outputs, they look the same as well.
我们可以看到,每个操作的分数几乎都是一样的。当我们将它们与HashMap测试输出相比较时,它们看起来也是一样的。
As a result, we confirm that all the tested methods run in constant O(1) time.
结果是,我们确认所有测试方法都能在恒定的O(1)时间内运行。
6. Conclusion
6.结论
This article presents the time complexity of the most common implementations of the Java data structures.
本文介绍了Java数据结构最常见的实现方式的时间复杂性。
We saw the actual runtime performance of each type of collection through the JVM benchmark tests. We also compared the performance of the same operations in different collections. As a result, we learned how to choose the right collection to fit our needs.
我们通过JVM的基准测试看到了每种类型集合的实际运行性能。我们还比较了不同集合中相同操作的性能。结果是,我们学会了如何选择合适的集合来满足我们的需求。
As usual, the complete code for this article is available over on GitHub.
像往常一样,本文的完整代码可以在GitHub上找到。