1. Overview
1.概述
The Java programming language provides arrays and collections to group objects together. Mostly, a collection is backed by an array and modeled with a set of methods to process the elements it contains.
Java编程语言提供了数组和集合来将对象分组。大多数情况下,集合以数组为后盾,并以一组方法为模型来处理其中的元素。
While developing software, it’s quite common to use both of these data structures. Hence, programmers need a bridging mechanism to convert these elements from one form to another. The asList method from the Arrays class and the Collection interface’s toArray method form this bridge.
在开发软件时,使用这两种数据结构是很常见的。因此,程序员需要一个桥接机制来将这些元素从一种形式转换为另一种形式。asList方法来自Arrays类和Collection接口的toArray方法就构成这座桥梁。
In this tutorial, we’ll do an in-depth analysis of an interesting argument: which toArray method to use and why? We’ll also use JMH-assisted benchmarking to support these arguments.
在本教程中,我们将对一个有趣的论点进行深入分析。使用哪种toArray方法,为什么?我们还将使用JMH辅助的基准测试来支持这些论点。
2. The toArray Rabbit Hole
2.toArray兔子洞
Before aimlessly invoking the toArray method, let’s understand what’s inside the box. The Collection interface offers two methods to transform a collection into an array:
在漫无目的地调用toArray方法之前,让我们先了解一下盒子里的东西。集合接口提供了两个方法来将一个集合转化为数组。
Object[] toArray()
<T> T[] toArray(T[] a)
Both methods return an array containing all elements of the collection. To demonstrate this, let’s create a list of natural numbers:
这两种方法都返回一个包含集合中所有元素的数组。为了证明这一点,让我们创建一个自然数的列表。
List<Integer> naturalNumbers = IntStream
.range(1, 10000)
.boxed()
.collect(Collectors.toList());
2.1. Collection.toArray()
2.1.Collection.toArray()
The toArray() method allocates a new in-memory array with a length equal to the size of the collection. Internally, it invokes the Arrays.copyOf on the underlying array backing the collection. Therefore, the returned array has no references to it and is safe to use:
toArray()方法在内存中分配一个新的数组,其长度等于集合的大小。在内部, 它在支持集合的底层数组上调用Arrays.copyOf 。因此,返回的数组没有对它的引用,可以安全使用。
Object[] naturalNumbersArray = naturalNumbers.toArray();
However, we cannot merely cast the result into an Integer[]. Doing so will result in a ClassCastException.
然而,我们不能仅仅将结果投到Integer[]中。这样做将导致ClassCastException。
2.2. <T> T[] Collection.toArray(T[] a)
2.2.<T> T[] Collection.toArray(T[] a)
Unlike the non-parameterized method, this one accepts a pre-allocated array as an argument. Additionally, the use of Generics in the method’s definition mandates having the same type for the input and the returned array. This also solves the previously observed problem of iterating over an Object[].
与非参数化方法不同,这个方法接受一个预先分配的数组作为参数。此外,在该方法的定义中使用Generics,这就要求输入和返回的数组必须具有相同的类型。这也解决了之前观察到的在Object[]上迭代的问题。
This variant works distinctively based on the size of the input array:
这个变体基于输入数组的大小而独特地工作。
- If the length of the pre-allocated array is less than the collection’s size, a new array of the required length and the same type is allocated:
Integer[] naturalNumbersArray = naturalNumbers.toArray(new Integer[0]);
- If the input array is large enough to contain the collection’s elements, it’s returned with those elements inside:
Integer[] naturalNumbersArray = naturalNumbers.toArray(new Integer[naturalNumbers.size]);
Now, let’s switch back to the original question of selecting the faster and better-performing candidate.
现在,让我们换回最初的问题,即选择速度更快、表现更好的候选人。
3. Performance Trials
3.性能试验
Let’s begin with a simple experiment that compares the zero-sized (toArray(new T[0]) and the pre-sized (toArray(new T[size]) variants. We’ll use the popular ArrayList and the AbstractCollection backed TreeSet for the trials. Also, we’ll include differently sized (small, medium, and large) collections to have a broad spectrum of sample data.
让我们从一个简单的试验开始,比较一下零大小(toArray(new T[0])和预大小(toArray(new T[size])的变体。我们将使用流行的ArrayList和AbstractCollection支持的TreeSet来进行试验。另外,我们将包括不同大小(小、中、大)的集合,以拥有广泛的样本数据。
3.1. The JMH Benchmark
3.1.JMH基准
Next, let’s put together a JMH (Java Microbenchmark Harness) benchmark for our trials. We’ll configure the size and type parameters of the collection for the benchmark:
接下来,让我们把一个JMH(Java Microbenchmark Harness)基准放在一起进行试验。我们将为该基准配置集合的大小和类型参数。
@Param({ "10", "10000", "10000000" })
private int size;
@Param({ "array-list", "tree-set" })
private String type;
Additionally, we’ll define benchmark methods for the zero-sized and the pre-sized toArray variants:
此外,我们将为零大小和预大小的toArray变体定义基准方法。
@Benchmark
public String[] zero_sized() {
return collection.toArray(new String[0]);
}
@Benchmark
public String[] pre_sized() {
return collection.toArray(new String[collection.size()]);
}
3.2. Benchmark Results
3.2 基准结果
Running the above benchmark on an 8 vCPU, 32 GB RAM, Linux x86_64 Virtual Machine with JMH (v1.28) and JDK (1.8.0_292) furnishes the results shown below. The Score reveals the average execution time, in nanoseconds per operation, for each of the benchmarked methods.
在8个vCPU、32GB内存、Linux x86_64虚拟机上运行上述基准,并使用JMH(v1.28)和JDK(1.8.0_292),得到如下结果。该分数显示了每个基准方法的平均执行时间,以纳秒为单位。
The lower the value, the better the performance:
该值越低,性能越好。
Benchmark (size) (type) Mode Cnt Score Error Units
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
TestBenchmark.zero_sized 10 array-list avgt 15 24.939 ± 1.202 ns/op
TestBenchmark.pre_sized 10 array-list avgt 15 38.196 ± 3.767 ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized 10000 array-list avgt 15 15244.367 ± 238.676 ns/op
TestBenchmark.pre_sized 10000 array-list avgt 15 21263.225 ± 802.684 ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized 10000000 array-list avgt 15 82710389.163 ± 6616266.065 ns/op
TestBenchmark.pre_sized 10000000 array-list avgt 15 100426920.878 ± 10381964.911 ns/op
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
TestBenchmark.zero_sized 10 tree-set avgt 15 66.802 ± 5.667 ns/op
TestBenchmark.pre_sized 10 tree-set avgt 15 66.009 ± 4.504 ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized 10000 tree-set avgt 15 85141.622 ± 2323.420 ns/op
TestBenchmark.pre_sized 10000 tree-set avgt 15 89090.155 ± 4895.966 ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized 10000000 tree-set avgt 15 211896860.317 ± 21019102.769 ns/op
TestBenchmark.pre_sized 10000000 tree-set avgt 15 212882486.630 ± 20921740.965 ns/op
After careful observation of the above results, it’s quite evident that the zero-sized method calls win it all, for all sizes and collection types in this trial.
在仔细观察了上述结果后,很明显,在这次试验中,零尺寸的方法调用赢得了一切,对所有尺寸和集合类型都是如此。
For now, these numbers are just data. To have a detailed understanding, let’s dig deep and analyze them.
就目前而言,这些数字只是数据。为了有一个详细的了解,让我们深入挖掘和分析它们。
3.3. The Allocation Rate
3.3.分配率
Hypothetically, it can be assumed that the zero-sized toArray method calls perform better than the pre-sized ones due to optimized memory allocations per operation. Let’s clarify this by executing another benchmark and quantifying the average allocation rates – the memory in bytes allocated per operation – for the benchmarked methods.
假设一下,可以假设零大小的toArray方法调用比预大小的方法表现更好,因为每个操作的内存分配得到了优化。让我们通过执行另一个基准测试来澄清这一点,并量化平均分配率–每次操作分配的内存字节数–对于基准方法来说。
The JMH provides a GC profiler (-prof gc) that internally uses ThreadMXBean#getThreadAllocatedBytes to calculate the allocation rate per @Benchmark:
JMH提供了一个GC剖析器(-prof gc),其内部使用ThreadMXBean#getThreadAllocatedBytes来计算每个@Benchmark的分配率。
Benchmark (size) (type) Mode Cnt Score Error Units
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
TestBenchmark.zero_sized:·gc.alloc.rate.norm 10 array-list avgt 15 72.000 ± 0.001 B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm 10 array-list avgt 15 56.000 ± 0.001 B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm 10000 array-list avgt 15 40032.007 ± 0.001 B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm 10000 array-list avgt 15 40016.010 ± 0.001 B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm 10000000 array-list avgt 15 40000075.796 ± 8.882 B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm 10000000 array-list avgt 15 40000062.213 ± 4.739 B/op
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
TestBenchmark.zero_sized:·gc.alloc.rate.norm 10 tree-set avgt 15 56.000 ± 0.001 B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm 10 tree-set avgt 15 56.000 ± 0.001 B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm 10000 tree-set avgt 15 40055.818 ± 16.723 B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm 10000 tree-set avgt 15 41069.423 ± 1644.717 B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm 10000000 tree-set avgt 15 40000155.947 ± 9.416 B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm 10000000 tree-set avgt 15 40000138.987 ± 7.987 B/op
Clearly, the above numbers prove that the allocation rate is more or less the same for identical sizes, irrespective of the collection type or the toArray variant. Therefore, it negates any speculative assumptions that the pre-sized and zero-sized toArray variants perform differently due to the irregularities in their memory allocation rates.
显然,上述数字证明,无论集合类型或toArray变体如何,对于相同的大小,分配率是大致相同的。因此,它否定了任何猜测性的假设,即预大小和零大小的toArray变体由于其内存分配率的不规则而表现不同。
3.4. The toArray(T[] a) Internals
3.4.toArray(T[] a)的内部结构
To further root out the cause of the problem, let’s delve into the ArrayList internals:
为了进一步找出问题的原因,让我们深入研究一下ArrayList的内部结构。
if (a.length < size)
return (T[]) Arrays.copyOf(elementData, size, a.getClass());
System.arraycopy(elementData, 0, a, 0, size);
if (a.length > size)
a[size] = null;
return a;
Basically, depending on the length of the pre-allocated array, it’s either an Arrays.copyOf or the native System.arraycopy method call that copies the underlying elements of the collection into an array.
基本上,根据预先分配的数组的长度,要么是Arrays.copyOf,要么是native System.arraycopy方法调用,将集合的底层元素复制到数组中。
Further, gazing at the copyOf method, it’s evident that first a copy array of length equal to the size of the collection is created and then followed by the System.arraycopy invocation:
此外,凝视copyOf方法,很明显,首先创建了一个长度等于集合大小的拷贝数组,然后是System.arraycopyinvocation。
T[] copy = ((Object)newType == (Object)Object[].class)
? (T[]) new Object[newLength]
: (T[]) Array.newInstance(newType.getComponentType(), newLength);
System.arraycopy(original, 0, copy, 0,
Math.min(original.length, newLength));
When both the zero-sized and the pre-sized methods eventually invoke the native System.arraycopy method, how is the zero-sized method call faster?
当零尺寸和预尺寸方法最终都调用本地System.arraycopy方法时,零尺寸方法的调用速度如何?
The mystery lies in the direct costs of the CPU time spent in performing Zero Initializations for the externally pre-allocated arrays that make the toArray(new T[size]) method much slower.
谜底在于为外部预分配的数组执行零初始化所花费的CPU时间的直接成本,这使得toArray(new T[size]) 方法慢了很多。
4. Zero Initializations
4.零点初始化
The Java language specification directs that newly instantiated arrays and objects should have the default field values and not the irregular leftovers from memory. Hence, the runtime must zero-out the pre-allocated storage. Benchmarking experiments have proved that the zero-sized array method calls managed to avoid zeroing, but the pre-sized case could not.
Java语言规范指示,新实例化的数组和对象应该有默认的字段值,而不是内存中不规则的遗留物。因此,运行时必须将预分配的存储空间清零。Benchmarking实验已经证明,零大小的数组方法调用能够避免归零,但是预大小的情况却不能。
Let’s consider a couple of benchmarks:
让我们考虑几个基准。
@Benchmark
public Foo[] arraycopy_srcLength() {
Object[] src = this.src;
Foo[] dst = new Foo[size];
System.arraycopy(src, 0, dst, 0, src.length);
return dst;
}
@Benchmark
public Foo[] arraycopy_dstLength() {
Object[] src = this.src;
Foo[] dst = new Foo[size];
System.arraycopy(src, 0, dst, 0, dst.length);
return dst;
}
Experimental observations show that the System.arraycopy immediately following the array allocation in the arraycopy_srcLength benchmark is able to avoid the pre-zeroing of the dst array. However, the arraycopy_dstLength execution could not avoid pre-zeroing.
实验观察表明,在arraycopy_srcLength基准中,紧随阵列分配之后的System.arraycopy能够避免dst array的预清零。然而,arraycopy_dstLength的执行不能避免预清零。
Coincidently, the latter arraycopy_dstLength case is similar to the pre-sized array method collection.toArray(new String[collection.size()]) where zeroing cannot be eliminated, hence its slowness.
巧合的是,后者arraycopy_dstLength的情况类似于预大小的数组方法collection.toArray(new String[collection.size()]),其中归零无法消除,因此其速度很慢。
5. Benchmarks on Newer JDKs
5.在较新的JDK上的基准测试
Finally, let’s execute the original benchmark on the recently released JDKs, and also configure the JVM to use the newer and much improved G1 garbage collector:
最后,让我们在最近发布的JDK上执行原来的基准,同时将JVM配置为使用较新的、改进很多的G1垃圾收集器。
# VM version: JDK 11.0.2, OpenJDK 64-Bit Server VM, 11.0.2+9
-----------------------------------------------------------------------------------
Benchmark (size) (type) Mode Cnt Score Error Units
-----------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 array-list avgt 15 199.920 ± 11.309 ns/op
ToArrayBenchmark.pre_sized 100 array-list avgt 15 237.342 ± 14.166 ns/op
-----------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 tree-set avgt 15 819.306 ± 85.916 ns/op
ToArrayBenchmark.pre_sized 100 tree-set avgt 15 972.771 ± 69.743 ns/op
###################################################################################
# VM version: JDK 14.0.2, OpenJDK 64-Bit Server VM, 14.0.2+12-46
------------------------------------------------------------------------------------
Benchmark (size) (type) Mode Cnt Score Error Units
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 array-list avgt 15 158.344 ± 3.862 ns/op
ToArrayBenchmark.pre_sized 100 array-list avgt 15 214.340 ± 5.877 ns/op
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 tree-set avgt 15 877.289 ± 132.673 ns/op
ToArrayBenchmark.pre_sized 100 tree-set avgt 15 934.550 ± 148.660 ns/op
####################################################################################
# VM version: JDK 15.0.2, OpenJDK 64-Bit Server VM, 15.0.2+7-27
------------------------------------------------------------------------------------
Benchmark (size) (type) Mode Cnt Score Error Units
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 array-list avgt 15 147.925 ± 3.968 ns/op
ToArrayBenchmark.pre_sized 100 array-list avgt 15 213.525 ± 6.378 ns/op
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 tree-set avgt 15 820.853 ± 105.491 ns/op
ToArrayBenchmark.pre_sized 100 tree-set avgt 15 947.433 ± 123.782 ns/op
####################################################################################
# VM version: JDK 16, OpenJDK 64-Bit Server VM, 16+36-2231
------------------------------------------------------------------------------------
Benchmark (size) (type) Mode Cnt Score Error Units
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 array-list avgt 15 146.431 ± 2.639 ns/op
ToArrayBenchmark.pre_sized 100 array-list avgt 15 214.117 ± 3.679 ns/op
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized 100 tree-set avgt 15 818.370 ± 104.643 ns/op
ToArrayBenchmark.pre_sized 100 tree-set avgt 15 964.072 ± 142.008 ns/op
####################################################################################
Interestingly, the toArray(new T[0]) method has been consistently faster than toArray(new T[size]). Also, its performance has constantly improved with every new release of the JDK.
有趣的是,toArray(new T[0]) 方法一直比toArray(new T[size])快。而且,它的性能随着JDK的每一个新版本而不断提高。
5.1. Java 11 Collection.toArray(IntFunction<T[]>)
5.1.Java 11 Collection.toArray(IntFunction<T[]>)
In Java 11, the Collection interface introduced a new default toArray method that accepts an IntFunction<T[]> generator as an argument (one that will generate a new array of the desired type and the provided length).
在 Java 11 中,Collection 接口引入了一个新的default toArray 方法,它接受一个 IntFunction<。T[]>生成器作为参数(一个将生成一个所需类型和所提供长度的新数组)。
This method guarantees new T[0] array initialization by invoking the generator function with a value of zero, thereby ensuring that the faster and better performing zero-sized toArray(T[]) method will always be executed.
该方法通过调用值为0的生成器函数来保证新T[0]数组的初始化,从而确保更快和性能更好的零大小toArray(T[])方法将始终被执行。
6. Conclusion
6.结语
In this article, we probed into the different toArray overloaded methods of the Collection interface. We also ran performance trials leveraging the JMH micro-benchmarking tool across different JDKs.
在这篇文章中,我们探究了toArray接口的不同toArray过载方法。我们还利用JMH微观基准工具在不同的JDK上进行了性能测试。
We understood the necessity and the impact of zeroing and observed how the internally allocated array eliminates the zeroing, thus winning the performance race. Lastly, we can firmly conclude that the toArray(new T[0]) variant is faster than the toArray(new T[size]) and, therefore, should always be the preferred option when we have to convert a collection to an array.
我们了解了归零的必要性和影响,并观察到内部分配的数组如何消除了归零,从而赢得了性能竞赛。最后,我们可以坚定地得出结论:toArray(new T[0]) 变量比toArray(new T[size]) 更快,因此,当我们必须将一个集合转换为一个数组时,应该始终是首选方案。
As always, the code used in this article can be found over on GitHub.
一如既往,本文中所使用的代码可以在GitHub上找到over。