1. Overview
In this tutorial, we’ll dive into how different uses of the Java Stream API affect the order in which a stream generates, processes, and collects data.
We’ll also look at how ordering influences performance.
2. Encounter Order
Simply put, encounter order is the order in which a Stream encounters data.
2.1. Encounter Order of Collection Sources
The Collection we choose as our source affects the encounter order of the Stream.
To test this, let’s create two streams.
Our first is created from a List, which keeps its elements in insertion order.
Our second is created from a TreeSet, which ignores insertion order and keeps its elements sorted instead.
We then collect the output of each Stream into an array to compare the results.
@Test
public void givenTwoCollections_whenStreamedSequentially_thenCheckOutputDifferent() {
    List<String> list = Arrays.asList("B", "A", "C", "D", "F");
    Set<String> set = new TreeSet<>(list);

    Object[] listOutput = list.stream().toArray();
    Object[] setOutput = set.stream().toArray();

    // the List streams in insertion order, the TreeSet in sorted order
    assertEquals("[B, A, C, D, F]", Arrays.toString(listOutput));
    assertEquals("[A, B, C, D, F]", Arrays.toString(setOutput));
}
As our example shows, the TreeSet hasn’t kept the order of our input sequence. It sorts its elements, so the encounter order of its Stream differs from that of the List.
If our Stream is ordered, it doesn’t matter whether our data is processed sequentially or in parallel; the implementation will maintain the encounter order of the Stream.
When we repeat our test using parallel streams, we get the same result:
@Test
public void givenTwoCollections_whenStreamedInParallel_thenCheckOutputDifferent() {
    List<String> list = Arrays.asList("B", "A", "C", "D", "F");
    Set<String> set = new TreeSet<>(list);

    // same pipelines as before, but processed in parallel
    Object[] listOutput = list.stream().parallel().toArray();
    Object[] setOutput = set.stream().parallel().toArray();

    assertEquals("[B, A, C, D, F]", Arrays.toString(listOutput));
    assertEquals("[A, B, C, D, F]", Arrays.toString(setOutput));
}
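One way to see whether a source defines an encounter order at all is to inspect its Spliterator. The sketch below is only an illustration, and the class and method names are made up for this example. It checks the ORDERED characteristic reported by a collection’s spliterator. A List and a TreeSet both report it, while a HashSet doesn’t, so a stream over a HashSet has no defined encounter order.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

// Hypothetical helper for illustration, not part of the article's sample code
class EncounterOrderCheck {

    // true if the collection's spliterator reports a defined encounter order
    static boolean hasEncounterOrder(Collection<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("B", "A", "C", "D", "F");
        System.out.println("List:    " + hasEncounterOrder(list));
        System.out.println("TreeSet: " + hasEncounterOrder(new TreeSet<>(list)));
        System.out.println("HashSet: " + hasEncounterOrder(new HashSet<>(list)));
    }
}
```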
2.2. Removing Order
At any point, we can explicitly remove the order constraint with the unordered method.
For example, let’s declare a TreeSet:
Set<Integer> set = new TreeSet<>(
Arrays.asList(-9, -5, -4, -2, 1, 2, 4, 5, 7, 9, 12, 13, 16, 29, 23, 34, 57, 102, 230));
And if we stream without calling unordered:
set.stream().parallel().limit(5).toArray();
Then the TreeSet’s natural order is preserved:
[-9, -5, -4, -2, 1]
But if we explicitly remove the ordering:
set.stream().unordered().parallel().limit(5).toArray();
Then the output may be different, for example:
[1, 4, 7, 9, 23]
Two things combine to produce this result. First, unordered has little effect on its own, because a sequential stream processes the data one element at a time. Second, once we also call parallel, limit is free to return any five elements rather than the first five in encounter order, so the output can vary from run to run.
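Because the unordered result can change between runs, we can’t assert on its exact contents. A minimal sketch of what we can rely on is shown below (the class and method names are invented for this illustration): the ordered pipeline always yields the first five elements, while the unordered one yields some five distinct elements of the set.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch for illustration, not part of the article's sample code
class UnorderedLimitSketch {

    static final Set<Integer> SOURCE = new TreeSet<>(Arrays.asList(
        -9, -5, -4, -2, 1, 2, 4, 5, 7, 9, 12, 13, 16, 29, 23, 34, 57, 102, 230));

    // ordered: limit must return the first five elements in encounter order
    static Object[] firstFive() {
        return SOURCE.stream().parallel().limit(5).toArray();
    }

    // unordered: limit may return any five elements of the source
    static Object[] anyFive() {
        return SOURCE.stream().unordered().parallel().limit(5).toArray();
    }

    public static void main(String[] args) {
        System.out.println("ordered:   " + Arrays.toString(firstFive()));
        System.out.println("unordered: " + Arrays.toString(anyFive()));
    }
}
```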
3. Intermediate Operations
We can also affect stream ordering through intermediate operations.
While most intermediate operations will maintain the order of the Stream, some will, by their nature, change it.
For example, we can affect the stream ordering by sorting:
@Test
public void givenUnsortedStreamInput_whenStreamSorted_thenCheckOrderChanged() {
    List<Integer> list = Arrays.asList(-3, 10, -4, 1, 3);

    Object[] listOutput = list.stream().toArray();
    Object[] listOutputSorted = list.stream().sorted().toArray();

    // without sorted we get the source order, with sorted the natural order
    assertEquals("[-3, 10, -4, 1, 3]", Arrays.toString(listOutput));
    assertEquals("[-4, -3, 1, 3, 10]", Arrays.toString(listOutputSorted));
}
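unordered is another intermediate operation that affects the ordering of a Stream. It doesn’t rearrange any elements; it removes the ordering constraint, so later operations are no longer required to respect the encounter order. Stream.empty, by contrast, isn’t an intermediate operation at all: it’s a static factory method that creates a stream with no elements.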
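To make the difference between order-preserving and order-changing operations concrete, here’s a small sketch (the class and method names are ours, chosen for illustration). filter and map leave the encounter order alone, even in parallel, while sorted replaces it with a new one.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch for illustration, not part of the article's sample code
class IntermediateOrderSketch {

    static final List<Integer> INPUT = Arrays.asList(-3, 10, -4, 1, 3);

    // filter and map preserve the order: results follow the source order
    static List<Integer> doubledPositives() {
        return INPUT.stream().parallel()
            .filter(i -> i > 0)
            .map(i -> i * 2)
            .collect(Collectors.toList());
    }

    // sorted imposes a new encounter order, here descending
    static List<Integer> sortedDescending() {
        return INPUT.stream().parallel()
            .sorted(Comparator.reverseOrder())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubledPositives());
        System.out.println(sortedDescending());
    }
}
```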
4. Terminal Operations
Finally, the terminal operation we use can also affect the order.
4.1. forEach vs forEachOrdered
forEach and forEachOrdered may seem to provide the same functionality, but they have one key difference: forEachOrdered guarantees to maintain the order of the Stream.
If we declare a list:
List<String> list = Arrays.asList("B", "A", "C", "D", "F");
And use forEachOrdered after parallelizing:
list.stream().parallel().forEachOrdered(e -> logger.log(Level.INFO, e));
Then the output is ordered:
INFO: B
INFO: A
INFO: C
INFO: D
INFO: F
However, if we use forEach:
list.stream().parallel().forEach(e -> logger.log(Level.INFO, e));
Then the output may be unordered, for example:
INFO: C
INFO: F
INFO: B
INFO: D
INFO: A
forEach logs each element as soon as the thread handling it reaches it, so the order can vary from run to run. forEachOrdered, which we used in the first Stream, processes the elements in encounter order, so each element is logged only after the ones that precede it.
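Since log output is hard to assert on, the following sketch records the elements in a list instead. It’s an illustration under our own naming (the class and methods aren’t part of the sample code), and it uses a synchronized list because forEach may call the action from several threads at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch for illustration, not part of the article's sample code
class ForEachOrderSketch {

    static final List<String> INPUT = Arrays.asList("B", "A", "C", "D", "F");

    // forEachOrdered: elements are recorded in encounter order
    static List<String> viaForEachOrdered() {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        INPUT.stream().parallel().forEachOrdered(seen::add);
        return seen;
    }

    // forEach: all elements are recorded, but in no guaranteed order
    static List<String> viaForEach() {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        INPUT.stream().parallel().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("forEachOrdered: " + viaForEachOrdered());
        System.out.println("forEach:        " + viaForEach());
    }
}
```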
4.2. Collect
When we use the collect method to aggregate the Stream output, it’s important to note that the Collection we choose will impact the order.
For example, a sorted Collection such as TreeSet imposes its own ordering and won’t keep the order of the Stream output:
@Test
public void givenSameCollection_whenStreamCollected_checkOutput() {
    List<String> list = Arrays.asList("B", "A", "C", "D", "F");

    List<String> collectionList = list.stream().parallel().collect(Collectors.toList());
    Set<String> collectionSet = list.stream().parallel()
        .collect(Collectors.toCollection(TreeSet::new));

    // the List keeps the encounter order, the TreeSet re-sorts the elements
    assertEquals("[B, A, C, D, F]", collectionList.toString());
    assertEquals("[A, B, C, D, F]", collectionSet.toString());
}
When we run our code, we see that collecting into the TreeSet changes the order of our Stream elements.
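If we need a Set but still want to keep the encounter order of the Stream, one option is to collect into a LinkedHashSet. This is a sketch under our own naming rather than part of the sample code; it relies on LinkedHashSet iterating in insertion order and on collect respecting the encounter order of an ordered Stream.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch for illustration, not part of the article's sample code
class OrderedSetCollectSketch {

    // collects into a Set that iterates in the order elements were added
    static Set<String> collectToLinkedHashSet(List<String> input) {
        return input.stream().parallel()
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("B", "A", "C", "D", "F");
        System.out.println(collectToLinkedHashSet(list));
    }
}
```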
4.3. Specifying Collections
If we collect into a collection that doesn’t define an order, such as the Map we get from Collectors.toMap, we can still preserve the ordering by telling the Collectors method to use a Linked implementation instead.
First, we’ll initialize our list and collect it with the usual 2-parameter version of the toMap method:
@Test
public void givenList_whenStreamCollectedToHashMap_thenCheckOrderChanged() {
    List<String> list = Arrays.asList("A", "BB", "CCC");

    // key: the string itself, value: its length
    Map<String, Integer> hashMap = list.stream().collect(Collectors
        .toMap(Function.identity(), String::length));

    Object[] keySet = hashMap.keySet().toArray();
    assertEquals("[BB, A, CCC]", Arrays.toString(keySet));
}
As expected, our new HashMap hasn’t kept the original ordering of the input list (the order it does produce is an implementation detail we shouldn’t rely on), but let’s change that.
With our second Stream, we’ll use the 4-parameter version of the toMap method and pass LinkedHashMap::new as the map supplier. The third argument, (u, v) -> u, is the merge function, which resolves duplicate keys by keeping the first value:
@Test
public void givenList_whenCollectedtoLinkedHashMap_thenCheckOrderMaintained() {
    List<String> list = Arrays.asList("A", "BB", "CCC");

    Map<String, Integer> linkedHashMap = list.stream().collect(Collectors.toMap(
        Function.identity(),   // key mapper
        String::length,        // value mapper
        (u, v) -> u,           // merge function: keep the first value
        LinkedHashMap::new     // map supplier: keeps insertion order
    ));

    Object[] keySet = linkedHashMap.keySet().toArray();
    assertEquals("[A, BB, CCC]", Arrays.toString(keySet));
}
Hey, that’s much better!
We’ve managed to keep the original order of the list by collecting our data to a LinkedHashMap.
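The same approach should also hold up with a parallel Stream and with duplicate keys. The sketch below uses our own hypothetical class and method names; it keeps the first value for a repeated key, and the LinkedHashMap lists each key at the position where it was first encountered.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch for illustration, not part of the article's sample code
class LinkedToMapSketch {

    // maps each word to its length, keeping keys in encounter order
    static Map<String, Integer> lengthsInEncounterOrder(List<String> words) {
        return words.stream().parallel().collect(Collectors.toMap(
            Function.identity(),
            String::length,
            (first, second) -> first,   // on duplicate keys keep the first value
            LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // "A" appears twice; it stays at its first position in the key set
        List<String> words = Arrays.asList("A", "BB", "CCC", "A");
        System.out.println(lengthsInEncounterOrder(words).keySet());
    }
}
```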
5. Performance
If we’re using sequential streams, the presence or absence of order makes little difference to the performance of our program. Parallel streams, however, can be heavily affected by the presence of an ordered Stream.
The reason is that order-sensitive operations have to coordinate across threads: results computed out of order may need to be buffered until the elements that precede them in encounter order have been handled.
Let’s demonstrate this using the Java Microbenchmark Harness (JMH) to measure the performance.
In the following examples, we’ll measure the performance cost of processing ordered and unordered parallel streams with some common intermediate operations.
5.1. Distinct
Let’s set up a test using the distinct method on both ordered and unordered streams.
// ordered source: distinct has to preserve the encounter order
@Benchmark
public void givenOrderedStreamInput_whenStreamDistinct_thenShowOpsPerMS() {
    IntStream.range(1, 1_000_000).parallel().distinct().toArray();
}

// same pipeline with the ordering constraint removed
@Benchmark
public void givenUnorderedStreamInput_whenStreamDistinct_thenShowOpsPerMS() {
    IntStream.range(1, 1_000_000).unordered().parallel().distinct().toArray();
}
When we run the benchmark, we can see the disparity in the time taken per operation, with the unordered stream needing roughly a third of the time:
Benchmark                        Mode  Cnt       Score  Error  Units
TestBenchmark.givenOrdered...    avgt    2  222252.283         us/op
TestBenchmark.givenUnordered...  avgt    2   78221.357         us/op
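Before reading too much into such numbers, it’s worth confirming that both pipelines do the same work. The following sketch is plain Java rather than JMH, with names invented for illustration, and it doesn’t measure anything. It only checks that dropping the ordering constraint affects the order in which results may appear, not the results themselves.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch for illustration, not part of the article's sample code
class DistinctOrderSketch {

    // ordered pipeline: the result array follows the encounter order
    static int[] orderedDistinct(int endExclusive) {
        return IntStream.range(1, endExclusive).parallel().distinct().toArray();
    }

    // unordered pipeline: same elements, but in no guaranteed order
    static int[] unorderedDistinct(int endExclusive) {
        return IntStream.range(1, endExclusive).unordered().parallel().distinct().toArray();
    }

    public static void main(String[] args) {
        int[] ordered = orderedDistinct(1_000_000);
        int[] unordered = unorderedDistinct(1_000_000);
        // sort the unordered result so the two arrays can be compared directly
        Arrays.sort(unordered);
        System.out.println("same elements: " + Arrays.equals(ordered, unordered));
    }
}
```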
5.2. Filter
Next, we’ll use a parallel Stream with a simple filter method to keep every 10th integer:
// ordered source: keep the multiples of 10 in encounter order
@Benchmark
public void givenOrderedStreamInput_whenStreamFiltered_thenShowOpsPerMS() {
    IntStream.range(1, 100_000_000).parallel().filter(i -> i % 10 == 0).toArray();
}

// same pipeline with the ordering constraint removed
@Benchmark
public void givenUnorderedStreamInput_whenStreamFiltered_thenShowOpsPerMS() {
    IntStream.range(1, 100_000_000).unordered().parallel().filter(i -> i % 10 == 0).toArray();
}
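Interestingly, the difference between our two streams is much smaller than with distinct: the unordered stream is only about 4% faster. A likely reason is that filter is stateless and handles each element independently, whereas distinct is stateful and has to track the elements it has already seen.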
Benchmark                        Mode  Cnt       Score  Error  Units
TestBenchmark.givenOrdered...    avgt    2  116333.431         us/op
TestBenchmark.givenUnordered...  avgt    2  111471.676         us/op
6. Conclusion
In this article, we looked at the ordering of streams, focusing on the different stages of the Stream process and how each one has its own effect.
Finally, we saw how the order contract placed on a Stream can affect the performance of parallel streams.
As always, check out the full sample set over on GitHub.