1. Overview
In this article, we’ll compare different ways of filtering Java Streams. First, we’ll see which solution leads to more readable code. After that, we’ll compare the solutions from a performance point of view.
2. Readability
First, we’ll compare the two solutions from a readability perspective. For the code examples in this section, we’ll use the Student class:
public class Student {

    private String name;
    private int year;
    private List<Integer> marks;
    private Profile profile;

    // constructor, getters and setters
}
Our goal is to filter a Stream of Students based on the following three rules:
- the profile must not be Profile.PHYSICS
- the number of marks should be greater than 3
- the average mark should be greater than 50
2.1. Multiple Filters
The Stream API allows chaining multiple filters. We can leverage this to express the filtering criteria described above. In addition, we can use Predicate.not() (available since Java 11) when we want to negate a condition.
This approach leads to clean and easy-to-understand code:
@Test
public void whenUsingMultipleFilters_dataShouldBeFiltered() {
    List<Student> filteredStream = students.stream()
      .filter(s -> s.getMarksAverage() > 50)
      .filter(s -> s.getMarks().size() > 3)
      .filter(not(s -> s.getProfile() == Student.Profile.PHYSICS))
      .collect(Collectors.toList());

    assertThat(filteredStream).containsExactly(mathStudent);
}
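To try the chained filters outside the test class, here is a minimal, self-contained sketch. The trimmed-down Student class, the MATH profile, the sample data, and the names MultipleFiltersSketch, studiesPhysics, eligibleNames, and demo are assumptions made for illustration only; they are not taken from the article's code base. Predicate.not requires Java 11 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

import static java.util.function.Predicate.not;

class MultipleFiltersSketch {

    // Assumed profile values; the article only mentions PHYSICS.
    enum Profile { MATH, PHYSICS }

    // Hypothetical, trimmed-down stand-in for the article's Student class.
    static final class Student {
        final String name;
        final List<Integer> marks;
        final Profile profile;

        Student(String name, List<Integer> marks, Profile profile) {
            this.name = name;
            this.marks = marks;
            this.profile = profile;
        }

        double getMarksAverage() {
            return marks.stream().mapToInt(Integer::intValue).average().orElse(0);
        }

        boolean studiesPhysics() {
            return profile == Profile.PHYSICS;
        }
    }

    // One filter() call per rule; not() negates the physics check.
    static List<String> eligibleNames(List<Student> students) {
        return students.stream()
            .filter(s -> s.getMarksAverage() > 50)
            .filter(s -> s.marks.size() > 3)
            .filter(not(Student::studiesPhysics))
            .map(s -> s.name)
            .collect(Collectors.toList());
    }

    // Made-up sample data: only Alice satisfies all three rules.
    static List<String> demo() {
        return eligibleNames(List.of(
            new Student("Alice", List.of(80, 90, 70, 60), Profile.MATH),
            new Student("Bob", List.of(80, 90, 70, 60), Profile.PHYSICS),
            new Student("Carol", List.of(90, 95), Profile.MATH),
            new Student("Dave", List.of(20, 30, 40, 10), Profile.MATH)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Alice]
    }
}
```

Each rule sits on its own line, so adding or removing a rule only touches one filter() call.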
2.2. Single Filter With Complex Condition
The alternative is to use a single filter with a more complex condition.
Unfortunately, the resulting code will be a bit harder to read:
@Test
public void whenUsingSingleComplexFilter_dataShouldBeFiltered() {
    List<Student> filteredStream = students.stream()
      .filter(s -> s.getMarksAverage() > 50
        && s.getMarks().size() > 3
        && s.getProfile() != Student.Profile.PHYSICS)
      .collect(Collectors.toList());

    assertThat(filteredStream).containsExactly(mathStudent);
}
However, we can improve it by extracting the conditions into a separate method:
public boolean isEligibleForScholarship() {
    return getMarksAverage() > 50
      && marks.size() > 3
      && profile != Profile.PHYSICS;
}
As a result, we hide the complex condition and give more meaning to the filtering criteria:
@Test
public void whenUsingSingleComplexFilterExtracted_dataShouldBeFiltered() {
    List<Student> filteredStream = students.stream()
      .filter(Student::isEligibleForScholarship)
      .collect(Collectors.toList());

    assertThat(filteredStream).containsExactly(mathStudent);
}
This is a good solution, especially when we can encapsulate the filter logic inside our model.
3. Performance
We have seen that using multiple filters can improve the readability of our code. On the other hand, every additional filter() call adds another stage to the pipeline, which implies creating more objects and can lead to a loss in performance. To demonstrate this, we’ll filter Streams of different sizes and perform multiple checks on their elements.
After this, we’ll calculate the total processing time in milliseconds and compare the two solutions. Additionally, we’ll include parallel Streams and the plain old for loop in our tests.
As a result, we can see that using a single complex condition yields a performance gain. For small sample sizes, however, the difference might not be noticeable.
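A comparison of this kind could be sketched as follows. This is a naive timing harness based on System.nanoTime, not the benchmark used for the article; the class name FilterTimingSketch, the three checks, and the sample sizes are assumptions chosen for illustration. Single-shot timings like these are distorted by JIT warm-up and garbage collection, so a harness such as JMH would be more trustworthy for real measurements.

```java
import java.util.List;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FilterTimingSketch {

    // Three chained filters, one check each.
    static long multipleFilters(List<Integer> numbers) {
        return numbers.stream()
            .filter(n -> n % 2 == 0)
            .filter(n -> n % 3 == 0)
            .filter(n -> n > 10)
            .count();
    }

    // The same three checks combined in one filter.
    static long singleFilter(List<Integer> numbers) {
        return numbers.stream()
            .filter(n -> n % 2 == 0 && n % 3 == 0 && n > 10)
            .count();
    }

    // Single complex filter on a parallel stream.
    static long parallelSingleFilter(List<Integer> numbers) {
        return numbers.parallelStream()
            .filter(n -> n % 2 == 0 && n % 3 == 0 && n > 10)
            .count();
    }

    // Plain for loop as a baseline.
    static long forLoop(List<Integer> numbers) {
        long count = 0;
        for (int n : numbers) {
            if (n % 2 == 0 && n % 3 == 0 && n > 10) {
                count++;
            }
        }
        return count;
    }

    // Runs one approach once and returns the elapsed time in milliseconds.
    static long elapsedMillis(ToLongFunction<List<Integer>> approach, List<Integer> numbers) {
        long start = System.nanoTime();
        approach.applyAsLong(numbers);
        return (System.nanoTime() - start) / 1_000_000;
    }

    static List<Integer> numbersUpTo(int size) {
        return IntStream.range(0, size).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (int size : new int[] {10_000, 100_000, 1_000_000}) {
            List<Integer> numbers = numbersUpTo(size);
            System.out.printf("size=%d multiple=%dms single=%dms parallel=%dms loop=%dms%n",
                size,
                elapsedMillis(FilterTimingSketch::multipleFilters, numbers),
                elapsedMillis(FilterTimingSketch::singleFilter, numbers),
                elapsedMillis(FilterTimingSketch::parallelSingleFilter, numbers),
                elapsedMillis(FilterTimingSketch::forLoop, numbers));
        }
    }
}
```

All four approaches return the same count, so only the elapsed time differs between them; the actual numbers depend on the machine and the JVM.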
4. The Order of the Conditions
Regardless of whether we use a single filter or multiple filters, the filtering can cause a performance drop if the checks are not executed in the optimal order.
4.1. Conditions That Filter Out Many Elements
Let’s assume we have a stream of 100 integers and we want to find the even numbers smaller than 20.
If we check the parity of the number first, we’ll end up with a total of 150 checks. This is because the first condition is evaluated for all 100 elements, while the second condition is evaluated only for the 50 even numbers:
@Test
public void givenWrongFilterOrder_whenUsingMultipleFilters_shouldEvaluateManyConditions() {
    long filteredStreamSize = IntStream.range(0, 100).boxed()
      .filter(this::isEvenNumber)
      .filter(this::isSmallerThanTwenty)
      .count();

    assertThat(filteredStreamSize).isEqualTo(10);
    assertThat(numberOfOperations).hasValue(150);
}
On the other hand, if we reverse the order of the filters, we’ll need only 120 checks to filter the stream properly: 100 range checks plus 20 parity checks for the numbers smaller than 20. Consequently, the conditions that filter out the majority of the elements should be evaluated first.
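The two orderings can be compared side by side with a small counter. The sketch below assumes that isEvenNumber and isSmallerThanTwenty each increment a shared AtomicInteger named numberOfOperations; the article's test refers to these names but does not show their bodies, so the implementations here, as well as the class name FilterOrderSketch, are assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class FilterOrderSketch {

    // Counts every evaluation of either condition.
    private final AtomicInteger numberOfOperations = new AtomicInteger();

    private boolean isEvenNumber(int i) {
        numberOfOperations.incrementAndGet();
        return i % 2 == 0;
    }

    private boolean isSmallerThanTwenty(int i) {
        numberOfOperations.incrementAndGet();
        return i < 20;
    }

    // Parity first: 100 parity checks + 50 range checks.
    int checksWithParityFirst() {
        numberOfOperations.set(0);
        IntStream.range(0, 100).boxed()
            .filter(this::isEvenNumber)
            .filter(this::isSmallerThanTwenty)
            .count();
        return numberOfOperations.get();
    }

    // Range first: 100 range checks + 20 parity checks.
    int checksWithRangeFirst() {
        numberOfOperations.set(0);
        IntStream.range(0, 100).boxed()
            .filter(this::isSmallerThanTwenty)
            .filter(this::isEvenNumber)
            .count();
        return numberOfOperations.get();
    }

    public static void main(String[] args) {
        FilterOrderSketch sketch = new FilterOrderSketch();
        System.out.println(sketch.checksWithParityFirst()); // prints 150
        System.out.println(sketch.checksWithRangeFirst());  // prints 120
    }
}
```

Both pipelines keep the same 10 elements; only the number of evaluated conditions changes.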
4.2. Slow or Heavy Conditions
Some conditions can be slow, for instance, if one of the filters requires executing some heavy logic or making an external call over the network. For better performance, we’ll try to evaluate these conditions as few times as possible. Therefore, we’ll evaluate them only if all the other conditions have been met.
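As an illustration, the sketch below counts how often a slow check runs depending on its position in the pipeline. The names SlowConditionSketch, isSlowCheckPassed, and isSmallerThanTwenty, and the modulo test that stands in for the heavy logic or remote call, are hypothetical and serve only to make the effect measurable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class SlowConditionSketch {

    // Counts only the evaluations of the slow condition.
    private final AtomicInteger slowCalls = new AtomicInteger();

    // Hypothetical stand-in for heavy logic or a call over the network.
    private boolean isSlowCheckPassed(int i) {
        slowCalls.incrementAndGet();
        return i % 7 == 0;
    }

    // Cheap condition that filters out most of the elements.
    private boolean isSmallerThanTwenty(int i) {
        return i < 20;
    }

    // Slow check first: it runs for every element.
    int slowCallsWhenSlowCheckIsFirst() {
        slowCalls.set(0);
        IntStream.range(0, 100).boxed()
            .filter(this::isSlowCheckPassed)
            .filter(this::isSmallerThanTwenty)
            .count();
        return slowCalls.get();
    }

    // Slow check last: it runs only for elements that passed the cheap check.
    int slowCallsWhenSlowCheckIsLast() {
        slowCalls.set(0);
        IntStream.range(0, 100).boxed()
            .filter(this::isSmallerThanTwenty)
            .filter(this::isSlowCheckPassed)
            .count();
        return slowCalls.get();
    }

    public static void main(String[] args) {
        SlowConditionSketch sketch = new SlowConditionSketch();
        System.out.println(sketch.slowCallsWhenSlowCheckIsFirst()); // prints 100
        System.out.println(sketch.slowCallsWhenSlowCheckIsLast());  // prints 20
    }
}
```

The same reasoning applies to a single filter: with the && operator, the slow condition is skipped whenever an earlier operand is already false.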
5. Conclusion
In this article, we analyzed different ways of filtering Java Streams. First, we compared the two approaches from a readability point of view. We discovered that multiple filters provide a more comprehensible filtering condition.
After that, we compared the solutions from a performance perspective. We learned that using a single complex condition and, therefore, creating fewer objects leads to better overall performance.
As always, the source code is available over on GitHub.