1. Overview
The Java Stream API provides various methods for working with a sequence of elements. However, processing only part of a stream, e.g., every N-th element, isn’t straightforward. This can be useful when a stream carries raw data from a CSV file or a database table and we only want to process specific columns.
We’ll address two kinds of streams: finite and infinite. The first case can be resolved by converting a Stream into a List, which allows indexing. Infinite streams, on the other hand, can’t be collected into a List and require a different approach. In this tutorial, we’ll learn how to address this challenge using various techniques.
2. Tests Setup
We’ll use parameterized tests to check the correctness of our solutions. Each case supplies an input stream, the expected result, and the value of N:
Arguments.of(
  Stream.of("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  List.of("Wednesday", "Saturday"), 3),
Arguments.of(
  Stream.of("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  List.of("Friday"), 5),
Arguments.of(
  Stream.of("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  List.of("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), 1)
Now, we can dive into different methods of processing every N-th element of a stream.
3. Using filter()
In the first approach, we can create a separate stream containing only the indexes of the elements we would like to process. We can use filter(Predicate) on a range of indexes to create such a stream:
void givenListSkipNthElementInListWithFilterTestShouldFilterNthElement(Stream<String> input, List<String> expected, int n) {
    final List<String> sourceList = input.collect(Collectors.toList());
    final List<String> actual = IntStream.range(0, sourceList.size())
      .filter(s -> (s + 1) % n == 0)
      .mapToObj(sourceList::get)
      .collect(Collectors.toList());
    assertEquals(expected, actual);
}
This approach works if we operate on a data structure that allows indexed access, such as a List. The needed elements can be collected to a new List or processed with forEach(Consumer).
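Outside of a test, the same idea can be wrapped in a small reusable method. The following is a minimal sketch, not part of the original code: the class name IndexFilterExample and the method everyNth are hypothetical, and the check for a non-positive n is an added assumption.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexFilterExample {

    // Hypothetical helper: returns every n-th element (counting from 1) of an indexable list.
    static <T> List<T> everyNth(List<T> source, int n) {
        if (n <= 0) {
            // Assumption: a non-positive step is treated as a caller error.
            throw new IllegalArgumentException("n must be positive");
        }
        return IntStream.range(0, source.size())
          .filter(i -> (i + 1) % n == 0) // keeps indexes n - 1, 2n - 1, 3n - 1, ...
          .mapToObj(source::get)
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> days = List.of("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday");
        System.out.println(everyNth(days, 3)); // prints [Wednesday, Saturday]
    }
}
```

Every index of the list is still visited, so the work is proportional to the size of the list rather than to the number of selected elements.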
4. Using iterate()
This approach is similar to the previous one and also requires a data structure with indexed access. However, instead of filtering out the indexes we don’t need, we’ll generate only the indexes we want from the start:
void givenListSkipNthElementInListWithIterateTestShouldFilterNthElement(Stream<String> input, List<String> expected, int n) {
    final List<String> sourceList = input.collect(Collectors.toList());
    int limit = sourceList.size() / n;
    final List<String> actual = IntStream.iterate(n - 1, i -> (i + n))
      .limit(limit)
      .mapToObj(sourceList::get)
      .collect(Collectors.toList());
    assertEquals(expected, actual);
}
In this case, we’re using IntStream.iterate(int, IntUnaryOperator), which generates an integer sequence with a step of n, starting at n - 1. Since the sequence is infinite, we cap it with limit() at the number of elements we expect, which is the list size divided by n.
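On Java 9 and later, the explicit limit can be replaced by the three-argument IntStream.iterate(int, IntPredicate, IntUnaryOperator), which stops as soon as the index leaves the list. This is a minimal sketch under that assumption; the class name IndexIterateExample and the method everyNth are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexIterateExample {

    // Hypothetical helper: generates only the indexes n - 1, 2n - 1, ... that fit in the list.
    static <T> List<T> everyNth(List<T> source, int n) {
        if (n <= 0) {
            // Assumption: a non-positive step is treated as a caller error.
            throw new IllegalArgumentException("n must be positive");
        }
        // Requires Java 9+: the second argument ends the sequence once the index is out of range.
        return IntStream.iterate(n - 1, i -> i < source.size(), i -> i + n)
          .mapToObj(source::get)
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> days = List.of("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday");
        System.out.println(everyNth(days, 5)); // prints [Friday]
    }
}
```

Unlike the filter-based variant, only the selected indexes are ever produced.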
5. Using subList()
This approach uses Stream.iterate and is similar to the previous one, but it creates a stream of List views, where the k-th view (counting from zero) starts at index nk of the source list:
void givenListSkipNthElementInListWithSublistTestShouldFilterNthElement(Stream<String> input, List<String> expected, int n) {
    final List<String> sourceList = input.collect(Collectors.toList());
    int limit = sourceList.size() / n;
    final List<String> actual = Stream.iterate(sourceList, s -> s.subList(n, s.size()))
      .limit(limit)
      .map(s -> s.get(n - 1))
      .collect(Collectors.toList());
    assertEquals(expected, actual);
}
Because each view starts at a multiple of n, we take the element at index n - 1 of each of these Lists, as the map() step does, to get the needed result.
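To see what the intermediate Lists look like, we can collect the views themselves before picking an element from each. The sketch below is illustrative only; the class name SubListExample and the methods views and everyNth are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SubListExample {

    // Hypothetical helper: the successive subList views, each dropping the first n elements of the previous one.
    static <T> List<List<T>> views(List<T> source, int n) {
        int limit = source.size() / n;
        return Stream.iterate(source, s -> s.subList(n, s.size()))
          .limit(limit)
          .collect(Collectors.toList());
    }

    // Picks the element at index n - 1 of every view.
    static <T> List<T> everyNth(List<T> source, int n) {
        return views(source, n).stream()
          .map(s -> s.get(n - 1))
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> days = List.of("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday");
        // Two views for n = 3: the full list, then the list starting at Thursday.
        views(days, 3).forEach(System.out::println);
        System.out.println(everyNth(days, 3)); // prints [Wednesday, Saturday]
    }
}
```

The limit guarantees that every retained view has at least n elements, so get(n - 1) never goes out of bounds.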
6. Using a Custom Collector
As a more advanced and transparent solution, we can implement a custom Collector that collects only the needed elements:
class SkippingCollector {
    private static final BinaryOperator<SkippingCollector> IGNORE_COMBINE = (a, b) -> a;
    private final int skip;
    private final List<String> list = new ArrayList<>();
    private int currentIndex = 0;

    private SkippingCollector(int skip) {
        this.skip = skip;
    }

    private void accept(String item) {
        int index = ++currentIndex % skip;
        if (index == 0) {
            list.add(item);
        }
    }

    private List<String> getResult() {
        return list;
    }

    public static Collector<String, SkippingCollector, List<String>> collector(int skip) {
        return Collector.of(() -> new SkippingCollector(skip),
          SkippingCollector::accept,
          IGNORE_COMBINE,
          SkippingCollector::getResult);
    }
}
This approach is more complex and requires more code. It also doesn’t support parallel streams, because the combiner simply discards one of the partial results. Technically, it may fail even on sequential streams, since whether the combiner gets invoked is an implementation detail that might change in future releases:
public static List<String> skipNthElementInStreamWithCollector(Stream<String> sourceStream, int n) {
    return sourceStream.collect(SkippingCollector.collector(n));
}
It’s possible to make this approach work for parallel streams by using Spliterators, but there should be a good reason to go that far.
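If we keep the collector sequential-only, one option is to make that limitation explicit. The following sketch is a variation, not the original implementation: it’s generic over the element type, and its combiner throws instead of silently dropping a partial result. The class name EveryNthCollector and the method everyNth are hypothetical, and the fail-fast combiner is a design assumption. It relies on current JDK behavior, where a sequential collect() doesn’t call the combiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class EveryNthCollector<T> {
    private final int n;
    private final List<T> kept = new ArrayList<>();
    private int seen = 0;

    private EveryNthCollector(int n) {
        this.n = n;
    }

    // Counts every incoming element and keeps each n-th one.
    private void accept(T item) {
        if (++seen % n == 0) {
            kept.add(item);
        }
    }

    // Assumption: fail fast rather than merge partial results computed with unrelated counters.
    private EveryNthCollector<T> combine(EveryNthCollector<T> other) {
        throw new UnsupportedOperationException("This collector supports sequential streams only");
    }

    static <T> Collector<T, ?, List<T>> everyNth(int n) {
        return Collector.<T, EveryNthCollector<T>, List<T>>of(
          () -> new EveryNthCollector<>(n),
          (container, item) -> container.accept(item),
          (left, right) -> left.combine(right),
          container -> container.kept);
    }

    public static void main(String[] args) {
        // An infinite stream has to be bounded before a collector can finish.
        List<Integer> result = Stream.iterate(1, i -> i + 1)
          .limit(10)
          .collect(everyNth(4));
        System.out.println(result); // prints [4, 8]
    }
}
```

With this variant, an accidental use on a parallel stream surfaces as an exception instead of an incomplete result.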
7. Simple Loop
All the previous solutions work, but overall, they’re unnecessarily complex and often misleading. The best way to solve a problem is often the simplest implementation possible. This is how we can use a for loop to achieve the same result:
void givenListSkipNthElementInListWithForTestShouldFilterNthElement(Stream<String> input, List<String> expected, int n) {
    final List<String> sourceList = input.collect(Collectors.toList());
    List<String> result = new ArrayList<>();
    for (int i = n - 1; i < sourceList.size(); i += n) {
        result.add(sourceList.get(i));
    }
    final List<String> actual = result;
    assertEquals(expected, actual);
}
However, sometimes we need to work with a Stream directly, which doesn’t let us access elements by index. In this case, we can use an Iterator with a while loop:
void givenListSkipNthElementInStreamWithIteratorTestShouldFilterNthElement(Stream<String> input, List<String> expected, int n) {
    List<String> result = new ArrayList<>();
    final Iterator<String> iterator = input.iterator();
    int count = 0;
    while (iterator.hasNext()) {
        final String item = iterator.next();
        if (count % n == n - 1) {
            result.add(item);
        }
        ++count;
    }
    final List<String> actual = result;
    assertEquals(expected, actual);
}
These solutions are cleaner and more straightforward to understand while resolving the same problem.
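The Iterator-based loop is also the variant that carries over to infinite streams, as long as we decide when to stop. Below is a minimal sketch under that assumption; the class name IteratorExample, the method firstOfEveryNth, and its count parameter are hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

class IteratorExample {

    // Hypothetical helper: walks the stream and returns at most `count` elements, taking every n-th one.
    static <T> List<T> firstOfEveryNth(Stream<T> source, int n, int count) {
        if (n <= 0) {
            // Assumption: a non-positive step is treated as a caller error.
            throw new IllegalArgumentException("n must be positive");
        }
        List<T> result = new ArrayList<>();
        Iterator<T> iterator = source.iterator();
        int position = 0;
        // Stops as soon as enough elements are collected, so an infinite source is fine.
        while (result.size() < count && iterator.hasNext()) {
            T item = iterator.next();
            if (++position % n == 0) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Stream<Integer> naturals = Stream.iterate(1, i -> i + 1); // infinite stream
        System.out.println(firstOfEveryNth(naturals, 3, 4)); // prints [3, 6, 9, 12]
    }
}
```

The loop pulls elements one at a time, so only as much of the stream as needed is ever evaluated.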
8. Conclusion
The Java Stream API is a powerful tool that helps make code more declarative and readable. Additionally, streams can achieve better performance through parallelization. However, using streams everywhere might not be the best way to approach this API.
Although the mental gymnastics of applying stream operations where they aren’t a natural fit might be fun, they can also result in “clever code.” Often, the simplest constructs, like loops, achieve the same result with less code that is easier to understand.
As always, all the code used in this article is available over on GitHub.