Guide to Java 8’s Collectors

Last modified: July 15, 2016

1. Overview

In this tutorial, we’ll be going through Java 8’s Collectors, which are used at the final step of processing a Stream.

To read more about the Stream API itself, we can check out this article.

If we want to see how to leverage the power of Collectors for parallel processing, we can look at this project.

2. The Stream.collect() Method

Stream.collect() is one of the terminal methods of Java 8’s Stream API. It allows us to perform mutable fold operations (repackaging elements into data structures, applying additional logic, concatenating them, etc.) on the data elements held in a Stream instance.

The strategy for this operation is provided by an implementation of the Collector interface.

3. Collectors

All predefined implementations can be found in the Collectors class. It’s common practice to use the following static import with them to improve readability:

import static java.util.stream.Collectors.*;

We can also import only the individual collectors we need:

import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;
import static java.util.stream.Collectors.toSet;

In the following examples, we’ll be reusing the following list:

List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");

3.1. Collectors.toList()

The toList collector can be used for collecting all Stream elements into a List instance. The important thing to remember is that we can’t assume any particular List implementation with this method. If we want to have more control over this, we can use toCollection instead.

Let’s create a Stream instance representing a sequence of elements, and then collect them into a List instance:

List<String> result = givenList.stream()
  .collect(toList());

3.1.1. Collectors.toUnmodifiableList()

Java 10 introduced a convenient way to accumulate the Stream elements into an unmodifiable List:

List<String> result = givenList.stream()
  .collect(toUnmodifiableList());

Now if we try to modify the result List, we’ll get an UnsupportedOperationException:

assertThatThrownBy(() -> result.add("foo"))
  .isInstanceOf(UnsupportedOperationException.class);

3.2. Collectors.toSet()

The toSet collector can be used for collecting all Stream elements into a Set instance. The important thing to remember is that we can’t assume any particular Set implementation with this method. If we want to have more control over this, we can use toCollection instead.

Let’s create a Stream instance representing a sequence of elements, and then collect them into a Set instance:

Set<String> result = givenList.stream()
  .collect(toSet());

A Set doesn’t contain duplicate elements. If our collection contains elements equal to each other, they appear in the resulting Set only once:

List<String> listWithDuplicates = Arrays.asList("a", "bb", "c", "d", "bb");
Set<String> result = listWithDuplicates.stream().collect(toSet());
assertThat(result).hasSize(4);

3.2.1. Collectors.toUnmodifiableSet()

Since Java 10, we can easily create an unmodifiable Set using the toUnmodifiableSet() collector:

Set<String> result = givenList.stream()
  .collect(toUnmodifiableSet());

Any attempt to modify the result Set will end up with an UnsupportedOperationException:

assertThatThrownBy(() -> result.add("foo"))
  .isInstanceOf(UnsupportedOperationException.class);

3.3. Collectors.toCollection()

As we’ve already noted, when using the toSet and toList collectors, we can’t make any assumptions about their implementations. If we want to use a custom implementation, we’ll need to use the toCollection collector with a supplier for the collection of our choice.

Let’s create a Stream instance representing a sequence of elements, and then collect them into a LinkedList instance:

List<String> result = givenList.stream()
  .collect(toCollection(LinkedList::new));

Notice that this will not work with any immutable collections. In such a case, we would need to either write a custom Collector implementation or use collectingAndThen.

3.4. Collectors.toMap()

The toMap collector can be used to collect Stream elements into a Map instance. To do this, we need to provide two functions:

  • keyMapper
  • valueMapper

We’ll use keyMapper to extract a Map key from a Stream element, and valueMapper to extract a value associated with a given key.

Let’s collect those elements into a Map that stores strings as keys and their lengths as values:

Map<String, Integer> result = givenList.stream()
  .collect(toMap(Function.identity(), String::length));

Function.identity() is just a shortcut for defining a function that accepts and returns the same value.

So what happens if our collection contains duplicate elements? Contrary to toSet, toMap doesn’t silently filter duplicates, which is understandable because how would it figure out which value to pick for this key?

List<String> listWithDuplicates = Arrays.asList("a", "bb", "c", "d", "bb");
assertThatThrownBy(() -> {
    listWithDuplicates.stream().collect(toMap(Function.identity(), String::length));
}).isInstanceOf(IllegalStateException.class);

Note that toMap doesn’t even evaluate whether the values are also equal. If it sees duplicate keys, it immediately throws an IllegalStateException.

When keys can collide, we should use the toMap overload that takes a third argument:

Map<String, Integer> result = listWithDuplicates.stream()
  .collect(toMap(Function.identity(), String::length, (item, identicalItem) -> item));

The third argument here is a BinaryOperator, where we can specify how we want collisions to be handled. In this case, we’ll just pick either of the two colliding values because we know that the same strings will always have the same lengths too.
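
The merge function matters more when the colliding values differ. As a minimal sketch (the class name MergeDemo and the choice of keying by length are illustrative and not part of the original example), here the strings are keyed by their length, and values that share a key are concatenated:

```java
import static java.util.stream.Collectors.toMap;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class MergeDemo {

    // Keys are string lengths; values that collide on a key are joined with a comma
    public static Map<Integer, String> byLength(List<String> words) {
        return words.stream()
          .collect(toMap(String::length, s -> s, (first, second) -> first + "," + second));
    }

    public static void main(String[] args) {
        List<String> givenList = Arrays.asList("a", "bb", "ccc", "dd");
        Map<Integer, String> result = byLength(givenList);
        System.out.println(result.get(1)); // a
        System.out.println(result.get(2)); // bb,dd
        System.out.println(result.get(3)); // ccc
    }
}
```

Without the third argument, the same call would throw an IllegalStateException as soon as "dd" collides with "bb" on key 2.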

3.4.1. Collectors.toUnmodifiableMap()

As with Lists and Sets, Java 10 introduced an easy way to collect Stream elements into an unmodifiable Map:

Map<String, Integer> result = givenList.stream()
  .collect(toUnmodifiableMap(Function.identity(), String::length));

If we try to put a new entry into the result Map, we’ll get an UnsupportedOperationException:

assertThatThrownBy(() -> result.put("foo", 3))
  .isInstanceOf(UnsupportedOperationException.class);

3.5. Collectors.collectingAndThen()

collectingAndThen is a special collector that allows us to perform another action on the result right after collecting ends.

Let’s collect Stream elements to a List instance, and then convert the result into an ImmutableList instance:

List<String> result = givenList.stream()
  .collect(collectingAndThen(toList(), ImmutableList::copyOf));
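
ImmutableList in the example above comes from Guava. If Guava isn’t on the classpath, a similar effect can be sketched with the JDK alone by wrapping the collected list with Collections.unmodifiableList as the finishing step (the class and method names below are illustrative):

```java
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toList;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollectingAndThenDemo {

    // Collects into a List, then wraps it in a read-only view as the finishing step
    public static List<String> toReadOnlyList(List<String> source) {
        return source.stream()
          .collect(collectingAndThen(toList(), Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        List<String> result = toReadOnlyList(Arrays.asList("a", "bb", "ccc", "dd"));
        System.out.println(result); // [a, bb, ccc, dd]
        try {
            result.add("foo");
        } catch (UnsupportedOperationException e) {
            System.out.println("the result is read-only");
        }
    }
}
```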

3.6. Collectors.joining()

The joining collector can be used for joining Stream<String> elements.

We can join them together by doing:

String result = givenList.stream()
  .collect(joining());

This will result in:

"abbcccdd"

We can also specify a custom separator:

String result = givenList.stream()
  .collect(joining(" "));

This will result in:

"a bb ccc dd"

We can add a prefix and a suffix as well:

String result = givenList.stream()
  .collect(joining(" ", "PRE-", "-POST"));

This will result in:

"PRE-a bb ccc dd-POST"

3.7. Collectors.counting()

Counting is a simple collector that allows for the counting of all Stream elements.

Now we can write:

Long result = givenList.stream()
  .collect(counting());

3.8. Collectors.summarizingDouble/Long/Int()

SummarizingDouble/Long/Int is a collector that returns a special class containing statistical information about numerical data in a Stream of extracted elements.

We can obtain information about string lengths by doing:

DoubleSummaryStatistics result = givenList.stream()
  .collect(summarizingDouble(String::length));

In this case, the following will be true:

assertThat(result.getAverage()).isEqualTo(2);
assertThat(result.getCount()).isEqualTo(4);
assertThat(result.getMax()).isEqualTo(3);
assertThat(result.getMin()).isEqualTo(1);
assertThat(result.getSum()).isEqualTo(8);

3.9. Collectors.averagingDouble/Long/Int()

AveragingDouble/Long/Int is a collector that simply returns an average of extracted elements.

We can get the average string length by doing:

Double result = givenList.stream()
  .collect(averagingDouble(String::length));

3.10. Collectors.summingDouble/Long/Int()

SummingDouble/Long/Int is a collector that simply returns a sum of extracted elements.

We can get the sum of all string lengths by doing:

Double result = givenList.stream()
  .collect(summingDouble(String::length));

3.11. Collectors.maxBy()/minBy()

MaxBy/MinBy collectors return the biggest/smallest element of a Stream according to a provided Comparator instance.

We can pick the biggest element by doing:

Optional<String> result = givenList.stream()
  .collect(maxBy(Comparator.naturalOrder()));

We can see that the returned value is wrapped in an Optional instance. This forces callers to handle the empty collection corner case.
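
To make the corner case concrete, here is a small sketch (the class MaxByDemo and the helper largestOr are hypothetical names) that falls back to a default value when the source list is empty:

```java
import static java.util.stream.Collectors.maxBy;

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class MaxByDemo {

    // Returns the largest string in natural order, or the fallback if the list is empty
    public static String largestOr(List<String> source, String fallback) {
        Optional<String> max = source.stream()
          .collect(maxBy(Comparator.naturalOrder()));
        return max.orElse(fallback);
    }

    public static void main(String[] args) {
        System.out.println(largestOr(Arrays.asList("a", "bb", "ccc", "dd"), "none")); // dd
        System.out.println(largestOr(Collections.emptyList(), "none"));               // none
    }
}
```

Note that natural ordering compares strings lexicographically, so "dd" wins over the longer "ccc".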

3.12. Collectors.groupingBy()

The groupingBy collector is used for grouping objects by some property, and then storing the results in a Map instance.

We can group the strings by length, and store the grouping results in Set instances:

Map<Integer, Set<String>> result = givenList.stream()
  .collect(groupingBy(String::length, toSet()));

This will result in the following being true:

assertThat(result)
  .containsEntry(1, newHashSet("a"))
  .containsEntry(2, newHashSet("bb", "dd"))
  .containsEntry(3, newHashSet("ccc"));

We can see that the second argument of the groupingBy method is a Collector. In addition, we’re free to use any Collector of our choice.
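
For instance, swapping toSet for counting gives the number of strings per length rather than the strings themselves. A minimal sketch (the class and method names are illustrative):

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class GroupingByCountDemo {

    // Groups strings by length and counts how many fall into each group
    public static Map<Integer, Long> countByLength(List<String> source) {
        return source.stream()
          .collect(groupingBy(String::length, counting()));
    }

    public static void main(String[] args) {
        Map<Integer, Long> result = countByLength(Arrays.asList("a", "bb", "ccc", "dd"));
        System.out.println(result.get(1)); // 1
        System.out.println(result.get(2)); // 2
        System.out.println(result.get(3)); // 1
    }
}
```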

3.13. Collectors.partitioningBy()

PartitioningBy is a specialized case of groupingBy that accepts a Predicate instance, and then collects Stream elements into a Map instance that stores Boolean values as keys and collections as values. Under the “true” key, we can find a collection of elements matching the given Predicate, and under the “false” key, we can find a collection of elements not matching the given Predicate.

We can write:

Map<Boolean, List<String>> result = givenList.stream()
  .collect(partitioningBy(s -> s.length() > 2));

This results in a Map containing:

{false=["a", "bb", "dd"], true=["ccc"]}
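
Like groupingBy, partitioningBy also has an overload that takes a downstream Collector as its second argument. As a brief sketch (the class and method names are illustrative), each partition can be joined into a single String instead of being kept as a List:

```java
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.partitioningBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class PartitioningByDemo {

    // Splits strings by the length > 2 test, then joins each partition with commas
    public static Map<Boolean, String> joinedPartitions(List<String> source) {
        return source.stream()
          .collect(partitioningBy(s -> s.length() > 2, joining(",")));
    }

    public static void main(String[] args) {
        Map<Boolean, String> result = joinedPartitions(Arrays.asList("a", "bb", "ccc", "dd"));
        System.out.println(result.get(false)); // a,bb,dd
        System.out.println(result.get(true));  // ccc
    }
}
```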

3.14. Collectors.teeing()

Let’s find the maximum and minimum numbers from a given Stream using the collectors we’ve learned so far:

List<Integer> numbers = Arrays.asList(42, 4, 2, 24);
Optional<Integer> min = numbers.stream().collect(minBy(Integer::compareTo));
Optional<Integer> max = numbers.stream().collect(maxBy(Integer::compareTo));
// do something useful with min and max

Here we’re using two different collectors, and then combining the results of those two to create something meaningful. Before Java 12, in order to cover such use cases, we had to operate on the given Stream twice, store the intermediate results into temporary variables, and then combine those results afterwards.

Fortunately, Java 12 offers a built-in collector that takes care of these steps on our behalf; all we have to do is provide the two collectors and the combiner function.

Since this new collector tees the given stream towards two different directions, it’s called teeing:

numbers.stream().collect(teeing(
  minBy(Integer::compareTo), // The first collector
  maxBy(Integer::compareTo), // The second collector
  (min, max) -> Arrays.asList(min, max) // Receives the results of both collectors and combines them (here, into a list)
));
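
As a self-contained sketch of teeing (it requires Java 12 or later; the class TeeingDemo and the method range are hypothetical names, and computing the difference is just one possible way to combine the two results), the following finds the spread between the largest and smallest number in a single pass:

```java
import static java.util.stream.Collectors.maxBy;
import static java.util.stream.Collectors.minBy;
import static java.util.stream.Collectors.teeing;

import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class TeeingDemo {

    // Computes max - min in one pass; the result is empty when the list is empty
    public static Optional<Integer> range(List<Integer> numbers) {
        return numbers.stream().collect(teeing(
          minBy(Integer::compareTo),
          maxBy(Integer::compareTo),
          (min, max) -> min.flatMap(lo -> max.map(hi -> hi - lo))
        ));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(42, 4, 2, 24);
        System.out.println(range(numbers)); // Optional[40]
    }
}
```

Because minBy and maxBy both return an Optional, the combiner has to handle the empty case itself, which flatMap and map do here.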

This example is available on GitHub in the core-java-12 project.

4. Custom Collectors

If we want to write our own Collector implementation, we need to implement the Collector interface, and specify its three generic parameters:

public interface Collector<T, A, R> {...}
  1. T – the type of objects that will be available for collection
  2. A – the type of a mutable accumulator object
  3. R – the type of a final result

Let’s write an example Collector for collecting elements into an ImmutableSet instance. We start by specifying the right types:

private class ImmutableSetCollector<T>
  implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {...}

Since we need a mutable collection for internal collection operation handling, we can’t use ImmutableSet. Instead, we need to use some other mutable collection, or any other class that could temporarily accumulate objects for us. In this case, we’ll go with an ImmutableSet.Builder, and now we need to implement five methods:

  • Supplier<ImmutableSet.Builder<T>> supplier()
  • BiConsumer<ImmutableSet.Builder<T>, T> accumulator()
  • BinaryOperator<ImmutableSet.Builder<T>> combiner()
  • Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher()
  • Set<Characteristics> characteristics()

The supplier() method returns a Supplier instance that generates an empty accumulator instance. So in this case, we can simply write:

@Override
public Supplier<ImmutableSet.Builder<T>> supplier() {
    return ImmutableSet::builder;
}

The accumulator() method returns a function that is used for adding a new element to an existing accumulator object. So let’s just use the Builder’s add method:

@Override
public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() {
    return ImmutableSet.Builder::add;
}

The combiner() method returns a function that is used for merging two accumulators together:

@Override
public BinaryOperator<ImmutableSet.Builder<T>> combiner() {
    return (left, right) -> left.addAll(right.build());
}

The finisher() method returns a function that is used for converting an accumulator to the final result type. So in this case, we’ll just use the Builder’s build method:

@Override
public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() {
    return ImmutableSet.Builder::build;
}

The characteristics() method is used to provide Stream with some additional information that will be used for internal optimizations. In this case, we don’t care about the order of elements in a Set, so we’ll use Characteristics.UNORDERED. To obtain more information on this subject, check the Characteristics JavaDoc:

@Override
public Set<Characteristics> characteristics() {
    return Sets.immutableEnumSet(Characteristics.UNORDERED);
}

Here is the complete implementation along with the usage:

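import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Sets;

import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class ImmutableSetCollector<T>
  implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> {

    // Creates an empty builder to accumulate into
    @Override
    public Supplier<ImmutableSet.Builder<T>> supplier() {
        return ImmutableSet::builder;
    }

    // Adds a single element to the builder
    @Override
    public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() {
        return ImmutableSet.Builder::add;
    }

    // Merges two builders, as needed for parallel streams
    @Override
    public BinaryOperator<ImmutableSet.Builder<T>> combiner() {
        return (left, right) -> left.addAll(right.build());
    }

    // Converts the builder into the final ImmutableSet
    @Override
    public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() {
        return ImmutableSet.Builder::build;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Sets.immutableEnumSet(Characteristics.UNORDERED);
    }

    public static <T> ImmutableSetCollector<T> toImmutableSet() {
        return new ImmutableSetCollector<>();
    }
}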

Finally, here it is in action:

List<String> givenList = Arrays.asList("a", "bb", "ccc", "dddd");

ImmutableSet<String> result = givenList.stream()
  .collect(toImmutableSet());
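
For simple cases, the JDK also offers the Collector.of factory method, which takes the same supplier, accumulator, combiner, finisher, and characteristics as arguments instead of requiring a dedicated class. A minimal sketch, assuming a JDK-only read-only Set is acceptable in place of Guava’s ImmutableSet (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;

public class ReadOnlySetCollectorDemo {

    // Assembles a collector from the same five pieces described above
    public static <T> Collector<T, Set<T>, Set<T>> toReadOnlySet() {
        return Collector.of(
          HashSet::new,                          // supplier: an empty mutable accumulator
          Set::add,                              // accumulator: adds one element
          (left, right) -> {                     // combiner: merges two accumulators
              left.addAll(right);
              return left;
          },
          Collections::unmodifiableSet,          // finisher: wraps the set in a read-only view
          Collector.Characteristics.UNORDERED);  // characteristics: element order is irrelevant
    }

    public static void main(String[] args) {
        Set<String> result = Arrays.asList("a", "bb", "ccc", "bb").stream()
          .collect(toReadOnlySet());
        System.out.println(result.size()); // 3
    }
}
```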

5. Conclusion

In this article, we explored Java 8’s Collectors in depth and showed how to implement a custom one. Make sure to check out one of my projects that enhances the capabilities of parallel processing in Java.

All code examples are available on GitHub. More interesting articles can be read on my site.