Guide to Java 8 groupingBy Collector

Last modified: February 8, 2017

1. Introduction

In this tutorial, we’ll see how the groupingBy collector works using various examples.

To follow this tutorial, we need a basic knowledge of Java 8 features. The intro to Java 8 Streams and the guide to Java 8’s Collectors cover these basics.

2. groupingBy Collectors

The Java 8 Stream API lets us process collections of data in a declarative way.

The static factory methods Collectors.groupingBy() and Collectors.groupingByConcurrent() provide us with functionality similar to the ‘GROUP BY’ clause in the SQL language. We use them for grouping objects by some property and storing results in a Map instance.

The overloaded methods of groupingBy are:

  • First, with a classification function as the method parameter:

static <T,K> Collector<T,?,Map<K,List<T>>> 
  groupingBy(Function<? super T,? extends K> classifier)
  • Secondly, with a classification function and a second collector as method parameters:

static <T,K,A,D> Collector<T,?,Map<K,D>>
  groupingBy(Function<? super T,? extends K> classifier, 
    Collector<? super T,A,D> downstream)
  • Finally, with a classification function, a supplier method (that provides the Map implementation which contains the end result), and a second collector as method parameters:

static <T,K,D,A,M extends Map<K,D>> Collector<T,?,M>
  groupingBy(Function<? super T,? extends K> classifier, 
    Supplier<M> mapFactory, Collector<? super T,A,D> downstream)
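
The three overloads listed above can be compared side by side on a small input. The following is a minimal sketch rather than code from the original article: the class name GroupingByOverloads and the WORDS sample list are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingByOverloads {
    // Hypothetical sample data, used only for this illustration.
    static final List<String> WORDS =
        Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");

    // Overload 1: classifier only; each group is collected into a List.
    static Map<Character, List<String>> byFirstLetter() {
        return WORDS.stream()
            .collect(Collectors.groupingBy(w -> w.charAt(0)));
    }

    // Overload 2: classifier plus a downstream collector applied to each group.
    static Map<Character, Long> countByFirstLetter() {
        return WORDS.stream()
            .collect(Collectors.groupingBy(w -> w.charAt(0), Collectors.counting()));
    }

    // Overload 3: classifier, map factory, and downstream collector.
    static TreeMap<Character, Set<String>> sortedSetsByFirstLetter() {
        return WORDS.stream()
            .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new, Collectors.toSet()));
    }

    public static void main(String[] args) {
        System.out.println(byFirstLetter());
        System.out.println(countByFirstLetter());
        System.out.println(sortedSetsByFirstLetter());
    }
}
```

Each overload adds one more degree of control: first over the key, then over the value type of each group, and finally over the Map implementation itself.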

2.1. Example Code Setup

To demonstrate the usage of groupingBy(), let’s define a BlogPost class (we will use a stream of BlogPost objects):

class BlogPost {
    String title;
    String author;
    BlogPostType type;
    int likes;
}

Next, the BlogPostType:

enum BlogPostType {
    NEWS,
    REVIEW,
    GUIDE
}

Then the List of BlogPost objects:

List<BlogPost> posts = Arrays.asList( ... );

Let’s also define a Tuple class that will be used to group posts by the combination of their type and author attributes:

class Tuple {
    BlogPostType type;
    String author;
}

2.2. Simple Grouping by a Single Column

Let’s start with the simplest groupingBy method, which only takes a classification function as its parameter. A classification function is applied to each element of the stream.

We use the value returned by the function as a key to the map that we get from the groupingBy collector.

To group the blog posts in the blog post list by their type:

Map<BlogPostType, List<BlogPost>> postsPerType = posts.stream()
  .collect(groupingBy(BlogPost::getType));

2.3. groupingBy with a Complex Map Key Type

The classification function is not limited to returning a scalar or String value. The key of the resulting map can be any object, as long as its class implements the equals and hashCode methods correctly.

To group using two fields as keys, we can use the Pair class provided in the javafx.util or org.apache.commons.lang3.tuple packages.

For example, to group the blog posts in the list by the type and author combined in an Apache Commons Pair instance:

Map<Pair<BlogPostType, String>, List<BlogPost>> postsPerTypeAndAuthor = posts.stream()
  .collect(groupingBy(post -> new ImmutablePair<>(post.getType(), post.getAuthor())));

Similarly, we can use the Tuple class defined before. This class can easily be generalized to include more fields as needed. The previous example, rewritten to use a Tuple instance, becomes:

Map<Tuple, List<BlogPost>> postsPerTypeAndAuthor = posts.stream()
  .collect(groupingBy(post -> new Tuple(post.getType(), post.getAuthor())));
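
For a Tuple to work as a map key, it needs a constructor and value-based equals and hashCode methods, which the short class definition in the setup section leaves out. The following is one possible sketch of those members, not the article's own implementation: the TupleKeyExample wrapper and the distinctKeys helper are hypothetical names added for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TupleKeyExample {
    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Composite key; the fields are final so the hash code cannot change
    // while an instance is in use as a map key.
    static final class Tuple {
        private final BlogPostType type;
        private final String author;

        Tuple(BlogPostType type, String author) {
            this.type = type;
            this.author = author;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Tuple)) return false;
            Tuple other = (Tuple) o;
            return type == other.type && Objects.equals(author, other.author);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, author);
        }
    }

    // Hypothetical helper: counts how many distinct keys three tuples produce.
    static int distinctKeys() {
        Map<Tuple, Integer> counts = new HashMap<>();
        counts.merge(new Tuple(BlogPostType.NEWS, "Author 1"), 1, Integer::sum);
        counts.merge(new Tuple(BlogPostType.NEWS, "Author 1"), 1, Integer::sum);
        counts.merge(new Tuple(BlogPostType.GUIDE, "Author 1"), 1, Integer::sum);
        return counts.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys());
    }
}
```

Without the two overrides, every new Tuple would be a distinct key, and groupingBy would put each post into its own single-element group.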

Java 16 has introduced the concept of a record as a new form of generating immutable Java classes.

The record feature provides us with a simpler, clearer, and safer way to do groupingBy than the Tuple, because a record generates its constructor, equals, and hashCode automatically. For example, we have defined a record type nested in the BlogPost class:

public class BlogPost {
    private String title;
    private String author;
    private BlogPostType type;
    private int likes;
    record AuthPostTypesLikes(String author, BlogPostType type, int likes) {};
    
    // constructor, getters/setters
}

Now it’s very simple to group the BlogPost objects in the list by type, author, and likes using the record:

Map<BlogPost.AuthPostTypesLikes, List<BlogPost>> postsPerTypeAndAuthor = posts.stream()
  .collect(groupingBy(post -> new BlogPost.AuthPostTypesLikes(post.getAuthor(), post.getType(), post.getLikes())));

2.4. Modifying the Returned Map Value Type

The second overload of groupingBy takes an additional second collector (downstream collector) that is applied to the results of the first collector.

When we specify a classification function, but not a downstream collector, the toList() collector is used behind the scenes.

Let’s use the toSet() collector as the downstream collector and get a Set of blog posts (instead of a List):

Map<BlogPostType, Set<BlogPost>> postsPerType = posts.stream()
  .collect(groupingBy(BlogPost::getType, toSet()));

2.5. Grouping by Multiple Fields

A different application of the downstream collector is to do a secondary groupingBy to the results of the first group by.

To group the List of BlogPosts first by author and then by type:

Map<String, Map<BlogPostType, List<BlogPost>>> map = posts.stream()
  .collect(groupingBy(BlogPost::getAuthor, groupingBy(BlogPost::getType)));
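
To see the shape of the nested result, here is a minimal runnable sketch of the two-level grouping. It assumes a trimmed-down BlogPost with only the fields needed here, and the sample posts are hypothetical values chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.groupingBy;

public class NestedGroupingExample {
    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down version of the article's BlogPost (assumption: no likes field needed).
    static class BlogPost {
        private final String title;
        private final String author;
        private final BlogPostType type;

        BlogPost(String title, String author, BlogPostType type) {
            this.title = title;
            this.author = author;
            this.type = type;
        }

        String getTitle() { return title; }
        String getAuthor() { return author; }
        BlogPostType getType() { return type; }
    }

    // Hypothetical sample data.
    static final List<BlogPost> POSTS = Arrays.asList(
        new BlogPost("News item 1", "Author 1", BlogPostType.NEWS),
        new BlogPost("Tech review 1", "Author 2", BlogPostType.REVIEW),
        new BlogPost("Programming guide", "Author 1", BlogPostType.GUIDE),
        new BlogPost("News item 2", "Author 2", BlogPostType.NEWS),
        new BlogPost("Tech review 2", "Author 1", BlogPostType.REVIEW));

    // Outer map is keyed by author, inner map by post type.
    static Map<String, Map<BlogPostType, List<BlogPost>>> byAuthorThenType() {
        return POSTS.stream()
            .collect(groupingBy(BlogPost::getAuthor, groupingBy(BlogPost::getType)));
    }

    public static void main(String[] args) {
        byAuthorThenType().forEach((author, byType) ->
            byType.forEach((type, posts) ->
                System.out.println(author + " / " + type + " -> " + posts.size())));
    }
}
```

An inner map only contains the types that the author actually wrote, so a lookup such as get("Author 2").get(BlogPostType.GUIDE) returns null in this sample.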

2.6. Getting the Average from Grouped Results

By using the downstream collector, we can apply aggregation functions to the values of each group produced by the classification function.

For instance, to find the average number of likes for each blog post type:

Map<BlogPostType, Double> averageLikesPerType = posts.stream()
  .collect(groupingBy(BlogPost::getType, averagingInt(BlogPost::getLikes)));

2.7. Getting the Sum from Grouped Results

To calculate the total sum of likes for each type:

Map<BlogPostType, Integer> likesPerType = posts.stream()
  .collect(groupingBy(BlogPost::getType, summingInt(BlogPost::getLikes)));
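
The averaging and summing collectors can be checked against a small data set. This is a sketch under stated assumptions: the BlogPost class is reduced to the two fields used here, and the like counts are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.averagingInt;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summingInt;

public class LikesAggregationExample {
    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost holding only the type and the number of likes.
    static class BlogPost {
        private final BlogPostType type;
        private final int likes;

        BlogPost(BlogPostType type, int likes) {
            this.type = type;
            this.likes = likes;
        }

        BlogPostType getType() { return type; }
        int getLikes() { return likes; }
    }

    // Hypothetical sample data: NEWS has 15 and 35 likes, REVIEW 5 and 15, GUIDE 20.
    static final List<BlogPost> POSTS = Arrays.asList(
        new BlogPost(BlogPostType.NEWS, 15),
        new BlogPost(BlogPostType.REVIEW, 5),
        new BlogPost(BlogPostType.GUIDE, 20),
        new BlogPost(BlogPostType.NEWS, 35),
        new BlogPost(BlogPostType.REVIEW, 15));

    // averagingInt always produces a Double, even for int inputs.
    static Map<BlogPostType, Double> averageLikesPerType() {
        return POSTS.stream()
            .collect(groupingBy(BlogPost::getType, averagingInt(BlogPost::getLikes)));
    }

    // summingInt produces an Integer total per group.
    static Map<BlogPostType, Integer> totalLikesPerType() {
        return POSTS.stream()
            .collect(groupingBy(BlogPost::getType, summingInt(BlogPost::getLikes)));
    }

    public static void main(String[] args) {
        System.out.println(averageLikesPerType());
        System.out.println(totalLikesPerType());
    }
}
```

With this sample, the NEWS group totals 50 likes and averages 25.0, and the REVIEW group totals 20 and averages 10.0.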

2.8. Getting the Maximum or Minimum from Grouped Results

Another aggregation that we can perform is to get the blog post with the maximum number of likes for each type:

Map<BlogPostType, Optional<BlogPost>> maxLikesPerPostType = posts.stream()
  .collect(groupingBy(BlogPost::getType,
  maxBy(comparingInt(BlogPost::getLikes))));

Similarly, we can apply the minBy downstream collector to get the blog post with the minimum number of likes.

Note that the maxBy and minBy collectors take into account the possibility that the collection to which they are applied could be empty. This is why the value type in the map is Optional<BlogPost>.

2.9. Getting a Summary for an Attribute of Grouped Results

The Collectors API offers a summarizing collector that we can use in cases when we need to calculate the count, sum, minimum, maximum and average of a numerical attribute at the same time.

Let’s calculate a summary for the likes attribute of the blog posts for each different type:

Map<BlogPostType, IntSummaryStatistics> likeStatisticsPerType = posts.stream()
  .collect(groupingBy(BlogPost::getType, 
  summarizingInt(BlogPost::getLikes)));

The IntSummaryStatistics object for each type contains the count, sum, average, min and max values for the likes attribute. Additional summary objects exist for double and long values.
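
The sketch below shows how the individual values can be read back from an IntSummaryStatistics object. It assumes a trimmed-down BlogPost and hypothetical like counts, so the numbers are illustrative only.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summarizingInt;

public class LikeStatisticsExample {
    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost holding only the type and the number of likes.
    static class BlogPost {
        private final BlogPostType type;
        private final int likes;

        BlogPost(BlogPostType type, int likes) {
            this.type = type;
            this.likes = likes;
        }

        BlogPostType getType() { return type; }
        int getLikes() { return likes; }
    }

    // Hypothetical sample data.
    static final List<BlogPost> POSTS = Arrays.asList(
        new BlogPost(BlogPostType.NEWS, 15),
        new BlogPost(BlogPostType.REVIEW, 5),
        new BlogPost(BlogPostType.NEWS, 35),
        new BlogPost(BlogPostType.REVIEW, 15));

    // One pass over the stream yields count, sum, min, max and average per type.
    static Map<BlogPostType, IntSummaryStatistics> likeStatisticsPerType() {
        return POSTS.stream()
            .collect(groupingBy(BlogPost::getType, summarizingInt(BlogPost::getLikes)));
    }

    public static void main(String[] args) {
        IntSummaryStatistics news = likeStatisticsPerType().get(BlogPostType.NEWS);
        System.out.println(news.getCount() + " posts, " + news.getSum() + " likes in total");
        System.out.println("min " + news.getMin() + ", max " + news.getMax()
            + ", average " + news.getAverage());
    }
}
```

A type with no posts, such as GUIDE in this sample, does not appear in the map at all, so the result of get should be checked for null before use.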

2.10. Aggregating Multiple Attributes of a Grouped Result

In the previous sections we’ve seen how to aggregate one field at a time. There are some techniques that we can follow to do aggregations over multiple fields.

The first approach is to use Collectors::collectingAndThen for the downstream collector of groupingBy. For the first parameter of collectingAndThen, we collect the stream into a list using Collectors::toList. The second parameter applies the finishing transformation. Inside it, we can use any of the Collectors class methods that support aggregations to get our desired results.

For example, let’s group by author, and for each author count the number of titles, list the titles, and provide summary statistics of the likes. To accomplish this, we start by adding a new record to the BlogPost:

public class BlogPost {
    // ...
    record PostCountTitlesLikesStats(long postCount, String titles, IntSummaryStatistics likesStats){};
     // ...
}

The implementation of groupingBy and collectingAndThen will be:

Map<String, BlogPost.PostCountTitlesLikesStats> postsPerAuthor = posts.stream()
  .collect(groupingBy(BlogPost::getAuthor, collectingAndThen(toList(), list -> {
    long count = list.stream()
      .map(BlogPost::getTitle)
      .collect(counting());
    String titles = list.stream()
      .map(BlogPost::getTitle)
      .collect(joining(" : "));
    IntSummaryStatistics summary = list.stream()
      .collect(summarizingInt(BlogPost::getLikes));
    return new BlogPost.PostCountTitlesLikesStats(count, titles, summary);
  })));

In the first parameter of collectingAndThen we get a list of BlogPost. We use it in the finishing transformation as an input to the lambda function to calculate the values to generate PostCountTitlesLikesStats.

Getting the information for a given author is then as simple as:

BlogPost.PostCountTitlesLikesStats result = postsPerAuthor.get("Author 1");
assertThat(result.postCount()).isEqualTo(3L);
assertThat(result.titles()).isEqualTo("News item 1 : Programming guide : Tech review 2");
assertThat(result.likesStats().getMax()).isEqualTo(20);
assertThat(result.likesStats().getMin()).isEqualTo(15);
assertThat(result.likesStats().getAverage()).isEqualTo(16.666d, offset(0.001d));

We can also do more sophisticated aggregations if we use Collectors::toMap to collect and aggregate the elements of the stream.

Let’s consider a simple example where we want to group the BlogPost elements by author and concatenate the titles with an upper bounded sum of like scores.

First, we create the record that is going to encapsulate our aggregated result:

public class BlogPost {
    // ...
    record TitlesBoundedSumOfLikes(String titles, int boundedSumOfLikes) {};
    // ...
}

Then we group and accumulate the stream in the following manner:

int maxValLikes = 17;
Map<String, BlogPost.TitlesBoundedSumOfLikes> postsPerAuthor = posts.stream()
  .collect(toMap(BlogPost::getAuthor, post -> {
    int likes = (post.getLikes() > maxValLikes) ? maxValLikes : post.getLikes();
    return new BlogPost.TitlesBoundedSumOfLikes(post.getTitle(), likes);
  }, (u1, u2) -> {
    int likes = (u2.boundedSumOfLikes() > maxValLikes) ? maxValLikes : u2.boundedSumOfLikes();
    return new BlogPost.TitlesBoundedSumOfLikes(u1.titles().toUpperCase() + " : " + u2.titles().toUpperCase(), u1.boundedSumOfLikes() + likes);
  }));

The first parameter of toMap produces the map key for each element by applying BlogPost::getAuthor.

The second parameter transforms the values of the map using the lambda function to convert each BlogPost into a TitlesBoundedSumOfLikes record.

The third parameter of toMap deals with duplicate elements for a given key and here we use another lambda function to concatenate the titles and sum the likes with a max allowed value specified in maxValLikes.

2.11. Mapping Grouped Results to a Different Type

We can achieve more complex aggregations by applying a mapping downstream collector to the results of the classification function.

Let’s get a concatenation of the titles of the posts for each blog post type:

Map<BlogPostType, String> postsPerType = posts.stream()
  .collect(groupingBy(BlogPost::getType, 
  mapping(BlogPost::getTitle, joining(", ", "Post titles: [", "]"))));

What we have done here is to map each BlogPost instance to its title and then reduce the stream of post titles to a concatenated String. In this example, the type of the Map value is also different from the default List type.

2.12. Modifying the Return Map Type

When using the groupingBy collector, we cannot make assumptions about the type of the returned Map. If we want to be specific about which type of Map we want to get from the group by, then we can use the third variation of the groupingBy method that allows us to change the type of the Map by passing a Map supplier function.

Let’s retrieve an EnumMap by passing an EnumMap supplier function to the groupingBy method:

EnumMap<BlogPostType, List<BlogPost>> postsPerType = posts.stream()
  .collect(groupingBy(BlogPost::getType, 
  () -> new EnumMap<>(BlogPostType.class), toList()));
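
One visible effect of choosing EnumMap is that the keys iterate in the declaration order of the enum, regardless of the order in which posts appear in the stream. The following sketch illustrates this, assuming a trimmed-down BlogPost and hypothetical sample posts.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

public class EnumMapGroupingExample {
    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost holding only the title and the type.
    static class BlogPost {
        private final String title;
        private final BlogPostType type;

        BlogPost(String title, BlogPostType type) {
            this.title = title;
            this.type = type;
        }

        String getTitle() { return title; }
        BlogPostType getType() { return type; }
    }

    // Hypothetical sample data, deliberately not in enum declaration order.
    static final List<BlogPost> POSTS = Arrays.asList(
        new BlogPost("Programming guide", BlogPostType.GUIDE),
        new BlogPost("Tech review 1", BlogPostType.REVIEW),
        new BlogPost("News item 1", BlogPostType.NEWS));

    // The map factory decides the concrete Map type of the result.
    static EnumMap<BlogPostType, List<BlogPost>> postsPerType() {
        return POSTS.stream()
            .collect(groupingBy(BlogPost::getType,
                () -> new EnumMap<>(BlogPostType.class), toList()));
    }

    public static void main(String[] args) {
        // Keys are listed as NEWS, REVIEW, GUIDE even though GUIDE was seen first.
        System.out.println(postsPerType().keySet());
    }
}
```

Passing TreeMap::new or LinkedHashMap::new as the factory works the same way when sorted keys or first-encounter order are wanted instead.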

3. Concurrent groupingBy Collector

Similar to groupingBy is the groupingByConcurrent collector, which leverages multi-core architectures. This collector has three overloaded methods that take exactly the same arguments as the respective overloaded methods of the groupingBy collector. The return type of the groupingByConcurrent collector, however, must be a ConcurrentMap implementation, such as ConcurrentHashMap.

To do a grouping operation concurrently, the stream needs to be parallel:

ConcurrentMap<BlogPostType, List<BlogPost>> postsPerType = posts.parallelStream()
  .collect(groupingByConcurrent(BlogPost::getType));

If we choose to pass a Map supplier function to the groupingByConcurrent collector, then we need to make sure that the function returns a ConcurrentMap implementation, such as a ConcurrentHashMap or a subclass of it.
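
As an illustration of the map factory constraint, the sketch below groups a parallel stream twice: once with the default map and once with a ConcurrentSkipListMap supplied explicitly. The trimmed-down BlogPost and the sample posts are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingByConcurrent;

public class ConcurrentGroupingExample {
    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost holding only the title and the type.
    static class BlogPost {
        private final String title;
        private final BlogPostType type;

        BlogPost(String title, BlogPostType type) {
            this.title = title;
            this.type = type;
        }

        String getTitle() { return title; }
        BlogPostType getType() { return type; }
    }

    // Hypothetical sample data.
    static final List<BlogPost> POSTS = Arrays.asList(
        new BlogPost("News item 1", BlogPostType.NEWS),
        new BlogPost("Tech review 1", BlogPostType.REVIEW),
        new BlogPost("Programming guide", BlogPostType.GUIDE),
        new BlogPost("News item 2", BlogPostType.NEWS));

    // Default map type; only the ConcurrentMap interface is guaranteed.
    static ConcurrentMap<BlogPostType, Long> countPerType() {
        return POSTS.parallelStream()
            .collect(groupingByConcurrent(BlogPost::getType, counting()));
    }

    // Explicit factory: any ConcurrentMap implementation is accepted,
    // here a sorted ConcurrentSkipListMap.
    static ConcurrentSkipListMap<BlogPostType, Long> sortedCountPerType() {
        return POSTS.parallelStream()
            .collect(groupingByConcurrent(BlogPost::getType,
                ConcurrentSkipListMap::new, counting()));
    }

    public static void main(String[] args) {
        System.out.println(countPerType());
        System.out.println(sortedCountPerType());
    }
}
```

Because groupingByConcurrent is an unordered collector, the order of elements inside each group is not guaranteed when the downstream collects into a list. Order-insensitive downstream collectors such as counting avoid that concern.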

4. Java 9 Additions

Java 9 introduced two new collectors that work well with groupingBy: Collectors.filtering and Collectors.flatMapping.
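
The two collectors added to java.util.stream.Collectors in Java 9 are filtering and flatMapping. The following sketch shows both as downstream collectors of groupingBy. It requires Java 9 or later, and the tags field on BlogPost, the sample posts, and the method names are assumptions introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import static java.util.stream.Collectors.filtering;
import static java.util.stream.Collectors.flatMapping;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toSet;

public class Java9CollectorsExample {
    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // BlogPost variant with a hypothetical list of tags (not in the article's class).
    static class BlogPost {
        private final String title;
        private final BlogPostType type;
        private final int likes;
        private final List<String> tags;

        BlogPost(String title, BlogPostType type, int likes, List<String> tags) {
            this.title = title;
            this.type = type;
            this.likes = likes;
            this.tags = tags;
        }

        String getTitle() { return title; }
        BlogPostType getType() { return type; }
        int getLikes() { return likes; }
        List<String> getTags() { return tags; }
    }

    // Hypothetical sample data.
    static final List<BlogPost> POSTS = Arrays.asList(
        new BlogPost("News item 1", BlogPostType.NEWS, 15, Arrays.asList("java", "streams")),
        new BlogPost("News item 2", BlogPostType.NEWS, 35, Arrays.asList("java")),
        new BlogPost("Tech review 1", BlogPostType.REVIEW, 5, Arrays.asList("tools")));

    // filtering runs after grouping, so a type whose posts are all filtered out
    // still appears in the map, with an empty list.
    static Map<BlogPostType, List<String>> popularTitlesPerType(int minLikes) {
        return POSTS.stream()
            .collect(groupingBy(BlogPost::getType,
                filtering((BlogPost p) -> p.getLikes() >= minLikes,
                    mapping(BlogPost::getTitle, toList()))));
    }

    // flatMapping flattens each post's tag list into one set per type.
    static Map<BlogPostType, Set<String>> tagsPerType() {
        return POSTS.stream()
            .collect(groupingBy(BlogPost::getType,
                flatMapping((BlogPost p) -> p.getTags().stream(), toSet())));
    }

    public static void main(String[] args) {
        System.out.println(popularTitlesPerType(10));
        System.out.println(tagsPerType());
    }
}
```

The difference from calling filter on the stream before collecting is that the REVIEW key is kept here with an empty list, whereas an upstream filter would drop the key entirely.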

5. Conclusion

In this article, we explored the usage of the groupingBy collector offered by the Java 8 Collectors API.

We learned how groupingBy can be used to classify a stream of elements based on one of their attributes, and how the results of this classification can be further collected, mutated, and reduced to final containers.

The complete implementation of the examples in this article can be found in the GitHub project.
