List vs. Set in @OneToMany JPA

Last modified: January 24, 2024

1. Overview

Spring JPA and Hibernate provide a powerful tool for seamless database communication. However, as clients delegate more control to the frameworks, including query generation, the result might be far from what we expect.

There’s often confusion about whether to use Lists or Sets for to-many relationships. This confusion is amplified by the fact that Hibernate uses similar names for its bags, lists, and sets but with slightly different meanings behind them.

In most cases, Sets are more suitable for one-to-many or many-to-many relationships. However, they have particular performance implications that we should be aware of.

In this tutorial, we’ll learn the difference between Lists and Sets in the context of entity relationships and review several examples of different complexities. Also, we’ll identify the pros and cons of each approach.

2. Testing

We’ll be using a dedicated library to test the number of requests. Checking the logs isn’t a good solution as it’s not automated and might work only on simple examples. When requests generate tens and hundreds of queries, using logs isn’t efficient enough.

First of all, we need the Hypersistence Utils dependency from io.hypersistence. Note that the number in the artifact ID is the Hibernate version:

<dependency>
    <groupId>io.hypersistence</groupId>
    <artifactId>hypersistence-utils-hibernate-63</artifactId>
    <version>3.7.0</version>
</dependency>

Additionally, we’ll be using the db-util library for log analysis:

<dependency>
    <groupId>com.vladmihalcea</groupId>
    <artifactId>db-util</artifactId>
    <version>1.0.7</version>
</dependency>

We can use these libraries for exploratory tests and cover crucial parts of our application. This way, we ensure that changes in the entity classes don’t create some invisible side effects in the query generation.

We should wrap our data source with the provided utilities to make it work. We can use BeanPostProcessor to do this: 

@Component
public class DataSourceWrapper implements BeanPostProcessor {

    public Object postProcessBeforeInitialization(Object bean, String beanName) {
        return bean;
    }

    public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException {
        if (bean instanceof DataSource originalDataSource) {
            ChainListener listener = new ChainListener();
            SLF4JQueryLoggingListener loggingListener = new SLF4JQueryLoggingListener();
            loggingListener.setQueryLogEntryCreator(new InlineQueryLogEntryCreator());
            listener.addListener(loggingListener);
            listener.addListener(new DataSourceQueryCountListener());
            return ProxyDataSourceBuilder
              .create(originalDataSource)
              .name("datasource-proxy")
              .listener(listener)
              .build();
        }
        return bean;
    }
}

The rest is simple. In our tests, we’ll use SQLStatementCountValidator to validate the number and the type of the queries.
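
The validation pattern in a test is always the same: reset the counters, run the code under test, and assert the expected totals. The sketch below imitates that flow with a hypothetical StatementCounter class written only for illustration. It is not the API of SQLStatementCountValidator or of the data source proxy, and classifying statements by their first keyword is a simplifying assumption.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a statement-counting listener (illustration only).
final class StatementCounter {

    enum Kind { SELECT, INSERT, UPDATE, DELETE, OTHER }

    private final Map<Kind, Integer> counts = new EnumMap<>(Kind.class);

    // Forget everything recorded so far; call this before the code under test.
    void reset() {
        counts.clear();
    }

    // Record one executed statement, classified by its leading keyword.
    void record(String sql) {
        String keyword = sql.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        Kind kind;
        try {
            kind = Kind.valueOf(keyword);
        } catch (IllegalArgumentException e) {
            kind = Kind.OTHER;
        }
        counts.merge(kind, 1, Integer::sum);
    }

    int count(Kind kind) {
        return counts.getOrDefault(kind, 0);
    }

    // Fail loudly when the recorded number differs from the expectation.
    void assertCount(Kind kind, int expected) {
        int actual = count(kind);
        if (actual != expected) {
            throw new AssertionError(
              "Expected " + expected + " " + kind + " statements but recorded " + actual);
        }
    }
}
```

A real listener hooks into the data source, so the test code never calls record() itself; here it would be called by hand to feed the counter.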

3. Domain

To make the examples more relevant and easier to follow, we’ll be using a model for a social networking website. We’ll have different relationships between groups, users, posts, and comments.

However, we’ll build up the complexity step by step, adding entities to highlight the differences and the performance effect. This is important as simple models with only a few relationships won’t provide a complete picture. At the same time, overly complex ones might bury the relevant information, making it hard to follow.

For these examples, we’ll use only the eager fetch type for to-many relationships. In general, Lists and Sets behave similarly when we use lazy fetch.

In the visuals, we’ll be using Iterable as a to-many field type. This is done only for brevity, so we don’t need to repeat the same visuals for Lists and Sets. We’ll explicitly define a dedicated type in each section and show it in the code.

4. Users and Posts

First of all, let’s consider only a part of our domain. Here, we’ll be taking into account only users and posts.

For the first example, we’ll have a simple bidirectional relationship between users and posts. Users can have many posts. At the same time, a post can have only one user as an author.

4.1. Lists and Sets Joins

Let’s check the behavior of the queries when we request only one user. We’ll consider the following two scenarios for Set and List:

@Data
@Entity
public class User {
    // Other fields
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author", fetch = FetchType.EAGER)
    protected List<Post> posts;
}

Set-based User looks quite similar:

@Data
@Entity
public class User {
    // Other fields
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author", fetch = FetchType.EAGER)
    protected Set<Post> posts;
}

While fetching a User, Hibernate generates a single query with LEFT JOIN to get all the information in one go. This is true for both cases:

SELECT u.id, u.email, u.username, p.id, p.author_id, p.content
FROM simple_user u
         LEFT JOIN post p ON u.id = p.author_id
WHERE u.id = ?

While we have only one query, the user’s data will be repeated in each row. This means that we’ll see the ID, email, and username once for every post a particular user has:

u.id | u.email           | u.username | p.id | p.author_id | p.content
-----+-------------------+------------+------+-------------+------------------
101  | user101@email.com | user101    | 1    | 101         | “User101 post 1”
101  | user101@email.com | user101    | 2    | 101         | “User101 post 2”
102  | user102@email.com | user102    | 3    | 102         | “User102 post 1”
102  | user102@email.com | user102    | 4    | 102         | “User102 post 2”
103  | user103@email.com | user103    | 5    | 103         | “User103 post 1”
103  | user103@email.com | user103    | 6    | 103         | “User103 post 2”

If the user table has many columns or a user has many posts, this may create a performance problem. We can address this issue by specifying the fetch mode explicitly.
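
To get a feel for the cost, the following sketch estimates how many user column values a single LEFT JOIN result carries. The class and method names are hypothetical, and the model only counts values; it ignores column sizes and network overhead.

```java
// Rough model of the duplication caused by joining a user with its posts.
final class JoinResultSize {

    // A LEFT JOIN returns one row per post, or a single row with NULL post
    // columns when the user has no posts at all.
    static long rows(long postCount) {
        return Math.max(1, postCount);
    }

    // Number of user column values transferred; every row repeats all of them.
    static long userValuesTransferred(long userColumns, long postCount) {
        return userColumns * rows(postCount);
    }

    public static void main(String[] args) {
        // Three user columns (id, email, username) and 1,000 posts.
        System.out.println(userValuesTransferred(3, 1_000));
    }
}
```

With separate queries, the same three values would be transferred only once, at the price of an additional round trip.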

4.2. Lists and Sets N+1

At the same time, while fetching multiple users, we encounter the infamous N+1 problem. This is true for List-based Users:

@Test
void givenEagerListBasedUser_WhenFetchingAllUsers_ThenIssueNPlusOneRequests() {
    List<User> users = getService().findAll();
    assertSelectCount(users.size() + 1);
}

Also, this is true for Set-based Users:

@Test
void givenEagerSetBasedUser_WhenFetchingAllUsers_ThenIssueNPlusOneRequests() {
    List<User> users = getService().findAll();
    assertSelectCount(users.size() + 1);
}

There will be only two kinds of queries. The first one fetches all the users:

SELECT u.id, u.email, u.username
FROM simple_user u

The second kind is repeated N times, once per User, to get that user’s posts:

SELECT p.id, p.author_id, p.content
FROM post p
WHERE p.author_id = ?

Thus, we don’t have any differences between Lists and Sets for these types of relationships.
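
The N+1 pattern can be reproduced without a database. The sketch below is an in-memory model, assuming a hypothetical NPlusOneSimulation class in which every method that stands for a SQL statement increments a counter. It doesn't involve Hibernate at all and only mirrors the two kinds of queries shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of eager loading through separate SELECTs (illustration only).
final class NPlusOneSimulation {

    private final Map<Long, List<String>> postsByAuthor = new LinkedHashMap<>();
    private int selectCount;

    void addUser(long userId, List<String> posts) {
        postsByAuthor.put(userId, new ArrayList<>(posts));
    }

    // Models "SELECT ... FROM simple_user": one query, however many users exist.
    private List<Long> selectAllUserIds() {
        selectCount++;
        return new ArrayList<>(postsByAuthor.keySet());
    }

    // Models "SELECT ... FROM post WHERE author_id = ?": one query per user.
    private List<String> selectPostsByAuthor(long userId) {
        selectCount++;
        return postsByAuthor.get(userId);
    }

    // Loads every user together with its posts, as an eager collection would.
    Map<Long, List<String>> findAll() {
        Map<Long, List<String>> result = new LinkedHashMap<>();
        for (Long userId : selectAllUserIds()) {
            result.put(userId, selectPostsByAuthor(userId));
        }
        return result;
    }

    int getSelectCount() {
        return selectCount;
    }
}
```

Note that the collection type never enters the picture: the count is users.size() + 1 whether the posts end up in a List or a Set.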

5. Groups, Users and Posts

Let’s consider more complex relationships and add groups to our model. They create unidirectional many-to-many relationships with users:

Because the relationships between Users and Posts remain the same, old tests will be valid and produce the same results. Let’s create similar tests for groups.

5.1. Lists and N + 1

We’ll have the following Group class with @ManyToMany relationships:

@Data
@Entity
public class Group {
    @Id
    private Long id;
    private String name;
    @ManyToMany(fetch = FetchType.EAGER)
    private List<User> members;
}

Let’s try to fetch all the groups:

@Test
void givenEagerListBasedGroup_whenFetchingAllGroups_thenIssueNPlusMPlusOneRequests() {
    List<Group> groups = groupService.findAll();
    Set<User> users = groups.stream().map(Group::getMembers).flatMap(List::stream).collect(Collectors.toSet());
    assertSelectCount(groups.size() + users.size() + 1);
}

Hibernate will issue additional queries for each group to get the members and for each member to get their posts. Thus, we’ll have three types of queries:

SELECT g.id, g.name
FROM interest_group g

SELECT gm.interest_group_id, u.id, u.email, u.username
FROM interest_group_members gm
         JOIN simple_user u ON u.id = gm.members_id
WHERE gm.interest_group_id = ?

SELECT p.author_id, p.id, p.content
FROM post p
WHERE p.author_id = ?

Overall, we’ll get 1 + N + M number of requests. N is the number of groups, and M is the number of unique users in these groups. Let’s try to fetch a single group:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerListBasedGroup_whenFetchingAllGroups_thenIssueNPlusOneRequests(Long groupId) {
    Optional<Group> group = groupService.findById(groupId);
    assertThat(group).isPresent();
    assertSelectCount(1 + group.get().getMembers().size());
}

We’ll have a similar situation, but we’ll get all the User data in a single query using LEFT JOIN. Thus, there will be only two types of queries:

SELECT g.id, gm.interest_group_id, u.id, u.email, u.username, g.name
FROM interest_group g
         LEFT JOIN (interest_group_members gm JOIN simple_user u ON u.id = gm.members_id)
                   ON g.id = gm.interest_group_id
WHERE g.id = ?

SELECT p.author_id, p.id, p.content
FROM post p
WHERE p.author_id = ?

Overall, we’ll have N + 1 requests, where N is the number of group members.
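
Both totals from this section can be checked with a little arithmetic. The helper below is a hypothetical sketch that follows the definitions used above: when fetching all groups, M counts each user once even if the user belongs to several groups, and when fetching one group, the members of that group are assumed to be distinct.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Expected SELECT counts for the List-based Group mapping (illustration only).
final class ListGroupQueryCount {

    // Fetching all groups: 1 query for the groups, N member queries (one per
    // group), and M post queries (one per distinct member across all groups).
    static int fetchAll(List<List<Long>> memberIdsPerGroup) {
        Set<Long> distinctMembers = new HashSet<>();
        for (List<Long> memberIds : memberIdsPerGroup) {
            distinctMembers.addAll(memberIds);
        }
        return 1 + memberIdsPerGroup.size() + distinctMembers.size();
    }

    // Fetching one group: 1 joined query for the group and its members,
    // then one post query per member.
    static int fetchOne(List<Long> memberIds) {
        return 1 + memberIds.size();
    }
}
```

For two groups with the members {1, 2} and {2, 3}, fetchAll gives 1 + 2 + 3 = 6, because user 2 is counted only once.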

5.2. Sets and Cartesian Product

While working with Sets, we’ll see a different picture. Let’s check our Set-based Group class:

@Data
@Entity
public class Group {
    @Id
    private Long id;
    private String name;
    @ManyToMany(fetch = FetchType.EAGER)
    private Set<User> members;
}

Fetching all the groups will produce a slightly different result from the List-based groups:

@Test
void givenEagerSetBasedGroup_whenFetchingAllGroups_thenIssueNPlusOneRequests() {
    List<Group> groups = groupService.findAll();
    assertSelectCount(groups.size() + 1);
}

Instead of the N + M + 1 requests from the previous example, we’ll have just N + 1, but the queries are more complex. We’ll still have a separate query to get all the groups, but Hibernate fetches users and their posts in a single query using two JOINs:

SELECT g.id, g.name
FROM interest_group g

SELECT u.id,
       u.username,
       u.email,
       p.id,
       p.author_id,
       p.content,
       gm.interest_group_id
FROM interest_group_members gm
         JOIN simple_user u ON u.id = gm.members_id
         LEFT JOIN post p ON u.id = p.author_id
WHERE gm.interest_group_id = ?

Although we reduced the number of queries, the result set might contain duplicated data due to JOINs and, subsequently, a Cartesian product. We’ll get repeated group information for all the users in the group, and all of that will be repeated for each user post:

u.id | u.username | u.email           | p.id | p.author_id | p.content          | gm.interest_group_id
-----+------------+-------------------+------+-------------+--------------------+---------------------
301  | user301    | user301@email.com | 201  | 301         | “User301’s post 1” | 101
302  | user302    | user302@email.com | 202  | 302         | “User302’s post 1” | 101
303  | user303    | user303@email.com | NULL | NULL        | NULL               | 101
304  | user304    | user304@email.com | 203  | 304         | “User304’s post 1” | 102
305  | user305    | user305@email.com | 204  | 305         | “User305’s post 1” | 102
306  | user306    | user306@email.com | NULL | NULL        | NULL               | 102
307  | user307    | user307@email.com | 205  | 307         | “User307’s post 1” | 103
308  | user308    | user308@email.com | 206  | 308         | “User308’s post 1” | 103
309  | user309    | user309@email.com | NULL | NULL        | NULL               | 103

After reviewing the previous queries, it’s obvious why fetching a single group would issue a single request:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerSetBasedGroup_whenFetchingAllGroups_thenCreateCartesianProductInOneQuery(Long groupId) {
    groupService.findById(groupId);
    assertSelectCount(1);
}

We’ll use only the second query with JOINs, reducing the number of requests:

SELECT u.id,
       u.username,
       u.email,
       p.id,
       p.author_id,
       p.content,
       gm.interest_group_id
FROM interest_group_members gm
         JOIN simple_user u ON u.id = gm.members_id
         LEFT JOIN post p ON u.id = p.author_id
WHERE gm.interest_group_id = ?

5.3. Removals using Lists and Sets

Another interesting difference between Sets and Lists is how they remove objects. This only applies to the @ManyToMany relationships. Let’s consider a more straightforward case with Sets first:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerSetBasedGroup_whenRemoveUser_thenIssueOnlyOneDelete(Long groupId) {
    groupService.findById(groupId).ifPresent(group -> {
        Set<User> members = group.getMembers();
        if (!members.isEmpty()) {
            reset();
            Set<User> newMembers = members.stream().skip(1).collect(Collectors.toSet());
            group.setMembers(newMembers);
            groupService.save(group);
            assertSelectCount(1);
            assertDeleteCount(1);
        }
    });
}

The behavior is quite reasonable, and we just remove the record from the join table. We’ll see in the logs only two queries:

SELECT g.id, g.name,
       u.id, u.username, u.email,
       p.id, p.author_id, p.content,
       m.interest_group_id
FROM interest_group g
         LEFT JOIN (interest_group_members m JOIN simple_user u ON u.id = m.members_id)
                   ON g.id = m.interest_group_id
         LEFT JOIN post p ON u.id = p.author_id

DELETE
FROM interest_group_members
WHERE interest_group_id = ? AND members_id = ?

We have an additional SELECT only because the test methods aren’t transactional, and the original group isn’t stored in our persistence context.

Overall, Sets behave the way we would assume. Now, let’s check the Lists behavior:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerListBasedGroup_whenRemoveUser_thenIssueRecreateGroup(Long groupId) {
    groupService.findById(groupId).ifPresent(group -> {
        List<User> members = group.getMembers();
        int originalNumberOfMembers = members.size();
        assertSelectCount(ONE + originalNumberOfMembers);
        if (!members.isEmpty()) {
            reset();
            members.remove(0);
            groupService.save(group);
            assertSelectCount(ONE + originalNumberOfMembers);
            assertDeleteCount(ONE);
            assertInsertCount(originalNumberOfMembers - ONE);
        }
    });
}

Here, we have several queries: SELECT, DELETE, and INSERT. The problem is that Hibernate removes the entire group from the join table and recreates it anew. Again, we have the initial select statements due to the lack of persistence context in the test methods:

SELECT u.id, u.email, u.username, g.name,
       g.id, gm.interest_group_id
FROM interest_group g
         LEFT JOIN (interest_group_members gm JOIN simple_user u ON u.id = gm.members_id)
                   ON g.id = gm.interest_group_id
WHERE g.id = ?

SELECT p.author_id, p.id, p.content
FROM post p
WHERE p.author_id = ?

DELETE
FROM interest_group_members
WHERE interest_group_id = ? 
    
INSERT
INTO interest_group_members (interest_group_id, members_id)
VALUES (?, ?)

The code produces one query to get the group with all its members and N requests to get the posts, where N is the number of members. It then issues one request to delete all of the group’s rows from the join table and N – 1 requests to add the remaining members again. In general, we can think of it as 1 + 2N statements.
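
The totals for both collection types can be written down as a small calculation. This is only a sketch with hypothetical names; it restates the statement counts asserted in the two removal tests of this section and assumes a non-transactional test, so the initial SELECTs are included.

```java
// Statements issued when one member is removed from a group (illustration only).
final class RemovalStatementCount {

    // List-based mapping: reload, wipe the group's join rows, re-insert the rest.
    static int listBased(int members) {
        if (members < 1) {
            throw new IllegalArgumentException("The group needs at least one member to remove");
        }
        int selectGroupWithMembers = 1;
        int selectPostsPerMember = members;
        int deleteAllJoinRows = 1;
        int insertRemainingMembers = members - 1;
        return selectGroupWithMembers + selectPostsPerMember
          + deleteAllJoinRows + insertRemainingMembers;
    }

    // Set-based mapping: one joined SELECT and one DELETE for the removed row.
    static int setBased() {
        return 1 + 1;
    }
}
```

For a group of 10 members, the List-based mapping needs 1 + 2 * 10 = 21 statements, while the Set-based mapping still needs 2.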

Lists avoid the Cartesian product, but not out of performance considerations. As Lists allow repeated elements, Hibernate has trouble distinguishing Cartesian duplicates from genuine duplicates in the collection.

This is why it’s recommended to use only Sets with the @ManyToMany annotation. Otherwise, we should prepare for a dramatic performance impact.
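
The ambiguity between the two kinds of duplicates can be shown with a toy model. The code below isn't how Hibernate processes result sets; it’s a hypothetical sketch that builds the member ID column of a joined result and then tries to rebuild the collection by dropping repeated IDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Toy model of duplicates in a joined result (illustration only).
final class DuplicateAmbiguity {

    // Produces the member ID column of a joined result: every member ID is
    // repeated once per post (at least once, as with a LEFT JOIN).
    static List<Long> joinedMemberColumn(List<Long> memberIds, int postsPerMember) {
        List<Long> column = new ArrayList<>();
        for (Long memberId : memberIds) {
            for (int i = 0; i < Math.max(1, postsPerMember); i++) {
                column.add(memberId);
            }
        }
        return column;
    }

    // Rebuilds a collection from the column by dropping repeated IDs.
    static List<Long> deduplicate(List<Long> column) {
        return new ArrayList<>(new LinkedHashSet<>(column));
    }
}
```

For a collection without repeats, such as [1, 2], de-duplication restores the original content. For a list like [1, 1, 2], it returns [1, 2] and silently loses an element, while skipping de-duplication would add spurious ones.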

6. Complete Domain

Now, let’s consider a more realistic domain with many different relationships.

The resulting domain model is quite interconnected. There are several one-to-many relationships, bidirectional many-to-many relationships, and transitive circular relationships.

6.1. Lists

First, let’s consider the relationships where we use List for all to-many relationships. Let’s try to fetch all the users from the database:

@ParameterizedTest
@MethodSource
void givenEagerListBasedUser_WhenFetchingAllUsers_ThenIssueNPlusOneRequests(ToIntFunction<List<User>> function) {
    int numberOfRequests = getService().countNumberOfRequestsWithFunction(function);
    assertSelectCount(numberOfRequests);
}

static Stream<Arguments> givenEagerListBasedUser_WhenFetchingAllUsers_ThenIssueNPlusOneRequests() {
    return Stream.of(
      Arguments.of((ToIntFunction<List<User>>) s -> {
          int result = 2 * s.size() + 1;
          List<Post> posts = s.stream().map(User::getPosts)
            .flatMap(List::stream)
            .toList();

          result += posts.size();
          return result;
      })
    );
}

This request results in many different queries. First, we’ll get all the users’ IDs. Then, we’ll issue separate requests for the groups and posts of each user. Finally, we’ll fetch the information about each post.

Overall, we’ll issue lots of queries, but at the same time, we won’t have any joins between several to-many relationships. This way, we avoid a Cartesian product and have a lower amount of data returned, as we don’t have duplicates, but we use more requests.

While fetching a single user, we’ll have an interesting situation:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerListBasedUser_WhenFetchingOneUser_ThenUseDFS(Long id) {
    int numberOfRequests = getService()
      .getUserByIdWithFunction(id, this::countNumberOfRequests);
    assertSelectCount(numberOfRequests);
}

The countNumberOfRequests method is a util method that uses DFS to count the number of entities and calculate the number of requests:

Get all the posts for user #2
The user wrote the following posts: 1,2,3
 Check all the commenters for post #1: 3,8,9,10
  Get all the posts for user #10: 22
   Check all the commenters for post #22: 3,6,7,10
    Get all the posts for user #3: 4,5,6
     Check all the commenters for post #4: 2,4,9
      Get all the posts for user #9: 19,20,21
       Check all the commenters for post #19: 3,4,8,9,10
        Get all the posts for user #8: 16,17,18
         Check all the commenters for post #16: 
         Check all the commenters for post #17: 2,4,9
          Get all the posts for user #4: 7,8,9,10
           Check all the commenters for post #7: 
           Check all the commenters for post #8: 
           Check all the commenters for post #9: 1,5,6
            Get all the posts for user #1: 
            Get all the posts for user #5: 11,12,13,14
             Check all the commenters for post #11: 2,3,8
             Check all the commenters for post #12: 10
             Check all the commenters for post #13: 4,9,10
             Check all the commenters for post #14: 
            Get all the posts for user #6: 
           Check all the commenters for post #10: 2,5,6,8
         Check all the commenters for post #18: 1,2,3,4,5
       Check all the commenters for post #20: 
       Check all the commenters for post #21: 7
        Get all the posts for user #7: 15
         Check all the commenters for post #15: 1
     Check all the commenters for post #5: 1,2,5,8
     Check all the commenters for post #6: 
 Check all the commenters for post #2: 
 Check all the commenters for post #3: 1,3,6

The result is a transitive closure. For a single user with ID #2, we have to issue 42(!) requests to the database. Although the main issue is the eager fetch type, this shows how the number of requests explodes when we use Lists.
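
The traversal behind such a trace can be sketched as an iterative depth-first search over two lookup tables. The class below is an assumption made for illustration, not the article’s countNumberOfRequests helper: it only counts the users and posts reachable from a starting user and leaves the mapping from those totals to a request count open.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Counts the users and posts reachable from one user (illustration only).
final class ReachabilityCounter {

    private final Map<Long, List<Long>> postsByUser;
    private final Map<Long, List<Long>> commentersByPost;

    ReachabilityCounter(Map<Long, List<Long>> postsByUser,
                        Map<Long, List<Long>> commentersByPost) {
        this.postsByUser = postsByUser;
        this.commentersByPost = commentersByPost;
    }

    // Returns {visited users, visited posts} for the transitive closure.
    int[] countReachable(long startUserId) {
        Set<Long> visitedUsers = new HashSet<>();
        Set<Long> visitedPosts = new HashSet<>();
        Deque<Long> stack = new ArrayDeque<>();
        stack.push(startUserId);
        while (!stack.isEmpty()) {
            long userId = stack.pop();
            if (!visitedUsers.add(userId)) {
                continue; // this user was already loaded
            }
            for (Long postId : postsByUser.getOrDefault(userId, Collections.<Long>emptyList())) {
                if (!visitedPosts.add(postId)) {
                    continue; // this post was already loaded
                }
                for (Long commenterId : commentersByPost.getOrDefault(postId, Collections.<Long>emptyList())) {
                    if (!visitedUsers.contains(commenterId)) {
                        stack.push(commenterId);
                    }
                }
            }
        }
        return new int[] {visitedUsers.size(), visitedPosts.size()};
    }
}
```

The visited sets play the role of the persistence context: an entity that was already loaded doesn't trigger another request, which is why the traversal terminates even with circular relationships.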

Lazy fetch might produce a similar issue when we trigger the load for most of the internal fields. This might be intentional, based on the domain logic. It might also be accidental, for example, because of incorrect overrides of the toString(), equals(T), and hashCode() methods.
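
The accidental case is easy to reproduce in isolation. In the sketch below, a hypothetical LazyPosts holder stands in for a lazily loaded collection, and its Supplier stands in for the database query. This is a toy model and not Hibernate’s proxy mechanism.

```java
import java.util.List;
import java.util.function.Supplier;

// Toy lazy collection: the supplier plays the role of a database query.
final class LazyPosts {

    private final Supplier<List<String>> loader;
    private List<String> loaded;
    private int loadCount;

    LazyPosts(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    // Runs the "query" on first access only.
    List<String> get() {
        if (loaded == null) {
            loaded = loader.get();
            loadCount++;
        }
        return loaded;
    }

    int getLoadCount() {
        return loadCount;
    }
}

final class UserWithLazyPosts {

    private final String username;
    private final LazyPosts posts;

    UserWithLazyPosts(String username, LazyPosts posts) {
        this.username = username;
        this.posts = posts;
    }

    // Touches the collection, so merely printing the user triggers the load.
    String toStringIncludingPosts() {
        return "User(" + username + ", posts=" + posts.get() + ")";
    }

    // Uses only the user's own fields and leaves the collection untouched.
    String toStringWithoutPosts() {
        return "User(" + username + ")";
    }
}
```

The same reasoning applies to equals() and hashCode(): as soon as they read a to-many field, putting the entity into a HashSet or logging it is enough to trigger the load.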

6.2. Sets

Let’s change all the Lists in our domain model to Sets and make similar tests:

@Test
void givenEagerSetBasedUser_WhenFetchingAllUsers_ThenIssueNPlusOneRequestsWithCartesianProduct() {
    List<User> users = getService().findAll();
    assertSelectCount(users.size() + 1);
}

First, we’ll have fewer requests to get all the users, which should be better overall. However, if we look at the requests, we can see the following:

SELECT profile.id, profile.biography, profile.website, profile.profile_picture_url,
       user.id, user.email, user.username,
       user_group.members_id,
       interest_group.id, interest_group.name,
       post.id, post.author_id, post.content,
       comment.id, comment.text, comment.post_id,
       comment_author.id, comment_author.profile_id, comment_author.username, comment_author.email,
       comment_author_group_member.members_id,
       comment_author_group.id, comment_author_group.name
FROM profile profile
         LEFT JOIN simple_user user ON profile.id = user.profile_id
         LEFT JOIN (interest_group_members user_group JOIN interest_group interest_group
                    ON interest_group.id = user_group.groups_id)
                   ON user.id = user_group.members_id
         LEFT JOIN post post ON user.id = post.author_id
         LEFT JOIN comment comment ON post.id = comment.post_id
         LEFT JOIN simple_user comment_author ON comment_author.id = comment.author_id
         LEFT JOIN (interest_group_members comment_author_group_member JOIN interest_group comment_author_group
                    ON comment_author_group.id = comment_author_group_member.groups_id)
                   ON comment_author.id = comment_author_group_member.members_id
WHERE profile.id = ?

This query pulls an immense amount of data from the database, and we have one such query for each user. In addition, the result set will contain duplicates due to the Cartesian product. Getting a single user gives us a similar result: fewer requests but massive result sets.
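
A rough estimate shows how quickly such a result set grows. The sketch below uses hypothetical names and models only sibling collections of one parent, such as a user’s groups and posts joined in the same query. Nested collections, such as the comments of each post, aren't covered by this simple product.

```java
// Row counts for joined versus separately loaded collections (illustration only).
final class CartesianProductSize {

    // An outer join keeps the parent row even when a collection is empty,
    // so an empty collection still contributes a factor of one.
    private static long factor(long collectionSize) {
        return Math.max(1, collectionSize);
    }

    // Rows returned when sibling collections are joined in one query.
    static long joinedRows(long... collectionSizes) {
        long rows = 1;
        for (long size : collectionSizes) {
            rows *= factor(size);
        }
        return rows;
    }

    // Rows returned in total when each collection is loaded by its own query.
    static long separateRows(long... collectionSizes) {
        long rows = 0;
        for (long size : collectionSizes) {
            rows += size;
        }
        return rows;
    }
}
```

A user with 5 groups and 20 posts produces 100 joined rows but only 25 rows across separate queries, which is the trade-off between the Set-based and List-based mappings in this section.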

7. Pros and Cons

We used eager fetch in this tutorial to highlight the difference in the default behavior of Lists and Sets. While loading data eagerly might improve the performance and simplify the interaction with the database, it should be used cautiously.

Although eager fetch is usually considered to solve the N+1 problem, it’s not always the case. The behavior depends on multiple factors and the overall structure of the relationships between domain entities.

Sets are preferable for to-many relationships for several reasons. First, in most cases, a collection that doesn’t allow duplicates reflects the domain model perfectly. We cannot have two identical users in a group, and a user cannot have two identical posts.

Another thing is that Sets are more flexible. While the default fetch mode for Sets is to create a join, we can override it by specifying the fetch mode explicitly.

The delete behavior for many-to-many relationships using Lists produces an overhead. It’s hard to notice the difference on small datasets, but we can experience high latency with lots of data.

To avoid these problems, it’s a good idea to cover the crucial parts of our interaction with the database with tests. This ensures that a seemingly insignificant change in one part of our domain model won’t introduce huge overhead in the generated queries.

8. Conclusion

In most situations, we should use Sets for to-many relationships. This gives us more controllable relationships and avoids overhead on deletes.

However, all the changes and ideas about improving the domain model should be profiled and tested. The issues might not show up with small datasets and simplistic entity relationships.

As usual, all the code from this tutorial is available over on GitHub.
