N+1 Problem in Hibernate and Spring Data JPA – Hibernate 和 Spring Data JPA 中的 N+1 问题

最后修改: 2024年 1月 29日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

Spring JPA and Hibernate provide a powerful tool for seamless database communication. However, because clients delegate more control to the frameworks, the resulting generated queries might be far from optimal.

Spring JPA 和 Hibernate 为无缝数据库通信提供了强大的工具。然而,由于客户端将更多控制权委托给了框架,因此生成的查询可能远非最佳。

In this tutorial, we’ll review a common N+1 problem while using Spring JPA and Hibernate. We’ll check different situations that might cause the problem.

在本教程中,我们将回顾使用 Spring JPA 和 Hibernate 时常见的 N+1 问题。我们将检查可能导致该问题的不同情况。

2. Social Media Platform

2.社交媒体平台

To visualize the issue better, we need to outline the relationships between entities. Let’s take a simple social network platform as an example. There’ll be only Users and Posts:

为了更好地将问题形象化,我们需要概述实体之间的关系。让我们以一个简单的社交网络平台为例。这里只有 UsersPosts

User post relationships

We’re using Iterable in the diagrams, and we’ll provide concrete implementations for each example: List or Set.

我们将在图表中使用 Iterable ,我们将为每个示例提供具体实现:ListSet.

To test the number of requests, we’ll use a dedicated library instead of checking the logs. However, we’ll refer to logs to better understand the structure of the requests.

要测试请求的数量,我们将使用 专用库 而不是检查日志。不过,我们会参考日志,以便更好地了解请求的结构。

The fetch type of relationships is assumed as default if not mentioned explicitly in each example. All to-one relationships have eager fetch and to-many – lazy. Also, the code examples use Lombok to reduce the noise in the code.

如果每个示例中没有明确提及,则假定关系的 抓取类型为默认类型。所有to-one关系都是急切加载,而to-many关系则是延迟加载。此外,代码示例使用了 Lombok 以减少代码中的噪音。

3. N+1 Problem

3. N+1 问题

The N+1 problem is the situation when, for a single request, for example, fetching Users, we make additional requests for each User to get their information. Although this problem often is connected to lazy loading, it’s not always the case.

N+1问题指的是这样一种情况:对于单个请求,例如获取Users,我们会对每个User发出额外请求,以获取他们的信息。虽然这个问题通常与懒加载有关,但并非总是如此。

We can get this issue with any type of relationship. However, it usually arises from many-to-many or one-to-many relationships.

任何类型的关系都可能出现这种问题。不过,它通常出现在 many-to-manyone-to-many 关系中。

3.1. Lazy Fetch

3.1. 懒惰撷取

First of all, let’s see how lazy loading might cause the N+1 problem. We’ll consider the following example:

首先,让我们来看看懒加载是如何导致 N+1 问题的。我们来看看下面的例子:

@Entity
public class User {
    @Id
    private Long id;
    private String username;
    private String email;
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author")
    protected List<Post> posts;
    // constructors, getters, setters, etc.
}

Users have one-to-many relationships with the Posts. This means that each User has multiple Posts. We didn’t explicitly identify the fetch strategy for the fields. The strategy is inferred from the annotations. As was mentioned previously, @OneToMany has lazy fetch by default:

UsersPosts 之间具有 one-to-many 关系。这意味着每个 User 都有多个 Posts 。我们没有明确确定字段的获取策略。策略是从注释中推断出来的。如前所述,@OneToMany 默认具有懒取回功能:

@Target({METHOD, FIELD}) 
@Retention(RUNTIME)
public @interface OneToMany {
    Class targetEntity() default void.class;
    CascadeType[] cascade() default {};
    FetchType fetch() default FetchType.LAZY;
    String mappedBy() default "";
    boolean orphanRemoval() default false;
}

If we’re trying to get all the Users, lazy fetch won’t pull more information than we accessed:

如果我们试图获取所有的 Users 用户,那么懒取法不会获取比我们访问的更多的信息:

@Test
void givenLazyListBasedUser_WhenFetchingAllUsers_ThenIssueOneRequests() {
    getUserService().findAll();
    assertSelectCount(1);
}

Thus, to get all Users, we’ll issue a single request. Let’s try to access Posts. Hibernate will issue an additional request because the information wasn’t fetched beforehand. For a single User, it means two requests overall:

因此,要获取所有用户,我们只需发出一个请求。让我们尝试访问 Posts. Hibernate 会发出一个额外的请求,因为事先并没有获取这些信息。对于单个用户来说,这意味着总共需要两次请求:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenLazyListBasedUser_WhenFetchingOneUser_ThenIssueTwoRequest(Long id) {
    getUserService().getUserByIdWithPredicate(id, user -> !user.getPosts().isEmpty());
    assertSelectCount(2);
}

The getUserByIdWithPredicate(Long, Predicate) method filters the Users, but its main goal in the tests is to trigger the loading. We’ll have 1+1 requests, but if we scale it, we’ll get the N+1 problem:

getUserByIdWithPredicate(Long, Predicate) 方法会过滤用户,但它在测试中的主要目标是触发加载。我们会有 1+1 个请求,但如果我们扩展它,就会出现 N+1 的问题:

@Test
void givenLazyListBasedUser_WhenFetchingAllUsersCheckingPosts_ThenIssueNPlusOneRequests() {
    int numberOfRequests = getUserService().countNumberOfRequestsWithFunction(users -> {
        List<List<Post>> usersWithPosts = users.stream()
          .map(User::getPosts)
          .filter(List::isEmpty)
          .toList();
        return users.size();
    });
    assertSelectCount(numberOfRequests + 1);
}

We should be careful about lazy fetch. In some cases, lazy loading makes sense to reduce the data we get from a database. However, if we’re accessing lazily-fetched information in most cases, we might increase the volume of requests. To make the best judgment, we must investigate the access patterns.

我们应该谨慎对待 “懒取回”。在某些情况下,懒加载可以减少我们从数据库中获取的数据。但是,如果我们在大多数情况下都访问懒取回的信息,我们可能会增加请求量。要做出最佳判断,我们必须调查访问模式。

3.2. Eager Fetch

3.2 急切取物

In most cases, eager loading can help us with the N+1 problem. However, the result depends on the relationships between our entities. Let’s consider a similar User class but with an explicitly set eager fetch:

在大多数情况下,急切加载可以帮助我们解决 N+1 问题。但是,结果取决于实体之间的关系。让我们考虑一个类似的 User 类,但它明确设置了急切获取:

@Entity
public class User {
    @Id
    private Long id;
    private String username;
    private String email;
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author", fetch = FetchType.EAGER)
    private List<Post> posts;
    // constructors, getters, setters, etc.
}

If we fetch a single user, the fetch type will force Hibernate to load all the data in a single request:

如果我们获取的是单个用户,那么获取类型将迫使 Hibernate 在一次请求中加载所有数据:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerListBasedUser_WhenFetchingOneUser_ThenIssueOneRequest(Long id) {
    getUserService().getUserById(id);
    assertSelectCount(1);
}

At the same time, the situation with fetching all users changes. We’ll get N+1 straight away whether we want to use the Posts or not:

与此同时,获取所有用户的情况也发生了变化。无论我们是否想使用Posts,我们都将直接获得N+1:

@Test
void givenEagerListBasedUser_WhenFetchingAllUsers_ThenIssueNPlusOneRequests() {
    List<User> users = getUserService().findAll();
    assertSelectCount(users.size() + 1);
}

Although eager fetch changed how Hibernate pulls the data, it’s hard to call it a successful optimization.

虽然急切获取改变了 Hibernate 提取数据的方式,但很难称其为成功的优化。

4. Multiple Collections

4.多种收藏

Let’s introduce Groups in our initial domain:

让我们在初始域中引入 Groups

Group user post relationshiops

The Group contains a List of Users:

Group 包含 UsersList

@Entity
public class Group {
    @Id
    private Long id;
    private String name;
    @ManyToMany
    private List<User> members;
    // constructors, getters, setters, etc.
}

4.1. Lazy Fetch

4.1 懒惰撷取

This relationship would generally behave similarly to the previous examples with lazy fetch. We’ll get a new request for each access to lazily pulled information.

这种关系的表现一般与之前使用懒获取的示例类似。每次访问懒提取的信息时,我们都会收到一个新请求。

Thus, unless we access users directly, we’ll have a single request:

因此,除非我们直接访问用户,否则我们将只有一个请求:

@Test
void givenLazyListBasedGroup_whenFetchingAllGroups_thenIssueOneRequest() {
    groupService.findAll();
    assertSelectCount( 1);
}

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenLazyListBasedGroup_whenFetchingAllGroups_thenIssueOneRequest(Long groupId) {
    Optional<Group> group = groupService.findById(groupId);
    assertThat(group).isPresent();
    assertSelectCount(1);
}

However, it would create the N+1 problem if we try to access each User in a group:

但是,如果我们试图访问组中的每个 User ,就会产生 N+1 问题:

@Test
void givenLazyListBasedGroup_whenFilteringGroups_thenIssueNPlusOneRequests() {
    int numberOfRequests = groupService.countNumberOfRequestsWithFunction(groups -> {
        groups.stream()
          .map(Group::getMembers)
          .flatMap(Collection::stream)
          .collect(Collectors.toSet());
        return groups.size();
    });
    assertSelectCount(numberOfRequests + 1);
}

The countNumberOfRequestsWithFunction(ToIntFunction) method counts the requests and also triggers lazy loading.

countNumberOfRequestsWithFunction(ToIntFunction)方法对请求进行计数,同时触发懒加载。

4.2. Eager Fetch

4.2 急切取物

Let’s check the behavior with eager fetch. While requesting a single group, we’ll get the following result:

让我们检查一下急切获取的行为。在请求一个组时,我们会得到以下结果:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerListBasedGroup_whenFetchingAllGroups_thenIssueNPlusOneRequests(Long groupId) {
    Optional<Group> group = groupService.findById(groupId);
    assertThat(group).isPresent();
    assertSelectCount(1 + group.get().getMembers().size());
}

It’s reasonable, as we need to get the information for each User eagerly. At the same time, when we get all groups, the number of requests jumps significantly:

这是合理的,因为我们需要急切地获取每个 User 的信息。同时,当我们获取所有组的信息时,请求的数量会大幅增加:

@Test
void givenEagerListBasedGroup_whenFetchingAllGroups_thenIssueNPlusMPlusOneRequests() {
    List<Group> groups = groupService.findAll();
    Set<User> users = groups.stream().map(Group::getMembers).flatMap(List::stream).collect(Collectors.toSet());
    assertSelectCount(groups.size() + users.size() + 1);
}

We need to get the information about Users, and then, for each User, we fetch their Posts. Technically, we have an N+M+1 situation. Thus, neither lazy nor eager fetch entirely resolved the problem.

我们需要获取 Users 的信息,然后针对每个User,获取他们的Posts。从技术上讲,我们会遇到 N+M+1 的情况。因此,无论是懒取还是急取,都不能完全解决问题。

4.3. Using Set

4.3.使用集合

Let’s approach this situation differently. Let’s replace Lists with Sets. We’ll be using eager fetch, as lazy Sets and List behave similarly:

让我们换一种方式来处理这种情况。让我们用 Sets 代替 Lists。我们将使用急切获取,因为懒惰的 SetsList 行为类似:

@Entity
public class Group {
    @Id
    private Long id;
    private String name;
    @ManyToMany(fetch = FetchType.EAGER)
    private Set<User> members;
    // constructors, getters, setters, etc.
}

@Entity
public class User {
    @Id
    private Long id;
    private String username;
    private String email;
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author", fetch = FetchType.EAGER)
    protected Set<Post> posts;
    // constructors, getters, setters, etc.
}
@Entity
public class Post {
    @Id
    private Long id;
    @Lob
    private String content;
    @ManyToOne
    private User author;
    // constructors, getters, setters, etc.
}

Let’s run similar tests to see if this makes any difference:

让我们进行类似的测试,看看这样做是否有什么不同:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerSetBasedGroup_whenFetchingAllGroups_thenCreateCartesianProductInOneQuery(Long groupId) {
    groupService.findById(groupId);
    assertSelectCount(1);
}

We resolved the N+1 problem while getting a single Group. Hibernate fetched Users and their Posts in one request. Also, getting all Groups has decreased the number of requests, but it’s still N+1:

我们在获取单个 Group 的同时解决了 N+1 的问题。Hibernate 在一次请求中获取了 Users 和他们的 Posts 。此外,获取所有 Groups 也减少了请求次数,但仍然是 N+1:

@Test
void givenEagerSetBasedGroup_whenFetchingAllGroups_thenIssueNPlusOneRequests() {
    List<Group> groups = groupService.findAll();
    assertSelectCount(groups.size() + 1);
}

Although we partially solved the problem, we created another one. Hibernates uses several JOINs, creating the Cartesian product:

虽然我们部分解决了这个问题,但我们又产生了一个问题。Hibernates 使用了多个 JOIN,创建了笛卡尔乘积:

SELECT g.id, g.name, gm.interest_group_id,
       u.id, u.username, u.email,
       p.id, p.author_id, p.content
FROM group g
         LEFT JOIN (group_members gm JOIN user u ON u.id = gm.members_id)
                   ON g.id = gm.interest_group_id
         LEFT JOIN post p ON u.id = p.author_id
WHERE g.id = ?

The query might become overly complex and, with many dependencies between objects, pull a huge chunk of a database.

查询可能会变得过于复杂,而且由于对象之间存在许多依赖关系,会占用数据库的大量空间。

Due to the nature of Sets, Hibernate can ensure that all the duplicates in the result set are from the Cartesian product. This is not possible with lists, so data should be fetched in separate requests to maintain its integrity when using lists.

由于集合的性质,Hibernate 可以确保结果集中的所有重复数据都来自笛卡尔乘积。这在列表中是不可能实现的,因此在使用列表时,应通过单独的请求获取数据以保持其完整性。

Most relationships align with the Set invariants. It makes little sense to allow Users to have several identical Posts. At the same time, we can provide a fetch mode explicitly instead of relying on default behavior.

大多数关系与集合不变式一致。允许 Users 拥有多个相同的 Posts 是毫无意义的。同时,我们可以明确提供 撷取模式,而不是依赖默认行为。

5. Tradeoffs

5. 权衡

Picking a fetch type might help reduce the number of requests in simple cases. However, using simple annotations, we have limited control over query generation. Also, it’s done transparently, and small changes in the domain model might create a dramatic impact.

在简单的情况下,选择一种获取类型可能有助于减少请求的数量。不过,使用简单的注释,我们对查询生成的控制有限。而且,查询是透明生成的,领域模型中的微小变化都可能产生巨大影响。

The best way to address the issue is to observe the system’s behavior and identify the access patterns. Creating separate methods, SQL and JPQL queries can help tailor them for each case. Also, we can use fetch mode to hint Hibernate about how we load related entities.

解决这一问题的最佳方法是观察系统行为并识别访问模式。创建单独的方法、SQL 和 JPQL 查询有助于针对每种情况进行定制。此外,我们还可以使用获取模式来提示 Hibernate 如何加载相关实体。

Adding simple tests can help with unintended changes in the model. This way, we can ensure that new relationships won’t create the Cartesian product or N+1 problem.

添加简单的测试有助于解决模型中的意外变化。这样,我们就能确保新的关系不会产生笛卡尔乘积或 N+1 问题。

6. Conclusion

6.结论</b

While eager fetch type can mitigate some simple issues with additional queries, it might cause other issues. It’s necessary to test the application to ensure its performance.

虽然急切获取类型可以通过额外查询缓解一些简单的问题,但它可能会导致其他问题。有必要对应用程序进行测试,以确保其性能。

Different combinations of fetch types and relationships can often produce an unexpected result. That’s why it’s better to cover crucial parts with tests.

获取类型和关系的不同组合往往会产生意想不到的结果。因此,最好通过测试来覆盖关键部分。

As usual, all the code from this tutorial is available over on GitHub.

与往常一样,本教程中的所有代码都可以在 GitHub 上获取