Skip Select Before Insert in Spring Data JPA

Last modified: March 4, 2024

1. Overview

In some cases, when we’re saving an entity using a Spring Data JPA Repository, we may encounter an additional SELECT in the logs. This may cause performance issues due to numerous extra calls.

In this tutorial, we’ll explore a few ways to avoid this extra SELECT and improve performance.

2. Setup

Before diving into Spring Data JPA and testing it, there are a few preparatory steps we need to take.

2.1. Dependencies

To create our test repositories, we’ll use the Spring Data JPA dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>

As a test database, we’ll use the H2 Database. Let’s add its dependency:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
</dependency>

In our integration tests, we’ll use a test Spring context. Let’s add the spring-boot-starter-test dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

2.2. Configuration

Here is the JPA configuration we’ll use in our example:

spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.H2Dialect
spring.jpa.hibernate.ddl-auto=create-drop
spring.jpa.show-sql=true

With this configuration, we let Hibernate generate the schema and log every SQL statement it executes.

3. The Reason for the SELECT Query

Let’s see why a simple repository produces these extra SELECT queries.

First things first, let’s create an entity:

@Entity
public class Task {

    @Id
    private Integer id;
    private String description;

    //getters and setters
}

Now, let’s create a repository for this entity:

@Repository
public interface TaskRepository extends JpaRepository<Task, Integer> {
}

Now, let’s save a new Task specifying the ID:

@Autowired
private TaskRepository taskRepository;

@Test
void givenRepository_whenSaveNewTaskWithPopulatedId_thenExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(1);
    taskRepository.saveAndFlush(task);
}

When we call saveAndFlush() on our repository (save() behaves the same way), Spring Data JPA internally runs this code:

public<S extends T> S save(S entity){
    if(isNew(entity)){
        entityManager.persist(entity);
        return entity;
    } else {
        return entityManager.merge(entity);
    }
}

So, if our entity is considered not new, the repository calls the entity manager’s merge() method. Inside merge(), the JPA provider checks whether the entity is present in the cache or the persistence context. Since our object has just been created, it isn’t found there, so the provider finally tries to load it from the data source.

This is the point where the SELECT query shows up in the logs. Since there’s no such row in the database, the INSERT query is issued after that:

Hibernate: select task0_.id as id1_1_0_, task0_.description as descript2_1_0_ from task task0_ where task0_.id=?
Hibernate: insert into task (id, description) values (default, ?)
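
The tests in this article inspect the log by eye. As a minimal sketch, assuming the log lines have already been captured as strings (how they’re captured is left out here), a small helper could count the statements of each kind. SqlLogSketch and count() are hypothetical names used only for illustration:

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: counts captured "Hibernate: ..." log lines by statement keyword.
public class SqlLogSketch {

    // Returns how many lines hold a statement starting with the given keyword,
    // for example "select" or "insert". Matching is case-insensitive.
    public static long count(List<String> logLines, String keyword) {
        String prefix = "hibernate: " + keyword.toLowerCase(Locale.ROOT) + " ";
        return logLines.stream()
            .map(line -> line.trim().toLowerCase(Locale.ROOT))
            .filter(line -> line.startsWith(prefix))
            .count();
    }
}
```

Applied to the two log lines above, count(lines, "select") and count(lines, "insert") both return 1.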

In the isNew() method implementation, we can find the following code:

public boolean isNew(T entity) {
    ID id = this.getId(entity);
    return id == null;
}

If we specify the ID on the application side, the ID isn’t null, so our entity isn’t considered new. In that case, save() goes through merge(), and an extra SELECT query is sent to the database.
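
To make the decision flow concrete, here is a minimal plain-Java model of it, with no Spring or Hibernate involved. SaveFlowSketch and its save() method are hypothetical names; the in-memory map stands in for the task table, and the returned list records roughly which statements a real provider would issue:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the save() flow described above.
// It is not Spring Data JPA code; it only mirrors the isNew() -> persist()/merge() decision.
public class SaveFlowSketch {

    // Stands in for the task table: id -> description
    private final Map<Integer, String> table = new HashMap<>();
    private int nextId = 1;

    // Returns the statements this model "sends" for one save() call.
    public List<String> save(Integer id, String description) {
        List<String> statements = new ArrayList<>();
        if (id == null) {
            // isNew() == true: persist() path, a single INSERT with a generated id
            while (table.containsKey(nextId)) {
                nextId++;
            }
            table.put(nextId, description);
            statements.add("INSERT");
        } else {
            // isNew() == false: merge() path, the row is looked up first
            statements.add("SELECT");
            boolean exists = table.containsKey(id);
            table.put(id, description);
            statements.add(exists ? "UPDATE" : "INSERT");
        }
        return statements;
    }
}
```

On an empty model, save(1, "a") yields SELECT followed by INSERT, which matches the log above, while save(null, "c") yields only INSERT.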

4. Use @GeneratedValue

One possible solution is to not specify the ID on the application side. We can use the @GeneratedValue annotation and specify a strategy that’ll be used to generate the ID on the database side.

Let’s specify the generation strategy for our TaskWithGeneratedId ID:

@Entity
public class TaskWithGeneratedId {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;
}

Then, we save an instance of the TaskWithGeneratedId entity, but now we don’t set the ID:

@Autowired
private TaskWithGeneratedIdRepository taskWithGeneratedIdRepository;

@Test
void givenRepository_whenSaveNewTaskWithGeneratedId_thenNoExtraSelectIsExpected() {
    TaskWithGeneratedId task = new TaskWithGeneratedId();
    TaskWithGeneratedId saved = taskWithGeneratedIdRepository.saveAndFlush(task);
    assertNotNull(saved.getId());
}

As we can see, there are no SELECT queries in the logs, and a new ID was generated for the entity.

5. Implement Persistable

Another option we have is to implement the Persistable interface in our entity:

@Entity
public class PersistableTask implements Persistable<Integer> {
    @Id
    private int id;

    @Transient
    private boolean isNew = true;

    @Override
    public Integer getId() {
        return id;
    }

    @Override
    public boolean isNew() {
        return isNew;
    }
    
    //getters and setters
}

Here we’ve added a new field, isNew, and annotated it with @Transient so that no column is created for it in the database. With the overridden isNew() method, our entity can be considered new even though it has an ID specified.

Now, under the hood, Spring Data JPA uses different logic to decide whether an entity is new:

public class JpaPersistableEntityInformation {
    public boolean isNew(T entity) {
        return entity.isNew();
    }
}

Let’s save our PersistableTask using the PersistableTaskRepository:

@Autowired
private PersistableTaskRepository persistableTaskRepository;

@Test
void givenRepository_whenSaveNewPersistableTask_thenNoExtraSelectIsExpected() {
    PersistableTask persistableTask = new PersistableTask();
    persistableTask.setId(2);
    persistableTask.setNew(true);
    PersistableTask saved = persistableTaskRepository.saveAndFlush(persistableTask);
    assertEquals(2, saved.getId());
}

As we can see, we’ll have only the INSERT log message and the entity contains the ID we specified.
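
One caveat is that an entity whose isNew() always returns true would also take the persist() path when it’s loaded and saved again. A common way to handle this is to flip the flag once the entity has been inserted or loaded, typically from a method annotated with the JPA @PostPersist and @PostLoad callbacks. The following is a sketch in plain Java so that it runs without a JPA provider; the class name NewFlagSketch and the method markNotNew() are assumptions, and the annotations appear only as comments:

```java
// Hypothetical plain-Java sketch of the "new" flag lifecycle; not a real entity.
public class NewFlagSketch {

    private final int id;

    // In a real entity this field would be @Transient
    private boolean isNew = true;

    public NewFlagSketch(int id) {
        this.id = id;
    }

    public Integer getId() {
        return id;
    }

    public boolean isNew() {
        return isNew;
    }

    // In a real entity this method would carry @PostPersist and @PostLoad,
    // so the provider would call it after an INSERT or after loading the row.
    public void markNotNew() {
        this.isNew = false;
    }
}
```

With this in place, the first save goes through persist(), and later saves of the same instance go through merge().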

If we try to save two new entities with the same ID, we encounter an exception:

@Test
void givenRepository_whenSaveNewPersistableTasksWithSameId_thenExceptionIsExpected() {
    PersistableTask persistableTask = new PersistableTask();
    persistableTask.setId(3);
    persistableTask.setNew(true);
    persistableTaskRepository.saveAndFlush(persistableTask);

    PersistableTask duplicateTask = new PersistableTask();
    duplicateTask.setId(3);
    duplicateTask.setNew(true);

    assertThrows(DataIntegrityViolationException.class,
      () -> persistableTaskRepository.saveAndFlush(duplicateTask));
}

So, if we take responsibility for generating the IDs, we should also ensure their uniqueness.
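
Within a single JVM, one simple way to do this is a shared counter. This is a minimal sketch, assuming a hypothetical TaskIdGenerator class; it doesn’t coordinate across multiple application instances, where a database sequence or UUIDs would be a better fit:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out distinct Integer IDs within one JVM.
public class TaskIdGenerator {

    private final AtomicInteger counter;

    // Start after the highest ID that is already in use
    public TaskIdGenerator(int lastUsedId) {
        this.counter = new AtomicInteger(lastUsedId);
    }

    // Thread-safe: concurrent callers never receive the same value
    public Integer nextId() {
        return counter.incrementAndGet();
    }
}
```

A generator created with new TaskIdGenerator(3) returns 4 on the first call to nextId() and 5 on the second.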

6. Use persist() Method Directly

As we saw in the previous examples, everything we did was aimed at reaching the persist() method. We can also create an extension for our repository that allows us to call this method directly.

Let’s create an interface with a persistAndFlush() method:

public interface TaskRepositoryExtension {
    Task persistAndFlush(Task task);
}

Then, let’s make an implementation bean of this interface:

@Component
public class TaskRepositoryExtensionImpl implements TaskRepositoryExtension {
    @PersistenceContext
    private EntityManager entityManager;

    @Override
    public Task persistAndFlush(Task task) {
        entityManager.persist(task);
        entityManager.flush();
        return task;
    }
}

Now, we extend our TaskRepository using a new interface:

@Repository
public interface TaskRepository extends JpaRepository<Task, Integer>, TaskRepositoryExtension {
}

Let’s call our custom persistAndFlush() method to save the Task instance:

@Test
void givenRepository_whenPersistNewTaskUsingCustomPersistMethod_thenNoExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(4);
    Task saved = taskRepository.persistAndFlush(task);

    assertEquals(4, saved.getId());
}

We can see the log message with an INSERT call and no extra SELECT calls.

7. Use BaseJpaRepository From Hypersistence Utils

The idea from the previous section is already implemented in the Hypersistence Utils project. This project provides a BaseJpaRepository that includes a persistAndFlush() method as well as its batch counterpart.

To use it, we have to add another dependency. We should choose the correct Maven artifact based on our Hibernate version:

<dependency>
    <groupId>io.hypersistence</groupId>
    <artifactId>hypersistence-utils-hibernate-55</artifactId>
</dependency>

Let’s implement another repository that extends both BaseJpaRepository from Hypersistence Utils and JpaRepository from Spring Data JPA:

@Repository
public interface TaskJpaRepository extends JpaRepository<Task, Integer>, BaseJpaRepository<Task, Integer> {
}

Also, we have to enable the implementation of BaseJpaRepository using the @EnableJpaRepositories annotation:

@EnableJpaRepositories(
    repositoryBaseClass = BaseJpaRepositoryImpl.class
)

Now, let’s save our Task using our new repository:

@Autowired
private TaskJpaRepository taskJpaRepository;

@Test
void givenRepository_whenPersistNewTaskUsingPersist_thenNoExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(5);
    Task saved = taskJpaRepository.persistAndFlush(task);

    assertEquals(5, saved.getId());
}

We have our Task saved and there are no SELECT queries in the log.

As in all the examples where we specify the ID on the application side, unique constraint violations are possible:

@Test
void givenRepository_whenPersistTaskWithTheSameId_thenExceptionIsExpected() {
    Task task = new Task();
    task.setId(5);
    taskJpaRepository.persistAndFlush(task);

    Task secondTask = new Task();
    secondTask.setId(5);

    assertThrows(DataIntegrityViolationException.class,
      () ->  taskJpaRepository.persistAndFlush(secondTask));
}

8. Use @Query Annotated Method

We can also avoid extra calls by using a modifying native query directly. Let’s specify such a method in our TaskRepository:

@Repository
public interface TaskRepository extends JpaRepository<Task, Integer> {

    @Modifying
    @Query(value = "insert into task(id, description) values(:#{#task.id}, :#{#task.description})", 
      nativeQuery = true)
    void insert(@Param("task") Task task);
}

This method executes the INSERT query directly, without going through the persistence context. The ID is taken from the Task object passed as the method parameter.

Now let’s save our Task using this method:

@Test
void givenRepository_whenPersistNewTaskUsingNativeQuery_thenNoExtraSelectIsExpected() {
    Task task = new Task();
    task.setId(6);
    taskRepository.insert(task);

    assertTrue(taskRepository.findById(6).isPresent());
}

The entity was saved with the specified ID, and no extra SELECT query was issued before the INSERT. We should keep in mind that this method bypasses the JPA persistence context and the Hibernate cache.

9. Conclusion

When we generate IDs on the application side with Spring Data JPA, we may see additional SELECT queries in the logs, which can degrade performance. In this article, we’ve discussed various strategies to address this issue.

In some cases, it makes sense to move this logic to the database side or fine-tune the persistence logic according to our needs. We should take into account the pros, cons, and potential issues of each strategy before making a decision.

As usual, the full source code can be found over on GitHub.
