Batch Insert/Update with Hibernate/JPA – 用Hibernate/JPA进行批量插入/更新

最后修改: 2019年 5月 4日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

In this tutorial, we’ll learn how we can batch insert and update entities using Hibernate/JPA.

在本教程中,我们将学习如何使用Hibernate/JPA>批量插入和更新实体。

Batching allows us to send a group of SQL statements to the database in a single network call. This way, we can optimize the network and memory usage of our application.

批量允许我们在一次网络调用中向数据库发送一组SQL语句。这样,我们可以优化我们应用程序的网络和内存使用。

2. Setup

2.设置

2.1. Sample Data Model

2.1.数据模型样本

Let’s look at the sample data model that we’ll use in the examples.

让我们看一下我们将在例子中使用的数据模型样本。

First, we’ll create a School entity:

首先,我们将创建一个School实体。

@Entity
public class School {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private long id;

    private String name;

    @OneToMany(mappedBy = "school")
    private List<Student> students;

    // Getters and setters...
}

Each School will have zero or more Students:

每个学校将有零或更多的学生

@Entity
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private long id;

    private String name;

    @ManyToOne
    private School school;

    // Getters and setters...
}

2.2. Tracing SQL Queries

2.2.追踪SQL查询

When running our examples, we’ll need to verify that insert/update statements are indeed sent in batches. Unfortunately, we can’t tell from Hibernate log statements whether the SQL statements are batched or not. As a result, we’ll use a data source proxy to trace Hibernate/JPA SQL statements:

在运行我们的例子时,我们需要验证插入/更新语句是否确实是分批发送的。不幸的是,我们无法从Hibernate日志语句中看出SQL语句是否为分批发送。因此,我们将使用数据源代理来追踪Hibernate/JPA的SQL语句。

private static class ProxyDataSourceInterceptor implements MethodInterceptor {
    private final DataSource dataSource;
    public ProxyDataSourceInterceptor(final DataSource dataSource) {
        this.dataSource = ProxyDataSourceBuilder.create(dataSource)
            .name("Batch-Insert-Logger")
            .asJson().countQuery().logQueryToSysOut().build();
    }
    
    // Other methods...
}

3. Default Behaviour

3.默认行为

Hibernate doesn’t enable batching by default. This means that it’ll send a separate SQL statement for each insert/update operation:

Hibernate默认不启用批处理功能。这意味着它将为每个插入/更新操作发送一个单独的SQL语句。

@Transactional
@Test
public void whenNotConfigured_ThenSendsInsertsSeparately() {
    for (int i = 0; i < 10; i++) {
        School school = createSchool(i);
        entityManager.persist(school);
    }
    entityManager.flush();
}

Here we persisted 10 School entities. If we look at the query logs, we can see that Hibernate sends each insert statement separately:

这里我们持久化了10个School实体。如果我们看一下查询日志,我们可以看到Hibernate单独发送了每个插入语句。

"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School1","1"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School2","2"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School3","3"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School4","4"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School5","5"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School6","6"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School7","7"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School8","8"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School9","9"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School10","10"]]

Therefore, we should configure Hibernate to enable batching. For this purpose, we should set the hibernate.jdbc.batch_size property to a number bigger than 0.

因此,我们应该配置Hibernate以启用批处理。为此,我们应该将hibernate.jdbc.batch_size属性设置为一个大于0的数字

If we’re creating the EntityManager manually, we should add hibernate.jdbc.batch_size to the Hibernate properties:

如果我们手动创建EntityManager,我们应该将hibernate.jdbc.batch_size添加到Hibernate属性中。

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.jdbc.batch_size", "5");
    
    // Other properties...
    return properties;
}

If we’re using Spring Boot, we can define it as an application property:

如果我们使用Spring Boot,我们可以把它定义为一个应用程序的属性。

spring.jpa.properties.hibernate.jdbc.batch_size=5

4. Batch Insert for Single Table

4.单个表的批量插入

4.1. Batch Insert Without Explicit Flush

4.1.批量插入,没有明确的冲洗

Let’s first look at how we can use batch inserts when we’re dealing with only one entity type.

让我们首先看看当我们只处理一种实体类型时,我们如何使用批量插入。

We’ll use the previous code sample, but this time batching is enabled:

我们将使用之前的代码样本,但这次启用了批处理。

@Transactional
@Test
public void whenInsertingSingleTypeOfEntity_thenCreatesSingleBatch() {
    for (int i = 0; i < 10; i++) {
        School school = createSchool(i);
        entityManager.persist(school);
    }
}

Here we persisted 10 School entities. When we look at the logs, we can verify that Hibernate sends insert statements in batches:

这里我们持久化了10个学校实体。当我们看日志的时候,我们可以验证Hibernate是分批发送插入语句的。

"batch":true, "querySize":1, "batchSize":5, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School1","1"],["School2","2"],["School3","3"],["School4","4"],["School5","5"]]
"batch":true, "querySize":1, "batchSize":5, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School6","6"],["School7","7"],["School8","8"],["School9","9"],["School10","10"]]

One important thing to mention here is the memory consumption. When we persist an entity, Hibernate stores it in the persistence context. For example, if we persist 100,000 entities in one transaction, we’ll end up having 100,000 entity instances in memory, possibly causing an OutOfMemoryException.

这里要提到的一件重要事情是内存消耗。当我们持久化一个实体时,Hibernate将其存储在持久化上下文中。例如,如果我们在一个事务中持久化100,000个实体,我们最终会在内存中有100,000个实体实例,可能会导致OutOfMemoryException

4.2. Batch Insert With Explicit Flush

4.2.批量插入与明确的冲洗

Now we’ll look at how we can optimize memory usage during batching operations. Let’s dig deep into the persistence context’s role.

现在我们来看看如何在批处理操作中优化内存使用。让我们深入挖掘一下持久化上下文的作用。

First of all, the persistence context stores newly created and modified entities in memory. Hibernate sends these changes to the database when the transaction is synchronized. This generally happens at the end of a transaction. However, calling EntityManager.flush() also triggers a transaction synchronization.

首先,持久化上下文在内存中存储新创建和修改的实体。Hibernate在事务同步时将这些变化发送到数据库中。这一般发生在事务结束的时候。然而,调用EntityManager.flush() 也会触发事务同步

Secondly, the persistence context serves as an entity cache, also referred to as the first level cache. To clear entities in the persistence context, we can call EntityManager.clear().

其次,持久化上下文作为一个实体缓存,也被称为第一级缓存。为了清除持久化上下文中的实体,我们可以调用EntityManager.clear()

So to reduce the memory load during batching, we can call EntityManager.flush() and EntityManager.clear() on our application code whenever batch size is reached:

因此,为了减少批处理过程中的内存负载,我们可以在每当达到批处理规模时,在我们的应用程序代码中调用EntityManager.flush()EntityManager.clear()

@Transactional
@Test
public void whenFlushingAfterBatch_ThenClearsMemory() {
    for (int i = 0; i < 10; i++) {
        if (i > 0 && i % BATCH_SIZE == 0) {
            entityManager.flush();
            entityManager.clear();
        }
        School school = createSchool(i);
        entityManager.persist(school);
    }
}

Here we’re flushing the entities in the persistence context, thus making Hibernate send queries to the database. Furthermore, by clearing the persistence context, we’re removing the School entities from memory. Batching behavior will remain the same.

这里我们正在冲洗持久化上下文中的实体,从而使Hibernate向数据库发送查询。此外,通过清除持久化上下文,我们将从内存中删除School实体。批处理行为将保持不变。

5. Batch Insert for Multiple Tables

5.多个表的批量插入

Now let’s see how we can configure batch inserts when dealing with multiple entity types in one transaction.

现在让我们看看在一个事务中处理多个实体类型时,我们如何配置批量插入。

When we want to persist the entities of several types, Hibernate creates a different batch for each entity type. This is because there can be only one type of entity in a single batch.

当我们想持久化几种类型的实体时,Hibernate为每种实体类型创建不同的批处理。这是因为一个批次中只能有一种实体类型

Additionally, as Hibernate collects insert statements, it creates a new batch whenever it encounters an entity type different from the one in the current batch. This is the case even though there’s already a batch for that entity type:

此外,当Hibernate收集插入语句时,只要遇到与当前批次中的实体类型不同的实体,它就会创建一个新的批次。这种情况下,即使已经有一个批处理该实体类型。

@Transactional
@Test
public void whenThereAreMultipleEntities_ThenCreatesNewBatch() {
    for (int i = 0; i < 10; i++) {
        if (i > 0 && i % BATCH_SIZE == 0) {
            entityManager.flush();
            entityManager.clear();
        }
        School school = createSchool(i);
        entityManager.persist(school);
        Student firstStudent = createStudent(school);
        Student secondStudent = createStudent(school);
        entityManager.persist(firstStudent);
        entityManager.persist(secondStudent);
    }
}

Here we’re inserting a School, assigning it two Students, and repeating this process 10 times.

这里我们插入一个学校,分配给它两个学生,并重复这个过程10次。

In the logs, we see that Hibernate sends School insert statements in several batches of size 1, while we were expecting only 2 batches of size 5. Moreover, Student insert statements are also sent in several batches of size 2, instead of 4 batches of size 5:

在日志中,我们看到Hibernate分几批发送School插入语句,大小为1,而我们预期只有2批大小为5。此外,Student插入语句也是分几批发送的,大小为2,而不是4批,大小为5。

"batch":true, "querySize":1, "batchSize":1, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School1","1"]]
"batch":true, "querySize":1, "batchSize":2, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School1","1","2"],["Student-School1","1","3"]]
"batch":true, "querySize":1, "batchSize":1, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School2","4"]]
"batch":true, "querySize":1, "batchSize":2, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School2","4","5"],["Student-School2","4","6"]]
"batch":true, "querySize":1, "batchSize":1, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School3","7"]]
"batch":true, "querySize":1, "batchSize":2, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School3","7","8"],["Student-School3","7","9"]]
Other log lines...

To batch all insert statements of the same entity type, we should configure the hibernate.order_inserts property.

为了批量处理同一实体类型的所有插入语句,我们应该配置hibernate.order_inserts属性

We can configure the Hibernate property manually using EntityManagerFactory:

我们可以使用EntityManagerFactory手动配置Hibernate属性。

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.order_inserts", "true");
    
    // Other properties...
    return properties;
}

If we’re using Spring Boot, we can configure the property in application.properties:

如果我们使用的是Spring Boot,我们可以在application.properties中配置这个属性。

spring.jpa.properties.hibernate.order_inserts=true

After adding this property, we’ll have 1 batch for School inserts and 2 batches for Student inserts:

添加这个属性后,我们将有1批学校插入,2批学生插入。

"batch":true, "querySize":1, "batchSize":5, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School6","16"],["School7","19"],["School8","22"],["School9","25"],["School10","28"]]
"batch":true, "querySize":1, "batchSize":5, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School6","16","17"],["Student-School6","16","18"],
  ["Student-School7","19","20"],["Student-School7","19","21"],["Student-School8","22","23"]]
"batch":true, "querySize":1, "batchSize":5, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School8","22","24"],["Student-School9","25","26"],
  ["Student-School9","25","27"],["Student-School10","28","29"],["Student-School10","28","30"]]

6. Batch Update

6.批量更新

Now let’s move on to batch updates. Similar to batch inserts, we can group several update statements and send them to the database in one go.

现在让我们继续讨论批量更新。与批量插入类似,我们可以将几个更新语句分组,并一次性将它们发送到数据库。

To enable this, we’ll configure the hibernate.order_updates and hibernate.batch_versioned_data properties.

为了启用这一点,我们将配置hibernate.order_updateshibernate.batch_versioned_data属性

If we’re creating our EntityManagerFactory manually, we can set the properties programmatically:

如果我们手动创建我们的EntityManagerFactory,我们可以以编程方式设置这些属性。

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.order_updates", "true");
    properties.put("hibernate.batch_versioned_data", "true");
    
    // Other properties...
    return properties;
}

And if we’re using Spring Boot, we’ll just add them to application.properties:

如果我们使用的是Spring Boot,我们就把它们添加到application.properties中。

spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.batch_versioned_data=true

After configuring these properties, Hibernate should group update statements in batches:

在配置了这些属性后,Hibernate应该将更新语句分批分组。

@Transactional
@Test
public void whenUpdatingEntities_thenCreatesBatch() {
    TypedQuery<School> schoolQuery = 
      entityManager.createQuery("SELECT s from School s", School.class);
    List<School> allSchools = schoolQuery.getResultList();
    for (School school : allSchools) {
        school.setName("Updated_" + school.getName());
    }
}

Here we’ve updated the school entities, and Hibernate sends the SQL statements in 2 batches of size 5:

这里我们更新了学校实体,Hibernate分两批发送SQL语句,大小为5条。

"batch":true, "querySize":1, "batchSize":5, "query":["update school set name=? where id=?"], 
  "params":[["Updated_School1","1"],["Updated_School2","2"],["Updated_School3","3"],
  ["Updated_School4","4"],["Updated_School5","5"]]
"batch":true, "querySize":1, "batchSize":5, "query":["update school set name=? where id=?"], 
  "params":[["Updated_School6","6"],["Updated_School7","7"],["Updated_School8","8"],
  ["Updated_School9","9"],["Updated_School10","10"]]

7. @Id Generation Strategy

7.@Id生成策略

When we want to use batching for inserts, we should be aware of the primary key generation strategy. If our entities use the GenerationType.IDENTITY identifier generator, Hibernate will silently disable batch inserts.

当我们想使用批处理插入时,我们应该注意主键的生成策略。如果我们的实体使用GenerationType.IDENTITY标识符生成器,Hibernate将默默地禁用批量插入

Since entities in our examples use the GenerationType.SEQUENCE identifier generator, Hibernate enables batch operations:

由于我们的例子中的实体使用了GenerationType.SEQUENCE标识符生成器,所以Hibernate可以进行批量操作。

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private long id;

8. Summary

8.摘要

In this article, we looked at batch inserts and updates using Hibernate/JPA.

在这篇文章中,我们研究了使用Hibernate/JPA进行批量插入和更新。

Check out the code samples for this article over on Github.

请查看本文在Github上的代码样本