Spring Data JPA Batch Inserts

Last modified: March 5, 2019

1. Overview

Going out to the database is expensive. We may be able to improve performance and consistency by batching multiple inserts into one.

In this tutorial, we’ll look at how to do this with Spring Data JPA.

2. Spring JPA Repository

First, we’ll need a simple entity. Let’s call it Customer:

@Entity
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private String firstName;
    private String lastName;

    // constructor, getters, setters 
}

And then, we need our repository:

public interface CustomerRepository extends CrudRepository<Customer, Long> {
}

This exposes a saveAll method for us, which will batch several inserts into one.
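
To make "batching several inserts into one" concrete, here is a minimal plain-JDBC sketch of the idea. It is not what Spring Data JPA runs internally, and the table and column names are assumptions chosen to mirror the Customer entity:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Illustrative only: plain JDBC, not the code Spring Data JPA executes.
final class JdbcBatchSketch {

    // Hypothetical table and column names, mirroring the Customer entity.
    static final String INSERT_SQL =
        "INSERT INTO customer (first_name, last_name) VALUES (?, ?)";

    // Queues one insert per name pair, then submits them together.
    // Each element of names is {firstName, lastName}.
    static int[] insertAll(Connection connection, List<String[]> names)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            for (String[] name : names) {
                statement.setString(1, name[0]);
                statement.setString(2, name[1]);
                statement.addBatch();        // queue the row, nothing is sent yet
            }
            return statement.executeBatch(); // a single call for all queued rows
        }
    }
}
```

The point of the sketch is the shape of the calls: many addBatch() calls followed by one executeBatch(), instead of one execute call per row.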

So, let’s leverage that in a controller:

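import java.net.URI;
import java.util.Arrays;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CustomerController {

    @Autowired
    CustomerRepository customerRepository;

    @PostMapping("/customers")
    public ResponseEntity<String> insertCustomers() {
        Customer c1 = new Customer("James", "Gosling");
        Customer c2 = new Customer("Doug", "Lea");
        Customer c3 = new Customer("Martin", "Fowler");
        Customer c4 = new Customer("Brian", "Goetz");
        List<Customer> customers = Arrays.asList(c1, c2, c3, c4);
        // Persist the whole list with a single repository call
        customerRepository.saveAll(customers);
        // created() takes a URI and returns a builder, so we finish with build()
        return ResponseEntity.created(URI.create("/customers")).build();
    }

    // ... @GetMapping to read customers
}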

3. Testing Our Endpoint

Testing our code is simple with MockMvc:

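import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

@Autowired
private MockMvc mockMvc;

@Test
public void whenInsertingCustomers_thenCustomersAreCreated() throws Exception {
    // POST to the endpoint and expect a 201 Created response
    this.mockMvc.perform(post("/customers"))
      .andExpect(status().isCreated());
}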

4. Are We Sure We’re Batching?

Actually, there is just a bit more configuration to do. Let’s run a quick demo to illustrate the difference.

First, let’s add the following property to application.properties to see some statistics:

spring.jpa.properties.hibernate.generate_statistics=true

At this point, if we run the test, we’ll see stats like the following:

11232586 nanoseconds spent preparing 4 JDBC statements;
4076610 nanoseconds spent executing 4 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;

So, we created four customers, which is great, but note that none of them were inside a batch.
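
If we want to check this without reading the log by eye, a small helper can pull the batch count out of the statistics text. This is only a sketch: the class and method names are hypothetical, and the pattern assumes the log line keeps the exact wording shown above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hibernate or Spring.
final class BatchStats {

    // Assumes the wording of the statistics line shown above,
    // e.g. "0 nanoseconds spent executing 0 JDBC batches;"
    private static final Pattern BATCHES =
        Pattern.compile("spent executing (\\d+) JDBC batches");

    // Returns the number of executed JDBC batches, or -1 if no such line is found.
    static long executedBatches(String statisticsLog) {
        Matcher matcher = BATCHES.matcher(statisticsLog);
        return matcher.find() ? Long.parseLong(matcher.group(1)) : -1;
    }
}
```

A result of 0 for the output above confirms that every insert ran as an individual statement.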

The reason is that batching isn’t switched on by default in some cases.

In our case, it’s because we are using id auto-generation. So, by default, saveAll does each insert separately.

So, let’s switch it on:

spring.jpa.properties.hibernate.jdbc.batch_size=4
spring.jpa.properties.hibernate.order_inserts=true

The first property tells Hibernate to collect inserts in batches of four. The order_inserts property tells Hibernate to take the time to group inserts by entity, creating larger batches.
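
To see why both properties matter, consider a simplified model of batching. It is not Hibernate's actual implementation: it assumes a batch can only hold consecutive inserts for the same entity type, up to the batch size, and it sorts type names alphabetically as a stand-in for Hibernate's ordering. The Order entity type used in the usage note is hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A simplified model of insert batching, not Hibernate's real algorithm.
final class BatchModel {

    // Counts the batches needed when each batch holds at most batchSize
    // consecutive inserts for the same entity type.
    static int countBatches(List<String> entityTypes, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int batches = 0;
        int currentSize = 0;
        String currentType = null;
        for (String type : entityTypes) {
            // A new batch starts when the type changes or the batch is full
            if (!type.equals(currentType) || currentSize == batchSize) {
                batches++;
                currentSize = 0;
                currentType = type;
            }
            currentSize++;
        }
        return batches;
    }

    // Models order_inserts=true by grouping pending inserts by entity type.
    static List<String> orderByType(List<String> entityTypes) {
        List<String> ordered = new ArrayList<>(entityTypes);
        ordered.sort(Comparator.naturalOrder());
        return ordered;
    }
}
```

In this model, four Customer inserts with a batch size of 4 fit into one batch. An interleaved list of Customer, Order, Customer, Order needs four batches as-is, but only two after orderByType groups the types together.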

So, the second time we run our test, we’ll see the inserts were batched:

16577314 nanoseconds spent preparing 4 JDBC statements;
2207548 nanoseconds spent executing 4 JDBC statements;
2003005 nanoseconds spent executing 1 JDBC batches;

We can apply the same approach to deletes and updates (remembering that Hibernate also has an order_updates property).

5. Conclusion

With the ability to batch inserts, we can see some performance gains.

We, of course, need to be aware that batching is automatically disabled in some cases, and we should check and plan for this before we ship.

Make sure to check out all these code snippets over on GitHub.
