1. Overview
In this tutorial, we’ll learn how to insert a large amount of data into our target RDBMS efficiently using Spring JDBC batch support, and we’ll compare the performance of a batch insert against multiple single inserts.
2. Understanding Batch Processing
Once our application establishes a connection to a database, we can execute multiple SQL statements in one go instead of sending each statement one by one. Thus, we significantly decrease the communication overhead.
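For reference, this is what batching looks like with the plain JDBC API that Spring builds upon; a minimal sketch, assuming an open Connection, a list of names to insert, and a hypothetical ITEM table:
// a minimal plain-JDBC sketch; connection, names, and the ITEM table are assumptions for illustration
try (PreparedStatement ps = connection.prepareStatement("INSERT INTO ITEM (NAME) VALUES (?)")) {
    for (String name : names) {
        ps.setString(1, name);
        ps.addBatch();    // queue the statement on the client side
    }
    ps.executeBatch();    // send all queued statements to the server together
}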
One option to achieve this is using the Spring JDBC API, which is the focus of the following sections.
2.1. Supporting Databases
Even though the JDBC API provides the batch functionality, it’s not guaranteed that the underlying JDBC driver we are using has actually implemented these APIs and supports this functionality.
Spring provides a utility method, JdbcUtils.supportsBatchUpdates(), that takes a JDBC Connection as a parameter and simply returns true or false. However, in most cases with the JdbcTemplate API, Spring performs this check for us and falls back to regular one-by-one execution if batching isn’t supported.
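If we ever need to perform this check ourselves, a minimal sketch could look like the following, assuming a configured DataSource named dataSource:
// a minimal sketch, assuming a configured javax.sql.DataSource named dataSource;
// JdbcUtils comes from org.springframework.jdbc.support
try (Connection connection = dataSource.getConnection()) {
    boolean batchSupported = JdbcUtils.supportsBatchUpdates(connection);
    // if batchSupported is false, we should fall back to single statements
}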
2.2. Factors That May Affect the Overall Performance
There are a few aspects we should consider when inserting a significant amount of data:
- the number of connections we create to talk to the database server
- the table we are inserting into
- the number of database requests we make to execute a single logical task
Usually, to overcome the first point, we use connection pooling. This helps by reusing already existing connections instead of creating new ones.
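For example, with Spring Boot’s default HikariCP pool, the pool size can be tuned through configuration properties; the values below are purely illustrative:
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=5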
Another significant point is the target table. To be precise, the more indexed columns we have, the worse the performance will be, because the database server needs to adjust the indexes after each new row.
Lastly, we can use batch support to decrease the number of roundtrips to insert a lot of entries.
However, we should be aware that not all JDBC drivers/database servers provide the same level of efficiency for batch operations, even though they support them. For example, while database servers such as Oracle, Postgres, SQL Server, and DB2 provide a significant gain, MySQL yields a much smaller gain without additional configuration.
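In MySQL’s case, the Connector/J driver only rewrites batched statements into a more efficient form when the rewriteBatchedStatements connection property is enabled; here’s a hypothetical JDBC URL with an illustrative database name:
jdbc:mysql://localhost:3306/sample-db?rewriteBatchedStatements=true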
3. Spring JDBC Batch Inserts
In this example, we’ll use Postgres 14 as our database server. So, we need to add the corresponding postgresql JDBC driver to our dependencies:
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <scope>runtime</scope>
</dependency>
Then, in order to use Spring’s JDBC abstraction, let’s add the spring-boot-starter-jdbc dependency as well:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
For demonstration purposes, we’ll explore two different approaches: first, we’ll do a regular insert for each record, and then we’ll take advantage of the batch support. In both cases, we’ll use a single transaction.
Let’s get started with our simple Product table:
CREATE TABLE product (
    id SERIAL PRIMARY KEY,
    title VARCHAR(40),
    created_ts timestamp without time zone,
    price numeric
);
And here’s the corresponding model Product class:
public class Product {
    private long id;
    private String title;
    private LocalDateTime createdTs;
    private BigDecimal price;

    // standard setters and getters
}
3.1. Configuring the Data Source
By adding the configuration below to our application.properties, Spring Boot creates a DataSource and a JdbcTemplate bean for us:
spring.datasource.url=jdbc:postgresql://localhost:5432/sample-baeldung-db
spring.datasource.username=postgres
spring.datasource.password=root
spring.datasource.driver-class-name=org.postgresql.Driver
3.2. Preparing Regular Inserts
We start by creating a simple repository interface to save the list of products:
public interface ProductRepository {
    void saveAll(List<Product> products);
}
Then the first implementation simply iterates over the products and inserts them one by one in the same transaction:
@Repository
public class SimpleProductRepository implements ProductRepository {

    private JdbcTemplate jdbcTemplate;

    public SimpleProductRepository(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    @Transactional
    public void saveAll(List<Product> products) {
        for (Product product : products) {
            jdbcTemplate.update("INSERT INTO PRODUCT (TITLE, CREATED_TS, PRICE) " +
                "VALUES (?, ?, ?)",
              product.getTitle(),
              Timestamp.valueOf(product.getCreatedTs()),
              product.getPrice());
        }
    }
}
Now, we need a ProductService class that generates a given number of Product objects and starts the insertion process. First, we add a method that generates these Product instances in a randomized fashion from some predefined values:
public class ProductService {

    private ProductRepository productRepository;
    private Random random;
    private Clock clock;

    // constructor for the dependencies

    private List<Product> generate(int count) {
        final String[] titles = { "car", "plane", "house", "yacht" };
        final BigDecimal[] prices = {
            new BigDecimal("12483.12"),
            new BigDecimal("8539.99"),
            new BigDecimal("88894"),
            new BigDecimal("458694")
        };

        final List<Product> products = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Product product = new Product();
            product.setCreatedTs(LocalDateTime.now(clock));
            product.setPrice(prices[random.nextInt(4)]);
            product.setTitle(titles[random.nextInt(4)]);
            products.add(product);
        }
        return products;
    }
}
Second, we add another method to the ProductService class that takes the generated Product instances and inserts them:
@Transactional
public long createProducts(int count) {
    List<Product> products = generate(count);
    long startTime = clock.millis();
    productRepository.saveAll(products);
    return clock.millis() - startTime;
}
To make the ProductService a Spring bean, let’s add the configuration below as well:
@Configuration
public class AppConfig {

    @Bean
    public ProductService simpleProductService(SimpleProductRepository simpleProductRepository) {
        return new ProductService(simpleProductRepository, new Random(), Clock.systemUTC());
    }
}
As we can see, this ProductService bean uses the SimpleProductRepository to perform regular inserts.
3.3. Preparing Batch Inserts
Now, it’s time to see Spring JDBC batch support in action. First of all, let’s create a batch implementation of our ProductRepository interface:
@Repository
public class BatchProductRepository implements ProductRepository {

    private JdbcTemplate jdbcTemplate;

    public BatchProductRepository(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    @Transactional
    public void saveAll(List<Product> products) {
        jdbcTemplate.batchUpdate("INSERT INTO PRODUCT (TITLE, CREATED_TS, PRICE) " +
            "VALUES (?, ?, ?)",
          products,
          100,
          (PreparedStatement ps, Product product) -> {
              ps.setString(1, product.getTitle());
              ps.setTimestamp(2, Timestamp.valueOf(product.getCreatedTs()));
              ps.setBigDecimal(3, product.getPrice());
          });
    }
}
It’s important to note that for this example we use a batch size of 100. This means Spring batches the inserts in groups of 100 and sends each group separately. In other words, it helps us decrease the number of roundtrips by a factor of 100.
Usually, the recommended batch size is 50-100, but it highly depends on our database server configuration and the size of each batch.
For example, MySQL Server has a configuration property called max_allowed_packet that imposes a 64MB limit on each network packet. When setting the batch size, we need to be careful not to exceed our database server’s limits.
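We could inspect or raise this limit on a MySQL server with statements like the following; the value below is just illustrative:
-- check the current packet size limit
SHOW VARIABLES LIKE 'max_allowed_packet';
-- raise it to 64MB for the running server instance (requires sufficient privileges)
SET GLOBAL max_allowed_packet = 67108864;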
Now, we add an additional ProductService bean configuration in the AppConfig class:
@Bean
public ProductService batchProductService(BatchProductRepository batchProductRepository) {
    return new ProductService(batchProductRepository, new Random(), Clock.systemUTC());
}
4. Performance Comparisons
It’s time to run our example and take a look at the benchmark results. For the sake of simplicity, we prepare a command-line Spring Boot application by implementing the CommandLineRunner interface provided by Spring. We run our example multiple times for both approaches:
@SpringBootApplication
public class SpringJdbcBatchPerformanceApplication implements CommandLineRunner {

    @Autowired
    @Qualifier("batchProductService")
    private ProductService batchProductService;

    @Autowired
    @Qualifier("simpleProductService")
    private ProductService simpleProductService;

    public static void main(String[] args) {
        SpringApplication.run(SpringJdbcBatchPerformanceApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        int[] recordCounts = {1, 10, 100, 1000, 10_000, 100_000, 1_000_000};
        for (int recordCount : recordCounts) {
            long regularElapsedTime = simpleProductService.createProducts(recordCount);
            long batchElapsedTime = batchProductService.createProducts(recordCount);

            System.out.println(String.join("", Collections.nCopies(50, "-")));
            System.out.format("%-20s%-5s%-10s%-5s%8sms\n", "Regular inserts", "|", recordCount, "|", regularElapsedTime);
            System.out.format("%-20s%-5s%-10s%-5s%8sms\n", "Batch inserts", "|", recordCount, "|", batchElapsedTime);
            System.out.printf("Total gain: %d %s\n", calculateGainInPercent(regularElapsedTime, batchElapsedTime), "%");
        }
    }

    int calculateGainInPercent(long before, long after) {
        return (int) Math.floor(100D * (before - after) / before);
    }
}
And here are our benchmark results:
--------------------------------------------------
Regular inserts | 1 | 14ms
Batch inserts | 1 | 8ms
Total gain: 42 %
--------------------------------------------------
Regular inserts | 10 | 4ms
Batch inserts | 10 | 1ms
Total gain: 75 %
--------------------------------------------------
Regular inserts | 100 | 29ms
Batch inserts | 100 | 6ms
Total gain: 79 %
--------------------------------------------------
Regular inserts | 1000 | 175ms
Batch inserts | 1000 | 24ms
Total gain: 86 %
--------------------------------------------------
Regular inserts | 10000 | 861ms
Batch inserts | 10000 | 128ms
Total gain: 85 %
--------------------------------------------------
Regular inserts | 100000 | 5098ms
Batch inserts | 100000 | 1126ms
Total gain: 77 %
--------------------------------------------------
Regular inserts | 1000000 | 47738ms
Batch inserts | 1000000 | 13066ms
Total gain: 72 %
--------------------------------------------------
The results look quite promising.
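To see where these percentages come from, take the 1,000-record run: regular inserts took 175ms and batch inserts 24ms, so the gain is floor(100 * (175 - 24) / 175) = 86%, matching the value reported above.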
However, that’s not all. Some databases, such as Postgres, MySQL, and SQL Server, support multi-value inserts, which help decrease the overall size of the insert statements. Let’s see how this works in general:
-- REGULAR INSERTS TO INSERT 4 RECORDS
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
('test1', LOCALTIMESTAMP, 100.10);
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
('test2', LOCALTIMESTAMP, 101.10);
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
('test3', LOCALTIMESTAMP, 102.10);
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
('test4', LOCALTIMESTAMP, 103.10);
-- EQUIVALENT MULTI-VALUE INSERT
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
('test1', LOCALTIMESTAMP, 100.10),
('test2', LOCALTIMESTAMP, 101.10),
('test3', LOCALTIMESTAMP, 102.10),
('test4', LOCALTIMESTAMP, 103.10);
To take advantage of this feature with a Postgres database, it’s enough to set spring.datasource.hikari.data-source-properties.reWriteBatchedInserts=true in our application.properties file. The underlying JDBC driver then rewrites our batched insert statements into multi-value ones.
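In our application.properties, this is a single additional line:
spring.datasource.hikari.data-source-properties.reWriteBatchedInserts=true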
This configuration is specific to Postgres. Other databases that support this feature might have different configuration requirements.
Let’s re-run our application with this feature enabled and see the difference:
--------------------------------------------------
Regular inserts | 1 | 15ms
Batch inserts | 1 | 10ms
Total gain: 33 %
--------------------------------------------------
Regular inserts | 10 | 3ms
Batch inserts | 10 | 2ms
Total gain: 33 %
--------------------------------------------------
Regular inserts | 100 | 42ms
Batch inserts | 100 | 10ms
Total gain: 76 %
--------------------------------------------------
Regular inserts | 1000 | 141ms
Batch inserts | 1000 | 19ms
Total gain: 86 %
--------------------------------------------------
Regular inserts | 10000 | 827ms
Batch inserts | 10000 | 104ms
Total gain: 87 %
--------------------------------------------------
Regular inserts | 100000 | 5093ms
Batch inserts | 100000 | 981ms
Total gain: 80 %
--------------------------------------------------
Regular inserts | 1000000 | 50482ms
Batch inserts | 1000000 | 9821ms
Total gain: 80 %
--------------------------------------------------
We can see that enabling this feature increases the overall performance when we have a relatively large data set.
5. Conclusion
In this article, we created a simple example to show how we can benefit from Spring JDBC batch support for inserts. We compared regular inserts against batched ones and saw a performance gain of around 80-90%. Certainly, while using batch functionality, we also need to consider the level of support our JDBC driver provides and how efficient it is.
Additionally, we learned that some databases/drivers provide a multi-value insert capability to boost performance even further, and we saw how to utilize it with Postgres.
As always, the source code for the example is available over on GitHub.