Batch Processing with Hibernate/JPA

Learn to enable batch processing in hibernate and execute bulk INSERT / UPDATE statements for better performance and memory utilization.

Note that, internally, Hibernate leverages the JDBC’s batching capability that batches together multiple SQL statements as a single PreparedStatement.

1. Without Enabling the Batch Processing

By default, batch processing is disabled in Hibernate. So if we persist 10 new entities then 10 separate SQL INSERT statements will be executed. The same is true for a million records as well. Any application’s performance will keep degrading in proportion to the number of rows increases, in any typical setup.

@Test
public void testWithoutPeriodicFlush() {
  doInTransaction(session -> {
    
    for (int i = 1; i <= 10; i++) {
      System.out.println("Statement Queued : " + i);

      session.persist(new Post.PostBuilder()
          .title("title" + i)
          .content("content" + i)
          .build());
    }

  });
}

Notice the logs in the console. Clearly, hibernate first queued all the statements in the current persistent context. When the transaction was committed, all the statements were executed at the end of the method.

Statement Queued : 1
Statement Queued : 2
...
...
Statement Queued : 10

Hibernate: insert into Post (content, title, id) values (?, ?, ?)
Hibernate: insert into Post (content, title, id) values (?, ?, ?)
...
...
Hibernate: insert into Post (content, title, id) values (?, ?, ?)

2. Logging Batch Statements

As said before, hibernate relies on the low-level JDBC APIs to create batches of queued statements so to enable logging these statements we must intercept the calls at the datasource level.

The net.ttddyy:datasource-proxy is one such library that helps in creating a proxy around the original datasource used by the application. I will recommend its use in local and pre-production environments to test the application. Avoid using it in production.

<dependency>
    <groupId>net.ttddyy</groupId>
    <artifactId>datasource-proxy</artifactId>
    <version>1.7</version>
</dependency>

We can use a @Autowired DataSource instance or create a new DataSource in unit tests, and wrap it with the ProxyDataSource.

private DataSource getDataSource() {

  // Autowire or Create a DataSource
  MysqlDataSource ds = new MysqlDataSource();
  ds.setURL("jdbc:mysql://localhost/testdb");
  ds.setUser("root");
  ds.setPassword("password");

  // Create ProxyDataSource
  ProxyDataSource dataSource = ProxyDataSourceBuilder.create(ds)
      .asJson()
      .countQuery()
      .logQueryToSysOut()
      .multiline()
      .build();

  return dataSource;
}

Finally, use this proxy datasource in StandardServiceRegistry to bootstrap the SessionFactory.

Map<String, Object> settings = new HashMap<>();

settings.put(Environment.DATASOURCE, getDataSource());
settings.put(Environment.DRIVER, "com.mysql.cj.jdbc.Driver");
settings.put(Environment.DIALECT, "org.hibernate.dialect.MySQL8Dialect");

StandardServiceRegistry standardRegistry
            = new StandardServiceRegistryBuilder()
            .applySettings(settings)
            .build();

In the production environment, just to verify that batching is working or not, we can enable the DEBUG logging for org.hibernate.engine.jdbc.batch.internal.BatchingBatch logger.

<logger name="org.hibernate.engine.jdbc.batch.internal.BatchingBatch"  level="DEBUG"/>

If you see similar logs then batching is working. It does not provide any useful information about the batch, but it is enough to validate that batch functionality is working at the low-level APIs.

2022-05-18_16:50:06.302 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 5

3. Enabling the Batch Processing

To enable the batch processing, we need to set the hibernate.jdbc.batch_size property to a number bigger than 0.

hibernate.jdbc.batch_size = 5

If we’re using Spring Boot, we can define it as an application property:

spring.jpa.properties.hibernate.jdbc.batch_size = 5

To configure Session specific batch size, we can use setJdbcBatchSize() method.

//Using Session
session.setJdbcBatchSize(100);

//Using EntityManager
entityManager.unwrap(Session.class).setJdbcBatchSize(100);

Execute the first test again after configuring the datasource proxy and check the logs.

@Test
public void testWithoutPeriodicFlush() {
  doInTransaction(session -> {
    
    for (int i = 1; i <= 10; i++) {
      System.out.println("Statement Queued : " + i);

      session.persist(new Post.PostBuilder()
          .title("title" + i)
          .content("content" + i)
          .build());
    }

  });
}
Statement Queued : 1
Statement Queued : 2
Statement Queued : 3
Statement Queued : 4
Statement Queued : 5
Statement Queued : 6
Statement Queued : 7
Statement Queued : 8
Statement Queued : 9
Statement Queued : 10

2022-05-20_00:47:58.178 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 5

Name:, Connection:3, Time:0, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:5
Query:["insert into Post (content, title, id) values (?, ?, ?)"]
Params:[(content_1,title_1,1802),
(content_2,title_2,1803),
(content_3,title_3,1804),
(content_4,title_4,1805),
(content_5,title_5,1806)]

2022-05-20_00:47:58.178 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 5

Name:, Connection:3, Time:15, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:5
Query:["insert into Post (content, title, id) values (?, ?, ?)"]
Params:[(content_6,title_6,1807),
(content_7,title_7,1808),
(content_8,title_8,1809),
(content_9,title_9,1810),
(content_10,title_10,1811)]

Clearly, batching is enabled and working when we commit the transaction. Hibernate is sending the INSERT statements in batches of 5.

At the JDBC level, these batched transactions are grouped into a single INSERT statement. So for every 5 posts, there is only 1 INSERT statement at the database level.

4. Periodically Flushing and Clearing the Session

There is one problem remaining with the above-shown default batch processing. It first queues all the entities into the context and waits for committing the transaction.

This can be a serious issue of we have to queue thousands of entity instances into memory (the session-level cache) before flushing them to the database. For sufficient large batches, it can lead to OutOfMemoryError.

To overcome this problem, we need to flush and clear the session periodically.

  • The session’s flush() method triggers a transaction synchronization that sends all changes in the persistent entities to the database. Flushing is the process of synchronizing the underlying persistent store with a persistable state held in memory.
  • The session’s clear() clears the session. It evicts all loaded instances from Session and cancels all pending saves, updates and deletions.

In the given example, we are flushing and clearing the session after each batch (of size 5). So now, we queue 5 posts in the session and use the flush() method to insert these 5 posts in the database in a single batch statement. We do this repeatedly without changing the overall batching behavior.

@Test
public void testWithPeriodicFlush() {
  doInTransaction(session -> {

    for (int i = 1; i <= 10; i++) {
      System.out.println("Statement Queued : " + i);

      session.persist(new Post.PostBuilder()
          .title("title" + i)
          .content("content" + i)
          .build());

      if (i % 5 == 0) {
        session.flush();
        session.clear();
      }
    }

  });
}

Notice the logs.

Statement Queued : 1
Statement Queued : 2
Statement Queued : 3
Statement Queued : 4
Statement Queued : 5

2022-05-18_17:16:20.227 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 5

Name:, Connection:3, Time:0, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:5
Query:["insert into Post (content, title, id) values (?, ?, ?)"]
Params:[(content_1,title_1,1852),
(content_2,title_2,1853),
(content_3,title_3,1854),
(content_4,title_4,1855),
(content_5,title_5,1856)]

Statement Queued : 6
Statement Queued : 7
Statement Queued : 8
Statement Queued : 9
Statement Queued : 10

2022-05-18_17:16:20.231 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 5

Name:, Connection:3, Time:0, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:5
Query:["insert into Post (content, title, id) values (?, ?, ?)"]
Params:[(content_6,title_6,1857),
(content_7,title_7,1858),
(content_8,title_8,1859),
(content_9,title_9,1860),
(content_10,title_10,1861)]

Now, this is much better code and provides excellent memory and runtime performance.

5. Ordering Batch Inserts for Multiple Entities

A limitation of hibernate batching is that it allows only one type of entity in a single batch. For a different entity, a second batch will be created.

Let us understand with an example. Let’s create a few posts and some comments on them. In the following example, we are creating 10 posts and adding 4 comments to each post. It makes a total of 10 posts and 40 comments.

@Test
public void testInsertOrdering() {
  doInTransaction(session -> {

    for (int i = 1; i <= 10; i++) {

      List<Comment> comments = new ArrayList<>();

      for (int j = 1; j <= 4; j++) {
        Comment comment =
            new Comment.CommentBuilder().text("Comment - " + j).build();
        session.persist(comment);
        comments.add(comment);
      }

      Post post = new Post.PostBuilder()
          .title("title" + i)
          .content("content" + i)
          .comments(comments)
          .build();

      session.persist(post);
    }
    
  });
}

Notice the logs. All the posts go into a database in a separate batch. Similarly, comments also go in 10 separate batches. So there is a total of 20 SQL INSERT statements executed in the process.

2022-05-20_00:47:58.553 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 1

Name:, Connection:3, Time:0, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:1
Query:["insert into Post (content, title, id) values (?, ?, ?)"]
Params:[(content1,title1,1902)]

2022-05-20_00:47:58.553 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 4

Name:, Connection:3, Time:0, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:4
Query:["insert into Comment (post_id, text, id) values (?, ?, ?)"]
Params:[(NULL(BIGINT),Comment - 1,606),
(NULL(BIGINT),Comment - 2,607),
(NULL(BIGINT),Comment - 3,608),
(NULL(BIGINT),Comment - 4,609)]

...
...
...

To improve the performance, logically, all 10 posts can go into the database in a single SQL INSERT statement. And when we have post ids created for all 10 posts, all the 40 comments should in the database in the second INSERT statement. So it should really take just 2 INSERT statements in the whole process.

Hibernate provides hibernate.order_inserts property that can be used to force Hibernate to order inserts to allow for more batching. The official documentation cautions against it as a performance hit, so benchmark before and after to see if this actually helps or hurts our application.

settings.put("hibernate.order_inserts", true);

//or

hibernate.order_inserts = true

In Spring boot applications, we can force insert order using the following property.

spring.jpa.properties.hibernate.order_inserts = true

After configuring the run the test again and notice how all posts and comments have been created in just 2 INSERT statements. Note that we have increased the batch size to 50 to accommodate more statements in a single batch.

2022-05-20_01:08:56.683 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 10

Name:, Connection:3, Time:0, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:10
Query:["insert into Post (content, title, id) values (?, ?, ?)"]
Params:[(content1,title1,2302),.....]

2022-05-20_01:08:56.699 DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 40

Name:, Connection:3, Time:15, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:40
Query:["insert into Comment (post_id, text, id) values (?, ?, ?)"]
Params:[(NULL(BIGINT),Comment - 1,702),....]

6. Ordering Batch Updates

Similar to insert ordering, we can force hibernate to group SQL UPDATE statements in situations like the above.

settings.put("hibernate.order_updates", "true");
settings.put("hibernate.batch_versioned_data", "true");

And if we are using Spring Boot, we need to add these to the application.properties:

spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.batch_versioned_data=true

7. FAQs

7.1. Batching not working even after configuring the properties

The most probable reason is that you are using GenerationType.IDENTITY identifier generator.

It is important to realize that using IDENTITY columns imposes a runtime behavior where the entity row must be physically inserted prior to the identifier value being known. Due to this limitation, Hibernate will not be able to batch INSERT statements for the entities using the IDENTITY generation.

Using GenerationType.SEQUENCE is the recommended approach in batch processing.

7.2. Configuring batch size per session basis

We can have a reasonable batch size at the global level, but sometimes we need to override the global batch size. Use the session.setJdbcBatchSize() method for configuring Session specific batch size.

session.setJdbcBatchSize(100);

8. Conclusion

In this tutorial, we learned Hibernate batching functionality and how to use it properly. We learned ways to improve the performance further by periodically flushing the persistent context, controlling the session-specific batch sizes and, finally, ordering the INSERT and UPDATE statements for better batches.

Happy Learning !!

Sourcecode on Github

Comments

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments

About Us

HowToDoInJava provides tutorials and how-to guides on Java and related technologies.

It also shares the best practices, algorithms & solutions and frequently asked interview questions.

Our Blogs

REST API Tutorial

Dark Mode

Dark Mode