Tuesday, March 5, 2019

Batching database operations with Spring and Hibernate

Recently I needed to do some database performance optimization for a Spring Boot application. The application persisted a huge amount of log-like data to an Oracle database via Spring Data JPA. With Hibernate and Spring Data JPA, writing this data was not really efficient, because every record incurred the overhead of its own datasource, transaction and commit operations.

When writing a lot of data that will not be modified afterwards, you should consider batch operations, meaning that a bunch of records gets written to the database as a single operation.

Batch mode with Hibernate

It is possible to set a batch size for Hibernate using the hibernate.jdbc.batch_size property. When you persist a bunch of entities, they are temporarily stored in the first-level cache. The property tells Hibernate the maximum number of elements to keep in the cache before flushing them to the database. This is useful to avoid an OutOfMemoryError caused by storing too many elements in the cache. In this case the entities are written using the JDBC statement's addBatch() and executeBatch() methods.
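For reference, Hibernate-level batching is typically enabled in a Spring Boot application through JPA properties; a minimal fragment might look like this (the values are illustrative):

```properties
# flush inserts/updates to the database in JDBC batches of 30
spring.jpa.properties.hibernate.jdbc.batch_size=30
# order statements by entity so consecutive inserts can share a batch
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```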

This is, however, not the behavior I really needed. According to the Hibernate documentation, the batch size property should be set between 10 and 50. As my application persisted far more data in a short period of time, I wanted to be able to define a higher value and change it dynamically.

The above-mentioned Hibernate batching only works if you persist multiple entities in the same transaction, without an implicit or explicit commit or flush. If you use single method calls, each running in its own transaction, single write operations are executed instead.

It is also worth mentioning that the hibernate.jdbc.batch_size property applies to the whole Hibernate instance, so it affects the entire application, which I wanted to avoid.

Solution

As it was not possible to get Spring Data JPA and Hibernate to work in batched mode the way I needed, I had to go a level deeper and implement the batching with plain JDBC calls.

I created a service that is responsible for executing database operations in batch. The example shows a batched insert operation, but it obviously works the same way for updates or deletes.


@Service
public class InsertActiveChannelBatchDao {

 private static final Logger log = LoggerFactory.getLogger(InsertActiveChannelBatchDao.class);

 private static final String INSERT_STATEMENT = "insert into ACTIVE_CHANNEL (id, client_id, device_id, channel_id, play_type, start_time, created, end_time, status)"
   + " values "
   + "(?, ?, ?, ?, ?, ?, ?, ?, 0)";

 @Autowired
 private DataSource dataSource;

 public void insertAll(List<ActiveChannel> activeChannels) {
  if (activeChannels.isEmpty()) {
   return;
  }
  log.info("Batch insert ActiveChannel entities: {}", activeChannels.size());
  Connection connection = null;

  try {
   connection = dataSource.getConnection();
   // try-with-resources closes the statement; the connection is released in the finally block
   try (PreparedStatement insertStatement = connection.prepareStatement(INSERT_STATEMENT)) {

    for (ActiveChannel activeChannel : activeChannels) {
     setInsertStatementParameters(activeChannel, insertStatement);
     insertStatement.addBatch();
    }

    long start = System.currentTimeMillis();
    insertStatement.executeBatch();
    log.info("Batch insert ActiveChannel entities finished: {} in {} ms", activeChannels.size(),
      System.currentTimeMillis() - start);
   }
  } catch (SQLException e) {
   log.warn("Exception occurred while inserting records: " + e.getMessage());
  } finally {
   DataSourceUtils.releaseConnection(connection, dataSource);
  }
 }
}


The service gets the database connection from the datasource used by Hibernate. It creates a PreparedStatement, binds the parameters in setInsertStatementParameters() (not shown here), and executes the inserts as a batch. At the end of the operation, the connection must be released back to the underlying pool. Please note that it is not necessary to commit the changes, because autocommit is true on connections obtained from the datasource.

The ActiveChannelBatchService service acts as a controller for the batched database operations. It stores the elements to be persisted in a queue and writes them to the database periodically.



@Service
public class ActiveChannelBatchService {

 private static final Logger logger = LoggerFactory.getLogger(ActiveChannelBatchService.class);

 private final InsertActiveChannelBatchDao insertActiveChannelBatchDao;

 private boolean batchingEnabled;

 private int batchSize;

 private final ConcurrentLinkedQueue<ActiveChannel> entitiesToBePersisted = new ConcurrentLinkedQueue<>();

 @Autowired
 public ActiveChannelBatchService(InsertActiveChannelBatchDao insertActiveChannelBatchDao,
   @Value("${spring.datasource.batching.enabled}") boolean batchingEnabled,
   @Value("${spring.datasource.batching.batchSize}") int batchSize) {

  this.insertActiveChannelBatchDao = insertActiveChannelBatchDao;
  this.batchingEnabled = batchingEnabled;
  this.batchSize = batchSize;
 }

 @Async
 public void createNew(ActiveChannel activeChannel) {
  logger.info("Entering createNew(). Parameter: {} ", activeChannel);
  if (batchingEnabled) {
   entitiesToBePersisted.add(activeChannel);
  } else {
   insertActiveChannelBatchDao.insert(activeChannel); // single insert; DAO method not shown above
  }
 }

 @Scheduled(fixedDelayString = "${spring.datasource.batching.commitIntervalInMs}")
 public void doDatabaseChanges() {
  if (!batchingEnabled) {
   return;
  }

  List<ActiveChannel> activeChannelsToBePersisted = retrieveActiveChannelsToBePersisted();
  insertActiveChannelBatchDao.insertAll(activeChannelsToBePersisted);
 }
 
 @PreDestroy
 public void doDbChangesBeforeShutDown() {
  logger.info("Writing all changes into the database before shutdown");
  doDatabaseChanges();
 }

 private <T> List<T> poll(ConcurrentLinkedQueue<T> queue) {
  List<T> result = new ArrayList<>(batchSize);
  for (int i = 0; i < batchSize; i++) {
   T element = queue.poll();
   if (element == null) {
    return result;
   }
   result.add(element);
  }
  return result;
 }

 List<ActiveChannel> retrieveActiveChannelsToBePersisted() {
  return poll(entitiesToBePersisted);
 }

}
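The bounded drain performed by poll() can be exercised in isolation. The following self-contained sketch (class name and batch size are illustrative) shows that each call takes at most batchSize elements and stops early when the queue runs dry:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DrainExample {

    // Drain at most batchSize elements; stops early when the queue is empty.
    static <T> List<T> poll(ConcurrentLinkedQueue<T> queue, int batchSize) {
        List<T> result = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            T element = queue.poll();
            if (element == null) {
                break;
            }
            result.add(element);
        }
        return result;
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 1; i <= 7; i++) {
            queue.add(i);
        }
        System.out.println(poll(queue, 3).size()); // 3 (full batch)
        System.out.println(poll(queue, 3).size()); // 3 (full batch)
        System.out.println(poll(queue, 3).size()); // 1 (partial batch)
        System.out.println(queue.isEmpty());       // true
    }
}
```

Because the scheduled flush and createNew() run on different threads, the lock-free ConcurrentLinkedQueue lets producers keep adding entities while a batch is being drained.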

As you can see, the following properties can be configured:

  • how often the service performs the batch operation
  • how many elements are written to the database at once
  • whether batching is enabled at all.
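A minimal application.properties fragment wiring these settings, using the property keys referenced in the code above (the values are only examples):

```properties
# enable the custom batching layer
spring.datasource.batching.enabled=true
# maximum number of queued entities written per batch
spring.datasource.batching.batchSize=500
# how often the scheduled flush runs, in milliseconds
spring.datasource.batching.commitIntervalInMs=5000
```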

In order not to lose entities while the service is shutting down, the doDbChangesBeforeShutDown() method is annotated with @PreDestroy.

Resources