When writing a lot of immutable data, you should consider batch operations, meaning that a bunch of rows gets written into the database as a single operation.
Batch mode with Hibernate
It is possible to set a batch size for Hibernate using the hibernate.jdbc.batch_size property. When you persist a bunch of entities, they are temporarily stored in the first-level cache. The property tells Hibernate the maximum number of entities to hold in the cache before flushing them out to the database. This is useful to avoid an OutOfMemoryError caused by keeping too many entities in the cache. In this case the entities are written using the JDBC statement's addBatch() and executeBatch() methods. This was, however, not the behavior I needed. According to the Hibernate documentation, the batch size property should be set between 10 and 50. As my application persisted far more data in a short period of time, I wanted to be able to define a higher value and change it dynamically.
The above-mentioned Hibernate batching only works if you persist multiple entities in the same transaction, without an intermediate implicit or explicit commit or flush. If you use single method calls, each running in its own transaction, individual write operations are executed instead.
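For reference, the in-transaction batching pattern from the Hibernate documentation looks roughly like this (a minimal sketch; the writer class and the hard-coded batch size of 50 are my assumptions, ActiveChannel is the entity used throughout this post):

```java
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ActiveChannelHibernateBatchWriter {

    @PersistenceContext
    private EntityManager entityManager;

    // All persists happen in one transaction; the persistence context is
    // flushed and cleared in chunks matching hibernate.jdbc.batch_size,
    // so Hibernate can send the inserts via addBatch()/executeBatch().
    @Transactional
    public void persistAll(List<ActiveChannel> activeChannels) {
        int batchSize = 50; // should match the hibernate.jdbc.batch_size value
        for (int i = 0; i < activeChannels.size(); i++) {
            entityManager.persist(activeChannels.get(i));
            if (i > 0 && i % batchSize == 0) {
                entityManager.flush(); // writes the current chunk as a JDBC batch
                entityManager.clear(); // evicts the entities from the first-level cache
            }
        }
    }
}
```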
It is also worth mentioning that the hibernate.jdbc.batch_size property applies globally to the Hibernate instance, so it has an impact on the whole application, which I wanted to avoid.
Solution
As it is not possible to get Spring Data JPA and Hibernate to work in batched mode the way I needed, I had to go one level deeper and implement the batching with plain JDBC calls. I created a service that is responsible for executing database operations in batches. The example shows a batched insert operation, but it obviously works the same way for updates and deletes.
```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

import javax.sql.DataSource;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.datasource.DataSourceUtils;
import org.springframework.stereotype.Service;

@Service
public class InsertActiveChannelBatchDao {

    private static final Logger log = LoggerFactory.getLogger(InsertActiveChannelBatchDao.class);

    private static final String INSERT_STATEMENT =
            "insert into ACTIVE_CHANNEL (id, client_id, device_id, channel_id, play_type, start_time, created, end_time, status) "
            + "values "
            + "(?, ?, ?, ?, ?, ?, ?, ?, 0)";

    @Autowired
    private DataSource dataSource;

    // Single-entity insert for the non-batched case.
    public void insert(ActiveChannel activeChannel) {
        insertAll(Collections.singletonList(activeChannel));
    }

    public void insertAll(List<ActiveChannel> activeChannels) {
        if (activeChannels.isEmpty()) {
            return;
        }
        log.info("Batch insert ActiveChannel entities: {}", activeChannels.size());
        Connection connection = null;
        try {
            connection = dataSource.getConnection();
            try (PreparedStatement insertStatement = connection.prepareStatement(INSERT_STATEMENT)) {
                for (ActiveChannel activeChannel : activeChannels) {
                    setInsertStatementParameters(activeChannel, insertStatement);
                    insertStatement.addBatch();
                }
                long start = System.currentTimeMillis();
                insertStatement.executeBatch();
                log.info("Batch insert ActiveChannel entities finished: {} in {} ms",
                        activeChannels.size(), System.currentTimeMillis() - start);
            }
        } catch (SQLException e) {
            log.warn("Exception occurred while inserting records: " + e.getMessage());
        } finally {
            // Hand the connection back to the underlying pool.
            DataSourceUtils.releaseConnection(connection, dataSource);
        }
    }

    // Binds the entity fields to the statement parameters in column order
    // (the getter names are assumed from the columns of the insert statement).
    private void setInsertStatementParameters(ActiveChannel activeChannel, PreparedStatement statement)
            throws SQLException {
        statement.setObject(1, activeChannel.getId());
        statement.setObject(2, activeChannel.getClientId());
        statement.setObject(3, activeChannel.getDeviceId());
        statement.setObject(4, activeChannel.getChannelId());
        statement.setObject(5, activeChannel.getPlayType());
        statement.setObject(6, activeChannel.getStartTime());
        statement.setObject(7, activeChannel.getCreated());
        statement.setObject(8, activeChannel.getEndTime());
    }
}
```
The service gets a database connection from the Hibernate datasource, creates a PreparedStatement, and executes the inserts as a batch. At the end of the operation, the connection must be released back to the underlying pool. Note that it is not necessary to commit the changes, because autocommit is enabled on connections obtained from the datasource.
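As a side note: if the whole batch has to succeed or fail atomically, autocommit can be switched off and the commit issued explicitly. A sketch of such a variation of the insertAll() method above (the name insertAllAtomically is made up for illustration):

```java
// Alternative to insertAll() in InsertActiveChannelBatchDao above:
// the whole batch is written in one explicit transaction.
public void insertAllAtomically(List<ActiveChannel> activeChannels) throws SQLException {
    Connection connection = dataSource.getConnection();
    try (PreparedStatement insertStatement = connection.prepareStatement(INSERT_STATEMENT)) {
        connection.setAutoCommit(false); // group the batch into one transaction
        for (ActiveChannel activeChannel : activeChannels) {
            setInsertStatementParameters(activeChannel, insertStatement);
            insertStatement.addBatch();
        }
        insertStatement.executeBatch();
        connection.commit();
    } catch (SQLException e) {
        connection.rollback(); // undo the partially written batch
        throw e;
    } finally {
        connection.setAutoCommit(true); // restore the default before releasing
        DataSourceUtils.releaseConnection(connection, dataSource);
    }
}
```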
The ActiveChannelBatchService acts as a controller for the batched database operations. It stores the elements to be persisted in a queue and writes them to the database periodically.
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

import javax.annotation.PreDestroy;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class ActiveChannelBatchService {

    private static final Logger logger = LoggerFactory.getLogger(ActiveChannelBatchService.class);

    private final InsertActiveChannelBatchDao insertActiveChannelBatchDao;
    private final boolean batchingEnabled;
    private final int batchSize;

    private final ConcurrentLinkedQueue<ActiveChannel> entitiesToBePersisted = new ConcurrentLinkedQueue<>();

    @Autowired
    public ActiveChannelBatchService(InsertActiveChannelBatchDao insertActiveChannelBatchDao,
            @Value("${spring.datasource.batching.enabled}") boolean batchingEnabled,
            @Value("${spring.datasource.batching.batchSize}") int batchSize) {
        this.insertActiveChannelBatchDao = insertActiveChannelBatchDao;
        this.batchingEnabled = batchingEnabled;
        this.batchSize = batchSize;
    }

    @Async
    public void createNew(ActiveChannel activeChannel) {
        logger.info("Entering createNew(). Parameter: {} ", activeChannel);
        if (batchingEnabled) {
            // Queue the entity; it is written by the next scheduled batch run.
            entitiesToBePersisted.add(activeChannel);
        } else {
            insertActiveChannelBatchDao.insert(activeChannel);
        }
    }

    @Scheduled(fixedDelayString = "${spring.datasource.batching.commitIntervalInMs}")
    public void doDatabaseChanges() {
        if (!batchingEnabled) {
            return;
        }
        List<ActiveChannel> activeChannelsToBePersisted = retrieveActiveChannelsToBePersisted();
        insertActiveChannelBatchDao.insertAll(activeChannelsToBePersisted);
    }

    @PreDestroy
    public void doDbChangesBeforeShutDown() {
        logger.info("Writing all changes into the database before shutdown");
        // Drain the queue completely, one batch at a time, so no entities are lost.
        while (!entitiesToBePersisted.isEmpty()) {
            doDatabaseChanges();
        }
    }

    // Polls at most batchSize elements from the queue.
    private <T> List<T> poll(ConcurrentLinkedQueue<T> queue) {
        List<T> result = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            T element = queue.poll();
            if (element == null) {
                return result;
            }
            result.add(element);
        }
        return result;
    }

    List<ActiveChannel> retrieveActiveChannelsToBePersisted() {
        return poll(entitiesToBePersisted);
    }
}
```
As you can see, the following properties are configurable:
- how often the service should perform the batch operation
- how many elements should be written to the database at once
- whether batching is enabled at all
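For reference, the corresponding application.properties entries could look like this (the keys come from the @Value and @Scheduled annotations above, the values are just examples):

```properties
# enable or disable the custom batching completely
spring.datasource.batching.enabled=true
# maximum number of elements written to the database in one batch run
spring.datasource.batching.batchSize=1000
# delay in milliseconds between two scheduled batch runs
spring.datasource.batching.commitIntervalInMs=5000
```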
In order not to lose entities when the service is shut down, the doDbChangesBeforeShutDown() method is annotated with @PreDestroy.
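Keep in mind that @Scheduled and @Async only take effect if scheduling and asynchronous execution are enabled in the Spring configuration, for example like this (the class name is just an example):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.annotation.EnableScheduling;

// Without @EnableScheduling the @Scheduled batch runs are never triggered,
// and without @EnableAsync the @Async methods run synchronously in the caller's thread.
@Configuration
@EnableScheduling
@EnableAsync
public class BatchingConfiguration {
}
```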