Transactions in Spring Batch - Part 2: Restart, cursor based reading and listeners
This is the second post in a series about transactions in Spring Batch. The first post is about chunk based transaction handling, batch job vs. business data, a failed batch and transaction attributes; the third one is about skip and retry.
After the basics, today’s topics around transactions in Spring Batch are cursor based reading, restarting a failed batch and listeners. While cursor based reading is a rather short chapter here, it’s definitely necessary to understand what’s happening there. Restarting a failed batch is one of the central features Spring Batch offers over other solutions, but it’s not a feature you can use right away: you have to do some thinking about a job’s restartability. The third topic is listeners and transactions: we’re gonna see how the ItemReadListener, the ItemProcessListener, the ItemWriteListener and the ChunkListener behave regarding transactions.
Cursor based reading
Reading from a database cursor means opening a connection, firing one SQL statement against it and continuously reading rows for the duration of the batch job. That makes sense, because the input data of a job can often be characterized by one SQL statement, but executing it and reading all the data from the ResultSet upfront is of course no solution. There is just one problem with reading continuously: committing the transaction would close the connection. So how do we keep it open? Simple solution: it doesn’t take part in the transaction. Spring Batch’s JdbcCursorItemReader uses a separate connection for opening the cursor, thereby bypassing the transaction managed by the transaction manager.
In an application server environment we have to do a little bit more to make it work. Normally we get connections from a DataSource managed by the application server, and all of those connections take part in transactions by default. We need to set up a separate DataSource which does not take part in transactions, and inject it only into our cursor based readers. Injecting it anywhere else could cause a lot of damage regarding transaction safety.
Restarting a failed batch
Spring Batch brings the ability to restart a failed batch. A batch job instance is identified by its JobParameters, so starting a batch job with parameters that have already been used in a prior job execution automatically triggers a restart, provided that the first execution failed. If it did not fail, the second job execution is rejected.
So far, so good, but can you just restart every failed job? Of course not. Someone has to know where to pick it up again. Readers subclassing AbstractItemCountingItemStreamItemReader store the item count in the ExecutionContext, which gets persisted in every chunk transaction. Let’s say we have a chunk size of 5 and get an error while processing item 23. The last transaction that was committed successfully contained items 16 to 20, so the item count stored in the ExecutionContext in the database is 20. When restarting the job, we’ll continue with item 21 (and hopefully have fixed the error that led to the problem with item 23 before). There’s a whole family of readers that works that way; the JdbcCursorItemReader, for example, is among them. And none of them is thread-safe, because they have to keep the item count.
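The arithmetic of that example can be replayed with a small toy model. This is only a sketch in plain Java, not the Spring Batch API: the `Map` stands in for the persisted ExecutionContext, and the names `RestartCountDemo`, `run` and `COUNT_KEY` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a counting reader plus chunk commits; NOT the Spring Batch API.
class RestartCountDemo {

    // Hypothetical key under which the item count is stored.
    static final String COUNT_KEY = "reader.item.count";

    /**
     * Processes items 1..totalItems in chunks, starting after the count found
     * in the context. The count is only stored when a whole chunk succeeded,
     * which models "persisted in every chunk transaction".
     * Returns the item numbers whose chunk was committed in this run.
     */
    static List<Integer> run(int totalItems, int chunkSize, int failingItem,
                             Map<String, Integer> executionContext) {
        List<Integer> committed = new ArrayList<>();
        int position = executionContext.getOrDefault(COUNT_KEY, 0);
        while (position < totalItems) {
            int end = Math.min(position + chunkSize, totalItems);
            List<Integer> chunk = new ArrayList<>();
            for (int item = position + 1; item <= end; item++) {
                if (item == failingItem) {
                    // The chunk rolls back: the stored count stays untouched.
                    return committed;
                }
                chunk.add(item);
            }
            committed.addAll(chunk);
            position = end;
            executionContext.put(COUNT_KEY, position); // stored with the commit
        }
        return committed;
    }

    public static void main(String[] args) {
        Map<String, Integer> context = new HashMap<>();
        List<Integer> firstRun = run(30, 5, 23, context);  // item 23 fails
        System.out.println("committed in first run: " + firstRun.size());
        System.out.println("stored item count: " + context.get(COUNT_KEY));
        List<Integer> restart = run(30, 5, -1, context);   // error fixed
        System.out.println("restart begins with item: " + restart.get(0));
    }
}
```

Items 21 and 22 were read in the failed chunk, but since their chunk never committed, the restart picks them up again.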
Okay, let’s say you’re using one of those readers with item count and you’ve put it into step scope because of thread safety. Then you’re still not done with thinking. Let’s say you’re using the JdbcCursorItemReader, you defined your SQL statement, and you want to use the restart functionality. Then you have to be sure that your SQL statement delivers the same result when called on restart, at least for all items that have been processed already. When restarting with item number 21 you need to be sure that items 1 to 20 are the items that have been processed in the first try, otherwise you won’t get the results you expect. Ordering is important.
Another use case may be a flat file you’re reading in line by line (FlatFileItemReader), and you’ve got a problem with a certain line. When fixing the file, be sure to keep the lines that have been processed already, unchanged and in the same position.
And when you’re writing a reader yourself, always keep in mind that restartability doesn’t come by itself; you have to program it. It may be a good idea to subclass AbstractItemCountingItemStreamItemReader as well, or to store the state that you want to recover directly in the ExecutionContext. That’s work Spring Batch just cannot take over for you.
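Why ordering matters can be shown with a tiny example. This is a sketch in plain Java under simplifying assumptions: a `List` stands in for the rows returned by the SQL statement, and `remainingAfterRestart` is a made-up helper that models "skip as many rows as the stored item count says".

```java
import java.util.List;

// Toy model: a restart skips rows by count, so the row order must be stable.
class RestartOrderingDemo {

    /** Returns the rows a restarted reader would still deliver. */
    static List<String> remainingAfterRestart(List<String> queryResult,
                                              int storedItemCount) {
        return queryResult.subList(storedItemCount, queryResult.size());
    }

    public static void main(String[] args) {
        // The first run committed 4 rows in this order: A, B, C, D.
        List<String> stableOrder = List.of("A", "B", "C", "D", "E", "F");
        // Same rows, but the statement returns them differently on restart.
        List<String> changedOrder = List.of("F", "E", "D", "C", "B", "A");

        // Stable order: exactly the unprocessed rows E and F are left.
        System.out.println(remainingAfterRestart(stableOrder, 4));
        // Changed order: B and A are processed twice, E and F never.
        System.out.println(remainingAfterRestart(changedOrder, 4));
    }
}
```

In SQL terms this typically means an ORDER BY on something unique, because the order of rows without an ORDER BY clause is not guaranteed.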
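Storing your own state could look roughly like the following. It is a toy model in plain Java, not a real Spring Batch reader: the `Map` stands in for the ExecutionContext, the method names `open` and `update` only loosely mirror the ItemStream callbacks, and `LastIdReader` and `LAST_ID_KEY` are hypothetical names. Instead of an item count it remembers the last id it handed out, which assumes the ids are unique and sorted in ascending order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a hand-written restartable reader; NOT the Spring Batch API.
class KeyBasedRestartDemo {

    // Hypothetical key under which the reader state is stored.
    static final String LAST_ID_KEY = "reader.last.id";

    /** Reads ids in ascending order and remembers the last one handed out. */
    static class LastIdReader {
        private final List<Long> sortedIds;
        private long lastId = Long.MIN_VALUE;
        private int index = 0;

        LastIdReader(List<Long> sortedIds) {
            this.sortedIds = sortedIds;
        }

        /** Restores state: skips everything up to and including the saved id. */
        void open(Map<String, Long> context) {
            lastId = context.getOrDefault(LAST_ID_KEY, Long.MIN_VALUE);
            index = 0;
            while (index < sortedIds.size() && sortedIds.get(index) <= lastId) {
                index++;
            }
        }

        /** Returns the next id, or null when the input is exhausted. */
        Long read() {
            if (index >= sortedIds.size()) {
                return null;
            }
            lastId = sortedIds.get(index++);
            return lastId;
        }

        /** Saves state; meant to be called right before a chunk commits. */
        void update(Map<String, Long> context) {
            context.put(LAST_ID_KEY, lastId);
        }
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(10L, 20L, 30L, 40L, 50L);
        Map<String, Long> context = new HashMap<>();

        LastIdReader firstTry = new LastIdReader(ids);
        firstTry.open(context);
        firstTry.read();           // 10
        firstTry.read();           // 20
        firstTry.update(context);  // chunk commits, state is saved
        firstTry.read();           // 30, but this chunk fails: no update

        LastIdReader restart = new LastIdReader(ids);
        restart.open(context);
        System.out.println("restart continues with id " + restart.read());
    }
}
```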
Listeners and transactions
Besides ItemReaders, ItemProcessors and ItemWriters, listeners are a second way to add your business logic to the batch processing. Listeners listen for certain events and are executed when an appropriate event fires. Spring Batch has several listener types; the important ones are the following:
- The JobExecutionListener has two methods, beforeJob and afterJob. Both of them are, of course, executed outside of the chunk’s transaction.
- The StepExecutionListener has two methods, beforeStep and afterStep. Both of them are, of course, executed outside of the chunk’s transaction.
- The ChunkListener has two methods, beforeChunk and afterChunk. The first one is executed inside the chunk’s transaction, the second one outside of the chunk’s transaction.
- The ItemReadListener has three methods, beforeRead, afterRead and onReadError. All of them are executed inside the chunk’s transaction.
- The ItemProcessListener has three methods, beforeProcess, afterProcess and onProcessError. All of them are executed inside the chunk’s transaction.
- The ItemWriteListener has three methods, beforeWrite, afterWrite and onWriteError. All of them are executed inside the chunk’s transaction.
- The SkipListener has three methods, onSkipInRead, onSkipInProcess and onSkipInWrite. All of them are executed inside the chunk’s transaction. We’ll talk about this listener in the blog post about skip functionality.
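The transaction boundaries from this list can be written down as a simple trace of the successful case. The sketch below is plain Java and only restates what the list says; it does not call Spring Batch. The class name `ListenerTraceDemo` is made up, and the order inside a chunk (read all items, then process them, then write once) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Prints which listener callbacks fire inside the chunk transaction.
// Toy trace of the successful case only; NOT the Spring Batch API.
class ListenerTraceDemo {

    static String inside(String callback) {
        return "in tx : " + callback;
    }

    static String outside(String callback) {
        return "no tx : " + callback;
    }

    /** Builds the callback trace for one step with the given chunk layout. */
    static List<String> trace(int chunks, int itemsPerChunk) {
        List<String> events = new ArrayList<>();
        events.add(outside("beforeJob"));
        events.add(outside("beforeStep"));
        for (int c = 0; c < chunks; c++) {
            // The chunk transaction begins here.
            events.add(inside("beforeChunk"));
            for (int i = 0; i < itemsPerChunk; i++) {
                events.add(inside("beforeRead"));
                events.add(inside("afterRead"));
            }
            for (int i = 0; i < itemsPerChunk; i++) {
                events.add(inside("beforeProcess"));
                events.add(inside("afterProcess"));
            }
            events.add(inside("beforeWrite"));
            events.add(inside("afterWrite"));
            // The chunk transaction commits here.
            events.add(outside("afterChunk"));
        }
        events.add(outside("afterStep"));
        events.add(outside("afterJob"));
        return events;
    }

    public static void main(String[] args) {
        trace(1, 2).forEach(System.out::println);
    }
}
```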
Let’s see in the illustration where exactly they fire.
When you look at the picture, you might notice one important thing. The onXXXError methods are executed right before the transaction is rolled back. That holds if you configured it the standard way; you could of course mark an exception type as a no-rollback exception, and then the transaction would be committed after the onError event has fired. So if you want to interact with some kind of transactional resource in such a method, you have to open a new transaction yourself. With annotation based transaction handling you can put the annotation @Transactional(propagation=Propagation.REQUIRES_NEW) on the method to achieve this.
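Why the error callback needs its own transaction can be seen in a small toy model. This is plain Java, not Spring: `Store` and `Tx` are made-up stand-ins for a transactional resource, and opening a second `Tx` plays the role that REQUIRES_NEW has in a real application.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a chunk rollback and an error log entry; NOT Spring code.
class ErrorLogTransactionDemo {

    /** Toy transactional store: rows become visible only on commit. */
    static class Store {
        final List<String> committedRows = new ArrayList<>();

        Tx begin() {
            return new Tx(this);
        }
    }

    /** Toy transaction that buffers inserts until commit or rollback. */
    static class Tx {
        private final Store store;
        private final List<String> pending = new ArrayList<>();

        Tx(Store store) {
            this.store = store;
        }

        void insert(String row) {
            pending.add(row);
        }

        void commit() {
            store.committedRows.addAll(pending);
            pending.clear();
        }

        void rollback() {
            pending.clear();
        }
    }

    /** Simulates a failing chunk whose error callback writes a log row. */
    static List<String> failingChunk(boolean logInNewTransaction) {
        Store store = new Store();
        Tx chunkTx = store.begin();
        chunkTx.insert("business row 21");
        // The write fails here; the error callback fires before the rollback.
        if (logInNewTransaction) {
            Tx logTx = store.begin();  // stands in for REQUIRES_NEW
            logTx.insert("error log: item 23 failed");
            logTx.commit();
        } else {
            // Joins the chunk transaction, so it is lost with the rollback.
            chunkTx.insert("error log: item 23 failed");
        }
        chunkTx.rollback();
        return store.committedRows;
    }

    public static void main(String[] args) {
        System.out.println("same tx: " + failingChunk(false));
        System.out.println("new tx : " + failingChunk(true));
    }
}
```

In a real Spring application, @Transactional is applied through a proxy, so the annotated listener has to be a Spring-managed bean for the new transaction to actually be opened.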
In this second article on transactions in Spring Batch we took a look at cursor based processing, what it is and how it works without breaking the transaction. Then we saw what to do to make a job restartable, and that there is some thinking to do that you cannot avoid. And the last section was about listeners in Spring Batch, and where they have their place in transaction processing.
The next post is about skip and retry functionality.