Overview
This section explains how to recover by re-executing a job that terminated abnormally because of a failure.
Since the approach differs between the chunk model and the tasklet model, each is explained separately.
A job can be re-executed in the following ways.
- Job rerun
- Job restart
  - Stateless restart
    - Number based restart
  - Stateful restart
    - Determine processing status, restart process to extract unprocessed data
      - It is necessary to separately implement a process for identifying the processing state
The terms are defined as follows.
- Rerun
  Redoing the job from the beginning.
  As preliminary work, the state before the failure must be restored before starting the job, for example by initializing data.
- Restart
  Resuming the processing from the point where the job was interrupted.
  How the restart position is retained and acquired, how data is skipped up to the restart position, and so on must be designed and implemented in advance.
  There are two types of restart: stateless and stateful.
- Stateless restart
  A restart method that does not consider the state (unprocessed / processed) of each input data item.
- Number based restart
  One kind of stateless restart.
  The number of processed input data items is retained, and that many items are skipped at restart.
  If the output is a non-transactional resource, the output position must also be retained and the write position moved to it at restart.
- Stateful restart
  A restart method that judges the state (unprocessed / processed) of each input data item and acquires only unprocessed data by using that state as the acquisition condition.
  If the output is a non-transactional resource, open it in append mode and add to the previous result at restart.
Generally, rerun is the easiest way to re-execute a job. Design and implementation become more difficult in the order Rerun < Stateless restart < Stateful restart. Rerun is preferable whenever it is feasible. For each job that the user implements, consider which method to apply depending on the allowable batch window and the processing characteristics.
How to use
This section explains how to implement rerun and restart.
Job rerun
This section explains how to implement job rerun.
- Carry out the preliminary data recovery work, such as initializing data, before the rerun.
- Execute the failed job again with the same conditions (same parameters).
  - In Spring Batch, executing a job with the same parameters is treated as double execution, but TERASOLUNA Batch 5.x treats it as a separate job.
    For details, please refer to "About parameter conversion class".
Job restart
This section explains how to restart a job.
Restart is basically performed for jobs that were executed synchronously.
For asynchronously executed jobs, it is recommended to design the job so that it is recovered by rerun instead of restart. This is because it is difficult to judge whether an execution is an "intended restart execution" or an "unintended duplicate execution", which may cause confusion in operation.
If restart requirements cannot be excluded for asynchronous job execution, the following methods can be used to make the "intended restart execution" explicit.
- Restart by -restart of CommandLineJobRunner
  - Restart the asynchronously executed job separately, by synchronous execution. This is effective when the recovery process is carried out sequentially.
- Restart by JobOperator#restart(JobExecutionId)
  - Restart the asynchronously executed job again on the asynchronous execution mechanism. This is effective when recovery processing is carried out collectively.
    - Asynchronous execution (DB polling) does not support restart. Therefore, the user must implement it separately.
    - Asynchronous execution (Web container) describes how to implement restart. The user implements it according to that description.
About restart when there is input check
The input check error is not recoverable unless the input resource causing the check error is corrected. For reference, an input resource example at the time of input error occurrence is shown below.
|
In the case of multiple processing (Partition Step)
When a job that uses "multiple processing (Partition Step)" is restarted, processing starts again from the split processing.
When all of the data are processed as the result of dividing the data, unnecessary splitting is performed and recorded on |
Stateless restart
This section explains how to implement stateless restart.
In TERASOLUNA Batch 5.x, stateless restart means number based restart. It is implemented by using the mechanism of Spring Batch.
Number based restart can be used for job execution in the chunk model.
In addition, number based restart uses context information about inputs and outputs registered in JobRepository.
Therefore, number based restart assumes that JobRepository uses a database that guarantees persistence rather than an in-memory database.
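The need for persistence can be illustrated with a minimal plain-Java sketch. It is not Spring Batch code and does not show how JobRepository stores its context; the class name, the property key read.count, and the file names are hypothetical and exist only for this illustration. A processed count written to a persistent store is still available to a later process, whereas a count that was never persisted leaves the restart with nothing to skip.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Plain-Java illustration only; this is not how JobRepository stores its context. */
class CheckpointStoreDemo {

    // Hypothetical key name used only in this sketch.
    private static final String KEY = "read.count";

    /** Persists the processed count so that it survives the end of the process. */
    static void save(Path store, int processedCount) {
        Properties props = new Properties();
        props.setProperty(KEY, Integer.toString(processedCount));
        try (OutputStream out = Files.newOutputStream(store)) {
            props.store(out, "restart context");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the persisted count, or 0 when no context survived. */
    static int load(Path store) {
        if (!Files.exists(store)) {
            return 0;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(store)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Integer.parseInt(props.getProperty(KEY, "0"));
    }

    /** Returns {count found in a persistent store, count found when nothing was persisted}. */
    static int[] runScenario() {
        try {
            Path dir = Files.createTempDirectory("checkpoint-demo");
            Path persistent = dir.resolve("context.properties");
            save(persistent, 20);                                    // the failed run had committed 20 items
            int withStore = load(persistent);                        // a later process still sees the count
            int withoutStore = load(dir.resolve("lost.properties")); // nothing was persisted
            Files.delete(persistent);
            Files.delete(dir);
            return new int[] {withStore, withoutStore};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = runScenario();
        System.out.println("persistent store: resume after " + result[0] + " items");
        System.out.println("no surviving context: resume after " + result[1] + " items");
    }
}
```

In the second case the restart would begin from the first item again, which is effectively a rerun without the preliminary data recovery.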
About failure occurrence of JobRepository
Updating to |
- Input at restart
  Since most of the ItemReaders provided by Spring Batch support number based restart, no special handling is necessary.
  To create your own ItemReader that supports number based restart, extend the following abstract class, which already implements the restart processing.
  - org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader

  Because the restart starting point is determined by count, number based restart cannot detect changes, additions, or deletions of input data. Input data is often corrected for recovery after a job terminates abnormally. When data is changed in this way, note that the output of a job that terminated normally and the output recovered by restarting an abnormally terminated job may differ.
  - Change the data acquisition order
    - At restart, duplicated or unprocessed data is produced, so the recovered result differs from the result of a rerun. This should not be attempted.
  - Update processed data
    - Since the updated data is skipped at restart, the result of a rerun and the result recovered by restart can differ. This is not preferable.
  - Update or add unprocessed data
    - This is allowed because the rerun result and the recovered result are the same. However, the result differs from that of a normal termination in the first execution. This should be used when patching abnormal data as an emergency measure, or when processing as much of the data received at execution time as possible.
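The effect of these changes can be sketched with a small plain-Java simulation. It is not Spring Batch code: the class and method names are hypothetical, and the only thing it models is that a number based restart knows how many items were processed, not which ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java simulation of a number based restart; not Spring Batch code. */
class CountBasedRestartDemo {

    /**
     * Skips the first alreadyProcessed items and returns the rest, which is
     * all a count-based restart can derive from the previous execution.
     */
    static List<String> resume(List<String> input, int alreadyProcessed) {
        List<String> processedNow = new ArrayList<>();
        for (int i = alreadyProcessed; i < input.size(); i++) {
            processedNow.add(input.get(i));
        }
        return processedNow;
    }

    public static void main(String[] args) {
        // First execution: A and B were committed, then the job failed.
        int committedCount = 2;

        // Unchanged input: the remaining items are processed.
        System.out.println(resume(Arrays.asList("A", "B", "C", "D"), committedCount)); // [C, D]

        // Acquisition order changed: A and B are processed twice, C and D never.
        System.out.println(resume(Arrays.asList("D", "C", "B", "A"), committedCount)); // [B, A]

        // Processed item A corrected to A2: the correction is skipped.
        System.out.println(resume(Arrays.asList("A2", "B", "C", "D"), committedCount)); // [C, D]

        // Unprocessed item C corrected to C2 and E added: both are picked up.
        System.out.println(resume(Arrays.asList("A", "B", "C2", "D", "E"), committedCount)); // [C2, D, E]
    }
}
```

Only the last case yields the same result as a rerun over the corrected input, which matches the cautions listed above.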
- Output at restart
  Care must be taken when outputting to non-transactional resources. For a file, for example, the position up to which output was made must be known, and output must continue from that position.
  Since the FlatFileItemWriter provided by Spring Batch gets the previous output position from the context and outputs from that position at restart, no special countermeasure is necessary.
  For transactional resources, rollback is performed at the time of failure, so processing can be performed at restart without any special action.
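The idea of retaining the output position for a file can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions, not the implementation of FlatFileItemWriter: the class and method names are hypothetical, one line stands for one chunk, and the committed position is kept in a local variable instead of a persistent context.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Plain-Java sketch of resuming file output from a retained position. */
class FileRestartPositionDemo {

    /** Appends one line and returns the file size afterwards, i.e. the next write position. */
    static long appendLine(Path file, String line) throws IOException {
        Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return Files.size(file);
    }

    /** Discards everything that was written after the last committed position. */
    static void truncateTo(Path file, long committedPosition) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.truncate(committedPosition);
        }
    }

    /** Runs a failed execution followed by a restart and returns the final file content. */
    static List<String> runScenario() {
        try {
            Path file = Files.createTempFile("restart-demo", ".csv");
            long committed = appendLine(file, "A"); // chunk 1 committed, position retained
            appendLine(file, "B");                  // chunk 2 written, job fails before commit

            truncateTo(file, committed);            // restart: move back to the retained position
            appendLine(file, "B");                  // reprocess the failed chunk
            appendLine(file, "C");

            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // [A, B, C]
    }
}
```

Without the truncation step, the line written by the uncommitted chunk would remain in the file and appear twice after the restart.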
If the above conditions are satisfied, add the -restart option to the failed job and execute it again.
An example of job restart is shown below.
# (1)
java -cp dependency/* org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <jobName> -restart
Sr.No. |
Description |
(1) |
Specify job bean path and job name same as job failed at |
An example of restarting a job executed in asynchronous execution (DB polling) is shown below.
# (1)
java -cp dependency/* org.springframework.batch.core.launch.support.CommandLineJobRunner <JobExecutionId> -restart
Sr.No. |
Description |
(1) |
Run The job execution ID can be acquired from the job-request-table. About the job-request-table, please refer "About polling table". |
Output log of job execution ID
In order to promptly identify the job execution ID of an abnormally terminated job, it is recommended to implement a listener or exception handling class that logs the job execution ID when the job ends or when an exception occurs. |
An example of restart in asynchronous execution (Web container) is shown below.
public long restart(long jobExecutionId) throws Exception {
    // Returns the ID of the job execution newly created by the restart.
    return jobOperator.restart(jobExecutionId); // (1)
}
Sr.No. |
Description |
(1) |
Specify the same job execution ID (JobExecutionId) as the failed job to The job execution ID can be obtained from the ID acquired when executing the job with the web application or from |
Stateful restart
This section explains how to achieve stateful restart.
Stateful restart is a method of reprocessing in which only unprocessed data is acquired by examining the input / output results at the time of execution. Although this method is difficult to design (state retention, determination of unprocessed data, and so on), it is sometimes used because it is robust against data changes.
In stateful restart, since restart conditions are determined from input / output resources, persistence of JobRepository becomes unnecessary.
- Input at restart
  Prepare an ItemReader that implements logic to acquire only unprocessed data based on the input / output results.
- Output at restart
  As with stateless restart, caution is required for output to non-transactional resources.
  In the case of a file, assuming that the context is not used, the design must permit appending to the file.
Stateful restart, like job rerun, re-executes the job with the same conditions as the failed job.
Unlike stateless restart, the -restart option is not used.
An example of implementing a simple stateful restart is shown below.
- Define a processed column in the input target table, and update it with a value other than NULL if the processing succeeds.
  - The extraction condition for unprocessed data is that the value of the processed column is NULL.
- Output the processing result to a file.
<!-- (1) -->
<select id="findByProcessedIsNull"
        resultType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail">
    <![CDATA[
    SELECT
        branch_id AS branchId, year, month, customer_id AS customerId, amount
    FROM
        sales_plan_detail
    WHERE
        processed IS NULL
    ORDER BY
        branch_id ASC, year ASC, month ASC, customer_id ASC
    ]]>
</select>
<!-- (2) -->
<update id="update" parameterType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail">
    <![CDATA[
    UPDATE
        sales_plan_detail
    SET
        processed = '1'
    WHERE
        branch_id = #{branchId}
    AND
        year = #{year}
    AND
        month = #{month}
    AND
        customer_id = #{customerId}
    ]]>
</update>
<!-- (3) -->
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
      p:queryId="org.terasoluna.batch.functionaltest.ch06.reprocessing.repository.RestartOnConditionRepository.findByProcessedIsNull"
      p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
<!-- (4) -->
<bean id="dbWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="org.terasoluna.batch.functionaltest.ch06.reprocessing.repository.RestartOnConditionRepository.update"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>
<bean id="fileWriter"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:resource="file:#{jobParameters['outputFile']}"
      p:appendAllowed="true"> <!-- (5) -->
    <property name="lineAggregator">
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
            <property name="fieldExtractor">
                <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
                      p:names="branchId,year,month,customerId,amount"/>
            </property>
        </bean>
    </property>
</bean>
<!-- (6) -->
<bean id="compositeWriter" class="org.springframework.batch.item.support.CompositeItemWriter">
    <property name="delegates">
        <list>
            <ref bean="fileWriter"/>
            <ref bean="dbWriter"/>
        </list>
    </property>
</bean>
<batch:job id="restartOnConditionBasisJob"
           job-repository="jobRepository" restartable="false"> <!-- (7) -->
    <batch:step id="restartOnConditionBasisJob.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader" processor="amountUpdateItemProcessor"
                         writer="compositeWriter" commit-interval="10" />
        </batch:tasklet>
    </batch:step>
</batch:job>
# (8)
java -cp dependency/* org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <jobName> <jobParameters> ...
Sr.No. | Description |
---|---|
(1) |
Define SQL that selects only the data whose processed column is NULL. |
(2) |
Define SQL that updates the processed column to a non-NULL value. |
(3) |
For ItemReader, set the SQLID defined in (1). |
(4) |
For updating the database, set the SQLID defined in (2). |
(5) |
At restart, allow appending to the file so that writing can continue from the last interruption point. |
(6) |
Set |
(7) |
It is not mandatory, but set the |
(8) |
Execute again according to the execution condition of the failed job. |
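The flow of the stateful restart above can be sketched as a plain-Java simulation. It is not the Spring Batch or MyBatis configuration shown in this section: the class, its fields, and the row keys are hypothetical, the table and the output file are in-memory collections, and each item is committed individually rather than per chunk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of a stateful restart; not Spring Batch code. */
class StatefulRestartDemo {

    /** Input "table": row key to processed flag, where null means unprocessed. */
    final Map<String, String> table = new LinkedHashMap<>();

    /** Output "file" that is only ever appended to. */
    final List<String> outputFile = new ArrayList<>();

    StatefulRestartDemo(String... keys) {
        for (String key : keys) {
            table.put(key, null);
        }
    }

    /** Counterpart of the "processed IS NULL" extraction condition. */
    List<String> findUnprocessed() {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (row.getValue() == null) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    /** Processes unprocessed rows and fails at failOn (null means no failure). */
    void run(String failOn) {
        for (String key : findUnprocessed()) {
            if (key.equals(failOn)) {
                throw new IllegalStateException("failed at " + key);
            }
            outputFile.add(key);  // append the result to the output
            table.put(key, "1");  // mark the row as processed
        }
    }

    public static void main(String[] args) {
        StatefulRestartDemo job = new StatefulRestartDemo("r1", "r2", "r3", "r4");
        try {
            job.run("r3");                       // first execution ends abnormally at r3
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // failed at r3
        }
        job.run(null);                           // same conditions, no restart option
        System.out.println(job.outputFile);      // [r1, r2, r3, r4]
    }
}
```

Because the second execution selects only rows whose flag is still null, it continues where the first one stopped without needing any context from the previous execution.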
About the job’s restartable attribute
If |