TERASOLUNA Batch Framework for Java (5.x) Development Guideline - version 5.1.1.RELEASE, 2018-3-16

Overview

This section describes transaction control in jobs.

Because the usage of this function differs between the chunk model and the tasklet model, each is explained separately.

About the pattern of transaction control in general batch processing

Batch processing generally handles a large number of records, so if an error near the end of processing forces everything to be rerun, the batch system schedule is adversely affected.
To avoid this, the impact of an error is often localized by committing the transaction after every fixed number of records while a single job runs.
(Hereafter, the method of committing a transaction for every fixed number of records is called the "intermediate commit method", and a group of records forming one commit unit is called a "chunk".)

The points of the intermediate commit method are summarized below.

  1. Localize the effects at the time of error occurrence.

    • Even if an error occurs, processing up to the chunk just before the error is already committed.

  2. Only use a certain amount of resources.

    • Regardless of whether the data to be processed is large or small, only the resources for one chunk are used, so resource usage is stable.

However, the intermediate commit method is not an effective method in every situation.
Processed and unprocessed data coexist in the system, even if only temporarily. Because unprocessed data must then be identified during recovery, recovery can become complicated. To avoid this, all records must be committed in one transaction instead of using the intermediate commit method.
(Hereafter, the method of committing all records in one transaction is called the "single commit method".)

Nevertheless, if you process a large number of records, such as tens of thousands, with the single commit method, reflecting all of them in the database at commit time may cause a heavy load. Therefore, although the single commit method is suitable for small-scale batch processing, care must be taken when adopting it in a large-scale batch. It cannot be called a universally applicable method either.

In other words, there is a trade-off between "localization of impact" and "ease of recovery". Decide whether to give priority to the "intermediate commit method" or the "single commit method" depending on the nature of each job.
Of course, it is not necessary to implement all the jobs in the batch system with one method. It is natural to use the "intermediate commit method" for basic jobs and the "single commit method" for special jobs (or vice versa).
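
The trade-off can be illustrated with a small plain-Java model. It is only a sketch and does not use Spring Batch; the class name CommitMethodModel and the method committedRecords are hypothetical. It computes how many records remain committed when processing fails at a given record, for a given commit interval.

```java
/** Hypothetical model of the two commit methods; not part of Spring Batch or TERASOLUNA Batch. */
final class CommitMethodModel {

    private CommitMethodModel() {
    }

    /**
     * Returns how many records stay committed when one record fails.
     *
     * @param totalRecords   number of input records
     * @param commitInterval records per transaction; pass totalRecords to model the single commit method
     * @param failingRecord  1-based number of the record that fails, or 0 if no record fails
     */
    static int committedRecords(int totalRecords, int commitInterval, int failingRecord) {
        if (totalRecords < 0 || commitInterval <= 0
                || failingRecord < 0 || failingRecord > totalRecords) {
            throw new IllegalArgumentException("invalid arguments");
        }
        if (failingRecord == 0) {
            return totalRecords; // every chunk, including a final partial one, is committed
        }
        // Only chunks completed before the failing record are committed;
        // the chunk that contains the failing record is rolled back.
        int completedChunks = (failingRecord - 1) / commitInterval;
        return completedChunks * commitInterval;
    }
}
```

For 100 records and a failure at record 95, the model returns 90 with a commit interval of 10 and 0 with a single transaction (interval 100): the first case keeps most of the work but leaves processed and unprocessed data mixed, the second leaves nothing to untangle but loses all work.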

Below is the summary of advantages, disadvantages and adoption points of "intermediate commit method" and "single commit method".

Features list by method

Commit method | Advantage | Disadvantage | Adoption point
intermediate commit method | Localize the effect at the time of error occurrence | Recovery processing may be complicated | When you want to process large amounts of data with certain machine resources
single commit method | Ensure data integrity | There is a possibility of high work-load when processing a large number of cases | When you want to set the processing result for the persistent resource to All or Nothing. Suitable for small batch processing

Notes for input and output to the same table in the database

Because of how databases work, care must be taken when a process reads from and writes to the same table while handling a large amount of data, regardless of the commit method.

  • Because output (issuing UPDATE) can discard the information that guarantees read consistency, errors may occur in the input (SELECT).

To avoid this, the following measures can be taken.

  • Increase the area that holds the read-consistency information.

    • When expanding it, study the resource design thoroughly before implementing the change.

    • Since the extension method depends on the database to be used, refer to its manual.

  • Divide the input data and process it in multiple runs.

Architecture

Transaction control in Spring Batch

Job transaction control leverages the mechanism of Spring Batch.

Two kinds of transactions are defined below.

Framework transaction

Transaction controlled by Spring Batch

User transaction

Transactions controlled by the user

Transaction control mechanism in chunk model

In the chunk model, only the intermediate commit method is available. The single commit method cannot be used.

The single commit method in the chunk model has been reported in JIRA.
https://jira.spring.io/browse/BATCH-647
The resolution there is to customize the chunk completion policy and change the chunk size dynamically. However, because that approach stores all data in one chunk and puts pressure on memory, it cannot be adopted.

A feature of this method is that transactions are repeatedly performed for each chunk.

Transaction control in normal process

Transaction control in normal process will be explained.

Transaction Control Chunk Model Normal Process
Sequence diagram of normal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The subsequent processing is repeated until there is no input data.

    • Start a framework transaction for each chunk.

    • Repeat steps 2 to 5 until the chunk size is reached.

  2. The step obtains input data from ItemReader.

  3. ItemReader returns the input data to the step.

  4. In the step, ItemProcessor processes input data.

  5. ItemProcessor returns the processing result to the step.

  6. The step outputs data for chunk size with ItemWriter.

  7. ItemWriter will output to the target resource.

  8. The step commits the framework transaction.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Transaction Control Chunk Model Abnormal Process
Sequence diagram of abnormal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The subsequent processing is repeated until there is no input data.

    • Start a framework transaction on a per chunk basis.

    • Repeat steps 2 to 5 until the chunk size is reached.

  2. The step obtains input data from ItemReader.

  3. ItemReader returns the input data to the step.

  4. In the step, ItemProcessor processes input data.

  5. ItemProcessor returns the processing result to the step.

  6. The step outputs data for chunk size with ItemWriter.

  7. ItemWriter will output to the target resource.

    • If any exception occurs in steps 2 to 7, perform the subsequent step.

  8. The step rolls back the framework transaction.
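
The per-chunk behaviour of the normal and abnormal processes can be sketched in plain Java. This is a simplified model, not the Spring Batch implementation: the class ChunkStepModel is hypothetical, the "transaction" is just an in-memory list, and committing is modelled as appending the chunk to a result list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical plain-Java model of a chunk-oriented step; not the Spring Batch implementation. */
final class ChunkStepModel {

    private ChunkStepModel() {
    }

    /**
     * Reads and processes items until the chunk size is reached, then writes and
     * commits the chunk. If an exception is thrown, the current chunk is discarded
     * (rolled back) and the exception is rethrown; earlier chunks stay committed.
     */
    static <I, O> void run(Iterator<I> reader, Function<I, O> processor,
                           int chunkSize, List<O> committed) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        while (reader.hasNext()) {                  // repeat until there is no input data
            List<O> chunk = new ArrayList<>();      // start of the per-chunk transaction
            try {
                // Read and process until the chunk size is reached.
                while (chunk.size() < chunkSize && reader.hasNext()) {
                    chunk.add(processor.apply(reader.next()));
                }
            } catch (RuntimeException e) {
                chunk.clear();                      // rollback: nothing of this chunk is kept
                throw e;
            }
            committed.addAll(chunk);                // write the chunk and commit
        }
    }
}
```

With 25 input items, a chunk size of 10 and a failure on the 23rd item, the first two chunks (20 items) remain committed and the third chunk is discarded.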

Mechanism of transaction control in tasklet model

For transaction control in the tasklet model, either the single commit method or the intermediate commit method can be used.

single commit method

Use the transaction control mechanism of Spring Batch

Intermediate commit method

The user manipulates the transaction directly

single commit method in tasklet model

The mechanism of transaction control by Spring Batch is explained.

A feature of this method is to process data repeatedly within one transaction.

Transaction control in normal process

Transaction control in normal process will be explained.

Single Transaction Control Tasklet Model Normal Process
Sequence diagram of normal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The step starts a framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 7 until there is no more input data.

  3. Tasklet gets input data from Repository.

  4. Repository will return input data to tasklet.

  5. Tasklets process input data.

  6. Tasklets pass output data to Repository.

  7. Repository will output to the target resource.

  8. The tasklet returns the process end to the step.

  9. The step commits the framework transaction.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Single Transaction Control Tasklet Model Abnormal Process
Sequence diagram of abnormal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The step starts a framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 7 until there is no more input data.

  3. Tasklet gets input data from Repository.

  4. Repository will return input data to tasklet.

  5. Tasklets process input data.

  6. Tasklets pass output data to Repository.

  7. Repository will output to the target resource.

    • If any exception occurs in steps 2 to 7, perform the subsequent steps.

  8. The tasklet throws an exception to the step.

  9. The step rolls back the framework transaction.

Intermediate commit method in tasklet model

The mechanism by which the user manipulates transactions directly is described.

A feature of this method is that the framework transaction is one that cannot manipulate resources, so transactions on resources are handled only by user transactions.
Specify org.springframework.batch.support.transaction.ResourcelessTransactionManager, which has no resources, in the transaction-manager attribute.

Transaction control in normal process

Transaction control in normal process will be explained.

Chunk Transaction Control Tasklet Model Normal Process
Sequence diagram of normal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The step starts framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 10 until there is no more input data.

  3. The tasklet starts a user transaction via TransactionManager.

    • Repeat steps 4 to 8 until the chunk size is reached.

  4. Tasklet gets input data from Repository.

  5. Repository will return input data to tasklet.

  6. Tasklets process input data.

  7. Tasklets pass output data to Repository.

  8. Repository will output to the target resource.

  9. The tasklet commits the user transaction via TransactionManager.

  10. TransactionManager issues a commit to the target resource.

  11. The tasklet returns the process end to the step.

  12. The step commits the framework transaction.

In this case, each item is output to the resource one at a time, but as in the chunk model, it is also possible to perform the updates collectively per chunk and improve processing throughput. In that case, you can also use batch update by setting the executorType of SqlSessionTemplate to BATCH. This is the same behavior as MyBatis' ItemWriter, so you can also perform the update using MyBatis' ItemWriter. For details of MyBatis' ItemWriter, refer to MyBatisBatchItemWriter.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Chunk Transaction Control Tasklet Model Abnormal Process
Sequence diagram of abnormal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The step starts framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 11 until there is no more input data.

  3. The tasklet starts a user transaction via TransactionManager.

    • Repeat steps 4 to 8 until the chunk size is reached.

  4. Tasklet gets input data from Repository.

  5. Repository will return input data to tasklet.

  6. Tasklets process input data.

  7. Tasklets pass output data to Repository.

  8. Repository will output to the target resource.

    • If any exception occurs in steps 3 to 8, perform the subsequent steps.

  9. The tasklet processes the exception that occurred.

  10. The tasklet performs a rollback of the user transaction via TransactionManager.

  11. TransactionManager issues a rollback to the target resource.

  12. The tasklet throws an exception to the step.

  13. The step rolls back the framework transaction.

About processing continuation

Here, processing terminates abnormally after the exception is handled and the transaction is rolled back, but it is also possible to continue with the next chunk. In either case, subsequent processing must be notified that an error occurred, by changing the status or exit code of the step.
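
Continuing with the next chunk after a failed one can be sketched as follows. This is a plain-Java model under stated assumptions: the class ContinueOnErrorModel and its Result type are hypothetical, and the user transaction is modelled as "keep the chunk or drop it". In a real tasklet, the error flag would be mapped to the step status or exit code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical model of a tasklet that keeps going after a failed chunk. */
final class ContinueOnErrorModel {

    private ContinueOnErrorModel() {
    }

    /** Committed items plus a flag that the caller maps to a step status or exit code. */
    static final class Result {
        final List<String> committed = new ArrayList<>();
        boolean errorOccurred;
    }

    static Result run(List<String> input, int chunkSize, Consumer<String> businessLogic) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Result result = new Result();
        for (int start = 0; start < input.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.size());
            List<String> chunk = input.subList(start, end);
            try {
                chunk.forEach(businessLogic);       // work done inside the user transaction
                result.committed.addAll(chunk);     // commit of the user transaction
            } catch (RuntimeException e) {
                // Rollback: nothing of this chunk is kept. Remember the error so that
                // subsequent processing can be notified, then go on with the next chunk.
                result.errorOccurred = true;
            }
        }
        return result;
    }
}
```

For the input a, b, c, d, e with a chunk size of 2 and a failure on c, the chunks (a, b) and (e) are committed, the chunk (c, d) is rolled back, and the error flag is set.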

About framework transactions

In this case, although the job is abnormally terminated by throwing an exception after rolling back the user transaction, it is also possible to return the processing end to the step and terminate the job normally. In this case, the framework transaction is committed.

Selection policy for model-specific transaction control

In Spring Batch that is the basis of TERASOLUNA Batch 5.x, only the intermediate commit method can be implemented in the chunk model. However, in the tasklet model, either the intermediate commit method or the single commit method can be implemented.

Therefore, in TERASOLUNA Batch 5.x, when the single commit method is necessary, it is to be implemented in the tasklet model.

Difference in transaction control for each execution method

Depending on the execution method, a transaction that is not managed by Spring Batch occurs before and after the job is executed. This section explains transactions in two asynchronous execution processing schemes.

About transaction of DB polling

Processing of the Job-request-table performed by DB polling uses transactions that are not managed by Spring Batch. In addition, because exceptions that occur in a job are handled within the job, they do not affect the transactions performed by JobRequestPollTask.

A simple sequence diagram focusing on transactions is shown in the figure below.

With Database polling transaction
Transaction of DB polling
Description of the Sequence Diagram
  1. JobRequestPollTask is executed periodically from asynchronous batch daemon.

  2. JobRequestPollTask will start a transaction which is not managed by Spring Batch.

  3. JobRequestPollTask will retrieve an asynchronous execution target job from Job-request-table.

  4. JobRequestPollTask will commit the transaction which is not managed by Spring Batch.

  5. JobRequestPollTask will start a transaction which is not managed by Spring Batch.

  6. JobRequestPollTask will update the polling status of Job-request-table from INIT to POLLED.

  7. JobRequestPollTask will commit the transaction which is not managed by Spring Batch.

  8. JobRequestPollTask will execute the job.

  9. Within a job, Spring Batch carries out transaction control of the database for management (JobRepository).

  10. Within a job, Spring Batch carries out transaction control of the database for job.

  11. job_execution_id is returned to JobRequestPollTask.

  12. JobRequestPollTask will start a transaction which is not managed by Spring Batch.

  13. JobRequestPollTask will update the polling status of Job-request-table from POLLED to EXECUTED.

  14. JobRequestPollTask will commit the transaction which is not managed by Spring Batch.

About Commit at SELECT Issuance

Some databases may implicitly start a transaction when a SELECT is issued. Therefore, a commit is issued explicitly to end that transaction, so that it is clearly separated from other transactions and does not affect them.

About the transaction of WebAP server process

For processing to resources targeted by WebAP, transaction processing which is not managed by Spring Batch is performed. Further, since the exceptions occurred in the job are handled within the job itself, it does not affect transactions performed by WebAP.

A simple sequence diagram focusing on transactions is shown in the figure below.

With Web Application transaction
Transaction of WebAP server process
Description of the Sequence Diagram
  1. WebAP processing is executed by the request from the client.

  2. WebAP will start the transaction which is not managed by Spring Batch.

  3. WebAP reads from and writes to resources in WebAP before job execution.

  4. WebAP executes the job.

  5. Within a job, Spring Batch carries out transaction control of the database for management (JobRepository).

  6. Within a job, Spring Batch carries out transaction control of the database for job.

  7. job_execution_id is returned to WebAP.

  8. WebAP reads from and writes to resources in WebAP after job execution.

  9. WebAP will commit the transaction which is not managed by Spring Batch.

  10. WebAP returns a response to the client.

How to use

Here, transaction control in one job will be explained separately in the following cases.

The data source refers to the data storage location (database, file, etc.). A single data source refers to one data source, and multiple data sources refers to two or more data sources.

Processing of data in the database is a typical example of processing of single data source.
There are some variations in the case of processing multiple data sources as follows.

  • multiple databases

  • databases and files

For a single data source

Transaction control of a job which inputs / outputs to single data source is explained.

Below is a sample setting with TERASOLUNA Batch 5.x.

DataSource setting(META-INF/spring/launch-context.xml)
<!-- Job-common definitions -->
<bean id="jobDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
      destroy-method="close"
      p:driverClassName="${jdbc.driver}"
      p:url="${jdbc.url}"
      p:username="${jdbc.username}"
      p:password="${jdbc.password}"
      p:maxTotal="10"
      p:minIdle="1"
      p:maxWaitMillis="5000"
      p:defaultAutoCommit="false" />
TransactionManager setting(META-INF/spring/launch-context.xml)
<!-- (1) -->
<bean id="jobTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager"
      p:dataSource-ref="jobDataSource"
      p:rollbackOnCommitFailure="true" />
No Description

(1)

Bean definition of TransactionManager.
Set jobDataSource defined above for the data source.
It has been set to roll back if commit fails.

Implement transaction control

The control method differs depending on the job model and the commit method.

In case of chunk model

The chunk model uses the intermediate commit method and leaves transaction control to Spring Batch. The user should not control transactions.

Setting sample(job definition)
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <batch:tasklet transaction-manager="jobTransactionManager"> <!-- (1) -->
            <batch:chunk reader="detailCSVReader"
                         writer="detailWriter"
                         commit-interval="10" /> <!-- (2) -->
        </batch:tasklet>
    </batch:step>
</batch:job>
No Description

(1)

Set jobTransactionManager which is already defined in transaction-manager attribute of <batch:tasklet> tag.
The intermediate commit method transaction is controlled by the transaction manager set here.

(2)

Set chunk size to commit-interval attribute. In this sample, commit once for every 10 records.

For the tasklet model

In the case of the tasklet model, the method of transaction control differs depending on whether the method is single commit method or the intermediate commit method.

single commit method

Spring Batch controls the transaction.

Setting sample(job definition)
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <!-- (1) -->
        <batch:tasklet transaction-manager="jobTransactionManager"
                       ref="salesPlanSingleTranTask" />
    </batch:step>
</batch:job>
No Description

(1)

Set jobTransactionManager which is already defined in transaction-manager attribute of <batch:tasklet> tag.
The single commit method transaction is controlled by the transaction manager set here.

intermediate commit method

The user controls the transaction.

  • If you want to commit in the middle of processing, inject the TransactionManager and operate it manually.

Setting sample(job definition)
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <!-- (1) -->
        <batch:tasklet transaction-manager="jobResourcelessTransactionManager"
                       ref="salesPlanChunkTranTask" />
    </batch:step>
</batch:job>
Implementation sample
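import javax.inject.Inject;
import javax.inject.Named;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.stereotype.Component;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.DefaultTransactionDefinition;

@Component
public class SalesPlanChunkTranTask implements Tasklet {

    // SLF4J is assumed for logging.
    private static final Logger logger =
            LoggerFactory.getLogger(SalesPlanChunkTranTask.class);

    private static final int CHUNK_SIZE = 10;

    @Inject
    ItemStreamReader<SalesPlanDetail> itemReader;

    // (2)
    @Inject
    @Named("jobTransactionManager")
    PlatformTransactionManager transactionManager;

    @Inject
    SalesPlanDetailRepository repository;

    @Override
    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {

        DefaultTransactionDefinition definition = new DefaultTransactionDefinition();
        TransactionStatus status = null;
        ExecutionContext executionContext = chunkContext.getStepContext()
                .getStepExecution().getExecutionContext();
        SalesPlanDetail item;
        int count = 0;

        try {
            itemReader.open(executionContext);

            while ((item = itemReader.read()) != null) {

                if (count % CHUNK_SIZE == 0) {
                    status = transactionManager.getTransaction(definition); // (3)
                }
                count++;

                // omitted.

                repository.create(item);
                if (count % CHUNK_SIZE == 0) {
                    transactionManager.commit(status);  // (4)
                }
            }
        } catch (Exception e) {
            logger.error("Exception occurred while reading.", e);
            // Roll back only if a transaction is open: status is null before the
            // first chunk starts and already completed right after a chunk commit.
            if (status != null && !status.isCompleted()) {
                transactionManager.rollback(status);    // (5)
            }
            throw e;
        } finally {
            // Commit the last, partially filled chunk if it is still open.
            if (status != null && !status.isCompleted()) {
                transactionManager.commit(status);   // (6)
            }
            itemReader.close();
        }

        return RepeatStatus.FINISHED;
    }
}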
No Description

(1)

Set jobResourcelessTransactionManager which is already defined in transaction-manager attribute of <batch:tasklet> tag.

(2)

Inject the transaction manager.
In the @Named annotation, specify jobTransactionManager to identify the bean to use.

(3)

Start a transaction at the beginning of the chunk.

(4)

Commit the transaction at the end of the chunk.

(5)

When an exception occurs, roll back the transaction.

(6)

For the last chunk, commit the transaction.

Updating by ItemWriter

The above example uses Repository, but data can also be updated using ItemWriter. Using ItemWriter simplifies the implementation; in particular, FlatFileItemWriter should be used when updating files.

Note for non-transactional data sources

In the case of files, no transaction setting or operation is necessary.

When using FlatFileItemWriter, pseudo transaction control can be performed. This is implemented by delaying the writing to the resource and actually writing out at the time of commit. Normally, when it reaches the chunk size, it outputs chunk data to the actual file, and if an exception occurs, data output of the chunk is not performed.

FlatFileItemWriter can switch transaction control on and off with transactional property. The default is true and transaction control is enabled. If the transactional property is false, FlatFileItemWriter will output the data regardless of the transaction.

When adopting the single commit method, it is recommended to set the transactional property to false. As described above, data is written to the resource at commit time, so until then all output data is held in memory. Therefore, when the amount of data is large, memory is likely to run short and cause errors.
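
The delayed-write behaviour and its memory cost can be modelled in a few lines of plain Java. This is a sketch, not FlatFileItemWriter itself: the class BufferedWriterModel is hypothetical, and an in-memory list stands in for the actual file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a writer that delays output until commit; not FlatFileItemWriter itself. */
final class BufferedWriterModel {

    private final boolean transactional;
    private final List<String> buffer = new ArrayList<>(); // held in memory until commit
    private final List<String> file = new ArrayList<>();   // stands in for the actual file

    BufferedWriterModel(boolean transactional) {
        this.transactional = transactional;
    }

    void write(String line) {
        if (transactional) {
            buffer.add(line);   // delayed: nothing reaches the file yet
        } else {
            file.add(line);     // written immediately, regardless of the transaction
        }
    }

    void commit() {
        file.addAll(buffer);    // the buffered lines are written out at commit time
        buffer.clear();
    }

    void rollback() {
        buffer.clear();         // lines that already reached the file cannot be taken back
    }

    int bufferedLines() {
        return buffer.size();
    }

    List<String> fileLines() {
        return file;
    }
}
```

In the transactional case the buffer grows with every written line until commit, which is why a single commit over a large data set holds all output in memory.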

On TransactionManager settings in jobs that only handle files

As in the following job definition, the transaction-manager attribute of <batch:tasklet> is mandatory in the xsd schema and cannot be omitted.

Extract of TransactionManager setting part
<batch:tasklet transaction-manager="jobTransactionManager">
    <batch:chunk reader="reader" writer="writer" commit-interval="100" />
</batch:tasklet>

Therefore, always specify jobTransactionManager. In that case, the behavior is as follows.

  • If transactional is true

    • Output to the resource is synchronized with the specified TransactionManager.

  • If transactional is false

    • Transaction processing of the specified TransactionManager runs idle, and data is output to the resource regardless of the transaction.

In this case, transactions are issued to the resource (e.g., a database) referred to by jobTransactionManager, but because no table is accessed, no actual harm is done.

If you do not want transactions issued to that resource even when they are idle, or if they do cause actual harm, you can use ResourcelessTransactionManager, which does not require resources. ResourcelessTransactionManager is defined as jobResourcelessTransactionManager in launch-context.xml.

Sample usage of ResourcelessTransactionManager
<batch:tasklet transaction-manager="jobResourcelessTransactionManager">
    <batch:chunk reader="reader" writer="writer" commit-interval="100" />
</batch:tasklet>

For multiple data sources

Transaction control for the jobs which input / output to multiple data sources is explained. Since points of consideration are different for input and output, these will be explained separately.

Input from multiple data sources

When retrieving data from multiple data sources, the data which is the center of the processing and additional data accompanying it should be retrieved separately. Hereafter, the data which is the center of the processing is referred to as the process target record, and the additional data accompanying it is referred to as accompanying data.

Because of the structure of Spring Batch, ItemReader is based on the premise that it retrieves process target records from one resource. The idea remains the same regardless of the type of resource.

  1. Retrieving process target record

    • Get it by ItemReader.

  2. Retrieving accompanying data

    • The method of retrieving accompanying data must be chosen according to whether the data changes and how many records there are. The following methods are not mutually exclusive; they may be used in combination.

      • Batch retrieval before step execution

      • Retrieve each time according to the record to be processed

When retrieving all at once before step execution

Implement a Listener that does the following, and refer to the data in the subsequent step processing.

  • Retrieve data collectively

  • Store the information in the bean whose scope is Job or Step

    • ExecutionContext of Spring Batch can be used, but a different class can be created to store data considering the readability and maintainability. For the sake of simplicity, the sample will be explained using ExecutionContext.

This method is adopted when reading data that does not depend on the data to be processed, such as master data. However, even for master data, if the number of records is large enough to put pressure on memory, consider retrieving the data each time instead.

Implementation of Listener for collective retrieve
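import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

import javax.inject.Inject;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.StepExecutionListenerSupport;
import org.springframework.stereotype.Component;

@Component
// (1)
public class BranchMasterReadStepListener extends StepExecutionListenerSupport {

    @Inject
    BranchRepository branchRepository;

    @Override
    public void beforeStep(StepExecution stepExecution) {   // (2)

        List<Branch> branches = branchRepository.findAll(); // (3)

        // Key each branch by its ID so that later processing can look it up directly.
        Map<String, Branch> map = branches.stream()
                .collect(Collectors.toMap(Branch::getBranchId,
                        Function.identity()));  // (4)

        stepExecution.getExecutionContext().put("branches", map); // (5)
    }
}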
Definition of Listener for collective retrieve
<batch:job id="outputAllCustomerList01" job-repository="jobRepository">
    <batch:step id="outputAllCustomerList01.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader"
                         processor="retrieveBranchFromContextItemProcessor"
                         writer="writer" commit-interval="10"/>
            <batch:listeners>
                <batch:listener ref="branchMasterReadStepListener"/> <!-- (6) -->
            </batch:listeners>
        </batch:tasklet>
    </batch:step>
</batch:job>
An example of referring data collectively retrieved by the ItemProcessor of the subsequent step
import java.util.Map;

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
public class RetrieveBranchFromContextItemProcessor implements
        ItemProcessor<Customer, CustomerWithBranch> {

    private Map<String, Branch> branches;

    @BeforeStep       // (7)
    @SuppressWarnings("unchecked")
    public void beforeStep(StepExecution stepExecution) {
        branches = (Map<String, Branch>) stepExecution.getExecutionContext()
                .get("branches"); // (8)
    }

    @Override
    public CustomerWithBranch process(Customer item) throws Exception {
        CustomerWithBranch newItem = new CustomerWithBranch(item);
        newItem.setBranch(branches.get(item.getChargeBranchId()));    // (9)
        return newItem;
    }
}
No Description

(1)

Implement the StepExecutionListener interface.
To simplify the implementation, this class extends StepExecutionListenerSupport, which implements the StepExecutionListener interface.

(2)

Implement the beforeStep method to get data before step execution.

(3)

Implement processing to retrieve master data.

(4)

Convert from List type to Map type so that it can be used easily in subsequent processing.

(5)

Set the acquired master data in the context of the step as branches.

(6)

Register the created Listener to the target job.

(7)

In order to acquire master data before step execution of ItemProcessor, set up Listener with @BeforeStep annotation.

(8)

In the method given the @BeforeStep annotation, obtain the master data set in (5) from the context of the step.

(9)

In the process method of ItemProcessor, data is retrieved from the master data.

Object to store in context

The object to be stored in the context(ExecutionContext) must be a class that implements java.io.Serializable. This is because ExecutionContext is stored in JobRepository.
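
Whether an object graph satisfies this requirement can be checked up front with a small plain-Java helper. This is a sketch under stated assumptions: the class ContextValueCheck and the nested Branch stand-in are hypothetical, and the check uses standard Java serialization, which may differ from the serializer actually configured for JobRepository.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical helper; checks a value with standard Java serialization only. */
final class ContextValueCheck {

    private ContextValueCheck() {
    }

    /** Hypothetical stand-in for the Branch class used in the samples. */
    static final class Branch implements Serializable {
        private static final long serialVersionUID = 1L;
        final String branchId;
        final String branchName;

        Branch(String branchId, String branchName) {
            this.branchId = branchId;
            this.branchName = branchName;
        }
    }

    /** Returns true if the whole object graph can be written with Java serialization. */
    static boolean isJavaSerializable(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            // NotSerializableException (an IOException) is thrown for classes
            // that do not implement java.io.Serializable.
            return false;
        }
    }
}
```

Note that the check covers nested values as well: a serializable Map is rejected if any value stored in it is not serializable.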

Retrieving each time according to the record to be processed

A dedicated ItemProcessor, separate from the ItemProcessor for business processing, retrieves the data each time. This simplifies the processing of each ItemProcessor.

  1. For each retrieval, define an ItemProcessor and separate it from business processing.

    • When accessing a table, use MyBatis directly.

  2. Concatenate multiple ItemProcessors using CompositeItemProcessor.

    • Note that ItemProcessor is processed in the order specified in the delegates attribute.

Sample implementation of ItemProcessor designated just for retrieving every time
import javax.inject.Inject;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
public class RetrieveBranchFromRepositoryItemProcessor implements
        ItemProcessor<Customer, CustomerWithBranch> {

    @Inject
    BranchRepository branchRepository;  // (1)

    @Override
    public CustomerWithBranch process(Customer item) throws Exception {
        CustomerWithBranch newItem = new CustomerWithBranch(item);
        newItem.setBranch(branchRepository.findOne(
                item.getChargeBranchId())); // (2)
        return newItem; // (3)
    }
}
Definition sample of ItemProcessor designated just for retrieving every time and ItemProcessor for business process
<bean id="compositeItemProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <ref bean="retrieveBranchFromRepositoryItemProcessor"/> <!-- (4) -->
            <ref bean="businessLogicItemProcessor"/>  <!-- (5) -->
        </list>
    </property>
</bean>
No Description

(1)

Inject Repository for retrieving every time using MyBatis.

(2)

Accompanying data is retrieved from the Repository for input data(process target record).

(3)

Return data with processing target record and accompanying data together.
Notice that this data will be the input data to the next ItemProcessor.

(4)

Set ItemProcessor for retrieving every time.

(5)

Set ItemProcessor for business logic.
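The ordering rule for the delegates can be illustrated with a minimal sketch in plain Java. ProcessorChain is a hypothetical stand-in written for this illustration, not Spring Batch's CompositeItemProcessor; it only shows that the output of each delegate becomes the input of the next, and that a null result stops the chain (mirroring how a null result filters the item in Spring Batch).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a composite processor; not a Spring class. */
class ProcessorChain {
    private final List<Function<Object, Object>> delegates;

    private ProcessorChain(List<Function<Object, Object>> delegates) {
        this.delegates = delegates;
    }

    /** Delegates run in exactly the order in which they are passed here. */
    @SafeVarargs
    static ProcessorChain of(Function<Object, Object>... delegates) {
        return new ProcessorChain(Arrays.asList(delegates));
    }

    Object process(Object item) {
        Object result = item;
        for (Function<Object, Object> delegate : delegates) {
            if (result == null) {
                return null; // item was filtered; later delegates are skipped
            }
            result = delegate.apply(result);
        }
        return result;
    }
}
```

Placing the retrieval delegate first guarantees that the business delegate always receives an item that already carries the accompanying data.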

Output to multiple data sources(multiple steps)

Process multiple data sources throughout the job by dividing the steps for each data source and processing a single data source at each step.

  • Data processed at the first step is stored in a table, and is output to the file in the second step.

  • Each step is simple and easy to recover, but the same work may end up being done twice.

    • Accordingly, if the following adverse effects arise, consider processing multiple data sources in one step.

      • Processing time increases

      • Business logic becomes redundant

Output to multiple data sources(single step)

Generally, when transactions for multiple data sources are combined into one, a distributed transaction based on two-phase commit is used. However, this approach is known to have the following disadvantages.

  • Middleware must be compatible with distributed transaction API such as XAResource, and special setting based on it is required

  • In standalone Java like a batch program, you need to add a JTA implementation library for distributed transactions

  • Recovery in case of failure is difficult

Although distributed transactions can also be used in Spring Batch, global transactions via JTA incur performance overhead due to the characteristics of the protocol. As a simpler way to process multiple data sources together, the Best Efforts 1PC pattern is recommended.

What is Best Efforts 1PC pattern

Briefly, it refers to the technique of handling multiple data sources as local transactions and issuing sequential commits at the same timing. The conceptual diagram is shown in the figure below.

Figure: Conceptual diagram of Best Efforts 1PC pattern
Description of figure
  1. The user instructs ChainedTransactionManager to start the transaction.

  2. ChainedTransactionManager starts a transaction sequentially with registered transaction managers.

  3. The user performs transactional operations on each resource.

  4. The user instructs ChainedTransactionManager to commit.

  5. ChainedTransactionManager issues sequential commits on registered transaction managers.

    • Commit(or roll back) in reverse order of transaction start

Because this method is not a distributed transaction, data consistency may be lost if a failure (exception) occurs during commit or rollback in the second or a subsequent transaction manager. A recovery method must therefore be designed for failures at a transaction boundary. In return, the pattern has the benefit of reducing how often recovery is needed and simplifying the recovery procedure.
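The begin and commit ordering, and the consistency risk that comes with it, can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the Spring Data implementation of ChainedTransactionManager; LocalTx, RecordingTx and BestEffortsOnePhaseCommit are hypothetical names introduced for this example.

```java
import java.util.List;

/** Hypothetical stand-in for one local transaction; not a Spring interface. */
interface LocalTx {
    void begin();
    void commit();
    void rollback();
}

/** Records every call in a shared log and can be told to fail on commit. */
class RecordingTx implements LocalTx {
    private final String name;
    private final boolean failOnCommit;
    private final List<String> log;

    RecordingTx(String name, boolean failOnCommit, List<String> log) {
        this.name = name;
        this.failOnCommit = failOnCommit;
        this.log = log;
    }

    public void begin() { log.add("begin " + name); }

    public void commit() {
        if (failOnCommit) {
            log.add("commit-failed " + name);
            throw new IllegalStateException("commit failed: " + name);
        }
        log.add("commit " + name);
    }

    public void rollback() { log.add("rollback " + name); }
}

/** Starts transactions in list order and commits them in reverse order. */
class BestEffortsOnePhaseCommit {
    private final List<LocalTx> txs;

    BestEffortsOnePhaseCommit(List<LocalTx> txs) { this.txs = txs; }

    void begin() {
        for (LocalTx tx : txs) {
            tx.begin();
        }
    }

    void commit() {
        RuntimeException failure = null;
        for (int i = txs.size() - 1; i >= 0; i--) {
            LocalTx tx = txs.get(i);
            if (failure == null) {
                try {
                    tx.commit();
                } catch (RuntimeException e) {
                    failure = e; // earlier commits cannot be undone
                }
            } else {
                tx.rollback(); // roll back whatever has not been committed yet
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```

With two transactions where the first one registered fails on commit, the second has already been committed by the time the failure occurs. This is the mixed state for which a recovery method has to be designed.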

When processing multiple transactional resources at the same time

Use it when processing multiple databases simultaneously, processing database and MQ, and so on.

Process as 1 phase-commit by defining multiple transaction managers as one using ChainedTransactionManager as follows. Note that ChainedTransactionManager is a class provided by Spring Data.

pom.xml
<dependencies>
    <!-- omitted -->
    <!-- (1) -->
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-commons</artifactId>
    </dependency>
</dependencies>
Sample usage of chainedTransactionManager
<!-- Chained Transaction Manager -->
<!-- (2) -->
<bean id="chainedTransactionManager"
      class="org.springframework.data.transaction.ChainedTransactionManager">
    <constructor-arg>
        <!-- (3) -->
        <list>
            <ref bean="transactionManager1"/>
            <ref bean="transactionManager2"/>
        </list>
    </constructor-arg>
</bean>

<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <!-- (4) -->
        <batch:tasklet transaction-manager="chainedTransactionManager">
            <!-- omitted -->
        </batch:tasklet>
    </batch:step>
</batch:job>
No Description

(1)

Add a dependency to use ChainedTransactionManager.

(2)

Define the bean of ChainedTransactionManager.

(3)

Define, as a list, the transaction managers that you want to combine.

(4)

Specify the bean ID defined in (2) for the transaction manager used by the job.

When processing transactional and nontransactional resources simultaneously

This method is used when processing databases and files at the same time.

For the database, the handling is the same as described in "For a single data source".

For files, setting FlatFileItemWriter’s transactional property to true provides the same effect as the "Best Efforts 1PC pattern" described above.
For details, refer to Note for non-transactional data sources.

This setting delays writing to the file until just before the database transaction is committed, which makes it easy to keep the two data sources synchronized. Even so, if an error occurs during file output after the database commit, data consistency may be lost, so a recovery method must be designed.
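The idea of deferring file output until the commit point can be sketched in plain Java as follows. DeferredLineWriter is a hypothetical class written for this illustration and is not how FlatFileItemWriter is implemented; the method names flushBeforeCommit and discardOnRollback are likewise assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical writer that buffers lines until the transaction commits. */
class DeferredLineWriter {
    private final Writer target;
    private final List<String> pending = new ArrayList<>();

    DeferredLineWriter(Writer target) {
        this.target = target;
    }

    /** Buffers the line in memory; nothing reaches the target yet. */
    void write(String line) {
        pending.add(line);
    }

    /** To be called just before the database commit: writes buffered lines. */
    void flushBeforeCommit() {
        try {
            for (String line : pending) {
                target.write(line);
                target.write('\n');
            }
            target.flush();
            pending.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** To be called on rollback: drops buffered lines without writing them. */
    void discardOnRollback() {
        pending.clear();
    }
}
```

Because lines are held in memory until the commit point, a rollback leaves the output untouched. A failure while flushing, however, still has to be covered by the recovery design.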

Notes on the intermediate commit method

Although it is not recommended, if processing data is skipped in ItemWriter, the chunk size setting is forcibly changed. Note that this has a very large impact on transactions. Refer to Skip for details.
