TERASOLUNA Batch Framework for Java (5.x) Development Guideline - version 5.1.1.RELEASE, 2018-3-16

Overview

This section explains how to create a chunk model job. For the architecture of the chunk model, refer to Spring Batch architecture.

The components of a chunk model job are explained first.

Components

The components of a chunk model job are shown below. A single job is implemented by combining these components in a Bean definition.

Components of chunk model job
Sr. No. Name Role Mandatory settings Mandatory implementation

1

ItemReader

Interface for fetching data from various resources.
Spring Batch provides implementations for flat files and databases,
so users do not need to create their own.

2

ItemProcessor

Interface for processing data between input and output.
Users implement this interface only when required, and put the business logic in it.

3

ItemWriter

Interface for outputting data to various resources.
This interface is the counterpart of ItemReader.
Spring Batch provides implementations for flat files and databases,
so users do not need to create their own.

The key points of this table are as follows.

  • If data is simply transferred from an input resource to an output resource, the job can be built through configuration alone.

  • Implement ItemProcessor only when it is required.

The following sections explain how to implement a job using these components.

How to use

How to implement a chunk model job is explained in the following order: job configuration, then implementation of components.

Job configuration

Define how the elements that constitute a chunk model job are combined in the Bean definition file. An example is shown below, followed by an explanation of how the components relate to one another.

Example of Bean definition file (Chunk model)
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:mybatis="http://mybatis.org/schema/mybatis-spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
             http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
             http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
             http://mybatis.org/schema/mybatis-spring http://mybatis.org/schema/mybatis-spring.xsd">

    <!-- (1) -->
    <import resource="classpath:META-INF/spring/job-base-context.xml"/>

    <!-- (2) -->
    <context:component-scan
        base-package="org.terasoluna.batch.functionaltest.app.common" />

    <!-- (3) -->
    <mybatis:scan
        base-package="org.terasoluna.batch.functionaltest.app.repository.mst"
        factory-ref="jobSqlSessionFactory"/>

    <!-- (4) -->
    <bean id="reader"
          class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
          p:queryId="org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository.findAll"
          p:sqlSessionFactory-ref="jobSqlSessionFactory"/>

    <!-- (5) -->
    <!-- ItemProcessor -->
    <!-- The ItemProcessor is defined as a Bean by annotation and detected by the component scan in (2), so it is not defined here -->

    <!-- (6) -->
    <bean id="writer"
          class="org.springframework.batch.item.file.FlatFileItemWriter"
          scope="step"
          p:resource="file:#{jobParameters['outputFile']}">
        <property name="lineAggregator">
            <bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
                <property name="fieldExtractor">
                    <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
                          p:names="customerId,customerName,customerAddress,customerTel,chargeBranchId"/>
                </property>
            </bean>
        </property>
    </bean>

    <!-- (7) -->
    <batch:job id="jobCustomerList01" job-repository="jobRepository"> <!-- (8) -->
        <batch:step id="jobCustomerList01.step01"> <!-- (9) -->
            <batch:tasklet transaction-manager="jobTransactionManager"> <!-- (10) -->
                <batch:chunk reader="reader"
                             processor="processor"
                             writer="writer"
                             commit-interval="10" /> <!-- (11) -->
            </batch:tasklet>
        </batch:step>
    </batch:job>
</beans>
Configuration of ItemProcessor implementation class
@Component("processor") // (5)
public class CustomerProcessor implements ItemProcessor<Customer, Customer> {
  // omitted.
}
Sr. No. Explanation

(1)

Import the settings so that the Bean definitions required by TERASOLUNA Batch 5.x are always loaded.

(2)

Set the base package for component scan.
If component scan is not used for annotation-based Bean definition but Bean dependencies are still to be resolved by annotation, define the <context:annotation-config/> tag.

(3)

MyBatis-Spring settings.
For the details of MyBatis-Spring settings, refer to Database access.

(4)

ItemReader configuration.
For the details of ItemReader, refer to Database access and File access.

(5)

The ItemProcessor is defined by annotation and detected by the component scan in (2), so it does not need to be defined in the Bean definition file.

(6)

ItemWriter configuration.
For the details of ItemWriter, refer to Database access and File access.

(7)

Job configuration.
The id attribute must be unique across all the jobs included in one batch application.

(8)

JobRepository configuration.
The value set in the job-repository attribute should be fixed to jobRepository unless there is a special reason.
This will allow all the jobs to be managed by one JobRepository. The bean definition of jobRepository is resolved by (1).

(9)

Step configuration.
The id attribute does not have to be unique across all the jobs in one batch application, but a unique id is used so that failures are easy to trace.
Unless there is a special reason to use a different format, form the id by appending [step + serial number] to the id attribute specified in (7).

(10)

Tasklet configuration.
Unless there is a special reason, always set the transaction-manager attribute to jobTransactionManager.
This allows a transaction to be managed for each commit-interval in (11). For details, refer to Transaction control.
The Bean definition of jobTransactionManager is resolved by (1).

(11)

Chunk model job configuration.
Specify the Bean IDs of the ItemReader and ItemWriter defined above in the reader and writer attributes respectively.
Specify the Bean ID of the ItemProcessor implementation class in the processor attribute.
Set the number of input records per chunk in the commit-interval attribute.
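
The way reader, processor, writer and commit-interval work together in (11) can be pictured with the following sketch. It is a simplified illustration under stated assumptions, not the Spring Batch implementation: the ItemProcessor interface is a local copy so that the code compiles on its own, the reader is a plain Iterator, and the class ChunkLoopSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Local copy of the ItemProcessor contract, so the sketch compiles without Spring Batch.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical, simplified picture of a chunk-oriented step.
class ChunkLoopSketch {

    // Each element is one chunk that was "written and committed".
    final List<List<String>> writtenChunks = new ArrayList<>();

    void run(Iterator<Integer> reader, ItemProcessor<Integer, String> processor,
             int commitInterval) throws Exception {
        while (reader.hasNext()) {
            // 1. Read up to commitInterval items.
            List<Integer> inputs = new ArrayList<>();
            while (inputs.size() < commitInterval && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. Process the items one by one; a null result drops the item.
            List<String> outputs = new ArrayList<>();
            for (Integer input : inputs) {
                String output = processor.process(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // 3. Write the whole chunk at once; the transaction would be committed here.
            writtenChunks.add(outputs);
        }
    }

    // Runs the loop over itemCount generated items and returns the number of chunks written.
    static int chunkCount(int itemCount, int commitInterval) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= itemCount; i++) {
            input.add(i);
        }
        ChunkLoopSketch step = new ChunkLoopSketch();
        try {
            step.run(input.iterator(), item -> "item-" + item, commitInterval);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return step.writtenChunks.size();
    }
}
```

In this sketch, 25 input items with a commit-interval of 10 are written as three chunks (10, 10 and 5 items), while a commit-interval of 100 writes them as a single chunk.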

Tuning of commit-interval

commit-interval is the performance tuning point in chunk model job.

The example above uses 10, but the appropriate value depends on the available machine resources and the characteristics of the job. For a job that processes data while accessing multiple resources, throughput may level off at a commit-interval of around 10 to 100 records. For a job that simply transfers data between input and output resources in 1:1 correspondence, throughput may keep improving even at 5,000 or 10,000 records.

As a provisional value, set commit-interval to 100 when implementing the job, and then tune each job according to the results of performance measurements carried out later.

Implementation of components

This section mainly explains how to implement ItemProcessor.

For the other components, ItemReader and ItemWriter, refer to Database access and File access.

Implementation of ItemProcessor

This section explains how to implement ItemProcessor.

As the interface below shows, ItemProcessor is responsible for creating one record of output data from one record of data fetched from the input resource. In other words, ItemProcessor is where the business logic for a single record is implemented.

ItemProcessor interface
public interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

As shown below, the type parameters I and O of the interface can be the same type or different types. Using the same type means partially modifying the input data. Using different types means generating output data based on the input data.

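Example of implementation of ItemProcessor (input and output are of the same type)
import java.math.BigDecimal;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component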
public class AmountUpdateItemProcessor implements
        ItemProcessor<SalesPlanDetail, SalesPlanDetail> {

    @Override
    public SalesPlanDetail process(SalesPlanDetail item) throws Exception {
        item.setAmount(new BigDecimal("1000"));
        return item;
    }
}
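Example of implementation of ItemProcessor (input and output are of different types)
import javax.inject.Inject;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component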
public class UpdateItemFromDBProcessor implements
        ItemProcessor<SalesPerformanceDetail, SalesPlanDetail> {

    @Inject
    CustomerRepository customerRepository;

    @Override
    public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {
        Customer customer = customerRepository.findOne(readItem.getCustomerId());

        SalesPlanDetail writeItem = new SalesPlanDetail();
        writeItem.setBranchId(customer.getChargeBranchId());
        writeItem.setYear(readItem.getYear());
        writeItem.setMonth(readItem.getMonth());
        writeItem.setCustomerId(readItem.getCustomerId());
        writeItem.setAmount(readItem.getAmount());
        return writeItem;
    }
}
Returning null from ItemProcessor

Returning null from ItemProcessor means that the item is not passed to the subsequent process (the ItemWriter). In other words, the item is filtered out. This is useful for validating input data. For details, refer to Input check.
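
A minimal sketch of filtering by returning null is shown here. It assumes a hypothetical item class SalesRecord and a hypothetical validation rule (amounts must be present and non-negative); the ItemProcessor interface is a local copy so that the code compiles without Spring Batch.

```java
import java.math.BigDecimal;

// Local copy of the ItemProcessor contract, so the sketch compiles without Spring Batch.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical item type used only for this illustration.
class SalesRecord {
    final String customerId;
    final BigDecimal amount;

    SalesRecord(String customerId, BigDecimal amount) {
        this.customerId = customerId;
        this.amount = amount;
    }
}

// Passes valid records through unchanged and filters out invalid ones.
class NegativeAmountFilterProcessor implements ItemProcessor<SalesRecord, SalesRecord> {

    @Override
    public SalesRecord process(SalesRecord item) {
        if (item.amount == null || item.amount.signum() < 0) {
            // Returning null means this record never reaches the writer.
            return null;
        }
        return item;
    }
}
```

Whether an invalid record should be silently filtered like this or should stop the job with an exception depends on the requirements of each job.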

To increase process throughput of ItemProcessor

As the previous implementation example shows, an ItemProcessor implementation class may access resources such as databases and files. Because ItemProcessor is executed for every record of input data, even a small amount of I/O per record adds up to a large amount of I/O over the whole job, so keeping I/O to a minimum is important for increasing processing throughput.

One approach is to load the required data into memory in advance by using a Listener (described later), so that most of the processing in ItemProcessor completes using only CPU and memory. However, this consumes a large amount of memory per job, so not everything can be kept in memory. Decide what to hold in memory based on I/O frequency and data size.

This point is also covered in Input/Output of data.
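
The idea of loading data into memory once and then looking it up per record can be sketched as follows. This is only an illustration under assumptions: BranchLookupProcessor, its beforeStep() method and the masterLoader supplier are hypothetical names, the supplier stands in for a single bulk query, and in a real job the loading would be triggered from a listener callback rather than called by hand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Local copy of the ItemProcessor contract, so the sketch compiles without Spring Batch.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical processor that resolves a branch id for each customer id from memory.
class BranchLookupProcessor implements ItemProcessor<String, String> {

    // Stands in for one bulk read of master data (for example, a single SELECT).
    private final Supplier<Map<String, String>> masterLoader;

    // customerId -> branchId, filled once before the first record is processed.
    private Map<String, String> branchByCustomer = new HashMap<>();

    BranchLookupProcessor(Supplier<Map<String, String>> masterLoader) {
        this.masterLoader = masterLoader;
    }

    // Call once before processing starts; this is the only place where I/O happens.
    void beforeStep() {
        branchByCustomer = new HashMap<>(masterLoader.get());
    }

    @Override
    public String process(String customerId) {
        // Pure in-memory lookup: no I/O per record.
        String branchId = branchByCustomer.get(customerId);
        if (branchId == null) {
            return null; // unknown customer: filter the record out
        }
        return customerId + "," + branchId;
    }
}
```

The trade-off is the one described above: the map removes one lookup query per record, but it occupies memory for the whole step, so it suits master data that is small and referenced often.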

Use multiple ItemProcessors at the same time

When a general-purpose ItemProcessor is provided and applied to each job, it can be combined with the job's own ItemProcessor by linking them with CompositeItemProcessor, which Spring Batch provides.

Linking multiple ItemProcessors with CompositeItemProcessor
<bean id="processor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <ref bean="commonItemProcessor"/>
            <ref bean="businessLogicItemProcessor"/>
        </list>
    </property>
</bean>

Note that the delegates are executed in the order in which they are listed in the delegates property.
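
The effect of delegate order can be illustrated with a small stand-in for the chaining behaviour. This is a simplified sketch, not Spring Batch's CompositeItemProcessor: the class ChainSketch and the two lambda delegates are hypothetical, and the ItemProcessor interface is a local copy so that the code compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

// Local copy of the ItemProcessor contract, so the sketch compiles without Spring Batch.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical stand-in that feeds the output of each delegate into the next one.
class ChainSketch implements ItemProcessor<String, String> {

    private final List<ItemProcessor<String, String>> delegates;

    ChainSketch(List<ItemProcessor<String, String>> delegates) {
        this.delegates = delegates;
    }

    @Override
    public String process(String item) throws Exception {
        String current = item;
        for (ItemProcessor<String, String> delegate : delegates) {
            current = delegate.process(current);
            if (current == null) {
                return null; // a delegate filtered the item, so later delegates are skipped
            }
        }
        return current;
    }

    // Runs two illustrative delegates in the same order as the Bean definition above.
    static String demo(String input) {
        ItemProcessor<String, String> common = s -> s + "-common";      // plays commonItemProcessor
        ItemProcessor<String, String> business = s -> s + "-business";  // plays businessLogicItemProcessor
        try {
            return new ChainSketch(Arrays.asList(common, business)).process(input);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, demo("item") yields "item-common-business"; swapping the two entries in the list would yield "item-business-common", which is why the order of the delegates matters.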
