TERASOLUNA Batch Framework for Java (5.x) Development Guideline - version 5.0.1.RELEASE, 2017-9-27
> INDEX

1. Introduction

1.1. Terms of Use

In order to use this document, you are required to agree to abide by the following terms. If you do not agree with the terms, you must immediately delete or destroy this document and all its duplicate copies.

  1. Copyrights and all other rights of this document shall belong to NTT DATA or third party possessing such rights.

  2. This document may be reproduced, translated or adapted, in whole or in part for personal use. However, deletion of the terms given on this page and copyright notice of NTT DATA is prohibited.

  3. This document may be changed, in whole or in part for personal use. Creation of secondary work using this document is allowed. However, “Reference document: TERASOLUNA Batch Framework for Java (5.x) Development Guideline” or equivalent documents may be mentioned in created document and its duplicate copies.

  4. Document and its duplicate copies created according to Clause 2 may be provided to third party only if these are free of cost.

  5. Use of this document and its duplicate copies, and transfer of rights of this contract to a third party, in whole or in part, beyond the conditions specified in this contract, are prohibited without the written consent of NTT Data.

  6. NTT DATA shall not bear any responsibility regarding correctness of contents of this document, warranty of fitness for usage purpose, assurance for accuracy and reliability of usage result, liability for defect warranty, and any damage incurred directly or indirectly.

  7. NTT DATA does not guarantee that this document does not infringe copyrights or any other rights of a third party. In addition, NTT DATA shall not bear any responsibility regarding any claim (including claims arising from disputes with third parties) occurring directly or indirectly due to infringement of copyright or other rights.

Company names, service names, and product names used in this document are registered trademarks or trademarks of their respective companies, as follows.

  • TERASOLUNA is a registered trademark of NTT DATA Corporation.

  • All other company names and product names are the registered trademarks or trademarks of their respective companies.

1.2. Introduction

1.2.1. Goal of guideline

This guideline provides best practices for developing highly maintainable batch applications using a full-stack framework centred on Spring Framework, Spring Batch and MyBatis.

This guideline helps to proceed with the software development (mainly coding) smoothly.

1.2.2. Target readers

This guideline is written for architects and programmers having software development experience and knowledge of the following.

  • Basic knowledge of DI and AOP of Spring Framework

  • Application development experience using Java

  • Knowledge of SQL

  • Experience of using Maven

This guideline is not for beginners.

In order to check whether you have enough basic knowledge to understand this document, refer to the Spring Framework Comprehension Check. If you cannot answer 40% of the comprehension test, it is recommended to study the related books separately.

1.2.3. Structure of guideline

The most important point is that this guideline is designed as a subset of TERASOLUNA Server Framework for Java (5.x) Development Guideline (hereafter referred to as TERASOLUNA Server 5.x Development Guideline). Relying on TERASOLUNA Server 5.x Development Guideline eliminates duplicated explanation and reduces the learning cost as much as possible. Since references to TERASOLUNA Server 5.x Development Guideline appear throughout, we recommend proceeding with development using both guides together.

TERASOLUNA Batch Framework for Java (5.x) concept

Explains the basic concept of batch processing, the basic concept of TERASOLUNA Batch Framework for Java (5.x), and an overview of Spring Batch.

Methodology of application development

Explains the knowledge and method to be kept in mind while developing an application using TERASOLUNA Batch Framework for Java (5.x).

Running a Job

Explains how to run a job synchronously or asynchronously, and how to provide job parameters.

Input/output of data

Explains input/output for various resources such as databases and files.

Handling for abnormal condition

Explains how to handle abnormal conditions such as input check errors and exceptions.

Job management

Explains how to manage the Job execution.

Flow control and parallel/multiple processing

Explains the processing of parallel/multiple Job execution.

Tutorial

Provides hands-on experience of application development with TERASOLUNA Batch Framework for Java (5.x), through building a basic batch application.

1.2.4. How to read guideline

It is strongly recommended for all the developers to read the following contents for using TERASOLUNA Batch Framework for Java (5.x).

The following contents are usually required, so they should be read in advance. It is better to select according to the development target.

Refer to the following contents for the first time when proceeding with the implementation.

Developers who want to experience actual application development using TERASOLUNA Batch Framework for Java (5.x) are recommended to read the following contents. When trying TERASOLUNA Batch Framework for Java (5.x) for the first time, read these contents first and then move on to the others.

1.2.4.1. Notations in guideline

This section describes the notations used in this guideline.

About Windows command prompt and Unix terminal

When the command syntax differs between Windows and Unix, both are described. Otherwise, the Unix notation is used.

Prompt sign

The prompt is written as $ for Unix.

Prompt notation example
$ java -version
About defining properties and constructor arguments in Bean definitions

In this guideline, Bean definitions are described using the p and c namespaces. These namespaces simplify and clarify the description of Bean definitions.

Description wherein namespace is used
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
    <property name="lineTokenizer">
        <bean class="org.terasoluna.batch.item.file.transform.FixedByteLengthLineTokenizer"
              c:ranges="1-6, 7-10, 11-12, 13-22, 23-32"
              c:charset="MS932"
              p:names="branchId,year,month,customerId,amount"/>
    </property>
</bean>

For your reference, the description not using namespace is shown.

Description not using namespace
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
    <property name="lineTokenizer">
        <bean class="org.terasoluna.batch.item.file.transform.FixedByteLengthLineTokenizer">
            <constructor-arg index="0" value="1-6, 7-10, 11-12, 13-22, 23-32"/>
            <constructor-arg index="1" value="MS932"/>
            <property name="names" value="branchId,year,month,customerId,amount"/>
        </bean>
    </property>
</bean>

This guideline does not force you to use these namespaces; they are used here to keep the explanations simple.

1.2.5. Tested environments of guideline

For tested environments of contents described in this guideline, refer to " Tested Environment".

1.3. Change Log

Modified on | Modified locations | Modification details

2017-09-27

-

Released 5.0.1 RELEASE version

General

Description details modified
・Errors in the guideline (typing errors, simple description errors etc.) modified
・Design of the link on the index for header and footer modified (Management ID#196)
・JDK8 dependent code changed to code prior to JDK7 considering it will be used by persons who do not know JDK8 (Management ID#231)

Description details added
・Version information added to header and footer (Management ID#196)

Spring Batch Architecture

Description details added
・Explanation about character string stored in meta data table added (Management ID#233)

Description details deleted
・Explanation about job parameter constraints deleted (Management ID#233)

Create project

Description details modified
・Storage directory of Job Bean definition file of blank project changed (Management ID#161)
・Command execution example and output example modified to show command prompt and Bash examples respectively (Management ID#161)
・archetypeVersion specified while creating a project modified to 5.0.1.RELEASE (Management ID#315)

Create chunk model job

Description details modified
・Explanation of id attribute of Bean definition file modified to a simple expression (Management ID#250)

Create tasklet model job

Description details modified
・Explanation of id attribute of Bean definition file modified to a simple expression (Management ID#250)

Description details added
・Explanation of process units considered at the time of Tasklet implementation added (Management ID#202)

Asynchronous execution (DB polling)

Description details modified
・Suffix of class name modified to Repository (Management ID#241)
・Explanation for job request sequence modified to the details which are not dependent on specific RDBMS products (Management ID#233)

Description details added
・Explanation for character string which is stored in job request table added (Management ID#233)
・Explanation for job request acquisition SQL added (Management ID#233)

Job start-up parameter

Description details modified
・Example for referencing parameters modified so as to enclose the character string literal with single quotes (Management ID#246)

Listener

Description details modified
・JobExecutionListener implementation example modified to present a simpler code example (Management ID#271)

Description details added
・Link for exception handling in ChunkListener explanation added (Management ID#194)
・In case of tasklet model, added precautions to the explanation where the listener is set (Management ID#194)

Transaction control

Description details modified
・Code example of intermediate commit method in tasklet model modified to the example which uses jobResourcelessTransactionManager (Management ID#262)

Description details added
・Explanation of jobResourcelessTransactionManager added to intermediate commit method in tasklet model (Management ID#262)

Database access

Description details added
・Example to update multiple tables by using CompositeItemWriter added (Management ID#226)
・Notes while using Oracle JDBC in Linux environment added (Management ID#237)

File access

Description details modified
・FlatFileItemWriter and StaxEventItemWriter property explanation modified (Management ID#198)

Description details added
・Explanation that unintended file deletion is done by combination of FlatFileItemWriter and StaxEventItemWriter property setting is added (Management ID#198)

Exclusive control

Description details modified
・Code example of pessimistic lock in chunk model modified (Management ID#204)
・Code example of exclusive control of file modified so as to fetch file lock before opening file for exclusion (Management ID#225)

Job management

Description details deleted
・Description related to Spring Batch Admin along with termination of Spring Batch Admin project deleted (Management ID#209)

Customization of exit codes

Description details added
・Explanation for significance of exit codes added to Customization of exit codes (Management ID#294)
・Code example for changing exit codes of step in tasklet model added to Customization of exit codes (Management ID#294)

Message management

Description details modified
・Bean definition of ResourceBundleMessageSource modified (Management ID#266)

Tutorial

New chapter added
・Tutorial added (Management ID#200)

2017-03-17

-

Released 5.0.0 RELEASE version

2. TERASOLUNA Batch Framework for Java (5.x) concept

2.1. Batch Processing in General

2.1.1. Introduction to Batch Processing

The term "batch processing" refers to the execution of a series of jobs in a computer program without manual intervention (non-interactively).
It is often a process of reading, processing and writing a large number of records from a database or a file.
Batch processing has the following features and, compared to online processing, is a processing method which prioritizes throughput over responsiveness.

Common characteristics of batch processing
  • A large amount of data is collected and processed together.

  • Processing runs in a fixed sequence, without interruption, so that its completion time can be guaranteed.

  • Processing runs in accordance with a schedule.

The objectives of batch processing are given below.

Enhanced throughput

Throughput can be enhanced by processing data sets collectively in a batch.
Instead of inputting or outputting data one record at a time, a file or database is read or written in fixed quantities, dramatically reducing the overhead of waiting for I/O and increasing efficiency. Even though the I/O wait for a single record is insignificant, the accumulated wait while processing a large amount of data can cause fatal delay.
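The effect of summing up I/O can be illustrated with a small, framework-independent sketch (plain Java, not Spring Batch code): if each chunk is written in a single I/O call, the number of I/O waits drops from one per record to one per chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: shows how summing up a fixed quantity of data
// reduces the number of I/O waits.
public class ChunkedIo {

    // Splits the input into fixed-size chunks; each chunk would then be
    // written in a single I/O call instead of one call per record.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add(i);
        }
        // Per-record output would wait on I/O 10,000 times;
        // chunked output waits once per chunk.
        System.out.println(chunk(records, 500).size());  // 20
    }
}
```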

Ensuring responsiveness

Processes which are not required to complete immediately are separated out into batch processing in order to keep online processing responsive.
For example, when the result of a process is not needed immediately, only its acceptance is handled by online processing and the rest is performed by batch processing in the background. This processing method is generally called "delayed processing".

Response to time and events

Processes tied to a specific period or event are naturally implemented as batch processing;
for example, aggregating business data per month on the first weekend of the following month,
or taking a backup every Sunday at 2 a.m. in accordance with the system operation rules.

Restriction for coordination with external system

Batch processing is also used due to interface restrictions, such as file-based interaction with external systems.
A file sent from an external system is a summary of data collected over a certain period. Batch processing is better suited than online processing for incorporating such files.

It is very common to combine various techniques to achieve batch processing. Major techniques are introduced here.

Job Scheduler

A single execution unit of batch processing is called a job, and a job scheduler is middleware which manages jobs.
A batch system rarely has only a few jobs; the number of jobs usually reaches hundreds or even thousands. Hence, a dedicated system which defines the relations between jobs and manages their execution schedule becomes indispensable.

Shell script

One of the methods to implement a job. A process is achieved by combining commands provided by the OS and middleware.
Although such a process can be implemented easily, shell script is not suitable for writing complex business logic. Hence, it is primarily used for simple processes such as copying files, taking backups and clearing tables. A shell script may also perform only the pre-start settings and post-execution processing around a process implemented in another programming language.

Programming language

One of the methods to implement a job. Compared to shell script, structured code can be written, which is advantageous for development productivity, maintainability and quality. Hence, it is commonly used to implement business logic which processes data in files or databases and tends to be relatively complex.

2.1.2. Requirements for batch processing

The requirements which batch processing must satisfy in order to implement business processes are given below.

  • Performance improvement

    • A certain quantity of data can be processed in a batch.

    • Jobs can be executed in parallel/in multiple.

  • Recovery in case of an abnormality

    • Jobs can be reexecuted (manual/schedule).

    • At the time of reprocessing, it is possible to process only unprocessed records by skipping processed records.

  • Various activation methods for running jobs

    • Synchronous execution possible.

    • Asynchronous execution possible.

      • DB polling and HTTP requests can be used as triggers for execution.

  • Various input and output interfaces

    • Database

    • File

      • Variable length like CSV or TSV

      • Fixed length

      • XML

Specific details for the above requirements are given below.

A large amount of data can be efficiently processed using certain resources (Performance improvement)

Processing time is reduced by processing data collectively. The important phrase here is "certain resources".
Whether 100 records or 1 million records are processed with a given CPU and memory, processing time should ideally grow slowly and linearly with the number of records. To achieve this, a transaction is started and committed for a certain number of records so that processing is performed collectively; the resources used must be levelled in order to perform I/O collectively.
When even more data must be handled, a further step is to use hardware resources up to their limit. The data to be processed is divided into records or groups, and multiple processes and threads process them concurrently. Taking this further, distributed processing across multiple machines can also be implemented. When resources are used up to the limit, it becomes extremely important to reduce processing time as much as possible.
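The division of data into groups for concurrent processing can be sketched in plain Java. This is a simplified illustration only, not how Spring Batch implements parallel or partitioned steps; summing integers stands in for real record processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: divide records into partitions and process each
// partition on its own thread, then combine the partial results.
public class ParallelPartitions {

    static long processInParallel(List<Integer> records, int partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            int size = (records.size() + partitions - 1) / partitions;  // ceiling division
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < records.size(); i += size) {
                List<Integer> part = records.subList(i, Math.min(i + size, records.size()));
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int v : part) sum += v;   // stand-in for real record processing
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();  // combine partial results
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 100; i++) records.add(i);
        System.out.println(processInParallel(records, 4));  // 5050
    }
}
```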

Continue the processing as much as possible (Recovery at the time of occurrence of abnormality)

When a large amount of data is processed, countermeasures for abnormalities in the input data or the system itself must be considered.
A large amount of data takes a long time to process, and if recovery after an error also takes a long time, the system is likely to be seriously affected.
For example, consider processing 1 billion records: if an error is detected at the 999 millionth record and everything processed so far must be redone, the operation schedule is obviously affected a great deal.
To control this impact, process continuity becomes uniquely important in batch processing. Hence, mechanisms to skip erroneous data and continue with the next record, to restart the process, and to attempt automatic recovery become necessary. Further, it is important to keep each job as simple as possible so that it can easily be re-executed later.
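A minimal sketch of the skip-and-continue idea, where parsing integers stands in for business logic: failed records are recorded for later recovery instead of stopping the whole job. (Spring Batch provides this pattern built in through its skip, retry and restart settings; the class below is illustrative only.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Processes every record, skips the ones that fail, and keeps the failed
// records so they can be corrected and re-processed later.
public class SkipAndContinue {
    final List<Integer> processed = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    void run(List<String> records) {
        for (String record : records) {
            try {
                processed.add(Integer.parseInt(record)); // stand-in for business logic
            } catch (NumberFormatException e) {
                skipped.add(record);                     // skip and continue with the next record
            }
        }
    }

    public static void main(String[] args) {
        SkipAndContinue job = new SkipAndContinue();
        job.run(Arrays.asList("10", "oops", "20"));
        System.out.println(job.processed);  // [10, 20]
        System.out.println(job.skipped);    // [oops]
    }
}
```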

Can be executed flexibly according to triggers of execution (various activation methods)

A mechanism responding to various execution triggers is necessary: triggered by time, by an online request, or by communication with an external system. Various mechanisms are widely known, such as synchronous processing, where a job scheduler starts processing at the scheduled time, and asynchronous processing, where a resident process performs batch processing in response to events.

Handles various input and output interfaces (Various input output interfaces)

It is important to handle databases as well as various file formats such as CSV and XML for linking with online and external systems. Further, a method which transparently handles each input and output type makes implementation easier and lets new formats be supported more quickly.

2.1.3. Rules and precautions to be considered in batch processing

Important rules for building a batch processing system, and a few considerations, are shown below.

  • Simplify unit batch processing as much as possible and avoid complex logical structures.

  • Keep process and data in physical proximity (Save data at the location where process is executed).

  • Minimise the use of system resources (especially I/O) and execute operations in memory as much as possible.

  • Further, review I/O of application (SQL etc) to avoid unnecessary physical I/O.

  • Do not repeat the same process for multiple jobs.

    • For example, in case of counting and reporting process, avoid repetition of counting process during reporting process.

  • Always assume the worst situation related to data consistency. Verify data to check and to maintain consistency.

  • Review backups carefully. Difficulty level of backup will be high especially when system is operational seven days a week.

2.2. TERASOLUNA Batch Framework for Java (5.x) stack

2.2.1. Overview

The configuration of TERASOLUNA Batch Framework for Java (5.x) and its scope of responsibility are explained.

2.2.2. TERASOLUNA Batch Framework for Java (5.x) stack

The software framework used in TERASOLUNA Batch Framework for Java (5.x) is a combination of OSS centred on Spring Framework (Spring Batch). A stack schematic diagram of TERASOLUNA Batch Framework for Java (5.x) is shown below.

TERASOLUNA Batch Framework for Java (5.x) Stack
TERASOLUNA Batch Framework for Java (5.x) stack - schematic diagram

Descriptions for products like job scheduler and database are excluded from this guideline.

2.2.2.1. OSS version to be used

List of OSS versions to be used in 5.0.1.RELEASE of TERASOLUNA Batch Framework for Java (5.x) is given below.

As a rule, the OSS versions used in TERASOLUNA Batch Framework for Java (5.x) conform to the definition of the Spring IO platform. Note that the Spring IO platform version in 5.0.1.RELEASE is Athens-SR2. For details of the Spring IO platform, refer to OSS version to be used of TERASOLUNA Server Framework for Java (5.x).

OSS version list
Type                | GroupId                   | ArtifactId                  | Version       | Spring IO platform | Remarks
--------------------|---------------------------|-----------------------------|---------------|--------------------|--------
Spring              | org.springframework       | spring-aop                  | 4.3.5.RELEASE | *                  |
Spring              | org.springframework       | spring-beans                | 4.3.5.RELEASE | *                  |
Spring              | org.springframework       | spring-context              | 4.3.5.RELEASE | *                  |
Spring              | org.springframework       | spring-expression           | 4.3.5.RELEASE | *                  |
Spring              | org.springframework       | spring-core                 | 4.3.5.RELEASE | *                  |
Spring              | org.springframework       | spring-tx                   | 4.3.5.RELEASE | *                  |
Spring              | org.springframework       | spring-jdbc                 | 4.3.5.RELEASE | *                  |
Spring Batch        | org.springframework.batch | spring-batch-core           | 3.0.7.RELEASE | *                  |
Spring Batch        | org.springframework.batch | spring-batch-infrastructure | 3.0.7.RELEASE | *                  |
Spring Retry        | org.springframework.retry | spring-retry                | 1.1.5.RELEASE | *                  |
Java Batch          | javax.batch               | javax.batch-api             | 1.0.1         | *                  |
Java Batch          | com.ibm.jbatch            | com.ibm.jbatch-tck-spi      | 1.0           | *                  |
MyBatis3            | org.mybatis               | mybatis                     | 3.4.2         |                    |
MyBatis3            | org.mybatis               | mybatis-spring              | 1.3.1         |                    |
MyBatis3            | org.mybatis               | mybatis-typehandlers-jsr310 | 1.0.2         |                    |
DI                  | javax.inject              | javax.inject                | 1             | *                  |
Log output          | ch.qos.logback            | logback-classic             | 1.1.8         | *                  |
Log output          | ch.qos.logback            | logback-core                | 1.1.8         | *                  | *1
Log output          | org.slf4j                 | jcl-over-slf4j              | 1.7.22        | *                  |
Log output          | org.slf4j                 | slf4j-api                   | 1.7.22        | *                  |
Input check         | javax.validation          | validation-api              | 1.1.0.Final   | *                  |
Input check         | org.hibernate             | hibernate-validator         | 5.2.4.Final   | *                  |
Input check         | org.jboss.logging         | jboss-logging               | 3.3.0.Final   | *                  | *1
Input check         | com.fasterxml             | classmate                   | 1.3.3         | *                  | *1
Connection pool     | org.apache.commons        | commons-dbcp2               | 2.1.1         | *                  |
Connection pool     | org.apache.commons        | commons-pool2               | 2.4.2         | *                  |
Expression Language | org.glassfish             | javax.el                    | 3.0.0         | *                  |
In-memory database  | com.h2database            | h2                          | 1.4.193       | *                  |
XML                 | com.thoughtworks.xstream  | xstream                     | 1.4.9         | *                  | *1
XML                 | xmlpull                   | xmlpull                     | 1.1.3.1       |                    | *1
XML                 | xpp                       | xpp3_min                    | 1.1.4c        |                    | *1
JSON                | org.codehaus.jettison     | jettison                    | 1.2           | *                  | *1

Remarks
  *1. Libraries which individually depend on libraries supported by the Spring IO platform

2.2.3. Structural elements of TERASOLUNA Batch Framework for Java (5.x)

Software Framework structural elements of TERASOLUNA Batch Framework for Java (5.x) are explained.

TERASOLUNA Batch Framework for Java (5.x) Components of Software Framework
Schematic diagram of Software Framework structural elements

Overview of each element is shown below.

Foundation framework

Spring Framework is used as a framework foundation. Various functions are applied starting with DI container.

Batch framework

Spring Batch is used as a batch framework.

Asynchronous execution

The following methods are used to achieve asynchronous execution.

Periodic activation by using DB polling

A library offered by TERASOLUNA Batch Framework for Java (5.x) is used.

Web container activation

Link with Spring Batch using Spring MVC.

O/R Mapper

Use MyBatis, and use MyBatis-Spring as a library to coordinate with Spring Framework.

File access

In addition to the functions offered by Spring Batch, TERASOLUNA Batch Framework for Java (5.x) provides auxiliary functions.

Logging

The logger uses SLF4J as the API and Logback as the implementation.

Validation
Unit item check

Bean Validation is used for unit item checks, with Hibernate Validator as the implementation.

Correlation check

Bean Validation or Spring Validation is used for correlation check.

Connection pool

DBCP is used for the connection pool.

Functions implemented by TERASOLUNA Batch Framework for Java (5.x)

The functions for which TERASOLUNA Batch Framework for Java (5.x) provides an implementation are given below.

List of functions implemented by TERASOLUNA Batch Framework for Java (5.x)

Function name

Overview

"Asynchronous execution (DB polling)"

Asynchronous execution using DB polling is implemented.

"File access"

Reads a fixed-length file without line breaks by number of bytes.

Breaks down a fixed-length record into individual fields by number of bytes.

Controls output of enclosing characters for variable-length records.
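The byte-range splitting of a fixed-length record can be illustrated with a small sketch. This is not the implementation of FixedByteLengthLineTokenizer; it only shows the idea of 1-based, inclusive byte ranges such as "1-6, 7-10, 11-12, 13-22, 23-32". ASCII data is assumed for simplicity (real records may use multi-byte charsets such as MS932).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split one fixed-length record into fields by
// byte positions (1-based, inclusive ranges).
public class FixedLengthSplitter {

    static List<String> split(byte[] record, int[][] ranges, Charset charset) {
        List<String> fields = new ArrayList<>();
        for (int[] r : ranges) {
            // Convert 1-based inclusive range [start, end] to offset/length.
            fields.add(new String(record, r[0] - 1, r[1] - r[0] + 1, charset));
        }
        return fields;
    }

    public static void main(String[] args) {
        // A 32-byte record: branchId(6), year(4), month(2), customerId(10), amount(10)
        byte[] record = "000001201701C0000000010000001000"
                .getBytes(StandardCharsets.US_ASCII);
        int[][] ranges = {{1, 6}, {7, 10}, {11, 12}, {13, 22}, {23, 32}};
        System.out.println(split(record, ranges, StandardCharsets.US_ASCII));
        // [000001, 2017, 01, C000000001, 0000001000]
    }
}
```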

2.3. Spring Batch Architecture

2.3.1. Overview

The architecture of Spring Batch, which acts as the base of TERASOLUNA Batch Framework for Java (5.x), is explained.

2.3.1.1. What is Spring Batch

Spring Batch, as the name implies, is a batch application framework. The following functions are offered on top of the DI container, AOP and transaction control functions of Spring.

Functions to standardize process flow
Tasklet model
Simple process

It is a method to freely describe a process. It is used in simple cases, such as issuing SQL once or issuing a command, and in complex cases which are difficult to standardize, such as processing while accessing multiple databases or files.

Chunk model
Efficient processing of a large amount of data

A method to collectively input, process and output a fixed amount of data. The flow of data input, processing and output is standardized, and a job can be implemented by implementing only parts of it.

Various activation methods

Execution can be triggered in various ways, such as command line execution and execution on a Servlet.

I/O of various data formats

Input and output for various data resources like file, database, message queue etc can be performed easily.

Efficient processing

Multiple execution, parallel execution, conditional branching are done based on the settings.

Job execution control

Persistence of execution state, and restart using the processed record count as a baseline, can be performed.

2.3.1.2. Hello, Spring Batch!

If you have not yet worked with Spring Batch, the official documentation given below should be read first. We would like you to get used to Spring Batch by creating a simple application.

2.3.1.3. Basic structure of Spring Batch

Basic structure of Spring Batch is explained.

Spring Batch defines structure of batch process. It is recommended to perform development after understanding the structure.

Spring Batch Main Components
Primary components appearing in Spring Batch
Components | Roles

Job

A single execution unit that summarises a series of processes for batch application in Spring Batch.

Step

A unit of processing which constitutes a Job. One Job can contain 1 to N Steps.
Reusing a process, parallelization and conditional branching can be performed by dividing one Job process into multiple Steps. A Step is implemented in either the chunk model or the tasklet model (described later).

JobLauncher

An interface for running a Job.
JobLauncher can be used directly by the user; however, a batch process can be started simply
by launching CommandLineJobRunner from the java command. CommandLineJobRunner undertakes the various processes needed to start JobLauncher.

ItemReader
ItemProcessor
ItemWriter

An interface for dividing into three processes - input/processing/output of data while implementing chunk model.
Batch application consists of processing of these 3 patterns and in Spring Batch, implementation of these interfaces is utilized primarily in chunk model.
User describes business logic by dividing it according to respective roles.
Since ItemReader and ItemWriter responsible for data input and output are often the processes that perform conversion of database and files to Java objects and vice versa, a standard implementation is provided by Spring Batch. In general batch applications which perform input and output of data from file and database, conditions can be satisfied just by using standard implementation of Spring Batch as it is.
ItemProcessor which is responsible for processing data implements input check and business logic.

In the tasklet model, a single Tasklet interface implementation substitutes for ItemReader/ItemProcessor/ItemWriter. Input/output, input checks and business logic must all be implemented in the Tasklet.

JobRepository

A mechanism to manage the states of Job and Step. The management information is persisted in a database based on the table schema specified by Spring Batch.
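The standardized chunk flow carried out by ItemReader, ItemProcessor and ItemWriter can be sketched in plain Java. The Reader, Processor and Writer interfaces below are simplified stand-ins for illustration only; the real Spring Batch interfaces (ItemReader, ItemProcessor, ItemWriter) have richer signatures and error handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the three chunk-model roles.
interface Reader<I> { I read(); }                   // returns null when input is exhausted
interface Processor<I, O> { O process(I item); }
interface Writer<O> { void write(List<O> items); }

public class ChunkFlow {

    // The standardized chunk loop: read until a chunk is full, process each
    // item, then write the whole chunk in one call (one commit per chunk).
    static <I, O> void run(Reader<I> r, Processor<I, O> p, Writer<O> w, int chunkSize) {
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = r.read()) != null) {
            chunk.add(p.process(item));
            if (chunk.size() == chunkSize) {
                w.write(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) w.write(chunk);       // flush the final partial chunk
    }

    public static void main(String[] args) {
        Iterator<Integer> input = Arrays.asList(1, 2, 3, 4, 5).iterator();
        List<List<Integer>> written = new ArrayList<>();
        run(() -> input.hasNext() ? input.next() : null,  // reader
            i -> i * 10,                                  // processor
            written::add,                                 // writer
            2);
        System.out.println(written);  // [[10, 20], [30, 40], [50]]
    }
}
```

A tasklet, by contrast, would perform the reading, processing and writing itself inside one method instead of delegating to the three roles.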

2.3.2. Architecture

Basic structure of Spring Batch is briefly explained in Overview.

Following points are explained on this basis.

In the end, performance tuning points of batch application which use Spring Batch are explained.

2.3.2.1. Overall process flow

The primary components of Spring Batch and the overall process flow are explained, together with how the metadata on job execution status is managed.

Primary components of Spring Batch and overall process flow (chunk model) are shown in the figure below.

Spring Batch Process Flow
Primary components of Spring Batch and overall process flow

Main processing flow (black line) and the flow which persists job information (red line) are explained.

Main processing flow
  1. JobLauncher is initiated from the job scheduler.

  2. Job is executed from JobLauncher.

  3. Step is executed from Job.

  4. Step fetches input data by using ItemReader.

  5. Step processes input data by using ItemProcessor.

  6. Step outputs processed data by using ItemWriter.

A flow for persisting job information
  1. JobLauncher registers JobInstance in Database through JobRepository.

  2. JobLauncher registers that Job execution has started in Database through JobRepository.

  3. JobStep updates miscellaneous information like counts of I/O records and status in Database through JobRepository.

  4. JobLauncher registers that Job execution has completed in Database through JobRepository.

The components related to persistence, centred on JobRepository, are explained again below.

Components related to persistence
Components | Roles

JobInstance

Indicates a "logical" execution of a Job in Spring Batch. A JobInstance is identified by the Job name and its arguments; in other words, an execution with an identical Job name and arguments is identified as the execution of the same JobInstance, and the Job is executed as a continuation of the previous run.
When the target Job supports re-execution and its previous execution was suspended midway due to an error, the Job is resumed from the middle of the process. On the other hand, when the target Job does not support re-execution, or when the target JobInstance has already completed successfully, an exception is thrown and the Java process terminates abnormally. For example, JobInstanceAlreadyCompleteException is thrown when the process has already completed successfully.

JobExecution
ExecutionContext

JobExecution indicates a "physical" execution of a Job. Unlike JobInstance, re-executing the identical Job produces a new JobExecution. As a result, JobInstance and JobExecution have a one-to-many relationship.
ExecutionContext is an area for sharing metadata, such as the progress of a process, within one JobExecution. ExecutionContext is primarily used by Spring Batch to record framework state, but means for the application to access ExecutionContext are also provided.
The object stored in the ExecutionContext of JobExecution must be a class which implements java.io.Serializable.

StepExecution
ExecutionContext

StepExecution indicates a "physical" execution of a Step. JobExecution and StepExecution have a one-to-many relationship.
Similar to JobExecution, ExecutionContext is an area for sharing data within a Step. From the viewpoint of localizing data, information which does not need to be shared by multiple steps should use the ExecutionContext of the target Step instead of the ExecutionContext of the Job.
An object stored in the ExecutionContext of the Step must be a class which implements java.io.Serializable.

JobRepository

A function to manage and persist data for managing the execution results and status of a batch application, such as JobExecution and StepExecution, is provided.
In a typical batch application, the process starts with the start of a Java process and ends with its termination. Since this data is likely to be referenced across Java processes, it must be stored not in volatile memory but in a permanent layer such as a database. Storing it in a database requires database objects such as tables and sequences for holding JobExecution and StepExecution.
These database objects must be generated based on the schema information provided by Spring Batch.

Spring Batch manages a large amount of metadata in order to support re-execution. To re-execute a batch process, a snapshot of the earlier execution must be retained, and JobRepository and its metadata serve as the basis for this.
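The JobInstance identification described above (identical job name and parameters map to the same instance) can be sketched in plain Java. The jobKey helper below is hypothetical; it only illustrates the idea behind the JOB_KEY stored for a JobInstance, not Spring Batch's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class JobKeyDemo {

    // Hypothetical helper: derive a reproducible key from job parameters,
    // illustrating how identical parameters identify the same JobInstance.
    static String jobKey(Map<String, String> params) throws Exception {
        StringBuilder sb = new StringBuilder();
        // Sort the parameters so their ordering does not affect the key
        new TreeMap<>(params).forEach((k, v) -> sb.append(k).append('=').append(v).append(';'));
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(sb.toString().getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String k1 = jobKey(Map.of("targetDate", "2017-09-27"));
        String k2 = jobKey(Map.of("targetDate", "2017-09-27"));
        String k3 = jobKey(Map.of("targetDate", "2017-09-28"));
        System.out.println(k1.equals(k2)); // identical parameters: same instance key
        System.out.println(k1.equals(k3)); // different parameters: different instance key
    }
}
```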

2.3.2.2. Running a Job

How to run a Job is explained.

A scenario is considered wherein a batch process is started immediately after starting Java process and Java process is terminated after completing a batch process. Figure below shows a process flow from starting a Java process till starting a batch process.

Job Launch Flow
Process flow from starting a Java process till starting a batch process
Starting a Java process and starting a Job

To start a Job defined on Spring Batch, a shell script that starts Java is generally written, so that the Job is started along with the Java process. When CommandLineJobRunner offered by Spring Batch is used, a Job defined by the user can be started easily.

The start command of a Job which uses CommandLineJobRunner is shown below.

Start command when a Bean is defined by using XML
java -cp ${CLASSPATH} org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <jobName> <JobArgumentName1>=<value1> <JobArgumentName2>=<value2> ...
Specifying a Job parameter

CommandLineJobRunner can pass arguments (job parameters) along with the name of the Job to be started. Arguments are specified in <JobArgumentName>=<value> format as in the example above. All arguments are interpreted and checked by CommandLineJobRunner and JobLauncher, converted to JobParameters, and stored in JobExecution. For details, refer to running parameter of Job.
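As a rough illustration of the <JobArgumentName>=<value> interpretation described above, the following self-contained sketch parses command-line arguments into a parameter map. It is not the actual CommandLineJobRunner or JobParameters implementation, only a simplified model of it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JobArgsDemo {

    // Simplified sketch of the <name>=<value> interpretation performed
    // before job parameter values reach JobExecution (illustration only).
    static Map<String, String> parse(String... args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected <name>=<value>: " + arg);
            }
            params.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("targetDate=2017-09-27", "mode=daily");
        System.out.println(p.get("targetDate"));
        System.out.println(p.get("mode"));
    }
}
```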

Register and restore JobInstance

JobLauncher fetches, through JobRepository, the JobInstance matching the job name and arguments from the database.

  • When corresponding JobInstance does not exist, JobInstance is registered as new.

  • When corresponding JobInstance exists, the associated JobExecution is restored.

    • In Spring Batch, for jobs that are executed repeatedly, such as daily executions, a method of adding arguments solely to make the JobInstance unique is available; for example, adding a system date or a random number to the arguments.
      For the method recommended in this guideline, refer to the parameter conversion class.

2.3.2.3. Execution of business logic

In Spring Batch, a Job is divided into smaller units called Steps. When a Job is started, the Job activates its registered Steps and generates a StepExecution for each. A Step is a framework for dividing the process up to its end, and execution of the business logic is delegated to a Tasklet called from the Step.

Flow from Step to Tasklet is shown below.

Step-Tasklet Flow
Process flow from Step to Tasklet

Two implementation methods of Tasklet can be listed: the "chunk model" and the "tasklet model". Since the overview has already been explained, their structure is explained here.

2.3.2.3.1. Chunk model

As described above, the chunk model is a method wherein processing is performed in units of a certain number of records (chunks) rather than one record at a time. ChunkOrientedTasklet is the concrete Tasklet class which supports chunk processing. The maximum number of data records included in a chunk (hereafter referred to as "chunk size") can be adjusted through the commit-interval setting of this class. ItemReader, ItemProcessor and ItemWriter are all interfaces based on chunk processing.

Next, explanation is given about how ChunkOrientedTasklet calls the ItemReader, ItemProcessor and ItemWriter.

A sequence diagram wherein ChunkOrientedTasklet processes one chunk is shown below.

Sequence of Chunk processing with ChunkOrientedTasklet
Chunk processing by using ChunkOrientedTasklet

ChunkOrientedTasklet repeatedly executes ItemReader and ItemProcessor, in other words the reading and processing of data, up to the chunk size. After all the data for the chunk has been read, the writing process of ItemWriter is called only once, and all the processed data in the chunk is passed to it. The data update process is designed to be called once per chunk so that optimizations such as JDBC's addBatch and executeBatch can be applied easily.
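The cycle described above can be sketched in plain Java as follows. This is a minimal illustration of the loop ChunkOrientedTasklet performs (read and process per item, write once per chunk), assuming a simple list as the input source rather than the real Spring Batch interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopDemo {

    // Minimal sketch of what ChunkOrientedTasklet does: read and process
    // item by item up to commit-interval, then hand the whole chunk to the
    // writer in a single call. Not the actual Spring Batch implementation.
    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6, 7);
        int commitInterval = 3;
        Iterator<Integer> reader = input.iterator();    // stands in for ItemReader
        List<String> writtenChunks = new ArrayList<>();

        while (reader.hasNext()) {
            List<String> chunk = new ArrayList<>();
            // Repeat read + process until the chunk is full or input runs out
            while (chunk.size() < commitInterval && reader.hasNext()) {
                int item = reader.next();               // like ItemReader.read()
                chunk.add("item-" + item);              // like ItemProcessor.process()
            }
            // Like ItemWriter.write(chunk): called once per chunk
            writtenChunks.add(String.join(",", chunk));
        }
        System.out.println(writtenChunks.size()); // 7 items / interval 3 -> 3 writes
        System.out.println(writtenChunks.get(2));
    }
}
```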

Next, ItemReader, ItemProcessor and ItemWriter which are responsible for actual processing in chunk processing are introduced. Although it is assumed that the user handles his own implementation for each interface, it can also be covered by a generic concrete class provided by Spring Batch.

In particular, since ItemProcessor describes the business logic itself, concrete classes for it are hardly provided by Spring Batch; the ItemProcessor interface is implemented to describe the business logic. ItemProcessor is designed so that the types of the objects used for input and output are specified with generics, enabling type-safe programming.

An implementation example of a simple ItemProcessor is shown below.

Implementation example of ItemProcessor
public class MyItemProcessor implements
        ItemProcessor<MyInputObject, MyOutputObject> { // (1)

    @Override
    public MyOutputObject process(MyInputObject item) throws Exception { // (2)

        MyOutputObject processedObject = new MyOutputObject(); // (3)

        // Coding business logic for item of input data

        return processedObject; // (4)
    }
}
Sr. No. Description

(1)

Implement ItemProcessor interface which specifies the types of objects used for input and output.

(2)

Implement process method. Argument item is input data.

(3)

Create output object and store business logic results processed for the input data item.

(4)

Return output object.

Various concrete classes are offered by Spring Batch for ItemReader or ItemWriter and these are used quite frequently. However, when a file of specific format is to be input or output, a concrete class which implements individual ItemReader or ItemWriter can be created and used.

For implementation of business logic while developing actual application, refer application development flow.

Representative concrete classes of ItemReader, ItemProcessor and ItemWriter offered by Spring Batch are shown in the end.

Representative concrete classes of ItemReader, ItemProcessor and ItemWriter offered by Spring Batch
Interface Concrete class name Overview

ItemReader

FlatFileItemReader

Read flat files (non-structured files) such as CSV files. A Resource object is used as input, and mapping rules for delimiters and objects can be customised.

StaxEventItemReader

Read XML file. As the name implies, it is an implementation which reads a XML file based on StAX.

JdbcPagingItemReader
JdbcCursorItemReader

Execute SQL by using JDBC and read records on the database. When a large amount of data is to be processed on the database, it is necessary to avoid reading all the records into memory at once, and to read and discard only the data needed for one iteration of processing.
JdbcPagingItemReader is implemented with JdbcTemplate by dividing the SELECT SQL into pages and issuing it page by page. JdbcCursorItemReader, on the other hand, is implemented by issuing one SELECT SQL using a JDBC cursor.
Using MyBatis is regarded as the standard in TERASOLUNA Batch 5.x.

MyBatisCursorItemReader
MyBatisPagingItemReader

Read records on the database in coordination with MyBatis. They are provided by MyBatis-Spring, the Spring coordination library offered by MyBatis. The difference between Paging and Cursor is the same as for JdbcXXXItemReader, except that MyBatis is used for the implementation. In addition, JpaPagingItemReader, HibernatePagingItemReader and HibernateCursorItemReader are provided as ItemReaders which read records on the database in coordination with a JPA implementation or Hibernate.

Using MyBatisCursorItemReader is considered as a base in TERASOLUNA Batch 5.x.

JmsItemReader
AmqpItemReader

Receive messages from JMS or AMQP and read the data contained in the same.

ItemProcessor

PassThroughItemProcessor

No operation is performed. It is used when processing and modification of input data is not required.

ValidatingItemProcessor

Performs input checks. Implementing input check rules requires the Spring Batch specific org.springframework.batch.item.validator.Validator; however,
SpringValidator, an adaptor for the general org.springframework.validation.Validator offered by Spring, is provided, so rules written for org.springframework.validation.Validator can be used.
Use of ValidatingItemProcessor is prohibited in TERASOLUNA Batch 5.x.
For details, refer Input check.

CompositeItemProcessor

Sequentially executes multiple ItemProcessors for identical input data. It is useful when business logic is to be executed after performing an input check with ValidatingItemProcessor.

ItemWriter

FlatFileItemWriter

Write processed Java objects as a flat file such as a CSV file. Mapping rules for file lines, delimiters, and objects can be customised.

StaxEventItemWriter

Write processed Java objects as an XML file.

JdbcBatchItemWriter

Execute SQL by using JDBC and output processed Java objects to the database. JdbcTemplate is used internally.

MyBatisBatchItemWriter

Coordinate with MyBatis to output processed Java objects to the database. It is provided by MyBatis-Spring, the Spring coordination library offered by MyBatis.
JpaItemWriter for JPA implementations and HibernateItemWriter for Hibernate are not used in TERASOLUNA Batch 5.x.

JmsItemWriter
AmqpItemWriter

Send a processed Java object as a message with JMS or AMQP.

PassThroughItemProcessor omitted

When a job is defined in XML, the ItemProcessor setting can be omitted. When it is omitted, the input data is passed to ItemWriter without any processing, as with PassThroughItemProcessor.

ItemProcessor omitted
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet>
            <batch:chunk reader="reader" writer="writer" commit-interval="10" />
        </batch:tasklet>
    </batch:step>
</batch:job>
2.3.2.3.2. Tasklet model

The chunk model is a framework suitable for batch applications that read input data records one by one and perform a series of processing steps. However, some processes do not fit the style of chunk processing; for example, executing a system command, or updating only one record of a control table.

In such cases, the efficiency gained from chunk processing is small, while the drawbacks of its more difficult design and implementation are significant. Hence, it is rational to use the tasklet model.

When using the tasklet model, the user must implement the Tasklet interface provided by Spring Batch. Although the following concrete classes are provided by Spring Batch, they are not described further in TERASOLUNA Batch 5.x.

Concrete classes of Tasklet offered by Spring Batch
Class name Overview

SystemCommandTasklet

Tasklet to execute system commands asynchronously. The command to execute is specified in the command property.
Since the system command is executed by a thread different from the calling thread, it is possible to set a timeout and to cancel the execution thread of the system command during the process.

MethodInvokingTaskletAdapter

Tasklet for executing a specific method of a POJO class. Specify the Bean of the target class in the targetObject property and the name of the method to be executed in the targetMethod property.
The POJO class can return the batch process termination status as the return value of the method; in that case, the ExitStatus described later must be set as the return value. When a value of another type is returned, the status is considered "normal termination" (ExitStatus: COMPLETED) regardless of the return value.
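A Bean definition using MethodInvokingTaskletAdapter might look as follows; the bean names myTasklet and messageService and the POJO class com.example.MessageService are hypothetical.

```xml
<!-- Bean names "myTasklet" and "messageService" and the POJO class are illustrative. -->
<bean id="messageService" class="com.example.MessageService" />

<bean id="myTasklet"
      class="org.springframework.batch.core.step.tasklet.MethodInvokingTaskletAdapter">
    <property name="targetObject" ref="messageService" />
    <property name="targetMethod" value="execute" />
</bean>
```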

2.3.2.4. Metadata schema of JobRepository

Metadata schema of JobRepository is explained.

Note that the overall picture is explained here, including the contents described in the Spring Batch reference, Appendix B. Meta-Data Schema.

Each Spring Batch metadata table corresponds to a domain object (Entity object) represented in Java.

Correspondence list

Table

Entity object

Overview

BATCH_JOB_INSTANCE

JobInstance

Retains the job name and a serialised string of the job parameters.

BATCH_JOB_EXECUTION

JobExecution

Retains job status and execution results.

BATCH_JOB_EXECUTION_PARAMS

JobExecutionParams

Retains job parameters assigned at the startup.

BATCH_JOB_EXECUTION_CONTEXT

JobExecutionContext

Retains the context inside the job.

BATCH_STEP_EXECUTION

StepExecution

Retains status and execution results of step, number of commits and rollbacks.

BATCH_STEP_EXECUTION_CONTEXT

StepExecutionContext

Retains context inside the step.

JobRepository is responsible for accurately storing the contents stored in each Java object, in the table.

Regarding the character string stored in the meta data table

The character strings stored in the metadata table are restricted to a maximum number of characters, and when this limit is exceeded, the character string is truncated.
Note that Spring Batch does not take multibyte characters into consideration, so an error is likely to occur with the DDL of the metadata table offered by Spring Batch even if the character string to be stored is within the character limit. To store multibyte characters, the column sizes of the metadata table must be extended in accordance with the encoding, and the character data types must be defined by character count.

Since the character data types of Oracle are defined by byte count by default, TERASOLUNA Batch 5.x offers a DDL for Oracle, based on the schema provided by Spring Batch, that explicitly defines the character data types by character count.
The offered DDL is included in the org.terasoluna.batch package contained in the jar of TERASOLUNA Batch 5.x.

An ER diagram of all six tables and their interrelations is shown below.

ER Diagram
ER diagram
2.3.2.4.1. Version

Most of the database tables contain a version column. This column is important since Spring Batch adopts an optimistic locking strategy to handle updates to the database. The value of the version column is incremented each time the record is updated. When JobRepository updates a record and finds that the version number has been changed, an OptimisticLockingFailureException is thrown, indicating a simultaneous-access error. Batch jobs may be running on different machines, but since all the jobs use the same database, this check is required.
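The optimistic locking strategy above can be sketched with a minimal in-memory example. The version check below only illustrates the idea; Spring Batch performs the equivalent check in SQL and throws OptimisticLockingFailureException on a conflict.

```java
public class OptimisticLockDemo {

    // Sketch of the optimistic locking check on the version column: an
    // update succeeds only if the version read earlier is still current;
    // otherwise a concurrent update is detected (illustration only).
    static int storedVersion = 0;

    static void update(int expectedVersion) {
        if (storedVersion != expectedVersion) {
            // Spring Batch throws OptimisticLockingFailureException here
            throw new IllegalStateException("version changed concurrently");
        }
        storedVersion = expectedVersion + 1; // increment on successful update
    }

    public static void main(String[] args) {
        int readA = storedVersion; // two clients both read version 0
        int readB = storedVersion;
        update(readA);             // first update succeeds, version becomes 1
        try {
            update(readB);         // second update sees a stale version
        } catch (IllegalStateException e) {
            System.out.println("conflict detected");
        }
        System.out.println(storedVersion);
    }
}
```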

2.3.2.4.2. ID (Sequence) definition

BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION each contain a column ending with _ID. These fields act as primary keys for their respective tables. However, the keys are not generated in the database but by separate sequences. After inserting one of the domain objects into the database, the key assigned to it should be set on the actual object so that it can be uniquely identified in Java.
Sequences may not be supported depending on the database. In that case, a table is used instead of each sequence.

2.3.2.4.3. Table definition

Explanation is given for each table item.

BATCH_JOB_INSTANCE

BATCH_JOB_INSTANCE table retains all the information related to JobInstance and is at top level of the overall hierarchy.

BATCH_JOB_INSTANCE definition
Column name Description

JOB_INSTANCE_ID

A primary key which is a unique ID identifying an instance.

VERSION

Refer Version.

JOB_NAME

Job name. A non-null value since it is necessary for identifying an instance.

JOB_KEY

Serialised JobParameters for uniquely identifying executions of the same job as different instances.
JobInstances with the same job name must have different JobParameters (in other words, different JOB_KEY values).

BATCH_JOB_EXECUTION

BATCH_JOB_EXECUTION table retains all the information related to the JobExecution object. When a job is executed, a new row with a new JobExecution is always registered in the table.

BATCH_JOB_EXECUTION definition
Column name Description

JOB_EXECUTION_ID

Primary key that uniquely identifies this job execution.

VERSION

Refer Version.

JOB_INSTANCE_ID

Foreign key from the BATCH_JOB_INSTANCE table which indicates the instance to which this job execution belongs. Multiple executions may exist for each instance.

CREATE_TIME

Time when the job execution was created.

START_TIME

Time when the job execution was started.

END_TIME

Time when the job execution terminated, regardless of whether it was successful or failed.
An empty value in this column even though the job is not currently running indicates that some type of error occurred and the framework was unable to perform the last save operation.

STATUS

A character string which indicates job execution status. It is a character string output by BatchStatus enumeration object.

EXIT_CODE

A character string which indicates the exit code of the job execution. When the job is activated by CommandLineJobRunner, it is converted to a numeric value.

EXIT_MESSAGE

A character string which explains the job termination status in detail. When a failure occurs, a character string that includes as many stack traces as possible is likely.

LAST_UPDATED

Time when job execution of the record was last updated.

BATCH_JOB_EXECUTION_PARAMS

BATCH_JOB_EXECUTION_PARAMS table retains all the information related to the JobParameters object. It contains zero or more key/value pairs passed to the job, and records the parameters with which the job was executed.

BATCH_JOB_EXECUTION_PARAMS definition
Column name Description

JOB_EXECUTION_ID

Foreign key from the BATCH_JOB_EXECUTION table which indicates the job execution to which the job parameter belongs.

TYPE_CD

A character string which indicates whether the data type is string, date, long or double.

KEY_NAME

Parameter key.

STRING_VAL

Parameter value when data type is string.

DATE_VAL

Parameter value when data type is date.

LONG_VAL

Parameter value when data type is an integer.

DOUBLE_VAL

Parameter value when data type is a real number.

IDENTIFYING

A flag which indicates whether the parameter is a value used to identify the job instance uniquely.

BATCH_JOB_EXECUTION_CONTEXT

BATCH_JOB_EXECUTION_CONTEXT table retains all the information related to ExecutionContext of Job. It contains all the job level data required for execution of specific jobs. The data indicates the status that must be fetched when the process is to be executed again after a job failure and enables the failed job to start from the point where processing has stopped.

BATCH_JOB_EXECUTION_CONTEXT definition
Column name Description

JOB_EXECUTION_ID

A foreign key from the BATCH_JOB_EXECUTION table which indicates the job execution to which the ExecutionContext of the Job belongs.

SHORT_CONTEXT

A string representation of SERIALIZED_CONTEXT.

SERIALIZED_CONTEXT

Overall serialised context.

BATCH_STEP_EXECUTION

BATCH_STEP_EXECUTION table retains all the information related to the StepExecution object. This table is very similar to the BATCH_JOB_EXECUTION table in many ways. For each JobExecution created, at least one entry exists per Step.

BATCH_STEP_EXECUTION definition
Column name Description

STEP_EXECUTION_ID

Primary key that uniquely identifies the step execution.

VERSION

Refer Version.

STEP_NAME

Step name.

JOB_EXECUTION_ID

Foreign key from the BATCH_JOB_EXECUTION table which indicates the JobExecution to which the StepExecution belongs.

START_TIME

Time when step execution was started.

END_TIME

Time when the step execution terminated, regardless of whether it was successful or failed.
An empty value in this column even though the job is not currently running indicates that some type of error occurred and the framework was unable to perform the last save operation.

STATUS

A character string that represents the status of the step execution. It is a character string output by the BatchStatus enumeration object.

COMMIT_COUNT

Number of times a transaction is committed.

READ_COUNT

Number of data records read by ItemReader.

FILTER_COUNT

Number of data records filtered by ItemProcessor.

WRITE_COUNT

Number of data records written by ItemWriter.

READ_SKIP_COUNT

Number of data records skipped by ItemReader.

WRITE_SKIP_COUNT

Number of data records skipped by ItemWriter.

PROCESS_SKIP_COUNT

Number of data records skipped by ItemProcessor.

ROLLBACK_COUNT

Number of times a transaction is rolled back.

EXIT_CODE

A character string which indicates the exit code of the step execution. When activated by CommandLineJobRunner, it is converted to a numeric value.

EXIT_MESSAGE

A character string which explains the step termination status in detail. When a failure occurs, a character string that includes as many stack traces as possible is likely.

LAST_UPDATED

Time when the step execution of the record was last updated.

BATCH_STEP_EXECUTION_CONTEXT

BATCH_STEP_EXECUTION_CONTEXT table retains all the information related to ExecutionContext of Step. It contains all the step level data required for execution of specific steps. The data indicates the status that must be fetched when the process is to be executed again after a job failure and enables the failed job to start from the point where processing has stopped.

BATCH_STEP_EXECUTION_CONTEXT definition
Column name Description

STEP_EXECUTION_ID

Foreign key from the BATCH_STEP_EXECUTION table which indicates the step execution to which the ExecutionContext of the Step belongs.

SHORT_CONTEXT

String representation of SERIALIZED_CONTEXT.

SERIALIZED_CONTEXT

Overall serialized context.

2.3.2.4.4. DDL script

The JAR file of Spring Batch Core contains sample scripts which create the relational tables for several database platforms. These scripts can be used as they are, or modified with additional indexes and constraints as required.
The scripts are included in the org.springframework.batch.core package, and the file names follow the pattern schema-*.sql, where "*" is the short name of the target database platform.

2.3.2.5. Typical performance tuning points

Typical performance tuning points in Spring Batch are explained.

Adjustment of chunk size

The chunk size is increased to reduce the overhead caused by resource output.
However, if the chunk size is too large, the load on the resources increases and the performance deteriorates instead. Hence, the chunk size must be adjusted to a moderate value.

Adjustment of fetch size

The fetch size (buffer size) for the resource is increased to reduce the overhead caused by input from the resource.

Reading of a file efficiently

When BeanWrapperFieldSetMapper is used, a record can be mapped to a Bean simply by specifying the Bean class and the property names in order. However, it performs complex operations internally, which takes time. The processing time can be reduced by using a dedicated implementation of the FieldSetMapper interface which performs the mapping.
For details on file I/O, refer to "File access".

Parallel processing, Multiple processing

Spring Batch supports parallel processing of Step executions and multiple processing by data distribution. By running processes in parallel or in multiples, performance can be improved. However, if the number of parallel or multiple processes is too large, the load on the resources increases and the performance deteriorates. Hence, the number must be adjusted to a moderate value.
For details of parallel and multiple processing, refer to "parallel processing and multiple processing".

Reviewing distributed processing

Spring Batch also supports distributed processing across multiple machines. The guidelines are the same as for parallel and multiple processing.
Distributed processing is not explained in this guideline since its basic design and operational design are complex.

2.4. Architecture of TERASOLUNA Batch Framework for Java (5.x)

2.4.1. Overview

Overall architecture of TERASOLUNA Batch Framework for Java (5.x) is explained.

As described in "General batch processing system", TERASOLUNA Batch Framework for Java (5.x) is implemented by a combination of OSS centered on Spring Batch.

A configuration schematic diagram of TERASOLUNA Batch Framework for Java (5.x) including hierarchy architecture of Spring Batch is shown below.

TERASOLUNA Batch Framework for Java (5.x) Stack
Configuration schematic diagram of TERASOLUNA Batch Framework for Java (5.x)
Description of hierarchy architecture of Spring Batch
Business Application

All job definitions and business logic written by developers.

spring batch core

Core runtime classes offered by Spring Batch, required to start and control batch jobs.

spring batch infrastructure

Implementations of the general-purpose ItemReader/ItemProcessor/ItemWriter offered by Spring Batch, used by developers and by the core framework itself.

2.4.2. Structural elements of job

A configuration schematic diagram of jobs is shown below in order to explain structural elements of the job.

Job Components
Configuration schematic diagram of job

This section also describes guidelines on the granularity into which jobs and steps should be divided.

2.4.2.1. Job

A job is an entity that encapsulates entire batch process and is a container for storing steps.
A job can consist of one or more steps.

A job is defined in a Bean definition file by using XML. Multiple jobs can be defined in one job definition file; however, managing the jobs then tends to become complex.

Hence, TERASOLUNA Batch Framework for Java (5.x) uses following guidelines.

1 job = 1 job definition file

2.4.2.2. Step

Step defines the information required for controlling a batch process. A chunk model or a tasklet model can be defined in a step.

Chunk model
  • It is configured by ItemReader, ItemProcessor and ItemWriter.

Tasklet model
  • It is configured only by Tasklet.

As given in "Rules and precautions to be considered in batch processing", it is necessary to simplify as much as possible and avoid complex logical structures in a single batch process.

Hence, TERASOLUNA Batch Framework for Java (5.x) uses following guidelines.

1 step = 1 batch process = 1 business logic

Distribution of business logic in chunk model

If a single business logic is complex and large-scale, the business logic is divided into units. As is clear from the schematic diagram, only one ItemProcessor can be set in one step, so at first glance dividing the business logic looks impossible. However, since CompositeItemProcessor, an ItemProcessor consisting of multiple ItemProcessors, exists, the business logic can be divided and executed by using this implementation.
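As a sketch, such a division can be wired with CompositeItemProcessor as shown below; the delegate bean names validationProcessor and businessLogicProcessor are hypothetical.

```xml
<!-- Bean names "validationProcessor" and "businessLogicProcessor" are illustrative. -->
<bean id="compositeProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <ref bean="validationProcessor" />
            <ref bean="businessLogicProcessor" />
        </list>
    </property>
</bean>
```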

2.4.3. How to implement Step

2.4.3.1. Chunk model

Definition of chunk model and purpose of use are explained.

Definition

ItemReader, ItemProcessor and ItemWriter implementation and number of chunks are set in ChunkOrientedTasklet. Respective roles are explained.

  • ChunkOrientedTasklet…​Call ItemReader/ItemProcessor and create a chunk. Pass created chunk to ItemWriter.

  • ItemReader…​Read input data.

  • ItemProcessor…​Process read data.

  • ItemWriter…​Output processed data in chunk units.

For overview of chunk model, refer "Chunk model".

How to set a job in chunk model
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet>
            <batch:chunk reader="reader"
                         processor="processor"
                         writer="writer"
                         commit-interval="100" />
        </batch:tasklet>
    </batch:step>
</batch:job>
Purpose of use

Since it handles a certain amount of data collectively, it is used while handling a large amount of data.

2.4.3.2. Tasklet model

Definition of the tasklet model and its purpose of use are explained.

Definition

Only a Tasklet implementation is set.
For an overview of the tasklet model, refer to "Tasklet model".

How to set a job in the tasklet model
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet ref="myTasklet" />
    </batch:step>
</batch:job>
Purpose of use

It can be used for executing a process which is not associated with I/O, such as the execution of system commands.
Further, it can also be used when committing the data in a single batch.

2.4.3.3. Function difference between chunk model and tasklet model

The functional differences between the chunk model and the tasklet model are explained. Only an outline is given here; refer to the section on each function for details.

List of function differences
Function Chunk model Tasklet model

Structural elements

Configured by ItemReader/ItemProcessor/ItemWriter/ChunkOrientedTasklet.

Configured only by Tasklet.

Transaction

A transaction is generated in a chunk unit.

Processed in 1 transaction.

Recommended reprocessing method

Re-run and re-start can be used.

As a rule, only re-run is used.

Exception handling

Handling process becomes easier by using a listener. Individual implementation is also possible.

Individual implementation is required.

2.4.4. Running a job method

How to run a job is explained. The following two methods are available: synchronous execution and asynchronous execution.

Respective methods are explained below.

2.4.4.1. Synchronous execution method

Synchronous execution method is an execution method wherein the control is not given back to the boot source from job start to job completion.

A schematic diagram which starts a job from job scheduler is shown.

Synchronized Execution
Schematic diagram for synchronous execution
  1. Start a shell script to run a job from job scheduler.
    Job scheduler waits until the exit code (numeric value) is returned.

  2. Start CommandLineJobRunner to run a job from shell script.
    Shell script waits until CommandLineJobRunner returns an exit code (numeric value).

  3. CommandLineJobRunner runs a job. Job returns an exit code (string) to CommandLineJobRunner after processing is completed.
    CommandLineJobRunner converts exit code (string) returned from the job to exit code (numeric value) and returns it to the shell script.

2.4.4.2. Asynchronous execution method

Asynchronous execution method is an execution method wherein control is returned to the boot source immediately after running a job, by executing the job on an execution base separate from the boot source (such as a separate thread). In this method, the job execution results must be fetched by a means different from that of running the job.

The following two methods are explained in TERASOLUNA Batch Framework for Java (5.x): asynchronous execution (DB polling) and asynchronous execution (Web container).

Other asynchronous execution methods

Asynchronous execution can also be performed by using messages like MQ, however since the job execution points are identical, description will be omitted in TERASOLUNA Batch Framework for Java (5.x).

2.4.4.2.1. Asynchronous execution method (DB polling)

"Asynchronous execution (DB polling)" is a method wherein a job execution request is registered in the database, polling of the request is done and job is executed.

TERASOLUNA Batch Framework for Java (5.x) supports DB polling function. The schematic diagram of start by DB polling offered is shown.

DB Polling
DB polling schematic diagram
  1. User registers a job request to the database.

  2. DB polling function periodically monitors the registration of the job request and executes the corresponding job when the registration is detected.

    • Run the job from SimpleJobOperator and receive JobExecutionId after completion of the job.

    • JobExecutionId is an ID which uniquely identifies job execution and execution results are browsed from JobRepository by using this ID.

    • Job execution results are registered in JobRepository by using Spring Batch system.

    • DB polling is itself executed asynchronously.

  3. DB polling function updates the status of the job request that was started, using the JobExecutionId returned from SimpleJobOperator.

  4. Job process progress and results are referred separately by using JobExecutionId.
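As an illustration, registering a job request (step 1 above) amounts to a single INSERT into the job request table. This is a hedged sketch: the table and column names below follow the job request table (batch_job_request) described in the asynchronous execution (DB polling) chapter of this guideline, and should be verified against the schema actually deployed.

```sql
-- Illustrative registration of a job request (step 1 of the diagram).
-- Table/column names must be checked against the actual schema in use.
INSERT INTO batch_job_request (job_name, job_parameter, polling_status, create_date)
VALUES ('job01', 'param1=value1', 'INIT', CURRENT_TIMESTAMP);
```

The DB polling function then detects rows whose polling_status is the initial value and starts the corresponding jobs.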

2.4.4.2.2. Asynchronous execution method (Web container)

"Asynchronous execution (Web container)" is a method wherein a job is executed asynchronously using the request sent to web application on the web container as a trigger.* A Web application can return a response immediately after starting without waiting for the job to end.

Web Container
Web container schematic diagram
  1. Send a request from a client to Web application.

  2. Web application asynchronously executes the job requested from a request.

    • Receive a JobExecutionId from SimpleJobOperator immediately after starting the job.

    • Job execution results are registered in JobRepository by using Spring Batch system.

  3. Web application returns a response to the client without waiting for the job to end.

  4. Job process progress and results are browsed separately by using JobExecutionId.

Further, it can also be linked with a Web application built with TERASOLUNA Server Framework for Java (5.x).
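The essence of the asynchronous pattern described above (start the job, return immediately, and look up the result later by its ID) can be sketched in plain Java. This is not the actual SimpleJobOperator API: the ExecutorService and Future below merely stand in for the job launcher, the JobExecutionId, and the later JobRepository lookup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java sketch of the asynchronous execution pattern. The Future stands
// in for the JobExecutionId that is later used to look up results.
public class AsyncLaunchSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // "Start the job" and receive a handle immediately, without waiting.
        Future<String> execution = executor.submit(() -> {
            Thread.sleep(100); // simulated job processing
            return "COMPLETED";
        });
        // The caller (e.g. a web response) returns right away.
        System.out.println("response returned to client before job ends");
        // Progress and results are referenced separately, later.
        System.out.println("job status: " + execution.get());
        executor.shutdown();
    }
}
```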

2.4.5. Points to consider while using

Points to consider while using TERASOLUNA Batch Framework for Java (5.x) are shown.

Running a job method
Synchronous execution method

It is used when jobs are run on a schedule and batch processing is carried out by combining multiple jobs.

Asynchronous execution method (DB polling)

It is used for delayed processing, continuous execution of jobs with short processing times, and aggregation of a large number of jobs.

Asynchronous execution method (Web container)

Its usage is similar to DB polling, but it is used when immediacy is required at startup.

Implementation method
Chunk model

It is used when a large quantity of data is to be processed efficiently.

Tasklet model

It is used for simple processing, processing that is difficult to standardize, and processing in which data is processed collectively.

3. Methodology of application development

3.1. Development of batch application

The development of a batch application is explained in the following flow.

3.1.1. What is blank project

A blank project is a development project template in which various settings, such as those for Spring Batch and MyBatis3, are made in advance; it is the starting point of application development.
In this guideline, a blank project with a single-project structure is provided.
Refer to Project structure for an explanation of the structure.

Difference from TERASOLUNA Server 5.x

A multi-project structure is recommended for TERASOLUNA Server 5.x, mainly to gain the following merits.

  • Makes the environmental differences easier to absorb

  • Makes separation of business logic and presentation easier

However, unlike TERASOLUNA Server 5.x, this guideline provides a single-project structure.

These merits should be considered for batch applications as well; nevertheless, the single-project structure is provided because priority is given to easy access to the resources related to one job.
Another reason is that, in batch applications, environmental differences can often be switched by property files or environment variables.

3.1.2. Creation of project

How to create a project using archetype:generate of Maven Archetype Plugin is explained.

Regarding prerequisites of creating environment

Prerequisites are explained below.

  • Java SE Development Kit 8

  • Apache Maven 3.x

    • Internet should be connected

    • When connecting to the Internet via proxy, Maven proxy setting should be done

  • IDE

    • Spring Tool Suite / Eclipse etc.

Considerations after creating a project

The version of TERASOLUNA Batch 5.x defined in the generated pom.xml must be changed from 5.0.1-SNAPSHOT to 5.0.1.RELEASE.

Before modification
<properties>
    <!-- omitted -->

    <terasoluna-batch.version>5.0.1-SNAPSHOT</terasoluna-batch.version>

    <!-- omitted -->
</properties>
After modification
<properties>
    <!-- omitted -->

    <terasoluna-batch.version>5.0.1.RELEASE</terasoluna-batch.version>

    <!-- omitted -->
</properties>

Execute the following command in the directory where the project is to be created.

Command prompt(Windows)
C:\xxx> mvn archetype:generate ^
  -DarchetypeGroupId=org.terasoluna.batch ^
  -DarchetypeArtifactId=terasoluna-batch-archetype ^
  -DarchetypeVersion=5.0.1.RELEASE
Bash(Unix, Linux, …​)
$ mvn archetype:generate \
  -DarchetypeGroupId=org.terasoluna.batch \
  -DarchetypeArtifactId=terasoluna-batch-archetype \
  -DarchetypeVersion=5.0.1.RELEASE

Next, set the following values in interactive mode according to your situation.

  • groupId

  • artifactId

  • version

  • package

An example of setting and executing the value is shown below.

Setting example of each item
groupId: com.example.batch
artifactId: batch
version: 1.0.0-SNAPSHOT
package: com.example.batch

Execution example at command prompt
C:\xxx>mvn archetype:generate ^
More? -DarchetypeGroupId=org.terasoluna.batch ^
More? -DarchetypeArtifactId=terasoluna-batch-archetype ^
More? -DarchetypeVersion=5.0.1.RELEASE
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------

(.. omitted)

Define value for property 'groupId': com.example.batch
Define value for property 'artifactId': batch
Define value for property 'version' 1.0-SNAPSHOT: : 1.0.0-SNAPSHOT
Define value for property 'package' com.example.batch: :
Confirm properties configuration:
groupId: com.example.batch
artifactId: batch
version: 1.0.0-SNAPSHOT
package: com.example.batch
 Y: : y
[INFO] ------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: terasoluna-batch-archetype:5.0.1.RELEASE
[INFO] ------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.example.batch
[INFO] Parameter: artifactId, Value: batch
[INFO] Parameter: version, Value: 1.0.0-SNAPSHOT
[INFO] Parameter: package, Value: com.example.batch
[INFO] Parameter: packageInPathFormat, Value: com/example/batch
[INFO] Parameter: package, Value: com.example.batch
[INFO] Parameter: version, Value: 1.0.0-SNAPSHOT
[INFO] Parameter: groupId, Value: com.example.batch
[INFO] Parameter: artifactId, Value: batch
[INFO] Project created from Archetype in dir: C:\xxx\batch
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36.952 s
[INFO] Finished at: 2017-07-25T14:23:42+09:00
[INFO] Final Memory: 14M/129M
[INFO] ------------------------------------------------------------------------
Execution example at Bash
$ mvn archetype:generate \
> -DarchetypeGroupId=org.terasoluna.batch \
> -DarchetypeArtifactId=terasoluna-batch-archetype \
> -DarchetypeVersion=5.0.1.RELEASE
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------

(.. omitted)

Define value for property 'groupId': com.example.batch
Define value for property 'artifactId': batch
Define value for property 'version' 1.0-SNAPSHOT: : 1.0.0-SNAPSHOT
Define value for property 'package' com.example.batch: :
Confirm properties configuration:
groupId: com.example.batch
artifactId: batch
version: 1.0.0-SNAPSHOT
package: com.example.batch
 Y: : y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: terasoluna-batch-archetype:5.0.1.RELEASE
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.example.batch
[INFO] Parameter: artifactId, Value: batch
[INFO] Parameter: version, Value: 1.0.0-SNAPSHOT
[INFO] Parameter: package, Value: com.example.batch
[INFO] Parameter: packageInPathFormat, Value: com/example/batch
[INFO] Parameter: package, Value: com.example.batch
[INFO] Parameter: version, Value: 1.0.0-SNAPSHOT
[INFO] Parameter: groupId, Value: com.example.batch
[INFO] Parameter: artifactId, Value: batch
[INFO] Project created from Archetype in dir: C:\xxx\batch
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:19 min
[INFO] Finished at: 2017-07-25T14:20:09+09:00
[INFO] Final Memory: 17M/201M
[INFO] ------------------------------------------------------------------------

The creation of project is completed by the above execution.

Whether the project has been created properly can be confirmed by the following steps.

Execution at command prompt (Verify that it was created correctly)
C:\xxx>mvn clean dependency:copy-dependencies -DoutputDirectory=lib package
C:\xxx>java -cp "lib/*;target/*" ^
org.springframework.batch.core.launch.support.CommandLineJobRunner ^
META-INF/jobs/job01.xml job01
Execution at Bash (Verify that it was created correctly)
$ mvn clean dependency:copy-dependencies -DoutputDirectory=lib package
$ java -cp 'lib/*:target/*' \
org.springframework.batch.core.launch.support.CommandLineJobRunner \
META-INF/jobs/job01.xml job01

It is created properly if the following output is obtained.

Output example at command prompt
C:\xxx>mvn clean dependency:copy-dependencies -DoutputDirectory=lib package
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building TERASOLUNA Batch Framework for Java (5.x) Blank Project 1.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------

(.. omitted)

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.007 s
[INFO] Finished at: 2017-07-25T14:24:36+09:00
[INFO] Final Memory: 23M/165M
[INFO] ------------------------------------------------------------------------

C:\xxx>java -cp "lib/*;target/*" ^
More? org.springframework.batch.core.launch.support.CommandLineJobRunner ^
More? META-INF/jobs/job01.xml job01

(.. omitted)

[2017/07/25 14:25:22] [main] [o.s.b.c.l.s.SimpleJobLauncher] [INFO ] Job: [FlowJob: [name=job01]] launched with the following parameters: [{jsr_batch_run_id=1}]
[2017/07/25 14:25:22] [main] [o.s.b.c.j.SimpleStepHandler] [INFO ] Executing step: [job01.step01]
[2017/07/25 14:25:23] [main] [o.s.b.c.l.s.SimpleJobLauncher] [INFO ] Job: [FlowJob: [name=job01]] completed with the following parameters: [{jsr_batch_run_id=1}] and the following status: [COMPLETED]
[2017/07/25 14:25:23] [main] [o.s.c.s.ClassPathXmlApplicationContext] [INFO ] Closing org.springframework.context.support.ClassPathXmlApplicationContext@62043840: startup date [Tue Jul 25 14:25:20 JST 2017]; root of context hierarchy
Output example at Bash
$ mvn clean dependency:copy-dependencies -DoutputDirectory=lib package
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building TERASOLUNA Batch Framework for Java (5.x) Blank Project 1.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------

(.. omitted)

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.827 s
[INFO] Finished at: 2017-07-25T14:21:19+09:00
[INFO] Final Memory: 27M/276M
[INFO] ------------------------------------------------------------------------

$ java -cp 'lib/*:target/*' \
> org.springframework.batch.core.launch.support.CommandLineJobRunner \
> META-INF/jobs/job01.xml job01
[2017/07/25 14:21:49] [main] [o.s.c.s.ClassPathXmlApplicationContext] [INFO ] Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@62043840: startup date [Tue Jul 25 14:21:49 JST 2017]; root of context hierarchy

(.. omitted)

[2017/07/25 14:21:52] [main] [o.s.b.c.l.s.SimpleJobLauncher] [INFO ] Job: [FlowJob: [name=job01]] launched with the following parameters: [{jsr_batch_run_id=1}]
[2017/07/25 14:21:52] [main] [o.s.b.c.j.SimpleStepHandler] [INFO ] Executing step: [job01.step01]
[2017/07/25 14:21:52] [main] [o.s.b.c.l.s.SimpleJobLauncher] [INFO ] Job: [FlowJob: [name=job01]] completed with the following parameters: [{jsr_batch_run_id=1}] and the following status: [COMPLETED]
[2017/07/25 14:21:52] [main] [o.s.c.s.ClassPathXmlApplicationContext] [INFO ] Closing org.springframework.context.support.ClassPathXmlApplicationContext@62043840: startup date [Tue Jul 25 14:21:49 JST 2017]; root of context hierarchy

3.1.3. Project structure

The project structure created above is explained here. The project structure should be designed with the following points in mind.

  • Implement the job independent of startup method

  • Save the efforts of performing various settings such as Spring Batch, MyBatis

  • Make the environment dependent switching easy

The structure is shown and each element is explained below.
(For ease of understanding, the explanation is based on the output of the mvn archetype:generate execution above.)

BlankProject Structure
Directory configuration of project
Explanation of each element of blank project
Sr. No. Explanation

(1)

root package that stores various classes of the entire batch application.

(2)

Package that stores the various classes of one job.
It stores DTOs, implementations of Tasklet and Processor, and Mapper interfaces of MyBatis3.
This guideline places no restrictions on how classes are stored, so treat this layout as an example.

You can customize it, using the default state as a reference; however, consider making it easy to identify the resources specific to each job.

(3)

Configuration file of the entire batch application.
In the default state, settings related to database connection and asynchronous execution are configured. Add settings as needed, using the defaults as a reference.

(4)

Configuration file of Logback(log output).

(5)

Configuration file that defines messages to be displayed when an error occurs during the input check using BeanValidation.
In the default state, the default messages of Bean Validation and of Hibernate Validator (its implementation) are defined, but all of them are commented out.
In this state the default messages are used; uncomment and modify the messages only when you want to customize them.

(6)

Mapper XML file that pairs with Mapper interface of MyBatis3.

(7)

Property file that defines messages used mainly for log output.

(8)

Directory that stores job-specific Bean definition file.
The hierarchical structure can be configured according to the number of jobs.

(9)

Directory that stores Bean definition file related to the entire batch application.
Default settings of Spring Batch and MyBatis, and settings for starting a job regardless of the start trigger (synchronous or asynchronous), are configured here.

(10)

Bean definition file that describes settings related to asynchronous execution (DB polling) function.

(11)

Bean definition file that reduces various settings when imported into a job-specific Bean definition file.
By importing it, a job can absorb differences in Bean definitions caused by the start trigger.

(12)

Bean definition file for setting Spring Batch behavior and common jobs.

Relation figure of each file is shown below.

Files Relation
Relation figure of each file

3.1.4. Flow of development

The overall flow of developing a job is explained here.
The focus is on understanding the general flow rather than on detailed explanations.

3.1.4.1. Import to IDE

Since the generated project follows the Maven project structure, import it as a Maven project using your IDE.
Detailed procedures are omitted.

3.1.4.2. Setting of entire application

Customize the following according to your situation.

How to customize settings other than these is explained in the sections for the individual functions.

3.1.4.2.1. Project information of pom.xml

Since the following information is set to temporary values in the project POM, set appropriate values according to your situation.

  • Project name(name element)

  • Project description(description element)

  • Project URL(url element)

  • Project inception year(inceptionYear element)

  • Project license(licenses element)

  • Project organization(organization element)

3.1.4.2.2. Database related settings

Database-related settings appear in several places, and each of them must be modified.

pom.xml
<!-- (1) -->
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <scope>runtime</scope>
</dependency>
batch-application.properties
# (2)
# Admin DataSource settings.
admin.jdbc.driver=org.h2.Driver
admin.jdbc.url=jdbc:h2:mem:batch-admin;DB_CLOSE_DELAY=-1
admin.jdbc.username=sa
admin.jdbc.password=

# (2)
# Job DataSource settings.
#jdbc.driver=org.postgresql.Driver
#jdbc.url=jdbc:postgresql://localhost:5432/postgres
#jdbc.username=postgres
#jdbc.password=postgres
jdbc.driver=org.h2.Driver
jdbc.url=jdbc:h2:mem:batch;DB_CLOSE_DELAY=-1
jdbc.username=sa
jdbc.password=

# (3)
# Spring Batch schema initialize.
data-source.initialize.enabled=true
spring-batch.schema.script=classpath:org/springframework/batch/core/schema-h2.sql
terasoluna-batch.commit.script=classpath:org/terasoluna/batch/async/db/schema-commit.sql
META-INF/spring/launch-context.xml
<!-- (3) -->
<jdbc:initialize-database data-source="adminDataSource"
                          enabled="${data-source.initialize.enabled:false}"
                          ignore-failures="ALL">
    <jdbc:script location="${spring-batch.schema.script}" />
    <jdbc:script location="${terasoluna-batch.commit.script}" />
</jdbc:initialize-database>

<!-- (4) -->
<bean id="adminDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
      destroy-method="close"
      p:driverClassName="${admin.jdbc.driver}"
      p:url="${admin.jdbc.url}"
      p:username="${admin.jdbc.username}"
      p:password="${admin.jdbc.password}"
      p:maxTotal="10"
      p:minIdle="1"
      p:maxWaitMillis="5000"
      p:defaultAutoCommit="false"/>

<!-- (4) -->
<bean id="jobDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
      destroy-method="close"
      p:driverClassName="${jdbc.driver}"
      p:url="${jdbc.url}"
      p:username="${jdbc.username}"
      p:password="${jdbc.password}"
      p:maxTotal="10"
      p:minIdle="1"
      p:maxWaitMillis="5000"
      p:defaultAutoCommit="false" />

<!-- (5) -->
<bean id="jobSqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean"
      p:dataSource-ref="jobDataSource" >
    <property name="configuration">
        <bean class="org.apache.ibatis.session.Configuration"
            p:localCacheScope="STATEMENT"
            p:lazyLoadingEnabled="true"
            p:aggressiveLazyLoading="false"
            p:defaultFetchSize="1000"
            p:defaultExecutorType="REUSE" />
    </property>
</bean>
META-INF/spring/async-batch-daemon.xml
<!-- (5) -->
<bean id="adminSqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean"
      p:dataSource-ref="adminDataSource" >
    <property name="configuration">
        <bean class="org.apache.ibatis.session.Configuration"
              p:localCacheScope="STATEMENT"
              p:lazyLoadingEnabled="true"
              p:aggressiveLazyLoading="false"
              p:defaultFetchSize="1000"
              p:defaultExecutorType="REUSE" />
    </property>
</bean>
Each element in database related settings is explained
Sr. No. Explanation

(1)

In pom.xml, define dependency relation of JDBC driver for connecting to the database to be used.
In the default state, H2 Database(in-memory database) and PostgreSQL are set, however add/delete should be performed whenever required.

(2)

Set JDBC driver connection.
- admin.jdbc.xxx is used by Spring Batch and TERASOLUNA Batch 5.x
- jdbc.xxx is used by individual jobs

(3)

Define whether to execute the initialization of the database used by Spring Batch and TERASOLUNA Batch 5.x, and the scripts to be used.
Since Spring Batch accesses the JobRepository, and TERASOLUNA Batch 5.x accesses the job request table in asynchronous execution (DB polling), a database is mandatory.
Whether to enable it is decided as follows.
- Enable it when H2 Database is used. If it is disabled, the JobRepository and the job request table cannot be accessed and an error occurs.
- When not using H2 Database, disable it to prevent accidents.

(4)

Set datasource.
Tune the number of connections as necessary.

(5)

Set MyBatis behavior.
Tune fetch size as necessary.

3.1.5. Creation of job

Refer to the following for how to create a job.

3.1.6. Build and execution of project

Build and execution of project is explained.

3.1.6.1. Build of application

Move to the root directory of the project and execute the following command.

Build(Windows/Bash)
$ mvn clean dependency:copy-dependencies -DoutputDirectory=lib package

The following is generated by this.

  • <Root directory>/target/[artifactId]-[version].jar

    • Jar of the created batch application is generated

  • <Root directory>/lib/(Dependent Jar file)

    • A set of dependent Jar files is copied

When deploying to the test environment and the commercial environment, these Jar files can be copied to an arbitrary directory.

3.1.6.2. Switching of configuration file according to the environment

In the pom.xml of the project, the following Profile is set as the default value.

Profiles settings of pom.xml
<profiles>
    <!-- Including application properties and log settings into package. (default) -->
    <profile>
        <id>IncludeSettings</id>
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
        <properties>
            <exclude-property/>
            <exclude-log/>
        </properties>
    </profile>

    <!-- Excluding application properties and log settings into package. -->
    <profile>
        <id>ExcludeSettings</id>
        <activation>
            <activeByDefault>false</activeByDefault>
        </activation>
        <properties>
            <exclude-property>batch-application.properties</exclude-property>
            <exclude-log>logback.xml</exclude-log>
        </properties>
    </profile>
</profiles>

Here, whether to include environment-dependent configuration files is switched. By utilizing this setting, environmental differences can be absorbed by placing the configuration files separately at the time of environment deployment. Moreover, by applying this, the configuration files included in the Jar can be changed between the test environment and the production environment. An example is shown below.

Description example of pom.xml for switching configuration file for each environment
<build>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
        </resource>
        <resource>
            <directory>${project.root.basedir}/${project.config.resource.directory.rdbms}</directory>
        </resource>
    </resources>
</build>

<profiles>
    <profile>
        <id>postgresql9-local</id>
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
        <dependencies>
            <dependency>
                <groupId>org.postgresql</groupId>
                <artifactId>postgresql</artifactId>
                <scope>runtime</scope>
            </dependency>
        </dependencies>
        <properties>
            <project.config.resource.directory.rdbms>config/rdbms/postgresql9/local</project.config.resource.directory.rdbms>
        </properties>
    </profile>
    <profile>
        <id>postgresql9-it</id>
        <dependencies>
            <dependency>
                <groupId>org.postgresql</groupId>
                <artifactId>postgresql</artifactId>
                <scope>runtime</scope>
            </dependency>
        </dependencies>
        <properties>
            <project.config.resource.directory.rdbms>config/rdbms/postgresql9/it</project.config.resource.directory.rdbms>
        </properties>
    </profile>
</profiles>

A Maven Profile can be activated when executing a command, as follows.
Multiple Profiles can be activated at the same time; use them as required.

Example of activating Maven Profile
$ mvn -P profile-1,profile-2 [goals]
3.1.6.2.1. Execution of application

An example of executing the job based on the above-mentioned build result, is shown.
Change [artifactId] and [version] according to the values set in Creation of project.

Command prompt(Windows)
C:\xxx> java -cp "target\[artifactId]-[version].jar;lib\*" ^
org.springframework.batch.core.launch.support.CommandLineJobRunner ^
META-INF/jobs/job01.xml job01
Bash(Unix, Linux, …​)
$ java -cp 'target/[artifactId]-[version].jar:lib/*' \
org.springframework.batch.core.launch.support.CommandLineJobRunner \
META-INF/jobs/job01.xml job01
Necessity of handling the exit code returned by the java command

In an actual system, when launching a job from the job scheduler, it is common not to issue the java command directly but to start it through a shell script that launches java.

This allows environment variables to be set before starting the java command, and the exit code of the java command to be handled. It is recommended to always handle the exit code of the java command, for the following reasons.

  • The normal exit code of the java command is 0, and the abnormal one is 1. The job scheduler judges the success or failure of a job within the range of the exit code. Depending on the job scheduler's settings, a job may be judged as ended normally even though the java command ended abnormally.

  • The exit code that can be handled by the OS and the job scheduler has a finite range.

    • It is important to define the range of exit codes to be used according to the specifications of the OS and the job scheduler.

    • Generally, it is in the range of 0 to 255 which is defined by the POSIX standards.

      • In TERASOLUNA Batch 5.x, the exit code is set to 0 on normal end, and 255 otherwise.

An example of handling exit code is shown below.

Example of handling exit code
#!/bin/bash

# ..omitted.

java -cp ...
RETURN_CODE=$?
if [ $RETURN_CODE -eq 1 ]; then
   exit 255
else
   exit $RETURN_CODE
fi

3.2. Creation of chunk model job

3.2.1. Overview

How to create a chunk model job is explained. Refer to Spring Batch architecture for the architecture of the chunk model.

The components of a chunk model job are explained here.

3.2.1.1. Components

The components of a chunk model job are shown below. One job is implemented by combining these components in the job Bean definition file.

Components of chunk model job
Sr. No. Name Role Mandatory settings Mandatory implementation

1

ItemReader

Interface to fetch data from various resources.
Since implementation for flat files and database is provided by Spring Batch,
there is no need for the user to create it.

2

ItemProcessor

Interface for processing data from input to output.
The user implements this interface whenever required and implements business logic.

3

ItemWriter

Interface for the output of data to various resources.
An interface paired with ItemReader.
Since implementation for flat files and database is provided by Spring Batch,
there is no need for the user to create it.

The points in this table are as follows.

  • If the data is to be only transferred from input resource to output resource in a simple way, it can be implemented only by setting.

  • ItemProcessor should be implemented whenever required.

Hereafter, how to implement the job using these components, is explained.
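As an aid to understanding, the chunk cycle that these three components drive can be sketched in plain Java. The simplified interfaces below are illustrative only; they are not the actual Spring Batch ItemReader/ItemProcessor/ItemWriter interfaces, but the read-process-write loop and the commit-interval behavior follow the same idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of the chunk cycle (not the real Spring Batch API).
public class ChunkCycleSketch {
    interface Reader<T> { T read(); }                  // returns null at end of input
    interface Processor<I, O> { O process(I item); }   // business logic per item
    interface Writer<T> { void write(List<T> items); } // outputs one chunk

    static <I, O> void runChunk(Reader<I> reader, Processor<I, O> processor,
                                Writer<O> writer, int commitInterval) {
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == commitInterval) {
                writer.write(chunk);   // commit point: one transaction per chunk
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk);       // final partial chunk
        }
    }

    public static void main(String[] args) {
        Iterator<String> input = Arrays.asList("a", "b", "c").iterator();
        Reader<String> reader = () -> input.hasNext() ? input.next() : null;
        Processor<String, String> processor = String::toUpperCase;
        Writer<String> writer = items -> System.out.println("write " + items);
        runChunk(reader, processor, writer, 2); // commit-interval of 2
    }
}
```

With a commit interval of 2 and three input items, the writer is called twice: once with two items and once with the remaining item.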

3.2.2. How to use

How to implement a chunk model job is explained here, in the following order.

3.2.2.1. Job configuration

Define how to combine the components that constitute a chunk model job in the Bean definition file. An example is shown below, and the relation between the components is explained.

Example of Bean definition file (Chunk model)
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:mybatis="http://mybatis.org/schema/mybatis-spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
             http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
             http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
             http://mybatis.org/schema/mybatis-spring http://mybatis.org/schema/mybatis-spring.xsd">

    <!-- (1) -->
    <import resource="classpath:META-INF/spring/job-base-context.xml"/>

    <!-- (2) -->
    <context:annotation-config/>

    <!-- (3) -->
    <context:component-scan
        base-package="org.terasoluna.batch.functionaltest.app.common" />

    <!-- (4) -->
    <mybatis:scan
        base-package="org.terasoluna.batch.functionaltest.app.repository.mst"
        factory-ref="jobSqlSessionFactory"/>

    <!-- (5) -->
    <bean id="reader"
          class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
          p:queryId="org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository.findAll"
          p:sqlSessionFactory-ref="jobSqlSessionFactory"/>

    <!-- (6) -->
    <!-- Item Processor -->
    <!-- The ItemProcessor is defined as a Bean by annotation (see (2), (3)), so it is not defined here -->

    <!-- (7) -->
    <bean id="writer"
          class="org.springframework.batch.item.file.FlatFileItemWriter"
          scope="step"
          p:resource="file:#{jobParameters['outputFile']}">
        <property name="lineAggregator">
            <bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
                <property name="fieldExtractor">
                    <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
                          p:names="customerId,customerName,customerAddress,customerTel,chargeBranchId"/>
                </property>
            </bean>
        </property>
    </bean>

    <!-- (8) -->
    <batch:job id="jobCustomerList01" job-repository="jobRepository"> <!-- (9) -->
        <batch:step id="jobCustomerList01.step01"> <!-- (10) -->
            <batch:tasklet transaction-manager="jobTransactionManager"> <!-- (11) -->
                <batch:chunk reader="reader"
                             processor="processor"
                             writer="writer"
                             commit-interval="10" /> <!-- (12) -->
            </batch:tasklet>
        </batch:step>
    </batch:job>
</beans>
Configuration of ItemProcessor implementation class
@Component("processor") // (6)
public class CustomerProcessor implements ItemProcessor<Customer, Customer> {
  // omitted.
}
Sr. No. Explanation

(1)

Import the settings to always read the required Bean definition when using TERASOLUNA Batch 5.x.

(2)

Enable Bean definition using annotation. Use it with (3) when implementing ItemProcessor, Listener etc.

(3)

Set base package of component scan target. When defining Bean using annotation, use it with (2).

(4)

MyBatis-Spring settings.
For details of MyBatis-Spring settings, refer to Database access.

(5)

ItemReader configuration.
For the details of ItemReader, refer to Database access and File access.

(6)

ItemProcessor can be defined by annotation through (2) and (3), so there is no need to define it in the Bean definition file.

(7)

ItemWriter configuration.
For the details of ItemWriter, refer to Database access and File access.

(8)

Job configuration.
id attribute must be unique for all the jobs included in 1 batch application.

(9)

JobRepository configuration.
The value set in the job-repository attribute should be fixed to jobRepository unless there is a special reason.
This will allow all the jobs to be managed by 1 JobRepository. Resolve Bean definition of jobRepository by (1).

(10)

Step configuration.
The id attribute need not be unique across all the jobs in 1 batch application, but a unique id is used to enable easy tracking when a failure occurs.
Unless there is a specific reason to use a different format, set the id attribute to [id attribute specified in (8)] + [.step + serial number].

(11)

Tasklet configuration.
The value set in the transaction-manager attribute should be fixed to jobTransactionManager unless there is a special reason.
This will allow the transaction to be managed for each commit-interval of (12). For details, refer to Transaction control.
Resolve Bean definition of jobTransactionManager by (1).

(12)

Chunk model job configuration.
Specify Bean ID of ItemReader and ItemWriter defined in the previous section, in respective reader and writer attributes.
Specify Bean ID of implementation class of ItemProcessor, in processor attribute.
Set input data count per chunk in commit-interval attribute.

Tuning of commit-interval

commit-interval is the performance tuning point in chunk model job.

In the above example, commit-interval is set to 10 records, however the optimal count differs with the characteristics of the available machine resources and the job. For a job that processes data by accessing multiple resources, the throughput may peak somewhere between 10 and 100 records. When the input and output resources correspond 1:1 and the job simply transfers data, the throughput may keep increasing up to 5000 or even 10000 records.

Temporarily set commit-interval to 100 records at the time of implementing the job, and then tune each job according to the results of performance measurement.

3.2.2.2. Implementation of components

How to implement mainly ItemProcessor is explained here.

Refer to the following for other components.

3.2.2.2.1. Implementation of ItemProcessor

How to implement ItemProcessor is explained.

ItemProcessor is responsible for creating 1 record data for the output resource based on the 1 record data fetched from the input resource as shown in the interface below. In other words, ItemProcessor is where business logic for 1 record data is implemented.

ItemProcessor interface
public interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

As the interface shows, the types I and O can be the same or different. The same type means the input data is partially modified; a different type means output data is generated based on the input data.

Example of implementation of ItemProcessor(Input/Output is of same type)
@Component
public class AmountUpdateItemProcessor implements
        ItemProcessor<SalesPlanDetail, SalesPlanDetail> {

    @Override
    public SalesPlanDetail process(SalesPlanDetail item) throws Exception {
        item.setAmount(new BigDecimal("1000"));
        return item;
    }
}
Example of implementation of ItemProcessor(Input/Output is of different type)
@Component
public class UpdateItemFromDBProcessor implements
        ItemProcessor<SalesPerformanceDetail, SalesPlanDetail> {

    @Inject
    CustomerRepository customerRepository;

    @Override
    public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {
        Customer customer = customerRepository.findOne(readItem.getCustomerId());

        SalesPlanDetail writeItem = new SalesPlanDetail();
        writeItem.setBranchId(customer.getChargeBranchId());
        writeItem.setYear(readItem.getYear());
        writeItem.setMonth(readItem.getMonth());
        writeItem.setCustomerId(readItem.getCustomerId());
        writeItem.setAmount(readItem.getAmount());
        return writeItem;
    }
}
Explanation of return of null from ItemProcessor

Returning null from ItemProcessor means that the data is not passed to the subsequent process (the ItemWriter); in other words, the data is filtered. This can be used effectively to validate the input data. For details, refer to Input check.
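As a minimal sketch of this filtering behavior: the item type Sales and the processor below are hypothetical examples, and the ItemProcessor interface is re-declared locally so the snippet compiles without Spring Batch on the classpath.

```java
// Minimal local re-declaration of the ItemProcessor interface so this
// sketch compiles without Spring Batch on the classpath.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical item type used only for illustration.
class Sales {
    int amount;
    Sales(int amount) { this.amount = amount; }
}

// Returning null filters the item: it is not passed on to the ItemWriter.
class FilterNegativeAmountProcessor implements ItemProcessor<Sales, Sales> {
    @Override
    public Sales process(Sales item) throws Exception {
        if (item.amount < 0) {
            return null; // invalid record: filtered out
        }
        return item;
    }
}
```

In a real job, the filtering condition would typically come from the input check rules described in Input check.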

To increase process throughput of ItemProcessor

As shown in the implementation example above, an implementation class of ItemProcessor sometimes accesses resources such as a database and files. Since ItemProcessor is executed for each record of input data, even small I/O accumulates into large I/O across the entire job; suppressing I/O as much as possible is therefore important for increasing process throughput.

One method is to store the required data in memory in advance by utilizing the Listener mentioned later, and to implement most of the processing in ItemProcessor so that it completes within CPU/memory. However, since this consumes a large amount of memory per job, not everything can be stored in memory. Decide what to store in memory based on the I/O frequency and the data size.

This point is also covered in Input/Output of data.
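The caching idea above can be sketched as follows. CachingLookup and its loader function are hypothetical names used only for illustration, not part of Spring Batch or TERASOLUNA Batch; the loader stands in for a repository call such as a database lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of caching lookups in memory to suppress repeated I/O.
// The loader represents the expensive call (e.g. a repository method);
// each distinct key triggers at most one call, repeats hit the cache.
class CachingLookup<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;

    CachingLookup(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only on a cache miss.
        return cache.computeIfAbsent(key, loader);
    }
}
```

Whether such a cache is acceptable depends on the memory budget of the job; study the I/O frequency and data size before applying it, as the text above notes.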

Use multiple ItemProcessors at the same time

If a general-purpose ItemProcessor is to be applied to multiple jobs, it can be implemented by chaining ItemProcessors using CompositeItemProcessor provided by Spring Batch.

Linking of multiple ItemProcessor by CompositeItemProcessor
<bean id="processor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <ref bean="commonItemProcessor"/>
            <ref bean="businessLogicItemProcessor"/>
        </list>
    </property>
</bean>

Note that the delegates are processed in the order specified in the delegates attribute.
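To illustrate that the delegates run in list order, here is a minimal local stand-in that mimics the chaining behavior; SimpleComposite and the local ItemProcessor declaration are illustrative only, the real class being org.springframework.batch.item.support.CompositeItemProcessor, which also stops processing when a delegate returns null.

```java
import java.util.List;

// Local functional re-declaration so the sketch compiles without Spring Batch.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Applies the delegates in list order, like CompositeItemProcessor.
class SimpleComposite implements ItemProcessor<String, String> {
    private final List<ItemProcessor<String, String>> delegates;

    SimpleComposite(List<ItemProcessor<String, String>> delegates) {
        this.delegates = delegates;
    }

    @Override
    public String process(String item) throws Exception {
        String current = item;
        for (ItemProcessor<String, String> delegate : delegates) {
            current = delegate.process(current); // order of the delegates list
            if (current == null) {
                return null; // a null from any delegate filters the item
            }
        }
        return current;
    }
}
```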

3.3. Creation of tasklet model job

3.3.1. Overview

How to create tasklet model job is explained. Refer to Spring Batch architecture for the architecture of tasklet model.

3.3.1.1. Components

A tasklet model job does not register many components. It is enough to implement org.springframework.batch.core.step.tasklet.Tasklet and set it in the Bean definition. As an advanced implementation means, ItemReader and ItemWriter, which are components of the chunk model, can also be used.

3.3.2. HowToUse

How to implement tasklet model job is explained in the following order here.

3.3.2.1. Job configuration

Define tasklet model job in Bean definition file. An example is shown below.

Example of Bean definition file (Tasklet model)
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
             http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
             http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd">

    <!-- (1) -->
    <import resource="classpath:META-INF/spring/job-base-context.xml"/>

    <!-- (2) -->
    <context:annotation-config/>

    <!-- (3) -->
    <context:component-scan
          base-package="org.terasoluna.batch.functionaltest.app.common"/>

    <!-- (4) -->
    <batch:job id="simpleJob" job-repository="jobRepository"> <!-- (5) -->
        <batch:step id="simpleJob.step01"> <!-- (6) -->
            <batch:tasklet transaction-manager="jobTransactionManager"
                           ref="simpleJobTasklet"/> <!-- (7) -->
        </batch:step>
    </batch:job>

</beans>
Example of tasklet implementation class
package org.terasoluna.batch.functionaltest.app.common;

@Component // (3)
public class SimpleJobTasklet implements Tasklet {
  // omitted.
}
Sr. No. Explanation

(1)

Import the settings to always read the required Bean definition when using TERASOLUNA Batch 5.x.

(2)

Enable Bean definition using annotation. Use it with (3).

(3)

Set base package of component-scan target. Use it with (2).
In the tasklet model, the tasklet implementation class is defined as a Bean by annotation, so its Bean definition is not required in XML.

(4)

Job configuration.
id attribute must be unique for all the jobs included in 1 batch application.

(5)

JobRepository configuration.
The value set in the job-repository attribute should be fixed to jobRepository unless there is a special reason.
This will allow all the jobs to be managed in one JobRepository. Resolve Bean definition of jobRepository by (1).

(6)

Step configuration.
The id attribute need not be unique across all the jobs in 1 batch application, but a unique id is used to enable easy tracking when a failure occurs.
Unless there is a specific reason to use a different format, set the id attribute to [id attribute specified in (4)] + [.step + serial number].

(7)

Tasklet configuration.
The value set in the transaction-manager attribute should be fixed to jobTransactionManager unless there is a special reason.
This will manage the processes of the entire tasklet in one transaction. For details, refer to Transaction control.
Resolve Bean definition of jobTransactionManager by (1).

Also, the ref attribute specifies the Bean ID of the Tasklet implementation class resolved by (3).
Since the tasklet implementation class name is SimpleJobTasklet, the Bean ID becomes simpleJobTasklet, with the first letter in lower case.

Bean name when using annotation

Bean name when using @Component annotation is generated through org.springframework.context.annotation.AnnotationBeanNameGenerator. Refer to Javadoc of this class when you want to confirm the naming rules.
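The naming rule can be illustrated with a small sketch. AnnotationBeanNameGenerator derives the default name by decapitalizing the simple class name (it delegates to java.beans.Introspector.decapitalize); BeanNameDemo below only mirrors that rule and is not the Spring implementation itself.

```java
import java.beans.Introspector;

// Dummy class standing in for a tasklet implementation.
class SimpleJobTasklet { /* implementation omitted */ }

// Mirrors the default @Component naming rule: the simple class name with
// its first letter decapitalized (SimpleJobTasklet -> simpleJobTasklet).
class BeanNameDemo {
    static String defaultBeanName(Class<?> clazz) {
        return Introspector.decapitalize(clazz.getSimpleName());
    }
}
```

Note that Introspector.decapitalize leaves a name unchanged when its first two letters are both upper case, so confirm the Javadoc when class names start with consecutive capitals.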

3.3.2.2. Implementation of tasklet

First, understand the overview with simple implementation, then proceed to implementation using the components of the chunk model.

It is explained in the following order.

3.3.2.3. Implementation of simple tasklet

Basic points are explained through tasklet implementation only for log output.

Example of simple tasklet implementation class
package org.terasoluna.batch.functionaltest.app.common;

// omitted.

@Component
public class SimpleJobTasklet implements Tasklet { // (1)

    private static final Logger logger =
            LoggerFactory.getLogger(SimpleJobTasklet.class);

    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {  // (2)
        logger.info("called tasklet."); // (3)
        return RepeatStatus.FINISHED; // (4)
    }
}
Sr. No. Explanation

(1)

Implement org.springframework.batch.core.step.tasklet.Tasklet interface using implements.

(2)

Implement the execute method defined by the Tasklet interface. The arguments StepContribution and ChunkContext can be used, however they are not explained here.

(3)

Implement any process. INFO log is output here.

(4)

Return whether or not the tasklet process is completed.
Always specify as return RepeatStatus.FINISHED;.

3.3.2.4. Implementation of tasklet using the components of chunk model

Spring Batch does not state whether the components of the chunk model may be used in a tasklet implementation. In TERASOLUNA Batch 5.x, you may choose to do so in situations such as the following.

  • When multiple resources are combined and processed, it is difficult to fit the chunk model format

  • When processing is implemented in various places in the chunk model, the tasklet model makes the overall picture easier to grasp

  • When you want to keep recovery simple by using the batch commit of the tasklet model instead of the intermediate commit of the chunk model

Note that the unit of processing should also be considered when implementing a Tasklet using the components of the chunk model. The following 3 patterns can be considered as the units of output records.

Units and features of output records
Output records Features

1 record

Since data is input, processed and output one record at a time, the processing is easy to picture.
It must be noted that performance deterioration is likely to occur due to frequent I/O in case of large amount of data.

All records

Data is input and processed one record at a time and accumulated in memory, and all records are output together at the end.
Data consistency can be ensured and performance can be improved in case of small amount of data. However, it must be noted that high load is likely to be applied on resources (CPU, memory) in case of large amount of data.

Fixed records

Data is input and processed one record at a time and accumulated in memory, and the data is output when a certain number of records is reached.
Performance improvement can be expected because a large amount of data is processed efficiently with fixed resources (CPU, memory).
Also, since the data is processed in fixed numbers of records, intermediate commits can be employed by implementing transaction control. However, it must be noted that, with the intermediate commit method, processed and unprocessed data may coexist when the job terminates abnormally, which must be considered during recovery.

The tasklet implementation that uses ItemReader and ItemWriter which are the components of the chunk model is explained below.

The implementation example shows processing data one by one for each record.

Tasklet implementation example that uses the components of chunk model
@Component()
@Scope("step") // (1)
public class SalesPlanChunkTranTask implements Tasklet {

    @Inject
    @Named("detailCSVReader") // (2)
    ItemStreamReader<SalesPlanDetail> itemReader; // (3)

    @Inject
    SalesPlanDetailRepository repository; // (4)

    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {

        SalesPlanDetail item;

        try {
            itemReader.open(chunkContext.getStepContext().getStepExecution()
                    .getExecutionContext()); // (5)

            while ((item = itemReader.read()) != null) { // (6)

                // do some processes.

                repository.create(item); // (7)
            }
        } finally {
            itemReader.close(); // (8)
        }
        return RepeatStatus.FINISHED;
    }
}
Bean definition example 1
<!-- omitted -->
<import resource="classpath:META-INF/spring/job-base-context.xml"/>

<context:annotation-config/>

<context:component-scan
    base-package="org.terasoluna.batch.functionaltest.app.plan" />
<context:component-scan
    base-package="org.terasoluna.batch.functionaltest.ch05.transaction.component" />

<!-- (9) -->
<mybatis:scan
    base-package="org.terasoluna.batch.functionaltest.app.repository.plan"
    factory-ref="jobSqlSessionFactory"/>

<!-- (10) -->
<bean id="detailCSVReader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}">
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
                      p:names="branchId,year,month,customerId,amount"/>
            </property>
            <property name="fieldSetMapper">
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
                      p:targetType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail"/>
            </property>
        </bean>
    </property>
</bean>

<!-- (11) -->
<batch:job id="createSalesPlanChunkTranTask" job-repository="jobRepository">
    <batch:step id="createSalesPlanChunkTranTask.step01">
        <batch:tasklet transaction-manager="jobTransactionManager"
                       ref="salesPlanChunkTranTask"/>
    </batch:step>
</batch:job>
Sr. No. Explanation

(1)

Set the same step scope as the Bean scope of ItemReader to be used in this class.

(2)

Access input resources (flat files in this example) through ItemReader.
The Bean name detailCSVReader is specified for clarity, but specifying a name is optional.

(3)

Define the type as ItemStreamReader, which is a sub-interface of ItemReader.
This is because the resource must be opened/closed as in (5) and (8). This point is supplemented later.

(4)

Access output resources (database in this example) through Mapper of MyBatis.
Mapper is directly used for the sake of simplicity. There is no need to always use ItemWriter. Of course, MyBatisBatchItemWriter can be used.

(5)

Open input resource.

(6)

Loop all input resources sequentially.
ItemReader#read returns null when it reads all the input data and reaches the end.

(7)

Output to database.

(8)

The resource must be closed without fail.
Exception handling should also be implemented. When an exception occurs, the transaction of the entire tasklet is rolled back, the stack trace of the exception is output, and the job terminates abnormally.

(9)

MyBatis-Spring settings.
For details of MyBatis-Spring settings, refer to Database access.

(10)

To input from a file, add the Bean definition of FlatFileItemReader. The details are not explained here.

(11)

Since all the components are resolved by annotation,
it is same as Implementation of simple tasklet.

On unification of scope

The tasklet implementation class and the Beans to be injected into it should have the same scope.

For example, if FlatFileItemReader receives an input file path from an argument, the Bean scope should be step. In this case, the scope of tasklet implementation class should also be step.

For example, suppose the scope of the tasklet implementation class is set to singleton. It is then instantiated when the ApplicationContext is generated at application startup, and Spring tries to resolve and inject the FlatFileItemReader instance at that time. However, the step-scoped FlatFileItemReader does not exist yet, because it is generated at step execution. As a result, the tasklet implementation class cannot be instantiated and ApplicationContext generation fails.

Regarding the type of field assigned with @Inject

Use any one of the following types, depending on the implementation class to be used.

  • ItemReader/ItemWriter

    • Used when there is no need to open/close the target resource.

  • ItemStreamReader/ItemStreamWriter

    • Used when there is a need to open/close the target resource.

It should be decided which type to use after confirming javadoc. Typical examples are shown below.

In case of FlatFileItemReader/Writer

handle by ItemStreamReader/ItemStreamWriter

In case of MyBatisCursorItemReader

handle by ItemStreamReader

In case of MyBatisBatchItemWriter

handle by ItemWriter

The following implementation example imitates the chunk model by processing a fixed number of records at a time.

Tasklet implementation example 2 that uses the components of chunk model
@Component
@Scope("step")
public class SalesPerformanceTasklet implements Tasklet {


    @Inject
    ItemStreamReader<SalesPerformanceDetail> reader;

    @Inject
    ItemWriter<SalesPerformanceDetail> writer; // (1)

    int chunkSize = 10; // (2)

    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {

        try {
            reader.open(chunkContext.getStepContext().getStepExecution()
                    .getExecutionContext());

            List<SalesPerformanceDetail> items = new ArrayList<>(chunkSize); // (2)
            SalesPerformanceDetail item = null;
            do {
                // Pseudo operation of ItemReader
                for (int i = 0; i < chunkSize; i++) { // (3)
                    item = reader.read();
                    if (item == null) {
                        break;
                    }
                    // Pseudo operation of ItemProcessor
                    // do some processes.

                    items.add(item);
                }

                // Pseudo operation of ItemWriter
                if (!items.isEmpty()) {
                    writer.write(items); // (4)
                    items.clear();
                }
            } while (item != null);
        } finally {
            try {
                reader.close();
            } catch (Exception e) {
                // do nothing.
            }
        }

        return RepeatStatus.FINISHED;
    }
}
Bean definition example 2
<!-- omitted -->
<import resource="classpath:META-INF/spring/job-base-context.xml"/>

<context:annotation-config/>
<context:component-scan
    base-package="org.terasoluna.batch.functionaltest.app.common,
        org.terasoluna.batch.functionaltest.app.performance,
        org.terasoluna.batch.functionaltest.ch06.exceptionhandling"/>
<mybatis:scan
    base-package="org.terasoluna.batch.functionaltest.app.repository.performance"
    factory-ref="jobSqlSessionFactory"/>

<bean id="detailCSVReader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}">
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
                      p:names="branchId,year,month,customerId,amount"/>
            </property>
            <property name="fieldSetMapper">
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
                      p:targetType="org.terasoluna.batch.functionaltest.app.model.performance.SalesPerformanceDetail"/>
            </property>
        </bean>
    </property>
</bean>

<!-- (1) -->
<bean id="detailWriter"
      class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="org.terasoluna.batch.functionaltest.app.repository.performance.SalesPerformanceDetailRepository.create"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>


<batch:job id="jobSalesPerfTasklet" job-repository="jobRepository">
    <batch:step id="jobSalesPerfTasklet.step01">
        <batch:tasklet ref="salesPerformanceTasklet"
                       transaction-manager="jobTransactionManager"/>
    </batch:step>
</batch:job>
Sr. No. Explanation

(1)

Use MyBatisBatchItemWriter as the implementation of ItemWriter.

(2)

ItemWriter outputs a fixed number of records collectively.
Here, 10 records are processed and output at a time.

(3)

Following the behavior of the chunk model,
the processing order is read→process→read→process→…​→write.

(4)

Output through ItemWriter collectively.

Decide each time whether to use the implementation classes of ItemReader and ItemWriter. For file access, the implementation classes of ItemReader and ItemWriter should be used. For other resources such as database access, there is no need to use them compulsorily; use them where they help improve performance.

3.4. How to choose chunk model or tasklet model

Here, how to choose between the chunk model and the tasklet model is explained by organizing the features of each. Refer to the chapters where each is explained in detail as appropriate.

Understand the following contents as examples of concepts, not as constraints or recommendations. Refer to them while creating a job, depending on the characteristics of the users and systems.

The main differences between the chunk model and the tasklet model are given below.

Comparison of chunk model and tasklet model.
Item Chunk Tasklet

Components

It consists of 3 components mainly ItemReader, ItemProcessor and ItemWriter.

It is consolidated in one Tasklet.

Transaction

A certain number of records is processed at a time by issuing intermediate commits. Batch commit cannot be done.
The job can be processed with fixed machine resources regardless of the data count.
If an error occurs midway, processed data and unprocessed data get mixed.

The data is entirely processed with a single batch commit. Intermediate commits must be implemented by the user.
If the data to be processed is large, machine resources may get exhausted.
If an error occurs midway, the processed data is rolled back, returning the data to the state before the job execution.

Restart

It can be restarted based on the record count.

It cannot be restarted based on the record count.

Based on this, we will introduce some examples of using each one as follows.

To make recovery as simple as possible

When a job that ended with an error is to be recovered simply by re-running the target job, the tasklet model can be chosen to keep recovery simple.
In the chunk model, recovery requires either returning the processed data to the state before the job execution, or creating a job that processes only the unprocessed data.

To consolidate the process contents

When you want to prioritize an easy overview of the job, such as 1 job in 1 class, the tasklet model can be chosen.

To process large data stably

When performing batch processing of, for example, 10 million records, consider using the chunk model if the record count can strain machine resources. This stabilizes the processing through intermediate commits. Intermediate commits can be used in the tasklet model as well, but they are simpler to implement in the chunk model.

To restart based on the record count for the recovery after error

When the batch window is tight and you want to resume from the error data onwards, choose the chunk model to use the record-count-based restart provided by Spring Batch. This eliminates the need to build such a mechanism for each job.

The chunk model and the tasklet model are basically used in combination.
It is not necessary to implement all the jobs in the batch system with only one of the models.
Choose one model based on the characteristics of the jobs in the entire system, and use the other model in accordance with the situation.

For example, in most cases the tasklet model is chosen when there is a margin in the number of records to process and in the processing time, and the chunk model is chosen for the small number of jobs that process large numbers of records.

4. Running a job

4.1. Synchronous job

4.1.1. Overview

Synchronous job is explained. Synchronous job is the execution method of launching a new process through shell by job scheduler and returning the execution result of the job to the caller.

overview of sync job
Overview of synchronous job
sequence of sync job
Sequence of synchronous job

The usage method of this function is same in the chunk model as well as tasklet model.

4.1.2. How to use

How to run a job with CommandLineJobRunner is explained.

Refer to Create project for building and executing the application, and to Job parameters for how to specify and use job parameters. Some explanations in those references overlap with this section, however the elements of the synchronous job are mainly explained here.

4.1.2.1. How to run

In TERASOLUNA Batch 5.x, run the synchronous job using CommandLineJobRunner provided by Spring Batch. Start CommandLineJobRunner by issuing java command as shown below.

CommandLineJobRunner syntax
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <options> <jobIdentifier> <jobParameters>
Items to be specified by the arguments
Items to be specified Explanation Required

jobPath

Bean definition file path where the settings of the job to be run are described. Specify it as a relative path from the classpath.

options

Specify various options (stop, restart etc.) at the time of launching.

jobIdentifier

Specify job name in Bean definition or job run ID after job execution as the job identifier. Normally, specify job name. Specify job run ID only when specifying stop and restart options.

jobParameters

Specify job arguments. Specify in key=value format.

The execution example when only the required items are specified, is shown below.

Execution example 1 of CommandLineJobRunner
$ java -cp 'target/[artifactId]-[version].jar:lib/*' \ # (1)
    org.springframework.batch.core.launch.support.CommandLineJobRunner \ # (2)
    META-INF/jobs/job01.xml job01 # (3)
Settings of Bean definition (Abstract)
<batch:job id="job01" job-repository="jobRepository"> <!-- (3) -->
    <batch:step id="job01.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="employeeReader"
                         processor="employeeProcessor"
                         writer="employeeWriter" commit-interval="10" />
        </batch:tasklet>
    </batch:step>
</batch:job>
Items list of setting contents
Sr. No. Explanation

(1)

Specify the batch application jar and dependent jar in classpath at the time of executing java command. Here, it is specified by command arguments however, environment variables can also be used.

(2)

Specify CommandLineJobRunner with FQCN in the class to be run.

(3)

Pass the run arguments to CommandLineJobRunner. Here, the 2 arguments jobPath (META-INF/jobs/job01.xml) and jobIdentifier (job01) are specified.

An execution example where launch parameters are specified as optional items is shown below.

Execution example 2 of CommandLineJobRunner
$ java -cp 'target/[artifactId]-[version].jar:lib/*' \
    org.springframework.batch.core.launch.support.CommandLineJobRunner \
    META-INF/jobs/setupJob.xml setupJob target=server1 outputFile=/tmp/result.csv # (1)
Items list of setting contents
Sr. No. Explanation

(1)

target=server1 and outputFile=/tmp/result.csv are specified as job running parameters.

4.1.2.2. Options

Supplement the options indicated in CommandLineJobRunner syntax.

In CommandLineJobRunner, the following 4 launch options can be used. Here, only the overview of each option is explained.

-restart

Restarts the failed job. Refer to Reprocessing for the details.

-stop

Stops a running job. Refer to Job management for the details.

-abandon

Abandons a stopped job. The abandoned job cannot be restarted. In TERASOLUNA Batch 5.x, there is no case of using this option, hence it is not explained.

-next

Runs a job that has been executed once in the past, again. However, this option is not used in TERASOLUNA Batch 5.x.
It exists to avoid the restriction given by default in Spring Batch that "running a job with the same parameters is recognized as the same job, and the same job can be executed only once". TERASOLUNA Batch 5.x avoids this restriction by a different means, which is explained in Regarding parameter conversion class.
Using this option requires an implementation class of the JobParametersIncrementer interface, which is not set in TERASOLUNA Batch 5.x.
Therefore, when this option is specified at launch, an error occurs because the required Bean definition does not exist.

4.2. Job parameters

4.2.1. Overview

This section explains how to use job parameters (hereafter referred to as 'parameters').

The usage method of this function is same in the chunk model as well as tasklet model.

A parameter is used to flexibly switch the operation of the job according to the execution environment and execution timing as shown below.

  • File path of process target

  • System operation date and time

The following explanation is about assigning parameters.

The specified parameters can be referenced in Bean definitions or in Java code under Spring management.

4.2.2. How to use

4.2.2.1. Regarding parameter conversion class

In Spring Batch, the received parameters are processed in the following sequence.

  1. The implementation class of JobParametersConverter converts the arguments to JobParameters.

  2. Bean definitions and Java code under Spring management refer to the parameters through JobParameters.

Regarding implementation class of parameter conversion class

Multiple implementation classes of the above mentioned JobParametersConverter are provided. The features of each class are shown below.

  • DefaultJobParametersConverter

    • It can specify the data type of parameters (4 types: String, Long, Date, Double).

  • JsrJobParametersConverter

    • It cannot specify the data type of parameters (Only String).

    • It automatically assigns an ID (RUN_ID) that identifies the job execution to a parameter named jsr_batch_run_id.

      • It increments RUN_ID each time a job is executed. Since a database SEQUENCE (named JOB_SEQ) is used for the increment, values never overlap.

      • In Spring Batch, running a job with the same parameters is identified as the same job, and the same job can be executed only once. However, adding a unique value under the parameter name jsr_batch_run_id causes it to be recognized as a separate job. Refer to Spring Batch architecture for details.

In Spring Batch, when the implementation class of JobParametersConverter to be used is not specified in the Bean definition, DefaultJobParametersConverter is used.
However, TERASOLUNA Batch 5.x does not use DefaultJobParametersConverter for the following reasons.

  • It is common to run one job with the same parameters at different timings.

  • It is possible to specify a time stamp of the start time so that runs are managed as different jobs, but specifying a job parameter solely for that purpose is cumbersome.

  • DefaultJobParametersConverter can specify data types for parameters, but handling becomes complicated when type conversion fails.

In TERASOLUNA Batch 5.x, by using JsrJobParametersConverter, RUN_ID is assigned automatically without the user being aware of it. As a result, from the user's point of view, the same job is handled as a different job by Spring Batch.
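The effect of appending RUN_ID can be illustrated with a small, self-contained sketch (a conceptual model only; class and method names are hypothetical, and this is not how Spring Batch is implemented internally). A job instance is identified by its name plus its parameters, so adding an auto-incremented jsr_batch_run_id makes every launch distinct.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

public class JobIdentitySketch {

    // Stands in for the database SEQUENCE that numbers RUN_ID (conceptual).
    private static final AtomicLong RUN_ID_SEQ = new AtomicLong(0);

    // Conceptual model: a job instance is identified by job name + sorted parameters.
    static String identityKey(String jobName, Map<String, String> params) {
        return jobName + ":" + new TreeMap<>(params);
    }

    // Mimics what JsrJobParametersConverter achieves: adds a unique jsr_batch_run_id.
    static Map<String, String> withRunId(Map<String, String> params) {
        Map<String, String> copy = new TreeMap<>(params);
        copy.put("jsr_batch_run_id", String.valueOf(RUN_ID_SEQ.incrementAndGet()));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("target", "server1");

        // Without RUN_ID: two launches with the same parameters have the same identity.
        String first = identityKey("setupJob", params);
        String second = identityKey("setupJob", params);
        System.out.println(first.equals(second));   // true -> treated as the same job

        // With RUN_ID: every launch gets a distinct identity.
        String third = identityKey("setupJob", withRunId(params));
        String fourth = identityKey("setupJob", withRunId(params));
        System.out.println(third.equals(fourth));   // false -> treated as different jobs
    }
}
```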

About setting of parameter conversion class

In TERASOLUNA Batch 5.x, it is set in advance so as to use JsrJobParametersConverter in launch-context.xml.
Therefore, when TERASOLUNA Batch 5.x is used with the recommended setting, there is no need to set JobParametersConverter.

META-INF/spring/launch-context.xml
<bean id="jobParametersConverter"
      class="org.springframework.batch.core.jsr.JsrJobParametersConverter"
      c:dataSource-ref="adminDataSource" />

<bean id="jobOperator"
      class="org.springframework.batch.core.launch.support.SimpleJobOperator"
      p:jobRepository-ref="jobRepository"
      p:jobRegistry-ref="jobRegistry"
      p:jobExplorer-ref="jobExplorer"
      p:jobParametersConverter-ref="jobParametersConverter"
      p:jobLauncher-ref="jobLauncher" />

The following description assumes that JsrJobParametersConverter is used.

4.2.2.2. Assign from command-line arguments

Firstly, how to assign from the most basic command-line arguments, is explained.

Assignment of parameters

Command-line arguments are enumerated in the <Parameter name>=<Value> format from the third argument of CommandLineJobRunner onward.

The number and length of parameters are not restricted by Spring Batch or TERASOLUNA Batch 5.x. However, the OS restricts the length of command arguments.
Therefore, when a large number of arguments is required, use Redirect from file to standard input or Using parameters and properties together.

Example of setting parameters as command-line arguments
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID param1=abc outputFileName=/tmp/result.csv
Refer to parameters

Parameters can be referred in Bean definition or in Java as shown below.

  • Refer in Bean definition

    • It can be referred by #{jobParameters['xxx']}

  • Refer in Java

    • It can be referred by @Value("#{jobParameters['xxx']}")

The scope of the Bean that refers to JobParameters should be Step scope

When referring to JobParameters, the scope of the referring Bean must be set to Step scope. This is in order to use Spring Batch's late binding mechanism when referring to JobParameters.

As its name implies, late binding means setting a value at a delayed point. By default, ApplicationContext of Spring Framework resolves the properties of the various Beans before generating the ApplicationContext instance. Spring Batch, on the other hand, does not resolve such properties when the ApplicationContext instance is generated; it has a function to resolve a property when the Bean is actually required. This is what 'late' means. With this function, the behavior of Beans can be altered according to parameters after the ApplicationContext required for running Spring Batch itself has been generated.

Step scope is a unique scope of Spring Batch and a new instance is generated for each Step execution. The value with late binding can be resolved by using SpEL expression in Bean definition.

@StepScope annotation cannot be used for specifying Step scope

In Spring Batch, @StepScope is provided as the annotation that specifies Step scope. However, this is an annotation that can only be used in JavaConfig.

Therefore, specify the Step scope in TERASOLUNA Batch 5.x by any one of the following methods.

  1. In Bean definition, assign scope="step" to Bean.

  2. In Java, assign @Scope("step") to class.

Example of referring to the parameter assigned by the command-line arguments in Bean definition
<!-- (1) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}">  <!-- (2) -->
    <property name="lineMapper">
        <!-- omitted settings -->
    </property>
</bean>
Items list of setting contents
Sr. No. Explanation

(1)

Specify scope as scope attribute in bean tag.

(2)

Specify the parameter to be referred.

Example of referring to the parameter assigned by the command-line arguments in Java
@Component
@Scope("step")  // (1)
public class ParamRefInJavaTasklet implements Tasklet {

    /**
     * Holds a String type value
     */
    @Value("#{jobParameters['str']}")  // (2)
    private String str;

    // omitted execute()
}
Items list of setting contents
Sr. No. Explanation

(1)

Specify scope by assigning @Scope annotation in class.

(2)

Specify the parameter to be referred by using @Value annotation.

4.2.2.3. Redirect from file to standard input

How to redirect from file to standard input is explained.

Creation of file for defining parameters

Define the parameters in the files as follows.

params.txt
param1=abc
outputFile=/tmp/result.csv
Redirect the files wherein parameters are defined to standard input

Redirect the files wherein parameters are defined as command-line arguments.

Execution method
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID < params.txt
Refer to parameters

How to refer to the parameters is same as the Assign from command-line arguments method.

4.2.2.4. Set the default value of parameter

When parameters are optional, default values can be set in the following format.

  • #{jobParameters['Parameter name'] ?: Default value}

However, like the parameters themselves, the default values of items set from parameters may also differ with the environment and execution timing.

Firstly, how to hardcode default values in source code is explained. However, in many cases it is better to use Using parameters and properties together, so refer to that section as well.
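As a side note, the ?: (Elvis) operator used in the SpEL expressions below behaves like a null check: the left-hand value is used when it is non-null, otherwise the default is used. A minimal plain-Java sketch of the semantics (the class and method names are hypothetical):

```java
public class ElvisSketch {

    // SpEL's "a ?: b" returns a when a is non-null, otherwise b.
    static String elvis(String value, String defaultValue) {
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        // Parameter set: the parameter value wins.
        System.out.println(elvis("/data/in.csv", "/input/sample.csv")); // /data/in.csv
        // Parameter absent (null): the default value is used.
        System.out.println(elvis(null, "/input/sample.csv"));           // /input/sample.csv
    }
}
```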

Refer to the parameter wherein default value is set

When the relevant parameter is not set, the value set as the default value is referred.

Example of referring to the parameter assigned by the command-line arguments in Bean definition
<!-- (1) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile'] ?: '/input/sample.csv'}">  <!-- (2) -->
    <property name="lineMapper">
        <!-- omitted settings -->
    </property>
</bean>
Items list of setting contents
Sr. No. Explanation

(1)

Specify the scope as scope attribute in the bean tag.

(2)

Specify the parameter to be referred.
/input/sample.csv is set as the default value.

Example of referring to the parameter assigned by the command-line arguments in Java
@Component
@Scope("step")  // (1)
public class ParamRefInJavaTasklet implements Tasklet {

    /**
     * Holds a String type value
     */
    @Value("#{jobParameters['str'] ?: 'xyz'}")  // (2)
    private String str;

    // omitted execute()
}
Items list of setting contents
Sr. No. Explanation

(1)

Specify the scope by assigning @Scope annotation in class.

(2)

Specify the parameter to be referred by using @Value annotation.
xyz is set as the default value.

4.2.2.5. Validation of parameters

Validation of the parameters is required at job launch in order to prevent operation errors or unintended behavior.
Validation of parameters can be implemented by using the JobParametersValidator provided by Spring Batch.

Since parameters are referred at various places such as ItemReader/ItemProcessor/ItemWriter, validation is performed immediately after the job is launched.

There are two ways to validate parameters, depending on the complexity of the verification.

  • Simple validation

    • Application example

      • Verify that the required parameters are set

      • Verify that the unspecified parameters are not set

    • Validator to be used

      • DefaultJobParametersValidator provided by Spring Batch

  • Complex validation

    • Application example

      • Numerical value range verification and complex verification such as correlation check between parameters

      • Verification that cannot be done by DefaultJobParametersValidator provided by Spring Batch

    • Validator to be used

      • Class wherein JobParametersValidator is implemented independently

How to perform Simple validation and Complex validation is explained below.

4.2.2.5.1. Simple validation

Spring Batch provides DefaultJobParametersValidator as the default implementation of JobParametersValidator.
This validator can verify the following as per the settings.

  • Required parameters should be set

  • Parameters other than required or optional should not be specified

Definition example is shown as follows.

Definition of validation that uses DefaultJobParametersValidator
<!-- (1) -->
<bean id="jobParametersValidator"
      class="org.springframework.batch.core.job.DefaultJobParametersValidator">
  <property name="requiredKeys">  <!-- (2) -->
    <list>
        <value>jsr_batch_run_id</value>  <!-- (3) -->
        <value>inputFileName</value>
        <value>outputFileName</value>
    </list>
  </property>
  <property name="optionalKeys">  <!-- (4) -->
    <list>
        <value>param1</value>
        <value>param2</value>
    </list>
  </property>
</bean>

<batch:job id="jobUseDefaultJobParametersValidator" job-repository="jobRepository">
  <batch:step id="jobUseDefaultJobParametersValidator.step01">
    <batch:tasklet ref="sampleTasklet" transaction-manager="jobTransactionManager"/>
  </batch:step>
  <batch:validator ref="jobParametersValidator"/>  <!-- (5) -->
</batch:job>
Items list of setting contents
Sr. No. Explanation

(1)

Define Bean for DefaultJobParametersValidator.

(2)

Set the required parameters to property requiredKeys.
Multiple parameter names of the required parameters can be specified using list tag.

(3)

Set jsr_batch_run_id to the required parameters.
In TERASOLUNA Batch 5.x, this setting is mandatory when using DefaultJobParametersValidator.
The reason for making the setting mandatory is explained later.

(4)

Set optional parameters to property optionalKeys.
Multiple parameter names of the optional parameters can be specified using list tag.

(5)

Apply the validator to the job using validator tag in the job tag.

Required parameters that cannot be omitted in TERASOLUNA Batch 5.x

JsrJobParametersConverter is used for parameter conversion in TERASOLUNA Batch 5.x, so the following parameter is always set.

  • jsr_batch_run_id

Therefore, jsr_batch_run_id should be included in the requiredKeys.
Refer to Regarding parameter conversion class for detailed explanation.

Example of parameter definition
<bean id="jobParametersValidator"
      class="org.springframework.batch.core.job.DefaultJobParametersValidator">
  <property name="requiredKeys">
    <list>
        <value>jsr_batch_run_id</value>  <!-- mandatory -->
        <value>inputFileName</value>
        <value>outputFileName</value>
    </list>
  </property>
  <property name="optionalKeys">
    <list>
        <value>param1</value>
        <value>param2</value>
    </list>
  </property>
</bean>
OK case and NG case when DefaultJobParametersValidator is used

Examples where the verification result is OK and NG are shown, to clarify which conditions DefaultJobParametersValidator can verify.

DefaultJobParametersValidator definition example
<bean id="jobParametersValidator"
    class="org.springframework.batch.core.job.DefaultJobParametersValidator"
    p:requiredKeys="outputFileName"
    p:optionalKeys="param1"/>
NG case1
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID param1=aaa

NG as the required parameter outputFileName is not set.

NG case 2
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID outputFileName=/tmp/result.csv param2=aaa

NG as the parameter param2 which is not specified for either the required parameter or the optional parameter is set.

OK case 1
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID param1=aaa outputFileName=/tmp/result.csv

OK as the parameters specified as required and optional are set.

OK case 2
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID outputFileName=/tmp/result.csv

OK as the required parameters are set and there is no need to set optional parameters.
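Conceptually, the two checks performed by DefaultJobParametersValidator can be sketched in plain Java as follows (a simplified model with hypothetical names, not the actual Spring Batch implementation):

```java
import java.util.List;
import java.util.Set;

public class SimpleValidationSketch {

    // Check 1: every required key must be present.
    // Check 2: every given key must be either required or optional.
    static void validate(Set<String> given, List<String> requiredKeys, List<String> optionalKeys) {
        for (String required : requiredKeys) {
            if (!given.contains(required)) {
                throw new IllegalArgumentException("Missing required key: " + required);
            }
        }
        for (String key : given) {
            if (!requiredKeys.contains(key) && !optionalKeys.contains(key)) {
                throw new IllegalArgumentException("Unexpected key: " + key);
            }
        }
    }

    public static void main(String[] args) {
        List<String> required = List.of("outputFileName");
        List<String> optional = List.of("param1");

        validate(Set.of("outputFileName", "param1"), required, optional); // OK case 1
        validate(Set.of("outputFileName"), required, optional);           // OK case 2

        try {
            validate(Set.of("param1"), required, optional);               // NG case 1
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Missing required key: outputFileName
        }
        try {
            validate(Set.of("outputFileName", "param2"), required, optional); // NG case 2
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unexpected key: param2
        }
    }
}
```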

4.2.2.5.2. Complex validation

Implementing the JobParametersValidator interface independently allows parameters to be verified according to the requirements.

Implement JobParametersValidator class as follows.

  • Implement the JobParametersValidator interface and override the validate method

  • Implement validate method as follows

    • Fetch each parameter from JobParameters and verify

      • If the verification result is OK, there is no need to perform any operation

      • If verification result is NG, throw JobParametersInvalidException

Implementation example of JobParametersValidator class is shown. In this case, it is verified that the length of the string specified by str is less than or equal to the number specified by num.

Implementation example of JobParametersValidator interface
public class ComplexJobParametersValidator implements JobParametersValidator {  // (1)
    @Override
    public void validate(JobParameters parameters) throws JobParametersInvalidException {
        Map<String, JobParameter> params = parameters.getParameters();  // (2)

        String str = params.get("str").getValue().toString();  // (3)
        int num = Integer.parseInt(params.get("num").getValue().toString()); // (4)

        if(str.length() > num){
            throw new JobParametersInvalidException(
            "The str must be less than or equal to num. [str:"
                    + str + "][num:" + num + "]");  // (5)
        }
    }
}
Items list of setting contents
Sr. No. Explanation

(1)

Implement the JobParametersValidator interface and override the validate method.

(2)

Receive the parameters as an argument of type JobParameters.
By calling parameters.getParameters(), the parameters are fetched in Map format, making them easier to refer to.

(3)

Get parameters by specifying key.

(4)

Convert the parameter to int type. Parameters of types other than String should be converted appropriately.

(5)

Validation result is NG when the string length of the parameter str exceeds the value of parameter num.

Job definition example
<batch:job id="jobUseComplexJobParametersValidator" job-repository="jobRepository">
    <batch:step id="jobUseComplexJobParametersValidator.step01">
        <batch:tasklet ref="sampleTasklet" transaction-manager="jobTransactionManager"/>
    </batch:step>
    <batch:validator>  <!-- (1) -->
        <bean class="org.terasoluna.batch.functionaltest.ch04.jobparameter.ComplexJobParametersValidator"/>
    </batch:validator>
</batch:job>
Items list of setting contents
Sr. No. Explanation

(1)

Apply validator in the job by using validator tag in the job tag.

Regarding validation of parameters at asynchronous start

With the asynchronous start methods (DB polling and Web container), parameters can be verified at job launch in the same way. However, it is desirable to verify them before launching the job, at the following timings.

  • DB polling

    • Before INSERTing to job request table

  • Web container

    • At the time of calling Controller (assign @Validated)

In case of asynchronous start, the result must be confirmed separately, so errors such as invalid parameter settings should be detected quickly and such job requests rejected.

For validation at this timing, JobParametersValidator need not be used. The function that INSERTs into the job request table, and the Controller in the Web container, generally should not depend on Spring Batch; it is better to avoid introducing a dependency on Spring Batch just to use JobParametersValidator.

4.2.3. How to extend

4.2.3.1. Using parameters and properties together

Spring Framework, on which Spring Batch is based, is equipped with a property management function, enabling it to handle values set in environment variables and property files. For details, refer to Property management in the TERASOLUNA Server 5.x Development Guideline.

By combining properties and parameters, it is possible to overwrite some parameters after making common settings for most jobs in the property file.

About when parameters and properties are resolved

As mentioned above, parameters and properties are different components that provide the function.
Spring Batch has a function of parameter management and Spring Framework has a function of property management.
This difference appears in the description method.

  • In case of function possessed by Spring Batch

    • #{jobParameters['xxx']}

  • In case of function possessed by Spring Framework

    • @Value("${xxx}")

The timing of resolving each value is different.

  • In case of function possessed by Spring Batch

    • It is set when the job is executed, after the ApplicationContext is generated.

  • In case of function possessed by Spring Framework

    • It is set at the time of generating the ApplicationContext.

Therefore, the parameter value, being resolved later, is given priority by Spring Batch.
Note that since combining the two is what makes this useful, understand and handle each of them individually according to this difference.

How to set by combining properties and parameters, is explained.

Setting by command-line arguments in addition to environment variables

How to set parameters using command-line arguments in addition to environment variables is explained.
They can be referenced in Bean definitions in the same manner.

Example of setting parameters by command-line arguments in addition to environment variables
# Set environment variables
$ export env1=aaa
$ export env2=bbb

# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID param3=ccc outputFile=/tmp/result.csv
Example of referring environment variables and parameters in Java
@Value("${env1}")  // (1)
private String param1;

@Value("${env2}")  // (1)
private String param2;

private String param3;

@Value("#{jobParameters['param3']}")  // (2)
public void setParam3(String param3) {
    this.param3 = param3;
}
Items list of setting contents
Sr. No. Explanation

(1)

Specify the environment variables to be referred by using @Value annotation.
The format for reference is ${Environment variable name}.

(2)

Specify the parameters to be referred by using @Value annotation.
The format for reference is #{jobParameters['Parameter name']}.

Example when environment variables are default
# Set environment variables
$ export env1=aaa

# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
    JobDefined.xml JOBID param1=bbb outputFile=/tmp/result.csv
Example of referring parameters by setting default values for environment variables in Java
@Value("#{jobParameters['param1'] ?: '${env1}'}")  // (1)
public void setParam1(String param1) {
    this.param1 = param1;
}
Items list of setting contents
Sr. No. Explanation

(1)

Specify the parameters to be referred by using @Value annotation by setting default values in environment variables.
When parameters are not set, the value of environment variables are set.

How to set incorrect default values

When the following is defined and param1 is not set by command-line arguments, note that null is set in param1, even though the intent is to set the value of env1.

Setting method example of incorrect default value
@Value("${env1}")
private String param1;

@Value("#{jobParameters['param1']}")
public void setParam1(String param1) {
  this.param1 = param1;
}
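This pitfall can be reproduced conceptually: the property value is injected first, then the late-bound setter runs and overwrites the field, with null when the parameter is absent. A plain-Java simulation (hypothetical names; Spring's actual injection mechanism is more involved):

```java
public class DefaultValuePitfallSketch {

    String param1;

    // Simulates @Value("${env1}") field injection resolved from the property source.
    void injectProperty(String envValue) {
        this.param1 = envValue;
    }

    // Simulates the @Value("#{jobParameters['param1']}") setter, which runs
    // afterwards and receives null when the job parameter is absent.
    void setParam1(String jobParameterValue) {
        this.param1 = jobParameterValue;
    }

    public static void main(String[] args) {
        DefaultValuePitfallSketch bean = new DefaultValuePitfallSketch();
        bean.injectProperty("aaa");      // env1=aaa is injected: param1 = "aaa"
        bean.setParam1(null);            // param1 not given: the setter overwrites it
        System.out.println(bean.param1); // null, not "aaa"
    }
}
```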

4.3. Asynchronous execution (DB polling)

4.3.1. Overview

Running a job using DB polling is explained.

The usage method of this function is same in the chunk model as well as tasklet model.

4.3.1.1. What is asynchronous execution by using DB polling?

A dedicated table which registers jobs to be executed asynchronously (hereafter referred to as the Job-request-table) is monitored periodically, and jobs are executed asynchronously based on the registered information.
In TERASOLUNA Batch 5.x, the module which monitors the table and starts jobs is called the asynchronous batch daemon. The asynchronous batch daemon runs as a single Java process and executes each job on a thread assigned within that process.

4.3.1.1.1. Functions offered by TERASOLUNA Batch 5.x

TERASOLUNA Batch 5.x offers following functions as Asynchronous execution (DB polling).

List of asynchronous execution (DB polling) functions
Function Description

Asynchronous batch daemon function

A function which permanently executes Job-request-table polling function

Job-request-table polling function

A function which asynchronously executes the job based on information registered in the Job-request-table.
It also offers a table definition of Job-request-table.

Usage premise

Only job requests are managed in the Job-request-table. The execution status and results of requested jobs are entrusted to JobRepository. Job status is assumed to be managed through these two together.

Further, if an in-memory database is used for JobRepository, the JobRepository is cleared when the asynchronous batch daemon terminates, and job execution status and results can no longer be referred to. Hence, it is assumed that a database that ensures persistence is used for JobRepository.

Using in-memory database

An in-memory database can be used when the success or failure of job execution can be obtained without referring to JobRepository.
When long-term continuous operation is performed with an in-memory database, a large quantity of memory resources is likely to be consumed, adversely affecting job execution.
In other words, an in-memory database is not suitable for long-term continuous operation and should be restarted periodically.
However, if it is nevertheless used for long-term continuous operation, maintenance work such as periodically deleting data from JobRepository is necessary.
In case of a restart, if initialization is enabled, the JobRepository is recreated at restart, so no maintenance is required. For initialization, refer to Database related settings.

4.3.1.1.2. Usage scene

A few scenes in which asynchronous execution (DB polling) is used are shown.

List of application scenes
Usage scene Description

Delayed processing

When the operation does not need to be completed immediately in coordination with online processing, and a time-consuming process is extracted as a job.

Continuous execution of jobs with short processing time

When jobs each taking a few seconds to a few tens of seconds are processed continuously.
Asynchronous execution (DB polling) avoids the resource strain of starting and stopping a Java process per job. Further, since start and end processing is omitted, job execution time can be reduced.

Aggregation of large number of jobs

Same as continuous execution of jobs with short processing time.

Points to choose asynchronous execution(DB polling) instead of asynchronous execution (Web container)

Points to choose asynchronous execution(DB polling) instead of "Asynchronous execution (Web container)" are shown below.

  • Introducing a WebAP server into batch processing is a hurdle

  • Only the database needs to be considered when ensuring availability

    • On the other hand, since access concentrates on the database, it does not scale as well as asynchronous execution (Web container).

Reasons not to use Spring Batch Integration

The same function can be implemented by using Spring Batch Integration.
However, using Spring Batch Integration requires understanding and adopting technical elements beyond those of asynchronous execution.
Accordingly, adoption of Spring Batch Integration is deferred in order to avoid difficulty in understanding, using, and customizing this function.

Precautions in asynchronous execution (DB polling)

When a large number of very short batches, each taking less than a few seconds, is executed, the database including JobRepository is accessed each time. Since performance degradation can occur here, mass processing of very short batches is not suitable for asynchronous execution (DB polling). When using this function, review this point adequately to confirm that the target performance is met.

4.3.2. Architecture

4.3.2.1. Processing sequence of DB polling

Processing sequence of DB polling is explained.

sequence of DB polling
Processing sequence diagram of DB polling
  1. Launch AsyncBatchDaemon from sh etc.

  2. AsyncBatchDaemon reads all Bean definition files which define jobs at startup.

  3. AsyncBatchDaemon starts TaskScheduler for polling at regular intervals.

    • TaskScheduler starts a specific process at regular interval.

  4. TaskScheduler starts JobRequestPollTask (a process which performs polling of Job-request-table).

  5. JobRequestPollTask fetches records whose polling status is "not executed" (INIT) from the Job-request-table.

    • Fetch a fixed number of records collectively. Default is 3 records.

    • When no target record exists, polling continues at regular intervals. The default is a 5-second interval.

  6. JobRequestPollTask allocates jobs to threads and executes them based on the record information.

  7. JobRequestPollTask updates polling status of the Job-request-table to "polled" (POLLED).

    • When the upper limit of concurrently executing jobs has been reached, the records that could not be launched among the fetched records are discarded and fetched again at the next polling.

  8. The job assigned to a thread is run with JobOperator.

  9. The job execution ID (job execution id) of the launched job is fetched.

  10. JobRequestPollTask updates the polling status of the Job-request-table to "Executed" (EXECUTED) based on job execution ID fetched at the time of job execution.
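Steps 5 to 10 above can be sketched as a single polling pass (a simplified, single-threaded model with hypothetical names; the real JobRequestPollTask dispatches jobs to worker threads and reads/writes the database):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class PollTaskSketch {

    // Conceptual model of one batch_job_request record (not the real schema class).
    static class JobRequest {
        final long jobSeqId;
        final String jobName;
        String pollingStatus = "INIT";
        Long jobExecutionId;

        JobRequest(long jobSeqId, String jobName) {
            this.jobSeqId = jobSeqId;
            this.jobName = jobName;
        }
    }

    private static final AtomicLong EXECUTION_ID_SEQ = new AtomicLong(0);

    // One polling pass: fetch up to 3 INIT records (the default fetch size),
    // mark each POLLED, "run" the job, then mark it EXECUTED with the execution ID.
    static void pollOnce(List<JobRequest> table) {
        table.stream()
             .filter(r -> "INIT".equals(r.pollingStatus))   // step 5
             .limit(3)                                      // default fetch size
             .forEach(r -> {
                 r.pollingStatus = "POLLED";                // step 7
                 long executionId = runJob(r.jobName);      // steps 8-9
                 r.jobExecutionId = executionId;
                 r.pollingStatus = "EXECUTED";              // step 10
             });
    }

    // Stand-in for launching the job via JobOperator; returns a job execution ID.
    static long runJob(String jobName) {
        return EXECUTION_ID_SEQ.incrementAndGet();
    }

    public static void main(String[] args) {
        List<JobRequest> table = new ArrayList<>(List.of(
                new JobRequest(1, "jobA"), new JobRequest(2, "jobB"),
                new JobRequest(3, "jobC"), new JobRequest(4, "jobD")));
        pollOnce(table);
        System.out.println(table.get(2).pollingStatus); // EXECUTED
        System.out.println(table.get(3).pollingStatus); // INIT (picked up next poll)
    }
}
```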

Supplement of processing sequence

The Spring Batch reference shows that asynchronous execution can be implemented by setting an AsyncTaskExecutor on JobLauncher. However, when this method is adopted, the state in which a job cannot be executed (no thread can be assigned to the job) cannot be detected, which is likely to lead to the following events.

  • Even though the job cannot be executed, launching continues to be attempted and unnecessary processing continues.

  • Jobs do not run in polling order; depending on when a thread becomes free, they appear to start from the Job-request-table in random order.

The processing sequence described earlier is used in order to avoid this phenomenon.

4.3.2.2. About the table to be polled

Explanation is given about table which performs polling in asynchronous execution (DB polling).

Following database objects are necessary.

  • Job-request-table (Required)

  • Job sequence (Required for some database products)

    • It is necessary when database does not support auto-numbering of columns.

4.3.2.2.1. Job-request-table structure

The structure for PostgreSQL, one of the database products supported by TERASOLUNA Batch 5.x, is shown. For other databases, refer to the DDL included in the TERASOLUNA Batch 5.x jar.

Regarding character strings stored in the job request table

As with the metadata tables, a DDL is provided that explicitly defines the character-type columns of the job request table by character count.

batch_job_request (In case of PostgreSQL)
Column Name Data type Constraint Description

job_seq_id

bigserial

(Use bigint when defining a separate sequence)

NOT NULL
PRIMARY KEY

A number determining the order in which jobs are executed at polling.
The auto-numbering function of the database is used.

job_name

varchar(100)

NOT NULL

Name of the job to be executed.
A required item for job execution.

job_parameter

varchar(200)

-

Parameters to be passed to jobs to be executed.

The format of a single parameter is the same as for synchronous execution; however, when multiple parameters are specified, they must be separated by commas (see below), unlike the blank delimiters of synchronous execution.

{Parameter name}={parameter value},{Parameter name}={Parameter value}…​

job_execution_id

bigint

-

ID issued at the time of job execution.
JobRepository is referred to using this ID as a key.

polling_status

varchar(10)

NOT NULL

Polling process status.
INIT : Not executed
POLLED: Polled
EXECUTED : Job executed

create_date

TIMESTAMP

NOT NULL

Date and time when the record of the job request is registered.

update_date

TIMESTAMP

-

Date and time when the record of job request is updated.

DDL is as below.

CREATE TABLE IF NOT EXISTS batch_job_request (
    job_seq_id bigserial PRIMARY KEY,
    job_name varchar(100) NOT NULL,
    job_parameter varchar(200),
    job_execution_id bigint,
    polling_status varchar(10) NOT NULL,
    create_date timestamp NOT NULL,
    update_date timestamp
);
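The comma-delimited job_parameter format above differs from the blank-delimited form of synchronous execution only in the separator. A minimal shell sketch of the conversion (the parameter values are illustrative, not from the guideline):

```shell
# Comma-delimited form stored in the job_parameter column (asynchronous execution)
ASYNC_PARAMS='param1=dummy,param2=100'

# Equivalent blank-delimited form used on the command line (synchronous execution)
SYNC_PARAMS=$(echo "$ASYNC_PARAMS" | tr ',' ' ')

echo "$SYNC_PARAMS"
```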
4.3.2.2.2. Job request sequence structure

When the database does not support auto-numbering of columns, numbering by a sequence is required.
The definition for PostgreSQL, one of the database products supported by TERASOLUNA Batch 5.x, is shown.
For other databases, refer to the DDL included in the jar of TERASOLUNA Batch 5.x.

DDL is as below.

CREATE SEQUENCE batch_job_request_seq MAXVALUE 9223372036854775807 NO CYCLE;

For databases that support auto-numbering of columns, a job request sequence is not defined in the DDL included in the jar of TERASOLUNA Batch 5.x. When you want to change the maximum value of the numbering, it is preferable to define the job request sequence separately and change the data type of job_seq_id from the auto-numbering definition to a numeric data type (for PostgreSQL, from bigserial to bigint).
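Under that assumption, the table definition might be sketched as follows for PostgreSQL; this is an illustrative alternative to the provided DDL, not a definition offered by TERASOLUNA Batch 5.x:

```sql
-- Sketch: number job_seq_id from the separately defined sequence
-- instead of the bigserial auto-numbering definition.
CREATE TABLE IF NOT EXISTS batch_job_request (
    job_seq_id bigint DEFAULT nextval('batch_job_request_seq') PRIMARY KEY,
    job_name varchar(100) NOT NULL,
    job_parameter varchar(200),
    job_execution_id bigint,
    polling_status varchar(10) NOT NULL,
    create_date timestamp NOT NULL,
    update_date timestamp
);
```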

4.3.2.2.3. Transition pattern of polling status (polling_status)

Transition pattern of polling status is shown in the table below.

Transition pattern list of polling status
Transition source Transition destination Description

INIT

INIT

When the maximum number of concurrent executions has been reached and the start of the job is denied, the status remains unchanged.
The record becomes a polling target again at the next polling.

INIT

POLLED

Transition occurs when the job is started successfully.
This is the status while the job is running.

POLLED

EXECUTED

Transition occurs when job execution is completed.
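For instance, the current distribution of records across these statuses can be checked with a simple aggregation; this is a monitoring sketch in PostgreSQL syntax, not a query offered by the framework:

```sql
-- Count job requests per polling status (INIT / POLLED / EXECUTED)
SELECT polling_status, count(*) AS record_count
FROM batch_job_request
GROUP BY polling_status;
```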

4.3.2.2.4. Job request fetch SQL

The number of records fetched by the job request fetch SQL is restricted in order to fetch only as many job requests as the number of jobs executed concurrently.
The job request fetch SQL varies depending on the database product and version used; hence, the SQL provided by TERASOLUNA Batch 5.x may not always be usable.
In that case, redefine the SQLMap of BatchJobRequestMapper.xml, using Customising Job-request-table as a reference.
For the SQL offered, refer to BatchJobRequestMapper.xml included in the jar of TERASOLUNA Batch 5.x.

4.3.2.3. About job running

How a job is run is explained.

In the Job-request-table polling function of TERASOLUNA Batch 5.x, a job is run by the start method of JobOperator offered by Spring Batch.

In TERASOLUNA Batch 5.x, this guideline explains restarting jobs started by asynchronous execution (DB polling) from the command line. Hence, although JobOperator also has startup methods such as restart besides start, only the start method is used.

Arguments of start method
jobName

Set the value registered in job_name of Job-request-table.

jobParameters

Set the value registered in job_parameter of the Job-request-table.

4.3.2.4. When an abnormality is detected in the DB polling process

The behaviour when an abnormality is detected in the DB polling process is explained.

4.3.2.4.1. Database connection failure

The behaviour of processing at the time of failure occurrence is described for each stage.

When records of the Job-request-table are fetched
  • JobRequestPollTask results in an error; however, it runs again at the next polling.

While changing the polling status from INIT to POLLED
  • JobRequestPollTask terminates with an error before executing the job via JobOperator. The polling status remains INIT.

  • In the polling performed after recovery from the connection failure, the record remains a target for execution since the Job-request-table is unchanged, and the job is executed at the next polling.

While changing the polling status from POLLED to EXECUTED
  • JobRequestPollTask terminates with an error since the job execution ID cannot be updated in the Job-request-table. The polling status remains POLLED.

  • The record is out of scope for the polling performed after connection failure recovery, and the job of the record at the time of failure is not executed.

  • Since the job execution ID cannot be identified from the Job-request-table, determine the final status of the job from the log or JobRepository, and re-execute the job as a recovery process when required.
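When performing such a recovery, the records left in POLLED without a job execution ID can be located with a query like the following; a PostgreSQL sketch to assist the investigation, not part of the framework:

```sql
-- Candidates for manual recovery: polled requests whose
-- job execution ID was never written back.
SELECT job_seq_id, job_name, job_parameter, update_date
FROM batch_job_request
WHERE polling_status = 'POLLED'
AND job_execution_id IS NULL;
```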

Even if an exception occurs in JobRequestPollTask, an immediate retry is not performed. The reasons are given below.

  1. Since JobRequestPollTask is started at regular intervals, automatic recovery (though not immediate) is achieved simply by leaving the next run to JobRequestPollTask.

  2. It is very rare that an immediate retry at the time of failure succeeds; moreover, retry attempts are likely to generate additional load.

4.3.2.4.2. Abnormal termination of asynchronous batch daemon process

When a process of the asynchronous batch daemon terminates abnormally, the transaction of the job being executed is rolled back implicitly.
The polling status is in the same state as at the time of a database connection failure.

4.3.2.5. Stopping DB polling process

The asynchronous batch daemon (AsyncBatchDaemon) is stopped by creating a file. After confirming that the file has been created, the daemon makes the polling process idle, waits as long as possible for started jobs to finish, and then stops the process.

4.3.2.6. About application configuration specific to asynchronous execution

Configuration specific to asynchronous execution is explained.

4.3.2.6.1. ApplicationContext configuration

The asynchronous batch daemon reads async-batch-daemon.xml, dedicated to asynchronous execution, as its ApplicationContext. The following configuration is added in addition to launch-context.xml, which is also used in synchronous execution.

Asynchronous execution settings

Beans necessary for asynchronous execution, such as JobRequestPollTask, are defined.

Job registration settings

Jobs executed asynchronously are registered by org.springframework.batch.core.configuration.support.AutomaticJobRegistrar. AutomaticJobRegistrar modularizes the context for each job. With modularization, Bean IDs duplicated between jobs do not pose an issue.

What is modularization

Modularization builds a hierarchical structure of "common definition - definition of each job", in which the Beans defined for each job belong to a context that is independent between jobs. If a job definition references a Bean that is not defined within it, the Bean defined in the common definition is referenced.

4.3.2.6.2. Bean definition structure

The Bean definition of a job can be the same as for synchronous execution. However, the following precautions must be taken.

  • When jobs are registered by AutomaticJobRegistrar, the Bean ID of a job is its identifier and hence must not be duplicated.

  • It is also desirable not to duplicate the Bean IDs of steps.

    • By designing a naming rule for Bean IDs such as {Job ID}.{Step ID}, only the job IDs need to be designed uniquely.

The import of job-base-context.xml in the Bean definition of a job differs between synchronous and asynchronous execution.

  • In synchronous execution, launch-context.xml is imported from job-base-context.xml.

  • In asynchronous execution, launch-context.xml is not imported from job-base-context.xml. Instead, launch-context.xml is imported from async-batch-daemon.xml, which AsyncBatchDaemon loads.

This is because the various Beans required for starting Spring Batch need not be instantiated for each job; they should be created only once, in the common definition (async-batch-daemon.xml), which acts as the parent context for each job.
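The import relationship above can be sketched as Bean definition fragments; the resource paths follow common blank-project conventions and may differ in an actual project:

```xml
<!-- In each job Bean definition (both execution styles): import the common base. -->
<import resource="classpath:META-INF/spring/job-base-context.xml"/>

<!-- In async-batch-daemon.xml only (asynchronous execution):
     launch-context.xml is imported once and acts as the parent for every job. -->
<import resource="classpath:META-INF/spring/launch-context.xml"/>
```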

4.3.3. How to use

4.3.3.1. Various settings
4.3.3.1.1. Settings for polling process

Use batch-application.properties for settings required for asynchronous execution.

batch-application.properties
#(1)
# Admin DataSource settings.
admin.jdbc.driver=org.postgresql.Driver
admin.jdbc.url=jdbc:postgresql://localhost:5432/postgres
admin.jdbc.username=postgres
admin.jdbc.password=postgres

# TERASOLUNA AsyncBatchDaemon settings.
# (2)
async-batch-daemon.schema.script=classpath:org/terasoluna/batch/async/db/schema-postgresql.sql
# (3)
async-batch-daemon.job-concurrency-num=3
# (4)
async-batch-daemon.polling-interval=5000
# (5)
async-batch-daemon.polling-initial-delay=1000
# (6)
async-batch-daemon.polling-stop-file-path=/tmp/end-async-batch-daemon
Setup details item list
Sr. No. Description

(1)

Connection settings for the database where the Job-request-table is stored.
The same settings as for JobRepository are used by default.

(2)

A path of the DDL which defines the Job-request-table.
The table is auto-generated when it does not exist at the start of the asynchronous batch daemon.
This is primarily a test function; whether it is executed can be set by data-source.initialize.enabled of batch-application.properties.
For the detailed definition, refer to <jdbc:initialize-database> in async-batch-daemon.xml.

(3)

Setting for the number of records fetched collectively at the time of polling. This value is also used as the number of jobs executed concurrently.

(4)

Polling cycle settings. Unit is milliseconds.

(5)

Polling initial start delay time settings. Unit is milliseconds.

(6)

Exit file path settings.

Changing setup value using environment variable

A setup value in batch-application.properties can be overridden by defining an environment variable with the same name.
When the environment variable is set, it takes priority over the value in the property file.
This is due to the Bean definition below.

Settings for launch-context.xml
<context:property-placeholder location="classpath:batch-application.properties"
        system-properties-mode="OVERRIDE"
        ignore-resource-not-found="false"
        ignore-unresolvable="true"
        order="1"/>

For details, refer to "How to define a property file" in the TERASOLUNA Server 5.x Development Guideline.
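As an illustrative sketch (not taken from the guideline itself), a value can also be overridden with a JVM system property at start-up, since system-properties-mode="OVERRIDE" resolves system properties with priority; the interval value below is arbitrary:

```shell
# Override the polling interval (milliseconds) for this run only;
# the -D system property takes priority over batch-application.properties.
java -cp 'dependency/*' \
    -Dasync-batch-daemon.polling-interval=10000 \
    org.terasoluna.batch.async.db.AsyncBatchDaemon
```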

4.3.3.1.2. Job settings

Jobs to be executed asynchronously are set in automaticJobRegistrar of async-batch-daemon.xml.
The default settings are shown below.

async-batch-daemon.xml
<bean id="automaticJobRegistrar"
      class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
    <property name="applicationContextFactories">
        <bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
            <property name="resources">
                <list>
                    <value>classpath:/META-INF/jobs/**/*.xml</value>  <!-- (1) -->
                </list>
            </property>
        </bean>
    </property>
    <property name="jobLoader">
        <bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
              p:jobRegistry-ref="jobRegistry" />
    </property>
</bean>
Setting details item list
Sr.No. Description

(1)

A path for Bean definition of a job executed asynchronously.

About registered jobs

Only jobs which are designed and implemented on the premise of asynchronous execution should be registered. If jobs which are not supposed to be executed asynchronously are included, exceptions may occur at job registration due to unintended references.

Example of Narrowing down
<bean id="automaticJobRegistrar"
      class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
    <property name="applicationContextFactories">
        <bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
            <property name="resources">
                <list>
                    <!-- For the async directory and below -->
                    <value>classpath:/META-INF/jobs/async/**/*.xml</value>
                    <!-- For a specific job -->
                    <value>classpath:/META-INF/jobs/CASE100/SpecialJob.xml</value>
                </list>
            </property>
        </bean>
    </property>
    <property name="jobLoader">
        <bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
            p:jobRegistry-ref="jobRegistry" />
    </property>
</bean>
Input value verification for job parameters

JobRequestPollTask does not validate the records obtained from the Job-request-table.
Hence, the job name and job parameters must be verified before they are registered in the table.
If the job name is incorrect, the job is not found when its start is attempted, and an exception occurs.
If a job parameter is incorrect, the job starts but operates erroneously.
Job parameters can only be verified after the job has started. For verification of job parameters, refer to "Validity verification of parameters".

Job design considerations

As a characteristic of asynchronous execution (DB polling), the same job can be executed in parallel. Jobs must be designed so that parallel executions of the same job do not affect each other.

4.3.3.2. From start to end of asynchronous execution

Start and end of asynchronous batch daemon and how to register in Job-request-table are explained.

4.3.3.2.1. Start of asynchronous batch daemon

Start AsyncBatchDaemon offered by TERASOLUNA Batch 5.x.

Start of AsyncBatchDaemon
# Start AsyncBatchDaemon
$ java -cp 'dependency/*' org.terasoluna.batch.async.db.AsyncBatchDaemon

In this case, META-INF/spring/async-batch-daemon.xml is read and various Beans are generated.

Further, when a separately customised async-batch-daemon.xml is used, specify it as the first argument when starting AsyncBatchDaemon.
The Bean definition file specified in the argument must be given as a relative path from the class path.
Note that the second and subsequent arguments are ignored.

When customised META-INF/spring/customized-async-batch-daemon.xml is used,
# Start AsyncBatchDaemon
$ java -cp 'dependency/*' org.terasoluna.batch.async.db.AsyncBatchDaemon \
    META-INF/spring/customized-async-batch-daemon.xml

For small changes, async-batch-daemon.xml can be modified directly.
However, when significant changes are made, or when multiple configurations are managed as in Multiple runnings described later, separate files are easier to create and manage.
Choose according to your situation.

It is assumed that the jar files necessary for execution are stored under dependency.

4.3.3.2.2. Job request

Register a record in the Job-request-table by issuing an INSERT statement.

In case of PostgreSQL
INSERT INTO batch_job_request(job_name,job_parameter,polling_status,create_date)
VALUES ('JOB01', 'param1=dummy,param2=100', 'INIT', current_timestamp);
4.3.3.2.3. Stopping asynchronous batch daemon

Create the exit file set in batch-application.properties.

$ touch /tmp/end-async-batch-daemon
When the exit file exists prior to starting asynchronous batch daemon

When the exit file exists before the asynchronous batch daemon is started, the daemon terminates immediately. Start the asynchronous batch daemon only when the exit file does not exist.

4.3.3.3. Confirm job status

Job status management is performed with the JobRepository offered by Spring Batch; the job status is not managed in the Job-request-table. The Job-request-table has the column job_execution_id, and the job status corresponding to an individual request can be confirmed using the value stored in this column. Here, a simple example of confirming the job status by issuing SQL directly is shown. For details of job status confirmation, refer to "Status confirmation".

In case of PostgreSQL
SELECT job_execution_id FROM batch_job_request WHERE job_seq_id = 1;

job_execution_id
----------------
              2
(1 row)

SELECT * FROM batch_job_execution WHERE job_execution_id = 2;

 job_execution_id | version | job_instance_id |       create_time       |       start_time        |        end_time         |  status   | exit_code | exit_message | job_configuration_location
------------------+---------+-----------------+-------------------------+-------------------------+-------------------------+-----------+-----------+--------------+----------------------------
                2 |       2 |               2 | 2017-02-06 20:54:02.263 | 2017-02-06 20:54:02.295 | 2017-02-06 20:54:02.428 | COMPLETED | COMPLETED |              |
(1 row)
4.3.3.4. Recovery after a job is terminated abnormally

For the basic points related to recovery of a job which terminated abnormally, refer to "Re-execution of process". Here, only the points specific to asynchronous execution are explained.

4.3.3.4.1. Re-run

A job which terminated abnormally is re-run by inserting a new record for it into the Job-request-table.

4.3.3.4.2. Restart

When a job which terminated abnormally is to be restarted, execute it as a synchronous job from the command line. The reason for executing it from the command line is that "it is difficult to determine whether a restart is intended, and an unintended duplicate execution would result in chaotic operation."
For restart methods, refer to "Job restart".
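As a hedged sketch only, a command-line restart with the plain Spring Batch CommandLineJobRunner takes a form like the following; the Bean definition path and job name are hypothetical, and the authoritative procedure is described in "Job restart":

```shell
# Restart the last failed execution of JOB01 as a synchronous job
# (the class path, job definition path and job name are illustrative).
java -cp 'dependency/*' \
    org.springframework.batch.core.launch.support.CommandLineJobRunner \
    META-INF/jobs/JOB01.xml JOB01 -restart
```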

4.3.3.4.3. Termination
  1. When the process has not terminated even after the expected processing time has been exceeded, attempt to stop it from the command line. For methods of stopping, refer to "Job stop".

  2. When the stop is not accepted even from the command line, stop the asynchronous batch daemon as described in Stopping asynchronous batch daemon.

  3. If even the asynchronous batch daemon cannot be stopped, forcibly terminate the process of the asynchronous batch daemon.

Take adequate care not to impact other jobs when stopping the asynchronous batch daemon.

4.3.3.5. About environment deployment

Building and deploying jobs is the same as for synchronous execution. However, it is important to narrow down the jobs which are executed asynchronously, as shown in Job settings.

4.3.3.6. Evacuation of cumulative data

If the asynchronous batch daemon is run for a long time, a huge amount of data accumulates in the JobRepository and the Job-request-table. This cumulative data must be cleared for the following reasons.

  • Performance degradation when a large quantity of data is retrieved or updated

  • Duplication of IDs due to wraparound of the ID numbering sequence

For evacuation of table data and resetting of sequences, refer to the manual of the database used.

List of tables and sequences for evacuation is shown below.

List for evacuation
Table/Sequence Framework offered

batch_job_request

TERASOLUNA Batch 5.x

batch_job_request_seq

batch_job_instance

Spring Batch

batch_job_execution

Spring Batch

batch_job_execution_params

batch_job_execution_context

batch_step_execution

batch_step_execution_context

batch_job_seq

batch_job_execution_seq

batch_step_execution_seq

Auto-numbering column sequence

Since a sequence is created automatically for an auto-numbering column, remember to include this sequence while evacuating data.
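As one hedged approach to evacuating the Job-request-table (PostgreSQL syntax; the archive table name is hypothetical, and it is assumed that EXECUTED records may be moved out):

```sql
-- Evacuate processed job requests to an archive table, then delete them.
CREATE TABLE IF NOT EXISTS batch_job_request_archive
    (LIKE batch_job_request INCLUDING ALL);

INSERT INTO batch_job_request_archive
    SELECT * FROM batch_job_request WHERE polling_status = 'EXECUTED';

DELETE FROM batch_job_request WHERE polling_status = 'EXECUTED';
```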

About database specific specifications

Note that some databases use database-specific data types; for example, Oracle uses CLOB for some columns.

4.3.4. How to extend

4.3.4.1. Customising Job-request-table

The Job-request-table can be customised by adding columns, for example to change the extraction conditions of fetched records. However, only a BatchJobRequest can be passed as a parameter when SQL is issued from JobRequestPollTask.

Extension procedure by customising the Job-request-table is shown below.

  1. Customising Job-request-table

  2. Creating an extension interface of BatchJobRequestRepository interface

  3. Defining SQLMap which uses customised table

  4. Modifying Bean definition of async-batch-daemon.xml

Examples of customization are given below:

  • Controlling the job execution sequence by a priority column

  • Distributing processing across multiple processes by a group ID

Hereafter, the extension procedure is described for these two examples.

4.3.4.1.1. Example of controlling job execution sequence by priority column
  1. Customising Job-request-table

Add a priority column (priority) in Job-request-table.

Adding a priority column (In case of PostgreSQL)
CREATE TABLE IF NOT EXISTS batch_job_request (
    job_seq_id bigserial PRIMARY KEY,
    job_name varchar(100) NOT NULL,
    job_parameter varchar(200),
    priority int NOT NULL,
    job_execution_id bigint,
    polling_status varchar(10) NOT NULL,
    create_date timestamp NOT NULL,
    update_date timestamp
);
  2. Creating an extension interface of the BatchJobRequestRepository interface

An interface which extends BatchJobRequestRepository interface is created.

Extension interface
// (1)
public interface CustomizedBatchJobRequestRepository extends BatchJobRequestRepository {
    // (2)
}
Extension points
Sr. No. Description

(1)

Extend BatchJobRequestRepository.

(2)

Do not add a method.

  3. Defining SQLMap which uses the customised table

Define SQL in the SQLMap so that records are extracted in order of priority.

SQLMap definition (CustomizedBatchJobRequestRepository.xml)
<!-- (1) -->
<mapper namespace="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestRepository">

    <select id="find" resultType="org.terasoluna.batch.async.db.model.BatchJobRequest">
        SELECT
            job_seq_id AS jobSeqId,
            job_name AS jobName,
            job_parameter AS jobParameter,
            job_execution_id AS jobExecutionId,
            polling_status AS pollingStatus,
            create_date AS createDate,
            update_date AS updateDate
        FROM
            batch_job_request
        WHERE
            polling_status = 'INIT'
        ORDER BY
            priority ASC,   <!--(2) -->
            job_seq_id ASC
        LIMIT #{pollingRowLimit}
    </select>

    <!-- (3) -->
    <update id="updateStatus">
        UPDATE
            batch_job_request
        SET
            polling_status = #{batchJobRequest.pollingStatus},
            job_execution_id = #{batchJobRequest.jobExecutionId},
            update_date = #{batchJobRequest.updateDate}
        WHERE
            job_seq_id = #{batchJobRequest.jobSeqId}
        AND
            polling_status = #{pollingStatus}
    </update>

</mapper>
Extension points
Sr. No. Description

(1)

Set the extended interface of BatchJobRequestRepository in namespace by FQCN.

(2)

Add priority to the ORDER BY clause.

(3)

Do not change the update SQL.

  4. Modifying Bean definition of async-batch-daemon.xml

Set the extended interface created in step 2 in batchJobRequestRepository.

async-batch-daemon.xml
 <!--(1) -->
<bean id="batchJobRequestRepository"
      class="org.mybatis.spring.mapper.MapperFactoryBean"
      p:mapperInterface="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestRepository"
      p:sqlSessionFactory-ref="adminSqlSessionFactory" />
Extension points
Sr. No. Description

(1)

Set extended interface of BatchJobRequestRepository in mapperInterface property by FQCN.

4.3.4.1.2. Distributed processing by multiple processes using a group ID

Specify a group ID using an environment variable when starting AsyncBatchDaemon, and narrow down the jobs to be processed.

  1. Customizing Job-request-table

Add group ID column (group_id) to Job-request-table.

Adding group ID column (In case of PostgreSQL)
CREATE TABLE IF NOT EXISTS batch_job_request (
    job_seq_id bigserial PRIMARY KEY,
    job_name varchar(100) NOT NULL,
    job_parameter varchar(200),
    group_id varchar(10) NOT NULL,
    job_execution_id bigint,
    polling_status varchar(10) NOT NULL,
    create_date timestamp NOT NULL,
    update_date timestamp
);
  2. Creating an extension interface of the BatchJobRequestRepository interface (same as in the previous example)

  3. Defining SQLMap which uses the customised table

Define SQL in SQLMap with the group ID as the extraction condition.

SQLMap definition (CustomizedBatchJobRequestRepository.xml)
<!-- (1) -->
<mapper namespace="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestRepository">

    <select id="find" resultType="org.terasoluna.batch.async.db.model.BatchJobRequest">
        SELECT
            job_seq_id AS jobSeqId,
            job_name AS jobName,
            job_parameter AS jobParameter,
            job_execution_id AS jobExecutionId,
            polling_status AS pollingStatus,
            create_date AS createDate,
            update_date AS updateDate
        FROM
            batch_job_request
        WHERE
            polling_status = 'INIT'
        AND
            group_id = #{groupId}  <!--(2) -->
        ORDER BY
            job_seq_id ASC
        LIMIT #{pollingRowLimit}
    </select>

    <!-- omitted -->
</mapper>
Extension points
Sr. No. Description

(1)

Set extended interface of BatchJobRequestRepository in namespace by FQCN.

(2)

Add groupId to extraction conditions.

  4. Modifying Bean definition of async-batch-daemon.xml

Set the extended interface created in step 2 in batchJobRequestRepository, and set the group ID given by an environment variable as a query parameter in jobRequestPollTask.

async-batch-daemon.xml
 <!--(1) -->
<bean id="batchJobRequestRepository"
      class="org.mybatis.spring.mapper.MapperFactoryBean"
      p:mapperInterface="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestRepository"
      p:sqlSessionFactory-ref="adminSqlSessionFactory" />

    <bean id="jobRequestPollTask"
          class="org.terasoluna.batch.async.db.JobRequestPollTask"
          c:transactionManager-ref="adminTransactionManager"
          c:jobOperator-ref="jobOperator"
          c:batchJobRequestRepository-ref="batchJobRequestRepository"
          c:daemonTaskExecutor-ref="daemonTaskExecutor"
          c:automaticJobRegistrar-ref="automaticJobRegistrar"
          p:optionalPollingQueryParams-ref="pollingQueryParam" /> <!-- (2) -->

   <bean id="pollingQueryParam"
         class="org.springframework.beans.factory.config.MapFactoryBean">
        <property name="sourceMap">
            <map>
                <entry key="groupId" value="${GROUP_ID}"/>  <!-- (3) -->
            </map>
        </property>
   </bean>
Extension points
Sr. No. Description

(1)

Set the extended interface of BatchJobRequestRepository in the mapperInterface property by FQCN.

(2)

Set Map defined in (3), in optionalPollingQueryParams property of JobRequestPollTask.

(3)

Set group ID assigned by environment variable (GROUP_ID) in group ID (groupId) of query parameter.

  5. Set the group ID in an environment variable and start AsyncBatchDaemon.

Starting AsyncBatchDaemon
# Set environment variables
$ export GROUP_ID=G1

# Start AsyncBatchDaemon
$ java -cp 'dependency/*' org.terasoluna.batch.async.db.AsyncBatchDaemon
4.3.4.2. Multiple runnings

The asynchronous batch daemon is run on multiple servers for the following purposes.

  • Enhanced availability

    • It suffices if an asynchronous batch job can be executed on any one of the servers; this eliminates situations where the job cannot be run.

  • Enhanced performance

    • When the batch processing load is to be distributed across multiple servers.

  • Effective use of resources

    • When specific jobs are to be run on the servers with the most suitable resources, given variations in server performance.

An operational design must be adopted considering whether it can be used based on the viewpoints given above.

[Figure: Schematic diagram for multiple starts]
When multiple asynchronous batch daemons fetch identical job request records

Since JobRequestPollTask performs exclusive control using optimistic locking, only the asynchronous batch daemon that succeeds in updating the polling status from INIT to POLLED executes the job of the fetched record. The daemons excluded by this control fetch the next job request records instead.

4.3.5. Appendix

4.3.5.1. About modularization of job definition

Although it was briefly explained in ApplicationContext configuration, the following problems can be avoided by using AutomaticJobRegistrar.

  • When the same Bean ID (Bean name) is used in different jobs, one Bean is overwritten by the other and the jobs behave unintendedly.

    • Accordingly, there is a high risk of unintended errors occurring.

  • To avoid this, every Bean ID in every job must be named uniquely.

    • As the number of jobs increases, this becomes difficult to manage and causes unnecessary trouble.

The situation when AutomaticJobRegistrar is not used is explained below. Since the configuration explained here has the issues given above, it is not used in asynchronous execution.

Job1.xml
<!-- Reader -->
<!-- (1) -->
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
      p:queryId="jp.terasoluna.batch.job.repository.EmployeeRepositoy.findAll"
      p:sqlSessionFactory-ref="jobSqlSessionFactory"/>

<!-- Writer -->
<!-- (2) -->
<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:resource="file:#{jobParameters['basedir']}/input/employee.csv">
  <property name="lineAggregator">
    <bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
      <property name="fieldExtractor">
        <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
              p:names="invoiceNo,salesDate,productId,customerId,quant,price"/>
      </property>
    </bean>
  </property>
</bean>


<!-- Job -->
<batch:job id="job1" job-repository="jobRepository">
  <batch:step id="job1.step">
    <batch:tasklet transaction-manager="transactionManager">
      <batch:chunk reader="reader" writer="writer" commit-interval="100" />
    </batch:tasklet>
  </batch:step>
</batch:job>
Job2.xml
<!-- Reader -->
<!-- (3) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['basedir']}/input/invoice.csv">
  <property name="lineMapper">
    <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      <property name="lineTokenizer">
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
              p:names="invoiceNo,salesDate,productId,customerId,quant,price"/>
      </property>
      <property name="fieldSetMapper" ref="invoiceFieldSetMapper"/>
    </bean>
  </property>
</bean>

<!-- Writer -->
<!-- (4) -->
<bean id="writer" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="jp.terasoluna.batch.job.repository.InvoiceRepository.create"
      p:sqlSessionFactory-ref="jobSqlSessionFactory"/>

<!-- Job -->
<batch:job id="job2" job-repository="jobRepository">
  <batch:step id="job2.step">
    <batch:tasklet transaction-manager="transactionManager">
      <batch:chunk reader="reader" writer="writer" commit-interval="100" />
    </batch:tasklet>
  </batch:step>
</batch:job>
Definition wherein BeanId is overwritten
<bean id="automaticJobRegistrar"
      class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
    <property name="applicationContextFactories">
        <bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
            <property name="resources">
                <list>
                    <value>classpath:/META-INF/jobs/other/async/*.xml</value>  <!-- (5) -->
                </list>
            </property>
        </bean>
    </property>
    <property name="jobLoader">
        <bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
              p:jobRegistry-ref="jobRegistry"/>
    </property>
</bean>

<bean class="org.springframework.batch.core.configuration.support.JobRegistryBeanPostProcessor"
    p:jobRegistry-ref="jobRegistry" />

<import resource="classpath:/META-INF/jobs/async/*.xml" />   <!-- (6) -->
List of setup points
Sr. No. Description

(1)

In Job1, ItemReader which reads from the database is defined by a Bean ID - reader.

(2)

In Job1, ItemWriter which writes in a file is defined by a Bean ID - writer.

(3)

In Job2, ItemReader which reads from the file is defined by a Bean ID - reader.

(4)

In Job2, ItemWriter which writes to a database is defined by a Bean ID - writer.

(5)

AutomaticJobRegistrar is set so as to read job definitions other than target jobs.

(6)

Use import of Spring and enable reading of target job definition.

In this case, when Job1.xml and Job2.xml are read in this sequence, the reader and writer defined in Job1.xml are overwritten by the definitions in Job2.xml.
As a result, when Job1 is executed, the reader and writer of Job2 are used and the intended processing is not performed.

4.4. Asynchronous execution (Web container)

4.4.1. Overview

A method to execute a job asynchronously in a Web container is explained.

The usage of this function is the same in both the chunk model and the tasklet model.

What is Asynchronous execution of jobs by Web container

A Web application containing the jobs is deployed in a Web container, and a job is executed based on the information of the received request.
Since one thread is allocated for each job execution and jobs run in parallel, each job executes independently of other jobs and of request processing.

Function offered

TERASOLUNA Batch 5.x does not offer an implementation for asynchronous execution (Web container).
Only implementation methods are provided in this guideline.
This is because there are various ways to send a request to a Web application, such as HTTP, SOAP and MQ, so it is determined that the implementation should be chosen appropriately by the user.

Usage premise
  • A Web container is required besides the application.

  • Besides the job implementation, the required Web application and client are implemented separately according to operational requirements.

  • Execution status and results of jobs are entrusted to JobRepository. Further, a persistent database is used instead of an in-memory database so that job execution status and results can be referred from JobRepository even after the Web container is stopped.

Difference with asynchronous execution (DB polling)

Architecturally, the two differ in the immediacy of asynchronous execution and in the presence or absence of a job request management table.
"Asynchronous execution (DB polling)" performs asynchronous execution of multiple jobs registered in the request management table.
On the other hand, this function does not require a job request table and instead accepts asynchronous execution requests on the Web container.
It is suitable for short batches which require immediacy up to the start of processing, since processing starts immediately upon a Web request.

4.4.2. Architecture

Asynchronous jobs by using this method are operated as applications (war) deployed on the Web container, however, the job itself runs asynchronously (another thread) from the request processing of Web container.

Process sequence diagram of asynchronous execution (Web container)
Running a job
  1. Web client requests Web container to execute the job.

  2. JobController asks JobOperator of Spring Batch to start the execution of the job.

  3. Execute the job asynchronously by using ThreadPoolTaskExecutor.

  4. Return a job execution ID (job execution id) for uniquely identifying an executed target job.

  5. JobController returns a response including job execution ID for the Web client.

  6. Execute target job.

    • Job results are reflected in JobRepository.

  7. The job returns its execution results; however, they cannot be notified directly to the Web client.

Confirm job execution results
  1. Web client sends the job execution ID to the JobController on the Web container.

  2. JobController asks JobExplorer for execution results of job by using a job execution ID.

  3. JobExplorer returns job execution results.

  4. JobController returns a response for Web client.

    • Set the job execution results in the response.

After the Web container receives a request, processing is synchronous with the request until the job execution ID is returned; the subsequent job execution is performed asynchronously in a thread pool separate from the Web container's request threads.
This means that the Web client cannot detect the execution status of an asynchronous job unless it sends another query request.

Hence, on the Web client side one request is sent for "running a job" during one job execution, and when "confirmation of results" is necessary, another request must be sent to the Web container.
How abnormalities are detected differs from the first "running a job" request and is explained later in About detection of abnormality occurrence at the time of running a job.

Job execution status and results can be checked by referring to the RDBMS directly, or by using JobRepository and JobExplorer. For details of the functions which refer to job execution status and results, refer to Job management.

About handling job execution ID (job execution id)

A different sequence value is issued as the job execution ID for each execution, even when the job and job parameters are identical.
Job execution ID accepted by sending a request is persisted in external RDBMS by JobRepository.
However, if the Web client loses this ID due to a failure, it becomes difficult to identify or track the job execution status.
Hence, adequate preparations must be made on the Web client side to cope with loss of the job execution ID, such as logging the job execution ID returned in the response.

4.4.2.1. About detection of abnormality occurrence at the time of running a job

After a job run request is sent from the Web client, how an abnormality is detected differs depending on whether it occurs before or after the job execution ID is returned.

  • Abnormality can be detected immediately by the response at the time of running a job

    • Job to be activated does not exist.

    • Invalid job parameter format.

  • Abnormalities which require querying the Web container for job execution status and results after running the job

    • Job execution status

    • Job start failure due to depletion of thread pool used in asynchronous job execution

An error at job startup can be detected as an exception occurring in the Spring MVC controller. Since the explanation is omitted here, refer to Implementation of exception handling in the TERASOLUNA Server 5.x Development Guideline described separately.

Further, input check of the request used as job parameters is performed in the Spring MVC controller as required.
For basic implementation methods, refer to Input check in the TERASOLUNA Server 5.x Development Guideline.

A job start failure occurring due to depletion of the thread pool cannot be captured at the time of running the job.

Since such a failure is not raised from JobOperator, it must be checked separately. One method of confirmation is to check the execution status of the job by using JobExplorer and verify whether the following conditions are satisfied.

  • Status is FAILED

  • Exception stack trace of org.springframework.core.task.TaskRejectedException is recorded in jobExecution.getExitStatus().getExitDescription().
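The two conditions above can be folded into a small helper. The following is a sketch with a hypothetical class and method name; in real code, the two arguments would be taken from jobExecution.getStatus().toString() and jobExecution.getExitStatus().getExitDescription() obtained via JobExplorer.

```java
public class RejectionCheck {

    /**
     * Hypothetical helper: returns true when a FAILED execution looks like a
     * thread-pool rejection. The arguments correspond to
     * jobExecution.getStatus().toString() and
     * jobExecution.getExitStatus().getExitDescription().
     */
    static boolean rejectedByThreadPool(String status, String exitDescription) {
        return "FAILED".equals(status)
                && exitDescription != null
                && exitDescription.contains(
                        "org.springframework.core.task.TaskRejectedException");
    }

    public static void main(String[] args) {
        String desc = "org.springframework.core.task.TaskRejectedException: "
                + "Executor did not accept task";
        System.out.println(rejectedByThreadPool("FAILED", desc));    // true
        System.out.println(rejectedByThreadPool("COMPLETED", desc)); // false
    }
}
```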

4.4.2.2. Application configuration of asynchronous execution (Web container)

As with "Asynchronous execution (DB polling)", the async Spring profile and AutomaticJobRegistrar are used as configuration specific to asynchronous execution.

On the other hand, prior knowledge and some specific settings are required in order to use these functions with asynchronous execution (Web container). Refer to "ApplicationContext configuration".
How to configure the basic async profile and AutomaticJobRegistrar is described later in "How to implement applications using asynchronous execution (Web container)".

4.4.2.2.1. ApplicationContext configuration

As described above, multiple application modules are included as application configuration of asynchronous execution (Web container).
It is necessary to understand respective application contexts, types of Bean definitions and their relationships.

ApplicationContext configuration
Bean definition file configuration

In asynchronous execution (Web container), the ApplicationContext of the batch application is incorporated into the ApplicationContext of the Web application.
The individual job contexts are modularized from the Web context by AutomaticJobRegistrar and act as sub-contexts of the Web context.

The Bean definition files which constitute the respective contexts are explained below.

List of Bean definition files
Sr. No. Description

(1)

Common Bean definition file.
It acts as a parent context in the application and is uniquely shared among jobs acting as sub-contexts.

(2)

Bean definition file which is always imported from the job Bean definition files.
When the async Spring profile is specified at the time of asynchronous execution, launch-context.xml of (1) is not read.

(3)

Bean definition file created for each job.
It is modularized by AutomaticJobRegistrar, and each file is used as an independent sub-context in the application.

(4)

It is read from DispatcherServlet.
Define the Beans unique to asynchronous execution such as AutomaticJobRegistrar which performs modularization of job Bean definition and taskExecutor which is a thread pool used in asynchronous and parallel execution of jobs.
Further, in asynchronous execution, launch-context.xml of (1) is imported directly and is shared uniquely as the parent context.

(5)

It acts as a parent context shared within the Web application by using ContextLoaderListener.

4.4.3. How to use

Here, an implementation example of the Web application is explained using TERASOLUNA Server Framework for Java (5.x).
Note that it is used only for the explanation; TERASOLUNA Server 5.x is not a mandatory requirement of asynchronous execution (Web container).

4.4.3.1. Overview of implementation of application by asynchronous execution (Web container)

Explanation is given based on following configuration.

  • Web application project and batch application project are independent and a batch application is referred from a web application.

    • war file generated from Web application project contains jar file generated from batch application project

Implementation of asynchronous execution is performed in accordance with Architecture wherein Spring MVC controller in the Web application starts the job by using JobOperator.

About isolation of Web/batch application project

The final build deliverable of the application is the Web application's war file; however, the development projects should be separated into Web and batch applications.
Since the batch application is a library that can also run standalone, this separation helps clarify work boundaries and library dependencies, and makes the development projects easier to test.

Web/batch development is explained now assuming the use of 2 components below.

  • Batch application project by TERASOLUNA Batch 5.x

  • Web application project by TERASOLUNA Server 5.x

For how to create a batch application project and how to implement a basic job, refer "How to create a project", "Creation of tasklet model job", "Creation of chunk model job".

Here, we will focus on starting a batch application from a Web application.

Here, a batch application project is created by using Maven archetype:generate.

How to create a job project
Name Value

groupId

org.terasoluna.batch.sample

artifactId

asyncbatch

version

1.0-SNAPSHOT

package

org.terasoluna.batch.sample

The job registered by default in the blank project is used for convenience of explanation.

Job used for explanation
Name Description

Job name

job01

Job parameter

param1=value1

Precautions for asynchronous execution (Web container) job design

As a characteristic of asynchronous execution (Web container), individual jobs should complete in a short time and operate statelessly on the Web container.
Further, to avoid complexity, it is desirable to build job definitions with only a single step and not to define flow branching based on step exit codes or parallel/multiple processing.

Proceed to implement the Web application once a jar file containing the job implementation can be built.

Implementation of Web application

How to implement a Web application is explained by using a blank project offered by TERASOLUNA Server 5.x. For details, refer TERASOLUNA Server 5.x Development Guideline Creating a development project for Web application.

Here, similar to the asynchronous execution application project, it is created with the following names for the explanation.

How to create a Web container project
Name Value

groupId

org.terasoluna.batch.sample

artifactId

asyncapp

version

1.0-SNAPSHOT

package

org.terasoluna.batch.sample

About naming of groupId

Although project naming is optional, when the batch application is considered as a sub-module of a Maven multi-project, it is easier to manage if the groupId is unified.
Here, the groupId of both is set to org.terasoluna.batch.sample.

4.4.3.2. Various settings
Include batch application as a part of Web application

Edit pom.xml and include batch application as a part of Web application.

This process is not required when the batch application is registered as a jar in Nexus or the local Maven repository and is a project separate from the Web application.
However, Maven then builds them as separate projects, so modifications to the batch application are not reflected when the Web application is built.
To reflect modifications of the batch application in the Web application, the jar must be registered in the repository again.

Directory structure
asyncapp/pom.xml
<project>
  <!-- omitted -->
  <modules>
    <module>asyncapp-domain</module>
    <module>asyncapp-env</module>
    <module>asyncapp-initdb</module>
    <module>asyncapp-web</module>
    <module>asyncapp-selenium</module>
    <module>asyncbatch</module> <!-- (1) -->
  </modules>
</project>
asyncapp/asyncbatch/pom.xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.terasoluna.batch.sample</groupId> <!-- (2) -->
  <artifactId>asyncbatch</artifactId>
  <version>1.0-SNAPSHOT</version> <!-- (2) -->
  <!-- (1) -->
  <parent>
    <groupId>org.terasoluna.batch.sample</groupId>
    <artifactId>asyncapp</artifactId>
    <version>1.0-SNAPSHOT</version>
    <relativePath>../pom.xml</relativePath>
  </parent>
  <!-- omitted -->
</project>
Deleted / added contents
Sr. No. Description

(1)

Add settings for considering the Web application as a parent and batch application as a child.

(2)

Delete the descriptions which became unnecessary when the project became a child (sub) module.

Addition of dependent library

Add a batch application as a dependent library of Web application.

asyncapp/asyncapp-web/pom.xml
<project>
  <!-- omitted -->
  <dependencies>
  <!-- (1) -->
    <dependency>
        <groupId>${project.groupId}</groupId>
        <artifactId>asyncbatch</artifactId>
        <version>${project.version}</version>
    </dependency>
    <!-- omitted -->
  </dependencies>
  <!-- omitted -->
</project>
Details added
Sr. No. Description

(1)

Add a batch application as a dependent library of Web application.

4.4.3.3. Implementation of Web application

Here, a RESTful Web service is created as a Web application using TERASOLUNA Server 5.x Development Guideline as a reference below.

Setting for enabling Spring MVC component which is necessary for RESTful Web Service

4.4.3.3.1. Web application settings

At first, add, delete and edit various configuration files from the blank project of Web application.

For the explanation, an implementation which uses a RESTful Web Service as the invocation style of the batch application is given.
The procedure is the same even when a conventional Web application (Servlet/JSP) or SOAP is used; read accordingly.

Bean definition file to be added/deleted from a blank project
Description example of asyncapp/asyncapp-web/src/main/resources/META-INF/spring/spring-mvc-rest.xml
<!-- omitted -->
<!-- (1) -->
<import resource="classpath:META-INF/spring/launch-context.xml"/>

<bean id="jsonMessageConverter"
      class="org.springframework.http.converter.json.MappingJackson2HttpMessageConverter"
      p:objectMapper-ref="objectMapper"/>

<bean id="objectMapper"
      class="org.springframework.http.converter.json.Jackson2ObjectMapperFactoryBean">
  <property name="dateFormat">
      <bean class="com.fasterxml.jackson.databind.util.StdDateFormat"/>
  </property>
</bean>

<mvc:annotation-driven>
  <mvc:message-converters register-defaults="false">
    <ref bean="jsonMessageConverter"/>
  </mvc:message-converters>
</mvc:annotation-driven>

<mvc:default-servlet-handler/>

<!-- (2) -->
<context:component-scan base-package="org.terasoluna.batch.sample.app.api"/>

<!-- (3) -->
<bean class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
    <property name="applicationContextFactories">
        <bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
            <property name="resources">
                <list>
                  <value>classpath:/META-INF/jobs/**/*.xml</value>
                </list>
            </property>
        </bean>
    </property>
    <property name="jobLoader">
        <bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
              p:jobRegistry-ref="jobRegistry"/>
    </property>
</bean>

<!-- (4) -->
<task:executor id="taskExecutor" pool-size="3" queue-capacity="10"/>

<!-- (5) -->
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher"
      p:jobRepository-ref="jobRepository"
      p:taskExecutor-ref="taskExecutor"/>
<!-- omitted -->
Description example of asyncapp/asyncapp-web/src/main/webapp/WEB-INF/web.xml
<!-- omitted -->
<servlet>
    <servlet-name>restApiServlet</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <init-param>
        <param-name>contextConfigLocation</param-name>
        <!-- (6) -->
        <param-value>classpath*:META-INF/spring/spring-mvc-rest.xml</param-value>
    </init-param>
    <!-- (7) -->
    <init-param>
        <param-name>spring.profiles.active</param-name>
        <param-value>async</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
</servlet>

<servlet-mapping>
    <servlet-name>restApiServlet</servlet-name>
    <url-pattern>/api/v1/*</url-pattern>
</servlet-mapping>
<!-- omitted -->
RESTful Web Service validation example
Sr. No. Description

(1)

Import launch-context.xml which is in the batch application and incorporate required Bean definition.

(2)

Describe package for dynamically scanning the controller.

(3)

Describe a Bean definition of AutomaticJobRegistrar which dynamically loads as a child or sub context by modularizing each Bean definition file.

(4)

Define TaskExecutor which executes the job asynchronously.
Asynchronous execution can be performed by setting AsyncTaskExecutor implementation class in TaskExecutor of JobLauncher. Use ThreadPoolTaskExecutor which is one of the components of AsyncTaskExecutor implementation class.

Further, multiplicity of threads which can be operated in parallel can be specified.
In this example, 3 threads are assigned to job execution, and requests exceeding this number are queued up to 10. A queued job is in the "not started" state, however the REST request itself is treated as successful. In addition, job requests exceeding the queue capacity cause org.springframework.core.task.TaskRejectedException and the job run request is rejected.

(5)

Override jobLauncher defined in launch-context.xml to enable taskExecutor of (4).

(6)

Specify spring-mvc-rest.xml described above as a Bean definition
read by DispatcherServlet.

(7)

Specify async which shows an asynchronous batch, as a profile of Spring Framework.

When async profile is not specified

In this case, the Beans defined in launch-context.xml, which should be shared across the Web application, are duplicated for each job.
Even when duplicated, the application still works at the functional level, so the error is difficult to notice and may result in unexpected resource exhaustion and performance degradation. The profile must always be specified.

Thread pool sizing

If the upper limit of the thread pool is too large, an enormous number of jobs run in parallel and the performance of the whole application deteriorates. Sizing should be done and an appropriate upper limit must be determined.
Besides the thread pool for asynchronous execution, the request threads of the Web container and other applications running on the same machine must also be considered.

Further, checking for TaskRejectedException caused by thread pool exhaustion, and re-executing the job, require a separate request from the Web client. Hence, queue-capacity, which lets jobs wait to start when the thread pool is exhausted, must be set.
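The queuing and rejection behavior described above can be sketched with the JDK ThreadPoolExecutor, which backs Spring's ThreadPoolTaskExecutor. The pool and queue sizes below mirror the <task:executor pool-size="3" queue-capacity="10"/> setting; the class and method names are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolDemo {

    /** Submits blocking dummy jobs and counts how many are accepted/rejected. */
    static int[] submitAndCount(int poolSize, int queueCapacity, int jobs) {
        // Same shape as <task:executor pool-size="3" queue-capacity="10"/>:
        // a fixed-size pool backed by a bounded queue.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < jobs; i++) {
            try {
                // Simulated long-running job: blocks until released.
                executor.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                // Spring's ThreadPoolTaskExecutor wraps this
                // in TaskRejectedException.
                rejected++;
            }
        }
        release.countDown();
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { accepted, rejected };
    }

    public static void main(String[] args) {
        // 3 jobs run, 10 wait in the queue, and the 14th request is rejected.
        int[] result = submitAndCount(3, 10, 14);
        System.out.println(result[0] + " accepted, " + result[1] + " rejected");
    }
}
```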

Implementation of RESTful Web Service API

Here, "Running a job" and "Job status check" are defined as 2 examples of requests used in REST API.

REST API Definition example
Sr.No. API Path HTTP method Request/Response Message format Message details

(1)

Running a job

/api/v1/job/Job name

POST

Request

JSON

Job parameter

Response

JSON

Job execution ID
Job name
Message

(2)

Job execution status check

/api/v1/job/Job execution ID

GET

Request

N/A

N/A

Response

JSON

Job execution ID
Job name
Job execution status
Job exit code
Step execution ID
Step name
Step exit code

4.4.3.3.2. Implementation of JavaBeans used in Controller

Create following 3 classes that are returned to REST client as JSON message.

  • Job run operation JobOperationResource

  • Job execution status JobExecutionResource

  • Step execution status StepExecutionResource

These classes are reference implementations; except for the job execution ID (job execution id) of JobOperationResource, the fields are optional.

Implementation example of job run operation information
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/jobinfo/JobOperationResource.java
package org.terasoluna.batch.sample.app.api.jobinfo;

public class JobOperationResource {

    private String jobName = null;

    private String jobParams = null;

    private Long jobExecutionId = null;

    private String errorMessage = null;

    private Exception error = null;

    // Getter and setter are omitted.
}
Implementation example of job execution information
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/jobinfo/JobExecutionResource.java
package org.terasoluna.batch.sample.app.api.jobinfo;

// omitted.

public class JobExecutionResource {

    private Long jobExecutionId = null;

    private String jobName = null;

    private Long stepExecutionId = null;

    private String stepName = null;

    private List<StepExecutionResource> stepExecutions = new ArrayList<>();

    private String status = null;

    private String exitStatus = null;

    private String errorMessage;

    private List<String> failureExceptions = new ArrayList<>();

    // Getter and setter are omitted.
}
Implementation example of step execution information
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/jobinfo/StepExecutionResource.java
package org.terasoluna.batch.sample.app.api.jobinfo;

public class StepExecutionResource {

  private Long stepExecutionId = null;

  private String stepName = null;

  private String status = null;

  private List<String> failureExceptions = new ArrayList<>();

    // Getter and setter are omitted.
}
4.4.3.3.3. Implementation of controller

A controller of the RESTful Web Service is implemented by using @RestController.
To keep the example simple, JobOperator is injected into the controller, which starts the job and fetches the execution status. Of course, JobOperator can also be called via a Service from the controller, in accordance with TERASOLUNA Server 5.x.

About job parameters that are passed at the time of running a job

The job parameter passed as the second argument of JobOperator#start() when running a job is a String. When there are multiple job parameters, they are separated by commas, unlike with CommandLineJobRunner used in synchronous execution. Specifically, the format is as below.
{Job parameter 1}={Value 1},{Job parameter 2}={Value 2},…​

This is the same as the method of specifying job parameters in "Asynchronous execution (DB polling)".
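For example, such a parameter string can be assembled from a map with plain JDK code. The helper below is illustrative only and is not part of TERASOLUNA Batch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class JobParamsBuilder {

    /**
     * Hypothetical helper: joins parameters into the comma-separated
     * {key}={value} form expected by JobOperator#start().
     */
    static String toJobParams(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("param1", "value1");
        params.put("param2", "value2");
        System.out.println(toJobParams(params)); // param1=value1,param2=value2
    }
}
```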

Example of implementing a controller
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/JobController.java
package org.terasoluna.batch.sample.app.api;

// omitted.

// (1)
@RequestMapping("job")
@RestController
public class JobController {

    // (2)
    @Inject
    JobOperator jobOperator;

    // (2)
    @Inject
    JobExplorer jobExplorer;

    @RequestMapping(value = "{jobName}", method = RequestMethod.POST)
    public ResponseEntity<JobOperationResource> launch(@PathVariable("jobName") String jobName,
            @RequestBody JobOperationResource requestResource) {

        JobOperationResource responseResource = new JobOperationResource();
        responseResource.setJobName(jobName);
        try {
            // (3)
            Long jobExecutionId = jobOperator.start(jobName, requestResource.getJobParams());
            responseResource.setJobExecutionId(jobExecutionId);
            return ResponseEntity.ok().body(responseResource);
        } catch (NoSuchJobException | JobInstanceAlreadyExistsException | JobParametersInvalidException e) {
            responseResource.setError(e);
            return ResponseEntity.badRequest().body(responseResource);
        }
    }

    @RequestMapping(value = "{jobExecutionId}", method = RequestMethod.GET)
    @ResponseStatus(HttpStatus.OK)
    public JobExecutionResource getJob(@PathVariable("jobExecutionId") Long jobExecutionId) {

        JobExecutionResource responseResource = new JobExecutionResource();
        responseResource.setJobExecutionId(jobExecutionId);

        // (4)
        JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);

        if (jobExecution == null) {
            responseResource.setErrorMessage("Job execution not found.");
        } else {
            mappingExecutionInfo(jobExecution, responseResource);
        }

        return responseResource;
    }

    private void mappingExecutionInfo(JobExecution src, JobExecutionResource dest) {
      dest.setJobName(src.getJobInstance().getJobName());
      for (StepExecution se : src.getStepExecutions()) {
          StepExecutionResource ser = new StepExecutionResource();
          ser.setStepExecutionId(se.getId());
          ser.setStepName(se.getStepName());
          ser.setStatus(se.getStatus().toString());
          for (Throwable th : se.getFailureExceptions()) {
              ser.getFailureExceptions().add(th.toString());
          }
          dest.getStepExecutions().add(ser);
      }
      dest.setStatus(src.getStatus().toString());
      dest.setExitStatus(src.getExitStatus().toString());
    }
}
Implementation of controller
Sr. No. Description

(1)

Specify @RestController. Further, by combining @RequestMapping("job") with the servlet mapping in web.xml, the base path of the REST API becomes contextName/api/v1/job/.

(2)

Describe field injections of JobOperator and JobExplorer.

(3)

Use JobOperator and start a new asynchronous job.
Receive job execution ID as a return value and return to REST client.

(4)

Use JobExplorer and fetch job execution status (JobExecution) based on job execution ID.
Return it to REST client after converting it to a pre-designed message.

4.4.3.3.4. Integration of Web/batch application module setting

The batch application module (asyncbatch) operates as a stand-alone application, and therefore contains settings which conflict or overlap with those of the Web application module (asyncapp-web). These settings must be integrated as required.

  1. Integration of log configuration file logback.xml
    When Logback definition files are defined in both the Web and batch modules, they do not work appropriately.
    Integrate the contents of asyncbatch/src/main/resources/logback.xml into the file of the same name under asyncapp-env/src/main/resources/, and then delete the former.

  2. Data source and MyBatis configuration files are not integrated
    The definitions of the data source and MyBatis configuration files are not integrated between Web and batch, since their application context definitions are independent due to the following relation.

    • asyncbatch module of the batch is defined in the servlet as a closed context.

    • asyncapp-domain and asyncapp-env modules of Web are defined as contexts used by entire application.

Cross-reference of data source and MyBatis settings by Web and batch modules

Since the context scopes of the Web and batch modules differ, the data source, MyBatis settings and Mapper interfaces of the batch module cannot be referred to from the Web module in particular.
Since initialization of the RDBMS schema is also carried out independently based on the settings of the respective modules, adequate care must be taken not to perform unintended initialization due to mutual interference.

CSRF countermeasures specific to REST controller

When a request is sent to the REST controller with the initial settings of the Web blank project, it results in a CSRF error and execution of the job is rejected. Hence, the explanation here assumes that CSRF countermeasures are disabled.

The Web application created here is not published on the Internet, and CSRF countermeasures are disabled on the premise that REST requests are not sent by a third party who could exploit CSRF as a means of attack. Note that the necessity of the countermeasures may differ in an actual Web application depending on the operating environment.
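As a reference, when the Web blank project uses the Spring Security XML namespace (version 4 or later), CSRF protection can be disabled roughly as follows. This is a sketch; the actual file name and namespace prefix depend on the project, and the risk must be assessed before applying it.

```xml
<!-- e.g. spring-security.xml : disable CSRF protection (assess the risk first) -->
<sec:http>
    <!-- omitted -->
    <sec:csrf disabled="true"/>
</sec:http>
```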

4.4.3.3.5. Build

Build with the Maven command and create a war file.

$ cd asyncapp
$ ls
asyncbatch/  asyncapp-web/  pom.xml
$ mvn clean package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] TERASOLUNA Server Framework for Java (5.x) Web Blank Multi Project (MyBatis3)
[INFO] TERASOLUNA Batch Framework for Java (5.x) Blank Project
[INFO] asyncapp-web
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building TERASOLUNA Server Framework for Java (5.x) Web Blank Multi Project (MyBatis3) 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------

(omitted)

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] TERASOLUNA Server Framework for Java (5.x) Web Blank Multi Project (MyBatis3) SUCCESS [  0.226 s]
[INFO] TERASOLUNA Batch Framework for Java (5.x) Blank Project SUCCESS [  6.481s]
[INFO] asyncapp-web ....................................... SUCCESS [  5.400 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.597 s
[INFO] Finished at: 2017-02-10T22:32:43+09:00
[INFO] Final Memory: 38M/250M
[INFO] ------------------------------------------------------------------------
$
4.4.3.3.6. Deploy

Start a Web container like Tomcat and deploy war file generated in the build. Detailed process is omitted.

4.4.3.4. Job start and confirmation of execution results using REST Client

Here, curl command is used as a REST client and an asynchronous job is started.

$ curl -v \
  -H "Accept: application/json" -H "Content-type: application/json" \
  -d '{"jobParams": "param1=value1"}' \
  http://localhost:8088/asyncapp-web/api/v1/job/job01
* timeout on name lookup is not supported
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8088 (#0)
> POST /asyncapp-web/api/v1/job/job01 HTTP/1.1
> Host: localhost:8088
> User-Agent: curl/7.51.0
> Accept: application/json
> Content-type: application/json
> Content-Length: 30
>
* upload completely sent off: 30 out of 30 bytes
< HTTP/1.1 200
< X-Track: 0267db93977b4552880a4704cf3e4565
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked
< Date: Fri, 10 Feb 2017 13:55:46 GMT
<
{"jobName":"job01","jobParams":null,"jobExecutionId":3,"error":null,"errorMessage":null}* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
$

From the above, it can be confirmed that the job was started with job execution ID jobExecutionId = 3.
Subsequently, the job execution results are fetched by using this job execution ID.

$ curl -v http://localhost:8080/asyncapp-web/api/v1/job/3
* timeout on name lookup is not supported
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /asyncapp-web/api/v1/job/3 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200
< X-Track: 7d94bf4d383745efb20cbf37cb6a8e13
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked
< Date: Fri, 10 Feb 2017 14:07:44 GMT
<
{"jobExecutionId":3,"jobName":"job01","stepExecutions":[{"stepExecutionId":5,"st
epName":"job01.step01","status":"COMPLETED","failureExceptions":[]}],"status":"C
OMPLETED","exitStatus":"exitCode=COMPLETED;exitDescription=","errorMessage":null
}* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
$

Since exitCode=COMPLETED, it can be confirmed that the job completed successfully.

When execution results of curl are to be determined by a shell script, etc.

In the example above, the response message of the REST API is displayed as well. When only the HTTP status is to be checked, the curl command can print the HTTP status to standard output, for example with curl -s URL -o /dev/null -w "%{http_code}\n".
However, since fetching the job execution ID requires parsing the JSON of the response body, a REST client application must be created as required.
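As a sketch of the parsing such a REST client needs, the following standalone helper extracts the job execution ID from the response body shown above. The class and method names are hypothetical; a real client would combine an HTTP client with a JSON library (e.g. Jackson) instead of a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch (hypothetical class): extract "jobExecutionId" from the
// JSON body returned by the job start API shown above.
public class JobStartResponseParser {

    private static final Pattern JOB_EXECUTION_ID_PATTERN =
            Pattern.compile("\"jobExecutionId\"\\s*:\\s*(\\d+)");

    // Returns the job execution ID, or -1 when the field is absent.
    public static long extractJobExecutionId(String json) {
        Matcher m = JOB_EXECUTION_ID_PATTERN.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        String body = "{\"jobName\":\"job01\",\"jobParams\":null,"
                + "\"jobExecutionId\":3,\"error\":null,\"errorMessage\":null}";
        System.out.println(extractJobExecutionId(body)); // 3
    }
}
```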

4.4.4. How to extend

4.4.4.1. Stopping and restarting jobs

Among multiple asynchronous jobs being executed, it may be necessary to stop and restart a specific job. Further, when jobs of identical names are running in parallel, only the job with the issue must be targeted. Hence, the target job execution must be identified and its status must be confirmed.
On this premise, an implementation for stopping and restarting asynchronous executions is explained here.

Here, a method of adding job stopping (stop) and restarting (restart) to the JobController shown in Implementation of controller is explained.

Job stopping and restarting can also be implemented without using JobOperator.
For details, refer to Job management and identify a method suitable for this objective.
Implementation example of stop and restart
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/JobController.java
package org.terasoluna.batch.sample.app.api;

// omitted.

@RequestMapping("job")
@RestController
public class JobController {

    // omitted.

    @RequestMapping(value = "stop/{jobExecutionId}", method = RequestMethod.PUT)
    public ResponseEntity<JobOperationResource> stop(
            @PathVariable("jobExecutionId") Long jobExecutionId) {

      JobOperationResource responseResource = new JobOperationResource();
      responseResource.setJobExecutionId(jobExecutionId);
      boolean result = false;
      try {
          // (1)
          result = jobOperator.stop(jobExecutionId);
          if (!result) {
              responseResource.setErrorMessage("stop failed.");
              return ResponseEntity.badRequest().body(responseResource);
          }
          return ResponseEntity.ok().body(responseResource);
      } catch (NoSuchJobExecutionException | JobExecutionNotRunningException e) {
          responseResource.setError(e);
          return ResponseEntity.badRequest().body(responseResource);
      }
    }

    @RequestMapping(value = "restart/{jobExecutionId}",
                    method = RequestMethod.PUT)
    public ResponseEntity<JobOperationResource> restart(
            @PathVariable("jobExecutionId") Long jobExecutionId) {

        JobOperationResource responseResource = new JobOperationResource();
        responseResource.setJobExecutionId(jobExecutionId);
        try {
            // (2)
            Long id = jobOperator.restart(jobExecutionId);
            responseResource.setJobExecutionId(id);
            return ResponseEntity.ok().body(responseResource);
        } catch (JobInstanceAlreadyCompleteException |
                  NoSuchJobExecutionException | NoSuchJobException |
                  JobRestartException | JobParametersInvalidException e) {
            responseResource.setErrorMessage(e.getMessage());
            return ResponseEntity.badRequest().body(responseResource);
        }
    }

    // omitted.
}
Implementation example of stop / restart using controller
Sr. No. Description

(1)

Request stopping of the running job by calling JobOperator#stop().

(2)

Re-execute the job from the step where it terminated abnormally or was stopped, by calling JobOperator#restart().

4.4.4.2. Multiple running

Multiple running means that multiple Web containers are started, each waiting for job requests.

Execution of asynchronous jobs is controlled through the external RDBMS to which each application connects. By sharing the external RDBMS, asynchronous jobs can be accepted across Web containers on the same machine or on different machines.

Use cases include load balancing and redundancy for specific jobs. However, as described in Implementation of Web application, these effects cannot be obtained simply by starting multiple Web containers or enhancing parallelism. Measures similar to those for a general Web application may need to be taken in order to obtain them. An example is given below.

  • Each request is processed in a stateless manner in keeping with the characteristics of a Web application; however, asynchronous batch execution is likely to have reduced failure tolerance unless it is designed together with confirmation of job start results.
    For example, even when the Web container that starts jobs is made redundant, if the job execution ID is lost due to a failure on the client side after starting a job, it is difficult to confirm the progress and results of that job.

  • In order to distribute load across multiple Web containers, either a function to distribute request destinations must be implemented on the client side, or a load balancer must be introduced.

Thus, the adequacy of running multiple containers cannot be determined unconditionally. Whether to use a load balancer or to control how the Web client sends requests should be considered based on the purpose and usage. A design which does not degrade the performance and fault tolerance of the asynchronous execution application is required.

4.5. Listener

4.5.1. Overview

A listener is an interface for inserting processing before and after executing a job or a step.

Since this function works differently for the chunk model and the tasklet model, respective explanations are given.

A listener consists of multiple interfaces, respective roles are explained here. Subsequently, how to set and implement a listener is explained.

4.5.1.1. Types of listener

Many listener interfaces are defined in Spring Batch. Not all of them are explained here; we focus on the most frequently used interfaces.

A listener is roughly divided into 2 types.

JobListener

An interface to insert the processing for execution of the job

StepListener

An interface to insert the processing for execution of the step

About JobListener

An interface called JobListener does not exist in Spring Batch. The name is used in this guideline for convenience, for comparison with StepListener.
Java Batch (jBatch) has an interface called javax.batch.api.listener.JobListener, so care should be taken at the time of implementation to avoid confusing the two. Similarly, jBatch also has an interface named StepListener with a different signature (javax.batch.api.listener.StepListener), so adequate precautions are necessary.

4.5.1.1.1. JobListener

The JobListener category consists of only one interface, JobExecutionListener.

JobExecutionListener

Processing is inserted before a job starts and after it terminates.

JobExecutionListener interface
public interface JobExecutionListener {
  void beforeJob(JobExecution jobExecution);
  void afterJob(JobExecution jobExecution);
}
4.5.1.1.2. StepListener

StepListener interfaces are of multiple types, as below.

StepListener

A marker interface for the various listeners introduced below.

StepExecutionListener

Processing is inserted before a step starts and after it terminates.

StepExecutionListener interface
public interface StepExecutionListener extends StepListener {
  void beforeStep(StepExecution stepExecution);
  ExitStatus afterStep(StepExecution stepExecution);
}
ChunkListener

Processing is inserted before and after the processing of one chunk, and when an error occurs.

ChunkListener interface
public interface ChunkListener extends StepListener {
  static final String ROLLBACK_EXCEPTION_KEY = "sb_rollback_exception";
  void beforeChunk(ChunkContext context);
  void afterChunk(ChunkContext context);
  void afterChunkError(ChunkContext context);
}
Uses of ROLLBACK_EXCEPTION_KEY

It is used when the exception that occurred is to be fetched in the afterChunkError method. If an error occurs during chunk processing, Spring Batch stores the exception in ChunkContext under the sb_rollback_exception key before calling ChunkListener, so it can be accessed as shown below.

Usage example
public void afterChunkError(ChunkContext context) {
    logger.error("Exception occurred while chunk. [context:{}]", context,
            context.getAttribute(ChunkListener.ROLLBACK_EXCEPTION_KEY));
}

For exception handling, refer to Exception handling using ChunkListener interface.

ItemReadListener

Processing is inserted before and after fetching one data record by ItemReader, and when an error occurs.

ItemReadListener interface
public interface ItemReadListener<T> extends StepListener {
  void beforeRead();
  void afterRead(T item);
  void onReadError(Exception ex);
}
ItemProcessListener

Processing is inserted before and after processing one data record by ItemProcessor, and when an error occurs.

ItemProcessListener interface
public interface ItemProcessListener<T, S> extends StepListener {
  void beforeProcess(T item);
  void afterProcess(T item, S result);
  void onProcessError(T item, Exception e);
}
ItemWriteListener

Processing is inserted before and after output of one chunk by ItemWriter, and when an error occurs.

ItemWriteListener interface
public interface ItemWriteListener<S> extends StepListener {
  void beforeWrite(List<? extends S> items);
  void afterWrite(List<? extends S> items);
  void onWriteError(Exception exception, List<? extends S> items);
}

This guideline does not explain following listeners.

  • Retry type listener

  • Skip type listener

These listeners are intended to be used for exception handling; however, the policy of this guideline is not to perform exception handling using these listeners. For details, refer to Exception handling.

4.5.2. How to use

Explanation is given about how to implement and set a listener.

4.5.2.1. Implementation of a listener

There are 2 ways to implement a listener, as follows.

  1. Implement the listener interface with implements.

  2. Implement components with method-based annotation.

Which implementation to use is chosen based on the role of the listener. The criteria are described later.

4.5.2.1.1. When an interface is to be implemented

Various listener interfaces are implemented by using implements. Multiple interfaces can be implemented at the same time as required. An implementation example is shown below.

Implementation example for JobExecutionListener
@Component
public class JobExecutionLoggingListener implements JobExecutionListener { // (1)

    private static final Logger logger =
            LoggerFactory.getLogger(JobExecutionLoggingListener.class);

    @Override
    public void beforeJob(JobExecution jobExecution) { // (2)
        logger.info("job started. [JobName:{}]", jobExecution.getJobInstance().getJobName());
    }

    @Override
    public void afterJob(JobExecution jobExecution) { // (3)

        logger.info("job finished.[JobName:{}][ExitStatus:{}]", jobExecution.getJobInstance().getJobName(),
                jobExecution.getExitStatus().getExitCode());
    }
}
Configuration example of listener
<batch:job id="chunkJobWithListener" job-repository="jobRepository">
     <batch:step id="chunkJobWithListener.step01">
         <batch:tasklet transaction-manager="jobTransactionManager">
             <batch:chunk reader="reader" processor="processor"
                          writer="writer" commit-interval="10"/>
             <batch:listeners>
                 <batch:listener ref="loggingEachProcessInStepListener"/>
             </batch:listeners>
         </batch:tasklet>
     </batch:step>
     <batch:listeners>
         <batch:listener ref="jobExecutionLoggingListener"/> <!-- (4) -->
     </batch:listeners>
 </batch:job>
Description
Sr. No. Description

(1)

Implement JobExecutionListener using implements.

(2)

Implement beforeJob method defined by JobExecutionListener.
In this example, job start log is output.

(3)

Implement afterJob method defined by JobExecutionListener.
In this example, job end log is output.

(4)

Set the listener implemented in (1), in <listeners> tag of Bean definition.
Details of setup method are explained in Listener settings.

Listener support class

When multiple listener interfaces are implemented with implements, blank implementations are required for the methods which are not necessary for the process. Support classes containing such blank implementations are provided in Spring Batch in order to simplify this. In that case, use the support class instead of the interfaces, and extends instead of implements.

Support class
  • org.springframework.batch.core.listener.ItemListenerSupport

  • org.springframework.batch.core.listener.StepListenerSupport
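For example, a listener that only needs the afterChunk callback can extend the support class and override just that one method. The following is a sketch (the class name is hypothetical; Spring Batch and SLF4J are assumed to be on the classpath):

```java
// imports of Spring Batch / SLF4J classes omitted.

@Component
public class ChunkCountLoggingListener extends StepListenerSupport<Object, Object> {

    private static final Logger logger =
            LoggerFactory.getLogger(ChunkCountLoggingListener.class);

    // Only this callback is overridden; beforeChunk, beforeRead and the other
    // listener methods keep the blank implementations inherited from
    // StepListenerSupport.
    @Override
    public void afterChunk(ChunkContext context) {
        logger.info("chunk finished. [stepName:{}]",
                context.getStepContext().getStepName());
    }
}
```

As with any interface-based listener, this class must still be registered in the <listeners> tag.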

4.5.2.1.2. When annotations are assigned

Annotations corresponding to the various listener interfaces are assigned. Multiple annotations can also be assigned as required.

Correspondence table with listener interface
Listener interface Annotation

JobExecutionListener

@BeforeJob
@AfterJob

StepExecutionListener

@BeforeStep
@AfterStep

ChunkListener

@BeforeChunk
@AfterChunk
@AfterChunkError

ItemReadListener

@BeforeRead
@AfterRead
@OnReadError

ItemProcessListener

@BeforeProcess
@AfterProcess
@OnProcessError

ItemWriteListener

@BeforeWrite
@AfterWrite
@OnWriteError

These annotations work for the target scope by assigning them to the implementation method which is divided into components. Implementation example is given below.

Implementation example for ItemProcessor wherein the annotation is assigned
@Component
public class AnnotationAmountCheckProcessor implements
        ItemProcessor<SalesPlanDetail, SalesPlanDetail> {

    private static final Logger logger =
            LoggerFactory.getLogger(AnnotationAmountCheckProcessor.class);

    @Override
    public SalesPlanDetail process(SalesPlanDetail item) throws Exception {
        if (item.getAmount().signum() == -1) {
            throw new IllegalArgumentException("amount is negative.");
        }
        return item;
    }

    // (1)
    /*
    @BeforeProcess
    public void beforeProcess(Object item) {
        logger.info("before process. [Item :{}]", item);
    }
    */

    // (2)
    @AfterProcess
    public void afterProcess(Object item, Object result) {
        logger.info("after process. [Result :{}]", result);
    }

    // (3)
    @OnProcessError
    public void onProcessError(Object item, Exception e) {
        logger.error("on process error.", e);
    }
}
Configuration example of listener
<batch:job id="chunkJobWithListenerAnnotation" job-repository="jobRepository">
    <batch:step id="chunkJobWithListenerAnnotation.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader"
                         processor="annotationAmountCheckProcessor"
                         writer="writer" commit-interval="10"/>  <!-- (4) -->
        </batch:tasklet>
    </batch:step>
</batch:job>
Description
Sr. No. Description

(1)

When annotations are used for implementation, only the annotations for the timings required by the processing should be assigned.
In this example, since no operation is required before the processing of ItemProcessor, the implementation to which @BeforeProcess would be assigned is unnecessary.

(2)

Implement the process to be performed after the processing of ItemProcessor.
In this example, process results are output in a log.

(3)

Implement the processing for when an error occurs in ItemProcessor.
In this example, the exception that occurred is output to a log.

(4)

Set the ItemProcessor in which the listener is implemented by annotations in the <chunk> tag.
Unlike a listener interface, the listener is automatically registered even when it is not set in the <listeners> tag.

Constraints for the method which assigns the annotations

Not just any method can be used for assigning these annotations: the method signature must match that of the corresponding listener interface method. This point is clearly mentioned in the javadoc of the respective annotations.

Precautions while implementing JobExecutionListener by an annotation

Since JobExecutionListener has a different scope from the other listeners, the listener is not automatically registered with the configuration above. Hence, it must be explicitly set in the <listeners> tag. For details, refer to Listener settings.

Implementation of a listener to Tasklet implementation by using annotation

When a listener is implemented in a Tasklet implementation by using annotations, note that the listener is not invoked with the following settings.

In case of Tasklet
<batch:job id="taskletJobWithListenerAnnotation" job-repository="jobRepository">
    <batch:step id="taskletJobWithListenerAnnotation.step01">
        <batch:tasklet transaction-manager="jobTransactionManager"
                       ref="annotationSalesPlanDetailRegisterTasklet"/>
    </batch:step>
</batch:job>

In case of the tasklet model, the listener interface should be used, in accordance with How to choose an interface or an annotation.

4.5.2.2. Listener settings

Listeners are set in the <listeners>.<listener> tag of the Bean definition. Although the XML schema allows them to be written at various locations, some settings do not work as intended depending on the type of interface. Set them at the following positions.

Position where listener is set
<!-- for chunk mode -->
<batch:job id="chunkJob" job-repository="jobRepository">
    <batch:step id="chunkJob.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="(1)"
                         processor="(1)"
                         writer="(1)" commit-interval="10"/>
            <batch:listeners>
                <batch:listener ref="(2)"/>
            </batch:listeners>
        </batch:tasklet>
    </batch:step>
    <batch:listeners>
        <batch:listener ref="(3)"/>
    </batch:listeners>
</batch:job>

<!-- for tasklet mode -->
<batch:job id="taskletJob" job-repository="jobRepository">
    <batch:step id="taskletJob.step01">
        <batch:tasklet transaction-manager="jobTransactionManager" ref="tasklet">
            <batch:listeners>
                <batch:listener ref="(2)"/>
            </batch:listeners>
        </batch:tasklet>
    </batch:step>
    <batch:listeners>
        <batch:listener ref="(3)"/>
    </batch:listeners>
</batch:job>
Description of configuration value
Sr. No. Description

(1)

Set the component which includes the implementation attributing to StepListener, performed by using an annotation.
In case of an annotation, it will be inevitably set to this location.

(2)

Set listener interface implementation attributing to StepListener. In case of tasklet model, ItemReadListener, ItemProcessListener and ItemWriteListener cannot be used.

(3)

Set listener attributing to JobListener.
Whether implemented with an interface or with annotations, it must be set here.

4.5.2.2.1. Setting multiple listeners

Multiple listeners can be set in <batch:listeners> tag.

The sequence in which listeners are invoked when multiple listeners are registered is shown below.

  • ItemProcessListener implementation

    • listenerA, listenerB

  • JobExecutionListener implementation

    • listenerC, listenerD

Configuration example of multiple listeners
<batch:job id="chunkJob" job-repository="jobRepository">
    <batch:step id="chunkJob.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader"
                         processor="processor"
                         writer="writer" commit-interval="10"/>
            <batch:listeners>
                <batch:listener ref="listenerA"/>
                <batch:listener ref="listenerB"/>
            </batch:listeners>
        </batch:tasklet>
    </batch:step>
    <batch:listeners>
        <batch:listener ref="listenerC"/>
        <batch:listener ref="listenerD"/>
    </batch:listeners>
</batch:job>
Listener execution order
Listener startup sequence
  • Processing corresponding to pre-processing is invoked in the order of listener registration.

  • Processing corresponding to post-processing or error handling is invoked in the reverse order of listener registration.
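The invocation order above can be illustrated with a small self-contained model (plain Java, no Spring required; the names listenerA and listenerB follow the configuration example above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of how Spring Batch invokes multiple registered listeners:
// before-callbacks run in registration order, after-callbacks run in
// reverse registration order.
public class ListenerOrderModel {

    public static List<String> invocationOrder(List<String> registered) {
        List<String> calls = new ArrayList<>();
        for (String listener : registered) {       // pre-processing
            calls.add(listener + "#before");
        }
        List<String> reversed = new ArrayList<>(registered);
        Collections.reverse(reversed);             // post-processing
        for (String listener : reversed) {
            calls.add(listener + "#after");
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(invocationOrder(Arrays.asList("listenerA", "listenerB")));
        // [listenerA#before, listenerB#before, listenerB#after, listenerA#after]
    }
}
```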

4.5.2.3. How to choose an interface or an annotation

How to choose between a listener implemented with an interface and a listener implemented with annotations is explained.

Listener interface

Used for cross-cutting processes shared across jobs, steps and chunks.

Annotation

Used when a process specific to the business logic is to be performed.
As a rule, it is implemented only for ItemProcessor.

5. Input/Output of Data

5.1. Transaction control

5.1.1. Overview

In this section, transaction control in jobs will be described in the following order.

Since this function is different in usage between chunk model and tasklet model, each will be explained.

5.1.1.1. About the pattern of transaction control in general batch processing

Generally, since batch processing handles a large number of records, if an error near the end of processing required all of it to be redone, the batch system schedule would be adversely affected.
In order to avoid this, the impact of an error is often localized by committing the transaction for each fixed number of records while the processing of one job proceeds.
(Hereafter, the method of committing the transaction for every fixed number of records is called the "intermediate commit method", and the group of data in the commit unit is called a "chunk".)

The points of the intermediate commit method are summarized below.

  1. Localize the effects at the time of error occurrence.

    • Even if an error occurs, the processing up to the chunk just before the error is committed.

  2. Only a certain amount of resources is used.

    • Regardless of whether the data to be processed is large or small, only the resources for one chunk are used, so resource usage is stable.

However, the intermediate commit method is not valid in every situation.
Processed data and unprocessed data coexist in the system, even if only temporarily. As a result, unprocessed data must be identified at the time of recovery processing, so recovery may become complicated. To avoid this, all records must be committed in one transaction, without using the intermediate commit method.
(Hereinafter, the method of committing everything in a single transaction is called the "single commit method".)

Nevertheless, if a large number of records, such as tens of thousands, is processed with the single commit method, a heavy load is placed on the database when all updates are reflected at commit time. Therefore, although the single commit method is suitable for small-scale batch processing, care must be taken when adopting it for a large-scale batch, so this method is not a versatile one either.

In other words, there is a trade-off between "localization of impact" and "ease of recovery". Which of the "intermediate commit method" and the "single commit method" to use depends on the nature of the job and which property should be prioritized.
Of course, it is not necessary to implement all the jobs in the batch system with only one of the two. It is natural to basically use the "intermediate commit method" and adopt the "single commit method" for special jobs (or the other way round).

Below is the summary of advantages, disadvantages and adoption points of "intermediate commit method" and "single commit method".

Features list by method
Commit method Advantage Disadvantage Adoption point

intermediate commit method

Localize the effect at the time of error occurrence

Recovery processing may be complicated

When you want to process large amounts of data with certain machine resources

single commit method

Ensure data integrity

There is a possibility of high work-load when processing a large number of cases

When you want to set the processing result for the persistent resource to All or Nothing
Suitable for small batch processing
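The trade-off above can be simulated with plain Java (no framework involved): with chunk size 3 and an error at the 8th record, the intermediate commit method keeps the 6 records of the two completed chunks, while the single commit method keeps nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained simulation of the two commit methods. "Committing" here just
// means moving items from an uncommitted buffer into the committed list.
public class CommitMethodSimulation {

    // Returns the number of records that remain committed when the record at
    // index errorAt (0-based) fails. chunkSize <= 0 means single commit method.
    public static int committedAfterError(int total, int chunkSize, int errorAt) {
        List<Integer> committed = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            if (i == errorAt) {
                buffer.clear();            // rollback of the current transaction
                return committed.size();
            }
            buffer.add(i);
            if (chunkSize > 0 && buffer.size() == chunkSize) {
                committed.addAll(buffer);  // intermediate commit per chunk
                buffer.clear();
            }
        }
        committed.addAll(buffer);          // final commit
        return committed.size();
    }

    public static void main(String[] args) {
        System.out.println(committedAfterError(10, 3, 7));  // intermediate: 6
        System.out.println(committedAfterError(10, 0, 7));  // single commit: 0
    }
}
```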

Notes on inputting and outputting to the same table in the database

In terms of database structure, care is required when handling large amounts of data in processing that inputs from and outputs to the same table, regardless of the commit method.

  • As the information that guarantees read consistency is lost due to output (issuing UPDATE), errors may occur on the input side (SELECT).

In order to avoid this, the following measures are taken.

  • Increase the area to secure information.

    • When expanding, please consider it carefully in resource design.

    • Since the extension method depends on the database to be used, refer to the manual.

  • Divide the input data and perform multiplexed processing.

5.1.2. Architecture

5.1.2.1. Transaction control in Spring Batch

Job transaction control leverages the mechanism of Spring Batch.

Two kinds of transactions are defined below.

Framework transaction

Transaction controlled by Spring Batch

User transaction

Transactions controlled by the user

5.1.2.1.1. Transaction control mechanism in chunk model

Transaction control in the chunk model uses only the intermediate commit method. The single commit method cannot be used.

The single commit method in the chunk model has been reported in JIRA:
https://jira.spring.io/browse/BATCH-647
It can be realized by customizing the chunk completion policy and dynamically changing the chunk size. However, with this method all data is held in one chunk, which puts pressure on memory, so it cannot be adopted.

A feature of this method is that transactions are repeatedly performed for each chunk.

Transaction control in normal process

Transaction control in normal process will be explained.

Transaction Control Chunk Model Normal Process
Sequence diagram of normal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The subsequent processing is repeated until there is no input data.

    • Start a framework transaction on a per chunk basis.

    • Repeat steps 2 to 5 until the chunk size is reached.

  2. The step obtains input data from ItemReader.

  3. ItemReader returns the input data to the step.

  4. In the step, ItemProcessor processes input data.

  5. ItemProcessor returns the processing result to the step.

  6. The step outputs data for chunk size with ItemWriter.

  7. ItemWriter will output to the target resource.

  8. The step commits the framework transaction.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Transaction Control Chunk Model Abnormal Process
Sequence diagram of abnormal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The subsequent processing is repeated until there is no input data.

    • Start a framework transaction on a per chunk basis.

    • Repeat steps 2 to 5 until the chunk size is reached.

  2. The step obtains input data from ItemReader.

  3. ItemReader returns the input data to the step.

  4. In the step, ItemProcessor processes input data.

  5. ItemProcessor returns the processing result to the step.

  6. The step outputs data for chunk size with ItemWriter.

  7. ItemWriter will output to the target resource.

If any exception occurs between the process from 2 to 7,

  1. The step rolls back the framework transaction.

5.1.2.1.2. Mechanism of transaction control in tasklet model

For transaction control in the tasklet model, either the single commit method or the intermediate commit method can be used.

single commit method

Use the transaction control mechanism of Spring Batch

Intermediate commit method

Manipulate the transaction directly with the user

single commit method in tasklet model

The mechanism of transaction control by Spring Batch is explained.

A feature of this method is to process data repeatedly within one transaction.

Transaction control in normal process

Transaction control in normal process will be explained.

Single Transaction Control Tasklet Model Normal Process
Sequence diagram of normal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The step starts a framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 7 until there is no more input data.

  3. Tasklet gets input data from Repository.

  4. Repository will return input data to tasklet.

  5. Tasklets process input data.

  6. Tasklets pass output data to Repository.

  7. Repository will output to the target resource.

  8. The tasklet returns the process end to the step.

  9. The step commits the framework transaction.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Single Transaction Control Tasklet Model Abnormal Process
Sequence diagram of abnormal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The step starts a framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 7 until there is no more input data.

  3. Tasklet gets input data from Repository.

  4. Repository will return input data to tasklet.

  5. Tasklets process input data.

  6. Tasklets pass output data to Repository.

  7. Repository will output to the target resource.

If any exception occurs between the process from 2 to 7,

  1. The tasklet throws an exception to the step.

  2. The step rolls back the framework transaction.

Intermediate commit method in tasklet model

A mechanism in which the user directly operates the transaction is described.

The feature of this method is that resource transactions are handled only by user transactions, by making the framework transaction a pseudo-transaction that does not touch any resource. To do this, specify org.springframework.batch.support.transaction.ResourcelessTransactionManager, which is not bound to any resource, in the transaction-manager attribute.

Transaction control in normal process

Transaction control in normal process will be explained.

Chunk Transaction Control Tasklet Model Normal Process
Sequence diagram of normal process
Description of the Sequence Diagram
  1. Steps are executed from the job.

    • The step starts framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 10 until there is no more input data.

  3. The tasklet starts a user transaction via TransactionManager.

    • Repeat steps 4 to 6 until the chunk size is reached.

  4. Tasklet gets input data from Repository.

  5. Repository will return input data to tasklet.

  6. Tasklets process input data.

  7. Tasklets pass output data to Repository.

  8. Repository will output to the target resource.

  9. The tasklet commits the user transaction via TransactionManager.

  10. TransactionManager issues a commit to the target resource.

  11. The tasklet returns the process end to the step.

  12. The step commits the framework transaction.

In this case, each item is output to the resource one by one, but, as in the chunk model, it is also possible to update collectively in chunk units and improve the processing throughput. In that case, batch update can be used by setting the executorType of SqlSessionTemplate to BATCH. This is the same behavior as using MyBatis' ItemWriter, so updating with MyBatis' ItemWriter is also possible. For details of MyBatis' ItemWriter, refer to Database access with ItemWriter.

Transaction control in abnormal process

Transaction control in abnormal process will be explained.

Chunk Transaction Control Tasklet Model Abnormal Process
Sequence diagram of abnormal process
Description of the Sequence Diagram
  1. The step is executed from the job.

    • The step starts a framework transaction.

  2. The step executes the tasklet.

    • Repeat steps 3 to 11 until there is no more input data.

  3. The tasklet starts a user transaction via TransactionManager.

    • Repeat steps 4 to 6 until the chunk size is reached.

  4. The tasklet gets input data from the Repository.

  5. The Repository returns input data to the tasklet.

  6. The tasklet processes the input data.

  7. The tasklet passes output data to the Repository.

  8. The Repository outputs to the target resource.

If an exception occurs anywhere in steps 3 to 8,

  1. The tasklet processes the exception that occurred.

  2. The tasklet rolls back the user transaction via TransactionManager.

  3. TransactionManager issues a rollback to the target resource.

  4. The tasklet throws an exception to the step.

  5. The step rolls back framework transaction.

About processing continuation

Here, although the process is abnormally terminated after handling the exception and rolling back, it is also possible to continue processing the next chunk. In that case, it is necessary to notify subsequent processing that an error occurred on the way, by changing the status / exit code of the step.
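The continuation described above can be sketched in plain Java (this is an illustrative simulation, not the Spring Batch API): each chunk gets its own transaction, a failed chunk is rolled back by discarding its buffer, and an error flag is kept so that the step's status / exit code can be changed afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-chunk rollback with continuation to the next chunk.
public class ContinueOnChunkError {

    public static List<Integer> processChunks(List<List<Integer>> chunks) {
        List<Integer> committed = new ArrayList<>();
        boolean errorOccurred = false;
        for (List<Integer> chunk : chunks) {
            List<Integer> buffer = new ArrayList<>(); // stands in for the user transaction
            try {
                for (Integer item : chunk) {
                    if (item < 0) {
                        throw new IllegalArgumentException("invalid item: " + item);
                    }
                    buffer.add(item);
                }
                committed.addAll(buffer); // commit this chunk
            } catch (IllegalArgumentException e) {
                errorOccurred = true;     // roll back: buffer is discarded, loop continues
            }
        }
        // in a real job, this flag would be reflected in the step's status / exit code
        System.out.println("errorOccurred=" + errorOccurred);
        return committed;
    }

    public static void main(String[] args) {
        System.out.println(processChunks(
                List.of(List.of(1, 2), List.of(-1, 3), List.of(4, 5))));
    }
}
```

Only the failed chunk's items are lost; the surrounding chunks are committed as usual.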

About framework transactions

In this case, although the job is abnormally terminated by throwing an exception after rolling back the user transaction, it is also possible to return the process end to the step and terminate the job normally. In that case, the framework transaction is committed.

5.1.2.1.3. Selection policy for model-specific transaction control

In Spring Batch that is the basis of TERASOLUNA Batch 5.x, only the intermediate commit method can be implemented in the chunk model. However, in the tasklet model, either the intermediate commit method or the single commit method can be implemented.

Therefore, in TERASOLUNA Batch 5.x, when the single commit method is necessary, it is to be implemented in the tasklet model.

5.1.2.2. Difference in transaction control for each execution method

Depending on the execution method, a transaction that is not managed by Spring Batch occurs before and after the job is executed. This section explains transactions in two asynchronous execution processing schemes.

5.1.2.2.1. About transaction of DB polling

For the processing of the Job-request-table performed by DB polling, transactions outside Spring Batch management are used. Also, exceptions that occur in a job are handled within the job, so they do not affect the transactions performed by JobRequestPollTask.

A simple sequence diagram focusing on transactions is shown in the figure below.

With Database polling transaction
Transaction of DB polling
Description of the Sequence Diagram
  1. JobRequestPollTask is executed periodically from asynchronous batch daemon.

  2. JobRequestPollTask will start a transaction other than Spring Batch managed.

  3. JobRequestPollTask will retrieve an asynchronous batch job to execute from the Job-request-table.

  4. JobRequestPollTask will commit the transaction other than Spring Batch managed.

  5. JobRequestPollTask will start a transaction other than Spring Batch managed.

  6. JobRequestPollTask will update the polling status of the Job-request-table from INIT to POLLED.

  7. JobRequestPollTask will commit the transaction other than Spring Batch managed.

  8. JobRequestPollTask will execute the job.

  9. Inside the job, transaction control for the Management DB (JobRepository) is managed by Spring Batch.

  10. Inside the job, transaction control for the Job DB is managed by Spring Batch.

  11. job_execution_id is returned to JobRequestPollTask.

  12. JobRequestPollTask will start a transaction other than Spring Batch managed.

  13. JobRequestPollTask will update the polling status of the Job-request-table from POLLED to EXECUTED.

  14. JobRequestPollTask will commit the transaction other than Spring Batch managed.

About Commit at SELECT Issuance

Some databases implicitly start a transaction when a SELECT is issued. Therefore, a commit is issued explicitly to confirm the transaction, so that it is clearly separated from other transactions and not affected by them.

5.1.2.2.2. About the transaction of WebAP server process

For the processing of resources targeted by the WebAP, transactions outside Spring Batch management are used. Also, exceptions that occur in a job are handled within the job, so they do not affect the transactions performed by the WebAP.

A simple sequence diagram focusing on transactions is shown in the figure below.

With Web Application transaction
Transaction of WebAP server process
Description of the Sequence Diagram
  1. WebAP processing is executed by a request from the client.

  2. WebAP will start the transaction managed outside of Spring Batch.

  3. WebAP reads from and writes to resources in WebAP before job execution.

  4. WebAP executes the job.

  5. Within a job, Spring Batch carries out transaction management to the Management DB (JobRepository).

  6. Within a job, Spring Batch carries out transaction management to the Job DB.

  7. job_execution_id is returned to WebAP.

  8. WebAP reads from and writes to resources in WebAP after job execution.

  9. WebAP will commit the transaction managed outside of Spring Batch.

  10. WebAP returns a response to the client.

5.1.3. How to use

Here, transaction control in one job will be explained separately in the following cases.

The data source refers to the data storage location (database, file, etc.). A single data source means one data source, and multiple data sources means two or more data sources.

In the case of processing a single data source, processing database data is the representative case.
In the case of processing multiple data sources, there are the following variations:

  • multiple databases

  • databases and files

5.1.3.1. For a single data source

Transaction control of jobs input / output to one data source will be described.

Below is a sample setting with TERASOLUNA Batch 5.x.

DataSource setting(META-INF/spring/launch-context.xml)
<!-- Job-common definitions -->
<bean id="jobDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
      destroy-method="close"
      p:driverClassName="${jdbc.driver}"
      p:url="${jdbc.url}"
      p:username="${jdbc.username}"
      p:password="${jdbc.password}"
      p:maxTotal="10"
      p:minIdle="1"
      p:maxWaitMillis="5000"
      p:defaultAutoCommit="false" />
TransactionManager setting(META-INF/spring/launch-context.xml)
<!-- (1) -->
<bean id="jobTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager"
      p:dataSource-ref="jobDataSource"
      p:rollbackOnCommitFailure="true" />
No Description

(1)

Bean definition of TransactionManager.
Set jobDataSource defined above for the data source.
It has been set to roll back if commit fails.

5.1.3.1.1. Implement transaction control

The control method differs depending on the job model and the commit method.

In case of chunk model

In the chunk model, the intermediate commit method is used and transaction control is left to Spring Batch. The user must not perform any transaction control.

Setting sample(job definition)
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <batch:tasklet transaction-manager="jobTransactionManager"> <!-- (1) -->
            <batch:chunk reader="detailCSVReader"
                         writer="detailWriter"
                         commit-interval="10" /> <!-- (2) -->
        </batch:tasklet>
    </batch:step>
</batch:job>
No Description

(1)

Set jobTransactionManager which is already defined in transaction-manager attribute of <batch:tasklet> tag.
The intermediate commit method transaction is controlled by the transaction manager set here.

(2)

Set the chunk size in the commit-interval attribute. In this sample, a commit is performed once for every 10 records processed.

For the tasklet model

In the case of the tasklet model, the method of transaction control differs depending on whether the single commit method or the intermediate commit method is used.

single commit method

Spring Batch controls the transaction.

Setting sample(job definition)
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <!-- (1) -->
        <batch:tasklet transaction-manager="jobTransactionManager"
                       ref="salesPlanSingleTranTask" />
    </batch:step>
</batch:job>
No Description

(1)

Set jobTransactionManager which is already defined in transaction-manager attribute of <batch:tasklet> tag.
The single commit method transaction is controlled by the transaction manager set here.

intermediate commit method

The user controls the transaction.

  • If you want to commit in the middle of processing, inject the TransactionManager and operate it manually.

Setting sample(job definition)
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <!-- (1) -->
        <batch:tasklet transaction-manager="jobResourcelessTransactionManager"
                       ref="salesPlanChunkTranTask" />
    </batch:step>
</batch:job>
Implementation sample
@Component()
public class SalesPlanChunkTranTask implements Tasklet {

    @Inject
    ItemStreamReader<SalesPlanDetail> itemReader;

     // (2)
    @Inject
    @Named("jobTransactionManager")
    PlatformTransactionManager transactionManager;

    @Inject
    SalesPlanDetailRepository repository;

    private static final int CHUNK_SIZE = 10;

    @Override
    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {

        DefaultTransactionDefinition definition = new DefaultTransactionDefinition();
        TransactionStatus status = null;

        try {
            // omitted.

            itemReader.open(executionContext);

            while ((item = itemReader.read()) != null) {

                if (count % CHUNK_SIZE == 0) {
                    status = transactionManager.getTransaction(definition); // (3)
                }
                count++;

                // omitted.

                repository.create(item);
                if (count % CHUNK_SIZE == 0) {
                    transactionManager.commit(status);  // (4)
                }
            }
        } catch (Exception e) {
            logger.error("Exception occurred while reading.", e);
            if (status != null) {
                transactionManager.rollback(status);    // (5)
            }
            throw e;
        } finally {
            if (status != null && !status.isCompleted()) {
                transactionManager.commit(status);   // (6)
            }
            itemReader.close();
        }

        return RepeatStatus.FINISHED;
    }
}
No Description

(1)

Set jobResourcelessTransactionManager which is already defined in transaction-manager attribute of <batch:tasklet> tag.

(2)

Inject the transaction manager.
In the @Named annotation, specify jobTransactionManager to identify the bean to use.

(3)

Start transaction at the beginning of chunk.

(4)

Commit the transaction at the end of the chunk.

(5)

When an exception occurs, roll back the transaction.

(6)

For the last chunk, commit the transaction.

Updating by ItemWriter

In the above example, Repository is used, but it is also possible to update data using ItemWriter. Using ItemWriter simplifies the implementation; in particular, use FlatFileItemWriter when updating files.

5.1.3.1.2. Note for non-transactional data sources

In the case of files, no transaction setting or operation is necessary.

When using FlatFileItemWriter, pseudo transaction control can be performed. This is realized by delaying the writing to the resource and actually writing it out at the commit timing. Normally, when the chunk size is reached, the chunk's data is output to the actual file, and if an exception occurs, the data of that chunk is not output.

FlatFileItemWriter can switch transaction control on and off with transactional property. The default is true and transaction control is enabled. If the transactional property is false, FlatFileItemWriter will output the data regardless of the transaction.

When adopting the single commit method, it is recommended to set the transactional property to false. As described above, data is written to the resource at the commit timing, so until then all the output data is held in memory. Therefore, when the amount of data is large, there is a high possibility that memory becomes insufficient and an error occurs.
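The buffering behavior described above can be illustrated with a small plain-Java sketch (this is a simulation for explanation, not the FlatFileItemWriter implementation): with transactional enabled, lines are held in memory and written to the resource only at commit, while a rollback simply discards the buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of transaction-synchronized, write-behind output.
public class TransactionalBufferSketch {

    private final List<String> buffer = new ArrayList<>();      // grows until commit
    private final StringBuilder resource = new StringBuilder(); // stands in for the file

    public void write(String line) {
        buffer.add(line);                 // not yet visible in the resource
    }

    public void commit() {
        for (String line : buffer) {      // the actual write happens here
            resource.append(line).append('\n');
        }
        buffer.clear();
    }

    public void rollback() {
        buffer.clear();                   // nothing was written, so nothing to undo
    }

    public String resourceContent() {
        return resource.toString();
    }

    public static void main(String[] args) {
        TransactionalBufferSketch writer = new TransactionalBufferSketch();
        writer.write("line1");
        writer.commit();
        writer.write("line2");
        writer.rollback();
        System.out.println(writer.resourceContent()); // only the committed line remains
    }
}
```

Under the single commit method, commit happens only once at the end, so everything written stays in the buffer until then; this is the memory risk noted above.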

On TransactionManager settings in jobs that only handle files

As in the following job definition, the transaction-manager attribute of batch:tasklet is mandatory in the xsd schema and cannot be omitted.

Excerpt of TransactionManager setting part
<batch:tasklet transaction-manager="jobTransactionManager">
    <batch:chunk reader="reader" writer="writer" commit-interval="100" />
</batch:tasklet>

Therefore, always specify jobTransactionManager. At this time, the behavior is as follows.

  • If transactional is true

    • Output to the resource is synchronized with the specified TransactionManager.

  • If transactional is false

    • The transaction processing of the specified TransactionManager idles, and output to the resource is performed regardless of the transaction.

At this time, transactions are issued to the resource (e.g. a database) referenced by jobTransactionManager, but since there is no table access, there is no actual harm.

If you do not want to issue even idle transactions to the referenced resource, or if actual harm occurs, use ResourcelessTransactionManager, which does not require resources. ResourcelessTransactionManager is defined as jobResourcelessTransactionManager in launch-context.xml.

Sample usage of ResourcelessTransactionManager
<batch:tasklet transaction-manager="jobResourcelessTransactionManager">
    <batch:chunk reader="reader" writer="writer" commit-interval="100" />
</batch:tasklet>
5.1.3.2. For multiple data sources

Transaction control of jobs input / output to multiple data sources will be described. Since consideration points are different between input and output, they will be explained separately.

5.1.3.2.1. Input from multiple data source

When retrieving data from multiple data sources, the data that is the axis of the process and its additional data should be retrieved separately. Hereinafter, the data that is the axis of processing is referred to as the process target record, and the additional data accompanying it is referred to as accompanying data.

Because of the structure of Spring Batch, ItemReader is premised on retrieving process target records from one resource. This is the same regardless of the type of resource.

  1. Retrieving the process target record

    • Get it by ItemReader.

  2. Retrieving accompanying data

    • For the accompanying data, it is necessary to select one of the following retrieving methods according to whether the data changes and the number of records. These are not exclusive options; they may be used in combination.

      • Batch retrieval before step execution

      • Retrieve each time according to the record to be processed

When retrieving all at once before step execution

Implement a Listener that does the following, and refer to the data from the subsequent step.

  • Retrieve data collectively

  • Store the information in the bean whose scope is Job or Step

    • ExecutionContext of Spring Batch can be used, but a different class can also be created to store the data, considering readability and maintainability. For simplicity, the sample is explained using ExecutionContext.

This method is adopted when reading data that does not depend on the data to be processed, such as master data. However, even for master data, if the number of items is large enough to impact memory, retrieving each time should be considered.

Implementation of Listener for collective retrieve
@Component
// (1)
public class BranchMasterReadStepListener extends StepExecutionListenerSupport {

    @Inject
    BranchRepository branchRepository;

    @Override
    public void beforeStep(StepExecution stepExecution) {   // (2)

        List<Branch> branches = branchRepository.findAll(); //(3)

        Map<String, Branch> map = branches.stream()
                .collect(Collectors.toMap(Branch::getBranchId,
                        UnaryOperator.identity()));  // (4)

        stepExecution.getExecutionContext().put("branches", map); // (5)
    }
}
Definition of Listener for collective retrieve
<batch:job id="outputAllCustomerList01" job-repository="jobRepository">
    <batch:step id="outputAllCustomerList01.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader"
                         processor="retrieveBranchFromContextItemProcessor"
                         writer="writer" commit-interval="10"/>
            <batch:listeners>
                <batch:listener ref="branchMasterReadStepListener"/> <!-- (6) -->
            </batch:listeners>
        </batch:tasklet>
    </batch:step>
</batch:job>
An example of referring to the collectively retrieved data in the ItemProcessor of the subsequent step
@Component
public class RetrieveBranchFromContextItemProcessor implements
        ItemProcessor<Customer, CustomerWithBranch> {

    private Map<String, Branch> branches;

    @BeforeStep       // (7)
    @SuppressWarnings("unchecked")
    public void beforeStep(StepExecution stepExecution) {
        branches = (Map<String, Branch>) stepExecution.getExecutionContext()
                .get("branches"); // (8)
    }

    @Override
    public CustomerWithBranch process(Customer item) throws Exception {
        CustomerWithBranch newItem = new CustomerWithBranch(item);
        newItem.setBranch(branches.get(item.getChargeBranchId()));    // (9)
        return newItem;
    }
}
No Description

(1)

Implement StepExecutionListener interface.
In order to simplify the implementation here, it is an extension from StepExecutionListenerSupport which implements the StepExecutionListener interface.

(2)

Implement the beforeStep method to get data before step execution.

(3)

Implement processing to retrieve master data.

(4)

Convert from List type to Map type so that it can be used easily in subsequent processing.

(5)

Set the acquired master data in the context of the step as branches.

(6)

Register the created Listener to the target job.

(7)

In order to acquire master data before step execution of ItemProcessor, set up Listener with @BeforeStep annotation.

(8)

In the method given the @BeforeStep annotation, obtain the master data set in (5) from the context of the step.

(9)

In the process method of ItemProcessor, data is retrieved from the master data.

Object to store in context

The object to be stored in the context(ExecutionContext) must be a class that implements java.io.Serializable. This is because ExecutionContext is stored in JobRepository.
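The Serializable requirement can be checked with a plain-Java round trip (illustrative only; the class and data names are hypothetical): if a value survives Java serialization, it can be persisted by JobRepository as part of the ExecutionContext.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Illustrative check: values placed in ExecutionContext must survive Java
// serialization. HashMap, String, and any java.io.Serializable class qualify.
public class SerializableCheck {

    public static Map<String, String> roundTrip(Map<String, String> in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(in); // throws NotSerializableException for unsuitable values
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                @SuppressWarnings("unchecked")
                Map<String, String> out = (Map<String, String>) ois.readObject();
                return out;
            }
        } catch (Exception e) {
            throw new IllegalStateException("value cannot be stored in ExecutionContext", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> branches = new HashMap<>();
        branches.put("0001", "Tokyo");
        System.out.println(roundTrip(branches));
    }
}
```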

Retrieving each time according to the record to be processed

Apart from the ItemProcessor for business processing, retrieval is performed by an ItemProcessor designated just for retrieving every time. This simplifies the processing of each ItemProcessor.

  1. Define ItemProcessor designated just for retrieving every time, and separate it from business process.

    • At this time, use MyBatis as it is when accessing the table.

  2. Concatenate multiple ItemProcessors using CompositeItemProcessor.

    • Note that ItemProcessor is processed in the order specified in the delegates attribute.

Sample implementation of ItemProcessor designated just for retrieving every time
@Component
public class RetrieveBranchFromRepositoryItemProcessor implements
        ItemProcessor<Customer, CustomerWithBranch> {

    @Inject
    BranchRepository branchRepository;  // (1)

    @Override
    public CustomerWithBranch process(Customer item) throws Exception {
        CustomerWithBranch newItem = new CustomerWithBranch(item);
        newItem.setBranch(branchRepository.findOne(
                item.getChargeBranchId())); // (2)
        return newItem; // (3)
    }
}
Definition sample of ItemProcessor designated just for retrieving every time and ItemProcessor for business process
<bean id="compositeItemProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <ref bean="retrieveBranchFromRepositoryItemProcessor"/> <!-- (4) -->
            <ref bean="businessLogicItemProcessor"/>  <!-- (5) -->
        </list>
    </property>
</bean>
No Description

(1)

Inject Repository for retrieving every time using MyBatis.

(2)

Accompanying data is retrieved from the Repository based on the input data (process target record).

(3)

Return data with processing target record and accompanying data together.
Notice that this data will be the input data to the next ItemProcessor.

(4)

Set ItemProcessor for retrieving every time.

(5)

Set ItemProcessor for business logic.

5.1.3.2.2. Output to multiple data sources(multiple steps)

Process multiple data sources throughout the job by dividing the steps for each data source and processing a single data source at each step.

  • Data processed at the first step is stored in a table, and at the second step it is output to a file.

  • Although each step is simple and easy to recover, the data ends up being processed in two passes.

    • If this causes the following harmful effects, consider processing multiple data sources in one step.

      • Processing time increases

      • Business logic becomes redundant

5.1.3.2.3. Output to multiple data sources(single step)

Generally, when transactions for multiple data sources are combined into one, a distributed transaction based on 2-phase commit is used. However, it is also known to have the following disadvantages.

  • Middleware must support a distributed transaction API such as XAResource, and special settings based on it are required

  • In standalone Java like a batch program, you need to add a JTA implementation library for distributed transactions

  • Recovery in case of failure is difficult

Although distributed transactions can also be used in Spring Batch, global transactions via JTA incur performance overhead due to the characteristics of the protocol. As a simpler way to process multiple data sources collectively, the Best Efforts 1PC pattern is recommended.

What is Best Efforts 1PC pattern

Briefly, it refers to the technique of handling multiple data sources as local transactions and issuing sequential commits at the same timing. The conceptual diagram is shown in the figure below.

Best Efforts 1PC Overview
Conceptual diagram of Best Efforts 1PC pattern
Description of figure
  1. The user instructs ChainedTransactionManager to start the transaction.

  2. ChainedTransactionManager starts a transaction sequentially with registered transaction managers.

  3. The user performs transactional operations on each resource.

  4. The user instructs ChainedTransactionManager to commit.

  5. ChainedTransactionManager issues sequential commits on registered transaction managers.

    • Commit (or roll back) in reverse order of transaction start

Since this method is not a distributed transaction, data consistency may not be maintained if a failure (exception) occurs at commit / rollback in the second or subsequent transaction managers. Therefore, it is necessary to design a recovery method for failures at the transaction boundary; in return, the recovery frequency can be reduced and the recovery procedure simplified.
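The ordering in the conceptual diagram can be sketched in plain Java (this models only the ordering, not the Spring Data ChainedTransactionManager implementation; the manager names are placeholders): transactions are begun in registration order and committed in reverse order of start.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of Best Efforts 1PC event ordering.
public class ChainedTxSketch {

    public static List<String> run(List<String> managers) {
        List<String> events = new ArrayList<>();
        for (String m : managers) {
            events.add("begin:" + m);    // sequential local transaction start
        }
        List<String> reversed = new ArrayList<>(managers);
        Collections.reverse(reversed);
        for (String m : reversed) {
            events.add("commit:" + m);   // commit in reverse order of start
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("transactionManager1", "transactionManager2")));
    }
}
```

Because the commits are sequential rather than atomic, a failure during any commit after the first leaves earlier resources committed and later ones rolled back, which is exactly the consistency gap discussed above.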

When processing multiple transactional resources at the same time

Use it in cases such as processing multiple databases simultaneously, or processing a database and MQ, and so on.

Process as 1 phase-commit by defining multiple transaction managers as one using ChainedTransactionManager as follows. Note that ChainedTransactionManager is a class provided by Spring Data.

pom.xml
<dependencies>
    <!-- omitted -->
    <!-- (1) -->
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-commons</artifactId>
    </dependency>
</dependencies>
Sample usage of chainedTransactionManager
<!-- Chained Transaction Manager -->
<!-- (2) -->
<bean id="chainedTransactionManager"
      class="org.springframework.data.transaction.ChainedTransactionManager">
    <constructor-arg>
        <!-- (3) -->
        <list>
            <ref bean="transactionManager1"/>
            <ref bean="transactionManager2"/>
        </list>
    </constructor-arg>
</bean>

<batch:job id="jobSalesPlan01" job-repository="jobRepository">
    <batch:step id="jobSalesPlan01.step01">
        <!-- (4) -->
        <batch:tasklet transaction-manager="chainedTransactionManager">
            <!-- omitted -->
        </batch:tasklet>
    </batch:step>
</batch:job>
No Description

(1)

Add a dependency to use ChainedTransactionManager.

(2)

Define the bean of ChainedTransactionManager.

(3)

Define the multiple transaction managers that you want to combine, as a list.

(4)

Specify the bean ID defined in (2) for the transaction manager used by the job.

When processing transactional and nontransactional resources simultaneously

This method is used when processing databases and files at the same time.

For the database, it is the same as For a single data source.

For files, setting FlatFileItemWriter’s transactional property to true provides the same effect as the "Best Efforts 1PC pattern" described above.
For details, refer to Note for non-transactional data sources.

This setting delays writing to the file until just before the database transaction is committed, so it is easy to synchronize the two data sources. However, even in this case, if an error occurs during file output after committing to the database, data consistency may not be maintained, so it is necessary to design a recovery method.

5.1.3.3. Notes on intermediate method commit

Although it is deprecated, when data to be processed is skipped in ItemWriter, the chunk size setting value is forcibly changed. Note that this has a very large impact on transactions. Refer to Skip for details.

5.2. Database Access

5.2.1. Overview

MyBatis3 (hereafter called [MyBatis]) is used for database access in TERASOLUNA Batch 5.x. Refer to the TERASOLUNA Server 5.x Development Guideline below for the basic usage of database access with MyBatis.

This chapter mainly explains how to use database access specifically in TERASOLUNA Batch 5.x.

Notes for how to use Oracle JDBC in Linux environment

When using Oracle JDBC in a Linux environment, contention occurs on the random number generator of the OS used by Oracle JDBC. As a result, even when jobs are attempted in parallel, they may be executed sequentially or a connection timeout may occur.
2 patterns for avoiding these events are shown below.

  • Set following in system properties while executing Java command.

    • -Djava.security.egd=file:///dev/urandom

  • Change securerandom.source=/dev/random in ${JAVA_HOME}/jre/lib/security/java.security to securerandom.source=/dev/urandom.

5.2.2. How to use

This section explains how to use database access in TERASOLUNA Batch 5.x.

Note that how to access the database differs between the chunk model and the tasklet model.

There are the following 2 ways to use database access in TERASOLUNA Batch 5.x.
Choose between them based on the component that accesses the database.

  1. Use ItemReader and ItemWriter for MyBatis.

    • For input/output by database access in the chunk model.

      • org.mybatis.spring.batch.MyBatisCursorItemReader

      • org.mybatis.spring.batch.MyBatisBatchItemWriter

  2. Use Mapper interface

    • For business logic processing in the chunk model.

      • With ItemProcessor implementation.

    • For database access in general in the tasklet model.

      • With Tasklet implementation.

5.2.2.1. Common Settings

This section explains the common settings required for database access.

5.2.2.1.1. DataSource Setting

TERASOLUNA Batch 5.x assumes two data sources. The 2 default data sources in launch-context.xml are shown below.

Data source list
Data source name Description

adminDataSource

Data source used by Spring Batch and TERASOLUNA Batch 5.x
It is used in JobRepository and Asynchronous execution(DB polling)

jobDataSource

Data source used by job

The connection information properties and launch-context.xml are shown below.

Adjust these settings according to the user's environment.

resources\META-INF\spring\launch-context.xml
<!-- (1) -->
<bean id="adminDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
      destroy-method="close"
      p:driverClassName="${admin.h2.jdbc.driver}"
      p:url="${admin.h2.jdbc.url}"
      p:username="${admin.h2.jdbc.username}"
      p:password="${admin.h2.jdbc.password}"
      p:maxTotal="10"
      p:minIdle="1"
      p:maxWaitMillis="5000"
      p:defaultAutoCommit="false"/>

<!-- (2) -->
<bean id="jobDataSource" class="org.apache.commons.dbcp2.BasicDataSource" 
      destroy-method="close"
      p:driverClassName="${jdbc.driver}"
      p:url="${jdbc.url}"
      p:username="${jdbc.username}"
      p:password="${jdbc.password}"
      p:maxTotal="10"
      p:minIdle="1"
      p:maxWaitMillis="5000"
      p:defaultAutoCommit="false" />
batch-application.properties
# (3)
# Admin DataSource settings.
admin.h2.jdbc.driver=org.h2.Driver
admin.h2.jdbc.url=jdbc:h2:mem:batch;DB_CLOSE_DELAY=-1
admin.h2.jdbc.username=sa
admin.h2.jdbc.password=

# (4)
# Job DataSource settings.
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost:5432/postgres
jdbc.username=postgres
jdbc.password=postgres
Description
Sr. No. Description

(1)

adminDataSource definition. Connection information of (3) is set.

(2)

jobDataSource definition. Connection information of (4) is set.

(3)

Connection information to database used by adminDataSource
H2 is used in this example.

(4)

Connection information to database used by jobDataSource
PostgreSQL is used in this example.

5.2.2.1.2. MyBatis Setting

Important points for setting MyBatis on TERASOLUNA Batch 5.x are shown below.

One of the important points in implementing batch processing is "to efficiently process large amounts of data with certain resources".
The related settings are explained below.

  • fetchSize

    • In general batch processing, it is mandatory to specify an appropriate fetchSize for the JDBC driver to reduce the communication cost of processing large amounts of data. fetchSize is a parameter that sets the number of rows to be acquired in one communication between the JDBC driver and the database. It is desirable to set this value as large as possible; however, if it is too large, it puts pressure on memory, so be careful. The user has to tune this parameter.

    • In MyBatis, the user can set defaultFetchSize as a common setting for all queries, and can override it with the fetchSize setting of each query.

  • executorType

    • In general batch processing, the same SQL is executed within the same transaction as many times as (total data count / fetchSize). At this time, it is more efficient to reuse a statement instead of creating it each time.

    • In the MyBatis settings, statements can be reused by setting defaultExecutorType to REUSE, which contributes to improved processing throughput.

    • When updating a large amount of data at once, performance improvement can be expected by using JDBC batch updates.
      Therefore, the SqlSessionTemplate used by MyBatisBatchItemWriter
      has its executorType set to BATCH (not REUSE).
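As a supplement, the per-query fetchSize override mentioned above is written directly in Mapper XML. A minimal sketch is shown below, assuming a query known to return a very large result; the namespace, SQL id, and table are hypothetical and not part of this guideline's sample application.

```xml
<!-- Sketch: per-query override of defaultFetchSize.
     The namespace, select id, and table are hypothetical. -->
<mapper namespace="org.example.repository.SampleRepository">
    <!-- fetchSize on this select element overrides defaultFetchSize
         for this query only. -->
    <select id="findAllLargeResult" fetchSize="5000"
            resultType="org.example.model.Sample">
        SELECT id, name FROM sample ORDER BY id
    </select>
</mapper>
```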

In TERASOLUNA Batch 5.x, two different ExecutorTypes exist at the same time. It is assumed that a job is often implemented with a single ExecutorType, but special attention is required when using both together. The details are explained in Database Access other than ItemReader・ItemWriter.

Other parameter of MyBatis

For other parameters, refer to the following link and make settings that match the application characteristics.
http://www.mybatis.org/mybatis-3/configuration.html

The default settings are shown below.

META-INF/spring/launch-context.xml
<bean id="jobSqlSessionFactory"
      class="org.mybatis.spring.SqlSessionFactoryBean"
      p:dataSource-ref="jobDataSource">
    <!-- (1) -->
    <property name="configuration">
        <bean class="org.apache.ibatis.session.Configuration"
              p:localCacheScope="STATEMENT"
              p:lazyLoadingEnabled="true"
              p:aggressiveLazyLoading="false"
              p:defaultFetchSize="1000"
              p:defaultExecutorType="REUSE"/>
    </property>
</bean>

<!-- (2) -->
<bean id="batchModeSqlSessionTemplate"
      class="org.mybatis.spring.SqlSessionTemplate"
      c:sqlSessionFactory-ref="jobSqlSessionFactory"
      c:executorType="BATCH"/>
Description
Sr. No. Description

(1)

Various MyBatis settings.
fetchSize is set to 1000 by default.

(2)

Define a SqlSessionTemplate whose executorType is BATCH, for use by MyBatisBatchItemWriter.

Definition of a SqlSessionFactory using adminDataSource

For synchronous execution, a SqlSessionFactory using adminDataSource is unnecessary and is not defined. For asynchronous execution (DB polling), it is defined in META-INF/spring/async-batch-daemon.xml in order to access the job-request table.

META-INF/spring/async-batch-daemon.xml
<bean id="adminSqlSessionFactory"
      class="org.mybatis.spring.SqlSessionFactoryBean"
      p:dataSource-ref="adminDataSource" >
    <property name="configuration">
        <bean class="org.apache.ibatis.session.Configuration"
              p:localCacheScope="STATEMENT"
              p:lazyLoadingEnabled="true"
              p:aggressiveLazyLoading="false"
              p:defaultFetchSize="1000"
              p:defaultExecutorType="REUSE"/>
    </property>
</bean>
5.2.2.1.3. Mapper XML definition

Please refer to Implementation of database access process in the TERASOLUNA Server 5.x Development Guideline, because there is no TERASOLUNA Batch 5.x-specific description.

5.2.2.1.4. MyBatis-Spring setting

When using the ItemReader and ItemWriter provided by MyBatis-Spring, it is necessary to register the Mapper XML files to be used.

There are the following two setting methods.

  1. Register Mapper XML to be used for all jobs as a common setting.

    • All Mapper XML files have to be described in META-INF/spring/launch-context.xml.

  2. Register Mapper XML to be used for each job as individual setting.

    • The Mapper XML files required by each job have to be described in the bean definition under META-INF/jobs/.

With the common setting, the following adverse effects arise in synchronous execution, because not only the Mapper XML of the executed job but also the Mapper XML files used by other jobs are read.

  • It takes time to start the job

  • Consumption of memory resources increases

To avoid this, TERASOLUNA Batch 5.x adopts the individual setting method, in which only the Mapper XML that the job requires is specified in each job definition.

For the basic setting method, please refer to MyBatis-Spring settings in TERASOLUNA Server 5.x Development Guideline.

In TERASOLUNA Batch 5.x, since multiple SqlSessionFactory and SqlSessionTemplate beans are defined, it is necessary to explicitly specify which one to use.
Basically, specify jobSqlSessionFactory.

A setting example is shown below.

META-INF/jobs/common/jobCustomerList01.xml
<!-- (1) -->
<mybatis:scan
    base-package="org.terasoluna.batch.functionaltest.app.repository.mst"
    factory-ref="jobSqlSessionFactory"/>
Description
Sr. No. Description

(1)

Set jobSqlSessionFactory in the factory-ref attribute of <mybatis:scan>.

5.2.2.2. Database access with ItemReader

This section explains database access with ItemReader.

5.2.2.2.1. ItemReader of MyBatis

MyBatis-Spring provides the following two ItemReader implementations.

  • org.mybatis.spring.batch.MyBatisCursorItemReader

  • org.mybatis.spring.batch.MyBatisPagingItemReader

MyBatisPagingItemReader is an ItemReader that uses the mechanism described in Pagination search for Entity (SQL refinement method) of the TERASOLUNA Server 5.x Development Guideline.
Since SQL is issued again after a certain number of rows has been acquired, data consistency may not be maintained. It is therefore dangerous to use in batch processing, and TERASOLUNA Batch 5.x does not use it in principle.
TERASOLUNA Batch 5.x uses only MyBatisCursorItemReader.

In TERASOLUNA Batch 5.x, as explained in MyBatis-Spring setting, a method of dynamically registering Mapper XML with <mybatis:scan> is adopted. Therefore, it is necessary to prepare an interface corresponding to each Mapper XML. For details, please refer to Implementation of database access process in the TERASOLUNA Server 5.x Development Guideline.

A usage example of MyBatisCursorItemReader is shown below.

META-INF/jobs/common/jobCustomerList01.xml
<!-- (1) -->
<mybatis:scan
    base-package="org.terasoluna.batch.functionaltest.app.repository.mst"
    factory-ref="jobSqlSessionFactory"/>

<!-- (2) (3) (4) -->
<bean id="reader"
      class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
      p:queryId="org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository.findAll"
      p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
org/terasoluna/batch/functionaltest/app/repository/mst/CustomerRepository.xml
<!-- (5) -->
<mapper namespace="org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository">

    <!-- (6) -->
    <select id="findAll"
            resultType="org.terasoluna.batch.functionaltest.app.model.mst.Customer">
        <![CDATA[
        SELECT
            customer_id AS customerId,
            customer_name AS customerName,
            customer_address AS customerAddress,
            customer_tel AS customerTel,
            charge_branch_id AS chargeBranchId,
            create_date AS createDate,
            update_date AS updateDate
        FROM
            customer_mst
        ORDER by
            charge_branch_id ASC, customer_id ASC
        ]]>
    </select>

    <!-- omitted -->
</mapper>
org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository
public interface CustomerRepository {
    // (7)
    List<Customer> findAll();

    // omitted.
}
Description
Sr. No. Description

(1)

Register Mapper XML.

(2)

Define MyBatisCursorItemReader.

(3)

Specify the SQL ID defined in (6) in the queryId property, as the namespace of (5) + <method name>.

(4)

Specify SqlSessionFactory of the database to be accessed in sqlSessionFactory-ref property.

(5)

Define Mapper XML. Match the value of namespace with the FQCN of the interface.

(6)

Define SQL.

(7)

Define the method corresponding to the SQL ID defined in (6) for the interface.

5.2.2.3. Database Access with ItemWriter

This section explains database access with ItemWriter.

5.2.2.3.1. ItemWriter of MyBatis

MyBatis-Spring provides only the following ItemWriter.

  • org.mybatis.spring.batch.MyBatisBatchItemWriter

The basic settings are the same as for ItemReader of MyBatis. For MyBatisBatchItemWriter, the batchModeSqlSessionTemplate described in MyBatis Setting must be specified.

An example definition of MyBatisBatchItemWriter is shown below.

META-INF/jobs/common/jobSalesPlan01.xml
<!-- (1) -->
<mybatis:scan
    base-package="org.terasoluna.batch.functionaltest.app.repository.plan"
    factory-ref="jobSqlSessionFactory"/>

<!-- (2) (3) (4) -->
<bean id="detailWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="org.terasoluna.batch.functionaltest.app.repository.plan.SalesPlanDetailRepository.create"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>

<!-- omitted -->
org/terasoluna/batch/functionaltest/app/repository/plan/SalesPlanDetailRepository.xml
<!-- (5) -->
<mapper namespace="org.terasoluna.batch.functionaltest.app.repository.plan.SalesPlanDetailRepository">

    <!-- (6) -->
    <insert id="create"
            parameterType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail">
        <![CDATA[
        INSERT INTO
            sales_plan_detail(branch_id, year, month, customer_id, amount)
        VALUES (
            #{branchId}, #{year}, #{month}, #{customerId}, #{amount}
        )
        ]]>
    </insert>

    <!-- omitted -->
</mapper>
org.terasoluna.batch.functionaltest.app.repository.plan.SalesPlanDetailRepository
public interface SalesPlanDetailRepository {

    // (7)
    void create(SalesPlanDetail salesPlanDetail);

    // omitted.
}
Description
Sr. No. Description

(1)

Register Mapper XML.

(2)

Define MyBatisBatchItemWriter.

(3)

Specify the SQL ID defined in (6) in the statementId property, as the namespace of (5) + <method name>.

(4)

Specify the SqlSessionTemplate of the database to be accessed in the sqlSessionTemplate-ref property.
The specified SqlSessionTemplate must have its executorType set to BATCH.

(5)

Define Mapper XML. Match the value of namespace with the FQCN of the interface.

(6)

Define SQL.

(7)

Define the method corresponding to the SQL ID defined in (6) for the interface.

5.2.2.4. Database Access other than ItemReader・ItemWriter

This section explains database access other than through ItemReader・ItemWriter.

To access the database other than through ItemReader・ItemWriter, use the Mapper interface. When using the Mapper interface, TERASOLUNA Batch 5.x has the following restrictions.

Available points of the Mapper interface
Process ItemProcessor Tasklet Listener

Reference

Available

Available

Available

Update

Conditionally available

Available

Unavailable

Restrictions in ItemProcessor

In MyBatis, there is a restriction that two or more ExecutorTypes must not be used within the same transaction.
If "MyBatisBatchItemWriter is used as the ItemWriter" and "the Mapper interface is used in the ItemProcessor for reference and update" at the same time, this restriction is violated.
To avoid the restriction, the ItemProcessor accesses the database through a Mapper interface whose ExecutorType is BATCH.
In addition, MyBatisBatchItemWriter performs a status check after executing SQL to verify that the SQL was issued by itself; naturally it cannot manage SQL executed by the ItemProcessor, so an error occurs.
Therefore, when MyBatisBatchItemWriter is used, the Mapper interface can only be used for reference, not for update.

It is possible to disable this error check of MyBatisBatchItemWriter, but doing so is prohibited because unexpected behavior may occur.

Restrictions in Tasklet

In a Tasklet, it is basic to use the Mapper interface, so there is no impact as there is with ItemProcessor.
It is possible to inject and use MyBatisBatchItemWriter, but in that case the Mapper interface itself can process with the BATCH setting. In other words, there is basically no need to inject MyBatisBatchItemWriter.

Restrictions in Listener

The same restriction as for ItemProcessor applies to listeners as well. In addition, it is hard to think of use cases in which a listener requires updates. Therefore, update processing is prohibited in listeners.

Alternatives to the update processing assumed in listeners
Job state management

This is done by the JobRepository of Spring Batch.

Log output to a database

This should be done by a log appender. It needs to be managed in a transaction separate from the job's transaction.

5.2.2.4.1. Database access with ItemProcessor

An example of database access with ItemProcessor is shown below.

Implementation example with ItemProcessor
@Component
public class UpdateItemFromDBProcessor implements
        ItemProcessor<SalesPerformanceDetail, SalesPlanDetail> {

    // (1)
    @Inject
    CustomerRepository customerRepository;

    @Override
    public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {

        // (2)
        Customer customer = customerRepository.findOne(readItem.getCustomerId());

        // (3)
        SalesPlanDetail writeItem = new SalesPlanDetail();
        writeItem.setBranchId(customer.getChargeBranchId());
        writeItem.setYear(readItem.getYear());
        writeItem.setMonth(readItem.getMonth());
        writeItem.setCustomerId(readItem.getCustomerId());
        writeItem.setAmount(readItem.getAmount());
        return writeItem;
    }
}
Bean definition
<!-- (2) -->
<mybatis:scan
        base-package="org.terasoluna.batch.functionaltest.app.repository"
        template-ref="batchModeSqlSessionTemplate"/>

<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
      p:queryId="org.terasoluna.batch.functionaltest.app.repository.performance.SalesPerformanceDetailRepository.findAll"
      p:sqlSessionFactory-ref="jobSqlSessionFactory"/>

<!-- (3) -->
<bean id="writer" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="org.terasoluna.batch.functionaltest.app.repository.plan.SalesPlanDetailRepository.create"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>

<batch:job id="DBAccessByItemProcessor" job-repository="jobRepository">
    <batch:step id="DBAccessByItemProcessor.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <!-- (4) -->
            <batch:chunk reader="reader"
                         processor="updateItemFromDBProcessor"
                         writer="writer" commit-interval="10"/>
        </batch:tasklet>
    </batch:step>
</batch:job>

Mapper interface and Mapper XML are omitted.

Description

Sr. No.

Description

(1)

Inject Mapper interface.

(2)

Register Mapper XML.
By specifying batchModeSqlSessionTemplate (set to BATCH) in the template-ref attribute, database access from the ItemProcessor is done in BATCH mode. If factory-ref="jobSqlSessionFactory" is set instead, it conflicts with the above restriction and an exception is thrown when MyBatisBatchItemWriter is executed.

(3)

Define MyBatisBatchItemWriter.
Specify batchModeSqlSessionTemplate (set to BATCH) in the sqlSessionTemplate-ref property.

(4)

Set the ItemProcessor into which the Mapper interface is injected.

Supplement on the MyBatisCursorItemReader setting

Different ExecutorTypes can be used for MyBatisCursorItemReader and MyBatisBatchItemWriter, as in the definition example below. This is because MyBatisCursorItemReader opens its resource before the transaction starts.

<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
      p:queryId="xxx"
      p:sqlSessionFactory-ref="jobSqlSessionFactory"/>

<bean id="writer" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="yyy"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>
5.2.2.4.2. Database Access with Tasklet

An example of database access in a Tasklet is shown below.

Implementation example with Tasklet
@Component
public class OptimisticLockTasklet implements Tasklet {

    // (1)
    @Inject
    ExclusiveControlRepository repository;

    // omitted.

    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {

        Branch branch = repository.branchFindOne(branchId); // (2)
        ExclusiveBranch exclusiveBranch = new ExclusiveBranch();

        exclusiveBranch.setBranchId(branch.getBranchId());
        exclusiveBranch.setBranchName(branch.getBranchName() + " - " + identifier);
        exclusiveBranch.setBranchAddress(branch.getBranchAddress() + " - " + identifier);
        exclusiveBranch.setBranchTel(branch.getBranchTel());
        exclusiveBranch.setCreateDate(branch.getUpdateDate());
        exclusiveBranch.setUpdateDate(new Timestamp(System.currentTimeMillis()));
        exclusiveBranch.setOldBranchName(branch.getBranchName());

        int result = repository.branchExclusiveUpdate(exclusiveBranch); // (3)

        return RepeatStatus.FINISHED;
    }
}
Bean definition
<!-- (4) -->
<mybatis:scan
        base-package="org.terasoluna.batch.functionaltest.ch05.exclusivecontrol.repository"
        factory-ref="jobSqlSessionFactory"/>

<batch:job id="taskletOptimisticLockCheckJob" job-repository="jobRepository">
    <batch:step id="taskletOptimisticLockCheckJob.step01">
        <batch:tasklet transaction-manager="jobTransactionManager"
                       ref="optimisticLockTasklet"> <!-- (5) -->
        </batch:tasklet>
    </batch:step>
</batch:job>

Mapper interface and Mapper XML are omitted.

Description

Sr. No.

Description

(1)

Inject Mapper interface.

(2)

Execute the search process with the Mapper interface.

(3)

Execute the update process with the Mapper interface.

(4)

Register Mapper XML.
Specify jobSqlSessionFactory (set to REUSE) in the factory-ref attribute.

(5)

Inject Mapper interface and set Tasklet.

Use batchModeSqlSessionTemplate

If there are many update processes in the tasklet model, specify batchModeSqlSessionTemplate in the template-ref attribute. As a result, batch update processing is performed, so performance improvement can be expected. However, be aware that executing batch updates requires an explicit flush. For details, please refer to Precautions when using batch mode Repository.
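As a hedged sketch of the explicit flush mentioned above (the repository and the item-creation method are hypothetical; flushStatements() is the standard MyBatis SqlSession API for sending buffered batch statements to the database):

```java
// Sketch only: assumes the Mapper interface was registered with
// batchModeSqlSessionTemplate via the template-ref attribute of mybatis:scan.
// SalesPlanDetailRepository and createItems() are hypothetical.
@Component
public class BatchUpdateTasklet implements Tasklet {

    @Inject
    SalesPlanDetailRepository repository;

    // The same SqlSessionTemplate (executorType=BATCH) used by mybatis:scan
    @Inject
    SqlSessionTemplate batchModeSqlSessionTemplate;

    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {

        for (SalesPlanDetail item : createItems()) {
            repository.create(item); // buffered by the BATCH executor, not yet sent
        }

        // In BATCH mode the buffered statements must be flushed explicitly;
        // SqlSession#flushStatements() sends them to the database.
        batchModeSqlSessionTemplate.flushStatements();

        return RepeatStatus.FINISHED;
    }

    // omitted: createItems()
}
```

Note that the results of the buffered updates only become available after the flush.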

5.2.2.4.3. Database Access with Listener

Database access from a listener is often linked with other components. Depending on the listener used and the implementation method, it is necessary to prepare an additional mechanism to hand data over to other components.

An example is shown below in which StepExecutionListener acquires data before step execution and ItemProcessor uses the acquired data.

Implementation example with Listener
public class CacheSetListener extends StepExecutionListenerSupport {

    // (1)
    @Inject
    CustomerRepository customerRepository;

    // (2)
    @Inject
    CustomerCache cache;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // (3)
        for(Customer customer : customerRepository.findAll()) {
            cache.addCustomer(customer.getCustomerId(), customer);
        }
    }
}
Application example with ItemProcessor
@Component
public class UpdateItemFromCacheProcessor implements
        ItemProcessor<SalesPerformanceDetail, SalesPlanDetail> {

    // (4)
    @Inject
    CustomerCache cache;

    @Override
    public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {
        Customer customer = cache.getCustomer(readItem.getCustomerId());  // (5)

        SalesPlanDetail writeItem = new SalesPlanDetail();

        // omitted.
        writeItem.setCustomerName(customer.getCustomerName()); // (6)

        return writeItem;
    }
}
Cache class
// (7)
@Component
public class CustomerCache {

    Map<String, Customer> customerMap = new HashMap<>();

    public Customer getCustomer(String customerId) {
        return customerMap.get(customerId);
    }

    public void addCustomer(String id, Customer customer) {
        customerMap.put(id, customer);
    }
}
Bean definition
<!-- omitted -->

<!-- (8) -->
<mybatis:scan
        base-package="org.terasoluna.batch.functionaltest.app.repository"
        template-ref="batchModeSqlSessionTemplate"/>
<!-- (9) -->
<bean id="cacheSetListener"
      class="org.terasoluna.batch.functionaltest.ch05.dbaccess.CacheSetListener"/>

<!-- omitted -->

<batch:job id="DBAccessByItemListener" job-repository="jobRepository">
    <batch:step id="DBAccessByItemListener.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader"
                         processor="updateItemFromCacheProcessor"
                         writer="writer" commit-interval="10"/> <!-- (10) -->
            <!-- (11) -->
            <batch:listeners>
                <batch:listener ref="cacheSetListener"/>
            </batch:listeners>
        </batch:tasklet>
    </batch:step>
</batch:job>
Description

Sr. No.

Description

(1)

Inject Mapper interface.

(2)

Inject a bean for caching data acquired from the Mapper interface.

(3)

Get data from the Mapper interface and cache it at the listener.
In this example, I/O is reduced and processing efficiency is improved by creating a cache before step execution in StepExecutionListener#beforeStep and referring to the cache in subsequent processing.

(4)

Inject the same bean as the cache set in (2).

(5)

Get corresponding data from the cache.

(6)

Reflect the data from the cache in the update data.

(7)

Implement the cache class as a component.
The bean scope here is singleton. Set the scope according to the job.

(8)

Register Mapper XML.
Specify batchModeSqlSessionTemplate (set to BATCH) in the template-ref attribute.

(9)

Define the listener that uses the Mapper interface.

(10)

Specify ItemProcessor that uses cache.

(11)

Register the listener defined in (9).

Using SqlSessionFactory with the Listener

In the above example, batchModeSqlSessionTemplate is set, but jobSqlSessionFactory can also be set.

Listeners that run outside the scope of a chunk are processed outside the transaction, so setting jobSqlSessionFactory causes no problem.

5.2.3. How To Extend

5.2.3.1. Updating multiple tables with CompositeItemWriter

In the chunk model, when multiple tables are to be updated for one input record, this can be achieved by using the CompositeItemWriter provided by Spring Batch and chaining a MyBatisBatchItemWriter for each table.

An implementation example in which the two tables of sales plan and actual sales are updated is shown here.

How to implement ItemProcessor
@Component
public class SalesItemProcessor implements ItemProcessor<SalesPlanDetail, SalesDTO> {
    @Override
    public SalesDTO process(SalesPlanDetail item) throws Exception { // (1)

        SalesDTO salesDTO = new SalesDTO();

        // (2)
        SalesPerformanceDetail spd = new SalesPerformanceDetail();
        spd.setBranchId(item.getBranchId());
        spd.setYear(item.getYear());
        spd.setMonth(item.getMonth());
        spd.setCustomerId(item.getCustomerId());
        spd.setAmount(new BigDecimal(0L));
        salesDTO.setSalesPerformanceDetail(spd);

        // (3)
        item.setAmount(item.getAmount().add(new BigDecimal(1L)));
        salesDTO.setSalesPlanDetail(item);

        return salesDTO;
    }
}
Implementation of DTO
public class SalesDTO implements Serializable {

    // (4)
    private SalesPlanDetail salesPlanDetail;

    // (5)
    private SalesPerformanceDetail salesPerformanceDetail;

    // omitted
}
How to implement MapperXML
<mapper namespace="org.terasoluna.batch.functionaltest.ch05.dbaccess.repository.SalesRepository">

    <select id="findAll" resultType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail">
        <![CDATA[
        SELECT
            branch_id AS branchId, year, month, customer_id AS customerId, amount
        FROM
            sales_plan_detail
        ORDER BY
            branch_id ASC, year ASC, month ASC, customer_id ASC
        ]]>
    </select>

    <!-- (6) -->
    <update id="update" parameterType="org.terasoluna.batch.functionaltest.ch05.dbaccess.SalesDTO">
        <![CDATA[
        UPDATE
            sales_plan_detail
        SET
            amount = #{salesPlanDetail.amount}
        WHERE
            branch_id = #{salesPlanDetail.branchId}
        AND
            year = #{salesPlanDetail.year}
        AND
            month = #{salesPlanDetail.month}
        AND
            customer_id = #{salesPlanDetail.customerId}
        ]]>
    </update>

    <!-- (7) -->
    <insert id="create" parameterType="org.terasoluna.batch.functionaltest.ch05.dbaccess.SalesDTO">
        <![CDATA[
        INSERT INTO
            sales_performance_detail(
                branch_id,
                year,
                month,
                customer_id,
                amount
            )
        VALUES (
            #{salesPerformanceDetail.branchId},
            #{salesPerformanceDetail.year},
            #{salesPerformanceDetail.month},
            #{salesPerformanceDetail.customerId},
            #{salesPerformanceDetail.amount}
        )
        ]]>
    </insert>

</mapper>
How to apply CompositeItemWriter
<!-- reader using MyBatisCursorItemReader -->
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
      p:queryId="org.terasoluna.batch.functionaltest.ch05.dbaccess.repository.SalesRepository.findAll"
      p:sqlSessionFactory-ref="jobSqlSessionFactory"/>

<!-- writer MyBatisBatchItemWriter -->
<!-- (8) -->
<bean id="planWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="org.terasoluna.batch.functionaltest.ch05.dbaccess.repository.SalesRepository.update"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>

<!-- (9) -->
<bean id="performanceWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="org.terasoluna.batch.functionaltest.ch05.dbaccess.repository.SalesRepository.create"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>

<!-- (10) -->
<bean id="writer" class="org.springframework.batch.item.support.CompositeItemWriter">
    <property name="delegates">
      <!-- (11)-->
        <list>
            <ref bean="performanceWriter"/>
            <ref bean="planWriter"/>
        </list>
    </property>
</bean>

<!-- (12) -->
<batch:job id="useCompositeItemWriter" job-repository="jobRepository">
    <batch:step id="useCompositeItemWriter.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader"
                         processor="salesItemProcessor"
                         writer="writer" commit-interval="3"/>
        </batch:tasklet>
    </batch:step>
</batch:job>
Description

Sr. No.

Description

(1)

Implement an ItemProcessor that outputs a DTO retaining each entity, in order to update both tables for the input data.
Since different objects cannot be passed to ItemWriter for updating two tables, a DTO that consolidates the objects necessary for the update is defined.

(2)

Create an entity for creating a new actual sales record (SalesPerformanceDetail) and store in DTO.

(3)

Update input data for updating sales plan which is also input data (SalesPlanDetail) and store it in DTO.

(4)

Define DTO(SalesPlanDetail) so as to retain a sales plan.

(5)

Define DTO(SalesPerformanceDetail) so as to retain actual sales record.

(6)

Define the SQL to update the sales plan table (sales_plan_detail) with the sales plan (SalesPlanDetail) fetched from the DTO.

(7)

Define the SQL to insert a new record into the actual sales table (sales_performance_detail) with the actual sales (SalesPerformanceDetail) fetched from the DTO.

(8)

Define MyBatisBatchItemWriter which updates sales plan table (sales_plan_detail).

(9)

Define MyBatisBatchItemWriter which creates a new actual sales table (sales_performance_detail).

(10)

Define CompositeItemWriter in order to execute (8) and (9) sequentially.

(11)

Set (8) and (9) in the <list> tag. The ItemWriters are executed in the specified order.

(12)

Specify the bean defined in (10) in the writer attribute of the chunk. Specify the ItemProcessor of (1) in the processor attribute.

By using this together with org.springframework.data.transaction.ChainedTransactionManager, which is explained in Output to multiple data sources (1 step), multiple data sources can also be updated.

Further, since CompositeItemWriter can chain any ItemWriter implementation, database output and file output can be performed together by setting MyBatisBatchItemWriter and FlatFileItemWriter.
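As a hedged sketch of that combination (bean ids, the statementId, and the output file path are hypothetical, not part of this guideline's sample application), the delegates of CompositeItemWriter can mix a database writer and a file writer:

```xml
<!-- Sketch: combining database output and file output with CompositeItemWriter.
     The statementId and output path are hypothetical. -->
<bean id="dbWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
      p:statementId="org.example.repository.SampleRepository.create"
      p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>

<bean id="fileWriter" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemWriter"
      p:resource="file:#{jobParameters['outputFile']}">
    <property name="lineAggregator">
        <bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator"/>
    </property>
</bean>

<bean id="writer" class="org.springframework.batch.item.support.CompositeItemWriter">
    <property name="delegates">
        <list>
            <ref bean="dbWriter"/>
            <ref bean="fileWriter"/>
        </list>
    </property>
</bean>
```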

5.3. File Access

5.3.1. Overview

This chapter describes how to input and output files.

The usage of this function is the same in both the chunk model and the tasklet model.

5.3.1.1. Type of File which can be handled
Type of File which can be handled

The types of files that can be handled with TERASOLUNA Batch 5.x are described below.
They are the same as the types Spring Batch can handle.

  • Flat File

  • XML

This section first explains how to handle flat files, and then explains XML in How To Extend.

First, the types of flat file which can be used with TERASOLUNA Batch 5.x are shown.
Each row inside a flat file is called a record, and the type of file is determined by the record format.

Record Format
Format Overview

Variable-length Record

A record format in which each item is separated by a delimiter, such as CSV and TSV. The length of each item can be variable.

Fixed-length Record

A record format in which each item is separated by its length (bytes). The length of each item is fixed.

Single String Record

One record is handled as one string item.

File Structure which can be handled

The basic structure of a flat file is made up of the following two elements.

  • Record Division

  • Record Format

Elements to construct format of Flat File
Element Overview

Record Division

A division indicates the type of record, such as Header Record, Data Record, and Trailer Record.
Details are described later.

Record Format

The format holds information about the records, such as how many rows there are for header, data, and trailer, how many times each record repeats, and so on.
There are also Single Format and Multi Format. Details are described later.

TERASOLUNA Batch 5.x can handle flat files in Single Format or Multi Format which include each record division.

The record divisions and record formats are explained here.

An overview of each record division is given below.

Characteristic of each Record Division
Record Division Overview

Header Record

A record placed at the beginning of the file (data part).
It holds items such as field names, matters common to the file, and a summary of the data part.

Data Record

A record holding the data that is the main processing target of the file.

Trailer/Footer Record

A record placed at the end of the file (data part).
It holds items such as matters common to the file and a summary of the data part.
In a Single Format file, it is called a Footer Record.

Footer/End Record

A record placed at the end of the file when the file is in Multi Format.
It holds items such as matters common to the file and a summary of the data parts.

About the field that indicates the record division

A flat file having a header record or a trailer record may have a field indicating a record division.
In TERASOLUNA Batch 5.x, the record division field is utilized especially in the processing of multi-format files, for example when different processing is performed for each record division.
Refer to Multi format for the implementation that selects the processing to be executed by record division.

About the name of file format

Depending on the definition of the file format in each system, names may differ from this guideline, such as calling a Footer Record an End Record.
Read them as appropriate.

A summary of Single Format and Multi Format is shown below.

Overview of Single Format and Multi Format
Format Overview

Single Format

A format with Header N Rows + Data N Rows + Trailer N Rows.

Multi Format

A format with (Header N Rows + Data N Rows + Trailer N Rows) * N + Footer N Rows.
A format in which a Footer Record is added after repeating the Single Format structure multiple times.

The Multi Format record structure is shown in the figure as follows.

Multi format file layout
Multi Format Record Structure Diagram

An example of a Single Format and Multi Format flat file is shown below.
// is used as a comment character in the file examples for explanatory purposes.

Example of Single Format, flat file(CSV format) without record division
branchId,year,month,customerId,amount  // (1)
000001,2016,1,0000000001,100000000  // (2)
000001,2016,1,0000000002,200000000  // (2)
000001,2016,1,0000000003,300000000  // (2)
000001,3,600000000  // (3)
Item list of file contents
No Descriptions

(1)

A header record
Field name of the data part is described.

(2)

A data record.

(3)

A trailer record.
It holds summary information of the data part.

Example of Multi Format, flat file(CSV format) with record division
// (1)
H,branchId,year,month,customerId,amount  // (2)
D,000001,2016,1,0000000001,100000000
D,000001,2016,1,0000000002,200000000
D,000001,2016,1,0000000003,300000000
T,000001,3,600000000
H,branchId,year,month,customerId,amount  // (2)
D,00002,2016,1,0000000004,400000000
D,00002,2016,1,0000000005,500000000
D,00002,2016,1,0000000006,600000000
T,00002,3,1500000000
H,branchId,year,month,customerId,amount  // (2)
D,00003,2016,1,0000000007,700000000
D,00003,2016,1,0000000008,800000000
D,00003,2016,1,0000000009,900000000
T,00003,3,2400000000
F,3,9,4500000000  // (3)
Item list of file contents
No Descriptions

(1)

It has a field indicating the record division at the beginning of the record.
Each record division is defined as below.
H:Header Record
D:Data Record
T:Trailer Record
F:Footer Record

(2)

Every time branchId changes, the header, data, and trailer records repeat.

(3)

A footer record.
It holds summary information for the whole file.
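In a Multi Format file like the one above, the record-division field in each line is what drives per-record processing. As a minimal plain-Java illustration (a hypothetical helper, not the Spring Batch implementation; see How To Extend for the real approach), the division can be read from the first field:

```java
public class RecordDivision {
    // Returns the record-division field (e.g. "H", "D", "T", "F")
    // from the beginning of a CSV record.
    public static String divisionOf(String record) {
        return record.substring(0, record.indexOf(','));
    }

    public static void main(String[] args) {
        String record = "D,000001,2016,1,0000000001,100000000";
        switch (divisionOf(record)) {
            case "H": System.out.println("header record"); break;
            case "D": System.out.println("data record"); break;
            case "T": System.out.println("trailer record"); break;
            case "F": System.out.println("footer record"); break;
            default:  System.out.println("unknown record division");
        }
        // prints "data record"
    }
}
```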

Assumptions on format of data part

In How To Use, explanations are given on the premise that the layout of the data part has the same format.
This means that all the records of the data part are mapped to the same conversion target class.

About explanation of Multi Format file
  • In How To Use, Single Format files are described.

  • For flat files in Multi Format, or with a structure including a footer part in the above structure, refer to How To Extend.

5.3.1.2. A component that inputs and outputs a flat file

This section describes the classes for handling flat files.

Input

The relationships of classes used for inputting flat files are as follows.

Component relationship FlatFileItemReader class diagram
Relationship of classes used for inputting flat files

The calling relationship of each component is as follows.

Component relationship FlatFileItemReader sequence diagram
Calling relationship of each component

Details of each component are shown below.

org.springframework.batch.item.file.FlatFileItemReader

Implementation class of ItemReader used for reading flat files. It uses the following components.
The simple processing flow is as follows.
1. Use BufferedReaderFactory to get a BufferedReader.
2. Read one record from the flat file using the acquired BufferedReader.
3. Use LineMapper to map the record to the target bean.

org.springframework.batch.item.file.BufferedReaderFactory

Generates the BufferedReader used to read the file.

org.springframework.batch.item.file.LineMapper

Maps one record to the target bean. It uses the following components.
The simple processing flow is as follows.
1. Use LineTokenizer to split the record into individual items.
2. Use FieldSetMapper to map the split items to bean properties.
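The two-step flow above can be illustrated with a plain-Java sketch (conceptual only; `tokenize` here is a hypothetical stand-in for LineTokenizer, and the mapping step stands in for FieldSetMapper):

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

public class LineMappingSketch {
    // Step 1: split one record into named items (the role of LineTokenizer);
    // the resulting name->value map plays the role of FieldSet.
    public static Map<String, String> tokenize(String record, String[] names) {
        String[] items = record.split(",");
        Map<String, String> fieldSet = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fieldSet.put(names[i], items[i]);
        }
        return fieldSet;
    }

    public static void main(String[] args) {
        String[] names = {"branchId", "year", "month", "customerId", "amount"};
        Map<String, String> fieldSet =
                tokenize("000001,2016,1,0000000001,100000000", names);
        // Step 2: map named items to bean properties (the role of FieldSetMapper)
        BigDecimal amount = new BigDecimal(fieldSet.get("amount"));
        System.out.println(fieldSet.get("branchId") + " " + amount);
        // prints "000001 100000000"
    }
}
```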

org.springframework.batch.item.file.transform.LineTokenizer

Divides one record acquired from the file into individual items.
Each split item is stored in the FieldSet class.

org.springframework.batch.item.file.mapping.FieldSetMapper

Maps each item of the split record to the properties of the target bean.

Output

Relationships of classes used for outputting flat files are as follows.

Component relationship FlatFileItemWriter class diagram
Relationship of classes used for outputting flat files

The calling relationship of each component is as follows.

Component relationship FlatFileItemWriter sequence diagram
Calling relationship of each component
org.springframework.batch.item.file.FlatFileItemWriter

Implementation class of ItemWriter used for writing to a flat file. It uses the following component: LineAggregator, which maps the target bean to one record.

org.springframework.batch.item.file.transform.LineAggregator

It is used to map the target bean to one record. The mapping between the properties of the bean and each item in the record is done in FieldExtractor.

org.springframework.batch.item.file.transform.FieldExtractor

Map the property of the target bean to each item in one record.
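Taken together, the output-side components work like the following plain-Java sketch (conceptual; `extract` stands in for FieldExtractor and `aggregate` for a delimited LineAggregator):

```java
import java.util.StringJoiner;

public class LineAggregationSketch {
    // Role of FieldExtractor: pull the bean's properties into an array,
    // in output order.
    public static Object[] extract(String customerId, String customerName,
            String customerAddress) {
        return new Object[] {customerId, customerName, customerAddress};
    }

    // Role of a delimited LineAggregator: join the extracted items
    // into one record with the delimiter.
    public static String aggregate(Object[] values, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (Object value : values) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Object[] values = extract("001", "CustomerName001", "CustomerAddress001");
        System.out.println(aggregate(values, ","));
        // prints "001,CustomerName001,CustomerAddress001"
    }
}
```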

5.3.2. How To Use

This section describes how to use flat files according to the record format.

The following items are explained.

5.3.2.1. Variable-length record

This section describes the definition method for handling variable-length record files.

5.3.2.1.1. Input

An example of setting for reading the following input file is shown.

Input File Sample
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
Class to be converted
public class SalesPlanDetail {

    private String branchId;
    private int year;
    private int month;
    private String customerId;
    private BigDecimal amount;

    // omitted getter/setter
}

The setting for reading the above file is as follows.

Bean definition example
<!-- (1) (2) (3) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}"
      p:encoding="MS932"
      p:strict="true">
  <property name="lineMapper">  <!-- (4) -->
    <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      <property name="lineTokenizer">  <!-- (5) -->
        <!-- (6) (7) (8) -->
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
              p:names="branchId,year,month,customerId,amount"
              p:delimiter=","
              p:quoteCharacter='"'/>
      </property>
      <property name="fieldSetMapper">  <!-- (9) -->
        <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
              p:targetType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail"/>
      </property>
    </bean>
  </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the input file.

Nothing

(2)

encoding

Sets the character code of the input file.
Default value of the character code of the component offered by Spring Batch varies for ItemReader and ItemWriter (Default value of ItemWriter is "UTF-8").
Hence, it is recommended to explicitly set character code even while using default value.

JavaVM’s default character set

(3)

strict

If set to true, an exception occurs when the input file does not exist (cannot be opened).

true

(4)

lineMapper

Set org.springframework.batch.item.file.mapping.DefaultLineMapper.
DefaultLineMapper is LineMapper which provides the basic operation of converting records to the class to be converted using the defined LineTokenizer and FieldSetMapper.

Nothing

(5)

lineTokenizer

Set org.springframework.batch.item.file.transform.DelimitedLineTokenizer.
DelimitedLineTokenizer is an implementation class of LineTokenizer that separates records by specifying delimiters.
It supports reading escaped line feeds, delimiters, and enclosing characters as defined in RFC-4180, which is the general specification of the CSV format.

Nothing

(6)

names

Give a name to each item of one record.
Each item can be retrieved using the name set in FieldSet used in FieldSetMapper.
Set each name from the beginning of the record with a comma separator.
When using BeanWrapperFieldSetMapper, this setting is mandatory.

Nothing

(7)

delimiter

Set delimiter

comma

(8)

quoteCharacter

Set enclosing character

Nothing

(9)

fieldSetMapper

If special conversion processing such as for character strings and numbers is unnecessary, use org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper, and specify the class to be converted in the property targetType. By doing this, an instance that automatically sets values in the fields matching the names of each item set in (6) will be created.
If conversion processing is necessary, set the implementation class of org.springframework.batch.item.file.mapping.FieldSetMapper.

Nothing

See How To Extend for the case of implementing FieldSetMapper yourself.

How to enter TSV format file

When reading the TSV file, it can be realized by setting a tab as a delimiter.

TSV file loading: Example of delimiter setting (setting by constant)
<property name="delimiter">
    <util:constant
            static-field="org.springframework.batch.item.file.transform.DelimitedLineTokenizer.DELIMITER_TAB"/>
</property>

Or, it may be as follows.

TSV file reading: Example of delimiter setting (setting by character reference)
<property name="delimiter" value="&#09;"/>
5.3.2.1.2. Output

An example of setting for writing the following output file is shown.

Output file example
001,CustomerName001,CustomerAddress001,11111111111,001
002,CustomerName002,CustomerAddress002,11111111111,002
003,CustomerName003,CustomerAddress003,11111111111,003
Class to be converted
public class Customer {

    private String customerId;
    private String customerName;
    private String customerAddress;
    private String customerTel;
    private String chargeBranchId;
    private Timestamp createDate;
    private Timestamp updateDate;

    // omitted getter/setter
}

The settings for writing the above file are as follows.

Bean definition example
<!-- Writer -->
<!-- (1) (2) (3) (4) (5) (6) (7) -->
<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:resource="file:#{jobParameters['outputFile']}"
      p:encoding="MS932"
      p:lineSeparator="&#x0A;"
      p:appendAllowed="true"
      p:shouldDeleteIfExists="false"
      p:shouldDeleteIfEmpty="false"
      p:transactional="true">
  <property name="lineAggregator">  <!-- (8) -->
    <bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator"
          p:delimiter=",">  <!-- (9) -->
      <property name="fieldExtractor">  <!-- (10) -->
        <!-- (11) -->
        <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
              p:names="customerId,customerName,customerAddress,customerTel,chargeBranchId"/>
      </property>
    </bean>
  </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the output file.

Nothing

(2)

encoding

Sets the character code of the output file.
Default value of character code of the components offered by Spring Batch varies for ItemReader and ItemWriter (Default value of ItemReader is "default character set of JavaVM").
Hence, it is recommended to explicitly set character code even while using default value.

UTF-8

(3)

lineSeparator

Set record break (line feed code).

line.separator of system’s property

(4)

appendAllowed

If true, add to the existing file.
If true, it must be noted that setting value of shouldDeleteIfExists is invalidated.

false

(5)

shouldDeleteIfExists

If appendAllowed is true, it is recommended not to specify property since the property is invalidated.
If true, delete if the file already exists.
If false, throw an exception if the file already exists.

true

(6)

shouldDeleteIfEmpty

If true, delete file for output when output count is 0.
Since unintended behaviour is likely to occur in combination with other properties, it is recommended not to set it to true. For details, refer to the description later in this section.

false

(7)

transactional

Set whether to perform transaction control. For details, see Transaction Control.

true

(8)

lineAggregator

Set org.springframework.batch.item.file.transform.DelimitedLineAggregator.
To enclose fields, set org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator.
Usage of EnclosableDelimitedLineAggregator will be described later.

Nothing

(9)

delimiter

Sets the delimiter.

comma

(10)

fieldExtractor

If special conversion processing for strings and numbers is unnecessary, you can use org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor.
If conversion processing is necessary, set implementation class of org.springframework.batch.item.file.transform.FieldExtractor.
For an implementation example of FieldExtractor, refer to Output of Fixed-length record, where a sample using full-width characters is described.

Nothing

(11)

names

Give a name to each item of one record. Set each name from the beginning of the record with a comma separator.

Nothing

It is recommended not to set true for shouldDeleteIfEmpty property of FlatFileItemWriter.

For FlatFileItemWriter, unintended files are deleted when the properties are configured by the combinations as shown below.

  • p:shouldDeleteIfEmpty="true"

  • p:shouldDeleteIfExists="false"

Reasons are as given below.
When shouldDeleteIfEmpty is set to true, file for output is deleted when output count is 0.
The "output count is 0" also includes a case wherein file for output already exists with shouldDeleteIfExists set to false.

Hence, when properties are specified by combinations above, file for output is deleted if it exists already.
This becomes the unintended behaviour when preferably an exception should be thrown and the process should be terminated in case a file for output exists.

It is recommended not to set shouldDeleteIfEmpty property to true since it results in unintended operation.

Further, when subsequent processing such as deleting the file when the output count is 0 is required, implement it by using an OS command or a Listener instead of the shouldDeleteIfEmpty property.

How to use EnclosableDelimitedLineAggregator

To enclose fields, use org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator provided by TERASOLUNA Batch 5.x.
The specification of EnclosableDelimitedLineAggregator is as follows.

  • Optional specification of enclosure character and delimiter character

    • Default is the following value commonly used in CSV format

      • Enclosed character: "(double quote)

      • Separator: , (comma)

  • If the field contains a carriage return, line feed, enclosure character, or delimiter, enclose the field with an enclosing character

    • When an enclosing character is included in a field, it is escaped by adding another enclosing character immediately before it.

    • All fields can be enclosed by setting the allEnclosing property.
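The enclosing and escaping rules above can be sketched in plain Java (an illustration of the documented behavior, not the EnclosableDelimitedLineAggregator source):

```java
public class CsvEnclosing {
    // Enclose a field per the rules above: double any embedded enclosure
    // character, and wrap the field when it contains CR, LF, the delimiter,
    // or the enclosure (or always, when allEnclosing is true).
    public static String enclose(String field, char delimiter, char enclosure,
            boolean allEnclosing) {
        boolean needsEnclosing = allEnclosing
                || field.indexOf(delimiter) >= 0
                || field.indexOf(enclosure) >= 0
                || field.indexOf('\r') >= 0
                || field.indexOf('\n') >= 0;
        if (!needsEnclosing) {
            return field;
        }
        // Escape each enclosure by prepending another enclosure
        String escaped = field.replace(String.valueOf(enclosure),
                new String(new char[] {enclosure, enclosure}));
        return enclosure + escaped + enclosure;
    }

    public static void main(String[] args) {
        System.out.println(enclose("CustomerName\"001\"", ',', '"', true));
        // prints "CustomerName""001""" (as in the output file example below)
    }
}
```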

The usage of EnclosableDelimitedLineAggregator is shown below.

Output file example
"001","CustomerName""001""","CustomerAddress,001","11111111111","001"
"002","CustomerName""002""","CustomerAddress,002","11111111111","002"
"003","CustomerName""003""","CustomerAddress,003","11111111111","003"
Class to be converted
// Same as above example
Bean definition example(only settings for lineAggregator)
<property name="lineAggregator">  <!-- (1) -->
  <!-- (2) (3) (4) -->
  <bean class="org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator"
        p:delimiter=","
        p:enclosure='"'
        p:allEnclosing="true">
      <property name="fieldExtractor">
        <!-- omitted settings -->
      </property>
  </bean>
</property>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

lineAggregator

Set org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator.

Nothing

(2)

delimiter

Sets the delimiter.

comma

(3)

enclosure

Set the enclosing character.
If the enclosing character is included in the field, it is escaped by adding another enclosing character immediately before it.

double quote

(4)

allEnclosing

If true, all fields are enclosed in an enclosing character.
If false, only fields containing a carriage return (CR), line feed (LF), delimiter, or enclosing character are enclosed.

false

TERASOLUNA Batch 5.x provides the extension class org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator to satisfy the specification of RFC-4180.

The org.springframework.batch.item.file.transform.DelimitedLineAggregator provided by Spring Batch does not support enclosing fields, and therefore cannot satisfy the RFC-4180 specification. Refer to Spring Batch/BATCH-2463.

The format of the CSV format is defined as follows in RFC-4180 which is a general format of CSV format.

  • If the field does not contain line breaks, enclosing characters, or delimiters, each field can be enclosed in double quotes (enclosing characters) or not enclosed

  • Fields that contain line feed (CRLF), double quote (enclosing character), comma (delimiter) should be enclosed in double quotes

  • If the field is enclosed in double quotes (enclosing characters), the double quotes contained in the value of the field must be escaped with a single double quote immediately before it

How to output TSV format file

When outputting a TSV file, it can be realized by setting a tab as a delimiter.

Setting example of delimiter when outputting TSV file (setting by constant)
<property name="delimiter">
    <util:constant
            static-field="org.springframework.batch.item.file.transform.DelimitedLineTokenizer.DELIMITER_TAB"/>
</property>

Or, it may be as follows.

Example of delimiter setting when TSV file is output (setting by character reference)
<property name="delimiter" value="&#09;"/>
5.3.2.2. Fixed-length record

Describe how to define fixed length record files.

5.3.2.2.1. Input

An example of setting for reading the following input file is shown.

TERASOLUNA Batch 5.x supports both a format in which record delimitation is judged by line feeds and a format in which it is judged by the number of bytes.

Input file example 1 (record breaks are line feeds)
Sale012016 1   00000011000000000
Sale022017 2   00000022000000000
Sale032018 3   00000033000000000
Input file example 2 (record delimiter is byte number, 32 bytes is 1 record)
Sale012016 1   00000011000000000Sale022017 2   00000022000000000Sale032018 3   00000033000000000
Input file specification
No Field Name Data Type Number of bytes

(1)

branchId

String

6

(2)

year

int

4

(3)

month

int

2

(4)

customerId

String

10

(5)

amount

BigDecimal

10

Class to be converted
public class SalesPlanDetail {

    private String branchId;
    private int year;
    private int month;
    private String customerId;
    private BigDecimal amount;

    // omitted getter/setter
}

The setting for reading the above file is as follows.

Bean definition example
<!-- (1) (2) (3) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}"
      p:encoding="MS932"
      p:strict="true">
    <property name="bufferedReaderFactory">  <!-- (4) -->
        <bean class="org.springframework.batch.item.file.DefaultBufferedReaderFactory"/>
    </property>
    <property name="lineMapper">  <!-- (5) -->
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">  <!-- (6) -->
                <!-- (7) -->
                <!-- (8) -->
                <!-- (9) -->
                <bean class="org.terasoluna.batch.item.file.transform.FixedByteLengthLineTokenizer"
                      p:names="branchId,year,month,customerId,amount"
                      c:ranges="1-6, 7-10, 11-12, 13-22, 23-32"
                      c:charset="MS932" />
            </property>
            <property name="fieldSetMapper">  <!-- (10) -->
              <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
                    p:targetType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail"/>
            </property>
        </bean>
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the input file.

Nothing

(2)

encoding

Sets the character code of the input file.
Default value of character code for the components offered by Spring Batch varies for ItemReader and ItemWriter (Default value of ItemWriter is "UTF-8").
Hence, it is recommended to explicitly set character code even while using default value.

JavaVM default character set

(3)

strict

If set to true, an exception occurs when the input file does not exist (cannot be opened).

true

(4)

bufferedReaderFactory

To decide record breaks by line breaks, use the default value org.springframework.batch.item.file.DefaultBufferedReaderFactory. BufferedReader generated by DefaultBufferedReaderFactory acquires up to a newline as one record.

To judge the delimiter of a record by the number of bytes, set org.terasoluna.batch.item.file.FixedByteLengthBufferedReaderFactory provided by TERASOLUNA Batch 5.x. BufferedReader generated by FixedByteLengthBufferedReaderFactory acquires up to the specified number of bytes as one record.
Detailed specifications and usage of FixedByteLengthBufferedReaderFactory will be described later.

DefaultBufferedReaderFactory

(5)

lineMapper

Set org.springframework.batch.item.file.mapping.DefaultLineMapper.

Nothing

(6)

lineTokenizer

Set org.terasoluna.batch.item.file.transform.FixedByteLengthLineTokenizer provided by TERASOLUNA Batch 5.x.

Nothing

(7)

names

Give a name to each item of one record.
Each item can be retrieved using the name set in FieldSet used in FieldSetMapper.
Set each name from the beginning of the record with a comma separator.
When using BeanWrapperFieldSetMapper, this setting is mandatory.

Nothing

(8)

ranges
(Constructor argument)

Sets the delimiter positions. Set the delimiter positions from the beginning of the record, separated by commas.
The unit of each delimiter position is bytes, and it is specified in start position - end position format.
The specified ranges are extracted from the record in the order in which the delimiter positions are set, and stored in FieldSet.
When names of (7) are specified, the ranges are stored in FieldSet in correspondence with the names, in the order in which they are set.

Nothing

(9)

charset
(Constructor argument)

Set the same character code as (2).

Nothing

(10)

fieldSetMapper

If special conversion processing for character strings and numbers is unnecessary, use org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper, and specify the conversion target class as the property targetType. By doing this, an instance that automatically sets values in the fields matching the names of each item set in (7) will be created.
If conversion processing is necessary, set the implementation class of org.springframework.batch.item.file.mapping.FieldSetMapper.

Nothing

See How To Extend for the case of implementing FieldSetMapper yourself.
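As an illustration of how byte ranges such as 1-6, 7-10, ... split a record into items (a plain-Java sketch; unlike FixedByteLengthLineTokenizer, this simple version does no validation):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ByteRangeSketch {
    // Extract items from a record by 1-origin, inclusive byte ranges
    // written as "start-end" (e.g. "1-6").
    public static List<String> tokenize(String record, String[] ranges,
            Charset charset) {
        byte[] bytes = record.getBytes(charset);
        List<String> items = new ArrayList<>();
        for (String range : ranges) {
            String[] bounds = range.split("-");
            int start = Integer.parseInt(bounds[0].trim()) - 1; // 1-origin -> 0-origin
            int end = Integer.parseInt(bounds[1].trim());       // inclusive end
            items.add(new String(Arrays.copyOfRange(bytes, start, end), charset));
        }
        return items;
    }

    public static void main(String[] args) {
        String record = "Sale012016 1   00000011000000000";
        String[] ranges = {"1-6", "7-10", "11-12", "13-22", "23-32"};
        // Each item is extracted by its byte range from the 32-byte record
        System.out.println(tokenize(record, ranges, StandardCharsets.UTF_8));
    }
}
```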

How to use FixedByteLengthBufferedReaderFactory

To read a file that judges record delimiter by byte count, use org.terasoluna.batch.item.file.FixedByteLengthBufferedReaderFactory provided by TERASOLUNA Batch 5.x.

By using FixedByteLengthBufferedReaderFactory, it is possible to acquire up to the number of bytes specified as one record.
The specification of FixedByteLengthBufferedReaderFactory is as follows.

  • Specify byte count of record as constructor argument

  • Generate FixedByteLengthBufferedReader which reads the file with the specified number of bytes as one record

The behavior of FixedByteLengthBufferedReader is as follows.

  • Reads the file using the byte length per record specified at instance creation

  • If a line feed code is present, it is not discarded but is read as part of the byte length of one record

  • The file encoding used for reading is the value set for FlatFileItemReader, and it is applied when the BufferedReader is generated

The method of defining FixedByteLengthBufferedReaderFactory is shown below.

<property name="bufferedReaderFactory">
    <bean class="org.terasoluna.batch.item.file.FixedByteLengthBufferedReaderFactory"
        c:byteLength="32"/>  <!-- (1) -->

</property>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

byteLength
(Constructor argument)

Set the number of bytes per record.

Nothing
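Conceptually, the generated reader splits a stream without line breaks into records of the configured byte length, roughly as in this plain-Java sketch (the real FixedByteLengthBufferedReader reads incrementally and honors the configured charset):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedByteRecords {
    // Split content without line breaks into records of byteLength bytes each.
    public static List<String> split(byte[] content, int byteLength) {
        List<String> records = new ArrayList<>();
        for (int pos = 0; pos + byteLength <= content.length; pos += byteLength) {
            records.add(new String(content, pos, byteLength,
                    StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two 32-byte records concatenated without line breaks,
        // as in input file example 2 above
        String data = "Sale012016 1   00000011000000000"
                + "Sale022017 2   00000022000000000";
        List<String> records =
                split(data.getBytes(StandardCharsets.UTF_8), 32);
        System.out.println(records.size()); // prints 2
    }
}
```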

Components to use when handling Fixed-length files

When handling fixed-length files, the components provided by TERASOLUNA Batch 5.x are used as a basis.

FixedByteLengthBufferedReaderFactory

BufferedReader generation class that reads one record from the fixed-length file without line break by the number of bytes of the specified character code

FixedByteLengthLineTokenizer

An extension class of FixedLengthTokenizer that splits records by the number of bytes, supporting multibyte character strings

Processing records containing multibyte character strings

When processing records containing multibyte character strings, be sure to use FixedByteLengthLineTokenizer.
The FixedLengthTokenizer provided by Spring Batch separates the record by the number of characters instead of the number of bytes, so there is a possibility that the item will not be extracted as expected.

Since this issue has already been reported to JIRA as Spring Batch/BATCH-2540, this extension class may become unnecessary in the future.

For the implementation of FieldSetMapper, refer to How To Extend.
5.3.2.2.2. Output

An example of setting for writing the following output file is shown.

In order to write a fixed-length file, the value obtained from the bean must be formatted according to the number of bytes of the field.
The formatting method differs as follows depending on whether double-byte characters are included.

  • If double-byte characters are not included (single-byte characters only, so the number of bytes per character is constant)

    • Format using FormatterLineAggregator.

    • The format is set by the format used in the String.format method.

  • If double-byte characters are included (the number of bytes per character is not constant, depending on the character code)

    • Format with implementation class of FieldExtractor.
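For the single-byte case, the behavior of a String.format-based format string such as %6s%4s%2s%10s%10s can be checked directly in plain Java: each %Ns conversion right-justifies the value in N characters, padding with half-width spaces:

```java
import java.math.BigDecimal;

public class FixedLengthFormat {
    // Format one record with fixed widths: 6 + 4 + 2 + 10 + 10 = 32 characters.
    public static String format(String branchId, int year, int month,
            String customerId, BigDecimal amount) {
        return String.format("%6s%4s%2s%10s%10s",
                branchId, year, month, customerId, amount);
    }

    public static void main(String[] args) {
        System.out.println(format("001", 2016, 1, "0000000001",
                new BigDecimal("10000000")));
        // prints "   0012016 10000000001  10000000"
    }
}
```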

First, a setting example in the case where double-byte characters are not included in the output file is shown, followed by a setting example in the case where double-byte characters are included.

The setting when double-byte characters are not included in the output file is shown below.

Output file example
   0012016 10000000001  10000000
   0022017 20000000002  20000000
   0032018 30000000003  30000000
Output file specification
No Field Name Data Type Number of bytes

(1)

branchId

String

6

(2)

year

int

4

(3)

month

int

2

(4)

customerId

String

10

(5)

amount

BigDecimal

10

If the field’s value is shorter than the specified number of bytes, the rest of the field is filled with half-width spaces.

Class to be converted
public class SalesPlanDetail {

    private String branchId;
    private int year;
    private int month;
    private String customerId;
    private BigDecimal amount;

    // omitted getter/setter
}

The settings for writing the above file are as follows.

Bean definition
<!-- Writer -->
<!-- (1) (2) (3) (4) (5) (6) (7) -->
<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:resource="file:#{jobParameters['outputFile']}"
      p:encoding="MS932"
      p:lineSeparator="&#x0A;"
      p:appendAllowed="true"
      p:shouldDeleteIfExists="false"
      p:shouldDeleteIfEmpty="false"
      p:transactional="true">
    <property name="lineAggregator">  <!-- (8) -->
        <bean class="org.springframework.batch.item.file.transform.FormatterLineAggregator"
              p:format="%6s%4s%2s%10s%10s">  <!-- (9) -->
            <property name="fieldExtractor">  <!-- (10) -->
              <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
                    p:names="branchId,year,month,customerId,amount"/>  <!-- (11) -->
            </property>
        </bean>
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the output file.

Nothing

(2)

encoding

Sets the character code of the output file.
Default value of character code for the components offered by Spring Batch varies for ItemReader and ItemWriter (Default value of ItemReader is "Default character set of JavaVM").
Hence, it is recommended to explicitly set the character code even while using default value.

UTF-8

(3)

lineSeparator

Set the record break(line feed code).
To output without line breaks, set an empty string.

line.separator of system’s property

(4)

appendAllowed

If true, add to the existing file.
If true, it must be noted that setting value of shouldDeleteIfExists is invalidated.

false

(5)

shouldDeleteIfExists

If appendAllowed is true, it is recommended not to specify a property since this property is invalidated.
If true, delete the file if it already exists.
If false, throw an exception if the file already exists.

true

(6)

shouldDeleteIfEmpty

If true, delete the file for output if the output count is 0.
Since unintended behaviour is likely to occur in combination with other properties, it is recommended not to set it to true. For details, refer to the notes in the variable-length record output section.

false

(7)

transactional

Set whether to perform transaction control. For details, see Transaction Control.

true

(8)

lineAggregator

Set org.springframework.batch.item.file.transform.FormatterLineAggregator.

Nothing

(9)

format

Set the output format with the format used in the String.format method.

Nothing

(10)

fieldExtractor

If special conversion processing for strings and numbers is unnecessary, you can use org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor.

If conversion processing is necessary, set implementation class of org.springframework.batch.item.file.transform.FieldExtractor.
An implementation example of FieldExtractor that formats double-byte characters is described later.

PassThroughFieldExtractor

(11)

names

Give a name to each item of one record. Set the names of each field from the beginning of the record with a comma.

Nothing

About PassThroughFieldExtractor

The default value of the fieldExtractor property of FormatterLineAggregator is org.springframework.batch.item.file.transform.PassThroughFieldExtractor.

PassThroughFieldExtractor is a class that returns the original item without any processing, and is used when FieldExtractor does not need to process anything.

If the item is an array or a collection, it is returned as is, otherwise it is wrapped in an array of single elements.

Example of how to format a field with double-byte character

When formatting fields containing double-byte characters, since the number of bytes per character differs depending on the character code, perform the formatting in an implementation class of FieldExtractor rather than in the format string of FormatterLineAggregator.

Implement the FieldExtractor class as follows.

  • Implement FieldExtractor and override the extract method.

  • Implement the extract method as follows:

    • Get the values from the item (target bean) and convert them as needed

    • Set the values into an Object array and return it

Format fields that include double-byte characters in the implementation class of FieldExtractor in the following way.

  • Get the number of bytes in the character code being used

  • Format the value by trimming or padding it according to the number of bytes

Below is a setting example for formatting a field including double-byte characters.

Output file example
   0012016 10000000001  10000000
  番号2017 2 売上高002  20000000
 番号32018 3   売上003  30000000

Settings other than lineAggregator are the same as in the example above.

Bean definition(settings of lineAggregator only)
<property name="lineAggregator">  <!-- (1) -->
    <bean class="org.springframework.batch.item.file.transform.FormatterLineAggregator"
          p:format="%s%4s%2s%s%10s">  <!-- (2) -->
        <property name="fieldExtractor">  <!-- (3) -->
            <bean class="org.terasoluna.batch.functionaltest.ch05.fileaccess.plan.SalesPlanFixedLengthFieldExtractor"/>
        </property>
    </bean>
</property>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

lineAggregator

Set org.springframework.batch.item.file.transform.FormatterLineAggregator.

Nothing

(2)

format

Set the output format with the format used in the String.format method.
The number of digits is specified only for fields that do not contain double-byte characters.

Nothing

(3)

fieldExtractor

Set implementation class of FieldExtractor.
An implementation example will be described later.

PassThroughFieldExtractor

Class to be converted
public class SalesPlanDetail {

    private String branchId;
    private int year;
    private int month;
    private String customerId;
    private BigDecimal amount;

    // omitted getter/setter
}
Sample implementation of FieldExtractor to format double-byte characters
public class SalesPlanFixedLengthFieldExtractor implements FieldExtractor<SalesPlanDetail> {
    // (1)
    @Override
    public Object[] extract(SalesPlanDetail item) {
        Object[] values = new Object[5];  // (2)

        // (3)
        values[0] = fillUpSpace(item.getBranchId(), 6);  // (4)
        values[1] = item.getYear();
        values[2] = item.getMonth();
        values[3] = fillUpSpace(item.getCustomerId(), 10);  // (4)
        values[4] = item.getAmount();

        return values; // (7)
    }

    // It is a simple impl for example
    private String fillUpSpace(String val, int num) {
        String charsetName = "MS932";
        int len = 0;
        try {
            len = val.getBytes(charsetName).length;  // (5)
        } catch (UnsupportedEncodingException e) {
            // omitted exception handling
        }

        String fillStr = "";
        for (int i = 0; i < (num - len); i++) { // (6)
            fillStr += " ";
        }

        return fillStr + val;
    }
}
Item list of setting contents
No Description

(1)

Implement FieldExtractor class and override extract method.
Set the conversion target class as the type argument of FieldExtractor.

(2)

Define an Object array to store the data after conversion.

(3)

Get the value from the item (target bean), perform conversion as needed, and set the value into the Object array.

(4)

Format the fields that include double-byte characters.
Refer to (5) and (6) for the details of format process.

(5)

Get the number of bytes for the character code.

(6)

Format the value by trimming or padding it according to the number of bytes.
In the implementation example, white space characters are added before the character string up to the specified number of bytes.

(7)

Return the Object array holding the processing result.
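The byte-count-based padding performed by fillUpSpace above can be sketched independently of Spring Batch. The sketch below pads by encoded byte length; it uses UTF-8 for portability whereas the sample above uses MS932, and the class name is illustrative:

```java
import java.nio.charset.Charset;

public class BytePadDemo {
    // Pad with leading half-width spaces until the encoded byte length
    // of the result reaches num (assumes a space encodes to 1 byte).
    public static String fillUpSpace(String val, int num, Charset cs) {
        int len = val.getBytes(cs).length;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < num - len; i++) {
            sb.append(' ');
        }
        return sb.append(val).toString();
    }
}
```

With MS932 a double-byte character counts as 2 bytes; with UTF-8 most Japanese characters count as 3, so the same value can be padded differently depending on the character code.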

5.3.2.3. Single String record

Describe the definition method when dealing with a single character string record file.

5.3.2.3.1. Input

An example of setting for reading the following input file is shown below.

Input file sample
Summary1:4,000,000,000
Summary2:5,000,000,000
Summary3:6,000,000,000

The setting for reading the above file is as follows.

Bean definition
<!-- (1) (2) (3) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}"
      p:encoding="MS932"
      p:strict="true">
    <property name="lineMapper">  <!-- (4) -->
        <bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper"/>
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the input file.

Nothing

(2)

encoding

Sets the character code of the input file.
Default value of character code for the components offered by Spring Batch varies for ItemReader and ItemWriter (Default value of ItemWriter is "UTF-8").
Hence, it is recommended to explicitly set character code even while using default value.

JavaVM default character set

(3)

strict

If set to true, an exception occurs if the input file does not exist (cannot be opened).

true

(4)

lineMapper

Set org.springframework.batch.item.file.mapping.PassThroughLineMapper.
PassThroughLineMapper is an implementation class of LineMapper, and returns the String value of the passed record as it is.

Nothing

5.3.2.3.2. Output

The setting for writing the above file is as follows.

Output file example
Summary1:4,000,000,000
Summary2:5,000,000,000
Summary3:6,000,000,000
Bean definition
<!-- Writer -->
<!-- (1) (2) (3) (4) (5) (6) (7) -->
<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:resource="file:#{jobParameters['outputFile']}"
      p:encoding="MS932"
      p:lineSeparator="&#x0A;"
      p:appendAllowed="true"
      p:shouldDeleteIfExists="false"
      p:shouldDeleteIfEmpty="false"
      p:transactional="true">
    <property name="lineAggregator">  <!-- (8) -->
        <bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator"/>
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the output file.

Nothing

(2)

encoding

Sets the character code of the output file.
Default value of character code for the components offered by Spring Batch varies for ItemReader and ItemWriter (Default value of ItemReader is "Default character set of JavaVM").
Hence, it is recommended to explicitly set character code even while using default value.

UTF-8

(3)

lineSeparator

Set the record break (line feed code).

line.separator of system’s property

(4)

appendAllowed

If true, add to existing file.
If true, it must be noted that setting value of shouldDeleteIfExists is invalidated.

false

(5)

shouldDeleteIfExists

If appendAllowed is true, it is recommended not to specify the property since the property is invalidated.
If true, delete the file if it already exists.
If false, throw an exception if the file already exists.

true

(6)

shouldDeleteIfEmpty

If true, delete file for output if output count is 0.
Since unintended behaviour is likely to happen in combination with other properties, it is recommended not to set it to true. For details, refer to Notes for how to output variable length records.

false

(7)

transactional

Set whether to perform transaction control. For details, see Transaction Control.

true

(8)

lineAggregator

Set org.springframework.batch.item.file.transform.PassThroughLineAggregator.
PassThroughLineAggregator is the implementation class of LineAggregator that will return the converted String value of the item(target Bean) as it is by processing item.toString().

Nothing
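Since PassThroughLineAggregator writes item.toString() as the record, the record layout is controlled by overriding toString() on the target bean. A minimal sketch (the Summary class and its layout are illustrative, chosen to match the output file example above):

```java
import java.util.Locale;

public class Summary {
    private final String name;
    private final long amount;

    public Summary(String name, long amount) {
        this.name = name;
        this.amount = amount;
    }

    // PassThroughLineAggregator writes exactly this string as one record.
    @Override
    public String toString() {
        return name + ":" + String.format(Locale.ROOT, "%,d", amount);
    }
}
```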

5.3.2.4. Header and Footer

Explain the input / output method when there is a header / footer.

Here, a method of skipping the header / footer by specifying the number of lines is explained.
When the number of header / footer records is variable and the number of lines cannot be specified, use PatternMatchingCompositeLineMapper with reference to Multi format input.

5.3.2.4.1. Input
Skipping Header

There are 2 ways to skip the header record.

  • Set the number of lines to skip to property linesToSkip of FlatFileItemReader

  • Remove header record in preprocessing by OS command

Input file sample
sales_plan_detail_11
branchId,year,month,customerId,amount
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000

The first 2 lines are the header records.

The setting for reading the above file is as follows.
Skip by using linesToSkip
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}"
      p:linesToSkip="2">  <!-- (1) -->
    <property name="lineMapper">
        <!-- omitted settings -->
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

linesToSkip

Set the number of header record lines to skip.

0

Skip by using OS command
# Remove number of lines in header from the top of input file
tail -n +`expr 2 + 1` input.txt > output.txt

Use the tail command to get the 3rd and subsequent lines of input.txt and write them to output.txt. Note that the value K specified for the tail command option -n +K is the number of header records + 1.
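The header-skip command can be tried end to end as follows (the file names and contents are illustrative):

```shell
# Create a sample file: 2 header lines followed by 2 data lines.
printf 'header1\nheader2\ndata1\ndata2\n' > input.txt

# Skip the 2 header lines: -n +K outputs from line K onward,
# so K is the number of header lines + 1.
tail -n +`expr 2 + 1` input.txt > output.txt

cat output.txt
```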

OS command to skip header record and footer record

By using the head and tail commands, it is possible to skip the header record and footer record by specifying the number of lines.

How to skip the header record

Execute the tail command with option -n +K to get the lines from the Kth line onward of the target file.

How to skip the footer record

Execute the head command with option -n -K to get all lines of the target file except the last K lines.

A sample of shell script to skip header record and footer record can be written as follows.

An example of a shell script that removes a specified number of lines from a header / footer
#!/bin/bash

if [ $# -ne 4 ]; then
  echo "The number of arguments must be 4, given is $#." 1>&2
  exit 1
fi

# Input file.
input=$1

# Output file.
output=$2

# Number of lines in header.
header=$3

# Number of lines in footer.
footer=$4

# Remove number of lines in header from the top of input file
# and number of lines in footer from the end,
# and save to output file.
tail -n +`expr ${header} + 1` ${input} | head -n -${footer} > ${output}
Arguments
No Description

(1)

Input file

(2)

Output file

(3)

Number of lines to skip for header

(4)

Number of lines to skip for footer
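A run of the script above can be sketched as follows (the script is saved here under an assumed name cut_header_footer.sh; the file names and contents are illustrative):

```shell
# Save the script shown above under an assumed name.
cat > cut_header_footer.sh <<'EOF'
#!/bin/bash

if [ $# -ne 4 ]; then
  echo "The number of arguments must be 4, given is $#." 1>&2
  exit 1
fi

input=$1
output=$2
header=$3
footer=$4

# Strip the header with tail and the footer with GNU head.
tail -n +`expr ${header} + 1` ${input} | head -n -${footer} > ${output}
EOF
chmod +x cut_header_footer.sh

# 1 header line and 1 footer line around 2 data lines.
printf 'header1\ndata1\ndata2\nfooter1\n' > in.txt
./cut_header_footer.sh in.txt out.txt 1 1
cat out.txt
```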

Retrieving header information

This section shows how to recognize and retrieve the header record.

The extraction of header information is implemented as follows.

Settings
  • Write the process for header record in implementation class of org.springframework.batch.item.file.LineCallbackHandler

    • Set the information retrieved in LineCallbackHandler#handleLine() to stepExecutionContext

  • Set the implementation class of LineCallbackHandler to property skippedLinesCallback of FlatFileItemReader

  • Set the number of lines to skip to property linesToSkip of FlatFileItemReader

Reading files and retrieving header information
  • For each line which is skipped by the setting of linesToSkip, LineCallbackHandler#handleLine() is executed

    • Header information is set to stepExecutionContext

Use retrieved header information
  • Get header information from stepExecutionContext and use it in the processing of the data part

An example of implementation for retrieving header record information is shown below.

Bean definition
<bean id="lineCallbackHandler"
      class="org.terasoluna.batch.functionaltest.ch05.fileaccess.module.HoldHeaderLineCallbackHandler"/>

<!-- (1) (2) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:linesToSkip="2"
      p:skippedLinesCallback-ref="lineCallbackHandler"
      p:resource="file:#{jobParameters['inputFile']}">
    <property name="lineMapper">
        <!-- omitted settings -->
    </property>
</bean>

<batch:job id="jobReadCsvSkipAndReferHeader" job-repository="jobRepository">
    <batch:step id="jobReadCsvSkipAndReferHeader.step01">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="reader"
                         processor="loggingHeaderRecordItemProcessor"
                         writer="writer" commit-interval="10"/>
            <batch:listeners>
                <batch:listener ref="lineCallbackHandler"/>  <!-- (3) -->
            </batch:listeners>
        </batch:tasklet>
    </batch:step>
</batch:job>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

linesToSkip

Set the number of lines to skip.

0

(2)

skippedLinesCallback

Set implementation class of LineCallbackHandler.
An implementation sample will be described later.

Nothing

(3)

listener

Set implementation class of StepExecutionListener.
This setting is needed since the LineCallbackHandler set to property skippedLinesCallback of FlatFileItemReader will not be automatically registered as a listener.
The detailed reason is described later.

Nothing

About the listener

Since the following two cases are not automatically registered as listeners, it is necessary to add them to the listeners definition in the job definition.
(If the listener definitions are not added, StepExecutionListener#beforeStep() will not be executed.)

  • StepExecutionListener of LineCallbackHandler which is set to skippedLinesCallback of FlatFileItemReader

  • StepExecutionListener implemented in an implementation class of Tasklet

    <batch:job id="jobReadCsvSkipAndReferHeader" job-repository="jobRepository">
        <batch:step id="jobReadCsvSkipAndReferHeader.step01">
            <batch:tasklet transaction-manager="jobTransactionManager">
                <batch:chunk reader="reader"
                             processor="loggingHeaderRecordItemProcessor"
                             writer="writer" commit-interval="10"/>
                <batch:listeners>
                    <batch:listener ref="loggingItemReaderListener"/>
                    <!-- mandatory -->
                    <batch:listener ref="lineCallbackHandler"/>
                </batch:listeners>
            </batch:tasklet>
        </batch:step>
    </batch:job>

LineCallbackHandler should be implemented as follows.

  • Implement StepExecutionListener#beforeStep()

    • Implement StepExecutionListener#beforeStep() by either ways shown below

      • Implement StepExecutionListener class and override beforeStep method

      • Implement beforeStep method and annotate with @BeforeStep

    • Get StepExecution in the beforeStep method and save it in the class field

  • Implement LineCallbackHandler#handleLine()

    • Implement LineCallbackHandler class and override handleLine

      • Note that the handleLine method is executed each time a line is skipped

    • Get stepExecutionContext from StepExecution and set header information to stepExecutionContext

Sample implementation of LineCallbackHandler
@Component
public class HoldHeaderLineCallbackHandler implements LineCallbackHandler {  // (1)
    private StepExecution stepExecution;  // (2)

    @BeforeStep  // (3)
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;  // (4)
    }

    @Override  // (5)
    public void handleLine(String line) {
        this.stepExecution.getExecutionContext().putString("header", line);  // (6)
    }
}
Item list of setting contents
No Description

(1)

Implement LineCallbackHandler class and override handleLine.

(2)

Define a field to save StepExecution.

(3)

Implement beforeStep method and annotate it with @BeforeStep.
The signature will be void beforeStep(StepExecution stepExecution).
It is also possible to implement the StepExecutionListener class and override beforeStep method.

(4)

Get the StepExecution and save it to the class field.

(5)

Implement LineCallbackHandler class and override handleLine method.

(6)

Get stepExecutionContext from StepExecution, set header information to stepExecutionContext by using key header.
Here, for simplicity, only the last one line of two lines to be skipped is stored.

Here is a sample of getting the header information from stepExecutionContext and using it for processing of data part.
A sample of using header information in ItemProcessor will be described as an example.
The same can be done when using header information in other components.

The implementation of using header information is done as follows.

  • As in the sample implementation of LineCallbackHandler, implement StepExecutionListener#beforeStep()

  • Get StepExecution in beforeStep method and save it to the class field

  • Get stepExecutionContext and the header information from StepExecution and use it

Sample of how to use header information
@Component
public class LoggingHeaderRecordItemProcessor implements
        ItemProcessor<SalesPlanDetail, SalesPlanDetail> {
    private StepExecution stepExecution;  // (1)

    @BeforeStep  // (2)
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;  // (3)
    }

    @Override
    public SalesPlanDetail process(SalesPlanDetail item) throws Exception {
        String headerData = this.stepExecution.getExecutionContext()
                .getString("header");  // (4)
        // omitted business logic
        return item;
    }
}
Item list of setting contents
No Description

(1)

Define a field to save StepExecution.

(2)

Implement beforeStep method and annotate it with @BeforeStep.
The signature will be void beforeStep(StepExecution stepExecution).
It is also possible to implement the StepExecutionListener class and override beforeStep method.

(3)

Get the StepExecution and save it to the class field.

(4)

Get stepExecutionContext from StepExecution, and get the header information by using the key header.

About the use of ExecutionContext of Job/Step

In retrieving header (footer) information, the method is to store the read header information in ExecutionContext of StepExecution and retrieve it from ExecutionContext when using it.

In the example above, header information is stored in ExecutionContext of StepExecution in order to obtain and use the header information within one step. If the step that retrieves the header information is separate from the step that uses it, use ExecutionContext of JobExecution.

For details about ExecutionContext of Job/Step, refer to Architecture of Spring Batch

Skipping Footer

Since neither Spring Batch nor TERASOLUNA Batch 5.x supports skipping the footer record, it needs to be done by an OS command.

Input File Sample
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
number of items,3
total of amounts,6000000000

The last 2 lines are the footer records.

The setting for reading the above file is as follows.

Skipping by OS command
# Remove number of lines in footer from the end of input file
head -n -2 input.txt > output.txt

Use the head command to get all lines of input.txt except the last 2 lines, and write them to output.txt.
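The footer-skip command can likewise be tried as follows (the file names and contents are illustrative; head -n -K requires GNU head):

```shell
# Create a sample file: 3 data lines followed by 2 footer lines.
printf 'data1\ndata2\ndata3\nfooter1\nfooter2\n' > input.txt

# -n -K outputs everything except the last K lines.
head -n -2 input.txt > output.txt

cat output.txt
```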

It is reported to JIRA Spring Batch/BATCH-2539 that Spring Batch does not have a function to skip the footer record.
Hence, Spring Batch itself may become able to skip the footer record in the future, instead of relying solely on OS commands.

Retrieving footer information

In Spring Batch and TERASOLUNA Batch 5.x, functions for skipping the footer record and retrieving footer information are not provided.

Therefore, it needs to be divided into preprocessing by an OS command and the following 2 steps.

  • Divide the footer record by an OS command

  • In the 1st step, read the footer record and set footer information to ExecutionContext

  • In the 2nd step, retrieve footer information from ExecutionContext and use it

Retrieving footer information is implemented as follows.

Divide footer record by OS command
  • Use an OS command to divide the input file into the footer part and the rest

1st step, read the footer record and get footer information
  • Read the footer record and set it to jobExecutionContext

    • Since the steps are different in storing and using footer information, store it in jobExecutionContext.

    • The use of jobExecutionContext is the same as the stepExecutionContext explained in Retrieving header information, except for the scope of Job and Step.

2nd step, use the retrieved footer information
  • Get the footer information from jobExecutionContext and use it for processing of data part.

An example will be described in which footer information of the following file is taken out and used.

Input File Sample
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
number of items,3
total of amounts,6000000000

The last 2 lines are the footer records.

Divide footer record by OS command

The setting to divide the above file into footer part and others by OS command is as follows.

Skipping by OS command
# Extract non-footer record from input file and save to output file.
head -n -2 input.txt > input_data.txt

# Extract footer record from input file and save to output file.
tail -n 2 input.txt > input_footer.txt

Use the head and tail commands to write the footer part of input.txt to input_footer.txt and the rest to input_data.txt.

Output file sample is as follows.

Output file example(input_data.txt)
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
Output file example(input_footer.txt)
number of items,3
total of amounts,6000000000
Get/Use footer information

Explain how to get and use footer information from a footer record divided by OS command.

The step of reading the footer record is divided into the preprocessing and main processing.
Refer to Flow Control for details of dividing steps.

In the example below, a sample is shown in which footer information is retrieved and stored in jobExecutionContext.
Footer information can be used by retrieving it from jobExecutionContext in the same way as described in Retrieving header information.

Class to set information of data record
public class SalesPlanDetail {

    private String branchId;
    private int year;
    private int month;
    private String customerId;
    private BigDecimal amount;

    // omitted getter/setter
}
Class to set information of footer record
public class SalesPlanDetailFooter implements Serializable {

    // omitted serialVersionUID

    private String name;
    private String value;

    // omitted getter/setter
}

Define the Bean like below.

  • Define ItemReader to read footer record

  • Define ItemReader to read data record

  • Define the business logic to retrieve the footer record

    • In the sample below, it is done by implementing Tasklet

  • Define a job

    • Define a step with a preprocess to get footer information and a main process to read data records.

Bean definition
<!-- ItemReader for reading footer records -->
<!-- (1) -->
<bean id="footerReader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['footerInputFile']}">
    <property name="lineMapper">
        <!-- omitted other settings -->
    </property>
</bean>

<!-- ItemReader for reading data records -->
<!-- (2) -->
<bean id="dataReader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['dataInputFile']}">
    <property name="lineMapper">
        <!-- omitted other settings -->
    </property>
</bean>

<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
  <!-- omitted settings -->
</bean>

<!-- Tasklet for reading footer records -->
<bean id="readFooterTasklet"
      class="org.terasoluna.batch.functionaltest.ch05.fileaccess.module.ReadFooterTasklet"/>

<batch:job id="jobReadAndWriteCsvWithFooter" job-repository="jobRepository">
    <!-- (3) -->
    <batch:step id="jobReadAndWriteCsvWithFooter.step01"
            next="jobReadAndWriteCsvWithFooter.step02">
        <batch:tasklet ref="readFooterTasklet"
                       transaction-manager="jobTransactionManager"/>
    </batch:step>
    <!-- (4) -->
    <batch:step id="jobReadAndWriteCsvWithFooter.step02">
        <batch:tasklet transaction-manager="jobTransactionManager">
            <batch:chunk reader="dataReader"
                         writer="writer" commit-interval="10"/>
        </batch:tasklet>
    </batch:step>
    <batch:listeners>
        <batch:listener ref="readFooterTasklet"/> <!-- (5) -->
    </batch:listeners>
</batch:job>
Item list of setting contents
No Item Setting contents Required Default Value

(1)

footerReader

Define ItemReader to read a file with footer record.
Used by injecting it into readFooterTasklet, which is executed when retrieving footer information.

(2)

dataReader

Define ItemReader to read a file with data record.

(3)

preprocess step

Define a step to get the footer information.
Implemented at readFooterTasklet. Implementation sample is written later on.

(4)

main process step

A step that retrieves data information and uses footer information is defined.
Use dataReader for the reader.
In the sample, logic to get footer information from jobExecutionContext, such as an ItemProcessor, is not implemented.
Footer information can be retrieved and used in the same way as described in Retrieving header information.

(5)

listeners

Set readFooterTasklet.
Without this setting, JobExecutionListener#beforeJob() implemented in readFooterTasklet will not be executed.
For details, refer to Retrieving header information.

Nothing

An example of reading a file with a footer record and storing it in jobExecutionContext is shown below.

Implement it as an implementation class of Tasklet as follows.

  • Inject the bean defined as footerReader by name using @Inject and @Named

  • Set the footer information to jobExecutionContext

Getting footer information
public class ReadFooterTasklet implements Tasklet {
    // (1)
    @Inject
    @Named("footerReader")
    ItemStreamReader<SalesPlanDetailFooter> itemReader;

    private JobExecution jobExecution;

    @BeforeJob
    public void beforeJob(JobExecution jobExecution) {
        this.jobExecution = jobExecution;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {
        ArrayList<SalesPlanDetailFooter> footers = new ArrayList<>();

        // (2)
        itemReader.open(chunkContext.getStepContext().getStepExecution()
                .getExecutionContext());

        SalesPlanDetailFooter footer;
        while ((footer = itemReader.read()) != null) {
            footers.add(footer);
        }

        // (3)
        jobExecution.getExecutionContext().put("footers", footers);

        return RepeatStatus.FINISHED;
    }
}
Item list of setting contents
No Description

(1)

Inject the bean defined as footerReader by name using @Inject and @Named.

(2)

Use footerReader to read the file with footer record and get the footer information.
To use an ItemReader bean defined in an implementation class of Tasklet, refer to Creating a tasklet-oriented job.

(3)

Get jobExecutionContext from JobExecution, set the footer information to jobExecutionContext by key footers.

5.3.2.4.2. Output
Output header information

To output header information to a flat file, implement as follows.

  • Implement org.springframework.batch.item.file.FlatFileHeaderCallback

  • Set the implemented FlatFileHeaderCallback to property headerCallback of FlatFileItemWriter

    • By setting headerCallback, FlatFileHeaderCallback#writeHeader() will be executed first when FlatFileItemWriter starts processing

Implement FlatFileHeaderCallback as follows.

  • Implement FlatFileHeaderCallback class and override writeHeader.

  • Write the header information using Writer from the argument.

Sample implementation of FlatFileHeaderCallback is shown below.

Sample implementation of FlatFileHeaderCallback
@Component
// (1)
public class WriteHeaderFlatFileFooterCallback implements FlatFileHeaderCallback {
    @Override
    public void writeHeader(Writer writer) throws IOException {
        // (2)
        writer.write("omitted");
    }
}
Item list of setting contents
No Description

(1)

Implement FlatFileHeaderCallback class and override writeHeader method.

(2)

Write the header information using the Writer from the argument.
A line break is printed right after FlatFileHeaderCallback#writeHeader() is executed, so there is no need to print a line break at the end of the header information.
The line feed that is printed is the one set in the bean definition of FlatFileItemWriter.

Bean definition
<!-- (1) (2) -->
<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:headerCallback-ref="writeHeaderFlatFileFooterCallback"
      p:lineSeparator="&#x0A;"
      p:resource="file:#{jobParameters['outputFile']}">
    <property name="lineAggregator">
        <!-- omitted settings -->
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

headerCallback

Set implementation class of FlatFileHeaderCallback.

(2)

lineSeparator

Set the record break (line feed code).

line.separator of system’s property

When implementing FlatFileHeaderCallback, printing line feed at the end of header information is not necessary

Right after executing FlatFileHeaderCallback#writeHeader() in FlatFileItemWriter, line feed is printed according to the bean definition, so the line feed at the end of header information does not need to be printed.

Output footer information

To output footer information to a flat file, implement as follows.

  • Implement org.springframework.batch.item.file.FlatFileFooterCallback

  • Set the implemented FlatFileFooterCallback to property footerCallback of FlatFileItemWriter

    • By setting footerCallback, FlatFileFooterCallback#writeFooter() will be executed at the end of FlatFileItemWriter processing

A method of outputting footer information with a flat file will be described.

Implement FlatFileFooterCallback as follows.

  • Implement FlatFileFooterCallback class and override writeFooter.

  • Output footer information using the Writer from the argument.

Below is an implementation sample of FlatFileFooterCallback class for a Job to get footer information from ExecutionContext and write it out to a file.

Class to set information of footer record
public class SalesPlanDetailFooter implements Serializable {

    // omitted serialVersionUID

    private String name;
    private String value;

    // omitted getter/setter
}
Implementation Sample of FlatFileFooterCallback
@Component
public class WriteFooterFlatFileFooterCallback implements FlatFileFooterCallback {  // (1)
    private JobExecution jobExecution;

    @BeforeJob
    public void beforeJob(JobExecution jobExecution) {
        this.jobExecution = jobExecution;
    }

    @Override
    public void writeFooter(Writer writer) throws IOException {
        @SuppressWarnings("unchecked")
        ArrayList<SalesPlanDetailFooter> footers = (ArrayList<SalesPlanDetailFooter>) this.jobExecution.getExecutionContext().get("footers");  // (2)

        BufferedWriter bufferedWriter = new BufferedWriter(writer);  // (3)
        // (4)
        for (SalesPlanDetailFooter footer : footers) {
            bufferedWriter.write(footer.getName() +" is " + footer.getValue());
            bufferedWriter.newLine();
            bufferedWriter.flush();
        }
    }
}
Item list of setting contents
No Description

(1)

Implement FlatFileFooterCallback class and override writeFooter method.

(2)

Get footer information from ExecutionContext of the Job using the key footers.
In the sample, an ArrayList is used to hold multiple footer entries.

(3)

In the sample, a BufferedWriter is created from the Writer argument in order to use BufferedWriter.newLine() to print line feeds.

(4)

Print the footer information using the Writer from the argument.

Bean definition
<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:resource="file:#{jobParameters['outputFile']}"
      p:footerCallback-ref="writeFooterFlatFileFooterCallback">  <!-- (1) -->
    <property name="lineAggregator">
        <!-- omitted settings -->
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

footerCallback

Set implementation class of FlatFileFooterCallback.

5.3.2.5. Multiple Files

Describe how to handle multiple files.

5.3.2.5.1. Input

To read multiple files of the same record format, use org.springframework.batch.item.file.MultiResourceItemReader.
MultiResourceItemReader can use the specified ItemReader to read multiple files matched by a wildcard path pattern.

Implement MultiResourceItemReader as follows.

  • Define bean of MultiResourceItemReader

    • Set the files to read to property resources

      • Use a wildcard path pattern to specify multiple files

    • Set the ItemReader that reads each file to property delegate

Below is a definition example of MultiResourceItemReader to read multiple files with the following file names.

File to be read (file name)
sales_plan_detail_01.csv
sales_plan_detail_02.csv
sales_plan_detail_03.csv
Bean definition
<!-- (1) (2) -->
<bean id="multiResourceReader"
      class="org.springframework.batch.item.file.MultiResourceItemReader"
      scope="step"
      p:resources="file:input/sales_plan_detail_*.csv"
      p:delegate-ref="reader"/>

<!-- (3) -->
<bean id="reader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="lineMapper">
      <!-- omitted settings -->
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resources

Set multiple input files using a wildcard path pattern.

Nothing

(2)

delegate

Set ItemReader where it has the actual file read implementation.

Nothing

(3)

ItemReader with the actual file read implementation

Since property resource is set automatically from MultiResourceItemReader, it is not necessary to set it in Bean definition.

It is unnecessary to specify resource for ItemReader used by MultiResourceItemReader

Since resource of ItemReader delegated from MultiResourceItemReader is automatically set from MultiResourceItemReader, it is not necessary to set it in Bean definition.

5.3.2.5.2. Output

This section describes how to output to multiple files.

To switch the output to a different file every fixed number of items, use org.springframework.batch.item.file.MultiResourceItemWriter.

MultiResourceItemWriter uses the specified ItemWriter and switches to a new output file each time the specified number of items has been written.
The output file names must be made unique so that they do not collide, and ResourceSuffixCreator is provided as the mechanism for this.
ResourceSuffixCreator is a class that generates a suffix that makes the file name unique.

For example, to give the output target files names like outputDir/customer_list_01.csv (where 01 is a serial number), set it up as follows.

  • Set outputDir/customer_list_ to MultiResourceItemWriter

  • Implement ResourceSuffixCreator so that it generates the suffix 01.csv (where 01 is a serial number)

    • For the serial number, use the index value automatically incremented and passed in by MultiResourceItemWriter

  • outputDir/customer_list_01.csv is then set automatically on the ItemWriter that is actually used

MultiResourceItemWriter is defined as follows. How to implement ResourceSuffixCreator is described later.

  • Define implementation class of ResourceSuffixCreator

  • Define bean for MultiResourceItemWriter

    • Set the output file to property resource

      • Set the file name up to the suffix given to implementation class of ResourceSuffixCreator

    • Set implementation class of ResourceSuffixCreator that generates suffix to property resourceSuffixCreator

    • Set the ItemWriter that actually writes each file to property delegate

    • Set the number of output per file to property itemCountLimitPerResource

Bean definition
<!-- (1) (2) (3) (4) -->
<bean id="multiResourceItemWriter"
      class="org.springframework.batch.item.file.MultiResourceItemWriter"
      scope="step"
      p:resource="file:#{jobParameters['outputDir']}"
      p:resourceSuffixCreator-ref="customerListResourceSuffixCreator"
      p:delegate-ref="writer"
      p:itemCountLimitPerResource="4"/>

<!-- (5) -->
<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter">
    <property name="lineAggregator">
        <!-- omitted settings -->
    </property>
</bean>

<bean id="customerListResourceSuffixCreator"
      class="org.terasoluna.batch.functionaltest.ch05.fileaccess.module.CustomerListResourceSuffixCreator"/>  <!-- (6) -->
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the output target file path without the suffix.
The file name with the suffix automatically appended by MultiResourceItemWriter is set on the delegate ItemWriter.

Nothing

(2)

resourceSuffixCreator

Set implementation class of ResourceSuffixCreator.
Default is org.springframework.batch.item.file.SimpleResourceSuffixCreator, which generates the suffix "." + index.

SimpleResourceSuffixCreator

(3)

delegate

Set the ItemWriter that actually writes the file.

Nothing

(4)

itemCountLimitPerResource

Set the number of output per file.

Integer.MAX_VALUE

(5)

ItemWriter that actually writes the file.

Since property resource is automatically set from MultiResourceItemWriter, it is not necessary to set it in Bean definition.

Setting resource on the ItemWriter used by MultiResourceItemWriter is not necessary

Since Resource of ItemWriter delegated from MultiResourceItemWriter is automatically set from MultiResourceItemWriter, it is not necessary to set it in the bean definition.

Implement ResourceSuffixCreator as follows.

  • Implement ResourceSuffixCreator and override getSuffix method

  • Generate and return a suffix using the index argument

    • index is an int type value with initial value 1, and will be incremented for each output file

Sample implementation of ResourceSuffixCreator
// (1)
public class CustomerListResourceSuffixCreator implements ResourceSuffixCreator {
    @Override
    public String getSuffix(int index) {
        return String.format("%02d", index) + ".csv";  // (2)
    }
}
Item list of setting contents
No Description

(1)

Implement ResourceSuffixCreator class and override getSuffix method.

(2)

Use argument’s index to generate suffix to return. index is an int type value with initial value 1, and will be incremented for each output file.
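
Since the formatting above relies only on the JDK, the suffix logic can be exercised without Spring Batch. This sketch (hypothetical class name; it inlines only the formatting, since ResourceSuffixCreator itself is a Spring Batch interface) shows the values the sample would produce:

```java
public class SuffixFormatSketch {

    // Same formatting as the getSuffix sample above:
    // a zero-padded two-digit serial number followed by the ".csv" extension.
    public static String suffix(int index) {
        return String.format("%02d", index) + ".csv";
    }
}
```

For index values 1, 2, …, 10 this yields 01.csv, 02.csv, …, 10.csv; for index 100 and above the number simply grows past two digits (100.csv), since %02d only pads, it does not truncate.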

5.3.2.6. Control Break

This section describes how to actually implement a control break.

What is Control Break

Control break processing (or key break processing) is a method that reads sorted records one by one and treats records sharing the value of a certain item (the key item) as one group.
It is an algorithm used mainly for aggregating data: counting continues while the key item holds the same value, and the aggregate value is output when the key item changes to a different value.

To perform control break processing, a record must be read ahead in order to detect the change of group. Read-ahead can be done using org.springframework.batch.item.support.SingleItemPeekableItemReader.
Also, a control break can be implemented only in the tasklet model. This is because the premises of the chunk model, "processing the fixed number of records defined as one chunk" and "transaction boundaries every fixed number of records", do not fit the control break's basic algorithm of "proceeding at the boundary between groups".

The execution timing of control break processing and comparison conditions are shown below.

  • Execute control break before processing the target record

    • Keep the previously read record, compare previous record with current record

  • Execute control break after processing the target record

    • Pre-read the next record by SingleItemPeekableItemReader and compare the current record with the next record
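
The algorithm itself does not depend on Spring Batch. The following minimal sketch (hypothetical class name and record layout: each record is a String array of key and amount, sorted by key) emulates the read-ahead of SingleItemPeekableItemReader by looking at the next list element, and fires the break after processing each record:

```java
import java.util.ArrayList;
import java.util.List;

public class ControlBreakSketch {

    // Aggregates amounts per key over records sorted by the key item.
    // Each record is a String[]{key, amount}; returns one "key=total" line per group.
    public static List<String> summarize(List<String[]> sortedRecords) {
        List<String> results = new ArrayList<>();
        long summary = 0;
        for (int i = 0; i < sortedRecords.size(); i++) {
            String[] current = sortedRecords.get(i);
            summary += Long.parseLong(current[1]);
            // Emulate SingleItemPeekableItemReader.peek(): null at end of input.
            String[] next = (i + 1 < sortedRecords.size()) ? sortedRecords.get(i + 1) : null;
            // Control break after processing the record: key changed or input ended.
            if (next == null || !next[0].equals(current[0])) {
                results.add(current[0] + "=" + summary);
                summary = 0;
            }
        }
        return results;
    }
}
```

Applied to the input data of the sample below (branch 01: 1000+1500+1300, branch 02: 900+1200), this yields "01=3800" and "02=2100", matching the summary lines of the process result.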

A sample for outputting process result from input data using control break is shown below.

Input Data
01,2016,10,1000
01,2016,11,1500
01,2016,12,1300
02,2016,12,900
02,2016,12,1200
Process Result
Header Branch Id : 01,,,
01,2016,10,1000
01,2016,11,1500
01,2016,12,1300
Summary Branch Id : 01,,,3800
Header Branch Id : 02,,,
02,2016,12,900
02,2016,12,1200
Summary Branch Id : 02,,,2100
Implementation Sample of Control Break
@Component
public class ControlBreakTasklet implements Tasklet {

    @Inject
    SingleItemPeekableItemReader<SalesPerformanceDetail> reader; // (1)

    @Inject
    ItemStreamWriter<SalesPerformanceDetail> writer;

    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {

        // omitted.

        SalesPerformanceDetail previousData = null;   // (2)
        BigDecimal summary = new BigDecimal(0);  //(3)

        List<SalesPerformanceDetail> items = new ArrayList<>();   // (4)

        try {
            reader.open(executionContext);
            writer.open(executionContext);

            while (reader.peek() != null) {   // (5)
                SalesPerformanceDetail data = reader.read(); // (6)

                // (7)
                if (isBreakByBranchId(previousData, data)) {
                    SalesPerformanceDetail beforeBreakData =
                            new SalesPerformanceDetail();
                    beforeBreakData.setBranchId("Header Branch Id : "
                              + data.getBranchId());
                    items.add(beforeBreakData);
                }

                // omitted.
                items.add(data);  // (8)

                SalesPerformanceDetail nextData = reader.peek();  // (9)
                summary = summary.add(data.getAmount());

                // (10)
                SalesPerformanceDetail afterBreakData = null;
                if (isBreakByBranchId(nextData, data)) {
                    afterBreakData = new SalesPerformanceDetail();
                    afterBreakData.setBranchId("Summary Branch Id : "
                            + data.getBranchId());
                    afterBreakData.setAmount(summary);
                    items.add(afterBreakData);
                    summary = new BigDecimal(0);
                    writer.write(items);  // (11)
                    items.clear();
                }
                previousData = data;  // (12)
            }
        } finally {
            try {
                reader.close();
            } catch (ItemStreamException e) {
            }
            try {
                writer.close();
            } catch (ItemStreamException e) {
            }
        }
        return RepeatStatus.FINISHED;
    }
    // (13)
    private boolean isBreakByBranchId(SalesPerformanceDetail o1,
            SalesPerformanceDetail o2) {
        return (o1 == null || !o1.getBranchId().equals(o2.getBranchId()));
    }
}
Item list of setting contents
No Description

(1)

Inject SingleItemPeekableItemReader.

(2)

Define a variable to set the previously read record.

(3)

Define a variable to set aggregated values for each group.

(4)

Define a variable to set records for each group including the control break’s process result

(5)

Repeat the process until there is no input data.

(6)

Read the record to be processed.

(7)

Execute a control break before target record processing.
In the sample, if the record is at the beginning of a group, a header record is created and stored in the variable defined in (4).

(8)

Set the process result to the variable defined in (4).

(9)

Pre-read the next record.

(10)

Execute a control break after target record processing. In this case, if it is at the end of the group, the aggregated data is set in the trailer and stored in the variable defined in (4).

(11)

Output processing results for each group.

(12)

Store the processing record in the variable defined in (2).

(13)

Judge whether the key item has switched or not.

Bean definition
<!-- (1) -->
<bean id="reader"
      class="org.springframework.batch.item.support.SingleItemPeekableItemReader"
      p:delegate-ref="delegateReader" />  <!-- (2) -->

<!-- (3) -->
<bean id="delegateReader"
      class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}">
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
                      p:names="branchId,year,month,customerId,amount"/>
            </property>
            <property name="fieldSetMapper">
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
                      p:targetType="org.terasoluna.batch.functionaltest.app.model.performance.SalesPerformanceDetail"/>
            </property>
        </bean>
    </property>
</bean>
Item list of setting contents
No Description

(1)

Define bean for SingleItemPeekableItemReader. It will be injected to the Tasklet.

(2)

Set the bean of ItemReader that actually reads the file to delegate property.

(3)

Define a bean for the ItemReader that actually reads the file.

5.3.3. How To Extend

Here, extension methods are explained based on the cases below.

5.3.3.1. Implementation of FieldSetMapper

This section describes how to implement your own FieldSetMapper.

Implement FieldSetMapper class as follows.

  • Implement FieldSetMapper class and override mapFieldSet method.

  • Get the values from the FieldSet argument, perform any required processing, and then set them on the conversion target bean to be returned

    • The FieldSet class is a class that holds data in association with an index or name, as in the JDBC ResultSet class

    • The FieldSet class holds the value of each field of a record divided by LineTokenizer

    • You can store and retrieve values by specifying an index or name

Below is a sample implementation for reading a file containing data that needs conversion, such as comma-separated BigDecimal values and dates in Japanese calendar format.

Input File Sample
"000001","平成28年1月1日","000000001","1,000,000,000"
"000002","平成29年2月2日","000000002","2,000,000,000"
"000003","平成30年3月3日","000000003","3,000,000,000"
Input file specification
No Field Name Data Type Note

(1)

branchId

String

(2)

date

Date

Japanese calendar format

(3)

customerId

String

(4)

amount

BigDecimal

include comma

Class to be converted
public class UseDateSalesPlanDetail {

    private String branchId;
    private Date date;
    private String customerId;
    private BigDecimal amount;

    // omitted getter/setter
}
Implementation Sample of FieldSetMapper
@Component
public class UseDateSalesPlanDetailFieldSetMapper implements FieldSetMapper<UseDateSalesPlanDetail> {  // (1)
    /**
     * {@inheritDoc}
     *
     * @param fieldSet {@inheritDoc}
     * @return Sales performance detail.
     * @throws BindException {@inheritDoc}
     */
    @Override
    public UseDateSalesPlanDetail mapFieldSet(FieldSet fieldSet) throws BindException {
        UseDateSalesPlanDetail item = new UseDateSalesPlanDetail();  // (2)

        item.setBranchId(fieldSet.readString("branchId"));  // (3)

        // (4)
        DateFormat japaneseFormat = new SimpleDateFormat("GGGGy年M月d日", new Locale("ja", "JP", "JP"));
        try {
            item.setDate(japaneseFormat.parse(fieldSet.readString("date")));
        } catch (ParseException e) {
            // omitted exception handling
        }

        // (5)
        item.setCustomerId(fieldSet.readString("customerId"));

        // (6)
        DecimalFormat decimalFormat = new DecimalFormat();
        decimalFormat.setParseBigDecimal(true);
        try {
            item.setAmount((BigDecimal) decimalFormat.parse(fieldSet.readString("amount")));
        } catch (ParseException e) {
            // omitted exception handling
        }

        return item;  // (7)
    }
}
Item list of setting contents
No Description

(1)

Implement FieldSetMapper class and override mapFieldSet method. Set conversion target class for type argument of FieldSetMapper.

(2)

Define a variable of conversion target class to store converted data.

(3)

Get branchId from argument’s FieldSet, and store it to conversion target class variable.
Conversion for branchId is not done in the sample since it is not necessary.

(4)

Get date from argument’s FieldSet, and store it to conversion target class variable.
Use SimpleDateFormat to convert Japanese calendar format date to Date type value.

(5)

Get customerId from argument’s FieldSet, and store it to conversion target class variable.
Conversion for customerId is not done in the sample since it is not necessary.

(6)

Get amount from argument’s FieldSet, and store it to conversion target class variable.
Use DecimalFormat to convert value with comma to BigDecimal type value.

(7)

Return the conversion target class holding the processing result.

Getting value from FieldSet class

The FieldSet class has methods corresponding to various data types for obtaining stored values, such as those listed below.
If the data was stored in association with field names when the FieldSet was generated, values can be retrieved either by specifying the name or by specifying the index.

  • readString()

  • readInt()

  • readBigDecimal()

etc
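
The two conversions used in the mapFieldSet sample, Japanese-era dates via SimpleDateFormat and comma-grouped amounts via DecimalFormat, rely only on JDK classes and can be checked in isolation. This standalone sketch (hypothetical class name; the DecimalFormat locale is fixed to Locale.US for reproducibility, whereas the guideline's sample uses the default locale) mirrors those two steps:

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class FieldConversionSketch {

    // Parses a Japanese-calendar date such as "平成28年1月1日" (Heisei 28 = 2016).
    public static Date parseJapaneseDate(String value) {
        SimpleDateFormat japaneseFormat =
                new SimpleDateFormat("GGGGy年M月d日", new Locale("ja", "JP", "JP"));
        try {
            return japaneseFormat.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a Japanese calendar date: " + value, e);
        }
    }

    // Parses a comma-grouped amount such as "1,000,000,000" into a BigDecimal.
    public static BigDecimal parseAmount(String value) {
        DecimalFormat decimalFormat = (DecimalFormat) NumberFormat.getInstance(Locale.US);
        decimalFormat.setParseBigDecimal(true);
        try {
            return (BigDecimal) decimalFormat.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a numeric amount: " + value, e);
        }
    }
}
```

The variant Locale("ja", "JP", "JP") (with the "JP" variant) selects the Japanese imperial calendar, which is what lets the era designator GGGG match "平成".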

5.3.3.2. XML File

Describe the definition method when dealing with XML files.

For the conversion process between beans and XML (O/X (Object/XML) mapping), use the libraries provided by Spring Framework.
Marshaller and Unmarshaller implementation classes using XStream, JAXB, and other libraries are provided for converting between XML files and objects.
Use the one suitable for your situation.

Below are features and points for adopting JAXB and XStream.

JAXB
  • Specify the bean to be converted in the bean definition file

  • Validation using a schema file can be performed

  • It is useful when the schema is defined externally and the specification of the input file is strictly determined

XStream
  • You can map XML elements and bean fields flexibly in the bean definition file

  • It is useful when you need to flexibly map beans

Here is a sample using JAXB.

5.3.3.2.1. Input

For inputting XML file, use org.springframework.batch.item.xml.StaxEventItemReader provided by Spring Batch.
StaxEventItemReader can read the XML file by mapping the XML file to the bean using the specified Unmarshaller.

Implement StaxEventItemReader as follows.

  • Add @XmlRootElement to the conversion target class of XML root element

  • Set below property to StaxEventItemReader

    • Set the file to read to property resource

    • Set the name of the root element to property fragmentRootElementName

    • Set org.springframework.oxm.jaxb.Jaxb2Marshaller to property unmarshaller

  • Set below property to Jaxb2Marshaller

    • Set conversion target classes in list format to property classesToBeBound

    • To validate using schema file, set the 2 properties as below

      • Set the schema file for validation to property schema

      • Set an implementation class of ValidationEventHandler to property validationEventHandler to handle events occurring during validation

Here is the sample setting to read the input file below.

Input File Sample
<?xml version="1.0" encoding="UTF-8"?>
<records>
    <SalesPlanDetail>
        <branchId>000001</branchId>
        <year>2016</year>
        <month>1</month>
        <customerId>0000000001</customerId>
        <amount>1000000000</amount>
    </SalesPlanDetail>
    <SalesPlanDetail>
        <branchId>000002</branchId>
        <year>2017</year>
        <month>2</month>
        <customerId>0000000002</customerId>
        <amount>2000000000</amount>
    </SalesPlanDetail>
    <SalesPlanDetail>
        <branchId>000003</branchId>
        <year>2018</year>
        <month>3</month>
        <customerId>0000000003</customerId>
        <amount>3000000000</amount>
    </SalesPlanDetail>
</records>
Class to be converted
@XmlRootElement(name = "SalesPlanDetail")  // (1)
public class SalesPlanDetailToJaxb {

    private String branchId;
    private int year;
    private int month;
    private String customerId;
    private BigDecimal amount;

    // omitted getter/setter
}
Item list of setting contents
No Description

(1)

Add @XmlRootElement annotation to make this as the root tag of XML.
Set SalesPlanDetail for the tag name.

The setting for reading the above file is as follows.

Bean definition
<!-- (1) (2) (3) -->
<bean id="reader"
      class="org.springframework.batch.item.xml.StaxEventItemReader" scope="step"
      p:resource="file:#{jobParameters['inputFile']}"
      p:fragmentRootElementName="SalesPlanDetail"
      p:strict="true">
    <property name="unmarshaller">  <!-- (4) -->
        <!-- (5) (6) -->
        <bean class="org.springframework.oxm.jaxb.Jaxb2Marshaller"
              p:schema="file:files/test/input/ch05/fileaccess/SalesPlanDetail.xsd"
              p:validationEventHandler-ref="salesPlanDetailValidationEventHandler">
            <property name="classesToBeBound">  <!-- (7) -->
                <list>
                    <value>org.terasoluna.batch.functionaltest.ch05.fileaccess.model.plan.SalesPlanDetailToJaxb</value>
                </list>
            </property>
        </bean>
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the input file.

Nothing

(2)

fragmentRootElementName

Set the name of the root element.
If there are several target objects, use fragmentRootElementNames.

Nothing

(3)

strict

If true is set, an exception occurs if the input file does not exist (cannot be opened).

true

(4)

unmarshaller

Set the unmarshaller.
Set Bean of org.springframework.oxm.jaxb.Jaxb2Marshaller when using JAXB.

Nothing

(5)

schema

Set the schema file for validation.

(6)

validationEventHandler

Set an implementation class of ValidationEventHandler to handle events occurring during validation.
A sample implementation of ValidationEventHandler is described later.

(7)

classesToBeBound

Set conversion target classes in list format.

Nothing

Sample implementation of ValidationEventHandler
@Component
// (1)
public class SalesPlanDetailValidationEventHandler implements ValidationEventHandler {
    /**
     * Logger.
     */
    private static final Logger logger =
            LoggerFactory.getLogger(SalesPlanDetailValidationEventHandler.class);

    @Override
    public boolean handleEvent(ValidationEvent event) {
        // (2)
        logger.error("[EVENT [SEVERITY:{}] [MESSAGE:{}] [LINKED EXCEPTION:{}]" +
                " [LOCATOR: [LINE NUMBER:{}] [COLUMN NUMBER:{}] [OFFSET:{}]" +
                " [OBJECT:{}] [NODE:{}] [URL:{}] ] ]",
                event.getSeverity(),
                event.getMessage(),
                event.getLinkedException(),
                event.getLocator().getLineNumber(),
                event.getLocator().getColumnNumber(),
                event.getLocator().getOffset(),
                event.getLocator().getObject(),
                event.getLocator().getNode(),
                event.getLocator().getURL());
        return false;  // (3)
    }
}
Item list of setting contents
No Description

(1)

Implement ValidationEventHandler class and override handleEvent method.

(2)

Get the event information from the ValidationEvent argument, and perform any required processing.
In the sample, the event information is logged.

(3)

Return true to continue the unmarshalling process; return false to terminate it.
When false is returned, the current operation ends by throwing an appropriate UnmarshalException, ValidationException or MarshalException.

Adding dependency library

The following library dependency needs to be added when using Spring Object/XML Marshalling provided by Spring Framework, such as org.springframework.oxm.jaxb.Jaxb2Marshaller.

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-oxm</artifactId>
</dependency>
5.3.3.2.2. Output

Use org.springframework.batch.item.xml.StaxEventItemWriter provided by Spring Batch for outputting XML file.
StaxEventItemWriter can output an XML file by mapping the bean to XML using the specified Marshaller.

Implement StaxEventItemWriter as follows.

  • Do the below setting to conversion target class

    • Add @XmlRootElement to the class as it is to be the root element of the XML

    • Use @XmlType annotation to set orders for outputting fields

    • If there is a field to be excluded from conversion to XML, add @XmlTransient to the getter method of that field

  • Set below properties to StaxEventItemWriter

    • Set output target file to property resource

    • Set org.springframework.oxm.jaxb.Jaxb2Marshaller to property marshaller

  • Set below property to Jaxb2Marshaller

    • Set conversion target classes in list format to property classesToBeBound

Here is a sample for outputting below file.

Output file example
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <Customer>
    <customerId>001</customerId>
    <customerName>CustomerName001</customerName>
    <customerAddress>CustomerAddress001</customerAddress>
    <customerTel>11111111111</customerTel>
    <chargeBranchId>001</chargeBranchId></Customer>
  <Customer>
    <customerId>002</customerId>
    <customerName>CustomerName002</customerName>
    <customerAddress>CustomerAddress002</customerAddress>
    <customerTel>11111111111</customerTel>
    <chargeBranchId>002</chargeBranchId></Customer>
  <Customer>
    <customerId>003</customerId>
    <customerName>CustomerName003</customerName>
    <customerAddress>CustomerAddress003</customerAddress>
    <customerTel>11111111111</customerTel>
    <chargeBranchId>003</chargeBranchId>
  </Customer>
</records>
About XML file format (line breaks and indents)

In the sample above, the output XML file is shown formatted (with line breaks and indents), but the actual output XML will not be formatted.

Jaxb2Marshaller has a function to format the XML when outputting the file, but it does not work as expected.
This issue is being discussed in the Spring Forum, and might be fixed in the future.

To avoid this and output the formatted XML, set marshallerProperties as below.

<property name="marshaller">
    <bean class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
        <property name="classesToBeBound">
            <!-- omitted settings -->
        </property>
        <property name="marshallerProperties">
            <map>
                <entry>
                    <key>
                        <util:constant
                            static-field="javax.xml.bind.Marshaller.JAXB_FORMATTED_OUTPUT"/>
                    </key>
                    <value type="java.lang.Boolean">true</value>
                </entry>
            </map>
        </property>
    </bean>
</property>
Class to be converted
@XmlRootElement(name = "Customer")  // (1)
@XmlType(propOrder={"customerId", "customerName", "customerAddress",
        "customerTel", "chargeBranchId"})  // (2)
public class CustomerToJaxb {

    private String customerId;
    private String customerName;
    private String customerAddress;
    private String customerTel;
    private String chargeBranchId;
    private Timestamp createDate;
    private Timestamp updateDate;

    // omitted getter/setter

    @XmlTransient  // (3)
    public Timestamp getCreateDate() { return createDate; }

    @XmlTransient  // (3)
    public Timestamp getUpdateDate() { return updateDate; }
}
Item list of setting contents
No Description

(1)

Add @XmlRootElement annotation to make this as the root tag of XML.
Set Customer for the tag name.

(2)

Use @XmlType annotation to set field output order.

(3)

Add @XmlTransient to the getter methods of fields that are to be excluded from XML conversion.

The settings for writing the above file are as follows.

Bean definition
<!-- (1) (2) (3) (4) (5) (6) -->
<bean id="writer"
      class="org.springframework.batch.item.xml.StaxEventItemWriter" scope="step"
      p:resource="file:#{jobParameters['outputFile']}"
      p:encoding="MS932"
      p:rootTagName="records"
      p:overwriteOutput="true"
      p:shouldDeleteIfEmpty="false"
      p:transactional="true">
    <property name="marshaller">  <!-- (7) -->
        <bean class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
            <property name="classesToBeBound">  <!-- (8) -->
                <list>
                    <value>org.terasoluna.batch.functionaltest.ch05.fileaccess.model.mst.CustomerToJaxb</value>
                </list>
            </property>
        </bean>
    </property>
</bean>
Item list of setting contents
No Property Name Setting contents Required Default Value

(1)

resource

Set the output file.

Nothing

(2)

encoding

Set the character encoding for the output file.
The default character encoding of the components offered by Spring Batch differs between ItemReader and ItemWriter (the default for ItemReader is the default character set of the JavaVM).
Hence, it is recommended to explicitly set the character encoding even when using the default value.

UTF-8

(3)

rootTagName

Set XML root tag name.

(4)

overwriteOutput

If true, delete the file if it already exists.
If false, throw an exception if the file already exists.

true

(5)

shouldDeleteIfEmpty

If true, delete the file for output if output count is 0.
Since unintended behaviour is likely to happen by combining with other properties, it is recommended not to set it to true. For details