1. Introduction
1.1. Terms of Use
In order to use this document, you are required to agree to abide by the following terms. If you do not agree with the terms, you must immediately delete or destroy this document and all its duplicate copies.
- Copyrights and all other rights of this document belong to NTT DATA or the third parties possessing such rights.
- This document may be reproduced, translated or adapted, in whole or in part, for personal use. However, deletion of the terms given on this page and of the copyright notice of NTT DATA is prohibited.
- This document may be changed, in whole or in part, for personal use. Creation of secondary work using this document is allowed. However, a statement such as "Reference document: TERASOLUNA Batch Framework for Java (5.x) Development Guideline" or an equivalent note must be included in the created document and its duplicate copies.
- The document and its duplicate copies created according to Clause 2 may be provided to a third party only if they are free of cost.
- Use of this document and its duplicate copies, and transfer of the rights of this contract to a third party, in whole or in part, beyond the conditions specified in this contract, are prohibited without the written consent of NTT DATA.
- NTT DATA shall not bear any responsibility regarding the correctness of the contents of this document, warranty of fitness for a usage purpose, assurance of the accuracy and reliability of usage results, liability for defect warranty, or any damage incurred directly or indirectly.
- NTT DATA does not guarantee that this document does not infringe the copyrights or any other rights of third parties. In addition, NTT DATA shall not bear any responsibility regarding any claim (including claims arising from disputes with third parties) that occurs directly or indirectly due to infringement of copyrights or other rights.
The registered trademarks or trademarks of the company names, service names and product names used in this document are as follows.
- TERASOLUNA is a registered trademark of NTT DATA Corporation.
- All other company names and product names are the registered trademarks or trademarks of their respective companies.
1.2. Introduction
1.2.1. Goal of guideline
This guideline provides best practices for developing highly maintainable batch applications using a full-stack framework centering on Spring Framework, Spring Batch and MyBatis.
This guideline helps software development (mainly coding) proceed smoothly.
1.2.2. Target readers
This guideline is written for architects and programmers who have software development experience and knowledge of the following.
- Basic knowledge of DI and AOP in Spring Framework
- Application development experience using Java
- Knowledge of SQL
- Experience in using Maven
This guideline is not for beginners.
In order to check whether you have enough basic knowledge to understand the document, refer to the Spring Framework Comprehension Check. If you are not able to answer 40% of the comprehension test, it is recommended to study books on these topics separately.
1.2.3. Structure of guideline
The most important point is that this guideline is positioned as a subset of the TERASOLUNA Server Framework for Java (5.x) Development Guideline (hereafter referred to as the TERASOLUNA Server 5.x Development Guideline). Relying on the TERASOLUNA Server 5.x Development Guideline eliminates duplicated explanations and reduces the learning cost as much as possible. Since references to the TERASOLUNA Server 5.x Development Guideline appear throughout, we would like you to proceed with development using both guides.
- TERASOLUNA Batch Framework for Java (5.x) concept
Explains the basic concept of batch processing, the basic concept of TERASOLUNA Batch Framework for Java (5.x) and the overview of Spring Batch.
- Methodology of application development
Explains the knowledge and methods to keep in mind while developing an application using TERASOLUNA Batch Framework for Java (5.x).
- Running a Job
Explains how to run a job synchronously or asynchronously and how to provide job parameters.
- Input/output of data
Explains how to perform input/output to various resources such as databases and files.
- Handling for abnormal condition
Explains how to handle abnormal conditions such as input check errors and exceptions.
- Job management
Explains how to manage job execution.
- Flow control and parallel/multiple processing
Explains flow control and parallel/multiple job execution.
1.2.4. How to read guideline
It is strongly recommended for all the developers to read the following contents for using TERASOLUNA Batch Framework for Java (5.x).
The following contents are usually required, so they should be read in advance. It is better to select according to the development target.
Refer to the following contents for the first time when proceeding with the implementation.
1.2.4.1. Notations in guideline
This section describes the notations used in this guideline.
- About Windows command prompt and Unix terminal
If the command syntax differs between Windows and Unix, both are described. Otherwise, the notation is standardized to Unix.
- Prompt sign
The Unix prompt is described as $.
$ java -version
- About defining properties and constructors in Bean definitions
In this guideline, Bean definitions are described by using the p and c namespaces. The use of namespaces helps simplify and clarify the description of Bean definitions.
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
    <property name="lineTokenizer">
        <bean class="org.terasoluna.batch.item.file.transform.FixedByteLengthLineTokenizer"
              c:ranges="1-6, 7-10, 11-12, 13-22, 23-32"
              c:charset="MS932"
              p:names="branchId,year,month,customerId,amount"/>
    </property>
</bean>
For reference, the equivalent description without using namespaces is shown below.
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
    <property name="lineTokenizer">
        <bean class="org.terasoluna.batch.item.file.transform.FixedByteLengthLineTokenizer">
            <constructor-arg index="0" value="1-6, 7-10, 11-12, 13-22, 23-32"/>
            <constructor-arg index="1" value="MS932"/>
            <property name="names" value="branchId,year,month,customerId,amount"/>
        </bean>
    </property>
</bean>
This guideline does not force the use of namespaces; they are used here to simplify the explanation.
1.2.5. Tested environments of guideline
For tested environments of contents described in this guideline, refer to " Tested Environment".
2. TERASOLUNA Batch Framework for Java (5.x) concept
2.1. Batch Processing in General
2.1.1. Introduction to Batch Processing
The term of "Batch Processing" refers to the execution or process of a series of jobs in a computer program without manual intervention (non-interactive).
It is often a process of reading, processing and writing a large number of records from a database or a file.
Batch processing consists of following features and is a processing method which prioritizes process throughput than the responsiveness, as compared to online processing.
-
Process large number of data is collected and processed.
-
Uninterruptible process for certainty of time is done in a fixed sequence.
-
Process runs in accordance with the schedule.
The objectives of batch processing are given below.
- Enhanced throughput
Process throughput can be enhanced by processing a set of data collectively.
Instead of inputting or outputting data to a file or database one record at a time, data is summed up into a fixed quantity, dramatically reducing the overhead of waiting for I/O and increasing efficiency. Even though the waiting period for the I/O of a single record is insignificant, its cumulative accumulation while processing a large amount of data results in a fatal delay.
- Ensuring responsiveness
Processes which are not required to be completed immediately are carved out for batch processing in order to ensure the responsiveness of online processing.
For example, when process results are not required immediately, online processing handles only the acceptance of the request and batch processing is performed in the background. This processing method is generally called "delayed processing".
- Response to time and events
Processes corresponding to specific periods and events are naturally implemented by batch processing.
For example, aggregating business data per month on the following 1st weekend, or taking a backup every Sunday at 2 a.m. in accordance with the system operation rules, and so on.
- Restrictions in coordination with external systems
Batch processing is also used due to restrictions of interfaces, such as files, in interactions with external systems.
A file sent from an external system is a summary of data collected over a certain period. Batch processing is better suited than online processing for processes which incorporate such files.
It is very common to combine various techniques to achieve batch processing. The major techniques are introduced here.
- Job Scheduler
A single execution unit of batch processing is called a job. A job scheduler is middleware to manage jobs.
A batch system rarely has only a few jobs; the number of jobs can usually reach hundreds or even thousands. Hence, an exclusive system to define the relationships between jobs and manage their execution schedule becomes indispensable.
- Shell script
One of the methods to implement a job. A process is achieved by combining commands provided by the OS and middleware.
Although this method can be implemented easily, it is not suitable for writing complex business logic. Hence, it is primarily used for simple processes like copying a file, taking a backup or clearing a table. Further, a shell script often performs only the pre-start settings and post-execution processing, and delegates the actual processing to a program implemented in another programming language.
- Programming language
One of the methods to implement a job. Structured code can be written, which is advantageous compared to shell scripts for securing development productivity, maintainability and quality. Hence, it is commonly used to implement business logic that processes data of files or databases, which tends to be relatively complex.
2.1.2. Requirements for batch processing
Requirements that batch processing must satisfy in order to implement business processes are given below.
- Performance improvement
  - A certain quantity of data can be processed in a batch.
  - Jobs can be executed in parallel / in multiple processes.
- Recovery in case of an abnormality
  - Jobs can be re-executed (manually / by schedule).
  - At the time of reprocessing, it is possible to process only unprocessed records by skipping processed records.
- Various activation methods for running jobs
  - Synchronous execution is possible.
  - Asynchronous execution is possible.
    - DB polling or HTTP requests can be used as triggers for execution.
- Various input and output interfaces
  - Database
  - File
    - Variable length such as CSV or TSV
    - Fixed length
    - XML
Specific details for the above requirements are given below.
- A large amount of data can be efficiently processed using certain resources (performance improvement)
Processing time is reduced by processing data collectively. The important point here is "certain resources".
Whether 100 records or 1 million records are processed, processing is done with a certain amount of CPU and memory, and ideally the processing time grows slowly and linearly according to the number of records. A transaction is started and terminated for a certain number of records so that they are processed collectively, and the resources to be used must be levelled in order to perform I/O collectively.
Still, when an even larger amount of data is to be handled, a mechanism which uses hardware resources up to the limit goes a step further: the data to be processed is divided into records or groups, and multiple processing is done by using multiple processes and multiple threads. Going further, distributed processing using multiple machines is also implemented. When resources are used up to the limit, reducing processing time as much as possible becomes extremely important.
- Continue the processing as much as possible (recovery at the time of occurrence of an abnormality)
When a large amount of data is to be processed, countermeasures for when an abnormality occurs in the input data or in the system itself must be considered.
A large amount of data takes a long time to finish processing, and if the time to recover after an error occurs is prolonged, it is likely to affect the system a great deal.
For example, consider data consisting of 1 billion records to be processed. The operation schedule would obviously be affected a great deal if an error were detected at the 999 millionth record and the processing done so far had to be performed all over again.
To control this impact, process continuity unique to batch processing becomes very important. Hence, a mechanism wherein error data is skipped and the next data record is processed, a mechanism to restart the process, and a mechanism which attempts auto-recovery become necessary. Further, it is important to simplify a job as much as possible so that it can easily be executed again later.
- Can be executed flexibly according to triggers of execution (various activation methods)
A mechanism to respond to various execution triggers is necessary, whether the trigger is time, an online request, or coordination with an external system. Various mechanisms are widely known, such as synchronous processing wherein processing starts when the job scheduler reaches the scheduled time, and asynchronous processing wherein the process is kept resident and batch processing is performed in response to events.
- Handles various input and output interfaces (various input and output interfaces)
It is important to handle various files such as CSV and XML as well as databases for linking with online and external systems. Further, if a method which transparently handles each input and output method exists, implementation becomes easier and dealing with various formats becomes quicker.
2.1.3. Rules and precautions to be considered in batch processing
Important rules for building a batch processing system and a few considerations are shown below.
- Simplify single batch processes as much as possible and avoid complex logical structures.
- Keep process and data in physical proximity (store data at the location where the process is executed).
- Minimise the use of system resources (especially I/O) and execute operations in memory as much as possible.
  - Further, review the I/O of the application (SQL, etc.) to avoid unnecessary physical I/O.
- Do not repeat the same process for multiple jobs.
  - For example, in the case of a counting and reporting process, avoid repeating the counting process during the reporting process.
- Always assume the worst situation related to data consistency. Verify data to check and to maintain consistency.
- Review backups carefully. The difficulty level of backup will be high especially when the system is operational seven days a week.
2.2. TERASOLUNA Batch Framework for Java (5.x) stack
2.2.1. Overview
The configuration of TERASOLUNA Batch Framework for Java (5.x) and its scope of responsibility are explained.
2.2.2. TERASOLUNA Batch Framework for Java (5.x) stack
The software framework used in TERASOLUNA Batch Framework for Java (5.x) is a combination of OSS products centering on Spring Framework (Spring Batch). A stack schematic diagram of TERASOLUNA Batch Framework for Java (5.x) is shown below.
Descriptions for products like job scheduler and database are excluded from this guideline.
2.2.2.1. OSS version to be used
The list of OSS versions used in 5.0.0.RELEASE of TERASOLUNA Batch Framework for Java (5.x) is given below.
The OSS versions used in TERASOLUNA Batch Framework for Java (5.x), as a rule, conform to the definition of the Spring IO platform. Note that the version of the Spring IO platform in 5.0.0.RELEASE is Athens-SR2. For details of the Spring IO platform, refer to "OSS version to be used" of TERASOLUNA Server Framework for Java (5.x).
Type | GroupId | ArtifactId | Version | Spring IO platform | Remarks |
---|---|---|---|---|---|
Spring | org.springframework | spring-aop | 4.3.5.RELEASE | * | |
Spring | org.springframework | spring-beans | 4.3.5.RELEASE | * | |
Spring | org.springframework | spring-context | 4.3.5.RELEASE | * | |
Spring | org.springframework | spring-expression | 4.3.5.RELEASE | * | |
Spring | org.springframework | spring-core | 4.3.5.RELEASE | * | |
Spring | org.springframework | spring-tx | 4.3.5.RELEASE | * | |
Spring | org.springframework | spring-jdbc | 4.3.5.RELEASE | * | |
Spring Batch | org.springframework.batch | spring-batch-core | 3.0.7.RELEASE | * | |
Spring Batch | org.springframework.batch | spring-batch-infrastructure | 3.0.7.RELEASE | * | |
Spring Retry | org.springframework.retry | spring-retry | 1.1.5.RELEASE | * | |
Java Batch | javax.batch | javax.batch-api | 1.0.1 | * | |
Java Batch | com.ibm.jbatch | com.ibm.jbatch-tck-spi | 1.0 | * | |
MyBatis3 | org.mybatis | mybatis | 3.4.2 | | |
MyBatis3 | org.mybatis | mybatis-spring | 1.3.1 | | |
MyBatis3 | org.mybatis | mybatis-typehandlers-jsr310 | 1.0.2 | | |
DI | javax.inject | javax.inject | 1 | * | |
Log output | ch.qos.logback | logback-classic | 1.1.8 | * | |
Log output | ch.qos.logback | logback-core | 1.1.8 | * | *1 |
Log output | org.slf4j | jcl-over-slf4j | 1.7.22 | * | |
Log output | org.slf4j | slf4j-api | 1.7.22 | * | |
Input check | javax.validation | validation-api | 1.1.0.Final | * | |
Input check | org.hibernate | hibernate-validator | 5.2.4.Final | * | |
Input check | org.jboss.logging | jboss-logging | 3.3.0.Final | * | *1 |
Input check | com.fasterxml | classmate | 1.3.3 | * | *1 |
Connection pool | org.apache.commons | commons-dbcp2 | 2.1.1 | * | |
Connection pool | org.apache.commons | commons-pool2 | 2.4.2 | * | |
Expression Language | org.glassfish | javax.el | 3.0.0 | * | |
In-memory database | com.h2database | h2 | 1.4.193 | * | |
XML | com.thoughtworks.xstream | xstream | 1.4.9 | * | *1 |
XML | xmlpull | xmlpull | 1.1.3.1 | | *1 |
XML | xpp | xpp3_min | 1.1.4c | | *1 |
JSON | org.codehaus.jettison | jettison | 1.2 | * | *1 |

*1: Libraries which individually depend on libraries supported by the Spring IO platform.
2.2.3. Structural elements of TERASOLUNA Batch Framework for Java (5.x)
Software Framework structural elements of TERASOLUNA Batch Framework for Java (5.x) are explained.
Overview of each element is shown below.
- Foundation framework
Spring Framework is used as the framework foundation. Various functions are used, starting with the DI container.
- Batch framework
Spring Batch is used as the batch framework.
- Asynchronous execution
The following functions are used as methods of asynchronous execution.
  - Periodic activation by using DB polling
    A library offered by TERASOLUNA Batch Framework for Java (5.x) is used.
  - Web container activation
    Linked with Spring Batch by using Spring MVC.
- O/R Mapper
MyBatis is used, and MyBatis-Spring is used as the library to coordinate with Spring Framework.
- File access
In addition to the functions offered by Spring Batch, TERASOLUNA Batch Framework for Java (5.x) provides auxiliary functions.
- Logging
The logger uses SLF4J for the API and Logback for the implementation.
- Validation
  - Unit item check
    Bean Validation is used for unit item checks and Hibernate Validator is used for the implementation.
  - Correlation check
    Bean Validation or Spring Validation is used for correlation checks.
- Connection pool
DBCP is used for the connection pool.
2.2.3.1. Functions for which TERASOLUNA Batch Framework for Java (5.x) provides an implementation
The functions for which TERASOLUNA Batch Framework for Java (5.x) provides an implementation are given below.
Function name |
Overview |
Asynchronous execution using DB polling is implemented. |
|
Read fixed-length file without line breaks by number of bytes. |
|
Break down a fixed length record in individual field by number of bytes. |
|
Control output of enclosed characters by variable length records. |
2.3. Spring Batch Architecture
2.3.1. Overview
The Spring Batch architecture acting as a base for TERASOLUNA Batch Framework for Java (5.x) is explained.
2.3.1.1. What is Spring Batch
Spring Batch, as the name implies, is a batch application framework. The following functions are offered, building on the DI container, AOP and transaction control functions of Spring.
- Functions to standardize the process flow
  - Tasklet model
    - Simple process
      A method to describe a process freely. It is used in simple cases such as issuing SQL once or issuing a command, and in complex cases which are difficult to standardize, such as processing while accessing multiple databases or files.
  - Chunk model
    - Efficient processing of a large amount of data
      A method to collectively input / process / output a fixed amount of data. The process flow of data input / processing / output is standardized, and a job can be implemented by implementing only parts of it.
- Various activation methods
Execution is achieved by various triggers such as command line execution, execution on a Servlet, and other triggers.
- I/O of various data formats
Input and output for various data resources such as files, databases, and message queues can be performed easily.
- Efficient processing
Multiple execution, parallel execution and conditional branching are done based on the settings.
- Job execution control
Persistence of execution status and restart operations based on the number of processed data records can be performed.
2.3.1.2. Hello, Spring Batch!
If you have never used Spring Batch, read the official documentation given below before studying the Spring Batch architecture. We would like you to get used to Spring Batch by creating a simple application.
2.3.1.3. Basic structure of Spring Batch
Basic structure of Spring Batch is explained.
Spring Batch defines structure of batch process. It is recommended to perform development after understanding the structure.
Components | Roles |
---|---|
Job | A single execution unit that summarises a series of processes for a batch application in Spring Batch. |
Step | A unit of processing which constitutes a Job. One Job can contain 1 to N Steps. |
JobLauncher | An interface for running a Job. |
ItemReader / ItemProcessor / ItemWriter | Interfaces for dividing chunk-model processing into three phases: input, processing and output of data. In the Tasklet model, a single Tasklet interface implementation substitutes for ItemReader/ItemProcessor/ItemWriter. |
JobRepository | A system to manage the state of Jobs and Steps. The management information is persisted in a database based on the table schema specified by Spring Batch. |
2.3.2. Architecture
Basic structure of Spring Batch is briefly explained in Overview.
Following points are explained on this basis.
In the end, performance tuning points of batch application which use Spring Batch are explained.
2.3.2.1. Overall process flow
The primary components of Spring Batch and the overall process flow are explained. Further, how the metadata of the job execution status is managed is also explained.
Primary components of Spring Batch and overall process flow (chunk model) are shown in the figure below.
The main processing flow (black line) and the flow which persists job information (red line) are explained.
- JobLauncher is initiated from the job scheduler.
- Job is executed from JobLauncher.
- Step is executed from Job.
- Step fetches input data by using ItemReader.
- Step processes the input data by using ItemProcessor.
- Step outputs the processed data by using ItemWriter.
- JobLauncher registers a JobInstance in the database through JobRepository.
- JobLauncher registers that the Job execution has started in the database through JobRepository.
- Job and Step update miscellaneous information such as counts of I/O records and status in the database through JobRepository.
- JobLauncher registers that the Job execution has completed in the database through JobRepository.
The components related to persistence, centering on JobRepository, are explained again below.
Components | Roles |
---|---|
JobInstance | Indicates a "logical" execution of a Job. A JobInstance is identified by the Job name and arguments. In other words, an execution with an identical Job name and arguments is identified as the execution of an identical JobInstance, and the Job is executed as a continuation of the previous activation. |
JobExecution | Indicates a "physical" execution of a Job. Unlike JobInstance, a re-execution of an identical Job is treated as another JobExecution. As a result, JobInstance and JobExecution have a one-to-many relationship. |
StepExecution | Indicates a "physical" execution of a Step. JobExecution and StepExecution have a one-to-many relationship. |
JobRepository | Provides a function to manage and persist the data for managing the execution results and status of batch applications, such as JobExecution and StepExecution. |
Spring Batch heavily manages metadata in order to perform re-execution. In order to re-execute a batch process, a snapshot of the earlier execution must be retained, and the metadata and JobRepository serve as the basis for this.
2.3.2.2. Running a Job
How to run a Job is explained.
A scenario is considered wherein a batch process is started immediately after starting a Java process, and the Java process is terminated after completing the batch process. The figure below shows the process flow from starting a Java process to starting a batch process.
To start a Job defined on Spring Batch, a shell script that starts a Java process is generally written. When CommandLineJobRunner offered by Spring Batch is used, a Job on Spring Batch defined by the user can be easily started.
Start command of the Job which use CommandLineJobRunner is as shown below.
java -cp ${CLASSPATH} org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <jobName> <JobArgumentName1>=<value1> <JobArgumentName2>=<value2> ...
CommandLineJobRunner can pass arguments (job parameters) along with the name of the Job to be started.
Arguments are specified in the <JobArgumentName>=<Value> format, as in the example described earlier.
After being interpreted and checked by CommandLineJobRunner or JobLauncher, all the arguments are converted into JobParameters and stored in JobExecution.
For details, refer to running parameters of a Job.
JobLauncher fetches from JobRepository the JobInstance matching the Job name and arguments from the database.
- When a corresponding JobInstance does not exist, the JobInstance is registered as new.
- When a corresponding JobInstance exists, the associated JobExecution is restored.
In Spring Batch, for jobs that are executed repeatedly, such as daily executions, a method of adding arguments to make the JobInstance unique is used; for example, adding a system date or a random number to the arguments.
For the method recommended in this guideline, refer to the parameter conversion class.
2.3.2.3. Execution of business logic
In Spring Batch, a Job is divided into smaller units called Steps. When a Job is started, the Job activates its already registered Steps and generates a StepExecution. A Step is only a framework for dividing the process, and the execution of the business logic is delegated to a Tasklet that is called from the Step.
The flow from Step to Tasklet is shown below.
Two methods can be listed as implementation methods of a Tasklet: the "chunk model" and the "Tasklet model". Since the overview has already been explained, the structure is explained here.
2.3.2.3.1. Chunk model
As described above, chunk model is a method wherein the processing is performed in a certain number of units (chunks) rather than processing the data to be processed one by one unit. ChunkOrientedTasklet acts as a concrete class of Tasklet which supports the chunk processing. Maximum records of data to be included in the chunk (hereafter referred as "chunk size") can be adjusted by using setup value called commit-interval of this class. ItemReader, ItemProcessor and ItemWriter are all the interfaces based on chunk processing.
Next, explanation is given about how ChunkOrientedTasklet calls the ItemReader, ItemProcessor and ItemWriter.
A sequence diagram wherein ChunkOrientedTasklet processes one chunk is shown below.
ChunkOrientedTasklet repeatedly executes ItemReader and ItemProcessor, that is, reading and processing of data, until the chunk size is reached. After completing the reading of all the data in the chunk, the data writing process of ItemWriter is called only once, and all the processed data in the chunk is passed to it. The data update processing is designed to be called once per chunk so that batching mechanisms such as addBatch and executeBatch of JDBC can be organized easily.
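The following is a minimal conceptual sketch of how one chunk is processed, assuming generic reader/processor/writer instances; it is not the actual Spring Batch implementation (which lives in ChunkOrientedTasklet) and only illustrates the read/process repetition and the single write call described above.

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

public class ChunkFlowSketch {

    // Conceptual illustration of processing a single chunk.
    public static <I, O> void processOneChunk(ItemReader<I> reader,
                                              ItemProcessor<I, O> processor,
                                              ItemWriter<O> writer,
                                              int commitInterval) throws Exception {
        List<O> outputs = new ArrayList<>();
        for (int i = 0; i < commitInterval; i++) {
            I item = reader.read();                  // read one input record
            if (item == null) {                      // null signals that input data is exhausted
                break;
            }
            O processed = processor.process(item);   // apply business logic to the item
            if (processed != null) {                 // a null result means the item is filtered out
                outputs.add(processed);
            }
        }
        writer.write(outputs);                       // the whole chunk is written in a single call
    }
}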
Next, ItemReader, ItemProcessor and ItemWriter which are responsible for actual processing in chunk processing are introduced. Although it is assumed that the user handles his own implementation for each interface, it can also be covered by a generic concrete class provided by Spring Batch.
Especially, since ItemProcessor describes the business logic itself, the concrete classes are hardly provided by Spring Batch. ItemProcessor interface is implemented while describing the business logic. ItemProcessor is designed to allow types of objects used in I/O to be specified in respective generics so that typesafe programming is enabled.
An implementation example of a simple ItemProcessor is shown below.
public class MyItemProcessor implements
ItemProcessor<MyInputObject, MyOutputObject> { // (1)
@Override
public MyOutputObject process(MyInputObject item) throws Exception { // (2)
MyOutputObject processedObject = new MyOutputObject(); // (3)
// Coding business logic for item of input data
return processedObject; // (4)
}
}
Sr. No. | Description |
---|---|
(1) | Implement the ItemProcessor interface, specifying the types of the objects used for input and output. |
(2) | Implement the process method. The argument item receives the input data. |
(3) | Create an output object and store the results of the business logic processed for the input data item. |
(4) | Return the output object. |
Various concrete classes are offered by Spring Batch for ItemReader or ItemWriter and these are used quite frequently. However, when a file of specific format is to be input or output, a concrete class which implements individual ItemReader or ItemWriter can be created and used.
For implementation of business logic while developing actual application, refer application development flow.
Finally, representative concrete classes of ItemReader, ItemProcessor and ItemWriter offered by Spring Batch are shown below.

Interface | Concrete class name | Overview |
---|---|---|
ItemReader | FlatFileItemReader | Reads flat files (non-structured files) such as CSV files. A Resource object is used as input, and mapping rules for delimiters and objects can be customised. |
ItemReader | StaxEventItemReader | Reads XML files. As the name implies, it is an implementation which reads XML files based on StAX. |
ItemReader | JdbcPagingItemReader | Executes SQL by using JDBC and reads records from the database. When a large amount of data is to be processed on the database, it is necessary to avoid reading all records into memory, and to read and discard only the data necessary for one processing. |
ItemReader | MyBatisCursorItemReader | Reads records from the database in coordination with MyBatis. It is provided by MyBatis-Spring, the Spring coordination library offered by MyBatis. Apart from using MyBatis for the implementation, the difference between Paging and Cursor is the same as for the JdbcXXXItemReader classes. In addition, JpaPagingItemReader, HibernatePagingItemReader and HibernateCursorItemReader, which read records from the database by coordinating with a JPA implementation or Hibernate, are also provided. |
ItemReader | JmsItemReader | Receives messages from JMS or AMQP and reads the data contained in them. |
ItemProcessor | PassThroughItemProcessor | No operation is performed. It is used when processing and modification of the input data is not required. |
ItemProcessor | ValidatingItemProcessor | Performs input checks. It is necessary to implement the Spring Batch specific org.springframework.batch.item.validator.Validator for the implementation of input check rules; however, an adaptor for the general-purpose Validator of Spring is also provided. |
ItemProcessor | CompositeItemProcessor | Sequentially executes multiple ItemProcessors for identical input data. It is useful when business logic is to be executed after performing input checks using ValidatingItemProcessor. |
ItemWriter | FlatFileItemWriter | Writes processed Java objects as a flat file such as a CSV file. Mapping rules for file lines can be customised with delimiters and objects. |
ItemWriter | StaxEventItemWriter | Writes processed Java objects as an XML file. |
ItemWriter | JdbcBatchItemWriter | Executes SQL by using JDBC and outputs processed Java objects to the database. JdbcTemplate is used internally. |
ItemWriter | MyBatisBatchItemWriter | Coordinates with MyBatis and outputs processed Java objects to the database. It is provided by MyBatis-Spring, the Spring coordination library offered by MyBatis. |
ItemWriter | JmsItemWriter | Sends messages of processed Java objects with JMS or AMQP. |

PassThroughItemProcessor omitted: When a job is defined in XML, the ItemProcessor setting can be omitted. When it is omitted, the input data is passed to ItemWriter without performing any operation, just as with PassThroughItemProcessor.
2.3.2.3.2. Tasklet model
The chunk model is a framework suited to batch applications that read multiple input data records one by one and perform a series of processing. However, some processes do not fit the style of chunk processing, for example when a system command is to be executed or when only one record of a control table is to be updated.
In such cases, the efficiency gained by chunk processing is small and the disadvantages caused by its more difficult design and implementation are significant. Hence, it is rational to use the Tasklet model.
When using the Tasklet model, the user must implement the Tasklet interface provided by Spring Batch. Further, the following concrete classes are provided by Spring Batch, but they are not described further in TERASOLUNA Batch 5.x.

Class name | Overview |
---|---|
SystemCommandTasklet | Tasklet for executing system commands asynchronously. The command to be executed is specified in the command property. |
MethodInvokingTaskletAdapter | Tasklet for executing specific methods of a POJO class. Specify the Bean of the target class in the targetObject property and the name of the method to be executed in the targetMethod property. |
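As a reference, a minimal Tasklet implementation might look like the following sketch. The class name and the business logic are hypothetical examples; only the Tasklet interface and RepeatStatus are Spring Batch APIs.

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.stereotype.Component;

@Component
public class MyTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution,
                                ChunkContext chunkContext) throws Exception {
        // Describe the business logic here, e.g. updating a control table
        // or invoking a system command.

        return RepeatStatus.FINISHED; // FINISHED ends the step after a single execution
    }
}

If such a class is registered as a Bean named myTasklet, it can be referenced from a Tasklet model job definition (ref="myTasklet") as shown later in this guideline.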
2.3.2.4. Metadata schema of JobRepository
The metadata schema of JobRepository is explained.
Note that the overall picture is explained here, including the contents described in the Spring Batch reference, Appendix B. Meta-Data Schema.
Each Spring Batch metadata table corresponds to a domain object (Entity object) represented in Java.

Table | Entity object | Overview |
---|---|---|
BATCH_JOB_INSTANCE | JobInstance | Retains the job name and the string which serialises the job parameters. |
BATCH_JOB_EXECUTION | JobExecution | Retains the job status and execution results. |
BATCH_JOB_EXECUTION_PARAMS | JobExecutionParams | Retains the job parameters assigned at startup. |
BATCH_JOB_EXECUTION_CONTEXT | JobExecutionContext | Retains the context inside the job. |
BATCH_STEP_EXECUTION | StepExecution | Retains the status and execution results of steps, and the numbers of commits and rollbacks. |
BATCH_STEP_EXECUTION_CONTEXT | StepExecutionContext | Retains the context inside the step. |
JobRepository is responsible for accurately storing the contents stored in each Java object, in the table.
An ERD model of all 6 tables and their interrelations is shown below.
2.3.2.4.1. Version
The majority of the database tables contain a version column. This column is important since Spring Batch adopts an optimistic locking strategy to handle updates to the database. A record is regarded as updated when the value of its version column is incremented. When JobRepository tries to update a record and the version number has been changed in the meantime, an OptimisticLockingFailureException indicating the occurrence of a simultaneous access error is thrown. Batch jobs may be running on different machines; however, all the jobs use the same database, hence this check is required.
2.3.2.4.2. ID (Sequence) definition
BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION each contain a column ending with _ID.
These fields act as primary keys for their respective tables.
However, these keys are not generated by the database but by separate sequences.
After inserting one of the domain objects into the database, the key assigned to it should be set on the actual object so that it can be uniquely identified in Java.
Sequences may not be supported depending on the database. In this case, a table is used instead of each sequence.
2.3.2.4.3. Table definition
Explanation is given for each table item.
BATCH_JOB_INSTANCE table retains all the information related to JobInstance and is at top level of the overall hierarchy.
Column name | Description |
---|---|
JOB_INSTANCE_ID | Primary key; a unique ID identifying the instance. |
VERSION | Refer to Version. |
JOB_NAME | Job name. A non-null value, since it is necessary for identifying the instance. |
JOB_KEY | JobParameters serialised for uniquely distinguishing executions of the same job as different instances. |
BATCH_JOB_EXECUTION table retains all the information related to JobExecution object. When a job is executed, new rows are always registered in the table with new JobExecution.
Column name | Description |
---|---|
JOB_EXECUTION_ID | Primary key that uniquely identifies this job execution. |
VERSION | Refer to Version. |
JOB_INSTANCE_ID | Foreign key from the BATCH_JOB_INSTANCE table which shows the instance to which the job execution belongs. Multiple executions are likely to exist for each instance. |
CREATE_TIME | Time when the job execution was created. |
START_TIME | Time when the job execution was started. |
END_TIME | Time when the job execution was terminated, regardless of whether it was successful or failed. |
STATUS | A character string which indicates the job execution status. It is the character string output by the BatchStatus enumeration object. |
EXIT_CODE | A character string which indicates the exit code of the job execution. When it is activated by CommandLineJobRunner, it can be converted to a numeric value. |
EXIT_MESSAGE | A character string which indicates a detailed explanation of the job end status. When a failure occurs, a character string that includes as much of the stack trace as possible is likely. |
LAST_UPDATED | Time when the job execution of the record was last updated. |
The BATCH_JOB_EXECUTION_PARAMS table retains all the information related to the JobParameters object. It contains zero or more key/value pairs passed to the job and records the parameters with which the job was executed.

Column name | Description |
---|---|
JOB_EXECUTION_ID | Foreign key from the BATCH_JOB_EXECUTION table which indicates the job execution to which the job parameter belongs. |
TYPE_CD | A character string which indicates whether the data type is string, date, long or double. |
KEY_NAME | Parameter key. |
STRING_VAL | Parameter value when the data type is string. |
DATE_VAL | Parameter value when the data type is date. |
LONG_VAL | Parameter value when the data type is an integer. |
DOUBLE_VAL | Parameter value when the data type is a real number. |
IDENTIFYING | A flag which indicates whether the parameter is a value used to identify the job instance as unique. |
Constraints of job parameter
BATCH_JOB_EXECUTION_CONTEXT table retains all the information related to ExecutionContext of Job. It contains all the job level data required for execution of specific jobs. The data indicates the status that must be fetched when the process is to be executed again after a job failure and enables the failed job to start from the point where processing has stopped.
Column name | Description |
---|---|
JOB_EXECUTION_ID | Foreign key from the BATCH_JOB_EXECUTION table which indicates the job execution to which the ExecutionContext of the Job belongs. |
SHORT_CONTEXT | A string representation of SERIALIZED_CONTEXT. |
SERIALIZED_CONTEXT | The overall serialised context. |
The BATCH_STEP_EXECUTION table retains all the information related to the StepExecution object. This table is very similar to the BATCH_JOB_EXECUTION table in many ways. When each JobExecution is created, at least one entry exists for each Step.

Column name | Description |
---|---|
STEP_EXECUTION_ID | Primary key that uniquely identifies the step execution. |
VERSION | Refer to Version. |
STEP_NAME | Step name. |
JOB_EXECUTION_ID | Foreign key from the BATCH_JOB_EXECUTION table which indicates the JobExecution to which the StepExecution belongs. |
START_TIME | Time when the step execution was started. |
END_TIME | Time when the step execution ended, regardless of whether it was successful or failed. |
STATUS | A character string that represents the status of the step execution. It is the character string output by the BatchStatus enumeration object. |
COMMIT_COUNT | Number of times a transaction was committed. |
READ_COUNT | Number of data records read by ItemReader. |
FILTER_COUNT | Number of data records filtered by ItemProcessor. |
WRITE_COUNT | Number of data records written by ItemWriter. |
READ_SKIP_COUNT | Number of data records skipped by ItemReader. |
WRITE_SKIP_COUNT | Number of data records skipped by ItemWriter. |
PROCESS_SKIP_COUNT | Number of data records skipped by ItemProcessor. |
ROLLBACK_COUNT | Number of times a transaction was rolled back. |
EXIT_CODE | A character string which indicates the exit code for the step execution. When it is activated by using CommandLineJobRunner, it can be converted to a numeric value. |
EXIT_MESSAGE | A character string which indicates a detailed explanation of the end state of the step. When a failure occurs, a character string that includes as much of the stack trace as possible is likely. |
LAST_UPDATED | Time when the step execution of the record was last updated. |
BATCH_STEP_EXECUTION_CONTEXT table retains all the information related to ExecutionContext of Step. It contains all the step level data required for execution of specific steps. The data indicates the status that must be fetched when the process is to be executed again after a job failure and enables the failed job to start from the point where processing has stopped.
Column name | Description |
---|---|
STEP_EXECUTION_ID | Foreign key from the BATCH_STEP_EXECUTION table which indicates the step execution to which the ExecutionContext of the Step belongs. |
SHORT_CONTEXT | A string representation of SERIALIZED_CONTEXT. |
SERIALIZED_CONTEXT | The overall serialized context. |
2.3.2.4.4. DDL script
The JAR file of Spring Batch Core contains sample scripts which create the relational tables corresponding to several database platforms.
These scripts can be used as they are, or additional indexes and constraints can be added as required.
The scripts are included in the package org.springframework.batch.core, and the file names follow the pattern schema-*.sql, where "*" is the short name of the target database platform.
2.3.2.5. Typical performance tuning points
Typical performance tuning points in Spring Batch are explained.
- Adjustment of chunk size
The chunk size is increased to reduce the overhead caused by resource output.
However, if the chunk size is too large, it increases the load on the resources, resulting in deteriorated performance. Hence, the chunk size must be adjusted to a moderate value.
- Adjustment of fetch size
The fetch size (buffer size) for the resource is increased to reduce the overhead caused by input from resources.
- Reading a file efficiently
When BeanWrapperFieldSetMapper is used, a record can be mapped to a Bean only by sequentially specifying the Bean class and property names. However, it performs complex operations internally and takes time. Processing time can be reduced by using a dedicated FieldSetMapper interface implementation which performs the mapping (a sketch is shown after this list).
For file I/O details, refer to File access.
- Parallel processing, multiple processing
Spring Batch supports parallel processing of Step executions and multiple processing by data partitioning. Performance can be improved by running the processes in parallel. However, if the number of parallel or multiple processes is too large, the load on the resources increases, resulting in deteriorated performance. Hence, the number must be adjusted to a moderate value.
For details of parallel and multiple processing, refer to "Parallel processing and multiple processing".
- Reviewing distributed processing
Spring Batch also supports distributed processing across multiple machines. The guidelines are the same as for parallel and multiple processing.
Distributed processing is not explained in this guideline since the basic design and operational design are complex.
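The following is a minimal sketch of a dedicated FieldSetMapper implementation, referred to in "Reading a file efficiently" above. The SalesPlanDetail class and its accessors are hypothetical examples (field names follow the tokenizer example shown earlier in this guideline); only FieldSetMapper and FieldSet are Spring Batch APIs.

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

public class SalesPlanDetailFieldSetMapper implements FieldSetMapper<SalesPlanDetail> {

    // Map one record (FieldSet) to the target Bean directly, without reflection.
    @Override
    public SalesPlanDetail mapFieldSet(FieldSet fieldSet) throws BindException {
        SalesPlanDetail item = new SalesPlanDetail();
        item.setBranchId(fieldSet.readString("branchId"));   // field names correspond to the
        item.setYear(fieldSet.readInt("year"));               // "names" setting of the tokenizer
        item.setMonth(fieldSet.readInt("month"));
        item.setCustomerId(fieldSet.readString("customerId"));
        item.setAmount(fieldSet.readBigDecimal("amount"));
        return item;
    }
}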
2.4. Architecture of TERASOLUNA Batch Framework for Java (5.x)
2.4.1. Overview
Overall architecture of TERASOLUNA Batch Framework for Java (5.x) is explained.
TERASOLUNA Batch Framework for Java (5.x), as described in "Batch Processing in General", is implemented by a combination of OSS products centering on Spring Batch.
A configuration schematic diagram of TERASOLUNA Batch Framework for Java (5.x), including the layered architecture of Spring Batch, is shown below.
- Business Application
All job definitions and business logic written by developers.
- spring batch core
The core runtime classes required to start and control batch jobs, offered by Spring Batch.
- spring batch infrastructure
Implementations of the general-purpose ItemReader/ItemProcessor/ItemWriter offered by Spring Batch, which are used by developers and by the core framework itself.
2.4.2. Structural elements of job
A configuration schematic diagram of jobs is shown below in order to explain structural elements of the job.
This section also provides guidelines on how finely jobs and steps should be divided.
2.4.2.1. Job
A job is an entity that encapsulates entire batch process and is a container for storing steps.
A job can consist of one or more steps.
A job is defined in a Bean definition file by using XML. Multiple jobs can be defined in one job definition file; however, managing the jobs tends to become complex.
Hence, TERASOLUNA Batch Framework for Java (5.x) uses following guidelines.
1 job = 1 job definition file
2.4.2.2. Step
A step defines the information required for controlling a batch process. Either a chunk model or a Tasklet model can be defined in a step.
- Chunk model
  - It is configured by an ItemReader, an ItemProcessor and an ItemWriter.
- Tasklet model
  - It is configured only by a Tasklet.
As given in "Rules and precautions to be considered in batch processing", it is necessary to simplify a single batch process as much as possible and avoid complex logical structures.
Hence, TERASOLUNA Batch Framework for Java (5.x) uses the following guideline.
1 step = 1 batch process = 1 business logic
Division of business logic in the chunk model
If a single business logic is complex and large-scale, the business logic is divided into units. As is clear from the schematic diagram, since only one ItemProcessor can be set in one step, it looks like division of the business logic is not possible. However, since CompositeItemProcessor, an ItemProcessor consisting of multiple ItemProcessors, exists, the business logic can be divided and executed by using this implementation (see the sketch below).
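As a reference, the following is a minimal sketch of composing two ItemProcessors with CompositeItemProcessor in Java. The delegate processors and the MyInputObject/MyOutputObject types are the same hypothetical examples used in the earlier ItemProcessor sample; in this guideline jobs are normally defined in XML Bean definition files, so this only illustrates the composition idea.

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;

public class CompositeProcessorSketch {

    // Compose an input-check processor and a business-logic processor into one ItemProcessor.
    public static ItemProcessor<MyInputObject, MyOutputObject> compose(
            ItemProcessor<MyInputObject, MyInputObject> validationProcessor,
            ItemProcessor<MyInputObject, MyOutputObject> businessLogicProcessor) throws Exception {

        List<ItemProcessor<?, ?>> delegates = new ArrayList<>();
        delegates.add(validationProcessor);      // executed first
        delegates.add(businessLogicProcessor);   // executed with the output of the previous delegate

        CompositeItemProcessor<MyInputObject, MyOutputObject> composite = new CompositeItemProcessor<>();
        composite.setDelegates(delegates);
        composite.afterPropertiesSet();          // validate the configuration when not managed by the container
        return composite;
    }
}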
2.4.3. How to implement Step
2.4.3.1. Chunk model
Definition of chunk model and purpose of use are explained.
- Definition
Set the ItemReader, ItemProcessor and ItemWriter implementations and the chunk size in ChunkOrientedTasklet. The respective roles are as follows.
  - ChunkOrientedTasklet: calls ItemReader/ItemProcessor to create a chunk, and passes the created chunk to ItemWriter.
  - ItemReader: reads the input data.
  - ItemProcessor: processes the read data.
  - ItemWriter: outputs the processed data in chunk units.
For an overview of the chunk model, refer to Chunk model.
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet>
            <batch:chunk reader="reader"
                         processor="processor"
                         writer="writer"
                         commit-interval="100" />
        </batch:tasklet>
    </batch:step>
</batch:job>
- Purpose of use
Since it handles a certain amount of data collectively, it is used when handling a large amount of data.
2.4.3.2. Tasklet model
The definition of the Tasklet model and its purpose of use are explained.
- Definition
Only a Tasklet implementation is set.
For an overview of the Tasklet model, refer to Tasklet model.
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet ref="myTasklet"/>
    </batch:step>
</batch:job>
- Purpose of use
It can be used for executing a process which is not associated with I/O, such as the execution of system commands.
Further, it can also be used when committing the data all at once.
2.4.3.3. Function difference between the chunk model and the Tasklet model
The functional differences between the chunk model and the Tasklet model are explained. Only an outline is given here; refer to the section for each function for details.

Function | Chunk model | Tasklet model |
---|---|---|
Structural elements | Configured by ItemReader/ItemProcessor/ItemWriter/ChunkOrientedTasklet. | Configured only by a Tasklet. |
Transaction | A transaction is generated per chunk. | Processed in one transaction. |
Recommended reprocessing method | Re-run and re-start can be used. | As a rule, only re-run is used. |
Exception handling | Handling becomes easier by using a listener. Individual implementation is also possible. | Individual implementation is required. |
2.4.4. Running a job method
The methods of running a job are explained here: synchronous execution and asynchronous execution.
The respective methods are explained below.
2.4.4.1. Synchronous execution method
Synchronous execution method is an execution method wherein the control is not given back to the boot source from job start to job completion.
A schematic diagram which starts a job from job scheduler is shown.
- The job scheduler starts a shell script to run the job. The job scheduler waits until an exit code (numeric value) is returned.
- The shell script starts CommandLineJobRunner to run the job. The shell script waits until CommandLineJobRunner returns an exit code (numeric value).
- CommandLineJobRunner runs the job. The job returns an exit code (string) to CommandLineJobRunner after the processing is completed. CommandLineJobRunner converts the exit code (string) returned from the job into an exit code (numeric value) and returns it to the shell script.
2.4.4.2. Asynchronous execution method
Asynchronous execution method is an execution method wherein the control is given back to boot source immediately after running a job, by executing a job on a different execution base than boot source (a separate thread etc). In this method, it is necessary to fetch job execution results by a means different from that of running a job.
Following 2 methods are explained in TERASOLUNA Batch Framework for Java (5.x).
Other asynchronous execution methods
Asynchronous execution can also be performed by using messaging such as MQ; however, since the job execution points are identical, the description is omitted in this guideline.
2.4.4.2.1. Asynchronous execution method (DB polling)
Asynchronous execution (DB polling) is a method wherein a job execution request is registered in the database, polling of the request is done and job is executed.
TERASOLUNA Batch Framework for Java (5.x) supports DB polling function. The schematic diagram of start by DB polling offered is shown.
- The user registers a job request in the database.
- The DB polling function periodically monitors the registration of job requests and executes the corresponding job when a registration is detected.
  - The job is run from SimpleJobOperator and a JobExecutionId is received after the job is started.
  - JobExecutionId is an ID which uniquely identifies a job execution, and the execution results are browsed from JobRepository by using this ID.
  - Job execution results are registered in JobRepository by the Spring Batch mechanism.
  - DB polling itself is executed asynchronously.
- The DB polling function updates the job request that it started, with the JobExecutionId returned from SimpleJobOperator and the status.
- The progress and results of the job are referred to separately by using the JobExecutionId.
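As a reference, the progress and results of a job can be looked up from JobRepository with the JobExecutionId, for example through Spring Batch's JobExplorer as in the following sketch. This is an illustrative example, not the TERASOLUNA polling implementation itself; the jobExplorer instance is assumed to be obtained from the application context.

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;

public class JobResultSketch {

    // Look up the execution state persisted in JobRepository by JobExecutionId.
    public static String describe(JobExplorer jobExplorer, long jobExecutionId) {
        JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
        if (jobExecution == null) {
            return "No execution found for id " + jobExecutionId;
        }
        BatchStatus status = jobExecution.getStatus();        // e.g. STARTED, COMPLETED, FAILED
        ExitStatus exitStatus = jobExecution.getExitStatus(); // exit code string of the job
        return status + " / " + exitStatus.getExitCode();
    }
}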
2.4.4.2.2. Asynchronous execution method (Web container)
Asynchronous execution (Web container) is a method wherein a job is executed asynchronously, with a request sent to a web application on a web container as the trigger. The web application can return a response immediately after starting the job, without waiting for the job to end.
- A request is sent from a client to the web application.
- The web application asynchronously executes the job requested by the request.
  - The JobExecutionId is received from SimpleJobOperator immediately after the job is started.
  - Job execution results are registered in JobRepository by the Spring Batch mechanism.
- The web application returns a response to the client without waiting for the job to end.
- The progress and results of the job are browsed separately by using the JobExecutionId.
Further, it can also be linked with a web application built with TERASOLUNA Server Framework for Java (5.x).
2.4.5. Points to consider while using
Points to consider while using TERASOLUNA Batch Framework for Java (5.x) are shown.
- Running a job method
  - Synchronous execution method
    It is used when a job is run as per a schedule and batch processing is carried out by combining multiple jobs.
  - Asynchronous execution method (DB polling)
    It is used for delayed processing, continuous execution of jobs with a short processing time, and aggregation of a large quantity of jobs.
  - Asynchronous execution method (Web container)
    Similar to DB polling; however, it is used when an immediate action is required at startup.
- Implementation method
  - Chunk model
    It is used when a large quantity of data is to be processed efficiently.
  - Tasklet model
    It is used for simple processing, processing that is difficult to standardize, and for processes wherein data is to be processed collectively.
3. Methodology of application development
3.1. Development of batch application
The development of batch application is explained in the following flow.
3.1.1. What is blank project
A blank project is a template development project wherein various settings such as Spring Batch and MyBatis3 are made in advance, and it is the starting point of application development.
In this guideline, a blank project with a single project structure is provided.
Refer to Project structure for the explanation of structure.
Difference from TERASOLUNA Server 5.x
A multi-project structure is recommended for TERASOLUNA Server 5.x, mainly in order to enjoy the following merits.
However, in this guideline, a single-project structure is provided unlike TERASOLUNA Server 5.x. This point should be considered for batch applications as well; however, by providing a single-project structure, priority is given to easy access to the resources related to one job.
3.1.2. Creation of project
How to create a project using archetype:generate of the Maven Archetype Plugin is explained.
Regarding prerequisites of the creation environment, it is assumed that the prerequisites (see Tested environments) are satisfied.
Execute the following command in the directory where the project is to be created.
C:\xxx> mvn archetype:generate^
-DarchetypeGroupId=org.terasoluna.batch^
-DarchetypeArtifactId=terasoluna-batch-archetype^
-DarchetypeVersion=5.0.0.RELEASE
$ mvn archetype:generate \
-DarchetypeGroupId=org.terasoluna.batch \
-DarchetypeArtifactId=terasoluna-batch-archetype \
-DarchetypeVersion=5.0.0.RELEASE
Next, set the following values in interactive mode according to the user's situation.
- groupId
- artifactId
- version
- package
An example of setting the values and executing is shown below.
Item name | Setting example |
---|---|
groupId | com.example.batch |
artifactId | batch |
version | 1.0.0-SNAPSHOT |
package | com.example.batch |
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:2.4:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:2.4:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO]
[INFO] --- maven-archetype-plugin:2.4:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Interactive mode
(.. omitted)
Define value for property 'groupId': : com.example.batch
Define value for property 'artifactId': : batch
Define value for property 'version': 1.0-SNAPSHOT: : 1.0.0-SNAPSHOT
Define value for property 'package': com.example.batch: :
Confirm properties configuration:
groupId: com.example.batch
artifactId: batch
version: 1.0.0-SNAPSHOT
package: com.example.batch
Y: : y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: terasoluna-batch-archetype:5.0.0-SNAPSHOT
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.example.batch
[INFO] Parameter: artifactId, Value: batch
[INFO] Parameter: version, Value: 1.0.0-SNAPSHOT
[INFO] Parameter: package, Value: com.example.batch
[INFO] Parameter: packageInPathFormat, Value: com/example/batch
[INFO] Parameter: package, Value: com.example.batch
[INFO] Parameter: version, Value: 1.0.0-SNAPSHOT
[INFO] Parameter: groupId, Value: com.example.batch
[INFO] Parameter: artifactId, Value: batch
[INFO] project created from Archetype in dir: C:\workspaces\zzz\batch
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:56 min
[INFO] Finished at: 2017-02-07T17:09:52+09:00
[INFO] Final Memory: 16M/240M
[INFO] ------------------------------------------------------------------------
The creation of the project is completed by the above execution.
Whether the project was created properly can be confirmed by the following points.
$ mvn clean dependency:copy-dependencies -DoutputDirectory=lib package
$ java -cp 'lib/*:target/*' org.springframework.batch.core.launch.support.CommandLineJobRunner \
META-INF/jobs/job01/job01.xml job01
It is created properly if the following output is obtained.
$ mvn clean dependency:copy-dependencies -DoutputDirectory=lib package
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building TERASOLUNA Batch Framework for Java (5.x) Blank Project 1.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
(.. omitted)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.618 s
[INFO] Finished at: 2017-02-07T17:32:27+09:00
[INFO] Final Memory: 26M/250M
[INFO] ------------------------------------------------------------------------
$ java -cp 'lib/*:target/*' org.springframework.batch.core.launch.support.CommandLineJobRunner META-INF/jobs/job01/job01.xml job01
[2017/02/07 17:35:26] [main] [o.s.c.s.ClassPathXmlApplicationContext] [INFO ] Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@62043840: startup date [Tue Feb 07 17:35:26 JST 2017]; root of context hierarchy
(.. omitted)
[2017/02/07 17:35:27] [main] [o.s.b.c.l.s.SimpleJobLauncher] [INFO ] Job: [FlowJob: [name=job01]] launched with the following parameters: [{jsr_batch_run_id=1}]
[2017/02/07 17:35:27] [main] [o.s.b.c.j.SimpleStepHandler] [INFO ] Executing step: [job01.step01]
[2017/02/07 17:35:27] [main] [o.s.b.c.l.s.SimpleJobLauncher] [INFO ] Job: [FlowJob: [name=job01]] completed with the following parameters: [{jsr_batch_run_id=1}] and the following status: [COMPLETED]
[2017/02/07 17:35:27] [main] [o.s.c.s.ClassPathXmlApplicationContext] [INFO ] Closing org.springframework.context.support.ClassPathXmlApplicationContext@62043840: startup date [Tue Feb 07 17:35:26 JST 2017]; root of context hierarchy
3.1.3. Project structure
The project structure created above is explained. The project structure is designed with the following points in mind.
-
Implement the job independent of startup method
-
Save the efforts of performing various settings such as Spring Batch, MyBatis
-
Make the environment dependent switching easy
The structure is shown and each element is explained below.
(For ease of understanding, the explanation is based on the output of the mvn archetype:generate execution shown above.)
Sr. No. | Explanation |
---|---|
(1) |
root package that stores various classes of the entire batch application. |
(2) |
Package that stores various classes of 1 job. You can customize it with reference to default state however, consider making it easier to judge the resources specific to job. |
(3) |
Configuration file of the entire batch application. |
(4) |
Configuration file of Logback(log output). |
(5) |
Configuration file that defines messages to be displayed when an error occurs during the input check using BeanValidation. |
(6) |
Mapper XML file that pairs with Mapper interface of MyBatis3. |
(7) |
Property file that defines messages used mainly for log output. |
(8) |
Directory that stores job-specific Bean definition file. |
(9) |
Directory that stores Bean definition file related to the entire batch application. |
(10) |
Bean definition file that describes settings related to asynchronous execution (DB polling) function. |
(11) |
Bean definition file to reduce various settings by importing in a job-specific Bean definition file. |
(12) |
Bean definition file for setting Spring Batch behavior and common jobs. |
Relation figure of each file is shown below.
3.1.4. Flow of development
The series of steps for developing a job is explained.
The focus here is on understanding the overall flow rather than on detailed explanations.
3.1.4.1. Import to IDE
Since the generated project follows the Maven project structure,
import it as a Maven project into the IDE of your choice.
Detailed procedures are omitted.
3.1.4.2. Setting of entire application
Customize the following according to the user's situation.
How to customize settings other than these is explained in the section for each individual function.
3.1.4.2.1. Project information of pom.xml
As the following information is set with temporary values in the POM of the project, the values should be set according to the actual situation.
-
Project name(name element)
-
Project description(description element)
-
Project URL(url element)
-
Project inception year(inceptionYear element)
-
Project license(licenses element)
-
Project organization(organization element)
3.1.4.2.2. Database related settings
Database-related settings appear in several places, so each place should be modified.
<!-- (1) -->
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<scope>runtime</scope>
</dependency>
# (2)
# Admin DataSource settings.
admin.jdbc.driver=org.h2.Driver
admin.jdbc.url=jdbc:h2:mem:batch-admin;DB_CLOSE_DELAY=-1
admin.jdbc.username=sa
admin.jdbc.password=
# (2)
# Job DataSource settings.
#jdbc.driver=org.postgresql.Driver
#jdbc.url=jdbc:postgresql://localhost:5432/postgres
#jdbc.username=postgres
#jdbc.password=postgres
jdbc.driver=org.h2.Driver
jdbc.url=jdbc:h2:mem:batch;DB_CLOSE_DELAY=-1
jdbc.username=sa
jdbc.password=
# (3)
# Spring Batch schema initialize.
data-source.initialize.enabled=true
spring-batch.schema.script=classpath:org/springframework/batch/core/schema-h2.sql
terasoluna-batch.commit.script=classpath:org/terasoluna/batch/async/db/schema-commit.sql
<!-- (3) -->
<jdbc:initialize-database data-source="adminDataSource"
enabled="${data-source.initialize.enabled:false}"
ignore-failures="ALL">
<jdbc:script location="${spring-batch.schema.script}" />
<jdbc:script location="${terasoluna-batch.commit.script}" />
</jdbc:initialize-database>
<!-- (4) -->
<bean id="adminDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
destroy-method="close"
p:driverClassName="${admin.jdbc.driver}"
p:url="${admin.jdbc.url}"
p:username="${admin.jdbc.username}"
p:password="${admin.jdbc.password}"
p:maxTotal="10"
p:minIdle="1"
p:maxWaitMillis="5000"
p:defaultAutoCommit="false"/>
<!-- (4) -->
<bean id="jobDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
destroy-method="close"
p:driverClassName="${jdbc.driver}"
p:url="${jdbc.url}"
p:username="${jdbc.username}"
p:password="${jdbc.password}"
p:maxTotal="10"
p:minIdle="1"
p:maxWaitMillis="5000"
p:defaultAutoCommit="false" />
<!-- (5) -->
<bean id="jobSqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean"
p:dataSource-ref="jobDataSource" >
<property name="configuration">
<bean class="org.apache.ibatis.session.Configuration"
p:localCacheScope="STATEMENT"
p:lazyLoadingEnabled="true"
p:aggressiveLazyLoading="false"
p:defaultFetchSize="1000"
p:defaultExecutorType="REUSE" />
</property>
</bean>
<!-- (5) -->
<bean id="adminSqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean"
p:dataSource-ref="adminDataSource" >
<property name="configuration">
<bean class="org.apache.ibatis.session.Configuration"
p:localCacheScope="STATEMENT"
p:lazyLoadingEnabled="true"
p:aggressiveLazyLoading="false"
p:defaultFetchSize="1000"
p:defaultExecutorType="REUSE" />
</property>
</bean>
Sr. No. | Explanation |
---|---|
(1) |
In pom.xml, define dependency relation of JDBC driver for connecting to the database to be used. |
(2) |
Set JDBC driver connection. |
(3) |
Define whether or not to execute the initialization of database used by Spring Batch or TERASOLUNA Batch 5.x, and the script to be used. |
(4) |
Set datasource. |
(5) |
Set MyBatis behavior. |
3.1.6. Build and execution of project
Build and execution of project is explained.
3.1.6.1. Build of application
Move to the root directory of the project and execute the following command.
$ mvn clean dependency:copy-dependencies -DoutputDirectory=lib package
The following is generated by this.
-
<Root directory>/target/<archetypeId>-<version>.jar
-
Jar of the created batch application is generated
-
-
<Root directory>/lib/(Dependent Jar file)
-
A set of dependent Jar files is copied
-
When deploying to the test environment and the commercial environment, these Jar files can be copied to an arbitrary directory.
3.1.6.2. Switching of configuration file according to the environment
In the pom.xml of the project, the following Profile is set as the default value.
<profiles>
<!-- Including application properties and log settings into package. (default) -->
<profile>
<id>IncludeSettings</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<properties>
<exclude-property/>
<exclude-log/>
</properties>
</profile>
<!-- Excluding application properties and log settings into package. -->
<profile>
<id>ExcludeSettings</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<exclude-property>batch-application.properties</exclude-property>
<exclude-log>logback.xml</exclude-log>
</properties>
</profile>
</profiles>
Here, whether to include the environment-dependent configuration files is switched.
By utilizing this setting, environmental differences can be absorbed by placing the configuration files separately at the time of environment deployment.
Moreover, by applying this, the configuration files included in the jar can be changed for the test environment and the commercial environment.
An example is shown below.
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
</resource>
<resource>
<directory>${project.root.basedir}/${project.config.resource.directory.rdbms}</directory>
</resource>
</resources>
</build>
<profiles>
<profile>
<id>postgresql9-local</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<dependencies>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<scope>runtime</scope>
</dependency>
</dependencies>
<properties>
<project.config.resource.directory.rdbms>config/rdbms/postgresql9/local</project.config.resource.directory.rdbms>
</properties>
</profile>
<profile>
<id>postgresql9-it</id>
<dependencies>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<scope>runtime</scope>
</dependency>
</dependencies>
<properties>
<project.config.resource.directory.rdbms>config/rdbms/postgresql9/it</project.config.resource.directory.rdbms>
</properties>
</profile>
</profiles>
A Maven Profile can be activated at command execution as follows.
Multiple Profiles can be activated at the same time; use them effectively whenever required.
$ mvn -P profile-1,profile-2
3.1.6.2.1. Execution of application
An example of executing the job based on the above-mentioned build result is shown.
Replace archetypeId
and version
in accordance with the user’s environment.
C:\xxxx> java -cp target\archetypeId-version.jar;lib\*^
org.springframework.batch.core.launch.support.CommandLineJobRunner^
META-INF/jobs/job01.xml job01
$ java -cp 'target/archetypeId-version.jar:lib/*' \
org.springframework.batch.core.launch.support.CommandLineJobRunner \
META-INF/jobs/job01.xml job01
Necessity of handling the exit code returned by the java command
In an actual system, when a job is launched from the job scheduler, it is common not to issue the java command directly but to launch it through a shell script. This is for setting environment variables before starting the java command and for handling the exit code of the java command.
It is recommended to handle the exit code of the java command in such a shell script. An example of handling the exit code is shown below (Example of handling exit code).
3.2. Creation of chunk model job
3.2.1. Overview
How to create chunk model job is explained. Refer to Spring Batch architecture for the architecture of chunk model.
The components of a chunk model job are explained here.
3.2.1.1. Components
The components of chunk model job are shown below. Implement 1 job by combining these components in job Bean definition file.
Sr. No. | Name | Role | Mandatory settings | Mandatory implementation
---|---|---|---|---
1 | ItemReader | Interface to fetch data from various resources. | |
2 | ItemProcessor | Interface for processing data from input to output. | |
3 | ItemWriter | Interface for the output of data to various resources. | |
The points in this table are as follows.
-
If the data is to be only transferred from input resource to output resource in a simple way, it can be implemented only by setting.
-
ItemProcessor
should be implemented whenever required.
Hereafter, how to implement the job using these components, is explained.
3.2.2. How to use
How to implement chunk model job is explained in the following order here.
3.2.2.1. Job configuration
Define how to combine the elements that constitute a chunk model job in the Bean definition file. An example is shown below and the relation between the components is explained.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:batch="http://www.springframework.org/schema/batch"
xmlns:p="http://www.springframework.org/schema/p"
xmlns:mybatis="http://mybatis.org/schema/mybatis-spring"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
http://mybatis.org/schema/mybatis-spring http://mybatis.org/schema/mybatis-spring.xsd">
<!-- (1) -->
<import resource="classpath:META-INF/spring/job-base-context.xml"/>
<!-- (2) -->
<context:annotation-config/>
<!-- (3) -->
<context:component-scan
base-package="org.terasoluna.batch.functionaltest.app.common" />
<!-- (4) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository.mst"
factory-ref="jobSqlSessionFactory"/>
<!-- (5) -->
<bean id="reader"
class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
p:queryId="org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository.findAll"
p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
<!-- (6) -->
<!-- Item Processor -->
<!-- The ItemProcessor is defined as a Bean by the annotation, so it is not defined here -->
<!-- (7) -->
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter"
scope="step"
p:resource="file:#{jobParameters[outputFile]}">
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="customerId,customerName,customerAddress,customerTel,chargeBranchId"/>
</property>
</bean>
</property>
</bean>
<!-- (8) -->
<batch:job id="jobCustomerList01" job-repository="jobRepository"> <!-- (9) -->
<batch:step id="jobCustomerList01.step01"> <!-- (10) -->
<batch:tasklet transaction-manager="jobTransactionManager"> <!-- (11) -->
<batch:chunk reader="reader"
processor="processor"
writer="writer"
commit-interval="10" /> <!-- (12) -->
</batch:tasklet>
</batch:step>
</batch:job>
</beans>
@Component("processor") // (6)
public class CustomerProcessor implements ItemProcessor<Customer, Customer> {
// omitted
}
Sr. No. | Explanation |
---|---|
(1) |
Import the settings to always read the required Bean definition when using TERASOLUNA Batch 5.x. |
(2) |
Enable Bean definition using annotation. Use it with (3) when implementing ItemProcessor, Listener etc. |
(3) |
Set base package of component scan target. When defining Bean using annotation, use it with (2). |
(5) |
ItemReader configuration. |
(6) |
ItemProcessor can be defined by annotation in (2), (3), so there is no need to define in the Bean definition file. |
(7) |
ItemWriter configuration. |
(8) |
Job configuration. |
(9) |
|
(10) |
Step configuration. |
(11) |
Tasklet configuration. |
(12) |
Chunk model job configuration. |
Tuning of commit-interval
In the above example, commit-interval is set to 10 records; however, the appropriate count differs with the characteristics of the available machine resources and the job. For a job that processes data by accessing multiple resources, the process throughput may peak at around 10 to 100 records. If the input and output resources are in 1:1 correspondence and the job simply transfers data, the process throughput may increase to 5,000 or even 10,000 records. Temporarily set
3.2.2.2. Implementation of components
How to implement mainly ItemProcessor is explained here.
Refer to the following for other components.
-
ItemReader, ItemWriter
-
Listener
3.2.2.2.1. Implementation of ItemProcessor
How to implement ItemProcessor is explained.
ItemProcessor is responsible for creating 1 record data for the output resource based on the 1 record data fetched from the input resource as shown in the interface below. In other words, ItemProcessor is where business logic for 1 record data is implemented.
public interface ItemProcessor<I, O> {
O process(I item) throws Exception;
}
As shown in the interface, the input type I and the output type O can be the same type or different types.
Using the same type means partially modifying the input data.
Using different types means generating output data based on the input data.
@Component
public class AmountUpdateItemProcessor implements
ItemProcessor<SalesPlanDetail, SalesPlanDetail> {
@Override
public SalesPlanDetail process(SalesPlanDetail item) throws Exception {
item.setAmount(new BigDecimal("1000"));
return item;
}
}
@Component
public class UpdateItemFromDBProcessor implements
ItemProcessor<SalesPerformanceDetail, SalesPlanDetail> {
@Inject
CustomerRepository customerRepository;
@Override
public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {
Customer customer = customerRepository.findOne(readItem.getCustomerId());
SalesPlanDetail writeItem = new SalesPlanDetail();
writeItem.setBranchId(customer.getChargeBranchId());
writeItem.setYear(readItem.getYear());
writeItem.setMonth(readItem.getMonth());
writeItem.setCustomerId(readItem.getCustomerId());
writeItem.setAmount(readItem.getAmount());
return writeItem;
}
}
Explanation of return of null from ItemProcessor
Returning null from ItemProcessor means that the data is not passed to the subsequent process (the ItemWriter); in other words, the data is filtered out. This can be used effectively when validating the input data. For details, refer to Input check.
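As an illustration of this filtering behavior, the following is a minimal sketch (the filter condition and the Bean name are assumptions for illustration, not part of the guideline) of an ItemProcessor that returns null so that the record is not passed to the ItemWriter.
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component("filteringProcessor")
public class FilteringItemProcessor implements ItemProcessor<Customer, Customer> {
    @Override
    public Customer process(Customer item) throws Exception {
        // Hypothetical condition: skip records that have no address.
        if (item.getCustomerAddress() == null) {
            return null; // returning null filters the record out; it never reaches the ItemWriter
        }
        return item;
    }
}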
To increase process throughput of ItemProcessor
As shown in the previous implementation example, the implementation class of ItemProcessor may access resources such as a database or files. Since ItemProcessor is executed for every record of the input data, even small I/O per record becomes large I/O for the entire job, so it is important to suppress I/O as much as possible to increase process throughput. One method is to store the required data in memory in advance by utilizing a Listener (described later) and to implement most of the processing in ItemProcessor so that it completes using only CPU and memory. However, since this consumes a large amount of memory per job, not everything can be stored in memory. What to store in memory should be studied based on I/O frequency and data size. This point is also introduced in Input/Output of data.
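As a minimal sketch of this approach (the findAll method on CustomerRepository, the String key type, and the listener wiring are assumptions for illustration), the master data can be loaded once in beforeStep and looked up from memory for each record:
import java.util.HashMap;
import java.util.Map;
import javax.inject.Inject;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope("step")
public class CachedUpdateItemFromDBProcessor implements
        ItemProcessor<SalesPerformanceDetail, SalesPlanDetail>, StepExecutionListener {

    @Inject
    CustomerRepository customerRepository;

    // Master data cached once per step so that no DB access is needed per record.
    private final Map<String, Customer> customerCache = new HashMap<>();

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // Assumed mapper method returning all customers; loaded once before the step starts.
        for (Customer customer : customerRepository.findAll()) {
            customerCache.put(customer.getCustomerId(), customer);
        }
    }

    @Override
    public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {
        Customer customer = customerCache.get(readItem.getCustomerId()); // in-memory lookup instead of I/O
        SalesPlanDetail writeItem = new SalesPlanDetail();
        writeItem.setBranchId(customer.getChargeBranchId());
        writeItem.setYear(readItem.getYear());
        writeItem.setMonth(readItem.getMonth());
        writeItem.setCustomerId(readItem.getCustomerId());
        writeItem.setAmount(readItem.getAmount());
        return writeItem;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        customerCache.clear(); // release memory when the step ends
        return null;
    }
}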
Use multiple ItemProcessors at the same time
If a general-purpose ItemProcessor is provided to be applied to each job,
this can be implemented by linking multiple ItemProcessors with CompositeItemProcessor.
Note that they are processed in the order specified in the delegates attribute.
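The following minimal sketch illustrates the ordering point (the two processor variables are assumptions for illustration; in an actual TERASOLUNA Batch 5.x job the composite would normally be registered in the Bean definition file with the delegates property):
import java.util.Arrays;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;

public class CompositeProcessorSample {
    public static ItemProcessor<Customer, Customer> compose(
            ItemProcessor<Customer, Customer> checkProcessor,
            ItemProcessor<Customer, Customer> convertProcessor) throws Exception {
        CompositeItemProcessor<Customer, Customer> composite = new CompositeItemProcessor<>();
        // Delegates are applied in list order: checkProcessor runs first, then convertProcessor.
        composite.setDelegates(Arrays.asList(checkProcessor, convertProcessor));
        composite.afterPropertiesSet(); // validates that the delegates have been set
        return composite;
    }
}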
3.3. Creation of tasklet model job
3.3.1. Overview
How to create tasklet model job is explained. Refer to Spring Batch architecture for the architecture of tasklet model.
3.3.1.1. Components
A tasklet model job does not consist of multiple components.
It only implements org.springframework.batch.core.step.tasklet.Tasklet and sets it in the Bean definition.
As an advanced implementation means, ItemReader and ItemWriter, which are components of the chunk model, can also be used.
3.3.2. How to use
How to implement tasklet model job is explained in the following order here.
3.3.2.1. Job configuration
Define tasklet model job in Bean definition file. An example is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:batch="http://www.springframework.org/schema/batch"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd">
<!-- (1) -->
<import resource="classpath:META-INF/spring/job-base-context.xml"/>
<!-- (2) -->
<context:annotation-config/>
<!-- (3) -->
<context:component-scan
base-package="org.terasoluna.batch.functionaltest.app.common"/>
<!-- (4) -->
<batch:job id="jobSimpleJob" job-repository="jobRepository"> <!-- (5) -->
<batch:step id="simpleJob.step01"> <!-- (6) -->
<batch:tasklet transaction-manager="jobTransactionManager"
ref="simpleJobTasklet"/> <!-- (7) -->
</batch:step>
</batch:job>
</beans>
package org.terasoluna.batch.functionaltest.app.common;
@Component // (3)
public class SimpleJobTasklet implements Tasklet {
// omitted
}
S. No. | Explanation |
---|---|
(1) |
Import the settings to always read the required Bean definition when using TERASOLUNA Batch 5.x. |
(2) |
Enable Bean definition using annotation. Use it with (3). |
(3) |
Set base package of component-scan target. Use it with (2). |
(4) |
Job configuration. |
(5) |
|
(6) |
Step configuration. |
(7) |
Tasklet configuration. The value to be set in |
Bean name when using annotation
Bean name when using |
3.3.2.2. Implementation of tasklet
First, understand the overview with simple implementation, then proceed to implementation using the components of the chunk model.
It is explained in the following order.
3.3.2.3. Implementation of simple tasklet
Basic points are explained through tasklet implementation only for log output.
package org.terasoluna.batch.functionaltest.app.common;
// omitted
@Component
public class SimpleJobTasklet implements Tasklet { // (1)
private static final Logger logger =
LoggerFactory.getLogger(SimpleJobTasklet.class);
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception { // (2)
logger.info("called tasklet."); // (3)
return RepeatStatus.FINISHED; // (4)
}
}
Sr. No. | Explanation |
---|---|
(1) |
Implement |
(2) |
Implement |
(3) |
Implement any process. INFO log is output here. |
(4) |
Return whether or not the tasklet process is completed. |
3.3.2.4. Implementation of tasklet using the components of chunk model
Spring Batch does not say anything about using the components of the chunk model in a tasklet implementation. In TERASOLUNA Batch 5.x, you can choose to do so in situations such as the following.
-
When multiple resources are combined and processed, making it difficult to fit the chunk model format
-
When the processing would be implemented in various places in the chunk model, and the tasklet model makes the overall picture easier to understand
-
When you want to keep recovery simple and use the batch commit of the tasklet model instead of the intermediate commit of the chunk model
The tasklet implementation that uses ItemReader
and ItemWriter
which are the components of the chunk model is explained below.
@Component()
@Scope("step") // (1)
public class SalesPlanChunkTranTask implements Tasklet {
@Inject
@Named("detailCSVReader") // (2)
ItemStreamReader<SalesPlanDetail> itemReader; // (3)
@Inject
SalesPlanDetailRepository repository; // (4)
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
SalesPlanDetail item;
try {
itemReader.open(chunkContext.getStepContext().getStepExecution()
.getExecutionContext()); // (5)
while ((item = itemReader.read()) != null) { // (6)
// do some processes.
repository.create(item); // (7)
}
} finally {
itemReader.close(); // (8)
}
return RepeatStatus.FINISHED;
}
}
<!-- omitted -->
<import resource="classpath:META-INF/spring/job-base-context.xml"/>
<context:annotation-config/>
<context:component-scan
base-package="org.terasoluna.batch.functionaltest.app.plan" />
<context:component-scan
base-package="org.terasoluna.batch.functionaltest.ch05.transaction.component" />
<!-- (9) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository.plan"
factory-ref="jobSqlSessionFactory"/>
<!-- (10) -->
<bean id="detailCSVReader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="branchId,year,month,customerId,amount"/>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail"/>
</property>
</bean>
</property>
</bean>
<!-- (11) -->
<batch:job id="createSalesPlanChunkTranTask" job-repository="jobRepository">
<batch:step id="createSalesPlanChunkTranTask.step01">
<batch:tasklet transaction-manager="jobTransactionManager"
ref="salesPlanChunkTranTask"/>
</batch:step>
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
Set the same step scope as the Bean scope of ItemReader to be used in this class. |
(2) |
Access input resources (flat files in this example) through |
(3) |
Define the type as |
(4) |
Access output resources (database in this example) through Mapper of MyBatis. |
(5) |
Open input resource. |
(6) |
Loop all input resources sequentially. |
(7) |
Output to database. |
(8) |
Resource should be closed without fail. |
(9) |
To output to database, add the settings of |
(10) |
To enter from file, add Bean definition of |
(11) |
Since all the components are resolved by annotation, |
On unification of scope
The scope of the tasklet implementation class and of the Beans to be injected should be the same. For example, if the scope of the tasklet implementation class is set to
Regarding the type of field assigned with @Inject
Use any one of the following types depending on the implementation class to be used.
Decide which type to use after confirming the javadoc. Typical examples are shown below.
Another example is shown when ItemReader
and ItemWriter
are used at the same time.
@Component
@Scope("step")
public class SalesPerformanceTasklet implements Tasklet {
@Inject
ItemStreamReader<SalesPerformanceDetail> reader;
@Inject
ItemWriter<SalesPerformanceDetail> writer; // (1)
int chunkSize = 10; // (2)
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
try {
reader.open(chunkContext.getStepContext().getStepExecution()
.getExecutionContext());
List<SalesPerformanceDetail> items = new ArrayList<>(chunkSize); // (2)
SalesPerformanceDetail item = null;
do {
// Pseudo operation of ItemReader
for (int i = 0; i < chunkSize; i++) { // (3)
item = reader.read();
if (item == null) {
break;
}
// Pseudo operation of ItemProcessor
// do some processes.
items.add(item);
}
// Pseudo operation of ItemWriter
if (!items.isEmpty()) {
writer.write(items); // (4)
items.clear();
}
} while (item != null);
} finally {
try {
reader.close();
} catch (Exception e) {
// do nothing.
}
}
return RepeatStatus.FINISHED;
}
}
<!-- omitted -->
<import resource="classpath:META-INF/spring/job-base-context.xml"/>
<context:annotation-config/>
<context:component-scan
base-package="org.terasoluna.batch.functionaltest.app.common,
org.terasoluna.batch.functionaltest.app.performance,
org.terasoluna.batch.functionaltest.ch06.exceptionhandling"/>
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository.performance"
factory-ref="jobSqlSessionFactory"/>
<bean id="detailCSVReader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="branchId,year,month,customerId,amount"/>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.app.model.performance.SalesPerformanceDetail"/>
</property>
</bean>
</property>
</bean>
<!-- (1) -->
<bean id="detailWriter"
class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
p:statementId="org.terasoluna.batch.functionaltest.app.repository.performance.SalesPerformanceDetailRepository.create"
p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>
<batch:job id="jobSalesPerfTasklet" job-repository="jobRepository">
<batch:step id="jobSalesPerfTasklet.step01">
<batch:tasklet ref="salesPerformanceTasklet"
transaction-manager="jobTransactionManager"/>
</batch:step>
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
Use |
(2) |
|
(3) |
As per the behavior of chunk model, |
(4) |
Output through |
Decide in each case whether to use the implementation classes of ItemReader and ItemWriter.
For file access, the implementation classes of ItemReader and ItemWriter can be used.
For anything else, such as database access, there is no need to use them; they can be used where they improve performance.
3.4. How to choose chunk model or tasklet model
Here, how to choose between the chunk model and the tasklet model is explained by organizing the features of each. Refer to the chapters where each is explained in detail as appropriate.
Understand the following contents as examples of concepts, not as constraints or recommendations, and refer to them when creating jobs according to the characteristics of the users and systems.
The main differences between the chunk model and the tasklet model are given below.
Item | Chunk | Tasklet
---|---|---
Components | It mainly consists of 3 components (ItemReader, ItemProcessor and ItemWriter). | It is consolidated in one Tasklet.
Transaction | A certain number of records are processed by issuing intermediate commits. Batch commit cannot be done. | The data is entirely processed by a batch commit. Intermediate commits need to be implemented by the user.
Restart | It can be restarted based on the record count. | It cannot be restarted based on the record count.
Based on this, we will introduce some examples of using each one as follows.
- To make recovery as simple as possible
-
When a job that ended with an error is to be recovered simply by re-running the target job, the tasklet model can be chosen to keep recovery simple.
In the chunk model, recovery has to be dealt with by returning the processed data to the state before the job execution and by creating a job that processes only the unprocessed data.
- To consolidate the process contents
-
When you want to prioritize the readability of the job, such as having 1 job in 1 class, the tasklet model can be chosen.
- To process large data stably
-
When performing batch processing of, for example, 10 million records, consider using the chunk model when the record count is large enough to influence machine resources. This means stabilizing the process by intermediate commits. Intermediate commits can also be used in the tasklet model, but they are simpler to implement in the chunk model.
- To restart based on the record count for recovery after an error
-
When the batch window is tight and you want to resume from the error data onwards, the chunk model should be chosen so that the record-count-based restart provided by Spring Batch can be used. This eliminates the need to create such a mechanism for each job.
The chunk model and the tasklet model are basically used in combination: in most cases the tasklet model is chosen when there is a margin in the number of records to be processed and in the processing time, and the chunk model is chosen for the small number of jobs that process very large numbers of records.
4. Running a job
4.1. Synchronous job
4.1.1. Overview
A synchronous job is explained. A synchronous job is an execution method in which a new process is launched through a shell by the job scheduler, and the execution result of the job is returned to the caller.
The usage method of this function is same in the chunk model as well as tasklet model.
4.1.2. How to use
How to run a job using CommandLineJobRunner is explained.
Refer to Create project for building and executing the application. Refer to Job parameters for how to specify and use job parameters. Some of the explanations in those references overlap with this section; here, the elements of a synchronous job are mainly explained.
4.1.2.1. How to run
In TERASOLUNA Batch 5.x, run the synchronous job using CommandLineJobRunner
provided by Spring Batch.
Start CommandLineJobRunner
by issuing java command as shown below.
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <options> <jobIdentifier> <jobParameters>
Items to be specified | Explanation | Required
---|---|---
jobPath | Bean definition file path where the settings of the job to be run are described. Specify by relative path from the classpath. | 
options | Specify various options (stop, restart etc.) at the time of launching. | 
jobIdentifier | Specify the job name in the Bean definition or the job run ID after job execution as the job identifier. Normally, specify the job name. Specify the job run ID only when specifying the stop and restart options. | 
jobParameters | Specify job arguments. Specify in | 
The execution example when only the required items are specified, is shown below.
Execution example 1 of CommandLineJobRunner
$ java -cp 'target/archetypeId-version.jar:lib/*' \ # (1)
org.springframework.batch.core.launch.support.CommandLineJobRunner \ # (2)
META-INF/jobs/job01.xml job01 # (3)
<batch:job id="job01" job-repository="jobRepository"> <!-- (3) -->
<batch:step id="job01.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="employeeReader"
processor="employeeProcessor"
writer="employeeWriter" commit-interval="10" />
</batch:tasklet>
</batch:step>
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
Specify the batch application jar and dependent jar in |
(2) |
Specify |
(3) |
Pass the run arguments along the |
An execution example when launch parameters are specified as optional items is shown below.
$ java -cp 'target/archetypeId-version.jar:lib/*' \
org.springframework.batch.core.launch.support.CommandLineJobRunner \
META-INF/jobs/setupJob.xml setupJob target=server1 outputFile=/tmp/result.csv # (1)
Sr. No. | Explanation |
---|---|
(1) |
|
4.1.2.2. Options
Supplement the options indicated in CommandLineJobRunner syntax.
In CommandLineJobRunner
, the following 4 launch options can be used.
Here, only the overview of each option is explained.
- -restart
-
Restarts the failed job. Refer to Reprocessing for the details.
- -stop
-
Stops a running job. Refer to Job management for the details.
- -abandon
-
Abandons a stopped job. The abandoned job cannot be restarted. In TERASOLUNA Batch 5.x, there is no case of using this option, hence it is not explained.
- -next
-
Runs the job executed once in the past, again. However, in TERASOLUNA Batch 5.x, this option is not used.
In TERASOLUNA Batch 5.x, the restriction given by default in Spring Batch, "running a job with the same parameters is recognized as the same job and the same job can be executed only once", is avoided by other means.
The details are explained in Regarding parameter conversion class.
To use this option, an implementation class of the JobParametersIncrementer interface is required, but it is not set in TERASOLUNA Batch 5.x.
Therefore, when this option is specified at launch, an error occurs because the required Bean definition does not exist.
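For reference only, if -next were to be used, a Bean implementing the JobParametersIncrementer interface, such as the following minimal sketch, would have to be defined and referenced from the job (the parameter name run.id is an assumption; this Bean is not provided or configured by TERASOLUNA Batch 5.x).
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersIncrementer;

public class SampleRunIdIncrementer implements JobParametersIncrementer {
    @Override
    public JobParameters getNext(JobParameters parameters) {
        // Increment a numeric parameter so that each launch is recognized as a different job instance.
        long nextRunId = (parameters == null) ? 1L : parameters.getLong("run.id", 0L) + 1L;
        return new JobParametersBuilder().addLong("run.id", nextRunId).toJobParameters();
    }
}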
4.2. Job parameters
4.2.1. Overview
This section explains how to use job parameters (hereafter referred to as "parameters").
The usage method of this function is same in the chunk model as well as tasklet model.
A parameter is used to flexibly switch the operation of the job according to the execution environment and execution timing as shown below.
-
File path of process target
-
System operation date and time
The following explanation is about assigning parameters.
The specified parameters can be referred in Bean definition or in Java under Spring management.
4.2.2. How to use
4.2.2.1. Regarding parameter conversion class
In Spring Batch, the received parameters are processed in the following sequence.
-
The implementation class of JobParametersConverter converts them to JobParameters.
-
The parameters are referred to from JobParameters in Bean definitions and in Java under Spring management.
Multiple implementation classes of the above mentioned JobParametersConverter
are provided.
The features of each class are shown below.
-
DefaultJobParametersConverter
-
It can specify the data type of parameters (4 types: String, Long, Date, Double).
-
-
JsrJobParametersConverter
-
It cannot specify the data type of parameters (Only String).
-
It assigns ID (RUN_ID) that identifies job execution to parameter with the name
jsr_batch_run_id
automatically.-
It increments the RUN_ID each time the job is executed. Since it uses SEQUENCE (name is
JOB_SEQ
) of the database for incrementing, the name does not overlap. -
In Spring Batch, running a job with the same parameters is identified as the same job and the same job can be executed only once. Whereas, adding a unique value to the parameter named
jsr_batch_run_id
will recognize it as a separate job. Refer to Spring Batch architecture for details.
-
-
In Spring Batch, when the implementation class of JobParametersConverter
to be used in Bean definition, is not specified, DefaultJobParametersConverter
is used.
However, in TERASOLUNA Batch 5.x, DefaultJobParametersConverter
is not used due to the following reasons.
-
It is common to run one job by the same parameter at different timing.
-
It is possible to specify the time stamp of the start time and manage them as different jobs, but it is complicated to specify job parameters only for that purpose.
-
DefaultJobParametersConverter
can specify data types for parameters, but handling becomes complicated when type conversion fails.
In TERASOLUNA Batch 5.x, by using JsrJobParametersConverter
, RUN_ID is automatically assigned without the user being aware of it.
By this, what the user sees as the same job is handled as a different job by Spring Batch.
In TERASOLUNA Batch 5.x, it is set in advance so as to use JsrJobParametersConverter
in launch-context.xml
.
Therefore, when TERASOLUNA Batch 5.x is used with the recommended setting, there is no need to set JobParametersConverter
.
<bean id="jobParametersConverter"
class="org.springframework.batch.core.jsr.JsrJobParametersConverter"
c:dataSource-ref="adminDataSource" />
<bean id="jobOperator"
class="org.springframework.batch.core.launch.support.SimpleJobOperator"
p:jobRepository-ref="jobRepository"
p:jobRegistry-ref="jobRegistry"
p:jobExplorer-ref="jobExplorer"
p:jobParametersConverter-ref="jobParametersConverter"
p:jobLauncher-ref="jobLauncher" />
The following description assumes that JsrJobParametersConverter is used.
4.2.2.2. Assign from command-line arguments
Firstly, how to assign from the most basic command-line arguments, is explained.
Command-line arguments are enumerated in the <Parameter name>=<Value>
format after 3rd argument of CommandLineJobRunner
.
The number and length of parameters are not restricted in Spring Batch and TERASOLUNA Batch 5.x.
However, there are restrictions in the length of command arguments in the OS.
Therefore, when a large number of arguments is required, the method of Redirect from file to standard input and
Using parameters and properties together should be used.
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID param1=abc outputFileName=/tmp/result.csv
Parameters can be referred in Bean definition or in Java as shown below.
-
Refer in Bean definition
-
It can be referred by
#{jobParameters[xxx]}
-
-
Refer in Java
-
It can be referred by
@Value("#{jobParameters[xxx]}")
-
The scope of the Bean that refers to JobParameters should be Step scope
This is because late binding is used when referring to JobParameters; as the name implies, late binding is the setting of a value at a delayed timing.
@StepScope annotation cannot be used for specifying Step scope
In Spring Batch, Therefore, specify the
|
<!-- (1) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}"> <!-- (2) -->
<property name="lineMapper">
<!-- omitted settings -->
</property>
</bean>
Sr. No. | Explanation |
---|---|
(1) |
Specify scope as scope attribute in bean tag. |
(2) |
Specify the parameter to be referred. |
@Component
@Scope("step") // (1)
public class ParamRefInJavaTasklet implements Tasklet {
/**
* Holds a String type value
*/
@Value("#{jobParameters[str]}") // (2)
private String str;
// omitted execute()
}
Sr. No. | Explanation |
---|---|
(1) |
Specify scope by assigning |
(2) |
Specify the parameter to be referred by using |
4.2.2.3. Redirect from file to standard input
How to redirect from file to standard input is explained.
Define the parameters in the files as follows.
param1=abc
outputFile=/tmp/result.csv
Redirect the files wherein parameters are defined as command-line arguments.
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID < params.txt
How to refer to the parameters is same as the Assign from command-line arguments method.
4.2.2.4. Set the default value of parameter
When parameters are optional, default values can be set in the following format.
-
#{jobParameters[Parameter name] ?: Default value}
However, for an item whose value is set using parameters, the default value may also differ with the environment and execution timing, just like the parameter itself.
First, how to hardcode the default values in the source code is explained. However, in many cases it is better to use Using parameters and properties together, so refer to that section as well.
When the relevant parameter is not set, the value set as the default is used.
<!-- (1) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParamaters[inputFile] ?: /input/sample.csv}"> <!-- (2) -->
<property name="lineMapper">
// omitted settings
</property>
</bean>
Sr. No. | Explanation |
---|---|
(1) |
Specify the scope as scope attribute in the bean tag. |
(2) |
Specify the parameter to be referred. |
@Component
@Scope("step") // (1)
public class ParamRefInJavaTasklet implements Tasklet {
/**
* Holds a String type value
*/
@Value("#{jobParameters[str] ?: xyz}") // (2)
private String str;
// omitted execute()
}
Sr. No. | Explanation |
---|---|
(1) |
Specify the scope by assigning |
(2) |
Specify the parameter to be referred by using |
4.2.2.5. Validation of parameters
Validation of the parameters is required at job launch in order to prevent operation errors or unintended behavior.
Validation of parameters can be implemented by using the JobParametersValidator
provided by Spring Batch.
Since parameters are referred at various places such as ItemReader/ItemProcessor/ItemWriter, validation is performed immediately after the job is launched.
There are two ways to verify the validity of a parameter, and it differs with the degree of complexity of the verification.
-
-
Application example
-
Verify that the required parameters are set
-
Verify that the unspecified parameters are not set
-
-
Validator to be used
-
DefaultJobParametersValidator
provided by Spring Batch
-
-
-
-
Application example
-
Numerical value range verification and complex verification such as correlation check between parameters
-
Verification that cannot be done by
DefaultJobParametersValidator
provided by Spring Batch
-
-
Validator to be used
-
Class wherein
JobParametersValidator
is implemented independently
-
-
Simple validation and complex validation are each explained below.
4.2.2.5.1. Simple validation
Spring Batch provides DefaultJobParametersValidator
as the default implementation of JobParametersValidator
.
This validator can verify the following as per the settings.
-
Required parameters should be set
-
Parameters other than required or optional should not be specified
Definition example is shown as follows.
<!-- (1) -->
<bean id="jobParametersValidator"
class="org.springframework.batch.core.job.DefaultJobParametersValidator">
<property name="requiredKeys"> <!-- (2) -->
<list>
<value>jsr_batch_run_id</value> <!-- (3) -->
<value>inputFileName</value>
<value>outputFileName</value>
</list>
</property>
<property name="optionalKeys"> <!-- (4) -->
<list>
<value>param1</value>
<value>param2</value>
</list>
</property>
</bean>
<batch:job id="jobUseDefaultJobParametersValidator" job-repository="jobRepository">
<batch:step id="jobUseDefaultJobParametersValidator.step01">
<batch:tasklet ref="sampleTasklet" transaction-manager="jobTransactionManager"/>
</batch:step>
<batch:validator ref="jobParametersValidator"/> <!-- (5) -->
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
Define Bean for |
(2) |
Set the required parameters to property |
(3) |
Set |
(4) |
Set optional parameters to property |
(5) |
Apply the validator to the job using validator tag in the job tag. |
Required parameters that cannot be omitted in TERASOLUNA Batch 5.x
Therefore, Example of parameter definition
|
Examples where the verification result is OK and NG are shown below to illustrate the conditions that can be verified with DefaultJobParametersValidator.
<bean id="jobParametersValidator"
class="org.springframework.batch.core.job.DefaultJobParametersValidator"
p:requiredKeys="outputFileName"
p:optionalKeys="param1"/>
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID param1=aaa
NG because the required parameter outputFileName is not set.
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID outputFileName=/tmp/result.csv param2=aaa
NG because the parameter param2, which is specified as neither a required nor an optional parameter, is set.
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID param1=aaa outputFileName=/tmp/result.csv
OK as the parameters specified as required and optional are set.
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID outputFileName=/tmp/result.csv
OK as the required parameters are set and there is no need to set optional parameters.
4.2.2.5.2. Complex validation
Implementing JobParametersValidator
interface independently
helps in verifying the parameters as per requirements.
Implement JobParametersValidator
class as follows.
-
Implement
JobParametersValidator
class and override validate method -
Implement validate method as follows
-
Fetch each parameter from
JobParameters
and verify-
If the verification result is OK, there is no need to perform any operation
-
If verification result is NG, throw
JobParametersInvalidException
-
-
Implementation example of JobParametersValidator
class is shown.
In this case, it is verified that the length of the string specified by str
is less than or equal to the number specified by num
.
public class ComplexJobParametersValidator implements JobParametersValidator { // (1)
@Override
public void validate(JobParameters parameters) throws JobParametersInvalidException {
Map<String, JobParameter> params = parameters.getParameters(); // (2)
String str = params.get("str").getValue().toString(); // (3)
int num = Integer.parseInt(params.get("num").getValue().toString()); // (4)
if(str.length() > num){
throw new JobParametersInvalidException(
"The str must be less than or equal to num. [str:"
+ str + "][num:" + num + "]"); // (5)
}
}
}
Sr. No. | Explanation |
---|---|
(1) |
Implement |
(2) |
Receive the parameters as arguments in |
(3) |
Get parameters by specifying key. |
(4) |
Convert parameters to int type. When handling parameters of other than String type, they should be appropriately converted. |
(5) |
Validation result is NG when the string length of the parameter |
<batch:job id="jobUseComplexJobParametersValidator" job-repository="jobRepository">
<batch:step id="jobUseComplexJobParametersValidator.step01">
<batch:tasklet ref="sampleTasklet" transaction-manager="jobTransactionManager"/>
</batch:step>
<batch:validator> <!-- (1) -->
<bean class="org.terasoluna.batch.functionaltest.ch04.jobparameter.ComplexJobParametersValidator"/>
</batch:validator>
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
Apply validator in the job by using validator tag in the job tag. |
Regarding validation of parameters at asynchronous start
With the asynchronous start methods (DB polling and Web container), the parameters can be verified at job launch in the same way; however, it is desirable to verify them before launching the job, at the following timing.
In case of asynchronous start, since the result has to be confirmed separately, errors such as invalid parameter settings should be responded to quickly and the job request rejected. For validation in this case, there is no need to use
4.2.3. How to extend
4.2.3.1. Using parameters and properties together
Spring Framework, on which Spring Batch is based, provides a property management function that can handle values set in environment variables and property files. For details, refer to Property management of TERASOLUNA Server 5.x Development Guideline.
By combining properties and parameters, common settings for most jobs can be made in the property file and some of them can be overwritten with parameters.
About when parameters and properties are resolved
As mentioned above, parameters and properties are functions provided by different components, and the timing at which each value is resolved is different.
Therefore, the parameter value is given priority by Spring Batch.
How to set values by combining properties and parameters is explained.
In addition to setting by environment variables, setting parameters by command-line arguments is shown below.
They can be referred to in Bean definitions in the same manner.
# Set environment variables
$ export env1=aaa
$ export env2=bbb
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID param3=ccc outputFile=/tmp/result.csv
@Value("${env1}") // (1)
private String param1;
@Value("${env2}") // (1)
private String param2;
private String param3;
@Value("#{jobParameters[param3]") // (2)
public void setParam3(String param3) {
this.param3 = param3;
}
Sr. No. | Explanation |
---|---|
(1) |
Specify the environment variables to be referred by using |
(2) |
Specify the parameters to be referred by using |
# Set environment variables
$ export env1=aaa
# Execute job
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
JobDefined.xml JOBID param1=bbb outputFile=/tmp/result.csv
@Value("#{jobParameters[param1] ?: '${env1}'}") // (1)
public void setParam1(String param1) {
this.param1 = param1;
}
Sr. No. | Explanation |
---|---|
(1) |
Specify the parameters to be referred by using |
How to set incorrect default values
When the following is defined and param1 is not set by command-line arguments, note that null is set in param1 even though you intend the env1 value to be set. Setting method example of incorrect default value
4.3. Asynchronous execution (DB polling)
4.3.1. Overview
Running a job using DB polling is explained.
The usage method of this function is same in the chunk model as well as tasklet model.
4.3.1.1. What is asynchronous execution by using DB polling?
A dedicated table which registers jobs to be executed asynchronously (hereafter referred to as Job-request-table) is monitored periodically and job is asynchronously executed based on the registered information.
In TERASOLUNA Batch 5.x, the module which monitors the table and starts the jobs is called the asynchronous batch daemon.
The asynchronous batch daemon runs as a single Java process and executes each job by assigning a thread in the process to it.
4.3.1.1.1. Functions offered by TERASOLUNA Batch 5.x
TERASOLUNA Batch 5.x offers the following functions for asynchronous execution (DB polling).
Function | Description |
---|---|
Asynchronous batch daemon function |
A function which permanently executes Job-request-table polling function |
Job-request-table polling function |
A function which asynchronously executes the job based on information registered in the Job-request-table. |
Only job requests are managed in the Job-request-table. The execution status and results of the requested jobs are entrusted to the JobRepository.
It is assumed that the job status is managed by using these two together.
Further, if an in-memory database is used for the JobRepository, the JobRepository is cleared when the asynchronous batch daemon is terminated, and the job execution status and results can no longer be referred to.
Hence, it is assumed that a database that is ensured to be persistent is used for the JobRepository.
Using in-memory database
When job execution results success or failure can be obtained without referring |
4.3.1.1.2. Usage scene
A few scenes which use asynchronous execution (DB polling).
Usage scene | Description |
---|---|
Delayed processing |
When it is not necessary to complete the operation immediately in coordination with online processing and the operation which takes time to process is to be extracted as a job. |
Continuous execution of jobs with short processing time |
When continuous processing is done for a few seconds or a few tens of seconds for 1 job. |
Aggregation of large number of jobs |
Same as continuous execution of jobs with short processing time. |
Points to choose asynchronous execution(DB polling) instead of asynchronous execution (Web container)
Points to choose asynchronous execution(DB polling) instead of Asynchronous execution (Web container) are shown below.
|
Reasons not to use Spring Batch Integration
The same function can be implemented by using Spring Batch Integration.
Precautions in asynchronous execution (DB polling)
When a large number of very short batch jobs, each taking less than several seconds, are to be executed, the database including
4.3.2. Architecture
4.3.2.1. Processing sequence of DB polling
Processing sequence of DB polling is explained.
-
Launch
AsyncBatchDaemon
from sh etc. -
AsyncBatchDaemon
reads all Bean definition files which define the jobs at startup.
AsyncBatchDaemon
startsTaskScheduler
for polling at regular intervals.-
TaskScheduler
starts a specific process at regular interval.
-
-
TaskScheduler
startsJobRequestPollTask
(a process which performs polling of Job-request-table). -
JobRequestPollTask
fetches a record for which the polling status is "not executed" (INIT), from Job-request-table.-
Fetch a fixed number of records collectively. Default is 3 records.
-
When the target record does not exist, perform polling at regular intervals. Default is 5 seconds interval.
-
-
JobRequestPollTask
allocates jobs to thread and executes them based on information of records. -
JobRequestPollTask
updates polling status of the Job-request-table to "polled" (POLLED).-
When number of synchronous execution jobs is achieved, the record which cannot be activated from the fetched records is discarded and the record is fetched again at the time of next polling process.
-
-
Job assigned to the thread run a job with
JobOperator
. -
Fetch job execution ID of executed jobs (Job execution id).
-
JobRequestPollTask
updates the polling status of the Job-request-table to "Executed" (EXECUTED) based on job execution ID fetched at the time of job execution.
Supplement of processing sequence
Spring Batch reference shows that asynchronous execution can be implemented by setting
The processing sequence described earlier is used in order to avoid this phenomenon. |
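To make the sequence above more concrete, the following is a minimal conceptual sketch of one polling cycle. It is not the actual JobRequestPollTask implementation offered by TERASOLUNA Batch 5.x; BatchJobRequestRepository and JobStarter are hypothetical stand-ins for the MyBatis mapper and JobOperator, and thread assignment and error handling are omitted.
import java.util.List;
// Conceptual sketch only: NOT the actual JobRequestPollTask implementation.
public class PollingLoopSketch {
    public static class BatchJobRequest {
        long jobSeqId;
        String jobName;
        String jobParameter;
    }
    public interface BatchJobRequestRepository {
        List<BatchJobRequest> findInitRecords(int limit);             // SELECT ... WHERE polling_status = 'INIT'
        void updateStatus(long jobSeqId, String status, Long jobExecutionId);
    }
    public interface JobStarter {
        long start(String jobName, String jobParameter);              // corresponds to JobOperator#start
    }
    private final BatchJobRequestRepository repository;
    private final JobStarter jobStarter;
    public PollingLoopSketch(BatchJobRequestRepository repository, JobStarter jobStarter) {
        this.repository = repository;
        this.jobStarter = jobStarter;
    }
    // One polling cycle: INIT -> POLLED -> EXECUTED (thread assignment omitted for brevity).
    public void poll() {
        for (BatchJobRequest request : repository.findInitRecords(3)) {   // default: 3 records per poll
            repository.updateStatus(request.jobSeqId, "POLLED", null);
            long jobExecutionId = jobStarter.start(request.jobName, request.jobParameter);
            repository.updateStatus(request.jobSeqId, "EXECUTED", jobExecutionId);
        }
    }
}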
4.3.2.2. About the table to be polled
Explanation is given about the table which is polled in asynchronous execution (DB polling).
The following database objects are necessary.
-
Job-request-table (Required)
-
Job sequence (Required for some database products)
-
It is necessary when the database does not support auto-numbering of columns.
4.3.2.2.1. Job-request-table structure
The structure for PostgreSQL, one of the database products supported by TERASOLUNA Batch 5.x, is shown. For other databases, refer to the DDL included in the jar of TERASOLUNA Batch 5.x.
Column Name | Data type | Constraint | Description |
---|---|---|---|
job_seq_id |
bigserial (Use bigint to define a separate sequence) |
NOT NULL |
A number to determine the sequence of jobs to be executed at the time of polling. |
job_name |
varchar(100) |
NOT NULL |
Job name to be executed. |
job_parameter |
varchar(200) |
- |
Parameters to be passed to the job to be executed. The format of a single parameter is the same as in synchronous execution; however, when multiple parameters are specified,
each parameter must be separated by a comma (see below), unlike the blank delimiters of synchronous execution. {Parameter name}={Parameter value},{Parameter name}={Parameter value}… |
job_execution_id |
bigint |
- |
ID issued at the time of job execution. |
polling_status |
varchar(10) |
NOT NULL |
Polling process status. |
create_date |
TIMESTAMP |
NOT NULL |
Date and time when the record of the job request is registered. |
update_date |
TIMESTAMP |
- |
Date and time when the record of job request is updated. |
DDL is as below.
CREATE TABLE IF NOT EXISTS batch_job_request (
job_seq_id bigserial PRIMARY KEY,
job_name varchar(100) NOT NULL,
job_parameter varchar(200),
job_execution_id bigint,
polling_status varchar(10) NOT NULL,
create_date timestamp NOT NULL,
update_date timestamp
);
4.3.2.2.2. Job request sequence structure
When the database does not support auto-numbering of columns, numbering by a sequence is required.
The definition for PostgreSQL, one of the database products supported by TERASOLUNA Batch 5.x, is shown.
For other databases, refer to the DDL included in the jar of TERASOLUNA Batch 5.x.
DDL is as below.
CREATE SEQUENCE batch_job_request_seq MAXVALUE 9223372036854775807 NO CYCLE;
Since PostgreSQL supports auto-numbering of columns, the job request sequence is not defined in the DDL included in the jar of TERASOLUNA Batch 5.x. |
4.3.2.2.3. Transition pattern of polling status (polling_status)
Transition pattern of polling status is shown in the table below.
Transition source | Transition destination | Description |
---|---|---|
INIT |
INIT |
When the maximum number of concurrent executions has been reached and the job cannot be started, the status remains unchanged. |
INIT |
POLLED |
Transition is done when the job is successfully started. |
POLLED |
EXECUTED |
Transition occurs when job execution is completed. |
4.3.2.3. About job running
The method of running a job is explained.
In the Job-request-table polling function of TERASOLUNA Batch 5.x, a job is run by the start method of JobOperator offered by Spring Batch.
In TERASOLUNA Batch 5.x, this guideline explains restarting jobs started by asynchronous execution (DB polling) from the command line.
Hence, although JobOperator also has other launch methods such as restart besides start, only the start method is used (a minimal sketch of this call is shown at the end of this section).
- jobName
-
Set the value registered in job_name of the Job-request-table.
- jobParameters
-
Set the value registered in job_parameter of the Job-request-table.
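As a reference, the following minimal sketch shows how the values of one job request record map onto the start method; the literal job name and parameter string are assumptions for illustration only.
import javax.inject.Inject;
import org.springframework.batch.core.launch.JobOperator;
// Illustrative sketch: mapping a job request record onto JobOperator#start.
public class JobLaunchSketch {
    @Inject
    JobOperator jobOperator;
    public Long launchFromRequest() throws Exception {
        return jobOperator.start(
                "JOB01",                        // jobName: value registered in job_name
                "param1=value1,param2=100");    // jobParameters: value registered in job_parameter (comma-separated)
    }
}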
4.3.2.4. When abnormality is detected in DB polling process.
Explanation is given for when an abnormality is detected in DB polling process.
4.3.2.4.1. Database connection failure
The behaviour of the processing performed at the time of failure occurrence is described below.
- When records of Job-request-table are fetched
-
-
JobRequestPollTask results in an error; however, JobRequestPollTask is executed again at the next polling.
-
- While changing the polling status from INIT to POLLED
-
-
JobRequestPollTask terminates with an error before the job is executed by JobOperator. The polling status remains INIT.
-
In the polling process performed after the connection failure is recovered, the record remains a target for execution since the Job-request-table is unchanged, and the job is executed at the next polling.
-
- While changing polling status from POLLED to EXECUTED
-
-
JobRequestPollTask terminates with an error since the job execution ID cannot be updated in the Job-request-table. The polling status remains POLLED.
-
The record is out of scope of the polling processes performed after connection failure recovery, and the job that was running at the time of the failure is not executed again.
-
Since the job execution ID cannot be identified from the Job-request-table, determine the final status of the job from the log or JobRepository, and re-execute the job as a recovery process when required.
-
Even if an exception occurs in
|
4.3.2.5. Stopping DB polling process
The asynchronous batch daemon (AsyncBatchDaemon) is stopped by creating a file.
After confirming that the file has been generated, the daemon makes the polling process idle, waits as long as possible for the jobs that have already been started, and then stops the process.
4.3.2.6. About application configuration specific to asynchronous execution
Configuration specific to asynchronous execution is explained.
4.3.2.6.1. ApplicationContext configuration
The asynchronous batch daemon reads async-batch-daemon.xml, which is dedicated to asynchronous execution, as the ApplicationContext.
The configuration below is added in addition to launch-context.xml, which is also used in synchronous execution.
- Asynchronous execution settings
-
Beans necessary for asynchronous execution, such as JobRequestPollTask, are defined.
- Job registration settings
-
Jobs executed asynchronously are registered by org.springframework.batch.core.configuration.support.AutomaticJobRegistrar. The context for each job is modularized by using AutomaticJobRegistrar. With modularization, it does not pose an issue even if Bean IDs are duplicated between jobs.
What is modularization
Modularization is a hierarchical structure of "Common definition - Definition of each job" and the Bean defined in each job belongs to an independent context between jobs. If a reference to a Bean which is not defined in each job definition exists, it refers to a Bean defined in common definition. |
4.3.2.6.2. Bean definition structure
The Bean definition of a job can be the same as the Bean definition for synchronous execution. However, the following precautions must be taken.
-
When jobs are registered by AutomaticJobRegistrar, the Bean ID of a job is its identifier and hence must not be duplicated.
-
It is also desirable not to duplicate the Bean IDs of steps.
-
By designing the naming rule of step Bean IDs as {Job ID}.{Step ID}, only the job ID needs to be designed for uniqueness.
-
Import of
This is because various Beans required for starting Spring Batch need not be instantiated for each job.
Only one bean should be created in common definition ( |
4.3.3. How to use
4.3.3.1. Various settings
4.3.3.1.1. Settings for polling process
Use batch-application.properties
for settings required for asynchronous execution.
#(1)
# Admin DataSource settings.
admin.jdbc.driver=org.postgresql.Driver
admin.jdbc.url=jdbc:postgresql://localhost:5432/postgres
admin.jdbc.username=postgres
admin.jdbc.password=postgres
# TERASOLUNA AsyncBatchDaemon settings.
# (2)
async-batch-daemon.schema.script=classpath:org/terasoluna/batch/async/db/schema-postgresql.sql
# (3)
async-batch-daemon.job-concurrency-num=3
# (4)
async-batch-daemon.polling-interval=5000
# (5)
async-batch-daemon.polling-initial-delay=1000
# (6)
async-batch-daemon.polling-stop-file-path=/tmp/end-async-batch-daemon
Sr. No. | Description |
---|---|
(1) |
Connection settings for database wherein Job-request-table is stored. |
(2) |
A path for DDL which defines Job-request-table. |
(3) |
The number of records fetched collectively at the time of polling. This value is also used as the number of jobs executed concurrently. |
(4) |
Polling cycle settings. Unit is milliseconds. |
(5) |
Polling initial start delay time settings. Unit is milliseconds. |
(6) |
Exit file path settings. |
Changing setup value using environment variable
Setup value of Settings for launch-context.xml
For details, refer How to define a property file of TERASOLUNA Server 5.x Development Guideline. |
4.3.3.1.2. Job settings
Job to be executed asynchronously is set in automaticJobRegistrar
of async-batch-daemon.xml
.
Default settings are shown below.
<bean id="automaticJobRegistrar"
class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
<property name="applicationContextFactories">
<bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
<property name="resources">
<list>
<value>classpath:/META-INF/jobs/**/*.xml</value> <!-- (1) -->
</list>
</property>
</bean>
</property>
<property name="jobLoader">
<bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
p:jobRegistry-ref="jobRegistry" />
</property>
</bean>
Sr.No. | Description |
---|---|
(1) |
A path for Bean definition of a job executed asynchronously. |
About registered jobs
For registering jobs, jobs which are designed and implemented on the premise that they are executed asynchronously should be specified. If the jobs which are not supposed to be executed asynchronously are included, exceptions may occur due to unintended references at the time of job registration. Example of Narrowing down
|
Input value verification for job parameters
|
Job design considerations
As a characteristic of asynchronous execution (DB polling), the same job can be executed in parallel. It is necessary to prevent problems from occurring when the same job is run in parallel. |
4.3.3.2. From start to end of asynchronous execution
Start and end of asynchronous batch daemon and how to register in Job-request-table are explained.
4.3.3.2.1. Start of asynchronous batch daemon
Start AsyncBatchDaemon
offered by TERASOLUNA Batch 5.x.
# Start AsyncBatchDaemon
$ java -cp dependency/* org.terasoluna.batch.async.db.AsyncBatchDaemon
In this case, META-INF/spring/async-batch-daemon.xml
is read and various Beans are generated.
Further, when async-batch-daemon.xml is customised separately, it can be used by specifying it as the first argument when starting AsyncBatchDaemon.
The Bean definition file specified in the argument must be given as a path relative to the classpath.
Note that the second and subsequent arguments are ignored.
# Start AsyncBatchDaemon
$ java -cp dependency/* org.terasoluna.batch.async.db.AsyncBatchDaemon \
META-INF/spring/customized-async-batch-daemon.xml
Customisation of async-batch-daemon.xml
When only some settings are changed, async-batch-daemon.xml can be modified directly.
However, when significant changes are made, or when multiple settings are managed as in Multiple runnings described later, it is easier to create and manage separate files.
Choose the approach according to the situation.
It is assumed that the jar files necessary for execution are stored under dependency. |
4.3.3.2.2. Job request
Register a request in the Job-request-table by issuing an INSERT statement.
INSERT INTO batch_job_request(job_name,job_parameter,polling_status,create_date)
VALUES ('JOB01', 'param1=dummy,param2=100', 'INIT', current_timestamp);
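As a reference, a job request can also be registered from application code, for example in the delayed-processing use case. The following is a minimal sketch assuming Spring's JdbcTemplate is configured against the database holding the Job-request-table; it is not functionality offered by TERASOLUNA Batch 5.x.
import org.springframework.jdbc.core.JdbcTemplate;
// Minimal sketch: registering a job request from application code instead of issuing the SQL by hand.
public class JobRequestRegistrar {
    private final JdbcTemplate jdbcTemplate;
    public JobRequestRegistrar(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }
    public void register(String jobName, String jobParameter) {
        jdbcTemplate.update(
                "INSERT INTO batch_job_request (job_name, job_parameter, polling_status, create_date) "
                        + "VALUES (?, ?, 'INIT', current_timestamp)",
                jobName, jobParameter);
    }
}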
4.3.3.2.3. Stopping asynchronous batch daemon
Create the exit file specified in batch-application.properties.
$ touch /tmp/end-async-batch-daemon
When the exit file exists prior to starting asynchronous batch daemon
When the exit file exists prior to starting asynchronous batch daemon, asynchronous batch daemon terminates immediately. Asynchronous batch daemon must be started in the absence of exit file. |
4.3.3.3. Confirm job status
Job status management is performed with JobRepository
offered by Spring Batch and the job status is not managed in the Job-request-table.
Job-request-table has a column of job_execution_id
and job status corresponding to individual requests can be confirmed by the value stored in this column.
Here, a simple example wherein SQL is issued directly and job status is confirmed is shown.
For details of job status confirmation, refer Status confirmation.
SELECT job_execution_id FROM batch_job_request WHERE job_seq_id = 1;
job_execution_id
----------------
2
(1 row)
SELECT * FROM batch_job_execution WHERE job_execution_id = 2;
job_execution_id | version | job_instance_id | create_time | start_time | end_time | status | exit_code | exit_message |
ocation
------------------+---------+-----------------+-------------------------+-------------------------+-------------------------+-----------+-----------+--------------+-
--------
2 | 2 | 2 | 2017-02-06 20:54:02.263 | 2017-02-06 20:54:02.295 | 2017-02-06 20:54:02.428 | COMPLETED | COMPLETED | |
(1 row)
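As a reference, the same confirmation can be done from application code through JobExplorer offered by Spring Batch. The following is a minimal sketch; the class name is an assumption for illustration, and the job execution ID is the value fetched from the Job-request-table.
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
// Minimal sketch: confirming the status of a requested job by its job execution ID.
public class JobStatusChecker {
    private final JobExplorer jobExplorer;
    public JobStatusChecker(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }
    public BatchStatus getStatus(long jobExecutionId) {
        JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
        if (jobExecution == null) {
            return null; // not yet executed, or the execution ID is unknown
        }
        return jobExecution.getStatus();
    }
}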
4.3.3.4. Recovery after a job is terminated abnormally
For basic points related to recovery of a job which is terminated abnormally, refer Re-execution of process. Here, points specific to asynchronous execution are explained.
4.3.3.4.1. Re-run
A job which has terminated abnormally is re-run by inserting a new, separate record into the Job-request-table.
4.3.3.4.2. Restart
When a job which has terminated abnormally is to be restarted, it is executed as a synchronous job from the command line.
The reason for executing it from the command line is that it is difficult to determine whether a restart request is an intended restart or an unintended duplicate execution, which could lead to confusion in operation.
For restart methods, refer Job restart.
4.3.3.4.3. Termination
-
When the process has not terminated even after the expected processing time has been exceeded, attempt to terminate it from the command line. For methods of termination, refer Job stop.
-
When the termination is not accepted even from the command line, the asynchronous batch daemon should be terminated by Stopping asynchronous batch daemon.
-
If even the asynchronous batch daemon cannot be terminated, the process of the asynchronous batch daemon should be forcibly terminated.
Adequate care should be taken not to impact other jobs when an asynchronous batch daemon is being terminated. |
4.3.3.5. About environment deployment
Building and deploying a job is the same as for synchronous execution. However, it is important to narrow down the jobs which are executed asynchronously, as shown in Job settings.
4.3.3.6. Evacuation of cumulative data
If you run an asynchronous batch daemon for a long time, a huge amount of data is accumulated in JobRepository and the Job-request-table. It is necessary to clear this cumulative data for the following reasons.
-
Performance degradation when data is retrieved or updated for a large quantity of data
-
Duplication of IDs due to wrap-around of the ID numbering sequence
For evacuation of table data and resetting a sequence, refer manual for the database to be used.
List of tables and sequences for evacuation is shown below.
Table/Sequence | Framework offered |
---|---|
batch_job_request |
TERASOLUNA Batch 5.x |
batch_job_request_seq |
|
batch_job_instance |
Spring Batch |
batch_job_execution |
|
batch_job_execution_params |
|
batch_job_execution_context |
|
batch_step_execution |
|
batch_step_execution_context |
|
batch_job_seq |
|
batch_job_execution_seq |
|
batch_step_execution_seq |
Auto-numbering column sequence
Since a sequence is created automatically for an auto-numbering column, remember to include this sequence while evacuating data. |
About database specific specifications
Note that Oracle uses database-specific data types in some cases, such as using CLOB for data types. |
4.3.4. How to extend
4.3.4.1. Customising Job-request-table
Job-request-table can be customised by adding a column in order to change extraction conditions of fetched records.
However, only BatchJobRequest
can be passed as an item while issuing SQL from JobRequestPollTask
.
Extension procedure by customising the Job-request-table is shown below.
-
Customising Job-request-table
-
Creating an extension interface of
BatchJobRequestMapper
interface -
Defining SQLMap which uses customised table
-
Modifying Bean definition of
async-batch-daemon.xml
Examples of customization are as below.
Hereafter, the extension procedure will be described for these two examples.
4.3.4.1.1. Example of controlling job execution sequence by priority column
-
Customising Job-request-table
Add a priority column (priority) in Job-request-table.
CREATE TABLE IF NOT EXISTS batch_job_request (
job_seq_id bigserial PRIMARY KEY,
job_name varchar(100) NOT NULL,
job_parameter varchar(200),
priority int NOT NULL,
job_execution_id bigint,
polling_status varchar(10) NOT NULL,
create_date timestamp NOT NULL,
update_date timestamp
);
-
Create extension interface of
BatchJobRequestMapper
interface
An interface which extends the BatchJobRequestMapper interface is created.
// (1)
public interface CustomizedBatchJobRequestMapper extends BatchJobRequestMapper {
// (2)
}
Sr. No. | Description |
---|---|
(1) |
Extend |
(2) |
Do not add a method. |
-
Definition of SQLMap which uses the customised table
Define the SQL in the SQLMap with priority as an ordering condition.
<!-- (1) -->
<mapper namespace="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestMapper">
<select id="find" resultType="org.terasoluna.batch.async.db.model.BatchJobRequest">
SELECT
job_seq_id AS jobSeqId,
job_name AS jobName,
job_parameter AS jobParameter,
job_execution_id AS jobExecutionId,
polling_status AS pollingStatus,
create_date AS createDate,
update_date AS updateDate
FROM
batch_job_request
WHERE
polling_status = 'INIT'
ORDER BY
priority ASC, <!--(2) -->
job_seq_id ASC
LIMIT #{pollingRowLimit}
</select>
<!-- (3) -->
<update id="updateStatus">
UPDATE
batch_job_request
SET
polling_status = #{batchJobRequest.pollingStatus},
job_execution_id = #{batchJobRequest.jobExecutionId},
update_date = #{batchJobRequest.updateDate}
WHERE
job_seq_id = #{batchJobRequest.jobSeqId}
AND
polling_status = #{pollingStatus}
</update>
</mapper>
Sr. No. | Description |
---|---|
(1) |
Set extended interface of |
(2) |
Add priority to ORDER clause. |
(3) |
Do not change updated SQL. |
-
Modifying Bean definition of
async-batch-daemon.xml
Set extended interface created in (2) in batchJobRequestMapper
.
<!--(1) -->
<bean id="batchJobRequestMapper"
class="org.mybatis.spring.mapper.MapperFactoryBean"
p:mapperInterface="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestMapper"
p:sqlSessionFactory-ref="adminSqlSessionFactory" />
Sr. No. | Description |
---|---|
(1) |
Set extended interface of |
4.3.4.1.2. Distributed processing by multiple processes using a group ID
Specify group ID by using environment variable while starting AsyncBatchDaemon
and narrow down the target job.
-
Customizing Job-request-table
Add group ID column (group_id) to Job-request-table.
CREATE TABLE IF NOT EXISTS batch_job_request (
job_seq_id bigserial PRIMARY KEY,
job_name varchar(100) NOT NULL,
job_parameter varchar(200),
group_id varchar(10) NOT NULL,
job_execution_id bigint,
polling_status varchar(10) NOT NULL,
create_date timestamp NOT NULL,
update_date timestamp
);
-
Creating extended interface of
BatchJobRequestMapper
interface
-
Definition of SQLMap which uses the customised table
Define the SQL in the SQLMap with the group ID as an extraction condition.
<!-- (1) -->
<mapper namespace="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestMapper">
<select id="find" resultType="org.terasoluna.batch.async.db.model.BatchJobRequest">
SELECT
job_seq_id AS jobSeqId,
job_name AS jobName,
job_parameter AS jobParameter,
job_execution_id AS jobExecutionId,
polling_status AS pollingStatus,
create_date AS createDate,
update_date AS updateDate
FROM
batch_job_request
WHERE
polling_status = 'INIT'
AND
group_id = #{groupId} <!--(2) -->
ORDER BY
job_seq_id ASC
LIMIT #{pollingRowLimit}
</select>
<!-- omitted -->
</mapper>
Sr. No. | Description |
---|---|
(1) |
Set extended interface of |
(2) |
Add groupId to extraction conditions. |
-
Modifying Bean definition of
async-batch-daemon.xml
Set extended interface created in (2) in batchJobRequestMapper
and
set the group ID assigned by environment variable in jobRequestPollTask
as a query parameter.
<!--(1) -->
<bean id="batchJobRequestMapper"
class="org.mybatis.spring.mapper.MapperFactoryBean"
p:mapperInterface="org.terasoluna.batch.extend.repository.CustomizedBatchJobRequestMapper"
p:sqlSessionFactory-ref="adminSqlSessionFactory" />
<bean id="jobRequestPollTask"
class="org.terasoluna.batch.async.db.JobRequestPollTask"
c:transactionManager-ref="adminTransactionManager"
c:jobOperator-ref="jobOperator"
c:batchJobRequestMapper-ref="batchJobRequestMapper"
c:daemonTaskExecutor-ref="daemonTaskExecutor"
c:automaticJobRegistrar-ref="automaticJobRegistrar"
p:optionalPollingQueryParams-ref="pollingQueryParam" /> <!-- (2) -->
<bean id="pollingQueryParam"
class="org.springframework.beans.factory.config.MapFactoryBean">
<property name="sourceMap">
<map>
<entry key="groupId" value="${GROUP_ID}"/> <!-- (3) -->
</map>
</property>
</bean>
Sr. No. | Description |
---|---|
(1) |
Set extended interface of |
(2) |
Set Map defined in (3), in |
(3) |
Set group ID assigned by environment variable (GROUP_ID) in group ID (groupId) of query parameter. |
-
Set group ID in environment variable and start
AsyncBatchDaemon
.
# Set environment variables
$ export GROUP_ID=G1
# Start AsyncBatchDaemon
$ java -cp dependency/* org.terasoluna.batch.async.db.AsyncBatchDaemon
4.3.4.2. Multiple runnings
Asynchronous batch daemon is run on multiple servers for the following purposes.
-
Enhanced availability
-
The aim is to ensure that an asynchronous batch job can be executed on at least one of the servers, eliminating the situation where the job cannot be run at all.
-
-
Enhanced performance
-
When batch processing load is to be distributed across multiple servers
-
-
Effective use of resources
-
When there is variation in server performance and a specific job is to be assigned to the server with the most suitable resources
-
This is equivalent to dividing jobs across nodes based on the group ID, as shown in Customising Job-request-table
-
-
An operational design must be adopted after considering, from the viewpoints given above, whether multiple running is appropriate.
When multiple asynchronous batch daemons fetch identical job request records
Since |
4.3.5. Appendix
4.3.5.1. About modularization of job definition
Although it is briefly explained in ApplicationContext configuration, following events can be avoided by using AutomaticJobRegistrar
.
-
When the same Bean ID (Bean name) is used, the Bean is overwritten and the job behaves in an unintended way.
-
Accordingly, there is a high risk of occurrence of unintended errors.
-
-
Naming should be performed to make all Bean IDs in the job unique, to avoid these errors.
-
As the number of jobs increases, managing this becomes difficult, resulting in unnecessary trouble.
-
The situation when AutomaticJobRegistrar is not used is explained below.
Since the configuration explained here poses the issues given above, it is not used in asynchronous execution.
<!-- Reader -->
<!-- (1) -->
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
p:queryId="jp.terasoluna.batch.job.repository.EmployeeRepositoy.findAll"
p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
<!-- Writer -->
<!-- (2) -->
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:resource="file:#{jobParameters[basedir]}/input/employee.csv">
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="invoiceNo,salesDate,productId,customerId,quant,price"/>
</property>
</bean>
</property>
</bean>
<!-- Job -->
<batch:job id="job1" job-repository="jobRepository">
<batch:step id="job1.step">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader" writer="writer" commit-interval="100" />
</batch:tasklet>
</batch:step>
</batch:job>
<!-- Reader -->
<!-- (3) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[basedir]}/input/invoice.csv">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="invoiceNo,salesDate,productId,customerId,quant,price"/>
</property>
<property name="fieldSetMapper" ref="invoiceFieldSetMapper"/>
</bean>
</property>
</bean>
<!-- Writer -->
<!-- (4) -->
<bean id="writer" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
p:statementId="jp.terasoluna.batch.job.repository.InvoiceRepository.create"
p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
<!-- Job -->
<batch:job id="job2" job-repository="jobRepository">
<batch:step id="job2.step">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader" writer="writer" commit-interval="100" />
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="automaticJobRegistrar"
class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
<property name="applicationContextFactories">
<bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
<property name="resources">
<list>
<value>classpath:/META-INF/jobs/other/async/*.xml</value> <!-- (5) -->
</list>
</property>
</bean>
</property>
<property name="jobLoader">
<bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
p:jobRegistry-ref="jobRegistry"/>
</property>
</bean>
<bean class="org.springframework.batch.core.configuration.support.JobRegistryBeanPostProcessor"
p:jobRegistry-ref="jobRegistry" />
<import resource="classpath:/META-INF/jobs/async/*.xml" /> <!-- (6) -->
Sr. No. | Description |
---|---|
(1) |
In Job1, ItemReader which reads from the database is defined by a Bean ID - |
(2) |
In Job1, ItemWriter which writes in a file is defined by a Bean ID - |
(3) |
In Job2, ItemReader which reads from the file is defined by a Bean ID - |
(4) |
In Job2, ItemWriter which writes to a database is defined by a Bean ID - |
(5) |
|
(6) |
Use import of Spring and enable reading of target job definition. |
In this case, if Job1.xml and Job2.xml are read in sequence, the reader and writer defined in Job1.xml are overwritten by the definitions in Job2.xml.
As a result, when Job1 is executed, the reader and writer of Job2 are used and the intended processing cannot be performed.
4.4. Asynchronous execution (Web container)
4.4.1. Overview
A method to execute the job asynchronously in Web container is explained.
The usage method of this function is same in the chunk model as well as tasklet model.
A Web application containing the job is deployed in a Web container, and the job is executed based on the information of the request that is sent.
Since one thread is allocated for each job execution and processing runs in parallel, a job can be executed independently of the processing of other jobs and requests.
TERASOLUNA Batch 5.x does not offer implementation for asynchronous execution (Web container).
Only methods of implementation will be provided in this guideline.
This is because there are various ways of triggering the job from a Web application, such as HTTP / SOAP / MQ, and hence it has been decided that the implementation should be done appropriately by the user.
-
A Web container is required besides the application.
-
Besides implementation of job, required Web application and client are separately implemented according to the operation requirements.
-
The execution status and results of the job are entrusted to JobRepository. Further, a persistent database is used instead of an in-memory database so that the execution status and results of the job can be referred to from JobRepository even after the Web container is stopped.
It is same as Asynchronous execution (DB polling) - Overview.
Difference with asynchronous execution (DB polling)
On the architecture front, they differ in the immediacy of asynchronous execution and in the presence or absence of a request management table. |
4.4.2. Architecture
Asynchronous jobs using this method operate as an application (war) deployed on the Web container; however, the job itself runs asynchronously (on another thread) from the request processing of the Web container.
-
Web client requests Web container to execute the job.
-
JobController asks JobOperator of Spring Batch to start the execution of the job. -
Execute the job asynchronously by using
ThreadPoolTaskExecutor
. -
Return a job execution ID (
job execution id
) for uniquely identifying an executed target job. -
JobController returns a response including the job execution ID to the Web client. -
Execute target job.
-
Job results are reflected in
JobRepository
.
-
-
The job returns its execution results; however, they cannot be notified directly to the client.
-
Web client sends the job execution ID to the Web container (JobController). -
JobController asks JobExplorer for the execution results of the job by using the job execution ID. -
JobExplorer
returns job execution results. -
JobController
returns a response for Web client.-
Set Job execution ID in the response.
-
After the Web container receives a request, processing is synchronous with the request until the job execution ID is issued; however, the subsequent job execution is performed asynchronously in a thread pool separate from that of the Web container. This means that the Web client cannot detect the execution status of the asynchronous job unless it sends another request to query it.
Hence, on the Web client side, a request is sent once to "run the job" during one job execution.
When "confirmation of results" is necessary, another request must be sent to the Web container.
How abnormalities are detected, which differs from the first "running a job" request, is explained later in
About detection of abnormality occurrence at the time of running a job.
Job execution status can be checked by referring direct RDBMS, by using |
About handling job execution ID (job execution id)
A different sequence value is generated as the job execution ID for each execution, even when the job and job parameters are identical. |
4.4.2.1. About detection of abnormality occurrence at the time of running a job
After a job run request is sent from the Web client, the way an abnormality is detected differs depending on whether the job execution ID has been issued.
-
Abnormality can be detected immediately by the response at the time of running a job
-
Job to be activated does not exist.
-
Invalid job parameter format.
-
-
After the job is run, queries to the Web container regarding the job execution status and results are necessary
-
Job execution status
-
Job start failure due to depletion of thread pool used in asynchronous job execution
-
"Job running error" can be detected as an exception occurring in Spring MVC controller. Since the explanation is omitted here, refer Implementation of exception handling of TERASOLUNA Server 5.x Development Guideline described separately. Further, input check of the request used as a job parameter is performed in the Spring MVC controller as required. |
Job start failure occurring due to depletion of thread pool cannot be caught at the time of running a job.
Job start failure due to depletion of thread pool is not generated from
|
4.4.2.2. Application configuration of asynchronous execution (Web container)
As with Asynchronous execution (DB polling), the async Spring profile and AutomaticJobRegistrar are used as configuration specific to asynchronous execution.
On the other hand, prior knowledge and some specific settings are required in order to use these functions in asynchronous execution (Web container); refer ApplicationContext configuration.
How to configure the basic async profile and AutomaticJobRegistrar, and how to implement applications using asynchronous execution (Web container), will be described later.
4.4.2.2.1. ApplicationContext configuration
As described above, the application configuration of asynchronous execution (Web container) contains multiple application modules.
It is necessary to understand the respective application contexts, the types of Bean definitions and their relationships.
During asynchronous execution (Web container), the ApplicationContext of the batch application is incorporated into the ApplicationContext of the Web application.
Individual job contexts are modularised from the Web context by using AutomaticJobRegistrar, and each acts as a sub-context of the Web context.
The Bean definition files which constitute the respective contexts are explained.
Sr. No. | Description |
---|---|
(1) |
Common Bean definition file. |
(2) |
Bean definition file which is always imported from job Bean definitions. |
(3) |
Bean definition file created for each job. |
(4) |
It is read from |
(5) |
It acts as a parent context shared within the Web application by using |
4.4.3. How to use
Here, explanation is given using TERASOLUNA Server Framework for Java (5.x), as an implementation example of Web application.
Kindly remember that only explanation is offered and TERASOLUNA Server 5.x is not a necessary requirement of asynchronous execution (Web container).
4.4.3.1. Overview of implementation of application by asynchronous execution (Web container)
Explanation is given based on following configuration.
-
Web application project and batch application project are independent and a batch application is referred from a web application.
-
war file generated from Web application project contains jar file generated from batch application project
-
Implementation of asynchronous execution is performed in accordance with Architecture wherein Spring
MVC controller in the Web application starts the job by using JobOperator
.
About isolation of Web/batch application project
Final deliverable of application build is a war file of Web application, however,
a development project should be implemented by separating Web/batch applications. |
Web/batch development is explained now assuming the use of 2 components below.
-
Batch application project by TERASOLUNA Batch 5.x
-
Web application project by TERASOLUNA Server 5.x
For how to create a batch application project and how to implement a basic job, refer How to create a project, Creation of chunk model job, Creation of tasklet model job.
Here, we will focus on starting a batch application from a Web application.
Here, explanation is given by creating a batch application project, by using Maven archetype:generate.
Name | Value |
---|---|
groupId |
org.terasoluna.batch.sample |
artifactId |
asyncbatch |
version |
1.0-SNAPSHOT |
package |
org.terasoluna.batch.sample |
A job registered from the beginning for a blank project is used for convenience of explanation.
Name | Description |
---|---|
Job name |
job01 |
Job parameter |
param1=value1 |
Precautions for asynchronous execution (Web container) job design
Due to the characteristics of asynchronous execution (Web container), individual jobs are expected to complete in a short period of time
and to operate in a stateless manner on the Web container. |
Create the Web application once the batch application is in a state where a jar file including the job implementation can be built.
How to implement a Web application is explained by using a blank project offered by TERASOLUNA Server 5.x. For details, refer TERASOLUNA Server 5.x Development Guideline Creating a development project for Web application.
Here, similar to the batch application project, explanation is given assuming that the project is created with the following names.
Name | Value |
---|---|
groupId |
org.terasoluna.batch.sample |
artifactId |
asyncapp |
version |
1.0-SNAPSHOT |
package |
org.terasoluna.batch.sample |
About naming of groupId
Although naming a project is optional, when a batch application as a Maven multiproject is considered as a sub-module,
it is easy to manage if |
4.4.3.2. Various settings
Edit pom.xml and include batch application as a part of Web application.
Batch application is registered in NEXUS or Maven local repository as |
<project>
<!-- omitted -->
<modules>
<module>asyncapp-domain</module>
<module>asyncapp-env</module>
<module>asyncapp-initdb</module>
<module>asyncapp-web</module>
<module>asyncapp-selenium</module>
<module>asyncbatch</module> <!-- (1) -->
</modules>
</project>
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>org.terasoluna.batch.sample</groupId> <!-- (2) -->
<artifactId>asyncbatch</artifactId>
<version>1.0-SNAPSHOT</version> <!-- (2) -->
<!-- (1) -->
<parent>
<groupId>org.terasoluna.batch.sample</groupId>
<artifactId>asyncapp</artifactId>
<version>1.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<!-- omitted -->
</project>
Sr. No. | Description |
---|---|
(1) |
Add settings for considering the Web application as a parent and batch application as a child. |
(2) |
Description that becomes unnecessary once the project is made a child (sub-module); it may be deleted. |
Add a batch application as a dependent library of Web application.
<project>
<!-- omitted -->
<dependencies>
<!-- (1) -->
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>asyncbatch</artifactId>
<version>${project.version}</version>
</dependency>
<!-- omitted -->
</dependencies>
<!-- omitted -->
</project>
Sr. No. | Description |
---|---|
(1) |
Add a batch application as a dependent library of Web application. |
4.4.3.3. Implementation of Web application
Here, a RESTful Web service is created as a Web application using TERASOLUNA Server 5.x Development Guideline as a reference below.
Setting for enabling Spring MVC component which is necessary for RESTful Web Service
4.4.3.3.1. Web application settings
At first, add, delete and edit various configuration files from the blank project of Web application.
For the explanation, an implementation which use RESTful Web Service as an implementation status of batch application is given. |
<!-- omitted -->
<!-- (1) -->
<import resource="classpath:META-INF/spring/launch-context.xml"/>
<bean id="jsonMessageConverter"
class="org.springframework.http.converter.json.MappingJackson2HttpMessageConverter"
p:objectMapper-ref="objectMapper"/>
<bean id="objectMapper"
class="org.springframework.http.converter.json.Jackson2ObjectMapperFactoryBean">
<property name="dateFormat">
<bean class="com.fasterxml.jackson.databind.util.StdDateFormat"/>
</property>
</bean>
<mvc:annotation-driven>
<mvc:message-converters register-defaults="false">
<ref bean="jsonMessageConverter"/>
</mvc:message-converters>
</mvc:annotation-driven>
<mvc:default-servlet-handler/>
<!-- (2) -->
<context:component-scan base-package="org.terasoluna.batch.sample.app.api"/>
<!-- (3) -->
<bean class="org.springframework.batch.core.configuration.support.AutomaticJobRegistrar">
<property name="applicationContextFactories">
<bean class="org.springframework.batch.core.configuration.support.ClasspathXmlApplicationContextsFactoryBean">
<property name="resources">
<list>
<value>classpath:/META-INF/jobs/**/*.xml</value>
</list>
</property>
</bean>
</property>
<property name="jobLoader">
<bean class="org.springframework.batch.core.configuration.support.DefaultJobLoader"
p:jobRegistry-ref="jobRegistry"/>
</property>
</bean>
<!-- (4) -->
<task:executor id="taskExecutor" pool-size="3" queue-capacity="10"/>
<!-- (5) -->
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher"
p:jobRepository-ref="jobRepository"
p:taskExecutor-ref="taskExecutor"/>
<!-- omitted -->
<!-- omitted -->
<servlet>
<servlet-name>restApiServlet</servlet-name>
<servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
<init-param>
<param-name>contextConfigLocation</param-name>
<!-- (6) -->
<param-value>classpath*:META-INF/spring/spring-mvc-rest.xml</param-value>
</init-param>
<!-- (7) -->
<init-param>
<param-name>spring.profiles.active</param-name>
<param-value>async</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
<servlet-name>restApiServlet</servlet-name>
<url-pattern>/api/v1/*</url-pattern>
</servlet-mapping>
<!-- omitted -->
Sr. No. | Description |
---|---|
(1) |
Import |
(2) |
Describe package for dynamically scanning the controller. |
(3) |
Describe a Bean definition of |
(4) |
Define Further, multiplicity of threads which can be operated in parallel can be specified. |
(5) |
Override |
(6) |
Specify |
(7) |
Specify |
When async profile is not specified
In this case, a Bean defined in |
Thread pool sizing
If the upper limit of the thread pool is too large, an enormous number of jobs run in parallel, degrading the performance of the whole application.
Sizing should be performed and an appropriate upper limit determined. Further, a separate request must be sent from the Web client for checking occurrence of |
Here, "Running a job" and "Job status check" are defined as 2 examples of requests used in REST API.
Sr.No. | API | Path | HTTP method | Request / Response | Message format | Message details |
---|---|---|---|---|---|---|
(1) |
Running a job |
/api/v1/job/Job name |
POST |
Request |
JSON |
Job parameter |
Response |
JSON |
Job execution ID |
||||
(2) |
Job execution status check |
/api/v1/job/Job execution ID |
GET |
Request |
N/A |
N/A |
Response |
JSON |
Job execution ID |
4.4.3.3.2. Implementation of JavaBeans used in Controller
Create following 3 classes that are returned to REST client as JSON message.
-
Job run operation
JobOperationResource
-
Job execution status
JobExecutionResource
-
Step execution status
StepExecutionResource
These classes are reference implementations; except for the job execution ID (job execution id) of JobOperationResource, the implementation of the fields is optional.
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/jobinfo/JobOperationResource.java
package org.terasoluna.batch.sample.app.api.jobinfo;
public class JobOperationResource {
private String jobName = null;
private String jobParams = null;
private Long jobExecutionId = null;
private String errorMessage = null;
private Exception error = null;
// Getter and setter are omitted.
}
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/jobinfo/JobExecutionResource.java
package org.terasoluna.batch.sample.app.api.jobinfo;
// omitted.
public class JobExecutionResource {
private Long jobExecutionId = null;
private String jobName = null;
private Long stepExecutionId = null;
private String stepName = null;
private List<StepExecutionResource> stepExecutions = new ArrayList<>();
private String status = null;
private String exitStatus = null;
private String errorMessage;
private List<String> failureExceptions = new ArrayList<>();
// Getter and setter are omitted.
}
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/jobinfo/StepExecutionResource.java
package org.terasoluna.batch.sample.app.api.jobinfo;
public class StepExecutionResource {
private Long stepExecutionId = null;
private String stepName = null;
private String status = null;
private List<String> failureExceptions = new ArrayList<>();
// Getter and setter are omitted.
}
4.4.3.3.3. Implementation of controller
A controller of the RESTful Web Service is implemented by using @RestController.
In order to simplify the explanation, JobOperator is injected into the controller, and running the job and fetching the execution status are done there.
Of course, JobOperator can also be invoked through a Service called from the controller, in accordance with TERASOLUNA Server 5.x.
About job parameters that are passed at the time of running a job
The job parameters are passed in the second argument of JobOperator#start. This is the same as the method of specifying job parameters in Asynchronous execution (DB polling). |
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/JobController.java
package org.terasoluna.batch.sample.app.api;
// omitted
// (1)
@RequestMapping("job")
@RestController
public class JobController {
// (2)
@Inject
JobOperator jobOperator;
// (2)
@Inject
JobExplorer jobExplorer;
@RequestMapping(value = "{jobName}", method = RequestMethod.POST)
public ResponseEntity<JobOperationResource> launch(@PathVariable("jobName") String jobName,
@RequestBody JobOperationResource requestResource) {
JobOperationResource responseResource = new JobOperationResource();
responseResource.setJobName(jobName);
try {
// (3)
Long jobExecutionId = jobOperator.start(jobName, requestResource.getJobParams());
responseResource.setJobExecutionId(jobExecutionId);
return ResponseEntity.ok().body(responseResource);
} catch (NoSuchJobException | JobInstanceAlreadyExistsException | JobParametersInvalidException e) {
responseResource.setError(e);
return ResponseEntity.badRequest().body(responseResource);
}
}
@RequestMapping(value = "{jobExecutionId}", method = RequestMethod.GET)
@ResponseStatus(HttpStatus.OK)
public JobExecutionResource getJob(@PathVariable("jobExecutionId") Long jobExecutionId) {
JobExecutionResource responseResource = new JobExecutionResource();
responseResource.setJobExecutionId(jobExecutionId);
// (4)
JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
if (jobExecution == null) {
responseResource.setErrorMessage("Job execution not found.");
} else {
mappingExecutionInfo(jobExecution, responseResource);
}
return responseResource;
}
private void mappingExecutionInfo(JobExecution src, JobExecutionResource dest) {
dest.setJobName(src.getJobInstance().getJobName());
for (StepExecution se : src.getStepExecutions()) {
StepExecutionResource ser = new StepExecutionResource();
ser.setStepExecutionId(se.getId());
ser.setStepName(se.getStepName());
ser.setStatus(se.getStatus().toString());
for (Throwable th : se.getFailureExceptions()) {
ser.getFailureExceptions().add(th.toString());
}
dest.getStepExecutions().add(ser);
}
dest.setStatus(src.getStatus().toString());
dest.setExitStatus(src.getExitStatus().toString());
}
}
Sr. No. | Description |
---|---|
(1) |
Specify |
(2) |
Describe field injections of |
(3) |
Use |
(4) |
Use |
4.4.3.3.4. Integration of Web/batch application module setting
The batch application module (asyncbatch) operates as a stand-alone application.
Hence, the batch application module (asyncbatch) contains settings which conflict and overlap with the settings of the Web application module (asyncapp-web).
These settings must be integrated as required.
-
Integration of the log configuration file logback.xml
When multiple Logback definition files are defined across Web/batch, they do not work appropriately. The contents of asyncbatch/src/main/resources/logback.xml are integrated into the same file under asyncapp-env/src/main/resources/, and the original file is then deleted.
-
Data source and MyBatis configuration file are not integrated
The definitions of the data source and MyBatis configuration files are not integrated between Web/batch, since their application contexts are independent due to the following relationship.
-
The asyncbatch module of the batch side is defined in the servlet as a closed context.
-
The asyncapp-domain and asyncapp-env modules of the Web side are defined as contexts used by the entire application.
-
Cross-reference of data source and MyBatis settings by Web and batch modules
Since the context scopes of the Web and batch modules differ, the data source, MyBatis settings and Mapper interfaces cannot be mutually referred to; in particular, they cannot be referred to from the Web module. |
CSRF countermeasures specific to REST controller
When a request is sent for REST controller in the initialization settings of Web blank project, it results in a CSRF error and execution of job is rejected. Hence, explanation is given here assuming that CSRF countermeasures are disabled by the following method. Web application created here is not published on the internet and CSRF countermeasures are disabled on the premise that REST request is not sent from a third party who can exploit CSRF as a means of attack. Please note that necessity may differ in the actual Web application depending on the operating environment. |
4.4.3.3.5. Build
Build with the Maven command and create a war file.
$ cd asyncapp
$ ls
asyncbatch/ asyncapp-web/ pom.xml
$ mvn clean package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] TERASOLUNA Server Framework for Java (5.x) Web Blank Multi Project (MyBatis3)
[INFO] TERASOLUNA Batch Framework for Java (5.x) Blank Project
[INFO] asyncapp-web
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building TERASOLUNA Server Framework for Java (5.x) Web Blank Multi Project (MyBatis3) 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
(omitted)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] TERASOLUNA Server Framework for Java (5.x) Web Blank Multi Project (MyBatis3) SUCCESS [ 0.226 s]
[INFO] TERASOLUNA Batch Framework for Java (5.x) Blank Project SUCCESS [ 6.481s]
[INFO] asyncapp-web ....................................... SUCCESS [ 5.400 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.597 s
[INFO] Finished at: 2017-02-10T22:32:43+09:00
[INFO] Final Memory: 38M/250M
[INFO] ------------------------------------------------------------------------
$
4.4.3.4. Job start and confirmation of execution results using REST Client
Here, curl command is used as a REST client and an asynchronous job is started.
$ curl -v \
-H "Accept: application/json" -H "Content-type: application/json" \
-d '{"jobParams": "param1=value1"}' \
http://localhost:8080/asyncapp-web/api/v1/job/job01
* timeout on name lookup is not supported
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8088 (#0)
> POST /asyncapp-web/api/v1/job/job01 HTTP/1.1
> Host: localhost:8088
> User-Agent: curl/7.51.0
> Accept: application/json
> Content-type: application/json
> Content-Length: 30
>
* upload completely sent off: 30 out of 30 bytes
< HTTP/1.1 200
< X-Track: 0267db93977b4552880a4704cf3e4565
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked
< Date: Fri, 10 Feb 2017 13:55:46 GMT
<
{"jobName":"job01","jobParams":null,"jobExecutionId":3,"error":null,"errorMessag
e":null}* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
$
From the above, it can be confirmed that job is executed with a job execution ID jobExecutionId = 3
.
Subsequently, job execution results are fetched by using job execution ID.
$ curl -v http://localhost:8080/asyncapp-web/api/v1/job/3
* timeout on name lookup is not supported
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8088 (#0)
> GET /asyncapp-web/api/v1/job/3 HTTP/1.1
> Host: localhost:8088
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200
< X-Track: 7d94bf4d383745efb20cbf37cb6a8e13
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked
< Date: Fri, 10 Feb 2017 14:07:44 GMT
<
{"jobExecutionId":3,"jobName":"job01","stepExecutions":[{"stepExecutionId":5,"st
epName":"job01.step01","status":"COMPLETED","failureExceptions":[]}],"status":"C
OMPLETED","exitStatus":"exitCode=COMPLETED;exitDescription=","errorMessage":null
}* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
$
Since exitCode=COMPLETED
, it can be confirmed that the job is completed successfully.
When execution results of curl are to be determined by a shell script etc
In the example above, everything up to the response message of the REST API is displayed.
When only the HTTP status is to be confirmed with the curl command, the HTTP status can be displayed on standard output by considering |
4.4.4. How to extend
4.4.4.1. Stopping and restarting jobs
It may be necessary to stop and restart a specific asynchronous job among the multiple jobs being executed.
Further, when jobs with identical names are running in parallel, only the problematic job must be targeted.
Hence, the target job execution must be identified and its status must be confirmed.
On this premise, an implementation for stopping and restarting asynchronous executions is explained here.
Further, a method to add job stopping (stop) and restarting (restart) is explained
for JobController
of Implementation of controller.
Job stopping and restarting can also be implemented without using JobOperator. For details, refer Job management and identify a method suitable for this objective. |
// asyncapp/asyncapp-web/src/main/java/org/terasoluna/batch/sample/app/api/JobController.java
package org.terasoluna.batch.sample.app.api;
// omitted
@RequestMapping("job")
@RestController
public class JobController {
// omitted.
@RequestMapping(value = "stop/{jobExecutionId}", method = RequestMethod.PUT)
@Deprecated
public ResponseEntity<JobOperationResource> stop(
@PathVariable("jobExecutionId") Long jobExecutionId) {
JobOperationResource responseResource = new JobOperationResource();
responseResource.setJobExecutionId(jobExecutionId);
boolean result = false;
try {
// (1)
result = jobOperator.stop(jobExecutionId);
if (!result) {
responseResource.setErrorMessage("stop failed.");
return ResponseEntity.badRequest().body(responseResource);
}
return ResponseEntity.ok().body(responseResource);
} catch (NoSuchJobExecutionException | JobExecutionNotRunningException e) {
responseResource.setError(e);
return ResponseEntity.badRequest().body(responseResource);
}
}
@RequestMapping(value = "restart/{jobExecutionId}",
method = RequestMethod.PUT)
@Deprecated
public ResponseEntity<JobOperationResource> restart(
@PathVariable("jobExecutionId") Long jobExecutionId) {
JobOperationResource responseResource = new JobOperationResource();
responseResource.setJobExecutionId(jobExecutionId);
try {
// (2)
Long id = jobOperator.restart(jobExecutionId);
responseResource.setJobExecutionId(id);
return ResponseEntity.ok().body(responseResource);
} catch (JobInstanceAlreadyCompleteException |
NoSuchJobExecutionException | NoSuchJobException |
JobRestartException | JobParametersInvalidException e) {
responseResource.setErrorMessage(e.getMessage());
return ResponseEntity.badRequest().body(responseResource);
}
}
// omitted.
}
Sr. No. | Description |
---|---|
(1) |
Specify "stop" for job being executed by calling |
(2) |
Re-execute from the step where the job has terminated abnormally or stopped by calling |
4.4.4.2. Multiple running
Multiple running means that Web containers are started multiple times, each waiting for its own job requests.
Execution of asynchronous jobs is controlled through an external RDBMS to which each application connects. By sharing the external RDBMS, asynchronous jobs can be made to wait for start requests across the same machine or different machines.
Possible applications include load balancing and redundancy for specific jobs. However, as described in Implementation of Web application, these effects cannot be obtained simply by starting multiple Web containers or increasing parallelism; sometimes measures similar to those for a general Web application need to be taken. Examples are given below.
-
Request processing operates in a stateless manner, in line with the characteristics of a Web application; however, asynchronous batch execution is likely to have reduced failure tolerance unless it is designed together with confirmation of job start results.
For example, even when the Web container which starts a job is made redundant, it is difficult to confirm the progress and results of the job if the job execution ID is lost after starting the job due to a failure on the client side.
A function to distribute request destinations on the client side must be implemented and a load balancer must be introduced in order to distribute the load on multiple Web containers.
In this way, the adequacy of multiple running cannot always be determined straightforwardly. Hence, using a load balancer and reviewing the way the Web client sends requests should be considered based on the purpose and usage. A design which does not degrade the performance and fault tolerance of the asynchronous execution application is required.
4.5. Listener
4.5.1. Overview
A listener is an interface for inserting processing before and after executing a job or a step.
Since this function works differently for the chunk model and the tasklet model, respective explanations are given.
A listener consists of multiple interfaces, respective roles are explained here. Subsequently, how to set and implement a listener is explained.
4.5.1.1. Types of listener
A lot of listener interfaces are defined in Spring Batch. All will not be explained here, however we will focus on the interface with highest usage frequency.
A listener is roughly divided into 2 types.
- JobListener
-
An interface to insert the processing for execution of the job
- StepListener
-
An interface to insert the processing for execution of the step
About JobListener
An interface called JobListener does not actually exist in Spring Batch; in this guideline the term is used to categorize job-level listeners, in contrast to StepListener, which is an actual interface. |
4.5.1.1.1. JobListener
JobListener
interface consists of only one JobExecutionListener
.
- JobExecutionListener
-
Process is inserted prior to starting a job and after terminating a job.
public interface JobExecutionListener {
void beforeJob(JobExecution jobExecution);
void afterJob(JobExecution jobExecution);
}
4.5.1.1.2. StepListener
StepListener
interface is of multiple types as below.
- StepListener
-
A marker interface; the various listeners that extend it are introduced below.
- StepExecutionListener
-
Processes are inserted prior to starting a step and after terminating a step.
public interface StepExecutionListener extends StepListener {
void beforeStep(StepExecution stepExecution);
ExitStatus afterStep(StepExecution stepExecution);
}
- ChunkListener
-
Processes are inserted before and after the processing of one chunk, and when an error occurs.
public interface ChunkListener extends StepListener {
static final String ROLLBACK_EXCEPTION_KEY = "sb_rollback_exception";
void beforeChunk(ChunkContext context);
void afterChunk(ChunkContext context);
void afterChunkError(ChunkContext context);
}
Uses of ROLLBACK_EXCEPTION_KEY
It is used when the exception that caused the rollback is to be fetched in the afterChunkError method; the exception is stored in the ChunkContext under this key. A usage example is shown below.
|
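A minimal sketch of such usage is shown below, assuming a ChunkListener implementation registered as a component (the class name and log message are illustrative):
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.stereotype.Component;
@Component
public class LoggingChunkErrorListener implements ChunkListener {
    private static final Logger logger =
            LoggerFactory.getLogger(LoggingChunkErrorListener.class);
    @Override
    public void beforeChunk(ChunkContext context) {
        // do nothing.
    }
    @Override
    public void afterChunk(ChunkContext context) {
        // do nothing.
    }
    @Override
    public void afterChunkError(ChunkContext context) {
        // The exception that caused the rollback is stored under ROLLBACK_EXCEPTION_KEY.
        Object rollbackCause = context.getAttribute(ChunkListener.ROLLBACK_EXCEPTION_KEY);
        logger.error("error occurred in chunk processing. [cause:{}]", rollbackCause);
    }
}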
- ItemReadListener
-
Insert a process before and after fetching 1 data record by ItemReader and when an error occurs.
public interface ItemReadListener<T> extends StepListener {
void beforeRead();
void afterRead(T item);
void onReadError(Exception ex);
}
- ItemProcessListener
-
Insert a process before and after processing 1 data record by ItemProcessor and when an error occurs.
public interface ItemProcessListener<T, S> extends StepListener {
void beforeProcess(T item);
void afterProcess(T item, S result);
void onProcessError(T item, Exception e);
}
- ItemWriteListener
-
Insert a process before and after output of 1 chunk by ItemWriter and when an error occurs.
public interface ItemWriteListener<S> extends StepListener {
void beforeWrite(List<? extends S> items);
void afterWrite(List<? extends S> items);
void onWriteError(Exception exception, List<? extends S> items);
}
This guideline does not explain the following listeners.
These listeners are intended to be used for exception handling; however, the policy of this guideline is not to perform exception handling with these listeners. For details, refer to Exception handling. |
4.5.2. How to use
Explanation is given about how to implement and set a listener.
4.5.2.1. Implementation of a listener
There are two ways to implement a listener.
-
Implement the listener interface with
implements
. -
Implement components with method-based annotation.
Which type of implementation to use is chosen based on the role of the listener. The criteria are described later.
4.5.2.1.1. When an interface is to be implemented
Various listener interfaces are implemented by using implements
. Multiple interfaces can be implemented at the same time based on requirement.
Implementation example is shown below.
@Component
public class JobExecutionLoggingListener implements JobExecutionListener { // (1)
private static final Logger logger =
LoggerFactory.getLogger(JobExecutionLoggingListener.class);
@Override
public void beforeJob(JobExecution jobExecution) { // (2)
// do nothing.
}
@Override
public void afterJob(JobExecution jobExecution) { // (3)
logger.info("job finished.[JobName:{}][ExitStatus:{}]",
jobExecution.getJobInstance().getJobName(),
jobExecution.getExitStatus().getExitCode()); // (4)
// per step execution
// (5)
jobExecution.getStepExecutions().forEach(stepExecution -> {
Object errorItem = stepExecution.getExecutionContext().get("ERROR_ITEM");
if (errorItem != null) {
logger.error("detected error on this item processing. " +
"[step:{}] [item:{}]", stepExecution.getStepName(),
errorItem);
}
});
}
}
<batch:job id="chunkJobWithListener" job-repository="jobRepository">
<batch:step id="chunkJobWithListener.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader" processor="processor"
writer="writer" commit-interval="10"/>
<batch:listeners>
<batch:listener ref="loggingEachProcessInStepListener"/>
</batch:listeners>
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="jobExecutionLoggingListener"/> <!-- (6) -->
</batch:listeners>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
Implement JobExecutionListener using implements. |
(2) |
Implement the beforeJob method of JobExecutionListener. In this example, nothing is done before the job starts. |
(3) |
Implement the afterJob method of JobExecutionListener. |
(4) |
Output the job name and exit code to an INFO log. Fetch the necessary information from JobExecution. |
(5) |
Output the error detected in each step to a log. Fetch the item stored under the key ERROR_ITEM from the ExecutionContext of each StepExecution and output it if present. |
(6) |
Set the listener implemented in (1) in the <batch:listeners> tag of the job. |
Listener support class
When multiple listener interfaces are to be implemented but only some of their methods are required, a support class that provides empty implementations (such as JobExecutionListenerSupport or StepExecutionListenerSupport) can be extended so that only the necessary methods are overridden. A sketch is shown below.
|
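For example, when only post-job processing is needed, a listener can extend JobExecutionListenerSupport and override only afterJob. The following is a sketch under that assumption (the class name and log output are illustrative):
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;
@Component
public class JobErrorLoggingListener extends JobExecutionListenerSupport {
    private static final Logger logger =
            LoggerFactory.getLogger(JobErrorLoggingListener.class);
    // beforeJob is not overridden; the empty implementation of the support class is used.
    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.FAILED) {
            logger.error("job failed. [JobName:{}]",
                    jobExecution.getJobInstance().getJobName());
        }
    }
}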
4.5.2.1.2. When annotations are assigned
Annotations corresponding to various listener interfaces are assigned. Multiple annotations can also be implemented as required.
Listener interface | Annotation |
---|---|
JobExecutionListener |
@BeforeJob, @AfterJob |
StepExecutionListener |
@BeforeStep, @AfterStep |
ChunkListener |
@BeforeChunk, @AfterChunk, @AfterChunkError |
ItemReadListener |
@BeforeRead, @AfterRead, @OnReadError |
ItemProcessListener |
@BeforeProcess, @AfterProcess, @OnProcessError |
ItemWriteListener |
@BeforeWrite, @AfterWrite, @OnWriteError |
These annotations take effect for their target scope when assigned to methods of a component implementation. An implementation example is given below.
@Component
public class AnnotationAmountCheckProcessor implements
ItemProcessor<SalesPlanDetail, SalesPlanDetail> {
private static final Logger logger =
LoggerFactory.getLogger(AnnotationAmountCheckProcessor.class);
@Override
public SalesPlanDetail process(SalesPlanDetail item) throws Exception {
if (item.getAmount().signum() == -1) {
throw new IllegalArgumentException("amount is negative.");
}
return item;
}
// (1)
/*
@BeforeProcess
public void beforeProcess(Object item) {
logger.info("before process. [Item :{}]", item);
}
*/
// (2)
@AfterProcess
public void afterProcess(Object item, Object result) {
logger.info("after process. [Result :{}]", result);
}
// (3)
@OnProcessError
public void onProcessError(Object item, Exception e) {
logger.error("on process error.", e);
}
}
<batch:job id="chunkJobWithListenerAnnotation" job-repository="jobRepository">
<batch:step id="chunkJobWithListenerAnnotation.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader"
processor="annotationAmountCheckProcessor"
writer="writer" commit-interval="10"/> <!-- (4) -->
</batch:tasklet>
</batch:step>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
When the annotation is to be used for implementation, only the annotations of the timing required for the processing should be assigned. |
(2) |
Implement the process to be performed after the processing of ItemProcessor. |
(3) |
Implement the process to be performed when an error occurs in ItemProcessor. |
(4) |
Set the ItemProcessor in which the listener is implemented with annotations in the processor attribute of <batch:chunk>. |
Constraints for the method which assigns the annotations
Not just any method can have the annotation assigned: the signature must match that of the corresponding listener interface method. This point is clearly stated in the Javadoc of the respective annotations. |
Precautions while implementing JobExecutionListener by an annotation
Since JobExecutionListener has a different scope from the other listeners, the listener is not automatically registered with the configuration above.
Hence, it is necessary to explicitly set it in the <batch:listeners> tag of the job. |
Implementing a listener for a Tasklet implementation by using annotations
When a listener is implemented for a Tasklet implementation by using annotations, note that the listener is not invoked with the settings shown above.
In the case of the tasklet model, the listener interface should be used, in accordance with How to choose an interface or an annotation. |
4.5.2.2. Listener settings
Listeners are set by using the <listeners>.<listener> tag in the bean definition.
Although the XML schema allows the tag to be described at various locations, some listeners do not work as intended depending on the type of interface.
Set it at the following positions.
<!-- for chunk mode -->
<batch:job id="chunkJob" job-repository="jobRepository">
<batch:step id="chunkJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="(1)"
processor="(1)"
writer="(1)" commit-interval="10"/>
<batch:listeners>
<batch:listener ref="(2)"/>
</batch:listeners>
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="(3)"/>
</batch:listeners>
</batch:job>
<!-- for tasklet mode -->
<batch:job id="taskletJob" job-repository="jobRepository">
<batch:step id="taskletJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager" ref="tasklet">
<batch:listeners>
<batch:listener ref="(2)"/>
</batch:listeners>
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="(3)"/>
</batch:listeners>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
Set the component containing the annotation-based listener implementation belonging to StepListener. |
(2) |
Set the listener interface implementation belonging to StepListener. |
(3) |
Set the listener belonging to JobListener. |
4.5.2.2.1. Setting multiple listeners
Multiple listeners can be set in <batch:listeners>
tag.
The order in which listeners are invoked when multiple listeners are registered is shown below, using the following registration as an example.
-
ItemProcessListener implementation
-
listenerA, listenerB
-
-
JobExecutionListener implementation
-
listenerC, listenerD
-
<batch:job id="chunkJob" job-repository="jobRepository">
<batch:step id="chunkJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader"
processor="processor"
writer="writer" commit-interval="10"/>
<batch:listeners>
<batch:listener ref="listenerA"/>
<batch:listener ref="listenerB"/>
</batch:listeners>
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="listenerC"/>
<batch:listener ref="listenerD"/>
</batch:listeners>
</batch:job>
-
Processing corresponding to pre-processing is started in the sequence of listener registration.
-
Processing corresponding to post-processing or error processing is started in the reverse sequence of listener registration.
4.5.2.3. How to choose an interface or an annotation
How to choose between implementing a listener with an interface and implementing it with annotations is explained.
- Listener interface
-
It is used for cross-cutting processes shared across jobs, steps and chunks.
- Annotation
-
It is used when processing specific to the business logic is to be performed.
As a rule, it is implemented only for ItemProcessor.
5. Input/Output of Data
5.1. Transaction control
5.1.1. Overview
In this section, transaction control in jobs will be described in the following order.
Since this function is different in usage between chunk model and tasklet model, each will be explained.
5.1.1.1. About the pattern of transaction control in general batch processing
Generally, batch processing handles a large number of records, so if an error occurs near the end of the processing and everything has to be redone,
the batch system schedule will be adversely affected.
To avoid this, the impact of an error is often localized by committing the transaction for every fixed number of records
while the processing of one job proceeds.
(Hereafter, the method of committing the transaction for every fixed number of records is called the "intermediate commit method",
and the group of records forming the commit unit is called a "chunk".)
The points of the intermediate commit method are summarized below.
-
Localize the effects at the time of error occurrence.
-
Even if an error occurs, the processing up to the chunk immediately before the error is already committed.
-
-
Only a fixed amount of resources is used.
-
Regardless of whether the amount of data to be processed is large or small, only the resources for one chunk are used at a time, so resource usage is stable.
-
However, the intermediate commit method is not valid in every situation.
Processed and unprocessed data coexist in the system, even if only temporarily.
As a result, unprocessed data must be identified during recovery processing, which may make the recovery complicated.
To avoid this, all records must be committed in one transaction instead of using the intermediate commit method.
(Hereinafter, the method of committing everything in one transaction is called the "single commit method".)
Nevertheless, if a large number of records, such as tens of thousands, is processed with the single commit method, a heavy load is placed on the database when all updates are reflected at commit time. Therefore, although the single commit method is suitable for small-scale batch processing, care must be taken when adopting it for large-scale batches, so this method is not a universal solution either.
In other words, there is a trade-off between "localization of impact" and "ease of recovery".
Whether to use the "intermediate commit method" or the "single commit method" depends on the nature of the job and on which aspect should be prioritized.
Of course, it is not necessary to implement all the jobs in the batch system with only one of the two methods.
It is natural to basically use the "intermediate commit method" and use the "single commit method" only for special jobs (or the other way around).
Below is the summary of advantages, disadvantages and adoption points of "intermediate commit method" and "single commit method".
Commit method | Advantage | Disadvantage | Adoption point |
---|---|---|---|
intermediate commit method |
Localize the effect at the time of error occurrence |
Recovery processing may be complicated |
When you want to process large amounts of data with certain machine resources |
single commit method |
Ensure data integrity |
There is a possibility of high work-load when processing a large number of cases |
When you want to set the processing result for the persistent resource to All or Nothing |
Notes on inputting and outputting to the same table in the database
Because of the structure of databases, care is required when a process inputs from and outputs to the same table and handles a large amount of data, regardless of the commit method.
In order to avoid this, the following measures are taken.
|
5.1.2. Architecture
5.1.2.1. Transaction control in Spring Batch
Job transaction control leverages the mechanism of Spring Batch.
Two kinds of transactions are defined below.
- Framework transaction
-
Transaction controlled by Spring Batch
- User transaction
-
Transactions controlled by the user
5.1.2.1.1. Transaction control mechanism in chunk model
Transaction control in the chunk model supports only the intermediate commit method; the single commit method cannot be realized.
The single commit method in the chunk model has been reported (requested) in JIRA. |
A feature of this method is that transactions are repeatedly performed for each chunk.
- Transaction control in normal process
-
Transaction control in normal process will be explained.
-
Steps are executed from the job.
-
The subsequent processing is repeated until there is no input data.
-
Start a framework transaction on a per chunk basis.
-
Repeat steps 2 to 5 until the chunk size is reached.
-
-
The step obtains input data from
ItemReader
. -
ItemReader
returns the input data to the step. -
In the step,
ItemProcessor
processes input data. -
ItemProcessor
returns the processing result to the step. -
The step outputs data for chunk size with
ItemWriter
. -
ItemWriter
will output to the target resource. -
The step commits the framework transaction.
- Transaction control in abnormal process
-
Transaction control in abnormal process will be explained.
-
Steps are executed from the job.
-
The subsequent processing is repeated until there is no input data.
-
Start a framework transaction on a per chunk basis.
-
Repeat steps 2 to 5 until the chunk size is reached.
-
-
The step obtains input data from
ItemReader
. -
ItemReader
returns the input data to the step. -
In the step,
ItemProcessor
processes input data. -
ItemProcessor
returns the processing result to the step. -
The step outputs data for chunk size with
ItemWriter
. -
ItemWriter
will output to the target resource.
If any exception occurs between the process from 2 to 7,
-
The step rolls back the framework transaction.
5.1.2.1.2. Mechanism of transaction control in tasklet model
For transaction control in the tasklet model, either the single commit method or the intermediate commit method can be used.
- single commit method
-
Use the transaction control mechanism of Spring Batch
- Intermediate commit method
-
The user manipulates the transaction directly
First, the mechanism of transaction control by Spring Batch (the single commit method) is explained.
A feature of this method is to process data repeatedly within one transaction.
- Transaction control in normal process
-
Transaction control in normal process will be explained.
-
Steps are executed from the job.
-
The step starts a framework transaction.
-
-
The step executes the tasklet.
-
Repeat steps 3 to 7 until there is no more input data.
-
-
Tasklet gets input data from
Repository
. -
Repository
will return input data to tasklet. -
Tasklets process input data.
-
Tasklets pass output data to
Repository
. -
Repository
will output to the target resource. -
The tasklet returns the process end to the step.
-
The step commits the framework transaction.
- Transaction control in abnormal process
-
Transaction control in abnormal process will be explained.
-
Steps are executed from the job.
-
The step starts a framework transaction.
-
-
The step executes the tasklet.
-
Repeat steps 3 to 7 until there is no more input data.
-
-
Tasklet gets input data from
Repository
. -
Repository
will return input data to tasklet. -
Tasklets process input data.
-
Tasklets pass output data to
Repository
. -
Repository
will output to the target resource.
If any exception occurs between the process from 2 to 7,
-
The tasklet throws an exception to the step.
-
The step rolls back the framework transaction.
A mechanism for directly operating a transaction by a user will be described.
A feature of this scheme is to start and manipulate a new user transaction within a framework transaction.
- Transaction control in normal process
-
Transaction control in normal process will be explained.
-
Steps are executed from the job.
-
The step starts framework transaction.
-
-
The step executes the tasklet.
-
Repeat steps 3 to 10 until there is no more input data.
-
-
The tasklet starts user transaction via
TransactionManager
.-
Execute the user transaction with REQUIRES_NEW to separate it from the framework transaction.
-
Repeat steps 4 to 6 until the chunk size is reached.
-
-
Tasklet gets input data from
Repository
. -
Repository
will return input data to tasklet. -
Tasklets process input data.
-
Tasklets pass output data to
Repository
. -
Repository
will output to the target resource. -
The tasklet commits the user transaction via
TransactionManager
. -
TransactionManager
issues a commit to the target resource. -
The tasklet returns the process end to the step.
-
The step commits the framework transaction.
In this case, each item is output to the resource one by one, but as in the chunk model,
it is also possible to output collectively in chunk units and improve the processing throughput.
At that time, you can also use batch update by setting |
- Transaction control in abnormal process
-
Transaction control in abnormal process will be explained.
-
Steps are executed from the job.
-
The step starts framework transaction.
-
-
The step executes the tasklet.
-
Repeat steps 3 to 11 until there is no more input data.
-
-
The tasklet starts user transaction from
TransactionManager
.-
Execute the user transaction with REQUIRES_NEW to separate it from the framework transaction.
-
Repeat steps 4 to 6 until the chunk size is reached.
-
-
Tasklet gets input data from
Repository
. -
Repository
will return input data to tasklet. -
Tasklets process input data.
-
Tasklets pass output data to
Repository
. -
Repository
will output to the target resource.
If any exception occurs between the process from 3 to 8,
-
The tasklet processes the exception that occurred.
-
The tasklet performs a rollback of user transaction via
TransactionManager
. -
TransactionManager
issues a rollback to the target resource. -
The tasklet throws an exception to the step.
-
The step rolls back framework transaction.
About processing continuation
Here, processing is terminated abnormally after handling the exception and rolling back, but it is also possible to continue processing the next chunk. In either case, it is necessary to notify subsequent processing that an error occurred during the process by changing the status / exit code of the step. |
About framework transactions
In this case, although the job is abnormally terminated by throwing an exception after rolling back the user transaction, it is also possible to return the processing end to the step and terminate the job normally. In this case, the framework transaction is committed. |
5.1.2.1.3. Selection policy for model-specific transaction control
In Spring Batch that is the basis of TERASOLUNA Batch 5.x, only the intermediate commit method can be implemented in the chunk model. However, in the tasklet model, either the intermediate commit method or the single commit method can be implemented.
Therefore, in TERASOLUNA Batch 5.x, when the single commit method is necessary, it is to be implemented in the tasklet model.
5.1.2.2. Difference in transaction control for each execution method
Depending on the execution method, a transaction that is not managed by Spring Batch occurs before and after the job is executed. This section explains transactions in two asynchronous execution processing schemes.
5.1.2.2.1. About transaction of DB polling
Processing of the job-request table performed by DB polling is done in transactions that are not managed by Spring Batch.
Also, since exceptions that occur in a job are handled within the job, they do not affect the transactions performed by JobRequestPollTask
.
A simple sequence diagram focusing on transactions is shown in the figure below.
-
JobRequestPollTask
is executed periodically from asynchronous batch daemon. -
JobRequestPollTask
will start a transaction other than Spring Batch managed. -
JobRequestPollTask
will retrieve an asynchronous batch job to be executed from the job-request table. -
JobRequestPollTask
will commit the transaction other than Spring Batch managed. -
JobRequestPollTask
will start a transaction other than Spring Batch managed. -
JobRequestPollTask
will update the polling status of the job-request table from INIT to POLLED. -
JobRequestPollTask
will commit the transaction other than Spring Batch managed. -
JobRequestPollTask
will execute the job. -
Inside the job, transaction control for DB for Management(
JobRepository
) will be managed by Spring Batch. -
Inside the job, transaction control for DB for Job will be managed by Spring Batch.
-
job_execution_id is returned to
JobRequestPollTask
-
JobRequestPollTask
will start a transaction other than Spring Batch managed. -
JobRequestPollTask
will update the polling status of the job-request table from POLLED to EXECUTED. -
JobRequestPollTask
will commit the transaction other than Spring Batch managed.
About Commit at SELECT Issuance
Some databases implicitly start a transaction when a SELECT is issued. Therefore, a commit is explicitly issued to confirm the transaction so that it is clearly separated from other transactions and is not affected by them. |
5.1.2.2.2. About the transaction of WebAP server process
Processing of the resources accessed by the WebAP is done in transactions that are not managed by Spring Batch. Also, since exceptions that occur in a job are handled within the job, they do not affect the transactions performed by the WebAP.
A simple sequence diagram focusing on transactions is shown in the figure below.
-
WebAP processing is executed by the request from the client
-
WebAP will start the transaction managed outside of Spring Batch.
-
WebAP reads from and writes to resources in WebAP before job execution.
-
WebAP executes the job.
-
Within a job, Spring Batch carries out transaction management to the Management DB (
JobRepository
). -
Within a job, Spring Batch carries out transaction management to the Job DB.
-
job_execution_id is returned to WebAP
-
WebAP reads from and writes to resources in WebAP after job execution.
-
WebAP will commit the transaction managed outside of Spring Batch.
-
WebAP returns a response to the client.
5.1.3. How to use
Here, transaction control in one job will be explained separately in the following cases.
The data source refers to the data storage location (database, file, etc.). A single data source refers to one data source, and multiple data sources refers to two or more data sources.
In the case of processing a single data source, the case of processing database data is representative.
There are some variations in the case of processing multiple data sources as follows.
-
multiple databases
-
databases and files
5.1.3.1. For a single data source
Transaction control of jobs input / output to one data source will be described.
Below is a sample setting with TERASOLUNA Batch 5.x.
<!-- Job-common definitions -->
<bean id="jobDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
destroy-method="close"
p:driverClassName="${jdbc.driver}"
p:url="${jdbc.url}"
p:username="${jdbc.username}"
p:password="${jdbc.password}"
p:maxTotal="10"
p:minIdle="1"
p:maxWaitMillis="5000"
p:defaultAutoCommit="false" />
<!-- (1) -->
<bean id="jobTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager"
p:dataSource-ref="jobDataSource"
p:rollbackOnCommitFailure="true" />
No | Description |
---|---|
(1) |
Bean definition of TransactionManager. |
5.1.3.1.1. Implement transaction control
The control method differs depending on the job model and the commit method.
In the case of the chunk model, the intermediate commit method is used and transaction control is left entirely to Spring Batch; the user must not control the transaction at all.
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
<batch:step id="jobSalesPlan01.step01">
<batch:tasklet transaction-manager="jobTransactionManager"> <!-- (1) -->
<batch:chunk reader="detailCSVReader"
writer="detailWriter"
commit-interval="10" /> <!-- (2) -->
</batch:tasklet>
</batch:step>
</batch:job>
No | Description |
---|---|
(1) |
Set jobTransactionManager to the transaction-manager attribute of <batch:tasklet>. |
(2) |
Set the chunk size, that is the commit interval, to the commit-interval attribute. In this example, a commit is performed every 10 records. |
In the case of the tasklet model, the method of transaction control differs depending on whether the method is single commit method or the intermediate commit method.
- single commit method
-
Spring Batch controls the transaction.
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
<batch:step id="jobSalesPlan01.step01">
<!-- (1) -->
<batch:tasklet transaction-manager="jobTransactionManager"
ref="salesPlanSingleTranTask" />
</batch:step>
</batch:job>
No | Description |
---|---|
(1) |
Set jobTransactionManager to the transaction-manager attribute of <batch:tasklet>, and specify the tasklet implementation with the ref attribute. |
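For reference, a tasklet for the single commit method simply delegates input and output to the injected components and never touches the transaction manager; everything it does is committed in the single framework transaction. The following is a minimal sketch of what such a tasklet (for example, the salesPlanSingleTranTask referenced above) might look like; it is an illustration, not the guideline's actual sample. SalesPlanDetail and SalesPlanDetailRepository are the domain classes used elsewhere in this guideline, and their imports and error handling are omitted.
import javax.inject.Inject;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.stereotype.Component;
@Component
public class SalesPlanSingleTranTask implements Tasklet {
    @Inject
    ItemStreamReader<SalesPlanDetail> itemReader;
    @Inject
    SalesPlanDetailRepository repository;
    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {
        SalesPlanDetail item;
        try {
            // Open the reader with the step's ExecutionContext.
            itemReader.open(chunkContext.getStepContext()
                    .getStepExecution().getExecutionContext());
            while ((item = itemReader.read()) != null) {
                // Every create below is committed together in one framework transaction.
                repository.create(item);
            }
        } finally {
            itemReader.close();
        }
        return RepeatStatus.FINISHED;
    }
}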
- intermediate commit method
-
Control transaction by user.
-
If you want to commit in the middle of processing, inject the
TransactionManager
and operate it manually.
-
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
<batch:step id="jobSalesPlan01.step01">
<!-- (1) -->
<batch:tasklet transaction-manager="jobTransactionManager"
ref="salesPlanChunkTranTask" />
</batch:step>
</batch:job>
@Component()
public class SalesPlanChunkTranTask implements Tasklet {
@Inject
ItemStreamReader<SalesPlanDetail> itemReader;
// (2)
@Inject
@Named("jobTransactionManager")
PlatformTransactionManager transactionManager;
@Inject
SalesPlanDetailRepository repository;
private static final int CHUNK_SIZE = 10;
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
DefaultTransactionDefinition definition = new DefaultTransactionDefinition();
definition.setPropagationBehavior(TransactionDefinition
.PROPAGATION_REQUIRES_NEW); // (3)
TransactionStatus status = null;
try {
// omitted
itemReader.open(executionContext);
while ((item = itemReader.read()) != null) {
if (count % CHUNK_SIZE == 0) {
status = transactionManager.getTransaction(definition); // (4)
}
count++;
// omitted
repository.create(item);
if (count % CHUNK_SIZE == 0) {
transactionManager.commit(status); // (5)
}
}
} catch (Exception e) {
logger.error("Exception occurred while reading.", e);
transactionManager.rollback(status); // (6)
throw e;
} finally {
if (!status.isCompleted()) {
transactionManager.commit(status); // (7)
}
itemReader.close();
}
return RepeatStatus.FINISHED;
}
}
No | Description |
---|---|
(1) |
Set jobTransactionManager to the transaction-manager attribute of <batch:tasklet>, and specify the tasklet implementation with the ref attribute. |
(2) |
Inject the transaction manager. |
(3) |
Specify PROPAGATION_REQUIRES_NEW so that the user transaction is separated from the framework transaction. |
(4) |
Start transaction at the beginning of chunk. |
(5) |
Commit the transaction at the end of the chunk. |
(6) |
When an exception occurs, roll back the transaction. |
(7) |
For the last chunk, commit the transaction. |
About the propagation of the TransactionManager
In the tasklet model, a new transaction is controlled inside the transaction controlled by Spring Batch. Therefore, it is necessary to set the propagation of the TransactionManager to REQUIRES_NEW. |
Updating by ItemWriter
In the above example, a Repository is used, but it is also possible to update data using an ItemWriter. Using an ItemWriter simplifies the implementation; in particular, FlatFileItemWriter should be used when writing files. |
5.1.3.1.2. Note for non-transactional data sources
In the case of files, no transaction setting or operation is necessary.
When using FlatFileItemWriter
, pseudo transaction control can be performed.
This is implemented by delaying the writing to the resource and actually writing out at the commit timing.
Normally, when it reaches the chunk size, it outputs chunk data to the actual file, and if an exception occurs, data output of the chunk is not performed.
FlatFileItemWriter
can switch transaction control on and off with transactional
property. The default is true and transaction control is enabled.
If the transactional
property is false, FlatFileItemWriter
will output the data regardless of the transaction.
When adopting the single commit method, it is recommended to set the transactional
property to false.
As described above, when transaction control is enabled, data is written to the resource only at commit time, and until then all output data is held in memory.
Therefore, when the amount of data is large, there is a high possibility that memory becomes insufficient and an error occurs.
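As an illustration, a FlatFileItemWriter bean with this pseudo transaction control disabled might be defined as follows; this is a sketch, and the bean id, output resource and field names are assumptions rather than definitions used elsewhere in this guideline.
<!-- (example) FlatFileItemWriter with the transactional property set to false -->
<bean id="writer" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
      p:resource="file:#{jobParameters['outputFile']}"
      p:transactional="false">
    <property name="lineAggregator">
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator"
              p:delimiter=",">
            <property name="fieldExtractor">
                <bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
                      p:names="customerId,customerName"/>
            </property>
        </bean>
    </property>
</bean>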
On TransactionManager settings in jobs that handle only files
Even in a job definition that inputs and outputs only files, the TransactionManager setting of <batch:tasklet> still needs to be specified; therefore, specify it as in other jobs.
At this time, transactions are issued to the resource (for example, a database) referred to by the specified TransactionManager even though it is not actually used.
If you do not want transactions to be issued to that unused resource, even when letting them be issued causes no actual harm, you can use ResourcelessTransactionManager, which performs no actual transaction processing. A sample usage is shown below.
|
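A minimal sketch, assuming a chunk-oriented job that reads and writes only files (the job id and bean ids are illustrative):
<!-- (example) TransactionManager that performs no actual transaction processing -->
<bean id="resourcelessTransactionManager"
      class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
<batch:job id="fileOnlyJob" job-repository="jobRepository">
    <batch:step id="fileOnlyJob.step01">
        <batch:tasklet transaction-manager="resourcelessTransactionManager">
            <batch:chunk reader="reader" writer="writer" commit-interval="10"/>
        </batch:tasklet>
    </batch:step>
</batch:job>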
5.1.3.2. For multiple data sources
Transaction control of jobs input / output to multiple data sources will be described. Since consideration points are different between input and output, they will be explained separately.
5.1.3.2.1. Input from multiple data source
When retrieving data from multiple data sources, the data that is the axis of the process and its accompanying data should be retrieved separately. Hereinafter, the data that is the axis of processing is referred to as the process target record, and the additional data accompanying it is referred to as accompanying data.
Because of the structure of Spring Batch, ItemReader is based on the premise that it retrieves process target records from one resource. This way of thinking is the same regardless of the type of resource.
-
Retrieving process target records
-
Get it by ItemReader.
-
-
Retrieving accompanying data
-
For accompanying data, it is necessary to select one of the following retrieval methods according to whether the data changes and the number of records. These are not exclusive options and may be used in combination.
-
Batch retrieval before step execution
-
Retrieve each time according to the record to be processed
-
-
For batch retrieval before step execution, implement a listener that does the following, and refer to the data from within the subsequent step.
-
Retrieve data collectively
-
Store the information in the bean whose scope is
Job
orStep
-
ExecutionContext
of Spring Batch can be used, but a different class can be created to store the data, considering readability and maintainability. For the sake of simplicity, the sample is explained using
.
-
This method is adopted when reading data that does not depend on the data to be processed, such as master data. However, even for master data, if the number of records is large enough to impact memory, retrieving each time should be considered.
@Component
// (1)
public class BranchMasterReadStepListener extends StepExecutionListenerSupport {
@Inject
BranchRepository branchRepository;
@Override
public void beforeStep(StepExecution stepExecution) { // (2)
List<Branch> branches = branchRepository.findAll(); //(3)
Map<String, Branch> map = branches.stream()
.collect(Collectors.toMap(Branch::getBranchId,
UnaryOperator.identity())); // (4)
stepExecution.getExecutionContext().put("branches", map); // (5)
}
}
<batch:job id="outputAllCustomerList01" job-repository="jobRepository">
<batch:step id="outputAllCustomerList01.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader"
processor="retrieveBranchFromContextItemProcessor"
writer="writer" commit-interval="10"/>
<batch:listeners>
<batch:listener ref="branchMasterReadStepListener"/> <!-- (6) -->
</batch:listeners>
</batch:tasklet>
</batch:step>
</batch:job>
@Component
public class RetrieveBranchFromContextItemProcessor implements
ItemProcessor<Customer, CustomerWithBranch> {
private Map<String, Branch> branches;
@BeforeStep // (7)
@SuppressWarnings("unchecked")
public void beforeStep(StepExecution stepExecution) {
branches = (Map<String, Branch>) stepExecution.getExecutionContext()
.get("branches"); // (8)
}
@Override
public CustomerWithBranch process(Customer item) throws Exception {
CustomerWithBranch newItem = new CustomerWithBranch(item);
newItem.setBranch(branches.get(item.getChargeBranchId())); // (9)
return newItem;
}
}
No | Description |
---|---|
(1) |
Implement the listener by extending StepExecutionListenerSupport. |
(2) |
Implement the beforeStep method, which is executed before the step. |
(3) |
Implement the processing to retrieve the master data. |
(4) |
Convert from List type to Map type so that it can be used easily in subsequent processing. |
(5) |
Set the acquired master data in the context of the step under the key "branches". |
(6) |
Register the created Listener to the target job. |
(7) |
In order to acquire master data before step execution of ItemProcessor, set up Listener with @BeforeStep annotation. |
(8) |
In the method given the @BeforeStep annotation, obtain the master data set in (5) from the context of the step. |
(9) |
In the process method of ItemProcessor, data is retrieved from the master data. |
Object to store in context
The object to be stored in the context (ExecutionContext) must be a class that implements java.io.Serializable, because the ExecutionContext is persisted to the JobRepository. |
For retrieval each time, a dedicated ItemProcessor is defined apart from the ItemProcessor for business processing. This simplifies the processing of each ItemProcessor.
-
Define ItemProcessor designated just for retrieving every time, and separate it from business process.
-
At this time, use MyBatis as it is when accessing the table.
-
-
Concatenate multiple ItemProcessors using CompositeItemProcessor.
-
Note that ItemProcessor is processed in the order specified in the delegates attribute.
-
@Component
public class RetrieveBranchFromRepositoryItemProcessor implements
ItemProcessor<Customer, CustomerWithBranch> {
@Inject
BranchRepository branchRepository; // (1)
@Override
public CustomerWithBranch process(Customer item) throws Exception {
CustomerWithBranch newItem = new CustomerWithBranch(item);
newItem.setBranch(branchRepository.findOne(
item.getChargeBranchId())); // (2)
return newItem; // (3)
}
}
<bean id="compositeItemProcessor"
class="org.springframework.batch.item.support.CompositeItemProcessor">
<property name="delegates">
<list>
<ref bean="retrieveBranchFromRepositoryItemProcessor"/> <!-- (4) -->
<ref bean="businessLogicItemProcessor"/> <!-- (5) -->
</list>
</property>
</bean>
No | Description |
---|---|
(1) |
Inject Repository for retrieving every time using MyBatis. |
(2) |
Accompanying data is retrieved from the Repository for the input data (process target record). |
(3) |
Return data with processing target record and accompanying data together. |
(4) |
Set ItemProcessor for retrieving every time. |
(5) |
Set ItemProcessor for business logic. |
5.1.3.2.2. Output to multiple data sources(multiple steps)
Process multiple data sources throughout the job by dividing the steps for each data source and processing a single data source at each step.
-
Data processed at the first step is stored in a table, and at the second step, it is outputted to a file.
-
Although each step is simple and easy to recover, there is a possibility that it results in doing the work twice.
-
As a result, if this causes the following adverse effects, consider processing multiple data sources in one step.
-
Processing time increases
-
Business logic becomes redundant
-
-
5.1.3.2.3. Output to multiple data sources(single step)
Generally, when transactions for multiple data sources are combined into one, a distributed transaction based on two-phase commit is used. However, it is also known to have the following disadvantages.
-
Middleware must be compatible with distributed transaction API such as XAResource, and special setting based on it is required
-
In standalone Java like a batch program, you need to add a JTA implementation library for distributed transactions
-
Recovery in case of failure is difficult
Although it is possible to use distributed transactions in Spring Batch as well, the method using global transactions with JTA incurs a performance overhead due to the characteristics of the protocol. As a simpler way to process multiple data sources together, the Best Efforts 1PC pattern is recommended.
- What is Best Efforts 1PC pattern
-
Briefly, it refers to the technique of handling multiple data sources as local transactions and issuing sequential commits at the same timing. The conceptual diagram is shown in the figure below.
-
The user instructs
ChainedTransactionManager
to start the transaction. -
ChainedTransactionManager
starts a transaction sequentially with registered transaction managers. -
The user performs transactional operations on each resource.
-
The user instructs
ChainedTransactionManager
to commit. -
ChainedTransactionManager
issues sequential commits on registered transaction managers.-
Commit(or roll back) in reverse order of transaction start
-
Since this method is not a distributed transaction, there is a possibility that data consistency may not be maintained if a failure(exception) occurs at commit / rollback in the second and subsequent transaction managers. Therefore, although it is necessary to design a recovery method when a failure occurs at a transaction boundary, there is an effect that the recovery frequency can be reduced and the recovery procedure can be simplified.
Use it on cases such as when processing multiple databases simultaneously, when processing databases and MQ, and so on.
Process as 1 phase-commit by defining multiple transaction managers as one using ChainedTransactionManager
as follows.
Note that ChainedTransactionManager
is a class provided by Spring Data.
<dependencies>
<!-- omitted -->
<!-- (1) -->
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-commons</artifactId>
</dependency>
</dependencies>
<!-- Chained Transaction Manager -->
<!-- (2) -->
<bean id="chainedTransactionManager"
class="org.springframework.data.transaction.ChainedTransactionManager">
<constructor-arg>
<!-- (3) -->
<list>
<ref bean="transactionManager1"/>
<ref bean="transactionManager2"/>
</list>
</constructor-arg>
</bean>
<batch:job id="jobSalesPlan01" job-repository="jobRepository">
<batch:step id="jobSalesPlan01.step01">
<!-- (4) -->
<batch:tasklet transaction-manager="chainedTransactionManager">
<!-- omitted -->
</batch:tasklet>
</batch:step>
</batch:job>
No | Description |
---|---|
(1) |
Add a dependency on spring-data-commons in order to use ChainedTransactionManager. |
(2) |
Define the bean of ChainedTransactionManager. |
(3) |
Define, as a list, the multiple transaction managers to be combined. |
(4) |
Specify the bean ID defined in (2) for the transaction-manager attribute used by the job. |
This method is used when processing databases and files at the same time.
For database it is the same as For a single data source.
For files, setting FlatFileItemWriter’s transactional
property to true provides the same effect as the "Best Efforts 1PC pattern" described above.
For details, refer to Note for non-transactional data sources.
This setting delays writing to the file until just before the database transaction is committed, so it is easy to synchronize the two data sources. However, even in this case, if an error occurs during the file output processing after committing to the database, data consistency may not be maintained, so it is necessary to design a recovery method.
5.1.3.3. Notes on the intermediate commit method
Although it is deprecated, when data to be processed is skipped in ItemWriter, the chunk size setting value is forcibly changed. Note that this has a very large impact on transactions. Refer to Skip for details.
5.2. Database Access
5.2.1. Overview
MyBatis3 (hereafter, called [MyBatis]) is used for database access in TERASOLUNA Batch 5.x. Please refer to the TERASOLUNA Server 5.x Development Guideline below for the basic usage of database access with MyBatis.
This chapter mainly explains the usage of database access that is specific to TERASOLUNA Batch 5.x.
Since this function works differently for the chunk model and the tasklet model, respective explanations are given.
5.2.2. How to use
How to use database access in TERASOLUNA Batch 5.x is explained here.
There are following 2 ways to use database access in TERASOLUNA Batch 5.x.
Please choose them based on the components accessing the database.
-
Use ItemReader and ItemWriter for MyBatis.
-
For input/output by database access in the chunk model.
-
org.mybatis.spring.batch.MyBatisCursorItemReader
-
org.mybatis.spring.batch.MyBatisBatchItemWriter
-
-
-
Use Mapper interface
-
For business logic processing in the chunk model.
-
With ItemProcessor implementation.
-
-
For all database access in the tasklet model.
-
With Tasklet implementation.
-
-
5.2.2.1. Common Settings
Explain common settings required for database access.
5.2.2.1.1. DataSource Setting
TERASOLUNA Batch 5.x assumes two data sources.
The two default data sources defined in launch-context.xml are shown below.
Data source name | Description |
---|---|
adminDataSource |
Data source used by Spring Batch and TERASOLUNA Batch 5.x |
jobDataSource |
Data source used by jobs |
The connection information properties and launch-context.xml are shown below.
Adjust these settings according to the user's environment.
<!-- (1) -->
<bean id="adminDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
destroy-method="close"
p:driverClassName="${admin.jdbc.driver}"
p:url="${admin.jdbc.url}"
p:username="${admin.jdbc.username}"
p:password="${admin.jdbc.password}"
p:maxTotal="10"
p:minIdle="1"
p:maxWaitMillis="5000"
p:defaultAutoCommit="false"/>
<!-- (2) -->
<bean id="jobDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
destroy-method="close"
p:driverClassName="${jdbc.driver}"
p:url="${jdbc.url}"
p:username="${jdbc.username}"
p:password="${jdbc.password}"
p:maxTotal="10"
p:minIdle="1"
p:maxWaitMillis="5000"
p:defaultAutoCommit="false" />
# (3)
# Admin DataSource settings.
admin.jdbc.driver=org.h2.Driver
admin.jdbc.url=jdbc:h2:mem:batch;DB_CLOSE_DELAY=-1
admin.jdbc.username=sa
admin.jdbc.password=
# (4)
# Job DataSource settings.
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost:5432/postgres
jdbc.username=postgres
jdbc.password=postgres
Sr. No. | Description |
---|---|
(1) |
Bean definition of adminDataSource. |
(2) |
Bean definition of jobDataSource. |
(3) |
Connection information for the database used by adminDataSource. |
(4) |
Connection information for the database used by jobDataSource. |
5.2.2.1.2. MyBatis Setting
Important points for setting MyBatis in TERASOLUNA Batch 5.x are explained here.
One of the important points in implementing batch processing is "to efficiently process a large amount of data with fixed resources".
The settings related to this are explained below.
-
fetchSize
-
In general batch processing, it is mandatory to specify the appropriate
fetchSize
for the JDBC driver to reduce the communication cost of processing large amounts of data.fetchSize
is a parameter that sets the number of records acquired in one communication between the JDBC driver and the database. It is desirable to set this value as large as possible; however, if it is too large, it puts pressure on memory, so be careful. The user has to tune this parameter. -
In MyBatis, user can set
defaultFetchSize
as a common setting for all queries, and can override it withfetchSize
setting for each query.
-
-
executorType
-
In general batch processing, the same SQL is executed within the same transaction for the number of
total data count/fetchSize
. At this time, it is possible to process efficiently by reusing a statement instead of creating it each time. -
In the MyBatis setting, it can reuse statements by setting
REUSE
indefaultExecutorType
and contributes to improved processing throughput. -
When updating a large amount of data at once, performance improvement can be expected by using batch update of JDBC.
Therefore,SqlSessionTemplate
used inMyBatisBatchItemWriter
is set toBATCH
(notREUSE
) inexecutorType
.
-
In TERASOLUNA Batch 5.x, two different ExecutorType
settings exist at the same time.
It is assumed that a job is often implemented with only one ExecutorType
, but special attention is required when using them together.
The details are explained in Database Access other than ItemReader・ItemWriter.
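As a supplement to the fetchSize setting described above, the per-query override is written directly on the statement in Mapper XML. The following is a sketch; the namespace, statement id, result type and value are illustrative assumptions, not definitions used elsewhere in this guideline.
<mapper namespace="com.example.batch.repository.SampleRepository">
    <!-- fetchSize specified here overrides defaultFetchSize for this statement only -->
    <select id="findAll" fetchSize="500"
            resultType="com.example.batch.model.Sample">
        SELECT id, name FROM sample_table ORDER BY id
    </select>
</mapper>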
Other parameter of MyBatis
For other parameters, refer to the following links and make settings that match the application characteristics. |
Show the default setting below.
<bean id="jobSqlSessionFactory"
class="org.mybatis.spring.SqlSessionFactoryBean"
p:dataSource-ref="jobDataSource">
<!-- (1) -->
<property name="configuration">
<bean class="org.apache.ibatis.session.Configuration"
p:localCacheScope="STATEMENT"
p:lazyLoadingEnabled="true"
p:aggressiveLazyLoading="false"
p:defaultFetchSize="1000"
p:defaultExecutorType="REUSE"/>
</property>
</bean>
<!-- (2) -->
<bean id="batchModeSqlSessionTemplate"
class="org.mybatis.spring.SqlSessionTemplate"
c:sqlSessionFactory-ref="jobSqlSessionFactory"
c:executorType="BATCH"/>
Sr. No. | Description |
---|---|
(1) |
Various MyBatis settings, including defaultFetchSize and defaultExecutorType described above. |
(2) |
Define a SqlSessionTemplate whose executorType is set to BATCH, for use in batch update with MyBatisBatchItemWriter. |
For the definition of SqlSessionFactory using adminDataSource
When performing synchronous execution, META-INF/spring/async-batch-daemon.xml
|
5.2.2.1.3. Mapper XML definition
Please refer to Implementation of database access process in TERASOLUNA Server 5.x Development Guideline, because there are no specific description about TERASOLUNA Batch 5.x.
5.2.2.1.4. MyBatis-Spring setting
When using the ItemReader and ItemWriter provided by MyBatis-Spring, it is necessary to set the Mapper XML to be used in the Mapper configuration.
As the setting method, there are following 2 methods.
-
Register Mapper XML to be used for all jobs as a common setting.
-
All Mapper XML has to be described in
META-INF/spring/launch-context.xml
.
-
-
Register Mapper XML to be used for each job as individual setting.
-
Mapper XML required by each job has to be described in bean definition under
META-INF/jobs/
-
If the common setting is used, the following adverse effects arise at synchronous execution, because not only the Mapper XML of the executed job but also the Mapper XML used by other jobs is read.
-
It takes time to start the job
-
Consumption of memory resources increases
To avoid it, TERASOLUNA Batch 5.x adopts a setting method that specifies only Mapper XML that the job requires for each job definition as individual setting.
For the basic setting method, please refer to MyBatis-Spring settings in TERASOLUNA Server 5.x Development Guideline.
In TERASOLUNA Batch 5.x, since multiple SqlSessionFactory
and SqlSessionTemplate
are defined,
it is necessary to explicitly specify which one to use.
Basically, specify jobSqlSessionFactory.
A setting example is shown below.
<!-- (1) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository.mst"
factory-ref="jobSqlSessionFactory"/>
Sr. No. | Description |
---|---|
(1) |
Set jobSqlSessionFactory to the factory-ref attribute of mybatis:scan. |
5.2.2.2. Database access with ItemReader
Explain Database access with ItemReader here.
5.2.2.2.1. ItemReader of MyBatis
MyBatis-Spring provides the following two ItemReader.
-
org.mybatis.spring.batch.MyBatisCursorItemReader
-
org.mybatis.spring.batch.MyBatisPagingItemReader
MyBatisPagingItemReader
is an ItemReader
that uses the mechanism described
in Pagination search for Entity (SQL refinement method) of TERASOLUNA Server 5.x Development Guideline
Since SQL is issued again after acquiring a certain number of cases, there is a possibility that data consistency may not be maintained.
Therefore, it is dangerous to use it in batch processing, so TERASOLUNA Batch 5.x does not use it in principle.
TERASOLUNA Batch 5.x uses only MyBatisCursorItemReader
.
In TERASOLUNA Batch 5.x, as explained in MyBatis-Spring setting,
the method of dynamically registering Mapper XML with mybatis:scan
is adopted.
Therefore, it is necessary to prepare an interface corresponding to Mapper XML.
For details, please refer to
Implementation of database access process in TERASOLUNA Server 5.x Development Guideline.
Show an example of usage of MyBatisCursorItemReader
below.
<!-- (1) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository.mst"
factory-ref="jobSqlSessionFactory"/>
<!-- (2) (3) (4) -->
<bean id="reader"
class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
p:queryId="org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository.findAll"
p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
<!-- (5) -->
<mapper namespace="org.terasoluna.batch.functionaltest.app.repository.mst.CustomerRepository">
<!-- (6) -->
<select id="findAll"
resultType="org.terasoluna.batch.functionaltest.app.model.mst.Customer">
<![CDATA[
SELECT
customer_id AS customerId,
customer_name AS customerName,
customer_address AS customerAddress,
customer_tel AS customerTel,
charge_branch_id AS chargeBranchId,
create_date AS createDate,
update_date AS updateDate
FROM
customer_mst
ORDER by
charge_branch_id ASC, customer_id ASC
]]>
</select>
<!-- omitted -->
</mapper>
public interface CustomerRepository {
// (7)
List<Customer> findAll();
// omitted
}
Sr. No. | Description |
---|---|
(1) |
Register Mapper XML. |
(2) |
Define MyBatisCursorItemReader. |
(3) |
Specify the SQL ID defined in (6), prefixed with the namespace, in the queryId property. |
(4) |
Specify jobSqlSessionFactory in the sqlSessionFactory-ref property. |
(5) |
Define Mapper XML. Match the value of namespace with the FQCN of the interface. |
(6) |
Define SQL. |
(7) |
Define the method corresponding to the SQL ID defined in (6) for the interface. |
5.2.2.3. Database Access with ItemWriter
Explain database access with ItemWriter in here.
5.2.2.3.1. ItemWriter of MyBatis
MyBatis-Spring provides only one following ItemWriter.
-
org.mybatis.spring.batch.MyBatisBatchItemWriter
The basic setting is the same as ItemReader of MyBatis.
MyBatisBatchItemWriter
needs to specify batchModeSqlSessionTemplate
described in MyBatis Setting.
Show an example definition of MyBatisBatchItemWriter
below.
<!-- (1) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository.plan"
factory-ref="jobSqlSessionFactory"/>
<!-- (2) (3) (4) -->
<bean id="detailWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
p:statementId="org.terasoluna.batch.functionaltest.app.repository.plan.SalesPlanDetailRepository.create"
p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>
<!-- omitted -->
<!-- (5) -->
<mapper namespace="org.terasoluna.batch.functionaltest.app.repository.plan.SalesPlanDetailRepository">
<!-- (6) -->
<insert id="create"
parameterType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail">
<![CDATA[
INSERT INTO
sales_plan_detail(branch_id, year, month, customer_id, amount)
VALUES (
#{branchId}, #{year}, #{month}, #{customerId}, #{amount}
)
]]>
</insert>
<!-- omitted -->
</mapper>
public interface SalesPlanDetailRepository {
// (7)
void create(SalesPlanDetail salesPlanDetail);
// omitted
}
Sr. No. | Description |
---|---|
(1) |
Register Mapper XML. |
(2) |
Define MyBatisBatchItemWriter. |
(3) |
Specify the SQL ID defined in (6), prefixed with the namespace, in the statementId property. |
(4) |
Specify batchModeSqlSessionTemplate in the sqlSessionTemplate-ref property. |
(5) |
Define Mapper XML. Match the value of namespace with the FQCN of the interface. |
(6) |
Define SQL. |
(7) |
Define the method corresponding to the SQL ID defined in (6) for the interface. |
5.2.2.4. Database Access other than ItemReader・ItemWriter
Explain database access except for ItemReader・ItemWriter.
To access the database except for ItemReader・ItemWriter, use the Mapper interface. In using the Mapper interface, TERASOLUNA Batch 5.x has the following restrictions.
Process | ItemProcessor | Tasklet | Listener |
---|---|---|---|
Reference |
Available |
Available |
Available |
Update |
Conditionally available |
Available |
Unavailable |
- Restrictions in ItemProcessor
-
MyBatis has a restriction that two or more ExecutorType values must not be used within the same transaction.
If "MyBatisBatchItemWriter is used for the ItemWriter" and "the Mapper interface is used in the ItemProcessor for both update and reference" are satisfied at the same time, this restriction is violated.
To avoid this restriction, the database is accessed in the ItemProcessor through a Mapper interface whose ExecutorType is BATCH.
In addition, MyBatisBatchItemWriter checks, with a status check after executing SQL, that the executed SQL was issued by itself; naturally it cannot manage SQL issued by the ItemProcessor, and an error occurs.
Therefore, if MyBatisBatchItemWriter is used, updating through the Mapper interface is not possible and only reference is allowed.
This error check of MyBatisBatchItemWriter can be disabled by configuration, but careful judgment is required because the verification of update results is then lost. |
- Restrictions in Tasklet
-
In a Tasklet, using the Mapper interface is the basic approach, so there is no impact as there is for ItemProcessor.
It is possible to inject and use MyBatisBatchItemWriter, but in that case the Mapper interface itself can be processed with the BATCH setting instead; in other words, there is basically no need to inject MyBatisBatchItemWriter. - Restrictions in Listener
-
The same restriction as for ItemProcessor also applies to listeners. In addition, it is hard to think of use cases in which a listener needs to perform updates. Therefore, update processing in listeners is prohibited.
Replacement of the update processing assumed in the listener
|
5.2.2.4.1. Database access with ItemProcessor
Show an example of database access with ItemProcessor.
@Component
public class UpdateItemFromDBProcessor implements
ItemProcessor<SalesPerformanceDetail, SalesPlanDetail> {
// (1)
@Inject
CustomerRepository customerRepository;
@Override
public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {
// (2)
Customer customer = customerRepository.findOne(readItem.getCustomerId());
// (3)
SalesPlanDetail writeItem = new SalesPlanDetail();
writeItem.setBranchId(customer.getChargeBranchId());
writeItem.setYear(readItem.getYear());
writeItem.setMonth(readItem.getMonth());
writeItem.setCustomerId(readItem.getCustomerId());
writeItem.setAmount(readItem.getAmount());
return writeItem;
}
}
<!-- (2) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository"
template-ref="batchModeSqlSessionTemplate"/>
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
p:queryId="org.terasoluna.batch.functionaltest.app.repository.performance.SalesPerformanceDetailRepository.findAll"
p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
<!-- (3) -->
<bean id="writer" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
p:statementId="org.terasoluna.batch.functionaltest.app.repository.plan.SalesPlanDetailRepository.create"
p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>
<batch:job id="DBAccessByItemProcessor" job-repository="jobRepository">
<batch:step id="DBAccessByItemProcessor.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<!-- (4) -->
<batch:chunk reader="reader"
processor="updateItemFromDBProcessor"
writer="writer" commit-interval="10"/>
</batch:tasklet>
</batch:step>
</batch:job>
Mapper interface and Mapper XML are omitted.
Sr. No. |
Description |
(1) |
Inject Mapper interface. |
(2) |
Register Mapper XML. |
(3) |
Define |
(4) |
Set ItemProcessor that injected Mapper interface. |
Supplement of MyBatisCursorItemReader setting
Different
|
5.2.2.4.2. Database Access with Tasklet
Show an example of database access in Tasklet.
@Component
public class OptimisticLockTasklet implements Tasklet {
// (1)
@Inject
ExclusiveControlRepository repository;
// omitted
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
Branch branch = repository.branchFindOne(branchId); // (2)
ExclusiveBranch exclusiveBranch = new ExclusiveBranch();
exclusiveBranch.setBranchId(branch.getBranchId());
exclusiveBranch.setBranchName(branch.getBranchName() + " - " + identifier);
exclusiveBranch.setBranchAddress(branch.getBranchAddress() + " - " + identifier);
exclusiveBranch.setBranchTel(branch.getBranchTel());
exclusiveBranch.setCreateDate(branch.getUpdateDate());
exclusiveBranch.setUpdateDate(new Timestamp(System.currentTimeMillis()));
exclusiveBranch.setOldBranchName(branch.getBranchName());
int result = repository.branchExclusiveUpdate(exclusiveBranch); // (3)
return RepeatStatus.FINISHED;
}
}
<!-- (4) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.ch05.exclusivecontrol.repository"
factory-ref="jobSqlSessionFactory"/>
<batch:job id="taskletOptimisticLockCheckJob" job-repository="jobRepository">
<batch:step id="taskletOptimisticLockCheckJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager"
ref="optimisticLockTasklet"> <!-- (5) -->
</batch:tasklet>
</batch:step>
</batch:job>
Mapper interface and Mapper XML are omitted.
Sr. No. |
Description |
(1) |
Inject Mapper interface. |
(2) |
Execute the search process with the Mapper interface. |
(3) |
Execute the update process with the Mapper interface. |
(4) |
Register Mapper XML |
(5) |
Inject Mapper interface and set Tasklet. |
Use batchModeSqlSessionTemplate
If there are many updating processes with the tasklet model, set |
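A minimal sketch of that setting, assuming the note refers to binding the Mapper interfaces to batchModeSqlSessionTemplate (instead of jobSqlSessionFactory) so that updates issued by the tasklet run in BATCH execution mode:
<!-- Sketch (assumption): use the BATCH-mode SqlSessionTemplate for an update-heavy tasklet -->
<mybatis:scan
    base-package="org.terasoluna.batch.functionaltest.ch05.exclusivecontrol.repository"
    template-ref="batchModeSqlSessionTemplate"/>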
5.2.2.4.3. Database Access with Listener
Database access with a listener is often linked with other components. Depending on the listener to be used and the implementation method, it is necessary to prepare an additional mechanism to hand the data over to other components.
Show an example in which StepExecutionListener acquires data before step execution and uses the data acquired by ItemProcessor.
public class CacheSetListener extends StepExecutionListenerSupport {
// (1)
@Inject
CustomerRepository customerRepository;
// (2)
@Inject
CustomerCache cache;
@Override
public void beforeStep(StepExecution stepExecution) {
// (3)
customerRepository.findAll().forEach(customer ->
cache.addCustomer(customer.getCustomerId(), customer));
}
}
@Component
public class UpdateItemFromCacheProcessor implements
ItemProcessor<SalesPerformanceDetail, SalesPlanDetail> {
// (4)
@Inject
CustomerCache cache;
@Override
public SalesPlanDetail process(SalesPerformanceDetail readItem) throws Exception {
Customer customer = cache.getCustomer(readItem.getCustomerId()); // (5)
SalesPlanDetail writeItem = new SalesPlanDetail();
// omitted
writeItem.setCustomerName(customer.getCustomerName()); // (6)
return writeItem;
}
}
// (7)
@Component
public class CustomerCache {
Map<String, Customer> customerMap = new HashMap<>();
public Customer getCustomer(String customerId) {
return customerMap.get(customerId);
}
public void addCustomer(String id, Customer customer) {
customerMap.put(id, customer);
}
}
<!-- omitted -->
<!-- (8) -->
<mybatis:scan
base-package="org.terasoluna.batch.functionaltest.app.repository"
template-ref="batchModeSqlSessionTemplate"/>
<!-- (9) -->
<bean id="cacheSetListener"
class="org.terasoluna.batch.functionaltest.ch05.dbaccess.CacheSetListener"/>
<!-- omitted -->
<batch:job id="DBAccessByItemListener" job-repository="jobRepository">
<batch:step id="DBAccessByItemListener.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader"
processor="updateItemFromCacheProcessor"
writer="writer" commit-interval="10"/> <!-- (10) -->
<!-- (11) -->
<batch:listeners>
<batch:listener ref="cacheSetListener"/>
</batch:listeners>
</batch:tasklet>
</batch:step>
</batch:job>
Sr. No. |
Description |
(1) |
Inject Mapper interface. |
(2) |
Inject a bean for caching data acquired from the Mapper interface. |
(3) |
Get data from the Mapper interface and cache it at the listener. |
(4) |
Inject the same bean as the cache set in (2). |
(5) |
Get corresponding data from the cache. |
(6) |
Reflect the data from the cache in the update data. |
(7) |
Implement the cache class as a component. |
(8) |
Register Mapper XML |
(9) |
Define the listener that uses the Mapper interface. |
(10) |
Specify ItemProcessor that uses cache. |
(11) |
Register the listener defined in (9). |
Using SqlSessionFactory with the Listener
In the above example, for a listener that runs outside the scope of chunk processing, the processing is performed outside the transaction, so setting |
5.3. File Access
5.3.1. Overview
This chapter describes how to input and output files.
The usage of this function is the same in both the chunk model and the tasklet model.
5.3.1.1. Type of File which can be handled
The types of files that can be handled with TERASOLUNA Batch 5.x are described below.
These are the same as the file types that Spring Batch can handle.
-
Flat File
-
XML
This section explains how to handle flat files first, and then explains XML in How To Extend.
First, the types of flat file that can be used with TERASOLUNA Batch 5.x are shown.
Each row inside a flat file is called a record, and the type of file is determined by the record format.
Format | Overview |
---|---|
Variable-length Record | A record format in which items are separated by a delimiter, such as CSV and TSV. The length of each item can be variable. |
Fixed-length Record | A record format in which items are separated by their length (in bytes). The length of each item is fixed. |
Single String Record | One record is handled as one String item. |
The basic structure of a flat file consists of the following two elements.
-
Record division
-
Record format
Element | Overview |
---|---|
Record Division | The division indicates the type of record, such as Header Record, Data Record, and Trailer Record. |
Record Format | The format holds information such as how many rows there are for the header, data, and trailer, and how many times each record is repeated. |
TERASOLUNA Batch 5.x can handle flat files in either Single Format or Multi Format that include each of these record divisions.
The record divisions and the record formats are explained below.
The overview of each record division is as follows.
Record Division | Overview |
---|---|
Header Record | A record placed at the beginning of the file (data part). |
Data Record | A record holding the data that is the main processing target of the file. |
Trailer/Footer Record | A record placed at the end of the file (data part). |
Footer/End Record | A record placed at the end of the file when the file is a Multi Format file. |
About the field that indicates the record division
A flat file having a header record or a trailer record may have a field indicating a record division. |
About the name of file format
Depending on the definition of the file format in each system, there are cases where names differ from this guideline, such as calling the Footer Record the End Record. |
A summary of Single Format and Multi Format is shown below.
Format | Overview |
---|---|
Single Format |
A format with Header N Rows + Data N Rows + Trailer N Rows. |
Multi Format |
A format with (Header N Rows + Data N Rows + Trailer N Rows) * N + Footer N Rows. |
The Multi Format record structure is shown in the figure as follows.
An example of a Single Format and Multi Format flat file is shown below.
// is used as a comment-out character for the description of the file.
branchId,year,month,customerId,amount // (1)
000001,2016,1,0000000001,100000000 // (2)
000001,2016,1,0000000002,200000000 // (2)
000001,2016,1,0000000003,300000000 // (2)
000001,3,600000000 // (3)
No | Descriptions |
---|---|
(1) |
A header record |
(2) |
A data record. |
(3) |
A trailer record. |
// (1)
H,branchId,year,month,customerId,amount // (2)
D,000001,2016,1,0000000001,100000000
D,000001,2016,1,0000000002,200000000
D,000001,2016,1,0000000003,300000000
T,000001,3,600000000
H,branchId,year,month,customerId,amount // (2)
D,00002,2016,1,0000000004,400000000
D,00002,2016,1,0000000005,500000000
D,00002,2016,1,0000000006,600000000
T,00002,3,1500000000
H,branchId,year,month,customerId,amount // (2)
D,00003,2016,1,0000000007,700000000
D,00003,2016,1,0000000008,800000000
D,00003,2016,1,0000000009,900000000
T,00003,3,2400000000
F,3,9,4500000000 // (3)
No | Descriptions |
---|---|
(1) |
It has a field indicating the record division at the beginning of the record. |
(2) |
Every time branchId changes, it repeats header, data, trailer. |
(3) |
A footer record. |
Assumptions on format of data part
In How To Use, the explanation assumes that the layout of the data part has the same format. This means that all the records of the data part are mapped to the same conversion target class |
About explanation of Multi Format file
|
5.3.1.2. A component that inputs and outputs a flat file
Describe the classes for handling flat files.
The relationships of classes used for inputting flat files are as follows.
The calling relationship of each component is as follows.
Details of each component are shown below.
- org.springframework.batch.item.file.FlatFileItemReader
-
Implementation class of ItemReader used for loading flat files. It uses the components below.
The flow of simple processing is as follows.
1. Use BufferedReaderFactory to get a BufferedReader.
2. Read one record from the flat file using the acquired BufferedReader.
3. Use LineMapper to map one record to the target bean.
- org.springframework.batch.item.file.BufferedReaderFactory
-
Generates a BufferedReader to read the file.
- org.springframework.batch.item.file.LineMapper
-
Maps one record to the target bean. It uses the components below.
The flow of simple processing is as follows.
1. Use LineTokenizer to split one record into individual items.
2. Use FieldSetMapper to map the split items to bean properties.
- org.springframework.batch.item.file.transform.LineTokenizer
-
Divides one record acquired from the file into individual items.
Each divided item is stored in the FieldSet class.
- org.springframework.batch.item.file.mapping.FieldSetMapper
-
Maps each item of the divided record to a property of the target bean.
The relationships of the classes used for outputting flat files are as follows.
The calling relationship of each component is as follows.
- org.springframework.batch.item.file.FlatFileItemWriter
-
Implementation class of ItemWriter for exporting to a flat file. It uses the component below.
LineAggregator: maps the target bean to one record.
- org.springframework.batch.item.file.transform.LineAggregator
-
Maps the target bean to one record. The mapping between the properties of the bean and each item in the record is done by FieldExtractor.
- org.springframework.batch.item.file.transform.FieldExtractor
-
Maps the properties of the target bean to the items of one record.
5.3.2. How To Use
Descriptions of how to use flat files according to the record format.
Then, the following items are explained.
5.3.2.1. Variable-length record
Describe the definition method when dealing with variable-length record file.
5.3.2.1.1. Input
An example of setting for reading the following input file is shown.
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
public class SalesPlanDetail {
private String branchId;
private int year;
private int month;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
The setting for reading the above file is as follows.
<!-- (1) (2) (3) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}"
p:encoding="MS932"
p:strict="true">
<property name="lineMapper"> <!-- (4) -->
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer"> <!-- (5) -->
<!-- (6) (7) (8) -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="branchId,year,month,customerId,amount"
p:delimiter=","
p:quoteCharacter='"'/>
</property>
<property name="fieldSetMapper"> <!-- (9) -->
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail"/>
</property>
</bean>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resource |
Set the input file. |
Nothing |
|
(2) |
encoding |
Sets the character code of the input file. |
JavaVM’s default character set |
|
(3) |
strict |
If true is set, an exception occurs if the input file does not exist(can not be opened). |
true |
|
(4) |
lineMapper |
Set |
Nothing |
|
(5) |
lineTokenizer |
Set |
Nothing |
|
(6) |
names |
Give a name to each item of one record. |
Nothing |
|
(7) |
delimiter |
Set delimiter |
comma |
|
(8) |
quoteCharacter |
Set enclosing character |
Nothing |
|
(9) |
fieldSetMapper |
If special conversion processing such as character strings and numbers is unnecessary, use |
Nothing |
See How To Extend for the case of implementing FieldSetMapper yourself. |
How to input a TSV format file
When reading a TSV file, this can be realized by setting a tab as the delimiter, either by a constant or by a character reference (see the sketch after this note).
|
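The referenced setting examples are not reproduced above, so the following is a minimal sketch of both variants; the names property is reused from the variable-length example and the util namespace is assumed to be declared in the bean definition file.
<!-- Sketch: tab delimiter specified by constant -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
      p:names="branchId,year,month,customerId,amount">
    <property name="delimiter">
        <util:constant
            static-field="org.springframework.batch.item.file.transform.DelimitedLineTokenizer.DELIMITER_TAB"/>
    </property>
</bean>
<!-- Sketch: tab delimiter specified by character reference -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
      p:names="branchId,year,month,customerId,amount"
      p:delimiter="&#09;"/>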
5.3.2.1.2. Output
An example of setting for writing the following output file is shown.
001,CustomerName001,CustomerAddress001,11111111111,001
002,CustomerName002,CustomerAddress002,11111111111,002
003,CustomerName003,CustomerAddress003,11111111111,003
public class Customer {
private String customerId;
private String customerName;
private String customerAddress;
private String customerTel;
private String chargeBranchId;
private Timestamp createDate;
private Timestamp updateDate;
// omitted getter/setter
}
The settings for writing the above file are as follows.
<!-- Writer -->
<!-- (1) (2) (3) (4) (5) (6) (7) -->
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"
p:encoding="MS932"
p:lineSeparator="
"
p:appendAllowed="true"
p:shouldDeleteIfEmpty="true"
p:shouldDeleteIfExists="false"
p:transactional="true">
<property name="lineAggregator"> <!-- (8) -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator"
p:delimiter=","> <!-- (9) -->
<property name="fieldExtractor"> <!-- (10) -->
<!-- (11) -->
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="customerId,customerName,customerAddress,customerTel,chargeBranchId"/>
</property>
</bean>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resource |
Set the output file. |
Nothing |
|
(2) |
encoding |
Sets the character code of the output file. |
JavaVM default character set |
|
(3) |
lineSeparator |
Set record break (line feed code). |
|
|
(4) |
appendAllowed |
If true, add to the existing file. |
false |
|
(5) |
shouldDeleteIfEmpty |
If true, delete the output if it is an empty file. |
false |
|
(6) |
shouldDeleteIfExists |
If true, delete the file if it already exists. |
true |
|
(7) |
transactional |
Set whether to perform transaction control. For details, see Transaction Control. |
true |
|
(8) |
lineAggregator |
Set |
Nothing |
|
(9) |
delimiter |
Sets the delimiter. |
comma |
|
(10) |
fieldExtractor |
If special conversion processing for strings and numbers is unnecessary, you can use |
Nothing |
|
(11) |
names |
Give a name to each item of one record. Set each name from the beginning of the record with a comma separator. |
Nothing |
To enclose fields with an enclosure character, use org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator provided by TERASOLUNA Batch 5.x.
The specification of EnclosableDelimitedLineAggregator is as follows.
-
The enclosure character and the delimiter can be specified optionally.
The defaults are the values commonly used in the CSV format: enclosure character " (double quote) and delimiter , (comma).
-
If a field contains a carriage return, line feed, enclosure character, or delimiter, the field is enclosed with the enclosure character.
When the field includes the enclosure character, it is escaped by adding another enclosure character immediately before it.
-
All fields can be enclosed with the enclosure character by setting allEnclosing to true.
The usage of EnclosableDelimitedLineAggregator is shown below.
"001","CustomerName""001""","CustomerAddress,001","11111111111","001"
"002","CustomerName""002""","CustomerAddress,002","11111111111","002"
"003","CustomerName""003""","CustomerAddress,003","11111111111","003"
// Same as above example
<property name="lineAggregator"> <!-- (1) -->
<!-- (2) (3) (4) -->
<bean class="org.terasoluna.batch.item.file.transform.EnclosableDelimitedLineAggregator"
p:delimiter=","
p:enclosure='"'
p:allEnclosing="true">
<property name="fieldExtractor">
<!-- omitted settings -->
</property>
</bean>
</property>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
lineAggregator |
Set |
Nothing |
|
(2) |
delimiter |
Sets the delimiter. |
comma |
|
(3) |
enclosure |
Set the enclosing character. |
double quote |
|
(4) |
allEnclosing |
If true, all fields are enclosed in an enclosing character. |
false |
TERASOLUNA Batch 5.x provides this extension class. The CSV format is defined as follows in RFC-4180, which is the general specification of the CSV format.
|
How to output a TSV format file
When outputting a TSV file, this can be realized by setting a tab as the delimiter (see the sketch after this note).
|
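A minimal sketch of the tab-delimiter setting for output, assuming the rest of the writer definition is the same as the variable-length output example:
<!-- Sketch: tab-separated output specified by character reference -->
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator"
      p:delimiter="&#09;">
    <property name="fieldExtractor">
        <!-- omitted settings -->
    </property>
</bean>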
5.3.2.2. Fixed-length record
Describe how to define fixed length record files.
5.3.2.2.1. Input
An example of setting for reading the following input file is shown.
TERASOLUNA Batch 5.x supports both a format in which record delimitation is determined by line breaks and a format in which it is determined by the number of bytes.
Sale012016 1 00000011000000000
Sale022017 2 00000022000000000
Sale032018 3 00000033000000000
Sale012016 1 00000011000000000Sale022017 2 00000022000000000Sale032018 3 00000033000000000
No | Field Name | Data Type | Number of bytes |
---|---|---|---|
(1) |
branchId |
String |
6 |
(2) |
year |
int |
4 |
(3) |
month |
int |
2 |
(4) |
customerId |
String |
10 |
(5) |
amount |
BigDecimal |
10 |
public class SalesPlanDetail {
private String branchId;
private int year;
private int month;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
The setting for reading the above file is as follows.
<!-- (1) (2) (3) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}"
p:encoding="MS932"
p:strict="true">
<property name="bufferedReaderFactory"> <!-- (4) -->
<bean class="org.springframework.batch.item.file.DefaultBufferedReaderFactory"/>
</property>
<property name="lineMapper"> <!-- (5) -->
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer"> <!-- (6) -->
<!-- (7) -->
<!-- (8) -->
<!-- (9) -->
<bean class="org.terasoluna.batch.item.file.transform.FixedByteLengthLineTokenizer"
p:names="branchId,year,month,customerId,amount"
c:ranges="1-6, 7-10, 11-12, 13-22, 23-32"
c:charset="MS932" />
</property>
<property name="fieldSetMapper"> <!-- (10) -->
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail"/>
</property>
</bean>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resource |
Set the input file. |
Nothing |
|
(2) |
encoding |
Sets the character code of the input file. |
JavaVM default character set |
|
(3) |
strict |
If true is set, an exception occurs if the input file does not exist(can not be opened). |
true |
|
(4) |
bufferedReaderFactory |
To decide record breaks by line breaks, use the default value To judge the delimiter of a record by the number of bytes, set |
|
|
(5) |
lineMapper |
Set |
Nothing |
|
(6) |
lineTokenizer |
Set |
Nothing |
|
(7) |
names |
Give a name to each item of one record. |
Nothing |
|
(8) |
ranges |
Sets the break position. Set the delimiter position from the beginning of the record, separated by commas. |
Nothing |
|
(9) |
charset |
Set the same character code as (2). |
Nothing |
|
(10) |
fieldSetMapper |
If special conversion processing for character strings and numbers is unnecessary, use |
Nothing |
See How To Extend for the case of implementing FieldSetMapper yourself. |
To read a file that judges record delimiter by byte count, use org.terasoluna.batch.item.file.FixedByteLengthBufferedReaderFactory
provided by TERASOLUNA Batch 5.x.
By using FixedByteLengthBufferedReaderFactory
, it is possible to acquire up to the number of bytes specified as one record.
The specification of FixedByteLengthBufferedReaderFactory
is as follows.
-
Specify byte count of record as constructor argument
-
Generate
FixedByteLengthBufferedReader
which reads the file with the specified number of bytes as one record
Use of FixedByteLengthBufferedReader is as follows.
-
Reads the file with the byte length specified at instance creation as one record.
-
If there is a line feed code, it is not discarded; it is read as part of the byte length of one record.
-
The file encoding used for reading is the value set in FlatFileItemReader, and it is applied when the BufferedReader is generated.
The method of defining FixedByteLengthBufferedReaderFactory
is shown below.
<property name="bufferedReaderFactory">
<bean class="org.terasoluna.batch.item.file.FixedByteLengthBufferedReaderFactory"
c:byteLength="32"/> <!-- (1) -->
</property>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
byteLength |
Set the number of bytes per record. |
Nothing |
Components to use when handling Fixed-length files
When dealing with fixed-length files, it is assumed that the components provided by TERASOLUNA Batch 5.x are used.
|
Processing records containing multibyte character strings
When processing records containing multibyte character strings, be sure to use Since this issue is already reported to JIRA Spring Batch/BATCH-2540, it might be unnecessary in the future. |
For the implementation of FieldSetMapper, refer to How To Extend. |
5.3.2.2.2. Output
An example of setting for writing the following output file is shown.
In order to write a fixed-length file, it is necessary to format the value obtained from the bean according to the number of bytes of the field.
The format execution method differs as follows depending on whether double-byte characters are included or not.
-
If double-byte characters are not included (single-byte characters only, so the number of bytes per character is constant)
-
Format using FormatterLineAggregator.
-
The format is specified with the same format string as the String.format method.
-
If double-byte characters are included (the number of bytes per character is not constant and depends on the character code)
-
Format with an implementation class of FieldExtractor.
First, a setting example in the case where double-byte characters are not included in the output file is shown, followed by a setting example in the case where double-byte characters are included.
The setting when double-byte characters are not included in the output file is shown below.
0012016 10000000001 10000000
0022017 20000000002 20000000
0032018 30000000003 30000000
No | Field Name | Data Type | Number of bytes |
---|---|---|---|
(1) |
branchId |
String |
6 |
(2) |
year |
int |
4 |
(3) |
month |
int |
2 |
(4) |
customerId |
String |
10 |
(5) |
amount |
BigDecimal |
10 |
If a field value is shorter than the specified number of bytes, the rest of the field is padded with half-width spaces.
public class SalesPlanDetail {
private String branchId;
private int year;
private int month;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
The settings for writing the above file are as follows.
<!-- Writer -->
<!-- (1) (2) (3) (4) (5) (6) (7) -->
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"
p:encoding="MS932"
p:lineSeparator="
"
p:appendAllowed="true"
p:shouldDeleteIfEmpty="true"
p:shouldDeleteIfExists="false"
p:transactional="true">
<property name="lineAggregator"> <!-- (8) -->
<bean class="org.springframework.batch.item.file.transform.FormatterLineAggregator"
p:format="%6s%4s%2s%10s%10s"> <!-- (9) -->
<property name="fieldExtractor"> <!-- (10) -->
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="branchId,year,month,customerId,amount"/> <!-- (11) -->
</property>
</bean>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resource |
Set the output file. |
Nothing |
|
(2) |
encoding |
Sets the character code of the output file. |
JavaVM default character set |
|
(3) |
lineSeparator |
Set the record break(line feed code) |
|
|
(4) |
appendAllowed |
If true, append to the existing file. |
false |
|
(5) |
shouldDeleteIfEmpty |
If true, delete the output if it is an empty file. |
false |
|
(6) |
shouldDeleteIfExists |
If true, delete the file if it already exists. |
true |
|
(7) |
transactional |
Set whether to perform transaction control. For details, see Transaction Control. |
true |
|
(8) |
lineAggregator |
Set |
Nothing |
|
(9) |
format |
Set the output format with the format used in the |
Nothing |
|
(10) |
fieldExtractor |
If special conversion processing for strings and numbers is unnecessary, you can use If conversion processing is necessary, set implementation class of |
|
|
(11) |
names |
Give a name to each item of one record. Set the names of each field from the beginning of the record with a comma. |
Nothing |
About PassThroughFieldExtractor
Default value for property
If the item is an array or a collection, it is returned as is, otherwise it is wrapped in an array of single elements. |
When formatting double-byte characters, the number of bytes per character differs depending on the character code, so use an implementation class of FieldExtractor instead of FormatterLineAggregator.
The implementation class of FieldExtractor is created as follows.
-
Implement FieldExtractor and override the extract method.
-
The extract method is implemented so that it gets the values from the item (target bean), performs the conversion as needed, sets the values into an Object array, and returns it.
The formatting of a field that includes double-byte characters is done in the implementation class of FieldExtractor in the following way.
-
Get the number of bytes of the value in the target character code.
-
Format the value by trimming or padding it according to the number of bytes.
Below is a setting example for formatting a field including double-byte characters.
0012016 10000000001 10000000
番号2017 2 売上高002 20000000
番号32018 3 売上003 30000000
Use of the output file is the same as the example above.
<property name="lineAggregator"> <!-- (1) -->
<bean class="org.springframework.batch.item.file.transform.FormatterLineAggregator"
p:format="%s%4s%2s%s%10s"> <!-- (2) -->
<property name="fieldExtractor"> <!-- (3) -->
<bean class="org.terasoluna.batch.functionaltest.ch05.fileaccess.plan.SalesPlanFixedLengthFieldExtractor"/>
</property>
</bean>
</property>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
lineAggregator |
Set |
Nothing |
|
(2) |
format |
Set the output format with the format used in the |
Nothing |
|
(3) |
fieldExtractor |
Set implementation class of |
|
public class SalesPlanDetail {
private String branchId;
private int year;
private int month;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
public class SalesPlanFixedLengthFieldExtractor implements FieldExtractor<SalesPlanDetail> {
// (1)
@Override
public Object[] extract(SalesPlanDetail item) {
Object[] values = new Object[5]; // (2)
// (3)
values[0] = fillUpSpace(item.getBranchId(), 6); // (4)
values[1] = item.getYear();
values[2] = item.getMonth();
values[3] = fillUpSpace(item.getCustomerId(), 10); // (4)
values[4] = item.getAmount();
return values; // (7)
}
// It is a simple impl for example
private String fillUpSpace(String val, int num) {
String charsetName = "MS932";
int len;
try {
len = val.getBytes(charsetName).length; // (5)
} catch (UnsupportedEncodingException e) {
// omitted exception handling
}
String fillStr = "";
for (int i = 0; i < (num - len); i++) { // (6)
fillStr += " ";
}
return fillStr + val;
}
}
No | Description |
---|---|
(1) |
Implement |
(2) |
Define a Object type array to store data after the conversion. |
(3) |
Get the value from the item(target bean), and perform the conversion as needed, set the value to an array of object. |
(4) |
Format the field that includes double-byte character. |
(5) |
Get the number of bytes for the character code. |
(6) |
Format the value by trimming or padding it according to the number of bytes. |
(7) |
Returns an array of Object type holding the processing result. |
5.3.2.3. Single String record
Describe the definition method when dealing with a single character string record file.
5.3.2.3.1. Input
An example of setting for reading the following input file is shown below.
Summary1:4,000,000,000
Summary2:5,000,000,000
Summary3:6,000,000,000
The setting for reading the above file is as follows.
<!-- (1) (2) (3) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}"
p:encoding="MS932"
p:strict="true">
<property name="lineMapper"> <!-- (4) -->
<bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper"/>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resource |
Set the input file. |
Nothing |
|
(2) |
encoding |
Sets the character code of the input file. |
JavaVM default character set |
|
(3) |
strict |
If true is set, an exception occurs if the input file does not exist(can not be opened). |
true |
|
(4) |
lineMapper |
Set |
Nothing |
5.3.2.3.2. Output
An example of setting for writing the following output file is shown.
Summary1:4,000,000,000
Summary2:5,000,000,000
Summary3:6,000,000,000
<!-- Writer -->
<!-- (1) (2) (3) (4) (5) (6) (7) -->
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"
p:encoding="MS932"
p:lineSeparator="
"
p:appendAllowed="true"
p:shouldDeleteIfEmpty="true"
p:shouldDeleteIfExists="false"
p:transactional="true">
<property name="lineAggregator"> <!-- (8) -->
<bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator"/>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resource |
Set the output file. |
Nothing |
|
(2) |
encoding |
Sets the character code of the output file. |
JavaVM default character set |
|
(3) |
lineSeparator |
Set the record break(line feed code) |
|
|
(4) |
appendAllowed |
If true, append to the existing file. |
false |
|
(5) |
shouldDeleteIfEmpty |
If true, delete the output if it is an empty file. |
false |
|
(6) |
shouldDeleteIfExists |
If true, delete the file if it already exists. |
true |
|
(7) |
transactional |
Set whether to perform transaction control. For details, see Transaction Control. |
true |
|
(8) |
lineAggregator |
Set |
Nothing |
5.3.2.4. Header and Footer
Explain the input / output method when there is a header / footer.
Here, the method of skipping the header / footer by specifying the number of lines is explained.
When the number of header / footer records is variable and the number of lines cannot be specified, use PatternMatchingCompositeLineMapper with reference to Multi format input (a rough sketch is shown below).
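A rough sketch of that approach is shown below. It assumes a record-division field at the beginning of each record (H for header, D for data), and the referenced tokenizer and FieldSetMapper beans (headerTokenizer, dataTokenizer, headerFieldSetMapper, dataFieldSetMapper) are illustrative assumptions, not definitions provided by this guideline.
<!-- Sketch (assumption): choose a tokenizer / FieldSetMapper per record division -->
<bean id="lineMapper"
      class="org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper">
    <property name="tokenizers">
        <map>
            <entry key="H*" value-ref="headerTokenizer"/>
            <entry key="D*" value-ref="dataTokenizer"/>
        </map>
    </property>
    <property name="fieldSetMappers">
        <map>
            <entry key="H*" value-ref="headerFieldSetMapper"/>
            <entry key="D*" value-ref="dataFieldSetMapper"/>
        </map>
    </property>
</bean>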
5.3.2.4.1. Input
There are 2 ways to skip the header record.
-
Set the number of lines to skip to property
linesToSkip
ofFlatFileItemReader
-
Remove header record in preprocessing by OS command
sales_plan_detail_11
branchId,year,month,customerId,amount
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
The first 2 lines are the header records.
The setting for reading the above file is as follows.
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}"
p:linesToSkip="2"> <!-- (1) -->
<property name="lineMapper">
<!-- omitted settings -->
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
linesToSkip |
Set the number of header record lines to skip. |
0 |
# Remove number of lines in header from the top of input file
tail -n +`expr 2 + 1` input.txt > output.txt
Use the tail command to get the 3rd and subsequent lines from input.txt, and write them to output.txt.
Note that the value specified for the option -n +K of the tail command is the number of header records + 1.
OS command to skip header record and footer record
By using the head and tail commands, it is possible to skip the header record and footer record by specifying the number of lines.
A sample shell script that skips the header and footer records by removing a specified number of lines from each can be written as follows.
|
The following shows how to recognize and retrieve the header record.
The extraction of header information is implemented as follows.
- Settings
-
Write the processing for the header record in an implementation class of org.springframework.batch.item.file.LineCallbackHandler.
Set the information retrieved in LineCallbackHandler#handleLine() to stepExecutionContext.
Set the implementation class of LineCallbackHandler to the property skippedLinesCallback of FlatFileItemReader.
Set the number of lines to skip to the property linesToSkip of FlatFileItemReader.
- Reading the file and retrieving header information
-
LineCallbackHandler#handleLine() is executed for each line skipped by the linesToSkip setting.
The header information is set to stepExecutionContext.
- Using the retrieved header information
-
Get the header information from stepExecutionContext and use it in the processing of the data part.
An example of implementation for retrieving header record information is shown below.
<bean id="lineCallbackHandler"
class="org.terasoluna.batch.functionaltest.ch05.fileaccess.module.HoldHeaderLineCallbackHandler"/>
<!-- (1) (2) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:linesToSkip="2"
p:skippedLinesCallback-ref="lineCallbackHandler"
p:resource="file:#{jobParameters[inputFile]}">
<property name="lineMapper">
<!-- omitted settings -->
</property>
</bean>
<batch:job id="jobReadCsvSkipAndReferHeader" job-repository="jobRepository">
<batch:step id="jobReadCsvSkipAndReferHeader.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader"
processor="loggingHeaderRecordItemProcessor"
writer="writer" commit-interval="10"/>
<batch:listeners>
<batch:listener ref="lineCallbackHandler"/> <!-- (3) -->
</batch:listeners>
</batch:tasklet>
</batch:step>
</batch:job>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
linesToSkip |
Set the number of lines to skip. |
0 |
|
(2) |
skippedLinesCallback |
Set implementation class of |
Nothing |
|
(3) |
listener |
Set implementation class of |
Nothing |
About the listener
Since the following two cases are not automatically registered as
|
LineCallbackHandler should be implemented as follows.
-
Implement StepExecutionListener#beforeStep() in either of the following ways:
implement the StepExecutionListener class and override the beforeStep method, or implement a beforeStep method and annotate it with @BeforeStep.
Get StepExecution in the beforeStep method and save it in a class field.
-
Implement LineCallbackHandler#handleLine():
implement the LineCallbackHandler class and override handleLine.
Note that the handleLine method is executed each time a line is skipped.
Get stepExecutionContext from StepExecution and set the header information to stepExecutionContext.
@Component
public class HoldHeaderLineCallbackHandler implements LineCallbackHandler { // (1)
private StepExecution stepExecution; // (2)
@BeforeStep // (3)
public void beforeStep(StepExecution stepExecution) {
this.stepExecution = stepExecution; // (4)
}
@Override // (5)
public void handleLine(String line) {
this.stepExecution.getExecutionContext().putString("header", line); // (6)
}
}
No | Description |
---|---|
(1) |
Implement |
(2) |
Define a field to save |
(3) |
Implement |
(4) |
Get the |
(5) |
Implement |
(6) |
Get |
Here is a sample of getting the header information from stepExecutionContext and using it in the processing of the data part.
A sample that uses the header information in ItemProcessor is described as an example.
The same can be done when using header information in other components.
The implementation for using the header information is done as follows.
-
As in the implementation example of LineCallbackHandler, implement StepExecutionListener#beforeStep().
-
Get StepExecution in the beforeStep method and save it to a class field.
-
Get stepExecutionContext and the header information from StepExecution and use it.
@Component
public class LoggingHeaderRecordItemProcessor implements
ItemProcessor<SalesPlanDetail, SalesPlanDetail> {
private StepExecution stepExecution; // (1)
@BeforeStep // (2)
public void beforeStep(StepExecution stepExecution) {
this.stepExecution = stepExecution; // (3)
}
@Override
public SalesPlanDetail process(SalesPlanDetail item) throws Exception {
String headerData = this.stepExecution.getExecutionContext()
.getString("header"); // (4)
// omitted business logic
return item;
}
}
No | Description |
---|---|
(1) |
Define a field to save |
(2) |
Implement |
(3) |
Get the |
(4) |
Get |
About the use of ExecutionContext of Job/Step
In retrieving header (footer) information, the method is to store the read header information in In the example below, header information is stored in For details about |
Since neither Spring Batch nor TERASOLUNA Batch 5.x supports skipping the footer record, it needs to be done by an OS command.
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
number of items,3
total of amounts,6000000000
The last 2 lines are the footer records.
The setting for reading the above file is as follows.
# Remove number of lines in footer from the end of input file
head -n -2 input.txt > output.txt
Use the head command to get all lines except the last two from input.txt, and write them to output.txt.
It is reported in JIRA Spring Batch/BATCH-2539 that Spring Batch does not have a function to skip the footer record. |
Spring Batch and TERASOLUNA Batch 5.x do not provide functions for skipping the footer record and retrieving footer information.
Therefore, the processing needs to be divided into an OS command for preprocessing and 2 steps, as described below.
-
Divide the footer record from the file by an OS command
-
In the 1st step, read the footer record and set the footer information to ExecutionContext
-
In the 2nd step, retrieve the footer information from ExecutionContext and use it
Retrieving footer information is implemented as follows.
- Divide the footer record by an OS command
-
Use an OS command to divide the input file into the footer part and the rest.
- 1st step: read the footer record and get the footer information
-
Read the footer record and set it to jobExecutionContext.
Since the step that stores the footer information differs from the step that uses it, store it in jobExecutionContext.
The use of jobExecutionContext is the same as that of the stepExecutionContext explained in Retrieving header information, except that the scope is Job instead of Step.
- 2nd step: use the retrieved footer information
-
Get the footer information from jobExecutionContext and use it for processing the data part.
An example will be described in which footer information of the following file is taken out and used.
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
number of items,3
total of amounts,6000000000
The last 2 lines are the footer records.
The setting to divide the above file into footer part and others by OS command is as follows.
# Extract non-footer record from input file and save to output file.
head -n -2 input.txt > input_data.txt
# Extract footer record from input file and save to output file.
tail -n 2 input.txt > input_footer.txt
Use the head command to write the non-footer part of input.txt to input_data.txt, and the tail command to write the footer part to input_footer.txt.
Output file sample is as follows.
000001,2016,1,0000000001,1000000000
000002,2017,2,0000000002,2000000000
000003,2018,3,0000000003,3000000000
number of items,3
total of amounts,6000000000
Explain how to get and use footer information from a footer record divided by an OS command.
The step that reads the footer record is divided into a preprocessing step and a main processing step.
Refer to Flow Control for details of step division.
In the example below, the footer information is retrieved and stored in jobExecutionContext.
The footer information can then be used by retrieving it from jobExecutionContext in the same way as described in Retrieving header information.
public class SalesPlanDetail {
private String branchId;
private int year;
private int month;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
public class SalesPlanDetailFooter implements Serializable {
// omitted serialVersionUID
private String name;
private String value;
// omitted getter/setter
}
Define the Bean as below.
-
Define an ItemReader to read the footer record.
-
Define an ItemReader to read the data records.
-
Define the business logic that retrieves the footer record.
In the sample below, it is done by implementing Tasklet.
-
Define a job.
Define a step for the preprocessing that gets the footer information and a step for the main processing that reads the data records.
<!-- ItemReader for reading footer records -->
<!-- (1) -->
<bean id="footerReader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[footerInputFile]}">
<property name="lineMapper">
<!-- omitted other settings -->
</property>
</bean>
<!-- ItemReader for reading data records -->
<!-- (2) -->
<bean id="dataReader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[dataInputFile]}">
<property name="lineMapper">
<!-- omitted other settings -->
</property>
</bean>
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step">
<!-- omitted settings -->
</bean>
<!-- Tasklet for reading footer records -->
<bean id="readFooterTasklet"
class="org.terasoluna.batch.functionaltest.ch05.fileaccess.module.ReadFooterTasklet"/>
<batch:job id="jobReadAndWriteCsvWithFooter" job-repository="jobRepository">
<!-- (3) -->
<batch:step id="jobReadAndWriteCsvWithFooter.step01"
next="jobReadAndWriteCsvWithFooter.step02">
<batch:tasklet ref="readFooterTasklet"
transaction-manager="jobTransactionManager"/>
</batch:step>
<!-- (4) -->
<batch:step id="jobReadAndWriteCsvWithFooter.step02">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="dataReader"
writer="writer" commit-interval="10"/>
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="readFooterTasklet"/> <!-- (5) -->
</batch:listeners>
</batch:job>
No | Item | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
footerReader |
Define |
||
(2) |
dataReader |
Define |
||
(3) |
preprocess step |
Define a step to get the footer information. |
||
(4) |
main process step |
A step of retreiving data information and using footer information is defined. |
||
(5) |
listeners |
Set |
Nothing |
An example of reading a file with a footer record and storing it to jobExecutionContext is shown below.
It is implemented as an implementation class of Tasklet as follows.
-
Inject the bean defined as footerReader by name using @Inject and @Named.
-
Set the footer information to jobExecutionContext.
The realization method is the same as in Retrieving header information.
public class ReadFooterTasklet implements Tasklet {
// (1)
@Inject
@Named("footerReader")
ItemStreamReader<SalesPlanDetailFooter> itemReader;
private JobExecution jobExecution;
@BeforeJob
public void beforeJob(JobExecution jobExecution) {
this.jobExecution = jobExecution;
}
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
ArrayList<SalesPlanDetailFooter> footers = new ArrayList<>();
// (2)
itemReader.open(chunkContext.getStepContext().getStepExecution()
.getExecutionContext());
SalesPlanDetailFooter footer;
while ((footer = itemReader.read()) != null) {
footers.add(footer);
}
// (3)
jobExecution.getExecutionContext().put("footers", footers);
return RepeatStatus.FINISHED;
}
}
No | Description |
---|---|
(1) |
Inject the bean defined |
(2) |
Use |
(3) |
Get |
5.3.2.4.2. Output
To output header information to a flat file, implement as follows.
-
Implement org.springframework.batch.item.file.FlatFileHeaderCallback.
-
Set the implemented FlatFileHeaderCallback to the property headerCallback of FlatFileItemWriter.
By setting headerCallback, FlatFileHeaderCallback#writeHeader() is executed first in the processing of FlatFileItemWriter.
Implement FlatFileHeaderCallback as follows.
-
Implement the FlatFileHeaderCallback class and override writeHeader.
-
Write the header information using the Writer passed as the argument.
Sample implementation of FlatFileHeaderCallback
is shown below.
@Component
// (1)
public class WriteHeaderFlatFileFooterCallback implements FlatFileHeaderCallback {
@Override
public void writeHeader(Writer writer) throws IOException {
// (2)
writer.write("omitted");
}
}
No | Description |
---|---|
(1) |
Implement |
(2) |
Write the header information using |
<!-- (1) (2) -->
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:headerCallback-ref="writeHeaderFlatFileFooterCallback"
p:lineSeparator="
"
p:resource="file:#{jobParameters[outputFile]}">
<property name="lineAggregator">
<!-- omitted settings -->
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
headerCallback |
Set implementation class of |
||
(2) |
lineSeparator |
Set the record break(line feed code) |
|
When implementing FlatFileHeaderCallback, printing a line feed at the end of the header information is not necessary
Right after executing |
To output footer information to a flat file, implement as follows.
-
Implement org.springframework.batch.item.file.FlatFileFooterCallback.
-
Set the implemented FlatFileFooterCallback to the property footerCallback of FlatFileItemWriter.
By setting footerCallback, FlatFileFooterCallback#writeFooter() is executed at the end of the processing of FlatFileItemWriter.
The method of outputting footer information to a flat file is described below.
Implement FlatFileFooterCallback as follows.
-
Implement the FlatFileFooterCallback class and override writeFooter.
-
Output the footer information using the Writer passed as the argument.
Below is an implementation sample of FlatFileFooterCallback
class for a Job to get footer information from ExecutionContext
and write it out to a file.
public class SalesPlanDetailFooter implements Serializable {
// omitted serialVersionUID
private String name;
private String value;
// omitted getter/setter
}
@Component
public class WriteFooterFlatFileFooterCallback implements FlatFileFooterCallback { // (1)
private JobExecution jobExecution;
@BeforeJob
public void beforeJob(JobExecution jobExecution) {
this.jobExecution = jobExecution;
}
@Override
public void writeFooter(Writer writer) throws IOException {
@SuppressWarnings("unchecked")
ArrayList<SalesPlanDetailFooter> footers = (ArrayList<SalesPlanDetailFooter>) this.jobExecution.getExecutionContext().get("footers"); // (2)
BufferedWriter bufferedWriter = new BufferedWriter(writer); // (3)
// (4)
for (SalesPlanDetailFooter footer : footers) {
bufferedWriter.write(footer.getName() +" is " + footer.getValue());
bufferedWriter.newLine();
bufferedWriter.flush();
}
}
}
No | Description |
---|---|
(1) |
Implement |
(2) |
Get footer information from |
(3) |
In the sample, in order to use |
(4) |
Use the |
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"
p:footerCallback-ref="writeFooterFlatFileFooterCallback"> <!-- (1) -->
<property name="lineAggregator">
<!-- omitted settings -->
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
footerCallback |
Set implementation class of |
5.3.2.5. Multiple Files
Describe how to handle multiple files.
5.3.2.5.1. Input
To read multiple files of the same record format, use org.springframework.batch.item.file.MultiResourceItemReader
.
MultiResourceItemReader
can use the specified ItemReader
to read multiple files specified by regular expressions.
Implement MultiResourceItemReader as follows.
-
Define a bean of MultiResourceItemReader.
Set the files to read to the property resources.
Use a regular expression so that multiple files are read.
Set the ItemReader used to read each file to the property delegate.
Below is a definition example of MultiResourceItemReader
to read multiple files with the following file names.
sales_plan_detail_01.csv
sales_plan_detail_02.csv
sales_plan_detail_03.csv
<!-- (1) (2) -->
<bean id="multiResourceReader"
class="org.springframework.batch.item.file.MultiResourceItemReader"
scope="step"
p:resources="file:input/sales_plan_detail_*.csv"
p:delegate-ref="reader"/>
<!-- (3) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="lineMapper">
<!-- omitted settings -->
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resources |
Set multiple input files with regular expressions. |
Nothing |
|
(2) |
delegate |
Set |
Nothing |
|
(3) |
|
Bean definition for property |
It is unnecessary to specify resource for ItemReader used by MultiResourceItemReader
The |
5.3.2.5.2. Output
Explain how to output multiple files.
To output a separate file for every certain number of items, use org.springframework.batch.item.file.MultiResourceItemWriter.
MultiResourceItemWriter can output to multiple files, each holding the specified number of items, using the specified ItemWriter.
It is necessary to make the output file names unique so that they do not overlap, and ResourceSuffixCreator is provided as a mechanism for doing so.
ResourceSuffixCreator is a class that generates a suffix that makes the file name unique.
For example, to make the output target file name outputDir/customer_list_01.csv (the 01 part is a serial number), set it as follows.
-
Set outputDir/customer_list_ to MultiResourceItemWriter.
-
Implement code that generates the suffix 01.csv (the 01 part is a serial number) in ResourceSuffixCreator.
The serial number can use the value that is automatically incremented and passed from MultiResourceItemWriter.
-
As a result, outputDir/customer_list_01.csv is set to the ItemWriter that is actually used.
MultiResourceItemWriter is defined as follows. How to implement ResourceSuffixCreator is described later.
-
Define an implementation class of ResourceSuffixCreator.
-
Define a bean of MultiResourceItemWriter.
Set the output file to the property resource.
Set the file name up to the part before the suffix generated by the implementation class of ResourceSuffixCreator.
Set the implementation class of ResourceSuffixCreator that generates the suffix to the property resourceSuffixCreator.
Set the ItemWriter that is actually used for writing to the property delegate.
Set the number of items output per file to the property itemCountLimitPerResource.
<!-- (1) (2) (3) (4) -->
<bean id="multiResourceItemWriter"
class="org.springframework.batch.item.file.MultiResourceItemWriter"
scope="step"
p:resource="file:#{jobParameters[outputDir]}"
p:resourceSuffixCreator-ref="customerListResourceSuffixCreator"
p:delegate-ref="writer"
p:itemCountLimitPerResource="4"/>
<!-- (5) -->
<bean id="writer"
class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="lineAggregator">
<!-- omitted settings -->
</property>
</bean>
<bean id="customerListResourceSuffixCreator"
class="org.terasoluna.batch.functionaltest.ch05.fileaccess.module.CustomerListResourceSuffixCreator"/> <!-- (6) -->
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) |
resource |
Sets the state before adding the suffix of the output target file. |
Nothing |
|
(2) |
resourceSuffixCreator |
Set implementation class of |
|
|
(3) |
delegate |
Set a |
Nothing |
|
(4) |
itemCountLimitPerResource |
Set the number of output per file. |
|
|
(5) |
|
Property |
Setting of resource of the ItemWriter used by MultiResourceItemWriter is not necessary
Since |
Implement ResourceSuffixCreator as follows.
-
Implement ResourceSuffixCreator and override the getSuffix method.
-
Use the argument index to generate the suffix and return it.
index is an int value with initial value 1 that is incremented for each output file.
// (1)
public class CustomerListResourceSuffixCreator implements ResourceSuffixCreator {
@Override
public String getSuffix(int index) {
return String.format("%02d", index) + ".csv"; // (2)
}
}
No | Description |
---|---|
(1) |
Implement |
(2) |
Use argument’s |
5.3.2.6. Control Break
How to implement control break processing is described here.
- What is Control Break
-
Control break processing (or key break processing) is a processing method that reads sorted records one by one and handles records sharing a certain item (the key item) as one group.
It is an algorithm used mainly for aggregating data: counting continues while the key item has the same value, and the aggregated value is output when the key item changes to a different value.
In order to perform control break processing, it is necessary to pre-read a record in order to judge the change of the group.
Pre-reading records can be done by using org.springframework.batch.item.support.SingleItemPeekableItemReader.
Also, control break can be processed only in the tasklet model.
This is because the premises of the chunk model, processing a fixed number of data rows at a time and drawing transaction boundaries every fixed number of rows, do not fit the basic control break algorithm of processing at the point where the group changes.
The execution timing of the control break processing and the comparison conditions are shown below.
-
Execute a control break before processing the target record.
Keep the previously read record and compare the previous record with the current record.
-
Execute a control break after processing the target record.
Pre-read the next record with SingleItemPeekableItemReader and compare the current record with the next record.
A sample for outputting process result from input data using control break is shown below.
01,2016,10,1000
01,2016,11,1500
01,2016,12,1300
02,2016,12,900
02,2016,12,1200
Header Branch Id : 01,,,
01,2016,10,1000
01,2016,11,1500
01,2016,12,1300
Summary Branch Id : 01,,,3800
Header Branch Id : 02,,,
02,2016,12,900
02,2016,12,1200
Summary Branch Id : 02,,,2100
@Component
public class ControlBreakTasklet implements Tasklet {
@Inject
SingleItemPeekableItemReader<SalesPerformanceDetail> reader; // (1)
@Inject
ItemStreamWriter<SalesPerformanceDetail> writer;
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
// omitted.
SalesPerformanceDetail previousData = null; // (2)
BigDecimal summary = new BigDecimal(0); //(3)
List<SalesPerformanceDetail> items = new ArrayList<>(); // (4)
try {
reader.open(executionContext);
writer.open(executionContext);
while (reader.peek() != null) { // (5)
SalesPerformanceDetail data = reader.read(); // (6)
// (7)
if (isBreakByBranchId(previousData, data)) {
SalesPerformanceDetail beforeBreakData =
new SalesPerformanceDetail();
beforeBreakData.setBranchId("Header Branch Id : "
+ data.getBranchId());
items.add(beforeBreakData);
}
// omitted.
items.add(data); // (8)
SalesPerformanceDetail nextData = reader.peek(); // (9)
summary = summary.add(data.getAmount());
// (10)
SalesPerformanceDetail afterBreakData = null;
if (isBreakByBranchId(nextData, data)) {
afterBreakData = new SalesPerformanceDetail();
afterBreakData.setBranchId("Summary Branch Id : "
+ data.getBranchId());
afterBreakData.setAmount(summary);
items.add(afterBreakData);
summary = new BigDecimal(0);
writer.write(items); // (11)
items.clear();
}
previousData = data; // (12)
}
} finally {
try {
reader.close();
} catch (ItemStreamException e) {
}
try {
writer.close();
} catch (ItemStreamException e) {
}
}
return RepeatStatus.FINISHED;
}
// (13)
private boolean isBreakByBranchId(SalesPerformanceDetail o1,
SalesPerformanceDetail o2) {
return (o1 == null || !o1.getBranchId().equals(o2.getBranchId()));
}
}
No | Description |
---|---|
(1) |
Inject |
(2) |
Define a variable to set the previously read record. |
(3) |
Define a variable to set aggregated values for each group. |
(4) |
Define a variable to set records for each group including the control break’s process result |
(5) |
Repeat the process until there is no input data. |
(6) |
Read the record to be processed. |
(7) |
Execute a control break before target record processing. |
(8) |
Set the process result to the variable defined in (4). |
(9) |
Pre-read the next record. |
(10) |
Execute a control break after target record processing. In this case, if it is at the end of the group, the aggregated data is set in the trailer and stored in the variable defined in (4). |
(11) |
Output processing results for each group. |
(12) |
Store the processing record in the variable defined in (2). |
(13) |
Judge whether the key item has switched or not. |
<!-- (1) -->
<bean id="reader"
class="org.springframework.batch.item.support.SingleItemPeekableItemReader"
p:delegate-ref="delegateReader" /> <!-- (2) -->
<!-- (3) -->
<bean id="delegateReader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="branchId,year,month,customerId,amount"/>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.app.model.performance.SalesPerformanceDetail"/>
</property>
</bean>
</property>
</bean>
No | Description |
---|---|
(1) | Define a bean for SingleItemPeekableItemReader. |
(2) | Set the bean of the ItemReader that actually reads the file to the delegate property. |
(3) | Define a bean for the ItemReader that actually reads the file. |
5.3.3. How To Extend
Here, an explanation will be written based on the below case.
-
Input/Output of XML File
-
Input/Output of Multi format
5.3.3.1. Implementation of FieldSetMapper
Explain how to implement FieldSetMapper yourself.
Implement the FieldSetMapper class as follows.

- Implement the FieldSetMapper class and override the mapFieldSet method.
- Get values from the argument's FieldSet, perform any processing needed, and set them to the conversion target bean to be returned.
  - The FieldSet class is a class that holds data in association with an index or a name, like the JDBC ResultSet class.
  - The FieldSet class holds the value of each field of a record divided by LineTokenizer.
  - Values can be stored and retrieved by specifying an index or a name.
Here is a sample implementation for reading a file that contains data needing conversion, such as a BigDecimal value containing commas and a Date in Japanese calendar format.
"000001","平成28年1月1日","000000001","1,000,000,000"
"000002","平成29年2月2日","000000002","2,000,000,000"
"000003","平成30年3月3日","000000003","3,000,000,000"
No | Field Name | Data Type | Note |
---|---|---|---|
(1) | branchId | String | |
(2) | date | Date | Japanese calendar format |
(3) | customerId | String | |
(4) | amount | BigDecimal | includes commas |
public class UseDateSalesPlanDetail {
private String branchId;
private Date date;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
@Component
public class UseDateSalesPlanDetailFieldSetMapper implements FieldSetMapper<UseDateSalesPlanDetail> { // (1)
/**
* {@inheritDoc}
*
* @param fieldSet {@inheritDoc}
* @return Sales performance detail.
* @throws BindException {@inheritDoc}
*/
@Override
public UseDateSalesPlanDetail mapFieldSet(FieldSet fieldSet) throws BindException {
UseDateSalesPlanDetail item = new UseDateSalesPlanDetail(); // (2)
item.setBranchId(fieldSet.readString("branchId")); // (3)
// (4)
DateFormat japaneseFormat = new SimpleDateFormat("GGGGy年M月d日", new Locale("ja", "JP", "JP"));
try {
item.setDate(japaneseFormat.parse(fieldSet.readString("date")));
} catch (ParseException e) {
// omitted exception handling
}
// (5)
item.setCustomerId(fieldSet.readString("customerId"));
// (6)
DecimalFormat decimalFormat = new DecimalFormat();
decimalFormat.setParseBigDecimal(true);
try {
item.setAmount((BigDecimal) decimalFormat.parse(fieldSet.readString("amount")));
} catch (ParseException e) {
// omitted exception handling
}
return item; // (7)
}
}
No | Description |
---|---|
(1) | Implement the FieldSetMapper class and override the mapFieldSet method. |
(2) | Define a variable of the conversion target class to store the converted data. |
(3) | Get branchId from the argument's FieldSet and set it to the conversion target bean. |
(4) | Get date from the argument's FieldSet, convert it from Japanese calendar format using SimpleDateFormat, and set it to the conversion target bean. |
(5) | Get customerId from the argument's FieldSet and set it to the conversion target bean. |
(6) | Get amount from the argument's FieldSet, convert it to BigDecimal using DecimalFormat because it contains commas, and set it to the conversion target bean. |
(7) | Return the conversion target class holding the processing result. |
Getting value from FieldSet class
The FieldSet class provides read methods corresponding to various data types, such as readString, readInt, readLong, readBigDecimal and readDate, and each of them can be called with either the field name or the field index. |
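As a supplementary illustration (the class name SimpleFieldSetMapper is an assumption, the bean and field names reuse the sample above, and fields that need no special conversion are assumed), the same record can be read either by the names assigned in LineTokenizer or by the zero-based field index:

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

public class SimpleFieldSetMapper implements FieldSetMapper<UseDateSalesPlanDetail> {
    @Override
    public UseDateSalesPlanDetail mapFieldSet(FieldSet fieldSet) throws BindException {
        UseDateSalesPlanDetail item = new UseDateSalesPlanDetail();
        item.setBranchId(fieldSet.readString("branchId")); // access by the name defined in LineTokenizer
        item.setCustomerId(fieldSet.readString(2));        // access by the field index (0-origin)
        return item;
    }
}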
5.3.3.2. XML File
Describe the definition method when dealing with XML files.
For the conversion process between Bean and XML (O / X (Object / XML) mapping), use the library provided by Spring Framework.
Implementation classes are provided as Marshaller
and Unmarshaller
using XStream, JAXB, etc. as libraries for converting between XML files and objects.
Use one suitable for your situation.
Below are the features of, and points to consider when adopting, JAXB and XStream.

- JAXB
  - Specify the beans to be converted in the bean definition file
  - Validation using a schema file can be performed
  - It is useful when the schema is defined externally and the specification of the input file is strictly determined
- XStream
  - You can map XML elements and bean fields flexibly in the bean definition file
  - It is useful when you need to map beans flexibly
Here is a sample using JAXB.
5.3.3.2.1. Input
For inputting XML file, use org.springframework.batch.item.xml.StaxEventItemReader
provided by Spring Batch.
StaxEventItemReader
can read the XML file by mapping the XML file to the bean using the specified Unmarshaller
.
Implement StaxEventItemReader as follows.

- Add @XmlRootElement to the conversion target class of the XML root element.
- Set the below properties to StaxEventItemReader:
  - Set the file to read to property resource
  - Set the name of the root element to property fragmentRootElementName
  - Set org.springframework.oxm.jaxb.Jaxb2Marshaller to property unmarshaller
- Set the below properties to Jaxb2Marshaller:
  - Set the conversion target classes in list format to property classesToBeBound
  - To validate using a schema file, set the 2 properties below:
    - Set the schema file for validation to property schema
    - Set an implementation class of ValidationEventHandler to property validationEventHandler to handle events occurring during the validation
Here is the sample setting to read the input file below.
<?xml version="1.0" encoding="UTF-8"?>
<records>
<SalesPlanDetail>
<branchId>000001</branchId>
<year>2016</year>
<month>1</month>
<customerId>0000000001</customerId>
<amount>1000000000</amount>
</SalesPlanDetail>
<SalesPlanDetail>
<branchId>000002</branchId>
<year>2017</year>
<month>2</month>
<customerId>0000000002</customerId>
<amount>2000000000</amount>
</SalesPlanDetail>
<SalesPlanDetail>
<branchId>000003</branchId>
<year>2018</year>
<month>3</month>
<customerId>0000000003</customerId>
<amount>3000000000</amount>
</SalesPlanDetail>
</records>
@XmlRootElement(name = "SalesPlanDetail") // (1)
public class SalesPlanDetailToJaxb {
private String branchId;
private int year;
private int month;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
No | Description |
---|---|
(1) | Add the @XmlRootElement annotation to the conversion target class of the root element; the element name is specified with the name attribute. |
The setting for reading the above file is as follows.
<!-- (1) (2) (3) -->
<bean id="reader"
class="org.springframework.batch.item.xml.StaxEventItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}"
p:fragmentRootElementName="SalesPlanDetail"
p:strict="true">
<property name="unmarshaller"> <!-- (4) -->
<!-- (5) (6) -->
<bean class="org.springframework.oxm.jaxb.Jaxb2Marshaller"
p:schema="file:files/test/input/ch05/fileaccess/SalesPlanDetail.xsd"
p:validationEventHandler-ref="salesPlanDetailValidationEventHandler">
<property name="classesToBeBound"> <!-- (7) -->
<list>
<value>org.terasoluna.batch.functionaltest.ch05.fileaccess.model.plan.SalesPlanDetailToJaxb</value>
</list>
</property>
</bean>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) | resource | Set the input file. | | Nothing |
(2) | fragmentRootElementName | Set the name of the root element. | | Nothing |
(3) | strict | If true is set, an exception occurs if the input file does not exist (cannot be opened). | | true |
(4) | unmarshaller | Set the unmarshaller. | | Nothing |
(5) | schema | Set the schema file for validation. | | |
(6) | validationEventHandler | Set an implementation class of ValidationEventHandler that handles events occurring during the validation. | | |
(7) | classesToBeBound | Set the conversion target classes in list format. | | Nothing |
@Component
// (1)
public class SalesPlanDetailValidationEventHandler implements ValidationEventHandler {
/**
* Logger.
*/
private static final Logger logger =
LoggerFactory.getLogger(SalesPlanDetailValidationEventHandler.class);
@Override
public boolean handleEvent(ValidationEvent event) {
// (2)
logger.error("[EVENT [SEVERITY:{}] [MESSAGE:{}] [LINKED EXCEPTION:{}]" +
" [LOCATOR: [LINE NUMBER:{}] [COLUMN NUMBER:{}] [OFFSET:{}]" +
" [OBJECT:{}] [NODE:{}] [URL:{}] ] ]",
event.getSeverity(),
event.getMessage(),
event.getLinkedException(),
event.getLocator().getLineNumber(),
event.getLocator().getColumnNumber(),
event.getLocator().getOffset(),
event.getLocator().getObject(),
event.getLocator().getNode(),
event.getLocator().getURL());
return false; // (3)
}
}
No | Description |
---|---|
(1) | Implement ValidationEventHandler and override the handleEvent method. |
(2) | Get event information from the argument's event (ValidationEvent) and output it to the log. |
(3) | Return false to terminate the parsing process.
Return true to continue the parsing process. |
Adding dependency library
Library dependency needs to be added as below when using
Spring Object/Xml Marshalling provided by Spring Framework
such as
|
5.3.3.2.2. Output
Use org.springframework.batch.item.xml.StaxEventItemWriter
provided by Spring Batch for outputting XML file.
StaxEventItemWriter
can output an XML file by mapping the bean to XML using the specified Marshaller
.
Implement StaxEventItemWriter as follows.

- Do the below settings to the conversion target class:
  - Add @XmlRootElement to the class, as it is to be the root element of each XML fragment
  - Use the @XmlType annotation to set the order in which fields are output
  - If there is a field to be excluded from conversion to XML, add @XmlTransient to the getter method of that field
- Set the below properties to StaxEventItemWriter:
  - Set the output target file to property resource
  - Set org.springframework.oxm.jaxb.Jaxb2Marshaller to property marshaller
- Set the below property to Jaxb2Marshaller:
  - Set the conversion target classes in list format to property classesToBeBound
Here is a sample for outputting below file.
<?xml version="1.0" encoding="UTF-8"?>
<records>
<Customer>
<customerId>001</customerId>
<customerName>CustomerName001</customerName>
<customerAddress>CustomerAddress001</customerAddress>
<customerTel>11111111111</customerTel>
<chargeBranchId>001</chargeBranchId></Customer>
<Customer>
<customerId>002</customerId>
<customerName>CustomerName002</customerName>
<customerAddress>CustomerAddress002</customerAddress>
<customerTel>11111111111</customerTel>
<chargeBranchId>002</chargeBranchId></Customer>
<Customer>
<customerId>003</customerId>
<customerName>CustomerName003</customerName>
<customerAddress>CustomerAddress003</customerAddress>
<customerTel>11111111111</customerTel>
<chargeBranchId>003</chargeBranchId>
</Customer>
</records>
About XML file format (line break and indents)
In the sample above, the output XML file has been formatted (has line breaks and indents), but the actual XML output will not be formatted.
To avoid this and output the formatted XML, set
|
@XmlRootElement(name = "Customer") // (1)
@XmlType(propOrder={"customerId", "customerName", "customerAddress",
"customerTel", "chargeBranchId"}) // (2)
public class CustomerToJaxb {
private String customerId;
private String customerName;
private String customerAddress;
private String customerTel;
private String chargeBranchId;
private Timestamp createDate;
private Timestamp updateDate;
// omitted getter/setter
@XmlTransient // (3)
public Timestamp getCreateDate() { return createDate; }
@XmlTransient // (3)
public Timestamp getUpdateDate() { return updateDate; }
}
No | Description |
---|---|
(1) | Add @XmlRootElement to the class as it is to be the root element of each XML fragment. |
(2) | Use the @XmlType annotation to set the order in which fields are output. |
(3) | Add @XmlTransient to the getter methods of the fields to be excluded from conversion to XML. |
The settings for writing the above file are as follows.
<!-- (1) (2) (3) (4) (5) (6) -->
<bean id="writer"
class="org.springframework.batch.item.xml.StaxEventItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"
p:encoding="MS932"
p:rootTagName="records"
p:overwriteOutput="true"
p:shouldDeleteIfEmpty="false"
p:transactional="true">
<property name="marshaller"> <!-- (7) -->
<bean class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<property name="classesToBeBound"> <!-- (8) -->
<list>
<value>org.terasoluna.batch.functionaltest.ch05.fileaccess.model.mst.CustomerToJaxb</value>
</list>
</property>
</bean>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) | resource | Set the output file. | | Nothing |
(2) | encoding | Set the character encoding of the output file. | | JavaVM default character set |
(3) | rootTagName | Set the XML root tag name. | | |
(4) | overwriteOutput | If true, delete the file if it already exists. | | true |
(5) | shouldDeleteIfEmpty | If true, delete the output if it is an empty file. | | false |
(6) | transactional | Set whether to perform transaction control. For details, see Transaction Control. | | true |
(7) | marshaller | Set the marshaller. Set org.springframework.oxm.jaxb.Jaxb2Marshaller when using JAXB. | | Nothing |
(8) | classesToBeBound | Set the conversion target classes in list format. | | Nothing |
Adding dependency library
Library dependency needs to be added as below when using
Spring Object/Xml Marshalling provided by Spring Framework
such as
|
For output of header and footer, use the implementation class of org.springframework.batch.item.xml.StaxWriterCallback
.
Set implementation of headerCallback
for header output, and footerCallback
for footer output.
Below is a sample of the output file.
The header is output right after the opening tag of the root element, and the footer is output right before the closing tag of the root element.
<?xml version="1.0" encoding="UTF-8"?>
<records>
<!-- Customer list header -->
<Customer>
<customerId>001</customerId>
<customerName>CustomerName001</customerName>
<customerAddress>CustomerAddress001</customerAddress>
<customerTel>11111111111</customerTel>
<chargeBranchId>001</chargeBranchId></Customer>
<Customer>
<customerId>002</customerId>
<customerName>CustomerName002</customerName>
<customerAddress>CustomerAddress002</customerAddress>
<customerTel>11111111111</customerTel>
<chargeBranchId>002</chargeBranchId></Customer>
<Customer>
<customerId>003</customerId>
<customerName>CustomerName003</customerName>
<customerAddress>CustomerAddress003</customerAddress>
<customerTel>11111111111</customerTel>
<chargeBranchId>003</chargeBranchId>
</Customer>
<!-- Customer list footer -->
</records>
About XML file format (line break and indents)
In the sample above, the output XML file has been formatted (has line breaks and indents), but the actual XML output will not be formatted. Refer to "About XML file format (line break and indents)" above for details. |
To output the above file, do the setting as below.
<!-- (1) (2) -->
<bean id="writer"
class="org.springframework.batch.item.xml.StaxEventItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"
p:headerCallback-ref="writeHeaderStaxWriterCallback"
p:footerCallback-ref="writeFooterStaxWriterCallback">
<property name="marshaller">
<!-- omitted settings -->
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) | headerCallback | Set an implementation class of StaxWriterCallback that outputs the header. | | |
(2) | footerCallback | Set an implementation class of StaxWriterCallback that outputs the footer. | | |
Implement StaxWriterCallback as follows.

- Implement StaxWriterCallback and override the write method.
- Output the header/footer by using the argument's XMLEventWriter.
@Component
public class WriteHeaderStaxWriterCallback implements StaxWriterCallback { // (1)
@Override
public void write(XMLEventWriter writer) throws IOException {
XMLEventFactory factory = XMLEventFactory.newInstance();
try {
writer.add(factory.createComment(" Customer list header ")); // (2)
} catch (XMLStreamException e) {
// omitted exception handling
}
}
}
No | Description |
---|---|
(1) | Implement StaxWriterCallback and override the write method. |
(2) | Output the header by using the argument's XMLEventWriter. |
XML output using XMLEventFactory
In the output of the XML file using the XMLEventWriter passed as the argument, XML events created by the XMLEventFactory class are added through the add method. |
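For the footer, a callback can be written in the same way. The following is a minimal sketch of the class referenced as writeFooterStaxWriterCallback in the bean definition above (its implementation is an assumption for illustration, mirroring the header sample):

@Component
public class WriteFooterStaxWriterCallback implements StaxWriterCallback {
    @Override
    public void write(XMLEventWriter writer) throws IOException {
        XMLEventFactory factory = XMLEventFactory.newInstance();
        try {
            // Add a comment immediately before the closing tag of the root element
            writer.add(factory.createComment(" Customer list footer "));
        } catch (XMLStreamException e) {
            // omitted exception handling
        }
    }
}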
5.3.3.3. Multi format
Describe the definition method when dealing with multi format file.
As described in Overview, multi format is basically (Header N Rows + Data N Rows + Trailer N Rows) * N + Footer N Rows format, but there are other format patterns like below.
- Whether or not there is a footer record
- When there are records with different formats within the same record classification
  - e.g. the data part contains both data records with 5 items and data records with 6 items

Although there are several patterns of multi format files, the implementation method is the same.
5.3.3.3.1. Input
Use org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper provided by Spring Batch for reading a multi format file.
In a multi format file, each record needs to be mapped to a different bean for each format.
PatternMatchingCompositeLineMapper selects the LineTokenizer and FieldSetMapper to use for each record by regular expression.
For example, the LineTokenizer to use can be selected as below.

- Use userTokenizer if the beginning of the record matches the regular expression USER*
- Use lineATokenizer if the beginning of the record matches the regular expression LINEA*
Restrictions on the format of records when reading multi-format files
In order to read a multi-format file, it must be in a format that can distinguish record classification by regular expression. |
Implement PatternMatchingCompositeLineMapper as follows.

- Define a class holding the record division for the conversion target, and have the class for each record division inherit from it.
- Define a LineTokenizer and a FieldSetMapper to map each record to a bean.
- Define PatternMatchingCompositeLineMapper:
  - Set the LineTokenizer corresponding to each record division to property tokenizers
  - Set the FieldSetMapper corresponding to each record division to property fieldSetMappers
Define a class holding the record division for the conversion target, and have the class for each record division inherit from it.
If each record is simply mapped to an unrelated class for each record division, the records cannot be handled by a single ItemProcessor, because ItemProcessor takes only one type as its input argument.
Therefore, this can be solved by giving an inheritance relation to the conversion target classes and specifying the superclass as the type of the argument of ItemProcessor.
The class diagram of the conversion target classes (figure omitted) corresponds to the bean definition sample below; an implementation sketch of such an ItemProcessor follows this note. |
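A minimal sketch of such an ItemProcessor is shown below (the class name and the processing in each branch are assumptions for illustration); it accepts the superclass SalesPlanDetailMultiLayoutRecord, defined in the bean definition sample below, and branches on the concrete record type:

import org.springframework.batch.item.ItemProcessor;

public class MultiLayoutCheckItemProcessor implements
        ItemProcessor<SalesPlanDetailMultiLayoutRecord, SalesPlanDetailMultiLayoutRecord> {
    @Override
    public SalesPlanDetailMultiLayoutRecord process(
            SalesPlanDetailMultiLayoutRecord item) throws Exception {
        if (item instanceof SalesPlanDetailHeader) {
            // process a header record
        } else if (item instanceof SalesPlanDetailData) {
            // process a data record
        } else if (item instanceof SalesPlanDetailTrailer) {
            // process a trailer record
        } else if (item instanceof SalesPlanDetailEnd) {
            // process the end (footer) record
        }
        return item;
    }
}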
Here is a setting sample and implementation sample for reading below input file.
H,Sales_plan_detail header No.1
D,000001,2016,1,0000000001,100000000
D,000001,2016,1,0000000002,200000000
D,000001,2016,1,0000000003,300000000
T,000001,3,600000000
H,Sales_plan_detail header No.2
D,00002,2016,1,0000000004,400000000
D,00002,2016,1,0000000005,500000000
D,00002,2016,1,0000000006,600000000
T,00002,3,1500000000
H,Sales_plan_detail header No.3
D,00003,2016,1,0000000007,700000000
D,00003,2016,1,0000000008,800000000
D,00003,2016,1,0000000009,900000000
T,00003,3,2400000000
E,3,9,4500000000
Below is the bean definition sample of conversion target class.
/**
* Model of record indicator of sales plan detail.
*/
public class SalesPlanDetailMultiLayoutRecord {
protected String record;
// omitted getter/setter
}
/**
* Model of sales plan detail header.
*/
public class SalesPlanDetailHeader extends SalesPlanDetailMultiLayoutRecord {
private String description;
// omitted getter/setter
}
/**
* Model of Sales plan Detail.
*/
public class SalesPlanDetailData extends SalesPlanDetailMultiLayoutRecord {
private String branchId;
private int year;
private int month;
private String customerId;
private BigDecimal amount;
// omitted getter/setter
}
/**
* Model of Sales plan Detail.
*/
public class SalesPlanDetailTrailer extends SalesPlanDetailMultiLayoutRecord {
private String branchId;
private int number;
private BigDecimal total;
// omitted getter/setter
}
/**
* Model of Sales plan Detail.
*/
public class SalesPlanDetailEnd extends SalesPlanDetailMultiLayoutRecord {
// omitted getter/setter
private int headNum;
private int trailerNum;
private BigDecimal total;
// omitted getter/setter
}
The setting for reading the above file is as follows.
<!-- (1) -->
<bean id="headerDelimitedLineTokenizer"
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="record,description"/>
<bean id="dataDelimitedLineTokenizer"
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="record,branchId,year,month,customerId,amount"/>
<bean id="trailerDelimitedLineTokenizer"
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="record,branchId,number,total"/>
<bean id="endDelimitedLineTokenizer"
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="record,headNum,trailerNum,total"/>
<!-- (2) -->
<bean id="headerBeanWrapperFieldSetMapper"
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.ch05.fileaccess.model.plan.SalesPlanDetailHeader"/>
<bean id="dataBeanWrapperFieldSetMapper"
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.ch05.fileaccess.model.plan.SalesPlanDetailData"/>
<bean id="trailerBeanWrapperFieldSetMapper"
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.ch05.fileaccess.model.plan.SalesPlanDetailTrailer"/>
<bean id="endBeanWrapperFieldSetMapper"
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.ch05.fileaccess.model.plan.SalesPlanDetailEnd"/>
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="file:#{jobParameters[inputFile]}">
<property name="lineMapper"> <!-- (3) -->
<bean class="org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper">
<property name="tokenizers"> <!-- (4) -->
<map>
<entry key="H*" value-ref="headerDelimitedLineTokenizer"/>
<entry key="D*" value-ref="dataDelimitedLineTokenizer"/>
<entry key="T*" value-ref="trailerDelimitedLineTokenizer"/>
<entry key="E*" value-ref="endDelimitedLineTokenizer"/>
</map>
</property>
<property name="fieldSetMappers"> <!-- (5) -->
<map>
<entry key="H*" value-ref="headerBeanWrapperFieldSetMapper"/>
<entry key="D*" value-ref="dataBeanWrapperFieldSetMapper"/>
<entry key="T*" value-ref="trailerBeanWrapperFieldSetMapper"/>
<entry key="E*" value-ref="endBeanWrapperFieldSetMapper"/>
</map>
</property>
</bean>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) | | Define a LineTokenizer corresponding to each record division. | | |
(2) | | Define a FieldSetMapper corresponding to each record division. | | |
(3) | lineMapper | Set PatternMatchingCompositeLineMapper. | | Nothing |
(4) | tokenizers | Set the LineTokenizer corresponding to each record division in map format, with the pattern of the record division as the key. | | Nothing |
(5) | fieldSetMappers | Set the FieldSetMapper corresponding to each record division in map format, with the pattern of the record division as the key. | | Nothing |
5.3.3.3.2. Output
Describe the definition method when dealing with multi format file.
For reading a multi format file, PatternMatchingCompositeLineMapper is provided to determine which LineTokenizer and FieldSetMapper to use for each record division.
For writing, however, no similar component is provided.
Therefore, the conversion from the target class to a record (character string) is carried out within ItemProcessor, and ItemWriter writes the received character string as it is, thereby achieving output of a multi format file.
Implement multi format output as follows.

- ItemProcessor converts the conversion target class to a record (character string) and passes it to ItemWriter.
  - In the sample, a LineAggregator and a FieldExtractor are defined for each record division and injected into ItemProcessor for use.
- ItemWriter writes the received character string as it is to the file.
  - Set PassThroughLineAggregator to the lineAggregator property of ItemWriter.
  - PassThroughLineAggregator is a LineAggregator that returns the result of item.toString() of the received item.
Here is a setting sample and implementation sample for writing below output file.
H,Sales_plan_detail header No.1
D,000001,2016,1,0000000001,100000000
D,000001,2016,1,0000000002,200000000
D,000001,2016,1,0000000003,300000000
T,000001,3,600000000
H,Sales_plan_detail header No.2
D,00002,2016,1,0000000004,400000000
D,00002,2016,1,0000000005,500000000
D,00002,2016,1,0000000006,600000000
T,00002,3,1500000000
H,Sales_plan_detail header No.3
D,00003,2016,1,0000000007,700000000
D,00003,2016,1,0000000008,800000000
D,00003,2016,1,0000000009,900000000
T,00003,3,2400000000
E,3,9,4500000000
Definition of conversion target class and ItemProcessor
sample, notes are the same as Multi format Input.
Settings to output above file is as below.
Bean definition sample for ItemProcessor
is written later.
<!-- (1) -->
<bean id="headerDelimitedLineAggregator"
class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="record,description"/>
</property>
</bean>
<bean id="dataDelimitedLineAggregator"
class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="record,branchId,year,month,customerId,amount"/>
</property>
</bean>
<bean id="trailerDelimitedLineAggregator"
class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="record,branchId,number,total"/>
</property>
</bean>
<bean id="endDelimitedLineAggregator"
class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="record,headNum,trailerNum,total"/>
</property>
</bean>
<bean id="writer" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"/>
<property name="lineAggregator"> <!-- (2) -->
<bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator"/>
</property>
</bean>
No | Property Name | Setting contents | Required | Default Value |
---|---|---|---|---|
(1) | | Define a DelimitedLineAggregator and BeanWrapperFieldExtractor corresponding to each record division. | | |
(2) | lineAggregator | Set PassThroughLineAggregator. | | Nothing |
Implementation sample of ItemProcessor
is shown below.
In this sample, only the process of converting the received item to a string and passing it to ItemWriter
is performed.
public class MultiLayoutItemProcessor implements
ItemProcessor<SalesPlanDetailMultiLayoutRecord, String> {
// (1)
@Inject
@Named("headerDelimitedLineAggregator")
DelimitedLineAggregator<SalesPlanDetailMultiLayoutRecord> headerDelimitedLineAggregator;
@Inject
@Named("dataDelimitedLineAggregator")
DelimitedLineAggregator<SalesPlanDetailMultiLayoutRecord> dataDelimitedLineAggregator;
@Inject
@Named("trailerDelimitedLineAggregator")
DelimitedLineAggregator<SalesPlanDetailMultiLayoutRecord> trailerDelimitedLineAggregator;
@Inject
@Named("endDelimitedLineAggregator")
DelimitedLineAggregator<SalesPlanDetailMultiLayoutRecord> endDelimitedLineAggregator;
@Override
// (2)
public String process(SalesPlanDetailMultiLayoutRecord item) throws Exception {
String record = item.getRecord(); // (3)
switch (record) { // (4)
case "H":
return headerDelimitedLineAggregator.aggregate(item); // (5)
case "D":
return dataDelimitedLineAggregator.aggregate(item); // (5)
case "T":
return trailerDelimitedLineAggregator.aggregate(item); // (5)
case "E":
return endDelimitedLineAggregator.aggregate(item); // (5)
default:
throw new IncorrectRecordClassificationException(
"Record classification is incorrect.[value:" + record + "]");
}
}
}
No | Description |
---|---|
(1) | Inject the LineAggregator defined for each record division. |
(2) | Set the superclass of the conversion target classes, which were given an inheritance relation, as the argument type of ItemProcessor. |
(3) | Get the record division from the item. |
(4) | Judge the record division and perform processing for each record division. |
(5) | Use the LineAggregator corresponding to the record division to convert the item into a character string and return it. |
5.4. Exclusive Control
5.4.1. Overview
Exclusive control is a process performed to maintain consistency of data when update processing is performed simultaneously for the same resource from multiple transactions. In the case where there is a possibility that updating processing is performed simultaneously for the same resource from multiple transactions, it is basically necessary to perform exclusive control.
Multiple transaction means following in this chapter.
-
Transaction at the time of simultaneous execution of multiple jobs
-
Transaction at the time of simultaneous execution with online processing
Exclusive control of multiple jobs
When multiple jobs are executed at the same time, it is fundamental to design jobs so that exclusive control is not required. This means that it is basic to divide the resources to be accessed and the processing target for each job. |
Since the concept of exclusive control is the same as online processing, please refer to Exclusive Control in TERASOLUNA Server 5.x Development Guideline
Here, focus on the part not described in TERASOLUNA Server 5.x.
The usage method of this function is same in the chunk model as well as tasklet model.
5.4.1.1. Necessity of Exclusive Control
For the necessity of exclusive control, please refer to Necessity of Exclusive Control in TERASOLUNA Server 5.x Development Guideline.
5.4.1.2. Exclusive Control for File
Exclusive control for file is generally implemented by file locking.
- File Locking
-
File locking is a mechanism for restricting reading and writing from other programs while using files with a certain program. The outline of file lock processing is as follows.
-
The batch process A acquires the lock of the file and starts the file updating process.
-
Batch process B attempts to update the same file, but fails to acquire the file lock.
-
The batch process A ends the processing and unlocks the file.
-
The Batch Process A tries to acquire the lock of the Target File.
-
The Batch Process A succeeds in acquiring the lock of the Target File.
-
The Batch Process B tries to acquire the lock of the Target File.
-
The Batch Process A writes the Target File.
-
Since the Batch Process A has locked the Target File, the Batch Process B fails to acquire the lock of the Target File.
-
The Batch Process B performs processing of file update failure.
-
The Batch Process A releases the lock of the Target File.
Prevention of Deadlock
For files, as with databases, a deadlock may occur when acquiring locks on multiple files.
Therefore, it is important to fix the order in which files are updated as a rule. |
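As a minimal sketch of such a rule (the file paths and class name below are assumptions made up for illustration), every job acquires locks on multiple files in the same, fixed order, for example sorted by path name:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OrderedFileLockSample {
    public static void main(String[] args) throws IOException {
        // Hypothetical target files; every job must lock them in the same order.
        List<Path> targets = new ArrayList<>(Arrays.asList(
                Paths.get("/data/fileA.csv"), Paths.get("/data/fileB.csv")));
        targets.sort(Comparator.comparing(Path::toString)); // fix the lock order, e.g. by path name

        List<FileChannel> channels = new ArrayList<>();
        List<FileLock> locks = new ArrayList<>();
        try {
            for (Path path : targets) {
                FileChannel fc = FileChannel.open(path,
                        StandardOpenOption.WRITE, StandardOpenOption.CREATE);
                channels.add(fc);
                FileLock lock = fc.tryLock();
                if (lock == null) {
                    throw new IllegalStateException("Failed to acquire lock: " + path);
                }
                locks.add(lock);
            }
            // update the locked files here
        } finally {
            for (FileLock lock : locks) {
                try { lock.release(); } catch (IOException e) { /* ignore */ }
            }
            for (FileChannel fc : channels) {
                try { fc.close(); } catch (IOException e) { /* ignore */ }
            }
        }
    }
}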
5.4.1.3. Exclusive Control of Database
For details About Exclusive Control of Database, refer to Exclusive control using database locking in TERASOLUNA Server 5.x Development Guideline.
5.4.1.4. Choose Exclusive Control Scheme
Explain the locking scheme and suitable situation for TERASOLUNA Batch 5.x.
Lock scheme | Suitable situation |
---|---|
Optimistic lock | The case where the process can be continued even when data updated by another transaction during concurrent execution is excluded from the processing targets. |
Pessimistic lock | A process whose processing time is long and which is difficult to redo because the state of the target data changes during processing. |
5.4.1.5. Relationship between Exclusive Control and Components
The relationship between each component provided by TERASOLUNA Batch 5.x and exclusive control is as follows.
- Optimistic lock
Processing model | Component | File | Database |
---|---|---|---|
Chunk |
ItemReader |
- |
Acquires data including a column that can confirm that the same data is obtained at the time of acquiring and updating such as Version column. |
ItemProcessor |
- |
Exclusive control is unnecessary. |
|
ItemWriter |
- |
Check the difference between acquisition and update, confirm that it is not updated by other processing, then update. |
|
Tasklet |
Tasklet |
- |
When acquiring data, execute the processing described in the ItemReader section, and when updating the data, the processing described in ItemWriter section. |
Optimistic lock on files
Because of the characteristic of the file, do not apply optimistic lock on files. |
- Pessimistic lock
Processing model | Component | File | Database |
---|---|---|---|
Chunk |
ItemReader |
- |
Use FOR UPDATE of SQL statement. |
ItemProcessor |
Since it is fundamental to handle locked data, in principle, exclusive control is not performed here. |
||
ItemWriter |
- |
Update data without conscious of exclusion. |
|
Tasklet |
Tasklet |
Get a file lock right after opening a file with ItemStreamReader. |
When acquiring data, execute the processing described in the ItemReader section, and when updating the data, the processing described in ItemWriter section. |
Pessimistic lock on file
Pessimistic lock on files should be implemented in the tasklet model. In the chunk model, due to its structure, there is a period that can not be excluded in the gap of chunk processing. Also, it is assumed that file access is done by Injecting ItemStreamReader / ItemStreamWriter. |
Waiting time due to Pessimistic lock in database
When pessimistic locking is performed, the wait time for processing due to contention may be prolonged. In that case, it is reasonable to use the pessimistic lock by specifying the NO WAIT option and the timeout time. |
5.4.2. How to use
Explain how to use exclusive control by resource.
5.4.2.1. Exclusive control of file
Exclusive control of file with TERASOLUNA Batch 5.x is realized by implementing Tasklet.
As a means of achieving exclusion, exclusive control is performed by file lock acquisition using the java.nio.channels.FileChannel
class.
Details of the FileChannel class
For details and how to use |
Show an example of using FileChannel
class to get a file lock.
@Component
@Scope("step")
public class FileExcusiveTasklet implements Tasklet {
private String targetPath = null; // (1)
@Inject
ItemStreamReader<SalesPlanDetail> reader;
@Inject
ItemStreamWriter<SalesPlanDetailWithProcessName> writer;
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
// omitted.
FileChannel fc = null;
FileLock fileLock = null;
try {
reader.open(executionContext);
writer.open(executionContext); // (2)
try {
File file = new File(targetPath);
fc = FileChannel.open(file.toPath(), StandardOpenOption.WRITE,
StandardOpenOption.CREATE,
StandardOpenOption.APPEND); // (3)
fileLock = fc.tryLock(); // (4)
} catch (IOException e) {
logger.error("Failure other than lock acquisition", e);
throw new FailedOtherAcquireLockException(
"Failure other than lock acquisition", e);
}
if (fileLock == null) {
logger.error("Failed to acquire lock. [processName={}]", processName);
throw new FailedAcquireLockException("Failed to acquire lock");
}
// (5)
SalesPlanDetail item;
List<SalesPlanDetailWithProcessName> items = new ArrayList<>();
while ((item = reader.read()) != null) {
// omitted.
items.add(item);
if (items.size() >= 10) {
writer.write(items);
items.clear();
}
}
if (items.size() > 0) {
writer.write(items);
}
} finally {
if (fileLock != null) {
try {
fileLock.release(); // (6)
} catch (IOException e) {
logger.warn("Lock release failed.", e);
}
}
if (fc != null) {
try {
fc.close();
} catch (IOException e) {
// ignore
}
}
writer.close(); // (7)
reader.close();
}
return RepeatStatus.FINISHED;
}
// (8)
@Value("#{jobParameters[outputFile]}")
public void setTargetPath(String targetPath) {
this.targetPath = targetPath;
}
}
Sr. No. | Description |
---|---|
(1) |
File path to be exclusively controlled. |
(2) |
Open file to be exclusively controlled. |
(3) |
Get file channel. |
(4) |
Get file lock. |
(5) |
Execute business logic with file output. |
(6) |
Release file lock. |
(7) |
Close file to be exclusively controlled. |
(8) |
Set file path. |
About the method of FileChannel used for lock acquisition
It is recommended to use the |
Exclusive control between threads in the same VM
Attention must be paid to exclusive control between threads in the same VM.
When processing files between threads in the same VM, the lock function using the |
About appendAllowed property of FlatFileItemWriter
When creating (overwriting) a file, exclusive control can be realized by setting the |
5.4.2.2. Exclusive Control of Database
Explain exclusive control of database in TERASOLUNA Batch 5.x.
The exclusive control implementation of the database basically follows How to implement while using MyBatis3 in TERASOLUNA Server 5.x Development Guideline. In this guideline, it is explained on the premise that How to implement while using MyBatis3 is applied.
As shown in Relationship between Exclusive Control and Components, there are variations due to combination of processing model and component.
Exclusive control scheme |
Processing model |
Component |
Optimistic lock |
Chunk model |
ItemReader/ItemWriter |
Tasklet model |
ItemReader/ItemWriter |
|
Mapper interface |
||
Pessimistic lock |
Chunk model |
ItemReader/ItemWriter |
Tasklet model |
ItemReader/ItemWriter |
|
Mapper interface |
When using the Mapper interface in tasklet model, the explanation is omitted. Refer to How to implement while using MyBatis3.
When using ItemReader/ItemWriter in tasklet model, the calling part in the Mapper interface is replaced by ItemReader/ItemWriter, so the explanation is also omitted.
Therefore, exclusive control of chunk model will be explained here.
5.4.2.2.1. Optimistic Lock
Explain Optimistic lock in chunk model.
Since the behavior of the job changes according to the setting of the assertUpdates
property of MyBatisBatchItemWriter, it is necessary to set it appropriately according to the business requirements.
Show the job definition for optimistic lock.
<!-- (1) -->
<bean id="reader"
class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
p:queryId="org.terasoluna.batch.functionaltest.ch05.exclusivecontrol.repository.ExclusiveControlRepository.branchFindOne"
p:sqlSessionFactory-ref="jobSqlSessionFactory">
<property name="parameterValues">
<map>
<entry key="branchId" value="#{jobParameters[branchId]}"/>
</map>
</property>
</bean>
<!-- (2) -->
<bean id="writer"
class="org.mybatis.spring.batch.MyBatisBatchItemWriter" scope="step"
p:statementId="org.terasoluna.batch.functionaltest.ch05.exclusivecontrol.repository.ExclusiveControlRepository.branchExclusiveUpdate"
p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"
p:assertUpdates="true" /> <!-- (3) -->
<batch:job id="chunkOptimisticLockCheckJob" job-repository="jobRepository">
<batch:step id="chunkOptimisticLockCheckJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader" processor="branchEditItemProcessor"
writer="writer" commit-interval="10" />
</batch:tasklet>
</batch:step>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
Set SQLID of data acquisition by optimistic lock. |
(2) |
Set SQLID of data update by optimistic lock. |
(3) |
Set whether to check the number of batch updates. |
5.4.2.2.2. Pessimistic Lock
Explain pessimistic lock in chunk model.
Show the job definition for pessimistic lock.
<!-- (1) -->
<bean id="reader"
class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
p:queryId="org.terasoluna.batch.functionaltest.ch05.exclusivecontrol.repository.ExclusiveControlRepository.branchFindOneWithNowWaitLock"
p:sqlSessionFactory-ref="jobSqlSessionFactory">
<property name="parameterValues">
<map>
<entry key="branchId" value="#{jobParameters[branchId]}"/>
</map>
</property>
</bean>
<!-- (2) -->
<bean id="writer"
class="org.mybatis.spring.batch.MyBatisBatchItemWriter" scope="step"
p:statementId="org.terasoluna.batch.functionaltest.ch05.exclusivecontrol.repository.ExclusiveControlRepository.branchUpdate"
p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"
p:assertUpdates="#{new Boolean(jobParameters[assertUpdates])}" />
<batch:job id="chunkPessimisticLockCheckJob" job-repository="jobRepository">
<batch:step id="chunkPessimisticLockCheckJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader" processor="branchEditItemProcessor"
writer="writer" commit-interval="10" />
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="jobExecutionLoggingListener"/>
</batch:listeners>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
Set SQLID of data acquisition by pessimistic lock. |
(2) |
Set the same SQLID as SQL of data update without exclusive control. |
Behavior during exclusive control
If performing pessimistic lock by setting NO WAIT or timeout, when excluded by another transaction, an exception is thrown in the |
6. Support to abnormal system
6.1. Input Validation
6.1.1. Overview
In this section, the validation check of job input data (hereinafter referred to as input validation) is explained.
This function is the same usage for chunk model and tasklet model.
In general, input validation in batch processing is often carried out
to confirm that data received from other systems etc. is valid in its own system.
Conversely, it can be said that it is unnecessary to perform input validation
on reliable data in its own system (for example, data stored in the database).
Please refer to input Validation in TERASOLUNA Server 5.x Development Guideline because the input validation duplicates the contents of TERASOLUNA Server 5.x. Explain the main comparisons below.
Comparison target | TERASOLUNA Server 5.x | TERASOLUNA Batch 5.x |
---|---|---|
Available input validation rules |
Same as TERASOLUNA Server 5.x |
|
The target to which the rule is attached |
|
|
Validation execute method |
|
|
Setting error messages |
Same as Definition of error messages in TERASOLUNA Server 5.x Development Guideline. |
|
Error message output destination |
View |
Log etc. |
The input validation to be explained in this section mainly covers data obtained from ItemReader
.
For checking job parameters, refer to Validation check of parameters.
6.1.1.1. Classification of input validation
The input validation is classified into single item check and correlation item check.
Type | Description | Example | Implementation method |
---|---|---|---|
Single item check |
Check to be completed with a single field |
Required input check |
Bean Validation (using Hibernate Validator as implementation library) |
Correlation item check |
Check to compare multiple fields |
Comparison of numerical values |
|
Spring supports Bean Validation which is a Java standard.
For this single item check, this Bean Validation is used.
For the correlation item check, use Bean Validation or the org.springframework.validation.Validator interface provided by Spring.
In this respect, same as Classification of input validation in TERASOLUNA Server 5.x Development Guideline.
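As a minimal sketch of a correlation item check using Spring's org.springframework.validation.Validator interface (the PeriodDto class and the rule that endDate must not be before startDate are assumptions made up for this illustration, not part of the sample jobs):

import java.time.LocalDate;

import org.springframework.validation.Errors;
import org.springframework.validation.Validator;

// Hypothetical input DTO used only for this sketch.
class PeriodDto {
    private LocalDate startDate;
    private LocalDate endDate;
    public LocalDate getStartDate() { return startDate; }
    public LocalDate getEndDate() { return endDate; }
    // omitted setter
}

public class PeriodValidator implements Validator {
    @Override
    public boolean supports(Class<?> clazz) {
        return PeriodDto.class.isAssignableFrom(clazz);
    }

    @Override
    public void validate(Object target, Errors errors) {
        PeriodDto dto = (PeriodDto) target;
        // Correlation item check: compare two fields of the same record.
        if (dto.getStartDate() != null && dto.getEndDate() != null
                && dto.getEndDate().isBefore(dto.getStartDate())) {
            errors.rejectValue("endDate", "period.endBeforeStart");
        }
    }
}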
6.1.1.2. Overview of Input Validation
The timing of input validation in the chunk model and tasklet model is as follows.
- For the chunk model, use ItemProcessor.
- For the tasklet model, perform it at an arbitrary timing in Tasklet#execute().
In the chunk model and tasklet model, the implementation method of input validation is the same,
so here, explain the case where input validation is done in ItemProcessor
of the chunk model.
First, explain an overview of input validation. The relationships of classes related to input validation are as follows.
- Inject org.springframework.batch.item.validator.SpringValidator, which is an implementation of org.springframework.batch.item.validator.Validator, into ItemProcessor and execute the validate method.
  - SpringValidator internally holds org.springframework.validation.Validator and executes its validate method.
    It can be said to be a wrapper for org.springframework.validation.Validator.
    The implementation of org.springframework.validation.Validator is org.springframework.validation.beanvalidation.LocalValidatorFactoryBean. Hibernate Validator is used through this class.
- Implement org.springframework.batch.item.ItemCountAware in the input DTO to determine where an input validation error occurred.
Setting the number of data
|
Validators such as javax.validation.Validator or org.springframework.validation.Validator should not be used directly.
Validators such as
On the other hand, if validators such as |
Do not use org.springframework.batch.item.validator.ValidatingItemProcessor
The input validation by However, depending on the circumstances, it is necessary to extend it because of the following reasons, so do not use it from the viewpoint of unifying the implementation method.
|
6.1.2. How to use
As mentioned earlier, the implementation method of input validation is the same as TERASOLUNA Server 5.x as follows.
-
single item check uses Bean Validation.
-
correlation item check uses Bean Validation or the
org.springframework.validation.Validator
interface provided by Spring.
Explain the method of input validation in the following order.
6.1.2.1. Various settings
Use Hibernate Validator for input validation. Confirm that the definition of Hibernate Validator is in the library dependency and that the required bean definition exists. These have already been set in the blank project provided by TERASOLUNA Batch 5.x.
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-validator</artifactId>
</dependency>
<bean id="validator" class="org.springframework.batch.item.validator.SpringValidator"
p:validator-ref="beanValidator"/>
<bean id="beanValidator"
class="org.springframework.validation.beanvalidation.LocalValidatorFactoryBean" />
As mentioned earlier, for setting of error messages, refer to Definition of error messages in TERASOLUNA Server 5.x Development Guideline.
6.1.2.2. Input validation rule definition
The target of implementing the rule of input validation is the DTO obtained through ItemReader
.
Implement the DTO obtained through ItemReader
as follows.
-
Implement
org.springframework.batch.item.ItemCountAware
in the input DTO to determine where the input validation error occurred. -
In the
setItemCount
method, hold a numerical value in the class field indicating the number of items read in the currently processed item received as an argument.
-
-
Define the input validation rule.
-
refer to Input Validation in TERASOLUNA Server 5.x Development Guideline.
-
Show an example of a DTO defining an input validation rule below.
public class VerificationSalesPlanDetail implements ItemCountAware { // (1)
private int count;
@NotEmpty
@Size(min = 1, max = 6)
private String branchId;
@NotNull
@Min(1)
@Max(9999)
private int year;
@NotNull
@Min(1)
@Max(12)
private int month;
@NotEmpty
@Size(min = 1, max = 10)
private String customerId;
@NotNull
@DecimalMin("0")
@DecimalMax("9999999999")
private BigDecimal amount;
@Override
public void setItemCount(int count) {
this.count = count; // (2)
}
// omitted getter/setter
}
Sr. No. | Description |
---|---|
(1) | Implement the ItemCountAware interface in the input DTO. |
(2) | Hold the count received as the argument of setItemCount, which indicates how many items have been read for the item currently being processed, in a class field. |
6.1.2.3. Input validation execution
Explain how to implement input validation. Implement input validation execution as follows.
- Execute org.springframework.batch.item.validator.Validator#validate() in the implementation of ItemProcessor.
  - Use an instance of SpringValidator by injecting it as a Validator field.
- Handle input validation errors. For details, refer to Input validation error handling.
Show an implementation example of input validation below.
@Component
public class ValidateAndContinueItemProcessor implements ItemProcessor<VerificationSalesPlanDetail, SalesPlanDetail> {
@Inject // (1)
Validator<VerificationSalesPlanDetail> validator;
@Override
public SalesPlanDetail process(VerificationSalesPlanDetail item) throws Exception {
try { // (2)
validator.validate(item); // (3)
} catch (ValidationException e) {
// omitted exception handling
}
SalesPlanDetail salesPlanDetail = new SalesPlanDetail();
// omitted business logic
return salesPlanDetail;
}
}
Sr. No. | Description |
---|---|
(1) | Inject the Validator (an instance of SpringValidator). |
(2) | Handle input validation errors with try/catch. |
(3) | Execute the validate method of Validator with the item as the argument. |
6.1.2.4. Input validation error handling
There are following 2 ways to handle input validation error.
-
Processing is aborted at the time when input validation error occurs, abnormally end the job.
-
Leave the occurrence of input validation error in the log etc. and continue processing the subsequent data. Thereafter, at the end of the job, the job is ended by specifying a warning.
6.1.2.4.1. Abnormal Termination of Processing
In order to abnormally terminate processing when an exception occurs, throw java.lang.RuntimeException or its subclass.
There are two ways to perform processing such as log output when an exception occurs.
-
Catch exceptions with try/catch and do it before throwing an exception.
-
Do not catch exceptions with try/catch, implement
ItemProcessListener
and do it with the onProcessError method.-
ItemProcessListener#onProcessError()
can be implemented using the@OnProcessError
annotation. For details, refer to Listener.
-
Following is an example of logging exception information and abnormally terminating processing when an exception occurs.
@Component
public class ValidateAndAbortItemProcessor implements ItemProcessor<VerificationSalesPlanDetail, SalesPlanDetail> {
/**
* Logger.
*/
private static final Logger logger = LoggerFactory.getLogger(ValidateAndAbortItemProcessor.class);
@Inject
Validator<VerificationSalesPlanDetail> validator;
@Override
public SalesPlanDetail process(VerificationSalesPlanDetail item) throws Exception {
try { // (1)
validator.validate(item); // (2)
} catch (ValidationException e) {
// (3)
logger.error("Exception occurred in input validation at the {} th item. [message:{}]",
item.getCount(), e.getMessage());
throw e; // (4)
}
SalesPlanDetail salesPlanDetail = new SalesPlanDetail();
// omitted business logic
return salesPlanDetail;
}
}
Sr. No. | Description |
---|---|
(1) |
Catch exceptions with try/catch. |
(2) |
Execute input validation. |
(3) |
Perform log output processing before throwing an exception. |
(4) |
Throw exceptions |
@Component
public class ValidateAndAbortItemProcessor implements ItemProcessor<VerificationSalesPlanDetail, SalesPlanDetail> {
/**
* Logger.
*/
private static final Logger logger = LoggerFactory.getLogger(ValidateAndAbortItemProcessor.class);
@Inject
Validator<VerificationSalesPlanDetail> validator;
@Override
public SalesPlanDetail process(VerificationSalesPlanDetail item) throws Exception {
validator.validate(item); // (1)
SalesPlanDetail salesPlanDetail = new SalesPlanDetail();
// omitted business logic
return salesPlanDetail;
}
@OnProcessError // (2)
void onProcessError(VerificationSalesPlanDetail item, Exception e) {
// (3)
logger.error("Exception occurred in input validation at the {} th item. [message:{}]", item.getCount() ,e.getMessage());
}
}
Sr. No. | Description |
---|---|
(1) | Execute input validation. |
(2) | Implement ItemProcessListener#onProcessError() by using the @OnProcessError annotation. |
(3) | Perform log output processing. |
Note on using ItemProcessListener#onProcessError()
Using of the onProcessError method is useful for improving the readability of source code, maintainability, etc. since business process and exception handling can be separated. When outputting log output in |
6.1.2.4.2. Skipping Error Records
After logging the information of the record where input validation error occurred, skip the record where the error occurred and continue the processing of the subsequent data as follows.
- Catch exceptions with try/catch.
- Perform log output etc. when an exception occurs.
- Return null as the return value of ItemProcessor#process().
  - By returning null, the record in which an input validation error occurred is no longer included in subsequent processing targets (output with ItemWriter).
@Component
public class ValidateAndContinueItemProcessor implements ItemProcessor<VerificationSalesPlanDetail, SalesPlanDetail> {
/**
* Logger.
*/
private static final Logger logger = LoggerFactory.getLogger(ValidateAndContinueItemProcessor.class);
@Inject
Validator<VerificationSalesPlanDetail> validator;
@Override
public SalesPlanDetail process(VerificationSalesPlanDetail item) throws Exception {
try { // (1)
validator.validate(item); // (2)
} catch (ValidationException e) {
// (3)
logger.warn("Skipping item because exception occurred in input validation at the {} th item. [message:{}]",
item.getCount(), e.getMessage());
// (4)
return null; // skipping item
}
SalesPlanDetail salesPlanDetail = new SalesPlanDetail();
// omitted business logic
return salesPlanDetail;
}
}
Sr. No. | Description |
---|---|
(1) | Catch exceptions with try/catch. |
(2) | Execute the input validation. |
(3) | Perform log output processing before returning null. |
(4) | Return null to exclude the record in which the input validation error occurred from subsequent processing targets. |
6.1.2.4.3. Setting the exit code
When an input validation error occurs, be sure to set an exit code different from that of normal termination, so that the state of the job can be distinguished from the case where no input validation error occurred.
Setting the exit code is required not only when the processing is terminated abnormally but also when data with input validation errors is skipped.
For details on how to set the exit code, refer to Job Management.
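As one possible way to detect skipped records (a sketch, not the method prescribed in Job Management; the listener class and the exit code string "SKIPPED" are assumptions), a step listener can inspect the number of filtered items, since items for which ItemProcessor#process() returned null are counted as filtered:

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.AfterStep;
import org.springframework.stereotype.Component;

@Component
public class SkipCheckStepListener {

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Items skipped by returning null from ItemProcessor#process() appear in the filter count.
        if (stepExecution.getFilterCount() > 0) {
            return new ExitStatus("SKIPPED"); // assumed custom exit code
        }
        return stepExecution.getExitStatus();
    }
}

Mapping this exit status to the process exit code is configured separately; refer to Job Management.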
6.2. Exception handling
6.2.1. Overview
How to handle exception generated at the time of job execution is explained.
Since this function has different usage for chunk model and tasklet model, each will be explained.
First, classification of exceptions is explained, and handling method according to the type of exception is explained.
6.2.1.1. Classification of exception
The exception generated at the time of job execution are classified into 3 types as below.
Sr.No. |
Classification |
Description |
|
(1) |
Exception that can be resolved the cause by job re-execution (parameters, change / modification of input data, etc.). |
An exception that can resolve the cause by re-execution of a job that handles an exception with application code and performs exception handling. |
Business exception |
(2) |
Exception that can not be resolved by job re-execution. |
Exceptions that cannot be resolved by job re-execution are handled with the following pattern. 1. If the exception can be caught in StepListener,
handle the exception with application code. 2. If the exception cannot be caught in StepListener, handle the exception in the framework. |
|
(3) |
(During asynchronous execution)Exception caused by illegal request for job request |
Exception caused by illegal request of job request is handled in the framework and performs exception handling. In case of Asynchronous execution(DB polling) in the polling process, the validity of the job request is not verified.Therefore, it is desirable that the input check for the request is made in advance by the application that registers the job request. In case of Asynchronous execution(Web container) It is assumed that the input check for the request is made in advance by the Web application. Therefore, exception handling is performed in an application that accepts requests or job requests. |
Avoid transaction processing within exception processing
When performing transactional processing such as writing to a database within exception handling, there is a possibility of causing a secondary exception. Exception handling should be limited to log output for later analysis and exit code setting. |
6.2.1.2. Exception type
Types of exceptions are explained.
6.2.1.2.1. Business exception
A business exception is an exception notifying that a violation of a business rule has been detected.
This exception is generated within the logic of the step.
Since it is assumed to be an application state, it is not necessary to deal with the system operator.
-
In case of stock out at inventory reserve
-
When the number of days exceeds the scheduled date
-
etc …
Applicable exception class
|
6.2.1.2.2. Library exception occurring during normal operation
A library exception that occurs during normal operation refers to an exception that may occur when the system is operating normally, among exceptions generated in the framework and library.
Exceptions raised in the framework and library are exception classes that occur in the Spring Framework and other libraries.
Since it is assumed to be an application state, it is not necessary to deal with the system operator.
-
Optimistic lock exception which occurs in exclusive control with online processing.
-
Unique constraint exception that occurs when registering the same data at same time from multiple jobs or online processing.
-
etc …
Applicable exception class
|
6.2.1.2.3. System exception
A system exception is an exception to notify that a state that should not occur is detected when the system is operating normally.
This exception is generated within the logic of the step.
Action by the system operator is required.
-
Master data, directory, file, etc. that should exist in advance do not exist.
-
When an exception classified as a system abnormality is caught (such as IOException during file operation) among the checked exceptions occurring in the framework or library.
-
etc…
Applicable exception class
|
6.2.1.2.4. Unexpected system exception
Unexpected system exceptions are unchecked exceptions that do not occur when the system is operating normally.
It is necessary for the system operator to deal with them, or for the system developer to analyze them.
Unexpected system exceptions are not handled except for the following processing. If handled, throw the exception again.
-
Log capture exception for analysis and set the corresponding exit code.
-
Bugs are hidden in applications, frameworks, and libraries.
-
When the DB server is down.
-
etc…
Applicable exception class
|
6.2.1.2.5. Fatal error
A fatal error is an error that notifies that a fatal problem has occurred that affects the entire system (application).
It is necessary for system operator or system developer to cope with it and recover.
Fatal errors are not handled except for the following processing. If handled, throw the exception again.
-
Log capture exception for analysis and set the corresponding exit code.
-
When memory available for Java virtual machine is insufficient.
-
etc…
Applicable exception class
|
6.2.1.2.6. Invalid job request error
An invalid job request error is an error notifying that a problem has occurred in a job request during asynchronous execution.
It is necessary for the system operator to cope with and recover from it.
Since an invalid job request error is handled by exception handling in the application that processes the job request, it is not explained in this guideline.
6.2.1.3. How to handle exceptions
How to handle exceptions is explained.
The exception handling pattern is as follows.
-
Decide whether to continue the job when an exception occurs (3 types)
-
Decide how to re-execute the suspended job (2 types)
Sr.No. | How to handle exceptions | Description |
---|---|---|
(1) |
Skip error record and continue processing. |
|
(2) |
Reprocess the error record until the specified condition (number of times, time etc.) is reached. |
|
(3) |
Processing is interrupted. |
Even if an exception has not occurred, the job may stop while processing because the job has exceeded the expected processing time. |
Sr.No. | How to handle exceptions | Description |
---|---|---|
(1) |
Re-execute the suspended job from the beginning. |
|
(2) |
Re-execute the interrupted job from the point where it was interrupted. |
For details on how to re-execute a suspended job, refer to Restart processing.
6.2.1.3.1. Skip
Skipping is a method of skipping error data without stopping batch processing and continuing processing.
-
Invalid record exists in input data
-
When a business exception occurs
-
etc …
Reprocess skip record
When skipping, design in advance how to deal with the skipped invalid records. For example, extracting the invalid records and reprocessing them, or including them in the processing targets of the next execution, can be considered. |
6.2.1.3.2. Retry
Retrying is a method of repeatedly attempting until a specified number of times or time is reached for a record that failed a specific process.
It is used only when the cause of processing failure depends on the execution environment and it is expected to be resolved over time.
-
When the record to be processed is locked by exclusive control
-
When message transmission fails due to instantaneous interruption of network
-
etc …
Application of retry
If retry is applied indiscriminately, the processing time increases unnecessarily when an abnormality occurs, and there is a risk that detection of the abnormality will be delayed. |
6.2.1.3.3. Process interruption
Process interruption is literally a method of interrupting processing midway.
It is used when processing cannot be continued on detecting an erroneous content or when there is requirement which does not allow skipping of records.
-
Invalid record exists in input data
-
When a business exception occurs
-
etc …
6.2.2. How to use
How to implement exception handling is explained.
A log is the main user interface for batch application operation. Therefore, monitoring of exceptions that occurred is also done through the log.
In Spring Batch, if an exception occurs during step execution, the log is output and the job terminates abnormally, so the requirement may already be satisfied without additional implementation by the user. The explanations below should be implemented only at the specific points where the user needs to output logs according to the system; basically there is no case where all of the processing must be implemented.
For common log setting of exception handling, please refer Logging.
6.2.2.1. Step unit exception handling
Exception handling method in step units is explained.
- Exception handling with ChunkListener interface
-
If you want to handle exceptions uniformly regardless of the processing model, use the ChunkListener interface.
Although it can also be implemented by using a step or job listener, whose scope is wider than a chunk, adopt ChunkListener with emphasis on handling the exception as soon as possible after it occurs.
The exception handling method for each processing model is as follows.
- Exception handling in chunk model
-
Implement the function using various Listener interfaces provided by Spring Batch.
- Exception handling in tasklet model
-
Implement exception handling independently within tasklet implementation.
Why unified handling is possible with ChunkListener
A sense of incompatibility might be felt with This point also appears in
|
6.2.2.1.1. Exception handling with ChunkListener interface
Implement afterChunkError
method of ChunkListener interface.
Get error information from ChunkContext
argument of afterChunkError
method using ChunkListener.ROLLBACK_EXCEPTION_KEY
as a key.
For details on how to set the listener, please refer to Listener setting.
@Component
public class ChunkAroundListener implements ChunkListener {
private static final Logger logger =
LoggerFactory.getLogger(ChunkAroundListener.class);
@Override
public void beforeChunk(ChunkContext context) {
logger.info("before chunk. [context:{}]", context);
}
@Override
public void afterChunk(ChunkContext context) {
logger.info("after chunk. [context:{}]", context);
}
// (1)
@Override
public void afterChunkError(ChunkContext context) {
logger.error("Exception occurred while chunk. [context:{}]", context,
context.getAttribute(ChunkListener.ROLLBACK_EXCEPTION_KEY)); // (2)
}
}
Sr.No. | Description |
---|---|
(1) |
Implement the afterChunkError method. |
(2) |
Get error information from ChunkContext using ChunkListener.ROLLBACK_EXCEPTION_KEY as the key. |
Difference in behavior of ChunkListener due to difference in processing model
In the chunk model, handling is not performed with the afterChunkError method because exceptions caused by opening / closing resources are outside the scope captured by the ChunkListener interface. A schematic diagram is shown below. Schematic diagram of exception handling in chunk model
In the tasklet model, exceptions caused by opening and closing resources are handled by the afterChunkError method because they are within the scope captured by the ChunkListener interface. A schematic diagram is shown below. Schematic diagram of exception handling in the tasklet model
If you wish to handle exceptions in a unified way by absorbing this difference in behavior,
it can be implemented by checking the occurrence of an exception in the StepExecutionListener interface.
However, the implementation is slightly more complicated than with ChunkListener; a sketch of such a StepExecutionListener is shown after this note.
|
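The following is a minimal sketch of such a StepExecutionListener, assuming you only need to log every exception accumulated during the step regardless of whether it was raised while opening/closing resources or during item processing. It is not the guideline's own example; the class name is hypothetical.
// package and import statements omitted.
@Component
public class StepErrorLoggingListener implements StepExecutionListener {
    private static final Logger logger =
            LoggerFactory.getLogger(StepErrorLoggingListener.class);
    @Override
    public void beforeStep(StepExecution stepExecution) {
        // do nothing.
    }
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // All exceptions raised during the step, including those from resource open/close,
        // are accumulated in the StepExecution.
        List<Throwable> exceptions = stepExecution.getFailureExceptions();
        exceptions.forEach(th -> logger.error("exception has occurred in step.", th));
        return stepExecution.getExitStatus();
    }
}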
6.2.2.1.2. Exception handling in chunk model
In the chunk model, exception handling is performed by implementing listeners that extend StepListener.
For details on how to set a listener, please refer to Listener setting.
By implementing onReadError
method of ItemReadListener interface,
exceptions raised within ItemReader are handled.
@Component
public class CommonItemReadListener implements ItemReadListener<Object> {
private static final Logger logger =
LoggerFactory.getLogger(CommonItemReadListener.class);
// omitted.
// (1)
@Override
public void onReadError(Exception ex) {
logger.error("Exception occurred while reading.", ex); // (2)
}
// omitted.
}
Sr.No. | Description |
---|---|
(1) |
Implement the onReadError method. |
(2) |
Implement exception handling. |
There are two ways to handle exception in ItemProcessor, and use it according to requirements.
-
How to try ~ catch in ItemProcessor
-
Using ItemProcessListener interface.
How to use them differently is explained below.
The onProcessError method, which is executed when an exception occurs in ItemProcessor processing, receives two arguments: the item being processed and the exception.
Depending on the requirements of the system, these two arguments may not be sufficient when handling exceptions such as log output in the ItemProcessListener interface.
In that case, it is recommended to catch the exception with try ~ catch in ItemProcessor and perform the exception handling there.
Note that implementing try ~ catch in ItemProcessor and also implementing the ItemProcessListener interface may result in double processing, so care must be taken.
If fine-grained exception handling is to be done, adopt the method of try ~ catch in ItemProcessor.
Each method is explained.
- How to try ~ catch in ItemProcessor
-
This is used to perform fine-grained exception handling.
As explained in the skip section below, it is also used to skip error records.
@Component
public class AmountCheckProcessor implements
ItemProcessor<SalesPerformanceDetail, SalesPerformanceDetail> {
// omitted.
@Override
public SalesPerformanceDetail process(SalesPerformanceDetail item)
throws Exception {
// (1)
try {
checkAmount(item.getAmount(), amountLimit);
} catch (ArithmeticException ae) {
// (2)
logger.error(
"Exception occurred while processing. [item:{}]", item, ae);
// (3)
throw new IllegalStateException("check error at processor.", ae);
}
return item;
}
}
Sr.No. | Description |
---|---|
(1) |
Implement try ~ catch. |
(2) |
Implement exception handling. |
(3) |
Throw a transaction rollback exception. |
- How to use the
ItemProcessListener
interface -
Use this,if business exceptions can be handled in the same way.
@Component
public class CommonItemProcessListener implements ItemProcessListener<Object, Object> {
private static final Logger logger =
LoggerFactory.getLogger(CommonItemProcessListener.class);
// omitted.
// (1)
@Override
public void onProcessError(Object item, Exception e) {
// (2)
logger.error("Exception occurred while processing. [item:{}]", item, e);
}
// omitted.
}
Sr.No. | Description |
---|---|
(1) |
Implement the onProcessError method. |
(2) |
Implement exception handling. |
By implementing the onWriteError
method of ItemWriteListener interface
exceptions raised within ItemWriter are handled.
@Component
public class CommonItemWriteListener implements ItemWriteListener<Object> {
private static final Logger logger =
LoggerFactory.getLogger(CommonItemWriteListener.class);
// omitted.
// (1)
@Override
public void onWriteError(Exception ex, List item) {
// (2)
logger.error("Exception occurred while processing. [items:{}]", item, ex);
}
// omitted.
}
Sr.No. | Description |
---|---|
(1) |
Implement the onWriteError method. |
(2) |
Implement exception handling. |
6.2.2.1.3. Exception handling in tasklet model
In the tasklet model, implement exception handling independently within the tasklet implementation.
When performing transaction processing, be sure to throw the exception again in order to roll back.
@Component
public class SalesPerformanceTasklet implements Tasklet {
private static final Logger logger =
LoggerFactory.getLogger(SalesPerformanceTasklet.class);
// omitted
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
// (1)
try {
reader.open(chunkContext.getStepContext().getStepExecution()
.getExecutionContext());
List<SalesPerformanceDetail> items = new ArrayList<>(10);
SalesPerformanceDetail item = null;
do {
// Pseudo operation of ItemReader
// omitted
// Pseudo operation of ItemProcessor
checkAmount(item.getAmount(), amountLimit);
// Pseudo operation of ItemWriter
// omitted
} while (item != null);
} catch (Exception e) {
logger.error("exception in tasklet.", e); // (2)
throw e; // (3)
} finally {
try {
reader.close();
} catch (Exception e) {
// do nothing.
}
}
return RepeatStatus.FINISHED;
}
}
Sr.No. | Description |
---|---|
(1) |
Implement try ~ catch. |
(2) |
Implement exception handling. |
(3) |
Throw the exception again to roll back the transaction. |
6.2.2.2. Job-level exception handling
Exception handling method on a job level is explained.
It is a common handling method for chunk model and tasklet model.
Implement errors such as system exception and fatal error etc. in job level JobExecutionListener interface.
In order to collectively define exception handling processing, handling is performed on a job level without defining handling processing for each step.
In the exception handling here, do output log and setting ExitCode, do not implement transaction processing.
Prohibition of transaction processing
The processing performed by |
Here, an example of handling an exception that occurs in ItemProcessor is shown. For details on how to set the listener, please refer to Listener setting.
@Component
public class AmountCheckProcessor implements
ItemProcessor<SalesPerformanceDetail, SalesPerformanceDetail> {
// omitted.
private StepExecution stepExecution;
// (1)
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
this.stepExecution = stepExecution;
}
@Override
public SalesPerformanceDetail process(SalesPerformanceDetail item)
throws Exception {
// (2)
try {
checkAmount(item.getAmount(), amountLimit);
} catch (ArithmeticException ae) {
// (3)
stepExecution.getExecutionContext().put("ERROR_ITEM", item);
// (4)
throw new IllegalStateException("check error at processor.", ae);
}
return item;
}
}
@Component
public class JobErrorLoggingListener implements JobExecutionListener {
private static final Logger logger =
LoggerFactory.getLogger(JobErrorLoggingListener.class);
@Override
public void beforeJob(JobExecution jobExecution) {
// do nothing.
}
// (5)
@Override
public void afterJob(JobExecution jobExecution) {
// whole job execution
List<Throwable> exceptions = jobExecution.getAllFailureExceptions(); // (6)
// (7)
if (exceptions.isEmpty()) {
return;
}
// (8)
logger.info("This job has occurred some exceptions as follow. " +
"[job-name:{}] [size:{}]",
jobExecution.getJobInstance().getJobName(), exceptions.size());
exceptions.forEach(th -> logger.error("exception has occurred in job.", th));
// (9)
jobExecution.getStepExecutions().forEach(stepExecution -> {
Object errorItem = stepExecution.getExecutionContext()
.get("ERROR_ITEM"); // (10)
if (errorItem != null) {
logger.error("detected error on this item processing. " +
"[step:{}] [item:{}]", stepExecution.getStepName(),
errorItem);
}
});
}
}
Sr.No. | Description |
---|---|
(1) |
In order to output error data with JobExecutionListener, get and keep StepExecution before the step execution. |
(2) |
Implement try ~ catch. |
(3) |
Implement exception handling. |
(4) |
Throw an exception so that exception handling is performed with JobExecutionListener. |
(5) |
Implement exception handling in the afterJob method. |
(6) |
From the argument JobExecution, get all the exceptions that occurred during the job execution. |
(7) |
If there is no error information, it is determined as normal termination. |
(8) |
If there is error information, exception handling is performed. |
(9) |
In this example, when there is error data, log output is performed. |
Object to be stored in ExecutionContext
The object to be stored in ExecutionContext should be a class that implements java.io.Serializable, because the ExecutionContext is serialized and persisted in the JobRepository. |
6.2.2.3. Determination as to whether processing can be continued
How to decide whether or not to continue processing jobs when an exception occurs is explained.
6.2.2.3.1. Skip
A method of skipping an erroneous record and continuing processing is described.
In the chunk model, the implementation method differs for each component in which the error occurs.
Before applying the contents described here, make sure to read About reason why <skippable-exception-classes> is not used. |
- Skip with ItemReader
-
Specify the skip method in the skip-policy attribute of <batch:chunk>.
In <batch:skippable-exception-classes>, specify the exception class that occurs in the ItemReader and is to be skipped.
For the skip-policy attribute, use one of the following classes provided by Spring Batch.
Class name | Description |
---|---|
AlwaysSkipItemSkipPolicy |
Always skip. |
NeverSkipItemSkipPolicy |
Do not skip. |
LimitCheckingItemSkipPolicy |
Skip until the upper limit of the specified number of skips is reached. This is the skipping method used by default when the skip-limit attribute is specified. |
ExceptionClassifierSkipPolicy |
Use this when you want to change the skip method to be applied for each exception. |
An implementation example of skipping is explained.
This handles the case where an incorrect record exists when reading a CSV file with FlatFileItemReader.
The following exceptions occur at this time.
-
org.springframework.batch.item.ItemReaderException (base exception class)
-
org.springframework.batch.item.file.FlatFileParseException (the exception class that actually occurs)
-
How to define each skip-policy is shown separately below.
<bean id="detailCSVReader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="resource" value="file:#{jobParameters[inputFile]}"/>
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="branchId,year,month,customerId,amount"/>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"
p:targetType="org.terasoluna.batch.functionaltest.app.model.performance.SalesPerformanceDetail"/>
</property>
</bean>
</property>
</bean>
- AlwaysSkipItemSkipPolicy
<!-- (1) -->
<bean id="skipPolicy"
class="org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy"/>
<batch:job id="jobSalesPerfAtSkipAllReadError" job-repository="jobRepository">
<batch:step id="jobSalesPerfAtSkipAllReadError.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="detailCSVReader"
processor="amountCheckProcessor"
writer="detailWriter" commit-interval="10"
skip-policy="skipPolicy"> <!-- (2) -->
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
Sr.No. | Description |
---|---|
(1) |
Define |
(2) |
Set the bean defined in (1) to the |
- NeverSkipItemSkipPolicy
<!-- (1) -->
<bean id="skipPolicy"
class="org.springframework.batch.core.step.skip.NeverSkipItemSkipPolicy"/>
<batch:job id="jobSalesPerfAtSkipNeverReadError" job-repository="jobRepository">
<batch:step id="jobSalesPerfAtSkipNeverReadError.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="detailCSVReader"
processor="amountCheckProcessor"
writer="detailWriter" commit-interval="10"
skip-policy="skipPolicy"> <!-- (2) -->
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
Sr.No. | Description |
---|---|
(1) |
Define |
(2) |
Set the bean defined in (1) to the |
- LimitCheckingItemSkipPolicy
(1)
<!--
<bean id="skipPolicy"
class="org.springframework.batch.core.step.skip.LimitCheckingItemSkipPolicy"/>
-->
<batch:job id="jobSalesPerfAtValidSkipReadError" job-repository="jobRepository">
<batch:step id="jobSalesPerfAtValidSkipReadError.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="detailCSVReader"
processor="amountCheckProcessor"
writer="detailWriter" commit-interval="10"
skip-limit="2"> <!-- (2) -->
<!-- (3) -->
<batch:skippable-exception-classes>
<!-- (4) -->
<batch:include
class="org.springframework.batch.item.ItemReaderException"/>
</batch:skippable-exception-classes>
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
Sr.No. | Description |
---|---|
(1) |
Define |
(2) |
Set the upper limit value of skip number in the |
(3) |
Define |
(4) |
Set |
- ExceptionClassifierSkipPolicy
<!-- (1) -->
<bean id="skipPolicy"
class="org.springframework.batch.core.step.skip.ExceptionClassifierSkipPolicy">
<property name="policyMap">
<map>
<!-- (2) -->
<entry key="org.springframework.batch.item.ItemReaderException"
value-ref="alwaysSkip"/>
</map>
</property>
</bean>
<!-- (3) -->
<bean id="alwaysSkip"
class="org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy"/>
<batch:job id="jobSalesPerfAtValidNolimitSkipReadError"
job-repository="jobRepository">
<batch:step id="jobSalesPerfAtValidNolimitSkipReadError.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<!-- skip-limit value is dummy. -->
<batch:chunk reader="detailCSVReader"
processor="amountCheckProcessor"
writer="detailWriter" commit-interval="10"
skip-policy="skipPolicy"> <!-- (4) -->
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
Sr.No. | Description |
---|---|
(1) |
Define |
(2) |
Set the |
(3) |
Define the skipping method you want to execute by exception. |
(4) |
Set the bean defined in (1) to the |
- Skip on ItemProcessor
-
Try ~ catch in ItemProcessor and return null.
Skip with skip-policy is not used because reprocessing occurs in ItemProcessor. For details, please refer to About reason why <skippable-exception-classes> is not used.
Restrictions on exception handling in ItemProcessor
As described in About reason why <skippable-exception-classes> is not used,
skipping by skip-policy must not be used in ItemProcessor. |
An implementation example of skip is shown.
This corresponds to the try ~ catch implementation example of Coding point (ItemProcessor), modified so that the error record is skipped.
@Component
public class AmountCheckProcessor implements
ItemProcessor<SalesPerformanceDetail, SalesPerformanceDetail> {
// omitted.
@Override
public SalesPerformanceDetail process(SalesPerformanceDetail item) throws Exception {
// (1)
try {
checkAmount(item.getAmount(), amountLimit);
} catch (ArithmeticException ae) {
logger.warn("Exception occurred while processing. Skipped. [item:{}]",
item, ae); // (2)
return null; // (3)
}
return item;
}
}
Sr.No. | Description |
---|---|
(1) |
Implement try ~ catch. |
(2) |
Implement exception handling |
(3) |
Skip error data by returning null. |
- Skip with ItemWriter
-
In ItemWriter, skip processing is generally not performed.
Even when skipping is necessary, skipping by skip-policy is not used because the chunk size would change. For details, please refer to About reason why <skippable-exception-classes> is not used.
Handle exceptions in the business logic and implement the processing to skip error records independently.
The following corresponds to the implementation example of Exception handling in tasklet model, modified so that the error record is skipped.
@Component
public class SalesPerformanceTasklet implements Tasklet {
private static final Logger logger =
LoggerFactory.getLogger(SalesPerformanceTasklet.class);
// omitted
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
        try {
            reader.open(chunkContext.getStepContext().getStepExecution()
                    .getExecutionContext());
            List<SalesPerformanceDetail> items = new ArrayList<>(10);
            SalesPerformanceDetail item = null;
            do {
                // Pseudo operation of ItemReader
                // omitted

                // (1)
                try {
                    // Pseudo operation of ItemProcessor
                    checkAmount(item.getAmount(), amountLimit);
                } catch (Exception e) {
                    logger.warn("exception in tasklet. Skipped.", e); // (2)
                    continue; // (3)
                }

                // Pseudo operation of ItemWriter
                // omitted
            } while (item != null);
        } finally {
try {
reader.close();
} catch (Exception e) {
// do nothing.
}
}
return RepeatStatus.FINISHED;
}
}
Sr.No. | Description |
---|---|
(1) |
Enclose the processing of each item with try ~ catch. |
(2) |
Implement exception handling. |
(3) |
Processing of error data is skipped by continue. |
6.2.2.3.2. Retry
When an exception is detected, a method of reprocessing until the specified number of attempts is reached is described.
For retry, various factors such as the presence or absence of state management and the situation in which the retry occurs must be considered; there is no universally reliable method, and unnecessary retrying only makes the situation worse.
Therefore, this guideline explains how to use org.springframework.retry.support.RetryTemplate, which implements a local retry.
As with skip, there is also a way to specify the target exception class with <batch:retryable-exception-classes>, but it is not used for the same reasons described in About reason why <skippable-exception-classes> is not used. |
public class RetryableAmountCheckProcessor implements
ItemProcessor<SalesPerformanceDetail, SalesPerformanceDetail> {
// omitted
// (1)
private RetryPolicy retryPolicy;
@Override
public SalesPerformanceDetail process(SalesPerformanceDetail item)
throws Exception {
// (2)
RetryTemplate rt = new RetryTemplate();
if (retryPolicy != null) {
rt.setRetryPolicy(retryPolicy);
}
try {
// (3)
rt.execute(context -> {
item.setAmount(item.getAmount().divide(new BigDecimal(10)));
checkAmount(item.getAmount(), amountLimit);
return null;
});
} catch (ArithmeticException ae) {
// (4)
throw new IllegalStateException("check error at processor.", ae);
}
return item;
}
public void setRetryPolicy(RetryPolicy retryPolicy) {
this.retryPolicy = retryPolicy;
}
}
<!-- omitted -->
<bean id="amountCheckProcessor"
class="org.terasoluna.batch.functionaltest.ch06.exceptionhandling.RetryableAmountCheckProcessor"
scope="step"
p:retryPolicy-ref="retryPolicy"/> <!-- (5) -->
<!-- (6) (7) (8)-->
<bean id="retryPolicy" class="org.springframework.retry.policy.SimpleRetryPolicy"
c:maxAttempts="3"
c:retryableExceptions-ref="exceptionMap"/>
<!-- (9) -->
<util:map id="exceptionMap">
<entry key="java.lang.ArithmeticException" value="true"/>
</util:map>
<batch:job id="jobSalesPerfWithRetryPolicy" job-repository="jobRepository">
<batch:step id="jobSalesPerfWithRetryPolicy.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="detailCSVReader"
processor="amountCheckProcessor"
writer="detailWriter" commit-interval="10"/>
</batch:tasklet>
</batch:step>
</batch:job>
Sr.No. | Description |
---|---|
(1) |
Store the retry condition. |
(2) |
Create an instance of RetryTemplate. |
(3) |
Use the |
(4) |
Exception handling when the number of retries exceeds the specified number of times |
(5) |
Specify the retry condition defined in (6). |
(6) |
Define the retry condition in the class that implements |
(7) |
Specify the number of retries in the |
(8) |
Specify the map that defines the target exception to be retried defined in (9) in |
(9) |
Define a map in which the target exception class to be retried and the boolean value are set in the key |
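For reference, an equivalent retry condition to the bean definition above ((6) to (9)) can also be assembled directly in Java. The following is a minimal sketch of that idea and is not taken from the guideline's sample code; the class and method names are hypothetical.
// package and import statements omitted.
public class RetryPolicyJavaConfigExample {
    // Builds the same retry condition as (6)-(9) above: up to 3 attempts,
    // retrying only when ArithmeticException occurs.
    public RetryTemplate createRetryTemplate() {
        Map<Class<? extends Throwable>, Boolean> retryableExceptions = new HashMap<>();
        retryableExceptions.put(ArithmeticException.class, true);
        RetryTemplate retryTemplate = new RetryTemplate();
        retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3, retryableExceptions));
        return retryTemplate;
    }
}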
6.2.2.3.3. Process interruption
If you want to interrupt step execution, throw an exception (RuntimeException or its subclass) that is not subject to skip or retry.
An implementation example based on the skip example using LimitCheckingItemSkipPolicy is shown below.
<batch:job id="jobSalesPerfAtValidSkipReadError" job-repository="jobRepository">
<batch:step id="jobSalesPerfAtValidSkipReadError.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="detailCSVReader"
processor="amountCheckProcessor"
writer="detailWriter" commit-interval="10"
skip-limit="2">
<batch:skippable-exception-classes>
<!-- (1) -->
<batch:include class="org.springframework.batch.item.validator.ValidationException"/>
</batch:skippable-exception-classes>
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
Sr.No. | Description |
---|---|
(1) |
If an exception other than |
An implementation example of retry is shown based on Retry.
<!-- omitted -->
<bean id="retryPolicy" class="org.springframework.retry.policy.SimpleRetryPolicy"
c:maxAttempts="3"
c:retryableExceptions-ref="exceptionMap"/>
<util:map id="exceptionMap">
<!-- (1) -->
<entry key="java.lang.UnsupportedOperationException" value="true"/>
</util:map>
<batch:job id="jobSalesPerfWithRetryPolicy" job-repository="jobRepository">
<batch:step id="jobSalesPerfWithRetryPolicy.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="detailCSVReader"
processor="amountCheckProcessor"
writer="detailWriter" commit-interval="10"/>
</batch:tasklet>
</batch:step>
</batch:job>
Sr.No. | Description |
---|---|
(1) |
If an exception other than |
6.2.3. Appendix
6.2.3.1. About reason why <skippable-exception-classes> is not used
Spring Batch provides a function to specify an exception to be skipped for the entire job, skip processing the item where the exception occurred, and continue the processing.
It implements the function by setting the <skippable-exception-classes>
tag under the <chunk>
tag and specifying the exception to be skipped as follows.
<job id="flowJob">
<step id="retryStep">
<tasklet>
<chunk reader="itemReader" writer="itemWriter"
processor="itemProcessor" commit-interval="20"
skip-limit="10">
<skippable-exception-classes>
<!-- specify exceptions to the skipped -->
<include class="java.lang.Exception"/>
<exclude class="java.lang.NullPointerException"/>
</skippable-exception-classes>
</chunk>
</tasklet>
</step>
</job>
By using this function, it is possible to skip the record where the input check error occurred and continue processing the subsequent data. However, TERASOLUNA Batch 5.x does not use it, for the following reasons.
-
If an exception is skipped using the <skippable-exception-classes> tag, the number of data items included in one chunk varies, so performance deterioration may occur.
-
This depends on where the exception occurred (ItemReader / ItemProcessor / ItemWriter). Details are described later.
-
Avoid using SkipPolicy without defining <skippable-exception-classes>
All exceptions are then implicitly registered as skippable, and the possibility of performance degradation increases dramatically. |
The behavior when an exception occurs in each component (ItemReader / ItemProcessor / ItemWriter) is explained below.
Regardless of where the exception occurs, the transaction is always rolled back first and then the processing is performed again.
- When an exception occurs in ItemReader
-
-
When an exception occurs within the processing of ItemReader, the processing object moves to the next item.
-
There is no side effect by this.
-
- When an exception occurs in ItemProcessor
-
-
If an exception occurs within the processing of ItemProcessor, processing returns to the beginning of the chunk and the items are reprocessed from the first one.
-
Items to be skipped are not included in the reprocessing.
-
The chunk size at the first processing and at the reprocessing does not change.
-
- When an exception occurs in ItemWriter
-
-
If an exception occurs within the processing of ItemWriter, processing returns to the beginning of the chunk and the items are reprocessed from the first one.
-
Reprocessing is fixed to ChunkSize=1 and executed one item at a time.
-
Items to be skipped are also included in the reprocessing.
-
If an exception occurs in ItemProcessor, taking ChunkSize=1000 as an example, when the exception occurs on the 1000th item, reprocessing is done from the 1st item and a total of 1999 items are processed.
If an exception occurs in ItemWriter, reprocessing is fixed to ChunkSize=1. Taking ChunkSize=1000 as an example, what was originally 1 transaction is divided into 1000 transactions and processed.
This means that the processing time of the entire job is prolonged, and there is a high possibility of making the situation worse at the time of an abnormality. In addition, the double processing itself may become a problem and may require additional consideration in design and implementation.
Therefore, we do not recommend using <skippable-exception-classes>.
Skipping data that failed in ItemReader does not cause these problems, but in order to prevent accidents, avoid it in principle and apply it only when it is absolutely necessary.
6.3. Restart processing
6.3.1. Overview
The means to recover and restart a job after it has terminated abnormally due to a failure are explained.
Since this function has different usage for chunk model and tasklet model, each will be explained.
There are the following methods to restart a job.
-
Job rerun
-
Job restart
-
Stateless restart
-
Number based restart
-
-
Stateful restart
-
Determine processing status, restart process to extract unprocessed data
-
It is necessary to separately implement a process for identifying the processing state
-
-
-
Below is terminology definition:
- Rerun
-
Redoing the job from the beginning.
As a preliminary work, it is necessary to recover to the state before failure occurred such as initializing data,at the time of starting the job. - Restart
-
Resume the processing from where the job was interrupted.
It is necessary to design/implement retention of restart position processing, acquisition method, data skip method till restart position etc in advance.
There are two types of restart, stateless and stateful. - Stateless restart
-
A restart method not considering the state (unprocessed / processed) for each input data.
- Number based restart
-
One of stateless restart.
A method of retaining the processed input data count and skipping that input data at the time of restart.
If the output is a non-transactional resource, it is also necessary to hold the output position and move the write position to that position at the time of restart. - Stateful restart
-
A restart method in which the state (unprocessed / processed) for each input data is judged, and only unprocessed data is acquired as an acquisition condition.
If the output is a non-transactional resource, make the resource additional, and at the time of restart, add it to the previous result.
Generally, rerun is the easiest way to re-execute. In the order Rerun < Stateless restart < Stateful restart, design and implementation become more difficult. It is always preferable to use rerun if possible, but for each job that the user implements, consider which method to apply depending on the allowable batch window and the processing characteristics.
6.3.2. How to use
Implementation method of Rerun and restart is explained.
6.3.2.1. Job rerun
How to implement job rerun is explained.
-
Preliminary work of data recovery such as initialization of data before re-run is carried out.
-
Execute the failed job again with the same condition (same parameter).
-
In Spring Batch, if you execute a job with the same parameters, it is treated as double execution, but TERASOLUNA Batch 5.x treats it as a separate job.
For details, please refer to About parameter conversion class.
-
6.3.2.2. Job restart
How to restart a job is explained.
Restarting a job is basically performed on jobs executed synchronously.
For asynchronously executed jobs, it is recommended to design the jobs so that they can be recovered by rerun instead of restart. This is because it is difficult to judge whether a restart is an "intended restart execution" or an "unintended duplicate execution", which may cause confusion in operation.
If the restart requirement cannot be excluded for asynchronously executed jobs, the following methods can be used to make the "intended restart execution" explicit.
-
Restart by
-restart
ofCommandLineJobRunner
-
Restart the asynchronously executed job individually by synchronous execution. This is effective when proceeding with the recovery process sequentially.
-
-
Restart by
JobOperator#restart(JobExecutionId)
-
Restart the asynchronously executed job again on the asynchronous execution mechanism. This is effective when proceeding with the recovery processing collectively.
-
Asynchronous execution(DB polling) does not support restart. Therefore, it is necessary for the user to implement it separately.
-
Asynchronous execution(Web container) explains how to implement restart; the user implements it according to that description.
-
-
About restart when there is input check
The input check error is not recoverable unless the input resource causing the check error is corrected. For reference, an input resource correction example when an input check error occurs is shown below.
|
In the case of multiple processing (Partition Step)
When restarting in multiple processing (Partition Step),
processing is carried out again from the split processing.
When all of the data have already been processed as the result of the data division, an unnecessary split is performed and recorded on |
6.3.2.3. Stateless restart
How to implement stateless restart is explained.
Stateless restart with TERASOLUNA Batch 5.x refers to a number based restart. This is implemented by using the mechanism of Spring Batch as it is.
The number based restart can be used in job execution of the chunk model.
In addition, the number based restart uses the context information about inputs and outputs registered in JobRepository.
Therefore, for a number based restart, it is assumed that JobRepository does not use the in-memory database but a database whose persistence is guaranteed.
About failure occurrence in JobRepository
Updating to |
- Input at restart
-
Since most of the ItemReaders provided by Spring Batch are compatible with the number-based restart, special support is not necessary.
If you want to create a number based restartable ItemReader yourself, the following abstract class, which already implements the restart processing, can be extended (a minimal sketch is shown after this list).
-
-
org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader
-
-
-
Since the number based restart determines the restart starting point based only on the number of items processed so far, it cannot detect changes, additions or deletions of input data. The input data is often corrected after a job ends abnormally; when the data changes as follows, be careful because the output recovered by restarting the abnormally ended job may differ from the output of a job that ended normally.
-
-
Change the data acquisition order
-
At the time of restart, duplicate or unprocessed data will be generated, so never do this, as it results in a recovery result different from that of a rerun.
-
-
Update processed data
-
Since the updated data is skipped at the time of restart, this is not preferred, because the result recovered by restart may differ from the result of a rerun.
-
-
Update or add unprocessed data
-
Since the rerun result and the recovered result are the same, this is allowed. However, it differs from the result that the first execution would have produced had it ended normally. This should be used when patching abnormal data as an emergency measure, or when processing as much as possible of the data received at the time of execution.
-
-
-
- Output at Restart
-
Care must be taken in output to non-transactional resources. For example in a file, it is necessary to grasp the position to which the output was made and output from that position.
TheFlatFileItemWriter
provided by Spring Batch gets the previous output position from the context and outputs from that position at the time of restart, so no special countermeasure is necessary.
For transactional resources, since rollback is performed at the time of failure, it is possible to perform processing without taking any special action at restart.
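As a supplement to the Input at restart item above, the following is a minimal sketch of a number based restartable reader created by extending AbstractItemCountingItemStreamItemReader. It is not part of the guideline's sample application; the class name and the in-memory item list are hypothetical, used only for illustration.
// package and import statements omitted.
public class InMemoryRestartableItemReader
        extends AbstractItemCountingItemStreamItemReader<String> {

    private List<String> items;
    private Iterator<String> iterator;

    public InMemoryRestartableItemReader() {
        // The name is used as the key prefix when the read count is stored in the ExecutionContext.
        setName("inMemoryRestartableItemReader");
    }

    public void setItems(List<String> items) {
        this.items = items;
    }

    @Override
    protected void doOpen() throws Exception {
        iterator = items.iterator();
    }

    @Override
    protected String doRead() throws Exception {
        // Returning null notifies the end of input. The parent class counts the items read,
        // saves the count in the ExecutionContext, and skips already-read items at restart.
        return iterator.hasNext() ? iterator.next() : null;
    }

    @Override
    protected void doClose() throws Exception {
        iterator = null;
    }
}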
If the above conditions are satisfied, add the option -restart
to the failed job and execute it again.
Below an example of job restart is shown.
# (1)
java -cp dependency/* org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <jobName> -restart
Sr.No. |
Description |
(1) |
At restart, specify the same job bean definition path and job name as the failed job, and add the -restart option. |
An example of restarting a job executed in asynchronous execution (DB polling) is shown below.
# (1)
java -cp dependency/* org.springframework.batch.core.launch.support.CommandLineJobRunner <JobExecutionId> -restart
Sr.No. |
Description |
(1) |
Run the job again by specifying the job execution ID of the failed job together with the -restart option. The job execution ID can be acquired from the job-request-table. About the job-request-table, please refer to About polling table. |
Output log of job execution ID
In order to promptly specify the job execution ID of the abnormally terminated job, It is recommended to implement a listener or exception handling class that logs the job execution ID when the job ends or when an exception occurs. |
An example of restart in asynchronous execution (Web container) is shown below.
public long restart(long jobExecutionId) throws Exception {
    return jobOperator.restart(jobExecutionId); // (1)
}
Sr.No. |
Description |
(1) |
Specify the same job execution ID (JobExecutionId) as the failed job to JobOperator#restart(). The job execution ID can be obtained from the ID acquired when executing the job with the web application or from |
6.3.2.4. Stateful restart
How to achieve stateful restart is explained.
Stateful restart is a method of reprocessing by acquiring only unprocessed data, using the input/output results at the time of execution. Although this method requires difficult design work, such as how to retain the state and how to determine what is unprocessed, it is sometimes used because it is robust against data changes.
In stateful restart, since restart conditions are determined from input / output resources, persistence of JobRepository
becomes unnecessary.
- Input at restart
-
Prepare an ItemReader that implements logic that acquires only unprocessed data with input / output results.
- Output at restart
-
Similar to stateless restart, caution is required for output to non-transactional resources.
In the case of a file, assuming that the context is not used, it is necessary to design the output so that appending to the file is permitted.
Stateful restart, similar to job rerun, executes the job again under the same conditions as the failed job.
Unlike stateless restart, the -restart option is not used.
A simple implementation example of stateful restart is shown below.
-
Define a processed column in the input target table, and update it with a value other than NULL if the processing succeeds.
-
For the extraction condition of unprocessed data, the value of the processed column is NULL.
-
-
Output the processing result to a file.
<!-- (1) -->
<select id="findByProcessedIsNull"
resultType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail">
<![CDATA[
SELECT
branch_id AS branchId, year, month, customer_id AS customerId, amount
FROM
sales_plan_detail
WHERE
processed IS NULL
ORDER BY
branch_id ASC, year ASC, month ASC, customer_id ASC
]]>
</select>
<!-- (2) -->
<update id="update" parameterType="org.terasoluna.batch.functionaltest.app.model.plan.SalesPlanDetail">
<![CDATA[
UPDATE
sales_plan_detail
SET
processed = '1'
WHERE
branch_id = #{branchId}
AND
year = #{year}
AND
month = #{month}
AND
customer_id = #{customerId}
]]>
</update>
<!-- (3) -->
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader"
p:queryId="org.terasoluna.batch.functionaltest.ch06.reprocessing.repository.RestartOnConditionRepository.findByProcessedIsNull"
p:sqlSessionFactory-ref="jobSqlSessionFactory"/>
<!-- (4) -->
<bean id="dbWriter" class="org.mybatis.spring.batch.MyBatisBatchItemWriter"
p:statementId="org.terasoluna.batch.functionaltest.ch06.reprocessing.repository.RestartOnConditionRepository.update"
p:sqlSessionTemplate-ref="batchModeSqlSessionTemplate"/>
<bean id="fileWriter"
class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"
p:resource="file:#{jobParameters[outputFile]}"
p:appendAllowed="true"> <!-- (5) -->
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor"
p:names="branchId,year,month,customerId,amount"/>
</property>
</bean>
</property>
</bean>
<!-- (6) -->
<bean id="compositeWriter" class="org.springframework.batch.item.support.CompositeItemWriter">
<property name="delegates">
<list>
<ref bean="fileWriter"/>
<ref bean="dbWriter"/>
</list>
</property>
</bean>
<batch:job id="restartOnConditionBasisJob"
job-repository="jobRepository" restartable="false"> <!-- (7) -->
<batch:step id="restartOnConditionBasisJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader" processor="amountUpdateItemProcessor"
writer="compositeWriter" commit-interval="10" />
</batch:tasklet>
</batch:step>
</batch:job>
# (8)
java -cp dependency/* org.springframework.batch.core.launch.support.CommandLineJobRunner <jobPath> <jobName> <jobParameters> ...
Sr.No. | Description |
---|---|
(1) |
Define SQL so that the processed column has only NULL data. |
(2) |
Define SQL to update the processed column with a non-NULL value. |
(3) |
For ItemReader, set the SQLID defined in (1). |
(4) |
For updating to the database, set the SQLID defined in (2). |
(5) |
At restart, allow appending to the file in order to make it possible to write from the last interruption point. |
(6) |
Set CompositeItemWriter, which performs the file output and the database update in order, as the writer. |
(7) |
It is not mandatory, but set the restartable attribute of the job to false. |
(8) |
Execute again according to the execution condition of the failed job. |
About the job’s restartable attribute
If |
7. Job Management
7.1. Overview
Explain how to manage job execution.
This function is the same usage for chunk model and tasklet model.
7.1.1. What is Job Execution Management?
It means to record the activation state and execution result of the job and maintain the batch system. In particular, it is important to secure necessary information in order to detect when an abnormality has occurred and determine what action should be taken next (such as rerun / restart after abnormal termination). Due to the characteristics of the batch application, it is rare that the result can be confirmed on the user interface immediately after startup. Therefore, it is necessary to have a mechanism to record execution status and results separately from job execution, such as job scheduler / RDBMS / application log.
7.1.1.1. Functions Offered by Spring Batch
Spring Batch provides the following interface for job execution management.
Function | Corresponding interface |
---|---|
Record job execution status/result |
JobRepository |
Convert job exit code and process exit code |
ExitCodeMapper |
Spring Batch uses JobRepository
for recording the job’s activation status and execution result.
For TERASOLUNA Batch 5.x, if all of the following are true, persistence is optional:
-
Using TERASOLUNA Batch 5.x only for synchronous job execution.
-
Managing all job execution with the job scheduler including job stop/restart.
-
Especially not using restart assuming
JobRepository
of Spring Batch.
-
When these are applicable, use H2
which is an in-memory/built-in database as an option of RDBMS used by JobRepository
.
On the other hand, when using asynchronous execution or when using stop/restart of Spring Batch, an RDBMS that can persist the job execution status/result is required.
Default transaction isolation level
In xsd provided by Spring Batch, the transaction isolation level of |
In-memory JobRepository Options
Spring Batch has |
For the job execution management using job scheduler, refer to the manual of each product.
In this guideline,
explain the following items related to managing the job status using JobRepository
within TERASOLUNA Batch 5.x.
-
-
How to persist state
-
How to check the status
-
How to stop the job manually
-
7.2. How to use
JobRepository
provided by Spring Batch registers/updates the job status/execution result in RDBMS automatically.
When confirming them, select one of the following methods so that unintended change processing is not performed from inside or outside the batch application.
-
Query the table relating to Job Status Management
-
Use
org.springframework.batch.core.explore.JobExplorer
7.2.1. Job Status Management
Explain job status management method using JobRepository
.
By Spring Batch, the following Entities are registered in the RDBMS table.
Sr. No. | Entity class | Table name | Generation unit | Description |
---|---|---|---|---|
(1) |
JobExecution |
BATCH_JOB_EXECUTION |
Execution of one job |
Maintain job status/execution result. |
(2) |
JobExecutionContext |
BATCH_JOB_EXECUTION_CONTEXT |
Execution of one job |
Maintain the context inside the job. |
(3) |
JobExecutionParams |
BATCH_JOB_EXECUTION_PARAMS |
Execution of one job |
Hold job parameters given at startup. |
(4) |
StepExecution |
BATCH_STEP_EXECUTION |
Execution of one step |
Maintain the state/execution result of the step, commit/rollback number. |
(5) |
StepExecutionContext |
BATCH_STEP_EXECUTION_CONTEXT |
Execution of one step |
Maintain the context inside the step. |
(6) |
JobInstance |
BATCH_JOB_INSTANCE |
Combination of job name and job parameter |
Hold job name and string serialized job parameter. |
For example, when three steps are executed in one job execution, the following difference occurs.
-
JobExecution, JobExecutionContext and JobExecutionParams each register 1 record.
-
StepExecution and StepExecutionContext each register 3 records.
Also, JobInstance is used to suppress double execution of a job started in the past with the same job name and the same parameters, but TERASOLUNA Batch 5.x does not perform this check. For details, refer to Double Activation Prevention.
The structure of each table by |
About the item count of StepExecution in the chunk method
As shown below, it seems that inconsistency is occurring, but there are cases where it is reasonable from the specification.
|
7.2.1.1. Status Persistence
By using external RDBMS, job execution management information by JobRepository
can be made persistent.
To enable this, change the following items in batch-application.properties to be data sources, schema settings for external RDBMS.
# (1)
# Admin DataSource settings.
admin.jdbc.driver=org.postgresql.Driver
admin.jdbc.url=jdbc:postgresql://serverhost:5432/admin
admin.jdbc.username=postgres
admin.jdbc.password=postgres
# (2)
spring-batch.schema.script=classpath:org/springframework/batch/core/schema-postgresql.sql
Sr. No. | Description |
---|---|
(1) |
Describe the setting of the external RDBMS to be connected as the value of the properties to which the prefix admin is given. |
(2) |
Specify the script file that automatically generates the schema as the value of spring-batch.schema.script. |
Supplementary to administrative/business data sources
|
7.2.1.2. Confirmation of job status/execution result
Explain how to check the job execution status from JobRepository.
In either method, it is assumed that the job execution ID to be checked is known in advance.
7.2.1.2.1. Query directly
Using the RDBMS console, query directly on the table persisted by JobRepository
.
admin=# select JOB_EXECUTION_ID, START_TIME, END_TIME, STATUS, EXIT_CODE from BATCH_JOB_EXECUTION where JOB_EXECUTION_ID = 1;
job_execution_id | start_time | end_time | status | exit_code
------------------+-------------------------+-------------------------+-----------+-----------
1 | 2017-02-14 17:57:38.486 | 2017-02-14 18:19:45.421 | COMPLETED | COMPLETED
(1 row)
admin=# select JOB_EXECUTION_ID, STEP_EXECUTION_ID, START_TIME, END_TIME, STATUS, EXIT_CODE from BATCH_STEP_EXECUTION where JOB_EXECUTION_ID = 1;
job_execution_id | step_execution_id | start_time | end_time | status | exit_code
------------------+-------------------+-------------------------+------------------------+-----------+-----------
1 | 1 | 2017-02-14 17:57:38.524 | 2017-02-14 18:19:45.41 | COMPLETED | COMPLETED
(1 row)
7.2.1.2.2. Use JobExplorer
Under sharing the application context of the batch application, JobExplorer
enables to confirm job execution status by injecting it.
// omitted.
@Inject
private JobExplorer jobExplorer;
private void monitor(long jobExecutionId) {
// (1)
JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
// (2)
String jobName = jobExecution.getJobInstance().getJobName();
Date jobStartTime = jobExecution.getStartTime();
Date jobEndTime = jobExecution.getEndTime();
BatchStatus jobBatchStatus = jobExecution.getStatus();
String jobExitCode = jobExecution.getExitStatus().getExitCode();
// omitted.
// (3)
jobExecution.getStepExecutions().forEach( s -> {
String stepName = s.getStepName();
Date stepStartTime = s.getStartTime();
Date stepEndTime = s.getEndTime();
BatchStatus stepStatus = s.getStatus();
String stepExitCode = s.getExitStatus().getExitCode();
// omitted.
});
}
Sr. No. | Description |
---|---|
(1) |
Specify the job execution ID to the injected JobExplorer and get the JobExecution. |
(2) |
Get the job execution result from the JobExecution. |
(3) |
Get a collection of the steps executed within the job from the JobExecution. |
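In addition, when you want to list the executions of a given job that are still running (for example, before deciding whether to stop or restart), JobExplorer also offers lookup by job name. The following is a minimal sketch under that assumption; it is not the guideline's own example, and the job name "job01" is only an illustrative value.
// package and import statements omitted.
@Inject
private JobExplorer jobExplorer;

private static final Logger logger =
        LoggerFactory.getLogger(RunningJobChecker.class); // class name is hypothetical

private void listRunningExecutions() {
    // Get the executions of "job01" that are currently running.
    Set<JobExecution> runningExecutions =
            jobExplorer.findRunningJobExecutions("job01");
    runningExecutions.forEach(e ->
            logger.info("running. [jobExecutionId:{}] [startTime:{}]",
                    e.getId(), e.getStartTime()));
}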
7.2.1.3. Stopping a Job
"Stopping a job" is a function that updates the running status of JobRepository
to a stopping status
and stops jobs at the boundary of steps or at chunk commit by chunk method.
Combined with restart, processing from the stopped position can be restarted.
For details of the restart, refer to Job Restart. |
"Stopping a job" is not a function to immediately stop a job in progress but to update the running status of Therefore, it can be said that stopping the job is "to reserve to stop when a processing that becomes a milestone is completed, such as a chunk break". For example, even if you stop the job under the following circumstances, it will not be the expected behavior.
|
Explain how to stop the job below.
-
Stop from command line
-
Available for both synchronous and asynchronous jobs
-
Use
-stop
option ofCommandLineJobRunner
-
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
classpath:/META-INF/jobs/job01/job01.xml job01 -stop
-
Stopping a job by its name specification is suitable for synchronous batch execution when jobs with the same name rarely start in parallel.
$ java org.springframework.batch.core.launch.support.CommandLineJobRunner \
classpath:/META-INF/jobs/job01/job01.xml 3 -stop
-
Stopping a job by JobExecutionId specification is suitable for asynchronous batch execution when jobs with the same name often start in parallel.
|
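For jobs started via JobOperator (as in asynchronous execution with a Web container), a stop can also be requested programmatically by job execution ID. The following is a minimal sketch under that assumption; it is not the guideline's own example, and the method name is hypothetical.
// package and import statements omitted.
@Inject
private JobOperator jobOperator;

public boolean stopJob(long jobExecutionId) throws Exception {
    // Requests the running job execution to stop; the job actually stops
    // at the next milestone such as a chunk boundary or step boundary.
    return jobOperator.stop(jobExecutionId);
}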
7.2.2. Customizing Exit Codes
When the job is terminated by synchronous execution, the exit code of the java process can be customized according to the end status of the job. To customize the exit code, the following two operations are required.
-
Change the exit code of the job, which indicates the end status of the job.
-
Map the exit code (character string) of the job/step and the process exit code (numerical value).
Explain the following in order.
7.2.2.1. Change the exit code of the job.
The exit code of the job returned as a string can be changed.
-
Implement the
afterStep
method to return a specific exit status at the end of the step.-
Implement
StepExecutionListener
-
@Component
public class ExitStatusChangeListener implements StepExecutionListener {
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
ExitStatus exitStatus = stepExecution.getExitStatus();
if (conditionalCheck(stepExecution)) {
// (1)
exitStatus = new ExitStatus("CUSTOM STEP FAILED");
}
return exitStatus;
}
private boolean conditionalCheck(StepExecution stepExecution) {
// omitted.
}
}
<batch:step id="exitstatusjob.step">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="reader" writer="writer" commit-interval="10" />
</batch:tasklet>
<batch:listeners>
<batch:listener ref="exitStatusChangeListener"/>
</batch:listeners>
</batch:step>
Sr. No. | Description |
---|---|
(1) |
Set custom exit code according to the execution result of step. |
-
Reflect the exit code returned by the step at the end of the job as the final job exit code.
-
Implement
afterJob
method in implementation class ofJobExecutionListener
.
-
@Component
public class JobExitCodeChangeListener extends JobExecutionListenerSupport {
@Override
public void afterJob(JobExecution jobExecution) {
// (1)
if (jobExecution.getStepExecutions().stream()
.anyMatch(s -> "CUSTOM STEP FAILED".equals(s.getExitStatus().getExitCode()))) {
jobExecution.setExitStatus(new ExitStatus("CUSTOM FAILED"));
}
}
}
<batch:job id="exitstatusjob" job-repository="jobRepository">
<batch:step id="exitstatusjob.step">
<!-- omitted -->
</batch:step>
<batch:listeners>
<batch:listener ref="jobExitCodeChangeListener"/>
</batch:listeners>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
Set the final job exit code to |
7.2.2.2. Define mapping of exit codes additionally.
-
Define the mapping between the exit code of the job and the process exit code.
<!-- exitCodeMapper -->
<bean id="exitCodeMapper"
class="org.springframework.batch.core.launch.support.SimpleJvmExitCodeMapper">
<property name="mapping">
<util:map id="exitCodeMapper" key-type="java.lang.String"
value-type="java.lang.Integer">
<!-- ExitStatus -->
<entry key="NOOP" value="0" />
<entry key="COMPLETED" value="0" />
<entry key="STOPPED" value="255" />
<entry key="FAILED" value="255" />
<entry key="UNKNOWN" value="255" />
<entry key="CUSTOM FAILED" value="100" /> <!-- Custom Exit Status -->
</util:map>
</property>
</bean>
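The following is a minimal sketch showing how such a mapping converts the exit code string of a job into a process exit code, using SimpleJvmExitCodeMapper directly. It is for illustration only; in an actual application, define the mapping in the Bean definition as shown above.
// package and imports are omitted.
public class ExitCodeMappingCheck {
    public static void main(String[] args) {
        // Equivalent to the <util:map> definition above, built programmatically.
        Map<String, Integer> mapping = new HashMap<>();
        mapping.put("COMPLETED", 0);
        mapping.put("FAILED", 255);
        mapping.put("CUSTOM FAILED", 100); // custom exit status set by the listener
        SimpleJvmExitCodeMapper exitCodeMapper = new SimpleJvmExitCodeMapper();
        exitCodeMapper.setMapping(mapping);
        System.out.println(exitCodeMapper.intValue("COMPLETED"));     // 0
        System.out.println(exitCodeMapper.intValue("CUSTOM FAILED")); // 100
        // An exit code that is not mapped falls back to a generic error value.
        System.out.println(exitCodeMapper.intValue("UNDEFINED"));
    }
}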
Process exit code 1 is prohibited
Generally, when a Java process is forcibly terminated due to a VM crash or SIGKILL signal reception, the process may return 1 as the exit status. Since it should be clearly distinguished from the end state of a batch application regardless of whether it is normal or abnormal, do not define 1 as a process exit code within an application. |
About the difference between status and exit code
There are "status (
|
7.2.3. Double Activation Prevention
In Spring Batch, when running a job,
it is confirmed whether a JobInstance (BATCH_JOB_INSTANCE table) with the following combination
already exists in the JobRepository.
-
Job name to be activated
-
Job parameters
TERASOLUNA Batch 5.x makes it possible to activate multiple times even if the combinations of job and job parameters match.
That is, it allows double activation.
For details, refer to Job Activation Parameter.
In order to prevent double activation, it is necessary to implement control in the job scheduler or in the application.
Detailed means are strongly dependent on job scheduler products and business requirements, so they are omitted here.
Consider whether it is necessary to suppress double start for each job.
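As a reference only, the following is a minimal sketch of an application-side check that rejects a start when another execution of the same job is already running. It is implemented as a JobExecutionListener that queries JobExplorer; the class name and exception are illustrative, it presumes a persistent JobRepository shared by all processes, and it does not give a strict guarantee when two jobs start at almost the same time.
// package and imports are omitted.
@Component
public class DuplicateJobExecutionCheckListener implements JobExecutionListener {
    @Inject
    JobExplorer jobExplorer;
    @Override
    public void beforeJob(JobExecution jobExecution) {
        String jobName = jobExecution.getJobInstance().getJobName();
        Set<JobExecution> runningExecutions = jobExplorer.findRunningJobExecutions(jobName);
        // The current execution itself is included in the result,
        // so more than one entry means another execution is already running.
        if (runningExecutions.size() > 1) {
            throw new IllegalStateException(
                    "Job is already running. [JobName:" + jobName + "]");
        }
    }
    @Override
    public void afterJob(JobExecution jobExecution) {
        // do nothing.
    }
}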
7.2.4. Logging
Explain log setting method.
Log output, settings and considerations are in common with TERASOLUNA Server 5.x. At first, refer to Logging.
Explain specific considerations of TERASOLUNA Batch 5.x here.
7.2.4.1. Clarification of log output source
It is necessary to be able to clearly specify the output source job and job execution at the time of batch execution. Therefore, it is good to output the thread name, the job name and the job execution ID. Especially at asynchronous execution, since jobs with the same name will operate in parallel with different threads, recording only the job name may make it difficult to specify the log output source.
Each element can be realized in the following way.
- Thread name
-
Specify
%thread
which is the output pattern oflogback.xml
- Job name / Job Execution ID
-
Create a component implementing
JobExecutionListener
and record it at the start and end of the job
// package and import omitted.
@Component
public class JobExecutionLoggingListener implements JobExecutionListener {
private static final Logger logger =
LoggerFactory.getLogger(JobExecutionLoggingListener.class);
@Override
public void beforeJob(JobExecution jobExecution) {
// (1)
logger.info("job started. [JobName:{}][jobExecutionId:{}]",
jobExecution.getJobInstance().getJobName(), jobExecution.getId());
}
@Override
public void afterJob(JobExecution jobExecution) {
// (2)
logger.info("job finished.[JobName:{}][jobExecutionId:{}][ExitStatus:{}]"
, jobExecution.getJobInstance().getJobName(),
, jobExecution.getId(), jobExecution.getExitStatus().getExitCode());
}
}
<!-- omitted. -->
<batch:job id="loggingJob" job-repository="jobRepository">
<batch:step id="loggingJob.step01">
<batch:tasklet transaction-manager="jobTransactionManager">
<!-- omitted. -->
</batch:tasklet>
</batch:step>
<batch:listeners>
<!-- (3) -->
<batch:listener ref="jobExecutionLoggingListener"/>
</batch:listeners>
</batch:job>
<!-- omitted. -->
Sr. No. | Description |
---|---|
(1) |
Before starting the job, the job name and job execution ID are output to the INFO log. |
(2) |
When the job ends, an exit code is also output in addition to (1). |
(3) |
Associate JobExecutionLoggingListener with the target job as a listener. |
7.2.4.2. Log Monitoring
In the batch application, the log is the main user interface of the operation. Unless the monitoring target and the actions at the time of occurrence are clearly designed, filtering becomes difficult and there is a danger that logs necessary for action are buried. For this reason, it is advisable to determine in advance a message or code system to be a keyword to be monitored for logs. For message management to be output to the log, refer to Message Management below.
7.2.4.3. Log Output Destination
For log output destinations in batch applications, it is good to design in which units logs are distributed / aggregated. For example, even when logs are output to a flat file, multiple patterns are considered as follows.
-
Output to 1 file per 1 job
-
Output to 1 file in units of multiple jobs grouped together
-
Output to 1 file per server
-
Output logs of multiple servers to 1 file
In each case, which unit is best to group by is decided depending on the total number of jobs, the total amount of logs, and the I/O rate generated in the target system. It also depends on how logs are checked: the options will change depending on the usage, such as whether logs are referred to mainly from the job scheduler or mainly from the console.
The important thing is to carefully examine the log output in operational design and to verify the usefulness of the log in the test.
7.2.5. Message Management
Explain message management.
In order to prevent variations in the code system and to facilitate designing extraction as a keyword to be monitored, it is desirable to give messages according to certain rules.
As with logging, message management is basically the same as TERASOLUNA Server 5.x.
About utilization of MessageSource
launch-context.xml
|
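As an example of giving messages according to certain rules, the following is a minimal sketch that resolves a message ID through MessageSource and outputs it to the log. The message ID i.xx.xx.0001 and the class name are illustrative assumptions; design the actual ID system together with the log monitoring keywords described above.
// package and imports are omitted.
@Component
public class MessageLoggingTasklet implements Tasklet {
    private static final Logger logger =
            LoggerFactory.getLogger(MessageLoggingTasklet.class);
    @Inject
    MessageSource messageSource;
    @Override
    public RepeatStatus execute(StepContribution contribution,
            ChunkContext chunkContext) throws Exception {
        // e.g. i.xx.xx.0001=Processing has started. [{0}]
        logger.info(messageSource.getMessage("i.xx.xx.0001",
                new Object[] { "exampleJob" }, Locale.getDefault()));
        return RepeatStatus.FINISHED;
    }
}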
7.3. Appendix. Spring Batch Admin
Spring Batch Admin is a subproject of Spring Batch that makes it possible to check the job execution status through a Web interface. It is introduced here because the execution status can be checked easily regardless of whether the environment is for testing or production.
Spring Batch Admin is distributed as a sample application. Here, Apache Tomcat is used as the web container and the war file is deployed to it.
Spring Batch Admin can not only check the execution status and results of jobs but also start and stop jobs. In that case, the job must be included in the same war file, which creates a strong restriction that the job must be executed on the web container. Since this is not necessary as long as only the execution status and results are checked, it is introduced here only as a way of referring to them. |
The installation procedure is as follows.
-
Download 1.3.1.RELEASE zip file from Release Distribution Site, and unzip it anywhere.
-
Create an external RDBMS-defined property file
batch-RDBMSNAME.properties
.-
Place it in
spring-batch-admin-1.3.1.RELEASE/spring-batch-admin-sample/src/main/resources
.
-
# Placeholders batch.*
# for PostgreSQL:
# (1)
batch.jdbc.driver=org.postgresql.Driver
batch.jdbc.url=jdbc:postgresql://localhost:5432/admin
batch.jdbc.user=postgres
batch.jdbc.password=postgres
batch.jdbc.testWhileIdle=true
batch.jdbc.validationQuery=SELECT 1
# (2)
batch.schema.script=classpath:/org/springframework/batch/core/schema-postgresql.sql
batch.drop.script=classpath*:/org/springframework/batch/core/schema-drop-postgresql.sql
batch.business.schema.script=classpath:/business-schema-postgresql.sql
batch.database.incrementer.class=org.springframework.jdbc.support.incrementer.PostgreSQLSequenceMaxValueIncrementer
# Non-platform dependent settings that you might like to change
# (3)
batch.data.source.init=false
Sr. No. | Description |
---|---|
(1) |
Describe the JDBC driver setting of the connection destination RDBMS. |
(2) |
Describe the schema scripts and the sequence incrementer class for the connection destination RDBMS. |
(3) |
Be sure to specify false so that the database is not initialized at startup. |
-
Add the JDBC driver dependency library to pom.xml under
spring-batch-admin-1.3.1.RELEASE/spring-batch-admin-sample/
<project>
<!-- omitted. -->
<dependencies>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>9.4.1212.jre7</version>
<scope>runtime</scope>
</dependency>
</dependencies>
<!-- omitted. -->
</project>
-
Create a war file with
mvn clean package
command. -
Set the external RDBMS name
-DENVIRONMENT=postgresql
to the environment variableJAVA_OPTS
and start Tomcat.
$ export JAVA_OPTS="$JAVA_OPTS -DENVIRONMENT=postgresql"
$ echo $JAVA_OPTS
-DENVIRONMENT=postgresql
$ TOMCAT_HOME/bin/catalina.sh run
-
Deploy
target/spring-batch-admin-1.3.1.war
to Tomcat. -
In the browser, open
http://tomcathost:port/spring-batch-admin-sample-1.3.1.RELEASE/
and select Jobs.
-
Select the job name of the execution status/result acquisition target.
-
Display the execution status and result of the target job.
8. Flow control and parallel, multiple processing
8.1. Flow control
8.1.1. Overview
It is a method of implementing a single business process by splitting it into multiple jobs and combining them, instead of integrating everything into one job. The definition of the dependency relationships between jobs is called a job net.
The advantages of defining a job net are enumerated below.
-
It is easier to visualize progress status of a process
-
It is possible to do partial re-execution, pending execution and stop execution of jobs
-
It is easier to do parallel execution of jobs
When designing a batch process, it is common to design the job net and the jobs together.
Suitability of processing contents and job net
Job nets are often not suitable for simple business processes that do not need to be split, or for processes that integrate with online processing. |
In this guideline, controlling the flow of jobs in a job net is called flow control. In the processing flow, the previous job is called the preceding job and the next job is called the succeeding job. The dependency relationship between the preceding job and the succeeding job is called the preceding and succeeding relationship.
The conceptual diagram of flow control is shown below.
As shown above, flow control can be implemented by both the job scheduler and TERASOLUNA Batch 5.x. However, it is desirable to use the job scheduler as much as possible due to the following reasons.
-
One job tends to accumulate diverse processing and statuses, which easily turns it into a black box.
-
The boundary between the job scheduler and the job becomes ambiguous
-
It becomes difficult to see the situation at the time of error from the job scheduler
However, it is generally known that there are following disadvantages when the number of jobs defined in the job scheduler increases.
-
The cost mentioned below increases and the processing time of the entire system increases due to the job scheduler
-
Job scheduler product specific communication, control of execution node, etc.
-
Overhead cost associated with Java process start for each job
-
-
Number of job registrations limit
The policy is as follows.
-
Basically, flow control is performed by the job scheduler.
-
Following measures are taken only when any damage is caused due to the large number of jobs.
-
Multiple sequential processes are consolidated in one job in TERASOLUNA Batch 5.x.
-
Only simple preceding and succeeding relationships are consolidated into one job.
-
Functionally, it is possible to change the step exit code and to conditionally branch which succeeding step is executed based on it. However, because job execution management becomes complicated, this is used in principle only for determining the process exit code at the end of the job.
-
-
Refer to Customization of exit code for the details of deciding job exit code. |
The points to be considered for implementing preceding and succeeding process are shown below.
-
Job scheduler starts java process through shell.
-
One job should correspond to one java process.
-
In the entire process, 4 java processes start.
-
-
The job scheduler controls the start order of each process. Each java process is independent.
-
The process exit code of the preceding job is used for deciding the start of the succeeding job.
-
External resources such as files, DB etc. should be used to pass the data between jobs.
-
Job scheduler starts java process through shell.
-
One job should be one java process.
-
In the entire process, only one java process is used.
-
-
Start order of each step is controlled by one java process. Each step is independent.
-
The exit status of the preceding step is used for deciding whether to start the succeeding step.
-
Data can be passed between steps in-memory.
How to implement flow control by TERASOLUNA Batch 5.x is explained below.
The flow control of job scheduler is strongly dependent on the product specifications so it is not explained here.
Application example of flow control
In general, parallel/multiple processes of multiple jobs is often implemented by job scheduler and job net. |
The usage method of this function is same in the chunk model as well as tasklet model.
8.1.2. How to use
How to use flow control in TERASOLUNA Batch 5.x is explained.
8.1.2.1. Sequential flow
Sequential flow is a flow that links preceding and succeeding steps in series.
If any business process ends abnormally in a step of the sequential flow, the succeeding step is not executed and the job is interrupted.
In this case, the step and job status and exit code associated with the job execution ID are
recorded as FAILED
by JobRepository
.
By restarting after recovering the cause of failure, it is possible to resume the process from the abnormally ended step.
Refer to Job restart for how to restart a job. |
Set sequential flow of the job consisting of 3 steps.
<bean id="sequentialFlowTasklet"
class="org.terasoluna.batch.functionaltest.ch08.flowcontrol.SequentialFlowTasklet"
p:failExecutionStep="#{jobParameters[failExecutionStep]}" scope="step"/>
<batch:step id="parentStep">
<batch:tasklet ref="sequentialFlowTasklet"
transaction-manager="jobTransactionManager"/>
</batch:step>
<batch:job id="jobSequentialFlow" job-repository="jobRepository">
<batch:step id="jobSequentialFlow.step1"
next="jobSequentialFlow.step2" parent="parentStep"/> <!-- (1) -->
<batch:step id="jobSequentialFlow.step2"
next="jobSequentialFlow.step3" parent="parentStep"/> <!-- (1) -->
<batch:step id="jobSequentialFlow.step3" parent="parentStep"/> <!-- (2) -->
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
Specify, in the next attribute, the step to be started next after this step ends normally. |
(2) |
Since this is the last step, the next attribute is not specified. |
As a result, steps are started in series in the following order.
jobSequentialFlow.step1
→ jobSequentialFlow.step2
→ jobSequentialFlow.step3
How to define using <batch:flow>
In the above example, the flow is directly defined in
|
8.1.2.2. Passing data between steps
In Spring Batch, ExecutionContext
of execution context that can be used in the scope of each step and job is provided.
By using the execution context, data can be shared between the components in the step.
At this time, since the execution context of the step cannot be shared between steps, the execution context of the preceding step cannot be referred from the succeeding step.
It can be implemented if the execution context of the job is used, but since it can be referenced from all steps, it needs to be handled carefully.
When the information between the steps needs to be inherited, it can be done by the following procedure.
-
In the post-processing of the preceding step, the information stored in the execution context of the step scope is passed to the execution context of the job scope.
-
The succeeding step gets information from the execution context of the job scope.
By using ExecutionContextPromotionListener
provided by Spring Batch,
the first step of the procedure can be realized simply by specifying the information to be passed on to the listener, without implementing it yourself.
Notes on using ExecutionContext
Also, it is possible to exchange information by sharing a singleton or job-scoped Bean rather than going through the execution context. Note that the larger the size of the shared data, the more pressure is placed on memory resources. |
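A minimal sketch of the alternative mentioned in the note, sharing data between steps through a singleton Bean instead of the execution context, is shown below. The class and method names are illustrative assumptions; clear the held data, for example at the end of the job, so that it does not keep occupying memory.
// package and imports are omitted.
@Component
public class InterStepDataHolder {
    private final ConcurrentMap<String, Object> data = new ConcurrentHashMap<>();
    public void put(String key, Object value) {
        data.put(key, value);
    }
    public Object get(String key) {
        return data.get(key);
    }
    // Clear the held data so that it does not leak into another
    // job execution running later in the same process.
    public void clear() {
        data.clear();
    }
}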
The data passed between steps is explained for the tasklet model and the chunk model respectively below.
8.1.2.2.1. Data passing between steps using tasklet model
In order to save and fetch passing data, get ExecutionContext
from ChunkContext
and pass the data between the steps.
// package, imports are omitted.
@Component
public class SavePromotionalTasklet implements Tasklet {
// omitted.
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
// (1)
chunkContext.getStepContext().getStepExecution().getExecutionContext()
.put("promotion", "value1");
// omitted.
return RepeatStatus.FINISHED;
}
}
// package and imports are omitted.
@Component
public class ConfirmPromotionalTasklet implements Tasklet {
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) {
// (2)
Object promotion = chunkContext.getStepContext().getJobExecutionContext()
.get("promotion");
// omitted.
return RepeatStatus.FINISHED;
}
}
<!-- import,annotation,component-scan definitions are omitted -->
<batch:job id="jobPromotionalFlow" job-repository="jobRepository">
<batch:step id="jobPromotionalFlow.step1" next="jobPromotionalFlow.step2">
<batch:tasklet ref="savePromotionalTasklet"
transaction-manager="jobTransactionManager"/>
<batch:listeners>
<batch:listener>
<!-- (3) -->
<bean class="org.springframework.batch.core.listener.ExecutionContextPromotionListener"
p:keys="promotion"
p:strict="true"/>
</batch:listener>
</batch:listeners>
</batch:step>
<batch:step id="jobPromotionalFlow.step2">
<batch:tasklet ref="confirmPromotionalTasklet"
transaction-manager="jobTransactionManager"/>
</batch:step>
</batch:job>
<!-- omitted -->
Sr. No. | Explanation |
---|---|
(1) |
Set the value to be passed to the succeeding step in the ExecutionContext of the step execution. |
(2) |
Get the passing data set in (1) of the preceding step from the ExecutionContext of the job execution. |
(3) |
Using ExecutionContextPromotionListener, promote the passing data from the step execution context to the job execution context. Specify the key of the data to be promoted in the keys property. |
Regarding ExecutionContextPromotionListener and step exit code
|
8.1.2.2.2. Data passing between steps using the chunk model
Use the methods annotated with @AfterStep and @BeforeStep in ItemProcessor.
The listener to be used for data passing and how to use ExecutionContext
is the same as the tasklet model.
// package and imports are omitted.
@Component
@Scope("step")
public class PromotionSourceItemProcessor implements ItemProcessor<String, String> {
@Override
public String process(String item) {
// omitted.
}
@AfterStep
public ExitStatus afterStep(StepExecution stepExecution) {
// (1)
ExecutionContext jobContext = stepExecution.getExecutionContext();
// (2)
jobContext.put("promotion", "value2");
return null;
}
}
// package and imports are omitted.
@Component
@Scope("step")
public class PromotionTargetItemProcessor implements ItemProcessor<String, String> {
@Override
public String process(String item) {
// omitted.
}
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
// (3)
ExecutionContext jobContext = stepExecution.getJobExecution()
.getExecutionContext();
// omitted.
}
}
<!-- import,annotation,component-scan definitions are omitted -->
<batch:job id="jobChunkPromotionalFlow" job-repository="jobRepository">
<batch:step id="jobChunkPromotionalFlow.step1" parent="sourceStep"
next="jobChunkPromotionalFlow.step2">
<batch:listeners>
<batch:listener>
<!-- (4) -->
<bean class="org.springframework.batch.core.listener.ExecutionContextPromotionListener"
p:keys="promotion"
p:strict="true" />
</batch:listener>
</batch:listeners>
</batch:step>
<batch:step id="jobChunkPromotionalFlow.step2" parent="targetStep"/>
</batch:job>
<!-- step definitions are omitted. -->
Sr. No. | Explanation |
---|---|
(1) |
Set the value to be passed to the succeeding step to |
(2) |
Get the passing data set in (1) of the preceding step using |
(3) |
Using |
8.1.3. How to extend
Here, conditional branching to a succeeding step and the conditions to stop the job before executing the succeeding step are explained.
Difference between exit code and status of job and step.
In the following explanation, the terms "Status" and "Exit code" frequently appear. |
8.1.3.1. Conditional branching
Conditional branching means receiving the exit code that is the execution result of the preceding step, selecting one of multiple succeeding steps, and continuing execution.
To stop the job without executing any succeeding step, refer to Stop condition.
<batch:job id="jobConditionalFlow" job-repository="jobRepository">
<batch:step id="jobConditionalFlow.stepA" parent="conditionalFlow.parentStep">
<!-- (1) -->
<batch:next on="COMPLETED" to="jobConditionalFlow.stepB" />
<batch:next on="FAILED" to="jobConditionalFlow.stepC"/>
</batch:step>
<!-- (2) -->
<batch:step id="jobConditionalFlow.stepB" parent="conditionalFlow.parentStep"/>
<!-- (3) -->
<batch:step id="jobConditionalFlow.stepC" parent="conditionalFlow.parentStep"/>
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
Do not specify the next attribute; instead, specify the transition rules with nested <batch:next> elements using the on and to attributes. |
(2) |
It is the succeeding step executed only when the step exit code of (1) is COMPLETED. |
(3) |
It is the succeeding step executed only when the step exit code of (1) is FAILED. |
Notes on recovery process by after steps
When a recovery process is performed in the succeeding step because the preceding step failed (exit code is FAILED), note the restart behavior: when the recovery process of the succeeding step fails, only the recovery process is re-executed on restarting the job. |
8.1.3.2. Stop condition
How to stop the job depending on the exit code of the preceding step is explained.
The following 3 elements can be specified as a means to stop.
-
end
-
fail
-
stop
If the exit code of the preceding step matches one of the specified values, the succeeding step is not executed.
Multiple exit codes can be specified within the same step.
<batch:job id="jobStopFlow" job-repository="jobRepository">
<batch:step id="jobStopFlow.step1" parent="stopFlow.parentStep">
<!-- (1) -->
<batch:end on="END_WITH_NO_EXIT_CODE"/>
<batch:end on="END_WITH_EXIT_CODE" exit-code="COMPLETED_CUSTOM"/>
<!-- (2) -->
<batch:next on="*" to="jobStopFlow.step2"/>
</batch:step>
<batch:step id="jobStopFlow.step2" parent="stopFlow.parentStep">
<!-- (3) -->
<batch:fail on="FORCE_FAIL_WITH_NO_EXIT_CODE"/>
<batch:fail on="FORCE_FAIL_WITH_EXIT_CODE" exit-code="FAILED_CUSTOM"/>
<!-- (2) -->
<batch:next on="*" to="jobStopFlow.step3"/>
</batch:step>
<batch:step id="jobStopFlow.step3" parent="stopFlow.parentStep">
<!-- (4) -->
<batch:stop on="FORCE_STOP" restart="jobStopFlow.step4" exit-code=""/>
<!-- (2) -->
<batch:next on="*" to="jobStopFlow.step4"/>
</batch:step>
<batch:step id="jobStopFlow.step4" parent="stopFlow.parentStep"/>
</batch:job>
Sr. No. | Explanation |
---|---|
(1) |
When the exit code of the step matches the on attribute of <batch:end>, the job ends normally with the status COMPLETED. When the exit-code attribute is specified, the exit code of the job is changed to the specified value. |
(2) |
By specifying a wildcard (*), the succeeding step is executed for all exit codes other than those specified in the other transition elements. |
(3) |
When the exit code of the step matches the on attribute of <batch:fail>, the job ends abnormally with the status FAILED. When the exit-code attribute is specified, the exit code of the job is changed to the specified value. |
(4) |
When the exit code of the step matches the on attribute of <batch:stop>, the job ends with the status STOPPED. The step specified in the restart attribute is executed when the stopped job is restarted. |
When customizing the exit code by the exit-code attribute, it should be mapped to the process exit code without omission.
Refer to Customization of exit code for details. |
An empty character string should be specified for exit-code in <batch:stop>.
The expected flow control is that the job stops when the step ends with the specified exit code, and the step specified in the restart attribute is executed when the job is restarted. However, this does not work as expected unless "" (an empty character string) is assigned to the exit-code attribute. Refer to Spring Batch/BATCH-2315 for the details of this issue. |
8.2. Parallel processing and multiple processing
8.2.1. Overview
Generally, a batch system with a severe batch window (the time available for batch processing) is designed
to reduce the overall processing time as much as possible by running multiple jobs in parallel (hereafter referred to as parallel processing).
However, the processing time may still not fit in the batch window because a single job processes a large amount of data.
In this case, a method to reduce processing time by dividing the processing data of a job and performing multiple processing (hereafter referred to as multiple processing) can be used.
Although parallel processing and multiple processing are sometimes treated as the same thing, they are defined here as follows.
- Parallel processing
-
Execute multiple different jobs at the same time.
- Multiple processing
-
Divide the processing target of 1 job and execute simultaneously.
A method to use job scheduler and a method to use TERASOLUNA Batch 5.x are used for both parallel processing and multiple processing.
Note that, parallel processing and multiple processing in TERASOLUNA Batch 5.x is established
on Flow control.
Implementation method | Parallel processing | Multiple processing |
---|---|---|
Job scheduler |
It is defined to enable execution of multiple different jobs without dependencies to run at the same time. |
It is defined to execute multiple identical jobs with different data ranges. Information for narrowing down the data to be processed is passed to each job as job arguments. |
TERASOLUNA Batch 5.x |
Parallel Step (Parallel processing) |
Partitioning Step (Multiple processing) |
- When job scheduler is used
-
Since 1 process is allocated to 1 job, it is activated by multiple processes. Hence, designing and implementing one job is not very difficult.
However, since multiple processes are started, the load on machine resources increases as the number of synchronous executions increases.
Hence, when the number of synchronous executions is 3 or 4, a job scheduler may be used.
Of course, this number is not absolute. We would like you to use it as a guide, as it depends on the execution environment and job implementation. - When TERASOLUNA Batch 5.x is used
-
Since each step is assigned to a thread, it is operated as 1 process with multiple threads. Hence, the difficulty level for design and implementation of 1 job is higher than while using a job scheduler.
However, since the processing is performed with multiple threads, the load on machine resources will not be as high as when the job scheduler is used, even when the number of synchronous executions increases. Hence, when the number of synchronous executions is large (5 or more), TERASOLUNA Batch 5.x may be used.
Of course, this number is not absolute. We would like you to use it as a guide, as it depends on the execution environment and system characteristics.
One of the parallel processing methods that can be executed in Spring Batch is
|
When data in a single database is updated by parallel processing or multiple processing, resource conflicts and deadlocks are likely to occur. Potential conflicts should be eliminated at the job design stage. Spring Batch also provides distributed processing over multiple processes and multiple machines as a function. However, since failure design becomes difficult, it should not be used with TERASOLUNA Batch 5.x. |
The usage method of this function is same in the chunk model as well as tasklet model.
8.2.1.1. Parallel processing and multiple processing by job scheduler
Parallel processing and multiple processing using a job scheduler is explained here.
For job registration and schedule setting, refer the manual of the job scheduler to be used.
8.2.1.1.1. Parallel processing of jobs using job scheduler
The processes to be executed in parallel are registered as jobs, and schedules are set so that each job starts at the same time. Each job can be registered as a separate process.
8.2.1.1.2. Multiple processing of jobs using job scheduler
Processes to be subjected to multiple processing are registered as jobs multiple times, and the extraction range of the target data is specified by parameters. Further, the schedule is set so that the respective jobs start at the same time. Although each job performs the same processing, the data ranges to be processed must be independent.
8.2.2. How to use
A method to perform parallel processing and multiple processing in TERASOLUNA Batch 5.x is explained.
8.2.2.1. Parallel Step (Parallel processing)
A method of Parallel Step (parallel processing) is explained.
Separate processes can be defined for each step and can be executed in parallel. A thread is allocated for each step.
How to define Parallel Step is shown below using schematic diagram of Parallel Step.
<!-- Task Executor -->
<!-- (1) -->
<task:executor id="parallelTaskExecutor" pool-size="10" queue-capacity="200"/>
<!-- Job Definition -->
<!-- (2) -->
<batch:job id="parallelStepJob" job-repository="jobRepository">
<batch:split id="parallelStepJob.split" task-executor="parallelTaskExecutor">
<batch:flow> <!-- (3) -->
<batch:step id="parallelStepJob.step.chunk.db">
<!-- (4) -->
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="fileReader" writer="databaseWriter"
commit-interval="100"/>
</batch:tasklet>
</batch:step>
</batch:flow>
<batch:flow> <!-- (3) -->
<batch:step id="parallelStepJob.step.tasklet.chunk">
<!-- (5) -->
<batch:tasklet transaction-manager="jobTransactionManager"
ref="chunkTransactionTasklet"/>
</batch:step>
</batch:flow>
<batch:flow> <!-- (3) -->
<batch:step id="parallelStepJob.step.tasklet.single">
<!-- (6) -->
<batch:tasklet transaction-manager="jobTransactionManager"
ref="singleTransactionTasklet"/>
</batch:step>
</batch:flow>
<batch:flow> <!-- (3) -->
<batch:step id="parallelStepJob.step.chunk.file">
<batch:tasklet transaction-manager="jobTransactionManager">
<!-- (7) -->
<batch:chunk reader="databaseReader" writer="fileWriter"
commit-interval="200"/>
</batch:tasklet>
</batch:step>
</batch:flow>
</batch:split>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
Define a thread pool to assign to each thread for parallel processing. |
(2) |
Define steps to be executed in parallel in |
(3) |
Define |
(4) |
Step 1 of schematic diagram: Define intermediate commit method processing of the chunk model. |
(5) |
Step 2 of schematic diagram: Define intermediate commit method processing of the tasklet model. |
(6) |
Step 3 of schematic diagram: Define batch commit method processing of the tasklet model. |
(7) |
Step 4 of schematic diagram: Define intermediate commit method processing for non-transactional resources of the chunk model. |
Cases wherein processing performance deteriorates due to parallel processing
In parallel processing, the same processing can be run in parallel by changing the data range, similar to multiple processing. In this case, the data range is assigned by parameters. At this time, when the amount of data to be processed by each process is not very large, footprints such as the resource amount and processing time occupied at execution become a disadvantage of parallel processing, and processing performance may, on the contrary, deteriorate. Examples of footprints
|
Further, steps for common processing can also be defined before and after the Parallel Step.
<batch:job id="parallelRegisterJob" job-repository="jobRepository">
<!-- (1) -->
<batch:step id="parallelRegisterJob.step.preprocess"
next="parallelRegisterJob.split">
<batch:tasklet transaction-manager="jobTransactionManager"
ref="deleteDetailTasklet" />
</batch:step>
<!--(2) -->
<batch:split id="parallelRegisterJob.split" task-executor="parallelTaskExecutor">
<batch:flow>
<batch:step id="parallelRegisterJob.step.plan">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="planReader" writer="planWriter"
commit-interval="1000" />
</batch:tasklet>
</batch:step>
</batch:flow>
<batch:flow>
<batch:step id="parallelRegisterJob.step.performance">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="performanceReader" writer="performanceWriter"
commit-interval="1000" />
</batch:tasklet>
</batch:step>
</batch:flow>
</batch:split>
</batch:job>
Sr. No. | Description |
---|---|
(1) |
Define a step to be processed as preprocessing. Specify the id set in (2) in the next attribute. |
(2) |
Define Parallel Step. |
8.2.2.2. Partitioning Step (Multiple processing)
A method of Partitioning Step (multiple processing) is explained.
Partitioning Step is divided into processing phases of Master step and Slave step.
-
In Master step,
Partitioner
generates a Partition Key
to specify the data range to be processed by each Slave step. The Partition Key
is stored in the step context. -
In Slave step,
Partition Key
assigned to itself is fetched from the step context, and the data to be processed is specified using it. The step defined for the specified data is then executed.
In the Partitioning Step, the processing data needs to be divided, and the number of partitions can be either variable or fixed.
- In case of a variable number
-
Divide by department or process for each file existing in specific directory
- In case of a fixed number
-
Process data by dividing overall data in fixed numbers
In Spring Batch, the fixed number is called grid-size,
and the Partitioner determines the data partitioning ranges so that they match the grid-size.
In the Partitioning Step, the number of partitions can be considerably larger than the number of threads. In that case, the partitions are executed repeatedly using the available threads, and partitions for which no thread is free wait until a thread becomes available.
Use case of Partitioning Step is shown below.
Use case | Master (Partitioner) | Slave | Number of partitionings |
---|---|---|---|
A case wherein transaction information is divided or multiple processing is performed from master information |
DB (Master information) |
DB (Transaction information) |
Variable |
A case wherein multiple processing is performed for 1 file from a list of files |
Multiple files |
Single file |
Variable |
A case wherein a large amount of data is divided into a fixed number of partitions and processed in multiple. Since recovery design other than re-running the whole job becomes difficult when a failure occurs, this is rarely used in actual operation. |
Specify the data range based on grid-size. |
DB (Transaction information) |
Fixed |
8.2.2.2.1. When number of partitionings are variable
A method wherein number of partitionings are made variable by Partitioning Step is explained.
Processing image is shown below.
How to implement is shown below using the processing image as an example.
<!-- (1) -->
<select id="findAll" resultType="org.terasoluna.batch.functionaltest.app.model.mst.Branch">
<![CDATA[
SELECT
branch_id AS branchId,
branch_name AS branchName,
branch_address AS branchAddrss,
branch_tel AS branchTel,
create_date AS createDate,
update_date AS updateDate
FROM
branch_mst
]]>
</select>
<!-- (2) -->
<select id="summarizeInvoice"
resultType="org.terasoluna.batch.functionaltest.app.model.performance.SalesPerformanceDetail">
<![CDATA[
SELECT
branchId, year, month, customerId, SUM(amount) AS amount
FROM (
SELECT
t2.charge_branch_id AS branchId,
date_part('year', t1.invoice_date) AS year,
date_part('month', t1.invoice_date) AS month,
t1.customer_id AS customerId,
t1.invoice_amount AS amount
FROM invoice t1
INNER JOIN customer_mst t2 ON t1.customer_id = t2.customer_id
WHERE
t2.charge_branch_id = #{branchId}
) t3
GROUP BY branchId, year, month, customerId
ORDER BY branchId ASC, year ASC, month ASC, customerId ASC
]]>
</select>
<!-- omitted -->
@Component
public class BranchPartitioner implements Partitioner {
@Inject
BranchRepository branchRepository; // (3)
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> map = new HashMap<>();
List<Branch> branches = branchRepository.findAll();
int index = 0;
for (Branch branch : branches) {
ExecutionContext context = new ExecutionContext();
context.putString("branchId", branch.getBranchId()); // (4)
map.put("partition" + index, context); // (5)
index++;
}
return map;
}
}
<!-- (6) -->
<task:executor id="parallelTaskExecutor"
pool-size="${thread.size}" queue-capacity="10"/>
<!-- (7) -->
<bean id="reader" class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
p:queryId="org.terasoluna.batch.functionaltest.app.repository.performance.InvoiceRepository.summarizeInvoice"
p:sqlSessionFactory-ref="jobSqlSessionFactory">
<property name="parameterValues">
<map>
<!-- (8) -->
<entry key="branchId" value="#{stepExecutionContext[branchId]}" />
</map>
</property>
</bean>
<!-- omitted -->
<batch:job id="multipleInvoiceSummarizeJob" job-repository="jobRepository">
<!-- (9) -->
<batch:step id="multipleInvoiceSummarizeJob.master">
<!-- (10) -->
<batch:partition partitioner="branchPartitioner"
step="multipleInvoiceSummarizeJob.slave">
<!-- (11) -->
<batch:handler grid-size="0" task-executor="parallelTaskExecutor" />
</batch:partition>
</batch:step>
</batch:job>
<!-- (12) -->
<batch:step id="multipleInvoiceSummarizeJob.slave">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader" writer="writer" commit-interval="10"/>
</batch:tasklet>
</batch:step>
Sr. No. | Description |
---|---|
(1) |
Define a SQL wherein processing target is fetched from master data. |
(2) |
Define a SQL wherein fetched values from master data are considered as search conditions. |
(3) |
Inject defined Repository(SQLMapper). |
(4) |
Store master value processed by 1 Slave step in the step context. |
(5) |
Store each Slave in Map so that it can fetch corresponding context. |
(6) |
Define a thread pool to assign to each thread of Slave step in multiple processing. |
(7) |
Define ItemReader for fetching data using master value. |
(8) |
Fetch master value set in (4) from step context and add to search conditions. |
(9) |
Define Master step. |
(10) |
Define processing to generate partitioning conditions of data. |
(11) |
Since grid-size is not used by this Partitioner, a dummy value is set. |
(12) |
Define Slave step. |
When multiple processing is performed for each file from the list of files, Partitioner
given below offered by Spring Batch can be used.
-
org.springframework.batch.core.partition.support.MultiResourcePartitioner
How to use MultiResourcePartitioner
is shown below.
<!-- (1) -->
<task:executor id="parallelTaskExecutor" pool-size="10" queue-capacity="200"/>
<!-- (2) -->
<bean id="reader"
class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"
p:resource="#{stepExecutionContext[fileName]}"> <!-- (3) -->
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper"
p:fieldSetMapper-ref="invoiceFieldSetMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"
p:names="invoiceNo,salesDate,productId,customerId,quant,price"/>
</property>
</bean>
</property>
</bean>
<!-- (4) -->
<bean id="patitioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
scope="step"
p:resources="file:#{jobParameters[basedir]}/input/invoice-*.csv"/> <!-- (5) -->
<!--(6) -->
<batch:job id="inspectPartitioninglStepFileJob" job-repository="jobRepository">
<batch:step id="inspectPartitioninglStepFileJob.step.master">
<batch:partition partitioner="patitioner"
step="inspectPartitioninglStepFileJob.step.slave">
<batch:handler grid-size="0" task-executor="parallelTaskExecutor"/>
</batch:partition>
</batch:step>
</batch:job>
<!-- (7) -->
<batch:step id="inspectPartitioninglStepFileJob.step.slave">
<batch:tasklet>
<batch:chunk reader="reader" writer="writer" commit-interval="20"/>
</batch:tasklet>
</batch:step>
Sr. No. | Description |
---|---|
(1) |
Define a thread pool to be assigned to each thread of Slave step in multiple processing. |
(2) |
Define ItemReader to read a single file. |
(3) |
In the resource property, set the file to be processed, which is fetched from the step execution context with the key fileName. |
(4) |
Define MultiResourcePartitioner. |
(5) |
Multiple files can be handled by using a pattern wherein * is used. |
(6) |
Define Master step. |
(7) |
Define Slave step. |
8.2.2.2.2. When number of partitionings are fixed
How to fix number of partitionings in Partitioning Step is explained.
Processing image diagram is shown below.
How to implement is shown below using the processing image as an example.
<!-- (1) -->
<select id="findByYearAndMonth"
resultType="org.terasoluna.batch.functionaltest.app.model.performance.SalesPerformanceSummary">
<![CDATA[
SELECT
branch_id AS branchId, year, month, amount
FROM
sales_performance_summary
WHERE
year = #{year} AND month = #{month}
ORDER BY
branch_id ASC
LIMIT
#{dataSize}
OFFSET
#{offset}
]]>
</select>
<!-- (2) -->
<select id="countByYearAndMonth" resultType="_int">
<![CDATA[
SELECT
count(*)
FROM
sales_performance_summary
WHERE
year = #{year} AND month = #{month}
]]>
</select>
<!-- omitted -->
@Component
public class SalesDataPartitioner implements Partitioner {
@Inject
SalesSummaryRepository repository; // (3)
// omitted
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> map = new HashMap<>();
int count = repository.countByYearAndMonth(year, month);
int dataSize = (count / gridSize) + 1; // (4)
int offset = 0;
for (int i = 0; i < gridSize; i++) {
ExecutionContext context = new ExecutionContext();
context.putInt("dataSize", dataSize); // (5)
context.putInt("offset", offset); // (6)
offset += dataSize;
map.put("partition:" + i, context); // (7)
}
return map;
}
}
<!-- (8) -->
<task:executor id="parallelTaskExecutor"
pool-size="${thread.size}" queue-capacity="10"/>
<!-- (9) -->
<bean id="reader"
class="org.mybatis.spring.batch.MyBatisCursorItemReader" scope="step"
p:queryId="org.terasoluna.batch.functionaltest.ch08.parallelandmultiple.repository.SalesSummaryRepository.findByYearAndMonth"
p:sqlSessionFactory-ref="jobSqlSessionFactory">
<property name="parameterValues">
<map>
<entry key="year" value="#{new Integer(jobParameters[year])}" />
<entry key="month" value="#{new Integer(jobParameters[month])}" />
<!-- (10) -->
<entry key="dataSize" value="#{stepExecutionContext[dataSize]}" />
<!-- (11) -->
<entry key="offset" value="#{stepExecutionContext[offset]}" />
</map>
</property>
</bean>
<!-- omitted -->
<batch:job id="multipleCreateSalesPlanSummaryJob" job-repository="jobRepository">
<!-- (12) -->
<batch:step id="multipleCreateSalesPlanSummaryJob.master">
<!-- (13) -->
<batch:partition partitioner="salesDataPartitioner"
step="multipleCreateSalesPlanSummaryJob.slave">
<!-- (14) -->
<batch:handler grid-size="4" task-executor="parallelTaskExecutor" />
</batch:partition>
</batch:step>
</batch:job>
<!-- (15) -->
<batch:step id="multipleCreateSalesPlanSummaryJob.slave">
<batch:tasklet transaction-manager="jobTransactionManager">
<batch:chunk reader="reader" processor="addProfitsItemProcessor"
writer="writer" commit-interval="10"/>
</batch:tasklet>
</batch:step>
Sr. No. | Description |
---|---|
(1) |
Define a pagination search (SQL narrowing down method) to fetch a specific data range. |
(2) |
Define SQL to fetch total number of records for processing. |
(3) |
Inject defined Repository(SQLMapper). |
(4) |
Calculate data records processed by one Slave step. |
(5) |
Store data records of (4) in step context. |
(6) |
Store search start position of each Slave step in step context. |
(7) |
Each Slave is stored in the Map to enable fetching of corresponding context. |
(8) |
Define a thread pool to be assigned to each thread of the Slave step in multiple processing. |
(9) |
Define ItemReader for fetching data by using pagination search (SQL narrow down method). |
(10) |
Fetch data records set in (5) from step context and add to search conditions. |
(11) |
Fetch search start position set in (6) from step context and add to search conditions. |
(12) |
Define Master step. |
(13) |
Define a process which generates partitioning conditions for data. |
(14) |
Set the number of partitions (fixed number) in grid-size. |
(15) |
Define Slave step. |
9. Summary of points
9.1. Notes on TERASOLUNA Batch 5.x
This is a summarized list of the rules and notes about using TERASOLUNA Batch 5.x that are explained in each section. Users should keep in mind the following points and proceed when developing a batch application.
Only the salient points are mentioned here instead of covering all the considerations. Users should read the guideline of each function to be used. |
-
Single batch process should be simplified and complex logical structures should be avoided.
-
Same operation should not be performed in multiple jobs over and over again.
-
Usage of system resources should be minimized, unnecessary physical I/O should be avoided and on-memory operations should be utilized.
-
Development of batch application
-
Create as 1 job=1 Bean definition(1 job definition)
-
Create as 1 step=1 batch process=1 business logic
-
-
Chunk model
-
Use it for efficiently processing large amount of data.
-
-
Tasklet model
-
Use for simple processing, processing that is hard to standardize, and to process data by single commit.
-
-
Synchronous job
-
Use for starting a job as per the schedule and for batch processing by combining multiple jobs.
-
-
Asynchronous job (DB polling)
-
Use for delayed process, continuous execution of jobs with short processing time, and consolidation of large jobs.
-
-
Asynchronous job (Web container)
-
Similar to DB polling. However, use when instantaneous start is required.
-
-
Management of JobRepository
-
In Spring Batch, use
JobRepository
for recording start status and execution result of job. -
In TERASOLUNA Batch 5.x, persistence is optional if all of the following apply.
-
Using TERASOLUNA Batch 5.x for executing synchronous job only.
-
All job execution management including stop, restart of job is assigned to the job scheduler.
-
Do not use restart where the
JobRepository
possessed by Spring Batch is a prerequisite.
-
-
-
When these are applicable, use
H2
which is an in-memory and built-in database as an option of RDBMS used byJobRepository
. On the other hand, when using asynchronous job or stop and restart by Spring Batch, RDBMS that can make the job execution status and result permanent, is required.
For this point, Job management should also be read.
-
-
Chunk model
-
When you want to steadily process large amount of data
-
When you want to restart based on the record count
-
-
Tasklet model
-
When you want to make recovery as simple as possible
-
When you want to consolidate the process contents
-
How to choose chunk model or tasklet model should also be read.
-
In the Tasklet implementation, match with the scope of component to be Injected.
-
Composite type component matches with the scope of component to be delegated.
-
When using JobParameter, set to the scope of
step
. -
If you want to save instance variables in Step unit, set to the scope of
step
.
-
Adjust chunk size
-
When using Chunk, set the number of commits to an appropriate size. Do not increase the size too much.
-
-
Adjust fetch size
-
In database access, set fetch size to an appropriate size. Do not increase the size too much.
-
-
Make file reading more efficient
-
Provide a dedicated FieldSetMapper interface implementation.
-
-
Parallel process and multiple processes
-
Implement by job scheduler.
-
-
Distributed processing
-
Implement by job scheduler.
-
-
Usage of in-memory database
-
It is not suitable for long-term continuous operation so it is desirable to restart it periodically.
-
When it is to be used for long-term continuous operation, maintenance work such as periodically deleting data from
JobRepository
is required.
-
-
Narrow-down of registered job
-
Register only jobs that are designed and implemented on the premise of asynchronous execution.
-
-
Mass processing of batches with very short processing time is not suitable since performance may deteriorate.
-
Since parallel execution of the same job is possible, it is necessary to design so that parallel executions of the same job do not adversely affect each other.
-
The basic consideration is same as Asynchronous job (DB polling).
-
Adjust thread pool.
-
Apart from the thread pool of asynchronous execution, it is necessary to consider the request thread of the Web container and other applications operating within the same unit.
-
-
In Web and batch, you cannot cross-reference data source, MyBatis setting and Mapper interface.
-
Failure to start a job due to thread pool exhaustion cannot be caught at job start, so provide a means to confirm it separately.
-
"Use
MyBatisBatchItemWriter
in ItemWriter" and "Update reference using Mapper interface in ItemProcessor" cannot be done at the same time.-
There is a restriction in MyBatis of not executing with two or more
ExecutorType
in the same transaction. Refer to Database access of other than ItemReader/ItemWriter.
-
-
Notes on input/output of database to the same table
-
As a result of losing the information that guarantees read consistency due to the output (issuing UPDATE), an error may occur in the input (SELECT). Consider the following measures.
-
Although it depends on the database, increase the area that holds the information for guaranteeing read consistency.
-
Split the input data and perform multiple processing.
-
-
-
When dealing with the following fixed-length file, be sure to use the component provided by TERASOLUNA Batch 5.x.
-
Fixed-length file containing multibyte characters
-
Fixed length file without line break
-
-
When skipping footer records, it is necessary to process them with an OS command.
-
When multiple jobs are executed concurrently, design the jobs so that exclusive control is not required.
-
Resources to be accessed and processing targets should be split for each job.
-
-
Design in such a way that deadlocks are prevented from occurring.
-
File exclusive control should be implemented in the tasklet model.
-
Do not perform transaction processing in exception handling.
-
Note that ChunkListener behaves differently by the process model.
-
The exceptions generated by opening and closing the resources are
-
Chunk model: Not in the scope of catching by ChunkListener interface.
-
Tasklet model: In the scope of catching by ChunkListener interface.
-
-
-
Input check error cannot be recovered even by restarting unless the input resource that is the cause of check error is modified
-
How to cope when a failure occurs in JobRepository should be considered.
-
Since
ExecutionContext
is stored in theJobRepository
, there are following restrictions.-
The object to be stored in
ExecutionContext
should be the class that implementsjava.io.Serializable
. -
There is a limit to the size that can be stored.
-
-
The forced termination of the Java process and the end status of the batch application should be clearly distinguished.
-
It is strictly prohibited for the batch application to set the process exit code to 1.
-
-
Do not use
Multi Thread Step
. -
Depending on the processing content, be careful about the possibility of resource contention and deadlock.