Overview
This section explains the overall architecture of TERASOLUNA Batch Framework for Java (5.x).
As described in General batch processing system, TERASOLUNA Batch Framework for Java (5.x) is implemented as a combination of OSS centered on Spring Batch.
The configuration schematic diagram of TERASOLUNA Batch Framework for Java (5.x), including the layered architecture of Spring Batch, is shown below.
- Business Application
  All job definitions and business logic written by developers.
- spring batch core
  The core runtime classes, offered by Spring Batch, that are required to start and control batch jobs.
- spring batch infrastructure
  Implementations of general ItemReader/ItemProcessor/ItemWriter offered by Spring Batch, which are used by developers and by the core framework itself.
Structural elements of job
A configuration schematic diagram of a job is shown below to explain its structural elements.
This section also gives guidelines on the granularity at which jobs and steps should be configured.
Job
A job is an entity that encapsulates an entire batch process and is a container for steps.
A job consists of one or more steps.
A job is defined in a Bean definition file by using XML. Multiple jobs can be defined in one job definition file; however, managing the jobs then tends to become complex.
Hence, TERASOLUNA Batch Framework for Java (5.x) uses the following guideline.
1 job = 1 job definition file
Step
A step defines the information required for controlling a batch process. Either a chunk model or a tasklet model can be defined in a step.
- Chunk model
  It is configured by ItemReader, ItemProcessor and ItemWriter.
- Tasklet model
  It is configured only by Tasklet.
As stated in Rules and precautions to be considered in batch processing, a single batch process should be kept as simple as possible and should avoid complex logical structures.
Hence, TERASOLUNA Batch Framework for Java (5.x) uses the following guideline.
1 step = 1 batch process = 1 business logic
Distribution of business logic in chunk model
If a single piece of business logic is complex and large-scale, it is divided into smaller units. As the schematic diagram shows, only one ItemProcessor can be set in 1 step, so it may look as if the business logic cannot be divided. However, CompositeItemProcessor, an ItemProcessor that consists of multiple ItemProcessors, is available, so the business logic can be divided and executed by using this implementation.
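The idea of chaining several processors behind a single one can be sketched in plain Java as follows. This is a minimal illustration and not Spring Batch's implementation: `StepProcessor` and `CompositeProcessorSketch` are hypothetical names introduced here, and the sketch assumes that a delegate returning null means the item is filtered out.

```java
import java.util.List;

/** Hypothetical stand-in for an item processor: returning null drops the item. */
interface StepProcessor<T> {
    T process(T item);
}

/** Runs each delegate in order, feeding the output of one into the next. */
final class CompositeProcessorSketch<T> implements StepProcessor<T> {
    private final List<StepProcessor<T>> delegates;

    CompositeProcessorSketch(List<StepProcessor<T>> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public T process(T item) {
        T current = item;
        for (StepProcessor<T> delegate : delegates) {
            current = delegate.process(current);
            if (current == null) {
                return null; // assumption: null means "filter this item out"
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Three small units of business logic, combined into one processor.
        StepProcessor<String> trim = String::trim;
        StepProcessor<String> dropEmpty = s -> s.isEmpty() ? null : s;
        StepProcessor<String> upper = String::toUpperCase;
        StepProcessor<String> composite =
                new CompositeProcessorSketch<>(List.of(trim, dropEmpty, upper));
        System.out.println(composite.process("  abc ")); // prints ABC
        System.out.println(composite.process("   "));    // prints null
    }
}
```

In Spring Batch itself, the equivalent wiring is done by setting the list of processors on the `delegates` property of CompositeItemProcessor and referring to that bean from the step.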
How to implement Step
Chunk model
The definition of the chunk model and its purpose of use are explained.
- Definition
  ItemReader, ItemProcessor and ItemWriter implementations and the number of items per chunk are set in ChunkOrientedTasklet. Their respective roles are as follows.
  - ChunkOrientedTasklet: Calls ItemReader and ItemProcessor to create a chunk, then passes the created chunk to ItemWriter.
  - ItemReader: Reads input data.
  - ItemProcessor: Processes the read data.
  - ItemWriter: Outputs the processed data in chunk units.
  For an overview of the chunk model, refer to Chunk model.
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet>
            <batch:chunk reader="reader"
                         processor="processor"
                         writer="writer"
                         commit-interval="100" />
        </batch:tasklet>
    </batch:step>
</batch:job>
- Purpose of use
  Since it handles a fixed number of data items collectively, it is used when processing a large amount of data.
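The read, process and write cycle described above can be sketched as a simple loop. This is a simplified model under stated assumptions, not the ChunkOrientedTasklet implementation: `ChunkLoopSketch` and its `run` method are hypothetical, a plain `Iterator`, `Function` and `Consumer` stand in for ItemReader, ItemProcessor and ItemWriter, and transaction handling, skip and retry are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class ChunkLoopSketch {

    /**
     * Reads items one by one, processes each, and hands them to the writer
     * in groups of commitInterval. Returns the number of chunks written.
     */
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>(commitInterval);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                // In the real framework a transaction would be committed here.
                writer.accept(new ArrayList<>(chunk));
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {
            // The last chunk may hold fewer items than commitInterval.
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> timesTen = x -> x * 10;
        Consumer<List<Integer>> printChunk = c -> System.out.println("write " + c);
        int chunks = run(List.of(1, 2, 3, 4, 5).iterator(), timesTen, printChunk, 2);
        System.out.println(chunks + " chunks"); // prints 3 chunks
    }
}
```

With five input items and a commit interval of 2, the writer is called three times, which corresponds to three transactions in the chunk model.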
Tasklet model
The definition of the tasklet model and its purpose of use are explained.
- Definition
  Only a Tasklet implementation is set.
  For an overview of the tasklet model, refer to Tasklet model.
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet ref="myTasklet"/>
    </batch:step>
</batch:job>
- Purpose of use
  It can be used for executing a process that is not associated with I/O, such as the execution of system commands.
  It can also be used when all the data is to be committed at once.
Functional differences between the chunk model and the tasklet model
The functional differences between the chunk model and the tasklet model are explained. Only an outline is given here; refer to the section on each function for details.
Function | Chunk model | Tasklet model |
---|---|---|
Structural elements | Configured by ItemReader/ItemProcessor/ItemWriter/ChunkOrientedTasklet. | Configured only by Tasklet. |
Transaction | A transaction is generated for each chunk. | Processed in 1 transaction. |
Recommended reprocessing method | Re-run and re-start can be used. | As a rule, only re-run is used. |
Exception handling | Handling becomes easier by using a listener. Individual implementation is also possible. | Individual implementation is required. |
How to run a job
The methods of running a job are explained. There are the following two methods.
- Synchronous execution method
- Asynchronous execution method
Each method is explained below.
Synchronous execution method
In the synchronous execution method, control is not returned to the boot source from the time the job starts until it completes.
A schematic diagram of a job started from a job scheduler is shown.
- The job scheduler starts a shell script to run the job.
  The job scheduler waits until an exit code (numeric value) is returned.
- The shell script starts CommandLineJobRunner to run the job.
  The shell script waits until CommandLineJobRunner returns an exit code (numeric value).
- CommandLineJobRunner runs the job. After processing is completed, the job returns an exit code (string) to CommandLineJobRunner.
  CommandLineJobRunner converts the exit code (string) returned from the job to an exit code (numeric value) and returns it to the shell script.
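The conversion from a string exit code to a numeric one in the last step can be pictured as a simple lookup. The sketch below is illustrative only: `ExitCodeMappingSketch` is a hypothetical class, and the numeric values chosen for each string are assumptions made for this example, not values confirmed by this section. Check the exit code mapper that your runner is actually configured with.

```java
import java.util.Map;

final class ExitCodeMappingSketch {

    // Assumed mapping for illustration: the concrete numbers are not taken from the framework.
    private static final Map<String, Integer> MAPPING = Map.of(
            "COMPLETED", 0,
            "FAILED", 255);

    // Assumed fallback for strings that have no entry in the mapping.
    private static final int UNMAPPED = 255;

    /** Converts the exit code string returned by a job into a process exit code. */
    static int toNumeric(String exitCode) {
        if (exitCode == null) {
            return UNMAPPED;
        }
        return MAPPING.getOrDefault(exitCode, UNMAPPED);
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("COMPLETED")); // prints 0
        System.out.println(toNumeric("FAILED"));    // prints 255
    }
}
```

The shell script and the job scheduler only ever see the numeric value, so any status that the scheduler must distinguish needs its own entry in such a mapping.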
Asynchronous execution method
In the asynchronous execution method, the job is executed on an execution base different from that of the boot source (a separate thread, etc.), so control is returned to the boot source immediately after the job is started. With this method, the job execution results must be fetched by a means separate from the one used to run the job.
The following 2 methods are explained in TERASOLUNA Batch Framework for Java (5.x).
- Asynchronous execution method (DB polling)
- Asynchronous execution method (Web container)
Other asynchronous execution methods
Asynchronous execution can also be performed by using messaging such as MQ; however, since the point at which the job is executed is the same, its description is omitted in this guideline.
Asynchronous execution method (DB polling)
Asynchronous execution (DB polling) is a method wherein a job execution request is registered in the database, the request is detected by polling, and the job is executed.
TERASOLUNA Batch Framework for Java (5.x) provides a DB polling function. A schematic diagram of job start by the provided DB polling function is shown.
- A user registers a job request in the database.
- The DB polling function periodically monitors the registration of job requests and executes the corresponding job when a registration is detected.
  - It runs the job from SimpleJobOperator and receives the JobExecutionId after the job completes.
  - JobExecutionId is an ID that uniquely identifies a job execution; the execution results are looked up from JobRepository by using this ID.
  - Job execution results are registered in JobRepository by the Spring Batch mechanism.
  - DB polling itself is executed asynchronously.
- The DB polling function updates the job request that was started with the JobExecutionId returned from SimpleJobOperator and with its status.
- The progress and results of the job are referred to separately by using the JobExecutionId.
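The flow above can be modelled with an in-memory stand-in for the job request table. This is a sketch under stated assumptions and not the framework's DB polling implementation: `DbPollingSketch`, `JobRequest` and the status names `INIT`, `POLLED` and `EXECUTED` are introduced here for illustration, a `Map` stands in for JobRepository, and the job is run inline, whereas a real poller would run on a timer and hand jobs to worker threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DbPollingSketch {

    /** Assumed request states for this sketch. */
    enum PollingStatus { INIT, POLLED, EXECUTED }

    /** One row of the hypothetical job request table. */
    static final class JobRequest {
        final String jobName;
        PollingStatus status = PollingStatus.INIT;
        Long jobExecutionId; // null until the job has been run

        JobRequest(String jobName) {
            this.jobName = jobName;
        }
    }

    private final List<JobRequest> requestTable = new ArrayList<>();
    private final Map<Long, String> jobRepository = new HashMap<>(); // stand-in for JobRepository
    private long nextExecutionId = 1;

    /** The user registers a job request. */
    JobRequest register(String jobName) {
        JobRequest request = new JobRequest(jobName);
        requestTable.add(request);
        return request;
    }

    /** One polling cycle: run every newly registered request and update its row. */
    int pollOnce() {
        int started = 0;
        for (JobRequest request : requestTable) {
            if (request.status != PollingStatus.INIT) {
                continue; // already picked up in an earlier cycle
            }
            request.status = PollingStatus.POLLED;
            request.jobExecutionId = runJob(request.jobName);
            request.status = PollingStatus.EXECUTED;
            started++;
        }
        return started;
    }

    /** Stand-in for launching a job: records a result and returns its execution ID. */
    private long runJob(String jobName) {
        long executionId = nextExecutionId++;
        jobRepository.put(executionId, jobName + ": COMPLETED");
        return executionId;
    }

    /** Results are looked up separately by execution ID; null if the ID is unknown. */
    String resultOf(long jobExecutionId) {
        return jobRepository.get(jobExecutionId);
    }
}
```

A caller registers a request, later reads the execution ID back from the request row once its status has changed, and then uses that ID to look up the result.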
Asynchronous execution method (Web container)
Asynchronous execution (Web container) is a method wherein a job is executed asynchronously, triggered by a request sent to a Web application on the Web container. The Web application can return a response immediately after starting the job, without waiting for the job to end.
- A client sends a request to the Web application.
- The Web application asynchronously executes the job specified by the request.
  - It receives the JobExecutionId from SimpleJobOperator immediately after starting the job.
  - Job execution results are registered in JobRepository by the Spring Batch mechanism.
- The Web application returns a response to the client without waiting for the job to end.
- The progress and results of the job are looked up separately by using the JobExecutionId.
Further, it can also be linked with a Web application built with TERASOLUNA Server Framework for Java (5.x).
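The key property of this method, returning an ID at once while the job continues in the background, can be sketched with a plain `ExecutorService`. This is an assumption-laden illustration and not the SimpleJobOperator API: `AsyncLaunchSketch`, `start` and `statusOf` are hypothetical names, and a `ConcurrentHashMap` stands in for JobRepository.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicLong;

final class AsyncLaunchSketch {

    private final Map<Long, String> repository = new ConcurrentHashMap<>(); // stand-in for JobRepository
    private final AtomicLong ids = new AtomicLong();
    private final ExecutorService executor;

    AsyncLaunchSketch(ExecutorService executor) {
        this.executor = executor;
    }

    /** Starts the job on another thread and returns its execution ID without waiting. */
    long start(Runnable job) {
        long id = ids.incrementAndGet();
        repository.put(id, "STARTED");
        executor.submit(() -> {
            try {
                job.run();
                repository.put(id, "COMPLETED");
            } catch (RuntimeException e) {
                repository.put(id, "FAILED");
            }
        });
        return id; // the caller can respond to the client right away
    }

    /** Progress and results are queried separately by execution ID. */
    String statusOf(long id) {
        return repository.getOrDefault(id, "UNKNOWN");
    }
}
```

A request handler would call `start`, put the returned ID in the response, and expose a second endpoint that calls `statusOf` so the client can check on the job later.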
Points to consider while using
Points to consider while using TERASOLUNA Batch Framework for Java (5.x) are shown.
- How to run a job
  - Synchronous execution method
    It is used when jobs are run on a schedule and when batch processing is carried out by combining multiple jobs.
  - Asynchronous execution method (DB polling)
    It is used for delayed processing, continuous execution of jobs with short processing times, and aggregation of a large number of jobs.
  - Asynchronous execution method (Web container)
    It is similar to DB polling, but it is used when the job must be started immediately.
- Implementation method
  - Chunk model
    It is used when a large quantity of data is to be processed efficiently.
  - Tasklet model
    It is used for simple processing, for processing that is difficult to standardize, and for processing in which data is handled collectively.