TERASOLUNA Batch Framework for Java (5.x) Development Guideline - version 5.0.1.RELEASE, 2017-9-27

Overview

This section explains the overall architecture of TERASOLUNA Batch Framework for Java (5.x).

As described in "General batch processing system", TERASOLUNA Batch Framework for Java (5.x) is implemented as a combination of OSS centered on Spring Batch.

A configuration schematic diagram of TERASOLUNA Batch Framework for Java (5.x), including the layered architecture of Spring Batch, is shown below.

TERASOLUNA Batch Framework for Java (5.x) Stack
Configuration schematic diagram of TERASOLUNA Batch Framework for Java (5.x)
Description of the layered architecture of Spring Batch
Business Application

All job definitions and business logic written by developers.

spring batch core

Core runtime classes offered by Spring Batch that are required to start and control batch jobs.

spring batch infrastructure

General-purpose implementations of ItemReader/ItemProcessor/ItemWriter offered by Spring Batch, which are used by both developers and the core framework itself.

Structural elements of job

A configuration schematic diagram of jobs is shown below in order to explain structural elements of the job.

Job Components
Configuration schematic diagram of job

This section also describes guidelines on the granularity at which jobs and steps should be configured.

Job

A job is an entity that encapsulates the entire batch process and is a container for storing steps.
A job can consist of one or more steps.

A job is defined in a Bean definition file by using XML. Multiple jobs can be defined in one job definition file; however, managing the jobs then tends to become complex.

Hence, TERASOLUNA Batch Framework for Java (5.x) uses the following guideline.

1 job = 1 job definition file

Step

A step defines the information required for controlling a batch process. Either a chunk model or a Tasklet model can be defined in the step.

Chunk model
  • It consists of ItemReader, ItemProcessor and ItemWriter.

Tasklet model
  • It consists only of Tasklet.

As stated in "Rules and precautions to be considered in batch processing", a single batch process must be kept as simple as possible, and complex logical structures must be avoided.

Hence, TERASOLUNA Batch Framework for Java (5.x) uses the following guideline.

1 step = 1 batch process = 1 business logic

Distribution of business logic in chunk model

If a single piece of business logic is large and complex, it is divided into smaller units. As the schematic diagram shows, only one ItemProcessor can be set in 1 step, so dividing the business logic may appear impossible. However, CompositeItemProcessor, an ItemProcessor that consists of multiple ItemProcessors, is available, and the business logic can be divided and executed by using this implementation.
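The delegation idea can be illustrated with a minimal plain-Java sketch. It is not the Spring Batch CompositeItemProcessor itself: the class name CompositeProcessorSketch and the use of java.util.function.Function as a stand-in for ItemProcessor are assumptions made only for this illustration. Each delegate receives the output of the previous one, and a null result is assumed to mean that the item is filtered out.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the delegation idea behind a composite processor.
// This is NOT the Spring Batch class; Function plays the role of ItemProcessor.
final class CompositeProcessorSketch {

    /** Chains the delegates in order; a null result stops the chain (item filtered out). */
    static <T> Function<T, T> compose(List<Function<T, T>> delegates) {
        return item -> {
            T current = item;
            for (Function<T, T> delegate : delegates) {
                if (current == null) {
                    return null; // an earlier delegate filtered the item
                }
                current = delegate.apply(current);
            }
            return current;
        };
    }

    public static void main(String[] args) {
        // Three small units of business logic instead of one large processor.
        Function<String, String> trim = String::trim;
        Function<String, String> dropEmpty = s -> s.isEmpty() ? null : s;
        Function<String, String> upper = String::toUpperCase;

        Function<String, String> composite = compose(List.of(trim, dropEmpty, upper));
        System.out.println(composite.apply("  abc "));
        System.out.println(composite.apply("   "));
    }
}
```

In this sketch the step still sees a single processor (composite), while the business logic is maintained as three separate units.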

How to implement Step

Chunk model

This section explains the definition and intended use of the chunk model.

Definition

The ItemReader, ItemProcessor and ItemWriter implementations and the chunk size (the number of items per chunk) are set in ChunkOrientedTasklet. The respective roles are explained below.

  • ChunkOrientedTasklet: Calls ItemReader/ItemProcessor and creates a chunk, then passes the created chunk to ItemWriter.

  • ItemReader: Reads input data.

  • ItemProcessor: Processes the read data.

  • ItemWriter: Outputs the processed data in chunk units.

For an overview of the chunk model, refer to "Chunk model".
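The division of roles described above can be sketched as a simplified loop in plain Java. This is only a model under stated assumptions, not Spring Batch code: the class name ChunkLoopSketch is hypothetical, Iterator, Function and Consumer stand in for ItemReader, ItemProcessor and ItemWriter, and transaction boundaries are indicated only by comments.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified, hypothetical model of chunk-oriented processing (not Spring Batch code).
final class ChunkLoopSketch {

    /** Reads up to commitInterval items, processes them, writes them as one chunk. Returns the number of chunks written. */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be 1 or more");
        }
        int chunksWritten = 0;
        while (reader.hasNext()) {
            // A transaction would begin here.
            List<I> items = new ArrayList<>();
            while (items.size() < commitInterval && reader.hasNext()) {
                items.add(reader.next());          // role of ItemReader
            }
            List<O> chunk = new ArrayList<>();
            for (I item : items) {
                O processed = processor.apply(item); // role of ItemProcessor
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            writer.accept(chunk);                  // role of ItemWriter
            chunksWritten++;
            // The transaction would be committed here.
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> timesTen = n -> n * 10;
        Consumer<List<Integer>> printer = chunk -> System.out.println("write " + chunk);
        int chunks = run(List.of(1, 2, 3, 4, 5, 6, 7).iterator(), timesTen, printer, 3);
        System.out.println("chunks written: " + chunks);
    }
}
```

With seven input items and a commit interval of 3, the writer in this sketch is called three times, and the last chunk holds only the remaining item.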

How to set a job in chunk model
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet>
            <batch:chunk reader="reader"
                         processor="processor"
                         writer="writer"
                         commit-interval="100" />
        </batch:tasklet>
    </batch:step>
</batch:job>
Purpose of use

Because it processes a fixed amount of data at a time, it is used when handling a large amount of data.

Tasklet model

This section explains the definition and intended use of the Tasklet model.

Definition

Only a Tasklet implementation is set.
For an overview of the Tasklet model, refer to "Tasklet model".

How to set a job in Tasklet model
<batch:job id="exampleJob">
    <batch:step id="exampleStep">
        <batch:tasklet ref="myTasklet" />
    </batch:step>
</batch:job>
Purpose of use

It can be used for executing a process that is not associated with I/O, such as the execution of system commands.
It can also be used when data is to be committed collectively.

Function difference between chunk model and Tasklet model

The functional differences between the chunk model and the Tasklet model are explained. Only an outline is given here; refer to the section on each function for details.

List of function differences

Structural elements
  • Chunk model: Configured by ItemReader/ItemProcessor/ItemWriter/ChunkOrientedTasklet.
  • Tasklet model: Configured only by Tasklet.

Transaction
  • Chunk model: A transaction is generated for each chunk.
  • Tasklet model: Processed in 1 transaction.

Recommended reprocessing method
  • Chunk model: Re-run and re-start can be used.
  • Tasklet model: As a rule, only re-run is used.

Exception handling
  • Chunk model: Handling becomes easier by using a listener. Individual implementation is also possible.
  • Tasklet model: Individual implementation is required.

Running a job method

This section explains how to run a job. The following methods are available.

  • Synchronous execution method

  • Asynchronous execution method

Each method is explained below.

Synchronous execution method

The synchronous execution method is an execution method in which control is not returned to the launch source from the start of the job until its completion.

A schematic diagram in which a job is started from a job scheduler is shown below.

Synchronized Execution
Schematic diagram for synchronous execution
  1. Start a shell script to run a job from job scheduler.
    Job scheduler waits until the exit code (numeric value) is returned.

  2. Start CommandLineJobRunner to run a job from shell script.
    Shell script waits until CommandLineJobRunner returns an exit code (numeric value).

  3. CommandLineJobRunner runs a job. Job returns an exit code (string) to CommandLineJobRunner after processing is completed.
    CommandLineJobRunner converts exit code (string) returned from the job to exit code (numeric value) and returns it to the shell script.
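The conversion in step 3 can be illustrated with a small plain-Java sketch. The class name ExitCodeSketch, the exit code "WARNING" and all numeric values are assumptions chosen for illustration; they are not taken from this guideline or from the CommandLineJobRunner implementation.

```java
import java.util.Map;

// Hypothetical sketch of converting a job's string exit code into a numeric
// process exit code. The mapping values are assumptions for illustration only.
final class ExitCodeSketch {

    private static final Map<String, Integer> MAPPING = Map.of(
            "COMPLETED", 0,
            "FAILED", 1,
            "WARNING", 2); // "WARNING" is a hypothetical custom exit code

    private static final int UNMAPPED = 1; // unknown codes are treated as a generic error

    /** Returns the numeric exit code for the given string exit code. */
    static int toNumeric(String exitCode) {
        return MAPPING.getOrDefault(exitCode, UNMAPPED);
    }

    public static void main(String[] args) {
        String exitCodeFromJob = "COMPLETED"; // stand-in for the result returned by the job
        int numeric = toNumeric(exitCodeFromJob);
        System.out.println(exitCodeFromJob + " -> " + numeric);
        // A real launcher would finish with System.exit(numeric) so that the
        // shell script and the job scheduler can read the value.
    }
}
```

The shell script and the job scheduler only see the numeric value, so any string exit code that must be distinguished by the scheduler needs its own entry in such a mapping.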

Asynchronous execution method

The asynchronous execution method is an execution method in which control is returned to the launch source immediately after the job is started, because the job runs on an execution base different from the launch source (a separate thread etc.). In this method, job execution results must be fetched by a means separate from running the job.

The following 2 methods are explained in TERASOLUNA Batch Framework for Java (5.x).

  • Asynchronous execution method (DB polling)

  • Asynchronous execution method (Web container)

Other asynchronous execution methods

Asynchronous execution can also be performed by using messages such as MQ; however, since the point at which the job is run is the same, the description is omitted in TERASOLUNA Batch Framework for Java (5.x).

Asynchronous execution method (DB polling)

"Asynchronous execution (DB polling)" is a method in which a job execution request is registered in the database, the database is polled for requests, and the job is executed.

TERASOLUNA Batch Framework for Java (5.x) supports a DB polling function. A schematic diagram of job start by the DB polling function is shown below.

DB Polling
DB polling schematic diagram
  1. User registers a job request to the database.

  2. The DB polling function periodically monitors the registration of job requests and runs the corresponding job when a registration is detected.

    • The job is run from SimpleJobOperator, and JobExecutionId is received after completion of the job.

    • JobExecutionId is an ID which uniquely identifies a job execution. Execution results are referenced from JobRepository by using this ID.

    • Job execution results are registered in JobRepository by the Spring Batch mechanism.

    • DB polling itself is executed asynchronously.

  3. The DB polling function updates the job request that was started, recording the JobExecutionId returned from SimpleJobOperator and the status.

  4. Job progress and results are referenced separately by using JobExecutionId.
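The flow in steps 1 to 3 can be modelled with a small in-memory sketch in plain Java. Everything in it is hypothetical: a List stands in for the job request table, the statuses REGISTERED and EXECUTED and the field names are invented for illustration, and a Function stands in for running a job through SimpleJobOperator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// In-memory, hypothetical model of the DB polling flow (no database, no Spring Batch).
final class DbPollingSketch {

    enum Status { REGISTERED, EXECUTED } // illustrative statuses only

    static final class JobRequest {
        final String jobName;
        Status status = Status.REGISTERED;
        Long jobExecutionId; // recorded after the job has been run

        JobRequest(String jobName) {
            this.jobName = jobName;
        }
    }

    /** One polling cycle: runs each unprocessed request and records the returned ID. Returns the number of jobs run. */
    static int pollOnce(List<JobRequest> table, Function<String, Long> jobOperator) {
        int started = 0;
        for (JobRequest request : table) {
            if (request.status != Status.REGISTERED) {
                continue; // already handled in an earlier cycle
            }
            request.jobExecutionId = jobOperator.apply(request.jobName); // step 2
            request.status = Status.EXECUTED;                            // step 3
            started++;
        }
        return started;
    }

    public static void main(String[] args) {
        List<JobRequest> table = new ArrayList<>();
        table.add(new JobRequest("exampleJob")); // step 1: the user registers a request

        AtomicLong ids = new AtomicLong();
        Function<String, Long> operator = name -> ids.incrementAndGet(); // stand-in for running a job

        System.out.println("jobs run in first cycle: " + pollOnce(table, operator));
        System.out.println("recorded JobExecutionId: " + table.get(0).jobExecutionId);
        System.out.println("jobs run in second cycle: " + pollOnce(table, operator));
    }
}
```

A real polling function would call a method such as pollOnce at a fixed interval; the second cycle in this sketch finds nothing to run because the request has already been updated.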

Asynchronous execution method (Web container)

"Asynchronous execution (Web container)" is a method in which a job is executed asynchronously, triggered by a request sent to a Web application on the Web container. The Web application can return a response immediately after starting the job, without waiting for the job to end.

Web Container
Web container schematic diagram
  1. Send a request from a client to Web application.

  2. The Web application asynchronously executes the job specified by the request.

    • JobExecutionId is received from SimpleJobOperator immediately after the job is started.

    • Job execution results are registered in JobRepository by using Spring Batch system.

  3. Web application returns a response to the client without waiting for the job to end.

  4. Job progress and results are referenced separately by using JobExecutionId.
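The "start now, check later" pattern in the steps above can be sketched in plain Java with an ExecutorService. This is a model under stated assumptions, not a Web application or Spring Batch code: the class name AsyncStartSketch is hypothetical, a ConcurrentHashMap stands in for JobRepository, and the status strings are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: a handler starts a job on another thread and returns an ID at once.
final class AsyncStartSketch {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicLong ids = new AtomicLong();
    // Stand-in for JobRepository: execution ID -> status string.
    private final Map<Long, String> repository = new ConcurrentHashMap<>();

    /** Starts the job asynchronously and returns its execution ID without waiting for completion. */
    long start(Runnable job) {
        long id = ids.incrementAndGet();
        repository.put(id, "STARTED");
        executor.submit(() -> {
            try {
                job.run();
                repository.put(id, "COMPLETED");
            } catch (RuntimeException e) {
                repository.put(id, "FAILED");
            }
        });
        return id; // returned before the job has necessarily finished
    }

    /** Looks up the current status by execution ID (step 4). */
    String status(long id) {
        return repository.get(id);
    }

    /** Stops accepting jobs and waits briefly for running jobs to finish. */
    void shutdown() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncStartSketch app = new AsyncStartSketch();
        long id = app.start(() -> System.out.println("job running"));
        System.out.println("response returned with id " + id);
        app.shutdown();
        System.out.println("final status: " + app.status(id));
    }
}
```

The order of the first two printed lines is not fixed, because the job runs on a separate thread; only the ID allows the caller to find the result afterwards.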

This method can also be linked with a Web application built with TERASOLUNA Server Framework for Java (5.x).

Points to consider while using

Points to consider while using TERASOLUNA Batch Framework for Java (5.x) are described below.

Running a job method
Synchronous execution method

It is used when jobs are run according to a schedule and when batch processing is carried out by combining multiple jobs.

Asynchronous execution method (DB polling)

It is used for delayed processing, continuous execution of jobs with a short processing time, and aggregation of a large number of jobs.

Asynchronous execution method (Web container)

It is used for the same purposes as DB polling, but when the job must be started immediately.

Implementation method
Chunk model

It is used when a large quantity of data is to be processed efficiently.

Tasklet model

It is used for simple processing, processing that is difficult to standardize, and processing in which data is to be handled collectively.
