Spring Batch
Introduction to Spring Batch
Spring Batch is a lightweight, comprehensive batch processing framework designed to enable the development of robust batch applications. In the modern enterprise environment, batch processing plays a critical role in handling operations that need to process large volumes of data without user interaction, such as:
- Nightly financial transactions
- Automated report generation
- ETL (Extract, Transform, Load) operations
- Data migration between systems
- Processing of large datasets
Unlike real-time processing where data is handled as it arrives, batch processing collects data over time and processes it in scheduled "batches." Spring Batch provides reusable functions that are essential for processing large volumes of data, including logging, transaction management, job processing statistics, job restart functionality, and resource management.
Why Use Spring Batch?
As a beginner, you might wonder why you should learn Spring Batch when you could write your own loops to process data. Here are some compelling reasons:
- Scalability - Spring Batch can handle millions of records through optimized processing techniques
- Resilience - Built-in support for retrying failed operations and resuming jobs
- Monitoring - Comprehensive metrics and execution statistics
- Maintenance - Separation of business logic from batch infrastructure
- Enterprise Integration - Seamless integration with other Spring projects
Spring Batch Architecture
Spring Batch follows a layered architecture with three primary components:
- Application Layer - Contains your batch jobs and custom code
- Core Layer - Provides core runtime services and APIs
- Infrastructure Layer - Handles the common readers, writers, and services
Key Components
Understanding these foundational components will help you build effective batch applications:
1. Job
A Job represents a complete batch process. It's a container for Steps and defines how the batch process should execute.
2. Step
A Step is a sequential phase of a Job that encapsulates an independent part of the batch process. A job can have multiple steps.
3. JobRepository
The JobRepository stores metadata about configured and executed batch jobs, including status, start and end times, and more.
4. JobLauncher
The JobLauncher is responsible for launching a Job with a given set of parameters.
5. Item Reader, Processor, Writer
These components handle the reading, processing, and writing of data respectively, forming a chunk-oriented processing pattern that is central to Spring Batch operations.
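Conceptually, a chunk-oriented step reads items one at a time, optionally transforms them, and writes them in groups. The sketch below is a simplified illustration of that loop, not the framework's actual implementation (which wraps it in transactions, retry, skip, and restart bookkeeping); the class and method names are ours:
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

import java.util.ArrayList;
import java.util.List;

public class ChunkLoopSketch {

    // Simplified illustration of what a chunk-oriented step does
    static <I, O> void runChunkStep(ItemReader<I> reader,
                                    ItemProcessor<I, O> processor,
                                    ItemWriter<O> writer,
                                    int chunkSize) throws Exception {
        List<O> buffer = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) { // read one item at a time
            O processed = processor.process(item); // null means "filter this item out"
            if (processed != null) {
                buffer.add(processed);
            }
            if (buffer.size() == chunkSize) {
                writer.write(new Chunk<>(buffer)); // one write (and transaction) per chunk
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            writer.write(new Chunk<>(buffer)); // flush the final partial chunk
        }
    }
}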
Getting Started with Spring Batch
Let's walk through setting up a basic Spring Batch application.
Step 1: Add Dependencies
First, add the Spring Batch dependencies to your Maven pom.xml:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>
Step 2: Configure the Database
Spring Batch requires a database to store metadata. For development, we can use an H2 in-memory database. Add these properties to application.properties
:
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=
spring.h2.console.enabled=true
# Disable Spring Boot's automatic job launch on startup; our JobRunner starts it instead
spring.batch.job.enabled=false
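A note on the metadata tables: with an embedded database such as H2, Spring Boot creates Spring Batch's metadata tables automatically at startup. Against a persistent production database you would typically control this explicitly with the property below (set it to never once the tables exist):
# Create Spring Batch's metadata tables on startup
spring.batch.jdbc.initialize-schema=always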
Step 3: Create a Batch Configuration Class
Now, let's create a basic batch job that reads data from a CSV file, processes it, and writes it to a database:
package com.example.batchdemo;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

import javax.sql.DataSource;

@Configuration
public class BatchConfiguration {

    // Reads comma-delimited lines from sample-data.csv and maps them to Person objects
    @Bean
    public FlatFileItemReader<Person> reader() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                .resource(new ClassPathResource("sample-data.csv"))
                .delimited()
                .names("firstName", "lastName")
                .targetType(Person.class)
                .build();
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    // Writes processed Person objects to the people table using named bean parameters
    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
                .dataSource(dataSource)
                .beanMapped()
                .build();
    }

    @Bean
    public Job importUserJob(JobRepository jobRepository, Step step1) {
        return new JobBuilder("importUserJob", jobRepository)
                .start(step1)
                .build();
    }

    // A single chunk-oriented step: read -> process -> write, 10 items per transaction
    @Bean
    public Step step1(JobRepository jobRepository,
                      DataSourceTransactionManager transactionManager,
                      FlatFileItemReader<Person> reader,
                      PersonItemProcessor processor,
                      JdbcBatchItemWriter<Person> writer) {
        return new StepBuilder("step1", jobRepository)
                .<Person, Person>chunk(10, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
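Optionally, you can attach a JobExecutionListener to run code when the job finishes, for example to verify the results. The listener below is a minimal illustrative sketch, not part of the required setup; register it by adding .listener(listener) to the JobBuilder chain in importUserJob:
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.stereotype.Component;

// Illustrative listener: runs after the job and reports success
@Component
public class JobCompletionListener implements JobExecutionListener {

    private static final Logger LOGGER = LoggerFactory.getLogger(JobCompletionListener.class);

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            LOGGER.info("Job finished successfully - check the people table for results");
        }
    }
}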
Step 4: Create Model and Processor Classes
Create a Person class to represent our data model:
package com.example.batchdemo;

public class Person {

    private String firstName;
    private String lastName;

    public Person() {
    }

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "Person{" +
                "firstName='" + firstName + '\'' +
                ", lastName='" + lastName + '\'' +
                '}';
    }
}
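As an aside, if you are on Java 17+ with Spring Batch 5, the model can also be a Java record; the reader's targetType(...) supports mapping delimited columns onto record components, though I would verify that the writer's beanMapped() handles records on your exact versions:
// Immutable alternative to the JavaBean above (Java 17+, Spring Batch 5)
public record Person(String firstName, String lastName) {
}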
Next, create the processor that transforms the data:
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    private static final Logger LOGGER = LoggerFactory.getLogger(PersonItemProcessor.class);

    @Override
    public Person process(Person person) throws Exception {
        final String firstName = person.getFirstName().toUpperCase();
        final String lastName = person.getLastName().toUpperCase();
        final Person transformedPerson = new Person(firstName, lastName);
        LOGGER.info("Converting ({}) into ({})", person, transformedPerson);
        return transformedPerson;
    }
}
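One behavior worth knowing: if an ItemProcessor returns null, Spring Batch filters the item out and it never reaches the writer (the step records this in its filter count). An illustrative validating processor that drops records with a missing first name:
package com.example.batchdemo;

import org.springframework.batch.item.ItemProcessor;

// Illustrative only: returning null filters the item out of the chunk
public class ValidatingPersonProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) {
        if (person.getFirstName() == null || person.getFirstName().isBlank()) {
            return null; // dropped; reflected in the step's filterCount
        }
        return person;
    }
}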
Step 5: Create a Sample Data File
Create a sample-data.csv file in your src/main/resources directory:
John,Doe
Jane,Smith
Alex,Johnson
Maria,Garcia
Robert,Brown
Step 6: Create a Schema for the Output Table
Create a schema.sql file in src/main/resources; Spring Boot runs it automatically against the embedded database on startup:
DROP TABLE IF EXISTS people;
CREATE TABLE people (
    person_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(255),
    last_name VARCHAR(255)
);
Step 7: Create a Job Runner
Finally, let's create a class to launch the job:
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class JobRunner implements CommandLineRunner {

    private static final Logger LOGGER = LoggerFactory.getLogger(JobRunner.class);

    private final JobLauncher jobLauncher;
    private final Job job;

    @Autowired
    public JobRunner(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @Override
    public void run(String... args) throws Exception {
        // The timestamp makes each run a new JobInstance; launching again with
        // identical parameters would be treated as a re-run of the same instance.
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();
        JobExecution execution = jobLauncher.run(job, jobParameters);
        LOGGER.info("Job Execution Status: {}", execution.getStatus());
    }
}
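In production, batch jobs are more often launched on a schedule than at application startup. As an alternative to the CommandLineRunner above, here is an illustrative sketch using Spring's @Scheduled (the class name and cron expression are assumptions, and @EnableScheduling must be present on a configuration class):
package com.example.batchdemo;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Illustrative scheduled launcher: runs the import job every night at 02:00
@Component
public class ScheduledJobLauncher {

    private final JobLauncher jobLauncher;
    private final Job importUserJob;

    public ScheduledJobLauncher(JobLauncher jobLauncher, Job importUserJob) {
        this.jobLauncher = jobLauncher;
        this.importUserJob = importUserJob;
    }

    @Scheduled(cron = "0 0 2 * * *")
    public void runNightly() throws Exception {
        jobLauncher.run(importUserJob, new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis()) // unique JobInstance per run
                .toJobParameters());
    }
}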
Running the Application
When you run this Spring Boot application:
- Spring Batch initializes the required metadata tables
- The job launcher executes our defined job
- The reader reads each line from sample-data.csv
- The processor converts names to uppercase
- The writer inserts the processed data into the people table
- The job completes and logs status information
Output Sample:
2023-07-15 14:32:10.123 INFO 12345 --- [main] c.e.batchdemo.PersonItemProcessor : Converting (Person{firstName='John', lastName='Doe'}) into (Person{firstName='JOHN', lastName='DOE'})
2023-07-15 14:32:10.125 INFO 12345 --- [main] c.e.batchdemo.PersonItemProcessor : Converting (Person{firstName='Jane', lastName='Smith'}) into (Person{firstName='JANE', lastName='SMITH'})
...
2023-07-15 14:32:10.142 INFO 12345 --- [main] c.e.batchdemo.JobRunner : Job Execution Status: COMPLETED
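You can also inspect the metadata Spring Batch recorded for this run. With the H2 console enabled (by default at /h2-console), the framework's standard metadata tables are queryable:
-- Job-level and step-level execution history recorded by the JobRepository
SELECT * FROM BATCH_JOB_EXECUTION;
SELECT * FROM BATCH_STEP_EXECUTION;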
Advanced Spring Batch Features
Once you understand the basics, Spring Batch offers several advanced features:
1. Job Flow Control
You can build complex job flows with conditional execution:
@Bean
public Job flowJob(JobRepository jobRepository, Step step1, Step step2, Step step3) {
    return new JobBuilder("flowJob", jobRepository)
            .start(step1)
            .on("COMPLETED").to(step2) // on() matches the step's ExitStatus pattern
            .from(step1).on("FAILED").to(step3)
            .end()
            .build();
}
2. Parallel Processing
For better performance with large datasets:
@Bean
public Step parallelStep(JobRepository jobRepository,
                         DataSourceTransactionManager transactionManager,
                         ItemReader<InputData> reader,
                         ItemProcessor<InputData, OutputData> processor,
                         ItemWriter<OutputData> writer) {
    return new StepBuilder("parallelStep", jobRepository)
            .<InputData, OutputData>chunk(100, transactionManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .throttleLimit(10) // at most 10 concurrent threads
            .build();
}
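One caveat: throttleLimit(...) is deprecated as of Spring Batch 5, and SimpleAsyncTaskExecutor spawns a new thread per task. The suggested direction is to bound the executor itself; a sketch using ThreadPoolTaskExecutor (the bean name is an assumption), which you would pass to .taskExecutor(...) instead:
import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Bounded pool replacing SimpleAsyncTaskExecutor + throttleLimit(10)
@Bean
public ThreadPoolTaskExecutor batchTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(10); // at most 10 concurrent step threads
    executor.setThreadNamePrefix("batch-step-");
    return executor;
}
Also keep in mind that a multi-threaded step requires a thread-safe reader and writer; FlatFileItemReader, for example, is not thread-safe.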
3. Job Parameters
You can pass parameters to a job:
@Bean
@StepScope // required so jobParameters can be late-bound when the step runs
public FlatFileItemReader<Person> reader(@Value("#{jobParameters['inputFile']}") String inputFile) {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new FileSystemResource(inputFile))
            .delimited()
            .names("firstName", "lastName")
            .targetType(Person.class)
            .build();
}
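Launching the job then supplies the parameter; the file path below is purely illustrative:
JobParameters params = new JobParametersBuilder()
        .addString("inputFile", "data/people.csv") // hypothetical path
        .addLong("time", System.currentTimeMillis()) // keeps each run a distinct JobInstance
        .toJobParameters();
jobLauncher.run(job, params);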
4. Restart Capability
Spring Batch automatically tracks job execution state, so a failed job can be restarted and resume from where it left off. The key is to launch the job again with the same JobParameters; a successfully completed instance cannot be re-run with identical parameters:
// jobRepository and jobLauncher are injected beans; jobParameters must match the failed run
JobExecution lastExecution = jobRepository.getLastJobExecution("importUserJob", jobParameters);
if (lastExecution != null && lastExecution.getStatus() == BatchStatus.FAILED) {
    // Relaunching with identical parameters resumes from the last failed step
    jobLauncher.run(job, jobParameters);
}
Real-World Example: ETL Process
Let's explore a more practical example of an ETL (Extract, Transform, Load) process that reads sales data from a CSV file, calculates total revenue per product category, and saves the results to a database.
Step 1: Define the Data Models
First, create the input data model:
public class SalesRecord {

    private String transactionId;
    private String productId;
    private String category;
    private double amount;
    private String transactionDate;

    // Getters and setters omitted for brevity
}
Then, create the output data model:
public class CategorySummary {
    private String category;
    private double totalRevenue;
    private int transactionCount;

    // Constructor used by the aggregator in the next step
    public CategorySummary(String category, double totalRevenue, int transactionCount) {
        this.category = category;
        this.totalRevenue = totalRevenue;
        this.transactionCount = transactionCount;
    }
    // Getters and setters omitted for brevity
}
Step 2: Create a Custom Aggregator Component
Rather than a classic per-item ItemProcessor, this example uses a stateful component that accumulates totals per category; the first step's writer feeds every record into it:
import org.springframework.stereotype.Component;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Stateful aggregator shared between the two steps. It is a Spring bean so the
// same instance can be injected into step 1's writer and step 2's reader.
@Component
public class SalesSummaryProcessor {

    private final Map<String, CategorySummary> categorySummaryMap = new HashMap<>();

    public void addSalesRecord(SalesRecord record) {
        String category = record.getCategory();
        CategorySummary summary = categorySummaryMap.getOrDefault(category,
                new CategorySummary(category, 0, 0));
        summary.setTotalRevenue(summary.getTotalRevenue() + record.getAmount());
        summary.setTransactionCount(summary.getTransactionCount() + 1);
        categorySummaryMap.put(category, summary);
    }

    public Collection<CategorySummary> getSummaries() {
        return categorySummaryMap.values();
    }
}
Step 3: Create a Custom ItemReader and ItemWriter
Create a custom reader for the sales data:
@Bean
public FlatFileItemReader<SalesRecord> salesReader() {
    return new FlatFileItemReaderBuilder<SalesRecord>()
            .name("salesReader")
            .resource(new ClassPathResource("sales-data.csv"))
            .delimited()
            .names("transactionId", "productId", "category", "amount", "transactionDate")
            .targetType(SalesRecord.class)
            .build();
}
Create a custom writer that performs the aggregation:
@Bean
public ItemWriter<SalesRecord> salesAggregator(SalesSummaryProcessor processor) {
    // Step 1's "writer" only feeds records into the in-memory aggregator
    return items -> {
        for (SalesRecord item : items) {
            processor.addSalesRecord(item);
        }
    };
}
Step 4: Create a Second Step to Save the Aggregated Data
@Bean
public ItemWriter<CategorySummary> categorySummaryWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<CategorySummary>()
            .sql("INSERT INTO category_summary (category, total_revenue, transaction_count) " +
                 "VALUES (:category, :totalRevenue, :transactionCount)")
            .dataSource(dataSource)
            .beanMapped()
            .build();
}
Step 5: Create a Job That Chains These Steps Together
@Bean
public Job salesSummaryJob(JobRepository jobRepository,
                           Step processSalesStep,
                           Step saveSummaryStep) {
    return new JobBuilder("salesSummaryJob", jobRepository)
            .start(processSalesStep)
            .next(saveSummaryStep)
            .build();
}

@Bean
public Step processSalesStep(JobRepository jobRepository,
                             DataSourceTransactionManager transactionManager,
                             FlatFileItemReader<SalesRecord> salesReader,
                             ItemWriter<SalesRecord> salesAggregator) {
    return new StepBuilder("processSalesStep", jobRepository)
            .<SalesRecord, SalesRecord>chunk(100, transactionManager)
            .reader(salesReader)
            .writer(salesAggregator)
            .build();
}

// @StepScope defers creating this reader until the step actually runs, after
// step 1 has populated the aggregator; without it the list would be captured
// empty at configuration time.
@Bean
@StepScope
public ListItemReader<CategorySummary> summaryReader(SalesSummaryProcessor processor) {
    return new ListItemReader<>(new ArrayList<>(processor.getSummaries()));
}

@Bean
public Step saveSummaryStep(JobRepository jobRepository,
                            DataSourceTransactionManager transactionManager,
                            ListItemReader<CategorySummary> summaryReader,
                            ItemWriter<CategorySummary> categorySummaryWriter) {
    return new StepBuilder("saveSummaryStep", jobRepository)
            .<CategorySummary, CategorySummary>chunk(10, transactionManager)
            .reader(summaryReader)
            .writer(categorySummaryWriter)
            .build();
}
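For completeness, the input file and output table referenced above might look like the following; the CSV rows are made-up sample values, and the DDL is inferred from the column names in the writer's INSERT statement. A sales-data.csv in src/main/resources:
T0001,P100,Electronics,299.99,2023-07-01
T0002,P200,Books,19.99,2023-07-01
T0003,P101,Electronics,149.50,2023-07-02
And the matching table, added to schema.sql:
CREATE TABLE category_summary (
    category VARCHAR(255) PRIMARY KEY,
    total_revenue DOUBLE,
    transaction_count INT
);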
Summary
Spring Batch provides a powerful framework for batch processing in Java applications. In this tutorial, we've covered:
- Core concepts - Job, Step, JobRepository, ItemReader, ItemProcessor, ItemWriter
- Basic implementation - Setting up a simple CSV to database batch job
- Advanced features - Flow control, parallel processing, and restart capabilities
- Real-world example - An ETL process that aggregates sales data by category
Spring Batch is especially valuable for:
- Processing large volumes of data efficiently
- Handling errors gracefully with robust restart functionality
- Maintaining clean separation between business logic and infrastructure
- Providing detailed metrics about job execution
Additional Resources
To continue learning about Spring Batch, explore these resources:
- Official Spring Batch Documentation
- Spring Batch Sample Projects
- Building a RESTful Web Service with Spring Batch
Exercises
To solidify your understanding:
- Basic Exercise: Create a batch job that reads a list of names from a CSV file and writes them to a database table.
- Intermediate Exercise: Extend the basic example to include validation. Skip records with missing fields and log them to a separate file.
- Advanced Exercise: Create a batch job that reads from multiple sources (CSV and JSON), combines the data, and writes aggregate statistics to a database.
- Challenge Exercise: Implement a fault-tolerant batch job that can resume from the point of failure and includes retry logic for transient errors.
Spring Batch is a foundational skill for enterprise Java developers, and mastering it will enable you to build robust data processing solutions for a wide range of business needs.