
Introduction to Spring Batch

Spring Batch is a lightweight, comprehensive batch processing framework designed to enable the development of robust batch applications. In the modern enterprise environment, batch processing plays a critical role in handling operations that need to process large volumes of data without user interaction, such as:

  • Nightly financial transactions
  • Automated report generation
  • ETL (Extract, Transform, Load) operations
  • Data migration between systems
  • Processing of large datasets

Unlike real-time processing where data is handled as it arrives, batch processing collects data over time and processes it in scheduled "batches." Spring Batch provides reusable functions that are essential for processing large volumes of data, including logging, transaction management, job processing statistics, job restart functionality, and resource management.

Why Use Spring Batch?

As a beginner, you might wonder why you should learn Spring Batch when you could write your own loops to process data. Here are some compelling reasons:

  1. Scalability - Spring Batch can handle millions of records through optimized processing techniques
  2. Resilience - Built-in support for retrying failed operations and resuming jobs
  3. Monitoring - Comprehensive metrics and execution statistics
  4. Maintenance - Separation of business logic from batch infrastructure
  5. Enterprise Integration - Seamless integration with other Spring projects

Spring Batch Architecture

Spring Batch follows a layered architecture with three primary components:

  1. Application Layer - Contains your batch jobs and custom code
  2. Core Layer - Provides core runtime services and APIs
  3. Infrastructure Layer - Handles the common readers, writers, and services

Key Components

Understanding these foundational components will help you build effective batch applications:

1. Job

A Job represents a complete batch process. It's a container for Steps and defines how the batch process should execute.

2. Step

A Step is a sequential phase of a Job that encapsulates an independent part of the batch process. A job can have multiple steps.

3. JobRepository

The JobRepository persists metadata about configured and executed batch jobs, including status, start and end times, and execution counts, in database tables such as BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, and BATCH_STEP_EXECUTION.

4. JobLauncher

The JobLauncher is responsible for launching a Job with a given set of parameters.

5. Item Reader, Processor, Writer

These components handle the reading, processing, and writing of data respectively, forming a chunk-oriented processing pattern that is central to Spring Batch operations.
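
Conceptually, a chunk-oriented step reads items one at a time, optionally transforms each one, and hands them to the writer in groups (chunks), with one transaction per chunk. The following is a simplified sketch of that loop, not Spring Batch's actual implementation, which adds transaction handling, retry/skip logic, and listeners around each chunk:

java
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

public class ChunkLoopSketch {

    static <I, O> void chunkLoop(ItemReader<I> reader, ItemProcessor<I, O> processor,
                                 ItemWriter<O> writer, int chunkSize) throws Exception {
        List<O> buffer = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {      // read() returns null at end of input
            O processed = processor.process(item);    // returning null filters the item out
            if (processed != null) {
                buffer.add(processed);
            }
            if (buffer.size() == chunkSize) {
                writer.write(new Chunk<>(buffer));    // in the real framework: one transaction per chunk
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            writer.write(new Chunk<>(buffer));        // flush the final partial chunk
        }
    }
}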

Getting Started with Spring Batch

Let's walk through setting up a basic Spring Batch application.

Step 1: Add Dependencies

First, add Spring Batch dependencies to your Maven pom.xml. The batch starter already pulls in JDBC support, so it plus an embedded database for the metadata store is all we need:

xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
    </dependency>
    <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>

Step 2: Configure the Database

Spring Batch requires a database to store metadata. For development, we can use an H2 in-memory database. Add these properties to application.properties:

properties
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=
spring.h2.console.enabled=true

# Disable running jobs automatically at startup; we launch the job ourselves in Step 7
spring.batch.job.enabled=false
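
With an embedded database like H2, Spring Boot also creates the Spring Batch metadata tables (BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_STEP_EXECUTION, and so on) automatically. Against a persistent database you would opt in explicitly, for example:

properties
# Create the BATCH_* metadata tables on startup
# (the default, "embedded", only does this for embedded databases)
spring.batch.jdbc.initialize-schema=always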

Step 3: Create a Batch Configuration Class

Now, let's create a basic batch job that reads data from a CSV file, processes it, and writes it to a database:

java
package com.example.batchdemo;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

import javax.sql.DataSource;

@Configuration
public class BatchConfiguration {

    @Bean
    public FlatFileItemReader<Person> reader() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                .resource(new ClassPathResource("sample-data.csv"))
                .delimited()
                .names("firstName", "lastName")
                .targetType(Person.class)
                .build();
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
                .dataSource(dataSource)
                .beanMapped()
                .build();
    }

    @Bean
    public Job importUserJob(JobRepository jobRepository, Step step1) {
        return new JobBuilder("importUserJob", jobRepository)
                .start(step1)
                .build();
    }

    @Bean
    public Step step1(JobRepository jobRepository,
                      DataSourceTransactionManager transactionManager,
                      FlatFileItemReader<Person> reader,
                      PersonItemProcessor processor,
                      JdbcBatchItemWriter<Person> writer) {
        return new StepBuilder("step1", jobRepository)
                .<Person, Person>chunk(10, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}

Step 4: Create Model and Processor Classes

Create a Person class to represent our data model:

java
package com.example.batchdemo;

public class Person {

    private String firstName;
    private String lastName;

    public Person() {
    }

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "Person{" +
                "firstName='" + firstName + '\'' +
                ", lastName='" + lastName + '\'' +
                '}';
    }
}

Next, create the processor that transforms the data. The ItemProcessor receives each item from the reader and returns the object that will be handed to the writer:

java
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    private static final Logger LOGGER = LoggerFactory.getLogger(PersonItemProcessor.class);

    @Override
    public Person process(Person person) throws Exception {
        final String firstName = person.getFirstName().toUpperCase();
        final String lastName = person.getLastName().toUpperCase();

        final Person transformedPerson = new Person(firstName, lastName);

        LOGGER.info("Converting ({}) into ({})", person, transformedPerson);

        return transformedPerson;
    }
}
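
One behavior worth knowing: returning null from process() filters the item out of the chunk, so it never reaches the writer. A hypothetical validating variant of the method might look like this (the missing-name rule is our own example, not part of the original job):

java
@Override
public Person process(Person person) {
    // Hypothetical rule: silently drop records with no first name.
    // Returning null filters the item and increments the step's filter count.
    if (person.getFirstName() == null || person.getFirstName().isBlank()) {
        return null;
    }
    return new Person(person.getFirstName().toUpperCase(),
            person.getLastName().toUpperCase());
}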

Step 5: Create a Sample Data File

Create a sample-data.csv file in your src/main/resources directory:

John,Doe
Jane,Smith
Alex,Johnson
Maria,Garcia
Robert,Brown

Step 6: Create a Schema for the Output Table

Create a schema.sql file in src/main/resources. Spring Boot executes it automatically at startup when an embedded database is in use:

sql
DROP TABLE IF EXISTS people;

CREATE TABLE people (
    person_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(255),
    last_name VARCHAR(255)
);

Step 7: Create a Job Runner

Finally, let's create a class to launch the job. We add the current time as a job parameter because a job instance is identified by its parameters, and Spring Batch will not re-run an instance that has already completed successfully; a fresh timestamp gives every run its own instance:

java
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class JobRunner implements CommandLineRunner {

    private static final Logger LOGGER = LoggerFactory.getLogger(JobRunner.class);

    private final JobLauncher jobLauncher;
    private final Job job;

    @Autowired
    public JobRunner(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @Override
    public void run(String... args) throws Exception {
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();

        JobExecution execution = jobLauncher.run(job, jobParameters);

        LOGGER.info("Job Execution Status: {}", execution.getStatus());
    }
}

Running the Application

When you run this Spring Boot application (for example with ./mvnw spring-boot:run):

  1. Spring Batch initializes required database tables
  2. The job launcher executes our defined job
  3. The reader reads each line from sample-data.csv
  4. The processor converts names to uppercase
  5. The writer inserts the processed data into the people table
  6. The job completes and logs status information

Output Sample:

2023-07-15 14:32:10.123  INFO 12345 --- [main] c.e.batchdemo.PersonItemProcessor : Converting (Person{firstName='John', lastName='Doe'}) into (Person{firstName='JOHN', lastName='DOE'})
2023-07-15 14:32:10.125  INFO 12345 --- [main] c.e.batchdemo.PersonItemProcessor : Converting (Person{firstName='Jane', lastName='Smith'}) into (Person{firstName='JANE', lastName='SMITH'})
...
2023-07-15 14:32:10.142  INFO 12345 --- [main] c.e.batchdemo.JobRunner : Job Execution Status: COMPLETED

Advanced Spring Batch Features

Once you understand the basics, Spring Batch offers several advanced features:

1. Job Flow Control

You can build complex job flows with conditional execution, where each transition is chosen by matching the finished step's exit status:

java
@Bean
public Job flowJob(JobRepository jobRepository, Step step1, Step step2, Step step3) {
    return new JobBuilder("flowJob", jobRepository)
            .start(step1)
            .on("COMPLETED").to(step2)
            .from(step1).on("FAILED").to(step3)
            .end()
            .build();
}
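
The strings passed to on(...) are matched against the step's ExitStatus, and wildcards such as "*" are allowed. When a routing decision cannot be expressed as an exit-status pattern, you can place a JobExecutionDecider in the flow instead. A minimal sketch (the DATA_WRITTEN and EMPTY statuses are invented for illustration):

java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Routes the flow based on how many items the previous step wrote
// (assumes the decider is placed directly after a step, so stepExecution is non-null)
public class WriteCountDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        return stepExecution.getWriteCount() > 0
                ? new FlowExecutionStatus("DATA_WRITTEN")
                : new FlowExecutionStatus("EMPTY");
    }
}

A decider is wired into the flow much like a step, e.g. .start(step1).next(decider).on("DATA_WRITTEN").to(step2).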

2. Parallel Processing

For better performance with large datasets, a step can execute chunks on multiple threads. Note that the reader must then be thread-safe, and items are no longer processed in a predictable order:

java
@Bean
public Step parallelStep(JobRepository jobRepository,
                         DataSourceTransactionManager transactionManager,
                         ItemReader<InputData> reader,
                         ItemProcessor<InputData, OutputData> processor,
                         ItemWriter<OutputData> writer) {
    return new StepBuilder("parallelStep", jobRepository)
            .<InputData, OutputData>chunk(100, transactionManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .throttleLimit(10) // at most 10 concurrent chunk executions
            .build();
}
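
Two caveats apply here: SimpleAsyncTaskExecutor creates a new thread for every task, and throttleLimit is deprecated as of Spring Batch 5 in favor of bounding the executor itself. A pooled org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor is usually the safer choice; a sketch with illustrative pool sizes:

java
@Bean
public ThreadPoolTaskExecutor batchTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);      // threads kept ready
    executor.setMaxPoolSize(10);      // hard upper bound on concurrency
    executor.setQueueCapacity(25);    // tasks queue here before the pool grows past core size
    executor.setThreadNamePrefix("batch-");
    return executor;
}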

3. Job Parameters

You can pass parameters to a job and inject them into step-scoped components:

java
@Bean
@StepScope // deferred until the step runs, so the jobParameters expression can be resolved
public FlatFileItemReader<Person> reader(@Value("#{jobParameters['inputFile']}") String inputFile) {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new FileSystemResource(inputFile))
            .delimited()
            .names("firstName", "lastName")
            .targetType(Person.class)
            .build();
}
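
The parameter is then supplied at launch time, in the same way the JobRunner above passed a timestamp (the file path here is just a placeholder):

java
JobParameters params = new JobParametersBuilder()
        .addString("inputFile", "data/people.csv")   // consumed by the step-scoped reader
        .addLong("time", System.currentTimeMillis()) // keeps each run's job instance unique
        .toJobParameters();

jobLauncher.run(job, params);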

4. Restart Capability

Spring Batch automatically tracks job execution state, allowing failed jobs to be restarted from where they left off:

java
JobExecution lastExecution = jobRepository.getLastJobExecution("importUserJob", jobParameters);
if (lastExecution != null && lastExecution.getStatus() == BatchStatus.FAILED) {
    // Relaunching with the same parameters resumes the failed execution,
    // skipping steps that already completed
    jobLauncher.run(job, jobParameters);
}

Real-World Example: ETL Process

Let's explore a more practical example of an ETL (Extract, Transform, Load) process that reads sales data from a CSV file, calculates total revenue per product category, and saves the results to a database.

Step 1: Define the Data Models

First, create the input data model:

java
public class SalesRecord {
    private String transactionId;
    private String productId;
    private String category;
    private double amount;
    private String transactionDate;

    // Getters and setters
}

Then, create the output data model:

java
public class CategorySummary {
    private String category;
    private double totalRevenue;
    private int transactionCount;

    public CategorySummary(String category, double totalRevenue, int transactionCount) {
        this.category = category;
        this.totalRevenue = totalRevenue;
        this.transactionCount = transactionCount;
    }

    // Getters and setters
}

Step 2: Create a Custom Aggregator

This stateful bean accumulates revenue and transaction counts per category; the first step's writer feeds every record into it, and a later step reads the summaries back out:

java
@Component
public class SalesSummaryProcessor {

    private final Map<String, CategorySummary> categorySummaryMap = new HashMap<>();

    public void addSalesRecord(SalesRecord record) {
        String category = record.getCategory();
        CategorySummary summary = categorySummaryMap.getOrDefault(category,
                new CategorySummary(category, 0, 0));

        summary.setTotalRevenue(summary.getTotalRevenue() + record.getAmount());
        summary.setTransactionCount(summary.getTransactionCount() + 1);

        categorySummaryMap.put(category, summary);
    }

    public Collection<CategorySummary> getSummaries() {
        return categorySummaryMap.values();
    }
}

Step 3: Create a Custom ItemReader and ItemWriter

Create a custom reader for the sales data:

java
@Bean
public FlatFileItemReader<SalesRecord> salesReader() {
    return new FlatFileItemReaderBuilder<SalesRecord>()
            .name("salesReader")
            .resource(new ClassPathResource("sales-data.csv"))
            .delimited()
            .names("transactionId", "productId", "category", "amount", "transactionDate")
            .targetType(SalesRecord.class)
            .build();
}

Create a custom writer that performs the aggregation:

java
@Bean
public ItemWriter<SalesRecord> salesAggregator(SalesSummaryProcessor processor) {
    // Rather than writing to a datastore, this writer folds each chunk
    // into the aggregator's in-memory summaries
    return items -> {
        for (SalesRecord item : items) {
            processor.addSalesRecord(item);
        }
    };
}

Step 4: Create a Writer to Save the Aggregated Data

java
@Bean
public ItemWriter<CategorySummary> categorySummaryWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<CategorySummary>()
            .sql("INSERT INTO category_summary (category, total_revenue, transaction_count) " +
                    "VALUES (:category, :totalRevenue, :transactionCount)")
            .dataSource(dataSource)
            .beanMapped()
            .build();
}
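
This writer assumes a category_summary table exists. One possible definition to add to the earlier schema.sql (the column types are our assumption):

sql
CREATE TABLE category_summary (
    category VARCHAR(255) PRIMARY KEY,
    total_revenue DOUBLE,
    transaction_count INT
);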

Step 5: Create a Job That Chains These Steps Together

java
@Bean
public Job salesSummaryJob(JobRepository jobRepository,
                           Step processSalesStep,
                           Step saveSummaryStep) {
    return new JobBuilder("salesSummaryJob", jobRepository)
            .start(processSalesStep)
            .next(saveSummaryStep)
            .build();
}

@Bean
public Step processSalesStep(JobRepository jobRepository,
                             DataSourceTransactionManager transactionManager,
                             FlatFileItemReader<SalesRecord> salesReader,
                             ItemWriter<SalesRecord> salesAggregator) {
    return new StepBuilder("processSalesStep", jobRepository)
            .<SalesRecord, SalesRecord>chunk(100, transactionManager)
            .reader(salesReader)
            .writer(salesAggregator)
            .build();
}

@Bean
public Step saveSummaryStep(JobRepository jobRepository,
                            DataSourceTransactionManager transactionManager,
                            ItemReader<CategorySummary> summaryReader,
                            ItemWriter<CategorySummary> categorySummaryWriter) {
    return new StepBuilder("saveSummaryStep", jobRepository)
            .<CategorySummary, CategorySummary>chunk(10, transactionManager)
            .reader(summaryReader)
            .writer(categorySummaryWriter)
            .build();
}

@Bean
@StepScope
public ItemReader<CategorySummary> summaryReader(SalesSummaryProcessor processor) {
    // Step-scoped so the snapshot is taken when this step starts, i.e. after
    // processSalesStep has finished aggregating, not at application startup
    return new ListItemReader<>(new ArrayList<>(processor.getSummaries()));
}

Note that the summary reader must be step-scoped: a plain singleton ListItemReader would be built at startup, before any sales had been aggregated. Also be aware that the aggregator itself is a singleton, so its map carries over between job runs in the same application context; clear it first if the job can run more than once per JVM.

Summary

Spring Batch provides a powerful framework for batch processing in Java applications. In this tutorial, we've covered:

  1. Core concepts - Job, Step, JobRepository, ItemReader, ItemProcessor, ItemWriter
  2. Basic implementation - Setting up a simple CSV to database batch job
  3. Advanced features - Flow control, parallel processing, and restart capabilities
  4. Real-world example - An ETL process that aggregates sales data by category

Spring Batch is especially valuable for:

  • Processing large volumes of data efficiently
  • Handling errors gracefully with robust restart functionality
  • Maintaining clean separation between business logic and infrastructure
  • Providing detailed metrics about job execution

Additional Resources

To continue learning about Spring Batch, explore these resources:

  1. Official Spring Batch Documentation
  2. Spring Batch Sample Projects
  3. Building a RESTful Web Service with Spring Batch

Exercises

To solidify your understanding:

  1. Basic Exercise: Create a batch job that reads a list of names from a CSV file and writes them to a database table.

  2. Intermediate Exercise: Extend the basic example to include validation. Skip records with missing fields and log them to a separate file.

  3. Advanced Exercise: Create a batch job that reads from multiple sources (CSV and JSON), combines the data, and writes aggregate statistics to a database.

  4. Challenge Exercise: Implement a fault-tolerant batch job that can resume from the point of failure and includes retry logic for transient errors.

Spring Batch is a foundational skill for enterprise Java developers, and mastering it will enable you to build robust data processing solutions for a wide range of business needs.


