Spring Batch
Introduction to Spring Batch
Spring Batch is a lightweight, comprehensive batch processing framework designed to enable the development of robust batch applications. In the modern enterprise environment, batch processing plays a critical role in handling operations that need to process large volumes of data without user interaction, such as:
- Nightly financial transactions
- Automated report generation
- ETL (Extract, Transform, Load) operations
- Data migration between systems
- Processing of large datasets
Unlike real-time processing where data is handled as it arrives, batch processing collects data over time and processes it in scheduled "batches." Spring Batch provides reusable functions that are essential for processing large volumes of data, including logging, transaction management, job processing statistics, job restart functionality, and resource management.
Why Use Spring Batch?
As a beginner, you might wonder why you should learn Spring Batch when you could write your own loops to process data. Here are some compelling reasons:
- Scalability - Spring Batch can handle millions of records through optimized processing techniques
- Resilience - Built-in support for retrying failed operations and resuming jobs
- Monitoring - Comprehensive metrics and execution statistics
- Maintenance - Separation of business logic from batch infrastructure
- Enterprise Integration - Seamless integration with other Spring projects
Spring Batch Architecture
Spring Batch follows a layered architecture with three primary components:
- Application Layer - Contains your batch jobs and custom code
- Core Layer - Provides core runtime services and APIs
- Infrastructure Layer - Handles the common readers, writers, and services
Key Components
Understanding these foundational components will help you build effective batch applications:
1. Job
A Job represents a complete batch process. It's a container for Steps and defines how the batch process should execute.
2. Step
A Step is a sequential phase of a Job that encapsulates an independent part of the batch process. A job can have multiple steps.
3. JobRepository
The JobRepository stores metadata about configured and executed batch jobs, including status, start and end times, and more.
4. JobLauncher
The JobLauncher is responsible for launching a Job with a given set of parameters.
5. Item Reader, Processor, Writer
These components handle the reading, processing, and writing of data respectively, forming a chunk-oriented processing pattern that is central to Spring Batch operations.
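Conceptually, a chunk-oriented step reads items one at a time, optionally transforms them, and writes them in groups. The sketch below is a simplified illustration of that loop, not the framework's actual implementation (which wraps it in transactions, retry, skip, and restart bookkeeping); the class and method names are ours:
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

import java.util.ArrayList;
import java.util.List;

public class ChunkLoopSketch {

    // Simplified illustration of what a chunk-oriented step does
    static <I, O> void runChunkStep(ItemReader<I> reader,
                                    ItemProcessor<I, O> processor,
                                    ItemWriter<O> writer,
                                    int chunkSize) throws Exception {
        List<O> buffer = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) { // read one item at a time
            O processed = processor.process(item); // null means "filter this item out"
            if (processed != null) {
                buffer.add(processed);
            }
            if (buffer.size() == chunkSize) {
                writer.write(new Chunk<>(buffer)); // one write (and transaction) per chunk
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            writer.write(new Chunk<>(buffer)); // flush the final partial chunk
        }
    }
}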
Getting Started with Spring Batch
Let's walk through setting up a basic Spring Batch application.
Step 1: Add Dependencies
First, add the Spring Batch dependencies to your Maven pom.xml:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>
Step 2: Configure the Database
Spring Batch requires a database to store metadata. For development, we can use an H2 in-memory database. Add these properties to application.properties
:
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=
spring.h2.console.enabled=true
# Disable Spring Boot's automatic job launch on startup; our JobRunner starts it instead
spring.batch.job.enabled=false
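A note on the metadata tables: with an embedded database such as H2, Spring Boot creates Spring Batch's metadata tables automatically at startup. Against a persistent production database you would typically control this explicitly with the property below (set it to never once the tables exist):
# Create Spring Batch's metadata tables on startup
spring.batch.jdbc.initialize-schema=always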
Step 3: Create a Batch Configuration Class
Now, let's create a basic batch job that reads data from a CSV file, processes it, and writes it to a database:
package com.example.batchdemo;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

import javax.sql.DataSource;

@Configuration
public class BatchConfiguration {

    // Reads comma-delimited lines from sample-data.csv and maps them to Person objects
    @Bean
    public FlatFileItemReader<Person> reader() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                .resource(new ClassPathResource("sample-data.csv"))
                .delimited()
                .names("firstName", "lastName")
                .targetType(Person.class)
                .build();
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    // Writes processed Person objects to the people table using named bean parameters
    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
                .dataSource(dataSource)
                .beanMapped()
                .build();
    }

    @Bean
    public Job importUserJob(JobRepository jobRepository, Step step1) {
        return new JobBuilder("importUserJob", jobRepository)
                .start(step1)
                .build();
    }

    // A single chunk-oriented step: read -> process -> write, 10 items per transaction
    @Bean
    public Step step1(JobRepository jobRepository,
                      DataSourceTransactionManager transactionManager,
                      FlatFileItemReader<Person> reader,
                      PersonItemProcessor processor,
                      JdbcBatchItemWriter<Person> writer) {
        return new StepBuilder("step1", jobRepository)
                .<Person, Person>chunk(10, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
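Optionally, you can attach a JobExecutionListener to run code when the job finishes, for example to verify the results. The listener below is a minimal illustrative sketch, not part of the required setup; register it by adding .listener(listener) to the JobBuilder chain in importUserJob:
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.stereotype.Component;

// Illustrative listener: runs after the job and reports success
@Component
public class JobCompletionListener implements JobExecutionListener {

    private static final Logger LOGGER = LoggerFactory.getLogger(JobCompletionListener.class);

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            LOGGER.info("Job finished successfully - check the people table for results");
        }
    }
}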
Step 4: Create Model and Processor Classes
Create a Person class to represent our data model:
package com.example.batchdemo;

public class Person {

    private String firstName;
    private String lastName;

    public Person() {
    }

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "Person{" +
                "firstName='" + firstName + '\'' +
                ", lastName='" + lastName + '\'' +
                '}';
    }
}
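As an aside, if you are on Java 17+ with Spring Batch 5, the model can also be a Java record; the reader's targetType(...) supports mapping delimited columns onto record components, though I would verify that the writer's beanMapped() handles records on your exact versions:
// Immutable alternative to the JavaBean above (Java 17+, Spring Batch 5)
public record Person(String firstName, String lastName) {
}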
Next, create the processor that transforms the data:
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    private static final Logger LOGGER = LoggerFactory.getLogger(PersonItemProcessor.class);

    @Override
    public Person process(Person person) throws Exception {
        final String firstName = person.getFirstName().toUpperCase();
        final String lastName = person.getLastName().toUpperCase();
        final Person transformedPerson = new Person(firstName, lastName);
        LOGGER.info("Converting ({}) into ({})", person, transformedPerson);
        return transformedPerson;
    }
}
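One behavior worth knowing: if an ItemProcessor returns null, Spring Batch filters the item out and it never reaches the writer (the step records this in its filter count). An illustrative validating processor that drops records with a missing first name:
package com.example.batchdemo;

import org.springframework.batch.item.ItemProcessor;

// Illustrative only: returning null filters the item out of the chunk
public class ValidatingPersonProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) {
        if (person.getFirstName() == null || person.getFirstName().isBlank()) {
            return null; // dropped; reflected in the step's filterCount
        }
        return person;
    }
}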
Step 5: Create a Sample Data File
Create a sample-data.csv file in your src/main/resources directory:
John,Doe
Jane,Smith
Alex,Johnson
Maria,Garcia
Robert,Brown
Step 6: Create a Schema for the Output Table
Create a schema.sql file in src/main/resources; Spring Boot runs it automatically against the embedded database on startup:
DROP TABLE IF EXISTS people;
CREATE TABLE people (
    person_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(255),
    last_name VARCHAR(255)
);
Step 7: Create a Job Runner
Finally, let's create a class to launch the job:
package com.example.batchdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class JobRunner implements CommandLineRunner {

    private static final Logger LOGGER = LoggerFactory.getLogger(JobRunner.class);

    private final JobLauncher jobLauncher;
    private final Job job;

    @Autowired
    public JobRunner(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @Override
    public void run(String... args) throws Exception {
        // The timestamp makes each run a new JobInstance; launching again with
        // identical parameters would be treated as a re-run of the same instance.
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();
        JobExecution execution = jobLauncher.run(job, jobParameters);
        LOGGER.info("Job Execution Status: {}", execution.getStatus());
    }
}
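In production, batch jobs are more often launched on a schedule than at application startup. As an alternative to the CommandLineRunner above, here is an illustrative sketch using Spring's @Scheduled (the class name and cron expression are assumptions, and @EnableScheduling must be present on a configuration class):
package com.example.batchdemo;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Illustrative scheduled launcher: runs the import job every night at 02:00
@Component
public class ScheduledJobLauncher {

    private final JobLauncher jobLauncher;
    private final Job importUserJob;

    public ScheduledJobLauncher(JobLauncher jobLauncher, Job importUserJob) {
        this.jobLauncher = jobLauncher;
        this.importUserJob = importUserJob;
    }

    @Scheduled(cron = "0 0 2 * * *")
    public void runNightly() throws Exception {
        jobLauncher.run(importUserJob, new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis()) // unique JobInstance per run
                .toJobParameters());
    }
}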
Running the Application
When you run this Spring Boot application:
- Spring Batch initializes the required metadata tables
- The job launcher executes our defined job
- The reader reads each line from sample-data.csv
- The processor converts names to uppercase
- The writer inserts the processed data into the people table
- The job completes and logs status information
Output Sample:
2023-07-15 14:32:10.123 INFO 12345 --- [main] c.e.batchdemo.PersonItemProcessor : Converting (Person{firstName='John', lastName='Doe'}) into (Person{firstName='JOHN', lastName='DOE'})
2023-07-15 14:32:10.125 INFO 12345 --- [main] c.e.batchdemo.PersonItemProcessor : Converting (Person{firstName='Jane', lastName='Smith'}) into (Person{firstName='JANE', lastName='SMITH'})
...
2023-07-15 14:32:10.142 INFO 12345 --- [main] c.e.batchdemo.JobRunner : Job Execution Status: COMPLETED
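You can also inspect the metadata Spring Batch recorded for this run. With the H2 console enabled (by default at /h2-console), the framework's standard metadata tables are queryable:
-- Job-level and step-level execution history recorded by the JobRepository
SELECT * FROM BATCH_JOB_EXECUTION;
SELECT * FROM BATCH_STEP_EXECUTION;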
Advanced Spring Batch Features
Once you understand the basics, Spring Batch offers several advanced features:
1. Job Flow Control
You can build complex job flows with conditional execution:
@Bean
public Job flowJob(JobRepository jobRepository, Step step1, Step step2, Step step3) {
    return new JobBuilder("flowJob", jobRepository)
            .start(step1)
            .on("COMPLETED").to(step2) // on() matches the step's ExitStatus pattern
            .from(step1).on("FAILED").to(step3)
            .end()
            .build();
}
2. Parallel Processing
For better performance with large datasets:
@Bean
public Step parallelStep(JobRepository jobRepository,
                         DataSourceTransactionManager transactionManager,
                         ItemReader<InputData> reader,
                         ItemProcessor<InputData, OutputData> processor,
                         ItemWriter<OutputData> writer) {
    return new StepBuilder("parallelStep", jobRepository)
            .<InputData, OutputData>chunk(100, transactionManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .throttleLimit(10) // at most 10 concurrent threads
            .build();
}
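One caveat: throttleLimit(...) is deprecated as of Spring Batch 5, and SimpleAsyncTaskExecutor spawns a new thread per task. The suggested direction is to bound the executor itself; a sketch using ThreadPoolTaskExecutor (the bean name is an assumption), which you would pass to .taskExecutor(...) instead:
import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Bounded pool replacing SimpleAsyncTaskExecutor + throttleLimit(10)
@Bean
public ThreadPoolTaskExecutor batchTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(10); // at most 10 concurrent step threads
    executor.setThreadNamePrefix("batch-step-");
    return executor;
}
Also keep in mind that a multi-threaded step requires a thread-safe reader and writer; FlatFileItemReader, for example, is not thread-safe.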
3. Job Parameters
You can pass parameters to a job:
@Bean
@StepScope // required so jobParameters can be late-bound when the step runs
public FlatFileItemReader<Person> reader(@Value("#{jobParameters['inputFile']}") String inputFile) {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new FileSystemResource(inputFile))
            .delimited()
            .names("firstName", "lastName")
            .targetType(Person.class)
            .build();
}
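Launching the job then supplies the parameter; the file path below is purely illustrative:
JobParameters params = new JobParametersBuilder()
        .addString("inputFile", "data/people.csv") // hypothetical path
        .addLong("time", System.currentTimeMillis()) // keeps each run a distinct JobInstance
        .toJobParameters();
jobLauncher.run(job, params);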
4. Restart Capability
Spring Batch automatically tracks job execution state, so a failed job can be restarted and resume from where it left off. The key is to launch the job again with the same JobParameters; a successfully completed instance cannot be re-run with identical parameters:
// jobRepository and jobLauncher are injected beans; jobParameters must match the failed run
JobExecution lastExecution = jobRepository.getLastJobExecution("importUserJob", jobParameters);
if (lastExecution != null && lastExecution.getStatus() == BatchStatus.FAILED) {
    // Relaunching with identical parameters resumes from the last failed step
    jobLauncher.run(job, jobParameters);
}
Real-World Example: ETL Process
Let's explore a more practical example of an ETL (Extract, Transform, Load) process that reads sales data from a CSV file, calculates total revenue per product category, and saves the results to a database.
Step 1: Define the Data Models
First, create the input data model:
public class SalesRecord {

    private String transactionId;
    private String productId;
    private String category;
    private double amount;
    private String transactionDate;

    // Getters and setters omitted for brevity
}
Then, create the output data model:
public class CategorySummary {
    private String category;
    private double totalRevenue;
    private int transactionCount;

    // Constructor used by the aggregator in the next step
    public CategorySummary(String category, double totalRevenue, int transactionCount) {
        this.category = category;
        this.totalRevenue = totalRevenue;
        this.transactionCount = transactionCount;
    }
    // Getters and setters omitted for brevity
}
Step 2: Create a Custom Aggregator Component
Rather than a classic per-item ItemProcessor, this example uses a stateful component that accumulates totals per category; the first step's writer feeds every record into it:
import org.springframework.stereotype.Component;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Stateful aggregator shared between the two steps. It is a Spring bean so the
// same instance can be injected into step 1's writer and step 2's reader.
@Component
public class SalesSummaryProcessor {

    private final Map<String, CategorySummary> categorySummaryMap = new HashMap<>();

    public void addSalesRecord(SalesRecord record) {
        String category = record.getCategory();
        CategorySummary summary = categorySummaryMap.getOrDefault(category,
                new CategorySummary(category, 0, 0));
        summary.setTotalRevenue(summary.getTotalRevenue() + record.getAmount());
        summary.setTransactionCount(summary.getTransactionCount() + 1);
        categorySummaryMap.put(category, summary);
    }

    public Collection<CategorySummary> getSummaries() {
        return categorySummaryMap.values();
    }
}
Step 3: Create a Custom ItemReader and ItemWriter
Create a custom reader for the sales data:
@Bean
public FlatFileItemReader<SalesRecord> salesReader() {
    return new FlatFileItemReaderBuilder<SalesRecord>()
            .name("salesReader")
            .resource(new ClassPathResource("sales-data.csv"))
            .delimited()
            .names("transactionId", "productId", "category", "amount", "transactionDate")
            .targetType(SalesRecord.class)
            .build();
}
Create a custom writer that performs the aggregation:
@Bean
public ItemWriter<SalesRecord> salesAggregator(SalesSummaryProcessor processor) {
    // Step 1's "writer" only feeds records into the in-memory aggregator
    return items -> {
        for (SalesRecord item : items) {
            processor.addSalesRecord(item);
        }
    };
}
Step 4: Create a Second Step to Save the Aggregated Data
@Bean
public ItemWriter<CategorySummary> categorySummaryWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<CategorySummary>()
            .sql("INSERT INTO category_summary (category, total_revenue, transaction_count) " +
                 "VALUES (:category, :totalRevenue, :transactionCount)")
            .dataSource(dataSource)
            .beanMapped()
            .build();
}
Step 5: Create a Job That Chains These Steps Together
@Bean
public Job salesSummaryJob(JobRepository jobRepository,
                           Step processSalesStep,
                           Step saveSummaryStep) {
    return new JobBuilder("salesSummaryJob", jobRepository)
            .start(processSalesStep)
            .next(saveSummaryStep)
            .build();
}

@Bean
public Step processSalesStep(JobRepository jobRepository,
                             DataSourceTransactionManager transactionManager,
                             FlatFileItemReader<SalesRecord> salesReader,
                             ItemWriter<SalesRecord> salesAggregator) {
    return new StepBuilder("processSalesStep", jobRepository)
            .<SalesRecord, SalesRecord>chunk(100, transactionManager)
            .reader(salesReader)
            .writer(salesAggregator)
            .build();
}

// @StepScope defers creating this reader until the step actually runs, after
// step 1 has populated the aggregator; without it the list would be captured
// empty at configuration time.
@Bean
@StepScope
public ListItemReader<CategorySummary> summaryReader(SalesSummaryProcessor processor) {
    return new ListItemReader<>(new ArrayList<>(processor.getSummaries()));
}

@Bean
public Step saveSummaryStep(JobRepository jobRepository,
                            DataSourceTransactionManager transactionManager,
                            ListItemReader<CategorySummary> summaryReader,
                            ItemWriter<CategorySummary> categorySummaryWriter) {
    return new StepBuilder("saveSummaryStep", jobRepository)
            .<CategorySummary, CategorySummary>chunk(10, transactionManager)
            .reader(summaryReader)
            .writer(categorySummaryWriter)
            .build();
}
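For completeness, the input file and output table referenced above might look like the following; the CSV rows are made-up sample values, and the DDL is inferred from the column names in the writer's INSERT statement. A sales-data.csv in src/main/resources:
T0001,P100,Electronics,299.99,2023-07-01
T0002,P200,Books,19.99,2023-07-01
T0003,P101,Electronics,149.50,2023-07-02
And the matching table, added to schema.sql:
CREATE TABLE category_summary (
    category VARCHAR(255) PRIMARY KEY,
    total_revenue DOUBLE,
    transaction_count INT
);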
Summary
Spring Batch provides a powerful framework for batch processing in Java applications. In this tutorial, we've covered:
- Core concepts - Job, Step, JobRepository, ItemReader, ItemProcessor, ItemWriter
- Basic implementation - Setting up a simple CSV to database batch job
- Advanced features - Flow control, parallel processing, and restart capabilities
- Real-world example - An ETL process that aggregates sales data by category
Spring Batch is especially valuable for:
- Processing large volumes of data efficiently
- Handling errors gracefully with robust restart functionality
- Maintaining clean separation between business logic and infrastructure
- Providing detailed metrics about job execution
Additional Resources
To continue learning about Spring Batch, explore these resources:
- Official Spring Batch Documentation
- Spring Batch Sample Projects
- Building a RESTful Web Service with Spring Batch
Exercises
To solidify your understanding:
- Basic Exercise: Create a batch job that reads a list of names from a CSV file and writes them to a database table.
- Intermediate Exercise: Extend the basic example to include validation. Skip records with missing fields and log them to a separate file.
- Advanced Exercise: Create a batch job that reads from multiple sources (CSV and JSON), combines the data, and writes aggregate statistics to a database.
- Challenge Exercise: Implement a fault-tolerant batch job that can resume from the point of failure and includes retry logic for transient errors.
Spring Batch is a foundational skill for enterprise Java developers, and mastering it will enable you to build robust data processing solutions for a wide range of business needs.