Unlike CSV files, XML files are structured and contain enough metadata (tags and attributes) to uniquely identify each field in the document. XML documents are generally processed with either DOM or SAX parsers, but Spring Batch uses a StAX parser.
Like SAX, StAX is a streaming, event-based parser. Unlike SAX, it is pull-based: the application asks the parser for the next event instead of receiving callbacks. This makes it easy to parse sections of an XML document independently. For example, we can parse a complete ‘<person>…</person>’ element (also called a fragment) in one go and process it. This is exactly what Spring Batch does: it reads and processes one complete record at a time.
So with Spring Batch, each time the specified fragment is found in an XML file, it is treated as a single record and converted into an item to be processed.
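To make the fragment idea concrete, here is a minimal JDK-only sketch using the standard javax.xml.stream (StAX) API. It is not Spring Batch's own code. It pulls each <person> element out of a <people> document as one unit and collects its child elements into a map, one map per record. It assumes flat, text-only child elements, and the StaxFragmentDemo class and its methods are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxFragmentDemo {

    // Pulls every <fragmentName> element out of the document and returns its
    // child elements as a name -> text map (one map per record).
    static List<Map<String, String>> readFragments(String xml, String fragmentName)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Map<String, String>> records = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(fragmentName)) {
                    records.add(readChildren(reader, fragmentName));
                }
            }
        } finally {
            reader.close();
        }
        return records;
    }

    // Consumes events up to the fragment's own end tag.
    // Assumption: children are flat, text-only elements.
    private static Map<String, String> readChildren(XMLStreamReader reader, String fragmentName)
            throws XMLStreamException {
        Map<String, String> fields = new LinkedHashMap<>();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                // getElementText() leaves the cursor on the child's END_ELEMENT
                fields.put(name, reader.getElementText());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals(fragmentName)) {
                break; // end of this record
            }
        }
        return fields;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<people>"
                + "<person><firstName>Lokesh</firstName><lastName>Gupta</lastName>"
                + "<age>41</age><active>true</active></person>"
                + "<person><firstName>Brian</firstName><lastName>Schultz</lastName>"
                + "<age>42</age><active>false</active></person>"
                + "</people>";
        for (Map<String, String> record : readFragments(xml, "person")) {
            System.out.println(record);
        }
    }
}
```

Whitespace between elements only produces character events, which the loops ignore. The parser never needs the whole document in memory, only the current fragment.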
1. XML File
Before diving into the code, let’s look at the person.xml file. It contains four fragments, i.e., four ‘person’ records.
<people>
    <person>
        <firstName>Lokesh</firstName>
        <lastName>Gupta</lastName>
        <age>41</age>
        <active>true</active>
    </person>
    ....
    ....
</people>
2. Maven
In the project, we add the Spring Batch starter and the H2 database dependency. The database backs the job repository, which stores the execution metadata of jobs and steps.
<!-- Batch Setup -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
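Additionally, Spring OXM is added for its pre-built Jaxb2Marshaller class, together with the Jakarta XML Binding API and a JAXB implementation. Spring Batch is not picky about the XML binding technology you choose; it works with any Spring OXM Marshaller/Unmarshaller. Older Spring versions shipped implementations for Castor, JAXB, JiBX, XMLBeans, and XStream, while Spring Framework 6 provides only the JAXB and XStream ones. For this tutorial, we are using JAXB.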
<!-- XML Marshalling and Unmarshalling -->
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-oxm</artifactId>
</dependency>
<dependency>
    <groupId>jakarta.xml.bind</groupId>
    <artifactId>jakarta.xml.bind-api</artifactId>
    <version>4.0.2</version>
</dependency>
<dependency>
    <groupId>com.sun.xml.bind</groupId>
    <artifactId>jaxb-impl</artifactId>
    <version>4.0.5</version>
    <scope>runtime</scope>
</dependency>
3. JAXB Setup
First, we annotate the model class with JAXB annotations. In this tutorial, the Person class stores the information of a person. The class can be made more complex by using nested fields/collections and the corresponding JAXB annotations.
Notice the @XmlRootElement annotation, which maps the Person class to an XML root element. Since no name is specified, JAXB derives the element name from the class name, i.e., ‘person’, which matches the fragment name in the input file.
import jakarta.xml.bind.annotation.XmlRootElement;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
@XmlRootElement
public class Person {
    // JAXB binds the public getters/setters that Lombok's @Data generates
    private String firstName;
    private String lastName;
    private Integer age;
    private Boolean active;
}
Next, we create a Jaxb2Marshaller bean, which builds the JAXB context, and register the Person class with it.
@Bean
public Jaxb2Marshaller jaxb2Marshaller() {
    Jaxb2Marshaller jaxb2Marshaller = new Jaxb2Marshaller();
    jaxb2Marshaller.setClassesToBeBound(Person.class);
    return jaxb2Marshaller;
}
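For intuition about what the marshaller contributes, the following JDK-only sketch binds one <person> fragment to a typed object by hand, converting the element text into Integer and Boolean fields. JAXB does not work this way internally; it derives the mapping from the annotations. The ManualPersonBinding class and its Person record are hypothetical stand-ins for the Lombok class above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ManualPersonBinding {

    // Hypothetical stand-in for the JAXB-annotated Person class.
    record Person(String firstName, String lastName, Integer age, Boolean active) {}

    // Binds a single <person> fragment to a Person, converting element text
    // to each field's type: the job the configured Unmarshaller does for us.
    static Person unmarshal(String fragment) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(fragment));
        String firstName = null;
        String lastName = null;
        Integer age = null;
        Boolean active = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // skip text, end tags, etc.
                }
                switch (reader.getLocalName()) {
                    case "firstName" -> firstName = reader.getElementText();
                    case "lastName" -> lastName = reader.getElementText();
                    case "age" -> age = Integer.valueOf(reader.getElementText().trim());
                    case "active" -> active = Boolean.valueOf(reader.getElementText().trim());
                    default -> { } // the <person> wrapper itself, or unknown elements
                }
            }
        } finally {
            reader.close();
        }
        return new Person(firstName, lastName, age, active);
    }

    public static void main(String[] args) throws XMLStreamException {
        String fragment = "<person><firstName>Lokesh</firstName><lastName>Gupta</lastName>"
                + "<age>41</age><active>true</active></person>";
        System.out.println(unmarshal(fragment));
    }
}
```

Hand-written binding like this must be updated for every new field, which is why the tutorial delegates the work to JAXB and only registers the annotated class with the marshaller.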
4. Reading XML using StaxEventItemReader
After setting up the dependencies and JAXB, we can create the StaxEventItemReader bean, provided by Spring Batch, which is responsible for reading the XML fragments. Note that this class is not thread-safe.
The ‘xmlFile’ field is the input XML file reference, ‘person’ is the fragment root element name, and jaxb2Marshaller converts each fragment into a Person object.
@Value("classpath:person.xml")
private Resource xmlFile;

@Bean
@StepScope
public StaxEventItemReader<Person> personXmlFileReader() {
    return new StaxEventItemReaderBuilder<Person>()
        .name("personXmlFileReader")
        .resource(xmlFile)
        .addFragmentRootElements("person")
        .unmarshaller(jaxb2Marshaller())
        .build();
}
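Because StaxEventItemReader is not thread-safe, a single instance should not be shared by several threads (for example, in a multi-threaded step) without guarding it. Spring Batch offers SynchronizedItemStreamReader as a delegating wrapper for this purpose. The plain-Java sketch below only illustrates the underlying idea of serializing read calls on a stateful reader. It uses java.util.function.Supplier as a hypothetical stand-in for ItemReader, and all class names are made up.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class SynchronizedReaderSketch {

    // Hypothetical stateful reader that is NOT thread-safe: two threads can read
    // the same index, so an item can be returned twice (or get() can even throw).
    static class UnsafeListReader<T> implements Supplier<T> {
        private final List<T> items;
        private int index;

        UnsafeListReader(List<T> items) {
            this.items = items;
        }

        @Override
        public T get() {
            return index < items.size() ? items.get(index++) : null; // null = end of input
        }
    }

    // Delegating wrapper that serializes every read; same idea as Spring Batch's
    // SynchronizedItemStreamReader, but this is not that class.
    static class SynchronizedReader<T> implements Supplier<T> {
        private final Supplier<T> delegate;

        SynchronizedReader(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized T get() {
            return delegate.get();
        }
    }

    // Drains the reader from several threads; returns {total reads, distinct items}.
    static int[] drainConcurrently(Supplier<Integer> reader, int threads)
            throws InterruptedException {
        Set<Integer> distinct = ConcurrentHashMap.newKeySet();
        AtomicInteger reads = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                Integer item;
                while ((item = reader.get()) != null) {
                    reads.incrementAndGet();
                    distinct.add(item);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new int[] {reads.get(), distinct.size()};
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> items = IntStream.range(0, 100_000).boxed().toList();
        int[] result = drainConcurrently(
                new SynchronizedReader<>(new UnsafeListReader<>(items)), 8);
        System.out.println("reads=" + result[0] + ", distinct=" + result[1]);
    }
}
```

With the synchronized wrapper, every item is handed out exactly once, so both counts equal the input size.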
5. Writing XML using StaxEventItemWriter
Similar to the item reader, we can configure the XML item writer, StaxEventItemWriter, if the project needs XML output; otherwise, any other ItemWriter that fits the project’s requirements can be used. The rootTagName setting defines the root element (‘people’) that wraps all written fragments, and overwriteOutput(true) replaces an existing output file. Note that the output file must be a WritableResource instance.
@Bean
public StaxEventItemWriter<Person> personXmlFileWriter(ResourceLoader resourceLoader) {
    WritableResource outputXml = (WritableResource)
        resourceLoader.getResource("file:output-person.xml");

    return new StaxEventItemWriterBuilder<Person>()
        .name("personXmlFileWriter")
        .marshaller(jaxb2Marshaller())
        .resource(outputXml)
        .rootTagName("people")
        .overwriteOutput(true)
        .build();
}
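The writer configuration above produces one <people> root element wrapping one <person> fragment per item. As a rough JDK-only illustration of that output shape, this sketch writes the same structure with javax.xml.stream.XMLStreamWriter. StaxEventItemWriter itself marshals each item through the configured Jaxb2Marshaller instead of writing fields by hand. The StaxWriteDemo class and its Person record are hypothetical.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxWriteDemo {

    // Hypothetical stand-in for the Person model.
    record Person(String firstName, String lastName, int age, boolean active) {}

    // Writes the items as <person> fragments inside a single <people> root element.
    static String write(List<Person> people) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("people");          // the root tag
        for (Person p : people) {
            writer.writeStartElement("person");      // one fragment per item
            writeField(writer, "firstName", p.firstName());
            writeField(writer, "lastName", p.lastName());
            writeField(writer, "age", String.valueOf(p.age()));
            writeField(writer, "active", String.valueOf(p.active()));
            writer.writeEndElement();                // </person>
        }
        writer.writeEndElement();                    // </people>
        writer.writeEndDocument();
        writer.flush();
        writer.close();
        return out.toString();
    }

    private static void writeField(XMLStreamWriter writer, String name, String value)
            throws XMLStreamException {
        writer.writeStartElement(name);
        writer.writeCharacters(value); // escapes special characters such as & and <
        writer.writeEndElement();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write(List.of(
                new Person("Lokesh", "Gupta", 41, true),
                new Person("Brian", "Schultz", 42, false))));
    }
}
```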
6. Batch Job and Step Configuration
After the XML item reader and writer have been set up, we can create the Step with the desired flow (reader, processor, writer) and then create a Job with this step. Here, the chunk size is 1, so each item is read, processed, and written in its own transaction.
@Bean
Job job(Step step1, JobRepository jobRepository) {
    var builder = new JobBuilder("job", jobRepository);
    return builder
        .start(step1)
        .build();
}

@Bean
public Step step1(StaxEventItemReader<Person> reader,
                  StaxEventItemWriter<Person> writer,
                  JobRepository jobRepository,
                  PlatformTransactionManager txManager) {
    var builder = new StepBuilder("step1", jobRepository);
    return builder
        .<Person, Person>chunk(1, txManager)
        .reader(reader)
        .processor(new LoggingPersonProcessor())
        .writer(writer)
        .build();
}
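The step above is chunk-oriented: items are read one at a time, passed through the processor, and written in groups of up to chunk-size items. The following plain-Java sketch models that loop under simplifying assumptions: no transactions, skips, retries, or restart state. The reader is a Supplier that returns null at the end of input, mirroring the ItemReader contract, and a null result from the processor filters the item out, as with ItemProcessor. The ChunkLoopSketch class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class ChunkLoopSketch {

    static <I, O> void runStep(Supplier<I> reader, Function<I, O> processor,
                               Consumer<List<O>> writer, int chunkSize) {
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read up to chunkSize items; null signals the end of input.
            List<I> inputs = new ArrayList<>(chunkSize);
            while (inputs.size() < chunkSize) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            // 2. Process every item; a null result filters the item out.
            List<O> outputs = new ArrayList<>(inputs.size());
            for (I item : inputs) {
                O result = processor.apply(item);
                if (result != null) {
                    outputs.add(result);
                }
            }
            // 3. Write the whole chunk in one call (one transaction per chunk in
            //    Spring Batch). This sketch skips the call when nothing is left.
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
            }
        }
    }

    public static void main(String[] args) {
        Iterator<String> names = List.of("Lokesh", "Brian", "John", "Albert").iterator();
        Supplier<String> reader = () -> names.hasNext() ? names.next() : null;
        ChunkLoopSketch.<String, String>runStep(reader,
                name -> {
                    System.out.println("Processing " + name);
                    return name;
                },
                chunk -> System.out.println("Writing " + chunk),
                1);
    }
}
```

With a chunk size of 1, as configured in this tutorial, every item is written (and, in Spring Batch, committed) on its own; larger chunk sizes trade per-item commits for fewer, larger transactions.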
7. Complete Example
For reference, here is the complete configuration class with all the bean definitions.
package com.howtodoinjava.demo.xmlFileReaderWriter;

import com.howtodoinjava.demo.model.Person;
import com.howtodoinjava.demo.processor.LoggingPersonProcessor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.batch.item.xml.StaxEventItemWriter;
import org.springframework.batch.item.xml.builder.StaxEventItemReaderBuilder;
import org.springframework.batch.item.xml.builder.StaxEventItemWriterBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;
import org.springframework.core.io.WritableResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;
import org.springframework.transaction.PlatformTransactionManager;

@SpringBootApplication
@Configuration // redundant: @SpringBootApplication already implies @Configuration
public class XmlReaderWriterJobConfig {

    @Value("classpath:person.xml")
    private Resource xmlFile;

    @Bean
    Job job(Step step1, JobRepository jobRepository) {
        var builder = new JobBuilder("job", jobRepository);
        return builder
            .start(step1)
            .build();
    }

    @Bean
    public Step step1(StaxEventItemReader<Person> reader,
                      StaxEventItemWriter<Person> writer,
                      JobRepository jobRepository,
                      PlatformTransactionManager txManager) {
        var builder = new StepBuilder("step1", jobRepository);
        return builder
            .<Person, Person>chunk(1, txManager)
            .reader(reader)
            .processor(new LoggingPersonProcessor())
            .writer(writer)
            .build();
    }

    @Bean
    @StepScope
    public StaxEventItemReader<Person> personXmlFileReader() {
        return new StaxEventItemReaderBuilder<Person>()
            .name("personXmlFileReader")
            .resource(xmlFile)
            .addFragmentRootElements("person")
            .unmarshaller(jaxb2Marshaller())
            .build();
    }

    @Bean
    public StaxEventItemWriter<Person> personXmlFileWriter(ResourceLoader resourceLoader) {
        WritableResource outputXml = (WritableResource)
            resourceLoader.getResource("file:output-person.xml");

        return new StaxEventItemWriterBuilder<Person>()
            .name("personXmlFileWriter")
            .marshaller(jaxb2Marshaller())
            .resource(outputXml)
            .rootTagName("people")
            .overwriteOutput(true)
            .build();
    }

    @Bean
    public Jaxb2Marshaller jaxb2Marshaller() {
        Jaxb2Marshaller jaxb2Marshaller = new Jaxb2Marshaller();
        jaxb2Marshaller.setClassesToBeBound(Person.class);
        return jaxb2Marshaller;
    }

    public static void main(String[] args) {
        SpringApplication.run(XmlReaderWriterJobConfig.class, args);
    }
}
8. Run the Job
Spring Boot automatically runs the registered Job at application startup unless we explicitly disable this by setting the ‘spring.batch.job.enabled’ property to ‘false’. (In Spring Boot 3, if the context contains more than one Job, the one to run must be selected with the ‘spring.batch.job.name’ property.) So to test the code, we simply start the application and watch the logs.
@Configuration
@SpringBootApplication
public class XmlReaderWriterJobConfig {

    //...

    public static void main(String[] args) {
        SpringApplication.run(XmlReaderWriterJobConfig.class, args);
    }
}
The program logs:
2024-06-27T00:01:54.484+05:30 INFO 2028 --- [ main] c.h.d.x.XmlReaderWriterJobConfig : Started XmlReaderWriterJobConfig in 2.898 seconds (process running for 3.304)
2024-06-27T00:01:54.487+05:30 INFO 2028 --- [ main] o.s.b.a.b.JobLauncherApplicationRunner : Running default command line with: []
2024-06-27T00:01:54.557+05:30 INFO 2028 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job]] launched with the following parameters: [{}]
2024-06-27T00:01:54.582+05:30 INFO 2028 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step1]
2024-06-27T00:01:54.641+05:30 INFO 2028 --- [ main] c.h.d.processor.LoggingPersonProcessor : Processing person information: Person(firstName=Lokesh, lastName=Gupta, age=41, active=true)
2024-06-27T00:01:54.658+05:30 INFO 2028 --- [ main] c.h.d.processor.LoggingPersonProcessor : Processing person information: Person(firstName=Brian, lastName=Schultz, age=42, active=false)
2024-06-27T00:01:54.663+05:30 INFO 2028 --- [ main] c.h.d.processor.LoggingPersonProcessor : Processing person information: Person(firstName=John, lastName=Cena, age=43, active=true)
2024-06-27T00:01:54.667+05:30 INFO 2028 --- [ main] c.h.d.processor.LoggingPersonProcessor : Processing person information: Person(firstName=Albert, lastName=Pinto, age=44, active=false)
2024-06-27T00:01:54.673+05:30 INFO 2028 --- [ main] o.s.batch.core.step.AbstractStep : Step: [step1] executed in 91ms
2024-06-27T00:01:54.680+05:30 INFO 2028 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job]] completed with the following parameters: [{}] and the following status: [COMPLETED] in 113ms
9. Summary
In this Spring Batch tutorial, we learned to create item reader and item writer objects for reading and writing XML files in a batch application. We learned how to configure the StAX-based reader and writer and JAXB for marshalling and unmarshalling. Then we created the batch Job and its step. Finally, we ran the demo to verify that everything works.
Happy Learning !!