Spring Batch Delete Files After Processing

In Spring Batch jobs, a reliable way to delete flat files after they have been read and processed is to create a separate Tasklet and run it as the last step of the job, once processing is complete.

1. Using Tasklet for Deleting Processed Files or Records

In Spring Batch, a Tasklet performs a single task within a step. A Job can have multiple steps, and each step should perform only one well-defined task. The framework invokes the Tasklet's execute() method when the step runs.

1.1. Creating the Tasklet

The following Tasklet deletes all CSV files used in the Spring Boot Batch Example. It is added as the last step of the job so that the files are removed only after all records have been processed.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.UnexpectedJobExecutionException;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.core.io.Resource;

import java.io.File;

public class DeleteInputCsvTasklet implements Tasklet, InitializingBean {

  private static final Logger log = LoggerFactory.getLogger(DeleteInputCsvTasklet.class);

  private Resource[] resources;

  @Override
  public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {

    // Nothing was configured, so there is nothing to delete
    if (resources == null) {
      return RepeatStatus.FINISHED;
    }

    for (Resource r : resources) {
      File file = r.getFile();
      boolean deleted = file.delete();
      if (!deleted) {
        // Fail the step so that a leftover file does not go unnoticed
        throw new UnexpectedJobExecutionException("Could not delete file " + file.getPath());
      }
    }
    return RepeatStatus.FINISHED;
  }

  public void setResources(Resource[] resources) {
    this.resources = resources;
  }

  @Override
  public void afterPropertiesSet() throws Exception {
    if (resources == null) {
      log.info("No resource to delete");
    }
  }
}
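
The tasklet above relies on File.delete(), which reports only true or false. As a minimal sketch, assuming the file is available as a java.nio.file.Path, Files.delete() can be used instead so that the failure reason is part of the exception. The class name StrictFileDeleter and its method are hypothetical and are not part of the original example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not part of the original example.
final class StrictFileDeleter {

  private StrictFileDeleter() {
  }

  // Deletes the file, or throws an IllegalStateException that names the cause.
  static void deleteOrFail(Path file) {
    try {
      Files.delete(file);
    } catch (NoSuchFileException e) {
      throw new IllegalStateException("File does not exist: " + file, e);
    } catch (IOException e) {
      // Covers other failures, e.g. a non-empty directory or missing permissions
      throw new IllegalStateException("Could not delete file " + file + ": " + e, e);
    }
  }
}
```

Inside a Tasklet, the IllegalStateException could be swapped for the UnexpectedJobExecutionException used above, and Resource.getFile().toPath() would supply the Path.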

Feel free to modify the logic inside DeleteInputCsvTasklet, for example, to move the files to an archive location instead of deleting them.
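
As one possible way to archive instead of delete, the sketch below moves a file into an archive directory using only java.nio.file. The class name FileArchiver, the method archive(), and the choice to overwrite an existing archived file with the same name are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original example.
final class FileArchiver {

  private FileArchiver() {
  }

  // Moves 'source' into 'archiveDir' (created if missing) and returns the new path.
  static Path archive(Path source, Path archiveDir) throws IOException {
    Files.createDirectories(archiveDir);
    Path target = archiveDir.resolve(source.getFileName());
    // Assumption: an older archived file with the same name may be overwritten
    return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
  }
}
```

In the tasklet, the call to file.delete() could be replaced by something like FileArchiver.archive(file.toPath(), archiveDir), where archiveDir is a location chosen by the project.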

1.2. How to use File Delete Tasklet

Create a Step (step2() in this example) that runs after the main step and executes the DeleteInputCsvTasklet. Inside the execute() method, we can delete files, records, or rows, according to project needs.

@Configuration
public class CsvToDatabaseJob {

  // ... other members omitted, including the jobRepository used below

  // Resource to be deleted
  @Value("classpath:csv/inputData.csv")
  private Resource inputFeed;

  // Use File delete tasklet as step 2
  @Bean
  public Job insertIntoDbFromCsvJob(Step step1, Step step2) {

    var name = "Persons Import Job";
    var builder = new JobBuilder(name, jobRepository);

    return builder.start(step1)
        .next(step2)
        .listener(new JobCompletionNotificationListener())
        .build();
  }

  @Bean
  public Step step1(ItemReader<Person> reader,
                    ItemWriter<Person> writer,
                    PlatformTransactionManager txManager) {

    var name = "INSERT CSV RECORDS To DB Step";
    var builder = new StepBuilder(name, jobRepository);
    return builder
        .<Person, Person>chunk(5, txManager)
        .reader(reader)
        .writer(writer)
        .build();
  }

  //File delete tasklet
  @Bean
  public Step step2(PlatformTransactionManager txManager) {

    DeleteInputCsvTasklet task = new DeleteInputCsvTasklet();
    task.setResources(new Resource[]{inputFeed});

    var name = "DELETE CSV FILE";

    var builder = new StepBuilder(name, jobRepository);
    return builder
        .tasklet(task, txManager)
        .build();
  }

  //...
}

1.3. Demo

Now run the application and watch the logs.

2023-11-28T15:33:36.982+05:30  INFO 19592 --- [           main] o.s.batch.core.step.AbstractStep         : Step: [INSERT CSV RECORDS To DB Step] executed in 40ms
2023-11-28T15:33:36.987+05:30  INFO 19592 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [DELETE CSV FILE]
2023-11-28T15:33:36.991+05:30  INFO 19592 --- [           main] o.s.batch.core.step.AbstractStep         : Step: [DELETE CSV FILE] executed in 3ms
2023-11-28T15:33:36.994+05:30  INFO 19592 --- [           main] .j.c.l.JobCompletionNotificationListener : JOB FINISHED !!

2. Using JobExecutionListener for Deleting the Processed Files

Another way to delete the processed files, or to clean up other resources, is to use a JobExecutionListener. It provides callbacks at specific points in the lifecycle of a Job.

2.1. Modifying JobExecutionListener

We can watch for the Job completion event and delete the files once the Job has completed. In the following code, afterJob() checks for the BatchStatus.COMPLETED status. When that status is found, it deletes each resource in a loop and logs the result.

import lombok.SneakyThrows;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.core.io.Resource;

import java.io.File;

public class JobCompletionNotificationListener implements JobExecutionListener {

  private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);

  private Resource[] resources;

  public JobCompletionNotificationListener(Resource[] resources) {
    this.resources = resources;
  }

  @SneakyThrows
  @Override
  public void afterJob(JobExecution jobExecution) {

    if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
      log.info("JOB FINISHED !!");

      if (resources == null) {
        log.info("No resource to delete");
        return; // avoid a NullPointerException in the loop below
      }

      for (Resource r : resources) {
        File file = r.getFile();
        boolean deleted = file.delete();

        if (!deleted) {
          log.warn("Could not delete file " + file.getPath());
        } else {
          log.info("File deleted: " + file.getPath());
        }
      }
    }
  }
}
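
Unlike the tasklet, the listener logs a failed deletion and carries on with the remaining files. A minimal sketch of that best-effort style, using only java.nio.file, is shown below. The class name BestEffortCleaner and the idea of returning the list of failures are assumptions for illustration, not part of the original listener.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original example.
final class BestEffortCleaner {

  private BestEffortCleaner() {
  }

  // Tries to delete every path and returns the ones that could not be deleted.
  static List<Path> deleteAll(List<Path> files) {
    List<Path> failed = new ArrayList<>();
    if (files == null) {
      return failed; // nothing to delete
    }
    for (Path file : files) {
      try {
        // A file that is already missing is not treated as a failure
        Files.deleteIfExists(file);
      } catch (IOException e) {
        failed.add(file);
      }
    }
    return failed;
  }
}
```

A listener could call deleteAll() from afterJob() and log each returned path as a warning, so one stubborn file does not prevent the others from being cleaned up.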

2.2. Registering the JobExecutionListener

The JobExecutionListener is registered with the JobBuilder.listener() method when creating the Job bean. Because the listener now deletes the file, step2 is no longer chained into the job. Check out the updated code:

@Value("classpath:csv/inputData.csv")
private Resource inputFeed;

@Bean
public Job insertIntoDbFromCsvJob(Step step1, Step step2) {

  var name = "Persons Import Job";
  var builder = new JobBuilder(name, jobRepository);

  return builder.start(step1)
      //.next(step2)
      .listener(new JobCompletionNotificationListener(new Resource[]{inputFeed}))
      .build();
}

2.3. Demo

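Let’s run the demo application again and check the logs. Note that the deleted file is the copy under target\classes\csv, because the resource is resolved from the classpath.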

2023-11-28T15:46:54.879+05:30  INFO 24244 --- [main] o.s.batch.core.step.AbstractStep         : Step: [INSERT CSV RECORDS To DB Step] executed in 40ms
2023-11-28T15:46:54.883+05:30  INFO 24244 --- [main] .j.c.l.JobCompletionNotificationListener : JOB FINISHED !!
2023-11-28T15:46:54.884+05:30  INFO 24244 --- [main] .j.c.l.JobCompletionNotificationListener : File deleted: C:\Users\...\spring-boot-batch\target\classes\csv\inputData.csv

3. Conclusion

In this Spring Batch tutorial, we learned to clean up the input files after batch processing is complete. Not every application needs this, but when the requirement arises, we can use either of the approaches discussed: a custom Tasklet or a JobExecutionListener.

Drop me your questions in the comments section.

Happy Learning !!
