Reading a Large File Efficiently in Java

Learn to read all lines from a large file (several GB in size) in Java while avoiding performance pitfalls such as very high memory usage, or even an OutOfMemoryError if the file is large enough.

1. Approach to Read Large Files

Similar to the DOM and SAX parsers for XML files, we can read a file using one of two approaches:

  • Reading the complete file in memory before processing it
  • Reading the file content line by line and processing each line independently

The first approach looks cleaner and is suitable for small files where memory requirements are very low (kilobytes or a few megabytes). If it is used to read large files, it can quickly end in an OutOfMemoryError once a multi-gigabyte file no longer fits in the available heap.
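
For contrast, here is a minimal sketch of the first approach using the JDK's Files.readAllLines(), which returns every line of the file as a List<String>. The class name ReadAllAtOnce and the method countNonEmptyLines() are hypothetical, chosen only for illustration, and UTF-8 input is assumed. The whole list sits on the heap before any processing starts, which is why memory use grows with the file size.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper illustrating the "read everything first" approach.
class ReadAllAtOnce {

    // Files.readAllLines() loads every line into a List before processing starts,
    // so memory use grows with the file size (fine for small files only).
    static int countNonEmptyLines(Path path) throws IOException {
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        int count = 0;
        for (String line : lines) {
            if (!line.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the examples in this article; pass another one as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "C:/temp/file.txt");
        System.out.println(countNonEmptyLines(path));
    }
}
```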

The second approach is suitable for reading very large files (gigabytes) when it is not feasible to read the whole file into memory. In this approach, we use line streaming, i.e., we read the lines from the file in the form of a stream or an iterator.
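
The classic JDK form of line streaming is a BufferedReader: each readLine() call returns the next line, and null signals the end of the file. Below is a minimal sketch, assuming UTF-8 input; LineByLine and longestLineLength() are hypothetical names used only for illustration. Only the current line and the reader's internal buffer are held in memory, however large the file is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating line-by-line reading with a BufferedReader.
class LineByLine {

    // Processes one line at a time; earlier lines become garbage as soon as
    // the loop moves on, so memory use does not grow with the file size.
    static long longestLineLength(Path path) throws IOException {
        long longest = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) { // null means end of file
                longest = Math.max(longest, line.length());
            }
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the examples in this article; pass another one as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "C:/temp/file.txt");
        System.out.println(longestLineLength(path));
    }
}
```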

This tutorial is focused on the solutions using the second approach.

2. Using New IO’s Files.lines()

With the Files.lines() method, the contents of the file are read and processed lazily, so only a small portion of the file is held in memory at any given time.

The advantage of this approach is that we can pass Consumer actions directly and use newer language features such as lambda expressions and the Stream API. Because the returned Stream keeps the file open, it should be closed after use, typically with try-with-resources.

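import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

Path filePath = Paths.get("C:/temp/file.txt");

// try-with-resources closes the stream and the underlying file handle
try (Stream<String> lines = Files.lines(filePath)) 
{
  // Lines are read lazily, one at a time, as forEach consumes them
  lines.forEach(System.out::println);
} 
catch (IOException e) 
{
  e.printStackTrace();
}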
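
Because the stream is lazy, the usual Stream operations work on files of any size. The sketch below assumes UTF-8 text; the class StreamPipelines, its two methods, and the search term "ERROR" are hypothetical, used only for illustration. It shows a filter-and-count pipeline and a short-circuiting findFirst(), which stops pulling lines as soon as a match is found.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helpers showing lazy Stream pipelines over Files.lines().
class StreamPipelines {

    // Counts lines containing the given text; lines are read and discarded one at a time.
    static long countContaining(Path path, String text) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(text)).count();
        }
    }

    // findFirst() is short-circuiting: no further lines are consumed after the first match.
    static Optional<String> firstContaining(Path path, String text) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(text)).findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("C:/temp/file.txt");
        // "ERROR" is an assumed search term for illustration.
        System.out.println(countContaining(path, "ERROR"));
        System.out.println(firstContaining(path, "ERROR").orElse("<none>"));
    }
}
```

Both terminal operations run inside the try block, so the stream is fully evaluated before it is closed.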

3. Commons IO’s FileUtils.lineIterator()

The lineIterator() method uses a Reader to iterate over the lines of the specified file. Use try-with-resources to close the iterator automatically after reading the file.

Do not forget to add the latest version of the commons-io module to the project dependencies.

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version>
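</dependency>

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

File file = new File("C:/temp/file.txt");

// LineIterator is Closeable, so try-with-resources closes the underlying Reader
try (LineIterator it = FileUtils.lineIterator(file, "UTF-8")) {
  while (it.hasNext()) {
    String line = it.nextLine();
    // do something with line
    System.out.println(line);
  }
} catch (IOException e) {
  e.printStackTrace();
}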

4. Reading Large Binary Files

Note that when we read files as a Stream or line by line, we are dealing with character-based (text) files. Decoding binary data with a charset such as UTF-8 can corrupt it, so the solutions above do not apply to binary data files.
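
A small sketch makes the corruption concrete. Bytes such as 0xFF and 0xFE never occur in well-formed UTF-8, so the String constructor silently replaces each of them with the replacement character U+FFFD, and encoding the text back does not restore the original bytes. By contrast, Files.newBufferedReader() is documented to report malformed input as an error; either way, the raw bytes are not preserved. Utf8RoundTrip and roundTrip() are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Shows why decoding arbitrary binary data as UTF-8 text is lossy.
class Utf8RoundTrip {

    // Decodes bytes as UTF-8 and encodes the resulting String back to bytes.
    // Invalid UTF-8 sequences are replaced with U+FFFD while decoding,
    // so the original bytes cannot be recovered.
    static byte[] roundTrip(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE are never valid in UTF-8; 0x41 is the ASCII letter 'A'.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x41};
        byte[] back = roundTrip(binary);
        System.out.println(Arrays.toString(binary));
        System.out.println(Arrays.toString(back));
        System.out.println(Arrays.equals(binary, back)); // false
    }
}
```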

To read large raw data files, such as movies or large images, we can use Java NIO’s ByteBuffer and FileChannel classes. Remember that you will need to try different buffer sizes and pick the one that works best for you.

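import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

try (RandomAccessFile aFile = new RandomAccessFile("test.txt", "r");
  FileChannel inChannel = aFile.getChannel()) {

  // Buffer size is 1024 bytes; tune it for your workload
  ByteBuffer buffer = ByteBuffer.allocate(1024);

  // read() returns -1 once the end of the file is reached
  while (inChannel.read(buffer) != -1) {
    buffer.flip(); // switch the buffer from filling to draining
    while (buffer.hasRemaining()) {
      byte b = buffer.get();
      // do something with the raw byte; casting it to char is only meaningful for text
    }
    buffer.clear(); // make the buffer ready to be filled again
  }
} catch (IOException e) {
  e.printStackTrace();
}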
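
As a concrete use of the same read/flip/clear cycle, the sketch below streams a binary file through a fixed-size buffer and computes a CRC32 checksum without ever loading the whole file. It takes the buffer size as a parameter so that different sizes can be compared. ChannelChecksum and crc32() are hypothetical names, FileChannel.open() is used here instead of RandomAccessFile, and the 64 KB size in main() is only an assumed starting point for experiments.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical helper: computes a CRC32 of a file chunk by chunk.
class ChannelChecksum {

    static long crc32(Path path, int bufferSize) throws IOException {
        CRC32 crc = new CRC32();
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // read() returns -1 at end of file
            while (channel.read(buffer) != -1) {
                buffer.flip();      // switch from filling to draining
                crc.update(buffer); // consumes all bytes between position and limit
                buffer.clear();     // make the whole buffer writable again
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // File name from the example above; pass another one as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "test.txt");
        System.out.println(Long.toHexString(crc32(path, 64 * 1024)));
    }
}
```

The checksum is the same for every buffer size; only the number of read() calls, and therefore the speed, changes.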

5. Conclusion

This Java tutorial discussed a few efficient solutions for reading very large files. The right solution depends on the type of file and other factors specific to the problem.

I suggest benchmarking all solutions in your environment and choosing based on their performance.
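
For a first rough comparison, a simple timing loop can help. Below is a minimal sketch; CrudeTimer and averageMillis() are hypothetical names, and the warm-up and run counts in main() are arbitrary. It runs a task a few times untimed (so the JIT compiler can warm up), then reports the average wall-clock time of the timed runs. Keep in mind that after the first read the operating system usually serves the file from its page cache, so repeated runs measure warm-cache performance. For trustworthy numbers, a dedicated harness such as JMH is the better tool.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Callable;
import java.util.stream.Stream;

// Hypothetical, deliberately simple timing helper (not a substitute for JMH).
class CrudeTimer {

    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the average wall-clock time per timed run in milliseconds.
    static double averageMillis(Callable<?> task, int warmups, int runs) throws Exception {
        for (int i = 0; i < warmups; i++) {
            task.call();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.call();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000.0 / runs;
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get("C:/temp/file.txt");
        double ms = averageMillis(() -> {
            try (Stream<String> lines = Files.lines(path)) {
                return lines.count();
            }
        }, 3, 5);
        System.out.printf("Files.lines(): %.1f ms per run%n", ms);
    }
}
```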

Happy Learning !!

HowToDoInJava

A blog about Java and related technologies, the best practices, algorithms, and interview questions.