Reading a Large File Efficiently in Java

Learn to read all lines from a large file (gigabytes in size) in Java while avoiding performance pitfalls such as very high memory usage or even an OutOfMemoryError if the file is large enough.

1. Approach to Read Large Files

Similar to the DOM parser and SAX parser for XML files, we can read a file with two approaches:

  • Reading the complete file in memory before processing it
  • Reading the file content line by line and processing each line independently

The first approach looks cleaner and is suitable for small files where the memory requirements are low (in kilobytes or a few megabytes). If used to read large files, it will quickly result in an OutOfMemoryError for files that are gigabytes in size.
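
For reference, here is a minimal sketch of the first approach using Files.readAllLines() (the file path is only an example); note that the entire file ends up in a List in memory:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

try {
  // Reads the WHOLE file into memory at once - safe only for small files
  List<String> allLines = Files.readAllLines(Paths.get("C:/temp/small-file.txt"));
  allLines.forEach(System.out::println);
} catch (IOException e) {
  e.printStackTrace();
}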

The second approach is suitable for reading very large files, in gigabytes, when it is not feasible to read the whole file into memory. In this approach, we use line streaming, i.e., we read the lines from the file in the form of a stream or iterator.

This tutorial is focused on the solutions using the second approach.

2. Files.lines() – Read a Large File in Java 8

The Files.lines() method reads and processes the contents of the file lazily, so only a small portion of the file is stored in memory at any given time.

The good thing about this approach is that we can directly write the Consumer actions and use newer language features such as lambda expressions with Stream.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

Path filePath = Paths.get("C:/temp/file.txt");

//try-with-resources
try (Stream<String> lines = Files.lines(filePath))
{
  lines.forEach(System.out::println);
}
catch (IOException e)
{
  e.printStackTrace();
}
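
Since Files.lines() returns a standard Stream, we can also chain other stream operations while the file is still read lazily. A minimal sketch (the search term "ERROR" is only an illustration) that counts matching lines without loading the whole file into memory:

try (Stream<String> lines = Files.lines(Paths.get("C:/temp/file.txt")))
{
  // The stream is processed lazily; only one line is held at a time
  long errorCount = lines.filter(line -> line.contains("ERROR")).count();
  System.out.println("Matching lines: " + errorCount);
}
catch (IOException e)
{
  e.printStackTrace();
}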

3. Commons IO’s FileUtils.lineIterator()

The lineIterator() method uses a Reader to iterate over the lines of a specified file. Use try-with-resources to auto-close the iterator after reading the file.

Do not forget to add the latest version of the commons-io module to the project dependencies.

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version>
</dependency>
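
Or, if the project uses Gradle, the equivalent dependency declaration is:

implementation 'commons-io:commons-io:2.11.0'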

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

File file = new File("C:/temp/file.txt");

try (LineIterator it = FileUtils.lineIterator(file, "UTF-8")) {
  while (it.hasNext()) {

    String line = it.nextLine();
    // do something with line
    System.out.println(line);
  }
} catch (IOException e) {
  e.printStackTrace();
}

4. Reading a Large Binary File

Note that when we read files in a Stream or line by line, we are referring to character-based or text files. For binary files, decoding the bytes with a charset such as UTF-8 may corrupt the data, so the above solutions do not apply to binary data files.

To read large raw data files, such as movies or large images, we can use Java NIO’s ByteBuffer and FileChannel classes. Remember that you will need to try different buffer sizes and pick the one that works best for you; a simple timing sketch follows the example below.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

try (RandomAccessFile aFile = new RandomAccessFile("test.txt", "r");
    FileChannel inChannel = aFile.getChannel()) {

  // Buffer size is 1024 bytes; tune this value for your workload
  ByteBuffer buffer = ByteBuffer.allocate(1024);

  while (inChannel.read(buffer) > 0) {
    buffer.flip();  // switch the buffer from write mode to read mode
    for (int i = 0; i < buffer.limit(); i++) {
      System.out.print((char) buffer.get());
    }
    buffer.clear(); // do something with the data and clear/compact it
  }
} catch (IOException e) {
  e.printStackTrace();
}
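
To compare candidate buffer sizes, below is a minimal timing sketch (the file name and the candidate sizes are only illustrative) that measures one full sequential pass over the file for each size:

for (int bufferSize : new int[] {1024, 8192, 65536, 1048576}) {
  long start = System.nanoTime();

  try (RandomAccessFile aFile = new RandomAccessFile("test.txt", "r");
      FileChannel inChannel = aFile.getChannel()) {

    ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
    while (inChannel.read(buffer) > 0) {
      buffer.flip();
      // consume the bytes here instead of printing them
      buffer.clear();
    }
  } catch (IOException e) {
    e.printStackTrace();
  }

  long elapsedMs = (System.nanoTime() - start) / 1_000_000;
  System.out.println(bufferSize + " bytes: " + elapsedMs + " ms");
}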

5. Conclusion

This Java tutorial discussed which classes should be used to read large files efficiently. The correct solution depends on the type of file and other deciding factors specific to the problem.

I suggest benchmarking all the solutions in your environment and choosing the one that performs best.

Happy Learning !!

Source Code on Github
