Parse and Read a CSV File in Java

A CSV file is used to store tabular data in plain-text form. A comma delimiter is used to identify and separate different data tokens in the CSV file.

  • CSV (Comma Separated Values) files are used by consumers, businesses, and scientific applications. One of their most common uses is moving tabular data between programs that natively operate on incompatible formats.
  • CSV data is popular because so many programs and languages support some variation of CSV at least as an alternative import/export format.

In Java, there are different ways of reading and parsing CSV files. Let us discuss some of the best approaches:

1. Using OpenCSV Library

OpenCSV is a brilliant library for operating on CSV files. It has the following features:

  • Reading arbitrary numbers of values per line
  • Ignoring commas in quoted elements
  • Handling entries that span multiple lines
  • Configurable separator and quote characters
  • Reading all the entries at once, or using an Iterator-style model

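Add a recent version of OpenCSV to the project dependencies. The example below relies on CsvValidationException, which is available in the OpenCSV 5.x releases published under the com.opencsv group ID.

<dependency>
  <groupId>com.opencsv</groupId>
  <artifactId>opencsv</artifactId>
  <version>5.9</version>
</dependency>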

Example 1: Reading the CSV File line by line into String[]

In the given example, we use the CSVReader class from the OpenCSV library, which wraps a FileReader to read the actual CSV file. The file uses a comma as the delimiter.

  • Calling reader.readNext() reads the CSV file one line at a time and returns the parsed tokens as a String[].
  • It throws IOException if an error occurs while reading the file.
  • It throws CsvValidationException if the line read is not a valid CSV string.
  • When all the lines have been read, readNext() returns null and the loop ends.

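import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvValidationException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

try (CSVReader reader
        = new CSVReader(new FileReader("SampleCSVFile.csv")))
{
  String[] nextLine;

  //Read one line at a time; readNext() returns null at end of file
  while ((nextLine = reader.readNext()) != null)
  {
    //Use the tokens as required
    System.out.println(Arrays.toString(nextLine));
  }
}
catch (IOException | CsvValidationException e) {
  e.printStackTrace();
}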

2. Using Super CSV Library

Super CSV aims to be the foremost, fastest, and most programmer-friendly free CSV package for Java. It supports a long list of useful features out of the box, such as:

  • Ability to read and write data as POJO classes
  • Automatic encoding and decoding of special characters
  • Custom delimiter, quote character and line separator
  • Support for cell processors to process each token in a specific manner
  • Ability to apply one or more constraints, such as number ranges, string lengths or uniqueness
  • Ability to process CSV data from files, strings, streams and even zip files

Add the latest version of Super CSV to the project.

<dependency>
  <groupId>net.sf.supercsv</groupId>
  <artifactId>super-csv</artifactId>
  <version>2.4.0</version>
</dependency>

Example 2: Reading the CSV File into POJO

We will read the following CSV file.

CustomerId,CustomerName,Country,PinCode,Email
10001,Lokesh,India,110001,abc@gmail.com
10002,John,USA,220002,def@gmail.com
10003,Blue,France,330003,ghi@gmail.com

The corresponding POJO class is:

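public class Customer 
{
  private Integer CustomerId;
  private String CustomerName;
  private String Country;
  private Long PinCode;
  private String Email;

  //Setters used by Super CSV to populate the bean, one per CSV column
  public void setCustomerId(Integer customerId) { this.CustomerId = customerId; }
  public void setCustomerName(String customerName) { this.CustomerName = customerName; }
  public void setCountry(String country) { this.Country = country; }
  public void setPinCode(Long pinCode) { this.PinCode = pinCode; }
  public void setEmail(String email) { this.Email = email; }
}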

Remember that the column names must match the bean’s field names exactly, and that the bean must define an appropriate setter for each field (for example, setCustomerId() for the CustomerId column).

import java.io.FileReader;
import java.io.IOException;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.ParseLong;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.StrRegEx;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;
 
public class ReadCSVFileExample {
 
  static final String CSV_FILENAME = "data.csv";
 
  public static void main(String[] args) throws IOException 
  {
    try(ICsvBeanReader beanReader 
         = new CsvBeanReader(new FileReader(CSV_FILENAME), CsvPreference.STANDARD_PREFERENCE))
    {
      // the header elements are used to map the values to the bean
      final String[] headers = beanReader.getHeader(true);
      //final String[] headers = new String[]{"CustomerId","CustomerName","Country","PinCode","Email"};
      final CellProcessor[] processors = getProcessors();
 
      Customer customer;
      while ((customer = beanReader.read(Customer.class, headers, processors)) != null) {
        System.out.println(customer);
      }
    } 
  }
 
  /**
   * Sets up the processors used for the examples.
   */
  private static CellProcessor[] getProcessors() {
    final String emailRegex = "[a-z0-9\\._]+@[a-z0-9\\.]+";
    StrRegEx.registerMessage(emailRegex, "must be a valid email address");
 
    final CellProcessor[] processors = new CellProcessor[] {
        new NotNull(new ParseInt()), // CustomerId
        new NotNull(), // CustomerName
        new NotNull(), // Country
        new Optional(new ParseLong()), // PinCode
        new StrRegEx(emailRegex) // Email
    };
    return processors;
  }
}

3. Using java.util.Scanner

The Scanner class breaks its input into tokens using a specified delimiter pattern. The default delimiter is whitespace.

  • We can use one Scanner to read lines and another Scanner to parse each line into tokens. This approach may not suit large files because it creates one Scanner instance per line.
  • We can use the delimiter comma to parse the CSV file.
  • The CSV tokens may then be converted into values of different datatypes using the various next() methods.
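
The typed next methods mentioned in the last point can be sketched as follows. This is only an illustration and not part of the original examples: the class name ScannerTypedTokens, the helper parseRow() and the sample row are hypothetical, and the sketch assumes a row with exactly four well-formed columns and no quoted commas.

```java
import java.util.Locale;
import java.util.Scanner;

public class ScannerTypedTokens {

  // Hypothetical helper: reads "id,name,country,pinCode" and returns a summary string.
  public static String parseRow(String row) {
    try (Scanner rowScanner = new Scanner(row)) {
      rowScanner.useDelimiter(",");
      rowScanner.useLocale(Locale.ROOT); // keep number parsing independent of the default locale

      int id = rowScanner.nextInt();        // numeric token read as int
      String name = rowScanner.next();      // plain String token
      String country = rowScanner.next();
      long pinCode = rowScanner.nextLong(); // numeric token read as long

      return id + "|" + name + "|" + country + "|" + pinCode;
    }
  }

  public static void main(String[] args) {
    // prints 10001|Lokesh|India|110001
    System.out.println(parseRow("10001,Lokesh,India,110001"));
  }
}
```

If a token cannot be converted to the requested type, nextInt() and nextLong() throw InputMismatchException, so real input usually needs a hasNextInt() or hasNextLong() check first.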

Example 3: Parsing a CSV file using Scanner

try(Scanner scanner = new Scanner(new File("SampleCSVFile.csv"))){

  //Read line
  while (scanner.hasNextLine()) {
    String line = scanner.nextLine();

    //Scan the line for tokens
    try (Scanner rowScanner = new Scanner(line)) {
      rowScanner.useDelimiter(",");
      while (rowScanner.hasNext()) {
        //Read tokens from the row scanner, not from the file scanner
        System.out.print(rowScanner.next());
      }
    }
  }
} catch (FileNotFoundException e) {
  e.printStackTrace();
}

4. Using BufferedReader and String.split()

In this approach, we use BufferedReader to read the file line by line. The String.split() method then breaks the current line into tokens around the delimiter passed as its argument.

It is useful for small strings or small files whose fields contain no quoted commas or embedded line breaks.
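
The short sketch below illustrates why split() fits only simple input. The class name SplitPitfalls and the sample strings are made up for illustration; the calls themselves are plain String.split() from the JDK.

```java
import java.util.Arrays;

public class SplitPitfalls {

  public static void main(String[] args) {
    // The single-argument split() silently drops trailing empty columns.
    System.out.println(Arrays.toString("a,b,,".split(",")));     // prints [a, b]

    // A negative limit keeps them, so every row yields the same column count.
    System.out.println(Arrays.toString("a,b,,".split(",", -1))); // prints [a, b, , ]

    // split() knows nothing about quoting: the comma inside the quotes acts as a delimiter.
    System.out.println("\"Doe, John\",USA".split(",").length);    // prints 3, not 2
  }
}
```

When rows may end with empty columns, split(",", -1) is usually the safer call; when fields may be quoted, a CSV library is the safer choice.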

Example 4: Splitting the CSV String or CSV File

In the given example, we are reading a file line by line. Then each line is split into tokens with a delimiter comma.

try(BufferedReader fileReader
        = new BufferedReader(new FileReader("SampleCSVFile.csv")))
{
  String line = "";

  //Read the file line by line
  while ((line = fileReader.readLine()) != null)
  {
    //Get all tokens available in line
    String[] tokens = line.split(",");

    //Verify tokens
    System.out.println(Arrays.toString(tokens));
  }
}
catch (IOException e) {
  e.printStackTrace();
}
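
For lines that do contain quoted commas, a quote-aware splitter can replace split(). The following is a minimal sketch, not a replacement for a CSV library: the class QuotedCsvLine and the method splitLine() are hypothetical names, and the sketch assumes double quotes as the quote character, a doubled quote ("") as the escape for a literal quote, and fields that never span multiple lines.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvLine {

  // Hypothetical helper: splits one CSV line, keeping commas inside double quotes as data.
  public static List<String> splitLine(String line) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;

    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (inQuotes) {
        if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
          current.append('"'); // doubled quote inside a quoted field is a literal quote
          i++;
        } else if (c == '"') {
          inQuotes = false;    // closing quote
        } else {
          current.append(c);   // commas land here too while inside quotes
        }
      } else if (c == '"') {
        inQuotes = true;       // opening quote
      } else if (c == ',') {
        fields.add(current.toString()); // delimiter outside quotes ends the field
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    fields.add(current.toString()); // last field, possibly empty
    return fields;
  }

  public static void main(String[] args) {
    List<String> fields = splitLine("10001,\"Doe, John\",USA");
    System.out.println(fields.size() + " fields, second = " + fields.get(1));
  }
}
```

Even this small sketch needs several branches, which is one reason the library-based approaches shown earlier are generally preferable for real-world files.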

5. Conclusion

Java offers many ways to read a CSV file. Because the JDK has no dedicated API for CSV handling, we can rely on open-source libraries such as OpenCSV and Super CSV, which are easy to use and highly configurable.

Happy Learning !!
