4 ways to split/tokenize Strings in Java

Splitting a string and getting tokens out of it is a fairly common task for Java programmers, especially those working on the web layer. There are plenty of techniques for passing data from the view layer to the controller, and unfortunately we often have to pass data in CSV format or separated by some other character such as $ or #.

Situations also arise when we receive data from clients in CSV format and need to parse it before we can interpret it correctly. In this post, I discuss a few well-known and a few less well-known ways of splitting strings.

Different String split examples

1) Using legacy StringTokenizer     
2) Using recommended String.split() 
3) Apache StringUtils.split()       
4) Google Guava Splitter example

1) Using legacy StringTokenizer

This class is really easy to use and has been part of Java since JDK 1.0. A simple use case is:

import java.util.StringTokenizer;

//Example 1 - By default StringTokenizer breaks String on whitespace (" \t\n\r\f")
String str = "I am sample string and will be tokenized on space";
StringTokenizer defaultTokenizer = new StringTokenizer(str);
System.out.println("Total number of tokens found : " + defaultTokenizer.countTokens());
while (defaultTokenizer.hasMoreTokens())
{
	System.out.println(defaultTokenizer.nextToken());
}
//All tokens have been consumed, so countTokens() now returns 0
System.out.println("Total number of tokens found : " + defaultTokenizer.countTokens());

Output:

Total number of tokens found : 10
I
am
sample
string
and
will
be
tokenized
on
space
Total number of tokens found : 0

Note that the token count was 10 at the start and dropped to 0 after all tokens were fetched, because countTokens() only counts the tokens remaining from the tokenizer's current position.
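If you need both the tokens and their count, it is simpler to drain the tokenizer into a collection once. A minimal sketch (the class name `TokenCountDemo` and helper `drain` are illustrative, not part of the JDK) that records countTokens() before and after consuming the tokens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenCountDemo {
    // Drains every remaining token from the tokenizer into a list.
    static List<String> drain(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        StringTokenizer tokenizer =
                new StringTokenizer("I am sample string and will be tokenized on space");
        int before = tokenizer.countTokens();  // nothing consumed yet
        List<String> tokens = drain(tokenizer);
        int after = tokenizer.countTokens();   // every token has been consumed
        System.out.println(before + " " + tokens.size() + " " + after);  // prints "10 10 0"
    }
}
```

Because countTokens() does not advance the tokenizer, calling it before the loop is safe; once the list is filled, `tokens.size()` is the stable count to use.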

Splitting the String based on multiple delimiters

This is a really useful case: it allows you to split a string when there is more than one delimiter. Each character in the delimiter string is treated as a separate delimiter.

//Example 2 - StringTokenizer with multiple delimiters
String url = "http://howtodoinjava.com/java-initerview-questions";
StringTokenizer multiTokenizer = new StringTokenizer(url, "://.-");
while (multiTokenizer.hasMoreTokens())
{
	System.out.println(multiTokenizer.nextToken());
}

Output:

http
howtodoinjava
com
java
initerview
questions

As the Javadoc says, StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
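Following that recommendation, the multi-delimiter example can be expressed with String.split and a regular expression. A sketch, assuming the character class `[:/.\\-]+` is an acceptable equivalent of the delimiter set "://.-": the `+` makes a run of adjacent delimiters (such as `//`) count as one, similar to StringTokenizer never returning empty tokens between delimiters. One difference to keep in mind: if the input starts with a delimiter, String.split returns a leading empty string, whereas StringTokenizer does not. The class name `MultiDelimiterSplit` is illustrative.

```java
import java.util.Arrays;

final class MultiDelimiterSplit {
    // Splits on any run of ':', '/', '.' or '-' characters.
    // Inside a character class '.' is literal; the hyphen is escaped so it is not read as a range.
    static String[] splitUrl(String url) {
        return url.split("[:/.\\-]+");
    }

    public static void main(String[] args) {
        String url = "http://howtodoinjava.com/java-initerview-questions";
        // prints [http, howtodoinjava, com, java, initerview, questions]
        System.out.println(Arrays.toString(splitUrl(url)));
    }
}
```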

2) Using recommended String.split()

This approach is better than the previous one and is the recommended replacement for StringTokenizer. The tokens are returned in the form of a String array, which you are free to use as you wish.

String[] tokens = "I,am ,Legend, , oh ,you ?".split(",");
for (String token : tokens)
{
	System.out.println(token);
}

Output:

I
am 
Legend
      //Token containing only a space
 oh   //Leading and trailing spaces
you ?

The code above is really easy to use, but as the output shows, the tokens need extra care: split() returns blank tokens and does not trim tokens by default, so you have to handle these cases token by token. I don't like this.
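On Java 8 or later, the per-token cleanup can be done in one pass with streams. The sketch below (the class name `CleanSplit` and method `splitTrimmed` are illustrative) trims each token and drops the ones that become empty. It also shows two split() details worth knowing: the argument is a regular expression, and the one-argument form silently discards trailing empty strings, which a negative `limit` keeps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CleanSplit {
    // Splits on commas, trims each token and drops tokens that are empty after trimming.
    static List<String> splitTrimmed(String input) {
        return Arrays.stream(input.split(","))
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [I, am, Legend, oh, you ?]
        System.out.println(splitTrimmed("I,am ,Legend, , oh ,you ?"));
        // split(regex) drops trailing empty strings: prints 2
        System.out.println("a,b,,".split(",").length);
        // a negative limit keeps them: prints 4
        System.out.println("a,b,,".split(",", -1).length);
    }
}
```

Because the separator is a regex, characters such as `.` or `|` must be escaped (for example `split("\\.")`) or the result will not be what you expect.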

3) Apache StringUtils.split()

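This method, from Apache Commons Lang, is very similar to the approach above and also returns a String[], so you need to deal with the array just as in the previous code. The differences are that the second argument is a set of separator characters rather than a regular expression, and adjacent separators are treated as one, so no empty tokens are returned. Because it does not go through the regex engine, it can also be faster.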

String[] tokens = StringUtils.split("I,am ,Legend, , oh ,you ?",",");
for (String token : tokens)
{
	System.out.println(token);
}

Output:

I
am 
Legend
      //Token containing only a space
 oh   //Leading and trailing spaces
you ?
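The output above matches String.split because the token between the two commas is a single space, not an empty string. The difference shows up with truly adjacent separators. To illustrate it without adding the Commons Lang dependency, here is a JDK-only sketch; `splitOnAnyChar` is a hypothetical helper that only approximates the documented StringUtils.split(str, separatorChars) rule (every listed character is a separator, adjacent separators count as one) and does not replicate its null handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SeparatorCharsSplit {
    // Hypothetical approximation of StringUtils.split(str, separatorChars):
    // any character in separatorChars ends a token, and empty tokens are never emitted.
    // Tokens are NOT trimmed, just like StringUtils.split.
    static List<String> splitOnAnyChar(String str, String separatorChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : str.toCharArray()) {
            if (separatorChars.indexOf(c) >= 0) {
                // Only emit non-empty tokens, so adjacent separators act as one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "a,,b,c,,";
        System.out.println(Arrays.toString(input.split(",")));  // prints [a, , b, c]
        System.out.println(splitOnAnyChar(input, ","));         // prints [a, b, c]
    }
}
```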

4) Google Guava Splitter example

This one is the best. It reads well and is reusable: you create a splitter once and reuse it as many times as you want, which helps keep the splitting logic uniform across similar use cases.

Another benefit is that it provides useful configuration methods while building the splitter, which eliminate much of the post-processing of tokens that we had to do in the examples above.
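The reuse works because a configured splitter is immutable: each configuration call returns a new object instead of changing the existing one. The JDK-only sketch below illustrates that design; `MiniSplitter` is a hypothetical class written for this illustration, NOT Guava's Splitter, and supports only a single-character separator. It follows Guava's documented rule that, when both options are set, results are trimmed before checking for emptiness.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "configure once, reuse everywhere" style; not Guava's API.
final class MiniSplitter {
    private final char separator;
    private final boolean omitEmpty;
    private final boolean trim;

    private MiniSplitter(char separator, boolean omitEmpty, boolean trim) {
        this.separator = separator;
        this.omitEmpty = omitEmpty;
        this.trim = trim;
    }

    static MiniSplitter on(char separator) {
        return new MiniSplitter(separator, false, false);
    }

    // Configuration methods return new instances, so a configured splitter is safe to share.
    MiniSplitter omitEmptyStrings() {
        return new MiniSplitter(separator, true, trim);
    }

    MiniSplitter trimResults() {
        return new MiniSplitter(separator, omitEmpty, true);
    }

    List<String> split(String input) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start <= input.length()) {
            int end = input.indexOf(separator, start);
            if (end < 0) {
                end = input.length();  // last token runs to the end of the input
            }
            String token = input.substring(start, end);
            if (trim) {
                token = token.trim();  // trim first...
            }
            if (!(omitEmpty && token.isEmpty())) {  // ...then check for emptiness
                result.add(token);
            }
            start = end + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        MiniSplitter niceCommaSplitter = MiniSplitter.on(',').omitEmptyStrings().trimResults();
        System.out.println(niceCommaSplitter.split("I,am ,Legend, , oh ,you ?"));  // [I, am, Legend, oh, you ?]
        System.out.println(niceCommaSplitter.split(" a ,, b "));                   // [a, b]
    }
}
```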

To build a nicely configured splitter, you can use the following code:

import com.google.common.base.Splitter;

Splitter niceCommaSplitter = Splitter.on(',').omitEmptyStrings().trimResults();

And now use it anywhere in your code as you like:

Iterable<String> tokens2 = niceCommaSplitter.split("I,am ,Legend, , oh ,you ?");
for (String token : tokens2)
{
	System.out.println(token);
}

Output:

I
am
Legend
oh
you ?

For reference, you can download the Guava library from the project's home page:

https://code.google.com/p/guava-libraries/

Alternatively, you can include it directly as a Maven dependency:

<dependency>
	<groupId>com.google.guava</groupId>
	<artifactId>guava</artifactId>
	<version>17.0</version>
</dependency>

Share your thoughts if you have other, better solutions to this specific problem of splitting a string in Java.

Happy Learning !!
