Java 9 – Compact Strings Improvement [JEP 254]

Lokesh Gupta

Until Java 8, a String in Java was internally represented by a char[], with each char occupying 2 bytes of memory. JDK developers at Oracle analyzed heap dumps from many client applications and noticed that most strings could be represented using only the Latin-1 character set. A Latin-1 character fits in a single byte, which is 50% (1 byte) less than the storage a char needs.

So the JDK developers changed the default internal storage of the String class from char[] to byte[]. This saves a lot of heap space, because String objects generally take up a large portion of the heap. [Source]
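The 50% figure can be checked with simple arithmetic. The sketch below is only an estimate of the character payload (object and array headers are ignored), and the class and method names are hypothetical, chosen for illustration rather than taken from the JDK.

```java
class StorageEstimate {

    // True when every char has a value of at most 0xFF, i.e. fits in Latin-1.
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Estimated bytes for the character data only.
    // compact = false models the char[] layout (always 2 bytes per char).
    // compact = true models the byte[] layout (1 byte per char when all chars are Latin-1).
    static int estimatedPayloadBytes(String s, boolean compact) {
        int bytesPerChar = (compact && isLatin1(s)) ? 1 : 2;
        return s.length() * bytesPerChar;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String mixed = "h\u20ACllo"; // the euro sign is outside Latin-1

        System.out.println(estimatedPayloadBytes(ascii, false)); // 10
        System.out.println(estimatedPayloadBytes(ascii, true));  // 5
        System.out.println(estimatedPayloadBytes(mixed, true));  // 10
    }
}
```

Note that a single non-Latin-1 character is enough to put the whole string back at 2 bytes per char.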

This feature is enabled by default. You can disable it for your application by passing the -XX:-CompactStrings option to the java command.

String class BEFORE Java 9

Prior to Java 9, string data was stored as an array of chars. This required 16 bits for each char.

public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {

    // The value is used for character storage.
    private final char value[];

}

String class SINCE Java 9

Starting with Java 9, strings are internally represented using a byte array along with a coder field that identifies the encoding of those bytes.

public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {

    /** The value is used for character storage. */
    @Stable
    private final byte[] value;

    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;

}
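To make the role of the two fields concrete, here is a simplified model of a byte array plus coder. It is a sketch, not the JDK's String implementation: the class CompactText and its helper methods are hypothetical, and it assumes big-endian byte order for the UTF-16 case purely for readability.

```java
final class CompactText {

    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    CompactText(char[] chars) {
        byte[] compressed = tryCompress(chars);
        if (compressed != null) {
            value = compressed;          // one byte per char
            coder = LATIN1;
        } else {
            value = toUtf16Bytes(chars); // two bytes per char
            coder = UTF16;
        }
    }

    // Returns one byte per char, or null if any char does not fit in Latin-1.
    private static byte[] tryCompress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] > 0xFF) {
                return null;
            }
            out[i] = (byte) chars[i];
        }
        return out;
    }

    // Stores each char as two bytes, high byte first (an assumption of this sketch).
    private static byte[] toUtf16Bytes(char[] chars) {
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            out[2 * i] = (byte) (chars[i] >> 8);
            out[2 * i + 1] = (byte) chars[i];
        }
        return out;
    }

    // The coder doubles as a shift: 0 for one byte per char, 1 for two.
    int length() {
        return value.length >> coder;
    }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    byte coder() {
        return coder;
    }

    int storedBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        CompactText latin = new CompactText("java".toCharArray());
        CompactText wide = new CompactText("j\u20ACva".toCharArray());

        System.out.println(latin.coder() + " " + latin.storedBytes()); // 0 4
        System.out.println(wide.coder() + " " + wide.storedBytes());   // 1 8
    }
}
```

Callers of length() and charAt() see the same results for either coder, which is why the change can stay invisible to code that uses String.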

‘java’ command reference

The java command launches a Java application and accepts many options that customize the runtime. One such option is:

-XX:-CompactStrings

Disables the Compact Strings feature, which is enabled by default. When the feature is enabled, Java Strings containing only single-byte characters are internally represented and stored as single-byte-per-character Strings using ISO-8859-1 / Latin-1 encoding. This reduces, by 50%, the amount of space required for Strings containing only single-byte characters. Java Strings containing at least one multibyte character are represented and stored as 2 bytes per character using UTF-16 encoding. Disabling the Compact Strings feature forces the use of UTF-16 encoding as the internal representation for all Java Strings.

Cases where it may be beneficial to disable Compact Strings include the following:

  1. When it’s known that an application overwhelmingly will be allocating multibyte character Strings
  2. In the unexpected event where a performance regression is observed in migrating from Java SE 8 to Java SE 9 and an analysis shows that Compact Strings introduces the regression

In both of these scenarios, disabling Compact Strings makes sense.
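When comparing runs with and without the feature, it can help to confirm which setting a given JVM was actually started with. The sketch below reads the JVM's input arguments through the standard RuntimeMXBean and looks for the flag. The class name and helper method are hypothetical, and the code assumes that when both the + and - forms are passed, the last one wins. It only inspects flags given on the command line, not the VM's internal state.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class CompactStringsFlagCheck {

    // Returns true if the given JVM arguments turn Compact Strings off.
    // Assumption: a later occurrence of the flag overrides an earlier one.
    static boolean disabledByFlag(List<String> jvmArgs) {
        boolean disabled = false; // the feature is on by default
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-CompactStrings")) {
                disabled = true;
            } else if (arg.equals("-XX:+CompactStrings")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Compact Strings disabled by flag: " + disabledByFlag(jvmArgs));
    }
}
```

Launching this class with java -XX:-CompactStrings CompactStringsFlagCheck should report true, and launching it without the option should report false.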

This is purely an implementation change, with no changes to existing public interfaces.

Drop me your questions in the comments section.

Happy Learning !!

