Until Java 8, a String in Java was internally backed by a char[]. Each char takes 2 bytes in memory. JDK developers at Oracle analyzed heap dumps from many customer applications and noticed that most strings could be represented using only the Latin-1 character set. A Latin-1 character fits in one byte, which is 50% (1 byte) less than the storage needed for a char.
So, in Java 9, the JDK developers changed the internal storage of the String class from char[] to byte[]. This saves a lot of heap space, because String objects generally take up a large portion of the heap. [Source]
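To make the saving concrete, here is a minimal sketch that estimates the character payload of a string with and without compaction. The class and method names (CompactStringEstimate, fitsLatin1, payloadBytes) are made up for illustration. The numbers cover only the character data, not object or array headers, so treat them as a rough approximation rather than a real heap measurement.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the JDK.
final class CompactStringEstimate {

    /** True if every char in s can be stored in a single Latin-1 byte (value <= 0xFF). */
    static boolean fitsLatin1(String s) {
        // A new encoder per call, because CharsetEncoder instances are not thread-safe.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    /** Approximate bytes used by the character data alone (headers are ignored). */
    static int payloadBytes(String s, boolean compactStrings) {
        int bytesPerChar = (compactStrings && fitsLatin1(s)) ? 1 : 2;
        return s.length() * bytesPerChar;
    }

    public static void main(String[] args) {
        String latin = "Hello, Java";   // 11 chars, all Latin-1
        String mixed = "Java \u2615";   // 6 chars, one of them outside Latin-1

        System.out.println(payloadBytes(latin, false)); // 22
        System.out.println(payloadBytes(latin, true));  // 11
        System.out.println(payloadBytes(mixed, true));  // 12, no saving
    }
}
```

Note that a single character outside Latin-1 is enough to make the whole string use 2 bytes per character.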
Compact Strings are enabled by default. You can turn the feature off for your application by passing the -XX:-CompactStrings option to the java command.
String class BEFORE Java 9
Prior to Java 9, string data was stored as an array of chars. This required 16 bits for each char.
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    // The value is used for character storage.
    private final char value[];
}
String class FROM Java 9 onward
Starting with Java 9, strings are internally represented using a byte array along with a coder field that identifies the encoding (LATIN1 or UTF16) of those bytes.
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    /** The value is used for character storage. */
    @Stable
    private final byte[] value;

    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;
}
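The idea of a byte array plus a coder can be sketched with a small stand-alone class. This is a simplified model and not the JDK's actual implementation: the class name CompactText and its methods are hypothetical, and the two-byte layout here is assumed to be big-endian just to keep the example short.

```java
// Hypothetical, simplified model of "byte[] value + byte coder"; not the JDK source.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per char
    static final byte UTF16 = 1;  // two bytes per char

    final byte[] value;
    final byte coder;

    CompactText(String s) {
        byte[] narrow = tryCompress(s);
        if (narrow != null) {
            value = narrow;
            coder = LATIN1;
        } else {
            // Fall back to two bytes per char (big-endian here, an assumption of this sketch).
            byte[] wide = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                wide[2 * i] = (byte) (c >> 8);
                wide[2 * i + 1] = (byte) c;
            }
            value = wide;
            coder = UTF16;
        }
    }

    /** Returns one byte per char, or null as soon as a char above 0xFF is found. */
    private static byte[] tryCompress(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                return null;
            }
            out[i] = (byte) c;
        }
        return out;
    }

    /** Number of chars: the byte count, halved when the coder is UTF16. */
    int length() {
        return value.length >> coder;
    }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        CompactText a = new CompactText("Hello");
        CompactText b = new CompactText("Java \u2615");
        System.out.println(a.coder + " " + a.value.length); // 0 5
        System.out.println(b.coder + " " + b.value.length); // 1 12
    }
}
```

The coder lets every read operation decide how to interpret the bytes, so callers still see ordinary chars regardless of how the data is stored.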
‘java’ command reference
The java command is used to launch a Java application. It accepts many options to customize the application runtime. One such option is below:
-XX:-CompactStrings
Disables the Compact Strings feature. By default, this option is enabled. When this option is enabled, Java Strings containing only single-byte characters are internally represented and stored as single-byte-per-character Strings using ISO-8859-1 / Latin-1 encoding. This reduces, by 50%, the amount of space required for Strings containing only single-byte characters. For Java Strings containing at least one multibyte character: these are represented and stored as 2 bytes per character using UTF-16 encoding. Disabling the Compact Strings feature forces the use of UTF-16 encoding as the internal representation for all Java Strings.
Cases where it may be beneficial to disable Compact Strings include the following:
- When it’s known that an application overwhelmingly will be allocating multibyte character Strings
- In the unexpected event where a performance regression is observed in migrating from Java SE 8 to Java SE 9 and an analysis shows that Compact Strings introduces the regression
In both of these scenarios, disabling Compact Strings makes sense.
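If you want to confirm which setting a running JVM is actually using, one possible approach is to read the flag through the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM where the com.sun.management API is available. The class name CompactStringsFlag and the "unknown" fallback are choices made for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; assumes a HotSpot-based JVM.
final class CompactStringsFlag {

    /** Returns "true" or "false" as reported by the JVM, or "unknown" if the flag cannot be read. */
    static String compactStringsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption("CompactStrings").getValue();
        } catch (RuntimeException e) {
            // For example, IllegalArgumentException when the JVM has no such option (pre-Java 9).
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsSetting());
    }
}
```

On a JDK that supports launching single source files (Java 11 or later), running java -XX:-CompactStrings CompactStringsFlag.java should report false, while running it without the option should report true.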