Until Java 8, a String in Java was internally represented by a char[]. Each char is stored in 2 bytes of memory. JDK developers at Oracle analyzed heap dumps from many client applications and noticed that most strings can be represented using only the Latin-1 character set. A Latin-1 character fits in one byte, which is 50% (1 byte) less than the storage needed for the char data type.
So, starting with Java 9, the JDK developers changed the internal storage of the String class from char[] to byte[]. This saves a lot of heap space, because String objects generally take up a large portion of heap memory. [Source]
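To make the saving concrete, here is a rough sketch that estimates the bytes of character data under the old and the new layout. The class and method names (StorageEstimate, isLatin1, charArrayBytes, compactBytes) are made up for illustration and are not JDK APIs. The estimate ignores object and array headers and does not inspect the real internals of String.

```java
final class StorageEstimate {

    // True when every char fits in one byte (values 0x00..0xFF), i.e. Latin-1.
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Old layout (char[]): always 2 bytes per char.
    static int charArrayBytes(String s) {
        return s.length() * 2;
    }

    // Compact layout (byte[]): 1 byte per char for Latin-1 content,
    // otherwise 2 bytes per char for the whole string.
    static int compactBytes(String s) {
        return isLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        String[] samples = {"hello", "caf\u00e9", "\u65e5\u672c\u8a9e"};
        for (String s : samples) {
            System.out.println(s + ": char[] = " + charArrayBytes(s)
                    + " bytes, compact = " + compactBytes(s) + " bytes");
        }
    }
}
```

For "hello" the estimate drops from 10 bytes to 5, and for "café" from 8 bytes to 4. The three-character Japanese sample stays at 6 bytes, because a single non-Latin-1 character forces 2 bytes per char for the whole string.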
This feature is enabled by default. You can disable it in your application by passing the -XX:-CompactStrings option to the java command.
String class BEFORE Java 9
Prior to Java 9, string data was stored as an array of chars. This required 16 bits for each char.
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    // The value is used for character storage.
    private final char value[];
}
String class SINCE Java 9
Starting with Java 9, strings are internally represented using a byte array, along with a coder field that identifies the encoding of those bytes.
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    /** The value is used for character storage. */
    @Stable
    private final byte[] value;

    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;
}
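The interplay between value and coder can be illustrated with a simplified model. The class below (CompactTextSketch, a hypothetical name) is not the JDK implementation. It only mimics the idea: try to pack every char into one byte, and fall back to two bytes per char otherwise. It assumes big-endian byte order for the UTF-16 case to keep the code short, whereas the real String uses the platform's byte order and heavily optimized routines.

```java
final class CompactTextSketch {

    static final byte LATIN1 = 0; // 1 byte per char
    static final byte UTF16 = 1;  // 2 bytes per char

    final byte[] value; // packed character data
    final byte coder;   // tells how to interpret 'value'

    CompactTextSketch(String s) {
        byte[] latin1 = tryLatin1(s);
        if (latin1 != null) {
            value = latin1;
            coder = LATIN1;
        } else {
            value = toUtf16(s);
            coder = UTF16;
        }
    }

    // Returns one byte per char, or null if any char does not fit in a byte.
    private static byte[] tryLatin1(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                return null;
            }
            out[i] = (byte) c;
        }
        return out;
    }

    // Stores each char as two bytes, high byte first (a simplification).
    private static byte[] toUtf16(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) (c >> 8);
            out[2 * i + 1] = (byte) c;
        }
        return out;
    }

    // Shifting by the coder halves the byte count for UTF16 and leaves it as-is for LATIN1.
    int length() {
        return value.length >> coder;
    }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }
}
```

With this model, new CompactTextSketch("hello") holds 5 bytes with coder LATIN1, while a string containing the euro sign (\u20ac) switches the whole value to coder UTF16 and doubles the byte count.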
‘java’ command reference
The java command is used to launch a Java application. It accepts many options that customize the application runtime. One such option is below:
-XX:-CompactStrings
Disables the Compact Strings feature. By default, this option is enabled. When this option is enabled, Java Strings containing only single-byte characters are internally represented and stored as single-byte-per-character Strings using ISO-8859-1 / Latin-1 encoding. This reduces, by 50%, the amount of space required for Strings containing only single-byte characters. For Java Strings containing at least one multibyte character: these are represented and stored as 2 bytes per character using UTF-16 encoding. Disabling the Compact Strings feature forces the use of UTF-16 encoding as the internal representation for all Java Strings.
Cases where it may be beneficial to disable Compact Strings include the following:
- When it’s known that an application overwhelmingly will be allocating multibyte character Strings
- In the unexpected event where a performance regression is observed in migrating from Java SE 8 to Java SE 9 and an analysis shows that Compact Strings introduces the regression
In both of these scenarios, disabling Compact Strings makes sense.
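If you want to confirm which setting a JVM was started with, a small check like the following may help. It is a sketch under a few assumptions: CompactStringsFlag and compactStringsRequested are illustrative names, the feature is treated as on by default (Java 9 and later), the last occurrence of the flag is assumed to win, and only the arguments reported by the standard RuntimeMXBean are inspected.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class CompactStringsFlag {

    // Returns false only if the arguments turn the feature off.
    // Assumption: enabled by default, and the last occurrence of the flag wins.
    static boolean compactStringsRequested(List<String> jvmArgs) {
        boolean enabled = true;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-CompactStrings")) {
                enabled = false;
            } else if (arg.equals("-XX:+CompactStrings")) {
                enabled = true;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Compact Strings requested: " + compactStringsRequested(jvmArgs));
    }
}
```

After compiling, running it as java -XX:-CompactStrings CompactStringsFlag should report false, and running it without the option should report true.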
Drop me your questions in the comments section.
Happy Learning !!