Garbage collection (GC) has been one of Java’s great features behind it’s popularity. Garbage collection is the mechanism used in Java to deallocate unused memory. Essentially, it is tracking down all the objects that are still used and marks the rest as garbage. Java’s garbage collection is considered an automatic memory management schema because programmers do not have to designate objects as ready to be deallocated. The garbage collection runs on low-priority threads.
In this tutorial, we will go through various concepts related to memory allocation/deallocatation, algorithms run behind the scene and what options you have to customize this behavior.
Table of Contents Object Life Cycle Garbage collection algorithms Mark and sweep Concurrent mark sweep (CMS) garbage collection Serial garbage collection Parallel garbage collection G1 garbage collection Customization Options Summary
Object Life Cycle
A java’s object life cycle can be seen in 3 stages:
To create an object, generally we use
Object obj = new Object();
When an object is created, a specific amount of memory is allocated for storing that object. The amount of memory allocated can differ based on architecture and JVM.
Object in use
Till the time, object is used by application’s other objects (other live objects have references pointing to it). During its usage, object reside in memory and may contain references to other objects.
The garbage collection system monitors objects and, as feasible, counts the number of references to each object. When there are no references to an object, there is no way to get to it with the currently running code, so it makes perfect sense to deallocate the associated memory.
Garbage collection algorithms
Object creation is done by code you write; and frameworks you use to use their provided features. As a java developer, we are not required to deallocate the memory or dereference the objects. It’s done automatically at JVM level by gargabe collector. Since java’s inception, there have been many updates on algorithms which run behind the scene to free the memory. Let’s see how they work?
Mark and sweep
It is initial and very basic algorithm which runs in two stages:
- Marking live objects – find out all objects that are still alive.
- Removing unreachable objects – get rid of everything else – the supposedly dead and unused objects.
To start with, GC defines some specific objects as Garbage Collection Roots. e.g. local variable and input parameters of the currently executing methods, active threads, static field of the loaded classes and JNI references. Now GC traverses the whole object graph in your memory, starting from those roots and following references from the roots to other objects. Every object the GC visits is marked as alive.
The application threads need to be stopped for the marking to happen as it cannot really traverse the graph if it keeps changing. It is called Stop The World pause.
Second stage is for getting rid of unused objects to freeup memory. This can be done in variety of ways e.g.
- Normal deletion – Normal deletion removes unreferenced objects to free space and leave referenced objects and pointers. The memory allocator (kind of hashtable) holds references to blocks of free space where new object can be allocated.
It is often reffred as
- Deletion with compacting – Only removing unused objects is not efficient because blocks of free memory is scattered across storage area and cause OutOfMemoryError, if created object big enough and does not find large enough memory block.
To solve this issue, after deleting unreferenced objects, compacting is done on the remaining referenced objects. Here compacting refer the process of moving referenced object together. This makes new memory allocation much easier and faster.
It is often reffred as
- Deletion with copying – It is very similar to mark and compacing approach as they too relocate all live objects. The important difference is that the target of relocation is a different memory region.
It is often reffred as
Concurrent mark sweep (CMS) garbage collection
CMS garbage collection is essentially an upgraded mark and sweep method. It scans heap memory using multiple threads. It was modified to take advantage of faster systems and had performance enhancements.
It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads. It uses the parallel stop-the-world mark-copy algorithm in the Young Generation and the mostly concurrent mark-sweep algorithm in the Old Generation.
To use CMS GC, use below JVM argument:
CMS GC Optimization Options
|Indicates that you want to solely use occupancy as a criterion for starting a CMS collection operation.
|Sets the percentage CMS generation occupancy to start a CMS collection cycle.
|This is the percentage of
MinHeapFreeRatio in CMS generation that is allocated prior to a CMS cycle starts.
|Sets the percentage of
MinHeapFreeRatio in the CMS permanent generation that is allocated before starting a CMS collection cycle.
|Use the parameter to specify how long the CMS is allowed to wait for young collection.
|Elects to use the parallel algorithm for young space collection.
|Enables the use of multiple threads for concurrent phases.
|Sets the number of parallel threads used for the concurrent phases.
|Sets the number of parallel threads you want used for stop-the-world phases.
|Enable the incremental CMS (iCMS) mode.
|If this is not enabled, CMS will not clean permanent space.
System.gc() to trigger concurrent collection instead of a full garbage collection cycle.
Serial garbage collection
This algorithm uses mark-copy for the Young Generation and mark-sweep-compact for the Old Generation. It works on a single thread. When executing, it freezes all other threads until garbage collection operations have concluded.
Due to the thread-freezing nature of serial garbage collection, it is only feasible for very small programs.
To use Serial GC, use below JVM argument:
Parallel garbage collection
Simimar to serial GC, It uses
mark-copy in the Young Generation and
mark-sweep-compact in the Old Generation. Multiple concurrent threads are used for marking and copying / compacting phases. You can configure the number of threads using
Parallel Garbage Collector is suitable on multi-core machines in cases where your primary goal is to increase throughput by efficient usage of existing system resources. Using this approach, GC cycle times can be considerably reduced.
To use parallel GC, use below JVM argument:
G1 garbage collection
The G1 (Garbage First) garbage collector was available in Java 7 and is designed to be the long term replacement for the CMS collector. The G1 collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector.
This approach involves segmenting the memory heap into multiple small regions (typically 2048). Each region is marked as either young generation (further devided into eden regions or survivor regions) or old generation. This allows the GC to avoid collecting the entire heap at once, and instead approach the problem incrementally. It means that only a subset of the regions is considered at a time.
G1 keep tracking of the amount of live data that each region contains. This information is used in determining the regions that contain the most garbage; so they are collected first. That’s why it is name garbage-first collection.
Just like other algorithms, unfortunately, the compacting operation takes place using the Stop the World approach. But as per it’s design goal, you can set specific performance goals to it. You can configure the pauses duration e.g. no more than 10 milliseconds in any given second. Garbage-First GC will do its best to meet this goal with high probability (but not with certainty, that would be hard real-time due to OS level thread management).
If you want to use in Java 7 or Java 8 machines, use JVM argument as below:
G1 Optimization Options
|Size of the heap region. The value will be a power of two and can range from 1MB to 32MB. The goal is to have around 2048 regions based on the minimum Java heap size.
|Sets a target value for desired maximum pause time. The default value is 200 milliseconds. The specified value does not adapt to your heap size.
|This determines the minimum reserve in the heap.
|This is the confidence coefficient pause prediction heuristics.
|This is the pause interval time slice per MMU in milliseconds.
GC Customization Options
GC configuration flags
|Sets the initial and maximum heap size (young space plus tenured space).
|This will cause the JVM to ignore any System.gc() method invocations by an application.
|This is the use policy used to limit the time spent in garbage collection before an OutOfMemory error is thrown.
|This limits the proportion of time spent in garbage collection before an
OutOfMemory error is thrown. This is used with
|This sets the minimum percentage of free space after a full garbage collection before an
OutOfMemory error is thrown. This is used with
|Sets the initial heap size (young space plus tenured space).
|Sets the maximum heap size (young space plus tenured space).
|Sets the initial size of young space.
|Sets the maximum size of young space.
|Sets the size of single survivor space as a portion of Eden space size.
|Sets the initial size of the permanent space.
|Sets the maximum size of the permanent space.
|Sets the size of the stack area dedicated to each thread in bytes.
GC logging flags
|-verbose:gc or -XX:+PrintGC
|This prints the basic garbage collection information.
|This will print more detailed garbage collection information.
|You can print timestamps for each garbage collection event. The seconds are sequential and begin from the JVM start time.
|You can print date stamps for each garbage collection event.
|Using this you can redirect garbage collection output to a file instead of the console.
|You can print detailed information regarding young space following each collection cycle.
|You can use this flag to print TLAB allocation statistics.
|Using this flag, you can print the times for reference processing (that is, weak, soft, and so on) during stop-the-world pauses.
|This creates a heap dump file in an out-of-memory condition.
So in this java garbage collection tutorial, we learned the following –
- Object life cycle is devided into 3 phases i.e. object creation, object in use and object destruction.
- Different single threaded and concurrent GC algorithms.
- Till java 8, parallel GC was default algorithm.
- Since java 9, G1 has been set as default GC algorithm.
- Also, various flags to control the garbage collection algorithm’s behavior and log useful information for any application.
Drop me your questions in comments section.
Happy Learning !!