We all are aware of the functional responsibilities of garbage collector [GC] in java. But, only few try to go real deep on how actually it works. You are not one of them and that’s why you are here. In this post, we will try to understand the current mechanism of GC and will understand its evolution.
Sections in this post: Memory management in java Reference counting mechanism Mark and sweep mechanism Stop and copy GC Generational stop and copy How to increase memory performance
Memory management in java
Memory management in java is responsibility of garbage collector. This is opposite to what has been a practice before java, where programmer were responsible for allocating deallocating the memory in programs. Formally speaking, garbage collector is responsible for
- allocating memory
- ensuring that any referenced objects remain in memory, and
- recovering memory used by objects that are no longer reachable from references in executing code.
Objects that are referenced are said to be live. Objects that are no longer referenced are considered dead and are termed garbage. The process of finding and freeing (also known as reclaiming) the space used by these objects is known as garbage collection.
Garbage collection solves many, but not all, memory allocation problems. You could, for example, create objects indefinitely and continue referencing them until there is no more memory available (Out of memory error). Garbage collection is also a complex task taking time and resources of its own. Space is commonly allocated from a large pool of memory referred to as the heap.
The timing of garbage collection is up to the garbage collector. Typically, the entire heap or a sub-part of it is collected either when it fills up or when it reaches a threshold percentage of occupancy.
The Java HotSpot virtual machine includes four garbage collectors as of J2SE 5.0. All the collectors are generational. We will learn more about generational GC in later sections.
Reference counting mechanism
This has been a very old GC mechanism from initial versions. In this technique, Each object has count of number of pointers to it from other objects and from the stack. Every time a new object reference it, counter increment by one. Similarly, when any object loses its reference, the counter decrement by one. When count reaches 0, object can be deallocated by garbage collector.
Its main advantage has been small amount of work per memory write when allocating to new object. But, it has very critical problem and it was with data cycles. It means when first object was referred with second object, and second is referred with first object, then count never comes to zero, hence they never get garbage collected.
Mark and sweep mechanism
The mark-and-sweep algorithm was the first garbage collection algorithm to be developed that is able to reclaim cyclic data structures. In this algorithm, GC will first identify some objects as default reachable which are generally global variables and local variables in stack. There are called live objects.
In next step, algorithm start tracing the objects from these live objects and mark them live also. This procedure goes on until all the objects are examined and marked as live. The objects not marked live after full tracing are assumed dead objects.
When using mark-and-sweep, unreferenced objects are not reclaimed immediately. Instead, garbage is allowed to accumulate until all available memory has been exhausted. When that happens, the execution of the program is suspended temporarily (It is called stop the world) while the mark-and-sweep algorithm collects all the garbage. Once all unreferenced objects have been reclaimed, the normal execution of the program can resume.
This technique, apart from pausing the application for some duration, need to do de-fragmentation of memory address space frequently which is another overhead.
Stop and copy GC
Like above technique “mark and sweep”, this also depends on identifying the live objects and marking them. The difference lies in how it handles live objects. This technique devised the whole heap in two semi-spaces. Only one semispace is active at a time, and memory allocation for newly created objects takes place only single semispace, while the other remain calm.
When GC run, it starts marking live objects in current semispace and when its done, it copies all live objects to other semispace. All the remaining objects in current semispace are considered dead and they are garbage collected.
As previous approach, it has some advantages like it only touches live objects,. Additionally, no fragmentation is required because while switching semispaces, memory contraction is done.
Main disadvantages of this approach is the need to twice the size of memory needed, because only half is used at a given point of time. Other than this, it also required to stop the world while switching the semi spaces.
Generational stop and copy
Like “stop and copy” technique, it also divides the memory in semispaces but they are now three. These semispaces are called here generation. So, memory in this technique is organized into three generations: a young generation, an old generation, and a permanent generation. Most objects are initially allocated in the young generation. The old generation contains objects that have survived some number of young generation collections, as well as some large objects that may be allocated directly in the old generation. The permanent generation holds objects that the JVM finds convenient to have the garbage collector manage, such as objects describing classes and methods, as well as the classes and methods themselves.
When the young generation fills up, a young generation garbage collection (sometimes referred to as a minor collection) of just that generation is performed. When the old or permanent generation fills up, what is known as a full garbage collection (sometimes referred to as a major collection) is typically done. That is, all generations are collected.
Commonly, the young generation is collected first, using the collection algorithm designed specifically for that generation, because it is usually the most efficient algorithm for identifying garbage in the young generation. Objects that survive GC traces, get pushed into older generations. The older generations are collected less often for obvious reasons i.e. they are there because they will be for longer time. Apart from above if fragmentation/compaction occurs, each generation is compacted separately.
The main advantages of this technique is to reclaim dead objects early in younger generation itself and do not need to scan whole memory everytime to identify dead objects. Older generation objects have already passed some GC cycles so they are assumed to be in system for longer time, so no need to scan them frequently [not perfect case everytime, but mostly It should be].
Disadvantages again are same i.e. need to defragment memory areas and need to stop the world (application) while the GC is running full scan.
How to increase memory performance
1) Don not allocate excessive memory. Allocate just as much needed.
2) Don’t hold on to references
3) Find and resolve memory leaks
4) Do system profiling on each release to verify memory bumps
5) Do not rely on System.gc()
I hope it has been a refresher to you regarding garbage collection mechanism. If you are looking for more information on topics on core java, read these posts.
Happy Learning !!