We all are aware of the functional responsibilities of garbage collector [GC] in Java. But only few try to go real deep on how garbage collection works. You are not one of them and that’s why you are here.
In this Java memory management tutorial, we will try to understand the current algorithms for Java garbage collections and we will understand the evolution of these algorithms.
Table of Contents 1. Memory management in Java 2. Reference counting mechanism 3. Mark and sweep mechanism 4. Stop and copy GC 5. Generational stop and copy 6. How to improve memory utilization in Java
1. Memory management in Java
Memory management in Java is responsibility of garbage collector. This is opposite to what has been a practice before Java, where programmer were responsible for allocating de-allocating the memory in programs.
Formally speaking, garbage collector is responsible for –
- allocating memory
- ensuring that any referenced objects remain in memory, and
- recovering memory used by objects that are no longer reachable from references in executing code.
During application runtime, applications create lots of objects and each object has it’s life cycle. Inside memory, objects that are referenced by other objects are said to be live objects. Objects that are no longer referenced by any live object are considered dead objects and are termed garbage. The process of finding and freeing (also known as reclaiming) the space used by these objects is known as garbage collection.
Garbage collection solves many, but not all, memory allocation problems. We could, for example, create objects indefinitely and continue referencing them until there is no more memory available (Out of memory error). Garbage collection is a complex task which takes time and resources of its own. It is run on space which is commonly allocated from a large pool of memory referred to as the heap.
The timing of garbage collection is up to the garbage collector. Typically, the entire heap or a sub-part of it is collected either when it fills up or when it reaches a threshold percentage of occupancy.
The Java HotSpot virtual machine includes four garbage collectors as of J2SE 5.0. All the collectors are generational. We will learn more about generational GC in later sections.
2. Reference counting mechanism
This has been a very old GC mechanism from initial versions. In reference counting technique, each object has count of number of pointers to it from other objects and from the stack. Every time a new object reference it, counter increment by one. Similarly, when any object loses its reference, the counter decrement by one. When count reaches ‘0’, object can be de-allocated by garbage collector.
The main advantage of reference counting algorithm has been small amount of work per memory write when allocating to new object. But, it has very critical problem with data cycles. It means when first object was referred with second object, and second is referred with first object (cyclic references), then count never comes to zero, hence they never get garbage collected.
3. Mark and sweep mechanism
The mark-and-sweep algorithm was the first garbage collection algorithm to be developed that is able to reclaim cyclic data structures. In this algorithm, GC will first identify some objects as default reachable which are generally global variables and local variables in stack. There are called live objects.
In next step, algorithm start tracing the objects from these live objects and mark them live also. This procedure goes on until all the objects are examined and marked as live. The objects not marked live after full tracing are assumed dead objects.
When using mark-and-sweep, un-referenced objects are not reclaimed immediately. Instead, garbage collection is allowed to accumulate until all available memory has been exhausted. When that happens, the execution of the program is suspended temporarily (It is called stop the world) while the mark-and-sweep algorithm collects all the garbage. Once all un-referenced objects have been reclaimed, the normal execution of the program can resume.
This technique, apart from pausing the application for some duration, need to do de-fragmentation of memory address space frequently which is another overhead.
4. Stop and copy GC
Like “mark and sweep”, this algorithm also depends on identifying the live objects and marking them. The difference lies in how it handles live objects.
Stop and copy technique devises the whole heap in two semi-spaces. Only one semispace is active at a time, and memory allocation for newly created objects takes place only single semispace, while the other remain calm.
When GC run, it starts marking live objects in current semispace and when its done, it copies all live objects to other semispace. All the remaining objects in current semispace are considered dead and they are garbage collected.
As previous approach, it has some advantages like it only touches live objects. Additionally, no fragmentation is required because while switching semispaces, memory contraction is done.
Main disadvantages of this approach is the need to twice the size of memory needed, because only half is used at a given point of time. Other than this, it also required to stop the world while switching the semi spaces.
5. Generational stop and copy
Like “stop and copy” technique, it also divides the memory in semispaces but they are now three semispaces. These semispaces are called here generations. So, memory in this technique is organized into three generations- young generation, old generation, and permanent generation.
Most objects are initially allocated in the young generation. The old generation contains objects that have survived some number of young generation collections, as well as some large objects that may be allocated directly in the old generation. The permanent generation holds objects that the JVM finds convenient to have the garbage collector manage, such as objects describing classes and methods, as well as the classes and methods themselves.
When the young generation fills up, a young generation garbage collection (sometimes referred to as a minor collection) of just that generation is performed. When the old or permanent generation fills up, what is known as a full garbage collection (sometimes referred to as a major collection) is typically done. That is, all generations are collected.
Commonly, the young generation is collected first, using the garbage collection algorithm designed specifically for that generation, because it is usually the most efficient algorithm for identifying garbage in the young generation. Objects that survive GC traces, get pushed into older generations. The older generations are collected less often for obvious reasons i.e. they are there because they will be for longer time. Apart from above if fragmentation/compaction occurs, each generation is compacted separately.
The main advantages of this technique is to reclaim dead objects early in younger generation itself and do not need to scan whole memory everytime to identify dead objects. Older generation objects have already passed some GC cycles so they are assumed to be in system for longer time, so no need to scan them frequently [not perfect case everytime, but mostly It should be].
Disadvantages again are same i.e. need to defragment memory areas and need to stop the world (application) while the GC is running full scan.
6. How to improve memory utilization in Java
- Do not allocate excessive memory. Allocate memory only just as much needed. This is specially applicable to Java arrays.
- Don’t hold on to references. Once the object is used and no longer needed, assign
nullreference to it.
- Find and resolve memory leaks
- Do system profiling on each release to verify memory bumps
- Do not rely on System.gc() to run garbage collection
I hope it has been a refresher to you regarding garbage collection mechanisms which enables automatic memory management for Java programs. This may help you in answering Java memory management interview questions.
Happy Learning !!