The Java Virtual Machine (JVM)

Selecting and tuning a JVM is an important part of running any Java application smoothly. Applications with low latency requirements often need more attention paid to the JVM, because the JVM is often a major factor in performance.

This section outlines JVM selection and gives advice on tuning these JVMs for low latency applications. Many JVM vendors exist, and each JVM has slightly different configurable parameters, so only a few key vendors and important configuration parameters are covered.

As mentioned above, JVMs are available from a variety of vendors. Some are free; others require a license. This section outlines a selection of these JVMs.

This JVM is suitable to fulfil most users' needs for Universal Messaging.

This JVM is included in the Universal Messaging distribution kit for the Windows, Linux and Solaris platforms.

This JVM was made free and publicly available in May 2011. It contains many of the assets from the Oracle HotSpot VM.

The Azul Zing VM is a commercial offering from Azul. Its primary feature is 'Pauseless Garbage Collection'. This VM is well suited to applications which require the lowest possible latency. Applications which experience long garbage collection pause times may also benefit from using this VM.

This section covers parameters for the Oracle HotSpot VM which may help improve application performance. These settings can be applied to a Universal Messaging Realm Server by editing the Server_Common.conf file found under the <InstallDir>/UniversalMessaging/server/<InstanceName>/bin directory of your installation, where <InstanceName> is the name of the Universal Messaging realm.

Below are some general tuning parameters which can be applied to the HotSpot VM.

-Xmx	The maximum heap size of the JVM.
-Xms	The minimum heap size of the JVM. Set this equal to the maximum heap size.
-XX:+UseLargePages	Allows the JVM to use large pages. This may improve memory access performance. The system must be configured to use large pages.
-XX:+UseNUMA	Allows the JVM to take advantage of non-uniform memory access (NUMA). This may improve memory performance.
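To confirm that settings like these have actually reached a JVM, the standard management API can report the startup arguments and heap limits. The following is a minimal sketch, not part of Universal Messaging; the class name JvmSettingsCheck is hypothetical, and the program reports on the JVM it runs in, so it would need to be started with the same options as the server.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports the options and heap limits of the JVM it runs in.
final class JvmSettingsCheck {

    /** Returns true if the JVM argument list contains exactly the given flag. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, for example -Xmx, -Xms or -XX:+UseNUMA.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Runtime rt = Runtime.getRuntime();

        System.out.println("JVM arguments: " + jvmArgs);
        // maxMemory() roughly reflects -Xmx; totalMemory() is the heap currently committed.
        System.out.println("Max heap (MiB):       " + toMiB(rt.maxMemory()));
        System.out.println("Committed heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("UseLargePages requested: " + hasFlag(jvmArgs, "-XX:+UseLargePages"));
        System.out.println("UseNUMA requested:       " + hasFlag(jvmArgs, "-XX:+UseNUMA"));
    }
}
```

When -Xms and -Xmx are equal, the committed and maximum heap figures should be close to each other from startup.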

It is important to collect proper monitoring information when tuning an application. This allows you to quantify the results of changes made to the environment. Monitoring information about garbage collection can be collected from a JVM without any significant performance penalty.

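We recommend using the most verbose monitoring settings. On HotSpot releases up to Java 8, these can be activated by adding options such as the following to the Server_Common.conf file:

-verbose:gc	Enables basic garbage collection logging.
-XX:+PrintGCDetails	Prints per-generation detail for each collection.
-XX:+PrintGCDateStamps	Prefixes each entry with the absolute date and time.
-XX:+PrintGCApplicationStoppedTime	Prints the time for which application threads were stopped.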

This will produce output similar to the following:

2012-07-06T11:42:37.439+0100:
[GC
[ParNew: 17024K->1416K(19136K), 0.0090341 secs]
17024K->1416K(260032K), 0.0090968 secs]
[Times: user=0.02 sys=0.01, real=0.01 secs]

The line starts by printing the time of the garbage collection. If Date Stamps are enabled, this will be the absolute time, otherwise it will be the uptime of the process. Printing the full date is useful for correlating information taken from the nirvana logs or other application logs.

The next line shows whether this is a full collection. If the log prints GC, this is a young generation collection. Full garbage collections are denoted by Full GC; the suffix (System) indicates a collection requested explicitly through System.gc(). Full garbage collections are often orders of magnitude longer than young garbage collections, so low latency systems should avoid them. Applications which produce many full garbage collections may need analysis to reduce the stress placed on the JVM's memory management.

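The next line displays the garbage collector's type. In this example ParNew is the parallel young generation collector, which is normally paired with the CMS old generation collector. Detailed explanations of garbage collectors are provided elsewhere on this page. Next to the type, the log shows the occupancy of that generation before and after the collection, with its capacity in parentheses, followed by the time the collection took. The amount of memory reclaimed is the difference between the two occupancy figures. The figures on the following line give the same information for the whole heap. A young garbage collection produces only one entry like this; a full garbage collection produces one entry for the young generation collection and another for the old generation collection.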

The last line in this example shows the time spent in the garbage collection, in seconds. The user time is the total processor time used by the garbage collector in user mode. The system time is the total processor time used by the garbage collector in privileged mode. The real time is the wall clock time that the garbage collection took. On single core systems this is approximately the user time plus the system time; on multiprocessor systems it is often less, because the garbage collector uses multiple cores.
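The fields described above can also be extracted programmatically when many log entries need to be compared. The following is a minimal sketch that parses a young generation entry of exactly the shape shown in the example, assuming the entry has been joined onto a single line; the class name GcLogLine is hypothetical, and other collectors or JVM versions print different layouts that this pattern will not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one young-collection entry in the format shown above.
final class GcLogLine {
    // Matches "[ParNew: 17024K->1416K(19136K), 0.0090341 secs] 17024K->1416K(260032K), 0.0090968 secs]"
    private static final Pattern YOUNG_GC = Pattern.compile(
            "\\[(\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]\\s*"
            + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    final String collector;     // for example "ParNew"
    final long youngBeforeKb;   // young generation occupancy before the collection
    final long youngAfterKb;    // young generation occupancy after the collection
    final long heapBeforeKb;    // whole-heap occupancy before the collection
    final long heapAfterKb;     // whole-heap occupancy after the collection
    final double pauseSecs;     // total time reported for the collection

    private GcLogLine(Matcher m) {
        collector = m.group(1);
        youngBeforeKb = Long.parseLong(m.group(2));
        youngAfterKb = Long.parseLong(m.group(3));
        heapBeforeKb = Long.parseLong(m.group(6));
        heapAfterKb = Long.parseLong(m.group(7));
        pauseSecs = Double.parseDouble(m.group(9));
    }

    /** Parses one entry; returns null if the line has a different shape. */
    static GcLogLine parse(String line) {
        Matcher m = YOUNG_GC.matcher(line);
        return m.find() ? new GcLogLine(m) : null;
    }

    public static void main(String[] args) {
        String line = "2012-07-06T11:42:37.439+0100: [GC [ParNew: 17024K->1416K(19136K), 0.0090341 secs] "
                + "17024K->1416K(260032K), 0.0090968 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]";
        GcLogLine gc = parse(line);
        System.out.println(gc.collector + " freed " + (gc.youngBeforeKb - gc.youngAfterKb)
                + "K from the young generation in " + gc.pauseSecs + " secs");
    }
}
```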

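The -XX:+PrintGCApplicationStoppedTime flag also causes the explicit application pause time to be printed in the garbage collection output. This output usually takes the form "Total time for which application threads were stopped: N seconds", where N is the length of the pause.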

If you observe high client latencies as well as long application pause times, it is likely that the garbage collection mechanism is having an adverse effect on the performance of your application.
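Cumulative collection counts and times are also available at runtime through the standard java.lang.management API, which can complement the log output. The following is a minimal sketch; the class name GcCounters is hypothetical, the reported times are approximate accumulated figures rather than exact pause times, and the program reports on the JVM it runs in. Reading the counters of a running realm server would require a JMX connection, which is not shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Locale;

// Hypothetical helper: prints cumulative GC statistics for the current JVM.
final class GcCounters {

    /** Formats one collector's cumulative statistics, guarding against an undefined count. */
    static String describe(String name, long count, long totalMs) {
        double averageMs = count > 0 ? (double) totalMs / count : 0.0;
        return String.format(Locale.ROOT, "%s: %d collections, %d ms total, %.2f ms average",
                name, count, totalMs, averageMs);
    }

    public static void main(String[] args) {
        // One bean per collector, typically one for the young and one for the old generation.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
    }
}
```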

The garbage collector can be one of the most important aspects of Java Virtual Machine tuning. Long pause times can noticeably degrade an application's performance. Below are some suggestions for combating specific problems observed by monitoring garbage collection pause times.

Full garbage collections are expensive, and often take an order of magnitude longer than a young generation garbage collection to complete. This kind of collection occurs when the old generation is full, and the JVM attempts to promote objects from the younger generation to the older generation. There are two scenarios where this can happen on a regular basis:

If the information from garbage collection monitoring shows that full garbage collections are removing very few objects from the old generation, and that the old generation remains nearly full after an old generation collection, it is likely that there are many objects on the heap that cannot be cleaned up.

In the case of a Universal Messaging Realm Server exhibiting this symptom, it would be prudent to do an audit of data stored on the server. Stored events, ACL entries, data groups, channels and queues all contribute to the memory footprint of Universal Messaging. Reducing this footprint by removing unused or unnecessary objects will reduce the frequency of full collections.
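One way to check for this symptom at runtime is to read how full the old generation was immediately after its most recent collection. The following is a minimal sketch using the standard MemoryPoolMXBean API; the class name OldGenOccupancy is hypothetical, and the pool-name matching is a heuristic based on the names HotSpot commonly uses (for example "CMS Old Gen", "PS Old Gen" or "Tenured Gen"), so it is an assumption that may not hold on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports old generation occupancy after the last collection.
final class OldGenOccupancy {

    /** Heuristic match on common HotSpot old generation pool names (an assumption). */
    static boolean looksLikeOldGen(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    /** Fraction of the pool still in use, or -1 if the maximum is undefined. */
    static double occupancy(long usedBytes, long maxBytes) {
        return maxBytes > 0 ? (double) usedBytes / maxBytes : -1.0;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !looksLikeOldGen(pool.getName())) {
                continue;
            }
            // Usage measured just after the most recent collection of this pool; may be null.
            MemoryUsage after = pool.getCollectionUsage();
            if (after == null) {
                System.out.println(pool.getName() + ": collection usage not supported");
                continue;
            }
            System.out.println(pool.getName() + " occupancy after last collection: "
                    + occupancy(after.getUsed(), after.getMax()));
        }
    }
}
```

An occupancy that stays close to 1.0 after old generation collections suggests that most of the old generation is live data.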

If the information from garbage collection monitoring shows that young garbage collection results in many promotions on a consistent basis, then the JVM is likely to have to perform full garbage collections frequently to free space for further promotions.

This kind of heap behaviour is caused by objects which remain live for more than a short amount of time. After this short amount of time they are promoted from the young generation into the old generation. These objects pollute the old generation, increasing the frequency of old generation collections. As promotion is an expensive operation, this behaviour often also causes longer young generation pause times.
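The volume promoted by a single young collection can be estimated from the occupancy figures in the garbage collection log. Objects that leave the young generation either die, which lowers total heap occupancy, or are promoted, which does not, so the promoted amount is the young generation reduction minus the whole-heap reduction. The following is a minimal sketch of that arithmetic; the class name PromotionEstimate and the second set of figures are hypothetical.

```java
// Hypothetical helper: estimates promotion from the figures of one young collection.
final class PromotionEstimate {

    /** Promoted volume = space freed in the young generation minus space freed in the whole heap. */
    static long promotedKb(long youngBeforeKb, long youngAfterKb, long heapBeforeKb, long heapAfterKb) {
        long youngFreedKb = youngBeforeKb - youngAfterKb;
        long heapFreedKb = heapBeforeKb - heapAfterKb;
        return youngFreedKb - heapFreedKb;
    }

    public static void main(String[] args) {
        // Figures from the example log entry earlier in this section: nothing was promoted.
        System.out.println("Promoted (example entry): " + promotedKb(17024, 1416, 17024, 1416) + "K");
        // Hypothetical figures in which part of the young generation survives into the old generation.
        System.out.println("Promoted (hypothetical):  " + promotedKb(17024, 1416, 20000, 6000) + "K");
    }
}
```

Consistently large values across many young collections correspond to the behaviour described above.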

Universal Messaging mitigates this kind of problem by employing a caching mechanism on objects. To further decrease the number of objects with this lifespan, the administrator should audit the creation of resources such as events, ACL entries, channels, data groups and queues. Heavy dynamic creation and removal of ACL entries, channels, data groups and queues may induce this kind of behaviour.

If an administrator has done everything possible to reduce the static application memory footprint and the allocation rate of objects in the realm server, changing some JVM settings may help achieve better results.

Increasing the maximum heap size will reduce the frequency of garbage collections. In general, however, larger heap sizes increase the average pause time for garbage collections. It is therefore important to measure pause times to ensure they stay within an acceptable limit.
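Checking measured pause times against a limit can be as simple as comparing the worst case and a high percentile with the target. The following is a minimal sketch using the nearest-rank percentile method; the class name PauseStats, the sample pause times and the 100 ms limit are all hypothetical, and the input array is assumed to be non-empty.

```java
import java.util.Arrays;

// Hypothetical helper: summarises measured pause times against a latency limit.
final class PauseStats {

    /** Nearest-rank percentile (0 < p <= 100) of a non-empty array of pause times. */
    static double percentile(double[] pausesMs, double p) {
        double[] sorted = pausesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True if no pause exceeds the limit. */
    static boolean withinLimit(double[] pausesMs, double limitMs) {
        for (double pause : pausesMs) {
            if (pause > limitMs) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] pausesMs = {12, 5, 250, 9, 7}; // hypothetical measurements in milliseconds
        System.out.println("Median pause (ms):  " + percentile(pausesMs, 50));
        System.out.println("Worst pause (ms):   " + percentile(pausesMs, 100));
        System.out.println("All within 100 ms:  " + withinLimit(pausesMs, 100));
    }
}
```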

As mentioned above, the primary cause of long young generation pauses is a high volume of object promotion. The promoted objects often result from events, ACL entries, channels, data groups and queues being created.

To minimise object creation during normal operating hours, we suggest creating as many channels, data groups and queues as possible statically at startup. These objects are then promoted once at the beginning of operation and remain in the old generation. Where possible, events should be given short lifespans (possibly even made transient); this also reduces promotion, because these objects become unreferenced before they are eligible to be moved to the old generation.

It is important to remember that the Java Virtual Machine's memory subsystem performs best when long-lived objects are created in the initialisation stage and objects created afterwards die young. Designing your system so that long-lived objects such as channels are created at startup, and objects such as events are short-lived, therefore allows Universal Messaging to work harmoniously with the underlying JVM.

Full garbage collections which take long periods of time can often be remedied by proper tuning of the underlying JVM. The two recommended approaches to reducing the time spent in full garbage collections are detailed below.

The first approach is to reduce the overall heap size of the application. Larger heaps often increase the time a garbage collection cycle takes to finish, so reducing the heap lowers the average cycle time. Smaller heaps require more frequent collections, however, so it is important to balance the need for lower collection times against collection frequency.

If you cannot reduce the heap size any further because garbage collection frequency is increasing, it may help to change the type of garbage collector used. If you are experiencing high maximum latencies correlated with long GC times, consider switching to the CMS collector.

The Concurrent Mark Sweep (CMS) collector aims to minimize the amount of time an application is paused by doing much of its work concurrently with the application. This collector can be enabled by adding the -XX:+UseConcMarkSweepGC parameter to Server_Common.conf.

CMS collections usually take more time overall than those done with the Parallel Collector. However, only a small fraction of the work done by the CMS collector requires the application to pause, which generally results in improved response times.
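After changing the collector, it can be worth confirming which collectors the JVM is actually running. The following is a minimal sketch that lists the collector names reported by the standard management API; the class name ActiveCollectors is hypothetical, and the check relies on the assumption that HotSpot reports the CMS old generation collector under the name "ConcurrentMarkSweep". The program reports on the JVM it runs in, so it would need to be started with the same options as the server.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the garbage collectors active in the current JVM.
final class ActiveCollectors {

    /** True if the CMS old generation collector appears in the list (name is an assumption). */
    static boolean usesCms(List<String> collectorNames) {
        return collectorNames.contains("ConcurrentMarkSweep");
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        System.out.println("Active collectors: " + names);
        System.out.println("CMS in use: " + usesCms(names));
    }
}
```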