Configuration Examples
The configuration examples in this section show settings for the L1 -> L2 HealthChecker. They apply similarly to L2 -> L2 and to L2 -> L1, the latter being the case where the server runs HealthChecker against the client.
Aggressive
The following settings create an aggressive HealthChecker with low tolerance for short network outages or long GC cycles:
<property name="l1.healthcheck.l2.ping.enabled" value="true" />
<property name="l1.healthcheck.l2.ping.idletime" value="2000" />
<property name="l1.healthcheck.l2.ping.interval" value="1000" />
<property name="l1.healthcheck.l2.ping.probes" value="3" />
<property name="l1.healthcheck.l2.socketConnect" value="true" />
<property name="l1.healthcheck.l2.socketConnectTimeout" value="2" />
<property name="l1.healthcheck.l2.socketConnectCount" value="5" />
According to the HealthChecker "Max Time" formula, the maximum time (in milliseconds) before a remote node is considered to be lost is computed in the following way:
2000 + 5 [( 3 * 1000 ) + 1000] = 22000
This calculation assumes that, after the initial idle time of 2 seconds, the remote node fails to respond to ping probes but responds to every socket connection attempt. That pattern indicates a node that is reachable but not functional (within the allowed time frame) or that is in a long GC cycle. This aggressive HealthChecker configuration declares a node dead in no more than 22 seconds.
If at some point the node had been completely unreachable (a socket connection attempt failed), HealthChecker would have declared it dead sooner. Where, for example, the problem is a disconnected network cable, the "Max Time" is likely to be even shorter:
2000 + 1 [( 3 * 1000 ) + ( 2 * 1000 )] = 7000
In this case, HealthChecker declares a node dead in no more than 7 seconds.
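The two aggressive figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic shown above, not HealthChecker's own code: the function name `max_time_ms` and its parameter names are hypothetical, and the cost of each socket-connect step is passed in explicitly because this section gives it only as a number (1000 ms when the node answers, 2 * 1000 ms when it does not).

```python
def max_time_ms(idletime_ms, interval_ms, probes, cycles, connect_cost_ms):
    """Idle time plus `cycles` rounds of (ping probes + one socket-connect step).

    Hypothetical helper that mirrors the worked arithmetic in this section.
    """
    per_cycle_ms = probes * interval_ms + connect_cost_ms
    return idletime_ms + cycles * per_cycle_ms


# Aggressive settings, copied from the properties listed above.
IDLETIME_MS = 2000
INTERVAL_MS = 1000
PROBES = 3
SOCKET_CONNECT_TIMEOUT = 2
SOCKET_CONNECT_COUNT = 5

# Node reachable but unresponsive: all 5 cycles run, and each
# socket-connect step is costed at 1000 ms, as in the first figure.
reachable_ms = max_time_ms(IDLETIME_MS, INTERVAL_MS, PROBES,
                           SOCKET_CONNECT_COUNT, 1000)

# Node unreachable: a single cycle, with the socket-connect step
# costed at socketConnectTimeout * 1000 ms, as in the second figure.
unreachable_ms = max_time_ms(IDLETIME_MS, INTERVAL_MS, PROBES,
                             1, SOCKET_CONNECT_TIMEOUT * 1000)

print(reachable_ms)    # 22000
print(unreachable_ms)  # 7000
```

Changing any one constant shows how it moves the bound. For example, raising `SOCKET_CONNECT_COUNT` lengthens only the reachable-but-unresponsive case.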
Tolerant
The following settings create a HealthChecker with a higher tolerance for interruptions in network communications and long GC cycles:
<property name="l1.healthcheck.l2.ping.enabled" value="true" />
<property name="l1.healthcheck.l2.ping.idletime" value="5000" />
<property name="l1.healthcheck.l2.ping.interval" value="1000" />
<property name="l1.healthcheck.l2.ping.probes" value="3" />
<property name="l1.healthcheck.l2.socketConnect" value="true" />
<property name="l1.healthcheck.l2.socketConnectTimeout" value="5" />
<property name="l1.healthcheck.l2.socketConnectCount" value="10" />
According to the HealthChecker "Max Time" formula, the maximum time (in milliseconds) before a remote node is considered to be lost is computed in the following way:
5000 + 10 [( 3 * 1000 ) + 1000] = 45000
This calculation assumes that, after the initial idle time of 5 seconds, the remote node fails to respond to ping probes but responds to every socket connection attempt. That pattern indicates a node that is reachable but not functional (within the allowed time frame) or that is in an excessively long GC cycle. This tolerant HealthChecker configuration declares a node dead in no more than 45 seconds.
If at some point the node had been completely unreachable (a socket connection attempt failed), HealthChecker would have declared it dead sooner. Where, for example, the problem is a disconnected network cable, the "Max Time" is likely to be even shorter:
5000 + 1 [( 3 * 1000 ) + ( 5 * 1000 )] = 13000
In this case, HealthChecker declares a node dead in no more than 13 seconds.
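To check a proposed set of properties before deploying it, the same arithmetic can be driven from the property elements themselves. The sketch below makes several assumptions: the `<props>` wrapper exists only so that the fragment parses as a single XML document, the helper names `read_settings` and `worst_case_ms` are hypothetical, and the per-step socket costs (1000 ms when the node answers, socketConnectTimeout * 1000 ms when it does not) are taken from the worked figures in this section rather than from HealthChecker itself.

```python
import xml.etree.ElementTree as ET

# Tolerant settings from this section. The <props> root is an
# illustrative wrapper, added only so the fragment is well-formed.
FRAGMENT = """
<props>
  <property name="l1.healthcheck.l2.ping.enabled" value="true" />
  <property name="l1.healthcheck.l2.ping.idletime" value="5000" />
  <property name="l1.healthcheck.l2.ping.interval" value="1000" />
  <property name="l1.healthcheck.l2.ping.probes" value="3" />
  <property name="l1.healthcheck.l2.socketConnect" value="true" />
  <property name="l1.healthcheck.l2.socketConnectTimeout" value="5" />
  <property name="l1.healthcheck.l2.socketConnectCount" value="10" />
</props>
"""


def read_settings(xml_text):
    """Return a dict mapping each property name to its raw string value."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("property")}


def worst_case_ms(settings, prefix="l1.healthcheck.l2."):
    """Return (reachable_ms, unreachable_ms), following the worked figures."""
    idle = int(settings[prefix + "ping.idletime"])
    interval = int(settings[prefix + "ping.interval"])
    probes = int(settings[prefix + "ping.probes"])
    timeout = int(settings[prefix + "socketConnectTimeout"])
    count = int(settings[prefix + "socketConnectCount"])

    probe_phase = probes * interval
    # Node answers socket connects but not pings: every cycle runs.
    reachable = idle + count * (probe_phase + 1000)
    # Node does not answer the socket connect: one cycle only.
    unreachable = idle + 1 * (probe_phase + timeout * 1000)
    return reachable, unreachable


tolerant = read_settings(FRAGMENT)
reachable_ms, unreachable_ms = worst_case_ms(tolerant)
print(reachable_ms, unreachable_ms)  # 45000 13000
```

Passing a different `prefix` would let the same helper read another direction's properties, provided those properties follow the same naming pattern.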