Configuration Examples
The configuration examples in this section show settings for the L1 -> L2 HealthChecker. However, they apply similarly to L2 -> L2 and to L2 -> L1, where the server uses HealthChecker to monitor the client.
Aggressive
The following settings create an aggressive HealthChecker with low tolerance for short network outages or long GC cycles:
<property name="l1.healthcheck.l2.ping.enabled" value="true" />
<property name="l1.healthcheck.l2.ping.idletime" value="2000" />
<property name="l1.healthcheck.l2.ping.interval" value="1000" />
<property name="l1.healthcheck.l2.ping.probes" value="3" />
<property name="l1.healthcheck.l2.socketConnect" value="true" />
<property name="l1.healthcheck.l2.socketConnectTimeout" value="2" />
<property name="l1.healthcheck.l2.socketConnectCount" value="5" />
According to the HealthChecker "Max Time" formula, the maximum time before a remote node is considered to be lost is computed in the following way:
2000 + 5 [( 3 * 1000 ) + ( 2 * 1000)] = 27000
This figure describes the case where, after the initial idletime of 2 seconds, the remote node fails to respond to ping probes but accepts every socket connection attempt. That pattern indicates that the node is reachable but not functional (within the allowed time frame) or is in a long GC cycle. This aggressive HealthChecker configuration declares a node dead in no more than 27 seconds.
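The arithmetic above can be sketched as a small Python helper. This is only an illustration: the function and parameter names are hypothetical, and the formula is inferred from the worked calculation in this section, treating socketConnectTimeout as a multiple of ping.interval (which the numbers above are consistent with).

```python
def healthchecker_max_time_ms(idletime_ms, interval_ms, probes,
                              socket_connect_timeout, socket_connect_count):
    """Upper bound (ms) before a reachable but unresponsive node is declared dead.

    Assumption: socket_connect_timeout is a multiplier of interval_ms,
    as the worked example in this section suggests.
    """
    # One probe cycle: all ping probes, then one socket connect attempt.
    per_cycle_ms = (probes * interval_ms) + (socket_connect_timeout * interval_ms)
    return idletime_ms + socket_connect_count * per_cycle_ms


# Aggressive settings from this section.
aggressive_ms = healthchecker_max_time_ms(
    idletime_ms=2000, interval_ms=1000, probes=3,
    socket_connect_timeout=2, socket_connect_count=5,
)
print(aggressive_ms)  # 27000
```

Changing any single argument shows how that property stretches or shrinks the detection window.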
If at some point the node had been completely unreachable (a socket connection attempt failed), HealthChecker would have declared it dead sooner. Where, for example, the problem is a disconnected network cable, the "Max Time" is likely to be even shorter:
2000 + 1 [( 3 * 1000 ) + ( 2 * 1000 )] = 7000
In this case, HealthChecker declares a node dead in no more than 7 seconds.
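To see how the two figures above relate, the following sketch computes the detection time for a given number of completed probe cycles. It assumes, as an extrapolation of the one-cycle example, that a failed socket connection on cycle k ends detection after idletime plus k cycles. The function name is hypothetical.

```python
def detection_time_ms(idletime_ms, interval_ms, probes,
                      socket_connect_timeout, cycles):
    """Time (ms) until a node is declared dead after `cycles` probe cycles.

    Assumption: each cycle costs (probes + socket_connect_timeout) intervals.
    """
    per_cycle_ms = (probes * interval_ms) + (socket_connect_timeout * interval_ms)
    return idletime_ms + cycles * per_cycle_ms


# Aggressive settings: socketConnectCount is 5, so between 1 and 5 cycles can run.
for cycles in range(1, 6):
    print(cycles, detection_time_ms(2000, 1000, 3, 2, cycles))
```

With one cycle the result is the 7000 ms bound for an unreachable node, and with all five cycles it is the 27000 ms bound for a reachable but unresponsive node.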
Tolerant
The following settings create a HealthChecker with a higher tolerance for interruptions in network communications and long GC cycles:
<property name="l1.healthcheck.l2.ping.enabled" value="true" />
<property name="l1.healthcheck.l2.ping.idletime" value="5000" />
<property name="l1.healthcheck.l2.ping.interval" value="1000" />
<property name="l1.healthcheck.l2.ping.probes" value="3" />
<property name="l1.healthcheck.l2.socketConnect" value="true" />
<property name="l1.healthcheck.l2.socketConnectTimeout" value="5" />
<property name="l1.healthcheck.l2.socketConnectCount" value="10" />
According to the HealthChecker "Max Time" formula, the maximum time before a remote node is considered to be lost is computed in the following way:
5000 + 10 [( 3 * 1000 ) + ( 5 * 1000 )] = 85000
This figure describes the case where, after the initial idletime of 5 seconds, the remote node fails to respond to ping probes but accepts every socket connection attempt. That pattern indicates that the node is reachable but not functional (within the allowed time frame) or is in an excessively long GC cycle. This tolerant HealthChecker configuration declares a node dead in no more than 85 seconds.
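As a cross-check, the same bound can be derived directly from the property elements. The sketch below parses the tolerant settings with Python's standard xml.etree.ElementTree module. The wrapping root element is added here only so that the fragment parses as well-formed XML, the helper names are hypothetical, and socketConnectTimeout is assumed to be a multiple of ping.interval.

```python
import xml.etree.ElementTree as ET

# Tolerant settings from this section, wrapped in a root element for parsing.
TOLERANT_XML = """
<tc-properties>
  <property name="l1.healthcheck.l2.ping.enabled" value="true" />
  <property name="l1.healthcheck.l2.ping.idletime" value="5000" />
  <property name="l1.healthcheck.l2.ping.interval" value="1000" />
  <property name="l1.healthcheck.l2.ping.probes" value="3" />
  <property name="l1.healthcheck.l2.socketConnect" value="true" />
  <property name="l1.healthcheck.l2.socketConnectTimeout" value="5" />
  <property name="l1.healthcheck.l2.socketConnectCount" value="10" />
</tc-properties>
"""


def load_properties(xml_text):
    """Return a dict mapping each property name to its string value."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("name"): p.get("value") for p in root.iter("property")}


def max_time_ms(props, prefix="l1.healthcheck.l2."):
    """Apply the formula used in this section to the parsed properties."""
    idletime = int(props[prefix + "ping.idletime"])
    interval = int(props[prefix + "ping.interval"])
    probes = int(props[prefix + "ping.probes"])
    timeout = int(props[prefix + "socketConnectTimeout"])
    count = int(props[prefix + "socketConnectCount"])
    return idletime + count * ((probes * interval) + (timeout * interval))


props = load_properties(TOLERANT_XML)
print(max_time_ms(props))  # 85000
```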
If at some point the node had been completely unreachable (a socket connection attempt failed), HealthChecker would have declared it dead sooner. Where, for example, the problem is a disconnected network cable, the "Max Time" is likely to be even shorter:
5000 + 1 [( 3 * 1000 ) + ( 5 * 1000 )] = 13000
In this case, HealthChecker declares a node dead in no more than 13 seconds.
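The four bounds worked out in this section can be compared side by side with a short script. This is a sketch under the same assumptions as the calculations above: the dictionary keys and function name are hypothetical shorthand for the corresponding properties, and socketConnectTimeout is treated as a multiple of ping.interval.

```python
# Values copied from the two configurations in this section.
PROFILES = {
    "aggressive": {"idletime": 2000, "interval": 1000, "probes": 3,
                   "timeout": 2, "count": 5},
    "tolerant": {"idletime": 5000, "interval": 1000, "probes": 3,
                 "timeout": 5, "count": 10},
}


def bounds_ms(p):
    """Return (unreachable bound, reachable-but-unresponsive bound) in ms."""
    per_cycle = (p["probes"] * p["interval"]) + (p["timeout"] * p["interval"])
    unreachable = p["idletime"] + per_cycle                # one cycle only
    unresponsive = p["idletime"] + p["count"] * per_cycle  # every cycle used
    return unreachable, unresponsive


for name, p in PROFILES.items():
    unreachable, unresponsive = bounds_ms(p)
    print(f"{name}: unreachable <= {unreachable} ms, "
          f"reachable but unresponsive <= {unresponsive} ms")
```

The output makes the trade-off visible: the tolerant profile roughly triples the worst-case detection time in exchange for riding out longer GC pauses and network interruptions.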