Event Types and Definitions
This section describes the types of events that can be found in logs or viewed in the Terracotta Management Console (TMC).
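As a quick way to surface these events operationally, the level names used below (WARN, ERROR, CRITICAL) can be filtered out of a server log. A minimal sketch; the log path and exact line format here are assumptions, and real server logs include more context:

```shell
# A minimal sketch: filter a Terracotta server log for the WARN, ERROR,
# and CRITICAL operator events catalogued below. The log path and exact
# line format are assumptions; adjust the pattern to your log layout.
log=server.log
# Write one sample line so the sketch is self-contained.
printf '2014-01-01 12:00:00 WARN topology.node.left - ClientID[2] left the cluster\n' > "$log"
grep -E ' (WARN|ERROR|CRITICAL) ' "$log"
```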
memory.longgc (Memory Manager) Level: WARN
Cause: A full garbage collection (GC) longer than the configured threshold has occurred.
Action: Reduce cache memory footprint in L1 (Terracotta client). Investigate issues with application logic and garbage creation.
Notes: The default critical threshold is 8 seconds, but it can be reconfigured in tc.properties using
longgc.threshhold. For information about setting tc.properties, see the
Terracotta Configuration Parameters.
Occurrence of this event could help diagnose certain failures. For details, see "Configuring the HealthChecker Properties" in the BigMemory Max High-Availability Guide.
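For illustration, the threshold could be raised in tc.properties. This is a sketch only; the property name is taken verbatim from the note above, the unit is assumed to be seconds based on the stated 8-second default, and both should be confirmed in the Terracotta Configuration Parameters:

```properties
# Sketch: raise the long-GC warning threshold from the default of 8 seconds.
# Property name as given in the note above; confirm the exact name and
# unit in the Terracotta Configuration Parameters reference.
longgc.threshhold = 12
```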
dgc.periodic.started (DGC) Level: INFO
Cause: Periodic distributed garbage collection (DGC), which was explicitly enabled in the configuration, has started a cleanup cycle.
Action: If periodic DGC is unneeded, disable it to improve overall cluster performance.
Notes: Periodic DGC, which is disabled by default, is mostly useful in the absence of automatic handling of distributed garbage.
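If periodic DGC was enabled but is no longer needed, it can be switched off in tc-config.xml. A sketch, assuming the standard garbage-collection element of the servers section; verify placement against your tc-config schema:

```xml
<!-- Sketch: disable periodic DGC. Element placement assumes the standard
     tc-config servers section; verify against your tc-config schema. -->
<servers>
  <garbage-collection>
    <enabled>false</enabled>
  </garbage-collection>
</servers>
```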
dgc.periodic.finished (DGC) Level: INFO
Cause: Periodic DGC, which was explicitly enabled in the configuration, ended a cleanup cycle.
Action: If periodic DGC is unneeded, disable it to improve overall cluster performance.
Notes: Event message reads "DGC[ {0} ] finished. Begin Count : {1} Collected : {2} Time Taken : {3} ms Live Objects : {4}".
dgc.periodic.canceled (DGC) Level: INFO
Cause: Periodic DGC, which was explicitly enabled in the configuration, has been cancelled due to an interruption (for example, by a failover operation).
Action: If periodic DGC is unneeded, disable it to improve overall cluster performance.
Notes: Periodic DGC, which is disabled by default, is mostly useful in the absence of automatic handling of distributed garbage.
dgc.inline.cleanup.started (DGC) Level: INFO
Cause: L2 (Terracotta server) is starting up as ACTIVE with existing data, triggering inline distributed garbage collection (DGC).
Action: No action necessary.
Notes: Only seen when a server starts up as ACTIVE upon a recovery, using Fast Restartability.
dgc.inline.cleanup.finished (DGC) Level: INFO
Cause: Inline DGC operation completed.
Action: No action necessary.
Notes: Event message reads "Inline DGC [ {0} ] reference cleanup finished. Begin Count : {1} Collected : {2} Time Taken : {3} ms Live Objects : {4}".
dgc.inline.cleanup.canceled (DGC) Level: INFO
Cause: Inline DGC operation interrupted.
Action: Investigate any unusual cluster behavior or other events.
Notes: This possibly occurs during failover, but other events should indicate the real cause.
topology.node.joined (Cluster Topology) Level: INFO
Cause: Specified node has joined the cluster.
Action: No action necessary.
Notes: None.
topology.node.left (Cluster Topology) Level: WARN
Cause: Specified node has left the cluster.
Action: Check why the node has left (for example: long GC, network issues, or issues with local node resources).
Notes: None.
topology.node.state (Cluster Topology) Level: INFO
Cause: L2 changing state (for example, from INITIALIZING to ACTIVE).
Action: Check to see that the state change is expected.
Notes: Event message reads "Moved to {0}", where {0} is the new state.
topology.handshake.reject (Cluster Topology) Level: ERROR
Cause: L1 is unsuccessfully trying to reconnect to the cluster after having been expelled.
Action: If the L1 does not go into a rejoin operation, it must be restarted manually.
Notes: Event message reads "An {0} client {1} tried to connect to {2} server. Connection refused!!"
topology.active.left (Cluster Topology) Level: WARN
Cause: Active server left the cluster.
Action: Check why the active L2 has left.
Notes: None.
topology.mirror.left (Cluster Topology) Level: WARN
Cause: Mirror server left the cluster.
Action: Check why the mirror L2 has left.
Notes: None.
topology.zap.received (Cluster Topology) Level: CRITICAL
Cause: One L2 is trying to cause another L2 to restart ("zap").
Action: Investigate a possible "split brain" situation (a mirror L2 behaves as the ACTIVE) if the zapped L2 does not obey the restart order.
Notes: A "zap" operation happens only within a mirror group. Event message reads "SPLIT BRAIN, {0} and {1} are ACTIVE", where {0} and {1} are the two servers vying for the ACTIVE role.
topology.zap.accepted (Cluster Topology) Level: CRITICAL
Cause: The L2 is accepting the order to restart ("zap" order).
Action: Check the state of the zapped L2 to ensure that it restarts as a mirror, or manually restart it.
Notes: A "zap" order is issued only within a mirror group. Event message reads "{0} has more clients. Exiting!!", where {0} is the L2 that becomes the ACTIVE.
topology.db.dirty (Cluster Topology) Level: WARN
Cause: A mirror L2 is trying to join with data in place.
Action: If the mirror does not automatically restart and wipe its data, its data may need to be manually wiped before it is restarted.
Notes: Restarted mirror L2s must wipe their data to resync with the active L2. This is normally an automatic operation that should not require action. Event message reads "Started with dirty database. Exiting!! Restart {0}", where {0} is the mirror that is automatically restarting.
topology.config.reloaded (Cluster Topology) Level: INFO
Cause: Cluster configuration was reloaded.
Action: No action necessary.
Notes: None.
dcv2.servermap.eviction (DCV2) Level: INFO
Cause: Automatic evictions for optimizing Terracotta Server Array operations.
Action: No action necessary.
Notes: Event message reads "DCV2 Eviction - Time taken (msecs)={0}, Number of entries evicted={1}, Number of segments over threshold={2}, Total Overshoot={3}".
system.time.different (System Setup) Level: WARN
Cause: System clocks are not aligned.
Action: Synchronize system clocks.
Notes: The default tolerance is 30 seconds, but it can be reconfigured in tc.properties using
time.sync.threshold. For information about setting tc.properties, see
Terracotta Configuration Parameters.
Note that an overly large tolerance can introduce unpredictable errors and behaviors.
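For illustration, the tolerance could be widened in tc.properties. A sketch only; the unit is assumed to be seconds based on the stated 30-second default, and fixing clock synchronization (for example, with NTP) is preferable to raising the tolerance:

```properties
# Sketch: widen the clock-drift tolerance from the default of 30 seconds.
# The unit is an assumption; confirm the name and unit in the Terracotta
# Configuration Parameters reference. Prefer fixing clock sync (e.g., NTP).
time.sync.threshold = 45
```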
resource.capacity.near (Resource) Level: WARN
Cause: L2 entered throttled mode, which could be a temporary condition (for example, caused by bulk loading) or could indicate insufficient allocation of memory.
Action: If throttled mode persists beyond a temporary condition such as bulk loading, consider increasing the memory allocated to the L2.
Notes: After emitting this event, the L2 can emit
resource.capacity.restored (return to normal mode) or
resource.capacity.full (move to restricted mode), based on resource availability. Event message reads "{0} is nearing capacity limit, performance may be degraded - {1}% usage", where {0} is the L2 identification and {1} is the % usage of the memory resources allocated to that L2.
resource.capacity.full (Resource) Level: ERROR
Cause: L2 entered restricted mode, which could be a temporary condition (for example, caused by bulk loading) or could indicate insufficient allocation of memory.
Action: Investigate promptly, because no further additive operations are accepted in restricted mode. If the condition is not temporary, consider increasing the memory allocated to the L2.
Notes: After emitting this event, the L2 can emit
resource.capacity.restored (return to normal mode), based on resource availability. Event message reads "{0} is at over capacity limit, no further additive operations will be accepted - {1}% usage", where {0} is the L2 identification and {1} is the % usage of the memory resources allocated to that L2.
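When throttled or restricted mode points to insufficient memory rather than a temporary condition, the L2's storage allocation can be increased in tc-config.xml. A sketch only; the element names assume the BigMemory Max 4.x tc-config schema, and the sizes are placeholders to be checked against available RAM:

```xml
<!-- Sketch: increase server data storage to relieve throttled/restricted
     mode. Element names assume the BigMemory Max 4.x tc-config schema;
     the sizes are placeholders - verify against available RAM. -->
<server host="localhost" name="Server1">
  <dataStorage size="8g">
    <offheap size="8g"/>
  </dataStorage>
</server>
```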
resource.capacity.restored (Resource) Level: INFO
Cause: L2 returned to normal from throttled or restricted mode.
Action: No action necessary.
Notes: Event message reads "{0} capacity has been restored, performance has returned to normal - {1}% usage", where {0} is the L2 identification and {1} is the % usage of the memory resources allocated to that L2.