Universal Messaging Clusters: Outages and Recovery
Should any cluster member realm exit unexpectedly or become disconnected from the remaining cluster realms, it needs to fully recover the current cluster state as soon as it restarts or attempts to rejoin the cluster.
When a cluster member rejoins the cluster, it automatically moves into the recovery state and remains there until all of its local stores are recovered and its state is fully validated against the current master realm.
To achieve this, each clustered resource must recover its state from the master. This involves a complex evaluation of the member's local stores against the master realm's stores, to ensure that they contain the correct events and that any events that no longer exist in any queue or topic are removed from the local stores. With queues, for example, events are physically stored in sequence but may be consumed non-sequentially (for example, using message selectors that consume and remove, say, every fifth event). Such a consumption pattern results in a fairly sparse and fragmented store, which adds to the complexity of recovering the correct state. Universal Messaging clusters nevertheless perform this state recovery automatically whenever a cluster member restarts.
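The following Python sketch illustrates the general idea of comparing a local store against the master's store. It is not Universal Messaging code and does not use any Universal Messaging API: the Store class, the reconcile function, and the integer event IDs are all hypothetical names chosen for illustration, and the real recovery process is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for a queue or topic store: event ID -> payload."""
    events: dict[int, str] = field(default_factory=dict)


def reconcile(local: Store, master: Store) -> tuple[list[int], list[int]]:
    """Make `local` match `master`; return (removed_ids, fetched_ids)."""
    # Events the master no longer holds (already consumed) are purged locally.
    removed = sorted(eid for eid in local.events if eid not in master.events)
    for eid in removed:
        del local.events[eid]
    # Events the master holds but the local store lacks are copied across.
    fetched = sorted(eid for eid in master.events if eid not in local.events)
    for eid in fetched:
        local.events[eid] = master.events[eid]
    return removed, fetched


# Master queue after a selector consumed every fifth event (IDs 5, 10, 15, 20).
master = Store({i: f"event-{i}" for i in range(1, 21) if i % 5 != 0})
# Restarting member: went offline after event 12 and never saw the removals.
local = Store({i: f"event-{i}" for i in range(1, 13)})

removed, fetched = reconcile(local, master)
print(removed)  # [5, 10]
print(fetched)  # [13, 14, 16, 17, 18, 19]
```

In this simplified model the master's store is sparse (every fifth ID is missing), so the restarting member cannot assume a contiguous range of events. It has to check each event individually, which hints at why fragmented stores make recovery more complex.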