About Active/Active Clustering
In an active/active cluster, multiple servers are active and work together to handle message publishing and subscription. Universal Messaging clients automatically move from one server in the cluster to another as required, or when specific servers within the cluster become unavailable to the client for any reason. The state of all client operations is maintained in the cluster to enable automatic failover.
To form an active/active cluster, more than 50% of the servers (a quorum) in the cluster must be active and intercommunicating. Quorum is the term used to describe the state of a fully formed cluster with an elected master.
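The quorum rule can be expressed as a simple majority check. The following is a minimal sketch for illustration only; the function name `has_quorum` is hypothetical and is not part of any Universal Messaging API.

```python
def has_quorum(active_servers: int, total_servers: int) -> bool:
    """Return True when strictly more than 50% of the servers are active."""
    if total_servers <= 0 or not 0 <= active_servers <= total_servers:
        raise ValueError("expected total_servers > 0 and 0 <= active_servers <= total_servers")
    # Compare integers (2 * active > total) instead of dividing,
    # so an exact 50% split is never rounded into a majority.
    return 2 * active_servers > total_servers


# Show the result for every possible number of active servers in a 3-node cluster.
for active in range(4):
    print(f"{active} of 3 active -> quorum: {has_quorum(active, 3)}")
```

Note that exactly half is not enough: in a four-server cluster, two active servers do not form a quorum under this rule.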
Applications connected to a Universal Messaging cluster can:
*Publish and subscribe to channels
*Push and pop events from queues
*Connect to any Universal Messaging server instance and view the server state
If a cluster node is unavailable, client applications automatically reconnect to any of the other cluster nodes and continue to operate.
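Universal Messaging clients perform this reconnection automatically. The sketch below only illustrates the general idea of trying each cluster node in turn until one accepts the connection. The `connect` callable, the `fake_connect` stand-in, and the node URLs are hypothetical placeholders, not the product's client API.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def connect_with_failover(node_urls: Iterable[str], connect: Callable[[str], S]) -> S:
    """Try each cluster node in order and return the first session that opens."""
    errors = []
    for url in node_urls:
        try:
            return connect(url)
        except ConnectionError as exc:
            # Remember why this node was skipped, then try the next one.
            errors.append(f"{url}: {exc}")
    raise ConnectionError("no cluster node reachable: " + "; ".join(errors))


def fake_connect(url: str) -> str:
    """Hypothetical stand-in for a real connection call; node1 is 'down'."""
    if url == "nsp://node1:9000":
        raise ConnectionError("node down")
    return f"session@{url}"


# Placeholder node URLs, assumed for illustration only.
nodes = ["nsp://node1:9000", "nsp://node2:9000", "nsp://node3:9000"]
session = connect_with_failover(nodes, fake_connect)
print(session)
```

Because the first node refuses the connection, the sketch ends up with a session on the second node.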
How Does the Active/Active Cluster Work?
In an active/active cluster, one of the cluster nodes must be designated as the master node. The master node is selected by the cluster nodes. Each cluster node submits a vote to choose the master node. If the master node exits or goes offline due to power or network failure, the remaining active cluster nodes elect a new master, provided more than 50% of the cluster nodes are available to form the cluster.
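The election can be pictured as a vote tally that is only valid when a quorum of nodes takes part. The text above does not describe how votes are counted, so the sketch below makes two assumptions: the candidate with the most votes wins, and ties are broken by node name. The function `elect_master` is hypothetical.

```python
from collections import Counter
from typing import Mapping, Optional


def elect_master(votes: Mapping[str, str], cluster_size: int) -> Optional[str]:
    """Pick a master from the votes of the active nodes.

    `votes` maps each active node to the node it votes for.
    Returns None when the voters do not form a quorum.
    """
    # No election is possible unless more than 50% of the nodes are active.
    if 2 * len(votes) <= cluster_size:
        return None
    tally = Counter(votes.values())
    # Assumption: highest vote count wins; ties fall back to the node name.
    return min(tally, key=lambda node: (-tally[node], node))


# node1 (the old master) is offline; the two remaining nodes of three vote.
print(elect_master({"node2": "node2", "node3": "node2"}, cluster_size=3))
# Only one node of three is active, so no master can be elected.
print(elect_master({"node3": "node3"}, cluster_size=3))
```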
Cluster nodes replicate resources amongst themselves, and maintain the state of the resources across all cluster nodes. Operations such as configuration changes, transactions, and client connections go through the master node. The master node broadcasts the requests to the other cluster nodes to ensure that all the servers are in sync. If a cluster node disconnects and reconnects, all the states and data are recovered from the master node.
You can connect one cluster to another cluster through remote cluster connections. Remote cluster connections enable bi-directional joins between clusters, thereby joining the resources of both clusters for publish and subscribe.
Active/Active Cluster with Sites
In this approach, you can configure just two servers to form a cluster. The quorum rule, which requires more than 50% of the servers in the cluster to be available, is satisfied by defining the servers in two sites (primary and backup) and by allocating an additional vote (the IsPrime flag) to one of these sites.
The value of the IsPrime flag in a site indicates whether the primary site or the backup site as a whole can cast an additional vote. The failover is automatic if the site where the IsPrime flag is set to false fails. If the site where the IsPrime flag is set to true fails, you need to manually set the IsPrime flag to true on the active site and perform manual failover.
This approach provides:
*Transparent client failover
*Semi-transparent server failover
*Load balancing and scalability
In the diagram, two servers are configured in just two sites: primary (master) and backup (slave). The IsPrime flag is set to true in the prime site. If the server in the backup site becomes unavailable, the cluster continues to work with the server in the prime site, because the prime site has an additional vote that satisfies the quorum rule of more than 50% available servers.
However, if the connection to the server in the prime site is lost when the server on the backup site is active, you must manually set the IsPrime flag to true in the backup site so that the server in the backup site can achieve quorum.
Switching the prime site MUST be a manual operation by an administrator who can confirm that the previous prime site is indeed down and not merely disconnected from the other sites. Attempts to automate this process raise the risk of "split brain" situations, in which loss of data is very likely.
Even if you configure only two sites, you can define an odd or even number of servers split across these sites.
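The effect of the IsPrime flag on quorum can be sketched as follows. This is an illustrative model, not the product's implementation. It assumes that the additional vote acts as a tie-breaker: when exactly half of the servers are active, the cluster has quorum only if every server in the prime site is active. The `Site` class and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Site:
    name: str
    total_servers: int
    active_servers: int
    is_prime: bool = False  # models the IsPrime flag of the site


def has_site_quorum(sites: Sequence[Site]) -> bool:
    """Majority rule, with the prime site's extra vote breaking a 50/50 tie."""
    total = sum(site.total_servers for site in sites)
    active = sum(site.active_servers for site in sites)
    if 2 * active > total:
        return True  # ordinary majority, no extra vote needed
    # Assumption: the extra vote counts only if the whole prime site is active.
    prime_fully_active = any(
        site.is_prime and site.active_servers == site.total_servers for site in sites
    )
    return 2 * active == total and prime_fully_active


def two_server_cluster(primary_up: bool, backup_up: bool, prime: str) -> List[Site]:
    """One server per site, as in the two-server example described above."""
    return [
        Site("primary", 1, int(primary_up), is_prime=(prime == "primary")),
        Site("backup", 1, int(backup_up), is_prime=(prime == "backup")),
    ]


# Backup site lost, IsPrime on the primary site: the cluster keeps running.
print(has_site_quorum(two_server_cluster(True, False, prime="primary")))
# Primary (prime) site lost: the backup server alone cannot reach quorum.
print(has_site_quorum(two_server_cluster(False, True, prime="primary")))
# After an administrator moves IsPrime to the backup site, quorum is restored.
print(has_site_quorum(two_server_cluster(False, True, prime="backup")))
```

Under this model, the two halves of a split cluster can never both have quorum at the same time, as long as only one site has IsPrime set to true. This is why moving the flag must remain a deliberate manual step.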