Failover Tuning
Overview
In a clustered environment, network, hardware, or other failures can cause an active server to become partitioned from the rest of the servers in its stripe. If your cluster needs to remain tolerant to such failures, you have a choice to make: consistency or availability, but not both (CAP theorem). If consistency is chosen over availability, the cluster will halt processing client requests, because consistent reads and writes cannot be guaranteed while the cluster is partitioned. If availability is chosen over consistency, the cluster will keep responding to client requests even while partitioned, but the responses are not guaranteed to be consistent. In the absence of such failures, the cluster can provide both consistency and availability.
By default, the cluster is tuned to favor availability over consistency. This means that when such a failure occurs, the remaining servers in the affected stripe run an election and, if they cannot reach the old active server, the passive server that wins the election becomes the new active server. While this configuration ensures high availability of the data, it increases the risk of a so-called split-brain situation during such elections. In the case of a TSA, split-brain is a situation in which multiple servers in a stripe act as active servers. For example, if an active server gets partitioned from its peers in the stripe, it remains active while the passive servers on the other side of the partition elect a new active server of their own. Any further operations performed on the data are then likely to result in inconsistencies.
When tuned for consistency, a stripe needs at least a majority of its servers connected with each other to elect an active server. Thus, even if the stripe gets partitioned into two sets of servers by a network failure, the set with the majority of servers will elect an active server among themselves and proceed. In the absence of a majority, no active server is elected, and clients are prevented from performing any operations, thereby preserving data consistency at the cost of availability.
Server configuration
When configuring the stripe, you need to choose between availability and consistency as the failover priority of the stripe. To prevent split-brain scenarios and thereby preserve data consistency, the failover priority must be set to consistency. If availability is preferred instead, the failover priority can be set to availability, at the risk of running into split-brain scenarios.
The following XML snippet shows how to configure a stripe for consistency:
<tc-config>
  ...
  <servers>
    ...
  </servers>

  <failover-priority> <!-- 1 -->
    <consistency/>    <!-- 2 -->
  </failover-priority>

</tc-config>
1
Failover priority is tuned to favor…
2
…consistency over availability for this stripe.
Similarly, the stripe can be tuned for availability as follows:
<failover-priority>
  <availability/>
</failover-priority>
Note: Even though the availability-versus-consistency configuration is done at the stripe level, it must be consistent across all stripes in the cluster.
In the absence of an explicit failover-priority configuration, the default value is availability. This default is likely to change in future releases; it is therefore recommended to configure the server explicitly with your choice.
External voter for two-server stripes
Mandating a majority for the active server election introduces additional availability issues in certain topologies. For example, in a two-server stripe the majority quorum is also two. This means that if the servers get disconnected from each other due to a network partition or a server failure, the surviving server will not promote itself to active server, since it needs two votes to win the election. Because the other voting server is not reachable, it cannot get that second vote and hence will not promote itself. In the absence of an active server, the stripe is unavailable.
Adding a third server is the best option: even if one server fails, a majority (2 out of 3) survives to elect an active server. A three-server stripe provides both data redundancy and high availability even when one server fails. If adding a third server is not feasible, the alternative is to use an external voter, which provides high availability without risking data consistency via split-brain scenarios. However, this configuration cannot offer the data redundancy of a three-server stripe if a server fails.
An external voter is a client that is allowed to cast a vote in the election of a new active server, in cases where a majority of servers in a stripe are unable to reach a consensus on electing a new active server.
External voter configuration
The number of external voters must be specified in the server configuration. It is recommended that the total number of servers and external voters together be an odd number.
External voters need to register with the servers to be added as voting members in their elections. If n voters are configured in the server, then the first n voting clients requesting registration are added as voters. Registration requests from other clients are declined and kept on hold until one of the registered voters de-registers.
Voters can de-register themselves from the cluster so that their voting rights can be transferred to other clients waiting to register, if there are any. A voting client can de-register itself by using the deregister API or by getting disconnected from the cluster.
When a voting client gets disconnected from the server, it will automatically get de-registered by the server. When the client reconnects, it will only get registered again as a voter if another voter has not taken its place while this client was disconnected.
Server configuration
A maximum count for the number of external voters allowed can optionally be added to the failover-priority configuration if the stripe is tuned for consistency, as follows:
<failover-priority>
  <consistency>
    <voter count="3"/> <!-- 1 -->
  </consistency>
</failover-priority>
1
Here you are restricting the total number of voting clients to three.
The failover priority setting and the specified maximum number of external voters must be consistent across all stripes and will be validated during the cluster configuration step. For more information on how to configure a cluster, see the section Cluster Tool.
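For illustration, a cluster configuration invocation might look like the following sketch; the cluster name, license file name, and configuration file names are placeholders, and the exact options of the cluster tool are documented in the Cluster Tool section:
cluster-tool.sh configure -n my-cluster-0 -l license.xml tc-config-stripe1.xml tc-config-stripe2.xml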
Client configuration
External voters can be of two variants:
1. Standalone voter
2. Clients using the voter library (client voter)
Standalone voter
An external voter can be run as a standalone process using a script provided with the kit. The script takes the tc-config files of the stripes in the cluster as arguments, one -f option per stripe. A variant that takes <host>:<port> combinations instead of the server configuration files is also supported; each -s option argument must be a comma-separated list of the <host>:<port> combinations of the servers in a single stripe. To register with a multi-stripe cluster, provide multiple -f or -s options, one for each stripe.
Usage:
start-tc-voter.(sh|bat) -f TC-CONFIG [-f TC-CONFIG]...
or
start-tc-voter.(sh|bat) -s HOST:PORT[,HOST:PORT]... [-s HOST:PORT[,HOST:PORT]...]...
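For example, the following hypothetical invocation registers a standalone voter with a two-stripe cluster; the host names are placeholders, and 9410 is used here as a typical server port:
start-tc-voter.sh -s stripe1-server1:9410,stripe1-server2:9410 -s stripe2-server1:9410,stripe2-server2:9410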
Client voter
Any TCStore or Ehcache client can act as an external voter as well by using a voter library distributed with the kit. A client can join the cluster as a voter by creating a TCVoter instance and registering itself with the cluster.
Note: The cluster must be configured using the cluster tool before a client voter can be registered with it.
When the voter is no longer required, it can be de-registered from the cluster either by disconnecting that client, or by using the deregister API.
TCVoter voter = new EnterpriseTCVoterImpl();       // 1
voter.register("my-cluster-0",                     // 2
               "<host>:<port>", "<host>:<port>");  // 3
...
voter.deregister("my-cluster-0");                  // 4
1
Instantiate a TCVoter instance.
2
Register the voter with a cluster by providing a cluster name…
3
…and the <host>:<port> combinations of all servers in the cluster.
4
De-register from the cluster using the same cluster name that was used to register it.
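As a usage note, de-registering in a finally block ensures that the voter slot is freed even if the client's work fails, so that a waiting client can take it over. The following is a minimal sketch using only the TCVoter API shown above; the cluster name and host:port values are placeholders, and imports are omitted as in the example above:
TCVoter voter = new EnterpriseTCVoterImpl();
voter.register("my-cluster-0", "server1:9410", "server2:9410");
try {
    // ... application work while this client holds a voter slot ...
} finally {
    // Frees the voter slot for any client waiting to register.
    voter.deregister("my-cluster-0");
}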
Manual promotion with override voter
Since an external voter is just another process, there is no guarantee that it will always be up and available. In particular, a client voter leaves as soon as its client does. In the rare event that a failure occurs (a partition splitting the active and passive servers, or the active server crashing) while the external voter is also absent, none of the surviving servers will act as an active server. The servers will be stuck in an intermediate state in which operations from regular clients are stalled. Manual intervention is then required to get the cluster out of this state, either by fixing the cause of the partition or by restarting the crashed server. If neither is feasible, the third option is to manually promote a server using an override vote from an external voter.
The voter process can be started in an override mode to promote a single server stuck in that intermediate state to be an active server. When the voter process is started in this special mode, it connects to the server that you want to promote, gives it an override vote, and exits. The voter process can be started in override mode as follows:
start-tc-voter.(sh|bat) -o HOST:PORT
Running this command will forcibly promote the server at HOST:PORT to be an active server, if it is stuck in that intermediate state.
Note: This override voting will work even if external voters are not configured in the server configuration.
Warning: Be careful not to start override voters on both sides of a partition, as that would cause both sides to win their elections and result in a split-brain.
Server startup
When the failover priority of the stripes is tuned for consistency, server startup is affected as well. In a multi-server stripe, the very first server that starts up fresh will not become the active server until it gets a majority quorum of votes from its peers. To get it promoted to active server, its peer servers have to be brought up so that they all vote and the majority quorum is formed. Bringing up regular voters does not help, as they need to communicate with all the active servers in the cluster to get registered. If bringing up the other servers is not feasible for some reason, an override voter can be used to forcibly promote that server.
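As an illustration, assuming the kit's start-tc-server script with a placeholder tc-config.xml and server names for a three-server stripe, bringing up a second server lets the two of them form the majority quorum; failing that, the stuck server can be force-promoted with an override vote:
start-tc-server.sh -f tc-config.xml -n server-1
start-tc-server.sh -f tc-config.xml -n server-2

start-tc-voter.sh -o server-1:9410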
