Safe Cluster Restart Automation Guidelines
This document provides guidelines for automating the cluster restart procedure. This is useful when a simple restart of the cluster is needed.
Note that these guidelines apply only to restarting a healthy cluster, that is, a cluster in which all servers are up and each stripe has exactly one active server.
Cluster restart automation
Cluster restart involves the following steps:
STEP 1. Shut down the clients.
STEP 2. Restart the passive servers in safe mode.
STEP 3. Restart the active servers in safe mode.
STEP 4. Make the previous active servers exit from safe mode.
STEP 5. Make the previous passive servers exit from safe mode.
The details for these steps are as follows:
STEP 1. Shut down the clients
The Terracotta client shuts down when you shut down your application.
STEP 2. Restart the passive servers in safe mode
Use the stop-tc-server script with the options --stop-if-passive and --restart-in-safe-mode. This restarts a server only if it is in passive mode.
Use a procedure indicated by the following pseudocode for restarting all passive servers:
for each <server> in <running servers> {
    stop-tc-server --stop-if-passive --restart-in-safe-mode <server> <args>
}
Wait for the passive servers to reach SAFE_MODE_STATE. The server state can be determined by using the server-stat script.
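The wait described above can be sketched as a polling loop with a timeout. This is a minimal illustration, not part of the Terracotta tooling: `wait_for_state` and its `get_state` parameter are hypothetical names, and `get_state` is assumed to run server-stat for one server and return the state string parsed from its output (the output format is not described here, so the parsing is left to the caller).

```python
import time
from typing import Callable, Iterable

SAFE_MODE_STATE = "SAFE_MODE_STATE"  # state name taken from the guidelines


def wait_for_state(
    servers: Iterable[str],
    get_state: Callable[[str], str],
    target: str = SAFE_MODE_STATE,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until every server reports `target`, or raise TimeoutError.

    `get_state` is a hypothetical callable supplied by the caller; it should
    query one server (for example through server-stat) and return its state.
    """
    pending = set(servers)
    deadline = clock() + timeout_s
    while True:
        # Keep only the servers that have not reached the target state yet.
        pending = {s for s in pending if get_state(s) != target}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"servers not in {target}: {sorted(pending)}")
        sleep(interval_s)
```

The timeout and polling interval are arbitrary defaults; an automation script would tune them to how long its servers typically take to restart.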
STEP 3. Restart the active servers in safe mode
Use the stop-tc-server script with the options --stop-if-active and --restart-in-safe-mode. This restarts a server only if it is in active mode.
Use a procedure indicated by the following pseudocode for restarting all active servers:
for each <server> in <running servers> {
    stop-tc-server --stop-if-active --restart-in-safe-mode <server> <args>
}
Wait for all servers to reach SAFE_MODE_STATE. The server state can be determined by using the server-stat script.
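The loops in STEP 2 and STEP 3 differ only in the role flag, so one helper can cover both. The following is a sketch under stated assumptions: `restart_command` and `restart_in_safe_mode` are hypothetical helper names, each entry in `servers` stands for whatever `<server> <args>` expands to in your environment, and the script is assumed to be on the PATH.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

# Role flags as named in STEP 2 and STEP 3.
ROLE_FLAGS = {"passive": "--stop-if-passive", "active": "--stop-if-active"}


def restart_command(role: str, server_args: Sequence[str]) -> List[str]:
    """Build one stop-tc-server invocation for the given role."""
    return ["stop-tc-server", ROLE_FLAGS[role], "--restart-in-safe-mode", *server_args]


def restart_in_safe_mode(
    role: str,
    servers: Iterable[Sequence[str]],
    # The exit status for a server that is in the other role is not specified
    # in these guidelines, so the default runner does not check it.
    run: Callable[[List[str]], object] = subprocess.run,
) -> List[List[str]]:
    """Send the restart request to every running server, one at a time."""
    issued = []
    for server_args in servers:
        cmd = restart_command(role, server_args)
        run(cmd)
        issued.append(cmd)
    return issued
```

Calling `restart_in_safe_mode("passive", servers)`, waiting for SAFE_MODE_STATE, and then calling `restart_in_safe_mode("active", servers)` follows the order of the steps. Passing a custom `run` makes it possible to log or dry-run the commands first.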
STEP 4. Make previous active servers exit from safe mode
Use the exit-safe-mode script to make a server exit from safe mode.
To identify the previous active servers, use the server-stat script, which reports the state a server had before shutdown in the initialState field.
Use a procedure indicated by the following pseudocode to make previous active servers exit from safe mode:
<previous active servers> = []
for each <server> in <servers> {
    server-stat -s <server>:<management-port>
    add <server> to <previous active servers> if its initialState is the active state
}
for each <server> in <previous active servers> {
    exit-safe-mode -s <server>:<management-port>
}
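The classification step in the pseudocode above can be sketched as a small partition function. It assumes the initialState values have already been collected from server-stat into a dictionary; how to parse the script's output is not covered here. The function name and the matching rule (the value contains the word "active", case-insensitively) are assumptions to adapt to the values your servers actually report.

```python
from typing import Dict, List, Tuple


def split_by_initial_state(initial_states: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (previous_active, previous_passive) server lists.

    `initial_states` maps "<server>:<management-port>" to the initialState
    value reported by server-stat. Assumption: an active server's value
    contains "active" (case-insensitive); every other server is treated
    as a previous passive.
    """
    previous_active: List[str] = []
    previous_passive: List[str] = []
    for server, state in initial_states.items():
        if "active" in state.lower():
            previous_active.append(server)
        else:
            previous_passive.append(server)
    return previous_active, previous_passive
```

Treating every non-active server as a previous passive relies on the healthy-cluster precondition stated at the top of this document.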
STEP 5. Make previous passive servers exit from safe mode
The previous passive servers are the remaining servers, that is, those whose initialState is not the active state. Use a procedure indicated by the following pseudocode to make them exit from safe mode:
for each <server> in <previous passive servers> {
    exit-safe-mode -s <server>:<management-port>
}
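STEP 4 and STEP 5 issue the same command and differ only in which servers go first. A minimal sketch of that ordering follows; `exit_safe_mode_in_order` is a hypothetical helper, each server string is expected in the `<server>:<management-port>` form, and raising on a non-zero exit status (`check=True`) is an assumption about how the script signals failure.

```python
import subprocess
from typing import Callable, Iterable, List


def exit_safe_mode_in_order(
    previous_active: Iterable[str],
    previous_passive: Iterable[str],
    # Assumption: exit-safe-mode returns a non-zero status on failure.
    run: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> List[List[str]]:
    """Run exit-safe-mode for previous actives first, then previous passives."""
    ordered = list(previous_active) + list(previous_passive)
    commands = [["exit-safe-mode", "-s", server] for server in ordered]
    for cmd in commands:
        run(cmd)
    return commands
```

Because the commands run sequentially, a failure on a previous active server stops the loop before any previous passive server is asked to leave safe mode.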