Terracotta Ehcache 10.7 | Terracotta Server Administration Guide | Cluster Tool
 
Cluster Tool
The cluster tool is a command-line utility that allows administrators of the Terracotta Server Array to perform a variety of cluster management tasks. For example, the cluster tool can be used to:
*Obtain the running status of servers
*Dump the state of running servers
*Take backups of running servers
*Promote a suspended server on startup or failover
*Shut down an entire cluster
*Perform a conditional partial shutdown of a cluster that has one or more passive servers configured for high availability (e.g. for upgrades)
The cluster tool script is located in tools/bin under the product installation directory as cluster-tool.bat for Windows platforms, and as cluster-tool.sh for Unix/Linux.
Cluster Tool commands
The cluster tool provides several commands. To list them and their respective options, run cluster-tool.sh (or cluster-tool.bat on Windows) without any arguments, or use the option -h (long option --help).
The following section lists options common to all commands; they must therefore be specified before the command name:
Precursor options
1. -v (long option --verbose)
This option gives verbose output, and is useful for debugging error conditions. Default: false.
2. -srd (long option --security-root-directory)
This option can be used to communicate with a server which has TLS/SSL-based security configured. For more details on setting up security in a Terracotta cluster, see the section Security Core Concepts.
3. -t (long option --connection-timeout)
This option lets you specify a custom timeout value (in milliseconds) for connections to be established in cluster tool commands. Default: 10s.
4. -r (long option --request-timeout)
This option lets you specify a request timeout value for operations. Default: 10s.
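Because precursor options go before the command name, a verbose status check with a 30-second connection timeout would be assembled as in the sketch below. This is illustrative only; the cluster name and host are example values.

```shell
# Sketch: precursor options (-v, -t) must precede the command name;
# command-specific options (-n, -s) follow it. Values are examples.
build_status_cmd() {
  # $1: cluster name, $2: server host:port
  echo "./cluster-tool.sh -v -t 30000 status -n $1 -s $2"
}
build_status_cmd tc-cluster localhost:9410
```

Running the function simply prints the assembled command line, which can then be executed or logged.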
The "status" Command
The status command displays the status of a cluster, or of particular server(s) in the same or different clusters.
Syntax:
status [-n <cluster-name>] [-o json]
-s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-n <cluster-name>
The name of the configured cluster.
*-o json
Output in JSON format. Default is tabular.
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port: 9410) of running servers, each specified using the -s option. When provided with option -n, a reachable server in the provided list will be used. Otherwise, the command will be individually executed on each server in the list.
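When scripting health checks around the status command, its captured output can be scanned for problem states. A minimal sketch, assuming only that the output contains the UNREACHABLE token shown in the examples below:

```shell
# Count servers reported as UNREACHABLE in previously captured
# status output. Note: grep -c prints 0 but exits non-zero when
# there are no matches.
count_unreachable() {
  # $1: output captured from a cluster-tool.sh status run
  printf '%s\n' "$1" | grep -c "UNREACHABLE"
}
```

A monitoring job could alert whenever the count is non-zero.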
Examples
*The example below shows the execution of a cluster-level status command.
./cluster-tool.sh status -n tc-cluster -s localhost
| STRIPE: 1 |
+-------------+----------------+---------+
| Server Name | Host:Port      | Status  |
+-------------+----------------+---------+
| server-1    | localhost:9410 | ACTIVE  |
| server-2    | localhost:9510 | PASSIVE |
+-------------+----------------+---------+

| STRIPE: 2 |
+-------------+----------------+---------+
| Server Name | Host:Port      | Status  |
+-------------+----------------+---------+
| server-3    | localhost:9610 | ACTIVE  |
| server-4    | localhost:9710 | PASSIVE |
+-------------+----------------+---------+
*The example below shows the execution of a server-level status command. No server is running at localhost:9510, hence the UNREACHABLE status.
./cluster-tool.sh status -s localhost:9410 -s localhost:9510 -s localhost:9910
+----------------+-------------+-------------------+------------------------------------+
| Host-Port      | Status      | Member of Cluster | Additional Information             |
+----------------+-------------+-------------------+------------------------------------+
| localhost:9410 | ACTIVE      | tc-cluster        | -                                  |
| localhost:9510 | PASSIVE     | tc-cluster        | -                                  |
| localhost:9910 | UNREACHABLE | -                 | localhost:9910=Connection refused; |
+----------------+-------------+-------------------+------------------------------------+

Error (PARTIAL_FAILURE): Command completed with errors.
To learn more about server states, visit the section Logical Server States.
The "promote" command
The promote command can be used to promote a server stuck in a suspended state. For more information about suspended states, refer to the topics Server startup and Manual promotion with override voter in the section Failover Tuning.
Syntax:
promote -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option. The command will be individually executed on each server in the list.
Note:
There is no cluster-wide flavor for this command.
Examples
*The example below shows the execution of the promote command on a server stuck in suspended state at localhost:9410.
./cluster-tool.sh promote -s localhost
Following sub-operations were successful:
localhost:9410: Server promotion successful
Command completed successfully.
*The example below shows the erroneous execution of a server-level promote command. The server running at localhost:9510 is not in a suspended state to be promoted, hence the failure.
./cluster-tool.sh promote -s localhost:9510
Following sub-operations were unsuccessful:
localhost:9510: com.terracottatech.tools.clustertool.exceptions.ClusterToolException:
Promote command failed as the server is not in a suspended state
Error (FAILURE): Command failed.
The "dump" Command
The dump command dumps the state of a cluster, or particular server(s) in the same or different clusters. The dump of each server can be found in its logs.
Syntax:
dump [-n <cluster-name>] -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-n <cluster-name>
The name of the configured cluster.
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option. When provided with option -n, a reachable server in the provided list will be used. Otherwise, the command will be individually executed on each server in the list.
Examples
*The example below shows the execution of a cluster-level dump command.
./cluster-tool.sh dump -n tc-cluster -s localhost:9410
Contacting servers: [localhost:9410]
Using reachable server: localhost:9410 to carry out the operation
Following sub-operations were successful:
localhost:9410: Dump successful
localhost:9510: Dump successful
localhost:9610: Dump successful
localhost:9710: Dump successful

Command completed successfully.
*The example below shows the execution of a server-level dump command. No server is running at localhost:9910, hence the dump failure.
./cluster-tool.sh dump -s localhost:9410 -s localhost:9510 -s localhost:9910

Following sub-operations were successful:
localhost:9410: Dump successful
localhost:9510: Dump successful

Following sub-operations were unsuccessful:
localhost:9910:
org.terracotta.diagnostic.client.connection.DiagnosticServiceProviderException:
com.terracotta.connection.api.DetailedConnectionException:
java.util.concurrent.TimeoutException: localhost:9910=Connection refused;

Error (PARTIAL_FAILURE): Command completed with errors.
The "ipwhitelist-reload" Command
The ipwhitelist-reload command reloads the IP whitelist on a cluster, or particular server(s) in the same or different clusters. See the section IP Whitelisting for details.
Syntax:
ipwhitelist-reload [-n <cluster-name>] -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-n <cluster-name>
The name of the configured cluster.
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option. When provided with option -n, a reachable server in the provided list will be used. Otherwise, the command will be individually executed on each server in the list.
Examples
*The example below shows the execution of a cluster-level ipwhitelist-reload command.
./cluster-tool.sh ipwhitelist-reload -n tc-cluster -s localhost
Contacting servers: [localhost:9410]
Using reachable server: localhost:9410 to carry out the operation
Following sub-operations were successful:
localhost:9410: IP whitelist reload successful
localhost:9510: IP whitelist reload successful
localhost:9610: IP whitelist reload successful
localhost:9710: IP whitelist reload successful

Command completed successfully.
*The example below shows the execution of a server-level ipwhitelist-reload command. No server is running at localhost:9510, hence the IP whitelist reload failure.
./cluster-tool.sh ipwhitelist-reload -s localhost:9410 -s localhost:9510 -s localhost:9910
Following sub-operations were successful:
localhost:9410: IP whitelist reload successful
localhost:9510: IP whitelist reload successful

Following sub-operations were unsuccessful:
localhost:9910:
org.terracotta.diagnostic.client.connection.DiagnosticServiceProviderException:
com.terracotta.connection.api.DetailedConnectionException:
java.util.concurrent.TimeoutException: localhost:9910=Connection refused;

Error (PARTIAL_FAILURE): Command completed with errors.
The "backup" Command
The backup command takes a backup of the running Terracotta cluster. The backup is taken on active servers only. Before taking a backup of a cluster, backup-dir needs to be set on each server. For more details about this feature, see Backup, Restore and Data Migration.
Syntax:
backup -n <cluster-name> -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-n <cluster-name>
The name of the configured cluster.
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option. A reachable server in the provided server list will be used for connection.
Note:
There's no server-level flavor of this command, as backup works at the cluster level only.
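A scheduled backup can be wrapped in a small script that reacts to the command's exit status. This is a sketch, not part of the product; the CLUSTER_TOOL variable and log messages are assumptions.

```shell
# Hypothetical nightly-backup wrapper: report success or failure
# based on the exit status of the backup command.
CLUSTER_TOOL=${CLUSTER_TOOL:-./cluster-tool.sh}
run_backup() {
  # $1: cluster name, $2: a reachable server host:port
  if "$CLUSTER_TOOL" backup -n "$1" -s "$2"; then
    echo "backup of $1 succeeded"
  else
    echo "backup of $1 FAILED" >&2
    return 1
  fi
}
```

The non-zero return on failure lets cron or a monitoring agent pick up failed backups.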
Examples
*The example below shows a successful execution of a cluster-level backup command. Note that the server at localhost:9710 was unreachable.
./cluster-tool.sh backup -n tc-cluster -s localhost:9710 -s localhost:9410
Contacting servers: [localhost:9710, localhost:9410]
Following sub-operations were unsuccessful:
localhost:9710:
org.terracotta.diagnostic.client.connection.DiagnosticServiceProviderException:
com.terracotta.connection.api.DetailedConnectionException:
java.util.concurrent.TimeoutException: localhost:9710=Connection refused;

Using reachable server: localhost:9410 to carry out the operation

PHASE 0: SETTING BACKUP NAME TO : 996e7e7a-5c67-49d0-905e-645365c5fe28
localhost:9710: TIMEOUT
localhost:9410: SUCCESS
localhost:9510: SUCCESS
localhost:9610: SUCCESS

PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9710: TIMEOUT
localhost:9410: SUCCESS
localhost:9510: NOOP
localhost:9610: SUCCESS

PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9410: SUCCESS
localhost:9610: SUCCESS

PHASE (3/4): START_BACKUP
localhost:9410: SUCCESS
localhost:9610: SUCCESS

PHASE (4/4): EXIT_ONLINE_BACKUP_MODE
localhost:9410: SUCCESS
localhost:9610: SUCCESS
Command completed successfully.
*The example below shows a failed execution of a cluster-level backup command.
./cluster-tool.sh backup -n tc-cluster -s localhost:9610
Contacting servers: [localhost:9610]
Using reachable server: localhost:9610 to carry out the operation

PHASE 0: SETTING BACKUP NAME TO : 93cdb93d-ad7c-42aa-9479-6efbdd452302
localhost:9410: SUCCESS
localhost:9510: SUCCESS
localhost:9610: SUCCESS
localhost:9710: SUCCESS

PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9410: SUCCESS
localhost:9510: NOOP
localhost:9610: SUCCESS
localhost:9710: NOOP

PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9410: BACKUP_FAILURE
localhost:9610: SUCCESS

PHASE (CLEANUP): ABORT_BACKUP
localhost:9410: SUCCESS
localhost:9610: SUCCESS

Error (FAILURE): Unable to complete backup.
The "shutdown" Command
The shutdown command shuts down a running Terracotta cluster. During the course of the shutdown process, it ensures that:
*Shutdown safety checks are performed on all the servers. Exactly what safety checks are performed will depend on the specified options and is explained in detail later in this section.
*All data is persisted to eliminate data loss.
*All passive servers are shut down first before shutting down the active servers.
The shutdown command follows a multi-phase process as follows:
1. Check with all servers whether they are OK to shut down. Whether or not a server is OK to shut down depends on the specified shutdown options and the state of the server in question.
2. If all servers agree to the shutdown request, all of them will be asked to prepare for the shutdown. Preparing for shutdown may include the following:
a. Persist all data.
b. Block new incoming requests. This ensures that the persisted data will be cluster-wide consistent after shutdown.
3. If all servers successfully prepare for the shutdown, a shutdown call will be issued to all the servers.
The first two steps above ensure an atomic shutdown to the extent possible as the system can be rolled back to its original state if there are any errors. In such cases, client-request processing will resume as usual after unblocking any blocked servers.
In the unlikely event of a failure in the third step above, the error message will clearly specify the servers that failed to shut down. In this case, use the --force option to forcefully terminate the remaining servers. If there is a network connectivity issue, the forceful shutdown may fail, and the remaining servers will have to be terminated using operating system commands.
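The fallback described above can be scripted: attempt a safe shutdown first, and issue --force only if the safe attempt fails. A sketch, with CLUSTER_TOOL as an assumed path variable:

```shell
# Try a safe full-cluster shutdown; fall back to --force on failure.
CLUSTER_TOOL=${CLUSTER_TOOL:-./cluster-tool.sh}
shutdown_with_fallback() {
  # $1: cluster name, $2: a reachable server host:port
  if "$CLUSTER_TOOL" shutdown -n "$1" -s "$2"; then
    echo "safe shutdown of $1 succeeded"
  else
    echo "safe shutdown failed; retrying with --force" >&2
    "$CLUSTER_TOOL" shutdown -f -n "$1" -s "$2"
  fi
}
```

If even the forced shutdown fails (e.g. due to a network partition), the remaining servers must be terminated with operating system commands, as noted above.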
Note:
The shutdown sequence also ensures that the data is stripe-wide consistent. However, it is recommended that clients be shut down before attempting to shut down the Terracotta cluster.
Syntax:
shutdown [ -n <cluster-name> [-f | -i] ] -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-n <cluster-name>
The name of the configured cluster.
*-f | --force
Forcefully shut down the cluster, even if the cluster is only partially reachable.
*-i | --immediate
Do an immediate shutdown of the cluster, even if clients are connected.
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option.
If the -n option is not specified, this command forcefully shuts down only the servers specified in the list. For clusters with stripes configured for high availability (at least one passive server per stripe), it is recommended to use the partial cluster shutdown commands explained in the section below instead of the shutdown variant without the -n option, as they allow a conditional shutdown.
If the -n option is specified (i.e. a full cluster shutdown), this command shuts down the entire cluster. Servers in the provided list will be contacted for connectivity, and the command will then verify the cluster configuration with the given cluster name by obtaining the cluster configuration from the first reachable server. If all servers are reachable, this command checks if all servers in all the stripes are safe to shut down before proceeding with the command.
A cluster is considered to be safe to shut down provided the following are true:
*No critical operations such as backup and restore are going on.
*No Ehcache or TCStore clients are connected.
*All servers in all the stripes are reachable.
If either the -f or -i option is specified, this command works differently than above as follows:
*If the -i option is specified, this command proceeds with the shutdown even if clients are connected.
*If the -f option is specified, this command proceeds with the shutdown even if none of the conditions specified for safe shutdown above are met.
For all cases, the shutdown sequence is performed as follows:
1. Flush all data to persistent store for datasets or caches that have persistence configured.
2. Shut down all the passive servers, if any, in the cluster for all stripes.
3. Once the passive servers are shut down, issue a shutdown request to all the active servers in the cluster.
The above shutdown sequence is the cleanest way to shut down a cluster.
Examples
*The example below shows a successful execution of a cluster-level shutdown command.
./cluster-tool.sh shutdown -n tc-cluster -s localhost:9410
Contacting servers: [localhost:9410]
Using reachable server: localhost:9410 to carry out the operation

Shutting down cluster: tc-cluster
STEP (1/3): Preparing to shut down
STEP (2/3): Stopping all passive servers first
STEP (3/3): Stopping all active servers
Command completed successfully.
*The example below shows the execution of a cluster-level shutdown command that fails because one of the servers in the cluster was not reachable.
./cluster-tool.sh shutdown -n tc-cluster -s localhost:9410
Contacting servers: [localhost:9410]
Using reachable server: localhost:9410 to carry out the operation
Error (FAILURE): Timed out trying to reach the server
Detailed Error Status for Cluster `tc-cluster` :
ServerError{host='localhost:9510', Error='Timed out trying to reach the server'}.
Unable to process safe shutdown request.
Command failed.
*The example below shows a successful execution of a cluster-level shutdown command with the --force option. Note that one of the servers in the cluster was already down.
./cluster-tool.sh shutdown -f -n tc-cluster -s localhost:9410
Contacting servers: [localhost:9410]
Using reachable server: localhost:9410 to carry out the operation
Timed out trying to reach the server
Detailed Error Status for Cluster `tc-cluster` :
ServerError{host='localhost:9510', Error='Timed out trying to reach the server'}.
Continuing forced shutdown.

Shutting down cluster: tc-cluster
STEP (1/3): Preparing to shut down
Timed out trying to reach the server
Detailed Error Status :
ServerError{host='localhost:9510', Error='Timed out trying to reach the server'}.
Continuing forced shutdown.
STEP (2/3): Stopping all passive servers first
STEP (3/3): Stopping all active servers
Command completed successfully.
Partial Cluster Shutdown Commands
Partial cluster shutdown commands can be used to partially shut down nodes in the cluster without sacrificing the availability of the cluster. These commands can be used only on a cluster that is configured for redundancy with one or more passive servers per stripe. The purpose of these commands is to allow administrators to perform routine and planned administrative tasks, such as rolling upgrades, with high availability.
The following flavors of partial cluster shutdown commands are available:
*shutdown-if-passive
*shutdown-if-active
*shutdown-all-passives
*shutdown-all-actives
As a general rule, if these commands are successful, the specified servers will be shut down. If there are any errors due to which these commands abort, the state of the servers will be left intact.
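For example, a rolling-maintenance script could take passive servers down one at a time with shutdown-if-passive, keeping the cluster available throughout. A sketch; the host list, restart step, and messages are placeholders, not product output.

```shell
# Rolling-maintenance sketch: shut down each passive server in turn.
# shutdown-if-passive refuses unsafe requests, so a refusal just
# skips that host and leaves its state intact.
CLUSTER_TOOL=${CLUSTER_TOOL:-./cluster-tool.sh}
maintain_passives() {
  for host in "$@"; do
    if ! "$CLUSTER_TOOL" shutdown-if-passive -s "$host"; then
      echo "skipping $host: refused (not passive or stripe unsafe)" >&2
      continue
    fi
    echo "$host is down; perform maintenance, then restart the server"
  done
}
```

After each server is patched and restarted, it rejoins its stripe and resynchronizes before the next one is taken down.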
From the table of server states described in The "status" Command, the following are the different active states that a server may find itself in:
*ACTIVE
*ACTIVE_RECONNECTING
*ACTIVE_SUSPENDED
Note:
In the following sections, the term 'active servers' means servers in any of the active states mentioned above, unless explicitly stated otherwise.
Similarly, the following are the passive states for a server:
*PASSIVE_SUSPENDED
*SYNCHRONIZING
*PASSIVE
Note:
In the following sections, the term 'passive servers' means servers in any of the passive states mentioned above, unless explicitly stated otherwise.
The "shutdown-if-passive" Command
The shutdown-if-passive command shuts down the specified servers in the cluster, provided the following conditions are met:
*All the stripes in the cluster are functional, and each stripe has one healthy active server and no suspended active servers.
*All the servers specified in the list are passive servers.
Syntax:
shutdown-if-passive -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option.
Note:
There's no cluster-level flavor of this command.
Examples
*The example below shows the execution of a successful shutdown-if-passive command.
./cluster-tool.sh shutdown-if-passive -s localhost:9510
Contacting servers: [localhost:9510]

Stopping passive node(s): [localhost:9510] of cluster: tc-cluster
STEP (1/2): Preparing to shutdown
STEP (2/2): Stopping if Passive
Command completed successfully.
*The example below shows the execution of a failed shutdown-if-passive command, as it tried to shut down a server which is not a passive server.
./cluster-tool.sh shutdown-if-passive -s localhost:9410
Contacting servers: [localhost:9410]
Error (FAILURE): Unable to process the partial shutdown request.
One or more of the specified server(s) are not in passive state
or may not be in the same cluster
Discovered state of all servers are as follows:
Reachable Servers : 2
Stripe #: 1
Node: {localhost:9410} State: ACTIVE
Node: {localhost:9510} State: PASSIVE

Please check server logs for more details.
Command failed.
The "shutdown-if-active" Command
The shutdown-if-active command shuts down the specified servers in the cluster, provided the following conditions are met:
*All the servers specified in the list are active servers.
*All the stripes corresponding to the given servers have at least one server in 'PASSIVE' state.
Syntax:
shutdown-if-active -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option.
Note:
There's no cluster-level flavor of this command.
Examples
*The example below shows the execution of a successful shutdown-if-active command:
./cluster-tool.sh shutdown-if-active -s localhost:9410
Contacting servers: [localhost:9410]

Stopping active node(s): [localhost:9410] of cluster: tc-cluster
STEP (1/2): Preparing to shut down
STEP (2/2): Shut down if active server
Command completed successfully.
*The example below shows the execution of a failed shutdown-if-active command as the specified server was not an active server.
./cluster-tool.sh shutdown-if-active -s localhost:9510
Contacting servers: [localhost:9510]
Error (FAILURE): Unable to process the partial shutdown request.
One or more of the specified server(s) are not in active state
or may not be in the same cluster.
Reachable Servers : 2
Stripe #: 1
Node : {localhost:9410} State : ACTIVE
Node : {localhost:9510} State : PASSIVE

Please check server logs for more details
Command failed.
The "shutdown-all-passives" Command
The shutdown-all-passives command shuts down all the passive servers in the specified cluster, provided the following is true:
*All the stripes in the cluster are functional, and each stripe has one server in 'ACTIVE' state and no suspended active servers.
All passive servers in all the stripes of the cluster will be shut down when this command is run.
Syntax:
shutdown-all-passives -n <cluster-name> -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-n <cluster-name>
The name of the configured cluster.
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option. These host(s) need not be passive servers.
Note:
There's no server-level flavor of this command, as it can be used only to shut down all the passive servers in the entire cluster.
The command shuts down all the passive servers in a multi-phase manner as follows:
1. Check with all servers whether it is safe to shut down as a passive server.
2. Flush any data that needs to be made persistent across all servers that are going down and block any further changes.
3. Issue a shutdown request to all passive servers if all passive servers succeed in step 2.
4. If any servers fail in step 2 or above, the shutdown request will fail and the state of the servers will remain intact.
Examples
*The example below shows the execution of a successful shutdown-all-passives command.
./cluster-tool.sh shutdown-all-passives -n tc-cluster -s localhost:9410
Contacting servers: [localhost:9410]
Using reachable server: localhost:9410 to carry out the operation

Stopping passive node(s): [localhost:9510] of cluster: tc-cluster
STEP (1/2): Preparing to shutdown
STEP (2/2): Stopping if Passive
Command completed successfully.
The "shutdown-all-actives" Command
The shutdown-all-actives command shuts down the active server of all stripes in the cluster, provided the following are true:
*There are no suspended active servers in the cluster.
*There is at least one passive server in 'PASSIVE' state in every stripe in the cluster.
The active server of all stripes of the cluster will be shut down when this command returns success. If the command reports an error, the state of the servers will be left intact.
Syntax:
shutdown-all-actives -n <cluster-name> -s <hostname[:port]>,<hostname[:port]>...
Parameters:
*-n <cluster-name>
The name of the configured cluster.
*-s <hostname[:port]>,<hostname[:port]>...
The hostname:port(s) or hostname(s) (default port being 9410) of running servers, each specified using the -s option. These host(s) need not be active servers.
Note:
There's no server-level flavor of this command as it can be used only to shut down all the active servers in the entire cluster.
The command shuts down all the active servers in a multi-phase manner as explained below:
1. Check with all servers whether they are safe to be shut down as active servers.
2. Flush any data that needs to be made persistent across all servers that are going down and block any further changes.
3. Issue a shutdown request to all active servers if they succeed in step 2.
4. If any servers fail in step 2 or above, the shutdown request will fail and the state of the servers will remain as before.
Examples
*The example below shows the execution of a successful shutdown-all-actives command. Note that the specified host was a passive server in this example. As the specified host is used only to connect to the cluster and obtain the correct state of all the servers in the cluster, the command successfully shuts down all the active servers in the cluster, leaving the passive servers intact.
./cluster-tool.bat shutdown-all-actives -n tc-cluster -s localhost:9510
Contacting servers: [localhost:9510]
Using reachable server: localhost:9510 to carry out the operation

Stopping active node(s): [localhost:9410] of cluster: tc-cluster
STEP (1/2): Preparing to shut down
STEP (2/2): Shut down if active server
Command completed successfully.