Cluster Tool
The cluster tool is a command-line utility that allows administrators of the Terracotta Server Array to perform a variety of cluster management tasks. For example, the cluster tool can be used to:
Configure or re-configure a cluster
Obtain the status and configuration information of running servers
Dump the state of running servers
Take backups of running servers
Promote a suspended server on startup or failover
Shut down an entire cluster
Perform a conditional partial shutdown of a cluster having one or more passive servers configured for high availability (for upgrades etc.)
The cluster tool script is located in tools/cluster-tool/bin under the product installation directory as cluster-tool.bat for Windows platforms, and as cluster-tool.sh for Unix/Linux.
Usage Flow
The following is a typical flow in cluster setup and usage:
1. Create a Terracotta configuration file for each stripe of the cluster.
2. Start the servers in all the stripes.
3. Make sure the stripes are online and ready.
4. Configure the cluster using the configure command of the cluster tool. See the section The "configure" Command below for details.
5. Check the current status of the cluster or specific servers in the cluster using the status command. See the section The "status" Command below for details.
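For example, a new two-stripe cluster could be configured and then checked with commands like the following; the cluster name, configuration file paths, license file path and host are illustrative:
./cluster-tool.sh configure -n tc-cluster -l ~/license.xml ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
./cluster-tool.sh status -n tc-cluster -s localhost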
Cluster Tool commands
The cluster tool provides several commands. To list them and their respective options, run cluster-tool.sh (or cluster-tool.bat on Windows) without any arguments, or use the option -h (long option --help).
The following is a list of options common to all commands; these options must therefore be specified before the command name:
Precursor options
1. -v (long option --verbose)
This option produces verbose output and is useful for debugging error conditions.
2. -srd (long option --security-root-directory)
This option can be used to communicate with a server which has TLS/SSL-based security configured. For more details on setting up security in a Terracotta cluster, see the section Security Core Concepts.
Note: If this option is not specified while trying to connect to a secure cluster, the command will fail with a SECURITY_CONFLICT error.
3. -t (long option --timeout)
This option lets you specify a custom timeout value (in milliseconds) for connections to be established in cluster tool commands.
Note: If this option is not specified, the default value of 30,000 ms (or 30 seconds) is used.
Each command has the option -h (long option --help), which can be used to display the usage for the command.
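For example, the sketch below shows precursor options placed before the command name, followed by a request for command-specific help; the cluster name, host, timeout value of 60,000 ms (60 seconds) and security root directory path are illustrative:
./cluster-tool.sh -v -t 60000 status -n tc-cluster -s localhost
./cluster-tool.sh -srd /path/to/security-root-directory status -s localhost:9410
./cluster-tool.sh status -h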
The following is a comprehensive explanation of the available commands:
The "configure" Command
The configure command creates a cluster from the otherwise independent Terracotta stripes, taking a license file as input. No functionality is available on the servers until a valid license is installed. See the section Licensing for details.
All servers in any given stripe should be started with the same configuration file. The configure command configures the cluster based on the configuration(s) of the currently known active server(s) only. If there is a configuration mismatch between the active and passive server(s) within the same stripe, this command can still configure the cluster, but it will take down any passive server(s) with configuration mismatches. This validation also happens upon server restart, and configuration differences will prevent the server from starting. See the section on the reconfigure command for more information on how to update server configurations.
The command will fail if any of the following checks do not pass:
1. License checks
a. The license is valid.
b. The provided configuration files do not violate the license.
2. Configuration checks
The provided configuration files are consistent across all the stripes.
The following configuration items are validated in the configuration files:
1. config:
a. offheap-resource
Offheap resources present in one configuration file must be present in all the files with the same sizes.
b. data-directories
Data directory identifiers present in one configuration file must be present in all the files. However, the data directories they map to can differ.
2. service
a. security
Security configuration settings present in one configuration file must match the settings in all the files.
b. backup-restore
If this element is present in one configuration file, it must be present in all the files.
3. failover-priority
The failover priority setting present in one configuration file must match the setting in all the files.
Refer to the section The Terracotta Configuration File for more information on these elements.
The servers section of the configuration files is also validated. Note that it is validated not between stripes, but against the configuration used to start the servers themselves:
server:
a. host - It must be a strict match.
b. name - It must be a strict match.
c. tsa-port - It must be a strict match.
Note: Once a cluster is configured, a similar validation will take place upon server restart. It will cause the server to fail to start if there are differences.
Usage:
configure -n CLUSTER-NAME [-l LICENSE-FILE] TC-CONFIG [TC-CONFIG...]
configure -n CLUSTER-NAME [-l LICENSE-FILE] -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME A name that is to be assigned to the cluster.
-l LICENSE-FILE The path to the license file. If you omit this option, the cluster tool looks for a license file named license.xml in the location tools/cluster-tool/conf under the product installation directory.
TC-CONFIG [TC-CONFIG ...] A whitespace-separated list of configuration files (minimum 1) that describes the stripes to be added to the cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. Any one server from each stripe can be provided; however, multiple servers from the same stripe will work as well. The cluster will be configured with the configurations which were originally used to start the servers.
Note: The command configures the cluster only once. To update the configuration of an already configured cluster, the reconfigure command should be used.
Examples
The example below shows a successful execution for a two-stripe configuration and a valid license.
./cluster-tool.sh configure -l ~/license.xml -n tc-cluster
~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
Configuration successful
License installation successful
Command completed successfully
The example below shows a failed execution because of an invalid license.
./cluster-tool.sh configure -l ~/license.xml
-n tc-cluster ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
Error (BAD_REQUEST): com.terracottatech.LicenseException: Invalid license
The example below shows a failed execution because the two stripe configurations mismatch in their offheap resource sizes.
./cluster-tool.sh configure -n tc-cluster -l
~/license.xml ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
Error (BAD_REQUEST): Mismatched off-heap resources in provided config files:
[[primary-server-resource: 51200M], [primary-server-resource: 25600M]]
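A cluster can also be configured by pointing the cluster tool at running servers with the -s option, in which case the configurations originally used to start those servers are applied. The sketch below assumes one running server per stripe at the illustrative addresses server1:9410 and server2:9410:
./cluster-tool.sh configure -n tc-cluster -l ~/license.xml -s server1:9410 -s server2:9410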
The "reconfigure" Command
The reconfigure command updates the configuration of a cluster which was configured using the configure command. With reconfigure, it is possible to:
1. Update the license on the cluster.
2. Add new offheap resources, or grow existing ones.
3. Add new data directories.
4. Add new configuration element types.
The command will fail if any of the following checks do not pass:
1. License checks
a. The new license is valid.
b. The new configuration files do not violate the license.
2. Stripe checks
a. The new configuration files have all the previously configured servers.
b. The order of the configuration files provided in the reconfigure command is the same as the order of stripes in the previously configured cluster.
3. Configuration checks
a. Stripe consistency checks
The new configuration files are consistent across all the stripes. For the list of configuration items validated in the configuration files, refer to the section The "configure" Command above for details.
b. Offheap checks
The new configuration has all the previously configured offheap resources, and the new sizes are not smaller than the old sizes.
c. Data directories checks
The new configuration has all the previously configured data directory names.
d. Configuration type checks
The new configuration has all the previously configured configuration types.
Usage:
reconfigure -n CLUSTER-NAME TC-CONFIG [TC-CONFIG...]
reconfigure -n CLUSTER-NAME -l LICENSE-FILE -s HOST[:PORT] [-s HOST[:PORT]]...
reconfigure -n CLUSTER-NAME -l LICENSE-FILE TC-CONFIG [TC-CONFIG...]
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
TC-CONFIG [TC-CONFIG ...] A whitespace-separated list of configuration files (minimum 1) that describe the new configurations for the stripes.
-l LICENSE-FILE The path to the new license file.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of servers, each specified using the -s option.
Servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server.
reconfigure command usage scenarios:
1. License update
When it is required to update the license, most likely because the existing license has expired, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME -l LICENSE-FILE -s HOST[:PORT] [-s HOST[:PORT]]...
Note: A license update does not require the servers to be restarted.
2. Configuration update
When it is required to update the cluster configuration, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME TC-CONFIG [TC-CONFIG...]
The steps below should be followed in order:
a. Update the Terracotta configuration files with the new configuration, ensuring that it meets the reconfiguration criteria mentioned above.
b. Run the reconfigure command with the new configuration files.
c. Restart the servers with the new configuration files for the new configuration to take effect.
3. License and configuration update at once
In the rare event that it is desirable to update the license and the cluster configuration in one go, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME -l LICENSE-FILE TC-CONFIG [TC-CONFIG...]
The steps to be followed here are the same as those mentioned in the Configuration update section above.
Examples
The example below shows a successful re-configuration of a two-stripe cluster tc-cluster with new stripe configurations.
./cluster-tool.sh reconfigure -n tc-cluster
~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
License not updated (Reason: Identical to previously installed license)
Configuration successful
Command completed successfully.
The example below shows a failed re-configuration because of a license violation.
./cluster-tool.sh reconfigure -n tc-cluster
-l ~/license.xml -s localhost:9410
Error (BAD_REQUEST): Cluster offheap resource is not within the limit of the license.
Provided: 409600 MB, but license allows: 102400 MB only
The example below shows a failed re-configuration of a two-stripe cluster with new stripe configurations having fewer data directories than the existing configuration.
./cluster-tool.sh reconfigure -n tc-cluster
~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
License not updated (Reason: Identical to previously installed license)
Error (CONFLICT): org.terracotta.exception.EntityConfigurationException:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception: org.terracotta.entity.ConfigurationException:
Mismatched data directories. Provided: [use-for-platform, data],
but previously known: [use-for-platform, data, myData]
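A license-only update (usage scenario 1 above) can be performed by supplying the new license file and any reachable server; the sketch below uses an illustrative license file path and server address:
./cluster-tool.sh reconfigure -n tc-cluster -l ~/new-license.xml -s localhost:9410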
The "status" Command
The status command displays the status of a cluster, or of particular server(s) in the same or different clusters.
Usage:
status -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
status -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a cluster-level status command.
./cluster-tool.sh status -n tc-cluster -s localhost
Cluster name: tc-cluster
Stripes in the cluster: 2
Servers in the cluster: 4
Server{name='server-1', host='localhost', port=9410},
Server{name='server-2', host='localhost', port=9610} (stripe 1)
Server{name='server-3', host='localhost', port=9710},
Server{name='server-4', host='localhost', port=9910} (stripe 2)
Total configured offheap: 102400M
Backup configured: true
SSL/TLS configured: false
IP whitelist configured: false
Data directories configured: data, myData
| STRIPE: 1 |
+--------------------+----------------------+--------------------------+
| Server Name | Host:Port | Status |
+--------------------+----------------------+--------------------------+
| server-1 | localhost:9410 | ACTIVE |
| server-2 | localhost:9610 | PASSIVE |
+--------------------+----------------------+--------------------------+
| STRIPE: 2 |
+--------------------+----------------------+--------------------------+
| Server Name | Host:Port | Status |
+--------------------+----------------------+--------------------------+
| server-3 | localhost:9710 | ACTIVE |
| server-4 | localhost:9910 | PASSIVE |
+--------------------+----------------------+--------------------------+
The example below shows the execution of a server-level status command. No server is running at localhost:9510, hence the UNREACHABLE status.
./cluster-tool.sh status -s localhost:9410 -s localhost:9510 -s localhost:9910
+----------------------+--------------------+--------------------------+
| Host:Port | Status | Member of Cluster |
+----------------------+--------------------+--------------------------+
| localhost:9410 | ACTIVE | tc-cluster |
| localhost:9910 | PASSIVE | tc-cluster |
| localhost:9510 | UNREACHABLE | - |
+----------------------+--------------------+--------------------------+
Error (PARTIAL_FAILURE): Command completed with errors.
Server States
State               | Description
--------------------|-------------------------------------------------------------------------------------------------
STARTING            | server is starting
UNINITIALIZED       | server has started and is ready for election
SYNCHRONIZING       | server is synchronizing its data with the current active server
PASSIVE             | server is passive and ready for replication
ACTIVE              | server is active and ready to accept clients
ACTIVE_RECONNECTING | server is active but waits for previously known clients to rejoin before accepting new clients
START_SUSPENDED     | server startup is suspended until all of its peers come up
ACTIVE_SUSPENDED    | server is active but blocked in the election process (consistency mode)
PASSIVE_SUSPENDED   | server is passive but blocked in the election process (consistency mode)
UNREACHABLE         | server is unreachable from the cluster tool
UNKNOWN             | server state is unknown
The "promote" command
The promote command can be used to promote a server stuck in a suspended state. For more information about suspended states, refer to the topics Server startup and Manual promotion with override voter in the section Failover Tuning.
Usage:
promote -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-s HOST[:PORT] The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. The command will be individually executed on each server in the list.
Note: There is no cluster-wide equivalent (with the -n option) for this command.
Examples
The example below shows the execution of the promote command on a server stuck in a suspended state at localhost:9510.
./cluster-tool.sh promote -s localhost:9510
Command completed successfully.
The example below shows the erroneous execution of a server-level promote command. The server running at localhost:9510 is not in a suspended state, hence the failure.
./cluster-tool.sh promote -s localhost:9510
localhost:9510: Promote failed as the server at localhost:9510 is not
in any suspended state
Error (FAILURE): Command failed.
The "dump" Command
The dump command dumps the state of a cluster, or particular server(s) in the same or different clusters. The dump of each server can be found in its logs.
Usage:
dump -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
dump -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a cluster-level dump command.
./cluster-tool.sh dump -n tc-cluster -s localhost:9910
Command completed successfully.
The example below shows the execution of a server-level dump command. No server is running at localhost:9510, hence the dump failure.
./cluster-tool.sh dump -s localhost:9410 -s localhost:9510 -s localhost:9910
Dump successful for server at: localhost:9410
Connection refused from server at: localhost:9510
Dump successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "stop" Command
The stop command stops the cluster, or particular server(s) in the same or different clusters.
Usage:
stop -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
stop -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be contacted for connectivity, and the command will be executed on all reachable servers. An attempt will be made to shut down the entire cluster in the correct sequence by shutting down all the passive servers first followed by the active servers. The stop command with the -n option is similar to the shutdown command with the --force option.
NOTE: This command is deprecated in favor of the shutdown command. Refer to the description of the shutdown command for more details.
Examples
The example below shows the execution of a cluster-level stop command.
./cluster-tool.sh stop -n tc-cluster -s localhost
Command completed successfully.
The example below shows the execution of a server-level stop command. No server is running at localhost:9510, hence the stop failure.
./cluster-tool.sh stop -s localhost:9410 -s localhost:9510 -s localhost:9910
Stop successful for server at: localhost:9410
Connection refused from server at: localhost:9510
Stop successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "ipwhitelist-reload" Command
The ipwhitelist-reload command reloads the IP whitelist on a cluster, or particular server(s) in the same or different clusters. See the section IP Whitelisting for details.
Usage:
ipwhitelist-reload -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
ipwhitelist-reload -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a cluster-level ipwhitelist-reload command.
./cluster-tool.sh ipwhitelist-reload -n tc-cluster -s localhost
IP whitelist reload successful for server at: localhost:9410
IP whitelist reload successful for server at: localhost:9610
IP whitelist reload successful for server at: localhost:9710
IP whitelist reload successful for server at: localhost:9910
Command completed successfully.
The example below shows the execution of a server-level ipwhitelist-reload command. No server is running at localhost:9510, hence the IP whitelist reload failure.
./cluster-tool.sh ipwhitelist-reload -s localhost:9410
-s localhost:9510 -s localhost:9910
IP whitelist reload successful for server at: localhost:9410
Connection refused from server at: localhost:9510
IP whitelist reload successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "backup" Command
The backup command takes a backup of the running Terracotta cluster.
Usage:
backup -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a successful cluster-level backup command. Note that the server at localhost:9610 was unreachable.
./cluster-tool.sh backup -n tc-cluster -s localhost:9610 -s localhost:9410
PHASE 0: SETTING BACKUP NAME TO : 996e7e7a-5c67-49d0-905e-645365c5fe28
localhost:9610: TIMEOUT
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: SUCCESS
PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9610: TIMEOUT
localhost:9910: NOOP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9710: SUCCESS
localhost:9410: SUCCESS
PHASE (3/4): START_BACKUP
localhost:9710: SUCCESS
localhost:9410: SUCCESS
PHASE (4/4): EXIT_ONLINE_BACKUP_MODE
localhost:9710: SUCCESS
localhost:9410: SUCCESS
Command completed successfully.
The example below shows the execution of a failed cluster-level backup command.
./cluster-tool.sh backup -n tc-cluster -s localhost:9610
PHASE 0: SETTING BACKUP NAME TO : 93cdb93d-ad7c-42aa-9479-6efbdd452302
localhost:9610: SUCCESS
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: SUCCESS
PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9610: NOOP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: NOOP
PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9410: BACKUP_FAILURE
localhost:9710: SUCCESS
PHASE (CLEANUP): ABORT_BACKUP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
Backup failed as some servers '[Server{name='server-1', host='localhost', port=9410},
[Server{name='server-2', host='localhost', port=9710}]]',
failed to enter online backup mode.
The "shutdown" Command
The shutdown command shuts down a running Terracotta cluster. During the course of the shutdown process, it ensures that:
Shutdown safety checks are performed on all the servers. Exactly which safety checks are performed depends on the specified options and is explained in detail later in this section.
All data is persisted to eliminate data loss.
All passive servers are shut down first before shutting down the active servers.
The shutdown command follows a multi-phase process as follows:
1. Check with all servers whether they are OK to shut down. Whether or not a server is OK to shut down depends on the specified shutdown options and the state of the server in question.
2. If all servers agree to the shutdown request, all of them will be asked to prepare for the shutdown. Preparing for shutdown may include the following:
a. Persist all data.
b. Block new incoming requests. This ensures that the persisted data will be cluster-wide consistent after shutdown.
3. If all servers successfully prepare for the shutdown, a shutdown call will be issued to all the servers.
The first two steps above make the shutdown as atomic as possible, since the system can be rolled back to its original state if there are any errors. In such cases, client-request processing will resume as usual after any blocked servers are unblocked.
In the unlikely event of a failure in the third step above, the error message will clearly specify the servers that failed to shut down. In this case, use the --force option to forcefully terminate the remaining servers. If there is a network connectivity issue, the forceful shutdown may fail, and the remaining servers will have to be terminated using operating system commands.
Note: The shutdown sequence also ensures that the data is stripe-wide consistent. However, it is recommended that clients are shut down before attempting to shut down the Terracotta cluster.
Usage:
shutdown -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-f | --force Forcefully shut down the cluster, even if the cluster is only partially reachable.
-i | --immediate Do an immediate shutdown of the cluster, even if clients are connected.
-s HOST[:PORT] [-s HOST[:PORT]]… The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
If the -n option is not specified, this command forcefully shuts down only the servers specified in the list. For clusters having stripes configured for high availability (with at least one passive server per stripe), it is recommended that you use the partial cluster shutdown commands explained in the section below, as they allow conditional shutdown, instead of using the shutdown variant without the -n option.
If the -n option is specified (i.e. a full cluster shutdown), this command shuts down the entire cluster. Servers in the provided list will be contacted for connectivity, and the command will then verify the cluster configuration with the given cluster name by obtaining the cluster configuration from the first reachable server. If all servers are reachable, this command checks if all servers in all the stripes are safe to shut down before proceeding with the command.
A cluster is considered to be safe to shut down provided the following are true:
No critical operations such as backup and restore are going on.
No Ehcache or TCStore clients are connected.
All servers in all the stripes are reachable.
If either the -f or the -i option is specified, this command behaves differently from the above, as follows:
If the -i option is specified, this command proceeds with the shutdown even if clients are connected.
If the -f option is specified, this command proceeds with the shutdown even if none of the conditions specified above for a safe shutdown are met.
For all cases, the shutdown sequence is performed as follows:
1. Flush all data to persistent store for datasets or caches that have persistence configured.
2. Shut down all the passive servers, if any, in the cluster for all stripes.
3. Once the passive servers are shut down, issue a shutdown request to all the active servers in the cluster.
The above shutdown sequence is the cleanest way to shut down a cluster.
Examples
The example below shows the execution of a successful cluster-level shutdown command.
./cluster-tool.sh shutdown -n primary -s localhost:9610 -s localhost:9410
Shutting down cluster: primary
STEP (1/3): Preparing to shutdown
STEP (2/3): Stopping all passives first
STEP (3/3): Stopping all actives
Command completed successfully.
The example below shows the execution of a cluster-level shutdown command that fails because one of the servers in the cluster was not reachable.
./cluster-tool.sh shutdown -n primary -s localhost:11104
Error (FAILURE): Shutdown invocation timed out
Detailed Error Status for Cluster `primary` :
ServerError{host='localhost:25493', Error='Shutdown invocation timed out'}.
Unable to process safe shutdown request.
Command failed..
The example below shows the execution of a successful cluster-level shutdown command with the force option. Note that one of the servers in the cluster was already down.
./cluster-tool.sh shutdown -f -n primary -s localhost:11104
Shutting down cluster: primary
STEP (1/3): Preparing to shutdown
Shutdown invocation timed out
Detailed Error Status :
ServerError{host='localhost:25493', Error='Shutdown invocation timed out'}.
Continuing Forced Shutdown.
STEP (2/3): Stopping all passives first
STEP (3/3): Stopping all actives
Command completed successfully.
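An immediate shutdown, which proceeds even if clients are still connected, can be requested with the -i option; the sketch below uses an illustrative cluster name and server address:
./cluster-tool.sh shutdown -i -n tc-cluster -s localhost:9410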
Partial Cluster Shutdown Commands
Partial cluster shutdown commands can be used to partially shut down nodes in the cluster without sacrificing the availability of the cluster. These commands can be used only on a cluster that is configured for redundancy with one or more passive servers per stripe. The purpose of these commands is to allow administrators to perform routine and planned administrative tasks, such as rolling upgrades, with high availability.
The following flavors of partial cluster shutdown commands are available:
shutdown-if-passive
shutdown-if-active
shutdown-all-passives
shutdown-all-actives
As a general rule, if these commands are successful, the specified servers will be shut down. If there are any errors due to which these commands abort, the state of the servers will be left intact.
From the table of server states described in The "status" Command, the following are the different active states that a server may find itself in:
ACTIVE
ACTIVE_RECONNECTING
ACTIVE_SUSPENDED
Note: In the following sections, the term 'active servers' means servers in any of the active states mentioned above, unless explicitly stated otherwise.
Similarly, the following are the passive states for a server:
PASSIVE_SUSPENDED
SYNCHRONIZING
PASSIVE
Note: In the following sections, the term 'passive servers' means servers in any of the passive states mentioned above, unless explicitly stated otherwise.
The "shutdown-if-passive" Command
The shutdown-if-passive command shuts down the specified servers in the cluster, provided the following conditions are met:
All the stripes in the cluster are functional and there is one healthy active server with no suspended active servers per stripe.
All the servers specified in the list are passive servers.
Usage:
shutdown-if-passive -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
Examples
The example below shows the execution of a successful shutdown-if-passive command.
./cluster-tool.sh shutdown-if-passive -s localhost:23006
Stopping Passive Server(s): `[Server{name='testServer1', host='localhost',
port=23006}]` of Cluster: primary
STEP (1/2): Preparing to shutdown
STEP (2/2): Stopping if Passive
Command completed successfully.
The example below shows the execution of a failed shutdown-if-passive command, as it tried to shut down a server which is not a passive server.
./cluster-tool.sh shutdown-if-passive -s localhost:23004
Error (FAILURE): Unable to process the partial shutdown request.
One or more of the specified server(s) are not in passive state or
may not be in the same cluster
Discovered state of all servers are as follows:
Reachable Servers : 5
Stripe #: 1
Server : {Server{name='testServer1', host='localhost', port=23006}}
State : PASSIVE
Server : {Server{name='testServer0', host='localhost', port=23004}}
State : ACTIVE
Stripe #: 2
Server : {Server{name='testServer101', host='localhost', port=2537}}
State : ACTIVE
Server : {Server{name='testServer100', host='localhost', port=2535}}
State : PASSIVE
Server : {Server{name='testServer102', host='localhost', port=2539}}
State : PASSIVE
Please check server logs for more details.
Command failed.
The "shutdown-if-active" Command
The shutdown-if-active command shuts down the specified servers in the cluster, provided the following conditions are met:
All the servers specified in the list are active servers.
All the stripes corresponding to the given servers have at least one server in 'PASSIVE' state.
Usage:
shutdown-if-active -s HOST[:PORT] [-s
HOST[:PORT]]...
Parameters:
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
Examples
The example below shows the execution of a successful shutdown-if-active command:
./cluster-tool.sh shutdown-if-active -s localhost:23726 -s localhost:26963
Stopping Active Server(s): `[Server{name='testServer0', host='localhost', port=23726}, Server{name='testServer101', host='localhost', port=26963}]` of cluster: primary
STEP (1/2): Preparing For Shutdown
STEP (2/2): Shutdown If Active
Command completed successfully.
The example below shows the execution of a failed shutdown-if-active command, as the specified server was not an active server.
./cluster-tool.sh shutdown-if-active -s localhost:23726 -s localhost:23730
Error (FAILURE): Unable to process the partial shutdown request.
One or more of the specified server(s) are not in active state or
may not be in the same cluster.
Reachable Servers : 5
Stripe #: 1
Server : {Server{name='testServer2', host='localhost', port=23730}}
State : PASSIVE
Server : {Server{name='testServer0', host='localhost', port=23726}}
State : ACTIVE
Stripe #: 2
Server : {Server{name='testServer100', host='localhost', port=26961}}
State : PASSIVE
Server : {Server{name='testServer101', host='localhost', port=26963}}
State : ACTIVE
Server : {Server{name='testServer102', host='localhost', port=26965}}
State : PASSIVE
Please check server logs for more details
Command failed.
The "shutdown-all-passives" Command
The shutdown-all-passives command shuts down all the passive servers in the specified cluster, provided the following is true:
All the stripes in the cluster are functional and there is one active server in 'ACTIVE' state with no suspended active servers per stripe.
All passive servers in all the stripes of the cluster will be shut down when this command is run.
Usage:
shutdown-all-passives -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. These host(s) need not be passive servers.
The -n option is mandatory, as this command can be used only to shut down all the passive servers in the entire cluster. Servers in the provided list will be contacted for connectivity, and the command will then verify the cluster configuration with the given cluster name by obtaining the cluster configuration from the first reachable server.
After the necessary verifications, it proceeds to shut down all the passive servers in a multi-phase manner as explained below:
1. Check with all servers whether it is safe for them to shut down as passive servers.
2. Flush any data that needs to be made persistent across all servers that are going down and block any further changes.
3. Issue a shutdown request to all passive servers if all passive servers succeed in step 2.
4. If any servers fail in step 2 or above, the shutdown request will fail and the state of the servers will remain intact.
Examples
The example below shows the execution of a successful shutdown-all-passives command.
./cluster-tool.sh shutdown-all-passives -n primary -s localhost:5252
Stopping Passive Server(s): `[Server{name='testServer0', host='localhost',
port=5252},
Server{name='testServer100', host='localhost', port=15361},
Server{name='testServer102', host='localhost', port=15365}]`
of Cluster: primary
STEP (1/2): Preparing to shutdown
STEP (2/2): Stopping if Passive
Command completed successfully.
The "shutdown-all-actives" Command
The shutdown-all-actives command shuts down the active server of all stripes in the cluster, provided the following are true:
There are no suspended active servers in the cluster.
There is at least one passive server in 'PASSIVE' state in every stripe in the cluster.
The active server of all stripes of the cluster will be shut down when this command returns success. If the command reports an error, the state of the servers will be left intact.
Usage:
shutdown-all-actives -n CLUSTER-NAME -s
HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. These host(s) need not be active servers.
The -n option is mandatory, as this command can be used only to shut down all the active servers in the entire cluster. Servers in the provided list will be contacted for connectivity, and the command will then verify the cluster configuration with the given cluster name by obtaining the cluster configuration from the first reachable server.
After the necessary verifications, it proceeds to shut down all the active servers in a multi-phase manner as explained below:
1. Check with all servers whether they are safe to be shut down as active servers.
2. Flush any data that needs to be made persistent across all servers that are going down and block any further changes.
3. Issue a shutdown request to all active servers if they succeed in step 2.
4. If any servers fail in step 2 or above, the shutdown request will fail and the state of the servers will remain as before.
Examples
The example below shows the execution of a successful shutdown-all-actives command. Note that the specified host was a passive server in this example. As the specified host is used only to connect to the cluster and obtain the correct state of all the servers in the cluster, the command successfully shuts down all the active servers in the cluster, leaving the passive servers intact.
./cluster-tool.bat shutdown-all-actives -n primary -s localhost:31445
Stopping Active Server(s): `[Server{name='testServer2', host='localhost',
port=31449},
Server{name='testServer100', host='localhost', port=27579}]`
of cluster: primary
STEP (1/2): Preparing For Shutdown
STEP (2/2): Shutdown If Active
Command completed successfully.