Cluster Tool
The cluster tool is a command-line utility that allows administrators of the Terracotta Server Array to perform a variety of cluster management tasks. For example, the cluster tool can be used to:
*Configure or re-configure a cluster
*Obtain the status and configuration information of running servers
*Dump the state of running servers
*Take backups of running servers
*Promote a suspended server on startup or failover
*Shut down an entire cluster
*Perform a conditional partial shutdown of a cluster that has one or more passive servers configured for high availability (e.g. for upgrades)
The cluster tool script is located in tools/cluster-tool/bin under the product installation directory as cluster-tool.bat for Windows platforms, and as cluster-tool.sh for Unix/Linux.
Usage Flow
The following is a typical flow for cluster setup and usage:
1. Create Terracotta configuration files for each stripe in the deployment. See the section The Terracotta Configuration File for details.
2. Start up the servers in each stripe. See the section Starting and Stopping the Terracotta Server for details.
3. Make sure the stripes are online and ready.
4. Configure the cluster using the configure command of the cluster tool. See the section The "configure" Command below for details.
5. Check the current status of the cluster or specific servers in the cluster using the status command. See the section The "status" Command below for details.
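As an illustration, steps 4 and 5 above might look as follows for a hypothetical two-stripe deployment; the cluster name, license path, configuration file names and host:port are placeholder values:

./cluster-tool.sh configure -n tc-cluster -l ~/license.xml ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
./cluster-tool.sh status -n tc-cluster -s localhost:9410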
Cluster Tool commands
The cluster tool provides several commands. To list them and their respective options, run cluster-tool.sh (or cluster-tool.bat on Windows) without any arguments, or use the option -h (long option --help).
The following section lists the options that are common to all commands; these must therefore be specified before the command name:
Precursor options
1. -v (long option --verbose)
This option gives you a verbose output, and is useful to debug error conditions.
2. -srd (long option --security-root-directory)
This option can be used to communicate with a server which has TLS/SSL-based security configured. For more details on setting up security in a Terracotta cluster, see the section Security Core Concepts.
Note: If this option is not specified while trying to connect to a secure cluster, the command will fail with a SECURITY_CONFLICT error.
3. -t (long option --timeout)
This option lets you specify a custom timeout value (in milliseconds) for connections to be established in cluster tool commands.
Note: If this option is not specified, the default value of 30,000 ms (or 30 seconds) is used.
Each command has the option -h (long option --help), which can be used to display the usage for the command.
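As an illustration, the precursor options are combined with a command as follows; here a verbose status query is made against a TLS/SSL-secured cluster with a 10-second connection timeout, and the security root directory path and host:port are placeholder values:

./cluster-tool.sh -v -srd /path/to/security-root-directory -t 10000 status -s localhost:9410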
The following is a comprehensive explanation of the available commands:
The "configure" Command
The configure command creates a cluster from the otherwise independent Terracotta stripes, taking as input a mandatory license key. No functionality is available on the server until a valid license is installed. See the section Licensing for details.
All servers in any given stripe should be started with the same configuration file. The configure command configures the cluster based on the configuration(s) of the currently known active server(s) only. If there is a configuration mismatch between the active and passive server(s) within the same stripe, this command can still configure the cluster, but it will take down any passive server(s) whose configuration does not match. The same validation also happens upon server restart, and a mismatch will prevent the affected server from starting. See the section on the reconfigure command for more information on how to update server configurations.
The command will fail if any of the following checks do not pass:
1. License checks
a. The license is valid.
b. The provided configuration files do not violate the license.
2. Configuration checks
*The provided configuration files are consistent across all the stripes.
The following configuration items are validated in the configuration files:
1. config:
a. offheap-resource
Offheap resources present in one configuration file must be present in all the files with the same sizes.
b. data-directories
Data directory identifiers present in one configuration file must be present in all the files. However, the data directories they map to can differ.
2. service
a. security
Security configuration settings present in one configuration file must match the settings in all the files.
For more details on setting up security in a Terracotta cluster, see the section Security Core Concepts.
b. backup-restore
If this element is present in one configuration file, it must be present in all the files.
3. failover-priority
The failover priority setting present in one configuration file must match the setting in all the files.
Refer to the section The Terracotta Configuration File for more information on these elements.
The servers section of the configuration files is also validated. Note that it is not validated between stripes but rather against the configuration used to start the servers themselves.
*server
*host
It must be a strict match
*name
It must be a strict match
*tsa-port
It must be a strict match
Note: Once a cluster is configured, a similar validation will take place upon server restart. It will cause the server to fail to start if there are differences.
Usage:

configure -n CLUSTER-NAME [-l LICENSE-FILE] TC-CONFIG [TC-CONFIG...]
configure -n CLUSTER-NAME [-l LICENSE-FILE] -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
A name that is to be assigned to the cluster.
*-l LICENSE-FILE
The path to the license file. If you omit this option, the cluster tool looks for a license file named license.xml in the location tools/cluster-tool/conf under the product installation directory.
*TC-CONFIG [TC-CONFIG ...]
A whitespace-separated list of configuration files (minimum 1) that describes the stripes to be added to the cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. Any one server from each stripe can be provided. However, multiple servers from the same stripe will work as well. The cluster will be configured with the configurations which were originally used to start the servers.
Note: The command configures the cluster only once. To update the configuration of an already configured cluster, the reconfigure command should be used.
Examples
*The example below shows a successful execution for a two stripe configuration and a valid license.
./cluster-tool.sh configure -l ~/license.xml -n tc-cluster
~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
Configuration successful
License installation successful

Command completed successfully
*The example below shows a failed execution because of an invalid license.
./cluster-tool.sh configure -l ~/license.xml
-n tc-cluster ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml

Error (BAD_REQUEST): com.terracottatech.LicenseException: Invalid license
*The example below shows a failed execution with two stripe configurations mis-matching in their offheap resource sizes.
./cluster-tool.sh configure -n tc-cluster -l ~/license.xml
~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml

Error (BAD_REQUEST): Mismatched off-heap resources in provided config files:
[[primary-server-resource: 51200M], [primary-server-resource: 25600M]]
The "reconfigure" Command
The reconfigure command updates the configuration of a cluster which was configured using the configure command. With reconfigure, it is possible to:
1. Update the license on the cluster.
2. Add new offheap resources, or grow existing ones.
3. Add new data directories.
4. Add new configuration element types.
The command will fail if any of the following checks do not pass:
1. License checks
a. The new license is valid.
b. The new configuration files do not violate the license.
2. Stripe checks
a. The new configuration files have all the previously configured servers.
b. The order of the configuration files provided in the reconfigure command is the same as the order of stripes in the previously configured cluster.
3. Configuration checks
a. Stripe consistency checks
The new configuration files are consistent across all the stripes. For the list of configuration items validated in the configuration files, refer to the section The "configure" Command above for details.
b. Offheap checks
The new configuration has all the previously configured offheap resources, and the new sizes are not smaller than the old sizes.
c. Data directories checks
The new configuration has all the previously configured data directory names.
d. Configuration type checks
The new configuration has all the previously configured configuration types.
Usage:

reconfigure -n CLUSTER-NAME TC-CONFIG [TC-CONFIG...]
reconfigure -n CLUSTER-NAME -l LICENSE-FILE -s HOST[:PORT] [-s HOST[:PORT]]...
reconfigure -n CLUSTER-NAME -l LICENSE-FILE TC-CONFIG [TC-CONFIG...]
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*TC-CONFIG [TC-CONFIG ...]
A whitespace-separated list of configuration files (minimum 1) that describe the new configurations for the stripes.
*-l LICENSE-FILE
The path to the new license file.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of servers, each specified using the -s option.
Servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server.
reconfigure command usage scenarios:
1. License update
When the license needs to be updated, most likely because the existing license has expired, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME -l LICENSE-FILE -s HOST[:PORT] [-s HOST[:PORT]]...
Note: A license update does not require the servers to be restarted.
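As an illustration, a license-only update on a running cluster might look as follows; the cluster name, license path and host:port are placeholder values:

./cluster-tool.sh reconfigure -n tc-cluster -l ~/renewed-license.xml -s localhost:9410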
2. Configuration update
When the cluster configuration needs to be updated, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME TC-CONFIG [TC-CONFIG...]
The steps below should be followed in order:
a. Update the Terracotta configuration files with the new configuration, ensuring that it meets the reconfiguration criteria mentioned above.
b. Run the reconfigure command with the new configuration files.
c. Restart the servers with the new configuration files for the new configuration to take effect.
3. License and configuration update at once
In the rare event that it is desirable to update the license and the cluster configuration in one go, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME -l LICENSE-FILE TC-CONFIG [TC-CONFIG...]
The steps to be followed here are the same as those mentioned in the Configuration update section above.
Examples
*The example below shows a successful re-configuration of a two stripe cluster tc-cluster with new stripe configurations.
./cluster-tool.sh reconfigure -n tc-cluster
~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
License not updated (Reason: Identical to previously installed license)
Configuration successful

Command completed successfully.
*The example below shows a failed re-configuration because of a license violation.
./cluster-tool.sh reconfigure -n tc-cluster
-l ~/license.xml -s localhost:9410

Error (BAD_REQUEST): Cluster offheap resource is not within the limit of the license.
Provided: 409600 MB, but license allows: 102400 MB only
*The example below shows a failed re-configuration of a two stripe cluster with new stripe configurations having fewer data directories than the existing configuration.
./cluster-tool.sh reconfigure -n tc-cluster
~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml

License not updated (Reason: Identical to previously installed license)
Error (CONFLICT): org.terracotta.exception.EntityConfigurationException:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception: org.terracotta.entity.ConfigurationException:
Mismatched data directories. Provided: [use-for-platform, data],
but previously known: [use-for-platform, data, myData]
The "status" Command
The status command displays the status of a cluster, or of particular server(s) in the same or different clusters.
Usage:
status -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
status -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
*The example below shows the execution of a cluster-level status command.
./cluster-tool.sh status -n tc-cluster -s localhost
Cluster name: tc-cluster
Stripes in the cluster: 2
Servers in the cluster: 4
Server{name='server-1', host='localhost', port=9410},
Server{name='server-2', host='localhost', port=9610} (stripe 1)
Server{name='server-3', host='localhost', port=9710},
Server{name='server-4', host='localhost', port=9910} (stripe 2)
Total configured offheap: 102400M
Backup configured: true
SSL/TLS configured: false
IP whitelist configured: false
Data directories configured: data, myData

| STRIPE: 1 |
+--------------------+----------------------+--------------------------+
| Server Name        | Host:Port            | Status                   |
+--------------------+----------------------+--------------------------+
| server-1           | localhost:9410       | ACTIVE                   |
| server-2           | localhost:9610       | PASSIVE                  |
+--------------------+----------------------+--------------------------+

| STRIPE: 2 |
+--------------------+----------------------+--------------------------+
| Server Name        | Host:Port            | Status                   |
+--------------------+----------------------+--------------------------+
| server-3           | localhost:9710       | ACTIVE                   |
| server-4           | localhost:9910       | PASSIVE                  |
+--------------------+----------------------+--------------------------+
*The example below shows the execution of a server-level status command. No server is running at localhost:9510, hence the UNREACHABLE status.
./cluster-tool.sh status -s localhost:9410 -s localhost:9510 -s localhost:9910
+----------------------+--------------------+--------------------------+
| Host:Port            | Status             | Member of Cluster        |
+----------------------+--------------------+--------------------------+
| localhost:9410       | ACTIVE             | tc-cluster               |
| localhost:9910       | PASSIVE            | tc-cluster               |
| localhost:9510       | UNREACHABLE        | -                        |
+----------------------+--------------------+--------------------------+

Error (PARTIAL_FAILURE): Command completed with errors.
Server States
STARTING - server is starting
UNINITIALIZED - server has started and is ready for election
SYNCHRONIZING - server is synchronizing its data with the current active server
PASSIVE - server is passive and ready for replication
ACTIVE - server is active and ready to accept clients
ACTIVE_RECONNECTING - server is active but waiting for previously known clients to rejoin before accepting new clients
START_SUSPENDED - server startup is suspended until all of its peers come up
ACTIVE_SUSPENDED - server is active but blocked in the election process (consistency mode)
PASSIVE_SUSPENDED - server is passive but blocked in the election process (consistency mode)
UNREACHABLE - server is unreachable from the cluster tool
UNKNOWN - server state is unknown
The "promote" command
The promote command can be used to promote a server stuck in a suspended state. For more information about suspended states, refer to the topics Server startup and Manual promotion with override voter in the section Failover Tuning.
Usage:
promote -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. The command will be individually executed on each server in the list.
Note: There is no cluster-wide equivalent (with the -n option) for this command.
Examples
*The example below shows the execution of the promote command on a server stuck in suspended state at localhost:9510.
./cluster-tool.sh promote -s localhost:9510
Command completed successfully.
*The example below shows the erroneous execution of a server-level promote command. The server running at localhost:9510 is not in a suspended state to be promoted, hence the failure.
./cluster-tool.sh promote -s localhost:9510
localhost:9510: Promote failed as the server at localhost:9510 is not
in any suspended state
Error (FAILURE): Command failed.
The "dump" Command
The dump command dumps the state of a cluster, or particular server(s) in the same or different clusters. The dump of each server can be found in its logs.
Usage:
dump -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
dump -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
*The example below shows the execution of a cluster-level dump command.
./cluster-tool.sh dump -n tc-cluster -s localhost:9910
Command completed successfully.
*The example below shows the execution of a server-level dump command. No server is running at localhost:9510, hence the dump failure.
./cluster-tool.sh dump -s localhost:9410 -s localhost:9510 -s localhost:9910
Dump successful for server at: localhost:9410
Connection refused from server at: localhost:9510
Dump successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "stop" Command
The stop command stops the cluster, or particular server(s) in the same or different clusters.
Usage:
stop -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
stop -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be contacted for connectivity, and the command will be executed on all reachable servers. An attempt will be made to shut down the entire cluster in the correct sequence by shutting down all the passive servers first followed by the active servers. The stop command with the -n option is similar to the shutdown command with the --force option.
NOTE: This command is deprecated in favor of the shutdown command. Refer to the description of the shutdown command for more details.
Examples
*The example below shows the execution of a cluster-level stop command.
./cluster-tool.sh stop -n tc-cluster -s localhost
Command completed successfully.
*The example below shows the execution of a server-level stop command. No server is running at localhost:9510, hence the stop failure.
./cluster-tool.sh stop -s localhost:9410 -s localhost:9510 -s localhost:9910
Stop successful for server at: localhost:9410
Connection refused from server at: localhost:9510
Stop successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "ipwhitelist-reload" Command
The ipwhitelist-reload command reloads the IP whitelist on a cluster, or particular server(s) in the same or different clusters. See the section IP Whitelisting for details.
Usage:
ipwhitelist-reload -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
ipwhitelist-reload -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
*The example below shows the execution of a cluster-level ipwhitelist-reload command.
./cluster-tool.sh ipwhitelist-reload -n tc-cluster -s localhost
IP whitelist reload successful for server at: localhost:9410
IP whitelist reload successful for server at: localhost:9610
IP whitelist reload successful for server at: localhost:9710
IP whitelist reload successful for server at: localhost:9910
Command completed successfully.
*The example below shows the execution of a server-level ipwhitelist-reload command. No server is running at localhost:9510, hence the IP whitelist reload failure.
./cluster-tool.sh ipwhitelist-reload -s localhost:9410
-s localhost:9510 -s localhost:9910
IP whitelist reload successful for server at: localhost:9410
Connection refused from server at: localhost:9510
IP whitelist reload successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "backup" Command
The backup command takes a backup of the running Terracotta cluster.
Usage:
backup -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
*The example below shows the execution of a cluster-level successful backup command. Note that the server at localhost:9610 was unreachable.
./cluster-tool.sh backup -n tc-cluster -s localhost:9610 -s localhost:9410

PHASE 0: SETTING BACKUP NAME TO : 996e7e7a-5c67-49d0-905e-645365c5fe28
localhost:9610: TIMEOUT
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: SUCCESS

PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9610: TIMEOUT
localhost:9910: NOOP
localhost:9410: SUCCESS
localhost:9710: SUCCESS

PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9710: SUCCESS
localhost:9410: SUCCESS

PHASE (3/4): START_BACKUP
localhost:9710: SUCCESS
localhost:9410: SUCCESS

PHASE (4/4): EXIT_ONLINE_BACKUP_MODE
localhost:9710: SUCCESS
localhost:9410: SUCCESS
Command completed successfully.
*The example below shows the execution of a cluster-level failed backup command.
./cluster-tool.sh backup -n tc-cluster -s localhost:9610
PHASE 0: SETTING BACKUP NAME TO : 93cdb93d-ad7c-42aa-9479-6efbdd452302
localhost:9610: SUCCESS
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: SUCCESS

PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9610: NOOP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: NOOP

PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9410: BACKUP_FAILURE
localhost:9710: SUCCESS

PHASE (CLEANUP): ABORT_BACKUP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
Backup failed as some servers '[Server{name='server-1', host='localhost', port=9410},
[Server{name='server-2', host='localhost', port=9710}]]',
failed to enter online backup mode.
The "shutdown" Command
The shutdown command shuts down a running Terracotta cluster. During the course of the shutdown process, it ensures that:
*Shutdown safety checks are performed on all the servers. Exactly what safety checks are performed will depend on the specified options and is explained in detail later in this section.
*All data is persisted to eliminate data loss.
*All passive servers are shut down first before shutting down the active servers.
The shutdown command follows a multi-phase process as follows:
1. Check with all servers whether they are OK to shut down. Whether or not a server is OK to shut down will depend on the specified shutdown options and the state of the server in question.
2. If all servers agree to the shutdown request, all of them will be asked to prepare for the shutdown. Preparing for shutdown may include the following:
a. Persist all data.
b. Block new incoming requests. This ensures that the persisted data will be cluster-wide consistent after shutdown.
3. If all servers successfully prepare for the shutdown, a shutdown call will be issued to all the servers.
The first two steps above ensure an atomic shutdown to the extent possible as the system can be rolled back to its original state if there are any errors. In such cases, client-request processing will resume as usual after unblocking any blocked servers.
In the unlikely event of a failure in the third step above, the error message will clearly specify the servers that failed to shut down. In this case, use the --force option to forcefully terminate the remaining servers. If there is a network connectivity issue, the forceful shutdown may fail, and the remaining servers will have to be terminated using operating system commands.
Note: The shutdown sequence also ensures that the data is stripe-wide consistent. However, it is recommended that clients be shut down before attempting to shut down the Terracotta cluster.
Usage:
shutdown -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-f | --force
Forcefully shut down the cluster, even if the cluster is only partially reachable.
*-i | --immediate
Do an immediate shutdown of the cluster, even if clients are connected.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
If the -n option is not specified, this command forcefully shuts down only the servers specified in the list. For clusters having stripes configured for high availability (with at least one passive server per stripe), it is recommended that you use the partial cluster shutdown commands explained in the section below, as they allow conditional shutdown, instead of using the shutdown variant without the -n option.
If the -n option is specified (i.e. a full cluster shutdown), this command shuts down the entire cluster. Servers in the provided list will be contacted for connectivity, and the command will then verify the cluster configuration with the given cluster name by obtaining the cluster configuration from the first reachable server. If all servers are reachable, this command checks if all servers in all the stripes are safe to shut down before proceeding with the command.
A cluster is considered to be safe to shut down provided the following are true:
*No critical operations such as backup and restore are going on.
*No Ehcache or TCStore clients are connected.
*All servers in all the stripes are reachable.
If either the -f or -i option is specified, this command behaves differently from the above, as follows:
*If the -i option is specified, this command proceeds with the shutdown even if clients are connected.
*If the -f option is specified, this command proceeds with the shutdown even if none of the conditions specified for safe shutdown above are met.
For all cases, the shutdown sequence is performed as follows:
1. Flush all data to persistent store for datasets or caches that have persistence configured.
2. Shut down all the passive servers, if any, in the cluster for all stripes.
3. Once the passive servers are shut down, issue a shutdown request to all the active servers in the cluster.
The above shutdown sequence is the cleanest way to shut down a cluster.
Examples
*The example below shows the execution of a cluster-level successful shutdown command.
./cluster-tool.sh shutdown -n primary -s localhost:9610 -s localhost:9410

Shutting down cluster: primary
STEP (1/3): Preparing to shutdown
STEP (2/3): Stopping all passives first
STEP (3/3): Stopping all actives
Command completed successfully.
*The example below shows the execution of a cluster-level shutdown command that fails because one of the servers in the cluster was not reachable.
./cluster-tool.sh shutdown -n primary -s localhost:11104

Error (FAILURE): Shutdown invocation timed out
Detailed Error Status for Cluster `primary` :
ServerError{host='localhost:25493', Error='Shutdown invocation timed out'}.
Unable to process safe shutdown request.
Command failed.
*The example below shows the successful execution of a cluster-level shutdown command with the force option. Note that one of the servers in the cluster was already down.
./cluster-tool.sh shutdown -f -n primary -s localhost:11104

Shutting down cluster: primary
STEP (1/3): Preparing to shutdown
Shutdown invocation timed out
Detailed Error Status :
ServerError{host='localhost:25493', Error='Shutdown invocation timed out'}.
Continuing Forced Shutdown.
STEP (2/3): Stopping all passives first
STEP (3/3): Stopping all actives
Command completed successfully.
Partial Cluster Shutdown Commands
Partial cluster shutdown commands can be used to partially shut down nodes in the cluster without sacrificing the availability of the cluster. These commands can be used only on a cluster that is configured for redundancy with one or more passive servers per stripe. The purpose of these commands is to allow administrators to perform routine and planned administrative tasks, such as rolling upgrades, with high availability.
The following flavors of partial cluster shutdown commands are available:
*shutdown-if-passive
*shutdown-if-active
*shutdown-all-passives
*shutdown-all-actives
As a general rule, if these commands are successful, the specified servers will be shut down. If there are any errors due to which these commands abort, the state of the servers will be left intact.
From the table of server states described in The "status" Command, the following are the different active states that a server may find itself in:
*ACTIVE
*ACTIVE_RECONNECTING
*ACTIVE_SUSPENDED
Note: In the following sections, the term 'active servers' means servers in any of the active states mentioned above, unless explicitly stated otherwise.
Similarly, the following are the passive states for a server:
*PASSIVE_SUSPENDED
*SYNCHRONIZING
*PASSIVE
Note: In the following sections, the term 'passive servers' means servers in any of the passive states mentioned above, unless explicitly stated otherwise.
The "shutdown-if-passive" Command
The shutdown-if-passive command shuts down the specified servers in the cluster, provided the following conditions are met:
*All the stripes in the cluster are functional, and each stripe has one healthy active server with no suspended active servers.
*All the servers specified in the list are passive servers.
Usage:
shutdown-if-passive -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
Examples
*The example below shows the execution of a successful shutdown-if-passive command.
./cluster-tool.sh shutdown-if-passive -s localhost:23006

Stopping Passive Server(s): `[Server{name='testServer1', host='localhost',
port=23006}]` of Cluster: primary
STEP (1/2): Preparing to shutdown
STEP (2/2): Stopping if Passive
Command completed successfully.
*The example below shows the execution of a failed shutdown-if-passive command, as it tried to shut down a server which is not a passive server.
./cluster-tool.sh shutdown-if-passive -s localhost:23004

Error (FAILURE): Unable to process the partial shutdown request.
One or more of the specified server(s) are not in passive state or
may not be in the same cluster
Discovered state of all servers are as follows:
Reachable Servers : 5
Stripe #: 1
Server : {Server{name='testServer1', host='localhost', port=23006}}
State : PASSIVE
Server : {Server{name='testServer0', host='localhost', port=23004}}
State : ACTIVE
Stripe #: 2
Server : {Server{name='testServer101', host='localhost', port=2537}}
State : ACTIVE
Server : {Server{name='testServer100', host='localhost', port=2535}}
State : PASSIVE
Server : {Server{name='testServer102', host='localhost', port=2539}}
State : PASSIVE

Please check server logs for more details.
Command failed.
The "shutdown-if-active" Command
The shutdown-if-active command shuts down the specified servers in the cluster, provided the following conditions are met:
*All the servers specified in the list are active servers.
*All the stripes corresponding to the given servers have at least one server in 'PASSIVE' state.
Usage:
shutdown-if-active -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
Examples
*The example below shows the execution of a successful shutdown-if-active command:
./cluster-tool.sh shutdown-if-active -s localhost:23726 -s localhost:26963

Stopping Active Server(s): `[Server{name='testServer0', host='localhost', port=23726}, Server{name='testServer101', host='localhost', port=26963}]` of cluster: primary
STEP (1/2): Preparing For Shutdown
STEP (2/2): Shutdown If Active
Command completed successfully.
*The example below shows the execution of a failed shutdown-if-active command as the specified server was not an active server.
./cluster-tool.sh shutdown-if-active -s localhost:23726 -s localhost:23730

Error (FAILURE): Unable to process the partial shutdown request.
One or more of the specified server(s) are not in active state or
may not be in the same cluster.
Reachable Servers : 5
Stripe #: 1
Server : {Server{name='testServer2', host='localhost', port=23730}}
State : PASSIVE
Server : {Server{name='testServer0', host='localhost', port=23726}}
State : ACTIVE
Stripe #: 2
Server : {Server{name='testServer100', host='localhost', port=26961}}
State : PASSIVE
Server : {Server{name='testServer101', host='localhost', port=26963}}
State : ACTIVE
Server : {Server{name='testServer102', host='localhost', port=26965}}
State : PASSIVE

Please check server logs for more details
Command failed.
The "shutdown-all-passives" Command
The shutdown-all-passives command shuts down all the passive servers in the specified cluster, provided the following is true:
*All the stripes in the cluster are functional, and each stripe has one active server in 'ACTIVE' state with no suspended active servers.
All passive servers in all the stripes of the cluster will be shut down when this command is run.
Usage:
shutdown-all-passives -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. These host(s) need not be passive servers.
The -n option is mandatory, as this command can be used only to shut down all the passive servers in the entire cluster. Servers in the provided list will be contacted for connectivity, and the command will then verify the cluster configuration with the given cluster name by obtaining the cluster configuration from the first reachable server.
After the necessary verifications, it proceeds to shut down all the passive servers in a multi-phase manner as explained below:
1. Check with all servers whether it is safe to shut down as a passive server.
2. Flush any data that needs to be made persistent across all servers that are going down and block any further changes.
3. Issue a shutdown request to all passive servers if all passive servers succeed in step 2.
4. If any servers fail in step 2 or above, the shutdown request will fail and the state of the servers will remain intact.
Examples
*The example below shows the execution of a successful shutdown-all-passives command.
./cluster-tool.sh shutdown-all-passives -n primary -s localhost:5252

Stopping Passive Server(s): `[Server{name='testServer0', host='localhost',
port=5252},
Server{name='testServer100', host='localhost', port=15361},
Server{name='testServer102', host='localhost', port=15365}]`
of Cluster: primary
STEP (1/2): Preparing to shutdown
STEP (2/2): Stopping if Passive
Command completed successfully.
The "shutdown-all-actives" Command
The shutdown-all-actives command shuts down the active server of all stripes in the cluster, provided the following are true:
*There are no suspended active servers in the cluster.
*There is at least one passive server in 'PASSIVE' state in every stripe in the cluster.
The active server of all stripes of the cluster will be shut down when this command returns success. If the command reports an error, the state of the servers will be left intact.
Usage:
shutdown-all-actives -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
*-n CLUSTER-NAME
The name of the configured cluster.
*-s HOST[:PORT] [-s HOST[:PORT]]...
The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. These host(s) need not be active servers.
The -n option is mandatory, as this command can be used only to shut down all the active servers in the entire cluster. Servers in the provided list will be contacted for connectivity, and the command will then verify the cluster configuration with the given cluster name by obtaining the cluster configuration from the first reachable server.
After the necessary verifications, it proceeds to shut down all the active servers in a multi-phase manner as explained below:
1. Check with all servers whether they are safe to be shut down as active servers.
2. Flush any data that needs to be made persistent across all servers that are going down and block any further changes.
3. Issue a shutdown request to all active servers if they succeed in step 2.
4. If any servers fail in step 2 or above, the shutdown request will fail and the state of the servers will remain as before.
Examples
*The example below shows the execution of a successful shutdown-all-actives command. Note that the specified host was a passive server in this example. As the specified host is used only to connect to the cluster and obtain the correct state of all the servers in the cluster, the command successfully shuts down all the active servers in the cluster, leaving the passive servers intact.
./cluster-tool.bat shutdown-all-actives -n primary -s localhost:31445

Stopping Active Server(s): `[Server{name='testServer2', host='localhost',
port=31449},
Server{name='testServer100', host='localhost', port=27579}]`
of cluster: primary
STEP (1/2): Preparing For Shutdown
STEP (2/2): Shutdown If Active
Command completed successfully.
