Cluster Tool
The cluster tool is a command-line utility that allows administrators of the Terracotta Server Array to perform a variety of cluster management tasks. For example, the cluster tool can be used to:
Configure or re-configure a cluster
Obtain the status and configuration information of running servers
Dump the state of running servers
Stop the running servers
Take backups of running servers
The cluster tool script is located in tools/cluster-tool/bin under the product installation directory as cluster-tool.bat for Windows platforms, and as cluster-tool.sh for Unix/Linux.
Usage Flow
The following is a typical flow in a cluster setup and usage:
3. Make sure the stripes are online and ready.
4. Configure the cluster using the configure command of the cluster tool. See the section The "configure" Command below for details.
5. Check the current status of the cluster or specific servers in the cluster using the status command. See the section The "status" Command below for details.
Cluster Tool commands
The cluster tool provides several commands. To list them and their respective options, run cluster-tool.sh (or cluster-tool.bat on Windows) without any arguments, or use the option -h (long option --help).
The following options are common to all commands and must be specified before the command name:
Precursor options
1. -v (long option --verbose)
This option produces verbose output, which is useful for debugging error conditions.
2. -srd (long option --security-root-directory)
This option can be used to communicate with a server which has TLS/SSL-based security configured. For more details on setting up security in a Terracotta cluster, see the section SSL/TLS Security Configuration in Terracotta.
Note: If this option is not specified while trying to connect to a secure cluster, the command will fail with a SECURITY_CONFLICT error.
3. -t (long option --timeout)
This option lets you specify a custom timeout value (in milliseconds) for connections to be established in cluster tool commands.
Note: If this option is not specified, the default value of 30,000 ms (or 30 seconds) is used.
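To illustrate the ordering rule, the following minimal sketch wraps the tool in a shell function that always places the precursor options before the command name. The function name ct and the variables CLUSTER_TOOL, CT_TIMEOUT_MS and CT_VERBOSE are assumptions made for this example; they are not part of the product.

```shell
# Hypothetical wrapper: ct, CLUSTER_TOOL, CT_TIMEOUT_MS and CT_VERBOSE are
# illustrative names, not part of the product.
CLUSTER_TOOL="${CLUSTER_TOOL:-./cluster-tool.sh}"
CT_TIMEOUT_MS="${CT_TIMEOUT_MS:-30000}"   # same value as the documented default

ct() {
    # Precursor options (-v, -t) go first; the command name and its own
    # options follow in "$@".
    if [ "${CT_VERBOSE:-0}" = "1" ]; then
        "$CLUSTER_TOOL" -v -t "$CT_TIMEOUT_MS" "$@"
    else
        "$CLUSTER_TOOL" -t "$CT_TIMEOUT_MS" "$@"
    fi
}
```

With this in place, ct status -s localhost:9410 would run ./cluster-tool.sh -t 30000 status -s localhost:9410.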
Each command has the option -h (long option --help), which can be used to display the usage for the command.
The following is a comprehensive explanation of the available commands:
The "configure" Command
The configure command creates a cluster from the otherwise independent Terracotta stripes, taking as input a mandatory license key. No functionality is available on the server until a valid license is installed. See the section Licensing for details.
All servers in any given stripe should be started with the same configuration file. The configure command configures the cluster based on the configuration(s) of the currently known active server(s) only. If there is a configuration mismatch between the active and passive server(s) within the same stripe, this command can configure the cluster while taking down any passive server(s) with configuration mismatches. The same validation runs when a server restarts, and a configuration change detected at that point prevents the server from starting. See the section on the reconfigure command for more information on how to update server configurations.
The command will fail if any of the following checks do not pass:
1. License checks
a. The license is valid.
b. The provided configuration files do not violate the license.
2. Configuration checks
The provided configuration files are consistent across all the stripes.
The following configuration items are validated in the configuration files:
1. config:
a. offheap-resource
Offheap resources present in one configuration file must be present in all the files with the same sizes.
b. data-directories
Data directory identifiers present in one configuration file must be present in all the files. However, the data directories they map to can differ.
2. service
a. security
Security configuration settings present in one configuration file must match the settings in all the files.
b. backup-restore
If this element is present in one configuration file, it must be present in all the files.
3. failover-priority
The failover priority setting present in one configuration file must match the setting in all the files.
Refer to the section The Terracotta Configuration File for more information on these elements.
The servers section of the configuration files is also validated. Note that it is not validated between stripes but rather against the configuration used to start the servers themselves.
1. server
a. host
It must be a strict match.
b. name
It must be a strict match.
c. tsa-port
It must be a strict match.
Note: Once a cluster is configured, a similar validation will take place upon server restart. It will cause the server to fail to start if there are differences.
Usage:
configure -n CLUSTER-NAME [-l LICENSE-FILE] TC-CONFIG [TC-CONFIG...]
configure -n CLUSTER-NAME [-l LICENSE-FILE] -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME A name that is to be assigned to the cluster.
-l LICENSE-FILE The path to the license file. If you omit this option, the cluster tool looks for a license file named license.xml in the location tools/cluster-tool/conf under the product installation directory.
TC-CONFIG [TC-CONFIG ...] A whitespace-separated list of configuration files (minimum 1) that describes the stripes to be added to the cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option. Any one server from each stripe can be provided. However, multiple servers from the same stripe will work as well. The cluster will be configured with the configurations which were originally used to start the servers.
Note: The command configures the cluster only once. To update the configuration of an already configured cluster, the reconfigure command should be used.
Examples
The example below shows a successful execution for a two stripe configuration and a valid license.
./cluster-tool.sh configure -l ~/license.xml -n tc-cluster ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
Configuration successful
License installation successful
Command completed successfully
The example below shows a failed execution because of an invalid license.
./cluster-tool.sh configure -l ~/license.xml -n tc-cluster ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
Error (BAD_REQUEST): com.terracottatech.LicenseException: Invalid license
The example below shows a failed execution in which the two stripe configurations have mismatched offheap resource sizes.
./cluster-tool.sh configure -n tc-cluster -l ~/license.xml ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
Error (BAD_REQUEST): Mismatched off-heap resources in provided config files:
[[primary-server-resource: 51200M], [primary-server-resource: 25600M]]
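The examples above all use the TC-CONFIG form of the command. For the -s form, a minimal sketch follows. It assumes a small helper, here called configure_from_servers (a made-up name, as is the CLUSTER_TOOL variable), that turns a list of server addresses into repeated -s options. No tool output is shown because none was captured for this form.

```shell
# Hypothetical helper; configure_from_servers and CLUSTER_TOOL are
# illustrative names, not part of the product.
CLUSTER_TOOL="${CLUSTER_TOOL:-./cluster-tool.sh}"

configure_from_servers() {
    name=$1
    license=$2
    shift 2
    # Rewrite the remaining HOST[:PORT] arguments as "-s HOST[:PORT]" pairs.
    for server in "$@"; do
        set -- "$@" -s "$server"
        shift
    done
    "$CLUSTER_TOOL" configure -n "$name" -l "$license" "$@"
}
```

For a cluster with one running server per stripe at localhost:9410 and localhost:9710 (addresses chosen for illustration), configure_from_servers tc-cluster ~/license.xml localhost:9410 localhost:9710 would run ./cluster-tool.sh configure -n tc-cluster -l ~/license.xml -s localhost:9410 -s localhost:9710.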
The "reconfigure" Command
The reconfigure command updates the configuration of a cluster which was configured using the configure command. With reconfigure, it is possible to:
1. Update the license on the cluster.
2. Add new offheap resources, or grow existing ones.
3. Add new data directories.
4. Add new configuration element types.
The command will fail if any of the following checks do not pass:
1. License checks
a. The new license is valid.
b. The new configuration files do not violate the license.
2. Stripe checks
a. The new configuration files have all the previously configured servers.
b. The order of the configuration files provided in the reconfigure command is the same as the order of stripes in the previously configured cluster.
3. Configuration checks
a. Stripe consistency checks
The new configuration files are consistent across all the stripes. For the list of configuration items validated in the configuration files, refer to the section The "configure" Command above.
b. Offheap checks
The new configuration has all the previously configured offheap resources, and the new sizes are not smaller than the old sizes.
c. Data directories checks
The new configuration has all the previously configured data directory names.
d. Configuration type checks
The new configuration has all the previously configured configuration types.
Usage:
reconfigure -n CLUSTER-NAME TC-CONFIG [TC-CONFIG...]
reconfigure -n CLUSTER-NAME -l LICENSE-FILE -s HOST[:PORT] [-s HOST[:PORT]]...
reconfigure -n CLUSTER-NAME -l LICENSE-FILE TC-CONFIG [TC-CONFIG...]
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
TC-CONFIG [TC-CONFIG ...] A whitespace-separated list of configuration files (minimum 1) that describe the new configurations for the stripes.
-l LICENSE-FILE The path to the new license file.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of servers, each specified using the -s option.
Servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server.
reconfigure command usage scenarios:
1. License update
When it is required to update the license, most likely because the existing license has expired, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME -l LICENSE-FILE -s HOST[:PORT] [-s HOST[:PORT]]...
Note: A license update does not require the servers to be restarted.
2. Configuration update
When it is required to update the cluster configuration, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME TC-CONFIG [TC-CONFIG...]
The steps below should be followed in order:
a. Update the Terracotta configuration files with the new configuration, ensuring that it meets the reconfiguration criteria mentioned above.
b. Run the reconfigure command with the new configuration files.
c. Restart the servers with the new configuration files for the new configuration to take effect.
3. License and configuration update at once
In the rare event that it is desirable to update the license and the cluster configuration in one go, the following reconfigure command syntax should be used:
reconfigure -n CLUSTER-NAME -l LICENSE-FILE TC-CONFIG [TC-CONFIG...]
The steps to be followed here are the same as those mentioned in the Configuration update section above.
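The configuration update steps can be scripted so that the servers are restarted only after reconfigure has succeeded. The sketch below is one possible approach, not a prescribed procedure. It assumes that a failure can be recognized by an output line of the form Error (CODE): as seen in the examples in this section, because exit codes are not described here. The names reconfigure_cluster, has_tool_error and CLUSTER_TOOL are made up for this illustration.

```shell
# Hypothetical helpers; the function names and CLUSTER_TOOL are illustrative.
CLUSTER_TOOL="${CLUSTER_TOOL:-./cluster-tool.sh}"

# True when the captured tool output contains a line such as
# "Error (CONFLICT): ...".
has_tool_error() {
    printf '%s\n' "$1" | grep -q '^Error ([A-Z_]*):'
}

# Step b: run reconfigure with the new files, then report whether it is
# reasonable to continue with step c (restarting the servers).
reconfigure_cluster() {
    name=$1
    shift
    output=$("$CLUSTER_TOOL" reconfigure -n "$name" "$@" 2>&1)
    printf '%s\n' "$output"
    if has_tool_error "$output"; then
        echo "reconfigure reported an error; fix the configuration files before restarting." >&2
        return 1
    fi
    echo "reconfigure succeeded; restart the servers with the new configuration files."
}
```

A call such as reconfigure_cluster tc-cluster ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml prints the tool output followed by a one-line summary, and returns a non-zero status when an error line was found.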
Examples
The example below shows a successful re-configuration of a two-stripe cluster tc-cluster with new stripe configurations.
./cluster-tool.sh reconfigure -n tc-cluster ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
License not updated (Reason: Identical to previously installed license)
Configuration successful
Command completed successfully.
The example below shows a failed re-configuration because of a license violation.
./cluster-tool.sh reconfigure -n tc-cluster -l ~/license.xml -s localhost:9410
Error (BAD_REQUEST): Cluster offheap resource is not within the limit of the license.
Provided: 409600 MB, but license allows: 102400 MB only
The example below shows a failed re-configuration of a two-stripe cluster in which the new stripe configurations have fewer data directories than the existing configuration.
./cluster-tool.sh reconfigure -n tc-cluster ~/tc-config-stripe-1.xml ~/tc-config-stripe-2.xml
License not updated (Reason: Identical to previously installed license)
Error (CONFLICT): org.terracotta.exception.EntityConfigurationException:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception:
Entity: com.terracottatech.tools.client.TopologyEntity:topology-entity
lifecycle exception: org.terracotta.entity.ConfigurationException:
Mismatched data directories. Provided: [use-for-platform, data],
but previously known: [use-for-platform, data, myData]
The "status" Command
The status command displays the status of a cluster, or particular server(s) in the same or different clusters.
Usage:
status -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
status -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a cluster-level status command.
./cluster-tool.sh status -n tc-cluster -s localhost
Cluster name: tc-cluster
Stripes in the cluster: 2
Servers in the cluster: 4
Server{name='server-1', host='localhost', port=9410},
Server{name='server-2', host='localhost', port=9610} (stripe 1)
Server{name='server-3', host='localhost', port=9710},
Server{name='server-4', host='localhost', port=9910} (stripe 2)
Total configured offheap: 102400M
Backup configured: true
SSL/TLS configured: false
IP whitelist configured: false
Data directories configured: data, myData
| STRIPE: 1                                                            |
+--------------------+----------------------+--------------------------+
| Server Name        | Host:Port            | Status                   |
+--------------------+----------------------+--------------------------+
| server-1           | localhost:9410       | ACTIVE-COORDINATOR       |
| server-2           | localhost:9610       | PASSIVE-STANDBY          |
+--------------------+----------------------+--------------------------+
| STRIPE: 2                                                            |
+--------------------+----------------------+--------------------------+
| Server Name        | Host:Port            | Status                   |
+--------------------+----------------------+--------------------------+
| server-3           | localhost:9710       | ACTIVE-COORDINATOR       |
| server-4           | localhost:9910       | PASSIVE-STANDBY          |
+--------------------+----------------------+--------------------------+
The example below shows the execution of a server-level status command. No server is running at localhost:9510, hence the UNKNOWN status.
./cluster-tool.sh status -s localhost:9410 -s localhost:9510 -s localhost:9910
+----------------------+--------------------------+----------------+
| Host:Port            | Status                   | Cluster        |
+----------------------+--------------------------+----------------+
| localhost:9410       | ACTIVE-COORDINATOR       | tc-cluster     |
| localhost:9510       | UNKNOWN                  | UNKNOWN        |
| localhost:9910       | PASSIVE-STANDBY          | tc-cluster     |
+----------------------+--------------------------+----------------+
Error (PARTIAL_FAILURE): Command completed with errors.
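Because the server-level output is a plain-text table, it can be post-processed with standard tools. The sketch below assumes the column layout shown in the server-level example above (Host:Port, Status, Cluster) and prints the address of every server reported as UNKNOWN. The function name unreachable_servers is made up for this illustration.

```shell
# Hypothetical filter: reads server-level "status" output on stdin and prints
# the Host:Port of each row whose Status column is UNKNOWN.
unreachable_servers() {
    awk -F'|' '
        NF >= 4 {
            host = $2
            status = $3
            # Trim the padding around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", host)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status == "UNKNOWN") print host
        }'
}
```

It would typically be used at the end of a pipeline, for example ./cluster-tool.sh status -s localhost:9410 -s localhost:9510 -s localhost:9910 | unreachable_servers. Border lines and the final Error line contain no column separators and are therefore ignored.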
The "dump" Command
The dump command dumps the state of a cluster, or particular server(s) in the same or different clusters. The dump of each server can be found in its logs.
Usage:
dump -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
dump -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a cluster-level dump command.
./cluster-tool.sh dump -n tc-cluster -s localhost:9910
Command completed successfully.
The example below shows the execution of a server-level dump command. No server is running at localhost:9510, hence the dump failure.
./cluster-tool.sh dump -s localhost:9410 -s localhost:9510 -s localhost:9910
Dump successful for server at: localhost:9410
Connection refused from server at: localhost:9510
Dump successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "stop" Command
The stop command stops the cluster, or particular server(s) in the same or different clusters.
Usage:
stop -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
stop -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a cluster-level stop command.
./cluster-tool.sh stop -n tc-cluster -s localhost
Command completed successfully.
The example below shows the execution of a server-level stop command. No server is running at localhost:9510, hence the stop failure.
./cluster-tool.sh stop -s localhost:9410 -s localhost:9510 -s localhost:9910
Stop successful for server at: localhost:9410
Connection refused from server at: localhost:9510
Stop successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
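When a server-level command ends with PARTIAL_FAILURE, the servers that could not be reached can be extracted from the output. The following sketch assumes the message text is exactly as in the example above ("Connection refused from server at:"). The function name refused_servers is made up for this illustration.

```shell
# Hypothetical filter: reads command output on stdin and prints the address
# from each "Connection refused from server at: HOST:PORT" line.
refused_servers() {
    sed -n 's/^Connection refused from server at: *//p'
}
```

For example, ./cluster-tool.sh stop -s localhost:9410 -s localhost:9510 -s localhost:9910 | refused_servers would list only the servers that refused the connection.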
The "ipwhitelist-reload" Command
The ipwhitelist-reload command reloads the IP whitelist on a cluster, or on particular server(s) in the same or different clusters.
Usage:
ipwhitelist-reload -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
ipwhitelist-reload -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a cluster-level ipwhitelist-reload command.
./cluster-tool.sh ipwhitelist-reload -n tc-cluster -s localhost
IP white-list reload successful for server at: localhost:9410
IP white-list reload successful for server at: localhost:9610
IP white-list reload successful for server at: localhost:9710
IP white-list reload successful for server at: localhost:9910
Command completed successfully.
The example below shows the execution of a server-level ipwhitelist-reload command. No server is running at localhost:9510, hence the IP whitelist reload failure.
./cluster-tool.sh ipwhitelist-reload -s localhost:9410 -s localhost:9510 -s localhost:9910
IP white-list reload successful for server at: localhost:9410
Connection refused from server at: localhost:9510
IP white-list reload successful for server at: localhost:9910
Error (PARTIAL_FAILURE): Command completed with errors.
The "backup" Command
The backup command takes a backup of the running Terracotta cluster.
Usage:
backup -n CLUSTER-NAME -s HOST[:PORT] [-s HOST[:PORT]]...
Parameters:
-n CLUSTER-NAME The name of the configured cluster.
-s HOST[:PORT] [-s HOST[:PORT]]... The host:port(s) or host(s) (default port being 9410) of running servers, each specified using the -s option.
When provided with the option -n, servers in the provided list will be sequentially contacted for connectivity, and the command will be executed on the first reachable server. Otherwise, the command will be individually executed on each server in the list.
Examples
The example below shows the execution of a successful cluster-level backup command. Note that the server at localhost:9610 was unreachable.
./cluster-tool.sh backup -n tc-cluster -s localhost:9610 -s localhost:9410
PHASE 0: SETTING BACKUP NAME TO : 996e7e7a-5c67-49d0-905e-645365c5fe28
localhost:9610: TIMEOUT
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: SUCCESS
PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9610: TIMEOUT
localhost:9910: NOOP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9710: SUCCESS
localhost:9410: SUCCESS
PHASE (3/4): START_BACKUP
localhost:9710: SUCCESS
localhost:9410: SUCCESS
PHASE (4/4): EXIT_ONLINE_BACKUP_MODE
localhost:9710: SUCCESS
localhost:9410: SUCCESS
Command completed successfully.
The example below shows the execution of a failed cluster-level backup command.
./cluster-tool.sh backup -n tc-cluster -s localhost:9610
PHASE 0: SETTING BACKUP NAME TO : 93cdb93d-ad7c-42aa-9479-6efbdd452302
localhost:9610: SUCCESS
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: SUCCESS
PHASE (1/4): PREPARE_FOR_BACKUP
localhost:9610: NOOP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
localhost:9910: NOOP
PHASE (2/4): ENTER_ONLINE_BACKUP_MODE
localhost:9410: BACKUP_FAILURE
localhost:9710: SUCCESS
PHASE (CLEANUP): ABORT_BACKUP
localhost:9410: SUCCESS
localhost:9710: SUCCESS
Backup failed as some servers '[Server{name='server-1', host='localhost', port=9410},
[Server{name='server-2', host='localhost', port=9710}]]',
failed to enter online backup mode.
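The per-phase results can be scanned automatically for servers that need attention. The sketch below assumes the output layout shown in the two examples above, and it treats SUCCESS and NOOP as the only unremarkable results; that classification is an assumption made for this illustration, not a documented rule. The function name backup_anomalies is also made up.

```shell
# Hypothetical filter: reads "backup" output on stdin and prints one line
# "PHASE-LABEL HOST:PORT RESULT" for every result other than SUCCESS or NOOP.
backup_anomalies() {
    awk '
        /^[ \t]*PHASE/ {
            # Remember the phase label, e.g. "PHASE (1/4)".
            phase = $0
            sub(/^[ \t]+/, "", phase)
            sub(/:.*$/, "", phase)
            next
        }
        /^[ \t]*[^ \t]+:[0-9]+:[ \t]+[A-Z_]+[ \t]*$/ {
            host = $1
            sub(/:$/, "", host)      # drop the colon after HOST:PORT
            if ($2 != "SUCCESS" && $2 != "NOOP") print phase, host, $2
        }'
}
```

Piping the first example through this filter would report the two TIMEOUT results for localhost:9610, and the second example would report the BACKUP_FAILURE for localhost:9410.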