Configuration Terms and Concepts
To have a solid grasp of how to configure a Terracotta Server Array (TSA), one must first have a strong understanding of the basic concepts of what a TSA is, what it uses as resources, and how its configuration system works.
Review of Terracotta Server Array Concepts
As a quick review of high-level TSA concepts:
*A Terracotta Server Array (TSA) is composed of one or more "stripes".
*A stripe is composed of one or more Terracotta Servers.
*Each stripe contributes to the total storage and computing capacity of the TSA. If there are five stripes, then each one will contain roughly one-fifth of the stored data.
*Within a stripe, one server is "active" (serves workload from clients), and any others act as "mirrors" for HA purposes.
*Because any member of the stripe may be elected as the "active" server, the configuration and system resources of all stripe members must be equivalent.
Stripes have names (which can be assigned at configuration time), and the nodes (servers) that are members of a stripe also have names. These names are useful when "targeting" configuration or operational commands.
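As a concrete (and purely illustrative) sketch, the excerpt below shows how stripe and node names might be assigned for a two-stripe cluster in a cluster configuration file; the property-key syntax shown is an assumption based on the properties format described in the section The Terracotta Configuration File, and should be verified there:

    stripe.1.stripe-name=stripe-A
    stripe.1.node.1.name=node-A-1
    stripe.1.node.2.name=node-A-2
    stripe.2.stripe-name=stripe-B
    stripe.2.node.1.name=node-B-1
    stripe.2.node.2.name=node-B-2

With names like these, a command could target, for example, just node-A-2, all of stripe-B, or the whole cluster.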
For more information on the above concepts, please review the sections Cluster Architecture and Active and Passive Servers.
Server Resource Concepts
Terracotta Servers utilize resources in order to provide services and features such as network connectivity, data storage, durability, backups, etc.
Notable items that need to be configured (or whose default values should be checked for suitability) include the following; a configuration-file sketch of these settings appears after the list:
*Network ports - This includes a "port" for receiving client requests, and a "group-port" for communicating with other stripe members (servers). The default values are 9410 and 9430, respectively.
*Server metadata directory - This is a directory where the server stores important metadata about its internal state. The default location is <user-home-dir>/terracotta/metadata.
*Configuration directory - This is a directory where the server stores its internal configuration. The default location is <user-home-dir>/terracotta/config.
*Offheap storage resources - Servers need one or more offheap (memory) resources defined in order to have space in which to store data (via Caches or Datasets). For proper operation, all servers in a cluster (TSA) need to have the same set of offheap resources defined (because Cache and Dataset configurations will reference them for use). The default is to create one offheap resource named main with size 512MB.
*Data directories - Optional, but commonly used, data directories are used for durable (persistent) storage of data. For proper operation, all servers in a cluster (TSA) need to have the same set of data directories defined (because Cache and Dataset configurations will reference them for use). The default is to create one data directory named main with location <user-home-dir>/terracotta/user-data/main.
*Backup directory - Used as the destination for backups.
*Logging directory - Used as the destination for server logs. Default location is <user-home-dir>/terracotta/logs.
*Failover priority - a cluster-wide setting that affects HA behavior when nodes are shut down or fail. A choice must be made as to whether the cluster should favor availability of service or consistency of data when situations occur that could lead to split-brain scenarios (e.g. when servers are still running but cannot communicate with each other).
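The sketch below shows how these resource settings might look in a cluster configuration file. The exact property keys should be checked against the section The Terracotta Configuration File, and the paths shown are illustrative only:

    failover-priority=availability
    offheap-resources=main:512MB
    stripe.1.node.1.port=9410
    stripe.1.node.1.group-port=9430
    stripe.1.node.1.metadata-dir=/var/terracotta/metadata
    stripe.1.node.1.log-dir=/var/terracotta/logs
    stripe.1.node.1.backup-dir=/var/terracotta/backup
    stripe.1.node.1.data-dirs=main:/var/terracotta/user-data/main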
For more information about these items, please review the sections Config Tool, The Terracotta Configuration File, and Configuring the Terracotta Server.
Configuration Concepts
Perhaps the most important thing to understand about a Terracotta server's configuration is that it is stored and updated "internally" by the server, not in a human-editable file that is read each time the server starts.
After a node has been configured and is running, everything that is needed to restart it and get it running again (with the same configuration and internal state) is stored within the node's config-dir.
The mechanisms for adding to or changing the internally stored configuration of servers are therefore the focus of what needs to be understood next.
Fundamental, Required Settings
There are a few fundamental items related to a server instance (node) that are necessary for its existence. These include the network ports, the config-dir, and the metadata-dir. The ports are of course used to make the node accessible. The config-dir is where the server will store (and later find) its internal configuration, and the metadata-dir is where the server will store (and later find) its internal state information.
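For illustration, a start-tc-server invocation supplying these fundamental settings might look like the sketch below. The option names here are assumptions based on the setting names above; consult the start-tc-server documentation for the exact flags:

    start-tc-server.sh --hostname server1 --port 9410 --group-port 9430 \
        --config-dir /var/terracotta/config --metadata-dir /var/terracotta/metadata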
Necessarily Equivalent Settings
Some configuration settings need to be consistent or equivalent across all nodes in a stripe and/or across all nodes in all stripes of a cluster. The reasoning for this becomes clear with a few examples.
Settings that need not be (or, in some cases, must not be) the same across nodes include the node name. Clearly, the node name must be unique, or it would not be useful for identifying the node. The network port can be any legal and available port number, and there is no need for any two servers to use the same port, though using the same port everywhere is usually clearest. (Servers running on the same host obviously must not use the same port, but for other reasons, such as resource contention and HA, running multiple servers on the same host is strongly discouraged.)
Examples of settings that do need to be equivalent across nodes are offheap resources and data directories, because these are referenced (by name) in the configurations of Datasets and Caches and are expected to exist on all servers. For example, if a user configures a new Dataset to store its data in an offheap resource named 'primary', then, as the Dataset is created on each server member of the cluster, an offheap resource named 'primary' must be found on each, or the creation of the Dataset will fail. It also makes sense for the offheap resource named 'primary' to have an identical size on each server, so that mirrors can hold a copy of all the same data the active server has. Data directories are similar: when a Cache or Dataset configuration refers to a data-dir, one with that name must exist on each server in the TSA. However, while the data-dir name must be known to all servers, the file path it refers to does not have to be identical on all servers. Hence, we say that all server nodes should have an equivalent set of data-dirs. The sketch below illustrates this distinction.
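Here, with the same assumed property-key syntax as in the earlier sketches, the offheap resource definition is cluster-wide and identical for every node, while the data-dir named main maps to a different path on each node:

    offheap-resources=primary:4GB
    stripe.1.node.1.data-dirs=main:/disk1/terracotta/user-data
    stripe.1.node.2.data-dirs=main:/disk2/terracotta/user-data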
Initial Configuration Steps
The typical steps for initial configuration of a TSA are as follows (a command-line sketch of these steps appears after the list):
1. Start each server node (as unconfigured servers, they will enter 'diagnostic mode' and await configuration)
2. Provide each server with configuration settings
3. Attach nodes to each other to form stripes
4. Attach stripes to each other to form a cluster
5. Activate the cluster
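The sketch below walks through steps 2 through 5 using config tool commands. The command and option syntax shown is an assumption based on typical config tool usage; treat the Config Tool section as authoritative:

    # step 2: provide a setting to a node running in diagnostic mode
    config-tool.sh set -s server1:9410 -c offheap-resources.main=512MB
    # step 3: attach a second node to the first node's stripe
    config-tool.sh attach -t node -d server1:9410 -s server2:9410
    # step 4: attach another stripe to form the cluster
    config-tool.sh attach -t stripe -d server1:9410 -s server3:9410
    # step 5: activate the cluster under a chosen name
    config-tool.sh activate -s server1:9410 -n tc-cluster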
The first two steps can be accomplished with a single command line, if the user specifies configuration settings as parameters to the start-tc-server script.
All of the steps can be accomplished with a single command line, if the user specifies a config file containing all of the settings for all nodes of the cluster. Note that such a file is only used to initialize the set of configuration properties in each node's internal configuration (stored in its config-dir) - it is never read or utilized again.
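For example, a single invocation along the following lines could configure, form, and activate the whole cluster from one config file; the -f and --auto-activate options shown are assumptions to be verified against the start-tc-server documentation:

    start-tc-server.sh -f cluster.properties --auto-activate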
After these steps, the server nodes will restart themselves (to leave diagnostic mode) and form the configured cluster. As the servers restart, they will use the configuration stored in their config-dir. With any subsequent restarts of a node, the user must specify (to the start-tc-server script) the location of the config-dir (otherwise the script will attempt to use the default location).
Once the cluster is activated, some configuration properties (such as node name and config-dir) cannot be changed. Others can be changed or added later, as necessary (such as offheap resources).
Understanding the Configuration Directory and Config Tool
Configuration Directory
As previously noted, a server node's config-dir is where its internal storage, or "source of truth", for configuration is kept. The files under this directory are not to be edited by the user; for reasons made clearer below, they are managed solely by the server node itself.
If you are restarting a server process, and want it to be the same server node that it was before, you need to ensure that the start-tc-server script specifies (or defaults to) the appropriate config-dir.
Config Tool
The config tool (see Config Tool) is used to add or modify configuration settings for servers, both before they are activated as part of a cluster, and afterward. It can also be used to see what a server's current configuration settings are, or export them for use as a template or backup for recreating clusters.
The config tool connects to server nodes and issues commands, such as to set a configuration property. Some config tool commands may target a single node, while others may target all nodes of a stripe, or all nodes of a cluster. In all cases, the server responds to the config tool's requests by reading and/or updating the configuration state files contained in the server's config-dir.
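A brief sketch of these uses, with the same syntax caveats as in the earlier sketches:

    # inspect a current setting
    config-tool.sh get -s server1:9410 -c offheap-resources
    # change a setting
    config-tool.sh set -s server1:9410 -c offheap-resources.main=1GB
    # export the full cluster configuration for backup or as a template
    config-tool.sh export -s server1:9410 -f cluster.properties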
Configuration Operations and Outcomes
As noted in the previous paragraphs, a server node responds or reacts to config-tool requests by reading or updating the contents of its internal configuration (which is contained in the config-dir, once the cluster has been activated).
Because many configuration settings must be the same on all member nodes of a stripe and some must be the same on all members of the cluster, changes to configuration must be coordinated, and therefore complex outcomes are possible.
For example, if the number of "voters" for a stripe is to be changed, it is only safe for that change to take effect if it does so at the same time on all nodes in the stripe. Otherwise, incorrect failover behavior could result (e.g. a node could make a wrong decision about whether or not it should move to "active" state). Similarly, the adjustment of offheap resource sizes, or the attachment of an additional node to a stripe, needs to be synchronized across servers.
This coordination, or synchronization, is accomplished via a two-phase commit protocol, wherein configuration changes are staged and validated on each server (to determine that the change can be successful on all servers), and then committed (or activated) on each server in the second phase.
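For example, a cluster-wide change such as the voter-count adjustment mentioned above might be issued as a single command (syntax assumed, as before), with the staging, validation, and commit phases handled behind the scenes on every node:

    # stages, validates, and commits the change on all nodes of the cluster
    config-tool.sh set -s server1:9410 -c failover-priority=consistency:2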
In most cases, the config tool and the server's internal configuration manager handle the complexities of this coordination just fine. However, if a failure occurs during a configuration change, or if one of the nodes is not running when a configuration change is made, outcomes are possible that may require follow-up on the user's part.
Most of the non-typical cases are automatically corrected by the servers, such as when a node restarts (if it was down during the configuration change, or if it crashed after the configuration change was staged but before it was committed or rolled back). When servers that are restarting connect to the other members of their stripe, they discover whether their configuration state is out of sync, and if so, then they receive the appropriate updates from the other server(s).
Very rarely, the config tool's repair command may need to be used to force a commit or rollback of a config change, after careful inspection of the configuration state via the config tool's get or export commands.
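A sketch of such an inspect-then-repair sequence, with the command syntax again assumed rather than authoritative:

    # inspect the configuration state first
    config-tool.sh get -s server1:9410 -c offheap-resources
    # then force the outcome of the interrupted change
    config-tool.sh repair -s server1:9410 -f commit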
In all cases, the config tool will inform you of the success or failure of a configuration operation, and suggest any next steps that may be necessary.