Tamino XML Server Version 9.7

High Availability

Tamino is often part of a mission-critical infrastructure. Downtime of the infrastructure means downtime of the underlying business. For example, a major European airport uses Tamino in a high availability scenario, where one hour of downtime would severely interrupt the entire European air traffic network and have a heavy business impact.

An increasing number of applications require such a high degree of availability. Clients cannot afford to wait to get mission-critical information. Applications have to provide a continuous service. Tamino addresses these requirements by providing support for high availability configurations.

The following sections provide information about high availability in general and how it is implemented in Tamino. They also point you to other parts of the documentation that contain relevant information.

High-availability data processing configurations are necessarily associated with considerably higher costs and complexity than standard configurations. Issues such as licensing costs for standby CPUs and professional services consultancy should be taken into consideration. It is not possible to cover all the technical implications within the scope of this documentation. Software AG experts will be pleased to advise you on all aspects of your implementation. Please contact your nearest Software AG account manager or the professional services consultant assigned to you for further information.

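This document is subdivided into the following topics:

  1. What is High Availability?

  2. Failures and Failover

  3. Basic Architecture of a Failover System

  4. The Failover Process

  5. High Availability Software

  6. Summary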

What is High Availability?

This section first gives some general background information about high availability. It then introduces the possibilities that Tamino offers with regard to high availability.

Definition

Generally, high availability is defined as the ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliability of its components would suggest. As high availability cannot easily be quantified, it can be considered a design goal rather than an actual system design. It is a trade-off between the cost of downtime and the cost of the protective measures that are available to avoid or reduce downtime. Availability increases with increasing investment.
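Although the availability of a complete system is hard to quantify, a simplified calculation can illustrate how a system can be available for longer than the reliability of its components would suggest. The following minimal sketch computes steady-state availability from the mean time between failures (MTBF) and the mean time to repair (MTTR), and then the availability of a redundant pair. The figures are illustrative assumptions, not Tamino measurements, and the redundancy formula assumes independent failures and an instantaneous takeover.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a single component is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, nodes: int = 2) -> float:
    """Availability when the service is up as long as any one node is up.

    Assumes the nodes fail independently of each other.
    """
    return 1.0 - (1.0 - single) ** nodes


def downtime_hours_per_year(avail: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Illustrative figures only: a node that fails on average every 990 hours
# and takes 10 hours to repair is available 99% of the time.
single = availability(mtbf_hours=990.0, mttr_hours=10.0)
pair = redundant_availability(single)

print(f"single node: {single:.4%}, {downtime_hours_per_year(single):.1f} h/year down")
print(f"two nodes:   {pair:.4%}, {downtime_hours_per_year(pair):.1f} h/year down")
```

In practice the takeover time is not zero, so the real figure for a failover pair lies between the two values.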

Several technological aspects need to be considered for providing high availability:

  1. Good system administration practices

  2. Reliable backups

  3. Disk and volume management

  4. Reliable network

  5. Reliable local environment

  6. Good client management

  7. Error-free services and applications

  8. Failover

  9. Replication management

  10. Procedures for disaster recovery

High availability is achieved by applying these technologies from the bottom up: the more aspects you consider, the higher the availability of your system. The trend is clear: reaching higher levels of availability costs more money, and the cost of getting from one level to the next increases.

High Availability in Tamino

What does high availability mean with regard to Tamino? High availability depends on many factors, such as the system environment, the application requirements and the balance between cost and risk. The strategy for achieving high availability aims at eliminating single points of failure in a system.

Improving availability takes more than implementing a single failover solution for Tamino. Tamino provides several solutions for the different building blocks of availability mentioned above, all of which need to be considered when improving availability.

Failures and Failover

Failures

Tamino usually runs in a very complex environment that consists of many components. Each of them can fail individually, leading to a failure of larger components and, in the worst case, of the whole environment. Every component is therefore a potential point of failure.

Failover

Failover means that a second system is configured as a standby for the primary one. In case of a failure, the second system can take over: after a short interruption in service, all the functions of the primary system are performed by the secondary system. The takeover is performed automatically. Downtime is limited to the time needed for the takeover. Users should not be aware of the failover process.

Basic Architecture of a Failover System

To avoid failure and loss of data, Tamino should be integrated into a cluster system, which reduces the amount of time that an application is unavailable. The basic architecture of a highly available failover system is shown in the following graphic. It consists of (a minimum of) two nodes, a disk shared between the two nodes, and a heartbeat connection between the two nodes. This setup is controlled by cluster software. Each of the two nodes is identified by its own IP address. Clients access the high availability cluster using a third IP address. This IP address is attached to only one node at a time, under the control of the cluster software:

graphics/failconf.png

The basic components are:

Tamino XML Server is installed on both nodes (two licenses are required). The disks of the RAID array are mounted either on node 1 or on node 2, but never on both nodes at the same time. The Tamino database containers are located on the RAID array. For the clients, the two-node architecture is a "black box". They are connected to the virtual IP address of the high availability cluster, and not to a specific node in the cluster. The cluster software is installed on the servers and uses plug-ins for monitoring, starting and stopping the Tamino XML Server.
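The ownership rule described above (the shared disk and the virtual IP address belong to exactly one node at a time, and clients only ever see the virtual address) can be sketched as a small model. This is an illustration only: the class names, method names, node names and addresses are hypothetical and are not part of Tamino or of any cluster product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    ip_address: str  # each node has its own address


@dataclass
class Cluster:
    nodes: List[Node]
    virtual_ip: str                    # the third address, used by clients
    active_node: Optional[str] = None  # owner of shared disk and virtual IP

    def attach(self, node_name: str) -> None:
        """Give the shared disk and the virtual IP to one node."""
        if node_name not in [node.name for node in self.nodes]:
            raise ValueError(f"unknown node: {node_name}")
        if self.active_node not in (None, node_name):
            # Never mount the shared disk on both nodes at the same time.
            raise RuntimeError(f"resources still held by {self.active_node}")
        self.active_node = node_name

    def release(self) -> None:
        """Detach the shared disk and the virtual IP from the current owner."""
        self.active_node = None

    def client_endpoint(self) -> str:
        """Clients connect to the virtual IP, never to a specific node."""
        return self.virtual_ip


# Hypothetical names and documentation-range addresses.
cluster = Cluster(
    nodes=[Node("node1", "192.0.2.1"), Node("node2", "192.0.2.2")],
    virtual_ip="192.0.2.10",
)
cluster.attach("node1")
print(cluster.client_endpoint(), "is served by", cluster.active_node)
```

Moving the service to the other node requires a `release()` before the second `attach()`, which mirrors the rule that the RAID array is never mounted on both nodes.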

The Failover Process

Once you have established a basic failover architecture for Tamino, the cluster software handles a failure automatically.

Basically, the failover process is simple: The cluster software detects a problem on tamserver1. It terminates all parts of Tamino on node 1. It also releases all resources needed by Tamino (disk, mount, IP address). Then, on tamserver2, the cluster software re-establishes the needed resources and starts Tamino. After a failover, the database recovers to a consistent state.
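The sequence described above can be sketched as an ordered list of steps. This is a simplified illustration, not the logic of any real cluster software: the function name, the resource labels and the order in which the individual resources are released and acquired are assumptions made for this example.

```python
# Resource labels follow the text (disk, mount, IP address); hypothetical names.
RESOURCES = ("disk", "mount", "ip_address")


def fail_over(owners: dict, failed: str, standby: str) -> list:
    """Move Tamino and its resources from the failed node to the standby.

    `owners` maps each resource to the node that currently holds it and is
    updated in place. Returns the steps in the order they were performed.
    """
    # 1. Terminate all parts of Tamino on the failed node.
    steps = [f"stop Tamino on {failed}"]

    # 2. Release every resource that the failed node still holds.
    for resource in RESOURCES:
        if owners.get(resource) == failed:
            owners[resource] = None
            steps.append(f"release {resource} on {failed}")

    # 3. Re-establish the resources on the standby node.
    for resource in RESOURCES:
        owners[resource] = standby
        steps.append(f"acquire {resource} on {standby}")

    # 4. Start Tamino; the database then recovers to a consistent state.
    steps.append(f"start Tamino on {standby}")
    return steps


owners = {resource: "tamserver1" for resource in RESOURCES}
for step in fail_over(owners, failed="tamserver1", standby="tamserver2"):
    print(step)
```

The point of the ordering is that nothing is acquired on the standby node until everything has been released on the failed node.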

However, clients need to be aware that during a failover the Tamino database session is lost and that requests may time out during the failover period. Client programs should be written so that the corresponding response codes are not handled as unrecoverable failures.
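One way for a client to tolerate a failover is to retry an operation for a limited time when the session is lost or a request times out. The following is a minimal sketch of that idea. `SessionLostError` and `with_failover_retry` are hypothetical names introduced for this example, not part of any Tamino client API; a real client would map the actual Tamino response codes to such an exception. The sketch also assumes that `operation` opens a fresh session on each call and is safe to repeat.

```python
import time


class SessionLostError(Exception):
    """Hypothetical error raised when the database session no longer exists."""


def with_failover_retry(operation, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call `operation` until it succeeds or the attempts are used up.

    Only a lost session or a timeout is retried; any other error is
    raised immediately because it is not related to a failover.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (SessionLostError, TimeoutError) as error:
            last_error = error
            if attempt < attempts:
                # Give the cluster time to complete the takeover.
                sleep(delay_seconds)
    # The service did not come back in time: report the last failure.
    raise last_error
```

The number of attempts and the delay should be chosen so that their product exceeds the expected takeover time of the cluster.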

The following graphics give a more detailed description of the process.

Example: Web Server fails

graphics/failure.png

A cluster server uses plug-ins to control the software, in this case the web server and Tamino XML Server, which are installed on node 1. Controlling means that the cluster plug-ins can start, stop and monitor the software. The client is connected to the virtual IP address of the high availability cluster. A failure occurs if one of the components is no longer available, for example due to a hardware error. In this example, the web server fails.
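The three control operations named above (start, stop and monitor) can be pictured as a small interface that each plug-in implements. The sketch below is purely illustrative: `ServicePlugin`, `InMemoryService` and `failed_services` are hypothetical names, and real cluster products define their own plug-in interfaces.

```python
from abc import ABC, abstractmethod


class ServicePlugin(ABC):
    """Hypothetical plug-in contract: start, stop and monitor one service."""

    name: str

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...

    @abstractmethod
    def monitor(self) -> bool:
        """Return True if the service is currently healthy."""


class InMemoryService(ServicePlugin):
    """Stand-in for a real service, so that the example can run anywhere."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def monitor(self) -> bool:
        return self.running


def failed_services(plugins) -> list:
    """One monitoring pass: names of all services that report a failure."""
    return [plugin.name for plugin in plugins if not plugin.monitor()]


web = InMemoryService("web server")
tamino = InMemoryService("Tamino XML Server")
for service in (web, tamino):
    service.start()

web.stop()  # simulate the web server failure from the example
print(failed_services([web, tamino]))  # prints ['web server']
```

A cluster server would run such a monitoring pass continuously and trigger a failover as soon as the returned list is not empty.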

Failover: Software handles the process

graphics/failover.png

As the cluster server continuously monitors the software, failures are recognized immediately. In the case of a failure (if, for example, the web server fails), the cluster server stops both the web server and the Tamino XML Server. The IP address is switched from node 1 to node 2. Node 2 thus takes over the applications, in the best case without the clients noticing that a switch has occurred. The cluster server then starts the web server and the Tamino XML Server on node 2, and work can be resumed. The entire failover is done automatically.

High Availability Software

Cluster software monitors the health of the hardware and the software in a cluster. Tamino supports the major third-party cluster software products by means of corresponding plug-ins. These plug-ins allow integrated control of Tamino servers in the context of the cluster software. The Tamino plug-ins are part of the Tamino Enterprise License.

Third Party Software

Tamino High Availability supports Veritas Cluster Server and provides a high availability kit for Solaris and Windows. For details of the supported Veritas Cluster Server versions and operating system versions, contact your software supplier. Other high availability software (such as HP Service Guard or IBM HACMP) might be supported on request and under special conditions. Please contact your software supplier if you require support for other high availability software.

Tamino High Availability Software

The Tamino Enterprise kit contains a Tamino High Availability kit. The major components of this kit are:

Summary

High availability is defined by the demands that users place on the system, and administrators meet those demands by applying appropriate protection methods to deliver application availability. Tamino offers several possibilities to achieve high availability, among them good system administration practices, backups, failover and replication. Failover, as discussed in this guide, comprises a special hardware architecture and the corresponding third-party software.
