In a cluster environment, the Adabas nuclei working on the same cluster database must collaborate to keep the database physically and logically consistent while processing user commands. To do this, they issue intracluster commands to one another. If one cluster member does not respond to an intracluster command from another cluster member within a specified time period, the sending member cancels the unresponsive member.
Adabas Parallel Services provides alert and timeout settings that are designed to help you prevent or handle critical situations where system problems might cause the prolonged unavailability of one cluster member, thus endangering the ability of the cluster member (or in severe cases, the entire cluster) to provide service. These settings include:
A cancel alert that generates an operator message can be invoked when a cluster member is unresponsive to an intracluster command for a specified period of time. If the cluster member does not respond before the message times out, it is canceled.
Self-termination alerts that generate operator messages can be invoked when a canceled cluster member does not terminate as requested (in a specified period of time) and the other cluster members prepare to self-terminate.
An operator query that prints an operator console message and requests a response from the operator. This can be invoked by cluster members that are preparing to self-terminate because a canceled peer member remains active.
Messaging statistics provide information about the performance of message transmission events. These statistics can be used to determine the impact of messaging on system performance and to determine how to set the ADARUN parameters related to the alert and timeout settings.
In cluster environments, the cluster members issue intracluster commands to one another to ensure that the database is kept physically and logically consistent. If one cluster member does not respond to an intracluster command from another cluster member within the time specified by the ADARUN MXMSG parameter, the sending member cancels the unresponsive member.
You can invoke a cancel alert before the unresponsive peer member is canceled. This alert generates an operator message that provides early warning information before the unresponsive cluster member is canceled.
The cancel alert is governed by the setting of the ADARUN MXMSGWARN parameter. This optional parameter specifies the number of seconds after which a cluster nucleus should generate an operator message warning about an outstanding intracluster response. If the cluster member does not respond within the time specified by ADARUN MXMSGWARN, message ADAX9C is issued. This warning message gives you notice some time before the unresponsive cluster member is canceled.
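The relationship between the two thresholds can be sketched as follows. This is a minimal illustration in Python, not Adabas code. The function name `classify_wait`, the use of `None` to model an unspecified MXMSGWARN, and the treatment of the exact boundary second are assumptions made for this example. Only the parameter names and message ADAX9C come from the description above.

```python
from typing import Optional


def classify_wait(elapsed: float, mxmsg: int,
                  mxmsgwarn: Optional[int] = None) -> str:
    """Classify how long a sender has waited for an intracluster response.

    elapsed   -- seconds since the intracluster command was sent
    mxmsg     -- cancel threshold in seconds (ADARUN MXMSG)
    mxmsgwarn -- optional warning threshold in seconds (ADARUN MXMSGWARN);
                 None models "not specified" (assumption of this sketch)
    """
    # Assumption: reaching a threshold exactly already triggers the action.
    if elapsed >= mxmsg:
        return "cancel peer"
    if mxmsgwarn is not None and elapsed >= mxmsgwarn:
        return "warn (ADAX9C)"
    return "wait"


# Illustrative values only, not recommended settings.
for seconds in (10, 60, 300):
    print(seconds, classify_wait(seconds, mxmsg=300, mxmsgwarn=60))
```

The warning is only useful when its threshold is shorter than the cancel threshold, which is why the cancel check comes first in the sketch.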
As complements to the ADARUN MXMSG and MXMSGWARN parameters, two operator commands, MXMSG and MXMSGWARN, are provided that allow you to change the corresponding ADARUN settings dynamically, while the database is running. In addition, the DPARM operator command’s output information includes the settings of the ADARUN parameters related to cluster alert and timeout enhancements.
For more information about the MXMSG and MXMSGWARN parameters, read MXMSG: Timeout Threshold for Internucleus Command Processing and MXMSGWARN: Timeout Threshold for Internucleus Command Processing Warning. For information about the DPARM, MXMSG, and MXMSGWARN operator commands, read Cluster Operator Commands.
In cluster environments, if one cluster nucleus has issued a cancellation request for a second unresponsive cluster nucleus, but the canceled peer cluster nucleus does not terminate within the time specified by the ADARUN MXCANCEL parameter, the sending nucleus will either return response code 124, subcode 28 (if the intracluster communication occurred on behalf of an Adabas command) or terminate itself abnormally (if the intracluster communication occurred on behalf of an internal process that must not fail).
You can invoke a self-termination alert before a nucleus terminates itself because a canceled peer nucleus fails to terminate. This alert generates an operator message that provides early warning information regarding the pending self-termination.
Self-termination alerts are governed by the setting of the ADARUN MXCANCELWARN parameter. This optional parameter specifies the number of seconds after which a requesting cluster nucleus should generate an operator message warning that a canceled peer nucleus has not yet terminated. If the canceled peer nucleus does not terminate within the time specified by ADARUN MXCANCELWARN, message ADAX9G is issued. This warning message notifies you that the nucleus issuing the message is in danger of terminating itself.
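The decision the requesting nucleus faces while it waits for a canceled peer to terminate can be sketched as follows. This is a simplified illustration in Python under stated assumptions, not Adabas code: the function name `requester_action` and the flag `for_user_command` are hypothetical, `None` models an unspecified MXCANCELWARN, and the sketch deliberately ignores the operator query that can precede self-termination.

```python
from typing import Optional


def requester_action(elapsed: float, mxcancel: int,
                     mxcancelwarn: Optional[int] = None,
                     for_user_command: bool = True) -> str:
    """Describe what the requesting nucleus does after canceling a peer.

    elapsed          -- seconds since the cancellation request was issued
    mxcancel         -- ADARUN MXCANCEL threshold in seconds
    mxcancelwarn     -- optional ADARUN MXCANCELWARN threshold in seconds
    for_user_command -- True if the intracluster communication was issued
                        on behalf of an Adabas command, False if it served
                        an internal process that must not fail
    """
    if elapsed >= mxcancel:
        if for_user_command:
            return "return response code 124, subcode 28"
        return "terminate abnormally"
    if mxcancelwarn is not None and elapsed >= mxcancelwarn:
        return "warn (ADAX9G)"
    return "wait"


# Illustrative values only, not recommended settings.
print(requester_action(20, mxcancel=120, mxcancelwarn=30))
print(requester_action(45, mxcancel=120, mxcancelwarn=30))
print(requester_action(120, mxcancel=120, for_user_command=False))
```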
As complements to the ADARUN MXCANCEL and MXCANCELWARN parameters, two operator commands, MXCANCEL and MXCANCELWARN, are provided that allow you to change the corresponding ADARUN settings dynamically, while the database is running. In addition, the DPARM operator command’s output information has been enhanced to include the settings of ADARUN parameters related to Adabas Parallel Services alert and timeout enhancements.
For more information about the MXCANCEL and MXCANCELWARN parameters, read MXCANCEL: Timeout Threshold for Canceled Peer Nucleus and MXCANCELWARN: Timeout Threshold for Canceled Peer Nucleus Warning. For information about the DPARM, MXCANCEL, and MXCANCELWARN operator commands, read Cluster Operator Commands.
You can invoke an operator query when a cluster member is in the process of self-terminating because a canceled peer nucleus fails to terminate. This gives you a chance to terminate the canceled cluster member manually, thus avoiding the self-termination of the member that issued the ineffective cancel request.
This operator query prints a console message (message ADAX9J) explaining the situation and requesting instructions, waiting for a specified time for a response. The valid responses to message ADAX9J are:
R (print the ADAX9J message again and continue to wait for resolution of this issue, but without setting a new wait period for the response)
T (terminate the querying nucleus with message ADAX99 and user abend 79)
W (continue to wait for another time period of length MXWTOR)
The amount of time the operator query waits for a response is governed by the setting of the ADARUN MXWTOR parameter. This optional parameter specifies the number of seconds the nucleus should wait for the operator response. If the operator does not respond in this time frame and if the canceled peer nucleus still has not terminated, the requesting nucleus issues message ADAX99 and terminates itself.
However, if the canceled cluster member terminates after all (whether due to operator intervention or another reason), the cluster nucleus that issued the operator query stays alive; it retracts the query and initiates an online recovery process.
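The handling of the ADAX9J responses described above can be sketched as a small simulation. This is an illustration in Python under stated assumptions, not Adabas code. The function name `run_operator_query`, its arguments, and the outcome strings are hypothetical. The sketch also assumes that a W reply starts the new wait period at the moment of the reply, that unrecognized replies are ignored, and that a peer terminating exactly at the deadline still counts as terminated in time.

```python
from typing import Iterable, Optional, Tuple

RETRACT = "retract query and start online recovery"
OPERATOR_TERMINATE = "ADAX99, user abend 79"
TIMEOUT_TERMINATE = "ADAX99, self-termination after timeout"


def run_operator_query(mxwtor: int,
                       replies: Iterable[Tuple[float, str]],
                       peer_terminates_at: Optional[float] = None) -> str:
    """Simulate the outcome of one ADAX9J operator query.

    mxwtor             -- wait period in seconds (ADARUN MXWTOR)
    replies            -- (time, reply) pairs sorted by time, in seconds
                          since ADAX9J was first issued
    peer_terminates_at -- time at which the canceled peer terminates,
                          or None if it never does
    """
    deadline = float(mxwtor)
    for at, reply in replies:
        # The peer went away before this reply and before the deadline.
        if peer_terminates_at is not None and peer_terminates_at <= min(at, deadline):
            return RETRACT
        if at >= deadline:
            break  # the reply arrived after the wait period expired
        reply = reply.upper()
        if reply == "T":
            return OPERATOR_TERMINATE
        if reply == "W":
            deadline = at + mxwtor  # wait for another period of length MXWTOR
        # "R" reprints ADAX9J; the deadline stays unchanged.
    if peer_terminates_at is not None and peer_terminates_at <= deadline:
        return RETRACT
    return TIMEOUT_TERMINATE


# Illustrative values only: W at 30 s extends the deadline from 60 s to 90 s,
# so a peer that terminates at 80 s is still in time.
print(run_operator_query(60, [(30, "W")], peer_terminates_at=80))
```

The contrast between R and W is the point of the sketch: with the same peer termination time of 80 seconds, replacing the W reply with R leaves the deadline at 60 seconds and the querying nucleus terminates itself.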
As a complement to the ADARUN MXWTOR parameter, an operator command, MXWTOR, is provided that allows you to change the MXWTOR setting dynamically, while the database is running. In addition, the DPARM operator command’s output information includes the settings of ADARUN parameters related to Adabas Parallel Services alert and timeout enhancements.
For more information about the MXWTOR parameter, read MXWTOR: Self-Termination Operator Query Interval. For information about the DPARM and MXWTOR operator commands, read Cluster Operator Commands.
Adabas Parallel Services messaging statistics provide information about the performance of message transmission events. These statistics can be used to determine the impact of messaging on system performance and to determine how to set the ADARUN MXMSG and MXMSGWARN parameters related to the other alert and timeout enhancements in Adabas Parallel Services.
The performance statistics are provided in the termination statistics of an Adabas nucleus as well as in response to the DXMSG operator command. The statistics are split into those for messages that are subject to the ADARUN MXMSG parameter setting and those for messages that are not. Each group is reported separately in the output, followed by a combined report that summarizes both groups for all messages.
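The idea of two separately reported groups plus a combined summary can be sketched as follows. This is a purely illustrative Python example: the class `MessageStats` and its fields (message count, total duration, longest duration) are hypothetical and do not reproduce the actual layout of the DXMSG output.

```python
from dataclasses import dataclass


@dataclass
class MessageStats:
    """Hypothetical per-group messaging statistics (not the DXMSG layout)."""
    count: int = 0
    total_seconds: float = 0.0
    max_seconds: float = 0.0

    def record(self, duration: float) -> None:
        """Account for one message that took `duration` seconds."""
        self.count += 1
        self.total_seconds += duration
        self.max_seconds = max(self.max_seconds, duration)

    def combined(self, other: "MessageStats") -> "MessageStats":
        """Summarize this group and another group for all messages."""
        return MessageStats(
            count=self.count + other.count,
            total_seconds=self.total_seconds + other.total_seconds,
            max_seconds=max(self.max_seconds, other.max_seconds),
        )


# Illustrative durations only.
subject_to_mxmsg = MessageStats()
not_subject_to_mxmsg = MessageStats()
for d in (0.2, 0.4):
    subject_to_mxmsg.record(d)
not_subject_to_mxmsg.record(1.0)

all_messages = subject_to_mxmsg.combined(not_subject_to_mxmsg)
print(subject_to_mxmsg, not_subject_to_mxmsg, all_messages, sep="\n")
```

When choosing MXMSG and MXMSGWARN values, the group of messages that is subject to MXMSG is the one to look at.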
For more information about the DXMSG operator command, read Cluster Operator Commands.