In a cluster environment, the Adabas nuclei working on the same cluster database must collaborate to keep the database physically and logically consistent while processing user commands. To do this, they issue intracluster commands to one another. If one cluster member does not respond to an intracluster command from another cluster member within a specified time period, the sending member cancels the unresponsive member.
Adabas Cluster Services provides alert and timeout settings that are designed to help you prevent or handle critical situations where system problems might cause the prolonged unavailability of one cluster member, thus endangering the ability of the cluster member (or in severe cases, the entire cluster) to provide service. These settings include:
A cancel alert that generates an operator message can be invoked when a cluster member is unresponsive to an intracluster command for a specified period of time. If the cluster member does not respond before the message times out, it is canceled.
Self-termination alerts that generate operator messages can be invoked when a canceled cluster member does not terminate as requested (in a specified period of time) and the other cluster members prepare to self-terminate.
An operator query that prints an operator console message and requests a response from the operator. This can be invoked by cluster members that are preparing to self-terminate because a canceled peer member remains active.
XCF system- and member-level status monitoring can be used to determine if a cluster member is unable to respond to internal intracluster requests. This monitoring process prints operator messages that provide early warning information about the cluster member.
Messaging statistics provide information about the performance of message transmission events. These statistics can be used to determine the impact of messaging on system performance and to determine how to set the ADARUN parameters related to the alert and timeout settings.
This document covers the following topics:
In cluster environments, the cluster members issue intracluster commands to one another to ensure that the database is kept physically and logically consistent. If one cluster member does not respond to an intracluster command from another cluster member within the time specified by the ADARUN MXMSG parameter, the sending member cancels the unresponsive member.
You can invoke a cancel alert before the unresponsive peer member is canceled. This alert generates an operator message that provides early warning information before the unresponsive cluster member is canceled.
The cancel alert is governed by the setting of the ADARUN MXMSGWARN parameter. This optional parameter specifies the number of seconds after which a cluster nucleus should generate an operator message warning about an outstanding intracluster response. If the cluster member does not respond within the time specified by ADARUN MXMSGWARN, message ADAX9C is issued. This warning message can be used to notify you sometime before the unresponsive cluster member is canceled.
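The relationship between the two thresholds can be sketched in a few lines of Python. This is only an illustration of the timing rules described above, not product code: the function name, the state labels, and the example values (300 and 75 seconds) are assumptions, and a setting of 0 is treated as "no warning".

```python
def cancel_alert_state(elapsed: float, mxmsg: int, mxmsgwarn: int = 0) -> str:
    """Classify an outstanding intracluster command by how long it has waited.

    elapsed    seconds since the command was sent
    mxmsg      timeout after which the unresponsive member is canceled
    mxmsgwarn  warning threshold; 0 is taken to mean "no warning"
    """
    if elapsed >= mxmsg:
        return "cancel"   # the sending member cancels the unresponsive member
    if mxmsgwarn and elapsed >= mxmsgwarn:
        return "warn"     # message ADAX9C is issued
    return "waiting"


# Illustrative values only: MXMSG=300, MXMSGWARN=75.
for seconds in (10, 80, 300):
    print(seconds, cancel_alert_state(seconds, mxmsg=300, mxmsgwarn=75))
```

The warning threshold is only useful when it is smaller than the cancel threshold, because the warning is meant to arrive while there is still time to react.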
As complements to the ADARUN MXMSG and MXMSGWARN parameters, two operator commands, MXMSG and MXMSGWARN, are provided that allow you to change the corresponding ADARUN settings dynamically, while the database is running. In addition, the DPARM operator command’s output information includes the settings of the ADARUN parameters related to cluster alert and timeout enhancements.
For more information about the MXMSG and MXMSGWARN parameters, read MXMSG: Timeout Threshold for Internucleus Command Processing and MXMSGWARN: Timeout Threshold for Internucleus Command Processing Warning. For information about the DPARM, MXMSG, and MXMSGWARN operator commands, read Adabas Cluster Nucleus Operator Commands.
In cluster environments, if one cluster nucleus has issued a cancellation request for a second unresponsive cluster nucleus, but the canceled peer cluster nucleus does not terminate within the time specified by the ADARUN MXCANCEL parameter, the sending nucleus will either return response code 124, subcode 28 (if the intracluster communication occurred on behalf of an Adabas command) or terminate itself abnormally (if the intracluster communication occurred on behalf of an internal process that must not fail).
You can invoke a self-termination alert before a nucleus terminates itself because a canceled peer nucleus fails to terminate. This alert generates an operator message that provides early warning information regarding the pending self-termination.
Self-termination alerts are governed by the setting of the new ADARUN MXCANCELWARN parameter. This optional parameter specifies the number of seconds after which a requesting cluster nucleus should generate an operator message warning about the inability of a canceled peer nucleus to terminate quickly. If the canceled peer nucleus does not terminate within the time specified by ADARUN MXCANCELWARN, message ADAX9G is issued. This warning message can be used to notify you that the nucleus issuing the message is in danger of terminating itself.
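The decision the requesting nucleus faces while it waits for a canceled peer to terminate can be sketched as follows. This is a simplified model under stated assumptions: the function name and return labels are hypothetical, a MXCANCELWARN value of 0 is assumed to disable the warning, and the operator query that can precede self-termination is not modeled.

```python
def canceled_peer_outcome(elapsed: float, mxcancel: int, mxcancelwarn: int = 0,
                          for_adabas_command: bool = True) -> str:
    """Describe what the requesting nucleus does while a canceled peer lingers.

    elapsed             seconds since the cancellation request was issued
    mxcancel            time allowed for the canceled peer to terminate
    mxcancelwarn        warning threshold; 0 is assumed to mean "no warning"
    for_adabas_command  True if the intracluster communication served an Adabas
                        command, False if it served an internal process that
                        must not fail
    """
    if elapsed >= mxcancel:
        if for_adabas_command:
            return "response code 124, subcode 28"
        return "self-terminate"
    if mxcancelwarn and elapsed >= mxcancelwarn:
        return "warn"     # message ADAX9G is issued
    return "waiting"


# Illustrative values only: MXCANCEL=300, MXCANCELWARN=75.
print(canceled_peer_outcome(100, 300, 75))
print(canceled_peer_outcome(300, 300, 75, for_adabas_command=False))
```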
As complements to the ADARUN MXCANCEL and MXCANCELWARN parameters, two operator commands, MXCANCEL and MXCANCELWARN, are provided that allow you to change the corresponding ADARUN settings dynamically, while the database is running. In addition, the DPARM operator command’s output information has been enhanced to include the settings of ADARUN parameters related to Adabas Cluster Services alert and timeout enhancements.
For more information about the MXCANCEL and MXCANCELWARN parameters, read MXCANCEL: Timeout Threshold for Canceled Peer Nucleus and MXCANCELWARN: Timeout Threshold for Canceled Peer Nucleus Warning. For information about the DPARM, MXCANCEL, and MXCANCELWARN operator commands, read Adabas Cluster Nucleus Operator Commands.
You can invoke an operator query when a cluster member is in the process of self-terminating because a canceled peer nucleus fails to terminate. This gives you a chance to terminate the canceled cluster member manually, thus avoiding the self-termination of the member that issued the ineffective cancel request.
This operator query prints a console message (message ADAX9J) explaining the situation and requesting instructions, waiting for a specified time for a response. The valid responses to message ADAX9J are:
R (print the ADAX9J message again and continue to wait for resolution of this issue, but without setting a new wait period for the response)
T (terminate the querying nucleus with message ADAX99 and user abend 79)
W (continue to wait for another time period of length MXWTOR)
The amount of time the operator query waits for a response is governed by the setting of the ADARUN MXWTOR parameter. This optional parameter specifies the number of seconds the nucleus should wait for the operator response. If the operator does not respond in this timeframe and if the canceled peer nucleus still has not terminated, the requesting nucleus issues message ADAX99 and terminates itself.
However, if the canceled cluster member terminates after all (whether due to operator intervention or another reason), the cluster nucleus that issued the operator query stays alive; it retracts the query and initiates an online recovery process.
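The query behavior described above can be modeled as a small state machine. The sketch below is an illustration only: the class and method names and the returned strings are hypothetical, time is passed in as plain numbers of seconds, and the handling of a reply other than R, T, or W is an assumption because the text does not describe it.

```python
from dataclasses import dataclass


@dataclass
class OperatorQuery:
    mxwtor: int            # seconds to wait for a reply (ADARUN MXWTOR)
    deadline: float = 0.0  # time at which the current wait period ends

    def start(self, now: float) -> str:
        self.deadline = now + self.mxwtor
        return "ADAX9J issued"

    def on_reply(self, reply: str, now: float) -> str:
        reply = reply.strip().upper()
        if reply == "R":
            # Print ADAX9J again; the wait period is NOT restarted.
            return "ADAX9J reissued"
        if reply == "T":
            return "terminate (ADAX99, user abend 79)"
        if reply == "W":
            # Wait for another period of length MXWTOR.
            self.deadline = now + self.mxwtor
            return "waiting"
        # Assumption: the text does not say how other replies are handled.
        raise ValueError(f"unexpected reply: {reply!r}")

    def on_tick(self, now: float, peer_terminated: bool) -> str:
        if peer_terminated:
            return "retract query, start online recovery"
        if now >= self.deadline:
            return "terminate (ADAX99)"
        return "waiting"


query = OperatorQuery(mxwtor=60)     # illustrative value
print(query.start(now=0))
print(query.on_reply("W", now=50))   # deadline moves to 110
print(query.on_tick(now=110, peer_terminated=False))
```

The difference between R and W is the point of the sketch: R leaves the deadline where it was, while W moves it a full MXWTOR interval into the future.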
As a complement to the ADARUN MXWTOR parameter, an operator command, MXWTOR, is provided that allows you to change the MXWTOR setting dynamically, while the database is running. In addition, the DPARM operator command’s output information includes the settings of ADARUN parameters related to Adabas Cluster Services alert and timeout enhancements.
For more information about the MXWTOR parameter, read MXWTOR: Self-Termination Operator Query Interval. For information about the DPARM and MXWTOR operator commands, read Adabas Cluster Nucleus Operator Commands.
XCF system- and member-level status monitoring on z/OS systems can be used to determine early if a cluster member may be unable to respond to internal intracluster requests. This monitoring process occurs by checking the activity (heartbeat) of each cluster nucleus and printing operator messages which provide early warning information about the cluster nuclei that show no heartbeat.
XCF status monitoring provides a second method by which Adabas Cluster Services can warn you that a cluster nucleus might be unable to respond in a timely way to intracluster commands. The first method is, of course, via the normal intracluster communication that occurs between cluster members. If a nucleus has heartbeat exceptions (as determined by XCF status monitoring), it most likely will be unable to process and respond to an intracluster command; if a nucleus is slow to respond to an intracluster command, it might or might not have a heartbeat monitor exception (a nucleus may appear to be active to XCF but be unable to respond to an intracluster command). If the ADARUN MXMSGWARN parameter for a cluster nucleus is nonzero (read Using Cancel Alerts), it produces warning messages (ADAX9B or ADAX9C) when intracluster communication with other nuclei in the cluster is too slow. Likewise, when XCF status monitoring determines that a nucleus is missing its heartbeat updates, it produces warning messages (ADAX22 and ADAX04). You can use an automated mechanism set up at installation to raise an alert or take other appropriate action based on the existence of these messages, as they identify existing or potential problems in the cluster.
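An automated mechanism of this kind could start from a simple classifier for console lines. The sketch below is a minimal illustration: only the four message IDs come from the text, while the function name, the category labels, the sample lines, and the assumption that the message ID appears as a separate token in each line are hypothetical.

```python
import re

# Message IDs named in the text, grouped by the kind of problem they signal.
SLOW_COMMUNICATION = {"ADAX9B", "ADAX9C"}
MISSING_HEARTBEAT = {"ADAX22", "ADAX04"}

# Assumption: the message ID appears as a separate token somewhere in the line.
MESSAGE_ID = re.compile(r"\bADAX[0-9A-Z]{2}\b")


def classify(line: str):
    """Return an alert category for a console line, or None if none applies."""
    for msg_id in MESSAGE_ID.findall(line):
        if msg_id in SLOW_COMMUNICATION:
            return "slow-intracluster-communication"
        if msg_id in MISSING_HEARTBEAT:
            return "missing-heartbeat"
    return None


# Hypothetical sample lines; real message texts will differ.
for sample in ("ADAX9C 00012 (sample text)", "ADAX22 00012 (sample text)",
               "some unrelated console line"):
    print(classify(sample))
```

Keeping the two categories separate reflects the point made above: a nucleus can be slow to respond without missing its heartbeat, so the two kinds of message may call for different actions.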
XCF status monitoring uses an ADARUN parameter, MXSTATUS, to activate XCF member-level status monitoring and to specify the monitoring interval (in seconds). In addition, the DMEMTB operator command includes a flag in its member state table messages indicating whether a system or message-level status monitoring exception was encountered and whether a message was issued for the exception.
To complement the new ADARUN MXSTATUS parameter, an operator command, MXSTATUS, allows you to change the MXSTATUS setting dynamically, while the database is running. In addition, the DPARM operator command’s output information includes the settings of ADARUN parameters related to Adabas Cluster Services alert and timeout enhancements.
Note:
The MXSTATUS parameter and operator command are only used by Adabas Cluster Services and not by Adabas Parallel Services. Adabas Parallel Services does not use XCF and ignores this parameter and setting.
For more information about the MXSTATUS parameter, read MXSTATUS: Member-Level XCF Status Monitoring Heartbeat Interval. For information about the updated DPARM, DMEMTB, and MXSTATUS operator commands, read Adabas Cluster Nucleus Operator Commands.
Adabas Cluster Services messaging statistics provide information about the performance of message transmission events. These statistics can be used to determine the impact of messaging on system performance and to determine how to set the ADARUN MXMSG and MXMSGWARN parameters related to the other alert and timeout enhancements in Adabas Cluster Services.
The performance statistics are provided in the termination statistics of an Adabas nucleus as well as in response to the DXMSG operator command. The performance statistics are split into those that are subject to the ADARUN MXMSG parameter setting and those that are not; after each is reported separately in the output, a combined report is provided containing the summarization of the two for all messages.
For more information about the DXMSG operator command, read Adabas Cluster Nucleus Operator Commands.