Universal Messaging 10.11 | Operations Guide | Monitoring
 
Monitoring
Universal Messaging provides a set of command-line tools for performing many common actions. Some of these tools can be used to monitor aspects of a realm that indicate its health. The tools and their usage are described below.
Environment State Check
Run the HealthChecker tool periodically (every minute) to check the environment status.
Command Details
Usage:
runUMTool HealthChecker -rname=<rname> -check=EnvironmentStateCheck

Examples:
runUMTool HealthChecker -rname=nsp://localhost:9000
-check=EnvironmentStateCheck

Required arguments:
rname : Name of a realm to check.
Refer to the section Running a Configuration Health Check of the Universal Messaging Administration Guide for more information on the HealthChecker tool.
The output of each run can be parsed to raise alerts for any line starting with WARN or ERROR. The tool checks what percentage of the total heap memory is taken up by events. If the percentage is between 70 and 80, between 80 and 90, or above 90, a corresponding warning is displayed.
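The percentage described above can be reproduced by hand from two values in the tool output. The following is a minimal sketch, assuming that "Memory allocated for events" is compared against "Max heap memory" and that each band includes its lower bound; neither detail is confirmed here, and the class and method names are hypothetical.

```java
/** Hypothetical helper, not part of Universal Messaging. */
final class EventMemoryBands {

    /** Percentage of the maximum heap that is occupied by events. */
    static double percentOfHeap(long eventMemoryMb, long maxHeapMb) {
        if (maxHeapMb <= 0) {
            throw new IllegalArgumentException("maxHeapMb must be positive");
        }
        return 100.0 * eventMemoryMb / maxHeapMb;
    }

    /** Maps a percentage to one of the three warning bands, or "none" below 70. */
    static String band(double percent) {
        // Boundary handling is an assumption; the tool may treat 80 and 90 differently.
        if (percent > 90) {
            return "above 90";
        }
        if (percent >= 80) {
            return "80 to 90";
        }
        if (percent >= 70) {
            return "70 to 80";
        }
        return "none";
    }
}
```

For a realm that reports 0 MB allocated for events against a 981 MB maximum heap, percentOfHeap returns 0.0 and band returns "none".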
Sample output of the EnvironmentStateCheck command:
ENVIRONMENT STATE CHECK
Environment State [umserver]
INFO: [umserver] Connections: 2
INFO: [umserver] Queued threads: 0
INFO: [umserver] Vended threads: 62
INFO: [umserver] Total threads: 64
INFO: [umserver] Total memory (MB): 981
INFO: [umserver] Used memory (MB): 101
INFO: [umserver] Free memory (MB): 880 (89.69%)
INFO: [umserver] Total direct memory (MB): 1024
INFO: [umserver] Used direct memory (MB): 0
INFO: [umserver] Free direct memory (MB): 1024 (100%)
INFO: [umserver] Max heap memory (MB): 981
INFO: [umserver] Memory allocated for events (MB): 0
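Alerting on the output can be as simple as filtering lines by their prefix. The sketch below assumes the output has already been captured as a list of lines; the class name HealthCheckAlerts is hypothetical and not part of Universal Messaging.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks alert lines out of captured HealthChecker output. */
final class HealthCheckAlerts {

    /** Returns every line that starts with WARN or ERROR, ignoring leading whitespace. */
    static List<String> findAlerts(List<String> outputLines) {
        List<String> alerts = new ArrayList<>();
        for (String line : outputLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("WARN") || trimmed.startsWith("ERROR")) {
                alerts.add(trimmed);
            }
        }
        return alerts;
    }
}
```

Applied to the sample output above, which contains only INFO lines, findAlerts returns an empty list.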
Store State Check
Run the HealthChecker tool periodically (every minute) to check the store status.
The following checks have to be enabled in the HealthChecker:
*StoreMemoryCheck
*StoreMismatchCheck
*StoreWarningsCheck
Command Details
Usage:
runUMTool HealthChecker -rname=<rname>
-check=StoreMemoryCheck,StoreMismatchCheck,StoreWarningsCheck

Examples:
runUMTool HealthChecker -rname=nsp://localhost:9000
-check=StoreMemoryCheck,StoreMismatchCheck,StoreWarningsCheck

Required arguments:
rname : Name of a realm to check.
Refer to the section Running a Configuration Health Check of the Universal Messaging Administration Guide for more information on the HealthChecker tool.
Output of each run can be parsed to raise alerts for any line starting with WARN or ERROR.
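One way to run the check every minute from a monitoring process is sketched below. It assumes that the path to the runUMTool script is supplied by the caller and that the check names are passed as one comma-separated argument without spaces; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper that launches the HealthChecker tool as an external process. */
final class StoreCheckRunner {

    /** Builds the argument list; the checks are joined into a single -check argument. */
    static List<String> buildCommand(String toolPath, String rname, List<String> checks) {
        List<String> command = new ArrayList<>();
        command.add(toolPath);                               // assumed location of runUMTool
        command.add("HealthChecker");
        command.add("-rname=" + rname);
        command.add("-check=" + String.join(",", checks));
        return command;
    }

    /** Runs the command once and returns its combined stdout and stderr lines. */
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor();
        return lines;
    }

    /** Schedules the given task immediately and then once per minute. */
    static ScheduledExecutorService scheduleEveryMinute(Runnable task) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(task, 0, 1, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

The task passed to scheduleEveryMinute would typically call run, scan the returned lines for WARN or ERROR prefixes, and catch its own exceptions, because an uncaught exception stops further executions of a task scheduled with scheduleAtFixedRate.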
Cluster State
Checks the cluster state using the RNAME of a realm that is part of the cluster.
Command Details
Usage:
runUMTool ClusterState -rname=<rname> [optional_args]

Examples:
runUMTool ClusterState -rname=nsp://localhost:9000

Required arguments:
rname : Name of a realm, which is part of a cluster.

Optional Parameters:
username : Your Universal Messaging server username.
password : Your Universal Messaging server password.
As seen in the sample output below, the statuses of the cluster nodes can be parsed and appropriate alerts can be raised.
Sample output of ClusterState command
--------------------------------------
Cluster Name: Cluster1
--------------------------------------
Cluster Nodes:
Node name: umserver (Master)
Realm rnames: nhp://10.42.96.207:9000/
nhp://fe80:0:0:0:2de9:64df:4bbf:d85c%14:9000/
Is node clustered: true

Node name: umserver2 (Slave)
Realm rnames: nhp://10.42.96.207:9000/
nhp://fe80:0:0:0:2de9:64df:4bbf:d85c%14:9000/ nhp://10.42.96.207:9001/
nhp://fe80:0:0:0:2de9:64df:4bbf:d85c%14:9001/
Is node clustered: true

--------------------------------------
Cluster Statuses
--------------------------------------
Server name: umserver
Server status: online
Cluster state: Master
Broadcast time: 0
Client Request size: 0
Comms Queue size: 0
Queue size: 0
Response time: 0

--------------------------------------
Server name: umserver2
Server status: online
Cluster state: Slave
Broadcast time: 0
Client Request size: 0
Comms Queue size: 0
Queue size: 0
Response time: 0
--------------------------------------
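The "Cluster Statuses" part of this output is regular enough to parse line by line. The sketch below assumes the label spellings shown in the sample ("Server name:", "Server status:", "Cluster state:"); the alert rules (every node online, exactly one Master) are illustrative assumptions, and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the "Cluster Statuses" part of the ClusterState output. */
final class ClusterStateParser {

    /** One server entry from the output. */
    static final class NodeStatus {
        final String name;
        String serverStatus = "";
        String clusterState = "";

        NodeStatus(String name) {
            this.name = name;
        }
    }

    /** Collects one NodeStatus per "Server name:" line and fills in the fields that follow it. */
    static List<NodeStatus> parse(List<String> outputLines) {
        List<NodeStatus> nodes = new ArrayList<>();
        NodeStatus current = null;
        for (String raw : outputLines) {
            String line = raw.trim();
            if (line.startsWith("Server name:")) {
                current = new NodeStatus(valueAfterColon(line));
                nodes.add(current);
            } else if (current != null && line.startsWith("Server status:")) {
                current.serverStatus = valueAfterColon(line);
            } else if (current != null && line.startsWith("Cluster state:")) {
                current.clusterState = valueAfterColon(line);
            }
        }
        return nodes;
    }

    /** Assumed alert rules: every node must be online and exactly one node must be Master. */
    static List<String> findProblems(List<NodeStatus> nodes) {
        List<String> problems = new ArrayList<>();
        int masters = 0;
        for (NodeStatus node : nodes) {
            if (!"online".equalsIgnoreCase(node.serverStatus)) {
                problems.add(node.name + " has status '" + node.serverStatus + "'");
            }
            if ("Master".equalsIgnoreCase(node.clusterState)) {
                masters++;
            }
        }
        if (masters != 1) {
            problems.add("expected exactly one Master, found " + masters);
        }
        return problems;
    }

    private static String valueAfterColon(String line) {
        return line.substring(line.indexOf(':') + 1).trim();
    }
}
```

For the sample output above, parse yields two entries (umserver as Master, umserver2 as Slave) and findProblems returns an empty list.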
Alternative Java Sample
The sample code nClusterWatch.java, found in <InstallDir>\UniversalMessaging\java\examples\com\pcbsys\nirvana\nAdminAPI\apps\nClusterWatch.java, demonstrates how the Java Admin API can be used to monitor the cluster state.
Monitor Channels
Monitors the channels and queues in a realm and prints totals.
Command Details
Usage:
runUMTool MonitorChannels -rname=<rname> [optional_args]

Examples:

runUMTool MonitorChannels -rname=nsp://localhost:9000
-channelname=channel0 -format=plaintext

runUMTool MonitorChannels -rname=nsp://localhost:9000
-channelname=queue1 -format=plaintext

Required arguments:
rname : URL of the realm to monitor channels and queues for.

Optional Parameters:
channelname : Name of a specific channel or queue to monitor
format : Format to print output in (plaintext/xml/json)
username : Your Universal Messaging server username.
password : Your Universal Messaging server password.
As seen in the sample output below, channel and queue statuses of the cluster nodes can be parsed and appropriate alerts can be raised.
Sample output of the MonitorChannels command
Name : channel0
Total Events Published : 10000
Total Events Consumed : 0
Last Event ID : 10277
Current Connections : 0
Total Connections : 0
Used Space : 781K
Events : 10000
Memory Usage : 1M
% Free : 0%
Cache Hit : 0.0
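The plaintext format shown above is a list of "label : value" pairs, so it can be read into a map for threshold checks. The sketch below assumes the output describes a single channel or queue (as when -channelname is given) and that the "Events" value is a plain integer; the class name and the depth threshold are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser for the plaintext output of MonitorChannels for one channel or queue. */
final class MonitorChannelsParser {

    /** Splits each line at its first colon into a trimmed label and a trimmed value. */
    static Map<String, String> parse(List<String> outputLines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : outputLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "label : value" line
            }
            fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return fields;
    }

    /** True when the "Events" value is greater than the caller's chosen limit. */
    static boolean exceedsDepth(Map<String, String> fields, long maxEvents) {
        return Long.parseLong(fields.getOrDefault("Events", "0")) > maxEvents;
    }
}
```

With the sample output above, exceedsDepth returns true for a limit of 5000 and false for a limit of 10000.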
Alternative Java Sample
Sample code using Java Admin API to monitor Channel and Queue depths
package com.pcbsys.nirvana.nAdminAPI.apps;

import com.pcbsys.nirvana.client.nSessionAttributes;
import com.pcbsys.nirvana.nAdminAPI.nContainer;
import com.pcbsys.nirvana.nAdminAPI.nLeafNode;
import com.pcbsys.nirvana.nAdminAPI.nNode;
import com.pcbsys.nirvana.nAdminAPI.nRealmNode;

import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/**
 * Scans the provided Realm for Channels and Queues and displays their
 * attributes (Current Depth, Total Published and Total Consumed)
 * every 10 seconds.
 *
 * Expects Realm Name (nsp://hostname:port) as runtime argument
 */
public class GetChannelsAndQueuesInfo {

    private nRealmNode realmNode;
    private List<nLeafNode> channels = new ArrayList<>();
    private List<nLeafNode> queues = new ArrayList<>();

    public GetChannelsAndQueuesInfo(String realmName)
            throws Exception {
        realmNode = new nRealmNode(new nSessionAttributes(realmName));

        scanRealmForChannelsAndQueues(realmNode.getNodes());
    }

    /**
     * Recursively scans the Realm namespace for Channels and Queues
     *
     * @param realmNamespaceNodes
     */
    private void scanRealmForChannelsAndQueues(
            final Enumeration realmNamespaceNodes) {
        while (realmNamespaceNodes.hasMoreElements()) {
            final nNode child = (nNode) realmNamespaceNodes.nextElement();

            if (child instanceof nLeafNode) {
                final nLeafNode leafNode = (nLeafNode) child;

                if (leafNode.isChannel()) {
                    channels.add(leafNode);
                } else if (leafNode.isQueue()) {
                    queues.add(leafNode);
                }
            } else if (child instanceof nContainer) {
                scanRealmForChannelsAndQueues(((nContainer) child).getNodes());
            }
        }
    }

    public nRealmNode getRealmNode() {
        return realmNode;
    }

    public List<nLeafNode> getChannels() {
        return channels;
    }

    public List<nLeafNode> getQueues() {
        return queues;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            throw new Exception("Realm Name startup argument is missing.");
        }

        GetChannelsAndQueuesInfo getChannelsAndQueuesInfo =
                new GetChannelsAndQueuesInfo(args[0]);
        System.out.println();
        System.out.println("Connected to Realm : " +
                getChannelsAndQueuesInfo.getRealmNode().getRealm().getName());
        System.out.println();

        while (true) {
            StringBuilder displayString = new StringBuilder();

            displayString.append(
                    "Channels (Name | Current Depth | Total Published | Total Consumed) \n");
            displayString.append(
                    "------------------------------------------------------------------ \n");
            for (nLeafNode oneLeaf : getChannelsAndQueuesInfo.getChannels()) {
                printLeafNode(displayString, oneLeaf);
            }

            displayString.append(
                    "\nQueues (Name | Current Depth | Total Published | Total Consumed) \n");
            displayString.append(
                    "---------------------------------------------------------------- \n");
            for (nLeafNode oneLeaf : getChannelsAndQueuesInfo.getQueues()) {
                printLeafNode(displayString, oneLeaf);
            }
            displayString.append(
                    "==================================================================");
            System.out.println();

            System.out.println(displayString);

            Thread.sleep(10000);
        }
    }

    private static void printLeafNode(StringBuilder displayString,
                                      nLeafNode oneLeaf) {
        displayString.append(oneLeaf.getAbsolutePath())
                .append(" | ")
                .append(oneLeaf.getCurrentNumberOfEvents())
                .append(" | ")
                .append(oneLeaf.getTotalPublished())
                .append(" | ")
                .append(oneLeaf.getTotalConsumed())
                .append("\n");
    }
}
Identify Large Durable Outstanding Events
Identifies channels containing Durables with a large number of outstanding events.
Command Details
Usage:
runUMTool IdentifyLargeDurableOutstandingEvents
-rname=<rname> -threshold=<threshold>
[optional_args]

Examples:
runUMTool IdentifyLargeDurableOutstandingEvents
-rname=nsp://localhost:9000 -threshold=100

Required arguments:
rname : URL of the realm to list the details of all the channels within.
threshold : Long value representing the tolerated number of outstanding events.

Optional Parameters:
username : Your Universal Messaging server username.
password : Your Universal Messaging server password.
Periodic Logging of Server Status
The Universal Messaging server writes status information to the log file at regular intervals. The interval can be configured using the StatusBroadcast realm configuration property; the default value is 5 seconds.
For information on realm configuration properties, see the section Realm Configuration in the Enterprise Manager part of the Administration Guide.
Sample status log message
ServerStatusLog> Memory=3577, Direct=3925, EventMemory=0,
Disk=277070, CPU=0.2, Scheduled=29, Queued=0,
Connections=5, BytesIn=12315, BytesOut=19876,
Published=413, Consumed=1254, QueueSize=0,
ClientsSize=0, CommQueueSize=0
The log file can be parsed to extract the server status and take appropriate preemptive actions if these parameters are deviating from set thresholds.
For more information, refer to the section Periodic Logging of Server Status of the Universal Messaging Concepts Guide.
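A status message of the form shown above can be split into name and value pairs for threshold checks. The sketch below assumes the whole message has been joined into a single string and that every value is numeric; the class name is hypothetical, and the units of the individual values are not stated here, so thresholds should be chosen against the documented meaning of each field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for "ServerStatusLog>" messages taken from the server log. */
final class ServerStatusLogParser {

    private static final String MARKER = "ServerStatusLog>";

    /** Returns the name=value pairs that follow the marker; non-numeric values are skipped. */
    static Map<String, Double> parse(String logLine) {
        Map<String, Double> values = new LinkedHashMap<>();
        int start = logLine.indexOf(MARKER);
        if (start < 0) {
            return values; // not a status message
        }
        String body = logLine.substring(start + MARKER.length());
        for (String pair : body.split(",")) {
            String[] nameAndValue = pair.trim().split("=", 2);
            if (nameAndValue.length != 2) {
                continue;
            }
            try {
                values.put(nameAndValue[0].trim(), Double.parseDouble(nameAndValue[1].trim()));
            } catch (NumberFormatException ignored) {
                // Skip anything that is not a plain number.
            }
        }
        return values;
    }
}
```

A monitoring job could then compare, for example, the Queued or Connections entry of the returned map against a limit chosen for the installation.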
Other Parameters to Monitor
Apart from the parameters mentioned above, the following system parameters should also be monitored:
*Disk utilization : Any rapid increase in disk usage should be tracked and should raise an alert. An appropriate threshold needs to be set and monitored.
*CPU utilization
*Memory utilization
*Network activity