Monitoring periodic status
In addition to the status that is shown on the card for a model, you can enable the generation of periodic status, which is published as Cumulocity IoT operations or events. See Configuration for information on setting the status_device_name and status_period_secs tenant options.
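The two tenant options can be pictured as request bodies for the Cumulocity IoT tenant options REST API (POST /tenant/options). The following is a minimal sketch, not a reference. The analytics.builder category, the helper name status_tenant_options and the example values (device name apama_status, a 60-second period) are assumptions, so check the Configuration topic for the authoritative option semantics. The sketch only builds the JSON bodies and does not send them.

```python
from __future__ import annotations

import json

# Assumption: Analytics Builder tenant options use this category.
CATEGORY = "analytics.builder"


def status_tenant_options(device_name: str, period_secs: int) -> list[dict]:
    """Build tenant option bodies that enable periodic status (hypothetical helper)."""
    if period_secs <= 0:
        raise ValueError("period_secs must be positive")
    return [
        {"category": CATEGORY, "key": "status_device_name", "value": device_name},
        # Tenant option values are transmitted as strings.
        {"category": CATEGORY, "key": "status_period_secs", "value": str(period_secs)},
    ]


if __name__ == "__main__":
    # Each body would be sent in a separate POST /tenant/options request.
    for body in status_tenant_options("apama_status", 60):
        print(json.dumps(body))
```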
Each operation has the following parameters:
Parameter | Description |
models_running | Information about deployed models that are running. |
models_failed | Information about deployed models that have failed. |
chain_diagnostics | Diagnostic information about all chains of models that are present. |
apama_status | The Apama correlator status metrics. Many status names correspond to the key names in the Apama REST API. The values are returned by the getValues() action of the com.apama.correlator.EngineStatus event and exposed via the REST API. |
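In the example at the end of this topic, every apama_status value is delivered as a JSON string, including counters such as numMonitors and text values such as "<none>" or "ONLINE". A minimal sketch, using a hypothetical helper name, that converts the numeric values so that they can be compared or plotted:

```python
from __future__ import annotations


def parse_apama_status(raw: dict[str, str]) -> dict[str, object]:
    """Convert numeric-looking apama_status values to int or float (hypothetical helper)."""
    parsed: dict[str, object] = {}
    for key, value in raw.items():
        try:
            parsed[key] = int(value)
        except ValueError:
            try:
                parsed[key] = float(value)
            except ValueError:
                # Non-numeric values such as "<none>" or "ONLINE" stay as strings.
                parsed[key] = value
    return parsed


if __name__ == "__main__":
    sample = {
        "numMonitors": "27",
        "uptime": "1383561",
        "slowestReceiver": "<none>",
        "user-httpServer.status": "ONLINE",
    }
    print(parse_apama_status(sample))
```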
Model status
The following information is published for each deployed model that is currently running or has failed:
Name | Description |
mode | The mode of the deployed model. It is SIMULATION for models deployed in simulation mode. Otherwise, it is PRODUCTION. |
modeProperties | Any mode-specific properties of the model. This includes the start and end time of the simulation for models running in the SIMULATION mode. |
numModelEvaluations | The total number of times the model has been evaluated since it was deployed. |
numBlockEvaluations | The total number of block evaluations in the model since it was deployed, that is, the sum of the evaluation counts of all blocks in the model. |
avgBlockEvaluations | The average number of blocks that have been evaluated per model evaluation. |
numOutputGenerated | The total number of outputs generated by the model since it was deployed. |
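Because avgBlockEvaluations is the average number of blocks evaluated per model evaluation, it can be cross-checked against numBlockEvaluations divided by numModelEvaluations. The following sketch summarizes one entry of models_running or models_failed. The helper name is hypothetical, and treating startTime and endTime in modeProperties as Unix epoch seconds is an assumption based on the values in the example.

```python
from __future__ import annotations

from datetime import datetime, timezone


def summarize_model(name: str, status: dict) -> dict:
    """Summarize one models_running or models_failed entry (hypothetical helper)."""
    evals = status["numModelEvaluations"]
    summary = {
        "name": name,
        "mode": status["mode"],
        # Compare with the published avgBlockEvaluations, which may be shown
        # with fewer decimal places.
        "blocksPerEvaluation": status["numBlockEvaluations"] / evals if evals else 0.0,
    }
    props = status.get("modeProperties", {})
    if status["mode"] == "SIMULATION" and "startTime" in props and "endTime" in props:
        # Assumption: startTime and endTime are Unix epoch seconds.
        summary["simulationStart"] = datetime.fromtimestamp(props["startTime"], tz=timezone.utc)
        summary["simulationEnd"] = datetime.fromtimestamp(props["endTime"], tz=timezone.utc)
    return summary


if __name__ == "__main__":
    print(summarize_model("Package Tracking", {
        "mode": "SIMULATION",
        "modeProperties": {"startTime": 1533160604, "endTime": 1533160614},
        "numModelEvaluations": 68,
        "numBlockEvaluations": 967,
        "avgBlockEvaluations": 14.2,
        "numOutputGenerated": 50,
    }))
```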
This information about each model provides insight into the performance and behavior of models. For example, a model with a much higher numBlockEvaluations than another model may be consuming more resources, even if its numModelEvaluations is low. Similarly, comparing numOutputGenerated with numModelEvaluations shows whether a model is producing output at the expected rate relative to the number of times it is evaluated.
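The comparison described above can be sketched as a small ranking over the models_running or models_failed map. This is an illustration with a hypothetical helper name. It orders models by numBlockEvaluations and reports outputs per model evaluation; what counts as the "expected" output rate depends on each model and is left to the reader.

```python
from __future__ import annotations


def compare_models(models: dict[str, dict]) -> list[tuple[str, int, float]]:
    """Return (name, numBlockEvaluations, outputs per evaluation), most block evaluations first."""
    rows = []
    for name, status in models.items():
        evals = status["numModelEvaluations"]
        output_rate = status["numOutputGenerated"] / evals if evals else 0.0
        rows.append((name, status["numBlockEvaluations"], output_rate))
    # Models with the most block evaluations come first, as they may be
    # consuming the most resources.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    models = {
        "Package Tracking": {"numModelEvaluations": 68, "numBlockEvaluations": 967, "numOutputGenerated": 50},
        "Build Pipeline": {"numModelEvaluations": 214, "numBlockEvaluations": 671, "numOutputGenerated": 4},
    }
    for name, blocks, rate in compare_models(models):
        print(f"{name}: {blocks} block evaluations, {rate:.3f} outputs per evaluation")
```

With the values from the example, "Package Tracking" is listed first even though it was evaluated far fewer times than "Build Pipeline", and it also produces output much more often per evaluation.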
You can also monitor the status using the Apama REST API or the Management interface, which is an EPL plug-in. For further information, see "Managing and Monitoring over REST" in Deploying and Managing Apama Applications and "Using the Management interface" in Developing Apama Applications, both in the Apama product documentation.
Chain diagnostics
The following information is published for all chains that are present:
Name | Description |
creationTime | The time when this chain was created. |
executionCount | The number of times the chain was evaluated. (1) |
modelsInEvalOrder | A list of model identifiers in the order in which the models were evaluated. |
pendingTimersCount | The number of pending timers that are behind the current time. |
maxTime | The maximum time taken to evaluate the chain. (1) |
minTime | The minimum time taken to evaluate the chain. (1) |
meanTime | The mean time taken to evaluate the chain. (1) |
execBucket | The execution time statistics of the chain. (1)(2) |
In the published data, the maxTime, minTime, meanTime and execBucket fields are nested in a timeData object.
(1) These fields are updated whenever the chain is evaluated, fully or partially. Partial evaluation of a chain means that only some models of the chain are evaluated.
(2) There are 21 buckets, each storing the number of times the chain execution time fell within the bucket's range. Each bucket spans timedelay_secs/10 seconds, except for the last bucket, which extends to infinity. For example, if timedelay_secs is 2 seconds, the first bucket holds the number of times the chain execution took up to 0.2 seconds, the second bucket holds the number of times it took more than 0.2 seconds but no more than 0.4 seconds, and so on. See also the following example:
Bucket | Execution time range (seconds) |
1 | 0 - 0.2 |
2 | 0.2 - 0.4 |
3 | 0.4 - 0.6 |
... | ... |
20 | 3.8 - 4.0 |
21 | 4.0 - infinity |
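The bucket arithmetic can be sketched as follows, assuming 1-based bucket numbers as in the table above and the boundary rule from note 2, where a time exactly on a boundary belongs to the lower bucket. The function names are hypothetical, and timedelay_secs must be supplied by the caller because it is not part of the published chain diagnostics.

```python
from __future__ import annotations

import math

NUM_BUCKETS = 21


def bucket_number(exec_time: float, timedelay_secs: float) -> int:
    """Return the 1-based execBucket number for an execution time in seconds."""
    size = timedelay_secs / 10
    # Bucket n covers (n - 1) * size < t <= n * size; bucket 21 is open-ended.
    return min(NUM_BUCKETS, max(1, math.ceil(exec_time / size)))


def bucket_ranges(timedelay_secs: float) -> list[tuple[float, float]]:
    """Return the (lower, upper) bounds in seconds of all 21 buckets."""
    size = timedelay_secs / 10
    ranges = [((n - 1) * size, n * size) for n in range(1, NUM_BUCKETS)]
    ranges.append(((NUM_BUCKETS - 1) * size, math.inf))
    return ranges


def describe_buckets(exec_bucket: list[int], timedelay_secs: float) -> list[str]:
    """Describe the non-empty buckets of an execBucket array."""
    lines = []
    for count, (low, high) in zip(exec_bucket, bucket_ranges(timedelay_secs)):
        if count:
            upper = "infinity" if math.isinf(high) else f"{high:g}"
            lines.append(f"{low:g} - {upper} s: {count}")
    return lines


if __name__ == "__main__":
    exec_bucket = [2, 1, 0, 1] + [0] * 17  # execBucket from the example below
    print(describe_buckets(exec_bucket, timedelay_secs=2))
    print("total executions:", sum(exec_bucket))
```

Applied to the execBucket array from the example below with a hypothetical timedelay_secs of 2, this reports two executions in the first bucket, one in the second and one in the fourth. The counts add up to 4, which matches executionCount in that example.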
Slowest chain status
When chains of models with a high throughput are deployed across multiple workers, a chain may fall behind in processing input events, creating a backlog of input events that are still to be processed. Such chains are referred to as slow chains. A message is written to the correlator log if the slowest chain is delayed by more than 1 second. For example:
Analytics Builder chain of models "Model 1", "Model 2", "Model 3" is slow by 3 seconds.
See Accessing the correlator log for information on where to find the correlator log.
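To pick these messages out of a correlator log, a regular expression can be matched against each line. This is a sketch that assumes the message text matches the example above exactly; the log line prefix (timestamp, log level and so on) is not described here, so the pattern is searched for anywhere in the line. Model names that themselves contain double quotes are not handled. The names SLOW_CHAIN and parse_slow_chain are hypothetical.

```python
from __future__ import annotations

import re

# Matches the slow-chain message text; re.search ignores any log line prefix.
SLOW_CHAIN = re.compile(
    r'Analytics Builder chain of models (?P<models>".*") '
    r"is slow by (?P<secs>\d+(?:\.\d+)?) seconds"
)


def parse_slow_chain(line: str) -> tuple[list[str], float] | None:
    """Return (model names, delay in seconds), or None if the line is not a slow-chain message."""
    match = SLOW_CHAIN.search(line)
    if match is None:
        return None
    models = re.findall(r'"([^"]*)"', match.group("models"))
    return models, float(match.group("secs"))


if __name__ == "__main__":
    msg = 'Analytics Builder chain of models "Model 1", "Model 2", "Model 3" is slow by 3 seconds.'
    print(parse_slow_chain(msg))
```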
The following information on the slowest chain is also available in the periodic status that is published as Cumulocity IoT operations or events, within the apama_status parameter:
Name | Description |
user-analyticsbuilder.slowestChain.models | All models contained in the slowest chain. |
user-analyticsbuilder.slowestChain.delaySec | The number of seconds the chain lags behind in processing the input events. |
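In the example below, both slowest-chain values are strings: the models value is a comma-separated list of quoted model names, and the delay is a number in string form. A minimal sketch with a hypothetical helper name, assuming the model names are quoted as in the example so that wrapping the value in brackets yields a valid JSON array:

```python
from __future__ import annotations

import json


def slowest_chain(apama_status: dict[str, str]) -> tuple[list[str], float] | None:
    """Return (models, delay in seconds), or None if the slowest-chain keys are absent."""
    models_raw = apama_status.get("user-analyticsbuilder.slowestChain.models")
    delay_raw = apama_status.get("user-analyticsbuilder.slowestChain.delaySec")
    if models_raw is None or delay_raw is None:
        return None
    # Assumption: the value is a list of JSON-quoted names, so adding
    # brackets turns it into a JSON array.
    models = json.loads(f"[{models_raw}]")
    return models, float(delay_raw)


if __name__ == "__main__":
    status = {
        "user-analyticsbuilder.slowestChain.models": '"Model 1", "Model 2", "Model 3"',
        "user-analyticsbuilder.slowestChain.delaySec": "3",
    }
    print(slowest_chain(status))
```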
Example
The following is an example of the status operation data that is published to Cumulocity IoT:
{
"creationTime": "2021-01-05T21:48:54.620+02:00",
"deviceId": "6518",
"deviceName": "apama_status",
"id": "8579",
"self": "https://myown.iot.com/devicecontrol/operations/8579",
"status": "PENDING",
"models_running": {
"Package Tracking": {
"mode": "SIMULATION",
"modeProperties":{"startTime":1533160604, "endTime":1533160614},
"numModelEvaluations": 68,
"numBlockEvaluations": 967,
"avgBlockEvaluations": 14.2,
"numOutputGenerated": 50
}
},
"models_failed": {
"Build Pipeline": {
"mode": "PRODUCTION",
"numModelEvaluations": 214,
"numBlockEvaluations": 671,
"avgBlockEvaluations": 3.13,
"numOutputGenerated": 4
}
},
"chain_diagnostics": {
"780858_780858": {
"creationTime": 1600252455.164188,
"executionCount": 4,
"modelsInEvalOrder": ["780858_780858", "780860_780860"],
"pendingTimersCount": 1,
"timeData": {
"execBucket": [2,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
"maxTime": 0.00014781951904296875,
"meanTime": 0.0001152356465657552,
"minTime": 6.29425048828125e-05
}
}
},
"apama_status": {
"user-analyticsbuilder.slowestChain.models": "\"Model 1\", \"Model 2\", \"Model 3\"",
"user-analyticsbuilder.slowestChain.delaySec": "3",
"user-analytics-oldEventsDropped": "1",
"numJavaApplications": "1",
"numMonitors": "27",
"user-httpServer.eventsTowardsHost": "1646",
"numFastTracked": "183",
"user-httpServer.authenticationFailures": "4",
"numContexts": "5",
"slowestReceiverQueueSize": "0",
"numQueuedFastTrack": "0",
"mostBackedUpInputContext": "<none>",
"user-httpServer.failedRequests": "4",
"slowestReceiver": "<none>",
"numInputQueuedInput": "0",
"user-httpServer.staticFileRequests": "0",
"numReceived": "1690",
"user-httpServer.failedRequests.marginal": "1",
"numEmits": "1687",
"numOutEventsUnAcked": "1",
"user-httpServer.authenticationFailures.marginal": "1",
"user-httpServer.status": "ONLINE",
"numProcesses": "48",
"numEventTypes": "228",
"virtualMemorySize": "3177968",
"numQueuedInput": "0",
"numConsumers": "3",
"numOutEventsQueued": "1",
"uptime": "1383561",
"numListeners": "207",
"numOutEventsSent": "1686",
"mostBackedUpICQueueSize": "0",
"numSnapshots": "0",
"mostBackedUpICLatency": "0",
"numProcessed": "1940",
"numSubListeners": "207"
}
}
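A consumer of these operations might turn them into a short list of issues for alerting. The following sketch is one possible approach, not part of Analytics Builder. The helper name is hypothetical, and the rules are assumptions: every failed model is reported, a slowest-chain delay above 1 second is reported (mirroring the correlator log threshold described earlier), and any pendingTimersCount above zero is treated as a sign of lag. It runs against a trimmed copy of the example above.

```python
from __future__ import annotations

import json


def health_issues(operation: dict, max_delay_secs: float = 1.0) -> list[str]:
    """Collect readable issues from one periodic status operation (hypothetical helper)."""
    issues = []
    # Any entry under models_failed is reported.
    for name in operation.get("models_failed", {}):
        issues.append(f"model failed: {name.strip()}")
    # The slowest-chain delay is published as a string in apama_status.
    status = operation.get("apama_status", {})
    delay = float(status.get("user-analyticsbuilder.slowestChain.delaySec", "0"))
    if delay > max_delay_secs:
        issues.append(f"slowest chain is behind by {delay:g} s")
    # Assumption: timers behind the current time indicate that a chain is lagging.
    for chain_id, chain in operation.get("chain_diagnostics", {}).items():
        pending = chain.get("pendingTimersCount", 0)
        if pending > 0:
            issues.append(f"chain {chain_id} has {pending} pending timer(s)")
    return issues


# A trimmed copy of the example operation.
EXAMPLE = """{
  "models_failed": {"Build Pipeline": {"mode": "PRODUCTION", "numModelEvaluations": 214}},
  "chain_diagnostics": {"780858_780858": {"pendingTimersCount": 1}},
  "apama_status": {"user-analyticsbuilder.slowestChain.delaySec": "3"}
}"""

if __name__ == "__main__":
    for issue in health_issues(json.loads(EXAMPLE)):
        print(issue)
```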