Container Monitoring Metrics

If you have installed API Gateway using Docker or Kubernetes, Software AG recommends that you monitor the following metrics. When a metric exceeds its threshold, you can consider the severity as indicated below and perform the actions that Software AG recommends to identify, analyze, and debug the problem.

If a pod is not in the Ready state for more than 10 minutes, you can consider the severity as CRITICAL. Check the pod console log for the status or any exception, and ensure that either the issue with the existing pod is resolved or a new pod is created.
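The 10-minute rule can be evaluated from the pod's `Ready` condition. The following is a minimal Python sketch, not a Software AG tool: it assumes you already have the pod object as parsed JSON (for example from `kubectl get pod <name> -o json`), and the function names and the `"CRITICAL"` return value are illustrative only. It uses the condition's `lastTransitionTime` to measure how long the pod has been out of the Ready state.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold taken from the guideline above: not Ready for more than 10 minutes.
NOT_READY_THRESHOLD = timedelta(minutes=10)


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as '2024-05-01T10:00:00Z' into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pod_not_ready_severity(pod: dict, now: datetime) -> Optional[str]:
    """Return "CRITICAL" if the pod's Ready condition has been not True for too long, else None.

    `pod` is assumed to be the JSON of one pod, as printed by
    `kubectl get pod <name> -o json` and loaded with json.loads().
    """
    for condition in pod.get("status", {}).get("conditions", []):
        if condition.get("type") != "Ready":
            continue
        if condition.get("status") == "True":
            return None
        # lastTransitionTime records when the Ready condition last changed value.
        not_ready_since = parse_k8s_time(condition["lastTransitionTime"])
        return "CRITICAL" if now - not_ready_since > NOT_READY_THRESHOLD else None
    # No Ready condition reported yet, so this check cannot decide.
    return None


if __name__ == "__main__":
    pod = {
        "status": {
            "conditions": [
                {"type": "Ready", "status": "False",
                 "lastTransitionTime": "2024-05-01T10:00:00Z"}
            ]
        }
    }
    now = datetime(2024, 5, 1, 10, 12, tzinfo=timezone.utc)
    print(pod_not_ready_severity(pod, now))  # prints CRITICAL (12 minutes > 10)
```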

If the number of pod replicas is not equal to the number of pods in the Ready state even after 10 minutes, you can consider the severity as CRITICAL. Check the pod console log, and then identify and resolve the issue that prevents the new pod from being provisioned.
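To compare the replica count with the number of ready pods over time, a poller has to remember when the two counts first diverged. The following is a minimal sketch, assuming a hypothetical poller that passes the desired replica count and the current pod list (as parsed `kubectl ... -o json` pod objects) on each poll; the class and method names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Threshold taken from the guideline above: mismatch still present after 10 minutes.
MISMATCH_THRESHOLD = timedelta(minutes=10)


def count_ready_pods(pods: List[dict]) -> int:
    """Count the pods whose Ready condition is "True"."""
    ready = 0
    for pod in pods:
        conditions = pod.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready += 1
    return ready


class ReplicaMismatchTracker:
    """Remembers when the desired replica count and the ready pod count first diverged."""

    def __init__(self, threshold: timedelta = MISMATCH_THRESHOLD) -> None:
        self.threshold = threshold
        self.mismatch_since: Optional[datetime] = None

    def observe(self, desired_replicas: int, pods: List[dict], now: datetime) -> Optional[str]:
        if count_ready_pods(pods) == desired_replicas:
            self.mismatch_since = None  # counts agree again, so reset the timer
            return None
        if self.mismatch_since is None:
            self.mismatch_since = now  # first poll that sees the mismatch
        if now - self.mismatch_since >= self.threshold:
            return "CRITICAL"
        return None


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
    ready_pod = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
    pending_pod = {"status": {"conditions": [{"type": "Ready", "status": "False"}]}}
    tracker = ReplicaMismatchTracker()
    print(tracker.observe(2, [ready_pod, pending_pod], t0))                          # None
    print(tracker.observe(2, [ready_pod, pending_pod], t0 + timedelta(minutes=10)))  # CRITICAL
```

Because the timer starts at the first poll that sees the mismatch, the alert can fire somewhat later than 10 minutes after the actual divergence, depending on the polling interval.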

If a newly created or scaled pod is still not ready in the Kubernetes cluster 15 minutes after deployment, you can consider the severity as CRITICAL. Check the autoscaling settings, the node provisioning events, and the logs, and resolve the issue that they reveal.
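A periodic check for the 15-minute rule can compare each pod's `metadata.creationTimestamp` with the current time. The sketch below assumes pod objects in the JSON form that `kubectl get pods -o json` returns under `items`; the function names and pod names are hypothetical. It flags every pod older than 15 minutes that is still not Ready and reports its phase, since a pod stuck in `Pending` often points to scheduling or node capacity problems. Limiting the check to pods from a recent rollout or scale-out is left to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Threshold taken from the guideline above: not Ready 15 minutes after deployment.
STARTUP_GRACE = timedelta(minutes=15)


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as '2024-05-01T10:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_ready(pod: dict) -> bool:
    """True if the pod's Ready condition is "True"."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def pods_stuck_after_startup(pods: List[dict], now: datetime) -> List[Tuple[str, str]]:
    """Return (name, phase) for pods older than the grace period that are still not Ready.

    A phase of "Pending" usually means the pod has not been scheduled yet or
    its containers have not started.
    """
    stuck = []
    for pod in pods:
        created = parse_k8s_time(pod["metadata"]["creationTimestamp"])
        if now - created > STARTUP_GRACE and not is_ready(pod):
            stuck.append((pod["metadata"]["name"], pod.get("status", {}).get("phase", "Unknown")))
    return stuck


if __name__ == "__main__":
    pods = [
        {"metadata": {"name": "example-pod-1", "creationTimestamp": "2024-05-01T10:00:00Z"},
         "status": {"phase": "Pending", "conditions": []}},
        {"metadata": {"name": "example-pod-2", "creationTimestamp": "2024-05-01T10:10:00Z"},
         "status": {"phase": "Pending", "conditions": []}},
    ]
    now = datetime(2024, 5, 1, 10, 20, tzinfo=timezone.utc)
    print(pods_stuck_after_startup(pods, now))  # [('example-pod-1', 'Pending')]
```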

If the desired and ready replica counts of a StatefulSet do not match for longer than 5 minutes, you can consider the severity as CRITICAL. Check the pod console logs for the status or exception, and resolve the issue.

Note:
StatefulSet is a Kubernetes workload API object that manages the deployment and scaling of a set of pods.
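For a StatefulSet, the desired count is `spec.replicas` and the ready count is `status.readyReplicas`. The following is a minimal sketch, assuming a hypothetical poller that stores timestamped copies of the StatefulSet object (for example from `kubectl get statefulset <name> -o json`); the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Threshold taken from the guideline above: mismatch for longer than 5 minutes.
STATEFULSET_THRESHOLD = timedelta(minutes=5)


def replicas_mismatch(statefulset: dict) -> bool:
    """True if the desired replica count differs from the ready replica count."""
    desired = statefulset.get("spec", {}).get("replicas", 1)  # Kubernetes defaults replicas to 1
    # status.readyReplicas is omitted from the object when it is zero.
    ready = statefulset.get("status", {}).get("readyReplicas", 0)
    return ready != desired


def statefulset_severity(samples: List[Tuple[datetime, dict]]) -> Optional[str]:
    """Evaluate chronologically ordered (poll_time, statefulset_json) samples.

    Returns "CRITICAL" if the samples end in a run of mismatches that spans
    more than the threshold, else None.
    """
    mismatch_start: Optional[datetime] = None
    for poll_time, statefulset in samples:
        if replicas_mismatch(statefulset):
            if mismatch_start is None:
                mismatch_start = poll_time
        else:
            mismatch_start = None  # a matching sample ends the current run
    if mismatch_start is not None and samples[-1][0] - mismatch_start > STATEFULSET_THRESHOLD:
        return "CRITICAL"
    return None


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
    degraded = {"spec": {"replicas": 3}, "status": {"readyReplicas": 2}}
    samples = [(t0 + timedelta(minutes=m), degraded) for m in (0, 3, 6)]
    print(statefulset_severity(samples))  # CRITICAL (6 minutes > 5)
```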

If Elasticsearch pods stop and restart continuously for more than 10 minutes, you can consider the severity as CRITICAL. Describe the pod to check its status and any errors, restart the pod, and check the startup logs. Ensure that all the required system resources are available to the pod, and check the cluster health.
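One way to make "restarting continuously for more than 10 minutes" measurable is to track the containers' `restartCount` and require several restarts within a 10-minute window. This is a sketch under stated assumptions, not a Software AG definition: `MIN_RESTARTS = 3` is a hypothetical value to tune, the samples are assumed to come from periodic polls of a single Elasticsearch pod (as parsed `kubectl get pod <name> -o json` output), and the function names are made up. A recreated pod starts with a fresh restart count, so its history should start over.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Window taken from the guideline above (more than 10 minutes). MIN_RESTARTS is a
# hypothetical tuning value, not a Software AG recommendation.
RESTART_WINDOW = timedelta(minutes=10)
MIN_RESTARTS = 3


def total_restarts(pod: dict) -> int:
    """Sum restartCount over all containers of one pod."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return sum(s.get("restartCount", 0) for s in statuses)


def restart_loop_detected(samples: List[Tuple[datetime, int]], now: datetime) -> bool:
    """Decide from chronologically ordered (poll_time, total_restarts) samples.

    Compares the newest sample with the last sample taken at or before the
    start of the window; without such a sample the history is too short.
    """
    window_start = now - RESTART_WINDOW
    baseline = None
    for poll_time, count in samples:
        if poll_time <= window_start:
            baseline = count
    if baseline is None:
        return False  # history does not yet cover a full window
    return samples[-1][1] - baseline >= MIN_RESTARTS


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
    history = [(t0, 0), (t0 + timedelta(minutes=5), 2), (t0 + timedelta(minutes=11), 4)]
    print(restart_loop_detected(history, t0 + timedelta(minutes=11)))  # True
```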

If 10% or less of the persistent volume is free at any time, you can consider the severity as CRITICAL. Check the cluster health and perform the same cleanup that you would perform for the Elasticsearch metrics.
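The free-space rule is a simple percentage. The following is a minimal sketch, assuming you can obtain the available and total bytes of the volume, for example from the kubelet metrics `kubelet_volume_stats_available_bytes` and `kubelet_volume_stats_capacity_bytes` if your monitoring stack scrapes them; the function names are hypothetical.

```python
from typing import Optional

# Threshold taken from the guideline above: 10% or less of the volume is free.
FREE_SPACE_THRESHOLD_PERCENT = 10.0


def free_space_percent(available_bytes: int, capacity_bytes: int) -> float:
    """Percentage of the persistent volume that is still free."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return 100.0 * available_bytes / capacity_bytes


def volume_severity(available_bytes: int, capacity_bytes: int) -> Optional[str]:
    """Return "CRITICAL" when free space is at or below the threshold, else None."""
    if free_space_percent(available_bytes, capacity_bytes) <= FREE_SPACE_THRESHOLD_PERCENT:
        return "CRITICAL"
    return None


if __name__ == "__main__":
    gib = 1024 ** 3
    # A 100 GiB volume with 8 GiB available is 8% free, so it is CRITICAL.
    print(free_space_percent(8 * gib, 100 * gib))  # 8.0
    print(volume_severity(8 * gib, 100 * gib))     # CRITICAL
```

Treating exactly 10% free as CRITICAL reflects reading "only 10% of the persistent volume is free" as "10% or less".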

If the persistent volume status shows XXX at any time, you can consider the severity as CRITICAL and check the API Data Store cluster status.

You can use external tools to build dashboards and to visualize metrics and logs.