Container Metrics

If you have installed API Gateway through Docker or Kubernetes, Software AG recommends monitoring the following metrics to check whether the API Data Store container is healthy. When a metric exceeds its threshold value, treat the severity as stated below and perform the actions that Software AG recommends to identify, analyze, and debug the problem.

Metric	Description
PodNotReady	If the pod status is not ready for more than 10 minutes, consider the severity as CRITICAL and check the pod console log to find a status or exception. Ensure that either the issue with the existing pod is resolved or a new pod is created.
DeploymentReplicasMismatch	If the pod replica count is not equal to the number of pods in the ready state even after 10 minutes, consider the severity as CRITICAL. Check the pod console log, then identify and resolve the new pod provisioning issue.
NodeNotReady	If a newly created or scaled pod is not ready in the Kubernetes cluster even after 15 minutes of deployment, consider the severity as CRITICAL. Check the autoscaling settings and the node provisioning events and logs, then identify and resolve the issue discovered from the logs.
StatefulSetReplicasMismatch	If the StatefulSet replicas mismatch for longer than 5 minutes, consider the severity as CRITICAL. Check the pod console logs to find the status or exception, and resolve it. Note: StatefulSet is a workload API that manages the deployment and scaling of a set of pods.
PodCrashLooping	If API Data Store pods are stopping and restarting continuously for more than 10 minutes, consider the severity as CRITICAL and describe the pod status and check for any error. Restart the pod and check the startup logs. Check the availability of system resources and cluster health.
PVC_Usage	If only 10% of the persistent volume is free at any given point of time, consider the severity as CRITICAL. Check the cluster health and perform the same cleanup that you would perform for the API Data Store metrics.
PVC_Error	If the persistent volume status shows XXX at any given point of time, consider the severity as CRITICAL and check the API Data Store cluster status.
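
The time-based thresholds in the table can be collected into a small lookup for use in a custom health check. The following Python sketch is illustrative only and is not part of API Gateway or Kubernetes: the names CRITICAL_AFTER_MINUTES, is_critical, and pvc_usage_is_critical are hypothetical, and the code assumes that you already obtain the duration of each condition and the volume sizes from your own monitoring tooling. PVC_Error is left out because its triggering status is not specified above.

```python
# Minutes a condition may persist before it is treated as CRITICAL,
# copied from the table above. The dictionary name is hypothetical.
CRITICAL_AFTER_MINUTES = {
    "PodNotReady": 10,
    "DeploymentReplicasMismatch": 10,
    "NodeNotReady": 15,
    "StatefulSetReplicasMismatch": 5,
    "PodCrashLooping": 10,
}

# PVC_Usage is CRITICAL when only this fraction of the volume is free.
PVC_MIN_FREE_FRACTION = 0.10


def is_critical(metric: str, minutes_in_condition: float) -> bool:
    """Return True if the condition has lasted longer than its threshold."""
    return minutes_in_condition > CRITICAL_AFTER_MINUTES[metric]


def pvc_usage_is_critical(capacity_bytes: int, available_bytes: int) -> bool:
    """Return True if 10% or less of the persistent volume is free."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return available_bytes / capacity_bytes <= PVC_MIN_FREE_FRACTION
```

For example, is_critical("StatefulSetReplicasMismatch", 6) evaluates to True because the mismatch has lasted longer than 5 minutes, while is_critical("NodeNotReady", 12) evaluates to False because the 15-minute threshold has not yet been passed.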