Metric | Description |
PodNotReady | If the pod status is not ready for more than 10 minutes, consider the severity as CRITICAL and check the pod console log to find a status or exception. Ensure that either the issue with the existing pod is resolved or a new pod is created. |
DeploymentReplicasMismatch | If the number of pod replicas does not equal the number of pods in the ready state, even after 10 minutes, consider the severity as CRITICAL, check the pod console log, and identify and resolve the new pod provisioning issue. |
NodeNotReady | If a newly created or scaled pod is not ready in the Kubernetes cluster even after 15 minutes of deployment, consider the severity as CRITICAL, check the autoscaling settings and the node provisioning events and logs, and identify and resolve the issue discovered from the logs. |
StatefulSetReplicasMismatch | If the StatefulSet replicas mismatch persists for longer than 5 minutes, consider the severity as CRITICAL, check the pod console logs to find the status or exception, and resolve it. Note: A StatefulSet is a Kubernetes workload API object that manages the deployment and scaling of a set of pods. |
PodCrashLooping | If API Data Store pods stop and restart continuously for more than 10 minutes, consider the severity as CRITICAL, describe the pod to view its status, and check for errors. Restart the pod and check the startup logs. Check the availability of system resources and the cluster health. |
PVC_Usage | If only 10% of the persistent volume is free at any given point in time, consider the severity as CRITICAL, check the cluster health, and perform the same cleanup that you would perform for the API Data Store metrics. |
PVC_Error | If the persistent volume status shows XXX at any given point in time, consider the severity as CRITICAL and check the API Data Store cluster status. |
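
The thresholds in the table can be expressed as a small lookup for use in a custom health check. The following is a minimal sketch, not a product API: the names `DURATION_THRESHOLDS_MIN`, `is_critical`, and `pvc_usage_is_critical` are hypothetical, and reading "more than" or "even after" as a strict comparison is an assumption. PVC_Error is left out because the table does not name the status value that triggers it.

```python
# Duration thresholds in minutes, copied from the table above.
DURATION_THRESHOLDS_MIN = {
    "PodNotReady": 10,
    "DeploymentReplicasMismatch": 10,
    "NodeNotReady": 15,
    "StatefulSetReplicasMismatch": 5,
    "PodCrashLooping": 10,
}

# PVC_Usage becomes CRITICAL when only this share of the volume is free.
PVC_MIN_FREE_PERCENT = 10.0


def is_critical(metric: str, minutes_in_state: float) -> bool:
    """Return True if the condition has lasted longer than its threshold.

    Assumption: the comparison is strict, so a condition that has lasted
    exactly the threshold duration is not yet CRITICAL.
    """
    if metric not in DURATION_THRESHOLDS_MIN:
        raise KeyError(f"No duration threshold defined for metric: {metric}")
    return minutes_in_state > DURATION_THRESHOLDS_MIN[metric]


def pvc_usage_is_critical(capacity_bytes: int, used_bytes: int) -> bool:
    """Return True if 10% or less of the persistent volume is free."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    free_percent = 100.0 * (capacity_bytes - used_bytes) / capacity_bytes
    return free_percent <= PVC_MIN_FREE_PERCENT
```

For example, `is_critical("StatefulSetReplicasMismatch", 6)` evaluates to `True` under these assumptions, while `is_critical("NodeNotReady", 12)` evaluates to `False`. In practice these conditions are usually evaluated by the monitoring system's own alert rules; the sketch only makes the thresholds explicit.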