 
Container Metrics
If you have installed API Gateway through Docker or Kubernetes, Software AG recommends monitoring the following metrics to check if the container is healthy. When the metrics exceed the threshold value, consider the severity as mentioned and perform the possible actions that Software AG recommends to identify, analyze, and debug the problem.
PodNotReady
If the pod is not in the Ready state for more than 10 minutes, consider the severity as CRITICAL.
PodRestarting
If the application inside the pod does not come up within 1 minute, consider the severity as CRITICAL.
PodCrashLooping
If the API Gateway pod restarts continuously for 15 minutes, consider the severity as CRITICAL.
Perform the following actions to identify the problem when all three events occur:
*Check the cluster status of API Gateway using the REST endpoint GET /rest/apigateway/health/engine to verify that API Gateway and its components are healthy and in cluster mode.
*Check the possible cause of the pod restart, such as pod reallocation, node autoscaling, and so on.
*Check the node pool resource availability.
*Check the previous logs of the pod for any exception.
*Check the pod events to find the status of the pod; a scripted version of these pod checks is sketched after this list.
*If the tenant is in cluster mode, check the Terracotta client logs for errors in Terracotta communication.
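For example, the pod checks above can be scripted with the official Kubernetes Python client. This is a minimal sketch, not part of the product: the namespace apigateway and the label selector app=api-gateway are assumptions that must match your deployment.

  from kubernetes import client, config
  from kubernetes.client.rest import ApiException

  config.load_kube_config()  # use config.load_incluster_config() when running in a pod
  v1 = client.CoreV1Api()

  # Assumed namespace and label selector; adjust to your deployment.
  namespace = "apigateway"
  pods = v1.list_namespaced_pod(namespace, label_selector="app=api-gateway")

  for pod in pods.items:
      name = pod.metadata.name
      print(name, "phase:", pod.status.phase)

      # High restart counts point to PodRestarting or PodCrashLooping.
      for cs in pod.status.container_statuses or []:
          print("  container:", cs.name, "ready:", cs.ready, "restarts:", cs.restart_count)

      # Previous logs of the pod, for exceptions raised before the last restart.
      try:
          print(v1.read_namespaced_pod_log(name, namespace, previous=True, tail_lines=50))
      except ApiException:
          pass  # the pod has no previous instance

      # Pod events, to find the status of the pod.
      events = v1.list_namespaced_event(
          namespace, field_selector="involvedObject.name=" + name)
      for e in events.items:
          print("  event:", e.reason, "-", e.message)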
NodeNotReady
If a new node in the Kubernetes cluster is not in the Ready state for more than 15 minutes, consider the severity as CRITICAL.
Perform the following actions to identify the problem:
*Check the autoscaling settings.
*Check the logs for the provisioning of the new node.
*Check if there is any issue with the provisioning of the new pod.
*Ensure that the status of the node is Ready; a scripted check is sketched after this list.
*Ensure that the pod reallocation is completed.
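The node readiness check referenced above can be scripted with the same Kubernetes Python client; a minimal sketch that only assumes cluster access through your kubeconfig:

  from kubernetes import client, config

  config.load_kube_config()
  v1 = client.CoreV1Api()

  # Report the Ready condition of every node in the cluster.
  for node in v1.list_node().items:
      for cond in node.status.conditions:
          if cond.type == "Ready":
              print(node.metadata.name, "Ready:", cond.status,
                    "reason:", cond.reason, "since:", cond.last_transition_time)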
DeploymentReplicasMismatch
If the desired replica count of the deployment does not match the number of pods in the Ready state for more than 10 minutes, consider the severity as CRITICAL.
Perform the following actions to identify the problem:
*When replicas mismatch, Kubernetes spawns a new pod; check whether the new pod is stuck in any state (for example, Init or CrashLoopBackOff). A scripted replica check is sketched after this list.
*If the pod is stuck in any state, Kubernetes deletes the pod and ensures that a new, healthy pod is created.
*Check the pod events for errors to find the status of the pod.
*Check the previous logs of the pod for any exception.
*Check the cluster status of API Gateway using the REST endpoint GET /rest/apigateway/health/engine to verify that API Gateway and its components are healthy and in cluster mode.
*Check the node pool resource availability.
*Check whether the new node is in the Ready state.
*Check if there is any issue with the provisioning of the new pod.
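The replica comparison referenced in the first step can be scripted as follows. This is a minimal sketch; the deployment name api-gateway and the namespace apigateway are assumptions.

  from kubernetes import client, config

  config.load_kube_config()
  apps = client.AppsV1Api()

  # Assumed deployment name and namespace; adjust to your installation.
  dep = apps.read_namespaced_deployment("api-gateway", "apigateway")
  desired = dep.spec.replicas or 0
  ready = dep.status.ready_replicas or 0

  # A persistent difference corresponds to DeploymentReplicasMismatch.
  if desired != ready:
      print("Replica mismatch: desired=%d, ready=%d" % (desired, ready))
  else:
      print("All %d replicas are ready" % desired)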
Additionally, if a pod restarts for any reason, perform the following steps to verify the health of the new pod.
*Check the readiness of the pod.
*Check the cluster status of API Gateway using the REST endpoint GET /rest/apigateway/health/engine to verify that API Gateway and its components are healthy and in cluster mode; a sketch of this call is shown after this list.
*Check the possible cause of the pod restart, such as pod reallocation, node autoscaling, and so on.
*Check the previous logs of the pod for any exception.
*Check the pod events to find the reason for the restart.
*If the tenant is in cluster mode, check the Terracotta client logs for errors in Terracotta communication.
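The health endpoint call referenced above can also be scripted. This is a minimal sketch, assuming a hypothetical host apigw.example.com, port 5555, and Administrator credentials; it prints the returned JSON rather than assuming a particular response schema.

  import requests

  # Assumed host, port, and credentials; adjust to your installation.
  url = "http://apigw.example.com:5555/rest/apigateway/health/engine"
  resp = requests.get(url, auth=("Administrator", "manage"), timeout=10)
  resp.raise_for_status()

  # The response reports the health of API Gateway and its components;
  # inspect it to confirm that they are healthy and in cluster mode.
  print(resp.json())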
Analyze Trend
You can use external tools to build dashboards and to visualize metrics and logs.
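For example, if your cluster runs Prometheus with kube-state-metrics (an assumption; neither ships with API Gateway), you can query metrics such as pod readiness over the Prometheus HTTP API and feed the results into a dashboard:

  import requests

  # Assumed Prometheus address; adjust to your monitoring stack.
  prom = "http://prometheus.example.com:9090"

  # kube-state-metrics exposes pod readiness; pods that stay not Ready
  # correspond to the PodNotReady condition described above.
  query = 'kube_pod_status_ready{condition="false"} == 1'
  resp = requests.get(prom + "/api/v1/query", params={"query": query}, timeout=10)
  for result in resp.json()["data"]["result"]:
      print(result["metric"].get("namespace"), result["metric"].get("pod"))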