 
Container Metrics
If you have installed API Gateway through Docker or Kubernetes, Software AG recommends monitoring the following metrics to check whether the container is healthy. When a metric exceeds its threshold value, treat it with the severity mentioned and perform the actions that Software AG recommends to identify, analyze, and debug the problem.
Metric: PodNotReady
Description: If the status of the pod is not ready for more than 10 minutes, consider the severity as CRITICAL.
Metric: PodRestarting
Description: If the application inside the pod is not up within 1 minute, consider the severity as CRITICAL.
Metric: PodCrashLooping
Description: If the API Gateway pod restarts continuously for 15 minutes, consider the severity as CRITICAL.
Perform the following actions to identify the problem when all three events occur:
*Check the cluster status of API Gateway using the REST endpoint GET /rest/apigateway/health/engine to verify that API Gateway and its components are healthy and running in cluster mode (see the sketch after this list).
*Check the possible cause of the pod restart, for example, pod reallocation, node autoscaling, and so on.
*Check the node pool resource availability.
*Check the previous logs of the pod for any exception.
*Check the pod events to find the status of the pod.
*Check the Terracotta client logs for errors in Terracotta communication, if the tenant is in cluster mode.
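A quick way to run the cluster status check from a script is to call the health endpoint directly. The following Python sketch assumes the `requests` library and uses placeholder values for the gateway URL and administrator credentials; adjust them to your installation.

```python
import requests

# Placeholder host, port, and credentials; replace with your own values.
GATEWAY_URL = "https://apigateway.example.com:5555"
AUTH = ("Administrator", "manage")

def check_engine_health():
    """Call GET /rest/apigateway/health/engine and print the reported status."""
    resp = requests.get(f"{GATEWAY_URL}/rest/apigateway/health/engine",
                        auth=AUTH, timeout=10)
    resp.raise_for_status()
    # The response body describes API Gateway and its components; inspect it
    # to confirm they are healthy and running in cluster mode.
    print(resp.json())

if __name__ == "__main__":
    check_engine_health()
```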
Metric: NodeNotReady
Description: If the status of a new node in the Kubernetes cluster is not ready for more than 15 minutes, consider the severity as CRITICAL.
Perform the following actions to identify the problem:
*Check the autoscaling settings.
*Check the logs for the provisioning of the new node.
*Check if there is any issue with the provisioning of the new pod.
*Ensure that the status of the node is Ready (see the sketch after this list).
*Ensure that the pod reallocation is completed.
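If you use the official Kubernetes Python client, the node readiness check can be scripted as follows. This is a minimal sketch; it assumes the `kubernetes` package and a local kubeconfig.

```python
from kubernetes import client, config

def report_node_readiness():
    """List all nodes and flag those whose Ready condition is not True."""
    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        ready = next((c for c in node.status.conditions if c.type == "Ready"), None)
        if ready is None or ready.status != "True":
            reason = ready.reason if ready else "no Ready condition reported"
            print(f"Node {node.metadata.name} is not ready: {reason}")

if __name__ == "__main__":
    report_node_readiness()
```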
Metric: DeploymentReplicasMismatch
Description: If the desired replica count of the deployment does not match the number of pods that are in a ready state for more than 10 minutes, consider the severity as CRITICAL.
Perform the following actions to identify the problem (see the replica-count sketch after this list):
*If the replicas do not match, Kubernetes spawns a new pod. Check whether the new pod is stuck in any state (for example, Init or CrashLoopBackOff).
*If the pod is stuck in any state, delete the pod and ensure that a new, healthy pod is created.
*Check the pod events for errors to determine the status of the pod.
*Check the previous logs of the pod for any exception.
*Check the cluster status of API Gateway using the REST endpoint GET /rest/apigateway/health/engine to verify that API Gateway and its components are healthy and running in cluster mode.
*Check the node pool resource availability.
*Check whether the new node is in a ready state.
*Check if there is any issue with the provisioning of the new pod.
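To detect the mismatch programmatically, you can compare the desired replica count of the deployment with the number of ready replicas. The sketch below uses the Kubernetes Python client; the deployment name and namespace are assumptions, so replace them with the values used in your installation.

```python
from kubernetes import client, config

# Hypothetical deployment name and namespace; adjust to your installation.
DEPLOYMENT = "apigateway"
NAMESPACE = "default"

def check_replica_mismatch():
    """Compare the desired replica count with the replicas that are ready."""
    config.load_kube_config()
    apps = client.AppsV1Api()
    dep = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE)
    desired = dep.spec.replicas or 0
    ready = dep.status.ready_replicas or 0
    if ready < desired:
        print(f"Replica mismatch: {ready}/{desired} pods ready for {DEPLOYMENT}")
    else:
        print(f"All {desired} replicas of {DEPLOYMENT} are ready")

if __name__ == "__main__":
    check_replica_mismatch()
```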
Additionally, if a pod restarts for any reason, perform the following steps to verify the health of the new pod.
*Check the readiness of the pod (see the sketch after this list).
*Check the cluster status of API Gateway using the REST endpoint GET /rest/apigateway/health/engine to verify that API Gateway and its components are healthy and running in cluster mode.
*Check the possible cause of the pod restart, for example, pod reallocation, node autoscaling, and so on.
*Check the previous logs of the pod for any exception.
*Check the pod events to find the reason for the restart.
*Check the Terracotta client logs for errors in Terracotta communication, if the tenant is in cluster mode.
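The readiness, previous-log, and event checks in this list can also be scripted with the Kubernetes Python client. The pod name and namespace below are placeholders; the sketch only illustrates the calls involved.

```python
from kubernetes import client, config
from kubernetes.client.rest import ApiException

# Placeholder pod name and namespace; adjust to your installation.
POD = "apigateway-0"
NAMESPACE = "default"

def inspect_restarted_pod():
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Readiness: inspect the pod's Ready condition.
    pod = v1.read_namespaced_pod(POD, NAMESPACE)
    ready = next((c for c in pod.status.conditions if c.type == "Ready"), None)
    print("Ready:", ready.status if ready else "unknown")

    # Previous logs: output of the container instance that ran before the restart.
    try:
        print(v1.read_namespaced_pod_log(POD, NAMESPACE, previous=True, tail_lines=50))
    except ApiException as exc:
        print("No previous logs available:", exc.reason)

    # Events: recent events for this pod, which usually include the restart reason.
    events = v1.list_namespaced_event(
        NAMESPACE, field_selector=f"involvedObject.name={POD}")
    for event in events.items:
        print(event.reason, "-", event.message)

if __name__ == "__main__":
    inspect_restarted_pod()
```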
Analyze Trend
You can use external tools for dashboarding operations and visualizing metrics and logs.