API Gateway 10.11
 
Container Metrics
If you have installed Terracotta through Docker or Kubernetes, Software AG recommends monitoring the following metrics to check whether the Terracotta container is healthy.
Metric
Description
PodNotReady
If the pod remains in the not-ready state for more than 10 minutes, consider the severity CRITICAL and perform the following actions to identify the problem:
1. Check the console logs of the pod for any exceptions.
2. Identify any issues with provisioning the pod.
StatefulSetReplicasMismatch
If the number of ready StatefulSet replicas does not match the desired number for longer than 5 minutes, consider the severity CRITICAL and perform the following actions to identify the problem:
1. Check the console logs of the pod for any exceptions.
2. Identify any issues with provisioning the pod.
PodRestarting
Indicates that the application inside a pod was not up and Kubernetes created a new pod to maintain availability.
If the application inside the pod is not up within 1 minute, consider the severity CRITICAL.
Perform the following actions to identify the problem:
1. Check the previous logs of the pod (the logs from before the restart), and check the logs of all running pods.
2. If the tenant is in cluster mode, check the Terracotta client logs for errors in Terracotta communication.
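For the PodNotReady metric, the following Python sketch shows one way to apply the 10-minute threshold. It assumes the pod status is available as a dict shaped like the Kubernetes Pod `status` object (for example, from the JSON output of `kubectl get pod <name> -o json`) and treats the `lastTransitionTime` of the `Ready` condition as the start of the not-ready period. The function name `pod_not_ready_severity` is hypothetical and not part of API Gateway or Terracotta.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the PodNotReady guidance: not ready for more than 10 minutes.
NOT_READY_THRESHOLD = timedelta(minutes=10)


def pod_not_ready_severity(pod_status, now):
    """Return "CRITICAL" if the pod's Ready condition has not been True for
    longer than NOT_READY_THRESHOLD, otherwise None.

    pod_status is assumed to be a dict shaped like the Kubernetes Pod
    "status" object; now must be a timezone-aware datetime.
    """
    for condition in pod_status.get("conditions", []):
        if condition.get("type") != "Ready":
            continue
        if condition.get("status") == "True":
            return None  # The pod is ready, so there is nothing to report.
        # lastTransitionTime is when the Ready condition last changed status.
        since = datetime.fromisoformat(
            condition["lastTransitionTime"].replace("Z", "+00:00"))
        return "CRITICAL" if now - since > NOT_READY_THRESHOLD else None
    # No Ready condition reported yet, so there is no start time to measure.
    return None


# Example: a pod not Ready since 10:00 UTC, checked at 10:15 UTC.
example = {"conditions": [{"type": "Ready", "status": "False",
                           "lastTransitionTime": "2024-05-01T10:00:00Z"}]}
print(pod_not_ready_severity(example,
                             datetime(2024, 5, 1, 10, 15, tzinfo=timezone.utc)))
# prints CRITICAL
```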
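For StatefulSetReplicasMismatch, a mismatch only counts once it has persisted for 5 minutes, so the sketch below remembers when the mismatch was first observed. It assumes each observation is a dict shaped like the Kubernetes StatefulSet object and compares `spec.replicas` (which defaults to 1) with `status.readyReplicas` (which Kubernetes may omit when it is 0). The class name `ReplicaMismatchTracker` and the polling model are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the StatefulSetReplicasMismatch guidance: longer than 5 minutes.
MISMATCH_THRESHOLD = timedelta(minutes=5)


class ReplicaMismatchTracker:
    """Tracks how long a StatefulSet's ready replica count has differed
    from its desired replica count."""

    def __init__(self, threshold=MISMATCH_THRESHOLD):
        self.threshold = threshold
        self.mismatch_since = None  # Time the current mismatch was first seen.

    def observe(self, statefulset, now):
        """Record one observation and return "CRITICAL" or None."""
        desired = statefulset["spec"].get("replicas", 1)
        ready = statefulset.get("status", {}).get("readyReplicas", 0)
        if desired == ready:
            self.mismatch_since = None  # Counts agree again; reset the timer.
            return None
        if self.mismatch_since is None:
            self.mismatch_since = now
        if now - self.mismatch_since > self.threshold:
            return "CRITICAL"
        return None


# Example: 2 replicas desired, only 1 ready, observed twice 6 minutes apart.
tracker = ReplicaMismatchTracker()
start = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
degraded = {"spec": {"replicas": 2}, "status": {"readyReplicas": 1}}
print(tracker.observe(degraded, start))                         # None
print(tracker.observe(degraded, start + timedelta(minutes=6)))  # CRITICAL
```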
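For PodRestarting, one way to spot a restart is to compare the `restartCount` values reported under `status.containerStatuses` between two observations. The hypothetical helpers below do that and build `kubectl logs --previous` commands, which print the logs of the container instance that ran before the restart, as the first PodRestarting action suggests. The pod, container, and namespace names in the example (`terracotta-0`, `terracotta`) are placeholders.

```python
def detect_restarts(previous, current):
    """Return the (pod, container) pairs whose restart count increased.

    Each snapshot maps (pod_name, container_name) to the restartCount
    read from status.containerStatuses[] of the pod.
    """
    return sorted(
        key for key, count in current.items() if count > previous.get(key, 0)
    )


def previous_log_commands(restarted, namespace):
    """Build kubectl commands that print the logs of the container instance
    that ran before the restart (the --previous flag)."""
    return [
        f"kubectl logs {pod} -c {container} -n {namespace} --previous"
        for pod, container in restarted
    ]


# Example with placeholder names: terracotta-0 restarted once since the last check.
before = {("terracotta-0", "terracotta"): 0, ("terracotta-1", "terracotta"): 2}
after = {("terracotta-0", "terracotta"): 1, ("terracotta-1", "terracotta"): 2}
for command in previous_log_commands(detect_restarts(before, after), "terracotta"):
    print(command)
# prints: kubectl logs terracotta-0 -c terracotta -n terracotta --previous
```

A pod that Kubernetes replaces entirely starts again with a restart count of 0, so this comparison catches container restarts but not pod replacement; tracking pod UIDs as well would cover that case.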
Pod restart verification procedure
If a pod restarts for any reason, check the following to verify the health of the new pod:
1. Wait 150 seconds for the alternate (passive) pod to become active.
2. If the alternate pod does not become active, two active pods can result under the following circumstances:
* The passive pod and the new pod can both turn active simultaneously.
Note:
Terracotta heals itself by sending a zap signal to one of the pods.
* Two active pods can also occur when the pod transitioning from passive to active is stuck. In this case, its readiness or liveness checks return an unhealthy status, and the action defined for an unhealthy pod is performed.
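The verification procedure can be sketched as a polling loop that waits up to 150 seconds for exactly one active pod. The `get_roles` callable and the `ACTIVE`/`PASSIVE` labels are hypothetical placeholders for however you read each Terracotta server's state in your deployment, and the 5-second poll interval is an arbitrary choice. Because Terracotta can heal two active pods by zapping one of them, the loop keeps polling when it sees more than one active pod and reports `MULTIPLE_ACTIVE` only if that persists until the deadline.

```python
import time

# Wait time from the pod restart verification procedure.
FAILOVER_WAIT_SECONDS = 150


def verify_failover(get_roles, timeout=FAILOVER_WAIT_SECONDS, interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll pod roles after a restart until exactly one pod is active or
    the timeout expires.

    get_roles is a hypothetical callable returning {pod_name: role}, where
    role is the placeholder label "ACTIVE" or "PASSIVE". Returns "OK",
    "MULTIPLE_ACTIVE", or "NO_ACTIVE".
    """
    deadline = clock() + timeout
    while True:
        active = [pod for pod, role in get_roles().items() if role == "ACTIVE"]
        if len(active) == 1:
            return "OK"
        if clock() >= deadline:
            # Not exactly one active pod: either several or none.
            return "MULTIPLE_ACTIVE" if active else "NO_ACTIVE"
        # Keep polling: a passive pod may still be taking over, and Terracotta
        # can resolve two active pods by sending a zap signal to one of them.
        sleep(interval)
```

The `clock` and `sleep` parameters are injectable so the loop can be exercised without actually waiting 150 seconds.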
Analyze Trend
You can use external tools to build dashboards and to visualize metrics and logs.