Infrastructure Metrics
Infrastructure metrics include system metrics and container metrics. For information about container metrics, see Container Metrics.
System Metrics
Monitor the following metrics to analyze the health of the Terracotta server:
- CPU usage
- Disk usage
- Memory usage
If a metric exceeds its threshold value, assign the severity listed in the table below and perform the actions that Software AG recommends to identify and debug the problem. Contact Software AG if you need further support.
Note:
The threshold values, configurations, and severities mentioned throughout this section are guidelines that Software AG suggests for optimal performance of API Gateway. You can modify these thresholds or define actions based on your operational requirements.
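Because these thresholds are meant to be adjustable, it can help to express them as parameters rather than hard-coded numbers. The following Python code is a minimal sketch that encodes the default CPU and memory severities from the table below: CPU usage above 80% (WARNING) or above 90% (CRITICAL) for 15 continuous minutes, and memory usage above 80% (WARNING) or above 90% (CRITICAL). It assumes that usage is already collected as percentages at a fixed sampling interval; the names Thresholds, cpu_severity, and memory_severity are hypothetical and are not part of API Gateway or Terracotta.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for severity thresholds, in percent."""
    warning: float = 80.0
    critical: float = 90.0


DEFAULT_THRESHOLDS = Thresholds()


def _classify(value: float, t: Thresholds) -> Optional[str]:
    # "Above" is treated as strictly greater than the threshold.
    if value > t.critical:
        return "CRITICAL"
    if value > t.warning:
        return "WARNING"
    return None


def memory_severity(usage_percent: float,
                    t: Thresholds = DEFAULT_THRESHOLDS) -> Optional[str]:
    """Severity for a single memory-usage reading; None means no alert."""
    return _classify(usage_percent, t)


def cpu_severity(samples: Sequence[float], sample_interval_s: float,
                 window_s: float = 15 * 60,
                 t: Thresholds = DEFAULT_THRESHOLDS) -> Optional[str]:
    """Severity for CPU usage that stays above a threshold for the whole window.

    samples holds CPU-usage percentages, oldest first, taken every
    sample_interval_s seconds. Returns None while there is not yet enough
    history to cover the window.
    """
    if sample_interval_s <= 0 or window_s <= 0:
        raise ValueError("sample_interval_s and window_s must be positive")
    needed = math.ceil(window_s / sample_interval_s)
    if len(samples) < needed:
        return None
    # Usage is continuously above a threshold only if the lowest
    # sample inside the window is above it.
    return _classify(min(samples[-needed:]), t)
```

Taking the minimum over the window is what makes the CPU check "continuous": a single sample at or below a threshold within the last 15 minutes is enough to keep that severity from being raised.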
For information about generating thread dumps and heap dumps to investigate these system metrics, see Troubleshooting: Monitoring Terracotta Server Array.
Monitor | Description |
--- | --- |
CPU usage | If the CPU usage of the system exceeds the recommended threshold, assign the severity as follows: above 80% for 15 continuous minutes, severity WARNING; above 90% for 15 continuous minutes, severity CRITICAL. To identify the cause of high CPU usage: 1. Identify the process that consumes the most CPU. 2. Generate a thread dump. 3. Analyze the thread dump and logs to identify the problem. 4. Monitor the process closely. If the process fails, it should be recreated. 5. Check whether the active-passive quorum is intact by running the script SAGInstallDirectory/Terracotta/server/bin/server-stat.sh. 6. Check whether API Gateway clients can connect to the Terracotta cluster by calling the REST endpoint GET /rest/apigateway/health/engine. |
Disk usage | If the disk usage of the Terracotta server is high, configure log rotation based on a fixed file size and limit the number of rotated log files that are retained. |
Memory usage | If the memory usage exceeds the recommended threshold, assign the severity as follows: above 80%, severity WARNING; above 90%, severity CRITICAL. To identify the cause of high memory usage: 1. Identify the process that consumes the most memory. 2. Start the Terracotta Management Console (TMC) and check the heap usage, off-heap usage, and warnings. 3. Analyze the memory dump and Terracotta logs to identify the issue. 4. Monitor the process closely. 5. Check whether the active-passive quorum is intact by running the script SAGInstallDirectory/Terracotta/server/bin/server-stat.sh. 6. Check whether API Gateway clients can connect to the Terracotta cluster by calling the REST endpoint GET /rest/apigateway/health/engine. |
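The last step in both rows above calls GET /rest/apigateway/health/engine. The following Python code is a minimal sketch of automating that check with only the standard library. It assumes that API Gateway is reachable at http://localhost:5555 (a hypothetical base URL; replace it with the host and port of your instance), that the endpoint can be called without credentials in your setup, and that an HTTP 200 response means the engine is healthy. Verify the actual response contract in the API Gateway REST API documentation. The function name engine_is_healthy is illustrative.

```python
import http.client
import urllib.request

# Hypothetical base URL: replace with the host and port of your API Gateway instance.
DEFAULT_BASE_URL = "http://localhost:5555"
HEALTH_PATH = "/rest/apigateway/health/engine"


def engine_is_healthy(base_url: str = DEFAULT_BASE_URL,
                      timeout: float = 10.0) -> bool:
    """Call GET <base_url>/rest/apigateway/health/engine.

    Assumption: HTTP 200 means the engine, including its connection to the
    Terracotta cluster, is healthy. Any other status, a connection error,
    or a timeout is reported as unhealthy.
    """
    url = base_url.rstrip("/") + HEALTH_PATH
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (raised for 4xx/5xx statuses), and socket
        # timeouts are OSError subclasses; HTTPException covers malformed
        # responses from a non-HTTP listener.
        return False
```

A scheduler or monitoring agent could call engine_is_healthy() at a regular interval and raise an alert when it returns False. Treating timeouts and connection errors as unhealthy keeps the check conservative: an unreachable gateway is reported rather than silently ignored.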