Monitor | Description |
CPU usage | If CPU usage on the system exceeds the recommended threshold, assign a severity as follows: above 80% for 15 minutes continuously, Severity: WARNING; above 90% for 15 minutes continuously, Severity: CRITICAL. To identify the cause of high CPU usage: 1. Identify the process that consumes the most CPU. 2. Generate a thread dump. 3. Analyze the thread dump and logs to identify the problem. 4. Monitor the process closely. If the process fails, it should be restarted. 5. Check that the active-passive quorum is intact using the following script: SAGInstallDirectory/Terracotta/server/bin/server-stat.sh 6. Check that API Gateway clients can connect to the Terracotta cluster using the following REST endpoint: GET /rest/apigateway/health/engine |
Disk usage | If disk usage on the Terracotta server is high, rotate logs at a fixed size and limit the number of rotated files that are retained. |
Memory usage | If memory usage exceeds the recommended threshold, assign a severity as follows: above 80%, Severity: WARNING; above 90%, Severity: CRITICAL. To identify the cause of high memory usage: 1. Identify the process that consumes the most memory. 2. Start the Terracotta Management Console (TMC) and check the heap usage, off-heap usage, and warnings. 3. Analyze the memory dump and Terracotta logs to identify the issue. 4. Monitor the process closely. 5. Check that the active-passive quorum is intact using the following script: SAGInstallDirectory/Terracotta/server/bin/server-stat.sh 6. Check that API Gateway clients can connect to the Terracotta cluster using the following REST endpoint: GET /rest/apigateway/health/engine |
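
The severity rules in the table can be sketched in code. The following Python example is an illustration only, not part of the product. It assumes usage is sampled at a fixed interval (60 seconds here), that each sample stands for one interval, and that "above" means strictly greater than the threshold. The function names, the `OK` label for the no-alert case, and the `base_url` parameter are hypothetical. The health check tests only for an HTTP 200 response from the endpoint named in the table and does not interpret the response body; any authentication the gateway requires is not handled.

```python
import math
import urllib.request
from typing import Sequence

WARNING_PCT = 80.0           # from the table: above 80% is WARNING
CRITICAL_PCT = 90.0          # from the table: above 90% is CRITICAL
SUSTAINED_SECONDS = 15 * 60  # CPU thresholds apply over 15 continuous minutes


def sustained_above(samples: Sequence[float], threshold: float,
                    interval_seconds: int) -> bool:
    """Return True if the most recent 15 minutes of samples all exceed threshold."""
    # Assumption: each sample represents one sampling interval.
    needed = math.ceil(SUSTAINED_SECONDS / interval_seconds)
    recent = samples[-needed:]
    return len(recent) >= needed and all(s > threshold for s in recent)


def classify_cpu(samples: Sequence[float], interval_seconds: int = 60) -> str:
    """Map a series of CPU usage percentages to a severity label."""
    if sustained_above(samples, CRITICAL_PCT, interval_seconds):
        return "CRITICAL"
    if sustained_above(samples, WARNING_PCT, interval_seconds):
        return "WARNING"
    return "OK"  # hypothetical label for "no alert"


def classify_memory(usage_pct: float) -> str:
    """Map a single memory usage percentage to a severity label."""
    if usage_pct > CRITICAL_PCT:
        return "CRITICAL"
    if usage_pct > WARNING_PCT:
        return "WARNING"
    return "OK"  # hypothetical label for "no alert"


def check_gateway_health(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the engine health endpoint answers with HTTP 200."""
    # base_url is a placeholder for the API Gateway host and port.
    url = base_url.rstrip("/") + "/rest/apigateway/health/engine"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Network failures, timeouts, and HTTP error statuses count as unhealthy.
        return False
```

For example, fifteen one-minute CPU samples of 85% classify as WARNING, while the same series with the last sample at 70% classifies as OK because the threshold was not exceeded continuously.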