Metric | Description |
sag_is_service_threads | Checks the percentage of threads in use for service execution, where the threads are obtained from the server thread pool. If thread usage stays above the recommended threshold for more than 15 minutes, assign the severity as follows: above 80%, ERROR; above 90%, CRITICAL. To identify the cause of high thread usage: 1. Identify the process that consumes the highest number of threads. 2. Generate a thread dump. 3. Analyze the thread dump to identify thread locks. 4. Analyze the logs of all API Gateway instances in the node. |
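
The threshold rule above can be sketched in a few lines of Python. This is only an illustration, not part of API Gateway: the function name `sustained_severity` and the sample format (timestamp in seconds, usage in percent) are assumptions, and the sketch treats "more than 15 minutes" as an unbroken run of samples above the threshold.

```python
from typing import Optional, Sequence, Tuple

SUSTAIN_SECONDS = 15 * 60  # "more than 15 minutes" from the description


def sustained_severity(samples: Sequence[Tuple[float, float]]) -> Optional[str]:
    """Return "CRITICAL", "ERROR", or None for a series of usage samples.

    samples: (timestamp_seconds, usage_percent) pairs in ascending time order.
    The sample format is a hypothetical one chosen for this sketch.
    """

    def breached(limit: float) -> bool:
        run_start = None  # timestamp of the first sample in the current run
        for timestamp, percent in samples:
            if percent > limit:
                if run_start is None:
                    run_start = timestamp
                if timestamp - run_start > SUSTAIN_SECONDS:
                    return True
            else:
                run_start = None  # usage dropped, so the run is broken
        return False

    # Check the stricter threshold first so that it takes precedence.
    if breached(90.0):
        return "CRITICAL"
    if breached(80.0):
        return "ERROR"
    return None
```

The same rule applies to any metric in this section that uses the 80% and 90% thresholds, so a single helper of this kind could serve both the thread and the memory checks.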
Metric | Description |
sag_is_number_service_errors | Checks the number of services that result in errors or exceptions. If service errors occur, treat the severity as ERROR. To identify the cause of service errors: 1. Check the cluster status of API Gateway with the REST endpoint GET /rest/apigateway/health/engine to confirm that API Gateway is healthy and running in cluster mode. 2. Check the server log at SAGInstallDirectory\IntegrationServer\instances\instance_name\logs\server.log for exceptions. |
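
The health check in step 1 can be scripted. The following is a minimal sketch that uses only the Python standard library. The base URL is a placeholder, the helper names are hypothetical, and the sketch assumes the endpoint returns JSON. Authentication is left out, although a real deployment may require credentials.

```python
import json
import urllib.request

HEALTH_PATH = "/rest/apigateway/health/engine"  # endpoint named in the table


def health_url(base_url: str) -> str:
    """Build the health endpoint URL from a base URL such as http://host:port."""
    return base_url.rstrip("/") + HEALTH_PATH


def fetch_engine_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET the health endpoint and return the parsed body.

    Assumes a JSON response; the fields of the response are not
    interpreted here because the source does not describe them.
    """
    request = urllib.request.Request(
        health_url(base_url),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_engine_health("http://localhost:5555")` would return the response body as a dictionary. The host and port here are examples only.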
Metric | Description |
sag_is_used_memory_bytes | Checks the percentage of JVM memory in use. If memory usage stays above the recommended threshold for more than 15 minutes, assign the severity as follows: above 80%, ERROR; above 90%, CRITICAL. To identify the cause of high JVM memory usage: 1. Check the cluster status of API Gateway with the REST endpoint GET /rest/apigateway/health/engine to confirm that API Gateway is healthy and running in cluster mode. 2. Generate a heap dump. 3. Analyze the logs of all API Gateway instances. 4. Identify the server that has the issue and restart it if required. 5. After restarting the server: a. Check the readiness of API Gateway. b. Check the cluster status of API Gateway with the REST endpoint GET /rest/apigateway/health/engine to confirm that API Gateway is healthy and running in cluster mode. c. Check the availability of all required system resources, such as memory, heap, and disk. d. Check the connectivity between API Data Store and the API Gateway server. e. In a cluster setup, check the Terracotta client logs for errors in Terracotta communication. |
Metric | Description |
sag_is_http_requests | Checks the percentage of the total number of HTTP or HTTPS requests since the last statistics poll. The statistics poll interval is controlled by the watt.server.stats.pollTime server configuration parameter; the default interval is 60 seconds. If the number of HTTP or HTTPS requests since the last statistics poll exceeds the threshold limit, which is based on the Throughput Per Second (TPS) value, treat the severity as ERROR. |
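
The description does not say how the threshold limit is derived from the TPS value. One plausible reading, shown here purely as an assumption, is that the limit per poll equals the TPS value multiplied by the poll interval in seconds. The function names are hypothetical.

```python
from typing import Optional

DEFAULT_POLL_SECONDS = 60  # default of watt.server.stats.pollTime per the table


def requests_per_poll_limit(tps_limit: float,
                            poll_seconds: int = DEFAULT_POLL_SECONDS) -> float:
    """Assumed conversion: requests allowed per poll = TPS limit x poll interval."""
    return tps_limit * poll_seconds


def http_requests_severity(requests_since_poll: int,
                           tps_limit: float,
                           poll_seconds: int = DEFAULT_POLL_SECONDS) -> Optional[str]:
    """Return "ERROR" when the request count exceeds the assumed limit, else None."""
    if requests_since_poll > requests_per_poll_limit(tps_limit, poll_seconds):
        return "ERROR"
    return None
```

Under this assumption, a TPS value of 50 with the default 60-second poll interval gives a limit of 3000 requests per poll.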