Infrastructure Metrics

Metric	Description
sag_is_server_proc_sys_percent	Checks the percentage of CPU used by the Operating System.
sag_is_server_proc_cpu_percent	Checks the percentage of CPU used by the Integration Server JVM.

If the CPU usage reported by both metrics stays above the recommended threshold value for more than 15 minutes, consider the severity as follows:
Above 80% threshold, Severity: ERROR
Above 90% threshold, Severity: CRITICAL

The steps to identify the causes of higher CPU usage are as follows:
1. Identify the process that consumes the most CPU.
2. Generate a thread dump.
3. Analyze the thread dump to identify thread locks.
4. Analyze the logs of all the API Gateway instances in the node.
5. If the CPU spikes are caused by excess load, Software AG recommends that you monitor the load and scale API Gateway up or down as required. For more details about scaling, see Scaling.
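
The severity rule for CPU usage can be sketched in Python as follows. This is a minimal illustration, not part of API Gateway: the CpuSample type, the cpu_severity function, and the idea of feeding it time-ordered samples collected from the two metrics are all assumptions made for the example. It treats "more than 15 minutes" as an unbroken run of samples in which both metrics exceed the threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ERROR_THRESHOLD = 80.0        # percent, from the severity table
CRITICAL_THRESHOLD = 90.0     # percent, from the severity table
SUSTAINED_SECONDS = 15 * 60   # "more than 15 minutes"


@dataclass
class CpuSample:
    """One scrape of both CPU metrics (hypothetical container type)."""
    timestamp: float    # seconds, monotonically increasing
    sys_percent: float  # value of sag_is_server_proc_sys_percent
    jvm_percent: float  # value of sag_is_server_proc_cpu_percent


def _sustained_above(samples: Sequence[CpuSample], threshold: float) -> bool:
    """True if both metrics exceed threshold for an unbroken run longer than 15 minutes."""
    run_start = None
    for sample in samples:
        if sample.sys_percent > threshold and sample.jvm_percent > threshold:
            if run_start is None:
                run_start = sample.timestamp
            if sample.timestamp - run_start > SUSTAINED_SECONDS:
                return True
        else:
            run_start = None  # the run is broken, start counting again
    return False


def cpu_severity(samples: Sequence[CpuSample]) -> Optional[str]:
    """Return "CRITICAL", "ERROR", or None for time-ordered samples."""
    if _sustained_above(samples, CRITICAL_THRESHOLD):
        return "CRITICAL"
    if _sustained_above(samples, ERROR_THRESHOLD):
        return "ERROR"
    return None
```

For example, seventeen one-minute samples that all report 85% for both metrics yield ERROR, while a single sample below the threshold in the middle of the window resets the 15-minute count.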

Metric	Description
sag_is_server_total_disk_mbytes	Checks the total available disk space, in megabytes.
sag_is_server_used_disk_mbytes	Checks the used disk space, in megabytes.

If the disk usage indicated by both metrics is above the recommended threshold value, consider the severity as follows:
Above 80% threshold, Severity: ERROR
Above 90% threshold, Severity: CRITICAL

The steps to identify the causes of higher disk usage are as follows:
1. The events archived in API Gateway are stored in the temp directory in the following location: SAGInstallDirectory\profiles\IS_instance_name\workspace\temp. Check the size of the temp directory and clean it up to reduce the disk usage.
2. Check whether log rotation works as configured for the following file types: server, audit, error, session, wrapper, osgi, and API Gateway. Check the size of the log files that consume the most disk space to see whether they exceed the configured values.
3. Purge the events periodically to free disk space and keep API Gateway performing optimally. For more details about purging, see Archive and Purge using API.
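
The disk check can be sketched in Python as follows. This is an illustrative example only: the function names are hypothetical, and the sketch assumes that the usage percentage is the used value divided by the total value of the two metrics. The directory_size_mbytes helper shows one way to measure a directory such as the temp directory from step 1.

```python
import os
from typing import Optional

ERROR_THRESHOLD = 80.0     # percent, from the severity table
CRITICAL_THRESHOLD = 90.0  # percent, from the severity table


def disk_usage_percent(total_mbytes: float, used_mbytes: float) -> float:
    """Derive a usage percentage from the two metric values (assumed formula)."""
    if total_mbytes <= 0:
        raise ValueError("total_mbytes must be positive")
    return 100.0 * used_mbytes / total_mbytes


def disk_severity(total_mbytes: float, used_mbytes: float) -> Optional[str]:
    """Return "CRITICAL", "ERROR", or None according to the thresholds."""
    percent = disk_usage_percent(total_mbytes, used_mbytes)
    if percent > CRITICAL_THRESHOLD:
        return "CRITICAL"
    if percent > ERROR_THRESHOLD:
        return "ERROR"
    return None


def directory_size_mbytes(path: str) -> float:
    """Sum the sizes of all regular files under path, in megabytes."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue  # do not count link targets
            try:
                total_bytes += os.path.getsize(file_path)
            except OSError:
                pass  # the file was removed while scanning
    return total_bytes / (1024 * 1024)
```

The path passed to directory_size_mbytes depends on the installation. The placeholders SAGInstallDirectory and IS_instance_name in the location above must be replaced with the actual values.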

Metric	Description
sag_is_server_total_memory_mbytes	Checks the total amount of physical memory available, in megabytes.
sag_is_server_used_memory_mbytes	Checks the total amount of physical memory used, in megabytes.

If the memory usage indicated by both metrics stays above the recommended threshold value for more than 15 minutes, consider the severity as follows:
Above 80% threshold, Severity: ERROR
Above 90% threshold, Severity: CRITICAL

The steps to identify the causes of higher memory usage are as follows:
1. Identify the process that consumes the most memory.
2. Check the cluster status of API Gateway using the following REST endpoint to know whether API Gateway is healthy and responding: GET /rest/apigateway/health/engine
3. Generate a heap dump.
4. Analyze the logs of all the API Gateway instances and identify the file that consumes more memory.
5. Identify the server that has an issue and restart the server if required.
6. Perform the following actions after restarting the server:
   a. Check the readiness of API Gateway.
   b. Check the cluster status of API Gateway using the following REST endpoint to know whether API Gateway is healthy and is in cluster mode: GET /rest/apigateway/health/engine
   c. Check the availability of all the required system resources, such as memory, heap, and disk.
   d. For a cluster setup, check the Terracotta client logs for errors in Terracotta communication.
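
The health check in steps 2 and 6b can be scripted, for example in Python with the standard library. This is a minimal sketch under stated assumptions: the base URL (host and port) is a placeholder for your own instance, the function names are hypothetical, authentication is not handled, and treating HTTP 200 as "healthy" is an assumption. The format of the response body is not described here, so the sketch returns the body unparsed.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

HEALTH_PATH = "/rest/apigateway/health/engine"  # endpoint named in the steps above


def health_url(base_url: str) -> str:
    """Build the health endpoint URL from a base URL such as http://host:port."""
    return base_url.rstrip("/") + HEALTH_PATH


def check_engine_health(base_url: str, timeout: float = 5.0) -> Tuple[Optional[int], str]:
    """Send GET to the health endpoint and return (HTTP status or None, response text)."""
    request = urllib.request.Request(
        health_url(base_url), headers={"Accept": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code, err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as err:
        # No answer at all: connection refused, DNS failure, or timeout.
        return None, str(err)


def is_healthy(status: Optional[int]) -> bool:
    """Assumption for this sketch: only HTTP 200 counts as healthy and responding."""
    return status == 200
```

A call such as check_engine_health("http://localhost:5555"), where the host and port are placeholders, returns the status code and the raw body. Inspect the body to confirm the cluster mode after a restart.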