Infrastructure Metrics
Infrastructure metrics include system metrics and container metrics. For information about container metrics, see Container Metrics.
System Metrics
Monitor the following system metrics to analyze API Gateway health:
*CPU usage
*Disk usage
*Memory usage
Monitor the CPU usage
| Metric | Description |
| --- | --- |
| sag_is_server_proc_sys_percent | Checks the percentage of CPU used by the operating system. |
| sag_is_server_proc_cpu_percent | Checks the percentage of CPU used by the Integration Server JVM. |
If both metrics stay above the recommended threshold value for more than 15 minutes, assign the severity as follows:
*Above 80% threshold, Severity: ERROR
*Above 90% threshold, Severity: CRITICAL
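The rule above can be sketched in code. The following is a minimal illustration, not a product feature: the `CpuSample` type, the `cpu_severity` function, and the idea of evaluating a list of timestamped samples are assumptions made for the example. It treats the rule as "both metrics are above the threshold, without interruption, for more than 15 minutes up to the newest sample".

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ERROR_THRESHOLD = 80.0        # percent, from the thresholds listed above
CRITICAL_THRESHOLD = 90.0     # percent, from the thresholds listed above
SUSTAINED_SECONDS = 15 * 60   # "more than 15 minutes"


@dataclass(frozen=True)
class CpuSample:
    """One reading of both CPU metrics (hypothetical container type)."""
    timestamp: float     # seconds, from whatever clock the collector uses
    sys_percent: float   # value of sag_is_server_proc_sys_percent
    jvm_percent: float   # value of sag_is_server_proc_cpu_percent


def cpu_severity(samples: Iterable[CpuSample]) -> Optional[str]:
    """Return "CRITICAL", "ERROR", or None for a series of samples."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    if not ordered:
        return None
    newest = ordered[-1].timestamp
    # Check the stricter threshold first so CRITICAL wins over ERROR.
    for label, threshold in (("CRITICAL", CRITICAL_THRESHOLD),
                             ("ERROR", ERROR_THRESHOLD)):
        breach_start = None
        for sample in ordered:
            # Both metrics must be above the threshold at the same time.
            if min(sample.sys_percent, sample.jvm_percent) > threshold:
                if breach_start is None:
                    breach_start = sample.timestamp
            else:
                breach_start = None  # the run was interrupted
        if breach_start is not None and newest - breach_start > SUSTAINED_SECONDS:
            return label
    return None
```

With one sample per minute, twenty consecutive samples at 85% for both metrics yield `"ERROR"`, while ten such samples yield `None` because the breach has not yet lasted longer than 15 minutes.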
The steps to identify the causes of higher CPU usage are as follows:
1. Identify the process that consumes the most CPU.
2. Generate the thread dump.
3. Analyze the thread dump to identify thread locks.
4. Analyze the logs of all the API Gateway instances in the node.
5. If the CPU spikes are caused by excess load, Software AG recommends that you monitor the load and scale API Gateway up or down as required. For more details about scaling, see Scaling.
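As an illustration of step 3, the sketch below summarizes a thread dump that has already been captured, for example with the JDK tools `jstack <pid>` or `jcmd <pid> Thread.print`. It assumes the usual HotSpot text layout (a quoted thread name, a `java.lang.Thread.State:` line, and `- waiting to lock <address>` lines); the function name `summarize_thread_dump` is hypothetical, and other JVMs may format dumps differently.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Patterns for the HotSpot thread dump layout (an assumption of this sketch).
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
THREAD_STATE = re.compile(r'java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')
WAITING_TO_LOCK = re.compile(r'- waiting to lock <(?P<lock>0x[0-9a-f]+)>')


def summarize_thread_dump(dump: str) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count threads per state and group waiting threads by monitor address."""
    states: Counter = Counter()
    contended: Dict[str, List[str]] = {}
    current_thread = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current_thread = header.group("name")
            continue
        state = THREAD_STATE.search(line)
        if state:
            states[state.group("state")] += 1
            continue
        lock = WAITING_TO_LOCK.search(line)
        if lock and current_thread is not None:
            # Several threads waiting on one address point to lock contention.
            contended.setdefault(lock.group("lock"), []).append(current_thread)
    return states, contended
```

A large `BLOCKED` count, or many threads listed under a single monitor address, is a reasonable place to start looking for the lock that holds up the others.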
Monitor the disk usage
| Metric | Description |
| --- | --- |
| sag_is_server_total_disk_mbytes | Checks the total available disk space, in megabytes. |
| sag_is_server_used_disk_mbytes | Checks the used disk space, in megabytes. |
If the disk usage indicated by these metrics is above the recommended threshold value, assign the severity as follows:
*Above 80% threshold, Severity: ERROR
*Above 90% threshold, Severity: CRITICAL
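Because both disk metrics are reported in megabytes while the thresholds are percentages, a usage percentage has to be derived first. The sketch below rests on two assumptions that this topic does not state: that the metrics are available as Prometheus-style text lines (`name value`), and that usage is the used value as a percentage of the total value. The function names are hypothetical.

```python
from typing import Dict, Optional


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' lines, ignoring comments and any {label} section."""
    values: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, _, rest = line.partition("{")
            rest = rest.rpartition("}")[2]
        else:
            parts = line.split(None, 1)
            name, rest = parts[0], parts[1] if len(parts) > 1 else ""
        fields = rest.split()
        if not fields:
            continue
        try:
            values[name.strip()] = float(fields[0])
        except ValueError:
            continue  # skip lines whose value is not numeric
    return values


def usage_percent(metrics: Dict[str, float], used_name: str,
                  total_name: str) -> Optional[float]:
    """Return used/total as a percentage, or None if it cannot be computed."""
    used = metrics.get(used_name)
    total = metrics.get(total_name)
    if used is None or not total:
        return None
    return 100.0 * used / total


def severity(percent: Optional[float]) -> Optional[str]:
    """Map a usage percentage to the severities listed above."""
    if percent is None:
        return None
    if percent > 90.0:
        return "CRITICAL"
    if percent > 80.0:
        return "ERROR"
    return None
```

For example, `severity(usage_percent(m, "sag_is_server_used_disk_mbytes", "sag_is_server_total_disk_mbytes"))` classifies the disk usage in a parsed sample `m`. The same helpers can be pointed at `sag_is_server_used_memory_mbytes` and `sag_is_server_total_memory_mbytes`, with the 15-minute condition for memory applied separately.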
The steps to identify the causes of higher disk usage are as follows:
1. The events archived in API Gateway are stored in the temp directory in the following location: SAGInstallDirectory\profiles\IS_instance_name\workspace\temp. Check the size of the temp directory and clean up the space to reduce the disk usage.
2. Check that log rotation works as configured for the following file types: server, audit, error, session, wrapper, osgi, and API Gateway. Also check whether the log files that consume the most disk space have grown beyond their configured sizes.
3. Purge the events periodically to clean up the disk space for optimal performance of API Gateway.
For more details about purging, see Archive and Purge using API.
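Steps 1 and 2 both come down to measuring what is on disk. The following minimal sketch, using only the Python standard library, reports the size of a directory and its largest files. The function names are hypothetical, and the path you pass in (for example the temp directory or a logs directory of your installation) is your own; nothing here deletes files.

```python
import os
from typing import Iterator, List, Tuple


def _file_sizes(path: str) -> Iterator[Tuple[str, int]]:
    """Yield (file path, size in bytes) for regular files under path."""
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    yield file_path, os.path.getsize(file_path)
            except OSError:
                continue  # file vanished or is unreadable; skip it


def directory_size_mbytes(path: str) -> float:
    """Total size of all files under path, in megabytes (1024 * 1024 bytes)."""
    return sum(size for _path, size in _file_sizes(path)) / (1024 * 1024)


def largest_files(path: str, limit: int = 10) -> List[Tuple[str, int]]:
    """The biggest files under path, largest first."""
    return sorted(_file_sizes(path), key=lambda item: item[1], reverse=True)[:limit]
```

Running `largest_files` against a log directory gives a quick list of candidates to compare with the configured rotation sizes.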
Monitor the memory usage
| Metric | Description |
| --- | --- |
| sag_is_server_total_memory_mbytes | Checks the total amount of physical memory available, in megabytes. |
| sag_is_server_used_memory_mbytes | Checks the total amount of physical memory used, in megabytes. |
If the memory usage indicated by these metrics stays above the recommended threshold value for more than 15 minutes, assign the severity as follows:
*Above 80% threshold, Severity: ERROR
*Above 90% threshold, Severity: CRITICAL
The steps to identify the causes of higher memory usage are as follows:
1. Identify the process that consumes the most memory.
2. Check the cluster status of API Gateway by using the REST endpoint GET /rest/apigateway/health/engine to confirm that API Gateway is healthy and responding.
3. Generate the heap dump.
4. Analyze the logs of all the API Gateway instances and identify the file that consumes more memory.
5. Identify the server that has an issue and restart the server if required.
6. Perform the following actions after restarting the server:
a. Check for the readiness of API Gateway.
b. Check the cluster status of API Gateway by using the REST endpoint GET /rest/apigateway/health/engine to confirm that API Gateway is healthy and in cluster mode.
c. Check the availability of all the required system resources, such as memory, heap, and disk.
d. For a cluster setup, check the Terracotta client logs for errors in Terracotta communication.
For more details about the API Gateway metrics, see Developing Microservices with webMethods Microservices Runtime.
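The health check used in steps 2 and 6b can be scripted. The sketch below only tests whether GET /rest/apigateway/health/engine answers with HTTP 200. It does not interpret the response body, because this topic does not describe its format. The base URL, the function names, and the injectable `opener` parameter are assumptions made for the example, and any authentication that your deployment requires is omitted.

```python
import urllib.error
import urllib.request

HEALTH_PATH = "/rest/apigateway/health/engine"  # endpoint named in the steps above


def health_url(base_url: str) -> str:
    """Join a base URL such as http://host:port with the health path."""
    return base_url.rstrip("/") + HEALTH_PATH


def engine_is_responding(base_url: str, timeout: float = 5.0,
                         opener=urllib.request.urlopen) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    request = urllib.request.Request(
        health_url(base_url), headers={"Accept": "application/json"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and HTTP error statuses land here.
        return False
```

Calling `engine_is_responding("http://gateway.example:5555")` with your own host and port gives a yes/no answer that is suitable for a readiness loop after a restart; the host name and port shown here are placeholders.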