Application Metrics

Note:
The threshold values, configurations, and severities mentioned throughout this section are guidelines that Software AG suggests for optimal performance of API Gateway. You can modify these thresholds or define actions based on your operational requirements.

Metric

Description

sag_is_service_threads

Checks the percentage of the total number of threads that are used for service execution. These threads are obtained from the server thread pool.

If thread usage stays above the recommended threshold value for more than 15 minutes, consider the severity as follows:

Above 80% threshold, Severity: ERROR

Above 90% threshold, Severity: CRITICAL
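The 15-minute rule above can be sketched as a small check over sampled values. This is only an illustration: the function name, the sample format (timestamp and percentage pairs), and the choice to require that every sample in the window exceeds the threshold are assumptions, not part of API Gateway.

```python
from datetime import datetime, timedelta

# Thresholds and hold time taken from the guideline above.
ERROR_PCT = 80.0
CRITICAL_PCT = 90.0
HOLD = timedelta(minutes=15)


def sustained_severity(samples, now):
    """Return "CRITICAL", "ERROR", or None for (timestamp, percent) samples.

    Assumption: a severity applies only when the history reaches back at
    least 15 minutes and every sample in that window exceeds the threshold.
    """
    window_start = now - HOLD
    # Without 15 minutes of history the breach cannot be called sustained.
    if not samples or min(t for t, _ in samples) > window_start:
        return None
    recent = [pct for t, pct in samples if window_start <= t <= now]
    if not recent:
        return None
    lowest = min(recent)  # the weakest sample decides the severity
    if lowest > CRITICAL_PCT:
        return "CRITICAL"
    if lowest > ERROR_PCT:
        return "ERROR"
    return None
```

A single dip below 80% inside the window resets the result to None, which matches a strict reading of "above the threshold for more than 15 minutes".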

The steps to identify the causes of high thread usage are as follows:

1. Identify the process that consumes the highest number of threads.

2. Generate the thread dump.

3. Analyze the thread dump to identify thread locks.

4. Analyze the logs of all API Gateway instances in the node.
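As a rough aid for step 3, the sketch below summarizes a text thread dump. It assumes the dump is in the HotSpot text format produced by JDK tools such as jstack or jcmd with Thread.print; the function name and the regular expressions are illustrative and may need adjusting for other JVMs.

```python
import re
from collections import Counter

# Patterns for the HotSpot text dump format (an assumption about the dump source).
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State: (?P<state>[A-Z_]+)")
WAITING_TO_LOCK = re.compile(r"- waiting to lock <(?P<lock>0x[0-9a-f]+)>")


def summarize_thread_dump(dump_text):
    """Count threads per state and map each contended lock to its waiting threads."""
    states = Counter()
    waiters = {}
    current = None  # name of the thread whose stanza is being read
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        state = THREAD_STATE.search(line)
        if state and current:
            states[state.group("state")] += 1
            continue
        lock = WAITING_TO_LOCK.search(line)
        if lock and current:
            waiters.setdefault(lock.group("lock"), []).append(current)
    return states, waiters
```

A lock address with many waiting threads, or a large BLOCKED count, is a reasonable starting point for a closer look at the full dump.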

Metric

Description

sag_is_number_service_errors

Checks the number of services that result in errors or exceptions.

If service errors are encountered, consider the severity as ERROR.

The steps to identify the causes of service errors are as follows:

1. Check the cluster status of API Gateway using the following REST endpoint: GET /rest/apigateway/health/engine. The response shows whether API Gateway is healthy and running in cluster mode.

2. Check the server log for exceptions: SAGInstallDirectory\IntegrationServer\instances\instance_name\logs\server.log.
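The health check in step 1 can be scripted with the Python standard library. This is a minimal sketch: the base URL passed in is a placeholder for your own host and port, the helper names are hypothetical, and authentication is not handled. The shape of the response body is not assumed here, so the payload is returned as parsed JSON for inspection.

```python
import json
import urllib.request

HEALTH_PATH = "/rest/apigateway/health/engine"  # endpoint named in step 1


def health_url(base_url):
    """Join a base URL (placeholder host and port) with the health endpoint path."""
    return base_url.rstrip("/") + HEALTH_PATH


def fetch_engine_health(base_url, timeout=10):
    """Send GET to the health endpoint; return (HTTP status, parsed JSON or None).

    urllib raises urllib.error.HTTPError for non-2xx responses, so callers
    that want to treat those as "unhealthy" should catch it.
    """
    request = urllib.request.Request(
        health_url(base_url), headers={"Accept": "application/json"}, method="GET"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        try:
            payload = json.loads(body)
        except ValueError:  # body was not JSON
            payload = None
        return response.status, payload
```

For example, fetch_engine_health("http://localhost:5555") would query a gateway on a local host, where the host and port are stand-ins for your own deployment.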

Metric

Description

sag_is_used_memory_bytes

Checks the percentage of the total JVM memory that is in use.

If memory usage stays above the recommended threshold value for more than 15 minutes, consider the severity as follows:

Above 80% threshold, Severity: ERROR

Above 90% threshold, Severity: CRITICAL

The steps to identify the causes of high JVM memory usage are as follows:

1. Check the cluster status of API Gateway using the following REST endpoint: GET /rest/apigateway/health/engine. The response shows whether API Gateway is healthy and running in cluster mode.

2. Generate the heap dump.

3. Analyze the logs of all the API Gateway instances.

4. Identify the server that has an issue and restart the server if required.

5. Perform the following actions after restarting the server:

a. Check for the readiness of API Gateway.

b. Check the cluster status of API Gateway using the following REST endpoint: GET /rest/apigateway/health/engine. The response shows whether API Gateway is healthy and running in cluster mode.

c. Check the availability of all the required system resources, such as memory, heap, and disk.

d. Check the connectivity between API Data Store and the API Gateway server.

e. For a clustered setup, check the Terracotta client logs for errors in Terracotta communication.
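The log checks in the steps above (the API Gateway instance logs and the Terracotta client logs) can be given a first pass with a simple scan. The sketch assumes that lines of interest contain the text "Exception" or "ERROR"; the actual log formats are not described here, so treat the markers and the function names as hypothetical and adjust them to what your logs contain.

```python
from pathlib import Path

# Assumption: problem lines contain one of these markers.
MARKERS = ("Exception", "ERROR")


def find_error_lines(lines, markers=MARKERS):
    """Return (line_number, text) for each line that contains any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path, markers=MARKERS):
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle, markers)
```

Running scan_log_file against each instance's log gives line numbers to jump to; it does not replace reading the surrounding context of each hit.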

Metric

Description

sag_is_http_requests

Checks the total number of HTTP or HTTPS requests received since the last statistics poll.

The statistics poll interval is controlled by the watt.server.stats.pollTime server configuration parameter. The default interval is 60 seconds.

If the total number of HTTP or HTTPS requests since the last statistics poll exceeds the threshold limit, which is based on the Throughput Per Second (TPS) value, consider the severity as ERROR.
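One way to relate the per-poll request count to a TPS value is to scale the TPS limit by the poll interval. This is a sketch under that assumption: the text does not state how the threshold is derived, and tps_limit is a hypothetical value that you would choose for your own deployment.

```python
DEFAULT_POLL_SECONDS = 60  # default of watt.server.stats.pollTime stated above


def requests_per_second(requests_since_poll, poll_seconds=DEFAULT_POLL_SECONDS):
    """Average request rate over one statistics poll interval."""
    return requests_since_poll / poll_seconds


def exceeds_tps_limit(requests_since_poll, tps_limit, poll_seconds=DEFAULT_POLL_SECONDS):
    """True when the per-poll count exceeds tps_limit * poll_seconds (assumed rule)."""
    return requests_since_poll > tps_limit * poll_seconds
```

With the default 60-second interval and a hypothetical limit of 100 TPS, the per-poll threshold is 6000 requests, so a poll that reports 6001 requests would be flagged as ERROR.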