Monitoring API Gateway

As part of application monitoring, you can monitor the state, that is the cluster status and console access of API Gateway along with the resources.

How do I monitor the health of API Gateway?

Prerequisites:

You must have a valid API Gateway user credential for using the Readiness Probe, Runtime Service Health Probe, and Administration Service Health Probe.

All the node level probes must be setup to target the local instance, typically, localhost.

Software AG recommends to set up a dedicated port for monitoring with an appropriate private thread pool.

Readiness Probe at Node-Level

To monitor the readiness of API Gateway, that is to check if the traffic-serving port of a particular API Gateway node is ready to accept requests, use the following REST endpoint:

GET /rest/apigateway/health

The following table shows the response code and the description.

Response	Description
200 OK	Readiness check is successful. Readiness probe continues to reply OK if API Gateway remains in an operational state to serve the requests.
500 Internal server error	Readiness check failed and denotes a problem.
timeout or no response as the request did not reach the probe	Several factors can contribute to the delay when the Readiness Probe initiates, which may result in the timeout errors. To know the reasons for timeout errors, see Causes for timeout errors for more information.

Note:
As this is a Readiness Probe and only the response status code is essential, by design, JSON payload is not returned in the response for both success and failure scenarios.

Runtime Service Health Probe at Node-Level

To monitor the runtime service health of of API Gateway, that is to check the overall cluster health and to identify if the components of a particular API Gateway node are in an operational state, use the following REST endpoint:

GET /rest/apigateway/health/engine

The following table shows the response code and the description.

Response	Description
200 OK	Runtime service health check is successful.
500 Internal server error	Runtime service health check failed and denotes a problem. The response JSON indicates the problem.
timeout or no response as the request did not reach the probe	Several factors can contribute to the delay when the Runtime Service Health Probe initiates, which may result in the timeout errors. To know the reasons for timeout errors, see Causes for timeout errors for more information.

The response JSON of each health check request displays a status field in the response.

The overall status of API Gateway node can be green ,yellow, or red.

Status	Description
green	Indicates that the cluster within the node is in a healthy state.
yellow	Indicates that API Gateway does not have adequate resources to run.
red	Indicates the cluster failure in the node and an outage.

The overall status of API Gateway node is assessed based on the API Data Store status, API Gateway resource status, and the Terracotta server status.

API Data Store status

Status	Description
green	Indicates that API Data Store is in a healthy state. When the status of API Data Store signals green or yellow, the overall status of API Gateway is green.
red	Indicates that API Data Store is not in a healthy state. When the status of API Data Store signals red, the overall status of API Gateway is red.
yellow	Indicates a node failure in the cluster. However, the cluster is still functioning and operational.

API Gateway resource status

Status	Description
green	Indicates that API Gateway resource types like memory, disk space, and service threads are available to run.
yellow	Indicates that API Gateway does not have adequate resources to run. When the API Gateway resource status is yellow, the overall status of API Gateway is yellow.

Terracotta Server Array status

Status	Description
green	Indicates that Terracotta server is in a healthy state. When the status of Terracotta server signals green, the overall status of API Gateway is green.
red	Indicates that Terracotta server is not in a healthy state. When the status of Terracotta server signals red, the overall status of API Gateway is red.

A sample HTTP response is as follows:

{
"status": "green",
"elasticsearch": {
"cluster_name": "SAG_EventDataStore",
"status": "yellow",
"number_of_nodes": "1",
"number_of_data_nodes": "1",
"timed_out": "false",
"active_shards": "95",
"initializing_shards": "0",
"unassigned_shards": "92",
"task_max_waiting_in_queue_millis": "0",
"port_9240": "ok",
"response_time_ms": "526"
},
"is": {
"status": "green",
"diskspace": {
"status": "up",
"free": "908510568448",
"inuse": "104799719424",
"threshold": "101331028787",
"total": "1013310287872"
},
"memory": {
"status": "up",
"freemem": "425073672",
"maxmem": "954728448",
"threshold": "92222259",
"totalmem": "922222592"
},
"servicethread": {
"status": "up",
"avail": "72",
"inuse": "3",
"max": "75",
"threshold": "7"
},
"response_time_ms": "258"
},
"terracotta": {
"status": "green",
"nodes": "1",
"healthy_nodes": "1",
"response_time_ms": "22"
}
}

The overall engine status is green since all components work as expected.

Administration Service Health Probe at Node-Level

To check the availability and health status of the API Gateway administration service (UI, Dashboards) on a particular API Gateway node, use the following rest endpoint:

GET /rest/apigateway/health/admin

The following table shows the response code and the description.

Response	Description
200 OK	Administration service health check is successful.
500 Internal server error	Denotes a problem. The response JSON indicates the problem.
timeout or no response as the request did not reach the probe	Several factors can contribute to the delay when you initiate the Administration Service Health Probe, which may result in the timeout errors. To know the reasons for timeout errors, see Causes for timeout errors for more information.

The overall Administration Service Health Probe status can be green or red based on the API Gateway administration service's status and Kibana's status.

Kibana status

Status	Description
green	Indicates that Kibana's port is accessible. When the status signals green, the overall status of Administration Service Health Probe is green.
red	Indicates that either Kibana's port is inaccessible or Kibana's communication with API Data Store is not established. When the status signals red, the overall status of Administration Service Health Probe is red.

API Gateway administration service status

Status	Description
green	Indicates that API Gateway administration service is available. When the status signals green, the overall status of Administration Service Health Probe is green.
red	Indicates that API Gateway administration service is not available. When the status signals red, the overall status of Administration Service Health Probe is red.

A sample HTTP response is as follows:

{
"status": "green",
"ui": {
"status": "green",
"response_time_ms": "40"
},
"kibana": {
"status": {
"overall": {
"state": "green",
"nickname": "Looking good",
"icon": "success",
"uiColor": "secondary"
}
},
"response_time_ms": "36"
}
}

The overall status is green since API Gateway administration service and Kibana is in a healthy state.

How do I collect metrics?

To check the usage of the application and system parameters, use the following metrics endpoint: GET /metrics. When the endpoint is called, API Gateway gathers metrics and returns the data in the Prometheus format.

Note:
Prometheus is a non- Software AG dashboarding tool that helps in trend analysis. For more information, see https://prometheus.io/.

Prometheus metrics are exposed through the following endpoint.

[http|https]://host:port/metrics

The metrics endpoint by default is available on the following ports:

Default primary port (http). 5555

Default secure port (https). 5543

Default diagnostic port (debug port). 9999

A sample for the metrics endpoint is as follows:

http://server:5555/metrics

Authentication for the metrics endpoint

By default, the authentication is disabled when running API Gateway as Docker container.

For on-premise installations, the following environment variable can be set to switch off the authentication for the metrics endpoint:

SAG_IS_METRICS_ENDPOINT_ACL=Anonymous

The endpoint also exposes the Integration Server Prometheus metrics. For more details on the Integration Server Prometheus metrics, see Developing Microservices with webMethods Microservices Runtime.

Exposing API Gateway Prometheus Metrics over a dedicated port

The metrics endpoint can be made available on a custom port. After creating the port, add the following service to the port's allow list:

wm.server.query:getPrometheusStats

Similarly, the metrics endpoint can be removed from the default ports (5555 or 5543 or 9999) by removing the service from the allow or deny lists.