API Operational Metrics

API Gateway provides metric statistics for API calls, errors, and error rates. API-level monitoring measures the availability of the deployed APIs, for example through error rates and performance (latency). You can use these metrics to measure service and business availability.

The key metrics monitored are:

Error rates

API transaction error rate per API and the aggregated value

API execution error rate per API and the aggregated value

Backend API errors per API and the aggregated value

Errors arising from inter-component interactions (such as API Gateway to Elasticsearch)

Performance (latency)

API performance per API

API Gateway performance and Backend API performance

Aggregated latency introduced by API Gateway

The API-level metrics you monitor have the following characteristics:

The metrics continue to exist when an API is deactivated or activated.

The metrics for a deleted API are no longer reported.

Note:

The metric count starts from zero when the server starts.

The API invocations are counted per node and not for the complete cluster.

The API invocations are counted only within an API Gateway instance.

After each scrape, that is, after a /metrics request has been served, all Gauge values are reset.
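Because counters start from zero at server start and invocations are counted per node, a cluster-wide figure has to be assembled by the consumer of the metrics. The following is a minimal Python sketch of that idea, not part of API Gateway; the function names and the node identifiers are hypothetical, and it assumes that a counter value lower than the previous one means the server restarted.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of one node's counter between two scrapes.

    Assumption: if the current value is lower than the previous one,
    the server restarted and the counter began again from zero, so the
    current value itself is the increase since the restart.
    """
    if current < previous:
        return current
    return current - previous


def cluster_increase(previous_by_node: dict, current_by_node: dict) -> float:
    """Sum the per-node increases to obtain a cluster-wide increase."""
    total = 0.0
    for node, current in current_by_node.items():
        # A node seen for the first time is treated as starting from zero.
        previous = previous_by_node.get(node, 0.0)
        total += counter_increase(previous, current)
    return total
```

For example, if one node goes from 10 to 15 and a second node restarts and goes from 40 to 3, the cluster-wide increase computed this way is 8.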

The following tables list the API-level metrics and the server-level metrics, together with the Prometheus metric type and the labels associated with each metric that you use for monitoring.

API-level Metrics

The API-level metrics monitor API transaction error rates and API performance. Each metric is broken down by labels such as the API name, the API version, the response code, or the tenant that receives the API invocation request.

Prometheus metric name	Description	Label
sag_apigw_api_exec_error_count	This is an error rate related metric. Counts the number of API invocations that fail due to an unexpected error. This does not include policy violations and backend failures. Prometheus metric type: Counter	api_name api_version env
sag_apigw_api_backend_error_count	This is an error rate related metric. Counts the number of backend service failures during API invocations with a response code greater than or equal to 300 and classifies the failures by response code with the label 3xx, 4xx, 5xx or connect. The connect reason means that the backend service cannot be reached, for example, due to network outages. Prometheus metric type: Counter	api_name api_version code (can have values 3xx, 4xx, 5xx, connect) env
sag_apigw_api_tx_error_count	This is an error rate related metric. Counts the number of API invocations with a response code greater than or equal to 300 and classifies the invocations by response code with the label 3xx, 4xx or 5xx. Prometheus metric type: Counter	api_name api_version code (can have values 3xx, 4xx, 5xx) env error (can have values internal, policy, backend)
sag_apigw_api_avg_latency	This is a latency related metric. Measures the average time spent by the API invocations, which does not include the time spent in backend services, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. Prometheus metric type: Gauge	api_name api_version code (can have values 2xx, 3xx, 4xx, 5xx) env
sag_apigw_api_avg_backend_response_time	This is a latency related metric. Measures the average time spent by the backend services while performing an API invocation request, classified by its response code with the label 2xx, 3xx, 4xx, 5xx or connect. Prometheus metric type: Gauge	api_name api_version code (can have values 2xx, 3xx, 4xx, 5xx, connect) env
sag_apigw_api_avg_response_time	This is a latency related metric. Measures the average time spent by the API invocations, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. This includes the time spent in API Gateway and by the backend service. Prometheus metric type: Gauge	api_name api_version code (can have values 2xx, 3xx, 4xx, 5xx) env
sag_apigw_apicalls_total	The total number of API invocations per HTTP response code. Prometheus metric type: Counter	code env
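The API-level counters carry one entry per label combination, so a per-API total is obtained by summing over the code and error labels. The following Python sketch illustrates this; the function name and the input shape (a list of label dictionary and value pairs that has already been parsed from the /metrics response) are assumptions made for the example.

```python
from collections import defaultdict


def errors_per_api(samples):
    """Sum counter values per (api_name, api_version).

    `samples` is a list of (labels, value) pairs, where `labels` is a
    dict that contains at least the api_name and api_version labels.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = (labels["api_name"], labels["api_version"])
        totals[key] += value
    return dict(totals)
```

Applied to sag_apigw_api_tx_error_count samples, this collapses the 3xx, 4xx and 5xx entries of one API version into a single figure.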

Server-level Metrics

The server-level metrics monitor API invocation errors caused by inter-component connectivity. Each metric is broken down by labels such as the component or the tenant that receives the API invocation request.

Prometheus metric name	Description	Label
sag_apigw_tx_error_count	This is an error rate related metric. Counts the number of API invocations with a response code greater than or equal to 300 and classifies the invocations by response code with the label 3xx, 4xx or 5xx. Prometheus metric type: Counter	code (can have values 3xx, 4xx, 5xx) env error (can have values internal, policy, backend)
sag_apigw_backend_error_count	This is an error rate related metric. Counts the number of backend service failures during all API invocations with a response code greater than or equal to 300 and classifies the failures by response code with the label 3xx, 4xx, 5xx or connect. The connect reason means that the backend service cannot be reached, for example, due to network outages. Prometheus metric type: Counter	code (can have values 3xx, 4xx, 5xx, connect) env
sag_apigw_component_error_count	This is an error rate related metric. Counts the number of exceptions that occur when interacting with components or destinations. Prometheus metric type: Counter	component env
sag_apigw_exec_error_count	This is an error rate related metric. Counts the number of API invocations that fail due to an unexpected error (for example, a NullPointerException) and provides an aggregated number. Prometheus metric type: Counter	env
sag_apigw_avg_response_time	This is a latency related metric. Measures the average time spent by all API invocations, classified by response code with the label 2xx, 3xx, 4xx or 5xx. This includes the time spent in API Gateway and by the backend services. Prometheus metric type: Gauge	code (can have values 2xx, 3xx, 4xx, 5xx) env
sag_apigw_avg_latency	This is a latency related metric. Measures the average time spent by all API invocations, which does not include the time spent in backend services, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. Prometheus metric type: Gauge	code (can have values 2xx, 3xx, 4xx, 5xx) env
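The tables list error counts rather than a ready-made error rate. One way to derive a ratio for a single node is to divide the sum of sag_apigw_tx_error_count by the sum of sag_apigw_apicalls_total, as in the Python sketch below. This is an illustration under assumptions: the function name is hypothetical, and both inputs are expected to come from the same node and the same scrape.

```python
def error_ratio(tx_errors_by_code: dict, calls_by_code: dict) -> float:
    """Share of invocations that ended with a response code >= 300.

    tx_errors_by_code: values of sag_apigw_tx_error_count keyed by the
        code label (3xx, 4xx, 5xx).
    calls_by_code: values of sag_apigw_apicalls_total keyed by the
        HTTP response code.
    """
    total_calls = sum(calls_by_code.values())
    if total_calls == 0:
        # No invocations yet, so there is nothing to divide by.
        return 0.0
    return sum(tx_errors_by_code.values()) / total_calls
```

With 4 counted transaction errors and 40 invocations in total, the function returns a ratio of one tenth.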

Prometheus labels

Prometheus label	Description
api_name	The name of the API for which the API invocation failed.
api_version	The version of the API for which the API invocation failed.
component	The name of the API Gateway component that cannot be reached.
code	The code label shows the HTTP response code for the API calls counted. For each HTTP response code that occurred during the lifetime of the API Gateway server, the metrics response contains a separate counter entry.
env	The name of the customer environment. The value of the env label is taken from the pg.gateway.elasticsearch.tenantId property in the config.properties file located at SAGInstallDir/IntegrationServer/instances/instance_name/packages/WmAPIGateway/config/resources/elasticsearch.
error	The type of error encountered, such as internal error, policy violation, or backend error.
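The code label appears in two forms: sag_apigw_apicalls_total reports individual HTTP response codes, while the error and latency metrics use the classes 2xx to 5xx and connect. To compare the two, the individual codes can be folded into classes. The following Python sketch shows one way to do this; the function names are hypothetical.

```python
def code_class(code: str) -> str:
    """Map an HTTP response code such as "404" to its class "4xx".

    Values that are not three-digit codes (for example "4xx" or
    "connect") are returned unchanged.
    """
    if len(code) == 3 and code.isdigit():
        return code[0] + "xx"
    return code


def calls_by_class(calls_by_code: dict) -> dict:
    """Aggregate per-code call counts into per-class call counts."""
    totals = {}
    for code, count in calls_by_code.items():
        key = code_class(code)
        totals[key] = totals.get(key, 0) + count
    return totals
```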

Example: Sample Metrics

Here are a few sample metrics.

Sample execution error count metrics are as follows:

In this example, no unexpected execution errors are counted for the particular API.

# HELP sag_apigw_api_exec_error_count Number of unexpected execution errors
# TYPE sag_apigw_api_exec_error_count counter
sag_apigw_api_exec_error_count{api_version="1.0",env="default",api_name="TestMetricsRest"} 0 1646997393690

In this example, no unexpected execution errors are counted for any of the executing APIs.

# HELP sag_apigw_exec_error_count Number of aggregated execution errors
# TYPE sag_apigw_exec_error_count counter
sag_apigw_exec_error_count{env="default"} 0 1646997393690

Sample backend error count metrics are as follows:

In this example there is 1 connection error counted for the backend service of the particular API.

# HELP sag_apigw_api_backend_error_count Number of backend errors
# TYPE sag_apigw_api_backend_error_count counter
sag_apigw_api_backend_error_count{code="connect",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690

In this example there is 1 connection error counted for backend services of all executing APIs.

# HELP sag_apigw_backend_error_count Number of aggregated backend errors
# TYPE sag_apigw_backend_error_count counter
sag_apigw_backend_error_count{code="connect",env="default"} 1 1646997393690

Sample transaction error count metrics are as follows:

In this example there is 1 transaction error counted for a particular API. This is an ordinary policy error with the response code 4xx.

# HELP sag_apigw_api_tx_error_count Number of transaction errors
# TYPE sag_apigw_api_tx_error_count counter
sag_apigw_api_tx_error_count{code="4xx",api_version="1.0",env="default",error="policy",api_name="TestMetricsRest"} 1 1646997393690

In this example there is 1 transaction error counted for a particular API. This is a backend service error with the response code 5xx.

# HELP sag_apigw_api_tx_error_count Number of transaction errors
# TYPE sag_apigw_api_tx_error_count counter
sag_apigw_api_tx_error_count{code="5xx",api_version="1.0",env="default",error="backend",api_name="TestMetricsRest"} 1 1646997393690

Sample API Gateway latency metrics are as follows:

In this example, 1 millisecond is the average time spent in API Gateway while processing incoming API requests that returned a 4xx response code.

# HELP sag_apigw_api_avg_latency Average API Gateway latency
# TYPE sag_apigw_api_avg_latency gauge
sag_apigw_api_avg_latency{code="4xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690

In this example, 2328 milliseconds is the average time spent in API Gateway while processing incoming API requests that returned a 5xx response code.

# HELP sag_apigw_api_avg_latency Average API Gateway latency
# TYPE sag_apigw_api_avg_latency gauge
sag_apigw_api_avg_latency{code="5xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 2328 1646997393690

Sample backend response time metrics are as follows. In this example, 2315 milliseconds is the average time spent by the backend services while performing an API invocation request, reported with the label code=connect. The connect reason means that the backend service cannot be reached, for example, due to network outages.

# HELP sag_apigw_api_avg_backend_response_time Average backend service duration
# TYPE sag_apigw_api_avg_backend_response_time gauge
sag_apigw_api_avg_backend_response_time{code="connect",api_version="1.0",env="default",api_name="TestMetricsRest"} 2315 1646997393690

Sample average response time metrics are as follows:

In this example, 1 millisecond is the average response time for incoming API requests that returned a 4xx response code.

# HELP sag_apigw_api_avg_response_time Average request duration
# TYPE sag_apigw_api_avg_response_time gauge
sag_apigw_api_avg_response_time{code="4xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690

In this example, 2328 milliseconds is the average response time for incoming API requests that returned a 5xx response code.

# HELP sag_apigw_api_avg_response_time Average request duration
# TYPE sag_apigw_api_avg_response_time gauge
sag_apigw_api_avg_response_time{code="5xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 2328 1646997393690

Sample API call metrics are as follows. In this example, 32 API calls are counted with the HTTP response code 200.

# HELP sag_apigw_apicalls_total Total number of API invocations per response code
# TYPE sag_apigw_apicalls_total counter
sag_apigw_apicalls_total{code="200",env="default"} 32 1635169035001
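The sample lines above follow the Prometheus text format: a metric name, an optional label set in braces, a value, and an optional timestamp in milliseconds. For ad hoc inspection, such lines can be split into their parts with a small parser. The following Python sketch is an illustration only; the function name is hypothetical, it does not handle label values that contain a closing brace, and a Prometheus client library is the more robust choice for production use.

```python
import re

# metric name, optional {labels}, value, optional timestamp
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\s*'
    r'(?:\{(?P<labels>[^}]*)\})?\s+'
    r'(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# one label pair: key="value", allowing backslash escapes in the value
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Parse one sample line into (name, labels, value, timestamp).

    Returns None for blank lines and for # HELP / # TYPE comment lines.
    Raises ValueError if the line is not a recognizable sample.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _SAMPLE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(_LABEL.findall(match.group("labels") or ""))
    ts = match.group("ts")
    return (
        match.group("name"),
        labels,
        float(match.group("value")),
        int(ts) if ts is not None else None,
    )
```

Calling parse_sample on the backend error sample returns the metric name, a dictionary with the code, api_version, env and api_name labels, the value 1.0, and the timestamp 1646997393690.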