API Operational Metrics
API Gateway provides metrics for API calls, errors, and error rates. API-level monitoring measures the availability of the deployed APIs, for example through error rates and performance (latency). You can use these metrics to measure service and business availability.
The key metrics monitored are:
*Error rates
    *API transaction error rate per API and the aggregated value
    *API execution error rate per API and the aggregated value
    *Backend API errors per API and the aggregated value
    *Errors arising from inter-component interactions (such as API Gateway to Elasticsearch)
*Performance (latency)
    *API performance per API
    *API Gateway performance and Backend API performance
    *Aggregated latency introduced by API Gateway
The API-level metrics you monitor have the following characteristics:
*The metrics continue to exist when an API is deactivated or activated.
*The metrics for a deleted API are no longer reported.
Note: 
*The metric count starts from zero when the server starts.
*The API invocations are counted per node and not for the complete cluster.
*The API invocations are counted only within an API Gateway instance.
*After each scrape, that is, after a /metrics request is served, all Gauge values are reset.
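Because each node keeps its own counters and every counter starts again from zero when the server starts, a cluster-wide figure has to be assembled by whoever consumes the metrics. The following Python sketch shows one possible way to do this, assuming you already hold the counter values from two consecutive scrapes of each node. The function names and the node names are hypothetical and are not part of API Gateway.

```python
def counter_increase(previous, current):
    """Return how much a counter grew between two scrapes of one node.

    A current value below the previous one is treated as a server restart,
    after which the counter started again from zero.
    """
    if current < previous:
        return current
    return current - previous


def cluster_increase(previous_by_node, current_by_node):
    """Sum the per-node increases to obtain a cluster-wide increase."""
    total = 0
    for node, current in current_by_node.items():
        # A node that was not seen before is assumed to have started at zero.
        previous = previous_by_node.get(node, 0)
        total += counter_increase(previous, current)
    return total


# Hypothetical counter values from two scrapes; node-b restarted in between.
previous = {"node-a": 10, "node-b": 7}
current = {"node-a": 14, "node-b": 2}
print(cluster_increase(previous, current))  # prints 6
```

Monitoring systems such as Prometheus apply comparable reset handling when they compute counter increases, so a hand-written helper like this is mainly useful for ad hoc scripts.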
The following tables list the API-level metrics and the server-level metrics, together with the Prometheus metric type and the labels associated with each metric.
API-level Metrics
The API-level metrics monitor the API transaction error rates and API performance. Each metric is broken down by its labels, such as the API name, the API version, the response code, or the tenant that receives the API invocation request.
Prometheus metric name
Description
Label
sag_apigw_api_exec_error_count
This is an error rate related metric.
Counts the number of API invocations that fail due to an unexpected error. This does not include policy violations and backend failures.
Prometheus metric type: Counter
*api_name
*api_version
*env
sag_apigw_api_backend_error_count
This is an error rate related metric.
Counts the number of backend service failures during API invocations with a response code greater than or equal to 300 and classifies the failures by their response code with the label 3xx, 4xx, 5xx or connect. The connect reason means that the backend service cannot be reached, for example, due to network outages.
Prometheus metric type: Counter
*api_name
*api_version
*code (can have values 3xx, 4xx, 5xx, connect)
*env
sag_apigw_api_tx_error_count
This is an error rate related metric.
Counts the number of API invocations with a response code greater than or equal to 300 and classifies the invocations by their response code with the label 3xx, 4xx or 5xx.
Prometheus metric type: Counter
*api_name
*api_version
*code (can have values 3xx, 4xx, 5xx)
*env
*error (can have values internal, policy, backend)
sag_apigw_api_avg_latency
This is a latency related metric.
Measures the average time spent by the API invocations, which does not include the time spent in backend services, classified by its response code with the label 2xx, 3xx, 4xx or 5xx.
Prometheus metric type: Gauge
*api_name
*api_version
*code (can have values 2xx, 3xx, 4xx, 5xx)
*env
sag_apigw_api_avg_backend_response_time
This is a latency related metric.
Measures the average time spent by the backend services while performing an API invocation request, classified by its response code with the label 2xx, 3xx, 4xx, 5xx or connect.
Prometheus metric type: Gauge
*api_name
*api_version
*code (can have values 2xx, 3xx, 4xx, 5xx, connect)
*env
sag_apigw_api_avg_response_time
This is a latency related metric.
Measures the average time spent by the API invocations, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. This includes the time spent in API Gateway and by the backend service.
Prometheus metric type: Gauge
*api_name
*api_version
*code (can have values 2xx, 3xx, 4xx, 5xx)
*env
sag_apigw_apicalls_total
The total number of API invocations per HTTP response code.
Prometheus metric type: Counter
*code
*env
Server-level Metrics
The server-level metrics monitor the API invocation errors due to inter-component connectivity. Each metric is broken down by its labels, such as the component and the tenant that receives the API invocation request.
Prometheus metric name
Description
Label
sag_apigw_tx_error_count
This is an error rate related metric.
Counts the number of API invocations with a response code greater than or equal to 300 and classifies the invocations by their response code with the label 3xx, 4xx or 5xx.
Prometheus metric type: Counter
*code (can have values 3xx, 4xx, 5xx)
*env
*error (can have values internal, policy, backend)
sag_apigw_backend_error_count
This is an error rate related metric.
Counts the number of backend service failures during all API invocations with a response code greater than or equal to 300 and classifies the failures by their response code with the label 3xx, 4xx, 5xx or connect. The connect reason means that the backend service cannot be reached, for example, due to network outages.
Prometheus metric type: Counter
*code (can have values 3xx, 4xx, 5xx, connect)
*env
sag_apigw_component_error_count
This is an error rate related metric.
Counts the number of exceptions that occur when interacting with components or destinations.
Prometheus metric type: Counter
*component
*env
sag_apigw_exec_error_count
This is an error rate related metric.
Counts the number of API invocations that fail due to an unexpected error (for example, a NullPointerException) and provides an aggregated number.
Prometheus metric type: Counter
*env
sag_apigw_avg_response_time
This is a latency related metric.
Measures the average time spent by all API invocations, classified by their response code with the label 2xx, 3xx, 4xx or 5xx. This includes the time spent in API Gateway and by the backend services.
Prometheus metric type: Gauge
*code (can have values 2xx, 3xx, 4xx, 5xx)
*env
sag_apigw_avg_latency
This is a latency related metric.
Measures the average time spent by all API invocations, which does not include the time spent in backend services, classified by its response code with the label 2xx, 3xx, 4xx or 5xx.
Prometheus metric type: Gauge
*code (can have values 2xx, 3xx, 4xx, 5xx)
*env
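The counters listed above can be combined into an overall error rate. The Python sketch below divides the sum of the sag_apigw_tx_error_count values by the sum of the sag_apigw_apicalls_total values. It assumes that both sets of values come from the same node and the same scrape, and the numbers used are made up for illustration. The function name is hypothetical.

```python
def error_rate(error_counts, call_counts):
    """Return errors divided by total calls, or 0.0 when there were no calls."""
    total_calls = sum(call_counts.values())
    if total_calls == 0:
        return 0.0
    return sum(error_counts.values()) / total_calls


# Hypothetical counter values, keyed by the value of the code label.
tx_errors = {"4xx": 3, "5xx": 1}                # sag_apigw_tx_error_count
api_calls = {"200": 36, "401": 3, "500": 1}     # sag_apigw_apicalls_total

print(f"{error_rate(tx_errors, api_calls):.2%}")  # prints 10.00%
```

Because the counters restart from zero with the server, a rate computed from raw totals describes the period since the last server start rather than a fixed time window.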
Prometheus labels
Prometheus label
Description
api_name
The name of the API for which the API invocation failed.
api_version
The version of the API for which the API invocation failed.
component
The name of the API Gateway component that cannot be reached.
code
The code label shows the HTTP response code for the API calls counted. For each HTTP response code that occurred during the lifetime of the API Gateway server, the metrics response contains a separate counter entry.
env
The name of the customer environment.
The value of the env label is taken from the pg.gateway.elasticsearch.tenantId property in the config.properties file located at SAGInstallDir/IntegrationServer/instances/instance_name/packages/WmAPIGateway/config/resources/elasticsearch.
error
The type of error encountered, such as internal error, policy violation, or backend error.
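To check which value the env label will carry, you can look up the pg.gateway.elasticsearch.tenantId property mentioned above. The following Python sketch reads a property from Java-style properties text. It is a simplified reader that assumes plain key=value lines and does not handle line continuations or other separators. The function name and the sample excerpt are hypothetical.

```python
def read_property(text, key):
    """Return the value of key from simple key=value properties text, or None."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        name, separator, value = line.partition("=")
        if separator and name.strip() == key:
            return value.strip()
    return None


# Hypothetical excerpt of config.properties, used only for illustration.
sample = """# Elasticsearch settings
pg.gateway.elasticsearch.tenantId=default
"""

print(read_property(sample, "pg.gateway.elasticsearch.tenantId"))  # prints default
```

In practice, you would pass the contents of the config.properties file instead of the inline sample.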
Example: Sample Metrics
Here are a few sample metrics.
*Sample execution error count metrics are as follows:
In this example there are no unexpected execution errors counted for the particular API.
# HELP sag_apigw_api_exec_error_count Number of unexpected execution errors
# TYPE sag_apigw_api_exec_error_count counter
sag_apigw_api_exec_error_count{api_version="1.0",env="default",api_name="TestMetricsRest"} 0 1646997393690
In this example there are no unexpected execution errors counted for any of the executing APIs.
# HELP sag_apigw_exec_error_count Number of aggregated execution errors
# TYPE sag_apigw_exec_error_count counter
sag_apigw_exec_error_count{env="default"} 0 1646997393690
*Sample backend error count metrics are as follows:
In this example there is 1 connection error counted for the backend service of the particular API.
# HELP sag_apigw_api_backend_error_count Number of backend errors
# TYPE sag_apigw_api_backend_error_count counter
sag_apigw_api_backend_error_count{code="connect",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690

In this example there is 1 connection error counted for backend services of all executing APIs.
# HELP sag_apigw_backend_error_count Number of aggregated backend errors
# TYPE sag_apigw_backend_error_count counter
sag_apigw_backend_error_count{code="connect",env="default"} 1 1646997393690
*Sample transaction error count metrics are as follows:
In this example there is 1 transaction error counted for a particular API. This is an ordinary policy error with the response code 4xx.
# HELP sag_apigw_api_tx_error_count Number of transaction errors
# TYPE sag_apigw_api_tx_error_count counter
sag_apigw_api_tx_error_count{code="4xx",api_version="1.0",env="default",error="policy",api_name="TestMetricsRest"} 1 1646997393690
In this example there is 1 transaction error counted for a particular API. This is a backend service error with the response code 5xx.
# HELP sag_apigw_api_tx_error_count Number of transaction errors
# TYPE sag_apigw_api_tx_error_count counter
sag_apigw_api_tx_error_count{code="5xx",api_version="1.0",env="default",error="backend",api_name="TestMetricsRest"} 1 1646997393690

*Sample API Gateway latency metrics are as follows:
In this example, 1 millisecond is the average time spent in API Gateway while processing incoming API requests that returned a 4xx response code.
# HELP sag_apigw_api_avg_latency Average API Gateway latency
# TYPE sag_apigw_api_avg_latency gauge
sag_apigw_api_avg_latency{code="4xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690

In this example, 2328 milliseconds is the average time spent in API Gateway while processing incoming API requests that returned a 5xx response code.
# HELP sag_apigw_api_avg_latency Average API Gateway latency
# TYPE sag_apigw_api_avg_latency gauge
sag_apigw_api_avg_latency{code="5xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 2328 1646997393690

*Sample backend response time metrics are as follows. In this example, 2315 milliseconds is the average time spent by the backend services while performing an API invocation request, measured with the label code=connect. The connect reason means that the backend service cannot be reached, for example, due to network outages.
# HELP sag_apigw_api_avg_backend_response_time Average backend service duration
# TYPE sag_apigw_api_avg_backend_response_time gauge
sag_apigw_api_avg_backend_response_time{code="connect",api_version="1.0",env="default",api_name="TestMetricsRest"} 2315 1646997393690

*Sample average response time metrics are as follows:
In this example, 1 millisecond is the average response time for incoming API requests that returned a 4xx response code.
# HELP sag_apigw_api_avg_response_time Average request duration
# TYPE sag_apigw_api_avg_response_time gauge
sag_apigw_api_avg_response_time{code="4xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690

In this example, 2328 milliseconds is the average response time for incoming API requests that returned a 5xx response code.
# HELP sag_apigw_api_avg_response_time Average request duration
# TYPE sag_apigw_api_avg_response_time gauge
sag_apigw_api_avg_response_time{code="5xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 2328 1646997393690

*Sample API call metrics are as follows. In this example, 32 API calls are measured with the HTTP response code 200.
# HELP sag_apigw_apicalls_total Total number of API invocations per response code
# TYPE sag_apigw_apicalls_total counter
sag_apigw_apicalls_total{code="200",env="default"} 32 1635169035001
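
The sample output above follows the Prometheus text exposition format: comment lines starting with #, followed by lines of the form name{labels} value timestamp. If you want to inspect such output in a script rather than in Prometheus, a minimal Python sketch such as the following can split a line into its parts. It assumes label names written with underscores, as listed in the label table (api_name, api_version), and label values that do not contain a closing brace. The function name parse_sample is hypothetical.

```python
import re

# name, optional {labels}, value, optional timestamp in milliseconds
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'\s*(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one exposition line; return None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    timestamp = match.group("timestamp")
    return {
        "name": match.group("name"),
        "labels": dict(LABEL_RE.findall(match.group("labels") or "")),
        "value": float(match.group("value")),
        "timestamp_ms": int(timestamp) if timestamp else None,
    }


scrape = """# HELP sag_apigw_api_backend_error_count Number of backend errors
# TYPE sag_apigw_api_backend_error_count counter
sag_apigw_api_backend_error_count{code="connect",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690
"""

samples = [s for s in map(parse_sample, scrape.splitlines()) if s is not None]
for sample in samples:
    print(sample["name"], sample["labels"]["code"], sample["value"])
```

For production use, an established Prometheus client library or Prometheus itself is a safer choice than a hand-written parser, because the full format includes escaping rules that this sketch handles only partially.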