API Operational Metrics
API Gateway provides metric statistics for API calls, errors, and error rates. API-level monitoring measures the availability of the deployed APIs, for example, through error rates and performance (latency). You can use these metrics to measure service and business availability.
The key metrics monitored are:
Error rates
API transaction error rate per API and the aggregated value
API execution error rate per API and the aggregated value
Backend API errors per API and the aggregated value
Errors arising from inter-component interactions (such as API Gateway to Elasticsearch)
Performance (latency)
API performance per API
API Gateway performance and Backend API performance
Aggregated latency introduced by API Gateway
The API-level metrics you monitor have the following characteristics:
The metrics continue to exist when an API is deactivated or activated.
The metrics for a deleted API are no longer reported.
Note:
The metric count starts from zero when the server starts.
The API invocations are counted per node and not for the complete cluster.
The API invocations are counted only within an API Gateway instance.
After scraping, that is, after sending the /metrics request, all Gauge values are reset.
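Because counters start from zero at server start and are kept per node, a consumer that wants a cluster-wide figure has to handle restarts and add up the nodes itself. The following is a minimal sketch of that bookkeeping, not part of API Gateway; the node names and scrape values are hypothetical, and treating any drop in value as a restart is an assumption of this sketch.

```python
from typing import Dict, List


def counter_increase(samples: List[float]) -> float:
    """Total increase across successive scrapes of one counter on one node.

    A drop between two scrapes is treated as a server restart: the counter
    began again at zero, so the whole new value counts as increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        total += current if current < previous else current - previous
    return total


def cluster_increase(per_node: Dict[str, List[float]]) -> float:
    """Sum the per-node increases to get a cluster-wide figure."""
    return sum(counter_increase(samples) for samples in per_node.values())


# Hypothetical scrape history: node-a restarts between its 2nd and 3rd scrape.
history = {
    "node-a": [10, 14, 3, 5],
    "node-b": [7, 9, 12, 12],
}
print(cluster_increase(history))
```

Increments that happen between the last scrape before a restart and the restart itself are lost in this approach, so the result is a lower bound.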
The following tables list the API-level metrics and the server-level metrics, together with their Prometheus metric types and the labels associated with each metric.
API-level Metrics
The API-level metrics monitor the API transaction error rates and API performance. The values are broken down by the labels attached to each metric, such as the API name, the API version, the response code, or the tenant that receives the API invocation request.
Prometheus metric name | Description | Label |
sag_apigw_api_exec_error_count | This is an error rate related metric. Counts the number of API invocations that fail due to an unexpected error. This does not include policy violations and backend failures. Prometheus metric type: Counter | api_name api_version env |
sag_apigw_api_backend_error_count | This is an error rate related metric. Counts the number of backend service failures during API invocations with a response code greater than or equal to 300 and classifies the failures by their response code with the label 3xx, 4xx, 5xx or connect. The connect reason means that the backend service cannot be reached, for example, due to network outages. Prometheus metric type: Counter | api_name api_version code (can have values 3xx, 4xx, 5xx, connect) env |
sag_apigw_api_tx_error_count | This is an error rate related metric. Counts the number of API invocations with a response code greater than or equal to 300 and classifies the invocations by their response code with the label 3xx, 4xx or 5xx. Prometheus metric type: Counter | api_name api_version code (can have values 3xx, 4xx, 5xx) env error (can have values internal, policy, backend) |
sag_apigw_api_avg_latency | This is a latency related metric. Measures the average time spent by the API invocations, which does not include the time spent in backend services, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. Prometheus metric type: Gauge | api_name api_version code (can have values 2xx, 3xx, 4xx, 5xx) env |
sag_apigw_api_avg_backend_response_time | This is a latency related metric. Measures the average time spent by the backend services while performing an API invocation request, classified by its response code with the label 2xx, 3xx, 4xx, 5xx or connect. Prometheus metric type: Gauge | api_name api_version code (can have values 2xx, 3xx, 4xx, 5xx, connect) env |
sag_apigw_api_avg_response_time | This is a latency related metric. Measures the average time spent by the API invocations, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. This includes the time spent in API Gateway and by the backend service. Prometheus metric type: Gauge | api_name api_version code (can have values 2xx, 3xx, 4xx, 5xx) env |
sag_apigw_apicalls_total | The total number of API invocations per HTTP response code. Prometheus metric type: Counter | code env |
Server-level Metrics
The server-level metrics monitor the API invocation errors due to inter-component connectivity. The values are broken down by the labels attached to each metric, such as the component and the tenant that receives the API invocation request.
Prometheus metric name | Description | Label |
sag_apigw_tx_error_count | This is an error rate related metric. Counts the number of API invocations with a response code greater than or equal to 300 and classifies the invocations by their response code with the label 3xx, 4xx or 5xx. Prometheus metric type: Counter | code (can have values 3xx, 4xx, 5xx) env error (can have values internal, policy, backend) |
sag_apigw_backend_error_count | This is an error rate related metric. Counts the number of backend service failures during all API invocations with a response code greater than or equal to 300 and classifies the failures by their response code with the label 3xx, 4xx, 5xx or connect. The connect reason means that the backend service cannot be reached, for example, due to network outages. Prometheus metric type: Counter | code (can have values 3xx, 4xx, 5xx, connect) env |
sag_apigw_component_error_count | This is an error rate related metric. Counts the number of exceptions that occur when interacting with components or destinations. Prometheus metric type: Counter | component env |
sag_apigw_exec_error_count | This is an error rate related metric. Counts the number of API invocations that fail due to an unexpected error (for example NPE) and provides an aggregated number. Prometheus metric type: Counter | env |
sag_apigw_avg_response_time | This is a latency related metric. Measures the average time spent by all API invocations, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. This includes the time spent in API Gateway and by the backend services. Prometheus metric type: Gauge | code (can have values 2xx, 3xx, 4xx, 5xx) env |
sag_apigw_avg_latency | This is a latency related metric. Measures the average time spent by all API invocations, which does not include the time spent in backend services, classified by its response code with the label 2xx, 3xx, 4xx or 5xx. Prometheus metric type: Gauge | code (can have values 2xx, 3xx, 4xx, 5xx) env |
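One way to turn these counters into an aggregated error rate is to divide the summed sag_apigw_tx_error_count values by the summed sag_apigw_apicalls_total values. The sketch below assumes that both counters were scraped from the same node at the same time and cover the same invocations, which the tables above do not state explicitly; the function name and the sample values are hypothetical.

```python
from typing import Dict


def error_rate(tx_errors_by_code: Dict[str, float],
               calls_by_code: Dict[str, float]) -> float:
    """Fraction of API invocations that ended as a transaction error.

    tx_errors_by_code: sag_apigw_tx_error_count values keyed by the code
        label (3xx, 4xx, 5xx), already summed over the error label.
    calls_by_code: sag_apigw_apicalls_total values keyed by the concrete
        HTTP response code (for example "200").
    """
    total_calls = sum(calls_by_code.values())
    if total_calls == 0:
        return 0.0  # no traffic yet, so no meaningful rate
    return sum(tx_errors_by_code.values()) / total_calls


# Hypothetical values taken from a single scrape.
rate = error_rate({"4xx": 3, "5xx": 1}, {"200": 32, "401": 3, "500": 1})
print(f"{rate:.1%}")
```

Since the counters are cumulative from server start, this yields a lifetime rate; for a rate over a time window, apply the same division to the increases between two scrapes.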
Prometheus labels
Prometheus label | Description |
api_name | The name of the API for which the API invocation failed. |
api_version | The version of the API for which the API invocation failed. |
component | The name of the API Gateway component that cannot be reached. |
code | The code label shows the HTTP response code for the API calls counted. For each HTTP response code that occurred during the lifetime of the API Gateway server, the metrics response contains a separate counter entry. |
env | The name of the customer environment. The value of the env label is taken from the pg.gateway.elasticsearch.tenantId property in the config.properties file located at SAGInstallDir/IntegrationServer/instances/instance_name/packages/WmAPIGateway/config/resources/elasticsearch. |
error | The type of error encountered, such as internal error, policy violation, or backend error. |
Example: Sample Metrics
Here are a few examples of sample metrics.
Sample execution error count metrics are as follows:
In this example there are no unexpected execution errors counted for the particular API.
# HELP sag_apigw_api_exec_error_count Number of unexpected execution errors
# TYPE sag_apigw_api_exec_error_count counter
sag_apigw_api_exec_error_count{api_version="1.0",env="default",api_name="TestMetricsRest"} 0 1646997393690
In this example there are no unexpected execution errors counted for any of the executing APIs.
# HELP sag_apigw_exec_error_count Number of aggregated execution errors
# TYPE sag_apigw_exec_error_count counter
sag_apigw_exec_error_count{env="default"} 0 1646997393690
Sample backend error count metrics are as follows:
In this example there is 1 connection error counted for the backend service of the particular API.
# HELP sag_apigw_api_backend_error_count Number of backend errors
# TYPE sag_apigw_api_backend_error_count counter
sag_apigw_api_backend_error_count{code="connect",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690
In this example there is 1 connection error counted for backend services of all executing APIs.
# HELP sag_apigw_backend_error_count Number of aggregated backend errors
# TYPE sag_apigw_backend_error_count counter
sag_apigw_backend_error_count{code="connect",env="default"} 1 1646997393690
Sample transaction error count metrics are as follows:
In this example there is 1 transaction error counted for a particular API. This is an ordinary policy error with the response code 4xx.
# HELP sag_apigw_api_tx_error_count Number of transaction errors
# TYPE sag_apigw_api_tx_error_count counter
sag_apigw_api_tx_error_count{code="4xx",api_version="1.0",env="default",error="policy",api_name="TestMetricsRest"} 1 1646997393690
In this example there is 1 transaction error counted for a particular API. This is a backend service error with the response code 5xx.
# HELP sag_apigw_api_tx_error_count Number of transaction errors
# TYPE sag_apigw_api_tx_error_count counter
sag_apigw_api_tx_error_count{code="5xx",api_version="1.0",env="default",error="backend",api_name="TestMetricsRest"} 1 1646997393690
Sample API Gateway latency metrics are as follows:
In this example, 1 millisecond is the average time spent in API Gateway processing incoming API requests that returned a 4xx response code.
# HELP sag_apigw_api_avg_latency Average API Gateway latency
# TYPE sag_apigw_api_avg_latency gauge
sag_apigw_api_avg_latency{code="4xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690
In this example, 2328 milliseconds is the average time spent in API Gateway processing incoming API requests that returned a 5xx response code.
# HELP sag_apigw_api_avg_latency Average API Gateway latency
# TYPE sag_apigw_api_avg_latency gauge
sag_apigw_api_avg_latency{code="5xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 2328 1646997393690
Sample backend response time metrics are as follows:
In this example, 2315 milliseconds is the average time spent by the backend services while performing an API invocation request, measured with the label code=connect. The connect reason means that the backend service cannot be reached, for example, due to network outages.
# HELP sag_apigw_api_avg_backend_response_time Average backend service duration
# TYPE sag_apigw_api_avg_backend_response_time gauge
sag_apigw_api_avg_backend_response_time{code="connect",api_version="1.0",env="default",api_name="TestMetricsRest"} 2315 1646997393690
Sample average response time metrics are as follows:
In this example, 1 millisecond is the average response time for incoming API requests that returned a 4xx response code.
# HELP sag_apigw_api_avg_response_time Average request duration
# TYPE sag_apigw_api_avg_response_time gauge
sag_apigw_api_avg_response_time{code="4xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 1 1646997393690
In this example, 2328 milliseconds is the average response time for incoming API requests that returned a 5xx response code.
# HELP sag_apigw_api_avg_response_time Average request duration
# TYPE sag_apigw_api_avg_response_time gauge
sag_apigw_api_avg_response_time{code="5xx",api_version="1.0",env="default",api_name="TestMetricsRest"} 2328 1646997393690
Sample API call metrics are as follows:
In this example, 32 API calls are counted with the HTTP response code 200.
# HELP sag_apigw_apicalls_total Total number of API invocations per response code
# TYPE sag_apigw_apicalls_total counter
sag_apigw_apicalls_total{code="200",env="default"} 32 1635169035001
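To work with samples like these in a script, each line can be split into metric name, labels, value, and timestamp. The following is a minimal sketch using only the Python standard library; the Sample type and parse_line function are hypothetical names, and the regular expressions cover only the simple shape shown in the samples above, not the full Prometheus exposition format (for instance, a label value containing a closing brace is not handled).

```python
import re
from typing import Dict, NamedTuple, Optional


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float
    timestamp_ms: Optional[int]


# metric_name{label="value",...} value [timestamp]
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'\s*(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_line(line: str) -> Optional[Sample]:
    """Parse one sample line; return None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # HELP/TYPE comment or empty line
    match = LINE_RE.match(line)
    if match is None:
        return None  # not a shape this sketch understands
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    ts = match.group("ts")
    return Sample(
        name=match.group("name"),
        labels=labels,
        value=float(match.group("value")),
        timestamp_ms=int(ts) if ts else None,
    )


sample = parse_line(
    'sag_apigw_api_tx_error_count{code="4xx",api_version="1.0",env="default",'
    'error="policy",api_name="TestMetricsRest"} 1 1646997393690'
)
print(sample.name, sample.labels["error"], sample.value)
```

For production use, an established Prometheus client or parser library is a safer choice than hand-written regular expressions.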