API Monitoring
API Monitoring shows Rate, Errors, and Duration (RED) metrics for every API endpoint your services handle, across both HTTP and RPC (gRPC and Connect / connect_rpc). It is the server-side, per-endpoint view of your traffic.
Where the Service Map shows caller-to-callee dependencies and the Trace Explorer helps you find an individual request, API Monitoring sits in between: it answers “which of my endpoints are slow, failing, or busy?” and lets you drill from a single endpoint straight into its traces.
Overview
In the left navigation, open APM & Tracing → API Monitoring.
Use the time range picker in the top-right to set the observation window. The view defaults to the last 24 hours so the per-endpoint trend sparklines have enough history to draw.
Endpoints are derived from your services’ server spans. If the table is empty, confirm your services are instrumented and sending traces, or widen the time range — see Getting Accurate Data.
The Endpoints List
Each row is one endpoint, identified by its service, protocol, and (for HTTP) method. The columns are:
- Endpoint: a protocol badge (HTTP, GRPC, or CONNECT_RPC), a method chip for HTTP rows (GET, POST, …), and the endpoint label: a templated route such as GET /users/{id} for HTTP, or the full method name such as pkg.Svc/Method for RPC.
- Service: the service that handles the endpoint. It links to that service’s APM dashboard.
- Throughput (req/min): request rate over the selected window.
- Error rate: the percentage of requests counted as errors (see How Errors Are Counted). Shown in red when above zero.
- p50 / p95 / p99: latency percentiles, in milliseconds.
- Status: an error-focused summary of the HTTP status mix. It reads healthy when there are no 4xx or 5xx responses, and surfaces 4xx and 5xx counts when they occur. Hover it to see the full 2xx/3xx/4xx/5xx breakdown, plus any uncategorized requests.
- Throughput trend: a sparkline of request volume across the window.
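To make the numeric columns concrete, here is a minimal sketch of how one row could be derived from a list of requests. The record shape (`duration_ms`, `is_error`), the function names, and the nearest-rank percentile method are all assumptions made for illustration; the product may compute its percentiles differently.

```python
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (an assumed method)."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

def summarize_endpoint(requests, window_minutes):
    """Build the numeric columns of one row from hypothetical request records.

    Each record is a dict with 'duration_ms' (number) and 'is_error' (bool).
    """
    durations = sorted(r["duration_ms"] for r in requests)
    total = len(requests)
    errors = sum(1 for r in requests if r["is_error"])
    return {
        "throughput_per_min": total / window_minutes,
        "error_rate_pct": 100.0 * errors / total if total else 0.0,
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```

With few requests in the window, p95 and p99 collapse onto the slowest request, which is one reason to widen the time range before reading tail latency for low-traffic endpoints.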
HTTP vs. RPC rows
The protocol badge determines what a row can show:
- HTTP rows carry a method chip and a full 2xx/3xx/4xx/5xx status breakdown.
- RPC rows (gRPC, Connect) have no HTTP method or status classes, so the Status column shows a dash (—). Judge these endpoints by their error rate, which is derived from span status and works for every protocol.
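The Status column rules can be sketched as a small function. This is an illustration only: the function names, the lowercase protocol values, and the exact text produced for non-healthy rows are assumptions, not the product's rendering.

```python
from collections import Counter

def status_breakdown(status_codes):
    """Count responses per HTTP status class; anything else is 'uncategorized'."""
    counts = Counter()
    for code in status_codes:
        if isinstance(code, int) and 200 <= code <= 599:
            counts[f"{code // 100}xx"] += 1
        else:
            counts["uncategorized"] += 1
    return counts

def status_cell(protocol, status_codes):
    """Return the text of the Status cell for one row (hypothetical format)."""
    if protocol != "http":
        # RPC rows have no HTTP status classes, so the cell is just a dash.
        return "\u2014"
    counts = status_breakdown(status_codes)
    parts = [f"{cls}: {counts[cls]}" for cls in ("4xx", "5xx") if counts[cls]]
    return ", ".join(parts) if parts else "healthy"
```

The full `status_breakdown` result corresponds to what the hover shows, while `status_cell` keeps only the error-focused part.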
Filtering and Sorting
Use the filter bar above the table to narrow the list:
- Search: match endpoints by name (for example, a path fragment or RPC method).
- Service: one or more services.
- Protocol: http, grpc, or connect_rpc.
- Method: one or more HTTP methods (GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS).
- Status: restrict to endpoints that returned a given HTTP status class, 4xx or 5xx.
Click the Throughput, Error rate, or p95 column header to sort by that metric. Sorting is always highest-first and is applied across all matching endpoints, so the table always shows the true top results rather than re-ordering only the current page.
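The difference between sorting the whole result set and sorting only the visible page is easy to miss, so here is a minimal sketch of the filter, then sort, then paginate order described above. The row fields, the `top_endpoints` name, and the page size are hypothetical.

```python
def top_endpoints(rows, sort_key, page=1, page_size=2, protocol=None, search=None):
    """Filter, sort highest-first across ALL matches, then slice one page."""
    matching = [
        r for r in rows
        if (protocol is None or r["protocol"] == protocol)
        and (search is None or search.lower() in r["endpoint"].lower())
    ]
    # Sorting happens before pagination, so page 1 always holds the true top results.
    ordered = sorted(matching, key=lambda r: r[sort_key], reverse=True)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]

rows = [
    {"endpoint": "GET /users/{id}", "protocol": "http", "p95_ms": 120},
    {"endpoint": "POST /orders", "protocol": "http", "p95_ms": 900},
    {"endpoint": "pkg.Svc/Method", "protocol": "grpc", "p95_ms": 450},
    {"endpoint": "GET /health", "protocol": "http", "p95_ms": 5},
]
slowest = top_endpoints(rows, "p95_ms")
```

If the sort were applied after slicing instead, an endpoint sitting on a later page could be slower than everything on page 1 and never surface.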
Endpoint Details
Click any row to open the endpoint drill-down in a slide-over panel. The panel is shareable: its address encodes the selected endpoint, so you can send a teammate a direct link. Closing the panel returns you to the list with your filters, sort, and time range intact.
At the top, summary tiles show Requests, Error rate, and p95 latency for the endpoint over the selected window. Below them are time-series charts:
- Throughput (requests) over time.
- Error rate over time.
- Latency with p50, p95, and p99 overlaid.
- Status classes (2xx/3xx/4xx/5xx) over time, shown for HTTP endpoints only.
Recent and Failed Traces
The drill-down ends with a table of the endpoint’s traces, so you can move from a metric to the exact requests behind it:
- Each row shows the start time, span, and duration, with a View trace action that opens the full trace in Trace Detail.
- Toggle Failed only to keep just the errored requests. This uses the same protocol-neutral error definition as the metrics, so it covers HTTP 5xx and gRPC/Connect errors alike.
- Open in Traces hands the current endpoint and time range to the Trace Explorer for deeper filtering and analysis.
How Errors Are Counted
An endpoint error is any request that either returned an HTTP 5xx status or produced a span marked ERROR (an exception, or a gRPC/Connect error). This definition is protocol-neutral, which is why RPC endpoints get a meaningful error rate even though they have no HTTP status.
4xx responses are treated as client faults, not endpoint errors. They appear in the status breakdown but are deliberately excluded from the error rate, so a spike in client mistakes (bad input, unauthorized calls) does not make a healthy endpoint look broken.
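The rule above can be written as a short predicate. This is a sketch under assumptions: the span is modeled as a plain dict, and the `http_status` and `span_status` keys are hypothetical names rather than the stored attribute names.

```python
def is_endpoint_error(span):
    """True for an HTTP 5xx response or a span marked ERROR; 4xx alone is not an error."""
    status = span.get("http_status")  # absent for gRPC/Connect spans
    if status is not None and 500 <= status <= 599:
        return True
    return span.get("span_status") == "ERROR"

def error_rate_pct(spans):
    """Percentage of spans counted as endpoint errors."""
    if not spans:
        return 0.0
    return 100.0 * sum(1 for s in spans if is_endpoint_error(s)) / len(spans)

spans = [
    {"http_status": 200, "span_status": "OK"},
    {"http_status": 404, "span_status": "UNSET"},  # client fault, not counted
    {"http_status": 503, "span_status": "ERROR"},
    {"span_status": "ERROR"},                      # failed gRPC/Connect call
]
rate = error_rate_pct(spans)
```

Here two of the four requests count as errors: the 503 and the failed RPC. The 404 still appears in the status breakdown but leaves the error rate untouched.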
Getting Accurate Data
API Monitoring is built entirely from your tracing data. For complete and well-grouped metrics:
- Services must emit OpenTelemetry server spans (span.kind = server). Each span needs HTTP attributes (method and status code) for HTTP endpoints, or rpc.system for gRPC/Connect endpoints.
- For clean grouping, routes should be templated by the instrumentation: GET /users/{id}, not GET /users/12345. Untemplated routes with raw IDs would otherwise create a separate endpoint per value. KloudMate auto-collapses obvious numeric and UUID path segments as a safety net, but SDK route templates give the best result.
- gRPC and Connect endpoints are grouped by their full method name and work out of the box, with no templating needed.
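To illustrate what collapsing numeric and UUID path segments means, here is a rough sketch. It is not KloudMate's implementation: the patterns, the `{id}` placeholder, and the `collapse_path` name are assumptions chosen for the example.

```python
import re

# Hypothetical patterns for "obvious" identifiers in a path segment.
_NUMERIC = re.compile(r"\d+")
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def collapse_path(path, placeholder="{id}"):
    """Replace purely numeric or UUID-shaped segments with a placeholder."""
    segments = path.split("/")
    return "/".join(
        placeholder if _NUMERIC.fullmatch(s) or _UUID.fullmatch(s) else s
        for s in segments
    )
```

A fallback like this cannot recover parameter names or recognize non-numeric identifiers such as slugs, which is why route templates supplied by the SDK give cleaner grouping.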
If you are still setting up instrumentation, see the Auto Instrumentation and Manual Instrumentation guides.
Related Paths
- Service Map: service-to-service dependencies and the caller’s view of failures.
- Services: service-level request, latency, and error trends.
- Trace Explorer: search and filter individual traces.