Monitoring GKE Autopilot Clusters with OpenTelemetry
This guide outlines how to configure monitoring for your Google Kubernetes Engine (GKE) Autopilot clusters using the OpenTelemetry Collector. The integration allows you to collect, process, and export GKE metrics to KloudMate for observability, alerting, and troubleshooting. By following these steps, you will enable advanced monitoring and actionable insights into your GKE workloads with minimal operational overhead.

Prerequisites

- A running GKE Autopilot cluster
- A GCP service account with the "Monitoring Viewer" role
- The following CLIs installed and configured:
  - kubectl
  - gcloud

Step 1: Create a Kubernetes Service Account

Create a YAML file called service-account.yaml:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: default
```

Apply it to your cluster:

```shell
kubectl apply -f service-account.yaml
```

Step 2: Annotate the Service Account with a GCP Service Account

Annotate the Kubernetes service account to link it with your GCP service account. Replace the placeholders with your values:

```shell
kubectl annotate serviceaccount otel-collector \
  --namespace default \
  iam.gke.io/gcp-service-account=<SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com \
  --overwrite
```

Step 3: Create the Policy Binding

Run this command as an admin user:

```shell
gcloud iam service-accounts add-iam-policy-binding \
  <SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com \
  --member="serviceAccount:<PROJECT_ID>.svc.id.goog[default/otel-collector]" \
  --role="roles/iam.workloadIdentityUser" \
  --project=<PROJECT_ID>
```

Step 4: Deploy the OpenTelemetry Collector

Create a file named deployment.yaml as follows. Replace placeholders such as <PROJECT_ID>, <API_KEY>, and <CLUSTER_NAME> accordingly:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector
  namespace: default
  labels:
    app: opentelemetry
    component: otel-collector
data:
  config.yaml: |
    receivers:
      googlecloudmonitoring:
        collection_interval: 60s
        project_id: <PROJECT_ID>
        metrics_list:
          - metric_descriptor_filter: 'metric.type = starts_with("kubernetes.io/")'
          - metric_descriptor_filter: 'metric.type = starts_with("container.googleapis.com")'
          - metric_descriptor_filter: 'metric.type = starts_with("gke.io")'
          - metric_descriptor_filter: 'metric.type = starts_with("prometheus.googleapis.com")'
    processors:
      resourcedetection:
        detectors: [env, system]
        timeout: 5s
        override: false
      attributes/metrics:
        actions:
          - key: cluster
            value: 'gcp-metrics' # Replace with your cluster name (<CLUSTER_NAME>)
            action: insert
      batch:
        send_batch_size: 10000
        timeout: 60s
      memory_limiter:
        check_interval: 1s
        limit_mib: 400
        spike_limit_mib: 100
    exporters:
      debug:
        verbosity: detailed
      otlphttp:
        endpoint: "https://otel.kloudmate.com:4318"
        headers:
          Authorization: <API_KEY>
    service:
      pipelines:
        metrics:
          receivers: [googlecloudmonitoring]
          processors: [memory_limiter, batch, resourcedetection, attributes/metrics]
          exporters: [otlphttp, debug]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: default
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel-collector-config/config.yaml"]
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config-vol
              mountPath: /etc/otel-collector-config
            - name: varlogpods
              mountPath: /var/log/pods
              readOnly: true
            - name: varlogcontainers
              mountPath: /var/log/containers
              readOnly: true
      volumes:
        - name: config-vol
          configMap:
            name: otel-collector
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
      securityContext:
        runAsUser: 0
```

Apply the deployment to your cluster with this command:

```shell
kubectl apply -f deployment.yaml
```

Step 5: Verify Setup

Log in to the KloudMate dashboard and navigate to your GKE project. You should begin seeing metrics within a few minutes under the relevant cluster. You can now create dashboards, set alerts, and troubleshoot based on the telemetry data.
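
If metrics do not show up in the dashboard, it may help to check the collector from the cluster side first. The following is a minimal sketch, assuming the names used in this guide (namespace default, service account otel-collector, pod label app=otel-collector). The function name verify_collector is hypothetical and is not part of KloudMate or the OpenTelemetry Collector.

```shell
# Hypothetical helper: cluster-side checks for the collector deployed in Steps 1-4.
verify_collector() {
  ns="default"

  # The pod created by the Deployment should report a Running status.
  kubectl get pods --namespace "$ns" -l app=otel-collector

  # Print the GCP service account linked in Step 2.
  # Empty output suggests the annotation is missing.
  kubectl get serviceaccount otel-collector --namespace "$ns" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
  echo

  # The debug exporter writes received metrics to the collector's logs,
  # so recent log lines show whether data is arriving from Cloud Monitoring.
  kubectl logs deployment/otel-collector --namespace "$ns" --tail=50
}
```

After sourcing the snippet, run `verify_collector`. Permission errors in the logs usually point back to the role, annotation, or policy binding from the prerequisites and Steps 2 and 3.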