Monitoring GKE Autopilot Clusters

Use this guide when you want to collect GKE Autopilot metrics with the OpenTelemetry Collector. Because GKE Autopilot automatically manages nodes and restricts certain privileged workloads, the standard KloudMate Agent installation cannot be used.

This guide demonstrates how to use a dedicated OpenTelemetry Collector deployment to pull Autopilot telemetry directly from Google Cloud Monitoring into KloudMate.

Prerequisites

  1. A running GKE Autopilot cluster.
  2. A GCP service account with the Monitoring Viewer role (roles/monitoring.viewer).
  3. The following CLIs installed and configured:
    • kubectl
    • gcloud
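
If you do not yet have a suitable GCP service account, the sketch below shows one way to create it and grant the Monitoring Viewer role. The function name create_monitoring_gsa and the example arguments are hypothetical; the gcloud subcommands are standard, but adapt the project and account names to your environment.

```shell
# Hypothetical helper: create a GCP service account and grant it Monitoring Viewer.
# Usage: create_monitoring_gsa <project-id> <service-account-name>
create_monitoring_gsa() {
  # Create the service account in the target project.
  gcloud iam service-accounts create "$2" --project="$1" || return 1

  # Grant read-only access to Cloud Monitoring metrics at the project level.
  gcloud projects add-iam-policy-binding "$1" \
    --member="serviceAccount:${2}@${1}.iam.gserviceaccount.com" \
    --role="roles/monitoring.viewer"
}
```

For example, create_monitoring_gsa my-project otel-reader would produce otel-reader@my-project.iam.gserviceaccount.com, which is the address to use in the steps that follow.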

Step 1: Create a Kubernetes Service Account

Create a file named service-account.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: default

Apply it to your cluster:

kubectl apply -f service-account.yaml

Step 2: Annotate the Service Account with a GCP Service Account

Annotate the Kubernetes service account to link it with your GCP service account. Replace placeholders with your values:

kubectl annotate serviceaccount otel-collector \
  --namespace default \
  iam.gke.io/gcp-service-account=<gcp-service-account>@<project>.iam.gserviceaccount.com \
  --overwrite

Step 3: Bind the Kubernetes Service Account to the GCP Service Account

Grant the Workload Identity User role on the GCP service account so that the Kubernetes service account can act as it:

gcloud iam service-accounts add-iam-policy-binding \
  <gcp-service-account>@<project>.iam.gserviceaccount.com \
  --member="serviceAccount:<project>.svc.id.goog[default/otel-collector]" \
  --role="roles/iam.workloadIdentityUser" \
  --project=<project>
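
Before deploying the collector, it can help to confirm that both halves of the Workload Identity link are in place. The following is a minimal sketch; the function name show_workload_identity_link is hypothetical, and it assumes the otel-collector service account in the default namespace used above.

```shell
# Hypothetical helper: show both halves of the Workload Identity link.
# Usage: show_workload_identity_link <project-id> <gcp-service-account-email>
show_workload_identity_link() {
  # Print the GCP service account recorded in the Kubernetes annotation.
  kubectl get serviceaccount otel-collector --namespace default \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
  echo

  # Print the IAM policy on the GCP service account. Expect to see
  # roles/iam.workloadIdentityUser bound to
  # serviceAccount:<project>.svc.id.goog[default/otel-collector].
  gcloud iam service-accounts get-iam-policy "$2" --project="$1"
}
```

If the annotation is empty or the policy lacks the workloadIdentityUser binding, the collector is likely to fail with permission errors when it queries Cloud Monitoring.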

Step 4: Deploy the OpenTelemetry Collector

Create a file named deployment.yaml and replace the placeholders <PROJECT_ID>, <API_KEY>, and <CLUSTER_NAME> with your values.

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector
  namespace: default
  labels:
    app: opentelemetry
    component: otel-collector
data:
  config.yaml: |
    receivers:
      googlecloudmonitoring:
        collection_interval: 60s
        project_id: <PROJECT_ID>
        metrics_list:
          - metric_descriptor_filter: 'metric.type = starts_with("kubernetes.io/")'
          - metric_descriptor_filter: 'metric.type = starts_with("container.googleapis.com")'
          - metric_descriptor_filter: 'metric.type = starts_with("gke.io")'
          - metric_descriptor_filter: 'metric.type = starts_with("prometheus.googleapis.com")'
    processors:
      resourcedetection:
        detectors: [env, system]
        timeout: 5s
        override: false
      attributes/metrics:
        actions:
          - key: cluster
            value: "<CLUSTER_NAME>"
            action: insert
      batch:
        send_batch_size: 10000
        timeout: 60s
      memory_limiter:
        check_interval: 1s
        limit_mib: 400
        spike_limit_mib: 100
    exporters:
      debug:
        verbosity: detailed
      otlphttp:
        endpoint: "https://otel.kloudmate.com:4318"
        headers:
          Authorization: <API_KEY>
    service:
      pipelines:
        metrics:
          receivers: [googlecloudmonitoring]
          processors: [memory_limiter, resourcedetection, attributes/metrics, batch]
          exporters: [otlphttp, debug]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: default
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel-collector-config/config.yaml"]
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config-vol
              mountPath: /etc/otel-collector-config
            - name: varlogpods
              mountPath: /var/log/pods
              readOnly: true
            - name: varlogcontainers
              mountPath: /var/log/containers
              readOnly: true
      volumes:
        - name: config-vol
          configMap:
            name: otel-collector
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
      securityContext:
        runAsUser: 0

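As an alternative to editing the placeholders by hand, they can be substituted from environment variables at apply time. This is a sketch only: the function name render_manifest and the variable name KM_API_KEY are hypothetical, and the approach assumes the values contain none of the characters sed treats specially in a replacement.

```shell
# Hypothetical helper: fill in the manifest placeholders from environment variables.
# Reads the template on stdin and writes the rendered manifest to stdout.
# Assumes the values contain no "|", "&", or "\" characters.
render_manifest() {
  sed -e "s|<PROJECT_ID>|${PROJECT_ID}|g" \
      -e "s|<API_KEY>|${KM_API_KEY}|g" \
      -e "s|<CLUSTER_NAME>|${CLUSTER_NAME}|g"
}
```

With PROJECT_ID, KM_API_KEY, and CLUSTER_NAME exported, running render_manifest < deployment.yaml | kubectl apply -f - applies the rendered manifest without writing the API key to disk.
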
Apply the deployment to your cluster with this CLI command:

kubectl apply -f deployment.yaml

Log in to KloudMate and verify that cluster metrics begin appearing within a few minutes. Then use Explore or dashboards to validate CPU, memory, workload, and cluster-level telemetry.
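
If metrics do not show up, checking the collector from the cluster side is a reasonable first step. The sketch below assumes the otel-collector Deployment in the default namespace from this guide; the function name check_collector and the timeout and tail values are arbitrary choices.

```shell
# Hypothetical helper: confirm the collector is running and show recent output.
check_collector() {
  # Wait for the Deployment to finish rolling out (fails after the timeout).
  kubectl rollout status deployment/otel-collector --namespace default --timeout=120s || return 1

  # The debug exporter writes exported metrics to the container log, so
  # recent log lines show whether data is flowing through the pipeline.
  kubectl logs deployment/otel-collector --namespace default --tail=50
}
```

Permission errors in the log usually point back to the service account role or the Workload Identity binding, while an empty log after several collection intervals suggests the metric filters matched nothing in the project.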