Kubernetes Monitoring

Kubernetes Monitoring in KloudMate helps you monitor the health, performance, and resource usage of your Kubernetes clusters. It provides end-to-end visibility across clusters, nodes, namespaces, pods, and workloads by collecting metrics, logs, events, and optional APM data using the KloudMate Kubernetes agent, powered by OpenTelemetry.

This monitoring enables you to:
- Detect performance issues at an early stage
- Track CPU and memory utilization across clusters
- Identify unhealthy or unstable workloads
- Improve capacity planning and resource allocation
- Maintain overall cluster stability and reliability

This setup also works for managed Kubernetes services such as:
- AKS (Azure Kubernetes Service)
- EKS (Elastic Kubernetes Service)
- GKE (Google Kubernetes Engine)

Unified Monitoring for Apps & Databases in Kubernetes

When applications and databases run inside Kubernetes, KloudMate uses the same OpenTelemetry instrumentation to collect application and database telemetry. This allows you to:
- Correlate infrastructure, application, and database metrics
- Perform end-to-end troubleshooting
- Avoid deploying separate monitoring agents

For detailed setup, refer to docid mxod5s t3uibgvmx9jwb and docid 2kd7b2bur8x2zebtcuola.

What Kubernetes Monitoring Covers

KloudMate provides visibility into all major Kubernetes components, allowing you to monitor your environment at different levels of abstraction:
- Clusters: overall health, capacity, and scale of each connected cluster
- Nodes: infrastructure-level resource usage and node health
- Namespaces: resource distribution across teams, applications, or environments
- Pods: pod-level performance, restarts, and resource consumption
- Workloads: health and availability of Deployments, DaemonSets, and StatefulSets

This layered visibility helps you quickly move from a high-level overview to deep-dive troubleshooting.

Post-Installation Monitoring Capabilities

Once the Kubernetes agent is successfully installed, data from your cluster automatically starts flowing into
KloudMate. You will be able to:
- View all your Kubernetes resources
- Track real-time and historical CPU and memory usage
- Monitor workload health and replica availability
- Analyze pod restarts and failures
- Investigate incidents using logs and metrics
- Access prebuilt Kubernetes dashboards
- Configure alerts on critical metrics

Installing the Kubernetes Agent

Install the KloudMate Kubernetes agent to collect metrics, logs, events, and optional APM data securely. It requires no code changes and supports self-managed or managed clusters (AKS, EKS, GKE).

Follow the step-by-step installation guide here: docid\ llwe7n morlxytvuqc5kg

Once the agent is installed, your cluster data will automatically appear in Modules → Kubernetes.

Kubernetes Module Overview

The Kubernetes module in KloudMate provides a centralized interface to explore Kubernetes resources and monitor their resource usage across your environment. Once a cluster is connected, the module organizes Kubernetes data into dedicated sections (Clusters, Nodes, Namespaces, Pods, and Workloads), allowing you to easily move between different levels of the Kubernetes hierarchy.

Each section displays key resource metrics such as CPU and memory usage, helping teams understand how resources are allocated and consumed across the cluster. This organized layout enables:
- Faster troubleshooting
- Better capacity planning
- Improved infrastructure visibility

Prerequisites

Before using the Kubernetes module, ensure that the Kubernetes agent is installed in your cluster.

Explore Kubernetes Data

The main Kubernetes page is organized into the following component sections.

Clusters
Provides a high-level overview of each connected Kubernetes cluster. This view displays cluster name, node count, and total allocatable CPU and memory, helping you quickly assess overall cluster capacity and scale.

Nodes
Shows detailed information for each node within a cluster, including allocatable and actual CPU and memory usage. This helps identify unhealthy or overloaded
nodes and track infrastructure-level resource consumption.

Namespaces
Displays resource usage grouped by namespace, enabling visibility into how CPU and memory are distributed across teams, services, or environments. This is especially useful for understanding multi-tenant usage patterns.

Pods
Offers pod-level insights such as CPU and memory requests, limits, restarts, and real-time usage. This view helps validate pod configurations and detect issues like frequent restarts or resource contention.

Workloads
Focuses on higher-level Kubernetes objects such as Deployments, DaemonSets, and StatefulSets. This view helps monitor workload health, availability, and resource behavior across replicas.

Each component section displays a sortable table of key metrics, such as usage, requests, and limits, enabling quick identification of under- or over-utilized resources. Tables can be sorted in ascending or descending order to prioritize the areas that need attention.

Kubernetes resources can also be grouped or filtered using their respective dimensions (e.g., filter pods by namespace, group nodes by cluster) to streamline troubleshooting and investigation.

Resource Details Page

Clicking on any resource (such as a node or pod) opens its details page. This page includes:
- In-depth time-series metrics
- Historical logs
- Additional contextual information

These insights help with advanced troubleshooting and root cause analysis.

Post-Integration Data Validation

Verify data flow in the Modules → Kubernetes → Clusters view. Metrics appear within 2-5 minutes after the agent restarts, which gives immediate visual confirmation.

Standard Kubernetes Dashboards

KloudMate provides prebuilt Kubernetes dashboards through https://templates.kloudmate.com/. These dashboards visualize:
- Cluster health
- Node performance
- Pod resource usage
- Workload availability

To import and start using these templates, follow the steps described in https://docs.kloudmate.com/creating-a-dashboard#sfk3g.

These dashboards auto-populate with your cluster data once the agent
is running.

Kubernetes Default Metrics

When Kubernetes monitoring is enabled, KloudMate automatically collects a set of commonly used Kubernetes metrics. No additional configuration is required.

Cluster & Workload Metrics
- k8s.container.cpu_limit
- k8s.container.cpu_request
- k8s.container.memory_limit
- k8s.container.memory_request
- k8s.container.ready
- k8s.container.restarts
- k8s.deployment.available
- k8s.deployment.desired

Node & Container Metrics
- container.cpu.time
- container.cpu.usage
- container.memory.usage
- container.memory.working_set
- k8s.node.cpu.usage
- k8s.node.memory.usage
- k8s.node.network.io

These metrics are sourced from the upstream OpenTelemetry receivers:
- https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/documentation.md
- https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kubeletstatsreceiver/documentation.md
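
Example: Interpreting the Default Metrics

To illustrate how the default metrics listed above relate to workload health, here is a minimal Python sketch. It does not call any KloudMate API. The class names, function names, thresholds, and sample values are all hypothetical and exist only for illustration. It assumes CPU usage and CPU limit are both expressed in cores, and that a limit of 0 means no limit is set.

```python
from dataclasses import dataclass


@dataclass
class DeploymentSample:
    """One reading of the deployment metrics (hypothetical container type)."""
    name: str
    desired: int    # value of k8s.deployment.desired
    available: int  # value of k8s.deployment.available


@dataclass
class ContainerSample:
    """One reading of container metrics (hypothetical container type)."""
    name: str
    restarts: int     # value of k8s.container.restarts
    cpu_usage: float  # value of container.cpu.usage, assumed in cores
    cpu_limit: float  # value of k8s.container.cpu_limit, assumed in cores; 0 = no limit


def under_replicated(deployments):
    """Return names of deployments with fewer available replicas than desired."""
    return [d.name for d in deployments if d.available < d.desired]


def cpu_utilization(container):
    """Return usage as a fraction of the limit, or None when no limit is set."""
    if container.cpu_limit <= 0:
        return None
    return container.cpu_usage / container.cpu_limit


def noisy_containers(containers, max_restarts=3, cpu_threshold=0.9):
    """Return names of containers that restart often or run close to their CPU limit.

    Both thresholds are illustrative defaults, not recommended values.
    """
    flagged = []
    for c in containers:
        util = cpu_utilization(c)
        too_many_restarts = c.restarts > max_restarts
        near_cpu_limit = util is not None and util >= cpu_threshold
        if too_many_restarts or near_cpu_limit:
            flagged.append(c.name)
    return flagged


if __name__ == "__main__":
    # Made-up sample readings, not real cluster data.
    deployments = [DeploymentSample("web", 3, 3), DeploymentSample("api", 4, 2)]
    containers = [
        ContainerSample("web-0", restarts=0, cpu_usage=0.2, cpu_limit=1.0),
        ContainerSample("api-0", restarts=5, cpu_usage=0.1, cpu_limit=1.0),
    ]
    print("Under-replicated:", under_replicated(deployments))
    print("Needs attention:", noisy_containers(containers))
```

The same comparisons (available versus desired replicas, restarts over a threshold, usage relative to limit) are the kind of conditions you might express as alerts on critical metrics. The thresholds to use depend on your workloads.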