Sending Server Metrics to KloudMate with the Host Metrics Receiver
This guide explains how to collect Linux and Windows server metrics with the OpenTelemetry Host Metrics Receiver and send them to KloudMate.
The Host Metrics Receiver scrapes system-level metrics (CPU, disk, load, filesystem, memory, network, paging, and processes) from the host operating system. It can also capture per-process metrics for applications running on Amazon EC2 instances, Azure VMs, or on-premises servers. It is intended for deployments where the OpenTelemetry Collector runs as an agent on the host and forwards metrics to KloudMate.
Prerequisites:
- Install the OpenTelemetry Collector on each server you want to monitor. Refer to the Installing the OpenTelemetry Collector guide for detailed instructions.
Step 1: Set Up the Host Metrics Receiver in the OpenTelemetry Configuration File
- Linux Users: Open the file located at /etc/otelcol-contrib/config.yaml using your preferred text editor.
- Windows Users: Create a new file called config.yaml in the C:\Program Files\OpenTelemetry Collector folder. You can use Notepad or any text editor to do this.
1. In this configuration file, configure the hostmetrics receiver with the scrapers for the metrics you want to collect.
On Linux, include the processes scraper (processes: {}). Omit it on Windows, where it does not apply.
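A minimal sketch of the receivers section, assuming you want every scraper that produces the metrics listed in the table at the end of this guide. The 60s collection_interval is an illustrative value, not a KloudMate requirement; adjust it to your needs.

```yaml
receivers:
  hostmetrics:
    # Scrape interval; 60s is an illustrative choice, not a required value.
    collection_interval: 60s
    scrapers:
      cpu: {}
      disk: {}
      load: {}
      filesystem: {}
      memory: {}
      network: {}
      paging: {}
      # Linux only: remove this scraper on Windows.
      processes: {}
      # Per-process metrics such as process.cpu.time and process.memory.usage.
      process: {}
```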
2. In the processors section, configure the resourcedetection processor. It detects resource information from the host (such as the hostname or cloud instance metadata) and appends it to, or uses it to override, the resource attributes on the telemetry data.
Choose one of the following options based on where the server runs (AWS EC2, Azure VM, or an on-premises server):
- Server (on-premises, non-cloud, or cloud):
- AWS EC2:
Optional: To add EC2 instance tags to logs and metrics, associate an IAM role with the EC2 instance that grants the ec2:DescribeTags permission, and set the tags option of the processor's ec2 detector to the tag keys you want to collect.
- Azure Virtual Machines:
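The three options differ only in which detectors the resourcedetection processor runs. The sketch below defines one named instance per option; keep only the instance that matches your host. The names after the slash (system, ec2, azure) are arbitrary labels chosen for this example, and the tag regex ^.*$ (copy every tag) is an assumption you will likely want to narrow.

```yaml
processors:
  # Option 1: generic server (on-premises, non-cloud, or cloud).
  resourcedetection/system:
    detectors: [env, system]
    system:
      # Use the operating system's hostname for host.name.
      hostname_sources: [os]

  # Option 2: AWS EC2.
  resourcedetection/ec2:
    detectors: [env, ec2]
    ec2:
      # Optional: regexes matching the tag keys to copy onto resources.
      # Requires an IAM role that grants ec2:DescribeTags.
      tags:
        - ^.*$

  # Option 3: Azure virtual machine.
  resourcedetection/azure:
    detectors: [env, azure]
```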
3. In the exporters section, configure KloudMate as the backend, then define a metrics pipeline in the service section that connects the receiver, processor, and exporter.
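A sketch of the exporters and service sections, assuming the otlphttp exporter is used to reach KloudMate. The values <KLOUDMATE_OTLP_ENDPOINT> and <YOUR_KLOUDMATE_API_KEY> are placeholders to replace with the endpoint and API key from your KloudMate account, and sending the key in an Authorization header is an assumption to confirm against KloudMate's instructions. The processors entry uses resourcedetection/system as a hypothetical name; it must match the resourcedetection processor you actually defined.

```yaml
exporters:
  otlphttp:
    # Placeholder: replace with your KloudMate OTLP/HTTP endpoint.
    endpoint: "<KLOUDMATE_OTLP_ENDPOINT>"
    headers:
      # Placeholder: replace with your KloudMate API key.
      Authorization: "<YOUR_KLOUDMATE_API_KEY>"

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      # Hypothetical name: use the resourcedetection instance for your host type.
      processors: [resourcedetection/system]
      exporters: [otlphttp]
```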
Step 2: Restart the OpenTelemetry Collector and verify its status
For Linux:
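- Run the following commands, assuming the collector runs as the otelcol-contrib systemd service (the service that reads /etc/otelcol-contrib/config.yaml):
  sudo systemctl restart otelcol-contrib
  sudo systemctl status otelcol-contrib
These commands restart the collector and display its current status.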
For Windows:
- Open the Services window:
- Press Win + R, type services.msc, and click OK.
- Alternatively, search for "Services" in the Windows Start menu.
- In the Services window, locate the "OpenTelemetry Collector" service.
- Right-click the service and select "Restart."
Once the collector is running, monitor the metrics on the KloudMate dashboard and set up alarms to be notified when key metrics for a specific application rise above expected levels. The table below lists the metrics collected by the Host Metrics Receiver.
Metric | Description | Unit |
---|---|---|
CPU | ||
system.cpu.time | Total seconds each logical CPU spent on each mode. | Seconds |
Disk | ||
system.disk.io | Disk bytes transferred. | Bytes |
system.disk.io_time | Time disk spent activated. | Seconds |
system.disk.merged | The number of disk reads/writes merged into single physical disk access operations. | Count |
system.disk.operation_time | Time spent in disk operations. | Seconds |
system.disk.operations | Disk operations count. | Count |
system.disk.pending_operations | The queue size of pending I/O operations. | Count |
system.disk.weighted_io_time | Time disk spent activated multiplied by the queue length. | Seconds |
Load | ||
system.cpu.load_average.15m | Average CPU load over 15 minutes. | {thread} |
system.cpu.load_average.5m | Average CPU load over 5 minutes. | {thread} |
system.cpu.load_average.1m | Average CPU load over 1 minute. | {thread} |
Filesystem | ||
system.filesystem.inodes.usage | Filesystem inodes used. | Count |
system.filesystem.usage | Filesystem bytes used. | Bytes |
Memory | ||
system.memory.usage | Bytes of memory in use. | Bytes |
Network | ||
system.network.connections | The number of connections. | Count |
system.network.dropped | The number of packets dropped. | Count |
system.network.errors | The number of errors encountered. | Count |
system.network.io | The number of bytes transmitted and received. | Bytes |
system.network.packets | The number of packets transferred. | Count |
Paging | ||
system.paging.faults | The number of page faults. | Count |
system.paging.operations | The number of paging operations. | Count |
system.paging.usage | Swap (Unix) or pagefile (Windows) usage. | Bytes |
Processes | ||
system.processes.count | Total number of processes in each state. | Count |
system.processes.created | Total number of created processes. | Count |
Process | ||
process.cpu.time | Total CPU seconds broken down by different states. | Seconds |
process.disk.io | Disk bytes transferred. | Bytes |
process.memory.usage | The amount of physical memory in use. | Bytes |
process.memory.virtual | Virtual memory size. | Bytes |