Server Metrics to KloudMate with Host Metrics Receiver

This guide shows how to collect Linux and Windows server metrics with the OpenTelemetry Host Metrics Receiver.

The Host Metrics Receiver generates metrics about the host system from various sources. It can also capture per-process metrics for applications running on Amazon EC2 instances, Azure VMs, or on-premises servers. It is intended for deployments where the OpenTelemetry Collector runs as an agent on the host and sends metrics to KloudMate.

Prerequisites:

  1. Install the OpenTelemetry Collector on each server you want to monitor. Refer to the Installing the OpenTelemetry Collector guide for detailed instructions.

Step 1: Set Up the Host Metrics Receiver in the OpenTelemetry Configuration File

  • Linux Users: Open the file located at /etc/otelcol-contrib/config.yaml using your preferred text editor.
  • Windows Users: Create a new file called config.yaml in the C:\Program Files\OpenTelemetry Collector folder. You can use Notepad or any text editor to do this.

1. In the configuration file, configure the Host Metrics Receiver to collect the metrics you need.

On Linux, include the processes: {} scraper in the configuration. Omit it on Windows, where it does not apply.
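
The receiver section described above can be sketched as follows. This is a minimal example, not the exact configuration from the original guide: the 60-second collection interval is an assumption, and the scraper list simply mirrors the metric groups listed in the metrics table at the end of this guide.

```yaml
receivers:
  hostmetrics:
    # Assumed interval; adjust to your monitoring needs.
    collection_interval: 60s
    scrapers:
      cpu: {}
      disk: {}
      load: {}
      filesystem: {}
      memory: {}
      network: {}
      paging: {}
      # Linux only; remove this line on Windows.
      processes: {}
      # Per-process metrics (process.cpu.time, process.memory.usage, ...).
      process: {}
```

Remove any scraper you do not need; each one you drop also drops its group of metrics.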

2. In the processors section, configure resource detection so that information detected from the host is appended to the resource attributes of the telemetry data, or overrides the existing values.

Choose one of the following options based on where the server runs (AWS EC2, Azure VM, or any other on-premises or cloud server):

  • Server (on-premises or cloud)
  • AWS EC2

    Optional: To attach AWS EC2 instance tags to logs and metrics, associate an IAM role with the EC2 instance that grants the ec2:DescribeTags permission, and add the EC2 resource detection processor to the configuration.
  • Azure Virtual Machines
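
The three options above might look like the sketch below, which uses the Collector's resourcedetection processor. Keep only the entry that matches your provider. The instance names after the slash (ec2, azure) and the Name tag pattern are illustrative assumptions, not values taken from the original guide.

```yaml
processors:
  # Option 1: any server (on-premises or cloud).
  resourcedetection:
    detectors: [system]
    system:
      hostname_sources: [os]

  # Option 2: AWS EC2.
  resourcedetection/ec2:
    detectors: [ec2]
    ec2:
      # Optional: regular expressions matching the tag keys to attach.
      # Requires an IAM role that grants ec2:DescribeTags.
      tags:
        - ^Name$   # hypothetical tag key

  # Option 3: Azure Virtual Machines.
  resourcedetection/azure:
    detectors: [azure]
```

Whichever entry you keep, note its full name (for example resourcedetection/ec2), because the pipeline must reference the processor by that name.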

3. In the exporters section of the OpenTelemetry configuration file, set KloudMate as the backend, then configure the pipeline.
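
A minimal sketch of the exporter and pipeline follows. It assumes KloudMate accepts OTLP over HTTP and authenticates with an Authorization header; the endpoint and API key are placeholders, so take the real values from your KloudMate workspace. It also assumes the receiver is named hostmetrics and the processor resourcedetection, as configured in the earlier steps.

```yaml
exporters:
  otlphttp:
    # Placeholder: replace with the OTLP/HTTP endpoint of your KloudMate workspace.
    endpoint: "https://<KLOUDMATE_OTLP_ENDPOINT>"
    headers:
      # Placeholder: replace with your KloudMate API key (header name is an assumption).
      Authorization: "<KLOUDMATE_API_KEY>"

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      # Use the processor name you defined, e.g. resourcedetection/ec2.
      processors: [resourcedetection]
      exporters: [otlphttp]
```

Every component named in the pipeline must also be defined in its own top-level section; the Collector refuses to start if the pipeline references an undefined receiver, processor, or exporter.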

Step 2: Restart the OpenTelemetry Collector and Verify Its Status

For Linux:

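  1. Restart the collector service, then check its status. On a systemd host where the collector was installed from the otelcol-contrib package (as the configuration path above suggests), the commands are typically sudo systemctl restart otelcol-contrib followed by sudo systemctl status otelcol-contrib.

These commands restart the OpenTelemetry Collector and display its current status.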

For Windows:

  1. Open the Services window:
    • Press Win + R, type services.msc, and click OK.
    • Alternatively, search for "Services" in the Windows Start menu.
  2. In the Services window, locate the "OpenTelemetry Collector" service.
  3. Right-click the service and select "Restart."

Once the collector is running, monitor the metrics on the KloudMate dashboard and set up alarms to be notified when a metric for a specific application rises above a threshold you define.

Default Hostmetrics

| Scraper | Metric | Description | Unit |
|---|---|---|---|
| cpu | system.cpu.time | Total seconds each logical CPU spent on each mode. | Seconds |
| disk | system.disk.io | Disk bytes transferred. | Bytes |
| disk | system.disk.io_time | Time disk spent activated. | Seconds |
| disk | system.disk.merged | The number of disk reads/writes merged into single physical disk access operations. | Count |
| disk | system.disk.operation_time | Time spent in disk operations. | Seconds |
| disk | system.disk.operations | Disk operations count. | Count |
| disk | system.disk.pending_operations | The queue size of pending I/O operations. | Count |
| disk | system.disk.weighted_io_time | Time disk spent activated multiplied by the queue length. | Seconds |
| load | system.cpu.load_average.15m | Average CPU load over 15 minutes. | {thread} |
| load | system.cpu.load_average.5m | Average CPU load over 5 minutes. | {thread} |
| load | system.cpu.load_average.1m | Average CPU load over 1 minute. | {thread} |
| filesystem | system.filesystem.inodes.usage | Filesystem inodes used. | Count |
| filesystem | system.filesystem.usage | Filesystem bytes used. | Bytes |
| memory | system.memory.usage | Bytes of memory in use. | Bytes |
| network | system.network.connections | The number of connections. | Count |
| network | system.network.dropped | The number of packets dropped. | Count |
| network | system.network.errors | The number of errors encountered. | Count |
| network | system.network.io | The number of bytes transmitted and received. | Bytes |
| network | system.network.packets | The number of packets transferred. | Count |
| paging | system.paging.faults | The number of page faults. | Count |
| paging | system.paging.operations | The number of paging operations. | Count |
| paging | system.paging.usage | Swap (Unix) or pagefile (Windows) usage. | Bytes |
| processes | system.processes.count | Total number of processes in each state. | Count |
| processes | system.processes.created | Total number of created processes. | Count |
| process | process.cpu.time | Total CPU seconds broken down by different states. | Seconds |
| process | process.disk.io | Disk bytes transferred. | Bytes |
| process | process.memory.usage | The amount of physical memory in use. | Bytes |
| process | process.memory.virtual | Virtual memory size. | Bytes |
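
The metrics above are the ones emitted by default. As a hedged sketch, the Host Metrics Receiver also lets you switch individual metrics on or off per scraper. The choice of metrics below is only an illustration: system.cpu.utilization is an optional metric that is off by default, and system.disk.merged is one of the defaults listed above.

```yaml
receivers:
  hostmetrics:
    scrapers:
      cpu:
        metrics:
          # Optional metric, disabled by default; enabled here as an example.
          system.cpu.utilization:
            enabled: true
      disk:
        metrics:
          # Default metric from the table above; disabled here as an example.
          system.disk.merged:
            enabled: false
```

Disabling metrics you do not chart or alert on reduces the volume of data sent to the backend.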