OpenTelemetry Support
...
Service Integration Guides
Postgres Monitoring

PostgreSQL integration with KloudMate

12min
Document image




The PostgreSQL integration with KloudMate enables you to monitor the health and performance of your PostgreSQL databases in near real-time. With the integration of OpenTelemetry, you can collect vital metrics, track slow queries, and set up automated alerts to keep your team informed about potential performance issues. This document outlines how to set up the integration, monitor key metrics, and leverage additional features like slow query logging for comprehensive database monitoring.

Key Features

  • Real-Time Monitoring: Gain insights into your PostgreSQL database's health and performance with near real-time metrics.
  • OpenTelemetry Integration: Leverage OpenTelemetry to collect and visualize metrics, traces, and logs from PostgreSQL.
  • Slow Query Logging: Identify slow queries affecting your database performance.
  • Dashboards: Visualize PostgreSQL performance metrics and set thresholds for key indicators like query response time, CPU usage, disk I/O, and more.
  • Alerts: Create alerts to notify your team about performance issues such as high CPU usage, long-running queries, and storage bottlenecks.

Pre-requisites:

  1. PostgreSQL installed and running
  2. Install the OpenTelemetry collector on the specific server that requires monitoring. See Installing and Configuring OpenTelemetry Collector.
  3. PostgreSQL username and password for authentication.
  4. The postgreSQL logs must be enabled from configurations.

Step 1: Configure the Receivers to scrape metrics as well as Logs.

To start monitoring PostgreSQL with Otel Collector, you need to configure PostgreSQL receiver

  • Linux Users: Open the file located at /etc/otelcol-contrib/config.yaml using your preferred text editor.
  • Windows Users: Create a new file called config.yaml in the C:\Program Files\OpenTelemetry Collector folder. You can use Notepad or any text editor to do this.

Add the suitable extensions :

YAML


In this step, two receivers are integrated to retrieve telemetry data from PostgreSQL.

  • PostgreSQL Receiver: To retrieve metrics(health and performance)
  • File Log Receiver: To retrieve logs(general logs, error logs, and slow query logs)

To get the metrics use only the postgresql receiver

YAML


Step 2: Set up the processor component to identify resource information from the host and either append or replace the resource values in the telemetry data with this information.

Please choose one of the following configuration options based on your provider (AWS EC2, Azure VM, or on-premises server).

  • Server(Can be on-premise, non-cloud, or cloud)
YAML

  • AWS EC2:

(Optional) To retrieve AWS EC2 instance tags along with logs and metrics, users need to associate an IAM role with the EC2 instance that includes the EC2:DescribeTags policy.

YAML

  • Azure Virtual Machines:
YAML


Step 3: Configure the exporter, and extension and save the configuration

Set up the KloudMate Backend on the exporter part of the Open Telemetry configuration file and configure the pipeline.

YAML


Step 4: To restart and verify the status of the OpenTelemetry (Otel) Collector, follow these steps:

For Linux:

  1. Execute the following commands:
Text


These commands will restart the Otel Collector and display its current status.

For Windows:

  1. Open the Services window:
    • Press Win + R, type services.msc, and press OK.
    • Alternatively, search for "Services" in the Windows Start menu.
  2. In the Services window, locate the "OpenTelemetry Collector" service.
  3. Right-click the service and select "Restart."

Subsequently, monitor the metrics on the KloudMate dashboard and set up an alarm to receive notifications if the potential metrics for a specific application rise.

Default Metrics for PostgreSQL

Name

Description

Type

Unit

postgresql.backends

The number of backends

SUM

NONE

postgresql.connection.max

Configured the maximum number of client connections allowed

GAUGE

NONE

postgresql.database.count

Number of user databases

SUM

NONE

postgresql.db_size

The database disk usage

SUM

Bytes

postgresql.index.scans

The number of index scans on a table

SUM

NONE

postgresql.index.size

The size of the index on the disk

GAUGE

Bytes

postgresql.operations

The number of db row operations

SUM

NONE

postgresql.replication.data_delay

The amount of data delayed in replication

GAUGE

Bytes

postgresql.rollbacks

The number of rollbacks

SUM

NONE

postgresql.rows

The number of rows in the database

SUM

NONE

postgresql.table.count

Number of user tables in a database

SUM

NONE

postgresql.table.size

Disk space used by a table

SUM

Bytes

postgresql.table.vacuum.count

Number of times a table has manually been vacuumed

SUM

NONE

postgresql.wal.age

Age of the oldest WAL file. This metric requires WAL to be enabled with at least one replica.

GAUGE

seconds

postgresql.wal.lag

The time between flushing the recent WAL locally and receiving notification that the standby server has completed an operation on it.

GAUGE

seconds

postgresql.bgwriter.buffers.allocated

Number of buffers allocated

SUM

NONE

postgresql.bgwriter.buffers.writes

Number of buffers written

SUM

NONE

postgresql.bgwriter.checkpoint.count

The number of checkpoints performed

SUM

NONE

postgresql.bgwriter.duration

Total time spent writing and syncing files to disk by checkpoints

SUM

Milliseconds

postgresql.bgwriter.maxwritten

Number of times the background writer stopped a cleaning scan because it had written too many buffers

SUM

NONE

postgresql.blocks_read

The number of blocks read

SUM

NONE

postgresql.commits

The number of commits

SUM

NONE

postgresql.deadlocks

The number of deadlocks.

SUM

NONE

postgresql.database.locks

The number of database locks.

GAUGE

COUNT