
# PostgreSQL integration with KloudMate

The PostgreSQL integration with KloudMate enables you to monitor the health and performance of your PostgreSQL databases in near real time. With the integration of OpenTelemetry, you can collect vital metrics, track slow queries, and set up automated alerts to keep your team informed about potential performance issues. This document outlines how to set up the integration, monitor key metrics, and leverage additional features like slow query logging for comprehensive database monitoring.

## Key features

- Real-time monitoring: Gain insights into your PostgreSQL database's health and performance with near real-time metrics.
- OpenTelemetry integration: Leverage OpenTelemetry to collect and visualize metrics, traces, and logs from PostgreSQL.
- Slow query logging: Identify slow queries affecting your database performance.
- Dashboards: Visualize PostgreSQL performance metrics and set thresholds for key indicators like query response time, CPU usage, disk I/O, and more.
- Alerts: Create alerts to notify your team about performance issues such as high CPU usage, long-running queries, and storage bottlenecks.

## Prerequisites

- PostgreSQL installed and running.
- The OpenTelemetry Collector installed on the specific server that requires monitoring (see Installing and Configuring OpenTelemetry Collector).
- A PostgreSQL username and password for authentication.
- PostgreSQL logging enabled in the server configuration.

## Step 1: Configure the receivers to scrape metrics and logs

To start monitoring PostgreSQL with the OTel Collector, you need to configure the PostgreSQL receiver.

- Linux users: Open the file located at `/etc/otelcol-contrib/config.yaml` using your preferred text editor.
- Windows users: Create a new file called `config.yaml` in the `C:\Program Files\OpenTelemetry Collector` folder. You can use Notepad or any text editor to do this.

Add the suitable extensions:

```yaml
extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679
  file_storage:
    create_directory: true
```

In this step, two receivers are used to retrieve telemetry data from PostgreSQL:

- `postgresql` receiver: retrieves metrics (health and performance).
- `filelog` receiver: retrieves logs (general logs, error logs, and slow query logs).

To collect only the metrics, use just the `postgresql` receiver.

```yaml
receivers:
  postgresql:
    endpoint: localhost:5432 # update the endpoint
    transport: tcp
    username: <postgres> # update postgres username
    password: <password> # update postgres password
    databases: []
    collection_interval: 30s
    tls:
      insecure: true
    metrics:
      postgresql.deadlocks:
        enabled: true
      postgresql.database.locks:
        enabled: true

  filelog/pgsql_json: # use this filelog receiver if the Postgres version is >= 15
    include: [/var/lib/postgresql/*/main/log/postgresql*]
    storage: file_storage
    retry_on_failure:
      enabled: true
    operators:
      - type: json_parser
        parse_to: body
        severity:
          parse_from: body.error_severity
          mapping:
            info: LOG

  filelog/pgsql_log: # use this filelog receiver if the Postgres version is < 15
    include: [/var/lib/postgresql/*/main/log/postgresql*]
    storage: file_storage
    multiline:
      line_start_pattern: ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
    retry_on_failure:
      enabled: true
    operators:
      - type: regex_parser
        parse_to: body
        regex: '^(?P<timestamp>[0-9\-]+\s[0-9:\.]+\s\w+)\s+\[(?P<pid>\d+)\](?:\s+(?P<database>\w+)@(?P<user>\w+))?\s+(?P<severity>\w+):(?:\s+duration:\s+(?P<duration>\d+\.\d+)\s+[^ ]*\s+(statement:\s+)?)?\s+(?P<query>.+)$'
        timestamp:
          parse_from: body.timestamp
          layout: '%Y-%m-%d %H:%M:%S.%f %Z'
        severity:
          parse_from: body.severity
          mapping:
            info:
              - statement
              - log
      - type: remove
        field: body.severity
      - type: remove
        field: body.timestamp
```
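The filelog receivers above assume that PostgreSQL is writing log files the collector can read. Below is a minimal sketch of the corresponding `postgresql.conf` logging settings; the parameter names are standard PostgreSQL settings, but the exact paths, thresholds, and formats shown are assumptions for illustration and are not prescribed by this guide (use `log_destination = 'jsonlog'` only on PostgreSQL 15+ together with the `filelog/pgsql_json` receiver):

```ini
# postgresql.conf -- illustrative logging settings (values are assumptions, adjust to your installation)
logging_collector = on                 # write server logs to files instead of stderr only
log_destination = 'stderr'             # use 'jsonlog' on PostgreSQL 15+ with the filelog/pgsql_json receiver
log_directory = 'log'                  # relative to the data directory, e.g. /var/lib/postgresql/<version>/main/log
log_min_duration_statement = 1000      # log statements slower than 1000 ms for slow query tracking
log_line_prefix = '%m [%p] %q%u@%d '   # timestamp [pid] user@database, matching the regex_parser above
```

Changing `logging_collector` requires a PostgreSQL restart; the other settings can typically be applied with a configuration reload.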
## Step 2: Set up the processors

The processor component identifies resource information from the host and either appends or replaces the resource values in the telemetry data with this information. Choose one of the following configuration options based on your provider (AWS EC2, Azure VM, or on-premises server).

### Server (on-premises, non-cloud, or cloud)

```yaml
processors:
  batch:
    send_batch_size: 5000
    timeout: 10s
  resourcedetection:
    detectors: [system]
    system:
      resource_attributes:
        host.name:
          enabled: true
        host.id:
          enabled: true
        os.type:
          enabled: false
  resource:
    attributes:
      - key: service.name
        action: insert
        from_attribute: host.name
```

### AWS EC2

(Optional) To retrieve AWS EC2 instance tags along with logs and metrics, associate an IAM role with the EC2 instance that includes a policy allowing `ec2:DescribeTags`.

```yaml
processors:
  batch:
    send_batch_size: 5000
    timeout: 10s
  resourcedetection/ec2:
    detectors: ["ec2"]
    ec2:
      tags:
        - ^tag1$
        - ^tag2$
        - ^label.*$
        - ^Name$
  resource:
    attributes:
      - key: service.name
        action: insert
        from_attribute: ec2.tag.Name
      - key: service.instance.id
        action: insert
        from_attribute: host.id
```

### Azure Virtual Machines

```yaml
processors:
  batch:
    send_batch_size: 5000
    timeout: 10s
  resourcedetection/azure:
    detectors: ["azure"]
    azure:
      tags:
        - ^tag1$
        - ^tag2$
        - ^label.*$
        - ^Name$
  resource:
    attributes:
      - key: service.name
        action: insert
        from_attribute: azure.vm.name
      - key: service.instance.id
        action: insert
        from_attribute: host.id
```

## Step 3: Configure the exporter and extensions, and save the configuration

Set up the KloudMate backend in the exporters section of the OpenTelemetry configuration file and configure the pipelines:

```yaml
exporters:
  debug:
    verbosity: detailed
  otlphttp:
    endpoint: 'https://otel.kloudmate.com:4318'
    headers:
      Authorization: xxxxxxxx # use the auth key

service:
  pipelines:
    metrics:
      receivers: [postgresql]
      processors: [batch, resourcedetection, resource] # apply the correct resourcedetection
      exporters: [otlphttp]
    logs:
      receivers: [filelog/pgsql_log] # use the correct filelog receiver as per your Postgres version
      processors: [batch, resourcedetection, resource] # apply the correct resourcedetection
      exporters: [otlphttp]
  extensions: [health_check, pprof, zpages, file_storage]
```

## Step 4: Restart and verify the OpenTelemetry Collector

For Linux, execute the following commands:

```bash
sudo systemctl restart otelcol-contrib
sudo systemctl status otelcol-contrib
```

These commands restart the OTel Collector and display its current status.

For Windows:

1. Open the Services window: press `Win + R`, type `services.msc`, and press OK. Alternatively, search for "Services" in the Windows Start menu.
2. In the Services window, locate the "OpenTelemetry Collector" service.
3. Right-click the service and select "Restart".

Subsequently, monitor the metrics on the KloudMate dashboard and set up alerts to receive notifications when key metrics for a specific application exceed their thresholds.
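Before checking the dashboard, you can also confirm locally that the collector restarted cleanly and its extensions are up. A brief sketch for Linux, assuming the `otelcol-contrib` systemd unit used above and the `health_check` extension's default port 13133:

```bash
# Follow the collector's logs and watch for receiver or exporter errors
sudo journalctl -u otelcol-contrib -f

# Probe the health_check extension (assumes its default 0.0.0.0:13133 endpoint)
curl -sf http://localhost:13133/ && echo "collector is healthy"

# zpages (configured on 0.0.0.0:55679 above) expose pipeline details in a browser:
#   http://localhost:55679/debug/pipelinez
```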
## Default metrics for PostgreSQL

| Name | Description | Type | Unit |
| --- | --- | --- | --- |
| postgresql.backends | The number of backends | Sum | none |
| postgresql.connection.max | Configured maximum number of client connections allowed | Gauge | none |
| postgresql.database.count | Number of user databases | Sum | none |
| postgresql.db_size | The database disk usage | Sum | bytes |
| postgresql.index.scans | The number of index scans on a table | Sum | none |
| postgresql.index.size | The size of the index on disk | Gauge | bytes |
| postgresql.operations | The number of db row operations | Sum | none |
| postgresql.replication.data_delay | The amount of data delayed in replication | Gauge | bytes |
| postgresql.rollbacks | The number of rollbacks | Sum | none |
| postgresql.rows | The number of rows in the database | Sum | none |
| postgresql.table.count | Number of user tables in a database | Sum | none |
| postgresql.table.size | Disk space used by a table | Sum | bytes |
| postgresql.table.vacuum.count | Number of times a table has manually been vacuumed | Sum | none |
| postgresql.wal.age | Age of the oldest WAL file. This metric requires WAL to be enabled with at least one replica. | Gauge | seconds |
| postgresql.wal.lag | The time between flushing the recent WAL locally and receiving notification that the standby server has completed an operation on it | Gauge | seconds |
| postgresql.bgwriter.buffers.allocated | Number of buffers allocated | Sum | none |
| postgresql.bgwriter.buffers.writes | Number of buffers written | Sum | none |
| postgresql.bgwriter.checkpoint.count | The number of checkpoints performed | Sum | none |
| postgresql.bgwriter.duration | Total time spent writing and syncing files to disk by checkpoints | Sum | milliseconds |
| postgresql.bgwriter.maxwritten | Number of times the background writer stopped a cleaning scan because it had written too many buffers | Sum | none |
| postgresql.blocks_read | The number of blocks read | Sum | none |
| postgresql.commits | The number of commits | Sum | none |
| postgresql.deadlocks | The number of deadlocks | Sum | none |
| postgresql.database.locks | The number of database locks | Gauge | count |

For full details on metrics, click here.