Direct Database Monitoring

32 min

the kloudmate agent can connect directly to your database using a dedicated monitoring user to collect deep query metrics, execution plans, and schema metadata this approach provides the richest level of database insight — including per query latency breakdowns, explain plans, and wait event analysis — across both self hosted and managed database services when to use use direct database monitoring when you need deep query level analytics, including execution plans, query digest metrics, wait event analysis, and schema discovery this is the most comprehensive monitoring approach and works on any environment — self hosted, rds, aurora, cloud sql, azure, or alloydb how it works you create a dedicated kloudmate database user with minimal read only permissions helper procedures/functions (using security definer ) allow the agent to access query stats and run explain without needing superuser privileges the km agent connects using this user and periodically collects query metrics, live queries, execution plans, and schema metadata mysql setup 1\ create monitoring user connect as a user with admin privileges (for rds/aurora, use the master user) create user if not exists 'kloudmate'@'%' identified by '\<kloudmate password>'; \ limit connections to prevent exhausting the pool alter user 'kloudmate'@'%' with max user connections 5; 2\ grant base permissions \ process see all running threads and queries \ replication client check replication lag and binary log position \ select on perf schema read query digest stats, wait events, etc grant replication client on to 'kloudmate'@'%'; grant process on to 'kloudmate'@'%'; grant select on performance schema to 'kloudmate'@'%'; 3\ create helper procedures create a dedicated schema and the helper procedures the agent uses create schema if not exists kloudmate; grant execute on kloudmate to 'kloudmate'@'%'; enable performance schema, consumers on rds/aurora/cloud sql, you cannot permanently enable performance schema consumers via config this procedure lets the agent enable them at runtime delimiter $$ create procedure if not exists kloudmate enable events statements consumers() sql security definer begin update performance schema setup consumers set enabled = 'yes' where name like 'events statements %'; update performance schema setup consumers set enabled = 'yes' where name = 'events waits current'; end $$ delimiter ; grant execute on procedure kloudmate enable events statements consumers to 'kloudmate'@'%'; explain statement collects json execution plans for sampled queries delimiter $$ create procedure if not exists kloudmate explain statement(in query text) sql security definer begin set @explain = concat('explain format=json ', query); prepare stmt from @explain; execute stmt; deallocate prepare stmt; end $$ delimiter ; grant execute on procedure kloudmate explain statement to 'kloudmate'@'%'; 4\ enable performance schema rds / aurora performance schema must be enabled via the parameter group in the aws console or cli — it cannot be set via sql set the following parameters and reboot the instance performance schema = 1 max digest length = 4096 performance schema max digest length = 4096 performance schema max sql text length = 4096 self hosted — add to your my cnf / mysqld cnf \[mysqld] performance schema = on performance schema consumer events statements current = on performance schema consumer events statements history = on performance schema consumer events statements history long = on performance schema consumer events waits current = on max digest length = 4096 performance schema max digest length = 4096 performance schema max sql text length = 4096 5\ optional schema collection to enable the kloudmate schema explorer, grant select on the databases the agent should discover \ option a all databases grant select on to 'kloudmate'@'%'; \ option b specific databases only grant select on your app db to 'kloudmate'@'%'; the agent queries information schema for metadata — it does not read your actual row data 6\ finalize flush privileges; 7\ verify setup run these as the kloudmate user \ check performance schema is enabled select @@performance schema; \ check consumers are active select name, enabled from performance schema setup consumers where name like 'events statements %' or name = 'events waits current'; \ check query digests are being collected select count( ) from performance schema events statements summary by digest; \ verify procedures exist show procedure status where db = 'kloudmate'; \ test explain call kloudmate explain statement('select 1'); mysql agent configuration receivers kloudmate mysql endpoint "\<host> 3306" username "kloudmate" password "${kloudmate db password}" database "" # empty = monitor all databases collect query metrics true collect query samples true collect explain plans true collect schema true # requires select grants (step 5) collection interval 10s query samples rate 1 # sample every nth query tls enabled true # recommended for rds skip verify false for rds with iam authentication receivers kloudmate mysql endpoint "\<rds endpoint> 3306" auth type "iam" username "kloudmate" aws region "ap south 1" instance endpoint "\<rds endpoint>" postgresql setup 1\ create monitoring user connect as a superuser (or rds superuser on rds/aurora) to the postgres database create user kloudmate with password '\<kloudmate password>'; 2\ grant base permissions \ pg monitor built in role (pg 10+) granting read access to \ pg stat activity, pg stat replication, pg stat database, and more grant pg monitor to kloudmate; 3\ enable pg stat statements pg stat statements is the primary data source for query metrics (latency, calls, rows) create extension if not exists pg stat statements schema public; rds / aurora add pg stat statements to s hared preload libraries in your rds parameter group and reboot the instance self hosted — add to postgresql conf shared preload libraries = 'pg stat statements' pg stat statements track = all pg stat statements max = 10000 track activity query size = 4096 4\ create helper functions create a dedicated schema and the security definer wrapper functions create schema if not exists kloudmate; grant usage on schema kloudmate to kloudmate; grant usage on schema public to kloudmate; the security definer pattern means these functions execute with the privileges of the admin who created them, not the kloudmate user calling them this is how the agent gains full visibility into query stats and activity on managed databases without requiring superuser access pg stat activity wrapper ensures full visibility into all sessions' query text (required on rds where non superusers have restricted access) create or replace function kloudmate pg stat activity() returns setof pg stat activity as $$ select from pg catalog pg stat activity; $$ language sql security definer; pg stat statements wrapper ensures full visibility into all query statistics create or replace function kloudmate pg stat statements() returns setof pg stat statements as $$ select from pg stat statements; $$ language sql security definer; explain statement collects json execution plans for sampled queries create or replace function kloudmate explain statement( l query text, out explain json ) returns setof json as $$ declare curs refcursor; plan json; begin open curs for execute pg catalog concat('explain (format json) ', l query); loop fetch curs into plan; exit when not found; return query select plan; end loop; close curs; return; end; $$ language plpgsql security definer; 5\ optional functions stats reset safe way to reset pg stat statements counters without direct superuser access create or replace function kloudmate pg stat statements reset() returns void as $$ select pg stat statements reset(); $$ language sql security definer; replication monitoring only needed on the primary / writer instance for monitoring replication lag create or replace function kloudmate pg stat replication() returns setof pg stat replication as $$ select from pg catalog pg stat replication; $$ language sql security definer; 6\ configure parameters rds / aurora parameter group set the following in the aws console or cli and reboot shared preload libraries = 'pg stat statements' track activity query size = 4096 pg stat statements track = all pg stat statements max = 10000 (recommended) self hosted — add to postgresql conf shared preload libraries = 'pg stat statements' pg stat statements track = all pg stat statements max = 10000 track activity query size = 4096 track io timing = on # required for i/o metrics log min duration statement = 1000 # log slow queries (ms) 7\ optional auto explain for full explain analyze plans (actual execution times, not just estimates) shared preload libraries = 'pg stat statements,auto explain' auto explain log min duration = 1000 auto explain log analyze = true auto explain log buffers = true auto explain log timing = true auto explain log triggers = true auto explain log verbose = true auto explain log format = json auto explain log nested statements = true auto explain sample rate = 1 # 1 = all, 0 1 = 10% sample auto explain with l og analyze=true adds overhead because it executes the query plan use sample rate < 1 in production 8\ repeat for each database pg stat statements and the kloudmate functions must exist in every database the agent monitors the agent connects to one database at a time for each additional database (e g app db , analytics db ), connect and re run the extension + function creation for db in $(psql t c "select datname from pg database where datistemplate = false"); do psql d "$db" f kloudmate functions sql done 9\ verify setup run as the kloudmate user \ check pg stat statements is loaded select count( ) from pg stat statements; \ check functions exist select routine name from information schema routines where routine schema = 'kloudmate'; \ test pg stat activity wrapper select count( ) from kloudmate pg stat activity(); \ test explain select from kloudmate explain statement('select 1'); \ check track activity query size show track activity query size; \ check track io timing show track io timing; postgresql agent configuration receivers kloudmate postgresql endpoint "\<host> 5432" username "kloudmate" password "${kloudmate db password}" database "postgres" # primary connection database databases to monitor # additional databases \ "app db" \ "analytics db" collect query metrics true collect query samples true collect explain plans true collect schema true collect replication stats true collection interval 10s tls mode "require" # require | verify ca | verify full ca cert "/path/to/rds ca bundle pem" for rds with iam authentication receivers kloudmate postgresql endpoint "\<rds endpoint> 5432" auth type "iam" username "kloudmate" database "postgres" aws region "ap south 1" instance endpoint "\<rds endpoint>" cleanup / uninstall mysql drop procedure if exists kloudmate enable events statements consumers; drop procedure if exists kloudmate explain statement; drop schema if exists kloudmate; drop user if exists 'kloudmate'@'%'; postgresql drop function if exists kloudmate pg stat activity(); drop function if exists kloudmate pg stat statements(); drop function if exists kloudmate explain statement(text); drop function if exists kloudmate pg stat statements reset(); drop function if exists kloudmate pg stat replication(); drop schema if exists kloudmate; drop user if exists kloudmate;