Database Monitoring
MSSQL Monitoring
MSSQL Integration with KloudMate using KloudMate Agents
7 min
this document describes the steps to collect mssql metrics and logs using kloudmate agents by configuring the appropriate receivers within the agent’s collector configuration the sqlserver receiver collects detailed performance metrics from a microsoft sql server instance it can gather data either through windows performance counters (when running on windows) or by directly connecting to the database and executing queries performance counters are only available on windows hosts to capture mssql logs, the sql query receiver is used it allows you to execute custom sql queries and convert the results into metrics or logs that can be forwarded to kloudmate in this setup, the hostmetrics receiver , sqlserver receiver , and sqlquery receiver are all configured within the kloudmate agent to collect and send telemetry to kloudmate prerequisite mssql server must be installed and running kloudmate agent installed on the postgresql host (see agent installation for https //docs kloudmate com/linux agent and https //docs kloudmate com/windows agent ) ensure that you run the agents as an administrator to collect all performance counters for metrics step 1 configure the microsoft sql server receiver server receiver along with the hostmetrics receiver to hostmetrics receiver to scrape the mssql metrics and sql query receiver sql query receiver along with the filelog receiver filelog receiver to pull required logs navigate to the kloudmate platform and log in to your account go to settings > agents section here you can view all installed agents access configuration settings for each agent review the default configurations provided by kloudmate to edit an agent's collector configuration select the desired agent click the actions menu (or agent action tab) choose collector configuration a text editor ui will open, allowing you to edit the agent’s yaml configuration directly within the platform extensions health check pprof endpoint 0 0 0 0 1777 zpages endpoint 0 0 0 0 55679 file storage create directory true receivers otlp protocols grpc endpoint 0 0 0 0 4317 http endpoint 0 0 0 0 4318 hostmetrics collection interval 30s scrapers cpu metrics system cpu utilization enabled true load cpu average true memory metrics system memory utilization enabled true filesystem metrics system filesystem usage enabled true system filesystem utilization enabled true disk {} network {} processes {} sqlserver collection interval 30s username \<username> #use the username password \<password> #use the secure password server 0 0 0 0 port 1433 metrics sqlserver database count enabled true sqlserver database io enabled true sqlserver database latency enabled true sqlserver database operations enabled true sqlserver processes blocked enabled true sqlserver resource pool disk throttled read rate enabled true sqlserver resource pool disk throttled write rate enabled true sqlquery/upmetrics driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" #modify the username and password queries \# query collect sql server uptime as a metric \ sql | select sqlserver start time, datediff(minute, sqlserver start time, getdate()) as minutes uptime from sys dm os sys info; metrics \ metric name sqlserver uptime minutes value column minutes uptime attribute columns \["sqlserver start time"] static attributes dbinstance mydbinstance filelog include \# ensure the path is appropriate for your server's operating system uncomment the line below if the server is running windows \ /var/opt/mssql/log/errorlog \# 'c \program files\microsoft sql server\mssql\log\errorlog ' storage file storage multiline line start pattern ^\d{4} \d{2} \d{2} \d{2} \d{2} \d{2}\\ \d{2} retry on failure enabled true \# query to show client connections and ip sqlquery/connection driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage queries \ sql | select c session id, c client net address as client ip address, s host name, s login name, s program name, s login time, s last request start time, s last request end time, 'mssql client log' as log source from sys dm exec connections as c join sys dm exec sessions as s on c session id = s session id where s last request start time > $$1 order by s last request start time asc; tracking column last request start time tracking start value "" # add today date logs \ body column client ip address attribute columns \["session id", "host name", "login name", "program name", "login time", "last request start time", "last request end time", "log source"] \# to show slow query sqlquery/slowquery driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage queries \ sql | select , substring( st text, (qs statement start offset / 2) + 1, ( ( case qs statement end offset when 1 then datalength(st text) else qs statement end offset end qs statement start offset ) / 2 ) + 1 ) as statement text, 'mssql slow query log' as log source from sys dm exec query stats as qs cross apply sys dm exec sql text(qs sql handle) as st where qs last execution time > $$1 order by qs last execution time; tracking column last execution time tracking start value " " #add today's date logs \ body column statement text attribute columns \["sql handle", "execution count", "total worker time", "last execution time", "creation time", "total elapsed time", "statement start offset", "statement end offset", "log source"] \# to get audit log sqlquery/audit driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage collection interval 30s queries \ sql "select event time, action id, succeeded, session server principal name as user name, server principal name as login name, database name, schema name, object name, statement, additional information from sys fn get audit file('/var/opt/mssql/log/audit sqlaudit', default, default) where event time > $$1 order by event time asc" tracking column event time tracking start value " " # modify today date logs \ body column statement attribute columns \["event time", "action id", "succeeded", "user name", "login name", "database name", "schema name", "object name", "additional information", "log source"] \# active query that is running sqlquery/activequery driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" queries \ sql | select r session id, r status, r command, r cpu time, r start time, s login name, s host name, s program name, st text as query text, 'mssql active query' as log source from sys dm exec requests r inner join sys dm exec sessions s on r session id = s session id cross apply sys dm exec sql text(r sql handle) st where s is user process = 1 logs \ body column query text attribute columns \["session id", "status", "command", "cpu time", "start time", "login name", "host name", "program name", "log source"] \# user session sqlquery/usersession driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" queries \ sql | select session id, login name, host name, program name, login time, status, client interface name, cpu time, memory usage, reads, writes, 'mssql user sessions' as log source from sys dm exec sessions where is user process = 1; logs \ body column login name attribute columns \["session id", "host name", "program name", "login time", "status", "client interface name", "cpu time", "memory usage", "reads", "writes", "log source"] step 2 configure the processor part to detect resource information from the host and append or override the resource value in telemetry data with this information on premise, non cloud, or cloud processors batch send batch size 10000 timeout 30s resourcedetection detectors \[system] system resource attributes host name enabled true host id enabled true os type enabled false resource attributes \ key service name action insert from attribute host name aws ec2 optional to retrieve aws ec2 instance tags along with logs and metrics, you need to associate an iam role with the ec2 instance that includes the ec2\ describetags policy the processor below needs to be added processors batch send batch size 5000 timeout 10s resourcedetection/ec2 detectors \["ec2"] ec2 tags \ ^tag1$ \ ^tag2$ \ ^label $ \ ^name$ resource attributes \ key service name action insert from attribute ec2 tag name \ key service instance id action insert from attribute host id azure virtual machines processors batch send batch size 5000 timeout 10s resourcedetection/azure detectors \["azure"] azure tags \ ^tag1$ \ ^tag2$ \ ^label $ \ ^name$ resource attributes \ key service name action insert from attribute azure vm name \ key service instance id action insert from attribute host id step 3 configure the exporter and extension and save the configuration set up the exporter as the kloudmate backend by mentioning the endpoint and authorization key in the agent configuration file, and configure the pipeline change the pipeline as per your correct configuration exporters debug verbosity detailed otlphttp endpoint 'https //otel kloudmate com 4318' headers authorization xxxxxxxx # use the auth key service pipelines metrics receivers \[sqlserver, hostmetrics, sqlquery/upmetrics] processors \[batch, resourcedetection, resource] exporters \[debug, otlphttp] log receivers \[filelog, sqlquery/activequery, sqlquery/connection, sqlquery/audit, sqlquery/usersession, sqlquery/slowquery] processors \[batch, resourcedetection, resource] exporters \[debug, otlphttp] extensions \[health check, pprof, zpages, file storage] step 4 save configuration and restart agent after updating the configuration, click “ save configuration ” in the ui or, from your server’s console, run for linux execute the following commands systemctl restart kmagent systemctl status kmagent these commands will restart the kloudmate agent and display its current status for windows open the services window press win + r win + r , type services msc services msc , and press ok ok alternatively, search for " services services " in the windows start menu in the services window, locate the " kloudmate agent " service right click the service and select " restart restart " subsequently, monitor the metrics on the kloudmate dashboard and set up an alarm to receive notifications if the potential metrics for a specific application rise default metrics metrics description type unit sqlserver user connection count number of users connected to the sql server gauge number sqlserver lock wait time avg average wait time for all lock requests that had to wait gauge milliseconds sqlserver lock wait rate number of lock requests resulting in a wait gauge number sqlserver batch request rate number of batch requests received by sql server gauge number sqlserver batch sql compilation rate number of sql compilations needed gauge number sqlserver batch sql recompilation rate number of sql recompilations needed gauge number sqlserver page buffer cache hit ratio pages found in the buffer pool without having to read from disk gauge number sqlserver page life expectancy time a page will stay in the buffer pool gauge seconds sqlserver page split rate number of pages split as a result of overflowing index pages gauge number sqlserver page lazy write rate number of lazy writes moving dirty pages to disk gauge number sqlserver page checkpoint flush rate number of pages flushed by operations requiring dirty pages to be flushed gauge number sqlserver page operation rate number of physical database page operations issued gauge number sqlserver transaction log growth count total number of transaction log expansions for a database sum number sqlserver transaction log shrink count total number of transaction logs shrinks for a database sum number sqlserver transaction log usage percent of transaction log space used gauge number sqlserver transaction log flush wait rate number of commits waiting for a transaction log flush gauge number sqlserver transaction log flush rate number of log flushes gauge number sqlserver transaction log flush data rate total number of log bytes flushed gauge bytes sqlserver transaction rate number of transactions started for the database (not including xtp only transactions) gauge number sqlserver transaction write rate number of transactions that were written to the database and committed gauge number sqlserver database latency total time that the users waited for i/o issued on this file sum seconds sqlserver database operations the number of operations issued on the file sum number sqlserver database io the number of bytes of i/o on this file sum bytes sqlserver resource pool disk throttled read rate the number of read operations that were throttled in the last second gauge number sqlserver resource pool disk throttled write rate the number of write operations that were throttled in the last second gauge number sqlserver processes blocked the number of processes that are currently blocked gauge number sqlserver database count the number of databases gauge number for full details on metrics, click https //github com/open telemetry/opentelemetry collector contrib/blob/main/receiver/sqlserverreceiver/documentation md