Database Monitoring
MSSQL Monitoring
14 min
microsoft sql server (mssql) is a robust relational database management system from microsoft, renowned for its enterprise grade performance, high availability features, and advanced analytics capabilities it powers mission critical applications, data warehousing, and business intelligence workloads mssql monitoring in kloudmate helps you observe the health, performance, and behavior of your microsoft sql server instances by collecting metrics and logs using the kloudmate agent powered by opentelemetry this enables centralized monitoring of mssql instances running on windows hosts, aws ec2, azure virtual machines, or on premise servers mssql integration using kloudmate agents what this integration provides with mssql monitoring enabled, kloudmate collects telemetry that provides visibility into database health and performance metrics lock waits, blocking processes, and transaction behavior buffer pool efficiency and i/o performance transaction log usage and growth patterns resource pool throttling and disk i/o latency server level resource utilization this visibility helps identify performance bottlenecks, blocking issues, transaction log problems, buffer pool inefficiencies, and availability concerns prerequisites mssql server must be installed and running kloudmate agent installed on the mssql host (see agent installation for linux and windows) run agents as administrator to collect all windows performance counters required permissions windows performance counters run the kloudmate agent as administrator to collect all performance counter based metrics direct database connection when configured for direct sql server connection, the monitoring user must have database permissions (at least one of) • create database • alter any database • view any database server state permissions • sql server pre 2022 view server state • sql server 2022+ view server performance state configuration overview step 1 configure the kloudmate agent with • hostmetrics receiver (system metrics) • sqlserver receiver (mssql performance metrics) • sqlquery receiver (custom sql metrics) • filelog receiver (mssql logs) step 2 access agents and opentelemetry collector configuration log in to the kloudmate platform go to settings → agents select the agent running on the mssql host click actions → collector configuration yaml editor opens for configuration step 3 add required extensions and receivers extensions health check pprof endpoint 0 0 0 0 1777 zpages endpoint 0 0 0 0 55679 file storage create directory true receivers otlp protocols grpc endpoint 0 0 0 0 4317 http endpoint 0 0 0 0 4318 hostmetrics collection interval 30s scrapers cpu metrics system cpu utilization enabled true load cpu average true memory metrics system memory utilization enabled true filesystem metrics system filesystem usage enabled true system filesystem utilization enabled true disk {} network {} processes {} sqlserver collection interval 30s username \<username> #use the username password \<password> #use the secure password server 0 0 0 0 port 1433 metrics sqlserver database count enabled true sqlserver database io enabled true sqlserver database latency enabled true sqlserver database operations enabled true sqlserver processes blocked enabled true sqlserver resource pool disk throttled read rate enabled true sqlserver resource pool disk throttled write rate enabled true sqlquery/upmetrics driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" #modify the username and password queries \# query collect sql server uptime as a metric \ sql | select sqlserver start time, datediff(minute, sqlserver start time, getdate()) as minutes uptime from sys dm os sys info; metrics \ metric name sqlserver uptime minutes value column minutes uptime attribute columns \["sqlserver start time"] static attributes dbinstance mydbinstance filelog include \# ensure the path is appropriate for your server's operating system uncomment the line below if the server is running windows \ /var/opt/mssql/log/errorlog \# 'c \program files\microsoft sql server\mssql\log\errorlog ' storage file storage multiline line start pattern ^\d{4} \d{2} \d{2} \d{2} \d{2} \d{2}\\ \d{2} retry on failure enabled true \# query to show client connections and ip sqlquery/connection driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage queries \ sql | select c session id, c client net address as client ip address, s host name, s login name, s program name, s login time, s last request start time, s last request end time, 'mssql client log' as log source from sys dm exec connections as c join sys dm exec sessions as s on c session id = s session id where s last request start time > $$1 order by s last request start time asc; tracking column last request start time tracking start value "" # add today date logs \ body column client ip address attribute columns \["session id", "host name", "login name", "program name", "login time", "last request start time", "last request end time", "log source"] \# to show slow query sqlquery/slowquery driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage queries \ sql | select , substring( st text, (qs statement start offset / 2) + 1, ( ( case qs statement end offset when 1 then datalength(st text) else qs statement end offset end qs statement start offset ) / 2 ) + 1 ) as statement text, 'mssql slow query log' as log source from sys dm exec query stats as qs cross apply sys dm exec sql text(qs sql handle) as st where qs last execution time > $$1 order by qs last execution time; tracking column last execution time tracking start value " " #add today's date logs \ body column statement text attribute columns \["sql handle", "execution count", "total worker time", "last execution time", "creation time", "total elapsed time", "statement start offset", "statement end offset", "log source"] \# to get audit log sqlquery/audit driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage collection interval 30s queries \ sql "select event time, action id, succeeded, session server principal name as user name, server principal name as login name, database name, schema name, object name, statement, additional information from sys fn get audit file('/var/opt/mssql/log/audit sqlaudit', default, default) where event time > $$1 order by event time asc" tracking column event time tracking start value " " # modify today date logs \ body column statement attribute columns \["event time", "action id", "succeeded", "user name", "login name", "database name", "schema name", "object name", "additional information", "log source"] \# active query that is running sqlquery/activequery driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" queries \ sql | select r session id, r status, r command, r cpu time, r start time, s login name, s host name, s program name, st text as query text, 'mssql active query' as log source from sys dm exec requests r inner join sys dm exec sessions s on r session id = s session id cross apply sys dm exec sql text(r sql handle) st where s is user process = 1 logs \ body column query text attribute columns \["session id", "status", "command", "cpu time", "start time", "login name", "host name", "program name", "log source"] \# user session sqlquery/usersession driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" queries \ sql | select session id, login name, host name, program name, login time, status, client interface name, cpu time, memory usage, reads, writes, 'mssql user sessions' as log source from sys dm exec sessions where is user process = 1; logs \ body column login name attribute columns \["session id", "host name", "program name", "login time", "status", "client interface name", "cpu time", "memory usage", "reads", "writes", "log source"] step 4 configure processors set up the processor component to identify resource information from the host and either append or replace the resource values in the telemetry data with this information choose one configuration based on where mssql is running server (on premise / non cloud / cloud) processors batch send batch size 10000 timeout 30s resourcedetection detectors \[system] system resource attributes host name enabled true host id enabled true os type enabled false resource attributes \ key service name action insert from attribute host name aws ec2 optional to retrieve aws ec2 instance tags along with logs and metrics, you need to associate an iam role with the ec2 instance that includes the ec2\ describetags policy the processor below needs to be added processors batch send batch size 5000 timeout 10s resourcedetection/ec2 detectors \["ec2"] ec2 tags \ ^tag1$ \ ^tag2$ \ ^label $ \ ^name$ resource attributes \ key service name action insert from attribute ec2 tag name \ key service instance id action insert from attribute host id azure virtual machines processors batch send batch size 5000 timeout 10s resourcedetection/azure detectors \["azure"] azure tags \ ^tag1$ \ ^tag2$ \ ^label $ \ ^name$ resource attributes \ key service name action insert from attribute azure vm name \ key service instance id action insert from attribute host id step 5 configure exporter and pipelines configure the kloudmate backend exporter and define pipelines for metrics and logs change the pipeline as per your correct configuration exporters debug verbosity detailed otlphttp endpoint 'https //otel kloudmate com 4318' headers authorization xxxxxxxx # use the auth key service pipelines metrics receivers \[sqlserver, hostmetrics, sqlquery/upmetrics] processors \[batch, resourcedetection, resource] exporters \[debug, otlphttp] log receivers \[filelog, sqlquery/activequery, sqlquery/connection, sqlquery/audit, sqlquery/usersession, sqlquery/slowquery] processors \[batch, resourcedetection, resource] exporters \[debug, otlphttp] extensions \[health check, pprof, zpages, file storage] step 6 save configuration and restart agent after updating the configuration, click save configuration in the kloudmate ui you can also restart the agent from your server’s console for linux execute the following commands systemctl restart kmagent systemctl status kmagent these commands restart the kloudmate agent and display its current status for windows open the services window press win + r , type services msc , and press ok or search for services from the start menu locate the kloudmate agent service right click the service and select restart after restarting the agent, verify that mssql metrics and logs are visible in the kloudmate dashboard post‑integration data validation verify that metrics are flowing into kloudmate using the explore view after the agent restarts log in to kloudmate navigate to explore select opentelemetry → metrics choose a mssql metric and run the query seeing time series data confirms that mssql telemetry is flowing successfully standard mssql dashboards kloudmate provides prebuilt mssql dashboards through https //templates kloudmate com/mssql/index html these dashboards visualize mssql database performance, query operations, connections, and resource usage to import and start using these templates, follow the steps described in https //docs kloudmate com/creating a dashboard#sfk3g default metrics metrics description type unit sqlserver user connection count number of users connected to the sql server gauge number sqlserver lock wait time avg average wait time for all lock requests that had to wait gauge milliseconds sqlserver lock wait rate number of lock requests resulting in a wait gauge number sqlserver batch request rate number of batch requests received by sql server gauge number sqlserver batch sql compilation rate number of sql compilations needed gauge number sqlserver batch sql recompilation rate number of sql recompilations needed gauge number sqlserver page buffer cache hit ratio pages found in the buffer pool without having to read from disk gauge number sqlserver page life expectancy time a page will stay in the buffer pool gauge seconds sqlserver page split rate number of pages split as a result of overflowing index pages gauge number sqlserver page lazy write rate number of lazy writes moving dirty pages to disk gauge number sqlserver page checkpoint flush rate number of pages flushed by operations requiring dirty pages to be flushed gauge number sqlserver page operation rate number of physical database page operations issued gauge number sqlserver transaction log growth count total number of transaction log expansions for a database sum number sqlserver transaction log shrink count total number of transaction logs shrinks for a database sum number sqlserver transaction log usage percent of transaction log space used gauge number sqlserver transaction log flush wait rate number of commits waiting for a transaction log flush gauge number sqlserver transaction log flush rate number of log flushes gauge number sqlserver transaction log flush data rate total number of log bytes flushed gauge bytes sqlserver transaction rate number of transactions started for the database (not including xtp only transactions) gauge number sqlserver transaction write rate number of transactions that were written to the database and committed gauge number sqlserver database latency total time that the users waited for i/o issued on this file sum seconds sqlserver database operations the number of operations issued on the file sum number sqlserver database io the number of bytes of i/o on this file sum bytes sqlserver resource pool disk throttled read rate the number of read operations that were throttled in the last second gauge number sqlserver resource pool disk throttled write rate the number of write operations that were throttled in the last second gauge number sqlserver processes blocked the number of processes that are currently blocked gauge number sqlserver database count the number of databases gauge number for the complete metrics list, refer to the https //github com/open telemetry/opentelemetry collector contrib/blob/main/receiver/sqlserverreceiver/documentation md