OpenTelemetry Support
Service Integration Guides

Monitor MSSQL using OpenTelemetry receivers

9min
this document demonstrate the steps to scrape mssql metrics as well as logs to kloudmate using various receivers the sqlserver sqlserver receiver grabs metrics about a microsoft sql server instance the receiver works by either using the windows performance counters, or by directly connecting to the instance and querying it windows performance counters are only available when running on windows to get the mssql logs the sql query receiver used it uses custom sql queries to generate metrics or logs from a database connection here hostmetrics receiver hostmetrics receiver , sql server receiver sql server receiver , and sql query receiver sql query receiver are configured prerequisite mssql server must be installed and running install the opentelemetry collector on the server see installing and configuring opentelemetry collector make sure to run the collector as administrator in order to collect all performance counters for metrics step 1 configure the microsoft sql server receiver server receiver along with the hostmetrics receiver hostmetrics receiver scrape the mssql metrics and sql query receiver sql query receiver along with the filelog receiver filelog receiver to pull required logs linux users open the file located at /etc/otelcol contrib/config yaml /etc/otelcol contrib/config yaml using your preferred text editor windows users create a new file called config yaml config yaml in the c \program files\opentelemetry collector c \program files\opentelemetry collector folder you can use notepad or any text editor to do this extensions health check pprof endpoint 0 0 0 0 1777 zpages endpoint 0 0 0 0 55679 file storage create directory true receivers otlp protocols grpc endpoint 0 0 0 0 4317 http endpoint 0 0 0 0 4318 hostmetrics collection interval 30s scrapers cpu metrics system cpu utilization enabled true load cpu average true memory metrics system memory utilization enabled true filesystem metrics system filesystem usage enabled true system filesystem utilization enabled true disk {} network {} processes {} sqlserver collection interval 30s username \<username> #use the username password \<password> #use the secure password server 0 0 0 0 port 1433 metrics sqlserver database count enabled true sqlserver database io enabled true sqlserver database latency enabled true sqlserver database operations enabled true sqlserver processes blocked enabled true sqlserver resource pool disk throttled read rate enabled true sqlserver resource pool disk throttled write rate enabled true sqlquery/upmetrics driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" #modify the username and password queries \# query collect sql server uptime as a metric \ sql | select sqlserver start time, datediff(minute, sqlserver start time, getdate()) as minutes uptime from sys dm os sys info; metrics \ metric name sqlserver uptime minutes value column minutes uptime attribute columns \["sqlserver start time"] static attributes dbinstance mydbinstance filelog include \# ensure the path is appropriate for your server's operating system uncomment the line below if the server is running windows \ /var/opt/mssql/log/errorlog \# 'c \program files\microsoft sql server\mssql\log\errorlog ' storage file storage multiline line start pattern ^\d{4} \d{2} \d{2} \d{2} \d{2} \d{2}\\ \d{2} retry on failure enabled true \# query to show client connections and ip sqlquery/connection driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage queries \ sql | select c session id, c client net address as client ip address, s host name, s login name, s program name, s login time, s last request start time, s last request end time, 'mssql client log' as log source from sys dm exec connections as c join sys dm exec sessions as s on c session id = s session id where s last request start time > $$1 order by s last request start time asc; tracking column last request start time tracking start value "" # add today date logs \ body column client ip address attribute columns \["session id", "host name", "login name", "program name", "login time", "last request start time", "last request end time", "log source"] \# to show slow query sqlquery/slowquery driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage queries \ sql | select , substring( st text, (qs statement start offset / 2) + 1, ( ( case qs statement end offset when 1 then datalength(st text) else qs statement end offset end qs statement start offset ) / 2 ) + 1 ) as statement text, 'mssql slow query log' as log source from sys dm exec query stats as qs cross apply sys dm exec sql text(qs sql handle) as st where qs last execution time > $$1 order by qs last execution time; tracking column last execution time tracking start value " " #add today's date logs \ body column statement text attribute columns \["sql handle", "execution count", "total worker time", "last execution time", "creation time", "total elapsed time", "statement start offset", "statement end offset", "log source"] \# to get audit log sqlquery/audit driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" storage file storage collection interval 30s queries \ sql "select event time, action id, succeeded, session server principal name as user name, server principal name as login name, database name, schema name, object name, statement, additional information from sys fn get audit file('/var/opt/mssql/log/audit sqlaudit', default, default) where event time > $$1 order by event time asc" tracking column event time tracking start value " " # modify today date logs \ body column statement attribute columns \["event time", "action id", "succeeded", "user name", "login name", "database name", "schema name", "object name", "additional information", "log source"] \# active query that is running sqlquery/activequery driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" queries \ sql | select r session id, r status, r command, r cpu time, r start time, s login name, s host name, s program name, st text as query text, 'mssql active query' as log source from sys dm exec requests r inner join sys dm exec sessions s on r session id = s session id cross apply sys dm exec sql text(r sql handle) st where s is user process = 1 logs \ body column query text attribute columns \["session id", "status", "command", "cpu time", "start time", "login name", "host name", "program name", "log source"] \# user session sqlquery/usersession driver mssql datasource "sqlserver //\<user> \<password>@localhost 1433?database=kloudmate test\&encrypt=disable" queries \ sql | select session id, login name, host name, program name, login time, status, client interface name, cpu time, memory usage, reads, writes, 'mssql user sessions' as log source from sys dm exec sessions where is user process = 1; logs \ body column login name attribute columns \["session id", "host name", "program name", "login time", "status", "client interface name", "cpu time", "memory usage", "reads", "writes", "log source"] step 2 configure the processor part to detect resource information from the host and append or override the resource value in telemetry data with this information on premise, non cloud, or cloud processors batch send batch size 10000 timeout 30s resourcedetection detectors \[system] system resource attributes host name enabled true host id enabled true os type enabled false resource attributes \ key service name action insert from attribute host name aws ec2 optional to retrieve aws ec2 instance tags along with logs and metrics, you need to associate an iam role with the ec2 instance that includes the ec2\ describetags policy the processor below needs to be added processors batch send batch size 5000 timeout 10s resourcedetection/ec2 detectors \["ec2"] ec2 tags \ ^tag1$ \ ^tag2$ \ ^label $ \ ^name$ resource attributes \ key service name action insert from attribute ec2 tag name \ key service instance id action insert from attribute host id azure virtual machines processors batch send batch size 5000 timeout 10s resourcedetection/azure detectors \["azure"] azure tags \ ^tag1$ \ ^tag2$ \ ^label $ \ ^name$ resource attributes \ key service name action insert from attribute azure vm name \ key service instance id action insert from attribute host id step 3 set up the kloudmate backend on the exporter part with the correct extensions of the opentelemetry configuration file and configure the pipeline change the pipeline as per your correct configuration exporters debug verbosity detailed otlphttp endpoint 'https //otel kloudmate com 4318' headers authorization xxxxxxxx # use the auth key service pipelines metrics receivers \[sqlserver, hostmetrics, sqlquery/upmetrics] processors \[batch, resourcedetection, resource] exporters \[debug, otlphttp] log receivers \[filelog, sqlquery/activequery, sqlquery/connection, sqlquery/audit, sqlquery/usersession, sqlquery/slowquery] processors \[batch, resourcedetection, resource] exporters \[debug, otlphttp] extensions \[health check, pprof, zpages, file storage] step 4 restart the opentelemetry collector service linux sudo systemctl restart otelcol contrib sudo systemctl status otelcol contrib windows go to the windows service you can directly search and open otherwise can go to run ➝ type services msc ➝ ok on service search for opentelemetry collector service and restart it default metrics metrics description type unit sqlserver user connection count number of users connected to the sql server gauge number sqlserver lock wait time avg average wait time for all lock requests that had to wait gauge milliseconds sqlserver lock wait rate number of lock requests resulting in a wait gauge number sqlserver batch request rate number of batch requests received by sql server gauge number sqlserver batch sql compilation rate number of sql compilations needed gauge number sqlserver batch sql recompilation rate number of sql recompilations needed gauge number sqlserver page buffer cache hit ratio pages found in the buffer pool without having to read from disk gauge number sqlserver page life expectancy time a page will stay in the buffer pool gauge seconds sqlserver page split rate number of pages split as a result of overflowing index pages gauge number sqlserver page lazy write rate number of lazy writes moving dirty pages to disk gauge number sqlserver page checkpoint flush rate number of pages flushed by operations requiring dirty pages to be flushed gauge number sqlserver page operation rate number of physical database page operations issued gauge number sqlserver transaction log growth count total number of transaction log expansions for a database sum number sqlserver transaction log shrink count total number of transaction logs shrinks for a database sum number sqlserver transaction log usage percent of transaction log space used gauge number sqlserver transaction log flush wait rate number of commits waiting for a transaction log flush gauge number sqlserver transaction log flush rate number of log flushes gauge number sqlserver transaction log flush data rate total number of log bytes flushed gauge bytes sqlserver transaction rate number of transactions started for the database (not including xtp only transactions) gauge number sqlserver transaction write rate number of transactions that were written to the database and committed gauge number sqlserver database latency total time that the users waited for i/o issued on this file sum seconds sqlserver database operations the number of operations issued on the file sum number sqlserver database io the number of bytes of i/o on this file sum bytes sqlserver resource pool disk throttled read rate the number of read operations that were throttled in the last second gauge number sqlserver resource pool disk throttled write rate the number of write operations that were throttled in the last second gauge number sqlserver processes blocked the number of processes that are currently blocked gauge number sqlserver database count the number of databases gauge number for full details on metrics, click here