Understanding KloudMate Alarms
Developers can create custom KloudMate alarms to monitor the events that are critical to their application. These alarms are based on evaluation criteria that determine whether an alarm is triggered.
An alarm rule is a set of evaluation criteria made up of one or more queries, expressions, and conditions. It also specifies how often the rule is evaluated and how long the condition must be met before the alarm is triggered.
A query is a set of configuration instructions used to request data from a data source. For KloudMate alarms, developers can choose either AWS CloudWatch or KloudMate as the data source.
An expression is used to manipulate, reduce, or transform the data returned by queries using math operations and reduction functions. Expressions are also used to set alarm conditions, either against data returned directly by a query or against transformed data returned by an expression.
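To make the chain from query to expression to condition concrete, here is a minimal Python sketch. It assumes query results arrive as plain lists of numbers; the reducer names (`last`, `max`, `min`, `mean`, `sum`), the `reduce_series` helper, and the sample values are hypothetical and do not reflect KloudMate's actual expression syntax.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical reduction functions; the set KloudMate actually offers may differ.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "last": lambda series: series[-1],
    "max": max,
    "min": min,
    "mean": mean,
    "sum": sum,
}


def reduce_series(series: List[float], how: str) -> float:
    """Reduction expression: collapse a query's time series into one number."""
    if not series:
        raise ValueError("cannot reduce an empty series")
    return REDUCERS[how](series)


# Illustrative query results: per-minute datapoints for two metrics.
errors = [2.0, 5.0, 3.0, 6.0]
invocations = [100.0, 120.0, 80.0, 100.0]

# Reduce each query result to a single value over the window.
total_errors = reduce_series(errors, "sum")
total_invocations = reduce_series(invocations, "sum")

# Math expression on the reduced values: error rate in percent.
error_rate = total_errors * 100 / total_invocations

# Condition: compare the expression result with a fixed threshold.
THRESHOLD_PERCENT = 3.0
condition_met = error_rate > THRESHOLD_PERCENT
print(f"error rate {error_rate:.1f}% -> condition met: {condition_met}")
```

Reducing first is what turns a whole time series into the single value a threshold comparison needs; the math step then works on plain numbers.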
Notification channels are the ChatOps and communication platforms to which alarm notifications are sent. KloudMate integrates with widely used workplace communication tools such as Slack, email, and others.
Tags add name-value attributes to an alarm so that the alarm can be cross-referenced and mapped to notification policies.
Notification policies are a set of rules that define when, where, and how alarm notifications are sent. A notification policy is configured to send a notification to a particular notification channel when an alarm with matching tags is triggered.
Alarm labels are key-value pairs containing additional information about the alarm rule.
- KloudMate assigns labels to alarm queries based on the dimensions each query uses
- Alarm expressions inherit the labels of the alarm queries
- Labels determine whether operations between queries and expressions are allowed
- Operations between expressions and queries are allowed if they have the same labels or when one label set is a subset of the other
Users can view the labels when they preview the alarms.
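The compatibility rule for labels can be expressed as a subset check on key-value pairs. This Python sketch encodes only the rule stated above; the `can_combine` helper and the example label values (modelled on Lambda's `FunctionName` and `Resource` dimensions) are hypothetical.

```python
from typing import Dict

Labels = Dict[str, str]


def can_combine(left: Labels, right: Labels) -> bool:
    """Return True if an operation between two label sets is allowed.

    The label sets must be identical, or one must be a subset of the
    other (same keys with the same values).
    """
    return left.items() <= right.items() or right.items() <= left.items()


# Hypothetical labels derived from Lambda metric dimensions.
checkout = {"FunctionName": "checkout"}
checkout_alias = {"FunctionName": "checkout", "Resource": "checkout:prod"}
payments = {"FunctionName": "payments"}

print(can_combine(checkout, checkout))        # True: identical labels
print(can_combine(checkout, checkout_alias))  # True: one is a subset of the other
print(can_combine(checkout, payments))        # False: neither is a subset
```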
A single KloudMate alarm rule can create multiple alarms; these are called multi-dimensional alarms. For example, an alarm rule that checks whether any Lambda functions have been throttled creates an individual alarm for each throttled function.
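A multi-dimensional rule can be pictured as one condition applied separately to every label set that a query returns. The Python sketch below assumes datapoints come back as `(labels, value)` pairs and treats a function as throttled when its throttle count summed over the window is above zero; the helper names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]
SeriesKey = Tuple[Tuple[str, str], ...]


def split_by_labels(points: List[Tuple[Labels, float]]) -> Dict[SeriesKey, List[float]]:
    """Group datapoints into one series per distinct label set."""
    series: Dict[SeriesKey, List[float]] = defaultdict(list)
    for labels, value in points:
        # Sort so {"a": "x", "b": "y"} and {"b": "y", "a": "x"} share a key.
        series[tuple(sorted(labels.items()))].append(value)
    return series


def evaluate_rule(points: List[Tuple[Labels, float]], threshold: float) -> Dict[SeriesKey, bool]:
    """One rule, many alarms: evaluate the condition separately for each series."""
    return {key: sum(values) > threshold for key, values in split_by_labels(points).items()}


# Illustrative Throttles datapoints returned by a single query.
points = [
    ({"FunctionName": "checkout"}, 0.0),
    ({"FunctionName": "checkout"}, 3.0),
    ({"FunctionName": "payments"}, 0.0),
    ({"FunctionName": "emails"}, 1.0),
]

for key, firing in evaluate_rule(points, threshold=0.0).items():
    print(dict(key), "throttled" if firing else "ok")
```

Here `checkout` and `emails` each get their own alarm, while `payments` stays normal, all from the same rule.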
An alarm rule is configured to:
- Retrieve data from the selected data source by running queries
- Reduce or transform the query results using expressions
- Compare queries and expressions to each other or to pre-defined thresholds based on conditions
- Define how long the condition should be met before the alarm gets triggered
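The parts of a rule listed above can be captured in a small configuration object. This Python sketch is a hypothetical shape, not KloudMate's schema: queries and expressions are referred to by made-up reference IDs, `condition` names the query or expression whose result decides the outcome, and `evaluate_every` and `pending_for` stand in for the evaluation frequency and the duration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class AlarmRule:
    """Hypothetical alarm rule shape; field names are illustrative only."""
    name: str
    queries: List[str]          # reference IDs of queries that retrieve data
    expressions: List[str]      # reference IDs of expressions that reduce/transform it
    condition: str              # reference ID whose result decides the outcome
    evaluate_every: timedelta   # how often the rule is evaluated
    pending_for: timedelta      # how long the condition must hold before firing

    def validate(self) -> None:
        """Reject configurations that could never be evaluated."""
        if not self.queries:
            raise ValueError("a rule needs at least one query")
        known = set(self.queries) | set(self.expressions)
        if self.condition not in known:
            raise ValueError(f"condition {self.condition!r} refers to no query or expression")
        if self.evaluate_every <= timedelta(0):
            raise ValueError("evaluation interval must be positive")
        if self.pending_for < timedelta(0):
            raise ValueError("duration cannot be negative")


rule = AlarmRule(
    name="lambda-error-rate",
    queries=["A", "B"],        # e.g. Errors and Invocations
    expressions=["C", "D"],    # e.g. a reduction, then a rate calculation
    condition="D",
    evaluate_every=timedelta(minutes=1),
    pending_for=timedelta(minutes=5),
)
rule.validate()  # passes: "D" is a defined expression
```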
Once the condition has been true for the defined duration, KloudMate matches the tags added to the alarm against the tags added to the notification policy. A notification is then sent to the appropriate notification channel.
Note: If a notification policy specifies no tags, it matches all alarms regardless of their tags.
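Tag matching can be sketched as a subset check. The Python below assumes a policy matches when every one of its tags appears on the alarm with the same value, and that every matching policy receives the notification; KloudMate's exact matching and routing rules may differ. An empty policy tag set matches every alarm, consistent with the note above. The `NotificationPolicy` class and the channel strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

Tags = Dict[str, str]


@dataclass
class NotificationPolicy:
    tags: Tags
    channel: str  # illustrative channel identifier, not a KloudMate format


def policy_matches(policy_tags: Tags, alarm_tags: Tags) -> bool:
    """Assumed rule: all policy tags must be present on the alarm with equal values.

    An empty policy tag set is a subset of any alarm's tags, so it matches everything.
    """
    return policy_tags.items() <= alarm_tags.items()


def channels_for(alarm_tags: Tags, policies: List[NotificationPolicy]) -> List[str]:
    """Return the channels of every policy whose tags match the alarm."""
    return [p.channel for p in policies if policy_matches(p.tags, alarm_tags)]


policies = [
    NotificationPolicy(tags={"team": "payments"}, channel="slack:#payments-alerts"),
    NotificationPolicy(tags={"team": "search"}, channel="slack:#search-alerts"),
    NotificationPolicy(tags={}, channel="email:ops@example.com"),  # no tags: matches all
]

print(channels_for({"team": "payments", "env": "prod"}, policies))
# ['slack:#payments-alerts', 'email:ops@example.com']
```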
The following diagram gives a general outline of the lifecycle of a KloudMate alarm:
Alarm states indicate the current status of each alarm, and KloudMate lets users filter and view their alarms by state. The KloudMate alarm states are:
- Firing/Alerting: The state of an alarm that has been active for longer than the configured duration
- Pending: The state of an alarm that has been active for less than the configured duration
- Normal: The state of an alarm that is neither firing nor pending; everything is working as expected
- Error: The state of an alarm when an error occurred during evaluation
- No Data: The state of an alarm when no data has been received for the configured time window
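The transitions between these states can be sketched as a small state machine advanced once per evaluation. This Python sketch is hypothetical: it assumes the duration is measured from the first evaluation in which the condition is true, that the alarm fires once that span reaches the configured duration, and that Error, No Data, and Normal all reset the timer. KloudMate's real handling of errors and missing data may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class AlarmState(Enum):
    NORMAL = "Normal"
    PENDING = "Pending"
    FIRING = "Firing"
    ERROR = "Error"
    NO_DATA = "No Data"


@dataclass
class AlarmTracker:
    pending_for: timedelta                 # the configured duration
    state: AlarmState = AlarmState.NORMAL
    active_since: Optional[datetime] = None

    def evaluate(self, now: datetime, condition: Optional[bool], error: bool = False) -> AlarmState:
        """Advance the state after one evaluation; condition=None means no data."""
        if error:
            self.state, self.active_since = AlarmState.ERROR, None
        elif condition is None:
            self.state, self.active_since = AlarmState.NO_DATA, None
        elif not condition:
            self.state, self.active_since = AlarmState.NORMAL, None
        else:
            if self.active_since is None:
                self.active_since = now    # condition just became true
            if now - self.active_since >= self.pending_for:
                self.state = AlarmState.FIRING
            else:
                self.state = AlarmState.PENDING
        return self.state


tracker = AlarmTracker(pending_for=timedelta(minutes=5))
start = datetime(2024, 1, 1, 12, 0)
states = []
for minute, condition in enumerate([True, True, True, True, True, True, False]):
    states.append(tracker.evaluate(start + timedelta(minutes=minute), condition))
print([s.value for s in states])
# ['Pending', 'Pending', 'Pending', 'Pending', 'Pending', 'Firing', 'Normal']
```

With a five-minute duration and one-minute evaluations, the alarm stays Pending until the condition has held for five minutes, fires, and returns to Normal as soon as the condition clears.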