Alarms & Notifications

Understanding KloudMate Alarms

Developers can create custom KloudMate alarms to monitor the events that are critical to their application. Each alarm is based on a set of evaluation criteria that determine whether the alarm is triggered.

Key Concepts of KloudMate Alarms

Alarm Rule

An alarm rule is a set of evaluation criteria that contains one or more queries, expressions, and conditions. It also specifies how often the rule is evaluated and how long the condition must be met before the alarm is triggered.

Alarm Query

A query is a set of configuration instructions used to request data from a data source. For KloudMate alarms, developers can choose AWS CloudWatch or KloudMate as their data source.

Alarm Expressions

An expression is used to manipulate, reduce, or transform the data returned from queries using math operations and reduction functions. It is also used to set alarm conditions, either against data returned directly from a query or against transformed data returned by another expression.

Notification Channel

Notification channels are the ChatOps and communication platforms to which alarm notifications are sent. KloudMate integrates with widely used workplace communication tools such as Slack and email, among others.

Notification Tags

Tags are name-value attributes added to an alarm. They are used to cross-reference and map the alarm to notification policies.

Notification Policies

Notification policies are a set of rules that define when, where, and how alarm notifications are sent. A policy is configured to send a notification to a particular notification channel when an alarm with matching tags is triggered.

Alarm Labels

Alarm labels are key-value pairs containing additional information about the alarm rule.

  • KloudMate assigns labels to alarm queries on the basis of the dimensions that the query uses
  • Alarm Expressions inherit the labels of the alarm queries
  • Labels decide whether operations between queries and expressions are allowed or not
  • Operations between expressions and queries are allowed if they have the same labels or if the labels of one are a subset of the labels of the other

Users can view the labels when they preview the alarms.
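
The label rule above can be illustrated with a small sketch. This is not KloudMate's implementation; the function name `labels_compatible` and the sample label keys are hypothetical, and labels are assumed to be plain key-value pairs as described.

```python
def labels_compatible(a: dict[str, str], b: dict[str, str]) -> bool:
    """Return True if the two label sets are equal or one is a subset of the other.

    Hypothetical helper: models the rule that an operation between a query
    and an expression is allowed only when their labels line up.
    """
    a_items, b_items = set(a.items()), set(b.items())
    return a_items <= b_items or b_items <= a_items


# Assumed example labels, derived from the dimensions a query might use.
per_function = {"FunctionName": "checkout"}
per_function_region = {"FunctionName": "checkout", "Region": "us-east-1"}
other_function = {"FunctionName": "billing"}

print(labels_compatible(per_function, per_function_region))  # subset: allowed
print(labels_compatible(per_function, other_function))       # conflicting value: not allowed
```

Under this reading, two results that share a key but differ in its value cannot be combined, because neither label set contains the other.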

Multi-dimensional Alarms

KloudMate supports the creation of multiple alarms per alarm rule. These are called multi-dimensional alarms. For example, an alarm rule that evaluates whether any Lambda functions have been throttled creates an individual alarm for each function that has been throttled.
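
A minimal sketch of the idea, assuming a query returns one throttle count per Lambda function. The `AlarmInstance` class, the `evaluate_rule` function, and the sample data are hypothetical and only illustrate how a single rule can fan out into one alarm per dimension value.

```python
from dataclasses import dataclass


@dataclass
class AlarmInstance:
    """One alarm produced by a rule for a single dimension value (hypothetical)."""
    rule_name: str
    labels: dict[str, str]
    value: float


def evaluate_rule(rule_name: str, throttles_by_function: dict[str, float],
                  threshold: float = 0.0) -> list[AlarmInstance]:
    """Create one alarm for every function whose throttle count exceeds the threshold."""
    return [
        AlarmInstance(rule_name, {"FunctionName": name}, count)
        for name, count in throttles_by_function.items()
        if count > threshold
    ]


# Assumed query result: throttle counts keyed by function name.
throttles = {"checkout": 3, "billing": 0, "search": 7}
for alarm in evaluate_rule("lambda-throttles", throttles):
    print(alarm.labels["FunctionName"], alarm.value)
```

Here one rule yields two alarms, one for `checkout` and one for `search`, while `billing` produces none.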


Workflow of KloudMate Alarms

An alarm rule is configured to:

  • Retrieve data from the selected data source by running queries
  • Reduce or transform the query results using expressions
  • Compare queries and expressions to each other or to pre-defined thresholds based on conditions
  • Define how long the condition should be met before the alarm gets triggered
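
The query, reduce, and compare steps above can be sketched as follows. The reducer names, the `condition_met` function, and the sample values are assumptions made for illustration; they are not KloudMate's expression syntax.

```python
from statistics import mean

# Hypothetical reduction functions; the set KloudMate offers may differ.
REDUCERS = {
    "mean": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "last": lambda values: values[-1],
}


def condition_met(samples: list[float], reducer: str, threshold: float) -> bool:
    """Reduce a query result to a single value and compare it to a threshold."""
    value = REDUCERS[reducer](samples)
    return value > threshold


# Assumed query result: three latency samples in milliseconds.
latency_ms = [120.0, 340.0, 510.0]
print(condition_met(latency_ms, "mean", 300.0))  # True: mean is about 323.3
print(condition_met(latency_ms, "min", 300.0))   # False: min is 120.0
```

The same samples can pass or fail a condition depending on the reduction chosen, which is why the expression is part of the rule.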

Once the condition has been met for the defined duration, KloudMate matches the tags on the alarm against the tags on the notification policies. A notification is then sent to the notification channel of each matching policy.

Note: If a notification policy has no tags specified, it matches all alarm tags.
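
The tag matching described above might look like the sketch below. It assumes that a policy matches when every tag on the policy is also present, with the same value, on the alarm; the source does not spell out the exact matching semantics. The function names, channel names, and tags are hypothetical.

```python
def policy_matches(policy_tags: dict[str, str], alarm_tags: dict[str, str]) -> bool:
    """Return True if every policy tag appears on the alarm with the same value.

    Assumption: an empty policy tag set matches any alarm, as the note states.
    """
    return all(alarm_tags.get(key) == value for key, value in policy_tags.items())


def route(alarm_tags: dict[str, str], policies: list[dict]) -> list[str]:
    """Return the channels of all policies whose tags match the alarm."""
    return [p["channel"] for p in policies if policy_matches(p["tags"], alarm_tags)]


# Hypothetical policies: one scoped to a team, one catch-all with no tags.
policies = [
    {"channel": "slack-payments-oncall", "tags": {"team": "payments"}},
    {"channel": "email-all-alarms", "tags": {}},
]

print(route({"team": "payments", "env": "prod"}, policies))
print(route({"team": "search"}, policies))
```

The first alarm reaches both channels, while the second reaches only the catch-all policy.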

The following diagram gives a general outline of the lifecycle of a KloudMate alarm:

KloudMate Alarm Workflow


KloudMate Alarm States

KloudMate alarm states are the status indicators of the alarms. KloudMate lets users filter and view their alarms by state. The alarm states are:

  • Firing/Alerting: The state of an alarm that has been active for longer than the configured duration
  • Pending: The state of an alarm that has been active for less than the configured duration
  • Normal: The state of an alarm that is neither firing nor pending, meaning everything is working as expected
  • Error: The state of an alarm when an error occurred during the evaluation
  • No Data: The state of an alarm when no data has been received for the configured time window
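
The states above can be read as a simple decision procedure. The sketch below is one interpretation, not KloudMate's evaluation logic: the `resolve_state` function and its parameters are hypothetical, and the order of the checks is an assumption. An alarm that has been active for exactly the configured duration is treated as firing here, which the source does not specify.

```python
from enum import Enum
from typing import Optional


class AlarmState(Enum):
    NORMAL = "Normal"
    PENDING = "Pending"
    FIRING = "Firing"
    ERROR = "Error"
    NO_DATA = "No Data"


def resolve_state(condition_met: Optional[bool], active_seconds: float,
                  for_seconds: float, error: bool = False) -> AlarmState:
    """Map one evaluation result to an alarm state.

    condition_met is None when no data was received for the time window.
    active_seconds is how long the condition has been continuously met.
    """
    if error:
        return AlarmState.ERROR
    if condition_met is None:
        return AlarmState.NO_DATA
    if not condition_met:
        return AlarmState.NORMAL
    # Assumption: reaching the configured duration exactly counts as firing.
    if active_seconds >= for_seconds:
        return AlarmState.FIRING
    return AlarmState.PENDING


# Condition met for 2 minutes against a 5-minute duration.
print(resolve_state(True, active_seconds=120, for_seconds=300).value)
# Condition met for 6 minutes against the same duration.
print(resolve_state(True, active_seconds=360, for_seconds=300).value)
```

An alarm therefore moves from Normal to Pending as soon as its condition is met, and from Pending to Firing once the condition has held for the configured duration.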
