Alerts Overview
Alerts tell you when something needs attention — an error-rate spike, a service going quiet, a budget threshold crossed. You write the rule once; KloudMate evaluates it on a schedule, groups related firings together, and routes notifications to the channels you care about.
This page covers the concepts behind the Alerts module. Once they click, the rest of the section is a tour of each surface.
Key concepts
Alert rule
An alert rule contains the evaluation criteria: one or more queries, expressions, and a condition. It also specifies how often KloudMate evaluates the rule, how long the condition must hold before the rule fires, and how long it must stay clear before the rule resolves.
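To make the parts of a rule concrete, here is a minimal sketch of how they fit together. It is an illustrative model only: the `AlertRule` class, its field names, the query and expression strings, and the default durations are all assumptions made for this example, not KloudMate's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    """Hypothetical model of an alert rule; not KloudMate's real schema."""
    name: str
    queries: list[str]          # what data to fetch
    expressions: list[str]      # how to reduce or transform it
    condition: str              # what to compare against a threshold
    evaluate_every_s: int = 60  # how often the rule is evaluated (assumed default)
    pending_for_s: int = 300    # how long the condition must hold before firing
    recover_for_s: int = 300    # how long it must stay clear before resolving

rule = AlertRule(
    name="checkout-error-rate",
    queries=["A: error rate for service=checkout"],
    expressions=["B: mean(A)"],
    condition="B > 5",
)
```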
Alert query
A query fetches data from a source. KloudMate alerts support AWS CloudWatch and KloudMate’s own telemetry (logs, metrics, spans) as sources.
Alert expressions
Expressions transform query output with math, reductions, or conditions. They reduce time series to a single number you can compare against a threshold. See Alert Expressions.
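The reduce-then-compare idea can be sketched in a few lines. This is a conceptual illustration rather than KloudMate's expression syntax: the sample values, the `reduce_series` and `condition_holds` helpers, and the threshold are all hypothetical.

```python
from statistics import mean

# Hypothetical query output: one time series of error-rate samples (percent).
samples = [1.2, 0.8, 4.5, 6.1, 5.4]

def reduce_series(values, how="mean"):
    """Collapse a time series into a single number."""
    reducers = {"mean": mean, "max": max, "min": min, "last": lambda v: v[-1]}
    return reducers[how](values)

def condition_holds(value, threshold):
    """Condition: the reduced value is above the threshold."""
    return value > threshold

reduced = reduce_series(samples, how="mean")
breached = condition_holds(reduced, threshold=3.0)
```

Choosing a different reducer changes what the rule is sensitive to: `max` reacts to a single spike, while `mean` needs a sustained rise across the window.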
Alert labels
Labels are key-value pairs attached to each firing alert. They come from query dimensions and any folder the rule lives in, and they’re what Routing Rules match against to decide where notifications go. KloudMate generates labels automatically; you don’t enter them by hand.
Annotations and severity
Annotations carry human-readable context (summary, runbook URL, dashboard link) into each notification. Values support Liquid templates, so you can interpolate live label and state data into the message. The reserved severity annotation key drives downstream routing and incident severity. See Annotations & Severity.
Folders
Folders group related alert rules and let member rules inherit shared defaults: evaluation interval, no-data state, and eval-error state. See Folders.
Alert groups
When matching alerts fire close together, KloudMate bundles them into a single Alert Group: a durable, deduplicated container that updates as higher-severity signals join, holds the auto-RCA if you’ve enabled one, and provides a single thread to notify against. See Alert Groups.
Routing rules
Routing rules decide which channels get notified for which alerts. Each rule matches alerts by their labels, groups them by chosen label keys, and dispatches notifications to one or more channels. See Routing Rules.
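The match, group, and dispatch steps can be sketched as follows. This is a simplified model, assuming exact-equality label matchers; the dictionary layout, label values, and channel names are hypothetical, and real routing rules may support richer matching.

```python
from collections import defaultdict

# Hypothetical firing alerts, each carrying its labels.
alerts = [
    {"labels": {"service": "checkout", "region": "us-east-1", "severity": "critical"}},
    {"labels": {"service": "checkout", "region": "eu-west-1", "severity": "critical"}},
    {"labels": {"service": "search", "region": "us-east-1", "severity": "warning"}},
]

# Hypothetical routing rule: label matchers, group-by keys, destination channels.
routing_rule = {
    "match": {"severity": "critical"},
    "group_by": ["service"],
    "channels": ["pagerduty", "slack-oncall"],
}

def matches(labels, matchers):
    """An alert matches when every matcher key has the expected value."""
    return all(labels.get(key) == value for key, value in matchers.items())

def route(alerts, rule):
    """Group matching alerts by the rule's group-by keys and attach its channels."""
    groups = defaultdict(list)
    for alert in alerts:
        if matches(alert["labels"], rule["match"]):
            group_key = tuple(alert["labels"].get(k) for k in rule["group_by"])
            groups[group_key].append(alert)
    return {
        key: {"alerts": members, "channels": rule["channels"]}
        for key, members in groups.items()
    }

routed = route(alerts, routing_rule)
```

Here both critical `checkout` alerts land in one group keyed by service, so each channel would receive one notification for the pair rather than two.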
Silences
A silence suppresses notifications for matching labels for a bounded time window. This is useful when you know an alert will be noisy and don’t want to flood your channels. See Silences.
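A silence boils down to two checks: is the current time inside the window, and do the alert's labels match? The sketch below models that under stated assumptions: the `is_silenced` helper, the dictionary fields, and exact-equality matching are all hypothetical, not KloudMate's implementation.

```python
from datetime import datetime, timedelta, timezone

def is_silenced(labels, silence, now):
    """True when `now` falls inside the window and all matchers agree."""
    in_window = silence["starts_at"] <= now < silence["ends_at"]
    labels_match = all(
        labels.get(key) == value for key, value in silence["matchers"].items()
    )
    return in_window and labels_match

# Hypothetical two-hour silence for a planned deploy of the checkout service.
start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
silence = {
    "matchers": {"service": "checkout"},
    "starts_at": start,
    "ends_at": start + timedelta(hours=2),
}

during = is_silenced({"service": "checkout"}, silence, start + timedelta(minutes=30))
after = is_silenced({"service": "checkout"}, silence, start + timedelta(hours=3))
other = is_silenced({"service": "search"}, silence, start + timedelta(minutes=30))
```

Only the notification is suppressed in this model; nothing here stops the rule from being evaluated.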
Multi-dimensional alerts
A single rule can produce multiple alert instances, one per dimension. For example, a rule watching Lambda throttling generates one instance per throttled function.
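The one-instance-per-dimension behavior can be illustrated with a small sketch. The function names, throttle counts, the `function_name` label key, and the `evaluate` helper are made up for the example.

```python
# Hypothetical query result: throttle counts keyed by the function-name dimension.
throttles_by_function = {"resize-image": 0, "send-email": 12, "sync-orders": 3}

def evaluate(rule_name, series_by_dimension, threshold):
    """Produce one alert instance for each dimension value that breaches."""
    instances = []
    for function_name, value in series_by_dimension.items():
        if value > threshold:
            instances.append({
                "rule": rule_name,
                "labels": {"function_name": function_name},
                "value": value,
            })
    return instances

instances = evaluate("lambda-throttling", throttles_by_function, threshold=0)
```

One rule yields two instances here, one for `send-email` and one for `sync-orders`, and the dimension value becomes a label on each.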
Workflow of KloudMate Alerts
- A rule retrieves data from its source using queries.
- Expressions reduce or transform query results.
- The condition is evaluated; if it holds for the pending duration, the rule fires.
- The firing alert is tagged with labels and annotations and sent into the grouping engine.
- The grouping engine matches the alert against Routing Rules, opens or appends to an Alert Group, and dispatches notifications to the rule’s destination channels.
- Active Silences and Maintenance Windows can suppress the outbound notification at the dispatch step. The rule keeps evaluating in the background, and the state change is still recorded in history.
KloudMate alert states
- Firing / Alerting: the rule’s condition has held longer than the pending duration.
- Pending: the condition is currently true but hasn’t held long enough yet.
- Recovering: the condition has cleared, but the rule is still firing while it waits out the recovery period before resolving.
- Normal: everything’s quiet.
- Error: the evaluation hit an error (bad query, source unreachable).
- No Data: the query returned no data for the configured window.
For how these states transition — and how the pending duration, recovery period, and no-data / error settings shape them — see Alert Lifecycle & States.
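As a rough illustration of how the pending duration and recovery period shape the Normal, Pending, Firing, and Recovering states, here is a minimal state-machine sketch. It is not KloudMate's implementation: the `step` and `simulate` functions are hypothetical, Error and No Data are left out, a duration counts as satisfied once the elapsed time is at least the configured value, and a breach during Recovering is assumed to return the rule straight to Firing.

```python
NORMAL, PENDING, FIRING, RECOVERING = "Normal", "Pending", "Firing", "Recovering"

def step(state, since, breached, now, pending_for, recover_for):
    """Return (new_state, since) after one evaluation at time `now` (seconds)."""
    if state == NORMAL:
        if not breached:
            return NORMAL, since
        return (FIRING, now) if pending_for == 0 else (PENDING, now)
    if state == PENDING:
        if not breached:
            return NORMAL, now
        return (FIRING, now) if now - since >= pending_for else (PENDING, since)
    if state == FIRING:
        if breached:
            return FIRING, since
        return (NORMAL, now) if recover_for == 0 else (RECOVERING, now)
    if state == RECOVERING:
        if breached:
            return FIRING, now  # assumption: a new breach cancels the recovery
        return (NORMAL, now) if now - since >= recover_for else (RECOVERING, since)
    raise ValueError(f"unknown state: {state}")

def simulate(observations, interval, pending_for, recover_for):
    """Feed one condition result per evaluation and record the resulting states."""
    state, since, history = NORMAL, 0, []
    for i, breached in enumerate(observations):
        state, since = step(state, since, breached, i * interval, pending_for, recover_for)
        history.append(state)
    return history

history = simulate(
    [False, True, True, True, False, False, False],
    interval=60, pending_for=120, recover_for=120,
)
```

With a 60-second interval and 120-second pending and recovery durations, the history reads Normal, Pending, Pending, Firing, Recovering, Recovering, Normal: the rule waits two further evaluations before firing and two before resolving.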