Rule Evaluation
A rule is the unit of detection in Lumetry. It binds a metric to a threshold strategy and a set of trigger and recovery conditions, and it produces violations when the metric misbehaves. This page explains how evaluation works conceptually — what a rule is, how a point becomes a violation, and how the warning/critical alert-level model works.
For the threshold math itself (baselines and the dynamic modes), see Dynamic Thresholds & Baselines.
What a rule binds together
A rule contains three groups of settings:
- Target: which metric it watches (metricId, the metric key) and in which direction a deviation counts (Above, Below, or Both).
- Threshold strategy: Static (fixed numeric limits) or Dynamic (limits derived from a learned baseline, using the Percentage, Stddev, or Envelope mode).
- Trigger & recovery logic: how many violations, over what window, are needed to raise an alert, and how much quiet time is needed to recover it.
A rule can only be created against a metric that is registered in the catalog with isActive = true and dataStatus = ACTIVE (see Metrics & the Metric Catalog).
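As a rough illustration of the three groups of settings, a rule could be modeled as a small record like the one below. This is a minimal sketch, not Lumetry's actual schema: the class names, the snake_case field names, the default values, and the `can_create_rule` helper are all hypothetical and only mirror the settings named above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Direction(Enum):
    ABOVE = "Above"
    BELOW = "Below"
    BOTH = "Both"


class Strategy(Enum):
    STATIC = "Static"
    DYNAMIC = "Dynamic"


@dataclass(frozen=True)
class CatalogMetric:
    """Hypothetical view of a catalog entry; only the two gating fields."""
    metric_id: str
    is_active: bool
    data_status: str


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record grouping the three kinds of settings."""
    # Target
    metric_id: str
    direction: Direction
    # Threshold strategy
    strategy: Strategy
    dynamic_mode: Optional[str] = None  # "Percentage", "Stddev" or "Envelope"
    static_lower: Optional[float] = None
    static_upper: Optional[float] = None
    # Trigger & recovery logic (defaults are placeholders, not product defaults)
    trigger_violation_count: int = 1
    trigger_window_minutes: int = 1
    recovery_normal_count: int = 1
    recovery_window_minutes: int = 1


def can_create_rule(metric: CatalogMetric) -> bool:
    """A rule may only target a metric that is active in the catalog."""
    return metric.is_active and metric.data_status == "ACTIVE"
```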
Multiple conditions (compound rules)
A rule can combine more than one condition into a single alarm. Each condition watches its own metric with its own threshold strategy — so one condition can be static while another is dynamic — and the rule combines them with a boolean operator:
- All (AND) — the alarm raises only while every condition is breached.
- Any (OR) — the alarm raises while at least one condition is breached.
For example, an internet-banking alarm can fire when login count is above 500 or money transfer count is below 100. The compound rule still produces one alarm at a single severity (Warning or Critical); the individual conditions do not raise separate alarms.
A rule with a single condition behaves exactly like a classic single-metric rule. Compound rules are evaluated on a short recurring cycle (so they can read the current value of every metric they reference), while single-condition rules continue to evaluate as each point arrives. Conditions apply to single-value metrics; fleet (multidimensional) rules use scope and the warning/critical levels instead.
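The All/Any combination can be sketched in a few lines. The example below assumes each condition has already been reduced to a predicate over its metric's current value; `Condition`, `compound_breached`, and the metric keys are hypothetical names chosen for illustration, and the thresholds come from the internet-banking example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Condition:
    """One condition of a compound rule (hypothetical shape)."""
    metric_id: str
    is_breached: Callable[[float], bool]


def compound_breached(
    conditions: List[Condition],
    operator: str,
    current_values: Dict[str, float],
) -> bool:
    """Combine per-condition results with All (AND) or Any (OR)."""
    results = [c.is_breached(current_values[c.metric_id]) for c in conditions]
    if operator == "All":
        return all(results)
    if operator == "Any":
        return any(results)
    raise ValueError(f"unknown operator: {operator}")


# Login count above 500, or money transfer count below 100.
conditions = [
    Condition("login_count", lambda v: v > 500),
    Condition("money_transfer_count", lambda v: v < 100),
]
values = {"login_count": 520.0, "money_transfer_count": 150.0}

any_result = compound_breached(conditions, "Any", values)  # one condition breached
all_result = compound_breached(conditions, "All", values)  # not every condition breached
```

Because the function needs the current value of every referenced metric at once, it fits the recurring-cycle evaluation described above rather than per-point evaluation.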
From a point to a violation
Evaluation runs continuously and asynchronously, decoupled from ingestion. The flow for a single metric point is:
    metric point ─▶ load active rules for this metric
                 ─▶ load current baseline (dynamic rules only)
                 ─▶ compute the threshold band for this point's time
                 ─▶ is the value outside the band in the watched direction?
                      yes ─▶ write a VIOLATION (point-level, forensic)
                      no  ─▶ nothing
A violation is a single point-level breach. It records the actual value, the expected value and thresholds at that moment, the direction, and a baseline-confidence indicator. Violations are the raw, forensic layer — one row per breaching point. They are not alerts; many violations may roll up into one alert, and an alert is what an operator actually reacts to.
Violation writes are idempotent per (rule, metric, timestamp), so re-processing a point never double-counts it.
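The band check and the idempotent write can be sketched together. This is an illustrative model only: `ViolationStore` is an in-memory stand-in for wherever violations are persisted, the function and field names are hypothetical, the band is passed in already computed, and treating a value exactly on the band edge as normal is an assumption. The expected value and baseline-confidence fields of a real violation are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Violation:
    rule_id: str
    metric_id: str
    timestamp: int   # epoch seconds of the breaching point
    actual: float
    lower: float
    upper: float
    direction: str   # "Above" or "Below": the side that was breached


class ViolationStore:
    """In-memory stand-in for the violation table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str, int], Violation] = {}

    def write(self, violation: Violation) -> bool:
        """Insert once per (rule, metric, timestamp); repeats are no-ops."""
        key = (violation.rule_id, violation.metric_id, violation.timestamp)
        if key in self._rows:
            return False
        self._rows[key] = violation
        return True

    def __len__(self) -> int:
        return len(self._rows)


def breached_side(value: float, lower: float, upper: float, watched: str) -> Optional[str]:
    """Return the breached side if it is one the rule watches, else None."""
    if watched in ("Above", "Both") and value > upper:
        return "Above"
    if watched in ("Below", "Both") and value < lower:
        return "Below"
    return None


def evaluate_point(store: ViolationStore, rule_id: str, metric_id: str,
                   timestamp: int, value: float,
                   lower: float, upper: float, watched: str) -> bool:
    """Write a violation if the point breaches; True only for a new row."""
    side = breached_side(value, lower, upper, watched)
    if side is None:
        return False
    return store.write(
        Violation(rule_id, metric_id, timestamp, value, lower, upper, side)
    )
```

Replaying the same point twice leaves a single row in the store, which is the property that makes re-processing safe.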
Why evaluation does not re-query the source
Production evaluation uses the value that was ingested, not a fresh read from any source adapter. This keeps evaluation fast, deterministic, and independent of source availability: a momentarily slow database cannot stall or distort detection.
Trigger logic: when violations become an alert
A handful of scattered violations usually aren't worth waking someone for. A rule defines a trigger condition so that an alert opens only when a breach is sustained or repeated:
- Trigger violation count — how many violations must occur…
- …within the trigger/violation window (minutes) — …for an alert to open.
For example, "open an alert when there are 5 violations within 10 minutes." This is what separates a transient spike from a real problem, and it is the primary lever against alert fatigue.
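One way to picture the trigger condition is a sliding window over violation timestamps. The sketch below is an assumption about how such a check could be written, not a description of Lumetry's internals: `TriggerWindow` is a hypothetical name, violations are assumed to arrive in time order, and a violation exactly one window old is still counted.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque


class TriggerWindow:
    """Counts violations inside a sliding time window (hypothetical helper)."""

    def __init__(self, violation_count: int, window_minutes: int) -> None:
        self.violation_count = violation_count
        self.window = timedelta(minutes=window_minutes)
        self._times: Deque[datetime] = deque()

    def record_violation(self, at: datetime) -> bool:
        """Record a violation; return True if the trigger condition is met."""
        self._times.append(at)
        cutoff = at - self.window
        # Drop violations that fell out of the window.
        while self._times and self._times[0] < cutoff:
            self._times.popleft()
        return len(self._times) >= self.violation_count


# "Open an alert when there are 5 violations within 10 minutes."
start = datetime(2024, 1, 1, 12, 0)
trigger = TriggerWindow(violation_count=5, window_minutes=10)
fired = [
    trigger.record_violation(start + timedelta(minutes=m))
    for m in (0, 2, 4, 6, 8)
]
```

In this run only the fifth violation satisfies the condition. Spacing the same five violations three minutes apart would not, because the first one has left the window by the time the fifth arrives.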
Recovery logic: when an alert closes
An alert does not close on the first good data point — that would cause flapping. Instead a rule defines a recovery condition:
- Recovery normal count — how many consecutive normal points are required…
- …within the recovery window (minutes) — …before the alert is considered recovered.
Only when the metric stays clean for the configured recovery window does the alert transition to closed.
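The recovery condition can be sketched the same way. The reading used here is one interpretation of the two settings: the alert recovers once the most recent `normal_count` points are all normal and fall inside the recovery window, and any violation resets the streak. `RecoveryTracker` and its method names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque


class RecoveryTracker:
    """Tracks consecutive normal points for an open alert (hypothetical)."""

    def __init__(self, normal_count: int, window_minutes: int) -> None:
        self.normal_count = normal_count
        self.window = timedelta(minutes=window_minutes)
        # Keep only the timestamps of the latest normal_count normal points.
        self._streak: Deque[datetime] = deque(maxlen=normal_count)

    def observe(self, at: datetime, is_normal: bool) -> bool:
        """Feed one point; return True when the alert can be closed."""
        if not is_normal:
            self._streak.clear()  # a violation breaks the streak
            return False
        self._streak.append(at)
        if len(self._streak) < self.normal_count:
            return False
        # Enough consecutive normal points: check they fit in the window.
        return at - self._streak[0] <= self.window


start = datetime(2024, 1, 1, 12, 0)
tracker = RecoveryTracker(normal_count=3, window_minutes=5)
observations = [(0, True), (1, False), (2, True), (3, True), (4, True)]
recovered = [
    tracker.observe(start + timedelta(minutes=m), ok) for m, ok in observations
]
```

The single good point at minute 0 does not close the alert, and the violation at minute 1 resets the count, so recovery is only reported at minute 4.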
Fleet alert levels: Warning and Critical
Single-value rules use one rule-level severity: Warning or Critical. Fleet (multidimensional) rules can carry two alert levels, evaluated independently:
| Level field | Meaning |
|---|---|
| severity | Warning or Critical. |
| operator | Comparison direction for this level (Above, Below, Both). |
| thresholdValue | The level's threshold. Its meaning depends on the rule type (see below). |
| durationMinutes | How long the breach must persist for this level. |
| triggerViolationCount | How many violations trigger this level. |
| isEnabled | Whether this level is active. |
The meaning of thresholdValue depends on the threshold strategy:
- Static rule: thresholdValue is the literal comparison threshold (e.g. 90).
- Dynamic rule: thresholdValue is interpreted by the mode: it is the tolerance percent for Percentage and Envelope modes, and the standard-deviation multiplier for Stddev mode.
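To make the two dynamic interpretations concrete, the sketch below turns a `threshold_value` into a band around an expected value. It is a simplified illustration under stated assumptions: the band is taken to be symmetric around the expected value, `dynamic_band` is a hypothetical function, and Envelope mode is deliberately left out because its math is not described on this page. The authoritative formulas are in Dynamic Thresholds & Baselines.

```python
from typing import Tuple


def dynamic_band(
    mode: str,
    threshold_value: float,
    expected: float,
    stddev: float = 0.0,
) -> Tuple[float, float]:
    """Return an assumed (lower, upper) band for a dynamic level."""
    if mode == "Percentage":
        # threshold_value is a tolerance percent of the expected value.
        margin = abs(expected) * threshold_value / 100.0
    elif mode == "Stddev":
        # threshold_value is a multiplier on the baseline's standard deviation.
        margin = threshold_value * stddev
    else:
        raise NotImplementedError(f"mode not covered by this sketch: {mode}")
    return expected - margin, expected + margin


# 20 percent tolerance around an expected value of 200.
pct_band = dynamic_band("Percentage", 20, expected=200.0)
# 3 standard deviations (5.0 each) around an expected value of 100.
std_band = dynamic_band("Stddev", 3, expected=100.0, stddev=5.0)
```

For a static rule no such computation is needed: the value is compared against `threshold_value` directly.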
For fleet rules, Critical is evaluated before Warning. A point that breaches Critical is attributed to Critical, not double-counted as Warning. In the current model a Critical level must use the same operator as Warning, have a duration at least as long as Warning's, and be stricter than Warning, so Critical is always the more serious, harder-to-trip level.
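The evaluation order and the consistency constraints can be sketched for static levels. The code below is illustrative only: `Level`, `attribute`, and `critical_is_consistent` are hypothetical names, only the Above and Below operators are handled, and "stricter" is assumed to mean a higher threshold for Above and a lower one for Below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    severity: str            # "Warning" or "Critical"
    operator: str            # "Above" or "Below" in this sketch
    threshold_value: float
    duration_minutes: int
    is_enabled: bool = True


def breaches(value: float, level: Level) -> bool:
    """Static comparison of a value against one level."""
    if not level.is_enabled:
        return False
    if level.operator == "Above":
        return value > level.threshold_value
    if level.operator == "Below":
        return value < level.threshold_value
    raise ValueError(f"operator not covered by this sketch: {level.operator}")


def attribute(value: float, critical: Level, warning: Level) -> Optional[str]:
    """Check Critical first so a point is never counted at both levels."""
    if breaches(value, critical):
        return critical.severity
    if breaches(value, warning):
        return warning.severity
    return None


def critical_is_consistent(critical: Level, warning: Level) -> bool:
    """Same operator, duration not shorter, and a stricter threshold."""
    if critical.operator != warning.operator:
        return False
    if critical.duration_minutes < warning.duration_minutes:
        return False
    if critical.operator == "Above":
        return critical.threshold_value > warning.threshold_value
    if critical.operator == "Below":
        return critical.threshold_value < warning.threshold_value
    raise ValueError(f"operator not covered by this sketch: {critical.operator}")


warning = Level("Warning", "Above", 80.0, duration_minutes=5)
critical = Level("Critical", "Above", 90.0, duration_minutes=5)
```

With these two levels, a value of 95 is attributed to Critical only, 85 to Warning, and 70 to neither.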
Previewing a rule before saving
Before committing a rule you can preview it: Lumetry replays the chosen metric over a historical time range, computes the threshold band it would have used, and returns a series of points each marked as normal, violating, or alarm-active. This lets you tune tolerance and trigger settings against real history and see how noisy or quiet the rule will be before it ever pages anyone.
For static previews, that threshold band is the configured staticLower / staticUpper range. Static preview points do not carry a baseline expected value.
Preview reads values from metric history by metric key. A catalog-only metric with no available history cannot be previewed until data has been ingested.
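A static preview can be approximated by replaying history through the band check and the trigger and recovery counters. The sketch below is a simplified model, not the preview endpoint itself: `preview_static` and its parameters are hypothetical, timestamps are plain minute offsets, both bounds are watched, the recovery window is ignored, and the point on which the alarm recovers is labeled normal by assumption.

```python
from collections import deque
from typing import Deque, List, Tuple


def preview_static(
    points: List[Tuple[int, float]],   # (minute offset, value), in time order
    static_lower: float,
    static_upper: float,
    trigger_count: int,
    trigger_window_minutes: int,
    recovery_normal_count: int,
) -> List[Tuple[int, float, str]]:
    """Label each historical point as normal, violating, or alarm-active."""
    labeled: List[Tuple[int, float, str]] = []
    recent: Deque[int] = deque()       # minutes of recent violations
    alarm_active = False
    normal_streak = 0

    for minute, value in points:
        violating = value < static_lower or value > static_upper
        if violating:
            normal_streak = 0
            recent.append(minute)
            # Keep only violations inside the trigger window.
            while recent and recent[0] < minute - trigger_window_minutes:
                recent.popleft()
            if len(recent) >= trigger_count:
                alarm_active = True
        else:
            normal_streak += 1
            if alarm_active and normal_streak >= recovery_normal_count:
                alarm_active = False
                recent.clear()

        if alarm_active:
            label = "alarm-active"
        elif violating:
            label = "violating"
        else:
            label = "normal"
        labeled.append((minute, value, label))
    return labeled


history = [(0, 50.0), (1, 95.0), (2, 96.0), (3, 97.0), (4, 50.0), (5, 50.0)]
preview = preview_static(history, 10.0, 90.0, trigger_count=3,
                         trigger_window_minutes=5, recovery_normal_count=2)
labels = [label for _, _, label in preview]
```

Rerunning the same history with a different `trigger_count` or band is the kind of tuning loop the preview is meant to support.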
What evaluation produces
| Output | Layer | Audience |
|---|---|---|
| Violation | Raw, point-level, forensic | Investigation / audit |
| Alert (operational) | Aggregated lifecycle (open → ack → close) | On-call operator |
| Incident | Correlated group of alerts | Incident response |
The next concept page follows that chain end to end: Violations, Alerts & Incidents.