Alerts, Violations & Incidents API
These endpoints expose the operational signal: raw violations, operational alerts, and correlated incidents. The model behind them is in Violations, Alerts & Incidents.
Permissions: reads require CanViewAlerts. Lifecycle actions
(acknowledge/close/resolve) require operator permission.
Identifiers: alerts, violations, and incidents are identified by an integer id. metricId filters and
fields use the string metric key; the field is nullable for external alerts.
Alerts (operational)
/api/alerts is the operational alert contract. /api/events is a backward-compatible
alias with the same shapes.
GET /api/alerts
GET /api/alerts?metricId=&ruleId=&severity=&status=&incidentId=&from=&to=&limit=500
All query parameters are optional filters.
| Param | Filters by |
|---|---|
| metricId | Metric key. |
| ruleId | Rule integer ID. Some detector-driven alerts may not have a rule ID. |
| severity | Warning / Critical. |
| status | Operational status (Open, Acknowledged, Closed). |
| incidentId | Alerts belonging to one incident. |
| from / to | Trigger-time range. |
| limit | Max rows (default 500). |
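As a minimal sketch of how a client might assemble this query, the helper below keeps only the filters that are set. The host name and the helper name `build_alerts_url` are hypothetical, and passing `from` / `to` as ISO 8601 timestamps is an assumption based on the timestamps in the sample responses.

```python
from urllib.parse import urlencode


def build_alerts_url(base_url, filters=None):
    """Return a GET /api/alerts URL carrying only the filters that are set."""
    # Omit unset filters entirely so the server applies its own defaults.
    params = {name: value for name, value in (filters or {}).items() if value is not None}
    path = base_url.rstrip("/") + "/api/alerts"
    return f"{path}?{urlencode(params)}" if params else path


url = build_alerts_url(
    "https://lumetry.example.com",  # hypothetical host
    {"severity": "Critical", "status": "Open", "ruleId": None, "limit": 100},
)
print(url)
```

Filters are passed as a dict rather than keyword arguments because `from` is a reserved word in Python.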
Response — array of alerts:
[
  {
    "id": 9012,
    "ruleId": 17,
    "metricId": "database.health",
    "eventStartTime": "2026-06-01T08:40:00Z",
    "eventEndTime": null,
    "status": "Open",
    "operationalStatus": "Open",
    "severity": "Warning",
    "triggerReason": "11 violations within 10 minutes",
    "violationCount": 11,
    "alertName": "High impact metric event",
    "metricDisplayName": "Database Health",
    "relatedServiceId": null,
    "relatedServiceName": null,
    "relatedCiId": 51,
    "relatedCiName": "ANVMBWEBWX01",
    "dimensionKey": "host.id=srv-123|mount=/var",
    "incidentId": null,
    "acknowledgedAt": null,
    "alertLevelId": 3,
    "durationSeconds": null,
    "incidentEligibility": null,
    "postResolutionWatchSince": null,
    "lastResolvedIncidentId": null
  }
]
| Field | Meaning |
|---|---|
| operationalStatus | Lifecycle: Open, Acknowledged, or Closed. |
| severity | Warning / Critical. |
| triggerReason | Why the alert opened (the trigger condition that was met). |
| violationCount | Number of underlying violations aggregated. |
| alertName / metricDisplayName | Human labels for the alert and its metric. |
| dimensionKey | The multidimensional member that opened the alert, when applicable, formatted as pipe-separated `k=v` pairs. |
| relatedServiceId/Name, relatedCiId/Name | Topology context: the service and CI this alert affects. |
| incidentId | The incident this alert was correlated into, if any. |
| eventStartTime / eventEndTime / durationSeconds | When it opened, closed, and for how long. |
| alertLevelId | The rule alert level that fired (Warning/Critical level). |
| incidentEligibility / postResolutionWatchSince | incidentEligibility is PostResolutionWatch when the alert stayed active after its linked incident was resolved, and postResolutionWatchSince is the time it entered that watch state. incidentEligibility is MaintenanceSuppressed when active maintenance suppresses incident generation and notifications. |
| lastResolvedIncidentId | The resolved incident that placed this alert into post-resolution watch, when applicable. |
Rule-based alerts carry ruleId and may carry alertLevelId. Detector-driven alerts
such as multidimensional peer deviations can have ruleId: null; use metricId,
dimensionKey, triggerReason, topology context, and incident linkage to group and
display them. Multiple dimension-level alerts, for example several mounts on one host,
can be correlated into the same incident while remaining separate alert rows.
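The grouping described above can be sketched as follows. This is an illustration rather than the product's own logic: the helper names are hypothetical, the alert rows are invented for the example, and the parser assumes that dimension values never contain `|` or need unescaping.

```python
from collections import defaultdict


def parse_dimension_key(dimension_key):
    """Split a 'k=v|k=v' dimension key into a dict ({} when the key is null)."""
    if not dimension_key:
        return {}
    parsed = {}
    for part in dimension_key.split("|"):
        name, _, value = part.partition("=")
        parsed[name] = value
    return parsed


def group_by_incident(alerts):
    """Map incidentId (None for uncorrelated alerts) to its alert rows."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.get("incidentId")].append(alert)
    return dict(groups)


# Illustrative rows: two mounts on one host correlated into incident 1.
alerts = [
    {"id": 9012, "ruleId": None, "incidentId": 1, "dimensionKey": "host.id=srv-123|mount=/var"},
    {"id": 9013, "ruleId": None, "incidentId": 1, "dimensionKey": "host.id=srv-123|mount=/home"},
    {"id": 9014, "ruleId": 17, "incidentId": None, "dimensionKey": None},
]
for incident_id, rows in group_by_incident(alerts).items():
    mounts = [parse_dimension_key(row["dimensionKey"]).get("mount") for row in rows]
    print(incident_id, mounts)
```

Each alert stays a separate row; only the display grouping changes.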
sourceKind distinguishes MetricRule, FleetDetector, and External. External alerts
have nullable metric/rule fields and may include externalAlertSourceId,
externalAlertSourceName, externalAlertDefinitionId, externalDefinitionKey,
externalInstanceKey, externalOccurrenceKey, topologyNodeId, title, and
description. They can be acknowledged locally, but POST /api/alerts/{id}/close is
rejected because recovery is owned by the external source.
GET /api/alerts/{id}/violations
The raw violations aggregated by one alert. For multidimensional alerts, the response is
limited to that alert's dimensionKey.
POST /api/alerts/{id}/acknowledge · POST /api/alerts/{id}/close
Transition an alert's lifecycle. Acknowledge marks ownership; close ends it. (/api/events/{id}/acknowledge
and /api/events/{id}/close are equivalent aliases.)
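A client that renders action buttons may want to hide close for external alerts, since the API rejects it. The sketch below encodes only that one documented rule; the helper name `lifecycle_paths` is hypothetical, and it does not try to model which transitions are valid from each operational status.

```python
def lifecycle_paths(alert):
    """Return the POST paths a client could offer for one alert row."""
    alert_id = alert["id"]
    paths = {"acknowledge": f"/api/alerts/{alert_id}/acknowledge"}
    # Recovery of external alerts is owned by the external source,
    # so close is only offered for the other source kinds.
    if alert.get("sourceKind") != "External":
        paths["close"] = f"/api/alerts/{alert_id}/close"
    return paths


print(lifecycle_paths({"id": 9012, "sourceKind": "MetricRule"}))
print(lifecycle_paths({"id": 9020, "sourceKind": "External"}))
```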
Violations (raw)
GET /api/violations
Point-level forensic breaches. Supports metric, rule, severity, and date filters. For
detector-driven violations, ruleId can be null.
[
  {
    "id": 55001,
    "ruleId": 17,
    "metricId": "database.health",
    "timestamp": "2026-06-01T08:39:00Z",
    "actualValue": 0,
    "expectedValue": 1,
    "lowerThreshold": 1,
    "upperThreshold": 1,
    "direction": "Below",
    "severity": "Warning",
    "ruleName": "Database health",
    "metricDisplayName": "Database Health",
    "deviationPercent": -100,
    "baselineConfidence": "High",
    "detectedAt": "2026-06-01T08:39:02Z",
    "alertLevelId": 3,
    "dimensionKey": "host.id=srv-123|mount=/var"
  }
]
| Field | Meaning |
|---|---|
| actualValue | The observed value that breached. |
| expectedValue | Baseline expectation at that time (dynamic rules). |
| lowerThreshold / upperThreshold | The band in force at that moment. |
| direction | Above / Below / Both. |
| deviationPercent | How far the value was from expectation. |
| baselineConfidence | Confidence in the baseline used. |
| detectedAt | When evaluation recorded the breach. |
| dimensionKey | The multidimensional member that breached, when applicable. |
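To make the numeric fields concrete, here is one reading that reproduces the sample violation above. Both formulas are assumptions, not the documented server-side calculation: deviationPercent is taken to be the signed difference relative to expectedValue, and a breach is taken to mean the value falls outside the lower/upper band.

```python
def deviation_percent(actual, expected):
    """Signed deviation from expectation in percent (assumed formula)."""
    if expected == 0:
        return None  # relative deviation is undefined against a zero baseline
    return (actual - expected) / abs(expected) * 100


def breach_side(actual, lower, upper):
    """Return 'Below' or 'Above' when the value leaves the band, else None."""
    if lower is not None and actual < lower:
        return "Below"
    if upper is not None and actual > upper:
        return "Above"
    return None


# Values taken from the sample violation: actual 0, expected 1, band [1, 1].
print(deviation_percent(0, 1), breach_side(0, 1, 1))
```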
Incidents (correlated)
Incidents group related alerts into one operational problem. They do not replace the alert list.
GET /api/incidents
GET /api/incidents?status=&severity=&serviceNodeId=&ciNodeId=&from=&to=&limit=500
[
  {
    "id": 1,
    "incidentKey": "INC-000001",
    "primaryServiceNodeId": null,
    "primaryServiceName": null,
    "primaryCiNodeId": null,
    "primaryCiName": null,
    "primaryMetricId": "demo.http.requests.by_instance.per2m",
    "severity": "Critical",
    "status": "Open",
    "startTime": "2026-06-01T23:23:00Z",
    "endTime": null,
    "durationSeconds": 47460,
    "alertCount": 1,
    "activeAlertCount": 1,
    "violationCount": 1,
    "acknowledgedAt": null,
    "resolvedAt": null,
    "resolutionClassification": null,
    "activeAlertCountAtResolution": null,
    "reopenCount": 0,
    "lastReopenedAt": null,
    "followUpOfIncidentId": null
  }
]
| Field | Meaning |
|---|---|
| incidentKey | Stable human identifier (INC-000001). |
| primaryServiceName / primaryCiName / primaryMetricId | What the incident centers on (service/CI, falling back to metric). |
| severity / status | Severity of the incident, and its lifecycle status: Open → Acknowledged → Resolved. |
| startTime / endTime / durationSeconds | Span of the incident. |
| alertCount / activeAlertCount / violationCount | How many alerts it spans, how many linked alerts are currently active, and how many underlying violations it spans. |
| acknowledgedByUserName / acknowledgeNote | Who acknowledged the incident and the optional note they left. |
| resolvedByUserName / resolveNote | Who resolved the incident and the resolution note they recorded. |
| resolutionClassification | Structured resolution reason: Resolved, NoImpact, FalsePositive, Duplicate, Maintenance, AcceptedRisk, or Transferred. |
| activeAlertCountAtResolution | Snapshot of linked alerts that were still active when the incident was resolved. |
| reopenCount / lastReopenedAt / reopenReason | Reopen metadata when a persistent alert causes a resolved incident to become active again. |
| followUpOfIncidentId | The earlier resolved incident this incident follows up, when persistence creates a new coordination record. |
Incident detail & children
| Method & path | Returns |
|---|---|
| GET /api/incidents/{id} | The incident. |
| GET /api/incidents/{id}/alerts | Its correlated alerts. |
| GET /api/incidents/{id}/violations | All underlying violations. |
| GET /api/incidents/{id}/metrics | Affected metrics (metricId, metricDisplayName, unit). |
| GET /api/incidents/{id}/timeline | Lifecycle/correlation events. |
A timeline entry:
{ "id": 7, "incidentId": 1, "entryType": "AlertCorrelated", "sourceType": "Alert", "sourceId": 9012, "message": "Alert 9012 correlated into incident", "createdAt": "2026-06-01T23:23:01Z" }
POST /api/incidents/{id}/acknowledge · POST /api/incidents/{id}/resolve
Transition an incident's lifecycle. Both record the acting operator as the acknowledger/resolver.
acknowledge accepts an optional note:
{ "note": "Investigating, paging the on-call DBA." }
resolve requires a note and a resolution classification. A blank note or invalid
classification is rejected with 400:
{
  "note": "Restarted the stuck replica; lag is back to baseline.",
  "resolutionClassification": "Resolved"
}
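Because a blank note or an unknown classification is rejected with 400, a client can check both before sending. The sketch below mirrors that rule locally, using the classification values listed in the incident field table; the helper name `build_resolve_body` is hypothetical, and trimming the note is a client-side choice rather than documented server behavior.

```python
RESOLUTION_CLASSIFICATIONS = {
    "Resolved", "NoImpact", "FalsePositive", "Duplicate",
    "Maintenance", "AcceptedRisk", "Transferred",
}


def build_resolve_body(note, classification):
    """Build the POST /api/incidents/{id}/resolve body or raise ValueError."""
    if not note or not note.strip():
        raise ValueError("resolve requires a non-blank note")
    if classification not in RESOLUTION_CLASSIFICATIONS:
        raise ValueError(f"unknown resolutionClassification: {classification!r}")
    return {"note": note.strip(), "resolutionClassification": classification}


body = build_resolve_body(
    "Restarted the stuck replica; lag is back to baseline.", "Resolved"
)
print(body)
```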
Incidents can be resolved while linked alerts remain active. The alert state is not changed
by incident resolution; the incident records the active-alert count at resolution for audit
and follow-up. Still-active linked alerts are marked as PostResolutionWatch until they
close. If the condition persists, Lumetry can reopen the resolved incident within the
configured reopen horizon or create a follow-up incident after the persistence threshold.
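After resolving an incident, an operator view may want to list the alerts that are still being watched. The sketch below filters alert rows on the incidentEligibility and lastResolvedIncidentId fields from the alert response; the helper name and the sample rows are illustrative only.

```python
def post_resolution_watch(alerts, incident_id):
    """Alerts still active after the given incident was resolved."""
    return [
        alert for alert in alerts
        if alert.get("incidentEligibility") == "PostResolutionWatch"
        and alert.get("lastResolvedIncidentId") == incident_id
    ]


# Illustrative rows, not real API output.
alerts = [
    {"id": 9012, "incidentEligibility": "PostResolutionWatch", "lastResolvedIncidentId": 1},
    {"id": 9013, "incidentEligibility": None, "lastResolvedIncidentId": None},
    {"id": 9014, "incidentEligibility": "MaintenanceSuppressed", "lastResolvedIncidentId": None},
]
print([alert["id"] for alert in post_resolution_watch(alerts, 1)])
```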
Operational support
Maintenance windows (operational suppression)
Maintenance windows suppress incident generation and alert notifications while keeping alerts and existing incidents separate from the maintenance workflow.
| Method & path | Purpose |
|---|---|
| GET /api/maintenance-windows | List maintenance windows. |
| POST /api/maintenance-windows | Create a global, metric, service, or CI scoped window. |
| DELETE /api/maintenance-windows/{id} | Remove a window. |
Create body
{
  "scopeType": "Service",
  "scopeNodeId": 42,
  "metricId": null,
  "startTime": "2026-06-01T22:00:00Z",
  "endTime": "2026-06-01T23:00:00Z",
  "reason": "Planned service maintenance"
}
Alerts covered by a maintenance window expose
incidentEligibility: "MaintenanceSuppressed". Maintenance does not automatically close
alerts or resolve incidents.
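A create body for a maintenance window can be assembled from a start time and a duration, as in the sketch below. The helper names are hypothetical, only the "Service" scopeType value is confirmed by the example body above, and rejecting non-positive durations is a client-side sanity check rather than documented server validation.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(moment):
    """Format an aware datetime as a UTC timestamp with a trailing Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_maintenance_window(scope_type, start, duration, reason,
                             scope_node_id=None, metric_id=None):
    """Build a POST /api/maintenance-windows body from start plus duration."""
    if duration <= timedelta(0):
        raise ValueError("maintenance window must end after it starts")
    return {
        "scopeType": scope_type,
        "scopeNodeId": scope_node_id,
        "metricId": metric_id,
        "startTime": to_utc_z(start),
        "endTime": to_utc_z(start + duration),
        "reason": reason,
    }


window = build_maintenance_window(
    "Service",
    datetime(2026, 6, 1, 22, 0, tzinfo=timezone.utc),
    timedelta(hours=1),
    "Planned service maintenance",
    scope_node_id=42,
)
print(window)
```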
Problem candidates
Problem candidates are read-only hints for problem management follow-up. They are derived from persistent post-resolution watches, reopened incidents, and repeated non-impact resolution classifications.
| Method & path | Purpose |
|---|---|
| GET /api/problem-candidates?status=&type=&limit=500 | List open or dismissed candidates. |
Evaluation jobs (troubleshooting the queue)
| Method & path | Purpose |
|---|---|
| GET /api/evaluation-jobs/summary | Queue health summary (counts by status). |
| GET /api/evaluation-jobs/failed?limit=100 | Terminally failed jobs with last_error, for poison-job diagnosis. |
Incident windows (baseline hygiene)
Operator-declared periods excluded from baseline learning — not part of the operational incident model above.
| Method & path | Purpose |
|---|---|
| GET /api/incident-windows | List windows. |
| POST /api/incident-windows | Create a window. |
| DELETE /api/incident-windows/{id} | Remove a window. |
Public holidays
Public-holiday records are another baseline-hygiene input. They help exclude predictable calendar anomalies from baseline learning.
| Method & path | Purpose |
|---|---|
| GET /api/public-holidays | List public holidays. |
| POST /api/public-holidays | Add a public holiday. |
| DELETE /api/public-holidays/{id} | Remove a public holiday. |
Create body
{
  "metricId": null,
  "startTime": "2026-06-01T22:00:00Z",
  "endTime": "2026-06-01T23:00:00Z",
  "reason": "Planned maintenance"
}
| Field | Meaning |
|---|---|
| metricId | Specific metric key, or null for a global window. |
| startTime / endTime | The excluded period. |
| reason | Why the period is excluded. |
The response includes createdBy as the authenticated user's numeric ID, or null for a
system-created record. Clients cannot supply this audit field.