Alerts, Violations & Incidents API

These endpoints expose the operational signal: raw violations, operational alerts, and correlated incidents. The model behind them is described in Violations, Alerts & Incidents.

Permissions: reads require CanViewAlerts. Lifecycle actions (acknowledge/close/resolve) require operator permission.

Identifiers: alerts, violations, and incidents are identified by an integer id. metricId filters and fields hold the string metric key; the field is null for external alerts.


Alerts (operational)

/api/alerts is the operational alert contract. /api/events is a backward-compatible alias with the same shapes.

GET /api/alerts

GET /api/alerts?metricId=&ruleId=&severity=&status=&incidentId=&from=&to=&limit=500

All query parameters are optional filters.

| Param | Filters by |
| --- | --- |
| metricId | Metric key. |
| ruleId | Rule integer ID. Some detector-driven alerts may not have a rule ID. |
| severity | Warning / Critical. |
| status | Operational status (Open, Acknowledged, Closed). |
| incidentId | Alerts belonging to one incident. |
| from / to | Trigger-time range. |
| limit | Max rows (default 500). |
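
Because every filter is optional, a client can build the query string from whichever filters it has and omit the rest. The following is a minimal sketch, not an official client: the host `https://lumetry.example` and the helper name `build_alerts_url` are hypothetical, and it assumes that `from` / `to` accept the same UTC timestamp format shown in the response samples.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_alerts_url(base_url, filters):
    """Return a GET /api/alerts URL, skipping filters whose value is None.

    `filters` is a dict because `from` is a reserved word in Python and
    cannot be passed as a keyword argument.
    """
    params = {}
    for name, value in filters.items():
        if value is None:
            continue  # unset filter: leave it out of the query string
        if isinstance(value, datetime):
            # Assumption: timestamps are sent as UTC, e.g. 2026-06-01T08:40:00Z.
            value = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        params[name] = value
    query = urlencode(params)
    return f"{base_url}/api/alerts" + (f"?{query}" if query else "")


url = build_alerts_url(
    "https://lumetry.example",  # hypothetical host
    {"metricId": "database.health", "severity": "Critical", "ruleId": None, "limit": 100},
)
print(url)
# https://lumetry.example/api/alerts?metricId=database.health&severity=Critical&limit=100
```

Pass timezone-aware `datetime` values for `from` and `to`; a naive value would be interpreted in the local time zone before conversion.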

Response — array of alerts:

[
  {
    "id": 9012,
    "ruleId": 17,
    "metricId": "database.health",
    "eventStartTime": "2026-06-01T08:40:00Z",
    "eventEndTime": null,
    "status": "Open",
    "operationalStatus": "Open",
    "severity": "Warning",
    "triggerReason": "11 violations within 10 minutes",
    "violationCount": 11,
    "alertName": "High impact metric event",
    "metricDisplayName": "Database Health",
    "relatedServiceId": null,
    "relatedServiceName": null,
    "relatedCiId": 51,
    "relatedCiName": "ANVMBWEBWX01",
    "dimensionKey": "host.id=srv-123|mount=/var",
    "incidentId": null,
    "acknowledgedAt": null,
    "alertLevelId": 3,
    "durationSeconds": null,
    "incidentEligibility": null,
    "postResolutionWatchSince": null,
    "lastResolvedIncidentId": null
  }
]

| Field | Meaning |
| --- | --- |
| operationalStatus | Lifecycle: Open, Acknowledged, or Closed. |
| severity | Warning / Critical. |
| triggerReason | Why the alert opened (the trigger condition that was met). |
| violationCount | Number of underlying violations aggregated. |
| alertName / metricDisplayName | Human labels for the alert and its metric. |
| dimensionKey | The multidimensional member that opened the alert, when applicable, formatted as pipe-separated k=v pairs (host.id and mount in the sample above). |
| relatedServiceId/Name, relatedCiId/Name | Topology context: the service and CI this alert affects. |
| incidentId | The incident this alert was correlated into, if any. |
| eventStartTime / eventEndTime / durationSeconds | When it opened, when it closed, and for how long it lasted. |
| alertLevelId | The rule alert level that fired (Warning/Critical level). |
| incidentEligibility / postResolutionWatchSince | incidentEligibility is PostResolutionWatch when the alert stayed active after its linked incident was resolved, and postResolutionWatchSince is the time it entered that watch state. incidentEligibility is MaintenanceSuppressed when active maintenance suppresses incident generation and notifications. |
| lastResolvedIncidentId | The resolved incident that placed this alert into post-resolution watch, when applicable. |

Rule-based alerts carry ruleId and may carry alertLevelId. Detector-driven alerts such as multidimensional peer deviations can have ruleId: null; use metricId, dimensionKey, triggerReason, topology context, and incident linkage to group and display them. Multiple dimension-level alerts, for example several mounts on one host, can be correlated into the same incident while remaining separate alert rows.
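
One way a client might group alert rows for display without relying on ruleId is sketched below. The grouping key is an illustrative choice rather than behaviour defined by the API: rows linked to an incident are grouped by incidentId, and unlinked rows fall back to metricId plus dimensionKey. The function name `group_alerts` is hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts):
    """Group alert ids by incident when linked, else by metric and dimension."""
    groups = defaultdict(list)
    for alert in alerts:
        if alert.get("incidentId") is not None:
            key = ("incident", alert["incidentId"])
        else:
            # ruleId may be null for detector-driven alerts, so it is not used here.
            key = ("metric", alert.get("metricId"), alert.get("dimensionKey"))
        groups[key].append(alert["id"])
    return dict(groups)


alerts = [
    {"id": 1, "ruleId": None, "metricId": "disk.used", "dimensionKey": "host.id=srv-123|mount=/var", "incidentId": 7},
    {"id": 2, "ruleId": None, "metricId": "disk.used", "dimensionKey": "host.id=srv-123|mount=/tmp", "incidentId": 7},
    {"id": 3, "ruleId": 17, "metricId": "database.health", "dimensionKey": None, "incidentId": None},
]
grouped = group_alerts(alerts)
```

In this made-up data, the two mount-level alerts stay separate rows but land in the same incident group, while the unlinked rule-based alert forms its own group.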

sourceKind distinguishes MetricRule, FleetDetector, and External. External alerts have nullable metric/rule fields and may include externalAlertSourceId, externalAlertSourceName, externalAlertDefinitionId, externalDefinitionKey, externalInstanceKey, externalOccurrenceKey, topologyNodeId, title, and description. They can be acknowledged locally, but POST /api/alerts/{id}/close is rejected because recovery is owned by the external source.

GET /api/alerts/{id}/violations

The raw violations aggregated by one alert. For multidimensional alerts, the response is limited to that alert's dimensionKey.

POST /api/alerts/{id}/acknowledge · POST /api/alerts/{id}/close

Transition an alert's lifecycle. Acknowledge marks ownership; close ends it. (/api/events/{id}/acknowledge and /api/events/{id}/close are equivalent aliases.)
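
The sketch below prepares a lifecycle request and refuses, on the client side, to close an alert whose sourceKind is External, since the server rejects that call. It only constructs the request and does not send it. The host and the helper name `build_lifecycle_request` are hypothetical, authentication headers are omitted because this page does not describe them, and sending no request body is an assumption.

```python
import urllib.request


def build_lifecycle_request(base_url, alert, action):
    """Build (but do not send) a POST for an alert lifecycle action."""
    if action not in ("acknowledge", "close"):
        raise ValueError(f"unsupported action: {action}")
    if action == "close" and alert.get("sourceKind") == "External":
        # Recovery of external alerts is owned by the external source.
        raise ValueError("external alerts cannot be closed through the API")
    url = f"{base_url}/api/alerts/{alert['id']}/{action}"
    return urllib.request.Request(url, method="POST")


request = build_lifecycle_request(
    "https://lumetry.example",  # hypothetical host
    {"id": 9012, "sourceKind": "MetricRule"},
    "close",
)
```

A caller would add its own authentication and pass the result to `urllib.request.urlopen`.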


Violations (raw)

GET /api/violations

Returns raw, point-level breaches for forensic analysis. Supports metric, rule, severity, and date filters. For detector-driven violations, ruleId can be null.

[
  {
    "id": 55001,
    "ruleId": 17,
    "metricId": "database.health",
    "timestamp": "2026-06-01T08:39:00Z",
    "actualValue": 0,
    "expectedValue": 1,
    "lowerThreshold": 1,
    "upperThreshold": 1,
    "direction": "Below",
    "severity": "Warning",
    "ruleName": "Database health",
    "metricDisplayName": "Database Health",
    "deviationPercent": -100,
    "baselineConfidence": "High",
    "detectedAt": "2026-06-01T08:39:02Z",
    "alertLevelId": 3,
    "dimensionKey": "host.id=srv-123|mount=/var"
  }
]

| Field | Meaning |
| --- | --- |
| actualValue | The observed value that breached. |
| expectedValue | Baseline expectation at that time (dynamic rules). |
| lowerThreshold / upperThreshold | The band in force at that moment. |
| direction | Above / Below / Both. |
| deviationPercent | How far the value was from expectation. |
| baselineConfidence | Confidence in the baseline used. |
| detectedAt | When evaluation recorded the breach. |
| dimensionKey | The multidimensional member that breached, when applicable. |
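
dimensionKey appears on both alerts and violations, and the sample value suggests name=value pairs separated by a pipe character. A minimal parser under that assumption might look like the following; the helper name `parse_dimension_key` is hypothetical, and no escaping rule is documented here, so names and values containing `|` are not handled.

```python
def parse_dimension_key(dimension_key):
    """Split a dimensionKey such as 'host.id=srv-123|mount=/var' into a dict."""
    if not dimension_key:
        return {}  # null or empty key: not a multidimensional row
    pairs = {}
    for part in dimension_key.split("|"):
        # partition splits on the first '=' only, so values may contain '='.
        name, _, value = part.partition("=")
        pairs[name] = value
    return pairs


dimensions = parse_dimension_key("host.id=srv-123|mount=/var")
print(dimensions)
# {'host.id': 'srv-123', 'mount': '/var'}
```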

Incidents (correlated)

Incidents group related alerts into one operational problem. They do not replace the alert list.

GET /api/incidents

GET /api/incidents?status=&severity=&serviceNodeId=&ciNodeId=&from=&to=&limit=500

[
  {
    "id": 1,
    "incidentKey": "INC-000001",
    "primaryServiceNodeId": null,
    "primaryServiceName": null,
    "primaryCiNodeId": null,
    "primaryCiName": null,
    "primaryMetricId": "demo.http.requests.by_instance.per2m",
    "severity": "Critical",
    "status": "Open",
    "startTime": "2026-06-01T23:23:00Z",
    "endTime": null,
    "durationSeconds": 47460,
    "alertCount": 1,
    "activeAlertCount": 1,
    "violationCount": 1,
    "acknowledgedAt": null,
    "resolvedAt": null,
    "resolutionClassification": null,
    "activeAlertCountAtResolution": null,
    "reopenCount": 0,
    "lastReopenedAt": null,
    "followUpOfIncidentId": null
  }
]

| Field | Meaning |
| --- | --- |
| incidentKey | Stable human identifier (INC-000001). |
| primaryServiceName / primaryCiName / primaryMetricId | What the incident centers on (service/CI, falling back to metric). |
| severity / status | Severity, plus the lifecycle status: Open, Acknowledged, or Resolved. |
| startTime / endTime / durationSeconds | Span of the incident. |
| alertCount / activeAlertCount / violationCount | How many alerts it spans, how many linked alerts are currently active, and how many underlying violations it spans. |
| acknowledgedByUserName / acknowledgeNote | Who acknowledged the incident and the optional note they left. |
| resolvedByUserName / resolveNote | Who resolved the incident and the resolution note they recorded. |
| resolutionClassification | Structured resolution reason: Resolved, NoImpact, FalsePositive, Duplicate, Maintenance, AcceptedRisk, or Transferred. |
| activeAlertCountAtResolution | Snapshot of linked alerts that were still active when the incident was resolved. |
| reopenCount / lastReopenedAt / reopenReason | Reopen metadata when a persistent alert causes a resolved incident to become active again. |
| followUpOfIncidentId | The earlier resolved incident this incident follows up, when persistence creates a new coordination record. |

Incident detail & children

| Method & path | Returns |
| --- | --- |
| GET /api/incidents/{id} | The incident. |
| GET /api/incidents/{id}/alerts | Its correlated alerts. |
| GET /api/incidents/{id}/violations | All underlying violations. |
| GET /api/incidents/{id}/metrics | Affected metrics (metricId, metricDisplayName, unit). |
| GET /api/incidents/{id}/timeline | Lifecycle/correlation events. |

A timeline entry:

{ "id": 7, "incidentId": 1, "entryType": "AlertCorrelated", "sourceType": "Alert", "sourceId": 9012, "message": "Alert 9012 correlated into incident", "createdAt": "2026-06-01T23:23:01Z" }

POST /api/incidents/{id}/acknowledge · POST /api/incidents/{id}/resolve

Transition an incident's lifecycle. Both record the acting operator as the acknowledger/resolver.

acknowledge accepts an optional note:

{ "note": "Investigating, paging the on-call DBA." }

resolve requires a note and a resolution classification. A blank note or invalid classification is rejected with 400:

{
  "note": "Restarted the stuck replica; lag is back to baseline.",
  "resolutionClassification": "Resolved"
}
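
Checking the body before sending it avoids a round trip that would end in a 400. The sketch below mirrors the two documented rejection rules using the classification values listed in the incident field table. The helper name `build_resolve_body` is hypothetical, and trimming surrounding whitespace from the note is a client-side choice rather than documented server behaviour.

```python
import json

# Values listed for resolutionClassification in the incident field table.
RESOLUTION_CLASSIFICATIONS = frozenset({
    "Resolved", "NoImpact", "FalsePositive", "Duplicate",
    "Maintenance", "AcceptedRisk", "Transferred",
})


def build_resolve_body(note, classification):
    """Return the JSON body for POST /api/incidents/{id}/resolve."""
    if not note or not note.strip():
        raise ValueError("resolve requires a non-blank note")
    if classification not in RESOLUTION_CLASSIFICATIONS:
        raise ValueError(f"invalid resolutionClassification: {classification!r}")
    return json.dumps({
        "note": note.strip(),
        "resolutionClassification": classification,
    })


body = build_resolve_body("Restarted the stuck replica; lag is back to baseline.", "Resolved")
```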

Incidents can be resolved while linked alerts remain active. The alert state is not changed by incident resolution; the incident records the active-alert count at resolution for audit and follow-up. Still-active linked alerts are marked as PostResolutionWatch until they close. If the condition persists, Lumetry can reopen the resolved incident within the configured reopen horizon or create a follow-up incident after the persistence threshold.
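
To follow up on a resolved incident, a client could list the alerts that are still on watch because of it. The sketch below filters alert rows by the incidentEligibility and lastResolvedIncidentId fields described earlier; the function name `alerts_on_watch` is hypothetical and the rows are made-up sample data.

```python
def alerts_on_watch(alerts, resolved_incident_id):
    """Return ids of alerts in post-resolution watch for one resolved incident."""
    return [
        alert["id"]
        for alert in alerts
        if alert.get("incidentEligibility") == "PostResolutionWatch"
        and alert.get("lastResolvedIncidentId") == resolved_incident_id
    ]


sample_alerts = [
    {"id": 1, "incidentEligibility": "PostResolutionWatch", "lastResolvedIncidentId": 7},
    {"id": 2, "incidentEligibility": None, "lastResolvedIncidentId": None},
    {"id": 3, "incidentEligibility": "PostResolutionWatch", "lastResolvedIncidentId": 8},
    {"id": 4, "incidentEligibility": "MaintenanceSuppressed", "lastResolvedIncidentId": 7},
]
watching = alerts_on_watch(sample_alerts, 7)
```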


Operational support

Maintenance windows (operational suppression)

Maintenance windows suppress incident generation and alert notifications while keeping alerts and existing incidents separate from the maintenance workflow.

| Method & path | Purpose |
| --- | --- |
| GET /api/maintenance-windows | List maintenance windows. |
| POST /api/maintenance-windows | Create a global, metric, service, or CI scoped window. |
| DELETE /api/maintenance-windows/{id} | Remove a window. |

Create body

{
  "scopeType": "Service",
  "scopeNodeId": 42,
  "metricId": null,
  "startTime": "2026-06-01T22:00:00Z",
  "endTime": "2026-06-01T23:00:00Z",
  "reason": "Planned service maintenance"
}

Alerts covered by a maintenance window expose incidentEligibility: "MaintenanceSuppressed". Maintenance does not automatically close alerts or resolve incidents.
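
A create body can be assembled from timezone-aware datetimes so that the timestamps always come out in the UTC form shown above. This is a sketch under assumptions: the helper name `maintenance_window_body` is hypothetical, scopeType is passed through unchanged because only the "Service" value is shown on this page, and the end-after-start check is a client-side sanity check rather than a documented server rule.

```python
import json
from datetime import datetime, timezone


def _utc(ts):
    """Format an aware datetime as e.g. 2026-06-01T22:00:00Z."""
    if ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def maintenance_window_body(scope_type, start, end, reason,
                            scope_node_id=None, metric_id=None):
    """Return the JSON body for POST /api/maintenance-windows."""
    start_text, end_text = _utc(start), _utc(end)
    if end <= start:
        raise ValueError("endTime must be after startTime")
    return json.dumps({
        "scopeType": scope_type,
        "scopeNodeId": scope_node_id,
        "metricId": metric_id,
        "startTime": start_text,
        "endTime": end_text,
        "reason": reason,
    })


window_body = maintenance_window_body(
    "Service",
    datetime(2026, 6, 1, 22, 0, tzinfo=timezone.utc),
    datetime(2026, 6, 1, 23, 0, tzinfo=timezone.utc),
    "Planned service maintenance",
    scope_node_id=42,
)
```

With these arguments the result decodes to the same object as the create body shown above.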

Problem candidates

Problem candidates are read-only hints for problem management follow-up. They are derived from persistent post-resolution watches, reopened incidents, and repeated non-impact resolution classifications.

| Method & path | Purpose |
| --- | --- |
| GET /api/problem-candidates?status=&type=&limit=500 | List open or dismissed candidates. |

Evaluation jobs (troubleshooting the queue)

| Method & path | Purpose |
| --- | --- |
| GET /api/evaluation-jobs/summary | Queue health summary (counts by status). |
| GET /api/evaluation-jobs/failed?limit=100 | Terminally failed jobs with last_error, for poison-job diagnosis. |

Incident windows (baseline hygiene)

Operator-declared periods excluded from baseline learning — not part of the operational incident model above.

| Method & path | Purpose |
| --- | --- |
| GET /api/incident-windows | List windows. |
| POST /api/incident-windows | Create a window. |
| DELETE /api/incident-windows/{id} | Remove a window. |

Public holidays

Public-holiday records are another baseline-hygiene input. They help exclude predictable calendar anomalies from baseline learning.

| Method & path | Purpose |
| --- | --- |
| GET /api/public-holidays | List public holidays. |
| POST /api/public-holidays | Add a public holiday. |
| DELETE /api/public-holidays/{id} | Remove a public holiday. |

Create body

{
  "metricId": null,
  "startTime": "2026-06-01T22:00:00Z",
  "endTime": "2026-06-01T23:00:00Z",
  "reason": "Planned maintenance"
}

| Field | Meaning |
| --- | --- |
| metricId | Specific metric key, or null for a global window. |
| startTime / endTime | The excluded period. |
| reason | Why the period is excluded. |

The response includes createdBy as the authenticated user's numeric ID, or null for a system-created record. Clients cannot supply this audit field.