Alerts, Violations & Incidents API

These endpoints expose the operational signal: raw violations, operational alerts, and correlated incidents. The model behind them is described in Violations, Alerts & Incidents.

Permissions: reads require CanViewAlerts. Lifecycle actions (acknowledge/close/resolve) require operator permission.

Identifiers: alerts, violations, and incidents are identified by an integer id. metricId, both as a filter and as a response field, holds the string metric key and can be null for external alerts.

Alerts (operational)

/api/alerts is the operational alert contract. /api/events is a backward-compatible alias with the same shapes.

`GET /api/alerts`

GET /api/alerts?metricId=&ruleId=&severity=&status=&incidentId=&from=&to=&limit=500

All query parameters are optional filters.

Param	Filters by
`metricId`	Metric key.
`ruleId`	Rule integer ID. Some detector-driven alerts may not have a rule ID.
`severity`	`Warning` / `Critical`.
`status`	Operational status (`Open`, `Acknowledged`, `Closed`).
`incidentId`	Alerts belonging to one incident.
`from` / `to`	Trigger-time range.
`limit`	Max rows (default 500).
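
As a minimal sketch of how these optional filters compose into a request URL, the helper below drops unset filters and URL-encodes the rest. The host `https://lumetry.example`, the helper name `build_alerts_url`, and the `from_` keyword spelling are assumptions for illustration, as is the ISO 8601 format for `from` / `to` (inferred from the timestamps in the response example).

```python
from urllib.parse import urlencode

# Filter names documented for GET /api/alerts.
ALERT_FILTERS = ("metricId", "ruleId", "severity", "status", "incidentId", "from", "to", "limit")


def build_alerts_url(base_url, **filters):
    """Build a GET /api/alerts URL, omitting filters that are unset (None)."""
    # "from" is a Python keyword, so callers pass it as from_ (hypothetical convention).
    if "from_" in filters:
        filters["from"] = filters.pop("from_")
    unknown = set(filters) - set(ALERT_FILTERS)
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    # Keep the documented parameter order so URLs are stable and easy to compare.
    params = [(name, filters[name]) for name in ALERT_FILTERS if filters.get(name) is not None]
    query = urlencode(params)
    url = base_url.rstrip("/") + "/api/alerts"
    return f"{url}?{query}" if query else url


url = build_alerts_url(
    "https://lumetry.example",  # assumed host, not part of the API contract
    metricId="database.health",
    severity="Critical",
    from_="2026-06-01T00:00:00Z",
    limit=100,
)
print(url)
```

Omitting every filter yields the bare `/api/alerts` path, which relies on the server-side default limit of 500.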

Response — array of alerts:

[
  {
    "id": 9012,
    "ruleId": 17,
    "metricId": "database.health",
    "eventStartTime": "2026-06-01T08:40:00Z",
    "eventEndTime": null,
    "status": "Open",
    "operationalStatus": "Open",
    "severity": "Warning",
    "triggerReason": "11 violations within 10 minutes",
    "violationCount": 11,
    "alertName": "High impact metric event",
    "metricDisplayName": "Database Health",
    "relatedServiceId": null,
    "relatedServiceName": null,
    "relatedCiId": 51,
    "relatedCiName": "ANVMBWEBWX01",
    "dimensionKey": "host.id=srv-123|mount=/var",
    "incidentId": null,
    "acknowledgedAt": null,
    "alertLevelId": 3,
    "durationSeconds": null,
    "incidentEligibility": null,
    "postResolutionWatchSince": null,
    "lastResolvedIncidentId": null
  }
]

Field	Meaning
`operationalStatus`	Lifecycle: `Open`, `Acknowledged`, or `Closed`.
`severity`	`Warning` / `Critical`.
`triggerReason`	Why the alert opened (the trigger condition that was met).
`violationCount`	Number of underlying violations aggregated.
`alertName` / `metricDisplayName`	Human labels for the alert and its metric.
`dimensionKey`	The multidimensional member that opened the alert, when applicable, formatted as pipe-separated `k=v` pairs (for example `host.id=srv-123|mount=/var`).
`relatedServiceId/Name`, `relatedCiId/Name`	Topology context — the service and CI this alert affects.
`incidentId`	The incident this alert was correlated into, if any.
`eventStartTime` / `eventEndTime` / `durationSeconds`	When it opened, closed, and for how long.
`alertLevelId`	The rule alert level that fired (Warning/Critical level).
`incidentEligibility` / `postResolutionWatchSince`	`PostResolutionWatch` when the alert stayed active after its linked incident was resolved, with the time it entered that watch state; `MaintenanceSuppressed` when active maintenance suppresses incident generation and notifications.
`lastResolvedIncidentId`	The resolved incident that placed this alert into post-resolution watch, when applicable.

Rule-based alerts carry ruleId and may carry alertLevelId. Detector-driven alerts such as multidimensional peer deviations can have ruleId: null; use metricId, dimensionKey, triggerReason, topology context, and incident linkage to group and display them. Multiple dimension-level alerts, for example several mounts on one host, can be correlated into the same incident while remaining separate alert rows.
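
One way a client might group dimension-level alerts for display is sketched below. It assumes `dimensionKey` is a pipe-separated list of `k=v` pairs, as in the response example; the function names and the `disk.usage` sample rows are made up for illustration and are not part of the API.

```python
from collections import defaultdict


def parse_dimension_key(dimension_key):
    """Split a key such as 'host.id=srv-123|mount=/var' into a dict of dimensions."""
    if not dimension_key:
        return {}
    dims = {}
    for part in dimension_key.split("|"):
        # partition keeps any further '=' characters inside the value.
        name, _, value = part.partition("=")
        dims[name] = value
    return dims


def group_alerts(alerts, dimension="host.id"):
    """Group alert ids by (metricId, value of one dimension); works when ruleId is null."""
    groups = defaultdict(list)
    for alert in alerts:
        dims = parse_dimension_key(alert.get("dimensionKey"))
        groups[(alert.get("metricId"), dims.get(dimension))].append(alert["id"])
    return dict(groups)


# Sample rows (invented): two mounts on one host plus one rule-based alert.
alerts = [
    {"id": 9012, "ruleId": None, "metricId": "disk.usage", "dimensionKey": "host.id=srv-123|mount=/var"},
    {"id": 9013, "ruleId": None, "metricId": "disk.usage", "dimensionKey": "host.id=srv-123|mount=/home"},
    {"id": 9014, "ruleId": 17, "metricId": "database.health", "dimensionKey": None},
]
grouped = group_alerts(alerts)
print(grouped)
```

The two mount-level alerts stay separate rows but share one display group, mirroring how they can share one incident.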

sourceKind distinguishes MetricRule, FleetDetector, and External. External alerts have nullable metric/rule fields and may include externalAlertSourceId, externalAlertSourceName, externalAlertDefinitionId, externalDefinitionKey, externalInstanceKey, externalOccurrenceKey, topologyNodeId, title, and description. They can be acknowledged locally, but POST /api/alerts/{id}/close is rejected because recovery is owned by the external source.
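
A client can avoid a rejected close call by checking `sourceKind` before offering the action. The sketch below is illustrative only: `allowed_actions` is a hypothetical helper, and the rule that acknowledge applies to `Open` alerts while close applies to `Open` or `Acknowledged` alerts is an assumption rather than documented behaviour.

```python
def allowed_actions(alert):
    """Return the lifecycle actions a UI might offer for one alert row."""
    status = alert.get("operationalStatus")
    actions = []
    # Assumption: only alerts that are still Open can be acknowledged.
    if status == "Open":
        actions.append("acknowledge")
    # External alerts cannot be closed locally; recovery is owned by the external source.
    if status in ("Open", "Acknowledged") and alert.get("sourceKind") != "External":
        actions.append("close")
    return actions


print(allowed_actions({"operationalStatus": "Open", "sourceKind": "External"}))
print(allowed_actions({"operationalStatus": "Open", "sourceKind": "MetricRule"}))
```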

`GET /api/alerts/{id}/violations`

The raw violations aggregated by one alert. For multidimensional alerts, the response is limited to that alert's dimensionKey.

`POST /api/alerts/{id}/acknowledge` · `POST /api/alerts/{id}/close`

Transition an alert's lifecycle. Acknowledge marks ownership; close ends the alert. (/api/events/{id}/acknowledge and /api/events/{id}/close are equivalent aliases.)

Violations (raw)

`GET /api/violations`

Point-level forensic breaches. Supports metric, rule, severity, and date filters. For detector-driven violations, ruleId can be null.

[
  {
    "id": 55001,
    "ruleId": 17,
    "metricId": "database.health",
    "timestamp": "2026-06-01T08:39:00Z",
    "actualValue": 0,
    "expectedValue": 1,
    "lowerThreshold": 1,
    "upperThreshold": 1,
    "direction": "Below",
    "severity": "Warning",
    "ruleName": "Database health",
    "metricDisplayName": "Database Health",
    "deviationPercent": -100,
    "baselineConfidence": "High",
    "detectedAt": "2026-06-01T08:39:02Z",
    "alertLevelId": 3,
    "dimensionKey": "host.id=srv-123|mount=/var"
  }
]

Field	Meaning
`actualValue`	The observed value that breached.
`expectedValue`	Baseline expectation at that time (dynamic rules).
`lowerThreshold` / `upperThreshold`	The band in force at that moment.
`direction`	`Above` / `Below` / `Both`.
`deviationPercent`	How far the value was from expectation.
`baselineConfidence`	Confidence in the baseline used.
`detectedAt`	When evaluation recorded the breach.
`dimensionKey`	The multidimensional member that breached, when applicable.
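
The relationship between these fields can be checked against the sample violation above. The sketch below assumes that `deviationPercent` is the signed difference from `expectedValue` expressed as a percentage, and that a point breaches when it falls outside the band; both formulas are inferred from the example values, not confirmed by the API, and the helper names are hypothetical.

```python
def deviation_percent(actual, expected):
    """Signed percentage distance of actual from expected (assumed formula)."""
    if expected == 0:
        return None  # relative deviation is undefined for a zero expectation
    return (actual - expected) / abs(expected) * 100


def breach_side(actual, lower, upper):
    """Return which side of the band a single point breached, or None if inside."""
    if actual < lower:
        return "Below"
    if actual > upper:
        return "Above"
    return None


violation = {"actualValue": 0, "expectedValue": 1, "lowerThreshold": 1, "upperThreshold": 1}
print(deviation_percent(violation["actualValue"], violation["expectedValue"]))
print(breach_side(violation["actualValue"], violation["lowerThreshold"], violation["upperThreshold"]))
```

For the sample row this reproduces a deviation of -100 and a `Below` breach. The `Both` value of `direction` is not modelled here, since a single point can only fall on one side of the band.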

Incidents (correlated)

Incidents group related alerts into one operational problem. They do not replace the alert list.

`GET /api/incidents`

GET /api/incidents?status=&severity=&serviceNodeId=&ciNodeId=&from=&to=&limit=500

[
  {
    "id": 1,
    "incidentKey": "INC-000001",
    "primaryServiceNodeId": null,
    "primaryServiceName": null,
    "primaryCiNodeId": null,
    "primaryCiName": null,
    "primaryMetricId": "demo.http.requests.by_instance.per2m",
    "severity": "Critical",
    "status": "Open",
    "startTime": "2026-06-01T23:23:00Z",
    "endTime": null,
    "durationSeconds": 47460,
    "alertCount": 1,
    "activeAlertCount": 1,
    "violationCount": 1,
    "acknowledgedAt": null,
    "resolvedAt": null,
    "resolutionClassification": null,
    "activeAlertCountAtResolution": null,
    "reopenCount": 0,
    "lastReopenedAt": null,
    "followUpOfIncidentId": null
  }
]

Field	Meaning
`incidentKey`	Stable human identifier (`INC-000001`).
`primaryServiceName` / `primaryCiName` / `primaryMetricId`	What the incident centers on (service/CI, falling back to metric).
`severity` / `status`	Incident severity and lifecycle status. Status moves `Open` → `Acknowledged` → `Resolved`.
`startTime` / `endTime` / `durationSeconds`	Span of the incident.
`alertCount` / `activeAlertCount` / `violationCount`	How many alerts it spans, how many linked alerts are currently active, and how many underlying violations it spans.
`acknowledgedByUserName` / `acknowledgeNote`	Who acknowledged the incident and the optional note they left.
`resolvedByUserName` / `resolveNote`	Who resolved the incident and the resolution note they recorded.
`resolutionClassification`	Structured resolution reason: `Resolved`, `NoImpact`, `FalsePositive`, `Duplicate`, `Maintenance`, `AcceptedRisk`, or `Transferred`.
`activeAlertCountAtResolution`	Snapshot of linked alerts that were still active when the incident was resolved.
`reopenCount` / `lastReopenedAt` / `reopenReason`	Reopen metadata when a persistent alert causes a resolved incident to become active again.
`followUpOfIncidentId`	The earlier resolved incident this incident follows up, when persistence creates a new coordination record.

Incident detail & children

Method & path	Returns
`GET /api/incidents/{id}`	The incident.
`GET /api/incidents/{id}/alerts`	Its correlated alerts.
`GET /api/incidents/{id}/violations`	All underlying violations.
`GET /api/incidents/{id}/metrics`	Affected metrics (`metricId`, `metricDisplayName`, `unit`).
`GET /api/incidents/{id}/timeline`	Lifecycle/correlation events.

A timeline entry:

{ "id": 7, "incidentId": 1, "entryType": "AlertCorrelated", "sourceType": "Alert", "sourceId": 9012, "message": "Alert 9012 correlated into incident", "createdAt": "2026-06-01T23:23:01Z" }
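
Timeline entries like the one above can be ordered and rendered on the client. This is a minimal sketch: it assumes `createdAt` is always an ISO 8601 UTC timestamp with a trailing `Z`, the second sample entry is invented, and `render_timeline` is a hypothetical helper name.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # Replacing 'Z' keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def render_timeline(entries):
    """Return one display line per entry, oldest first."""
    ordered = sorted(entries, key=lambda entry: parse_utc(entry["createdAt"]))
    return [f'{e["createdAt"]} [{e["entryType"]}] {e["message"]}' for e in ordered]


entries = [
    {"id": 8, "incidentId": 1, "entryType": "AlertCorrelated", "sourceType": "Alert",
     "sourceId": 9013, "message": "Alert 9013 correlated into incident",
     "createdAt": "2026-06-01T23:25:40Z"},
    {"id": 7, "incidentId": 1, "entryType": "AlertCorrelated", "sourceType": "Alert",
     "sourceId": 9012, "message": "Alert 9012 correlated into incident",
     "createdAt": "2026-06-01T23:23:01Z"},
]
lines = render_timeline(entries)
for line in lines:
    print(line)
```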

`POST /api/incidents/{id}/acknowledge` · `POST /api/incidents/{id}/resolve`

Transition an incident's lifecycle. Both record the acting operator as the acknowledger/resolver.

acknowledge accepts an optional note:

{ "note": "Investigating, paging the on-call DBA." }

resolve requires a note and a resolution classification. A blank note or invalid classification is rejected with 400:

{
  "note": "Restarted the stuck replica; lag is back to baseline.",
  "resolutionClassification": "Resolved"
}
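
Because a blank note or an unknown classification is rejected with 400, a client may prefer to validate the body before sending it. The sketch below mirrors the two documented rules; `build_resolve_body` is a hypothetical helper, and trimming whitespace from the note is a choice made here rather than something the API is known to require.

```python
# Classification values listed for resolutionClassification.
RESOLUTION_CLASSIFICATIONS = {
    "Resolved", "NoImpact", "FalsePositive", "Duplicate",
    "Maintenance", "AcceptedRisk", "Transferred",
}


def build_resolve_body(note, classification):
    """Return a resolve request body, or raise ValueError for input the API would reject."""
    if not note or not note.strip():
        raise ValueError("note must not be blank")
    if classification not in RESOLUTION_CLASSIFICATIONS:
        raise ValueError(f"invalid resolutionClassification: {classification!r}")
    return {"note": note.strip(), "resolutionClassification": classification}


body = build_resolve_body("Restarted the stuck replica; lag is back to baseline.", "Resolved")
print(body)
```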

Incidents can be resolved while linked alerts remain active. The alert state is not changed by incident resolution; the incident records the active-alert count at resolution for audit and follow-up. Still-active linked alerts are marked as PostResolutionWatch until they close. If the condition persists, Lumetry can reopen the resolved incident within the configured reopen horizon or create a follow-up incident after the persistence threshold.
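
To follow up on alerts left active by a resolution, a client can filter alert rows on the two post-resolution fields. This is a sketch over already-fetched rows; `alerts_in_watch` is a hypothetical helper and the sample rows are invented.

```python
def alerts_in_watch(alerts, incident_id):
    """Return alerts still in post-resolution watch for one resolved incident."""
    return [
        alert for alert in alerts
        if alert.get("incidentEligibility") == "PostResolutionWatch"
        and alert.get("lastResolvedIncidentId") == incident_id
    ]


# Invented rows: one alert in watch for incident 1, one suppressed, one unrelated.
alerts = [
    {"id": 9012, "incidentEligibility": "PostResolutionWatch", "lastResolvedIncidentId": 1,
     "postResolutionWatchSince": "2026-06-02T12:34:00Z"},
    {"id": 9013, "incidentEligibility": "MaintenanceSuppressed", "lastResolvedIncidentId": None,
     "postResolutionWatchSince": None},
    {"id": 9014, "incidentEligibility": None, "lastResolvedIncidentId": None,
     "postResolutionWatchSince": None},
]
watched = alerts_in_watch(alerts, incident_id=1)
print([alert["id"] for alert in watched])
```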

Operational support

Maintenance windows (operational suppression)

Maintenance windows suppress incident generation and alert notifications while keeping alerts and existing incidents separate from the maintenance workflow.

Method & path	Purpose
`GET /api/maintenance-windows`	List maintenance windows.
`POST /api/maintenance-windows`	Create a global, metric, service, or CI scoped window.
`DELETE /api/maintenance-windows/{id}`	Remove a window.

Create body

{
  "scopeType": "Service",
  "scopeNodeId": 42,
  "metricId": null,
  "startTime": "2026-06-01T22:00:00Z",
  "endTime": "2026-06-01T23:00:00Z",
  "reason": "Planned service maintenance"
}

Alerts covered by a maintenance window expose incidentEligibility: "MaintenanceSuppressed". Maintenance does not automatically close alerts or resolve incidents.
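
A create body like the one above can be assembled from timezone-aware datetimes. The sketch below is an assumption-laden illustration: `maintenance_window_body` is a hypothetical helper, the check that `endTime` follows `startTime` is a client-side guess at sensible validation, and only the `Service` scope value shown in the example is used because the other `scopeType` strings are not listed here.

```python
from datetime import datetime, timezone


def to_utc_z(moment):
    """Format an aware datetime as 'YYYY-MM-DDTHH:MM:SSZ' in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def maintenance_window_body(scope_type, start, end, reason, scope_node_id=None, metric_id=None):
    """Build a POST /api/maintenance-windows body from aware datetimes."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "scopeType": scope_type,
        "scopeNodeId": scope_node_id,
        "metricId": metric_id,
        "startTime": to_utc_z(start),
        "endTime": to_utc_z(end),
        "reason": reason,
    }


window = maintenance_window_body(
    "Service",
    datetime(2026, 6, 1, 22, 0, tzinfo=timezone.utc),
    datetime(2026, 6, 1, 23, 0, tzinfo=timezone.utc),
    "Planned service maintenance",
    scope_node_id=42,
)
print(window)
```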

Problem candidates

Problem candidates are read-only hints for problem management follow-up. They are derived from persistent post-resolution watches, reopened incidents, and repeated non-impact resolution classifications.

Method & path	Purpose
`GET /api/problem-candidates?status=&type=&limit=500`	List open or dismissed candidates.

Evaluation jobs (troubleshooting the queue)

Method & path	Purpose
`GET /api/evaluation-jobs/summary`	Queue health summary (counts by status).
`GET /api/evaluation-jobs/failed?limit=100`	Terminally failed jobs with `last_error`, for poison-job diagnosis.

Incident windows (baseline hygiene)

Operator-declared periods excluded from baseline learning — not part of the operational incident model above.

Method & path	Purpose
`GET /api/incident-windows`	List windows.
`POST /api/incident-windows`	Create a window.
`DELETE /api/incident-windows/{id}`	Remove a window.

Public holidays

Public-holiday records are another baseline-hygiene input. They help exclude predictable calendar anomalies from baseline learning.

Method & path	Purpose
`GET /api/public-holidays`	List public holidays.
`POST /api/public-holidays`	Add a public holiday.
`DELETE /api/public-holidays/{id}`	Remove a public holiday.

Create body

{
  "metricId": null,
  "startTime": "2026-06-01T22:00:00Z",
  "endTime": "2026-06-01T23:00:00Z",
  "reason": "Planned maintenance"
}

Field	Meaning
`metricId`	Specific metric key, or `null` for a global window.
`startTime` / `endTime`	The excluded period.
`reason`	Why the period is excluded.

The response includes createdBy as the authenticated user's numeric ID, or null for a system-created record. Clients cannot supply this audit field.
