AlertNoiseAudit — Alert-Fatigue Inventory
38 alert rules with 30-day firing count, page count, actionable %, MTTR, alert-fatigue score, recommended action (keep / tune / demote / silence-or-tune / add-runbook / remove). Surfaces 4 noise leaders eating the on-call budget, 1 critical paging alert without a runbook, and 1 alert silenced 90 days for a deprecated service.
What it is
The shape behind every healthy on-call program. Most teams add alerts for years and never audit them. Then a major incident teaches them the cost of noise, and they spend a quarter doing the audit reactively. AlertNoiseAudit prototypes the proactive version.
What’s in it
- 38 alert rules across eight teams (Platform / SRE / Security / Compliance / Data / ML / Growth / Performance) and three channels (page / slack / jira).
- Per-alert shape:
  - 30-day fired count
  - 30-day paged count
  - Actionable % (share of pages that resulted in real action)
  - Median MTTR
  - Runbook present?
  - Days silenced
  - Alert-fatigue score (0-100)
  - Recommendation: keep / tune / demote / silence-or-tune / add-runbook / remove
- Worst-offender findings:
  - AL-019 HostUp probe: 88 fires/30d, 42 pages, 18% actionable. NOISE LEADER. Tune the rule to require 5 minutes of sustained failure and 2 failing probes before paging.
  - AL-013 Bot datacenter ASN block: 420k fires, 0.42% false positives. At that volume, even a small false-positive rate is enough to bury other signal.
  - AL-018 Datadog APM error spike: 32% actionable, 68% noise. Tune the anomaly model.
  - AL-023 Mongo legacy-svc: silenced for 90 days, and the service is deprecated. Delete the alert.
  - AL-038 Synthetic monitor: CRITICAL paging alert WITHOUT A RUNBOOK. Add one before the next on-call rotation.
- On-call budget gauge — weighted sum of pages/slack/jira firings shows total team load against a healthy budget (8 pages/wk per engineer).
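The budget gauge above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the channel weights, the `count30d` field, and the `engineers` parameter are assumptions made for the example. Only the 8 pages/wk per engineer budget comes from the text.

```javascript
// Hypothetical weights: how much on-call attention one firing costs per channel.
const CHANNEL_WEIGHTS = { page: 1.0, slack: 0.2, jira: 0.05 };

// From the text: a healthy budget is 8 pages per week per engineer.
const HEALTHY_PAGES_PER_WEEK_PER_ENGINEER = 8;

// alerts: [{ channel: "page" | "slack" | "jira", count30d: number }]
// engineers: number of people sharing the rotation (assumed input).
function onCallBudget(alerts, engineers) {
  // Weighted sum of firings, expressed in page-equivalents over 30 days.
  const weightedLoad = alerts.reduce(
    (sum, a) => sum + a.count30d * (CHANNEL_WEIGHTS[a.channel] ?? 0),
    0
  );
  const weeks = 30 / 7;
  const perEngineerPerWeek = weightedLoad / weeks / engineers;
  return {
    weightedLoad,
    perEngineerPerWeek,
    // 1.0 means the team sits exactly at the healthy budget.
    utilization: perEngineerPerWeek / HEALTHY_PAGES_PER_WEEK_PER_ENGINEER,
  };
}

// Illustrative numbers only.
const gauge = onCallBudget(
  [
    { channel: "page", count30d: 42 },
    { channel: "slack", count30d: 100 },
    { channel: "jira", count30d: 40 },
  ],
  2
);
console.log(gauge.utilization.toFixed(2));
```

With these sample inputs the two-person rotation lands just under the healthy budget. Unknown channels contribute zero weight instead of throwing, which keeps the gauge usable when a rule has a typo in its channel name.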
Why this shape
Google SRE Workbook ch. 5-6 introduced the canonical pattern: every alert must be actionable, must be paging-worthy, and must have a runbook. PagerDuty’s annual Incident Response Reports show that the median company has ~30% noisy alerts that produce nothing actionable. SOC 2 CC7.5 requires that “the entity identifies, develops, and implements activities to recover from identified security incidents”, and a paging alert without a runbook fails that test. AlertNoiseAudit prototypes the audit that drives the cleanup.
How it ships
Single HTML file, ~17KB. Zero dependencies. The fatigue-score and recommendation engine covering all 38 alerts runs in 200 lines of vanilla JavaScript.
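To give a feel for what such an engine might look like, here is a small sketch in vanilla JavaScript. The scoring formula, the thresholds, and the field names (`paged30d`, `actionablePct`, `hasRunbook`, `silencedDays`, `channel`) are all assumptions for illustration. The source does not publish its formula, so this should not be read as the tool's actual logic.

```javascript
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// Hypothetical fatigue score in the 0-100 range.
function fatigueScore(alert) {
  // Share of pages that led to no real action (0..1).
  const noiseShare = 1 - alert.actionablePct / 100;
  // Page volume, saturating at an assumed cap of 60 pages per 30 days.
  const volume = clamp(alert.paged30d / 60, 0, 1);
  // Assumed fixed penalty for a missing runbook.
  const runbookPenalty = alert.hasRunbook ? 0 : 15;
  const raw = 100 * noiseShare * (0.5 + 0.5 * volume) + runbookPenalty;
  return Math.round(clamp(raw, 0, 100));
}

// Maps an alert to one of the six recommendations named in the text.
// Rule order and thresholds are assumptions.
function recommend(alert) {
  if (alert.silencedDays >= 90) return "remove";
  if (alert.channel === "page" && !alert.hasRunbook) return "add-runbook";
  const score = fatigueScore(alert);
  if (score >= 70) return "silence-or-tune";
  if (score >= 50) return alert.channel === "page" ? "demote" : "tune";
  if (score >= 30) return "tune";
  return "keep";
}

// Sample records. Page counts and actionable % for AL-019 come from the text;
// every other value is made up for the example.
const sampleAlerts = [
  { id: "AL-019", channel: "page", paged30d: 42, actionablePct: 18, hasRunbook: true, silencedDays: 0 },
  { id: "AL-023", channel: "slack", paged30d: 0, actionablePct: 0, hasRunbook: true, silencedDays: 90 },
  { id: "AL-038", channel: "page", paged30d: 4, actionablePct: 90, hasRunbook: false, silencedDays: 0 },
];

for (const a of sampleAlerts) {
  console.log(a.id, fatigueScore(a), recommend(a));
}
```

Checking the hard rules first (long-silenced, missing runbook) before the score keeps the two compliance-style findings from being masked by an otherwise quiet alert.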