Google SRE · PagerDuty IR Reports · SOC2 CC7.3 · Alert Fatigue · Deep Prototype

AlertNoiseAudit — Alert-Fatigue Inventory

38 alert rules with 30-day firing count, page count, actionable %, MTTR, alert-fatigue score, recommended action (keep / tune / demote / silence-or-tune / add-runbook / remove). Surfaces 4 noise leaders eating the on-call budget, 1 critical paging alert without a runbook, and 1 alert silenced 90 days for a deprecated service.

What it is

The shape behind every healthy on-call program. Most teams add alerts for years and never audit. Then a major incident teaches them the cost of noise — and they spend a quarter doing this audit reactively. AlertNoiseAudit prototypes the proactive version.

What’s in it

  • 38 alert rules across Platform / SRE / Security / Compliance / Data / ML / Growth / Performance teams + 3 channels (page / slack / jira).
  • Per-alert shape:
    • 30-day fired count
    • 30-day paged count
    • Actionable % (share of paging alerts that led to real action)
    • Median MTTR
    • Runbook present?
    • Silenced days
    • Alert-fatigue score (0-100)
    • Recommendation: keep / tune / demote / silence-or-tune / add-runbook / remove
  • Worst-offender findings:
    • AL-019 HostUp probe: 88 fires/30d, 42 pages, 18% actionable. NOISE LEADER. Tune to require 5m sustained + 2 probes.
    • AL-013 Bot datacenter ASN block: 420k fires, 0.42% false positives. A small rate, but at this volume high enough to bury other signal.
    • AL-018 Datadog APM error spike: 32% actionable, 68% noise. Tune anomaly model.
    • AL-023 Mongo legacy-svc: silenced 90 days for a deprecated service. Delete the alert.
    • AL-038 Synthetic monitor: CRITICAL paging alert WITHOUT RUNBOOK. Add before next on-call rotation.
  • On-call budget gauge — weighted sum of pages/slack/jira firings shows total team load against a healthy budget (8 pages/wk per engineer).
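
The budget gauge described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the channel weights, the field names (`channel`, `fired30d`, `paged30d`), the `weeklyLoad` and `budgetGauge` functions, and the five-engineer rotation are all assumptions. Only the budget of 8 pages/wk per engineer comes from the description.

```javascript
// Hypothetical channel weights: a page costs far more attention than a
// Slack message or a Jira ticket. These values are assumptions.
const CHANNEL_WEIGHT = { page: 1.0, slack: 0.1, jira: 0.02 };

// Budget from the text: 8 pages per week per engineer.
const PAGES_PER_WEEK_PER_ENGINEER = 8;

// Convert 30-day counts into a weighted weekly load in "page equivalents".
// Paging alerts are counted by pages sent, other channels by firings.
function weeklyLoad(alerts) {
  const total30d = alerts.reduce((sum, a) => {
    const count = a.channel === "page" ? a.paged30d : a.fired30d;
    return sum + count * (CHANNEL_WEIGHT[a.channel] ?? 0);
  }, 0);
  return (total30d * 7) / 30;
}

// Compare the load against the team budget; a ratio above 1 means over budget.
function budgetGauge(alerts, engineers) {
  const load = weeklyLoad(alerts);
  const budget = engineers * PAGES_PER_WEEK_PER_ENGINEER;
  return { load, budget, ratio: load / budget, overBudget: load > budget };
}

// Example: AL-019 counts are from the findings list; EX-001 is a made-up row.
const inventory = [
  { id: "AL-019", channel: "page", fired30d: 88, paged30d: 42 },
  { id: "EX-001", channel: "slack", fired30d: 300, paged30d: 0 },
];
console.log(budgetGauge(inventory, 5));
```

With these two rows and the assumed weights, the load comes to about 16.8 page equivalents per week against a budget of 40 for five engineers. Changing the weights changes the picture considerably, so they are the first thing to calibrate against a real rotation.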

Why this shape

Google SRE Workbook ch. 5-6 introduced the canonical pattern: every alert must be actionable, must be paging-worthy, and must have a runbook. PagerDuty’s annual Incident Response Reports show that the median company has ~30% noisy alerts that produce nothing actionable. SOC2 CC7.3 requires “the entity identifies, develops, and implements activities to recover from identified events” — and a paging alert without a runbook fails that test. AlertNoiseAudit prototypes the audit that drives the cleanup.

How it ships

Single HTML file, ~17KB. Zero dependencies. The fatigue score and recommendation engine for all 38 alerts fit in 200 lines of vanilla JavaScript.
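
For orientation, a scoring-and-recommendation pass over the inventory might look like the sketch below. The formula, the thresholds, the field names (`channel`, `paged30d`, `actionablePct`, `hasRunbook`, `silencedDays`), and the `fatigueScore` and `recommend` functions are all assumptions for illustration. The description above does not spell out the tool's actual scoring logic.

```javascript
// Clamp a number into [lo, hi].
const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// Hypothetical fatigue score: 0 (quiet, useful) to 100 (pure noise).
// Assumed weighting: the non-actionable share dominates, page volume adds to it.
function fatigueScore(alert) {
  const noiseShare = 1 - alert.actionablePct / 100;     // 0..1
  const pageVolume = clamp(alert.paged30d / 30, 0, 1);  // saturates at 1 page/day
  return Math.round(clamp(100 * (0.7 * noiseShare + 0.3 * pageVolume), 0, 100));
}

// Map an alert to one of the six recommendations named in the inventory.
// Rule order and thresholds are illustrative assumptions.
function recommend(alert) {
  // Long-silenced alerts do no work: candidates for deletion.
  if (alert.silencedDays >= 90) return "remove";
  // A paging alert needs a runbook before anything else is tuned.
  if (alert.channel === "page" && !alert.hasRunbook) return "add-runbook";

  const score = fatigueScore(alert);
  if (score >= 75) return "silence-or-tune";
  if (score >= 50) return "tune";
  // Moderately noisy pages can move to a non-paging channel.
  if (score >= 30 && alert.channel === "page") return "demote";
  return "keep";
}
```

Under this assumed formula, AL-019 (42 pages, 18% actionable) scores 87 and lands in silence-or-tune, an alert silenced for 90 days maps to remove, and a paging alert with no runbook maps to add-runbook regardless of its score. Non-paging alerts are scored on actionable share alone here, which is a simplification.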

Open the tool →