ChaosScore — BCP/DR Resilience Maturity
Scores 18 production services across 6 resilience dimensions: RTO actual vs target, RPO actual vs target, last successful failover, runbook freshness, blast-radius isolation, on-call coverage. Composite drives the next chaos test. SOC2 CC7.5 / CC9.1 + ISO 22301 shape.
What it is
The shape behind a BCP/DR maturity program. Every service gets a 6-dimension scorecard, the scorecard composes to a maturity level (mature / developing / initial), and the weakest dimension drives the next gameday.
What’s in it
- 18 seeded services — checkout-api, payments-worker, auth-svc, product-catalog, cart-svc, inventory-svc (never failed over), order-history, notifications-svc, support-portal, analytics-pipeline, admin-tools (runbook 12 months old), reporting-svc, image-cdn, search-svc, ml-recommendation, billing-batch (critical gap — never failed over, monthly job, runbook 13 months old), webhook-receiver, data-warehouse.
- 3 tiers — tier-0 (critical path: checkout, payments, auth, webhooks), tier-1 (degraded service if down), tier-2 (internal / async).
- 6 dimensions, each scored 0-5:
  - RTO actual vs target: the rtoActual / rtoTarget ratio. 60-85% target hit = 4/5; never tested = 0/5.
  - RPO actual vs target: synchronous/source-of-truth columns score 5; never measured = 0.
  - Last successful failover: under 30 days = 5; never = 0 (most common SOC2 finding).
  - Runbook freshness: under 30 days = 5; over 1 year = 1 (older than the last refactor).
  - Blast-radius isolation: cell-based = 5; SPOF = 1.
  - On-call coverage: 24/7 follow-the-sun + secondary = 5; no on-call = 1.
- Composite math — sum of 6 sub-scores (0-30). 26+ = mature, 18-25 = developing, under 18 = initial.
- Recommended next test — the tool reads the weakest dimension and recommends the specific drill: game-day, replication-lag drill, regional failover gameday, runbook walkthrough, cell-isolation drill, page-out drill.
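The composite math and the recommendation step can be sketched in a few lines of vanilla JavaScript. This is a minimal illustration, not the tool's actual source: the dimension keys (rto, rpo, failover, runbook, blastRadius, onCall), the function names, and the tie-break rule are all hypothetical, and the pairing of each dimension with a drill assumes the two lists above are given in the same order.

```javascript
// Hypothetical dimension keys, paired with the drills in list order (an assumption).
const DRILLS = {
  rto: "game-day",
  rpo: "replication-lag drill",
  failover: "regional failover gameday",
  runbook: "runbook walkthrough",
  blastRadius: "cell-isolation drill",
  onCall: "page-out drill",
};

// Sum the six 0-5 sub-scores into a 0-30 composite.
function composite(scores) {
  return Object.keys(DRILLS).reduce((sum, dim) => sum + scores[dim], 0);
}

// Thresholds from the text: 26+ = mature, 18-25 = developing, under 18 = initial.
function maturity(total) {
  if (total >= 26) return "mature";
  if (total >= 18) return "developing";
  return "initial";
}

// Lowest-scoring dimension; ties go to the earlier dimension (an assumption).
function weakestDimension(scores) {
  return Object.keys(DRILLS).reduce((worst, dim) =>
    scores[dim] < scores[worst] ? dim : worst
  );
}

// Combine composite, maturity level, and the recommended next test.
function assess(scores) {
  const total = composite(scores);
  const weakest = weakestDimension(scores);
  return { total, level: maturity(total), weakest, nextTest: DRILLS[weakest] };
}

// Illustrative sub-scores only; not taken from the seeded services.
const example = { rto: 4, rpo: 5, failover: 0, runbook: 3, blastRadius: 4, onCall: 5 };
console.log(assess(example));
```

With these illustrative inputs the composite is 21, which lands in the developing band, and the zero on failover makes the regional failover gameday the recommended next test.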
Why this shape
SOC2 CC7.5 (“the entity identifies, develops, and implements activities to recover from identified security incidents”) + CC9.1 (risk-mitigation activities for business disruptions) + ISO 22301 §8.4 (business continuity plans and procedures) all measure the same thing: not whether you wrote a DR plan, but whether you ever exercised it. The four most common audit findings are “no documented failover”, “RTO/RPO targets not measured”, “runbook stale”, and “BC/DR never tested”. ChaosScore prototypes the program that surfaces those gaps.
How it ships
Single HTML file, ~23KB. Zero dependencies. 18 services × 6 dimensions × composite math + recommendation engine in 280 lines of vanilla JavaScript.