PIIScout — Database PII Column Scanner
Walks 38 columns across an e-commerce schema and classifies each as direct identifier, sensitive PI, PHI, cardholder data, or quasi-identifier. Maps to GDPR Art 4(1), HIPAA 45 CFR 164.514, PCI-DSS v4.0 Req 3, CCPA. Proposes minimization: drop, tokenize, pseudonymize, generalize, truncate.
What it is
The shape behind data-discovery tools (BigID, Securiti, OneTrust DataDiscovery). For every column in a schema: a category, a recommended minimization action, the regulation that drove the call, and the actual migration SQL.
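As a rough illustration of that per-column output, one finding could be modeled as a small record. This is a minimal sketch, not PIIScout's actual data model: the field names, the `makeFinding` helper, and the `cvv` column name are all assumptions made for the example.

```javascript
// Hypothetical shape of one finding; identifiers here are illustrative only.
const CATEGORIES = [
  "direct_identifier", "sensitive_pi", "phi",
  "cardholder_data", "quasi_identifier", "non_pii",
];
const ACTIONS = ["drop", "tokenize", "pseudonymize", "generalize", "truncate", "keep"];

// Build a frozen finding record, rejecting unknown categories and actions.
function makeFinding({ table, column, category, action, regulation, sql }) {
  if (!CATEGORIES.includes(category)) throw new Error(`unknown category: ${category}`);
  if (!ACTIONS.includes(action)) throw new Error(`unknown action: ${action}`);
  return Object.freeze({ table, column, category, action, regulation, sql });
}

const cvvFinding = makeFinding({
  table: "payments",
  column: "cvv", // assumed column name, not taken from the seeded schema
  category: "cardholder_data",
  action: "drop",
  regulation: "PCI-DSS v4.0 Req 3.3.1",
  sql: "ALTER TABLE payments DROP COLUMN cvv;",
});
```

Freezing the record keeps a finding from being edited after classification, which makes a later report easier to trust.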
What’s in it
- 38 seeded columns across 10 realistic tables (customers, addresses, payments, orders, support_tickets, marketing_sends, audit_log, analytics_events, health_records, kyc_verifications).
- 6 PII categories — direct identifier, sensitive PI, PHI (HIPAA), cardholder data (PCI), quasi-identifier, non-PII.
- 6 minimization actions with real migration SQL templates:
  - drop: `ALTER TABLE … DROP COLUMN` (for prohibited storage like CVV)
  - tokenize: replace with non-reversible token; original to PCI/KYC vault
  - pseudonymize: replace with `customer_id` FK
  - generalize: reduce precision (full ZIP → first 3, full DOB → year, GPS 7-decimal → 3-decimal)
  - truncate: retention-bounded (IP → /16 after 30d)
  - keep: retain at source with access controls
- Real regulatory citations on every column — GDPR Art 4(1), HIPAA Safe Harbor (18 identifiers, 45 CFR 164.514), PCI-DSS v4.0 Req 3.3.1 / 3.4 / 3.5, CCPA 1798.140, GLBA Safeguards Rule, NIST SP 800-122.
- Critical findings surfaced: full PAN stored in `payments.card_number` (PCI-DSS Req 3.5 violation), CVV stored (Req 3.3.1, never permitted after authorization), full GPS lat/long at 7-decimal precision, full DOB stored (combined with 5-digit ZIP and gender, DOB uniquely identifies about 87% of the US population per Sweeney 2000).
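The generalize and truncate actions listed above reduce precision rather than remove a value. A minimal sketch of those reductions in JavaScript follows; the function names are hypothetical, the DOB input is assumed to be an ISO `YYYY-MM-DD` string, and the IP helper only handles IPv4. The 30-day retention check that would trigger truncation is left out.

```javascript
// Full ZIP -> first 3 digits.
function generalizeZip(zip) {
  return String(zip).trim().slice(0, 3);
}

// Full DOB (assumed ISO "YYYY-MM-DD") -> year only.
function generalizeDob(isoDate) {
  return Number(String(isoDate).slice(0, 4));
}

// GPS coordinate at 7-decimal precision -> 3 decimals.
function generalizeCoord(value) {
  return Math.round(value * 1000) / 1000;
}

// IPv4 address -> its /16 network (last two octets zeroed).
function truncateIpv4(ip) {
  const parts = String(ip).split(".");
  if (parts.length !== 4) throw new Error(`not an IPv4 address: ${ip}`);
  return `${parts[0]}.${parts[1]}.0.0/16`;
}

console.log(generalizeZip("94110-1234"));  // 941
console.log(generalizeDob("1987-06-15"));  // 1987
console.log(generalizeCoord(37.7749295));  // 37.775
console.log(truncateIpv4("203.0.113.42")); // 203.0.0.0/16
```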
Why this shape
GDPR Art 5(1)(c) data minimization, HIPAA Safe Harbor de-identification, and PCI-DSS Req 3 cardholder-data storage rules all share the same shape: walk every column, classify it, decide on a minimization action. PIIScout prototypes that workflow directly. The tool ships with the migration SQL because the gap between “this column is PII” and “we actually removed it” is where programs die.
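The walk, classify, decide loop can be sketched with a small rule table. This is an assumption-heavy illustration, not the tool's actual logic: the name-based regex rules, the `classify` function, and every sample column except `payments.card_number` are hypothetical, and a real scanner would likely sample values rather than rely on column names alone.

```javascript
// Hypothetical name-based rules; the first matching rule wins.
const RULES = [
  { pattern: /cvv|cvc/i, category: "cardholder_data", action: "drop", regulation: "PCI-DSS v4.0 Req 3.3.1" },
  { pattern: /card_number/i, category: "cardholder_data", action: "tokenize", regulation: "PCI-DSS v4.0 Req 3.5" },
  { pattern: /dob|birth/i, category: "quasi_identifier", action: "generalize", regulation: "HIPAA 45 CFR 164.514" },
  { pattern: /zip|postal/i, category: "quasi_identifier", action: "generalize", regulation: "HIPAA 45 CFR 164.514" },
];

// Fallback for columns that match no rule.
const DEFAULT_RULE = { category: "non_pii", action: "keep", regulation: null };

// Walk every column, classify it, and decide on a minimization action.
function classify(columns) {
  return columns.map(({ table, column }) => {
    const rule = RULES.find((r) => r.pattern.test(column)) ?? DEFAULT_RULE;
    return {
      table,
      column,
      category: rule.category,
      action: rule.action,
      regulation: rule.regulation,
    };
  });
}

const findings = classify([
  { table: "payments", column: "card_number" },
  { table: "customers", column: "date_of_birth" }, // assumed column name
  { table: "orders", column: "order_total" },      // assumed column name
]);
console.log(findings.map((f) => `${f.table}.${f.column}: ${f.action}`));
```

Ordering the rules from most to least restrictive means a column that matches several patterns gets the stricter action.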
How it ships
Single HTML file, ~29KB. Zero dependencies. 38 columns + 6 categories + 6 actions + SQL templating in 280 lines of vanilla JavaScript.
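SQL templating of this kind needs very little code. The sketch below shows one possible approach and is not the tool's source: the `TEMPLATES` map, the `{table}`/`{column}` placeholder syntax, the `generalize_zip` key, and the `renderSql` function are assumptions, and only the drop template mirrors the `ALTER TABLE … DROP COLUMN` form named earlier.

```javascript
// Hypothetical templates keyed by action; placeholders are {table} and {column}.
const TEMPLATES = {
  drop: "ALTER TABLE {table} DROP COLUMN {column};",
  generalize_zip: "UPDATE {table} SET {column} = LEFT({column}, 3);",
  keep: "-- keep {table}.{column}: no migration, enforce access controls",
};

// Accept only plain SQL identifiers so a column name cannot inject SQL.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function renderSql(action, table, column) {
  if (!Object.prototype.hasOwnProperty.call(TEMPLATES, action)) {
    throw new Error(`no template for action: ${action}`);
  }
  for (const id of [table, column]) {
    if (!IDENTIFIER.test(id)) throw new Error(`invalid identifier: ${id}`);
  }
  return TEMPLATES[action]
    .replace(/\{table\}/g, table)
    .replace(/\{column\}/g, column);
}

console.log(renderSql("drop", "payments", "cvv"));
// ALTER TABLE payments DROP COLUMN cvv;
```

Validating identifiers before substitution matters because the generated SQL is meant to be copied into a migration, where an unescaped name would run as written.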