DLP Audit

Methodology

DLP (Data Loss Prevention) patterns are well-defined regular expressions for SSNs, emails, phone numbers, credit cards, and credentials. They work in both directions:

Offense (Redaction Resolution): Find unredacted PII near [REDACTED] markers. An SSN found within 200 characters of a blacked-out name likely belongs to the redacted person. Cross-reference that SSN against other documents where the name is visible to resolve the redaction.

Defense (Our DLP): Find PII that our search API is currently serving. If someone's SSN is in our index, mask it before returning results. This protects victims.
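
The masking step on the defense side could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the function names `mask_ssns` and `mask_results`, the `snippet` field, and the `***-**-****` mask format are all hypothetical, and the word-boundary anchors are an addition to the pattern quoted in this section.

```python
import re

# SSN-shaped pattern; the \b anchors are an assumption added for this sketch.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_ssns(text: str) -> str:
    """Replace every SSN-shaped substring with a fixed mask."""
    return SSN_RE.sub("***-**-****", text)

def mask_results(results: list[dict], field: str = "snippet") -> list[dict]:
    """Return copies of search results with the text field masked.

    `field` is a hypothetical name for the text a search API returns.
    The input list and its dicts are left unmodified.
    """
    masked = []
    for result in results:
        copy = dict(result)
        if isinstance(copy.get(field), str):
            copy[field] = mask_ssns(copy[field])
        masked.append(copy)
    return masked
```

Masking at response time, as here, leaves the index untouched; whether the index itself should also be scrubbed is a separate decision this sketch does not address.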

Confidence scoring (additive points): same PII appears in an unredacted doc with a name (+30), matches across 2+ docs (+20), cross-index hit (+15), same date/location (+10), same doc_type (+5), co-occurrence (+5). The total is capped at 95% (epistemic humility).
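
The scoring rule is plain addition with a ceiling, which can be sketched as follows. The signal keys and the function name `confidence` are hypothetical labels chosen for this example; only the point values and the cap come from the text above. Note that the six listed signals sum to 85, so the 95 cap only binds if the real pipeline starts from a base score or has further signals not listed here.

```python
# Point values from the methodology; the key names are assumptions.
SIGNAL_POINTS = {
    "unredacted_with_name": 30,
    "multi_doc_match": 20,
    "cross_index_hit": 15,
    "same_date_location": 10,
    "same_doc_type": 5,
    "co_occurrence": 5,
}
CAP = 95  # never report certainty

def confidence(signals) -> int:
    """Sum the points for each distinct signal present, capped at CAP."""
    present = set(signals)
    unknown = present - set(SIGNAL_POINTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return min(CAP, sum(SIGNAL_POINTS[s] for s in present))
```

Each signal counts once even if it is passed more than once, so repeated evidence of the same kind does not inflate the score.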

False positive control: The SSN pattern (\d{3}-\d{2}-\d{4}) also matches phone numbers and reference numbers, so filtering is context-dependent: require "SSN", "social security", or "tax id" in the surrounding text, and reject matches near "phone", "case no", or "docket".
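
A minimal sketch of this context filter is shown below. The 60-character window, the function name `find_ssns`, the case-insensitive matching, and the choice to let a deny term override an allow term are all assumptions made for the example; the text above does not specify them.

```python
import re

# Pattern from the methodology, with \b anchors added as an assumption.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
ALLOW_TERMS = ("ssn", "social security", "tax id")
DENY_TERMS = ("phone", "case no", "docket")

def find_ssns(text: str, window: int = 60) -> list[str]:
    """Return SSN-shaped matches whose nearby text supports an SSN reading.

    A match is kept only if an allow term appears within `window`
    characters on either side and no deny term does.
    """
    lowered = text.lower()
    hits = []
    for match in SSN_RE.finditer(text):
        start = max(0, match.start() - window)
        context = lowered[start:match.end() + window]
        if any(term in context for term in DENY_TERMS):
            continue  # looks like a phone, case, or docket number
        if any(term in context for term in ALLOW_TERMS):
            hits.append(match.group())
    return hits
```

Giving the deny list priority trades recall for precision: a genuine SSN printed next to a docket number is dropped, which is the safer failure for a filter meant to suppress false positives.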

PII Pattern Distribution

DLP Findings Browser

Resolution Candidates (Redacted Names Identified)
