PII Detection for AI Agents: Keep Sensitive Data Off the Cloud
How Klawty detects email addresses, phone numbers, credit cards, and IBANs in agent task content — and routes to local models to keep PII off cloud APIs.
The problem nobody talks about
Your AI agent processes a client email. The email contains a phone number, an address, and a bank account IBAN. Your agent sends all of this to a cloud LLM for analysis. Congratulations — you just transmitted personally identifiable information to a third-party API, potentially violating GDPR, and definitely violating your client's trust.
This happens silently in every agent framework that doesn't have a privacy layer. The agent doesn't know what PII is. The LLM provider may retain the data, and may use it for training depending on its policy and whether you've opted out. And you have no audit trail of what was sent where.
The privacy router
Klawty's privacy router sits between the agent runtime and the LLM provider. Every outbound prompt is scanned before it leaves the machine.
The scanner runs four detection patterns:
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /(?:\+|00)[1-9]\d{1,14}|\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
  credit_card: /\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))\d{8,12}\b/g,
  iban: /\b[A-Z]{2}\d{2}[A-Z0-9]{4,30}\b/g
};
The email pattern follows RFC 5322 (simplified). The phone pattern handles international formats with country codes and common US/EU formats. The credit card pattern validates prefix ranges for Visa, Mastercard, Amex, and Discover. The IBAN pattern matches the ISO 13616 structure — 2-letter country code, 2 check digits, then the BBAN.
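The IBAN regex only checks the shape; the two check digits can additionally be verified with the mod-97 step from ISO 13616, which cuts false positives sharply. A sketch of that verification (the function name is an assumption, not part of Klawty):

```javascript
// Verify an IBAN's check digits per ISO 13616: move the first four
// characters to the end, map letters to numbers (A=10 ... Z=35), and
// check that the resulting number mod 97 equals 1.
function isValidIban(iban) {
  const normalized = iban.replace(/\s+/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{4,30}$/.test(normalized)) return false;
  const rearranged = normalized.slice(4) + normalized.slice(0, 4);
  const digits = rearranged.replace(/[A-Z]/g, ch =>
    String(ch.charCodeAt(0) - 55) // 'A'.charCodeAt(0) is 65, so A -> 10
  );
  // Piecewise mod 97, one digit at a time, to stay within Number precision.
  let remainder = 0;
  for (const ch of digits) {
    remainder = (remainder * 10 + Number(ch)) % 97;
  }
  return remainder === 1;
}
```

A plausible use is as a second pass: keep the cheap regex as the first filter, then confirm each candidate with `isValidIban` before treating it as PII.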
The routing decision
When PII is detected, the action field in klawty-policy.yaml determines what happens:
route_local — The entire prompt is redirected to a local model running on Ollama. The PII never leaves your machine. The local model (typically Llama 3 or Mistral) handles the task. Slower, less capable, but the data stays on your hardware.
redact — PII is replaced with placeholders before sending to the cloud LLM. An email address becomes [EMAIL_1]; a phone number like +352 621 123 456 becomes [PHONE_1]. The response is then de-redacted, with each placeholder swapped back for its original value. The cloud LLM never sees the real data.
block — The task is rejected entirely. The agent receives an error: "Task contains PII and cannot be processed. Reassign to a human operator." Used for high-sensitivity environments.
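The redact action can be sketched as a pair of helpers. The placeholder format matches the examples above, but the function names and the mapping shape are assumptions, and only the email and phone patterns are shown (the others work the same way):

```javascript
// Sketch of redact/de-redact: swap each PII match for a numbered
// placeholder, remember the mapping, and restore originals afterwards.
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /(?:\+|00)[1-9]\d{1,14}|\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g
};

function redact(prompt) {
  const mapping = {};
  let redacted = prompt;
  for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
    let i = 0;
    redacted = redacted.replace(pattern, match => {
      const placeholder = `[${type.toUpperCase()}_${++i}]`;
      mapping[placeholder] = match;
      return placeholder;
    });
  }
  return { redacted, mapping };
}

function deredact(response, mapping) {
  let restored = response;
  for (const [placeholder, original] of Object.entries(mapping)) {
    // split/join sidesteps regex-escaping the [brackets] in placeholders.
    restored = restored.split(placeholder).join(original);
  }
  return restored;
}
```

Note the order of operations: the cloud LLM only ever receives `redacted`, and `mapping` never leaves the machine.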
The GDPR angle
Under the GDPR, you're the data controller for your clients' personal data. Send PII to a cloud API and that provider becomes a processor acting on your behalf (or, if you're yourself a processor, a sub-processor). Either way, Article 28 means you need a Data Processing Agreement with every LLM provider your agent talks to.
Or you can keep PII local. The privacy router makes this automatic. Your agents process client data without you having to think about which tasks contain PII and which don't.
EU AI Act
Starting August 2, 2026, the EU AI Act introduces additional requirements for AI systems processing personal data. Klawty's PII detection and local routing gives you a technical measure you can point to in your compliance documentation. It's not the whole answer — you'll still need proper risk assessment via something like ARCA — but it's a strong foundation.
The detection isn't perfect; regex patterns miss edge cases. But catching 95% of PII automatically is far better than catching none.