PII Redaction for LLM Applications: Protect User Data Before It Reaches the Model

The Hidden Privacy Risk in Every LLM Prompt

When you send a user's message to the OpenAI API, you're sending it to a third-party data processor. Under GDPR, this requires a Data Processing Agreement (DPA) and appropriate safeguards. Under HIPAA, a covered entity or business associate that sends protected health information (PHI) to an LLM API without a Business Associate Agreement (BAA) is in violation. For SOC 2, auditors expect controls that prevent unnecessary exposure of sensitive data to external services.

The problem is that users routinely include sensitive information in prompts. They rarely do it deliberately; they simply treat the AI like any other helpful assistant:

"My email is jane.doe@example.com, can you draft a reply to..."
"My card ending in 4242 was charged incorrectly, help me write a dispute..."
"My NI number is QQ 12 34 56 C, do I need to report this income?"
"Here's my phone number in case you need it: 07700 900123"

Each of these prompts contains personal data, and the moment your API call forwards it to OpenAI or Anthropic, you are responsible under data protection law for that disclosure.

Depending on your account's data residency settings, data sent to OpenAI's API may be processed in the US, which makes it an international transfer that needs its own safeguards under GDPR. Even with a DPA in place, data minimisation is a core GDPR principle (Article 5(1)(c)), so sending PII the model does not need remains a compliance risk.

What PII Gets Redacted

Proxide's PII redaction layer sits between your application and the upstream LLM provider. Before any request is forwarded, it scans the prompt and replaces detected PII with placeholder tokens:

Email addresses: jane.doe@example.com → [REDACTED:EMAIL]

Credit and debit card numbers (all major formats, with and without spaces): 4111 1111 1111 1111 → [REDACTED:CARD]

US Social Security numbers: 123-45-6789 → [REDACTED:SSN]

UK National Insurance numbers: QQ 12 34 56 C → [REDACTED:NINO]

Phone numbers (UK and US formats): +44 7700 900123 → [REDACTED:PHONE], (555) 123-4567 → [REDACTED:PHONE]
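The substitution step can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not Proxide's implementation: `redact`, `PiiType`, and the simplified patterns are hypothetical, the card pattern omits the Luhn check, and the phone pattern covers only UK mobile and US `(555) 123-4567` style numbers. The `enabled` parameter mirrors choosing which categories to redact, and the returned `redactions` array records each token, original value, and character offset.

```typescript
// Hypothetical sketch of placeholder-token redaction. The patterns are simplified
// illustrations (no Luhn or HMRC checks), not Proxide's actual detection rules.
type PiiType = "EMAIL" | "CARD" | "SSN" | "NINO" | "PHONE";

interface Redaction {
  token: string; // placeholder inserted into the prompt, e.g. "[REDACTED:EMAIL]"
  original: string; // the matched value
  position: number; // character offset of the match in the original text
}

const PATTERNS: Record<PiiType, RegExp> = {
  EMAIL: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  CARD: /\b(?:\d[ -]?){12,18}\d\b/g, // 13-19 digits, optional space/hyphen separators
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
  NINO: /\b[A-Z]{2} ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b/g,
  // UK mobiles (07xxx or +44 7xxx) and US "(555) 123-4567" style numbers only
  PHONE: /(?:\+44 ?7\d{3}|\b07\d{3}) ?\d{6}\b|\(\d{3}\) ?\d{3}-\d{4}/g,
};

function redact(
  text: string,
  enabled: PiiType[] = Object.keys(PATTERNS) as PiiType[],
): { text: string; redactions: Redaction[] } {
  // Collect every match from every enabled category.
  const found: (Redaction & { end: number })[] = [];
  for (const type of enabled) {
    for (const m of text.matchAll(PATTERNS[type])) {
      const start = m.index ?? 0;
      found.push({
        token: `[REDACTED:${type}]`,
        original: m[0],
        position: start,
        end: start + m[0].length,
      });
    }
  }
  // Process matches left to right; on ties prefer the longer one, and skip
  // any match that overlaps one already kept.
  found.sort((a, b) => a.position - b.position || b.end - a.end);
  const kept: (Redaction & { end: number })[] = [];
  let lastEnd = 0;
  for (const r of found) {
    if (r.position >= lastEnd) {
      kept.push(r);
      lastEnd = r.end;
    }
  }
  // Rebuild the text with placeholders in place of the matched values.
  let out = "";
  let cursor = 0;
  for (const r of kept) {
    out += text.slice(cursor, r.position) + r.token;
    cursor = r.end;
  }
  out += text.slice(cursor);
  return {
    text: out,
    redactions: kept.map(({ token, original, position }) => ({ token, original, position })),
  };
}

const { text } = redact("Call me on 07700 900123 or email jane.doe@example.com");
console.log(text); // → Call me on [REDACTED:PHONE] or email [REDACTED:EMAIL]
```

Resolving overlaps left to right means any span of text is replaced by at most one token, even when two patterns match it.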

How Detection Works

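Proxide uses precision-focused regex patterns for each supported PII type. Card detection adds Luhn checksum validation, National Insurance numbers must match HMRC's format rules, and phone detection targets UK and US number formats. Because each of these data types has a well-defined structure, such patterns tend to produce few false positives. The trade-off is that pattern matching finds only structured identifiers: free-form personal data such as names or postal addresses falls outside these categories.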
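The two structural checks mentioned above can be sketched as follows. This is an illustration, not Proxide's code: `luhnValid`, `findCardNumbers`, and `NINO_STRICT` are hypothetical names, and the NINO prefix and suffix rules encode one reading of HMRC's published format. Under those rules a "QQ" prefix, as used in the examples earlier, is not an allocatable prefix, so this strict sketch would not flag it; a production pattern may deliberately be looser.

```typescript
// Luhn checksum: walking from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require the sum to be divisible by 10.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Card candidates: 13-19 digits with optional space or hyphen separators.
const CARD_CANDIDATE = /\b(?:\d[ -]?){12,18}\d\b/g;

// Keep only candidates that pass the checksum.
function findCardNumbers(text: string): string[] {
  return [...text.matchAll(CARD_CANDIDATE)].map((m) => m[0]).filter(luhnValid);
}

// Assumed reading of HMRC's NINO rules: neither prefix letter may be D, F, I, Q,
// U or V; the second may not be O; the prefixes BG, GB, KN, NK, NT, TN and ZZ
// are not allocated; the suffix is A-D.
const NINO_STRICT =
  /\b(?!BG|GB|KN|NK|NT|TN|ZZ)[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z] ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b/;

console.log(findCardNumbers("Pay 4111 1111 1111 1111, ref 1234 5678 9012 3456"));
// → [ '4111 1111 1111 1111' ]
console.log(NINO_STRICT.test("AB 12 34 56 C")); // → true
```

Running the checksum after the regex match, rather than trying to encode it in the pattern, keeps the regex simple and still rejects about nine in ten random digit runs of card length.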

Redaction Is One-Way at the Gateway

An important design point: Proxide redacts data *before* forwarding to the LLM. The LLM never sees the original PII. This means:

The LLM receives [REDACTED:EMAIL] and responds naturally ("I've noted the email address you provided...")
Your application receives the LLM's response with the placeholder still present
If you need to restore the original value in the response, your application can do so locally: Proxide provides the redaction map in the response headers for this purpose

x-proxide-redactions: [{"token":"[REDACTED:EMAIL]","original":"jane.doe@example.com","position":45}]

As a result, the original values never reach the LLM provider: the model only ever sees the placeholder tokens.
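Restoring values in the model's reply can be done entirely client-side. The sketch assumes the header carries a JSON array shaped like the `x-proxide-redactions` example above; `restoreRedactions` is a hypothetical helper, and how you read the raw header value depends on your HTTP client or SDK. Because every email shares the token `[REDACTED:EMAIL]`, it also assumes the model mentions redacted values in the same order they appeared in the prompt; if the reply reorders them, the mapping will be wrong.

```typescript
// Shape of each entry in the x-proxide-redactions header, as shown in the example.
interface RedactionEntry {
  token: string;
  original: string;
  position: number;
}

// Replace placeholder tokens in the model's reply with the original values.
// Tokens of the same type are filled in prompt order (ascending `position`);
// any extra occurrences without a matching entry are left as placeholders.
function restoreRedactions(reply: string, headerValue: string | null): string {
  if (!headerValue) return reply;
  const entries = (JSON.parse(headerValue) as RedactionEntry[])
    .slice()
    .sort((a, b) => a.position - b.position);

  // One queue of original values per token, e.g. "[REDACTED:EMAIL]" -> [...]
  const queues = new Map<string, string[]>();
  for (const e of entries) {
    const queue = queues.get(e.token) ?? [];
    queue.push(e.original);
    queues.set(e.token, queue);
  }

  return reply.replace(/\[REDACTED:[A-Z]+\]/g, (token) => queues.get(token)?.shift() ?? token);
}

const header =
  '[{"token":"[REDACTED:EMAIL]","original":"jane.doe@example.com","position":45}]';
console.log(restoreRedactions("I'll reply from [REDACTED:EMAIL].", header));
// → I'll reply from jane.doe@example.com.
```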

Zero Code Changes Required

Once your client points at your Proxide gateway, PII redaction is configured entirely in your Proxide dashboard: no SDK changes, no middleware to write, no regex to maintain. Enable it with a toggle, choose which PII categories to redact, and every request through your Proxide gateway is scanned automatically.

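```typescript
import OpenAI from "openai";

// Assumption: the client targets the Proxide gateway (the same base URL as the
// curl example below) with a Proxide key; PROXIDE_API_KEY is a placeholder name.
const client = new OpenAI({
  baseURL: "https://gateway.proxide.ai/openai/v1",
  apiKey: process.env.PROXIDE_API_KEY,
});

// Your existing request code, unchanged
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      // Proxide redacts the email address before forwarding the request
      content: "My email is jane.doe@example.com, help me draft a reply",
    },
  ],
});

console.log(response.choices[0].message.content);
```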

Testing PII Redaction

You can verify redaction is working by checking the x-proxide-pii-detected response header:

```bash
curl https://gateway.proxide.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer prox-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "My card is 4111111111111111"}]
  }' -v 2>&1 | grep -i x-proxide
```

Response headers:

x-proxide-pii-detected: true
x-proxide-pii-types: CARD
x-proxide-redaction-count: 1
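To run the same check from application code instead of curl, a small helper can parse these headers into a typed summary. This is a sketch: `readPiiHeaders`, `checkRedaction`, and `PiiSummary` are hypothetical names, it assumes `fetch` and `Headers` are available (Node 18+ or a browser), and it assumes multiple types in `x-proxide-pii-types` are comma-separated, which the single-type example above does not confirm.

```typescript
// Hypothetical typed view of the Proxide PII response headers.
interface PiiSummary {
  detected: boolean;
  types: string[];
  count: number;
}

function readPiiHeaders(headers: Headers): PiiSummary {
  const types = headers.get("x-proxide-pii-types");
  return {
    detected: headers.get("x-proxide-pii-detected") === "true",
    // Assumption: several types would be sent comma-separated, e.g. "CARD,EMAIL".
    types: types ? types.split(",").map((t) => t.trim()).filter((t) => t.length > 0) : [],
    count: Number(headers.get("x-proxide-redaction-count") ?? "0"),
  };
}

// Send the same test prompt as the curl command and summarise the headers.
async function checkRedaction(apiKey: string): Promise<PiiSummary> {
  const res = await fetch("https://gateway.proxide.ai/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [{ role: "user", content: "My card is 4111111111111111" }],
    }),
  });
  return readPiiHeaders(res.headers);
}
```

If the gateway returns the headers shown above, `checkRedaction(key)` should resolve to `{ detected: true, types: ["CARD"], count: 1 }`, which makes it easy to turn the manual check into an automated test.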

Compliance Benefits

Deploying PII redaction via Proxide supports compliance with:

GDPR (EU): Data minimisation (Article 5(1)(c)), technical measures for security of processing (Article 32), and less personal data processed by your LLM vendor on your behalf.

UK GDPR and DPA 2018: Same principles apply for UK-based services post-Brexit.

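HIPAA (US healthcare): Removes several of the identifiers listed under HIPAA's Safe Harbor de-identification method (Social Security numbers, phone numbers, email addresses) before prompts reach LLM APIs that don't hold a BAA with you. Names, dates, and clinical details fall outside the supported PII types, so redaction reduces PHI exposure but does not on its own make sending PHI to such an API compliant.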

SOC 2 Type II: Provides evidence of a technical control that limits unnecessary PII exposure to third-party services, directly relevant to the Confidentiality and Privacy Trust Services Criteria.

ISO 27001: Supports Annex A controls such as data masking (8.11 in ISO/IEC 27001:2022), classification of information, and information security in supplier relationships.

Audit Logging

Every redaction event is logged in your Proxide audit trail: timestamp, client ID, PII types detected, count of redactions, and which upstream provider received the sanitised request. These logs support compliance reporting and incident investigation without storing the actual PII values.
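The audit record described above can be modelled as a small data structure. The `RedactionAuditEvent` type and `toAuditEvent` helper below are hypothetical, not Proxide's log schema; they illustrate how a record can carry categories and counts while leaving the `original` values out entirely.

```typescript
// Hypothetical audit record: metadata only, never the redacted values themselves.
interface RedactionAuditEvent {
  timestamp: string; // ISO 8601
  clientId: string;
  piiTypes: string[]; // distinct categories, e.g. ["CARD", "EMAIL"]
  redactionCount: number;
  upstreamProvider: string; // e.g. "openai"
}

interface RedactionLike {
  token: string; // e.g. "[REDACTED:CARD]"
  original: string; // present on input but deliberately not copied into the event
}

function toAuditEvent(
  clientId: string,
  upstreamProvider: string,
  redactions: RedactionLike[],
  now: Date = new Date(),
): RedactionAuditEvent {
  // "[REDACTED:CARD]" -> "CARD", de-duplicated in first-seen order
  const piiTypes = [
    ...new Set(redactions.map((r) => r.token.replace(/^\[REDACTED:|\]$/g, ""))),
  ];
  return {
    timestamp: now.toISOString(),
    clientId,
    piiTypes,
    redactionCount: redactions.length,
    upstreamProvider,
  };
}
```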

Getting Started

PII redaction is available on all Proxide plans. Sign up at app.proxide.ai, point your client at your Proxide gateway, and enable PII redaction in your dashboard settings; your next request will be scanned automatically. No SDK updates, no regex maintenance.