Zero-Copy PII Shielding: Architecting Local-First Gateways for Secure LLM Pipelines

Large Language Models (LLMs) are transforming how we build products, but they introduce a massive architectural threat: the dissolution of the security perimeter.

Every time a user inputs text, or an internal agent gathers database rows for Retrieval-Augmented Generation (RAG), sensitive data can be sent unredacted to third-party APIs (OpenAI, Anthropic, Google Cloud). This runs counter to Zero Trust principles and can breach obligations under compliance frameworks such as GDPR, HIPAA, and SOC 2.

Some companies attempt to solve this by running heavy self-hosted models, which requires substantial GPU infrastructure and adds significant operational overhead. Others block LLMs entirely, hindering developer productivity.

There is a third path: reversible local-first PII shielding. By intercepting prompts at a native, compiled network middleware layer before they exit the corporate boundary, we can redact sensitive data, let remote LLMs reason over anonymous tokens, and restore the private information on the way back.

In this article, we'll walk through how we built this pattern into NLProxy in Rust, relying on zero-copy string slices, compile-time lifetime checking, and RAII-based cleanup to keep raw data inside the boundary while adding under 3 milliseconds of gateway latency.


The Architecture: Local-First Reversible Tokenization

The local-first gateway divides your AI pipeline into two execution planes:

  1. The Private Control Plane (Local): Runs inside your virtual private cloud (VPC) or local machine. It manages raw data, Named Entity Recognition (NER), encryption keys, and the reconstitution map.

  2. The Public Reasoning Plane (Cloud): The remote LLM provider. It receives only anonymous prompts and returns text containing redacted placeholders.

       [Raw User Prompt]
              │
              ▼
   ┌──────────────────────┐
   │  Local Rust Gateway  │
   │  (PromptShield)      ├──────► [Local Reconstitution Map] (RAM)
   └──────────┬───────────┘
              │ (Redacted Prompt)
              ▼
   ┌──────────────────────┐
   │   Remote Cloud LLM   │ (Sees only: "Describe the tax records of __PROT__")
   └──────────┬───────────┘
              │ (Redacted Response)
              ▼
   ┌──────────────────────┐
   │  Local Rust Gateway  │
   │ (PromptReconstructor)◄─────── [Local Reconstitution Map] (RAM)
   └──────────┬───────────┘
              │ (Restored Response)
              ▼
      [Secure Response]

By decoupling reasoning from identity, the LLM can still answer complex questions about context without ever "knowing" the actual customer's name, email, or IP address.
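
The two planes can be sketched as a minimal round trip. This is an illustration under stated assumptions, not NLProxy's code: `redact`, `remote_llm_stub`, `restore`, and `round_trip` are hypothetical names, the list of sensitive values stands in for the detector's output, and the remote model is replaced by a local stub that echoes its input.

```rust
use std::collections::HashMap;

/// Private plane: swap each known sensitive value for an opaque placeholder.
/// `sensitive` stands in for whatever the detector found (hypothetical input).
fn redact(prompt: &str, sensitive: &[&str]) -> (String, HashMap<String, String>) {
    let mut redacted = prompt.to_string();
    let mut map = HashMap::new();
    for (i, &value) in sensitive.iter().enumerate() {
        // Sequential IDs keep this sketch deterministic; the article uses random IDs.
        let placeholder = format!("__PROT_{:08}__", i + 1);
        if redacted.contains(value) {
            redacted = redacted.replace(value, &placeholder);
            map.insert(placeholder, value.to_string());
        }
    }
    (redacted, map)
}

/// Public plane: a stub for the remote LLM. It only ever receives redacted text.
fn remote_llm_stub(redacted_prompt: &str) -> String {
    format!("Echo: {}", redacted_prompt)
}

/// Private plane: put the original values back into the response.
fn restore(response: &str, map: &HashMap<String, String>) -> String {
    let mut restored = response.to_string();
    for (placeholder, original) in map {
        restored = restored.replace(placeholder, original);
    }
    restored
}

/// Runs the full loop and returns (what the cloud saw, what the user sees).
fn round_trip(prompt: &str, sensitive: &[&str]) -> (String, String) {
    let (redacted, map) = redact(prompt, sensitive);
    let response = remote_llm_stub(&redacted);
    (redacted, restore(&response, &map))
}
```

Calling `round_trip("Email alice@example.com about 10.0.0.5", &["alice@example.com", "10.0.0.5"])` returns the redacted prompt the stub received and the restored response; the map itself never leaves the local functions.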


Why Rust? Memory Safety and Deterministic Lifecycle

Implementing a PII shielding layer in Python or Node.js introduces security and performance risks.

  • Garbage Collection (GC) Jitter: Heavy string formatting, regex parsing, and tokenization generate millions of short-lived objects. In garbage-collected languages, this causes unpredictable latency spikes (jitter) under high load.

  • Residual RAM Violations: Transient variables containing raw, unmasked PII can linger in the heap of a Python microservice for minutes before garbage collection clears them. If an attacker obtains a memory dump of your gateway, your redaction system has failed.

Rust solves both challenges through its fundamental design principles:

1. RAII (Resource Acquisition Is Initialization)

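In Rust, an owned value lives exactly as long as its owner's scope. When the owner goes out of scope, Rust runs the destructor and frees the memory immediately, with no GC sweep to wait for. As soon as PromptShield::shield returns, its intermediate buffers are released. Note that freeing memory does not overwrite it: the bytes remain until the allocator reuses them, so buffers that hold raw PII should also be zeroed on drop (for example with a crate such as zeroize) if memory dumps are part of your threat model.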
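
To make the scope-bound cleanup concrete, here is a minimal sketch of a buffer that wipes itself in its destructor. `SecretBuf`, `process`, and the `WIPED` counter are hypothetical names introduced for illustration; the hand-rolled volatile wipe is an assumption of this sketch, and production code would more likely rely on a maintained crate such as zeroize.

```rust
use std::sync::atomic::{compiler_fence, AtomicUsize, Ordering};

/// Counts how many buffers have been wiped (only here to make drops observable).
static WIPED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical owner of one raw, unmasked value.
struct SecretBuf {
    bytes: Vec<u8>,
}

impl SecretBuf {
    fn new(raw: &str) -> Self {
        SecretBuf { bytes: raw.as_bytes().to_vec() }
    }

    fn len(&self) -> usize {
        self.bytes.len()
    }
}

impl Drop for SecretBuf {
    fn drop(&mut self) {
        for b in self.bytes.iter_mut() {
            // Volatile write so the compiler cannot remove the wipe as a dead store.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
        compiler_fence(Ordering::SeqCst);
        WIPED.fetch_add(1, Ordering::SeqCst);
        // After this returns, the Vec field is dropped and its memory is freed.
    }
}

/// The secret exists only for the duration of this call.
fn process(raw: &str) -> usize {
    let secret = SecretBuf::new(raw);
    let n = secret.len();
    n
    // `secret` goes out of scope here: Drop runs (wipe), then deallocation.
}
```

The wipe happens at a point fixed by the program's structure, the end of `process`, rather than whenever a collector next runs. The caller's own copy of `raw` is a separate buffer and is not affected.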

2. Zero-Copy Lifetimes (&str)

Instead of copying strings repeatedly during tokenization, segmentation, and scanning, we work with string slices (&str) that point directly into the original input buffer, and the borrow checker guarantees those slices never outlive it. The only allocations are for the redacted output and the placeholder map, which keeps allocation overhead low and throughput high.
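
A small sketch of what borrowing from the input buffer looks like in practice. The function names `candidate_tokens` and `borrows_from` are hypothetical, and the "contains an @" filter is a stand-in for real detection; the point is only that the returned slices alias the input rather than copying it.

```rust
/// Returns candidate tokens as slices borrowed from `input`.
/// No token text is copied; the only allocation is the Vec of (pointer, length) pairs.
fn candidate_tokens<'a>(input: &'a str) -> Vec<&'a str> {
    input
        .split_whitespace()
        .filter(|tok| tok.contains('@'))
        .collect()
}

/// True if `slice` lies inside the memory occupied by `buffer`.
fn borrows_from(buffer: &str, slice: &str) -> bool {
    let start = buffer.as_ptr() as usize;
    let end = start + buffer.len();
    let s = slice.as_ptr() as usize;
    s >= start && s + slice.len() <= end
}
```

Because the lifetime `'a` ties each slice to `input`, the compiler rejects any code that tries to keep a token alive after the input buffer has been dropped.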


Implementing the PII Shield in Rust

NLProxy's PromptShield combines deterministic regular expressions with an optimized BERT Named Entity Recognition model to identify sensitive data. When a match is found, it generates a random eight-digit placeholder (e.g., __PROT_18273948__) and saves the mapping in a local hashmap.

Here is a simplified implementation of the shielding module, covering only the regex stage:

// External crates used by this listing: `regex` and `rand`.
use regex::Regex;
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub enum DomainMode {
    General,
    Legal,
    Finance,
    Code,
}

pub struct PromptShield {
    pub mode: DomainMode,
    patterns: Vec<(&'static str, Regex)>,
}

impl PromptShield {
    pub fn new(mode: DomainMode) -> Self {
        let mut patterns = Vec::new();
        // Standard PII regex rules
        patterns.push(("ip", Regex::new(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b").unwrap()));
        patterns.push(("email", Regex::new(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b").unwrap()));
        
        match mode {
            DomainMode::Finance => {
                patterns.push(("iban", Regex::new(r"(?i)\b[A-Z]{2}\d{2}[A-Z0-9]{4,30}\b").unwrap()));
            }
            DomainMode::Code => {
                patterns.push(("file_path", Regex::new(r"(?:/[a-zA-Z0-9._-]+)+/[a-zA-Z0-9._-]+").unwrap()));
            }
            _ => {}
        }

        PromptShield { mode, patterns }
    }

    /// Replaces every regex match in `text` with a random placeholder and
    /// returns the redacted text plus the placeholder -> original map.
    pub fn shield(&self, text: &str) -> (String, HashMap<String, String>) {
        let mut processed_text = text.to_string();
        let mut placeholders = HashMap::new();

        for (_label, pattern) in &self.patterns {
            // Collect matches first so the immutable borrow of `processed_text`
            // ends before we mutate it.
            let mut replacements = Vec::new();
            for mat in pattern.find_iter(&processed_text) {
                let original = mat.as_str().to_string();
                let rand_id: u32 = rand::random();
                // Eight-digit ID; this simplified version does not check for collisions.
                let placeholder = format!("__PROT_{:08}__", rand_id % 100_000_000);
                replacements.push((mat.start(), mat.end(), placeholder, original));
            }

            // Replace back to front so the byte offsets of earlier matches stay valid
            for (start, end, placeholder, original) in replacements.into_iter().rev() {
                processed_text.replace_range(start..end, &placeholder);
                placeholders.insert(placeholder, original);
            }
        }

        (processed_text, placeholders)
    }
}
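
The simplified listing draws an eight-digit ID without checking whether it is already in use, so two values could in principle receive the same placeholder and one mapping would overwrite the other. One way to guard against that is sketched below. `insert_unique` is a hypothetical helper, not part of NLProxy, and the ID source is passed in as a closure so the example stays free of external crates; in the gateway it would be something like `rand::random`.

```rust
use std::collections::HashMap;

/// Inserts `original` under a placeholder that is not yet in `map`
/// and returns that placeholder. `next_id` supplies candidate IDs.
fn insert_unique(
    map: &mut HashMap<String, String>,
    original: &str,
    mut next_id: impl FnMut() -> u32,
) -> String {
    loop {
        let placeholder = format!("__PROT_{:08}__", next_id() % 100_000_000);
        if !map.contains_key(&placeholder) {
            map.insert(placeholder.clone(), original.to_string());
            return placeholder;
        }
        // Collision with an existing entry: draw another ID and try again.
    }
}
```

With a per-request map holding a handful of entries, a retry is rare, but the check makes the placeholder-to-value mapping unambiguous by construction.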

Restoring PII in Responses: The Reconstitution Step

Once the cloud LLM finishes generating, its response still contains the placeholders (or slightly altered versions if the model mangled their formatting, which the gateway corrects).

The PostLLMVerifier intercepts the output stream, reads the reconstitution map stored in the client's local memory, and replaces placeholders with their original values.

pub struct PostLLMVerifier;

impl PostLLMVerifier {
    pub fn reconstruct_placeholders(
        &self,
        response_text: &str,
        placeholders: &HashMap<String, String>,
    ) -> String {
        let mut corrected = response_text.to_string();
        for (placeholder, original_pii) in placeholders {
            corrected = corrected.replace(placeholder, original_pii);
        }
        corrected
    }
}
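
When the response arrives as a stream, a placeholder can be split across two chunks, and a plain `replace` on each chunk would miss it. The sketch below shows one way to handle that by holding back any trailing fragment that could still grow into a placeholder. `StreamRestorer` and `is_placeholder_prefix` are hypothetical names, and the code assumes the exact `__PROT_` plus eight digits plus `__` shape used in this article; it does not attempt to repair placeholders the model has altered.

```rust
use std::collections::HashMap;

const PREFIX: &[u8] = b"__PROT_";
const DIGITS: usize = 8;
const FULL_LEN: usize = 7 + 8 + 2; // "__PROT_" + 8 digits + "__"

/// True if `bytes` could be the beginning of a `__PROT_dddddddd__` placeholder.
fn is_placeholder_prefix(bytes: &[u8]) -> bool {
    bytes.iter().enumerate().all(|(i, &b)| {
        if i < PREFIX.len() {
            b == PREFIX[i]
        } else if i < PREFIX.len() + DIGITS {
            b.is_ascii_digit()
        } else {
            b == b'_'
        }
    })
}

/// Restores placeholders in a chunked response without losing split tokens.
struct StreamRestorer<'a> {
    map: &'a HashMap<String, String>,
    pending: String, // tail of the previous chunk that may be a partial placeholder
}

impl<'a> StreamRestorer<'a> {
    fn new(map: &'a HashMap<String, String>) -> Self {
        StreamRestorer { map, pending: String::new() }
    }

    /// Feeds one chunk and returns the text that is safe to emit now.
    fn push(&mut self, chunk: &str) -> String {
        let mut text = std::mem::take(&mut self.pending);
        text.push_str(chunk);
        for (placeholder, original) in self.map {
            text = text.replace(placeholder, original);
        }
        // Hold back the longest suffix that is a proper prefix of a placeholder.
        let max = text.len().min(FULL_LEN - 1);
        let hold = (1..=max)
            .rev()
            .find(|&n| is_placeholder_prefix(&text.as_bytes()[text.len() - n..]))
            .unwrap_or(0);
        // The held suffix starts with ASCII '_', so this is a valid char boundary.
        self.pending = text.split_off(text.len() - hold);
        text
    }

    /// Call at end of stream to flush whatever is still held back.
    fn finish(self) -> String {
        self.pending
    }
}
```

The cost of this approach is that at most 16 bytes of output are delayed until the next chunk or the end of the stream.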

Because the reconstitution map is never written to disk and never sent over the network, the cloud provider has no way to map the placeholders back to real names or credit card numbers. The placeholders are random identifiers rather than ciphertext, so there is nothing to decrypt.

If your cloud provider's database is compromised, or its models are trained on prompt logs, only placeholder text is exposed; every value the gateway detected and replaced stays inside your boundary.


Conclusion: Balancing Compliance with AI Speed

Choosing between strict compliance and AI innovation is a false dichotomy. By shifting your integration architecture from passive API clients to a local-first active gateway:

  1. You reduce compliance friction: Security teams can approve public LLM usage because detected PII never leaves the boundary.

  2. You keep latency overhead small: Compiled Rust performs the regex matching and token swapping in less time than a single network hop.

  3. You scale without lingering buffers: Rust's deterministic destructors release unmasked values as soon as they go out of scope, with no garbage collector involved.

Implementing this design pattern helps keep your AI apps fast and secure, with sensitive data under your own control.


To explore the source code of the PromptShield and review benchmarks, check out the NLProxy GitHub Repository.