Agent Traces - PII Scrubbing

Last updated: June 16, 2026

Span detects and redacts personally identifiable information (PII) from agent traces. Each detected value is replaced in place with a stable marker, so identical values can be correlated across events in the same trace without the original value ever being stored. Each marker also records the detected entity type and where the redaction happened: on-device or in Span's pipeline.
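One way to get markers with these properties is a keyed hash of the detected value. The sketch below is a minimal illustration under stated assumptions, not Span's implementation: the marker format `<ENTITY:digest:where>`, the per-trace key, and the function names `make_marker` and `redact_emails` are all hypothetical, and the email pattern is deliberately simple.

```python
import hashlib
import hmac
import re

# Hypothetical, deliberately simple email pattern (illustration only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def make_marker(value: str, entity: str, trace_key: bytes, where: str) -> str:
    """Return a deterministic marker for `value` (hypothetical format).

    The digest is an HMAC-SHA256 of the value keyed per trace, truncated to
    12 hex characters: equal values within one trace map to the same marker,
    and the marker itself does not contain the raw value.
    """
    digest = hmac.new(trace_key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"<{entity}:{digest}:{where}>"


def redact_emails(text: str, trace_key: bytes) -> str:
    """Replace every email match in `text` with its stable marker."""
    return EMAIL_RE.sub(
        lambda m: make_marker(m.group(0), "EMAIL_ADDRESS", trace_key, "device"),
        text,
    )


if __name__ == "__main__":
    key = b"example-trace-key"  # assumption: one secret key per trace
    print(redact_emails("From jane@example.com", key))
    print(redact_emails("Reply to jane@example.com", key))  # same marker as above
```

Because the key is per trace, the same address yields the same marker in both events above, while a different trace key would (with overwhelming probability) yield a different one. Truncating the digest keeps markers short at the cost of a small collision probability.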

Where redaction happens

  1. On-device (released). Pattern-based, deterministic redaction runs inside coding-hooks on the developer's machine, before any data leaves it. No model and no network call are involved.

  2. In Span's data pipeline (work in progress, not yet released). A second pass in Span's ingestion pipeline redacts raw data before it is stored, adding semantic detection (e.g. names, addresses) that patterns alone can't catch. This stage runs entirely within Span's infrastructure and never makes external calls, so your data is never sent to any third party.

Supported entities

Detection is built on Microsoft Presidio's predefined recognizers. Two complementary approaches cover two different kinds of PII.

Pattern-based, deterministic — on-device (released)

Structured PII that has a recognizable shape, detected with regular-expression patterns plus checksum/structure validators. Fast, exact, and runs entirely on the developer's machine.

Some of these patterns are intentionally naive and will occasionally over-match (produce false positives). Because the patterns target concise, well-bounded shapes, the resulting information loss is typically minimal — so we err toward redacting more than strictly necessary rather than risk leaving PII exposed.
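To make the trade-off concrete, here is a hypothetical naive IPv4 pattern (not Span's actual rule) that checks only the dotted-digit shape, with no limit of 255 per octet. It catches real addresses but also redacts a four-part version string; the cost of that false positive is only the version number, while the surrounding text survives. The plain `<IP_ADDRESS>` placeholder stands in for the real marker format.

```python
import re

# Hypothetical, deliberately naive IPv4 pattern: four dot-separated runs of
# 1-3 digits, with no check that each octet is <= 255.
NAIVE_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text: str) -> str:
    """Replace every naive IPv4 match with a simplified placeholder."""
    return NAIVE_IPV4.sub("<IP_ADDRESS>", text)


if __name__ == "__main__":
    print(redact("ssh to 10.0.0.12 failed"))  # -> ssh to <IP_ADDRESS> failed
    print(redact("pin pkg to 1.2.3.4"))       # -> pin pkg to <IP_ADDRESS> (a version string)
```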

  • Contact & network

    • Email addresses

    • Phone numbers

    • IP addresses (IPv4 and IPv6)

    • MAC addresses

  • Financial

    • Credit-card numbers (Luhn-validated)

    • IBANs

    • US bank routing numbers

    • Bank account numbers

    • Crypto wallet addresses

  • National & government IDs

    • United States: SSN, ITIN, NPI (provider identifier)

    • United Kingdom: NHS number, National Insurance number (NINO), postcode

    • Spain: DNI, NIE

    • India: Aadhaar, PAN

    • Australia: TFN, ABN, Medicare number

    • Singapore: NRIC/FIN

    • Italy: Codice Fiscale

    • Finland: Personal Identity Code

    • South Korea: Resident Registration Number (RRN)

    • Poland: PESEL

    • Sweden: Personnummer

    • Canada: SIN

  • Other identifiers

    • Passport numbers

    • Driver's license numbers
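Credit-card detection illustrates the regex-plus-validator approach described above: a regular expression proposes candidate digit runs, and the Luhn (mod 10) checksum discards runs that cannot be valid card numbers. A minimal sketch, assuming a hypothetical candidate pattern of 13 to 19 digits with optional single spaces or hyphens between them; `CARD_CANDIDATE`, `luhn_valid`, and `find_cards` are illustrative names, not Span's code.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by
# single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) check over a string of decimal digits."""
    total = 0
    # Walk right to left; double every second digit starting from the
    # second-to-last one, subtracting 9 when the doubled value exceeds 9.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return candidate matches whose digits pass the Luhn check."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group(0))
        if luhn_valid(digits):
            hits.append(m.group(0))
    return hits


if __name__ == "__main__":
    print(find_cards("card 4111 1111 1111 1111 exp"))  # -> ['4111 1111 1111 1111']
    print(find_cards("order 1234 5678 9012 3456"))     # -> []
```

The validator is what keeps a broad shape pattern from redacting every long number: `4111 1111 1111 1111` (a standard test card number) passes the checksum, while `1234 5678 9012 3456` does not.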

NER-based, semantic — Span's pipeline (work in progress, not yet released)

Free-form PII that has no fixed shape and can only be recognized from context, detected by named-entity-recognition (NER) models running in Span's pipeline. The tentative set below — kept to the entity types NER detects most reliably — is based on our prototype and may change before release:

  • Person names

  • Physical / postal addresses

  • Phone numbers

  • Email addresses

  • Secrets & credentials (e.g. API keys, tokens, passwords)

Phone numbers and email addresses are also covered deterministically by the on-device patterns above; the NER stage detects them as a semantic backstop.

Default settings

By default, Span redacts all of the supported entity types. This is configurable per tenant — you can restrict redaction to a chosen subset of entity types, or disable on-device scrubbing entirely.
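A per-tenant setting of this kind can be modeled as a small gate consulted before each on-device redaction. The sketch below is hypothetical: the `TenantScrubConfig` class, its fields, and the Presidio-style entity names (such as `EMAIL_ADDRESS`) are illustrative assumptions, not Span's configuration schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TenantScrubConfig:
    """Hypothetical per-tenant scrubbing settings (not Span's schema)."""

    # False disables on-device scrubbing entirely.
    on_device_enabled: bool = True
    # None means "all supported entity types" (the documented default);
    # otherwise only the listed types are redacted.
    entities: Optional[FrozenSet[str]] = None

    def should_redact(self, entity: str) -> bool:
        """Return True if a detection of `entity` should be redacted on-device."""
        if not self.on_device_enabled:
            return False
        return self.entities is None or entity in self.entities


if __name__ == "__main__":
    default = TenantScrubConfig()
    emails_only = TenantScrubConfig(entities=frozenset({"EMAIL_ADDRESS"}))
    disabled = TenantScrubConfig(on_device_enabled=False)
    print(default.should_redact("PHONE_NUMBER"))      # True
    print(emails_only.should_redact("PHONE_NUMBER"))  # False
    print(disabled.should_redact("EMAIL_ADDRESS"))    # False
```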

Extending on-device scrubbing

If you want to extend the on-device PII rules, reach out to your Span representative with:

  • an entity name (used in the redaction marker),

  • a regular-expression pattern, and

  • a few example values.
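Before sending a rule, it can help to confirm that the pattern matches every example value in full and does not fire on near-misses you care about. A minimal self-check sketch; the `EMPLOYEE_ID` entity, its pattern, the dict layout, and the `non_examples` field are hypothetical and not a submission format Span requires.

```python
import re

# Hypothetical rule proposal; the entity, pattern, and field names are
# illustrative only.
proposal = {
    "entity": "EMPLOYEE_ID",
    "pattern": r"\bEMP-\d{6}\b",
    "examples": ["EMP-004211", "EMP-918273"],
    # Optional near-misses that should NOT be redacted (your own test data).
    "non_examples": ["EMP-42", "TEMP-123456"],
}


def check_proposal(p: dict) -> tuple[list[str], list[str]]:
    """Return (examples the pattern misses, non-examples it wrongly hits)."""
    rx = re.compile(p["pattern"])
    missed = [e for e in p["examples"] if not rx.fullmatch(e)]
    over = [e for e in p.get("non_examples", []) if rx.search(e)]
    return missed, over


if __name__ == "__main__":
    print(check_proposal(proposal))  # ([], [])
```

A looser pattern such as `EMP-\d+` would still match both examples but would also hit both near-misses, which is the kind of over-matching worth catching before the rule ships.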