Context-aware PII detection for .NET.

Detect and de-identify personal data offline - validated recognizers, reversible encryption, structured JSON/CSV redaction, and optional multilingual NER. PII never leaves your process.

dotnet add package TasmanianDevil

Quick Start

Detect, de-identify, restore.

One facade wires the analyzer, anonymizer, and deanonymizer from a single options object - or compose the engines directly for full control.

1. The facade - one call to de-identify

using TasmanianDevil;

var engine = new PiiEngine();
var result = engine.Deidentify("Email jane@contoso.com or call +1 425 555 0100.");

Console.WriteLine(result.AnonymizedText);
// Email <EMAIL_ADDRESS> or call <PHONE_NUMBER>.

2. Per-entity operators + opt-in country pack

using TasmanianDevil;

// `key` (the encryption key) and `text` (the input string) are supplied by the caller.
var options = new PiiOptions
{
    Countries = [PiiCountries.De],
    Operators = new Dictionary<string, OperatorConfig>
    {
        ["EMAIL_ADDRESS"] = new("mask", new() { [OperatorParams.CharsToMask] = 6 }),
        ["CREDIT_CARD"]   = new("redact"),
        ["DEFAULT"]       = new("encrypt", new() { [OperatorParams.Key] = key }),
    },
};

var engine = new PiiEngine(options);
var deid   = engine.Deidentify(text);   // deid.IsReversible == true

3. Reversible round-trip - encrypt out, restore exactly

// continues from step 2 (reuses engine, deid, and key):
// hand the opaque text to a third party, then decrypt it back byte-for-byte
var decrypt = new Dictionary<string, OperatorConfig>
{
    ["DEFAULT"] = new("decrypt", new() { [OperatorParams.Key] = key }),
};

var restored = engine.Reidentify(deid, decrypt);
// restored.Text == original text

Validated, not just regex.

Architecture inspired by Microsoft Presidio (MIT; not affiliated or endorsed), rebuilt from scratch as idiomatic, dependency-light C#.

Checksum-validated recognizers

A 16-digit number counts as a credit card only if it passes Luhn, and a candidate IBAN only if it passes mod-97. Checksum validation (Luhn, mod-97, Verhoeff, ISO-7064, ICAO, bech32) rejects false positives that a regex alone cannot.
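To make the checksum idea concrete, here is a minimal standalone sketch of the two checks named above. It is not TasmanianDevil's implementation; the `Checksums` class, its method names, and the length bounds are assumptions made for illustration.

```csharp
using System;
using System.Linq;

Console.WriteLine(Checksums.PassesLuhn("4111 1111 1111 1111"));          // True
Console.WriteLine(Checksums.PassesLuhn("4111 1111 1111 1112"));          // False
Console.WriteLine(Checksums.PassesMod97("GB82 WEST 1234 5698 7654 32")); // True
Console.WriteLine(Checksums.PassesMod97("GB83 WEST 1234 5698 7654 32")); // False

// Hypothetical helper class, not part of the library.
static class Checksums
{
    // Luhn: walking right to left, double every second digit, subtract 9
    // from any doubled value above 9, and require the sum to end in 0.
    public static bool PassesLuhn(string candidate)
    {
        int[] digits = candidate.Where(c => c >= '0' && c <= '9')
                                .Select(c => c - '0')
                                .ToArray();
        if (digits.Length < 12) return false;   // assumed minimum card length

        int sum = 0;
        for (int i = digits.Length - 1, pos = 0; i >= 0; i--, pos++)
        {
            int d = digits[i];
            if (pos % 2 == 1)
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
        }
        return sum % 10 == 0;
    }

    // IBAN mod-97: move the first four characters to the end, map letters
    // to 10..35, and require the resulting number mod 97 to equal 1.
    public static bool PassesMod97(string candidate)
    {
        string iban = candidate.Replace(" ", "").ToUpperInvariant();
        if (iban.Length < 15 || iban.Length > 34) return false;
        if (!iban.All(c => (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z'))) return false;

        string rearranged = iban.Substring(4) + iban.Substring(0, 4);
        int remainder = 0;
        foreach (char c in rearranged)
        {
            // Digits contribute one decimal place, letters two.
            remainder = c <= '9'
                ? (remainder * 10 + (c - '0')) % 97
                : (remainder * 100 + (c - 'A' + 10)) % 97;
        }
        return remainder == 1;
    }
}
```

A regex would accept all four inputs above; the checksum is what separates the two valid values from their near misses.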

Context-aware scoring

A bare token scores low and drops; nearby words ("card", "IBAN", "postcode") lift it over threshold via a dependency-free Porter-stemmer lemma matcher. Detection adapts to surrounding text.
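The scoring idea can be sketched in a few lines. Everything here is an assumption for illustration: the `ContextScorer` class, the 0.5 threshold, the 0.35 boost, the 40-character window, and especially the `Stem` helper, which is a crude plural-stripping stand-in rather than a real Porter stemmer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string[] cardContext = ["card", "credit", "visa"];

// Same 16-digit token, same base score, different surroundings.
double bare    = ContextScorer.Score("ref 4111111111111111 attached", 4, 16, 0.3, cardContext);
double boosted = ContextScorer.Score("my cards: 4111111111111111", 10, 16, 0.3, cardContext);

Console.WriteLine(bare >= ContextScorer.Threshold);    // False
Console.WriteLine(boosted >= ContextScorer.Threshold); // True

// Hypothetical scorer, not the library's API.
static class ContextScorer
{
    public const double Threshold = 0.5;  // assumed cut-off
    const double Boost = 0.35;            // assumed lift when context matches
    const int Window = 40;                // characters inspected on each side

    public static double Score(string text, int start, int length,
                               double baseScore, IEnumerable<string> contextWords)
    {
        int from = Math.Max(0, start - Window);
        int to = Math.Min(text.Length, start + length + Window);
        string around = text[from..start] + " " + text[(start + length)..to];

        // Reduce the surrounding words to stems so "cards" matches "card".
        HashSet<string> stems = Regex.Matches(around, @"\p{L}+")
                                     .Select(m => Stem(m.Value))
                                     .ToHashSet();

        bool hit = contextWords.Select(Stem).Any(w => stems.Contains(w));
        return hit ? Math.Min(1.0, baseScore + Boost) : baseScore;
    }

    // Stand-in for a stemmer: lower-case and drop a trailing plural "s".
    static string Stem(string word)
    {
        string w = word.ToLowerInvariant();
        return w.Length > 3 && w.EndsWith('s') ? w[..^1] : w;
    }
}
```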

Reversible de-identification

Encrypt PII spans (AES), hand the opaque text to a third party, and decrypt the exact original back. Lossy operators (mask, hash, redact) report themselves non-reversible; a wrong key fails loudly.
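The round-trip and the "fails loudly" behaviour can be illustrated with the framework's own AES support. The source only says AES; this sketch assumes AES-GCM (via `System.Security.Cryptography.AesGcm`, .NET 8 constructor) because its authentication tag is what makes a wrong key throw instead of returning garbage. `SpanCipher` and the nonce | tag | ciphertext token layout are hypothetical, not the library's format.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] key      = RandomNumberGenerator.GetBytes(32);   // 256-bit AES key
byte[] wrongKey = RandomNumberGenerator.GetBytes(32);

string token = SpanCipher.Encrypt("jane@contoso.com", key);
Console.WriteLine(SpanCipher.Decrypt(token, key));       // jane@contoso.com

try
{
    SpanCipher.Decrypt(token, wrongKey);
}
catch (CryptographicException)
{
    // Tag verification failed, so no plaintext is returned at all.
    Console.WriteLine("wrong key rejected");
}

// Hypothetical helper, not part of the library.
static class SpanCipher
{
    const int NonceSize = 12;
    const int TagSize = 16;

    public static string Encrypt(string plaintext, byte[] key)
    {
        byte[] nonce  = RandomNumberGenerator.GetBytes(NonceSize);
        byte[] plain  = Encoding.UTF8.GetBytes(plaintext);
        byte[] cipher = new byte[plain.Length];
        byte[] tag    = new byte[TagSize];

        using var aes = new AesGcm(key, TagSize);
        aes.Encrypt(nonce, plain, cipher, tag);

        // Assumed layout: nonce | tag | ciphertext, Base64-encoded.
        return Convert.ToBase64String(nonce.Concat(tag).Concat(cipher).ToArray());
    }

    public static string Decrypt(string token, byte[] key)
    {
        byte[] data = Convert.FromBase64String(token);
        var nonce  = data.AsSpan(0, NonceSize);
        var tag    = data.AsSpan(NonceSize, TagSize);
        var cipher = data.AsSpan(NonceSize + TagSize);
        byte[] plain = new byte[cipher.Length];

        using var aes = new AesGcm(key, TagSize);
        aes.Decrypt(nonce, cipher, tag, plain);   // throws on tag mismatch
        return Encoding.UTF8.GetString(plain);
    }
}
```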

Structured data

Redact JSON by key path (allow / deny) preserving shape and non-string types, or infer PII columns in CSV/TSV and redact them consistently - reusing the same recognizers and operators.
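As a rough picture of key-path redaction, the sketch below walks a `System.Text.Json` node tree and replaces only string values whose dotted path is on an include list, leaving numbers and overall shape untouched. `JsonPathRedactor`, the dotted-path convention, and the `[REDACTED]` placeholder are assumptions; the library's own path syntax and operators may differ, and arrays of plain strings are deliberately not handled here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

var doc = JsonNode.Parse("""{"user":{"email":"jane@contoso.com","age":41},"note":"hi"}""")!;
JsonPathRedactor.Redact(doc, new HashSet<string> { "user.email" });

Console.WriteLine(doc.ToJsonString());
// {"user":{"email":"[REDACTED]","age":41},"note":"hi"}

// Hypothetical helper, not part of the library.
static class JsonPathRedactor
{
    public static void Redact(JsonNode? node, ISet<string> includePaths, string path = "")
    {
        switch (node)
        {
            case JsonObject obj:
                // Snapshot the properties so the object can be modified in the loop.
                foreach (var (name, child) in obj.ToList())
                {
                    string childPath = path.Length == 0 ? name : $"{path}.{name}";
                    if (child is JsonValue value && value.TryGetValue<string>(out _))
                    {
                        // Only string leaves on the include list are replaced.
                        if (includePaths.Contains(childPath)) obj[name] = "[REDACTED]";
                    }
                    else
                    {
                        Redact(child, includePaths, childPath);
                    }
                }
                break;

            case JsonArray array:
                // Array elements share the path of the array itself.
                foreach (JsonNode? item in array) Redact(item, includePaths, path);
                break;
        }
    }
}
```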

Batch API

Analyze and anonymize lists or keyed record dictionaries in one call, results aligned to the input. Allocation-light and safe to share across threads.

Offline core, optional ML reach

The whole engine runs with zero models. When you want more, TasmanianDevil.Onnx plugs a real multilingual span-NER model into the same pipeline - the hard part, shipped.

Detection Coverage

~50 entity types, validated.

Generic and US recognizers are always on. Opt-in country packs add national identifiers, validated with real checksums and off by default to keep false positives low.

~50 entity types
6 country packs (US always on)
7 anonymization operators
0 network calls

8 generic recognizers (email, phone via libphonenumber, credit card / Luhn, IBAN / mod-97, crypto, IP, URL, MAC) and a 9-entity US pack are always on. Opt-in country packs add national IDs, tax numbers, and passports for the UK, Germany, India, Italy, and Spain. An optional offline ONNX add-on layers in multilingual named-entity detection (PERSON, LOCATION, ORGANIZATION, DATE_TIME), all resolved in one pass.

1. Recognizers - validated regex + checksums (Luhn · mod-97 · Verhoeff · ISO-7064)
2. Context boosting - lemma-aware confidence scoring (offline Porter stemmer)
3. Conflict resolution - overlap / containment merge (threshold + allow-list)
4. Operators - replace · redact · mask · hash · encrypt ↔ decrypt · keep · custom
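The conflict-resolution stage of the pipeline above can be pictured as a greedy pass: drop anything under the threshold, then keep the highest-scoring span wherever two detections overlap or one contains the other. This is one plausible strategy, not necessarily the library's; `Detection`, `Resolver`, the scores, and the threshold are all hypothetical, and the allow-list step is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var detections = new List<Detection>
{
    new("URL",            6, 17, 0.60),  // domain contained inside the email span
    new("EMAIL_ADDRESS",  1, 17, 0.95),
    new("PHONE_NUMBER",  26, 41, 0.30),  // below the threshold
};

foreach (Detection d in Resolver.Resolve(detections, threshold: 0.5))
    Console.WriteLine($"{d.EntityType} [{d.Start},{d.End})");
// EMAIL_ADDRESS [1,17)

// Hypothetical types, not the library's API. End is exclusive.
record Detection(string EntityType, int Start, int End, double Score);

static class Resolver
{
    public static List<Detection> Resolve(IEnumerable<Detection> detections, double threshold)
    {
        var kept = new List<Detection>();

        // Highest score first; on ties, prefer the longer span.
        foreach (Detection d in detections.Where(d => d.Score >= threshold)
                                          .OrderByDescending(d => d.Score)
                                          .ThenByDescending(d => d.End - d.Start))
        {
            // Two half-open ranges overlap (or nest) when each starts before the other ends.
            bool overlaps = kept.Any(k => d.Start < k.End && k.Start < d.End);
            if (!overlaps) kept.Add(d);
        }

        return kept.OrderBy(d => d.Start).ToList();
    }
}
```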

Seven ways to anonymize.

Per-entity or via a DEFAULT fallback. Compose the analyzer and anonymizer directly, or drive everything through the PiiEngine facade.

using TasmanianDevil.Analyzer;
using TasmanianDevil.Anonymizer;

var analyzer   = new AnalyzerEngine(PiiRecognizers.CreateDefaultRegistry("en"));
var anonymizer = new AnonymizerEngine();

// replace (default) · redact · mask · hash (salted) · encrypt ↔ decrypt · keep · custom
var results    = analyzer.Analyze(text, "en");
var anonymized = anonymizer.Anonymize(text, results, operators);

// structured + batch reuse the very same recognizers and operators
// (engine is a PiiEngine, as in the Quick Start)
engine.AnonymizeJson(json, new JsonRedactionScope { IncludePaths = ["user.email"] });
engine.AnonymizeCsv(header, rows);                 // infers which columns are PII
engine.AnonymizeBatch(keyedRecords);               // aligned results per key

Optional ONNX NER

The span entities regex can't reach.

TasmanianDevil.Onnx adds PERSON / LOCATION / ORGANIZATION / DATE_TIME detection via a zero-shot, multilingual GLiNER model - run through Kyoto. It registers as an ordinary recognizer, so its spans flow through the same overlap resolution and anonymization as the regex/checksum entities.

using TasmanianDevil.Onnx;

// the ONNX export is published on Hugging Face (filip-w/gliner-multi-pii-onnx)
var ner = new GlinerNerRecognizer(new GlinerNerOptions
{
    ModelPath = modelPath, TokenizerPath = spmPath, ConfigPath = configPath,
});

registry.AddRecognizer(ner);   // PERSON/LOCATION/... now join the same analyzer pass