Safety controls for
.NET AI agents.

Composable, testable, declarative guardrails. Block prompt injections, redact PII, enforce topics, and validate outputs.

Get Started View on GitHub
dotnet add package AgentGuard --prerelease

Quick Start

Three integration tiers.

Pick the level that fits your architecture - framework-agnostic standalone, IChatClient decorator, or Microsoft Agent Framework middleware.

1. Standalone pipeline - framework-agnostic, run rules directly

using AgentGuard.Core.Abstractions;
using AgentGuard.Core.Builders;
using AgentGuard.Core.Guardrails;
using AgentGuard.Onnx;

var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPii()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
    .Build();

var pipeline = new GuardrailPipeline(policy, logger);

var ctx = new GuardrailContext
{
    Text = userInput,
    Phase = GuardrailPhase.Input,
    Messages = conversationHistory   // optional - enables history-aware rules
};

var result = await pipeline.RunAsync(ctx);

if (result.IsBlocked)
    Console.WriteLine(result.BlockingResult!.Reason);
else if (result.WasModified)
    Console.WriteLine(result.FinalText);

2. IChatClient decorator - wrap any chat client with one call

using AgentGuard.Core.ChatClient;
using AgentGuard.Onnx;

// Wrap any IChatClient - works with OpenAI, Azure OpenAI, Ollama, or any
// Microsoft.Extensions.AI client. Conversation history is propagated automatically.
var guardedClient = chatClient.UseAgentGuard(g => g
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPii()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
);

// Use exactly like a normal IChatClient
var response = await guardedClient.GetResponseAsync(conversationHistory);

// Streaming works too
await foreach (var update in guardedClient.GetStreamingResponseAsync(conversationHistory))
{
    Console.Write(update.Text);
}

3. Microsoft Agent Framework middleware - plug into AIAgentBuilder

using AgentGuard.AgentFramework;
using AgentGuard.Onnx;

var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g
        .NormalizeInput()
        .BlockPromptInjection()
        .BlockPromptInjectionWithDefender()
        .RedactPii()
        .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
        .LimitInputTokens(4000)
    )
    .Build();

// Use exactly like a normal agent
var response = await guardedAgent.RunAsync(messages, session, options);

// Streaming works too - with progressive retraction support
await foreach (var update in guardedAgent.RunStreamingAsync(messages, session, options))
{
    Console.Write(update.Text);
}

Rules for input and output.

22 built-in rules across the input and output phases - regex, ONNX classifiers, and LLM-as-judge.

Prompt Injection Detection

Six tiers: regex patterns (from Arcanum taxonomy), bundled StackOne Defender multi-head ONNX model (minilm-multihead-v5, calibrated dual-head, ~8ms, no download), optional DeBERTa v3 classifiers including PIGuard (strong on indirect / code-style injection), remote ML classifier (Sentinel-v2 via HTTP), Azure Prompt Shields (jailbreaks + indirect injection), and LLM-as-judge with structured threat classification.

Content Safety

Toxicity, hate speech, violence, sexual content, self-harm, and harassment. Plug in Azure AI Content Safety, or run the offline Opir mDeBERTa classifier for non-English text (German, Spanish, Russian, Arabic, Chinese, Hindi) when a per-call cloud API isn't an option.

PII Engine

A complete offline PII detection + de-identification engine - ~50 entity types, reversible encryption, structured JSON/CSV redaction, and a batch API. Always-on generic + US recognizers, opt-in country packs (UK/DE/IN/IT/ES), and an optional multilingual NER add-on. See the full breakdown →

Topic Enforcement

Keep conversations on-topic with LLM semantic classification. Understands intent and conversation context - not just keywords. Conversation history is automatically included so follow-up messages are evaluated correctly.

Output Validation

Policy enforcement, groundedness checking, and copyright detection. Catch hallucinations, brand violations, and copyrighted content before they reach users.

Input Normalization

Decodes evasion encodings - base64, hex escapes, reversed text, Unicode homoglyphs - before any other rule evaluates, so encoding-based attacks are caught by the downstream rules.

Agentic & RAG Guardrails

Guard tool call arguments against SQL injection, code injection, SSRF, and more. Detect indirect prompt injection in tool results (emails, documents). Filter RAG chunks for injection and secrets before they reach the LLM context.

Dynamic Rule Enabling

Gate any rule per request with .When() / .Unless(). The predicate sees the guardrail context and can capture ambient services like IHttpContextAccessor - enable, disable, or retune a rule based on ClaimsPrincipal, tenant, feature flag, or detected language (e.g. run the English-centric Defender classifier at a higher threshold for non-English users).

Observability (OpenTelemetry)

Built-in spans and metrics for every pipeline run, rule evaluation, re-ask attempt, and streaming retraction. Uses System.Diagnostics - no SDK dependency. Works with Aspire Dashboard, Jaeger, Zipkin, and any OTel collector.

Flagship Capability

A complete, offline PII engine for .NET.

Detection and reversible de-identification, written from scratch in C# - architecture inspired by Microsoft Presidio (MIT; not affiliated or endorsed). No cloud, no ML required for the core. PII never leaves your process.

~50entity types
6country packs (US always on)
7anonymization operators
0network calls

8 generic recognizers (email, phone via libphonenumber, credit card / Luhn, IBAN / mod-97, crypto, IP, URL, MAC) and a 9-entity US pack are always on. Opt-in country packs add national IDs, tax numbers, and passports for the UK, Germany, India, Italy, and Spain. An optional offline ONNX add-on layers in multilingual named-entity detection (PERSON, LOCATION, ORGANIZATION, DATE_TIME), all resolved in one pass.

1 Recognizers Validated regex + checksums
Luhn · mod-97 · Verhoeff · ISO-7064
2 Context boosting Lemma-aware confidence scoring
offline Porter stemmer
3 Conflict resolution Overlap / containment merge
threshold + allow-list
4 Operators replace · redact · mask · hash
encrypt ↔ decrypt · keep · custom

De-identify, then restore - encrypt is fully reversible

var analyzer   = new AnalyzerEngine(PiiRecognizers.CreateDefaultRegistry("en"));
var anonymizer = new AnonymizerEngine();

// input  : "Email ada@acme.com or call +1 415 555 0132"
var results    = analyzer.Analyze(text, "en");
var anonymized = anonymizer.Anonymize(text, results);
// replace: "Email <EMAIL_ADDRESS> or call <PHONE_NUMBER>"

// reversible round-trip: encrypt with a key, persist the items, decrypt later
var deid     = PiiDeidentificationResult.FromEngineResult(encrypted); // deid.IsReversible == true
var restored = new DeanonymizerEngine().Deanonymize(deid.AnonymizedText, deid.Items, decryptOps);
// restored.Text == original text   (byte-for-byte)

In a Microsoft Agent Framework agent - PII never reaches the model

var agent = chatClient
    .AsAIAgent(instructions, name: "SupportBot", tools: [lookupCustomer])
    .AsBuilder()
    .UseAgentGuard(g => g.RedactPii().GuardToolResults()) // input, output & tool-result redaction
    .UsePiiReversibleRedaction(key)                       // encrypt in, decrypt out
    .Build();

// user : "Email me at john@example.com about order 12345"
// model sees : "Email me at <AES ciphertext token> about order 12345"
// user gets  : "...follow up on john@example.com"   (decrypted only on the way out)

Reversible de-identification

Encrypt PII spans (AES) and restore them later with DeanonymizerEngine. Lossy operators (mask, hash, redact) report themselves as non-reversible; a wrong key fails loudly, never silently.

Structured data

Redact JSON by key path (allow / deny) preserving shape and non-string types, or infer PII columns in CSV/TSV and redact them consistently - reusing the same recognizers and operators.

Batch API

Analyze and anonymize lists or keyed record dictionaries in one call, with results aligned to the input. Sequential, allocation-light, and safe to share across threads.

Fully offline & sovereign-friendly

Deterministic regex + checksums with no network calls and no telemetry. Sensitive data stays inside the process - suitable for air-gapped and regulated environments.

Country packs

Opt-in national identifier packs (UK / DE / IN / IT / ES) are validated with real checksums and disabled by default to keep false positives low. The US pack is always on.

Optional multilingual NER

A separate offline ONNX model adds names, places, organizations, and dates across languages - merged into the same order-20 pass as the regex recognizers when you want it.

Built-in Rules

22 rules. Ordered by cost.

Cheap regex checks run first and short-circuit. Expensive LLM calls only run if needed.

Order Rule Type Phase
5 InputNormalizationRule Local Input
8 RetrievalGuardrailRule Regex Input
10 PromptInjectionRule Regex Input
11 DefenderPromptInjectionRule ONNX ML (bundled) Input
12 OnnxPromptInjectionRule ONNX ML (DeBERTa) Input
12 PIGuardPromptInjectionRule ONNX ML (DeBERTa, PIGuard) Input
13 RemotePromptInjectionRule Remote ML Input
14 AzurePromptShieldRule Azure API Input
15 LlmPromptInjectionRule LLM Input
20 PiiRule (+ optional GlinerNerRecognizer) RegexONNX NER Both
22 SecretsDetectionRule Regex Both
25 LlmPiiDetectionRule LLM Both
35 LlmTopicGuardrailRule LLM Input
40 TokenLimitRule Local Input / Output
45 ToolCallGuardrailRule Regex Output
47 ToolResultGuardrailRule Regex Output
50 ContentSafetyRule Pluggable Both
50 OpirSafetyRule ONNX ML (mDeBERTa, multilingual) Input
55 LlmOutputPolicyRule LLM Output
65 LlmGroundednessRule LLM Output
75 LlmCopyrightRule LLM Output
76 AzureProtectedMaterialRule Azure API Output

How the classifiers compare.

The classifiers are complementary, not competing. The bundled Defender model is the fast default for English prompt injection; PIGuard and Opir are optional models you can layer on for cases it isn't built for. Numbers are from held-out datasets - full method and data in the eng/*-eval RESULTS files. Recall = % of unsafe inputs blocked; FPR = % of safe inputs blocked.

Prompt injection

The real rules head to head on a balanced 25/class held-out sample (jackhhao English jailbreaks, deepset German injections); cells are recall / FPR. The bundled Defender model is the fast default - on the full jackhhao test split it scores 90.6% / 0.8%. The LLM-as-judge tier (LlmPromptInjectionRule) runs on any IChatClient you supply - AgentGuard bundles no LLM; the two models below are illustrative bring-your-own examples (one capable, one tiny), not shipped components. Per-call latency measured on an Apple M4 Pro.

Classifierjackhhaodeepset (German)per call
regex (medium)60% / 8%8% / 0%<1 ms
Defender (bundled)92% / 4%64% / 0%~8 ms
LLM (BYO) · gemma-4-26b-a4b96% / 0%68% / 0%~6 s
LLM (BYO) · qwen3-0.6b32% / 16%16% / 0%~1.5 s

The LLM rows illustrate that quality scales hard with the model you bring: the MoE gemma-4-26b-a4b (~4B active) tops every column, while a tiny 0.6B lands well below Defender. Both happen to run fully locally, so an LLM tier is a real option for offline deployments - but the model is yours to choose, so budget the latency and pick a capable one. PIGuard isn't in this table: both datasets are in its training set, so its numbers would be optimistic - it's shown on indirect injection below, where it's held-out.

Indirect / code-style injection

A direct-injection sentence classifier is weak by design on payloads hidden in tool results. The optional PIGuard model (DeBERTa-v3) is built for these and layers on top of Defender - held-out indirect set (BIPIA) and an over-defense benign set (NotInject).

CheckDefender - bundledPIGuard - optional
Indirect / code injection (BIPIA), recall34%96%
Over-defense (NotInject benign FPR, lower better)10.3%8.3%

Run them together - PIGuard after Defender - rather than choosing one.

Multilingual content safety

Content safety is a different job from injection detection. For non-English toxicity you can call Azure AI Content Safety (cloud, billed per call) or run the optional Opir model (mDeBERTa-v3) fully offline. Toxicity on textdetox/multilingual_toxicity_dataset, balanced per language; cells are recall / FPR. textdetox negatives are real, partly borderline social-media comments, so absolute FPRs run high for every model.

LanguageOpir - offlineAzure CS - cloud
German72% / 24%92% / 52%
Spanish76% / 24%92% / 20%
Russian52% / 16%76% / 8%
Arabic40% / 36%84% / 24%
Chinese40% / 28%44% / 32%
Hindi56% / 16%64% / 4%

Azure generally leads on recall; Opir trades some of that for running locally, free, and PII-safe, with comparable-or-lower FPR on German and Chinese. Use Opir when you need offline or sovereign deployment; reach for Azure when a cloud call is fine and you want maximum recall.

Input → Agent → Output.

Rules run as a pipeline. Input guardrails protect the LLM. Output guardrails protect the user.

1. Input Guardrails

Normalization, RAG chunk filtering, prompt injection detection, PII redaction, secrets detection, topic boundary, token limits - all before the LLM sees the input.

2. Agent Execution

Your AI agent runs with the sanitized input. Works with any framework - MAF, Semantic Kernel, or standalone. Supports both RunAsync and streaming.

3. Output Guardrails

Tool call injection detection, indirect injection in tool results, PII/secrets redaction, policy compliance, groundedness checking, copyright detection - all before the response reaches the user. Optional re-ask re-prompts the LLM with the failure reason on violations.

Packages

Pick what you need.

Eight packages, layered. The main AgentGuard package is all you need to start. Add framework adapters, cloud integrations, or remote classifiers as needed.

AgentGuard
All-in-one package. Core rules engine, bundled Defender multi-head ONNX model (minilm-multihead-v5), and offline classifiers. No agent framework dependency.
AgentGuard.Pii
Flagship offline PII engine: ~50 entity types across generic + always-on US + opt-in country packs, lemma-aware context scoring, anonymization operators, reversible de-identification, and structured JSON/CSV + batch APIs. Architecture inspired by Microsoft Presidio (MIT). Fully offline.
AgentGuard.AgentFramework
Microsoft Agent Framework adapter. UseAgentGuard() middleware + workflow guardrails via .WithGuardrails().
AgentGuard.RemoteClassifier
Remote ML classifier via HTTP. Call Sentinel-v2, Ollama, vLLM, or custom endpoints for SOTA prompt injection detection.
AgentGuard.Azure
Azure AI Content Safety integration. Prompt Shields (injection detection), protected material detection (text & code with license citations), category analysis, severity thresholds, and server-side blocklists.
AgentGuard.Hosting
DI registration, named policy factory, and appsettings.json configuration binding for ASP.NET Core and Aspire.

When regex isn't enough.

Plug in any IChatClient - Azure OpenAI, Ollama, local models. Built-in prompt templates, fail-open on errors.

using AgentGuard.Onnx;
using AgentGuard.Azure.PromptShield;

// Multi-tier prompt injection: Regex → Defender → Remote ML → Prompt Shield → LLM
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()                            // regex (order 10)
    .BlockPromptInjectionWithDefender()                    // Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier(          // remote ML (order 13)
        "http://localhost:8000/classify")
    .BlockPromptInjectionWithAzurePromptShield(         // Azure Prompt Shield (order 14)
        endpoint, apiKey)
    .BlockPromptInjectionWithLlm(chatClient)           // LLM (order 15)
    .DetectPIIWithLlm(chatClient,
        new() { Action = PiiAction.Redact })
    .EnforceOutputPolicy(chatClient,
        "Never recommend competitor products")
    .CheckGroundedness(chatClient)
    .CheckCopyright(chatClient)
    .Build();

Re-ask Experimental

Re-ask on violation.

When output guardrails block a response, the pipeline can re-prompt the LLM with the failure reason and re-evaluate. Opt-in, configurable, non-streaming only for now.

var policy = new GuardrailPolicyBuilder()
    .EnforceOutputPolicy(chatClient, "Never recommend competitors")
    .CheckGroundedness(chatClient)
    .EnableReask(chatClient, o =>
    {
        o.MaxAttempts = 2;
        o.IncludeBlockedResponse = true;
    })
    .Build();

var result = await pipeline.RunAsync(outputContext);

if (result.WasReasked)
    Console.WriteLine($"Re-asked {result.ReaskAttemptsUsed} time(s)");

See every rule fire.

OpenTelemetry-compatible spans and metrics out of the box. Register with one line - works with Aspire, Jaeger, Zipkin, and any OTel collector.

using AgentGuard.Hosting;

// Register AgentGuard telemetry with OpenTelemetry
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t.AddAgentGuardInstrumentation())
    .WithMetrics(m => m.AddAgentGuardInstrumentation());

// Spans emitted:
//   agentguard.pipeline.run          (policy, phase, outcome)
//   agentguard.rule.evaluate {name}  (rule, phase, order, outcome)
//   agentguard.pipeline.reask        (attempts, outcome)
//   agentguard.middleware.input      (agent, outcome)
//   agentguard.middleware.output     (agent, outcome, tool calls)
//
// Metrics emitted:
//   agentguard.pipeline.evaluations  (counter)
//   agentguard.rule.evaluations      (counter)
//   agentguard.rule.blocks           (counter)
//   agentguard.pipeline.duration     (histogram, ms)
//   agentguard.rule.duration         (histogram, ms)

Extensible

Build your own rules.

Implement IGuardrailRule and plug it in. Full access to conversation context, phase, and metadata.

using AgentGuard.Core.Abstractions;

public class NoProfanityRule : IGuardrailRule
{
    public string Name => "no-profanity";
    public GuardrailPhase Phase => GuardrailPhase.Output;
    public int Order => 100;

    public ValueTask<GuardrailResult> EvaluateAsync(
        GuardrailContext context,
        CancellationToken cancellationToken = default)
    {
        var hasProfanity = ProfanityDetector.Check(context.Text);

        return ValueTask.FromResult(hasProfanity
            ? GuardrailResult.Blocked("Inappropriate language.")
            : GuardrailResult.Passed());
    }
}

// Add to any pipeline
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()
    .AddRule(new NoProfanityRule())
    .Build();