AgentGuard - Guardrails for .NET AI Agents

Quick Start

Three integration tiers.

Pick the level that fits your architecture - framework-agnostic standalone, IChatClient decorator, or Microsoft Agent Framework middleware.

1. Standalone pipeline - framework-agnostic, run rules directly

using AgentGuard.Core.Abstractions;
using AgentGuard.Core.Builders;
using AgentGuard.Core.Guardrails;
using AgentGuard.Onnx;

var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPii()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
    .Build();

var pipeline = new GuardrailPipeline(policy, logger);

var ctx = new GuardrailContext
{
    Text = userInput,
    Phase = GuardrailPhase.Input,
    Messages = conversationHistory   // optional - enables history-aware rules
};

var result = await pipeline.RunAsync(ctx);

if (result.IsBlocked)
    Console.WriteLine(result.BlockingResult!.Reason);
else if (result.WasModified)
    Console.WriteLine(result.FinalText);

2. IChatClient decorator - wrap any chat client with one call

using AgentGuard.Core.ChatClient;
using AgentGuard.Onnx;

// Wrap any IChatClient - works with OpenAI, Azure OpenAI, Ollama, or any
// Microsoft.Extensions.AI client. Conversation history is propagated automatically.
var guardedClient = chatClient.UseAgentGuard(g => g
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPii()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
);

// Use exactly like a normal IChatClient
var response = await guardedClient.GetResponseAsync(conversationHistory);

// Streaming works too
await foreach (var update in guardedClient.GetStreamingResponseAsync(conversationHistory))
{
    Console.Write(update.Text);
}

3. Microsoft Agent Framework middleware - plug into AIAgentBuilder

using AgentGuard.AgentFramework;
using AgentGuard.Onnx;

var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g
        .NormalizeInput()
        .BlockPromptInjection()
        .BlockPromptInjectionWithDefender()
        .RedactPii()
        .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
        .LimitInputTokens(4000)
    )
    .Build();

// Use exactly like a normal agent
var response = await guardedAgent.RunAsync(messages, session, options);

// Streaming works too - with progressive retraction support
await foreach (var update in guardedAgent.RunStreamingAsync(messages, session, options))
{
    Console.Write(update.Text);
}

Features

Rules for input and output.

22 built-in rules across the input and output phases - regex, ONNX classifiers, and LLM-as-judge.

Prompt Injection Detection

Six tiers: regex patterns (from Arcanum taxonomy), bundled StackOne Defender multi-head ONNX model (minilm-multihead-v5, calibrated dual-head, ~8ms, no download), optional DeBERTa v3 classifiers including PIGuard (strong on indirect / code-style injection), remote ML classifier (Sentinel-v2 via HTTP), Azure Prompt Shields (jailbreaks + indirect injection), and LLM-as-judge with structured threat classification.

Content Safety

Toxicity, hate speech, violence, sexual content, self-harm, and harassment. Plug in Azure AI Content Safety, or run the offline Opir mDeBERTa classifier for non-English text (German, Spanish, Russian, Arabic, Chinese, Hindi) when a per-call cloud API isn't an option.

PII Redaction

Strip personal data at every agent boundary - input, output, and tool results - so it never reaches the model or your logs. Reversible redaction encrypts PII before the provider and decrypts it back in the reply. Offline, ~50 entity types, powered by the TasmanianDevil engine.

Topic Enforcement

Keep conversations on-topic with LLM semantic classification. Understands intent and conversation context - not just keywords. Conversation history is automatically included so follow-up messages are evaluated correctly.

Output Validation

Policy enforcement, groundedness checking, and copyright detection. Catch hallucinations, brand violations, and copyrighted content before they reach users.

Input Normalization

Decodes evasion encodings - base64, hex escapes, reversed text, Unicode homoglyphs - before any other rule evaluates, so encoding-based attacks are caught by the downstream rules.
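To make the idea concrete, here is a minimal, self-contained sketch of this kind of normalization pass. It is not AgentGuard's implementation: the class name, the tiny homoglyph map, and the base64 heuristic are all assumptions chosen for illustration, and reversed-text handling is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch: undo common evasion encodings so later checks see plain text.
public static class NormalizationSketch
{
    // Tiny illustrative map of Cyrillic look-alikes to Latin; a real table is far larger.
    private static readonly Dictionary<char, char> Homoglyphs = new()
    {
        ['\u0430'] = 'a', ['\u0435'] = 'e', ['\u043E'] = 'o',
        ['\u0440'] = 'p', ['\u0441'] = 'c', ['\u0456'] = 'i'
    };

    private static readonly Regex HexEscape = new(@"\\x([0-9A-Fa-f]{2})", RegexOptions.Compiled);
    private static readonly Regex Base64Run = new(@"[A-Za-z0-9+/]{16,}={0,2}", RegexOptions.Compiled);

    public static string FoldHomoglyphs(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (var ch in text)
            sb.Append(Homoglyphs.TryGetValue(ch, out var latin) ? latin : ch);
        return sb.ToString();
    }

    // Turns literal "\x69" sequences into the character they encode.
    public static string DecodeHexEscapes(string text) =>
        HexEscape.Replace(text, m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());

    // Replaces long base64-looking runs with their decoded text when that text is printable.
    // Heuristic only: an ordinary long word can also look like base64.
    public static string DecodeBase64Runs(string text) =>
        Base64Run.Replace(text, m =>
        {
            if (m.Value.Length % 4 != 0) return m.Value;
            try
            {
                var decoded = Encoding.UTF8.GetString(Convert.FromBase64String(m.Value));
                foreach (var ch in decoded)
                    if (char.IsControl(ch) || ch == '\uFFFD') return m.Value;
                return decoded;
            }
            catch (FormatException)
            {
                return m.Value; // not valid base64 after all, keep the original run
            }
        });

    public static string Normalize(string text) =>
        DecodeBase64Runs(DecodeHexEscapes(FoldHomoglyphs(text)));
}
```

Calling `NormalizationSketch.Normalize(userInput)` before any pattern check means a regex for "ignore previous instructions" also sees the base64 or hex-escaped form of that phrase.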

Agentic & RAG Guardrails

Guard tool call arguments against SQL injection, code injection, SSRF, and more. Detect indirect prompt injection in tool results (emails, documents). Filter RAG chunks for injection and secrets before they reach the LLM context.

Dynamic Rule Enabling

Gate any rule per request with .When() / .Unless(). The predicate sees the guardrail context and can capture ambient services like IHttpContextAccessor - enable, disable, or retune a rule based on ClaimsPrincipal, tenant, feature flag, or detected language (e.g. run the English-centric Defender classifier at a higher threshold for non-English users).
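The gating idea can be sketched without the library. The types below (`RequestInfo`, `GatedCheck`) are hypothetical stand-ins and do not reflect the actual signatures of AgentGuard's .When() / .Unless(); they only show how a per-request predicate can switch a check on or off.

```csharp
using System;

// Hypothetical request shape; real predicates would read the guardrail context instead.
public sealed record RequestInfo(string Text, string Language, string Tenant);

// A check plus an "is it enabled for this request?" predicate.
public sealed class GatedCheck
{
    private readonly Func<RequestInfo, bool> _check;
    private readonly Func<RequestInfo, bool> _enabled;

    public GatedCheck(Func<RequestInfo, bool> check) : this(check, _ => true) { }

    private GatedCheck(Func<RequestInfo, bool> check, Func<RequestInfo, bool> enabled)
    {
        _check = check;
        _enabled = enabled;
    }

    // Run the check only when the predicate holds (conditions accumulate with AND).
    public GatedCheck When(Func<RequestInfo, bool> predicate) =>
        new GatedCheck(_check, r => _enabled(r) && predicate(r));

    // Skip the check when the predicate holds.
    public GatedCheck Unless(Func<RequestInfo, bool> predicate) =>
        When(r => !predicate(r));

    // A disabled check never blocks and never pays the cost of evaluating.
    public bool Blocks(RequestInfo request) => _enabled(request) && _check(request);
}
```

For example, `new GatedCheck(r => r.Text.Contains("ignore previous")).When(r => r.Language == "en").Unless(r => r.Tenant == "internal")` evaluates only for English requests from external tenants.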

Observability (OpenTelemetry)

Built-in spans and metrics for every pipeline run, rule evaluation, re-ask attempt, and streaming retraction. Uses System.Diagnostics - no SDK dependency. Works with Aspire Dashboard, Jaeger, Zipkin, and any OTel collector.

Flagship Capability

Keep PII out of the model.

AgentGuard redacts personal data as a guardrail at every boundary an agent crosses - user input, model output, and tool results - so PII never reaches the LLM provider or your logs. Reversible redaction goes further: encrypt PII before the model and decrypt it back in the reply, so the model reasons over opaque tokens while the user still gets a coherent answer. Fully offline, powered by the TasmanianDevil engine.

1. User input - redact (or encrypt) PII before the prompt is sent

2. Model - reasons over placeholders or ciphertext tokens

3. Tool results - redact PII in retrieved records before they re-enter context

4. Response - restore reversibly, or redact before it reaches the user

In a Microsoft Agent Framework agent - PII never reaches the model

var agent = chatClient
    .AsAIAgent(instructions, name: "SupportBot", tools: [lookupCustomer])
    .AsBuilder()
    .UseAgentGuard(g => g.RedactPii().GuardToolResults()) // input, output & tool-result redaction
    .UsePiiReversibleRedaction(key)                       // encrypt in, decrypt out
    .Build();

// user : "Email me at john@example.com about order 12345"
// model sees : "Email me at <AES ciphertext token> about order 12345"
// user gets  : "...follow up on john@example.com"   (decrypted only on the way out)

Or as a single rule in any pipeline / IChatClient decorator

var policy = new GuardrailPolicyBuilder()
    .RedactPii()                 // order 20, runs on input and output
    .GuardToolResults()          // strip PII from tool output before the model sees it
    .Build();

// input  : "Email ada@acme.com or call +1 415 555 0132"
// output : "Email <EMAIL_ADDRESS> or call <PHONE_NUMBER>"

Every boundary, one rule

The order-20 RedactPii() rule runs on both input and output; GuardToolResults() extends it to tool/function results so retrieved records are scrubbed before they ever re-enter the model's context.

Reversible round-trip

Encrypt PII before the provider sees it and decrypt the exact value back in the response - the model reasons over tokens, the user gets real data. A wrong key fails loudly; lossy operators report themselves non-reversible.
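A rough sketch of such a round-trip, assuming .NET 8 or later and AES-GCM as the cipher. The token format (`<PII:...>`), the email-only regex, and the class name are illustrative assumptions, not the format the library uses. It relies on the model echoing tokens back unchanged.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch of reversible redaction for email addresses only.
public static class ReversibleRedactionSketch
{
    private const int NonceSize = 12;
    private const int TagSize = 16;

    private static readonly Regex Email =
        new(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", RegexOptions.Compiled);
    private static readonly Regex Token =
        new(@"<PII:([A-Za-z0-9+/=]+)>", RegexOptions.Compiled);

    // Replace each email with an opaque token before the text leaves the process.
    public static string Protect(string text, byte[] key) =>
        Email.Replace(text, m => "<PII:" + Convert.ToBase64String(Encrypt(m.Value, key)) + ">");

    // Swap tokens in the reply back to the original values.
    public static string Restore(string text, byte[] key) =>
        Token.Replace(text, m => Decrypt(Convert.FromBase64String(m.Groups[1].Value), key));

    private static byte[] Encrypt(string value, byte[] key)
    {
        var plaintext = Encoding.UTF8.GetBytes(value);
        var nonce = RandomNumberGenerator.GetBytes(NonceSize); // fresh nonce per token
        var cipher = new byte[plaintext.Length];
        var tag = new byte[TagSize];
        using var aes = new AesGcm(key, TagSize);
        aes.Encrypt(nonce, plaintext, cipher, tag);

        // Token payload layout: nonce | tag | ciphertext
        var packed = new byte[NonceSize + TagSize + cipher.Length];
        nonce.CopyTo(packed, 0);
        tag.CopyTo(packed, NonceSize);
        cipher.CopyTo(packed, NonceSize + TagSize);
        return packed;
    }

    private static string Decrypt(byte[] packed, byte[] key)
    {
        var nonce = packed.AsSpan(0, NonceSize);
        var tag = packed.AsSpan(NonceSize, TagSize);
        var cipher = packed.AsSpan(NonceSize + TagSize);
        var plaintext = new byte[cipher.Length];
        using var aes = new AesGcm(key, TagSize);
        // Throws CryptographicException when the key or token is wrong: the "fails loudly" case.
        aes.Decrypt(nonce, cipher, tag, plaintext);
        return Encoding.UTF8.GetString(plaintext);
    }
}
```

Authenticated encryption is what makes a wrong key an exception rather than silent garbage: the GCM tag check fails before any plaintext is returned.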

Nothing leaves your process

Detection is deterministic regex + checksums with no network calls and no telemetry - so PII isn't shipped to a cloud classifier just to be redacted. Suitable for air-gapped and regulated deployments.

Broad, validated coverage

~50 entity types - email, phone, cards (Luhn), IBAN (mod-97), crypto, plus an always-on US pack and opt-in UK/DE/IN/IT/ES packs - with an optional multilingual NER add-on. The full engine, breadth, and operators live in TasmanianDevil.
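The two checksums named above are standard algorithms, so they can be shown in isolation. This is a generic sketch of Luhn and IBAN mod-97 validation, not code from TasmanianDevil; the class and method names are made up for the example.

```csharp
using System;

// Hypothetical sketch: checksum validation used to cut false positives from regex matches.
public static class ChecksumSketch
{
    // Luhn: walking from the right, double every second digit, subtract 9 if above 9, sum mod 10 == 0.
    public static bool PassesLuhn(string candidate)
    {
        var sum = 0;
        var count = 0;
        for (var i = candidate.Length - 1; i >= 0; i--)
        {
            var c = candidate[i];
            if (c == ' ' || c == '-') continue;      // allow common separators
            if (c < '0' || c > '9') return false;    // anything else is not a card number
            var d = c - '0';
            if (count % 2 == 1)
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            count++;
        }
        return count >= 12 && sum % 10 == 0;
    }

    // IBAN: move the first four characters to the end, map A..Z to 10..35, number mod 97 == 1.
    public static bool PassesIbanMod97(string candidate)
    {
        var iban = candidate.Replace(" ", "").ToUpperInvariant();
        if (iban.Length < 15 || iban.Length > 34) return false;

        var rearranged = iban.Substring(4) + iban.Substring(0, 4);
        var remainder = 0;
        foreach (var c in rearranged)
        {
            if (c >= '0' && c <= '9')
                remainder = (remainder * 10 + (c - '0')) % 97;
            else if (c >= 'A' && c <= 'Z')
                remainder = (remainder * 100 + (c - 'A' + 10)) % 97; // letters expand to two digits
            else
                return false;
        }
        return remainder == 1;
    }
}
```

A 16-digit number that matches a card regex but fails Luhn is most likely an order or tracking number, which is why checksum validation keeps regex-based detection precise.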

Built-in Rules

22 rules. Ordered by cost.

Cheap regex checks run first and short-circuit. Expensive LLM calls only run if needed.

Order	Rule	Type	Phase
5	`InputNormalizationRule`	Local	Input
8	`RetrievalGuardrailRule`	Regex	Input
10	`PromptInjectionRule`	Regex	Input
11	`DefenderPromptInjectionRule`	ONNX ML (bundled)	Input
12	`OnnxPromptInjectionRule`	ONNX ML (DeBERTa)	Input
12	`PIGuardPromptInjectionRule`	ONNX ML (DeBERTa, PIGuard)	Input
13	`RemotePromptInjectionRule`	Remote ML	Input
14	`AzurePromptShieldRule`	Azure API	Input
15	`LlmPromptInjectionRule`	LLM	Input
20	`PiiRule` (+ optional `GlinerNerRecognizer`)	Regex + ONNX NER	Both
22	`SecretsDetectionRule`	Regex	Both
25	`LlmPiiDetectionRule`	LLM	Both
35	`LlmTopicGuardrailRule`	LLM	Input
40	`TokenLimitRule`	Local	Input / Output
45	`ToolCallGuardrailRule`	Regex	Output
47	`ToolResultGuardrailRule`	Regex	Output
50	`ContentSafetyRule`	Pluggable	Both
50	`OpirSafetyRule`	ONNX ML (mDeBERTa, multilingual)	Input
55	`LlmOutputPolicyRule`	LLM	Output
65	`LlmGroundednessRule`	LLM	Output
75	`LlmCopyrightRule`	LLM	Output
76	`AzureProtectedMaterialRule`	Azure API	Output
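The cost ordering in the table can be illustrated with a small standalone sketch: rules are sorted by their order number and evaluation stops at the first block, so a cheap regex hit means the expensive check never runs. The types here are hypothetical and model blocking only, not redaction or the library's actual pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rule shape: a name, an order number, and a "does this text get blocked?" delegate.
public sealed record SketchRule(string Name, int Order, Func<string, bool> Blocks);

public sealed record SketchOutcome(bool IsBlocked, string BlockedBy, IReadOnlyList<string> Evaluated);

public static class OrderedPipelineSketch
{
    public static SketchOutcome Run(IEnumerable<SketchRule> rules, string text)
    {
        var evaluated = new List<string>();

        // Lower order runs first, regardless of registration order.
        foreach (var rule in rules.OrderBy(r => r.Order))
        {
            evaluated.Add(rule.Name);
            if (rule.Blocks(text))
                return new SketchOutcome(true, rule.Name, evaluated); // short-circuit: skip costlier rules
        }

        return new SketchOutcome(false, "", evaluated);
    }
}
```

With a regex rule at order 10 and an LLM rule at order 15, an input the regex catches returns after one evaluation and costs no LLM call.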

Benchmarks

How the classifiers compare.

The classifiers are complementary, not competing. The bundled Defender model is the fast default for English prompt injection; PIGuard and Opir are optional models you can layer on for cases it isn't built for. Numbers are from held-out datasets - full method and data in the Kyoto repo's eng/*-eval RESULTS files. Recall = % of unsafe inputs blocked; FPR = % of safe inputs blocked.

Prompt injection

This table compares the shipped rules head to head on a balanced, held-out sample of 25 inputs per class (jackhhao English jailbreaks, deepset German injections); cells are recall / FPR. The bundled Defender model is the fast default - on the full jackhhao test split it scores 90.6% / 0.8%. The LLM-as-judge tier (LlmPromptInjectionRule) runs on any IChatClient you supply - AgentGuard bundles no LLM; the two models below are illustrative bring-your-own examples (one capable, one tiny), not shipped components. Per-call latency measured on an Apple M4 Pro.

Classifier	jackhhao	deepset (German)	per call
regex (medium)	60% / 8%	8% / 0%	<1 ms
Defender (bundled)	92% / 4%	64% / 0%	~8 ms
LLM (BYO) · gemma-4-26b-a4b	96% / 0%	68% / 0%	~6 s
LLM (BYO) · qwen3-0.6b	32% / 16%	16% / 0%	~1.5 s

The LLM rows show that quality depends heavily on the model you bring: the MoE gemma-4-26b-a4b (~4B active) matches or beats every other row on recall and FPR, at roughly 6 s per call, while a tiny 0.6B lands well below Defender. Both happen to run fully locally, so an LLM tier is a real option for offline deployments - but the model is yours to choose, so budget the latency and pick a capable one. PIGuard isn't in this table: both datasets are in its training set, so its numbers would be optimistic - it's shown on indirect injection below, where it's held-out.
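For readers checking the cells: recall and FPR as defined in the benchmark introduction are two simple ratios. The sketch below is illustrative arithmetic only, and the helper names are made up. Assuming the sample is exactly 25 inputs per class as stated, each input moves a cell by 4 points, so Defender's 92% / 4% on jackhhao corresponds to 23 of 25 unsafe inputs blocked and 1 of 25 safe inputs blocked.

```csharp
using System;

// Hypothetical helpers: the two benchmark metrics as percentages.
public static class MetricsSketch
{
    // Recall = share of unsafe inputs that were blocked.
    public static double Recall(int unsafeBlocked, int unsafeTotal) =>
        100.0 * unsafeBlocked / unsafeTotal;

    // FPR = share of safe inputs that were (wrongly) blocked.
    public static double FalsePositiveRate(int safeBlocked, int safeTotal) =>
        100.0 * safeBlocked / safeTotal;
}
```

The coarse 4-point granularity is worth keeping in mind when comparing rows: a one-input difference on this sample shows up as a 4-point gap.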

Indirect / code-style injection

A direct-injection sentence classifier is weak by design on payloads hidden in tool results. The optional PIGuard model (DeBERTa-v3) is built for these and layers on top of Defender. Both are scored here on a held-out indirect-injection set (BIPIA) and an over-defense benign set (NotInject).

Check	Defender - bundled	PIGuard - optional
Indirect / code injection (BIPIA), recall	34%	96%
Over-defense (NotInject benign FPR, lower better)	10.3%	8.3%

Run them together - PIGuard after Defender - rather than choosing one.

Multilingual content safety

Content safety is a different job from injection detection. For non-English toxicity you can call Azure AI Content Safety (cloud, billed per call) or run the optional Opir model (mDeBERTa-v3) fully offline. Toxicity on textdetox/multilingual_toxicity_dataset, balanced per language; cells are recall / FPR. textdetox negatives are real, partly borderline social-media comments, so absolute FPRs run high for every model.

Language	Opir - offline	Azure CS - cloud
German	72% / 24%	92% / 52%
Spanish	76% / 24%	92% / 20%
Russian	52% / 16%	76% / 8%
Arabic	40% / 36%	84% / 24%
Chinese	40% / 28%	44% / 32%
Hindi	56% / 16%	64% / 4%

Azure generally leads on recall; Opir trades some of that for running locally, free, and PII-safe, with comparable-or-lower FPR on German and Chinese. Use Opir when you need offline or sovereign deployment; reach for Azure when a cloud call is fine and you want maximum recall.

Architecture

Input → Agent → Output.

Rules run as a pipeline. Input guardrails protect the LLM. Output guardrails protect the user.

1. Input Guardrails

Normalization, RAG chunk filtering, prompt injection detection, PII redaction, secrets detection, topic boundary, token limits - all before the LLM sees the input.

2. Agent Execution

Your AI agent runs with the sanitized input. Works with any framework - MAF, Semantic Kernel, or standalone. Supports both RunAsync and streaming.

3. Output Guardrails

Tool call injection detection, indirect injection in tool results, PII/secrets redaction, policy compliance, groundedness checking, copyright detection - all before the response reaches the user. Optional re-ask re-prompts the LLM with the failure reason on violations.

Packages

Pick what you need.

Eight packages, layered. The main AgentGuard package is all you need to start. Add framework adapters, cloud integrations, or remote classifiers as needed.

AgentGuard

All-in-one package. Core rules engine, bundled Defender multi-head ONNX model (minilm-multihead-v5), and offline classifiers. No agent framework dependency.

AgentGuard.Pii

Flagship offline PII engine: ~50 entity types across generic + always-on US + opt-in country packs, lemma-aware context scoring, anonymization operators, reversible de-identification, and structured JSON/CSV + batch APIs. Architecture inspired by Microsoft Presidio (MIT). Fully offline.

AgentGuard.AgentFramework

Microsoft Agent Framework adapter. UseAgentGuard() middleware + workflow guardrails via .WithGuardrails().

AgentGuard.RemoteClassifier

Remote ML classifier via HTTP. Call Sentinel-v2, Ollama, vLLM, or custom endpoints for SOTA prompt injection detection.

AgentGuard.Azure

Azure AI Content Safety integration. Prompt Shields (injection detection), protected material detection (text & code with license citations), category analysis, severity thresholds, and server-side blocklists.

AgentGuard.Hosting

DI registration, named policy factory, and appsettings.json configuration binding for ASP.NET Core and Aspire.

Built on two standalone engines: TasmanianDevil (the PII engine behind AgentGuard.Pii) and Kyoto (the ONNX classifiers behind AgentGuard.Onnx).

LLM-as-Judge

When regex isn't enough.

Plug in any IChatClient - Azure OpenAI, Ollama, local models. Built-in prompt templates, fail-open on errors.

using AgentGuard.Onnx;
using AgentGuard.Azure.PromptShield;

// Multi-tier prompt injection: Regex → Defender → Remote ML → Prompt Shield → LLM
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()                      // regex (order 10)
    .BlockPromptInjectionWithDefender()          // Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier(   // remote ML (order 13)
        "http://localhost:8000/classify")
    .BlockPromptInjectionWithAzurePromptShield(  // Azure Prompt Shield (order 14)
        endpoint, apiKey)
    .BlockPromptInjectionWithLlm(chatClient)     // LLM (order 15)
    .DetectPIIWithLlm(chatClient,
        new() { Action = PiiAction.Redact })
    .EnforceOutputPolicy(chatClient,
        "Never recommend competitor products")
    .CheckGroundedness(chatClient)
    .CheckCopyright(chatClient)
    .Build();

Re-ask (Experimental)

Re-ask on violation.

When output guardrails block a response, the pipeline can re-prompt the LLM with the failure reason and re-evaluate. Opt-in, configurable, non-streaming only for now.

var policy = new GuardrailPolicyBuilder()
    .EnforceOutputPolicy(chatClient, "Never recommend competitors")
    .CheckGroundedness(chatClient)
    .EnableReask(chatClient, o =>
    {
        o.MaxAttempts = 2;
        o.IncludeBlockedResponse = true;
    })
    .Build();

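// Build the pipeline from the policy, as in the Quick Start
var pipeline = new GuardrailPipeline(policy, logger);

// outputContext is a GuardrailContext with Phase = GuardrailPhase.Output
var result = await pipeline.RunAsync(outputContext);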

if (result.WasReasked)
    Console.WriteLine($"Re-asked {result.ReaskAttemptsUsed} time(s)");
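Conceptually, re-ask is a bounded retry loop. The sketch below shows that control flow with plain delegates. It is an assumption about the general shape, not the library's internals, and `ReaskSketch`, `ReaskOutcome`, and the delegate parameters are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public sealed record ReaskOutcome(string Text, bool IsBlocked, int ReaskAttemptsUsed);

public static class ReaskSketch
{
    // checkOutput returns a failure reason, or "" when the response passes.
    // reprompt receives the blocked response and the reason, and returns a new response.
    public static async Task<ReaskOutcome> RunAsync(
        string firstResponse,
        Func<string, string> checkOutput,
        Func<string, string, Task<string>> reprompt,
        int maxAttempts)
    {
        var response = firstResponse;
        var reason = checkOutput(response);
        var attempts = 0;

        // Retry only while the output is still blocked and the attempt budget allows it.
        while (reason.Length > 0 && attempts < maxAttempts)
        {
            attempts++;
            response = await reprompt(response, reason);
            reason = checkOutput(response);
        }

        return new ReaskOutcome(response, reason.Length > 0, attempts);
    }
}
```

Capping attempts matters because every retry is a full LLM round trip; if the last attempt still fails, the response stays blocked rather than looping indefinitely.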

Observability

See every rule fire.

OpenTelemetry-compatible spans and metrics out of the box. Register with one line - works with Aspire, Jaeger, Zipkin, and any OTel collector.

using AgentGuard.Hosting;

// Register AgentGuard telemetry with OpenTelemetry
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t.AddAgentGuardInstrumentation())
    .WithMetrics(m => m.AddAgentGuardInstrumentation());

// Spans emitted:
//   agentguard.pipeline.run          (policy, phase, outcome)
//   agentguard.rule.evaluate {name}  (rule, phase, order, outcome)
//   agentguard.pipeline.reask        (attempts, outcome)
//   agentguard.middleware.input      (agent, outcome)
//   agentguard.middleware.output     (agent, outcome, tool calls)
//
// Metrics emitted:
//   agentguard.pipeline.evaluations  (counter)
//   agentguard.rule.evaluations      (counter)
//   agentguard.rule.blocks           (counter)
//   agentguard.pipeline.duration     (histogram, ms)
//   agentguard.rule.duration         (histogram, ms)
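For context on how telemetry like this can exist without an OpenTelemetry SDK dependency, here is a small sketch built only on System.Diagnostics. The source/meter name "Sketch.Guardrails" and the class are hypothetical; the span and counter names are borrowed from the list above for illustration. An in-process ActivityListener stands in for an OTel collector.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical sketch: emit a span and a counter per rule evaluation using only the BCL.
public static class TelemetrySketch
{
    private static readonly ActivitySource Source = new("Sketch.Guardrails");
    private static readonly Meter GuardMeter = new("Sketch.Guardrails");
    private static readonly Counter<long> RuleEvaluations =
        GuardMeter.CreateCounter<long>("agentguard.rule.evaluations");

    public static bool EvaluateRule(string ruleName, Func<bool> blocks)
    {
        // StartActivity returns null when nothing is listening, so the cost is near zero.
        using var activity = Source.StartActivity("agentguard.rule.evaluate");
        var blocked = blocks();
        activity?.SetTag("rule", ruleName);
        activity?.SetTag("outcome", blocked ? "blocked" : "passed");
        RuleEvaluations.Add(1);
        return blocked;
    }

    // Runs the given work with a listener attached and returns "operation:outcome" per finished span.
    public static List<string> CaptureSpans(Action work)
    {
        var seen = new List<string>();
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Sketch.Guardrails",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
            ActivityStopped = a => seen.Add(a.OperationName + ":" + a.GetTagItem("outcome"))
        };
        ActivitySource.AddActivityListener(listener);
        work();
        return seen;
    }
}
```

An OpenTelemetry exporter subscribes to the same ActivitySource and Meter names, which is why the instrumented library itself needs no exporter package.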

Extensible

Build your own rules.

Implement IGuardrailRule and plug it in. Full access to conversation context, phase, and metadata.

using AgentGuard.Core.Abstractions;

public class NoProfanityRule : IGuardrailRule
{
    public string Name => "no-profanity";
    public GuardrailPhase Phase => GuardrailPhase.Output;
    public int Order => 100;

    public ValueTask<GuardrailResult> EvaluateAsync(
        GuardrailContext context,
        CancellationToken cancellationToken = default)
    {
        var hasProfanity = ProfanityDetector.Check(context.Text);

        return ValueTask.FromResult(hasProfanity
            ? GuardrailResult.Blocked("Inappropriate language.")
            : GuardrailResult.Passed());
    }
}

// Add to any pipeline
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()
    .AddRule(new NoProfanityRule())
    .Build();

Safety controls for .NET AI agents.