Composable, testable, declarative guardrails. Block prompt injections, redact PII, enforce topics, and validate outputs.
dotnet add package AgentGuard --prerelease
Quick Start
Pick the level that fits your architecture - framework-agnostic standalone, IChatClient decorator, or Microsoft Agent Framework middleware.
1. Standalone pipeline - framework-agnostic, run rules directly
using AgentGuard.Core.Abstractions;
using AgentGuard.Core.Builders;
using AgentGuard.Core.Guardrails;
using AgentGuard.Onnx;
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPii()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
    .Build();
var pipeline = new GuardrailPipeline(policy, logger);
var ctx = new GuardrailContext
{
    Text = userInput,
    Phase = GuardrailPhase.Input,
    Messages = conversationHistory // optional - enables history-aware rules
};
var result = await pipeline.RunAsync(ctx);
if (result.IsBlocked)
    Console.WriteLine(result.BlockingResult!.Reason);
else if (result.WasModified)
    Console.WriteLine(result.FinalText);
2. IChatClient decorator - wrap any chat client with one call
using AgentGuard.Core.ChatClient;
using AgentGuard.Onnx;
// Wrap any IChatClient - works with OpenAI, Azure OpenAI, Ollama, or any
// Microsoft.Extensions.AI client. Conversation history is propagated automatically.
var guardedClient = chatClient.UseAgentGuard(g => g
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPii()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
);
// Use exactly like a normal IChatClient
var response = await guardedClient.GetResponseAsync(conversationHistory);
// Streaming works too
await foreach (var update in guardedClient.GetStreamingResponseAsync(conversationHistory))
{
    Console.Write(update.Text);
}
3. Microsoft Agent Framework middleware - plug into AIAgentBuilder
using AgentGuard.AgentFramework;
using AgentGuard.Onnx;
var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g
        .NormalizeInput()
        .BlockPromptInjection()
        .BlockPromptInjectionWithDefender()
        .RedactPii()
        .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
        .LimitInputTokens(4000)
    )
    .Build();
// Use exactly like a normal agent
var response = await guardedAgent.RunAsync(messages, session, options);
// Streaming works too - with progressive retraction support
await foreach (var update in guardedAgent.RunStreamingAsync(messages, session, options))
{
    Console.Write(update.Text);
}
Features
22 built-in rules across the input and output phases - regex, ONNX classifiers, and LLM-as-judge.
Prompt injection detection in six tiers: regex patterns (from Arcanum taxonomy), bundled StackOne Defender multi-head ONNX model (minilm-multihead-v5, calibrated dual-head, ~8ms, no download), optional DeBERTa v3 classifiers including PIGuard (strong on indirect / code-style injection), remote ML classifier (Sentinel-v2 via HTTP), Azure Prompt Shields (jailbreaks + indirect injection), and LLM-as-judge with structured threat classification.
Content safety covering toxicity, hate speech, violence, sexual content, self-harm, and harassment. Plug in Azure AI Content Safety, or run the offline Opir mDeBERTa classifier for non-English text (German, Spanish, Russian, Arabic, Chinese, Hindi) when a per-call cloud API isn't an option.
A complete offline PII detection + de-identification engine - ~50 entity types, reversible encryption, structured JSON/CSV redaction, and a batch API. Always-on generic + US recognizers, opt-in country packs (UK/DE/IN/IT/ES), and an optional multilingual NER add-on.
Keep conversations on-topic with LLM semantic classification. Understands intent and conversation context - not just keywords. Conversation history is automatically included so follow-up messages are evaluated correctly.
Policy enforcement, groundedness checking, and copyright detection. Catch hallucinations, brand violations, and copyrighted content before they reach users.
Decodes evasion encodings - base64, hex escapes, reversed text, Unicode homoglyphs - before any other rule evaluates, so encoding-based attacks are caught by the downstream rules.
Guard tool call arguments against SQL injection, code injection, SSRF, and more. Detect indirect prompt injection in tool results (emails, documents). Filter RAG chunks for injection and secrets before they reach the LLM context.
Gate any rule per request with .When() / .Unless(). The predicate sees the guardrail context and can capture ambient services like IHttpContextAccessor - enable, disable, or retune a rule based on ClaimsPrincipal, tenant, feature flag, or detected language (e.g. run the English-centric Defender classifier at a higher threshold for non-English users).
Built-in spans and metrics for every pipeline run, rule evaluation, re-ask attempt, and streaming retraction. Uses System.Diagnostics - no SDK dependency. Works with Aspire Dashboard, Jaeger, Zipkin, and any OTel collector.
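To make the normalization idea from the feature list concrete, here is a minimal sketch of decoding evasion encodings before other checks run. It is not AgentGuard's implementation: `NormalizationSketch`, its regexes, and the 16-character base64 threshold are assumptions chosen for illustration, and homoglyph mapping is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of AgentGuard: expands an input into decoded
// variants so that later checks can scan the hidden text as well.
public static class NormalizationSketch
{
    // Runs of 16+ base64 characters with optional padding (threshold is an assumption).
    private static readonly Regex Base64Run =
        new(@"[A-Za-z0-9+/]{16,}={0,2}", RegexOptions.Compiled);

    // Hex escapes of the form \x41.
    private static readonly Regex HexEscape =
        new(@"\\x([0-9A-Fa-f]{2})", RegexOptions.Compiled);

    public static IReadOnlyList<string> Variants(string input)
    {
        var variants = new List<string> { input };

        // 1. Hex escapes become literal characters.
        var unescaped = HexEscape.Replace(input,
            match => ((char)Convert.ToByte(match.Groups[1].Value, 16)).ToString());
        if (unescaped != input) variants.Add(unescaped);

        // 2. Base64 runs become UTF-8 text when they decode cleanly.
        foreach (Match run in Base64Run.Matches(input))
        {
            if (TryDecodeBase64(run.Value, out var decoded)) variants.Add(decoded);
        }

        // 3. Reversed text (plain char reversal; surrogate pairs are not handled).
        var chars = input.ToCharArray();
        Array.Reverse(chars);
        variants.Add(new string(chars));

        return variants;
    }

    private static bool TryDecodeBase64(string token, out string decoded)
    {
        decoded = string.Empty;
        if (token.Length % 4 != 0) return false;

        var buffer = new byte[token.Length];
        if (!Convert.TryFromBase64String(token, buffer, out var written)) return false;

        decoded = Encoding.UTF8.GetString(buffer, 0, written);
        // Reject results containing control characters: likely binary, not text.
        foreach (var c in decoded)
        {
            if (char.IsControl(c) && c != '\n' && c != '\r' && c != '\t') return false;
        }
        return decoded.Length > 0;
    }
}
```

Each variant returned by `NormalizationSketch.Variants(userInput)` could then be passed to a pattern check, so a base64-wrapped instruction is seen in its decoded form. Long ordinary words can also match the base64 pattern, which is why the sketch filters out results that contain control characters.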
Flagship Capability
Detection and reversible de-identification, written from scratch in C# - architecture inspired by Microsoft Presidio (MIT; not affiliated or endorsed). No cloud, no ML required for the core. PII never leaves your process.
8 generic recognizers (email, phone via libphonenumber, credit card / Luhn, IBAN / mod-97, crypto, IP, URL, MAC) and a
9-entity US pack are always on. Opt-in country packs add national IDs, tax numbers, and passports for the
UK, Germany, India, Italy, and Spain. An optional offline ONNX add-on layers in multilingual named-entity detection
(PERSON, LOCATION, ORGANIZATION, DATE_TIME), all resolved in one pass.
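The Luhn and mod-97 checks mentioned above are standard public algorithms, and a short sketch shows why they cut false positives: a regex match only counts if the checksum also holds. `ChecksumSketch` is a hypothetical name; this is the textbook form of each check, not AgentGuard's recognizer code.

```csharp
using System;

// Hypothetical helper illustrating the two checksum families named above.
public static class ChecksumSketch
{
    // Luhn: double every second digit from the right, subtract 9 from results
    // above 9, and require the total to be divisible by 10.
    public static bool IsValidLuhn(string candidate)
    {
        int sum = 0, digitCount = 0;
        bool doubleIt = false;
        for (int i = candidate.Length - 1; i >= 0; i--)
        {
            char c = candidate[i];
            if (c == ' ' || c == '-') continue;      // common card separators
            if (c < '0' || c > '9') return false;

            int digit = c - '0';
            if (doubleIt)
            {
                digit *= 2;
                if (digit > 9) digit -= 9;
            }
            sum += digit;
            doubleIt = !doubleIt;
            digitCount++;
        }
        return digitCount >= 2 && sum % 10 == 0;
    }

    // IBAN mod-97: move the first four characters to the end, map letters
    // A..Z to 10..35, and require the resulting number mod 97 to equal 1.
    public static bool IsValidIbanMod97(string iban)
    {
        var compact = iban.Replace(" ", "").ToUpperInvariant();
        if (compact.Length < 5 || compact.Length > 34) return false;

        var rearranged = compact.Substring(4) + compact.Substring(0, 4);
        int remainder = 0;
        foreach (char c in rearranged)
        {
            if (c >= '0' && c <= '9')
                remainder = (remainder * 10 + (c - '0')) % 97;
            else if (c >= 'A' && c <= 'Z')
                remainder = (remainder * 100 + (c - 'A' + 10)) % 97; // two-digit value
            else
                return false;
        }
        return remainder == 1;
    }
}
```

For example, `ChecksumSketch.IsValidLuhn("4111 1111 1111 1111")` accepts the well-known test card number, while changing its last digit makes the check fail. The sketch validates only the checksum, not country-specific IBAN lengths.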
De-identify, then restore - encrypt is fully reversible
var analyzer = new AnalyzerEngine(PiiRecognizers.CreateDefaultRegistry("en"));
var anonymizer = new AnonymizerEngine();
var text = "Email ada@acme.com or call +1 415 555 0132";
var results = analyzer.Analyze(text, "en");
var anonymized = anonymizer.Anonymize(text, results);
// replace: "Email <EMAIL_ADDRESS> or call <PHONE_NUMBER>"
// reversible round-trip: encrypt with a key, persist the items, decrypt later
// (`encrypted` and `decryptOps` come from the encrypt / decrypt operator setup, which this excerpt omits)
var deid = PiiDeidentificationResult.FromEngineResult(encrypted); // deid.IsReversible == true
var restored = new DeanonymizerEngine().Deanonymize(deid.AnonymizedText, deid.Items, decryptOps);
// restored.Text == original text (byte-for-byte)
In a Microsoft Agent Framework agent - PII never reaches the model
var agent = chatClient
    .AsAIAgent(instructions, name: "SupportBot", tools: [lookupCustomer])
    .AsBuilder()
    .UseAgentGuard(g => g.RedactPii().GuardToolResults()) // input, output & tool-result redaction
    .UsePiiReversibleRedaction(key)                       // encrypt in, decrypt out
    .Build();
// user       : "Email me at john@example.com about order 12345"
// model sees : "Email me at <AES ciphertext token> about order 12345"
// user gets  : "...follow up on john@example.com" (decrypted only on the way out)
Encrypt PII spans (AES) and restore them later with DeanonymizerEngine. Lossy operators (mask, hash, redact) report themselves as non-reversible; a wrong key fails loudly, never silently.
Redact JSON by key path (allow / deny) preserving shape and non-string types, or infer PII columns in CSV/TSV and redact them consistently - reusing the same recognizers and operators.
Analyze and anonymize lists or keyed record dictionaries in one call, with results aligned to the input. Sequential, allocation-light, and safe to share across threads.
Deterministic regex + checksums with no network calls and no telemetry. Sensitive data stays inside the process - suitable for air-gapped and regulated environments.
Opt-in national identifier packs (UK / DE / IN / IT / ES) are validated with real checksums and disabled by default to keep false positives low. The US pack is always on.
A separate offline ONNX model adds names, places, organizations, and dates across languages - merged into the same order-20 pass as the regex recognizers when you want it.
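The reversible-encryption behaviour described above (restore later, and a wrong key fails loudly) can be sketched with the .NET BCL. The text only says "AES"; the choice of AES-GCM, the token layout, and the `ReversibleTokenSketch` name are assumptions for this example, which also assumes .NET 8 or later for the two-argument `AesGcm` constructor.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not AgentGuard's operator: encrypts one PII span into
// a base64 token laid out as nonce | tag | ciphertext.
public static class ReversibleTokenSketch
{
    private const int NonceSize = 12; // recommended nonce length for AES-GCM
    private const int TagSize = 16;   // 128-bit authentication tag

    public static string Encrypt(string plaintext, byte[] key)
    {
        var nonce = RandomNumberGenerator.GetBytes(NonceSize);
        var plain = Encoding.UTF8.GetBytes(plaintext);
        var cipher = new byte[plain.Length];
        var tag = new byte[TagSize];

        using var aes = new AesGcm(key, TagSize);
        aes.Encrypt(nonce, plain, cipher, tag);

        var packed = new byte[NonceSize + TagSize + cipher.Length];
        nonce.CopyTo(packed, 0);
        tag.CopyTo(packed, NonceSize);
        cipher.CopyTo(packed, NonceSize + TagSize);
        return Convert.ToBase64String(packed);
    }

    public static string Decrypt(string token, byte[] key)
    {
        var packed = Convert.FromBase64String(token);
        if (packed.Length < NonceSize + TagSize)
            throw new CryptographicException("Token is too short.");

        var nonce = packed.AsSpan(0, NonceSize);
        var tag = packed.AsSpan(NonceSize, TagSize);
        var cipher = packed.AsSpan(NonceSize + TagSize);
        var plain = new byte[cipher.Length];

        // Throws a CryptographicException when the key or the token is wrong,
        // because the authentication tag no longer verifies.
        using var aes = new AesGcm(key, TagSize);
        aes.Decrypt(nonce, cipher, tag, plain);
        return Encoding.UTF8.GetString(plain);
    }
}
```

An authenticated mode is what makes the failure loud: decrypting with the wrong key raises an exception instead of returning plausible-looking garbage. Keys must be 16, 24, or 32 bytes, for example `RandomNumberGenerator.GetBytes(32)`.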
Built-in Rules
Cheap regex checks run first and short-circuit. Expensive LLM calls only run if needed.
| Order | Rule | Type | Phase |
|---|---|---|---|
| 5 | InputNormalizationRule | Local | Input |
| 8 | RetrievalGuardrailRule | Regex | Input |
| 10 | PromptInjectionRule | Regex | Input |
| 11 | DefenderPromptInjectionRule | ONNX ML (bundled) | Input |
| 12 | OnnxPromptInjectionRule | ONNX ML (DeBERTa) | Input |
| 12 | PIGuardPromptInjectionRule | ONNX ML (DeBERTa, PIGuard) | Input |
| 13 | RemotePromptInjectionRule | Remote ML | Input |
| 14 | AzurePromptShieldRule | Azure API | Input |
| 15 | LlmPromptInjectionRule | LLM | Input |
| 20 | PiiRule (+ optional GlinerNerRecognizer) | Regex + ONNX NER | Both |
| 22 | SecretsDetectionRule | Regex | Both |
| 25 | LlmPiiDetectionRule | LLM | Both |
| 35 | LlmTopicGuardrailRule | LLM | Input |
| 40 | TokenLimitRule | Local | Input / Output |
| 45 | ToolCallGuardrailRule | Regex | Output |
| 47 | ToolResultGuardrailRule | Regex | Output |
| 50 | ContentSafetyRule | Pluggable | Both |
| 50 | OpirSafetyRule | ONNX ML (mDeBERTa, multilingual) | Input |
| 55 | LlmOutputPolicyRule | LLM | Output |
| 65 | LlmGroundednessRule | LLM | Output |
| 75 | LlmCopyrightRule | LLM | Output |
| 76 | AzureProtectedMaterialRule | Azure API | Output |
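The Order column above drives the "cheap first, short-circuit" behaviour. A minimal sketch of that idea, using hypothetical types (`SketchRule`, `OrderedPipelineSketch`) rather than AgentGuard's own pipeline, and assuming that evaluation stops at the first blocking rule:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical rule shape for illustration: a name, an order, and a check
// that returns true when the text should be blocked.
public sealed record SketchRule(string Name, int Order, Func<string, ValueTask<bool>> ShouldBlock);

public static class OrderedPipelineSketch
{
    // Evaluates rules in ascending Order and stops at the first block.
    // Returns the blocking rule's name, or null when every rule passes.
    public static async Task<string?> RunAsync(IEnumerable<SketchRule> rules, string text)
    {
        foreach (var rule in rules.OrderBy(r => r.Order))
        {
            if (await rule.ShouldBlock(text))
            {
                return rule.Name; // later, more expensive rules never run
            }
        }
        return null;
    }
}
```

With a regex rule at order 10 and an LLM rule at order 15, an input that the regex rule blocks never reaches the LLM call, which is where the latency and cost savings come from.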
Benchmarks
The classifiers are complementary, not competing. The bundled Defender model is the fast default
for English prompt injection; PIGuard and Opir are optional models you can layer on for cases it
isn't built for. Numbers are from held-out datasets - full method and data in the
eng/*-eval RESULTS files. Recall = % of unsafe inputs blocked; FPR = % of safe inputs blocked.
The table below compares the real rules head to head on a balanced 25/class held-out sample (jackhhao English
jailbreaks, deepset German injections); cells are recall / FPR. On the full jackhhao test split, the bundled
Defender model scores 90.6% / 0.8%. The LLM-as-judge tier
(LlmPromptInjectionRule) runs on any IChatClient you supply -
AgentGuard bundles no LLM; the two models below are illustrative bring-your-own
examples (one capable, one tiny), not shipped components. Per-call latency was measured on an Apple M4 Pro.
| Classifier | jackhhao | deepset (German) | per call |
|---|---|---|---|
| regex (medium) | 60% / 8% | 8% / 0% | <1 ms |
| Defender (bundled) | 92% / 4% | 64% / 0% | ~8 ms |
| LLM (BYO) · gemma-4-26b-a4b | 96% / 0% | 68% / 0% | ~6 s |
| LLM (BYO) · qwen3-0.6b | 32% / 16% | 16% / 0% | ~1.5 s |
The LLM rows illustrate that quality depends heavily on the model you bring: the MoE gemma-4-26b-a4b (~4B active) has the highest recall on both datasets with 0% FPR, while a tiny 0.6B model lands well below Defender. Both happen to run fully locally, so an LLM tier is a real option for offline deployments - but the model is yours to choose, so budget for the latency and pick a capable one. PIGuard isn't in this table: both datasets are in its training set, so its numbers would be optimistic - it's shown on indirect injection below, where it's held-out.
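As a worked example of the two metrics, the definitions above reduce to simple ratios. The helper below is a sketch with hypothetical names; the counts are back-computed from the table on the assumption that "25/class" means 25 unsafe and 25 safe inputs per dataset.

```csharp
// Hypothetical helper: the two ratios used in the benchmark tables.
public static class MetricsSketch
{
    // Recall = % of unsafe inputs that were blocked.
    public static double Recall(int unsafeBlocked, int unsafeTotal) =>
        100.0 * unsafeBlocked / unsafeTotal;

    // FPR = % of safe inputs that were (wrongly) blocked.
    public static double FalsePositiveRate(int safeBlocked, int safeTotal) =>
        100.0 * safeBlocked / safeTotal;
}
```

Under that assumption, Defender's 92% / 4% on jackhhao corresponds to `MetricsSketch.Recall(23, 25)` and `MetricsSketch.FalsePositiveRate(1, 25)`. With 25 inputs per class, a single input moves a cell by 4 percentage points, so small gaps between rows on this sample should be read with that granularity in mind.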
A direct-injection sentence classifier is weak by design on payloads hidden in tool results. The optional PIGuard model (DeBERTa-v3) is built for these and layers on top of Defender. The table below uses a held-out indirect injection set (BIPIA) and an over-defense benign set (NotInject).
| Check | Defender - bundled | PIGuard - optional |
|---|---|---|
| Indirect / code injection (BIPIA), recall | 34% | 96% |
| Over-defense (NotInject benign FPR, lower better) | 10.3% | 8.3% |
Run them together - PIGuard after Defender - rather than choosing one.
Content safety is a different job from injection detection. For non-English toxicity you can call Azure AI
Content Safety (cloud, billed per call) or run the optional Opir model (mDeBERTa-v3) fully offline. The table
reports toxicity detection on textdetox/multilingual_toxicity_dataset, balanced per language; cells are recall / FPR.
The textdetox negatives are real, partly borderline social-media comments, so absolute FPRs run high for every model.
| Language | Opir - offline | Azure CS - cloud |
|---|---|---|
| German | 72% / 24% | 92% / 52% |
| Spanish | 76% / 24% | 92% / 20% |
| Russian | 52% / 16% | 76% / 8% |
| Arabic | 40% / 36% | 84% / 24% |
| Chinese | 40% / 28% | 44% / 32% |
| Hindi | 56% / 16% | 64% / 4% |
Azure generally leads on recall; Opir trades some of that for running locally, free, and PII-safe, with comparable-or-lower FPR on German and Chinese. Use Opir when you need offline or sovereign deployment; reach for Azure when a cloud call is fine and you want maximum recall.
Architecture
Rules run as a pipeline. Input guardrails protect the LLM. Output guardrails protect the user.
Normalization, RAG chunk filtering, prompt injection detection, PII redaction, secrets detection, topic boundary, token limits - all before the LLM sees the input.
Your AI agent runs with the sanitized input. Works with any framework - MAF, Semantic Kernel, or standalone. Supports both RunAsync and streaming.
Tool call injection detection, indirect injection in tool results, PII/secrets redaction, policy compliance, groundedness checking, copyright detection - all before the response reaches the user. Optional re-ask re-prompts the LLM with the failure reason on violations.
Packages
Eight packages, layered. The main AgentGuard package is all you need to start. Add framework adapters, cloud integrations, or remote classifiers as needed.
UseAgentGuard() middleware + workflow guardrails via .WithGuardrails().
appsettings.json configuration binding for ASP.NET Core and Aspire.
LLM-as-Judge
Plug in any IChatClient - Azure OpenAI, Ollama, local models. Built-in prompt templates, fail-open on errors.
using AgentGuard.Onnx;
using AgentGuard.Azure.PromptShield;
// Multi-tier prompt injection: Regex → Defender → Remote ML → Prompt Shield → LLM
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()                      // regex (order 10)
    .BlockPromptInjectionWithDefender()          // Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier(   // remote ML (order 13)
        "http://localhost:8000/classify")
    .BlockPromptInjectionWithAzurePromptShield(  // Azure Prompt Shield (order 14)
        endpoint, apiKey)
    .BlockPromptInjectionWithLlm(chatClient)     // LLM (order 15)
    .DetectPIIWithLlm(chatClient,
        new() { Action = PiiAction.Redact })
    .EnforceOutputPolicy(chatClient,
        "Never recommend competitor products")
    .CheckGroundedness(chatClient)
    .CheckCopyright(chatClient)
    .Build();
Re-ask (Experimental)
When output guardrails block a response, the pipeline can re-prompt the LLM with the failure reason and re-evaluate. Opt-in, configurable, non-streaming only for now.
var policy = new GuardrailPolicyBuilder()
    .EnforceOutputPolicy(chatClient, "Never recommend competitors")
    .CheckGroundedness(chatClient)
    .EnableReask(chatClient, o =>
    {
        o.MaxAttempts = 2;
        o.IncludeBlockedResponse = true;
    })
    .Build();
var pipeline = new GuardrailPipeline(policy, logger);
// outputContext: a GuardrailContext with Phase = GuardrailPhase.Output
var result = await pipeline.RunAsync(outputContext);
if (result.WasReasked)
    Console.WriteLine($"Re-asked {result.ReaskAttemptsUsed} time(s)");
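The re-ask flow described above amounts to a bounded retry loop: validate, feed the failure reason back to the model, and validate again. The sketch below shows that control flow with hypothetical delegates (`generate`, `validate`) and a hypothetical retry prompt; it is not AgentGuard's implementation or prompt template.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper illustrating a bounded re-ask loop.
public static class ReaskSketch
{
    // validate returns null when the response passes, otherwise a failure reason.
    public static async Task<(string Response, int ReaskAttempts, bool Passed)> RunAsync(
        string prompt,
        Func<string, Task<string>> generate,
        Func<string, string?> validate,
        int maxAttempts)
    {
        var response = await generate(prompt);
        var reason = validate(response);
        var attempts = 0;

        while (reason is not null && attempts < maxAttempts)
        {
            attempts++;
            // Tell the model why the last answer was rejected and include that answer.
            var retryPrompt =
                $"{prompt}\n\nYour previous answer was rejected: {reason}\nPrevious answer: {response}";
            response = await generate(retryPrompt);
            reason = validate(response);
        }

        return (response, attempts, reason is null);
    }
}
```

Capping the attempts keeps latency and token cost bounded; if the last attempt still fails validation, the caller sees `Passed == false` and can block the response as it would have without re-ask.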
Observability
OpenTelemetry-compatible spans and metrics out of the box. Register with one line - works with Aspire, Jaeger, Zipkin, and any OTel collector.
using AgentGuard.Hosting;
// Register AgentGuard telemetry with OpenTelemetry
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t.AddAgentGuardInstrumentation())
    .WithMetrics(m => m.AddAgentGuardInstrumentation());
// Spans emitted:
//   agentguard.pipeline.run             (policy, phase, outcome)
//   agentguard.rule.evaluate {name}     (rule, phase, order, outcome)
//   agentguard.pipeline.reask           (attempts, outcome)
//   agentguard.middleware.input         (agent, outcome)
//   agentguard.middleware.output        (agent, outcome, tool calls)
//
// Metrics emitted:
//   agentguard.pipeline.evaluations     (counter)
//   agentguard.rule.evaluations         (counter)
//   agentguard.rule.blocks              (counter)
//   agentguard.pipeline.duration        (histogram, ms)
//   agentguard.rule.duration            (histogram, ms)
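For readers unfamiliar with how a library can emit spans and metrics with only System.Diagnostics, here is a minimal sketch using `ActivitySource` and `Meter`. The source name, instrument names, and `TelemetrySketch` class are hypothetical and deliberately differ from the AgentGuard names listed above.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical instrumentation wrapper, not AgentGuard's internals.
public static class TelemetrySketch
{
    public const string SourceName = "Sketch.Guardrails";

    private static readonly ActivitySource Source = new(SourceName);
    private static readonly Meter SketchMeter = new(SourceName);
    private static readonly Counter<long> Evaluations =
        SketchMeter.CreateCounter<long>("sketch.rule.evaluations", "{evaluation}", "Rule evaluations");
    private static readonly Histogram<double> Duration =
        SketchMeter.CreateHistogram<double>("sketch.rule.duration", "ms", "Rule evaluation time");

    public static bool Evaluate(string ruleName, Func<bool> isBlocked)
    {
        // StartActivity returns null when no listener is subscribed,
        // so the instrumentation costs little when tracing is off.
        using var activity = Source.StartActivity("sketch.rule.evaluate");
        var stopwatch = Stopwatch.StartNew();

        var blocked = isBlocked();

        stopwatch.Stop();
        var outcome = blocked ? "blocked" : "passed";
        activity?.SetTag("rule", ruleName);
        activity?.SetTag("outcome", outcome);

        var tags = new TagList { { "rule", ruleName }, { "outcome", outcome } };
        Evaluations.Add(1, tags);
        Duration.Record(stopwatch.Elapsed.TotalMilliseconds, tags);
        return blocked;
    }
}
```

An OpenTelemetry exporter, or a plain `ActivityListener` / `MeterListener`, subscribes by source name; the emitting code itself needs no OpenTelemetry package reference.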
Extensible
Implement IGuardrailRule and plug it in. Full access to conversation context, phase, and metadata.
using AgentGuard.Core.Abstractions;
public class NoProfanityRule : IGuardrailRule
{
    public string Name => "no-profanity";
    public GuardrailPhase Phase => GuardrailPhase.Output;
    public int Order => 100;

    public ValueTask<GuardrailResult> EvaluateAsync(
        GuardrailContext context,
        CancellationToken cancellationToken = default)
    {
        // ProfanityDetector is a placeholder for your own detection logic.
        var hasProfanity = ProfanityDetector.Check(context.Text);
        return ValueTask.FromResult(hasProfanity
            ? GuardrailResult.Blocked("Inappropriate language.")
            : GuardrailResult.Passed());
    }
}
// Add to any pipeline
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()
    .AddRule(new NoProfanityRule())
    .Build();
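The rule above calls `ProfanityDetector.Check`, which the snippet does not define. One possible stand-in, assuming a small word list matched on word boundaries, is sketched below; the class body and the placeholder words are hypothetical and not shipped with AgentGuard.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the ProfanityDetector referenced by NoProfanityRule.
public static class ProfanityDetector
{
    // Placeholder list; replace with your own vocabulary or classifier.
    private static readonly string[] BlockedWords = { "darn", "heck" };

    // One alternation anchored on word boundaries, so "heck" matches as a
    // word but not as part of "checkmate".
    private static readonly Regex Pattern = new(
        @"\b(" + string.Join("|", BlockedWords.Select(Regex.Escape)) + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

    public static bool Check(string? text) =>
        !string.IsNullOrEmpty(text) && Pattern.IsMatch(text);
}
```

A word-list check like this is cheap and deterministic, which fits a rule that runs on every output; for nuanced cases the content safety rules listed earlier are the better tool.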