Composable, testable, declarative guardrails. Block prompt injections, redact PII, enforce topics, validate outputs - in two lines of code.
dotnet add package AgentGuard --prerelease
Quick Start
Pick the level that fits your architecture - framework-agnostic standalone, IChatClient decorator, or Microsoft Agent Framework middleware.
1. Standalone pipeline - framework-agnostic, run rules directly
using AgentGuard.Core.Abstractions;
using AgentGuard.Core.Builders;
using AgentGuard.Core.Guardrails;
using AgentGuard.Onnx;
var policy = new GuardrailPolicyBuilder()
.NormalizeInput()
.BlockPromptInjection()
.BlockPromptInjectionWithDefender()
.RedactPII()
.EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
.LimitInputTokens(4000)
.Build();
var pipeline = new GuardrailPipeline(policy, logger);
var ctx = new GuardrailContext
{
Text = userInput,
Phase = GuardrailPhase.Input,
Messages = conversationHistory // optional - enables history-aware rules
};
var result = await pipeline.RunAsync(ctx);
if (result.IsBlocked)
Console.WriteLine(result.BlockingResult!.Reason);
else if (result.WasModified)
Console.WriteLine(result.FinalText);
2. IChatClient decorator - wrap any chat client with one call
using AgentGuard.Core.ChatClient;
using AgentGuard.Onnx;
// Wrap any IChatClient - works with OpenAI, Azure OpenAI, Ollama, or any
// Microsoft.Extensions.AI client. Conversation history is propagated automatically.
var guardedClient = chatClient.UseAgentGuard(g => g
.NormalizeInput()
.BlockPromptInjection()
.BlockPromptInjectionWithDefender()
.RedactPII()
.EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
.LimitInputTokens(4000)
);
// Use exactly like a normal IChatClient
var response = await guardedClient.GetResponseAsync(conversationHistory);
// Streaming works too
await foreach (var update in guardedClient.GetStreamingResponseAsync(conversationHistory))
{
Console.Write(update.Text);
}
3. Microsoft Agent Framework middleware - plug into AIAgentBuilder
using AgentGuard.AgentFramework;
using AgentGuard.Onnx;
var guardedAgent = agent
.AsBuilder()
.UseAgentGuard(g => g
.NormalizeInput()
.BlockPromptInjection()
.BlockPromptInjectionWithDefender()
.RedactPII()
.EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
.LimitInputTokens(4000)
)
.Build();
// Use exactly like a normal agent
var response = await guardedAgent.RunAsync(messages, session, options);
// Streaming works too - with progressive retraction support
await foreach (var update in guardedAgent.RunStreamingAsync(messages, session, options))
{
Console.Write(update.Text);
}
Features
21 built-in rules covering the full input/output safety lifecycle. Regex for speed, ONNX ML for accuracy, LLM for nuance.
Six-tier defense: regex patterns (from Arcanum taxonomy), bundled StackOne Defender ONNX model (F1 ~0.97, ~8ms, no download), optional DeBERTa v3, remote ML classifier (Sentinel-v2 via HTTP), Azure Prompt Shields (jailbreaks + indirect injection), and LLM-as-judge with structured threat classification.
Regex-based redaction for emails, phones, SSNs, credit cards, IPs, and dates of birth. LLM-based detection catches names, addresses, and contextual identifiers.
Keep conversations on-topic with LLM semantic classification. Understands intent and conversation context - not just keywords. Conversation history is automatically included so follow-up messages are evaluated correctly.
Policy enforcement, groundedness checking, and copyright detection. Catch hallucinations, brand violations, and copyrighted content before they reach users.
Decodes evasion tricks - base64, hex escapes, reversed text, Unicode homoglyphs - before any other rule evaluates. Stops encoding-based attacks cold.
Guard tool call arguments against SQL injection, code injection, SSRF, and more. Detect indirect prompt injection in tool results (emails, documents). Filter RAG chunks for injection and secrets before they reach the LLM context.
Built-in spans and metrics for every pipeline run, rule evaluation, re-ask attempt, and streaming retraction. Uses System.Diagnostics — no SDK dependency. Works with Aspire Dashboard, Jaeger, Zipkin, and any OTel collector.
Benchmark
Prompt injection detection benchmarked on 500 adversarial samples from jayavibhav/prompt-injection-safety.
| Classifier | Precision | Recall | F1 |
|---|---|---|---|
| StackOne Defender (bundled) | 96.9% | 97.3% | 97.1% |
| DeBERTa v3 ONNX | 50.7% | 99.1% | 67.1% |
| Local LLM (gpt-oss-20b) | 100% | 40.5% | 57.7% |
| Regex (all levels) | 100% | 3.6% | 7.0% |
| Azure Prompt Shield | 85.9% | 35.6% | 50.3% |
Defender runs in ~8ms with zero configuration - the model ships inside the NuGet package. Combine tiers for defense in depth: regex short-circuits obvious attacks, Defender catches the rest.
All benchmarks on 2020 M1 MacBook Air. Local LLM (gpt-oss-20b) served from 2025 M4 Pro Mac Mini via LM Studio. Azure Prompt Shield on free tier (5 RPS). All classifiers ran with 0 errors across 500 samples.
Built-in Rules
Cheap regex checks run first and short-circuit. Expensive LLM calls only run if needed.
| Order | Rule | Type | Phase |
|---|---|---|---|
| 5 | InputNormalizationRule |
Local | Input |
| 8 | RetrievalGuardrailRule |
Regex | Input |
| 10 | PromptInjectionRule |
Regex | Input |
| 11 | DefenderPromptInjectionRule |
ONNX ML (bundled) | Input |
| 12 | OnnxPromptInjectionRule |
ONNX ML (DeBERTa) | Input |
| 13 | RemotePromptInjectionRule |
Remote ML | Input |
| 14 | AzurePromptShieldRule |
Azure API | Input |
| 15 | LlmPromptInjectionRule |
LLM | Input |
| 20 | PiiRedactionRule |
Regex | Both |
| 22 | SecretsDetectionRule |
Regex | Both |
| 25 | LlmPiiDetectionRule |
LLM | Both |
| 35 | LlmTopicGuardrailRule |
LLM | Input |
| 40 | TokenLimitRule |
Local | Input / Output |
| 45 | ToolCallGuardrailRule |
Regex | Output |
| 47 | ToolResultGuardrailRule |
Regex | Output |
| 50 | ContentSafetyRule |
Pluggable | Both |
| 55 | LlmOutputPolicyRule |
LLM | Output |
| 65 | LlmGroundednessRule |
LLM | Output |
| 75 | LlmCopyrightRule |
LLM | Output |
Architecture
Rules run as a pipeline. Input guardrails protect the LLM. Output guardrails protect the user.
Normalization, RAG chunk filtering, prompt injection detection, PII redaction, secrets detection, topic boundary, token limits - all before the LLM sees the input.
Your AI agent runs with the sanitized input. Works with any framework - MAF, Semantic Kernel, or standalone. Supports both RunAsync and streaming.
Tool call injection detection, indirect injection in tool results, PII/secrets redaction, policy compliance, groundedness checking, copyright detection - all before the response reaches the user. Optional re-ask re-prompts the LLM on violations for self-healing responses.
Packages
Seven packages, layered. The main AgentGuard package is all you need to start. Add framework adapters, cloud integrations, or remote classifiers as needed.
UseAgentGuard() middleware + workflow guardrails via .WithGuardrails().appsettings.json configuration binding for ASP.NET Core and Aspire.LLM-as-Judge
Plug in any IChatClient - Azure OpenAI, Ollama, local models. Built-in prompt templates, fail-open on errors.
using AgentGuard.Onnx;
using AgentGuard.Azure.PromptShield;
// Multi-tier prompt injection: Regex → Defender → Remote ML → Prompt Shield → LLM
var policy = new GuardrailPolicyBuilder()
.NormalizeInput()
.BlockPromptInjection() // regex (order 10)
.BlockPromptInjectionWithDefender() // Defender ML (order 11, bundled)
.BlockPromptInjectionWithRemoteClassifier( // remote ML (order 13)
"http://localhost:8000/classify")
.BlockPromptInjectionWithAzurePromptShield( // Azure Prompt Shield (order 14)
endpoint, apiKey)
.BlockPromptInjectionWithLlm(chatClient) // LLM (order 15)
.DetectPIIWithLlm(chatClient,
new() { Action = PiiAction.Redact })
.EnforceOutputPolicy(chatClient,
"Never recommend competitor products")
.CheckGroundedness(chatClient)
.CheckCopyright(chatClient)
.Build();
Self-healing Experimental
When output guardrails block a response, the pipeline can re-prompt the LLM with the failure reason and re-evaluate. Opt-in, configurable, non-streaming only for now.
var policy = new GuardrailPolicyBuilder()
.EnforceOutputPolicy(chatClient, "Never recommend competitors")
.CheckGroundedness(chatClient)
.EnableReask(chatClient, o =>
{
o.MaxAttempts = 2;
o.IncludeBlockedResponse = true;
})
.Build();
var result = await pipeline.RunAsync(outputContext);
if (result.WasReasked)
Console.WriteLine($"Self-healed after {result.ReaskAttemptsUsed} attempt(s)");
Observability
OpenTelemetry-compatible spans and metrics out of the box. Register with one line — works with Aspire, Jaeger, Zipkin, and any OTel collector.
using AgentGuard.Hosting;
// Register AgentGuard telemetry with OpenTelemetry
builder.Services.AddOpenTelemetry()
.WithTracing(t => t.AddAgentGuardInstrumentation())
.WithMetrics(m => m.AddAgentGuardInstrumentation());
// Spans emitted:
// agentguard.pipeline.run (policy, phase, outcome)
// agentguard.rule.evaluate {name} (rule, phase, order, outcome)
// agentguard.pipeline.reask (attempts, outcome)
// agentguard.middleware.input (agent, outcome)
// agentguard.middleware.output (agent, outcome, tool calls)
//
// Metrics emitted:
// agentguard.pipeline.evaluations (counter)
// agentguard.rule.evaluations (counter)
// agentguard.rule.blocks (counter)
// agentguard.pipeline.duration (histogram, ms)
// agentguard.rule.duration (histogram, ms)
Extensible
Implement IGuardrailRule and plug it in. Full access to conversation context, phase, and metadata.
using AgentGuard.Core.Abstractions;
public class NoProfanityRule : IGuardrailRule
{
public string Name => "no-profanity";
public GuardrailPhase Phase => GuardrailPhase.Output;
public int Order => 100;
public ValueTask<GuardrailResult> EvaluateAsync(
GuardrailContext context,
CancellationToken cancellationToken = default)
{
var hasProfanity = ProfanityDetector.Check(context.Text);
return ValueTask.FromResult(hasProfanity
? GuardrailResult.Blocked("Inappropriate language.")
: GuardrailResult.Passed());
}
}
// Add to any pipeline
var policy = new GuardrailPolicyBuilder()
.BlockPromptInjection()
.AddRule(new NoProfanityRule())
.Build();