Composable, testable, declarative guardrails. Block prompt injections, redact PII, enforce topics, validate outputs - in two lines of code.
```shell
dotnet add package AgentGuard --prerelease
```
## Quick Start
The core engine works standalone. Build a policy, run the pipeline, get a result. Use it with any .NET AI stack - or none at all.
### Standalone pipeline
```csharp
using AgentGuard.Core.Abstractions;
using AgentGuard.Core.Builders;
using AgentGuard.Core.Guardrails;

var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()
    .RedactPII()
    .EnforceTopicBoundary("billing", "returns")
    .LimitInputTokens(4000)
    .Build();

var pipeline = new GuardrailPipeline(policy, logger);

var ctx = new GuardrailContext
{
    Text = userInput,
    Phase = GuardrailPhase.Input
};

var result = await pipeline.RunAsync(ctx);

if (result.IsBlocked)
    Console.WriteLine(result.BlockingResult!.Reason);
else if (result.WasModified)
    Console.WriteLine(result.FinalText);
```
### With Microsoft Agent Framework
```csharp
using AgentGuard.AgentFramework;

// Two lines to add guardrails
var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g
        .NormalizeInput()
        .BlockPromptInjection()
        .RedactPII()
        .EnforceTopicBoundary("billing")
        .LimitInputTokens(4000))
    .Build();

// Use exactly like a normal agent
var response = await guardedAgent
    .RunAsync(messages, session, options);

// Streaming works too
await foreach (var update in guardedAgent
    .RunStreamingAsync(messages, session, options))
{
    Console.Write(update.Text);
}
```
## Features
20 built-in rules covering the full input/output safety lifecycle. Regex for speed, ONNX ML for accuracy, LLM for nuance.
- **Prompt injection detection** - five tiers of defense: regex patterns (from the Arcanum taxonomy), the bundled StackOne Defender ONNX model (F1 ~0.97, ~8ms, no download), optional DeBERTa v3, a remote ML classifier (Sentinel-v2 via HTTP), and LLM-as-judge with structured threat classification.
- **PII redaction** - regex-based redaction for emails, phones, SSNs, credit cards, IPs, and dates of birth. LLM-based detection catches names, addresses, and contextual identifiers.
- **Topic boundaries** - keep conversations on-topic with keyword matching, embedding-based similarity, or LLM semantic classification. Works on both input and output.
- **Output validation** - policy enforcement, groundedness checking, and copyright detection. Catch hallucinations, brand violations, and copyrighted content before they reach users.
- **Input normalization** - decodes evasion tricks (base64, hex escapes, reversed text, Unicode homoglyphs) before any other rule evaluates. Stops encoding-based attacks cold.
- **Tool and RAG protection** - guard tool call arguments against SQL injection, code injection, SSRF, and more. Detect indirect prompt injection in tool results (emails, documents). Filter RAG chunks for injection and secrets before they reach the LLM context.
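To make the normalization tier concrete, here is a rough sketch in plain .NET of the kind of decoding it performs before other rules run. This is an illustration, not AgentGuard's actual implementation: the tiny homoglyph map and the base64 probe are assumptions for demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class NormalizationSketch
{
    // Tiny homoglyph map for illustration; a real normalizer covers far more.
    static readonly Dictionary<char, char> Homoglyphs = new()
    {
        ['\u0430'] = 'a', // Cyrillic а
        ['\u0435'] = 'e', // Cyrillic е
        ['\u043E'] = 'o', // Cyrillic о
    };

    public static string Normalize(string input)
    {
        // 1. Fold homoglyphs to ASCII so "ignоre" matches the same regex as "ignore".
        var sb = new StringBuilder(input.Length);
        foreach (var ch in input)
            sb.Append(Homoglyphs.TryGetValue(ch, out var mapped) ? mapped : ch);
        var text = sb.ToString();

        // 2. If the whole payload is valid base64, decode it so later
        //    rules inspect the hidden plaintext instead of the encoding.
        if (TryDecodeBase64(text, out var decoded))
            text = decoded;

        return text;
    }

    static bool TryDecodeBase64(string s, out string decoded)
    {
        decoded = "";
        var buffer = new byte[s.Length];
        if (!Convert.TryFromBase64String(s, buffer, out var written))
            return false;
        decoded = Encoding.UTF8.GetString(buffer, 0, written);
        return true;
    }
}
```

With this sketch, `NormalizationSketch.Normalize("ign\u043Ere")` folds the Cyrillic `о` and yields `"ignore"`, which a downstream regex rule can now match.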
## Benchmark
Prompt injection detection benchmarked on 500 adversarial samples from `jayavibhav/prompt-injection-safety`.
| Classifier | Precision | Recall | F1 |
|---|---|---|---|
| StackOne Defender (bundled) | 96.9% | 97.3% | 97.1% |
| DeBERTa v3 ONNX | 50.7% | 99.1% | 67.1% |
| LLM (gpt-oss-20b) | 100% | 40.9% | 58.1% |
| Regex (all levels) | 100% | 3.6% | 7.0% |
Defender runs in ~8ms with zero configuration - the model ships inside the NuGet package. Combine tiers for defense in depth: regex short-circuits obvious attacks, Defender catches the rest.
## Built-in Rules
Cheap regex checks run first and short-circuit. Expensive LLM calls only run if needed.
| Order | Rule | Type | Phase |
|---|---|---|---|
| 5 | InputNormalizationRule | Local | Input |
| 8 | RetrievalGuardrailRule | Regex | Input |
| 10 | PromptInjectionRule | Regex | Input |
| 11 | DefenderPromptInjectionRule | ONNX ML (bundled) | Input |
| 12 | OnnxPromptInjectionRule | ONNX ML (DeBERTa) | Input |
| 13 | RemotePromptInjectionRule | Remote ML | Input |
| 15 | LlmPromptInjectionRule | LLM | Input |
| 20 | PiiRedactionRule | Regex | Both |
| 22 | SecretsDetectionRule | Regex | Both |
| 25 | LlmPiiDetectionRule | LLM | Both |
| 30 | TopicBoundaryRule | Keywords | Input |
| 35 | LlmTopicGuardrailRule | LLM | Input |
| 40 | TokenLimitRule | Local | Input / Output |
| 45 | ToolCallGuardrailRule | Regex | Output |
| 47 | ToolResultGuardrailRule | Regex | Output |
| 50 | ContentSafetyRule | Pluggable | Both |
| 55 | LlmOutputPolicyRule | LLM | Output |
| 60 | OutputTopicBoundaryRule | Embedding | Output |
| 65 | LlmGroundednessRule | LLM | Output |
| 75 | LlmCopyrightRule | LLM | Output |
## Architecture
Rules run as a pipeline. Input guardrails protect the LLM. Output guardrails protect the user.

1. **Input.** Normalization, RAG chunk filtering, prompt injection detection, PII redaction, secrets detection, topic boundaries, token limits - all before the LLM sees the input.
2. **Agent.** Your AI agent runs with the sanitized input. Works with any framework - MAF, Semantic Kernel, or standalone. Supports both `RunAsync` and streaming.
3. **Output.** Tool call injection detection, indirect injection in tool results, PII/secrets redaction, policy compliance, groundedness checking, copyright detection - all before the response reaches the user. Optional re-ask re-prompts the LLM on violations for self-healing responses.
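Using the standalone API shown in the Quick Start, the output stage runs through the same pipeline with the phase flipped. A sketch - `modelResponse` is an assumed variable holding the raw LLM output, and only members shown elsewhere in this README are used:

```csharp
using AgentGuard.Core.Abstractions;

// Evaluate the model's response before it reaches the user.
var outputCtx = new GuardrailContext
{
    Text = modelResponse,
    Phase = GuardrailPhase.Output
};

var outputResult = await pipeline.RunAsync(outputCtx);

if (outputResult.IsBlocked)
    Console.WriteLine(outputResult.BlockingResult!.Reason); // blocked: show why
else
    Console.WriteLine(outputResult.FinalText);              // possibly redacted text
```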
## Packages
Seven packages, layered. The main AgentGuard package is all you need to start. Add framework adapters, cloud integrations, or remote classifiers as needed.
- **Framework adapter** - `UseAgentGuard()` middleware, plus workflow guardrails via `.WithGuardrails()`.
- **Hosting** - `appsettings.json` configuration binding for ASP.NET Core and Aspire.
- **LLM-as-judge** - plug in any `IChatClient` (Azure OpenAI, Ollama, local models). Built-in prompt templates, fail-open on errors.
```csharp
using AgentGuard.Onnx;

// Multi-tier prompt injection: Regex → Defender → Remote ML → LLM
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()                    // regex (order 10)
    .BlockPromptInjectionWithOnnx()            // Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier( // remote ML (order 13)
        "http://localhost:8000/classify")
    .BlockPromptInjectionWithLlm(chatClient)   // LLM (order 15)
    .DetectPIIWithLlm(chatClient,
        new() { Action = PiiAction.Redact })
    .EnforceOutputPolicy(chatClient,
        "Never recommend competitor products")
    .CheckGroundedness(chatClient)
    .CheckCopyright(chatClient)
    .Build();
```
## Self-healing (experimental)
When output guardrails block a response, the pipeline can re-prompt the LLM with the failure reason and re-evaluate. Opt-in, configurable, non-streaming only for now.
```csharp
var policy = new GuardrailPolicyBuilder()
    .EnforceOutputPolicy(chatClient, "Never recommend competitors")
    .CheckGroundedness(chatClient)
    .EnableReask(chatClient, o =>
    {
        o.MaxAttempts = 2;
        o.IncludeBlockedResponse = true;
    })
    .Build();

var pipeline = new GuardrailPipeline(policy, logger);
var result = await pipeline.RunAsync(outputContext);

if (result.WasReasked)
    Console.WriteLine($"Self-healed after {result.ReaskAttemptsUsed} attempt(s)");
```
## Extensible
Implement IGuardrailRule and plug it in. Full access to conversation context, phase, and metadata.
```csharp
using AgentGuard.Core.Abstractions;

public class NoProfanityRule : IGuardrailRule
{
    public string Name => "no-profanity";
    public GuardrailPhase Phase => GuardrailPhase.Output;
    public int Order => 100;

    public ValueTask<GuardrailResult> EvaluateAsync(
        GuardrailContext context,
        CancellationToken cancellationToken = default)
    {
        var hasProfanity = ProfanityDetector.Check(context.Text);
        return ValueTask.FromResult(hasProfanity
            ? GuardrailResult.Blocked("Inappropriate language.")
            : GuardrailResult.Passed());
    }
}

// Add to any pipeline
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()
    .AddRule(new NoProfanityRule())
    .Build();
```