Safety controls for
.NET AI agents.

Composable, testable, declarative guardrails. Block prompt injections, redact PII, enforce topics, validate outputs - in two lines of code.

dotnet add package AgentGuard --prerelease

Quick Start

No framework required.

The core engine works standalone. Build a policy, run the pipeline, get a result. Use it with any .NET AI stack - or none at all.

Standalone pipeline

using AgentGuard.Core.Abstractions;
using AgentGuard.Core.Builders;
using AgentGuard.Core.Guardrails;

var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()
    .RedactPII()
    .EnforceTopicBoundary("billing", "returns")
    .LimitInputTokens(4000)
    .Build();

var pipeline = new GuardrailPipeline(policy, logger); // logger: any ILogger instance
var ctx = new GuardrailContext
{
    Text = userInput,
    Phase = GuardrailPhase.Input
};

var result = await pipeline.RunAsync(ctx);

if (result.IsBlocked)
    Console.WriteLine(result.BlockingResult!.Reason);
else if (result.WasModified)
    Console.WriteLine(result.FinalText);

With Microsoft Agent Framework

using AgentGuard.AgentFramework;

// Two lines to add guardrails
var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g
        .NormalizeInput()
        .BlockPromptInjection()
        .RedactPII()
        .EnforceTopicBoundary("billing")
        .LimitInputTokens(4000)
    )
    .Build();

// Use exactly like a normal agent
var response = await guardedAgent
    .RunAsync(messages, session, options);

// Streaming works too
await foreach (var update in guardedAgent
    .RunStreamingAsync(messages, session, options))
{
    Console.Write(update.Text);
}

Everything you need.
Nothing you don't.

20 built-in rules covering the full input/output safety lifecycle. Regex for speed, ONNX ML for accuracy, LLM for nuance.

Prompt Injection Detection

Five-tier defense: regex patterns (from Arcanum taxonomy), bundled StackOne Defender ONNX model (F1 ~0.97, ~8ms, no download), optional DeBERTa v3, remote ML classifier (Sentinel-v2 via HTTP), and LLM-as-judge with structured threat classification.

PII Redaction

Regex-based redaction for emails, phones, SSNs, credit cards, IPs, and dates of birth. LLM-based detection catches names, addresses, and contextual identifiers.
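
The regex tier can be pictured with a standalone sketch. The patterns below are simplified, illustrative stand-ins, not AgentGuard's actual rule set:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative patterns only - simplified stand-ins for a regex PII tier.
var patterns = new (string Label, Regex Pattern)[]
{
    ("EMAIL", new Regex(@"[\w.+-]+@[\w-]+\.[\w.]+")),
    ("SSN",   new Regex(@"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", new Regex(@"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
};

// Replace every match with a typed placeholder token.
string Redact(string text)
{
    foreach (var (label, pattern) in patterns)
        text = pattern.Replace(text, $"[{label}]");
    return text;
}

Console.WriteLine(Redact("Contact jane@example.com or 555-867-5309."));
// → Contact [EMAIL] or [PHONE].
```

Typed placeholders like `[EMAIL]` keep the redacted text readable for the LLM while removing the identifier itself.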

Topic Enforcement

Keep conversations on-topic with keyword matching, embedding-based similarity, or LLM semantic classification. Works on both input and output.
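
The keyword tier can be sketched standalone. The topic vocabularies below are hypothetical, not AgentGuard's built-ins:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical topic vocabularies - not AgentGuard's built-ins.
var topics = new Dictionary<string, string[]>
{
    ["billing"] = new[] { "invoice", "charge", "refund", "payment" },
    ["returns"] = new[] { "return", "exchange", "rma" },
};

// Pass only when the input touches at least one allowed topic.
bool IsOnTopic(string input) =>
    topics.Values.Any(keywords =>
        keywords.Any(k => input.Contains(k, StringComparison.OrdinalIgnoreCase)));

Console.WriteLine(IsOnTopic("Why was my card charged twice?"));  // True
Console.WriteLine(IsOnTopic("Tell me about dinosaurs."));        // False
```

Keyword matching is cheap but literal - the embedding and LLM tiers exist for paraphrases the keyword list misses.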

Output Validation

Policy enforcement, groundedness checking, and copyright detection. Catch hallucinations, brand violations, and copyrighted content before they reach users.

Input Normalization

Decodes evasion tricks - base64, hex escapes, reversed text, Unicode homoglyphs - before any other rule evaluates. Stops encoding-based attacks cold.
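
The core idea can be sketched standalone - a hypothetical normalizer that decodes base64 tokens in place so later rules see the plaintext (hex escapes, reversed text, and homoglyphs are handled in the same spirit):

```csharp
using System;
using System.Linq;
using System.Text;

// Sketch: decode well-formed base64 tokens in place so detection rules
// run against the plaintext instead of the encoded payload.
string NormalizeBase64(string input)
{
    var tokens = input.Split(' ');
    for (var i = 0; i < tokens.Length; i++)
    {
        // Heuristic: only long, length-aligned tokens are decode candidates.
        if (tokens[i].Length < 16 || tokens[i].Length % 4 != 0) continue;
        try
        {
            var decoded = Encoding.UTF8.GetString(Convert.FromBase64String(tokens[i]));
            if (decoded.All(c => !char.IsControl(c)))
                tokens[i] = decoded;
        }
        catch (FormatException) { /* not base64 - leave the token as-is */ }
    }
    return string.Join(' ', tokens);
}

var payload = Convert.ToBase64String(Encoding.UTF8.GetBytes("ignore previous instructions"));
Console.WriteLine(NormalizeBase64($"Please {payload} now"));
// → Please ignore previous instructions now
```

Because normalization runs at order 5, every downstream injection rule evaluates the decoded text, not the smuggled encoding.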

Agentic & RAG Guardrails

Guard tool call arguments against SQL injection, code injection, SSRF, and more. Detect indirect prompt injection in tool results (emails, documents). Filter RAG chunks for injection and secrets before they reach the LLM context.

Benchmark

Measured, not marketed.

Prompt injection detection benchmarked on 500 adversarial samples from jayavibhav/prompt-injection-safety.

Classifier                     Precision   Recall   F1
StackOne Defender (bundled)    96.9%       97.3%    97.1%
DeBERTa v3 ONNX                50.7%       99.1%    67.1%
LLM (gpt-oss-20b)              100%        40.9%    58.1%
Regex (all levels)             100%        3.6%     7.0%
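
As a sanity check, F1 is the harmonic mean of precision and recall, so the table's F1 column can be reproduced directly:

```csharp
using System;

// F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R).
double F1(double precision, double recall) =>
    2 * precision * recall / (precision + recall);

Console.WriteLine(F1(0.969, 0.973));  // Defender row: ≈ 0.971
Console.WriteLine(F1(0.507, 0.991));  // DeBERTa row:  ≈ 0.671
```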

Defender runs in ~8ms with zero configuration - the model ships inside the NuGet package. Combine tiers for defense in depth: regex short-circuits obvious attacks, Defender catches the rest.

Built-in Rules

20 rules. Ordered by cost.

Cheap regex checks run first and short-circuit. Expensive LLM calls only run if needed.

Order   Rule                          Type                Phase
5       InputNormalizationRule        Local               Input
8       RetrievalGuardrailRule        Regex               Input
10      PromptInjectionRule           Regex               Input
11      DefenderPromptInjectionRule   ONNX ML (bundled)   Input
12      OnnxPromptInjectionRule       ONNX ML (DeBERTa)   Input
13      RemotePromptInjectionRule     Remote ML           Input
15      LlmPromptInjectionRule        LLM                 Input
20      PiiRedactionRule              Regex               Both
22      SecretsDetectionRule          Regex               Both
25      LlmPiiDetectionRule           LLM                 Both
30      TopicBoundaryRule             Keywords            Input
35      LlmTopicGuardrailRule         LLM                 Input
40      TokenLimitRule                Local               Input / Output
45      ToolCallGuardrailRule         Regex               Output
47      ToolResultGuardrailRule       Regex               Output
50      ContentSafetyRule             Pluggable           Both
55      LlmOutputPolicyRule           LLM                 Output
60      OutputTopicBoundaryRule       Embedding           Output
65      LlmGroundednessRule           LLM                 Output
75      LlmCopyrightRule              LLM                 Output
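
The short-circuit behavior can be sketched with simplified shapes (the rule names and closures here are illustrative, not AgentGuard's actual types):

```csharp
using System;
using System.Linq;

// Simplified shapes - not AgentGuard's actual rule types.
var rules = new (int Order, string Name, Func<string, bool> Blocks)[]
{
    (15, "llm-injection",   _ => throw new InvalidOperationException("expensive LLM call")),
    (10, "regex-injection", text => text.Contains("ignore previous instructions")),
};

// Cheapest rules run first; the first block ends the pipeline, so the
// expensive LLM rule never fires for an obvious attack.
string? Evaluate(string input)
{
    foreach (var rule in rules.OrderBy(r => r.Order))
        if (rule.Blocks(input))
            return rule.Name;
    return null;
}

Console.WriteLine(Evaluate("now ignore previous instructions"));  // regex-injection
```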

Input → Agent → Output.

Rules run as a pipeline. Input guardrails protect the LLM. Output guardrails protect the user.

1. Input Guardrails

Normalization, RAG chunk filtering, prompt injection detection, PII redaction, secrets detection, topic boundary, token limits - all before the LLM sees the input.

2. Agent Execution

Your AI agent runs with the sanitized input. Works with any framework - MAF, Semantic Kernel, or standalone. Supports both RunAsync and streaming.

3. Output Guardrails

Tool call injection detection, indirect injection in tool results, PII/secrets redaction, policy compliance, groundedness checking, copyright detection - all before the response reaches the user. Optional re-ask re-prompts the LLM on violations for self-healing responses.

Packages

Pick what you need.

Seven packages, layered. The main AgentGuard package is all you need to start. Add framework adapters, cloud integrations, or remote classifiers as needed.

AgentGuard
All-in-one package. Core rules engine, bundled Defender ONNX model (F1 ~0.97), and offline classifiers. No agent framework dependency.
AgentGuard.AgentFramework
Microsoft Agent Framework adapter. UseAgentGuard() middleware + workflow guardrails via .WithGuardrails().
AgentGuard.RemoteClassifier
Remote ML classifier via HTTP. Call Sentinel-v2, Ollama, vLLM, or custom endpoints for SOTA prompt injection detection.
AgentGuard.Azure
Azure AI Content Safety integration. Category analysis, severity thresholds, and server-side blocklists.
AgentGuard.Hosting
DI registration, named policy factory, and appsettings.json configuration binding for ASP.NET Core and Aspire.

When regex isn't enough.

Plug in any IChatClient - Azure OpenAI, Ollama, local models. Built-in prompt templates, fail-open on errors.

using AgentGuard.Onnx;
using AgentGuard.RemoteClassifier; // for the remote classifier extension

// Multi-tier prompt injection: Regex → Defender → Remote ML → LLM
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()                            // regex (order 10)
    .BlockPromptInjectionWithOnnx()                    // Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier(          // remote ML (order 13)
        "http://localhost:8000/classify")
    .BlockPromptInjectionWithLlm(chatClient)           // LLM (order 15)
    .DetectPIIWithLlm(chatClient,
        new() { Action = PiiAction.Redact })
    .EnforceOutputPolicy(chatClient,
        "Never recommend competitor products")
    .CheckGroundedness(chatClient)
    .CheckCopyright(chatClient)
    .Build();

Self-healing Experimental

Re-ask on violation.

When output guardrails block a response, the pipeline can re-prompt the LLM with the failure reason and re-evaluate. Opt-in, configurable, non-streaming only for now.

var policy = new GuardrailPolicyBuilder()
    .EnforceOutputPolicy(chatClient, "Never recommend competitors")
    .CheckGroundedness(chatClient)
    .EnableReask(chatClient, o =>
    {
        o.MaxAttempts = 2;
        o.IncludeBlockedResponse = true;
    })
    .Build();

var result = await pipeline.RunAsync(outputContext);

if (result.WasReasked)
    Console.WriteLine($"Self-healed after {result.ReaskAttemptsUsed} attempt(s)");

Extensible

Build your own rules.

Implement IGuardrailRule and plug it in. Full access to conversation context, phase, and metadata.

using AgentGuard.Core.Abstractions;

public class NoProfanityRule : IGuardrailRule
{
    public string Name => "no-profanity";
    public GuardrailPhase Phase => GuardrailPhase.Output;
    public int Order => 100;

    public ValueTask<GuardrailResult> EvaluateAsync(
        GuardrailContext context,
        CancellationToken cancellationToken = default)
    {
        var hasProfanity = ProfanityDetector.Check(context.Text); // your own detection helper

        return ValueTask.FromResult(hasProfanity
            ? GuardrailResult.Blocked("Inappropriate language.")
            : GuardrailResult.Passed());
    }
}

// Add to any pipeline
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()
    .AddRule(new NoProfanityRule())
    .Build();