AgentGuard - Guardrails for .NET AI Agents

Quick Start

Three integration tiers.

Pick the level that fits your architecture - framework-agnostic standalone, IChatClient decorator, or Microsoft Agent Framework middleware.

1. Standalone pipeline - framework-agnostic, run rules directly

using AgentGuard.Core.Abstractions;
using AgentGuard.Core.Builders;
using AgentGuard.Core.Guardrails;
using AgentGuard.Onnx;

var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPII()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
    .Build();

var pipeline = new GuardrailPipeline(policy, logger);

var ctx = new GuardrailContext
{
    Text = userInput,
    Phase = GuardrailPhase.Input,
    Messages = conversationHistory   // optional - enables history-aware rules
};

var result = await pipeline.RunAsync(ctx);

if (result.IsBlocked)
    Console.WriteLine(result.BlockingResult!.Reason);
else if (result.WasModified)
    Console.WriteLine(result.FinalText);

2. IChatClient decorator - wrap any chat client with one call

using AgentGuard.Core.ChatClient;
using AgentGuard.Onnx;

// Wrap any IChatClient - works with OpenAI, Azure OpenAI, Ollama, or any
// Microsoft.Extensions.AI client. Conversation history is propagated automatically.
var guardedClient = chatClient.UseAgentGuard(g => g
    .NormalizeInput()
    .BlockPromptInjection()
    .BlockPromptInjectionWithDefender()
    .RedactPII()
    .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
    .LimitInputTokens(4000)
);

// Use exactly like a normal IChatClient
var response = await guardedClient.GetResponseAsync(conversationHistory);

// Streaming works too
await foreach (var update in guardedClient.GetStreamingResponseAsync(conversationHistory))
{
    Console.Write(update.Text);
}

3. Microsoft Agent Framework middleware - plug into AIAgentBuilder

using AgentGuard.AgentFramework;
using AgentGuard.Onnx;

var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g
        .NormalizeInput()
        .BlockPromptInjection()
        .BlockPromptInjectionWithDefender()
        .RedactPII()
        .EnforceTopicBoundaryWithLlm(chatClient, "billing", "returns")
        .LimitInputTokens(4000)
    )
    .Build();

// Use exactly like a normal agent
var response = await guardedAgent.RunAsync(messages, session, options);

// Streaming works too - with progressive retraction support
await foreach (var update in guardedAgent.RunStreamingAsync(messages, session, options))
{
    Console.Write(update.Text);
}

Features

Everything you need.
Nothing you don't.

21 built-in rules covering the full input/output safety lifecycle. Regex for speed, ONNX ML for accuracy, LLM for nuance.

Prompt Injection Detection

Six-tier defense: regex patterns (from Arcanum taxonomy), bundled StackOne Defender ONNX model (F1 ~0.97, ~8ms, no download), optional DeBERTa v3, remote ML classifier (Sentinel-v2 via HTTP), Azure Prompt Shields (jailbreaks + indirect injection), and LLM-as-judge with structured threat classification.

PII Redaction

Regex-based redaction for emails, phones, SSNs, credit cards, IPs, and dates of birth. LLM-based detection catches names, addresses, and contextual identifiers.

Topic Enforcement

Keep conversations on-topic with LLM semantic classification. Understands intent and conversation context - not just keywords. Conversation history is automatically included so follow-up messages are evaluated correctly.

Output Validation

Policy enforcement, groundedness checking, and copyright detection. Catch hallucinations, brand violations, and copyrighted content before they reach users.

Input Normalization

Decodes evasion tricks - base64, hex escapes, reversed text, Unicode homoglyphs - before any other rule evaluates. Stops encoding-based attacks cold.

Agentic & RAG Guardrails

Guard tool call arguments against SQL injection, code injection, SSRF, and more. Detect indirect prompt injection in tool results (emails, documents). Filter RAG chunks for injection and secrets before they reach the LLM context.

Observability (OpenTelemetry)

Built-in spans and metrics for every pipeline run, rule evaluation, re-ask attempt, and streaming retraction. Uses System.Diagnostics — no SDK dependency. Works with Aspire Dashboard, Jaeger, Zipkin, and any OTel collector.

Benchmark

Measured, not marketed.

Prompt injection detection benchmarked on 500 adversarial samples from jayavibhav/prompt-injection-safety.

Classifier	Precision	Recall	F1
StackOne Defender (bundled)	96.9%	97.3%	97.1%
DeBERTa v3 ONNX	50.7%	99.1%	67.1%
Local LLM (gpt-oss-20b)	100%	40.5%	57.7%
Regex (all levels)	100%	3.6%	7.0%
Azure Prompt Shield	85.9%	35.6%	50.3%

Defender runs in ~8ms with zero configuration - the model ships inside the NuGet package. Combine tiers for defense in depth: regex short-circuits obvious attacks, Defender catches the rest.

All benchmarks on 2020 M1 MacBook Air. Local LLM (gpt-oss-20b) served from 2025 M4 Pro Mac Mini via LM Studio. Azure Prompt Shield on free tier (5 RPS). All classifiers ran with 0 errors across 500 samples.

Built-in Rules

21 rules. Ordered by cost.

Cheap regex checks run first and short-circuit. Expensive LLM calls only run if needed.

Order	Rule	Type	Phase
5	`InputNormalizationRule`	Local	Input
8	`RetrievalGuardrailRule`	Regex	Input
10	`PromptInjectionRule`	Regex	Input
11	`DefenderPromptInjectionRule`	ONNX ML (bundled)	Input
12	`OnnxPromptInjectionRule`	ONNX ML (DeBERTa)	Input
13	`RemotePromptInjectionRule`	Remote ML	Input
14	`AzurePromptShieldRule`	Azure API	Input
15	`LlmPromptInjectionRule`	LLM	Input
20	`PiiRedactionRule`	Regex	Both
22	`SecretsDetectionRule`	Regex	Both
25	`LlmPiiDetectionRule`	LLM	Both
35	`LlmTopicGuardrailRule`	LLM	Input
40	`TokenLimitRule`	Local	Input / Output
45	`ToolCallGuardrailRule`	Regex	Output
47	`ToolResultGuardrailRule`	Regex	Output
50	`ContentSafetyRule`	Pluggable	Both
55	`LlmOutputPolicyRule`	LLM	Output
65	`LlmGroundednessRule`	LLM	Output
75	`LlmCopyrightRule`	LLM	Output

Architecture

Input → Agent → Output.

Rules run as a pipeline. Input guardrails protect the LLM. Output guardrails protect the user.

1. Input Guardrails

Normalization, RAG chunk filtering, prompt injection detection, PII redaction, secrets detection, topic boundary, token limits - all before the LLM sees the input.

2. Agent Execution

Your AI agent runs with the sanitized input. Works with any framework - MAF, Semantic Kernel, or standalone. Supports both RunAsync and streaming.

3. Output Guardrails

Tool call injection detection, indirect injection in tool results, PII/secrets redaction, policy compliance, groundedness checking, copyright detection - all before the response reaches the user. Optional re-ask re-prompts the LLM on violations for self-healing responses.

Packages

Pick what you need.

Seven packages, layered. The main AgentGuard package is all you need to start. Add framework adapters, cloud integrations, or remote classifiers as needed.

AgentGuard

All-in-one package. Core rules engine, bundled Defender ONNX model (F1 ~0.97), and offline classifiers. No agent framework dependency.

AgentGuard.AgentFramework

Microsoft Agent Framework adapter. UseAgentGuard() middleware + workflow guardrails via .WithGuardrails().

AgentGuard.RemoteClassifier

Remote ML classifier via HTTP. Call Sentinel-v2, Ollama, vLLM, or custom endpoints for SOTA prompt injection detection.

AgentGuard.Azure

Azure AI Content Safety integration. Prompt Shields (injection detection), protected material detection (text & code with license citations), category analysis, severity thresholds, and server-side blocklists.

AgentGuard.Hosting

DI registration, named policy factory, and appsettings.json configuration binding for ASP.NET Core and Aspire.

LLM-as-Judge

When regex isn't enough.

Plug in any IChatClient - Azure OpenAI, Ollama, local models. Built-in prompt templates, fail-open on errors.

using AgentGuard.Onnx;
using AgentGuard.Azure.PromptShield;

// Multi-tier prompt injection: Regex → Defender → Remote ML → Prompt Shield → LLM
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()
    .BlockPromptInjection()                            // regex (order 10)
    .BlockPromptInjectionWithDefender()                    // Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier(          // remote ML (order 13)
        "http://localhost:8000/classify")
    .BlockPromptInjectionWithAzurePromptShield(         // Azure Prompt Shield (order 14)
        endpoint, apiKey)
    .BlockPromptInjectionWithLlm(chatClient)           // LLM (order 15)
    .DetectPIIWithLlm(chatClient,
        new() { Action = PiiAction.Redact })
    .EnforceOutputPolicy(chatClient,
        "Never recommend competitor products")
    .CheckGroundedness(chatClient)
    .CheckCopyright(chatClient)
    .Build();

Self-healing Experimental

Re-ask on violation.

When output guardrails block a response, the pipeline can re-prompt the LLM with the failure reason and re-evaluate. Opt-in, configurable, non-streaming only for now.

var policy = new GuardrailPolicyBuilder()
    .EnforceOutputPolicy(chatClient, "Never recommend competitors")
    .CheckGroundedness(chatClient)
    .EnableReask(chatClient, o =>
    {
        o.MaxAttempts = 2;
        o.IncludeBlockedResponse = true;
    })
    .Build();

var result = await pipeline.RunAsync(outputContext);

if (result.WasReasked)
    Console.WriteLine($"Self-healed after {result.ReaskAttemptsUsed} attempt(s)");

Observability

See every rule fire.

OpenTelemetry-compatible spans and metrics out of the box. Register with one line — works with Aspire, Jaeger, Zipkin, and any OTel collector.

using AgentGuard.Hosting;

// Register AgentGuard telemetry with OpenTelemetry
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t.AddAgentGuardInstrumentation())
    .WithMetrics(m => m.AddAgentGuardInstrumentation());

// Spans emitted:
//   agentguard.pipeline.run          (policy, phase, outcome)
//   agentguard.rule.evaluate {name}  (rule, phase, order, outcome)
//   agentguard.pipeline.reask        (attempts, outcome)
//   agentguard.middleware.input      (agent, outcome)
//   agentguard.middleware.output     (agent, outcome, tool calls)
//
// Metrics emitted:
//   agentguard.pipeline.evaluations  (counter)
//   agentguard.rule.evaluations      (counter)
//   agentguard.rule.blocks           (counter)
//   agentguard.pipeline.duration     (histogram, ms)
//   agentguard.rule.duration         (histogram, ms)

Extensible

Build your own rules.

Implement IGuardrailRule and plug it in. Full access to conversation context, phase, and metadata.

using AgentGuard.Core.Abstractions;

public class NoProfanityRule : IGuardrailRule
{
    public string Name => "no-profanity";
    public GuardrailPhase Phase => GuardrailPhase.Output;
    public int Order => 100;

    public ValueTask<GuardrailResult> EvaluateAsync(
        GuardrailContext context,
        CancellationToken cancellationToken = default)
    {
        var hasProfanity = ProfanityDetector.Check(context.Text);

        return ValueTask.FromResult(hasProfanity
            ? GuardrailResult.Blocked("Inappropriate language.")
            : GuardrailResult.Passed());
    }
}

// Add to any pipeline
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()
    .AddRule(new NoProfanityRule())
    .Build();

Safety controls for.NET AI agents.