Research Foundations
PRECINCT's threat model, defense architecture, and governance design are informed by peer-reviewed security research, not invented in isolation. Four published papers shape how PRECINCT reasons about threats, structures defenses, and governs recursive agent execution.
Agents of Chaos
Red-teaming exercise producing 16 threat case studies. Validates PRECINCT's defense-in-depth coverage and the boundary-vs-cognition distinction.
Threat Model
Securing MCP
Five-layer defense framework for MCP security. PRECINCT implements all five layers across its identity, policy, and gateway subsystems.
Defense Framework
Prompt Injection
Demonstrates 85%+ attack success on agentic coding tools. Validates PRECINCT's tiered deep scan and step-up gating at the infrastructure level.
Attack Validation
Recursive Language Models
Introduces LLM recursive self-calling over unbounded contexts. PRECINCT's RLM Governance Engine enforces depth, subcall, and budget limits on these chains.
Governance Model
Agents of Chaos
Shapira, N. et al. (2026). Agents of Chaos. arXiv:2602.20021v1.
Agents of Chaos reports on a structured red-teaming exercise in which 20 researchers spent 2 weeks probing OpenClaw-based autonomous agents in a live lab environment. The agents had access to Discord, email, shell commands, and persistent memory, a realistic surface area for enterprise agentic deployments. The exercise produced 16 case studies, of which 11 resulted in successful attacks.
The paper demonstrates that autonomous agents are vulnerable not only to traditional prompt injection but also to social-engineering vectors, identity spoofing, cross-channel manipulation, and resource exhaustion attacks that exploit the gap between agent cognition and infrastructure enforcement.
20 Researchers
Adversarial team probing agent defenses across multiple attack surfaces simultaneously.
Red Team Scale
16 Case Studies
Covering identity spoofing, social engineering, data exfiltration, resource exhaustion, and more.
Attack Surface
11 Successful Attacks
69% success rate against unprotected agents, demonstrating the need for infrastructure-level defense.
Impact
Threat Taxonomy: All 16 Case Studies
Each case study is mapped to the PRECINCT defense layers that address it. Coverage indicates how the threat is handled: Full Coverage means infrastructure enforcement completely mitigates the vector without agent cooperation; Defense-in-Depth means multiple independent layers combine to address the vector; Infrastructure-Assisted means infrastructure provides detection, flagging, and containment while agent reasoning integrity contributes to complete defense.
| # | Case Study | PRECINCT Defense | Coverage |
|---|---|---|---|
| 1 | Agent Forgetting / Gradual Escalation | Escalation score tracking, Irreversibility gating, Rate limiting (step 11) | Infrastructure-Assisted |
| 2 | Non-Owner Compliance | OPA policy (step 6), Principal hierarchy, Irreversibility gating | Defense-in-Depth |
| 3 | Sensitive Data Disclosure via Email | DLP scanning (step 7), Email adapter mediation | Defense-in-Depth |
| 4 | Agent-to-Agent Discord Loop | Rate limiting (step 11), Discord adapter mediation | Defense-in-Depth |
| 5 | DoS via Storage Exhaustion | Request size limit (step 1), Rate limiting (step 11) | Full Coverage |
| 6 | Provider Bias Exploitation | Audit log (step 4), Model egress governance | Infrastructure-Assisted |
| 7 | Gaslighting / Social Pressure | Concession accumulator, Escalation detection | Infrastructure-Assisted |
| 8 | Identity Spoofing via Display Name | SPIFFE/SPIRE (step 3), Deep scan (step 10) | Full Coverage |
| 9 | Cross-Channel Impersonation Reset | Principal hierarchy metadata, OPA (step 6) | Defense-in-Depth |
| 10 | Agent Corruption via External Config | Data source integrity registry, Tool registry hash (step 5) | Defense-in-Depth |
| 11 | Libelous Mass Broadcast | Mass-send step-up gating, Principal hierarchy | Defense-in-Depth |
| 12 | Prompt Injection via Channels | Deep scan (step 10), DLP scanning (step 7) | Full Coverage |
| 13 | Credential Exfiltration | DLP scanning (step 7), SPIKE token substitution (step 13) | Full Coverage |
| 14 | Session Fixation | SPIFFE/SPIRE rotation (step 3), Session context (step 8) | Full Coverage |
| 15 | Memory Poisoning | Tool registry verification (step 5), Data source integrity registry | Defense-in-Depth |
| 16 | Instruction Override | Step-up gating (step 9), Deep scan (step 10) | Infrastructure-Assisted |
5 Full Coverage
Infrastructure enforcement completely mitigates the vector. No agent cooperation required.
Infrastructure-Enforced
7 Defense-in-Depth
Multiple independent enforcement layers combine to address the vector with redundant coverage.
Multi-Layer
4 Infrastructure-Assisted
Infrastructure provides detection, flagging, and containment. Agent reasoning integrity contributes to complete defense.
Assisted
Defense-in-Depth: Threat-to-Layer Mapping
PRECINCT's 13-layer middleware chain provides defense-in-depth against the Agents of Chaos threats. The table below shows which gateway layers participate in defending against each threat category, demonstrating that most attacks are intercepted by multiple independent layers.
| Threat Category | Case Studies | Primary Layer(s) | Secondary Layer(s) | Defense Mechanism |
|---|---|---|---|---|
| Identity & Spoofing | #8, #9, #14 | SPIFFE Auth (step 3) | Session Context (step 8), OPA (step 6) | Cryptographic workload identity eliminates display-name spoofing. SPIRE-issued SVIDs rotate automatically, preventing session fixation. OPA enforces principal hierarchy for cross-channel requests. |
| Data Exfiltration | #3, #13 | DLP Scanning (step 7) | Token Substitution (step 13) | DLP blocks outbound credential patterns and sensitive content before they reach adapters. SPIKE token substitution ensures secrets are never exposed in agent-visible payloads; they are resolved only at egress. |
| Prompt Injection & Override | #12, #16 | Deep Scan (step 10) | Step-Up Gating (step 9), DLP (step 7) | Guard model inspection detects adversarial payloads in channel messages. Step-up gating requires additional authorization for elevated-risk operations triggered by suspicious instructions. |
| Resource Exhaustion & Loops | #4, #5 | Rate Limiting (step 11) | Request Size Limit (step 1), Circuit Breaker (step 12) | Per-identity rate limits prevent runaway agent-to-agent loops. Request size limits block storage exhaustion payloads at ingress. Circuit breakers isolate upstream failures from cascading. |
| Social Engineering & Manipulation | #1, #2, #7 | OPA Policy (step 6) | Session Context (step 8), Audit Log (step 4) | OPA enforces owner-only compliance rules regardless of social pressure. Escalation score tracking detects gradual privilege creep across sessions. Concession accumulators flag patterns of agent capitulation. |
| Integrity & Poisoning | #10, #15 | Tool Registry Verify (step 5) | Deep Scan (step 10) | Hash-based tool registry verification detects tampered configurations. Data source integrity registry validates external config sources before they influence agent behavior. |
| Broadcast & Amplification | #6, #11 | Step-Up Gating (step 9) | Audit Log (step 4), Rate Limiting (step 11) | Mass-send operations require step-up authorization. Audit logs provide forensic evidence for bias exploitation investigations. Rate limits bound the blast radius of amplification attacks. |
| Irreversible Action Gating | #1, #2 | Step-Up Gating (step 9) | OPA Policy (step 6), Principal Hierarchy | Actions classified as irreversible (Score=3: delete, shutdown, purge, wipe) are forced to the Deny gate when requested by non-owner principals (Level > 1) or during escalated sessions (EscalationScore > 15). The X-Precinct-Reversibility and X-Precinct-Backup-Recommended headers communicate classification results to downstream callers. |
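The deny-gate rule in the irreversible-action row can be sketched as a simple predicate. This is a hypothetical illustration, not PRECINCT's actual API: function and field names are invented, while the thresholds (Score=3 verbs, Level > 1, EscalationScore > 15) come directly from the table.

```python
# Hypothetical sketch of the irreversible-action deny gate.
# Names are illustrative; thresholds come from the table above.

IRREVERSIBLE_VERBS = {"delete", "shutdown", "purge", "wipe"}

def classify_reversibility(action: str) -> int:
    """Score 3 marks an irreversible action; 0 marks a benign one."""
    return 3 if action in IRREVERSIBLE_VERBS else 0

def should_deny(action: str, principal_level: int, escalation_score: int) -> bool:
    """Force the Deny gate for irreversible actions requested by non-owner
    principals (Level > 1) or during escalated sessions (score > 15)."""
    if classify_reversibility(action) < 3:
        return False
    return principal_level > 1 or escalation_score > 15

# A non-owner asking to purge storage is denied outright.
assert should_deny("purge", principal_level=2, escalation_score=0)
# The owner in a calm session passes this gate (step-up still applies).
assert not should_deny("purge", principal_level=1, escalation_score=0)
```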
Every Agents of Chaos threat is addressed by at least two independent PRECINCT layers. This means that even if one defense is bypassed, a secondary layer provides a backstop. That is the core principle of defense-in-depth architecture.
Boundary vs. Cognition
The paper exposes a fundamental tension in agentic AI security: some threats can be stopped entirely by infrastructure enforcement (boundary defenses), while others require the agent itself to reason correctly (cognitive defenses). Understanding this boundary is critical for realistic security postures.
What Infrastructure Can Solve
Boundary defenses operate at the network, identity, and policy layers; they do not require the agent to cooperate or even be aware of them. PRECINCT's gateway middleware chain is a boundary defense.
- Identity spoofing (#8, #14): Cryptographic identity (SPIFFE/SPIRE) makes display-name spoofing irrelevant. The infrastructure verifies identity, not the agent.
- Credential exfiltration (#13): Late-binding token substitution means agents never see real secrets. There is nothing to exfiltrate.
- Storage exhaustion (#5): Request size limits and rate limiting are enforced before payloads reach the agent.
- Prompt injection (#12): Deep scan inspection occurs in the middleware chain, before the payload reaches the agent's context window.
- Session fixation (#14): SVID rotation is automatic and infrastructure-managed. The agent cannot prevent or interfere with it.
What Infrastructure Cannot Solve Alone
Cognitive defenses require the agent to maintain reasoning integrity under adversarial pressure. Infrastructure can detect and flag these situations, but cannot guarantee correct agent behavior.
- Provider bias (#6): Infrastructure can audit model selection and flag anomalies, but cannot determine whether a model's output reflects genuine bias or legitimate reasoning.
- Instruction override (#16): Deep scan can detect known injection patterns, but novel override techniques that pass semantic inspection require the agent to distinguish legitimate from adversarial instructions.
- Gaslighting (#7): Escalation detection can flag patterns of capitulation, but the agent must ultimately decide whether to comply with persistent social pressure.
- Gradual escalation (#1): Infrastructure tracks escalation scores across sessions, but the boundary between legitimate task evolution and adversarial escalation is context-dependent.
PRECINCT's position: infrastructure should do everything it can to enforce security boundaries without relying on agent cooperation. For the remaining cognitive-layer threats, infrastructure provides detection, flagging, and escalation, shifting the problem from "the agent must be perfectly secure" to "the agent must respond correctly when the infrastructure tells it something is wrong."
This is a meaningful reduction in attack surface: instead of requiring agents to detect and defend against all 16 threat vectors independently, PRECINCT reduces the cognitive burden to 4 scenarios where infrastructure provides supporting signals.
Securing the Model Context Protocol
Securing the Model Context Protocol: A Five-Layer Defense Framework. arXiv:2511.20920.
This paper proposes a structured five-layer defense framework for securing the Model Context Protocol (MCP), which has become the standard interface between LLM agents and external tools. The framework addresses the full lifecycle of MCP interactions, from identity verification through runtime policy enforcement to centralized governance.
PRECINCT implements all five layers. The table below maps each layer from the paper to the concrete PRECINCT subsystem that fulfills it.
Five-Layer Framework Implementation
| Paper Layer | Purpose | PRECINCT Implementation |
|---|---|---|
| Authentication & Authorization | Verify identity and enforce fine-grained permissions | SPIFFE/SPIRE for cryptographic workload identity (step 3); OPA for policy-based authorization (step 6) |
| Provenance Tracking | Verify origin and integrity of tools and data | Tool Registry with SHA-256 hash verification (step 5); detects rug-pull attacks when tool descriptions or schemas change post-deployment |
| Isolation & Sandboxing | Contain breaches and limit blast radius | Container isolation via Docker/Kubernetes; NetworkPolicy enforcement; optional gVisor sandboxing for high-risk workloads |
| Inline Policy Enforcement | Inspect and filter traffic in real time | 13-layer gateway middleware chain: DLP scanning (step 7), deep scan (step 10), rate limiting (step 11), step-up gating (step 9), circuit breaker (step 12) |
| Centralized Governance | Single control point for policies and audit | OPA policy bundles with centralized authoring; OpenTelemetry tracing for distributed audit; in-process policy evaluation for sub-millisecond enforcement |
MCP Attack Vectors
The paper identifies six categories of MCP-specific attacks. PRECINCT addresses each through its middleware chain and supporting infrastructure.
| Attack Vector | Description | PRECINCT Defense |
|---|---|---|
| Tool Poisoning | Malicious instructions embedded in tool descriptions, invisible to users but processed by LLMs | Tool Registry hash verification (step 5) detects description changes; deep scan (step 10) inspects tool-sourced content |
| Rug Pull | Tool behavior changes after initial approval: same URI, different behavior | Continuous hash verification monitors for post-deployment changes; mismatch triggers alerts and optional blocking |
| Cross-Tool Manipulation | Malicious tool descriptions influence other tools through shared context | Tool isolation via separate registrations; DLP scanning of tool descriptions; OPA policies enforce per-tool permissions |
| Credential Exfiltration | Compromised LLM uses obfuscation, chunking, encoding, or steganography to extract secrets | Late-binding SPIKE token substitution (step 13). Agents see only handles like $SPIKE{ref:9f3c2a,exp:86400}, never real secrets |
| Data Exfiltration via Legitimate Tools | Injected instructions cause agents to query sensitive data and send it through authorized tools | DLP scanning (step 7) on both inbound and outbound payloads; output firewall with response handle-ization |
| Active Content via MCP-UI | ui:// resources deliver executable HTML+JavaScript to agent interfaces | CSP enforcement, permission mediation, and content scanning for MCP-UI resources |
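The credential-exfiltration row can be made concrete with a small sketch of late-binding substitution. This is illustrative only: the handle grammar follows the example in the table, but the regex, secret store, and function names are assumptions, not PRECINCT's implementation.

```python
import re

# Hypothetical sketch of late-binding token substitution: the agent-visible
# payload carries opaque handles like $SPIKE{ref:9f3c2a,exp:86400}; the
# gateway resolves them against a gateway-side secret store only at egress.

HANDLE_RE = re.compile(r"\$SPIKE\{ref:([0-9a-f]+),exp:(\d+)\}")

SECRET_STORE = {"9f3c2a": "real-api-key-value"}  # visible to the gateway only

def resolve_at_egress(payload: str) -> str:
    """Replace each handle with the real secret just before egress."""
    def _sub(match: re.Match) -> str:
        ref = match.group(1)
        return SECRET_STORE[ref]  # raises KeyError for unknown handles
    return HANDLE_RE.sub(_sub, payload)

agent_visible = "Authorization: Bearer $SPIKE{ref:9f3c2a,exp:86400}"
assert "real-api-key-value" not in agent_visible  # nothing to exfiltrate
assert resolve_at_egress(agent_visible) == "Authorization: Bearer real-api-key-value"
```

Because the agent only ever holds the handle, obfuscation, chunking, or encoding tricks can at most leak the handle itself, which is useless outside the gateway.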
PRECINCT implements all five layers of the MCP defense framework and addresses all six documented attack vectors. The framework's emphasis on layered, independent defenses aligns directly with PRECINCT's 13-layer middleware chain architecture.
Prompt Injection on Agentic Coding Assistants
Prompt Injection on Agentic Coding Assistants. arXiv:2601.17548.
This paper evaluates the resilience of state-of-the-art agentic coding assistants against prompt injection attacks. The findings are stark: attack success rates exceed 85% across multiple commercial tools. The attacks require no special access. They are embedded in seemingly benign repository files, documentation, and tool outputs that agents process as part of normal operation.
These results validate a core PRECINCT design decision: prompt injection defense cannot be left to the agent alone. It must be enforced at the infrastructure level, before adversarial content reaches the agent's context window.
85%+ Success Rate
Attacks succeed against state-of-the-art agentic coding tools with minimal effort.
Finding
Zero Special Access
Injections hide in repository files, docs, and tool outputs. No elevated privileges needed.
Attack Surface
Infrastructure-Level Defense
PRECINCT's tiered deep scan intercepts injections before they enter the agent's context.
PRECINCT Response
Tiered Deep Scan Architecture
PRECINCT addresses prompt injection through a two-tier scanning architecture that balances latency with detection depth. The fast path handles most requests synchronously; the deep path is triggered only when risk indicators are present.
| Tier | Guard Model | Latency | Trigger | Action |
|---|---|---|---|---|
| Fast Path | Prompt Guard 2 (86M params) | 10-20ms local, 50-150ms via Groq | All requests; high-risk tools also route through step-up gating (step 9) | Synchronous block if injection probability > 0.30 on high-risk tools; async flag otherwise |
| Deep Path | Llama Guard 4 (12B params) | 200-550ms via Groq | Conditional: only when Prompt Guard flags content or session risk score > 0.5 | Content classification, alert generation, session risk update, optional blocking |
Triggering Logic
Deep scanning is not applied uniformly. It is triggered by risk signals from other layers in the middleware chain, keeping latency low for normal traffic while ensuring thorough inspection when warranted.
DLP Signal
DLP scanning (step 7) flags potential_injection patterns, triggering deep scan for semantic confirmation.
Session Risk
Accumulated session risk score exceeds 0.5, indicating elevated threat context from prior interactions.
New Agent Baseline
First 100 requests from newly registered agents are scanned unconditionally to establish behavioral baseline.
When the guard model is unavailable (network failure, timeout, service degradation), PRECINCT defaults to blocking high-risk tool invocations rather than allowing them through unscanned. This fail-closed behavior ensures that a scanning outage does not become a security bypass.
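The tiered decision and its fail-closed fallback can be sketched as two small predicates. The thresholds (0.30 block probability, 0.5 session risk, first 100 requests) come from the text above; the function names and the `None`-means-unavailable convention are illustrative assumptions.

```python
# Hypothetical sketch of the two-tier scan decision with fail-closed
# fallback. Thresholds come from the tables above; names are illustrative.

BLOCK_THRESHOLD = 0.30   # fast-path synchronous block on high-risk tools
DEEP_SCAN_RISK = 0.5     # session risk score that triggers the deep path
BASELINE_REQUESTS = 100  # unconditional scanning window for new agents

def needs_deep_scan(fast_flagged: bool, session_risk: float,
                    request_count: int) -> bool:
    """Deep path triggers: a fast-path flag, elevated session risk,
    or a newly registered agent still in its baseline window."""
    return (fast_flagged
            or session_risk > DEEP_SCAN_RISK
            or request_count < BASELINE_REQUESTS)

def gate_request(injection_prob, high_risk_tool: bool) -> str:
    """Fast-path verdict. injection_prob is None when the guard model
    is unavailable; high-risk tool calls then fail closed (block)."""
    if injection_prob is None:
        return "block" if high_risk_tool else "flag"
    if high_risk_tool and injection_prob > BLOCK_THRESHOLD:
        return "block"
    return "allow"

assert gate_request(0.4, high_risk_tool=True) == "block"
assert gate_request(None, high_risk_tool=True) == "block"  # fail closed
assert needs_deep_scan(False, 0.6, 500)
```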
Recursive Language Models
Zhang, H., Kraska, T. & Khattab, O. (2025). Recursive Language Models. arXiv:2512.24601.
The RLM paper introduces a framework enabling LLMs to recursively decompose tasks and make self-calls over unbounded contexts via a REPL (Read-Eval-Print Loop) environment. This is a powerful pattern for agentic systems. It allows agents to process arbitrarily long documents, recursively delegate subtasks, and compose results across depth levels without context window limitations.
RLM recursion is also inherently risky. Without governance, a recursive agent can spiral into unbounded depth, consume unlimited resources, and bypass mediation controls by making direct sub-calls. PRECINCT's RLM Governance Engine provides observable and controllable execution through per-lineage resource limits.
RLM Governance Controls
| Control | Default Limit | Denial Code | Purpose |
|---|---|---|---|
| Depth Limit | 6 levels | RLM_HALT_MAX_DEPTH (429) | Caps maximum nesting depth for recursive agent calls, preventing unbounded recursion |
| Subcall Budget | 64 subcalls | RLM_HALT_MAX_SUBCALLS (429) | Limits total number of subcalls per lineage, bounding resource consumption |
| Budget Units | 128 units | RLM_HALT_MAX_SUBCALL_BUDGET (429) | Per-lineage cost accounting with per-call cost attribution, preventing cost explosions |
| UASGS Mediation | Required | RLM_BYPASS_DENIED (403) | Subcalls without uasgs_mediated=true are denied, ensuring all recursion is gateway-mediated |
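The per-lineage enforcement these controls describe can be sketched as a single check applied before each subcall. The limits and denial codes come from the table; the `Lineage` class and `check_subcall` signature are illustrative assumptions, not PRECINCT's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of per-lineage RLM budget enforcement. Limits and
# denial codes come from the governance controls table above.

MAX_DEPTH, MAX_SUBCALLS, MAX_BUDGET = 6, 64, 128

@dataclass
class Lineage:
    subcalls: int = 0
    budget_spent: int = 0

def check_subcall(lineage: Lineage, depth: int, cost: int, mediated: bool):
    """Return (allowed, denial_code) for a proposed recursive subcall."""
    if not mediated:
        return False, "RLM_BYPASS_DENIED"            # 403: gateway bypassed
    if depth > MAX_DEPTH:
        return False, "RLM_HALT_MAX_DEPTH"           # 429: too deep
    if lineage.subcalls + 1 > MAX_SUBCALLS:
        return False, "RLM_HALT_MAX_SUBCALLS"        # 429: too many calls
    if lineage.budget_spent + cost > MAX_BUDGET:
        return False, "RLM_HALT_MAX_SUBCALL_BUDGET"  # 429: budget exhausted
    lineage.subcalls += 1
    lineage.budget_spent += cost
    return True, None

lineage = Lineage()
assert check_subcall(lineage, depth=2, cost=4, mediated=True) == (True, None)
assert check_subcall(lineage, depth=7, cost=1, mediated=True)[1] == "RLM_HALT_MAX_DEPTH"
assert check_subcall(lineage, depth=2, cost=1, mediated=False)[1] == "RLM_BYPASS_DENIED"
```

Charging the budget only after all checks pass keeps denied calls from consuming lineage resources.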
Governance Invariants
The RLM Governance Engine enforces four structural invariants on every recursive call chain:
Validated Context Only
The prompt variable in the REPL must be built from the external-context validation pipeline, not raw web content or unscanned inputs. This ensures every level of recursion operates on content that has passed through the gateway's inspection layers.
Gateway-Mediated Sub-Calls
Any REPL module making recursive calls must route through the gateway for policy evaluation, classification, and egress control. Direct sub-LM calls that bypass the gateway are denied.
Mandatory Sub-Call Budgets
Recursion depth, total sub-calls, and max bytes per sub-call are all budgeted per lineage. These limits prevent cost explosions and resource exhaustion from runaway recursive decomposition.
Restricted REPL API
The default REPL surface is a whitelisted string-only API. File access, network access, imports, and reflective APIs are blocked unless explicitly permitted by policy.
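The shape of a whitelisted, string-only surface can be sketched as follows. This is purely illustrative: the allow-list contents and wrapper are assumptions, and restricting `__builtins__` in CPython is not a real sandbox (the production approach would combine it with process-level isolation), but it shows how file, network, import, and reflective APIs are simply absent from the execution namespace.

```python
# Hypothetical sketch of a whitelisted string-only REPL surface.
# Only string-safe builtins and the validated `prompt` variable are in
# scope; __import__, open, eval, etc. are absent, so imports and file
# access fail. NOT a real sandbox on its own: pair with process isolation.

ALLOWED_BUILTINS = {"len": len, "str": str, "min": min, "max": max}

def run_repl_snippet(code: str, prompt: str) -> dict:
    """Execute a snippet against the restricted namespace and return it."""
    namespace = {"__builtins__": ALLOWED_BUILTINS, "prompt": prompt}
    exec(code, namespace)
    return namespace

ns = run_repl_snippet("n = len(prompt)", prompt="validated context")
assert ns["n"] == 17

# Import attempts fail because __import__ is not in the namespace.
try:
    run_repl_snippet("import os", prompt="x")
    raised = False
except ImportError:
    raised = True
assert raised
```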
PRECINCT does not block RLM recursion. It governs it. The gateway provides observable and controllable execution through budgets and mandatory mediation, allowing teams to use RLM's powerful recursive decomposition within defined risk bounds. The goal is to make recursive agents safe to deploy, not to prevent their use.
Related Pages
Architecture
The full PRECINCT reference architecture document with design rationale and component relationships.
Capabilities
Complete capability map covering all 13 runtime layers, governed planes, and deployment modes.
Gateway
Deep dive into the 13-layer middleware chain that enforces the defense-in-depth controls referenced on this page.
Case Study
Reference port adapter implementation showing how PRECINCT enforcement applies to a real application.