Research Foundations

PRECINCT's threat model, defense architecture, and governance design are informed by peer-reviewed security research, not invented in isolation. Four published papers shape how PRECINCT reasons about threats, structures defenses, and governs recursive agent execution.

Agents of Chaos

Red-teaming exercise producing 16 threat case studies. Validates PRECINCT's defense-in-depth coverage and the boundary-vs-cognition distinction.

Threat Model

Securing MCP

Five-layer defense framework for MCP security. PRECINCT implements all five layers across its identity, policy, and gateway subsystems.

Defense Framework

Prompt Injection

Demonstrates 85%+ attack success on agentic coding tools. Validates PRECINCT's tiered deep scan and step-up gating at the infrastructure level.

Attack Validation

Recursive Language Models

Introduces LLM recursive self-calling over unbounded contexts. PRECINCT's RLM Governance Engine enforces depth, subcall, and budget limits on these chains.

Governance Model

Agents of Chaos

Citation

Shapira, N. et al. (2026). Agents of Chaos. arXiv:2602.20021v1.

Agents of Chaos reports on a structured red-teaming exercise in which 20 researchers spent 2 weeks probing OpenClaw-based autonomous agents in a live lab environment. The agents had access to Discord, email, shell commands, and persistent memory. A realistic surface area for enterprise agentic deployments. The exercise produced 16 case studies, of which 11 resulted in successful attacks.

The paper demonstrates that autonomous agents are vulnerable not only to traditional prompt injection but also to social-engineering vectors, identity spoofing, cross-channel manipulation, and resource exhaustion attacks that exploit the gap between agent cognition and infrastructure enforcement.

20 Researchers

Adversarial team probing agent defenses across multiple attack surfaces simultaneously.

Red Team Scale

16 Case Studies

Covering identity spoofing, social engineering, data exfiltration, resource exhaustion, and more.

Attack Surface

11 Successful Attacks

69% success rate against unprotected agents, demonstrating the need for infrastructure-level defense.

Impact

Threat Taxonomy: All 16 Case Studies

Each case study is mapped to the PRECINCT defense layers that address it. Coverage indicates how the threat is handled: Full Coverage means infrastructure enforcement completely mitigates the vector without agent cooperation; Defense-in-Depth means multiple independent layers combine to address the vector; Infrastructure-Assisted means infrastructure provides detection, flagging, and containment while agent reasoning integrity contributes to complete defense.

All 16 Agents of Chaos case studies with PRECINCT defense mapping
#	Case Study	PRECINCT Defense	Coverage
1	Agent Forgetting / Gradual Escalation	Escalation score tracking, Irreversibility gating, Rate limiting (step 11)	Infrastructure-Assisted
2	Non-Owner Compliance	OPA policy (step 6), Principal hierarchy, Irreversibility gating	Defense-in-Depth
3	Sensitive Data Disclosure via Email	DLP scanning (step 7), Email adapter mediation	Defense-in-Depth
4	Agent-to-Agent Discord Loop	Rate limiting (step 11), Discord adapter mediation	Defense-in-Depth
5	DoS via Storage Exhaustion	Request size limit (step 1), Rate limiting (step 11)	Full Coverage
6	Provider Bias Exploitation	Audit log (step 4), Model egress governance	Infrastructure-Assisted
7	Gaslighting / Social Pressure	Concession accumulator, Escalation detection	Infrastructure-Assisted
8	Identity Spoofing via Display Name	SPIFFE/SPIRE (step 3), Deep scan (step 10)	Full Coverage
9	Cross-Channel Impersonation Reset	Principal hierarchy metadata, OPA (step 6)	Defense-in-Depth
10	Agent Corruption via External Config	Data source integrity registry, Tool registry hash (step 5)	Defense-in-Depth
11	Libelous Mass Broadcast	Mass-send step-up gating, Principal hierarchy	Defense-in-Depth
12	Prompt Injection via Channels	Deep scan (step 10), DLP scanning (step 7)	Full Coverage
13	Credential Exfiltration	DLP scanning (step 7), SPIKE token substitution (step 13)	Full Coverage
14	Session Fixation	SPIFFE/SPIRE rotation (step 3), Session context (step 8)	Full Coverage
15	Memory Poisoning	Tool registry verification (step 5), Data source integrity registry	Defense-in-Depth
16	Instruction Override	Step-up gating (step 9), Deep scan (step 10)	Infrastructure-Assisted

5 Full Coverage

Infrastructure enforcement completely mitigates the vector. No agent cooperation required.

Infrastructure-Enforced

7 Defense-in-Depth

Multiple independent enforcement layers combine to address the vector with redundant coverage.

Multi-Layer

4 Infrastructure-Assisted

Infrastructure provides detection, flagging, and containment. Agent reasoning integrity contributes to complete defense.

Assisted

Defense-in-Depth: Threat-to-Layer Mapping

PRECINCT's 13-layer middleware chain provides defense-in-depth against the Agents of Chaos threats. The table below shows which gateway layers participate in defending against each threat category, demonstrating that most attacks are intercepted by multiple independent layers.

Architecture mapping: threats to PRECINCT middleware layers
Threat Category	Case Studies	Primary Layer(s)	Secondary Layer(s)	Defense Mechanism
Identity & Spoofing	#8, #9, #14	SPIFFE Auth (step 3)	Session Context (step 8), OPA (step 6)	Cryptographic workload identity eliminates display-name spoofing. SPIRE-issued SVIDs rotate automatically, preventing session fixation. OPA enforces principal hierarchy for cross-channel requests.
Data Exfiltration	#3, #13	DLP Scanning (step 7)	Token Substitution (step 13)	DLP blocks outbound credential patterns and sensitive content before they reach adapters. SPIKE token substitution ensures secrets are never exposed in agent-visible payloads. They are only resolved at egress.
Prompt Injection & Override	#12, #16	Deep Scan (step 10)	Step-Up Gating (step 9), DLP (step 7)	Guard model inspection detects adversarial payloads in channel messages. Step-up gating requires additional authorization for elevated-risk operations triggered by suspicious instructions.
Resource Exhaustion & Loops	#4, #5	Rate Limiting (step 11)	Request Size Limit (step 1), Circuit Breaker (step 12)	Per-identity rate limits prevent runaway agent-to-agent loops. Request size limits block storage exhaustion payloads at ingress. Circuit breakers isolate upstream failures from cascading.
Social Engineering & Manipulation	#1, #2, #7	OPA Policy (step 6)	Session Context (step 8), Audit Log (step 4)	OPA enforces owner-only compliance rules regardless of social pressure. Escalation score tracking detects gradual privilege creep across sessions. Concession accumulators flag patterns of agent capitulation.
Integrity & Poisoning	#10, #15	Tool Registry Verify (step 5)	Deep Scan (step 10)	Hash-based tool registry verification detects tampered configurations. Data source integrity registry validates external config sources before they influence agent behavior.
Broadcast & Amplification	#6, #11	Step-Up Gating (step 9)	Audit Log (step 4), Rate Limiting (step 11)	Mass-send operations require step-up authorization. Audit logs provide forensic evidence for bias exploitation investigations. Rate limits bound the blast radius of amplification attacks.
Irreversible Action Gating	#1, #2	Step-Up Gating (step 9)	OPA Policy (step 6), Principal Hierarchy	Actions classified as irreversible (Score=3: delete, shutdown, purge, wipe) are forced to the Deny gate when requested by non-owner principals (Level > 1) or during escalated sessions (EscalationScore > 15). The `X-Precinct-Reversibility` and `X-Precinct-Backup-Recommended` headers communicate classification results to downstream callers.

Defense-in-Depth Coverage

Every Agents of Chaos threat is addressed by at least two independent PRECINCT layers. This means that even if one defense is bypassed, a secondary layer provides a backstop. That is the core principle of defense-in-depth architecture.

Boundary vs. Cognition

The paper exposes a fundamental tension in agentic AI security: some threats can be stopped entirely by infrastructure enforcement (boundary defenses), while others require the agent itself to reason correctly (cognitive defenses). Understanding this boundary is critical for realistic security postures.

What Infrastructure Can Solve

Boundary defenses operate at the network, identity, and policy layers -- they do not require the agent to cooperate or even be aware of them. PRECINCT's gateway middleware chain is a boundary defense.

Identity spoofing (#8, #14):Cryptographic identity (SPIFFE/SPIRE) makes display-name spoofing irrelevant. The infrastructure verifies identity, not the agent.
Credential exfiltration (#13):Late-binding token substitution means agents never see real secrets. There is nothing to exfiltrate.
Storage exhaustion (#5):Request size limits and rate limiting are enforced before payloads reach the agent.
Prompt injection (#12):Deep scan inspection occurs in the middleware chain, before the payload reaches the agent's context window.
Session fixation (#14):SVID rotation is automatic and infrastructure-managed. The agent cannot prevent or interfere with it.

Infrastructure-Enforced

What Infrastructure Cannot Solve Alone

Cognitive defenses require the agent to maintain reasoning integrity under adversarial pressure. Infrastructure can detect and flag these situations, but cannot guarantee correct agent behavior.

Provider bias (#6):Infrastructure can audit model selection and flag anomalies, but cannot determine whether a model's output reflects genuine bias or legitimate reasoning.
Instruction override (#16):Deep scan can detect known injection patterns, but novel override techniques that pass semantic inspection require the agent to distinguish legitimate from adversarial instructions.
Gaslighting (#7):Escalation detection can flag patterns of capitulation, but the agent must ultimately decide whether to comply with persistent social pressure.
Gradual escalation (#1):Infrastructure tracks escalation scores across sessions, but the boundary between legitimate task evolution and adversarial escalation is context-dependent.

Requires Cognitive Cooperation

The Security Gap

PRECINCT's position: infrastructure should do everything it can to enforce security boundaries without relying on agent cooperation. For the remaining cognitive-layer threats, infrastructure provides detection, flagging, and escalation:shifting the problem from "the agent must be perfectly secure" to "the agent must respond correctly when the infrastructure tells it something is wrong."

This is a meaningful reduction in attack surface: instead of requiring agents to detect and defend against all 16 threat vectors independently, PRECINCT reduces the cognitive burden to 4 scenarios where infrastructure provides supporting signals.

Securing the Model Context Protocol

Citation

Securing the Model Context Protocol: A Five-Layer Defense Framework. arXiv:2511.20920.

This paper proposes a structured five-layer defense framework for securing the Model Context Protocol (MCP), which has become the standard interface between LLM agents and external tools. The framework addresses the full lifecycle of MCP interactions, from identity verification through runtime policy enforcement to centralized governance.

PRECINCT implements all five layers. The table below maps each layer from the paper to the concrete PRECINCT subsystem that fulfills it.

Five-Layer Framework Implementation

MCP five-layer defense framework mapped to PRECINCT components
Paper Layer	Purpose	PRECINCT Implementation
Authentication & Authorization	Verify identity and enforce fine-grained permissions	SPIFFE/SPIRE for cryptographic workload identity (step 3); OPA for policy-based authorization (step 6)
Provenance Tracking	Verify origin and integrity of tools and data	Tool Registry with SHA-256 hash verification (step 5); detects rug-pull attacks when tool descriptions or schemas change post-deployment
Isolation & Sandboxing	Contain breaches and limit blast radius	Container isolation via Docker/Kubernetes; NetworkPolicy enforcement; optional gVisor sandboxing for high-risk workloads
Inline Policy Enforcement	Inspect and filter traffic in real time	13-layer gateway middleware chain: DLP scanning (step 7), deep scan (step 10), rate limiting (step 11), step-up gating (step 9), circuit breaker (step 12)
Centralized Governance	Single control point for policies and audit	OPA policy bundles with centralized authoring; OpenTelemetry tracing for distributed audit; in-process policy evaluation for sub-millisecond enforcement

MCP Attack Vectors

The paper identifies six categories of MCP-specific attacks. PRECINCT addresses each through its middleware chain and supporting infrastructure.

MCP attack vectors and PRECINCT defenses
Attack Vector	Description	PRECINCT Defense
Tool Poisoning	Malicious instructions embedded in tool descriptions, invisible to users but processed by LLMs	Tool Registry hash verification (step 5) detects description changes; deep scan (step 10) inspects tool-sourced content
Rug Pull	Tool behavior changes after initial approval: same URI, different behavior	Continuous hash verification monitors for post-deployment changes; mismatch triggers alerts and optional blocking
Cross-Tool Manipulation	Malicious tool descriptions influence other tools through shared context	Tool isolation via separate registrations; DLP scanning of tool descriptions; OPA policies enforce per-tool permissions
Credential Exfiltration	Compromised LLM uses obfuscation, chunking, encoding, or steganography to extract secrets	Late-binding SPIKE token substitution (step 13). Agents see only handles like `$SPIKE{ref:9f3c2a,exp:86400}`, never real secrets
Data Exfiltration via Legitimate Tools	Injected instructions cause agents to query sensitive data and send it through authorized tools	DLP scanning (step 7) on both inbound and outbound payloads; output firewall with response handle-ization
Active Content via MCP-UI	`ui://` resources deliver executable HTML+JavaScript to agent interfaces	CSP enforcement, permission mediation, and content scanning for MCP-UI resources

Complete Framework Coverage

PRECINCT implements all five layers of the MCP defense framework and addresses all six documented attack vectors. The framework's emphasis on layered, independent defenses aligns directly with PRECINCT's 13-layer middleware chain architecture.

Prompt Injection on Agentic Coding Assistants

Citation

Prompt Injection on Agentic Coding Assistants. arXiv:2601.17548.

This paper evaluates the resilience of state-of-the-art agentic coding assistants against prompt injection attacks. The findings are stark: attack success rates exceed 85% across multiple commercial tools. The attacks require no special access. They are embedded in seemingly benign repository files, documentation, and tool outputs that agents process as part of normal operation.

These results validate a core PRECINCT design decision: prompt injection defense cannot be left to the agent alone. It must be enforced at the infrastructure level, before adversarial content reaches the agent's context window.

85%+ Success Rate

Attacks succeed against state-of-the-art agentic coding tools with minimal effort.

Finding

Zero Special Access

Injections hide in repository files, docs, and tool outputs. No elevated privileges needed.

Attack Surface

Infrastructure-Level Defense

PRECINCT's tiered deep scan intercepts injections before they enter the agent's context.

PRECINCT Response

Tiered Deep Scan Architecture

PRECINCT addresses prompt injection through a two-tier scanning architecture that balances latency with detection depth. The fast path handles most requests synchronously; the deep path is triggered only when risk indicators are present.

PRECINCT deep scan architecture tiers
Tier	Guard Model	Latency	Trigger	Action
Fast Path	Prompt Guard 2 (86M params)	10-20ms local, 50-150ms via Groq	All requests through step-up gating (step 9) for high-risk tools	Synchronous block if injection probability > 0.30 on high-risk tools; async flag otherwise
Deep Path	Llama Guard 4 (12B params)	200-550ms via Groq	Conditional: only when Prompt Guard flags content or session risk score > 0.5	Content classification, alert generation, session risk update, optional blocking

Triggering Logic

Deep scanning is not applied uniformly. It is triggered by risk signals from other layers in the middleware chain, keeping latency low for normal traffic while ensuring thorough inspection when warranted.

DLP Signal

DLP scanning (step 7) flags potential_injection patterns, triggering deep scan for semantic confirmation.

Session Risk

Accumulated session risk score exceeds 0.5, indicating elevated threat context from prior interactions.

New Agent Baseline

First 100 requests from newly registered agents are scanned unconditionally to establish behavioral baseline.

Fail-Closed Design

When the guard model is unavailable (network failure, timeout, service degradation), PRECINCT defaults to blocking high-risk tool invocations rather than allowing them through unscanned. This fail-closed behavior ensures that a scanning outage does not become a security bypass.

Recursive Language Models

Citation

Zhang, H., Kraska, T. & Khattab, O. (2025). Recursive Language Models. arXiv:2512.24601.

The RLM paper introduces a framework enabling LLMs to recursively decompose tasks and make self-calls over unbounded contexts via a REPL (Read-Eval-Print Loop) environment. This is a powerful pattern for agentic systems. It allows agents to process arbitrarily long documents, recursively delegate subtasks, and compose results across depth levels without context window limitations.

RLM recursion is also inherently risky. Without governance, a recursive agent can spiral into unbounded depth, consume unlimited resources, and bypass mediation controls by making direct sub-calls. PRECINCT's RLM Governance Engine provides observable and controllable execution through per-lineage resource limits.

RLM Governance Controls

PRECINCT RLM governance controls
Control	Default Limit	Denial Code	Purpose
Depth Limit	6 levels	`RLM_HALT_MAX_DEPTH` (429)	Caps maximum nesting depth for recursive agent calls, preventing unbounded recursion
Subcall Budget	64 subcalls	`RLM_HALT_MAX_SUBCALLS` (429)	Limits total number of subcalls per lineage, bounding resource consumption
Budget Units	128 units	`RLM_HALT_MAX_SUBCALL_BUDGET` (429)	Per-lineage cost accounting with per-call cost attribution, preventing cost explosions
UASGS Mediation	Required	`RLM_BYPASS_DENIED` (403)	Subcalls without `uasgs_mediated=true` are denied, ensuring all recursion is gateway-mediated

Governance Invariants

The RLM Governance Engine enforces four structural invariants on every recursive call chain:

Validated Context Only

The prompt variable in the REPL must be built from the external-context validation pipeline, not raw web content or unscanned inputs. This ensures every level of recursion operates on content that has passed through the gateway's inspection layers.

Gateway-Mediated Sub-Calls

Any REPL module making recursive calls must route through the gateway for policy evaluation, classification, and egress control. Direct sub-LM calls that bypass the gateway are denied.

Mandatory Sub-Call Budgets

Recursion depth, total sub-calls, and max bytes per sub-call are all budgeted per lineage. These limits prevent cost explosions and resource exhaustion from runaway recursive decomposition.

Restricted REPL API

The default REPL surface is a whitelisted string-only API. File access, network access, imports, and reflective APIs are blocked unless explicitly permitted by policy.

Design Philosophy

PRECINCT does not block RLM recursion. It governs it. The gateway provides observable and controllable execution through budgets and mandatory mediation, allowing teams to use RLM's powerful recursive decomposition within defined risk bounds. The goal is to make recursive agents safe to deploy, not to prevent their use.

Research Foundations

Agents of Chaos

20 Researchers

16 Case Studies

11 Successful Attacks

Threat Taxonomy: All 16 Case Studies

5 Full Coverage

7 Defense-in-Depth

4 Infrastructure-Assisted

Defense-in-Depth: Threat-to-Layer Mapping

Boundary vs. Cognition

What Infrastructure Can Solve

What Infrastructure Cannot Solve Alone

Securing the Model Context Protocol

Five-Layer Framework Implementation

MCP Attack Vectors

Prompt Injection on Agentic Coding Assistants

85%+ Success Rate

Zero Special Access

Infrastructure-Level Defense

Tiered Deep Scan Architecture

Triggering Logic

DLP Signal

Session Risk

New Agent Baseline

RLM Governance Controls

Governance Invariants

Validated Context Only

Gateway-Mediated Sub-Calls

Mandatory Sub-Call Budgets

Restricted REPL API

Related Pages