PRECINCT Architecture

Overview

PRECINCT stands for Policy-driven Runtime Enforcement and Cryptographic Identity for Networked Compute and Tools. It is a comprehensive security architecture for agentic AI systems built on the Model Context Protocol (MCP).

PRECINCT addresses three fundamental challenges that emerge when autonomous AI agents operate in production environments:

Identity: Who is the agent? Cryptographic workload identity via SPIFFE/SPIRE ensures every agent, service, and sidecar is verifiably who it claims to be.
Authorization: What can it do? Fine-grained, policy-as-code authorization via OPA ensures agents only perform actions they are explicitly permitted to take.
Secrets: How to provide credentials safely? Late-binding secret injection via SPIKE ensures agents never see, possess, or log real credentials.

Unlike framework-specific solutions, PRECINCT operates as a boundary enforcement layer. It wraps around existing agent frameworks and MCP servers without requiring any modification to agent code, model configurations, or tool implementations.

To extend enforcement to communication channels, PRECINCT includes channel mediation adapters for Discord and Email that route all agent-to-agent and agent-to-human communication through the gateway enforcement chain. These adapters ensure that no communication channel bypasses policy evaluation, DLP scanning, or audit logging, preventing cross-channel manipulation attacks such as agent-to-agent loops and libelous mass broadcast.

The Five Governed Planes

PRECINCT organizes governance into five distinct planes. Each plane has its own policy domain, enforcement points, and audit requirements. This separation prevents monolithic, one-size-fits-all rules and enables fine-grained governance at every layer of the agentic system.

graph TB
    subgraph "PRECINCT Governed Planes"
        P1["1. LLM / Model Egress Plane<br/>Model selection, data residency,<br/>egress constraints"]
        P2["2. Context / Memory Plane<br/>Memory lifecycle, session tracking,<br/>provenance"]
        P3["3. Tool Plane<br/>MCP and non-MCP tool execution,<br/>authorization, hash verification"]
        P4["4. Control Loop Plane<br/>Execution pattern boundaries,<br/>recursion limits, step budgets"]
        P5["5. Ingress / Event Plane<br/>Webhook, queue, event<br/>normalization and validation"]
    end
    P1 --- P2
    P2 --- P3
    P3 --- P4
    P4 --- P5
    GW["PRECINCT Gateway<br/>(Centralized Enforcement)"]
    P1 --> GW
    P2 --> GW
    P3 --> GW
    P4 --> GW
    P5 --> GW

LLM / Model Egress Plane: Governs which models agents may call, what data may be sent to them, and where model inference may occur. Enforces data residency constraints, model allow-lists, and HIPAA-aware prompt safety with minimum-necessary data handling.
Context / Memory Plane: Governs the memory lifecycle of agent sessions with four-tier classification: ephemeral, session, long_term (writes require clean DLP classification), and regulated (reads require step-up authorization). Controls what context may be persisted and ensures provenance metadata is attached to all stored context.
Tool Plane: Governs MCP and non-MCP tool execution. Every tool invocation is authorized against a capability registry with protocol-specific adapters. The CLI adapter enforces command allowlists, max-args limits, and denied-arg-token detection (blocking shell injection via ;, &&, ||, |, $(, `, >, <).
Control Loop Plane: Governs execution pattern boundaries with a full 8-state governance state machine (CREATED, RUNNING, WAITING_APPROVAL, COMPLETED, HALTED_POLICY, HALTED_BUDGET, HALTED_PROVIDER_UNAVAILABLE, HALTED_OPERATOR). Enforces immutable budget limits across 8 dimensions: steps, tool calls, model calls, wall time, egress bytes, model cost, provider failovers, and risk score. Supports operator halt (human kill switch) and provider unavailability handling.
Ingress / Event Plane: Governs inbound triggers with canonical connector envelope validation. Enforces SPIFFE source principal matching, SHA-256 payload content-addressing, replay detection with composite nonce keys and 30-minute TTL, and 10-minute freshness windows. Supports webhook and queue connector types.
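
The ingress checks in the last item above can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the Envelope fields, the IngressValidator class, the rejection strings, and the choice of (source, connector, nonce) as the composite nonce key are all assumptions. Only the checks themselves (principal matching, SHA-256 content addressing, the 30-minute nonce TTL, and the 10-minute freshness window) come from the text.

```python
import hashlib
from dataclasses import dataclass

NONCE_TTL_SECONDS = 30 * 60   # replay-cache TTL stated in the text
FRESHNESS_SECONDS = 10 * 60   # freshness window stated in the text


@dataclass(frozen=True)
class Envelope:
    """Hypothetical connector envelope; field names are illustrative."""
    source_spiffe_id: str
    connector_id: str
    nonce: str
    issued_at: float       # Unix seconds
    payload: bytes
    payload_sha256: str    # hex digest claimed by the sender


class IngressValidator:
    def __init__(self) -> None:
        # Composite nonce key -> expiry time (Unix seconds).
        self._seen = {}

    def validate(self, env: Envelope, authenticated_id: str, now: float) -> str:
        """Return "ok" or a short rejection reason (reason names are made up)."""
        if env.source_spiffe_id != authenticated_id:
            return "principal_mismatch"
        if hashlib.sha256(env.payload).hexdigest() != env.payload_sha256:
            return "payload_hash_mismatch"
        if abs(now - env.issued_at) > FRESHNESS_SECONDS:
            return "stale"
        # Drop expired nonces, then check for a replay.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        key = (env.source_spiffe_id, env.connector_id, env.nonce)
        if key in self._seen:
            return "replay"
        self._seen[key] = now + NONCE_TTL_SECONDS
        return "ok"
```

Because the nonce TTL is longer than the freshness window, an envelope that has aged out of the replay cache in this sketch would already be rejected as stale.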

Cross-Cutting: RLM Governance Engine

In addition to the five governed planes, the gateway includes a Recursive Lineage Manager (RLM) governance engine that tracks multi-agent lineage across nested subcalls. RLM operates as a cross-cutting layer activated via execution_mode=rlm on any plane request. It enforces three configurable limits per lineage: max depth (default 6), max subcalls (default 64), and max budget units (default 128). Subcalls that lack UASGS mediation markers are denied to prevent governance bypass.
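
The three lineage limits can be illustrated with a small admission check. This is a sketch under stated assumptions: LineageLimits, LineageState, admit_subcall, and the result strings are hypothetical names, the depth limit is treated as inclusive, and the mediation marker is reduced to a boolean. The default values are the ones given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageLimits:
    max_depth: int = 6            # default from the text
    max_subcalls: int = 64        # default from the text
    max_budget_units: int = 128   # default from the text


@dataclass
class LineageState:
    subcalls: int = 0
    budget_used: int = 0


def admit_subcall(state: LineageState, limits: LineageLimits,
                  depth: int, cost_units: int, mediated: bool) -> str:
    """Decide whether one nested subcall may proceed within its lineage."""
    if not mediated:
        # Subcalls without a mediation marker are denied outright.
        return "denied_unmediated"
    if depth > limits.max_depth:
        return "denied_depth"
    if state.subcalls + 1 > limits.max_subcalls:
        return "denied_subcalls"
    if state.budget_used + cost_units > limits.max_budget_units:
        return "denied_budget"
    # Only admitted subcalls consume lineage counters.
    state.subcalls += 1
    state.budget_used += cost_units
    return "allowed"
```

In this sketch a denied subcall leaves the counters untouched, so a rejected request cannot exhaust the lineage budget for later, legitimate subcalls.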

Core Components

PRECINCT is built on six proven, open-source infrastructure components. Each addresses a distinct security concern, and together they form a complete zero-trust security posture for agentic AI systems.

Core components of the PRECINCT architecture
Component	Role	Function
SPIFFE / SPIRE	Identity	Cryptographic workload identity. Every agent, service, and sidecar receives a SPIFFE ID attested by SPIRE. No static secrets, no shared tokens. Identity is attested, not asserted.
SPIKE	Secrets	SPIFFE-native secrets store. Agents authenticate with their SVID and receive only the secrets their policy permits, scoped to the current session and task. Secrets are references, never values.
OPA	Authorization	Fine-grained, declarative policy enforcement via Rego. Every agent request is evaluated against version-controlled policies that encode organizational security requirements as auditable code.
PRECINCT Gateway	Enforcement	The centralized enforcement point. A reverse proxy that orchestrates the 13-layer middleware chain, mediates all agent-to-tool communication, and ensures no request bypasses policy evaluation.
Phoenix + OTel Collector	Trace Observability	Distributed tracing for full request waterfalls across all middleware steps. Optimized for latency diagnostics, trace correlation, and incident triage.
OpenSearch + Dashboards (Optional)	Compliance Evidence	Indexed forensic and compliance investigations over audit records. In regulated Kubernetes deployments, this extension is wired with Secret-managed TLS/mTLS material and SPIRE identities.

This dual-backend model separates concerns cleanly: Phoenix remains the primary trace analysis surface, while OpenSearch Dashboards provides indexed audit evidence search for compliance and forensic workflows.

The 13-Layer Middleware Chain

Every request that passes through the PRECINCT Gateway traverses 13 distinct enforcement layers in strict order. No shortcuts, no bypass. Each layer has a specific function and a measurable latency impact.

The 13-layer PRECINCT middleware chain with latency impact
Step	Layer	Function	Latency Impact
`1`	Request Size Limit	Reject payloads exceeding 10MB before further processing	<0.1ms
`2`	Body Capture	Buffer and cache the request body for downstream inspection layers	<0.5ms
`3`	SPIFFE Auth	Cryptographic workload identity verification via mTLS SVID validation. For external callers, OAuth 2.0 bearer tokens and token exchange credentials are validated and mapped to `spiffe://domain/external/*` identities. Principal hierarchy metadata (System, Operator, Trusted Agent, Standard Agent, Restricted Agent, Unknown) is enriched from SPIFFE path patterns for downstream authorization decisions.	<1ms
`4`	Audit Log	Hash-chained JSONL audit record with unique decision IDs for tamper evidence	<0.5ms
`5`	Tool Registry Verify	SHA-256 hash validation of the target tool; detects rug-pull attacks. Extended with a Data Source Integrity Registry that verifies content hashes of external data sources and applies mutable-source policy (block_on_change / flag_on_change / allow) to detect external config corruption.	<1ms
`6`	OPA Policy	Rego-based fine-grained authorization evaluation against policy bundles	<2ms
	`post_authz` extension slot: pluggable extensions run here (e.g., custom RBAC, tool checkers)
`7`	DLP Scanning	Credential blocking (AWS keys, PEM, GitHub tokens), PII flagging, injection detection	<1ms
	`post_inspection` extension slot: pluggable extensions run here (e.g., content scanners, format validators)
`8`	Session Context	Cross-request risk tracking; accumulates behavior patterns per session. Includes escalation detection with EscalationScore tracking (contribution = Impact × (4 - Reversibility); thresholds: Warning ≥ 15, Critical ≥ 25, Emergency ≥ 40) to detect gradual privilege escalation across sessions.	<0.5ms
`9`	Step-Up Gating	Risk scoring with guard model dispatch for high-risk operations. Includes irreversibility classification (ClassifyReversibility): actions are scored 0 to 3 and classified as low/medium/high/critical; actions flagged as high-irreversibility (Score ≥ 2) are automatically sent to step-up gating for guard model review before execution.	0–500ms (conditional)
`10`	Deep Scan	Asynchronous prompt injection detection via guard model	200–550ms (async)
	`post_analysis` extension slot: pluggable extensions run here (e.g., final approval gates, aggregated risk decisions)
`11`	Rate Limiting	Per-identity token bucket enforcement via KeyDB	<0.5ms
`12`	Circuit Breaker	Upstream cascade protection; halts traffic to failing services	<0.1ms
`13`	Token Substitution	Late-binding secret injection via SPIKE; replaces opaque tokens with real credentials at egress	<1ms
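
The EscalationScore arithmetic from step 8 can be worked through in a few lines. The formula and the three thresholds are taken from the table. The rest is assumed for illustration: that contributions are summed per session, the "normal" label below the Warning threshold, and the SessionEscalation class itself. The scales of Impact and Reversibility are not specified here, so the inputs are arbitrary example values.

```python
# Thresholds from the table, checked from most to least severe.
THRESHOLDS = [(40, "emergency"), (25, "critical"), (15, "warning")]


def contribution(impact: int, reversibility: int) -> int:
    """Per-action contribution: Impact x (4 - Reversibility)."""
    return impact * (4 - reversibility)


def level(score: int) -> str:
    """Map an accumulated score to its escalation level."""
    for floor, name in THRESHOLDS:
        if score >= floor:
            return name
    return "normal"  # assumed label for scores below Warning


class SessionEscalation:
    """Assumed accumulation model: contributions are summed per session."""

    def __init__(self) -> None:
        self.score = 0

    def record(self, impact: int, reversibility: int) -> str:
        self.score += contribution(impact, reversibility)
        return level(self.score)
```

With Impact 3 and Reversibility 1, each action contributes 3 × (4 - 1) = 9. Under the summing assumption, the second such action reaches 18 (Warning), the third 27 (Critical), and the fifth 45 (Emergency), which is how a series of individually modest actions can still trip the higher thresholds.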

An outer observability wrapper (Request Metrics, step 0) captures timing, size, and routing metadata but is not an enforcement layer.
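
The tamper evidence provided by the hash-chained audit log in step 4 can be sketched as below. This is one common way to build such a chain, not necessarily the gateway's record format: the field names (decision_id, prev_hash, hash, event), the all-zero genesis value, and the use of UUIDs for decision IDs are assumptions.

```python
import hashlib
import json
import uuid

GENESIS = "0" * 64  # assumed previous-hash value for the first record


def _digest(record: dict) -> str:
    """SHA-256 over the record's canonical JSON, excluding its own hash."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(log: list, event: dict) -> str:
    """Append one JSONL line that commits to the previous line's hash."""
    prev_hash = json.loads(log[-1])["hash"] if log else GENESIS
    record = {
        "decision_id": str(uuid.uuid4()),
        "prev_hash": prev_hash,
        "event": event,
    }
    record["hash"] = _digest(record)
    log.append(json.dumps(record, sort_keys=True))
    return record["decision_id"]


def verify_chain(log: list) -> bool:
    """Recompute every hash and link; any edit or reordering breaks the chain."""
    prev = GENESIS
    for line in log:
        record = json.loads(line)
        if record["prev_hash"] != prev or record["hash"] != _digest(record):
            return False
        prev = record["hash"]
    return True
```

Because each record commits to its predecessor, altering or removing an earlier line invalidates every later link, which is what makes the log tamper-evident rather than tamper-proof.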

Key Design Constraint

Token substitution happens last, at step 13, immediately before egress. No earlier middleware layer ever sees raw credentials. The agent never possesses, observes, or logs a real secret.
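
A minimal sketch of late-binding substitution follows. The placeholder syntax {{secret:...}}, the resolve callback, and the run_chain helper are hypothetical, since the real opaque-token format and the SPIKE lookup API are not described in this section. The point is only the ordering: inspection layers see the tokenized body, and the real value appears solely in the body that leaves the gateway.

```python
import re
from typing import Callable, Tuple

# Hypothetical placeholder syntax; the real opaque-token format is not given here.
TOKEN_PATTERN = re.compile(r"\{\{secret:([A-Za-z0-9_./-]+)\}\}")


def substitute_tokens(body: str, resolve: Callable[[str], str]) -> str:
    """Replace each opaque reference with the value returned by resolve()."""
    return TOKEN_PATTERN.sub(lambda m: resolve(m.group(1)), body)


def run_chain(body: str, resolve: Callable[[str], str]) -> Tuple[str, str]:
    """Return (what earlier layers saw, what is sent upstream)."""
    inspected_view = body                           # audit, DLP, etc. see tokens only
    egress_body = substitute_tokens(body, resolve)  # last step before egress
    return inspected_view, egress_body
```

For example, with a stand-in resolver backed by a dictionary, a body containing {{secret:db/password}} is logged and scanned in that form, and only the upstream request carries the resolved value.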

Pluggable Extension Slots

The middleware chain exposes three named extension slots (post_authz, post_inspection, and post_analysis) where external HTTP sidecar extensions can be plugged in without modifying gateway code. Extensions are configured via a hot-reloadable YAML registry. See the Gateway extension slots documentation for details.

External Access: OAuth 2.0 & Token Exchange

The gateway acts as an OAuth 2.0 Resource Server (RFC 6750). External clients authenticate with bearer tokens from your existing Authorization Server (Auth0, Keycloak, Okta). The gateway validates the JWT, maps the sub claim to a SPIFFE identity in the external/ namespace, and applies the full middleware chain. A token exchange endpoint is also available for tools without an OAuth AS. See the External Access documentation for configuration, scope enforcement, and security boundaries.

Request Flow Through the Chain

sequenceDiagram
    participant Agent
    participant GW as PRECINCT Gateway
    participant Size as 1. Size Limit
    participant Body as 2. Body Capture
    participant SPIFFE as 3. SPIFFE Auth
    participant Audit as 4. Audit Log
    participant Registry as 5. Tool Registry
    participant OPA as 6. OPA Policy
    participant DLP as 7. DLP Scan
    participant Session as 8. Session Context
    participant StepUp as 9. Step-Up Gate
    participant Deep as 10. Deep Scan
    participant Rate as 11. Rate Limit
    participant CB as 12. Circuit Breaker
    participant Token as 13. Token Sub
    participant Upstream as MCP Server
    Agent->>GW: JSON-RPC Request
    GW->>Size: Check payload size
    Size->>Body: Buffer body
    Body->>SPIFFE: Verify SVID
    SPIFFE->>Audit: Write audit record
    Audit->>Registry: Verify tool hash
    Registry->>OPA: Evaluate policy
    Note over OPA,DLP: post_authz extension slot
    OPA->>DLP: Scan for credentials/PII
    Note over DLP,Session: post_inspection extension slot
    DLP->>Session: Update risk score
    Session->>StepUp: Check risk threshold
    StepUp->>Deep: Async injection scan
    Note over Deep,Rate: post_analysis extension slot
    Deep->>Rate: Check token bucket
    Rate->>CB: Check circuit state
    CB->>Token: Substitute secrets
    Token->>Upstream: Forward request
    Upstream-->>GW: Response
    GW-->>Agent: Filtered response

Threat Model

PRECINCT's threat model addresses both the OWASP Agentic AI threat categories and MCP-specific attack vectors that emerge when agents interact with tools via the Model Context Protocol.

OWASP Agentic AI Threats

Prompt Injection: Adversarial inputs designed to manipulate agent behavior. PRECINCT defends with DLP scanning (layer 7) and deep scan via guard model dispatch (layer 10).
Tool Misuse: Agents invoking tools outside their authorized scope. PRECINCT defends with OPA policy evaluation (layer 6), tool registry verification (layer 5), and session context tracking (layer 8).
Data Exfiltration: Agents leaking sensitive data through tool calls, model context, or side channels. PRECINCT defends with DLP credential blocking (layer 7), session context cross-request detection (layer 8), and the response firewall.
Lateral Movement: Compromised agents using their identity to access resources beyond their scope. PRECINCT defends with SPIFFE cryptographic identity (layer 3), per-identity rate limiting (layer 11), and fine-grained OPA policies (layer 6).

MCP-Specific Attack Vectors

Tool Poisoning: A malicious tool masquerades as a legitimate one. The tool registry (layer 5) verifies SHA-256 hashes at invocation time to detect substitution.
Rug Pull: A previously legitimate tool is updated with malicious code after registration. Hash verification at every invocation detects changes between the registered and actual tool.
Cross-Tool Manipulation: An agent chains tool calls to achieve an unauthorized outcome that no single tool call would permit. Session context tracking (layer 8) accumulates cross-request behavior to detect multi-step attack patterns.
Credential Exfiltration: An agent extracts real credentials from tool responses or intermediate state. Late-binding token substitution (layer 13) ensures agents never possess real secrets, and the response firewall returns sensitive data as opaque handles.
Active Content via MCP-UI: Malicious content injected through MCP user interface channels. DLP scanning (layer 7) and deep scan (layer 10) inspect all content flowing through the gateway regardless of channel.
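
The hash check behind the Tool Poisoning and Rug Pull defenses can be sketched as follows. ToolRegistry and its methods are hypothetical names, and this sketch assumes the hash covers the tool's serialized definition; what exactly the gateway hashes is not specified in this section.

```python
import hashlib
import hmac


class ToolRegistry:
    """Illustrative registry mapping tool names to registered SHA-256 digests."""

    def __init__(self) -> None:
        self._hashes = {}

    def register(self, name: str, definition: bytes) -> None:
        """Record the digest of the tool definition at registration time."""
        self._hashes[name] = hashlib.sha256(definition).hexdigest()

    def verify(self, name: str, definition: bytes) -> bool:
        """Check the definition presented at invocation against the registry."""
        expected = self._hashes.get(name)
        if expected is None:
            return False  # unregistered tools are never trusted
        actual = hashlib.sha256(definition).hexdigest()
        return hmac.compare_digest(expected, actual)
```

Verifying on every invocation, rather than once at registration, is what catches a rug pull: a definition that changes after registration no longer matches the stored digest.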

Validated Against Real-World Attacks

Every threat in this model has been mapped to the 16 case studies in Research (Shapira et al., 2026, arXiv:2602.20021v1). See the defense mapping table for a layer-by-layer breakdown of how PRECINCT addresses each attack scenario.

Key Architecture Decisions

The following architectural decisions define PRECINCT's approach to securing agentic AI systems. Each represents a deliberate choice with specific trade-offs.

Boundary enforcement, not framework replacement. PRECINCT wraps around existing agent frameworks as a reverse proxy. It does not require agents to use a specific SDK, framework, or programming language. Any agent that speaks HTTP can be governed.
Policy decisions, not tool execution. The gateway decides whether a tool call is permitted. It never executes tools itself. The separation of policy decision from tool execution ensures the gateway cannot be weaponized as an execution engine.
Structured deny codes over opaque errors. When the gateway denies a request, it returns a structured error code (e.g., authz_policy_denied, dlp_credentials_detected) that enables agents to understand why a request was rejected and adapt their behavior accordingly.
Secrets as references, not values. Agents operate with opaque, meaningless tokens. Real credentials are resolved at egress time inside the gateway. If an agent is compromised, the attacker obtains tokens that are useless outside the gateway context.
Cloud-agnostic by default. PRECINCT runs on any Kubernetes-conformant cluster or Docker Compose environment. No cloud-specific services are required. All dependencies are open-source and self-hostable.
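
To illustrate why structured deny codes matter, here is a sketch of how an agent-side client might branch on them. The two codes are the examples named above. The payload shape (code, layer, message), the Denial type, and the returned action names are assumptions made for illustration, not the gateway's actual response schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Denial:
    """Hypothetical deny payload; field names are illustrative."""
    code: str
    layer: int
    message: str


def next_action(raw: str) -> str:
    """Pick a follow-up behavior based on the structured deny code."""
    denial = Denial(**json.loads(raw))
    if denial.code == "dlp_credentials_detected":
        # The request carried a secret; remove it before trying again.
        return "strip_credentials_and_retry"
    if denial.code == "authz_policy_denied":
        # Policy forbids the action; retrying unchanged would not help.
        return "do_not_retry"
    return "escalate_to_operator"
```

An opaque error would force the agent to guess or retry blindly, whereas a code lets it choose between fixing the request, abandoning it, or asking a human.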

Context Admission Invariants

These invariants are unconditional guarantees enforced by the PRECINCT architecture. They are not configurable, not optional, and not bypassable. Every request path through the gateway is subject to all four invariants.

no-scan-no-send

Nothing enters model context without scanning. Every piece of data that flows into an LLM's context window must first pass through the DLP scanning layer. If the scan cannot be performed, the data is not sent.
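
The fail-closed behavior of this invariant can be sketched as a small wrapper. The function names are hypothetical, and the scanner is a stand-in that only looks for strings shaped like an AWS access key ID; a real DLP layer would cover far more patterns.

```python
import re
from typing import Callable

# Stand-in detector: strings shaped like an AWS access key ID.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")


def simple_scan(data: str) -> bool:
    """Return True when no credential-like pattern is found."""
    return AWS_KEY_ID.search(data) is None


def admit_to_context(data: str, scan: Callable[[str], bool]) -> bool:
    """Admit data only if the scan ran and reported it clean."""
    try:
        return scan(data) is True
    except Exception:
        # The scan could not be performed, so the data is not sent.
        return False
```

The important case is the exception path: a scanner outage results in a refusal, never in unscanned data reaching the model.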

no-provenance-no-persist

Nothing persists without provenance. Every piece of data stored in the context/memory plane must carry provenance metadata: who created it, when, under what policy decision, and with what audit trail.
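
One way to picture this invariant is a store that refuses writes lacking complete provenance. The Provenance fields mirror the four items listed above (who, when, policy decision, audit trail), but the field names, the MemoryStore class, and the rule that empty strings count as missing are assumptions for illustration.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to stored context."""
    created_by: str    # who created it, e.g. a SPIFFE ID
    created_at: str    # when, e.g. an ISO 8601 timestamp
    decision_id: str   # the policy decision that allowed the write
    audit_ref: str     # pointer into the audit trail


class MemoryStore:
    def __init__(self) -> None:
        self._items = {}

    def persist(self, key: str, value: str,
                provenance: Optional[Provenance]) -> bool:
        """Store the value only when every provenance field is present."""
        if provenance is None or not all(astuple(provenance)):
            return False
        self._items[key] = (value, provenance)
        return True

    def __contains__(self, key: str) -> bool:
        return key in self._items
```

Rejecting partially filled records, not just missing ones, keeps a caller from satisfying the check with a placeholder object.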

no-verification-no-load

Nothing loads without verification. Every tool, plugin, and extension must be verified against its registered cryptographic hash before it is loaded or invoked. Unverified code does not execute.

minimum-necessary

Data minimization is enforced at every boundary. Agents receive only the data they need for the current task. Context windows are scoped, secrets are scoped, and tool access is scoped to the minimum necessary.

The 3 Rs Operating Doctrine

PRECINCT's operational posture is defined by three continuous practices. These are not one-time setup tasks but ongoing operational disciplines that keep the security posture fresh and resilient.

The 3 Rs (Repair, Repave, Rotate) were originally conceived at Pivotal Software and largely implemented in Cloud Foundry. PRECINCT extends this doctrine to agentic AI infrastructure.

Repair

Self-healing and redundancy. When a component fails, the system detects the failure and initiates automatic recovery. Health checks, liveness probes, and circuit breakers ensure degraded components are isolated and restored without manual intervention.

Rotate

Short-lived identities and referential credentials. SPIFFE SVIDs are rotated automatically on short intervals. Secrets are never stored as long-lived values. Credential references expire and must be re-obtained, limiting the window of exposure if any single credential is compromised.

Repave

Rebuild trusted runtime state on demand. Rather than patching a running system, PRECINCT supports tearing down and rebuilding the entire stack from trusted base images. This eliminates configuration drift and ensures every deployment starts from a known-good state.