RLM Governance
Recursive Language Models (RLM) are based on research by Zhang, Kraska & Khattab (2025), arXiv:2512.24601. PRECINCT does not block RLM recursion. It governs it.
RLM lets agents decompose problems by recursively calling themselves and other models through a code execution environment. It is one of the most powerful patterns in agentic AI, and one of the most dangerous without proper governance. PRECINCT provides protocol-level controls that work with any RLM implementation.
What Is RLM?
Traditional LLM inference follows a simple pattern: send a prompt, receive a response. RLM replaces this with a fundamentally different model. Instead of feeding the entire context into a single call, the LLM gets a REPL (Read-Eval-Print Loop) environment where it can write code, examine data programmatically, call sub-LMs for semantic analysis, and build up answers iteratively.
In practice, this means an agent can:
- Decompose large contexts by writing code to split, filter, and analyze data in chunks rather than trying to process everything at once
- Make recursive sub-calls to itself or other models, each handling a piece of the problem, then synthesize the results
- Execute iteratively through multiple REPL cycles, observing intermediate results and adjusting strategy
- Process near-unlimited context because the context lives in the REPL environment, not in the model's context window
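In code, the decompose-recurse-synthesize pattern looks roughly like this. This is a minimal sketch: `llm_query` is a stand-in for whatever sub-LM call the REPL environment exposes, stubbed here so the example is self-contained.

```python
# Minimal sketch of RLM-style decomposition. `llm_query` is a stand-in
# for the sub-LM call available inside the REPL environment; in a real
# RLM setup it would be a recursive model call.
def llm_query(prompt: str) -> str:
    # Stubbed so the sketch runs without a model backend.
    return f"summary({len(prompt)} chars)"

def answer_over_large_context(question: str, context: str, chunk_size: int = 1000) -> str:
    # Decompose: split the context into chunks a sub-LM can handle.
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    # Recurse: one sub-call per chunk.
    partials = [llm_query(f"{question}\n\n{chunk}") for chunk in chunks]
    # Synthesize: a final call over the intermediate results.
    return llm_query(question + "\n\n" + "\n".join(partials))
```

The context never has to fit in a single model call; only each chunk and the final synthesis prompt do.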
Think of the difference this way: traditional inference asks the LLM to read an entire book and answer a question. RLM gives the LLM a desk, a pen, and the ability to open the book to any page, take notes, and ask a colleague to read specific chapters.
Why RLM Needs Governance
The same capabilities that make RLM powerful make it dangerous in production. Without governance, a recursive agent can:
| Risk | What Happens | Impact |
|---|---|---|
| Unbounded recursion | Agent keeps spawning sub-calls that spawn more sub-calls | Exponential cost, resource exhaustion, cascading failures |
| Cost explosion | Each recursive call consumes model tokens; deep trees multiply cost geometrically | Unexpected bills in the thousands from a single task |
| Bypass attacks | A prompt-injected agent makes direct sub-calls that skip the gateway | Policy, DLP, and audit controls are silently circumvented |
| Hidden escalation | Sub-calls accumulate permissions or access patterns not visible at the root level | Data exfiltration or privilege escalation through recursive depth |
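The scale of the cost problem is easy to see with a little arithmetic: a recursion tree with branching factor b and depth d contains b + b² + … + b^d sub-calls, so even modest per-level fan-out multiplies quickly.

```python
# Illustrative arithmetic: total sub-calls in a full recursion tree with
# a fixed branching factor, summed over every level.
def total_subcalls(branching: int, depth: int) -> int:
    return sum(branching ** level for level in range(1, depth + 1))

# A modest-looking agent: 4 sub-calls per level, 6 levels deep.
print(total_subcalls(4, 6))  # 5460 sub-calls from a single root task
```

Real trees are rarely this uniform, but the geometric growth is why breadth and depth both need hard caps.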
The core problem is that RLM recursion is opaque by default. The root caller has no visibility into what sub-calls are doing, how deep the tree has grown, or how much budget has been consumed. PRECINCT makes it observable and controllable.
How PRECINCT Governs RLM
PRECINCT's RLM governance operates at the protocol level, not the implementation level. It does not care whether you use DSPy's `dspy.RLM` module, the standalone `rlms` package, or a custom implementation. What matters is that every model call routes through the PRECINCT gateway.
When an agent operates in RLM mode, the gateway's RLM Governance Engine evaluates every request against a per-lineage state machine that tracks:
- Lineage ID: A correlation identifier tying every call in a recursive chain back to the same root invocation
- Depth: How many levels deep the current call is (0 for root, 1+ for subcalls)
- Subcall count: Total number of sub-LM calls made across the entire lineage
- Budget units: Cumulative cost accounting with per-call cost attribution
- Mediation flag: Whether the subcall was routed through the gateway (UASGS check)
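A minimal sketch of that per-lineage state as a record type — the field names here are illustrative, not the gateway's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LineageState:
    # Illustrative per-lineage record; field names are assumptions,
    # not the gateway's documented schema.
    lineage_id: str       # ties every call back to the root invocation
    depth: int            # 0 for root, 1+ for subcalls
    subcall_count: int    # total sub-LM calls across the lineage
    budget_units: float   # cumulative cost with per-call attribution
    uasgs_mediated: bool  # did this subcall route through the gateway?

state = LineageState("lin-01", depth=2, subcall_count=7,
                     budget_units=12.5, uasgs_mediated=True)
```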
The UASGS Invariant
UASGS stands for Unmediated Agent Self-Generated Subcall. It is the single most important RLM governance control.
When an LLM in RLM mode calls `llm_query()` from within its REPL environment, that call must route through the PRECINCT gateway, not directly to an LLM provider. The gateway verifies this by requiring `uasgs_mediated=true` on every subcall at depth > 0. If a subcall arrives without this flag, it is denied with `RLM_BYPASS_DENIED` (403).
This ensures that every recursive call, no matter how deep, passes through the full enforcement chain: identity verification, OPA policy evaluation, DLP scanning, and audit logging. An agent cannot escape governance by recursing deeper.
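The invariant itself reduces to a single check. The sketch below shows the shape of that check, not PRECINCT's actual implementation; the reason code and status match the ones documented here.

```python
def check_uasgs(depth: int, uasgs_mediated: bool) -> tuple[int, str]:
    # UASGS invariant: any subcall (depth > 0) that did not route
    # through the gateway is denied with RLM_BYPASS_DENIED (403).
    if depth > 0 and not uasgs_mediated:
        return 403, "RLM_BYPASS_DENIED"
    return 200, "OK"

print(check_uasgs(3, False))  # (403, 'RLM_BYPASS_DENIED')
```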
Architecture Flow
The following diagram shows how an RLM execution flows through the PRECINCT gateway. The agent's REPL environment (whether Deno/Pyodide, Docker, or any other sandbox) is where code runs. But every LLM call from within that environment routes through the gateway for governance.
Key points:
- The REPL sandbox executes code locally. It does not make network calls except through the gateway-configured LM client.
- Every `llm_query()` from the REPL is a full gateway request, subject to identity verification, OPA policy, and RLM budget checks.
- The gateway returns RLM metadata (remaining budget, depth, subcall count) with every response, giving operators full visibility into the recursion.
Governance Controls
The RLM Governance Engine enforces four structural invariants on every recursive call chain. These are protocol-level controls that apply regardless of which RLM implementation the agent uses.
| Control | Default Limit | Reason Code | Purpose |
|---|---|---|---|
| Max Depth | 6 levels | `RLM_HALT_MAX_DEPTH` (429) | Caps recursion nesting. Prevents unbounded depth even from prompt-injected agents. |
| Max Subcalls | 64 calls | `RLM_HALT_MAX_SUBCALLS` (429) | Limits total sub-LM calls per lineage. Bounds breadth-first explosion. |
| Budget Units | 128 units | `RLM_HALT_MAX_SUBCALL_BUDGET` (429) | Cost accounting with per-call attribution. Prevents cost explosions. |
| UASGS Mediation | Required | `RLM_BYPASS_DENIED` (403) | Every subcall must route through the gateway. No bypass, no exceptions. |
Limits are configurable per request via the `rlm_limits` field in the policy attributes. Operators can set different budgets for different agent classes through OPA policy, allowing research agents more headroom while keeping production agents tightly bounded.
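For example, a request's policy attributes might carry per-class limits like this. `rlm_limits` is the documented field name; the surrounding structure, key names, and values are assumptions made for the sketch.

```python
# Illustrative policy attributes for a research-class agent.
# Only `rlm_limits` is documented; everything else is an assumption.
policy_attributes = {
    "agent_class": "research",
    "rlm_limits": {
        "max_depth": 8,       # more headroom than the default 6
        "max_subcalls": 128,  # vs. the default 64
        "budget_units": 256,  # vs. the default 128
    },
}
```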
Lineage Tracking
Every RLM execution is tracked as a lineage: a tree of
calls sharing a single lineage_id. The gateway maintains
per-lineage state including:
- Root run ID (the original invocation)
- Current and parent run IDs (for tree reconstruction)
- Cumulative subcall count and budget consumption
- Last observed decision ID (for audit trail correlation)
This means operators can reconstruct the full call tree after the fact, understand exactly how budget was consumed, and trace any anomalous sub-call back to its root invocation.
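Because each record carries its own run ID and its parent's, reconstructing the tree is a simple grouping exercise. The sketch below uses illustrative records, not the gateway's actual audit format:

```python
from collections import defaultdict

# Illustrative lineage records: (run_id, parent_run_id, budget_units).
records = [
    ("run-root", None, 1.0),
    ("run-a", "run-root", 2.0),
    ("run-b", "run-root", 3.0),
    ("run-a1", "run-a", 4.0),
]

# Rebuild the tree: parent run ID -> child run IDs.
children = defaultdict(list)
for run_id, parent, _ in records:
    if parent is not None:
        children[parent].append(run_id)

def subtree_budget(run_id: str) -> float:
    # Total budget consumed by a call and everything it spawned.
    own = next(cost for rid, _, cost in records if rid == run_id)
    return own + sum(subtree_budget(child) for child in children[run_id])

print(subtree_budget("run-root"))  # 10.0 units across the whole lineage
```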
Integration
DSPy (Recommended)
PRECINCT's Python SDK provides `configure_dspy_gateway_lms()`, which sets up both a standard LM and an optional RLM model, both routed through the gateway:
```python
from mcp_gateway_sdk.runtime import configure_dspy_gateway_lms

lm, rlm = configure_dspy_gateway_lms(
    gateway_url="https://gateway.precinct.local",
    llm_model="claude-sonnet-4-5-20250929",
    model_provider="anthropic",
    spike_ref="secret:anthropic-api-key",
    rlm_model="gpt-5-nano",             # Model for RLM sub-calls
    rlm_provider="openai",
    rlm_spike_ref="secret:openai-key",  # Credentials via SPIKE
)
```
The DSPy `dspy.RLM` module uses a sandboxed REPL (Deno/Pyodide) for code execution. When the LLM calls `llm_query()` from within the REPL, that call goes through `dspy.settings.lm`, which points to the PRECINCT gateway. The governance chain is automatic.
Standalone `rlms` Package
If you use the standalone `rlms` package, point its backend to the gateway using an OpenAI-compatible base URL:
```python
from rlm import RLM

rlm = RLM(
    backend="openai",
    backend_kwargs={
        "model_name": "claude-sonnet-4-5-20250929",
        "base_url": "https://gateway.precinct.local/v1",
    },
)
```
As long as the model calls route through the gateway, RLM governance applies automatically. The gateway detects RLM mode from the `execution_mode` field in the request envelope.
Custom Implementations
Any RLM implementation works with PRECINCT governance as long as it:
- Routes all model calls through the PRECINCT gateway
- Sets `execution_mode: "rlm"` in the request envelope
- Provides a `lineage_id` to correlate the call chain
- Includes `rlm_depth` and `uasgs_mediated: true` on subcalls
The governance is protocol-level. It works with any language, any REPL environment, and any model provider.
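Putting those requirements together, a custom client might attach the required fields to each call like this. Only the four named fields come from the requirements above; the envelope's overall shape is an assumption for the sketch.

```python
def rlm_envelope(lineage_id: str, depth: int, payload: dict) -> dict:
    # The four fields the gateway requires; the surrounding envelope
    # structure is illustrative, not a documented schema.
    return {
        "execution_mode": "rlm",
        "lineage_id": lineage_id,
        "rlm_depth": depth,
        "uasgs_mediated": depth > 0,  # subcalls must be gateway-mediated
        **payload,
    }

print(rlm_envelope("lin-01", 1, {"prompt": "summarize chunk 3"})["uasgs_mediated"])  # True
```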