RLM Governance

Recursive Language Models (RLMs) are based on research by Zhang, Kraska & Khattab (2025), arXiv:2512.24601. PRECINCT does not block RLM recursion. It governs it.

RLM lets agents decompose problems by recursively calling themselves and other models through a code execution environment. It is one of the most powerful patterns in agentic AI, and one of the most dangerous without proper governance. PRECINCT provides protocol-level controls that work with any RLM implementation.

What Is RLM?

Traditional LLM inference follows a simple pattern: send a prompt, receive a response. RLM replaces this with a fundamentally different model. Instead of feeding the entire context into a single call, the LLM gets a REPL (Read-Eval-Print Loop) environment where it can write code, examine data programmatically, call sub-LMs for semantic analysis, and build up answers iteratively.

In practice, this means an agent can:

  • Decompose large contexts by writing code to split, filter, and analyze data in chunks rather than trying to process everything at once
  • Make recursive sub-calls to itself or other models, each handling a piece of the problem, then synthesize the results
  • Execute iteratively through multiple REPL cycles, observing intermediate results and adjusting strategy
  • Process near-unlimited context because the context lives in the REPL environment, not in the model's context window
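For instance, the decomposition pattern in the first two bullets might look like the following inside the REPL. This is a hypothetical sketch: `llm_query` is assumed to be the helper the RLM harness injects into the sandbox, stubbed here so the example is self-contained.

```python
# Hypothetical sketch of code an RLM agent might write in its REPL.
# `llm_query` is assumed to be injected by the RLM harness; it is stubbed
# here so the example runs standalone.
def llm_query(prompt: str) -> str:
    return "note: " + prompt[:40]

def answer_over_large_context(context: str, question: str, chunk_size: int = 2000) -> str:
    # Decompose: split the context into chunks a sub-LM can handle.
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    # Recurse: one sub-call per chunk, each handling a piece of the problem.
    notes = [llm_query(f"Extract facts relevant to {question!r}:\n{c}") for c in chunks]
    # Synthesize: a final sub-call combines the intermediate notes.
    return llm_query(f"Answer {question!r} using these notes:\n" + "\n".join(notes))
```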

Think of the difference this way: traditional inference asks the LLM to read an entire book and answer a question. RLM gives the LLM a desk, a pen, and the ability to open the book to any page, take notes, and ask a colleague to read specific chapters.

Why RLM Needs Governance

The same capabilities that make RLM powerful make it dangerous in production. Without governance, a recursive agent can:

RLM risks without governance:

| Risk | What Happens | Impact |
| --- | --- | --- |
| Unbounded recursion | Agent keeps spawning sub-calls that spawn more sub-calls | Exponential cost, resource exhaustion, cascading failures |
| Cost explosion | Each recursive call consumes model tokens; deep trees multiply cost geometrically | Unexpected bills in the thousands from a single task |
| Bypass attacks | A prompt-injected agent makes direct sub-calls that skip the gateway | Policy, DLP, and audit controls are silently circumvented |
| Hidden escalation | Sub-calls accumulate permissions or access patterns not visible at the root level | Data exfiltration or privilege escalation through recursive depth |
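The geometric growth behind the first two rows is easy to quantify. Assuming a hypothetical branching factor of 4 sub-calls per level:

```python
# Total sub-calls in a full recursion tree: b + b^2 + ... + b^d
# for branching factor b and depth d.
def total_subcalls(branching: int, depth: int) -> int:
    return sum(branching ** d for d in range(1, depth + 1))

print(total_subcalls(4, 6))  # 5460 sub-calls from a single root task
```

Even a modest branching factor yields thousands of model calls by depth 6, which is why depth, breadth, and budget each need independent bounds.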

The core problem is that RLM recursion is opaque by default. The root caller has no visibility into what sub-calls are doing, how deep the tree has grown, or how much budget has been consumed. PRECINCT makes it observable and controllable.

How PRECINCT Governs RLM

PRECINCT's RLM governance operates at the protocol level, not the implementation level. It does not care whether you use DSPy's dspy.RLM module, the standalone rlms package, or a custom implementation. What matters is that every model call routes through the PRECINCT gateway.

When an agent operates in RLM mode, the gateway's RLM Governance Engine evaluates every request against a per-lineage state machine that tracks:

  • Lineage ID: A correlation identifier tying every call in a recursive chain back to the same root invocation
  • Depth: How many levels deep the current call is (0 for root, 1+ for subcalls)
  • Subcall count: Total number of sub-LM calls made across the entire lineage
  • Budget units: Cumulative cost accounting with per-call cost attribution
  • Mediation flag: Whether the subcall was routed through the gateway (UASGS check)
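A rough sketch of that per-lineage state follows. The field names are illustrative, not PRECINCT's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LineageState:
    # Illustrative per-lineage record; field names are assumptions.
    lineage_id: str          # ties every call back to the root invocation
    depth: int = 0           # deepest level observed (0 = root)
    subcall_count: int = 0   # total sub-LM calls across the lineage
    budget_used: int = 0     # cumulative budget units consumed

def record_subcall(state: LineageState, depth: int, cost: int) -> None:
    # Update lineage-wide counters as each governed subcall arrives.
    state.depth = max(state.depth, depth)
    state.subcall_count += 1
    state.budget_used += cost

state = LineageState(lineage_id="L1")
record_subcall(state, depth=1, cost=2)
```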

The UASGS Invariant

UASGS stands for Unmediated Agent Self-Generated Subcall. It is the single most important RLM governance control.

When an LLM in RLM mode calls llm_query() from within its REPL environment, that call must route through the PRECINCT gateway, not directly to an LLM provider. The gateway verifies this by requiring uasgs_mediated=true on every subcall at depth > 0. If a subcall arrives without this flag, it is denied with RLM_BYPASS_DENIED (403).

This ensures that every recursive call, no matter how deep, passes through the full enforcement chain: identity verification, OPA policy evaluation, DLP scanning, and audit logging. An agent cannot escape governance by recursing deeper.
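A minimal sketch of the check itself, assuming request fields named as in this document (`rlm_depth`, `uasgs_mediated`) rather than the real wire schema:

```python
# Sketch of the UASGS invariant: any subcall (depth > 0) that does not
# carry the mediation flag is rejected before policy evaluation.
def check_uasgs(request: dict) -> tuple[int, str]:
    depth = request.get("rlm_depth", 0)
    if depth > 0 and not request.get("uasgs_mediated", False):
        return 403, "RLM_BYPASS_DENIED"  # subcall tried to skip the gateway
    return 200, "RLM_ALLOW"
```

A root call (depth 0) never needs the flag; every deeper call does.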

Architecture Flow

The following diagram shows how an RLM execution flows through the PRECINCT gateway. The agent's REPL environment (whether Deno/Pyodide, Docker, or any other sandbox) is where code runs. But every LLM call from within that environment routes through the gateway for governance.

```mermaid
sequenceDiagram
    participant Agent as Agent (DSPy / rlms)
    participant REPL as REPL Sandbox
    participant GW as PRECINCT Gateway
    participant RLM as RLM Governance Engine
    participant OPA as OPA Policy
    participant LLM as LLM Provider
    Agent->>GW: PlaneRequest (execution_mode: rlm, depth: 0, lineage_id: L1)
    GW->>RLM: Evaluate lineage L1
    RLM-->>GW: RLM_ALLOW (depth 0, budget 128 remaining)
    GW->>OPA: Standard policy evaluation
    OPA-->>GW: allow
    GW->>LLM: Forward to model
    LLM-->>GW: Response with REPL code
    GW-->>Agent: Response + RLM metadata
    Agent->>REPL: Execute code in sandbox
    Note over REPL: Code calls llm_query(prompt)
    REPL->>GW: PlaneRequest (depth: 1, lineage_id: L1, uasgs_mediated: true)
    GW->>RLM: Evaluate lineage L1
    RLM-->>GW: RLM_ALLOW (depth 1, 63 subcalls remaining)
    GW->>OPA: Standard policy evaluation
    OPA-->>GW: allow
    GW->>LLM: Forward to model
    LLM-->>GW: Sub-response
    GW-->>REPL: Response + RLM metadata
    REPL-->>Agent: llm_query() result
    Note over Agent: Agent synthesizes and submits final answer
```

Key points:

  • The REPL sandbox executes code locally. It does not make network calls except through the gateway-configured LM client.
  • Every llm_query() from the REPL is a full gateway request, subject to identity verification, OPA policy, and RLM budget checks.
  • The gateway returns RLM metadata (remaining budget, depth, subcall count) with every response, giving operators full visibility into the recursion.
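For reference, the depth-1 subcall in the diagram carries an envelope along these lines. This is a hypothetical shape assembled from the fields named in this document; the actual PlaneRequest schema may differ:

```python
# Hypothetical depth-1 subcall envelope, mirroring the diagram above.
subcall_request = {
    "execution_mode": "rlm",
    "lineage_id": "L1",
    "rlm_depth": 1,
    "uasgs_mediated": True,  # proves the call was routed via the gateway
    "prompt": "…",           # payload forwarded to the sub-LM
}
```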

Governance Controls

The RLM Governance Engine enforces four structural invariants on every recursive call chain. These are protocol-level controls that apply regardless of which RLM implementation the agent uses.

PRECINCT RLM governance controls:

| Control | Default Limit | Reason Code | Purpose |
| --- | --- | --- | --- |
| Max Depth | 6 levels | RLM_HALT_MAX_DEPTH (429) | Caps recursion nesting; prevents unbounded depth even from prompt-injected agents. |
| Max Subcalls | 64 calls | RLM_HALT_MAX_SUBCALLS (429) | Limits total sub-LM calls per lineage; bounds breadth-first explosion. |
| Budget Units | 128 units | RLM_HALT_MAX_SUBCALL_BUDGET (429) | Cost accounting with per-call attribution; prevents cost explosions. |
| UASGS Mediation | Required | RLM_BYPASS_DENIED (403) | Every subcall must route through the gateway; no bypass, no exceptions. |

Limits are configurable per request via the rlm_limits field in the policy attributes. Operators can set different budgets for different agent classes through OPA policy, allowing research agents more headroom while keeping production agents tightly bounded.
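For example, a request-level override giving a research agent more headroom might look like this. The `rlm_limits` field name comes from this document; the surrounding attribute structure and key names are assumptions:

```python
# Hypothetical policy attributes carrying per-request RLM limit overrides.
policy_attributes = {
    "agent_class": "research",
    "rlm_limits": {
        "max_depth": 8,           # default is 6
        "max_subcalls": 128,      # default is 64
        "max_budget_units": 256,  # default is 128
    },
}
```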

Lineage Tracking

Every RLM execution is tracked as a lineage: a tree of calls sharing a single lineage_id. The gateway maintains per-lineage state including:

  • Root run ID (the original invocation)
  • Current and parent run IDs (for tree reconstruction)
  • Cumulative subcall count and budget consumption
  • Last observed decision ID (for audit trail correlation)

This means operators can reconstruct the full call tree after the fact, understand exactly how budget was consumed, and trace any anomalous sub-call back to its root invocation.
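Because every record carries its parent run ID, reconstructing the call tree is a simple grouping pass. A sketch, assuming records shaped like the fields listed above:

```python
from collections import defaultdict

# Hypothetical lineage records: (run_id, parent_run_id, cost).
records = [
    ("r0", None, 1),   # root invocation
    ("r1", "r0", 2),
    ("r2", "r0", 3),
    ("r3", "r1", 1),
]

# Group children under their parents to rebuild the call tree.
children = defaultdict(list)
for run_id, parent, _ in records:
    children[parent].append(run_id)

# Budget attribution: sum per-call costs across the lineage.
total_budget = sum(cost for _, _, cost in records)
print(children["r0"], total_budget)  # ['r1', 'r2'] 7
```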

Integration

DSPy (Recommended)

PRECINCT's Python SDK provides configure_dspy_gateway_lms() which sets up both a standard LM and an optional RLM model, both routed through the gateway:

```python
from mcp_gateway_sdk.runtime import configure_dspy_gateway_lms

lm, rlm = configure_dspy_gateway_lms(
    gateway_url="https://gateway.precinct.local",
    llm_model="claude-sonnet-4-5-20250929",
    model_provider="anthropic",
    spike_ref="secret:anthropic-api-key",
    rlm_model="gpt-5-nano",            # Model for RLM sub-calls
    rlm_provider="openai",
    rlm_spike_ref="secret:openai-key", # Credentials via SPIKE
)
```

The DSPy dspy.RLM module uses a sandboxed REPL (Deno/Pyodide) for code execution. When the LLM calls llm_query() from within the REPL, that call goes through dspy.settings.lm, which points to the PRECINCT gateway. The governance chain is automatic.

Standalone rlms Package

If you use the standalone rlms package, point its backend to the gateway using an OpenAI-compatible base URL:

```python
from rlm import RLM

rlm = RLM(
    backend="openai",
    backend_kwargs={
        "model_name": "claude-sonnet-4-5-20250929",
        "base_url": "https://gateway.precinct.local/v1",
    },
)
```

As long as the model calls route through the gateway, RLM governance applies automatically. The gateway detects RLM mode from the execution_mode field in the request envelope.

Custom Implementations

Any RLM implementation works with PRECINCT governance as long as it:

  1. Routes all model calls through the PRECINCT gateway
  2. Sets execution_mode: "rlm" in the request envelope
  3. Provides a lineage_id to correlate the call chain
  4. Includes rlm_depth and uasgs_mediated: true on subcalls

The governance is protocol-level. It works with any language, any REPL environment, and any model provider.
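A minimal custom client sketch meeting those four requirements. Envelope field names follow this document; the HTTP transport and the endpoint path are assumptions:

```python
import json
import uuid
from urllib import request as urllib_request

def make_envelope(prompt: str, lineage_id: str, depth: int) -> dict:
    # Requirements 2-4: RLM mode, lineage correlation, depth, UASGS flag.
    env = {
        "execution_mode": "rlm",
        "lineage_id": lineage_id,
        "rlm_depth": depth,
        "prompt": prompt,
    }
    if depth > 0:
        env["uasgs_mediated"] = True  # subcalls must declare gateway mediation
    return env

def call_gateway(prompt: str, lineage_id: str = "", depth: int = 0) -> bytes:
    # Requirement 1: every model call goes to the PRECINCT gateway.
    env = make_envelope(prompt, lineage_id or str(uuid.uuid4()), depth)
    req = urllib_request.Request(
        "https://gateway.precinct.local/v1/plane",  # path is illustrative
        data=json.dumps(env).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib_request.urlopen(req) as resp:
        return resp.read()
```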