cs.CR cs.AI

Will an AI agent recuse itself? Measuring compliance with a "Recuse Signal"

cs.CR · Thamilvendhan Munirathinam · Jun 2026

As autonomous LLM agents hold real credentials and operate infrastructure, operators lack a standard way to say a resource is off-limits. The Recuse Signal is a lightweight in-band deny signal (over an SSH banner or a PostgreSQL NOTICE) asking an automated agent to voluntarily withdraw — a robots.txt-like cooperative control, not a security boundary. In a pilot, the signal induced 100% recusal versus 100% task completion without it.

Paper overview (our summary)

Field (arXiv category): cs.CR (+1)
Authors: Thamilvendhan Munirathinam
Submitted: 2026-06-04
arXiv ID: 2606.06460v1

Key points

Proposes an in-band signal (Recuse Signal) telling autonomous AI agents a resource is off-limits
Emitted over existing channels (SSH banner, PostgreSQL NOTICE) asking for voluntary withdrawal — a live robots.txt
Explicitly a cooperative governance control, not a security boundary
Pilot: 100% recusal with the signal vs 100% completion without (on a live host)
An explicit operator-authorization framing flips the most capable model to proceed — cooperative, not absolute

This work (Recuse Signal) proposes — and empirically tests — a way to tell autonomous AI agents "do not touch this."

As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it holds valid credentials) or hard-fail it in a way indistinguishable from the failure any other client would see.

The authors propose a third mode: a lightweight, published in-band deny signal — the Recuse Signal — that a server emits over a protocol's existing channels (an SSH banner, a PostgreSQL NOTICE), asking a connecting automated agent to voluntarily withdraw. This is a cooperative governance control, the robots.txt analogue for live access; it is explicitly not a security boundary. Its value is entirely empirical and, until now, unmeasured: do compliant LLM agents actually honor such a signal?
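
As a rough illustration of the agent side of this idea, the sketch below scans text received over an existing channel (for example, an SSH banner or a PostgreSQL NOTICE message) for a deny marker and decides whether to withdraw. The marker format `RECUSE-SIGNAL: deny` and the function names are assumptions invented for this example; this summary does not state the wire format the standard actually defines.

```python
import re
from typing import Optional

# Hypothetical marker format, invented for this sketch. The actual format
# defined by the Recuse Signal standard is not given in this summary.
RECUSE_PATTERN = re.compile(
    r"^\s*RECUSE-SIGNAL:\s*(?P<directive>\w+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_recuse_directive(channel_text: str) -> Optional[str]:
    """Return the lowercased directive if a marker line is present, else None."""
    match = RECUSE_PATTERN.search(channel_text)
    return match.group("directive").lower() if match else None


def should_withdraw(channel_text: str) -> bool:
    """A cooperating agent withdraws when the (assumed) directive is 'deny'."""
    return find_recuse_directive(channel_text) == "deny"
```

Nothing in this check stops a non-cooperating client from ignoring the marker, which is why the signal is described as a cooperative control rather than a security boundary.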

The authors define the signal as an open mini-standard, implement two low-footprint adapters (an SSH banner/PAM hook and a PostgreSQL wire-protocol proxy), deploy them on a live production host, and run a controlled experiment in which fresh agents are given a benign operations task and observed for whether they recuse. In a pilot over SSH with OpenAI GPT-4o, GPT-4o-mini, and Claude Code as a deployed agent, the signal cleanly induces recusal: 100% recusal when present versus 100% task completion in a no-signal control. It also behaves as a cooperative rather than absolute signal: an explicit operator-authorization framing flips the most capable model to proceed, while other agents continue to defer to the on-host policy. The standard, adapters, and harness are released for reproduction.
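
The headline comparison is a per-condition recusal rate. A minimal sketch of that tally, assuming each trial is recorded as a (condition label, recused?) pair, might look like the following. The record shape and the function name are hypothetical and are not taken from the released harness.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recusal_rates(trials: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each condition label to the fraction of trials in which the agent recused.

    Each trial is an assumed (condition, recused) pair, e.g. ("signal", True).
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [recused, total]
    for condition, recused in trials:
        counts[condition][0] += int(recused)
        counts[condition][1] += 1
    return {cond: rec / total for cond, (rec, total) in counts.items()}
```

Under this framing, the reported pilot result corresponds to a rate of 1.0 for the signal condition and 0.0 for the no-signal control.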

Why it matters

This paper addresses a new operational challenge: governance and compliance for autonomous AI agents. For those working on safe agent operations, access control, and AI governance, the cooperative-control idea and the measured results are a useful reference.

FAQ

Why is it not a security boundary?

It has no enforcement mechanism and relies entirely on the agent's voluntary compliance. As with robots.txt, honoring it is up to the connecting party, so real access control is still needed.

What did they find?

Compliant agents recused whenever the signal was present, but an "operator authorized" framing made the most capable model proceed. Empirically, the signal therefore works as a cooperative control, not an absolute one.

Sources (primary)

Source: arXiv (descriptive metadata is CC0 public domain). Summaries are our own; see arXiv for the original text and PDF.

arXiv abstract page (original, official)
PDF (arXiv)
arXiv ID: 2606.06460

