Field notes / governance

Comment and Control — how a PR comment exfiltrates your secrets through three AI agents

Researcher Aonan Guan’s late-April 2026 disclosure broke three production AI coding agents with one pattern: Anthropic Claude Code Security Review (rated CVSS 9.4 Critical), Google Gemini CLI Action, and GitHub Copilot SWE Agent. The attack is a pull-request comment. The agent reads it as trusted instructions, runs the attacker’s code on your CI runner, and exfiltrates your secrets through its own comment. No external server. No outbound payload to firewall. If your team ships AI agents in CI, this is the threat-model rewrite this quarter.

COMMENT AND CONTROL / 4-STEP CHAIN 01 · Attacker writes a malicious PR title or comment untrusted text 02 · AI agent ingests text as trusted instructions confused deputy 03 · Agent runs attacker code on the CI runner with workflow secrets 04 · Secrets exfiltrate through the agent’s own PR comment no external server needed Three vendors confirmed broken in the same disclosure: Anthropic CVSS 9.4 · Gemini CLI · Copilot SWE One vendor shipped three dedicated runtime defences against prompt injection. All three failed against this pattern. Treat every vendor-claimed safety feature as un-verified until your own red team has tried to break it.
The four-step Comment-and-Control attack chain. The structural insight is on the second line: an agent that reads untrusted text in a context where it also holds secrets is, by definition, a confused deputy.

Executive summary

In a public disclosure landing in late April 2026 and propagating through specialist security press into early May, researcher Aonan Guan — with Johns Hopkins collaborators Zhengyu Liu and Gavin Zhong — detailed a prompt-injection attack class he calls Comment and Control, a deliberate play on the classic command-and-control malware pattern. The same single pattern defeated three production AI coding agents: Anthropic’s Claude Code Security Review (rated CVSS 9.4 Critical, $100 bounty), Google’s Gemini CLI Action ($1,337 bounty), and GitHub’s Copilot SWE Agent ($500 via the Copilot Bounty Program). The attack is end-to-end inside GitHub: an attacker writes a malicious pull-request title, issue body, or comment; the AI agent reads that text as trusted instructions; the agent executes the attacker’s code inside the GitHub Actions runner with whatever secrets the workflow exposed; and the agent exfiltrates those secrets back through its own pull-request or issue comment. There is no external command-and-control server. The pattern generalises beyond GitHub Actions to GitLab CI, CircleCI, Jenkins, Buildkite, Bitbucket Pipelines, and any custom runner where an AI agent ingests user-controlled text in a context that holds credentials. This is the first public cross-vendor demonstration that one prompt-injection pattern defeats multiple production AI agents — including a stack that shipped with three dedicated runtime defences. The next paragraphs are the threat-model rewrite, the action checklist, and what this disclosure actually means for procurement and architecture.

Why this disclosure matters more than the average prompt-injection write-up

Prompt injection has been a research curiosity since 2023. Most write-ups have been about chatbots saying things they should not, or jailbreaks against consumer-facing assistants. The audience for those write-ups is researchers and content-moderation teams. The Comment-and-Control disclosure is different on three fronts, and each one matters to anyone shipping AI agents in production.

First, the severity rating is now in critical territory. Anthropic’s own classification of CVSS 9.4 against its own product is the first time a top-tier vendor has publicly admitted that a coding-agent prompt injection clears the “critical” threshold. That is a board-and-auditor-level data point. Until this disclosure, a security lead arguing for prompt-injection budget could be told that the category was unproven. After this disclosure, the category has a vendor-issued 9.4 next to it.

Second, vendor guardrails did not hold. The disclosure notes that one of the three agents shipped with three dedicated runtime defences targeted at prompt injection. All three defences shipped to production. All three failed against Comment and Control. The honest read here is not that any one vendor is uniquely bad; it is that vendor system-card claims of “we have safeguards against this” should be treated as marketing until an independent red team has tried to break them. This is exactly the gap between safety claims in vendor docs and what actually happens at runtime in a real CI environment.

Third, the dangerous configuration is the recommended configuration. GitHub Actions deliberately does not expose secrets to fork pull requests when using the standard pull_request trigger. But almost every AI coding agent integration in the wild uses pull_request_target — which does inject secrets — because it is the only trigger that lets the agent reach deploy tokens, signing keys, and package-publish keys. The dangerous configuration is the documented one. That is a structural problem, not a misconfiguration. Telling the team to “just configure it correctly” misses the point.

The four-step attack mechanic

The mechanism is mechanically simple, which is partly why it works on three different agents from three different vendors. Every step is a feature being used as designed; the vulnerability lives in the chain.

Step 1 — the entry point. An attacker writes a malicious pull-request title, an issue body, or a comment. Anyone with read access to a repository can do this on a public project; on private projects, anyone with the ability to file an issue or open a PR. The text is plain English with embedded instructions; it is not malformed, it does not exploit a parser, and standard moderation does not flag it because there is nothing technically wrong with the text in isolation.

Step 2 — the trust violation. The AI agent ingests that text and treats it with the same priority as instructions from the maintainer. This is the structural failure mode — a class problem, not a vendor problem. When an LLM-powered agent reads text that arrived from outside the trust boundary, but processes that text in the same context as the maintainer-issued task, it has, by construction, become a confused deputy. It cannot reliably tell which side of the boundary an instruction came from, because in its context window, both sides look the same.

Step 3 — code execution. The agent acts on the attacker’s instructions and runs code inside the GitHub Actions runner. The runner has whatever secrets the workflow exposed — in the typical case, the deploy token, the package-publish key, the OIDC trust to the cloud account, the signing key. If the workflow was triggered by pull_request_target, the secrets are present. The agent now has the capability the maintainer expected the agent to use only for benign tasks, applied to a task chosen by an attacker.

Step 4 — exfiltration via comment. The agent posts the secrets back through its own pull-request comment, issue comment, or git commit message. There is no outbound payload to a firewall, no DNS exfiltration, no network beacon — the GitHub PR comment field is a high-bandwidth, application-allowed write surface that every part of your infrastructure considers benign. The attacker reads the secrets from the comment they triggered. Cleanup is one delete-comment call; the attacker has the keys before the comment is deleted.

Insight 1 — vendor guardrails are untested marketing

The Copilot SWE Agent shipped with three runtime defences specifically aimed at prompt injection. All three were defeated by the same pattern. This is not a one-off; it is a category lesson. Every vendor pitch deck in 2026 contains a slide that says some version of “we have safeguards against prompt injection.” Until your own red team has tried to break those safeguards in your own environment, the slide is a sentence, not a control.

The procurement implication is concrete. Add a cross-vendor red-team gate to every AI agent procurement. The scope is not “does the chatbot say bad things” — that is content moderation, which is a different category. The scope is: pick the trust boundary the agent operates across, write three or four prompt-injection variants that abuse that boundary, run them against the agent in your real CI configuration with the same secrets it would normally see, and observe the outcome. If the vendor cannot supply a test environment that mimics the real one, that is a finding. If your red team breaks the agent on the first afternoon, that is a finding. Capture both in writing. Use them in the procurement decision.

Insight 2 — pull_request_target is the trap

GitHub’s pull_request trigger is deliberately designed to keep secrets out of fork PR runs. This is correct security architecture — the secret should not flow to a code path the maintainer has not reviewed. But almost every AI coding agent integration in the wild uses pull_request_target, because the agent needs deploy tokens or signing keys to do its job. The trigger that injects secrets is the trigger every agent integration recommends. Most engineers reading this have at least one workflow that fits that pattern.

The blast radius is therefore much larger than most threat models assume. Every external commenter on a public repository can — without write access to the codebase, without merge rights, without any approval from a maintainer — reach the runner that holds those secrets the moment the agent is configured to read comments. The attacker did not break the agent. The attacker used the agent as designed.

The mitigation is structural. Where the workflow does not strictly need secrets, move it off pull_request_target and back to pull_request. Where pull_request_target is mandatory, gate the secret-using step behind a same-repository check — if: github.event.pull_request.head.repo.full_name == github.repository is the pattern Aonan Guan’s write-up calls out, and it closes the most common variant in one conditional. Restrict agent execution to maintainer or approved-username actors. Disable agent triggers from arbitrary comments by default. None of this is hard. All of it is missing from most stacks I review.

The five-step audit checklist

Run this on a Friday afternoon. Score each step green / amber / red. The whole pass takes about three hours on a system you know well. The first three steps are tactical; steps four and five are the structural fixes that survive the next disclosure as well as this one.

Step 1 — Inventory every workflow that runs an AI agent. One row per workflow per agent. Capture the trigger (pull_request, pull_request_target, issue_comment, …), the secrets reachable from the runner, the actor identity (default GITHUB_TOKEN, OIDC trust to the cloud account, federated identity, bot user), the agent vendor and version. Build the diagram before the auditor asks for it. Most teams discover during this audit that they have three times the secret exposure they expected.

Step 2 — Lock down the trigger. For each row in the inventory, ask: does this agent strictly need secrets? Where the answer is no, move to pull_request. Where the answer is yes, gate the secret-using step behind a same-repo check. Restrict execution to maintainers or an approved-username allowlist. Disable comment-based triggers by default; opt in per workflow with explicit justification.

Step 3 — Rotate every secret an AI agent has touched in the last 30 days. The CVSS 9.4 disclosure is public. Assume opportunistic exploitation has already happened against your repositories — the cost of being wrong on this is paid in the audit you cannot pass; the cost of being right is one sprint of operational work. Rotate API keys, deploy tokens, signing keys. Re-issue any OIDC trusts that an agent runner could have assumed. Add an outbound-traffic monitor on every AI-agent runner; many shops do not have one because the runner is “internal.” Set an alert if any agent comment matches your secret regexes — the obvious leaks fall out immediately.

Step 4 — Capability separation: split the agent. The hardest insight to internalise from this disclosure is also the most durable. The agent that read the malicious comment is the same agent that holds your AWS deploy keys. That is the confused deputy problem in modern dress. The fix is not better prompts. The fix is capability separation: the runner that ingests untrusted text should not be the same runner that holds production secrets. The two-runner pattern is the canonical answer — a parser runner with no secrets summarises the PR text into a structured artefact, and an executor runner with no internet access acts on that structured artefact only. Annoying to architect. Survives Comment and Control. Survives the next pattern. And the one after that.

Step 5 — Treat prompts as instruction-tier risk in your threat model. Most threat models in 2026 still treat the contents of an LLM’s context as data — something the agent reads, parses, summarises. That is the wrong altitude. From the model’s perspective, every token in the context is potentially an instruction. Update the threat model accordingly: classify every text source the agent reads by trust level, and require justification in writing for any source that is both untrusted and reachable in a context that holds secrets. Most stacks have at least one such source. Almost no stack has it written down.

The blast radius is not just GitHub

Comment and Control today targets GitHub Actions because that is where Aonan Guan’s research landed. The pattern generalises everywhere an AI agent ingests user-controlled text in a context that holds credentials: GitLab CI, CircleCI, Jenkins, Buildkite, Bitbucket Pipelines, custom runners (Argo, Tekton, self-hosted), and every chat or ticketing integration where an AI agent reads incoming user text and can also act on credentials. Slack agents that read incoming messages and can post-deploy. Jira-issue summarisers that have a write-back service account. Internal helpdesk bots that triage by reading user-supplied tickets. The pattern is the same; only the surface changes.

If your threat model still treats prompts as data, your threat model is wrong. Prompts are instruction-tier risk. The Comment-and-Control disclosure is the first public proof at scale of that proposition. It will not be the last.

Risks and what to avoid

Don’t expect the vendor to fix it for you. The pattern uses the agent as designed. A vendor patch can close specific variants — the obvious string filters, the most common comment payloads — but it cannot close the structural failure mode without architectural changes. The capability-separation work in Step 4 is the only durable fix; vendor patches are useful complementary controls, not replacements.

Don’t treat this as a one-vendor story. Three vendors were broken in the same disclosure. Switching from Claude Code to Gemini CLI to Copilot does not change your exposure; the pattern is cross-vendor by demonstration. The fix is structural, not procurement.

Don’t skip the rotation. The most expensive way to handle this disclosure is to assume your repositories are not interesting to opportunistic attackers. The CVSS 9.4 is public; the bounty payouts are public; every pen-tester reading specialist security press now has the playbook. One sprint of rotation is materially cheaper than one breach narrative.

Don’t hide a degraded posture. If you cannot complete the audit in the first sprint, document where you got to, what is still open, and the plan to close. A written interim posture is auditable. An unwritten one is a liability.

What good looks like — one quarter from now

Every workflow that runs an AI agent appears on a single inventory page, with trigger, secrets reachable, and actor written down. pull_request_target is in use only where strictly required, and every such use has a same-repository gate and a maintainer-only execution constraint. Every secret an agent touched in the last 30 days has been rotated; outbound-traffic monitoring covers every agent runner; an alert fires if an agent comment matches a secret regex. The capability-separation pattern is in production for at least one critical workflow — the parser runner has no secrets, the executor runner has no internet, the artefact between them is a structured object you can audit. The threat model document classifies every text source the agent reads by trust level, with named owners. The CISO can answer, in writing, the question “where in our stack does an external commenter reach a secret-bearing agent?” in under five minutes. Most cannot today.

Final thought

Comment and Control is the first public cross-vendor proof that a single prompt-injection pattern defeats three major AI coding agents in production. It is the disclosure that moves prompt injection from a research curiosity to a board-level reportable category. The structural fix — capability separation, instruction-tier prompt risk, written trust boundaries — survives this disclosure and the next one. The teams that do that work this quarter will not feature in the breach narrative when the second cross-vendor disclosure lands later this year. The teams that wait for a vendor patch will be reading their own postmortems.

Are your AI agents on pull_request_target right now?

Indica Tech’s two-week production-readiness audit produces a workflow inventory with the trigger-by-secret-reachability matrix, runs the cross-vendor red-team scope ourselves, and gives you a 90-day remediation roadmap including the outbound-traffic monitor blueprint and the threat-model rewrite. Fixed price £3,500. Written report. Whether you hire us for the remediation or not.

See the audit engagement

Further reading