
Your AI agents are ghost users in your IAM stack

You probably have 10–20 AI agents in production. None of them appear in your quarterly access review. Microsoft just shipped Agent 365 Runtime Protection because every prod AI org now has agents acting as ghost users in its own IAM stack. If your security team is reviewing humans but not agents, you have privileged service accounts running 24/7 that nobody is watching. This is the five-gate hygiene checklist, the failure modes underneath each one, and what to do this week.

The gap: prod AI org in 2026
18 agents per prod AI org (typical): inventory baseline
1 shared service account they all use: ai-svc, .env, two repos
0 in the last quarterly access review: humans only
0 with documented offboarding when the engineer leaves
Most orgs hit zero on three of four. Microsoft shipped a runtime control plane because the gap is now product-line-shaped. Treat agents as identities, not as code.
The shape of the agent-identity gap on the typical 2026 production AI estate. The five gates below close it.

Executive summary

The 2026 production AI org has 10–20 agents in active service: cron jobs, Slack bots, internal automations, customer-facing assistants. They authenticate, call APIs, read and write customer data, and run on a schedule. They behave like employees. Most teams govern them like code — static review at deploy, maybe a secret scan, then forgotten. Microsoft’s Agent 365 Runtime Protection, in public preview from Apr 30 2026 with a Q3 GA, is a market-level admission that this gap exists. The five gates below are the upstream hygiene every CTO should land before bolting on a runtime product. Most orgs start at 1/5. Three weeks of disciplined work gets a production-minded team to 5/5.

The business problem

Your security team reviews humans every quarter. Joiners. Movers. Leavers. Privilege escalations. Shared-secret rotations. The runbook is mature, the tooling is mature, the audit trail satisfies the regulator. None of it covers the agents.

Meanwhile your engineering team has shipped, over the last twelve to eighteen months, a quiet portfolio of automations. The cron job called daily-summariser. The Slack bot that creates Jira tickets from incoming customer emails. The internal automation a contractor wrote in 2024 and forgot. The new agent that drafts responses in your support inbox. Each one has credentials. Each one runs on a schedule. Each one touches customer data. None of them appear on the access review.

This is not a hypothetical. I have walked into eight production AI estates this year, and every one had a version of the same finding.

Why traditional approaches fail

The reflex when a CTO hears this is to ask security for a tool. There are now several — including Microsoft’s — and they will help. But buying a runtime-protection product before fixing the upstream hygiene buys you a dashboard with one red light per agent, no clear owner, no remediation path, and a security team that now has more visibility than authority.

The other reflex is to extend the human IAM playbook unchanged. That doesn’t work either, because humans and agents have different lifecycles. A human onboards once and offboards once. An agent forks. The same engineer ships v1, v2, v3 of the same agent across four months; v1 still runs in a dusty corner of staging; v2 is in test; v3 is in prod. Three service accounts, one engineer, one purpose. Human IAM doesn’t model that.

The fix is not a tool. It is a discipline, ordered, that you apply once and maintain quarterly. The five gates below are that discipline.

Gate 1 — Inventory: you don’t know how many agents you run

The honest test is timed. If you cannot answer “how many agents do we have in production” in under five minutes, that is the finding. Not later, not after the meeting — in under five minutes, in writing, with names.

The fix is a one-page inventory. Every agent: name, owner, purpose, last-changed date, identity used, data classes touched. Maintained as a checked-in file or as a row per agent in the same system that holds your service catalogue. Not a spreadsheet on someone’s laptop. Not a Notion page nobody updates. The act of maintaining the inventory is what closes the gate; the artefact is just the byproduct.
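As a rough sketch of what a checked-in inventory row could look like, the snippet below models the six fields listed above and flags any row that is incomplete. The AgentRecord class, the missing_fields helper, and the sample agents are hypothetical names chosen for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class AgentRecord:
    """One inventory row per agent. Field names are illustrative."""
    name: str
    owner: str
    purpose: str
    last_changed: Optional[date]
    identity: str
    data_classes: List[str] = field(default_factory=list)


def missing_fields(record: AgentRecord) -> List[str]:
    """Return the names of fields that are empty or unset."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical sample data: one complete row, one orphaned agent.
inventory = [
    AgentRecord("daily-summariser", "alice", "Summarise daily tickets",
                date(2026, 3, 2), "ai-svc", ["customer-email"]),
    AgentRecord("contractor-2024-agent", "", "", None, "ai-svc", []),
]

# Map each incomplete agent to the fields nobody has filled in.
incomplete = {}
for record in inventory:
    gaps = missing_fields(record)
    if gaps:
        incomplete[record.name] = gaps

print(incomplete)
```

A check like this could run in CI against the checked-in file, so an agent with no owner fails the build rather than surfacing during an incident.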

The worst case is the contractor-2024 agent. It runs, it has credentials, it touches customer data, nobody knows it exists. The quarter you discover it during an incident is the quarter the regulator stops trusting your access controls.

Gate 2 — Identity model: one shared service account

The most common 2026 setup: three engineers shipped four agents, all four share one service account named ai-svc, the credentials live in a .env file in two repos. Probably also pasted in a Slack thread three months ago. This is the AI-era equivalent of the shared admin password.

The problem is not theoretical. When something breaks, you cannot tell which agent broke it. When credentials need rotating, you cannot rotate without coordinating across four code paths owned by three people. When the regulator asks who acted, the answer is “the service account” — which is no answer at all.

The fix is one identity per agent: a service principal, an IAM role, or a scoped API key. Name it after the agent, assign it to the engineer who owns the agent, and bind its lifecycle to the agent’s source repository. Migrate the agents that break the rule first. In my experience, the ones that share an account are also the ones with the most permissions, because the shared account accreted permissions across four different jobs.
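One way to find the rule-breakers, assuming the inventory already records which identity each agent authenticates as, is to group agents by identity and flag any identity with more than one user. The agent names and the shared_identities helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory rows: (agent name, identity it authenticates as).
agents = [
    ("daily-summariser", "ai-svc"),
    ("jira-ticket-bot", "ai-svc"),
    ("support-drafter", "ai-svc"),
    ("billing-reconciler", "agent-billing-reconciler"),
]


def shared_identities(rows):
    """Map each identity used by more than one agent to the agents using it."""
    by_identity = defaultdict(list)
    for agent, identity in rows:
        by_identity[identity].append(agent)
    return {ident: names for ident, names in by_identity.items() if len(names) > 1}


violations = shared_identities(agents)
for identity, names in violations.items():
    print(f"{identity} is shared by {len(names)} agents: {', '.join(names)}")
```

The output doubles as the migration worklist: every agent named under a shared identity needs its own principal.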

Gate 3 — Permission scope: implicit, not written

Most agent permissions are inherited rather than designed. The shared service account has read access to half your warehouse. Therefore the new agent has read access to half your warehouse. Nobody decided that; it’s just where the credentials lived.

The fix is a five-line written contract per agent. Which APIs can it call. Which data classes can it read. Which customer accounts can it act on. Which actions are logged with what fields. What is its kill switch. Five lines. Reviewed at deploy. Re-reviewed at the quarterly access review.

The point is the writing, not the ceremony. If the document doesn’t exist, the agent has every permission its credentials inherited. Once the document exists, you can grant capability-scoped least privilege — the agent has only the permissions its written contract requires, your IAM enforces it, your SIEM detects deviations, and the security team can answer the regulator’s questions in minutes rather than days.
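To make the contract checkable rather than decorative, it can be held as data and compared against what the credentials actually allow. The sketch below assumes you can export the granted API permissions for an identity as a set of strings; the AgentContract class, the permission names, and the excess_permissions helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    """The five lines of the written contract, as data. Names are illustrative."""
    allowed_apis: frozenset
    readable_data_classes: frozenset
    customer_accounts: frozenset
    logged_fields: tuple
    kill_switch: str


def excess_permissions(contract: AgentContract, granted_apis: set) -> set:
    """Return APIs the identity can call that the contract does not list."""
    return set(granted_apis) - contract.allowed_apis


contract = AgentContract(
    allowed_apis=frozenset({"jira:create_issue", "mail:read_inbox"}),
    readable_data_classes=frozenset({"customer-email"}),
    customer_accounts=frozenset({"support-queue"}),
    logged_fields=("agent", "workflow", "owner", "trace_id"),
    kill_switch="disable the agent's API key",
)

# Hypothetical export of what the inherited credentials actually allow.
granted = {"jira:create_issue", "mail:read_inbox", "warehouse:read_all"}

excess = excess_permissions(contract, granted)
print(sorted(excess))
```

Anything in the excess set is a permission nobody decided to grant, which is exactly the inherited access described above.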

Gate 4 — Logging: agent actions hidden under service-account

When the breach happens, your incident response is one-day archaeology. “Who called this API at 14:32?” The SIEM says the service account. Which agent? Which workflow? Which engineer owns it? Which customer was affected? Unknown. By the time you trace it, the regulator’s clock has been running for six hours.

The fix is named-identity logging end-to-end. Every agent action lands in your SIEM under the agent’s named identity, with workflow and owner attribution, with a trace ID that ties the action to the upstream request that caused it. Named identities in the SIEM cost an afternoon to wire. Anonymous service-account logging costs you the breach narrative the day it matters.
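A minimal sketch of what a named-identity audit record might contain, using only the Python standard library. The field names, the agent-audit logger name, and the sample values are assumptions; your SIEM's ingestion schema will dictate the real shape.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("agent-audit")


def audit_event(agent, workflow, owner, action, trace_id=None):
    """Build one audit record attributed to a named agent identity."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,        # the agent's own identity, not a shared account
        "workflow": workflow,
        "owner": owner,
        "action": action,
        # Reuse the upstream request's trace ID when one exists,
        # otherwise mint a fresh one so the action is still traceable.
        "trace_id": trace_id or str(uuid.uuid4()),
    }


def log_action(**kwargs):
    """Emit the record as one JSON line and return it."""
    event = audit_event(**kwargs)
    logger.info(json.dumps(event, sort_keys=True))
    return event


logging.basicConfig(level=logging.INFO, format="%(message)s")
event = log_action(agent="jira-ticket-bot", workflow="email-to-ticket",
                   owner="alice", action="jira:create_issue",
                   trace_id="req-1234")
```

With records shaped like this, "who called this API at 14:32?" becomes a filter on agent and trace_id rather than a day of archaeology.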

Bonus: once your agents are named in the SIEM, your existing detection rules start to work on them. The threshold-based alerts your security team built for human accounts — access from new geographies, unusual API patterns, off-hours data exfiltration — suddenly cover agents too, with no new tooling.

Gate 5 — Lifecycle: no offboarding for agents

Your offboarding runbook covers humans. Revoke SSO. Disable accounts. Rotate shared secrets. Review the access matrix. The runbook does not cover the agents the leaving engineer deployed. Those agents are still running, still authenticated, still touching customer data, still nobody’s job — until something breaks and the team realises a former colleague’s name appears in the most recent commit.

The fix is to add agents to the quarterly access review and to the leaver checklist. When an engineer leaves, every agent they own is rotated, retired, or transferred — same day. No exceptions, no “we’ll do it next week,” no waiting for the next sprint. The reason this gate is last is that it is the cheapest of the five and the one teams skip first; the reason it matters is that it’s where the breach narrative starts, every time.
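The leaver-checklist step can be sketched as a simple gate: list every agent the departing engineer owns, and block sign-off until each has a recorded decision. The owner mapping, the decision labels, and the helper names below are hypothetical.

```python
# Hypothetical inventory: agent name -> owner.
owners = {
    "daily-summariser": "alice",
    "jira-ticket-bot": "bob",
    "support-drafter": "alice",
}

# The three outcomes allowed when an owner leaves.
VALID_DISPOSITIONS = {"rotate", "retire", "transfer"}


def agents_owned_by(inventory, engineer):
    """Return the sorted names of agents owned by the given engineer."""
    return sorted(name for name, owner in inventory.items() if owner == engineer)


def unresolved_agents(inventory, engineer, decisions):
    """Return the engineer's agents that lack a valid disposition."""
    return [
        agent
        for agent in agents_owned_by(inventory, engineer)
        if decisions.get(agent) not in VALID_DISPOSITIONS
    ]


# Decisions recorded so far for alice's departure.
decisions = {"daily-summariser": "transfer"}

blocking = unresolved_agents(owners, "alice", decisions)
print(blocking)
```

An empty list is the checkbox on the leaver checklist; anything else is an agent about to become nobody's job.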

The five-gate hygiene checklist, in one place

Run this on a Friday afternoon. The whole pass takes about two hours on a system you know well, four on one you don’t. Score each gate green / amber / red. Three weeks of focused work moves a typical org from 1/5 to 5/5 — this is well-trodden ground.

Gate 1 — Inventory. Every agent, owner, identity, data scope listed. Maintained quarterly.

Gate 2 — Identity model. One service principal per agent. No shared accounts. No .env credentials.

Gate 3 — Permission scope. Written capability contract per agent. Capability-scoped least privilege enforced in IAM.

Gate 4 — Logging. Named-identity attribution in SIEM. Every action ties to a named agent and an upstream request.

Gate 5 — Lifecycle. Agents on the quarterly access review and the leaver checklist. Rotated, retired, or transferred when the owner leaves.
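If you want the Friday pass to produce a comparable number each quarter, the scoring can be captured in a few lines. This sketch assumes that only a green rating counts as a pass, which is one reasonable reading of the green / amber / red scale rather than a rule stated above; the gate keys and sample ratings are hypothetical.

```python
GATES = ("inventory", "identity_model", "permission_scope", "logging", "lifecycle")
RATINGS = {"green", "amber", "red"}


def gates_passed(results):
    """Count green gates; raise if a gate is missing or mis-rated."""
    for gate in GATES:
        if results.get(gate) not in RATINGS:
            raise ValueError(f"gate {gate!r} needs a rating of green, amber, or red")
    # Assumption: amber is work in progress and does not count as a pass.
    return sum(1 for gate in GATES if results[gate] == "green")


# Hypothetical first-pass result for an org that has only built the inventory.
results = {
    "inventory": "green",
    "identity_model": "red",
    "permission_scope": "red",
    "logging": "amber",
    "lifecycle": "red",
}

summary = f"{gates_passed(results)}/{len(GATES)}"
print(summary)
```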

Risks and what to avoid

Don’t buy the runtime tool first. Microsoft’s Agent 365 Runtime Protection (and its competitors) is a useful layer once gates 1–3 are in place; it is a frustrating dashboard before then, because every alert points at a problem you cannot fix without knowing which agent it is and who owns it. Hygiene first, runtime tooling second.

Don’t over-rotate. The temptation when you discover the shared service account is to rotate everything immediately. That breaks production. The right shape is: stop the bleeding (no new agents on the shared account), then plan a methodical migration to per-agent identities over four to six weeks, with rollback paths for each agent.

Don’t put agents in your human IAM groups. The temptation is to drop the agents into the same groups as the engineers who own them. The blast radius of that decision shows up six months later, when an engineer’s permissions are raised and the agent silently inherits the upgrade. Agents need their own group structure, mapped to capability, not personnel.

What good looks like — one quarter from now

The inventory exists, lives in the same source-of-truth as your service catalogue, and is updated automatically when an agent is deployed. Each agent has its own service principal with capability-scoped permissions and a one-page written contract. Every agent action lands in the SIEM under a named identity with workflow attribution. The quarterly access review covers humans and agents, with the same rigour and the same template. When an engineer leaves, agents are rotated or retired the same day — and the leaver checklist has a checkbox to prove it.

The CISO can answer, in writing, the question “who or what acted on customer data at 14:32 on Tuesday” in under five minutes. Most can’t. The orgs that get to that answer this quarter are the orgs that won’t feature in the breach narrative when the first big agent-identity incident lands.

Final thought

Microsoft’s Agent 365 Runtime Protection is a useful tool. It is also a market signal: the gap between how organisations govern agents and how they govern humans is now visible enough to fund a product line. The CTOs who close that gap with hygiene first — inventory, identity, scope, logging, lifecycle — will use the runtime tool well when it lands. The CTOs who skip the hygiene and buy the dashboard will spend the next year explaining red lights they cannot remediate. Your call.

How many of the five gates does your stack pass?

Indica Tech’s two-week agent-identity audit produces a full inventory of every production agent, scores you against the five-gate hygiene checklist, ranks risk gaps by likelihood and blast radius, and gives you an implementation roadmap aligned to your IAM stack. Fixed price £3,500. Written report. Whether you hire us for the remediation or not.
