Field notes / governance

What ISO 27001 and SOC 2 actually require when you add AI to your product

The procurement email lands on a Thursday. “Please complete our AI Security and Risk Questionnaire. 47 questions. Reply within five business days.” Your ISO 27001 certificate is three years old. Your SOC 2 Type II report is twenty pages and does not mention AI once. The buyer is asking about model providers, training data, prompt injection, inference logging, and something called model-drift monitoring, and the clock is running.

23 April 2026

Nitish, Founder, Indica Tech

Four themes cover roughly 85% of every AI security questionnaire I have ever seen. The standards already tell you what to do. They do not tell you that LLMs count.

The standards did not change. The scope did.

Here is the bit that catches most teams out. ISO 27001:2022 and SOC 2 do not, for the most part, have new AI-specific controls. What they have is the same set of controls they always had, which now apply to a system your auditor did not look at last time.

ISO 42001 exists, and it is the standard aimed specifically at AI management systems. It is still rare in procurement requests as of early 2026. Most of what an enterprise buyer is actually testing for sits inside controls that were already in your scope: data classification, access control, supplier management, change management, logging, and monitoring. The 47-question questionnaire is almost entirely a re-asking of those controls with the word “model” or “prompt” stapled on.

This is actually good news. The fix is not a new certification. It is a scope extension of the one you already have, with four themes to document and operate.

Theme 1. Data flow and the trust boundary

Before you can answer any AI-related question, you need a diagram that is honest about where data actually goes. Not the marketing diagram. The real one.

For a typical LLM-enabled product the answer involves at least four hops. The user’s input touches your application. Your application sends it to a retrieval layer, which reads from a vector store that was populated from a set of internal documents. The retrieved context plus the user input is sent to a model provider, usually OpenAI or Anthropic, over HTTPS. The response comes back and is rendered to the user, often with the intermediate output logged.

Every one of those hops is a data-classification question under ISO 27001 A.5.12 and the equivalent SOC 2 Confidentiality criteria. The questions you need clean answers to:

What classes of data flow to the model provider (PII, PHI, regulated, confidential)?

Is that within the contractual scope of your DPA with that provider?

Is the data used for training? The answer to the training question is “no” for OpenAI’s API business tier and Anthropic’s business plans by default as of the 2026 terms, but the buyer will want that in writing, from you, not from my blog post. Check the live terms before you answer.

Document this once, properly, with a real diagram. You will reuse it in every single questionnaire that lands for the next three years.
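One way to keep the diagram honest is to hold the same information as data alongside it, so a check can flag any hop that carries a class of data its contract does not cover. This is a minimal sketch in Python. The hop names, data classes, and `dpa_scope` values are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One edge in the data-flow diagram (all names here are hypothetical)."""
    source: str
    destination: str
    data_classes: frozenset             # classes of data that actually cross this hop
    dpa_scope: frozenset = frozenset()  # classes the contract with the recipient covers
    external: bool = False              # True when the hop leaves your trust boundary


def out_of_scope(hops):
    """Return (hop, uncovered_classes) for external hops that exceed the DPA."""
    findings = []
    for hop in hops:
        if not hop.external:
            continue
        uncovered = hop.data_classes - hop.dpa_scope
        if uncovered:
            findings.append((hop, uncovered))
    return findings


# Example inventory: the values are made up to show one failing hop.
HOPS = [
    Hop("user", "application", frozenset({"PII", "confidential"})),
    Hop("application", "retrieval", frozenset({"PII", "confidential"})),
    Hop("retrieval", "vector_store", frozenset({"confidential"})),
    Hop(
        "application",
        "model_provider",
        frozenset({"PII", "confidential"}),
        dpa_scope=frozenset({"confidential"}),
        external=True,
    ),
]

for hop, uncovered in out_of_scope(HOPS):
    print(f"{hop.source} -> {hop.destination}: not covered by DPA: {sorted(uncovered)}")
```

The point is not the code. It is that the inventory has one owner and one source, so the security lead and the engineering lead are reading the same list of hops.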

Theme 2. Third-party risk, re-scoped

OpenAI, Anthropic, Google Vertex, Azure OpenAI, and any inference endpoint you call are subprocessors under your data-processing terms. Your existing supplier-management control (ISO 27001 A.5.19–A.5.22, SOC 2 CC9.2) already tells you how to handle this: due diligence, contractual terms, review cadence. You just have not done it for model providers yet.

What an enterprise buyer expects to see:

A named DPA or MSA with each model provider, in your vendor register.

The business tier of that provider, documented, because the free-tier terms differ on data-use-for-training.

An annual review of their published security posture (SOC 2 Type II, ISO 27001, or equivalent).

A documented exit plan if the provider has an incident or a policy change. This last one is easier for AI workloads than for databases, because most of your prompt-and-response logic is portable between providers if you wrote it that way. Say so.

Every buyer I have watched go through this process accepts “we use OpenAI under their API Business Tier with no training opt-in, documented in our vendor register, reviewed annually” as a clean answer. The problem is almost never the answer. It is that the answer is nowhere in writing.
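To make "in writing" concrete, a vendor-register entry can be a small structured record with a check that lists what is still missing. This is a sketch under assumptions: the field names, the 365-day review interval, and the example values are all hypothetical and should follow your own supplier-management policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REVIEW_INTERVAL = timedelta(days=365)  # assumed reading of "annual review"


@dataclass
class VendorEntry:
    """One model provider in the vendor register (field names are hypothetical)."""
    provider: str
    agreement_ref: Optional[str]              # DPA or MSA identifier
    tier: Optional[str]                       # documented commercial tier
    trains_on_customer_data: Optional[bool]   # None means "not yet confirmed"
    last_review: Optional[date]               # last security-posture review
    exit_plan_ref: Optional[str]              # where the exit plan is documented


def gaps(entry: VendorEntry, today: date) -> list:
    """List what is missing before this entry can back a questionnaire answer."""
    problems = []
    if not entry.agreement_ref:
        problems.append("no DPA/MSA on file")
    if not entry.tier:
        problems.append("tier not documented")
    if entry.trains_on_customer_data is None:
        problems.append("training-use terms not confirmed")
    if entry.last_review is None or today - entry.last_review > REVIEW_INTERVAL:
        problems.append("security review overdue")
    if not entry.exit_plan_ref:
        problems.append("no exit plan")
    return problems


entry = VendorEntry(
    provider="ExampleProvider",
    agreement_ref="DPA-001",
    tier="business",
    trains_on_customer_data=False,
    last_review=date(2025, 1, 10),
    exit_plan_ref=None,
)
print(gaps(entry, today=date(2026, 4, 23)))
```

An empty list from `gaps` is the state in which the one-sentence answer above is actually backed by a record.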

Theme 3. Access control and audit logs

ISO 27001 A.8.15 (logging) and SOC 2 CC6.2 and CC7.2 together effectively require that you can answer three questions for any request: who made it, what data was seen, what action was taken. In a web app that is solved. In an AI system the question has three new seams.

One, prompts and responses. If the prompt includes PII and you log prompts, that is an extension of the personal-data processing footprint. Log, but redact. Or log structured metadata (model, token counts, latency, classification of inputs) rather than raw content. Pick one pattern, document it, and hold the line.
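The two patterns can be sketched side by side. The regular expressions below are deliberately simple assumptions for illustration and will miss plenty of real PII, and the record fields are hypothetical names rather than any logging library's schema.

```python
import json
import re

# Illustrative patterns only; production PII detection needs far more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Pattern 1: keep the prompt text, mask the obvious identifiers."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def metadata_record(prompt: str, model: str, input_tokens: int,
                    output_tokens: int, latency_ms: float) -> str:
    """Pattern 2: log facts about the request, never the raw content."""
    record = {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        # Classification of the input, derived without storing it.
        "contains_email": bool(EMAIL.search(prompt)),
        "contains_phone": bool(PHONE.search(prompt)),
    }
    return json.dumps(record, sort_keys=True)


prompt = "Contact jane@example.com or +44 20 7946 0958 today"
print(redact(prompt))
print(metadata_record(prompt, "example-model-v1", 12, 40, 350.0))
```

Whichever function you adopt, the other one should not also be writing raw prompts somewhere else. That is what holding the line means in practice.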

Two, tool calls. An agent that can call tools is making privileged decisions on behalf of the user. Every tool invocation should be logged with the same rigour as a direct API call from a human: actor, intent, parameters, outcome, correlation ID.
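A wrapper is one way to guarantee that no tool runs without leaving a record. The sketch below assumes tools are plain Python callables taking keyword arguments. `log_tool_call`, `refund_order`, and the `sink` parameter are hypothetical names, with `sink` standing in for whatever log pipeline you use.

```python
import json
import uuid
from datetime import datetime, timezone


def log_tool_call(tool, *, actor, intent, correlation_id, sink=print):
    """Wrap a tool so every invocation emits exactly one structured log line."""
    def wrapped(**params):
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "correlation_id": correlation_id,
            "actor": actor,        # the user the agent is acting on behalf of
            "intent": intent,      # the agent's stated reason for the call
            "tool": tool.__name__,
            "parameters": params,
        }
        try:
            result = tool(**params)
            entry["outcome"] = "ok"
            return result
        except Exception as exc:
            entry["outcome"] = f"error: {type(exc).__name__}"
            raise
        finally:
            # Runs on success and on failure, so failed calls are logged too.
            sink(json.dumps(entry, default=str))
    return wrapped


def refund_order(order_id: str, amount: float) -> str:
    """Hypothetical privileged tool."""
    return f"refunded {amount} on {order_id}"


call = log_tool_call(
    refund_order,
    actor="user:1842",
    intent="customer asked for a refund on a late delivery",
    correlation_id=str(uuid.uuid4()),
)
call(order_id="A-1001", amount=25.0)
```

Parameters can carry personal data just as prompts can, so the redaction pattern you chose for prompts should apply to this log as well.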

Three, model decisions. Where a model output materially affects a user outcome (a credit decision, a clinical triage, a trade recommendation), regulators increasingly expect an explainable audit trail. You do not necessarily need a full interpretability stack. You do need to be able to reconstruct, a year later, which model version, which prompt revision, and which input produced which output for a given user. That is the minimum.
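That minimum fits in one record per decision. The sketch below is an assumption about shape, not a required format: the field names are hypothetical, and `input_text` should hold the input in whatever form your prompt-logging pattern permits.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """What is needed to reconstruct one model-driven decision (names hypothetical)."""
    user_id: str
    model_version: str    # the pinned identifier, not a floating alias
    prompt_revision: str  # for example, the commit hash of the prompt template
    input_text: str
    output_text: str

    def fingerprint(self) -> str:
        """Stable digest of the record, for comparing a stored copy later."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


record = DecisionRecord(
    user_id="user:1842",
    model_version="example-model-2026-01-15",
    prompt_revision="3f9c2ab",
    input_text="applicant summary (redacted)",
    output_text="refer to manual review",
)
print(record.fingerprint())
```

Storing the fingerprint separately from the record gives you a cheap way to show, a year on, that the record you are presenting is the one that was written at the time.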

Theme 4. Change management for prompts and models

ISO 27001 A.8.32 and SOC 2 CC8.1 require a documented change-management process for anything that affects the security or integrity of production. Model versions and prompt templates absolutely qualify, and roughly zero teams I audit treat them that way on day one.

A change to a prompt can move the behaviour of your product as much as a code change can, sometimes more. If your review process covers Python but not the YAML file where the system prompt lives, half of your production surface is changing without review.

The pattern that passes an audit and survives a late-night incident looks roughly like this. Prompts live in the codebase, version-controlled, reviewed by a second engineer. Model version is pinned in config, never “latest.” Every prompt or model change is gated by the eval harness from the earlier post, with minimum thresholds that must pass before deploy. Rollback is a config change, not a code change. All of this is logged so that six months later you can show the auditor what changed, when, by whom, and what evidence said it was safe.
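The gate itself can be very small. This sketch assumes the eval harness produces a dictionary of metric scores and that the release config carries `model_version` and `prompt_revision` keys. Those key names, the metric names, and the thresholds are hypothetical.

```python
FLOATING_ALIASES = {"latest", "default", ""}  # assumption: what counts as unpinned


def check_release(config: dict, eval_scores: dict, thresholds: dict) -> list:
    """Return the reasons a prompt or model change must not deploy (empty means go)."""
    blockers = []
    model_version = str(config.get("model_version") or "").strip().lower()
    if model_version in FLOATING_ALIASES:
        blockers.append("model_version is not pinned")
    if not config.get("prompt_revision"):
        blockers.append("prompt_revision is missing")
    for metric, minimum in thresholds.items():
        score = eval_scores.get(metric)
        if score is None:
            blockers.append(f"{metric}: no eval result")
        elif score < minimum:
            blockers.append(f"{metric}: {score} is below the minimum {minimum}")
    return blockers


config = {"model_version": "example-model-2026-01-15", "prompt_revision": "3f9c2ab"}
thresholds = {"groundedness": 0.90, "refusal_accuracy": 0.95}
scores = {"groundedness": 0.93, "refusal_accuracy": 0.91}

for reason in check_release(config, scores, thresholds):
    print("BLOCKED:", reason)
```

Because the model version and prompt revision live in `config`, rolling back means restoring the previous config and re-running the same check, with no code change involved.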

The 12-week path

For a team with an existing ISO 27001 or SOC 2 programme, adding AI scope typically runs eight to twelve weeks end to end. Roughly:

Weeks 1–2. Data-flow mapping and trust-boundary documentation. Updating the information-asset register to include prompts, embeddings, and model outputs where they are stored. Classifying each class of data that touches the model.

Weeks 3–4. Supplier-management update. DPA review for every model provider in use. Vendor register additions. Exit-plan documentation.

Weeks 5–8. Logging and access-control upgrades. Redaction patterns for prompts, structured tool-call logging, role-based access to inference logs. This is usually the engineering-heavy block.

Weeks 9–10. Change-management formalisation. Prompts and model versions into the change-control register. Eval gates wired to CI. Rollback drills documented.

Weeks 11–12. Internal audit, evidence collection, and a dry-run of the enterprise questionnaire against the new documentation. Remediate what comes up.

Most of the effort is documentation of work you should already be doing, not new engineering. The engineering-heavy items (logging, eval gates, rollback) are the same items on the production-readiness checklist for reliability reasons. This is the rare compliance exercise that actually makes the product more robust, not less.

What the buyer is actually testing for

I have watched enough enterprise reviews to know the scoring is not really about the 47 questions. It is about whether your answers hang together. A buyer is testing for one thing: does this vendor know what their AI system actually does with data, and do they have controls that match?

Teams that fail the review are rarely the ones with gaps in a specific control. They are the ones where the security lead and the engineering lead and the product lead each tell the buyer a different story about where the data goes. A consistent, honest, documented story beats a polished one with gaps in it every time.

Got an enterprise AI questionnaire sitting in your inbox?

The governance engagement is fixed-scope: a written AI-scope extension to your ISO 27001 or SOC 2 programme, a control map, a completed reference questionnaire, and an internal-audit pass. Built so your security lead can defend it without me in the room.
