Field notes on shipping AI.

Patterns I see over and over — the failure modes, the evals that actually catch regressions, the retry logic that survives 3am. Written for founders who have to ship, and CTOs who have to own it.

Updated monthly · Written by Nitish

  1. AI is core infrastructure now — JPMorgan just gave you the 2026 budget defence

    JPMorgan reclassified its ~$2B/year AI spend out of ‘innovation’ and into ‘core infrastructure’ inside a $19.8B 2026 tech budget. Dimon says it has already paid for itself. 500+ production use cases, 95% AML false-positive cut, 10–11% productivity lift. This is the procurement-credible reframe that every CTO and finance lead will now have to defend their 2026 plan against, plus five moves to land the 2027 budget in infrastructure, not innovation.

  2. The EU AI Act got delayed — and your transparency deadline got CLOSER, not further

    Omnibus VII slipped high-risk AI rules to December 2027 and August 2028. Most teams read the headline and exhaled. Three less-reported clauses move the work the other direction: the transparency labelling deadline is pulled to 2 December 2026, the SME exemption is extended to small mid-caps (which rewires procurement diligence), and a new prohibition on AI-generated non-consensual sexual content and CSAM lands with zero grace period. The three workstreams to accelerate this quarter.

  3. Anthropic just passed OpenAI in business AI — the vendor concentration question your procurement team is about to ask

    Ramp AI Index April 2026, 50,000+ companies, real corporate-card spend: Anthropic 34.4%, OpenAI 32.3% — first crossover in the index’s history. Anthropic quadrupled in twelve months, OpenAI essentially held. The engine is a single product: Claude Code. Consumer usage stopped predicting enterprise spend, and the four-year vendor-lock playbook is now untenable. Four moves before the next renewal cycle.

  4. AI-generated code fails the pentest — 92% of vibe-coded apps ship a critical vulnerability

    Sherlock Forensics audited 50 production apps built with Cursor, Copilot, ChatGPT and Claude between January and April 2026 — 92% had a critical vulnerability, 78% stored secrets in plaintext, 8.3 exploitable findings on average. The three structural failure patterns, why AI-reviewing-AI does not close the gap, three actions before Monday, and the 90-day governance the cyber insurance carriers are about to start asking about.

  5. The Chief AI Officer trap — 76% have hired one; most just bought a slide deck

    IBM’s May 2026 CEO study landed the headline: 76% of large organisations now have a Chief AI Officer, up from 26% in 2025. A near-tripling in twelve months means most hires were not selected — they were grabbed. The four signatures of the mis-hire, the five operating-model moves that make the role actually work, and the eight-question board diagnostic for the next quarterly review.

  6. The £80K question — why construction software is about to get cheap on purpose

    A Tier 2 UK main contractor opens the Procore renewal and reads £80,000. About four-fifths of that invoice is not software; it is the services tax that AI-native vertical platforms have stripped out. The BSA Golden Thread teardown — 12 weeks and £15K per HRB project on Procore + Aconex, 1 week and £3.5K on a voice-driven AI-native equivalent. 4x cheaper, 12x faster, the same compliant outcome. Six procurement questions, the demo-to-production catch, and the 2028 prediction every construction CFO and CTO should be planning for.

  7. Your LLM is not a security boundary — Microsoft’s Semantic Kernel disclosure is the framework’s SQL-injection moment

    Two critical CVEs in Microsoft’s own AI agent framework, disclosed by Microsoft’s own security team. A chat prompt launches arbitrary code on the host. A model-callable helper writes attacker files to Windows Startup, escaping the Azure Container Apps sandbox. Patched in Semantic Kernel Python 1.39.4 and .NET 1.71.0 — but the bug class generalises. The three actions every team running AI agents must take this week.

  8. The AI agent kill switch — and the inventory you need before you buy one

    A Cursor + Claude agent deleted PocketOS’s production database and every backup in nine seconds, then generated 4,000 fake users to hide it. ServiceNow used the incident to launch the kill switch. The five gates upstream of any control tower — inventory, privilege scoring, named-identity logging, kill-path playbook, and lifecycle.

  9. Why 80% of AI projects never ship — and the 5 failure modes I see on every audit

    The stat is real. RAND put it at 80.3% in 2025, and the MIT GenAI study topped that at 95%. Every failure I have touched matches one of five patterns, and none of them is about model quality.

  10. The five evals that actually matter in production

    Most teams ship LLMs blind. The minimum viable evaluation stack is five metrics — three from RAGAS, one custom rubric scored by a judge model, one CI gate. Thresholds and Python included.

  11. What production-grade retry logic looks like for LLM calls

    Exponential backoff is table stakes. Real production retry is five layers — error classification, backoff with jitter, idempotency, circuit breakers, and cross-provider fallback. With code.

  12. Seven ways an LLM bill runs sideways (and three controls that stop it)

    The CFO Slack message arrives on day 11. A system budgeted at £3k a month is running at £50k. Seven cost traps I see on every audit, plus the three controls to cap them before the next bill arrives.

  13. What ISO 27001 and SOC 2 actually require when you add AI to your product

    A 47-question enterprise AI questionnaire, translated into the four themes that cover 85% of it. A 12-week path to answering yes without breaking the product team.

  14. The 37-point production-readiness checklist for AI systems

    Seven themes, 37 specific checks. The list I run on every two-week audit — evals, resilience, cost, observability, security, data, rollback. Most demos clear ten. Production systems clear at least thirty.

  15. Shipping RAG to production — where retrieval pipelines actually fail

    Six predictable break points: chunking, embedding drift, retrieval relevance, context overflow, freshness, and citation faithfulness. The minimum viable production RAG, with the controls that fix each.

  16. How AI agents can reduce operational costs without hiring more staff

    For CEOs and COOs absorbing more workload without growing headcount. Five categories of agent-able work, a 90-day pilot roadmap, and the risks to design out before scaling. The cost saving comes from reallocating capacity, not cutting headcount.

  17. The CEO’s guide to AI automation in 2026

    The four layers of the modern AI stack, where ROI actually lives, the governance that protects it, and a 12-month roadmap from first pilot to operating-rhythm capability. Adoption is no longer the differentiator — competence is.

  18. What the 5% do differently — patterns from production AI in regulated industries

    BCG: 5% of companies capture AI value at scale; 60% capture nothing. After 10 years shipping AI in defence, energy, FinTech and UK construction, here are the five patterns the survivors share on day one. None of them is the model.

  19. The invisible operations tax — what manual work actually costs before you automate anything

    McKinsey: knowledge workers lose 20% of the week to information search. Unit4: UK finance teams lose 50+ hours weekly to manual work. None of it is on the P&L. That is exactly why it is the most expensive line item your business has. Here is how to measure it before any AI spend.

  20. AI is not the strategy. The bottleneck you remove with it is.

    PwC 2026: 56% of CEOs see no financial impact from AI. Deloitte: typical AI payback now 2–4 years. Most boards in 2026 are asking the wrong question. The right one — and the three I would put on your next board agenda instead.

  21. Vertical AI just buried horizontal Copilot in regulated industries

    Rogo’s $160M Series D and Microsoft 365 Copilot’s 20M seats landed the same week. They are not telling the same story. The four-question procurement filter that separates a vertical agent from a productivity tool with a vertical decal on it.

  22. Your AI agents are ghost users in your IAM stack

    10–20 agents per prod AI org. One shared service account. Zero in the quarterly access review. Microsoft just shipped Agent 365 Runtime Protection because the gap is now product-shaped. Five hygiene gates upstream of any runtime tool.

  23. The fallback ladder — surviving a foundation-model outage

    Claude went down for 78 minutes on 28 April 2026. Second outage in eight days. The fix is not switching vendors. It is the same five-layer playbook every production system has followed for 30 years — cache, queue, graceful degradation, multi-vendor failover, circuit breaker. Built in order. Each survives a different failure mode.

  24. Comment and Control — how a PR comment exfiltrates your secrets through three AI agents

    Aonan Guan’s late-April disclosure broke three production AI coding agents with one pattern: Claude Code Security Review (CVSS 9.4), Gemini CLI Action, Copilot SWE Agent. The attack is a pull-request comment. The four-step mechanic, the pull_request_target trap, and the five-step audit checklist if you ship AI agents in CI.

Want the full production-readiness checklist?

The 37 things I check on every audit, turned into a PDF you can hand to your team tomorrow. Send me an email and I will send it back.