The Three Levels of AI Transformation

TL;DR

AI transformation happens on three distinct levels: the organization, the teams, and the product. They have different goals, different success metrics, and different failure modes
Organizational level: AI lets anyone work with data and automate processes across functions and data silos. The goal is to automate your company - not to give everyone a chatbot
Team level: Tools like Claude Code change how individual teams work - software engineering, data science, and beyond. The hard part is not adoption; it is scaling agents across departments while maintaining standards (specs, review gates, security)
Product level: LLMs enable use cases for your customers that were impossible two years ago. This level is about experimentation - and accepting that not every non-deterministic idea will survive contact with reality
Most companies only play on one level and call it a strategy. The real leverage comes from playing on all three - deliberately, and with different expectations for each

Intro

“We need an AI strategy” is the sentence I hear most often as an interim CTO right now. The problem is that it usually means three completely different things, depending on who says it.

The CEO means: our company runs on manual processes and Excel, and our competitors are automating.

The VP of Engineering means: my teams are experimenting with coding agents and I have no idea whether we are getting faster or just producing unreviewed code faster.

The Head of Product means: our customers expect AI features and I don’t know which ones are real and which ones are demos that fall apart in production.

All three are right. And all three are talking about different levels of the same transformation. Treating them as one initiative - one budget, one task force, one rollout plan - is how companies end up with an expensive chatbot pilot and not much else.

Let’s take the levels apart.

Level 1: The Organization - Automate Your Company

The first level has nothing to do with engineering. It is about the everyday work of the entire company: finance pulling numbers from three systems into a spreadsheet, operations copying data between tools, customer service answering the same twenty questions, marketing waiting two weeks for a report from the data team.

For decades, the bottleneck for automating this work was simple: you needed programmers. Every automation was a small software project, and small software projects compete with the roadmap, so they never happened. The result is the company most of us know - full of smart people doing repetitive work across data silos that no one ever had the budget to connect.

LLMs change this equation fundamentally. For the first time, people who cannot program can work with data and automate processes themselves. A controller can ask questions across systems in plain language. An operations person can describe a workflow and have an agent execute it. The data silo does not disappear - but the cost of bridging it drops to almost nothing.

What this looks like in practice:

Self-service data access. Instead of every question becoming a ticket for the data team, business users query data directly in natural language. The data team shifts from answering questions to governing access and data quality
Cross-functional process automation. The processes worth automating almost always cross department boundaries - order data from sales, delivery data from ops, invoice data from finance. That is exactly why they were never automated: no single department owned them. Agents that can read from multiple systems make these processes automatable for the first time
The long tail of small automations. The big processes were automated years ago. The value now is in the hundreds of small, annoying tasks that were individually too cheap to justify a developer and collectively eat a significant share of everyone’s week
Better presentations in a fraction of the time. With these org-level agents, preparing a strong presentation becomes simple, and the quality goes up. The agent pulls the numbers, drafts the storyline, and produces the deck; the human refines the message. No more days wasted fighting PowerPoint

You do not have to build this platform from scratch. Projects like Open WebUI give you a self-hosted chat and agent interface for the whole company: one place where employees access models, share prompts and tools, and work with company data - while you keep control over which models are used and where the data goes. It is exactly the kind of sanctioned platform that beats the shadow-IT workaround.

The goal at this level is blunt: automate your company. Not “explore AI.” Not “run a pilot.” Take the processes that consume your people’s time, and remove them.

The trap at this level is governance theater in one direction and chaos in the other. If every automation needs a committee, nothing happens and people paste company data into ChatGPT from their private accounts anyway. If there are no rules at all, you get shadow IT built on personal API keys. The answer is the same as it always was: provide a sanctioned, safe platform that is easier to use than the workaround, set clear data rules, and then get out of the way.

Level 2: The Teams - Agents at Work, Standards at Scale

The second level is where most of the public conversation happens: AI inside the teams that build things. Software engineering is the obvious case, but the same shift is happening in data science, QA, DevOps, and design.

I have written before about how AI is changing engineering teams, so here is the short version: tools like Claude Code are not autocomplete. They are agents that take a task, read the codebase, write the code and the tests, and iterate. A single engineer with a well-configured agent setup can do work that used to take a small team. That part is real, and if your teams are not working this way yet, that is the first problem to fix.

But here is the part that gets far less attention: the hard problem is not adoption inside one team. It is scaling agents across many teams without losing your standards.

One team with a coding agent is an experiment. Twenty teams with coding agents is an organizational design question:

Specs become the unit of work. When agents write the code, the quality of the outcome is determined by the quality of the specification. Approaches like SpecKit formalize this: the spec is written, reviewed, and versioned before the agent runs. This is not bureaucracy - it is the same insight as test-driven development, one level up. Vague prompt in, plausible-looking garbage out
Standards must live in files, not in heads. Conventions that used to spread through code review and osmosis - architecture rules, naming, security requirements, how to write tests - now need to be written down where agents can read them (CLAUDE.md files, lint rules, project templates). The pleasant side effect: writing your standards down so an agent can follow them also makes them clear enough for humans, often for the first time
Review gates matter more, not less. Agents produce more code, faster. Without strong review, CI quality gates, and security scanning, you are simply accumulating technical debt at machine speed. The guardrails are not optional extras - they are what makes the speed usable
Simplicity becomes a force multiplier. An agent can hold one well-structured service in its context and refactor it confidently. It struggles to reason about five services wired together through a broker. Simple architecture was always a good idea; now it directly determines how much leverage you get from AI tooling
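
The review-gate idea above can be sketched as a small CI check. This is a minimal illustration under stated assumptions, not a real tool: the function name gate_violations, the directory names specs, src, and tests, and the rule that every source change must arrive together with a spec change and a test change are all hypothetical choices made for the example.

```python
from pathlib import PurePosixPath


def gate_violations(changed_paths, spec_dir="specs", source_dir="src", test_dir="tests"):
    """Return human-readable violations for one change set.

    Assumed (hypothetical) policy: a change that touches source code
    must also touch a spec and a test.
    """
    # Collect the top-level directory of every changed path.
    top_level = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if parts:
            top_level.add(parts[0])

    violations = []
    if source_dir in top_level:
        if spec_dir not in top_level:
            violations.append("source changed without a spec change")
        if test_dir not in top_level:
            violations.append("source changed without a test change")
    return violations


if __name__ == "__main__":
    print(gate_violations(["src/billing.py"]))
    print(gate_violations(["src/billing.py", "specs/billing.md", "tests/test_billing.py"]))
```

In a real pipeline the list of changed paths would come from the version control system, and a non-empty result would fail the build. The point of the sketch is only that such a rule is cheap to encode once it is written down.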

The trap at this level is measuring activity instead of outcomes. “80% of our developers use AI tools” is not a result. Cycle time, deployment frequency, defect rate - those are results. Measure them before and after, and be honest about what you find.
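
A before-and-after comparison does not need heavy tooling. The following is a minimal sketch of one such outcome metric, median cycle time from opening a change to merging it. The function names and the sample timestamps are invented purely for illustration; real numbers would come from your own version control or ticketing data.

```python
from datetime import datetime
from statistics import median


def cycle_time_days(opened: str, merged: str) -> float:
    """Days between two ISO-8601 timestamps (opened -> merged)."""
    delta = datetime.fromisoformat(merged) - datetime.fromisoformat(opened)
    return delta.total_seconds() / 86400


def median_cycle_time(changes) -> float:
    """Median cycle time in days for a list of (opened, merged) pairs."""
    return median(cycle_time_days(opened, merged) for opened, merged in changes)


# Invented sample data: (opened, merged) per change, before and after the rollout.
BEFORE = [
    ("2024-01-01T09:00", "2024-01-05T09:00"),
    ("2024-01-02T09:00", "2024-01-04T09:00"),
    ("2024-01-03T09:00", "2024-01-09T09:00"),
]
AFTER = [
    ("2024-03-01T09:00", "2024-03-02T09:00"),
    ("2024-03-03T09:00", "2024-03-06T09:00"),
    ("2024-03-04T09:00", "2024-03-06T09:00"),
]

if __name__ == "__main__":
    print("median cycle time before:", median_cycle_time(BEFORE), "days")
    print("median cycle time after: ", median_cycle_time(AFTER), "days")
```

The median is used here instead of the mean so that a single change that sat open for weeks does not dominate the result. Defect rate deserves the same before-and-after treatment, so that a faster cycle time is not bought with more bugs.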

Level 3: The Product - New Use Cases for Your Customers

The third level is the one with the highest ceiling and the highest failure rate: using LLMs inside your product to enable things your customers simply could not do before.

This is genuinely new territory. Software used to be deterministic: same input, same output, and entire product disciplines were built on that assumption. LLMs break it - and that is both the opportunity and the problem.

The opportunity: whole categories of features that were impossible are now a few weeks of work. Understanding free-text input. Summarizing documents. Letting users talk to their data. Turning a vague customer request into a structured action. If your product sits on top of data or documents or communication - and almost every product does - there are use cases waiting that your customers will pay for.

The problem: not every non-deterministic feature will work. Some use cases tolerate the occasional wrong answer beautifully. Some are destroyed by it.

A rough sorting rule that has served me well:

Great fit: the LLM produces a draft and a human decides. Summaries, suggested replies, search, classification with review, report generation. An answer that is useful 95% of the time is a massive win, and the 5% that misses costs a shrug
Workable fit: the LLM acts, but the action is low-stakes and reversible, or it is constrained by hard validation around it. Structured extraction with schema checks, routing, tagging
Bad fit: the LLM acts autonomously where a wrong answer is expensive, irreversible, or a legal problem. Anything where the customer needs the same answer to the same question every time
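
The "constrained by hard validation" case in the workable-fit category can be sketched as follows. This is a minimal example, assuming the model was asked to return an invoice as JSON. The field names, the currency whitelist, and the function name validate_invoice are hypothetical, and the raw string stands in for whatever the model actually returned.

```python
import json

# Hypothetical whitelist; a real system would use its own reference data.
ALLOWED_CURRENCIES = {"EUR", "USD"}


def validate_invoice(raw: str):
    """Return a cleaned dict if the model output passes every check, else None.

    None means: do not act automatically, route the item to a human.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    number = data.get("invoice_number")
    total = data.get("total")
    currency = data.get("currency")

    if not isinstance(number, str) or not number:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)) or total < 0:
        return None
    if not isinstance(currency, str) or currency not in ALLOWED_CURRENCIES:
        return None

    return {"invoice_number": number, "total": float(total), "currency": currency}


if __name__ == "__main__":
    print(validate_invoice('{"invoice_number": "A-17", "total": 129.5, "currency": "EUR"}'))
    print(validate_invoice('{"invoice_number": "A-17", "total": "about 130", "currency": "EUR"}'))
```

The design choice is that the deterministic code, not the model, decides whether an action happens. The model may be wrong, but a wrong answer that fails validation costs a manual review instead of a bad booking.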

The operating mode at this level is therefore different from the other two: it is time to try things out. Build small, ship to a subset of users, measure whether the feature actually survives real usage - and kill it without sentimentality if it does not. The companies getting this right run product-level AI as a portfolio of cheap experiments, not as one big bet announced in a press release. The cost of trying has collapsed; the cost of betting the roadmap on an unvalidated AI feature has not.
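
Shipping to a subset of users is commonly done with a deterministic percentage rollout. The sketch below shows one way to do it, assuming string user IDs. The function name in_rollout and the feature name are hypothetical, and a real product would more likely use an existing feature-flag service than hand-rolled code.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Decide whether a user sees a feature, stable across requests.

    The user is hashed into one of 100 buckets per feature, so the same
    user always gets the same answer, and raising `percent` only ever
    adds users to the rollout.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in 0..99
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    exposed = sum(in_rollout(u, "smart-summary", 10) for u in users)
    print(f"{exposed} of {len(users)} users see the experiment")
```

Including the feature name in the hash means different experiments reach different subsets of users, so the same early adopters do not absorb every unfinished feature. Killing the experiment is then a matter of setting the percentage to 0.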

The trap at this level is the demo. Every LLM feature demos brilliantly - that is practically a property of the technology. The distance between “amazing demo” and “works for thousands of customers with messy real-world data” is where most product-level AI initiatives die. Budget for that distance.

Playing All Three Levels

Here is what I see in practice: most companies play on exactly one level and believe they have an AI strategy.

The company that rolled out a chatbot license to all employees (level 1, barely) and calls itself AI-first
The engineering org that lives in Claude Code (level 2) while the rest of the company still emails Excel files around
The startup that shipped an AI feature (level 3) while its own internal operations are entirely manual

The levels are independent enough to be run in parallel - and different enough that they need different owners, different budgets, and different definitions of success:

Level	Goal	Success looks like	Failure mode
Organization	Automate the company	Hours of manual work eliminated, data accessible to everyone	Chatbot pilots, governance theater
Teams	Faster, better delivery	Cycle time and quality improve measurably	Activity metrics, unreviewed agent code
Product	New customer value	Features customers use and pay for	Demos that never survive production

If you can only start in one place, start where the pain is. But know which game you are playing on each level - because the move that wins one of them loses another. Experimentation is the right mode for the product level and the wrong mode for team standards. Strict standardization is right for scaling agents across teams and deadly for product discovery.

Closing Thought

AI transformation is not a project with an end date, and it is not one thing. It is the organization learning to automate itself, the teams learning to work with agents without losing their standards, and the product learning what non-determinism is good for.

Three levels, three games, three sets of rules. The companies that win will be the ones that know - at every moment - which game they are playing.
