Governance debt
Modernizing legacy systems is not a new phenomenon. I have personally been involved in migrating legacy systems to more efficient, modern software. Such migrations are driven by the goal of efficiency, yet it can take months for the first results to show while the migration continues in later phases. The emergence of AI has now triggered an enterprise-wide race to drive efficiency across nearly every business process.
Today’s CIOs are under pressure to show measurable returns from AI investments, and as a result chatbots, agents and GenAI tools are being deployed at an unprecedented pace. The primary metric used to evaluate success is productivity, with AI delivering large gains through faster coding, documentation, content generation and prototyping. However, these benefits often obscure a less visible reality: AI-generated outputs still need code verification, compliance reviews and ongoing oversight.
I have not personally seen an organization where the “AI First” mandate is accompanied by a “governance-first” strategy. As executives push for faster delivery and measurable gains, risk assessments are often viewed as obstacles rather than necessities. This creates an interesting organizational paradox: the very technology adopted to accelerate work simultaneously introduces new requirements for oversight, accountability and trust. The paradox becomes even more significant as AI gets embedded into every facet of the organization, from software development to business reporting and customer support.
A manual human-in-the-loop review of every agent output will not scale in the long run. But left ungoverned, AI produces far more damaging results: undocumented AI behavior and auditability gaps. Another factor that makes governance necessary is AI inconsistency. Most leaders assume that AI behaves like traditional software, where a given input produces a predictable output. But with most AI models, behavior can differ even with the same prompt, model and data, and different agents interpret context differently. Inconsistent AI outputs make enterprise quality standards harder to scale.
According to KPMG’s 2026 “Global AI Pulse” study, 54% of organizations remain in the early stages of their AI journey, while 75% of executives expressed concern about AI-related risk and security. That raises a vital question: how do leaders drive AI adoption while keeping safeguards in place?
How should AI be reviewed?
As organizations embrace AI, the question of governance has to be thought through at the grassroots level. In my fifteen years of overseeing complex architectures, I have found that putting a human in the loop for every output of a daily pipeline defeats the premise of AI adoption. Why is this operationally challenging? Most AI agents generate thousands of answers to prompts and write many lines of code. AI agents also work at several stages of a pipeline, including data cleansing, algorithm design and model configuration. The volume of AI-generated artifacts will quickly exceed human review capacity.
One way to counter this dilemma is to concentrate oversight where human judgment delivers the most value, especially in the initial phases of adoption. AI leaders should ask teams to validate outputs that involve high-risk decisions, regulatory requirements or customer-facing interactions. The objective can be to define a set of AI red flags for every team and use it as a governance framework that helps identify the most common risks and maintain standards across the organization. This leads to an important question about AI usage metrics: what should leaders use to measure AI success while governance safeguards are being put in place?
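The red-flag idea can be sketched as a simple routing rule: only artifacts that carry a red flag go to a human reviewer, and everything else goes to automated checks. The sketch below is illustrative only. The tag names, the `Artifact` type and the `split_queue` helper are assumptions made for this example, not part of any established framework, and each team would define its own list of flags.

```python
from dataclasses import dataclass

# Hypothetical red-flag categories, taken from the three areas named in the
# text. A real team would maintain and extend its own list.
RED_FLAGS = {"high_risk_decision", "regulatory", "customer_facing"}


@dataclass
class Artifact:
    """One AI-generated output plus the tags a team attached to it."""
    artifact_id: str
    tags: frozenset


def needs_human_review(artifact: Artifact) -> bool:
    """Return True when the artifact carries at least one red flag."""
    return bool(artifact.tags & RED_FLAGS)


def split_queue(artifacts):
    """Split artifacts into (human_review, automated_checks) lists."""
    human, automated = [], []
    for artifact in artifacts:
        if needs_human_review(artifact):
            human.append(artifact)
        else:
            automated.append(artifact)
    return human, automated


batch = [
    Artifact("a1", frozenset({"internal_docs"})),
    Artifact("a2", frozenset({"customer_facing"})),
    Artifact("a3", frozenset({"regulatory", "internal_docs"})),
]
human, automated = split_queue(batch)
print("human review:", [a.artifact_id for a in human])
print("automated checks:", [a.artifact_id for a in automated])
```

In this toy batch, two of the three artifacts reach a person. In practice the share routed to humans would depend on how narrowly the flags are defined, which is itself a governance decision.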
The token trap
Enterprises have always had to evolve the metrics they use to measure digital transformation. Since productivity is a byproduct of AI, there is a temptation to use AI activity as a proxy for the value it generates. In many organizations, AI usage is measured by prompts and queries submitted, tokens used and interactions with chatbots. “Token maximization,” where employees are asked to track their AI token usage so that using more tokens is equated with being more productive, drives up the organization’s costs without accounting for the cost of validating AI output. A Fortune article exposes this stark reality. According to the article, even with the least expensive version of Claude Opus 4.6, which costs $5 for every million tokens, token usage can run into the billions, and one user alone can cost the firm more than $1.4 million. This creates a dangerous incentive structure in which employees work toward higher token usage rather than better business outcomes.
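The arithmetic behind figures like these is easy to check. The sketch below uses only the two numbers quoted in the text ($5 per million tokens and $1.4 million per user) and backs out the token volume they imply. It assumes a single flat price for all tokens, which is a simplification, and the function names are made up for this example.

```python
# Price quoted in the text. Treating it as one flat rate for every token
# is an assumption made to keep the arithmetic simple.
PRICE_PER_MILLION_TOKENS_USD = 5.00


def token_cost_usd(tokens, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Cost in USD of consuming `tokens` tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def tokens_for_budget(budget_usd, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Number of tokens that a given spend buys at a flat per-million price."""
    return budget_usd / price_per_million * 1_000_000


# One billion tokens at $5 per million:
print(f"${token_cost_usd(1_000_000_000):,.0f}")

# Token volume implied by a $1.4 million bill at the same price:
implied_tokens = tokens_for_budget(1_400_000)
print(f"{implied_tokens:,.0f} tokens")
```

At a flat $5 per million, a billion tokens costs $5,000, so a $1.4 million bill corresponds to roughly 280 billion tokens. Neither number says anything about the value those tokens produced, which is the point of the paragraph above.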
In complex engineering environments, high activity does not correlate with high productivity. An employee researching a proprietary tool can use millions of tokens to get basic information, while a seasoned employee adding real value to the work may use a fraction of that. So how can leaders address this? The answer once again lies in governance. During the cloud transformation era, organizations established teams responsible for cloud deployment, migration standards and cost optimization. AI adoption requires a similar operating model that measures outcomes as well as AI activity.
Measuring adoption to outcome
For a CIO, measuring AI impact is as critical as adopting AI. To get measurable value out of AI tools, the urge to deploy quickly and measure usage activity should be replaced with a deliberate, long-term approach to measuring gains. AI adoption should be evaluated based on its impact on workflows, and leaders should focus on measurable improvements in the day-to-day tasks themselves. Organizations can compare deployment time for processes with and without AI, along with the costs incurred on tokens or queries. Another metric is the improvement in accuracy, measured by comparing AI-generated output against established baselines. An AI agent that generates output faster but requires more corrections might end up being less productive than a human. Cost efficiency, which sets the cycle time achieved with AI against the tokens consumed, is another good indicator.
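One way to make the "faster but more corrections" point concrete is to compare total cycle time, including rework, and then subtract token cost. The following is a minimal sketch under stated assumptions: the `TaskRun` fields, the `compare` helper, the hourly rate and all the sample numbers are hypothetical and chosen only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """Timing and cost for one way of completing a task (hypothetical fields)."""
    minutes_to_first_output: float
    minutes_of_corrections: float
    token_cost_usd: float = 0.0

    @property
    def total_minutes(self) -> float:
        # Cycle time counts rework, not just time to the first draft.
        return self.minutes_to_first_output + self.minutes_of_corrections


def compare(baseline: TaskRun, ai_run: TaskRun, hourly_rate_usd: float) -> dict:
    """Compare an AI-assisted run against a baseline run of the same task."""
    minutes_saved = baseline.total_minutes - ai_run.total_minutes
    labor_saved_usd = minutes_saved / 60 * hourly_rate_usd
    return {
        "minutes_saved": minutes_saved,
        "net_saving_usd": labor_saved_usd - ai_run.token_cost_usd,
    }


# Illustrative numbers only: the AI draft arrives much sooner,
# but correcting it takes longer than doing the task by hand.
baseline = TaskRun(minutes_to_first_output=120, minutes_of_corrections=10)
ai_run = TaskRun(minutes_to_first_output=20, minutes_of_corrections=125,
                 token_cost_usd=4.0)

result = compare(baseline, ai_run, hourly_rate_usd=80)
print(result)
```

With these made-up inputs the AI-assisted run comes out 15 minutes slower and $24 worse once tokens are counted, even though its first output arrived 100 minutes earlier. A metric based on time to first output alone would have reported it as a win.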
AI’s impact on organizational learning is another critical metric. The objective should be for employees to build expertise faster, transfer knowledge and make better decisions over time. AI adoption that leads to less learning and more dependency on AI may create organizational risk rather than add value. Finally, as AI adoption matures, organizations should establish prompt governance frameworks. Aggregated team-level reporting on prompt usage will reveal key training opportunities among employees. The idea is to help teams develop stronger AI practices while optimizing token usage and business impact.
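Aggregated team-level reporting could look something like the sketch below. It rolls individual prompt events up to the team level and reports acceptance rate and tokens per accepted output rather than raw usage. The event fields (`team`, `tokens`, `accepted`) and the sample data are assumptions for illustration; a real deployment would draw them from whatever usage logs the organization already keeps.

```python
from collections import defaultdict


def team_report(events):
    """Aggregate prompt events by team; no per-employee figures are kept."""
    totals = defaultdict(lambda: {"prompts": 0, "tokens": 0, "accepted": 0})
    for event in events:
        team = totals[event["team"]]
        team["prompts"] += 1
        team["tokens"] += event["tokens"]
        if event["accepted"]:
            team["accepted"] += 1

    report = {}
    for name, t in totals.items():
        report[name] = {
            "prompts": t["prompts"],
            "acceptance_rate": t["accepted"] / t["prompts"],
            # None when no output was accepted, to avoid dividing by zero.
            "tokens_per_accepted_output": (
                t["tokens"] / t["accepted"] if t["accepted"] else None
            ),
        }
    return report


# Hypothetical usage events.
events = [
    {"team": "platform", "tokens": 1000, "accepted": True},
    {"team": "platform", "tokens": 3000, "accepted": False},
    {"team": "platform", "tokens": 2000, "accepted": True},
    {"team": "data", "tokens": 500, "accepted": True},
    {"team": "data", "tokens": 500, "accepted": True},
]

report = team_report(events)
for team_name, stats in report.items():
    print(team_name, stats)
```

A team with a low acceptance rate or a high token count per accepted output is a candidate for prompt training, which is a more useful signal than total tokens consumed.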
From adoption to value
One of the most overlooked aspects of AI adoption is its long-term operating cost. The underlying economics of AI deserve the same level of discipline that organizations apply to all their other assets. While AI observability has emerged as an important way to gauge AI adoption, CIOs must think beyond usage metrics and focus on the long-term return on AI investments. An organization’s AI maturity should be assessed on the basis of spend versus value created, accuracy, skill development and cost effectiveness. Creating a framework to measure the value that AI creates will define the success of AI adoption for the organization and enable it to innovate and scale. Ultimately, the enterprises that succeed with AI will not be the ones that deploy it the fastest. AI success will not be a function of deployment speed; it will be a function of architectural discipline.
This article is published as part of the Foundry Expert Contributor Network.