AWU by Salesforce: A shiny new metric that tells CIOs little of value

Every CIO would love a single metric that explains whether their spend on agentic software is paying off and gives them a clean story to tell the board when it’s time to move pilots to production or when renewal rolls around.

Salesforce is pitching its new Agentic Work Unit (AWU) metric as a measure that can help tell that story. In its current form, though, it reads more like a sales slide or a marketing vehicle for Agentforce with a number attached — perhaps unsurprisingly, as AWU is the brainchild of Salesforce CMO Patrick Stokes.

CEO Marc Benioff introduced AWU during the company’s quarterly earnings call this week, crediting Stokes with the creation of the metric that, he said, “helps” its customers “measure the value” that Agentforce delivers.

AWU is not meant to be viewed in isolation but alongside token consumption, Stokes said later in response to a question from a Goldman Sachs analyst. Salesforce is trying to correlate the number of AWUs produced with the number of tokens consumed, effectively an inference efficiency ratio, to distinguish between simply running a model and producing something of operational value.

In Salesforce’s framing, tokens represent the raw input cost, while the ratio of tokens to work units is meant to signal how much “actual work” agents are delivering for that cost.
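
Salesforce has not published a formula, but the ratio it describes is simple enough to sketch. A minimal, purely illustrative calculation (tokens consumed divided by completed work units) with hypothetical per-period totals might look like this:

```python
# Hypothetical sketch of the efficiency ratio Salesforce describes:
# tokens consumed per completed work unit. The function name and the
# example figures are illustrative, not a published Salesforce formula.

def tokens_per_work_unit(tokens_consumed: int, work_units_completed: int) -> float:
    """Raw inference cost divided by discrete completed agent actions."""
    if work_units_completed == 0:
        return float("inf")  # all cost, no completed work
    return tokens_consumed / work_units_completed

# Example: an agent fleet burns 12 million tokens to complete 40,000 actions.
print(tokens_per_work_unit(12_000_000, 40_000))  # 300.0 tokens per AWU
```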

However, it was not immediately clear how Salesforce defines what qualifies as a discrete “work unit,” how consistently that outcome is verified across customer environments, or even where and how the metric can be accessed.

Work done, not value produced

These ambiguities, analysts say, are exactly why the metric reads better in Salesforce’s marketing material than it does in a CIO’s spreadsheet.

“AWU measures execution rather than accuracy. A triggered workflow or an API call counts, regardless of whether the agent resolved the issue correctly. It tracks activity, not quality: That is the limitation,” said Robert Kramer, principal analyst at Moor Insights and Strategy.

AWU’s complexity also compounds with scale, especially in production workloads, said Sanchit Vir Gogia, chief analyst at Greyhound Research.

“At scale, agent retry behavior and exception handling are inevitable. … Without explicit classification between attempted, succeeded, accepted and validated actions, AWU remains a throughput metric rather than a trust metric,” Gogia said.

Each AWU measures one discrete action performed and completed by an AI agent, such as updating a record, triggering a workflow, or calling an external system, Stokes later clarified in response to CIO.com’s queries.

“A single business outcome may require many AWUs to complete,” Stokes added.

The CMO’s own framing of AWUs as building blocks, rather than a ready measure of business outcomes, is where the metric’s promise quietly unravels for CIOs: they can’t use it effectively in the one place it actually matters, the boardroom, whether to justify return on investment or to seek more investment for implementation, said Keith Kirkpatrick, research director at The Futurum Group.

Echoing Kirkpatrick, Constellation Research principal analyst Liz Miller said that AWUs “don’t necessarily define success for a Salesforce customer or indicate that their customer’s customer has accomplished their goals when leveraging the agentic system of engagement.”

Rather, Miller sees AWUs as a health metric that the financial markets are likely to use to track just how valid Salesforce’s reported upticks in utilization are, especially when set against overhead costs, sales, and revenue.

Skepticism in the boardroom

“I think it is a great metric to deliver consistent growth of Salesforce. But do I think Salesforce customers are going to prioritize AWUs over, say, yield and return of AI ROI? Likely no. CIOs and all leaders in the C-Suite are already skeptical over returns,” Miller said.

Getting real value out of AWUs will require more work, said Gogia, especially more guardrails and scripting, as AWUs themselves don’t measure task correctness or support verification.

“For AWUs to evolve into a verified task metric, enterprises would require layered instrumentation. They would need to distinguish attempted from committed actions. They would need rollback tracking and exception visibility. They would need per-tool success ratios rather than aggregate volume. They would need override tracking and human intervention metrics. They would need validation logic confirming that business objectives were achieved, not merely executed,” he said.

“The more an enterprise begins counting work units, the more it must constrain variability. Without deterministic scaffolding, AWU growth may correlate with drift and rework rather than with maturity,” Gogia added.
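
Gogia’s list maps onto per-action telemetry that enterprises would have to build themselves. A rough sketch of what such a classification layer could look like follows; the status labels, tool names, and aggregation are assumptions for illustration, not actual Agentforce instrumentation:

```python
# Illustrative only: one way to classify agent actions beyond raw volume,
# along the lines Gogia describes. Status labels, tool names, and the
# aggregation are assumptions, not Agentforce telemetry.
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    ATTEMPTED = "attempted"   # agent tried the action
    SUCCEEDED = "succeeded"   # tool call returned without error
    VALIDATED = "validated"   # business objective confirmed, e.g. by a check or a human

@dataclass
class AgentAction:
    tool: str
    status: Status

def per_tool_success_ratio(actions: list[AgentAction]) -> dict[str, float]:
    """Share of each tool's attempts that reached VALIDATED, not just SUCCEEDED."""
    attempts, validated = Counter(), Counter()
    for a in actions:
        attempts[a.tool] += 1
        if a.status is Status.VALIDATED:
            validated[a.tool] += 1
    return {tool: validated[tool] / attempts[tool] for tool in attempts}

actions = [
    AgentAction("update_record", Status.VALIDATED),
    AgentAction("update_record", Status.SUCCEEDED),   # ran, but outcome never verified
    AgentAction("call_external_api", Status.ATTEMPTED),
]
print(per_tool_success_ratio(actions))  # {'update_record': 0.5, 'call_external_api': 0.0}
```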

Pressed on whether AWUs could independently reflect high-quality outcomes, even Stokes was notably careful in his wording, leaving the impression that the metric counts how much work happened, not how well it was done.

The CMO seemed to suggest that enterprises may have to use guardrails and observability tools in combination with AWUs if they want to measure real value from outcomes effectively.

“In short, AWUs quantify work, while guardrails, deterministic flows, and tools like Agent Script (all part of Agentforce 360) ensure that work is reliable, repeatable, and outcome-driven. Tokens alone or raw LLM output are insufficient for measuring true enterprise value. These operational controls are what turn intelligence into consistent business outcomes,” he said.

Not a complete waste of time

However, analysts do see the metric helping CIOs in some small ways.

AWUs, when integrated properly, could help CIOs simplify how they frame the narrative around agentic use in their own enterprises, Gogia said.

At a fleet level, the metric could offer a coarse-grained signal across sprawling deployments as agents creep into CRM, service, analytics, and collaboration stacks — a macro view of density and activity that’s otherwise hard to summarize, Gogia said.

When coupled with financial attribution, AWUs could also support cost modeling: cost per work unit is a more intelligible story for finance teams than cost per token, enabling internal chargebacks and departmental comparisons. That only holds, though, if work units are segmented by complexity and validation status, so that useful work, not just volume, is being measured, Gogia said.

The same applies to productivity claims, he said, cautioning, “AWUs can help model impact only when it is explicitly wired to measurable KPIs such as cycle-time reduction, faster resolution, or admin hours saved.”
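
Neither Salesforce nor Gogia has published such a model, but the cost-per-work-unit framing can be sketched with hypothetical figures, segmented by validation status so that only verified work is priced as “useful”:

```python
# Illustrative only: cost per work unit segmented by validation status,
# along the lines Gogia describes. The status buckets (treated here as
# disjoint), dollar figure, and unit counts are all hypothetical.

def cost_per_unit(total_inference_cost: float, units_by_status: dict[str, int]) -> dict[str, float]:
    """Blended cost per work unit, and cost per validated unit only."""
    total_units = sum(units_by_status.values())
    validated = units_by_status.get("validated", 0)
    return {
        "cost_per_unit_all": total_inference_cost / total_units if total_units else float("inf"),
        "cost_per_validated_unit": total_inference_cost / validated if validated else float("inf"),
    }

# Example: $4,800 of monthly inference spend across one department's agents.
print(cost_per_unit(4_800.0, {"attempted": 5_000, "succeeded": 3_500, "validated": 2_000}))
# {'cost_per_unit_all': 0.457..., 'cost_per_validated_unit': 2.4}
```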

Other analysts see the metric as a step in the right direction for recalibrating usage, pricing, and value in agentic offerings — part of the broader SaaSpocalypse reckoning, where seat-based SaaS models are fraying and vendors are being pushed toward outcome-based pricing.

Much like when clicks and likes became convenient stand-ins for success in the absence of better metrics, AWU feels like a necessary placeholder for the agentic era, according to Constellation Research’s Miller.

Those early measures helped get the online media industry started but never proved durable indicators of real value, Miller said, adding that AWU may serve the same “better than nothing” role today, with the open question of whether it matures into a metric that actually endures.

However, Moor Insights and Strategy’s Kramer was more optimistic about the metric itself: “The concept likely will, even if the specific term does not, become an industry gold standard that CIOs will use as an evaluation criterion in the future.”

He did warn, though, that there is a risk of fragmentation if every vendor defines a similar metric differently: “Outcome-oriented AI metrics will likely appear in RFPs, but enterprises should define their own versions of a completed agentic task and how to verify it.”