It’s not your AI that’s failing. It’s your data

What do baseball’s Mario Mendoza and AI have in common? A 20% success rate. Mendoza’s batting average established the Mendoza Line: shorthand for barely acceptable performance. Across industries, four out of five AI initiatives still fall short of expectations. More often than not, it’s not the AI that’s failing, but a lack of data readiness.

Companies rush to embark on ambitious AI-driven transformation projects in search of increased efficiency, revenue or other benefits, but overlook data readiness as a fundamental prerequisite. Gartner predicts that through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data.

So, before investing in another model or tool, make sure your data is ready.

Establish a data foundation based on well-understood relationships and ground truth

Companies that succeed with AI use clean data pipelines, integrated data lakes and a shared understanding of what their data means. Even in 2024, Secoda reported that 68% of enterprise data remained untapped for analysis and innovation. When most of your organization’s knowledge is locked away, your algorithms draw from a shallow, murky pool.

When it comes to data readiness, Walmart offers a paradigm of patience. They spent years linking supply-chain, point-of-sale and vendor data. This robust foundation made subsequent AI rollouts reliably smooth. It also helped shave costs, limit stock outages and streamline deliveries. Establishing and describing clear relationships across the various data sources within your organization ensures that AI will properly understand and interpret data across your business landscape.

It is also critical to establish ground truth datasets when training AI models. Ground truth isn’t just labeled data—it’s your organization’s expertise turned into something machines can learn from to create scalable processes. Begin with clear ontologies and label taxonomies that map directly to the business problem you’re solving. Finally, make sure your training data has sufficient quantity, quality and diversity, so that model performance stays consistent across a variety of scenarios and bias is mitigated, especially for use cases with compliance implications, like human resources or healthcare.
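In practice, the starting point can be as simple as a taxonomy encoded as a data structure plus a validation pass over labeled examples. The sketch below assumes a hypothetical HR resume-screening use case; the category and label names are illustrative, not from the article.

```python
from collections import Counter

# Hypothetical label taxonomy for an HR screening use case:
# each top-level category maps to the allowed fine-grained labels.
TAXONOMY = {
    "qualified": {"strong_match", "partial_match"},
    "not_qualified": {"missing_skills", "insufficient_experience"},
}

ALLOWED = {label for labels in TAXONOMY.values() for label in labels}

def validate_ground_truth(examples):
    """Reject labels outside the taxonomy, and report class balance
    so skew (a bias risk in compliance-sensitive domains) is visible."""
    for text, label in examples:
        if label not in ALLOWED:
            raise ValueError(f"Unknown label {label!r} for: {text[:40]}")
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return {label: counts[label] / total for label in ALLOWED}

examples = [
    ("10 years of Python, led two teams", "strong_match"),
    ("Knows SQL but no cloud experience", "missing_skills"),
    ("Three of five required skills", "partial_match"),
]
print(validate_ground_truth(examples))
```

A balance report like this makes it obvious when a class (here, `insufficient_experience`) has no coverage at all, which is exactly the kind of diversity gap that degrades performance in rare scenarios.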

“The takeaway is clear,” according to Typewiser. “Getting your data house in order is perhaps the unglamorous, yet most critical, step in AI adoption.”

Ensure data governance isn’t an afterthought

Although it can feel like governance slows you down, it actually speeds up approvals and cuts risk. Assign clear owners and stewards for your data, and codify the contracts, lineage and provenance from raw sources all the way to model outputs.

When you train models or build RAG pipelines on data sources, ensure you enforce data access and retention policies across downstream AI applications. Pay attention to sensitive personal data and obtain consent where regulations require it.
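What enforcing access and retention downstream can look like in a RAG pipeline: filter candidate documents against the user's entitlements and the retention policy before anything reaches the prompt. The records and the five-year retention window below are assumptions for the sketch; in a real system this metadata would live in your vector store, not an in-memory list.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical document records with access-control and ingestion metadata.
DOCS = [
    {"id": 1, "text": "Q3 revenue summary", "acl": {"finance"},
     "ingested": datetime(2025, 1, 10, tzinfo=timezone.utc)},
    {"id": 2, "text": "Employee health records", "acl": {"hr"},
     "ingested": datetime(2018, 5, 2, tzinfo=timezone.utc)},
]

RETENTION = timedelta(days=5 * 365)  # assumed 5-year retention policy

def authorized_context(docs, user_groups, now=None):
    """Drop documents the user can't access, or that have aged out of
    retention, BEFORE they are handed to the model as context."""
    now = now or datetime.now(timezone.utc)
    return [
        d for d in docs
        if d["acl"] & user_groups and (now - d["ingested"]) <= RETENTION
    ]
```

Note that the retention check applies even to users who have the right group: an expired record is excluded for everyone, which is the behavior most retention regulations actually require.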

Governance also plays a key role in overall system reliability. Clear ownership, reproducible documentation and auditable processes can limit the chaos of emergency data fixes. The so-called red tape of governance works as a velocity engine: reliable quality drives faster workflows.

Prevent temporal drift: The silent data decay undermining AI ROI

As a rule, wine ages well. Data typically doesn’t. Customers’ tastes shift, supply chains move and regulations tighten. The result is called drift: it’s the mismatch between what your AI thinks the world looks like and what it actually looks like.

Such drift comes in two varieties. Data drift arises when model inputs change in distribution—say, a shift in patient demographics. Concept drift occurs when the relationship between inputs and outcomes shifts—think clinical algorithms built ahead of a pandemic.

When left unchecked, both forms of drift erode ROI. InsightFinder describes an e-commerce company whose click-through rate dropped an unexpected 30% before anyone noticed.

Drift can reduce or even erase the model’s value to the business. In some cases, it can drive massive losses. At Zillow, a “valuation algorithm led the company to overestimate the value of the houses it purchased in Q3 and Q4 2021 by more than $500M.”

To maintain resilience, organizations must add monitors to their pipelines, run statistical drift tests on key variables, compare prediction outputs to real-world feedback and retrain models on rolling intervals. Some even deploy shadow models that learn alongside production systems and sound the alarm when outputs diverge.
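One widely used statistical drift test of the kind described above is the Population Stability Index (PSI), which compares a live feature's distribution against a training-time baseline; a common rule of thumb flags values above roughly 0.2. This is a generic sketch with synthetic data, not any vendor's implementation.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample (what the
    model trained on) and a live sample (what it sees in production)."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
baseline = [random.gauss(0, 1) for _ in range(5000)]
live_ok = [random.gauss(0, 1) for _ in range(5000)]
live_drifted = [random.gauss(1.0, 1) for _ in range(5000)]

print(psi(baseline, live_ok))       # near zero: no alarm
print(psi(baseline, live_drifted))  # well above 0.2: investigate
```

Running a check like this on each key input variable at a fixed cadence, and alerting when the index crosses your threshold, is the minimal version of the monitoring the paragraph recommends.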

The takeaway: your AI doesn’t fail in an instant. Its accuracy simply fades away. Establish metrics to catch data degradation before it costs you customers, your credibility or even a quarter’s worth of revenue.

The payoff of sustainable data readiness

The stakes are clear: organizations that neglect to spend time on data readiness risk joining the large percentage of AI deployments that struggle to meet ROI expectations or fail outright. Those who master these data readiness fundamentals will join the echelon of AI projects that succeed and deliver real business value.

This article is published as part of the Foundry Expert Contributor Network.