Managing Data Quality in an AI World

Each year, Monte Carlo surveys real data professionals about the state of their data quality. This year, we turned our gaze to AI, and the message was clear: managing data quality in an AI world is getting harder.

Data quality risks are evolving, but data quality management isn’t keeping pace.

Among the 200 data professionals polled about the state of enterprise AI, a staggering 91% said they were actively building AI applications, but two out of three admitted to not completely trusting the data these applications are built on. And “not completely” leaves a lot of room for error in the world of AI.

Far from pushing the industry toward better habits and more trustworthy outputs, the introduction of GenAI seems to have exacerbated the scope and severity of data quality problems.

The Core Issue

Why is this happening, and what can we do about it?

2024 State of Reliable AI Survey

The Wakefield Research survey, commissioned by Monte Carlo in April 2024, polled 200 data leaders and professionals as data teams grapple with the adoption of generative AI. Its findings include several key statistics that capture the current state of the AI race and how professionals feel about the technology:

Pressure from Leadership: 100% of data professionals feel pressure from their leadership to implement a GenAI strategy and/or build GenAI products.
Active Development: 91% of data leaders (VP or above) have built or are currently building a GenAI product.
Perceived Usefulness: 82% of respondents rated the potential usefulness of GenAI at least an 8 on a scale of 1–10, but 90% believe their leaders do not have realistic expectations for its technical feasibility or ability to drive business value.
Responsibility for Implementation: 84% of respondents indicate that it is the data team’s responsibility to implement a GenAI strategy, versus 12% whose organizations have built dedicated GenAI teams.

While AI is widely expected to be among the most transformative technological advancements of the last decade, these findings suggest a troubling disconnect between data teams and business stakeholders. More importantly, they suggest a risk that pressure from leadership will push teams into AI initiatives without a clear understanding of the data and infrastructure that power them.

The State of AI Infrastructure—and the Risks It’s Hiding

Even before the advent of GenAI, organizations were dealing with exponentially greater volumes of data than in decades past. Since adopting GenAI programs, 91% of data leaders report that both the number of applications and the number of critical data sources have increased even further, deepening the complexity and scale of their data estates in the process.

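There’s no single blueprint for a successful enterprise AI architecture. Survey results reveal how data teams are approaching it (the figures add up to more than 100% because many teams are pursuing several approaches at once):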

49% are building their own LLM
49% are using model-as-a-service providers like OpenAI or Anthropic
48% are implementing a retrieval-augmented generation (RAG) architecture
48% are fine-tuning models-as-a-service or their own LLMs

As the complexity of AI’s architecture and the data that powers it continues to expand, one perennial problem is expanding with it: data quality issues.

The Modern Data Quality Problem

While data quality has always been a challenge for data teams, this year’s survey results suggest the introduction of GenAI has exacerbated both the scope and severity of the problem. More than half of respondents reported experiencing a data incident that cost their organization more than $100K, and we didn’t even ask how many such incidents they had experienced. Previous surveys suggest an average of 67 data incidents per month, of varying severity.

That cost is even more alarming when you consider that 70% of data leaders surveyed also reported that it takes longer than four hours to find a data incident, and at least another four hours to resolve it.


But the real deal breaker is this: even with 91% of teams reporting that their critical data sources are expanding, an alarming 54% of teams surveyed still rely on manual testing or have no initiative in place at all to address data quality in their AI.

This anemic approach to data quality is likely to have a demonstrable impact on enterprise AI applications and data products in the coming months, allowing more data incidents to slip through the cracks, multiplying hallucinations, diminishing the safety of outputs, and eroding confidence in both the AI and the companies that build it.

Is Your Data AI-Ready?

While a lot has certainly changed over the last 12 months, one thing remains absolutely clear: if AI is going to succeed, data quality needs to be front and center.

“Data is the lifeblood of all AI — without secure, compliant, and reliable data, enterprise AI initiatives will fail before they get off the ground. The most advanced AI projects will prioritize data reliability at each stage of the model development life cycle, from ingestion in the database to fine-tuning or RAG.”
Lior Solomon, VP of Data at Drata

The success of AI depends on the data—and the success of the data depends on your team’s ability to efficiently detect and resolve the data quality issues that impact it.

By curating and pairing your own first-party context data with modern data quality management solutions like data observability, your team can mitigate the risks of building fast and deliver reliable business value for your stakeholders at every stage of your AI adventure.
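To make the contrast with manual testing more concrete, here is a minimal sketch of two automated checks of the kind data observability monitors run: freshness (has a table been updated recently?) and volume (is today’s row count in line with recent history?). It is an illustration only, not Monte Carlo’s product or API. The TableSnapshot structure, the table names, the 24-hour freshness limit, and the three-standard-deviation volume threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev
from typing import List, Optional


# Hypothetical snapshot of table metadata (an assumption for this sketch);
# a real tool would gather this from warehouse metadata or query logs.
@dataclass
class TableSnapshot:
    name: str
    last_updated: datetime
    daily_row_counts: List[int]  # oldest first; the last entry is today's load


def check_freshness(table: TableSnapshot, now: datetime, max_age: timedelta) -> Optional[str]:
    """Return an alert if the table has not been updated within max_age."""
    age = now - table.last_updated
    if age > max_age:
        return f"{table.name}: stale, last updated {age} ago (limit {max_age})"
    return None


def check_volume(table: TableSnapshot, z_threshold: float = 3.0) -> Optional[str]:
    """Return an alert if today's row count is more than z_threshold
    standard deviations away from the mean of the earlier days."""
    history, today = table.daily_row_counts[:-1], table.daily_row_counts[-1]
    if len(history) < 2:
        return None  # not enough history to estimate normal variation
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Perfectly constant history: any change at all is unusual.
        if today == mu:
            return None
        return f"{table.name}: row count changed from constant {mu:.0f} to {today}"
    z = (today - mu) / sigma
    if abs(z) > z_threshold:
        return f"{table.name}: row count {today} is {z:+.1f} std devs from mean {mu:.0f}"
    return None


def run_monitors(tables: List[TableSnapshot], now: datetime,
                 max_age: timedelta = timedelta(hours=24)) -> List[str]:
    """Run both checks on every table and collect any alerts, in table order."""
    alerts = []
    for table in tables:
        for result in (check_freshness(table, now, max_age), check_volume(table)):
            if result is not None:
                alerts.append(result)
    return alerts


# Hypothetical sample data for illustration only.
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
tables = [
    TableSnapshot("orders", now - timedelta(hours=2), [1000, 1020, 990, 1010, 1005]),
    TableSnapshot("embeddings_source", now - timedelta(hours=30), [500, 510, 495, 505, 120]),
]
alerts = run_monitors(tables, now)
for alert in alerts:
    print(alert)
```

On this sample data, the sketch leaves the orders table alone and flags the embeddings_source table twice: once because it is 30 hours stale and once because its row count collapsed. That is the kind of silent failure that would otherwise wait for someone to notice it by hand. The fixed thresholds keep the example short; in practice they would need to be tuned or learned per table.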

What can you do to improve data quality management in your organization?