Why Data Quality Matters in the Age of Generative AI

Why Data Quality Matters in the Age of Generative AI

In the dynamic realm of data engineering, the integration of Generative AI is not just a distant aspiration; it’s a vibrant reality. With data serving as the catalyst for innovation, its creation, processing, and management have never been more crucial.

“While AI models are important, the quality of results we get are dependent on datasets, and if quality data is not tapped correctly, it will result in AI producing incorrect results. With the help of Gen AI, we are generating quality synthetic data for testing our models,” said Abhijit Naik, Managing Director, India co-lead for Wealth Management Technology at Morgan Stanley.

Speaking at AIM’s Data Engineering Summit 2024, Naik said that Gen-AI, machine learning, neural networks, and deep learning models that we have, is the next stage of automation post the RPA era.

“Gen AI will always generate results for you. And when it generates results for you, sometimes it hallucinates. So, what data you feed becomes very critical in terms of the data quality, in terms of the correctness of that data, and in terms of the details of data that we feed,” Naik said. 

However, it’s important to note that human oversight is crucial in this process, Naik added. When integrated carefully into existing pipelines and with appropriate human oversight, GenAI can augment human intelligence by identifying patterns and correlations that humans may miss.

The task of documenting every aspect of their functioning and the knowledge they derive from data is a complex one. This underscores the need for caution and thorough understanding when integrating Generative AI. 

Due to their vast size and training on extensive unstructured data, generative models can behave in unpredictable, emergent ways that are difficult to document exhaustively.  

“This unpredictability can lead to challenges in understanding and explaining their decisions” Naik said.

GenAI in Banking

Naik emphasised GenAI’s importance in the banking and finance sectors. “It can generate realistic synthetic customer data for testing models while addressing privacy and regulatory issues. This helps improve risk assessment,” he added.

This is especially critical when accurate data is limited, costly, or sensitive. A practical example could be creating transactional data for anti-fraud models.

Gen AI models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and massive language models like GPT, may generate synthetic data that mimics the statistical features of real-world datasets. 

For example, Capital One and JPMorgan Chase use GenAI to strengthen their fraud and suspicious activity detection systems. Morgan Stanley implemented an AI tool that helps its financial advisors find data anomalies in detecting fraud, and Goldman Sachs uses GenAI to develop internal software.  Customers globally can benefit from 24/7 accessible chatbots that handle and resolve concerns, assist with banking issues, and expedite financial transactions.

A recent study showed that banks that move quickly to scale generative AI across their organisations could increase their revenues by up to 600 bps in three years.

“Of course, the highly regulated nature of banking/finance requires careful model governance, oversight, explainability and integration into existing systems,” Naik concluded.

Originally Appeared Here