Why Machine Learning Challenges Still Matter in the Age of Generative AI | by Angelica Lo Duca | Aug, 2025

Data Science, Artificial Intelligence

Three years after publishing Comet for Data Science, I’m revisiting the timeless challenges of Machine Learning, now amplified in the age of Generative AI.


Photo by Nahrizul Kadri on Unsplash

It’s been three years since I published my first book, Comet for Data Science, and so much has changed. Back then, Machine Learning was the hottest topic in AI. Today, it almost feels like old news. Everywhere you look, people are talking about Generative AI: large language models, image generators, and copilots. It’s easy to think that traditional Machine Learning has been left behind.

But here’s the truth: the challenges we faced in Machine Learning are still here, and they’re just as relevant for Generative AI. Data quality, model reliability, overfitting, explainability… these aren’t relics of the past. They’re the foundation on which modern Generative AI systems are being built.

That’s why I decided to revisit the challenges I outlined in Chapter 8 of Comet for Data Science and reframe them through the lens of today’s Generative AI revolution. Because while the buzzwords have changed, the hard problems remain, only bigger, messier, and more urgent.

In this post, I’ll walk you through the timeless challenges of Machine Learning, from data headaches to model explainability, and show you how they map onto today’s Generative AI world.

Data Challenges

If there’s one truth that hasn’t changed in the shift from traditional Machine Learning to Generative AI, it’s this: your model is only as good as your data. In fact, the challenges we struggled with in ML, such as insufficient quantity, poor quality, lack of representativeness, and data drift, are even more critical today. Let’s unpack each of them in turn.

1. Insufficient Quantity of Data

In the classic ML world, small datasets often meant models that couldn’t generalize. A regression model trained on a few hundred rows of data couldn’t capture the complexity of real-world phenomena.

With Generative AI, the problem looks different but hasn’t disappeared. Foundation models are trained on billions of tokens, but the moment you move to a specialized domain, such as medicine, law, finance, or manufacturing, you quickly realize that relevant, high-quality domain data is scarce. Fine-tuning a large model on a few thousand examples often leads to overfitting or instability. The scale has changed, but the scarcity problem persists, and it remains an open question for the community.

2. Poor Quality of Data

Traditional ML practitioners dreaded messy data: duplicates, outliers, missing values, or inconsistent labels could ruin performance. Cleaning data was often the most time-consuming part of any project.

In Generative AI, the stakes are higher. Training datasets are so large that manual cleaning is impossible, yet the web-scale corpora used to build LLMs are filled with spam, toxic language, bias, and outright misinformation. A mislabeled row in a CSV might hurt a small classifier; a flood of low-quality or harmful content in a large dataset can skew the behavior of a model in ways that are subtle, hard to detect, and very difficult to reverse once the model is trained.
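To make the cleaning problem concrete, here is a minimal, purely illustrative sketch of the kind of heuristic filtering applied to web-scale corpora: exact deduplication by content hashing, a length floor, and a crude keyword blocklist. The thresholds and blocklist terms are my own assumptions for illustration, not any production recipe.

```python
# Illustrative heuristic corpus filtering, loosely in the spirit of the
# cleaning pipelines used for web-scale pretraining data. The rules and
# thresholds here are assumptions, not a production recipe.
import hashlib

BLOCKLIST = {"viagra", "free money"}  # hypothetical spam markers

def keep_document(text: str, seen_hashes: set) -> bool:
    """Return True if the document passes basic quality filters."""
    # 1. Exact-duplicate removal via content hashing.
    digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    # 2. Length filter: drop fragments too short to carry signal.
    if len(text.split()) < 5:
        return False
    # 3. Crude spam keyword filter.
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return False
    return True

corpus = [
    "Large language models are trained on web-scale corpora.",
    "Large language models are trained on web-scale corpora.",  # duplicate
    "free money click here now!!!",                             # spam
    "too short",                                                # fragment
]
seen = set()
cleaned = [doc for doc in corpus if keep_document(doc, seen)]
print(len(cleaned))  # 1
```

Real pipelines layer far more on top (near-duplicate detection, classifier-based quality scoring, toxicity models), but the principle is the same: filtering must be automated, and every heuristic silently shapes what the model learns.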

Poor data quality carries over into fine-tuning as well: a model fine-tuned on noisy or mislabeled domain data can give wrong answers to domain-specific questions, or hallucinate with even greater confidence.

3. Non-Representative Data

Bias in training data has always been a challenge. A face recognition model trained primarily on light-skinned faces will perform poorly, and unfairly, on darker-skinned individuals. In ML, this was already a problem of equity and accuracy.

In GenAI, the issue is amplified. A language model trained disproportionately on English content will struggle with underrepresented languages, dialects, or cultural contexts. Worse, because these models are deployed at scale, their biases aren’t just statistical errors; they can reinforce stereotypes, exclude communities, and shape discourse in society. Representativeness is no longer just a technical concern; it’s an ethical one.

4. Data Drift

Data drift occurs when the world changes but your training data doesn’t. This problem has long been the enemy of production ML systems. A credit scoring model built on five-year-old data might not reflect current economic realities.

For Generative AI, drift is exponential. Language, culture, facts, and knowledge evolve daily. An LLM trained last year may already be outdated, missing current events, new scientific discoveries, or changing cultural norms. Users expect these systems to be up to date, but retraining or fine-tuning massive models is exceptionally costly. Drift in GenAI isn’t just a performance issue; it’s a question of trust.
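One way to make drift measurable is the Population Stability Index (PSI), a common monitoring metric for production ML features. Below is a standard-library-only sketch; the ten-bin layout and the 0.2 alert threshold are widely used rules of thumb, not hard standards.

```python
# A minimal sketch of data-drift detection using the Population Stability
# Index (PSI). Bin count and the 0.2 alert threshold are rules of thumb.
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of a numeric feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0
    def frac(sample, b):
        count = sum(1 for x in sample
                    if lo + b * width <= x < lo + (b + 1) * width
                    or (b == bins - 1 and x == hi))
        return max(count / len(sample), 1e-6)  # avoid log(0) on empty bins
    return sum((frac(actual, b) - frac(expected, b))
               * math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))

train = [i / 100 for i in range(100)]             # uniform on [0, 1)
live_ok = [i / 100 + 0.01 for i in range(100)]    # nearly identical
live_drift = [i / 100 + 0.5 for i in range(100)]  # shifted distribution

print(psi(train, live_ok) < 0.2)     # True: no drift alert
print(psi(train, live_drift) > 0.2)  # True: drift alert
```

For LLMs the monitored signal is less tidy (embedding distributions, topic mixes, refusal rates rather than a single numeric feature), but the logic is identical: compare what the model sees today against what it was trained on, and alert when they diverge.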

Model Challenges

Data is only half the battle. Once you start building models, a new set of challenges emerges. In traditional Machine Learning, these often revolved around overfitting, low performance, or computational costs. With Generative AI, the same issues still apply.

1. Overfitting and Underfitting

In classical ML, overfitting was usually easy to spot: a model performed brilliantly on the training set but failed on new data. Underfitting, on the other hand, meant the model was too simple to capture the patterns in the data.

Generative AI introduces new shades of this problem. Overfitting in LLM fine-tuning can lead to models that “parrot” the training data, memorizing entire chunks of text or reproducing sensitive information word for word. Underfitting, meanwhile, manifests as models that ignore domain-specific fine-tuning and revert to their generic, pre-trained behavior. The line between a model that generalizes well and one that either memorizes or ignores training data is now far blurrier.
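The "parroting" failure mode can be probed directly: flag generations that reproduce long word sequences from the fine-tuning set verbatim. Here is a toy sketch using n-gram overlap; the 8-token window is an illustrative threshold I chose, not an established standard.

```python
# A minimal verbatim-memorization check: flag generations that reproduce
# long n-grams from the fine-tuning set word for word. The 8-token window
# is an illustrative threshold, not an established standard.
def ngrams(text: str, n: int):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def memorized(generation: str, training_docs, n: int = 8) -> bool:
    """True if any n-gram of the generation appears verbatim in training data."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return bool(ngrams(generation, n) & train_grams)

train_docs = ["the patient was prescribed 20 mg of drug X twice daily for two weeks"]
novel = "the model suggests a tapered dose adjusted to the patient's weight"
copied = "records show the patient was prescribed 20 mg of drug X twice daily"

print(memorized(novel, train_docs))   # False
print(memorized(copied, train_docs))  # True
```

A check like this matters most when the fine-tuning data contains sensitive information: a model that can be coaxed into emitting training text word for word is a privacy risk, not just an overfit one.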

2. Low Performance

With traditional ML, low performance often meant poor accuracy, precision, or recall. The solution was to tune hyperparameters, try different algorithms, or engineer better features.

In Generative AI, measuring performance is far more complicated. What does “accuracy” mean when a model generates free-form text, images, or code? Evaluating creativity, relevance, or factual correctness is inherently subjective. And yet, performance matters deeply: an LLM that “hallucinates” answers can be dangerous in medicine or law, while a generative image model that misses subtle details can undermine trust. The challenge is not only to improve models but to define the metrics by which we measure success.
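One pragmatic, if imperfect, answer is lexical-overlap scoring. The sketch below implements token-level F1 against a reference answer, in the style popularized by reading-comprehension benchmarks such as SQuAD. It measures word overlap only, so it is a proxy for quality, not a measure of factual correctness.

```python
# Token-level F1 between a generated answer and a reference, a simple
# lexical-overlap metric. It rewards shared words, not factual correctness.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "insulin lowers blood glucose"
print(token_f1("insulin lowers blood glucose", reference))              # 1.0
print(round(token_f1("insulin raises blood pressure", reference), 2))   # 0.5
```

Note the second example: a dangerously wrong answer still scores 0.5 because it shares half its words with the reference. That gap between lexical similarity and correctness is exactly why GenAI evaluation remains an open problem.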

3. Computational Cost and Efficiency

In ML, training a large model or running extensive cross-validation was expensive but manageable for most practitioners with the right infrastructure. Parallelization, GPUs, and cloud resources helped mitigate costs.

In GenAI, costs are astronomical. Training a foundation model can run into millions of dollars in compute, energy, and engineering time. Even fine-tuning or running inference at scale can overwhelm smaller teams. Efficiency isn’t just a “nice to have” anymore; it’s often the deciding factor between being able to deploy a system or not. Techniques like parameter-efficient fine-tuning (LoRA, adapters) and model distillation are emerging, but the challenge remains: how do you balance performance with sustainability and accessibility?
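A back-of-the-envelope calculation shows why LoRA-style methods are so much cheaper: instead of updating a full d × k weight matrix, they train two low-rank factors of shape d × r and r × k. The layer sizes below are illustrative, not taken from any particular model.

```python
# Back-of-the-envelope trainable-parameter counts for full fine-tuning
# versus a rank-r LoRA adapter on a single d x k weight matrix.
# The layer dimensions are illustrative assumptions.
def full_finetune_params(d: int, k: int) -> int:
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    return d * r + r * k  # two low-rank factors: (d x r) and (r x k)

d, k, r = 4096, 4096, 8  # hypothetical attention projection, rank-8 adapter
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(full)                         # 16777216 trainable weights
print(lora)                         # 65536 trainable weights
print(round(100 * lora / full, 2))  # 0.39 (% of the full matrix)
```

Training well under one percent of the weights per layer is what makes adapting a large model feasible on a single GPU, which is why these techniques have spread so quickly among smaller teams.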

4. Concept Drift

Concept drift occurs when the relationship between inputs and outputs changes over time, a problem that has long plagued ML in production. Consumer behavior shifts, for example, and old models lose their predictive power.

In Generative AI, concept drift takes on new forms. Language evolves, cultural references change, and the “ground truth” of facts shifts daily. A chatbot trained in 2022 may fail to understand memes, slang, or news from 2024. Worse, users often assume GenAI systems “know everything,” making outdated responses more problematic. Unlike classical ML models that could be retrained periodically, updating massive foundation models is far from trivial and often economically unfeasible.
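Even when retraining is off the table, drift can at least be detected. The sketch below monitors accuracy over a sliding window of recent predictions and raises an alert when it decays past a threshold; the window size and the 10-point drop are illustrative assumptions, not established defaults.

```python
# A minimal concept-drift monitor: track accuracy over a sliding window of
# recent predictions and alert when it drops well below the initial baseline.
# The window size and 10-point drop threshold are illustrative assumptions.
from collections import deque

class DriftMonitor:
    def __init__(self, window: int = 100, max_drop: float = 0.10):
        self.baseline = None            # accuracy on the first full window
        self.max_drop = max_drop
        self.recent = deque(maxlen=window)

    def record(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if drift is suspected."""
        self.recent.append(correct)
        if len(self.recent) < self.recent.maxlen:
            return False                # not enough data yet
        acc = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            self.baseline = acc         # freeze the first full-window accuracy
            return False
        return self.baseline - acc > self.max_drop

monitor = DriftMonitor(window=100, max_drop=0.10)
alerts = [monitor.record(True) for _ in range(100)]         # baseline ~1.0
alerts += [monitor.record(i % 2 == 0) for i in range(100)]  # accuracy ~0.5
print(any(alerts[:100]), any(alerts[100:]))  # False True
```

For generative systems the "correct" signal is harder to obtain (thumbs-up rates, automated fact checks, human review samples), but the monitoring pattern is the same: watch for decay, then decide whether retrieval augmentation, fine-tuning, or retraining is the affordable fix.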

Explainability Challenges

If data challenges are about what goes in and model challenges are about how the system learns, explainability is about what comes out and how we make sense of it. In traditional ML, the push for interpretability led to tools like SHAP or LIME, which helped us understand the influence of individual features. With Generative AI, the problem is deeper and more urgent: how do you explain the behavior of models that generate entire paragraphs of text, complex images, or working code?
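The core intuition behind tools like LIME is simple: perturb the input and watch how the output changes. Here is a toy leave-one-out sketch of that idea, using a keyword-based sentiment scorer as a stand-in for a real model, purely for illustration.

```python
# Leave-one-out word attribution: the intuition behind perturbation-based
# explainability tools like LIME. The toy sentiment scorer below stands in
# for a real model, purely for illustration.
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"terrible", "awful", "hate"}

def toy_sentiment(text: str) -> float:
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def word_importance(text: str):
    """Importance of each word = score change when that word is removed."""
    words = text.split()
    base = toy_sentiment(text)
    return {
        w: base - toy_sentiment(" ".join(words[:i] + words[i + 1:]))
        for i, w in enumerate(words)
    }

scores = word_importance("the service was great but the food was terrible")
print(scores["great"], scores["terrible"])  # 1 -1
```

This works because the toy model is additive over words. The trouble with LLMs is precisely that they are not: the contribution of a token depends on everything around it, which is why perturbation methods that are cheap for small classifiers become both expensive and ambiguous at generative scale.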

1. The Black Box Problem, Amplified

A decision tree can be visualized; a linear regression can be read like an equation. Even neural networks, though opaque, can be probed with feature attribution methods. But with LLMs and multimodal models containing hundreds of billions of parameters, the inner workings are beyond human comprehension.

This opacity matters. When a generative model hallucinates, where did the error come from? Was it the training data, the fine-tuning set, the decoding strategy? The scale and complexity of these systems make it nearly impossible to answer with confidence.

2. Feature Importance vs. Emergent Behavior

In classic ML, explainability often meant measuring how much each feature contributed to a prediction. In GenAI, the “features” aren’t just age, salary, or word counts; they’re embeddings spread across massive parameter spaces. What emerges from those embeddings isn’t a neat mapping but behaviors: creativity, reasoning, style, bias.

Trying to attribute these emergent properties back to specific inputs is like trying to explain a novel by analyzing the frequency of its letters. We need new forms of interpretability that go beyond feature attribution and focus on understanding patterns of behavior at scale.

3. Trust, Accountability, and Regulation

Explainability has always been linked to trust: users are more likely to accept model outputs when they understand how they were produced. In critical domains like finance or healthcare, regulators often require explanations.

Generative AI raises the stakes. A model that generates a wrong classification might cause inconvenience; a model that produces false medical advice, misleading legal arguments, or biased imagery can cause real harm. Explainability here isn’t optional; it’s essential for safety, compliance, and societal trust. Yet, our current tools are far behind the needs of the technology.

4. Towards New Paradigms of Explainability

The community is experimenting with new approaches: probing models with synthetic tests, using smaller interpretable models to approximate LLM behaviors, and designing evaluation frameworks that measure biases, toxicity, or factuality. But these are only first steps. Explainability in Generative AI is about finding entirely new metaphors for what interpretability means in systems that create.

Three years ago, when I first wrote about Machine Learning, these challenges — messy data, fragile models, black-box decisions — felt like the core obstacles to building intelligent systems. Fast forward to today, and the world is buzzing about Generative AI. The scale has exploded, the architectures have evolved, and the possibilities feel endless. And yet, the challenges remain.

Data challenges are bigger than ever: scarcity in niche domains, oceans of low-quality web text, cultural biases, and constant drift.

Model challenges have multiplied: overfitting now means memorization at scale, performance is hard even to measure, costs are staggering, and concept drift happens faster than ever.

Explainability challenges are existential: we’ve moved from interpreting a decision tree to trying to make sense of emergent behavior in hundred-billion-parameter models.

Generative AI may feel like a revolution, but it’s also a continuation of the same journey we started with Machine Learning. The core lesson hasn’t changed: intelligent systems are only as strong as the data they learn from, the models we build, and the trust we can place in their outputs.

By revisiting these timeless challenges through the lens of GenAI, we can remind ourselves that progress isn’t just about bigger models or flashier demos. It’s about solving the complex, unglamorous problems that have always defined AI: making it reliable, fair, efficient, and understandable.

Because in the end, whether we call it ML or GenAI, the questions are the same: Can we trust this system? Can we understand it? And can we use it responsibly?


