LLMs Have Eaten the Internet — Now They’re Starving on Themselves

I’ve spent my career swimming in data — as former Chief Data Officer at Kaiser Permanente, UnitedHealthcare, and Optum — and at one point, I had oversight of nearly 70% of all of America’s healthcare claims. So when I tell you the problem with enterprise AI isn’t the model architecture but the data the models are being fed, believe me: I’ve seen it firsthand.

LLMs are already peaking

The cracks are already showing in LLMs. Take GPT-5. Its launch was plagued with complaints: it failed basic math, missed context that earlier versions handled with ease, and left paying customers calling it “bland” and “generic.” OpenAI even had to restore an older model after users rejected its colder, checklist-driven tone. After two years of delays, many started asking if OpenAI had lost its edge — or if the entire LLM approach was simply hitting a wall.

Meta’s LLaMA 4 tells a similar story. In long-context tests — the kind of work enterprises actually need — the Maverick variant showed no improvement over LLaMA 3, and Scout performed “downright atrociously.” Meta claimed these models could handle millions of tokens; in reality, they struggled with just 128,000. Meanwhile, Google’s Gemini sailed past 90% accuracy at the same scale.

The data problem no one wants to admit

Instead of confronting the limits we’re already seeing with LLMs, the industry keeps scaling up — pouring more compute and electricity into these models. And yet, despite all that power, the models aren’t getting any smarter.

The reason is simple: the internet data these models are built on has already been scraped, cleaned, and trained on over and over again, to death. That’s why new releases feel flat — there’s little new left to learn. Every cycle just recycles the same patterns back into the model. They’ve already eaten the internet. Now they’re starving on themselves.

Meanwhile, the real gold mine of intelligence — private enterprise data — sits locked away. LLMs aren’t failing for lack of data — they’re failing because they don’t use the right data. Think about what’s needed in healthcare: claims, medical records, clinical notes, billing, invoices, prior authorization requests, call center transcripts — the information that actually reflects how businesses and industries are run. 

Until models can train on that kind of data, they’ll always run out of fuel. You can stack parameters, add GPUs, and pour electricity into bigger and bigger models, but it won’t make them smarter. 

Small language models are the future

The way forward isn’t bigger models. It’s smaller, smarter ones. Small Language Models (SLMs) are designed to do what LLMs can’t: learn from enterprise data and focus on specific problems.

Here’s why they work.

First, they’re efficient. SLMs have fewer parameters, which means lower compute costs and faster response times. You don’t need a data center full of GPUs just to get them running.

Second, they’re domain-specific. Instead of trying to answer every question on the internet, they’re trained to do one thing well — like HCC risk coding, prior authorizations, or medical coding. That’s why they deliver accuracy in places where generic LLMs stumble.

Third, they fit enterprise workflows. They don’t sit on the outside as a shiny demo. They integrate with the data that actually drives your business — billing data, invoices, claims, clinical notes — and they do it with governance and compliance in mind.

The future isn’t bigger — it’s smaller

I’ve seen this movie before: massive investments, endless hype, and then the realization that scale alone doesn’t solve the problem.

The way forward is to fix the data problem and build smaller, smarter models that learn from the information enterprises already own. That’s how you make AI useful — not by chasing size for its own sake. And I’m not the only one saying it. Even NVIDIA’s own researchers now say the future of agentic AI belongs to small language models.

The industry can keep throwing GPUs at ever-larger models, or it can build better ones that actually work. The choice is obvious.

Photo: J Studios, Getty Images


Fawad Butt is the co-founder and CEO of Penguin Ai. He previously served as the Chief Data Officer at Kaiser Permanente, UnitedHealthcare, and Optum, leading the industry’s largest team of data and analytics experts and managing a multi-hundred-million-dollar P&L.

This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers.
