Commonly used generative AI models, such as ChatGPT and DeepSeek R1, are highly vulnerable to repeating and elaborating on medical misinformation, according to new research.

Mount Sinai researchers published a study this month revealing that when fictional medical terms were inserted into patient scenarios, large language models accepted them without question — and went on to generate detailed explanations for entirely fabricated conditions and treatments.

Even a single made-up term can derail a conversation with an AI chatbot, said Dr. Eyal Klang, one of the study’s authors and Mount Sinai’s chief of generative AI. He and the rest of the research team found that introducing just one false medical term, such as a fake disease or symptom, was enough to prompt a chatbot to hallucinate and produce authoritative-sounding, yet wholly inaccurate, responses.

Dr. Klang and his team conducted two rounds of testing. In the first, chatbots were simply fed the patient scenarios; in the second, the researchers added a one-line cautionary note to the prompt, reminding the AI model that some of the information provided might be inaccurate.

Adding this prompt decreased hallucinations by about half, Dr. Klang said.
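The study’s exact wording is not reproduced in the article, but the safeguard it describes amounts to prepending a single cautionary sentence to the prompt before the patient scenario. The sketch below illustrates that idea in Python; the cautionary sentence, the scenario text, and the fictional condition name are all illustrative assumptions, not the researchers’ actual materials.

```python
# Minimal sketch of a prompt-level safeguard of the kind described in the study.
# The cautionary sentence below is a hypothetical stand-in for the one-line
# reminder the researchers added; it is not their exact wording.

CAUTION = (
    "Note: some of the medical terms in the following scenario may be "
    "inaccurate or entirely fabricated. Flag anything you cannot verify "
    "instead of explaining it as if it were real."
)

def build_prompt(patient_scenario: str, with_safeguard: bool = True) -> str:
    """Return the text to send to a chat model, optionally prefixed with the caution."""
    if with_safeguard:
        return f"{CAUTION}\n\n{patient_scenario}"
    return patient_scenario

if __name__ == "__main__":
    scenario = (
        "A 54-year-old patient presents with fatigue and a prior diagnosis "
        "of Glyphon's disease."  # invented condition, used here only for illustration
    )
    print(build_prompt(scenario, with_safeguard=False))
    print("---")
    print(build_prompt(scenario, with_safeguard=True))
```

In the study’s setup, the unguarded and guarded versions of each scenario were compared, and the added caution roughly halved hallucinated responses.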

The research team tested six large language models, all of which are “extremely popular,” he stated. For example, ChatGPT receives about 2.5 billion prompts per day from its users. People are also becoming increasingly exposed to large language models whether they seek them out or not — such as when a simple Google search delivers a Gemini-generated summary, Dr. Klang noted.

But the fact that popular chatbots can sometimes spread health misinformation doesn’t mean healthcare should abandon or scale back generative AI, he remarked.

Generative AI use is becoming more and more common in healthcare settings for good reason: these tools can speed up clinicians’ manual work during an ongoing burnout crisis, Dr. Klang pointed out.

“[Large language models] basically emulate our work in front of a computer. If you have a patient report and you want a summary of that, they’re very good. They’re very good at administrative work and can have very good reasoning capacity, so they can come up with things like medical suggestions. And you will see it more and more,” he stated.

It’s clear that novel forms of AI will become even more present in healthcare in the coming years, Dr. Klang added. AI startups are dominating the digital health investment market, companies like Abridge and Ambience Healthcare are surpassing unicorn status, and the White House recently issued an action plan to advance AI’s use in critical sectors like healthcare.

Some experts were surprised that the White House’s AI action plan didn’t have a greater emphasis on AI safety, given it’s a major priority within the AI research community. 

For instance, responsible AI use is a frequently discussed topic at industry events, and organizations focused on AI safety in healthcare — such as the Coalition for Health AI and Digital Medicine Society — have attracted thousands of members. Also, companies like OpenAI and Anthropic have dedicated significant amounts of their computing resources to safety efforts.

Dr. Klang noted that the healthcare AI community is well aware of the risk of hallucinations and is still working out how best to mitigate harmful outputs.

Moving forward, he emphasized the need for better safeguards and continued human oversight to ensure safety.

Photo: Andriy Onufriyenko, Getty Images
