AI chatbots aimed at general-purpose consumer use routinely misdiagnose illnesses when presented with incomplete patient data, a study has found, highlighting the risks of an increasingly common use case for such tools.
For all models tested, the study from the Massachusetts-based Mass General Brigham healthcare system found that failure rates were above 80 percent for differential diagnosis, in which possible conditions are weighed without full patient information.
In such cases, the models found it difficult to suggest a range of possible diagnoses, frequently narrowing to a single answer.
High error rates
The tools, including leading models from Anthropic, DeepSeek, Google, OpenAI and xAI, performed well when complete information was provided.
The study tested 21 LLMs in all, finding that error rates fell below 40 percent for final diagnoses made with more complete data.
The best performers recorded 90 percent accuracy, the researchers said in the study, which was published in JAMA Network Open on Monday.
Researchers tested AI models using 29 clinical vignettes based on a standard medical reference text.
The findings recall the persistent difficulty of limiting so-called “hallucinations” in AI models, in which a model produces incorrect information, often when it has limited access to relevant data.
Specialised tools
Anthropic, Google and OpenAI said they have safeguards built in to discourage the use of their models for clinical diagnoses.
People are nevertheless increasingly using such models in this way, with a poll in March finding that one in three US adults had turned to AI chatbots for medical advice in the past year.
Companies including Google and Amazon are developing chatbots specifically geared to deliver medical advice.