Science & Technology

AI outperforms doctors in medical diagnosis evaluation

2024.12.15 03:04:23 Justin Yu

[An image of ChatGPT. Image Credit to Pixabay]

A study published in the journal JAMA Network Open in late October found that ChatGPT demonstrated superior medical diagnostic capabilities compared to human physicians when assessing clinical case histories.

OpenAI's chatbot achieved 90 percent diagnostic accuracy when analyzing medical case reports, outperforming the physicians, who averaged 76 percent.

The research involved 50 medical professionals—26 attending physicians and 24 residents—who attempted to diagnose six medical conditions. 

The study methodology was straightforward: researchers presented the participants with six patient case histories and evaluated their diagnostic accuracy.

When working without AI, the doctors averaged 74 percent accuracy, while those using ChatGPT improved only slightly, to 76 percent.

In contrast, the AI chatbot, working independently, achieved a 90 percent diagnostic accuracy rate.

Perhaps more revealing than the accuracy rates themselves was the finding that physicians tended to stubbornly adhere to their initial diagnoses, even when the AI presented potentially more accurate alternatives.

The study also suggests that, despite increasing exposure to artificial intelligence tools, most doctors still lack the skills to use chatbots effectively, failing to capitalize on AI's potential to help solve complex diagnostic challenges.

Dr. Adam Rodman, an internal medicine expert from Beth Israel Deaconess Medical Center, envisions AI systems as valuable doctor extenders providing critical second opinions on diagnoses. 

However, he notes that substantial progress is still needed before this potential can be fully realized.

The findings also point to a long-standing challenge in medical diagnostics: the difficulty of understanding how physicians actually reason through a case.

When asked to explain their diagnostic reasoning, doctors often resort to vague explanations like "intuition" or "based on my experience." 

This imprecise description of clinical decision-making has long frustrated researchers attempting to develop computer programs capable of mimicking medical diagnostic reasoning. 

The evolution of diagnostic technology provides important context.

By the mid-1990s, approximately six computer programs had been developed to attempt medical diagnostics, yet none gained meaningful adoption in clinical practice. 

However, the emergence of large language models like ChatGPT fundamentally transformed the diagnostic technology landscape. 

Unlike previous attempts, these models do not aim to explicitly mimic medical reasoning; instead, their diagnostic capabilities stem from sophisticated language prediction algorithms. 

Following the study, Dr. Rodman conducted a more detailed investigation, carefully examining message logs exchanged between physicians and ChatGPT. 

His findings revealed that even when the chatbot provided well-reasoned diagnostic suggestions, doctors frequently remained unpersuaded when the AI’s insights contradicted their own diagnostic assumptions.

This resistance to AI-generated insights raises important questions about the future integration of artificial intelligence in medical practice and the need for better training in leveraging these powerful diagnostic tools.

Justin Yu / Grade 9
Seoul International School