Researchers from the National Institutes of Health (NIH) have demonstrated that a multimodal AI can achieve high accuracy on a medical diagnostic quiz, yet struggles to describe medical images and explain the reasoning behind its answers. The findings suggest that chatbots may not be ready for prime time in medical diagnostics.

To evaluate AI’s potential in clinical settings, the research team tasked Generative Pre-trained Transformer 4 with Vision (GPT-4V) with answering 207 questions from the New England Journal of Medicine (NEJM) Image Challenge. This challenge, designed to help healthcare professionals test their diagnostic abilities, prompts users to select a diagnosis from multiple-choice options after reviewing clinical images and a text-based description of patient symptoms.

The researchers asked the AI to both answer the questions and provide a rationale for each answer, including a description of the image presented, a summary of current, relevant clinical knowledge, and step-by-step reasoning for how GPT-4V arrived at its answer.

Nine clinicians from various specialties were also tasked with answering the same questions, first in a closed-book environment with no access to external resources, then in an open-book setting where they could refer to external sources.

The research team then provided the clinicians with the correct answers and the AI’s responses, asking them to score GPT-4V’s ability to describe the images, summarize medical knowledge, and provide step-by-step reasoning.

The analysis revealed that both clinicians and the AI scored highly in choosing the correct diagnosis. In closed-book settings, the AI outperformed the clinicians, whereas humans outperformed the model in open-book settings.

Moreover, GPT-4V frequently made mistakes when explaining its reasoning and describing medical images, even in cases where it selected the correct answer.

Despite the study’s small sample size, the researchers noted that their findings highlight how multimodal AI could be used to provide clinical decision support.

“This technology has the potential to help clinicians augment their capabilities with data-driven insights that may lead to improved clinical decision-making,” said Zhiyong Lu, Ph.D., corresponding author of the study and senior investigator at NIH’s National Library of Medicine (NLM), in a press release. “Understanding the risks and limitations of this technology is essential to harnessing its potential in medicine.”

However, the research team emphasized the importance of assessing AI-based clinical decision support tools.

“Integration of AI into healthcare holds great promise as a tool to help medical professionals diagnose patients faster, allowing them to start treatment sooner,” explained Stephen Sherry, Ph.D., NLM acting director. “However, as this study shows, AI is not advanced enough yet to replace human experience, which is crucial for accurate diagnosis.”
