Researchers from Mass General Brigham recently published findings in PAIN indicating that large language models (LLMs) do not exhibit race- or sex-based biases when recommending opioid treatments.
The team highlighted that, while biases are prevalent in many areas of healthcare, they are particularly concerning in pain management. Studies have shown that Black patients’ pain is often underestimated and undertreated by clinicians, while white patients are more likely to be prescribed opioids than other racial and ethnic groups. These disparities raise concerns that AI tools, including LLMs, could perpetuate or exacerbate such biases in healthcare.
To investigate how AI tools might either mitigate or reinforce biases, the researchers examined how LLM recommendations varied with patients’ race, ethnicity, and sex. They drew 40 real-world patient cases from the MIMIC-IV Note data set, each involving a complaint of headache or of abdominal, back, or musculoskeletal pain, and stripped them of all references to sex and race. Each case was then assigned a race category (American Indian or Alaska Native, Asian, Black, Hispanic or Latino, Native Hawaiian or Other Pacific Islander, or white) and a sex (male or female) at random, and the process was repeated until every combination was covered, yielding 480 unique cases (40 cases × 6 race categories × 2 sexes).
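To make the case-expansion step concrete, here is a minimal sketch of how the 480 variants could be generated. It assumes the de-identified notes are available as a list of dicts with hypothetical `case_id` and `text` fields; the exact templating and prompt wording used by the authors are not described in the article.

```python
from itertools import product

# Race and sex categories reported in the study
RACES = [
    "American Indian or Alaska Native", "Asian", "Black",
    "Hispanic or Latino", "Native Hawaiian or Other Pacific Islander", "White",
]
SEXES = ["male", "female"]

def expand_cases(deidentified_cases):
    """Cross each race/sex-stripped case with every race-sex combination.

    `deidentified_cases` is assumed to be a list of dicts holding a
    `case_id` and the de-identified note `text` (hypothetical field names).
    40 cases x 6 races x 2 sexes -> 480 variants.
    """
    variants = []
    for case, (race, sex) in product(deidentified_cases, product(RACES, SEXES)):
        variants.append({
            "case_id": case["case_id"],
            "race": race,
            "sex": sex,
            # Prepend the assigned demographics to the stripped note text;
            # the study's actual insertion format may differ.
            "text": f"Patient demographics: {race}, {sex}.\n{case['text']}",
        })
    return variants
```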
These cases were then analyzed by GPT-4 and Gemini, each of which assigned a subjective pain rating and made treatment recommendations. Neither model’s opioid treatment recommendations differed by race or sex. The two models did differ from each other, however: GPT-4 rated pain as “severe” more often than Gemini, while Gemini was more likely to recommend opioids.
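As an illustration of this kind of evaluation, the sketch below sends one demographic-tagged case to GPT-4 through the official OpenAI Python client and asks for a pain rating and an opioid recommendation. The prompt wording, the Gemini calls, and the way the authors scored the responses are not reproduced here; this is only a plausible shape for the experiment.

```python
from openai import OpenAI  # assumes the official `openai` Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the study's actual instructions are not quoted in the article.
PROMPT_TEMPLATE = (
    "You are assisting with pain management triage.\n\n"
    "{case_text}\n\n"
    "1. Rate the patient's subjective pain (mild, moderate, or severe).\n"
    "2. State whether you would recommend an opioid analgesic (yes/no) and why."
)

def assess_case(case_text: str, model: str = "gpt-4") -> str:
    """Query the model for one case variant and return its raw reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(case_text=case_text)}],
        temperature=0,  # keep outputs as repeatable as possible across variants
    )
    return response.choices[0].message.content
```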
While further validation is necessary, the researchers believe the results indicate that LLMs could help address biases in healthcare. “These results are reassuring in that patient race, ethnicity, and sex do not affect recommendations, indicating that these LLMs have the potential to help address existing bias in healthcare,” said co-first authors Cameron Young and Ellie Einchen, students at Harvard Medical School, in a press release.
However, the study has limitations. It categorized sex as a binary variable, omitting a broader gender spectrum, and it did not fully represent mixed-race individuals, leaving certain marginalized groups underrepresented. The team suggested future research should incorporate these factors and explore how race influences LLM recommendations in other medical specialties.
Marc Succi, MD, strategic innovation leader at Mass General Brigham and corresponding author of the study, emphasized the need for caution in integrating AI into healthcare. “There are many elements to consider, such as the risks of over-prescribing or under-prescribing medications and whether patients will accept AI-influenced treatment plans,” Succi said. “Our study adds key data showing how AI has the potential to reduce bias and improve health equity.”
Succi also noted the broader implications of AI in clinical decision support, suggesting that AI tools will serve as complementary aids to healthcare professionals. “In the short term, AI algorithms can act as a second set of eyes, running in parallel with medical professionals,” he said. “However, the final decision will always remain with the doctor.”
These findings offer important insights into the role AI could play in reducing bias and enhancing equity in pain management and healthcare overall.