Study Identifies Cost-Effective Strategies for Deploying Large Language Models in Healthcare
Efficient deployment of large language models (LLMs) at scale in healthcare can streamline clinical workflows and cut costs up to 17-fold without compromising reliability, according to a study published in npj Digital Medicine by researchers at the Icahn School of Medicine at Mount Sinai.
The research highlights the potential of LLMs to enhance clinical operations while addressing the financial and computational hurdles healthcare organizations face in scaling these technologies. To investigate solutions, the team evaluated 10 LLMs of varying sizes and capacities using real-world patient data. The models were tested on chained queries and increasingly complex clinical notes, with outputs assessed for accuracy, formatting quality, and adherence to clinical instructions.
“Our study was driven by the need to identify practical ways to cut costs while maintaining performance, enabling health systems to confidently adopt LLMs at scale,” said Dr. Eyal Klang, director of the Generative AI Research Program at Icahn Mount Sinai. “We aimed to stress-test these models, evaluating their ability to manage multiple tasks simultaneously and identifying strategies to balance performance and affordability.”
The team conducted over 300,000 experiments, finding that high-capacity models like Meta’s Llama-3-70B and GPT-4 Turbo 128k performed best, maintaining high accuracy and low failure rates. However, performance began to degrade as task volume and complexity increased, particularly beyond 50 tasks involving large prompts.
The study further revealed that grouping tasks—such as identifying patients for preventive screenings, analyzing medication safety, and matching patients for clinical trials—enabled LLMs to handle up to 50 simultaneous tasks without significant accuracy loss. This strategy also led to dramatic cost savings, with API costs reduced by up to 17-fold, offering a pathway for health systems to save millions annually.
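The intuition behind this grouping strategy can be sketched in a few lines: when several tasks share one prompt, the fixed overhead of the prompt (the clinical note plus standing instructions) is paid once per batch rather than once per task. The sketch below is illustrative only; the batch size echoes the study's ~50-task ceiling, but the token counts, helper names, and cost model are assumptions, not the researchers' actual implementation.

```python
# Illustrative sketch of grouping clinical tasks into one prompt.
# BATCH_SIZE mirrors the ~50-task limit reported in the study; the
# token figures below are made-up assumptions for demonstration.

BATCH_SIZE = 50                # tasks grouped per prompt
PROMPT_OVERHEAD_TOKENS = 900   # shared context: clinical note + instructions
TOKENS_PER_TASK = 60           # marginal tokens each added task contributes

def build_batched_prompt(note: str, tasks: list[str]) -> str:
    """Combine one clinical note with several task instructions in a single prompt."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(tasks))
    return f"Clinical note:\n{note}\n\nAnswer each task separately:\n{numbered}"

def token_cost(num_tasks: int, batched: bool) -> int:
    """Rough input-token total for num_tasks tasks, grouped vs. one call per task."""
    if batched:
        batches = -(-num_tasks // BATCH_SIZE)  # ceiling division
        return batches * PROMPT_OVERHEAD_TOKENS + num_tasks * TOKENS_PER_TASK
    return num_tasks * (PROMPT_OVERHEAD_TOKENS + TOKENS_PER_TASK)

if __name__ == "__main__":
    n = 50
    separate = token_cost(n, batched=False)
    grouped = token_cost(n, batched=True)
    print(f"{n} tasks: {separate} input tokens separately, "
          f"{grouped} grouped ({separate / grouped:.1f}x fewer)")
```

Because API pricing scales with token volume, sharing the prompt overhead across a batch is what drives the savings; the exact multiple depends on how large the shared context is relative to each task.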
“Understanding where these models reach their cognitive limits is critical for ensuring reliability and operational stability,” said Dr. Girish N. Nadkarni, co-senior author and director of The Charles Bronfman Institute of Personalized Medicine. “Our findings pave the way for the integration of generative AI in hospitals while accounting for real-world constraints.”
Beyond cost efficiency, the study underscores the potential of LLMs to automate key tasks, conserve resources, and free up healthcare providers to focus more on patient care.
“This research highlights how AI can transform healthcare operations. Grouping tasks not only cuts costs but also optimizes resources that can be redirected toward improving patient outcomes,” said Dr. David L. Reich, co-author and chief clinical officer of the Mount Sinai Health System.
The research team plans to explore how LLMs perform in live clinical environments and assess emerging models to determine whether advancements in AI technology can expand their cognitive thresholds.