Generative AI’s Growing Role in Healthcare: Potential and Challenges
The rapid advancements in large language models (LLMs) have introduced generative AI tools into nearly every business sector, including healthcare.
As defined by the Government Accountability Office, generative AI is “a technology that can create content, including text, images, audio, or video, when prompted by a user.” These systems learn patterns and relationships from vast datasets, enabling them to generate new content that resembles but is not identical to the original training data. This capability is powered by machine learning algorithms and statistical models.
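To make the definition concrete, the toy sketch below samples new text from a small open-source language model. It uses the Hugging Face transformers library and GPT-2, chosen purely for illustration; the generated continuation resembles the model's training data without reproducing it, which is exactly the behavior the GAO definition describes.

```python
# Minimal illustration of generative text: a language model samples new
# tokens that statistically resemble its training data without copying it.
# Assumes the Hugging Face "transformers" package and the small GPT-2 model;
# this is a toy example, not a clinical-grade system.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The patient reports intermittent chest pain that"
result = generator(prompt, max_new_tokens=40, do_sample=True)

print(result[0]["generated_text"])
```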
In healthcare, generative AI is being applied across a range of use cases, including clinical documentation, patient communication, and clinical text summarization.
Streamlining Clinical Documentation
Excessive documentation is a leading cause of clinician burnout, as highlighted by a 2022 athenahealth survey conducted by the Harris Poll. Generative AI shows promise in easing these documentation burdens, potentially improving clinician satisfaction and reducing burnout.
A 2024 study published in NEJM Catalyst explored the use of ambient AI scribes within The Permanente Medical Group (TPMG). The technology pairs smartphone microphones with generative AI to transcribe patient encounters in real time and produce draft documentation for clinician review. In October 2023, TPMG deployed the technology across a range of settings, making it available to 10,000 physicians and staff.
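Conceptually, an ambient scribe chains speech-to-text with LLM note drafting. The sketch below approximates that pipeline with openly available components (openai-whisper and the OpenAI API); the NEJM Catalyst study does not describe TPMG's vendor or models, so every component and file name here is an assumption.

```python
# Sketch of an ambient-scribe pipeline: transcribe an encounter recording,
# then ask an LLM for a draft note for clinician review. This substitutes
# openly available components (openai-whisper, the OpenAI API); it is NOT
# the system TPMG deployed, whose vendor and models are not public here.
import whisper
from openai import OpenAI

# Step 1: speech-to-text on the recorded encounter (file name is hypothetical).
stt = whisper.load_model("base")
transcript = stt.transcribe("encounter.wav")["text"]

# Step 2: generate a draft SOAP note; a clinician reviews it before signing.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "Draft a SOAP note from this visit transcript. "
                    "Flag anything ambiguous for clinician review."},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```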
Physicians who used the ambient AI scribe reported positive outcomes, including more personal and meaningful patient interactions and reduced after-hours electronic health record (EHR) documentation. Early patient feedback was also favorable, with improved provider interactions noted. Additionally, ambient AI produced high-quality clinical documentation for clinician review.
However, a 2023 study in the Journal of the American Medical Informatics Association (JAMIA) cautioned that ambient AI might struggle with non-lexical conversational sounds (NLCSes), such as “mm-hm” or “uh-uh,” which can convey clinically relevant information. While the ambient AI tools achieved an overall word error rate of about 12%, their error rate on NLCSes was far higher, reaching up to 98.7% for those conveying clinically important information. Misinterpreting these sounds, for instance hearing “uh-uh” (no) as “uh-huh” (yes), could introduce inaccuracies into clinical documentation and create patient safety risks.
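For reference, word error rate is the word-level edit distance between the reference transcript and the model's output, divided by the reference length. The minimal sketch below implements standard WER (not the study's exact scoring pipeline) and shows how a single misheard NLCS can flip a clinically decisive answer.

```python
# Word error rate (WER): Levenshtein edit distance over word tokens,
# divided by the reference length. A low overall WER can coexist with
# very high error rates on rare but clinically meaningful tokens.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

# "uh-uh" (no) transcribed as "uh-huh" (yes) flips the clinical meaning.
print(wer("no pain uh-uh", "no pain uh-huh"))  # 0.33
```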
Enhancing Patient Communication
With the digital transformation in healthcare, patient portal messages have surged. A 2021 study in JAMIA reported a 157% increase in patient portal inbox messages since 2020. In response, some healthcare organizations are exploring the use of generative AI to draft replies to these messages.
A 2024 study published in JAMA Network Open evaluated the adoption of AI-generated draft replies to patient messages at an academic medical center. After five weeks, clinicians used the AI-generated drafts 20% of the time, a notable rate considering the LLMs were not fine-tuned for patient communication. Clinicians reported reduced task load and emotional exhaustion, suggesting that AI-generated replies could help alleviate burnout.
However, the study found no significant changes in reply time, read time, or write time between the pre-pilot and pilot periods. Even so, clinicians remained optimistic about time savings, suggesting that time metrics alone may not capture the lighter cognitive load of editing a draft compared with writing a reply from scratch.
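To illustrate the drafting step, here is a generic sketch of generating a draft reply with an LLM. The study's EHR-integrated pipeline is not reproduced here; the model name, prompt, and patient message are all illustrative, and the output is a starting point for clinician editing, not an autonomous reply.

```python
# Generic sketch of drafting a patient-portal reply with an LLM. Model,
# prompt, and message are assumptions, not the pilot's actual setup.
from openai import OpenAI

client = OpenAI()

patient_message = (
    "My blood pressure readings this week averaged 150/95 even though "
    "I'm taking lisinopril as prescribed. Should I be worried?"
)

draft = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; the pilot's model is not specified here
    messages=[
        {"role": "system",
         "content": "Draft an empathetic, plain-language reply to this patient "
                    "message for a clinician to review and edit before sending. "
                    "Do not give a diagnosis or change any medication."},
        {"role": "user", "content": patient_message},
    ],
)
print(draft.choices[0].message.content)  # clinician edits, then sends
```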
Summarizing Clinical Data
Summarizing information within patient records is a time-consuming task for clinicians, and errors in this process can negatively impact clinical decision support. Generative AI has shown potential in this area, with a 2023 study finding that LLM-generated summaries could outperform human expert summaries in terms of conciseness, completeness, and correctness.
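As a rough illustration of the task, the sketch below prompts an LLM to summarize clinical notes with the study's three evaluation axes, completeness, correctness, and conciseness, written into the instructions. The model, prompt wording, and input are assumptions, not the study's protocol, and the study graded summaries with clinician readers rather than automated checks.

```python
# Sketch of LLM clinical-text summarization. All components are illustrative;
# the 2023 study's models and prompts are not reproduced here.
from openai import OpenAI

client = OpenAI()

progress_notes = "..."  # stand-in for a patient's progress notes

summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Summarize these progress notes in five sentences or fewer. "
                    "Include every active problem and medication change "
                    "(completeness), state only what the notes support "
                    "(correctness), and avoid repetition (conciseness)."},
        {"role": "user", "content": progress_notes},
    ],
)
print(summary.choices[0].message.content)
```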
However, using generative AI for clinical data summarization presents risks. A viewpoint in JAMA argued that LLMs performing summarization tasks might not fall under FDA medical device oversight, as they provide language-based outputs rather than disease predictions or numerical estimates. Without statutory changes, the FDA’s authority to regulate these LLMs remains unclear.
The authors also noted that differences in summary length, organization, and tone could influence clinician interpretations and subsequent decision-making. Furthermore, LLMs might exhibit biases, such as sycophancy, where responses are tailored to user expectations. To address these concerns, the authors called for comprehensive standards for LLM-generated summaries, including testing for biases and errors, as well as clinical trials to quantify potential harms and benefits.
The Path Forward
Generative AI holds significant promise for transforming healthcare and reducing clinician burnout, but realizing this potential requires comprehensive standards and regulatory clarity. A 2024 study published in npj Digital Medicine emphasized the need for defined leadership, adoption incentives, and ongoing regulation to deliver on the promise of generative AI in healthcare.
Leadership should focus on establishing guidelines for LLM performance and identifying optimal clinical settings for AI tool trials. The study suggested that a subcommittee within the FDA, comprising physicians, healthcare administrators, developers, and investors, could effectively lead this effort.
Additionally, widespread deployment of generative AI will likely require payer incentives, as most providers view these tools as capital expenses.
With the right leadership, incentives, and regulatory framework, generative AI can be effectively implemented across the healthcare continuum to streamline clinical workflows and improve patient care.