Artificial Intelligence

Dr. Donald Macfarlane describes a scenario that poses a pressing ethical and practical dilemma, one he warns will lead to malpractice suits, inflated reimbursements and compromised patient care. He argues that there are limits to using artificial intelligence-driven generative pre-trained transformer tools, such as OpenAI's GPT-4 or Google's Bard, in critical domains like clinical note creation, billing, X-ray reports, discharge summaries or insurance submissions, because of the risk that they will invent facts that are misleading, inaccurate or fraudulent.

Generative AI (Gen AI) has gained recognition for its capability to produce convincing, well-written documents from specific prompts, which has led to the use of these tools in customer service and creative writing. However, there are increasing concerns over "hallucinations," the propensity of GPT tools to invent facts and assert that they are true. This concern creates practical and ethical impediments to using documents produced by generative AI tools in medical note-taking and billing.

In the United States alone, clinicians write approximately 1 billion outpatient reports and a comparable number of inpatient reports annually. These notes are extensive records detailing patient interactions, including medical history, physical examinations, opinions on patient management and data from various diagnostic studies and consultations.

Medical notes are highly significant to patient care, as they serve as guides for future clinical decisions, billing for services rendered and evidence in malpractice defense.

In addition, they contribute to clinical research and influence administrative determinations. These records were traditionally handwritten and stored in physical binders before eventually moving to electronic medical record systems. Despite the technological shift, problems like omissions, duplications, factual inaccuracies and spelling errors persist. Adding to this is a frequent lack of organization and coherence, which makes the notes poorly suited to computer analysis.

Macfarlane illustrates this point in his recent publication. "I input a simple prompt in OpenAI GPT-4: 'Prepare a clinical note detailing the first visit of a 70-year-old male patient requiring a left knee replacement.' Note that I provided three facts in this prompt. What GPT-4 then does is fascinating and concerning at the same time. It doesn't just fulfill the request; it goes beyond. It generated an additional 47 pieces of fabricated information to construct a comprehensive medical note." Macfarlane says these "hallucinations" of invented facts are intrinsic to the processing within generative pre-trained transformers. He adds, "I haven't seen any successful efforts to eradicate hallucinations."

The rise of AI could potentially address the challenges evident in professional documentation. McKinsey & Company's latest annual Global Survey highlights the rapid proliferation of generative AI tools across various industries. According to the survey findings, the use of generative AI within business functions has surged dramatically. Approximately one-third of respondents report regular use of these tools within their organizations. Meanwhile, 40% of respondents anticipate increased investment in AI overall because of advances in the technology's capabilities.

Macfarlane, a retired professor of internal medicine specializing in hematology, oncology and blood and marrow transplantation at the University of Iowa Hospitals and Clinics, aims to address the persistent issues surrounding medical note-taking.

In 2008, he founded Lexeme Technologies, LLC to leverage Lexeme Theories to predict thematic progression in professional reports. This venture led to the development of LexeNotes®, a software product set to redefine medical note-taking, operating on a system that reverses the thinking of AI-driven natural language processing.

Drawing on 44 years of experience in the field, Macfarlane shares, "Throughout my career, the quality of medical notes has been a concern. I was probably writing 40 notes a day, with around 25 being typical for a clinic day. The pressure to deliver notes on time was immense, and institutional oversight rarely focused on the quality of these notes. As a consequence, doctors' notes tend to be subpar. They're full of errors, from omissions and duplications to misspellings and grammatical issues."

Macfarlane notes that these human errors are not immediately consequential within a medical setting, as colleagues typically review the notes and correct them. However, he says he cannot determine whether the contents of AI-generated notes are true or false.

An AI-generated note carries no obvious markers of its origin. It is well written, with impeccable structure, spelling, grammar and punctuation, and its layout adheres to professional standards. Macfarlane adds, "A doctor who might be overwhelmed and falling behind in their note-taking responsibilities might resort to using a generative language model to catch up. My concern is that, the worse the doctor's performance, the better the notes generated by the AI. So, a doctor with shortcomings in their documentation could utilize it to produce seemingly faultless notes."

Based on his research and extensive experience, Macfarlane believes that health care institutions that submit medical bills and employ clinicians must not use AI to generate documents.