AI Health
Friday Roundup
The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.
November 1, 2024
In this week’s Duke AI Health Friday Roundup: concerns about LLMs clouding patients’ medical records; methylation allows more efficient DNA computing; surveying the impact of generative AI in scholarly publishing; access to chatbots doesn’t improve clinical reasoning; AI transcription tool hallucinates, especially with lots of pauses; cardiovascular risks of yo-yo dieting; cataloguing skills taught in US higher education; effects of screen time on teen mental health; much more:
AI, STATISTICS & DATA SCIENCE
- “We fear, however, that instead of facilitating communication and transparency, the insertion of LLM-generated text directly into the medical record risks diminishing the quality, efficiency, and humanity of health care. Such text may include structured notes about clinical encounters, prepopulated responses to patient-portal messages, or summaries of clinical information intended for physicians. We are particularly concerned that LLM-generated text will reduce the overall informational quality of the chart, rendering this critical resource less useful for both physicians and future AI models.” A perspective article by McCoy and colleagues, published in the New England Journal of Medicine, warns of potential harm to patient medical records resulting from incautious use of large language models as a writing aid by clinicians…
- …and in that same issue of the New England Journal: Isaac Kohane points out that for many patients, lack of access to reliable primary care is a significant challenge, and that rigorous trials of AI technologies that could help narrow these gaps are needed to compare their performance against the existing standard: “…individual anecdotes do not substitute for systematic evaluation. With any new clinical intervention, rigorous trials are the medical field’s best tools to drive the establishment of best practices. In the case of AI, shouldn’t we be comparing health outcomes achieved with patients’ use of these programs with outcomes in our current primary-care-doctor–depleted system?”
- “Novel products applying artificial intelligence (AI)-based methods to digital pathology images are touted to have many uses and benefits. However, publicly available information for products can be variable, with few sources of independent evidence. This review aimed to identify public evidence for AI-based products for digital pathology…. Only 10 of the products (38%) had peer-reviewed internal validation studies and 11 products (42%) had peer-reviewed external validation studies. To support transparency an online register was developed using identified public evidence…” A review article by Matthews and colleagues, published in NPJ Digital Medicine, examines European and British AI applications for digital pathology and finds a relative paucity of evidence regarding internal and external validation studies.
- “…physician use of a commercially available LLM chatbot did not improve diagnostic reasoning on challenging clinical cases, despite the LLM alone significantly outperforming physician participants. The results were similar across subgroups of different training levels and experience with the chatbot. These results suggest that access alone to LLMs will not improve overall physician diagnostic reasoning in practice. These findings are particularly relevant now that many health systems offer…[HIPAA]–compliant chatbots that physicians can use in clinical settings, often with no to minimal training on how to use these tools.” A research article by Goh and colleagues, published in JAMA Network Open, presents results from a randomized study that found access to a large language model–based diagnostic aid did not significantly improve physicians’ clinical reasoning, even though the LLM alone performed better than either of the physician groups evaluated.
- …“there are significant issues with the reproducibility of transcriptions in light of non-deterministic hallucinations. Most importantly, however, there are implications for understanding the harms and biases arising from the disparate impacts on subpopulations whose speech is likely to yield hallucinations.” In a study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24), Koenecke and colleagues examined the OpenAI Whisper transcription service and found that, despite generally high accuracy, it retained a propensity to hallucinate nonexistent content, particularly when transcribing speech from speakers prone to frequent pauses, such as people with aphasia.
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- “Although it has been speculated that this pattern of switching back and forth between diets and the resulting changes in body weight is bad for health, compelling data are lacking and mechanisms are poorly understood. Writing in Nature, two independent and complementary studies in mice explore the impact of cycling between a high-fat and a low-fat diet and both find that, surprisingly, diet cycling substantially increases atherosclerosis compared with a continuous high-fat diet.” A news article in Nature by Daniel J. Rader and Kate Townsend Creasy explores the potential cardiovascular risks that may accompany so-called “yo-yo” dieting that oscillates between high- and low-fat foods.
- “…in the domains of moral disengagement, algorithm aversion, and scientific misconduct, medical researchers who received NPF [negative performance feedback] from algorithms showed significantly higher levels of these effects compared to the group receiving NPF from humans. Secondly, in the group receiving NPF from algorithms, we find that the interaction between egoism and algorithm aversion has a positive impact on moral disengagement among medical researchers, subsequently leading to an increase in scientific misconduct. As the level of algorithm aversion increases, the positive relationship between egoism and moral disengagement becomes stronger.” A research article by Liao and colleagues, published in the journal BMC Medical Ethics, examines attitudinal responses to human vs algorithmic negative feedback on medical researchers’ performance.
- “Qian and her colleagues encoded data through methylation, a chemical reaction that switches genes on and off by attaching a methyl compound—a small methane-related molecule. Once the bricks are locked into their assigned spots on the strand, researchers select which bricks to methylate, with the presence or absence of the modification standing in for binary values of 0 or 1.” An article by MIT Technology Review’s Jenna Ahart unpacks recent research describing a new method for storing computable data in strands of DNA.
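The encoding scheme Ahart describes lends itself to a simple illustration: each DNA “brick” locked onto the strand carries one bit, with a methylated brick standing for 1 and an unmethylated brick for 0. The short Python sketch below shows only that bit-to-methylation-flag mapping; the function names and the list-of-flags representation are illustrative assumptions, not part of the published method, which performs the actual writing and readout with wet-lab chemistry and sequencing.

```python
# Toy illustration of the bit-level encoding described above: one DNA "brick"
# per bit, where True means "methylate this brick" (binary 1) and False means
# leave it unmethylated (binary 0). Function names here are hypothetical.

def encode_bits(data: bytes) -> list[bool]:
    """Map each bit of the input to a methylation flag for one brick position."""
    flags: list[bool] = []
    for byte in data:
        for i in range(7, -1, -1):               # most-significant bit first
            flags.append(bool((byte >> i) & 1))  # True = methylate this brick
    return flags

def decode_bits(flags: list[bool]) -> bytes:
    """Read a methylation pattern back into bytes; in the real system this
    readout step would come from sequencing, not from an in-memory list."""
    out = bytearray()
    for i in range(0, len(flags), 8):
        byte = 0
        for flag in flags[i:i + 8]:
            byte = (byte << 1) | int(flag)
        out.append(byte)
    return bytes(out)

if __name__ == "__main__":
    message = b"DNA"
    pattern = encode_bits(message)   # which bricks to methylate, in order
    assert decode_bits(pattern) == message
    # 'D' = 0x44 -> [False, True, False, False, False, True, False, False]
    print(pattern[:8])
```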
COMMUNICATION, HEALTH EQUITY & POLICY
- “Teenagers with higher daily screen time were more likely to experience both anxiety and depression symptoms over the past 2 weeks, which is consistent with previous research. As technology and screens continue to develop, their influence on the lives of children changes, making it increasingly important to expand our understanding of the patterns of screen time use overall and among selected subgroups.” A new data brief by Zablotsky and colleagues, available from the CDC’s National Center for Health Statistics, summarizes some findings from recent surveys of screen time among teenagers.
- “While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has been made to document which of these skills are being developed in higher education at a similar granularity. Here, we fill this gap by presenting Course-Skill Atlas – a longitudinal dataset of skills inferred from over three million course syllabi taught at nearly three thousand U.S. higher education institutions.” A research article by Sabet and colleagues, published in Scientific Data, introduces a longitudinal dataset that captures various skillsets taught at US colleges and universities.
- “The pace of iteration over the past 24 months has outstripped publishing organizations’ abilities to adapt underlying business models. Individuals we spoke to across the sector acknowledged that organizations need to invest more time in understanding generative AI technology more deeply because it has the potential to upend a number of underlying systems in the future. Many of the thorniest challenges necessitate cross-sector collaboration involving not only publishing organizations but also universities, funders, and technology providers.” A guest post by Tracy Bergstrom and Dylan Ruediger at Scholarly Kitchen examines the prospects for generative AI tools in the world of academic publishing.