AI Health
Friday Roundup
The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.
November 7, 2025
In this week’s Duke AI Health Friday Roundup: offloading moral agency to chatbots; benefits and limits of RAG-based large language models for medical teaching; tiny tyrannosaur confirmed; helping doctors navigate feelings of shame; bogus, AI-penned letters crowd the plate at journals; LLMs for data extraction; AIs struggle to understand out-of-sequence comic strips; much more:
AI, STATISTICS & DATA SCIENCE
- “We deployed a RAG-based teaching assistant in a medical school basic science course across two consecutive cohorts….Students demonstrated strategic, context-dependent usage, with engagement intensifying during high-stakes assessment periods and substantial after-hours utilization. Users primarily sought clarification on foundational concepts and valued the system’s continuous availability and source-grounded responses. However, knowledge-base constraints that ensured accuracy also limited broader inquiries, creating tension between reliability and comprehensiveness that shaped how students incorporated the tool into their study routines.” A research article published in NPJ Digital Medicine by Thesen and Park describes how medical students work with “constrained” AI systems that employ built-in mechanisms designed to enhance reliability (H/T @smcgrath.phd).
- “We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts in two roles: a Challenger that mines documents from a large corpus to generate diverse reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics, the Challenger creates an automatic curriculum at the frontier of the Reasoner’s capability, while corpus grounding provides the rich, near-inexhaustible external signal necessary for sustained improvement.” In a preprint available from arXiv, Liu and colleagues describe a reinforcement learning approach that splits a single model’s activity into two roles: a Challenger that mines documents to propose reasoning tasks and a Reasoner that solves them.
- “A nationally representative pre-registered experiment (n = 459) found that Googling images of occupations amplifies age-related gender bias in participants’ beliefs and hiring preferences. Furthermore, when generating and evaluating resumes, ChatGPT assumes that women are younger and less experienced, rating older male applicants as of higher quality. Our study shows how gender and age are jointly distorted throughout the internet and its mediating algorithms, thereby revealing critical challenges and opportunities in the fight against inequality.” An investigation by Guilbeault and colleagues, published in Nature, offers evidence of systematic bias in representation across media available online – and in the large language models trained on that raw material.
- “We introduce SciDaSynth, a novel interactive system powered by large language models that automatically generates structured data tables according to users’ queries by integrating information from diverse sources, including text, tables, and figures. Furthermore, SciDaSynth supports efficient table data validation and refinement, featuring multi-faceted visual summaries and semantic grouping capabilities to resolve cross-document data inconsistencies.” In a paper published in Campbell Systematic Reviews, Wang and colleagues describe an approach for using large language models to assist in extracting data from multiple disparate sources for synthesis and analysis.
- “Notably, evaluation results on STRIPCIPHER reveals a significant gap between current LMMs and human performance—e.g., GPT-4o achieves only 23.93% accuracy in the reordering task, 56.07% below human levels. These findings underscore the limitations of current LMMs in implicit visual narrative understanding and highlight opportunities for advancing sequential multimodal reasoning.” FINALLY, something I’m still better at than an AI: In a paper available from the ACL Anthology, Wang and colleagues present findings from an evaluation suggesting that large multimodal models struggle to understand the implicit narrative structures present in visual media such as comic strips (H/T @neilcohn.bsky.social).
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- Big, which is to say, small, dinosaur news from downtown Raleigh: “’Simply put, the Dueling Dinosaurs Nanotyrannus is fully grown at half the length and one tenth the body mass of a mature rex,’ says study co-author Lindsay Zanno, a paleontologist at North Carolina State University. ‘There is no scenario in which this animal morphs into a T. rex.’ She explains that Nanotyrannus must have been a smaller, sleeker predator that hunted alongside Tyrannosaurus.” Scientific American reports on a recent study confirming Nanotyrannus as a distinct tyrannosaur species, based on fossils being studied at the NC Museum of Natural Sciences.
- “Shame is a common and highly uncomfortable human emotion. In the years since that pivotal incident, Bynum has become a leading voice among clinicians and researchers who argue that the intense crucible of medical training can amplify shame in future doctors…He is now part of an emerging effort to teach what he describes as ‘shame competence’ to medical school students and practicing physicians. While shame can’t be eliminated…related skills and practices can be developed to reduce the culture of shame and foster a healthier way to engage with it.” In an article jointly published by KFF Health News and NPR, Charlotte Huff reports on approaches to medical training that aim to equip physicians to manage the powerful emotional reactions that can accompany a perceived failure or harm in patient care.
- “Here, we present a wearable solar fluidic system that harnesses human sweat to enable self-sufficient freshwater production, energy supply, and information exchange. By designing a photothermal fabric that can generate a temperature gradient between skin and environment, sweat evaporation and ion flow can drive to provide sustained electrical power without external energy input. In outdoor wearing tests, these fluidic fabrics can produce fresh water of 24.2 liters per kilogram and deliver 8.50 volts of power, ultimately powering a Mars rover wirelessly.” In a paper published in Science Advances, Meng and colleagues describe the successful prototyping of something that sounds awfully like a Dune-style stillsuit.
COMMUNICATIONS & POLICY
- “The study suggests that AI systems provide human actors with an easily accessible, low-cost, and hard-to-monitor means of offloading personal moral responsibility, highlighting the need to consider in AI regulation not only the inherent risks of AI output, but also how AI’s perceived moral agency can influence human behavior and ethical accountability in human-AI interaction.” In a paper available from the SSRN preprint server, Tontrup and Sprigman describe an experiment that suggests human users may “offload” moral decision-making onto AI chatbots in certain circumstances.
- “Claude proved to be a dogged, forensic ally. The biggest catch was that it uncovered duplications in billing. It turns out that the hospital had billed for both a master procedure and all its components. That shaved off, in principle, around $100,000 in charges that would have been rejected by Medicare.” In a news item at the technology news site Tom’s Hardware, reporter Mark Tyson conveys the account of a patient who used Anthropic’s Claude AI to identify discrepancies and miscodings in a hospital bill.
- “The number of suspicious letters keeps growing, Dr. Chaccour and his colleagues found. In 2023, the share of letters written by prolific authors — those who had three or more published in a year — was 6 percent. In 2024, it was 12 percent. This year, the investigators report, it is approaching 22 percent…They’re invading journals ‘like Omicron,’ Dr. Chaccour said, referring to the Covid variant that quickly became dominant.” In an article for the New York Times, Gina Kolata reports on the emergence of suspiciously prolific (and probably AI-boosted) letter-writers seeking a publication credit in prestigious biomedical journals.
