AI Health

Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

March 6, 2026

In this week’s Duke AI Health Friday Roundup: validation study for updated Epic sepsis tool; Health GPT tool offers mixed bag of results in diagnosing problems; mathematical model for the incentive structures threatening peer review; decades of fictionalized clinical vignettes never identified as such by journal; AI agents keep getting better in terms of capabilities (but not reliability); much more:

AI, STATISTICS & DATA SCIENCE

Image credit: Google DeepMind/Unsplash
  • “This prognostic study of 227 091 inpatient encounters across 4 major US health systems found that the model had an area under the receiver operating curve between 0.82 and 0.92 but demonstrated high institutional variability, low positive predictive value, and high alert burden…These findings suggest that institutions implementing this model should conduct local validation studies to verify performance…” In a research article published in JAMA Network Open, Wong and colleagues present results from a multi-institutional validation study of an updated version of the Epic Sepsis Prediction Model.
  • “Performance followed an inverted U-shaped pattern, with the most dangerous failures concentrated at clinical extremes: non-urgent presentations (35%) and emergency conditions (48%). Among gold-standard emergencies, the system under-triaged 52% of cases, directing patients with diabetic ketoacidosis and impending respiratory failure to 24–48-hour evaluation rather than the emergency department, while correctly triaging classical emergencies such as stroke and anaphylaxis. When family or friends minimized symptoms (anchoring bias), triage recommendations shifted significantly in edge cases…” In a research article accepted for publication by Nature Medicine and available as an uncorrected preprint, Ramaswamy and colleagues evaluate the performance of the “Health GPT” chatbot as a dispenser of medical information.
  • “Yue believes that the large amount of data in her real inbox ‘triggered compaction,’ she wrote. Compaction happens when the context window — the running record of everything the AI has been told and has done in a session — grows too large, causing the agent to begin summarizing, compressing, and managing the conversation…At that point, the AI may skip over instructions that the human considers quite important.” TechCrunch’s Julie Bort reports on an instance of the OpenClaw agentic AI exceeding its brief after being tasked with organizing an email inbox.
  • “While our findings are tentative at this stage, we hope they can help explain the puzzlement among many in the industry as to why the economic impacts of AI agents have been gradual, even though they are crushing capability benchmarks. To help the community track reliability systematically, we plan to launch an AI agent “reliability index”. …We hope this will stimulate researchers and industry to invest effort into improving reliability.” A preprint paper by Rabanser, Kapoor, and Narayanan addresses the shortfall between the accelerating performance of agentic AIs and their (relatively flat) reliability.

BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH

Public domain image via Wikimedia Commons
  • “That book, Malara came to realize, had been extensively annotated by none other than Galileo Galilei. Malara’s discovery, described in a paper now under review at the Journal for the History of Astronomy, promises new insights into one of the most famous ideological transitions in the history of science: the moment when Earth was thrust from the center of our universe.” Science’s Joshua Sokol reports on a remarkable antiquarian find: a scholar leafing through an early printed edition of Ptolemy’s astronomical manual The Almagest finds copious handwritten marginalia by Galileo.
  • “We identified a distinct profile of neurogenesis in SuperAgers that may reflect a ‘resilience signature’. Finally, alterations in the profile of astrocytes and CA1 neurons govern cognitive function in the ageing hippocampus. Together, our study points to a multiomic molecular signature of the hippocampus that distinguishes cognitive resilience and deterioration with ageing.” In a research article published in Nature, Disouky and colleagues present evidence for the development of new neurons in the hippocampal regions of the brain in so-called “Super Agers” – older individuals with exceptional memories.
  • “…while there are overwhelming performance gaps between tools in some comparisons, most comparisons have marginal differences between the tools. In these cases, tool run times, cost, ease of use, and quality/transparency of the output are likely more important factors than the raw performance. Therefore, tool developers should invest effort in tool usability and transparency in addition to pure performance.” In a research article published in PLOS One, Eckman and colleagues evaluate the performance of different automated tools designed to check the methodological rigor of scientific papers.

COMMUNICATIONS & POLICY

Image credit: Anoushka Puri/Unsplash
  • “For a minority of teens, chatbots have become a go-to tool for much of their schoolwork. One-in-ten teens say they do all or most of their schoolwork with chatbots’ help….Larger shares say they do some (21%) or a little (23%) of their schoolwork with the help of a chatbot. Another 45% haven’t used them in this way.” A report recently released by the Pew Research Center presents results from a survey examining U.S. teens’ use of large language model chatbots, including for school assignments.
  • “…we develop mathematical models to reveal the intricate interactions among incentives faced by authors, reviewers, and readers in their endeavors to identify the best science. …First, peer review partially reveals authors’ private sense of their work’s quality through their decisions of where to send their manuscripts. Second, journals’ reliance on traditionally unpaid and largely unrewarded review labor deprives them of a standard market mechanism—wages—to recruit additional reviewers when review labor is in short supply.” A research article published in PLOS Biology by Bergstrom and Gross examines incentive cycles that threaten to make the peer-review system unsustainable.
  • “The advent of social media has created communities that can more easily connect across time and space, and so has introduced a new type of opinion leader, says Amelia Burke-Garcia, director of the Center for Health Communication Science at NORC at the University of Chicago in Illinois. What’s important, says Burke-Garcia, who has been working with influencers in the public-health space since 2008…is to support credible voices and evidence in these communities.” In a news feature for Nature, Catriona Clarke examines the efforts of science influencers to push back on mis- and disinformation on social media.
  • “Regardless of the statements in the author guidelines, the fact the cases are fictional should have been conveyed to the readers of all of these articles, Juurlink said. In the case of Baby boy blue, ‘the article was structured as an authentic clinical case, indexed as such, and cited as an actual clinical observation. Readers had no way of knowing it was fictional,’ he said.” Yikes: Retraction Watch reports that the Canadian journal Paediatrics & Child Health has been publishing – for decades – a series of fictionalized case reports that were never identified as such to readers.