AI Health
Friday Roundup
The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.
September 19, 2025
In this week’s Duke AI Health Friday Roundup: why LLMs hallucinate; origins of small-cell lung cancer revealed; lightweight language model overcomes resource bottlenecks; surveying the big picture on microplastics; modeling scientific risk-taking; worries that AI is eroding critical thinking; AI governance lessons from the world of genetics; new STARD-AI reporting guidelines for diagnostic accuracy studies; much more:
AI, STATISTICS & DATA SCIENCE
- “We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline….If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures. We then argue that hallucinations persist due to the way most evaluations are graded — language models are optimized to be good test-takers, and guessing when uncertain improves test performance.” In a research article available as a preprint from arXiv, a group of OpenAI authors explore why large language models are prone to making up answers (“hallucinating”). (A small sketch of the scoring argument follows this list.)
- “…achieves performance comparable to open-source Transformer baselines while using only about 150B tokens for continual pre-training. Our models significantly improve long-sequence training efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior….Training remains stable for weeks on hundreds of MetaX C550 GPUs, with the 7B model reaching a Model FLOPs Utilization of 23.4 percent. The proposed spiking scheme achieves 69.15 percent sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.” In a research article available as a preprint from arXiv, Pan and colleagues debut SpikingBrain, a language model designed to overcome compute-resource bottlenecks that limit the performance of other large language models. (A back-of-the-envelope sketch of the MFU metric follows this list.)
- “Authors are encouraged to provide descriptions of dataset practices, the AI index test and how it was evaluated, as well as considerations of algorithmic bias and fairness. The STARD-AI statement supports comprehensive and transparent reporting in all AI-centered diagnostic accuracy studies, and it can help key stakeholders to evaluate the biases, applicability and generalizability of study findings.” The STARD-AI Consensus Statement, a new set of reporting guidelines for AI-centered diagnostic accuracy studies, is now available from Nature Medicine.
- “Many people who strongly believe in seemingly fact-resistant conspiratorial beliefs can change their minds when presented with compelling evidence. From a theoretical perspective, this paints a surprisingly optimistic picture of human reasoning: Conspiratorial rabbit holes may indeed have an exit. Psychological needs and motivations do not inherently blind conspiracists to evidence—it simply takes the right evidence to reach them.” A research article by Costello and colleagues, published in Science, explores the effects of human-LLM dialogue in reducing belief in conspiracy theories.
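Regarding the OpenAI hallucinations item above: the scoring argument comes down to simple arithmetic. Under the common 0/1 grading used by many benchmarks, an abstention (“I don't know”) earns nothing, so a model that guesses whenever it is uncertain has a higher expected score even when those guesses are usually wrong. A minimal illustration of that point (not the paper's analysis; the probability below is made up):

```python
# Expected benchmark score under binary (0/1) grading, where an abstention
# ("I don't know") earns zero credit. The probability used here is purely
# illustrative, not a figure from the paper.
def expected_score(p_correct_if_guessing: float, abstain: bool) -> float:
    # Abstaining is never rewarded under 0/1 grading; guessing earns
    # p_correct_if_guessing points on average.
    return 0.0 if abstain else p_correct_if_guessing

p = 0.2  # hypothetical chance an uncertain guess happens to be right
print(expected_score(p, abstain=True))   # 0.0 -- honest uncertainty
print(expected_score(p, abstain=False))  # 0.2 -- guessing wins on average
```

This is the sense in which, as the authors put it, models “optimized to be good test-takers” learn to guess rather than acknowledge uncertainty.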
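On the SpikingBrain item, Model FLOPs Utilization (MFU) is the fraction of the hardware's theoretical peak FLOPs that the model's useful computation actually consumes; 23.4 percent means roughly a quarter of the GPUs' peak throughput went into model arithmetic. A rough sketch of how the metric is typically computed (all numbers below are hypothetical placeholders, not values from the paper):

```python
# Back-of-the-envelope Model FLOPs Utilization (MFU): achieved model FLOPs
# per second divided by the hardware's theoretical peak FLOPs per second.
# Every number below is a hypothetical placeholder, not a reported value.
def mfu(tokens_per_sec: float, flops_per_token: float, peak_flops_per_sec: float) -> float:
    return (tokens_per_sec * flops_per_token) / peak_flops_per_sec

# A dense 7B-parameter model needs roughly 6 * 7e9 FLOPs per token during
# training (forward pass plus backward pass), a common rule of thumb.
print(mfu(tokens_per_sec=1.0e5,          # assumed cluster-wide throughput
          flops_per_token=6 * 7e9,       # rule-of-thumb training cost per token
          peak_flops_per_sec=1.8e16))    # assumed aggregate peak for the cluster
```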
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- “In science as elsewhere, attention is a limited resource and scientists compete with one another to produce the most exciting, novel and impactful results. We develop a game-theoretic model to explore how such competition influences the degree of risk that scientists are willing to embrace in their research endeavors. We find that competition for scarce resources…motivates scientific risk-taking and may be important in counterbalancing other incentives that favor cautious, incremental science.” A modeling study by Gross and Bergstrom, available as a preprint from arXiv, analyzes the dynamics that affect scientific risk-taking.
- “Early clinical findings indicate that MNPs [micro- and nano-plastics] may be associated with adverse health outcomes, including immune modulation, reproductive effects and cardiovascular effects. However, these studies typically suffer from low patient numbers and inadequate MNP exposure assessment, which precludes adequate risk assessment. Still, outcomes from animal and cell-based analyses generally support the preliminary clinical findings.” A review article published in Nature Medicine by Lamoree and colleagues provides a comprehensive overview of the current state of knowledge concerning the health impacts of microplastics.
- “Together, these data indicate that the basal cell is a probable origin for SCLC and other neuroendocrine–tuft cancers that can explain neuroendocrine–tuft heterogeneity, offering new insights for targeting lineage plasticity.” A research article published in Nature by Ireland and colleagues presents findings that clarify the hitherto mysterious cellular origins of small-cell lung cancer.
COMMUNICATIONS & POLICY
- “…indeed, some of the new studies emphasize new opportunities and use cases for generative AI tools. But the research also points to significant potential drawbacks, including hindering developing skills and a general overreliance on the tools. Researchers also suggest that users are putting too much trust in AI chatbots, which often provide inaccurate information. With such findings coming from the tech industry itself, some experts say, it may signal that major Silicon Valley companies are seriously considering potential adverse effects of their own AI on human cognition…” At Undark, an essay by Ramin Skibba examines the question of whether uncritical reliance on oracular chatbots is replacing critical thinking in their human users.
- “These cases show that AI harms emerge not only from technical failures but also from design and deployment choices that do not fully account for how algorithms interact with users in specific institutional settings….These cases highlight a problem that includes but exceeds defective code: Companies are racing to push new AI capabilities to market while investing less in anticipating or preventing harms. These are sociotechnical failures, and understanding them requires multidisciplinary investigation.” In an essay for Science, Alondra Nelson places recent high-profile incidents involving chatbots into their sociotechnical context and proposes some lessons in governance from the realm of genetics.
- Did Not Understand the Assignment: Ars Technica’s Benj Edwards has the story of a Canadian government report on the ethical use of AI in education that contains multiple nonexistent (and possibly AI-hallucinated) references: “The presence of potentially AI-generated fake citations becomes especially awkward given that one of the report’s 110 recommendations specifically states the provincial government should ‘provide learners and educators with essential AI knowledge, including ethics, data privacy, and responsible technology use.’”
- “Dinika said he’s seen this pattern time and again where safety is only prioritized until it slows the race for market dominance. Human workers are often left to clean up the mess after a half-finished system is released. ‘Speed eclipses ethics,’ he said. ‘The AI safety promise collapses the moment safety threatens profit.’” The Guardian’s Varsha Bansal reports on the largely unseen world of human labor that supports the latest public-facing AI models.
