AI Health
Friday Roundup
The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.
February 6, 2025
In this week’s Duke AI Health Friday Roundup: evidence on benefit for LLM-human collaborations still murky; probing associations between AI use and depressive symptoms; randomized trial examines human plus AI for mammogram reading; geographic AI analysis reveals gaps in MMR coverage; the argument against abdicating peer review to AI; plain language summaries could improve public understanding of science; much more:
AI, STATISTICS & DATA SCIENCE
- “LLM-enabled human–AI collaboration yields statistically suggestive but highly uncertain, context-sensitive benefits in diagnostic/management and documentation tasks. The pooled estimates for diagnostic and management quality are not robust, with 95% prediction intervals crossing the null, indicating a high risk of no benefit in different clinical settings….the most prudent path is contextualized, phased deployment under strengthened safety and governance guardrails, coupled with standardized trials, transparent reporting, and continuous monitoring…” A systematic review and meta-analysis of reports on human plus LLM collaboration in clinical settings by Wang and colleagues, available as an accepted preprint from NPJ Digital Medicine, finds that the evidence for or against such approaches remains preliminary and difficult to parse definitively.
- “AI-supported mammography screening showed consistently favourable outcomes compared with standard double reading, with a non-inferior interval cancer rate, fewer interval cancers with unfavourable characteristics, higher sensitivity, and the same specificity, while also reducing screen reading workload. These findings imply that AI-supported mammography screening can efficiently improve screening performance compared with standard double reading and may be considered for implementation in clinical practice.” A research article published in The Lancet by Gommers and colleagues presents findings from a large, randomized Swedish study that compared AI-assisted reading of screening mammograms with the standard double-reading method.
- “Our study presents prospectively-collected qualitative evidence from 10 physicians providing 147 sets of free-text feedback on AI-generated chart summaries as part of standardized, EHR-integrated functionality that is currently being scaled nationally. The predominant finding was positive feedback, but inaccuracies were commonly reported. When information was inaccurate, we found that missing information and confusing information was more commonly reported than hallucinations.” A study by Kahl and colleagues, published in the journal NPJ Health Systems, evaluates the quality of AI-based patient chart summaries integrated into electronic health records.
- “Clinicians do not lack interest in AI. They lack the evidence that engenders trust. Trust follows from evidence of the tasks they perform and in the environments in which they practice. Clinicians also want visible guardrails and clear accountability. If the study form does not match the use form, and if we do not quantify misses, deferrals and time-to-action, skepticism is justified.” A commentary by Azad, Krumholz, and Saria, published in Nature Medicine, lays out a series of proposed principles for evaluating AI tools in realistic clinical settings and dispensing with artificial benchmarks.
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- “AI tools are poised to transform emergency medicine, but only if integrated with careful attention to legal, ethical, and regulatory safeguards to protect patient information. Physicians must recognize that using AI to process or generate information involving PHI carries very real liability. A failure to align with HIPAA federal statutes and additional layered state-specific laws may result in disciplinary action, institutional fines, or legal liability.” A review article published in the Journal of the American College of Emergency Physicians Open by Schoolcraft and colleagues examines the complexities of using AI tools while subject to the privacy requirements of the HIPAA framework.
- “…we found that daily AI use was common and significantly associated with depressive and other negative affective symptoms after adjusting for sociodemographic features. While the magnitudes of associations were generally modest, among individuals aged 45 to 64, the odds of reporting at least moderate depression were 50% greater for daily AI users. Given the rapidity of AI dissemination and the scale of use, these results in aggregate suggest the need to better understand potential causation and heterogeneity of outcomes.” A research article published in JAMA Network Open by Perlis and colleagues examines potential associations between the use of generative AI and the development of depressive symptoms.
- “Our study provides the nationwide county-level MMR vaccine coverage among US children under age 5, leveraging a digital surveillance tool and advanced spatial modelling methods. These granular estimates reveal substantial gaps in coverage, highlighting the critical role of local variation in vaccine-induced immunity in shaping measles vulnerability. Importantly, by drawing on a digital participatory surveillance platform rather than administrative records, our approach captures children who are often absent from official reporting systems…” A research article published in Nature Health by Zhou and colleagues describes the use of a participatory digital disease surveillance system, coupled with AI analysis, to reveal gaps in vaccination coverage for MMR.
COMMUNICATIONS & POLICY
- “To combat misuse of scientific research, peer reviewed journals should publish plain language abstracts in conjunction with standard abstracts. These abstracts can be developed with journal editors to help authors avoid unnecessary jargon and be finalized after the peer review process to avoid having experts overly criticize the generalization of the work….Within this abstract, scientists can describe their findings, the impact of their findings, and the limitations of the work.” In an opinion article for STAT News, Kirstin R.W. Matthews and Heidi Russell advocate for the timely translation of research findings into nontechnical, plain-language summaries to help better inform patients and the public and to stem the tide of medical misinformation.
- “Marginal savings in time or costs are important, but it would be greatly disappointing if these are the bulk of what the AI revolution delivers to medicine. It is important to consider that the highest-value applications to patients, providers, and payers may be in those areas where data are scarce, expensive, biased, or otherwise less accessible.” A perspective article by Murthy and Patel, published in NEJM AI, explores a “paradox” of medical AI adoption in which truly high-value AI applications may encounter serious roadblocks.
- “Once we invite machines to act as reviewers, we begin to slide from seeing peer review as a human conversation to treating it as a technical service. But a peer is not merely an error detector. A peer is someone who inhabits the same argumentative world as the author, who recognises the paper as situated in a field’s live disputes, fashions and vulnerabilities….” In an article for Times Higher Education, Akhil Bhardwaj anatomizes some of the problems with relying on AI applications to solve the current resource scarcities afflicting scientific peer review.
