AI Health Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

October 25, 2024

In this week’s Duke AI Health Friday Roundup: AI-boosted cameras to prevent medication errors; systems education for future scientists; LLMs extract quality measures from patient data; fine-tuning persuasion for foundation models; staffing patterns and quality measures; tools target suspicious patterns at scholarly journals; an ethics framework for evaluating AI; the benefits of publishing failures as well as successes; much more:

AI, STATISTICS & DATA SCIENCE

Image credit: Yasmin Dwiputri & Data Hazards Project / Better Images of AI / CC-BY 4.0
  • “…our system has achieved the expected accuracy for clinical use as determined during our provider survey. Participants were asked about the minimum performance of the drug classifier to be considered for use in daily practice. 56% of survey participants indicated that the performance of the drug classifier should at minimum be 50–95% for use in daily practice, with the remaining 44% expressing a minimum performance of 99%. With image augmentation and a background image class, our syringe and vial classifier achieves a performance greater than 95% in the real-world deployment…” A research article by Chan and colleagues, published in NPJ Digital Medicine, describes a pilot study of wearable, AI-assisted cameras for preventing medication errors in the clinical setting (H/T @EricTopol).
  • “In order to balance positive and negative persuasion, we introduce Persuasion-Balanced Training (or PBT), which leverages multi-agent recursive dialogue trees to create data and trains models via preference optimization to accept persuasion when appropriate. PBT consistently improves resistance to misinformation and resilience to being challenged while also resulting in the best overall performance on holistic data containing both positive and negative persuasion. Crucially, we show that PBT models are better teammates in multi-agent debates. We find that without PBT, pairs of stronger and weaker models have unstable performance, with the order in which the models present their answers determining whether the team obtains the stronger or weaker model’s performance.” A preprint by Stengel-Eskin and colleagues, available from arXiv, addresses an interesting problem in training foundation models: finding the right balance between accepting correction or persuasion while not becoming excessively “credulous.”
  • “Hospital quality measures are a vital component of a learning health system, yet they can be costly to report, statistically underpowered, and inconsistent due to poor interrater reliability. Large language models (LLMs) have recently demonstrated impressive performance on health care–related tasks and offer a promising way to provide accurate abstraction of complete charts at scale. To evaluate this approach, we deployed an LLM-based system that ingests Fast Healthcare Interoperability Resources data and outputs a completed Severe Sepsis and Septic Shock Management Bundle (SEP-1) abstraction.” A research article by Boussina and colleagues from UC San Diego, published in NEJM AI, reports on the use of an LLM to extract quality measures from electronic patient chart data.
  • “We explore the alignment of values in Large Language Models (LLMs) with specific age groups, leveraging data from the World Value Survey across thirteen categories. Through a diverse set of prompts tailored to ensure response robustness, we find a general inclination of LLM values towards younger demographics, especially when compared to the US population. Although a general inclination can be observed, we also found that this inclination toward younger groups can be different across different value categories.” A preprint by Liu and colleagues, available from arXiv, documents a youthful tilt in the underlying values incorporated into large language models.

BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH

Two Zebra Finches (left, male; right, female) perched on a branch.
Image credit: christoph_moning - https://www.inaturalist.org/photos/96756110, CC BY 4.0
  • “When awake, Zebra Finches sing a well-regulated line of staccato notes. But their sleeping song movements are fragmented, disjointed and sporadic—‘rather like a dream,’ Mindlin says. A dozing finch seems to silently practice a few ‘notes’ and then add another, producing a pattern of muscle activity that reminds Mindlin ‘of learning a musical instrument.’” Scientific American’s David Godkin reports on recent research that eavesdrops on the musical dreams of songbirds.
  • “Are gene-editing therapies actually helping patients? Although there has been considerable excitement about the prospect of directly administering gene-editing therapy that is based on clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated protein 9 (Cas9) into the bodies of patients to treat diseases, we have only recently begun to see signs of success in clinical settings…The answer to the opening question is now an unambiguous ‘yes.’” An editorial by Kiran Musunuru in the New England Journal of Medicine hails a recently published study of a phase 2 trial of CRISPR gene-editing therapy for hereditary angioedema as early evidence of therapeutic benefit resulting from CRISPR-based interventions.
  • “The traditional methods of teaching science have failed to keep pace with the rapid advancements of the twenty-first century. Evidence of declining interest is reflected in the decreasing number of students enrolling in science courses at both the high school level and the university level. According to recent studies, fewer students are choosing to specialize in life sciences, with many perceiving these fields as inaccessible or irrelevant to real-world applications….One issue is the disjointed nature of science education across different educational stages.” A viewpoint article by Péter Hegyi and András Varró, appearing in Nature Medicine, makes a case for a ‘systems education’ approach to ensure that future generations of biomedical researchers and clinicians are adequately prepared.
  • “This cross-sectional study characterized clinician staffing patterns within health centers, revealing differentiated associations between physicians, APRNs, and PAs with individual clinical quality care metrics. Physician staffing was associated with increased cancer screening, infant vaccinations, and HIV testing; APRN staffing was associated with adult BMI assessment and counseling; and PA staffing was associated with infant vaccinations.” A research article published in JAMA Network Open by Ba and colleagues examines associations between clinician staffing patterns and quality of care measures.

COMMUNICATION, HEALTH EQUITY & POLICY

Road sign reading WRONG WAY photographed in a construction zone.
Image credit: Kenny Eliason/Unsplash
  • “The Journal started in 2018 as a thought experiment, after we had all met at a lecture on Open Access and its potential to give the public insight into how research is done, how the scientific process works. Scientific articles often give the impression that everything works out in one go, when we all know that is not the case. What would it be like to launch a journal about failed research, we wondered.” An article available from the University of Utrecht website includes an interview with Stefan Gaillard, one of the founders of the Journal of Trial and Error.
  • “The science-integrity website Argos, which was launched in September by Scitility, a technology firm headquartered in Sparks, Nevada, gives papers a risk score on the basis of their authors’ publication records, and on whether the paper heavily cites already-retracted research. A paper categorized as ‘high risk’ might have multiple authors whose other studies have been retracted for reasons related to misconduct, for example. Having a high score doesn’t prove that a paper is low quality, but suggests that it is worth investigating.” Nature’s Richard Van Noorden reports on the emergence of tools designed to flag potentially worrisome publication patterns in ostensibly peer-reviewed journals.
  • “We propose a novel approach to protecting people from misleading medical advice that is presented as authoritative with intent to deceive. We map a path for the Federal Trade Commission (FTC), the nation’s premier consumer protection agency, to reduce the production, spread, and impact of AI-generated medical disinformation. Our proposal leverages Section 5 of the FTC Act, which allows the Commission to prevent ‘persons, partnerships, or corporations’ from using ‘unfair or deceptive acts or practices in or affecting commerce.’ When produced or shared by commercial actors, AI-generated medical disinformation falls under Section 5 of the FTC Act.” A viewpoint article published in JAMA by Claudia Haupt and Mason Marks proposes using the authority of the FTC to rein in harmful AI-generated medical disinformation.
  • “We are therefore developing a new assessment tool, called the Collaborative Assessment for Responsible and Ethical AI Implementation (CARE-AI) tool, that will consolidate existing guidance, identify gaps and provide recommendations to promote the implementation of fair, trustworthy and ethically responsible AI prediction models to improve health outcomes. In addition to disease diagnosis and prognosis (which have been the focus of existing recommendations on prediction models), CARE-AI will include less discussed yet equally important applications, including the use of AI tools for drug discovery and development.” An article published in Nature Medicine by Ning and colleagues announces efforts to develop an evaluation tool for ethical considerations in health AI (H/T @GSCollins).