AI Health Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

June 21, 2024

In this week’s Duke AI Health Friday Roundup: scientific literature being flooded with bogus publications; LLMs can help screen for trial participants; lingering questions about H5N1 transmission; trial tests walking for lower back pain; interpretable deep learning model helps docs scan EEGs; why you shouldn’t cite chatbots; LLM translates neglected languages; AI challenges for global governance; critiquing (and defending) Medicare Advantage; much more:

AI, STATISTICS & DATA SCIENCE

Selective focus photograph of a disorderly jumble of moveable type letters. Image credit: Raphael Schaller/Unsplash
  • “The NLLB-200 model currently supports 71 more languages than does Google Translate, and it provides translations of reasonable quality with several low-resource languages. However, the quality of these translations is still much worse than those involving high-resource languages. It is therefore crucial for the NLLB team to continue to work with the communities that use low-resource languages to encourage the creation of high-quality translation data — even a few thousand translations can provide a considerable improvement.” A news article in Nature by David I. Adelani describes research, recently published in the journal, that evaluated a large language model developed by the No Language Left Behind (NLLB) project and capable of translating across some 200 languages, many of them underrepresented in training data sets and neglected by more widely used LLMs.
  • “Users showed significant pattern classification accuracy improvement with the assistance of this interpretable deep-learning model. The interpretable design facilitates effective human–AI collaboration; this system may improve diagnosis and patient care in clinical settings.” A research article by Barnett and colleagues, published last month in NEJM AI, reports findings from an evaluation of an interpretable deep-learning system trained to help physicians identify seizures and other potentially harmful anomalies on electroencephalograms.
  • “…researchers tested five models — Mistral’s Mistral 7B, Cohere’s Command-R, Alibaba’s Qwen, Google’s Gemma and Meta’s Llama 3 — using a dataset containing questions and statements across topic areas such as immigration, LGBTQ+ rights and disability rights. To probe for linguistic biases, they fed the statements and questions to the models in a range of languages, including English, French, Turkish and German. Questions about LGBTQ+ rights triggered the most ‘refusals,’ according to the researchers — cases where the models didn’t answer. But questions and statements referring to immigration, social welfare and disability rights also yielded a high number of refusals.” An article by TechCrunch’s Kyle Wiggers explores how different large language models diverge, and frequently refuse to answer, when confronted with questions about politically polarized or controversial topics.
  • “Large language model–based solutions such as RECTIFIER can significantly enhance clinical trial screening performance and reduce costs by automating the screening process. However, integrating such technologies requires careful consideration of potential hazards and should include safeguards such as final clinician review.” A study by Unlu and colleagues, recently published in NEJM AI, describes the use of the GPT-4 large language model, bolstered by retrieval-augmented generation to enhance its reliability, to screen candidates for clinical trial enrollment.
  • “Our approach yields two key findings. First, messages generated by GPT-4 were broadly persuasive, in some cases increasing support for an issue stance by up to 12 percentage points. Second, in aggregate, the persuasive impact of microtargeted messages was not statistically different from that of non-microtargeted messages… These trends hold even when manipulating the type and number of attributes used to tailor the message. These findings suggest—contrary to widespread speculation—that the influence of current LLMs may reside not in their ability to tailor messages to individuals but rather in the persuasiveness of their generic, nontargeted messages.” A research article recently published in the journal Proceedings of the National Academy of Sciences (PNAS) by Hackenburg and Margetts presents an analysis of the use of large language models in “microtargeting” audiences for persuasive messaging.

BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH

Black and white cow standing in profile in a field against a background of blue sky and bright sun. Image credit: Teresa Vega/Unsplash
  • “To get a sense of what the key questions are, STAT asked scientists who have long worked on influenza or in veterinary medicine what they viewed as the most pressing questions….The answers — their questions — roughly fit into three buckets: What’s happening on the farms among cows? What’s happening on farms among farmworkers? What’s happening to the virus and what does this all portend for H5N1, which for nearly three decades has danced around humans but has yet to take us on directly.” STAT News’ Helen Branswell takes a closer look at unanswered questions stemming from recent news that the H5N1 strain of bird flu had made the jump from avian species to dairy cattle in the U.S.
  • “When researchers collaboratively tackle challenging questions…they often encounter disagreements that are difficult to resolve… When disagreements persist, what should the team members do? Options may seem limited to coauthoring a paper they disagree with, delaying the next steps of the project in hopes that consensus will eventually be reached, or leaving without credit. However, there is a better approach: transparently documenting the disagreements.” A letter published in Science by Coles and colleagues presents an option for managing dissent and disagreement in team science as part of the natural process of scientific collaboration.
  • “An individualised, progressive walking and education intervention substantially reduced low back pain recurrence compared with a no treatment control group in adults who were not previously engaging in regular physical activity. This finding was consistent across the primary and two secondary recurrence outcomes. There were also reductions in back pain-related disability in the intervention group for up to 12 months, and the intervention had a high probability of being cost-effective from the societal perspective compared with a no treatment control.” A research article published in The Lancet by Pocovi and colleagues presents findings from a randomized trial that evaluated the effectiveness of a walking intervention to help control lower back pain.
  • “LMICs [lower and middle-income countries] have historically been exploited with pathogen samples and data being used by HICs [higher-income countries] without ensuring equity or true partnership. The method often involves researchers from HICs conceptualising the study and local researchers merely collecting and shipping samples to HICs for genome sequencing. This approach strips LMICs of data and sample ownership, perpetuating their subservient roles. Meanwhile, researchers from HICs secure tenures and accolades, maintaining the imbalances of power.” A viewpoint article recently published in Lancet Digital Health by Saha and colleagues examines approaches to pathogen genomics that counter historical colonial practices disadvantaging lower- and middle-income countries.

COMMUNICATION, HEALTH EQUITY & POLICY

Photograph of a yellow traffic sign for “stop light ahead” standing in flood water that has reached the bottom corner of the sign. Image credit: Kelly Sikkema/Unsplash
  • “MA’s [Medicare Advantage] overhead is mostly the price of their profits. A minority goes for profit; most funds the bureaucracy needed to upcode, cherry-pick, and erect barriers to high-value care….Could MA be reformed to make it a better deal for taxpayers? The failure of past efforts to rein in MA gaming and overpayment does not bode well for incremental reforms like tweaking the payment formula. Moreover, payment reductions would not curtail administrative waste or administrative burdens on physicians, or ease access for patients.” A viewpoint article published in JAMA Internal Medicine by Gaffney and colleagues presents a critique of the Medicare Advantage program; in that same issue, Sachin H. Jain mounts a defense of Medicare Advantage while acknowledging the need for continuing improvements.
  • “Only recently, after more mainstream journalists took an interest in paper mills and related dubious endeavors, did Elsevier and other titans such as Springer Nature and Wiley begin acknowledging their existence, while claiming victim status instead of admitting they were complicit in creating business models and incentives that promoted such behavior. In the meantime, paper mills have been bribing journal editors to publish their clients’ work.” At the Washington Post, Retraction Watch co-founders Adam Marcus and Ivan Oransky plumb the recent flood of fraudulent research overwhelming the pipelines of scientific literature.
  • “This collection of essays examines innovative approaches to AI regulation and governance. It presents and evaluates proposals and mechanisms for ensuring responsible AI, from EU-style regulations to open-source governance, from treaties to CERN-like research facilities and publicly owned corporations. Drawing on perspectives from around the world, the collection underscores the need to protect openness, ensure inclusivity and fairness in AI, and establish clear ethical frameworks and lines of cooperation between states and technology companies.” A research paper comprising nine essays on the challenges that AI presents for global governance is available from the Chatham House think tank.
  • “Chatbots have not been designed as tools for information purposes, though they can perform very well in tasks primarily concerned with communication. The uncertainty about the quality of their outputs is due to their purpose and structure, not their degree of technological maturity. LLMs are probabilistic by design, meaning that falsehoods are — as those in tech culture would say — a feature, not a bug.” The second installment of a two-part essay by Leticia Antunes Nogueira and Jan Ove Rein at the Scholarly Kitchen makes a detailed case for not accepting AI chatbots as citeable sources of information.