AI Health

Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

March 1, 2024

In this week’s Duke AI Health Friday Roundup: configuring health AI for human benefit; criticism erupts over figure in All of Us paper; risk assessment for open foundation models; deprecated authorship practices still common in life sciences; flagging cross-task inconsistency in unified models; promising findings for treating food allergies; gene duplication implicated in antimicrobial resistance; adding up generative AI’s environmental tab; using ChatGPT to evaluate research articles; much more:


Black and white photograph, taken from a low angle, showing a silhouetted tree trunk and its fractal branching pattern. Image credit: Mila Tovar/Unsplash
  • “Given such ongoing problems and recent technological leaps, adopting AIH where it could help is a matter of urgency. At a moment when the complexity of modern medicine has surpassed the capacity of the human mind, only AIH will be able to perform many tasks. AIH thus seems to offer unparalleled potential for further medical progress, including for precision medicine — the right therapy, for the right patient, at the right time. Thus, it is in the interest of the public and the medical profession to hasten its adoption, so long as it is used safely and made maximally accessible for all.” A commentary published in Nature Medicine by members of the RAISE Consortium reflects on findings from a conference that discussed ways to ensure that AI in healthcare (AIH) is deployed in ways that benefit patients and aid healthcare providers in their tasks.
  • As the twig is bent: “Leonardo da Vinci described geometric proportions in trees to provide both guidelines for painting and insights into tree form and function. Da Vinci’s Rule of trees further implies fractal branching with a particular scaling exponent α=2 governing both proportions between the diameters of adjoining boughs and the number of boughs of a given diameter. Contemporary biology increasingly supports an analogous rule with α=3 known as Murray’s Law. Here we relate trees in art to a theory of proportion inspired by both da Vinci and modern tree physiology….We find that both conformity and deviations from ideal branching create stylistic effect and accommodate constraints on design and implementation.” From a preprint article by Gao and Newberry, available at arXiv.
  • “To truly address the environmental impacts of AI requires a multifaceted approach including the AI industry, researchers and legislators. In industry, sustainable practices should be imperative, and should include measuring and publicly reporting energy and water use; prioritizing the development of energy-efficient hardware, algorithms, and data centres; and using only renewable energy. Regular environmental audits by independent bodies would support transparency and adherence to standards.” In an article for Nature, AI expert Kate Crawford considers the hidden environmental tab being run up by generative AI.
  • “…we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all of the computable functions and will therefore always hallucinate. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs.” A preprint article by Xu and colleagues, available from arXiv, suggests that hallucination may be an inescapable feature of large language models.
  • “…we introduce a benchmark dataset, CoCoCON, where we use contrast sets created by modifying test instances for multiple tasks in small but semantically meaningful ways to change the gold label, and outline metrics for measuring if a model is consistent by ranking the original and perturbed instances across tasks. We find that state-of-the-art systems suffer from a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks.” A preprint article by Maharana and colleagues describes inconsistent behavior in some “unified vision-language” models when the same instance is probed through different tasks, such as captioning an image versus answering questions about it.
  • “We take one set of analogy problems used to evaluate LLMs and create a set of ‘counterfactual’ variants: versions that test the same abstract reasoning abilities but that are likely dissimilar from any pre-training data. We test humans and three GPT models on both the original and counterfactual problems, and show that, while the performance of humans remains high for all the problems, the GPT models’ performance declines sharply on the counterfactual set….these models lack the robustness and generality of human analogy-making.” A preprint article by Lewis and Mitchell, available from arXiv, compares the performance of contemporary LLMs with that of humans on analogy-based reasoning tasks.
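The branching rule described in the Gao and Newberry preprint above lends itself to a quick numerical check. The following is a minimal Python sketch based only on the standard statement of the rule (it is an illustration, not code from the paper): a parent bough of diameter d and child boughs of diameters d_i conform to exponent α when d^α equals the sum of the d_i^α, with α = 2 for da Vinci’s Rule and α = 3 for Murray’s Law.

```python
# Illustrative sketch of the branching rule discussed by Gao and Newberry:
# a parent bough of diameter d splits into children whose diameters satisfy
# d**alpha == sum(child**alpha for child in children).
# alpha = 2 is da Vinci's Rule; alpha = 3 is Murray's Law.

def branching_exponent_error(parent: float, children: list[float], alpha: float) -> float:
    """Relative mismatch between parent**alpha and the sum of children**alpha."""
    total = sum(c ** alpha for c in children)
    return abs(parent ** alpha - total) / parent ** alpha

# A symmetric bifurcation that obeys da Vinci's Rule (alpha = 2) exactly:
# a parent of diameter 1 splits into two children of diameter 1/sqrt(2),
# so parent**2 = 2 * (1/sqrt(2))**2 = 1.
children_da_vinci = [2 ** -0.5, 2 ** -0.5]
print(branching_exponent_error(1.0, children_da_vinci, alpha=2))  # ~0.0

# Under Murray's Law (alpha = 3) the symmetric split instead requires
# children of diameter 2**(-1/3), roughly 0.794.
children_murray = [2 ** (-1 / 3), 2 ** (-1 / 3)]
print(branching_exponent_error(1.0, children_murray, alpha=3))  # ~0.0
```

Real trees deviate from either ideal, which is the stylistic point the preprint explores; the error function above simply quantifies how far a measured junction sits from a given exponent.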


A small pile of unshelled peanuts sitting on a brown wooden table top. Image credit: Isai Dzib/Unsplash
  • “In persons with multiple food allergies who were as young as 1 year of age, omalizumab treatment for 16 weeks was superior to placebo in increasing the reaction threshold for peanut, cashew, egg, and milk; 67% of the participants who received omalizumab were able to successfully consume at least 600 mg of peanut protein (cumulative dose, 1044 mg, equivalent to approximately 4 peanuts), and 44% were able to successfully consume a cumulative dose of 6044 mg (equivalent to approximately 25 peanuts, the highest dose used in the first stage of the trial).” A research article published by Wood and colleagues in the New England Journal of Medicine highlights a remarkable new role for an asthma medication in treating food allergies.
  • “Many researchers noted the value of the data set for expanding genomic research to include a greater diversity of people. However, several prominent geneticists quickly expressed concern that the way the All of Us team depicted the diversity in its data set was overly simplistic…. The problem, critics said, is that UMAP creates blobs that look distinct while masking the inherent messiness in the data.” One of a clutch of genomics papers based on NIH All of Us data published in Nature journals last week (and featured in last week’s Roundup) has generated some concern among geneticists who worry that the paper’s central illustration may reinforce erroneous notions about race and genetic diversity. Science’s Jocelyn Kaiser has the story.
  • “Historically, emphasis has been placed on how genetic factors contribute to phenotypic variation within populations. However, an emerging concept is that self-reactive antibodies (autoantibodies) represent a critical yet largely underexplored factor that influences human health and disease. Investigating autoantibodies and their protective as well as pathological roles in disease may unlock new treatment paradigms, much like the prior study of genetics.” A perspective published in Science by Jaycox and colleagues surveys the autoantibody “reactome” and the role of autoantibodies in autoimmune diseases.
  • “The research shows bacteria exposed to higher levels of antibiotics often harbor multiple identical copies of protective antibiotic resistance genes. These duplicated resistance genes are often linked to “jumping genes” called transposons that can move from strain to strain. Not only does this provide a mechanism for resistance to spread, having multiple copies of a resistance gene can also provide a handle for evolution to generate resistance to new types of drugs.” A web article at Duke’s Pratt School of Engineering website highlights recently published work by Duke researchers that identifies a “smoking gun” implicated in the evolution of antibiotic-resistant pathogenic bacteria.

Communication, Health Equity & Policy

A person is illustrated in a warm, cartoon-like style in green. They are looking up thoughtfully from the bottom left at a large hazard symbol in the middle of the image. The hazard symbol is a bright orange square tilted 45 degrees, with a black and white illustration of an exclamation mark in the middle where the exclamation mark shape is made up of tiny 1s and 0s like binary code. To the right-hand side of the image a small character made of lines and circles (like nodes and edges on a graph) is standing with its ‘arms’ and ‘legs’ stretched out, and two antennae sticking up. It faces off to the right-hand side of the image. Image credit: Yasmin Dwiputri & Data Hazards Project / Better Images of AI / Managing Data Hazards / CC-BY 4.0
  • “The risk framework enables precision in discussing the misuse risk of open foundation models and is based on the threat modeling framework in computer security. For example, without clearly articulating the marginal risk of biosecurity concerns stemming from the use of open language models, researchers might come to completely different conclusions about whether they pose risks: open language models can generate accurate information about pandemic-causing pathogens, yet such information is publicly available on the Internet, even without the use of open language models.” A discussion paper by Kapoor and colleagues posted on the Stanford Institute for Human-Centered Artificial Intelligence website presents a risk evaluation framework for open foundation models.
  • “…the apparent high prevalences of HA (regardless of how questions are phrased and definitions used), confirm previous findings that authorship issues are among the most prevalent Questionable Research Practices (QRPs) in science, affecting both young and old researchers…A slight ray of hope is that we also found an indication that the prevalence of HA when respondents are asked to declare co-author(s) contributions and these are compared to the ICMJE criteria has been decreasing over time.” A study published in Scientific Reports by Meursinge Reynders shares some depressing findings about the continued prevalence of guest authorship in the biomedical literature.
  • “The results suggest that ChatGPT 4.0 can write plausible REF reviews of journal articles and has a weak capacity to estimate REF scores, but that this is probably due to an ability to differentiate between research that is and isn’t high quality… Its evaluative reports are primarily derived from the article itself in terms of information about significance, rigour, and originality. It is not clear why it can score articles with some degree of accuracy, but it might typically deduce them from author claims inside an article rather than by primarily applying external information.” A preprint article by Thelwall, available at arXiv, evaluates the ability of ChatGPT 4.0 to assess papers for research quality.
  • “To ensure users ‘credit’ them, many institutions choose CC licenses (which require “attribution”) to release faithful reproductions of public domain material. This is bad practice. Digital reproductions of public domain materials should remain in the public domain and thus be shared under CC0 or PDM….As a best practice, CC recommends a simple framework to create behavioral change and encourage positive outcomes through ‘nudges.’” New guidelines from Creative Commons spell out best practices for sharing material from collections that are in the public domain.