AI Health Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

May 17, 2024

In this week’s Duke AI Health Friday Roundup: the persistence of bias in large language models; genomic study sheds light on mammalian adaptations; failure to publish code with recent AlphaFold paper irks scientists; questioning LLMs’ value proposition; application flags papers discussed on PubPeer; the hidden human expenses of cost-sharing in healthcare; questioning whether generative AIs are ready for primetime in patient care; a late foray into alchemy; much more:

AI, STATISTICS & DATA SCIENCE

Image credit: Clarote & AI4Media / Better Images of AI / Labour/Resources / CC-BY 4.0
  • “The biggest question raised by a future populated by unexceptional A.I., however, is existential. Should we as a society be investing tens of billions of dollars, our precious electricity that could be used toward moving away from fossil fuels, and a generation of the brightest math and science minds on incremental improvements in mediocre email writing?” In an essay for the New York Times, Julia Angwin questions whether the recently touted progress of publicly available generative AI systems, such as successive versions of ChatGPT, is living up to expectations.
  • “In this paper, we evaluate the downstream impact of dataset scaling on 14 visio-linguistic models (VLMs) trained on the LAION400-M and LAION-2B datasets by measuring racial and gender bias using the Chicago Face Dataset (CFD) as the probe. Our results show that as the training data increased, the probability of a pre-trained CLIP model misclassifying human images as offensive non-human classes such as chimpanzee, gorilla, and orangutan decreased, but misclassifying the same images as human offensive classes such as criminal increased.” In a preprint available from arXiv, Birhane and colleagues present findings from an evaluation of multimodal AI models, showing how racial and gender biases shift as training datasets scale up (a minimal sketch of this style of zero-shot probe follows this list).
  • “This gap between promise and actual practice may seem surprising since health systems are no strangers to implementing cutting-edge technology – electronic medical records (EMR), imaging databases, etc. But generative AI as a technology is very different from past deployments….Health systems previously have implemented traditional AI, which is much more predictable: A clinical question was defined, a model was trained, and prediction algorithms assisted with clinical care. Release updates were gradual, and priorities were determined top-down. GenAI’s emergent capabilities and continued rapid development have upended these usual pathways to implementation.” In a blog post for Stanford’s Institute for Human-Centered Artificial Intelligence (HAI), Jindal and colleagues examine the peculiar challenges arising as the world of healthcare embarks on attempts to integrate generative large language model applications into the patient-care setting.
  • “Large language models (LLMs) can pass explicit bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: as LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both of these challenges by introducing two measures of bias inspired by psychology: LLM Implicit Association Test (IAT) Bias, which is a prompt-based method for revealing implicit bias; and LLM Decision Bias for detecting subtle discrimination in decision-making tasks.” In a preprint available from arXiv, Bai and colleagues probe the extent to which implicit biases lurk in large language models that do not reveal explicit biases on commonly used benchmarks (a schematic sketch of such a prompt-based probe also follows this list).
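For readers curious about the mechanics behind the Birhane et al. audit, here is a minimal sketch of that style of zero-shot probe, assuming the Hugging Face transformers CLIP implementation; the label list, image path, and tallying approach are illustrative stand-ins, not a reproduction of the paper's protocol.

```python
# Minimal sketch of zero-shot classification with CLIP, assuming the
# Hugging Face transformers library; labels are illustrative placeholders.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate labels mixing a neutral human class with the offensive classes
# whose assignment rates an audit of this kind would track.
labels = [
    "a photo of a person",
    "a photo of a criminal",
    "a photo of a chimpanzee",
]

def classify(image_path: str) -> str:
    """Return the label CLIP ranks highest for the given face image."""
    image = Image.open(image_path)
    inputs = processor(text=labels, images=image,
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=1)
    return labels[probs.argmax().item()]
```

Tallying the top-ranked label over a probe set of face images (the paper uses the Chicago Face Dataset) yields the misclassification rates the authors compare across training-data scales.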
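The prompt-based IAT measure in Bai and colleagues' preprint can likewise be gestured at with a short sketch. The `complete` function below is a hypothetical stand-in for a real LLM client, and the name and attribute lists are illustrative only; the preprint's actual stimuli and scoring are more elaborate.

```python
# Schematic sketch of a prompt-based IAT-style probe; `complete` is a
# hypothetical stand-in for whatever LLM API is available.
GROUP_A = ["Emily", "Greg"]        # illustrative group-A names
GROUP_B = ["Lakisha", "Jamal"]     # illustrative group-B names
ATTRIBUTES = ["pleasant", "unpleasant"]

def complete(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real client before use."""
    raise NotImplementedError

def iat_style_counts(names_a, names_b, attributes):
    """Ask the model to pair each name with an attribute; tally pairings.

    A systematic skew (e.g., group-B names paired with 'unpleasant' more
    often) is the kind of implicit association this measure surfaces,
    even when the model disavows bias in direct questioning."""
    counts = {(g, a): 0 for g in ("A", "B") for a in attributes}
    pairs = [(n, "A") for n in names_a] + [(n, "B") for n in names_b]
    for name, group in pairs:
        prompt = (f"Pick the word that goes with '{name}': "
                  f"{' or '.join(attributes)}. Answer with one word.")
        answer = complete(prompt).strip().lower()
        if answer in attributes:
            counts[(group, answer)] += 1
    return counts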

BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH

The Alchemist Sendivogius (1566–1636), oil painting by Jan Matejko, 1867 (via Wikimedia Commons)
  • “The controversy began with experiments conducted in 1924 by chemist and photography expert Adolf Miethe and his colleague Hans Stammreich at the Technical University Berlin. They found that black deposits collected from a mercury lamp, in which an electrical discharge was passed through mercury vapour, contained gold…Although gold was known to be a trace impurity in commercially available mercury, the German researchers had purified their mercury by distillation. What they had seen, they said, was the ‘formation of gold from mercury’ by chemical means.” An essay by Philip Ball, published in Chemistry World, documents a curious early 20th-century foray into alchemy.
  • “Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth’s vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.” A sweeping study of mammalian genomes, published in Science by Christmas and colleagues, explores genetic factors that give rise to “exceptional” traits, some of which may have potential therapeutic applications.
  • “As different as these multicellular creatures might be, their bodies are all composed of the same type of cell — eukaryotic cells, which enclose their DNA in a nucleus and possess energy-producing mitochondria. The much older prokaryotic cells, which make up the vast kingdoms of bacteria and archaea and whose cells lack these features, never got complex multicellularity off the ground. They have evolved primitive forms of multicellularity, such as colonies of photosynthetic cyanobacteria. But there they stop. Even with a 1.5-billion-year head start on eukaryotes, prokaryotes never evolved this other way of living.” In an article for Quanta, Veronique Greenwood explores some potential genetic answers to the puzzle of why bacteria do not evolve into multicellular organisms.

COMMUNICATION, HEALTH EQUITY & POLICY

Image credit: Jerrod Erbe/Unsplash
  • “As well as flagging PubPeer discussions, the plug-in alerts users if a study, or a paper that it cites, has been retracted. There are existing tools that alert academics about retracted citations; some can do this during the writing process, so that researchers are aware of the publication status of studies when constructing bibliographies. But with the new tool, users can opt in to receive notifications about further ‘generations’ of retractions — alerts cover not only the study that they are reading, but also the papers it cites, articles cited by those references and even papers cited by the secondary references.” In a news article for Nature, Dalmeet Singh Chawla describes a new tool designed to provide automated alerts when a given article is being discussed on PubPeer, an online forum where users can raise concerns about the scientific rigor and integrity of published papers (a sketch of the multi-generation citation check appears after this list).
  • “A group of researchers is taking Nature to task for publishing a paper earlier this month about Google DeepMind’s protein folding prediction program without requiring the authors publish the code behind the work.” A post at Retraction Watch gathers some disgruntled reactions to the recent publication (noted in a previous Roundup) of a paper describing advances in DeepMind’s latest protein-folding AlphaFold model.
  • “What happens when patients suddenly stop their medications? We study the health consequences of drug interruptions caused by large, abrupt, and arbitrary changes in price….We conclude that, far from curbing waste, cost-sharing is itself highly inefficient, resulting in missed opportunities to buy health at very low cost ($11,321 per life-year).” In a research article published in the Quarterly Journal of Economics, Chandra, Flack, and Obermeyer take advantage of policy-induced pseudorandomization to examine the potential harms arising from the ubiquitous practice of cost-sharing in healthcare coverage.
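As a rough illustration of the multi-generation alerting logic Chawla describes, the sketch below walks the citation graph breadth-first, flagging retracted papers within a set number of citation hops. Here `get_references` and `is_retracted` are hypothetical stand-ins for lookups against a citation index and a retraction database (for example, Crossref metadata and the Retraction Watch database); the real plug-in's internals are not public in this level of detail.

```python
# Sketch of multi-generation retraction checking via breadth-first search;
# the two lookup functions are hypothetical stand-ins, not a real API.
from collections import deque

def get_references(doi: str) -> list[str]:
    """Hypothetical: return the DOIs cited by the given paper."""
    raise NotImplementedError

def is_retracted(doi: str) -> bool:
    """Hypothetical: return True if the paper has been retracted."""
    raise NotImplementedError

def retracted_in_citation_tree(root_doi: str,
                               max_generations: int = 3) -> list[str]:
    """Flag retracted papers within `max_generations` hops of the root:
    the paper itself, its references, their references, and so on."""
    flagged, seen = [], {root_doi}
    queue = deque([(root_doi, 0)])
    while queue:
        doi, depth = queue.popleft()
        if is_retracted(doi):
            flagged.append(doi)
        if depth < max_generations:
            for ref in get_references(doi):
                if ref not in seen:
                    seen.add(ref)
                    queue.append((ref, depth + 1))
    return flagged
```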