In this week’s Duke AI Health Friday Roundup: transparency index for foundation models; upending assumptions about 1918 flu; disparity dashboards considered; fixing drift in image classifiers; COVID trial shows no benefit for vitamin C; Excel data gremlin vanquished; LLMs reveal medical racism, bias when queried; external validation not enough for clinical AI; “data poison” fends off generative AI; NIH changes grant evaluation criteria; much more:
AI, STATISTICS & DATA SCIENCE
- “To assess the transparency of the foundation model ecosystem and help improve transparency over time, we introduce the Foundation Model Transparency Index. The 2023 Foundation Model Transparency Index specifies 100 fine-grained indicators that comprehensively codify transparency for foundation models, spanning the upstream resources used to build a foundation model (e.g. data, labor, compute), details about the model itself (e.g. size, capabilities, risks), and the downstream use (e.g. distribution channels, usage policies, affected geographies).” A group of authors from Stanford, MIT, and Princeton has released a preprint describing their Foundation Model Transparency Index, a tool for assessing how transparent widely available generative AI models are.
- “The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth.” MIT Technology Review’s Melissa Heikkilä reports on a new tool that “poisons” image data to forestall their use by generative AI.
- “We report that four major commercial LLMs all had instances of promoting race-based medicine. Since these models are trained in an unsupervised fashion on large-scale corpuses from the internet and textbooks, they may incorporate older, biased, or inaccurate information since they do not assess research quality….Most of the models appear to be using older race-based equations for kidney and lung function, which is concerning since race-based equations lead to worse outcomes for Black patients.” A brief communication published in NPJ Digital Medicine by Omiye and colleagues documents how harmful racist canards and other forms of bias have been absorbed by commercially available, off-the-shelf large language models.
- “Excel’s automatic conversions are intended to make it easier and faster to input certain types of commonly entered data — numbers and dates, for instance. But for scientists using quick shorthand to make things legible, it could ruin published, peer-reviewed data, as a 2016 study found.” Hallelujah! The Verge’s Wes Davis reports that Microsoft has fixed the date-conversion glitch/feature that was wreaking havoc on scientific datasets, especially genomic ones.
- “We have proposed an effective method to correct clinically relevant performance drift attributable to changes in the image acquisition pipeline (e.g., replacement of scanners, updates of image processing software, use of different staining protocols). This has been demonstrated across various scenarios from one-off changes when deploying to a new site to gradual and progressive drifts in acquisition characteristics over time.” A research article published last week in Nature Communications by Roschewitz and colleagues describes an approach for automatically correcting AI-based medical imaging classification systems when their performance begins to drift.
- “We argue that it is a fallacy to judge a model’s generalizability, reliability, safety or utility from external validation alone, especially when operational inputs are used. Using external validation to make deterministic, broad conclusions about generalizability and subsequent reliability can lead us astray. We need scalable validation techniques that work for models across healthcare facilities with vastly different operational, workflow and demographic characteristics.” A perspective published in Nature Medicine by Youssef and colleagues argues that external validation of clinical AI applications is not sufficient for ensuring safety and efficacy of these tools.
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- “…the narrative of the 1918 flu has been that it was a unique killer, taking down all ages no matter the state of their health, and mysteriously most lethal to people whose immune systems were most robust. Now, though, an analysis of skeletons of people who died in 1918 shows that story may not be correct. Their bones retain evidence of underlying frailty, from other infections or malnutrition. That finding could both rewrite the history of 1918 and affect how we plan for pandemics to come.” An article in Wired by Maryn McKenna re-examines conventional wisdom about the epidemiology of the 1918 influenza pandemic.
- “The National Institutes of Health is taking steps to simplify its process to assess the scientific merit of research grant applications and mitigate elements that have the potential to introduce bias into review. The changes will help reviewers focus on the potential for proposed research to advance scientific knowledge and improve human health. Previously, five criteria were individually scored using a common scale; the simplified review framework reorganizes these criteria into three factors. Two of these factors – importance of research and rigor and feasibility – are scored using a common scale. A third factor, expertise and resources, is evaluated for sufficiency only and not given a numeric score.” A recent announcement from the National Institutes of Health describes the agency’s new framework for reviewing grant funding proposals, which includes new steps meant to eliminate the advantage conferred by the reputation of some institutions.
- “From the standpoint of the bedside clinician, the results of the harmonized trial are clear: vitamin C should not be used as a treatment for hospitalized patients with COVID-19. Even though prior, significantly smaller trials showed a potential benefit for vitamin C in patients with COVID-19, there is a well-known risk of bias with small studies, and the harmonized trial in totality strongly supports futility with vitamin C with the chance of harm being significantly greater than the small possibility of efficacy.” In an editorial for JAMA, Jabaley and Coopersmith put into perspective the findings from a “harmonized” pair of recently published trials of vitamin C for COVID and for community-acquired pneumonia.
- “The main underlying risk factors for gun violence are also amenable to a public health approach: intimate partner violence, alcohol use, and heat. Mass shootings are rare; shootings in the home, where women are most often the victims, are not—yet the former still dominate popular discourse around guns. As a recent Lancet Commission concluded, improvements in health equity and gender equality can put societies on pathways to enduring peace….However, a health approach alone is insufficient.” An editorial in the Lancet announces the creation of a Lancet Commission on guns and health.
- “In this study, we screened 1511 candidate predictors for RSV hospital admission, covering pregnancy and birth information, the infant’s diagnoses, and extensive disease and medication information for parents and siblings. We present a simple clinical prediction model for risk of RSV hospital admission during the first year of life, with satisfactory predictive performance across different RSV epidemics in two countries. The model showed potential clinical utility in decision curve analysis, and its performance was fair across different strata of parental income.” A research article published in Lancet Digital Health by Vartiainen and colleagues describes the creation and testing of a clinical risk prediction model for RSV hospital admission in infants.
COMMUNICATION, HEALTH EQUITY & POLICY
- “…most published dashboards were generated by academic medical centres and focused on either institutional or regional catchment areas. Disaggregation by race and ethnicity categories was most common, although some included disaggregation by sex, language, and age. Most disparity dashboards were static and not repeatedly updated after the final versions were released, suggesting that the demands of maintaining disparity dashboard….exceeded organisational capacities.” A review article published in Lancet Digital Health by Gallifant and colleagues surveys the creation, use, and updating of “disparity dashboards” intended to address issues relating to health inequities.
- “We are editors of bioethics and humanities journals who have been contemplating the implications of this ongoing transformation. We believe that generative AI may pose a threat to the goals that animate our work but could also be valuable for achieving those goals. We do not pretend to have resolved the many social questions that we think generative AI raises for scholarly publishing, but in the interest of fostering a wider conversation about these questions, we have developed a preliminary set of recommendations about generative AI in scholarly publishing.” A perspective article by a group of scholarly editors, published in the Hastings Center Report, lays out some preliminary approaches to dealing with the use of AI tools in research and publishing.
- “While the tools or methods employed may become obsolete with time, peer review’s underlying goal or purpose remains unchanged. Yes, the academic landscape will evolve further as technology advances, societal expectations change, and research practices adapt to new challenges. Hence, the peer review process too will need to evolve with the changing times….This evolution requires a commitment to adapt, innovate, and learn from experiences.” A guest post by CACTUS’ Roohi Ghosh at Scholarly Kitchen makes a case not for abandoning the often-cumbersome and disappointing process of peer review, but rather for updating it to meet modern challenges.
- “The Web created the potential for a more decoupled publishing system in which articles are initially disseminated by preprint servers and then undergo evaluation elsewhere. To build this future, we must first understand the roles journals currently play and consider what types of content screening and review are necessary and for which papers. A new, open ecosystem involving preprint servers, journals, independent content-vetting initiatives, and curation services could provide more multidimensional signals for papers and avoid the current conflation of trust, quality, and impact.” Continuing the theme of scholarly publishing reform, an essay published in PLOS Biology by Richard Sever surveys the potential for change in scholarly publishing and the need for adaptation ushered in by technological developments.
- “This publication…aims to deliver an overview of regulatory considerations on AI for health that covers the following six general topic areas: documentation and transparency, the total product lifecycle approach and risk management, intended use and analytical and clinical validation, data quality, privacy and data protection, and engagement and collaboration. This overview is not intended as guidance or as a regulatory framework or policy.” The World Health Organization has released a discussion paper focused on regulatory issues in AI.