AI Health Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

August 2, 2024

In this week’s Duke AI Health Friday Roundup: parsing the implications of implementing PREVENT cardiovascular risk equations; AI-generated training set causes model to melt down into nonsense; tracking the “expert gaze” to evaluate AI decision support; making time for researchers to think; framework addresses algorithmic bias for nurses; AI image detectors stumble at flagging faked Western blot images; spotting spin in systematic reviews; much more:

AI, STATISTICS & DATA SCIENCE

Extreme closeup view of a human eye, showing details of iris and pupil. Image credit: v2osk/Unsplash
  • “We compared standard viewing of bitewing images without AI support versus viewing where AI support could be freely toggled on and off. We found that experts turned the AI on for roughly 25% of the total inspection task, and generally turned it on halfway through the course of the inspection. Gaze behavior showed that when supported by AI, more attention was dedicated to user interface elements related to the AI support, with more frequent transitions from the image itself to these elements….such interruptions in attention can lead to increased time needed for the overall assessment.” A research article published in npj Digital Medicine by Castner and colleagues used eye-tracking software to assess how experts work with AI-enabled decision support tools and examined the implications of those findings for the integration of AI tools into clinical workflows.
  • “The researchers began by using an LLM to create Wikipedia-like entries, then trained new iterations of the model on text produced by its predecessor. As the AI-generated information — known as synthetic data — polluted the training set, the model’s outputs became gibberish. The ninth iteration of the model completed a Wikipedia-style article about English church towers with a treatise on the many colours of jackrabbit tails (see ‘AI gibberish’).” In a news article at Nature, Elizabeth Gibney describes recent research, published in the same journal, that demonstrates how an AI model descends into “gibberish” as it is fed AI-generated training material; a toy illustration of the underlying mechanism appears after this list.
  • “In this work, we tackle a task which we call data mixture inference, which aims to uncover the distributional make-up of training data. We introduce a novel attack based on a previously overlooked source of information — byte-pair encoding (BPE) tokenizers, used by the vast majority of modern language models. Our key insight is that the ordered list of merge rules learned by a BPE tokenizer naturally reveals information about the token frequencies in its training data: the first merge is the most common byte pair, the second is the most common pair after merging the first token, and so on.” In a preprint available from arXiv, Hayase and colleagues present a “data mixture inference” attack that uncovers details about an AI model’s training data via the byte-pair encoding tokenizers used by the vast majority of modern large language models; a minimal sketch of the underlying idea follows this list.
  • “This study evaluates the efficacy of three free web-based AI detectors in identifying AI-generated images of Western blots, which is a very common technique in biology. We tested these detectors on a collection of artificial Western blot images (n=48) that were created using ChatGPT 4 DALLE 3 and on authentic Western blots (n=48) that were sampled from articles published within four biology journals in 2015; this was before the rise of generative AI based on large language models. The results reveal that the sensitivity…across various AI prevalence were low, for example reaching 0.1885 for Is It AI, 0.1429 for Hive Moderation, and 0.1189 for Illuminarty at an AI prevalence of 0.1. This highlights the difficulty in confidently determining image authenticity based on the output of a single detector.” In a preprint available at arXiv, Romain-Daniel Gosselin presents findings from a study that evaluated the effectiveness of free web-based “AI detector” tools at identifying AI-generated Western blot images (faked or tampered-with Western blots are a common form of data fraud in the published scientific literature).
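
The mechanism behind the “gibberish” result in Gibney’s story can be illustrated without a neural network. The toy simulation below is a hypothetical sketch, not the Nature study’s setup; the vocabulary, weights, and sample sizes are invented. It repeatedly re-estimates a word-frequency distribution from a finite sample of the previous generation’s output: rare words that happen to draw zero samples vanish permanently, so each “model generation” is less diverse than the last.

```python
# Toy model-collapse simulation: each "generation" is trained only on
# samples produced by the previous generation's distribution.
import random
from collections import Counter

random.seed(42)

# Generation 0: a 50-word vocabulary with a long tail of rare words.
weights = {f"word{i}": 1 / (i + 1) for i in range(50)}
total = sum(weights.values())
dist = {w: p / total for w, p in weights.items()}

n_samples = 300  # finite "training set" drawn at each generation
for generation in range(1, 10):
    words, probs = zip(*dist.items())
    sample = random.choices(words, weights=probs, k=n_samples)
    # "Train" the next model: re-estimate probabilities from counts.
    counts = Counter(sample)
    dist = {w: c / n_samples for w, c in counts.items()}
    print(f"generation {generation}: vocabulary size = {len(dist)}")

# The printed vocabulary size never increases: once a rare word draws zero
# samples it is gone for good, and the distribution concentrates on the
# most common words -- the same tail-loss dynamic the paper describes.
```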
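
And here is a minimal sketch of the tokenizer insight Hayase and colleagues build on: BPE training is greedy and frequency-ordered, so the resulting merge list leaks the relative frequencies of byte pairs in the training corpus. The trainer and corpus below are invented for illustration, and the sketch stops well short of the preprint’s actual attack, which solves for the proportions of different data sources in the mix.

```python
# Minimal BPE trainer: merge rules are learned most-frequent-first, so the
# ordered merge list reflects the make-up of the training corpus.
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merge rules from a list of words, greedily by frequency."""
    words = Counter(tuple(word) for word in corpus)  # words as char tuples
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the current tokenization.
        pairs = Counter()
        for toks, freq in words.items():
            for pair in zip(toks, toks[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair merges first
        merges.append(best)
        # Apply the new merge rule everywhere before learning the next one.
        new_words = Counter()
        for toks, freq in words.items():
            out, i = [], 0
            while i < len(toks):
                if i + 1 < len(toks) and (toks[i], toks[i + 1]) == best:
                    out.append(toks[i] + toks[i + 1])
                    i += 2
                else:
                    out.append(toks[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

# A corpus dominated by English-like words yields English-like early merges;
# shifting the mix toward code (or another language) reorders them, which is
# the signal a data mixture inference attack can exploit.
corpus = ["the"] * 50 + ["then"] * 20 + ["def"] * 5 + ["return"] * 3
print(train_bpe(corpus, 5))
# -> [('t', 'h'), ('th', 'e'), ('the', 'n'), ('d', 'e'), ('de', 'f')]
```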

BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH

Red heart shape drawn on rough paper or fabric is pinned to a piece of twine. Image credit: Debby Hudson/Unsplash
  • “The takeaway should not be that a large proportion of US adults receiving primary prevention will be ineligible for preventive therapies using PREVENT-ASCVD. Rather, the key message is that the establishment of optimal PREVENT-ASCVD risk thresholds for guiding therapy is critical in the development of future guidelines. PREVENT-ASCVD offers a pathway to more accurate and inclusive risk prediction, can be used to motivate sustained lifestyle changes, and can help focus statin and antihypertensive therapy on those most likely to benefit.” In an editorial published in JAMA, Grant, Ndumele, and Martin parse the implications of a new study by Diao and colleagues suggesting that adoption of the new PREVENT equations for cardiovascular risk prediction may significantly change the use of statin and antihypertensive therapies for the prevention of heart disease.
  • “Overall, we find no main effect of age when tasked with recognizing a theme in a piece of music, nor any significant interaction of age with familiarity, setting or musical training. Age was only a significant predictor for hit rate analysed separately. Familiarity was consistently a significant predictor, where performance was best overall for Eine Kleine Nachtmusik, a tonal and familiar piece. When removed, we also report a significant difference between the unfamiliar tonal (Pirate Waltz) and unfamiliar atonal (Unexpectedly Absent) pieces, where tonality conferred an advantage. Musical training had small positive effects, where formal musical training predicted a small reduction in false alarm rate and non-formal musical training predicted better task performance.” A research article published by Sauvé and colleagues in the journal PLOS One describes a series of controlled experiments suggesting that memory for music is curiously resistant to age-related decline.
  • “Drawing on principles recently articulated by the Office of the National Coordinator for Health Information Technology, we conducted a critical examination of the concept of health equity by design. We also reviewed recent literature describing the risks of artificial intelligence (AI) technologies in healthcare as well as their potential for advancing health equity. Building on this context, we describe the BE FAIR framework, which has the potential to enable nurses to take a leadership role within health systems by implementing a governance structure to oversee the fairness and quality of clinical algorithms.” In a paper published this month in the Journal of Nursing Scholarship, Cary and colleagues introduce a framework for identifying and mitigating biases in algorithmic applications in healthcare.

COMMUNICATION, HEALTH EQUITY & POLICY

Photograph of a pinkish piggy bank, facing the camera, against a light-colored background. Image credit: Fabian Blank/Unsplash
  • “We therefore curated and used an open dataset of annual APC list prices from Elsevier, Frontiers, MDPI, PLOS, Springer Nature, and Wiley in combination with the number of open access articles from these publishers indexed by OpenAlex to estimate that, globally, a total of $8.349 billion ($8.968 billion in 2023 US dollars) were spent on APCs between 2019 and 2023….After adjusting for inflation, we also show that annual spending almost tripled from $910.3 million in 2019 to $2.538 billion in 2023, that hybrid exceed gold fees, and that the median APCs paid are higher than the median listed fees for both gold and hybrid.” A preprint article by Haustein and colleagues, available from arXiv, estimates the total global tab for APCs, or article processing charges (that is, fees associated with the expense of open-access publication, typically borne by authors or their institutions), paid between 2019 and 2023.
  • “One way to think about the practice of juggling research with e-mail and instant messaging is to visualize someone working next to a physical letterbox. Imagine opening and reading every letter as soon as it arrives, and starting to compose a reply, even as more letters drop through the box — all the while trying to do your main job. Researchers say that their to-do lists tend to lengthen, in part because colleagues can contact them instantly, often for good reasons. Researchers also often have to choose what to prioritize, which can cause them to feel overwhelmed.” A Nature editorial makes the case for studying the effects of instant communication tools, which can overwhelm scientists and make time for thought and reflection a rare commodity.
  • “We define ‘spin for harms in systematic reviews’ as any presentation of harm results that misleads readers about the review’s complete findings for harms through inappropriate reporting, interpretation, or extrapolation. In applying our framework to a random sample of 100 systematic reviews, we found that a third had at least 1 type of spin. Among the 58 reviews that assessed harm as an outcome, we found that nearly half (48%) of those had at least 1 type of spin. We also provided examples of how such spin manifests in systematic reviews and ways to rectify it.” An article by Qureshi and colleagues, published this month in the Annals of Internal Medicine, proposes a framework for identifying and acting on “spin” in systematic reviews.