The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

February 16, 2024

In this week’s Duke AI Health Friday Roundup: the toll of digital disconnection; teaching LLMs to mimic doctors’ cognitive approaches; prosthetic allows user to sense temperature; a benchmark for LLMs designed to diagnose rare diseases; bibliometric analysis shows lack of clarity regarding genAI use in scientific publishing; LLMs can autonomously hack websites; regulatory frameworks for thinking about AI; the lasting epigenetic effects of smoking; and much more:


This picture is made up of 9 images in rows of 3. Each row shows a different image of a pill bottle spilling out pills onto a plain surface, on yellow or white backgrounds. On one side, the image is an original photograph. The next two iterations show it getting represented in progressively larger blocks of colour. Image credit: Rens Dimmendaal & Banjong Raksaphakdee / Better Images of AI / CC-BY 4.0
  • “…we develop diagnostic reasoning prompts to study whether LLMs can imitate clinical reasoning while accurately forming a diagnosis. We find that GPT-4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can imitate clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether an LLMs response is likely correct and can be trusted for patient care.” A Brief Communication published in NPJ Digital Medicine by Savage and colleagues presents research exploring whether large language models can be coaxed to mimic human reasoning processes.
  • “By mathematically probing large language models for weaknesses, researchers have discovered weird chatbot behaviors. Adding certain mostly unintelligible strings of characters to the end of a request can, perplexingly, force the model to buck its alignment…The attacks also reveal how, despite chatbots’ often convincingly humanlike performance, what’s under the hood is very different from what guides human language.” Science News’ Emily Conover reports on AI researchers’ efforts to safeguard LLM chatbots from prompt-based tampering with their behavioral guidelines.
  • “…we introduce RareBench, a pioneering benchmark designed to systematically evaluate the capabilities of LLMs on 4 critical dimensions within the realm of rare diseases. Meanwhile, we have compiled the largest open-source dataset on rare disease patients, establishing a benchmark for future studies in this domain. To facilitate differential diagnosis of rare diseases, we develop a dynamic few-shot prompt methodology, leveraging a comprehensive rare disease knowledge graph synthesized from multiple knowledge bases, significantly enhancing LLMs’ diagnostic performance.” In a preprint available from arXiv, Chen and colleagues present a benchmark for evaluating the performance of LLM applications trained to diagnose rare diseases.
  • In a blog post at OpenAI, the ChatGPT originators describe new features for the chatbot that allow users to prompt it to both “remember” and “forget” things within a chat session.
  • “In this work, we show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. Importantly, the agent does not need to know the vulnerability beforehand. This capability is uniquely enabled by frontier models that are highly capable of tool use and leveraging extended context. Namely, we show that GPT-4 is capable of such hacks, but existing open-source models are not. Finally, we show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild. Our findings raise questions about the widespread deployment of LLMs.” In a preprint available from arXiv, Fang and colleagues present research showing that LLMs can be trained to autonomously engage in web hacking.


A hand-drawn sketch in coloured pens depicts a small uniform row of houses at the bottom of the picture. From each of the houses, 0s and 1s representing digital data are floating up in plumes which look like smoke from the chimneys of the houses; but these are not uniform: they represent the different types of data from each house. Image credit: Joahna Kuiper / Better Images of AI / Little data houses / CC-BY 4.0
  • “We come into this world craving the presence of others. But a few modern trends…spread us out as adults in a way that invites disconnection….screens have replaced a chunk of our physical-world experience with a digital simulacrum that has enough spectacle and catastrophe to capture hours of our greedy attention. These devices so absorb us that it’s very difficult to engage with them and be present with other people.” In an article for The Atlantic, Derek Thompson probes the epidemic of social isolation pervading America – and its potential effects on mental health and wellbeing.
  • “A team of researchers in Italy and Switzerland attached the device, called “MiniTouch,” to the prosthetic hand of a 57-year-old man named Fabrizio, who has an above-the-wrist amputation. In tests, the man could identify cold, cool and hot bottles of liquid with perfect accuracy; tell the difference between plastic, glass and copper significantly better than chance; and sort steel blocks by temperature with around 75 percent accuracy…” Science News’ Simon Makin reports on progress in developing prosthetic limbs that allow their users to sense the temperature of objects they touch.
  • “Among the environmental factors studied, the authors report that smoking-related variables showed the most statistically significant associations across immune stimulations. Smoking was found to exert a transient effect on immediate, non-specific, innate immune responses. Surprisingly, its enduring influence on specialized adaptive immune responses was found to persist well beyond smoking cessation….The study reveals that the association between smoking and cytokines in the adaptive branch of the immune system is shaped by a specific epigenetic process called DNA methylation, which modifies DNA sequences in the nucleus.” In a news article at Nature, Luo and Stent discuss recently published research suggesting that smoking has lasting effects on the immune system.

COMMUNICATION, HEALTH EQUITY & POLICY

Close-up photograph of an open book, taken from top-down, perspective aligned with the book’s spine as the pages fan out. Image credit: Jonas Jacobsson/Unsplash
  • “Guidelines by some top publishers and journals on the use of GAI by authors are lacking. Among those that provided guidelines, the allowable uses of GAI and how it should be disclosed varied substantially, with this heterogeneity persisting in some instances among affiliated publishers and journals. Lack of standardization places a burden on authors and could limit the effectiveness of the regulations. As GAI continues to grow in popularity, standardized guidelines to protect the integrity of scientific output are needed.” A bibliometric analysis published in BMJ by Ganjavi and colleagues surveys the current state of biomedical publishing policies with respect to the use of generative AI (such as LLM chatbots and image generators).
  • “The FDA has catalyzed and organized an entire field of expertise that has enhanced our understanding of pharmaceuticals and creating and disseminating expertise across stakeholders far beyond understanding incidents in isolation. AI is markedly opaque in contrast: mapping the ecosystem of companies and actors involved in AI development (and thus subject to any accountability or safety interventions) is a challenging task absent regulatory intervention.” A policy brief from the AI Now Institute explores how an FDA-analog might work in the context of developing regulation for artificial intelligence applications (H/T Sophia Bessias).
  • “The list is intended to spotlight the top 2% of influential scientists and intriguingly includes 235 authors purported to have publishing careers spanning over 80 years. For instance, William S. Marshall, a biologist from St. Francis Xavier University, is credited with a staggering 187 years of publishing, from 1834 to 2021. William Marshall is a retired professor….Similarly, Tom L. Blundell of Cambridge University is listed with a publication history beginning in 1853, yet he was born in 1942. Another historical figure, Lord Kelvin, is noted for publishing from 1849 until 2011, including posthumous publications after he died in 1907.” A guest post at the Scholarly Kitchen by Akirah Abduh points out some curious anomalies that emerged from a review of an annually updated bibliometric list that identifies the top tier of scientific productivity.