AI Health

Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

April 5, 2024

In this week’s Duke AI Health Friday Roundup: benchmarking LLMs for extracting oncology data from charts; greenery and mental well-being; LLMs get around information asymmetry; tiny artificial liver shows promise for treating liver failure without transplantation; network analysis reveals fraudulent “paper mills”; turning a skeptical eye on LLM performance on bar exams; the serious health impacts of loneliness; much more:


A person is illustrated in a warm, cartoon-like style in green. They are looking up thoughtfully from the bottom left at a large hazard symbol in the middle of the image. The hazard symbol is a bright orange square tilted 45 degrees, with a black and white illustration of an exclamation mark in the middle where the exclamation mark shape is made up of tiny 1s and 0s like binary code. To the right-hand side of the image a small character made of lines and circles (like nodes and edges on a graph) is standing with its ‘arms’ and ‘legs’ stretched out, and two antennae sticking up. It faces off to the right-hand side of the image. Image credit: Yasmin Dwiputri & Data Hazards Project / Better Images of AI / Managing Data Hazards / CC-BY 4.0
  • “Before health systems rush to integrate such technology into clinical practice, however, it is important to put this work into context. The evaluations of Van Veen et al. were conducted on retrospective data sets, not in prospective scenarios in which the generated summaries were for actual patients of the clinicians involved in the user study. Furthermore, although the datasets were diverse, they do not necessarily represent high-priority targets to improve clinical workflows.” A viewpoint article by Kirk Roberts, published in Nature Medicine, offers a skeptical perspective on the recent efforts aimed at applying large language models to the problem of burdensome medical documentation.
  • “A central argument of this paper asserts that artificial agents, powered by language models, can contribute to mitigating the pervasive issue of information asymmetry in information markets. These agents come with dual capabilities: a capacity to evaluate the quality of privileged information and the ability to forget. By granting these agents temporary access to proprietary information, vendors significantly reduce the risk of expropriation while allowing the agents to assess the information’s relevance to a specific query or task.” A preprint by Rahaman and colleagues, available from arXiv, describes a role for large language models in overcoming the Arrow Information Paradox.
  • “When you’re trying to get intuition for a problem, one thing that helps is to start with some simplifying assumptions. Those assumptions are usually not true, but they can help you come up with a road map. Say, ‘If I had an elephant, I could get over the mountains. Of course, I don’t have an elephant. But if I did, here’s how I would do it.’ And then you realize, ‘Well, maybe I don’t need an elephant for this step. A mule would be fine.’” Quanta’s Ben Brubaker interviews mathematician Russell Impagliazzo to reveal the playful origins of his foundational work in cryptography and complexity theory.
  • “The dataset facilitated the benchmarking of the zero-shot capability of LLMs in oncologic history summarization. It demonstrated the surprising zero-shot capability of the GPT-4 model in synthesizing oncologic history from the HPI and A&P sections, including tasks requiring advanced linguistic reasoning, such as extracting adverse events for prescribed medications and the reason for their prescription. The model, however, also showed room for improvement in causal inference, such as inferring whether a symptom was caused by cancer.” A research article published in NEJM AI by Sushil and colleagues presents a benchmark dataset of oncology-related information in patient chart notes, allowing different LLMs to be compared on automated extraction and summarization of information from those notes.
  • “…the study team used an algorithm to identify patients in the electronic health record in real time. Practice facilitators then worked with the participating primary care providers and patients to meet blood pressure targets, promote use of appropriate medications, achieve goals for blood glucose control, and engage in other guideline-directed care. The intervention period lasted 12 months, and the primary outcome was hospitalization for any reason.” A news article posted on Rethinking Clinical Trials reports on the publication in the New England Journal of Medicine of primary results from the NIH Pragmatic Clinical Trials Collaboratory’s ICD-PIECES study, which used a variety of innovative approaches, including an algorithm to identify candidate participants via EHR data and a stepped-wedge study design, to evaluate an intervention for improving care for patients with chronic kidney disease, hypertension and diabetes.


Photograph shows an empty park bench painted dark green, sitting against a background of greenery with a carpet of fallen yellow leaves underneath and on the bench itself. Image credit: Will Paterson/Unsplash
  • “Residential greenness is considered a unique and potentially modifiable exposure construct to reduce physiological stress and improve human health. Here this study aims to investigate the longitudinal relationships of residential greenness with incident depression and anxiety and to explore and compare the pathways in which greenery may influence mental health.” A study by Wang and colleagues published in Nature Mental Health finds that the presence of greenery in the local environment was associated with a reduced risk of depression and anxiety.
  • “The robot can move against flows of various substances within pipes featuring complex geometries and diverse materials. Solely powered by flow, the robot can transport cylindrical payloads with a diameter of up to 55% of the pipe’s diameter and carry devices such as an endoscopic camera for pipeline inspection…” Totally tubular tiny robots: A research article by Hong and colleagues, published in Science Robotics, demonstrates how tiny “kirigami-wheel” robots can traverse tortuous, tubular channels, drawing power from the flow of fluids within those tubes.
  • “The approach is unusual: researchers injected healthy liver cells from a donor into a lymph node in the upper abdomen of the person with liver failure. The idea is that in several months, the cells will multiply and take over the lymph node to form a structure that can perform the blood-filtering duties of the person’s failing liver.” Nature’s Max Kozlov reports on a pioneering effort to treat liver failure through the implantation of a tiny “mini-liver” in a patient’s lymph node.
  • “…why does feeling alone lead to poor health? Over the past few years, scientists have begun to reveal the neural mechanisms that cause the human body to unravel when social needs go unmet. The field ‘seems to be expanding quite significantly’, says cognitive neuroscientist Nathan Spreng at McGill University in Montreal, Canada. And although the picture is far from complete, early results suggest that loneliness might alter many aspects of the brain, from its volume to the connections between neurons.” In a news feature for Nature, Saima May Sidik explores the health implications of loneliness.

Communication, Health Equity & Policy

A touring/cruiser-style bicycle is propped against a steel pole, to which it is secured by a looped steel cable lock. A rough-surfaced exterior wall is in the background. Image credit: Indre B/Unsplash
  • “If a crime is easy to do, pays well, and has no punishment, then that is an endemic crime. It isn’t going away because the people who do it have incentives and don’t have disincentives…Organised research fraud fits a similar niche.” In a post on Medium, Adam Day makes a case for network analysis tools in identifying research fraud – particularly the kind originating from so-called “paper mills” that produce low- or no-quality “research” articles at large scale for profit.
  • “…to the extent that one believes the UBE to be a valid proxy for lawyerly competence, these results suggest GPT-4 to be substantially less lawyerly competent than previously assumed, as GPT-4’s score against likely attorneys (i.e. those who actually passed the bar) is 48th percentile. Moreover, when just looking at the essays, which more closely resemble the tasks of practicing lawyers and thus more plausibly reflect lawyerly competence, GPT-4’s performance falls in the bottom 15th percentile. These findings align with recent research work finding that GPT-4 performed below-average on law school exams…” A research article by Eric Martínez published in the journal Artificial Intelligence and Law critiques claims that the GPT-4 large language model was able to exceed the performance of the majority of human test-takers on bar exams.
  • “You are invited to participate in a research study evaluating medical students’ perspectives and usage of AI-based language models in practice. You will be asked to answer questions about if you have used AI large language models in practice, which AI large language models you have used, how you have used AI large language models, and information about your medical school. All survey responses are captured anonymously.” A study being conducted by researchers at the University of California – San Francisco is seeking input from medical students on the degree to which they have used large language models in their work.