AI Health
Friday Roundup
The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.
July 5, 2024
In this week’s Duke AI Health Friday Roundup: the value of clinical humility in the face of uncertainty; implications of generative AI for robotics; refining risk prediction for binary outcomes; managing retractions and the implications for information literacy; the limits of what scaling can achieve in AI; regulating AI with existing frameworks; benchmarking “haystack” summarization performance for large language models; lessons from China on managing hypertension; much more:
AI, STATISTICS & DATA SCIENCE
- “…the feasibility of ever gathering sufficient data to develop a general-purpose robotics model is questionable. The complexity of real-world interactions is enormous, and high standards in reliability and robustness are needed. A high zero-shot performance of 50% or even 75% is an impressive achievement in the laboratory setting, but unacceptable in real-world interactions.” An editorial published in Nature Machine Intelligence examines the potential for generative AI to power a new generation of robotic applications.
- “When evaluating the performance of a model for individualised risk prediction, the sample size needs to be large enough to precisely estimate the performance measures of interest. Current sample size guidance is based on precisely estimating calibration, discrimination, and net benefit, which should be the first stage of calculating the minimum required sample size. However, when a clinically important threshold is used for classification, other performance measures can also be used. We extend the previously published guidance to precisely estimate threshold-based performance measures.” A preprint by Whittle and colleagues, available from arXiv, extends existing guidance for calculating the sample size needed to evaluate prediction models for binary outcomes when a classification threshold is used (a rough illustrative calculation appears after this list).
- “Research on scaling laws shows that as we increase model size, training compute, and dataset size, language models get “better”. The improvement is truly striking in its predictability, and holds across many orders of magnitude. This is the main reason why many people believe that scaling will continue for the foreseeable future, with regular releases of larger, more powerful models from leading AI companies…. But this is a complete misinterpretation of scaling laws. What exactly is a “better” model? Scaling laws only quantify the decrease in perplexity, that is, improvement in how well models can predict the next word in a sequence.” A detailed post by Arvind Narayanan and Sayash Kapoor at the AI Snake Oil Substack breaks down what scaling in AI development can – and can’t – accomplish (the general form such scaling laws take is sketched after this list).
- “Our findings indicate that SummHay is an open challenge for current systems, as even systems provided with an Oracle signal of document relevance lag our estimate of human performance (56%) by 10+ points on a Joint Score. Without a retriever, long-context LLMs like GPT-4o and Claude 3 Opus score below 20% on SummHay. We show SummHay can also be used to study enterprise RAG systems and position bias in long-context models.” A preprint by Laban and colleagues, available from HuggingFace and arXiv, introduces SummHay, a “summary of a haystack” benchmark that evaluates LLMs and retrieval-augmented (RAG) systems on summarizing large collections of documents and citing their sources.
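As a rough illustration of why threshold-based measures change the sample-size picture in the Whittle and colleagues item above, the sketch below is a standard Wald-interval precision calculation – not the authors’ method, and the function names, anticipated sensitivity, prevalence, and interval width are all illustrative assumptions. The point it makes concrete is that only cases contribute to estimating sensitivity at a threshold, so low-prevalence outcomes inflate the required cohort quickly.

```python
from math import ceil
from statistics import NormalDist

def n_for_proportion(p_anticipated: float, ci_half_width: float, conf: float = 0.95) -> int:
    """Observations needed to estimate a proportion to within a given Wald CI half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p_anticipated * (1 - p_anticipated) / ci_half_width ** 2)

def n_for_sensitivity(sens_anticipated: float, prevalence: float,
                      ci_half_width: float, conf: float = 0.95) -> int:
    """Total validation-cohort size: only cases inform the sensitivity estimate,
    so scale the required number of cases up by the outcome prevalence."""
    n_cases = n_for_proportion(sens_anticipated, ci_half_width, conf)
    return ceil(n_cases / prevalence)

# Illustrative assumptions: anticipated sensitivity 0.80, outcome prevalence 0.10,
# target 95% CI half-width of 0.05 around the sensitivity estimate.
print(n_for_sensitivity(0.80, 0.10, 0.05))  # -> 2460 participants (246 cases)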
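For readers unfamiliar with the scaling laws discussed by Narayanan and Kapoor above, the expression below shows the general form such laws take – here the parametric form popularized by the “Chinchilla” work of Hoffmann and colleagues, offered as background rather than a formula taken from the post. The quantity being predicted is cross-entropy loss on next-token prediction (equivalently, perplexity), which is the post’s central point: scaling laws forecast loss, not downstream capability.

```latex
% Chinchilla-style parametric scaling law: predicted next-token loss L as a
% function of parameter count N and training tokens D; E is the irreducible
% loss and A, B, \alpha, \beta are constants fitted to observed training runs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```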
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- “One of the greatest challenges in medicine is the desire for clinical certainty. Our patients demand it. Our colleagues demand it. Many physicians, determined to offer their patients certainty, shy away from admitting that all too often they cannot offer assurance. Yet anyone who practices medicine knows that ‘it depends’ is often the most accurate answer we can give…As I’ve reflected on how clinicians responded to the COVID crisis, I’ve come to believe that being comfortable with saying ‘I don’t know’ is one of the most important ways we can build trust with our patients and one another.” In an essay for Harvard Medicine Magazine, Sachin Jain explores the concept of “clinical humility” as physicians are inevitably called upon to navigate the gap between what the patient needs and what the clinician actually knows.
- “The U.S. has slid backward on control of high blood pressure, despite ready access to medicines and other tools to moderate its risks. Dan Jones, former president of the American Heart Association, thinks the nation can learn from China….Researchers there recently detailed the success of community health workers — well trained people but not M.D.s — helping thousands of people living with high blood pressure in rural regions. People who received a combination of blood pressure monitoring, medication adjustments, and health coaching from these nonclinicians saw their blood pressure readings go down significantly during the study’s four years, a testament to the impact of people known as ‘village doctors’ who went beyond usual care in the health care system.” STAT News’ Elizabeth Cooney reports (subscription required) on China’s approaches to managing hypertension – and their potential adaptability to the US as progress against hypertension falters.
- “Moderna’s influenza vaccine candidate uses current mRNA technology leveraged successfully during the COVID-19 response, resulting in one of the first two FDA-authorized – and ultimately FDA-licensed – COVID-19 vaccines. In 2023, BARDA issued a request for proposal to Moderna and other companies to develop mRNA vaccines to prepare for potential public health emergencies (PHEs) caused by influenza viruses, such as avian influenza A(H5N1). mRNA vaccines have the potential to complement traditional vaccine technologies during a pandemic influenza emergency response. The Centers for Disease Control and Prevention (CDC) has said the risk to general human health from H5N1 is still low and this award is a part of ASPR preparedness efforts.” A press release from the U.S. Department of Health and Human Services describes an award of funding to Moderna for development of an mRNA vaccine against a potential avian influenza pandemic.
COMMUNICATION, HEALTH EQUITY & POLICY
- “A drug is simply a molecule. Whether a specific molecule will resolve a specific problem for a specific patient at a specific time is a clear, testable hypothesis—a hypothesis that clinicians form and test each time we prescribe a molecule to our patients. Likewise, an AI is simply an algorithm. And the hypothesis that any particular AI algorithm will be effective in any specific situation is, of course, testable.” An editorial published in Science by Barron, Li, and Li makes the case for greater specificity when talking about – and evaluating – the capacity for an AI application to provide benefit for patients in a clinical context.
- “Although there is currently no separate statutory authority to regulate AI in clinical care, we believe that the CoPs [CMS Conditions of Participation] for hospitals already require them to develop policies and procedures related to the use of AI in their organizations detailing the qualifications and responsibilities of end users and those involved in monitoring safety issues when AI is used. Safety, transparency, accountability, equity, fairness, and usefulness are some of the principles that governance structures can implement to ensure trustworthy AI solutions are used for their patient care.” A viewpoint paper by Duke’s Lee Fleisher and Nicoleta Economou, published in JAMA Health Forum, examines how existing frameworks for ensuring patient safety in hospitals can be used to govern clinical AI applications.
- “…unmarked retractions pose significant challenges to information literacy. If the fact that the retraction occurred isn’t discoverable, that’s an incredibly important piece of context that a reader doesn’t have when deciding if and how they want to use the publication. A reader who is unaware that an article has been retracted would be more likely to use that article in their research and practice, and perpetuate its misinformation.” Retraction Watch interviews the University of Regina’s Caitlin Bakker regarding recent research that examines how paper retractions are managed, and how that in turn can impact the integrity of the larger body of science.