AI Health

Friday Roundup

The AI Health Friday Roundup highlights the week’s news and publications related to artificial intelligence, data science, public health, and clinical research.

December 8, 2023

In this week’s Duke AI Health Friday Roundup: exploring LLMs’ capacity for inductive reasoning; Google debuts new Gemini LLM; structural racism and lung cancer risk; “passive fatigue” behind virtual meeting burnout; fruit flies suggest approach for generative AI learning; simple attack prompt can make LLMs disgorge sensitive training data; early warning for ovarian cancer; rating LLM trustworthiness; the global warming contributions of a digital pathology deep learning system; much more:

AI, STATISTICS & DATA SCIENCE

“In this work, we conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement…we observe that LMs are phenomenal hypothesis proposers (i.e., generating candidate rules), and when coupled with a (task-specific) symbolic interpreter…this hybrid approach achieves strong results across inductive reasoning benchmarks…. However, they also behave as puzzling inductive reasoners, showing notable performance gaps between rule induction (i.e., identifying plausible rules) and rule application (i.e., applying proposed rules to instances), suggesting that LMs are proposing hypotheses without being able to actually apply the rules.” A research paper by Qiu and colleagues, available as a preprint from arXiv, explores whether large language models are capable of applying the tools of inductive reasoning.
“…by modelling a robust Drosophila learning system that actively regulates forgetting with multiple learning modules, we propose a generic approach that appropriately attenuates old memories in parameter distributions to improve learning plasticity, and accordingly coordinates a multi-learner architecture to ensure solution compatibility.” A research article published in Nature Machine Intelligence by Wang and colleagues reports on an attempt to build AI learning systems, modeled on fruit fly neural architecture, that can learn flexibly without incurring the condition known in AI research as “catastrophic forgetting.”
“I was able to find verbatim passages the researchers published from ChatGPT on the open internet: Notably, even the number of times it repeats the word “book” shows up in a Google Books search for a children’s book of math problems. Some of the specific content published by these researchers is scraped directly from CNN, Goodreads, WordPress blogs, on fandom wikis, and which contain verbatim passages from Terms of Service agreements, Stack Overflow source code, copyrighted legal disclaimers, Wikipedia pages, a casino wholesaling website, news blogs, and random internet comments.” At 404 Media, Jason Koebler reports on an experiment by Google DeepMind researchers that resulted in ChatGPT disgorging training data – some of it sensitive.
“We’ve been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.” In a post on Google’s blog, CEO Sundar Pichai and DeepMind CEO Demis Hassabis introduce Gemini, a new large language model intended to compete with OpenAI’s GPT.
“Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially due to the reason that GPT-4 follows the (misleading) instructions more precisely.” A research paper presented by Wang and colleagues at NeurIPS 2023 describes a system for assessing trustworthiness in GPT large language models and offers findings from a test drive of that evaluation system.

BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH

“Our findings suggest that widespread use of deep learning in pathology might have considerable global-warming potential. The medical community, policy decision makers, and the public should be aware of this potential and encourage the use of CO₂ eq emissions reduction strategies where possible.” A modelling study published by Sadr and colleagues in Lancet Digital Health evaluates the carbon footprint of deep learning in digital pathology.
“…we demonstrated that the early detection of HGSOC [high-grade serous ovarian cancer] is potentially achievable through SCNA analysis of DNA extracted from archival Pap test smears of pre-HGSOC women. The analysis of cell swabs for the early detection of HGSOC has been investigated in the past because the procedure involved is noninvasive and well tolerated. Furthermore, cervical cell swabs offer the potential for population-wide screening. They can be easily incorporated into routine gynecological examinations, rendering them accessible to many women. This accessibility could lead to increased early detection rates and ultimately improve survival rates for ovarian cancer.” A research article published in Science Translational Medicine by Paracchini and colleagues presents findings from an attempt to develop an early ovarian cancer test using DNA analysis of archived Pap test samples.
“Given the dossier findings, its authors want all clinical testing of 3K3A-APC halted for now. Multiple neurologists and neuroscientists who reviewed the dossier for Science… ‘To have a fourfold increase in mortality in the first few days of giving the drug really gives me pause,’ says Wade Smith, a neurologist at the University of California, San Francisco. Smith found the whistleblower report so disturbing that he couldn’t sleep the night after he read it.” Science’s Charles Piller reports on serious whistleblower allegations of potential misconduct in the development and testing of a neurological drug meant to treat ischemic stroke.
“The findings of this scoping review suggest that structural racism contributes to unequal exposure to lung cancer risk factors and thus to disparate lung cancer risk among racial and ethnic minority groups. Addressing racial and ethnic inequities in lung cancer risk will require prioritization and investments in large-scale observation studies to allow for intervention creation by health care professionals, public health stakeholders, and policy makers.” A scoping review published by Bonner and colleagues in JAMA Oncology assesses the impact of structural racism on the risk of lung cancer.

COMMUNICATION, HEALTH EQUITY & POLICY

“Using subjective and cardiac measures (heart rate variability), we investigated the relationships between virtual versus face-to-face meetings and different types of fatigue (active and passive) among 44 knowledge workers during real-life meetings (N = 382). Our multilevel path analysis revealed a link between virtual meetings and higher levels of passive fatigue, which then impacted cognitive performance.” You’re not so much tired as bored: A research article by Nurmi and Pakarinen published in the Journal of Occupational Health Psychology explores the precise nature of virtual meeting burnout.
“Despite these risks, educators should not avoid using LLMs. Rather, they need to teach students the chatbots’ strengths and weaknesses and support institutions’ efforts to improve the models for education-specific purposes. This could mean building task-specific versions of LLMs that harness their strengths in dialogue and summarization and minimize the risks of a chatbot providing students with inaccurate information or enabling them to cheat.” A Nature editorial recommends that teachers take the plunge on engaging with chatbot-style LLMs in the instructional environment.
“What’s needed is a more transparent and inclusive approach….Health-care institutions, academic researchers, clinicians, patients and even technology companies worldwide must collaborate to build open-source LLMs for health care — models in which the underlying code and base models are easily accessible…What we’re proposing is similar to the Trillion Parameter Consortium (TPC) announced earlier this month — a global consortium of scientists from federal laboratories, research institutes, academia and industry to advance AI models for scientific discovery…” Also in Nature comes this commentary from Toma and colleagues, who suggest that generative AI’s potential for benefit in healthcare could be blunted if big tech dominates the field.