In this week’s Duke AI Health Friday Roundup: Department of Commerce announces debut of US AI Safety Institute Consortium; AI literature may be facing its own replication crisis; where to next for public health?; FDA eyes bias in pulse oximetry; California legislators propose new AI regulations; AI benchmarks easily perturbed; PLOS looks back on four years of open peer review; Google makes its Gemini AI available for some products and customers; and much more:
AI, STATISTICS & DATA SCIENCE
- “The consortium includes more than 200 member companies and organizations that are on the frontlines of creating and using the most advanced AI systems and hardware, the nation’s largest companies and most innovative startups, civil society and academic teams that are building the foundational understanding of how AI can and will transform our society, and representatives of professions with deep engagement in AI’s use today. The consortium represents the largest collection of test and evaluation teams established to date and will focus on establishing the foundations for a new measurement science in AI safety.” A web posting at the National Institute of Standards and Technology (NIST) details this week’s announcement by Department of Commerce Secretary Gina Raimondo of the debut of the US AI Safety Institute Consortium (AISIC).
- “Machine learning (ML) and other types of AI are powerful statistical tools that have advanced almost every area of science by picking out patterns in data that are often invisible to human researchers. At the same time, some researchers worry that ill-informed use of AI software is driving a deluge of papers with claims that cannot be replicated, or that are wrong or useless in practical terms….There has been no systematic estimate of the extent of the problem, but researchers say that, anecdotally, error-strewn AI papers are everywhere.” Nature’s Philip Ball looks at growing academic unease about the reliability of results being reported from AI projects spanning a myriad of scientific and health-related fields.
- “Often, the published leaderboard rankings are taken at face value – we show this is a (potentially costly) mistake. Under existing leaderboards, the relative performance of LLMs is highly sensitive to (often minute) details. We show that for popular multiple choice question benchmarks (e.g. MMLU) minor perturbations to the benchmark, such as changing the order of choices or the method of answer selection, result in changes in rankings up to 8 positions. We explain this phenomenon by conducting systematic experiments over three broad categories of benchmark perturbations and identifying the sources of this behavior.” A preprint article by Alzahrani and colleagues, available from arXiv, suggests that many of the benchmarks used to evaluate large language model performance are sensitive to minor perturbations, so leaderboard rankings based on them may not be reliable.
- “In the biggest mass-market AI launch yet, Google is rolling out Gemini, its family of large language models, across almost all its products, from Android to the iOS Google app to Gmail to Docs and more. You can now get your hands on Gemini Ultra, the most powerful version of the model, for the first time….But some will have to wait longer than others to play with Google’s new tools. The company has announced rollouts in the US and East Asia but said nothing about when the Android and iOS apps will come to the UK, the EU, and Switzerland.” MIT Technology Review’s Will Douglas Heaven reports on Google’s release of its Gemini AI, which is now being bundled into a variety of popular Google applications – at least for some customers.
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- “Neurotechnology researchers are cautiously excited about Neuralink’s human trial….But there is frustration about a lack of detailed information. There has been no confirmation that the trial has begun, beyond Musk’s tweet. The main source of public information on the trial is a study brochure inviting people to participate in it. But that lacks details such as where implantations are being done and the exact outcomes that the trial will assess, says Tim Denison, a neuroengineer at the University of Oxford, UK. The trial is not registered at ClinicalTrials.gov, an online repository curated by the US National Institutes of Health. Many universities require that researchers register a trial and its protocol in a public repository of this type before study participants are enrolled.” An article by Nature’s Liam Drew rounds up experts’ reactions to news of Neuralink’s bare-bones announcement of a first-in-human trial of its implanted brain interface chip.
- “The gap between knowledge and action, and the failure to envision public health efforts as transformative, has weakened the discipline and left public health practitioners understanding their role as largely technocratic. And all the while, the public’s health declines, and with it what little trust the public has in us….How do we extract our field from this quagmire? How do we translate these critiques into concrete steps that will benefit the public’s health?” A perspective article by Yudell and Amon at Health Affairs Forefront takes a critical look at the field of public health and proposes ways to restore its focus after a series of pandemic-era setbacks.
- “The proposal, which the agency has not yet formally announced, calls on manufacturers to increase both the devices’ accuracy and the number of people on which the devices are tested. The agency also wants companies to test the devices on people whose skin colours span the entire range of a predetermined scale. FDA scientists presented the proposal at a meeting of an independent advisory committee on 2 February.” An article by Nature’s Max Kozlov reports on recent indications that regulators may be about to require action by manufacturers to more thoroughly test pulse oximeters for acceptable performance in persons with darker skin.
COMMUNICATION, HEALTH EQUITY & POLICY
- “PLOS authors have chosen Published Peer Review History at a fairly consistent rate of about 40% (rising gently from 38% in 2019 to 42% in 2023). Despite the benefits inherent in a more open and transparent scholarly communication system, the stable rate of uptake suggests that concerns also persist. While we haven’t encountered the realization of these concerns in our experience so far, the evidence base is still poor.” PLOS’ Lindsay Morton reflects on data from four years of the open-access publisher’s opt-in Published Peer Review History program, which lets authors publish the review history of their articles and allows reviewers to unblind themselves to authors and sign their reviews.
- “Elevated prices and substantial publisher power significantly diminish the volume of article citations and collaborative research. While these barriers operate somewhat differently across academic fields, the hindrance effects consistently prove more pronounced for lower-ranked institutions and developing countries.” An analysis by An and colleagues, available as a preprint from SSRN, finds that high journal prices and publisher market power reduce subsequent citation of articles and collaborative research, with the strongest effects at lower-ranked institutions and in developing countries.
- “The new bill, sponsored by state Sen. Scott Wiener, a Democrat who represents San Francisco, would require companies training new AI models to test their tools for “unsafe” behavior, institute hacking protections and develop the tech in such a way that it can be shut down completely, according to a copy of the bill….AI companies would have to disclose testing protocols and what guardrails they put in place to the California Department of Technology. If the tech causes “critical harm,” the state’s attorney general can sue the company.” The Washington Post’s Gerrit De Vynck and Cat Zakrzewski report that California legislators appear to be making serious moves toward forcing AI companies to institute testing regimes aimed at ensuring the safety of algorithmic products.