In today’s Duke AI Health Friday Roundup: bias in foundation models translates to the real world via robots; calculating the “missing Americans” of higher US mortality rates; what to do when physicians spread medical misinformation; BLOOM debuts as open-source, open-access large language model; saving lives by brushing teeth; bringing back the single-panel figure; advances in wastewater analysis allow scientists to track individual COVID variants; much more:
AI, STATISTICS & DATA SCIENCE
- “BLOOM was created over the last year by over 1,000 volunteer researchers in a project called BigScience, which was coordinated by AI startup Hugging Face using funding from the French government. It officially launched on July 12. The researchers hope developing an open-access LLM that performs as well as other leading models will lead to long-lasting changes in the culture of AI development and help democratize access to cutting-edge AI technology for researchers around the world.” In an article for MIT Technology Review, Melissa Heikkilä introduces readers to BLOOM, a large language model that, unlike many other LLMs, has been built as an open-science, open-access project designed to foster transparency in AI development.
- “These early results suggest that a human judgment forecasting platform can quickly generate probabilistic predictions for targets of public health importance. Predictions from a human judgment forecasting platform might be especially important when data are sparse, limiting the accuracy of statistical models, or when historical data has been collected in different locations than the present outbreak, limiting the accuracy of mechanistic models.” An article by McAndrew and colleagues appearing in Lancet Digital Health describes a crowdsourced method for estimating the prevalence and spread of the monkeypox virus.
- “In this paper, to the best of our knowledge, we conduct the first-ever experiments showing existing robotics techniques that load pretrained machine learning models cause performance bias in how they interact with the world according to gender and racial stereotypes…, in addition to enacting the scientifically discredited pseudoscience of physiognomy, all at scale.” A paper by Hundt and colleagues presented at the June ACM Conference on Fairness, Accountability, and Transparency explores how robots trained on large foundation models translate the bias encoded in the underlying training data into biased action.
- “Gateways and portals are frequent and potent images in religion, mythology, and literature. They separate, but allow passage between, heaven and hell, earthly and divine, real and imagined. Other conduits are more figurative gateways and portals: the looking glass that transports Alice to Wonderland, the twister that hurtles Dorothy to Oz. In all cases, moving from one realm to the next leaves the traveler indelibly changed.” In an essay for the New England Journal of Medicine, physician Suzanne Koven’s encounter with a patient portal – as a patient – turns out to be more momentous than expected.
BASIC SCIENCE, CLINICAL RESEARCH & PUBLIC HEALTH
- “How many U.S. deaths would have been averted each year if the U.S. had ASMRs equal to the average of other wealthy nations? As a descriptive counterfactual, we computed the number of deaths that would have occurred in the U.S. if the U.S. population had experienced the age-specific mortality profile of peer nations. We then subtracted this number from the actual number of U.S. deaths at each age to compute the number of “Missing Americans”.” A sobering preprint by Bor and colleagues, available at medRxiv, tallies the “missing Americans” – the estimated number of excess lives lost due to worse mortality rates in the US compared with peer nations.
- “The researchers detected the Alpha and Delta variants of the coronavirus in waste water up to two weeks before the strains were picked up by swabbing and testing people in clinics. They also detected Omicron roughly ten days before the first person tested positive for it in San Diego, and traced the surge of the BA.1 variant of Omicron in the population.” Public health surveillance of wastewater to gain insight into COVID prevalence and spread in a community has been taking place for most of the pandemic, but new capabilities are now allowing scientists to discriminate specific variants of the virus. Nature’s Smriti Mallapaty has the story.
- “The virus is likely now locked with the human immune system in a perpetual evolutionary arms race. A variant emerges to circumvent our existing immunity, then vaccines and infections gradually rebuild our defenses … until another variant emerges. This is exactly what happens with flu, but the coronavirus seems to be changing even more quickly.” At The Atlantic, Ed Yong unpacks the implications of the ongoing “BA.5” wave of COVID infections.
- “Hospital patients not getting their teeth brushed, or not brushing their teeth themselves, is believed to be a leading cause of hundreds of thousands of cases of pneumonia a year in patients who have not been put on a ventilator. Pneumonia is among the most common infections that occur in health care facilities, and a majority of cases are non-ventilator hospital-acquired pneumonia, or NVHAP, which kills up to 30% of those infected, Giuliano and other experts said.” A story by Brett Kelman at Kaiser Health News illuminates how the humble toothbrush could have an outsize impact on a serious – and often deadly – health condition.
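For readers curious about the arithmetic behind the “missing Americans” figure quoted above, here is a minimal sketch of the counterfactual as the preprint excerpt describes it: apply peer-nation age-specific mortality rates to the US population in each age group, then subtract the result from actual US deaths. The `AgeGroup` type, the `missing_americans` function, and every number below are hypothetical and purely illustrative; they are not taken from Bor and colleagues' data or code.

```python
from dataclasses import dataclass


@dataclass
class AgeGroup:
    label: str
    us_population: float  # people in this age group (toy value)
    us_deaths: float      # actual US deaths in this age group (toy value)
    peer_rate: float      # peer-nation deaths per person per year (toy value)


def missing_americans(groups):
    """Return excess US deaths per age group under the peer-rate counterfactual."""
    excess = {}
    for g in groups:
        # Deaths the US would have seen with the peer nations' mortality rate.
        counterfactual_deaths = g.us_population * g.peer_rate
        excess[g.label] = g.us_deaths - counterfactual_deaths
    return excess


# Illustrative numbers only, NOT real mortality data.
toy_groups = [
    AgeGroup("0-44", 1_000_000, 1_500, 0.001),
    AgeGroup("45-64", 500_000, 4_000, 0.006),
    AgeGroup("65+", 200_000, 9_000, 0.040),
]

by_age = missing_americans(toy_groups)
total_missing = sum(by_age.values())
```

Summing the per-age differences gives the overall tally; a negative value for an age group would mean the US had fewer deaths at that age than the peer-rate counterfactual.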
COMMUNICATION, POLICY & HEALTH EQUITY
- “Establishing a paper-reading habit and workflow has absolutely made me a better scientist. I read far more in less time than I used to, and I regularly apply what I read to improve my own experimental designs. I used this system extensively when putting together my PhD fellowship applications, as well as my candidacy exam. When I sit down to write my dissertation, I know my future self will thank me for having the foresight to take these steps today.” In a “Career Column” article for Nature, PhD candidate Maya Gosztyla shares some strategies for efficiently tackling the professional to-read pile.
- “With nearly 1 million Americans dead from Covid, and deaths — some of them clearly preventable — continuing at a rate of more than 200,000 per year, it has become imperative for our profession to empower our institutions to signal clearly who is — and who is not — providing evidence-based information. We physicians need to use the institutions we’ve created for professional self-regulation to maintain public trust by establishing some recognizable boundaries.” A perspective article by Baron and Ejnes published in the New England Journal of Medicine ponders courses of action when physicians spread misinformation on social media.
- “Figures are not to be considered as a place to dump raw data. Any results that are not being discussed in the main text with greater detail should be moved to the Supporting Information. Adding unnecessary data panels in a single figure dilutes the focus of key findings. Including multiple panels in a single figure also makes it difficult to go through individual panels’ data set… So, the point to be made here is the need for a clear focus on the data that the authors like to present in each figure.” Back to basics: in an editorial for ACS Energy Letters, Prashant V. Kamat urges a return to the single-panel journal article figure (H/T @RetractionWatch).
- “In all fields of science, increasing access to data and code used for pre-printed or published research is a step in the direction of more transparent, reproducible, and reliable research. The COVID-19 pandemic has created a novel, constantly changing scientific culture that should be navigated with care to uphold standards of scientific practice for both the research community and the safety of the public. Our analysis shows that there is room for improvement in the areas of open data and code availability within COVID-19 pre-print papers on arXiv, bioRxiv, and medRxiv.” In an article for Scientometrics, Annie Collins and Rohan Alexander examine COVID-19 preprints for the availability of materials – such as open-access data and code – that would allow others to replicate findings.