Project profile: Predictive Accuracy of Stroke Prediction Models
Status: Closed
Stroke is the fifth leading cause of death in the U.S., afflicting nearly 800,000 people per year. About three quarters of strokes are first events, underscoring the importance of primary prevention. Designing optimal preventive strategies requires identifying risk factors and estimating the risk of stroke. However, the most recent American Heart Association (AHA)/American Stroke Association Guidelines for the Primary Prevention of Stroke conclude that “an ideal stroke risk assessment tool that is simple, is widely applicable and accepted, and takes into account the effects of multiple risk factors does not exist.” Although multiple stroke risk prediction models have been developed, their generalizability is limited for several reasons: they rely on data from a single study, focus on any stroke, do not differentiate between etiologies, and incorporate only a small number of predictors.
In this NIH-supported project, researchers from Duke AI Health and the American Heart Association aim to address these limitations and enhance stroke risk prediction modeling by aggregating and harmonizing existing patient-level data collected as part of four longitudinal cohort studies. This broad range of data from different patient cohorts strengthens the evidence for the scientific questions being investigated. As part of the study, the team has introduced an open metadata repository designed to give researchers access to the processes and methods used to harmonize the datasets. In addition, the research team is developing and validating a new machine learning-based risk prediction model for primary stroke and comparing its performance with existing models. Model development involves adding new risk factors, such as genetic information, and clustering individuals to predict stroke etiology (ischemic versus hemorrhagic).
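As a rough illustration of what harmonizing patient-level data across cohorts can involve, the sketch below maps cohort-specific variable names and units onto one shared schema before pooling. It is a minimal example under stated assumptions, not the project's actual pipeline: the cohort labels, column names, and the helpers `harmonize_record` and `pool` are all hypothetical, and the real mappings are documented in the project's metadata repository.

```python
# Hypothetical sketch of variable harmonization across cohorts.
# Cohort labels and column names are invented for illustration only.

# Approximate conversion factor for total cholesterol (mmol/L -> mg/dL).
MG_DL_PER_MMOL_L = 38.67

# For each cohort: source column -> (harmonized name, conversion function).
COHORT_MAPPINGS = {
    "cohort_a": {
        "AGE": ("age_years", float),
        "SBP": ("systolic_bp_mmhg", float),
        "TOTCHOL_MGDL": ("total_chol_mgdl", float),
    },
    "cohort_b": {
        "age_at_exam": ("age_years", float),
        "sys_bp": ("systolic_bp_mmhg", float),
        "chol_mmol": ("total_chol_mgdl", lambda v: float(v) * MG_DL_PER_MMOL_L),
    },
}


def harmonize_record(cohort, record):
    """Rename and convert one participant record to the shared schema."""
    harmonized = {"cohort": cohort}
    for source_name, (target_name, convert) in COHORT_MAPPINGS[cohort].items():
        raw = record.get(source_name)
        # Keep missing values explicit rather than guessing a default.
        harmonized[target_name] = None if raw is None else convert(raw)
    return harmonized


def pool(records_by_cohort):
    """Harmonize every record and stack all cohorts into one list."""
    return [
        harmonize_record(cohort, record)
        for cohort, records in records_by_cohort.items()
        for record in records
    ]


if __name__ == "__main__":
    pooled = pool({
        "cohort_a": [{"AGE": 61, "SBP": 138, "TOTCHOL_MGDL": 210}],
        "cohort_b": [{"age_at_exam": "58", "sys_bp": "126", "chol_mmol": 5.0}],
    })
    for row in pooled:
        print(row)
```

Keeping the mapping as data rather than code means the same table can be published alongside the pooled dataset, which is the kind of transparency a metadata repository is meant to provide.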
This research is supported by NIH/NINDS R61-NS120246.
TEAM
Principal Investigator: Michael Pencina
Co-investigators: Ricardo Henao, Chuan Hong, Matt Engelhard, Jennifer Hall (AHA), Juan (Wendy) Zhao (AHA), Ying Xian (UTSW), Suzanne Judd (UAB), Sara Hassani (Northwestern)
Project Team: Daniel Wojdyla, Tony Schibler, Nathan Bihlmeyer, Andrew Olson, Holly Picotte (AHA), Pratheek Mallya (AHA)
Open metadata repository via the American Heart Association’s Precision Medicine Platform: https://pmp.heart.org/duke-ninds
Project code available on GitHub: https://github.com/duke-harmonization
Selected publications:
Engelhard M, Wojdyla D, Wang H, Pencina M, Henao R. Exploring trade-offs in equitable stroke risk prediction with parity-constrained and race-free models. Artif Intell Med. 2025 Jun;164:103130. doi: 10.1016/j.artmed.2025.103130. Epub 2025 Apr 10. PMID: 40253926; PMCID: PMC12133243.
Hong C, Liu M, Wojdyla DM, Hickey J, Pencina M, Henao R. Trans-Balance: Reducing demographic disparity for prediction models in the presence of class imbalance. J Biomed Inform. 2024 Jan;149:104532. doi: 10.1016/j.jbi.2023.104532. Epub 2023 Dec 7. PMID: 38070817; PMCID: PMC10850917.
Mallya P, Stevens LM, Zhao J, Hong C, Henao R, Economou-Zavlanos N, Wojdyla DM, Schibler T, Manchanda V, Pencina MJ, Hall JL. Facilitating harmonization of variables in Framingham, MESA, ARIC, and REGARDS studies through a metadata repository. Circulation: Cardiovascular Quality and Outcomes. 2023 Nov;16(11):e009938.
Hong C, Pencina MJ, Wojdyla DM, et al. Predictive Accuracy of Stroke Risk Prediction Models Across Black and White Race, Sex, and Age Groups. JAMA. 2023;329(4):306–317. doi:10.1001/jama.2022.24683
AI Health Virtual Seminars:
Part 1: Facilitating Harmonization of Variables from the Framingham, MESA, ARIC, and REGARDS Studies Through a Metadata Repository
https://duke.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=b3b216b1-19c7-4ea5-afce-b22100f54e51
Part 2: Research Applications with Harmonized Variables from the Framingham, MESA, ARIC, and REGARDS Studies
https://duke.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=3859a4a5-c1c4-4ac1-b75b-b17800e11c58
