Joint Duke AI Health – American Heart Association Stroke Risk Prediction Project Offers New Tools for Prevention
In June 2025, scientific collaborators from Duke AI Health and the American Heart Association concluded a five-year study, funded by the National Institutes of Health, aimed at improving the estimation of patients’ risk of stroke. Stroke is a leading cause of death in the United States and affects almost 800,000 people per year. Approximately three-quarters of strokes are first events, and most strokes are preventable, underscoring the importance of primary prevention. An international study of 6,000 participants found that approximately 90% of the risk of stroke can be explained by 10 potentially modifiable risk factors, including smoking, diet, physical activity, psychosocial stress and high blood pressure.
In addition to identifying these risk factors, designing optimal preventive strategies for patients requires accurately estimating an individual person’s risk of stroke. Prior to the study, stroke risk prediction models integrated demographic, clinical, socio-economic, and genetic information to produce individualized probabilities of stroke, and they were used to stratify populations under study and propose preventive strategies depending on the level of risk.
However, early iterations of stroke risk prediction models faced a number of limitations to their accuracy and utility. First, the models were developed using data from patient cohorts that were not diverse in terms of age, sex, race/ethnicity and other key variables. Second, the models were often developed using only a limited number of risk factor variables or predictors. Lastly, the models did not differentiate between the two primary categories of stroke etiology: ischemic, when blood flow to the brain is blocked, and hemorrhagic, when there is bleeding in the brain from a ruptured vessel.
Led by principal investigator and former Duke AI Health Director Michael Pencina, PhD, scientists on the research team recognized that new technologies, datasets, and machine learning methods could all be used to address some of these limitations. They proposed a study with three key aims:
- First, obtain a larger and more representative patient cohort for testing existing stroke risk prediction models and developing new ones. The team aggregated and harmonized patient-level data collected from four NIH-sponsored cohort studies: the Framingham Offspring Study, Atherosclerosis Risk in Communities (ARIC), Multi-Ethnic Study of Atherosclerosis (MESA) and Reasons for Geographic and Racial Differences in Stroke (REGARDS).
- Second, incorporate additional predictors into the risk modeling and apply new machine learning-based risk models that might perform better than the simpler existing models.
- Third, apply new clustering techniques to differentiate between patients at risk of ischemic versus hemorrhagic stroke.
Over the course of the study’s five years, the research team has shared their findings through 18 publications in high-impact academic journals and conference proceedings, including JAMA, the Journal of Biomedical Informatics, and Artificial Intelligence in Medicine. Some of these publications describe their approach to integrating the four datasets to improve their statistical power and representation, including by testing natural language processing methods to automate and scale variable harmonization.
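To give a sense of what automated variable harmonization involves, the sketch below suggests a mapping from a cohort-specific variable label to a shared target variable using simple token overlap (Jaccard similarity). It is a deliberately simple lexical stand-in and not the team's method: the target variable names, descriptions, example labels, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical harmonized target variables and descriptions (illustrative only;
# these are not the variable names used in the project).
TARGET_VARIABLES = {
    "sbp": "systolic blood pressure",
    "current_smoker": "current smoking status",
    "total_chol": "total cholesterol",
}


def tokens(label):
    """Lowercase a label and split it into a set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def jaccard(a, b):
    """Token-set overlap |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_mapping(cohort_label, threshold=0.5):
    """Return (target_name, score) for the best match.

    target_name is None when the best score falls below the threshold,
    which flags the label for manual review instead of forcing a match.
    """
    label_tokens = tokens(cohort_label)
    scored = [
        (jaccard(label_tokens, tokens(description)), name)
        for name, description in TARGET_VARIABLES.items()
    ]
    score, name = max(scored)
    return (name if score >= threshold else None), score


if __name__ == "__main__":
    for label in ["Systolic blood pressure (mmHg)", "SMOKING_STATUS_CURRENT", "Favorite color"]:
        print(label, "->", suggest_mapping(label))
```

In practice, a suggestion of this kind would most likely serve as a first pass, with a domain expert confirming each proposed mapping before cohorts are pooled.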
Other papers explain how the team used the harmonized dataset to evaluate the performance and fairness of existing stroke risk prediction models, to develop new models with the potential to improve clinical decision-making, and to illuminate ways to mitigate algorithmic bias and improve the fairness of models in clinical use.
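One common way to examine both performance and fairness is to compute a discrimination metric separately for each demographic subgroup and compare the results. The sketch below does this with the area under the ROC curve (AUC) on a binary outcome. It is a simplified illustration under stated assumptions: the subgroup labels, outcomes, and predicted risks are synthetic, and analyses of real cohort data would typically use time-to-event measures and additional fairness criteria such as calibration within subgroups.

```python
from collections import defaultdict


def auc(labels, scores):
    """Probability that a random case scores above a random non-case.

    Ties count as 0.5. Returns None if either outcome class is absent,
    because the AUC is undefined in that situation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Compute the AUC per subgroup from (group, outcome, predicted_risk) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, outcome, risk in records:
        grouped[group][0].append(outcome)
        grouped[group][1].append(risk)
    return {g: auc(ys, rs) for g, (ys, rs) in grouped.items()}


# Synthetic records: (hypothetical subgroup, stroke outcome, predicted risk).
RECORDS = [
    ("A", 1, 0.9), ("A", 0, 0.2), ("A", 1, 0.7), ("A", 0, 0.4),
    ("B", 1, 0.6), ("B", 0, 0.7), ("B", 1, 0.8), ("B", 0, 0.3),
]

if __name__ == "__main__":
    print(auc_by_group(RECORDS))  # {'A': 1.0, 'B': 0.75}
```

A gap like the one between the two synthetic subgroups here is the kind of signal that would prompt a closer look at how a model performs across populations.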
All of the code developed to conduct the analyses underlying the project’s publications has been made publicly available through the study’s dedicated GitHub repository. Through these efforts, the team has advanced the science of stroke risk prediction on several fronts and invites other researchers to continue to build on their work.
