Bridging Disciplines to Leverage Electronic Health Record Data for Clinical Research

Introducing the Clinical Research with Electronic Health Records Course

By Ben Goldstein and Jillian Hurst

The Emergence of Electronic Health Records as a Research Tool

Data gathered from electronic health records (EHRs) are increasingly accessible for research, opening new possibilities for understanding medical conditions and patient outcomes in real-world settings. However, the ability to make meaningful use of these rich sources of data requires a diverse skill set, including expertise in clinical care, informatics, statistics, and clinical study design. Because few investigators have all of these skills, studies based on EHR data frequently require a team-based approach. Clinicians typically provide the basic motivation for such studies, including information about how individuals with a disease or condition of interest interact with providers and the healthcare system. Meanwhile, investigators with training in quantitative methods and expertise in data science and statistics inform the study design and curate and analyze the large datasets typically derived from EHRs.

A Course to Grow EHR-based Research at Duke

In order to grow the number of investigators at Duke capable of using EHR data appropriately for research purposes, we developed the Clinical Research with Electronic Health Records (CR-EHR) course. We designed the CR-EHR course to bring together investigators with different backgrounds to learn how to collaborate on the design and execution of EHR-based studies. This includes:

  • Learning how to construct EHR-based study cohorts
  • Becoming familiar with the strengths and limitations of EHR data
  • Understanding how common analytic strategies may change when working with EHR data
  • Understanding how the strengths and limitations of EHR data influence the interpretation of results from EHR-based analyses

A primary motivation for this course came from our own experience of learning to work together on EHR-based projects. Ben trained as a biostatistician, and while his research focuses on the meaningful use of EHR data, he does not always have the clinical background needed to identify and understand all of the variables from the EHR that are relevant for a given study. Jillian trained as a molecular pharmacologist and has worked in multiple areas within biomedical research for the past decade but has minimal statistical expertise.

Image of a Black physician wearing a white coat and a stethoscope around his neck typing on a laptop
Image credit: Ivan Samkov via Pexels

In order to collaborate with each other and include clinicians and informaticists on our project teams, we had to develop a common language that spans different disciplines. To this end, we designed a course that would create teams of clinically and quantitatively trained investigators who could combine their talents and knowledge effectively in order to develop an EHR-based clinical study, with the ultimate goal of supporting the translation of scientific discoveries into health benefits for communities through collaborative research.

Developing Team Science Skills through a Project-Based Course

In order to simulate a collaborative research team, we sought applications from two groups: researchers with quantitative backgrounds (biostatisticians, informaticists, and methodologists) and clinically trained researchers (physicians, nurses, and other clinicians). We received over 150 applications from Duke faculty, students, and staff and were able to accommodate 20 investigators in total. Course participants represented 9 departments across the Schools of Medicine and Nursing, as well as the Duke Clinical Research Institute and Duke Global Health Institute. Participants spanned a wide range of career stages, including medical students, graduate students, postdocs, faculty, and staff.

Establishing a Knowledge Base for Research with EHR Data

The course aimed to establish a common knowledge base through a “flipped-classroom” design, in which participants are provided with materials to help orient them to new skills and concepts, and class time is devoted to experiential challenges that encourage students to apply the concepts from the materials and lectures they studied asynchronously. Prerecorded lectures, resources, and suggested readings were posted to the class Sakai site. Class time (conducted via Zoom) was used for brief discussion of posted material, followed by time in breakout rooms for each team to work on projects (described below) that would leverage new skills. Lectures were provided by Duke investigators with expertise in the structure of EHR systems, data security, ontology systems, data models, and the development of computable phenotypes. Instructors also discussed common analytic approaches and study designs, including association studies, causal inference, longitudinal modeling, and predictive model development. Finally, the course included a series of lectures focused on incorporating additional types of data into EHR-based studies, including geospatial data, social determinants of health, -omics, and mobile health (mHealth) data. Each topic was briefly discussed at the beginning of class, but the majority of class time was devoted to developing an EHR-based study.

Learning by Doing: A Project-Based Class

Participants were divided into 7 teams of paired clinical and quantitative investigators, and 6 “solo” investigators who had both clinical and quantitative training. Each team identified a research question and produced a statistical analysis plan, an EHR-derived clinical cohort, and the results of the planned analyses. Patient privacy and appropriate use of data were carefully managed. All teams were added to a central institutional review board (IRB) protocol that approved the appropriate use of the EHR data for educational purposes. Teams were then assisted in setting up a workspace within Duke’s Protected Analytics Computing Environment (PACE) to ensure data security. Finally, all project teams were given access to Duke’s Clinical Research Datamart (CRDM), a research-ready extract of the Duke University Health System EHR that follows a curated, well-defined common data model.

After a brief class discussion of online materials, each team proceeded to a breakout room to work on its project. During this time, learners were able to request help from course instructors. These included Wen Zhao, a staff biostatistician with the Biostatistics, Epidemiology, and Research Design (BERD) Core, and Mike Chrestensen, an information technology analyst and informatician from Duke Health Technology Solutions who builds and maintains the Duke Clinical Research Datamart. Additionally, Mike set up “CRDM Office Hours” to provide individualized help for accessing EHR data.

Course Outcomes and Lessons Learned

The final class session consisted of brief presentations of each project, with class participants providing an overview of their research question, the results of their initial analyses, and the challenges and successes encountered during the work. We were thrilled with the variety of projects proposed. Topics included use of eReferrals, disparities in lung cancer screening, development of cardiovascular disease after preeclampsia, risk factors for surgical complications, and prescribing practices. Importantly, this variety helped uncover different benefits and limitations of using EHR data for clinical research, including the availability of different types of data elements such as laboratory values, social determinants, and patient screening tools. It also highlighted the challenges of developing new computable phenotypes to identify specific clinical events or different patient populations.

These projects also helped to identify various technical challenges, including accessing data through PACE, the use of different coding languages and software, and different data ontologies, such as those used for medications. We also gathered feedback from class participants via an end-of-course survey on course structure and content; we’ll use this feedback to design future iterations of the course. We hope that these courses will expand the number of investigators using EHR data for research purposes and help them to develop new strategies to improve patient care and identify patients who may be at risk for different diseases or adverse events.

Coming Attractions

The first edition of the CR-EHR course was a learning experience not only for the participants, but also for the instructors! The feedback we received indicated that all of the topics covered by the class were useful, but some participants likened the course to trying to drink from a firehose. Additionally, the course was time-intensive: putting together an EHR-based cohort is an iterative process, particularly if you are learning the ins and outs of the data at the same time.

Based on this feedback, we are working with Duke AI Health to develop a series of short courses that will cover these topics in a focused format, including EHR-based study design, an introduction to using different EHR data resources at Duke to construct cohorts, and different methods of analyzing EHR data. We’re looking forward to developing these short courses and learning more about the diverse research questions that can leverage the EHR!


This course was supported by the Duke Clinical and Translational Science Award, NIH Award UL1TR002553. We would also like to express our sincere thanks to the Department of Biostatistics and Bioinformatics, Duke AI Health, and the students who participated in the first iteration of the course for their enthusiastic support.

Ben Goldstein, PhD, MPH
Ben is an Associate Professor of Biostatistics & Bioinformatics at Duke University, a faculty member at the Duke Clinical Research Institute, and the Data Science Lead for the Duke Children’s Health and Discovery Initiative. His research focuses on the meaningful use of electronic health records data, with an interest in both deriving inference from EHR data and developing risk prediction models and clinical decision support tools with EHRs, including understanding the potential and limitations of EHR data for use in clinical research applications.

Jillian Hurst, PhD
Jillian is an Assistant Professor of Pediatrics in the Division of Infectious Diseases and the Director of the Duke Children’s Health and Discovery Initiative, which fosters multidisciplinary research of early life factors that contribute to long-term health and disease. Her research focuses on the integration of clinical data and data derived from biological specimens to identify risk factors for common childhood diseases.