Duke AI Health Hosts December EHR Study Design Workshop

Duke AI Health is pleased to announce the Duke Electronic Health Records Study Design Workshop (EHR-SDW) 2023. The workshop will be offered in December as a virtual five-day class that provides foundational lectures and hands-on studios on the fundamentals of working with EHR data and designing EHR-based studies.

The EHR-SDW is targeted toward individuals interested in learning how to work with electronic health records (EHR) data and how to conduct studies using them. EHR data are a widely available form of real-world data that have become a standard source for many kinds of studies, including clinical trials, comparative effectiveness research, risk prediction, and population health research. The EHR-SDW will introduce the components of EHR data and key considerations for designing effective studies. In addition to didactic lectures, participants will gain hands-on experience with publicly available tools that facilitate EHR studies (e.g., RxNorm, CCS codes, geocoding) and will receive feedback on the study designs they develop.

The course will be conducted virtually via Zoom.

This workshop is offered through Duke AI Health’s Health Data Science (HDS) program and builds on the success of our Machine Learning Schools, with 11 events held since 2017. The Duke Machine Learning Schools have reached hundreds of participants from academia and industry, including international audiences at the SingHealth/Duke-NUS Medical School and the Duke Kunshan University campus. Our 2022 Duke Machine Learning Summer School attracted 140 participants from around the world, representing 41 universities, institutes, and corporations.

To register for the EHR-SDW, please visit https://events.duke.edu/ehr-sdw-2023

To request consideration for a scholarship, please visit https://duke.qualtrics.com/jfe/form/SV_a3Fs9TyyK82Bkr4

The deadline for registration is Tuesday, November 28, 2023.

Who Should Attend

The EHR-SDW is intended for participants working with (or interested in working with) EHR data in academic, industrial, or government settings for research or professional purposes. Previous experience with or direct access to EHR data is not required.

As a form of “real-world data,” EHR data have been used extensively by academic researchers, biomedical companies, and government bodies to gain insight into how health care is delivered in real-world settings. The breadth of EHR data makes them suitable for designing and conducting clinical trials, performing comparative effectiveness studies, developing clinical prediction tools, and performing public health surveillance.

While powerful, EHR data also come with a number of embedded challenges that require thoughtful study design. This course will introduce the components of EHR data, discuss their inherent challenges, and review the different study designs that can be conducted with EHR data.

The EHR-SDW is designed to provide value to participants from a variety of backgrounds. Those new to EHR data will develop an appreciation for how EHR data can enhance various types of studies. Participants with experience using EHR data will develop a deeper appreciation for the subtleties of different study designs.

Program Format

The 5-day workshop will combine foundational lectures with hands-on studios, allowing participants to engage directly with experienced practitioners. After the introductory first day, each day will focus on a specific study design that uses EHR data:

  • Population Health Studies
  • Randomized Studies
  • Prediction Studies
  • Observational Studies

Each day will be divided into a didactic morning session and a group-based afternoon session. The morning session will consist of lectures by experts in conducting the type of study under discussion. Topics will cover the rationale for that type of study, how and why to leverage EHR data in the study, and caveats to consider when using EHR data for such studies.

The afternoon session will allow participants to gain experience designing the type of study discussed that day. An experienced practitioner will walk through a design plan for the study under discussion. Participants will then be broken up into small groups (via Zoom Breakout Rooms) and will work through vignettes to design their own studies. The groups will then reconvene to present their designs and receive feedback.

By the end of the EHR-SDW, each participant will have a deeper understanding of the various types of studies one can conduct with EHR data, as well as the challenges of designing appropriate studies and strategies for addressing them.


Morning sessions will be led by clinical and methodological faculty from Duke University with hands-on experience designing and conducting EHR studies in both academic and industry settings. Afternoon workshops will be led by practicing biostatisticians experienced in conducting the discussed studies.

The broad areas of emphasis for the five-day class will include:

Introduction to Electronic Health Record (EHR) Data

Monday, December 4, 2023 (9:00 AM – 4:00 PM Eastern time) 

  • Introduction to EHR systems and EHR data
  • EHR data that are used for clinical research
  • Differences between structured and unstructured data
  • Privacy concerns with EHR data
  • Potential challenges and biases associated with EHR data
  • Afternoon Studio: Hands-on work with ontology systems (RxCUI, CCS) to support EHR-based studies

Population Health Studies

Tuesday, December 5, 2023 (9:00 AM – 4:00 PM Eastern time) 

  • EHR systems and EHR data to promote population health
  • Defining target populations and catchments
  • Identifying and mitigating biases in epidemiologic and population health studies
  • Linking social and environmental data with EHR data
  • Afternoon Studio: Population health design

Pragmatic and Randomized Clinical Trials

Wednesday, December 6, 2023 (9:00 AM – 4:00 PM Eastern time) 

  • Using EHR data to design clinical trials
  • Identifying patients for clinical trials
  • Pragmatic trials
  • Afternoon Studio: Randomized study design

Clinical Prediction Studies

Thursday, December 7, 2023 (9:00 AM – 4:00 PM Eastern time)

  • Principles of clinical prediction studies
  • Handling diverse and longitudinal predictor variables
  • Designing prediction models for implementation
  • Algorithmic bias and fairness
  • Afternoon Studio: Clinical prediction design

Comparative Effectiveness Studies

Friday, December 8, 2023 (9:00 AM – 4:00 PM Eastern time)

  • Principles of observational studies and comparative effectiveness research (CER)
  • Confounding and selection bias
  • Propensity scores
  • Afternoon Studio: Observational study design

A final wrap-up will be conducted at the end of the Friday studio.


The course will be taught by methodological and clinical investigators from Duke University who have extensive experience using EHR data in academic, industry, and government projects.

• Dr. Benjamin Goldstein, Associate Professor of Biostatistics and Bioinformatics
          • Meaningful use of EHR data for clinical research, biases in EHR data
          • Course co-creator, day 4 presenter
• Dr. Jillian Hurst, Assistant Professor of Pediatrics
          • Use of real-world data for pediatric studies, training of clinician scholars
          • Course co-creator, day 1 presenter
• Dr. Amanda Brucker, Biostatistician, BERD Core
          • Use of EHR data for clinical research studies
          • Day 1 presenter
• Dr. Nrupen Bhavsar, Associate Professor of Medicine
          • EHR data to study chronic disease; incorporation of geospatial data in EHR studies
          • Day 2 presenter
• Dr. Deepshikha Ashana, Assistant Professor of Medicine
          • Social drivers of health, health policy
          • Day 2 presenter
• Dr. Schuyler Jones, Associate Professor of Medicine
          • Pragmatic clinical trials in cardiology
          • Day 3 presenter
• Dr. Lisa Wruck, Associate Professor of Biostatistics and Bioinformatics
          • Pragmatic clinical trials, use of real-world evidence
          • Day 3 presenter
• Ms. Congwen Zhao, Biostatistician, BERD Core
          • Use of EHR data for predictive modeling
          • Day 4 presenter
• Ms. Sophia Bessias, Data Scientist, AI Health
          • Evaluation of clinical decision support tools
          • Day 4 presenter
• Dr. Laine Thomas, Associate Professor of Biostatistics and Bioinformatics
          • Observational data studies, causal inference
          • Day 5 presenter
• Dr. Karen Chiswell, Statistical Scientist
          • Use of real-world data for observational studies
          • Day 5 presenter

Program Details: Location, Registration, and Cost

The registration fee for the EHR-SDW is $400. We are able to offer a limited number of discounted seats for members of nonprofit organizations ($150) and for current students at Duke or other universities with a valid student ID ($50). All fees are payable through the registration site.

All fees are non-refundable. Once we reach maximum registration, we will maintain a waitlist and will contact those on the waitlist as spots become available. We also have a small number of scholarships available for those who would otherwise be unable to attend.

Each participant will receive a personal link for the virtual webinars, which will be held live and will provide opportunities for questions and engagement with each lecturer. We strongly encourage live participation in all sessions across the 5 days, but every participant will also have access to video recordings of the lectures for personal reference.