Synthetic Health Data Generation to Accelerate Patient-Centered Outcomes Research

Project Updates

The Synthetic Health Data Challenge invited proposals for enhancing Synthea or demonstrating novel uses of Synthea-generated synthetic data. ONC is pleased to announce that nine proposals for innovative models have been selected to progress to Phase II and develop their prototype or solution. These Phase I finalists are:

  • Battellion
  • Code Rx
  • Generalistas
  • UI Health
  • LMI
  • Menrva.AI
  • Particle Health
  • Team TeMa #1
  • Team TeMa #2

Learn more about the Synthetic Health Data Challenge and the Phase I Finalists.

Project Information

The Office of the National Coordinator for Health Information Technology (ONC) is leading an effort to enhance an open-source synthetic data engine to accelerate research. Synthea™, a synthetic health data engine developed by the MITRE Corporation, employs an open-source development model. Synthea uses publicly available data to generate synthetic health records and can export information in multiple standardized formats. Synthea generates realistic patients, simulates their entire lives, and outputs electronic health record data.

Project Goal

The focus of this project is to enhance the ability of Synthea to produce high-quality synthetic data for opioid, pediatric, and complex care use cases.

This project will reach its goal by:

  • Identifying and convening a multidisciplinary panel of experts to provide insights regarding the selection of use cases and module development;
  • Developing opioid, pediatric, and complex care data generation modules for Synthea to increase the number and diversity of synthetic patient health records to meet patient-centered outcomes research (PCOR) needs; and
  • Engaging the broader community of researchers and developers to validate the realism and demonstrate the potential uses of the generated synthetic health records through a challenge.

Project Dates

This project began in 2019 and will end in 2022.

Project Background

Synthetic health data can reflect the characteristics of a population of interest and be a useful resource for researchers, health information technology (health IT) developers, and informaticians. Researchers and developers often depend on anonymized data to test theories, data models, algorithms, or prototype innovations, but such data may need to be aggregated, de-identified, or analyzed before they can be used.

Synthetic data sets are compatible with a variety of standards, such as Health Level Seven International® (HL7®) Fast Healthcare Interoperability Resources® (FHIR®) and Consolidated Clinical Document Architecture (C-CDA). A synthetic data engine such as Synthea can support the greater patient-centered outcomes research (PCOR) data infrastructure by giving researchers and health IT developers a low-risk, readily available source of data until real clinical data are available.

Clinical data are critical for the conduct of PCOR, which focuses on the effectiveness of prevention and treatment options. However, realistic patient data are often difficult to access because of cost, patient privacy concerns, or other legal restrictions. Synthetic health data help address these issues and speed the initiation, refinement, and testing of innovative health and research approaches.

Learn More

Synthetic Health Data Challenge

The Synthetic Health Data Challenge launched on January 19, 2021, and invited proposals for enhancing Synthea or demonstrating novel uses of Synthea-generated synthetic data. Selected proposals move on to the development phase (Phase II) and compete for up to $100,000 in total prizes.

Key dates:

  • Phase I Submission Period Opens: 01/19/2021 9:00 AM ET
  • Phase I Informational Webinar: 02/02/2021 12:00 PM ET
  • Phase I Submission Period Closes: 03/02/2021 5:00 PM ET
  • Phase II Submission Period Opens: 03/23/2021 9:00 AM ET
  • Phase II Submission Period Closes: 07/13/2021 5:00 PM ET

Synthetic Health Data Challenge Finalists

ONC is pleased to announce the finalists (individual or team name) from Phase I of the Synthetic Health Data Challenge. These Proposals for Innovative Models will proceed to Phase II. Wish them luck!

  • Battellion: A Generic Quality Construct Module for Integrated Testing of eCQM using Synthea
  • Code Rx: Medication Diversification Tool
  • Generalistas: Virtual Generalist
  • UI Health: Spatiotemporal Big Data Analysis of Opioid Epidemic in Illinois
  • LMI: On Improving Realism of Disease Modules in Synthea: Social Determinant-Based Enhancements to Conditional Transition Logic
  • Menrva.AI: Incorporating SDOH Data to Predict Diabetes Progression in Patients with Laboratory-Defined Prediabetes
  • Particle Health: The Necessity of Realistic Synthetic Health Data Development Environments
  • Team TeMa #1: Empirical Inference of Underlying Condition Probabilities Using Synthea-Generated Synthetic Health Data
  • Team TeMa #2: Modification and Use of Synthea to Account for Patient Vaccination Choice

Please contact with questions about this project.

Content last reviewed on August 26, 2021