Synthetic Health Data Generation to Accelerate Patient-Centered Outcomes Research

Project Information

The Office of the National Coordinator for Health Information Technology (ONC) led an effort to enhance an open-source synthetic data engine to accelerate research. Synthea™, a synthetic health data engine developed by the MITRE Corporation, employs an open-source development model. Synthea uses publicly available data to generate synthetic health records and can export information in multiple standardized formats. Synthea generates realistic patients, simulates their entire lives, and outputs electronic health record data.

Project Goal

The project focused on enhancing Synthea’s ability to produce high-quality synthetic data for patients with complex care needs, patients with opioid use, and pediatric populations.

The project achieved this goal by:

  • Identifying and convening a multidisciplinary panel of experts to provide insights regarding the selection of use cases and module development;
  • Developing Synthea synthetic health data generation modules that increase the number and variety of synthetic patient health records to meet patient-centered outcomes research (PCOR) needs; and
  • Engaging a broader community of researchers, developers, and innovators to validate the realism and demonstrate the potential uses of Synthea-generated synthetic health data through a nationwide challenge competition: the Synthetic Health Data Challenge.

Project Dates

This project began in 2019 and ended in March 2022.

Project Background

Synthetic health data can reflect the characteristics of a population of interest and be a useful resource for researchers, health information technology (health IT) developers, and informaticians. Researchers and developers often depend on anonymized data to test theories, data models, algorithms, or prototype innovations, but real data often must be aggregated, de-identified, or analyzed before it can be used.

Synthetic health data sets are compatible with a variety of standards, such as Health Level Seven International® (HL7®) Fast Healthcare Interoperability Resources® (FHIR®) and Consolidated Clinical Document Architecture (C-CDA). A synthetic health data engine can support the broader patient-centered outcomes research (PCOR) data infrastructure by giving researchers and health IT developers a low-risk, readily available data source to work with until real clinical data are available.

Clinical data are critical to conduct PCOR, which focuses on the effectiveness of prevention and treatment options. However, realistic patient data are often difficult to access because of cost, patient privacy concerns, or other legal restrictions. Synthetic health data help address these issues and speed the initiation, refinement, and testing of innovative health and research approaches.

Learn More

Synthetic Health Data Challenge

The Synthetic Health Data Challenge launched on January 19, 2021, and invited proposals for enhancing Synthea or demonstrating novel uses of Synthea-generated synthetic health data. Selected proposals moved on to the development phase and competed for $100,000 in total prizes. Challenge winners presented their solutions during the Winning Solutions Webinar held on Tuesday, October 19, 2021, at 12:00 PM ET. If you missed the webinar, you can download the materials or watch the recording.

See ONC’s press release announcing the Challenge winners.

Synthetic Health Data Challenge 1st Place Winner: $40,000 award

Team CodeRx: Medication Diversification Tool

Synthetic Health Data Challenge 2nd Place Winners: $15,000 award each

The Generalistas: Virtual Generalist – Modeling Co-morbidities in Synthea™

Team LMI: On Improving Realism of Disease Modules in Synthea™: Social Determinant-Based Enhancements to Conditional Transition Logic

Synthetic Health Data Challenge 3rd Place Winners: $10,000 award each

Particle Health: The Necessity of Realistic Synthetic Health Data Development Environments

Team TeMa: Empirical Inference of Underlying Condition Probabilities Using Synthea™-Generated Synthetic Health Data

UI Health: Spatiotemporal Big Data Analysis of Opioid Epidemic in Illinois

Synthea Technical Guidance and Tips

  • Additional information about Synthea.
  • Detailed information on using Synthea is available on the Synthea Wiki.
  • Read these tips if you are new to using Synthea.

Project Fact Sheet

The fact sheet [PDF - 781 KB] provides a visual overview of the project and includes the goal and objectives, use cases selected, and methodology used for developing, testing, and validating Synthea modules.

Final Report and FAQs

  • To learn more about how this project enhanced Synthea’s ability to produce high-quality synthetic health data, read the Final Report.
  • Read these FAQs for additional information about PCOR, Synthea, this project, and the Challenge.

Please contact ONC with questions about this project.