Synthetic Health Data Generation to Accelerate Patient-Centered Outcomes Research

Project Background

Synthetic health data can reflect the characteristics of a population of interest and be a useful resource for researchers, health information technology (health IT) developers, and informaticians. Researchers and developers often depend on anonymized data to test theories, data models, algorithms, or prototype innovations, but those data often must first be aggregated, de-identified, or analyzed by staff before they can be used.

Clinical data are critical for the conduct of patient-centered outcomes research (PCOR), which focuses on the effectiveness of prevention and treatment options. However, realistic patient data are often difficult to access because of cost, patient privacy concerns, or other legal restrictions. Synthetic health data help address these issues and speed the initiation, refinement, and testing of innovative health and research approaches. Capitalizing on this opportunity, the Office of the National Coordinator for Health Information Technology (ONC) is leading an effort to enhance an open-source synthetic data engine to accelerate research.

Synthea™, a synthetic health data engine developed by the MITRE Corporation, employs an open-source development model. Synthea uses publicly available data to generate synthetic health records and can export information in multiple standardized formats. Synthea generates realistic patients, simulates their entire lives, and outputs electronic health record data. The synthetic data sets are compatible with a variety of health data standards, such as the Health Level Seven International® (HL7®) Fast Healthcare Interoperability Resources® (FHIR®) and Consolidated-Clinical Document Architecture (C-CDA).
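
For readers who want a sense of what working with FHIR-formatted synthetic records can look like, the following is a minimal Python sketch, not an official Synthea or ONC example. It assumes a record is available as a FHIR Bundle in JSON, in which each item in the "entry" list wraps one resource. The function name count_resource_types and the hand-written sample bundle are hypothetical and used only for illustration; the sample is not actual Synthea output.

```python
import json
from collections import Counter


def count_resource_types(bundle: dict) -> Counter:
    """Count the resources in a FHIR Bundle by their resourceType.

    Hypothetical helper for illustration; not part of Synthea.
    """
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


# Hand-written, illustrative bundle (an assumption, not real Synthea output).
sample_bundle = json.loads("""
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "example-patient"}},
    {"resource": {"resourceType": "Encounter", "id": "example-encounter"}},
    {"resource": {"resourceType": "Condition", "id": "example-condition-1"}},
    {"resource": {"resourceType": "Condition", "id": "example-condition-2"}}
  ]
}
""")

counts = count_resource_types(sample_bundle)
for resource_type, n in sorted(counts.items()):
    print(f"{resource_type}: {n}")
```

A summary like this is one simple way to check that a generated record contains the kinds of clinical resources a study or prototype expects before building on it.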

This type of synthetic data engine can support the greater PCOR data infrastructure by giving researchers and health IT developers a low-risk, readily available source of synthetic data to work with until real clinical data are available.

Project Dates

This project began in 2019 and will end in 2022.

Project Goal

The focus of this project is to enhance the ability of Synthea to produce high-quality synthetic data for opioid, pediatric, and complex care use cases.

This project will reach its goal by:

  • Identifying and convening a multidisciplinary panel of experts to provide insights regarding the selection of use cases and module development;
  • Developing opioid, pediatric, and complex care data generation modules for Synthea to increase the number and diversity of synthetic patient health records to meet PCOR needs; and
  • Engaging the broader community of researchers and developers to validate the realism and demonstrate the potential uses of the generated synthetic health records through a challenge.

Learn More

Synthetic Health Data Challenge

The Synthetic Health Data Challenge launched on January 19, 2021, and invites proposals for enhancing Synthea or demonstrating novel uses of Synthea-generated synthetic data. Selected proposals move on to the development phase and compete for up to $100,000 in total prizes.

Key dates:

  • Phase I Submission Period Opens: 01/19/2021 9:00 AM ET
  • Phase I Informational Webinar (recording): 02/02/2021 12:00 PM ET
  • Phase I Submission Period Closes: 03/02/2021 5:00 PM ET
  • Phase II Submission Period Opens: 03/23/2021 9:00 AM ET
  • Phase II Submission Period Closes: 07/13/2021 5:00 PM ET

Please contact with questions about this project.

Content last reviewed on February 17, 2021