Training Data for Machine Learning to Enhance Patient-Centered Outcomes Research Data Infrastructure

Project Background

Advances in artificial intelligence (AI) methods and growing computational power now support advanced tools such as machine learning, which uses large amounts of data to make predictions that can be turned into actionable information.

Current AI workflows make it possible to conduct complex studies and uncover deeper insights than traditional analytical methods do. As the volume and availability of electronic health data increase, patient-centered outcomes research (PCOR) investigators need better tools to analyze data and interpret the results. A foundation of high-quality training data is critical to developing robust machine-learning models. Training data sets are needed to train prediction models that use machine-learning algorithms, to extract the features most relevant to specified research goals, and to reveal meaningful associations.

Project Dates

This project began in 2019 and will end in 2021.

Project Goal

The focus of this project is to conduct foundational work to support future applications of machine learning and AI to improve health and healthcare delivery.

This project will reach its goal by:

  • Identifying multidisciplinary experts and convening them in a workshop to provide insights regarding the selection of use cases and the development of training data sets for machine learning;
  • Capturing lessons learned from the process of developing high-quality training data sets, which involves:
    • Establishing kidney disease-related use cases that benefit most from machine-learning applications and using data from multiple sources to better inform those use cases, and
    • Focusing on data annotation, data curation, and establishing data quantity and quality requirements. The project may also explore validation and balancing of the data, and feature selection to avoid overfitting;
  • Developing machine-learning models and identifying approaches to evaluate their performance; and
  • Disseminating project tools, activities, and lessons learned to encourage future applications of these methods by PCOR researchers.
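
To make the data-preparation terms in the list above more concrete, the following is a minimal sketch of three of the steps it names: holding out validation data, balancing the training data, and evaluating predictions. It is not the project's method. The function names, the `label` field, and the records are all hypothetical, and the data are synthetic placeholders rather than patient data.

```python
import random
from collections import Counter


def _group_by_label(rows, label_key):
    """Group records (dicts) by the value stored under label_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    return groups


def stratified_split(rows, label_key, holdout_fraction=0.25, seed=0):
    """Split records into (train, holdout), keeping class proportions similar."""
    rng = random.Random(seed)
    train, holdout = [], []
    for group in _group_by_label(rows, label_key).values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        # Keep at least one record of every class in the holdout set.
        n_holdout = max(1, round(len(shuffled) * holdout_fraction))
        holdout.extend(shuffled[:n_holdout])
        train.extend(shuffled[n_holdout:])
    return train, holdout


def oversample_minority(rows, label_key, seed=0):
    """Balance classes by randomly duplicating records of the smaller classes."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, label_key)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic, imbalanced toy records (hypothetical; not real patient data).
records = [{"value": i, "label": 0} for i in range(90)]
records += [{"value": i, "label": 1} for i in range(10)]

train_rows, holdout_rows = stratified_split(records, "label", holdout_fraction=0.2)
balanced_train = oversample_minority(train_rows, "label")
print(Counter(row["label"] for row in balanced_train))
```

In this sketch the split happens before the balancing step, so duplicated records cannot appear in both the training and holdout sets. Precision and recall are reported instead of accuracy alone because accuracy can look high on imbalanced data even when the rare class is predicted poorly.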

Learn More

Read the blog post about this project.

Please contact with questions about this project.

Content last reviewed on March 31, 2020