Training Data for Machine Learning to Enhance Patient-Centered Outcomes Research (PCOR) Data Infrastructure

Innovative artificial intelligence (AI) methods and increases in computational power support the use of advanced technologies such as machine learning, which draws on large amounts of data to make predictions that yield actionable information.

Current AI workflows make it possible to conduct complex studies and uncover deeper insights than traditional analytical methods allow. As the volume and availability of electronic health data increase, PCOR researchers need better tools to analyze data and interpret the results. A foundation of high-quality training data is critical to developing robust machine learning models. Training data sets are essential to train prediction models that use machine learning algorithms, to extract the features most relevant to specified research goals, and to reveal meaningful associations.

The focus of this project is to conduct foundational work to support future applications of machine learning and AI to improve health, healthcare delivery, and PCOR by:

  • Capturing lessons learned from the process of developing high-quality training data sets, which involves:
    • establishing kidney disease-related use cases that benefit most from machine learning applications and using data from multiple sources to better inform use cases, and
    • focusing on data annotation, data curation, and establishing data quantity and quality requirements. The project may also explore validation and balancing of the data, and feature selection to avoid overfitting;
  • Developing machine learning models and identifying approaches to evaluate their performance; and
  • Disseminating project tools, activities, and lessons learned to encourage future applications of these methods by PCOR researchers.
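
As an illustration of the validation and balancing steps mentioned above, the sketch below sets aside a stratified holdout set and then oversamples the minority class in the training portion only. It is a minimal example under stated assumptions, not the project's method: the labels are synthetic, and the function names `stratified_split` and `oversample_minority` are hypothetical helpers written for this example using only the Python standard library.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, holdout_fraction=0.2, seed=0):
    """Return (train_idx, holdout_idx) with similar class shares in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, holdout_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one example of every class in the holdout set.
        n_holdout = max(1, round(len(idxs) * holdout_fraction))
        holdout_idx.extend(idxs[:n_holdout])
        train_idx.extend(idxs[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


def oversample_minority(indices, labels, seed=0):
    """Duplicate randomly chosen minority-class indices until classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choice(idxs) for _ in range(target - len(idxs)))
    return balanced


# Synthetic labels (assumption): 1 = condition present, 0 = absent, 10% positive.
labels = [1] * 10 + [0] * 90

train_idx, holdout_idx = stratified_split(labels)
balanced_train = oversample_minority(train_idx, labels)

print("holdout class counts:", Counter(labels[i] for i in holdout_idx))
print("balanced training class counts:", Counter(labels[i] for i in balanced_train))
```

Balancing is applied after the split and only to the training indices, so the holdout set keeps the original class mix and performance measured on it is not inflated by duplicated records.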
Content last reviewed on October 7, 2019