Patient Demographic Data Quality Framework

Data Integration

Purpose

Reduce the need to obtain data from multiple sources and improve data consistency and availability for business processes that require record integration or merging, multi-source data consolidation, and aggregation or reporting.

Introductory Notes

Data integration addresses data transport and processing (linking, combining, deduplicating records, etc.). The term most frequently describes the mechanism for transforming and integrating data from multiple sources into a target destination environment, but it can also refer to matching, merging, and deleting records within a single data store. Many recommended practices within the scope of this topic therefore also apply to smaller organizations that host only a single data store.

Data integration challenges can be extensive: diverse data representations across multiple sources, discrepancies in business meaning for the same or similar terms, development of exact requirements for the target result and of the defined steps to achieve it, etc. Although the mechanics of data integration are usually handled by information technology resources, it is very important to engage representatives from supplying and consuming business areas to determine:

  • Consensus about the desired target result for the selected data set;
  • Agreement about what rules, tests, and actions will be applied to the data, and the order of execution;
  • Implementation of quality rules that apply to data at rest and data in motion (i.e., when extracted to another data store);
  • Agreement about how defects and anomalies that may occur will be addressed (e.g. false positives or false negatives, unexpected results, etc.); and
  • Agreement about how data in motion is mapped to the destination and, if there is more than one source, what order of source precedence should apply to the data being integrated (a sketch follows this list).
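
To illustrate the last point, the following minimal sketch shows one way a source-precedence order can be applied when merging attributes from more than one source. The source names, attribute names, and precedence order are illustrative assumptions, not part of the framework.

    # Minimal sketch of source-precedence merging; source and attribute names
    # are hypothetical.
    SOURCE_PRECEDENCE = ["ehr", "billing", "lab"]  # highest-trust source first

    def merge_by_precedence(records):
        """Merge per-source records for one patient into a single target record.

        `records` maps a source name to a dict of attribute values; for each
        attribute, the value from the highest-precedence source that supplies
        a non-empty value wins.
        """
        merged = {}
        attributes = {attr for rec in records.values() for attr in rec}
        for attr in attributes:
            for source in SOURCE_PRECEDENCE:
                value = records.get(source, {}).get(attr)
                if value not in (None, ""):
                    merged[attr] = value
                    break
        return merged

    # Example: the EHR spelling of the last name wins over billing; billing
    # supplies the phone number the EHR lacks.
    print(merge_by_precedence({
        "ehr": {"last_name": "Smith", "phone": ""},
        "billing": {"last_name": "Smyth", "phone": "555-0100"},
    }))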

Successful data integration can be considered a synthesis of many other data management processes. If sound practices have been implemented for data standards, data requirement definitions, the business glossary, metadata, data profiling, and data quality assessments, integration activities will be much easier to plan, specify, and perform. Where there are gaps in those processes, the activities involved in integrating data will quickly reveal them.

For organizations that acquire patient records for aggregation, insurance submissions, research, and other purposes, integration and data standards are critical both for data exchange (e.g., the HL7 exchange standard) and for shared repositories intended as authoritative sources. Data integration required for EHRs depends on complex logic to link records, identify anomalous pairings, and determine duplicates for merging.

Successful integration efforts, regardless of scope, depend on effective collaboration. Business representatives need to define and validate business and quality rules, and IT needs to understand the business use of the data to implement rules governing integration within the context of the organization's standards. Patient demographic data can also be improved at the point of integration, or within a data store, through data enrichment. The business can consider employing data enrichment to improve the data, for instance, using a service to standardize addresses and apply 9-digit ZIP codes, or using a “householding” feature to identify co-located patients (as sketched below).
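
As a hedged illustration of the householding idea, the sketch below groups patients that share a standardized address. The standardize_address helper is a hypothetical stand-in for whatever enrichment service the organization uses (for example, one that corrects addresses and appends the 9-digit ZIP); it is not a real API.

    from collections import defaultdict

    def standardize_address(raw):
        # Placeholder normalization only; a real enrichment service would
        # correct, complete, and append ZIP+4 information.
        return " ".join(raw.upper().replace(".", "").split())

    def household_groups(patients):
        """Group patient IDs by their standardized address."""
        groups = defaultdict(list)
        for patient_id, address in patients:
            groups[standardize_address(address)].append(patient_id)
        return groups

    patients = [
        ("P001", "123 Main St."),
        ("P002", "123  main st"),
        ("P003", "456 Oak Ave."),
    ]
    # P001 and P002 fall into the same household once their addresses are
    # standardized to "123 MAIN ST".
    print(household_groups(patients))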

Application of sound data integration practices enables an organization to realize the following benefits:

  • Creates organization-wide alignment for well-organized, shared and accessible data;
  • Improves source data accuracy and quality via embedded business and quality rules;
  • Improves business logic to identify duplicate records;
  • Increases data quality optimization for data at rest and in motion;
  • Reduces manual efforts spent fixing data; and
  • Improves timely delivery of well-structured data content.

Additional Information

Data governance should identify relevant stakeholders along the patient demographic data lifecycle and facilitate collaboration among them to establish integration policies and processes. Working groups can be convened to ensure that data integration requirements are specified with representation from all relevant stakeholders.

A standard integration approach should start by defining the scope of critical data attributes and should provide guidance on developing the data quality and business rules, and a common language for them, that will ensure consistent data handling along the lifecycle. The approach will include agreements on data precedence rules based on real-world scenarios, data requirements, and selected triggers that will satisfy business objectives.

In addition, the data integration effort should include guidance for a change control process to ensure that changes to the integration environment, including upstream sources and downstream targets, are controlled and kept in alignment.

Example Work Products

  • Data integration scripts
  • Data integration requirements

Additional Information

Patient data goes through internal and external processes that are related from the standpoint of patient care, but often are not linked through direct data feeds. At key points along the care continuum, patient records may need to be integrated to present a complete set of patient information or to generate consolidated statements for billing and reporting that require combining selected attributes from multiple sources into a single record.

Quality rules and matching algorithms are employed for standardizing and merging duplicate records, as well as linking information created by multiple unlinked processes back to a single patient record. The set of data integration rules used by the organization facilitates data transfer and loading, detects and captures data changes from upstream sources, captures metadata, and applies in-line data quality checks and controls, e.g., merging algorithms and exception reports on unmerged candidate duplicate records.
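
The following minimal sketch illustrates the kind of deterministic matching rule and exception handling described above; the field names and rule are illustrative assumptions, and production matching typically layers probabilistic or phonetic comparisons on top of rules like these.

    def normalize(record):
        """Standardize the fields used for comparison."""
        return {
            "last_name": record.get("last_name", "").strip().upper(),
            "first_name": record.get("first_name", "").strip().upper(),
            "dob": record.get("dob", ""),
        }

    def is_candidate_duplicate(a, b):
        """Deterministic rule: same last name and DOB, same first initial."""
        a, b = normalize(a), normalize(b)
        return (a["last_name"] == b["last_name"]
                and a["dob"] == b["dob"]
                and a["first_name"][:1] == b["first_name"][:1])

    def classify_pairs(records):
        """Split record pairs into merge candidates and an exception queue."""
        merge, review = [], []
        for i in range(len(records)):
            for j in range(i + 1, len(records)):
                a, b = records[i], records[j]
                if is_candidate_duplicate(a, b):
                    merge.append((a, b))
                elif normalize(a)["dob"] == normalize(b)["dob"]:
                    # Same DOB but weaker evidence: list on an exception
                    # report for manual review rather than merging.
                    review.append((a, b))
        return merge, review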

Example Work Products

  • Data integration standards
  • Integration rules and quality rules
  • Matching algorithm(s)

Additional Information

The integrated data environment is a hub that brings together multiple sources and uses of shared data. As a result, many stakeholders are impacted by changes to data sources, rules, and targets. Downstream applications typically utilize custom rules to filter and relate data from the integrated environment in support of specific business purposes. If downstream consumers are not involved in vetting changes, the impacts can be significant.

For example, were a Billing Code of “P” to change to “R”, all downstream systems that automatically select records where Billing Code = “P” would fail to find current records. Typically, data governance will inform and coordinate relevant stakeholders to vet proposed changes to the integrated environment. It is also important to update metadata with any changes to data labels, descriptions, values, and relationships.
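
A minimal sketch of this failure mode follows, using an illustrative in-memory record set rather than a real billing system; the field names and guard are assumptions for demonstration.

    records = [
        {"patient_id": "P001", "billing_code": "R"},  # value changed upstream
        {"patient_id": "P002", "billing_code": "R"},
    ]

    # A downstream filter still keyed to the old value silently finds nothing.
    selected = [r for r in records if r["billing_code"] == "P"]
    assert selected == []  # no error is raised; the records are simply missed

    # A guard that checks for unexpected values surfaces the change instead of
    # letting it pass silently.
    known_codes = {"P"}
    unknown = {r["billing_code"] for r in records} - known_codes
    if unknown:
        raise ValueError(f"Unexpected billing codes encountered: {unknown}")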

Example Work Products

  • Integration method standards
  • Integration environment change management processes

Practice Evaluation Questions

Tier 1: Foundational

1.1 Do stakeholders participating in collective decisions about the integration of patient demographic data follow a standard approach based on data needs and priorities?

Tier 2: Building

2.1 Does the organization apply quality rules to the integration of patient demographic data from multiple sources?

Tier 3: Advanced

3.1 Are changes to data sources, quality rules, and destinations documented and approved by all relevant stakeholders?