Addresses the mechanisms, processes, and methods used to validate and correct data defects according to predefined quality rules, as well as analysis and enhancement of business processes to prevent errors.
Data cleansing focuses on data correction to meet business user criteria (targets and thresholds) as determined by data quality rules addressing all applicable quality dimensions. Quality rules, developed through the data quality assessment process and the results of data profiling efforts, provide a baseline for identifying data defects which can affect business operations.
An example of enhancing a business process to improve patient demographic data quality: once a patient has produced identification and an existing record is found in the system, present the patient with a subset of the demographic information for validation. Adding this activity step would help prevent potential duplicate records.
Data cleansing activities are most effective when conducted at, or as close as possible to, the point of first capture, i.e., the first automated data store to record the patient's data. A best practice is to undertake cleansing activities based on data profiling or data quality assessment analysis. The organization should develop a standard data cleansing plan template to ensure that cleansing rules are shared and reused for any data store in which patient demographic data is located. This helps minimize duplication of effort and conflicting cleansing activities, e.g., cleansing the same data in two physical locations but applying different rules.
Organizations need to establish criteria for what events trigger data cleansing efforts. In most organizations, data cleansing is conducted most frequently on key shared data stores, such as a master patient index or the patient registration or scheduling system. However, it is advisable to expand the scope to operational systems that provide data to other internal or external consumers, subject to criticality and budget. As with data profiling efforts, it is recommended that data cleansing criteria be subject to impact analysis.
Once accomplished, data corrections should be published and immediately made available to affected downstream data stores and repositories. It is advised to develop and document a consistent process for escalating issues appropriately to governance, the quality coordinator, or the vendor if applicable. Data changes should be verified with internal and external data providers, preferably through an automated message or report. If this capability does not exist, then a manual report should be produced and provided both to the designated data quality resource and to any corresponding internal or external stakeholders.
The benefits of implementing standard data cleansing and improvement processes include:
Data cleansing requirements may result from the identification of data defects through various sources, e.g., change requests, report inaccuracies, defect logs, etc. However, it is particularly important to regularly conduct data profiling and data quality assessments, which can catch defects before they negatively impact patient safety. Defects should be translated into documented cleansing requirements.
Data cleansing requirements will specify exactly what data needs to be cleansed and the rules that should be applied. The cleansing rules, scripts, or procedures are intended to be reused, since the cleansing process will be repeated consistently over time. The document may include the following content: a statement of the defect and its impact on operational objectives; a definition of the scope of data to be addressed; and the benefits that will result from improved data quality.
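As an illustration of how cleansing rules can be made reusable across data stores, the sketch below pairs each documented defect with a scope (the field it applies to), a defect check, and a correction. The `CleansingRule` structure, field names, and the ZIP-code rule are hypothetical examples, not part of any specific system.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: a reusable cleansing rule bundles a statement of
# the defect, its scope, a defect check, and a correction, so the same
# rule can be re-run against any data store holding the same attribute.
@dataclass
class CleansingRule:
    name: str                              # statement of the defect addressed
    applies_to: str                        # scope: which field the rule covers
    is_defective: Callable[[str], bool]    # detects the defect
    correct: Callable[[str], str]          # applies the correction

def cleanse(record: dict, rules: list[CleansingRule]) -> dict:
    """Apply each rule to its target field, returning a corrected copy."""
    cleaned = dict(record)
    for rule in rules:
        value = cleaned.get(rule.applies_to, "")
        if rule.is_defective(value):
            cleaned[rule.applies_to] = rule.correct(value)
    return cleaned

# Example rule (conformity dimension): ZIP codes must be exactly 5 digits.
zip_rule = CleansingRule(
    name="ZIP code not 5 digits",
    applies_to="zip",
    is_defective=lambda v: not (v.isdigit() and len(v) == 5),
    correct=lambda v: "".join(ch for ch in v if ch.isdigit())[:5],
)

print(cleanse({"name": "DOE, JANE", "zip": "0213-4"}, [zip_rule]))
```

Because each rule carries its own scope and defect statement, the same rule set can be applied to the master patient index and to downstream stores without re-deriving the logic, which is the reuse the requirements document is meant to enable.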
Defining the benefits of improving the data will most likely require knowledge of business rules, data definitions, usage impacts, etc. Data cleansing requirements should adhere to quality criteria that are expressed in terms of data quality dimensions (e.g., conformity, accuracy, uniqueness) and linked to business objectives, such as making measurable improvements in patient safety.
For example, one organization hired a contractor to assist in a 10-month data cleansing effort that included an analysis exploring the hospital’s entire identity integrity process. The root cause of every duplicate was documented, and a resolution plan was created.6
Example Work Products
Quality rules and manual workarounds are most often used to resolve data defects, with little, if any, attention paid to identifying their root causes. The root cause of a data defect most often lies at the point of origin (upstream). A significant percentage result from human error, with the remainder due to sub-optimal design. Typos, misspellings, transpositions, empty fields, or fields filled with false data can all cause problems for consumers downstream of the point of entry.
Whatever the cause of the defect, improving the process, especially at the original point of data input or acquisition, can resolve most defects (because process improvements may also lead to design changes or addition of automated quality rules). For example, most health systems, as a best practice, now require registrars to search for an existing patient record before creating a new record. Some organizations have improved the process, requiring that registrars scan and check a photo ID, which is also helpful for reducing the risk of fraudulent use of a patient’s insurance coverage.7 Once the data has been corrected, it is important to communicate data changes to all relevant stakeholders.
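The search-before-create practice described above can be sketched as a simple workflow rule: the registrar's system searches the master patient index (MPI) before any new record may be created. The function, field names, and match criteria below are illustrative assumptions, not a prescribed matching algorithm.

```python
# Hypothetical sketch of the search-before-create registration rule:
# an existing record is reused whenever the search finds a match, so a
# duplicate is never created at the point of entry.
def register_patient(mpi: list[dict], demographics: dict) -> dict:
    """Return an existing matching MPI record if found; otherwise create one."""
    matches = [r for r in mpi
               if r["last"] == demographics["last"]
               and r["dob"] == demographics["dob"]]
    if matches:
        return matches[0]         # reuse the existing record -- no duplicate
    mpi.append(demographics)      # no match found: create the new record
    return demographics

mpi: list[dict] = []
register_patient(mpi, {"last": "LEE", "first": "SAM", "dob": "1990-01-01"})
register_patient(mpi, {"last": "LEE", "first": "SAM", "dob": "1990-01-01"})
print(len(mpi))  # 1 -- the second registration reused the existing record
```

Encoding the rule in the workflow itself, rather than relying on downstream cleansing, is exactly the kind of upstream process improvement the paragraph describes.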
For example, healthcare organizations agree that any changes to patient data attributes in exchange transactions should be coordinated with organizations working on parallel efforts to standardize healthcare transactions.
It is important to determine a prioritization method for identifying the most important defects to resolve. Commonly used approaches include estimates of the costs of fixes, the level of effort, characterization of the business impact, and tangible and intangible benefits.
For example, one organization estimated the average cost of a duplicate patient record at $96 and the total impact on patient care at 4% (delayed surgeries, duplicated lab tests, and imaging). Because of the high cost of the issue, the organization prioritized reducing its duplicate rate, which was cut from 22% to 0.14%.8
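To show how such figures feed a prioritization decision, the arithmetic below combines the cited $96-per-duplicate cost with the before-and-after duplicate rates. The record volume of 500,000 is a hypothetical assumption for illustration only.

```python
# Illustrative cost arithmetic using the figures cited above:
# $96 average cost per duplicate, duplicate rate cut from 22% to 0.14%.
COST_PER_DUPLICATE = 96.00
RECORD_VOLUME = 500_000  # hypothetical record count, for illustration only

def duplicate_cost(total_records: int, duplicate_rate: float) -> float:
    """Estimated total cost of duplicates at a given duplicate rate."""
    return total_records * duplicate_rate * COST_PER_DUPLICATE

before = duplicate_cost(RECORD_VOLUME, 0.22)     # rate before cleansing
after = duplicate_cost(RECORD_VOLUME, 0.0014)    # rate after cleansing
print(f"estimated avoided cost: ${before - after:,.0f}")
```

A comparable estimate for each candidate defect gives the organization a common dollar-denominated baseline for ranking cleansing initiatives, as the prioritization methods above suggest.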
The methods for assessing business impacts, including costs and risks, should be defined, approved, and consistently applied across the patient demographic data lifecycle. This enables a standard baseline to be established for each cleansing initiative, allowing for cross-comparability and easier measurement of progress.
Time-series analysis of defect patterns will often indicate the root cause. Addressing root causes results in scalable solutions and increased confidence in the quality of the data. Therefore, maintaining an archive of data changes supports sustainable data quality effectiveness. For example, many organizations maintain a patient's previous-name and previous-address change history to support updates.
All demographic updates should be logged with a date and time stamp and the user identifier or other source of the change. The organization should also produce (or acquire from the EHR) and maintain reports for all patient record merges.
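A minimal sketch of such an audit trail is shown below: every demographic update records a timestamp, the user (or system) identifier, and the before-and-after values, which is the raw material for the time-series defect analysis and merge reporting described above. The field names and log structure are assumptions for illustration.

```python
import datetime

# Hypothetical sketch of a demographic-update audit log: each change
# carries a UTC timestamp, the user making the change, and the old and
# new values, so the change history can be archived and analyzed later.
audit_log: list[dict] = []

def update_demographic(record: dict, field: str,
                       new_value: str, user_id: str) -> None:
    """Apply a demographic change and append an audit entry for it."""
    audit_log.append({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user_id,
        "field": field,
        "old_value": record.get(field),
        "new_value": new_value,
    })
    record[field] = new_value

patient = {"last_name": "SMTIH"}  # a typo corrected at cleansing time
update_demographic(patient, "last_name", "SMITH", user_id="reg_042")
print(patient["last_name"], len(audit_log))
```

In practice this log would live in the EHR or MPI rather than in memory, but the same fields (timestamp, user, old value, new value) are what make merge reports and root-cause analysis possible.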
Addressing data defects at the source, when possible, will increase efficiency and improve stakeholders’ confidence in the quality of the data. This is especially important, since some organizations invest in manual review of potential duplicates out of fear of automating false positive matches that could have serious patient safety repercussions. As a result, there is a strong industry interest in establishing policies internally and industry-wide that call for adherence to standard procedures in the patient registration process.
While it is not always possible to make changes at the original source of data capture, it is important to hold data providers accountable for the quality of their data. (See Provider Management.)
Example Work Products
It is important for quality rules to be implemented consistently across the data lifecycle, both to ensure data consistency and to prevent improvement efforts from being accidentally undermined by inconsistent data handling in downstream systems.
The lack of standard patient matching processes and technologies threatens patient safety and organizational efficiency, which will be exacerbated as interoperability increases in various forms. For example, patient record matching algorithms that differ widely in terms of required attributes and precedence rules sometimes result in incorrectly merged records that cannot be unmerged.
Example Work Products
1.1 Are data cleansing requirements derived from the results of data profiling or data quality assessments?
2.1 Does the organization evaluate and improve business processes to prevent repeatedly cleansing the same patient demographic data?
2.2 Are cleansing activities for patient demographic data evaluated according to business and technical impacts?
2.3 Is a history of patient demographic data changes resulting from cleansing activities maintained?
2.4 Does the organization have a policy and processes to ensure that patient demographic data is modified at the point of origination in accordance with quality rules?
3.1 Are patient data cleansing rules applied consistently across the patient demographic data lifecycle?