Patient Demographic Data Quality Framework

Data Requirements Definition

Purpose

Ensures that data produced and consumed satisfies business objectives, is understood by all relevant stakeholders, and meets the needs of the business processes that create and use the data.

Introductory Notes

While most organizations have a comprehensive approach to defining requirements for information system functionality, the corresponding data requirements are often neglected by comparison. Typically, attention is focused on system behavior; for example, “The system shall display a patient’s name history”, “The system shall require that the Social Security Number is entered twice”, or “The system shall display the message ‘check for existing patient’ if the user enters the same name, birth date, and gender as an existing patient record.”

It is not uncommon for an IT project team to quickly design a database during software development, without reference to business terms, data standards (names, metadata, allowed values, ranges, lengths, etc.), or quality rules. Organizations are much better served when the selection of, and specifications for, data used to satisfy business objectives are prioritized, validated by stakeholders, and well documented through a repeatable process.

Data requirements definition establishes the process used to identify, prioritize, precisely formulate, and validate the data needed to achieve business objectives. When documenting data requirements, data should be referenced in business language, reusing approved standard business terms if available. If business terms have not yet been standardized and approved for the data within scope, the data requirements process provides the occasion to develop them. For patient demographic data, governance should be engaged in validating data requirements, with representation from supplying and consuming business areas across the lifecycle to ensure that their requirements are met.

Data requirements definition should follow an organized and sequential discovery and decomposition process. Business rules for system behavior should be developed in parallel with the logical design of the destination data store; this method is bi-directional and iterative. Data requirements should be represented in the logical design of the data store and should reflect standardization across projects.

If data in the new data store already exists elsewhere and will migrate, profiling should be performed to ensure that it meets the business expectations and requirements prior to population (See Data Profiling). This may positively impact the design process by surfacing the need for additional quality rules or specifications, and it will improve the percentage of requirements satisfied and reduce the amount of rework for future releases.
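As a minimal sketch of such a profiling pass, the Python below counts how many existing records violate two requirements before migration: a gender code restricted to M, F, U and a birth date in MMDDYYYY format. The record layout, the field names gender and birth_date, and the profile function are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical requirement values, borrowed from the template examples in this section.
ALLOWED_GENDER = {"M", "F", "U"}
BIRTH_DATE_FORMAT = "%m%d%Y"  # MMDDYYYY


def is_valid_birth_date(value: str) -> bool:
    """Return True if value is an 8-digit MMDDYYYY string naming a real calendar date."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, BIRTH_DATE_FORMAT)
    except ValueError:
        return False
    return True


def profile(records: list[dict]) -> dict:
    """Count requirement violations per field across the source records."""
    violations = Counter()
    for rec in records:
        if rec.get("gender") not in ALLOWED_GENDER:
            violations["gender"] += 1
        if not is_valid_birth_date(rec.get("birth_date") or ""):
            violations["birth_date"] += 1
    return {"total": len(records), "violations": dict(violations)}
```

A non-zero violation count for a field suggests that either the source data needs cleansing or the requirement needs an additional quality rule before the data store is populated.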

It is advisable to develop a standard template for data requirements specification, for use with new systems, data store consolidations, data repositories (e.g., Master Patient Index, enterprise data warehouse), and data exchange mechanisms. The data requirements definition process contributes to the creation and validation of business terms and definitions, which link to metadata, data standards, and the business processes that manage and process the data. The template can be as simple as a spreadsheet capturing, for example, the following information:

  • Business term – the data element name in business English, e.g., Street Address;
  • Term definition – the approved definition of the business term, e.g., Birth Date – the date on which a person was born;
  • Originating business process – the process that creates the data, e.g., Patient Registration;
  • Consuming business process(es) – the process or processes that use the data, e.g., Clinical Care, Laboratory, Claims;
  • Modifying business process(es) – the process or processes through which the data can be modified, e.g., Billing;
  • Owner – the name of the individual who has the responsibility for ensuring that the business term is correct and approved; and/or the ability to grant or deny permission for access; and/or the individual who manages the business process that creates the data element – however the responsibility is assigned;
  • Steward – the name of the individual who represents the data element in governance activities, on behalf of the entire organization;
  • Logical name – the business term transformed to the organization’s data design standards, e.g., Street Address 1 Text, Street Address 2 Text;
  • Allowed values – the codes, minimum/maximum ranges, etc. which are acceptable, e.g., M, F, U;
  • Values format – how the values are represented, e.g., MMDDYYYY, 60x (text characters), 999-99-9999 (SSN), 2x (state code), 9-999-999-9999 (phone number), etc.;
  • Originating data source(s) (if acquired) – e.g., Registration Capture System;
  • Source table name – the name of the table within the source, if applicable, e.g., PT_PRFLE (patient profile);
  • Source column name – the name of the column containing the data in the data source, e.g., PT_FRST_NM;
  • Physical name – the name of the term developed for the physical database in which it is or will be stored, applying physical data standards, e.g., ST_ADDR_1_TX; and
  • Quality rule(s) – the automated test or tests that will be applied to the data element upon entry, e.g., First Name must contain more than one character, First Name must contain a single word (no extra components such as suffixes).
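One way to picture a row of such a template is as a small record that carries its own quality rules. The sketch below is illustrative only: the DataRequirement class, its field subset, the logical name, the placeholder definition, and the reuse of PT_FRST_NM as a physical name are assumptions, while the two First Name rules follow the examples in the list above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataRequirement:
    """One row of the data requirements template (a subset of the fields listed above)."""
    business_term: str
    definition: str
    originating_process: str
    logical_name: str
    physical_name: str
    consuming_processes: list[str] = field(default_factory=list)
    allowed_values: list[str] = field(default_factory=list)
    # Each quality rule pairs a readable description with an automated test.
    quality_rules: list[tuple[str, Callable[[str], bool]]] = field(default_factory=list)

    def failed_rules(self, value: str) -> list[str]:
        """Return the descriptions of the quality rules that value does not satisfy."""
        return [desc for desc, test in self.quality_rules if not test(value)]


first_name = DataRequirement(
    business_term="First Name",
    definition="The given name of a person",  # placeholder, not an approved definition
    originating_process="Patient Registration",
    logical_name="First Name Text",           # hypothetical logical name
    physical_name="PT_FRST_NM",               # assumed to match the source column example
    consuming_processes=["Clinical Care", "Laboratory", "Claims"],
    quality_rules=[
        ("must contain more than one character", lambda v: len(v.strip()) > 1),
        ("must contain a single word", lambda v: len(v.split()) == 1),
    ],
)
```

Calling first_name.failed_rules("John Jr") returns the description of the single-word rule, which is the kind of feedback an entry screen could show at the point of capture.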

It can be observed from the sample list above, which may vary according to the organization, that the data requirements definition process is dependent on, or may become the occasion for, executing many of the data management processes described in this document, supported by corresponding work products: Business Glossary, Metadata Management, Data Governance, Data Lifecycle Management, Data Quality Assessment, Data Cleansing and Improvement, and Provider Management. This is a practical illustration of the synergy of best practices.

The organization should apply the requirements definition process and standard template when considering adding new patient demographic data elements, such as mother’s maiden name, previous address, or previous phone number. The effects on existing business processes, matching algorithms, confidence in patient identity, and projected development and maintenance costs should be analyzed, then reviewed and approved through governance.

Adopting data requirements from regulatory and industry sources is highly advised for an organization seeking to improve the quality of its patient demographic data. For example, there are a number of healthcare industry efforts underway to advocate adoption of standardized data attributes to improve patient identity integrity. The recommended data sets proposed to improve matching vary; a number of relevant standards are referenced in the “Patient Identification and Matching Final Report,” published by Audacious Inquiry in 2014 for ONC.

Establishing and following sound practices for defining data requirements is critical to minimizing data complexity over time. Effectively implementing this process will yield the following benefits:

  • Ensures that knowledgeable individuals determine what data is needed;
  • Increases the ability to share data across the organization and among organizations;
  • Ensures proactive data quality measures are built into systems and data stores;
  • Strengthens the relationship between data and business processes;
  • Establishes data ownership, stewardship, and lineage; and
  • Enhances the business glossary and builds metadata assets.

Additional Information

Standard business glossary terms and definitions support common understanding for shared concepts by all relevant stakeholders. Using business glossary terms as a starting point for data requirements definition ensures that similarly described data attributes can be clearly understood. Precise meaning provides the foundation for consistent data capture and interpretation.

For example, Patient Name is ambiguous without additional information, whereas Patient Last Name may be overly specific for the glossary. Establishing a common term, Last Name, and giving it a precise definition can ensure that all instances of a person’s last name are implemented consistently and support the same meaning. Requirements should document traceability from attributes to business glossary terms.
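The traceability described above can be checked mechanically. The following sketch assumes a glossary held as a simple mapping and a set of attribute names invented for illustration (PT_LAST_NM, GUARANTOR_LAST_NM, PT_BRTH_DT, PT_NM); it reports attributes whose business term has not been approved in the glossary.

```python
# Hypothetical approved glossary: business term -> definition.
GLOSSARY = {
    "Last Name": "The family name of a person",  # placeholder definition
    "Birth Date": "The date on which a person was born",
}

# Hypothetical requirement attributes, each traced to a business term.
ATTRIBUTE_TO_TERM = {
    "PT_LAST_NM": "Last Name",
    "GUARANTOR_LAST_NM": "Last Name",
    "PT_BRTH_DT": "Birth Date",
    "PT_NM": "Patient Name",  # ambiguous term that is not in the glossary
}


def untraced_attributes(mapping: dict[str, str], glossary: dict[str, str]) -> list[str]:
    """List attributes whose business term is missing from the approved glossary."""
    return sorted(attr for attr, term in mapping.items() if term not in glossary)
```

Here two different attributes trace to the single term Last Name, so both inherit the same definition, while PT_NM is flagged for governance review.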

Example Work Products

  • Requirements mapping to business terms and definitions
  • Data requirements documentation
  • Stakeholder approval of requirements

Additional Information

The data requirements process can be documented by creating a template that is annotated with instructions. Templates ensure consistency and allow more emphasis to be given to stakeholder needs rather than how to document them.

The complexity of documenting data requirements can vary. However, basic data requirements definition should include the data attributes required, metadata standards, data owners, business glossary mapping, and identification of relevant business processes, to name a few (See Introductory Notes).

Additional Information

The relationships between data requirements may include one-to-one, many-to-one, one-to-many, and many-to-many. Data modeling techniques are used to design and articulate these complex interrelationships in conceptual, logical, and physical data models. Whether the organization builds its own data stores, or purchases software or services based on a vendor data store design, it is important to be aware of any design constraints, such as allowing only one historical address to be stored.
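To illustrate how a design constraint changes what data survives, the sketch below models a one-to-many relationship between a patient and previous addresses, with an optional cap on stored history. The Patient and Address classes and the max_previous parameter are hypothetical and do not describe any particular vendor design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Address:
    street_address_1: str
    state_code: str


@dataclass
class Patient:
    """A patient with one current address and a list of historical addresses."""
    last_name: str
    current_address: Address
    previous_addresses: list[Address] = field(default_factory=list)
    # None models an unconstrained one-to-many design; 1 models a store that
    # can keep only one historical address.
    max_previous: Optional[int] = None

    def move_to(self, new_address: Address) -> None:
        """Make new_address current and keep the old one as history, newest first."""
        self.previous_addresses.insert(0, self.current_address)
        if self.max_previous is not None:
            # Older history beyond the limit is lost.
            del self.previous_addresses[self.max_previous:]
        self.current_address = new_address
```

With max_previous=1, a second move discards the earliest address, which may matter if matching algorithms rely on address history.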

Example Work Products

  • Standard data requirements template
  • Requirements mapping to business objectives
  • Requirements mapping to data model(s)
  • Stakeholder requirements approvals
  • Review board notes and decisions

Additional Information

While standard business glossary terms and definitions support shared meaning, other metadata standards are needed to enable patient demographic data to flow smoothly from data store to data store across the lifecycle from creation to deletion.

Metadata standards include data attribute names, descriptions, privileges, data governance roles, data types (integer, string, date, time, etc.), valid values, domain ranges, etc.

Data requirements that follow common metadata standards support greater control over stored data. In addition, data from systems designed utilizing the same metadata standards may eventually be compared to, or even linked by, automated feeds with little need for data cleansing (See Data Cleansing and Improvement).
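A small comparison shows why shared metadata standards reduce cleansing effort. The sketch below assumes each system describes the same business term with a dictionary of metadata properties; the REGISTRATION and BILLING entries and their values are invented for illustration.

```python
# Hypothetical metadata for the same business term in two systems.
REGISTRATION = {"data_type": "string", "max_length": 1, "valid_values": {"M", "F", "U"}}
BILLING = {"data_type": "string", "max_length": 6, "valid_values": {"MALE", "FEMALE"}}


def metadata_differences(a: dict, b: dict) -> dict:
    """Return {property: (value_in_a, value_in_b)} for every property that differs."""
    keys = a.keys() | b.keys()
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}
```

An empty result indicates that the two definitions agree on every listed property; each reported difference is a transformation or cleansing step that an automated feed between the systems would otherwise need.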

Additional Information

Data requirements should reflect the meaning of the data, and the meaning of the data should align with the purpose of the process. Data requirements result from the definition of process needs. Understanding the steps involved in the processes that create, read, update, or delete data is essential to understanding the correct use of the data, identifying data owners, and better grasping the meaning of the data (See Data Lifecycle Management).

Providing a view into the process promotes trust in the quality of the data. Linking data to its processes also supports data quality improvement initiatives (See Data Cleansing and Improvement). For example, improving the patient registration process to ensure that personnel validate essential patient data attributes up front results in better quality patient demographic data overall.

Example Work Products

  • Requirements mapping to use case documentation (e.g., flow charts, process diagrams with swim lanes for roles)
  • Requirements mapping to business processes
  • Documented data security and entitlement rules

Practice Evaluation Questions

Tier 1: Foundational

1.1 When data requirements are developed, are they expressed using standard business terms consistent with the organization’s business glossary?

Tier 2: Building

2.1 Is a data requirements definition process for patient demographic data documented and followed?

2.2 Are data requirements aligned with internal (or external) data model(s) and other related artifacts?

Tier 3: Advanced

3.1 Do data requirements utilize standard definitions, ranges, and values for patient registration data?

3.2 When data requirements are documented, are they mapped to the business processes that create and modify the data?