Patient Demographic Data Quality Framework

Data Lifecycle Management


Ensures that the organization understands, inventories, maps, and controls its data, as it is created and modified through business processes throughout the data lifecycle, from creation or acquisition to retirement.

Introductory Notes

Data lifecycle management enables an organization to avoid data risks and supports the discovery and application of needed data quality improvements. It is a particularly important topic when addressing interdependent business processes that share or modify data. The data lifecycle spans the creation of data at its point of origin, its useful life in the business processes that depend on it, and its eventual retirement, archiving, or destruction. An organization benefits from defining data usage and corresponding dependencies across business processes for data that is either required by multiple business processes or critical for important business functions.

The classification of lifecycle phases for data assets typically includes the following sequential categories:

  • Business specification (e.g., data requirements, business terms, metadata);
  • Origination (i.e., the point of data creation or acquisition by the organization);
  • Development (e.g., architecture and logical design);
  • Implementation (i.e., physical design, initial population in data store(s));
  • Deployment (i.e., rollout of physical data usage in an operational environment);
  • Operations (e.g., data modifications, data transformations, and integration performance monitoring and maintenance); and
  • Retirement (i.e., retirement, archiving, and destruction).

Data within each major subject area (i.e., broad data groupings such as Organizations, Facilities, Persons) is also classified, traced, and sequenced by its creation, modification, or usage within the primary business processes of the organization. For example, a new patient is registered and provides insurance information; the patient then sees a physician, is subject to treatment and/or laboratory tests, and returns for a follow-up visit; the patient’s insurance is submitted, the insurance payment is received, and the patient is billed.

The corresponding business processes that create data would be as follows: demographic and insurance data is captured through the registration process; office visit information, diagnosis and treatment data, and provider notes are captured through the clinical evaluation and diagnosis process; laboratory data is captured when tests are ordered and when lab results data is added; returning patient data is again captured through the registration process; procedures are documented and sent through the insurance claims process; insurance payments are recorded for that patient through the payment receipt and allocation process; and a bill for uncovered charges is sent through the billing process.

All of the above processes create data about a patient; however, central to them all is patient demographic data. Duplicate records caused by a lack of patient identity integrity, most often occurring at the point of origination through the registration process, can affect treatment, testing, insurance claims, and billing. The demographic data about a patient is a critical data set, and its reference and usage throughout the healthcare lifecycle is ubiquitous. Therefore, it is recommended that organizations analyze every process where it is created and updated, both to ensure completeness and accuracy within patient records and to prevent duplicate records.

It is advised to identify dependencies among business processes using patient demographic data at the attribute level, enabling the organization to develop a comprehensive understanding of data interrelationships. If external organizations are involved in capturing or modifying the data, it may be necessary to determine what processes they follow to discover where defects, anomalies, or missing data occur.

The first step in mapping patient demographic data to supplying and consuming business processes is to model each process that produces, modifies, or consumes the data. This can begin as simply as creating a sequential activity list that indicates how each activity uses the data. For example, it may be that under no circumstances would a nurse or provider delivering clinical care ever make a change in demographic data. In that case, the usage can be classified as Reference (i.e., “Read” access) for that business process. However, it may be that the claims or billing processes occasionally surface the need to correct an inaccurate ZIP code, so that process usage may be classified as Modify.

For any usage other than Reference, the organization can zero in on the activity step within the business process and determine if there is potential for introducing errors. This may lead to improvements in the business process or procedures.
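
As an illustration only, a sequential activity list with usage classifications could be sketched as follows. The process names, activity steps, and the Create category are assumptions made for this example; only Reference and Modify are named in the text above.

```python
from enum import Enum


class Usage(Enum):
    CREATE = "Create"        # assumed category for the point of origination
    REFERENCE = "Reference"  # read-only access
    MODIFY = "Modify"


# Hypothetical activity list: (business process, activity step, usage).
ACTIVITIES = [
    ("Registration", "Capture demographic and insurance data", Usage.CREATE),
    ("Clinical care", "Look up patient before visit", Usage.REFERENCE),
    ("Claims", "Submit claim to insurer", Usage.REFERENCE),
    ("Billing", "Correct inaccurate ZIP code", Usage.MODIFY),
]


def steps_to_review(activities):
    """Return the (process, step) pairs whose usage is anything other than Reference."""
    return [
        (process, step)
        for process, step, usage in activities
        if usage is not Usage.REFERENCE
    ]


for process, step in steps_to_review(ACTIVITIES):
    print(f"{process}: {step}")
```

With this sample list, the registration and billing steps are flagged for review, while the two Reference steps are skipped.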

If the organization has multiple data stores containing patient demographic data, establishing the source to target(s) mapping is another highly useful activity. This consists of the identification of data elements at the point of origin, the identification of other data destinations, and the mapping of the representation in the source to the representation in the target(s). For example, the street address in a patient record as captured in registration may be initially stored as Street Address with a 60-character length limit; when transferred to another system, it may be stored as Patient Address with a 40-character length limit; and when transferred to still another system, it may be stored as Address with a 50-character length limit.
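
A minimal sketch of how such a mapping might be recorded and checked for truncation risk is shown below. The field names and length limits come from the example above; the system names ("System B", "System C") and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    system: str      # hypothetical system name
    name: str        # field name as stored in that system
    max_length: int  # character length limit


# Source field and its downstream representations, per the street address example.
SOURCE = FieldSpec("Registration", "Street Address", 60)
TARGETS = [
    FieldSpec("System B", "Patient Address", 40),
    FieldSpec("System C", "Address", 50),
]


def truncation_risks(source, targets):
    """Return the targets whose length limit is shorter than the source's."""
    return [t for t in targets if t.max_length < source.max_length]


def would_truncate(value, target):
    """Check whether a specific value exceeds a target's length limit."""
    return len(value) > target.max_length


for target in truncation_risks(SOURCE, TARGETS):
    print(f"{SOURCE.name} ({SOURCE.max_length}) -> "
          f"{target.system}.{target.name} ({target.max_length})")
```

Both targets in this example are narrower than the source, so an address longer than 40 characters could be stored differently in each of the three systems.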

Understanding where the data comes from, where it goes, and who can modify it is essential to effective prevention of defects and proactive efforts for data improvement. Over time, the organization is advised to map all business processes involving patient demographic data. Once established, mapping may be reviewed periodically and updated to reflect changes.

The data management function (or role), working with business experts, business process architects, and other stakeholders, often through a data working group (see Governance Management), is typically charged with facilitating the definition and verification of business process to data requirements. The data management function also typically develops and maintains data lifecycle management processes.

When data usage has been mapped to business processes and data has been traced from source to target(s), the organization can realize the following benefits:

  • Identify and reduce process and data bottlenecks;
  • Control redundancy through more accurate identification of duplicate records;
  • Minimize or eliminate unwanted changes to data content;
  • Improve consistency, reliability, and access to needed data;
  • Improve the ability to perform root cause analysis;
  • Trace data lineage across the patient demographic lifecycle; and
  • Improve management of historical data.

    Additional Information

    At a high level, the definition of the lifecycle for patient demographic data requires traceability of data from creation or acquisition, through key transformations, and on to final deletion or archiving. Traceability of data dependencies need not be restricted to applications that are linked through automated data feeds. Often, manual updates are made based on reports from siloed applications. For example, updates may be made by manually reviewing potentially matching patient records, identified using rule-based algorithms, to determine if they refer to the same patient.
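
    The rule-based identification of potentially matching records mentioned above might look like the following sketch. The matching rule (same date of birth and same normalized last name), the record fields, and the sample records are all assumptions chosen for illustration; real matching rules are typically richer and are set by the organization.

```python
from itertools import combinations


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def is_candidate_match(a, b):
    """Hypothetical rule: same date of birth and same normalized last name."""
    return (
        a["dob"] == b["dob"]
        and normalize(a["last_name"]) == normalize(b["last_name"])
    )


def candidate_pairs(records):
    """Return ID pairs that should be queued for manual review."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if is_candidate_match(a, b)
    ]


# Fabricated sample records for demonstration only.
RECORDS = [
    {"id": 1, "last_name": "Smith", "dob": "1980-02-14"},
    {"id": 2, "last_name": " smith ", "dob": "1980-02-14"},
    {"id": 3, "last_name": "Smith", "dob": "1975-07-01"},
]

print(candidate_pairs(RECORDS))
```

    The output is a review queue rather than a merge decision: a person still determines whether each flagged pair refers to the same patient.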

    Example Work Products

    • Business process to data mapping, specifying creates, reads, updates, and deletes (e.g., ‘CRUD’ matrix)
    • Consumer and producer matrix
    • Business process descriptions
    • Data flow diagrams

    Additional Information

    Using disparate sources to capture the same data on the same patient is a common behavior and a typical cause of data quality issues. In some cases, different data sources may be used in the same process at different times and/or by different people. In other cases, inconsistent patient records surface when data sharing is needed to accomplish different processes along the care continuum.

    An authoritative data source is an official source of information that may include a trusted source or a system of record. As such, the authoritative data source may refer to the source that creates the data or the best source for a specific data set, which will depend on context.

    Data governance should provide guidance on how authoritative sources are to be defined, prioritized, and controlled to reduce the risk of creating inconsistent or duplicate patient records.

    Any quality issues identified at any point along the patient data lifecycle should be addressed in the authoritative source to ensure all other processes will benefit from the correct information. Accordingly, the systems development lifecycle process should require reference to and adoption of approved shared data from authoritative sources.
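
    One possible way to make this routing explicit is an attribute-to-source lookup, sketched below. The attribute names, system names, and the route_correction helper are hypothetical; the actual assignments would come from data governance.

```python
# Hypothetical mapping of demographic attributes to their authoritative source.
AUTHORITATIVE_SOURCE = {
    "date_of_birth": "registration",
    "street_address": "registration",
    "insurance_member_id": "billing",
}


def route_correction(attribute, reported_by):
    """Describe where a reported quality issue should be corrected.

    Raises KeyError if no authoritative source has been defined for the
    attribute, which signals a gap for data governance to resolve.
    """
    source = AUTHORITATIVE_SOURCE.get(attribute)
    if source is None:
        raise KeyError(f"No authoritative source defined for {attribute!r}")
    return {"attribute": attribute, "reported_by": reported_by, "fix_in": source}


print(route_correction("street_address", reported_by="claims"))
```

    In this sketch, an address error surfaced by the claims process is directed back to registration instead of being patched locally.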

    Example Work Products

    • Defined authoritative data sources and approved attributes
    • Data source selection criteria
    • Reference to authoritative data sources as part of system development process
    • Data attribute to source mappings

    Additional Information

    At a minimum, the scope of data lifecycle management should address the set of critical data attributes that have been identified for ensuring a patient record can be reconciled to an actual patient. All new development and technology purchases should use the appropriate business glossary terms when documenting data requirements, and the relationships from the glossary terms to metadata corresponding to the implemented technology should be documented and subject to review by data governance.

    While there is no unanimity within the healthcare industry on the entire set of critical data attributes necessary to ensure patient identity integrity within every context, there is a consensus forming around a minimum set that applies to most settings (see Data Quality Assessment).

    Additional Information

    Processes and applications that impact patient demographic data change over time. Accordingly, traceability of key data attributes along the lifecycle should be regularly reviewed and kept up to date. It can be particularly important to keep track of manual reconciliation processes. These processes can be improved, formalized, and even automated over time, resulting in a more efficient flow of patient demographic data through its lifecycle.

    Example Work Products

    • Approved patient demographic data lifecycle scope document
    • Change management process for defined data sets
    • Lifecycle data mapping of core business processes
    • Governance process to identify data dependencies
    • Metrics measuring progress and authoritative data sources adoption

Practice Evaluation Questions

Tier 1: Foundational

1.1 Is the patient demographic data lifecycle for key business processes defined and understood by stakeholders?

Tier 2: Building

2.1 Do producers and consumers of patient demographic data apply consistent criteria when selecting approved authoritative sources?

Tier 3: Advanced

3.1 Are stakeholder requirements for patient demographic data mapped and aligned to an approved data lifecycle scope?

3.2 Does the organization maintain and periodically review business process to data mapping?