Patient Demographic Data Quality Framework

Metadata Management

Purpose

Establish the processes, standards, and infrastructure for specifying well-organized, comprehensive, and accessible information about the data assets under management.

Introductory Notes

Metadata equates to an organization’s knowledge about its data assets. It is information about data: it identifies, describes, and links. It provides context, structure, and classification. It also enables effective usage, retrieval, and management of data. A metadata repository is a compendium of data asset knowledge, typically compiled and enhanced over time in manageable phases.

Metadata empowers the business user. It is structured knowledge about the data assets. Therefore, its usefulness depends on accuracy and meaning for all stakeholders. Even the smallest organization benefits from adopting a thoughtful approach to metadata. For instance, defining and applying prescribed attribute formats, such as 60-character street addresses in initial caps with standard abbreviations (e.g., BLVD, ST), will prevent some patient record matching errors.
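As an illustration of applying a prescribed attribute format, the following Python sketch standardizes a street address. It is a minimal example under stated assumptions: the function name, the abbreviation table, and the decision to reject (rather than truncate) addresses longer than 60 characters are hypothetical and would be replaced by the organization's approved metadata standard.

```python
import re

# Hypothetical abbreviation table; a real one would come from the
# organization's approved metadata standard.
STREET_ABBREVIATIONS = {
    "BOULEVARD": "BLVD",
    "BLVD": "BLVD",
    "STREET": "ST",
    "ST": "ST",
}
MAX_LENGTH = 60  # prescribed maximum length of the street address attribute


def standardize_street_address(raw: str) -> str:
    """Return the address in the prescribed format, or raise ValueError."""
    # Treat periods and commas as separators, then collapse whitespace.
    words = re.sub(r"[.,]", " ", raw).split()
    formatted = []
    for word in words:
        key = word.upper()
        if key in STREET_ABBREVIATIONS:
            # Street types are replaced by their standard abbreviation.
            formatted.append(STREET_ABBREVIATIONS[key])
        else:
            # All other words are written in initial caps.
            formatted.append(word.capitalize())
    result = " ".join(formatted)
    if len(result) > MAX_LENGTH:
        raise ValueError(f"address exceeds {MAX_LENGTH} characters")
    return result
```

With this sketch, "123 Main St." and "123 MAIN STREET" are stored as the same value, which removes one source of spurious mismatches between records. A simple word-by-word lookup like this cannot tell a street type from a name such as "St. James", so a production rule set would need more context.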

For larger organizations that may have multiple data stores housing patient demographic data and that engage in data integration and data exchange, developing meaningful metadata is critical for tracking and tracing data at rest and in motion (e.g., what the originating source was, what the data element was named in that source, and at what point an error occurred). Capturing accurate and comprehensive metadata directly supports effective efforts to improve data quality.

Metadata is usually classified in three primary categories:

Business Metadata: Descriptive information employed to understand, locate, search, and control content. It can include elements such as terms and definitions (i.e., the Business Glossary), values, authors, keywords, and publishers. Business metadata may also include identification of business domains (e.g., registration, billing), related subject areas (e.g., clinical data), business rules, and data quality rules. Business metadata is the starting point for mapping to related persistent work products such as standards and procedures.

Operational Metadata: Descriptive administrative information that assists in managing a data asset. It includes information such as who created or updated a record and when it was created or updated (i.e., data provenance); information needed for archival or integration; and access rights and entitlement restrictions (e.g., privacy codes). Administrative metadata related to governance, including information about individuals involved in governing the data (e.g., data owner, data steward, data custodian), is included in this category. It may also describe governance bodies (e.g., executive data council) and their authorities, participants, structure, and responsibilities. In addition, operational metadata is employed to surface process improvements to enhance productivity and improve data quality.

Process metadata, a subcategory of operational metadata, addresses process steps for data production and maintenance, as well as for data quality measurement and analysis. Examples include quality rules and control requirements.

Technical Metadata: Descriptive information about data stored in physical databases, as well as its transformations through automated processes. For example, it describes the content (e.g., tables and columns) and location (e.g., server) of data stores and interfaces, as well as changes to data sources. It can include information about data types (e.g., name, number), links to related files, database indexes, etc. Technical metadata consists of the following subcategories: 1) “run-time” or dynamic metadata (e.g., configuration or messaging information), and 2) “design-time” or static metadata (e.g., physical data models, data dictionary, and load and transformation scripts).
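To make the three categories concrete, the following Python sketch shows one possible shape for a single metadata repository entry. All class and field names are hypothetical and chosen only for illustration; an actual repository would define its own properties through governance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessMetadata:
    term: str                 # business glossary term
    definition: str           # agreed business definition
    business_domain: str      # e.g., registration, billing
    data_quality_rules: List[str] = field(default_factory=list)


@dataclass
class OperationalMetadata:
    data_owner: str
    data_steward: str
    privacy_code: str         # access or entitlement restriction


@dataclass
class TechnicalMetadata:
    server: str               # location of the data store
    table: str
    column: str
    data_type: str


@dataclass
class MetadataEntry:
    """One data element described across all three categories."""
    business: BusinessMetadata
    operational: OperationalMetadata
    # One business term may be implemented in several physical columns.
    technical: List[TechnicalMetadata] = field(default_factory=list)
```

Keeping the technical descriptions as a list under one business term reflects the idea that the business definition is the starting point and that physical representations are mapped to it.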

An organization that develops its patient demographic metadata will realize a number of direct and indirect benefits, as metadata reduces data risks and is essential for:

  • Improving data quality through common understanding and agreement about names, definitions, values, ranges, and formats;
  • Improving the accuracy of patient record matching and identifying duplicates;
  • Tracing the origin of data and assessing impacts across the lifecycle;
  • Building an accessible knowledge base for stakeholders across the organization;
  • Determining when to archive a record; and
  • Mapping data from multiple sources for integration and sharing.

Metadata should be appropriately governed. It is recommended that data governance provide oversight for defining and conforming to metadata categories, properties, and updates. Governance also fosters adoption and consistent use across the organization. The data management function or role is typically responsible for the population and maintenance of the metadata repository.

Additional Information

Metadata documentation is valuable when there is a need to determine if the data required for a new system, profiling effort, or algorithm is both available and appropriate for the task. For example, the proposed requirements for patient demographic data matching algorithms include, at a minimum, the following:

  • Given Name
  • Last Name
  • Date of Birth
  • Administrative Gender (see Reference 5)
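A check of this kind can be sketched in Python as follows. The required element names are taken from the list above; the function name and the assumption that the repository can export its documented business terms as a list of strings are hypothetical.

```python
# Minimum data elements proposed for patient matching algorithms.
REQUIRED_MATCHING_ELEMENTS = [
    "Given Name",
    "Last Name",
    "Date of Birth",
    "Administrative Gender",
]


def missing_matching_elements(documented_terms):
    """Return the required elements that are not documented for a data store.

    `documented_terms` is assumed to be the list of business terms that the
    metadata repository records for the data store under review.
    """
    documented = {term.strip().lower() for term in documented_terms}
    return [
        element
        for element in REQUIRED_MATCHING_ELEMENTS
        if element.lower() not in documented
    ]
```

An empty result indicates that the metadata documents every required element; it does not by itself confirm that the underlying data is fit for the task, which still depends on the documented definitions, formats, and quality rules.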

Metadata should be clearly described and stored in a location that is easily accessible for all relevant stakeholders. Defining and populating metadata is a task that has few dependencies. Depending on the resources and needs of the organization, metadata can be defined and populated in manageable phases, either as a standalone effort or in association with new development efforts.

Example Work Products

  • Metadata compendium or virtual metadata repository (single or multiple sources)

References

5. Research has shown that effective patient record matching may require additional data elements to avoid auto-merging of twins and individuals with common names: Middle Name, Social Security Number, Address, and Phone Number.

Additional Information

When a new data store is built or software services are acquired, there is new metadata to be captured and managed. Many organizations view defining metadata primarily as a necessary but minor task required to complete a requirements or design document. If different teams are creating their own metadata for a point solution, the organization does not realize the cumulative benefits of accurate, standardized metadata. As a result, metadata is often poorly and inconsistently defined, captured, and linked across the lifecycle.

For any data that is highly shared, such as patient demographic data, it is best to engage business representation through governance to define and approve the metadata scope (Business, Technical, and Operational) that will be captured, standardized, and managed.

Adopting consistent formats and acceptable values and adhering to them are important and should be emphasized in a defined process. However, accurately describing what the data represents is even more critical. This is why it is important to start with the business terms and then move to related metadata representations. For example, an inconsistent approach to use of legal names versus nicknames can cause duplicate patient records.

Additional Information

Business metadata includes categories such as Person and Address, but may also include non-demographic categories of data such as Clinical, Facility, etc. Business properties may include type of person, such as patient, physician, nurse, etc. In addition, properties that represent operational and technical attributes associated with the data should also be included, such as steward, time stamp, data owner, and who updated the record.

Standards should be established to define the conventions to be adopted across the lifecycle to ensure consistency of data flow with minimal need for rework. In particular, if metadata standards are not enforced there is a risk of inconsistent population of fields that can lead to an increase in duplicate patient records. This can be addressed through policies and the implementation of controls that restrict data entry to standard codes, values, and formats in the data store that provides initial capture.
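A data entry control of this kind might look like the following Python sketch. The field names, the code set for administrative gender, and the date format are hypothetical placeholders; the actual values would come from the organization's metadata standards.

```python
from datetime import datetime

# Hypothetical standard code sets, keyed by field name.
ALLOWED_VALUES = {
    "administrative_gender": {"F", "M", "O", "U"},
}
# Hypothetical prescribed formats for date fields.
DATE_FORMATS = {
    "date_of_birth": "%Y-%m-%d",
}


def validate_entry(record):
    """Return a list of standards violations for one captured record."""
    errors = []
    for field_name, allowed in ALLOWED_VALUES.items():
        if record.get(field_name) not in allowed:
            errors.append(f"{field_name}: value is not in the standard code set")
    for field_name, date_format in DATE_FORMATS.items():
        value = record.get(field_name) or ""
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            errors.append(f"{field_name}: value does not match the standard format")
    return errors
```

Rejecting or flagging a record at the point of initial capture, rather than correcting it downstream, keeps nonstandard values from propagating to the systems that later attempt to match patient records.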

It is a best practice for the organization’s metadata repository to be populated with additional categories and classifications of metadata according to a phased implementation plan. It is equally important that the repository be linked to the implemented architecture components. Metadata, and any changes to metadata, should be validated against existing data stores (such as physical column names) to ensure that data store designs match the built environment and that discrepancies between the design and build can be identified, documented, and approved.
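The validation of documented metadata against physical column names can be sketched as follows. This Python example assumes, purely for illustration, that the data store is a SQLite database, and reads its column names with SQLite's PRAGMA table_info statement; the function name and the result keys are hypothetical. Other database platforms expose the same information through their own catalog views.

```python
import sqlite3


def compare_design_to_build(conn, table, documented_columns):
    """Compare documented column names with the columns that actually exist.

    `table` is assumed to be a trusted identifier, because it is
    interpolated directly into the PRAGMA statement.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    built = {row[1] for row in rows}
    documented = set(documented_columns)
    return {
        "documented_not_built": sorted(documented - built),
        "built_not_documented": sorted(built - documented),
    }
```

Both result lists are discrepancies to be reviewed: a documented column that was never built, or a built column that the repository does not describe. Each would be documented and approved, or corrected, through the metadata management process.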

Additional Information

While changes to metadata may occur infrequently, they can have significant impacts on downstream systems and users. The trigger for a metadata change can be the implementation of a new system, updates to an existing system, or updates to related business glossary terms and definitions. For example, a change in the business glossary definition for last name may require adding a suffix attribute to every system that allows a suffix to be included in the family name, and redefining all metadata related to last name.

Knowledge of applicable systems and users of patient demographic data is needed to conduct an impact assessment on data changes. Therefore, operational metadata should document all stakeholders of a data set across the lifecycle. Because adding metadata to the implemented environment can be costly, organizations may require statistical proof that proposed additional data attributes would improve the quality of patient demographic data and record matching before undertaking to adopt an external standard.
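One way to support such an impact assessment is to record, as metadata, which data elements feed which others, and then to trace everything downstream of a proposed change. The Python sketch below assumes a simple mapping from each element to the elements populated from it; the system and element names are hypothetical.

```python
from collections import deque

# Hypothetical lineage metadata: each element maps to the elements
# that are populated from it.
LINEAGE = {
    "registration.last_name": ["mpi.family_name", "billing.patient_last_name"],
    "mpi.family_name": ["exchange.family_name"],
}


def downstream_impacts(lineage, changed_element):
    """Return every element reachable from `changed_element`, nearest first."""
    seen = set()
    ordered = []
    queue = deque([changed_element])
    while queue:
        current = queue.popleft()
        for target in lineage.get(current, []):
            if target not in seen:  # guards against cycles and repeats
                seen.add(target)
                ordered.append(target)
                queue.append(target)
    return ordered
```

For the sample lineage, a change to "registration.last_name" reaches the master patient index, billing, and exchange elements. Joining this list to the operational metadata for each element (owner, steward, consuming users) would identify the stakeholders to involve in the assessment.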

Example Work Products

  • Metadata management policy
  • Metadata repository or repositories
  • Metadata governance and publication approval documentation (including business and technical stakeholders)
  • Metadata standards
  • Metadata change log

Additional Information

Even when metadata is developed in accordance with a defined process, the overall management of metadata is complex and requires thoughtful planning to ensure success. There are tools, processes, and people involved in the ongoing management of metadata. Plans help to align resources with goals and objectives to ensure that metadata steadily evolves to describe the patient demographic data environment accurately and is managed effectively. The plan should include the following topics:

  • Goals and objectives for metadata management;
  • A RACI chart (e.g., governance teams, subject matter experts, other stakeholders);
  • The metadata scope (categories, properties, and other metadata attributes);
  • Tool / mechanism selection or maintenance process; and
  • Sequence plan describing how the organization will develop and populate metadata.

The metadata plan should address the lifecycle of patient demographic data, as well as any external standards that may be beneficial to adopt to improve interoperability with relevant industry participants and regulators.

Example Work Products

  • Metadata management plan
  • Metadata responsibilities for data management function
  • Metadata management organization standards
  • Gap analysis results comparing implemented platforms against metadata
  • Metadata sequence plan milestone and progress reports

Practice Evaluation Questions

Tier 1: Foundational

1.1 Is metadata defined, stored, and accessible?

Tier 2: Building

2.1 Is a metadata management process defined and followed?

2.2 Are metadata categories, properties, and standards defined and followed?

2.3 Is metadata used to capture data interdependencies and conduct impact analysis on potential data changes?

Tier 3: Advanced

3.1 Is the organization following a defined plan for capturing, maintaining, and governing metadata?