Metadata: what are the differences between Business Glossary, Data Catalog and Data Dictionary?

Kirey Group

  

    By Laura Bacci, Principal Consultant at Kirey Group

    The terms Business Glossary, Data Catalog and Data Dictionary are often used in the Data Governance field, but their definitions are not always standardized. This lack of clarity and of a sharp distinction between them often causes overlaps and conflicts over responsibilities.

    Element           | Content            | Application Level    | Responsibility
    ------------------|--------------------|----------------------|--------------------
    Business Glossary | Business Metadata  | Corporate            | Business
    Data Dictionary   | Technical Metadata | Corporate/Divisional | Technology
    Data Catalog      | Data Localization  | Corporate/Divisional | Technology/Business

    Tab. 1: Three Data Governance elements

    As listed in Tab. 1, these tools collect metadata that help contextualize data, although they have different characteristics: some collect business-oriented metadata (Business Glossary), others technical metadata (Data Catalog and Data Dictionary). Some of the confusion is easy to understand, considering how Data Governance generally evolves within an organisation. For example, it is fairly typical to start by creating a Data Catalog and then build a governance program from it; or, for a Data Quality program, to start by defining the Data Dictionary.

    This type of approach can deliver immediate results, but it should be "adjusted" and corrected to fully capture the additional value obtained from implementing all three instruments together within a Data Governance project.


    Business Glossary

    A Glossary should focus on the business terms used within the organization. It should be easily understandable at all levels and should define what each term means in business terms. What is meant by Customer? By Tax Code? By Reserve? By Award? By Current Account?

    The Glossary exists to answer these kinds of questions. A Glossary is clearly specific to a given company, even if, to facilitate interoperability, it should be built so that most of its terms are common to companies operating in the same sector.

    The advantage of such a Glossary is that a common vocabulary can be shared within the organization, with positive effects on operational activities, both functional and technical, and on the projects that are executed. A Glossary should apply at all levels of the company. It can be placed at the divisional level when the company's divisions deal with significantly different businesses and therefore have to adopt significantly different terminology.

    Because of its scope and the skills required to draft it, responsibility for the Glossary should not be delegated to Information Technology. The Glossary belongs to the Company or, more precisely, to its functional component rather than its technological one.


    Data Dictionary

    A Dictionary should focus on descriptions and details of the physical structure of data. Every data stream (file or database table) used within the organization should be registered in the Dictionary, including details such as type, permissible length, technical name, transformations (the lineage[1]) and any other relevant technical details. These details are the mostly technical metadata mentioned at the beginning of this article. They allow data architects and data engineers to understand how to join and query the data when designing information systems and producing the reporting used by the functional component of the company.
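
    As an illustration, a single Dictionary entry could be modeled along the following lines. This is only a sketch: the class name, the field names and the sample lineage value are assumptions made for the example, not a standard schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    """Technical metadata for one physical field (illustrative schema)."""
    technical_name: str                # physical column or field name
    data_type: str                     # e.g. "string", "integer"
    max_length: Optional[int] = None   # permissible length, if any
    nullable: bool = True
    unique: bool = False
    lineage: Tuple[str, ...] = ()      # upstream fields this one derives from


def describe(entry: DictionaryEntry) -> str:
    """Render an entry as a compact, DDL-like one-line summary."""
    length = f"({entry.max_length})" if entry.max_length is not None else ""
    parts = [f"{entry.technical_name} {entry.data_type}{length}"]
    if not entry.nullable:
        parts.append("NOT NULL")
    if entry.unique:
        parts.append("UNIQUE")
    return " ".join(parts)


tax_code = DictionaryEntry(
    technical_name="tax_code",
    data_type="string",
    max_length=16,
    nullable=False,
    unique=True,
    lineage=("crm.customers.tax_code",),  # hypothetical upstream source
)
print(describe(tax_code))  # tax_code string(16) NOT NULL UNIQUE
```

    Keeping the entry immutable (frozen) reflects the idea that technical metadata should change only through a controlled update, not by accident.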


    Data Catalog

    Finally, the Catalog works as a register that identifies where data are located. It should be seen as an asset that is the single reference source for the location of any data set and that serves all possible internal needs: at the technical-operational level, at the Information Technology level, and for Data Science or Business Analytics activities.

    Like the Glossary, the Catalog is developed at the divisional level rather than at the corporate level if some divisions carry out a business significantly different from the others.

    Compiling the Catalog requires knowing the location of all the data resources present in the company. Most of these data are managed by IT, so responsibility for the Catalog is usually entrusted to Information Technology. The Catalog should also list the data produced and managed by the functional components. Consequently, its compilation should be a joint activity involving both the technical and functional components of the organisation.
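
    A minimal sketch of such a register is shown below, assuming that each location is identified by database, table and field, and that each one records who manages it. All names and sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Location(NamedTuple):
    database: str
    table: str
    field: str
    managed_by: str  # "IT" or "Business" (labels assumed for this example)


Catalog = Dict[str, List[Location]]


def register(catalog: Catalog, data_element: str, location: Location) -> None:
    """Record one more place where a data element is stored."""
    catalog.setdefault(data_element, []).append(location)


def locate(catalog: Catalog, data_element: str) -> List[str]:
    """Return every known location as 'database.table.field'."""
    return [
        f"{loc.database}.{loc.table}.{loc.field}"
        for loc in catalog.get(data_element, [])
    ]


def managed_outside_it(catalog: Catalog) -> List[str]:
    """List data elements with at least one location not managed by IT."""
    return sorted(
        name
        for name, locations in catalog.items()
        if any(loc.managed_by != "IT" for loc in locations)
    )


catalog: Catalog = {}
register(catalog, "customer", Location("crm_db", "customers", "id", "IT"))
register(catalog, "customer", Location("sales_sheets", "leads", "customer_ref", "Business"))
register(catalog, "invoice", Location("erp_db", "invoices", "id", "IT"))

print(locate(catalog, "customer"))
print(managed_outside_it(catalog))
```

    The second query shows why the compilation has to be a joint activity: locations managed by the functional components are visible only if those components contribute them.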


    Some examples

    All three tools described above are necessary to implement Data Governance in the best possible way, as they are the foundation on which Data Governance is built. How can we think about governing data if we don’t know what they are, where they are and what they mean for our business?

     

    Element           | Content            | Example
    ------------------|--------------------|--------
    Business Glossary | Business Metadata  | The tax code uniquely identifies natural persons and other entities in their dealings with State institutions and governments.
    Data Dictionary   | Technical Metadata | Stored as a string of 16 alphanumeric characters for natural persons and of 11 digits for subjects other than natural persons. Not nullable and unique.
    Data Catalog      | Data Localization  | The tax code is contextualized to the various processes in which it is used, listing where it is stored in the various data sources: <Database1><table1><field1>, <Database2><table3><field15>

    Tab. 2: Content examples for the three Data Governance elements
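
    The technical metadata in Tab. 2 can be turned directly into checks. The sketch below applies only the rules stated in the table (16 alphanumeric characters or 11 digits, not null, unique); it does not validate the internal structure or check character of a real tax code. The function names and the sample values are assumptions made for the example.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Format rules taken from the Data Dictionary row of Tab. 2.
NATURAL_PERSON = re.compile(r"[A-Za-z0-9]{16}")
OTHER_ENTITY = re.compile(r"[0-9]{11}")


def classify_tax_code(value: Optional[str]) -> Optional[str]:
    """Return the kind of subject the value can belong to, or None if invalid."""
    if value is None:
        return None
    if NATURAL_PERSON.fullmatch(value):
        return "natural_person"
    if OTHER_ENTITY.fullmatch(value):
        return "other_entity"
    return None


def find_violations(values: Iterable[Optional[str]]) -> List[Tuple[int, str]]:
    """Report (position, reason) for null, badly formatted or duplicate values."""
    seen = set()
    violations = []
    for index, value in enumerate(values):
        if value is None:
            violations.append((index, "null"))
        elif classify_tax_code(value) is None:
            violations.append((index, "format"))
        elif value in seen:
            violations.append((index, "duplicate"))
        else:
            seen.add(value)
    return violations


# Made-up sample values, for illustration only.
sample = ["RSSMRA85T10A562S", None, "12345678901", "RSSMRA85T10A562S", "bad"]
print(find_violations(sample))  # [(1, 'null'), (3, 'duplicate'), (4, 'format')]
```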

    Moreover, the three tools should "talk" to each other: if we have recorded the location of all information resources in the Catalog and all the metadata related to the same resources in the Dictionary, we have to link the two pieces of information to take full advantage of both. We also have to link the definitions of the business terms in the Glossary to the resources recorded in the Catalog, to close the circle and know exactly which metadata characterize each business term.
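
    One way to picture these links is a lookup that starts from a business term and follows it through the Catalog to the Dictionary. The sketch below uses plain in-memory dictionaries and placeholder locations modeled on Tab. 2; the structure and the function names are assumptions, not a description of any specific tool.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str, str]  # (database, table, field)

# Glossary: business term -> business definition.
glossary: Dict[str, str] = {
    "Tax Code": "Uniquely identifies natural persons and other entities "
                "in their dealings with State institutions.",
}

# Catalog: business term -> where the data are stored.
catalog: Dict[str, List[Location]] = {
    "Tax Code": [
        ("Database1", "table1", "field1"),
        ("Database2", "table3", "field15"),
    ],
}

# Dictionary: location -> technical metadata (second location left undocumented).
dictionary: Dict[Location, Dict[str, object]] = {
    ("Database1", "table1", "field1"): {
        "type": "string", "max_length": 16, "nullable": False, "unique": True,
    },
}


def resolve(term: str) -> Dict[str, object]:
    """Follow a business term to its locations and their technical metadata."""
    return {
        "term": term,
        "definition": glossary.get(term),
        "locations": [
            {"location": ".".join(loc), "technical_metadata": dictionary.get(loc)}
            for loc in catalog.get(term, [])
        ],
    }


def unlinked_locations() -> List[str]:
    """Catalog locations that have no Dictionary entry yet."""
    return [
        ".".join(loc)
        for locations in catalog.values()
        for loc in locations
        if loc not in dictionary
    ]


print(resolve("Tax Code"))
print(unlinked_locations())  # ['Database2.table3.field15']
```

    The second function shows a practical benefit of linking the tools: gaps, such as a cataloged location with no technical metadata, become easy to detect.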


    An effective and efficient approach

    Full effectiveness and efficiency are achieved by bringing all the information held in the three tools together in a single centralized repository of Data Governance metadata. The metadata related to the entities of Data Quality processes (control rules, outcomes and reports) are also added to this repository. In this way, metadata management becomes the central node for the development of Data Governance. This, at least, is the theory.

    In practice, the approach is often partial, not completely organized or optimized, because of factors such as managerial ability or simply cost. For this reason, it is unrealistic to imagine an approach to Data Governance based on a global project that covers the entire company’s information assets starting from scratch.

    The best approach is an Agile one (borrowing the term from Project Management[2]): local and departmental initiatives are developed first and eventually converge towards a centralized vision. The approach should be incremental, first including the areas of information considered priorities for various reasons (market, impact on multiple business areas, regulatory relevance, etc.), and then gradually adding all the others considered relevant for overall governance.


    Notes

    [1] Lineage is the source-to-target relationship of data as they pass through transformations.

    [2] "Agile" methodologies originated in software development projects and refer to flexible development practices, which include small development teams, iterative and incremental development, adaptive planning, and the direct and continuous involvement of the customer in the development process.

