By Laura Bacci, Principal Consultant at Kirey Group
The terms Business Glossary, Data Catalog and Data Dictionary are often used in the Data Governance field, but their definitions are not always standardized. This lack of clarity and of clear differentiation often causes overlaps and conflicts over responsibilities.
Element | Content | Application Level | Responsibility |
Business Glossary | Business Metadata | Corporate | Business |
Data Dictionary | Technical Metadata | Corporate/Divisional | Technology |
Data Catalog | Data Localization | Corporate/Divisional | Technology/Business |
Tab 1 – Three Data Governance elements
As listed in Tab. 1, these tools all collect metadata that help contextualize data, but they have different characteristics: the Business Glossary holds business-oriented metadata, while the Data Catalog and the Data Dictionary hold technical metadata. Some of the confusion is easy to understand, considering how Data Governance generally evolves within an organization. For example, it is fairly typical to start by creating a Data Catalog and then build a governance program from it; or, for a Data Quality program, to start by defining the Data Dictionary.
This type of approach can deliver immediate results, but it should be adjusted to capture the additional value obtained from implementing all three instruments together within a Data Governance project.
What is a Business Glossary?
A Glossary should focus on the business terms used within the organization. It should be easily understandable at all levels and should define what each term means in business terms. What is meant by Customer? By Tax Code? By Reserve? By Award? By Current Account?
The Glossary exists to answer these kinds of questions. A Glossary is necessarily specific to a company, although, to facilitate interoperability, most of its terms should be shared with companies operating in the same sector.
Benefits of a Comprehensive Business Glossary
The advantage of having such a Glossary is that a common vocabulary can be shared within the organization, with positive effects on operational activities, both functional and technical, and on the projects that are carried out. A Glossary should apply at all levels of the company. It can instead be placed at the divisional level when the company's divisions deal with significantly different businesses and therefore have to adopt significantly different terminology.
Responsibility and Ownership
Because of its scope and the skills required to draft it, responsibility for the Glossary should not be delegated to Information Technology. The Glossary belongs to the Company or, more precisely, to its functional component rather than its technological component.
Data Dictionary
A Dictionary should focus on descriptions and details of the physical structure of data. Every data source (file or database table) used within the organization should be registered in the Dictionary, with details such as data type, permissible length, technical name, transformations (the lineage[1]) and any other relevant technical details. These details are the mostly technical metadata mentioned at the beginning of this article. They allow data architects and data engineers to understand how to join and query the data when designing information systems and producing the reports used by the functional component of the company.
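As a minimal sketch of what a single Dictionary record might hold, the Python snippet below models one entry with the details listed above (technical name, data type, permissible length, lineage). The class `DictionaryEntry`, its field names, and the sample table and column names are hypothetical, chosen only for illustration; a real Dictionary tool will have its own schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    """One column of a file or database table, as a Dictionary might record it.

    All field names are illustrative assumptions, not a standard schema.
    """
    source: str                        # file or database table holding the data
    technical_name: str                # physical column or field name
    data_type: str                     # e.g. "string", "integer", "date"
    max_length: Optional[int] = None   # permissible length, if applicable
    nullable: bool = True
    unique: bool = False
    # Lineage as an ordered list of (source, target) steps, see note [1].
    lineage: List[Tuple[str, str]] = field(default_factory=list)


def describe(entry: DictionaryEntry) -> str:
    """Render an entry as a single human-readable line."""
    length = f"({entry.max_length})" if entry.max_length is not None else ""
    flags = []
    if not entry.nullable:
        flags.append("NOT NULL")
    if entry.unique:
        flags.append("UNIQUE")
    parts = [f"{entry.source}.{entry.technical_name}",
             f"{entry.data_type}{length}", *flags]
    return " ".join(parts)


# Hypothetical entry: table and column names are made up for the example.
entry = DictionaryEntry(
    source="crm.customers",
    technical_name="tax_code",
    data_type="string",
    max_length=16,
    nullable=False,
    unique=True,
    lineage=[("staging.customers_raw.cf", "crm.customers.tax_code")],
)
print(describe(entry))
```

Keeping lineage as ordered source-to-target pairs is only one possible design; it is enough to show that a Dictionary entry carries both the shape of the data and where it came from.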
Data Catalog
Finally, the Catalog works as a register that identifies where data is located. It should be treated as an asset that is the single reference source for the location of any data set and that serves all internal needs: technical-operational, Information Technology, and Data Science or Business Analytics activities.
Like the Glossary, the Catalog is developed at the divisional level rather than at the corporate level if some divisions carry out a business significantly different from the others.
Some examples
All three tools described above are necessary to implement Data Governance well, as they are the foundation on which it is built. How can we think about governing data if we don’t know what they are, where they are and what they mean for our business?
Element | Content | Example |
Business Glossary | Business Metadata | The tax code serves to uniquely identify natural persons and other entities in their dealings with State institutions and governments. |
Data Dictionary | Technical Metadata | Stored as a string of 16 alphanumeric characters for natural persons and of 11 digits for entities other than natural persons. Non-nullable and unique. |
Data Catalog | Data Localization | The tax code is placed in the context of the various processes that use it, listing where it is stored in the various data sources: <Database1><table1><field1>, <Database2><table3><field15> |
Tab 2 – Content examples for the three Data Governance elements
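The Data Dictionary row of Tab. 2 can be read as a set of checks that a Data Quality control could run. The sketch below is one possible reading, not a definitive implementation: the function names are hypothetical, the sample values are invented, and the check covers only length and character class. It does not validate the internal structure or check character of a real tax code.

```python
import re

# Format rules taken from the Data Dictionary row of Tab. 2.
NATURAL_PERSON = re.compile(r"[A-Za-z0-9]{16}")   # 16 alphanumeric characters
OTHER_ENTITY = re.compile(r"[0-9]{11}")           # 11 digits


def matches_dictionary_format(tax_code):
    """Return True if the value fits one of the two recorded formats."""
    if tax_code is None:          # the field is recorded as non-nullable
        return False
    return bool(NATURAL_PERSON.fullmatch(tax_code)
                or OTHER_ENTITY.fullmatch(tax_code))


def find_duplicates(values):
    """Return the values that appear more than once (uniqueness check)."""
    seen, duplicates = set(), set()
    for value in values:
        if value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return duplicates


# Invented sample values, used only to exercise the checks.
samples = ["ABCDEF12G34H567I", "12345678901", "1234567890", None]
for sample in samples:
    print(repr(sample), matches_dictionary_format(sample))
```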
Moreover, the three tools should "talk" to each other. If we have recorded the location of all information resources in the Catalog and all the metadata related to those resources in the Dictionary, we have to link the two to take full advantage of both. We also have to link the definitions of the business terms in the Glossary to the resources recorded in the Catalog, to close the circle and know exactly which metadata characterize each business term.
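The linking described here can be sketched with three small lookup structures keyed so that one can be followed into the next. Everything in the snippet is an assumption made for illustration: the `Location` class, the dictionary layouts, and the placeholder database, table and field names. Real governance tools expose these links through their own models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical position of a data element (placeholder names only)."""
    database: str
    table: str
    column: str


# Business Glossary: business term -> business definition.
glossary = {
    "Tax Code": "Uniquely identifies natural persons and other entities "
                "in their dealings with State institutions.",
}

# Data Catalog: business term -> where the data is stored.
catalog = {
    "Tax Code": [
        Location("Database1", "table1", "field1"),
        Location("Database2", "table3", "field15"),
    ],
}

# Data Dictionary: location -> technical metadata.
dictionary = {
    Location("Database1", "table1", "field1"):
        {"type": "string", "max_length": 16, "nullable": False, "unique": True},
    Location("Database2", "table3", "field15"):
        {"type": "string", "max_length": 16, "nullable": False, "unique": True},
}


def resolve(term):
    """Follow the links Glossary -> Catalog -> Dictionary for one term."""
    return {
        "definition": glossary[term],
        "locations": [
            {"location": loc, "metadata": dictionary.get(loc)}
            for loc in catalog.get(term, [])
        ],
    }


def unlinked_locations():
    """Catalog locations with no Dictionary entry, i.e. gaps in the circle."""
    return [loc for locs in catalog.values()
            for loc in locs if loc not in dictionary]


result = resolve("Tax Code")
print(result["definition"])
print(len(result["locations"]), "locations;", "gaps:", unlinked_locations())
```

A check such as `unlinked_locations` is one simple way to see whether the circle is actually closed: any location it returns is known to the Catalog but has no technical description in the Dictionary.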
An effective and efficient approach
Full effectiveness and efficiency are achieved by bringing all the information in the three tools together in a single centralized repository of Data Governance metadata. All the metadata related to the entities of Data Quality processes, such as control rules, outcomes and reports, are added to this repository as well. In this way, metadata management becomes the central node for the development of Data Governance. This, at least, is the theory.
In practice, the approach is often partial and not completely organized or optimized, because of factors such as managerial capability or simply cost. For this reason, it is unrealistic to imagine an approach to Data Governance based on a global project that inventories the company’s entire information assets starting from scratch.
The best approach is an Agile one (borrowing the term from Project Management[2]): local and departmental initiatives are developed first and eventually converge towards a centralized vision. The approach should be incremental, starting with the areas of information considered priorities for various reasons (market, impact on multiple business areas, regulatory relevance, etc.) and then gradually adding all the others considered relevant for overall governance.
Notes
[1] Lineage is the source-to-target relationship that traces data through its transformations.
[2] "Agile" methodologies originated in software development projects and refer to flexible development practices, which include small development teams, iterative and incremental development, adaptive planning, and the direct and continuous involvement of the customer in the development process.