As the volume and variety of data assets continue to grow, organisations are continually striving to achieve three key objectives:
- Deriving actionable insights from their data assets
- Complying with regulations and standards
- Improving operational efficiency
What is a data catalogue and how does it help?
Data catalogues are the tools that support the practice of data cataloguing. They provide a repository for storing useful reference information about your data (the metadata). In many cases they also help to collect that information in the first place and to share it later. Once in place, a data catalogue helps users access the wealth of information it stores. Typical data catalogue users include:
- Data Consumers: people who need to connect to the data the organisation holds, such as data analysts or data scientists. When the right information is in your data catalogue, data consumers can quickly locate and assess the datasets relevant to their analytical goals, whether that means communicating business intelligence to decision makers or building accurate machine learning models. This improves efficiency and raises the quality of the resulting work.
- Data Governors: people with responsibility for understanding and controlling data across the organisation, such as privacy and compliance teams, data governance and data management teams, and data owners. Using a data catalogue, they gain insight into data risks and governance requirements. This lets them take appropriate remedial action and demonstrate to the business and to regulators that the data is understood and governed.
- Data Custodians: the people who ensure data is secure and available, typically the IT department. Data catalogues give them valuable technical detail about how data is stored and how it moves, which supports data security and availability. This information is particularly useful for assessing and managing changes such as data migrations, helping them minimise the associated risk efficiently.
Maximising the impact of data catalogues
The metadata that could be stored falls into two distinct categories:
- That which can be discovered from the data itself. This includes fundamental details such as data size, format and type, which can often be discovered easily and automatically by data cataloguing tools or other means.
- That which can only be discovered from people. This is often the contextual information surrounding the data that adds a critical layer of meaning to the information in the data catalogue.
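To make the distinction between the two categories concrete, here is a minimal Python sketch. It assumes the dataset is a CSV file on local disk. The `CatalogueEntry` record and the `profile_csv` and `infer_type` helpers are hypothetical names chosen for illustration, not the API of any particular cataloguing tool, and the type inference is deliberately crude. The technical fields are filled automatically from the file itself. The contextual fields are left empty for people to supply.

```python
import csv
import os
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record; field names are illustrative only."""
    path: str
    # Technical metadata: discoverable automatically from the data itself.
    size_bytes: int
    file_format: str
    column_types: Dict[str, str]
    # Contextual metadata: can only come from people who know the data.
    description: Optional[str] = None
    owner: Optional[str] = None
    contains_pii: Optional[bool] = None


def _parses_as(values: List[str], cast) -> bool:
    """Return True if every value can be converted with `cast`."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_type(values: List[str]) -> str:
    """Guess a coarse type for a column from its sampled string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "unknown"
    if _parses_as(non_empty, int):
        return "integer"
    if _parses_as(non_empty, float):
        return "float"
    return "string"


def profile_csv(path: str, sample_rows: int = 100) -> CatalogueEntry:
    """Collect technical metadata from a CSV file without any human input."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns: Dict[str, List[str]] = {name: [] for name in (reader.fieldnames or [])}
        # Only sample the first rows; a real tool might scan more or use statistics.
        for i, row in enumerate(reader):
            if i >= sample_rows:
                break
            for name in columns:
                columns[name].append(row.get(name) or "")
    return CatalogueEntry(
        path=path,
        size_bytes=os.path.getsize(path),
        file_format=os.path.splitext(path)[1].lstrip(".").lower() or "unknown",
        column_types={name: infer_type(vals) for name, vals in columns.items()},
    )
```

For example, calling `profile_csv` on a customer extract would fill in its size, format and column types. `description`, `owner` and `contains_pii` stay `None` until someone who understands the data supplies them.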
Several practices help a data catalogue deliver its full impact:
- Understanding the audience: One of the biggest risks to the success of a data catalogue is losing sight of its intended users: who they are and how they will want to use the tool. Keeping their needs in focus ensures that the information collected and stored in the catalogue answers their questions. This matters because different stakeholders require different types of information, and it’s crucial to capture metadata in a way that is meaningful to them. For instance, data analysts may focus on attributes such as data accuracy and relevance, while privacy professionals may prioritise information about personally identifiable information (PII) and sensitive data categories. Invariably, the information needed will combine what can be discovered from the data itself with what must be harvested from the collective knowledge of the organisation.
- Incremental cataloguing: Success depends less on the tool itself than on the process of cataloguing, which is ongoing and evolves over time. It’s not necessary to capture all the tribal knowledge held in different pockets of the organisation up front. Organisations should prioritise cataloguing efforts based on immediate needs and gradually expand to cover additional datasets and domains. Starting small allows agile adaptation to changing requirements and keeps the cataloguing effort from becoming overwhelming.
- Capturing contextual information: By definition, contextual information is known and understood by people, so it often relies on human interpretation and is subjective. The challenge is therefore to capture the nuances and variations in interpretation between people. To avoid getting stuck in a quagmire of disagreement, start by focusing on the ‘non-negotiable’ foundational elements that have clear, unambiguous definitions that can be referenced. These might include business glossary terms, taxonomies or conventions, perhaps defined within documented business processes. Later, you can either expand the set of formally defined terms and categories, or accommodate flexible or localised terminology and conventions informally within the catalogue, avoiding the need for consensus. By capturing both foundational elements and flexible, domain-specific terms, organisations can achieve comprehensive coverage while accommodating diverse perspectives.


