How companies define data governance varies, but data cataloging remains a pressing priority

Nick Amabile


November 18, 2020

In our digital economy, companies no longer question whether there’s a wealth of vital information within their own data. There is.

What remains unclear, however, is the meaning of the buzzwords behind the modernization projects that will place that data within their reach.

A favored term among consultants and industry analysts, data governance is a critical part of an organization’s digital transformation. But a Google search uncovers as many definitions of the phrase as there are types of businesses that need it.

What is data governance?

More than an item to be crossed off a project checklist, true data governance is a holistic program involving multiple interrelated initiatives. Data governance requires a concerted, ongoing effort across multiple domains within your company. And unfortunately, even when all the right tools are in place, data governance suffers without internal stakeholders who are accountable for maintaining its ongoing health.

Proper data governance involves four components:

  • Data cataloging
  • Data lineage
  • Data quality
  • Change management

Data quality centers on accuracy, which is maintained by providing a means to resolve data that’s fragmented, duplicated, or incorrect. Change management refers to ensuring your organization is informed of changes to data definitions, such as how revenue or expenditures are calculated. Change management is especially crucial in regulated industries like financial services or healthcare.

Data cataloging and data lineage are just as critical because these aspects of your data governance program center on where your data is stored and what it means. When implemented properly, an effective data cataloging initiative ensures the facts behind your data make sense to users without a technical background, a group that likely makes up most of your company. If your organization wants to establish a truly sustainable, data-driven culture, it cannot overlook data cataloging.

A data catalog functions like Google for your business information

Data cataloging and data lineage constitute two similar pieces of the data governance puzzle. Your data lineage tracks where your business data is coming from and what happens to it before it’s available for analysis. For example, your website generates customer data, which is then stored in a cloud-based warehouse. As your analytics team uses business intelligence software to generate reports from this data, it must be accessed from the warehouse and transformed. Put another way, data lineage constitutes your company’s data supply chain.
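
To make the supply chain idea concrete, here is a minimal Python sketch. It is illustrative only: the asset names (such as `website_events` and `report.retention_dashboard`) and the `upstream_of` helper are hypothetical and do not come from any particular lineage tool. The snippet records each asset's direct sources, then walks backwards to list everything a report depends on.

```python
# Hypothetical lineage map: each asset lists the assets it is built from.
LINEAGE = {
    "website_events": [],
    "warehouse.customers": ["website_events"],
    "transformed.customer_retention": ["warehouse.customers"],
    "report.retention_dashboard": ["transformed.customer_retention"],
}


def upstream_of(asset, lineage):
    """Return every asset that feeds into `asset`, nearest source first."""
    seen = []
    queue = list(lineage.get(asset, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


print(upstream_of("report.retention_dashboard", LINEAGE))
```

Reading the result from left to right traces the report back through the transformation and the warehouse to the website that produced the data.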

Similarly, data cataloging tracks your business data sources with a focus on what the data means and how it’s used. A data catalog functions like Google for your data. However, instead of being organized around technical details, a data catalog allows you to search by your defined business concepts.

Say your business uses a subscription model, and someone within your organization wants to examine customer retention. With a well-organized data catalog, this user could search for ‘retention’ and view all the datasets, tables, and reports that relate to that subject.

The information may include how your business defines retention, what specific queries relate to retention metrics, or previous reports generated by your analytics program. Implemented properly, a data catalog displays the data you have available for analysis, where it lives, and how to access it in a trustworthy way.
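
The retention search described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any commercial catalog works: the `CatalogEntry` fields, the sample entries, and the `search` function are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    kind: str          # "table", "dataset", or "report"
    location: str      # where the asset lives (hypothetical paths)
    description: str   # plain-language meaning of the asset
    tags: list = field(default_factory=list)  # business concepts


# Hypothetical catalog contents for a subscription business.
CATALOG = [
    CatalogEntry("subscriptions", "table", "warehouse.analytics",
                 "One row per customer subscription.", ["retention", "billing"]),
    CatalogEntry("monthly_retention", "report", "bi/finance",
                 "Share of subscribers who renew each month.", ["retention"]),
    CatalogEntry("web_sessions", "dataset", "warehouse.raw",
                 "Raw website visit logs.", ["traffic"]),
]


def search(term, catalog):
    """Return entries whose name, description, or tags mention the term."""
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry.name.lower()
        or term in entry.description.lower()
        or any(term in tag.lower() for tag in entry.tags)
    ]


for entry in search("retention", CATALOG):
    print(entry.kind, entry.name, "at", entry.location)
```

The point of the design is that the lookup key is a business concept ("retention") rather than a table or schema name, so a user does not need to know where the data lives before searching for it.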

Given the sheer volume of data and data sources involved, data cataloging provides a crucial means to better understand the information at your disposal. And with the help of the right software solution, you can manage part of this effort automatically.

How data catalog software illuminates the information at your disposal

A data governance program requires multiple layers of software to manage the information generated and consumed by today’s business. Cloud environments such as Snowflake gather data into a single source, and analytics programs like Looker and Tableau connect with those warehouses to provide visualizations and reporting. Maintaining these platforms and how they interrelate is a herculean task without the right technology.

A software platform like Alation will search all the existing reports and datasets used by your data stack. Alation incorporates information from your data warehouse as well as the associated reports generated by business intelligence software. The software also keeps your data catalog up to date by incorporating new information contributed by your users. By automatically compiling information across these platforms, Alation provides a level of maintenance that helps keep your data sources accurate.

Data cataloging software also illustrates which parts of your organization are using reports from a given source. If an issue arises for the data within a specific topic like finance, Alation allows you to contact everyone who is using those assets. Plus, the software increases efficiency by monitoring whether a given report or table is still being used within your organization. With the ability to deprecate unused components, your data remains more organized.
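
As a rough illustration of usage monitoring, the Python sketch below flags assets that have gone unused and lists who relies on a given asset. Everything here is hypothetical: the `USAGE` log, the asset and team names, and the 90-day threshold are assumptions made for the example, and the code does not reflect Alation's actual interface.

```python
from datetime import date, timedelta

# Hypothetical usage log: asset -> (date last accessed, teams that use it).
USAGE = {
    "finance.revenue_by_month": (date(2020, 11, 10), {"finance", "sales"}),
    "finance.legacy_forecast": (date(2020, 3, 1), {"finance"}),
    "marketing.campaign_summary": (date(2020, 10, 30), {"marketing"}),
}


def stale_assets(usage, today, max_idle_days=90):
    """Return assets nobody has accessed within the idle window."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name for name, (last_used, _teams) in usage.items()
        if last_used < cutoff
    )


def teams_to_notify(usage, asset):
    """Return the teams to contact when an asset has a problem."""
    return sorted(usage[asset][1])


today = date(2020, 11, 18)
print("Deprecation candidates:", stale_assets(USAGE, today))
print("Notify:", teams_to_notify(USAGE, "finance.revenue_by_month"))
```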

However, for a data catalog to be meaningful for your whole organization, it needs documentation. Alation will determine that a table is applicable to a given search term, but users may not understand how to use it if its terms aren’t clearly defined. Your data can be organized and accessible, but without a level of internal ownership to clarify its definitions, it will never reach its full potential.

Data stewards from stakeholder departments keep governance on track

For a data governance program to succeed, your organization must ensure accountability among its stakeholders. Companies see the value in managing and maintaining their data, yet many forget to assign ownership over its continued reliability. Without this critical step, faulty reports, inaccuracies, and other data quality issues inevitably remain unresolved.

Stakeholders who rely most on the insights from your data have a vested interest in ensuring its conclusions are trustworthy. By establishing a data quality working group that crosses multiple departments, your data becomes a shared responsibility. If there’s a problem such as missing information from a given day, then the working group can triage the issue, identify its source, and implement a solution.
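
A check for missing days is one of the simpler data quality tests a working group might automate. The Python sketch below assumes a daily load and a hypothetical list of dates for which data arrived; the `missing_days` function name and the sample dates are invented for illustration.

```python
from datetime import date, timedelta


def missing_days(loaded_dates, start, end):
    """Return each date in [start, end] that has no loaded data."""
    loaded = set(loaded_dates)
    total_days = (end - start).days + 1
    expected = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in expected if day not in loaded]


# Hypothetical load history with a gap on November 3.
loaded = [date(2020, 11, 1), date(2020, 11, 2), date(2020, 11, 4)]
print(missing_days(loaded, date(2020, 11, 1), date(2020, 11, 4)))
```

A gap in the result gives the working group a concrete starting point for triage: which day is missing, and therefore which upstream load to investigate.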

These data stewards also play a key role in data governance. When given a subject, Alation will surface relevant reports, tables, and datasets across multiple tools and sources. That said, without proper documentation, these search results will have limited use within your organization. Because data stewards are experts in their datasets and use cases, they translate between your data’s technical and business languages so that every user can understand the results.

In a data catalog search, Alation will surface details from a cloud-based data warehouse and its analytics software. But not every business user will understand those results unless the labels that define their terms are clear. Even if a given table’s results are trustworthy, its value is limited when a column listing last year’s revenue is named something cryptic like “LY_REV.” A data steward can clarify these terms within your data warehouse. Once these canonical definitions are in place, they also become searchable through Alation. As a result, your data is placed in a business context that is immediately understandable for all users.
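
The translation a data steward provides can be pictured as a small glossary that maps cryptic column names to business definitions. The Python sketch below is hypothetical throughout: the `GLOSSARY` structure, the wording of the definition, and the `describe` and `find_columns` helpers are assumptions for illustration, not a feature of any specific product.

```python
# Hypothetical glossary maintained by data stewards.
GLOSSARY = {
    "LY_REV": {
        "label": "Last year's revenue",
        "definition": "Total revenue for the previous fiscal year (example wording).",
        "steward": "finance",
    },
}


def describe(column, glossary):
    """Return a readable description of a column, if a steward defined one."""
    entry = glossary.get(column)
    if entry is None:
        return f"{column}: no definition yet"
    return f"{entry['label']} ({column}): {entry['definition']}"


def find_columns(term, glossary):
    """Return column names whose label or definition mentions the term."""
    term = term.lower()
    return [
        column for column, entry in glossary.items()
        if term in entry["label"].lower() or term in entry["definition"].lower()
    ]


print(describe("LY_REV", GLOSSARY))
print(find_columns("revenue", GLOSSARY))
```

Once a definition exists, a search for "revenue" finds the column even though the word never appears in its technical name.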

By establishing the terms within your business’ data catalog, these data stewards are curating a list of resources that are accessible to all departments, regardless of technical expertise. As a result, your data becomes democratized across your organization.

Tactical governance drives the creation of a data-driven culture

Modernizing your business to readily access and understand its data is no small task. The power of data analytics provides a crucial competitive advantage as the digital marketplace introduces multitudes of new data sources. Without a data governance program, this wealth of information is impossible to manage. The ability to ensure the conclusions from your data are trustworthy is obviously critical. At the same time, your organization cannot lose sight of the need for documentation as it pursues a sustainable data-driven culture.

Your internal stakeholders will recognize where your data comes from with the help of the right tools. But software alone can’t ensure that the ways the data has been used to generate conclusions will be meaningful. All the right tools may be in place for data governance. However, without internal stakeholders who are accountable for establishing canonical definitions, the program will fall short of its potential.

A modernization initiative is only successful when its components make sense across your organization. Data governance allows users to draw meaningful conclusions from your business information regardless of their technical expertise. With a data catalog in place to ensure your data and its reports have meaning, a truly data-driven culture will take root.