Unifying the Data Model and Business Glossary


The growing adoption of data democratization is creating new frameworks and technology for sharing data across data silos. These strategies are reducing the friction of data sharing between business domains, and access to data is becoming much more straightforward. One of the central challenges with integrating data is working with disparate data models that describe diverse databases and data sets in unique ways.

The traditional approach to merging data sets was to extract one data set from its database, transform it, and load it into another database to match the data structure of that database. To perform this ETL process, data engineers need to understand not only the technical aspects of moving and transforming data but also how each data set is organized and labeled. They must understand how the two data sets are modeled so they can be mapped together into one.

Modern data virtualization technology provides greater access to disparate data sources by abstracting data away from its underlying structure, simplifying the process and eliminating the need for ETL. Data virtualization provides a single interface or connectivity layer that enables access to distributed data from one place. While this technology is powerful, it does not provide a uniform way to interpret that data. To understand what the data means, analysts must still rely on the separate data model of each database to gain context. For effective analysis, we need to understand what the data in each system represents and how the systems relate to one another. These insights require a more robust data federation strategy that standardizes how we access different data stores. A unified data model that maps data and relationships across data silos is a crucial component. A business glossary that maps these relationships to business terms makes this data model even more valuable by making it accessible to business leaders and decision-makers.

The Federated Data Model

A federated data model is based on metadata extracted from the connected source systems and merged into a uniform logical data structure. When data is organized around a single data model, data platforms can interact with all your heterogeneous databases as if they were one. This approach lets you pull data from multiple systems with one federated query. This capability saves data engineers and skilled analysts significant amounts of time when integrating data and creating data assets and data products.

Abstracting the logic from the physical layer also makes self-serve data analytics easier, as tools are less complex and don’t need to interact with multiple underlying database structures.
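As a rough illustration of one federated query over a uniform logical layer, the sketch below uses Python's built-in sqlite3 module, with two attached in-memory databases standing in for separate source systems. The database names (crm, billing), the tables and columns, and the LOGICAL_MODEL mapping are all hypothetical; this shows the idea only, not how any particular data platform implements it.

```python
import sqlite3

# Two attached in-memory databases stand in for independent source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (cust_id INTEGER, cust_name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_ref INTEGER, amount_usd REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0)])

# Hypothetical logical model: uniform names mapped to physical locations.
LOGICAL_MODEL = {
    "customer": {"table": "crm.customers", "id": "cust_id", "name": "cust_name"},
    "invoice": {"table": "billing.invoices", "customer_id": "customer_ref",
                "amount": "amount_usd"},
}


def revenue_by_customer(connection):
    """Build and run one query that spans both sources via the logical model."""
    c, i = LOGICAL_MODEL["customer"], LOGICAL_MODEL["invoice"]
    sql = (
        f"SELECT c.{c['name']}, SUM(i.{i['amount']}) "
        f"FROM {c['table']} AS c "
        f"JOIN {i['table']} AS i ON i.{i['customer_id']} = c.{c['id']} "
        f"GROUP BY c.{c['name']} ORDER BY c.{c['name']}"
    )
    return connection.execute(sql).fetchall()


result = revenue_by_customer(conn)
```

The caller works only with the logical names "customer" and "invoice"; where each one physically lives is resolved from metadata, which is the property the federated model relies on.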

Federated Data Catalog

In a federated data strategy, metadata is used to create a global, or federated, data catalog. This data catalog draws on the central metadata repository to create a searchable inventory of data assets that analysts can use to build federated data queries.

A federated data catalog enables searches across all your data assets; it can also consolidate lineage so users and data stewards can understand how data was changed in the past.

A federated data strategy can also manage who has access to what data. Instead of managing access at each database individually or applying uniform rules to all databases, a federated data catalog can be a security gateway to manage identity in one place, while also supporting authorized access to all your data assets.
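To make the combination of search and centralized access control more concrete, here is a minimal sketch of a catalog that is searched by keyword and filtered by the caller's role in one place. The CatalogEntry fields, the sample assets, and the role names are assumptions made up for illustration; a real catalog would usually delegate identity to a dedicated service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str                 # logical asset name
    source: str               # hypothetical source system the asset lives in
    description: str          # searchable business description
    allowed_roles: frozenset  # roles permitted to see and query the asset


# Illustrative inventory built from metadata of three source systems.
CATALOG = [
    CatalogEntry("customers", "crm", "Customer master records",
                 frozenset({"analyst", "steward"})),
    CatalogEntry("invoices", "billing", "Customer invoices and payments",
                 frozenset({"finance", "steward"})),
    CatalogEntry("web_sessions", "clickstream", "Website visits per customer",
                 frozenset({"analyst"})),
]


def search_catalog(keyword, role):
    """Return names of assets that match the keyword and that the role may access."""
    needle = keyword.lower()
    return [
        entry.name
        for entry in CATALOG
        if role in entry.allowed_roles
        and (needle in entry.name.lower() or needle in entry.description.lower())
    ]


analyst_hits = search_catalog("customer", "analyst")
finance_hits = search_catalog("customer", "finance")
```

The same keyword returns different assets for different roles, because the access rule is evaluated once at the catalog rather than separately in each database.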

With a standardized data catalog, creating self-serve capabilities is much less complex. Self-serve platforms can automate the process of accessing data, and with more uniform terminology, business users know what data they are looking for, which makes them much more self-sufficient. A simpler model also helps AI better understand how to access data. A consolidated and standardized set of data semantics that uniformly defines data elements makes it easier for an LLM to translate a data request phrased in business terminology into a SQL query.

Unified Business Glossary

While a federated data model is great for creating single data queries across data sources, these models are not typically geared to business users. Business glossaries are particularly important when federating data across domains and regions because business terms are sometimes defined differently in each business domain. Terminology also differs across regions: “turnover” in the UK and “revenues” in the US, for example. Both terms mean the same thing in the data model, but each region uses a different lexicon. A detailed business glossary that explicitly defines business terms and their synonyms makes it much easier for business-focused decision-makers to find the data they need and understand its meaning.

In the past, business glossaries have existed in standalone documents that define each term. Today, business glossaries are connected to data dictionaries and data catalogs so that business users can automatically get the data they need using business terms. This enhancement enables business users to access data anywhere in the organization with just an understanding of the business terms that describe the data they seek.
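One way to picture a glossary that is connected to the data dictionary is as a lookup from any regional synonym to a canonical term and then to a physical column. The sketch below is an assumption-laden illustration: the GLOSSARY structure, the definition text, and the column path finance.ledger.total_revenue are hypothetical.

```python
# Hypothetical glossary: canonical term -> definition, synonyms, physical column.
GLOSSARY = {
    "revenue": {
        "definition": "Income from the sale of goods and services.",
        "synonyms": {"revenues", "turnover"},
        "column": "finance.ledger.total_revenue",
    },
    "customer": {
        "definition": "A party that has purchased goods or services.",
        "synonyms": {"client", "account"},
        "column": "crm.customers.cust_id",
    },
}


def resolve_term(term):
    """Map a business term or synonym to (canonical term, physical column)."""
    key = term.strip().lower()
    for canonical, entry in GLOSSARY.items():
        if key == canonical or key in entry["synonyms"]:
            return canonical, entry["column"]
    return None  # unknown term: a steward may need to add it to the glossary


uk_lookup = resolve_term("Turnover")
us_lookup = resolve_term("revenues")
```

Both regional terms resolve to the same canonical entry, so a user in either region reaches the same data without knowing the physical schema.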

Modern business glossaries include:

Centralized repository

This capability creates a single source of truth for business terms, definitions, and associated metadata.

Structured taxonomy

This functionality organizes business terms into structured taxonomies or hierarchies. Hierarchical categorization allows users to explore related terms and concepts, promoting a deeper understanding of the organization's domain.

Auto-Assignment

In some cases, business terms from glossaries can be auto-assigned to data assets, linking technical metadata with relevant business context. This auto-assignment process helps normalize technical metadata by adding business essence to each data asset, enhancing its relevance and usability.

Normalization of technical metadata

This capability connects business terms with technical metadata. The business glossary helps standardize terminology across data sets. Normalizing technical metadata ensures consistency in data descriptions, making it easier for users to interpret and analyze information.
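The taxonomy and auto-assignment capabilities described above can be sketched together. The example below assumes a tiny parent-pointer taxonomy and a deliberately naive matching rule (every word of a glossary term must appear in the normalized column name). The terms, column names, and the rule itself are hypothetical; production tools typically use richer matching.

```python
import re

# Hypothetical taxonomy: each term points to its parent (None marks a root).
TAXONOMY = {
    "finance": None,
    "revenue": "finance",
    "net revenue": "revenue",
    "customer": None,
    "customer id": "customer",
}


def ancestors(term):
    """Walk up the hierarchy from a term and return its parents, nearest first."""
    path = []
    parent = TAXONOMY.get(term)
    while parent is not None:
        path.append(parent)
        parent = TAXONOMY.get(parent)
    return path


def normalize(column_name):
    """Lowercase a technical column name and split it into word tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", column_name.lower()) if t]


def auto_assign(column_name):
    """Suggest the most specific term whose words all appear in the column name."""
    tokens = set(normalize(column_name))
    matches = [term for term in TAXONOMY if set(term.split()) <= tokens]
    # Prefer the term with the most words, treated here as the most specific.
    return max(matches, key=lambda term: len(term.split()), default=None)


suggested = auto_assign("NET_REVENUE_USD")
lineage = ancestors(suggested)
```

A suggestion like this would normally be reviewed by a data steward before it is attached to the asset; the hierarchy then gives users the broader concepts the column belongs to.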

Standardizing a Business Glossary

The business glossary should be built from the top down, starting with the requirements of the business. An excellent way to start creating your business glossary is to consider the existing standard industry terminology. This approach will give you a solid foundation and enable better data sharing with third parties. You can also build your business glossary using a hierarchical taxonomy structure to organize and classify your data more effectively.

Managing Conflict

Because each domain has its own business glossary and logical model, merging them can expose conflicts in how different business groups understand what terms and data mean. Having a resource, such as a data steward, to manage these disagreements is an integral part of a well-functioning unified business glossary.
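Before a person can resolve such disagreements, they have to be found. As a minimal sketch, assuming each domain glossary is a simple mapping from term to definition, the function below flags terms whose definitions differ between domains. The domain names and definitions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-domain glossaries: term -> definition.
DOMAIN_GLOSSARIES = {
    "sales": {
        "customer": "Any account with a signed contract.",
        "churn": "A customer that did not renew.",
    },
    "support": {
        "customer": "Any account that has opened a ticket.",
    },
    "finance": {
        "customer": "Any account with a signed contract.",
        "churn": "A customer that did not renew.",
    },
}


def find_conflicts(glossaries):
    """Return terms that are defined differently in at least two domains."""
    by_term = defaultdict(dict)
    for domain, terms in glossaries.items():
        for term, definition in terms.items():
            by_term[term][domain] = definition
    return {
        term: definitions
        for term, definitions in by_term.items()
        if len(set(definitions.values())) > 1
    }


conflicts = find_conflicts(DOMAIN_GLOSSARIES)
```

Exact string comparison is a crude test; it only surfaces candidates, and deciding which definition becomes the standard remains a human decision.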

Data stewards can also be helpful in tagging data assets to designate their value or flag data quality issues. While data stewards can take the lead in data classification, correctly classifying data to be more accessible and discoverable is everyone's responsibility when interacting with data assets. AI can help support this process across the organization. AI can learn from existing data models and suggest classification designations if conflict or uncertainty does arise.

A unified data model and business glossary can be a massive asset in aligning your business data, and it can also help align the business itself. As business terminology and metrics are standardized, different domains think about data more uniformly and communicate more consistently, which makes decision-making more collaborative and efficient.

Application of AI

AI will be increasingly important in enabling efficient data catalogs and business glossaries. As AI models become more effective, they will better understand the data assets across your organization. With the help of AI, analysts will have a copilot that finds the exact data set they need to answer their questions.

Unifying how we access data, and separating the metadata that describes data from the data itself, enables much greater agility in how we use data. A unified data catalog makes finding and accessing data much faster and more efficient. This capability means that business questions can be answered more quickly and more effectively. The faster organizations can make quality decisions, the more competitive they will be in the market.

The increasing demand for data creates an environment where replicating data wherever it is needed through ETL pipelines is unsustainable. A model that consolidates information on where data is stored and how to access it is much more scalable. Federated data strategies that manage metadata and the context around data provide the flexibility and agility needed for the future.

Get in touch to unlock the real potential of your data!

Trianz would be pleased to set up an Extrica demo for you and conduct a proof of value to showcase the benefits of Extrica.
