Data Mesh - Enriching Data With Experts

Data stored across an organization has immense value, and the knowledge derived from it can differentiate a company from its competitors. Lacking a solid strategy for sharing data across silos is a strategic mistake.

Most organizations are not designed to share data across business lines and functions. Modern data strategies are creating bridges across these data silos and gateways between them to enable more collaboration and stimulate more data-driven cultures.

While ETL pipelines and data lakes are common ways to bring data together, more innovative, distributed approaches are being quickly adopted. These include data mesh and data fabrics. The end goal of these strategies is to make data available and self-serve so more people can access it to support better decision making. Since these concepts and approaches have come to market, they have evolved.

Evolving data mesh & data fabrics

A data mesh is a data architecture designed to facilitate data sharing across an organization. It is technology agnostic and is defined by four tenets.

Domain Ownership

The business function that collects data has authority over it.

Data Products

Data is packaged into data products, making them easier to share across the organization.

Self Service

Data and data products must be available in a way that lets non-technical people access and use them in their analysis without help from IT or from the domain sharing them.

Federated Governance

The responsibility to govern and secure data is shared by the domain as well as central IT authorities.

To learn more about data mesh, read our blog on what a data mesh is and why you need one.

Gartner defines data fabric as a design concept that serves as an integration layer of data and connecting processes. It utilizes continuous analytics over existing discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments.

These two strategies try to solve the same problem: enabling broader access to data.

Changing distributed data strategies

Since the term data mesh was coined, the strategy has evolved. In the early days, some thought that domains should be free to use whatever tools they wanted to create data products to share. This view has matured as concerns around standardization and interoperability arose. Reinforcing data silos and not defining how data products would interoperate may not be the best approach, even if the domain leaders have the best understanding of the data. Today’s data mesh implementations are standardized on processes and platforms that make data products easy to create, share and integrate with other data products.

Data fabric architectures have also emerged over the same period but are much more focused on technology, automation and central governance control. While data mesh and data fabric may not compete, each is influencing how the other changes to meet the needs of the market. Modern data practitioners are exploring how data fabric architecture can support data mesh concepts such as federated governance, data products and domain ownership.

Data Mesh vs Data Fabric

Data integration is key to each approach, and data democratization through virtualization is becoming the architecture of choice for both data fabric and data mesh. Virtualization keeps data in its source domain and exposes virtual data sets on top of it, which is what enables data democratization. The concepts of data fabric and data mesh diverge on governance, automation, and consumption and discovery.

Automation

Data fabric leverages automation as much as possible to enable self-service, whereas data mesh looks to domain experts to embed their expertise in data products.

Governance

Data fabric relies on central governance control, whereas data mesh takes a federated approach with domains responsible for governing their own data.

Consumption

Data fabric consolidates data assets in data catalogs or deploys knowledge graphs to map data assets across the organization. A data mesh approach exposes data through data products that are created by domain experts and typically published through a data product marketplace.

As the concept of data mesh and the technology of data fabrics have evolved, they are converging. Practitioners are experimenting with various levels of control and data consolidation and automation. AI is becoming an important enabler.

As the market evolves, the question becomes less about automation vs. people, federated vs. central governance, or data assets vs. data products, and more about strategies that incorporate the best features of each and use the right tool for the right job. Data management platforms and analytics gateways are supporting these merging approaches.

Automation – people & machines

Neither modern data mesh nor modern data fabric approaches rely solely on domain experts or solely on automation; each incorporates both in different ways. Data fabrics use automation to integrate data in real time. While humans are involved in the process, they play a much more passive role, addressing issues that AI alerts them to.

Data mesh focuses on data products that data producers build. AI helps producers with automating repetitive tasks, removing the need for coding skills; however, the human who understands the nuance of the data remains at the center of the process. Automated data wrangling processes and AI-assisted data classification are examples.

The approaches can coexist in the same strategy with different participants in the process relying on automation in different ways.

Consumption & discovery - data products vs data assets

Data fabric architectures produce data assets while a data mesh produces data products. Both discovery and consumption approaches can exist in a combined strategy. A data mesh just adds more controls to package data assets into data products.

The data mesh approach focuses on the data product as the main vehicle for sharing data. Data products published on a data product marketplace are richer and arguably more valuable. They are typically made up of data assets that have been merged and normalized under the guidance of a knowledgeable domain expert. Data products are reusable, more permanent, and better suited for use outside the originating data domain.
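
To make the idea of packaging concrete, here is a minimal Python sketch of two data assets being merged and normalized into one data product. It is only an illustration under stated assumptions: the class names, the build_product helper, the column names and the sample rows are all hypothetical and do not come from any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """A raw table owned by its source domain (hypothetical shape)."""
    name: str
    domain: str
    rows: list  # list of dicts, one per record


@dataclass
class DataProduct:
    """A curated, reusable package built from one or more data assets."""
    name: str
    owner: str          # the domain expert accountable for the product
    description: str
    sources: list = field(default_factory=list)
    rows: list = field(default_factory=list)


def build_product(name, owner, description, left, right, key, rename=None):
    """Merge two assets on a shared key and normalize column names."""
    rename = rename or {}
    right_by_key = {r[key]: r for r in right.rows}
    merged = []
    for row in left.rows:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # inner join: keep only records present in both assets
        combined = {**row, **match}
        merged.append({rename.get(col, col): val for col, val in combined.items()})
    return DataProduct(name, owner, description,
                       sources=[left.name, right.name], rows=merged)


# Hypothetical sample assets from two different domains.
orders = DataAsset("orders", "sales", [
    {"cust_id": 1, "amt": 120.0},
    {"cust_id": 2, "amt": 75.5},
    {"cust_id": 3, "amt": 10.0},
])
customers = DataAsset("customers", "crm", [
    {"cust_id": 1, "seg": "enterprise"},
    {"cust_id": 2, "seg": "smb"},
])

product = build_product(
    "customer_order_value", "sales-domain-expert",
    "Order amounts joined with customer segment",
    orders, customers, key="cust_id",
    rename={"cust_id": "customer_id", "amt": "order_amount", "seg": "segment"},
)
```

The expert's knowledge shows up in the choices the code cannot make on its own: which assets belong together, which key joins them, and what the normalized column names should be.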

Combined approaches are exposing consolidated data catalogs to less technical data consumers so they can create data products to share. By leveraging AI to expose these data assets to data consumers, more like a data fabric, fewer technical skills are needed to access data. LLMs enable data consumers with limited expertise in SQL to explore and query data assets to get the data they need.

Whether it is a data fabric or mesh, the data catalog becomes a very important piece of the strategy. Gateway platforms are creating unified data catalogs that span the entire organization and organize data assets. These platforms are also using Gen AI tools to reduce manual work. AI can help with data classification and help normalize data to support robust data models and business glossaries.

Advances in AI will continue to make it easier for data producers to create data products by leveraging the efficiency of automation. Experts also have the opportunity to train AI to help data consumers get the most from their data. This leads to a best-of-both-worlds approach where skilled humans work closely with powerful machines.

Data Governance – federated vs. centralized

Emerging platforms and tools are enabling greater federation of governance. Governance tools are making it easier for central IT to relinquish more control while still having oversight.

Data governance controls are being integrated into data management platforms so all players on the data team can take responsibility for governance.

Domain manager controls: access to domains; granular access to data.
IT manager controls: access to data platforms; how domains are organized.
Data producer controls: fine-grained access, down to the table level.
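
One way to picture this division of controls is as three layered checks, each owned by a different role. The Python sketch below is a simplified illustration, not a description of any product: GovernancePolicy, its fields and the sample users are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GovernancePolicy:
    """Three layers of access control, each maintained by a different role."""
    platform_users: set = field(default_factory=set)     # maintained by IT
    domain_members: dict = field(default_factory=dict)   # domain -> users, maintained by domain managers
    table_grants: dict = field(default_factory=dict)     # (domain, table) -> users, maintained by data producers

    def can_read(self, user, domain, table):
        """A read is allowed only if every layer grants it."""
        if user not in self.platform_users:
            return False  # IT has not admitted the user to the platform
        if user not in self.domain_members.get(domain, set()):
            return False  # the domain manager has not opened the domain
        return user in self.table_grants.get((domain, table), set())


# Hypothetical setup: ana is cleared at every layer, ben lacks the table grant.
policy = GovernancePolicy(
    platform_users={"ana", "ben"},
    domain_members={"sales": {"ana", "ben"}},
    table_grants={("sales", "orders"): {"ana"}},
)
```

Here policy.can_read("ana", "sales", "orders") is True, while the same call for "ben" is False because the data producer has not granted him that table. Central IT keeps oversight of the outermost layer without having to manage each table.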

Automation is also being integrated into data governance with the emergence of active data governance, a technology that monitors data assets and delivers alerts to producers and consumers if issues arise.
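
As a rough sketch of what such monitoring could look like, the Python below checks one data asset for missing values and fans any alerts out to subscribers. The functions check_asset and notify, the 10% threshold and the sample data are assumptions made for illustration; real platforms will apply their own rules and delivery channels.

```python
def check_asset(name, rows, required_columns, max_null_ratio=0.1):
    """Return alert messages for columns with too many missing values."""
    if not rows:
        return [f"{name}: asset is empty"]
    alerts = []
    for col in required_columns:
        missing = sum(1 for r in rows if r.get(col) is None)
        ratio = missing / len(rows)
        if ratio > max_null_ratio:
            alerts.append(f"{name}: column '{col}' is {ratio:.0%} null")
    return alerts


def notify(alerts, subscribers):
    """Pair every alert with every subscribed producer or consumer."""
    return [(person, alert) for alert in alerts for person in subscribers]


# Hypothetical asset in which half of the email values are missing.
customer_rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 3, "email": None},
    {"customer_id": 4, "email": "d@example.com"},
]

alerts = check_asset("customers", customer_rows, ["customer_id", "email"])
deliveries = notify(alerts, ["producer@sales", "consumer@marketing"])
```

With this sample data, alerts holds a single message about the email column, and deliveries contains one copy of it for each subscriber.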

Data mesh incorporates end-to-end data from source to data product, requiring governance beyond the data asset. Managing data governance and quality does not end with the data asset in a data mesh. Public data products are continuously improved and monitored through human feedback loops. If data products are not very useful or don’t meet the needs of the consumer, improvements can be suggested.

As the capabilities of data fabrics and data mesh converge, users gain more flexibility in how they access data. They can discover and access data in whichever way fits their technical skills or understanding of the data. In the future there will not be data meshes or data fabrics but unique blends of people, machines, governance and consumption tactics.

Get in touch to unlock the real potential of your data!

Trianz would be pleased to set up an Extrica demo for you and conduct a proof of value to showcase the benefits of Extrica.