Recently, data fabric and data mesh have emerged as solutions to many of today’s data integration complexities, such as data silos and legacy applications. However, the two architectural configurations are easily confused, as both conjure the image of a flexible layer, or blanket, placed over the data. The role of data virtualisation in enabling both is also often misunderstood.
In a nutshell, both enable streamlined access to disparate data sources, but data fabric is strictly a technology, whereas data mesh is more of an organisational and process structure.
Data Fabric
Data fabric provides a unified architecture and management framework that facilitates the access and sharing of disparate data sources. Noel Yuhanna, a Forrester analyst, was one of the first to define data fabric, mostly in the context of big data implementations. In recent years, however, interest in data fabric has grown considerably: in 2021, Gartner listed it as one of the top 10 data and analytics trends.
Data fabric is enabled by a variety of tools and technologies in addition to data virtualisation, including extract, transform, and load (ETL) processes, data warehouses, master data management (MDM) systems, and data catalogues. Forrester’s recent report on enterprise data fabric shows that data virtualisation plays a primary role in this technological ecosystem.
Data Mesh
Data mesh, first defined by Zhamak Dehghani at ThoughtWorks, is a data platform configuration that distributes data ownership among different data domains within an organisation. This is in contrast with the traditional monolithic data management approach, in which data is managed by a central department that delivers it to business users. Each data domain in a data mesh configuration can use its domain-specific knowledge to package the data as products for distribution throughout the organisation. Each domain provides its own modelling and aggregation, which democratises data access and helps business users engage in self-service analytics.
However, in a data mesh, it is critical for organisations to enable the seamless interoperability of data domains, without which there would be fragmentation, duplication, and inconsistencies. Interoperability can be established with a universal interoperability layer, which provides the data standards and governance protocols.
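As a minimal sketch, the shared contract behind such an interoperability layer might look like the Python below: every domain team publishes its data as a product that conforms to a common interface and schema convention. The class names, field names, and the `SalesDomain` example are purely illustrative assumptions, not part of any data mesh standard or product.

```python
# Illustrative sketch of a data mesh "data product" contract.
# All names here are hypothetical, not an established standard.
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class DataProduct:
    domain: str                      # owning domain, e.g. "sales"
    name: str                        # product name within that domain
    schema: dict                     # column name -> type, per shared standards
    rows: list = field(default_factory=list)

class Domain(Protocol):
    """Every domain team implements this common publishing interface."""
    def publish(self) -> DataProduct: ...

class SalesDomain:
    def publish(self) -> DataProduct:
        # The owning team applies its own domain-specific modelling
        # and aggregation before publishing the product.
        return DataProduct(
            domain="sales",
            name="monthly_revenue",
            schema={"month": "str", "revenue": "float"},
            rows=[("2024-01", 120_000.0), ("2024-02", 95_000.0)],
        )

# A shared catalogue keyed by "domain.product" makes every product
# discoverable across the mesh under one naming convention.
catalogue = {f"{p.domain}.{p.name}": p for p in [SalesDomain().publish()]}
print(sorted(catalogue))
```

Because every product exposes the same interface and naming convention, consumers in other domains can discover and read it without bilateral coordination, which is precisely what the interoperability layer is meant to guarantee.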
Data Virtualisation
Data virtualisation enables data fabric because it provides real-time access to data across disparate systems, without first having to move the source data to a new repository. A data fabric enabled by data virtualisation provides seamless access to data, without data analysts needing to know where exactly the data is stored, be it on-premises, in the cloud, on legacy systems, or on a system that came online only yesterday.
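The core idea of querying data in place, rather than copying it first, can be sketched in miniature with Python's standard-library `sqlite3` and SQLite's `ATTACH` statement, which lets a single query span two separate databases. The database, table, and column names below are illustrative assumptions; a real data virtualisation layer would federate heterogeneous systems, not two SQLite databases.

```python
# Minimal sketch of federated, in-place querying: one query spans two
# separate "sources" without copying either dataset into a new repository.
import sqlite3

conn = sqlite3.connect(":memory:")                  # pretend "sales" source
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # separate "crm" source

conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 120.0), (2, 75.5), (1, 30.0)])

conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# The consumer writes one query against a unified view; the data stays
# where it lives, and no ETL step moves it beforehand.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.5)]
```

The consumer never needs to know which source holds which table; that location transparency is the property a virtualisation-enabled data fabric provides at enterprise scale.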
Data virtualisation enables and streamlines data mesh configurations in many ways. First and foremost, by virtue of its architecture and real-time data-access capabilities across vastly different systems, it is custom fit to serve as the universal interoperability layer.
Beyond that, by providing a virtual data-access layer between data sources and the domain-specific data consumers, data virtualisation provides a core foundation for a data mesh. This is in contrast with ETL/data warehousing solutions, which rely on physically moving and warehousing data.
Data consumers gain access to only the data they need, when they need it, rather than waiting for large volumes of data to be moved from one location to another. As data continues to grow in volume, replication becomes increasingly costly, making data virtualisation a sensible, cost-effective alternative. In Gartner’s Data Management Hype Cycle, data virtualisation is placed on the “Plateau of Productivity,” meaning that it presents a low level of risk combined with a high return on investment.
Data virtualisation enables the interoperability, governance, and security required in a data mesh, while also providing the data federation needed for domain-based data ownership and agile business intelligence (BI). Beyond federation, data virtualisation offers advanced performance optimisation, as well as self-service search and discovery through integrated data catalogues.