The Real-Time Data Mesh and Its Place in Modern IT Stacks

In Part 1 of this series, we highlighted the challenges of real-time data sharing, discussed operational vs. analytical data, and considered legacy solutions and their limitations. This post defines the real-time data mesh and discusses the key tenets for incorporating it into modern IT stacks.

Facilitating real-time data sharing is a challenging proposition, particularly when multicloud and SaaS applications are included as typical requirements. At the same time, these difficult implementation challenges are surprisingly undifferentiated: They don’t differ significantly across industries, sectors or company sizes, making data sharing ideal for a platform-based solution.

Best-of-breed solutions share with legacy blockchain approaches a key architectural insight: The job of maintaining a single source of truth across multiple parties belongs with the platform, not with each of the parties. This produces several advantages:

Unlike early blockchains, which were essentially commercialized prototypes, modern data mesh offerings are based on solid public cloud engineering. They share the same multitenanted, highly scalable designs as widely adopted public cloud services and exploit modern interfaces, including GraphQL APIs and container-based code sharing. These advances in engineering and architectural patterns have allowed "second-generation" approaches to solve the issues that plagued early (and usually failed) attempts to deploy blockchain technologies in enterprise settings.

Despite being a ubiquitous need, real-time data sharing isn’t always a well-modeled element in existing IT stacks. Gartner echoes this thought, “IT and business-oriented roles … adopt EiPaaS as a key component of their integration strategy … [but] despite its mainstream use, choices of providers are fragmented and difficult to navigate.” It’s an intriguing question: Why should that be?

The answer lies in the structural shifts our industry is undergoing. "Classic" IT had a relatively simple problem to solve.

Both the production and consumption of data, along with any transmission or "sharing," were handled in-house, often within the confines of a single mainframe. Whether built in-house, delivered through outsourcing partners, or provided via ERP systems, these "data monoliths" were, despite their other challenges, relatively easy to manage from a sharing perspective.

With all these structural changes, it's easy to see why ERP systems developed in the '90s, and even EAI approaches that worked fine in the 2000s, are no longer able to satisfy companies' IT demands: The challenge of disparate data isn't something they had to worry about, and as a result, they're ill-equipped to deliver modern data-sharing experiences.

Because of the challenges cited above, even high-functioning IT teams don’t necessarily have a strong “recipe” for incorporating real-time data sharing into their approach in a uniform, best-practice fashion. This section briefly surveys three deployment approaches with increasing levels of capability and complexity to provide an overview of how these platforms can be incorporated into modern, service-based IT portfolios.

The simplest deployment approaches are those where the data model and connectivity are tied directly to an existing SaaS-based domain, such as sharing marketing or sales information between CRM systems for co-selling purposes. Because the domain is well known in these cases, there is little to no data-modeling challenge, and because the systems of record (Salesforce, Microsoft Dynamics) are well known, connectivity is equally easy, usually limited to authorizing the platform against the systems in question.
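To make that authorization step concrete, here is a minimal sketch, assuming the CRM exposes a standard OAuth 2.0 client-credentials token endpoint and the sharing platform offers a connector-registration API. The URLs, client identifiers and connector endpoint below are placeholders, not any specific vendor's interface.

```python
import requests

# Hypothetical endpoints; substitute the real CRM tenant and platform URLs.
CRM_TOKEN_URL = "https://login.example-crm.com/oauth2/token"
PLATFORM_CONNECTOR_URL = "https://api.example-datamesh.com/v1/connectors"


def authorize_crm_connector(client_id: str, client_secret: str, platform_api_key: str) -> dict:
    """Obtain an OAuth 2.0 access token from the CRM, then register it
    with the data-sharing platform so the platform can read CRM records."""
    # Step 1: standard OAuth 2.0 client-credentials grant against the CRM.
    token_resp = requests.post(
        CRM_TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=30,
    )
    token_resp.raise_for_status()
    access_token = token_resp.json()["access_token"]

    # Step 2: hand the scoped token to the (hypothetical) platform connector API.
    connector_resp = requests.post(
        PLATFORM_CONNECTOR_URL,
        headers={"Authorization": f"Bearer {platform_api_key}"},
        json={"type": "crm-contacts", "credentials": {"access_token": access_token}},
        timeout=30,
    )
    connector_resp.raise_for_status()
    return connector_resp.json()
```

In practice a vendor's console or CLI typically performs this handshake, but the shape is the same: the platform is granted scoped credentials to the systems of record instead of each party building point-to-point integrations.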

Setup and configuration can typically be completed inside of a week.

Figure 1 illustrates a typical application-based deployment using CRM contact sharing as an example.

Figure 1: CRM data sharing — a sample deployment architecture

Application-based solutions are simplified by a shared domain model, such as CRM data, but are able to connect different SaaS vendors across different organizations, even among multiple companies. They represent substantial leverage over point-to-point API-based data-sharing solutions that require building and operating a full, security-hardened and compliant integration between every pair of parties involved.

Because both the applications being connected and the underlying platform are all SaaS-based, there is no infrastructure to deploy or complex data modeling to perform, and deployments can move from prototyping to testing to production in the space of weeks rather than months or years. For teams already familiar with ETL “data exhaust” from these applications, the design pattern is identical, making deployment even more efficient because similar patterns of authorization and enablement can be followed.

This pattern can be easily repeated for other SaaS applications and takes advantage of the industrywide trend toward SaaS: Eventually, every major SaaS application will have real-time data connectors that simplify the sharing of data with similar applications across departmental, cloud or organization lines.

The design is also open-ended: “Hybrid” deployments can take advantage of the simplicity of connection to a SaaS system, such as a CRM provider like Salesforce, while also connecting (internally or through a partner’s implementation) to in-house applications (see Figure 2). This flexibility supports custom development of mission-critical applications without giving up the advantages of simple data connectivity to existing systems.

(For more on fully modeled solutions and their deployments, see below.)

Figure 2: A “hybrid” deployment showing connections through a partner’s implementation to in-house applications

The next step toward custom development is file-based sharing. This pattern shares with application-based sharing the advantage of not requiring the construction of a data model: The data model is essentially just a file system shared among the various parties. File-based approaches are more flexible than pure application-based solutions, however, because they can leverage legacy formats. Many existing cross-company data-sharing solutions are based on files, and a file-based sharing approach is a simple way to maintain compatibility while simultaneously progressing toward a modern data-sharing solution for real-time data needs. Figure 3 illustrates migrating from an sFTP-based “file depot” solution to a real-time data-sharing pattern based on files while preserving existing file formats and application-processing logic.

Figure 3: Migration from an sFTP-based “file depot” solution to a real-time data-sharing pattern based on files
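As a minimal sketch of the consumer side after such a migration, the fetch step moves from an sFTP pull to an S3-compatible API exposed by the sharing platform, while the file format and parsing logic stay untouched. The endpoint URL, bucket and object key below are hypothetical.

```python
import csv
import io

import boto3

# Hypothetical S3-compatible endpoint exposed by the sharing platform;
# the bucket and key mirror the paths used in the old sFTP depot.
shared_store = boto3.client(
    "s3",
    endpoint_url="https://files.example-datamesh.com",  # placeholder endpoint
)


def load_daily_orders(bucket: str = "partner-exchange",
                      key: str = "acme/orders/2022-05-28.csv") -> list[dict]:
    """Fetch a shared file and parse it with the same CSV logic the
    sFTP-era application used; only the transport has changed."""
    obj = shared_store.get_object(Bucket=bucket, Key=key)
    body = obj["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(body)))
```

Because only the endpoint changes, existing blob-storage tooling keeps working, which is the same property discussed below in the context of S3-compatible APIs.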

As with the application-based approach described above, access controls are critical: Each party needs to define, for the files it authors, which other parties should receive the data. In addition, files can be large, and best-of-breed platforms will actually distinguish between sharing the data and copying the data. This additional dimension of control allows the members of a data-sharing arrangement, whether they’re two regional deployments in an application, multiple organizations within a single company or multiple companies with a shared workload (such as a supply chain) to decide how many copies of a file are warranted. Copying controls allow parties to balance the cost of making copies with the operational isolation that “having your own copy” naturally affords.
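The exact policy surface varies by platform, but the control dimensions described above can be pictured as a small declarative policy. Every field and party name below is illustrative, not a real product schema.

```python
# Illustrative only: a declarative sharing policy that distinguishes
# "who can see this file" from "who keeps a physical copy."
sharing_policy = {
    "path": "acme/orders/*.csv",
    "author": "acme-logistics",
    "share_with": ["retailer-east", "retailer-west", "auditor"],  # read access
    "copy_to": ["retailer-east"],      # only this party holds its own replica
    "audit": {"log_reads": True, "retain_versions": 90},  # days of version history
}
```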

Real-time data mesh offerings also provide versioning, lineage (who changed what and when), built-in auditing, logging and reporting capabilities. These are essential for governing file-sharing systems over time and at scale; otherwise, the sheer weight of building appropriate compliance and security reporting can overwhelm already taxed teams. The more parties involved and the more "arm's length" they are from each other, the more critical fine-grained access controls (and commensurate reporting, versioning and auditing capabilities) become. Legacy blockchains and "walled garden" ERP and EAI solutions typically fail at this level of complexity because they don't easily provide simple file-sharing capabilities coupled with production-grade security and versioning controls.

The best file-sharing platforms also provide backward compatibility with existing public cloud blob storage APIs. This compatibility enables existing investments in popular cloud service APIs, such as AWS's S3, to be preserved intact while still offering seamless data sharing across organizations and clouds. Having cloud-based portability for files built in means that file-sharing solutions can also be used in-house to create multiregion, multi-account and multicloud strategies with just a few lines of configuration code, rather than the months or years of planning and development usually required for a complex "cross-cloud" data-sharing platform.

File-sharing solutions are easily extended to incrementally incorporate additional fine-grained data modeling. This optional process can proceed in graduated steps toward the fully modeled approach described below.

Even for teams that want to adopt fully modeled solutions (see below), file-based approaches can be an easy on-ramp, as they often permit existing application workloads and file formats to remain unchanged in the initial stages of adopting a real-time data mesh framework.

The “holy grail” of real-time data sharing is a fine-grained data model capable of automatically powering secure, scalable public APIs. While this approach requires having in hand a data model (also known as a data schema) acceptable to all the parties involved, from there the platform can take over: Modern platform approaches such as Vendia’s can generate APIs automatically, using nothing more than the data model itself. This includes not just sharing current data, but also versioning (“time travel” access to older versions of the data) and lineage/auditing (access to information about “who did what and when,” which is needed to create compliant end-to-end solutions that third parties can successfully audit). Figure 4 illustrates a fully modeled, fine-grained data-sharing architecture among multiple parties.

Figure 4: A fully modeled, fine-grained data sharing architecture among multiple parties.
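To make the "schema in, APIs out" idea concrete, the sketch below pairs a minimal JSON Schema data model with a query against the kind of GraphQL API a platform could generate from it. The schema follows the public JSON Schema standard; the endpoint, operation names and auth header are hypothetical, since generated APIs differ by vendor.

```python
import json

import requests

# A minimal, standards-based data model (JSON Schema) that all parties agree on.
shipment_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Shipment",
    "type": "object",
    "properties": {
        "shipmentId": {"type": "string"},
        "status": {"type": "string", "enum": ["CREATED", "IN_TRANSIT", "DELIVERED"]},
        "updatedAt": {"type": "string", "format": "date-time"},
    },
    "required": ["shipmentId", "status"],
}

# A query against the GraphQL API a platform could generate from that schema.
# Field and operation names are illustrative.
query = """
query RecentShipments {
  listShipments(filter: {status: {eq: IN_TRANSIT}}) {
    shipmentId
    status
    updatedAt
  }
}
"""

resp = requests.post(
    "https://graphql.example-datamesh.com/",   # placeholder endpoint
    headers={"Authorization": "Bearer <api-key>"},  # placeholder credential
    json={"query": query},
    timeout=30,
)
print(json.dumps(resp.json(), indent=2))
```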

As discussed above, sharing data is only half the battle: Just as it's important to get data swiftly from one party to another, it's important to ensure that only the right data is shared. Access controls, governance mechanisms and fully auditable tracing of these settings are key requirements, not just for enterprises but for any company operating in an environment where accidental sharing of personal data makes headlines. Fine-grained data models also provide a natural framework on which to "hang" metadata such as access controls, indexing requirements, and other operational and security annotations, allowing the platform to compile them automatically into a complete, SaaS-delivered solution.
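One way to picture "hanging" this metadata on the model is vendor-specific annotation keys placed alongside standard schema fields; the x-acl, x-index and x-pii keys below are purely illustrative assumptions.

```python
# Illustrative schema fragment: operational and security metadata attached
# directly to the field it governs (annotation keys are hypothetical).
annotated_field = {
    "customerEmail": {
        "type": "string",
        "format": "email",
        "x-acl": {"read": ["sales-team"], "write": ["crm-sync"]},  # per-field access
        "x-index": True,   # ask the platform to index this field for queries
        "x-pii": True,     # flag the field for compliance reporting
    }
}
```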

Real-time data mesh solutions don’t make challenges like authorization or authentication harder, but they do emphasize the inherent heterogeneity and security challenges associated with connecting clients that may vary dramatically from party to party. For example, one party might ingest data from a public cloud service and require a cloud native identity and access-control solution, while another party may have elected to distribute shared data to a mobile app running on millions of handheld devices. A successful platform needs to embrace, rather than bypass, these differences by supporting a variety of authentication and authorization mechanisms that can be customized on a per-party basis. As important as a shared-data and governance model is, allowing and supporting required differences among parties is equally critical.

Business relationships are constantly changing. Business needs, and the data that powers them, are constantly evolving. To be successful, a real-time data mesh needs to model the "sharing topology" as a first-class element and make it both simple and safe to evolve the data model over time to match the needs of the business.

Successful real-time data meshes address both of these needs: The parties sharing data, whether they represent multiple companies, different organizations within a single company, different cloud vendors, multiple SaaS applications, regional deployments or any combination thereof, need to be easy to capture and represent using configuration, rather than requiring complex code or tooling. The data model itself needs to be represented in a standards-based format, with the ability to augment or alter it in controlled ways over time, rather than in a proprietary representation that could lead to a "walled garden" problem down the road. By generating APIs and other infrastructure automatically from the data model, the platform can also guarantee backward compatibility for clients, ensuring that as the data model evolves, applications and other parties aren't left broken and unable to continue sharing data effectively.
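Backward-compatible evolution generally means additive, optional changes. A minimal illustration, reusing the hypothetical Shipment schema from the earlier sketch:

```python
# Version 2 of the (illustrative) Shipment model: adds an optional field.
# Because "carrier" is not required, clients generated from version 1 keep
# working unchanged; new clients can begin reading and writing it immediately.
shipment_schema_v2 = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Shipment",
    "type": "object",
    "properties": {
        "shipmentId": {"type": "string"},
        "status": {"type": "string", "enum": ["CREATED", "IN_TRANSIT", "DELIVERED"]},
        "updatedAt": {"type": "string", "format": "date-time"},
        "carrier": {"type": "string"},   # new, optional in v2
    },
    "required": ["shipmentId", "status"],  # unchanged from v1
}
```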

Once a deployment strategy has been selected, how can an IT organization run an effective vendor selection process? The next article provides a vendor-evaluation methodology that incorporates the requirements surfaced by these design strategies to help locate a best-of-breed platform.

Looking to learn more about real-time data meshes or their integration with analytical data solutions? The Vendia blog has a number of articles, including how these features surface in modern applications and get exposed through data-aware APIs.

In Part 3 of this series, we provide a vendor checklist that focuses on what’s needed to effectively evaluate real-time data-sharing solutions.
