
Data Mesh Architecture Pattern

Pattern Name: Data Mesh
Category: Data Architecture / Distributed Systems
Complexity: High
Stability: Emerging (conceptually mature, but implementations vary significantly)


Context

Centralized data platforms (data lakes, data warehouses, enterprise data hubs) became the dominant architectural pattern for analytics and reporting through the 2010s. The promise was clear: bring all your data to one place, clean it up, and make it available for analysis. For many large organizations, the reality has been a centralized bottleneck: a small data platform team that is responsible for ingesting, transforming, and serving data for the entire enterprise, cannot keep pace with demand, and is disconnected from the business domains that understand the data best.


Problem

How do you scale data access and quality across a large, complex organization when a centralized data platform team cannot keep up with the volume and variety of data demands from multiple business domains?

Centralized data platforms suffer from a fundamental organizational tension: the people who understand what a dataset means (the domain teams that produce it) are not the people responsible for making it available for analysis (the central data team). This creates quality problems, slow time-to-insight, and a bottleneck that grows with organizational complexity. Adding more capacity to the central team does not solve the structural problem, because every request still passes through a team that lacks the domain knowledge.


Solution

Treat data as a product owned by the domain that produces it. Each domain team is responsible for the full lifecycle of its data products (collection, storage, quality, governance, and publication), using a self-serve data platform provided by a central team.

The data mesh pattern has four core principles:

Domain ownership: Each business domain owns and is accountable for its data products. The team that produces the Customer data is responsible for its quality, schema, and availability. This aligns accountability with knowledge.

Data as a product: Domain teams treat their datasets as products with internal consumers. Each data product has a defined schema (contract), SLAs for freshness and quality, and documentation. Data products are discoverable and versioned.

Self-serve data platform: A central platform team provides the infrastructure and tooling that domain teams use to produce and consume data products (storage, compute, cataloging, lineage tracking, access control) without requiring central mediation for each data request.

Federated computational governance: A central governance function sets the standards (data quality thresholds, classification requirements, interoperability standards, security policies) that all data products must meet. Compliance is enforced computationally, through automated checks, rather than manually by a central team reviewing each dataset.
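To make "enforced computationally" concrete, here is a minimal Python sketch of an automated governance check. It assumes a hypothetical `DataProduct` descriptor (a schema contract plus freshness and completeness SLAs) and a hypothetical `GovernancePolicy` holding the global standards; the names, fields, and thresholds are illustrative and do not come from any specific data mesh platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class DataProduct:
    """Hypothetical descriptor: a data product's contract and its SLAs."""
    name: str
    domain: str
    owner: str
    schema_version: str
    schema: dict[str, str]      # column name -> type: the published contract
    freshness_sla: timedelta    # maximum acceptable age of the latest data
    min_completeness: float     # product's own quality target, 0.0 to 1.0
    last_updated: datetime      # as reported by the self-serve platform
    completeness: float         # measured share of non-missing values


@dataclass
class GovernancePolicy:
    """Hypothetical global standards set by the federated governance function."""
    max_freshness_sla: timedelta = timedelta(hours=24)
    min_completeness_floor: float = 0.95
    required_columns: set[str] = field(default_factory=lambda: {"id"})


def check(product: DataProduct, policy: GovernancePolicy, now: datetime) -> list[str]:
    """Return the policy violations for one product; an empty list means compliant."""
    violations = []
    # Global standards: every product's declared SLAs must be at least this strict.
    if product.freshness_sla > policy.max_freshness_sla:
        violations.append("freshness SLA looser than the global maximum")
    if product.min_completeness < policy.min_completeness_floor:
        violations.append("completeness target below the global floor")
    missing = policy.required_columns - set(product.schema)
    if missing:
        violations.append(f"schema missing required columns: {sorted(missing)}")
    # Runtime checks: is the product currently meeting its own declared SLAs?
    if now - product.last_updated > product.freshness_sla:
        violations.append("data older than its freshness SLA")
    if product.completeness < product.min_completeness:
        violations.append("measured completeness below target")
    return violations


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
customer_profile = DataProduct(
    name="Customer Profile", domain="Customer", owner="customer-data-team",
    schema_version="2.1.0", schema={"id": "string", "email": "string"},
    freshness_sla=timedelta(hours=6), min_completeness=0.99,
    last_updated=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
    completeness=0.995,
)
print(check(customer_profile, GovernancePolicy(), now))
# ['data older than its freshness SLA']
```

The split between global standards and per-product SLAs mirrors the federated model: the governance function sets the floor, and each domain declares its own targets within it and is measured against them.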


Sparx EA Implementation Notes

Element types to use:

Diagram types: ArchiMate Application and Data Layer diagrams showing data products by domain; ArchiMate Motivation diagram linking data products to strategic data governance principles; network/topology diagram showing the mesh of data product relationships.

MDG considerations: A well-designed MDG (Model Driven Generation) profile for data mesh is essential, because data products need consistent metadata to be discoverable and governable. Mandatory tagged values on «DataProduct» elements should include: domain, owner, classification, schema version, freshness SLA, and consumer list. Validation rules should prevent publication of data products that are missing required metadata. This is precisely the kind of governance that breaks down in ungoverned repositories.
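A minimal sketch of the publication gate described above, assuming the tagged values of a «DataProduct» element have already been read into a plain Python dictionary. The tag names (`schemaVersion`, `freshnessSLA`, `consumers`), the classification vocabulary, and the `validate_data_product` and `publish` helpers are hypothetical; the sketch does not use the Sparx EA automation API.

```python
from collections.abc import Mapping

# Mandatory tagged values listed above; the exact tag names are assumptions.
REQUIRED_TAGS = ("domain", "owner", "classification",
                 "schemaVersion", "freshnessSLA", "consumers")
# Hypothetical classification vocabulary set by the governance function.
ALLOWED_CLASSIFICATIONS = {"Public", "Internal", "Confidential", "Restricted"}


def validate_data_product(tags: Mapping[str, str]) -> list[str]:
    """Return validation errors for one «DataProduct» element's tagged values."""
    # A tag counts as missing if it is absent or contains only whitespace.
    errors = [f"missing tagged value: {tag}"
              for tag in REQUIRED_TAGS if not tags.get(tag, "").strip()]
    classification = tags.get("classification", "").strip()
    if classification and classification not in ALLOWED_CLASSIFICATIONS:
        errors.append(f"unknown classification: {classification}")
    return errors


def publish(name: str, tags: Mapping[str, str]) -> None:
    """Refuse to publish a data product whose metadata is incomplete."""
    errors = validate_data_product(tags)
    if errors:
        raise ValueError(f"cannot publish {name!r}: " + "; ".join(errors))
    print(f"published {name!r}")


tags = {"domain": "Finance", "owner": "finance-data-team",
        "classification": "Secret", "schemaVersion": "1.0.0",
        "freshnessSLA": "24h"}
print(validate_data_product(tags))
# ['missing tagged value: consumers', 'unknown classification: Secret']
```

Returning the full list of errors, rather than failing on the first one, lets a domain team fix all missing metadata in a single pass before retrying publication.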

Package structure: Create a Data Architecture package with sub-packages per domain (Customer, Finance, Operations, etc.), each containing that domain’s data products. A central Data Platform package documents the self-serve infrastructure. A Governance package documents the federated standards and policies.

EA GraphLink and AI integration: Data mesh governance lends itself naturally to AI-assisted querying. With EA GraphLink connected, questions like “which data products does the Customer domain own?”, “which data products are consumed by more than five applications?”, and “which data products have exceeded their freshness SLA?” become answerable via the MCP interface. For data stewards and governance teams, this provides real-time visibility into the data mesh without manual reporting. AI tools can also answer lineage questions, such as “what is the upstream source of the Revenue data product?”, by traversing the relationship graph in the EA model.
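Underneath, questions like these are filters and traversals over the model's relationship graph. The sketch below shows that logic over a small in-memory stand-in for the EA model; it does not call EA GraphLink or the MCP interface, and the `Product` fields, helper functions, and sample data are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical node in the model's data product relationship graph."""
    name: str
    domain: str
    freshness_sla_hours: float
    age_hours: float                                  # time since last refresh
    consumers: set[str] = field(default_factory=set)  # consuming applications
    sources: list[str] = field(default_factory=list)  # upstream data products


def owned_by(model: dict[str, Product], domain: str) -> list[str]:
    return sorted(p.name for p in model.values() if p.domain == domain)


def widely_consumed(model: dict[str, Product], threshold: int = 5) -> list[str]:
    return sorted(p.name for p in model.values() if len(p.consumers) > threshold)


def stale(model: dict[str, Product]) -> list[str]:
    return sorted(p.name for p in model.values() if p.age_hours > p.freshness_sla_hours)


def upstream(model: dict[str, Product], name: str) -> list[str]:
    """Breadth-first traversal of 'sources' edges; each node is visited once."""
    seen: set[str] = set()
    queue = deque(model[name].sources)
    while queue:
        source = queue.popleft()
        if source in seen:
            continue
        seen.add(source)
        if source in model:  # sources outside the model are reported but not expanded
            queue.extend(model[source].sources)
    return sorted(seen)


model = {p.name: p for p in [
    Product("Customer Profile", "Customer", 6, 2,
            {"CRM", "Billing", "Marketing", "Support", "Web", "Mobile"}),
    Product("Customer Segments", "Customer", 24, 30, {"Marketing"}, ["Customer Profile"]),
    Product("Orders", "Operations", 1, 3, {"Billing"}, ["Customer Profile"]),
    Product("Revenue", "Finance", 24, 4, {"Reporting"}, ["Orders", "Customer Profile"]),
]}

print(owned_by(model, "Customer"))   # ['Customer Profile', 'Customer Segments']
print(widely_consumed(model))        # ['Customer Profile']
print(stale(model))                  # ['Customer Segments', 'Orders']
print(upstream(model, "Revenue"))    # ['Customer Profile', 'Orders']
```

The `seen` set keeps the lineage traversal finite even if the model contains shared upstream sources or accidental cycles between data products.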


When to Use

When Not to Use


Related Patterns

Ready to make your EA investment work harder?

Talk to a Sparx Services architect about where your organization is on the journey and what the next stage looks like.