Direct Answer: Sparx EA models data architecture across multiple layers: conceptual and logical data models (UML Class diagrams or ArchiMate Data Objects), physical data models (database schema), canonical data models for master data and API contracts, and data flow diagrams showing how data moves between systems. For CDOs and data platform teams, the significant development is EA GraphLink Interface B — the MCP Server that allows AI agents (including Microsoft Fabric’s agent framework) to query the EA repository as a semantic layer. This means your EA model is not just documentation — it can serve as the governed semantic layer that AI and BI tools read from directly. The prerequisite is the same as for any EA GraphLink use case: MDG governance that makes the data architecture machine-readable.
Key Takeaways
- Sparx EA supports data architecture at conceptual, logical, and physical levels — from business-facing Data Objects to database schema.
- The canonical data model in Sparx EA is the semantic contract between business capabilities, application APIs, and data platform entities.
- EA GraphLink’s MCP Server enables Microsoft Fabric agents and other AI tools to query the EA data architecture as a governed semantic layer.
- Data Mesh federated governance is modelable in Sparx EA — domain ownership, data contracts, and data product definitions as first-class architecture elements.
- CDOs who integrate their EA model with their data platform get live semantic governance, not a documentation artefact.
How Sparx EA Models Data Architecture
Sparx EA supports data architecture across four modelling levels, each serving a different audience and purpose:
Conceptual data model. At the highest level of abstraction, ArchiMate Business Objects (Business layer) and Data Objects (Application layer) represent business entities such as Customer, Product, Order, and Contract, without implementation detail. These are the entities that appear in capability maps and business process models. The conceptual model creates the shared vocabulary for data conversations between business and IT.
Logical data model. UML Class diagrams in Sparx EA are the standard for logical data modelling — entities with attributes and typed relationships (Association, Aggregation, Composition, Inheritance). The logical model is technology-independent: it describes the data structure required to support business processes without specifying the physical implementation (relational, document, graph). Sparx EA’s UML Class support includes full cardinality notation, association classes, and interface definitions.
Canonical data model. The canonical model is the authoritative schema for data exchange — the definition of Customer that is used in the CRM, the ERP, and the data warehouse, resolved to a single governed definition. In Sparx EA, the canonical model lives as a governed package of UML Class elements or ArchiMate Data Objects with explicit ownership, version tagging, and traceability to the source systems and consuming applications that reference each entity.
Physical data model. Sparx EA supports physical data modelling through its database schema modelling toolset — tables, columns, primary/foreign keys, indexes. DDL generation and reverse-engineering from existing databases allows architects to import actual physical schemas and model them alongside the logical architecture. Physical models can be linked to their logical counterparts via dependency or realisation connectors.
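The step from a logical entity to a physical table can be illustrated outside the tool. The following is a minimal Python sketch, not Sparx EA's DDL generator: the `LogicalEntity` and `Attribute` classes and the `TYPE_MAP` type mapping are hypothetical names introduced only for this example, and SQLite stands in for whatever target database is actually used.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical mapping from technology-independent logical types to SQLite types.
TYPE_MAP = {"String": "TEXT", "Integer": "INTEGER", "Date": "TEXT", "Decimal": "NUMERIC"}


@dataclass
class Attribute:
    name: str
    logical_type: str
    is_key: bool = False


@dataclass
class LogicalEntity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def to_ddl(entity: LogicalEntity) -> str:
    """Render a CREATE TABLE statement for one logical entity."""
    columns = []
    for attr in entity.attributes:
        column = f"{attr.name} {TYPE_MAP[attr.logical_type]}"
        if attr.is_key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {entity.name} ({', '.join(columns)})"


customer = LogicalEntity("Customer", [
    Attribute("customer_id", "Integer", is_key=True),
    Attribute("full_name", "String"),
    Attribute("onboarded_on", "Date"),
])
ddl = to_ddl(customer)

# Running the statement against an in-memory database confirms it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
physical_columns = [row[1] for row in conn.execute("PRAGMA table_info(Customer)")]
```

Keeping the type mapping in one table mirrors the idea that the logical model stays technology-independent: only the mapping changes when the physical target changes.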
Connecting Data Architecture to Business Architecture
The value of data architecture in an EA context is the connection upward to business capabilities and downward to physical systems. Without these connections, data models are technical documentation. With them, they are traceable governance instruments.
Capabilities to information to data. The chain runs: Business Capability (what the organisation can do) → Business Object or Data Object (what information supports that capability) → Logical Data Entity (how that information is structured) → Physical Table/Document (where it lives). In Sparx EA, these connections are modelled via Association, Realisation, and Dependency connectors. An architect can trace from “we need to improve our Customer Onboarding capability” to “which data entities are critical to that capability” to “which physical systems host those entities” — and therefore which systems are in scope for a data platform migration.
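The capability-to-system trace described above amounts to a graph walk over connectors. A minimal sketch in Python, assuming a hypothetical in-memory edge list; the element names and the choice to store every edge in the downward trace direction are illustrative assumptions, not an export format of Sparx EA.

```python
from collections import deque

# Hypothetical model fragment: (source, connector type, target).
# Edges are stored in trace direction (capability downward) for simplicity,
# regardless of the direction in which the connector is drawn in the model.
EDGES = [
    ("Customer Onboarding", "Association", "Customer"),
    ("Customer Onboarding", "Association", "Contract"),
    ("Customer", "Realisation", "CustomerEntity"),
    ("Contract", "Realisation", "ContractEntity"),
    ("CustomerEntity", "Dependency", "crm.customers"),
    ("ContractEntity", "Dependency", "erp.contracts"),
    ("Order Fulfilment", "Association", "Order"),
]


def trace_down(start: str) -> list[str]:
    """Breadth-first walk from a capability to everything it depends on."""
    adjacency: dict[str, list[str]] = {}
    for source, _kind, target in EDGES:
        adjacency.setdefault(source, []).append(target)

    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


in_scope = trace_down("Customer Onboarding")
```

Here `in_scope` lists the business objects first, then the logical entities, then the physical tables, which is the order in which an architect would scope a migration.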
Application to data traceability. ArchiMate’s realisation relationship connects Application Components to the Data Objects they manage. This creates the application-to-data map: which systems own which data, which systems consume which data, and where master data governance responsibility lies. For Master Data Management programmes, this model is foundational.
Data Lineage in Sparx EA
Data lineage — the traceable path of data from its origin through transformations to its final use — is increasingly important for regulatory compliance (GDPR Article 30 record of processing, DORA operational resilience) and for data quality governance.
Sparx EA models data lineage through:
Data Flow Diagrams. Data flows between systems can be modelled explicitly using ArchiMate Flow connectors, or using BPMN Data Objects attached to process steps through data associations. A data flow model shows, for example, that Customer Data enters at the CRM, flows to the Data Warehouse via an ETL layer, is transformed into Customer Analytics Data, and is consumed by the BI platform.
Information Flow Diagrams. ArchiMate’s Association connectors with the «Flow» stereotype (or ArchiMate Information Flow elements in version 3.1+) capture information flows at the architectural level — between application services, between business functions, and across organisational boundaries.
Transformation mapping. In Sparx EA, transformation steps can be modelled as intermediate process or service elements with tagged values capturing transformation rules, data quality rules, and responsible teams. This is less detailed than a data catalogue tool’s lineage but serves the architectural traceability purpose — showing what transforms what, without specifying field-level mapping.
For full field-level lineage (column to column, transformation logic), dedicated data catalogue tools (Microsoft Purview, Collibra, Alation) are more appropriate. The Sparx EA model provides the architectural-level lineage that contextualises the field-level detail.
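Architectural-level lineage of this kind can be queried as a simple upstream walk over flow records. The sketch below is illustrative only: the `FLOWS` list is a hypothetical extract built from the CRM-to-BI example above (plus one invented ERP feed), and it works at system level, not field level.

```python
# Hypothetical flow records: (source system, target system, data carried).
FLOWS = [
    ("CRM", "ETL Layer", "Customer Data"),
    ("ETL Layer", "Data Warehouse", "Customer Data"),
    ("Data Warehouse", "BI Platform", "Customer Analytics Data"),
    ("ERP", "Data Warehouse", "Order Data"),
]


def upstream(system: str) -> set[str]:
    """Return every system that directly or indirectly feeds `system`."""
    found: set[str] = set()
    stack = [system]
    while stack:
        current = stack.pop()
        for source, target, _data in FLOWS:
            if target == current and source not in found:
                found.add(source)
                stack.append(source)
    return found


bi_sources = upstream("BI Platform")
```

The `found` set doubles as a visited marker, so the walk terminates even if the modelled flows contain a cycle.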
The EA GraphLink Path to Microsoft Fabric
This is the development most relevant to data platform teams in 2025 and 2026.
Microsoft Fabric’s agent framework — part of the Fabric AI capabilities in the Microsoft 365 Copilot ecosystem — allows AI agents to query connected data sources via MCP (Model Context Protocol). EA GraphLink Interface B is an MCP Server. This means:
A Fabric agent can query the Sparx EA repository as a semantic source — asking “what data entities does the Customer Onboarding capability depend on?” or “which applications are authoritative sources for Product master data?” — and receive structured answers from the governed EA model.
The practical implication for data platform teams: your EA model can serve as the semantic governance layer for your Fabric data platform. Instead of manually maintaining semantic models in Fabric’s OneLake or Power BI datasets, AI agents can interrogate the EA repository — the authoritative source of your data architecture — and use that to inform data access, transformation decisions, and impact analysis.
This integration is not theoretical. EA GraphLink MCP connectivity is deployable today. The quality of AI agent responses is determined by the MDG governance of the EA data architecture model. A well-governed canonical data model in Sparx EA, connected via EA GraphLink MCP to a Fabric agent, produces reliable semantic intelligence. A poorly-governed model produces unreliable agent responses.
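At the protocol level, an MCP client invokes a server tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds such a message in Python. The tool name `query_model` and its `question` argument are placeholders: this article does not document the tools EA GraphLink actually exposes, so treat them as assumptions.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to be identifiable.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# `query_model` and `question` are hypothetical, not a documented EA GraphLink tool.
request = build_tool_call(
    "query_model",
    {"question": "Which applications are authoritative sources for Product master data?"},
)
decoded = json.loads(request)
```

In practice an agent framework generates these messages itself; the point of the sketch is that the agent's question reaches the repository as a structured call, so the answer can only be as consistent as the model behind it.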
Data Governance Use Cases
Master data governance. A canonical data model in Sparx EA defines the authoritative version of each master data domain (Customer, Product, Employee, Location). Tagged values on each entity capture the authoritative source system, the data steward, the governance status, and the data quality SLA. EA GraphLink exposes this to BI dashboards showing master data coverage and quality status.
API contract to data model traceability. API contracts (OpenAPI / JSON Schema specifications) can be imported or modelled in Sparx EA, linked to the logical data entities they expose. This traceability supports API governance: when the logical model changes, which APIs are affected? When an API is deprecated, which data entities lose their consumption path?
Federated data governance (Data Mesh). Data Mesh assigns each data domain to an autonomous team that owns specific data products. In Sparx EA, each data domain is a package with a defined set of Data Product elements, data contract specifications (tagged values for schema, SLA, quality guarantees), and explicit consumer-producer relationships between domains. The governance layer (who owns what, what the contracts are, where disputes are resolved) is modelled and queryable.
GDPR Article 30 records of processing. Records of processing require documentation of personal data flows across systems, the legal basis for each processing activity, data retention periods, and third-party sharing. Sparx EA can model these flows with GDPR-specific tagged values on data flow elements, producing an auditable, queryable record of processing that is far more maintainable than a spreadsheet register.
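To make the "queryable record of processing" concrete, here is a minimal Python sketch. The `ProcessingFlow` class, its field names, and the sample flows are hypothetical stand-ins for GDPR tagged values on flow elements; they are not a prescribed Sparx EA profile.

```python
from dataclasses import dataclass


@dataclass
class ProcessingFlow:
    """One personal-data flow with hypothetical GDPR tagged values."""
    source: str
    target: str
    data_category: str
    legal_basis: str       # empty string means the tag has not been filled in
    retention_period: str  # empty string means the tag has not been filled in
    third_party: bool


FLOWS = [
    ProcessingFlow("CRM", "Data Warehouse", "Customer contact details", "Contract", "7 years", False),
    ProcessingFlow("CRM", "Email Provider", "Customer contact details", "Consent", "2 years", True),
    ProcessingFlow("HR System", "Payroll Bureau", "Employee bank details", "", "", True),
]


def label(flow: ProcessingFlow) -> str:
    return f"{flow.source} -> {flow.target}"


# Flows that share personal data with a third party.
third_party_flows = [label(f) for f in FLOWS if f.third_party]

# Flows whose record is incomplete and would fail an audit.
gaps = [label(f) for f in FLOWS if not f.legal_basis or not f.retention_period]
```

The same two list comprehensions would be awkward to maintain by hand in a spreadsheet, which is the maintainability argument made above.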
MDG Requirements for Data Architecture
The same MDG principle that governs capability maps and application portfolio models applies to data architecture: consistency of element types and tagged values determines whether the model is queryable.
For data architecture specifically, the MDG requirements are:
- A «DataEntity» or «DataObject» stereotype with tagged values for DataOwner, MasterDataDomain, GDPRClassification, RetentionPeriod, and SourceSystem.
- Consistent connector types between data elements: realisation for application-to-data ownership, association or flow for data movement, dependency for data consumption.
- A canonical data model package with version management (tagged values for SchemaVersion, EffectiveDate, ApprovalStatus).
- Logical-to-physical traceability connectors linking UML Class elements to physical table elements or external system representations.
Without these conventions, an EA GraphLink query — “which data entities contain personal data and which systems host them?” — cannot return a reliable answer.
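The effect of missing conventions can be shown with a small conformance check. This is a sketch under stated assumptions: the `ELEMENTS` list is a hypothetical export of stereotyped elements, and the only rule enforced is the presence of the five tagged values listed above.

```python
REQUIRED_TAGS = (
    "DataOwner", "MasterDataDomain", "GDPRClassification",
    "RetentionPeriod", "SourceSystem",
)

# Hypothetical export of stereotyped elements and their tagged values.
ELEMENTS = [
    {"name": "Customer", "stereotype": "DataEntity", "tags": {
        "DataOwner": "Head of Sales", "MasterDataDomain": "Customer",
        "GDPRClassification": "Personal", "RetentionPeriod": "7 years",
        "SourceSystem": "CRM"}},
    {"name": "Product", "stereotype": "DataEntity", "tags": {
        "DataOwner": "Head of Product", "MasterDataDomain": "Product",
        "GDPRClassification": "None", "RetentionPeriod": "10 years",
        "SourceSystem": "ERP"}},
    {"name": "Employee", "stereotype": "DataEntity", "tags": {
        "DataOwner": "HR Director", "GDPRClassification": "Personal"}},
]


def missing_tags(element: dict) -> list[str]:
    """List the required tagged values that are absent or empty."""
    return [tag for tag in REQUIRED_TAGS if not element["tags"].get(tag)]


def personal_data_hosts(elements: list[dict]) -> dict[str, str]:
    """Map personal-data entities to host systems, skipping non-conformant elements."""
    return {
        e["name"]: e["tags"]["SourceSystem"]
        for e in elements
        if e["tags"].get("GDPRClassification") == "Personal" and not missing_tags(e)
    }


violations = {e["name"]: missing_tags(e) for e in ELEMENTS if missing_tags(e)}
hosts = personal_data_hosts(ELEMENTS)
```

Employee holds personal data but has no SourceSystem tag, so it drops out of `hosts` entirely. That silent omission is exactly the unreliable answer the conventions are meant to prevent.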
FAQ
Can Sparx EA replace a dedicated data modelling tool? Sparx EA can handle conceptual, logical, and physical data modelling effectively — UML Class diagrams with full cardinality, database schema modelling with DDL generation, and data flow diagrams. For organisations that want a single governed repository for both enterprise architecture and data architecture, Sparx EA is a viable choice. For teams that need extremely detailed physical schema management with automated CI/CD integration (data versioning, migration scripts), a dedicated tool like Erwin, PowerDesigner, or dbt for physical layer management may complement Sparx EA rather than be replaced by it.
How does EA GraphLink connect to Microsoft Fabric? EA GraphLink Interface B is an MCP (Model Context Protocol) Server. Microsoft Fabric’s AI agent framework supports MCP connectivity, allowing Fabric agents to query EA GraphLink as a semantic data source. This enables Fabric agents to ask natural language questions about the EA data architecture — entity ownership, lineage, governance status — and receive structured answers from the governed EA repository. The integration requires EA GraphLink deployment and MCP configuration; the Connect offering covers this.
What is a canonical data model and how do I build one in Sparx EA? A canonical data model is the authoritative, business-agreed definition of shared data entities: the single version of Customer, Product, and Order that all systems reference. In Sparx EA, build it as a governed package of stereotyped UML Class elements whose tagged values capture ownership, governance status, and version. Link it to source systems (realisation connectors from applications to data entities) and to consuming systems (dependency connectors). The canonical model is the semantic contract; the package governance (review cadence, change process, version management) is the governance practice.
How does Sparx EA model data lineage? Data lineage in Sparx EA is modelled at the architectural level, as system-to-system and process-to-system data flows, using ArchiMate Flow connectors, Information Flow elements, or BPMN Data Objects with data associations. Tagged values on flow elements capture transformation type, data quality rules, and responsible teams. For field-level lineage (column-to-column transformation), dedicated data catalogue tools are more appropriate; Sparx EA provides the architectural context that makes field-level lineage meaningful.
What is a Data Mesh and can Sparx EA model it? Data Mesh is a data architecture pattern that distributes data ownership to domain teams, each responsible for their own data products with explicit contracts for cross-domain consumption. Sparx EA can model Data Mesh by creating domain packages, data product elements (with tagged values for schema, SLA, quality guarantees), and explicit consumer-producer relationships between domains. The governance model — domain ownership, contract management, dispute resolution — is documented as architecture principles and decision records in the same repository.
What MDG configuration does data architecture modelling need? Data architecture in Sparx EA benefits from a custom MDG extension defining: a DataEntity stereotype with tagged values for DataOwner, MasterDataDomain, GDPRClassification, RetentionPeriod, and SourceSystem; a DataProduct stereotype for Data Mesh patterns with SLA and schema version tagged values; and consistent connector type conventions (realisation for ownership, flow for movement, dependency for consumption). Without this MDG structure, EA GraphLink queries on the data architecture return incomplete or inconsistent results.
How does data architecture in Sparx EA support GDPR compliance? Sparx EA can model GDPR Article 30 records of processing by representing personal data flows between systems with GDPR-specific tagged values: legal basis for processing, data subject categories, retention period, third-party sharing, and data controller/processor classification. This produces a queryable, auditable record of processing far more maintainable than a spreadsheet. When connected via EA GraphLink, compliance status dashboards can be generated automatically from the repository.
Do I need to replace our data catalogue if we implement EA GraphLink? No. EA GraphLink and data catalogue tools serve complementary purposes. The EA repository models architectural-level data governance — canonical data models, application-to-data ownership, cross-system data flows. Data catalogues provide field-level lineage, business glossary terms, and data quality metrics at the technical asset level. The two can be integrated: EA GraphLink exposes the architectural context that data catalogues lack; data catalogues provide the technical asset detail that EA models do not capture. Consider the integration a maturity ambition, not a prerequisite.
Connect Your EA Model to Your Data Platform
Sparx Services’ Connect offering deploys EA GraphLink — including Interface B (MCP Server) for Microsoft Fabric and AI tool integration — giving your data platform teams access to the EA repository as a governed semantic layer.
For CDOs building toward AI-augmented data governance, this is the connectivity layer that makes it possible.