
Data Architecture in Sparx EA: A Guide for CDOs and Data Platform Teams

By Sparx Services · October 10, 2025 · 7 min read

The short version: Sparx EA models data architecture across four levels — conceptual, logical, canonical, and physical — and links each one upward to business capabilities and outward to the systems that host the data. For a CDO, the payoff is that the EA model stops being documentation and becomes a governed semantic layer your BI and data-platform tools can read from. The prerequisite is the same as for any serious EA use case: governance that makes the model consistent enough to query.

Data models on their own are technical documentation. What makes them governance instruments is the connection upward to capabilities and downward to physical systems — so a question like "which systems are in scope for this migration?" can be traced rather than reconstructed. That tracing only works if the four modeling levels are built deliberately.

The four modeling levels

Conceptual

Business entities, no implementation detail

At the highest abstraction, ArchiMate Data Objects represent business entities — Customer, Product, Order, Contract — without implementation detail. These are the entities that appear in capability maps and process models, creating the shared vocabulary for data conversations between business and IT.

Logical

Technology-independent structure

UML Class diagrams are the standard for logical modeling: entities with attributes and typed relationships (Association, Aggregation, Composition, Generalization). The logical model describes the data structure required to support business processes without specifying whether it lands in a relational, document, or graph store. Sparx EA supports full multiplicity (cardinality) notation, association classes, and interface definitions.

Canonical

The single governed definition

The canonical model is the authoritative schema for data exchange — the one definition of Customer used across the CRM, the ERP, and the warehouse. In Sparx EA it lives as a governed package of UML Class elements or ArchiMate Data Objects with explicit ownership, version tagging, and traceability to the systems that produce and consume each entity.

Physical

Tables, columns, keys

Physical data modeling uses Sparx EA's database engineering toolset to model tables, columns, primary and foreign keys, and indexes. DDL generation and reverse engineering let architects import actual schemas and model them alongside the logical architecture, linking physical tables to logical entities via dependency or realization connectors.
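To make the logical-to-physical step concrete, here is a minimal Python sketch of what DDL generation does conceptually. It is not Sparx EA's generator: the `LogicalEntity` and `Attribute` classes, the `TYPE_MAP` type mapping, and the `crm_customer` table name are hypothetical, and a real target database would need its own dialect types and constraints.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from logical attribute types to generic SQL column types.
TYPE_MAP = {"String": "VARCHAR(255)", "Integer": "INTEGER",
            "Date": "DATE", "Decimal": "DECIMAL(18,2)"}


@dataclass
class Attribute:
    name: str
    logical_type: str
    is_key: bool = False
    nullable: bool = True


@dataclass
class LogicalEntity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def to_ddl(entity: LogicalEntity, table_name: str) -> str:
    """Render one logical entity as a CREATE TABLE statement."""
    lines = []
    for attr in entity.attributes:
        # Key columns are always NOT NULL, whatever the logical model says.
        not_null = attr.is_key or not attr.nullable
        lines.append(f"  {attr.name} {TYPE_MAP[attr.logical_type]}"
                     + (" NOT NULL" if not_null else ""))
    keys = [attr.name for attr in entity.attributes if attr.is_key]
    if keys:
        lines.append(f"  PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(lines) + "\n);"


customer = LogicalEntity("Customer", [
    Attribute("customer_id", "Integer", is_key=True),
    Attribute("full_name", "String", nullable=False),
    Attribute("birth_date", "Date"),
])
ddl = to_ddl(customer, "crm_customer")
print(ddl)
```

The same logical entity could be rendered to several tables (CRM, warehouse), each of which would then be linked back to it in the model.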

Connecting data architecture to business architecture

Two chains of connectors make that traceability concrete: one runs from capabilities down to physical data, the other from applications to the data they manage.

Capabilities to information to data. The chain runs: Business Capability (what the organization can do) → Business or Data Object (what information supports it) → Logical Data Entity (how that information is structured) → Physical Table or Document (where it lives). Modeled via Association, Realization, and Dependency connectors, this lets an architect trace from "we need to improve Customer Onboarding" to "which data entities are critical to it" to "which physical systems host them" — and therefore which systems are in scope for a platform migration.
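The chain is a directed graph, so scoping a migration is a graph walk. A minimal sketch, assuming the trace links have been exported as `(source, connector, target)` tuples and that physical tables follow a hypothetical `System.table` naming convention; all element names are illustrative.

```python
from collections import deque

# Hypothetical trace links exported from the model: (source, connector, target).
LINKS = [
    ("Customer Onboarding", "association", "Customer Information"),
    ("Customer Onboarding", "association", "Consent Record"),
    ("Customer Information", "realization", "Customer (logical)"),
    ("Consent Record", "realization", "Consent (logical)"),
    ("Customer (logical)", "dependency", "CRM.customer"),
    ("Customer (logical)", "dependency", "DWH.dim_customer"),
    ("Consent (logical)", "dependency", "ConsentHub.consent"),
]


def downstream(start: str, links) -> list[str]:
    """Breadth-first walk from a capability to everything it traces to."""
    graph: dict[str, list[str]] = {}
    for src, _kind, dst in links:
        graph.setdefault(src, []).append(dst)
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


reached = downstream("Customer Onboarding", LINKS)
# Physical tables are named "<System>.<table>", so the prefix gives migration scope.
systems_in_scope = sorted({node.split(".")[0] for node in reached if "." in node})
print(systems_in_scope)
```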

Application to data traceability. ArchiMate's realization relationship connects application components to the Data Objects they manage. That creates the application-to-data map: which systems own which data, which systems consume it, and where master data governance responsibility lies. For Master Data Management programs, this map is foundational.
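A sketch of the application-to-data map built from exported relationships, assuming the connector conventions described later in this article (realization for ownership, dependency for consumption). The `RELATIONS` data is hypothetical. The two checks at the end are typical MDM red flags: a data object with more than one owning system, and one that is consumed but owned by nothing.

```python
from collections import defaultdict

# Hypothetical export: (application, connector type, data object).
RELATIONS = [
    ("CRM", "realization", "Customer"),
    ("ERP", "realization", "Customer"),
    ("ERP", "realization", "Product"),
    ("Webshop", "dependency", "Product"),
    ("BI Platform", "dependency", "Customer"),
    ("BI Platform", "dependency", "Invoice"),
]


def ownership_map(relations):
    """Split relationships into owners (realization) and consumers (dependency)."""
    owners, consumers = defaultdict(set), defaultdict(set)
    for app, rel, data in relations:
        if rel == "realization":
            owners[data].add(app)
        elif rel == "dependency":
            consumers[data].add(app)
    return owners, consumers


owners, consumers = ownership_map(RELATIONS)
data_objects = set(owners) | set(consumers)
# More than one owner: unclear master. No owner at all: consumed from nowhere.
contested = sorted(d for d in data_objects if len(owners[d]) > 1)
orphaned = sorted(d for d in data_objects if not owners[d])
print(contested, orphaned)
```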

Modeling data lineage

Data lineage, the traceable path of data from origin through transformation to final use, increasingly matters for regulatory compliance (GDPR Article 30 records of processing, DORA operational resilience) and for data quality governance. Sparx EA models it at the architectural level through:

Data flow diagrams. Using ArchiMate Flow relationships, or BPMN Data Objects connected to activities through data associations, data movement between systems is modeled explicitly: Customer Data enters at the CRM, flows to the warehouse via an ETL layer, becomes Customer Analytics Data, and is consumed by the BI platform.
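Once flows are modeled explicitly, upstream lineage is a reverse traversal. A minimal sketch, assuming flows are exported as hypothetical `(source, target, data)` tuples that mirror the Customer Data example above, plus one extra ERP feed:

```python
# Hypothetical flow connectors: (source system, target system, data conveyed).
FLOWS = [
    ("CRM", "ETL Layer", "Customer Data"),
    ("ETL Layer", "Data Warehouse", "Customer Analytics Data"),
    ("Data Warehouse", "BI Platform", "Customer Analytics Data"),
    ("ERP", "ETL Layer", "Order Data"),
]


def upstream(target: str, flows) -> list[tuple[str, str, str]]:
    """Return every flow that feeds `target`, directly or indirectly."""
    result, frontier, seen = [], [target], set()
    while frontier:
        node = frontier.pop()
        for flow in flows:
            src, dst, _data = flow
            if dst == node and flow not in seen:
                seen.add(flow)
                result.append(flow)
                frontier.append(src)
    return result


lineage = upstream("BI Platform", FLOWS)
# Origins are systems that send data along the path but never receive any.
origins = sorted({src for src, _, _ in lineage} - {dst for _, dst, _ in lineage})
print(origins)
```

ERP appears as an origin because system-level lineage only knows that ERP feeds the ETL layer, not which records end up in Customer Analytics Data. That over-approximation is the gap field-level lineage in a data catalog fills.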

Information flow diagrams. UML Information Flow connectors, which name the information items they convey, capture information flows between application services, business functions, and across organizational boundaries; in ArchiMate views, the Flow relationship plays the same role.

Transformation mapping. Transformation steps can be modeled as intermediate process or service elements with tagged values capturing transformation rules, data quality rules, and responsible teams. This is less granular than a data catalog's field-level lineage but serves the architectural purpose — showing what transforms what.
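A sketch of transformation steps as elements with tagged values, listing what transforms what and flagging steps whose governance tags are incomplete. The `TransformationStep` class and the tag names `TransformationRule`, `DataQualityRule`, and `ResponsibleTeam` are hypothetical stand-ins for whatever your MDG defines.

```python
from dataclasses import dataclass, field

# Hypothetical tag names; substitute the ones your MDG profile defines.
REQUIRED_TAGS = ("TransformationRule", "DataQualityRule", "ResponsibleTeam")


@dataclass
class TransformationStep:
    name: str
    consumes: str
    produces: str
    tags: dict[str, str] = field(default_factory=dict)


steps = [
    TransformationStep("Deduplicate customers", "Customer Data", "Clean Customer Data",
                       {"TransformationRule": "merge on email and birth date",
                        "DataQualityRule": "no duplicate email",
                        "ResponsibleTeam": "Customer Data Office"}),
    TransformationStep("Aggregate to segments", "Clean Customer Data",
                       "Customer Analytics Data",
                       {"TransformationRule": "group by segment"}),
]


def summarize(steps):
    """One line per step: what transforms what, plus any missing tags."""
    lines = []
    for step in steps:
        missing = [t for t in REQUIRED_TAGS if not step.tags.get(t)]
        note = f" (missing: {', '.join(missing)})" if missing else ""
        lines.append(f"{step.consumes} -> {step.name} -> {step.produces}{note}")
    return lines


for line in summarize(steps):
    print(line)
```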

For full field-level lineage (column to column, transformation logic), dedicated data catalog tools — Microsoft Purview, Collibra, Alation — are the right home. The Sparx EA model provides the architectural context that makes that field-level detail meaningful.

The EA model as a governed semantic layer

This is the development most relevant to data platform teams. Modern data platforms increasingly let analytics and BI tools query connected sources for semantic context — entity ownership, lineage, governance status. A well-governed EA repository is a natural source for exactly that context.

The practical implication: rather than maintaining a separate semantic model by hand inside your data platform, you can treat the EA repository — the authoritative source of your data architecture — as the governed semantic layer your platform and BI tools draw on. A query like "which data entities does the Customer Onboarding capability depend on?" or "which applications are the authoritative source for Product master data?" returns a structured answer from a single governed model.
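What such a query can look like against the repository itself: a sketch using Python's built-in `sqlite3` over a cut-down, in-memory imitation of three Sparx EA repository tables (`t_object`, `t_connector`, `t_objectproperties`). Treat the schema as an assumption: real repositories have many more columns, ArchiMate stereotypes are stored under their own names, and column names and connector type values such as `'Realisation'` should be checked against your own repository before you rely on them. The `MasterDataDomain` tag follows the MDG conventions listed later in this article.

```python
import sqlite3

# Simplified, illustrative subset of the repository tables (an assumption).
SCHEMA = """
CREATE TABLE t_object (Object_ID INTEGER PRIMARY KEY, Name TEXT, Stereotype TEXT);
CREATE TABLE t_connector (Connector_ID INTEGER PRIMARY KEY, Connector_Type TEXT,
                          Start_Object_ID INTEGER, End_Object_ID INTEGER);
CREATE TABLE t_objectproperties (PropertyID INTEGER PRIMARY KEY, Object_ID INTEGER,
                                 Property TEXT, Value TEXT);
"""

# Applications that realize a data entity tagged with the given master data domain.
QUERY = """
SELECT app.Name, ent.Name
FROM t_objectproperties tv
JOIN t_object ent ON ent.Object_ID = tv.Object_ID
JOIN t_connector c ON c.End_Object_ID = ent.Object_ID
                  AND c.Connector_Type = 'Realisation'
JOIN t_object app ON app.Object_ID = c.Start_Object_ID
WHERE tv.Property = 'MasterDataDomain' AND tv.Value = ?
ORDER BY app.Name
"""


def authoritative_sources(conn, domain):
    return conn.execute(QUERY, (domain,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO t_object VALUES (?, ?, ?)", [
    (1, "ERP", "ApplicationComponent"),
    (2, "PIM", "ApplicationComponent"),
    (3, "Product", "DataEntity"),
    (4, "Customer", "DataEntity"),
])
conn.executemany("INSERT INTO t_connector VALUES (?, ?, ?, ?)", [
    (1, "Realisation", 2, 3),  # PIM owns Product
    (2, "Realisation", 1, 4),  # ERP owns Customer
    (3, "Dependency", 1, 3),   # ERP only consumes Product
])
conn.executemany("INSERT INTO t_objectproperties VALUES (?, ?, ?, ?)", [
    (1, 3, "MasterDataDomain", "Product"),
    (2, 4, "MasterDataDomain", "Customer"),
])
print(authoritative_sources(conn, "Product"))
```

The query only returns a trustworthy answer because the connector type carries meaning: if some teams modeled ownership as a dependency, ERP would silently drop out or appear where it should not.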

The quality of those answers is determined entirely by governance. A well-governed canonical model produces reliable semantic intelligence; a poorly governed one produces unreliable results no matter what reads from it. Connecting the model to your platform is the easy part; making it trustworthy is the work.

Data governance use cases

Master data governance. A canonical model defines the authoritative version of each master data domain (Customer, Product, Employee, Location). Tagged values capture the authoritative source system, the data steward, governance status, and the data quality SLA — surfaced to BI dashboards showing coverage and quality.
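The coverage figures behind such a dashboard are simple arithmetic over those tagged values. A minimal sketch, assuming the canonical entities have been exported as dictionaries; the tag names and values are hypothetical.

```python
# Hypothetical canonical master data entities and their governance tagged values.
ENTITIES = [
    {"name": "Customer", "SourceSystem": "CRM", "DataSteward": "Sales Ops",
     "GovernanceStatus": "Approved", "QualitySLA": "99.5% complete"},
    {"name": "Product", "SourceSystem": "PIM", "DataSteward": "",
     "GovernanceStatus": "Draft", "QualitySLA": "98% complete"},
    {"name": "Employee", "SourceSystem": "HRIS", "DataSteward": "HR Data Office",
     "GovernanceStatus": "Approved", "QualitySLA": ""},
    {"name": "Location", "SourceSystem": "", "DataSteward": "",
     "GovernanceStatus": "Draft", "QualitySLA": ""},
]

GOVERNANCE_TAGS = ("SourceSystem", "DataSteward", "GovernanceStatus", "QualitySLA")


def coverage(entities):
    """Percentage of entities with each tag filled in, plus the approved share."""
    total = len(entities)
    report = {tag: round(100 * sum(1 for e in entities if e.get(tag)) / total)
              for tag in GOVERNANCE_TAGS}
    approved = sum(1 for e in entities if e.get("GovernanceStatus") == "Approved")
    report["Approved"] = round(100 * approved / total)
    return report


print(coverage(ENTITIES))
```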

API contract to data model traceability. API contracts (OpenAPI / JSON Schema) can be imported or modeled and linked to the logical entities they expose. This supports API governance: when the logical model changes, which APIs are affected? When an API is deprecated, which entities lose their consumption path?
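Both questions reduce to set operations once API-to-entity links exist. A sketch with a hypothetical `API_EXPOSES` mapping standing in for the modeled links:

```python
# Hypothetical links from API contracts to the logical entities they expose.
API_EXPOSES = {
    "customers-api v2": {"Customer", "Address"},
    "orders-api v1": {"Order", "Customer"},
    "catalog-api v3": {"Product"},
}


def affected_apis(changed_entity, exposes):
    """APIs whose contract exposes an entity that is about to change."""
    return sorted(api for api, entities in exposes.items() if changed_entity in entities)


def orphaned_by_deprecation(api, exposes):
    """Entities that no remaining API exposes once `api` is deprecated."""
    remaining = set().union(*(ents for name, ents in exposes.items() if name != api))
    return sorted(exposes[api] - remaining)


print(affected_apis("Customer", API_EXPOSES))
print(orphaned_by_deprecation("customers-api v2", API_EXPOSES))
```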

Federated data governance (Data Mesh). Data Mesh patterns model data domains as autonomous teams owning specific data products. Each domain is a package with Data Product elements, data contract specifications (tagged values for schema, SLA, and quality guarantees), and explicit consumer-producer relationships. The governance layer (who owns what, what the contracts are, how disputes are resolved) is modeled and queryable.
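A data contract becomes checkable once its terms are explicit. A minimal sketch, assuming a hypothetical `DataContract` whose fields (required schema fields, a freshness SLA, and a completeness threshold) correspond to tagged values on a Data Product element, checked against one delivery of the product:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataContract:
    # Hypothetical contract terms, mirrored from tagged values on a Data Product.
    required_fields: set[str]
    max_staleness: timedelta
    min_completeness: float  # share of rows with every required field populated


def check_contract(contract, rows, last_updated, now):
    """Return the contract violations for one delivery of a data product."""
    violations = []
    present = set().union(*(row.keys() for row in rows))
    missing = contract.required_fields - present
    if missing:
        violations.append(f"schema: missing {sorted(missing)}")
    if now - last_updated > contract.max_staleness:
        violations.append("sla: data is stale")
    complete = sum(all(row.get(f) not in (None, "") for f in contract.required_fields)
                   for row in rows)
    if rows and complete / len(rows) < contract.min_completeness:
        violations.append(f"quality: completeness {complete / len(rows):.0%}")
    return violations


contract = DataContract({"order_id", "customer_id", "amount"},
                        timedelta(hours=24), 0.95)
now = datetime(2025, 10, 10, 12, 0)
rows = [{"order_id": 1, "customer_id": 7, "amount": 10.0},
        {"order_id": 2, "customer_id": None, "amount": 5.0}]
print(check_contract(contract, rows, now - timedelta(hours=30), now))
```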

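GDPR Article 30 records of processing. Article 30 requires records that cover, among other things, the purposes of processing, the categories of data subjects and personal data, the recipients (including third-party sharing and transfers outside the EU), and, where possible, the envisaged retention periods; many organizations also record the legal basis for each activity. Sparx EA models the underlying personal data flows with GDPR-specific tagged values, producing an auditable, queryable record far more maintainable than a spreadsheet register.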
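A sketch of generating register rows from modeled personal-data flows and listing the flows with gaps. The tag names (`Purpose`, `LegalBasis`, `RetentionPeriod`, `Recipient`) and the flow data are hypothetical, and a real Article 30 register needs every field the regulation lists, which this sketch does not attempt to cover.

```python
# Hypothetical personal-data flows, each carrying GDPR-specific tagged values.
PERSONAL_DATA_FLOWS = [
    {"from": "Webshop", "to": "CRM", "data": "Customer contact details",
     "Purpose": "Order fulfilment", "LegalBasis": "Contract",
     "RetentionPeriod": "6 years", "Recipient": "internal"},
    {"from": "CRM", "to": "MailCo (processor)", "data": "Email address",
     "Purpose": "Marketing", "LegalBasis": "Consent",
     "RetentionPeriod": "", "Recipient": "third party"},
]

REQUIRED = ("Purpose", "LegalBasis", "RetentionPeriod", "Recipient")


def records_of_processing(flows):
    """Build register rows and list the flows that are missing required entries."""
    rows, gaps = [], []
    for flow in flows:
        route = f"{flow['from']} -> {flow['to']}"
        missing = [key for key in REQUIRED if not flow.get(key)]
        if missing:
            gaps.append((route, missing))
        rows.append({"route": route, "data": flow["data"],
                     **{key: flow.get(key, "") for key in REQUIRED}})
    return rows, gaps


rows, gaps = records_of_processing(PERSONAL_DATA_FLOWS)
print(gaps)
```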

MDG requirements for data architecture

The same governance principle that governs capability maps and portfolio models applies here: consistency of element types and tagged values determines whether the model is queryable. For data architecture specifically:

A «DataEntity» or «DataObject» stereotype with tagged values for DataOwner, MasterDataDomain, GDPRClassification, RetentionPeriod, and SourceSystem.
Consistent connector types: realization for application-to-data ownership, association or flow for data movement, dependency for consumption.
A canonical model package with version management (SchemaVersion, EffectiveDate, ApprovalStatus).
Logical-to-physical traceability connectors linking UML Class elements to physical table elements.
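These conventions can be checked mechanically before anyone queries the model. A minimal sketch of such a conformance check over a hypothetical export; the `Element` class and the `CanonicalPackage` stereotype name are assumptions, while the tag names and connector types come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    stereotype: str
    tags: dict[str, str] = field(default_factory=dict)


# Required tagged values per stereotype, taken from the conventions above.
REQUIRED_TAGS = {
    "DataEntity": {"DataOwner", "MasterDataDomain", "GDPRClassification",
                   "RetentionPeriod", "SourceSystem"},
    "CanonicalPackage": {"SchemaVersion", "EffectiveDate", "ApprovalStatus"},
}
ALLOWED_CONNECTORS = {"Realization", "Association", "Flow", "Dependency"}


def validate(elements, connectors):
    """Return findings; an empty list means the export follows the conventions."""
    findings = []
    for element in elements:
        filled = {key for key, value in element.tags.items() if value}
        missing = REQUIRED_TAGS.get(element.stereotype, set()) - filled
        if missing:
            findings.append(f"{element.name}: missing {sorted(missing)}")
    for src, kind, dst in connectors:
        if kind not in ALLOWED_CONNECTORS:
            findings.append(f"{src} -> {dst}: unexpected connector '{kind}'")
    return findings


elements = [
    Element("Customer", "DataEntity", {"DataOwner": "Sales Ops", "MasterDataDomain": "Customer",
                                       "GDPRClassification": "Personal",
                                       "RetentionPeriod": "7 years", "SourceSystem": "CRM"}),
    Element("Order", "DataEntity", {"DataOwner": "Finance", "MasterDataDomain": ""}),
    Element("Canonical Model", "CanonicalPackage", {"SchemaVersion": "3.1",
                                                    "EffectiveDate": "2025-10-01",
                                                    "ApprovalStatus": "Approved"}),
]
connectors = [("CRM", "Realization", "Customer"), ("Webshop", "Aggregation", "Order")]
for finding in validate(elements, connectors):
    print(finding)
```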

Without these conventions, a query like "which data entities contain personal data and which systems host them?" can't return a reliable answer. Governance is the prerequisite, not an afterthought — it's the same discipline we bring to every Sparx EA engagement.

FAQ

Can Sparx EA replace a dedicated data modeling tool?

For conceptual, logical, and physical modeling (UML Class diagrams with full cardinality, schema modeling with DDL generation, and data flow diagrams), yes. For organizations wanting one governed repository for both enterprise and data architecture, it's a viable choice. Teams that need highly detailed physical schema management with automated CI/CD (data versioning, migration scripts) may keep a dedicated tool such as erwin, PowerDesigner, or dbt alongside Sparx EA rather than replacing it.

What is a canonical data model and how do I build one in Sparx EA?

It's the authoritative, business-agreed definition of shared entities: the single version of Customer, Product, and Order that all systems reference. Build it as a governed package of UML Class elements with stereotypes capturing ownership, governance status, and version. Link it to source systems (realization from applications to entities) and consuming systems (dependency connectors). The model is the semantic contract; the package governance (review cadence, change process, versioning) is the practice.

How does Sparx EA model data lineage?

At the architectural level (system-to-system and process-to-system flows), using ArchiMate Flow relationships, UML Information Flow connectors, or BPMN Data Objects linked to activities through data associations. Tagged values on flow elements capture transformation type, quality rules, and responsible teams. For field-level lineage, dedicated data catalog tools are more appropriate; Sparx EA provides the architectural context that makes field-level detail meaningful.

What is a Data Mesh and can Sparx EA model it?

Data Mesh distributes data ownership to domain teams, each responsible for their own data products with explicit cross-domain contracts. Sparx EA models it through domain packages, data product elements (tagged values for schema, SLA, quality guarantees), and explicit consumer-producer relationships. The governance model — ownership, contract management, dispute resolution — is documented as architecture principles and decision records in the same repository.

What MDG configuration does data architecture modeling need?

A custom extension defining a DataEntity stereotype with tagged values for DataOwner, MasterDataDomain, GDPRClassification, RetentionPeriod, and SourceSystem; a DataProduct stereotype for Data Mesh patterns; and consistent connector conventions (realization for ownership, flow for movement, dependency for consumption). Without this structure, queries on the data architecture return incomplete or inconsistent results.

Do I need to replace our data catalog if we model data architecture in Sparx EA?

No. The two are complementary. The EA repository models architectural-level governance — canonical models, application-to-data ownership, cross-system flows. Data catalogs provide field-level lineage, business glossary terms, and data quality metrics at the asset level. They integrate well: the EA model supplies architectural context the catalog lacks, the catalog supplies the technical detail the EA model doesn't capture. Treat integration as a maturity ambition, not a prerequisite.

Turn your EA model into a governed semantic layer.

Talk to a practitioner about modeling data architecture in Sparx EA so your data platform and BI tools read from one governed source.

