Data Architecture in Sparx EA: The CDO's Architecture Foundation

Sparx EA is the data architect’s modeling environment for enterprise-level data architecture. It holds data entity models, data flow diagrams, business data glossaries, and data lineage documentation in the same repository as the application, business, and technology architecture the rest of the organization depends on. When EA GraphLink connects the repository to AI tools, data architects and CDO organizations can ask cross-domain data questions in natural language and connect data lineage to regulatory compliance requirements at scale.

Key Takeaways

Sparx EA supports conceptual, logical, and physical data modeling in the same repository: UML class diagrams for entity-relationship models, and ArchiMate Business Objects and Data Objects for business- and application-layer data concepts
Data lineage for BCBS 239, GDPR, and similar regulatory requirements is documentable in Sparx EA with end-to-end traceability from data source to consumer
EA GraphLink enables cross-domain data architecture queries: “what systems produce this data entity?” answered live from the repository via MCP
The CDO use case: Sparx EA as the authoritative data architecture registry connecting data governance platforms to the broader enterprise architecture context
MDG governance for data architecture defines consistent entity typing, required metadata (owner, classification, retention), and lineage relationship types as enforced standards

Data Modeling in Sparx EA

Three Modeling Layers in One Repository

The power of Sparx EA for data architecture is the vertical integration across modeling levels: conceptual, logical, and physical data models connected by traceability relationships in a single repository. Data architects don’t maintain three separate artifacts; they maintain one connected model.

Conceptual data model: business entities and their relationships as the business understands them, typically modeled using ArchiMate Business Objects or UML Class diagrams with domain-specific stereotypes. A Customer, a Product, or a Transaction is defined at the semantic level that business stakeholders recognize. Conceptual models are owned by business domains and validated by business stakeholders, not data architects.

Logical data model: normalized entity-attribute-relationship models that translate conceptual entities into structured data definitions, expressed as UML Class diagrams with attributes and relationships. At the logical level, a Customer entity has attributes such as Customer ID, Customer Name, Contact Details, and Segment Code, defined precisely enough for database implementation but free of physical storage concerns.

Physical data model: the database schema, meaning tables, columns, data types, indexes, and foreign keys. It is modeled as UML Class diagrams with database-specific stereotypes (Table, Column, Primary Key) or generated directly by reverse engineering a live database schema into Sparx EA. Physical models are where implementation decisions live; they connect back to logical models through realization relationships.

All three levels live in the repository, connected. When a conceptual entity is renamed, its logical and physical implementations can be traced and updated. When a physical schema change is proposed, the upstream conceptual entity and any downstream consuming applications can be identified through the relationship model.
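The traceability described above can be sketched as a graph walk over realization links. This is a minimal illustration under stated assumptions, not Sparx EA's API: element names such as `CON-Customer` and `PHY-CUSTOMER`, and the `ACCESSED_BY` mapping of applications to tables, are hypothetical stand-ins for repository elements and connectors.

```python
from collections import defaultdict

# Hypothetical realization links: (concrete element, abstract element it realizes).
REALIZATIONS = [
    ("LOG-Customer", "CON-Customer"),        # logical entity -> conceptual entity
    ("PHY-CUSTOMER", "LOG-Customer"),        # physical table -> logical entity
    ("PHY-CUSTOMER_ARCHIVE", "LOG-Customer"),
]

# Hypothetical application access to physical tables.
ACCESSED_BY = {
    "PHY-CUSTOMER": ["CRM", "Billing"],
    "PHY-CUSTOMER_ARCHIVE": ["Reporting"],
}

down = defaultdict(list)  # abstract -> concrete elements that realize it
up = defaultdict(list)    # concrete -> abstract elements it realizes
for concrete, abstract in REALIZATIONS:
    down[abstract].append(concrete)
    up[concrete].append(abstract)


def _walk(start: str, edges: dict) -> list[str]:
    """Depth-first traversal returning every element reachable from start."""
    seen, stack, order = {start}, [start], []
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                stack.append(nxt)
    return order


def implementations_of(element: str) -> list[str]:
    """Logical and physical elements affected when `element` is renamed or changed."""
    return _walk(element, down)


def impact_of_physical_change(table: str) -> dict[str, list[str]]:
    """Upstream abstractions and consuming applications for a physical table."""
    return {"upstream": _walk(table, up), "consumers": ACCESSED_BY.get(table, [])}
```

Calling `implementations_of("CON-Customer")` returns the logical entity and both physical tables; `impact_of_physical_change("PHY-CUSTOMER")` walks up to the logical and conceptual entities and lists CRM and Billing as consumers.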

Cross-Domain Value: Why Data in the EA Repository

Standalone data modeling tools handle individual modeling layers well. What Sparx EA adds is the cross-domain context that those tools can’t provide.

In Sparx EA, a Customer data entity in the logical model can be connected to:

The ArchiMate Application Components that produce or consume it
The Business Capabilities that depend on it
The Technology Components (databases, data platforms) that store it
The regulatory requirements (GDPR data subject rights, BCBS 239 critical data elements) that govern it

This cross-domain connectivity is the CDO use case. The Chief Data Officer needs to understand not just what data entities exist, but which business processes depend on them, which applications manage them, which platforms store them, and what the risk is if any of those platforms change. That question is answerable from the Sparx EA repository in a way it isn’t from a standalone data modeling tool.
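A simplified sketch of that cross-domain view: given a data entity, collect every directly related element and group it by element type. The element names and the relationship labels (`access`, `dependsOn`, `stores`, `governs`) are hypothetical placeholders, not ArchiMate or Sparx EA relationship names.

```python
from collections import defaultdict

# Hypothetical element catalogue: element name -> element type.
ELEMENT_TYPES = {
    "Customer": "DataEntity",
    "CRM": "ApplicationComponent",
    "Billing": "ApplicationComponent",
    "Customer Management": "BusinessCapability",
    "Customer DB Cluster": "TechnologyComponent",
    "GDPR right to erasure": "Requirement",
}

# Hypothetical relationships: (source, relationship label, target).
RELATIONSHIPS = [
    ("CRM", "access", "Customer"),
    ("Billing", "access", "Customer"),
    ("Customer Management", "dependsOn", "Customer"),
    ("Customer DB Cluster", "stores", "Customer"),
    ("GDPR right to erasure", "governs", "Customer"),
]


def cross_domain_context(entity: str) -> dict[str, list[str]]:
    """Group every element directly related to `entity` by its element type."""
    context: dict[str, list[str]] = defaultdict(list)
    for source, _label, target in RELATIONSHIPS:
        if target == entity:
            other = source
        elif source == entity:
            other = target
        else:
            continue
        context[ELEMENT_TYPES[other]].append(other)
    return dict(context)
```

For `cross_domain_context("Customer")`, the result lists the two applications, the capability, the storage platform, and the governing requirement, each under its own type key.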

Data Lineage for Regulatory Compliance

BCBS 239: Critical Data Element Documentation

BCBS 239 (Principles for Effective Risk Data Aggregation and Risk Reporting) requires financial institutions to identify critical risk data elements and show end-to-end data lineage: from source system to regulatory report, with transformation rules documented at each step.

In Sparx EA, BCBS 239 data lineage is documented as:

Critical data element identification: a governed stereotype “Data Entity (Critical Risk)” with required tagged values including BCBS 239 principle mappings, criticality rationale, and data owner
Source system identification: Application Component elements tagged as authoritative sources for specific critical data elements
Transformation documentation: Information Flow and Data Flow relationships connecting source systems to downstream consumers, with transformation rules documented on the relationship or in associated notes
End-to-end lineage views: custom diagrams or generated reports showing the complete flow from source to regulatory report
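End-to-end lineage over documented flows can be sketched as path enumeration on a directed graph, with each hop checked for a recorded transformation rule. The system names and the `FLOWS` list are hypothetical sample data, and the sketch assumes the flow graph has no cycles.

```python
# Hypothetical information flows: (source, target, transformation rule or None).
FLOWS = [
    ("Core Banking", "Risk Data Hub", "Aggregate exposures by counterparty"),
    ("Trading System", "Risk Data Hub", None),  # transformation not documented
    ("Risk Data Hub", "Regulatory Report", "Map exposures to report template"),
]


def lineage_paths(start: str, end: str) -> list[list[str]]:
    """Enumerate every flow path from start to end (assumes an acyclic flow graph)."""
    if start == end:
        return [[end]]
    paths = []
    for source, target, _rule in FLOWS:
        if source == start:
            for tail in lineage_paths(target, end):
                paths.append([start] + tail)
    return paths


def undocumented_hops(path: list[str]) -> list[tuple[str, str]]:
    """Hops along a path whose flow has no recorded transformation rule."""
    rules = {(source, target): rule for source, target, rule in FLOWS}
    return [(a, b) for a, b in zip(path, path[1:]) if not rules.get((a, b))]


# Trace every system that feeds the regulatory report and flag lineage gaps.
for system in sorted({source for source, _, _ in FLOWS}):
    for path in lineage_paths(system, "Regulatory Report"):
        print(" -> ".join(path), "| gaps:", undocumented_hops(path))
```

Here the Trading System path is reported with one gap, the undocumented hop into the Risk Data Hub.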

The key governance point: MDG enforcement ensures that critical data elements have complete lineage documentation before they can be set to an “Approved” status. Partial lineage isn’t left to be caught at review time; it blocks approval outright.
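The approval gate can be expressed as a validation rule. The sketch below is a hypothetical model of that rule in plain Python, not Sparx EA's MDG or scripting API; the tag names (`BCBS239Principles`, `CriticalityRationale`, `DataOwner`) are assumptions that mirror the required tagged values listed above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical tag names for the "Data Entity (Critical Risk)" stereotype.
REQUIRED_TAGS = ("BCBS239Principles", "CriticalityRationale", "DataOwner")


@dataclass
class CriticalDataElement:
    name: str
    tags: dict[str, str] = field(default_factory=dict)
    authoritative_source: Optional[str] = None
    # Documented hops: (source, target, transformation rule or None).
    lineage: list[tuple[str, str, Optional[str]]] = field(default_factory=list)
    status: str = "Proposed"


def approval_blockers(cde: CriticalDataElement) -> list[str]:
    """List every reason the element cannot yet be approved."""
    problems = [f"missing tag {tag}" for tag in REQUIRED_TAGS if not cde.tags.get(tag)]
    if cde.authoritative_source is None:
        problems.append("no authoritative source")
    if not cde.lineage:
        problems.append("no lineage documented")
    problems += [
        f"undocumented transformation {source} -> {target}"
        for source, target, rule in cde.lineage
        if not rule
    ]
    return problems


def set_status(cde: CriticalDataElement, new_status: str) -> None:
    """Change status, refusing 'Approved' while metadata or lineage is incomplete."""
    if new_status == "Approved":
        blockers = approval_blockers(cde)
        if blockers:
            raise ValueError(f"{cde.name} cannot be approved: " + "; ".join(blockers))
    cde.status = new_status
```

The design choice is to check at the status transition rather than at save time, so drafts can be saved incomplete while approval cannot.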

GDPR and Data Subject Rights

GDPR compliance documentation in Sparx EA follows a similar pattern: personal data entity identification (Data Entity (Personal) stereotype with required classification and retention tags), processing flow documentation (which applications process which personal data entities and for what business purpose), and data subject rights mapping (which applications and databases would need to respond to a Subject Access Request or erasure request for a specific data subject).
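Scoping a Subject Access Request or erasure request then reduces to a lookup over processing records. A minimal sketch, assuming a hypothetical extract in which each record links an application, a personal data entity, a processing purpose, and the database that holds the data:

```python
# Hypothetical personal data entities with their classification and retention tags.
PERSONAL_DATA = {
    "Customer Contact": {"classification": "Personal", "retention": "7 years"},
    "Marketing Preferences": {"classification": "Personal", "retention": "2 years"},
}

# Hypothetical processing records: (application, personal data entity, purpose, database).
PROCESSING = [
    ("CRM", "Customer Contact", "Account servicing", "CRM_DB"),
    ("Campaign Manager", "Customer Contact", "Direct marketing", "MKT_DB"),
    ("Campaign Manager", "Marketing Preferences", "Direct marketing", "MKT_DB"),
]


def request_scope(entities=None) -> dict[str, set[str]]:
    """Applications, databases, and purposes in scope for a SAR or erasure request.

    With no argument, every personal data entity is in scope.
    """
    targets = set(PERSONAL_DATA) if entities is None else set(entities)
    scope: dict[str, set[str]] = {"applications": set(), "databases": set(), "purposes": set()}
    for app, entity, purpose, db in PROCESSING:
        if entity in targets:
            scope["applications"].add(app)
            scope["databases"].add(db)
            scope["purposes"].add(purpose)
    return scope
```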

The EA repository, when populated with complete GDPR data architecture, becomes the source of truth for Data Protection Impact Assessments and Subject Access Request responses, and those questions can be answered in natural language via EA GraphLink rather than through manual repository searches.

Connecting Data Architecture to Enterprise Architecture

The unique value proposition of Sparx EA for data architects is data architecture in the same repository as the rest of the enterprise architecture. That co-location enables cross-domain queries such as the following:

Application-to-data queries: “What applications produce or consume the Customer data entity?” This traverses from the data layer to the application layer using ArchiMate data object relationships. It is useful for data migration planning, application rationalization, and impact analysis.

Business capability-to-data queries: “What business capabilities depend on the Contract data domain?” This traverses from the data layer to the business capability layer and is useful for capability investment decisions that involve data considerations.

Technology risk-to-data queries: “Which data entities are stored on technology platforms approaching end of life?” This traverses from the data layer through application associations to the technology layer, surfacing data risk that exists because of platform risk rather than data quality issues.
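The technology risk query is a two-hop traversal (data entity to application, application to platform) filtered by end-of-life date. The mappings, platform names, and dates below are hypothetical sample data, and the one-year horizon is an arbitrary default.

```python
from datetime import date, timedelta

# Hypothetical repository extract.
MANAGED_BY = {  # data entity -> applications that store or manage it
    "Customer": ["CRM", "Billing"],
    "Contract": ["Contract Admin"],
    "Invoice": ["Billing"],
}
RUNS_ON = {  # application -> technology platform
    "CRM": "Legacy App Server",
    "Contract Admin": "On-Prem DB Cluster",
    "Billing": "Cloud Data Platform",
}
END_OF_LIFE = {  # platform -> end-of-life date (sample values)
    "Legacy App Server": date(2025, 6, 30),
    "On-Prem DB Cluster": date(2025, 12, 31),
    "Cloud Data Platform": date(2030, 12, 31),
}


def entities_at_platform_risk(as_of: date, horizon_days: int = 365) -> dict[str, list[str]]:
    """Map each data entity to the platforms under it that reach end of life
    on or before as_of + horizon_days; entities with no such platform are omitted."""
    cutoff = as_of + timedelta(days=horizon_days)
    at_risk = {}
    for entity, apps in MANAGED_BY.items():
        platforms = sorted({RUNS_ON[app] for app in apps if END_OF_LIFE[RUNS_ON[app]] <= cutoff})
        if platforms:
            at_risk[entity] = platforms
    return at_risk
```

With `as_of=date(2025, 1, 1)`, Customer and Contract are flagged; Invoice is not, because Billing runs on a platform outside the horizon.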

Regulatory coverage queries: “Which personal data entities are processed by applications that haven’t been assessed for GDPR compliance?” This combines data-layer governance status with application compliance tags, surfacing compliance gaps at the intersection of data and application governance.
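The regulatory coverage query combines a data-layer filter (the personal data stereotype) with an application-layer tag check. In this sketch, the `GDPRAssessed` tag and the sample elements are hypothetical; an application whose tag is missing is treated the same as one explicitly marked unassessed.

```python
# Hypothetical extract: data entity -> stereotype, application -> tagged values.
DATA_ENTITIES = {
    "Customer Contact": "Data Entity (Personal)",
    "Employee Record": "Data Entity (Personal)",
    "Product Catalog": "Data Entity (Logical)",
}
APPLICATION_TAGS = {
    "CRM": {"GDPRAssessed": "Yes"},
    "HR Portal": {"GDPRAssessed": "No"},
    "Web Shop": {},  # tag never filled in
}
PROCESSES = [  # (application, data entity)
    ("CRM", "Customer Contact"),
    ("Web Shop", "Customer Contact"),
    ("Web Shop", "Product Catalog"),
    ("HR Portal", "Employee Record"),
]


def gdpr_coverage_gaps() -> dict[str, list[str]]:
    """Personal data entities mapped to processing applications not assessed for GDPR."""
    gaps: dict[str, list[str]] = {}
    for app, entity in PROCESSES:
        if DATA_ENTITIES[entity] != "Data Entity (Personal)":
            continue  # only personal data is in scope
        if APPLICATION_TAGS[app].get("GDPRAssessed") != "Yes":
            gaps.setdefault(entity, []).append(app)
    return gaps
```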

None of these queries are easily answerable from a standalone data modeling tool. All of them are answerable from a well-governed Sparx EA repository via EA GraphLink.

AI for Data Architecture

EA GraphLink and MCP for Data Architecture Queries

EA GraphLink’s MCP (Model Context Protocol) server makes data architecture a first-class context source for AI tools. A data architect using Claude can ask “What are the data lineage paths for the Customer Revenue critical data element, and which transformation steps have incomplete documentation?” and receive a synthesized answer from the live repository.
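Under the hood, an MCP client such as Claude invokes server tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds such a message; the tool name `trace_lineage` and its arguments are hypothetical, since the tools an EA GraphLink server actually exposes would be discovered at runtime with a `tools/list` request.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must be unique per session


def tools_call_message(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool and arguments for the lineage question above.
message = tools_call_message(
    "trace_lineage",
    {"element": "Customer Revenue", "include_transformations": True},
)
print(message)
```

In practice the AI client composes these calls itself; the point is that each natural-language question resolves to structured tool calls against the live repository.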

For CDO stakeholders, Kernaro AI Hub provides natural language data governance queries without requiring repository access: “Which business-critical data entities have no documented owner?” or “What data entities would be affected by a migration of the legacy data warehouse?” become self-service queries rather than architecture team requests.

Microsoft Fabric Integration

For organizations using Microsoft Fabric as their enterprise data platform, EA GraphLink enables the Sparx EA data architecture to be published as a governed data asset in the Fabric catalog. The authoritative data architecture (entity definitions, lineage, ownership, and classifications) becomes the metadata layer for the Fabric data environment. Data engineers and data consumers can access architecture metadata in the tools they already use, without querying the EA repository directly.

Kernaro AI Hub for CDO Organizations

CDO organizations with active data governance programs benefit from Kernaro AI Hub as a dedicated data architecture intelligence layer. Data stewards, data owners, and CDO leadership can query the data architecture in natural language without EA licenses, without modeling tool training, and without waiting for a data architect to produce a report. The governance prerequisite is the same as for any AI integration: MDG-governed data architecture with consistent entity typing, required metadata, and complete lineage documentation.

Frequently Asked Questions

Can Sparx EA do conceptual, logical, and physical data modeling?

Yes. Sparx EA supports all three levels in the same repository. Conceptual models use ArchiMate Business Objects or UML Class with domain stereotypes. Logical models use UML Class diagrams with entity-relationship conventions. Physical models use UML Class with database stereotypes (or Sparx EA’s built-in database diagram support). The three levels connect through traceability relationships, so a change at any level can be traced to its implications at other levels.

How do I document data lineage for BCBS 239 in Sparx EA?

Using a combination of critical data element stereotypes (with required BCBS 239 tagged values), source system identification on Application Component elements, and Information Flow or Data Flow relationships documenting transformation steps between systems. MDG enforcement ensures lineage completeness is a requirement, not an aspiration. The Discover service includes an assessment of current BCBS 239 documentation state; the Deploy service establishes the MDG configuration for compliant documentation.

What is the difference between UML class models and ArchiMate data objects in Sparx EA?

ArchiMate Data Objects represent data at the architecture level: they are passive structure elements in the application layer, accessed by Application Components and able to realize the Business Objects that Business Processes work with. They’re appropriate for conceptual data modeling in an enterprise architecture context. UML Class diagrams represent data structure in detail: entity-attribute-relationship models appropriate for logical and physical data modeling. Both live in the same Sparx EA repository and can be connected through relationships. Most data architecture practices use both: ArchiMate for the business-facing conceptual layer and UML Class for the technical modeling layers.

Can I reverse engineer a database schema into Sparx EA?

Yes. Sparx EA includes a database reverse engineering feature that imports schema from major RDBMS platforms (Oracle, SQL Server, MySQL, PostgreSQL, and others) directly into UML Class diagrams. Reverse-engineered schemas can then be connected to logical and conceptual models through traceability relationships, establishing the vertical connection between what exists in production and what is defined in the architecture. This is frequently the starting point for data architecture programs that need to document what exists before designing what should exist.
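Reverse engineering reads the database catalog: tables, columns, data types, primary keys, and foreign keys. Sparx EA's importer connects to the live database itself; the sketch below is not that feature, only an illustration of the same metadata read from an in-memory SQLite database with Python's standard `sqlite3` module. The `customer` and `invoice` tables are sample data.

```python
import sqlite3


def reverse_engineer(conn: sqlite3.Connection) -> dict[str, dict]:
    """Read tables, columns, primary keys, and foreign keys from a SQLite catalog."""
    model = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = list(conn.execute(f'PRAGMA table_info("{table}")'))
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = list(conn.execute(f'PRAGMA foreign_key_list("{table}")'))
        model[table] = {
            "columns": [(c[1], c[2]) for c in cols],
            # pk holds the 1-based position within the key; 0 means not a key column
            "primary_key": [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5]],
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],  # (column, ref table, ref column)
        }
    return model


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        segment_code TEXT
    );
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount      NUMERIC
    );
""")
model = reverse_engineer(conn)
for table, info in model.items():
    print(table, info)
```

Each extracted table corresponds to what would become a physical-level element, ready to be linked to its logical entity by a realization relationship.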

How do I connect data architecture to application architecture in Sparx EA?

Through ArchiMate relationship types: Application Components and Application Services “access” Data Objects (read, write, or read/write), and Information Flows carry Data Objects between Application Components. These relationships, when consistently modeled, enable the cross-domain queries described above: “which applications produce this entity?” becomes a live query rather than a manual research exercise.
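When access relationships carry their access type, the producer and consumer questions become simple filters. A minimal sketch with hypothetical applications; it treats write and read/write access as producing and read and read/write access as consuming, which is an interpretive assumption rather than an ArchiMate rule.

```python
from enum import Enum


class AccessType(Enum):
    READ = "read"
    WRITE = "write"
    READ_WRITE = "readwrite"


# Hypothetical access relationships: (application component, data object, access type).
ACCESS = [
    ("CRM", "Customer", AccessType.READ_WRITE),
    ("Onboarding Portal", "Customer", AccessType.WRITE),
    ("Data Warehouse", "Customer", AccessType.READ),
]

WRITES = {AccessType.WRITE, AccessType.READ_WRITE}
READS = {AccessType.READ, AccessType.READ_WRITE}


def producers(data_object: str) -> list[str]:
    """Applications with write or read/write access to the data object."""
    return [app for app, obj, access in ACCESS if obj == data_object and access in WRITES]


def consumers(data_object: str) -> list[str]:
    """Applications with read or read/write access to the data object."""
    return [app for app, obj, access in ACCESS if obj == data_object and access in READS]
```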

What MDG configuration is needed for data architecture governance?

At minimum: governed stereotypes for each data modeling level (Data Entity (Conceptual), Data Entity (Logical), Data Object (Physical)) with required tagged values at each level; a critical data element stereotype with regulatory compliance tags; required metadata including data owner, classification, and retention period; and lineage relationship types defined as governed relationship stereotypes. BCBS 239 or GDPR compliance programs need additional stereotypes for regulatory-specific documentation, such as Data Entity (Critical Risk) or Data Entity (Personal). The Deploy service designs and implements MDG configuration for data architecture governance.
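Required tagged values per stereotype can be checked with a small validation pass. The stereotype names come from the answer above; the tag names and the per-level requirements are illustrative assumptions, not a prescribed MDG profile.

```python
# Stereotype names from the text; tag names and per-level requirements are assumptions.
REQUIRED_TAGS = {
    "Data Entity (Conceptual)": {"DataOwner", "Classification"},
    "Data Entity (Logical)": {"DataOwner", "Classification", "RetentionPeriod"},
    "Data Object (Physical)": {"DataOwner", "Classification", "RetentionPeriod", "Platform"},
}


def missing_metadata(elements: list[tuple[str, str, dict[str, str]]]) -> dict[str, list[str]]:
    """Map element name -> sorted required tags that are absent or blank."""
    report = {}
    for name, stereotype, tags in elements:
        required = REQUIRED_TAGS.get(stereotype, set())  # ungoverned stereotypes pass
        missing = sorted(tag for tag in required if not tags.get(tag, "").strip())
        if missing:
            report[name] = missing
    return report


elements = [
    ("Customer", "Data Entity (Logical)",
     {"DataOwner": "Sales", "Classification": "Confidential", "RetentionPeriod": "7 years"}),
    ("CUSTOMER", "Data Object (Physical)",
     {"DataOwner": "Sales", "Classification": "  "}),
]
print(missing_metadata(elements))
```

Blank values count as missing, so a tag that exists but was never filled in is reported the same way as an absent one.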

How does EA GraphLink enable data architecture AI queries?

EA GraphLink turns the repository into an AI-accessible data layer through the Model Context Protocol (MCP). Data architecture elements (entities, lineage relationships, ownership tags, and compliance documentation) become queryable through Claude, Copilot, Kernaro AI Hub, or other AI tools connected via MCP. Queries that traverse cross-domain relationships (data entity to application to technology) are supported through the MCP server’s relationship traversal capability. The quality of AI answers reflects the completeness of MDG-governed tagging and relationship modeling in the repository.

What does a Connect engagement cover for a CDO organization?

A Connect engagement for a CDO organization typically covers: EA GraphLink deployment configured for the data architecture scope; MCP server configuration for data lineage and governance queries; Power BI or Tableau dashboards for data governance metrics (coverage of required metadata, lineage completeness, unowned entities); Kernaro AI Hub deployment for CDO stakeholder self-service; and Microsoft Fabric integration if applicable. Connect assumes a governed data architecture foundation: if the data architecture MDG configuration is not yet established, a Discover assessment and Deploy engagement should precede Connect.
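The governance metrics such dashboards report can be computed from an element extract. A hedged sketch: the `EntityRecord` fields are hypothetical, metadata coverage is taken to mean owner, classification, and retention all present, lineage completeness is measured over critical data elements only, and both percentages default to 100% when there is nothing to measure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityRecord:
    name: str
    owner: Optional[str]
    classification: Optional[str]
    retention: Optional[str]
    critical: bool = False
    lineage_complete: bool = False


def governance_metrics(entities: list[EntityRecord]) -> dict:
    """Metadata coverage, critical-element lineage completeness, and unowned entities."""
    total = len(entities)
    with_metadata = sum(1 for e in entities if e.owner and e.classification and e.retention)
    critical = [e for e in entities if e.critical]
    lineage_ok = sum(1 for e in critical if e.lineage_complete)
    return {
        "metadata_coverage_pct": round(100 * with_metadata / total, 1) if total else 100.0,
        "lineage_completeness_pct": round(100 * lineage_ok / len(critical), 1) if critical else 100.0,
        "unowned_entities": [e.name for e in entities if not e.owner],
    }
```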

Connect Your Data Architecture to the Rest of the Enterprise

Connect: EA GraphLink deployment and data architecture AI integration. $50K–$185K+ depending on integration scope and CDO stakeholder access requirements.

Discover: data architecture governance assessment, MDG readiness for BCBS 239 or GDPR, and cross-domain integration readiness. $25K–$75K.

Also relevant: Microsoft Fabric Integration with Sparx EA
