TOGAF Phase C: Information Systems Architecture — Application and Data in Sparx EA

By Sparx Services · December 15, 2025

TOGAF Phase C answers a deceptively simple question: what systems do we need, and what data do they handle? It is deliberately split into two sub-phases, Application Architecture and Data Architecture, and the discipline of keeping them separate is what makes the output usable downstream.

What this covers

Why Phase C splits into two sub-phases, and how to structure the Sparx EA package tree to match.
Modeling the application architecture — portfolio, interactions, and integration — on the ArchiMate application layer.
Building the data architecture — logical model, canonical model, and lineage — with the right traceability.
Running the application and data gap analyses that drive application portfolio management and rationalization decisions.

Phase C structure: two sub-phases

Many practitioners blur the boundary and produce a combined "application and data" model that satisfies neither sub-phase. Keep them separate in your Sparx EA package structure:

Phase C — Information Systems Architecture
├── C1 — Application Architecture
│   ├── Baseline Application Architecture
│   ├── Target Application Architecture
│   └── Application Gap Analysis
└── C2 — Data Architecture
    ├── Logical Data Model
    ├── Canonical Data Model
    ├── Data Lineage Diagrams
    └── Data Gap Analysis

The separation matters because the two sub-phases have different stakeholders (application owners vs. data owners), different approval timelines, and different downstream consumers. Application architecture decisions feed Phase D (technology); data architecture decisions feed integration design and analytics.

Application architecture in Sparx EA

Application portfolio. Model the complete application landscape using ArchiMate Application Component elements. Each component should carry a standard set of tagged values:

app_owner — business owner (not IT owner)
lifecycle_status — Active / Sunset / Strategic / Legacy
total_cost_of_ownership — annual figure
vendor — vendor name
contract_expiry — if applicable
capability_served — link to the Phase B capability map

With those tagged values in place, Sparx EA can generate an application portfolio dashboard, and the same dataset can be exposed to a live Power BI report — giving portfolio managers a current application inventory without manual spreadsheet maintenance.
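
As a rough illustration of how that dataset might be pulled out for a dashboard, the sketch below pivots tagged values into one row per application. It assumes a SQLite-based repository file (the .qea format) in which elements live in t_object and tagged values in t_objectproperties; the stereotype string, the trimmed-down table definitions, and the demo rows are assumptions made so the example runs on its own, not a description of any particular repository.

```python
import sqlite3

PORTFOLIO_TAGS = (
    "app_owner", "lifecycle_status", "total_cost_of_ownership",
    "vendor", "contract_expiry", "capability_served",
)

# Assumption: Application Components carry this stereotype in t_object.
STEREOTYPE = "ArchiMate_ApplicationComponent"


def portfolio_rows(conn):
    """Return one dict per application component, with tagged values as columns."""
    sql = """
        SELECT o.Name, p.Property, p.Value
        FROM t_object o
        LEFT JOIN t_objectproperties p ON p.Object_ID = o.Object_ID
        WHERE o.Stereotype = ?
        ORDER BY o.Name
    """
    rows = {}
    for name, prop, value in conn.execute(sql, (STEREOTYPE,)):
        # Start every row with all portfolio tags set to None.
        row = rows.setdefault(name, {"name": name, **dict.fromkeys(PORTFOLIO_TAGS)})
        if prop in PORTFOLIO_TAGS:
            row[prop] = value
    return list(rows.values())


def demo_connection():
    """Build an in-memory stand-in for a repository (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_object (Object_ID INTEGER PRIMARY KEY, Name TEXT, Stereotype TEXT);
        CREATE TABLE t_objectproperties (Object_ID INTEGER, Property TEXT, Value TEXT);
    """)
    conn.execute("INSERT INTO t_object VALUES (1, 'CRM North', ?)", (STEREOTYPE,))
    conn.execute("INSERT INTO t_object VALUES (2, 'Billing', ?)", (STEREOTYPE,))
    conn.executemany(
        "INSERT INTO t_objectproperties VALUES (?, ?, ?)",
        [(1, "app_owner", "Head of Sales"), (1, "lifecycle_status", "Legacy")],
    )
    return conn


for row in portfolio_rows(demo_connection()):
    print(row)
```

Components with no tagged values still appear in the result with empty columns, which makes missing portfolio data visible instead of silently dropping the application.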

Application interaction diagrams. For each significant integration point, create an ArchiMate application-collaboration diagram showing which components exchange data, through which interfaces (Application Interface elements), and the direction and frequency of exchange. Do not model every API — model the architecturally significant interactions: those that cross organizational boundaries, carry critical data, or represent technical risk.

Integration architecture. For complex integration landscapes, create a dedicated integration view. Model the integration platform (ESB, API gateway, event bus) as an Application Component with Realization relationships to the integration services it provides, and show each application's connectivity to that layer using ArchiMate Association relationships.

Data architecture in Sparx EA

Logical data model. The logical data model captures the key business entities and their relationships, independent of physical implementation. Use a UML Class diagram with entity stereotypes, or ArchiMate Data Object elements for a higher-level view. Link key entities to the business capabilities and application components that create, read, update, or delete them.

To build a CRUD matrix, use the Relationship Matrix with Data Objects on one axis and Application Components on the other, and populate the cells with CRUD tagged values. This matrix is the foundation of the data architecture and the primary tool for spotting data-ownership ambiguity — the most common data architecture problem.
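
The logic behind the matrix can be sketched outside the tool. The example below builds a CRUD matrix from a hypothetical list of (application, data object, operations) triples and flags entities that have no creating application or more than one. Treating "exactly one creator" as the signal for clear ownership is a simplifying assumption; the names and data are invented.

```python
from collections import defaultdict

# Hypothetical (application, data object, operations) triples.
USAGE = [
    ("CRM", "Customer", "CRU"),
    ("Billing", "Customer", "CR"),
    ("Billing", "Invoice", "CRUD"),
    ("Analytics", "Invoice", "R"),
    ("Analytics", "Product", "R"),
]


def build_crud_matrix(usage):
    """Return {data_object: {application: operations}}."""
    matrix = defaultdict(dict)
    for app, entity, ops in usage:
        matrix[entity][app] = ops
    return dict(matrix)


def ownership_issues(matrix):
    """Flag entities with zero or several creating applications."""
    issues = {}
    for entity, cells in matrix.items():
        creators = sorted(app for app, ops in cells.items() if "C" in ops)
        if len(creators) != 1:
            issues[entity] = creators
    return issues


matrix = build_crud_matrix(USAGE)
print(ownership_issues(matrix))
# prints {'Customer': ['Billing', 'CRM'], 'Product': []}
```

Here Customer is created by two applications (contested ownership) and Product is read but never created anywhere in scope (no identified source).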

Canonical data model. The canonical model is a vendor-neutral reference for the key business entities used in integration. Model it as a separate package from the logical data model. Each canonical entity is the "master" definition that integration mappings reference: when an application exposes or consumes data via an API, it maps to and from the canonical entity, not directly to another application's proprietary schema.
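
A minimal sketch of that mapping pattern follows. The canonical entity, both application schemas, and every field name are hypothetical; the point is only that each application has one mapping to or from the canonical form and none directly to another application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical definition of a customer."""
    customer_id: str
    full_name: str
    email: str


def crm_to_canonical(record):
    """Map a hypothetical CRM record into the canonical entity."""
    return CanonicalCustomer(
        customer_id=str(record["CustNo"]),
        full_name=f'{record["FirstName"]} {record["LastName"]}',
        email=record["EmailAddr"].lower(),
    )


def canonical_to_billing(customer):
    """Map the canonical entity into a hypothetical billing schema."""
    return {
        "account_ref": customer.customer_id,
        "display_name": customer.full_name,
        "contact_email": customer.email,
    }


crm_record = {
    "CustNo": 1042,
    "FirstName": "Ada",
    "LastName": "Lovelace",
    "EmailAddr": "Ada@Example.com",
}
print(canonical_to_billing(crm_to_canonical(crm_record)))
```

Adding a third application means writing one more pair of mappings against CanonicalCustomer, not one per existing application.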

Data lineage diagrams. Lineage shows how data flows through the landscape — from source through transformations to consumption. Use ArchiMate data-flow views or UML sequence diagrams, and tag each flow:

data_classification — Public / Internal / Confidential / Restricted
data_quality_score — if a data quality assessment has been run
lineage_verified — whether the lineage has been validated with the application owner

Documented lineage is what makes downstream analytics trustworthy: when the lineage of the data behind a Power BI report is recorded in the model, quality issues can be traced to source rather than blamed on the report.
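
Tracing to source can be illustrated with a small graph walk. The sketch below assumes lineage flows have been exported as (source, target, tags) tuples, which is a hypothetical format, and lists the upstream flows feeding a report whose lineage_verified tag is still false.

```python
# Hypothetical flows: (source, target, tags).
FLOWS = [
    ("CRM", "Customer Hub", {"data_classification": "Confidential", "lineage_verified": True}),
    ("Billing", "Customer Hub", {"data_classification": "Confidential", "lineage_verified": False}),
    ("Customer Hub", "Warehouse", {"data_classification": "Internal", "lineage_verified": True}),
    ("Warehouse", "Sales Report", {"data_classification": "Internal", "lineage_verified": True}),
]


def upstream_flows(target, flows):
    """Return every flow that directly or indirectly feeds `target`."""
    found, seen, frontier = [], set(), [target]
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue  # guards against cycles in the lineage graph
        seen.add(node)
        for flow in flows:
            if flow[1] == node:
                found.append(flow)
                frontier.append(flow[0])
    return found


def unverified_sources(target, flows):
    """List upstream (source, target) pairs whose lineage is not yet validated."""
    return sorted(
        (src, dst)
        for src, dst, tags in upstream_flows(target, flows)
        if not tags["lineage_verified"]
    )


print(unverified_sources("Sales Report", FLOWS))
# prints [('Billing', 'Customer Hub')]
```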

The application gap analysis, step by step

The gap analysis compares the baseline application architecture to the target and produces a disposition decision for each application. Run it as a repeatable sequence in the model rather than a one-off workshop.

Establish the baseline inventory

Confirm every Application Component element carries app_owner, lifecycle_status, total_cost_of_ownership, and capability_served. The gap analysis is only as good as the tagged values behind it.
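
One way to make that check repeatable is a small completeness report over an exported inventory. The sketch below assumes the tagged values have already been extracted into a dictionary per component; the export format and the sample data are hypothetical.

```python
REQUIRED_TAGS = (
    "app_owner",
    "lifecycle_status",
    "total_cost_of_ownership",
    "capability_served",
)


def missing_tags(components):
    """Return {component name: [required tags that are absent or blank]}."""
    report = {}
    for name, tags in components.items():
        missing = [
            tag for tag in REQUIRED_TAGS
            if tags.get(tag) is None or not str(tags[tag]).strip()
        ]
        if missing:
            report[name] = missing
    return report


# Hypothetical inventory extract.
components = {
    "CRM North": {
        "app_owner": "Head of Sales",
        "lifecycle_status": "Legacy",
        "total_cost_of_ownership": "850000",
        "capability_served": "Customer Management",
    },
    "Billing": {"app_owner": "", "lifecycle_status": "Active"},
}
print(missing_tags(components))
# prints {'Billing': ['app_owner', 'total_cost_of_ownership', 'capability_served']}
```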

Assign a disposition to each application

Add a disposition tagged value with one of: Retain (meets target, no change), Invest (enhance or scale), Migrate (same capability, different form, e.g. cloud), Replace, Retire (no longer needed), or Tolerate (keep for now, decide later).

Render the disposition heat map

Use a diagram filter or SQL report to color the landscape by disposition. This is the view that turns rationalization from a political exercise into an evidence-based one.
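
The coloring rule itself is simple enough to sketch independently of the tool. In the example below the disposition values are the six listed in the previous step, while the hex colors and the application names are arbitrary assumptions.

```python
from collections import Counter

# Disposition values from the text; the colors are arbitrary assumptions.
DISPOSITION_COLORS = {
    "Retain": "#2e7d32",
    "Invest": "#1565c0",
    "Migrate": "#6a1b9a",
    "Replace": "#ef6c00",
    "Retire": "#c62828",
    "Tolerate": "#9e9e9e",
}


def heat_map(applications):
    """Return {application: color}, rejecting unknown dispositions."""
    colored = {}
    for name, disposition in applications.items():
        if disposition not in DISPOSITION_COLORS:
            raise ValueError(f"{name}: unknown disposition {disposition!r}")
        colored[name] = DISPOSITION_COLORS[disposition]
    return colored


# Hypothetical landscape.
applications = {
    "CRM North": "Replace",
    "Billing": "Retain",
    "Legacy HR": "Retire",
    "Data Hub": "Invest",
}
print(heat_map(applications))
print(Counter(applications.values()))
```

Rejecting unknown values keeps typos such as "Retired" from producing an uncolored application that quietly drops out of the view.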

Publish it as live intelligence

Capability linkage, lifecycle status, TCO, contract expiry, and disposition all live as tagged values. Export them to Power BI so the rationalization dashboard updates as architects update the model — not when someone remembers to refresh a spreadsheet.

With this output in place, the rationalization conversation changes from "we have too many CRMs" (opinion) to "three of our five CRM instances serve capabilities that consolidate in the target architecture; two have contract expiries within 18 months; the combined annual TCO is $4.2M" (model-derived fact).
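
To show how a statement like that can be derived from tagged values, here is a small rollup sketch. Every application name, date, and cost figure is invented for illustration (chosen only so the totals echo the example sentence), and the month calculation ignores the day of the month.

```python
from datetime import date

# Hypothetical CRM instances; all values are invented for illustration.
CRMS = [
    {"name": "CRM A", "consolidates": True, "contract_expiry": date(2026, 6, 30), "tco": 900_000},
    {"name": "CRM B", "consolidates": True, "contract_expiry": date(2027, 3, 31), "tco": 1_100_000},
    {"name": "CRM C", "consolidates": True, "contract_expiry": date(2028, 1, 31), "tco": 700_000},
    {"name": "CRM D", "consolidates": False, "contract_expiry": None, "tco": 800_000},
    {"name": "CRM E", "consolidates": False, "contract_expiry": date(2029, 12, 31), "tco": 700_000},
]


def months_until(as_of, expiry):
    """Whole calendar months from as_of to expiry (day of month ignored)."""
    return (expiry.year - as_of.year) * 12 + (expiry.month - as_of.month)


def summarize(apps, as_of, horizon_months=18):
    """Roll up the figures a rationalization conversation needs."""
    expiring = [
        a["name"] for a in apps
        if a["contract_expiry"] is not None
        and 0 <= months_until(as_of, a["contract_expiry"]) <= horizon_months
    ]
    return {
        "instances": len(apps),
        "consolidating": sum(a["consolidates"] for a in apps),
        "expiring_within_horizon": expiring,
        "combined_tco": sum(a["tco"] for a in apps),
    }


print(summarize(CRMS, as_of=date(2025, 12, 15)))
```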

The data gap analysis

The data gap analysis identifies four kinds of gap. Model each one as a Gap element linked to the affected data entity and the capability or application that requires it, then tag it with type and priority:

Missing data entities — capabilities in the target architecture with no identified data source.
Data quality gaps — entities that exist but lack the quality the target use cases demand.
Data ownership gaps — entities with no authoritative source (master-data-management failures).
Integration gaps — data that needs to flow between applications where no integration exists.

The data gap analysis feeds the integration architecture design here in Phase C and the platform-selection decisions in Phase D.
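
As a sketch of what a tagged Gap element carries, the structure below records the four gap types from the list above along with the affected entity and the thing that requires it. The field names, the numeric priority scale (1 as highest), and the sample gaps are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class GapType(Enum):
    MISSING_ENTITY = "Missing data entity"
    DATA_QUALITY = "Data quality gap"
    DATA_OWNERSHIP = "Data ownership gap"
    INTEGRATION = "Integration gap"


@dataclass
class DataGap:
    entity: str          # affected data entity
    required_by: str     # capability or application that needs it
    gap_type: GapType
    priority: int        # assumption: 1 is the highest priority


def by_priority(gaps):
    """Order gaps for the remediation backlog: priority first, then entity name."""
    return sorted(gaps, key=lambda g: (g.priority, g.entity))


# Hypothetical gaps.
gaps = [
    DataGap("Product", "Pricing Analytics", GapType.MISSING_ENTITY, 2),
    DataGap("Customer", "CRM", GapType.DATA_OWNERSHIP, 1),
    DataGap("Invoice", "Analytics", GapType.INTEGRATION, 2),
]
for gap in by_priority(gaps):
    print(gap.priority, gap.gap_type.value, gap.entity, "->", gap.required_by)
```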

Should Application Architecture or Data Architecture come first?

Application architecture usually comes first, because the application landscape is better understood and provides the context for data decisions. But if the program is data-driven (a data platform initiative, say), start with data architecture and let it drive application decisions. The TOGAF standard does not prescribe an order within Phase C and allows the two to be developed in either sequence or in parallel, so let the program context decide.

How detailed should application interaction diagrams be?

Architecturally significant integrations only. The test: does it cross an organizational boundary, carry data classified Confidential or above, represent a technical risk, or involve a vendor boundary? If yes, model it. Point-to-point calls between modules inside a single application are implementation detail. A typical enterprise of 200 applications has 20–40 significant integrations, not 2,000.
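
The test can be written down as a simple predicate. In the sketch below the attribute names are hypothetical, and the classification ranking assumes the order Public, Internal, Confidential, Restricted used for the data_classification tag earlier, with Restricted treated as above Confidential.

```python
from dataclasses import dataclass

# Assumed ordering of the data_classification values, lowest to highest.
CLASSIFICATION_RANK = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}


@dataclass
class Integration:
    name: str
    crosses_org_boundary: bool = False
    data_classification: str = "Internal"
    technical_risk: bool = False
    crosses_vendor_boundary: bool = False


def is_architecturally_significant(integration):
    """Apply the four-part test: any single criterion is enough."""
    confidential_or_above = (
        CLASSIFICATION_RANK[integration.data_classification]
        >= CLASSIFICATION_RANK["Confidential"]
    )
    return (
        integration.crosses_org_boundary
        or confidential_or_above
        or integration.technical_risk
        or integration.crosses_vendor_boundary
    )


# Hypothetical integrations.
candidates = [
    Integration("CRM to Billing", data_classification="Confidential"),
    Integration("Order module to Pricing module"),
    Integration("Payroll to Bank", crosses_org_boundary=True, crosses_vendor_boundary=True),
]
print([i.name for i in candidates if is_architecturally_significant(i)])
# prints ['CRM to Billing', 'Payroll to Bank']
```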

How does Phase C relate to application portfolio management?

Phase C creates the architecture-layer view of the portfolio; APM adds the business-layer view of cost, risk, value, and strategic fit. The Application Component elements you create in Phase C become the persistent portfolio inventory, updated as the landscape changes. Exposing that inventory to business intelligence tools removes the manual extraction step entirely.

Do we really need a canonical data model?

You need one if more than three applications exchange data about the same entities (customers, products, orders). Without it, every pair of integrated applications carries its own bespoke schema mapping, so the number of mappings grows roughly with the square of the number of applications. With it, every application maps to one reference, so the number of mappings grows linearly. For small landscapes it is overkill; above 20 applications it is an architecture necessity.
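
The arithmetic behind that threshold can be checked directly. The sketch below counts mappings in the worst case where every application exchanges the entity with every other one, which is an assumption since real landscapes are rarely fully connected, and counts one mapping per pair.

```python
def point_to_point_mappings(n):
    """Pairwise mappings when each of n applications maps directly to every other."""
    return n * (n - 1) // 2


def canonical_mappings(n):
    """Mappings when each of n applications maps only to the canonical model."""
    return n


for n in (3, 4, 10, 20):
    print(n, point_to_point_mappings(n), canonical_mappings(n))
# prints:
# 3 3 3
# 4 6 4
# 10 45 10
# 20 190 20
```

At three applications the two approaches cost the same; from four upward the canonical model needs fewer mappings, and at 20 applications the worst case is 190 pairwise mappings against 20.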

How does Phase C data architecture connect to AI initiatives?

AI tools need well-governed data with known lineage, quality, and ownership. Phase C data architecture is the foundation that determines whether AI projects succeed or fail. If your data architecture shows fragmented ownership, no canonical model, and poor quality scores, AI projects hit data-quality walls during build. Phase C produces the remediation roadmap that clears those blockers first.

What is the right granularity for a logical data model?

Entity level — key business objects and their relationships — not the attribute level. Capturing every field is database design, not architecture. Aim for 20–50 core entities for a mid-sized enterprise, each traceable to at least one business capability and one application component. If an entity cannot be traced to either, question whether it belongs in the Phase C model.

Connect Phase C to live analytics

If you need to connect your Phase C application and data architecture to live analytics and AI tools — exposing the application portfolio and data lineage to Power BI, Tableau, or AI assistants — that integration layer is what AI Power Tools for EA is built for. It turns the structured model you produced in Phase C into a live business-intelligence asset, and is a natural next step once the architecture is in place. For the broader picture, see how AI Augmented Architecture reshapes the work across modeling, analysis, governance, and engagement.

Make your application and data model a live asset.

Talk to a practitioner about turning your Phase C architecture into a current portfolio and lineage view your stakeholders actually use.
