Direct Answer
TOGAF Phase C (Information Systems Architectures) covers two sub-phases: Application Architecture and Data Architecture. Together they answer the question “what systems do we need, and what data do they handle?” In Sparx EA, application architecture uses the ArchiMate application layer (application components, interfaces, and interaction diagrams), while data architecture uses logical data models, canonical data models, and data lineage diagrams. The gap analysis compares the baseline application landscape to the target and assigns each application a disposition: retain, invest, migrate, replace, retire, or tolerate. Phase C outputs are the primary inputs to application portfolio management decisions, vendor selection, and integration strategy. They also establish the data layer that EA GraphLink’s Interface A exposes to Power BI and Tableau, which makes Phase C the point where your enterprise architecture starts feeding your analytics estate.
Phase C Structure: Two Sub-Phases
TOGAF Phase C is deliberately split. Many practitioners blur the boundary and produce a combined “application and data” model that satisfies neither sub-phase. Keep them separate in your Sparx EA package structure:
Phase C - Information Systems Architecture
├── C1 - Application Architecture
│   ├── Baseline Application Architecture
│   ├── Target Application Architecture
│   └── Application Gap Analysis
└── C2 - Data Architecture
    ├── Logical Data Model
    ├── Canonical Data Model
    ├── Data Lineage Diagrams
    └── Data Gap Analysis
This separation matters because the two sub-phases have different stakeholders (application owners vs data owners), different approval timelines, and different downstream consumers. Application architecture decisions feed Phase D (technology). Data architecture decisions feed integration design and analytics.
Application Architecture in Sparx EA
Application Portfolio. Model the complete application landscape using ArchiMate Application Component elements. Each application component should have standard tagged values:
- app_owner: business owner (not IT owner)
- lifecycle_status: Active / Sunset / Strategic / Legacy
- total_cost_of_ownership: annual figure
- vendor: vendor name
- contract_expiry: if applicable
- capability_served: link to Phase B capability map
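A tagged-value standard is only useful if it is enforced. The sketch below shows one way to check an exported element against the list above. The `ApplicationComponent` class and `tag_problems` helper are hypothetical and not part of Sparx EA, and treating every tag except contract_expiry as mandatory is an assumption.

```python
from dataclasses import dataclass, field

# Tag names follow the list above. Treating all but contract_expiry as
# mandatory is an assumption made for this sketch.
REQUIRED_TAGS = (
    "app_owner",
    "lifecycle_status",
    "total_cost_of_ownership",
    "vendor",
    "capability_served",
)
# The four values from the list, plus In-Flight, the transitional value
# the FAQ uses for applications being built or replaced.
LIFECYCLE_VALUES = {"Active", "Sunset", "Strategic", "Legacy", "In-Flight"}


@dataclass
class ApplicationComponent:
    """Hypothetical stand-in for an exported Application Component."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def tag_problems(app: ApplicationComponent) -> list[str]:
    """Return human-readable problems with an element's tagged values."""
    problems = [f"missing tag: {t}" for t in REQUIRED_TAGS if not app.tags.get(t)]
    status = app.tags.get("lifecycle_status")
    if status and status not in LIFECYCLE_VALUES:
        problems.append(f"unknown lifecycle_status: {status}")
    return problems
```

Running a check like this before each portfolio review keeps gaps in the tagging visible to the architects who can fix them.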
With these tagged values in place, Sparx EA can generate an application portfolio dashboard. EA GraphLink Interface A (GraphQL → Power BI) exposes this dataset directly to a Power BI report — giving portfolio managers a live application inventory without manual spreadsheet maintenance.
Application Interaction Diagrams. For each significant integration point, create an ArchiMate application cooperation view showing which application components exchange data, through which interfaces (Application Interface elements), and the direction and frequency of exchange. Do not model every API. Model the architecturally significant interactions: those that cross organisational boundaries, carry critical data, or represent technical risk.
Integration Architecture. For complex integration landscapes, create a dedicated integration architecture view. In Sparx EA, model the integration platform (ESB, API gateway, event bus) as an Application Component with realisation relationships to the integration services it provides. Show each application’s connectivity to the integration layer using ArchiMate Serving or Flow relationships; fall back to a plain Association only where the nature of the dependency is not yet known.
Data Architecture in Sparx EA
Logical Data Model. The logical data model captures the key business entities and their relationships, independent of physical implementation. In Sparx EA, use the UML Class diagram with entity stereotypes, or ArchiMate Data Object elements for a higher-level view. Key entities should be linked to the business capabilities and application components that create, read, update, or delete them (CRUD matrix).
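To create a CRUD matrix in Sparx EA, open the Relationship Matrix with Data Objects on one axis and Application Components on the other. For each connector between an application and a data object, record the operations it performs (Create, Read, Update, Delete) in a tagged value on the connector. This matrix is the foundation of the data architecture and the primary tool for spotting data ownership ambiguity, the most common data architecture problem: an entity that several applications create, or that none creates, has no clear owner.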
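Once the matrix is exported, the ownership check can be automated. The following is a minimal sketch that assumes the export arrives as (application, data object, CRUD letters) rows. The sample rows and function names are illustrative, and "exactly one creating application" is a simplified stand-in for clear ownership.

```python
from collections import defaultdict

# Hypothetical export rows: (application, data object, CRUD letters).
# In a real model these would come from connectors and their tagged values.
ACCESS = [
    ("CRM", "Customer", "CRU"),
    ("Billing", "Customer", "CRU"),
    ("Billing", "Invoice", "CRUD"),
    ("Data Warehouse", "Customer", "R"),
    ("Data Warehouse", "Product", "R"),
]


def crud_matrix(rows):
    """Build {data_object: {application: set of CRUD letters}}."""
    matrix = defaultdict(dict)
    for app, entity, ops in rows:
        matrix[entity].setdefault(app, set()).update(ops.upper())
    return dict(matrix)


def ownership_issues(matrix):
    """Flag entities with no creating application, or with more than one."""
    issues = {}
    for entity, by_app in matrix.items():
        creators = sorted(a for a, ops in by_app.items() if "C" in ops)
        if len(creators) != 1:
            issues[entity] = creators
    return issues


if __name__ == "__main__":
    for entity, creators in ownership_issues(crud_matrix(ACCESS)).items():
        print(entity, "->", creators or "no creator")
```

In the sample data, Customer is flagged because two applications create it and Product because none does. Those are the conversations to have with data owners first.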
Canonical Data Model. The canonical data model is a vendor-neutral reference for key business entities used in integration. In Sparx EA, model it as a separate package from the logical data model. Each canonical entity is the “master” definition that integration mappings reference. When an application exposes or consumes data via an API, it maps to and from the canonical entity, not directly to another application’s proprietary schema.
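To make the mapping discipline concrete, here is a small sketch in which a CRM record reaches a billing system only by way of a canonical customer entity. All field names, on both the application side and the canonical side, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical entity; the field names are illustrative."""
    customer_id: str
    full_name: str
    email: str


# Each application owns two mappings at most: to and from the canonical form.
def crm_to_canonical(rec: dict) -> CanonicalCustomer:
    """Translate the CRM's proprietary record into the canonical entity."""
    return CanonicalCustomer(
        customer_id=str(rec["AccountNo"]),
        full_name=f'{rec["FirstName"]} {rec["LastName"]}',
        email=rec["EmailAddr"].lower(),
    )


def canonical_to_billing(c: CanonicalCustomer) -> dict:
    """Translate the canonical entity into the billing system's schema."""
    return {"cust_ref": c.customer_id, "name": c.full_name, "contact_email": c.email}


def crm_to_billing(rec: dict) -> dict:
    """CRM and Billing never reference each other's schema directly."""
    return canonical_to_billing(crm_to_canonical(rec))
```

If the CRM is later replaced, only `crm_to_canonical` needs rewriting; the billing mapping is untouched.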
Data Lineage Diagrams. Data lineage shows how data flows through the application landscape: from source (where it is created), through transformations, to consumption (where it is used for decisions). In Sparx EA, model lineage with ArchiMate Flow relationships between application components, or with UML sequence diagrams where the order of transformations matters. Tag each data flow with:
- data_classification: Public / Internal / Confidential / Restricted
- data_quality_score: if a data quality assessment has been run
- lineage_verified: whether the lineage has been validated with the application owner
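With those tags in place, tracing a report back to its sources becomes a graph walk. The sketch below assumes the flows have been exported as simple records. The `Flow` class, the sample flows, and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    data_classification: str
    lineage_verified: bool


# Hypothetical flows; the names are placeholders, not from a real model.
FLOWS = [
    Flow("CRM", "Integration Hub", "Confidential", True),
    Flow("Billing", "Integration Hub", "Confidential", False),
    Flow("Integration Hub", "Data Warehouse", "Confidential", True),
    Flow("Data Warehouse", "Sales Report", "Internal", True),
]


def upstream_flows(flows, consumer):
    """Return every flow that feeds `consumer`, directly or indirectly."""
    incoming = {}
    for f in flows:
        incoming.setdefault(f.target, []).append(f)
    found, seen, stack = [], set(), [consumer]
    while stack:
        node = stack.pop()
        if node in seen:  # guards against cycles in the model
            continue
        seen.add(node)
        for f in incoming.get(node, []):
            found.append(f)
            stack.append(f.source)
    return found


def unverified_sources(flows, consumer):
    """Upstream flows whose lineage has not been confirmed with the owner."""
    return [f for f in upstream_flows(flows, consumer) if not f.lineage_verified]
```

For the sample data, asking for the unverified flows behind "Sales Report" returns the Billing feed, which is where a data quality investigation would start.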
Data lineage is what makes EA GraphLink’s Interface A genuinely useful for analytics: when Power BI knows the lineage of the data it is reporting on, data quality issues can be traced to source rather than blamed on the report.
The Application Gap Analysis
The gap analysis compares the baseline application architecture to the target, producing a disposition decision for each application:
- Retain — meets target requirements, no change needed
- Invest — needs enhancement or scaling
- Migrate — capability is needed but in a different form (e.g., cloud migration)
- Replace — will be replaced by a different application
- Retire — capability is no longer needed in the target architecture
- Tolerate — keep as-is for now, scheduled for future decision
In Sparx EA, add a disposition tagged value to each Application Component element. Then use a custom diagram filter or SQL report to produce the disposition heat map — a view of the application landscape colour-coded by disposition decision.
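One way to prototype the SQL report is against a small stand-in database. The sketch below assumes elements live in a `t_object` table and tagged values in `t_objectproperties`, and that ArchiMate application components carry the stereotype `ArchiMate_ApplicationComponent`. Verify those names against your own repository and MDG version. The colour palette is arbitrary.

```python
import sqlite3

# Colours are an arbitrary illustrative palette, not a Sparx EA default.
COLOURS = {
    "Retain": "green", "Invest": "blue", "Migrate": "teal",
    "Replace": "orange", "Retire": "red", "Tolerate": "grey",
}

# Table, column, and stereotype names are assumptions to check locally.
QUERY = """
SELECT o.Name, p.Value
FROM t_object o
LEFT JOIN t_objectproperties p
       ON p.Object_ID = o.Object_ID AND p.Property = 'disposition'
WHERE o.Stereotype = 'ArchiMate_ApplicationComponent'
ORDER BY o.Name
"""


def heat_map(conn):
    """Map each application to a colour; untagged ones show as 'white'."""
    return {name: COLOURS.get(value, "white") for name, value in conn.execute(QUERY)}


def demo_connection():
    """In-memory stand-in carrying only the columns the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_object (Object_ID INTEGER, Name TEXT, Stereotype TEXT);
        CREATE TABLE t_objectproperties (Object_ID INTEGER, Property TEXT, Value TEXT);
        INSERT INTO t_object VALUES
            (1, 'CRM', 'ArchiMate_ApplicationComponent'),
            (2, 'Billing', 'ArchiMate_ApplicationComponent'),
            (3, 'Legacy HR', 'ArchiMate_ApplicationComponent');
        INSERT INTO t_objectproperties VALUES
            (1, 'disposition', 'Replace'),
            (2, 'disposition', 'Retain');
    """)
    return conn
```

The LEFT JOIN is deliberate: applications with no disposition yet still appear, in a neutral colour, so missing decisions stay visible.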
This is the primary input to application rationalization. It answers the question “which applications are we keeping, which are we replacing, and in what order?” with evidence from the model rather than individual stakeholder opinion.
How Phase C Feeds Application Rationalization
Application rationalization without an architecture model is a political exercise. With a Phase C output in Sparx EA, it becomes an evidence-based process.
The rationalization conversation changes from “we have too many CRMs” (opinion) to “three of our five CRM instances serve capabilities that will be consolidated in the target architecture; two have contract expiries within 18 months; the TCO for all five is $4.2M annually” (model-derived facts).
The data for this analysis — capability linkage, lifecycle status, TCO, contract expiry, disposition — is all in tagged values on Application Component elements. EA GraphLink Interface A exposes it to Power BI as a live dataset. The rationalization dashboard updates automatically as architects update the model, not when someone remembers to update a spreadsheet.
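The model-derived facts in the CRM example can be computed directly from those tagged values. This is a rough sketch over invented portfolio rows: the application names, figures, and the `rationalization_facts` helper are all hypothetical, and the 18-month horizon simply mirrors the example.

```python
from datetime import date

# Hypothetical portfolio rows built from the tagged values named above.
APPS = [
    {"name": "CRM North", "tco": 900_000, "contract_expiry": date(2025, 9, 30), "disposition": "Replace"},
    {"name": "CRM South", "tco": 650_000, "contract_expiry": date(2027, 3, 31), "disposition": "Replace"},
    {"name": "CRM Core", "tco": 1_400_000, "contract_expiry": None, "disposition": "Invest"},
]


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (negative if end is earlier)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return months - 1 if end.day < start.day else months


def rationalization_facts(apps, today: date, horizon_months: int = 18) -> dict:
    """Summarise TCO, near-term contract expiries, and consolidation targets."""
    expiring = [
        a["name"] for a in apps
        if a["contract_expiry"] is not None
        and 0 <= months_between(today, a["contract_expiry"]) < horizon_months
    ]
    return {
        "total_tco": sum(a["tco"] for a in apps),
        "expiring_within_horizon": expiring,
        "to_consolidate": [a["name"] for a in apps if a["disposition"] in {"Replace", "Retire"}],
    }
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the result reproducible when the figures are quoted in a steering pack.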
Data Gap Analysis
The data gap analysis identifies:
- Missing data entities — capabilities in the target architecture that have no identified data source
- Data quality gaps — entities that exist but with insufficient quality for target use cases
- Data ownership gaps — entities with no identified authoritative source (master data management failures)
- Integration gaps — data that needs to flow between applications but no integration exists
Model each gap as a Gap element in Sparx EA, linked to the affected data entity and the capability or application that requires it. Tag with gap type and priority. The data gap analysis feeds the integration architecture design in Phase C and the platform selection decisions in Phase D.
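If the Gap elements are exported with their tags, a prioritised backlog takes only a few lines. In this sketch the `DataGap` record and its fields are hypothetical, the four gap types come from the list above, and a lower priority number meaning more urgent is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class GapType(Enum):
    MISSING_ENTITY = "Missing data entity"
    QUALITY = "Data quality gap"
    OWNERSHIP = "Data ownership gap"
    INTEGRATION = "Integration gap"


@dataclass(frozen=True)
class DataGap:
    entity: str
    required_by: str   # capability or application that needs the data
    gap_type: GapType
    priority: int      # assumption: 1 is most urgent


def backlog(gaps):
    """Order gaps by priority, then by type and entity for stable output."""
    return sorted(gaps, key=lambda g: (g.priority, g.gap_type.value, g.entity))


def by_type(gaps):
    """Count gaps per type, including types with no gaps."""
    counts = {t: 0 for t in GapType}
    for g in gaps:
        counts[g.gap_type] += 1
    return counts
```

The per-type counts give a quick read on where the remediation effort will land: a cluster of ownership gaps points to governance work, while a cluster of integration gaps points to platform work.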
Frequently Asked Questions
Q: Should Application Architecture or Data Architecture come first in Phase C? A: Application Architecture typically comes first because the application landscape is better understood and provides the context for data architecture decisions. However, if the programme is data-driven (a data platform initiative, for example), start with data architecture and let it drive application decisions. The TOGAF standard does not prescribe an order within Phase C; it allows the two architectures to be developed in either sequence, so let the programme context decide.
Q: How detailed should application interaction diagrams be? A: Architecturally significant integrations only. The test: does this integration cross an organisational boundary, carry data classified Confidential or above, represent a technical risk, or involve a vendor boundary? If yes, model it. Point-to-point calls between the modules of a single application are implementation detail, not architecture. A typical enterprise of 200 applications has 20–40 architecturally significant integrations, not 2,000.
Q: How does Phase C relate to an application portfolio management practice? A: Phase C creates the architecture-layer view of the application portfolio. APM adds the business-layer view: cost, risk, business value, strategic fit. In Sparx EA, APM is an ongoing practice built on top of Phase C outputs. The application component elements created in Phase C become the persistent portfolio inventory, updated as the landscape changes. EA GraphLink’s Interface A makes this inventory available to business intelligence tools without manual extraction.
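The four-part test can be written down as a predicate, which is handy when triaging a long integration inventory. This is only a sketch: the `Integration` record and its field names are hypothetical, and the ranking follows the Public / Internal / Confidential / Restricted scale used for data flows.

```python
from dataclasses import dataclass

# Ordered from least to most sensitive.
CLASSIFICATION_RANK = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}


@dataclass(frozen=True)
class Integration:
    name: str
    crosses_org_boundary: bool
    data_classification: str
    technical_risk: bool
    crosses_vendor_boundary: bool


def is_architecturally_significant(i: Integration) -> bool:
    """True if any one of the four criteria in the answer above holds."""
    sensitive = CLASSIFICATION_RANK[i.data_classification] >= CLASSIFICATION_RANK["Confidential"]
    return i.crosses_org_boundary or sensitive or i.technical_risk or i.crosses_vendor_boundary
```

Anything that returns False stays out of the Phase C diagrams and belongs in solution-level design documentation instead.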
Q: What is a canonical data model and do we really need one? A: A canonical data model is a vendor-neutral reference schema for your most important business entities. You need one if you have more than three applications exchanging data about the same entities (customers, products, orders, etc.). Without a canonical model, every integration has a bespoke schema mapping and you get N-squared integration complexity. With a canonical model, every application maps to one reference — linear complexity. For small landscapes it is overkill; for anything above 20 applications it is an architecture necessity.
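The complexity argument is easy to check with arithmetic. The sketch below counts mappings under two simplifying assumptions: in the worst case every application exchanges the entity with every other application in both directions, and with a canonical model each application needs one mapping in and one mapping out.

```python
def point_to_point_mappings(n_apps: int) -> int:
    """Worst case: every ordered pair of applications needs its own mapping."""
    return n_apps * (n_apps - 1)


def canonical_mappings(n_apps: int) -> int:
    """Each application maps to the canonical model and back again."""
    return 2 * n_apps


if __name__ == "__main__":
    for n in (3, 5, 20):
        print(n, point_to_point_mappings(n), canonical_mappings(n))
```

Under these assumptions the two approaches break even at three applications (6 mappings each), which is consistent with the "more than three" rule of thumb. At 20 applications the worst case is 380 point-to-point mappings against 40 canonical ones.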
Q: How do we handle applications that are in flight (being replaced or built) during Phase C? A: Model both the current state and the target state as separate application component elements. Give the transitional application a lifecycle_status of In-Flight, adding that value to the standard set (Active / Sunset / Strategic / Legacy). This preserves the baseline architecture for gap analysis while making the target architecture explicit. Do not model only the target, because the gap analysis requires the baseline as a reference point.
Q: How does Phase C data architecture connect to AI initiatives? A: AI tools need data — specifically, well-governed, well-documented data with known lineage, quality, and ownership. Phase C data architecture is the foundation that makes AI projects succeed or fail. If your data architecture shows fragmented ownership, no canonical model, and poor quality scores, AI projects will hit data quality walls during build. Phase C produces the data architecture remediation roadmap that addresses these issues before they become AI project blockers.
Q: Can Sparx EA generate the application portfolio report automatically? A: Yes. Using the built-in model search and reporting features, or via EA GraphLink Interface A (GraphQL → Power BI), Sparx EA can generate an application portfolio report from tagged values on Application Component elements. The report is as current as the model because it draws from the live repository. Initial configuration takes about half a day; after that, the only upkeep is keeping the tagged values themselves up to date.
Q: What is the right level of granularity for a logical data model in Phase C? A: Phase C logical data models operate at the entity level — key business objects and their relationships — not the attribute level. Capturing every field is a database design activity, not an architecture activity. Aim for 20–50 core entities for a mid-sized enterprise. Each entity should be traceable to at least one business capability and at least one application component. If an entity cannot be traced to either, question whether it belongs in the Phase C model.
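The traceability rule lends itself to an automated check. This sketch assumes the trace links have been exported as two dictionaries keyed by entity name; the sample data and function names are illustrative.

```python
# Hypothetical trace links exported from the model: entity -> linked names.
ENTITY_CAPABILITIES = {"Customer": {"Sales"}, "Invoice": set(), "Campaign": set()}
ENTITY_APPLICATIONS = {"Customer": {"CRM", "Billing"}, "Invoice": {"Billing"}, "Campaign": set()}


def incompletely_traced(entities, capabilities, applications):
    """Entities missing a capability link, an application link, or both."""
    return sorted(
        e for e in entities
        if not capabilities.get(e) or not applications.get(e)
    )


def untraceable(entities, capabilities, applications):
    """Entities linked to neither a capability nor an application."""
    return sorted(
        e for e in entities
        if not capabilities.get(e) and not applications.get(e)
    )
```

In the sample data, Invoice needs a capability link added, while Campaign has no links at all and is a candidate for removal from the Phase C model.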
Next Step
If you need to connect your Phase C application and data architecture to live analytics and AI tools — exposing your application portfolio and data lineage to Power BI, Tableau, or AI assistants — the Connect engagement is the right next step. Connect delivers EA GraphLink configuration, the Interface A GraphQL layer, and the Power BI templates that turn your Sparx EA model into a live business intelligence asset.