Why your EA repository quality determines your AI output quality

The MDG is the instruction manual

When you ask an AI system to query your EA repository and answer a question like “what data sources feed our Customer service system?”, the AI doesn’t understand “Customer service system” the way you do.

It understands the raw database schema: tables, columns, relationships, constraints. The word “Customer service system” exists somewhere in that schema, probably in an element name, probably with variation in spelling or abbreviation across different parts of the repository.

What transforms the raw database into something the AI can actually reason about is the MDG (Model Driven Generation) Technology: the metamodel that defines what kinds of things can exist in your EA repository, what attributes they have, how they relate to each other, and what each of those things means conceptually.

The MDG is the instruction manual that tells the AI: “A System is a type of Application Component. Application Components have attributes like Technology Stack and Owner and Criticality. Application Components can Connect To other Application Components. A Connection represents a logical data or control flow.”
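To make the "instruction manual" idea concrete, here is a minimal sketch of the statements above written as data. It is not how Sparx EA stores an MDG; the class names (`ElementType`, `RelationshipType`) and the helper `attributes_of` are hypothetical and exist only for illustration. The element types, attributes, and relationship come from the quoted manual text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ElementType:
    name: str
    parent: Optional[str] = None        # "is a type of" link, if any
    attributes: Tuple[str, ...] = ()    # attributes declared on this type


@dataclass(frozen=True)
class RelationshipType:
    name: str
    source: str     # element type allowed at the start
    target: str     # element type allowed at the end
    meaning: str    # the conceptual definition the AI needs


# Hypothetical in-memory metamodel mirroring the quoted "manual".
ELEMENT_TYPES: Dict[str, ElementType] = {
    "Application Component": ElementType(
        name="Application Component",
        attributes=("Technology Stack", "Owner", "Criticality"),
    ),
    "System": ElementType(name="System", parent="Application Component"),
}

RELATIONSHIP_TYPES: Dict[str, RelationshipType] = {
    "Connect To": RelationshipType(
        name="Connect To",
        source="Application Component",
        target="Application Component",
        meaning="A logical data or control flow",
    ),
}


def attributes_of(type_name: str) -> Tuple[str, ...]:
    """Collect attributes declared on a type and on every parent type."""
    collected: Tuple[str, ...] = ()
    current = ELEMENT_TYPES.get(type_name)
    while current is not None:
        collected = current.attributes + collected
        current = ELEMENT_TYPES.get(current.parent) if current.parent else None
    return collected
```

With this in place, `attributes_of("System")` resolves the attributes a System inherits from Application Component, which is the kind of lookup an AI needs before it can turn "criticality of a system" into a query.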

Without that manual, the AI sees database columns but doesn’t understand what they represent architecturally. With the manual, the AI can translate your business question into database queries and then translate the results back into architecture answers.

This is why MDG quality is the single biggest variable in whether your AI output is useful.

What bad MDG looks like vs. clean MDG

Picture two scenarios.

Scenario 1: Implicit knowledge.

Your architects have modeled the organization’s systems extensively. There are hundreds of application components, dozens of technology platforms, elaborate connection diagrams. But when you ask where the MDG defines what a “Technology Platform” is, what attributes it must have, what types of things can connect to it, the answer is fuzzy. The definition exists partially in the metamodel, partially in architectural conventions that only experienced architects know, partially in the actual data itself.

Some platforms have a clear Business Capability mapping. Some don’t. Some have a Technology Type defined. Others use a description field instead. Some connections are modeled as formal relationships. Others are documented in a notes field. The pattern isn’t consistent because the MDG doesn’t enforce it. When the AI tries to answer “what platforms support our e-commerce capability?”, it has to navigate contradictory patterns and make inferences about what the data probably means.

The answers are sometimes right and sometimes wrong, in ways that are hard to predict. Stakeholders learn not to trust the output.

Scenario 2: Clean MDG.

The MDG explicitly defines Technology Platform as an element type with mandatory attributes: Name, Technology Category, Vendor, Life Cycle Status, Business Capability (cardinality: many-to-many). Every Technology Platform in the repository has these attributes populated consistently. The allowed values for Technology Category and Life Cycle Status are explicitly defined. Relationships are all modeled using the formal Supports relationship type, never documented in notes.
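A rule set like this can be checked mechanically. The sketch below assumes each Technology Platform has been exported as a plain dictionary. The mandatory attribute names come from the paragraph above. The allowed values in `ALLOWED` are invented placeholders, since the actual lists would come from your own MDG, and `validate_platform` is a hypothetical helper.

```python
from typing import Dict, List, Set

# Mandatory attributes as listed for Technology Platform.
MANDATORY = (
    "Name",
    "Technology Category",
    "Vendor",
    "Life Cycle Status",
    "Business Capability",   # many-to-many, so stored here as a list
)

# Placeholder value lists (assumption): replace with the ones your MDG defines.
ALLOWED: Dict[str, Set[str]] = {
    "Technology Category": {"Database", "Middleware", "Cloud Service"},
    "Life Cycle Status": {"Planned", "Active", "Retiring", "Retired"},
}


def validate_platform(platform: dict) -> List[str]:
    """Return a list of rule violations; an empty list means the element is clean."""
    problems = []
    for attr in MANDATORY:
        # Treat absent, empty-string, and empty-list values as missing.
        if not platform.get(attr):
            problems.append(f"missing {attr}")
    for attr, allowed in ALLOWED.items():
        value = platform.get(attr)
        if value and value not in allowed:
            problems.append(f"{attr} has unlisted value {value!r}")
    return problems


clean = {
    "Name": "Order DB",
    "Technology Category": "Database",
    "Vendor": "ExampleVendor",
    "Life Cycle Status": "Active",
    "Business Capability": ["E-commerce"],
}
messy = {
    "Name": "Legacy DB",
    "Technology Category": "Database",
    "Life Cycle Status": "old",
    "Business Capability": [],
}
```

Running `validate_platform` over every platform gives a quick measure of how far the repository is from the "clean" scenario.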

When you ask “what platforms support our e-commerce capability?”, the AI traverses the explicit Supports relationship from e-commerce capability to every Technology Platform that connects to it. The answer is reliable because the underlying data is structured and consistent.

The difference isn’t subtle. One produces answers that require human verification. The other produces answers you can act on.

Why this matters for Kernaro and GraphLink

Sparx’s AI Hub (Kernaro) and the GraphLink MCP server work by transforming your EA repository’s database schema into a semantic layer that the AI can reason about.

That transformation depends entirely on a well-defined MDG. If your MDG is clean, if the metamodel is explicit and the data follows it consistently, then GraphLink can build an accurate semantic model and the AI can answer questions reliably. If your MDG is implicit or inconsistent, GraphLink can only approximate what the data means, and the AI’s answers reflect that uncertainty.

This is why some organizations get transformative value from Connect, and others get answers that sound plausible but aren’t quite right.

The organizations that get value did the MDG work first. They didn’t model the entire landscape perfectly; that’s an infinite task. But they made the metamodel explicit, applied it consistently to the elements that matter most, and invested in keeping it current. They’re using the tool to answer well-defined questions about well-structured data.

The organizations that get unclear answers skipped the MDG discipline. They modeled extensively but inconsistently. The data is rich but implicit. The AI is trying to reverse-engineer what the architecture actually is from contradictory signals.

A concrete example

Imagine you want to know: “What systems have a criticality of High or above that depend on the legacy database platform?”

With a clean MDG, the query path is clear: traverse from Database Platform (explicitly defined element type, with a Technology Type attribute that can be filtered) to all Systems that have a Depends On relationship (explicitly defined relationship type) and filter where System.Criticality (explicitly defined attribute) is in the set [High, Critical]. The result is reliable.
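That query path can be sketched in a few lines once the data is explicit. The example below runs over a small in-memory model, not the real repository schema. The `Element` and `Relationship` classes, the sample systems, and the Technology Type value "Legacy" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Element:
    name: str
    element_type: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relationship:
    source: str      # name of the dependent element
    target: str      # name of the element depended on
    rel_type: str


def critical_dependents(
    elements: Sequence[Element],
    relationships: Sequence[Relationship],
    technology_type: str,
    levels: Sequence[str] = ("High", "Critical"),
) -> List[str]:
    """Systems with a Depends On link to a matching Database Platform."""
    by_name = {e.name: e for e in elements}
    # Step 1: select the platforms by their explicit Technology Type.
    platforms = {
        e.name
        for e in elements
        if e.element_type == "Database Platform"
        and e.attributes.get("Technology Type") == technology_type
    }
    # Step 2: follow Depends On relationships, then filter on Criticality.
    found = set()
    for rel in relationships:
        if rel.rel_type != "Depends On" or rel.target not in platforms:
            continue
        system = by_name.get(rel.source)
        if (
            system is not None
            and system.element_type == "System"
            and system.attributes.get("Criticality") in levels
        ):
            found.add(system.name)
    return sorted(found)


elements = [
    Element("Legacy DB", "Database Platform", {"Technology Type": "Legacy"}),
    Element("Cloud DB", "Database Platform", {"Technology Type": "Cloud"}),
    Element("Billing", "System", {"Criticality": "High"}),
    Element("Claims", "System", {"Criticality": "Critical"}),
    Element("Portal", "System", {"Criticality": "Low"}),
    Element("Reporting", "System", {"Criticality": "High"}),
]
relationships = [
    Relationship("Billing", "Legacy DB", "Depends On"),
    Relationship("Claims", "Legacy DB", "Depends On"),
    Relationship("Portal", "Legacy DB", "Depends On"),
    Relationship("Reporting", "Cloud DB", "Depends On"),
]

result = critical_dependents(elements, relationships, "Legacy")
```

Every step filters on a named type, relationship, or attribute, so nothing has to be inferred from free text.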

With an implicit MDG, the same question requires guessing:

Is “Database Platform” a Technology Platform element, or a Technology Component? The MDG doesn’t explicitly distinguish.
Is the legacy database called “Legacy DB” or “Oracle Legacy” or something else? The naming pattern isn’t enforced.
Are dependencies modeled as formal relationships or documented in Connection diagrams or noted in description fields?
Is “Criticality” an attribute on System elements or on something else? Is it called Criticality or Business Impact or something else?

The AI can make educated guesses. Sometimes it’s right. Sometimes the combination of guesses produces a plausible-sounding but incorrect answer.
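A small sketch shows how such guesses go wrong without raising any error. The records below are invented. They imitate a repository where one system uses a Criticality attribute and a formal dependency, while another uses Business Impact and mentions its dependency only in a notes field. The function `guess_answer` is hypothetical and stands in for any query that commits to one interpretation.

```python
from typing import List

# Invented sample data imitating inconsistent modeling.
systems = [
    {"name": "Billing", "Criticality": "High", "depends_on": ["Legacy DB"]},
    {
        "name": "Claims",
        "Business Impact": "Critical",   # same idea, different attribute name
        "depends_on": [],                # dependency exists only in the notes
        "notes": "Reads nightly from Oracle Legacy",
    },
    {"name": "Portal", "Criticality": "Low", "depends_on": ["Legacy DB"]},
]


def guess_answer(records: list, platform_name: str, criticality_field: str) -> List[str]:
    """Answer the question under ONE guess about names and fields."""
    return [
        s["name"]
        for s in records
        if platform_name in s.get("depends_on", [])
        and s.get(criticality_field) in {"High", "Critical"}
    ]


# Guess: the platform is called "Legacy DB" and the field is "Criticality".
first_guess = guess_answer(systems, "Legacy DB", "Criticality")

# Guess: the platform is called "Oracle Legacy" and the field is "Business Impact".
second_guess = guess_answer(systems, "Oracle Legacy", "Business Impact")
```

The first guess returns only Billing and the second returns nothing. Claims, a critical system that does depend on the legacy database, appears in neither result, and neither result signals that anything is missing.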

What separates these scenarios isn’t the AI. It’s the data discipline behind the MDG.

What to do about it

Before you engage Connect or GraphLink, audit your MDG against these questions:

Is your metamodel explicit? Can you point to a specific document or configuration that defines what element types exist, what attributes they have, what relationships are possible, and what the cardinality is? Or does the definition exist partially in the metamodel and partially in the minds of your architects?

Is your data consistent? Do all System elements have a Technology Stack attribute? Do all Dependencies use the same relationship type? Or do you find one-offs and workarounds because the MDG wasn’t quite right?

Are critical concepts modeled explicitly? If Business Capability is important to how you think about architecture, is it a first-class element type with explicit relationships to systems and technology? Or is it documented in descriptions and diagrams?

How much is implicit? How much of the architectural meaning lives in diagrams, notes, and architect knowledge rather than in the structured data itself?
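Two of these questions can be roughly quantified before any AI tool is involved. The sketch below assumes the repository has been exported to a list of dictionaries. The field names (`type`, `attributes`, `notes`), the sample records, and the keyword pattern used to spot dependencies hidden in notes are all assumptions to adapt to your own export.

```python
import re
from typing import List

# Assumed phrases that suggest a relationship described in free text.
DEPENDENCY_HINT = re.compile(r"\b(depends on|reads from|feeds)\b", re.IGNORECASE)


def attribute_coverage(elements: list, element_type: str, attribute: str) -> float:
    """Share of elements of a type that have a non-empty value for an attribute."""
    relevant = [e for e in elements if e["type"] == element_type]
    if not relevant:
        return 0.0
    populated = sum(1 for e in relevant if e.get("attributes", {}).get(attribute))
    return populated / len(relevant)


def implicit_dependency_hints(elements: list) -> List[str]:
    """Names of elements whose notes look like they describe a dependency."""
    return [e["name"] for e in elements if DEPENDENCY_HINT.search(e.get("notes", ""))]


elements = [
    {"name": "Billing", "type": "System",
     "attributes": {"Technology Stack": "Java"}, "notes": ""},
    {"name": "Claims", "type": "System",
     "attributes": {}, "notes": "Reads from Legacy DB nightly"},
    {"name": "Portal", "type": "System",
     "attributes": {"Technology Stack": ""}, "notes": "Depends on Billing"},
    {"name": "Legacy DB", "type": "Technology Platform",
     "attributes": {}, "notes": ""},
]

coverage = attribute_coverage(elements, "System", "Technology Stack")
hints = implicit_dependency_hints(elements)
```

In this sample only one of three systems has a Technology Stack, and two elements carry dependencies in notes. Numbers like these give the audit a baseline to improve against, though a keyword scan will miss phrasing it was not told about.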

If you answer “our MDG is pretty clean,” then GraphLink and Connect will give you reliable answers and significant value.

If you answer “our MDG is implicit and inconsistent,” then you have a choice: invest in MDG discipline first, or accept that AI output will require human verification. Many organizations choose to invest in the MDG work because the payoff extends far beyond AI: a clean metamodel improves everything that touches the repository.

The right sequence

This is why the conversation about AI should come after the conversation about MDG.

“Our repository is mature but our metamodel is implicit” is a solvable problem. You don’t need to remodel everything. You focus on the elements and relationships that matter most: the ones people ask about, the ones that appear in governance decisions, and the ones that drive technical choices. You make the metamodel explicit. You ensure the data follows it. You establish a discipline to keep it that way.

Then you plug in GraphLink and AI. The investment in MDG discipline pays off immediately in AI output quality, but it also improves the efficiency of every architect who uses the repository and the confidence of every stakeholder who reads the models.

The AI isn’t what makes the architecture information accessible. Clean, consistent, explicitly modeled data is what makes it accessible. The AI is just the interface that makes the accessibility effortless.

If you’re considering Connect or GraphLink, the first conversation isn’t with your AI vendor. It’s with your architects about whether your MDG is ready. In most cases, a focused effort to clean up the metamodel pays for itself many times over in improved decision-making, improved tool efficiency, and improved AI output quality.

Start there. The AI will be ready when you are.
