Skip to content

Contract Diffing

Prerequisites

Install MCP Fusion before following this guide: npm install @vinkius-core/mcp-fusion @modelcontextprotocol/sdk zod — or scaffold a project with npx fusion create.

Traditional diffing tools compare bytes. A tool's description changed? That's a text diff. A tool gained a subprocess entitlement? Also a text diff — same severity, same visual treatment in the PR review.

But these are not the same change. The description rewording is cosmetic — the LLM won't behave differently. The subprocess entitlement is a capability expansion — the tool can now execute arbitrary commands. ContractDiff understands this difference. Every change is classified by behavioral impact, not by how many bytes moved.

Severity Classification

Every contract change falls into one of four severity levels. The classification is deterministic — it depends on which field changed and the direction of the change, not on heuristics:

SeverityMeaningWhen it fires
BREAKINGThe LLM's behavior will fail or hallucinateEgress schema changed, system rules changed, action removed, readOnly flipped, handler gained subprocess entitlement
RISKYThe LLM's behavior might be affectedCognitive guardrail loosened, middleware chain changed, affordance topology changed, concurrency config changed
SAFEAdditive change, no regression riskNew action added, required field removed, entitlement dropped, inflation risk decreased
COSMETICNo behavioral impactDescription rewording, tag added without removal

This matters because your CI pipeline can make decisions based on severity. BREAKING changes block the merge. RISKY changes require a senior review. SAFE and COSMETIC changes pass automatically.

Using the Diff Engine

typescript
import { diffContracts, formatDiffReport } from '@vinkius-core/mcp-fusion/introspection';

const result = diffContracts(previousContract, currentContract);

console.log(result.maxSeverity);
// "BREAKING" | "RISKY" | "SAFE" | "COSMETIC"

console.log(result.isBackwardsCompatible);
// false — any BREAKING delta makes this false

console.log(formatDiffReport(result));

The formatted report reads like a changelog:

text
[invoices] Contract diff: 3 change(s), max severity: BREAKING

  [BREAKING] systemRulesFingerprint: System rules changed — LLM behavioral calibration invalidated
         static:abc123 → dynamic:def456
  [RISKY] middlewareChain: Middleware chain changed — execution semantics may differ
         auth:mw → auth:mw,rate-limit:mw
  [SAFE] actions.refund: Action "refund" was added
         (added) refund

Deltas are sorted by severity — BREAKING first, COSMETIC last. The ContractDiffResult gives you everything programmatically:

typescript
interface ContractDiffResult {
  readonly toolName: string;
  readonly deltas: readonly ContractDelta[];
  readonly maxSeverity: DeltaSeverity;
  readonly digestChanged: boolean;
  readonly isBackwardsCompatible: boolean;
}

What Gets Compared

The diff engine inspects every section of the ToolContract. Here's what fires at each severity level.

Surface Changes

Removing an action is BREAKING — the LLM was trained to call it and now it's gone. Adding an action is SAFE — no existing behavior regresses. Changing the description is COSMETIC — the LLM might parse it slightly differently but the behavioral contract is intact.

FieldChangeSeverity
inputSchemaDigestSchema changedBREAKING
actions.{key}Action removedBREAKING
actions.{key}Action addedSAFE
descriptionDescription changedCOSMETIC
tagsTag removedSAFE
tagsTag added (no removal)COSMETIC

Action-Level Changes

Flipping readOnly or destructive is BREAKING — the LLM's behavioral calibration for that action is invalid. Adding a new required field is BREAKING — existing invocations will fail validation. Removing a required field is SAFE — existing invocations still work.

FieldChangeSeverity
destructiveFlag changedBREAKING
readOnlyFlag changedBREAKING
requiredFieldsNew required fieldBREAKING
requiredFieldsRequired field removedSAFE
idempotentFlag changedRISKY
inputSchemaDigestAction schema changedRISKY
presenterNamePresenter removedBREAKING
presenterNamePresenter changedRISKY

Behavior Changes

System rules changing is BREAKING — the LLM was calibrated to behave according to those rules. Removing a guardrail is RISKY — the response might flood the context. Adding a guardrail is SAFE — it only constrains.

FieldChangeSeverity
egressSchemaDigestEgress schema changedBREAKING
systemRulesFingerprintSystem rules changedBREAKING
agentLimitMaxLimit removedRISKY
agentLimitMaxLimit added/tightenedSAFE
egressMaxBytesCap removedRISKY
egressMaxBytesCap added/tightenedSAFE
middlewareChainChain changedRISKY
affordanceTopologyNavigation graph changedRISKY

Entitlement Changes

Gaining an I/O capability is always BREAKING. Losing one is always SAFE. There is no ambiguity — if the handler quietly gains a subprocess entitlement through an obscure dependency update, it mathematically guarantees a potential Lateral Movement Attack. Contract Diffing flags this instantly as BREAKING, preventing malicious capabilities from passing CI untouched.

FieldChangeSeverity
filesystemGained trueBREAKING
networkGained trueBREAKING
subprocessGained trueBREAKING
cryptoGained trueBREAKING
AnyLost truefalseSAFE

Token Economics Changes

Inflation risk escalating is BREAKING — higher risk of system rule eviction from the context window. Risk de-escalating is SAFE.

FieldChangeSeverity
inflationRiskEscalated (e.g., low → high)BREAKING
inflationRiskDe-escalated (e.g., high → low)SAFE
unboundedCollectionBecame unboundedRISKY
unboundedCollectionBecame boundedSAFE

Real-World Scenarios

Silent Schema Widening

A tool's input schema gains a new optional parameter. The tool name and description are identical. The diff engine catches the action-level schema digest change:

typescript
const result = diffContracts(before, after);
// [RISKY] actions.upload.inputSchemaDigest: Action "upload" input schema changed
//         sha256:aaa... → sha256:bbb...

System Rules Removed

A Presenter loses its system rules (e.g., "Never expose PII"). This is BREAKING because the LLM was calibrated to behave according to those rules — removing them invalidates the behavioral contract:

typescript
const result = diffContracts(before, after);
// [BREAKING] systemRulesFingerprint: System rules changed
//            static:abc123 → static:e3b0c4...

Capability Expansion

A tool gains subprocess entitlement — the handler now imports child_process. The tool's declared surface hasn't changed, but its blast radius expanded:

typescript
const result = diffContracts(before, after);
// [BREAKING] subprocess: Handler gained "subprocess" entitlement
//            false → true

Formatting for LLM Self-Healing

The diff engine can format deltas as XML for injection into LLM correction prompts. This is how Self-Healing Context works — when the LLM sends wrong arguments because the contract changed, the error response includes the specific changes:

typescript
import { formatDeltasAsXml } from '@vinkius-core/mcp-fusion/introspection';

const xml = formatDeltasAsXml(result.deltas);
xml
<contract_changes>
  <change severity="BREAKING" field="systemRulesFingerprint">
    <description>System rules changed</description>
    <before>static:abc</before>
    <after>dynamic:def</after>
  </change>
  <change severity="SAFE" field="actions.list">
    <description>Action "list" was added</description>
    <after>list</after>
  </change>
</contract_changes>

CI Integration

Combine diffing with the lockfile check for a complete governance gate:

typescript
import { readLockfile, checkLockfile } from '@vinkius-core/mcp-fusion/introspection';
import { diffContracts, formatDiffReport } from '@vinkius-core/mcp-fusion/introspection';

const lockfile = await readLockfile(process.cwd());
const result = checkLockfile(lockfile!, contracts);

if (!result.ok) {
  for (const toolName of result.changed) {
    const before = lockfileToContract(lockfile!, toolName);
    const after = contracts[toolName]!;
    const diff = diffContracts(before, after);
    console.error(formatDiffReport(diff));
  }
  process.exit(1);
}

The lockfile check tells you which tools changed. The diff engine tells you how they changed and what impact that has on the LLM.