
Determinism and Tier

Conformance language (MUST/SHOULD/MAY) follows BCP 14 [RFC2119]/[RFC8174] as defined in 00-overview.md. The binding source is the FlowMCP Schemas Specification v4.2.0.


The Grading-Spec separates reproducibility (Determinism) from attainability (Tier). The two axes are orthogonal: a dimension can be deterministic but group-bound, or non-deterministic but autonomous. Both axes are carried independently in the grading entry as the fields determinism and gradingTier (see 08-grading-model.md).

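| Axis        | Values                            | Effect                           |
|-------------|-----------------------------------|----------------------------------|
| Determinism | deterministic / non-deterministic | Reproducibility                  |
| Tier        | autonomous / group-bound          | Maximum attainable grade (Ch. 7) |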

A dimension is deterministic when the score is reproducible given:

  • identical inputs, and
  • identical scoringSystem/X.Y.Z version.

Examples: schema structure (v4.2 field-shape check), HTTP status, route-name match, imports scan, API-key-domain match, lint.
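The two reproducibility conditions can be expressed as a cache key that changes exactly when the inputs or the Scoring-System version change. This is a minimal sketch that assumes the test inputs are JSON-serializable. The name reproducibility_key, the input fields, and the version strings are hypothetical and not part of the spec.

```python
import hashlib
import json


def reproducibility_key(inputs: dict, scoring_system_version: str) -> str:
    """Hypothetical helper: key changes when inputs or scoring version change."""
    # Canonical JSON: sorted keys and fixed separators, so dict order is irrelevant.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    # The scoring-system version is part of the key, so a version bump yields a new key.
    payload = f"scoringSystem/{scoring_system_version}\n{canonical}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same inputs (in any order) under the same version give the same key.
key_a = reproducibility_key({"status": 200, "route": "getPrice"}, "1.0.0")
key_b = reproducibility_key({"route": "getPrice", "status": 200}, "1.0.0")
# The same inputs under a bumped version give a different key.
key_c = reproducibility_key({"status": 200, "route": "getPrice"}, "1.1.0")
```

For a deterministic dimension, two runs that share a key are expected to produce the same score, so the key can index cached results. A non-deterministic dimension cannot be keyed this way, because its output also depends on the model, the persona, and the group context.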

A dimension is non-deterministic when the output depends on:

  • the LLM model used,
  • the persona under which the evaluation runs, or
  • the group context (selection composition).

For non-deterministic dimensions, the grading entry MUST record both llmModel and selectionContext (see 08-grading-model.md).
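A conformance check for this rule might look like the following sketch. It assumes a grading entry is a plain mapping whose keys match the field names used in this section (determinism, llmModel, selectionContext). It checks only that the fields are present, because their value shapes are defined in 08-grading-model.md. The function name is hypothetical.

```python
from typing import Any

REQUIRED_FOR_NON_DETERMINISTIC = ("llmModel", "selectionContext")


def missing_context_fields(entry: dict[str, Any]) -> list[str]:
    """Hypothetical check: list context fields a grading entry fails to record."""
    determinism = entry.get("determinism")
    if determinism not in ("deterministic", "non-deterministic"):
        raise ValueError(f"unknown determinism value: {determinism!r}")
    if determinism == "deterministic":
        return []  # no model or selection context is required
    # Presence only; value shapes are defined in 08-grading-model.md.
    return [f for f in REQUIRED_FOR_NON_DETERMINISTIC if f not in entry]
```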

Some dimensions have both deterministic and non-deterministic sub-parts. The canonical example is About Resource compliance: the route-exists check (is an About Resource declared and present?) is deterministic, while the content judgement (is the About content meaningful?) is non-deterministic.

For mixed forms, implementers MAY:

  • split the dimension into two sub-dimensions (one deterministic, one non-deterministic), or
  • collapse it into a single dimension with determinism = non-deterministic (the strictly reproducible sub-part still runs, but the aggregate carries the weaker reproducibility claim).
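The second option amounts to taking the weaker of the sub-part claims. A minimal sketch, with a hypothetical function name:

```python
VALUES = ("deterministic", "non-deterministic")


def collapse_determinism(sub_parts: list[str]) -> str:
    """Hypothetical: collapse the determinism values of a dimension's sub-parts."""
    if not sub_parts:
        raise ValueError("a dimension needs at least one sub-part")
    for value in sub_parts:
        if value not in VALUES:
            raise ValueError(f"unknown determinism value: {value!r}")
    # The aggregate carries the weaker claim: one non-deterministic part is enough.
    if "non-deterministic" in sub_parts:
        return "non-deterministic"
    return "deterministic"


# About Resource compliance: route-exists (deterministic) + content (non-deterministic)
about_resource = collapse_determinism(["deterministic", "non-deterministic"])
```

The deterministic route-exists check still runs and its result is kept; only the reproducibility label of the collapsed dimension is weakened.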

An autonomous dimension is graded by an autonomous grader on the provider side (04-phases-single.md), without group context. The maximum attainable grade for an aggregate composed exclusively of autonomous dimensions is B.

A group-bound dimension is graded by a group- or persona-bound grader on the selection side (05-phases-selection.md). Grade A is reachable only when the aggregate contains at least one group-bound contribution.

The grading model (08-grading-model.md) exposes a maxAttainableGrade field. It shows a consumer whether a higher grade is still reachable: for a schema graded only on the provider side, adding the schema’s namespace to a selection and running the selection-side Areas can raise the cap from B to A.
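Assuming an aggregate counts as group-bound as soon as one contribution is group-bound (which is how the two tier rules above read), the cap reported by maxAttainableGrade can be sketched as follows. Function and constant names are hypothetical.

```python
MAX_GRADE_BY_TIER = {"autonomous": "B", "group-bound": "A"}


def aggregate_tier(contribution_tiers: list[str]) -> str:
    """Hypothetical: an aggregate is group-bound once any contribution is."""
    if not contribution_tiers:
        raise ValueError("an aggregate needs at least one contribution")
    unknown = set(contribution_tiers) - set(MAX_GRADE_BY_TIER)
    if unknown:
        raise ValueError(f"unknown tier(s): {sorted(unknown)}")
    return "group-bound" if "group-bound" in contribution_tiers else "autonomous"


def max_attainable_grade(contribution_tiers: list[str]) -> str:
    # Fixed mapping from the aggregate's tier to its grade cap.
    return MAX_GRADE_BY_TIER[aggregate_tier(contribution_tiers)]


provider_side_only = max_attainable_grade(["autonomous", "autonomous"])  # "B"
with_selection = max_attainable_grade(["autonomous", "group-bound"])     # "A"
```

A consumer who sees B for a provider-only schema can read it as a hint that a selection-side run may lift the cap to A.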


The following table is the non-exhaustive but canonical mapping of grading dimensions to the two axes. Each row carries the dimension name, its determinism value, its tier, and the Area that writes it.

| Dimension                     | Determinism                                                | Tier        | Source (Area)                          |
|-------------------------------|------------------------------------------------------------|-------------|----------------------------------------|
| Schema structure (v4.2)       | deterministic                                              | autonomous  | tools-aggregate-schema                 |
| HTTP status (200 = pass)      | deterministic                                              | autonomous  | single-test                            |
| Tool description neutrality   | deterministic (heuristic)                                  | autonomous  | single-test / tools-aggregate-schema   |
| whenToUse clarity             | non-deterministic                                          | autonomous  | single-test / tools-aggregate-schema   |
| parameters understandability  | non-deterministic                                          | autonomous  | single-test                            |
| About Resource compliance     | deterministic (route-exists) + non-deterministic (content) | autonomous  | about-namespace                        |
| namespaceSkillValidity        | deterministic + non-deterministic                          | autonomous  | namespace-skills                       |
| domainConformance             | deterministic (against the About / Domain-Knowledge document) | group-bound | selection-aggregate                 |
| selectionSkillL1 / L2 / L3    | non-deterministic                                          | group-bound | selection-skills-L1 / -L2 / -L3        |
| personaUseCaseFit             | non-deterministic                                          | group-bound | selection-aggregate                    |
| External-module audit         | deterministic (imports) + non-deterministic (purpose)      | autonomous  | Security (Ch. 9)                       |
| API-key-domain match          | deterministic                                              | autonomous  | Security (Ch. 9)                       |

A dimension that does not appear in this matrix MUST be added (and its axes declared) before it can be used in a grading entry.
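The add-before-use rule can be enforced with a registry lookup that refuses undeclared dimensions. The sketch holds only an excerpt of the table above. The camelCase keys for rows that have only a display name (httpStatus, whenToUseClarity, apiKeyDomainMatch) are hypothetical, as are the class and function names.

```python
from typing import NamedTuple


class Axes(NamedTuple):
    determinism: str
    tier: str
    area: str


# Excerpt of the matrix; keys for display-name-only rows are hypothetical.
MATRIX: dict[str, Axes] = {
    "httpStatus": Axes("deterministic", "autonomous", "single-test"),
    "whenToUseClarity": Axes("non-deterministic", "autonomous", "single-test / tools-aggregate-schema"),
    "domainConformance": Axes("deterministic", "group-bound", "selection-aggregate"),
    "personaUseCaseFit": Axes("non-deterministic", "group-bound", "selection-aggregate"),
    "apiKeyDomainMatch": Axes("deterministic", "autonomous", "Security (Ch. 9)"),
}


class UnknownDimensionError(LookupError):
    """Raised for a dimension whose axes have not been declared in the matrix."""


def axes_for(dimension: str) -> Axes:
    try:
        return MATRIX[dimension]
    except KeyError:
        raise UnknownDimensionError(
            f"{dimension!r} is not in the matrix; declare its axes before grading"
        ) from None
```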


The following four rules are binding for every grader, scorer, and aggregator that conforms to this spec.

  1. HTTP 4xx MUST NOT be treated as “auth-pass”. HTTP 4xx — including 401 and 403 — MUST NOT be scored as pass. 200 is pass; everything else is fail or defect. (See 04-phases-single.md.)
  2. A schema MUST run all applicable deterministic tests. Selective skipping is forbidden. If a deterministic test is applicable to a schema, the grader MUST execute it; the result MAY be n/a only when the test is provably non-applicable (e.g. a jq-pipe check on a schema without output).
  3. aggregateGrade ≥ B SHOULD contain at least one LLM-based (non-deterministic) evaluation. A schema graded exclusively on deterministic dimensions can reach grade B, but the Grading-Spec recommends that at least one LLM verification be present at grade B and above.
  4. aggregateGrade ≥ A MUST contain at least one group-bound evaluation. Grade A is not autonomously reachable. A schema graded only on the provider side (tier = autonomous throughout) cannot be assigned grade A.
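The four rules can be sketched as checks a grader or aggregator runs before emitting a grade. Rule 1 reduces to a status comparison; rules 2 and 4 produce errors and rule 3 a warning, matching their MUST/SHOULD strength. This is a sketch under stated assumptions: evaluations are reduced to their two axes, and because this section names only grades A and B, “≥ B” is modeled as “A or B” without assuming the rest of the scale. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    dimension: str
    determinism: str  # "deterministic" or "non-deterministic"
    tier: str         # "autonomous" or "group-bound"


def http_is_pass(status: int) -> bool:
    # Rule 1: only 200 is pass; 401, 403 and every other 4xx are not.
    return status == 200


def check_hard_rules(
    grade: str,
    evaluations: list[Evaluation],
    applicable_deterministic: set[str],
    executed: set[str],
) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for rules 2-4; MUST violations are errors."""
    errors: list[str] = []
    warnings: list[str] = []
    # Rule 2: every applicable deterministic test must have run.
    skipped = sorted(applicable_deterministic - executed)
    if skipped:
        errors.append(f"skipped applicable deterministic tests: {skipped}")
    # Rule 3 (SHOULD): grade >= B should include an LLM-based evaluation.
    if grade in ("A", "B") and not any(e.determinism == "non-deterministic" for e in evaluations):
        warnings.append("grade >= B without a non-deterministic evaluation")
    # Rule 4 (MUST): grade A needs at least one group-bound evaluation.
    if grade == "A" and not any(e.tier == "group-bound" for e in evaluations):
        errors.append("grade A without a group-bound evaluation")
    return errors, warnings
```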

The categorical Veto (see 09-security-and-development.md) can be raised on either tier. Veto-driven gates halt dependent Areas regardless of tier — see the cascade-stop rule in 04-phases-single.md and the analogous rule in 05-phases-selection.md.

A Veto is an outcome of its own: it does not reduce a numerical score but replaces the aggregate grade with REJECTED. The index derivation maps REJECTED to the terminal node status rejected (see 19-folder-layout.md).


Interaction with Scoring- / Grading-System Version

Determinism applies at a fixed Scoring-System version. A bump of the scoringSystem/X.Y.Z namespace can change how a deterministic test is scored — the test remains deterministic at the new version, but old scores cannot be compared one-to-one to new scores.

When scoringSystem is bumped, schemas MUST be re-scored. Cached scores from older versions MUST NOT be silently aggregated with new scores. The version contract is described in detail in 07-scoring-vs-grading.md.
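A minimal guard against silently mixing versions, assuming each cached score records the scoringSystem version it was computed under. The Score type, the exception, and the function name are hypothetical; the actual version contract is in 07-scoring-vs-grading.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    dimension: str
    value: float
    scoring_system: str  # X.Y.Z of the scoringSystem the score was computed under


class RescoreRequired(ValueError):
    """Hypothetical error: cached scores predate the current scoringSystem."""


def scores_for_aggregation(scores: list[Score], current_version: str) -> list[Score]:
    stale = sorted(s.dimension for s in scores if s.scoring_system != current_version)
    if stale:
        # Fail loudly instead of silently mixing versions; the caller must re-score.
        raise RescoreRequired(f"re-score under scoringSystem/{current_version}: {stale}")
    return list(scores)
```

Raising instead of filtering out stale scores is deliberate: dropping them would let the aggregate be computed over fewer dimensions without anyone noticing.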

The same applies to gradingSystem/X.Y.Z bumps: thresholds, weights, tier trims, and the Veto list MAY change; the mapping from scores to grades is therefore version-bound.


maxAttainableGrade is a fixed mapping from gradingTier (see Consumer Visibility). An autonomous grading can reach grade B at most; a group-bound grading can reach grade A. Tier trim is the deterministic final stage of the aggregate computation (see 08-grading-model.md).
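Since the cap is either A or B, the tier trim only ever has to lower an A to B. A sketch with hypothetical names; REJECTED and any grade at or below the cap pass through unchanged.

```python
MAX_GRADE_BY_TIER = {"autonomous": "B", "group-bound": "A"}


def tier_trim(grade: str, grading_tier: str) -> str:
    """Hypothetical final stage: clamp a computed grade to the tier's cap."""
    cap = MAX_GRADE_BY_TIER[grading_tier]  # KeyError for an unknown tier
    # The only grade above a cap of B is A; everything else is left as-is.
    if cap == "B" and grade == "A":
        return "B"
    return grade
```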

Partial vs. Full Grading and the stable Status

A grading with gradingMode: "partial" updates only the explicitly checked Areas / dimensions in the grading set. The aggregateGrade remains at the value computed by the most recent mode: "full" operation. Promotion to the node status stable is possible only through a mode: "full" grading.

Rationale: partial gradings serve iteration steps (re-testing a single dimension on purpose). If they changed the aggregate, a single improvement step could distort the overall evaluation without the remaining dimensions having been re-checked.

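| Mode    | Allowed grading subset            | Effect on aggregateGrade                 | Effect on node status        |
|---------|-----------------------------------|------------------------------------------|------------------------------|
| full    | All applicable Areas / dimensions | Recomputed                               | May switch to stable         |
| partial | A subset                          | Unchanged (stays at the last full value) | Stays at the last full status |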

The rule that aggregateGrade remains unchanged is binding: a partial grading MUST NOT overwrite the previous aggregate. A sequence of partial gradings without a concluding full grading never reaches the status stable.
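The update rule for the two modes can be sketched as a pure function. It assumes numeric per-dimension scores and takes the aggregation as a parameter, because the aggregation itself is defined in 08-grading-model.md; apply_grading and compute_aggregate are hypothetical names.

```python
from typing import Callable, Optional


def apply_grading(
    dimensions: dict[str, float],
    aggregate_grade: Optional[str],
    mode: str,
    results: dict[str, float],
    compute_aggregate: Callable[[dict[str, float]], str],
) -> tuple[dict[str, float], Optional[str]]:
    """Return the updated (dimension scores, aggregateGrade) after one grading."""
    if mode == "partial":
        # Only the re-checked dimensions change; the aggregate keeps its last full value.
        return {**dimensions, **results}, aggregate_grade
    if mode == "full":
        # A full grading covers all applicable dimensions, so the aggregate is recomputed.
        return dict(results), compute_aggregate(results)
    raise ValueError(f"unknown grading mode: {mode!r}")
```

Keeping partial results in the dimension map while leaving the aggregate untouched lets an iteration step be inspected without it counting toward the overall evaluation.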

The node status of a graded primitive is one of five values, derived by the index rollup (see 19-folder-layout.md):

| Status   | Meaning |
|----------|---------|
| pending  | Not yet graded. |
| blocked  | Cannot be graded right now, with a reason (fewer than 3 working tests, no About Resource, API unreachable); repairable. |
| graded   | A grade exists. |
| stable   | Fully graded via a mode: "full" operation and above threshold; ready for use. Only this status passes the selection pre-condition. |
| rejected | Veto raised; terminal and irreversible. |
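The index rollup in 19-folder-layout.md is authoritative; the sketch below only illustrates how the five values, the REJECTED mapping, and the full-only path to stable fit together. The input fields and, in particular, the precedence order (rejected, then blocked, then stable, then graded, then pending) are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeFacts:
    aggregate_grade: Optional[str] = None  # "REJECTED" after a Veto
    blocked_reason: Optional[str] = None   # e.g. "API unreachable"
    has_grade: bool = False                # any grade exists, partial or full
    last_full_above_threshold: bool = False  # set only by a mode "full" grading


def derive_status(facts: NodeFacts) -> str:
    # Precedence is an assumption of this sketch, not taken from the spec.
    if facts.aggregate_grade == "REJECTED":
        return "rejected"  # Veto: terminal and irreversible
    if facts.blocked_reason:
        return "blocked"   # repairable; the reason says what to fix
    if facts.last_full_above_threshold:
        return "stable"    # only reachable through a full grading
    if facts.has_grade:
        return "graded"    # e.g. partial gradings only
    return "pending"
```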

The partial/full distinction (see Partial vs. Full Grading and the stable Status) interacts directly with this status set: partial keeps the node at its last full status, only full can move a node to stable.

Cross-Refs: