Provider-Side Grading Areas

Conformance language (MUST/SHOULD/MAY) follows BCP 14 [RFC2119]/[RFC8174] as defined in 00-overview.md. The binding source is the FlowMCP Schemas Specification v4.2.0.


This chapter is the normative source for the provider-side grading Areas — the Areas that grade one schema inside one namespace without group context. It replaces the linear phase model of earlier spec versions with an Area model: each Area is a self-contained grading rubric attached to the primitive it evaluates, written to a _gradings/ folder next to that primitive (see 19-folder-layout.md).

The provider side produces the base unit of the FlowMCP corpus: one namespace with one or more schemas, namespace skills, and an About Resource. Higher-level grouping (selection side) is defined separately in 05-phases-selection.md.

A schema graded only on the provider side has gradingTier = autonomous. Per 06-determinism-and-tier.md, the maximum attainable grade on this tier is B. Grade A requires a group-bound contribution from the selection side.
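
As a minimal sketch of the tier rule above, the cap can be modelled as a lookup followed by a comparison. The grade scale below B, the tier name `group-bound` as a `gradingTier` value, and the function name `cap_grade` are assumptions made for illustration; the binding definition is in 06-determinism-and-tier.md.

```python
# Hypothetical grade scale, best first. The text only names A and B.
GRADE_ORDER = ["A", "B", "C", "D"]

# Best grade attainable per tier. "autonomous" -> "B" comes from the text;
# "group-bound" as a tier value and its cap are assumptions.
TIER_CAP = {"autonomous": "B", "group-bound": "A"}


def cap_grade(raw_grade: str, grading_tier: str) -> str:
    """Return raw_grade, lowered to the tier's cap if it exceeds the cap."""
    cap = TIER_CAP[grading_tier]
    # A lower index means a better grade, so keep the worse of the two.
    worse = max(GRADE_ORDER.index(raw_grade), GRADE_ORDER.index(cap))
    return GRADE_ORDER[worse]
```

Under these assumptions, `cap_grade("A", "autonomous")` yields `"B"`, while a grade at or below the cap passes through unchanged.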


The grading system defines eleven Areas in total (see 05-phases-selection.md and 19-folder-layout.md). Of these, the following six are provider-side (everything except the selection Areas):

| # | Area | Evaluates | _gradings/ location | Persona | Det/Non-Det |
|---|------|-----------|---------------------|---------|-------------|
| 1 | single-test | one tool | providers/<ns>/<schema>/tools/<tool>/_gradings/ | no | deterministic gate + non-deterministic |
| 2 | tools-aggregate-schema | the tools collection of one schema | providers/<ns>/<schema>/_gradings/ | no | both |
| 3 | tools-aggregate-namespace | tools across the namespace | providers/<ns>/_gradings/ | no | both |
| 4 | namespace-description | namespace metadata | providers/<ns>/_gradings/ | no | non-deterministic |
| 5 | namespace-skills | one namespace skill (per skill) | providers/<ns>/<schema>/skills/<skill>/_gradings/ | yes | non-deterministic |
| 6 | about-namespace | the About Resource (declared in one schema) | providers/<ns>/<schema>/resources/about/_gradings/ | yes | deterministic (route-exists) + non-deterministic |
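
The location column of the table can be read as a set of path templates. The following sketch resolves one of them for a concrete primitive; the dictionary name, the function `gradings_path`, and the example values are hypothetical, while the templates themselves are copied from the table.

```python
# Path templates copied from the table; placeholders follow its <...> notation.
GRADINGS_TEMPLATES = {
    "single-test": "providers/{ns}/{schema}/tools/{tool}/_gradings/",
    "tools-aggregate-schema": "providers/{ns}/{schema}/_gradings/",
    "tools-aggregate-namespace": "providers/{ns}/_gradings/",
    "namespace-description": "providers/{ns}/_gradings/",
    "namespace-skills": "providers/{ns}/{schema}/skills/{skill}/_gradings/",
    "about-namespace": "providers/{ns}/{schema}/resources/about/_gradings/",
}


def gradings_path(area: str, **parts: str) -> str:
    """Fill in the template for one provider-side Area.

    Raises KeyError for an unknown Area or a missing placeholder value.
    """
    return GRADINGS_TEMPLATES[area].format(**parts)
```

For example, `gradings_path("single-test", ns="acme", schema="prices", tool="getPrice")` resolves to `providers/acme/prices/tools/getPrice/_gradings/`, where `acme`, `prices`, and `getPrice` are made-up names.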

The remaining five Areas (about-selection, selection-skills-L1, selection-skills-L2, selection-skills-L3, selection-aggregate) are selection-side and live in 05-phases-selection.md.

Each Area is graded independently. There is no fixed linear order between Areas; the only ordering obligations are the cascade and veto procedures (see Area Procedures) and the deterministic-first rule of 06-determinism-and-tier.md.


Area Procedures

The Area model retains four procedures that previously lived inside the phase model. They are now expressed as rules that apply across the provider-side Areas.

Description Cascade (within single-test and tools-aggregate-*)

The description cascade is a mandatory ordered procedure for validating tool descriptions. It MUST be executed in the following order; skipping or reordering steps is a finding.

  1. Run tests against the endpoint. Each tool SHOULD have at least 3 working tests (status true and non-empty data) that cover the breadth of the parameter space. A tool with fewer than 3 working tests is blocked from full grading, and the block is recorded with a status reason (see 06-determinism-and-tier.md).
  2. Check the responses and validate the tool description against the actual responses.
  3. Normalise / update the tool description to match the validated responses.
  4. All tools, resources, and prompts MUST have descriptions — and each description MUST be individually checked.
  5. Descriptions MUST be neutral — see Description Neutrality.

The cascade is a contract: outputs of step n are inputs of step n+1. A failure in any step halts the cascade for the affected tool and is recorded as a finding. single-test carries the per-tool cascade result; tools-aggregate-schema and tools-aggregate-namespace aggregate the per-tool cascade outcomes.
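
The contract character of the cascade can be sketched as a small runner that feeds each step's output into the next and stops at the first failure. This is an illustration under stated assumptions, not the reference implementation: the names `CascadeResult`, `is_working`, and `run_cascade`, the dictionary shape of a test result, and the use of `ValueError` to signal a failed step are all hypothetical. Treating fewer than 3 working tests as a halt is one reading of "blocked from full grading".

```python
from dataclasses import dataclass, field

MIN_WORKING_TESTS = 3  # threshold named in cascade step 1


@dataclass
class CascadeResult:
    tool: str
    completed_steps: list = field(default_factory=list)
    findings: list = field(default_factory=list)
    halted: bool = False


def is_working(test: dict) -> bool:
    """A working test has status true and non-empty data."""
    return test.get("status") is True and bool(test.get("data"))


def run_cascade(tool: str, tests: list, steps: list) -> CascadeResult:
    """Run (name, callable) steps in order for one tool.

    Each callable receives the previous step's output and raises
    ValueError to signal a failure (an assumption of this sketch).
    """
    result = CascadeResult(tool=tool)
    working = [t for t in tests if is_working(t)]
    if len(working) < MIN_WORKING_TESTS:
        result.halted = True
        result.findings.append(
            f"only {len(working)} working tests; need {MIN_WORKING_TESTS}"
        )
        return result
    result.completed_steps.append("run-tests")
    payload = working
    for name, step in steps:
        try:
            payload = step(payload)  # output of step n feeds step n+1
        except ValueError as err:
            result.halted = True
            result.findings.append(f"{name}: {err}")
            break
        result.completed_steps.append(name)
    return result
```

A per-tool `CascadeResult` of this kind is what single-test would carry; the two aggregate Areas would then summarise the `halted` flags and findings across tools.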

Description Neutrality

The neutrality rule (cascade step 5) is normative and worth restating:

  • A tool description states what the tool does (capabilities, parameters, return shape).
  • A tool description MUST NOT state what it should be used for (application scenarios, persona use cases, “good for X”).
  • Application scenarios and persona use cases belong in the About Resource (11-about-convention.md) — not in the tool description.

This separation is essential for LLM-grader reproducibility: neutral descriptions can be deterministically compared to the observed API behaviour; mixed descriptions cannot.
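
A rough pre-screen for the neutrality rule could look like the sketch below. It is only a heuristic: the phrase list and the function name `neutrality_hints` are hypothetical, the specification defines no such list, and an empty result does not prove that a description is neutral.

```python
import re

# Hypothetical phrases that tend to introduce an application scenario.
USAGE_PATTERNS = [
    r"\bgood for\b",
    r"\bideal for\b",
    r"\bperfect for\b",
    r"\buse (?:this|it) (?:to|when|for)\b",
]


def neutrality_hints(description: str) -> list:
    """Return the matched phrases that suggest non-neutral wording."""
    hits = []
    for pattern in USAGE_PATTERNS:
        match = re.search(pattern, description, flags=re.IGNORECASE)
        if match:
            hits.append(match.group(0))
    return hits
```

A description such as "Returns the current price for a token symbol." produces no hints, whereas "Good for traders who need quick quotes." is flagged for review.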

A failed gate MUST halt the dependent grading for the affected schema. A categorical veto raised in any Area MUST stop further grading for that schema. Examples:

  • api-key-domain-mismatch veto — when the API key declared in the schema metadata does not match the API root domain, the veto is raised and the single-test live tests for the affected tools MUST NOT be treated as pass.
  • HTTP 4xx — when a tool returns HTTP 4xx (including 401/403), the response MUST NOT be treated as “auth-pass” (see 06-determinism-and-tier.md). The description cascade for that tool cannot be completed and is recorded as a finding.
  • Eligibility violation — when an endpoint fails an exclusion criterion under 02-eligibility.md and the schema author insists on including it, the schema is rejected and dependent Areas do not run for it.

Cascade-stop events are recorded in the grading entry. They do not lower the grade silently — a categorical veto replaces the aggregate grade with REJECTED, which the index derivation maps to the terminal status rejected (see 06-determinism-and-tier.md and 19-folder-layout.md).
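
The veto behaviour described above can be sketched in three small functions. All names are hypothetical, and the rule that a pass additionally needs a 2xx status with data is an assumption of this sketch; the text only fixes that 4xx is never a pass, that a veto yields REJECTED, and that REJECTED maps to the status rejected.

```python
from typing import Optional


def is_live_test_pass(http_status: int, has_data: bool) -> bool:
    """A 4xx response, 401/403 included, is never counted as a pass."""
    if 400 <= http_status < 500:
        return False
    # Assumption: a pass needs a 2xx status and non-empty data.
    return 200 <= http_status < 300 and has_data


def final_grade(aggregate_grade: str, vetoes: list) -> str:
    """A categorical veto replaces the aggregate grade; it does not lower it."""
    return "REJECTED" if vetoes else aggregate_grade


def index_status(grade: str) -> Optional[str]:
    """Map REJECTED to the terminal status; other mappings are out of scope."""
    return "rejected" if grade == "REJECTED" else None
```

Keeping the veto as a replacement rather than a penalty means a rejected schema can never be mistaken for a merely low-graded one in the index.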

When the provider-side Areas are complete, the artefact set is the base unit of the corpus:

  • one namespace,
  • one or more schemas under that namespace,
  • one or more namespace skills, and
  • an About Resource declared in one schema.

The provider-side grade is closed at this point. Selection-side grading (see 05-phases-selection.md) operates on aggregations of base units and never re-grades a base unit’s schemas.


The About Resource is graded by the about-namespace Area. It is a Markdown Resource, not a namespace route: it is declared in one schema of the namespace (main.resources) and stored under providers/<ns>/<schema>/resources/about/. The full content contract and the deterministic / non-deterministic split are defined in 11-about-convention.md.

A Resource technically never lives at namespace level — there is no namespace object to attach it to, only schemas. About is therefore inserted into one schema, and the detector searches for it namespace-wide.
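
A namespace-wide search of this kind might be sketched with `pathlib`, assuming the folder layout given above. The function name `find_about_resources` and the choice to return the owning schema directories are illustrative assumptions; the actual detector is specified in 11-about-convention.md.

```python
from pathlib import Path


def find_about_resources(providers_root: Path, ns: str) -> list:
    """Return the schema directories in a namespace that contain resources/about/."""
    ns_dir = providers_root / ns
    # Each match has the form <ns>/<schema>/resources/about, so the schema
    # directory is two levels above the match.
    return sorted(
        about.parent.parent
        for about in ns_dir.glob("*/resources/about")
        if about.is_dir()
    )
```

An empty result would correspond to a failed route-exists check for the namespace, since no schema declares the About Resource.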


The provider-side Areas produce gradingTier = autonomous. Per 06-determinism-and-tier.md, the maximum attainable grade on autonomous is B. A schema that should be eligible for grade A must additionally be graded on the selection side (group-bound, see 05-phases-selection.md).