amduat-api/tier1/ms.md
2025-12-22 21:03:00 +01:00


AMDUAT-MS/1 — Mapping Surface Specification

Status: Draft
Owner: Architecture
Version: 0.2.2
SoT: Yes
Last Updated: 2025-11-30
Linked Phase Pack: PH07
Tags: [composition, execution, deterministic]

identity_authority: amduat.programme
lineage_id: L-PENDING-MS1
doc_code: AMDUAT-MS/1
code_status: tentative
doc_code_aliases: []

location: /amduat/tier1/ms.md
surface: developer
internal_revision: 3
provenance_mode: full


Overview

AMDUAT-MS/1 standardises the executable mapping surface that turns a Concept plus a Context Frame into deterministic Data bytes. The surface orchestrates FPS/1 primitives through FCS/1 recipes and records runs through FER/1; certification, provenance, and policy decisions remain with FCT/1 and phase evidence packs. MS/1 governs observable mapping behaviour and context binding rules without authorising new governance or relation taxonomies.


Core Model

MS/1 retains Amduat's two primitive node kinds and treats all other structure as concept-typed relations.

  • Concept node (C): Abstract identifier that can describe, type, or govern any graph element. A concept can own multiple materialisations without losing identity.
  • Data node (D): Immutable byte sequence addressed by CID (SHA-256). All executions produce Data nodes.
  • Relation instances: Every edge is annotated with a relation_concept that identifies its semantics. Implementations MUST NOT embed a fixed relation enumeration; relation concepts are first-class concepts.

Non-normative relation concept examples include represents (C→D), materializesAs (C→D), requiresKey (C→C), withinDomain (C→C), computedBy (D→C), and hasProvenance (D→D). Ecosystems MAY register additional concepts and MUST treat each registered concept as a normal graph node.
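The following non-normative sketch shows the two node kinds and one concept-typed edge. It assumes a CID is the hex SHA-256 digest of the raw bytes (the real CID encoding may differ); `cid_of`, `Edge`, and the `concept:` identifiers are hypothetical names used only for illustration.

```python
import hashlib
from dataclasses import dataclass


def cid_of(data: bytes) -> str:
    """Assumed CID form: hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Edge:
    source: str            # concept ID or Data CID
    relation_concept: str  # concept ID naming the edge semantics
    target: str            # concept ID or Data CID


# The relation is itself a concept ID, not a member of a fixed enum,
# so registering a new relation needs no change to this code.
payload = b"A"
edge = Edge(source="concept:LetterA",
            relation_concept="concept:represents",
            target=cid_of(payload))
```

Because `relation_concept` is an ordinary identifier, the sketch never needs a closed list of relation types.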


Context Frames

A Context Frame (CF) is a deterministic multimap of { key_concept → value } that constrains execution.

  • Keys are concept identifiers; human-readable string labels are optional.
  • Values are canonical scalars (integers, enums keyed by concept ID) or CIDs to Data nodes for larger payloads.
  • Context CID: CID_context = sha256(canonical_encode(CF)). The canonical encoding MUST order keys by concept ID and normalise values so identical frames generate identical CIDs.
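As a non-normative illustration of the Context CID rule, the sketch below uses compact JSON with sorted keys as a stand-in for `canonical_encode`. The actual canonical encoding is not defined in this section, so the JSON form, the de-duplication step, and the `concept:` identifiers are assumptions.

```python
import hashlib
import json


def canonical_encode(frame: dict[str, list]) -> bytes:
    """Assumed encoding: compact JSON, keys ordered by concept ID,
    each key's values de-duplicated and sorted.
    Values are assumed mutually comparable (all strings here)."""
    normalised = {key: sorted(set(values)) for key, values in frame.items()}
    return json.dumps(normalised, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def context_cid(frame: dict[str, list]) -> str:
    """CID_context = sha256(canonical_encode(CF)), as hex."""
    return hashlib.sha256(canonical_encode(frame)).hexdigest()


# Same bindings, different insertion order and a duplicated value.
a = {"concept:withinDomain": ["concept:Unicode15"],
     "concept:TextEncoding": ["concept:UTF8"]}
b = {"concept:TextEncoding": ["concept:UTF8", "concept:UTF8"],
     "concept:withinDomain": ["concept:Unicode15"]}
same = context_cid(a) == context_cid(b)
```

Here `same` is true because ordering and duplicates are normalised away before hashing.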

Scoping and Refinement

  • Frames form a tree during pipeline execution. Each mapping step receives one frame.
  • Child frames inherit bindings from the parent. A child MAY add new bindings or narrow an existing binding but MUST NOT contradict established bindings.
  • Branch communication is explicit. Publishing a refinement to siblings requires constructing a new published frame and referencing it; no sideways state is implicit.
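The refinement rule can be sketched as follows. This is a minimal, non-normative model that assumes a frame maps each key to a set of admissible values, so that narrowing means choosing a subset; `refine` and the identifiers are hypothetical.

```python
Frame = dict[str, frozenset]


def refine(parent: Frame, changes: Frame) -> Frame:
    """Return a child frame; the parent is never mutated."""
    child = dict(parent)
    for key, values in changes.items():
        if not values:
            raise ValueError(f"empty binding for {key}")
        if key in parent and not values <= parent[key]:
            # Widening or replacing an inherited binding contradicts it.
            raise ValueError(f"contradicts inherited binding for {key}")
        child[key] = values
    return child


parent = {"concept:TextEncoding":
          frozenset({"concept:UTF8", "concept:UTF16"})}
# Narrowing an existing binding is allowed.
narrowed = refine(parent, {"concept:TextEncoding":
                           frozenset({"concept:UTF8"})})
# Adding a new binding is allowed.
extended = refine(narrowed, {"concept:withinDomain":
                             frozenset({"concept:Unicode15"})})
```

Attempting `refine(narrowed, {"concept:TextEncoding": frozenset({"concept:UTF16"})})` raises `ValueError`, since the child would contradict the binding it inherited.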

Gaps and Ambiguity

  • Missing required bindings yield a Gap Artifact (Data) that records the missing keys and the decision point. No payload bytes are emitted.
  • When multiple admissible bindings remain, the step returns an Ambiguity Artifact (Data) enumerating admissible alternatives and the rule needed to disambiguate. Resolution demands a refined frame.

Gap and Ambiguity artifacts are ordinary Data nodes with CIDs and MAY be audited like any other output.
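A non-normative sketch of how a step might decide between running, emitting a Gap, or emitting an Ambiguity. The artifact layout (compact JSON with `kind`, `decision_point`, and detail fields) and the helper names are assumptions; only the behaviour described above is taken from this specification.

```python
import hashlib
import json


def artifact(kind: str, decision_point: str, detail: dict) -> tuple[str, bytes]:
    """Serialise an artifact deterministically; return (CID, bytes)."""
    body = json.dumps({"kind": kind, "decision_point": decision_point, **detail},
                      sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(body).hexdigest(), body


def check(frame: dict[str, list], required: list[str], decision_point: str):
    """Return a Gap or Ambiguity artifact, or None when the step may run."""
    missing = sorted(k for k in required if not frame.get(k))
    if missing:
        # No payload bytes are emitted; only the artifact is returned.
        return artifact("gap", decision_point, {"missing_keys": missing})
    open_keys = {k: sorted(frame[k]) for k in sorted(required)
                 if len(frame[k]) > 1}
    if open_keys:
        return artifact("ambiguity", decision_point,
                        {"alternatives": open_keys})
    return None
```

Because the artifact bytes are built from sorted keys and sorted alternatives, the same missing or open bindings always hash to the same CID.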


Mapping Surface Semantics

The surface defines the total function:

MS_map : (Concept C, ContextFrame CF) -> Result
Result = Produced(Data bytes) | Gap(Data) | Ambiguity(Data)
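One way to model the Result sum type in Python is shown below. This is a non-normative sketch; the class and function names are illustrative and not part of MS/1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Produced:
    data: bytes  # payload bytes of the produced Data node


@dataclass(frozen=True)
class Gap:
    data: bytes  # serialised Gap artifact


@dataclass(frozen=True)
class Ambiguity:
    data: bytes  # serialised Ambiguity artifact


Result = Union[Produced, Gap, Ambiguity]


def describe(result: Result) -> str:
    """Every Result carries Data bytes; only the tag differs."""
    if isinstance(result, Produced):
        return f"produced {len(result.data)} byte(s)"
    if isinstance(result, Gap):
        return "gap: refine the frame with the missing keys"
    return "ambiguity: refine the frame to select one alternative"
```

Since the function is total, callers handle all three variants rather than relying on exceptions for the Gap and Ambiguity cases.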

Executable Form

  • Every mapping MUST be realised as an FCS/1 recipe parameterised by PCB1 blocks.
  • Recipes MAY only invoke FPS/1 primitives (put, get, slice, concatenate, reverse, splice) and compositions thereof.

Determinism and Replayability

For fixed (C, CF) inputs and fixed referenced Data CIDs, implementations MUST produce identical bytes.

Each run MUST emit a FER/1 run record (Data) that declares:

  • the recipe concept ID (FCS/1),
  • the PCB1 parameter block (as Data CID),
  • all input CIDs (including CID_context),
  • the output CID, and
  • relevant environment hashes (e.g., FPS/1 library surfaces) that affect determinism.

Replaying the FCS/1 recipe with the same inputs MUST yield the same CID.
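The replay requirement can be illustrated with a toy run. The record layout below is hypothetical (the FER/1 schema is defined elsewhere), and `toy_recipe` merely stands in for an FCS/1 recipe; the sketch only shows that identical inputs yield an identical output CID and record.

```python
import hashlib
import json


def cid_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_record(recipe_id: str, pcb1_cid: str, input_cids: list[str],
               output_cid: str, env_hashes: dict[str, str]) -> bytes:
    """Hypothetical record layout carrying the five declared fields."""
    record = {
        "recipe": recipe_id,
        "pcb1": pcb1_cid,
        "inputs": list(input_cids),   # includes CID_context
        "output": output_cid,
        "environment": dict(sorted(env_hashes.items())),
    }
    return json.dumps(record, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def toy_recipe(inputs: list[bytes]) -> bytes:
    """Stand-in for an FCS/1 recipe: concatenates its inputs."""
    return b"".join(inputs)


def execute(inputs: list[bytes], context_cid: str) -> tuple[str, bytes]:
    output = toy_recipe(inputs)
    record = run_record("concept:toy-recipe", cid_of(b"pcb1"),
                        [cid_of(i) for i in inputs] + [context_cid],
                        cid_of(output), {"fps": cid_of(b"fps-lib")})
    return cid_of(output), record


first = execute([b"ab", b"cd"], cid_of(b"frame"))
replay = execute([b"ab", b"cd"], cid_of(b"frame"))
```

`first` and `replay` are equal, which is the property the replay test in the acceptance checks is meant to establish.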

Fidelity Predicates

Each mapping declares (via relation concepts) an equivalence predicate that states when the produced bytes faithfully represent the target concept. Fidelity predicates are themselves concepts and MAY be certified through FCT/1.

Domain Keys

Disambiguation relies on Domain concepts expressed as context keys (e.g., withinDomain). Recipes MAY require domain bindings such as classical vs. quantum state spaces to guarantee deterministic interpretation.

Function Interface Patterns

Implementations SHOULD expose a narrow callable surface so concept inputs and determinism guarantees remain auditable.

  1. Primary signature:
    • Result ms_map(concept_id, context_frame) is the canonical entry point.
    • concept_id MUST be a stable concept node reference (e.g., CID or registry handle). Implementations MUST NOT infer the target concept from global mutable state.
    • context_frame MUST be the complete set of bindings used for the run. Any binding derived from other inputs MUST be reflected back into the frame before execution.
  2. Auxiliary parameters:
    • Additional parameters are only permitted when they can be canonicalised to deterministic Data CIDs and recorded in the FER/1 run record.
    • Preferred pattern: attachments: Sequence[DataCID] = (). Each attachment MUST already exist as a Data node, and the mapping MUST reference the CIDs explicitly in the FER/1 log.
    • Alternative pattern: concept_inputs: Sequence[ConceptID] = () for secondary concept handles. Each element MUST also appear in the effective context frame (e.g., under a relation-specific key). Passing the concept ID without updating the frame is a violation.
  3. Rejected pattern: Opaque keyword arguments or process-level environment variables MUST NOT influence execution. Implementations discovering such usage MUST emit an Ambiguity artifact describing the missing context binding instead.

The following pseudocode illustrates a compliant wrapper:

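def ms_map(concept_id: ConceptID,
           context_frame: ContextFrame,
           *,
           attachments: Sequence[DataCID] = (),
           concept_inputs: Sequence[ConceptID] = ()) -> Result:
    # Reflect every auxiliary input into the frame before execution so the
    # effective frame (and therefore CID_context) covers all inputs.
    frame = context_frame.with_bindings({
        relation_for(ci): ci for ci in concept_inputs
    })
    frame = frame.with_bindings({
        relation_for(cid): cid for cid in attachments
    })
    # Run the FCS/1 recipe; attachments are passed along so the FER/1
    # record can cite their CIDs explicitly.
    record = run_fcs_recipe(concept_id, frame, attachments)
    # Collapse the run record into Produced / Gap / Ambiguity.
    return normalise_result(record)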

relation_for denotes a deterministic lookup from an auxiliary input (a concept ID or Data CID) to the relation-specific key under which its binding is stored. Implementations MAY inline more efficient mechanisms, but the observable effect MUST match first updating the frame and then invoking the recipe.

Media (MIME) Types vs. MS Context

MIME/media types label already-produced Data so downstream systems can interpret byte strings (e.g., text/plain; charset=utf-8, image/png). An MS context frame captures the pre-execution bindings that shape which bytes will be emitted. They relate but are not interchangeable:

  • Where they live: Context bindings exist before execution and are hashed into CID_context. MIME labels are attached after Data exists (e.g., CIL payload metadata, HTTP headers, or relation edges such as ms.produces_media_type).
  • What they encode: Context keys describe decision levers (encoding, domain, fidelity policy) that an FCS recipe uses to deterministically produce bytes. MIME types describe how consumers should parse already-materialised bytes. A context MAY carry a desired media-type concept ("emit this as application/pdf") but it is still a binding that constrains execution, not a replacement for Content-Type headers.
  • Interoperability: When MS outputs feed MIME-governed ecosystems, recipes SHOULD register a concept such as ms.media_type and declare it via ms.requires_key if the choice affects determinism (e.g., choosing between text/csv vs. application/json). FER/1 receipts then bind the produced Data both to the governing context frame and to a media-type relation so downstream MIME routers, storage overlays, and catalogues reconcile the two perspectives.

MIME types are one example of interpretation contracts that MS contexts can align with. The same pattern applies to CAD kernels with STEP schemas, audio encoders with bitstream levels, or ML artefacts with model.format descriptors. MS/1 keeps them deterministic by requiring the selection to be a context binding (recorded before execution) while also allowing publication surfaces to mirror the choice via their native metadata channels.
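The non-normative sketch below treats the media type as a context binding that selects which bytes are emitted, and mirrors the same value as a label on the result. Using the plain string `ms.media_type` as the frame key, the dictionary result shape, and the assumption of a non-empty row list are simplifications for illustration.

```python
import csv
import io
import json


def render(rows: list[dict], frame: dict[str, str]) -> dict:
    """Emit bytes whose format is fixed by the frame, not by the caller."""
    media_type = frame.get("ms.media_type")
    if media_type is None:
        # The choice affects the bytes, so it must be bound up front.
        return {"kind": "gap", "missing_keys": ["ms.media_type"]}
    if media_type == "application/json":
        text = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    elif media_type == "text/csv":
        buffer = io.StringIO()
        # Assumes rows is non-empty and every row has the same keys.
        writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        text = buffer.getvalue()
    else:
        raise ValueError(f"media type outside this sketch: {media_type}")
    # The label mirrors the binding for downstream MIME-aware consumers.
    return {"kind": "produced", "data": text.encode("utf-8"),
            "content_type": media_type}


rows = [{"b": 2, "a": 1}]
as_json = render(rows, {"ms.media_type": "application/json"})
as_csv = render(rows, {"ms.media_type": "text/csv"})
unbound = render(rows, {})
```

The same rows yield different bytes under the two frames, which is why the selection has to be recorded before execution rather than inferred from a header afterwards.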


Pipelines and Composition

A Pipeline is a concept that composes ordered mapping steps.

  • Each step executes MS_map(C_i, CF_i).
  • Pipelines return either Produced(Data) or an Artifact (Gap/Ambiguity).
  • Given identical inputs, pipelines MUST be byte-stable.
  • Branch isolation: each branch operates on a forked frame. Publishing updates requires emitting a new frame with a fresh CID_context, preserving immutability.
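A minimal, non-normative sketch of pipeline execution follows. How step outputs are combined is not specified in this section, so the concatenation at the end is an assumption, as are `run_pipeline`, `toy_ms_map`, and the single `Artifact` class covering both Gap and Ambiguity.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Produced:
    data: bytes


@dataclass(frozen=True)
class Artifact:
    kind: str   # "gap" or "ambiguity"
    data: bytes


Result = Union[Produced, Artifact]
Frame = dict[str, str]


def run_pipeline(ms_map: Callable[[str, Frame], Result],
                 steps: list[tuple[str, Frame]]) -> Result:
    """Execute MS_map(C_i, CF_i) in order; stop at the first artifact."""
    if not steps:
        raise ValueError("a pipeline needs at least one step")
    outputs = []
    for concept_id, frame in steps:
        result = ms_map(concept_id, frame)
        if isinstance(result, Artifact):
            return result
        outputs.append(result.data)
    # Assumption: the pipeline result concatenates the step outputs.
    return Produced(b"".join(outputs))


def toy_ms_map(concept_id: str, frame: Frame) -> Result:
    """Hypothetical mapping: needs an encoding binding to emit bytes."""
    if "C_TextEncoding" not in frame:
        return Artifact("gap", b"missing:C_TextEncoding")
    return Produced(concept_id.encode("utf-8"))


utf8 = {"C_TextEncoding": "C_UTF8"}
ok = run_pipeline(toy_ms_map, [("C_LetterA", utf8), ("C_WordA", utf8)])
halted = run_pipeline(toy_ms_map, [("C_LetterA", utf8), ("C_WordA", {})])
```

Each step receives its own frame, so a missing binding in one step's frame halts the pipeline with an artifact without altering the frames used by other steps.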

Conformance Criteria

Implementations conform to MS/1 when they satisfy all of the following:

  1. Accept (C, CF) and emit either Produced(Data) or a Gap/Ambiguity artifact.
  2. Produce outputs reproducibly via FCS/1 + FPS/1, and record each run via FER/1.
  3. Encode mutable decisions through concept-typed context keys only; no hidden flags.
  4. Address all bytes by SHA-256 CIDs and guarantee identical inputs replay to identical CIDs.
  5. Treat relation types as concepts without hard-coded enumerations.

Dependencies (pinned drafts)

MS/1 relies on upstream drafts; consumers MUST treat these versions as pinned for this specification and handle later changes as upstream drift:

  • FCS/1 v0.2.1 (Draft)
  • FPS/1 v0.4.3 (Draft)
  • FLS/1 v0.1.0 (Draft)

Any adoption in PH10+ MUST re-pin consciously if these upstream specs change.

Evidence (PH08 runtime)

PH08 reference executions demonstrate MS/1 behaviour; cite these when asserting readiness or approval:

  • logs/ph08/evidence/ms/PH08-EV-MS-RUNTIME-001/
  • logs/ph08/evidence/ms/PH08-EV-MS-ACCEPT-001/
  • logs/ph08/evidence/ms/PH08-EV-MS-BIND-001/
  • logs/ph08/evidence/ms/PH08-EV-MS-CORE-001/
  • logs/ph08/evidence/ms/PH08-EV-MS-GATES-001/
  • logs/ph08/evidence/ms/PH08-EV-MS-LADDER-001/
  • logs/ph08/evidence/ms/PH08-EV-MS-PIPE-001/
  • logs/ph08/evidence/ms/PH08-EV-MS-ML-EVAL-001/

Worked Ladder Example (Zero → Byte)

The following non-normative ladder illustrates how MS/1 composes mappings from abstract numbers to textual bytes.

  1. Concepts: C_Number65, C_CodePoint(U+0041), C_LetterA, C_WordA.
  2. Context frame bindings: withinDomain → C_Unicode15, C_TextEncoding → C_UTF8.
  3. Steps:
    • Interpret C_Number65 as C_CodePoint(U+0041) by applying a mapping rule concept.
    • Map C_CodePoint(U+0041) within the Unicode/UTF-8 frame to Data 0x41.
    • Map C_WordA under the same frame to Data 0x41.

Each run records its FER/1 linkage so the byte 0x41 can be replayed.
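The ladder can be traced in a few lines of Python. This is an illustrative sketch only: Python's `chr` and the `utf-8` codec stand in for the mapping rule and encoding concepts, and the frame is modelled as a plain dictionary.

```python
import hashlib

number = 65                    # C_Number65
code_point = chr(number)       # C_CodePoint(U+0041)
frame = {"withinDomain": "C_Unicode15", "C_TextEncoding": "C_UTF8"}

# Python codec names stand in for encoding concepts (an assumption).
codecs_by_concept = {"C_UTF8": "utf-8"}
data = code_point.encode(codecs_by_concept[frame["C_TextEncoding"]])

# The resulting Data node is addressed by the SHA-256 of its bytes.
cid = hashlib.sha256(data).hexdigest()
print(f"U+{ord(code_point):04X} -> 0x{data.hex().upper()} -> {cid[:12]}...")
```

The encoding is read from the frame rather than hard-coded, so a frame bound to a different encoding concept would take a different, equally replayable path.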


Context Evolution and Replay Discipline

Context Revisions During Pipelines

When a branch detects missing bindings (e.g., absent C_TextEncoding), it MUST emit an Ambiguity artifact detailing admissible encodings. Progress resumes by creating a refined frame CF' = CF ∪ { C_TextEncoding → C_UTF8 } and re-running only the affected branch. Publishing the refinement to siblings is explicit and produces a new parent frame with a distinct CID_context.

Replay-First Evolution

MS/1 guards against semantic drift when the knowledge base or registry grows:

  • Frame immutability: Every FER/1 record captures the CID_context that was in force at execution time. New bindings (e.g., alternative encodings) produce new frame hashes and therefore new provenance edges. Historical runs replay byte-for-byte because their frames never mutate.
  • Required bindings: Recipes MUST register mandatory keys via ms.requires_key. When governance tightens (for example, by requiring a string.encoding key), existing frames lacking the key yield deterministic Gap/Ambiguity artifacts instead of silently adopting defaults.
  • Replay-first migration: If an ecosystem needs the new semantics, it reruns the recorded FCS/1 recipe with a refined frame. The new FER/1 record cites the refined CID_context, making comparisons between legacy and refreshed runs explicit and auditable.
  • Artifact parity: Gap and Ambiguity outputs are stored as first-class Data nodes. Audits can therefore demonstrate exactly where a richer knowledge base demanded additional context, keeping provenance complete even when execution paused.

Collectively, these rules let operators expand the registry or tighten policies without jeopardising determinism or traceability.
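The following non-normative sketch walks through a tightened requirement. The JSON-based `context_cid`, the `run` helper, and its record shape are assumptions made for illustration; the key names are taken from the examples in this specification.

```python
import hashlib
import json


def context_cid(frame: dict[str, str]) -> str:
    encoded = json.dumps(frame, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def run(frame: dict[str, str], required_keys: set[str]) -> dict:
    """Toy run: report a gap when required keys are missing."""
    missing = sorted(required_keys - set(frame))
    outcome = {"gap": missing} if missing else {"produced": "ok"}
    return {"context": context_cid(frame), **outcome}


legacy = {"withinDomain": "C_Unicode15"}
legacy_cid = context_cid(legacy)

# Governance tightens: string.encoding becomes mandatory.
tightened = {"withinDomain", "string.encoding"}
before = run(legacy, tightened)

# Refinement builds a new frame; the legacy frame is left untouched.
refined = {**legacy, "string.encoding": "C_UTF8"}
after = run(refined, tightened)
```

`before` reports the missing key against the unchanged legacy CID, while `after` cites a new CID_context, so the two runs stay distinguishable in an audit.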


Hashing Bits and Abstract Values

MS/1 hashes bytes, not abstract concepts. A bit becomes hashable only once it is materialised (e.g., via C_BitAsOctet). When domain or packing decisions remain unbound, the mapping MUST emit an Ambiguity or Gap artifact rather than guess.
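As a non-normative illustration, the sketch below only produces hashable bytes once a packing is bound. The one-octet layout for C_BitAsOctet (0x00 or 0x01) and the dictionary result shape are assumptions.

```python
import hashlib
from typing import Optional


def materialise_bit(bit: int, packing: Optional[str]) -> dict:
    """Turn an abstract bit into bytes only when packing is bound."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    if packing is None:
        # Unbound packing: report it rather than guess a layout.
        return {"kind": "ambiguity", "alternatives": ["C_BitAsOctet"]}
    if packing != "C_BitAsOctet":
        raise ValueError(f"packing outside this sketch: {packing}")
    data = bytes([bit])  # assumed layout: one octet, 0x00 or 0x01
    return {"kind": "produced", "data": data,
            "cid": hashlib.sha256(data).hexdigest()}


bound = materialise_bit(1, "C_BitAsOctet")
unbound = materialise_bit(1, None)
```

Only `bound` carries a CID; `unbound` has nothing to hash until a refined frame selects a packing.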


Risk Controls

MS/1 aligns with existing Amduat controls:

  • Semantic drift: Context keys are concepts with versioned materialisations; frame hashes reveal mismatches.
  • Provenance loss: Every Produced(Data) links to a FER/1 record via hasProvenance.
  • Undocumented mutation: Frames are immutable; refinements create new contexts.
  • Rights obligations: Licences and attribution obligations appear as context keys referencing policy Data. Recipes MAY decline to execute when obligations are unmet, provided the decision is deterministic.

Acceptance Checks

An MS/1 implementation SHOULD ship the following self-tests:

  1. Replay test: identical (C, CF) inputs produce the same CID and FER/1 log.
  2. Gap test: missing required keys yield deterministic Gap artifacts.
  3. Ambiguity test: admissible alternatives are enumerated stably.
  4. Scope test: sibling branches remain unchanged unless a published frame is adopted.
  5. Fidelity test: the declared equivalence predicate is machine-verifiable.

Registry Alignment

MS/1 is production-registered in the CRS/1 concept registry. Implementations MUST treat the following handles and digests as canonical when emitting or validating graphs:

| Symbol | Registry Handle | Kind | SHA-256 Digest | Notes |
| --- | --- | --- | --- | --- |
| MS/1 | crs:concept/amduat.ms.surface@1 | Concept | d140ac54367a88fa2459e3fedf0b2fde934f9ac73568f8a159e2b0c1c1828c70 | Primary mapping surface concept anchoring MS_map executions. |
| ms.produces | crs:concept/amduat.ms.relation.produces@1 | Relation concept | 447a9f454d78f5b2ee300fe416138a864789e133b2eb9a84e32592aa9dd47965 | Annotates Concept → Data edges that capture Produced(Data) bytes. |
| ms.requires_key | crs:concept/amduat.ms.relation.requires_key@1 | Relation concept | a90295a8ca3006e062a5a1d5a6220330e53ba00677736b6f4a18efcec1169f6a | Declares mandatory context key bindings for deterministic execution. |
| ms.within_domain | crs:concept/amduat.ms.relation.within_domain@1 | Relation concept | 0993dff2531dd32ea32b98925bf8a5cbc88c88ed28cd3c3575e8affc84d7fa2d | Expresses domain refinements used to disambiguate mappings. |
| ms.fidelity_predicate | crs:concept/amduat.ms.relation.fidelity_predicate@1 | Relation concept | 7e159182789d89269b743c19da58d34acc3279d86650cfef83efd4f2c210c66a | Binds outputs to the declared fidelity predicate concept. |
| ms.byteValue | crs:concept/amduat.ms.relation.byte_value@1 | Relation concept | 4c43dd3a37ae695bac476e4dc62d8d8c2abda6c555668d4d863810d0053056c3 | Records byte concepts and their literal value bindings within the ladder corpus. |
| ms.codePoint | crs:concept/amduat.ms.relation.code_point@1 | Relation concept | bb7301d2fa0c5058cbd53019625bfd38ecb59e42010b64a9f0b0ada7dc494117 | Links textual concepts to Unicode code points under registered contexts. |
| ms.symbolSequence | crs:concept/amduat.ms.relation.symbol_sequence@1 | Relation concept | 115035e5dc2db7e9ab2e4255f50aee56d222b04f7b7434dbbec76620cf36aa6d | Declares ordered relations between code points/bytes when emitting strings. |
| ms.upperCasePolicy | crs:concept/amduat.ms.relation.upper_case_policy@1 | Relation concept | 6f5165648c1069178c2e8a615bd24bd02a3a367df6f8e6ac21050f42eeea484c | Encodes casing policies that constrain textual ladders. |
| ms.titleCasePolicy | crs:concept/amduat.ms.relation.title_case_policy@1 | Relation concept | 7f10f109b578b69b079fc9f284ed73c69b82434b31db0a999a872d2b47958ae5 | Encodes title-casing policies enforced during deterministic mapping. |

The registry sidecar at /amduat/registry/predicates.jsonl mirrors these entries, allowing auditors to verify digests independent of this specification. Implementations MUST fail closed when encountering unregistered aliases for these handles.
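A fail-closed check might look like the sketch below. It is non-normative: the in-memory `REGISTRY` holds only two of the entries listed above, and `validate` is a hypothetical helper, not a CRS/1 API. The format of predicates.jsonl is not described here, so the sketch does not attempt to parse it.

```python
# Two handles and digests copied from the registry table in this section.
REGISTRY = {
    "crs:concept/amduat.ms.surface@1":
        "d140ac54367a88fa2459e3fedf0b2fde934f9ac73568f8a159e2b0c1c1828c70",
    "crs:concept/amduat.ms.relation.produces@1":
        "447a9f454d78f5b2ee300fe416138a864789e133b2eb9a84e32592aa9dd47965",
}


def validate(handle: str, digest: str) -> None:
    """Raise unless the handle is registered and the digest matches."""
    expected = REGISTRY.get(handle)
    if expected is None:
        # Fail closed: unknown aliases are rejected, never guessed.
        raise LookupError(f"unregistered handle: {handle}")
    if digest.lower() != expected:
        raise ValueError(f"digest mismatch for {handle}")


validate("crs:concept/amduat.ms.surface@1",
         "d140ac54367a88fa2459e3fedf0b2fde934f9ac73568f8a159e2b0c1c1828c70")
```

An alias such as `crs:concept/amduat.ms.surface` (without the `@1` suffix) is not in the table and therefore raises `LookupError` instead of being resolved leniently.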


Phase 07 Cross-Stream Integration

Phase 07 workstreams bind their certificates, receipts, facts, overlays, and domain dossiers to MS/1 contexts through a shared TypeTag grid:

  • /amduat/phases/ph07/notes/PH07-CIL-XMAP-001.md (CIL-X1) and vectors/ph07/cil/PH07-CIL-XMAP-001.json declare the authoritative cross-stream mapping entries and TypeTag ranges that all semantic surfaces must cite before reaching Draft Ready gates. The JSON registry is logged under PH07-EV-CIL-ATTEST-001 so downstream profiles can dereference the same identifiers without ambiguity.
  • /amduat/phases/ph07/notes/PH07-CROSS-CHECK-001.md records the harness that enforces these bindings. The checklist stored in logs/ph07/evidence/cil/PH07-EV-CIL-ATTEST-001/PH07-CROSS-TV-001.md validates every FER/FCT/OI manifest against the refreshed XMAP IDs before the *-5 ledger exits (FER-5, FCT-5, OI-5) can advance.
  • /amduat/phases/ph07/notes/PH07-FER-SCHEMA-001.md (ledger FER-1, evidence PH07-EV-FER-RUN-001) pins xmap_refs[], certificate anchors, and replay bundles to the XMAP rows so every receipt explicitly declares which MS/1 context governed execution.
  • /amduat/phases/ph07/notes/PH07-FCT-SCHEMA-001.md (ledger FCT-1, evidence PH07-EV-FCT-FACTS-001) introduces the trust_spine block that carries the required xmap_ref, receipt_refs[], and anchor_certs[], keeping fact acceptance policies tied to the same mapping IDs.
  • /amduat/phases/ph07/notes/PH07-OI-HARNESS-001.md (ledger OI-5, evidence PH07-EV-OI-VIEWS-001) proves overlay descriptors and workspace views publish mapping_profile.xmap_refs[] plus TGK edge expectations aligned with XMAP-CIL-CUSTOM-V1.
  • /amduat/phases/ph07/notes/PH07-DOM-HARNESS-001.md (ledger DOM-5, evidence PH07-EV-DOM-APPS-001) extends the same guarantees to domain pilot dossiers, confirming their overlays, facts, and receipts cite approved TypeTags and TGK edge sets.

MS/1 implementations participating in PH07 MUST therefore emit the same mapping handles and evidence references recorded in these notes so provenance can be validated across CIL/FER/FCT/OI boundaries.


Phase 05 Textual Ladder Scope and Evidence

Phase 05 extends MS/1 from abstract examples to a production ladder that binds textual concepts to deterministic bytes. The ladder introduces:

  • A byte concept family with 256 child concepts (byte/0x00 through byte/0xFF) whose CRS/1 relations emit single-octet Data nodes and optionally record the radix context via ms.byteValue predicates.
  • UTF-8 code point concepts that sequence byte concepts through ms.symbolSequence relations while asserting ms.within_domain bindings to Unicode 15 and UTF-8 domain concepts.
  • Casing policy concepts (e.g., allCaps, titleCase) that require the string.casingPolicy context key and advertise fidelity predicates so downstream tooling can reject ambiguous casing decisions via ms.upperCasePolicy and ms.titleCasePolicy handles.
  • Dictionary word concepts implemented as FCS/1 recipes that concatenate code points into byte strings, emitting FER/1 receipts that cite the governing casing policy and context frame.

Predicate registries gain canonical handles (ms.byteValue, ms.codePoint, ms.symbolSequence, ms.produces, ms.requires_key, ms.upperCasePolicy, ms.titleCasePolicy) so tooling can resolve ladder edges without bespoke enumerations. Missing casing bindings MUST yield Ambiguity artifacts (ERR_MS_POLICY_MISSING/ERR_MS_AMBIGUITY); absent code points produce Gap artifacts (ERR_MS_GAP); undeclared predicates raise ERR_MS_UNDECLARED_PREDICATE.
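The dictionary-word behaviour can be sketched as below. This is a non-normative illustration: Python's `str.upper` and `str.title` stand in for the allCaps and titleCase policy concepts, the dictionary result shape is an assumption, and only the error code names are taken from this section.

```python
from typing import Optional


def emit_word(word: str, casing_policy: Optional[str]) -> dict:
    """Concatenate UTF-8 code point bytes under a bound casing policy."""
    if casing_policy is None:
        # Missing string.casingPolicy binding: no bytes are guessed.
        return {"kind": "ambiguity", "error": "ERR_MS_POLICY_MISSING",
                "alternatives": ["allCaps", "titleCase"]}
    if casing_policy == "allCaps":
        text = word.upper()   # stand-in for the allCaps policy
    elif casing_policy == "titleCase":
        text = word.title()   # stand-in for the titleCase policy
    else:
        raise ValueError(f"policy outside this sketch: {casing_policy}")
    # One code point at a time, mirroring the ladder's concatenation step.
    data = b"".join(ch.encode("utf-8") for ch in text)
    return {"kind": "produced", "data": data}


upper = emit_word("amduat", "allCaps")
title = emit_word("amduat", "titleCase")
unbound = emit_word("amduat", None)
```

The two bound policies yield different bytes for the same word, which is why an unbound policy is reported as an Ambiguity rather than resolved by a default.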

Evidence for the ladder is captured under /amduat/vectors/ph05/ms1-text/manifest.json and the reserved /amduat/logs/ph05/evidence/ms1/ surfaces:

  • PH05-EV-MS-CTX-001/ — CTX/1 context frames with predicate registry vectors (domain separator AMDUAT:CTX\0, reject ERR_CTX_UNKNOWN_KEY).
  • PH05-EV-MS-LADDER-001/ — Dual-run FER/1-backed positive ladders for bytes, code points, and dictionary outputs with SA/PA guardrails.
  • PH05-EV-MS-ERRORS-001/ — Gap/ambiguity/missing policy/undeclared predicate receipts mapped to ADR-006.

Phase Alignment and Readiness

  • PH07 (Semantic Surfaces): MS/1 is authoritative for every PH07 semantic surface (CIL/FER/FCT/OI) and must be cited wherever mapping semantics are referenced. PH07 workstreams MAY extend context vocabularies or registry rows so long as they remain MS/1-conformant; no runtime commits are expected in this phase beyond harness stubs and governance evidence.
  • PH08 (Reference Implementation): The reference ms_map runtime, parity harnesses, and subsystem wiring reside in the KRN-2 campaign slotted for Phase 08. PH08 SHALL use this specification verbatim, emitting FER/1 records and exercising the acceptance checks in §Acceptance Checks.
  • Downstream pilots (PH09+): Reproducible ML, CI/CD, data mesh, and notebook pilots inherit the MS/1 contract by invoking the PH08 reference surface. Those phases MAY add domain-specific context keys or ladders, but they MUST register them through CRS/1 and capture evidence via the surfaces reserved in §Phase 05 Textual Ladder Scope and Evidence.

Declaring these boundaries keeps the approval pathway clear: PH07 completes the specification, PH08 proves the executable substrate, and later phases consume it without reopening MS/1 fundamentals.


Document History

  • 0.1.0 (2025-11-10): Initial draft capturing deterministic concept-to-data mapping surface.
  • 0.1.1 (2025-11-11): Add interface patterns for mapping functions and constrain auxiliary parameters.
  • 0.1.2 (2025-11-12): Formalize registry concept handles and digests for MS/1.
  • 0.1.3 (2025-11-14): Documented PH05 textual ladder scope, predicate handles, and evidence surfaces.
  • 0.1.4 (2025-11-15): Aligned MS/1 evidence references with CTX/1, ladder, and error reservations.
  • 0.1.5 (2025-11-18): Added ms.* predicate handles, CTX/1 domain separator, ADR-006 error mapping, and dual-run evidence guardrails.
  • 0.1.6 (2025-11-18): Added context-evolution guardrails and PH07→PH08 readiness boundaries.
  • 0.1.7 (2025-11-19): Clarified MIME/media-type relationship to MS context frames and execution bindings.
  • 0.2.0 (2025-11-19): Standardized metadata, headings, and evidence alignment per DOCSTD.
  • 0.2.1 (2025-11-19): Synced registry handles and documented PH07 XMAP/harness integration.
  • 0.2.2 (2025-11-30): Added DOCID header, pinned upstream draft dependencies, and referenced PH08 MS/1 evidence bundles.