amduat-api/tier1/ms.md
2025-12-22 21:03:00 +01:00

# AMDUAT-MS/1 — Mapping Surface Specification
Status: Draft
Owner: Architecture
Version: 0.2.2
SoT: Yes
Last Updated: 2025-11-30
Linked Phase Pack: PH07
Tags: [composition, execution, deterministic]
identity_authority: amduat.programme
lineage_id: L-PENDING-MS1
doc_code: AMDUAT-MS/1
code_status: tentative
doc_code_aliases: []
location: /amduat/tier1/ms.md
surface: developer
internal_revision: 3
provenance_mode: full
---
## Overview
**AMDUAT-MS/1** standardises the executable mapping surface that turns a
**Concept** plus a **Context Frame** into deterministic **Data** bytes. The
surface orchestrates **FPS/1 primitives** through **FCS/1 recipes** and records
runs through **FER/1**; certification, provenance, and policy decisions remain
with **FCT/1** and phase evidence packs. MS/1 governs observable mapping
behaviour and context binding rules without authorising new governance or
relation taxonomies.
---
## Core Model
MS/1 retains Amduat's two primitive node kinds and treats all other structure as
concept-typed relations.
* **Concept node (C):** Abstract identifier that can describe, type, or govern
any graph element. A concept can own multiple materialisations without losing
identity.
* **Data node (D):** Immutable byte sequence addressed by CID (SHA-256). All
executions produce Data nodes.
* **Relation instances:** Every edge is annotated with a `relation_concept` that
identifies its semantics. Implementations MUST NOT embed a fixed relation
enumeration; relation concepts are first-class concepts.
Non-normative relation concept examples include `represents` (C→D),
`materializesAs` (C→D), `requiresKey` (C→C), `withinDomain` (C→C), `computedBy`
(D→C), and `hasProvenance` (D→D). Ecosystems MAY register additional concepts and
MUST treat each registered concept as a normal graph node.
---
## Context Frames
A **Context Frame (CF)** is a deterministic multimap of `{ key_concept → value
}` that constrains execution.
* **Keys** are concept identifiers; human-readable string labels are optional.
* **Values** are canonical scalars (integers, enums keyed by concept ID) or CIDs
to Data nodes for larger payloads.
* **Context CID:** `CID_context = sha256(canonical_encode(CF))`. The canonical
encoding MUST order keys by concept ID and normalise values so identical frames
generate identical CIDs.
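The Context CID rule can be sketched as follows. This is a minimal illustration, assuming a compact JSON canonical form; MS/1 does not fix the wire encoding in this section, and `canonical_encode`, `context_cid`, and the `concept:*` identifiers are hypothetical names chosen for the example.

```python
import hashlib
import json

def canonical_encode(frame: dict) -> bytes:
    """Assumed canonical form: keys sorted by concept ID, values sorted per key."""
    normalised = {
        key: sorted(values, key=lambda v: json.dumps(v))
        for key, values in frame.items()
    }
    # sort_keys orders the keys; compact separators remove whitespace variation.
    return json.dumps(normalised, sort_keys=True, separators=(",", ":")).encode("utf-8")

def context_cid(frame: dict) -> str:
    """CID_context = sha256(canonical_encode(CF)), rendered as hex."""
    return hashlib.sha256(canonical_encode(frame)).hexdigest()

# Two frames with the same bindings, built in a different order (hypothetical IDs).
frame_a = {"concept:text.encoding": ["concept:utf8"], "concept:domain": ["concept:unicode15"]}
frame_b = {"concept:domain": ["concept:unicode15"], "concept:text.encoding": ["concept:utf8"]}
print(context_cid(frame_a) == context_cid(frame_b))  # True: insertion order does not matter
```

Each key maps to a list because the frame is a multimap; sorting the list is one way to normalise values so that equal frames hash equally.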
### Scoping and Refinement
* Frames form a tree during pipeline execution. Each mapping step receives one
frame.
* Child frames inherit bindings from the parent. A child MAY add new bindings or
narrow an existing binding but MUST NOT contradict established bindings.
* Branch communication is explicit. Publishing a refinement to siblings requires
constructing a new **published frame** and referencing it; no sideways state is
implicit.
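The inheritance rule above can be sketched as a pure function. This is an illustration only: it assumes each binding is modelled as a set of admissible values, so that "narrowing" means choosing a non-empty subset; `refine` and the key names are hypothetical.

```python
def refine(parent: dict, updates: dict) -> dict:
    """Return a child frame: inherit parent bindings, add or narrow, never contradict."""
    child = dict(parent)  # copy so the parent frame stays untouched
    for key, values in updates.items():
        values = frozenset(values)
        if not values:
            raise ValueError(f"binding for {key!r} would be empty")
        if key in parent and not values <= parent[key]:
            # A value outside the parent's admissible set contradicts the parent.
            raise ValueError(f"binding for {key!r} contradicts the parent frame")
        child[key] = values
    return child

parent = {"encoding": frozenset({"utf8", "utf16"})}
# Narrow "encoding" to one value and add a new "domain" binding.
child = refine(parent, {"encoding": {"utf8"}, "domain": {"unicode15"}})
```

Because `refine` returns a new dictionary, sibling branches that still hold `parent` see no change, which mirrors the requirement that publication be explicit.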
### Gaps and Ambiguity
* Missing required bindings yield a **Gap Artifact** (Data) that records the
missing keys and the decision point. No payload bytes are emitted.
* When multiple admissible bindings remain, the step returns an **Ambiguity
Artifact** (Data) enumerating admissible alternatives and the rule needed to
disambiguate. Resolution demands a refined frame.
Gap and Ambiguity artifacts are ordinary Data nodes with CIDs and MAY be audited
like any other output.
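A sketch of how a step might classify a single binding is shown below. The artifact layout (JSON with `kind`, `missing_keys`, `alternatives`) is an assumption made for illustration; it omits the decision point and disambiguation rule that real artifacts record, and `resolve_binding` is a hypothetical helper.

```python
import json

def resolve_binding(key: str, admissible: set) -> tuple:
    """Classify a binding: no candidates -> gap, one -> bound, several -> ambiguity."""
    options = sorted(admissible)  # stable order keeps the artifact bytes deterministic
    if not options:
        artifact = {"kind": "gap", "missing_keys": [key]}
    elif len(options) == 1:
        return ("bound", options[0])
    else:
        artifact = {"kind": "ambiguity", "key": key, "alternatives": options}
    # Artifacts are plain bytes, so they can be stored and hashed like any Data node.
    return (artifact["kind"], json.dumps(artifact, sort_keys=True).encode("utf-8"))
```

Sorting the alternatives before encoding means two runs over the same admissible set produce byte-identical Ambiguity artifacts.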
---
## Mapping Surface Semantics
The surface defines the total function:
```
MS_map : (Concept C, ContextFrame CF) -> Result
Result = Produced(Data bytes) | Gap(Data) | Ambiguity(Data)
```
### Executable Form
* Every mapping MUST be realised as an **FCS/1 recipe** parameterised by **PCB1**
blocks.
* Recipes MUST invoke only **FPS/1 primitives** (`put`, `get`, `slice`,
`concatenate`, `reverse`, `splice`) and compositions thereof.
### Determinism and Replayability
For fixed `(C, CF)` inputs and fixed referenced Data CIDs, implementations MUST
produce identical bytes.
Each run MUST emit a **FER/1 run record** (Data) that declares:
* the recipe concept ID (FCS/1),
* the PCB1 parameter block (as Data CID),
* all input CIDs (including `CID_context`),
* the output CID, and
* relevant environment hashes (e.g., FPS/1 library surfaces) that affect
determinism.
Replaying the FCS/1 recipe with the same inputs MUST yield the same CID.
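The fields listed above could be carried in a record like the following. This is a sketch under stated assumptions: the field names, the JSON serialisation, and the placeholder identifiers are illustrative and are not the FER/1 schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class RunRecord:
    recipe_id: str                      # recipe concept ID (FCS/1)
    pcb1_cid: str                       # PCB1 parameter block, as a Data CID
    input_cids: tuple[str, ...]         # all inputs, including CID_context
    output_cid: str                     # CID of the produced Data or artifact
    env_hashes: tuple[str, ...] = ()    # environment hashes that affect determinism

    def to_bytes(self) -> bytes:
        """Assumed deterministic encoding: compact JSON with sorted keys."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")

    def cid(self) -> str:
        """The record is itself Data, so it is addressed by the SHA-256 of its bytes."""
        return hashlib.sha256(self.to_bytes()).hexdigest()

# Placeholder values; real records would carry actual CIDs.
record = RunRecord("recipe:demo", "pcb1-cid", ("context-cid", "input-cid"), "output-cid")
```

Two records with identical fields hash to the same CID, so a replay that reproduces the output also reproduces the record under this encoding.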
### Fidelity Predicates
Each mapping declares (via relation concepts) an **equivalence predicate** that
states when the produced bytes faithfully represent the target concept. Fidelity
predicates are themselves concepts and MAY be certified through **FCT/1**.
### Domain Keys
Disambiguation relies on **Domain concepts** expressed as context keys (e.g.,
`withinDomain`). Recipes MAY require domain bindings such as classical vs.
quantum state spaces to guarantee deterministic interpretation.
### Function Interface Patterns
Implementations SHOULD expose a narrow callable surface so concept inputs and
determinism guarantees remain auditable.
1. **Primary signature:**
   * `Result ms_map(concept_id, context_frame)` is the canonical entry point.
   * `concept_id` MUST be a stable concept node reference (e.g., CID or registry
     handle). Implementations MUST NOT infer the target concept from global
     mutable state.
   * `context_frame` MUST be the complete set of bindings used for the run. Any
     binding derived from other inputs MUST be reflected back into the frame
     before execution.
2. **Auxiliary parameters:**
   * Additional parameters are only permitted when they can be canonicalised to
     deterministic Data CIDs and recorded in the FER/1 run record.
   * Preferred pattern: `attachments: Sequence[DataCID] = ()`. Each attachment
     MUST already exist as a Data node, and the mapping MUST reference the CIDs
     explicitly in the FER/1 log.
   * Alternative pattern: `concept_inputs: Sequence[ConceptID] = ()` for
     secondary concept handles. Each element MUST also appear in the effective
     context frame (e.g., under a relation-specific key). Passing the concept ID
     without updating the frame is a violation.
3. **Rejected pattern:** Opaque keyword arguments or process-level environment
   variables MUST NOT influence execution. Implementations discovering such usage
   MUST emit an Ambiguity artifact describing the missing context binding
   instead.
The following pseudocode illustrates a compliant wrapper:
```
def ms_map(concept_id: ConceptID,
           context_frame: ContextFrame,
           *,
           attachments: Sequence[DataCID] = (),
           concept_inputs: Sequence[ConceptID] = ()) -> Result:
    # Reflect secondary concept handles into the frame before execution.
    frame = context_frame.with_bindings({
        relation_for(ci): ci for ci in concept_inputs
    })
    # Bind each attachment CID under its relation key as well.
    frame = frame.with_bindings({
        relation_for(cid): cid for cid in attachments
    })
    # Run the FCS/1 recipe against the effective frame and normalise the outcome.
    record = run_fcs_recipe(concept_id, frame, attachments)
    return normalise_result(record)
```
`relation_for` denotes a deterministic lookup from an input (a concept ID or
Data CID) to the relation-concept key under which its binding is stored.
Implementations MAY inline more efficient mechanisms, but the observable effect
MUST match first updating the frame and then invoking the recipe.
### Media (MIME) Types vs. MS Context
**MIME/media types** label already-produced Data so downstream systems can
interpret byte strings (e.g., `text/plain; charset=utf-8`, `image/png`). An
**MS context frame** captures the *pre-execution* bindings that shape which bytes
will be emitted. They relate but are not interchangeable:
* **Where they live:** Context bindings exist before execution and are hashed
into `CID_context`. MIME labels are attached after Data exists (e.g., CIL
payload metadata, HTTP headers, or relation edges such as
`ms.produces_media_type`).
* **What they encode:** Context keys describe decision levers (encoding, domain,
fidelity policy) that an FCS recipe uses to deterministically produce bytes.
MIME types describe how consumers should parse already-materialised bytes. A
context MAY carry a desired media-type concept ("emit this as
`application/pdf`") but it is still a binding that constrains execution, not a
replacement for Content-Type headers.
* **Interoperability:** When MS outputs feed MIME-governed ecosystems, recipes
SHOULD register a concept such as `ms.media_type` and declare it via
`ms.requires_key` if the choice affects determinism (e.g., choosing between
`text/csv` vs. `application/json`). FER/1 receipts then bind the produced Data
both to the governing context frame and to a media-type relation so downstream
MIME routers, storage overlays, and catalogues reconcile the two perspectives.
MIME types are one example of **interpretation contracts** that MS contexts can
align with. The same pattern applies to CAD kernels with STEP schemas, audio
encoders with bitstream levels, or ML artefacts with `model.format`
descriptors. MS/1 keeps them deterministic by requiring the selection to be a
context binding (recorded before execution) while also allowing publication
surfaces to mirror the choice via their native metadata channels.
---
## Pipelines and Composition
A **Pipeline** is a concept that composes ordered mapping steps.
* Each step executes `MS_map(C_i, CF_i)`.
* Pipelines return either Produced(Data) or an Artifact (Gap/Ambiguity).
* Given identical inputs, pipelines MUST be byte-stable.
* **Branch isolation:** each branch operates on a forked frame. Publishing
updates requires emitting a new frame with a fresh `CID_context`, preserving
immutability.
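The composition rule can be sketched as a short loop. This illustration assumes results are `(kind, bytes)` tuples and that the pipeline's result is that of its last step; how one step's output feeds the next is not specified here, and `run_pipeline` and `toy_map` are hypothetical names.

```python
def run_pipeline(ms_map, steps):
    """Run (concept_id, frame) steps in order; stop at the first Gap or Ambiguity."""
    result = ("produced", b"")
    for concept_id, frame in steps:
        result = ms_map(concept_id, frame)
        if result[0] != "produced":
            return result  # an artifact short-circuits the remaining steps
    return result

def toy_map(concept_id, frame):
    """Hypothetical stand-in for MS_map that requires an 'encoding' binding."""
    if "encoding" not in frame:
        return ("gap", b"missing:encoding")
    return ("produced", concept_id.encode(frame["encoding"]))

ok = run_pipeline(toy_map, [("A", {"encoding": "utf-8"}), ("B", {"encoding": "utf-8"})])
halted = run_pipeline(toy_map, [("A", {}), ("B", {"encoding": "utf-8"})])
print(ok)      # ('produced', b'B')
print(halted)  # ('gap', b'missing:encoding')
```

Each step receives its own frame object, so a branch can be forked by passing a different frame without touching the others.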
---
## Conformance Criteria
Implementations conform to MS/1 when they satisfy all of the following:
1. Accept `(C, CF)` and emit either Produced(Data) or a Gap/Ambiguity artifact.
2. Produce outputs reproducibly via FCS/1 + FPS/1, and record each run via FER/1.
3. Encode mutable decisions through concept-typed context keys only; no hidden
flags.
4. Address all bytes by SHA-256 CIDs and guarantee identical inputs replay to
identical CIDs.
5. Treat relation types as concepts without hard-coded enumerations.
## Dependencies (pinned drafts)
MS/1 relies on upstream drafts; consumers MUST treat these versions as pinned
for this specification and handle later changes as upstream drift:
* FCS/1 v0.2.1 (Draft)
* FPS/1 v0.4.3 (Draft)
* FLS/1 v0.1.0 (Draft)
Any adoption in PH10+ MUST re-pin consciously if these upstream specs change.
## Evidence (PH08 runtime)
PH08 reference executions demonstrate MS/1 behaviour; cite these when asserting
readiness or approval:
* `logs/ph08/evidence/ms/PH08-EV-MS-RUNTIME-001/`
* `logs/ph08/evidence/ms/PH08-EV-MS-ACCEPT-001/`
* `logs/ph08/evidence/ms/PH08-EV-MS-BIND-001/`
* `logs/ph08/evidence/ms/PH08-EV-MS-CORE-001/`
* `logs/ph08/evidence/ms/PH08-EV-MS-GATES-001/`
* `logs/ph08/evidence/ms/PH08-EV-MS-LADDER-001/`
* `logs/ph08/evidence/ms/PH08-EV-MS-PIPE-001/`
* `logs/ph08/evidence/ms/PH08-EV-MS-ML-EVAL-001/`
---
## Worked Ladder Example (Zero → Byte)
The following non-normative ladder illustrates how MS/1 composes mappings from
abstract numbers to textual bytes.
1. Concepts: `C_Number65`, `C_CodePoint(U+0041)`, `C_LetterA`, `C_WordA`.
2. Context frame bindings: `withinDomain → C_Unicode15`,
   `C_TextEncoding → C_UTF8`.
3. Steps:
   * Interpret `C_Number65` as `C_CodePoint(U+0041)` by applying a mapping rule
     concept.
   * Map `C_CodePoint(U+0041)` within the Unicode/UTF-8 frame to Data `0x41`.
   * Map `C_WordA` under the same frame to Data `0x41`.
Each run records its FER/1 linkage so the byte `0x41` can be replayed.
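The ladder can be traced with Python's built-in Unicode handling. This is a sketch only: the two helper functions are hypothetical stand-ins for the mapping rule concepts, and the frame is reduced to a plain dictionary.

```python
import hashlib

def number_to_code_point(number: int) -> str:
    """Step 1: interpret an integer as a Unicode code point (65 -> U+0041)."""
    return chr(number)

def code_point_to_data(code_point: str, encoding: str) -> bytes:
    """Step 2: materialise the code point under the bound text encoding."""
    return code_point.encode(encoding)

frame = {"C_TextEncoding": "utf-8"}  # stands in for C_TextEncoding -> C_UTF8
data = code_point_to_data(number_to_code_point(65), frame["C_TextEncoding"])
cid = hashlib.sha256(data).hexdigest()  # the CID under which the byte is addressed
print(data)  # b'A', i.e. the single octet 0x41
```

Binding a different encoding in the frame (for example `utf-16`) would yield different bytes and therefore a different CID, which is why the encoding has to be a context binding.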
---
## Context Evolution and Replay Discipline
### Context Revisions During Pipelines
When a branch detects missing bindings (e.g., absent `C_TextEncoding`), it MUST
emit an Ambiguity artifact detailing admissible encodings. Progress resumes by
creating a refined frame `CF' = CF ∪ { C_TextEncoding → C_UTF8 }` and re-running
only the affected branch. Publishing the refinement to siblings is explicit and
produces a new parent frame with a distinct `CID_context`.
### Replay-First Evolution
MS/1 guards against semantic drift when the knowledge base or registry grows:
* **Frame immutability:** Every FER/1 record captures the `CID_context` that was
in force at execution time. New bindings (e.g., alternative encodings) produce
new frame hashes and therefore new provenance edges. Historical runs replay
byte-for-byte because their frames never mutate.
* **Required bindings:** Recipes MUST register mandatory keys via
`ms.requires_key`. When governance tightens (for example, by requiring a
`string.encoding` key), existing frames lacking the key yield deterministic
Gap/Ambiguity artifacts instead of silently adopting defaults.
* **Replay-first migration:** If an ecosystem needs the new semantics, it reruns
the recorded FCS/1 recipe with a refined frame. The new FER/1 record cites the
refined `CID_context`, making comparisons between legacy and refreshed runs
explicit and auditable.
* **Artifact parity:** Gap and Ambiguity outputs are stored as first-class Data
nodes. Audits can therefore demonstrate exactly where a richer knowledge base
demanded additional context, keeping provenance complete even when execution
paused.
Collectively, these rules let operators expand the registry or tighten policies
without jeopardising determinism or traceability.
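The interplay of frame immutability and tightened required keys can be sketched as follows. The frame is simplified to a flat dictionary hashed via sorted-key JSON, which is an assumption for illustration; `frame_cid` and `missing_keys` are hypothetical helpers.

```python
import hashlib
import json

def frame_cid(frame: dict) -> str:
    """Hash an assumed canonical JSON form of the frame."""
    return hashlib.sha256(json.dumps(frame, sort_keys=True).encode("utf-8")).hexdigest()

def missing_keys(frame: dict, required: set) -> list:
    """Keys a recipe requires that the frame does not bind, in stable order."""
    return sorted(required - set(frame))

legacy = {"withinDomain": "C_Unicode15"}
refined = {**legacy, "string.encoding": "C_UTF8"}  # a new frame; legacy is untouched

tightened = {"withinDomain", "string.encoding"}    # governance now requires both keys
print(missing_keys(legacy, tightened))   # ['string.encoding']: a Gap, no default adopted
print(missing_keys(refined, tightened))  # []
print(frame_cid(legacy) != frame_cid(refined))  # True: the refinement has its own CID
```

The legacy frame keeps its original hash, so earlier FER/1 records that cite it remain replayable while new runs cite the refined frame.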
---
## Hashing Bits and Abstract Values
MS/1 hashes bytes, not abstract concepts. A bit becomes hashable only once it is
materialised (e.g., via `C_BitAsOctet`). When domain or packing decisions remain
unbound, the mapping MUST emit an Ambiguity or Gap artifact rather than guess.
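A small sketch of this rule is given below. It assumes that `C_BitAsOctet` stores a bit as a single octet `0x00` or `0x01`; that reading, the `packing` parameter, and the artifact text are assumptions made for the example.

```python
import hashlib

def materialise_bit(bit: int, packing=None):
    """Return ('produced', bytes) once packing is bound, else an ambiguity marker."""
    if bit not in (0, 1):
        raise ValueError("a bit is 0 or 1")
    if packing is None:
        # Packing unbound: refuse to guess how the bit becomes bytes.
        return ("ambiguity", b"packing unbound")
    if packing == "bit_as_octet":
        return ("produced", bytes([bit]))  # one octet: 0x00 or 0x01
    raise ValueError(f"unknown packing {packing!r}")

kind, data = materialise_bit(1, "bit_as_octet")
cid = hashlib.sha256(data).hexdigest()  # only the materialised octet is hashed
```

Hashing happens after materialisation, so the same abstract bit under a different packing would be addressed by a different CID.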
---
## Risk Controls
MS/1 aligns with existing Amduat controls:
* **Semantic drift:** Context keys are concepts with versioned materialisations;
frame hashes reveal mismatches.
* **Provenance loss:** Every Produced(Data) links to a FER/1 record via
`hasProvenance`.
* **Undocumented mutation:** Frames are immutable; refinements create new
contexts.
* **Rights obligations:** Licences and attribution obligations appear as context
keys referencing policy Data. Recipes MAY decline to execute when obligations
are unmet, provided the decision is deterministic.
---
## Acceptance Checks
An MS/1 implementation SHOULD ship the following self-tests:
1. **Replay test:** identical `(C, CF)` inputs produce the same CID and FER/1 log.
2. **Gap test:** missing required keys yield deterministic Gap artifacts.
3. **Ambiguity test:** admissible alternatives are enumerated stably.
4. **Scope test:** sibling branches remain unchanged unless a published frame is
adopted.
5. **Fidelity test:** the declared equivalence predicate is machine-verifiable.
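The replay and gap checks might be expressed as small predicates over any `ms_map`-like callable. This is a sketch: results are assumed to be `(kind, bytes)` tuples, and `toy_map` is a hypothetical mapping used only to exercise the checks.

```python
import hashlib

def toy_map(concept_id: str, frame: dict):
    """Hypothetical mapping under test: encodes the concept ID as text."""
    if "encoding" not in frame:
        return ("gap", b"missing:encoding")
    return ("produced", concept_id.encode(frame["encoding"]))

def cid(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def replay_test(ms_map, concept_id, frame) -> bool:
    """Two runs over identical inputs must yield the same kind and CID."""
    first, second = ms_map(concept_id, frame), ms_map(concept_id, frame)
    return first[0] == second[0] and cid(first[1]) == cid(second[1])

def gap_test(ms_map, concept_id, frame_without_key) -> bool:
    """A frame lacking a required key must yield a Gap, and do so repeatably."""
    result = ms_map(concept_id, frame_without_key)
    return result[0] == "gap" and replay_test(ms_map, concept_id, frame_without_key)
```

The remaining checks (ambiguity, scope, fidelity) would follow the same shape, each comparing CIDs rather than in-memory objects.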
---
## Registry Alignment
MS/1 is production-registered in the CRS/1 concept registry. Implementations
MUST treat the following handles and digests as canonical when emitting or
validating graphs:
| Symbol | Registry Handle | Kind | SHA-256 Digest | Notes |
| --- | --- | --- | --- | --- |
| `MS/1` | `crs:concept/amduat.ms.surface@1` | Concept | `d140ac54367a88fa2459e3fedf0b2fde934f9ac73568f8a159e2b0c1c1828c70` | Primary mapping surface concept anchoring `MS_map` executions. |
| `ms.produces` | `crs:concept/amduat.ms.relation.produces@1` | Relation concept | `447a9f454d78f5b2ee300fe416138a864789e133b2eb9a84e32592aa9dd47965` | Annotates `Concept → Data` edges that capture Produced(Data) bytes. |
| `ms.requires_key` | `crs:concept/amduat.ms.relation.requires_key@1` | Relation concept | `a90295a8ca3006e062a5a1d5a6220330e53ba00677736b6f4a18efcec1169f6a` | Declares mandatory context key bindings for deterministic execution. |
| `ms.within_domain` | `crs:concept/amduat.ms.relation.within_domain@1` | Relation concept | `0993dff2531dd32ea32b98925bf8a5cbc88c88ed28cd3c3575e8affc84d7fa2d` | Expresses domain refinements used to disambiguate mappings. |
| `ms.fidelity_predicate` | `crs:concept/amduat.ms.relation.fidelity_predicate@1` | Relation concept | `7e159182789d89269b743c19da58d34acc3279d86650cfef83efd4f2c210c66a` | Binds outputs to the declared fidelity predicate concept. |
| `ms.byteValue` | `crs:concept/amduat.ms.relation.byte_value@1` | Relation concept | `4c43dd3a37ae695bac476e4dc62d8d8c2abda6c555668d4d863810d0053056c3` | Records byte concepts and their literal value bindings within the ladder corpus. |
| `ms.codePoint` | `crs:concept/amduat.ms.relation.code_point@1` | Relation concept | `bb7301d2fa0c5058cbd53019625bfd38ecb59e42010b64a9f0b0ada7dc494117` | Links textual concepts to Unicode code points under registered contexts. |
| `ms.symbolSequence` | `crs:concept/amduat.ms.relation.symbol_sequence@1` | Relation concept | `115035e5dc2db7e9ab2e4255f50aee56d222b04f7b7434dbbec76620cf36aa6d` | Declares ordered relations between code points/bytes when emitting strings. |
| `ms.upperCasePolicy` | `crs:concept/amduat.ms.relation.upper_case_policy@1` | Relation concept | `6f5165648c1069178c2e8a615bd24bd02a3a367df6f8e6ac21050f42eeea484c` | Encodes casing policies that constrain textual ladders. |
| `ms.titleCasePolicy` | `crs:concept/amduat.ms.relation.title_case_policy@1` | Relation concept | `7f10f109b578b69b079fc9f284ed73c69b82434b31db0a999a872d2b47958ae5` | Encodes title-casing policies enforced during deterministic mapping. |
The registry sidecar at `/amduat/registry/predicates.jsonl` mirrors these
entries, allowing auditors to verify digests independent of this specification.
Implementations MUST fail closed when encountering unregistered aliases for
these handles.
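Fail-closed resolution can be sketched with a lookup that raises instead of guessing. Only two rows of the table are reproduced here, and `REGISTRY`, `UnregisteredAlias`, and `resolve` are hypothetical names; a real implementation would load the registry sidecar rather than hard-code entries.

```python
# Symbol -> (registry handle, SHA-256 digest), copied from two rows of the table above.
REGISTRY = {
    "ms.produces": (
        "crs:concept/amduat.ms.relation.produces@1",
        "447a9f454d78f5b2ee300fe416138a864789e133b2eb9a84e32592aa9dd47965",
    ),
    "ms.requires_key": (
        "crs:concept/amduat.ms.relation.requires_key@1",
        "a90295a8ca3006e062a5a1d5a6220330e53ba00677736b6f4a18efcec1169f6a",
    ),
}

class UnregisteredAlias(LookupError):
    """Raised instead of guessing when a symbol is not in the registry."""

def resolve(symbol: str) -> tuple:
    """Return (handle, digest) for a registered symbol; fail closed otherwise."""
    try:
        return REGISTRY[symbol]
    except KeyError:
        raise UnregisteredAlias(symbol) from None
```

The lookup is exact, so a near-miss such as `ms.Produces` is rejected rather than normalised to a registered handle.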
---
## Phase 07 Cross-Stream Integration
Phase 07 workstreams bind their certificates, receipts, facts, overlays, and
domain dossiers to MS/1 contexts through a shared TypeTag grid:
* `/amduat/phases/ph07/notes/PH07-CIL-XMAP-001.md` (`CIL-X1`) and
`vectors/ph07/cil/PH07-CIL-XMAP-001.json` declare the authoritative
cross-stream mapping entries and TypeTag ranges that all semantic surfaces
must cite before reaching Draft Ready gates. The JSON registry is logged
under `PH07-EV-CIL-ATTEST-001` so downstream profiles can dereference the
same identifiers without ambiguity.
* `/amduat/phases/ph07/notes/PH07-CROSS-CHECK-001.md` records the harness that
enforces these bindings. The checklist stored in
`logs/ph07/evidence/cil/PH07-EV-CIL-ATTEST-001/PH07-CROSS-TV-001.md`
validates every FER/FCT/OI manifest against the refreshed XMAP IDs before the
`*-5` ledger exits (`FER-5`, `FCT-5`, `OI-5`) can advance.
* `/amduat/phases/ph07/notes/PH07-FER-SCHEMA-001.md` (ledger `FER-1`, evidence
`PH07-EV-FER-RUN-001`) pins `xmap_refs[]`, certificate anchors, and replay
bundles to the XMAP rows so every receipt explicitly declares which MS/1
context governed execution.
* `/amduat/phases/ph07/notes/PH07-FCT-SCHEMA-001.md` (ledger `FCT-1`, evidence
`PH07-EV-FCT-FACTS-001`) introduces the `trust_spine` block that carries the
required `xmap_ref`, `receipt_refs[]`, and `anchor_certs[]`, keeping fact
acceptance policies tied to the same mapping IDs.
* `/amduat/phases/ph07/notes/PH07-OI-HARNESS-001.md` (ledger `OI-5`, evidence
`PH07-EV-OI-VIEWS-001`) proves overlay descriptors and workspace views publish
`mapping_profile.xmap_refs[]` plus TGK edge expectations aligned with
`XMAP-CIL-CUSTOM-V1`.
* `/amduat/phases/ph07/notes/PH07-DOM-HARNESS-001.md` (ledger `DOM-5`, evidence
`PH07-EV-DOM-APPS-001`) extends the same guarantees to domain pilot dossiers,
confirming their overlays, facts, and receipts cite approved TypeTags and TGK
edge sets.
MS/1 implementations participating in PH07 MUST therefore emit the same mapping
handles and evidence references recorded in these notes so provenance can be
validated across CIL/FER/FCT/OI boundaries.
---
## Phase 05 Textual Ladder Scope and Evidence
Phase 05 extends MS/1 from abstract examples to a production ladder that binds
textual concepts to deterministic bytes. The ladder introduces:
* A `byte` concept family with 256 child concepts (`byte/0x00` … `byte/0xFF`)
whose CRS/1 relations emit single-octet Data nodes and optionally record the
radix context via `ms.byteValue` predicates.
* UTF-8 code point concepts that sequence byte concepts through
`ms.symbolSequence` relations while asserting `ms.within_domain` bindings to
Unicode 15 and UTF-8 domain concepts.
* Casing policy concepts (e.g., `allCaps`, `titleCase`) that require the
`string.casingPolicy` context key and advertise fidelity predicates so
downstream tooling can reject ambiguous casing decisions via
`ms.upperCasePolicy` and `ms.titleCasePolicy` handles.
* Dictionary word concepts implemented as FCS/1 recipes that concatenate code
points into byte strings, emitting FER/1 receipts that cite the governing
casing policy and context frame.
Predicate registries gain canonical handles (`ms.byteValue`, `ms.codePoint`,
`ms.symbolSequence`, `ms.produces`, `ms.requires_key`, `ms.upperCasePolicy`,
`ms.titleCasePolicy`) so tooling can resolve ladder edges without bespoke
enumerations. Missing casing bindings MUST yield Ambiguity artifacts
(`ERR_MS_POLICY_MISSING`/`ERR_MS_AMBIGUITY`); absent code points produce Gap
artifacts (`ERR_MS_GAP`); undeclared predicates raise
`ERR_MS_UNDECLARED_PREDICATE`.
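The error conditions above can be sketched as a single diagnostic function. The order of the checks, the function name `diagnose`, and its parameters are assumptions for illustration; only the error codes and the `string.casingPolicy` key come from the text.

```python
def diagnose(frame: dict, known_code_points: set, declared: set,
             text: str, predicate: str):
    """Return the first applicable error code, or None when the step may proceed."""
    if predicate not in declared:
        return "ERR_MS_UNDECLARED_PREDICATE"
    if "string.casingPolicy" not in frame:
        return "ERR_MS_POLICY_MISSING"   # reported via an Ambiguity artifact
    if any(ord(ch) not in known_code_points for ch in text):
        return "ERR_MS_GAP"              # reported via a Gap artifact
    return None

declared = {"ms.codePoint", "ms.symbolSequence"}
known = {0x41, 0x42}
print(diagnose({}, known, declared, "AB", "ms.codePoint"))  # ERR_MS_POLICY_MISSING
```

With a casing policy bound, the same call returns `None` for `"AB"` and `ERR_MS_GAP` for text containing a code point outside `known`.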
Evidence for the ladder is captured under
`/amduat/vectors/ph05/ms1-text/manifest.json` and the reserved
`/amduat/logs/ph05/evidence/ms1/` surfaces:
* `PH05-EV-MS-CTX-001/` — CTX/1 context frames with predicate registry vectors
(domain separator `AMDUAT:CTX\0`, reject `ERR_CTX_UNKNOWN_KEY`).
* `PH05-EV-MS-LADDER-001/` — Dual-run FER/1-backed positive ladders for bytes,
code points, and dictionary outputs with SA/PA guardrails.
* `PH05-EV-MS-ERRORS-001/` — Gap/ambiguity/missing policy/undeclared predicate
receipts mapped to ADR-006.
---
## Phase Alignment and Readiness
* **PH07 (Semantic Surfaces):** MS/1 is authoritative for every PH07 semantic
surface (CIL/FER/FCT/OI) and must be cited wherever mapping semantics are
referenced. PH07 workstreams MAY extend context vocabularies or registry rows
so long as they remain MS/1-conformant; no runtime commits are expected in
this phase beyond harness stubs and governance evidence.
* **PH08 (Reference Implementation):** The reference `ms_map` runtime, parity
harnesses, and subsystem wiring reside in the `KRN-2` campaign slotted for
Phase 08. PH08 SHALL use this specification verbatim, emitting FER/1 records
and exercising the acceptance checks in §Acceptance Checks.
* **Downstream pilots (PH09+):** Reproducible ML, CI/CD, data mesh, and notebook
pilots inherit the MS/1 contract by invoking the PH08 reference surface. Those
phases MAY add domain-specific context keys or ladders, but they MUST register
them through CRS/1 and capture evidence via the surfaces reserved in
§Phase 05 Textual Ladder Scope and Evidence.
Declaring these boundaries keeps the approval pathway clear: PH07 completes the
specification, PH08 proves the executable substrate, and later phases consume it
without reopening MS/1 fundamentals.
---
## Document History
<!-- Managed by tools/codex/document_history.py -->
* **0.1.0 (2025-11-10):** Initial draft capturing deterministic concept-to-data mapping surface.
* **0.1.1 (2025-11-11):** Add interface patterns for mapping functions and constrain auxiliary parameters.
* **0.1.2 (2025-11-12):** Formalize registry concept handles and digests for MS/1.
* **0.1.3 (2025-11-14):** Documented PH05 textual ladder scope, predicate handles, and evidence surfaces.
* **0.1.4 (2025-11-15):** Aligned MS/1 evidence references with CTX/1, ladder, and error reservations.
* **0.1.5 (2025-11-18):** Added ms.* predicate handles, CTX/1 domain separator, ADR-006 error mapping, and dual-run evidence guardrails.
* **0.1.6 (2025-11-18):** Added context-evolution guardrails and PH07→PH08 readiness boundaries.
* **0.1.7 (2025-11-19):** Clarified MIME/media-type relationship to MS context frames and execution bindings.
* **0.2.0 (2025-11-19):** Standardized metadata, headings, and evidence alignment per DOCSTD.
* **0.2.1 (2025-11-19):** Synced registry handles and documented PH07 XMAP/harness integration.
* **0.2.2 (2025-11-30):** Added DOCID header, pinned upstream draft dependencies, and referenced PH08 MS/1 evidence bundles.