diff --git a/tier1/asl-core-index-1.md b/tier1/asl-core-index-1.md new file mode 100644 index 0000000..772fa3b --- /dev/null +++ b/tier1/asl-core-index-1.md @@ -0,0 +1,233 @@ +# ASL/1-CORE-INDEX — Semantic Index Model + +Status: Draft +Owner: Niklas Rydberg +Version: 0.1.0 +SoT: No +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [deterministic, index, semantics] + + + +**Document ID:** `ASL/1-CORE-INDEX` +**Layer:** L0.5 — Semantic mapping over ASL/1-CORE values (no storage / encoding / lifecycle) + +**Depends on (normative):** + +* `ASL/1-CORE` +* `ASL/1-STORE` + +**Informative references:** + +* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts +* `ENC/ASL-CORE-INDEX/1` — bytes-on-disk encoding profile +* `ASL/INDEX-ACCEL/1` — acceleration semantics (routing, filters, sharding) +* `ASL/LOG/1` — append-only semantic log (segment visibility) +* `TGK/1` — TGK edge visibility and traversal alignment +* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment) + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119. + +ASL/1-CORE-INDEX defines **semantic meaning only**. It does not define storage formats, on-disk encoding, or operational lifecycle. Those belong to ASL-STORE-INDEX, ASL/LOG/1, and ENC-ASL-CORE-INDEX. 
+ +--- + +## 1. Purpose & Non-Goals + +### 1.1 Purpose + +ASL/1-CORE-INDEX defines the **semantic model** for indexing artifacts: + +* It specifies what it means to map an artifact identity to a byte location. +* It defines visibility, immutability, and shadowing semantics. +* It ensures deterministic lookup for a fixed snapshot and log prefix. + +### 1.2 Non-goals + +ASL/1-CORE-INDEX explicitly does **not** define: + +* On-disk layouts, segment files, or memory representations. +* Block allocation, packing, GC, or lifecycle rules. +* Snapshot implementation details, checkpoints, or log storage. +* Performance optimizations (bloom filters, sharding, SIMD). +* Federation, provenance, or execution semantics. + +--- + +## 2. Terminology + +* **Artifact** — ASL/1 immutable value defined in ASL/1-CORE. +* **Reference** — ASL/1 content address of an Artifact (hash_id + digest). +* **StoreConfig** — `{ encoding_profile, hash_id }` fixed per StoreSnapshot (ASL/1-STORE). +* **Block** — immutable storage unit containing artifact bytes. +* **BlockID** — opaque identifier for a block. +* **ArtifactExtent** — `(BlockID, offset, length)` identifying a byte slice within a block. +* **ArtifactLocation** — ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes. +* **Snapshot** — a checkpointed StoreSnapshot (ASL/1-STORE) used as a base state. +* **Append-Only Log** — ordered sequence of index-visible mutations after a snapshot. +* **CURRENT** — effective state after replaying a log prefix on a snapshot. + +--- + +## 3. Core Mapping Semantics + +### 3.1 Index Mapping + +The index defines a semantic mapping: + +``` +Reference -> ArtifactLocation +``` + +For any visible `Reference`, there is exactly one `ArtifactLocation` at a given CURRENT state. + +### 3.2 Determinism + +For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be deterministic. No nondeterministic input may affect index semantics. 
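A minimal sketch of the mapping in §3.1, assuming Python dataclasses and an in-memory `dict` standing in for block storage. The names `read_artifact`, `lookup`, and `blocks`, as well as the string-typed block identifiers, are illustrative assumptions and not part of ASL/1-CORE-INDEX:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Reference:
    """Content address: hash_id plus digest (see Terminology)."""
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class ArtifactExtent:
    """A byte slice (BlockID, offset, length) within one block."""
    block_id: str  # hypothetical: BlockID is opaque in the spec
    offset: int
    length: int


# An ArtifactLocation is an ordered sequence of extents.
ArtifactLocation = Tuple[ArtifactExtent, ...]


def read_artifact(blocks: Dict[str, bytes], location: ArtifactLocation) -> bytes:
    """Concatenate extents in order; reject ranges outside their block."""
    parts = []
    for ext in location:
        data = blocks[ext.block_id]
        if ext.length <= 0 or ext.offset < 0 or ext.offset + ext.length > len(data):
            raise ValueError(f"invalid extent: {ext}")
        parts.append(data[ext.offset:ext.offset + ext.length])
    return b"".join(parts)


def lookup(index: Dict[Reference, ArtifactLocation],
           ref: Reference) -> Optional[ArtifactLocation]:
    """Pure function of its inputs: same index state, same answer."""
    return index.get(ref)
```

Because `lookup` reads nothing but the index state it is handed, repeating it against the same state cannot produce a different result, which is the property §3.2 requires.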
+ +### 3.3 StoreConfig Consistency + +All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`. Encoding profiles MUST allow variable-length digests; the digest length MUST be either explicit in the encoding or derivable from `hash_id` and StoreConfig. + +--- + +## 4. ArtifactLocation Semantics + +* An ArtifactLocation is an **ordered list** of ArtifactExtents. +* Each extent references immutable bytes within a block. +* The artifact bytes are defined by **concatenating extents in order**. +* A visible ArtifactLocation MUST be **non-empty** and MUST fully cover the artifact byte sequence with no gaps or extra bytes. +* Tombstone entries are visible but MUST have no ArtifactLocation; they only shadow prior entries. +* Extents MUST have `length > 0` and MUST reference valid byte ranges within their blocks. +* Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact. +* An ArtifactLocation is valid only while all referenced blocks are retained. +* ASL/1-CORE-INDEX does not define how blocks are allocated or sealed; it only requires that referenced bytes are immutable for the lifetime of the mapping. + +--- + +## 5. Visibility Model + +An index entry is **visible** at CURRENT if and only if: + +1. The entry is contained in a sealed segment whose seal record is admitted in the ordered log prefix for CURRENT (or anchored in the snapshot). +2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules). + +Visibility is binary; entries are either visible or not visible. + +--- + +## 6. Snapshot and Log Semantics + +Snapshots provide a base mapping of sealed segments; the append-only log admits later segment seals and policy records that define subsequent changes. 
+ +The index state for a given CURRENT is defined as: + +``` +Index(CURRENT) = Index(snapshot) + replay(log_prefix) +``` + +Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entries are semantically equivalent once replayed. + +--- + +## 7. Immutability and Shadowing + +### 7.1 Immutability + +* Index entries are never mutated. +* Once visible, an entry’s meaning does not change. +* Referenced bytes are immutable for the lifetime of the entry. + +### 7.2 Shadowing + +* Later entries MAY shadow earlier entries with the same Reference. +* Precedence is determined solely by log order. +* Snapshot boundaries do not alter shadowing semantics. + +--- + +## 8. Tombstones (Optional) + +Tombstone entries MAY be used to invalidate prior mappings. + +* A tombstone shadows earlier entries for the same Reference. +* Visibility rules are identical to regular entries. +* Encoding is optional and defined by ENC-ASL-CORE-INDEX if used. + +--- + +## 9. Determinism Guarantees + +For fixed: + +* StoreConfig +* Snapshot +* Log prefix + +ASL/1-CORE-INDEX guarantees: + +* Deterministic lookup results +* Deterministic shadowing resolution +* Deterministic visibility + +--- + +## 10. Normative Invariants + +Conforming implementations MUST enforce: + +1. No visibility without a sealed segment whose seal record is log-admitted (or snapshot-anchored). +2. No mutation of visible index entries. +3. Referenced bytes remain immutable for the entry’s lifetime. +4. Shadowing follows strict log order. +5. Snapshot + log replay uniquely defines CURRENT. +6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun), except for tombstones which have no ArtifactLocation. + +Violation of any invariant constitutes index corruption. + +--- + +## 11. 
Relationship to Other Specifications + +| Layer | Responsibility | +| ------------------ | ---------------------------------------------------------- | +| ASL/1-CORE | Artifact semantics and identity | +| ASL/1-STORE | StoreSnapshot and put/get logical model | +| ASL/1-CORE-INDEX | Semantic mapping of Reference → ArtifactLocation | +| ASL-STORE-INDEX | Lifecycle, replay, and visibility contracts | +| ENC-ASL-CORE-INDEX | On-disk encoding for index segments and records | + +--- + +## 12. Summary + +ASL/1-CORE-INDEX specifies the semantic meaning of the index: + +* It maps artifact References to byte locations deterministically. +* It defines visibility and shadowing rules across snapshot + log replay. +* It guarantees immutability and deterministic lookup. + +It answers one question: + +> *Given a Reference and a CURRENT state, where are the bytes?* diff --git a/tier1/asl-index-accel-1.md b/tier1/asl-index-accel-1.md new file mode 100644 index 0000000..bb53c21 --- /dev/null +++ b/tier1/asl-index-accel-1.md @@ -0,0 +1,296 @@ +# ASL/INDEX-ACCEL/1 — Index Acceleration Semantics + +Status: Draft +Owner: Niklas Rydberg +Version: 0.1.0 +SoT: No +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [deterministic, index, acceleration] + + + +**Document ID:** `ASL/INDEX-ACCEL/1` +**Layer:** L1 — Acceleration rules over index semantics (no storage / encoding) + +**Depends on (normative):** + +* `ASL/1-CORE-INDEX` +* `ASL/LOG/1` + +**Informative references:** + +* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts +* `ENC/ASL-CORE-INDEX/1` — bytes-on-disk encoding profile +* `TGK/1` — TGK semantics and visibility alignment +* `TGK/1-CORE` — EdgeBody and EdgeTypeId definitions + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. 
TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119. + +ASL/INDEX-ACCEL/1 defines **acceleration semantics only**. It MUST NOT change index meaning defined by ASL/1-CORE-INDEX. + +--- + +## 1. Purpose + +ASL/INDEX-ACCEL/1 defines **acceleration mechanisms** used by ASL-based indexes, including: + +* Routing keys +* Sharding +* Filters (Bloom, XOR, Ribbon, etc.) +* SIMD execution +* Hash recasting + +All mechanisms defined herein are **observationally invisible** to ASL/1-CORE-INDEX semantics. + +--- + +## 2. Scope + +Applies to: + +* Artifact indexes (ASL) +* Projection and graph indexes (e.g., TGK) +* Any index layered on ASL/1-CORE-INDEX semantics + +Does **not** define: + +* Artifact or edge identity +* Snapshot semantics +* Storage lifecycle +* Encoding details + +--- + +## 3. Canonical Key vs Routing Key + +### 3.1 Canonical Key + +The **Canonical Key** uniquely identifies an indexable entity. + +Examples: + +* Artifact: `Reference` +* TGK Edge: canonical key defined by `TGK/1` and `TGK/1-CORE` (opaque here) + +Properties: + +* Defines semantic identity +* Used for equality, shadowing, and tombstones +* Stable and immutable +* Fully compared on index match + +### 3.2 Routing Key + +The **Routing Key** is a **derived, advisory key** used exclusively for acceleration. 
+ +Properties: + +* Derived deterministically from Canonical Key and optional attributes +* MAY be used for sharding, filters, SIMD layouts +* MUST NOT affect index semantics +* MUST be verified by full Canonical Key comparison on match + +Formal rule: + +``` +CanonicalKey determines correctness +RoutingKey determines performance +``` + +--- + +## 4. Filter Semantics + +### 4.1 Advisory Nature + +All filters are **advisory only**. + +Rules: + +* False positives are permitted +* False negatives are forbidden +* Filter behavior MUST NOT affect correctness + +Invariant: + +``` +Filter miss => key is definitely absent +Filter hit => key may be present +``` + +### 4.2 Filter Inputs + +Filters operate over **Routing Keys**, not Canonical Keys. + +A Routing Key MAY incorporate: + +* Hash of Canonical Key +* Artifact type tag (if present) +* TGK `EdgeTypeId` or other immutable classification attributes (TGK/1-CORE) +* Direction, role, or other immutable classification attributes + +Absence of optional attributes MUST be encoded explicitly. + +### 4.3 Filter Construction + +* Filters are built only over **sealed, immutable segments** +* Filters are immutable once built +* Filter construction MUST be deterministic +* Filter state MUST be covered by segment checksums +* Filters SHOULD be snapshot-scoped or versioned with their segment to avoid + unbounded false-positive accumulation over time + +--- + +## 5. Sharding Semantics + +### 5.1 Observational Invisibility + +Sharding is a **mechanical partitioning** of the index. + +Invariant: + +``` +LogicalIndex = union(all shards) +``` + +Rules: + +* Shards MUST NOT affect lookup results +* Shard count and boundaries may change over time +* Rebalancing MUST preserve lookup semantics + +### 5.2 Shard Assignment + +Shard assignment MAY be based on: + +* Hash of Canonical Key +* Routing Key +* Composite routing strategies + +Shard selection MUST be deterministic per snapshot. + +--- + +## 6. 
Hashing and Hash Recasting + +### 6.1 Hashing + +Hashes MAY be used for routing, filtering, or SIMD layout. + +Hashes MUST NOT be treated as identity. + +### 6.2 Hash Recasting + +Hash recasting (changing hash functions or seeds) is permitted if: + +1. It is deterministic +2. It does not change Canonical Keys +3. It does not affect index semantics + +Recasting is equivalent to rebuilding acceleration structures. + +--- + +## 7. SIMD Execution + +SIMD operations MAY be used to: + +* Evaluate filters +* Compare routing keys +* Accelerate scans + +Rules: + +* SIMD must operate only on immutable data +* SIMD must not short-circuit semantic checks +* SIMD must preserve deterministic behavior + +--- + +## 8. Multi-Dimensional Routing Examples (Normative) + +### 8.1 Artifact Index + +* Canonical Key: `Reference` +* Routing Key components: + + * `H(Reference)` + * `type_tag` (if present) + * `has_typetag` + +### 8.2 TGK Edge Index + +* Canonical Key: defined by `TGK/1` and `TGK/1-CORE` (opaque here) +* Routing Key components: + + * `H(CanonicalEdgeKey)` + * `EdgeTypeId` (if present in the TGK profile) + * Direction or role (optional) + +--- + +## 9. Snapshot Interaction + +Acceleration structures: + +* MUST respect snapshot visibility rules +* MUST operate over the same sealed segments visible to the snapshot +* MUST NOT bypass tombstones or shadowing + +Snapshot cuts apply **after** routing and filtering. + +--- + +## 10. Normative Invariants + +1. Canonical Keys define identity and correctness +2. Routing Keys are advisory only +3. Filters may never introduce false negatives +4. Sharding is observationally invisible +5. Hashes are not identity +6. SIMD is an execution strategy, not a semantic construct +7. All acceleration is deterministic per snapshot + +--- + +## 11. Non-Goals + +ASL/INDEX-ACCEL/1 does not define: + +* Specific filter algorithms +* Memory layout +* CPU instruction selection +* Encoding formats +* Federation policies + +--- + +## 12. 
Summary + +ASL/INDEX-ACCEL/1 establishes a strict contract: + +> All acceleration exists to make the index faster, never different. + +It formalizes Canonical vs Routing keys and constrains filters, sharding, hashing, and SIMD so that correctness is preserved under all optimizations. diff --git a/tier1/asl-indexes-1.md b/tier1/asl-indexes-1.md new file mode 100644 index 0000000..f0d7c7d --- /dev/null +++ b/tier1/asl-indexes-1.md @@ -0,0 +1,139 @@ +# ASL/INDEXES/1 -- Index Taxonomy and Relationships + +Status: Draft +Owner: Architecture +Version: 0.1.0 +SoT: No +Last Updated: 2025-01-17 +Linked Phase Pack: N/A +Tags: [indexes, content, structural, materialization] + + + +**Document ID:** `ASL/INDEXES/1` +**Layer:** L2 -- Index taxonomy (no encoding) + +**Depends on (normative):** + +* `ASL/1-CORE-INDEX` +* `ASL/STORE-INDEX/1` + +**Informative references:** + +* `ASL/SYSTEM/1` +* `TGK/1` +* `ENC/ASL-CORE-INDEX/1` + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119. + +ASL/INDEXES/1 defines index roles and relationships. It does not define encodings or storage layouts. + +--- + +## 1. Purpose + +This document defines the minimal set of indexes used by ASL systems and their dependency relationships. + +--- + +## 2. 
Index Taxonomy (Normative) + +ASL systems use three distinct indexes: + +### 2.1 Content Index + +Purpose: map semantic identity to bytes. + +``` +ArtifactKey -> ArtifactLocation +``` + +Properties: + +* Snapshot-relative and append-only +* Deterministic replay +* Optional tombstone shadowing + +This is the ASL/1-CORE-INDEX and is the only index that governs visibility. + +### 2.2 Structural Index + +Purpose: map structural identity to a derivation DAG node. + +``` +SID -> DAG node +``` + +Properties: + +* Deterministic and rebuildable +* Does not imply materialization +* May be in-memory or persisted + +### 2.3 Materialization Cache + +Purpose: record previously materialized content for a structural identity. + +``` +SID -> ArtifactKey +``` + +Properties: + +* Redundant and safe to drop +* Recomputable from DAG + content index +* Pure performance optimization + +--- + +## 3. Dependency Rules (Normative) + +Dependencies MUST follow this direction: + +``` +Structural Index -> Materialization Cache -> Content Index +``` + +Rules: + +* The Content Index MUST NOT depend on the Structural Index. +* The Structural Index MUST NOT depend on stored bytes. +* The Materialization Cache MAY depend on both. + +--- + +## 4. PUT/GET Interaction (Informative) + +* PUT registers structure (if used), resolves to an ArtifactKey, and updates the Content Index. +* GET consults only the Content Index and reads bytes from the store. +* The Structural Index and Materialization Cache are optional optimizations for PUT. + +--- + +## 5. 
Non-Goals + +ASL/INDEXES/1 does not define: + +* Encodings for any index +* Storage layout or sharding +* Query operators or traversal semantics diff --git a/tier1/asl-log-1.md b/tier1/asl-log-1.md new file mode 100644 index 0000000..d0dfd49 --- /dev/null +++ b/tier1/asl-log-1.md @@ -0,0 +1,314 @@ +# ASL/LOG/1 — Append-Only Semantic Log + +Status: Draft +Owner: Niklas Rydberg +Version: 0.1.0 +SoT: No +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [deterministic, log, snapshot] + + + +**Document ID:** `ASL/LOG/1` +**Layer:** L1 — Domain log semantics (no transport) + +**Depends on (normative):** + +* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts (pending spec) + +**Informative references:** + +* `ASL/1-CORE-INDEX` — index semantics +* `TGK/1` — TGK edge visibility and traversal alignment +* `ENC/ASL-LOG/1` — bytes-on-disk encoding profile +* `ENC/ASL-CORE-INDEX/1` — index segment encoding +* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment) + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119. + +ASL/LOG/1 defines **semantic log behavior**. It does not define transport, replication protocols, or storage layout. + +--- + +## 1. 
Purpose + +ASL/LOG/1 defines the **authoritative, append-only log** for an ASL domain. + +The log records **semantic commits** that affect: + +* Index segment visibility +* Tombstone policy +* Snapshot anchoring +* Optional publication metadata + +The log is the **sole source of truth** for reconstructing CURRENT state. + +--- + +## 2. Core Properties (Normative) + +An ASL log MUST be: + +1. Append-only +2. Strictly ordered +3. Deterministically replayable +4. Hash-chained +5. Snapshot-anchorable +6. Binary encoded per `ENC-ASL-LOG` +7. Forward-compatible + +--- + +## 3. Log Model + +### 3.1 Log Sequence + +Each record has a monotonically increasing `logseq`: + +``` +logseq: uint64 +``` + +* Assigned by the domain authority +* Total order within a domain +* Never reused + +### 3.2 Hash Chain + +Each record commits to the previous record: + +``` +record_hash = H(prev_record_hash || logseq || record_type || payload_len || payload) +``` + +This enables tamper detection, witness signing, and federation verification. + +### 3.3 Record Envelope + +All log records share a common envelope whose **exact byte layout** is defined +in `ENC-ASL-LOG`. The envelope MUST include: + +* `logseq` (monotonic sequence number) +* `record_type` (type tag) +* `payload_len` (bytes) +* `payload` (type-specific bytes) +* `record_hash` (hash-chained integrity) + +--- + +## 4. Record Types (Normative) + +### 4.0 Common Payload Encoding (Informative) + +The byte-level payload schemas are defined in `ENC-ASL-LOG`. The shared +artifact reference encoding is: + +```c +typedef struct { + uint32_t hash_id; + uint16_t digest_len; + uint16_t reserved0; // must be 0 + uint8_t digest[digest_len]; +} ArtifactRef; +``` + +### 4.1 SEGMENT_SEAL + +Declares an index segment visible. + +Payload (encoding): + +```c +typedef struct { + uint64_t segment_id; + uint8_t segment_hash[32]; +} SegmentSealPayload; +``` + +Semantics: + +* From this `logseq` onward, the referenced segment is visible for lookup and replay. 
+* Segment MUST be immutable. +* All referenced blocks MUST already be sealed. +* Segment contents are not re-logged. + +### 4.2 TOMBSTONE + +Declares an artifact inadmissible under domain policy. + +Payload (encoding): + +```c +typedef struct { + ArtifactRef artifact; + uint32_t scope; + uint32_t reason_code; +} TombstonePayload; +``` + +Semantics: + +* Does not delete data. +* Shadows prior visibility. +* Applies from this logseq onward. + +### 4.3 TOMBSTONE_LIFT + +Supersedes a previous tombstone. + +Payload (encoding): + +```c +typedef struct { + ArtifactRef artifact; + uint64_t tombstone_logseq; +} TombstoneLiftPayload; +``` + +Semantics: + +* References an earlier TOMBSTONE. +* Does not erase history. +* Only affects CURRENT at or above this logseq. + +### 4.4 SNAPSHOT_ANCHOR + +Binds semantic state to a snapshot. + +Payload (encoding): + +```c +typedef struct { + uint64_t snapshot_id; + uint8_t root_hash[32]; +} SnapshotAnchorPayload; +``` + +Semantics: + +* Defines a replay checkpoint. +* Enables log truncation below anchor with care. + +### 4.5 ARTIFACT_PUBLISH (Optional) + +Marks an artifact as published. + +Payload (encoding): + +```c +typedef struct { + ArtifactRef artifact; +} ArtifactPublishPayload; +``` + +Semantics: + +* Publication is domain-local. +* Federation layers may interpret this metadata. + +### 4.6 ARTIFACT_UNPUBLISH (Optional) + +Withdraws publication. + +Payload (encoding): + +```c +typedef struct { + ArtifactRef artifact; +} ArtifactUnpublishPayload; +``` + +--- + +## 5. Replay Semantics (Normative) + +To reconstruct CURRENT: + +1. Load latest snapshot anchor (if any). +2. Initialize visible segments from that snapshot. +3. Replay all log records with `logseq > snapshot.logseq`. +4. Apply records in order: + + * SEGMENT_SEAL -> add segment + * TOMBSTONE -> update policy state + * TOMBSTONE_LIFT -> override policy + * PUBLISH/UNPUBLISH -> update visibility metadata + +Replay MUST be deterministic. + +--- + +## 6. 
Index Interaction + +* Index segments contain index entries. +* The log never records individual index entries. +* Visibility is controlled solely by SEGMENT_SEAL. +* Index rebuild = scan visible segments + apply policy. + +--- + +## 7. Garbage Collection Constraints + +* A segment may be GC'd only if: + + * No snapshot references it. + * No log replay <= CURRENT requires it. + +* Log truncation is only safe at SNAPSHOT_ANCHOR boundaries. + +--- + +## 8. Versioning & Extensibility + +* Unknown record types MUST be skipped and MUST NOT break replay. +* Payloads are opaque outside their type. +* New record types may be added in later versions. + +--- + +## 9. Non-Goals + +ASL/LOG/1 does not define: + +* Federation protocols +* Network replication +* Witness signatures +* Block-level events +* Hydration / eviction +* Execution receipts + +--- + +## 10. Invariant (Informative) + +> If it affects visibility, admissibility, or authority, it goes in the log. +> If it affects layout or performance, it does not. + +--- + +## 11. Summary + +ASL/LOG/1 defines the minimal semantic log needed to reconstruct CURRENT. + +If it affects visibility or admissibility, it goes in the log. If it affects layout or performance, it does not. 
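The replay steps in §5 and the skip rule in §8 can be sketched as follows. This is an illustrative sketch only: the numeric record-type tags, the tuple shape `(logseq, record_type, payload)`, the simplified payloads (a bare segment id or artifact key), and the `State` class are assumptions made for the example. The real values and layouts belong to `ENC-ASL-LOG`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical type tags; the real assignments are defined in ENC-ASL-LOG.
SEGMENT_SEAL, TOMBSTONE, TOMBSTONE_LIFT = 1, 2, 3

Record = Tuple[int, int, object]  # (logseq, record_type, payload)


@dataclass
class State:
    """Simplified CURRENT: visible segment ids and tombstoned keys."""
    visible_segments: List[int] = field(default_factory=list)
    tombstoned: Set[str] = field(default_factory=set)


def replay(records: Iterable[Record], snapshot_logseq: int = 0,
           state: Optional[State] = None) -> State:
    """Apply records with logseq > snapshot_logseq, in log order."""
    state = State() if state is None else state
    last = snapshot_logseq
    for logseq, record_type, payload in records:
        if logseq <= snapshot_logseq:
            continue  # already covered by the snapshot
        if logseq <= last:
            raise ValueError("log records must be strictly ordered")
        last = logseq
        if record_type == SEGMENT_SEAL:
            state.visible_segments.append(payload)
        elif record_type == TOMBSTONE:
            state.tombstoned.add(payload)
        elif record_type == TOMBSTONE_LIFT:
            state.tombstoned.discard(payload)
        # Unknown record types are skipped and do not break replay.
    return state
```

Replaying the same records from the same starting state yields an equal `State`, which is the determinism that §5 asks for. Publication records are omitted here for brevity.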
diff --git a/tier1/asl-store-index-1.md b/tier1/asl-store-index-1.md new file mode 100644 index 0000000..eecda05 --- /dev/null +++ b/tier1/asl-store-index-1.md @@ -0,0 +1,414 @@ +# ASL/STORE-INDEX/1 — Store Semantics and Contracts for ASL Core Index + +Status: Draft +Owner: Niklas Rydberg +Version: 0.1.0 +SoT: No +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [deterministic, index, log, storage] + + + +**Document ID:** `ASL/STORE-INDEX/1` +**Layer:** L1 — Store lifecycle and replay contracts (no encoding) + +**Depends on (normative):** + +* `ASL/1-CORE-INDEX` — semantic index model +* `ASL/LOG/1` — append-only log semantics + +**Informative references:** + +* `ENC/ASL-CORE-INDEX/1` — index segment encoding +* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment) +* `TGK/1` — TGK semantics and visibility alignment +* `TGK/1-CORE` — EdgeBody and EdgeTypeId definitions + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 1. Purpose + +This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX. 
+ +It specifies: + +* **Block lifecycle**: creation, sealing, retention, GC +* **Index segment lifecycle**: creation, append, seal, visibility +* **Snapshot identity and log positions** for deterministic replay +* **Append-only log semantics** +* **Lookup, visibility, and crash recovery rules** +* **Small vs large block handling** + +It **does not define encoding** (see `ENC/ASL-CORE-INDEX/1`) or semantic mapping (see `ASL/1-CORE-INDEX`). + +**Informative references:** + +* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment) +* `TGK/1` — TGK semantics and visibility alignment +* `TGK/1-CORE` — EdgeBody and EdgeTypeId definitions + +--- + +## 2. Scope + +Covers: + +* Lifecycle of **blocks** and **index entries** +* Snapshot and CURRENT consistency guarantees +* Deterministic replay and recovery +* GC and tombstone semantics +* Packing policy for small vs large artifacts + +Excludes: + +* Disk-level encoding +* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1) +* Memory residency or caching +* Federation, PEL, or TGK semantics (see `TGK/1` and `TGK/1-CORE`) + +--- + +## 3. Core Concepts + +### 3.1 Block + +* **Definition:** Immutable storage unit containing artifact bytes. +* **Identifier:** BlockID (opaque, unique). +* **Properties:** + + * Once sealed, contents never change. + * Can be referenced by multiple artifacts. + * May be pinned by snapshots for retention. + * Allocation method is implementation-defined (e.g., hash or sequence). + +### 3.2 Index Segment + +Segments group index entries and provide **persistence and recovery units**. + +* **Open segment:** accepting new index entries, not visible for lookup. +* **Sealed segment:** closed for append, log-visible, snapshot-pinnable. +* **Segment components:** header, optional bloom filter, index records, footer. +* **Segment visibility:** only after seal and log append. 
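The open/sealed distinction in §3.2 can be sketched as a small state machine. The `Segment` class, the `seal_log` list, and `is_visible` are hypothetical names used for illustration; an actual store would record the seal as a SEGMENT_SEAL record as described in `ASL/LOG/1`:

```python
from typing import List, Tuple


class Segment:
    """Index segment that accepts entries until it is sealed."""

    def __init__(self, segment_id: int) -> None:
        self.segment_id = segment_id
        self.entries: List[Tuple[str, str]] = []
        self.sealed = False

    def append(self, key: str, location: str) -> None:
        if self.sealed:
            raise RuntimeError("sealed segments are closed for append")
        self.entries.append((key, location))

    def seal(self) -> None:
        self.sealed = True


def is_visible(segment: Segment, seal_log: List[int]) -> bool:
    """Visible only after the seal and the seal's log append."""
    return segment.sealed and segment.segment_id in seal_log
```

Keeping the seal flag and the log append as two separate conditions mirrors the rule that a segment sealed on disk but not yet logged stays invisible to lookup.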
+ +### 3.3 Append-Only Log + +All store-visible mutations are recorded in a **strictly ordered, append-only log**: + +* Entries include: + + * Index additions + * Tombstones + * Segment seals +* Log is replayable to reconstruct CURRENT. +* Log semantics are defined in `ASL/LOG/1`. + +### 3.4 Snapshot Identity and Log Position + +To make CURRENT referencable and replayable, ASL-STORE-INDEX defines: + +* **SnapshotID**: opaque, immutable identifier for a snapshot. +* **LogPosition**: monotonic integer position in the append-only log. +* **IndexState**: `(SnapshotID, LogPosition)`. + +Deterministic replay is defined as: + +``` +Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition]) +``` + +Snapshots and log positions are required for checkpointing, federation, and deterministic recovery. + +### 3.5 Artifact Location + +* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block. +* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes. +* Multi-extent locations allow a single artifact to be striped across multiple blocks. + +--- + +## 4. PUT/GET Contract (Normative) + +### 4.1 PUT Signature + +``` +put(artifact) -> (ArtifactKey, IndexState) +``` + +* `ArtifactKey` is the content identity (ASL/1-CORE-INDEX). +* `IndexState = (SnapshotID, LogPosition)` after the PUT is admitted. + +### 4.2 PUT Semantics + +1. **Structural registration (if applicable)**: if a structural index (SID -> DAG) exists, it MUST register the artifact and reuse existing SID entries. +2. **Materialization (if applicable)**: if the artifact is lazy, materialize deterministically to derive `ArtifactKey`. +3. **Deduplication**: lookup `ArtifactKey` at CURRENT. If present, PUT MUST succeed without writing bytes or adding a new index entry. +4. **Storage**: if absent, write bytes to one or more sealed blocks and produce `ArtifactLocation`. +5. 
**Index mutation**: append an index entry mapping `ArtifactKey -> ArtifactLocation` and record visibility via log order. + +### 4.3 PUT Guarantees + +* PUT is idempotent for identical artifacts. +* No visible index entry points to mutable or missing bytes. +* Visibility follows log order and seal rules defined in this document. + +### 4.4 GET Signature + +``` +get(ArtifactKey, IndexState?) -> bytes | NOT_FOUND +``` + +* `IndexState` defaults to CURRENT when omitted. + +### 4.5 GET Semantics + +1. Resolve `ArtifactKey -> ArtifactLocation` using `Index(snapshot, log_prefix)`. +2. If no entry exists, return `NOT_FOUND`. +3. Otherwise, read exactly the referenced `(BlockID, offset, length)` bytes and return them verbatim. + +GET MUST NOT mutate state or trigger materialization. + +### 4.6 Failure Semantics + +* Partial writes MUST NOT become visible. +* Replay of snapshot + log after crash MUST reconstruct a valid CURRENT. +* Implementations MAY use caching, but MUST preserve determinism. + +--- + +## 5. Block Lifecycle Semantics + +| Event | Description | Semantic Guarantees | +| ------------------ | ------------------------------------- | ------------------------------------------------------------- | +| Creation | Block allocated; bytes may be written | Not visible to index until sealed | +| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index | +| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed | +| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed | + +Notes: + +* Sealing ensures any index entry referencing the block is immutable. +* Retention is driven by snapshot and log visibility rules. +* GC must **never violate CURRENT reconstruction guarantees**. + +--- + +## 6. Segment Lifecycle Semantics + +### 6.1 Creation + +* Open segment is allocated. +* Index entries appended in log order. 
+* Entries are invisible until segment seal and log append. + +### 6.2 Seal + +* Segment is closed to append. +* Seal record is written to append-only log. +* Segment becomes visible for lookup. +* Sealed segment may be snapshot-pinned. + +### 6.3 Snapshot Interaction + +* Snapshots capture sealed segments. +* Open segments need not survive snapshot. +* Segments below snapshot are replay anchors. + +--- + +## 7. Visibility and Lookup Semantics + +### 7.1 Visibility Rules + +* Entry visible **iff**: + + * The block is sealed. + * Log record exists at position ≤ CURRENT. + * Segment seal recorded in log. + +* Entries above CURRENT or referencing unsealed blocks are invisible. + +### 7.2 Lookup Semantics + +To resolve an `ArtifactKey`: + +1. Identify all visible segments ≤ CURRENT. +2. Search segments in **reverse seal-log order** (highest seal log position first). +3. Return first matching entry. +4. Respect tombstones to shadow prior entries. + +Determinism: + +* Lookup results are identical across platforms given the same snapshot and log prefix. +* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**. + +--- + +## 8. Snapshot Interaction + +* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time. +* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration. +* CURRENT is reconstructed as: + +``` +CURRENT = snapshot_state + replay(log) +``` + +Segment and block visibility rules: + +| Entity | Visible in snapshot | Visible in CURRENT | +| -------------------- | ---------------------------- | ------------------------------ | +| Open segment/block | No | Only after seal and log append | +| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log | +| Tombstone | Yes, if log-recorded | Yes, shadows prior entries | + +--- + +## 9. Garbage Collection + +Eligibility for GC: + +* Segments: sealed, no references from CURRENT or snapshots. 
+* Blocks: unpinned, unreferenced by any segment or artifact. + +Rules: + +* GC is safe **only on sealed segments and blocks**. +* Must respect snapshot pins. +* Tombstones may aid in invalidating unreachable blocks. +* Snapshots retained for provenance or receipt verification MUST remain pinned. + +Outcome: + +* GC never violates CURRENT reconstruction. +* Blocks can be reclaimed without breaking provenance. + +--- + +## 10. Tombstone Semantics + +* Optional marker to invalidate prior mappings. +* Visibility rules identical to regular index entries. +* Used to maintain deterministic CURRENT in face of shadowing or deletions. + +--- + +## 11. Small vs Large Block Handling + +### 11.1 Definitions + +| Term | Meaning | +| ----------------- | --------------------------------------------------------------------- | +| **Small block** | Block containing artifact bytes below a threshold `T_small`. | +| **Large block** | Block containing artifact bytes ≥ `T_small`. | +| **Mixed segment** | Segment containing both small and large blocks (discouraged). | +| **Packing** | Combining multiple small artifacts into a single physical block. | +| **BlockID** | Opaque identifier for a block; addressing is identical for all sizes. | + +Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers. +`T_small` is configurable per deployment. + +### 11.2 Packing Rules + +1. **Small blocks may be packed together** to reduce storage overhead. +2. **Large blocks are never packed with other artifacts**. +3. Mixed segments are **allowed but discouraged**; implementations MAY warn when mixing occurs. + +### 11.3 Segment Allocation Rules + +1. Small blocks are allocated into segments optimized for packing efficiency. +2. Large blocks are allocated into segments optimized for sequential I/O. +3. Segment sealing and visibility rules remain unchanged. 
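As a rough illustration of the rules in 11.1 through 11.3, the sketch below classifies an artifact against `T_small` and reports whether it may share a packed block. The names `block_class`, `classify_block`, and `may_pack` are hypothetical and not part of this specification; only the threshold comparison and the packing restriction come from the text above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical store-level classification; invisible to the index layer. */
enum block_class { BLOCK_SMALL, BLOCK_LARGE };

/* Artifacts strictly below t_small are small; everything else is large. */
static enum block_class classify_block(size_t artifact_len, size_t t_small) {
    return artifact_len < t_small ? BLOCK_SMALL : BLOCK_LARGE;
}

/* Only small blocks may be packed together with other artifacts. */
static bool may_pack(enum block_class cls) {
    return cls == BLOCK_SMALL;
}
```

Because the classification is store-level only, the resulting `ArtifactExtent` values look the same to the index regardless of which branch was taken.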
+ +### 11.4 Indexing and Addressing + +All blocks are addressed uniformly: + +``` +ArtifactExtent = (BlockID, offset, length) +ArtifactLocation = [ArtifactExtent...] +``` + +Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed. + +### 11.5 GC and Retention + +1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable. +2. Large blocks are reclaimed per block. + +Invariant: GC must never remove bytes still referenced by CURRENT or snapshots. + +--- + +## 12. Crash and Recovery Semantics + +* Open segments or unsealed blocks may be lost; no invariant is broken. +* Recovery procedure: + + 1. Mount last checkpoint snapshot. + 2. Replay append-only log from checkpoint. + 3. Reconstruct CURRENT. + +* Recovery is **deterministic and idempotent**. +* Segments and blocks **never partially visible** after crash. + +--- + +## 13. Normative Invariants + +1. Sealed blocks are immutable. +2. Index entries referencing blocks are immutable once visible. +3. Shadowing follows strict log order. +4. Replay of snapshot + log uniquely reconstructs CURRENT. +5. GC cannot remove blocks or segments needed by snapshot or CURRENT. +6. Tombstones shadow prior entries without deleting underlying blocks prematurely. +7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT. + +--- + +## 14. Non-Goals + +* Disk-level encoding (ENC-ASL-CORE-INDEX). +* Memory layout or caching. +* Sharding or performance heuristics. +* Federation / multi-domain semantics (handled elsewhere). +* Block packing strategies beyond the policy rules here. + +--- + +## 15. 
Relationship to Other Layers + +| Layer | Responsibility | +| ------------------ | ---------------------------------------------------------------------------- | +| ASL-CORE | Artifact semantics, existence of blocks, immutability | +| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation | +| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments | +| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters | + +--- + +## 16. Summary + +The tier1 ASL-STORE-INDEX specification: + +* Defines **block lifecycle** and **segment lifecycle**. +* Makes **snapshot identity and log positions** explicit for replay. +* Ensures deterministic visibility, lookup, and crash recovery. +* Formalizes GC safety and tombstone behavior. +* Adds clear **small vs large block** handling without changing core semantics. diff --git a/tier1/asl-system-1.md b/tier1/asl-system-1.md new file mode 100644 index 0000000..297d4f5 --- /dev/null +++ b/tier1/asl-system-1.md @@ -0,0 +1,213 @@ +# ASL/SYSTEM/1 — Unified ASL + TGK + PEL System View + +Status: Draft +Owner: Architecture +Version: 0.1.0 +SoT: No +Last Updated: 2025-01-17 +Linked Phase Pack: N/A +Tags: [deterministic, federation, pel, tgk, index] + + + +**Document ID:** `ASL/SYSTEM/1` +**Layer:** L2 — Cross-cutting system view (no new encodings) + +**Depends on (normative):** + +* `ASL/1-CORE` +* `ASL/1-CORE-INDEX` +* `ASL/STORE-INDEX/1` +* `ASL/LOG/1` +* `ENC/ASL-CORE-INDEX/1` + +**Informative references:** + +* `ASL/INDEX-ACCEL/1` +* `TGK/1` — Trace Graph Kernel semantics +* PEL draft specs (program DAG, execution receipts) +* `ASL/FEDERATION/1` — core federation semantics +* `ASL/FEDERATION-REPLAY/1` — cross-node deterministic replay +* `ASL/DAP/1` — domain admission +* `ASL/POLICY-HASH/1` — policy binding + +© 2025 Niklas Rydberg. 
+ +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are +to be interpreted as in RFC 2119. + +ASL/SYSTEM/1 is an integration view. It does not define new encodings or +storage formats; those remain in the underlying layer specs. + +--- + +## 1. Purpose & Scope + +This document aligns the cross-cutting semantics of: + +* ASL index and log behavior +* PEL deterministic execution +* TGK edge semantics and traversal +* Federation visibility and replay + +It ensures a single, consistent model for determinism, snapshot bounds, and +domain visibility. + +Non-goals: + +* New on-disk encodings +* New execution operators +* Domain policy or governance rules + +--- + +## 2. Core Objects (Unified View) + +* **Artifact**: immutable byte value (ASL/1-CORE). +* **PER**: PEL Execution Receipt stored as an artifact. +* **TGK Edge**: immutable edge record linking artifacts and/or PERs. +* **Snapshot + Log Prefix**: boundary for deterministic visibility and replay. +* **Domain Visibility**: internal vs published visibility embedded in index + records (ENC-ASL-CORE-INDEX). + +All of these objects are addressed and stored via the same index semantics. + +--- + +## 3. Determinism & Snapshot Boundaries + +For a fixed `(SnapshotID, LogPrefix)`: + +* Index lookup is deterministic (ASL/1-CORE-INDEX). 
+* TGK traversal is deterministic when bounded by the same snapshot/log prefix. +* PEL execution is deterministic when its inputs are bounded by the same + snapshot/log prefix. + +PEL MUST read only snapshot-scoped artifacts and receipts. It MUST NOT depend +on storage layout, block packing, or non-snapshot metadata. + +PEL outputs (artifacts and PERs) become visible only through normal index +admission and log ordering. + +PEL MUST NOT depend on physical storage metadata. It MAY read only: + +* snapshot identity +* execution configuration that is itself snapshot-scoped and immutable + +--- + +## 4. One PEL Principle (Resolution) + +There is exactly one PEL: a deterministic, snapshot-bound, authority-aware +derivation language mapping artifacts to artifacts. + +Distinctions such as "PEL-S" vs "PEL-P" are not separate languages. They are +policy decisions about how outputs are treated: + +* **Promotion** (truth vs view) is a domain policy decision. +* **Publication** (internal vs published) is a visibility decision encoded in + index metadata. +* **Retention** (store, cache, discard, recompute) is a store policy decision. + +Implementations MUST NOT fork PEL semantics into separate dialects. Any +classification of outputs MUST be expressed via policy, publication flags, or +receipt annotations, not by changing the execution language. + +--- + +## 5. PEL, PERs, and TGK Integration + +* PEL programs consume artifacts and/or PERs. +* PEL execution produces artifacts and a PER describing the run. +* TGK edges may reference artifacts, PERs, or projections derived from them. + +--- + +## 5.1 PERs and Snapshot State (Clarification) + +PERs are artifacts that bind deterministic execution to a specific snapshot +and log prefix. They do not introduce a separate storage layer: + +* The sequential log and snapshot define CURRENT. +* A PER records that execution observed CURRENT at a specific log prefix. +* Replay uses the same snapshot + log prefix to reconstruct inputs. 
+* PERs are artifacts and MAY be used as inputs, but programs embedded in + receipts MUST NOT be executed implicitly. + +TGK remains a semantic graph layer; it does not alter PEL determinism and does +not bypass the index. + +--- + +## 6. Federation Alignment + +Federation operates over the same immutable artifacts, PERs, and TGK edges. +Cross-domain visibility is governed by index metadata: + +* `domain_id` identifies the owning domain. +* `visibility` marks internal vs published. +* `cross_domain_source` preserves provenance for imported artifacts. + +Deterministic replay across nodes MUST respect: + +* Snapshot boundaries +* Log order +* Domain visibility rules + +Federation does not change PEL semantics. It propagates artifacts and receipts +that were already deterministically produced. + +Admission and policy compatibility gate foreign state: only admitted domains and +policy-compatible published state may be included in a federation view. + +--- + +## 7. Index Alignment + +The index is the shared substrate: + +* Artifacts, PERs, and TGK edges are all indexed via the same lookup semantics. +* Sharding, SIMD, and filters (ASL/INDEX-ACCEL/1) are advisory and MUST NOT + change correctness. +* Tombstones and shadowing remain the only visibility overrides. + +--- + +## 8. Glossary and Terminology Alignment (Informative) + +To prevent drift across layers, the following terms map as: + +* **EdgeBody** (`TGK/1-CORE`) — logical edge content (`from[]`, `to[]`, `payload`, `type`). +* **EdgeArtifact** (`TGK/1-CORE`) — ASL Artifact whose payload encodes an EdgeBody. +* **EdgeRef** (`TGK/1-CORE`) — ASL Reference to an EdgeArtifact. +* **TGK index record** (`TGK/1`, `ASL/1-CORE-INDEX`) — index entry that makes an EdgeRef visible under snapshot/log rules; contains no edge payload. +* **TGK traversal result** (`TGK/1`) — snapshot/log-bounded set of visible edges (EdgeRefs) and/or node references derived from indexed EdgeArtifacts. + +--- + +## 9. 
Summary + +ASL/SYSTEM/1 provides a single, consistent view: + +* One PEL, with policy-based output treatment +* TGK and PEL both bounded by snapshot + log determinism +* Federation mediated by index-level domain metadata +* Index semantics remain the core substrate for all objects diff --git a/tier1/asl-tgk-execution-plan-1.md b/tier1/asl-tgk-execution-plan-1.md new file mode 100644 index 0000000..e51931a --- /dev/null +++ b/tier1/asl-tgk-execution-plan-1.md @@ -0,0 +1,251 @@ +# ASL/TGK-EXEC-PLAN/1 -- Unified Execution Plan Semantics + +Status: Draft +Owner: Architecture +Version: 0.1.0 +SoT: No +Last Updated: 2025-01-17 +Linked Phase Pack: N/A +Tags: [execution, query, tgk, determinism] + + + +**Document ID:** `ASL/TGK-EXEC-PLAN/1` +**Layer:** L2 -- Execution plan semantics (no encoding) + +**Depends on (normative):** + +* `ASL/1-CORE-INDEX` +* `ASL/LOG/1` +* `ASL/INDEX-ACCEL/1` +* `TGK/1` + +**Informative references:** + +* `ASL/SYSTEM/1` +* `ENC/ASL-CORE-INDEX/1` +* `ENC/ASL-TGK-EXEC-PLAN/1` + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119. + +ASL/TGK-EXEC-PLAN/1 defines execution plan semantics for querying artifacts and TGK edges. It does not define encoding, transport, or runtime scheduling. + +--- + +## 1. 
Purpose + +This document defines the operator model and determinism rules for executing queries over ASL artifacts and TGK edges using snapshot-bounded visibility. + +--- + +## 2. Execution Plan Model (Normative) + +An execution plan is a DAG of operators: + +``` +Plan = { nodes: [Op], edges: [(Op -> Op)] } +``` + +Each operator includes: + +* `op_id`: unique identifier +* `op_type`: operator type +* `inputs`: upstream operator outputs +* `snapshot`: `(SnapshotID, LogPrefix)` +* `constraints`: canonical filters +* `projections`: output fields +* `traversal`: optional traversal parameters +* `aggregation`: optional aggregation parameters + +--- + +## 2.1 Query Abstraction (Informative) + +A query can be represented as: + +``` +Q = { + snapshot: S, + constraints: C, + projections: P, + traversal: optional, + aggregation: optional +} +``` + +Where: + +* `constraints` describe canonical filters (artifact keys, type tags, edge types, roles, node IDs). +* `projections` select output fields. +* `traversal` declares TGK traversal depth and direction. +* `aggregation` defines deterministic reduction operations. + +--- + +## 3. Deterministic Ordering (Normative) + +All operator outputs MUST be ordered by: + +1. `logseq` ascending +2. canonical key ascending (tie-breaker) + +Parallel execution MUST preserve this order. + +--- + +## 4. Visibility Rules (Normative) + +Records are visible if and only if: + +* `record.logseq <= snapshot.log_prefix` +* The record is not shadowed by a later tombstone + +Unknown record types MUST be skipped without breaking determinism. + +--- + +## 5. Operator Types (Normative) + +### 5.1 SegmentScan + +* Inputs: sealed segments +* Outputs: raw record references +* Rules: + * Only segments with `segment.logseq_min <= snapshot.log_prefix` are scanned. + * Advisory filters MAY be applied but MUST NOT introduce false negatives. + * Shard routing MAY be applied prior to scan if deterministic. 
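A minimal sketch of the ordering rule in section 3 and the SegmentScan scope rule in 5.1, assuming a fixed 32-byte canonical key compared byte-lexicographically. The `record` struct, its field widths, and both function names are illustrative assumptions; the specification fixes only the `logseq`-then-key order and the `segment.logseq_min <= snapshot.log_prefix` bound.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record shape: a log sequence number plus a canonical key. */
struct record {
    uint64_t logseq;
    uint8_t key[32];
};

/* Total order: logseq ascending, then canonical key ascending as tie-breaker.
 * Returns <0, 0, or >0 in the style of memcmp. */
static int record_cmp(const struct record *a, const struct record *b) {
    if (a->logseq != b->logseq) {
        return a->logseq < b->logseq ? -1 : 1;
    }
    return memcmp(a->key, b->key, sizeof a->key);
}

/* A sealed segment is scanned only if it starts at or before the log prefix. */
static bool segment_in_scope(uint64_t segment_logseq_min, uint64_t log_prefix) {
    return segment_logseq_min <= log_prefix;
}
```

A comparator of this shape can back both a sort and a k-way merge, which is one way to keep parallel shard execution from changing the output order.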
+ +### 5.2 IndexFilter + +* Inputs: record stream +* Outputs: filtered record stream +* Rules: + * Applies canonical constraints (artifact key, type tag, TGK edge type, roles). + * Filters MUST be exact; advisory filters are not sufficient. + +### 5.3 TombstoneShadow + +* Inputs: record stream + tombstone stream +* Outputs: visible records only +* Rules: + * Later tombstones shadow earlier entries with the same canonical key. + +### 5.4 Merge + +* Inputs: multiple ordered streams +* Outputs: single ordered stream +* Rules: + * Order is `logseq` then canonical key. + * Merge MUST be deterministic regardless of shard order. + +### 5.5 Projection + +* Inputs: record stream +* Outputs: projected fields +* Rules: + * Projection MUST preserve input order. + +### 5.6 TGKTraversal + +* Inputs: seed node set +* Outputs: edge and/or node stream +* Rules: + * Expansion MUST respect snapshot bounds. + * Traversal depth MUST be explicit. + * Order MUST follow deterministic ordering rules. + +### 5.7 Aggregation (Optional) + +* Inputs: record stream +* Outputs: aggregate results +* Rules: + * Aggregation MUST be deterministic given identical inputs and snapshot. + +### 5.8 LimitOffset (Optional) + +* Inputs: ordered record stream +* Outputs: ordered slice +* Rules: + * Applies pagination or top-N selection. + * MUST preserve deterministic order from upstream operators. + +### 5.9 ShardDispatch (Optional) + +* Inputs: shard-local streams +* Outputs: ordered global stream +* Rules: + * Shard execution MAY be parallel. + * Merge MUST preserve deterministic ordering by `logseq` then canonical key. + +### 5.10 SIMDFilter (Optional) + +* Inputs: record stream +* Outputs: filtered record stream +* Rules: + * SIMD filters are advisory accelerators. + * Canonical checks MUST still be applied before output. + +--- + +## 6. Acceleration Constraints (Normative) + +Acceleration mechanisms (filters, routing, SIMD) MUST be observationally invisible: + +* False positives are permitted. 
+* False negatives are forbidden. +* Canonical checks MUST always be applied before returning results. + +--- + +## 7. Plan Serialization (Optional) + +Execution plans MAY be serialized for reuse or deterministic replay. + +```c +struct exec_plan { + uint32_t plan_version; + uint32_t operator_count; + struct operator_def operators[]; /* operator_count entries */ + /* struct operator_edge records follow operators[]; C permits only one flexible array member */ +}; +``` + +Serialization MUST preserve operator parameters, snapshot bounds, and DAG edges. + +--- + +## 8. GC Safety (Informative) + +Records and edges MUST NOT be removed if they appear in a snapshot or are +reachable via traversal at that snapshot. + +--- + +## 9. Non-Goals + +ASL/TGK-EXEC-PLAN/1 does not define: + +* Runtime scheduling or parallelization strategy +* Encoding of operator plans +* Query languages or APIs +* Operator cost models diff --git a/tier1/dds.md b/tier1/dds.md new file mode 100644 index 0000000..902bd33 --- /dev/null +++ b/tier1/dds.md @@ -0,0 +1,944 @@ +# AMDUAT-DDS — Detailed Design Specification + +Status: Approved +Owner: Niklas Rydberg +Version: 0.5.0 +SoT: Yes +Last Updated: 2025-11-11 +Linked Phase Pack: PH01 +Tags: [design, cas, composition] + + + +**Document ID:** `AMDUAT-DDS` +**Layer:** L0.1 — Byte-level design (CAS + deterministic envelopes) + +**Depends on (normative):** + +* `AMDUAT-SRS` — behavioural requirements +* ADR-001 — CAS identity +* ADR-003 — canonical encoding discipline +* ADR-006 — deterministic error semantics + +**Informative references:** + +* ADR-015 — rejection governance + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. 
TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. 
+ +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +> **Note (scope):** +> This DDS covers **Phase 01 (Kheper CAS)** byte semantics and, where necessary, the canonical **binary encodings** for higher deterministic layers (FCS/1, PCB1, FER/1, FCT/1). +> **Behavioural semantics live in SRS.** This document governs the **bytes**. + +--- + +## 1. Content ID (CID) + +**Rule.** + +``` +CID = algo_id || H("CAS:OBJ\0" || payload_bytes) +``` + +* `algo_id`: 1-byte or VARINT identifier (default `0x01` = SHA-256). +* `H`: selected hash over **exact payload bytes**. +* Domain separation prefix must be present verbatim: `"CAS:OBJ\0"`. + +**Properties.** + +* Deterministic: identical payload → identical CID. +* Implementation-independent (SRS NFR-001). +* Crypto-agile via `algo_id`. + +**Errors.** + +* `ERR_ALGO_UNSUPPORTED` when `algo_id` is not registered. +* Empty payload is allowed and canonical. + +--- + +## 2. Canonical Object Record (COR/1) + +COR/1 is the **only** canonical import/export envelope for CAS objects. Exact bytes are consensus; on-disk layout is not. + +### 2.1 Envelope Layout (exact bytes) + +``` +Header (7 bytes total): + MAGIC : 4 bytes = "CAS1" (0x43 0x41 0x53 0x31) + VERSION : 1 byte = 0x01 + FLAGS : 1 byte = 0x00 (reserved; MUST be 0) + RSV : 1 byte = 0x00 (reserved; MUST be 0) + +Body (strict TLV order; no padding): + 0x10 algo_id (VARINT) + 0x11 size (VARINT) + 0x12 payload (BYTES; length == size) +``` + +**Notes** + +* Fixed header invariants; any mismatch is rejection. +* No alignment/padding anywhere. + +### 2.2 Tag Semantics + +| Tag | Name | Type | Card. | Notes | +| ---: | ------- | ------ | ----: | ----------------------------------------------- | +| 0x10 | algo_id | VARINT | 1 | MUST equal algorithm used for the object’s CID. | +| 0x11 | size | VARINT | 1 | **Minimal VARINT**; MUST equal payload length. 
| +| 0x12 | payload | BYTES | 1 | Raw bytes; never normalized. | + +### 2.3 Canonicalization Rules (strict) + +1. **Order & uniqueness:** `0x10`, `0x11`, `0x12`, each exactly once. +2. **VARINTS:** Unsigned LEB128 **minimal** form only. +3. **BYTES:** `VARINT(len) || len bytes`, with `len == size`. +4. **No extras:** No unknown tags, no trailing bytes. +5. **Header invariants:** `MAGIC="CAS1"`, `VERSION=0x01`, `FLAGS=RSV=0x00`. +6. **Policy domain:** `size ≤ max_object_size` when enforced (ICD/1 §3). +7. **Raw byte semantics** (SRS FR-010). + +### 2.4 Decoder Validation Algorithm (normative) + +1. Validate header ⇒ else `ERR_COR_HEADER_INVALID`. +2. Read `0x10` minimal VARINT ⇒ else `ERR_COR_TAG_ORDER` / `ERR_VARINT_NON_MINIMAL`. +3. Read `0x11` minimal VARINT ⇒ same error rules. +4. Read `0x12` BYTES (length minimal VARINT) ⇒ else `ERR_VARINT_NON_MINIMAL`. +5. Enforce `size == len(payload)` ⇒ `ERR_COR_LENGTH_MISMATCH` on failure. +6. Ensure **no trailing bytes** ⇒ `ERR_TRAILING_BYTES`. +7. Recompute CID and compare ⇒ mismatch `ERR_CORRUPT_OBJECT`. + +### 2.5 Consistency with CID (normative) + +* **Export:** set `algo_id` to CID algorithm. +* **Import:** verify `algo_id` and hash component against expected CID. +* Mismatch ⇒ `ERR_ALGO_MISMATCH` / `ERR_CORRUPT_OBJECT`. + +### 2.6 Round-Trip Identity + +`import(COR/1) → export(CID)` MUST produce **byte-identical** envelope (SRS FR-005). Re-encoding is forbidden. 
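The minimal-VARINT rule used in 2.3 and 2.4 can be sketched as a decoder that rejects any unsigned LEB128 value with a shorter equivalent encoding. This is an illustration under stated assumptions, not the reference decoder: the `varint_status` enum and `varint_decode` name are hypothetical, values are assumed to fit in 64 bits, and a caller would map `VARINT_NON_MINIMAL` to `ERR_VARINT_NON_MINIMAL`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch only. */
enum varint_status {
    VARINT_OK = 0,
    VARINT_TRUNCATED,    /* input ended before the final byte */
    VARINT_NON_MINIMAL,  /* a shorter encoding of the same value exists */
    VARINT_OVERFLOW      /* value does not fit in 64 bits */
};

/* Decode one unsigned LEB128 value from buf[0..len).
 * On VARINT_OK, *out holds the value and *used the bytes consumed. */
static enum varint_status varint_decode(const uint8_t *buf, size_t len,
                                        uint64_t *out, size_t *used) {
    uint64_t value = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = buf[i];
        /* The tenth byte may only carry bit 63 and must end the value. */
        if (i == 9 && (byte & 0xFEu) != 0) {
            return VARINT_OVERFLOW;
        }
        value |= (uint64_t)(byte & 0x7Fu) << (7u * (unsigned)i);
        if ((byte & 0x80u) == 0) {
            /* A zero final byte after the first adds no bits: non-minimal. */
            if (i > 0 && byte == 0) {
                return VARINT_NON_MINIMAL;
            }
            *out = value;
            *used = i + 1;
            return VARINT_OK;
        }
    }
    return VARINT_TRUNCATED;
}
```

For instance, `0xAC 0x02` decodes to 300 in two bytes, while `0x81 0x00` encodes 1 with a redundant trailing byte and is rejected.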
+ +### 2.7 Rejection Matrix (normative) + +| Violation | Example | Error | +| ------------------ | -------------------------------- | ------------------------- | +| Bad header | Wrong MAGIC/VERSION/FLAGS/RSV | `ERR_COR_HEADER_INVALID` | +| Unknown/extra tag | Any tag not 0x10/0x11/0x12 | `ERR_COR_UNKNOWN_TAG` | +| Out-of-order | `0x11` before `0x10` | `ERR_COR_TAG_ORDER` | +| Duplicate tag | Two `0x10` entries | `ERR_COR_DUPLICATE_TAG` | +| Non-minimal VARINT | Over-long algo/size/bytes length | `ERR_VARINT_NON_MINIMAL` | +| Length mismatch | `size != len(payload)` | `ERR_COR_LENGTH_MISMATCH` | +| Trailing bytes | Any bytes after payload | `ERR_TRAILING_BYTES` | +| Algo mismatch | `algo_id` conflicts with CID | `ERR_ALGO_MISMATCH` | +| Hash mismatch | Recomputed hash ≠ expected | `ERR_CORRUPT_OBJECT` | + +--- + +## 3. Instance Descriptor (ICD/1) + +ICD/1 publishes canonical instance configuration; its bytes are consensus. + +### 3.1 Envelope + +``` +Header: + MAGIC : "ICD1" + VERSION : 0x01 + +TLV (strict order; minimal VARINTs; no duplicates): + 0x20 algo_default (VARINT) + 0x21 max_object_size (VARINT) + 0x22 cor_version (VARINT) # 0x01 => COR/1 v1 + 0x23 gc_policy_id (VARINT; 0 if none) + 0x24 impl_id (BYTES; optional build/impl descriptor CID) +``` + +### 3.2 Derived Identity + +``` +instance_id = SHA-256("CAS:ICD\0" || bytes(ICD/1)) +``` + +**Rules:** Ordering/minimal VARINTs mirror COR/1. Exporters preserve canonical bytes; `instance_id` is stable. + +--- + +## 4. Encodings + +* **VARINT (unsigned LEB128)** — minimal form only; else `ERR_VARINT_NON_MINIMAL`. +* **BYTES** — `VARINT(length) || length bytes`. +* **Fixed-width integers** — big-endian if present. +* **No padding/alignment** in canonical encodings. + +--- + +## 5. Algorithm Registry + +**Default** + +* `0x01` → SHA-256 + +**Reserved** + +* `0x02` → SHA-512/256 +* `0x03` → BLAKE3 + +**Policy** + +* New entries require ADR + test vectors. Backward compatible by design. + +--- + +## 6. 
Filesystem Considerations (Informative) + +``` +cas/ +├─ sha256/ +│ ├─ aa/.. # fan-out by CID prefix (implementation detail) +│ └─ ff/.. +└─ amduat/ + └─ / + ├─ amduatcas + ├─ sha256/.. # private runtime state; never a put() target + ├─ interface/ + │ └─ libamduatcas.current + ├─ HEAD + └─ meta/ +``` + +**Rule:** Public CAS API acts only on `cas/sha256/`. The per-instance subtree is private and MUST NOT receive `put()` writes. + +--- + +## 7. Error Conditions & Higher-Layer Layouts (Normative) + +### 7.1 COR/1 & ICD/1 Enforcement (codes) + +* `ERR_COR_HEADER_INVALID`, `ERR_COR_UNKNOWN_TAG`, `ERR_COR_TAG_ORDER`, `ERR_COR_DUPLICATE_TAG`, + `ERR_COR_LENGTH_MISMATCH`, `ERR_VARINT_NON_MINIMAL`, `ERR_ALGO_UNSUPPORTED`, + `ERR_ALGO_MISMATCH`, `ERR_TRAILING_BYTES`, `ERR_CORRUPT_OBJECT`. + +--- + +### 7.2 FCS/1 Descriptor Layout — v1-min (Normative) + +> **Design principle:** *FCS/1 describes the deterministic execution recipe only.* +> Intent, roles, scope, authority, and registry policy are **not** encoded in FCS; they are captured at **certification time** in FCT/1. + +Header: `MAGIC="FCS1" VERSION=0x01 FLAGS=RSV=0x00` + +| Tag | Field | Type | Card. | Notes | +| ---: | ----------------- | ------ | ----: | ------------------------------------------ | +| 0x30 | `function_ptr` | CID | 1 | FPS/1 primitive or nested FCS/1 descriptor | +| 0x31 | `parameter_block` | CID | 1 | CID of PCB1 parameter block | +| 0x32 | `arity` | VARINT | 1 | Expected parameter slots | + +**Validation rules** + +1. Strict TLV order; duplicates/out-of-order → `ERR_FCS_TAG_ORDER`. +2. `parameter_block` MUST be valid PCB1 → `ERR_FCS_PARAMETER_FORMAT`. +3. `arity` MUST match slot count → `ERR_PCB_ARITY_MISMATCH`. +4. Descriptor graph MUST be acyclic → `ERR_FCS_CYCLE_DETECTED`. +5. **Any unknown or legacy governance tag** (`registry_policy 0x33`, `intent_vector 0x34`, `provenance_edge 0x35`, `notes 0x36`, or unregistered fields) → `ERR_FCS_UNKNOWN_TAG`. 
Such tags MUST never be tolerated in canonical streams. + +--- + +### 7.3 PCB1 Parameter Blocks (Normative) + +PCB1 payloads are COR/1 envelopes with header `MAGIC="PCB1"`, `VERSION=0x01`, `FLAGS=RSV=0x00`. + +| Tag | Field | Type | Notes | +| ---: | --------------- | ----- | ----------------------------------------------------- | +| 0x50 | `slot_manifest` | BCF/1 | Canonical slot descriptors `{index,name,type,digest}` | +| 0x51 | `slot_data` | BYTES | Packed slot bytes respecting manifest order | + +**Rules:** +Slots appear in ascending `index`. Numeric slots default to `0` when omitted. +Digest mismatches ⇒ `ERR_PCB_DIGEST_MISMATCH`. Non-deterministic ordering ⇒ `ERR_PCB_MANIFEST_ORDER`. +Arity mismatch vs FCS/1 ⇒ `ERR_PCB_ARITY_MISMATCH`. + +--- + +### 7.4 **FER/1 Receipt Layout (Normative)** + +FER/1 receipts reuse COR/1 framing with header `"FER1"` and are byte-deterministic. + +**Strict TLV order (no padding):** + +| Tag | Field | Type | Cardinality | Notes | +| ---- | --------------------- | ----------- | ----------- | ----- | +| 0x40 | `function_cid` | CID | 1 | Evaluated FCS/1 descriptor (must decode to v1-min). | +| 0x41 | `input_manifest` | CID | 1 | MUST decode to GS/1 BCF/1 set list (deduped, byte-lexicographic). | +| 0x42 | `environment` | CID | 1 | ICD/1 snapshot or PH03 environment capsule. | +| 0x43 | `evaluator_id` | BYTES | 1 | Stable evaluator identity (DID/descriptor CID). | +| 0x44 | `executor_set` | BCF/1 map | 1 | Map of executors → impl metadata (language/version/build); keys sorted. | +| 0x4F | `executor_fingerprint`| CID | 0–1 | SBOM/attestation CID feeding `run_id`; REQUIRED when `run_id` present. | +| 0x45 | `output_cid` | CID | 1 | Canonical output CID (single-output invariant). | +| 0x46 | `parity_vector` | BCF/1 list | 1 | Sorted by executor key; each entry carries `{executor, output, digest, sbom_cid}`. | +| 0x47 | `logs` | LIST | 0–1 | Typed log capsules (`kind`, `cid`, `sha256`). 
| +| 0x51 | `determinism_level` | ENUM | 0–1 | `"D1_bit_exact"` (default) or `"D2_numeric_stable"`. | +| 0x50 | `rng_seed` | BYTES | 0–1 | 0–32 byte seed REQUIRED when determinism ≠ D1. | +| 0x52 | `limits` | BCF/1 map | 0–1 | Resource envelope (`cpu_ms`, `wall_ms`, `max_rss_kib`, `io_reads`, `io_writes`). | +| 0x48 | `started_at` | UINT64 | 1 | Epoch seconds (FR-020 start bound). | +| 0x49 | `completed_at` | UINT64 | 1 | Epoch seconds ≥ `started_at`. | +| 0x53 | `parent` | CID | 0–1 | Optional lineage pointer for follow-up runs. | +| 0x4A | `context` | BCF/1 map | 0–1 | Optional scheduling hooks (WT/1 ticket, TA/1 branch tip, notes ref). | +| 0x4B | `witnesses` | BCF/1 list | 0–1 | Optional observer descriptors / co-signers. | +| 0x4E | `run_id` | BYTES[32] | 0–1 | Deterministic dedup anchor (`H("AMDUAT:RUN\0" || function || manifest || env || fingerprint)`). | +| 0x4C | `signature` | BCF/1 map | 1 | Primary Ed25519 signature over `H("AMDUAT:FER\0" || canonical bytes)`. | +| 0x4D | `signature_ext` | BCF/1 list | 0–1 | Reserved slot for multi-sig / threshold proofs (future). | + +**Validation:** + +1. TLV order strict; unknown tags ⇒ `ERR_FER_TAG_ORDER` / `ERR_FER_UNKNOWN_TAG`. +2. `function_cid` must decode to valid FCS/1 ⇒ `ERR_FER_FUNCTION_MISMATCH` otherwise. +3. `input_manifest` MUST decode to GS/1 set list (deduped + byte-lexicographic). Violations ⇒ `ERR_FER_INPUT_MANIFEST_SHAPE`. +4. `executor_set` keys MUST be byte-lexicographic and align with `parity_vector` entries. Ordering mismatches ⇒ `ERR_IMPL_PARITY_ORDER`; missing executors or divergent outputs ⇒ `ERR_IMPL_PARITY`. +5. Each parity entry MUST declare `sbom_cid` referencing the executor’s mini-SBOM CID. +6. `determinism_level` defaults to `D1_bit_exact`; when set to any other value a 0–32 byte `rng_seed` is REQUIRED ⇒ `ERR_FER_RNG_REQUIRED`. +7. `limits` (when present) MUST supply non-negative integers for `cpu_ms`, `wall_ms`, `max_rss_kib`, `io_reads`, `io_writes`. +8. 
`logs` (when present) MUST contain objects with `kind ∈ {stderr, stdout, metrics, trace}`, `cid`, and `sha256` (both 32-byte hex strings). +9. `run_id` (when present) MUST equal `H("AMDUAT:RUN\0" || function_cid || manifest_cid || environment_cid || executor_fingerprint)`; missing fingerprint ⇒ `ERR_FER_UNKNOWN_TAG`. +10. `completed_at < started_at` ⇒ `ERR_FER_TIMESTAMP` (FR-020 envelope enforcement). +11. Signatures MUST verify against `H("AMDUAT:FER\0" || canonical bytes)` ⇒ failure ⇒ `ERR_FER_SIGNATURE`. + +> **Manifest note:** `input_manifest` bytes MUST be the GS/1 canonical list; ingestion MUST reject producer-specific ordering. +> **Log capsule note:** `logs` entries bind `kind`, `cid`, and `sha256` together to avoid stdout/stderr hash confusion. +> **Dedup note:** `run_id` enables idempotent FER ingestion across registries while keeping the FER CID authoritative. +> **Provenance note:** FER/1 remains the exclusive home for run-time provenance and parity outcomes; governance stays in FCT/1. + +> **Graph note:** Ingestors emit `realizes`, `produced_by`, `consumed_by`, and (optionally) `fulfills` edges based solely on FER content. + +--- + +### 7.5 **FCT/1 Transaction Envelope (Normative)** + +> **Design principle:** *FCT/1 is the canonical home for **intent**, **domain scope**, **roles/authority**, and **policy snapshot*** captured at certification/publication time. 
+ +FCT/1 serializes as an ADR-003 BCF/1 map with canonical keys: + +| Key | Type | Notes | +| --------------------- | ----------- | ------------------------------------------------------- | +| `fct.version` | UINT8 | MUST be `1` | +| `fct.registry_policy` | UINT8 | Publication policy snapshot (0=Open,1=Curated,2=Locked) | +| `fct.function` | CID | Certified FCS/1 descriptor | +| `fct.receipts` | LIST | One or more FER/1 CIDs | +| `fct.authority_role` | ENUM | ADR-010C role | +| `fct.domain_scope` | ENUM | ADR-010B scope | +| `fct.intent` | SET | ADR-010 intents | +| `fct.constraints` | LIST | Optional constraint set | +| `fct.attestations` | LIST | Required when policy ≠ Open | +| `fct.timestamp` | UINT64 | Epoch seconds | +| `fct.publication` | CID | Optional ADR-007 digest | + +**Validation:** + +1. All receipts reference the same `function_cid` ⇒ else `ERR_FCT_RECEIPT_MISMATCH`. +2. If `registry_policy ≠ 0` then `attestations` **required** ⇒ `ERR_FCT_ATTESTATION_REQUIRED`. +3. All signatures/attestations verify ⇒ `ERR_FCT_SIGNATURE` on failure. +4. Receipt timestamps must be monotonic ⇒ `ERR_FCT_TIMESTAMP`. + +--- + +### 7.6 FPD/1 Publication Digest (Normative) + +> **Design principle:** *Federation publishes exactly one deterministic digest per event (ADR-007, SRS FR-022).* + +FPD/1 serializes as an ADR-003 BCF/1 map with canonical keys: + +| Key | Type | Notes | +| --------------- | ---------- | --------------------------------------------------------------------- | +| `fpd.version` | UINT8 | MUST be `1`. | +| `fpd.members` | LIST | Deterministic, byte-lexicographic list of member artefact CIDs. | +| `fpd.parent` | CID (opt) | Previous FPD/1 digest for the domain publication chain (or `null`). | +| `fpd.timestamp` | UINT64 | Epoch seconds aligned with `fct.timestamp` monotonic ordering. | +| `fpd.digest` | CID | Canonical digest over `{FCT/1 bytes, FER/1 receipts, governance edges}`. | + +**Construction:** + +1. 
Normalize and sign the FCT/1 record (per §7.5) writing canonical bytes to the payload area (PA). +2. Collect referenced FER/1 receipts and governance edges (`certifies`, `attests`, `publishes`) as canonical byte arrays. +3. Build `fpd.members` as the byte-lexicographic list of CIDs for the certified FCT/1 record, every FER/1 receipt, and the edge batch capsule. +4. Hash the concatenated canonical payloads using the federation digest algorithm (default `CIDv1/BCF`). Persist the resulting bytes and record the CID in `fpd.digest`. +5. If a prior publication exists, set `fpd.parent` to the previous digest CID; otherwise omit. +6. Emit the FPD/1 map, persist alongside the FCT/1 payload under `/logs/ph03/evidence/fct/`, and update `fct.publication` with the FPD/1 CID. + +**Validation:** + +* `fpd.members` MUST include exactly one FCT/1 CID and the full set of FER/1 receipt CIDs referenced by that transaction. +* Recomputing the digest from the persisted canonical payloads MUST yield `fpd.digest`; mismatches ⇒ `ERR_FPD_DIGEST` (registered under ADR-006). +* `fpd.timestamp` MUST be ≥ the largest FER/1 `completed_at` and ≥ the prior `fpd.timestamp` when `fpd.parent` is present ⇒ violations raise `ERR_FPD_TIMESTAMP`. +* Graph emitters MUST log governance edges via `lib/g1-emitter/` using the canonical digests referenced above. + +> **Graph note:** Publication surfaces emit `publishes(fct,fpd)` edges binding certification state to digest lineage for PH04 FLS/1 integration. + +### 7.7 Error Surface Registration (consolidated) + +All FCS/1, PCB1, FER/1, and FCT/1 errors map to ADR-006. +Additions since v0.3.0: + +| Code | Meaning | +| --------------------- | -------------------------------------------------------------------------------------- | +| `ERR_FCS_UNKNOWN_TAG` | Descriptor contained a tag outside the v1-min set (`0x30-0x32`). Rejected per ADR-006. | +| `ERR_EXEC_TIMEOUT` | Executor exceeded deterministic time envelope (Maat’s Balance). 
| +| `ERR_IMPL_PARITY` | Executor outputs/parity metadata diverged (missing executor, mismatched `output_cid`). | +| `ERR_IMPL_PARITY_ORDER` | Parity vector ordering did not match the canonical executor ordering. | +| `ERR_FER_UNKNOWN_TAG` | FER/1 payload contained an unknown tag or cardinality violation. | +| `ERR_FER_INPUT_MANIFEST_SHAPE` | `input_manifest` failed GS/1 set decoding (not deduped or unsorted). | +| `ERR_FER_RNG_REQUIRED` | `determinism_level` demanded an `rng_seed` but none was provided. | +| `ERR_FPD_DIGEST` | Recomputed federation digest did not match `fpd.digest` (non-deterministic publication). | +| `ERR_FPD_TIMESTAMP` | Publication timestamp regressed relative to receipts or parent digest. | +| `ERR_FPD_PARENT_REQUIRED` | Policy-enforced lineage expected `fpd.parent` but none was provided. | +| `ERR_FPD_MEMBER_DUP` | Duplicate member CID detected in the canonical set ordering. | +| `ERR_WT_UNKNOWN_KEY` | WT/1 map contained a key outside the v1-min schema. | +| `ERR_WT_VERSION_UNSUPPORTED` | `wt.version` not equal to `1`. | +| `ERR_WT_INTENT_EMPTY` | `wt.intent` list empty. | +| `ERR_WT_INTENT_DUP` | Duplicate ADR-010 intents detected in `wt.intent`. | +| `ERR_WT_TIMESTAMP` | `wt.timestamp` regressed relative to the previous ticket from the same author. | +| `ERR_WT_SIGNATURE` | Signature validation over `"AMDUAT:WT\0"` failed. | +| `ERR_WT_KEY_UNBOUND` | Declared `wt.pubkey` is not authorized for `wt.author` via the predicate registry. | +| `ERR_WT_INTENT_UNREGISTERED` | `wt.intent` entry not registered in ADR-010 predicate registry. | +| `ERR_WT_SCOPE_UNAUTHORIZED` | Router policy rejected the declared domain scope. | +| `ERR_WT_PARENT_UNKNOWN` | Optional `wt.parent` reference could not be resolved. | +| `ERR_WT_PARENT_REQUIRED` | Policy required `wt.parent` but the field was omitted. | +| `ERR_SOS_UNKNOWN_KEY` | SOS/1 map contained a key outside the v1-min schema. | +| `ERR_SOS_VERSION_UNSUPPORTED` | `sos.version` not equal to `1`. 
| +| `ERR_SOS_PREDICATE_UNREGISTERED` | Overlay predicate not registered in the CRS predicate registry. | +| `ERR_SOS_POLICY_INCOMPATIBLE` | `sos.policy` outside `{0,1,2}` or disallowed for the deployment lane. | +| `ERR_SOS_SIGNATURE_INVALID` | Signature validation over `"AMDUAT:SOS\0"` failed. | +| `ERR_SOS_COMPAT_EVIDENCE_REQUIRED` | Compat overlays missing MPR/1 + IER/1 references. | +| `ERR_SOS_TIMESTAMP_REGRESSION` | Overlay timestamp regressed relative to policy baseline. | + +### 7.8 FLS/1 and CRS/1 Byte Semantics + +Phase 04 establishes deterministic linkage between FLS/1 envelopes and CRS/1 concept graphs. ADR-018 governs the linkage envelope; ADR-020 governs concept and relation payloads. CI harnesses (`tools/ci/run_vectors.py`, `tools/ci/gs_snapshot.py`) provide conformance evidence. + +#### 7.8.1 FLS/1 Envelope TLVs (Draft) + +> **Scope:** Draft wire image aligned with ADR-018 v0.5.0. Stewardship will finalize signature semantics alongside multi-surface publication work. + +| Tag | Field | Type | Card. | Notes | +| ------ | -------------------- | ------ | ----- | ----- | +| `0x60` | `source_cid` | CID | 1 | Deterministic sender artefact/surface. | +| `0x61` | `target_cid` | CID | 1 | Deterministic recipient artefact/surface. | +| `0x62` | `payload_cid` | CID | 1 | Content payload (COR/1 capsule, CRS/1 concept, or CRR/1 relation). | +| `0x63` | `routing_policy_cid` | CID | 0-1 | Optional deterministic policy capsule. | +| `0x64` | `timestamp` | UINT64 | 0-1 | Optional bounded timing evidence (big-endian). | +| `0x65` | `signature` | BYTES | 0-1 | Optional Ed25519 signature with `"AMDUAT:FLS\0"` domain separator. | + +**Envelope rules (draft):** + +* Header MUST present `MAGIC="FLS1"`, `VERSION=0x01`, and zeroed `FLAGS/RSV` bytes. +* TLVs MUST appear in strictly increasing tag order. Duplicate tags ⇒ `ERR_FLS_DUPLICATE_TAG`; reordering ⇒ `ERR_FLS_TAG_ORDER`. +* Unknown tags are rejected until ADR updates extend this table (`ERR_FLS_UNKNOWN_TAG`). 
+* CID TLVs MUST present 32-byte payloads aligned with ADR-001 ⇒ `ERR_FLS_CID_LENGTH`. +* `timestamp` MUST be exactly eight bytes (UINT64, network byte order) ⇒ `ERR_FLS_TIMESTAMP_LENGTH`. +* `signature` MUST start with `"AMDUAT:FLS\0"` and carry a 64-byte Ed25519 signature ⇒ `ERR_FLS_SIGNATURE_DOMAIN` / `ERR_FLS_SIGNATURE_LENGTH`; failing Ed25519 verification raises `ERR_FLS_SIGNATURE`. +* When supplied, CRS payload bytes MUST hash to the declared `payload_cid` using `SHA-256("CAS:OBJ\0" || payload)` ⇒ `ERR_FLS_PAYLOAD_CID_MISMATCH`. +* CRS payload headers MUST match `CRS1` (concept) or `CRR1` (relation) when linkage metadata declares the type ⇒ `ERR_FLS_PAYLOAD_KIND`. +* Payloads MAY be CRS/1 concepts or CRR/1 relations; FLS/1 envelopes never mutate CRS graphs. + +#### 7.8.2 CRS/1 Concept & Relation TLVs (Normative) + +> **Scope:** Deterministic CRS/1 byte layout as ratified by ADR-020 v1.1.0. All TLVs +> use single-byte tags + single-byte lengths with fixed 32-byte payloads. + +**Concept Header** — `MAGIC="CRS1"`, `VERSION=0x01`, `FLAGS=0x00`, `RSV=0x00`. + +| Tag | Field | Type | Card. | Notes | +| ------ | ------------------ | ---- | ----- | ----- | +| `0x40` | `description_cid` | CID | 1 | Canonical COR/1/BCF descriptor for the concept text/essence. | +| `0x41` | `relations_cid` | CID | 1 | Deterministic list CID of outbound relation CIDs. | + +**Relation Header** — `MAGIC="CRR1"`, `VERSION=0x01`, `FLAGS=0x00`, `RSV=0x00`. + +| Tag | Field | Type | Card. | Notes | +| ------ | ----------------- | ---- | ----- | ----- | +| `0x42` | `source_cid` | CID | 1 | Originating Concept CID. | +| `0x43` | `target_cid` | CID | 1 | Destination Concept or artefact CID. | +| `0x44` | `predicate_cid` | CID | 1 | Registered predicate Concept CID. | + +**Validation rules** + +* Headers MUST match the values above; mismatches reject as malformed. +* TLVs MUST appear exactly once in the order listed. 
Missing or out-of-order + TLVs ⇒ `ERR_CRS_TAG_ORDER` (concept) or `ERR_CRR_TAG_ORDER` (relation). +* Duplicate relation tags ⇒ `ERR_CRR_DUPLICATE_TAG`. +* TLV payloads MUST be exactly 32 bytes ⇒ `ERR_CRS_LENGTH_MISMATCH` / `ERR_CRR_LENGTH_MISMATCH`. +* Unknown tags are rejected ⇒ `ERR_CRS_UNKNOWN_TAG` / `ERR_CRR_UNKNOWN_TAG`. +* `predicate_cid` MUST reference a CRS Concept (`ERR_CRR_PREDICATE_NOT_CONCEPT`). When a predicate taxonomy exists, predicates MUST declare `is_a → Predicate` (`ERR_CRR_PREDICATE_CLASS_MISSING`). + +**Error mapping (ADR-006)** + +| Code | Condition | +| ---- | --------- | +| `ERR_CRS_TAG_ORDER` | Concept TLVs missing, duplicated, or out of order. | +| `ERR_CRS_LENGTH_MISMATCH` | Concept TLV payload not exactly 32 bytes. | +| `ERR_CRS_UNKNOWN_TAG` | Concept TLV tag outside `0x40–0x41`. | +| `ERR_CRR_TAG_ORDER` | Relation TLVs missing, duplicated, or out of order. | +| `ERR_CRR_LENGTH_MISMATCH` | Relation TLV payload not exactly 32 bytes. | +| `ERR_CRR_UNKNOWN_TAG` | Relation TLV tag outside `0x42–0x44`. | +| `ERR_CRR_DUPLICATE_TAG` | Duplicate relation TLV encountered. | +| `ERR_CRR_PREDICATE_NOT_CONCEPT` | `predicate_cid` did not resolve to a CRS Concept. | +| `ERR_CRR_PREDICATE_CLASS_MISSING` | Predicate Concept missing `is_a → Predicate` taxonomy edge. | + +**CID derivation** + +``` +concept_cid = SHA-256("CAS:OBJ\0" || bytes(CRS/1 concept record)) +relation_cid = SHA-256("CAS:OBJ\0" || bytes(CRR/1 relation record)) +``` + +Byte-identical records MUST yield identical CIDs; any mutation requires a new +record. + +### 7.9 WT/1 Audited Ticket Intake (Normative) + +WT/1 (ADR-023) captures auditable intent-to-change tickets as an ADR-003 BCF/1 +map. Keys are UTF-8 strings sorted lexicographically; values use canonical BCF +types. + +| Key | Type | Cardinality | Notes | +| -------------- | ----------------- | ----------- | ----- | +| `wt.version` | UINT8 | 1 | MUST equal `1`. 
| +| `wt.author` | CID (hex string) | 1 | CRS Concept or DID capsule representing the submitting actor. | +| `wt.scope` | CID (hex string) | 1 | ADR-010B domain scope concept CID. | +| `wt.intent` | LIST | 1 | Non-empty ADR-010 intent identifiers; deduped and byte-lexicographically sorted. | +| `wt.payload` | CID (hex string) | 1 | CRS manifest, change plan, or opaque payload describing proposed work. | +| `wt.timestamp` | UINT64 | 1 | Epoch seconds; MUST be monotonic per `wt.author`. | +| `wt.pubkey` | BYTES[32] | 1 | Ed25519 public key used to verify `wt.signature`; MUST bind to `wt.author`. | +| `wt.signature` | BYTES[64] | 1 | Ed25519 signature over `H("AMDUAT:WT\0" || canonical_bytes_without_signature)`. | +| `wt.parent` | CID (hex string) | 0–1 | Optional lineage pointer to the previous WT/1 ticket for the same author. | + +**Encoding rules** + +1. `wt.intent` MUST be encoded as a list of unique UTF-8 strings sorted + lexicographically; duplicates ⇒ `ERR_WT_INTENT_DUP`; entries not registered in + ADR-010 ⇒ `ERR_WT_INTENT_UNREGISTERED`. +2. CIDs serialize as lowercase hex strings (32 bytes → 64 hex chars) matching + `SHA-256("CAS:OBJ\0" || payload)` outputs. +3. `wt.signature` is a 64-byte Ed25519 signature; `wt.pubkey` supplies the + 32-byte verification key. The signature domain-separates with + `"AMDUAT:WT\0"` and excludes the `wt.signature` field from the canonical byte + stream hashed for verification. + +**Validation** + +1. Unknown keys ⇒ `ERR_WT_UNKNOWN_KEY`. +2. `wt.version != 1` ⇒ `ERR_WT_VERSION_UNSUPPORTED`. +3. Empty `wt.intent` ⇒ `ERR_WT_INTENT_EMPTY`. +4. `wt.timestamp` less than the prior accepted ticket for the same `wt.author` + ⇒ `ERR_WT_TIMESTAMP`. When `wt.parent` is provided, its timestamp MUST NOT + exceed the child timestamp; violations ⇒ `ERR_WT_TIMESTAMP`. +5. Signature verification failure ⇒ `ERR_WT_SIGNATURE`. +6. 
Routers MUST verify `has_pubkey(wt.author, wt.pubkey)` (or registered + equivalent) ⇒ missing edge raises `ERR_WT_KEY_UNBOUND`. +7. Unknown ADR-010 intent ⇒ `ERR_WT_INTENT_UNREGISTERED`. +8. Router policy rejection of `wt.scope` ⇒ `ERR_WT_SCOPE_UNAUTHORIZED`. +9. Provided `wt.parent` that cannot be resolved ⇒ `ERR_WT_PARENT_UNKNOWN`. +10. Policy required lineage but omitted `wt.parent` ⇒ `ERR_WT_PARENT_REQUIRED`. + +**Router integration** + +* `POST /wt` (Protected Area) accepts WT/1 payloads, verifies signatures against + `wt.pubkey`, enforces ADR-010 intent membership, validates optional + `wt.parent` lineage, and rejects timestamp regressions. +* `GET /wt/:cid` returns canonical WT/1 bytes for replay. +* `GET /wt?after=&limit=` paginates deterministically by CID + (byte-lexicographic). `after` is an exclusive bound; routers enforce + `1 ≤ limit ≤ Nmax` and MUST preserve stable replay windows. +* Responses MUST include canonical WT/1 bytes; no rewriting or reformatting is + permitted. + +**Evidence & vectors** + +* `/amduat/logs/ph04/evidence/wt1/PH04-EV-WT-001/summary.md` — validator run linking + router behaviour to vectors. +* `/amduat/vectors/ph04/wt1/` — fixtures `TV-WT-001…009` covering success, + unknown key, signature failure, timestamp regression, key unbound, intent + unregistered, parent timestamp inversion, scope policy rejection, and + unresolved parent lineage. + +### 7.10 CT/1 Header (Normative) + +CT/1 headers serialize as ADR-003 BCF/1 maps with fixed key ordering. Keys and +types: + +| Key | Type | Notes | +| --------------------- | -------- | ----- | +| `ct.version` | `UINT8` | MUST equal `1`. | +| `ct.rcs_version` | `UINT8` | RCS/1 core schema version; MUST equal `1`. | +| `ct.topology` | `CID` | CRS/1 topology or manifest CID. | +| `ct.ac` | `CID` | AC/1 descriptor CID (ADR-028). | +| `ct.dtf` | `CID` | DTF/1 policy CID (ADR-028). | +| `ct.determinism_level`| `UINT8` | `0` = D1 (bit-exact), `1` = D2 (numeric stable). 
| +| `ct.kernel_cfg` | `CID` | Opaque kernel/tolerance configuration manifest. | +| `ct.tick` | `UINT64` | Monotonically increasing replay sequence number. | +| `ct.signature` | `BYTES` | 64-byte Ed25519 signature payload. | + +**Validation** + +1. BCF decode failures ⇒ `ERR_CT_MALFORMED`. +2. Key set/order mismatches ⇒ `ERR_CT_UNKNOWN_KEY`. +3. `ct.version` or `ct.rcs_version` ≠ `1` ⇒ `ERR_CT_VERSION`. +4. `ct.determinism_level ∉ {0,1}` ⇒ `ERR_CT_DET_LEVEL`. +5. Non-canonical CID strings ⇒ `ERR_CT_CID`. +6. `ct.tick` outside `UINT64` range or non-monotone progression ⇒ + `ERR_CT_FIELD_TYPE` / `ERR_CT_TICK`. +7. `ct.signature` length mismatch or Ed25519 verification failure ⇒ + `ERR_CT_SIGNATURE`. + +**Signature rules** + +`ct.signature` signs `H("AMDUAT:CT\0" || canonical_bytes_without_signature)`. Public +keys are registered in the determinism catalogue (this section) and referenced by +`ct.kernel_cfg` as needed for tolerance disclosure. + +**Evidence & vectors** + +* `/amduat/tools/validate/ct1_validator.py` — validation helper covering CT/1, + AC/1, and DTF/1 schemas. +* `/amduat/vectors/ph05/ct1/` — fixtures `TV-CT1-001…004`, `TV-AC1-001…002`, + `TV-DTF1-001…002`. +* `/amduat/tools/ci/ct_replay.py` — replay harness producing + `/amduat/logs/ph05/evidence/ct1/PH05-EV-CT1-REPLAY-001/` (D1 parity + D2 + tolerance runs). + +### 7.11 SOS/1 Semantic Overlays (Normative) + +SOS/1 (ADR-024) attaches typed overlays to CRS Concepts or Relations via an +ADR-003 BCF/1 map signed with the `"AMDUAT:SOS\0"` domain separator. + +| Key | Type | Cardinality | Notes | +| -------------- | ------------ | ----------- | ----- | +| `sos.version` | UINT8 | 1 | MUST equal `1`. | +| `sos.subject` | CID (hex) | 1 | CRS Concept or Relation CID receiving the overlay. | +| `sos.predicate`| CID (hex) | 1 | Registered predicate concept describing overlay semantics. | +| `sos.value` | CID (hex) | 1 | Opaque payload (text capsule, BCF/1 manifest, etc.). 
| +| `sos.policy` | ENUM | 1 | `0=open`, `1=curated`, `2=compat`. | +| `sos.timestamp`| UINT64 | 1 | Epoch seconds when authored. | +| `sos.signature`| BYTES[64] | 1 | Ed25519 signature over `H("AMDUAT:SOS\0" || canonical_bytes_without_signature)`. | + +**Validation** + +1. Unknown keys ⇒ `ERR_SOS_UNKNOWN_KEY`. +2. `sos.version != 1` ⇒ `ERR_SOS_VERSION_UNSUPPORTED`. +3. `sos.predicate` MUST resolve to a registered CRS predicate ⇒ + `ERR_SOS_PREDICATE_UNREGISTERED`. +4. `sos.policy` outside `{0,1,2}` or disallowed for deployment ⇒ + `ERR_SOS_POLICY_INCOMPATIBLE`. +5. Epoch-second timestamps that regress relative to policy baseline MAY raise + `ERR_SOS_TIMESTAMP_REGRESSION`. +6. Signature verification failure ⇒ `ERR_SOS_SIGNATURE_INVALID`. +7. Compat overlays (`sos.policy = 2`) MUST reference MPR/1 + IER/1 artefacts in + certification evidence ⇒ missing references raise + `ERR_SOS_COMPAT_EVIDENCE_REQUIRED`. + +**Router integration** + +* `POST /sos` (Protected Area) validates predicate registry membership, policy + lane, timestamp discipline, and signatures. +* `GET /sos/:cid` returns canonical SOS/1 bytes for replay. +* `GET /sos?subject=&after=&limit=` paginates overlays + deterministically by CID with stable replay windows. +* Compat responses MUST surface referenced MPR/1 hashes and IER/1 fingerprints + for auditors. + +**Evidence & vectors** + +* `/amduat/logs/ph04/evidence/sos1/PH04-EV-SOS-001/summary.md` — validator run covering + `TV-SOS-001…006`. +* `/amduat/vectors/ph04/sos1/` — canonical overlay fixtures exercising success, + unregistered predicate, policy mismatch, signature failure, timestamp + regression, and compat evidence gaps. + +### 7.12 MPR/1 Model Provenance (Normative) + +MPR/1 (ADR-025 v1.0.0) captures canonical model fingerprint triples for compat +policy lanes. + +| Key | Type | Cardinality | Notes | +| ------------------ | ------------ | ----------- | ----- | +| `mpr.version` | UINT8 | 1 | MUST equal `1`. 
| +| `mpr.model_hash` | HEX | 1 | Lowercase hex digest (≥64 chars) of model artefact. | +| `mpr.weights_hash` | HEX | 1 | Lowercase hex digest (≥64 chars) of weights bundle. | +| `mpr.tokenizer_hash` | HEX | 1 | Lowercase hex digest (≥64 chars) of tokenizer assets. | +| `mpr.build_info` | CID *(optional)* | 0..1 | Immutable build metadata capsule. | +| `mpr.signature` | BYTES[64] *(optional)* | 0..1 | Ed25519 signature over `"AMDUAT:MPR\0" || canonical_bytes_without_signature`. | + +**Validation** + +1. Unknown keys ⇒ `ERR_MPR_UNKNOWN_KEY`. +2. `mpr.version != 1` ⇒ `ERR_MPR_VERSION`. +3. Missing hash fields ⇒ `ERR_MPR_MISSING_FIELD`. +4. Hash fields not lowercase hex (≥64) ⇒ `ERR_MPR_HASH_FORMAT`; zero digests ⇒ `ERR_MPR_HASH_ZERO`. +5. `mpr.build_info` malformed ⇒ `ERR_MPR_BUILD_INFO`. +6. Signature verification failure ⇒ `ERR_MPR_SIGNATURE`. + +**Evidence & vectors** + +* `/amduat/logs/ph04/evidence/mpr1/PH04-EV-MPR-001/pass.jsonl` — validator harness (`python tools/ci/run_mpr_vectors.py`) covering `TV-MPR-001…003` with summary in `summary.md`. +* `/amduat/vectors/ph04/mpr1/` — fixtures exercising valid record, missing weights hash, and signature domain mismatch. + +### 7.13 IER/1 Inference Evidence (Normative) + +IER/1 (ADR-026 v1.0.0) binds FER/1 receipts to compat policy envelopes and MPR/1 fingerprints. + +| Key | Type | Cardinality | Notes | +| ------------------------ | --------------- | ----------- | ----- | +| `ier.version` | UINT8 | 1 | MUST equal `1`. | +| `ier.fer_cid` | CID | 1 | Referenced FER/1 receipt. | +| `ier.executor_fingerprint` | CID | 1 | MUST equal linked MPR/1 CID. | +| `ier.determinism_level` | ENUM | 1 | FER/1 determinism indicator. | +| `ier.rng_seed` | HEX *(conditional)* | 0..1 | Required (hex) when determinism ≠ `D1`. | +| `ier.policy_cid` | CID | 1 | Compat policy capsule authorising run. | +| `ier.log_digest` | HEX | 1 | `H("AMDUAT:IER:LOG\0" || concat(log.sha256))`. 
| +| `ier.log_manifest` | MAP *(optional)* | 0..1 | Non-empty list of log entries with `sha256`. | +| `ier.attestations` | LIST *(optional)* | 0..1 | Policy attestations (Ed25519 signatures). | + +**Validation** + +1. Unknown keys ⇒ `ERR_IER_UNKNOWN_KEY`. +2. `ier.version != 1` ⇒ `ERR_IER_VERSION`. +3. Malformed CIDs ⇒ `ERR_IER_POLICY`. +4. `ier.executor_fingerprint` mismatch ⇒ `ERR_IER_FINGERPRINT`. +5. Missing RNG seed when determinism ≠ `D1` ⇒ `ERR_FER_RNG_REQUIRED`. +6. `ier.log_digest` mismatch or malformed manifest ⇒ `ERR_IER_LOG_HASH` / `ERR_IER_LOG_MANIFEST`. +7. Attestation payloads not raw bytes ⇒ `ERR_IER_MALFORMED`. + +**Evidence & vectors** + +* `/amduat/logs/ph04/evidence/ier1/PH04-EV-IER-001/pass.jsonl` — validator harness (`python tools/ci/run_ier_vectors.py`) covering `TV-IER-001…004` with manifest summary in `summary.md`. +* `/amduat/vectors/ph04/ier1/` — fixtures exercising success, missing RNG seed, fingerprint mismatch, and log digest mismatch. + +--- + +## 8 – Test Vectors & Conformance + +### 8.1 COR/1 & ICD/1 + +* Payload → CID (algo `0x01`). +* COR/1 streams → CID and back (round-trip identity). +* ICD/1 → `instance_id`. + +### 8.2 FCS/1 v1-min + +* Positive: `{0x30,0x31,0x32}` only, strict order, valid PCB1, acyclic. +* Negative: any pre-v1-min tags (`0x33/0x34/0x35/0x36`) ⇒ reject per §7.2. +* Arity/PCB mismatch ⇒ `ERR_PCB_ARITY_MISMATCH`. +* Cycle ⇒ `ERR_FCS_CYCLE_DETECTED`. +* Negative: legacy tags (`0x33-0x36`) → `ERR_FCS_UNKNOWN_TAG` per §7.2. + +### 8.3 FER/1 + +* Signed receipt with monotonic timestamps; verify signature, executor set ↔ parity alignment, and linkage to FCS/1. +* Negative: timestamp inversion ⇒ `ERR_FER_TIMESTAMP`; bad signature ⇒ `ERR_FER_SIGNATURE`. +* Negative: parity drift (mismatched executor keys or output digests) ⇒ `ERR_IMPL_PARITY`. +* Negative: unknown TLV tag/cardinality ⇒ `ERR_FER_UNKNOWN_TAG`. + +### 8.4 FCT/1 + +* Multiple FER/1 receipts for same function; verify attestation coverage by policy. 
+* Negative: mismatched receipt function ⇒ `ERR_FCT_RECEIPT_MISMATCH`. +* Negative: missing attestation when policy ≠ Open ⇒ `ERR_FCT_ATTESTATION_REQUIRED`. + +### 8.5 FPD/1 + +* Deterministic reconstruction of `fpd.digest` over `{FCT/1 bytes, FER/1 receipts, governance edge capsule}` on repeated runs. +* Negative: perturbation of member ordering ⇒ `ERR_FPD_DIGEST`. +* Negative: timestamp regression versus FER receipts or parent digest ⇒ `ERR_FPD_TIMESTAMP`. + +**CI Requirements** + +* Import/export **byte-identity** round-trip for COR/1/FCS/1/FER/1. +* Canonical TLV/BCF ordering across descriptors. +* Multi-platform reproducibility (≥3) including signature verification parity. +* Timing evidence captured per SRS FR-020 (deterministic envelope). +* Federation digest fixture verifies stable FPD/1 CID under `tools/ci/fct_publish_check.py`. + +--- + +## 9. Security Considerations + +* Domain separation strings MUST be exact. +* Hash **exact payload bytes**, never decoded structures. +* Canonical rejection prevents ambiguous encodings. +* Certification places policy/intent in signed FCT/1, not in execution recipes. + +--- + +## 10. Change Management + +* **Behavioural semantics are in SRS.** +* Changes here require ADR + CCP approval. +* Versioning follows semantic versioning of encodings. +* On approval, update IDX and SRS references accordingly. + +--- + +## 11. ByteStore API & Persistence Discipline + +ByteStore is the canonical persistence boundary layered over COR/1 and ICD/1. +Implementations **must** honour the behaviours in this section; deviations are +governed by ADR-030. 
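As a non-normative illustration of the §9 rules (exact domain separators, hashing the exact payload bytes), the Python sketch below derives a content identifier in the form given in §7.8.2 and §7.9: `SHA-256("CAS:OBJ\0" || payload)`, rendered as 64 lowercase hex characters. The helper names `domain_hash` and `cid_hex` are hypothetical and not part of any AMDUAT API. The sketch assumes `H` is SHA-256 and leaves out policy limits and persistence.

```python
import hashlib

# Separators quoted from this document; the trailing NUL byte is part of each one.
CAS_OBJ_DOMAIN = b"CAS:OBJ\x00"
WT_DOMAIN = b"AMDUAT:WT\x00"


def domain_hash(domain: bytes, payload: bytes) -> bytes:
    """Return SHA-256(domain || payload) over the exact bytes supplied.

    Hypothetical helper; assumes H is SHA-256.
    """
    if not isinstance(payload, (bytes, bytearray)):
        # Decoded structures (str, dict, ...) must be serialized canonically first.
        raise TypeError("payload must be raw bytes")
    return hashlib.sha256(domain + bytes(payload)).digest()


def cid_hex(payload: bytes) -> str:
    """Render the 32-byte digest as 64 lowercase hex characters."""
    return domain_hash(CAS_OBJ_DOMAIN, payload).hex()


if __name__ == "__main__":
    payload = b"example payload"
    print(cid_hex(payload))
    # The same bytes under a different separator produce a different hash input.
    print(domain_hash(WT_DOMAIN, payload).hex())
```

Because the separator is prepended to the raw bytes, byte-identical payloads always map to the same identifier, and a signature preimage built with another separator never shares a hash input with a stored object.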
+ +### 11.1 API Surface + +| API | Signature | Behaviour | Error Surfaces (ADR-006) | +| -------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------- | ---------------------------------------------------- | +| `put` | `(payload: bytes) → cid_hex` | Persist raw payload under CID derived from `H("CAS:OBJ\0" || payload)`. | `ERR_POLICY_SIZE`, `ERR_IDENTITY_MISMATCH` | +| `put_stream` | `(chunks: Iterable[bytes]) → cid_hex` | Deterministic chunked ingest; concatenated bytes hash to the same CID as `put`. | `ERR_STREAM_ORDER`, `ERR_STREAM_TRUNCATED` | +| `import_cor` | `(envelope: bytes) → cid_hex` | Validate COR/1, enforce policy, persist canonical envelope without re-encoding. | `ERR_POLICY_SIZE`, COR/1 decoder errors | +| `export_cor` | `(cid_hex: str) → envelope` | Return stored COR/1 bytes; must match the original import byte-for-byte. | `ERR_STORE_MISSING`, `ERR_IDENTITY_MISMATCH` | +| `get` | `(cid_hex: str) → bytes` | Return stored bytes (payload or COR envelope) exactly as persisted. | `ERR_STORE_MISSING` | +| `stat` | `(cid_hex: str) → {present: bool, size: int}` | Probe object presence and payload/envelope size without mutating state. | `ERR_STORE_MISSING` (absence reported via `present`) | +| `assert_area_isolation` | `(public_root: Path, secure_root: Path) → None` | Enforce SA/PA separation; raise if roots overlap or share ancestry. | `ERR_AREA_VIOLATION` | + +### 11.2 Deterministic Identity + +Canonical identity is derived per COR/1/SRS: + +``` +cid = algo_id || H("CAS:OBJ\0" || payload) +``` + +`algo_id` defaults to `0x01` (SHA-256). ByteStore **must** reuse the exact +domain separator and hash to remain compatible with CAS and DDS §1. + +### 11.3 COR/1 Round-Trip Identity + +`import_cor()` decodes the envelope, enforces policy (size ≤ ICD/1 +`max_object_size`), and persists the canonical bytes. 
`export_cor()` returns the +exact stored envelope; re-encoding is forbidden. Derived CID **must** equal the +envelope’s CID (DDS §2.5, SRS FR-BS-004). + +### 11.4 Atomic fsync Ladder + +All writes follow the deterministic ladder: + +1. Write payload/envelope to a unique `.tmp-` file in the shard. +2. `fsync(tmp)` to guarantee payload durability. +3. `rename(tmp, final)`. +4. `fsync(shard directory)` and then `fsync(ByteStore root)`. + +Crash-window simulation is exposed via `AMDUAT_BYTESTORE_CRASH_STEP` (“before_rename”). +Implementations **must** honour the hook and leave PA consistent on recovery +(DDS §11.8; vectors TV-BS-005, evidence bundle PH05-EV-BS-001). + +### 11.5 SA/PA Isolation & Pathing + +Public area (PA) payloads live under case-stable two-level fan-out (`/aa/bb/cid…`). +Secure area (SA) metadata is held outside the PA tree. `assert_area_isolation()` +enforces: + +* `public_root != secure_root` +* neither root is an ancestor of the other + +Violations raise `ERR_AREA_VIOLATION` and **must** be surfaced by callers. + +### 11.6 Chunked Ingest Determinism & Policy + +`put_stream()` concatenates byte chunks in order, rejecting non-bytes input or +missing data. The resulting CID **must** equal `put(payload)` for the same +payload (SRS FR-BS-005). ByteStore enforces ICD/1 `max_object_size` prior to +persisting data; exceeding the limit raises `ERR_POLICY_SIZE`. + +### 11.7 Error Mapping + +| Condition | Error Code | Notes | +| ---------------------------------- | --------------------- | -------------------------------------------------------------- | +| Payload exceeds policy limit | `ERR_POLICY_SIZE` | ICD/1 `max_object_size` (ADR-006 policy lane). | +| Streaming chunk type/order invalid | `ERR_STREAM_ORDER` | Non-bytes or out-of-order chunks (deterministic rejection). | +| Streaming missing payload | `ERR_STREAM_TRUNCATED`| Zero-length stream without payload. 
| +| Stored bytes mismatch CID | `ERR_IDENTITY_MISMATCH` | Raised when existing bytes conflict with derived identity. | +| SA/PA overlap | `ERR_AREA_VIOLATION` | Shared roots or ancestry (secure/public crossing). | +| Crash-window hook triggered | `ERR_CRASH_SIMULATION`| Simulated crash prior to rename/fsync ladder completion. | +| Missing object | `ERR_STORE_MISSING` | Reported when an object path is absent. | + +All other errors bubble from COR/1 decoding and map to existing ADR-006 codes +(see §2.7). + +### 11.8 Conformance & Evidence + +* Vectors: `/amduat/vectors/ph05/bytestore/` (`TV-BS-001…005`). +* Runner: `/amduat/tools/ci/bs_check.py` (dual-run determinism; emits JSONL). +* Evidence: `/amduat/logs/ph05/evidence/bytestore/PH05-EV-BS-001/` (runA/runB + + crash summary). +* Linked ADR: ADR-030 (ByteStore Persistence Contract). + +--- + +## Appendix A — Surface Version Table + +| Surface | Version | Notes | +| ------- | ------- | ----- | +| FCS/1 | v1-min | Execution-only descriptor (ADR-016); governance fields live in FCT/1. | +| FER/1 | v1.1 | Parity-first receipts with run_id dedup, executor fingerprints, typed logs, RNG envelope (ADR-017). | +| FCT/1 | v1.0 | Certification transactions binding policy/intent/attestations; publishes FER/1 receipts. | +| FPD/1 | v1.0 | Single-digest publication capsule linking FCT/1 and FER/1 sets. | + +--- + +**End of DDS 0.5.0** + +--- + +## Document History + +* 0.2.1 (2025-10-26) — Updated Phase Pack references; byte semantics unchanged; ADR-012 no-normalization. + +* 0.2.2 (2025-10-26) — Promoted PH01 design surfaces to Approved; synchronized anchors. + +* 0.2.3 (2025-10-27) — Marked DDS scope as PH01-only and referenced FPS/1 surfaces. + +* **0.2.4 (2025-11-14):** Added FCS/1 & PCB1 TLVs plus FER/1 receipt and FCT/1 transaction schemas with rejection mapping. + +* **0.2.5 (2025-11-15):** Registered PCB1 header invariants and arity/cycle validation errors. 
 + +* **0.2.6 (2025-11-19):** Registered `ERR_EXEC_TIMEOUT` for deterministic timing envelope. + +* **0.3.0 (2025-11-02):** Trimmed **FCS/1 to v1-min** (execution recipe only: `function_ptr`, `parameter_block`, `arity`). Moved **intent/roles/scope/policy** to **FCT/1**; clarified provenance lives in **FER/1**. Added rejection guidance for legacy FCS tags. + +* **0.3.1 (2025-11-20):** Registered `ERR_FCS_UNKNOWN_TAG`; clarified that any legacy governance tag in FCS/1 is a hard rejection. No other layout changes. +* **0.3.2 (2025-11-21):** Adopted parity-first FER/1 TLVs (executor set, parity vector, context/witness hooks), registered `ERR_IMPL_PARITY` and `ERR_FER_UNKNOWN_TAG`, and refreshed conformance guidance. +* **0.3.3 (2025-11-22):** Added FPD/1 publication digest schema, registered federation digest/timestamp errors, and wired CI fixtures to deterministic publish checks. + +* **0.3.5 (2025-11-07):** Added surface version table and aligned FER/1 v1.1 maintenance metadata for Phase 04 handoff. + +* **0.3.6 (2025-11-08):** Seeded PH04 linkage & semantic placeholder section (DDS §7.8). + +* **0.3.7 (2025-11-08):** Seeded FLS/1 placeholder TLV table aligned with ADR-018 v0.3.0. +* **0.3.8 (2025-11-08):** Registered FLS/1 TLV registry (0x60–0x65), error mapping, and conformance vectors aligned with ADR-018 v0.4.0. +* **0.3.9 (2025-11-09):** Locked CRS/1 concept/relation TLVs and registered FLS payload CID/type errors with conformance evidence. + +* **0.4.0 (2025-11-08):** Promoted §7.8 FLS/1 & CRS/1 TLVs with error mapping and GS/1 snapshot evidence. + +* **0.4.1 (2025-11-09):** Extended CRS predicate rules and mapped new validation errors. +* **0.4.2 (2025-11-09):** Registered router error codes (`ERR_FLS_UNKNOWN_TAG`, `ERR_FLS_TAG_ORDER`, `ERR_FLS_SIGNATURE`) and FPD parent-policy errors with GS diff evidence pointer. +* **0.4.3 (2025-11-09):** Added WT/1 intake layout, validation errors, and router API integration (§7.9). 
+* **0.4.4 (2025-11-20):** Refined WT/1 (§7.9) with `wt.pubkey`, signature preimage exclusion, lineage/policy errors, and + expanded validator vector coverage. + +* **0.4.6 (2025-11-22):** WT/1 and SOS/1 conformance evidence sealed via PH04-M4/M5 audit bundles. +* **0.4.5 (2025-11-21):** Registered SOS/1 overlays (§7.10) with compat evidence enforcement, aligned WT/1 error mapping (`ERR_WT_KEY_UNBOUND`, `ERR_WT_INTENT_UNREGISTERED`, `ERR_WT_PARENT_REQUIRED`), and expanded vector coverage to `TV-WT-001…009`. + +* **0.4.7 (2025-11-23):** Documented MPR/1 and IER/1 schemas, error surfaces, and validator evidence for compat policy lane. + +* **0.4.8 (2025-11-24):** Added §7.10 CT/1 header schema with error codes and renumbered downstream sections for PH05 replay. + +* **0.5.0 (2025-11-11):** Added §11 ByteStore API & Persistence discipline covering API surface, fsync ladder, SA/PA isolation, streaming determinism, and ADR-006 error mapping. diff --git a/tier1/enc-asl-core-index-1.md b/tier1/enc-asl-core-index-1.md new file mode 100644 index 0000000..062aedb --- /dev/null +++ b/tier1/enc-asl-core-index-1.md @@ -0,0 +1,357 @@ +# ENC/ASL-CORE-INDEX/1 — Encoding Specification for ASL Core Index + +Status: Draft +Owner: Niklas Rydberg +Version: 0.1.0 +SoT: No +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [encoding, index, deterministic] + + + +**Document ID:** `ENC/ASL-CORE-INDEX/1` +**Layer:** Index Encoding Profile (on top of ASL/1-CORE-INDEX + ASL/STORE-INDEX/1) + +**Depends on (normative):** + +* `ASL/1-CORE-INDEX` — semantic index model +* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts + +**Informative references:** + +* `ASL/LOG/1` — append-only log semantics + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. 
TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 1. Purpose + +This document defines the **exact encoding of ASL index segments** and records for storage and interoperability. + +It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**. +Variable-length digest requirements are defined in ASL/1-CORE-INDEX (`tier1/asl-core-index.md`). +This document incorporates the federation encoding addendum. + +It is intended for: + +* C libraries +* Tools +* API frontends +* Memory-mapped access + +It does **not** define: + +* Index semantics (see ASL/1-CORE-INDEX) +* Store lifecycle behavior (see ASL-STORE-INDEX) +* Acceleration semantics (see ASL/INDEX-ACCEL/1) +* TGK edge semantics or encodings (see `TGK/1` and `TGK/1-CORE`) +* Federation semantics (see federation/domain policy layers) + +--- + +## 2. Encoding Principles + +1. **Little-endian** representation +2. **Fixed-width fields** for deterministic access +3. **No pointers or references**; all offsets are file-relative +4. **Packed structures**; no compiler-introduced padding +5. **Forward compatibility** via version field +6. **CRC or checksum protection** for corruption detection +7. **Federation metadata** embedded in index records for deterministic cross-domain replay + +All multi-byte integers are little-endian unless explicitly noted. + +--- + +## 3. 
Segment Layout + +Each index segment file is laid out as follows: + +``` ++------------------+ +| SegmentHeader | ++------------------+ +| BloomFilter[] | (optional, opaque to semantics) ++------------------+ +| IndexRecord[] | ++------------------+ +| DigestBytes[] | ++------------------+ +| ExtentRecord[] | ++------------------+ +| SegmentFooter | ++------------------+ +``` + +* **SegmentHeader**: fixed-size, mandatory +* **BloomFilter**: optional, opaque, segment-local +* **IndexRecord[]**: array of index entries +* **DigestBytes[]**: concatenated digest bytes referenced by IndexRecord +* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord +* **SegmentFooter**: fixed-size, mandatory + +Offsets in the header define locations of Bloom filter and index records. + +### 3.1 Fixed Constants and Sizes + +**Magic bytes (SegmentHeader.magic):** `ASLIDX03` + +* ASCII bytes: `0x41 0x53 0x4c 0x49 0x44 0x58 0x30 0x33` +* Little-endian uint64 value: `0x33305844494c5341` + +**Current encoding version:** `3` + +**Fixed struct sizes (bytes):** + +* `SegmentHeader`: 112 +* `IndexRecord`: 48 +* `ExtentRecord`: 16 +* `SegmentFooter`: 24 + +**Section packing (no gaps):** + +* `records_offset = header_size + bloom_size` +* `digests_offset = records_offset + (record_count * sizeof(IndexRecord))` +* `extents_offset = digests_offset + digests_size` +* `SegmentFooter` starts at `extents_offset + (extent_count * sizeof(ExtentRecord))` + +All offsets MUST be file-relative, 8-byte aligned, and point to their respective arrays exactly as above. + +### 3.2 Federation Defaults + +This encoding integrates federation metadata into segments and records. + +Legacy segments without federation fields MUST be treated as: + +* `segment_domain_id = local` +* `segment_visibility = internal` +* `domain_id = local` +* `visibility = internal` +* `has_cross_domain_source = 0` +* `cross_domain_source = 0` + +--- + +## 4. 
SegmentHeader + +```c +#pragma pack(push,1) +typedef struct { + uint64_t magic; // Unique magic number identifying segment file type + uint16_t version; // Encoding version + uint16_t shard_id; // Optional shard identifier + uint32_t header_size; // Total size of header including fields below + + uint64_t snapshot_min; // Minimum snapshot ID for which segment entries are valid + uint64_t snapshot_max; // Maximum snapshot ID + + uint64_t record_count; // Number of index entries + uint64_t records_offset; // File offset of IndexRecord array + + uint64_t bloom_offset; // File offset of bloom filter (0 if none) + uint64_t bloom_size; // Size of bloom filter (0 if none) + + uint64_t digests_offset; // File offset of DigestBytes array + uint64_t digests_size; // Total size in bytes of DigestBytes + + uint64_t extents_offset; // File offset of ExtentRecord array + uint64_t extent_count; // Total number of ExtentRecord entries + + uint32_t segment_domain_id; // Domain owning this segment + uint8_t segment_visibility; // 0 = internal, 1 = published + uint8_t federation_version; // 0 if unused + uint16_t reserved0; // Reserved (must be 0) + + uint64_t flags; // Segment flags (must be 0 in version 3) +} SegmentHeader; +#pragma pack(pop) +``` + +**Notes:** + +* `magic` ensures the reader validates the segment type. +* `version` allows forward-compatible extension. +* `snapshot_min` / `snapshot_max` are reserved for future use and carry no visibility semantics in version 3. +* `segment_domain_id` identifies the owning domain for all records in this segment. +* `segment_visibility` MUST be the maximum visibility of all records in the segment. +* `federation_version` MUST be `0` unless a future federation encoding version is defined. +* `reserved0` MUST be `0`. +* `header_size` MUST be `112`. +* `flags` MUST be `0`. Readers MUST reject non-zero values. + +--- + +## 5. 
IndexRecord + +```c +#pragma pack(push,1) +typedef struct { + uint32_t hash_id; // Hash algorithm identifier + uint16_t digest_len; // Digest length in bytes + uint16_t reserved0; // Reserved for alignment/future use + uint64_t digest_offset; // File offset of digest bytes for this entry + + uint64_t extents_offset; // File offset of first ExtentRecord for this entry + uint32_t extent_count; // Number of ExtentRecord entries for this artifact + uint32_t total_length; // Total artifact length in bytes + + uint32_t domain_id; // Domain identifier for this artifact + uint8_t visibility; // 0 = internal, 1 = published + uint8_t has_cross_domain_source; // 0 or 1 + uint16_t reserved1; // Reserved (must be 0) + + uint32_t cross_domain_source; // Source domain if imported (valid if has_cross_domain_source=1) + uint32_t flags; // Optional flags (tombstone, reserved, etc.) +} IndexRecord; +#pragma pack(pop) +``` + +**Notes:** + +* `hash_id` + `digest_len` + `digest_offset` store the artifact key deterministically. +* `digest_len` MUST be explicit in the encoding and MUST match the length implied by `hash_id` and StoreConfig. +* `digest_offset` MUST be within `[digests_offset, digests_offset + digests_size)`. +* `extents_offset` references the first ExtentRecord for this entry. +* `extent_count` defines how many extents to read (may be 0 for tombstones; see ASL/1-CORE-INDEX in `tier1/asl-core-index.md`). +* `total_length` is the exact artifact size in bytes. +* Flags may indicate tombstone or other special status. +* `domain_id` MUST be present and stable across replay. +* `visibility` MUST be `0` or `1`. +* `has_cross_domain_source` MUST be `0` or `1`. +* `cross_domain_source` MUST be `0` when `has_cross_domain_source=0`. +* `reserved0` and `reserved1` MUST be `0`. + +### 5.1 IndexRecord Flags + +``` +IDX_FLAG_TOMBSTONE = 0x00000001 +``` + +* If `IDX_FLAG_TOMBSTONE` is set, then `extent_count`, `total_length`, and `extents_offset` MUST be `0`. 
+* All other bits are reserved and MUST be `0`. Readers MUST reject unknown flag bits. +* Tombstones MUST retain valid `domain_id` and `visibility` to ensure domain-local shadowing. + +--- + +## 6. ExtentRecord + +```c +#pragma pack(push,1) +typedef struct { + uint64_t block_id; // ASL block identifier + uint32_t offset; // Offset within block + uint32_t length; // Length of this extent +} ExtentRecord; +#pragma pack(pop) +``` + +**Notes:** + +* Extents are concatenated in order to produce artifact bytes. +* `extent_count` MUST be > 0 for visible (non-tombstone) entries. +* `total_length` MUST equal the sum of `length` across the extents. +* `offset` and `length` MUST describe a contiguous slice within the referenced block. + +--- + +## 7. SegmentFooter + +```c +#pragma pack(push,1) +typedef struct { + uint64_t crc64; // CRC over header + bloom filter + index records + digest bytes + extents + uint64_t seal_snapshot; // Snapshot ID when segment was sealed + uint64_t seal_time_ns; // High-resolution seal timestamp +} SegmentFooter; +#pragma pack(pop) +``` + +**Notes:** + +* CRC ensures corruption detection during reads, covering all segment contents except the footer. +* Seal information allows deterministic reconstruction of CURRENT state. + +--- + +## 8. DigestBytes + +* Digest bytes are concatenated in a single byte array. +* Each IndexRecord references its digest via `digest_offset` and `digest_len`. +* The digest bytes MUST be immutable once the segment is sealed. + +--- + +## 9. Bloom Filter + +* The bloom filter is **optional** and opaque to semantics. +* Its purpose is **lookup acceleration**. +* Must be deterministic: same entries → same bloom representation. +* Segment-local only; no global assumptions. + +--- + +## 10. Versioning and Compatibility + +* `version` field in header defines encoding. +* Readers must **reject unsupported versions**. +* New fields may be added in future versions only via version bump. 
+* Existing fields must **never change meaning**. +* Version `1` implies single-extent layout (legacy). +* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`. +* Version `3` introduces variable-length digest bytes with `hash_id` and `digest_offset`. +* Version `3` also integrates federation metadata in segment headers and index records. + +### 10.1 Federation Compatibility Rules + +* Legacy segments without federation fields are treated as local/internal (see 3.2). +* Tombstones MUST NOT shadow artifacts from other domains; domain matching is required. + +--- + +## 11. Alignment and Packing + +* All structures are **packed** (no compiler padding) +* Multi-byte integers are **little-endian** +* Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`. +* Extents are accessed via `IndexRecord.extents_offset` relative to the file base. + +--- + +## 12. Summary of Encoding Guarantees + +The ENC-ASL-CORE-INDEX specification ensures: + +1. **Deterministic layout** across platforms +2. **Direct mapping from semantic model** (ArtifactKey → ArtifactLocation) +3. **Immutability of sealed segments** +4. **Integrity validation** via CRC +5. **Forward-compatible extensibility** + +--- + +## 13. Relationship to Other Layers + +| Layer | Responsibility | +| ------------------ | ---------------------------------------------------------- | +| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping | +| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts | +| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) | +| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence | + +This completes the stack: **semantics → store behavior → encoding**. 
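
As an informative illustration of the section packing rules in §3.1, the sketch below derives the section offsets of a version 3 segment from its section sizes and rebuilds the little-endian magic value from its ASCII bytes. The type `asl_segment_layout` and both function names are hypothetical helpers introduced for this example and are not part of the encoding; overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed struct sizes from section 3.1 (encoding version 3). */
enum {
    ASL_HEADER_SIZE        = 112,
    ASL_INDEX_RECORD_SIZE  = 48,
    ASL_EXTENT_RECORD_SIZE = 16,
    ASL_FOOTER_SIZE        = 24
};

/* "ASLIDX03" interpreted as a little-endian uint64. */
#define ASL_SEGMENT_MAGIC UINT64_C(0x33305844494c5341)

/* Hypothetical helper type: not defined by the specification. */
typedef struct {
    uint64_t records_offset;
    uint64_t digests_offset;
    uint64_t extents_offset;
    uint64_t footer_offset;
    uint64_t file_size;
} asl_segment_layout;

/* Apply the gap-free packing rules: each section starts where the
 * previous one ends. Overflow checks are intentionally omitted. */
asl_segment_layout asl_segment_layout_compute(uint64_t bloom_size,
                                              uint64_t record_count,
                                              uint64_t digests_size,
                                              uint64_t extent_count)
{
    asl_segment_layout l;
    l.records_offset = ASL_HEADER_SIZE + bloom_size;
    l.digests_offset = l.records_offset + record_count * ASL_INDEX_RECORD_SIZE;
    l.extents_offset = l.digests_offset + digests_size;
    l.footer_offset  = l.extents_offset + extent_count * ASL_EXTENT_RECORD_SIZE;
    l.file_size      = l.footer_offset + ASL_FOOTER_SIZE;
    return l;
}

/* Assemble the first 8 file bytes into a uint64 the way a
 * little-endian reader would (byte 0 is least significant). */
uint64_t asl_magic_from_bytes(const uint8_t b[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

Under these rules, a segment with no bloom filter, 2 index records, 64 digest bytes, and 3 extents places its records at offset 112, digests at 208, extents at 272, and the footer at 320, for a 344-byte file.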
diff --git a/tier1/enc-asl-log-1.md b/tier1/enc-asl-log-1.md new file mode 100644 index 0000000..f37716a --- /dev/null +++ b/tier1/enc-asl-log-1.md @@ -0,0 +1,248 @@ +# ENC/ASL-LOG/1 — Encoding Specification for ASL Append-Only Log + +Status: Draft +Owner: Niklas Rydberg +Version: 0.1.0 +SoT: No +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [encoding, log, deterministic] + + + +**Document ID:** `ENC/ASL-LOG/1` +**Layer:** Log Encoding Profile (on top of ASL/LOG/1) + +**Depends on (normative):** + +* `ASL/LOG/1` — semantic log behavior and replay rules + +**Informative references:** + +* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 1. Purpose + +This document defines the **exact encoding** of the ASL append-only log. + +It translates **ASL/LOG/1** semantics into a deterministic **bytes-on-disk** format. + +It does **not** define log semantics (see `ASL/LOG/1`). + +--- + +## 2. Encoding Principles + +1. **Little-endian** integers +2. **Packed structures** (no compiler padding) +3. **Forward-compatible** versioning via header fields +4. **Deterministic serialization**: identical log content -> identical bytes +5. **Hash-chained integrity** as defined by ASL/LOG/1 + +--- + +## 3. 
Log File Layout + +``` ++----------------+ +| LogHeader | ++----------------+ +| LogRecord[] | ++----------------+ +``` + +* **LogHeader**: fixed-size, mandatory, begins file +* **LogRecord[]**: append-only entries, variable number + +--- + +## 4. LogHeader + +```c +#pragma pack(push,1) +typedef struct { + uint64_t magic; // "ASLLOG01" + uint32_t version; // Encoding version (1) + uint32_t header_size; // Total header bytes including this struct + uint64_t flags; // Reserved, must be zero for v1 +} LogHeader; +#pragma pack(pop) +``` + +Notes: + +* `magic` is ASCII bytes: `0x41 0x53 0x4c 0x4c 0x4f 0x47 0x30 0x31` +* `version` allows forward compatibility + +--- + +## 5. LogRecord Envelope + +Each record is encoded as: + +```c +#pragma pack(push,1) +typedef struct { + uint64_t logseq; // Monotonic sequence number + uint32_t record_type; // Record type tag + uint32_t payload_len; // Payload byte length + uint8_t payload[payload_len]; + uint8_t record_hash[32]; // Hash-chained integrity (SHA-256) +} LogRecord; +#pragma pack(pop) +``` + +Hash chain rule (normative): + +``` +record_hash = H(prev_record_hash || logseq || record_type || payload_len || payload) +``` + +* `prev_record_hash` is the previous record's `record_hash` +* For the first record, `prev_record_hash` is 32 bytes of zero +* `H` is SHA-256 for v1 + +Readers MUST skip unknown `record_type` values using `payload_len` and MUST +continue replay without failure. + +--- + +## 6. Record Type IDs (v1) + +These type IDs bind the ASL/LOG/1 semantics to bytes-on-disk: + +| Type ID | Record Type | +| ------- | ------------------ | +| 0x01 | SEGMENT_SEAL | +| 0x10 | TOMBSTONE | +| 0x11 | TOMBSTONE_LIFT | +| 0x20 | SNAPSHOT_ANCHOR | +| 0x30 | ARTIFACT_PUBLISH | +| 0x31 | ARTIFACT_UNPUBLISH | + +--- + +## 6.1 Payload Schemas (v1) + +All payloads are little-endian and packed. Variable-length fields are encoded +inline and accounted for by `payload_len`. 
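
The envelope layout in §5 implies that every record occupies `16 + payload_len + 32` bytes, which is what allows a reader to step over record types it does not know. The following sketch walks a buffer holding the `LogRecord[]` region (the bytes after `LogHeader`) and counts known and skipped records. It is illustrative only: the function names are assumptions made for this example, and hash-chain verification is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    LOG_RECORD_PREFIX_LEN = 16, /* logseq (8) + record_type (4) + payload_len (4) */
    LOG_RECORD_HASH_LEN   = 32  /* trailing record_hash */
};

/* Read a little-endian uint32 regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the type ID is listed in the v1 table of section 6. */
int asl_log_is_known_type(uint32_t record_type)
{
    switch (record_type) {
    case 0x01: case 0x10: case 0x11:
    case 0x20: case 0x30: case 0x31:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical walker: steps through records using payload_len only.
 * Returns 0 on success, -1 if a record is truncated. */
int asl_log_walk(const uint8_t *buf, size_t len, size_t *known, size_t *skipped)
{
    size_t pos = 0;
    *known = 0;
    *skipped = 0;
    while (pos < len) {
        if (len - pos < LOG_RECORD_PREFIX_LEN + LOG_RECORD_HASH_LEN)
            return -1;
        uint32_t record_type = read_le32(buf + pos + 8);
        uint32_t payload_len = read_le32(buf + pos + 12);
        size_t room = len - pos - LOG_RECORD_PREFIX_LEN - LOG_RECORD_HASH_LEN;
        if (payload_len > room)
            return -1;
        if (asl_log_is_known_type(record_type))
            ++*known;
        else
            ++*skipped; /* unknown type: skip it, do not fail replay */
        pos += LOG_RECORD_PREFIX_LEN + (size_t)payload_len + LOG_RECORD_HASH_LEN;
    }
    return 0;
}
```

A production reader would additionally recompute `record_hash` for every record, including skipped ones, since the chain covers all records in sequence.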
+ +### 6.1.1 ArtifactRef + +```c +#pragma pack(push,1) +typedef struct { + uint32_t hash_id; // Hash algorithm identifier + uint16_t digest_len; // Digest length in bytes + uint16_t reserved0; // Must be 0 + uint8_t digest[digest_len]; +} ArtifactRef; +#pragma pack(pop) +``` + +Notes: + +* `digest_len` MUST be > 0. +* If StoreConfig fixes the hash, `digest_len` MUST match that hash's length. + +### 6.1.2 SEGMENT_SEAL (Type 0x01) + +```c +#pragma pack(push,1) +typedef struct { + uint64_t segment_id; // Store-local segment identifier + uint8_t segment_hash[32]; // SHA-256 over the segment file bytes +} SegmentSealPayload; +#pragma pack(pop) +``` + +### 6.1.3 TOMBSTONE (Type 0x10) + +```c +#pragma pack(push,1) +typedef struct { + ArtifactRef artifact; + uint32_t scope; // Opaque to ASL/LOG/1 + uint32_t reason_code; // Opaque to ASL/LOG/1 +} TombstonePayload; +#pragma pack(pop) +``` + +### 6.1.4 TOMBSTONE_LIFT (Type 0x11) + +```c +#pragma pack(push,1) +typedef struct { + ArtifactRef artifact; + uint64_t tombstone_logseq; // logseq of the tombstone being lifted +} TombstoneLiftPayload; +#pragma pack(pop) +``` + +### 6.1.5 SNAPSHOT_ANCHOR (Type 0x20) + +```c +#pragma pack(push,1) +typedef struct { + uint64_t snapshot_id; + uint8_t root_hash[32]; // Hash of snapshot-visible state +} SnapshotAnchorPayload; +#pragma pack(pop) +``` + +### 6.1.6 ARTIFACT_PUBLISH (Type 0x30) + +```c +#pragma pack(push,1) +typedef struct { + ArtifactRef artifact; +} ArtifactPublishPayload; +#pragma pack(pop) +``` + +### 6.1.7 ARTIFACT_UNPUBLISH (Type 0x31) + +```c +#pragma pack(push,1) +typedef struct { + ArtifactRef artifact; +} ArtifactUnpublishPayload; +#pragma pack(pop) +``` + +--- + +## 7. Versioning Rules + +* `version = 1` for this specification. +* New record types MAY be added without bumping the version. +* Layout changes to `LogHeader` or `LogRecord` require a new version. + +--- + +## 8. 
Relationship to Other Layers + +| Layer | Responsibility | +| ---------------- | ------------------------------------------------ | +| ASL/LOG/1 | Semantic log behavior and replay rules | +| ASL-STORE-INDEX | Store lifecycle and snapshot/log contracts | +| ENC-ASL-LOG | Exact byte layout for log encoding (this doc) | +| ENC-ASL-CORE-INDEX | Exact byte layout for index segments | diff --git a/tier1/enc-asl-tgk-exec-plan-1.md b/tier1/enc-asl-tgk-exec-plan-1.md new file mode 100644 index 0000000..4f8a0c6 --- /dev/null +++ b/tier1/enc-asl-tgk-exec-plan-1.md @@ -0,0 +1,202 @@ +# ENC/ASL-TGK-EXEC-PLAN/1 — Execution Plan Encoding + +Status: Draft +Owner: Architecture +Version: 0.1.0 +SoT: No +Last Updated: 2025-01-17 +Linked Phase Pack: N/A +Tags: [encoding, execution, tgk] + + + +**Document ID:** `ENC/ASL-TGK-EXEC-PLAN/1` +**Layer:** L2 — Execution plan encoding (bytes-on-disk) + +**Depends on (normative):** + +* `ASL/TGK-EXEC-PLAN/1` + +**Informative references:** + +* `ENC/ASL-CORE-INDEX/1` + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119. + +ENC/ASL-TGK-EXEC-PLAN/1 defines the byte-level encoding for serialized execution plans. It does not define operator semantics. + +--- + +## 1. 
Operator Type Enumeration + +```c +typedef enum { + OP_SEGMENT_SCAN, + OP_INDEX_FILTER, + OP_MERGE, + OP_PROJECTION, + OP_TGK_TRAVERSAL, + OP_AGGREGATION, + OP_LIMIT_OFFSET, + OP_SHARD_DISPATCH, + OP_SIMD_FILTER, + OP_TOMBSTONE_SHADOW +} operator_type_t; +``` + +--- + +## 2. Operator Flags + +```c +typedef enum { + OP_FLAG_NONE = 0x00, + OP_FLAG_PARALLEL = 0x01, // shard or SIMD capable + OP_FLAG_OPTIONAL = 0x02 // optional operator (acceleration) +} operator_flags_t; +``` + +--- + +## 3. Snapshot Range Structure + +```c +typedef struct { + uint64_t logseq_min; // inclusive + uint64_t logseq_max; // inclusive +} snapshot_range_t; +``` + +--- + +## 4. Operator Parameter Union + +```c +typedef struct { + // SegmentScan parameters + struct { + uint8_t is_asl_segment; // 1 = ASL, 0 = TGK + uint64_t segment_start_id; + uint64_t segment_end_id; + } segment_scan; + + // IndexFilter parameters + struct { + uint32_t artifact_type_tag; + uint8_t has_type_tag; + uint32_t edge_type_key; + uint8_t has_edge_type; + uint8_t role; // 0=none, 1=from, 2=to, 3=both + } index_filter; + + // Merge parameters + struct { + uint8_t deterministic; // 1 = logseq ascending + canonical key + } merge; + + // Projection parameters + struct { + uint8_t project_artifact_id; + uint8_t project_tgk_edge_id; + uint8_t project_node_id; + uint8_t project_type_tag; + } projection; + + // TGKTraversal parameters + struct { + uint64_t start_node_id; + uint32_t traversal_depth; + uint8_t direction; // 1=from, 2=to, 3=both + } tgk_traversal; + + // Aggregation parameters + struct { + uint8_t agg_count; + uint8_t agg_union; + uint8_t agg_sum; + } aggregation; + + // LimitOffset parameters + struct { + uint64_t limit; + uint64_t offset; + } limit_offset; + + // ShardDispatch & SIMDFilter are handled via flags +} operator_params_t; +``` + +--- + +## 5. 
Operator Definition Structure + +```c +typedef struct operator_def { + uint32_t op_id; // unique operator ID + operator_type_t op_type; // operator type + operator_flags_t flags; // parallel/optional flags + snapshot_range_t snapshot; // snapshot bounds for deterministic execution + operator_params_t params; // operator-specific parameters + + uint32_t input_count; // number of upstream operators + uint32_t inputs[8]; // list of op_ids for input edges (DAG) +} operator_def_t; +``` + +Notes: + +* `inputs` defines DAG dependencies. +* The maximum input fan-in is 8 for v1. + +--- + +## 6. Execution Plan Structure + +```c +typedef struct exec_plan { + uint32_t plan_version; // version of plan encoding + uint32_t operator_count; // number of operators + operator_def_t *operators; // array of operator definitions +} exec_plan_t; +``` + +Operators SHOULD be serialized in topological order when possible. + +--- + +## 7. Serialization Rules (Normative) + +* All integers are little-endian. +* Operators MUST be serialized in a deterministic order. +* `operator_count` MUST match the serialized operator array length. +* `inputs[]` MUST reference valid `op_id` values within the plan. + +--- + +## 8. 
Non-Goals + +ENC-ASL-TGK-EXEC-PLAN/1 does not define: + +* Runtime scheduling or execution +* Query languages or APIs +* Operator semantics beyond parameter layout diff --git a/tier1/srs.md b/tier1/srs.md new file mode 100644 index 0000000..561336c --- /dev/null +++ b/tier1/srs.md @@ -0,0 +1,554 @@ +# AMDUAT-SRS — Detailed Requirements Specification + +Status: Approved +Owner: Niklas Rydberg +Version: 0.4.0 +SoT: Yes +Last Updated: 2025-11-11 +Linked Phase Pack: PH01 +Tags: [requirements, cas, kheper] + + + +**Document ID:** `AMDUAT-SRS` +**Layer:** L0 — Requirements baseline (CAS + deterministic composition) + +**Depends on (normative):** + +* None (requirements baseline) + +**Informative references:** + +* `AMDUAT-DDS` — byte-level design specification +* ADR-006 — deterministic error semantics +* ADR-015 — CAS rejection matrix alignment + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +> **Purpose:** Capture normative behavioural requirements for Phase PH01 (Kheper) and beyond. Long-lived semantics live here (not in Phase Packs). + +--- + +## 1. Objectives (from Tier-0 Charter; elaborated) + +* Deterministic addressing: identical payload bytes **MUST** yield identical CIDs. +* Immutability: new bytes → new CID; objects MUST NOT be mutated in place. +* Integrity by design: `verify()` MUST detect corruption; zero false positives. 
+* Instance isolation: storage layout and runtime state are implementation detail. +* Binary canonical substrate: COR/1 is the normative import/export envelope. +* Instance identity: ICD/1 defines stable `instance_id` for future transaction bindings. +* Crypto agility: default SHA-256; algorithm IDs extensible. +* Minimal tooling: reference CLI (`amduatcas`) and C library. +* Conformance: golden vectors and cross-impl CI enforce byte-identity. + +--- + +## 2. Scope (Behavioural) + +### 2.1 In Scope + +* Local, single-node Content-Addressable Storage (CAS) +* Deterministic hashing with domain separation +* Canonical envelopes (COR/1) and instance descriptor (ICD/1) +* CRUD-adjacent operations: put/get/stat/exists/verify +* Import/export of canonical bytestreams +* Optional listing/gc semantics + +### 2.2 Out of Scope (for PH01) + +* Networking, replication, consensus +* Multi-object transactions +* Semantic/provenance graphing +* Encryption/ACLs (layer externally) + +--- + +## 3. Functional Requirements + +### FR-001 Deterministic CID Production + +Given identical payload bytes and algo_id, the CID **MUST** match across compliant implementations. + +### FR-002 Immutability + +Objects **MUST NOT** be mutated; new payload → new CID. + +### FR-003 Idempotent Put + +Concurrent `put()` of identical payload MUST yield one canonical object; object integrity preserved. + +### FR-004 Verification + +`verify(CID)` MUST recompute the CID and detect corruption; zero false positives. + +### FR-005 Import/Export Canonicality + +Importing COR/1 and then exporting it MUST yield byte-identical bytestreams. + +### FR-006 Size Validation + +`get()` MUST validate payload length according to COR/1. + +### FR-007 Optional Verify-on-Read Policy + +Policy MAY require verify for cold reads; MUST NOT corrupt payload if disabled. 
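
FR-004 and FR-007 can be read together as a small read-path contract: verification recomputes the digest and compares it with the stored identity, and a verify-on-read policy only decides whether that check runs before bytes are returned. The following is a minimal sketch, assuming a hypothetical `stored_object` record and a pluggable `digest_fn` callback. `toy_digest` is an FNV-1a stand-in that keeps the example self-contained; it is not the SHA-256 based CID derivation this specification requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 32

/* Hypothetical digest callback; a real store would plug in its
 * domain-separated SHA-256 here. */
typedef void (*digest_fn)(const uint8_t *data, size_t len, uint8_t out[DIGEST_LEN]);

/* Hypothetical in-memory view of a stored object. */
typedef struct {
    uint8_t expected[DIGEST_LEN]; /* digest portion of the object's CID */
    const uint8_t *payload;
    size_t payload_len;
} stored_object;

/* FR-004: recompute the digest and compare. Returns 1 on match, 0 otherwise. */
int cas_verify(const stored_object *obj, digest_fn h)
{
    uint8_t actual[DIGEST_LEN];
    h(obj->payload, obj->payload_len, actual);
    return memcmp(actual, obj->expected, DIGEST_LEN) == 0;
}

/* FR-007: optional verify-on-read. The payload pointer is returned
 * unmodified whether or not the policy is enabled. */
const uint8_t *cas_get(const stored_object *obj, digest_fn h,
                       int verify_on_read, size_t *len_out)
{
    if (verify_on_read && !cas_verify(obj, h))
        return NULL;
    *len_out = obj->payload_len;
    return obj->payload;
}

/* Toy stand-in digest (64-bit FNV-1a, zero-padded). NOT SHA-256. */
void toy_digest(const uint8_t *data, size_t len, uint8_t out[DIGEST_LEN])
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT64_C(0x100000001b3);
    }
    memset(out, 0, DIGEST_LEN);
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(h >> (8 * i));
}
```

Keeping the digest behind a callback is only a convenience for the example; it also mirrors the crypto-agility objective, since the comparison logic does not depend on which algorithm produced the digest.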
+ +### FR-008 Canonical Rejection + +CAS decoders MUST reject: + +* out-of-order TLV tags +* duplicate TLV tags +* extraneous tags +* trailing bytes +* malformed or over-long VARINT encodings +* payload length mismatches + +Rejection MUST be deterministic and symbolic. + +### FR-009 Concurrency Discipline + +Concurrent `put()` operations for identical payloads MUST NOT yield divergent COR/1 envelopes. Only one canonical envelope may result. + +### FR-010 Raw Byte Semantics + +CAS MUST operate strictly over exact payload bytes. No normalization (newline, whitespace, UTF-8 interpretation, or Unicode equivalence) SHALL occur. + +### FR-011 Filesystem Independence + +Consensus behaviour MUST NOT depend on: + +* directory entry ordering +* timestamp metadata +* filesystem case sensitivity +* locale or regional configuration + +### FR-012 Deterministic Failure + +Malformed objects MUST be rejected. CAS MUST NOT auto-repair or normalize COR/1 envelopes. + +### FR-013 Resource Boundaries + +Resource exhaustion (disk full, allocation failure) MUST fail atomically and leave no partial objects visible. + +### FR-014 FCS/1 Descriptor Determinism (v1-min) + +Composite and custom functions MUST be expressed as canonical **FCS/1** descriptors that contain **only the execution recipe**: +`function_ptr`, `parameter_block (PCB1)`, and `arity`. +Identical descriptors SHALL hash to identical CIDs and MUST remain immutable after publication. **No policy/intent/notes** appear in FCS/1. + +### FR-015 Registry Determinism (Descriptor Admission) + +Functional registries MUST admit **only canonical FCS/1 descriptors** (per FR-014) and enforce descriptor validation (TLV order, PCB1 arity, acyclicity). +Registries MUST NOT infer or embed policy/intent into descriptors; publication governance is handled at certification time (FR-017). + +### FR-016 Evaluation Receipt Integrity (FER/1) + +Every execution of a composite function under curated or locked policies MUST emit a **FER/1** receipt. 
The receipt SHALL encode, in canonical TLV order, at least the following evidence: + +1. `function_cid` → evaluated FCS/1 descriptor (v1-min) preserving CIP indirection. +2. `input_manifest` → GS/1 BCF/1 set of consumed input CIDs (deduped and byte-lexicographic). +3. `environment` → ICD/1 (or PH03 env capsule) snapshot pinning toolchain/runtime state. +4. `evaluator_id` → stable evaluator identity bytes. +5. `executor_set` → implementations that executed the recipe, keyed in canonical byte order. +6. `parity_vector` → per-executor digests with matching `executor` ordering, shared `output` (`== output_cid`), and `sbom_cid` entries. +7. `executor_fingerprint` + `run_id` → optional SBOM fingerprint CID and deterministic dedup hash (`H("AMDUAT:RUN\0" || function || manifest || env || fingerprint)`). +8. `logs` → typed evidence capsules binding `kind`, `cid`, and `sha256` for stdout/stderr/metrics traces. +9. `limits` → declared execution envelope (`cpu_ms`, `wall_ms`, `max_rss_kib`, `io_reads`, `io_writes`). +10. `determinism_level` / `rng_seed` → declared determinism class (`D1_bit_exact` default, `D2_numeric_stable` requires a 0–32 byte seed). +11. `output_cid` → single canonical output CID for the run. +12. `started_at` / `completed_at` → epoch-second timestamps satisfying FR-020 bounds. +13. `signature` → Ed25519 metadata verifying `H("AMDUAT:FER\0" || canonical bytes)`. + +Receipts MAY include optional `logs` (typed capsules), `context`, `witnesses`, `parent`, and `signature_ext` TLVs but MUST NOT leak policy/intent (those belong to FCT/1). + +From Phase 04 onwards, governance and runtime layers MUST require FER/1 v1.1 receipts; ER/1 artefacts remain valid only as historical evidence and SHALL NOT satisfy FR-016 compliance gates. + +Parity discipline is mandatory: unsorted executor keys or mismatched parity orderings SHALL raise `ERR_IMPL_PARITY_ORDER`; divergent outputs or missing executors SHALL raise `ERR_IMPL_PARITY`. 
Unknown TLVs or cardinality violations SHALL raise `ERR_FER_UNKNOWN_TAG`. GS/1 manifest violations emit `ERR_FER_INPUT_MANIFEST_SHAPE`; missing RNG seed when determinism ≠ D1 emits `ERR_FER_RNG_REQUIRED`. All signatures MUST verify against the domain-separated hash (`ERR_FER_SIGNATURE` on failure). + +### FR-017 Certification Transactions (FCT/1: Policy & Intent) + +Certification events MUST be recorded as **FCT/1** transactions that aggregate one or more FER/1 receipts and bind **registry policy, intent, domain scope, and authority role**. +Transactions MUST include attestations whenever `registry_policy != 0` and SHALL expose publication pointers when federated. +**All intent/scope/role/authority metadata lives in FCT/1 (not in FCS/1).** + +### FR-BS-001 ByteStore Deterministic Identity + +ByteStore SHALL derive CIDs using the canonical CAS domain separator: `CID = algo || H("CAS:OBJ\0" || payload)`. +The derived CID returned by `put()` and `import_cor()` MUST match the CID embedded in COR/1 envelopes and SHALL remain stable across runs, implementations, and ingest modes (DDS §11.2; ADR-030). + +### FR-BS-002 Atomic Durability Ladder + +ByteStore persistence MUST follow the atomic write ladder: write → `fsync(tmp)` → `rename` → `fsync(shard)` → `fsync(root)`. +Crash-window simulations triggered via `AMDUAT_BYTESTORE_CRASH_STEP` MUST leave the public area consistent upon recovery, with no visible partial objects (DDS §11.4; ADR-030; evidence PH05-EV-BS-001). + +### FR-BS-003 Secure/Public Area Isolation + +ByteStore SHALL enforce SA/PA isolation such that public payload roots and secure state roots are disjoint and non-overlapping. +Violations MUST raise `ERR_AREA_VIOLATION` and SHALL be surfaced to callers (DDS §11.5; ADR-030). + +### FR-BS-004 COR/1 Round-Trip Identity + +Importing COR/1 bytes via ByteStore and exporting the same CID MUST yield a byte-identical envelope. 
+Any mismatch between stored bytes and derived CID SHALL raise `ERR_IDENTITY_MISMATCH` (DDS §11.3; ADR-030). + +### FR-BS-005 Streaming Determinism & Policy Enforcement + +Chunked ingestion (`put_stream`) MUST produce the same CID as single-shot `put` for equivalent payloads and reject non-bytes or missing data with deterministic errors (`ERR_STREAM_ORDER`, `ERR_STREAM_TRUNCATED`). +ByteStore SHALL enforce ICD/1 `max_object_size` for all ingest paths, raising `ERR_POLICY_SIZE` when exceeded (DDS §11.6–11.7; ADR-030). + +### FR-022 Federation Publication Digest (FPD/1) + +Every publish event emerging from an FCT/1 certification MUST emit exactly one **FPD/1** digest satisfying ADR-007 single-digest guarantees. +The digest SHALL canonically hash the certified FCT/1 record, all attested FER/1 receipts, and the emitted governance edges (`certifies`, `attests`, `publishes`). +Implementations MUST persist the FPD/1 bytes alongside the FCT/1 payload under `/logs/ph03/evidence/fct/` (or successor evidence path) and reference the resulting CID from `fct.publication`. +Repeated invocations over identical inputs SHALL reproduce the same digest; mismatches SHALL be treated as certification failures. + +### FR-018 Provenance Enforcement + +Caching or replay layers MUST validate FER/1 receipts and FCT/1 transactions before serving composite outputs. Serving uncertified artefacts when policy requires certification is forbidden. + +### FR-019 Transaction Envelope Rejection + +Systems MUST reject FER/1 or FCT/1 envelopes whose CID lineage does not match the referenced FCS/1 descriptor, whose timestamps are non-monotonic, or whose signatures/attestations fail verification. 
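To make FR-BS-001 and FR-BS-005 above more concrete, the following is a minimal sketch of why chunked ingestion can yield the same CID as a single-shot `put`. It is not the ByteStore implementation. It assumes that `H` is SHA-256 and that the `algo` prefix is a single byte (`ALGO_SHA256` is a hypothetical value); the function names `put_cid` and `put_stream_cid` are illustrative only, and the mapping of failures onto `ERR_STREAM_ORDER` / `ERR_STREAM_TRUNCATED` is deliberately left out because this document does not define it.

```python
import hashlib
from typing import Iterable

DOMAIN_SEPARATOR = b"CAS:OBJ\0"  # domain separator taken from FR-BS-001
ALGO_SHA256 = b"\x01"            # hypothetical algo prefix; real IDs are defined elsewhere


def put_cid(payload: bytes, algo: bytes = ALGO_SHA256) -> bytes:
    """Single-shot CID: algo || H(separator || payload), with H assumed to be SHA-256."""
    return algo + hashlib.sha256(DOMAIN_SEPARATOR + payload).digest()


def put_stream_cid(chunks: Iterable[bytes], algo: bytes = ALGO_SHA256) -> bytes:
    """Chunked CID: feed the separator first, then every chunk in arrival order."""
    hasher = hashlib.sha256(DOMAIN_SEPARATOR)
    for chunk in chunks:
        if not isinstance(chunk, (bytes, bytearray)):
            # A real store would surface a symbolic error code here.
            raise TypeError("non-bytes chunk in stream")
        hasher.update(chunk)
    return algo + hasher.digest()
```

Because an incremental hash over the concatenated chunks equals the hash of the whole payload, chunk boundaries cannot influence the result in this sketch; only the exact payload bytes do.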
+ +### FR-020 Deterministic Execution Envelope + +| ID | Statement | Verification | Notes | +| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ | +| **FR-020 — Deterministic Execution Envelope** | Each executor SHALL complete within a bounded deterministic time envelope (default 5 s). Execution time SHALL be measured and logged as evidence. Non-termination SHALL yield symbolic error `ERR_EXEC_TIMEOUT`. | Verified via CI parity harness and evidence file `/logs/ph03/evidence/-execution-times.jsonl`. | Implements Maat’s Balance principle. Tags: [deterministic-timing, evidence, maat-balance]. | + +### FR-021 Acyclic Composition + +FCS/1 descriptors referencing FPS/1 primitives, PCB1 parameter blocks, or nested FCS/1 descriptors MUST form an acyclic graph. +Registries SHALL reject submissions introducing self-references or cycles and emit `ERR_FCS_CYCLE_DETECTED` or +`ERR_PCB_ARITY_MISMATCH` when arity metadata conflicts with PCB1 manifests. + +### FR-028 Concept-Native Domain Materialization + +Federated domain manifests SHALL be materialized exclusively from CRS Concepts +and Relations. Given a DomainNode Concept, registries MUST traverse +`hasManifest` → `ManifestEntry` Concepts, extract `entryName` and +`entryChildVersion` relations, dedupe the `(name, version)` set, and compute the +GS/1 domain state deterministically. Duplicated pairs trigger `ERR_DG_DUP_ENTRY`; +missing relations trigger `ERR_DG_ENTRY_INCOMPLETE`; self references or +ancestor loops raise `ERR_DG_CYCLE`. Evidence: `tools/ci/dg_snapshot.py` +→ `logs/ph04/evidence/dg1/PH04-EV-DG-001/`. 
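The entry-collection step of FR-028 could be sketched roughly as follows. This is an illustration under stated assumptions, not the registry's algorithm: manifest entries are modelled as plain dictionaries keyed by `entryName` and `entryChildVersion`, a repeated `(name, version)` pair is treated as an error rather than silently merged, and a sorted list stands in for the GS/1 domain state, whose actual computation is not defined in this document. Cycle detection (`ERR_DG_CYCLE`) is omitted. `DomainGraphError` and `collect_manifest_pairs` are hypothetical names.

```python
from typing import Iterable, List, Mapping, Optional, Tuple


class DomainGraphError(Exception):
    """Carries a symbolic error code such as ERR_DG_DUP_ENTRY."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def collect_manifest_pairs(
    entries: Iterable[Mapping[str, Optional[str]]],
) -> List[Tuple[str, str]]:
    """Return the (name, version) pairs in an order independent of input order."""
    seen = set()
    for entry in entries:
        name = entry.get("entryName")
        version = entry.get("entryChildVersion")
        if name is None or version is None:
            raise DomainGraphError("ERR_DG_ENTRY_INCOMPLETE")
        pair = (name, version)
        if pair in seen:
            raise DomainGraphError("ERR_DG_DUP_ENTRY")
        seen.add(pair)
    # Sorting removes any dependence on traversal order.
    return sorted(seen)
```

Sorting the collected pairs is what makes the result reproducible here: two traversals that visit the same `ManifestEntry` Concepts in different orders produce the same list.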
+ +Operational linkage: router listings (`GET /links`) MUST return entries sorted +lexicographically by `fls_cid` and treat `since` query parameters as exclusive +lower bounds, ensuring deterministic replay of linkage events. + +### FR-029 Publication Recursion Discipline + +Publication Concepts SHALL declare their supporting FPD/1 digest, GS/1 cover +state, endorsed member FPD CIDs, and optional lineage parent using CRS +relations (`covers`, `endorses`, `parent`). Validators MUST recompute GS/1 from +the FPD payload, enforce duplicate-free membership, and detect recursive +cycles (`ERR_FPD_CYCLE`). Timestamp regressions raise `ERR_FPD_TIMESTAMP`; state +mismatches raise `ERR_PUB_STATE_MISMATCH`. Evidence: `tools/ci/pub_validate.py` +→ `logs/ph04/evidence/pub1/PH04-EV-PUB-001/`. + +Operational linkage: non-genesis publications SHOULD enable the parent-required +policy, supplying `fpd.parent` and guaranteeing strictly monotonic +`fpd.timestamp` to align with ADR-019 v1.2.1 and PH04 parent-policy harnesses. + +### FR-030 Predicate Concepts + +Every CRR/1 relation predicate MUST resolve to a CRS Concept. When the +taxonomy defines a `Predicate` Concept, predicate entries SHALL expose an +`is_a` edge into that class. Missing predicate Concepts raise +`ERR_CRR_PREDICATE_NOT_CONCEPT`; missing taxonomy membership raises +`ERR_CRR_PREDICATE_CLASS_MISSING`. Evidence: CRS validator vectors and +`logs/ph04/evidence/crs1/PH04-EV-CRS-001.md`. + +Operational linkage: FPD feed endpoints SHALL implement stateless, content-anchored pagination over parent-chained publications. `GET /feed/fpd` MUST traverse the publisher’s current tip toward genesis until either the caller-provided `limit` is satisfied or the supplied `since` CID is encountered; identical `publisher_id`, `since`, and `limit` inputs SHALL yield identical CID sequences. Detail lookups (`GET /feed/fpd/:cid`) SHALL expose publisher, members, parent, and state metadata without server-side session state. 
Evidence: `tools/ci/feeds_check.py` → `/amduat/logs/ph04/evidence/feeds/PH04-EV-FEEDS-001/pass.jsonl`. + +### FR-031 Authority Anchoring via CRS & FPD + +Publishing authorities SHALL represent identities as CRS Concepts linked via +`owns` and `hasRole` relations to key material and governance roles. Signatures +remain confined to FCT/1 and FPD/1 surfaces; CRS layers stay unsigned. FLS/1 +transport MAY carry Concept or Relation payloads but MUST NOT mutate them and +MUST perform payload-kind checks when requested (`--check-crs-payload`). + +Operational linkage: FLS router deployments SHALL expose `POST /fls`, +`GET /fls/:cid`, `GET /links`, `GET /healthz`, and `GET /readyz` endpoints and +enforce SA/PA separation (`ERR_AREA_VIOLATION` if misconfigured) so that public +ingest never mutates state areas directly. Audited ticket intake SHALL be +implemented via WT/1 (ADR-023) with: + +* `POST /wt` (Protected Area) accepting WT/1 BCF/1 payloads, validating + `has_pubkey(wt.author, wt.pubkey)` (or registered equivalent), verifying + signatures over `H("AMDUAT:WT\0" || canonical_bytes_without_signature)`, + enforcing registered ADR-010 intents (deduped + byte-lexicographically + sorted), ensuring monotonic `wt.timestamp` per `wt.author`, and optionally + chaining `wt.parent` lineage. Violations yield `ERR_WT_SIGNATURE`, + `ERR_WT_KEY_UNBOUND`, `ERR_WT_INTENT_UNREGISTERED`, `ERR_WT_INTENT_DUP`, + `ERR_WT_INTENT_EMPTY`, `ERR_WT_TIMESTAMP`, `ERR_WT_PARENT_UNKNOWN`, or + `ERR_WT_PARENT_REQUIRED`. Router policy MUST surface scope denials as + `ERR_WT_SCOPE_UNAUTHORIZED` and log the governing policy capsule. +* `GET /wt/:cid` returning the canonical WT/1 bytes for any accepted ticket. +* Deterministic pagination (`GET /wt?after=&limit=`) that emits WT/1 + entries in byte-lexicographic CID order with stable page boundaries. The + `after` parameter is an exclusive bound and routers SHALL enforce + `1 ≤ limit ≤ Nmax` to guarantee replay stability. 
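The deterministic pagination rule in the last bullet above can be sketched as a pure function over the set of accepted ticket CIDs. This is a minimal illustration, not the router's code: `page_ticket_cids` is a hypothetical name, `N_MAX` is a placeholder value for `Nmax`, CIDs are modelled as raw `bytes`, and the error raised for an out-of-range `limit` is a plain `ValueError` because this document does not name a symbolic code for it.

```python
import bisect
from typing import Iterable, List, Optional

N_MAX = 100  # placeholder for Nmax; the real bound is a deployment setting


def page_ticket_cids(
    cids: Iterable[bytes],
    after: Optional[bytes],
    limit: int,
    n_max: int = N_MAX,
) -> List[bytes]:
    """Return up to `limit` CIDs strictly greater than `after`, in byte order."""
    if not 1 <= limit <= n_max:
        raise ValueError("limit must satisfy 1 <= limit <= Nmax")
    # Python compares bytes objects lexicographically by unsigned byte value.
    ordered = sorted(set(cids))
    # bisect_right skips `after` itself, which makes the bound exclusive.
    start = 0 if after is None else bisect.bisect_right(ordered, after)
    return ordered[start:start + limit]
```

A caller would pass the last CID of one page as `after` for the next request. Since the page depends only on the CID set, `after`, and `limit`, repeating a request over the same set returns the same page in this sketch.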
+ +Evidence: `/amduat/logs/ph04/evidence/wt1/PH04-EV-WT-001/summary.md` captures the +validator run over vectors `TV-WT-001…009`, ensuring unknown keys, signature +failures, timestamp regressions (including parent inversions), unbound keys, +unregistered intents, policy rejections, and unresolved parents reject as +specified. + +Compat overlays SHALL reference ADR-025 MPR/1 provenance capsules and ADR-026 +IER/1 inference evidence when operating in policy lane `compat`. Routers MUST +validate that `executor_fingerprint` equals the supplied MPR/1 CID, enforce +`determinism_level` plus `rng_seed` (raising `ERR_FER_RNG_REQUIRED` when +omitted), and verify log digests via the IER/1 manifest before accepting +overlays (`ERR_IER_LOG_HASH`/`ERR_IER_LOG_MANIFEST`). Evidence surfaces +`/amduat/logs/ph04/evidence/mpr1/PH04-EV-MPR-001/pass.jsonl` and +`/amduat/logs/ph04/evidence/ier1/PH04-EV-IER-001/pass.jsonl` prove vector +coverage `TV-MPR-001…003` (hash triple, missing weights, signature domain) and +`TV-IER-001…004` (ok, missing seed, fingerprint mismatch, log digest mismatch) +respectively with scenario summaries in accompanying `summary.md` files. + +### FR-032 CT/1 Deterministic Replay (D1) + +Given identical AC/1 + DTF/1 + topology inputs, executing the runtime twice in +isolation MUST produce byte-identical CT/1 snapshots (header and payload) with +matching CIDs whenever `ct.determinism_level = 0`. Evidence: +`tools/ci/ct_replay.py` (`runA`/`runB`) → +`/amduat/logs/ph05/evidence/ct1/PH05-EV-CT1-REPLAY-001/`. + +### FR-033 CT/1 Numeric Stability (D2) + +When `ct.determinism_level = 1`, numeric observables MAY diverge, but the +maximum absolute delta MUST remain within the tolerance documented by +`ct.kernel_cfg`. Evidence: `tools/ci/ct_replay.py` D2 replay outputs and kernel +configuration manifests in the same evidence set. 
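The difference between the D1 check in FR-032 and the D2 check in FR-033 can be illustrated with two small predicates. This is a sketch under assumptions rather than the logic of `tools/ci/ct_replay.py`: snapshots are modelled as raw `bytes`, numeric observables as flat sequences of floats, and the tolerance is passed in directly because the way `ct.kernel_cfg` encodes it is not described here. The function names are hypothetical.

```python
from typing import Sequence


def d1_replay_ok(snapshot_a: bytes, snapshot_b: bytes) -> bool:
    """D1 (ct.determinism_level = 0): both runs must be byte-identical."""
    return snapshot_a == snapshot_b


def d2_replay_ok(
    observables_a: Sequence[float],
    observables_b: Sequence[float],
    tolerance: float,
) -> bool:
    """D2 (ct.determinism_level = 1): max absolute delta must stay within tolerance."""
    if len(observables_a) != len(observables_b):
        return False  # the runs did not produce comparable observables
    max_delta = max(
        (abs(a - b) for a, b in zip(observables_a, observables_b)),
        default=0.0,
    )
    return max_delta <= tolerance
```

For example, `d2_replay_ok([1.0, 2.0], [1.0, 2.0005], 1e-3)` accepts the pair, while the same inputs fail `d1_replay_ok` once serialized, since any differing byte breaks D1.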
+ +### FR-034 CT/1 Header Integrity + +CT/1 headers MUST follow ADR-027: canonical BCF/1 key ordering, rejection of +unknown keys, monotonic `ct.tick`, canonical `cid:` formatting for topology and +AC/1/DTF/1 pointers (ADR-028), and Ed25519 signatures over +`H("AMDUAT:CT\0" || canonical_bytes_without_signature)`. Evidence: +`tools/validate/ct1_validator.py` with vectors +`/amduat/vectors/ph05/ct1/TV-CT1-001…004` and AC/DTF fixtures +`TV-AC1-001…002`, `TV-DTF1-001…002`. + +--- + +## 4. Non-Functional Requirements + +### NFR-001 Determinism + +Platform/language differences MUST NOT affect CID. + +### NFR-002 Performance + +Put/get latency MUST remain within configured OPS budgets. + +### NFR-003 Reliability + +CAS operations MUST be atomic; partial writes MUST NOT be visible. + +### NFR-004 Portability + +Implementations MUST operate on common filesystems. + +### NFR-005 Security Posture + +Domain separation strings MUST be applied for all hashed surfaces. + +### 4.3 Future Scope Alignment (Informative) + +Phase 02 introduces deterministic transformation primitives (**FPS/1**) extending the Kheper CAS model defined herein. +See `/amduat/arc/adrs/adr-015.md` and `/amduat/tier1/fps.md` for details. +No behavioural changes apply retroactively to PH01 surfaces. + +--- + +## 5. Data Model (Behavioural View) + +* CAS objects identified strictly by CID. +* COR/1 envelope provides size, payload, algo_id. +* ICD/1 descriptor provides instance configuration. + +> See DDS §2 (COR/1) and §3 (ICD/1) for normative byte layouts. + +--- + +## 6. 
API Semantics + +### `put(payload_bytes, algo_id=default) → CID` + +* Compute CID using domain separation: `CID = algo_id || H("CAS:OBJ\0" || payload_bytes)` +* If CID exists: return existing CID (idempotent) +* If absent: write canonical COR/1 envelope atomically +* Reject on size limit breach, malformed payload, non-canonical COR/1, I/O errors +* Writes MUST be atomic: temp file → fsync → rename → fsync parent dir + +### `get(CID) → payload_bytes` + +* Retrieve raw payload bytes +* MUST validate canonical COR/1 envelope +* Implementation MAY verify hash on read by policy +* Reject on missing object, hash mismatch + +### `exists(CID) → bool` + +* Return true if object is present and canonical + +### `stat(CID) → { present, size, algo_id }` + +* MUST return canonical metadata + +### `verify(CID) → { ok|error, expected:CID, actual:CID }` + +* Recompute CID from canonical bytes +* MUST detect corruption and reject non-canonical encodings + +### `import(stream_COR1) → CID` + +* Validate canonical TLV ordering +* Reject duplicate tags, extraneous tags, malformed VARINTs +* MUST round-trip to identical CID + +### `export(CID) → stream_COR1` + +* Emit canonical envelope; re-encoding MUST preserve canonical bytes + +### Deterministic Errors + +Errors MUST be emitted as stable symbolic codes including but not limited to: + +* `E_CID_NOT_FOUND` +* `E_CORRUPT_OBJECT` +* `E_CANONICALITY_VIOLATION` +* `E_IO_FAILURE` + +--- + +## 7. Success Criteria + +* Byte-for-byte CID agreement (≥ 3 platforms) +* Zero false positives in `verify()` +* Idempotent concurrent `put()` +* COR/1 import/export round-trips cleanly + +--- + +## 8. GC Semantics (Behavioural) + +* Reachability from configured roots +* Dry-run mode MUST NOT delete +* Removal MUST be atomic per object + +--- + +## 9. Acceptance Criteria (Phase Exit) + +* Golden vectors published +* Cross-impl CI passing +* COR/1 and ICD/1 documented in DDS +* Security posture validated by SEC + +--- + +## 10. 
Traceability + +* Requirements link to tests/defects in Phase Packs +* ADRs reference affected FR/NFR IDs + +--- + +## 11. Future Phases + +* Multi-object transactions bind to `instance_id` +* Provenance graph consumes COR/1 metadata + +--- + +## 12. Functional Primitive Surface (FPS/1) + +> Defines the canonical deterministic operations over canonical payloads. +> Each primitive produces exactly one payload and one CID. + +| Primitive | Signature | Description | Determinism / Errors | +| ------------- | ------------------------------ | ------------------------------------------- | ---------------------------------------------- | +| `put` | `(payload_bytes) → CID` | Canonical write, atomic fsync ladder. | ADR-006 `ERR_IO_FAILURE`, `ERR_NORMALIZATION`. | +| `get` | `(CID) → payload_bytes` | Fetch canonical bytes. | `ERR_CID_NOT_FOUND`. | +| `slice` | `(CID, offset, length) → CID` | Extract contiguous bytes. | `ERR_SLICE_RANGE`. | +| `concatenate` | `([CID₁,…,CIDₙ]) → CID` | Sequential join of payloads. | `ERR_EMPTY_INPUTS`. | +| `reverse` | `(CID, level) → CID` | Reverse payload order (bit/byte/word/long). | `ERR_REV_ALIGNMENT`, `ERR_INVALID_LEVEL`. | +| `splice` | `(CID_a, offset, CID_b) → CID` | Insert payload b into a at offset. | `ERR_SPLICE_RANGE`. | + +**Determinism:** identical inputs → identical outputs. +**Immutability:** inputs never mutated. +**Closure:** outputs valid for reuse as inputs to any primitive. +**Error handling:** all symbolic per ADR-006. + +--- + +## Appendix A — Surface Version Table + +| Surface | Version | Notes | +| ------- | ------- | ----- | +| FCS/1 | v1-min | Canonical execution descriptors; governance captured in FCT/1. | +| FER/1 | v1.1 | Receipts enforce parity-first evidence, run_id dedup, typed logs, and RNG discipline (ADR-017). | +| FCT/1 | v1.0 | Certification transactions binding policy/intent/attestations with FER/1 sets. | +| FPD/1 | v1.0 | Publication digest linking FCT/1 to FER/1 receipts for federation replay. 
| + +--- + +## Document History + +* 0.2.1 (2025-10-26) — Phase Pack pointer updated; no semantic changes; archival preserves historical lineage per ADR-002. +* 0.2.2 (2025-10-26) — Promoted PH01 baseline to Approved; synchronized Phase Pack §1 anchors and closure snapshot. +* 0.2.3 (2025-10-27) — Added future scope alignment note pointing to FPS/1 and ADR-015; PH01 semantics remain unchanged. +* **0.2.4 (2025-11-14):** Added FR-014–FR-019 for FCS/1 composition, FER/1 receipts, and FCT/1 certification policies. +* **0.2.5 (2025-11-15):** Added FR-021 (formerly FR-020) enforcing acyclic FCS/1 composition and PCB1 arity validation. +* **0.2.6 (2025-11-19):** Registered FR-020 Deterministic Execution Envelope (Maat’s Balance) with timing evidence tags. +* **0.3.0 (2025-11-02):** Trimmed FCS/1 to execution-only (v1-min) under FR-014/FR-015; moved policy/intent/scope/role/authority to FCT/1 (FR-017); clarified registry admission behaviour and kept FER/1 unchanged. +* **0.3.1 (2025-11-21):** Updated FR-016 to require parity-first FER/1 receipts with executor sets, parity vectors, and FR-020 aligned timestamps. +* **0.3.2 (2025-11-22):** Registered FR-022 Federation Publication Digest (FPD/1) requirement tying FCT/1 publications to single-digest evidence and canonical logging. + +* **0.3.4 (2025-11-07):** Recorded FER/1 v1.1 requirement for Phase 04 and added surface version table. + +* **0.3.5 (2025-11-08):** Registered PH04 linkage & semantic placeholder requirements (FR-028…031). +* **0.3.6 (2025-11-09):** Promoted FR-028…031 to normative linkage requirements with CRS/1 validator enforcement. + +* **0.3.7 (2025-11-08):** Finalized FR-028…031 with CRS/1 immutability, GS/1 linkage, and certification coverage. + +* **0.3.8 (2025-11-09):** Promoted FR-028…FR-031 for concept-native domain and publication validation. +* **0.3.9 (2025-11-09):** Documented operational linkage: router endpoints, deterministic `/links`, and parent-required publish policy guidance. 
+* **0.3.10 (2025-11-11):** Registered FR-030 stateless, content-anchored FPD feed pagination requirement. + +* **0.3.11 (2025-11-09):** Extended FR-031 with WT/1 intake endpoints, validation, and evidence log references. +* **0.3.12 (2025-11-20):** Tightened FR-031 with `wt.pubkey` bindings, signature preimage exclusion, lineage/policy errors, and + expanded WT/1 vector evidence coverage. + +* **0.3.13 (2025-11-21):** Updated FR-031 for `has_pubkey` bindings (`ERR_WT_KEY_UNBOUND`), intent registry enforcement (`ERR_WT_INTENT_UNREGISTERED`), lineage policy rejection (`ERR_WT_PARENT_REQUIRED`), and expanded WT/1 vectors `TV-WT-001…009`. +* **0.3.14 (2025-11-22):** WT/1 intake and SOS/1 compat overlays proven with PH04-M4/M5 audit evidence. +* **0.3.15 (2025-11-22):** Recorded ADR-025/026 compat path requirements and evidence anchors for FR-031. + +* **0.3.16 (2025-11-23):** Compat lane now enforces ADR-025/026 validators (MPR/1 hash triple, IER/1 replay) with updated evidence surfaces. + +* **0.3.17 (2025-11-24):** Added FR-032–FR-034 for CT/1 replay determinism, numeric stability, and header integrity (ADR-027/028). + +* **0.4.0 (2025-11-11):** Added FR-BS-001…005 for ByteStore identity, atomic durability, SA/PA isolation, COR round-trip, and streaming determinism linked to DDS §11 / ADR-030. 
diff --git a/tier1/tgk-1.md b/tier1/tgk-1.md new file mode 100644 index 0000000..9b88f6f --- /dev/null +++ b/tier1/tgk-1.md @@ -0,0 +1,158 @@ +# TGK/1 — Trace Graph Kernel Semantics + +Status: Draft +Owner: Architecture +Version: 0.1.0 +SoT: No +Last Updated: 2025-11-30 +Linked Phase Pack: N/A +Tags: [tgk, determinism, index, federation] + + + +**Document ID:** `TGK/1` +**Layer:** L1 — Semantic graph layer over ASL artifacts and PERs (no encodings) + +**Depends on (normative):** + +* `ASL/1-CORE` +* `ASL/1-CORE-INDEX` +* `ASL/LOG/1` +* `ASL/SYSTEM/1` +* `TGK/1-CORE` + +**Informative references:** + +* `ENC/TGK1-EDGE/1` — core edge encoding +* `ENC/TGK-INDEX/1` — index encoding draft +* `ASL/INDEX-ACCEL/1` +* `ENC/ASL-CORE-INDEX/1` + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119. + +TGK/1 defines semantic meaning only. It does not define storage formats, on-disk encodings, or execution operators. + +--- + +## 1. Purpose & Scope + +TGK/1 defines the **semantic layer** for Trace Graph Kernel (TGK) edges that relate ASL artifacts and PERs. +It keeps TGK thin and deterministic by reusing ASL index and log semantics. 
+ +Non-goals: + +* New encodings for edges or indexes +* Query operators or execution plans +* Federation protocols or transport +* Re-definition of ASL or PEL semantics + +--- + +## 2. TGK Objects + +### 2.1 TGK Edge + +A TGK Edge is an **immutable record** representing a directed relationship between ASL artifacts and/or PERs. +TGK edges are semantic overlays and **MUST NOT** redefine or bypass ASL identity. +TGK/1-CORE defines the EdgeBody structure with ordered `from`/`to` lists; TGK/1 +does not further constrain cardinality. + +### 2.2 Canonical Edge Key + +Each TGK edge has a **Canonical Edge Key** that uniquely identifies it. +The Canonical Edge Key MUST be derived from the logical `EdgeBody` defined in +`TGK/1-CORE`, preserving list order and multiplicity: + +* `from`: ordered list of source node identifiers (MAY be empty) +* `to`: ordered list of destination node identifiers (MAY be empty) +* `payload`: reference carried by the edge +* `type`: edge type identifier +* Projection context (for example, PER or execution identity) when not already + captured by the edge payload or type profile + +Classification attributes (edge type keys, labels) **MUST NOT** affect canonical identity. + +--- + +## 3. Index and Visibility (Normative) + +TGK edges are **indexed objects** and inherit visibility from the ASL index and log: + +1. A TGK edge becomes visible only when its index record is admitted by a sealed segment and log order (ASL/LOG/1). +2. TGK traversal and lookup **MUST NOT** bypass index visibility or log ordering. +3. For a fixed `{Snapshot, LogPrefix}`, TGK edge lookup and shadowing **MUST** be deterministic (ASL/1-CORE-INDEX). +4. Tombstones and shadowing semantics follow ASL/1-CORE-INDEX and ASL/LOG/1 replay order. + +Index records MUST reference TGK/1-CORE edge identities. Index encodings MUST +NOT re-encode edge structure (`from[]`, `to[]`); they reference TGK/1-CORE edges +and carry only routing/filter metadata. + +--- + +## 4. 
Deterministic Traversal (Normative) + +TGK traversal operates over a snapshot/log-bounded view: + +* Inputs: `{Snapshot, LogPrefix}` and a seed set (nodes or edges). +* Outputs: only edges visible under the same `{Snapshot, LogPrefix}`. +* Traversal **MUST** be deterministic and replay-compatible with ASL/LOG/1. + +Deterministic ordering for traversal output MUST be: + +1. `logseq` ascending +2. Canonical Edge Key as tie-break + +Acceleration structures MAY be used but MUST NOT change semantics. + +--- + +## 5. Federation Alignment (Normative) + +Federation does not change TGK semantics. It only propagates edges and artifacts that are already visible under index rules. + +* Domain visibility and publication status are enforced via index metadata (ENC-ASL-CORE-INDEX). +* TGK edges keep canonical identity across domains. +* Cross-domain propagation MUST preserve snapshot/log determinism. + +--- + +## 6. Non-Goals + +TGK/1 does not define: + +* Edge encoding or storage layout +* Index segment formats +* Query languages or execution plans +* Acceleration rules beyond ASL/INDEX-ACCEL/1 + +--- + +## 7. Normative Invariants + +Conforming implementations MUST enforce: + +1. TGK edges are immutable and indexed objects. +2. No TGK visibility without index admission and log ordering. +3. Traversal is snapshot/log bounded and deterministic. +4. Federation does not alter TGK semantics; it only propagates visible edges. +5. Edge classification is not part of canonical identity.
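
The traversal output ordering from Section 4 (`logseq` ascending, Canonical Edge Key as tie-break) can be sketched as a single sort. This is an illustration only: `VisibleEdge` and `order_traversal_output` are hypothetical names, and representing the Canonical Edge Key as `bytes` compared lexicographically is an assumption, since TGK/1 does not define how keys are compared.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VisibleEdge:
    """An edge already admitted under a fixed {Snapshot, LogPrefix}."""

    logseq: int
    edge_key: bytes  # assumed byte form of the Canonical Edge Key


def order_traversal_output(edges: Iterable[VisibleEdge]) -> List[VisibleEdge]:
    """Sort by logseq ascending, then by edge key to break ties."""
    return sorted(edges, key=lambda edge: (edge.logseq, edge.edge_key))
```

Because the sort key is total over distinct edges, any acceleration structure that yields the same visible set in a different order still produces the same final output in this sketch.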