Add core tier1 specs for ASL/TGK

Carl Niklas Rydberg 2026-01-17 11:18:00 +01:00
parent 0fc1fbd980
commit 3886716799
13 changed files with 4323 additions and 0 deletions

tier1/asl-core-index-1.md

@ -0,0 +1,233 @@
# ASL/1-CORE-INDEX — Semantic Index Model
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, index, semantics]
<!-- Source: /amduat-api/tier1/asl-core-index.md | Canonical: /amduat/tier1/asl-core-index-1.md -->
**Document ID:** `ASL/1-CORE-INDEX`
**Layer:** L0.5 — Semantic mapping over ASL/1-CORE values (no storage / encoding / lifecycle)
**Depends on (normative):**
* `ASL/1-CORE`
* `ASL/1-STORE`
**Informative references:**
* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts
* `ENC/ASL-CORE-INDEX/1` — bytes-on-disk encoding profile
* `ASL/INDEX-ACCEL/1` — acceleration semantics (routing, filters, sharding)
* `ASL/LOG/1` — append-only semantic log (segment visibility)
* `TGK/1` — TGK edge visibility and traversal alignment
* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/1-CORE-INDEX defines **semantic meaning only**. It does not define storage formats, on-disk encoding, or operational lifecycle. Those belong to ASL-STORE-INDEX, ASL/LOG/1, and ENC-ASL-CORE-INDEX.
---
## 1. Purpose & Non-Goals
### 1.1 Purpose
ASL/1-CORE-INDEX defines the **semantic model** for indexing artifacts:
* It specifies what it means to map an artifact identity to a byte location.
* It defines visibility, immutability, and shadowing semantics.
* It ensures deterministic lookup for a fixed snapshot and log prefix.
### 1.2 Non-goals
ASL/1-CORE-INDEX explicitly does **not** define:
* On-disk layouts, segment files, or memory representations.
* Block allocation, packing, GC, or lifecycle rules.
* Snapshot implementation details, checkpoints, or log storage.
* Performance optimizations (bloom filters, sharding, SIMD).
* Federation, provenance, or execution semantics.
---
## 2. Terminology
* **Artifact** — ASL/1 immutable value defined in ASL/1-CORE.
* **Reference** — ASL/1 content address of an Artifact (hash_id + digest).
* **StoreConfig** — `{ encoding_profile, hash_id }`, fixed per StoreSnapshot (ASL/1-STORE).
* **Block** — immutable storage unit containing artifact bytes.
* **BlockID** — opaque identifier for a block.
* **ArtifactExtent** — `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation** — ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* **Snapshot** — a checkpointed StoreSnapshot (ASL/1-STORE) used as a base state.
* **Append-Only Log** — ordered sequence of index-visible mutations after a snapshot.
* **CURRENT** — effective state after replaying a log prefix on a snapshot.
---
## 3. Core Mapping Semantics
### 3.1 Index Mapping
The index defines a semantic mapping:
```
Reference -> ArtifactLocation
```
For any visible `Reference`, there is exactly one `ArtifactLocation` at a given CURRENT state.
### 3.2 Determinism
For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be deterministic. Nondeterministic inputs MUST NOT affect index semantics.
### 3.3 StoreConfig Consistency
All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`. Encoding profiles MUST allow variable-length digests; the digest length MUST be either explicit in the encoding or derivable from `hash_id` and StoreConfig.
---
## 4. ArtifactLocation Semantics
* An ArtifactLocation is an **ordered list** of ArtifactExtents.
* Each extent references immutable bytes within a block.
* The artifact bytes are defined by **concatenating extents in order**.
* A visible ArtifactLocation MUST be **non-empty** and MUST fully cover the artifact byte sequence with no gaps or extra bytes.
* Tombstone entries are visible but MUST have no ArtifactLocation; they only shadow prior entries.
* Extents MUST have `length > 0` and MUST reference valid byte ranges within their blocks.
* Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact.
* An ArtifactLocation is valid only while all referenced blocks are retained.
* ASL/1-CORE-INDEX does not define how blocks are allocated or sealed; it only requires that referenced bytes are immutable for the lifetime of the mapping.
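The extent rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical in-memory block table in which a BlockID is simply an array index; the names `Block`, `location_length`, and `location_read` are illustrative and are not defined by this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical in-memory model: a BlockID is an index into a block table.
typedef struct {
    const uint8_t *bytes;  // immutable (sealed) block contents
    size_t len;            // block size in bytes
} Block;

typedef struct {
    uint32_t block_id;
    size_t offset;
    size_t length;
} ArtifactExtent;

// Returns the total artifact length, or -1 if the location breaks a rule.
long location_length(const Block *blocks, size_t nblocks,
                     const ArtifactExtent *ext, size_t n_ext)
{
    if (n_ext == 0) return -1;                       // must be non-empty
    size_t total = 0;
    for (size_t i = 0; i < n_ext; i++) {
        if (ext[i].block_id >= nblocks) return -1;   // unknown block
        const Block *b = &blocks[ext[i].block_id];
        if (ext[i].length == 0) return -1;           // length > 0
        if (ext[i].offset > b->len ||
            ext[i].length > b->len - ext[i].offset) return -1;  // out of range
        total += ext[i].length;
    }
    return (long)total;
}

// Concatenates the extents in order into out; returns bytes written or -1.
long location_read(const Block *blocks, size_t nblocks,
                   const ArtifactExtent *ext, size_t n_ext,
                   uint8_t *out, size_t cap)
{
    long total = location_length(blocks, nblocks, ext, n_ext);
    if (total < 0 || (size_t)total > cap) return -1;
    size_t pos = 0;
    for (size_t i = 0; i < n_ext; i++) {
        memcpy(out + pos, blocks[ext[i].block_id].bytes + ext[i].offset,
               ext[i].length);
        pos += ext[i].length;
    }
    assert(pos == (size_t)total);  // byte-exact: no gaps, no overrun
    return total;
}
```

The same BlockID may appear in several extents; only the order of the list determines the resulting bytes.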
---
## 5. Visibility Model
An index entry is **visible** at CURRENT if and only if:
1. The entry is contained in a sealed segment whose seal record is admitted in the ordered log prefix for CURRENT (or anchored in the snapshot).
2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules).
Visibility is binary; entries are either visible or not visible.
---
## 6. Snapshot and Log Semantics
Snapshots provide a base mapping of sealed segments; the append-only log admits later segment seals and policy records that define subsequent changes.
The index state for a given CURRENT is defined as:
```
Index(CURRENT) = Index(snapshot) + replay(log_prefix)
```
Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entries are semantically equivalent once replayed.
---
## 7. Immutability and Shadowing
### 7.1 Immutability
* Index entries are never mutated.
* Once visible, an entry's meaning does not change.
* Referenced bytes are immutable for the lifetime of the entry.
### 7.2 Shadowing
* Later entries MAY shadow earlier entries with the same Reference.
* Precedence is determined solely by log order.
* Snapshot boundaries do not alter shadowing semantics.
---
## 8. Tombstones (Optional)
Tombstone entries MAY be used to invalidate prior mappings.
* A tombstone shadows earlier entries for the same Reference.
* Visibility rules are identical to regular entries.
* Encoding is optional and defined by ENC-ASL-CORE-INDEX if used.
---
## 9. Determinism Guarantees
For fixed:
* StoreConfig
* Snapshot
* Log prefix
ASL/1-CORE-INDEX guarantees:
* Deterministic lookup results
* Deterministic shadowing resolution
* Deterministic visibility
---
## 10. Normative Invariants
Conforming implementations MUST enforce:
1. No visibility without a sealed segment whose seal record is log-admitted (or snapshot-anchored).
2. No mutation of visible index entries.
3. Referenced bytes remain immutable for the entry's lifetime.
4. Shadowing follows strict log order.
5. Snapshot + log replay uniquely defines CURRENT.
6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun), except for tombstones which have no ArtifactLocation.
Violation of any invariant constitutes index corruption.
---
## 11. Relationship to Other Specifications
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------- |
| ASL/1-CORE | Artifact semantics and identity |
| ASL/1-STORE | StoreSnapshot and put/get logical model |
| ASL/1-CORE-INDEX | Semantic mapping of Reference → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle, replay, and visibility contracts |
| ENC-ASL-CORE-INDEX | On-disk encoding for index segments and records |
---
## 12. Summary
ASL/1-CORE-INDEX specifies the semantic meaning of the index:
* It maps artifact References to byte locations deterministically.
* It defines visibility and shadowing rules across snapshot + log replay.
* It guarantees immutability and deterministic lookup.
It answers one question:
> *Given a Reference and a CURRENT state, where are the bytes?*

tier1/asl-index-accel-1.md

@ -0,0 +1,296 @@
# ASL/INDEX-ACCEL/1 — Index Acceleration Semantics
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, index, acceleration]
<!-- Source: /amduat-api/tier1/asl-index-accel-1.md | Canonical: /amduat/tier1/asl-index-accel-1.md -->
**Document ID:** `ASL/INDEX-ACCEL/1`
**Layer:** L1 — Acceleration rules over index semantics (no storage / encoding)
**Depends on (normative):**
* `ASL/1-CORE-INDEX`
* `ASL/LOG/1`
**Informative references:**
* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts
* `ENC/ASL-CORE-INDEX/1` — bytes-on-disk encoding profile
* `TGK/1` — TGK semantics and visibility alignment
* `TGK/1-CORE` — EdgeBody and EdgeTypeId definitions
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/INDEX-ACCEL/1 defines **acceleration semantics only**. It MUST NOT change index meaning defined by ASL/1-CORE-INDEX.
---
## 1. Purpose
ASL/INDEX-ACCEL/1 defines **acceleration mechanisms** used by ASL-based indexes, including:
* Routing keys
* Sharding
* Filters (Bloom, XOR, Ribbon, etc.)
* SIMD execution
* Hash recasting
All mechanisms defined herein are **observationally invisible** to ASL/1-CORE-INDEX semantics.
---
## 2. Scope
Applies to:
* Artifact indexes (ASL)
* Projection and graph indexes (e.g., TGK)
* Any index layered on ASL/1-CORE-INDEX semantics
Does **not** define:
* Artifact or edge identity
* Snapshot semantics
* Storage lifecycle
* Encoding details
---
## 3. Canonical Key vs Routing Key
### 3.1 Canonical Key
The **Canonical Key** uniquely identifies an indexable entity.
Examples:
* Artifact: `Reference`
* TGK Edge: canonical key defined by `TGK/1` and `TGK/1-CORE` (opaque here)
Properties:
* Defines semantic identity
* Used for equality, shadowing, and tombstones
* Stable and immutable
* Fully compared on index match
### 3.2 Routing Key
The **Routing Key** is a **derived, advisory key** used exclusively for acceleration.
Properties:
* Derived deterministically from Canonical Key and optional attributes
* MAY be used for sharding, filters, SIMD layouts
* MUST NOT affect index semantics
* MUST be verified by full Canonical Key comparison on match
Formal rule:
```
CanonicalKey determines correctness
RoutingKey determines performance
```
---
## 4. Filter Semantics
### 4.1 Advisory Nature
All filters are **advisory only**.
Rules:
* False positives are permitted
* False negatives are forbidden
* Filter behavior MUST NOT affect correctness
Invariant:
```
Filter miss => key is definitely absent
Filter hit => key may be present
```
### 4.2 Filter Inputs
Filters operate over **Routing Keys**, not Canonical Keys.
A Routing Key MAY incorporate:
* Hash of Canonical Key
* Artifact type tag (if present)
* TGK `EdgeTypeId` (TGK/1-CORE)
* Direction, role, or other immutable classification attributes
Absence of optional attributes MUST be encoded explicitly.
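A minimal sketch of a routing key that encodes absence explicitly is shown here. It assumes a hypothetical `RoutingKey` layout and uses 64-bit FNV-1a as a stand-in routing hash; neither the layout nor the hash function is prescribed by this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical routing key layout; field names are illustrative.
typedef struct {
    uint64_t key_hash;     // hash of the Canonical Key bytes
    uint32_t type_tag;     // 0 when has_typetag == 0
    uint8_t  has_typetag;  // explicit presence flag
} RoutingKey;

// FNV-1a 64-bit, used here only as a stand-in routing hash.
uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Derives the routing key; type_tag is NULL when the attribute is absent.
RoutingKey routing_key(const uint8_t *canonical, size_t n,
                       const uint32_t *type_tag)
{
    assert(canonical != NULL || n == 0);
    RoutingKey rk;
    rk.key_hash = fnv1a64(canonical, n);
    rk.has_typetag = type_tag ? 1 : 0;
    rk.type_tag = type_tag ? *type_tag : 0;
    return rk;
}
```

With the presence flag, an absent type tag and a present type tag of value 0 produce distinct routing keys, so the two cases are never conflated by a filter or shard router.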
### 4.3 Filter Construction
* Filters are built only over **sealed, immutable segments**
* Filters are immutable once built
* Filter construction MUST be deterministic
* Filter state MUST be covered by segment checksums
* Filters SHOULD be snapshot-scoped or versioned with their segment to avoid unbounded false-positive accumulation over time
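The advisory contract can be illustrated with a small Bloom-style filter in front of a sealed segment. This is a sketch under simplifying assumptions: Canonical Keys are reduced to 64-bit integers that double as the routing hash, and the filter size, probe count, and mixing function are arbitrary illustrative choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_BITS 1024u   // illustrative size
#define FILTER_HASHES 3u    // illustrative probe count

typedef struct {
    uint8_t bits[FILTER_BITS / 8];
} Filter;

// Derives the bit position for one probe; any deterministic mixer would do.
uint32_t probe_bit(uint64_t routing_hash, uint64_t probe)
{
    uint64_t x = (routing_hash ^ probe) * 0x9e3779b97f4a7c15ULL;
    x ^= x >> 32;
    return (uint32_t)(x % FILTER_BITS);
}

// Called once per key while building the filter over a sealed segment.
void filter_add(Filter *f, uint64_t routing_hash)
{
    assert(f != NULL);
    for (uint64_t p = 1; p <= FILTER_HASHES; p++) {
        uint32_t bit = probe_bit(routing_hash, p);
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

bool filter_may_contain(const Filter *f, uint64_t routing_hash)
{
    for (uint64_t p = 1; p <= FILTER_HASHES; p++) {
        uint32_t bit = probe_bit(routing_hash, p);
        if (!(f->bits[bit / 8] & (1u << (bit % 8)))) return false;
    }
    return true;
}

// Returns the index of key in the segment, or -1 if it is absent.
long segment_find(const Filter *f, const uint64_t *keys, size_t n,
                  uint64_t key)
{
    if (!filter_may_contain(f, key)) return -1;  // miss: definitely absent
    for (size_t i = 0; i < n; i++)
        if (keys[i] == key) return (long)i;      // hit verified by full compare
    return -1;                                   // filter false positive
}
```

Because every added key sets all of its probe bits, a later probe for the same key can never miss; a hit still has to pass the full key comparison before it counts as a match.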
---
## 5. Sharding Semantics
### 5.1 Observational Invisibility
Sharding is a **mechanical partitioning** of the index.
Invariant:
```
LogicalIndex = union(all shards)
```
Rules:
* Shards MUST NOT affect lookup results
* Shard count and boundaries MAY change over time
* Rebalancing MUST preserve lookup semantics
### 5.2 Shard Assignment
Shard assignment MAY be based on:
* Hash of Canonical Key
* Routing Key
* Composite routing strategies
Shard selection MUST be deterministic per snapshot.
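One way to satisfy per-snapshot determinism is to fix the shard parameters in the snapshot and derive the shard purely from them and the routing hash. The sketch below assumes a hypothetical `ShardConfig` record stored with each snapshot; the mixing constant is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical per-snapshot shard configuration.
typedef struct {
    uint32_t shard_count;  // fixed for the lifetime of one snapshot
    uint64_t seed;         // routing seed; may differ between snapshots
} ShardConfig;

// Pure function of (config, routing hash): no clocks, no load feedback.
uint32_t shard_for(const ShardConfig *cfg, uint64_t routing_hash)
{
    assert(cfg->shard_count > 0);
    uint64_t x = (routing_hash ^ cfg->seed) * 0x9e3779b97f4a7c15ULL;
    x ^= x >> 32;
    return (uint32_t)(x % cfg->shard_count);
}
```

Changing `shard_count` or `seed` corresponds to a rebalance: keys may move between shards, but the union of all shards, and therefore every lookup result, stays the same.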
---
## 6. Hashing and Hash Recasting
### 6.1 Hashing
Hashes MAY be used for routing, filtering, or SIMD layout.
Hashes MUST NOT be treated as identity.
### 6.2 Hash Recasting
Hash recasting (changing hash functions or seeds) is permitted if:
1. It is deterministic
2. It does not change Canonical Keys
3. It does not affect index semantics
Recasting is equivalent to rebuilding acceleration structures.
---
## 7. SIMD Execution
SIMD operations MAY be used to:
* Evaluate filters
* Compare routing keys
* Accelerate scans
Rules:
* SIMD MUST operate only on immutable data
* SIMD MUST NOT short-circuit semantic checks
* SIMD MUST preserve deterministic behavior
---
## 8. Multi-Dimensional Routing Examples (Normative)
### 8.1 Artifact Index
* Canonical Key: `Reference`
* Routing Key components:
  * `H(Reference)`
  * `type_tag` (if present)
  * `has_typetag`
### 8.2 TGK Edge Index
* Canonical Key: defined by `TGK/1` and `TGK/1-CORE` (opaque here)
* Routing Key components:
  * `H(CanonicalEdgeKey)`
  * `EdgeTypeId` (if present in the TGK profile)
  * Direction or role (optional)
---
## 9. Snapshot Interaction
Acceleration structures:
* MUST respect snapshot visibility rules
* MUST operate over the same sealed segments visible to the snapshot
* MUST NOT bypass tombstones or shadowing
Snapshot cuts apply **after** routing and filtering.
---
## 10. Normative Invariants
1. Canonical Keys define identity and correctness
2. Routing Keys are advisory only
3. Filters may never introduce false negatives
4. Sharding is observationally invisible
5. Hashes are not identity
6. SIMD is an execution strategy, not a semantic construct
7. All acceleration is deterministic per snapshot
---
## 11. Non-Goals
ASL/INDEX-ACCEL/1 does not define:
* Specific filter algorithms
* Memory layout
* CPU instruction selection
* Encoding formats
* Federation policies
---
## 12. Summary
ASL/INDEX-ACCEL/1 establishes a strict contract:
> All acceleration exists to make the index faster, never different.
It formalizes Canonical vs Routing keys and constrains filters, sharding, hashing, and SIMD so that correctness is preserved under all optimizations.

tier1/asl-indexes-1.md

@ -0,0 +1,139 @@
# ASL/INDEXES/1 -- Index Taxonomy and Relationships
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-01-17
Linked Phase Pack: N/A
Tags: [indexes, content, structural, materialization]
<!-- Source: /amduat-api/tier1/asl-indexes-1.md | Canonical: /amduat/tier1/asl-indexes-1.md -->
**Document ID:** `ASL/INDEXES/1`
**Layer:** L2 -- Index taxonomy (no encoding)
**Depends on (normative):**
* `ASL/1-CORE-INDEX`
* `ASL/STORE-INDEX/1`
**Informative references:**
* `ASL/SYSTEM/1`
* `TGK/1`
* `ENC/ASL-CORE-INDEX/1`
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/INDEXES/1 defines index roles and relationships. It does not define encodings or storage layouts.
---
## 1. Purpose
This document defines the minimal set of indexes used by ASL systems and their dependency relationships.
---
## 2. Index Taxonomy (Normative)
ASL systems use three distinct indexes:
### 2.1 Content Index
Purpose: map semantic identity to bytes.
```
ArtifactKey -> ArtifactLocation
```
Properties:
* Snapshot-relative and append-only
* Deterministic replay
* Optional tombstone shadowing
This is the ASL/1-CORE-INDEX and is the only index that governs visibility.
### 2.2 Structural Index
Purpose: map structural identity to a derivation DAG node.
```
SID -> DAG node
```
Properties:
* Deterministic and rebuildable
* Does not imply materialization
* May be in-memory or persisted
### 2.3 Materialization Cache
Purpose: record previously materialized content for a structural identity.
```
SID -> ArtifactKey
```
Properties:
* Redundant and safe to drop
* Recomputable from DAG + content index
* Pure performance optimization
---
## 3. Dependency Rules (Normative)
Dependencies MUST follow this direction:
```
Structural Index -> Materialization Cache -> Content Index
```
Rules:
* The Content Index MUST NOT depend on the Structural Index.
* The Structural Index MUST NOT depend on stored bytes.
* The Materialization Cache MAY depend on both.
---
## 4. PUT/GET Interaction (Informative)
* PUT registers structure (if used), resolves to an ArtifactKey, and updates the Content Index.
* GET consults only the Content Index and reads bytes from the store.
* The Structural Index and Materialization Cache are optional optimizations for PUT.
---
## 5. Non-Goals
ASL/INDEXES/1 does not define:
* Encodings for any index
* Storage layout or sharding
* Query operators or traversal semantics

tier1/asl-log-1.md

@ -0,0 +1,314 @@
# ASL/LOG/1 — Append-Only Semantic Log
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, log, snapshot]
<!-- Source: /amduat-api/tier1/asl-log-1.md | Canonical: /amduat/tier1/asl-log-1.md -->
**Document ID:** `ASL/LOG/1`
**Layer:** L1 — Domain log semantics (no transport)
**Depends on (normative):**
* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts (pending spec)
**Informative references:**
* `ASL/1-CORE-INDEX` — index semantics
* `TGK/1` — TGK edge visibility and traversal alignment
* `ENC/ASL-LOG/1` — bytes-on-disk encoding profile
* `ENC/ASL-CORE-INDEX/1` — index segment encoding
* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/LOG/1 defines **semantic log behavior**. It does not define transport, replication protocols, or storage layout.
---
## 1. Purpose
ASL/LOG/1 defines the **authoritative, append-only log** for an ASL domain.
The log records **semantic commits** that affect:
* Index segment visibility
* Tombstone policy
* Snapshot anchoring
* Optional publication metadata
The log is the **sole source of truth** for reconstructing CURRENT state.
---
## 2. Core Properties (Normative)
An ASL log MUST be:
1. Append-only
2. Strictly ordered
3. Deterministically replayable
4. Hash-chained
5. Snapshot-anchorable
6. Binary encoded per `ENC-ASL-LOG`
7. Forward-compatible
---
## 3. Log Model
### 3.1 Log Sequence
Each record has a monotonically increasing `logseq`:
```
logseq: uint64
```
* Assigned by the domain authority
* Total order within a domain
* Never reused
### 3.2 Hash Chain
Each record commits to the previous record:
```
record_hash = H(prev_record_hash || logseq || record_type || payload_len || payload)
```
This enables tamper detection, witness signing, and federation verification.
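A sketch of chain construction and verification follows. It makes several assumptions that this document does not fix: `H` is replaced by 64-bit FNV-1a (not cryptographic, for illustration only), integers are fed to the hash as little-endian bytes, `record_type` and `payload_len` are 32 bits wide, and the first record chains from a caller-supplied genesis value. The names `LogRecord`, `compute_record_hash`, and `verify_chain` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for H: FNV-1a 64-bit. NOT cryptographic; illustration only.
uint64_t h_update(uint64_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Feeds an integer into the hash as `width` little-endian bytes (assumed).
uint64_t h_uint(uint64_t h, uint64_t v, int width)
{
    for (int i = 0; i < width; i++) {
        uint8_t b = (uint8_t)(v >> (8 * i));
        h = h_update(h, &b, 1);
    }
    return h;
}

typedef struct {
    uint64_t logseq;
    uint32_t record_type;
    uint32_t payload_len;
    const uint8_t *payload;
    uint64_t record_hash;
} LogRecord;

// H(prev_record_hash || logseq || record_type || payload_len || payload)
uint64_t compute_record_hash(uint64_t prev_hash, const LogRecord *r)
{
    assert(r->payload != NULL || r->payload_len == 0);
    uint64_t h = 0xcbf29ce484222325ULL;
    h = h_uint(h, prev_hash, 8);
    h = h_uint(h, r->logseq, 8);
    h = h_uint(h, r->record_type, 4);
    h = h_uint(h, r->payload_len, 4);
    return h_update(h, r->payload, r->payload_len);
}

// Returns the index of the first record that breaks the chain, or -1.
long verify_chain(uint64_t genesis_hash, const LogRecord *recs, size_t n)
{
    uint64_t prev = genesis_hash;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && recs[i].logseq <= recs[i - 1].logseq) return (long)i;
        if (compute_record_hash(prev, &recs[i]) != recs[i].record_hash)
            return (long)i;
        prev = recs[i].record_hash;
    }
    return -1;
}
```

Changing any field of an earlier record changes its hash and therefore invalidates every later record, which is what makes truncation-free tampering detectable.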
### 3.3 Record Envelope
All log records share a common envelope whose **exact byte layout** is defined
in `ENC-ASL-LOG`. The envelope MUST include:
* `logseq` (monotonic sequence number)
* `record_type` (type tag)
* `payload_len` (bytes)
* `payload` (type-specific bytes)
* `record_hash` (hash-chained integrity)
---
## 4. Record Types (Normative)
### 4.0 Common Payload Encoding (Informative)
The byte-level payload schemas are defined in `ENC-ASL-LOG`. The shared
artifact reference encoding is:
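```c
#include <stdint.h>

typedef struct {
    uint32_t hash_id;     // hash algorithm identifier
    uint16_t digest_len;  // number of digest bytes that follow
    uint16_t reserved0;   // must be 0
    uint8_t  digest[];    // digest_len bytes (flexible array member)
} ArtifactRef;
```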
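Because the reference is variable-length, it is usually written and parsed field by field rather than by casting a struct. The sketch below assumes little-endian integers and a tightly packed 8-byte header followed by the digest; both are assumptions for illustration, and `ENC-ASL-LOG` remains authoritative for the actual byte layout. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARTIFACT_REF_HEADER 8u  // hash_id(4) + digest_len(2) + reserved0(2)

// Writes the reference into out; returns bytes written, or 0 if cap is too small.
size_t artifact_ref_encode(uint32_t hash_id, const uint8_t *digest,
                           uint16_t digest_len, uint8_t *out, size_t cap)
{
    assert(out != NULL);
    size_t need = (size_t)ARTIFACT_REF_HEADER + digest_len;
    if (cap < need) return 0;
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(hash_id >> (8 * i));
    out[4] = (uint8_t)(digest_len & 0xff);
    out[5] = (uint8_t)(digest_len >> 8);
    out[6] = 0;                              // reserved0 must be 0
    out[7] = 0;
    if (digest_len > 0) memcpy(out + ARTIFACT_REF_HEADER, digest, digest_len);
    return need;
}

// Parses a reference; returns bytes consumed, or 0 on malformed input.
// *digest points into the input buffer; nothing is copied.
size_t artifact_ref_decode(const uint8_t *in, size_t len, uint32_t *hash_id,
                           const uint8_t **digest, uint16_t *digest_len)
{
    if (len < ARTIFACT_REF_HEADER) return 0;
    if (in[6] != 0 || in[7] != 0) return 0;  // reject non-zero reserved0
    uint16_t dl = (uint16_t)(in[4] | (in[5] << 8));
    if (len - ARTIFACT_REF_HEADER < dl) return 0;
    *hash_id = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
               ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    *digest = in + ARTIFACT_REF_HEADER;
    *digest_len = dl;
    return (size_t)ARTIFACT_REF_HEADER + dl;
}
```

Returning the number of bytes consumed lets a payload parser locate the fields that follow an embedded reference, such as `scope` in a tombstone payload.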
### 4.1 SEGMENT_SEAL
Declares an index segment visible.
Payload (encoding):
```c
typedef struct {
    uint64_t segment_id;        // identifier of the sealed segment
    uint8_t  segment_hash[32];  // 32-byte segment hash
} SegmentSealPayload;
```
Semantics:
* From this `logseq` onward, the referenced segment is visible for lookup and replay.
* Segment MUST be immutable.
* All referenced blocks MUST already be sealed.
* Segment contents are not re-logged.
### 4.2 TOMBSTONE
Declares an artifact inadmissible under domain policy.
Payload (encoding):
```c
typedef struct {
    ArtifactRef artifact;     // tombstoned artifact (variable-length; wire order, not a C memory layout)
    uint32_t    scope;
    uint32_t    reason_code;
} TombstonePayload;
```
Semantics:
* Does not delete data.
* Shadows prior visibility.
* Applies from this logseq onward.
### 4.3 TOMBSTONE_LIFT
Supersedes a previous tombstone.
Payload (encoding):
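```c
typedef struct {
    ArtifactRef artifact;          // artifact whose tombstone is lifted (variable-length; wire order)
    uint64_t    tombstone_logseq;  // logseq of the TOMBSTONE being superseded
} TombstoneLiftPayload;
```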
Semantics:
* References an earlier TOMBSTONE.
* Does not erase history.
* Only affects CURRENT at or above this logseq.
### 4.4 SNAPSHOT_ANCHOR
Binds semantic state to a snapshot.
Payload (encoding):
```c
typedef struct {
    uint64_t snapshot_id;    // snapshot being anchored
    uint8_t  root_hash[32];  // 32-byte root hash bound to the snapshot
} SnapshotAnchorPayload;
```
Semantics:
* Defines a replay checkpoint.
* Enables log truncation below the anchor, subject to the garbage collection constraints in Section 7.
### 4.5 ARTIFACT_PUBLISH (Optional)
Marks an artifact as published.
Payload (encoding):
```c
typedef struct {
    ArtifactRef artifact;  // artifact being published
} ArtifactPublishPayload;
```
Semantics:
* Publication is domain-local.
* Federation layers may interpret this metadata.
### 4.6 ARTIFACT_UNPUBLISH (Optional)
Withdraws publication.
Payload (encoding):
```c
typedef struct {
    ArtifactRef artifact;  // artifact whose publication is withdrawn
} ArtifactUnpublishPayload;
```
---
## 5. Replay Semantics (Normative)
To reconstruct CURRENT:
1. Load latest snapshot anchor (if any).
2. Initialize visible segments from that snapshot.
3. Replay all log records with `logseq > snapshot.logseq`.
4. Apply records in order:
   * SEGMENT_SEAL -> add segment
   * TOMBSTONE -> update policy state
   * TOMBSTONE_LIFT -> override policy
   * ARTIFACT_PUBLISH / ARTIFACT_UNPUBLISH -> update publication metadata
Replay MUST be deterministic.
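The replay steps above can be sketched as a small state machine. This is an illustration under simplifying assumptions: record type codes are hypothetical (real values come from `ENC-ASL-LOG`), each record carries a single 64-bit `subject` standing in for its payload, a lift clears the tombstone for its artifact without checking `tombstone_logseq`, and publication records are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical type codes; the real values are assigned by ENC-ASL-LOG.
enum { REC_SEGMENT_SEAL = 1, REC_TOMBSTONE = 2, REC_TOMBSTONE_LIFT = 3 };

typedef struct {
    uint64_t logseq;
    uint32_t type;
    uint64_t subject;  // segment_id, or a simplified artifact key
} Record;

#define MAX_ITEMS 64

typedef struct {
    uint64_t segments[MAX_ITEMS];    // visible segment ids
    size_t   nsegments;
    uint64_t tombstoned[MAX_ITEMS];  // artifacts currently tombstoned
    size_t   ntombstoned;
    uint64_t logseq;                 // last applied logseq (snapshot position at start)
} State;

void set_add(uint64_t *a, size_t *n, uint64_t v)
{
    for (size_t i = 0; i < *n; i++)
        if (a[i] == v) return;
    assert(*n < MAX_ITEMS);
    a[(*n)++] = v;
}

void set_remove(uint64_t *a, size_t *n, uint64_t v)
{
    for (size_t i = 0; i < *n; i++)
        if (a[i] == v) { a[i] = a[--(*n)]; return; }
}

// Applies records with st->logseq < logseq <= upto, in log order.
void replay(State *st, const Record *recs, size_t n, uint64_t upto)
{
    for (size_t i = 0; i < n; i++) {
        const Record *r = &recs[i];
        if (r->logseq <= st->logseq) continue;  // already reflected in state
        if (r->logseq > upto) break;            // beyond the requested prefix
        switch (r->type) {
        case REC_SEGMENT_SEAL:
            set_add(st->segments, &st->nsegments, r->subject);
            break;
        case REC_TOMBSTONE:
            set_add(st->tombstoned, &st->ntombstoned, r->subject);
            break;
        case REC_TOMBSTONE_LIFT:
            set_remove(st->tombstoned, &st->ntombstoned, r->subject);
            break;
        default:
            break;                              // unknown types are skipped
        }
        st->logseq = r->logseq;
    }
}
```

Tracking the last applied `logseq` in the state makes a second replay of the same prefix a no-op, and lets the same function extend CURRENT when the prefix grows.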
---
## 6. Index Interaction
* Index segments contain index entries.
* The log never records individual index entries.
* Visibility is controlled solely by SEGMENT_SEAL.
* Index rebuild = scan visible segments + apply policy.
---
## 7. Garbage Collection Constraints
* A segment may be GC'd only if:
  * No snapshot references it.
  * No replay of a log prefix up to CURRENT requires it.
* Log truncation is only safe at SNAPSHOT_ANCHOR boundaries.
---
## 8. Versioning & Extensibility
* Unknown record types MUST be skipped and MUST NOT break replay.
* Payloads are opaque outside their type.
* New record types may be added in later versions.
---
## 9. Non-Goals
ASL/LOG/1 does not define:
* Federation protocols
* Network replication
* Witness signatures
* Block-level events
* Hydration / eviction
* Execution receipts
---
## 10. Invariant (Informative)
> If it affects visibility, admissibility, or authority, it goes in the log.
> If it affects layout or performance, it does not.
---
## 11. Summary
ASL/LOG/1 defines the minimal semantic log needed to reconstruct CURRENT.
If it affects visibility or admissibility, it goes in the log. If it affects layout or performance, it does not.

tier1/asl-store-index-1.md

@ -0,0 +1,414 @@
# ASL/STORE-INDEX/1 — Store Semantics and Contracts for ASL Core Index
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, index, log, storage]
<!-- Source: /amduat-api/tier1/asl-store-index.md | Canonical: /amduat/tier1/asl-store-index-1.md -->
**Document ID:** `ASL/STORE-INDEX/1`
**Layer:** L1 — Store lifecycle and replay contracts (no encoding)
**Depends on (normative):**
* `ASL/1-CORE-INDEX` — semantic index model
* `ASL/LOG/1` — append-only log semantics
**Informative references:**
* `ENC/ASL-CORE-INDEX/1` — index segment encoding
* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)
* `TGK/1` — TGK semantics and visibility alignment
* `TGK/1-CORE` — EdgeBody and EdgeTypeId definitions
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 1. Purpose
This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.
It specifies:
* **Block lifecycle**: creation, sealing, retention, GC
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot identity and log positions** for deterministic replay
* **Append-only log semantics**
* **Lookup, visibility, and crash recovery rules**
* **Small vs large block handling**
It **does not define encoding** (see `ENC/ASL-CORE-INDEX/1`) or semantic mapping (see `ASL/1-CORE-INDEX`).
---
## 2. Scope
Covers:
* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
* Packing policy for small vs large artifacts
Excludes:
* Disk-level encoding
* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
* Memory residency or caching
* Federation, PEL, or TGK semantics (see `TGK/1` and `TGK/1-CORE`)
---
## 3. Core Concepts
### 3.1 Block
* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique).
* **Properties:**
  * Once sealed, contents never change.
  * Can be referenced by multiple artifacts.
  * May be pinned by snapshots for retention.
  * Allocation method is implementation-defined (e.g., hash or sequence).
### 3.2 Index Segment
Segments group index entries and provide **persistence and recovery units**.
* **Open segment:** accepting new index entries, not visible for lookup.
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
* **Segment components:** header, optional bloom filter, index records, footer.
* **Segment visibility:** only after seal and log append.
### 3.3 Append-Only Log
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
* Entries include:
  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT.
* Log semantics are defined in `ASL/LOG/1`.
### 3.4 Snapshot Identity and Log Position
To make CURRENT referencable and replayable, ASL-STORE-INDEX defines:
* **SnapshotID**: opaque, immutable identifier for a snapshot.
* **LogPosition**: monotonic integer position in the append-only log.
* **IndexState**: `(SnapshotID, LogPosition)`.
Deterministic replay is defined as:
```
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
```
Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
### 3.5 Artifact Location
* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* Multi-extent locations allow a single artifact to be striped across multiple blocks.
---
## 4. PUT/GET Contract (Normative)
### 4.1 PUT Signature
```
put(artifact) -> (ArtifactKey, IndexState)
```
* `ArtifactKey` is the content identity (ASL/1-CORE-INDEX).
* `IndexState = (SnapshotID, LogPosition)` after the PUT is admitted.
### 4.2 PUT Semantics
1. **Structural registration (if applicable)**: if a structural index (SID -> DAG) exists, PUT MUST register the artifact in it and reuse existing SID entries.
2. **Materialization (if applicable)**: if the artifact is lazy, materialize deterministically to derive `ArtifactKey`.
3. **Deduplication**: look up `ArtifactKey` at CURRENT. If present, PUT MUST succeed without writing bytes or adding a new index entry.
4. **Storage**: if absent, write bytes to one or more blocks, seal those blocks, and produce `ArtifactLocation`.
5. **Index mutation**: append an index entry mapping `ArtifactKey -> ArtifactLocation` and record visibility via log order.
### 4.3 PUT Guarantees
* PUT is idempotent for identical artifacts.
* No visible index entry points to mutable or missing bytes.
* Visibility follows log order and seal rules defined in this document.
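The deduplication and idempotence guarantees can be sketched with a toy in-memory store. Everything here is a simplifying assumption: a single fixed buffer stands in for block storage, 64-bit FNV-1a stands in for the content hash that yields `ArtifactKey`, block and segment sealing are not modeled, and `Store`, `content_key`, and `put` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 32

typedef struct {
    uint64_t key;     // simplified ArtifactKey
    size_t   offset;  // single-extent location within the block
    size_t   length;
} IndexEntry;

typedef struct {
    uint8_t    block[1024];  // one buffer standing in for block storage
    size_t     used;
    IndexEntry entries[MAX_ENTRIES];
    size_t     nentries;
    uint64_t   log_position;
} Store;

// FNV-1a 64-bit as a stand-in for the real content hash.
uint64_t content_key(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Returns the ArtifactKey; *log_position receives the resulting log position.
uint64_t put(Store *s, const uint8_t *bytes, size_t n, uint64_t *log_position)
{
    uint64_t key = content_key(bytes, n);
    for (size_t i = 0; i < s->nentries; i++) {
        if (s->entries[i].key == key) {  // dedup: no write, no new entry
            *log_position = s->log_position;
            return key;
        }
    }
    assert(s->nentries < MAX_ENTRIES && n <= sizeof s->block - s->used);
    memcpy(s->block + s->used, bytes, n);  // bytes are stored before indexing
    s->entries[s->nentries++] = (IndexEntry){ key, s->used, n };
    s->used += n;
    *log_position = ++s->log_position;     // entry becomes visible in log order
    return key;
}
```

A repeated PUT of the same bytes returns the same key and the same log position, and leaves both the block and the index untouched.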
### 4.4 GET Signature
```
get(ArtifactKey, IndexState?) -> bytes | NOT_FOUND
```
* `IndexState` defaults to CURRENT when omitted.
### 4.5 GET Semantics
1. Resolve `ArtifactKey -> ArtifactLocation` using `Index(snapshot, log_prefix)`.
2. If no entry exists, return `NOT_FOUND`.
3. Otherwise, read exactly the bytes of each referenced `(BlockID, offset, length)` extent, concatenate them in order, and return them verbatim.
GET MUST NOT mutate state or trigger materialization.
### 4.6 Failure Semantics
* Partial writes MUST NOT become visible.
* Replay of snapshot + log after crash MUST reconstruct a valid CURRENT.
* Implementations MAY use caching, but MUST preserve determinism.
---
## 5. Block Lifecycle Semantics
| Event | Description | Semantic Guarantees |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
Notes:
* Sealing ensures any index entry referencing the block is immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.
---
## 6. Segment Lifecycle Semantics
### 6.1 Creation
* Open segment is allocated.
* Index entries appended in log order.
* Entries are invisible until segment seal and log append.
### 6.2 Seal
* Segment is closed to append.
* Seal record is written to append-only log.
* Segment becomes visible for lookup.
* Sealed segment may be snapshot-pinned.
### 6.3 Snapshot Interaction
* Snapshots capture sealed segments.
* Open segments need not survive snapshot.
* Segments below snapshot are replay anchors.
---
## 7. Visibility and Lookup Semantics
### 7.1 Visibility Rules
* Entry visible **iff**:
  * The block is sealed.
  * Log record exists at position ≤ CURRENT.
  * Segment seal recorded in log.
* Entries above CURRENT or referencing unsealed blocks are invisible.
### 7.2 Lookup Semantics
To resolve an `ArtifactKey`:
1. Identify all visible segments ≤ CURRENT.
2. Search segments in **reverse seal-log order** (highest seal log position first).
3. Return first matching entry.
4. Respect tombstones to shadow prior entries.
Determinism:
* Lookup results are identical across platforms given the same snapshot and log prefix.
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
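The lookup steps above can be sketched as follows. This is a simplified illustration: `Segment`, `TOMBSTONE`, and the assumption that a segment holds at most one entry per key are hypothetical and not defined by this specification.

```python
from dataclasses import dataclass

TOMBSTONE = object()  # hypothetical marker for a tombstone entry


@dataclass
class Segment:
    seal_logpos: int  # log position of the segment's seal record
    entries: dict     # ArtifactKey -> ArtifactLocation or TOMBSTONE


def lookup(segments, key, current_logpos):
    """Resolve key against segments sealed at or below current_logpos."""
    visible = [s for s in segments if s.seal_logpos <= current_logpos]
    # Highest seal position first, so newer entries shadow older ones.
    for seg in sorted(visible, key=lambda s: s.seal_logpos, reverse=True):
        if key in seg.entries:
            hit = seg.entries[key]
            return None if hit is TOMBSTONE else hit
    return None
```

Because the result depends only on the segment set and `current_logpos`, two nodes holding the same snapshot and log prefix resolve the same key to the same location.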
---
## 8. Snapshot Interaction
* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:
```
CURRENT = snapshot_state + replay(log)
```
Segment and block visibility rules:
| Entity | Visible in snapshot | Visible in CURRENT |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
---
## 9. Garbage Collection
Eligibility for GC:
* Segments: sealed, no references from CURRENT or snapshots.
* Blocks: unpinned, unreferenced by any segment or artifact.
Rules:
* GC is safe **only on sealed segments and blocks**.
* Must respect snapshot pins.
* Tombstones may aid in invalidating unreachable blocks.
* Snapshots retained for provenance or receipt verification MUST remain pinned.
Outcome:
* GC never violates CURRENT reconstruction.
* Blocks can be reclaimed without breaking provenance.
---
## 10. Tombstone Semantics
* Optional marker to invalidate prior mappings.
* Visibility rules identical to regular index entries.
* Used to maintain deterministic CURRENT in face of shadowing or deletions.
---
## 11. Small vs Large Block Handling
### 11.1 Definitions
| Term | Meaning |
| ----------------- | --------------------------------------------------------------------- |
| **Small block** | Block containing artifact bytes below a threshold `T_small`. |
| **Large block** | Block containing artifact bytes ≥ `T_small`. |
| **Mixed segment** | Segment containing both small and large blocks (discouraged). |
| **Packing** | Combining multiple small artifacts into a single physical block. |
| **BlockID** | Opaque identifier for a block; addressing is identical for all sizes. |
Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.
`T_small` is configurable per deployment.
### 11.2 Packing Rules
1. **Small blocks may be packed together** to reduce storage overhead.
2. **Large blocks are never packed with other artifacts**.
3. Mixed segments are **allowed but discouraged**; implementations MAY warn when mixing occurs.
### 11.3 Segment Allocation Rules
1. Small blocks are allocated into segments optimized for packing efficiency.
2. Large blocks are allocated into segments optimized for sequential I/O.
3. Segment sealing and visibility rules remain unchanged.
### 11.4 Indexing and Addressing
All blocks are addressed uniformly:
```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```
Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
### 11.5 GC and Retention
1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
2. Large blocks are reclaimed per block.
Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
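A minimal sketch of the reclamation rule, assuming the store can enumerate which artifacts live in each sealed block. The function name and its three inputs are illustrative and not part of any defined API.

```python
def reclaimable_blocks(block_artifacts, reachable_keys, pinned_blocks):
    """Return BlockIDs that are unpinned and contain no reachable artifact.

    block_artifacts: dict BlockID -> set of ArtifactKeys stored in that block
    reachable_keys:  keys referenced by CURRENT or any retained snapshot
    pinned_blocks:   BlockIDs pinned by snapshots
    """
    return sorted(
        block_id
        for block_id, keys in block_artifacts.items()
        if block_id not in pinned_blocks and keys.isdisjoint(reachable_keys)
    )
```

A packed block with one live artifact is kept whole; a large block follows the same rule with a single-element key set.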
---
## 12. Crash and Recovery Semantics
* Open segments or unsealed blocks may be lost; no invariant is broken.
* Recovery procedure:
  1. Mount last checkpoint snapshot.
  2. Replay append-only log from checkpoint.
  3. Reconstruct CURRENT.
* Recovery is **deterministic and idempotent**.
* Segments and blocks **never partially visible** after crash.
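The recovery procedure can be illustrated as a pure function over a snapshot and a log prefix. The record shape `(kind, key, location)` and the two record kinds are assumptions for this sketch; the real log format is defined elsewhere.

```python
def replay(snapshot_index, log, log_prefix):
    """Rebuild the key -> location mapping from a snapshot plus a log prefix.

    snapshot_index: dict captured at the checkpoint
    log:            list of ("put", key, location) or ("tombstone", key, None)
    log_prefix:     number of post-checkpoint log records to apply
    """
    current = dict(snapshot_index)  # the snapshot itself is never mutated
    for kind, key, location in log[:log_prefix]:
        if kind == "put":
            current[key] = location
        elif kind == "tombstone":
            current.pop(key, None)
        # Other record kinds are ignored in this sketch.
    return current
```

Running `replay` twice with the same inputs yields the same mapping, which is the idempotence property stated above.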
---
## 13. Normative Invariants
1. Sealed blocks are immutable.
2. Index entries referencing blocks are immutable once visible.
3. Shadowing follows strict log order.
4. Replay of snapshot + log uniquely reconstructs CURRENT.
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.
---
## 14. Non-Goals
* Disk-level encoding (ENC-ASL-CORE-INDEX).
* Memory layout or caching.
* Sharding or performance heuristics.
* Federation / multi-domain semantics (handled elsewhere).
* Block packing strategies beyond the policy rules here.
---
## 15. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
---
## 16. Summary
The tier1 ASL-STORE-INDEX specification:
* Defines **block lifecycle** and **segment lifecycle**.
* Makes **snapshot identity and log positions** explicit for replay.
* Ensures deterministic visibility, lookup, and crash recovery.
* Formalizes GC safety and tombstone behavior.
* Adds clear **small vs large block** handling without changing core semantics.

213
tier1/asl-system-1.md Normal file

@ -0,0 +1,213 @@
# ASL/SYSTEM/1 — Unified ASL + TGK + PEL System View
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-01-17
Linked Phase Pack: N/A
Tags: [deterministic, federation, pel, tgk, index]
<!-- Source: /amduat-api/tier1/asl-system-1.md | Canonical: /amduat/tier1/asl-system-1.md -->
**Document ID:** `ASL/SYSTEM/1`
**Layer:** L2 — Cross-cutting system view (no new encodings)
**Depends on (normative):**
* `ASL/1-CORE`
* `ASL/1-CORE-INDEX`
* `ASL/STORE-INDEX/1`
* `ASL/LOG/1`
* `ENC/ASL-CORE-INDEX/1`
**Informative references:**
* `ASL/INDEX-ACCEL/1`
* `TGK/1` — Trace Graph Kernel semantics
* PEL draft specs (program DAG, execution receipts)
* `ASL/FEDERATION/1` — core federation semantics
* `ASL/FEDERATION-REPLAY/1` — cross-node deterministic replay
* `ASL/DAP/1` — domain admission
* `ASL/POLICY-HASH/1` — policy binding
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are
to be interpreted as in RFC 2119.
ASL/SYSTEM/1 is an integration view. It does not define new encodings or
storage formats; those remain in the underlying layer specs.
---
## 1. Purpose & Scope
This document aligns the cross-cutting semantics of:
* ASL index and log behavior
* PEL deterministic execution
* TGK edge semantics and traversal
* Federation visibility and replay
It ensures a single, consistent model for determinism, snapshot bounds, and
domain visibility.
Non-goals:
* New on-disk encodings
* New execution operators
* Domain policy or governance rules
---
## 2. Core Objects (Unified View)
* **Artifact**: immutable byte value (ASL/1-CORE).
* **PER**: PEL Execution Receipt stored as an artifact.
* **TGK Edge**: immutable edge record linking artifacts and/or PERs.
* **Snapshot + Log Prefix**: boundary for deterministic visibility and replay.
* **Domain Visibility**: internal vs published visibility embedded in index
records (ENC-ASL-CORE-INDEX).
All of these objects are addressed and stored via the same index semantics.
---
## 3. Determinism & Snapshot Boundaries
For a fixed `(SnapshotID, LogPrefix)`:
* Index lookup is deterministic (ASL/1-CORE-INDEX).
* TGK traversal is deterministic when bounded by the same snapshot/log prefix.
* PEL execution is deterministic when its inputs are bounded by the same
snapshot/log prefix.
PEL MUST read only snapshot-scoped artifacts and receipts. It MUST NOT depend
on storage layout, block packing, or non-snapshot metadata.
PEL outputs (artifacts and PERs) become visible only through normal index
admission and log ordering.
Beyond snapshot-scoped artifacts and receipts, PEL MAY read only:
* snapshot identity
* execution configuration that is itself snapshot-scoped and immutable
---
## 4. One PEL Principle (Resolution)
There is exactly one PEL: a deterministic, snapshot-bound, authority-aware
derivation language mapping artifacts to artifacts.
Distinctions such as "PEL-S" vs "PEL-P" are not separate languages. They are
policy decisions about how outputs are treated:
* **Promotion** (truth vs view) is a domain policy decision.
* **Publication** (internal vs published) is a visibility decision encoded in
index metadata.
* **Retention** (store, cache, discard, recompute) is a store policy decision.
Implementations MUST NOT fork PEL semantics into separate dialects. Any
classification of outputs MUST be expressed via policy, publication flags, or
receipt annotations, not by changing the execution language.
---
## 5. PEL, PERs, and TGK Integration
* PEL programs consume artifacts and/or PERs.
* PEL execution produces artifacts and a PER describing the run.
* TGK edges may reference artifacts, PERs, or projections derived from them.
---
### 5.1 PERs and Snapshot State (Clarification)
PERs are artifacts that bind deterministic execution to a specific snapshot
and log prefix. They do not introduce a separate storage layer:
* The sequential log and snapshot define CURRENT.
* A PER records that execution observed CURRENT at a specific log prefix.
* Replay uses the same snapshot + log prefix to reconstruct inputs.
* PERs are artifacts and MAY be used as inputs, but programs embedded in
receipts MUST NOT be executed implicitly.
TGK remains a semantic graph layer; it does not alter PEL determinism and does
not bypass the index.
---
## 6. Federation Alignment
Federation operates over the same immutable artifacts, PERs, and TGK edges.
Cross-domain visibility is governed by index metadata:
* `domain_id` identifies the owning domain.
* `visibility` marks internal vs published.
* `cross_domain_source` preserves provenance for imported artifacts.
Deterministic replay across nodes MUST respect:
* Snapshot boundaries
* Log order
* Domain visibility rules
Federation does not change PEL semantics. It propagates artifacts and receipts
that were already deterministically produced.
Admission and policy compatibility gate foreign state: only admitted domains and
policy-compatible published state may be included in a federation view.
---
## 7. Index Alignment
The index is the shared substrate:
* Artifacts, PERs, and TGK edges are all indexed via the same lookup semantics.
* Sharding, SIMD, and filters (ASL/INDEX-ACCEL/1) are advisory and MUST NOT
change correctness.
* Tombstones and shadowing remain the only visibility overrides.
---
## 8. Glossary and Terminology Alignment (Informative)
To prevent drift across layers, the following terms map as:
* **EdgeBody** (`TGK/1-CORE`) — logical edge content (`from[]`, `to[]`, `payload`, `type`).
* **EdgeArtifact** (`TGK/1-CORE`) — ASL Artifact whose payload encodes an EdgeBody.
* **EdgeRef** (`TGK/1-CORE`) — ASL Reference to an EdgeArtifact.
* **TGK index record** (`TGK/1`, `ASL/1-CORE-INDEX`) — index entry that makes an EdgeRef visible under snapshot/log rules; contains no edge payload.
* **TGK traversal result** (`TGK/1`) — snapshot/log-bounded set of visible edges (EdgeRefs) and/or node references derived from indexed EdgeArtifacts.
---
## 9. Summary
ASL/SYSTEM/1 provides a single, consistent view:
* One PEL, with policy-based output treatment
* TGK and PEL both bounded by snapshot + log determinism
* Federation mediated by index-level domain metadata
* Index semantics remain the core substrate for all objects


@ -0,0 +1,251 @@
# ASL/TGK-EXEC-PLAN/1 -- Unified Execution Plan Semantics
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-01-17
Linked Phase Pack: N/A
Tags: [execution, query, tgk, determinism]
<!-- Source: /amduat-api/tier1/asl-tgk-execution-plan-1.md | Canonical: /amduat/tier1/asl-tgk-execution-plan-1.md -->
**Document ID:** `ASL/TGK-EXEC-PLAN/1`
**Layer:** L2 -- Execution plan semantics (no encoding)
**Depends on (normative):**
* `ASL/1-CORE-INDEX`
* `ASL/LOG/1`
* `ASL/INDEX-ACCEL/1`
* `TGK/1`
**Informative references:**
* `ASL/SYSTEM/1`
* `ENC/ASL-CORE-INDEX/1`
* `ENC/ASL-TGK-EXEC-PLAN/1`
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/TGK-EXEC-PLAN/1 defines execution plan semantics for querying artifacts and TGK edges. It does not define encoding, transport, or runtime scheduling.
---
## 1. Purpose
This document defines the operator model and determinism rules for executing queries over ASL artifacts and TGK edges using snapshot-bounded visibility.
---
## 2. Execution Plan Model (Normative)
An execution plan is a DAG of operators:
```
Plan = { nodes: [Op], edges: [(Op -> Op)] }
```
Each operator includes:
* `op_id`: unique identifier
* `op_type`: operator type
* `inputs`: upstream operator outputs
* `snapshot`: `(SnapshotID, LogPrefix)`
* `constraints`: canonical filters
* `projections`: output fields
* `traversal`: optional traversal parameters
* `aggregation`: optional aggregation parameters
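As an informal illustration of the DAG requirement, the sketch below derives a deterministic operator order from `nodes` and `edges` and rejects cyclic plans. Operators are reduced to their `op_id` strings, and the tie-break by sorted `op_id` is an assumption of this example, not a rule of this specification.

```python
def execution_order(nodes, edges):
    """Return op_ids in a deterministic topological order.

    nodes: iterable of op_id strings
    edges: iterable of (upstream_op_id, downstream_op_id) pairs
    Raises ValueError if the plan is not a DAG.
    """
    pending = {op: set() for op in nodes}  # op_id -> unmet upstream op_ids
    for upstream, downstream in edges:
        pending[downstream].add(upstream)
    order = []
    while pending:
        ready = sorted(op for op, deps in pending.items() if not deps)
        if not ready:
            raise ValueError("plan contains a cycle")
        for op in ready:
            del pending[op]
        for deps in pending.values():
            deps.difference_update(ready)
        order.extend(ready)
    return order
```

The order does not depend on the order in which nodes or edges are listed, so a serialized and re-parsed plan is scheduled identically.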
---
### 2.1 Query Abstraction (Informative)
A query can be represented as:
```
Q = {
  snapshot: S,
  constraints: C,
  projections: P,
  traversal: optional,
  aggregation: optional
}
```
Where:
* `constraints` describe canonical filters (artifact keys, type tags, edge types, roles, node IDs).
* `projections` select output fields.
* `traversal` declares TGK traversal depth and direction.
* `aggregation` defines deterministic reduction operations.
---
## 3. Deterministic Ordering (Normative)
All operator outputs MUST be ordered by:
1. `logseq` ascending
2. canonical key ascending (tie-breaker)
Parallel execution MUST preserve this order.
---
## 4. Visibility Rules (Normative)
Records are visible if and only if:
* `record.logseq <= snapshot.log_prefix`
* The record is not shadowed by a later tombstone
Unknown record types MUST be skipped without breaking determinism.
---
## 5. Operator Types (Normative)
### 5.1 SegmentScan
* Inputs: sealed segments
* Outputs: raw record references
* Rules:
  * Only segments with `segment.logseq_min <= snapshot.log_prefix` are scanned.
  * Advisory filters MAY be applied but MUST NOT introduce false negatives.
  * Shard routing MAY be applied prior to scan if deterministic.
### 5.2 IndexFilter
* Inputs: record stream
* Outputs: filtered record stream
* Rules:
  * Applies canonical constraints (artifact key, type tag, TGK edge type, roles).
  * Filters MUST be exact; advisory filters are not sufficient.
### 5.3 TombstoneShadow
* Inputs: record stream + tombstone stream
* Outputs: visible records only
* Rules:
  * Later tombstones shadow earlier entries with the same canonical key.
### 5.4 Merge
* Inputs: multiple ordered streams
* Outputs: single ordered stream
* Rules:
  * Order is `logseq` then canonical key.
  * Merge MUST be deterministic regardless of shard order.
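A small sketch of the Merge operator using Python's standard `heapq.merge`. Records are modelled as `(logseq, canonical_key, payload)` tuples, which is an assumption of this example; each input stream is expected to be sorted already.

```python
import heapq


def merge_streams(*streams):
    """Merge streams pre-sorted by (logseq, canonical_key) into one ordered list."""
    return list(heapq.merge(*streams, key=lambda rec: (rec[0], rec[1])))
```

When `(logseq, canonical_key)` pairs are unique across shards, the output is the same whatever order the shard streams are passed in. If duplicates could occur, an additional tie-breaker would be needed to keep the result independent of shard order.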
### 5.5 Projection
* Inputs: record stream
* Outputs: projected fields
* Rules:
  * Projection MUST preserve input order.
### 5.6 TGKTraversal
* Inputs: seed node set
* Outputs: edge and/or node stream
* Rules:
  * Expansion MUST respect snapshot bounds.
  * Traversal depth MUST be explicit.
  * Order MUST follow deterministic ordering rules.
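The three traversal rules can be illustrated with a bounded breadth-first expansion. The edge tuple shape `(logseq, edge_key, from_node, to_node)` is hypothetical, only the forward direction is followed, and tombstone shadowing is omitted to keep the sketch short.

```python
def traverse(edges, seeds, max_depth, log_prefix):
    """Expand from seeds over edges visible at log_prefix, up to max_depth hops.

    Returns the visited edges ordered by (logseq, edge_key).
    """
    visible = [e for e in edges if e[0] <= log_prefix]  # snapshot bound
    frontier, seen_nodes, result = set(seeds), set(seeds), set()
    for _ in range(max_depth):                          # explicit depth
        next_frontier = set()
        for edge in visible:
            _, _, src, dst = edge
            if src in frontier:
                result.add(edge)
                if dst not in seen_nodes:
                    seen_nodes.add(dst)
                    next_frontier.add(dst)
        frontier = next_frontier
    return sorted(result, key=lambda e: (e[0], e[1]))   # deterministic order
```

Edges with `logseq` above the prefix are never expanded, so the same `(seeds, max_depth, log_prefix)` triple yields the same edge list on every node.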
### 5.7 Aggregation (Optional)
* Inputs: record stream
* Outputs: aggregate results
* Rules:
  * Aggregation MUST be deterministic given identical inputs and snapshot.
### 5.8 LimitOffset (Optional)
* Inputs: ordered record stream
* Outputs: ordered slice
* Rules:
  * Applies pagination or top-N selection.
  * MUST preserve deterministic order from upstream operators.
### 5.9 ShardDispatch (Optional)
* Inputs: shard-local streams
* Outputs: ordered global stream
* Rules:
  * Shard execution MAY be parallel.
  * Merge MUST preserve deterministic ordering by `logseq` then canonical key.
### 5.10 SIMDFilter (Optional)
* Inputs: record stream
* Outputs: filtered record stream
* Rules:
  * SIMD filters are advisory accelerators.
  * Canonical checks MUST still be applied before output.
---
## 6. Acceleration Constraints (Normative)
Acceleration mechanisms (filters, routing, SIMD) MUST be observationally invisible:
* False positives are permitted.
* False negatives are forbidden.
* Canonical checks MUST always be applied before returning results.
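The constraint can be pictured as an advisory pre-check followed by an exact comparison. In this sketch `advisory_maybe_contains` stands in for any accelerator (for example a Bloom-style filter); both the name and the record shape `(logseq, canonical_key, payload)` are assumptions.

```python
def filtered_scan(records, wanted_key, advisory_maybe_contains):
    """Return records whose canonical key equals wanted_key.

    advisory_maybe_contains(key) may return True for absent keys (false
    positive) but must never return False for a key that is present.
    """
    if not advisory_maybe_contains(wanted_key):
        return []  # safe only because false negatives are forbidden
    # The exact comparison is the canonical check; it runs on every candidate.
    return [rec for rec in records if rec[1] == wanted_key]
```

Swapping a precise accelerator for one that always answers "maybe" changes the amount of work done but not the result, which is what observational invisibility requires.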
---
## 7. Plan Serialization (Optional)
Execution plans MAY be serialized for reuse or deterministic replay.
```c
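#include <stdint.h>
struct operator_def;   /* layout defined elsewhere (e.g. an encoding profile) */
struct operator_edge;  /* layout defined elsewhere */
/* Illustrative only: C allows a single trailing flexible array member,
 * so both variable-length sections are referenced by pointer here. */
struct exec_plan {
    uint32_t plan_version;
    uint32_t operator_count;
    uint32_t edge_count;              /* assumed field: number of edges */
    struct operator_def  *operators;  /* operator_count entries */
    struct operator_edge *edges;      /* edge_count entries */
};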
```
Serialization MUST preserve operator parameters, snapshot bounds, and DAG edges.
---
## 8. GC Safety (Informative)
Records and edges MUST NOT be removed if they appear in a snapshot or are
reachable via traversal at that snapshot.
---
## 9. Non-Goals
ASL/TGK-EXEC-PLAN/1 does not define:
* Runtime scheduling or parallelization strategy
* Encoding of operator plans
* Query languages or APIs
* Operator cost models

944
tier1/dds.md Normal file

@ -0,0 +1,944 @@
# AMDUAT-DDS — Detailed Design Specification
Status: Approved
Owner: Niklas Rydberg
Version: 0.5.0
SoT: Yes
Last Updated: 2025-11-11
Linked Phase Pack: PH01
Tags: [design, cas, composition]
<!-- Source: /amduat-api/tier1/dds.md | Canonical: /amduat/tier1/dds.md -->
**Document ID:** `AMDUAT-DDS`
**Layer:** L0.1 — Byte-level design (CAS + deterministic envelopes)
**Depends on (normative):**
* `AMDUAT-SRS` — behavioural requirements
* ADR-001 — CAS identity
* ADR-003 — canonical encoding discipline
* ADR-006 — deterministic error semantics
**Informative references:**
* ADR-015 — rejection governance
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
> **Note (scope):**
> This DDS covers **Phase 01 (Kheper CAS)** byte semantics and, where necessary, the canonical **binary encodings** for higher deterministic layers (FCS/1, PCB1, FER/1, FCT/1).
> **Behavioural semantics live in SRS.** This document governs the **bytes**.
---
## 1. Content ID (CID)
**Rule.**
```
CID = algo_id || H("CAS:OBJ\0" || payload_bytes)
```
* `algo_id`: 1-byte or VARINT identifier (default `0x01` = SHA-256).
* `H`: selected hash over **exact payload bytes**.
* Domain separation prefix must be present verbatim: `"CAS:OBJ\0"`.
**Properties.**
* Deterministic: identical payload → identical CID.
* Implementation-independent (SRS NFR-001).
* Crypto-agile via `algo_id`.
**Errors.**
* `ERR_ALGO_UNSUPPORTED` when `algo_id` not registered.
* Empty payload is allowed and canonical.
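The CID rule can be sketched with the standard `hashlib` module. The sketch assumes the default algorithm (`0x01`, SHA-256) and a 1-byte `algo_id`; the VARINT form and other registry entries are not handled.

```python
import hashlib

DOMAIN_PREFIX = b"CAS:OBJ\x00"  # domain separation prefix, NUL included
ALGO_SHA256 = 0x01


def compute_cid(payload: bytes) -> bytes:
    """Return algo_id || SHA-256("CAS:OBJ\\0" || payload) with a 1-byte algo_id."""
    digest = hashlib.sha256(DOMAIN_PREFIX + payload).digest()
    return bytes([ALGO_SHA256]) + digest
```

The result is 33 bytes: one algorithm byte followed by the 32-byte digest. Because of the prefix, the digest part differs from a plain SHA-256 of the payload.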
---
## 2. Canonical Object Record (COR/1)
COR/1 is the **only** canonical import/export envelope for CAS objects. Exact bytes are consensus; on-disk layout is not.
### 2.1 Envelope Layout (exact bytes)
```
Header (7 bytes total):
  MAGIC   : 4 bytes = "CAS1" (0x43 0x41 0x53 0x31)
  VERSION : 1 byte  = 0x01
  FLAGS   : 1 byte  = 0x00 (reserved; MUST be 0)
  RSV     : 1 byte  = 0x00 (reserved; MUST be 0)
Body (strict TLV order; no padding):
  0x10 algo_id (VARINT)
  0x11 size    (VARINT)
  0x12 payload (BYTES; length == size)
```
**Notes**
* Header invariants are fixed; any mismatch is rejected.
* No alignment/padding anywhere.
### 2.2 Tag Semantics
| Tag | Name | Type | Card. | Notes |
| ---: | ------- | ------ | ----: | ----------------------------------------------- |
| 0x10 | algo_id | VARINT |     1 | MUST equal algorithm used for the object's CID. |
| 0x11 | size | VARINT | 1 | **Minimal VARINT**; MUST equal payload length. |
| 0x12 | payload | BYTES | 1 | Raw bytes; never normalized. |
### 2.3 Canonicalization Rules (strict)
1. **Order & uniqueness:** `0x10`, `0x11`, `0x12`, each exactly once.
2. **VARINTS:** Unsigned LEB128 **minimal** form only.
3. **BYTES:** `VARINT(len) || len bytes`, with `len == size`.
4. **No extras:** No unknown tags, no trailing bytes.
5. **Header invariants:** `MAGIC="CAS1"`, `VERSION=0x01`, `FLAGS=RSV=0x00`.
6. **Policy domain:** `size ≤ max_object_size` when enforced (ICD/1 §3).
7. **Raw byte semantics** (SRS FR-010).
### 2.4 Decoder Validation Algorithm (normative)
1. Validate header ⇒ else `ERR_COR_HEADER_INVALID`.
2. Read `0x10` minimal VARINT ⇒ else `ERR_COR_TAG_ORDER` / `ERR_VARINT_NON_MINIMAL`.
3. Read `0x11` minimal VARINT ⇒ same error rules.
4. Read `0x12` BYTES (length minimal VARINT) ⇒ else `ERR_VARINT_NON_MINIMAL`.
5. Enforce `size == len(payload)` ⇒ `ERR_COR_LENGTH_MISMATCH` on failure.
6. Ensure **no trailing bytes** ⇒ else `ERR_TRAILING_BYTES`.
7. Recompute CID and compare ⇒ mismatch `ERR_CORRUPT_OBJECT`.
### 2.5 Consistency with CID (normative)
* **Export:** set `algo_id` to CID algorithm.
* **Import:** verify `algo_id` and hash component against expected CID.
* Mismatch ⇒ `ERR_ALGO_MISMATCH` / `ERR_CORRUPT_OBJECT`.
### 2.6 Round-Trip Identity
`import(COR/1) → export(CID)` MUST produce **byte-identical** envelope (SRS FR-005). Re-encoding is forbidden.
### 2.7 Rejection Matrix (normative)
| Violation | Example | Error |
| ------------------ | -------------------------------- | ------------------------- |
| Bad header | Wrong MAGIC/VERSION/FLAGS/RSV | `ERR_COR_HEADER_INVALID` |
| Unknown/extra tag | Any tag not 0x10/0x11/0x12 | `ERR_COR_UNKNOWN_TAG` |
| Out-of-order | `0x11` before `0x10` | `ERR_COR_TAG_ORDER` |
| Duplicate tag | Two `0x10` entries | `ERR_COR_DUPLICATE_TAG` |
| Non-minimal VARINT | Over-long algo/size/bytes length | `ERR_VARINT_NON_MINIMAL` |
| Length mismatch | `size != len(payload)` | `ERR_COR_LENGTH_MISMATCH` |
| Trailing bytes | Any bytes after payload | `ERR_TRAILING_BYTES` |
| Algo mismatch | `algo_id` conflicts with CID | `ERR_ALGO_MISMATCH` |
| Hash mismatch | Recomputed hash ≠ expected | `ERR_CORRUPT_OBJECT` |
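The envelope layout, the canonicalization rules, and the rejection matrix can be exercised with a compact encoder and decoder. This is a sketch under stated assumptions, not a reference implementation: each tag is taken to be a single raw byte preceding its value, only `algo_id = 0x01` is supported, and `ERR_TRUNCATED` is a hypothetical code introduced here because the matrix does not name one for input that ends early.

```python
import hashlib

HEADER = b"CAS1" + bytes([0x01, 0x00, 0x00])  # MAGIC, VERSION, FLAGS, RSV
TAGS = (0x10, 0x11, 0x12)                      # algo_id, size, payload


class CorError(ValueError):
    """Carries an ERR_* code as its message."""


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128, minimal form."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | (0x80 if n else 0x00))
        if not n:
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Decode one VARINT at pos; reject over-long encodings."""
    start, value, shift = pos, 0, 0
    while True:
        if pos >= len(buf):
            raise CorError("ERR_TRUNCATED")  # hypothetical code
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if buf[start:pos] != encode_varint(value):
        raise CorError("ERR_VARINT_NON_MINIMAL")
    return value, pos


def encode_cor(payload: bytes, algo_id: int = 0x01) -> bytes:
    size = encode_varint(len(payload))
    return (HEADER + b"\x10" + encode_varint(algo_id)
            + b"\x11" + size + b"\x12" + size + payload)


def decode_cor(buf: bytes, expected_cid: bytes = None):
    """Validate a COR/1 envelope and return (algo_id, payload)."""
    if buf[:7] != HEADER:
        raise CorError("ERR_COR_HEADER_INVALID")
    pos, values, seen = 7, [], set()
    for tag in TAGS:
        if pos >= len(buf):
            raise CorError("ERR_TRUNCATED")  # hypothetical code
        got = buf[pos]
        if got not in TAGS:
            raise CorError("ERR_COR_UNKNOWN_TAG")
        if got in seen:
            raise CorError("ERR_COR_DUPLICATE_TAG")
        if got != tag:
            raise CorError("ERR_COR_TAG_ORDER")
        seen.add(got)
        value, pos = read_varint(buf, pos + 1)
        values.append(value)
    algo_id, size, length = values
    payload = buf[pos:pos + length]
    if size != length or len(payload) != length:
        raise CorError("ERR_COR_LENGTH_MISMATCH")
    if pos + length != len(buf):
        raise CorError("ERR_TRAILING_BYTES")
    if algo_id != 0x01:
        raise CorError("ERR_ALGO_UNSUPPORTED")  # only SHA-256 in this sketch
    if expected_cid is not None:
        if expected_cid[:1] != bytes([algo_id]):
            raise CorError("ERR_ALGO_MISMATCH")
        if expected_cid[1:] != hashlib.sha256(b"CAS:OBJ\x00" + payload).digest():
            raise CorError("ERR_CORRUPT_OBJECT")
    return algo_id, payload
```

A decoder built this way never re-encodes: it either returns the payload from the exact bytes it was given or raises a single error code, which keeps round-trip identity (§2.6) trivially true for accepted input.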
---
## 3. Instance Descriptor (ICD/1)
ICD/1 publishes canonical instance configuration; its bytes are consensus.
### 3.1 Envelope
```
Header:
  MAGIC   : "ICD1"
  VERSION : 0x01
TLV (strict order; minimal VARINTs; no duplicates):
  0x20 algo_default    (VARINT)
  0x21 max_object_size (VARINT)
  0x22 cor_version     (VARINT)  # 0x01 => COR/1 v1
  0x23 gc_policy_id    (VARINT; 0 if none)
  0x24 impl_id         (BYTES; optional build/impl descriptor CID)
```
### 3.2 Derived Identity
```
instance_id = SHA-256("CAS:ICD\0" || bytes(ICD/1))
```
**Rules:** Ordering/minimal VARINTs mirror COR/1. Exporters preserve canonical bytes; `instance_id` is stable.
---
## 4. Encodings
* **VARINT (unsigned LEB128)** — minimal form only; else `ERR_VARINT_NON_MINIMAL`.
* **BYTES** — `VARINT(length) || length bytes`.
* **Fixed-width integers** — big-endian if present.
* **No padding/alignment** in canonical encodings.
---
## 5. Algorithm Registry
**Default**
* `0x01` → SHA-256
**Reserved**
* `0x02` → SHA-512/256
* `0x03` → BLAKE3
**Policy**
* New entries require ADR + test vectors. Backward compatible by design.
---
## 6. Filesystem Considerations (Informative)
```
cas/
├─ sha256/
│  ├─ aa/..              # fan-out by CID prefix (implementation detail)
│  └─ ff/..
└─ amduat/
   └─ <instance-id>/
      ├─ amduatcas
      ├─ sha256/..       # private runtime state; never a put() target
      ├─ interface/
      │  └─ libamduatcas.current
      ├─ HEAD
      └─ meta/
```
**Rule:** Public CAS API acts only on `cas/sha256/`. The per-instance subtree is private and MUST NOT receive `put()` writes.
---
## 7. Error Conditions & Higher-Layer Layouts (Normative)
### 7.1 COR/1 & ICD/1 Enforcement (codes)
* `ERR_COR_HEADER_INVALID`, `ERR_COR_UNKNOWN_TAG`, `ERR_COR_TAG_ORDER`, `ERR_COR_DUPLICATE_TAG`,
`ERR_COR_LENGTH_MISMATCH`, `ERR_VARINT_NON_MINIMAL`, `ERR_ALGO_UNSUPPORTED`,
`ERR_ALGO_MISMATCH`, `ERR_TRAILING_BYTES`, `ERR_CORRUPT_OBJECT`.
---
### 7.2 FCS/1 Descriptor Layout — v1-min (Normative)
> **Design principle:** *FCS/1 describes the deterministic execution recipe only.*
> Intent, roles, scope, authority, and registry policy are **not** encoded in FCS; they are captured at **certification time** in FCT/1.
Header: `MAGIC="FCS1" VERSION=0x01 FLAGS=RSV=0x00`
| Tag | Field | Type | Card. | Notes |
| ---: | ----------------- | ------ | ----: | ------------------------------------------ |
| 0x30 | `function_ptr` | CID | 1 | FPS/1 primitive or nested FCS/1 descriptor |
| 0x31 | `parameter_block` | CID | 1 | CID of PCB1 parameter block |
| 0x32 | `arity` | VARINT | 1 | Expected parameter slots |
**Validation rules**
1. Strict TLV order; duplicates/out-of-order → `ERR_FCS_TAG_ORDER`.
2. `parameter_block` MUST be valid PCB1 → `ERR_FCS_PARAMETER_FORMAT`.
3. `arity` MUST match slot count → `ERR_PCB_ARITY_MISMATCH`.
4. Descriptor graph MUST be acyclic → `ERR_FCS_CYCLE_DETECTED`.
5. **Any unknown or legacy governance tag** (`registry_policy 0x33`, `intent_vector 0x34`, `provenance_edge 0x35`, `notes 0x36`, or unregistered fields) → `ERR_FCS_UNKNOWN_TAG`. Such tags MUST never be tolerated in canonical streams.
---
### 7.3 PCB1 Parameter Blocks (Normative)
PCB1 payloads are COR/1 envelopes with header `MAGIC="PCB1"`, `VERSION=0x01`, `FLAGS=RSV=0x00`.
| Tag | Field | Type | Notes |
| ---: | --------------- | ----- | ----------------------------------------------------- |
| 0x50 | `slot_manifest` | BCF/1 | Canonical slot descriptors `{index,name,type,digest}` |
| 0x51 | `slot_data` | BYTES | Packed slot bytes respecting manifest order |
**Rules:**
Slots appear in ascending `index`. Numeric slots default to `0` when omitted.
Digest mismatches ⇒ `ERR_PCB_DIGEST_MISMATCH`. Non-deterministic ordering ⇒ `ERR_PCB_MANIFEST_ORDER`.
Arity mismatch vs FCS/1 ⇒ `ERR_PCB_ARITY_MISMATCH`.
---
### 7.4 **FER/1 Receipt Layout (Normative)**
FER/1 receipts reuse COR/1 framing with header `"FER1"` and are byte-deterministic.
**Strict TLV order (no padding):**
| Tag | Field | Type | Cardinality | Notes |
| ---- | --------------------- | ----------- | ----------- | ----- |
| 0x40 | `function_cid` | CID | 1 | Evaluated FCS/1 descriptor (must decode to v1-min). |
| 0x41 | `input_manifest` | CID | 1 | MUST decode to GS/1 BCF/1 set list (deduped, byte-lexicographic). |
| 0x42 | `environment` | CID | 1 | ICD/1 snapshot or PH03 environment capsule. |
| 0x43 | `evaluator_id` | BYTES | 1 | Stable evaluator identity (DID/descriptor CID). |
| 0x44 | `executor_set` | BCF/1 map | 1 | Map of executors → impl metadata (language/version/build); keys sorted. |
| 0x4F | `executor_fingerprint`| CID | 0–1 | SBOM/attestation CID feeding `run_id`; REQUIRED when `run_id` present. |
| 0x45 | `output_cid` | CID | 1 | Canonical output CID (single-output invariant). |
| 0x46 | `parity_vector` | BCF/1 list | 1 | Sorted by executor key; each entry carries `{executor, output, digest, sbom_cid}`. |
| 0x47 | `logs` | LIST<BCF/1> | 0–1 | Typed log capsules (`kind`, `cid`, `sha256`). |
| 0x51 | `determinism_level` | ENUM | 0–1 | `"D1_bit_exact"` (default) or `"D2_numeric_stable"`. |
| 0x50 | `rng_seed` | BYTES | 0–1 | 0–32 byte seed REQUIRED when determinism ≠ D1. |
| 0x52 | `limits` | BCF/1 map | 0–1 | Resource envelope (`cpu_ms`, `wall_ms`, `max_rss_kib`, `io_reads`, `io_writes`). |
| 0x48 | `started_at` | UINT64 | 1 | Epoch seconds (FR-020 start bound). |
| 0x49 | `completed_at` | UINT64 | 1 | Epoch seconds ≥ `started_at`. |
| 0x53 | `parent` | CID | 0–1 | Optional lineage pointer for follow-up runs. |
| 0x4A | `context` | BCF/1 map | 0–1 | Optional scheduling hooks (WT/1 ticket, TA/1 branch tip, notes ref). |
| 0x4B | `witnesses` | BCF/1 list | 0–1 | Optional observer descriptors / co-signers. |
| 0x4E | `run_id` | BYTES[32] | 0–1 | Deterministic dedup anchor (`H("AMDUAT:RUN\0" || function || manifest || env || fingerprint)`). |
| 0x4C | `signature` | BCF/1 map | 1 | Primary Ed25519 signature over `H("AMDUAT:FER\0" || canonical bytes)`. |
| 0x4D | `signature_ext` | BCF/1 list | 0–1 | Reserved slot for multi-sig / threshold proofs (future). |
**Validation:**
1. TLV order strict; unknown tags ⇒ `ERR_FER_TAG_ORDER` / `ERR_FER_UNKNOWN_TAG`.
2. `function_cid` must decode to valid FCS/1 ⇒ `ERR_FER_FUNCTION_MISMATCH` otherwise.
3. `input_manifest` MUST decode to GS/1 set list (deduped + byte-lexicographic). Violations ⇒ `ERR_FER_INPUT_MANIFEST_SHAPE`.
4. `executor_set` keys MUST be byte-lexicographic and align with `parity_vector` entries. Ordering mismatches ⇒ `ERR_IMPL_PARITY_ORDER`; missing executors or divergent outputs ⇒ `ERR_IMPL_PARITY`.
5. Each parity entry MUST declare `sbom_cid` referencing the executor's mini-SBOM CID.
6. `determinism_level` defaults to `D1_bit_exact`; when set to any other value a 0–32 byte `rng_seed` is REQUIRED ⇒ `ERR_FER_RNG_REQUIRED`.
7. `limits` (when present) MUST supply non-negative integers for `cpu_ms`, `wall_ms`, `max_rss_kib`, `io_reads`, `io_writes`.
8. `logs` (when present) MUST contain objects with `kind ∈ {stderr, stdout, metrics, trace}`, `cid`, and `sha256` (both 32-byte hex strings).
9. `run_id` (when present) MUST equal `H("AMDUAT:RUN\0" || function_cid || manifest_cid || environment_cid || executor_fingerprint)`; missing fingerprint ⇒ `ERR_FER_UNKNOWN_TAG`.
10. `completed_at < started_at` ⇒ `ERR_FER_TIMESTAMP` (FR-020 envelope enforcement).
11. Signatures MUST verify against `H("AMDUAT:FER\0" || canonical bytes)` ⇒ failure ⇒ `ERR_FER_SIGNATURE`.
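A partial sketch of rules 6, 9, and 10, operating on an already decoded receipt held as a plain `dict`. Several points are assumptions of this example: `H` is taken to be SHA-256, CIDs are concatenated as raw bytes, and the dictionary keys mirror the field names in the table. TLV decoding, parity checks, and signature verification are out of scope here.

```python
import hashlib


def expected_run_id(function_cid: bytes, manifest_cid: bytes,
                    environment_cid: bytes, fingerprint_cid: bytes) -> bytes:
    """Rule 9 formula, assuming H is SHA-256 over the raw CID bytes."""
    return hashlib.sha256(
        b"AMDUAT:RUN\x00" + function_cid + manifest_cid
        + environment_cid + fingerprint_cid
    ).digest()


def check_fer(receipt: dict) -> list:
    """Return error codes for rules 6, 9 (missing fingerprint) and 10."""
    errors = []
    level = receipt.get("determinism_level", "D1_bit_exact")
    seed = receipt.get("rng_seed")
    if level != "D1_bit_exact" and (seed is None or len(seed) > 32):
        errors.append("ERR_FER_RNG_REQUIRED")
    if "run_id" in receipt and "executor_fingerprint" not in receipt:
        errors.append("ERR_FER_UNKNOWN_TAG")
    if receipt["completed_at"] < receipt["started_at"]:
        errors.append("ERR_FER_TIMESTAMP")
    return errors
```

When a fingerprint is present, a validator would additionally compare the stored `run_id` with `expected_run_id(...)`; the rules above do not name an error code for a mismatch, so the sketch leaves that comparison to the caller.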
> **Manifest note:** `input_manifest` bytes MUST be the GS/1 canonical list; ingestion MUST reject producer-specific ordering.
> **Log capsule note:** `logs` entries bind `kind`, `cid`, and `sha256` together to avoid stdout/stderr hash confusion.
> **Dedup note:** `run_id` enables idempotent FER ingestion across registries while keeping the FER CID authoritative.
> **Provenance note:** FER/1 remains the exclusive home for run-time provenance and parity outcomes; governance stays in FCT/1.
> **Graph note:** Ingestors emit `realizes`, `produced_by`, `consumed_by`, and (optionally) `fulfills` edges based solely on FER content.
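
The `run_id` derivation in validation rule 9 can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes `H` is SHA-256 and that the four fields are concatenated as raw bytes without length prefixes, neither of which this section confirms. The helper names `derive_run_id` and `check_run_id` are hypothetical.

```python
import hashlib

RUN_DOMAIN = b"AMDUAT:RUN\0"  # domain separator quoted in validation rule 9


def derive_run_id(function_cid: bytes, manifest_cid: bytes,
                  environment_cid: bytes, executor_fingerprint: bytes) -> bytes:
    """Return H(domain || fields). H is ASSUMED to be SHA-256 here."""
    preimage = (RUN_DOMAIN + function_cid + manifest_cid
                + environment_cid + executor_fingerprint)
    return hashlib.sha256(preimage).digest()


def check_run_id(declared: bytes, *fields: bytes) -> bool:
    """True when a declared run_id equals the recomputed value."""
    return declared == derive_run_id(*fields)
```

Because the field order is part of the preimage, swapping any two inputs yields a different `run_id`, which is what makes the value usable as an idempotency key.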
---
### 7.5 FCT/1 Transaction Envelope (Normative)
> **Design principle:** *FCT/1 is the canonical home for **intent**, **domain scope**, **roles/authority**, and **policy snapshot*** captured at certification/publication time.
FCT/1 serializes as an ADR-003 BCF/1 map with canonical keys:
| Key | Type | Notes |
| --------------------- | ----------- | ------------------------------------------------------- |
| `fct.version` | UINT8 | MUST be `1` |
| `fct.registry_policy` | UINT8 | Publication policy snapshot (0=Open,1=Curated,2=Locked) |
| `fct.function` | CID | Certified FCS/1 descriptor |
| `fct.receipts` | LIST<CID> | One or more FER/1 CIDs |
| `fct.authority_role` | ENUM | ADR-010C role |
| `fct.domain_scope` | ENUM | ADR-010B scope |
| `fct.intent` | SET<ENUM> | ADR-010 intents |
| `fct.constraints` | LIST<BCF/1> | Optional constraint set |
| `fct.attestations` | LIST<BYTES> | Required when policy ≠ Open |
| `fct.timestamp` | UINT64 | Epoch seconds |
| `fct.publication` | CID | Optional ADR-007 digest |
**Validation:**
1. All receipts reference the same `function_cid` ⇒ else `ERR_FCT_RECEIPT_MISMATCH`.
2. If `registry_policy ≠ 0` then `attestations` **required** ⇒ `ERR_FCT_ATTESTATION_REQUIRED`.
3. All signatures/attestations verify ⇒ `ERR_FCT_SIGNATURE` on failure.
4. Receipt timestamps must be monotonic ⇒ `ERR_FCT_TIMESTAMP`.
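
Rules 1, 2, and 4 above could be checked roughly as shown below. The sketch works on already-decoded values; the dictionary shapes, the `completed_at` field used for ordering, and the choice to compare each receipt against `fct.function` are assumptions made for illustration. Signature and attestation verification (rule 3) is omitted.

```python
from typing import Any, Optional


def validate_fct(fct: dict[str, Any], receipts: list[dict[str, Any]]) -> Optional[str]:
    """Return the first matching error code, or None when rules 1, 2, 4 pass."""
    # Rule 1 (assumed reading): every receipt points at the certified function.
    if any(r["function_cid"] != fct["fct.function"] for r in receipts):
        return "ERR_FCT_RECEIPT_MISMATCH"
    # Rule 2: any policy other than Open (0) needs at least one attestation.
    if fct["fct.registry_policy"] != 0 and not fct.get("fct.attestations"):
        return "ERR_FCT_ATTESTATION_REQUIRED"
    # Rule 4 (assumed reading): completed_at must not decrease in list order.
    stamps = [r["completed_at"] for r in receipts]
    if any(later < earlier for earlier, later in zip(stamps, stamps[1:])):
        return "ERR_FCT_TIMESTAMP"
    return None
```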
---
### 7.6 FPD/1 Publication Digest (Normative)
> **Design principle:** *Federation publishes exactly one deterministic digest per event (ADR-007, SRS FR-022).*
FPD/1 serializes as an ADR-003 BCF/1 map with canonical keys:
| Key | Type | Notes |
| --------------- | ---------- | --------------------------------------------------------------------- |
| `fpd.version` | UINT8 | MUST be `1`. |
| `fpd.members` | LIST<CID> | Deterministic, byte-lexicographic list of member artefact CIDs. |
| `fpd.parent` | CID (opt) | Previous FPD/1 digest for the domain publication chain (or `null`). |
| `fpd.timestamp` | UINT64 | Epoch seconds aligned with `fct.timestamp` monotonic ordering. |
| `fpd.digest` | CID | Canonical digest over `{FCT/1 bytes, FER/1 receipts, governance edges}`. |
**Construction:**
1. Normalize and sign the FCT/1 record (per §7.5) writing canonical bytes to the payload area (PA).
2. Collect referenced FER/1 receipts and governance edges (`certifies`, `attests`, `publishes`) as canonical byte arrays.
3. Build `fpd.members` as the byte-lexicographic list of CIDs for the certified FCT/1 record, every FER/1 receipt, and the edge batch capsule.
4. Hash the concatenated canonical payloads using the federation digest algorithm (default `CIDv1/BCF`). Persist the resulting bytes and record the CID in `fpd.digest`.
5. If a prior publication exists, set `fpd.parent` to the previous digest CID; otherwise omit.
6. Emit the FPD/1 map, persist alongside the FCT/1 payload under `/logs/ph03/evidence/fct/`, and update `fct.publication` with the FPD/1 CID.
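
Steps 3 and 4 of the construction can be illustrated with the sketch below. It is hedged in two ways: SHA-256 stands in for the federation digest algorithm (the default `CIDv1/BCF` is not reproduced here), and the order in which payloads are concatenated is left to the caller because this section does not state it. Function names are hypothetical.

```python
import hashlib


def build_members(fct_cid: bytes, fer_cids: list[bytes],
                  edge_capsule_cid: bytes) -> list[bytes]:
    """Step 3: byte-lexicographic member list; duplicates are rejected."""
    members = [fct_cid, *fer_cids, edge_capsule_cid]
    if len(set(members)) != len(members):
        raise ValueError("ERR_FPD_MEMBER_DUP")
    return sorted(members)  # Python compares bytes byte-lexicographically


def digest_payloads(payloads: list[bytes]) -> bytes:
    """Step 4 stand-in: SHA-256 over the concatenated canonical payloads."""
    h = hashlib.sha256()
    for payload in payloads:
        h.update(payload)
    return h.digest()
```

Recomputing `digest_payloads` over the persisted payloads and comparing against `fpd.digest` is the shape of the `ERR_FPD_DIGEST` check.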
**Validation:**
* `fpd.members` MUST include exactly one FCT/1 CID and the full set of FER/1 receipt CIDs referenced by that transaction.
* Recomputing the digest from the persisted canonical payloads MUST yield `fpd.digest`; mismatches ⇒ `ERR_FPD_DIGEST` (registered under ADR-006).
* `fpd.timestamp` MUST be ≥ the largest FER/1 `completed_at` and ≥ the prior `fpd.timestamp` when `fpd.parent` is present ⇒ violations raise `ERR_FPD_TIMESTAMP`.
* Graph emitters MUST log governance edges via `lib/g1-emitter/` using the canonical digests referenced above.
> **Graph note:** Publication surfaces emit `publishes(fct,fpd)` edges binding certification state to digest lineage for PH04 FLS/1 integration.
### 7.7 Error Surface Registration (consolidated)
All FCS/1, PCB1, FER/1, and FCT/1 errors map to ADR-006.
Additions since v0.3.0:
| Code | Meaning |
| --------------------- | -------------------------------------------------------------------------------------- |
| `ERR_FCS_UNKNOWN_TAG` | Descriptor contained a tag outside the v1-min set (`0x30-0x32`). Rejected per ADR-006. |
| `ERR_EXEC_TIMEOUT` | Executor exceeded deterministic time envelope (Maat's Balance). |
| `ERR_IMPL_PARITY` | Executor outputs/parity metadata diverged (missing executor, mismatched `output_cid`). |
| `ERR_IMPL_PARITY_ORDER` | Parity vector ordering did not match the canonical executor ordering. |
| `ERR_FER_UNKNOWN_TAG` | FER/1 payload contained an unknown tag or cardinality violation. |
| `ERR_FER_INPUT_MANIFEST_SHAPE` | `input_manifest` failed GS/1 set decoding (not deduped or unsorted). |
| `ERR_FER_RNG_REQUIRED` | `determinism_level` demanded an `rng_seed` but none was provided. |
| `ERR_FPD_DIGEST` | Recomputed federation digest did not match `fpd.digest` (non-deterministic publication). |
| `ERR_FPD_TIMESTAMP` | Publication timestamp regressed relative to receipts or parent digest. |
| `ERR_FPD_PARENT_REQUIRED` | Policy-enforced lineage expected `fpd.parent` but none was provided. |
| `ERR_FPD_MEMBER_DUP` | Duplicate member CID detected in the canonical set ordering. |
| `ERR_WT_UNKNOWN_KEY` | WT/1 map contained a key outside the v1-min schema. |
| `ERR_WT_VERSION_UNSUPPORTED` | `wt.version` not equal to `1`. |
| `ERR_WT_INTENT_EMPTY` | `wt.intent` list empty. |
| `ERR_WT_INTENT_DUP` | Duplicate ADR-010 intents detected in `wt.intent`. |
| `ERR_WT_TIMESTAMP` | `wt.timestamp` regressed relative to the previous ticket from the same author. |
| `ERR_WT_SIGNATURE` | Signature validation over `"AMDUAT:WT\0"` failed. |
| `ERR_WT_KEY_UNBOUND` | Declared `wt.pubkey` is not authorized for `wt.author` via the predicate registry. |
| `ERR_WT_INTENT_UNREGISTERED` | `wt.intent` entry not registered in ADR-010 predicate registry. |
| `ERR_WT_SCOPE_UNAUTHORIZED` | Router policy rejected the declared domain scope. |
| `ERR_WT_PARENT_UNKNOWN` | Optional `wt.parent` reference could not be resolved. |
| `ERR_WT_PARENT_REQUIRED` | Policy required `wt.parent` but the field was omitted. |
| `ERR_SOS_UNKNOWN_KEY` | SOS/1 map contained a key outside the v1-min schema. |
| `ERR_SOS_VERSION_UNSUPPORTED` | `sos.version` not equal to `1`. |
| `ERR_SOS_PREDICATE_UNREGISTERED` | Overlay predicate not registered in the CRS predicate registry. |
| `ERR_SOS_POLICY_INCOMPATIBLE` | `sos.policy` outside `{0,1,2}` or disallowed for the deployment lane. |
| `ERR_SOS_SIGNATURE_INVALID` | Signature validation over `"AMDUAT:SOS\0"` failed. |
| `ERR_SOS_COMPAT_EVIDENCE_REQUIRED` | Compat overlays missing MPR/1 + IER/1 references. |
| `ERR_SOS_TIMESTAMP_REGRESSION` | Overlay timestamp regressed relative to policy baseline. |
### 7.8 FLS/1 and CRS/1 Byte Semantics
Phase 04 establishes deterministic linkage between FLS/1 envelopes and CRS/1 concept graphs. ADR-018 governs the linkage envelope; ADR-020 governs concept and relation payloads. CI harnesses (`tools/ci/run_vectors.py`, `tools/ci/gs_snapshot.py`) provide conformance evidence.
#### 7.8.1 FLS/1 Envelope TLVs (Draft)
> **Scope:** Draft wire image aligned with ADR-018 v0.5.0. Stewardship will finalize signature semantics alongside multi-surface publication work.
| Tag | Field | Type | Card. | Notes |
| ------ | -------------------- | ------ | ----- | ----- |
| `0x60` | `source_cid` | CID | 1 | Deterministic sender artefact/surface. |
| `0x61` | `target_cid` | CID | 1 | Deterministic recipient artefact/surface. |
| `0x62` | `payload_cid` | CID | 1 | Content payload (COR/1 capsule, CRS/1 concept, or CRR/1 relation). |
| `0x63` | `routing_policy_cid` | CID | 0-1 | Optional deterministic policy capsule. |
| `0x64` | `timestamp` | UINT64 | 0-1 | Optional bounded timing evidence (big-endian). |
| `0x65` | `signature` | BYTES | 0-1 | Optional Ed25519 signature with `"AMDUAT:FLS\0"` domain separator. |
**Envelope rules (draft):**
* Header MUST present `MAGIC="FLS1"`, `VERSION=0x01`, and zeroed `FLAGS/RSV` bytes.
* TLVs MUST appear in strictly increasing tag order. Duplicate tags ⇒ `ERR_FLS_DUPLICATE_TAG`; reordering ⇒ `ERR_FLS_TAG_ORDER`.
* Unknown tags are rejected until ADR updates extend this table (`ERR_FLS_UNKNOWN_TAG`).
* CID TLVs MUST present 32-byte payloads aligned with ADR-001 ⇒ `ERR_FLS_CID_LENGTH`.
* `timestamp` MUST be exactly eight bytes (UINT64, network byte order) ⇒ `ERR_FLS_TIMESTAMP_LENGTH`.
* `signature` MUST start with `"AMDUAT:FLS\0"` and carry a 64-byte Ed25519 signature ⇒ `ERR_FLS_SIGNATURE_DOMAIN` / `ERR_FLS_SIGNATURE_LENGTH`; failing Ed25519 verification raises `ERR_FLS_SIGNATURE`.
* When supplied, CRS payload bytes MUST hash to the declared `payload_cid` using `SHA-256("CAS:OBJ\0" || payload)` ⇒ `ERR_FLS_PAYLOAD_CID_MISMATCH`.
* CRS payload headers MUST match `CRS1` (concept) or `CRR1` (relation) when linkage metadata declares the type ⇒ `ERR_FLS_PAYLOAD_KIND`.
* Payloads MAY be CRS/1 concepts or CRR/1 relations; FLS/1 envelopes never mutate CRS graphs.
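
The TLV-level envelope rules can be sketched as a check over already-decoded `(tag, value)` pairs. The wire framing of FLS/1 TLVs and the header check are deliberately left out because this draft does not pin them down here; the function name `check_fls_tlvs` is hypothetical, and required-tag cardinality and Ed25519 verification are not covered.

```python
from typing import Optional

FLS_DOMAIN = b"AMDUAT:FLS\0"
CID_TAGS = {0x60, 0x61, 0x62, 0x63}
KNOWN_TAGS = CID_TAGS | {0x64, 0x65}


def check_fls_tlvs(tlvs: list[tuple[int, bytes]]) -> Optional[str]:
    """Return the first error code hit, or None if the listed rules pass."""
    previous = -1
    for tag, value in tlvs:
        if tag not in KNOWN_TAGS:
            return "ERR_FLS_UNKNOWN_TAG"
        if tag == previous:
            return "ERR_FLS_DUPLICATE_TAG"  # adjacent repeat of the same tag
        if tag < previous:
            return "ERR_FLS_TAG_ORDER"      # also catches non-adjacent repeats
        previous = tag
        if tag in CID_TAGS and len(value) != 32:
            return "ERR_FLS_CID_LENGTH"
        if tag == 0x64 and len(value) != 8:
            return "ERR_FLS_TIMESTAMP_LENGTH"
        if tag == 0x65:
            if not value.startswith(FLS_DOMAIN):
                return "ERR_FLS_SIGNATURE_DOMAIN"
            if len(value) != len(FLS_DOMAIN) + 64:
                return "ERR_FLS_SIGNATURE_LENGTH"
    return None
```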
#### 7.8.2 CRS/1 Concept & Relation TLVs (Normative)
> **Scope:** Deterministic CRS/1 byte layout as ratified by ADR-020 v1.1.0. All TLVs
> use single-byte tags + single-byte lengths with fixed 32-byte payloads.
**Concept Header** — `MAGIC="CRS1"`, `VERSION=0x01`, `FLAGS=0x00`, `RSV=0x00`.
| Tag | Field | Type | Card. | Notes |
| ------ | ------------------ | ---- | ----- | ----- |
| `0x40` | `description_cid` | CID | 1 | Canonical COR/1/BCF descriptor for the concept text/essence. |
| `0x41` | `relations_cid` | CID | 1 | Deterministic list CID of outbound relation CIDs. |
**Relation Header** — `MAGIC="CRR1"`, `VERSION=0x01`, `FLAGS=0x00`, `RSV=0x00`.
| Tag | Field | Type | Card. | Notes |
| ------ | ----------------- | ---- | ----- | ----- |
| `0x42` | `source_cid` | CID | 1 | Originating Concept CID. |
| `0x43` | `target_cid` | CID | 1 | Destination Concept or artefact CID. |
| `0x44` | `predicate_cid` | CID | 1 | Registered predicate Concept CID. |
**Validation rules**
* Headers MUST match the values above; mismatches reject as malformed.
* TLVs MUST appear exactly once in the order listed. Missing or out-of-order
TLVs ⇒ `ERR_CRS_TAG_ORDER` (concept) or `ERR_CRR_TAG_ORDER` (relation).
* Duplicate relation tags ⇒ `ERR_CRR_DUPLICATE_TAG`.
* TLV payloads MUST be exactly 32 bytes ⇒ `ERR_CRS_LENGTH_MISMATCH` / `ERR_CRR_LENGTH_MISMATCH`.
* Unknown tags are rejected ⇒ `ERR_CRS_UNKNOWN_TAG` / `ERR_CRR_UNKNOWN_TAG`.
* `predicate_cid` MUST reference a CRS Concept (`ERR_CRR_PREDICATE_NOT_CONCEPT`). When a predicate taxonomy exists, predicates MUST declare `is_a → Predicate` (`ERR_CRR_PREDICATE_CLASS_MISSING`).
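
As an illustration of the concept-side rules, the sketch below parses the TLV body of a CRS/1 concept record (single-byte tag, single-byte length, 32-byte payload). It assumes the header has already been checked and stripped, since the exact header byte layout is not restated here. Mapping a truncated TLV to `ERR_CRS_LENGTH_MISMATCH` is a choice made for this example, and `parse_concept_body` is a hypothetical name.

```python
CONCEPT_TAGS = (0x40, 0x41)  # description_cid, relations_cid


def parse_concept_body(body: bytes) -> dict[int, bytes]:
    """Parse concept TLVs; raise ValueError carrying the error code."""
    fields: dict[int, bytes] = {}
    seen: list[int] = []
    offset = 0
    while offset < len(body):
        if offset + 2 > len(body):
            raise ValueError("ERR_CRS_LENGTH_MISMATCH")  # truncated tag/length
        tag, length = body[offset], body[offset + 1]
        if tag not in CONCEPT_TAGS:
            raise ValueError("ERR_CRS_UNKNOWN_TAG")
        value = body[offset + 2: offset + 2 + length]
        if length != 32 or len(value) != 32:
            raise ValueError("ERR_CRS_LENGTH_MISMATCH")
        seen.append(tag)
        fields[tag] = value
        offset += 2 + length
    # Missing, duplicated, or reordered TLVs all fail the exact-sequence check.
    if tuple(seen) != CONCEPT_TAGS:
        raise ValueError("ERR_CRS_TAG_ORDER")
    return fields
```

A relation parser would follow the same pattern with tags `0x42`, `0x43`, `0x44` and the `ERR_CRR_*` codes.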
**Error mapping (ADR-006)**
| Code | Condition |
| ---- | --------- |
| `ERR_CRS_TAG_ORDER` | Concept TLVs missing, duplicated, or out of order. |
| `ERR_CRS_LENGTH_MISMATCH` | Concept TLV payload not exactly 32 bytes. |
| `ERR_CRS_UNKNOWN_TAG` | Concept TLV tag outside `0x40-0x41`. |
| `ERR_CRR_TAG_ORDER` | Relation TLVs missing, duplicated, or out of order. |
| `ERR_CRR_LENGTH_MISMATCH` | Relation TLV payload not exactly 32 bytes. |
| `ERR_CRR_UNKNOWN_TAG` | Relation TLV tag outside `0x42-0x44`. |
| `ERR_CRR_DUPLICATE_TAG` | Duplicate relation TLV encountered. |
| `ERR_CRR_PREDICATE_NOT_CONCEPT` | `predicate_cid` did not resolve to a CRS Concept. |
| `ERR_CRR_PREDICATE_CLASS_MISSING` | Predicate Concept missing `is_a → Predicate` taxonomy edge. |
**CID derivation**
```
concept_cid = SHA-256("CAS:OBJ\0" || bytes(CRS/1 concept record))
relation_cid = SHA-256("CAS:OBJ\0" || bytes(CRR/1 relation record))
```
Byte-identical records MUST yield identical CIDs; any mutation requires a new
record.
### 7.9 WT/1 Audited Ticket Intake (Normative)
WT/1 (ADR-023) captures auditable intent-to-change tickets as an ADR-003 BCF/1
map. Keys are UTF-8 strings sorted lexicographically; values use canonical BCF
types.
| Key | Type | Cardinality | Notes |
| -------------- | ----------------- | ----------- | ----- |
| `wt.version` | UINT8 | 1 | MUST equal `1`. |
| `wt.author` | CID (hex string) | 1 | CRS Concept or DID capsule representing the submitting actor. |
| `wt.scope` | CID (hex string) | 1 | ADR-010B domain scope concept CID. |
| `wt.intent` | LIST<STRING> | 1 | Non-empty ADR-010 intent identifiers; deduped and byte-lexicographically sorted. |
| `wt.payload` | CID (hex string) | 1 | CRS manifest, change plan, or opaque payload describing proposed work. |
| `wt.timestamp` | UINT64 | 1 | Epoch seconds; MUST be monotonic per `wt.author`. |
| `wt.pubkey` | BYTES[32] | 1 | Ed25519 public key used to verify `wt.signature`; MUST bind to `wt.author`. |
| `wt.signature` | BYTES[64] | 1 | Ed25519 signature over `H("AMDUAT:WT\0" || canonical_bytes_without_signature)`. |
| `wt.parent` | CID (hex string) | 0-1 | Optional lineage pointer to the previous WT/1 ticket for the same author. |
**Encoding rules**
1. `wt.intent` MUST be encoded as a list of unique UTF-8 strings sorted
lexicographically; duplicates ⇒ `ERR_WT_INTENT_DUP`; entries not registered in
ADR-010 ⇒ `ERR_WT_INTENT_UNREGISTERED`.
2. CIDs serialize as lowercase hex strings (32 bytes → 64 hex chars) matching
`SHA-256("CAS:OBJ\0" || payload)` outputs.
3. `wt.signature` is a 64-byte Ed25519 signature; `wt.pubkey` supplies the
32-byte verification key. The signature domain-separates with
`"AMDUAT:WT\0"` and excludes the `wt.signature` field from the canonical byte
stream hashed for verification.
**Validation**
1. Unknown keys ⇒ `ERR_WT_UNKNOWN_KEY`.
2. `wt.version != 1` ⇒ `ERR_WT_VERSION_UNSUPPORTED`.
3. Empty `wt.intent` ⇒ `ERR_WT_INTENT_EMPTY`.
4. `wt.timestamp` less than the prior accepted ticket for the same `wt.author` ⇒
`ERR_WT_TIMESTAMP`. When `wt.parent` is provided, its timestamp MUST NOT
exceed the child timestamp; violations ⇒ `ERR_WT_TIMESTAMP`.
5. Signature verification failure ⇒ `ERR_WT_SIGNATURE`.
6. Routers MUST verify `has_pubkey(wt.author, wt.pubkey)` (or registered
equivalent) ⇒ missing edge raises `ERR_WT_KEY_UNBOUND`.
7. Unknown ADR-010 intent ⇒ `ERR_WT_INTENT_UNREGISTERED`.
8. Router policy rejection of `wt.scope` ⇒ `ERR_WT_SCOPE_UNAUTHORIZED`.
9. Provided `wt.parent` that cannot be resolved ⇒ `ERR_WT_PARENT_UNKNOWN`.
10. Policy required lineage but omitted `wt.parent` ⇒ `ERR_WT_PARENT_REQUIRED`.
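
The shape-level checks (rules 1 to 3 and 7, plus the duplicate rule from the encoding section) might look like the sketch below. It works on a decoded map, skips signature, timestamp, key-binding, scope, and lineage checks, and does not enforce sort order because no dedicated error code is listed for it. The function name and the `registered_intents` parameter are hypothetical.

```python
from typing import Any, Optional

WT_KEYS = {"wt.version", "wt.author", "wt.scope", "wt.intent", "wt.payload",
           "wt.timestamp", "wt.pubkey", "wt.signature", "wt.parent"}


def validate_wt_shape(ticket: dict[str, Any],
                      registered_intents: set[str]) -> Optional[str]:
    """Return the first matching WT/1 error code, or None."""
    if set(ticket) - WT_KEYS:
        return "ERR_WT_UNKNOWN_KEY"
    if ticket.get("wt.version") != 1:
        return "ERR_WT_VERSION_UNSUPPORTED"
    intents = ticket.get("wt.intent", [])
    if not intents:
        return "ERR_WT_INTENT_EMPTY"
    if len(set(intents)) != len(intents):
        return "ERR_WT_INTENT_DUP"
    if any(intent not in registered_intents for intent in intents):
        return "ERR_WT_INTENT_UNREGISTERED"
    return None
```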
**Router integration**
* `POST /wt` (Protected Area) accepts WT/1 payloads, verifies signatures against
`wt.pubkey`, enforces ADR-010 intent membership, validates optional
`wt.parent` lineage, and rejects timestamp regressions.
* `GET /wt/:cid` returns canonical WT/1 bytes for replay.
* `GET /wt?after=<cid>&limit=<n>` paginates deterministically by CID
(byte-lexicographic). `after` is an exclusive bound; routers enforce
`1 ≤ limit ≤ Nmax` and MUST preserve stable replay windows.
* Responses MUST include canonical WT/1 bytes; no rewriting or reformatting is
permitted.
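
The pagination contract for `GET /wt?after=<cid>&limit=<n>` can be sketched as below: CIDs are ordered byte-lexicographically and `after` is an exclusive lower bound. The default `n_max=100` is a placeholder, since this section does not fix `Nmax`, and `paginate` is a hypothetical helper rather than router code.

```python
import bisect
from typing import Optional


def paginate(cids: list[bytes], after: Optional[bytes],
             limit: int, n_max: int = 100) -> list[bytes]:
    """Return up to `limit` CIDs strictly greater than `after`."""
    if not 1 <= limit <= n_max:
        raise ValueError("limit must satisfy 1 <= limit <= Nmax")
    ordered = sorted(set(cids))  # byte-lexicographic, duplicates collapsed
    # bisect_right skips `after` itself, which makes the bound exclusive.
    start = bisect.bisect_right(ordered, after) if after is not None else 0
    return ordered[start:start + limit]
```

Because the window depends only on the sorted CID set and the `after` value, replaying the same request against the same store contents returns the same page.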
**Evidence & vectors**
* `/amduat/logs/ph04/evidence/wt1/PH04-EV-WT-001/summary.md` — validator run linking
router behaviour to vectors.
* `/amduat/vectors/ph04/wt1/` — fixtures `TV-WT-001…009` covering success,
unknown key, signature failure, timestamp regression, key unbound, intent
unregistered, parent timestamp inversion, scope policy rejection, and
unresolved parent lineage.
### 7.10 CT/1 Header (Normative)
CT/1 headers serialize as ADR-003 BCF/1 maps with fixed key ordering. Keys and
types:
| Key | Type | Notes |
| --------------------- | -------- | ----- |
| `ct.version` | `UINT8` | MUST equal `1`. |
| `ct.rcs_version` | `UINT8` | RCS/1 core schema version; MUST equal `1`. |
| `ct.topology` | `CID` | CRS/1 topology or manifest CID. |
| `ct.ac` | `CID` | AC/1 descriptor CID (ADR-028). |
| `ct.dtf` | `CID` | DTF/1 policy CID (ADR-028). |
| `ct.determinism_level`| `UINT8` | `0` = D1 (bit-exact), `1` = D2 (numeric stable). |
| `ct.kernel_cfg` | `CID` | Opaque kernel/tolerance configuration manifest. |
| `ct.tick` | `UINT64` | Monotonically increasing replay sequence number. |
| `ct.signature` | `BYTES` | 64-byte Ed25519 signature payload. |
**Validation**
1. BCF decode failures ⇒ `ERR_CT_MALFORMED`.
2. Key set/order mismatches ⇒ `ERR_CT_UNKNOWN_KEY`.
3. `ct.version` or `ct.rcs_version` ≠ `1` ⇒ `ERR_CT_VERSION`.
4. `ct.determinism_level ∉ {0,1}` ⇒ `ERR_CT_DET_LEVEL`.
5. Non-canonical CID strings ⇒ `ERR_CT_CID`.
6. `ct.tick` outside `UINT64` range or non-monotone progression ⇒
`ERR_CT_FIELD_TYPE` / `ERR_CT_TICK`.
7. `ct.signature` length mismatch or Ed25519 verification failure ⇒
`ERR_CT_SIGNATURE`.
**Signature rules**
`ct.signature` signs `H("AMDUAT:CT\0" || canonical_bytes_without_signature)`. Public
keys are registered in the determinism catalogue (this section) and referenced by
`ct.kernel_cfg` as needed for tolerance disclosure.
**Evidence & vectors**
* `/amduat/tools/validate/ct1_validator.py` — validation helper covering CT/1,
AC/1, and DTF/1 schemas.
* `/amduat/vectors/ph05/ct1/` — fixtures `TV-CT1-001…004`, `TV-AC1-001…002`,
`TV-DTF1-001…002`.
* `/amduat/tools/ci/ct_replay.py` — replay harness producing
`/amduat/logs/ph05/evidence/ct1/PH05-EV-CT1-REPLAY-001/` (D1 parity + D2
tolerance runs).
### 7.11 SOS/1 Semantic Overlays (Normative)
SOS/1 (ADR-024) attaches typed overlays to CRS Concepts or Relations via an
ADR-003 BCF/1 map signed with the `"AMDUAT:SOS\0"` domain separator.
| Key | Type | Cardinality | Notes |
| -------------- | ------------ | ----------- | ----- |
| `sos.version` | UINT8 | 1 | MUST equal `1`. |
| `sos.subject` | CID (hex) | 1 | CRS Concept or Relation CID receiving the overlay. |
| `sos.predicate`| CID (hex) | 1 | Registered predicate concept describing overlay semantics. |
| `sos.value` | CID (hex) | 1 | Opaque payload (text capsule, BCF/1 manifest, etc.). |
| `sos.policy` | ENUM | 1 | `0=open`, `1=curated`, `2=compat`. |
| `sos.timestamp`| UINT64 | 1 | Epoch seconds when authored. |
| `sos.signature`| BYTES[64] | 1 | Ed25519 signature over `H("AMDUAT:SOS\0" || canonical_bytes_without_signature)`. |
**Validation**
1. Unknown keys ⇒ `ERR_SOS_UNKNOWN_KEY`.
2. `sos.version != 1` ⇒ `ERR_SOS_VERSION_UNSUPPORTED`.
3. `sos.predicate` MUST resolve to a registered CRS predicate ⇒
`ERR_SOS_PREDICATE_UNREGISTERED`.
4. `sos.policy` outside `{0,1,2}` or disallowed for deployment ⇒
`ERR_SOS_POLICY_INCOMPATIBLE`.
5. Epoch-second timestamps that regress relative to policy baseline MAY raise
`ERR_SOS_TIMESTAMP_REGRESSION`.
6. Signature verification failure ⇒ `ERR_SOS_SIGNATURE_INVALID`.
7. Compat overlays (`sos.policy = 2`) MUST reference MPR/1 + IER/1 artefacts in
certification evidence ⇒ missing references raise
`ERR_SOS_COMPAT_EVIDENCE_REQUIRED`.
**Router integration**
* `POST /sos` (Protected Area) validates predicate registry membership, policy
lane, timestamp discipline, and signatures.
* `GET /sos/:cid` returns canonical SOS/1 bytes for replay.
* `GET /sos?subject=<cid>&after=<cid?>&limit=<n>` paginates overlays
deterministically by CID with stable replay windows.
* Compat responses MUST surface referenced MPR/1 hashes and IER/1 fingerprints
for auditors.
**Evidence & vectors**
* `/amduat/logs/ph04/evidence/sos1/PH04-EV-SOS-001/summary.md` — validator run covering
`TV-SOS-001…006`.
* `/amduat/vectors/ph04/sos1/` — canonical overlay fixtures exercising success,
unregistered predicate, policy mismatch, signature failure, timestamp
regression, and compat evidence gaps.
### 7.12 MPR/1 Model Provenance (Normative)
MPR/1 (ADR-025 v1.0.0) captures canonical model fingerprint triples for compat
policy lanes.
| Key | Type | Cardinality | Notes |
| ------------------ | ------------ | ----------- | ----- |
| `mpr.version` | UINT8 | 1 | MUST equal `1`. |
| `mpr.model_hash` | HEX | 1 | Lowercase hex digest (≥64 chars) of model artefact. |
| `mpr.weights_hash` | HEX | 1 | Lowercase hex digest (≥64 chars) of weights bundle. |
| `mpr.tokenizer_hash` | HEX | 1 | Lowercase hex digest (≥64 chars) of tokenizer assets. |
| `mpr.build_info` | CID *(optional)* | 0..1 | Immutable build metadata capsule. |
| `mpr.signature` | BYTES[64] *(optional)* | 0..1 | Ed25519 signature over `"AMDUAT:MPR\0" || canonical_bytes_without_signature`. |
**Validation**
1. Unknown keys ⇒ `ERR_MPR_UNKNOWN_KEY`.
2. `mpr.version != 1` ⇒ `ERR_MPR_VERSION`.
3. Missing hash fields ⇒ `ERR_MPR_MISSING_FIELD`.
4. Hash fields not lowercase hex (≥64) ⇒ `ERR_MPR_HASH_FORMAT`; zero digests ⇒ `ERR_MPR_HASH_ZERO`.
5. `mpr.build_info` malformed ⇒ `ERR_MPR_BUILD_INFO`.
6. Signature verification failure ⇒ `ERR_MPR_SIGNATURE`.
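
Rules 3 and 4 (hash presence, format, and zero-digest rejection) are simple enough to sketch directly. The example treats the record as a decoded map; `check_mpr_hashes` is a hypothetical name, and the remaining rules are not covered.

```python
import re
from typing import Any, Optional

HEX_RE = re.compile(r"[0-9a-f]{64,}")  # lowercase hex, at least 64 chars
HASH_FIELDS = ("mpr.model_hash", "mpr.weights_hash", "mpr.tokenizer_hash")


def check_mpr_hashes(record: dict[str, Any]) -> Optional[str]:
    """Return the first matching MPR/1 hash error code, or None."""
    for key in HASH_FIELDS:
        if key not in record:
            return "ERR_MPR_MISSING_FIELD"
    for key in HASH_FIELDS:
        value = record[key]
        if not isinstance(value, str) or HEX_RE.fullmatch(value) is None:
            return "ERR_MPR_HASH_FORMAT"
        if set(value) == {"0"}:
            return "ERR_MPR_HASH_ZERO"
    return None
```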
**Evidence & vectors**
* `/amduat/logs/ph04/evidence/mpr1/PH04-EV-MPR-001/pass.jsonl` — validator harness (`python tools/ci/run_mpr_vectors.py`) covering `TV-MPR-001…003` with summary in `summary.md`.
* `/amduat/vectors/ph04/mpr1/` — fixtures exercising valid record, missing weights hash, and signature domain mismatch.
### 7.13 IER/1 Inference Evidence (Normative)
IER/1 (ADR-026 v1.0.0) binds FER/1 receipts to compat policy envelopes and MPR/1 fingerprints.
| Key | Type | Cardinality | Notes |
| ------------------------ | --------------- | ----------- | ----- |
| `ier.version` | UINT8 | 1 | MUST equal `1`. |
| `ier.fer_cid` | CID | 1 | Referenced FER/1 receipt. |
| `ier.executor_fingerprint` | CID | 1 | MUST equal linked MPR/1 CID. |
| `ier.determinism_level` | ENUM | 1 | FER/1 determinism indicator. |
| `ier.rng_seed` | HEX *(conditional)* | 0..1 | Required (hex) when determinism ≠ `D1`. |
| `ier.policy_cid` | CID | 1 | Compat policy capsule authorising run. |
| `ier.log_digest` | HEX | 1 | `H("AMDUAT:IER:LOG\0" || concat(log.sha256))`. |
| `ier.log_manifest` | MAP *(optional)* | 0..1 | Non-empty list of log entries with `sha256`. |
| `ier.attestations` | LIST<BYTES> *(optional)* | 0..1 | Policy attestations (Ed25519 signatures). |
**Validation**
1. Unknown keys ⇒ `ERR_IER_UNKNOWN_KEY`.
2. `ier.version != 1` ⇒ `ERR_IER_VERSION`.
3. Malformed CIDs ⇒ `ERR_IER_POLICY`.
4. `ier.executor_fingerprint` mismatch ⇒ `ERR_IER_FINGERPRINT`.
5. Missing RNG seed when determinism ≠ `D1` ⇒ `ERR_FER_RNG_REQUIRED`.
6. `ier.log_digest` mismatch or malformed manifest ⇒ `ERR_IER_LOG_HASH` / `ERR_IER_LOG_MANIFEST`.
7. Attestation payloads not raw bytes ⇒ `ERR_IER_MALFORMED`.
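
The `ier.log_digest` formula from the table can be sketched as follows, assuming `H` is SHA-256 and that `concat(log.sha256)` means the raw 32-byte digests joined in manifest order (both are assumptions, not confirmed here). The entry shape and function names are illustrative.

```python
import hashlib

IER_LOG_DOMAIN = b"AMDUAT:IER:LOG\0"


def ier_log_digest(log_entries: list[dict[str, str]]) -> str:
    """Hex digest of H(domain || concat(sha256 digests)), H assumed SHA-256."""
    h = hashlib.sha256(IER_LOG_DOMAIN)
    for entry in log_entries:
        h.update(bytes.fromhex(entry["sha256"]))  # raw digest bytes, in order
    return h.hexdigest()


def check_log_digest(declared: str, log_entries: list[dict[str, str]]) -> bool:
    """False corresponds to the ERR_IER_LOG_HASH condition."""
    return declared == ier_log_digest(log_entries)
```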
**Evidence & vectors**
* `/amduat/logs/ph04/evidence/ier1/PH04-EV-IER-001/pass.jsonl` — validator harness (`python tools/ci/run_ier_vectors.py`) covering `TV-IER-001…004` with manifest summary in `summary.md`.
* `/amduat/vectors/ph04/ier1/` — fixtures exercising success, missing RNG seed, fingerprint mismatch, and log digest mismatch.
---
## 8. Test Vectors & Conformance
### 8.1 COR/1 & ICD/1
* Payload → CID (algo `0x01`).
* COR/1 streams → CID and back (round-trip identity).
* ICD/1 → `instance_id`.
### 8.2 FCS/1 v1-min
* Positive: `{0x30,0x31,0x32}` only, strict order, valid PCB1, acyclic.
* Negative: any pre-v1-min (legacy) tags (`0x33-0x36`) ⇒ `ERR_FCS_UNKNOWN_TAG` per §7.2.
* Arity/PCB mismatch ⇒ `ERR_PCB_ARITY_MISMATCH`.
* Cycle ⇒ `ERR_FCS_CYCLE_DETECTED`.
### 8.3 FER/1
* Signed receipt with monotonic timestamps; verify signature, executor set ↔ parity alignment, and linkage to FCS/1.
* Negative: timestamp inversion ⇒ `ERR_FER_TIMESTAMP`; bad signature ⇒ `ERR_FER_SIGNATURE`.
* Negative: parity drift (mismatched executor keys or output digests) ⇒ `ERR_IMPL_PARITY`.
* Negative: unknown TLV tag/cardinality ⇒ `ERR_FER_UNKNOWN_TAG`.
### 8.4 FCT/1
* Multiple FER/1 receipts for same function; verify attestation coverage by policy.
* Negative: mismatched receipt function ⇒ `ERR_FCT_RECEIPT_MISMATCH`.
* Negative: missing attestation when policy ≠ Open ⇒ `ERR_FCT_ATTESTATION_REQUIRED`.
### 8.5 FPD/1
* Deterministic reconstruction of `fpd.digest` over `{FCT/1 bytes, FER/1 receipts, governance edge capsule}` on repeated runs.
* Negative: perturbation of member ordering ⇒ `ERR_FPD_DIGEST`.
* Negative: timestamp regression versus FER receipts or parent digest ⇒ `ERR_FPD_TIMESTAMP`.
**CI Requirements**
* Import/export **byte-identity** round-trip for COR/1/FCS/1/FER/1.
* Canonical TLV/BCF ordering across descriptors.
* Multi-platform reproducibility (≥3) including signature verification parity.
* Timing evidence captured per SRS FR-020 (deterministic envelope).
* Federation digest fixture verifies stable FPD/1 CID under `tools/ci/fct_publish_check.py`.
---
## 9. Security Considerations
* Domain separation strings MUST be exact.
* Hash **exact payload bytes**, never decoded structures.
* Canonical rejection prevents ambiguous encodings.
* Certification places policy/intent in signed FCT/1, not in execution recipes.
---
## 10. Change Management
* **Behavioural semantics are in SRS.**
* Changes here require ADR + CCP approval.
* Versioning follows semantic versioning of encodings.
* On approval, update IDX and SRS references accordingly.
---
## 11. ByteStore API & Persistence Discipline
ByteStore is the canonical persistence boundary layered over COR/1 and ICD/1.
Implementations **must** honour the behaviours in this section; deviations are
governed by ADR-030.
### 11.1 API Surface
| API | Signature | Behaviour | Error Surfaces (ADR-006) |
| -------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------- | ---------------------------------------------------- |
| `put` | `(payload: bytes) → cid_hex` | Persist raw payload under CID derived from `H("CAS:OBJ\0" || payload)`. | `ERR_POLICY_SIZE`, `ERR_IDENTITY_MISMATCH` |
| `put_stream` | `(chunks: Iterable[bytes]) → cid_hex` | Deterministic chunked ingest; concatenated bytes hash to the same CID as `put`. | `ERR_STREAM_ORDER`, `ERR_STREAM_TRUNCATED` |
| `import_cor` | `(envelope: bytes) → cid_hex` | Validate COR/1, enforce policy, persist canonical envelope without re-encoding. | `ERR_POLICY_SIZE`, COR/1 decoder errors |
| `export_cor` | `(cid_hex: str) → envelope` | Return stored COR/1 bytes; must match the original import byte-for-byte. | `ERR_STORE_MISSING`, `ERR_IDENTITY_MISMATCH` |
| `get` | `(cid_hex: str) → bytes` | Return stored bytes (payload or COR envelope) exactly as persisted. | `ERR_STORE_MISSING` |
| `stat` | `(cid_hex: str) → {present: bool, size: int}` | Probe object presence and payload/envelope size without mutating state. | `ERR_STORE_MISSING` (absence reported via `present`) |
| `assert_area_isolation` | `(public_root: Path, secure_root: Path) → None` | Enforce SA/PA separation; raise if roots overlap or share ancestry. | `ERR_AREA_VIOLATION` |
### 11.2 Deterministic Identity
Canonical identity is derived per COR/1/SRS:
```
cid = algo_id || H("CAS:OBJ\0" || payload)
```
`algo_id` defaults to `0x01` (SHA-256). ByteStore **must** reuse the exact
domain separator and hash to remain compatible with CAS and DDS §1.
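
A minimal sketch of this derivation is shown below. It assumes `algo_id` is encoded as a single leading byte, which this section implies but does not state outright; `derive_cid` is a hypothetical helper name.

```python
import hashlib

CAS_DOMAIN = b"CAS:OBJ\0"  # exact domain separator from the formula above
ALGO_SHA256 = 0x01


def derive_cid(payload: bytes, algo_id: int = ALGO_SHA256) -> bytes:
    """Return algo_id || SHA-256(domain || payload)."""
    if algo_id != ALGO_SHA256:
        raise ValueError("only algo_id 0x01 (SHA-256) is sketched here")
    # algo_id is ASSUMED to occupy one byte in front of the digest.
    return bytes([algo_id]) + hashlib.sha256(CAS_DOMAIN + payload).digest()
```

The hash covers the exact payload bytes, so two payloads that decode to the same structure but differ in encoding receive different CIDs.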
### 11.3 COR/1 Round-Trip Identity
`import_cor()` decodes the envelope, enforces policy (size ≤ ICD/1
`max_object_size`), and persists the canonical bytes. `export_cor()` returns the
exact stored envelope; re-encoding is forbidden. Derived CID **must** equal the
envelope's CID (DDS §2.5, SRS FR-BS-004).
### 11.4 Atomic fsync Ladder
All writes follow the deterministic ladder:
1. Write payload/envelope to a unique `.tmp-<suffix>` file in the shard.
2. `fsync(tmp)` to guarantee payload durability.
3. `rename(tmp, final)`.
4. `fsync(shard directory)` and then `fsync(ByteStore root)`.
Crash-window simulation is exposed via `AMDUAT_BYTESTORE_CRASH_STEP` (“before_rename”).
Implementations **must** honour the hook and leave PA consistent on recovery
(DDS §11.8; vectors TV-BS-005, evidence bundle PH05-EV-BS-001).
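
The ladder can be sketched in Python as below. This is an illustration under assumptions: it targets POSIX (directory `fsync` via `os.open` is not portable to Windows), it reads `AMDUAT_BYTESTORE_CRASH_STEP` as an environment variable, and `atomic_write` with its parameters is a hypothetical helper rather than the ByteStore implementation.

```python
import os
import tempfile
import uuid
from pathlib import Path


def _fsync_dir(path: Path) -> None:
    """fsync a directory so a rename inside it is durable (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def atomic_write(root: Path, shard: Path, name: str, data: bytes) -> Path:
    shard.mkdir(parents=True, exist_ok=True)
    tmp = shard / f".tmp-{uuid.uuid4().hex}"   # step 1: unique temp file
    with open(tmp, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())                  # step 2: payload durability
    # Crash-window hook (ASSUMED to be an environment variable).
    if os.environ.get("AMDUAT_BYTESTORE_CRASH_STEP") == "before_rename":
        raise RuntimeError("ERR_CRASH_SIMULATION")
    final = shard / name
    os.rename(tmp, final)                      # step 3: atomic publish
    _fsync_dir(shard)                          # step 4: shard directory...
    _fsync_dir(root)                           # ...then the ByteStore root
    return final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        written = atomic_write(root, root / "aa" / "bb", "cid", b"payload")
        print(written.read_bytes())
```

Syncing the file before the rename and the directories after it means a crash leaves either no final entry or a complete one, never a partially written object under its final name.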
### 11.5 SA/PA Isolation & Pathing
Public area (PA) payloads live under case-stable two-level fan-out (`/aa/bb/cid…`).
Secure area (SA) metadata is held outside the PA tree. `assert_area_isolation()`
enforces:
* `public_root != secure_root`
* neither root is an ancestor of the other
Violations raise `ERR_AREA_VIOLATION` and **must** be surfaced by callers.
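
The two conditions above map onto a short `pathlib` check. The sketch below resolves both roots first so that relative paths and symlinks do not hide an overlap; the exception class `AreaViolation` is a hypothetical stand-in for however an implementation surfaces `ERR_AREA_VIOLATION`.

```python
from pathlib import Path


class AreaViolation(Exception):
    """Stand-in for the ERR_AREA_VIOLATION error surface."""


def assert_area_isolation(public_root: Path, secure_root: Path) -> None:
    """Raise if the roots are equal or one is an ancestor of the other."""
    pub = public_root.resolve()
    sec = secure_root.resolve()
    if pub == sec or pub in sec.parents or sec in pub.parents:
        raise AreaViolation("ERR_AREA_VIOLATION")
```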
### 11.6 Chunked Ingest Determinism & Policy
`put_stream()` concatenates byte chunks in order, rejecting non-bytes input or
missing data. The resulting CID **must** equal `put(payload)` for the same
payload (SRS FR-BS-005). ByteStore enforces ICD/1 `max_object_size` prior to
persisting data; exceeding the limit raises `ERR_POLICY_SIZE`.
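
The equivalence between `put` and `put_stream` follows from hashing incrementally, as the sketch below shows. It returns only the hex of the hash portion (the `algo_id` prefix is left out for brevity), takes `max_object_size` as a plain parameter rather than reading ICD/1, and uses hypothetical function names.

```python
import hashlib
from typing import Iterable

CAS_DOMAIN = b"CAS:OBJ\0"


def put_cid(payload: bytes) -> str:
    """Hash portion of the CID for a whole payload."""
    return hashlib.sha256(CAS_DOMAIN + payload).hexdigest()


def put_stream_cid(chunks: Iterable[bytes], max_object_size: int) -> str:
    """Hash chunks in order; enforce the size policy before accepting."""
    h = hashlib.sha256(CAS_DOMAIN)
    total = 0
    for chunk in chunks:
        if not isinstance(chunk, (bytes, bytearray)):
            raise ValueError("ERR_STREAM_ORDER")      # non-bytes input
        total += len(chunk)
        if total > max_object_size:
            raise ValueError("ERR_POLICY_SIZE")
        h.update(chunk)
    if total == 0:
        raise ValueError("ERR_STREAM_TRUNCATED")      # stream without payload
    return h.hexdigest()
```

Chunk boundaries do not affect the result: `[b"ab", b"cd"]` and `[b"abcd"]` hash to the same value as `put_cid(b"abcd")`.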
### 11.7 Error Mapping
| Condition | Error Code | Notes |
| ---------------------------------- | --------------------- | -------------------------------------------------------------- |
| Payload exceeds policy limit | `ERR_POLICY_SIZE` | ICD/1 `max_object_size` (ADR-006 policy lane). |
| Streaming chunk type/order invalid | `ERR_STREAM_ORDER` | Non-bytes or out-of-order chunks (deterministic rejection). |
| Streaming missing payload | `ERR_STREAM_TRUNCATED`| Zero-length stream without payload. |
| Stored bytes mismatch CID | `ERR_IDENTITY_MISMATCH` | Raised when existing bytes conflict with derived identity. |
| SA/PA overlap | `ERR_AREA_VIOLATION` | Shared roots or ancestry (secure/public crossing). |
| Crash-window hook triggered | `ERR_CRASH_SIMULATION`| Simulated crash prior to rename/fsync ladder completion. |
| Missing object | `ERR_STORE_MISSING` | Reported when an object path is absent. |
All other errors bubble from COR/1 decoding and map to existing ADR-006 codes
(see §2.7).
### 11.8 Conformance & Evidence
* Vectors: `/amduat/vectors/ph05/bytestore/` (`TV-BS-001…005`).
* Runner: `/amduat/tools/ci/bs_check.py` (dual-run determinism; emits JSONL).
* Evidence: `/amduat/logs/ph05/evidence/bytestore/PH05-EV-BS-001/` (runA/runB +
crash summary).
* Linked ADR: ADR-030 (ByteStore Persistence Contract).
---
## Appendix A — Surface Version Table
| Surface | Version | Notes |
| ------- | ------- | ----- |
| FCS/1 | v1-min | Execution-only descriptor (ADR-016); governance fields live in FCT/1. |
| FER/1 | v1.1 | Parity-first receipts with run_id dedup, executor fingerprints, typed logs, RNG envelope (ADR-017). |
| FCT/1 | v1.0 | Certification transactions binding policy/intent/attestations; publishes FER/1 receipts. |
| FPD/1 | v1.0 | Single-digest publication capsule linking FCT/1 and FER/1 sets. |
---
**End of DDS 0.5.0**
---
## Document History
* 0.2.1 (2025-10-26) — Updated Phase Pack references; byte semantics unchanged; ADR-012 no-normalization.
* 0.2.2 (2025-10-26) — Promoted PH01 design surfaces to Approved; synchronized anchors.
* 0.2.3 (2025-10-27) — Marked DDS scope as PH01-only and referenced FPS/1 surfaces.
* **0.2.4 (2025-11-14):** Added FCS/1 & PCB1 TLVs plus FER/1 receipt and FCT/1 transaction schemas with rejection mapping.
* **0.2.5 (2025-11-15):** Registered PCB1 header invariants and arity/cycle validation errors.
* **0.2.6 (2025-11-19):** Registered `ERR_EXEC_TIMEOUT` for deterministic timing envelope.
* **0.3.0 (2025-11-02):** Trimmed **FCS/1 to v1-min** (execution recipe only: `function_ptr`, `parameter_block`, `arity`). Moved **intent/roles/scope/policy** to **FCT/1**; clarified provenance lives in **FER/1**. Added rejection guidance for legacy FCS tags.
* **0.3.1 (2025-11-20):** Registered `ERR_FCS_UNKNOWN_TAG`; clarified that any legacy governance tag in FCS/1 is a hard rejection. No other layout changes.
* **0.3.2 (2025-11-21):** Adopted parity-first FER/1 TLVs (executor set, parity vector, context/witness hooks), registered `ERR_IMPL_PARITY` and `ERR_FER_UNKNOWN_TAG`, and refreshed conformance guidance.
* **0.3.3 (2025-11-22):** Added FPD/1 publication digest schema, registered federation digest/timestamp errors, and wired CI fixtures to deterministic publish checks.
* **0.3.5 (2025-11-07):** Added surface version table and aligned FER/1 v1.1 maintenance metadata for Phase 04 handoff.
* **0.3.6 (2025-11-08):** Seeded PH04 linkage & semantic placeholder section (DDS §7.8).
* **0.3.7 (2025-11-08):** Seeded FLS/1 placeholder TLV table aligned with ADR-018 v0.3.0.
* **0.3.8 (2025-11-08):** Registered FLS/1 TLV registry (0x60-0x65), error mapping, and conformance vectors aligned with ADR-018 v0.4.0.
* **0.3.9 (2025-11-09):** Locked CRS/1 concept/relation TLVs and registered FLS payload CID/type errors with conformance evidence.
* **0.4.0 (2025-11-08):** Promoted §7.8 FLS/1 & CRS/1 TLVs with error mapping and GS/1 snapshot evidence.
* **0.4.1 (2025-11-09):** Extended CRS predicate rules and mapped new validation errors.
* **0.4.2 (2025-11-09):** Registered router error codes (`ERR_FLS_UNKNOWN_TAG`, `ERR_FLS_TAG_ORDER`, `ERR_FLS_SIGNATURE`) and FPD parent-policy errors with GS diff evidence pointer.
* **0.4.3 (2025-11-09):** Added WT/1 intake layout, validation errors, and router API integration (§7.9).
* **0.4.4 (2025-11-20):** Refined WT/1 (§7.9) with `wt.pubkey`, signature preimage exclusion, lineage/policy errors, and expanded validator vector coverage.
* **0.4.5 (2025-11-21):** Registered SOS/1 overlays (§7.10) with compat evidence enforcement, aligned WT/1 error mapping (`ERR_WT_KEY_UNBOUND`, `ERR_WT_INTENT_UNREGISTERED`, `ERR_WT_PARENT_REQUIRED`), and expanded vector coverage to `TV-WT-001…009`.
* **0.4.6 (2025-11-22):** WT/1 and SOS/1 conformance evidence sealed via PH04-M4/M5 audit bundles.
* **0.4.7 (2025-11-23):** Documented MPR/1 and IER/1 schemas, error surfaces, and validator evidence for compat policy lane.
* **0.4.8 (2025-11-24):** Added §7.10 CT/1 header schema with error codes and renumbered downstream sections for PH05 replay.
* **0.5.0 (2025-11-11):** Added §11 ByteStore API & Persistence discipline covering API surface, fsync ladder, SA/PA isolation, streaming determinism, and ADR-006 error mapping.


@ -0,0 +1,357 @@
# ENC/ASL-CORE-INDEX/1 — Encoding Specification for ASL Core Index
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [encoding, index, deterministic]
<!-- Source: /amduat-api/tier1/enc-asl-core-index.md | Canonical: /amduat/tier1/enc-asl-core-index-1.md -->
**Document ID:** `ENC/ASL-CORE-INDEX/1`
**Layer:** Index Encoding Profile (on top of ASL/1-CORE-INDEX + ASL/STORE-INDEX/1)
**Depends on (normative):**
* `ASL/1-CORE-INDEX` — semantic index model
* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts
**Informative references:**
* `ASL/LOG/1` — append-only log semantics
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 1. Purpose
This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.
It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
Variable-length digest requirements are defined in ASL/1-CORE-INDEX (`tier1/asl-core-index.md`).
This document incorporates the federation encoding addendum.
It is intended for:
* C libraries
* Tools
* API frontends
* Memory-mapped access
It does **not** define:
* Index semantics (see ASL/1-CORE-INDEX)
* Store lifecycle behavior (see ASL-STORE-INDEX)
* Acceleration semantics (see ASL/INDEX-ACCEL/1)
* TGK edge semantics or encodings (see `TGK/1` and `TGK/1-CORE`)
* Federation semantics (see federation/domain policy layers)
---
## 2. Encoding Principles
1. **Little-endian** representation
2. **Fixed-width fields** for deterministic access
3. **No pointers or references**; all offsets are file-relative
4. **Packed structures**; no compiler-introduced padding
5. **Forward compatibility** via version field
6. **CRC or checksum protection** for corruption detection
7. **Federation metadata** embedded in index records for deterministic cross-domain replay
All multi-byte integers are little-endian unless explicitly noted.
---
## 3. Segment Layout
Each index segment file is laid out as follows:
```
+------------------+
| SegmentHeader |
+------------------+
| BloomFilter[] | (optional, opaque to semantics)
+------------------+
| IndexRecord[] |
+------------------+
| DigestBytes[] |
+------------------+
| ExtentRecord[] |
+------------------+
| SegmentFooter |
+------------------+
```
* **SegmentHeader**: fixed-size, mandatory
* **BloomFilter**: optional, opaque, segment-local
* **IndexRecord[]**: array of index entries
* **DigestBytes[]**: concatenated digest bytes referenced by IndexRecord
* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
* **SegmentFooter**: fixed-size, mandatory
Offsets in the header define the locations of the Bloom filter, index records, digest bytes, and extent records.
### 3.1 Fixed Constants and Sizes
**Magic bytes (SegmentHeader.magic):** `ASLIDX03`
* ASCII bytes: `0x41 0x53 0x4c 0x49 0x44 0x58 0x30 0x33`
* Little-endian uint64 value: `0x33305844494c5341`
**Current encoding version:** `3`
**Fixed struct sizes (bytes):**
* `SegmentHeader`: 112
* `IndexRecord`: 48
* `ExtentRecord`: 16
* `SegmentFooter`: 24
**Section packing (no gaps):**
* `records_offset = header_size + bloom_size`
* `digests_offset = records_offset + (record_count * sizeof(IndexRecord))`
* `extents_offset = digests_offset + digests_size`
* `SegmentFooter` starts at `extents_offset + (extent_count * sizeof(ExtentRecord))`
All offsets MUST be file-relative, 8-byte aligned, and point to their respective arrays exactly as above.
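The packing rules above can be checked mechanically. The following is a minimal sketch, not a reference implementation: the names `seg_layout_t` and `seg_compute_layout` are hypothetical, only the sizes and formulas come from this section, and overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SEG_HEADER_SIZE    = 112, /* sizeof(SegmentHeader), section 3.1 */
    IDX_RECORD_SIZE    = 48,  /* sizeof(IndexRecord) */
    EXTENT_RECORD_SIZE = 16   /* sizeof(ExtentRecord) */
};

/* Hypothetical helper type: offsets implied by the gap-free packing rules. */
typedef struct {
    uint64_t records_offset;
    uint64_t digests_offset;
    uint64_t extents_offset;
    uint64_t footer_offset;
} seg_layout_t;

/* Computes the packed layout. Returns false when one of the header
 * offsets would break the 8-byte alignment rule. */
static bool seg_compute_layout(uint64_t bloom_size, uint64_t record_count,
                               uint64_t digests_size, uint64_t extent_count,
                               seg_layout_t *out)
{
    assert(out != NULL);
    out->records_offset = SEG_HEADER_SIZE + bloom_size;
    out->digests_offset = out->records_offset + record_count * IDX_RECORD_SIZE;
    out->extents_offset = out->digests_offset + digests_size;
    out->footer_offset  = out->extents_offset + extent_count * EXTENT_RECORD_SIZE;
    return out->records_offset % 8 == 0 &&
           out->digests_offset % 8 == 0 &&
           out->extents_offset % 8 == 0;
}
```

For example, a segment with no Bloom filter, two records, 64 digest bytes, and three extents places its records at 112, digests at 208, extents at 272, and footer at 320.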
### 3.2 Federation Defaults
This encoding integrates federation metadata into segments and records.
Legacy segments without federation fields MUST be treated as:
* `segment_domain_id = local`
* `segment_visibility = internal`
* `domain_id = local`
* `visibility = internal`
* `has_cross_domain_source = 0`
* `cross_domain_source = 0`
---
## 4. SegmentHeader
```c
#pragma pack(push,1)
typedef struct {
uint64_t magic; // Unique magic number identifying segment file type
uint16_t version; // Encoding version
uint16_t shard_id; // Optional shard identifier
uint32_t header_size; // Total size of header including fields below
uint64_t snapshot_min; // Minimum snapshot ID for which segment entries are valid
uint64_t snapshot_max; // Maximum snapshot ID
uint64_t record_count; // Number of index entries
uint64_t records_offset; // File offset of IndexRecord array
uint64_t bloom_offset; // File offset of bloom filter (0 if none)
uint64_t bloom_size; // Size of bloom filter (0 if none)
uint64_t digests_offset; // File offset of DigestBytes array
uint64_t digests_size; // Total size in bytes of DigestBytes
uint64_t extents_offset; // File offset of ExtentRecord array
uint64_t extent_count; // Total number of ExtentRecord entries
uint32_t segment_domain_id; // Domain owning this segment
uint8_t segment_visibility; // 0 = internal, 1 = published
uint8_t federation_version; // 0 if unused
uint16_t reserved0; // Reserved (must be 0)
uint64_t flags; // Segment flags (must be 0 in version 3)
} SegmentHeader;
#pragma pack(pop)
```
**Notes:**
* `magic` ensures the reader validates the segment type.
* `version` allows forward-compatible extension.
* `snapshot_min` / `snapshot_max` are reserved for future use and carry no visibility semantics in version 3.
* `segment_domain_id` identifies the owning domain for all records in this segment.
* `segment_visibility` MUST be the maximum visibility of all records in the segment.
* `federation_version` MUST be `0` unless a future federation encoding version is defined.
* `reserved0` MUST be `0`.
* `header_size` MUST be `112`.
* `flags` MUST be `0`. Readers MUST reject non-zero values.
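A reader might apply the fixed-field rules above before trusting any offset. This sketch works on raw bytes so that it does not depend on host byte order; the function name `seg_header_is_valid_v3` is hypothetical, and the byte positions are derived from the packed struct layout shown in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian loaders that work on any host byte order. */
static uint16_t load_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t load_le64(const uint8_t *p) {
    return (uint64_t)load_le32(p) | ((uint64_t)load_le32(p + 4) << 32);
}

/* Hypothetical check of the version 3 fixed fields in a raw header. */
static bool seg_header_is_valid_v3(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 112) return false;
    if (memcmp(buf, "ASLIDX03", 8) != 0) return false; /* magic, bytes 0..7 */
    if (load_le16(buf + 8) != 3) return false;          /* version */
    if (load_le32(buf + 12) != 112) return false;       /* header_size */
    if (buf[101] != 0) return false;                    /* federation_version */
    if (load_le16(buf + 102) != 0) return false;        /* reserved0 */
    if (load_le64(buf + 104) != 0) return false;        /* flags */
    return true;
}
```

Offset and count fields are deliberately left out here; they would be validated against the packing rules in section 3.1.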
---
## 5. IndexRecord
```c
#pragma pack(push,1)
typedef struct {
uint32_t hash_id; // Hash algorithm identifier
uint16_t digest_len; // Digest length in bytes
uint16_t reserved0; // Reserved for alignment/future use
uint64_t digest_offset; // File offset of digest bytes for this entry
uint64_t extents_offset; // File offset of first ExtentRecord for this entry
uint32_t extent_count; // Number of ExtentRecord entries for this artifact
uint32_t total_length; // Total artifact length in bytes
uint32_t domain_id; // Domain identifier for this artifact
uint8_t visibility; // 0 = internal, 1 = published
uint8_t has_cross_domain_source; // 0 or 1
uint16_t reserved1; // Reserved (must be 0)
uint32_t cross_domain_source; // Source domain if imported (valid if has_cross_domain_source=1)
uint32_t flags; // Optional flags (tombstone, reserved, etc.)
} IndexRecord;
#pragma pack(pop)
```
**Notes:**
* `hash_id` + `digest_len` + `digest_offset` store the artifact key deterministically.
* `digest_len` MUST be explicit in the encoding and MUST match the length implied by `hash_id` and StoreConfig.
* `digest_offset` MUST be within `[digests_offset, digests_offset + digests_size)`.
* `extents_offset` references the first ExtentRecord for this entry.
* `extent_count` defines how many extents to read (may be 0 for tombstones; see ASL/1-CORE-INDEX in `tier1/asl-core-index.md`).
* `total_length` is the exact artifact size in bytes.
* `flags` carries per-record status bits; the only bit defined in version 3 is the tombstone flag (see 5.1).
* `domain_id` MUST be present and stable across replay.
* `visibility` MUST be `0` or `1`.
* `has_cross_domain_source` MUST be `0` or `1`.
* `cross_domain_source` MUST be `0` when `has_cross_domain_source=0`.
* `reserved0` and `reserved1` MUST be `0`.
### 5.1 IndexRecord Flags
```
IDX_FLAG_TOMBSTONE = 0x00000001
```
* If `IDX_FLAG_TOMBSTONE` is set, then `extent_count`, `total_length`, and `extents_offset` MUST be `0`.
* All other bits are reserved and MUST be `0`. Readers MUST reject unknown flag bits.
* Tombstones MUST retain valid `domain_id` and `visibility` to ensure domain-local shadowing.
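The record-level rules from sections 5 and 5.1 can be combined into one check. This is a sketch under stated assumptions: `idx_record_is_valid` is a hypothetical name, the record is assumed to be already decoded into host byte order, and the non-tombstone `extent_count > 0` rule is taken from section 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IDX_FLAG_TOMBSTONE 0x00000001u

#pragma pack(push,1)
typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    uint16_t reserved0;
    uint64_t digest_offset;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t domain_id;
    uint8_t  visibility;
    uint8_t  has_cross_domain_source;
    uint16_t reserved1;
    uint32_t cross_domain_source;
    uint32_t flags;
} IndexRecord;
#pragma pack(pop)

/* Hypothetical validator for one decoded record. */
static bool idx_record_is_valid(const IndexRecord *r,
                                uint64_t digests_offset, uint64_t digests_size)
{
    assert(r != NULL);
    if (r->reserved0 != 0 || r->reserved1 != 0) return false;
    if (r->visibility > 1 || r->has_cross_domain_source > 1) return false;
    if (r->has_cross_domain_source == 0 && r->cross_domain_source != 0) return false;
    if (r->flags & ~IDX_FLAG_TOMBSTONE) return false; /* unknown flag bits */
    if (r->digest_offset < digests_offset ||
        r->digest_offset - digests_offset >= digests_size) return false;
    if (r->flags & IDX_FLAG_TOMBSTONE)
        return r->extent_count == 0 && r->total_length == 0 &&
               r->extents_offset == 0;
    return r->extent_count > 0; /* visible entries need at least one extent */
}
```

The check that `digest_len` matches the length implied by `hash_id` is omitted because the hash registry is defined elsewhere.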
---
## 6. ExtentRecord
```c
#pragma pack(push,1)
typedef struct {
uint64_t block_id; // ASL block identifier
uint32_t offset; // Offset within block
uint32_t length; // Length of this extent
} ExtentRecord;
#pragma pack(pop)
```
**Notes:**
* Extents are concatenated in order to produce artifact bytes.
* `extent_count` MUST be > 0 for visible (non-tombstone) entries.
* `total_length` MUST equal the sum of `length` across the extents.
* `offset` and `length` MUST describe a contiguous slice within the referenced block.
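The length rule above is a simple sum. A minimal sketch, assuming a hypothetical helper name `extents_match_total` and extents already decoded into host byte order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;
#pragma pack(pop)

/* Returns true when a non-empty extent list adds up to total_length. */
static bool extents_match_total(const ExtentRecord *ext, uint32_t extent_count,
                                uint32_t total_length)
{
    assert(ext != NULL || extent_count == 0);
    uint64_t sum = 0; /* 64-bit accumulator so 32-bit lengths cannot wrap */
    for (uint32_t i = 0; i < extent_count; i++)
        sum += ext[i].length;
    return extent_count > 0 && sum == total_length;
}
```

The 64-bit accumulator matters because many 32-bit extent lengths could otherwise wrap around and match `total_length` by accident.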
---
## 7. SegmentFooter
```c
#pragma pack(push,1)
typedef struct {
uint64_t crc64; // CRC over header + bloom filter + index records + digest bytes + extents
uint64_t seal_snapshot; // Snapshot ID when segment was sealed
uint64_t seal_time_ns; // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)
```
**Notes:**
* CRC ensures corruption detection during reads, covering all segment contents except the footer.
* Seal information allows deterministic reconstruction of CURRENT state.
---
## 8. DigestBytes
* Digest bytes are concatenated in a single byte array.
* Each IndexRecord references its digest via `digest_offset` and `digest_len`.
* The digest bytes MUST be immutable once the segment is sealed.
---
## 9. Bloom Filter
* The bloom filter is **optional** and opaque to semantics.
* Its purpose is **lookup acceleration**.
* Must be deterministic: same entries → same bloom representation.
* Segment-local only; no global assumptions.
---
## 10. Versioning and Compatibility
* `version` field in header defines encoding.
* Readers must **reject unsupported versions**.
* New fields may be added in future versions only via version bump.
* Existing fields must **never change meaning**.
* Version `1` implies single-extent layout (legacy).
* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
* Version `3` introduces variable-length digest bytes with `hash_id` and `digest_offset`.
* Version `3` also integrates federation metadata in segment headers and index records.
### 10.1 Federation Compatibility Rules
* Legacy segments without federation fields are treated as local/internal (see 3.2).
* Tombstones MUST NOT shadow artifacts from other domains; domain matching is required.
---
## 11. Alignment and Packing
* All structures are **packed** (no compiler padding)
* Multi-byte integers are **little-endian**
* Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
* Extents are accessed via `IndexRecord.extents_offset` relative to the file base.
---
## 12. Summary of Encoding Guarantees
The ENC-ASL-CORE-INDEX specification ensures:
1. **Deterministic layout** across platforms
2. **Direct mapping from semantic model** (ArtifactKey → ArtifactLocation)
3. **Immutability of sealed segments**
4. **Integrity validation** via CRC
5. **Forward-compatible extensibility**
---
## 13. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------- |
| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts |
| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |
This completes the stack: **semantics → store behavior → encoding**.

248
tier1/enc-asl-log-1.md Normal file

@ -0,0 +1,248 @@
# ENC/ASL-LOG/1 — Encoding Specification for ASL Append-Only Log
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [encoding, log, deterministic]
<!-- Source: /amduat-api/tier1/enc-asl-log.md | Canonical: /amduat/tier1/enc-asl-log-1.md -->
**Document ID:** `ENC/ASL-LOG/1`
**Layer:** Log Encoding Profile (on top of ASL/LOG/1)
**Depends on (normative):**
* `ASL/LOG/1` — semantic log behavior and replay rules
**Informative references:**
* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 1. Purpose
This document defines the **exact encoding** of the ASL append-only log.
It translates **ASL/LOG/1** semantics into a deterministic **bytes-on-disk** format.
It does **not** define log semantics (see `ASL/LOG/1`).
---
## 2. Encoding Principles
1. **Little-endian** integers
2. **Packed structures** (no compiler padding)
3. **Forward-compatible** versioning via header fields
4. **Deterministic serialization**: identical log content -> identical bytes
5. **Hash-chained integrity** as defined by ASL/LOG/1
---
## 3. Log File Layout
```
+----------------+
| LogHeader |
+----------------+
| LogRecord[] |
+----------------+
```
* **LogHeader**: fixed-size, mandatory, begins file
* **LogRecord[]**: append-only entries, variable number
---
## 4. LogHeader
```c
#pragma pack(push,1)
typedef struct {
uint64_t magic; // "ASLLOG01"
uint32_t version; // Encoding version (1)
uint32_t header_size; // Total header bytes including this struct
uint64_t flags; // Reserved, must be zero for v1
} LogHeader;
#pragma pack(pop)
```
Notes:
* `magic` is ASCII bytes: `0x41 0x53 0x4c 0x4c 0x4f 0x47 0x30 0x31`
* `version` allows forward compatibility
---
## 5. LogRecord Envelope
Each record is encoded as:
```c
#pragma pack(push,1)
typedef struct {
uint64_t logseq; // Monotonic sequence number
uint32_t record_type; // Record type tag
uint32_t payload_len; // Payload byte length
uint8_t payload[payload_len]; // Variable-length payload (layout notation, not compilable C)
uint8_t record_hash[32]; // Hash-chained integrity (SHA-256)
} LogRecord;
#pragma pack(pop)
```
Hash chain rule (normative):
```
record_hash = H(prev_record_hash || logseq || record_type || payload_len || payload)
```
* `prev_record_hash` is the previous record's `record_hash`
* For the first record, `prev_record_hash` is 32 bytes of zero
* `H` is SHA-256 for v1
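The hash chain rule can be illustrated by building the byte string that is fed to `H`. This sketch assumes that the integer fields enter the hash in the same fixed-width little-endian form they have on disk, which this section does not state explicitly; `log_hash_preimage` is a hypothetical name, and the SHA-256 call itself is left to whichever library the implementation uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void store_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}
static void store_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Writes prev_record_hash || logseq || record_type || payload_len || payload
 * into out. Returns the number of bytes written, or 0 if out is too small. */
static size_t log_hash_preimage(const uint8_t prev_hash[32], uint64_t logseq,
                                uint32_t record_type,
                                const uint8_t *payload, uint32_t payload_len,
                                uint8_t *out, size_t out_cap)
{
    assert(prev_hash != NULL && out != NULL);
    size_t need = 32 + 8 + 4 + 4 + (size_t)payload_len;
    if (out_cap < need) return 0;
    memcpy(out, prev_hash, 32);
    store_le64(out + 32, logseq);
    store_le32(out + 40, record_type);
    store_le32(out + 44, payload_len);
    if (payload_len > 0) memcpy(out + 48, payload, payload_len);
    return need;
}
```

For the first record in a log, `prev_hash` would be 32 zero bytes, as stated above.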
Readers MUST skip unknown `record_type` values using `payload_len` and MUST
continue replay without failure.
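Skipping an unknown record only needs the envelope fields. The sketch below walks the bytes that follow the `LogHeader` and counts known and skipped records; `log_walk` and `log_type_is_known` are hypothetical names, the known type IDs are those registered in section 6, and hash verification is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static bool log_type_is_known(uint32_t t) {
    switch (t) {
    case 0x01: case 0x10: case 0x11: case 0x20: case 0x30: case 0x31:
        return true;
    default:
        return false;
    }
}

/* Walks LogRecord envelopes in buf. Returns false on a truncated record. */
static bool log_walk(const uint8_t *buf, size_t len,
                     size_t *known, size_t *skipped)
{
    assert(buf != NULL && known != NULL && skipped != NULL);
    size_t pos = 0;
    *known = 0;
    *skipped = 0;
    while (pos < len) {
        if (len - pos < 16) return false; /* logseq + type + payload_len */
        uint32_t type = load_le32(buf + pos + 8);
        uint32_t payload_len = load_le32(buf + pos + 12);
        size_t remaining = len - pos - 16;
        if (remaining < 32 || remaining - 32 < payload_len) return false;
        if (log_type_is_known(type)) (*known)++; else (*skipped)++;
        pos += 16 + (size_t)payload_len + 32; /* envelope + payload + hash */
    }
    return true;
}
```

Note that the reader advances past the 32-byte `record_hash` as well as the payload, so an unknown record never desynchronizes the stream.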
---
## 6. Record Type IDs (v1)
These type IDs bind the ASL/LOG/1 semantics to bytes-on-disk:
| Type ID | Record Type |
| ------- | ------------------ |
| 0x01 | SEGMENT_SEAL |
| 0x10 | TOMBSTONE |
| 0x11 | TOMBSTONE_LIFT |
| 0x20 | SNAPSHOT_ANCHOR |
| 0x30 | ARTIFACT_PUBLISH |
| 0x31 | ARTIFACT_UNPUBLISH |
---
## 6.1 Payload Schemas (v1)
All payloads are little-endian and packed. Variable-length fields are encoded
inline and accounted for by `payload_len`.
### 6.1.1 ArtifactRef
```c
#pragma pack(push,1)
typedef struct {
uint32_t hash_id; // Hash algorithm identifier
uint16_t digest_len; // Digest length in bytes
uint16_t reserved0; // Must be 0
uint8_t digest[digest_len]; // Variable-length digest (layout notation, not compilable C)
} ArtifactRef;
#pragma pack(pop)
```
Notes:
* `digest_len` MUST be > 0.
* If StoreConfig fixes the hash, `digest_len` MUST match that hash's length.
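Because `ArtifactRef` is variable-length, payload decoders typically parse it first and continue at the returned offset. A minimal sketch, assuming hypothetical names `artifact_ref_view_t` and `artifact_ref_parse`; the StoreConfig length check is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t load_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical zero-copy view; digest points into the caller's buffer. */
typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    const uint8_t *digest;
} artifact_ref_view_t;

/* Returns the number of bytes consumed, or 0 if the bytes are malformed. */
static size_t artifact_ref_parse(const uint8_t *buf, size_t len,
                                 artifact_ref_view_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8) return 0;
    uint16_t digest_len = load_le16(buf + 4);
    if (digest_len == 0) return 0;           /* digest_len MUST be > 0 */
    if (load_le16(buf + 6) != 0) return 0;   /* reserved0 must be 0 */
    if (len - 8 < digest_len) return 0;      /* truncated digest */
    out->hash_id = load_le32(buf);
    out->digest_len = digest_len;
    out->digest = buf + 8;
    return 8 + (size_t)digest_len;
}
```

For a `TombstonePayload`, for instance, `scope` and `reason_code` would start at the returned offset.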
### 6.1.2 SEGMENT_SEAL (Type 0x01)
```c
#pragma pack(push,1)
typedef struct {
uint64_t segment_id; // Store-local segment identifier
uint8_t segment_hash[32]; // SHA-256 over the segment file bytes
} SegmentSealPayload;
#pragma pack(pop)
```
### 6.1.3 TOMBSTONE (Type 0x10)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
uint32_t scope; // Opaque to ASL/LOG/1
uint32_t reason_code; // Opaque to ASL/LOG/1
} TombstonePayload;
#pragma pack(pop)
```
### 6.1.4 TOMBSTONE_LIFT (Type 0x11)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
uint64_t tombstone_logseq; // logseq of the tombstone being lifted
} TombstoneLiftPayload;
#pragma pack(pop)
```
### 6.1.5 SNAPSHOT_ANCHOR (Type 0x20)
```c
#pragma pack(push,1)
typedef struct {
uint64_t snapshot_id;
uint8_t root_hash[32]; // Hash of snapshot-visible state
} SnapshotAnchorPayload;
#pragma pack(pop)
```
### 6.1.6 ARTIFACT_PUBLISH (Type 0x30)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
} ArtifactPublishPayload;
#pragma pack(pop)
```
### 6.1.7 ARTIFACT_UNPUBLISH (Type 0x31)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
} ArtifactUnpublishPayload;
#pragma pack(pop)
```
---
## 7. Versioning Rules
* `version = 1` for this specification.
* New record types MAY be added without bumping the version.
* Layout changes to `LogHeader` or `LogRecord` require a new version.
---
## 8. Relationship to Other Layers
| Layer | Responsibility |
| ---------------- | ------------------------------------------------ |
| ASL/LOG/1 | Semantic log behavior and replay rules |
| ASL-STORE-INDEX | Store lifecycle and snapshot/log contracts |
| ENC-ASL-LOG | Exact byte layout for log encoding (this doc) |
| ENC-ASL-CORE-INDEX | Exact byte layout for index segments |


@ -0,0 +1,202 @@
# ENC/ASL-TGK-EXEC-PLAN/1 — Execution Plan Encoding
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-01-17
Linked Phase Pack: N/A
Tags: [encoding, execution, tgk]
<!-- Source: /amduat-api/tier1/enc-asl-tgk-exec-plan-1.md | Canonical: /amduat/tier1/enc-asl-tgk-exec-plan-1.md -->
**Document ID:** `ENC/ASL-TGK-EXEC-PLAN/1`
**Layer:** L2 — Execution plan encoding (bytes-on-disk)
**Depends on (normative):**
* `ASL/TGK-EXEC-PLAN/1`
**Informative references:**
* `ENC/ASL-CORE-INDEX/1`
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ENC/ASL-TGK-EXEC-PLAN/1 defines the byte-level encoding for serialized execution plans. It does not define operator semantics.
---
## 1. Operator Type Enumeration
```c
typedef enum {
OP_SEGMENT_SCAN,
OP_INDEX_FILTER,
OP_MERGE,
OP_PROJECTION,
OP_TGK_TRAVERSAL,
OP_AGGREGATION,
OP_LIMIT_OFFSET,
OP_SHARD_DISPATCH,
OP_SIMD_FILTER,
OP_TOMBSTONE_SHADOW
} operator_type_t;
```
---
## 2. Operator Flags
```c
typedef enum {
OP_FLAG_NONE = 0x00,
OP_FLAG_PARALLEL = 0x01, // shard or SIMD capable
OP_FLAG_OPTIONAL = 0x02 // optional operator (acceleration)
} operator_flags_t;
```
---
## 3. Snapshot Range Structure
```c
typedef struct {
uint64_t logseq_min; // inclusive
uint64_t logseq_max; // inclusive
} snapshot_range_t;
```
---
## 4. Operator Parameter Union
```c
typedef struct {
// SegmentScan parameters
struct {
uint8_t is_asl_segment; // 1 = ASL, 0 = TGK
uint64_t segment_start_id;
uint64_t segment_end_id;
} segment_scan;
// IndexFilter parameters
struct {
uint32_t artifact_type_tag;
uint8_t has_type_tag;
uint32_t edge_type_key;
uint8_t has_edge_type;
uint8_t role; // 0=none, 1=from, 2=to, 3=both
} index_filter;
// Merge parameters
struct {
uint8_t deterministic; // 1 = logseq ascending + canonical key
} merge;
// Projection parameters
struct {
uint8_t project_artifact_id;
uint8_t project_tgk_edge_id;
uint8_t project_node_id;
uint8_t project_type_tag;
} projection;
// TGKTraversal parameters
struct {
uint64_t start_node_id;
uint32_t traversal_depth;
uint8_t direction; // 1=from, 2=to, 3=both
} tgk_traversal;
// Aggregation parameters
struct {
uint8_t agg_count;
uint8_t agg_union;
uint8_t agg_sum;
} aggregation;
// LimitOffset parameters
struct {
uint64_t limit;
uint64_t offset;
} limit_offset;
// ShardDispatch & SIMDFilter are handled via flags
} operator_params_t;
```
---
## 5. Operator Definition Structure
```c
typedef struct operator_def {
uint32_t op_id; // unique operator ID
operator_type_t op_type; // operator type
operator_flags_t flags; // parallel/optional flags
snapshot_range_t snapshot; // snapshot bounds for deterministic execution
operator_params_t params; // operator-specific parameters
uint32_t input_count; // number of upstream operators
uint32_t inputs[8]; // list of op_ids for input edges (DAG)
} operator_def_t;
```
Notes:
* `inputs` defines DAG dependencies.
* The maximum input fan-in is 8 for v1.
---
## 6. Execution Plan Structure
```c
typedef struct exec_plan {
uint32_t plan_version; // version of plan encoding
uint32_t operator_count; // number of operators
operator_def_t *operators; // array of operator definitions
} exec_plan_t;
```
Operators SHOULD be serialized in topological order when possible.
---
## 7. Serialization Rules (Normative)
* All integers are little-endian.
* Operators MUST be serialized in a deterministic order.
* `operator_count` MUST match the serialized operator array length.
* `inputs[]` MUST reference valid `op_id` values within the plan.
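The reference rule for `inputs[]` and the topological-order recommendation from section 6 can be checked together. This sketch uses a reduced, hypothetical view type `op_links_t` that keeps only the linkage fields of `operator_def_t`; it relies on `op_id` being unique within a plan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced view of operator_def_t: linkage fields only. */
typedef struct {
    uint32_t op_id;
    uint32_t input_count;
    uint32_t inputs[8];
} op_links_t;

/* Returns false if any input_count exceeds 8 or any input references an
 * op_id that is not in the plan. *topo_ordered reports whether every input
 * appears earlier in the array than its consumer. */
static bool plan_links_valid(const op_links_t *ops, uint32_t n,
                             bool *topo_ordered)
{
    assert(ops != NULL && topo_ordered != NULL);
    *topo_ordered = true;
    for (uint32_t i = 0; i < n; i++) {
        if (ops[i].input_count > 8) return false; /* v1 fan-in limit */
        for (uint32_t k = 0; k < ops[i].input_count; k++) {
            bool found = false;
            for (uint32_t j = 0; j < n; j++) {
                if (ops[j].op_id == ops[i].inputs[k]) {
                    found = true;
                    if (j >= i) *topo_ordered = false;
                    break;
                }
            }
            if (!found) return false;
        }
    }
    return true;
}
```

The quadratic lookup keeps the sketch short; a real validator would likely index operators by `op_id` first.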
---
## 8. Non-Goals
ENC-ASL-TGK-EXEC-PLAN/1 does not define:
* Runtime scheduling or execution
* Query languages or APIs
* Operator semantics beyond parameter layout

554
tier1/srs.md Normal file

@ -0,0 +1,554 @@
# AMDUAT-SRS — Detailed Requirements Specification
Status: Approved
Owner: Niklas Rydberg
Version: 0.4.0
SoT: Yes
Last Updated: 2025-11-11
Linked Phase Pack: PH01
Tags: [requirements, cas, kheper]
<!-- Source: /amduat-api/tier1/srs.md | Canonical: /amduat/tier1/srs.md -->
**Document ID:** `AMDUAT-SRS`
**Layer:** L0 — Requirements baseline (CAS + deterministic composition)
**Depends on (normative):**
* None (requirements baseline)
**Informative references:**
* `AMDUAT-DDS` — byte-level design specification
* ADR-006 — deterministic error semantics
* ADR-015 — CAS rejection matrix alignment
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
> **Purpose:** Capture normative behavioural requirements for Phase PH01 (Kheper) and beyond. Long-lived semantics live here (not in Phase Packs).
---
## 1. Objectives (from Tier-0 Charter; elaborated)
* Deterministic addressing: identical payload bytes **MUST** yield identical CIDs.
* Immutability: new bytes → new CID; objects MUST NOT be mutated in place.
* Integrity by design: `verify()` MUST detect corruption; zero false positives.
* Instance isolation: storage layout and runtime state are implementation details.
* Binary canonical substrate: COR/1 is the normative import/export envelope.
* Instance identity: ICD/1 defines stable `instance_id` for future transaction bindings.
* Crypto agility: default SHA-256; algorithm IDs extensible.
* Minimal tooling: reference CLI (`amduatcas`) and C library.
* Conformance: golden vectors and cross-impl CI enforce byte-identity.
---
## 2. Scope (Behavioural)
### 2.1 In Scope
* Local, single-node Content-Addressable Storage (CAS)
* Deterministic hashing with domain separation
* Canonical envelopes (COR/1) and instance descriptor (ICD/1)
* CRUD-adjacent operations: put/get/stat/exists/verify
* Import/export of canonical bytestreams
* Optional listing/gc semantics
### 2.2 Out of Scope (for PH01)
* Networking, replication, consensus
* Multi-object transactions
* Semantic/provenance graphing
* Encryption/ACLs (layer externally)
---
## 3. Functional Requirements
### FR-001 Deterministic CID Production
Given identical payload bytes and algo_id, the CID **MUST** match across compliant implementations.
### FR-002 Immutability
Objects **MUST NOT** be mutated; new payload → new CID.
### FR-003 Idempotent Put
Concurrent `put()` calls for an identical payload MUST yield exactly one canonical object, with object integrity preserved.
### FR-004 Verification
`verify(CID)` MUST recompute the CID and detect corruption; zero false positives.
### FR-005 Import/Export Canonicality
Importing COR/1 and then exporting it MUST yield byte-identical bytestreams.
### FR-006 Size Validation
`get()` MUST validate payload length according to COR/1.
### FR-007 Optional Verify-on-Read Policy
Policy MAY require verification for cold reads; reads MUST NOT corrupt the payload when verification is disabled.
### FR-008 Canonical Rejection
CAS decoders MUST reject:
* out-of-order TLV tags
* duplicate TLV tags
* extraneous tags
* trailing bytes
* malformed or over-long VARINT encodings
* payload length mismatches
Rejection MUST be deterministic and symbolic.
### FR-009 Concurrency Discipline
Concurrent `put()` operations for identical payloads MUST NOT yield divergent COR/1 envelopes. Only one canonical envelope may result.
### FR-010 Raw Byte Semantics
CAS MUST operate strictly over exact payload bytes. No normalization (newline, whitespace, UTF-8 interpretation, or Unicode equivalence) SHALL occur.
### FR-011 Filesystem Independence
Consensus behaviour MUST NOT depend on:
* directory entry ordering
* timestamp metadata
* filesystem case sensitivity
* locale or regional configuration
### FR-012 Deterministic Failure
Malformed objects MUST be rejected. CAS MUST NOT auto-repair or normalize COR/1 envelopes.
### FR-013 Resource Boundaries
Resource exhaustion (disk full, allocation failure) MUST fail atomically and leave no partial objects visible.
### FR-014 FCS/1 Descriptor Determinism (v1-min)
Composite and custom functions MUST be expressed as canonical **FCS/1** descriptors that contain **only the execution recipe**:
`function_ptr`, `parameter_block (PCB1)`, and `arity`.
Identical descriptors SHALL hash to identical CIDs and MUST remain immutable after publication. **No policy/intent/notes** appear in FCS/1.
### FR-015 Registry Determinism (Descriptor Admission)
Functional registries MUST admit **only canonical FCS/1 descriptors** (per FR-014) and enforce descriptor validation (TLV order, PCB1 arity, acyclicity).
Registries MUST NOT infer or embed policy/intent into descriptors; publication governance is handled at certification time (FR-017).
### FR-016 Evaluation Receipt Integrity (FER/1)
Every execution of a composite function under curated or locked policies MUST emit a **FER/1** receipt. The receipt SHALL encode, in canonical TLV order, at least the following evidence:
1. `function_cid` → evaluated FCS/1 descriptor (v1-min) preserving CIP indirection.
2. `input_manifest` → GS/1 BCF/1 set of consumed input CIDs (deduped and byte-lexicographic).
3. `environment` → ICD/1 (or PH03 env capsule) snapshot pinning toolchain/runtime state.
4. `evaluator_id` → stable evaluator identity bytes.
5. `executor_set` → implementations that executed the recipe, keyed in canonical byte order.
6. `parity_vector` → per-executor digests with matching `executor` ordering, shared `output` (`== output_cid`), and `sbom_cid` entries.
7. `executor_fingerprint` + `run_id` → optional SBOM fingerprint CID and deterministic dedup hash (`H("AMDUAT:RUN\0" || function || manifest || env || fingerprint)`).
8. `logs` → typed evidence capsules binding `kind`, `cid`, and `sha256` for stdout/stderr/metrics traces.
9. `limits` → declared execution envelope (`cpu_ms`, `wall_ms`, `max_rss_kib`, `io_reads`, `io_writes`).
10. `determinism_level` / `rng_seed` → declared determinism class (`D1_bit_exact` default, `D2_numeric_stable` requires a 0–32 byte seed).
11. `output_cid` → single canonical output CID for the run.
12. `started_at` / `completed_at` → epoch-second timestamps satisfying FR-020 bounds.
13. `signature` → Ed25519 metadata verifying `H("AMDUAT:FER\0" || canonical bytes)`.
Receipts MAY include optional `logs` (typed capsules), `context`, `witnesses`, `parent`, and `signature_ext` TLVs but MUST NOT leak policy/intent (those belong to FCT/1).
From Phase 04 onwards, governance and runtime layers MUST require FER/1 v1.1 receipts; ER/1 artefacts remain valid only as historical evidence and SHALL NOT satisfy FR-016 compliance gates.
Parity discipline is mandatory: unsorted executor keys or mismatched parity orderings SHALL raise `ERR_IMPL_PARITY_ORDER`; divergent outputs or missing executors SHALL raise `ERR_IMPL_PARITY`. Unknown TLVs or cardinality violations SHALL raise `ERR_FER_UNKNOWN_TAG`. GS/1 manifest violations emit `ERR_FER_INPUT_MANIFEST_SHAPE`; missing RNG seed when determinism ≠ D1 emits `ERR_FER_RNG_REQUIRED`. All signatures MUST verify against the domain-separated hash (`ERR_FER_SIGNATURE` on failure).
### FR-017 Certification Transactions (FCT/1: Policy & Intent)
Certification events MUST be recorded as **FCT/1** transactions that aggregate one or more FER/1 receipts and bind **registry policy, intent, domain scope, and authority role**.
Transactions MUST include attestations whenever `registry_policy != 0` and SHALL expose publication pointers when federated.
**All intent/scope/role/authority metadata lives in FCT/1 (not in FCS/1).**
### FR-BS-001 ByteStore Deterministic Identity
ByteStore SHALL derive CIDs using the canonical CAS domain separator: `CID = algo || H("CAS:OBJ\0" || payload)`.
The derived CID returned by `put()` and `import_cor()` MUST match the CID embedded in COR/1 envelopes and SHALL remain stable across runs, implementations, and ingest modes (DDS §11.2; ADR-030).
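Domain separation means the hash input is the literal prefix `"CAS:OBJ\0"` (eight bytes, including the trailing NUL) followed by the raw payload. The sketch below only builds that input; `cas_hash_input` is a hypothetical name, and both the hash call and the encoding of the `algo` prefix are outside this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes "CAS:OBJ\0" || payload into out.
 * Returns the number of bytes written, or 0 if out is too small. */
static size_t cas_hash_input(const uint8_t *payload, size_t n,
                             uint8_t *out, size_t cap)
{
    static const uint8_t domain[8] = {'C', 'A', 'S', ':', 'O', 'B', 'J', '\0'};
    assert(out != NULL);
    assert(payload != NULL || n == 0);
    if (cap < sizeof domain || cap - sizeof domain < n) return 0;
    memcpy(out, domain, sizeof domain);
    if (n > 0) memcpy(out + sizeof domain, payload, n);
    return sizeof domain + n;
}
```

The payload bytes are copied verbatim, in line with the raw byte semantics of FR-010.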
### FR-BS-002 Atomic Durability Ladder
ByteStore persistence MUST follow the atomic write ladder: write → `fsync(tmp)` → `rename` → `fsync(shard)` → `fsync(root)`.
Crash-window simulations triggered via `AMDUAT_BYTESTORE_CRASH_STEP` MUST leave the public area consistent upon recovery, with no visible partial objects (DDS §11.4; ADR-030; evidence PH05-EV-BS-001).
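On a POSIX system the ladder could look roughly like the sketch below. It assumes that "shard" and "root" refer to the directory that holds the object and the store root directory, which is an interpretation and not something this section defines; `atomic_store` and `fsync_dir` are hypothetical names, and `EINTR` handling and the crash-step hooks are omitted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Flushes a directory entry change to stable storage. */
static int fsync_dir(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* write -> fsync(tmp) -> rename -> fsync(shard) -> fsync(root).
 * Returns 0 on success, -1 on failure. */
static int atomic_store(const char *root_dir, const char *shard_dir,
                        const char *tmp_path, const char *final_path,
                        const void *data, size_t len)
{
    assert(data != NULL || len == 0);
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;
    const unsigned char *p = data;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) { close(fd); unlink(tmp_path); return -1; }
        p += w;
        len -= (size_t)w;
    }
    if (fsync(fd) != 0) { close(fd); unlink(tmp_path); return -1; }
    if (close(fd) != 0) { unlink(tmp_path); return -1; }
    if (rename(tmp_path, final_path) != 0) { unlink(tmp_path); return -1; }
    if (fsync_dir(shard_dir) != 0) return -1;
    return fsync_dir(root_dir);
}
```

Until `rename` succeeds, the object exists only under its temporary name, which is how the ladder avoids exposing partial objects.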
### FR-BS-003 Secure/Public Area Isolation
ByteStore SHALL enforce SA/PA isolation such that public payload roots and secure state roots are disjoint and non-overlapping.
Violations MUST raise `ERR_AREA_VIOLATION` and SHALL be surfaced to callers (DDS §11.5; ADR-030).
### FR-BS-004 COR/1 Round-Trip Identity
Importing COR/1 bytes via ByteStore and exporting the same CID MUST yield a byte-identical envelope.
Any mismatch between stored bytes and derived CID SHALL raise `ERR_IDENTITY_MISMATCH` (DDS §11.3; ADR-030).
### FR-BS-005 Streaming Determinism & Policy Enforcement
Chunked ingestion (`put_stream`) MUST produce the same CID as single-shot `put` for equivalent payloads and reject non-bytes or missing data with deterministic errors (`ERR_STREAM_ORDER`, `ERR_STREAM_TRUNCATED`).
ByteStore SHALL enforce ICD/1 `max_object_size` for all ingest paths, raising `ERR_POLICY_SIZE` when exceeded (DDS §11.6–11.7; ADR-030).
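Informative sketch of why chunked and single-shot ingestion can agree: the hash is fed incrementally, so chunk boundaries do not influence the digest. SHA-256, the two-byte algorithm prefix, the exception classes, and the mapping of a non-bytes chunk to `ERR_STREAM_ORDER` are assumptions for illustration.

```python
import hashlib
from typing import Iterable

CAS_DOMAIN = b"CAS:OBJ\x00"
ALGO_SHA256 = b"\x00\x01"  # hypothetical algorithm prefix


class PolicySizeError(Exception):
    """Stands in for ERR_POLICY_SIZE."""


class StreamOrderError(Exception):
    """Stands in for ERR_STREAM_ORDER (assumed here to cover non-bytes chunks)."""


def cid_from_stream(chunks: Iterable[bytes], max_object_size: int) -> bytes:
    """Derive a CID from chunks while enforcing the size policy on the running total."""
    hasher = hashlib.sha256(CAS_DOMAIN)
    total = 0
    for chunk in chunks:
        if not isinstance(chunk, (bytes, bytearray)):
            raise StreamOrderError("ERR_STREAM_ORDER")
        total += len(chunk)
        if total > max_object_size:
            raise PolicySizeError("ERR_POLICY_SIZE")
        hasher.update(chunk)
    return ALGO_SHA256 + hasher.digest()
```

Checking the size limit on the running total means an oversized stream is rejected as soon as the limit is crossed, without buffering the whole payload.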
### FR-022 Federation Publication Digest (FPD/1)
Every publish event emerging from an FCT/1 certification MUST emit exactly one **FPD/1** digest satisfying ADR-007 single-digest guarantees.
The digest SHALL canonically hash the certified FCT/1 record, all attested FER/1 receipts, and the emitted governance edges (`certifies`, `attests`, `publishes`).
Implementations MUST persist the FPD/1 bytes alongside the FCT/1 payload under `/logs/ph03/evidence/fct/` (or successor evidence path) and reference the resulting CID from `fct.publication`.
Repeated invocations over identical inputs SHALL reproduce the same digest; mismatches SHALL be treated as certification failures.
### FR-018 Provenance Enforcement
Caching or replay layers MUST validate FER/1 receipts and FCT/1 transactions before serving composite outputs. Serving uncertified artefacts when policy requires certification is forbidden.
### FR-019 Transaction Envelope Rejection
Systems MUST reject FER/1 or FCT/1 envelopes whose CID lineage does not match the referenced FCS/1 descriptor, whose timestamps are non-monotonic, or whose signatures/attestations fail verification.
### FR-020 Deterministic Execution Envelope
| ID | Statement | Verification | Notes |
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| **FR-020 — Deterministic Execution Envelope** | Each executor SHALL complete within a bounded deterministic time envelope (default 5 s). Execution time SHALL be measured and logged as evidence. Non-termination SHALL yield symbolic error `ERR_EXEC_TIMEOUT`. | Verified via CI parity harness and evidence file `/logs/ph03/evidence/<date>-execution-times.jsonl`. | Implements Maat's Balance principle. Tags: [deterministic-timing, evidence, maat-balance]. |
### FR-021 Acyclic Composition
FCS/1 descriptors referencing FPS/1 primitives, PCB1 parameter blocks, or nested FCS/1 descriptors MUST form an acyclic graph.
Registries SHALL reject submissions introducing self-references or cycles and emit `ERR_FCS_CYCLE_DETECTED` or
`ERR_PCB_ARITY_MISMATCH` when arity metadata conflicts with PCB1 manifests.
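Informative sketch of the acyclicity check, assuming the registry can present submissions as a mapping from a descriptor identifier to the identifiers it references. The mapping shape and the exception class are hypothetical; arity validation against PCB1 manifests is not shown.

```python
from typing import Dict, List


class FcsCycleError(Exception):
    """Stands in for ERR_FCS_CYCLE_DETECTED."""


def assert_acyclic(refs: Dict[str, List[str]]) -> None:
    """Raise FcsCycleError if the reference graph contains a self-reference or cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour: Dict[str, int] = {}

    def visit(node: str) -> None:
        colour[node] = GREY
        for child in refs.get(node, []):
            state = colour.get(child, WHITE)
            if state == GREY:
                # Reaching a node that is still on the current path closes a cycle.
                raise FcsCycleError("ERR_FCS_CYCLE_DETECTED")
            if state == WHITE:
                visit(child)
        colour[node] = BLACK

    for node in refs:
        if colour.get(node, WHITE) == WHITE:
            visit(node)
```

The recursion depth equals the longest reference chain, so a production registry handling deep compositions would likely prefer an explicit stack.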
### FR-028 Concept-Native Domain Materialization
Federated domain manifests SHALL be materialized exclusively from CRS Concepts
and Relations. Given a DomainNode Concept, registries MUST traverse
`hasManifest` → `ManifestEntry` Concepts, extract `entryName` and
`entryChildVersion` relations, dedupe the `(name, version)` set, and compute the
GS/1 domain state deterministically. Duplicated pairs trigger `ERR_DG_DUP_ENTRY`;
missing relations trigger `ERR_DG_ENTRY_INCOMPLETE`; self references or
ancestor loops raise `ERR_DG_CYCLE`. Evidence: `tools/ci/dg_snapshot.py` →
`logs/ph04/evidence/dg1/PH04-EV-DG-001/`.
Operational linkage: router listings (`GET /links`) MUST return entries sorted
lexicographically by `fls_cid` and treat `since` query parameters as exclusive
lower bounds, ensuring deterministic replay of linkage events.
### FR-029 Publication Recursion Discipline
Publication Concepts SHALL declare their supporting FPD/1 digest, GS/1 cover
state, endorsed member FPD CIDs, and optional lineage parent using CRS
relations (`covers`, `endorses`, `parent`). Validators MUST recompute GS/1 from
the FPD payload, enforce duplicate-free membership, and detect recursive
cycles (`ERR_FPD_CYCLE`). Timestamp regressions raise `ERR_FPD_TIMESTAMP`; state
mismatches raise `ERR_PUB_STATE_MISMATCH`. Evidence: `tools/ci/pub_validate.py` →
`logs/ph04/evidence/pub1/PH04-EV-PUB-001/`.
Operational linkage: non-genesis publications SHOULD enable the parent-required
policy, supplying `fpd.parent` and guaranteeing strictly monotonic
`fpd.timestamp` to align with ADR-019 v1.2.1 and PH04 parent-policy harnesses.
### FR-030 Predicate Concepts
Every CRR/1 relation predicate MUST resolve to a CRS Concept. When the
taxonomy defines a `Predicate` Concept, predicate entries SHALL expose an
`is_a` edge into that class. Missing predicate Concepts raise
`ERR_CRR_PREDICATE_NOT_CONCEPT`; missing taxonomy membership raises
`ERR_CRR_PREDICATE_CLASS_MISSING`. Evidence: CRS validator vectors and
`logs/ph04/evidence/crs1/PH04-EV-CRS-001.md`.
Operational linkage: FPD feed endpoints SHALL implement stateless, content-anchored pagination over parent-chained publications. `GET /feed/fpd` MUST traverse from the publisher's current tip toward genesis until either the caller-provided `limit` is satisfied or the supplied `since` CID is encountered; identical `publisher_id`, `since`, and `limit` inputs SHALL yield identical CID sequences. Detail lookups (`GET /feed/fpd/:cid`) SHALL expose publisher, members, parent, and state metadata without server-side session state. Evidence: `tools/ci/feeds_check.py` → `/amduat/logs/ph04/evidence/feeds/PH04-EV-FEEDS-001/pass.jsonl`.
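Informative sketch of the feed traversal. It assumes the parent chain is available as a mapping from a publication CID to its parent CID (`None` at genesis) and treats `since` as an exclusive stop marker; both are illustrative assumptions rather than requirements of this document.

```python
from typing import Dict, List, Optional


def fpd_feed(parents: Dict[str, Optional[str]], tip: str,
             since: Optional[str], limit: int) -> List[str]:
    """Walk from tip toward genesis; stop at `since` or once `limit` CIDs are collected."""
    out: List[str] = []
    cursor: Optional[str] = tip
    while cursor is not None and len(out) < limit:
        if cursor == since:
            break
        out.append(cursor)
        cursor = parents.get(cursor)
    return out
```

The result depends only on the chain content and the three inputs, so no server-side cursor or session state is required.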
### FR-031 Authority Anchoring via CRS & FPD
Publishing authorities SHALL represent identities as CRS Concepts linked via
`owns` and `hasRole` relations to key material and governance roles. Signatures
remain confined to FCT/1 and FPD/1 surfaces; CRS layers stay unsigned. FLS/1
transport MAY carry Concept or Relation payloads but MUST NOT mutate them and
MUST perform payload-kind checks when requested (`--check-crs-payload`).
Operational linkage: FLS router deployments SHALL expose `POST /fls`,
`GET /fls/:cid`, `GET /links`, `GET /healthz`, and `GET /readyz` endpoints and
enforce SA/PA separation (`ERR_AREA_VIOLATION` if misconfigured) so that public
ingest never mutates state areas directly. Audited ticket intake SHALL be
implemented via WT/1 (ADR-023) with:
* `POST /wt` (Protected Area) accepting WT/1 BCF/1 payloads, validating
`has_pubkey(wt.author, wt.pubkey)` (or registered equivalent), verifying
signatures over `H("AMDUAT:WT\0" || canonical_bytes_without_signature)`,
enforcing registered ADR-010 intents (deduped + byte-lexicographically
sorted), ensuring monotonic `wt.timestamp` per `wt.author`, and optionally
chaining `wt.parent` lineage. Violations yield `ERR_WT_SIGNATURE`,
`ERR_WT_KEY_UNBOUND`, `ERR_WT_INTENT_UNREGISTERED`, `ERR_WT_INTENT_DUP`,
`ERR_WT_INTENT_EMPTY`, `ERR_WT_TIMESTAMP`, `ERR_WT_PARENT_UNKNOWN`, or
`ERR_WT_PARENT_REQUIRED`. Router policy MUST surface scope denials as
`ERR_WT_SCOPE_UNAUTHORIZED` and log the governing policy capsule.
* `GET /wt/:cid` returning the canonical WT/1 bytes for any accepted ticket.
* Deterministic pagination (`GET /wt?after=<cid>&limit=<n>`) that emits WT/1
entries in byte-lexicographic CID order with stable page boundaries. The
`after` parameter is an exclusive bound and routers SHALL enforce
`1 ≤ limit ≤ Nmax` to guarantee replay stability.
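Informative sketch of the pagination rule above, assuming CIDs are held as raw bytes and that `N_MAX` is a router-configured constant (the value below is hypothetical).

```python
from bisect import bisect_right
from typing import List, Optional

N_MAX = 100  # hypothetical upper bound; the real Nmax is a router policy value


def wt_page(cids: List[bytes], after: Optional[bytes], limit: int) -> List[bytes]:
    """Return up to `limit` CIDs strictly greater than `after`, in byte-lexicographic order."""
    if not 1 <= limit <= N_MAX:
        raise ValueError("limit must satisfy 1 <= limit <= Nmax")
    ordered = sorted(cids)  # Python compares bytes objects byte-lexicographically
    start = 0 if after is None else bisect_right(ordered, after)
    return ordered[start:start + limit]
```

Using `bisect_right` makes `after` an exclusive bound even when the supplied CID is not itself present in the store.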
Evidence: `/amduat/logs/ph04/evidence/wt1/PH04-EV-WT-001/summary.md` captures the
validator run over vectors `TV-WT-001…009`, ensuring unknown keys, signature
failures, timestamp regressions (including parent inversions), unbound keys,
unregistered intents, policy rejections, and unresolved parents reject as
specified.
Compat overlays SHALL reference ADR-025 MPR/1 provenance capsules and ADR-026
IER/1 inference evidence when operating in policy lane `compat`. Routers MUST
validate that `executor_fingerprint` equals the supplied MPR/1 CID, enforce
`determinism_level` plus `rng_seed` (raising `ERR_FER_RNG_REQUIRED` when
omitted), and verify log digests via the IER/1 manifest before accepting
overlays (`ERR_IER_LOG_HASH`/`ERR_IER_LOG_MANIFEST`). Evidence surfaces
`/amduat/logs/ph04/evidence/mpr1/PH04-EV-MPR-001/pass.jsonl` and
`/amduat/logs/ph04/evidence/ier1/PH04-EV-IER-001/pass.jsonl` prove vector
coverage `TV-MPR-001…003` (hash triple, missing weights, signature domain) and
`TV-IER-001…004` (ok, missing seed, fingerprint mismatch, log digest mismatch)
respectively with scenario summaries in accompanying `summary.md` files.
### FR-032 CT/1 Deterministic Replay (D1)
Given identical AC/1 + DTF/1 + topology inputs, executing the runtime twice in
isolation MUST produce byte-identical CT/1 snapshots (header and payload) with
matching CIDs whenever `ct.determinism_level = 0`. Evidence:
`tools/ci/ct_replay.py` (`runA`/`runB`) →
`/amduat/logs/ph05/evidence/ct1/PH05-EV-CT1-REPLAY-001/`.
### FR-033 CT/1 Numeric Stability (D2)
When `ct.determinism_level = 1`, numeric observables MAY diverge, but the
maximum absolute delta MUST remain within the tolerance documented by
`ct.kernel_cfg`. Evidence: `tools/ci/ct_replay.py` D2 replay outputs and kernel
configuration manifests in the same evidence set.
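Informative sketch of the D2 comparison, assuming the numeric observables of two runs are available as equal-length sequences of floats and that the tolerance has already been read from `ct.kernel_cfg`. The function name and input shape are hypothetical.

```python
from typing import Sequence


def within_tolerance(run_a: Sequence[float], run_b: Sequence[float], tol: float) -> bool:
    """Return True when the maximum absolute delta between two runs is at most tol."""
    if len(run_a) != len(run_b):
        return False  # differing shapes cannot be compared observable by observable
    max_delta = max((abs(a - b) for a, b in zip(run_a, run_b)), default=0.0)
    return max_delta <= tol
```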
### FR-034 CT/1 Header Integrity
CT/1 headers MUST follow ADR-027: canonical BCF/1 key ordering, rejection of
unknown keys, monotonic `ct.tick`, canonical `cid:` formatting for topology and
AC/1/DTF/1 pointers (ADR-028), and Ed25519 signatures over
`H("AMDUAT:CT\0" || canonical_bytes_without_signature)`. Evidence:
`tools/validate/ct1_validator.py` with vectors
`/amduat/vectors/ph05/ct1/TV-CT1-001…004` and AC/DTF fixtures
`TV-AC1-001…002`, `TV-DTF1-001…002`.
---
## 4. Non-Functional Requirements
### NFR-001 Determinism
Platform/language differences MUST NOT affect CID.
### NFR-002 Performance
Put/get latency MUST remain within configured OPS budgets.
### NFR-003 Reliability
CAS operations MUST be atomic; partial writes MUST NOT be visible.
### NFR-004 Portability
Implementations MUST operate on common filesystems.
### NFR-005 Security Posture
Domain separation strings MUST be applied for all hashed surfaces.
### 4.3 Future Scope Alignment (Informative)
Phase 02 introduces deterministic transformation primitives (**FPS/1**) extending the Kheper CAS model defined herein.
See `/amduat/arc/adrs/adr-015.md` and `/amduat/tier1/fps.md` for details.
No behavioural changes apply retroactively to PH01 surfaces.
---
## 5. Data Model (Behavioural View)
* CAS objects identified strictly by CID.
* COR/1 envelope provides size, payload, algo_id.
* ICD/1 descriptor provides instance configuration.
> See DDS §2 (COR/1) and §3 (ICD/1) for normative byte layouts.
---
## 6. API Semantics
### `put(payload_bytes, algo_id=default) → CID`
* Compute CID using domain separation: `CID = algo_id || H("CAS:OBJ\0" || payload_bytes)`
* If CID exists: return existing CID (idempotent)
* If absent: write canonical COR/1 envelope atomically
* Reject on size limit breach, malformed payload, non-canonical COR/1, I/O errors
* Writes MUST be atomic: temp file → fsync → rename → fsync parent dir
### `get(CID) → payload_bytes`
* Retrieve raw payload bytes
* MUST validate canonical COR/1 envelope
* Implementation MAY verify hash on read by policy
* Reject on missing object, hash mismatch
### `exists(CID) → bool`
* Return true if object is present and canonical
### `stat(CID) → { present, size, algo_id }`
* MUST return canonical metadata
### `verify(CID) → { ok|error, expected:CID, actual:CID }`
* Recompute CID from canonical bytes
* MUST detect corruption and reject non-canonical encodings
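Informative sketch of `verify` over an in-memory mapping from CID to payload bytes. SHA-256, the two-byte algorithm prefix, and the dictionary-backed store are assumptions for illustration; canonical COR/1 envelope checks are omitted.

```python
import hashlib
from typing import Dict

CAS_DOMAIN = b"CAS:OBJ\x00"


def verify(store: Dict[bytes, bytes], cid: bytes, algo_len: int = 2) -> dict:
    """Recompute the CID from the stored payload and compare it with the requested one."""
    if cid not in store:
        raise LookupError("E_CID_NOT_FOUND")
    payload = store[cid]
    actual = cid[:algo_len] + hashlib.sha256(CAS_DOMAIN + payload).digest()
    return {"ok": actual == cid, "expected": cid, "actual": actual}
```

A corrupted payload changes `actual`, so the mismatch is reported without trusting any metadata stored alongside the object.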
### `import(stream_COR1) → CID`
* Validate canonical TLV ordering
* Reject duplicate tags, extraneous tags, malformed VARINTs
* MUST round-trip to identical CID
### `export(CID) → stream_COR1`
* Emit canonical envelope; re-encoding MUST preserve canonical bytes
### Deterministic Errors
Errors MUST be emitted as stable symbolic codes including but not limited to:
* `E_CID_NOT_FOUND`
* `E_CORRUPT_OBJECT`
* `E_CANONICALITY_VIOLATION`
* `E_IO_FAILURE`
---
## 7. Success Criteria
* Byte-for-byte CID agreement (≥ 3 platforms)
* Zero false positives in `verify()`
* Idempotent concurrent `put()`
* COR/1 import/export round-trips cleanly
---
## 8. GC Semantics (Behavioural)
* Reachability from configured roots
* Dry-run mode MUST NOT delete
* Removal MUST be atomic per object
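Informative sketch of reachability-based collection with a dry-run mode. The object graph is modelled as a hypothetical mapping from a CID to the CIDs it references; real stores would derive references from their own metadata.

```python
from typing import Dict, Iterable, List, Set


def collect_garbage(objects: Dict[str, List[str]], roots: Iterable[str],
                    dry_run: bool = True) -> Set[str]:
    """Return unreachable CIDs; remove them from `objects` only when dry_run is False."""
    reachable: Set[str] = set()
    stack = [r for r in roots if r in objects]
    while stack:
        cid = stack.pop()
        if cid in reachable:
            continue
        reachable.add(cid)
        stack.extend(ref for ref in objects[cid] if ref in objects)
    garbage = set(objects) - reachable
    if not dry_run:
        for cid in garbage:
            del objects[cid]  # each removal affects exactly one object
    return garbage
```

Defaulting `dry_run` to `True` keeps an accidental call from deleting anything.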
---
## 9. Acceptance Criteria (Phase Exit)
* Golden vectors published
* Cross-impl CI passing
* COR/1 and ICD/1 documented in DDS
* Security posture validated by SEC
---
## 10. Traceability
* Requirements link to tests/defects in Phase Packs
* ADRs reference affected FR/NFR IDs
---
## 11. Future Phases
* Multi-object transactions bind to `instance_id`
* Provenance graph consumes COR/1 metadata
---
## 12. Functional Primitive Surface (FPS/1)
> Defines the canonical deterministic operations over canonical payloads.
> Each primitive produces exactly one payload and one CID.
| Primitive | Signature | Description | Determinism / Errors |
| ------------- | ------------------------------ | ------------------------------------------- | ---------------------------------------------- |
| `put` | `(payload_bytes) → CID` | Canonical write, atomic fsync ladder. | ADR-006 `ERR_IO_FAILURE`, `ERR_NORMALIZATION`. |
| `get` | `(CID) → payload_bytes` | Fetch canonical bytes. | `ERR_CID_NOT_FOUND`. |
| `slice` | `(CID, offset, length) → CID` | Extract contiguous bytes. | `ERR_SLICE_RANGE`. |
| `concatenate` | `([CID₁,…,CIDₙ]) → CID` | Sequential join of payloads. | `ERR_EMPTY_INPUTS`. |
| `reverse` | `(CID, level) → CID` | Reverse payload order (bit/byte/word/long). | `ERR_REV_ALIGNMENT`, `ERR_INVALID_LEVEL`. |
| `splice` | `(CID_a, offset, CID_b) → CID` | Insert payload b into a at offset. | `ERR_SPLICE_RANGE`. |
**Determinism:** identical inputs → identical outputs.
**Immutability:** inputs never mutated.
**Closure:** outputs valid for reuse as inputs to any primitive.
**Error handling:** all symbolic per ADR-006.
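Informative sketch of four primitives operating directly on payload bytes rather than CIDs, to show the closure and immutability properties. The exact range rules, the `FpsError` class, and the restriction of `reverse` to the byte level are assumptions for illustration.

```python
from typing import List


class FpsError(Exception):
    """Carries a symbolic error code such as ERR_SLICE_RANGE."""


def fps_slice(payload: bytes, offset: int, length: int) -> bytes:
    """Extract `length` contiguous bytes starting at `offset`."""
    if offset < 0 or length < 0 or offset + length > len(payload):
        raise FpsError("ERR_SLICE_RANGE")
    return payload[offset:offset + length]


def fps_concatenate(payloads: List[bytes]) -> bytes:
    """Join payloads in the order given."""
    if not payloads:
        raise FpsError("ERR_EMPTY_INPUTS")
    return b"".join(payloads)


def fps_splice(a: bytes, offset: int, b: bytes) -> bytes:
    """Insert payload b into payload a at `offset`."""
    if offset < 0 or offset > len(a):
        raise FpsError("ERR_SPLICE_RANGE")
    return a[:offset] + b + a[offset:]


def fps_reverse_bytes(payload: bytes) -> bytes:
    """Reverse payload order at byte granularity only."""
    return payload[::-1]
```

Since `bytes` objects are immutable in Python, every function returns a new value and leaves its inputs untouched; each output can be fed to any other primitive.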
---
## Appendix A — Surface Version Table
| Surface | Version | Notes |
| ------- | ------- | ----- |
| FCS/1 | v1-min | Canonical execution descriptors; governance captured in FCT/1. |
| FER/1 | v1.1 | Receipts enforce parity-first evidence, run_id dedup, typed logs, and RNG discipline (ADR-017). |
| FCT/1 | v1.0 | Certification transactions binding policy/intent/attestations with FER/1 sets. |
| FPD/1 | v1.0 | Publication digest linking FCT/1 to FER/1 receipts for federation replay. |
---
## Document History
* 0.2.1 (2025-10-26) — Phase Pack pointer updated; no semantic changes; archival preserves historical lineage per ADR-002.
* 0.2.2 (2025-10-26) — Promoted PH01 baseline to Approved; synchronized Phase Pack §1 anchors and closure snapshot.
* 0.2.3 (2025-10-27) — Added future scope alignment note pointing to FPS/1 and ADR-015; PH01 semantics remain unchanged.
* **0.2.4 (2025-11-14):** Added FR-014–FR-019 for FCS/1 composition, FER/1 receipts, and FCT/1 certification policies.
* **0.2.5 (2025-11-15):** Added FR-021 (formerly FR-020) enforcing acyclic FCS/1 composition and PCB1 arity validation.
* **0.2.6 (2025-11-19):** Registered FR-020 Deterministic Execution Envelope (Maat's Balance) with timing evidence tags.
* **0.3.0 (2025-11-02):** Trimmed FCS/1 to execution-only (v1-min) under FR-014/FR-015; moved policy/intent/scope/role/authority to FCT/1 (FR-017); clarified registry admission behaviour and kept FER/1 unchanged.
* **0.3.1 (2025-11-21):** Updated FR-016 to require parity-first FER/1 receipts with executor sets, parity vectors, and FR-020 aligned timestamps.
* **0.3.2 (2025-11-22):** Registered FR-022 Federation Publication Digest (FPD/1) requirement tying FCT/1 publications to single-digest evidence and canonical logging.
* **0.3.4 (2025-11-07):** Recorded FER/1 v1.1 requirement for Phase 04 and added surface version table.
* **0.3.5 (2025-11-08):** Registered PH04 linkage & semantic placeholder requirements (FR-028…031).
* **0.3.6 (2025-11-09):** Promoted FR-028…031 to normative linkage requirements with CRS/1 validator enforcement.
* **0.3.7 (2025-11-08):** Finalized FR-028…031 with CRS/1 immutability, GS/1 linkage, and certification coverage.
* **0.3.8 (2025-11-09):** Promoted FR-028…FR-031 for concept-native domain and publication validation.
* **0.3.9 (2025-11-09):** Documented operational linkage: router endpoints, deterministic `/links`, and parent-required publish policy guidance.
* **0.3.10 (2025-11-11):** Registered FR-030 stateless, content-anchored FPD feed pagination requirement.
* **0.3.11 (2025-11-09):** Extended FR-031 with WT/1 intake endpoints, validation, and evidence log references.
* **0.3.12 (2025-11-20):** Tightened FR-031 with `wt.pubkey` bindings, signature preimage exclusion, lineage/policy errors, and
expanded WT/1 vector evidence coverage.
* **0.3.13 (2025-11-21):** Updated FR-031 for `has_pubkey` bindings (`ERR_WT_KEY_UNBOUND`), intent registry enforcement (`ERR_WT_INTENT_UNREGISTERED`), lineage policy rejection (`ERR_WT_PARENT_REQUIRED`), and expanded WT/1 vectors `TV-WT-001…009`.
* **0.3.14 (2025-11-22):** WT/1 intake and SOS/1 compat overlays proven with PH04-M4/M5 audit evidence.
* **0.3.15 (2025-11-22):** Recorded ADR-025/026 compat path requirements and evidence anchors for FR-031.
* **0.3.16 (2025-11-23):** Compat lane now enforces ADR-025/026 validators (MPR/1 hash triple, IER/1 replay) with updated evidence surfaces.
* **0.3.17 (2025-11-24):** Added FR-032–FR-034 for CT/1 replay determinism, numeric stability, and header integrity (ADR-027/028).
* **0.4.0 (2025-11-11):** Added FR-BS-001…005 for ByteStore identity, atomic durability, SA/PA isolation, COR round-trip, and streaming determinism linked to DDS §11 / ADR-030.

158
tier1/tgk-1.md Normal file

@ -0,0 +1,158 @@
# TGK/1 — Trace Graph Kernel Semantics
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-11-30
Linked Phase Pack: N/A
Tags: [tgk, determinism, index, federation]
<!-- Source: /amduat-api/tier1/tgk-1.md | Canonical: /amduat/tier1/tgk-1.md -->
**Document ID:** `TGK/1`
**Layer:** L1 — Semantic graph layer over ASL artifacts and PERs (no encodings)
**Depends on (normative):**
* `ASL/1-CORE`
* `ASL/1-CORE-INDEX`
* `ASL/LOG/1`
* `ASL/SYSTEM/1`
* `TGK/1-CORE`
**Informative references:**
* `ENC/TGK1-EDGE/1` — core edge encoding
* `ENC/TGK-INDEX/1` — index encoding draft
* `ASL/INDEX-ACCEL/1`
* `ENC/ASL-CORE-INDEX/1`
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
TGK/1 defines semantic meaning only. It does not define storage formats, on-disk encodings, or execution operators.
---
## 1. Purpose & Scope
TGK/1 defines the **semantic layer** for Trace Graph Kernel (TGK) edges that relate ASL artifacts and PERs.
It keeps TGK thin and deterministic by reusing ASL index and log semantics.
Non-goals:
* New encodings for edges or indexes
* Query operators or execution plans
* Federation protocols or transport
* Re-definition of ASL or PEL semantics
---
## 2. TGK Objects
### 2.1 TGK Edge
A TGK Edge is an **immutable record** representing a directed relationship between ASL artifacts and/or PERs.
TGK edges are semantic overlays and **MUST NOT** redefine or bypass ASL identity.
TGK/1-CORE defines the EdgeBody structure with ordered `from`/`to` lists; TGK/1
does not further constrain cardinality.
### 2.2 Canonical Edge Key
Each TGK edge has a **Canonical Edge Key** that uniquely identifies it.
The Canonical Edge Key MUST be derived from the logical `EdgeBody` defined in
`TGK/1-CORE`, preserving list order and multiplicity:
* `from`: ordered list of source node identifiers (MAY be empty)
* `to`: ordered list of destination node identifiers (MAY be empty)
* `payload`: reference carried by the edge
* `type`: edge type identifier
* Projection context (for example, PER or execution identity) when not already
captured by the edge payload or type profile
Classification attributes (edge type keys, labels) **MUST NOT** affect canonical identity.
---
## 3. Index and Visibility (Normative)
TGK edges are **indexed objects** and inherit visibility from the ASL index and log:
1. A TGK edge becomes visible only when its index record is admitted by a sealed segment and log order (ASL/LOG/1).
2. TGK traversal and lookup **MUST NOT** bypass index visibility or log ordering.
3. For a fixed `{Snapshot, LogPrefix}`, TGK edge lookup and shadowing **MUST** be deterministic (ASL/1-CORE-INDEX).
4. Tombstones and shadowing semantics follow ASL/1-CORE-INDEX and ASL/LOG/1 replay order.
Index records MUST reference TGK/1-CORE edge identities. Index encodings MUST
NOT re-encode edge structure (`from[]`, `to[]`); they reference TGK/1-CORE edges
and carry only routing/filter metadata.
---
## 4. Deterministic Traversal (Normative)
TGK traversal operates over a snapshot/log-bounded view:
* Inputs: `{Snapshot, LogPrefix}` and a seed set (nodes or edges).
* Outputs: only edges visible under the same `{Snapshot, LogPrefix}`.
* Traversal **MUST** be deterministic and replay-compatible with ASL/LOG/1.
Deterministic ordering for traversal output MUST be:
1. `logseq` ascending
2. Canonical Edge Key as tie-break
Acceleration structures MAY be used but MUST NOT change semantics.
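Informative sketch of the output ordering above. It assumes `LogPrefix` can be modelled as an inclusive integer upper bound on `logseq` and that the Canonical Edge Key is available as opaque bytes compared lexicographically; the `VisibleEdge` type is hypothetical and snapshot handling is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VisibleEdge:
    logseq: int           # log sequence at which the edge's index record was admitted
    canonical_key: bytes  # Canonical Edge Key, treated as opaque bytes


def order_traversal(edges: Iterable[VisibleEdge], log_prefix: int) -> List[VisibleEdge]:
    """Keep edges admitted within the log prefix, ordered by (logseq, canonical_key)."""
    visible = [e for e in edges if e.logseq <= log_prefix]
    return sorted(visible, key=lambda e: (e.logseq, e.canonical_key))
```

Sorting on the `(logseq, canonical_key)` pair gives a total order whenever keys are unique, so two replays over the same bounded view emit edges in the same sequence.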
---
## 5. Federation Alignment (Normative)
Federation does not change TGK semantics. It only propagates edges and artifacts that are already visible under index rules.
* Domain visibility and publication status are enforced via index metadata (ENC-ASL-CORE-INDEX).
* TGK edges keep canonical identity across domains.
* Cross-domain propagation MUST preserve snapshot/log determinism.
---
## 6. Non-Goals
TGK/1 does not define:
* Edge encoding or storage layout
* Index segment formats
* Query languages or execution plans
* Acceleration rules beyond ASL/INDEX-ACCEL/1
---
## 7. Normative Invariants
Conforming implementations MUST enforce:
1. TGK edges are immutable and indexed objects.
2. No TGK visibility without index admission and log ordering.
3. Traversal is snapshot/log bounded and deterministic.
4. Federation does not alter TGK semantics; it only propagates visible edges.
5. Edge classification is not part of canonical identity.