Polish ASL index/log specs

2026-01-17 12:21:15 +01:00 · 2026-01-17 12:21:15 +01:00 · 20f092606d
parent c595e2370a
commit 20f092606d
12 changed files with 303 additions and 20 deletions
--- a/AUDITS.md
+++ b/AUDITS.md
@@ -22,6 +22,60 @@ Verification notes:
 - Prefer explicit commands and paths (e.g., `ctest --test-dir build`).
 - If results are user-reported, note that explicitly.
 Note: the filesystem ASL store (`asl_store_fs`) is a legacy convenience backend
 and will be considered non-conformant to ASL index/log specs once the index/log
 store is introduced. Audits for ASL index/log specs target the new backend only.
 ## Test Expectations (Planned)
 These tests are planned to validate index/log behavior once implemented:
 | Area | Example tests |
 | --- | --- |
 | Segment encoding | Round-trip encode/decode; CRC mismatch rejection; offset bounds checks |
 | Log encoding | Hash-chain validation; unknown record type skip; truncated record rejection |
 | Replay | Snapshot anchor + log replay determinism; segment seal visibility |
 | Tombstones | Shadowing and lift across snapshots; domain-local shadowing rules |
 | Visibility | CURRENT computed by `(SnapshotID, LogPosition)`; reverse seal-log order |
 | Recovery | Crash with open segment; replay yields deterministic CURRENT |
 ## Spec Coverage (Implementation Status)
 Status legend: ✅ implemented, 🟡 planned/in-progress, ⬜ not started.
 | Spec | Status | Notes |
 | --- | --- | --- |
 | `ASL/1-CORE` | ✅ | Core artifact semantics implemented. |
 | `ASL/1-STORE` | ✅ | Store semantics + fs backend. |
 | `ENC/ASL1-CORE` | ✅ | Artifact/Reference encoding. |
 | `HASH/ASL1` | ✅ | Hash registry + streaming API. |
 | `PEL/1-CORE` | ✅ | Core execution semantics. |
 | `PEL/1-SURF` | ✅ | Store-backed surface execution. |
 | `PEL/PROGRAM-DAG/1` | ✅ | DAG scheme execution. |
 | `PEL/PROGRAM-DAG-DESC/1` | ✅ | Scheme descriptor codec + wiring. |
 | `ENC/PEL-PROGRAM-DAG/1` | ✅ | Program encoding. |
 | `ENC/PEL1-RESULT/1` | ✅ | Result encoding. |
 | `PEL/TRACE-DAG/1` | ✅ | Trace semantics + wiring. |
 | `ENC/PEL-TRACE-DAG/1` | ✅ | Trace encoding. |
 | `TGK/1-CORE` | ✅ | Edge semantics + validation. |
 | `ENC/TGK1-EDGE/1` | ✅ | Edge encoding. |
 | `TGK/STORE/1` | ✅ | Store semantics. |
 | `TGK/PROV/1` | ✅ | Provenance operators. |
 | `OPREG/PEL1-KERNEL` | ✅ | Kernel op registry. |
 | `OPREG/PEL1-KERNEL-PARAMS/1` | ✅ | Kernel params encoding. |
 | `AMDUAT20-STACK-OVERVIEW` | ✅ | Orientation surface aligned. |
 | `ASL/1-CORE-INDEX` | 🟡 | Spec clarified; implementation pending. |
 | `ASL/STORE-INDEX/1` | 🟡 | Spec clarified; implementation pending. |
 | `ENC/ASL-CORE-INDEX/1` | 🟡 | Encoding planned. |
 | `ASL/LOG/1` | 🟡 | Log semantics planned. |
 | `ENC/ASL-LOG/1` | 🟡 | Encoding planned. |
 | `ASL/INDEX-ACCEL/1` | 🟡 | Semantics planned. |
 | `ASL/INDEXES/1` | 🟡 | Taxonomy planned. |
 | `ASL/TGK-EXEC-PLAN/1` | 🟡 | Encoding-only plan; executor out of scope. |
 | `ENC/ASL-TGK-EXEC-PLAN/1` | 🟡 | Encoding planned. |
 | `ASL/SYSTEM/1` | 🟡 | Cross-cutting view planned. |
 | `TGK/1` | 🟡 | Semantic layer planned. |
 ## Audit Plan
 Status legend: ✅ completed, ⬜ pending.
--- a/README.md
+++ b/README.md
@@ -65,6 +65,19 @@ status and refs are printed to stderr.
  when not using `--output-raw`.
 - The filesystem ASL store layout expects digests at least 2 bytes long
  (two directory levels). Experimental shorter digests need a different store.
 - The filesystem ASL store (`amduat-asl ... --root`) is a legacy convenience
  backend; once the index/log store is introduced, it is considered
  non-conformant to ASL index/log specs and should be used only for quickstart
  demos.
 - Compatibility & migration: existing `asl_store_fs` stores will not be
  automatically upgraded. Plan to re-ingest artifacts into the index/log store
  when it lands.
 ## Documentation
 - Implementation clarifications: `docs/spec-clarifications.md`
 - Spec coverage matrix: `AUDITS.md` (Spec Coverage section)
 - Index/log API sketch: `docs/index-log-api-sketch.md`
 ## PEL reference
--- a/docs/index-log-api-sketch.md
+++ b/docs/index-log-api-sketch.md
@@ -0,0 +1,58 @@
 # Index/Log API Surface (Sketch)
 This document is a one-page sketch of the planned public API for ASL index/log
 support. It is non-normative and intended to guide header design.
 ## ASL Index/Log Types (Draft)
 ```
 typedef uint64_t amduat_asl_snapshot_id_t;
 typedef uint64_t amduat_asl_log_position_t; // inclusive logseq upper bound
 typedef struct {
  amduat_asl_snapshot_id_t snapshot_id;
  amduat_asl_log_position_t log_position;
 } amduat_asl_index_state_t;
 ```
 ## Core Store API (Draft)
 ```
 // Initialization and config.
 bool amduat_asl_store_index_init(...);
 // PUT/GET with index state reporting.
 amduat_asl_store_error_t amduat_asl_store_put_indexed(
    amduat_asl_store_t *store,
    amduat_artifact_t artifact,
    amduat_reference_t *out_ref,
    amduat_asl_index_state_t *out_state);
 amduat_asl_store_error_t amduat_asl_store_get_indexed(
    amduat_asl_store_t *store,
    amduat_reference_t ref,
    amduat_asl_index_state_t state,
    amduat_artifact_t *out_artifact);
 ```
 ## Index/Log Introspection (Draft)
 ```
 // Snapshot/log position queries.
 bool amduat_asl_index_current_state(amduat_asl_store_t *store,
                                    amduat_asl_index_state_t *out_state);
 // Segment and log inspection (read-only).
 bool amduat_asl_log_scan(amduat_asl_store_t *store, ...);
 bool amduat_asl_segment_scan(amduat_asl_store_t *store, ...);
 ```
 ## Expected Error Surfaces
 * `AMDUAT_ASL_STORE_ERR_INTEGRITY` for malformed index segments or log records.
 * `AMDUAT_ASL_STORE_ERR_IO` for underlying I/O faults.
 * `AMDUAT_ASL_STORE_ERR_NOT_FOUND` for absent artifacts or missing segments.
 * `AMDUAT_ASL_STORE_ERR_UNSUPPORTED` for unsupported encoding versions.
 These are illustrative; exact error codes and mapping will be finalized when
 headers are introduced.
--- a/docs/spec-clarifications.md
+++ b/docs/spec-clarifications.md
@@ -4,6 +4,19 @@ This document records implementation-level clarifications for draft Tier-1
 specs. These notes do not change the specs; they document concrete choices for
 the implementation in this repository.
 ## Glossary and Abbreviations
 | Term | Meaning |
 | --- | --- |
 | CURRENT | Effective index state after replaying the log up to a LogPosition on top of a snapshot. |
 | LogPosition | Inclusive `logseq` upper bound for replay (not a byte offset). |
 | SnapshotID | Opaque `uint64_t` identifier persisted in `SNAPSHOT_ANCHOR`. |
 | Segment seal | Log record admitting a segment via `(segment_id, segment_hash)`. |
 | Segment hash | SHA-256 over the exact on-disk segment bytes, including the footer. |
 | Tombstone | Visibility policy record applied during replay. |
 | Tombstone lift | Cancels a specific tombstone record for the same artifact. |
 | Exec plan | Serialized plan format; the executor is out of scope for the core library. |
 ## Snapshot and Log Identity (ASL/STORE-INDEX + ASL/LOG)
 Decision:
--- a/tier1/asl-core-index-1.md
+++ b/tier1/asl-core-index-1.md
@@ -61,7 +61,7 @@ ASL/1-CORE-INDEX defines the **semantic model** for indexing artifacts:
 * It specifies what it means to map an artifact identity to a byte location.
 * It defines visibility, immutability, and shadowing semantics.
-* It ensures deterministic lookup for a fixed snapshot and log prefix.
+* It ensures deterministic lookup for a fixed snapshot and log position.
 ### 1.2 Non-goals
@@ -84,9 +84,11 @@ ASL/1-CORE-INDEX explicitly does **not** define:
 * **BlockID** — opaque identifier for a block.
 * **ArtifactExtent** — `(BlockID, offset, length)` identifying a byte slice within a block.
 * **ArtifactLocation** — ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
 * **Degenerate store** — a store that treats each artifact as its own block,
  with a single extent covering the entire blob.
 * **Snapshot** — a checkpointed StoreSnapshot (ASL/1-STORE) used as a base state.
 * **Append-Only Log** — ordered sequence of index-visible mutations after a snapshot.
-* **CURRENT** — effective state after replaying a log prefix on a snapshot.
+* **CURRENT** — effective state after replaying the log up to a log position on a snapshot.
 ---
@@ -104,7 +106,7 @@ For any visible `Reference`, there is exactly one `ArtifactLocation` at a given
 ### 3.2 Determinism
-For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be deterministic. No nondeterministic input may affect index semantics.
+For a fixed `{StoreConfig, Snapshot, LogPosition}`, lookup results MUST be deterministic. No nondeterministic input may affect index semantics.
 ### 3.3 StoreConfig Consistency
@@ -123,6 +125,8 @@ All references in an index view are interpreted under a fixed StoreConfig. Imple
 * Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact.
 * An ArtifactLocation is valid only while all referenced blocks are retained.
 * ASL/1-CORE-INDEX does not define how blocks are allocated or sealed; it only requires that referenced bytes are immutable for the lifetime of the mapping.
 * In a degenerate store, an ArtifactLocation consists of a single extent that
  spans the full blob in its dedicated block.
 ---
@@ -130,11 +134,19 @@ All references in an index view are interpreted under a fixed StoreConfig. Imple
 An index entry is **visible** at CURRENT if and only if:
-1. The entry is contained in a sealed segment whose seal record is admitted in the ordered log prefix for CURRENT (or anchored in the snapshot).
-2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules).
+1. The entry is admitted by the store's visibility mechanism as defined in
+   `ASL/STORE-INDEX/1` (e.g., via sealed segments and an append-only log), for
   the given snapshot/log position.
 2. The referenced bytes are immutable (e.g., the underlying block is sealed by
   store rules).
 Visibility is binary; entries are either visible or not visible.
 **Implementation note:** A store MAY implement a degenerate visibility
 mechanism (e.g., a single implicit segment that is always sealed and a trivial
 log position), which is sufficient for simple filesystem-backed stores such as
 `asl_store_fs`.
 ---
 ## 6. Snapshot and Log Semantics
@@ -144,7 +156,7 @@ Snapshots provide a base mapping of sealed segments; the append-only log admits
 The index state for a given CURRENT is defined as:
 ```
-Index(CURRENT) = Index(snapshot) + replay(log_prefix)
+Index(CURRENT) = Index(snapshot) + replay(log_position)
 ```
 Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entries are semantically equivalent once replayed.
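The formula above can be illustrated with a minimal C sketch. It assumes a heavily simplified model in which an index view is just the set of admitted segment IDs, and `SEGMENT_SEAL` is the only record type that matters. The names `index_view_t`, `log_rec_t`, `index_current`, and `index_equal` are hypothetical and are not part of any amduat header. The sketch shows replay that is ordered, bounded by an inclusive log position, and idempotent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: an index view is reduced to the set of admitted
 * segment IDs (small integers below MAX_SEGMENTS). */
#define MAX_SEGMENTS 64

typedef struct {
  bool admitted[MAX_SEGMENTS];
} index_view_t;

/* Only SEGMENT_SEAL matters here; every other type is ignored. */
typedef enum { REC_SEGMENT_SEAL = 1, REC_OTHER = 99 } rec_type_t;

typedef struct {
  uint64_t logseq;     /* assumed strictly increasing across the log */
  rec_type_t type;
  uint32_t segment_id; /* meaningful only for REC_SEGMENT_SEAL */
} log_rec_t;

/* Index(CURRENT) = Index(snapshot) + replay(records with logseq <= log_position). */
static index_view_t index_current(const index_view_t *snapshot,
                                  const log_rec_t *recs, size_t n,
                                  uint64_t log_position) {
  index_view_t out = *snapshot; /* start from the snapshot's base mapping */
  for (size_t i = 0; i < n && recs[i].logseq <= log_position; i++) {
    if (recs[i].type != REC_SEGMENT_SEAL)
      continue;
    assert(recs[i].segment_id < MAX_SEGMENTS);
    out.admitted[recs[i].segment_id] = true; /* re-admission is a no-op */
  }
  return out;
}

/* Structural equality, used to check determinism and idempotence. */
static bool index_equal(const index_view_t *a, const index_view_t *b) {
  for (size_t i = 0; i < MAX_SEGMENTS; i++)
    if (a->admitted[i] != b->admitted[i])
      return false;
  return true;
}
```

Re-admitting a segment is a no-op, so replaying a log range that overlaps the snapshot yields the same view. This is one simple way to obtain the idempotence the spec requires.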
@@ -169,11 +181,14 @@ Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entr
 ## 8. Tombstones (Optional)
-Tombstone entries MAY be used to invalidate prior mappings.
+Tombstones MAY be used to invalidate prior mappings.
 * A tombstone shadows earlier entries for the same Reference.
-* Visibility rules are identical to regular entries.
-* Encoding is optional and defined by ENC-ASL-CORE-INDEX if used.
+* Tombstones are visibility policy records (see `ASL/LOG/1`) and are applied
+  during replay; they are not required to appear as index entries.
 * If an encoding chooses to materialize tombstones in index segments, they MUST
  have no `ArtifactLocation` and MUST follow the same visibility rules as other
  entries.
 ---
--- a/tier1/asl-log-1.md
+++ b/tier1/asl-log-1.md
@@ -261,6 +261,21 @@ To reconstruct CURRENT:
 Replay MUST be deterministic.
 ### 5.1 Example: Tombstone + Lift Across Snapshots (Informative)
 Let `R` be an artifact reference. Consider the following log sequence:
 1. `logseq = 10`: `SEGMENT_SEAL` admits a segment containing `R`.
 2. `logseq = 20`: `TOMBSTONE(R)` shadows `R`.
 3. `logseq = 30`: `SNAPSHOT_ANCHOR(snapshot_id = 7)` is recorded.
 4. `logseq = 40`: `TOMBSTONE_LIFT(R, tombstone_logseq = 20)` is recorded.
 Replay rules:
 * CURRENT at `(snapshot_id = 7, log_position = 30)` still includes the
  tombstone, so `R` is not visible: the lift at `logseq = 40` lies beyond the
  log position.
 * CURRENT at `(snapshot_id = 7, log_position = 40)` lifts the tombstone and `R`
  becomes visible again (assuming no later tombstones).
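The example above can be checked with a small replay sketch. It tracks a single reference `R` and treats the snapshot as equivalent to having replayed the records before it, since replayed snapshot and log entries are semantically equivalent. For that reason `snapshot_id` is not modeled. The record struct, `r_visible_at`, and the fixed `MAX_TOMBSTONES` bound are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record model for one reference R; the kinds mirror the
 * ASL/LOG/1 record names, but the struct layout is invented. */
typedef enum {
  SEGMENT_SEAL,    /* admits a segment containing R */
  TOMBSTONE,       /* shadows R */
  SNAPSHOT_ANCHOR, /* no effect on R's visibility */
  TOMBSTONE_LIFT   /* cancels the tombstone recorded at target_logseq */
} rec_kind_t;

typedef struct {
  uint64_t logseq;
  rec_kind_t kind;
  uint64_t target_logseq; /* used only by TOMBSTONE_LIFT */
} rec_t;

#define MAX_TOMBSTONES 16

/* Is R visible after replaying records with logseq <= log_position?
 * Records are assumed sorted by logseq. */
static bool r_visible_at(const rec_t *recs, size_t n, uint64_t log_position) {
  bool admitted = false;
  uint64_t active[MAX_TOMBSTONES]; /* logseqs of tombstones not yet lifted */
  size_t n_active = 0;
  for (size_t i = 0; i < n && recs[i].logseq <= log_position; i++) {
    switch (recs[i].kind) {
    case SEGMENT_SEAL:
      admitted = true;
      break;
    case TOMBSTONE:
      assert(n_active < MAX_TOMBSTONES);
      active[n_active++] = recs[i].logseq;
      break;
    case TOMBSTONE_LIFT:
      /* Cancel only the specific tombstone the lift names. */
      for (size_t j = 0; j < n_active; j++) {
        if (active[j] == recs[i].target_logseq) {
          active[j] = active[--n_active];
          break;
        }
      }
      break;
    case SNAPSHOT_ANCHOR:
      break;
    }
  }
  return admitted && n_active == 0;
}
```

With the four records from the example, `r_visible_at(..., 30)` is false and `r_visible_at(..., 40)` is true.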
 ---
 ## 6. Index Interaction
--- a/tier1/asl-store-index-1.md
+++ b/tier1/asl-store-index-1.md
@@ -58,6 +58,10 @@ It specifies:
 It **does not define encoding** (see `ENC/ASL-CORE-INDEX/1`) or semantic mapping (see `ASL/1-CORE-INDEX`).
 **Implementation note:** A degenerate store that skips segments/log replay (for
 example, simple filesystem backends) is non-conformant to ASL/STORE-INDEX/1 and
 is intended only for quickstart or legacy use.
 **Informative references:**
 * `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)
@@ -183,7 +187,7 @@ get(ArtifactKey, IndexState?) -> bytes | NOT_FOUND
 ### 4.5 GET Semantics
-1. Resolve `ArtifactKey -> ArtifactLocation` using `Index(snapshot, log_prefix)`.
+1. Resolve `ArtifactKey -> ArtifactLocation` using `Index(snapshot, log_position)`.
 2. If no entry exists, return `NOT_FOUND`.
 3. Otherwise, read exactly the referenced `(BlockID, offset, length)` bytes and return them verbatim.
@@ -235,6 +239,30 @@ Notes:
 * Open segments need not survive snapshot.
 * Segments below snapshot are replay anchors.
 ### 5.3.1 Segment State Machine (Informative)
 ```
 OPEN -> SEALED -> VISIBLE -> GC_ELIGIBLE
 ```
 * **OPEN:** accepting new index records; not visible.
 * **SEALED:** immutable on disk; not visible until its seal record is admitted by the log.
 * **VISIBLE:** seal record admitted by log replay; visible for lookup.
 * **GC_ELIGIBLE:** no snapshots/log positions reference the segment.
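As an informal illustration, the state machine above could be encoded as a forward-only enum. The names (`seg_state_t`, `seg_advance`, `seg_serves_lookups`, `seg_is_immutable`) are hypothetical. The sketch deliberately ignores OPEN segments that are discarded instead of sealed.

```c
#include <assert.h>
#include <stdbool.h>

/* States in the order of the informative state machine. */
typedef enum {
  SEG_OPEN,
  SEG_SEALED,
  SEG_VISIBLE,
  SEG_GC_ELIGIBLE
} seg_state_t;

/* Only single forward steps are allowed: OPEN -> SEALED -> VISIBLE -> GC_ELIGIBLE. */
static bool seg_transition_allowed(seg_state_t from, seg_state_t to) {
  return (int)to == (int)from + 1;
}

/* Advances one step; GC_ELIGIBLE is terminal. */
static seg_state_t seg_advance(seg_state_t s) {
  assert(s != SEG_GC_ELIGIBLE);
  return (seg_state_t)((int)s + 1);
}

/* Only segments whose seal record has been admitted serve lookups. */
static bool seg_serves_lookups(seg_state_t s) { return s == SEG_VISIBLE; }

/* Bytes are immutable from SEALED onward. */
static bool seg_is_immutable(seg_state_t s) { return s != SEG_OPEN; }
```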
 ### 5.4 Index/Log Bootstrap Flow (Informative)
 1. **Initialize store**: load the latest snapshot anchor (if any); otherwise
   start with an empty index.
 2. **Load sealed segments**: from snapshot metadata, locate segment files and
   verify their hashes before admitting them.
 3. **Replay log**: scan records with `logseq > snapshot.logseq`, up to and
   including the target `LogPosition`, in order, and apply `SEGMENT_SEAL`,
   tombstones, and lifts.
 4. **Compute CURRENT**: resolve visibility and shadowing to produce the
   effective index view for queries.
 This flow is deterministic and idempotent; re-running it yields the same
 CURRENT state for a fixed `(SnapshotID, LogPosition)`.
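Step 2 (verify hashes before admission) might look like the sketch below. It assumes a hypothetical `segment_file_t` that already holds the exact on-disk bytes and the recorded hash; `load_sealed_segments` is also hypothetical. The clarifications document specifies SHA-256. To keep the example dependency-free, it swaps in 64-bit FNV-1a, a non-cryptographic stand-in that must not be used for real integrity checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in digest: 64-bit FNV-1a (NOT SHA-256, NOT collision resistant). */
static uint64_t stand_in_hash(const uint8_t *p, size_t n) {
  uint64_t h = 14695981039346656037ULL; /* FNV-1a offset basis */
  for (size_t i = 0; i < n; i++) {
    h ^= p[i];
    h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
  }
  return h;
}

/* Hypothetical view of a segment file listed in snapshot metadata. */
typedef struct {
  uint32_t segment_id;
  const uint8_t *bytes;   /* exact on-disk bytes, footer included */
  size_t len;
  uint64_t expected_hash; /* hash recorded when the segment was sealed */
} segment_file_t;

/* Admits only segments whose bytes hash to the recorded value and writes
 * their IDs to admitted_ids (capacity n). Returns the number admitted.
 * Mismatched segments are skipped whole: no partial salvage. */
static size_t load_sealed_segments(const segment_file_t *segs, size_t n,
                                   uint32_t *admitted_ids) {
  assert(n == 0 || (segs != NULL && admitted_ids != NULL));
  size_t k = 0;
  for (size_t i = 0; i < n; i++) {
    if (stand_in_hash(segs[i].bytes, segs[i].len) != segs[i].expected_hash)
      continue; /* reject: never admitted for lookup or replay */
    admitted_ids[k++] = segs[i].segment_id;
  }
  return k;
}
```

Skipping a mismatched segment keeps it out of CURRENT. An implementation could instead abort the whole bootstrap; the flow above does not mandate either choice.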
 ---
 ## 7. Visibility and Lookup Semantics
@@ -260,7 +288,7 @@ To resolve an `ArtifactKey`:
 Determinism:
-* Lookup results are identical across platforms given the same snapshot and log prefix.
+* Lookup results are identical across platforms given the same snapshot and log position.
 * Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
 ---
@@ -394,6 +422,16 @@ Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
 ---
 ## 13.1 Conformance Checklist (Informative)
 * Treat any entry not admitted by replay as not visible.
 * Enforce immutability of sealed blocks and visible segments.
 * Ensure replay is deterministic and idempotent for a fixed index state.
 * Verify tombstone + lift behavior across snapshots.
 * Prevent GC of segments/blocks referenced by CURRENT or snapshots.
 ---
 ## 14. Non-Goals
 * Disk-level encoding (ENC-ASL-CORE-INDEX).
--- a/tier1/asl-system-1.md
+++ b/tier1/asl-system-1.md
@@ -94,12 +94,12 @@ All of these objects are addressed and stored via the same index semantics.
 ## 3. Determinism & Snapshot Boundaries
-For a fixed `(SnapshotID, LogPrefix)`:
+For a fixed `(SnapshotID, LogPosition)`:
 * Index lookup is deterministic (ASL/1-CORE-INDEX).
-* TGK traversal is deterministic when bounded by the same snapshot/log prefix.
+* TGK traversal is deterministic when bounded by the same snapshot/log position.
 * PEL execution is deterministic when its inputs are bounded by the same
-  snapshot/log prefix.
+  snapshot/log position.
 PEL MUST read only snapshot-scoped artifacts and receipts. It MUST NOT depend
 on storage layout, block packing, or non-snapshot metadata.
@@ -144,11 +144,11 @@ receipt annotations, not by changing the execution language.
 ## 5.1 PERs and Snapshot State (Clarification)
 PERs are artifacts that bind deterministic execution to a specific snapshot
-and log prefix. They do not introduce a separate storage layer:
+and log position. They do not introduce a separate storage layer:
 * The sequential log and snapshot define CURRENT.
-* A PER records that execution observed CURRENT at a specific log prefix.
+* A PER records that execution observed CURRENT at a specific log position.
-* Replay uses the same snapshot + log prefix to reconstruct inputs.
+* Replay uses the same snapshot + log position to reconstruct inputs.
 * PERs are artifacts and MAY be used as inputs, but programs embedded in
  receipts MUST NOT be executed implicitly.
--- a/tier1/asl-tgk-execution-plan-1.md
+++ b/tier1/asl-tgk-execution-plan-1.md
@@ -76,7 +76,7 @@ Each operator includes:
 * `op_id`: unique identifier
 * `op_type`: operator type
 * `inputs`: upstream operator outputs
-* `snapshot`: `(SnapshotID, LogPrefix)`
+* `snapshot`: `(SnapshotID, LogPosition)` (inclusive logseq upper bound)
 * `constraints`: canonical filters
 * `projections`: output fields
 * `traversal`: optional traversal parameters
@@ -122,7 +122,7 @@ Parallel execution MUST preserve this order.
 Records are visible if and only if:
-* `record.logseq <= snapshot.log_prefix`
+* `record.logseq <= snapshot.log_position`
 * The record is not shadowed by a later tombstone
 Unknown record types MUST be skipped without breaking determinism.
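A minimal sketch of the visibility rule, assuming hypothetical `idx_record_t` and `tombstone_t` shapes with a plain integer artifact key. It makes explicit one detail the rule implies: a tombstone only shadows a record if the tombstone itself falls within `log_position`. Tombstone lifts and domain matching are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal shapes; real records carry more fields. */
typedef struct {
  uint64_t artifact; /* stand-in for the artifact key */
  uint64_t logseq;
} idx_record_t;

typedef struct {
  uint64_t artifact;
  uint64_t logseq;
} tombstone_t;

/* Visible iff record.logseq <= log_position and no tombstone for the same
 * artifact with record.logseq < tombstone.logseq <= log_position exists. */
static bool record_visible(const idx_record_t *r, const tombstone_t *tombs,
                           size_t n_tombs, uint64_t log_position) {
  assert(r != NULL);
  if (r->logseq > log_position)
    return false;
  for (size_t i = 0; i < n_tombs; i++) {
    if (tombs[i].artifact == r->artifact && tombs[i].logseq > r->logseq &&
        tombs[i].logseq <= log_position)
      return false; /* shadowed by a later, in-range tombstone */
  }
  return true;
}
```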
@@ -136,7 +136,7 @@ Unknown record types MUST be skipped without breaking determinism.
 * Inputs: sealed segments
 * Outputs: raw record references
 * Rules:
-  * Only segments with `segment.logseq_min <= snapshot.log_prefix` are scanned.
+  * Only segments with `segment.logseq_min <= snapshot.log_position` are scanned.
  * Advisory filters MAY be applied but MUST NOT introduce false negatives.
  * Shard routing MAY be applied prior to scan if deterministic.
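The scan-selection rule could be sketched as a simple order-preserving filter. `seg_meta_t` and `select_scan_segments` are hypothetical. The input is assumed to contain only sealed segments, and advisory filters and shard routing are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment metadata: only the field the rule needs. */
typedef struct {
  uint32_t segment_id;
  uint64_t logseq_min; /* smallest logseq covered by the segment */
} seg_meta_t;

/* Writes IDs of segments with logseq_min <= log_position to out (capacity n),
 * preserving input order so the scan order stays deterministic. */
static size_t select_scan_segments(const seg_meta_t *segs, size_t n,
                                   uint64_t log_position, uint32_t *out) {
  assert(n == 0 || (segs != NULL && out != NULL));
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    if (segs[i].logseq_min <= log_position)
      out[k++] = segs[i].segment_id;
  return k;
}
```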
--- a/tier1/enc-asl-core-index-1.md
+++ b/tier1/enc-asl-core-index-1.md
@@ -99,6 +99,24 @@ Each index segment file is laid out as follows:
 +------------------+
 ```
 Boxed sketch:
 ```
 ┌───────────────────────┐
 │ SegmentHeader         │
 ├───────────────────────┤
 │ BloomFilter[] (opt)   │
 ├───────────────────────┤
 │ IndexRecord[]         │
 ├───────────────────────┤
 │ DigestBytes[]         │
 ├───────────────────────┤
 │ ExtentRecord[]        │
 ├───────────────────────┤
 │ SegmentFooter         │
 └───────────────────────┘
 ```
 * **SegmentHeader**: fixed-size, mandatory
 * **BloomFilter**: optional, opaque, segment-local
 * **IndexRecord[]**: array of index entries
@@ -336,6 +354,20 @@ must occur after the footer is finalized.
 * Legacy segments without federation fields are treated as local/internal (see 3.2).
 * Tombstones MUST NOT shadow artifacts from other domains; domain matching is required.
 ### 10.2 Error Handling (Normative)
 Readers MUST treat malformed segment files as invalid and MUST reject them.
 Examples include (non-exhaustive):
 * Incorrect magic/version/header size
 * Offsets not aligned or not pointing to the expected arrays
 * Out-of-range lengths or overflows in size calculations
 * CRC mismatch for the segment payload
 * Invalid federation fields or flag bits
 Rejected segments MUST NOT be admitted for lookup or replay. Implementations MAY
 surface diagnostic errors, but MUST NOT attempt partial salvage.
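Several of the rejection cases above (misaligned offsets, out-of-range lengths, size overflows) reduce to one bounds check per array. The sketch below assumes the header describes each array as an `(offset, count, elem_size)` triple and that the alignment is a power of two. The function name and parameters are hypothetical and not taken from the encoding's field list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [offset, offset + count * elem_size) lies inside a file of
 * file_len bytes and offset is a multiple of align. Every step is checked
 * so that a crafted header cannot wrap around and pass. */
static bool array_in_bounds(uint64_t file_len, uint64_t offset,
                            uint64_t count, uint64_t elem_size,
                            uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
  if (offset % align != 0)
    return false; /* misaligned offset */
  if (elem_size != 0 && count > UINT64_MAX / elem_size)
    return false; /* count * elem_size would overflow */
  uint64_t bytes = count * elem_size;
  if (offset > file_len)
    return false; /* array starts past EOF */
  if (bytes > file_len - offset)
    return false; /* array ends past EOF */
  return true;
}
```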
 ---
 ## 11. Alignment and Packing
@@ -359,6 +391,15 @@ The ENC-ASL-CORE-INDEX specification ensures:
 ---
 ## 12.1 Error Mapping (Informative)
 Decoding failures (invalid magic/version, malformed offsets, CRC mismatch,
 invalid federation fields) MUST be surfaced to callers as decode errors. The
 exact error codes are implementation-specific; examples include
 `ERR_ASL_INDEX_ENC_INVALID`, `ERR_ASL_INDEX_CRC_MISMATCH`, or a generic
 `ERR_INTEGRITY`. Encoders/decoders MUST NOT treat malformed segments as valid
 or partially recoverable.
 ## 13. Relationship to Other Layers
 | Layer              | Responsibility                                             |
--- a/tier1/enc-asl-log-1.md
+++ b/tier1/enc-asl-log-1.md
@@ -69,6 +69,16 @@ It does **not** define log semantics (see `ASL/LOG/1`).
 +----------------+
 ```
 Boxed sketch:
 ```
 ┌───────────────────────┐
 │ LogHeader             │
 ├───────────────────────┤
 │ LogRecord[]           │
 └───────────────────────┘
 ```
 * **LogHeader**: fixed-size, mandatory, begins file
 * **LogRecord[]**: append-only entries, variable number
@@ -123,6 +133,14 @@ record_hash = H(prev_record_hash || logseq || record_type || payload_len || payl
 Readers MUST skip unknown `record_type` values using `payload_len` and MUST
 continue replay without failure.
 **Error handling (normative):**
 * Malformed log headers or records (bad magic/version, truncated payload,
  invalid `payload_len`, hash-chain mismatch) MUST cause the log to be rejected
  for replay.
 * Unknown `record_type` values are the only exception: they MUST be skipped
  using `payload_len` and MUST NOT break replay determinism.
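The skip-versus-reject rule can be illustrated with a framing walk. The record layout used here (`u64 logseq | u32 record_type | u32 payload_len | payload`, little-endian) and the set of "known" types are assumptions made for the sketch. The real ENC/ASL-LOG/1 record also carries hash-chain fields, which this example does not validate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing, little-endian:
 *   u64 logseq | u32 record_type | u32 payload_len | payload[payload_len]
 * Hash-chain fields of the real encoding are omitted. */
#define REC_HDR_LEN 16u
#define KNOWN_TYPE_MAX 3u /* assumption: types 1..3 are "known" here */

static uint64_t rd_le(const uint8_t *p, unsigned n) {
  uint64_t v = 0;
  for (unsigned i = 0; i < n; i++)
    v |= (uint64_t)p[i] << (8 * i);
  return v;
}

typedef enum { WALK_OK, WALK_TRUNCATED } walk_result_t;

/* Walks every record, counting known and skipped-unknown ones. A header or
 * payload that runs past the buffer rejects the whole log; the counters are
 * meaningless in that case. */
static walk_result_t walk_log(const uint8_t *buf, size_t len,
                              size_t *known, size_t *skipped) {
  assert(buf != NULL || len == 0);
  size_t pos = 0;
  *known = *skipped = 0;
  while (pos < len) {
    if (len - pos < REC_HDR_LEN)
      return WALK_TRUNCATED; /* partial header */
    uint32_t type = (uint32_t)rd_le(buf + pos + 8, 4);
    uint64_t plen = rd_le(buf + pos + 12, 4);
    if (plen > len - pos - REC_HDR_LEN)
      return WALK_TRUNCATED; /* payload_len points past the end */
    if (type >= 1 && type <= KNOWN_TYPE_MAX)
      (*known)++;
    else
      (*skipped)++; /* unknown type: skipped via payload_len, not an error */
    pos += REC_HDR_LEN + (size_t)plen;
  }
  return WALK_OK;
}
```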
 ---
 ## 6. Record Type IDs (v1)
@@ -245,6 +263,21 @@ typedef struct {
 ---
 ## 7.1 Error Mapping (Informative)
 Decoding failures (invalid magic/version, truncated records, invalid payload
 lengths, hash-chain mismatches) MUST be surfaced to callers as decode errors.
 The exact error codes are implementation-specific; examples include
 `ERR_ASL_LOG_ENC_INVALID`, `ERR_ASL_LOG_HASH_MISMATCH`, or a generic
 `ERR_INTEGRITY`. Unknown record types are not errors and MUST be skipped.
 ## 7.2 Conformance Checklist (Informative)
 * Reject logs with invalid magic/version or truncated records.
 * Enforce hash-chain validation across all records.
 * Skip unknown record types using `payload_len` without breaking replay.
 * Treat malformed payload lengths as fatal decode errors.
 ## 8. Relationship to Other Layers
 | Layer            | Responsibility                                   |
--- a/tier1/enc-asl-tgk-exec-plan-1.md
+++ b/tier1/enc-asl-tgk-exec-plan-1.md
@@ -90,6 +90,9 @@ typedef struct {
 } snapshot_range_t;
 ```
 **Note:** `logseq_max` corresponds to the `LogPosition` upper bound referenced
 by `ASL/TGK-EXEC-PLAN/1`.
 ---
 ## 4. Operator Parameter Union