diff --git a/docs/spec-clarifications.md b/docs/spec-clarifications.md
new file mode 100644
index 0000000..3018c87
--- /dev/null
+++ b/docs/spec-clarifications.md
@@ -0,0 +1,99 @@
+# Spec Clarifications
+
+This document records implementation-level clarifications for draft Tier-1
+specs. These notes do not change the specs; they document concrete choices for
+the implementation in this repository.
+
+## Snapshot and Log Identity (ASL/STORE-INDEX + ASL/LOG)
+
+Decision:
+- LogPosition is the log sequence number (`logseq`), not a byte offset.
+- SnapshotID is an opaque store-assigned `uint64_t`, persisted in the
+  `SNAPSHOT_ANCHOR` payload.
+
+Implications:
+- `IndexState = (SnapshotID, LogPosition)` uses an inclusive logseq upper bound
+  when replaying `log[0:LogPosition]`.
+- The log's record envelope already carries `logseq`, so snapshot anchors use
+  the anchor record's `logseq` as the snapshot log position.
+- If no snapshot exists, treat SnapshotID as `0` and LogPosition as `0`.
+
+Rationale:
+- `ASL/LOG/1` defines replay and visibility in terms of `logseq` ordering.
+- `ASL/TGK-EXEC-PLAN/1` orders results by `logseq` and uses `log_prefix` bounds.
+- `ASL/STORE-INDEX/1` defines LogPosition as a monotonic integer position and
+  replay as `log[0:LogPosition]`, which maps directly to logseq.
+
+References:
+- `tier1/asl-log-1.md`
+- `tier1/enc-asl-log-1.md`
+- `tier1/asl-store-index-1.md`
+- `tier1/asl-tgk-execution-plan-1.md`
+- `tier1/enc-asl-tgk-exec-plan-1.md`
+
+## Index Segment Identity and Seals (ASL/STORE-INDEX + ASL/LOG)
+
+Decision:
+- `segment_id` is a store-local, monotonic `uint64_t` assigned when a segment is
+  created (before writing records), and persisted by naming/metadata outside the
+  segment file.
+- `segment_hash` is SHA-256 over the exact segment file bytes as stored on disk,
+  including header, records, digest bytes, extents, and footer.
+
+Implications:
+- The seal record (`SEGMENT_SEAL`) binds a specific persisted segment file to the
+  log via `(segment_id, segment_hash)`. Hashing occurs after the footer is
+  written so the hash commits to seal metadata (CRC, seal snapshot, timestamp).
+- Replay uses `segment_id` to locate the segment file and verifies
+  `segment_hash` before admitting it as visible.
+
+Rationale:
+- `ENC/ASL-LOG/1` defines the seal payload as a segment ID plus a hash of the
+  segment bytes; the log is the visibility gate, so the hash must cover the
+  complete on-disk segment.
+- `ENC/ASL-CORE-INDEX/1` does not embed a segment ID, so the ID must be an
+  external, store-managed handle (filename or catalog entry).
+
+References:
+- `tier1/asl-log-1.md`
+- `tier1/enc-asl-log-1.md`
+- `tier1/asl-store-index-1.md`
+- `tier1/enc-asl-core-index-1.md`
+
+## Tombstone Semantics (ASL/LOG + ASL/STORE-INDEX)
+
+Decision:
+- `scope` and `reason_code` are opaque metadata and do not affect shadowing.
+- A `TOMBSTONE_LIFT` cancels only the referenced tombstone record for the same
+  artifact; other tombstones for that artifact remain effective.
+
+Across snapshots:
+- Snapshots capture the effective tombstone state as of the snapshot's `logseq`.
+- Lifts recorded after a snapshot become effective only when replay reaches
+  their `logseq`.
+
+References:
+- `tier1/asl-log-1.md`
+- `tier1/asl-store-index-1.md`
+
+## Federation Fields (ENC/ASL-CORE-INDEX)
+
+Decision:
+- Version 3 encoders must always emit federation fields in both headers and
+  records. They are required, not optional, in v3.
+- Decoders accept legacy versions that omit federation fields and apply default
+  local/internal values as defined in the encoding spec.
+
+References:
+- `tier1/enc-asl-core-index-1.md`
+
+## Execution Plan Scope (ASL/TGK-EXEC-PLAN + ENC/ASL-TGK-EXEC-PLAN)
+
+Decision:
+- The implementation treats execution plans as a serialized/transport artifact
+  and semantic contract only. A plan executor is out of scope for the core
+  library.
+
+References:
+- `tier1/asl-tgk-execution-plan-1.md`
+- `tier1/enc-asl-tgk-exec-plan-1.md`
diff --git a/tier1/asl-log-1.md b/tier1/asl-log-1.md
index d0dfd49..d801591 100644
--- a/tier1/asl-log-1.md
+++ b/tier1/asl-log-1.md
@@ -172,6 +172,8 @@ Semantics:
 * Does not delete data.
 * Shadows prior visibility.
 * Applies from this logseq onward.
+* `scope` and `reason_code` are opaque to ASL/LOG/1 and MUST NOT affect
+  shadowing or replay order; they are preserved for policy/diagnostic layers.
 
 ### 4.3 TOMBSTONE_LIFT
 
@@ -191,6 +193,8 @@ Semantics:
 * References an earlier TOMBSTONE.
 * Does not erase history.
 * Only affects CURRENT at or above this logseq.
+* A lift cancels only the referenced tombstone record for the same artifact;
+  other tombstones for the artifact remain effective unless separately lifted.
 
 ### 4.4 SNAPSHOT_ANCHOR
 
diff --git a/tier1/asl-store-index-1.md b/tier1/asl-store-index-1.md
index eecda05..3383927 100644
--- a/tier1/asl-store-index-1.md
+++ b/tier1/asl-store-index-1.md
@@ -135,6 +135,11 @@ Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition
 
 Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
 
+**Implementation note (determinism):** This repository interprets `LogPosition`
+as the inclusive `logseq` upper bound defined by `ASL/LOG/1`, not a byte offset
+into the log file. Snapshot anchors use their record `logseq` as the snapshot's
+log position.
+
 ### 3.5 Artifact Location
 
 * **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
@@ -306,6 +311,13 @@ Outcome:
 * Optional marker to invalidate prior mappings.
 * Visibility rules identical to regular index entries.
 * Used to maintain deterministic CURRENT in face of shadowing or deletions.
+* `scope` and `reason_code` are policy metadata only; they do not affect
+  shadowing order or replay determinism.
+* Tombstone lifts cancel only the referenced tombstone record for the same
+  artifact; other tombstones remain effective until lifted.
+* Snapshot + log replay applies tombstones and lifts in `logseq` order; a lift
+  that occurs after a snapshot becomes effective only when replay reaches its
+  `logseq`.
 
 ---
 
diff --git a/tier1/asl-tgk-execution-plan-1.md b/tier1/asl-tgk-execution-plan-1.md
index e51931a..401fcbd 100644
--- a/tier1/asl-tgk-execution-plan-1.md
+++ b/tier1/asl-tgk-execution-plan-1.md
@@ -50,6 +50,11 @@ The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are
 
 ASL/TGK-EXEC-PLAN/1 defines execution plan semantics for querying artifacts and TGK edges. It does not define encoding, transport, or runtime scheduling.
 
+**Implementation note:** This repository treats execution plans as a serialized
+plan format and semantic contract only. A plan executor is out of scope for the
+core library; higher-level services or tooling may implement execution on top
+of the encoded plan.
+
 ---
 
 ## 1. Purpose
diff --git a/tier1/enc-asl-core-index-1.md b/tier1/enc-asl-core-index-1.md
index 062aedb..d747555 100644
--- a/tier1/enc-asl-core-index-1.md
+++ b/tier1/enc-asl-core-index-1.md
@@ -146,6 +146,16 @@ Legacy segments without federation fields MUST be treated as:
 * `has_cross_domain_source = 0`
 * `cross_domain_source = 0`
 
+**Handling rules:**
+
+* Encoders for version 3 MUST write explicit federation fields in both
+  `SegmentHeader` and `IndexRecord`; these fields are not optional in v3.
+* Decoders MUST accept older versions that omit federation fields and apply the
+  defaults above.
+* Decoders MUST reject v3 segments if federation fields are missing, malformed,
+  or contain out-of-range values (e.g., `visibility` not in {0,1} or
+  `has_cross_domain_source` not in {0,1}).
+
 ---
 
 ## 4. SegmentHeader
@@ -287,6 +297,10 @@ typedef struct {
 * CRC ensures corruption detection during reads, covering all segment contents except the footer.
 * Seal information allows deterministic reconstruction of CURRENT state.
 
+**Implementation note:** The segment file bytes are hashed for log sealing as
+defined in `ENC/ASL-LOG/1`. The hash covers the footer as written, so sealing
+must occur after the footer is finalized.
+
 ---
 
 ## 8. DigestBytes
diff --git a/tier1/enc-asl-log-1.md b/tier1/enc-asl-log-1.md
index f37716a..424b9c6 100644
--- a/tier1/enc-asl-log-1.md
+++ b/tier1/enc-asl-log-1.md
@@ -174,6 +174,13 @@ typedef struct {
 #pragma pack(pop)
 ```
 
+**Implementation note (segment identity):** In this repository, `segment_id` is
+allocated when a segment is created (before writing records) and persisted via
+store metadata (e.g., filename or catalog). The `segment_hash` is computed over
+the exact on-disk segment bytes including header, records, digest bytes,
+extents, and footer; the hash is taken after the footer is written so the seal
+commits to the footer metadata.
+
 ### 6.1.3 TOMBSTONE (Type 0x10)
 
 ```c
diff --git a/tier1/enc-asl-tgk-exec-plan-1.md b/tier1/enc-asl-tgk-exec-plan-1.md
index 4f8a0c6..6244cc4 100644
--- a/tier1/enc-asl-tgk-exec-plan-1.md
+++ b/tier1/enc-asl-tgk-exec-plan-1.md
@@ -45,6 +45,9 @@ The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are
 
 ENC/ASL-TGK-EXEC-PLAN/1 defines the byte-level encoding for serialized execution plans. It does not define operator semantics.
 
+**Implementation note:** The core library encodes/decodes plans but does not
+ship a plan executor; execution is delegated to higher-layer components.
+
 ---
 
 ## 1. Operator Type Enumeration
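The tombstone rules clarified in this patch can be illustrated with a small C sketch. In these rules, a lift cancels only the tombstone it references, and records apply in `logseq` order up to an inclusive `LogPosition`. The sketch is not part of the patch and is not the ENC/ASL-LOG/1 encoding. The record layout, the `lifted_seq` back-reference, and the function name `is_tombstoned` are hypothetical simplifications. The sketch also replays from the start of the log rather than from a snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced record model for illustration only. It carries no
 * scope or reason_code, mirroring that those fields never affect shadowing. */
typedef enum { REC_TOMBSTONE, REC_TOMBSTONE_LIFT } rec_kind;

typedef struct {
    uint64_t logseq;     /* log sequence number; strictly increasing in the log */
    rec_kind kind;
    uint64_t artifact;   /* hypothetical artifact key */
    uint64_t lifted_seq; /* lifts only: logseq of the tombstone being lifted (assumed) */
} log_rec;

/* Returns true if `artifact` is shadowed by at least one tombstone that is
 * still effective after replaying every record with logseq <= log_position
 * (inclusive upper bound). `log` must be sorted by ascending logseq. */
bool is_tombstoned(const log_rec *log, size_t n,
                   uint64_t artifact, uint64_t log_position) {
    assert(n == 0 || log != NULL);
    for (size_t i = 0; i < n && log[i].logseq <= log_position; i++) {
        const log_rec *t = &log[i];
        if (t->kind != REC_TOMBSTONE || t->artifact != artifact)
            continue;
        bool lifted = false;
        /* A lift counts only if it appears later in the log, falls within the
         * replay bound, names this exact tombstone, and targets the same artifact. */
        for (size_t j = i + 1; j < n && log[j].logseq <= log_position; j++) {
            const log_rec *l = &log[j];
            if (l->kind == REC_TOMBSTONE_LIFT && l->artifact == artifact &&
                l->lifted_seq == t->logseq) {
                lifted = true;
                break;
            }
        }
        if (!lifted)
            return true; /* one un-lifted tombstone is enough to shadow */
    }
    return false;
}
```

Consider a log with tombstones for one artifact at logseq 10 and 11, and a single lift at logseq 12 referencing tombstone 10. Here, `is_tombstoned(..., 12)` still returns true, because that lift cancels only tombstone 10. The artifact becomes visible again only once replay also reaches a lift naming tombstone 11. The nested scan is quadratic and is meant only to keep the rule readable. An index would more likely track open tombstones per artifact.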