Clarify ASL index/log semantics
parent 3886716799
commit c595e2370a
docs/spec-clarifications.md (Normal file, 99 lines)
@@ -0,0 +1,99 @@
+# Spec Clarifications
+
+This document records implementation-level clarifications for draft Tier-1
+specs. These notes do not change the specs; they document concrete choices for
+the implementation in this repository.
+
+## Snapshot and Log Identity (ASL/STORE-INDEX + ASL/LOG)
+
+Decision:
+- LogPosition is the log sequence number (`logseq`), not a byte offset.
+- SnapshotID is an opaque store-assigned `uint64_t`, persisted in the
+  `SNAPSHOT_ANCHOR` payload.
+
+Implications:
+- `IndexState = (SnapshotID, LogPosition)` uses an inclusive logseq upper bound
+  when replaying `log[0:LogPosition]`.
+- The log's record envelope already carries `logseq`, so snapshot anchors use
+  the anchor record's `logseq` as the snapshot log position.
+- If no snapshot exists, treat SnapshotID as `0` and LogPosition as `0`.
+
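The inclusive bound in the Snapshot and Log Identity decision can be illustrated with a minimal sketch. The struct and function names below are hypothetical and not taken from the repository. The sketch also assumes that the log is ordered by `logseq` and that real records start at `logseq` 1, so that a LogPosition of `0` replays nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a log record envelope; only logseq matters here. */
struct log_record {
    uint64_t logseq;
};

/* IndexState = (SnapshotID, LogPosition), with LogPosition expressed as a logseq. */
struct index_state {
    uint64_t snapshot_id;
    uint64_t log_position;
};

/* State used when no snapshot exists: SnapshotID 0 and LogPosition 0. */
static struct index_state index_state_empty(void) {
    struct index_state st = {0, 0};
    return st;
}

/* Inclusive upper bound: a record is replayed iff logseq <= LogPosition. */
static int record_is_replayed(const struct log_record *rec,
                              const struct index_state *st) {
    return rec->logseq <= st->log_position;
}

/* Count the records of a logseq-ordered log that fall inside the replay window.
 * Assumption: records are sorted by ascending logseq, so the scan can stop at
 * the first record past the bound. */
static size_t count_replayed(const struct log_record *log, size_t n,
                             const struct index_state *st) {
    size_t count = 0;
    for (size_t i = 0; i < n && record_is_replayed(&log[i], st); i++) {
        count++;
    }
    return count;
}
```

With records at `logseq` 1 through 4 and a LogPosition of 3, the first three records are replayed and the fourth is not. A byte-offset interpretation would need a different comparison, which is the ambiguity this decision removes.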
+Rationale:
+- `ASL/LOG/1` defines replay and visibility in terms of `logseq` ordering.
+- `ASL/TGK-EXEC-PLAN/1` orders results by `logseq` and uses `log_prefix` bounds.
+- `ASL/STORE-INDEX/1` defines LogPosition as a monotonic integer position and
+  replay as `log[0:LogPosition]`, which maps directly to logseq.
+
+References:
+- `tier1/asl-log-1.md`
+- `tier1/enc-asl-log-1.md`
+- `tier1/asl-store-index-1.md`
+- `tier1/asl-tgk-execution-plan-1.md`
+- `tier1/enc-asl-tgk-exec-plan-1.md`
+
+## Index Segment Identity and Seals (ASL/STORE-INDEX + ASL/LOG)
+
+Decision:
+- `segment_id` is a store-local, monotonic `uint64_t` assigned when a segment is
+  created (before writing records), and persisted by naming/metadata outside the
+  segment file.
+- `segment_hash` is SHA-256 over the exact segment file bytes as stored on disk,
+  including header, records, digest bytes, extents, and footer.
+
+Implications:
+- The seal record (`SEGMENT_SEAL`) binds a specific persisted segment file to the
+  log via `(segment_id, segment_hash)`. Hashing occurs after the footer is
+  written so the hash commits to seal metadata (CRC, seal snapshot, timestamp).
+- Replay uses `segment_id` to locate the segment file and verifies
+  `segment_hash` before admitting it as visible.
+
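The replay-side check described for segment seals (locate the file by `segment_id`, then compare hashes before admitting the segment) might look roughly like the sketch below. The filename scheme, struct layout, and function names are assumptions made for illustration. The SHA-256 computation itself is left to the caller, since the source does not name a hashing library.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SEGMENT_HASH_LEN 32 /* SHA-256 digest size in bytes */

/* Seal fields named in the text; this in-memory layout is an assumption. */
struct segment_seal {
    uint64_t segment_id;
    uint8_t segment_hash[SEGMENT_HASH_LEN];
};

/* Hypothetical naming scheme: segment_id as 16 hex digits plus ".seg".
 * Returns 0 on success, -1 if the buffer is too small. */
static int segment_file_name(uint64_t segment_id, char *buf, size_t buf_len) {
    int n = snprintf(buf, buf_len, "%016" PRIx64 ".seg", segment_id);
    return (n > 0 && (size_t)n < buf_len) ? 0 : -1;
}

/* Admit a segment only if the hash of its on-disk bytes equals the sealed
 * hash. actual_hash is expected to be SHA-256 over the whole file, computed
 * by the caller after the footer has been written. */
static int seal_admits_segment(const struct segment_seal *seal,
                               const uint8_t actual_hash[SEGMENT_HASH_LEN]) {
    return memcmp(seal->segment_hash, actual_hash, SEGMENT_HASH_LEN) == 0;
}
```

Because the hash covers the footer, a segment whose footer is rewritten after sealing would fail this comparison and stay invisible to replay.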
+Rationale:
+- `ENC/ASL-LOG/1` defines the seal payload as a segment ID plus a hash of the
+  segment bytes; the log is the visibility gate, so the hash must cover the
+  complete on-disk segment.
+- `ENC/ASL-CORE-INDEX/1` does not embed a segment ID, so the ID must be an
+  external, store-managed handle (filename or catalog entry).
+
+References:
+- `tier1/asl-log-1.md`
+- `tier1/enc-asl-log-1.md`
+- `tier1/asl-store-index-1.md`
+- `tier1/enc-asl-core-index-1.md`
+
+## Tombstone Semantics (ASL/LOG + ASL/STORE-INDEX)
+
+Decision:
+- `scope` and `reason_code` are opaque metadata and do not affect shadowing.
+- A `TOMBSTONE_LIFT` cancels only the referenced tombstone record for the same
+  artifact; other tombstones for that artifact remain effective.
+
+Across snapshots:
+- Snapshots capture the effective tombstone state as of the snapshot's `logseq`.
+- Lifts recorded after a snapshot become effective only when replay reaches
+  their `logseq`.
+
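The lift rule in the Tombstone Semantics decision can be sketched as a check for whether any tombstone on an artifact is still effective at a given log position. The record structs, the `artifact_id` stand-in, and the idea that a lift references its tombstone by `logseq` are all assumptions for illustration; the encoding spec defines the real fields. `scope` and `reason_code` are deliberately absent because they do not affect the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded records; field names are illustrative only. */
struct tombstone {
    uint64_t logseq;      /* logseq of the TOMBSTONE record */
    uint64_t artifact_id; /* stand-in for the artifact key */
};

struct tombstone_lift {
    uint64_t logseq;           /* logseq of the TOMBSTONE_LIFT record */
    uint64_t artifact_id;      /* must match the tombstone's artifact */
    uint64_t tombstone_logseq; /* assumed way to reference the tombstone */
};

/* A tombstone is lifted at log_position only by a lift that references it,
 * targets the same artifact, and has itself been reached by replay. */
static int tombstone_is_lifted(const struct tombstone *t,
                               const struct tombstone_lift *lifts,
                               size_t n_lifts, uint64_t log_position) {
    for (size_t i = 0; i < n_lifts; i++) {
        if (lifts[i].logseq <= log_position &&
            lifts[i].artifact_id == t->artifact_id &&
            lifts[i].tombstone_logseq == t->logseq) {
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if at least one tombstone for the artifact is in effect at
 * log_position (inclusive), 0 otherwise. */
static int has_effective_tombstone(uint64_t artifact_id,
                                   const struct tombstone *tombs, size_t n_tombs,
                                   const struct tombstone_lift *lifts,
                                   size_t n_lifts, uint64_t log_position) {
    for (size_t i = 0; i < n_tombs; i++) {
        if (tombs[i].artifact_id != artifact_id ||
            tombs[i].logseq > log_position) {
            continue; /* other artifact, or not yet reached by replay */
        }
        if (!tombstone_is_lifted(&tombs[i], lifts, n_lifts, log_position)) {
            return 1;
        }
    }
    return 0;
}
```

For example, with tombstones at `logseq` 10 and 20 on the same artifact and a lift at `logseq` 30 that references the first one, the artifact still has an effective tombstone at position 30, because the second tombstone was never lifted. The sketch only answers the tombstone question; whether later index entries make the artifact visible again is outside its scope.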
+References:
+- `tier1/asl-log-1.md`
+- `tier1/asl-store-index-1.md`
+
+## Federation Fields (ENC/ASL-CORE-INDEX)
+
+Decision:
+- Version 3 encoders must always emit federation fields in both headers and
+  records. They are required, not optional, in v3.
+- Decoders accept legacy versions that omit federation fields and apply default
+  local/internal values as defined in the encoding spec.
+
+References:
+- `tier1/enc-asl-core-index-1.md`
+
+## Execution Plan Scope (ASL/TGK-EXEC-PLAN + ENC/ASL-TGK-EXEC-PLAN)
+
+Decision:
+- The implementation treats execution plans as a serialized/transport artifact
+  and semantic contract only. A plan executor is out of scope for the core
+  library.
+
+References:
+- `tier1/asl-tgk-execution-plan-1.md`
+- `tier1/enc-asl-tgk-exec-plan-1.md`
@@ -172,6 +172,8 @@ Semantics:
 * Does not delete data.
 * Shadows prior visibility.
 * Applies from this logseq onward.
+* `scope` and `reason_code` are opaque to ASL/LOG/1 and MUST NOT affect
+  shadowing or replay order; they are preserved for policy/diagnostic layers.

 ### 4.3 TOMBSTONE_LIFT

@@ -191,6 +193,8 @@ Semantics:
 * References an earlier TOMBSTONE.
 * Does not erase history.
 * Only affects CURRENT at or above this logseq.
+* A lift cancels only the referenced tombstone record for the same artifact;
+  other tombstones for the artifact remain effective unless separately lifted.

 ### 4.4 SNAPSHOT_ANCHOR

@@ -135,6 +135,11 @@ Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition

 Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.

+**Implementation note (determinism):** This repository interprets `LogPosition`
+as the inclusive `logseq` upper bound defined by `ASL/LOG/1`, not a byte offset
+into the log file. Snapshot anchors use their record `logseq` as the snapshot's
+log position.
+
 ### 3.5 Artifact Location

 * **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
@@ -306,6 +311,13 @@ Outcome:
 * Optional marker to invalidate prior mappings.
 * Visibility rules identical to regular index entries.
 * Used to maintain deterministic CURRENT in face of shadowing or deletions.
+* `scope` and `reason_code` are policy metadata only; they do not affect
+  shadowing order or replay determinism.
+* Tombstone lifts cancel only the referenced tombstone record for the same
+  artifact; other tombstones remain effective until lifted.
+* Snapshot + log replay applies tombstones and lifts in `logseq` order; a lift
+  that occurs after a snapshot becomes effective only when replay reaches its
+  `logseq`.

 ---

@@ -50,6 +50,11 @@ The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are

 ASL/TGK-EXEC-PLAN/1 defines execution plan semantics for querying artifacts and TGK edges. It does not define encoding, transport, or runtime scheduling.

+**Implementation note:** This repository treats execution plans as a serialized
+plan format and semantic contract only. A plan executor is out of scope for the
+core library; higher-level services or tooling may implement execution on top
+of the encoded plan.
+
 ---

 ## 1. Purpose
@@ -146,6 +146,16 @@ Legacy segments without federation fields MUST be treated as:
 * `has_cross_domain_source = 0`
 * `cross_domain_source = 0`

+**Handling rules:**
+
+* Encoders for version 3 MUST write explicit federation fields in both
+  `SegmentHeader` and `IndexRecord`; these fields are not optional in v3.
+* Decoders MUST accept older versions that omit federation fields and apply the
+  defaults above.
+* Decoders MUST reject v3 segments if federation fields are missing, malformed,
+  or contain out-of-range values (e.g., `visibility` not in {0,1} or
+  `has_cross_domain_source` not in {0,1}).
+
 ---

 ## 4. SegmentHeader
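The range check and legacy defaults in the handling rules above could be sketched as follows. The struct, its field widths, and the function names are hypothetical; the real layout is defined by `ENC/ASL-CORE-INDEX/1`. Only the two legacy defaults visible in this hunk are applied, and any other default (such as the one for `visibility`) is left to the encoding spec.

```c
#include <stdint.h>

/* Hypothetical decoded federation fields; widths are assumptions. */
struct federation_fields {
    uint8_t visibility;
    uint8_t has_cross_domain_source;
    uint32_t cross_domain_source;
};

/* Returns 1 if the decoded values are in range for a v3 segment, 0 if the
 * segment must be rejected. Both flags are limited to {0,1}. */
static int federation_fields_valid_v3(const struct federation_fields *f) {
    return f->visibility <= 1 && f->has_cross_domain_source <= 1;
}

/* Apply the legacy defaults shown in the excerpt for segments that omit
 * federation fields. Other fields are left untouched on purpose. */
static void federation_apply_legacy_defaults(struct federation_fields *f) {
    f->has_cross_domain_source = 0;
    f->cross_domain_source = 0;
}
```

A decoder along these lines would call the validity check only for v3 segments and the defaults helper only for older versions that omit the fields; a v3 segment with missing fields is rejected before either function is reached.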
@@ -287,6 +297,10 @@ typedef struct {
 * CRC ensures corruption detection during reads, covering all segment contents except the footer.
 * Seal information allows deterministic reconstruction of CURRENT state.

+**Implementation note:** The segment file bytes are hashed for log sealing as
+defined in `ENC/ASL-LOG/1`. The hash covers the footer as written, so sealing
+must occur after the footer is finalized.
+
 ---

 ## 8. DigestBytes
@@ -174,6 +174,13 @@ typedef struct {
 #pragma pack(pop)
 ```

+**Implementation note (segment identity):** In this repository, `segment_id` is
+allocated when a segment is created (before writing records) and persisted via
+store metadata (e.g., filename or catalog). The `segment_hash` is computed over
+the exact on-disk segment bytes including header, records, digest bytes,
+extents, and footer; the hash is taken after the footer is written so the seal
+commits to the footer metadata.
+
 ### 6.1.3 TOMBSTONE (Type 0x10)

 ```c
@@ -45,6 +45,9 @@ The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are

 ENC/ASL-TGK-EXEC-PLAN/1 defines the byte-level encoding for serialized execution plans. It does not define operator semantics.

+**Implementation note:** The core library encodes/decodes plans but does not
+ship a plan executor; execution is delegated to higher-layer components.
+
 ---

 ## 1. Operator Type Enumeration