diff --git a/notes/ASL-INDEX-ACCEL.md b/notes/ASL-INDEX-ACCEL.md
index 67a0aaf..1f750d8 100644
--- a/notes/ASL-INDEX-ACCEL.md
+++ b/notes/ASL-INDEX-ACCEL.md
@@ -37,7 +37,7 @@ It does **not** define:
 * Artifact or edge identity
 * Snapshot semantics
 * Storage lifecycle
-* Encoding details (see ENC-ASL-CORE-INDEX)
+* Encoding details (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`)

 ---

@@ -284,4 +284,3 @@ If you want, the **next natural step** would be to:
 * Or revise **ASL-CORE-INDEX** to reference ASL-INDEX-ACCEL normatively

 Both would lock the model completely.
-
diff --git a/notes/ENC-ASL-LOG v0.1.md b/notes/ENC-ASL-LOG v0.1.md
index 5cb307c..b75fd19 100644
--- a/notes/ENC-ASL-LOG v0.1.md
+++ b/notes/ENC-ASL-LOG v0.1.md
@@ -180,7 +180,7 @@ typedef struct {
 | ------------------ | ----------------------------------------------- |
 | ASL-STORE | Lifecycle, visibility, snapshot enforcement |
 | ASL-CORE-INDEX | Index semantics (ArtifactKey → BlockID mapping) |
-| ENC-ASL-CORE-INDEX | Exact byte layout for index segments |
+| ENC-ASL-CORE-INDEX | Exact byte layout for index segments (`tier1/enc-asl-core-index.md`) |
 | ENC-ASL-LOG | Exact byte layout for append-only log |

 ---
@@ -199,4 +199,3 @@ You could now use it to:
 If you want, I can **draft a minimal C writer/reader API** next that directly reads/writes this log, including helpers for `add_index_segment()`, `seal_segment()`, and `tombstone()` with payload serialization.

 Do you want me to do that next?
-
diff --git a/notes/all.txt b/notes/all.txt
index 89cf787..a6bc6b1 100644
--- a/notes/all.txt
+++ b/notes/all.txt
@@ -1183,223 +1183,7 @@ Design GC and snapshot pruning safely

 You’re asking the right questions — this is already “systems-level” design, not tinkering.

-Maybe this clear something up:
-# ASL-STORE-INDEX
-
-### Store Semantics and Contracts for ASL Index
-
----
-
-## 1. Purpose
-
-This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.
-
-It bridges the gap between **index meaning** and **physical storage**, ensuring:
-
-* Deterministic replay
-* Snapshot-aware visibility
-* Immutable block guarantees
-* Idempotent recovery
-* Correctness of CURRENT state
-
-It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).
-
----
-
-## 2. Scope
-
-This specification covers:
-
-* Index segment lifecycle
-* Interaction between index and ASL blocks
-* Append-only log semantics
-* Snapshot integration
-* Visibility and lookup rules
-* Crash safety and recovery
-* Garbage collection constraints
-
-It does **not** cover:
-
-* Disk format details
-* Bloom filter algorithms
-* File system specifics
-* Placement heuristics beyond semantic guarantees
-
----
-
-## 3. Core Concepts
-
-### 3.1 Index Segment
-
-A **segment** is a contiguous set of index entries written by the store.
-
-* Open while accepting new entries
-* Sealed when closed for append
-* Sealed segments are immutable
-* Sealed segments are **snapshot-visible only after log record**
-
-Segments are the **unit of persistence, replay, and GC**.
-
----
-
-### 3.2 ASL Block Relationship
-
-Each index entry references a **sealed block** via:
-
-
-ArtifactKey → (BlockID, offset, length)
-
-
-* The store must ensure the block is sealed before the entry becomes log-visible
-* Blocks are immutable after seal
-* Open blocks may be abandoned without violating invariants
-
----
-
-### 3.3 Append-Only Log
-
-All store-visible mutations are recorded in a **strictly ordered, append-only log**:
-
-* Entries include index additions, tombstones, and segment seals
-* Log is durable and replayable
-* Log defines visibility above checkpoint snapshots
-
-**CURRENT state** is derived as:
-
-
-CURRENT = checkpoint_state + replay(log)
-
-
----
-
-## 4. Segment Lifecycle
-
-### 4.1 Creation
-
-* Open segment is allocated
-* Index entries appended in log order
-* Entries are invisible until segment seal and log append
-
-### 4.2 Seal
-
-* Segment is closed to append
-* Seal record is written to append-only log
-* Segment becomes visible for lookup
-* Sealed segment may be snapshot-pinned
-
-### 4.3 Snapshot Interaction
-
-* Snapshots capture sealed segments
-* Open segments need not survive snapshot
-* Segments below snapshot are replay anchors
-
-### 4.4 Garbage Collection
-
-* Only **sealed and unreachable segments** can be deleted
-* GC operates at segment granularity
-* GC must not break CURRENT or violate invariants
-
----
-
-## 5. Lookup Semantics
-
-To resolve an ArtifactKey:
-
-1. Identify all visible segments ≤ CURRENT
-2. Search segments in **reverse creation order** (newest first)
-3. Return the first matching entry
-4. Respect tombstone entries (if present)
-
-Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.
-
----
-
-## 6. Visibility Guarantees
-
-* Entry visible **iff**:
-
-  * The block is sealed
-  * Log record exists ≤ CURRENT
-  * Segment seal recorded in log
-* Entries above CURRENT or referencing unsealed blocks are invisible
-
----
-
-## 7. Crash and Recovery Semantics
-
-### 7.1 Crash During Open Segment
-
-* Open segments may be lost
-* Index entries may be leaked
-* No sealed segment may be corrupted
-
-### 7.2 Recovery Procedure
-
-1. Mount latest checkpoint snapshot
-2. Replay append-only log from checkpoint
-3. Rebuild CURRENT
-4. Resume normal operation
-
-Recovery must be **deterministic and idempotent**.
-
----
-
-## 8. Tombstone Semantics
-
-* Optional: tombstones may exist to invalidate prior mappings
-* Tombstones shadow prior entries with the same ArtifactKey
-* Tombstone visibility follows same rules as regular entries
-
----
-
-## 9. Invariants (Normative)
-
-The store **must enforce**:
-
-1. No segment visible without seal log record
-2. No mutation of sealed segment or block
-3. Shadowing follows log order strictly
-4. Replay uniquely reconstructs CURRENT
-5. GC does not remove segments referenced by snapshot or log
-6. ArtifactLocation always points to immutable bytes
-
----
-
-## 10. Non-Goals
-
-ASL-STORE-INDEX does **not** define:
-
-* Disk layout or encoding (ENC-ASL-CORE-INDEX)
-* Placement heuristics (small vs. large block packing)
-* Performance targets
-* Memory caching strategies
-* Federation or provenance mechanics
-
----
-
-## 11. Relationship to Other Documents
-
-| Layer | Responsibility |
-| ------------------ | -------------------------------------------------------------------- |
-| ASL-CORE-INDEX | Defines semantic meaning of mapping ArtifactKey → ArtifactLocation |
-| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
-| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |
-
----
-
-## 12. Summary
-
-The store-index layer guarantees:
-
-* Immutable, snapshot-safe segments
-* Deterministic and idempotent replay
-* Correct visibility semantics
-* Safe crash recovery
-* Garbage collection constraints
-
-This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.
-
-Here’s a **fully refined version of ASL-STORE-INDEX**, incorporating **block lifecycle, sealing, snapshot safety, retention, and GC rules**, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.
+Canonical spec (refined, replaces earlier draft):

 ---

diff --git a/notes/asl-core-index.md b/notes/asl-core-index.md
deleted file mode 100644
index 100d25a..0000000
--- a/notes/asl-core-index.md
+++ /dev/null
@@ -1,245 +0,0 @@
-# ASL-CORE-INDEX
-
-### Semantic Addendum to ASL-CORE
-
----
-
-## 1. Purpose
-
-This document defines the **semantic model of the ASL index**, extending ASL-CORE artifact semantics to include **mapping artifacts to storage locations**.
-
-The ASL index provides a **deterministic, snapshot-relative mapping** from artifact identities to byte locations within **immutable storage blocks**.
-
-It specifies **what the index means**, not:
-
-* How the index is stored or encoded
-* How blocks are allocated or packed
-* Performance optimizations
-* Garbage collection or memory strategies
-
-Those are handled by:
-
-* **ASL-STORE-INDEX** (store semantics and contracts)
-* **ENC-ASL-CORE-INDEX** (bytes-on-disk encoding)
-
----
-
-## 2. Scope
-
-This document defines:
-
-* Logical structure of index entries
-* Visibility rules
-* Snapshot and log interaction
-* Immutability and shadowing semantics
-* Determinism guarantees
-* Required invariants
-
-It does **not** define:
-
-* On-disk formats
-* Index segmentation or sharding
-* Bloom filters or probabilistic structures
-* Memory residency
-* Performance targets
-
----
-
-## 3. Terminology
-
-* **Artifact**: An immutable sequence of bytes managed by ASL.
-* **ArtifactKey**: Opaque identifier for an artifact (typically a hash).
-* **Block**: Immutable storage unit containing artifact bytes.
-* **BlockID**: Opaque, unique identifier for a block.
-* **ArtifactLocation**: Tuple `(BlockID, offset, length)` identifying bytes within a block.
-* **Snapshot**: Checkpoint capturing a consistent base state of ASL-managed storage and metadata.
-* **Append-Only Log**: Strictly ordered log of index-visible mutations occurring after a snapshot.
-* **CURRENT**: The effective system state obtained by replaying the append-only log on top of a checkpoint snapshot.
-
----
-
-## 4. Block Semantics
-
-ASL-CORE introduces **blocks** minimally:
-
-1. Blocks are **existential storage atoms** for artifact bytes.
-2. Each block is uniquely identified by a **BlockID**.
-3. Blocks are **immutable once sealed**.
-4. Addressing: `(BlockID, offset, length) → bytes`.
-5. No block layout, allocation, packing, or size semantics are defined at the core level.
-
----
-
-## 5. Core Semantic Mapping
-
-The ASL index defines a **total mapping**:
-
-```
-ArtifactKey → ArtifactLocation
-```
-
-Semantic guarantees:
-
-* Each visible `ArtifactKey` maps to exactly one `ArtifactLocation`.
-* Mapping is **immutable once visible**.
-* Mapping is **snapshot-relative**.
-* Mapping is **deterministic** given `(snapshot, log prefix)`.
-
----
-
-## 6. ArtifactLocation Semantics
-
-* `block_id` references an ASL block.
-* `offset` and `length` define bytes within the block.
-* Only valid for the lifetime of the referenced block.
-* No interpretation of bytes is implied.
-
----
-
-## 7. Visibility Model
-
-An index entry is **visible** if and only if:
-
-1. The referenced block is sealed.
-2. A corresponding log record exists.
-3. The log record is ≤ CURRENT replay position.
-
-**Consequences**:
-
-* Entries referencing unsealed blocks are invisible.
-* Entries above CURRENT are invisible.
-* Visibility is binary (no gradual exposure).
-
----
-
-## 8. Snapshot and Log Semantics
-
-* Snapshots act as **checkpoints**, not full state representations.
-* Index state at any time:
-
-```
-Index(CURRENT) = Index(snapshot) + replay(log)
-```
-
-* Replay is strictly ordered, deterministic, and idempotent.
-* Snapshot and log entries are semantically equivalent once replayed.
-
----
-
-## 9. Immutability and Shadowing
-
-### 9.1 Immutability
-
-* Index entries are never mutated.
-* Once visible, an entry’s meaning never changes.
-* Blocks referenced by entries are immutable.
-
-### 9.2 Shadowing
-
-* Later entries may shadow earlier entries with the same `ArtifactKey`.
-* Precedence is determined by log order.
-* Snapshot boundaries do not alter shadowing semantics.
-
----
-
-## 10. Tombstones (Optional)
-
-* Tombstone entries are allowed to invalidate prior mappings.
-* Semantics:
-
-  * Shadows previous entries for the same `ArtifactKey`.
-  * Visibility follows the same rules as regular entries.
-* Existence and encoding of tombstones are optional.
-
----
-
-## 11. Determinism Guarantees
-
-For fixed:
-
-* Snapshot
-* Log prefix
-* ASL configuration
-* Hash algorithm
-
-The index guarantees:
-
-* Deterministic lookup results
-* Deterministic shadowing resolution
-* Deterministic visibility
-
-No nondeterministic input may influence index semantics.
-
----
-
-## 12. Separation of Concerns
-
-* **ASL-CORE**: Defines artifact semantics and the existence of blocks as storage atoms.
-* **ASL-CORE-INDEX**: Defines how artifact keys map to blocks, offsets, and lengths.
-* **ASL-STORE-INDEX**: Defines lifecycle, replay, and visibility guarantees.
-* **ENC-ASL-CORE-INDEX**: Defines exact bytes-on-disk representation.
-
-Index semantics **do not** prescribe:
-
-* Block allocation
-* Packing strategies
-* Performance optimizations
-* Memory residency or caching
-
----
-
-## 13. Normative Invariants
-
-All conforming implementations must enforce:
-
-1. No visibility without a log record.
-2. No mutation of visible index entries.
-3. No mutation of sealed blocks.
-4. Shadowing follows strict log order.
-5. Replay of snapshot + log uniquely defines CURRENT.
-6. ArtifactLocation always resolves to immutable bytes.
-
-Violation of any invariant constitutes index corruption.
-
----
-
-## 14. Non-Goals (Explicit)
-
-ASL-CORE-INDEX does **not** define:
-
-* Disk layout or encoding
-* Segment structure, sharding, or bloom filters
-* GC policies or memory management
-* Small vs. large block packing
-* Federation or provenance mechanics
-
----
-
-## 15. Relationship to Other Specifications
-
-| Layer | Responsibility |
-| ------------------ | ---------------------------------------------------------- |
-| ASL-CORE | Defines artifact semantics and existence of blocks |
-| ASL-CORE-INDEX | Defines semantic mapping of ArtifactKey → ArtifactLocation |
-| ASL-STORE-INDEX | Defines store contracts to realize index semantics |
-| ENC-ASL-CORE-INDEX | Defines exact encoding on disk |
-
----
-
-## 16. Summary
-
-The ASL index:
-
-* Maps artifact identities to block locations deterministically
-* Is immutable once entries are visible
-* Resolves visibility via snapshots + append-only log
-* Supports optional tombstones
-* Provides a stable substrate for store, encoding, and higher layers like PEL
-
-It answers **exactly one question**:
-
-> *Given an artifact identity and a point in time, where are the bytes?*
-
-Nothing more, nothing less.
-
-
diff --git a/notes/asl-federation.md b/notes/asl-federation.md
index d4f14f8..6c06121 100644
--- a/notes/asl-federation.md
+++ b/notes/asl-federation.md
@@ -138,7 +138,7 @@ It ensures **determinism, traceability, and reproducibility** across federated d
 | ASL-CORE | Blocks and artifacts remain immutable; no change |
 | ASL-CORE-INDEX | Artifact → Block mapping is domain-local; published artifacts are indexed across domains |
 | ASL-STORE-INDEX | Sealing, retention, and snapshot pinning apply per domain; GC respects cross-domain references |
-| ENC-ASL-CORE-INDEX | Encoding of index entries may include domain and visibility flags for federation |
+| ENC-ASL-CORE-INDEX | Encoding of index entries may include domain and visibility flags for federation (`tier1/enc-asl-core-index.md`) |
 | PEL | DAG execution may include imported artifacts; determinism guaranteed per domain snapshot |
 | PEL-PROV / PEL-TRACE | Maintains provenance including cross-domain artifact lineage |

@@ -155,5 +155,3 @@ The Federation Specification formalizes:
 * Integration with index, store, PEL, and provenance layers

 It ensures **multi-domain determinism, traceability, and reproducibility** while leaving semantics and storage-layer policies unchanged.
-
-
diff --git a/notes/asl-store-index.md b/notes/asl-store-index.md
deleted file mode 100644
index 2fca44a..0000000
--- a/notes/asl-store-index.md
+++ /dev/null
@@ -1,439 +0,0 @@
-# ASL-STORE-INDEX
-
-### Store Semantics and Contracts for ASL Index
-
----
-
-## 1. Purpose
-
-This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.
-
-It bridges the gap between **index meaning** and **physical storage**, ensuring:
-
-* Deterministic replay
-* Snapshot-aware visibility
-* Immutable block guarantees
-* Idempotent recovery
-* Correctness of CURRENT state
-
-It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).
-
----
-
-## 2. Scope
-
-This specification covers:
-
-* Index segment lifecycle
-* Interaction between index and ASL blocks
-* Append-only log semantics
-* Snapshot integration
-* Visibility and lookup rules
-* Crash safety and recovery
-* Garbage collection constraints
-
-It does **not** cover:
-
-* Disk format details
-* Bloom filter algorithms
-* File system specifics
-* Placement heuristics beyond semantic guarantees
-
----
-
-## 3. Core Concepts
-
-### 3.1 Index Segment
-
-A **segment** is a contiguous set of index entries written by the store.
-
-* Open while accepting new entries
-* Sealed when closed for append
-* Sealed segments are immutable
-* Sealed segments are **snapshot-visible only after log record**
-
-Segments are the **unit of persistence, replay, and GC**.
-
----
-
-### 3.2 ASL Block Relationship
-
-Each index entry references a **sealed block** via:
-
-```
-ArtifactKey → (BlockID, offset, length)
-```
-
-* The store must ensure the block is sealed before the entry becomes log-visible
-* Blocks are immutable after seal
-* Open blocks may be abandoned without violating invariants
-
----
-
-### 3.3 Append-Only Log
-
-All store-visible mutations are recorded in a **strictly ordered, append-only log**:
-
-* Entries include index additions, tombstones, and segment seals
-* Log is durable and replayable
-* Log defines visibility above checkpoint snapshots
-
-**CURRENT state** is derived as:
-
-```
-CURRENT = checkpoint_state + replay(log)
-```
-
----
-
-## 4. Segment Lifecycle
-
-### 4.1 Creation
-
-* Open segment is allocated
-* Index entries appended in log order
-* Entries are invisible until segment seal and log append
-
-### 4.2 Seal
-
-* Segment is closed to append
-* Seal record is written to append-only log
-* Segment becomes visible for lookup
-* Sealed segment may be snapshot-pinned
-
-### 4.3 Snapshot Interaction
-
-* Snapshots capture sealed segments
-* Open segments need not survive snapshot
-* Segments below snapshot are replay anchors
-
-### 4.4 Garbage Collection
-
-* Only **sealed and unreachable segments** can be deleted
-* GC operates at segment granularity
-* GC must not break CURRENT or violate invariants
-
----
-
-## 5. Lookup Semantics
-
-To resolve an `ArtifactKey`:
-
-1. Identify all visible segments ≤ CURRENT
-2. Search segments in **reverse creation order** (newest first)
-3. Return the first matching entry
-4. Respect tombstone entries (if present)
-
-Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.
-
----
-
-## 6. Visibility Guarantees
-
-* Entry visible **iff**:
-
-  * The block is sealed
-  * Log record exists ≤ CURRENT
-  * Segment seal recorded in log
-* Entries above CURRENT or referencing unsealed blocks are invisible
-
----
-
-## 7. Crash and Recovery Semantics
-
-### 7.1 Crash During Open Segment
-
-* Open segments may be lost
-* Index entries may be leaked
-* No sealed segment may be corrupted
-
-### 7.2 Recovery Procedure
-
-1. Mount latest checkpoint snapshot
-2. Replay append-only log from checkpoint
-3. Rebuild CURRENT
-4. Resume normal operation
-
-Recovery must be **deterministic and idempotent**.
-
----
-
-## 8. Tombstone Semantics
-
-* Optional: tombstones may exist to invalidate prior mappings
-* Tombstones shadow prior entries with the same `ArtifactKey`
-* Tombstone visibility follows same rules as regular entries
-
----
-
-## 9. Invariants (Normative)
-
-The store **must enforce**:
-
-1. No segment visible without seal log record
-2. No mutation of sealed segment or block
-3. Shadowing follows log order strictly
-4. Replay uniquely reconstructs CURRENT
-5. GC does not remove segments referenced by snapshot or log
-6. ArtifactLocation always points to immutable bytes
-
----
-
-## 10. Non-Goals
-
-ASL-STORE-INDEX does **not** define:
-
-* Disk layout or encoding (ENC-ASL-CORE-INDEX)
-* Placement heuristics (small vs. large block packing)
-* Performance targets
-* Memory caching strategies
-* Federation or provenance mechanics
-
----
-
-## 11. Relationship to Other Documents
-
-| Layer | Responsibility |
-| ------------------ | -------------------------------------------------------------------- |
-| ASL-CORE-INDEX | Defines semantic meaning of mapping `ArtifactKey → ArtifactLocation` |
-| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
-| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |
-
----
-
-## 12. Summary
-
-The store-index layer guarantees:
-
-* Immutable, snapshot-safe segments
-* Deterministic and idempotent replay
-* Correct visibility semantics
-* Safe crash recovery
-* Garbage collection constraints
-
-This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.
-
-Here’s a **fully refined version of ASL-STORE-INDEX**, incorporating **block lifecycle, sealing, snapshot safety, retention, and GC rules**, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.
-
----
-
-# ASL-STORE-INDEX
-
-### Store Semantics and Contracts for ASL Core Index (Refined)
-
----
-
-## 1. Purpose
-
-This document defines the **operational and store-level semantics** necessary to implement ASL-CORE-INDEX.
-
-It specifies:
-
-* **Block lifecycle**: creation, sealing, retention
-* **Index segment lifecycle**: creation, append, seal, visibility
-* **Snapshot interaction**: pinning, deterministic visibility
-* **Append-only log semantics**
-* **Garbage collection rules**
-
-It **does not define encoding** (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).
-
----
-
-## 2. Scope
-
-Covers:
-
-* Lifecycle of **blocks** and **index entries**
-* Snapshot and CURRENT consistency guarantees
-* Deterministic replay and recovery
-* GC and tombstone semantics
-
-Excludes:
-
-* Disk-level encoding
-* Sharding strategies
-* Bloom filters or acceleration structures
-* Memory residency or caching
-* Federation or PEL semantics
-
----
-
-## 3. Core Concepts
-
-### 3.1 Block
-
-* **Definition:** Immutable storage unit containing artifact bytes.
-* **Identifier:** BlockID (opaque, unique)
-* **Properties:**
-
-  * Once sealed, contents never change
-  * Can be referenced by multiple artifacts
-  * May be pinned by snapshots for retention
-* **Lifecycle Events:**
-
-  1. Creation: block allocated but contents may still be written
-  2. Sealing: block is finalized, immutable, and log-visible
-  3. Retention: block remains accessible while pinned by snapshots or needed by CURRENT
-  4. Garbage collection: block may be deleted if no longer referenced and unpinned
-
----
-
-### 3.2 Index Segment
-
-Segments group index entries and provide **persistence and recovery units**.
-
-* **Open segment:** accepting new index entries, not visible for lookup
-* **Sealed segment:** closed for append, log-visible, snapshot-pinnable
-* **Segment components:** header, optional bloom filter, index records, footer
-* **Segment visibility:** only after seal and log append
-
----
-
-### 3.3 Append-Only Log
-
-All store operations affecting index visibility are recorded in a **strictly ordered, append-only log**:
-
-* Entries include:
-
-  * Index additions
-  * Tombstones
-  * Segment seals
-* Log is replayable to reconstruct CURRENT
-* Determinism: replay produces identical CURRENT from same snapshot and log prefix
-
----
-
-## 4. Block Lifecycle Semantics
-
-| Event | Description | Semantic Guarantees |
-| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
-| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
-| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
-| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
-| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
-
-**Notes:**
-
-* Sealing ensures that any index entry referencing the block is deterministic and immutable.
-* Retention is driven by snapshot and log visibility rules.
-* GC must **never violate CURRENT reconstruction guarantees**.
-
----
-
-## 5. Snapshot Interaction
-
-* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
-* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
-* CURRENT is reconstructed as:
-
-```
-CURRENT = snapshot_state + replay(log)
-```
-
-* Segment and block visibility rules:
-
-| Entity | Visible in snapshot | Visible in CURRENT |
-| -------------------- | ---------------------------- | ------------------------------ |
-| Open segment/block | No | Only after seal and log append |
-| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
-| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
-
----
-
-## 6. Index Lookup Semantics
-
-To resolve an `ArtifactKey`:
-
-1. Identify all visible segments ≤ CURRENT
-2. Search segments in **reverse creation order** (newest first)
-3. Return first matching entry
-4. Respect tombstones to shadow prior entries
-
-Determinism:
-
-* Lookup results are identical across platforms given the same snapshot and log prefix
-* Accelerations (bloom filters, sharding, SIMD) do **not alter correctness**
-
----
-
-## 7. Garbage Collection
-
-* **Eligibility for GC:**
-
-  * Segments: sealed, no references from CURRENT or snapshots
-  * Blocks: unpinned, unreferenced by any segment or artifact
-* **Rules:**
-
-  * GC is safe **only on sealed segments and blocks**
-  * Must respect snapshot pins
-  * Tombstones may aid in invalidating unreachable blocks
-* **Outcome:**
-
-  * GC never violates CURRENT reconstruction
-  * Blocks can be reclaimed without breaking provenance
-
----
-
-## 8. Tombstone Semantics
-
-* Optional marker to invalidate prior mappings
-* Visibility rules identical to regular index entries
-* Used to maintain deterministic CURRENT in face of shadowing or deletions
-
----
-
-## 9. Crash and Recovery Semantics
-
-* Open segments or unsealed blocks may be lost; no invariant is broken
-* Recovery procedure:
-
-  1. Mount last checkpoint snapshot
-  2. Replay append-only log
-  3. Reconstruct CURRENT
-* Recovery is **deterministic and idempotent**
-* Segments and blocks **never partially visible** after crash
-
----
-
-## 10. Normative Invariants
-
-1. Sealed blocks are immutable
-2. Index entries referencing blocks are immutable once visible
-3. Shadowing follows strict log order
-4. Replay of snapshot + log uniquely reconstructs CURRENT
-5. GC cannot remove blocks or segments needed by snapshot or CURRENT
-6. Tombstones shadow prior entries without deleting underlying blocks prematurely
-
----
-
-## 11. Non-Goals
-
-* Disk-level encoding (ENC-ASL-CORE-INDEX)
-* Memory layout or caching
-* Sharding or performance heuristics
-* Federation / multi-domain semantics (handled elsewhere)
-* Block packing strategies (small vs large blocks)
-
----
-
-## 12. Relationship to Other Layers
-
-| Layer | Responsibility |
-| ------------------ | ---------------------------------------------------------------------------- |
-| ASL-CORE | Artifact semantics, existence of blocks, immutability |
-| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
-| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
-| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
-
----
-
-## 13. Summary
-
-The refined ASL-STORE-INDEX:
-
-* Defines **block lifecycle**: creation, sealing, retention, GC
-* Ensures **snapshot safety** and deterministic visibility
-* Guarantees **immutable, replayable, and recoverable CURRENT**
-* Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics
-
-
diff --git a/notes/enc-asl-core-index-addendum-federation-encoding.md b/notes/enc-asl-core-index-addendum-federation-encoding.md
index a374abb..f4251d9 100644
--- a/notes/enc-asl-core-index-addendum-federation-encoding.md
+++ b/notes/enc-asl-core-index-addendum-federation-encoding.md
@@ -1,5 +1,7 @@
 # ENC-ASL-CORE-INDEX ADDENDUM: Federation Encoding

+Base spec: `tier1/enc-asl-core-index.md`
+
 ---

 ## 1. Purpose
@@ -109,5 +111,3 @@ This addendum updates **ENC-ASL-CORE-INDEX** to support **federation**:
 * Maintains backward compatibility with legacy segments

 It integrates federation metadata **without altering the underlying block or artifact encoding**, preserving deterministic execution and PEL provenance.
-
-
diff --git a/notes/overview.md b/notes/overview.md
index f91c927..ff0b616 100644
--- a/notes/overview.md
+++ b/notes/overview.md
@@ -33,7 +33,7 @@
 │ • Block sealing │
 │ • Retention / GC │
 │ • Small/Large packing │
-│ - ENC-ASL-CORE-INDEX │
+│ - ENC-ASL-CORE-INDEX (tier1/enc-asl-core-index.md) │
 │ • On-disk record layout│
 │ • Domain / visibility │
 └─────────────┬──────────────┘
@@ -143,5 +143,3 @@ This diagram and flow description captures:
 * Deterministic reconstruction from **checkpoint + append-only log**
 * Block semantics, small/large handling, and domain visibility
 * Integration of **execution receipts** into artifact flows and traces
-
-
diff --git a/notes/typetag-patch.md b/notes/typetag-patch.md
index 33692df..55481c1 100644
--- a/notes/typetag-patch.md
+++ b/notes/typetag-patch.md
@@ -182,7 +182,7 @@ ASL-CORE-INDEX
 ASL-STORE-INDEX
 └─ Store lifecycle & snapshot safety
 
-ENC-ASL-CORE-INDEX
+ENC-ASL-CORE-INDEX (tier1/enc-asl-core-index.md)
 └─ Bytes-on-disk encoding
 
 ASL-INDEX-ACCEL ← NEW
@@ -214,4 +214,3 @@ If you want next, I can:
 
 * Draft **ASL-INDEX-ACCEL**
 * Or rewrite **ASL-CORE-INDEX with Canonical vs Routing fully integrated**
-
diff --git a/tier1/asl-core-index.md b/tier1/asl-core-index.md
new file mode 100644
index 0000000..22d9290
--- /dev/null
+++ b/tier1/asl-core-index.md
@@ -0,0 +1,211 @@
+# ASL/1-CORE-INDEX — Semantic Index Model
+
+Status: Draft
+Owner: Niklas Rydberg
+Version: 0.1.0
+SoT: No
+Last Updated: 2025-11-16
+Tags: [deterministic, index, semantics]
+
+**Document ID:** `ASL/1-CORE-INDEX`
+**Layer:** L0.5 — Semantic mapping over ASL/1-CORE values (no storage / encoding / lifecycle)
+
+**Depends on (normative):**
+
+* `ASL/1-CORE`
+* `ASL/1-STORE`
+
+**Informative references:**
+
+* `ASL-STORE-INDEX` — store lifecycle and replay contracts
+* `ENC-ASL-CORE-INDEX` — bytes-on-disk encoding profile (`tier1/enc-asl-core-index.md`)
+* `ASL/INDEX-ACCEL/1` — acceleration semantics (routing, filters, sharding)
+* `ASL/LOG/1` — append-only semantic log (segment visibility)
+
+---
+
+## 0. Conventions
+
+The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
+
+ASL/1-CORE-INDEX defines **semantic meaning only**. It does not define storage formats, on-disk encoding, or operational lifecycle. Those belong to ASL-STORE-INDEX, ASL/LOG/1, and ENC-ASL-CORE-INDEX.
+
+---
+
+## 1. Purpose & Non-Goals
+
+### 1.1 Purpose
+
+ASL/1-CORE-INDEX defines the **semantic model** for indexing artifacts:
+
+* It specifies what it means to map an artifact identity to a byte location.
+* It defines visibility, immutability, and shadowing semantics.
+* It ensures deterministic lookup for a fixed snapshot and log prefix.
+
+### 1.2 Non-goals
+
+ASL/1-CORE-INDEX explicitly does **not** define:
+
+* On-disk layouts, segment files, or memory representations.
+* Block allocation, packing, GC, or lifecycle rules.
+* Snapshot implementation details, checkpoints, or log storage.
+* Performance optimizations (bloom filters, sharding, SIMD).
+* Federation, provenance, or execution semantics.
+
+---
+
+## 2. Terminology
+
+* **Artifact** — ASL/1 immutable value defined in ASL/1-CORE.
+* **Reference** — ASL/1 content address of an Artifact (hash_id + digest).
+* **StoreConfig** — `{ encoding_profile, hash_id }` fixed per StoreSnapshot (ASL/1-STORE).
+* **Block** — immutable storage unit containing artifact bytes.
+* **BlockID** — opaque identifier for a block.
+* **ArtifactExtent** — `(BlockID, offset, length)` identifying a byte slice within a block.
+* **ArtifactLocation** — ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
+* **Snapshot** — a checkpointed StoreSnapshot (ASL/1-STORE) used as a base state.
+* **Append-Only Log** — ordered sequence of index-visible mutations after a snapshot.
+* **CURRENT** — effective state after replaying a log prefix on a snapshot.
+
+---
+
+## 3. Core Mapping Semantics
+
+### 3.1 Index Mapping
+
+The index defines a semantic mapping:
+
+```
+Reference -> ArtifactLocation
+```
+
+For any visible `Reference`, there is exactly one `ArtifactLocation` at a given CURRENT state.
+
+### 3.2 Determinism
+
+For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be deterministic. No nondeterministic input may affect index semantics.
+
+### 3.3 StoreConfig Consistency
+
+All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`.
+
+---
+
+## 4. ArtifactLocation Semantics
+
+* An ArtifactLocation is an **ordered list** of ArtifactExtents.
+* Each extent references immutable bytes within a block.
+* The artifact bytes are defined by **concatenating extents in order**.
+* A visible ArtifactLocation MUST be **non-empty** and MUST fully cover the artifact byte sequence with no gaps or extra bytes.
+* Extents MUST have `length > 0` and MUST reference valid byte ranges within their blocks.
+* Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact.
+* An ArtifactLocation is valid only while all referenced blocks are retained.
+* ASL/1-CORE-INDEX does not define how blocks are allocated or sealed; it only requires that referenced bytes are immutable for the lifetime of the mapping.
+
+---
+
+## 5. Visibility Model
+
+An index entry is **visible** at CURRENT if and only if:
+
+1. The entry is admitted in the ordered log prefix for CURRENT.
+2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules).
+
+Visibility is binary; entries are either visible or not visible.
+
+---
+
+## 6. Snapshot and Log Semantics
+
+Snapshots provide a base mapping; the append-only log defines subsequent changes.
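The snapshot-plus-log model of ASL/1-CORE-INDEX (§§5 to 8) can be illustrated with a toy in-memory view. This is a minimal C sketch, not part of the specification: it assumes a Reference can be reduced to a 64-bit key, an ArtifactLocation to a single extent, and a small fixed-capacity table, and it does not model block sealing. The names `IndexEntry`, `IndexView`, `index_apply`, `index_replay`, and `index_lookup` are hypothetical. Replaying a log prefix over a snapshot view in strict order gives last-writer-wins shadowing, and a tombstone hides the earlier mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry: the Reference is reduced to a 64-bit key
 * and the ArtifactLocation to a single extent. */
typedef struct {
    uint64_t ref;       /* stand-in for the full Reference */
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    bool     tombstone; /* true: hides earlier mappings for ref */
} IndexEntry;

#define MAX_REFS 64

/* One slot per Reference, holding the latest entry seen for it. */
typedef struct {
    IndexEntry slots[MAX_REFS];
    size_t     count;
} IndexView;

/* Apply one log-admitted entry: a later entry for the same Reference
 * shadows (replaces) the earlier one; tombstones shadow the same way. */
static void index_apply(IndexView *v, const IndexEntry *e) {
    for (size_t i = 0; i < v->count; i++) {
        if (v->slots[i].ref == e->ref) {
            v->slots[i] = *e;
            return;
        }
    }
    assert(v->count < MAX_REFS);
    v->slots[v->count++] = *e;
}

/* Index(CURRENT) = Index(snapshot) + replay(log_prefix), in strict log order. */
static IndexView index_replay(const IndexView *snapshot,
                              const IndexEntry *log, size_t prefix_len) {
    IndexView v = *snapshot;
    for (size_t i = 0; i < prefix_len; i++)
        index_apply(&v, &log[i]);
    return v;
}

/* Returns the visible entry for ref, or NULL if absent or tombstoned. */
static const IndexEntry *index_lookup(const IndexView *v, uint64_t ref) {
    for (size_t i = 0; i < v->count; i++) {
        if (v->slots[i].ref == ref)
            return v->slots[i].tombstone ? NULL : &v->slots[i];
    }
    return NULL;
}
```

Because each entry for a Reference simply replaces its predecessor, replaying the same prefix again over the result leaves every lookup unchanged, which matches the idempotence required of replay.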
+
+The index state for a given CURRENT is defined as:
+
+```
+Index(CURRENT) = Index(snapshot) + replay(log_prefix)
+```
+
+Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entries are semantically equivalent once replayed.
+
+---
+
+## 7. Immutability and Shadowing
+
+### 7.1 Immutability
+
+* Index entries are never mutated.
+* Once visible, an entry’s meaning does not change.
+* Referenced bytes are immutable for the lifetime of the entry.
+
+### 7.2 Shadowing
+
+* Later entries MAY shadow earlier entries with the same Reference.
+* Precedence is determined solely by log order.
+* Snapshot boundaries do not alter shadowing semantics.
+
+---
+
+## 8. Tombstones (Optional)
+
+Tombstone entries MAY be used to invalidate prior mappings.
+
+* A tombstone shadows earlier entries for the same Reference.
+* Visibility rules are identical to regular entries.
+* Encoding is optional and defined by ENC-ASL-CORE-INDEX if used.
+
+---
+
+## 9. Determinism Guarantees
+
+For fixed:
+
+* StoreConfig
+* Snapshot
+* Log prefix
+
+ASL/1-CORE-INDEX guarantees:
+
+* Deterministic lookup results
+* Deterministic shadowing resolution
+* Deterministic visibility
+
+---
+
+## 10. Normative Invariants
+
+Conforming implementations MUST enforce:
+
+1. No visibility without a log-admitted entry.
+2. No mutation of visible index entries.
+3. Referenced bytes remain immutable for the entry’s lifetime.
+4. Shadowing follows strict log order.
+5. Snapshot + log replay uniquely defines CURRENT.
+6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun).
+
+Violation of any invariant constitutes index corruption.
+
+---
+
+## 11. Relationship to Other Specifications
+
+| Layer              | Responsibility                                             |
+| ------------------ | ---------------------------------------------------------- |
+| ASL/1-CORE         | Artifact semantics and identity                            |
+| ASL/1-STORE        | StoreSnapshot and put/get logical model                    |
+| ASL/1-CORE-INDEX   | Semantic mapping of Reference → ArtifactLocation           |
+| ASL-STORE-INDEX    | Lifecycle, replay, and visibility contracts                |
+| ENC-ASL-CORE-INDEX | On-disk encoding for index segments and records            |
+
+---
+
+## 12. Summary
+
+ASL/1-CORE-INDEX specifies the semantic meaning of the index:
+
+* It maps artifact References to byte locations deterministically.
+* It defines visibility and shadowing rules across snapshot + log replay.
+* It guarantees immutability and deterministic lookup.
+
+It answers one question:
+
+> *Given a Reference and a CURRENT state, where are the bytes?*
diff --git a/tier1/asl-index-accel-1.md b/tier1/asl-index-accel-1.md
new file mode 100644
index 0000000..d4b95ca
--- /dev/null
+++ b/tier1/asl-index-accel-1.md
@@ -0,0 +1,272 @@
+# ASL/INDEX-ACCEL/1 — Index Acceleration Semantics
+
+Status: Draft
+Owner: Niklas Rydberg
+Version: 0.1.0
+SoT: No
+Last Updated: 2025-11-16
+Tags: [deterministic, index, acceleration]
+
+**Document ID:** `ASL/INDEX-ACCEL/1`
+**Layer:** L1 — Acceleration rules over index semantics (no storage / encoding)
+
+**Depends on (normative):**
+
+* `ASL/1-CORE-INDEX`
+
+**Informative references:**
+
+* `ASL-STORE-INDEX` — store lifecycle and replay contracts
+* `ENC-ASL-CORE-INDEX` — bytes-on-disk encoding profile (`tier1/enc-asl-core-index.md`)
+
+---
+
+## 0. Conventions
+
+The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
+
+ASL/INDEX-ACCEL/1 defines **acceleration semantics only**. It MUST NOT change index meaning defined by ASL/1-CORE-INDEX.
+
+---
+
+## 1. Purpose
+
+ASL/INDEX-ACCEL/1 defines **acceleration mechanisms** used by ASL-based indexes, including:
+
+* Routing keys
+* Sharding
+* Filters (Bloom, XOR, Ribbon, etc.)
+* SIMD execution
+* Hash recasting
+
+All mechanisms defined herein are **observationally invisible** to ASL/1-CORE-INDEX semantics.
+
+---
+
+## 2. Scope
+
+Applies to:
+
+* Artifact indexes (ASL)
+* Projection and graph indexes (e.g., TGK)
+* Any index layered on ASL/1-CORE-INDEX semantics
+
+Does **not** define:
+
+* Artifact or edge identity
+* Snapshot semantics
+* Storage lifecycle
+* Encoding details
+
+---
+
+## 3. Canonical Key vs Routing Key
+
+### 3.1 Canonical Key
+
+The **Canonical Key** uniquely identifies an indexable entity.
+
+Examples:
+
+* Artifact: `Reference`
+* TGK Edge: `CanonicalEdgeKey`
+
+Properties:
+
+* Defines semantic identity
+* Used for equality, shadowing, and tombstones
+* Stable and immutable
+* Fully compared on index match
+
+### 3.2 Routing Key
+
+The **Routing Key** is a **derived, advisory key** used exclusively for acceleration.
+
+Properties:
+
+* Derived deterministically from Canonical Key and optional attributes
+* MAY be used for sharding, filters, SIMD layouts
+* MUST NOT affect index semantics
+* MUST be verified by full Canonical Key comparison on match
+
+Formal rule:
+
+```
+CanonicalKey determines correctness
+RoutingKey determines performance
+```
+
+---
+
+## 4. Filter Semantics
+
+### 4.1 Advisory Nature
+
+All filters are **advisory only**.
+
+Rules:
+
+* False positives are permitted
+* False negatives are forbidden
+* Filter behavior MUST NOT affect correctness
+
+Invariant:
+
+```
+Filter miss => key is definitely absent
+Filter hit => key may be present
+```
+
+### 4.2 Filter Inputs
+
+Filters operate over **Routing Keys**, not Canonical Keys.
+
+A Routing Key MAY incorporate:
+
+* Hash of Canonical Key
+* Artifact type tag (if present)
+* TGK edge type key
+* Direction, role, or other immutable classification attributes
+
+Absence of optional attributes MUST be encoded explicitly.
+
+### 4.3 Filter Construction
+
+* Filters are built only over **sealed, immutable segments**
+* Filters are immutable once built
+* Filter construction MUST be deterministic
+* Filter state MUST be covered by segment checksums
+
+---
+
+## 5. Sharding Semantics
+
+### 5.1 Observational Invisibility
+
+Sharding is a **mechanical partitioning** of the index.
+
+Invariant:
+
+```
+LogicalIndex = union(all shards)
+```
+
+Rules:
+
+* Shards MUST NOT affect lookup results
+* Shard count and boundaries may change over time
+* Rebalancing MUST preserve lookup semantics
+
+### 5.2 Shard Assignment
+
+Shard assignment MAY be based on:
+
+* Hash of Canonical Key
+* Routing Key
+* Composite routing strategies
+
+Shard selection MUST be deterministic per snapshot.
+
+---
+
+## 6. Hashing and Hash Recasting
+
+### 6.1 Hashing
+
+Hashes MAY be used for routing, filtering, or SIMD layout.
+
+Hashes MUST NOT be treated as identity.
+
+### 6.2 Hash Recasting
+
+Hash recasting (changing hash functions or seeds) is permitted if:
+
+1. It is deterministic
+2. It does not change Canonical Keys
+3. It does not affect index semantics
+
+Recasting is equivalent to rebuilding acceleration structures.
+
+---
+
+## 7. SIMD Execution
+
+SIMD operations MAY be used to:
+
+* Evaluate filters
+* Compare routing keys
+* Accelerate scans
+
+Rules:
+
+* SIMD must operate only on immutable data
+* SIMD must not short-circuit semantic checks
+* SIMD must preserve deterministic behavior
+
+---
+
+## 8. Multi-Dimensional Routing Examples (Normative)
+
+### 8.1 Artifact Index
+
+* Canonical Key: `Reference`
+* Routing Key components:
+
+  * `H(Reference)`
+  * `type_tag` (if present)
+  * `has_typetag`
+
+### 8.2 TGK Edge Index
+
+* Canonical Key: `CanonicalEdgeKey`
+* Routing Key components:
+
+  * `H(CanonicalEdgeKey)`
+  * `edge_type_key`
+  * Direction or role (optional)
+
+---
+
+## 9. Snapshot Interaction
+
+Acceleration structures:
+
+* MUST respect snapshot visibility rules
+* MUST operate over the same sealed segments visible to the snapshot
+* MUST NOT bypass tombstones or shadowing
+
+Snapshot cuts apply **after** routing and filtering.
+
+---
+
+## 10. Normative Invariants
+
+1. Canonical Keys define identity and correctness
+2. Routing Keys are advisory only
+3. Filters may never introduce false negatives
+4. Sharding is observationally invisible
+5. Hashes are not identity
+6. SIMD is an execution strategy, not a semantic construct
+7. All acceleration is deterministic per snapshot
+
+---
+
+## 11. Non-Goals
+
+ASL/INDEX-ACCEL/1 does not define:
+
+* Specific filter algorithms
+* Memory layout
+* CPU instruction selection
+* Encoding formats
+* Federation policies
+
+---
+
+## 12. Summary
+
+ASL/INDEX-ACCEL/1 establishes a strict contract:
+
+> All acceleration exists to make the index faster, never different.
+
+It formalizes Canonical vs Routing keys and constrains filters, sharding, hashing, and SIMD so that correctness is preserved under all optimizations.
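The Canonical/Routing split of ASL/INDEX-ACCEL/1 (§§3, 4 and 8.1) can be sketched in C under a few stated assumptions: the canonical key is a 32-byte Reference digest, 64-bit FNV-1a stands in for the unspecified routing hash `H`, and the filter is a two-probe Bloom filter over 1024 bits. `CanonicalKey`, `routing_key`, `Filter`, and `segment_find` are hypothetical names, not part of any specified API. The filter may only rule a key out; every hit is confirmed by a full canonical-key comparison, and the explicit `has_typetag` byte keeps "no tag" distinct from "tag 0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical canonical key: a 32-byte Reference digest. */
typedef struct { uint8_t digest[32]; } CanonicalKey;

/* FNV-1a (64-bit): an illustrative stand-in for the routing hash H. */
static uint64_t fnv1a64(const uint8_t *p, size_t n, uint64_t h) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* RoutingKey = H(Reference) mixed with has_typetag and type_tag. The flag
 * byte encodes absence explicitly, so "no tag" never aliases "tag 0". */
static uint64_t routing_key(const CanonicalKey *k, bool has_typetag, uint32_t type_tag) {
    uint32_t t = has_typetag ? type_tag : 0;
    uint8_t attrs[5] = { (uint8_t)(has_typetag ? 1 : 0),
                         (uint8_t)t, (uint8_t)(t >> 8),
                         (uint8_t)(t >> 16), (uint8_t)(t >> 24) };
    uint64_t h = fnv1a64(k->digest, sizeof k->digest, 0xcbf29ce484222325ULL);
    return fnv1a64(attrs, sizeof attrs, h);
}

#define FILTER_BITS 1024u

/* Segment-local Bloom filter over routing keys, built once after sealing. */
typedef struct { uint64_t words[FILTER_BITS / 64]; } Filter;

static void filter_positions(uint64_t rk, uint32_t pos[2]) {
    pos[0] = (uint32_t)(rk % FILTER_BITS);
    pos[1] = (uint32_t)((rk >> 32) % FILTER_BITS);
}

static void filter_add(Filter *f, uint64_t rk) {
    uint32_t pos[2];
    filter_positions(rk, pos);
    for (int i = 0; i < 2; i++)
        f->words[pos[i] / 64] |= 1ULL << (pos[i] % 64);
}

/* false => definitely absent; true => may be present (must be verified). */
static bool filter_may_contain(const Filter *f, uint64_t rk) {
    uint32_t pos[2];
    filter_positions(rk, pos);
    for (int i = 0; i < 2; i++)
        if (!(f->words[pos[i] / 64] & (1ULL << (pos[i] % 64))))
            return false;
    return true;
}

/* Returns the index of the matching key in a sealed segment, or -1. */
static long segment_find(const Filter *f, const CanonicalKey *keys, size_t n,
                         const CanonicalKey *q, bool has_typetag, uint32_t type_tag) {
    if (!filter_may_contain(f, routing_key(q, has_typetag, type_tag)))
        return -1;                          /* filter miss: key is absent */
    for (size_t i = 0; i < n; i++)          /* filter hit: verify canonically */
        if (memcmp(keys[i].digest, q->digest, sizeof q->digest) == 0)
            return (long)i;
    return -1;                              /* false positive */
}
```

Swapping the hash function or the filter construction here changes only how often the canonical comparison runs, never the answer, which is the "faster, never different" contract of §12.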
diff --git a/tier1/asl-log-1.md b/tier1/asl-log-1.md
new file mode 100644
index 0000000..2a9b472
--- /dev/null
+++ b/tier1/asl-log-1.md
@@ -0,0 +1,207 @@
+# ASL/LOG/1 — Append-Only Semantic Log
+
+Status: Draft
+Owner: Niklas Rydberg
+Version: 0.1.0
+SoT: No
+Last Updated: 2025-11-16
+Tags: [deterministic, log, snapshot]
+
+**Document ID:** `ASL/LOG/1`
+**Layer:** L1 — Domain log semantics (no transport)
+
+**Depends on (normative):**
+
+* `ASL-STORE-INDEX`
+
+**Informative references:**
+
+* `ASL/1-CORE-INDEX` — index semantics
+* `ENC-ASL-LOG` — bytes-on-disk encoding profile (if defined)
+* `ENC-ASL-CORE-INDEX` — index segment encoding (`tier1/enc-asl-core-index.md`)
+
+---
+
+## 0. Conventions
+
+The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
+
+ASL/LOG/1 defines **semantic log behavior**. It does not define transport, replication protocols, or storage layout.
+
+---
+
+## 1. Purpose
+
+ASL/LOG/1 defines the **authoritative, append-only log** for an ASL domain.
+
+The log records **semantic commits** that affect:
+
+* Index segment visibility
+* Tombstone policy
+* Snapshot anchoring
+* Optional publication metadata
+
+The log is the **sole source of truth** for reconstructing CURRENT state.
+
+---
+
+## 2. Core Properties (Normative)
+
+An ASL log MUST be:
+
+1. Append-only
+2. Strictly ordered
+3. Deterministically replayable
+4. Hash-chained
+5. Snapshot-anchorable
+6. Forward-compatible
+
+---
+
+## 3. Log Model
+
+### 3.1 Log Sequence
+
+Each record has a monotonically increasing `logseq`:
+
+```
+logseq: uint64
+```
+
+* Assigned by the domain authority
+* Total order within a domain
+* Never reused
+
+### 3.2 Hash Chain
+
+Each record commits to the previous record:
+
+```
+record_hash = H(prev_record_hash || record_type || payload)
+```
+
+This enables tamper detection, witness signing, and federation verification.
+
+---
+
+## 4. Record Types (Normative)
+
+### 4.1 SEGMENT_SEAL
+
+Declares an index segment visible.
+
+Semantics:
+
+* From this `logseq` onward, the referenced segment is visible for lookup and replay.
+* Segment MUST be immutable.
+* All referenced blocks MUST already be sealed.
+* Segment contents are not re-logged.
+
+### 4.2 TOMBSTONE
+
+Declares an artifact inadmissible under domain policy.
+
+Semantics:
+
+* Does not delete data.
+* Shadows prior visibility.
+* Applies from this logseq onward.
+
+### 4.3 TOMBSTONE_LIFT
+
+Supersedes a previous tombstone.
+
+Semantics:
+
+* References an earlier TOMBSTONE.
+* Does not erase history.
+* Only affects CURRENT at or above this logseq.
+
+### 4.4 SNAPSHOT_ANCHOR
+
+Binds semantic state to a snapshot.
+
+Semantics:
+
+* Defines a replay checkpoint.
+* Permits truncating log records below the anchor, subject to the GC constraints in §7.
+
+### 4.5 ARTIFACT_PUBLISH (Optional)
+
+Marks an artifact as published.
+
+Semantics:
+
+* Publication is domain-local.
+* Federation layers may interpret this metadata.
+
+### 4.6 ARTIFACT_UNPUBLISH (Optional)
+
+Withdraws publication.
+
+---
+
+## 5. Replay Semantics (Normative)
+
+To reconstruct CURRENT:
+
+1. Load latest snapshot anchor (if any).
+2. Initialize visible segments from that snapshot.
+3. Replay all log records with `logseq > snapshot.logseq`.
+4. Apply records in order:
+
+   * SEGMENT_SEAL -> add segment
+   * TOMBSTONE -> update policy state
+   * TOMBSTONE_LIFT -> override policy
+   * PUBLISH/UNPUBLISH -> update visibility metadata
+
+Replay MUST be deterministic.
+
+---
+
+## 6. Index Interaction
+
+* Index segments contain index entries.
+* The log never records individual index entries.
+* Visibility is controlled solely by SEGMENT_SEAL.
+* Index rebuild = scan visible segments + apply policy.
+
+---
+
+## 7. Garbage Collection Constraints
+
+* A segment may be GC'd only if:
+
+  * No snapshot references it.
+  * No replay of the log up to CURRENT still requires it.
+
+* Log truncation is only safe at SNAPSHOT_ANCHOR boundaries.
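The hash chain (§3.2) and the replay loop (§5), including the rule in §8 that unknown record types are skipped, can be sketched in C. Every concrete choice here is a hypothetical assumption, since ASL/LOG/1 fixes none of them: the numeric record-type codes, a single 64-bit payload per record (a segment id or artifact key; `TOMBSTONE_LIFT` names the artifact here rather than the earlier tombstone record), a zero genesis hash, and FNV-1a as a non-cryptographic stand-in for `H`. A real log would need a cryptographic hash for the chain to provide tamper detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type codes; ASL/LOG/1 does not assign numeric values. */
enum { SEGMENT_SEAL = 1, TOMBSTONE = 2, TOMBSTONE_LIFT = 3, SNAPSHOT_ANCHOR = 4 };

typedef struct {
    uint64_t logseq;
    uint32_t type;
    uint64_t payload;      /* simplified: a segment id or artifact key */
    uint64_t record_hash;  /* H(prev_record_hash || record_type || payload) */
} LogRecord;

/* Feeds the low nbytes of v into FNV-1a: a NON-cryptographic stand-in for H. */
static uint64_t fnv_step(uint64_t h, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++) {
        h ^= (uint8_t)(v >> (8 * i));
        h *= 0x100000001b3ULL;
    }
    return h;
}

static uint64_t chain_hash(uint64_t prev, uint32_t type, uint64_t payload) {
    uint64_t h = 0xcbf29ce484222325ULL;
    h = fnv_step(h, prev, 8);
    h = fnv_step(h, type, 4);
    return fnv_step(h, payload, 8);
}

/* Checks strict logseq ordering and the hash chain (genesis prev hash = 0). */
static bool log_verify(const LogRecord *log, size_t n) {
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && log[i].logseq <= log[i - 1].logseq) return false;
        if (log[i].record_hash != chain_hash(prev, log[i].type, log[i].payload)) return false;
        prev = log[i].record_hash;
    }
    return true;
}

#define MAX_ITEMS 32
typedef struct {
    uint64_t segments[MAX_ITEMS];   size_t n_segments;   /* visible segments */
    uint64_t tombstoned[MAX_ITEMS]; size_t n_tombstoned; /* policy state */
} DomainState;

static void set_remove(uint64_t *set, size_t *n, uint64_t v) {
    for (size_t i = 0; i < *n; i++)
        if (set[i] == v) { set[i] = set[--*n]; return; }
}

/* Replays records with logseq > anchor_logseq, in order, onto a snapshot state. */
static void log_replay(DomainState *s, uint64_t anchor_logseq,
                       const LogRecord *log, size_t n) {
    for (size_t i = 0; i < n; i++) {
        const LogRecord *r = &log[i];
        if (r->logseq <= anchor_logseq) continue;   /* covered by the snapshot */
        switch (r->type) {
        case SEGMENT_SEAL:
            assert(s->n_segments < MAX_ITEMS);
            s->segments[s->n_segments++] = r->payload;
            break;
        case TOMBSTONE:
            assert(s->n_tombstoned < MAX_ITEMS);
            s->tombstoned[s->n_tombstoned++] = r->payload;
            break;
        case TOMBSTONE_LIFT:
            set_remove(s->tombstoned, &s->n_tombstoned, r->payload);
            break;
        default:
            break;  /* unknown types (and SNAPSHOT_ANCHOR here) change no state */
        }
    }
}
```

Note that individual index entries never appear in this loop: only SEGMENT_SEAL changes which segments are visible, consistent with §6.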
+ +--- + +## 8. Versioning & Extensibility + +* Unknown record types MUST be skipped and MUST NOT break replay. +* Payloads are opaque outside their type. +* New record types may be added in later versions. + +--- + +## 9. Non-Goals + +ASL/LOG/1 does not define: + +* Federation protocols +* Network replication +* Witness signatures +* Block-level events +* Hydration / eviction +* Execution receipts + +--- + +## 10. Summary + +ASL/LOG/1 defines the minimal semantic log needed to reconstruct CURRENT. + +If it affects visibility or admissibility, it goes in the log. If it affects layout or performance, it does not. diff --git a/tier1/asl-store-index.md b/tier1/asl-store-index.md new file mode 100644 index 0000000..e6ef4c7 --- /dev/null +++ b/tier1/asl-store-index.md @@ -0,0 +1,316 @@ +# ASL-STORE-INDEX + +### Store Semantics and Contracts for ASL Core Index (Tier1) + +--- + +## 1. Purpose + +This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX. + +It specifies: + +* **Block lifecycle**: creation, sealing, retention, GC +* **Index segment lifecycle**: creation, append, seal, visibility +* **Snapshot identity and log positions** for deterministic replay +* **Append-only log semantics** +* **Lookup, visibility, and crash recovery rules** +* **Small vs large block handling** + +It **does not define encoding** (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`) or semantic mapping (see ASL/1-CORE-INDEX). + +--- + +## 2. Scope + +Covers: + +* Lifecycle of **blocks** and **index entries** +* Snapshot and CURRENT consistency guarantees +* Deterministic replay and recovery +* GC and tombstone semantics +* Packing policy for small vs large artifacts + +Excludes: + +* Disk-level encoding +* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1) +* Memory residency or caching +* Federation or PEL semantics + +--- + +## 3. 
Core Concepts + +### 3.1 Block + +* **Definition:** Immutable storage unit containing artifact bytes. +* **Identifier:** BlockID (opaque, unique). +* **Properties:** + + * Once sealed, contents never change. + * Can be referenced by multiple artifacts. + * May be pinned by snapshots for retention. + +### 3.2 Index Segment + +Segments group index entries and provide **persistence and recovery units**. + +* **Open segment:** accepting new index entries, not visible for lookup. +* **Sealed segment:** closed for append, log-visible, snapshot-pinnable. +* **Segment components:** header, optional bloom filter, index records, footer. +* **Segment visibility:** only after seal and log append. + +### 3.3 Append-Only Log + +All store-visible mutations are recorded in a **strictly ordered, append-only log**: + +* Entries include: + + * Index additions + * Tombstones + * Segment seals +* Log is replayable to reconstruct CURRENT. +* Log semantics are defined in `ASL/LOG/1`. + +### 3.4 Snapshot Identity and Log Position + +To make CURRENT referencable and replayable, ASL-STORE-INDEX defines: + +* **SnapshotID**: opaque, immutable identifier for a snapshot. +* **LogPosition**: monotonic integer position in the append-only log. +* **IndexState**: `(SnapshotID, LogPosition)`. + +Deterministic replay is defined as: + +``` +Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition]) +``` + +Snapshots and log positions are required for checkpointing, federation, and deterministic recovery. + +### 3.5 Artifact Location + +* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block. +* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes. +* Multi-extent locations allow a single artifact to be striped across multiple blocks. + +--- + +## 4. 
Block Lifecycle Semantics + +| Event | Description | Semantic Guarantees | +| ------------------ | ------------------------------------- | ------------------------------------------------------------- | +| Creation | Block allocated; bytes may be written | Not visible to index until sealed | +| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index | +| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed | +| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed | + +Notes: + +* Sealing ensures any index entry referencing the block is immutable. +* Retention is driven by snapshot and log visibility rules. +* GC must **never violate CURRENT reconstruction guarantees**. + +--- + +## 5. Segment Lifecycle Semantics + +### 5.1 Creation + +* Open segment is allocated. +* Index entries appended in log order. +* Entries are invisible until segment seal and log append. + +### 5.2 Seal + +* Segment is closed to append. +* Seal record is written to append-only log. +* Segment becomes visible for lookup. +* Sealed segment may be snapshot-pinned. + +### 5.3 Snapshot Interaction + +* Snapshots capture sealed segments. +* Open segments need not survive snapshot. +* Segments below snapshot are replay anchors. + +--- + +## 6. Visibility and Lookup Semantics + +### 6.1 Visibility Rules + +* Entry visible **iff**: + + * The block is sealed. + * Log record exists at position ≤ CURRENT. + * Segment seal recorded in log. + +* Entries above CURRENT or referencing unsealed blocks are invisible. + +### 6.2 Lookup Semantics + +To resolve an `ArtifactKey`: + +1. Identify all visible segments ≤ CURRENT. +2. Search segments in **reverse creation order** (newest first). +3. Return first matching entry. +4. Respect tombstones to shadow prior entries. + +Determinism: + +* Lookup results are identical across platforms given the same snapshot and log prefix. 
+* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
+
+---
+
+## 7. Snapshot Interaction
+
+* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
+* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
+* CURRENT is reconstructed as:
+
+```
+CURRENT = snapshot_state + replay(log)
+```
+
+Segment and block visibility rules:
+
+| Entity               | Visible in snapshot          | Visible in CURRENT             |
+| -------------------- | ---------------------------- | ------------------------------ |
+| Open segment/block   | No                           | Only after seal and log append |
+| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log         |
+| Tombstone            | Yes, if log-recorded         | Yes, shadows prior entries     |
+
+---
+
+## 8. Garbage Collection
+
+Eligibility for GC:
+
+* Segments: sealed, no references from CURRENT or snapshots.
+* Blocks: unpinned, unreferenced by any segment or artifact.
+
+Rules:
+
+* GC is safe **only on sealed segments and blocks**.
+* Must respect snapshot pins.
+* Tombstones may aid in invalidating unreachable blocks.
+
+Outcome:
+
+* GC never violates CURRENT reconstruction.
+* Blocks can be reclaimed without breaking provenance.
+
+---
+
+## 9. Tombstone Semantics
+
+* Optional marker to invalidate prior mappings.
+* Visibility rules identical to regular index entries.
+* Used to maintain deterministic CURRENT in the face of shadowing or deletions.
+
+---
+
+## 10. Small vs Large Block Handling
+
+### 10.1 Definitions
+
+| Term              | Meaning                                                          |
+| ----------------- | ---------------------------------------------------------------- |
+| **Small block**   | Block containing artifact bytes below a threshold `T_small`.     |
+| **Large block**   | Block containing artifact bytes ≥ `T_small`.                     |
+| **Mixed segment** | Segment containing both small and large blocks (discouraged).    |
+| **Packing**       | Combining multiple small artifacts into a single physical block. |
+
+Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.
+
+### 10.2 Packing Rules
+
+1. **Small blocks may be packed together** to reduce storage overhead.
+2. **Large blocks are never packed with other artifacts**.
+3. Mixed segments are **allowed but discouraged**; index semantics remain identical.
+
+### 10.3 Segment Allocation Rules
+
+1. Small blocks are allocated into segments optimized for packing efficiency.
+2. Large blocks are allocated into segments optimized for sequential I/O.
+3. Segment sealing and visibility rules remain unchanged.
+
+### 10.4 Indexing and Addressing
+
+All blocks are addressed uniformly:
+
+```
+ArtifactExtent = (BlockID, offset, length)
+ArtifactLocation = [ArtifactExtent...]
+```
+
+Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
+
+### 10.5 GC and Retention
+
+1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
+2. Large blocks are reclaimed per block.
+
+Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
+
+---
+
+## 11. Crash and Recovery Semantics
+
+* Open segments or unsealed blocks may be lost; no invariant is broken.
+* Recovery procedure:
+
+  1. Mount last checkpoint snapshot.
+  2. Replay append-only log from checkpoint.
+  3. Reconstruct CURRENT.
+
+* Recovery is **deterministic and idempotent**.
+* Segments and blocks are **never partially visible** after a crash.
+
+---
+
+## 12. Normative Invariants
+
+1. Sealed blocks are immutable.
+2. Index entries referencing blocks are immutable once visible.
+3. Shadowing follows strict log order.
+4. Replay of snapshot + log uniquely reconstructs CURRENT.
+5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
+6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
+7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.
+
+---
+
+## 13. Non-Goals
+
+* Disk-level encoding (ENC-ASL-CORE-INDEX).
+* Memory layout or caching.
+* Sharding or performance heuristics.
+* Federation / multi-domain semantics (handled elsewhere).
+* Block packing strategies beyond the policy rules here.
+
+---
+
+## 14. Relationship to Other Layers
+
+| Layer              | Responsibility                                                               |
+| ------------------ | ---------------------------------------------------------------------------- |
+| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
+| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
+| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
+| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
+
+---
+
+## 15. Summary
+
+The tier1 ASL-STORE-INDEX specification:
+
+* Defines **block lifecycle** and **segment lifecycle**.
+* Makes **snapshot identity and log positions** explicit for replay.
+* Ensures deterministic visibility, lookup, and crash recovery.
+* Formalizes GC safety and tombstone behavior.
+* Adds clear **small vs large block** handling without changing core semantics.
diff --git a/notes/enc-asl-core-index.md b/tier1/enc-asl-core-index.md
similarity index 72%
rename from notes/enc-asl-core-index.md
rename to tier1/enc-asl-core-index.md
index 724f1dd..1b8f384 100644
--- a/notes/enc-asl-core-index.md
+++ b/tier1/enc-asl-core-index.md
@@ -8,7 +8,7 @@
 
 This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.
 
-It translates the **semantic model of ASL-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
+It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
 
 It is intended for:
 
@@ -19,8 +19,9 @@ It is intended for:
 
 It does **not** define:
 
-* Index semantics (see ASL-CORE-INDEX)
+* Index semantics (see ASL/1-CORE-INDEX)
 * Store lifecycle behavior (see ASL-STORE-INDEX)
+* Acceleration semantics (see ASL/INDEX-ACCEL/1)
 
 ---
 
@@ -49,6 +50,8 @@ Each index segment file is laid out as follows:
 +------------------+
 | IndexRecord[]    |
 +------------------+
+| ExtentRecord[]   |
++------------------+
 | SegmentFooter    |
 +------------------+
 ```
@@ -56,6 +59,7 @@ Each index segment file is laid out as follows:
 * **SegmentHeader**: fixed-size, mandatory
 * **BloomFilter**: optional, opaque, segment-local
 * **IndexRecord[]**: array of index entries
+* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
 * **SegmentFooter**: fixed-size, mandatory
 
 Offsets in the header define locations of Bloom filter and index records.
@@ -81,6 +85,9 @@ typedef struct {
 
     uint64_t bloom_offset;   // File offset of bloom filter (0 if none)
     uint64_t bloom_size;     // Size of bloom filter (0 if none)
+    uint64_t extents_offset; // File offset of ExtentRecord array
+    uint64_t extent_count;   // Total number of ExtentRecord entries
+
     uint64_t flags;          // Reserved for future use
 } SegmentHeader;
 #pragma pack(pop)
@@ -104,9 +111,9 @@ typedef struct {
     uint64_t hash_lo;        // Low 64 bits
     uint32_t hash_tail;      // Optional tail for full hash if larger than 192 bits
 
-    uint64_t block_id;       // ASL block identifier
-    uint32_t offset;         // Offset within block
-    uint32_t length;         // Length of artifact bytes
+    uint64_t extents_offset; // File offset of first ExtentRecord for this entry
+    uint32_t extent_count;   // Number of ExtentRecord entries for this artifact
+    uint32_t total_length;   // Total artifact length in bytes
 
     uint32_t flags;          // Optional flags (tombstone, reserved, etc.)
     uint32_t reserved;       // Reserved for alignment/future use
@@ -117,13 +124,34 @@ typedef struct {
 **Notes:**
 
 * `hash_*` fields store the artifact key deterministically.
-* `block_id` references an ASL block.
-* `offset` / `length` define bytes within the block.
+* `extents_offset` references the first ExtentRecord for this entry.
+* `extent_count` defines how many extents to read (may be 0 for tombstones).
+* `total_length` is the exact artifact size in bytes.
 * Flags may indicate tombstone or other special status.
 
 ---
 
-## 6. SegmentFooter
+## 6. ExtentRecord
+
+```c
+#pragma pack(push,1)
+typedef struct {
+    uint64_t block_id;   // ASL block identifier
+    uint32_t offset;     // Offset within block
+    uint32_t length;     // Length of this extent
+} ExtentRecord;
+#pragma pack(pop)
+```
+
+**Notes:**
+
+* Extents are concatenated in order to produce artifact bytes.
+* `extent_count` MUST be > 0 for visible (non-tombstone) entries.
+* `total_length` MUST equal the sum of `length` across the extents.
+
+---
+
+## 7. SegmentFooter
 
 ```c
 #pragma pack(push,1)
@@ -142,7 +170,7 @@ typedef struct {
 
 ---
 
-## 7. Bloom Filter
+## 8. Bloom Filter
 
 * The bloom filter is **optional** and opaque to semantics.
 * Its purpose is **lookup acceleration**.
@@ -151,24 +179,27 @@ typedef struct {
 
 ---
 
-## 8. Versioning and Compatibility
+## 9. Versioning and Compatibility
 
 * `version` field in header defines encoding.
 * Readers must **reject unsupported versions**.
 * New fields may be added in future versions only via version bump.
 * Existing fields must **never change meaning**.
+* Version `1` implies single-extent layout (legacy).
+* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
 
 ---
 
-## 9. Alignment and Packing
+## 10. Alignment and Packing
 
 * All structures are **packed** (no compiler padding)
 * Multi-byte integers are **little-endian**
 * Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
+* Extents are accessed via `IndexRecord.extents_offset` relative to the file base.
 
 ---
 
-## 10. Summary of Encoding Guarantees
+## 11. Summary of Encoding Guarantees
 
 The ENC-ASL-CORE-INDEX specification ensures:
 
@@ -180,14 +211,13 @@ The ENC-ASL-CORE-INDEX specification ensures:
 
 ---
 
-## 11. Relationship to Other Layers
+## 12. Relationship to Other Layers
 
 | Layer              | Responsibility                                             |
 | ------------------ | ---------------------------------------------------------- |
-| ASL-CORE-INDEX     | Defines semantic meaning of artifact → location mapping    |
+| ASL/1-CORE-INDEX   | Defines semantic meaning of artifact → location mapping    |
 | ASL-STORE-INDEX    | Defines lifecycle, visibility, and replay contracts        |
+| ASL/INDEX-ACCEL/1  | Defines routing, filters, sharding (observationally inert) |
 | ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |
 
 This completes the stack: **semantics → store behavior → encoding**.
-
-
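To show how a reader might consume the version 2 layout, here is a minimal C sketch that resolves the extent list of one IndexRecord. It checks the ExtentRecord rules (`extent_count > 0` for visible entries, `total_length` equal to the sum of extent lengths) plus the non-empty extent rule from ASL/1-CORE-INDEX. It decodes the 16-byte packed little-endian `ExtentRecord` byte by byte, which also avoids unaligned loads on a memory-mapped file. `decode_extents` and its parameter list are hypothetical, and the remaining IndexRecord fields are deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ExtentRecord as specified: packed, little-endian, 16 bytes on disk. */
typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;

enum { EXTENT_RECORD_SIZE = 16 };

static uint64_t le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Validates and decodes the extent list of one IndexRecord. Only the three
 * IndexRecord fields involved (plus the tombstone flag) are passed in.
 * Returns false on any bounds or consistency violation. */
static bool decode_extents(const uint8_t *file, size_t file_size,
                           uint64_t extents_offset, uint32_t extent_count,
                           uint32_t total_length, bool is_tombstone,
                           ExtentRecord *out, size_t out_cap) {
    if (is_tombstone)
        return true;                              /* extent_count may be 0 */
    if (extent_count == 0 || extent_count > out_cap)
        return false;
    uint64_t bytes = (uint64_t)extent_count * EXTENT_RECORD_SIZE;
    if (extents_offset > file_size || bytes > file_size - extents_offset)
        return false;                             /* list must lie inside the file */
    uint64_t sum = 0;
    for (uint32_t i = 0; i < extent_count; i++) {
        const uint8_t *p = file + extents_offset + (uint64_t)i * EXTENT_RECORD_SIZE;
        out[i].block_id = le64(p);
        out[i].offset   = le32(p + 8);
        out[i].length   = le32(p + 12);
        if (out[i].length == 0)
            return false;                         /* extents must be non-empty */
        sum += out[i].length;
    }
    return sum == total_length;                   /* total_length MUST equal the sum */
}
```

A reader would then fetch each `(block_id, offset, length)` slice in order and concatenate them; a version 1 segment, by contrast, carries a single inline extent per IndexRecord.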