# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Index

---

## 1. Purpose

This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics. It bridges the gap between **index meaning** and **physical storage**, ensuring:

* Deterministic replay
* Snapshot-aware visibility
* Immutable block guarantees
* Idempotent recovery
* Correctness of CURRENT state

It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).

---

## 2. Scope

This specification covers:

* Index segment lifecycle
* Interaction between index and ASL blocks
* Append-only log semantics
* Snapshot integration
* Visibility and lookup rules
* Crash safety and recovery
* Garbage collection constraints

It does **not** cover:

* Disk format details
* Bloom filter algorithms
* File system specifics
* Placement heuristics beyond semantic guarantees

---

## 3. Core Concepts

### 3.1 Index Segment

A **segment** is a contiguous set of index entries written by the store.

* Open while accepting new entries
* Sealed when closed for append
* Sealed segments are immutable
* Sealed segments become **snapshot-visible only after their seal record is in the log**

Segments are the **unit of persistence, replay, and GC**.

---

### 3.2 ASL Block Relationship

Each index entry references a **sealed block** via:

```
ArtifactKey → (BlockID, offset, length)
```

* The store must ensure the block is sealed before the entry becomes log-visible
* Blocks are immutable after seal
* Open blocks may be abandoned without violating invariants

---

### 3.3 Append-Only Log

All store-visible mutations are recorded in a **strictly ordered, append-only log**:

* Entries include index additions, tombstones, and segment seals
* Log is durable and replayable
* Log defines visibility above checkpoint snapshots

**CURRENT state** is derived as:

```
CURRENT = checkpoint_state + replay(log)
```

---

## 4. Segment Lifecycle

### 4.1 Creation

* Open segment is allocated
* Index entries are appended in log order
* Entries are invisible until segment seal and log append

### 4.2 Seal

* Segment is closed to append
* Seal record is written to the append-only log
* Segment becomes visible for lookup
* Sealed segment may be snapshot-pinned

### 4.3 Snapshot Interaction

* Snapshots capture sealed segments
* Open segments need not survive a snapshot
* Segments below a snapshot are replay anchors

### 4.4 Garbage Collection

* Only **sealed and unreachable segments** can be deleted
* GC operates at segment granularity
* GC must not break CURRENT or violate invariants

---

## 5. Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return the first matching entry
4. Respect tombstone entries (if present)

Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.

---

## 6. Visibility Guarantees

* An entry is visible **iff**:
  * The block is sealed
  * A log record exists ≤ CURRENT
  * The segment seal is recorded in the log
* Entries above CURRENT or referencing unsealed blocks are invisible

---

## 7. Crash and Recovery Semantics

### 7.1 Crash During Open Segment

* Open segments may be lost
* Index entries may be leaked
* No sealed segment may be corrupted

### 7.2 Recovery Procedure

1. Mount latest checkpoint snapshot
2. Replay append-only log from checkpoint
3. Rebuild CURRENT
4. Resume normal operation

Recovery must be **deterministic and idempotent**.

---

## 8. Tombstone Semantics

* Optional: tombstones may exist to invalidate prior mappings
* Tombstones shadow prior entries with the same `ArtifactKey`
* Tombstone visibility follows the same rules as regular entries

---

## 9. Invariants (Normative)

The store **must enforce**:

1. No segment visible without seal log record
2. No mutation of sealed segment or block
3. Shadowing follows log order strictly
4. Replay uniquely reconstructs CURRENT
5. GC does not remove segments referenced by snapshot or log
6. ArtifactLocation always points to immutable bytes

---

## 10. Non-Goals

ASL-STORE-INDEX does **not** define:

* Disk layout or encoding (ENC-ASL-CORE-INDEX)
* Placement heuristics (small vs. large block packing)
* Performance targets
* Memory caching strategies
* Federation or provenance mechanics

---

## 11. Relationship to Other Documents

| Layer              | Responsibility                                                       |
| ------------------ | -------------------------------------------------------------------- |
| ASL-CORE-INDEX     | Defines semantic meaning of mapping `ArtifactKey → ArtifactLocation` |
| ASL-STORE-INDEX    | Defines contracts for store to realize those semantics               |
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format                                         |

---

## 12. Summary

The store-index layer guarantees:

* Immutable, snapshot-safe segments
* Deterministic and idempotent replay
* Correct visibility semantics
* Safe crash recovery
* Garbage collection constraints

This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.

The following is a **refined version of ASL-STORE-INDEX** that incorporates **block lifecycle, sealing, snapshot safety, retention, and GC rules**, aligned with ASL-CORE-INDEX semantics.

---

# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Core Index (Refined)

---

## 1. Purpose

This document defines the **operational and store-level semantics** necessary to implement ASL-CORE-INDEX. It specifies:

* **Block lifecycle**: creation, sealing, retention
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot interaction**: pinning, deterministic visibility
* **Append-only log semantics**
* **Garbage collection rules**

It **does not define encoding** (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).

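
To make the append-only log semantics listed above more concrete, here is a minimal Python sketch of deriving CURRENT from a checkpoint plus a log prefix. It is illustrative only. The `LogRecord` shape, the `replay` function, the integer `seq` field, and the choice to model a tombstone by dropping the key from the derived map are all assumptions made for this example, not part of the specification. The one rule taken directly from the text is that entries stay invisible until their segment's seal record is in the replayed prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record shape; the real encoding is out of scope here."""
    seq: int               # position in the strictly ordered log
    kind: str              # "add", "tombstone", or "seal"
    segment: str           # segment the record belongs to
    key: str = ""          # ArtifactKey (unused for "seal")
    location: tuple = ()   # (BlockID, offset, length) for "add"


def replay(checkpoint: dict, log: list, upto: int) -> dict:
    """Derive CURRENT from a checkpoint plus the log prefix with seq <= upto."""
    prefix = sorted((r for r in log if r.seq <= upto), key=lambda r: r.seq)
    # A segment is visible only if its seal record is inside the prefix.
    sealed = {r.segment for r in prefix if r.kind == "seal"}
    current = dict(checkpoint)  # copy, so the checkpoint is never mutated
    for rec in prefix:
        if rec.segment not in sealed:
            continue  # entries of unsealed segments stay invisible
        if rec.kind == "add":
            current[rec.key] = rec.location   # later records shadow earlier ones
        elif rec.kind == "tombstone":
            current.pop(rec.key, None)        # assumption: tombstone hides the key
    return current


# Example data (hypothetical keys, segments, and block IDs).
checkpoint = {"a": ("b0", 0, 10)}
log = [
    LogRecord(1, "add", "s1", "b", ("b1", 0, 5)),
    LogRecord(2, "add", "s2", "c", ("b2", 0, 7)),  # s2 is never sealed
    LogRecord(3, "tombstone", "s1", "a"),
    LogRecord(4, "seal", "s1"),
]
current = replay(checkpoint, log, upto=4)
```

Because `replay` is a pure function of the checkpoint and the log prefix, calling it again with the same inputs yields the same map, which is the sense in which recovery can be deterministic and idempotent in this sketch.
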

---

## 2. Scope

Covers:

* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics

Excludes:

* Disk-level encoding
* Sharding strategies
* Bloom filters or acceleration structures
* Memory residency or caching
* Federation or PEL semantics

---

## 3. Core Concepts

### 3.1 Block

* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique)
* **Properties:**
  * Once sealed, contents never change
  * Can be referenced by multiple artifacts
  * May be pinned by snapshots for retention
* **Lifecycle Events:**
  1. Creation: block is allocated, but contents may still be written
  2. Sealing: block is finalized, immutable, and log-visible
  3. Retention: block remains accessible while pinned by snapshots or needed by CURRENT
  4. Garbage collection: block may be deleted if no longer referenced and unpinned

---

### 3.2 Index Segment

Segments group index entries and provide **persistence and recovery units**.

* **Open segment:** accepting new index entries, not visible for lookup
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable
* **Segment components:** header, optional bloom filter, index records, footer
* **Segment visibility:** only after seal and log append

---

### 3.3 Append-Only Log

All store operations affecting index visibility are recorded in a **strictly ordered, append-only log**:

* Entries include:
  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT
* Determinism: replay produces identical CURRENT from the same snapshot and log prefix

---

## 4. Block Lifecycle Semantics

| Event              | Description                           | Semantic Guarantees                                           |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation           | Block allocated; bytes may be written | Not visible to index until sealed                             |
| Sealing            | Block is finalized and immutable      | Sealed blocks are stable and safe to reference from index     |
| Retention          | Block remains accessible              | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted                  | Only unpinned, unreachable blocks may be removed              |

**Notes:**

* Sealing ensures that any index entry referencing the block is deterministic and immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.

---

## 5. Snapshot Interaction

* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:

```
CURRENT = snapshot_state + replay(log)
```

* Segment and block visibility rules:

| Entity               | Visible in snapshot          | Visible in CURRENT             |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block   | No                           | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log         |
| Tombstone            | Yes, if log-recorded         | Yes, shadows prior entries     |

---

## 6. Index Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return the first matching entry
4. Respect tombstones to shadow prior entries

Determinism:

* Lookup results are identical across platforms given the same snapshot and log prefix
* Accelerations (bloom filters, sharding, SIMD) do **not alter correctness**

---

## 7. Garbage Collection

* **Eligibility for GC:**
  * Segments: sealed, no references from CURRENT or snapshots
  * Blocks: unpinned, unreferenced by any segment or artifact
* **Rules:**
  * GC is safe **only on sealed segments and blocks**
  * Must respect snapshot pins
  * Tombstones may aid in invalidating unreachable blocks
* **Outcome:**
  * GC never violates CURRENT reconstruction
  * Blocks can be reclaimed without breaking provenance

---

## 8. Tombstone Semantics

* Optional marker to invalidate prior mappings
* Visibility rules identical to regular index entries
* Used to maintain a deterministic CURRENT in the face of shadowing or deletions

---

## 9. Crash and Recovery Semantics

* Open segments or unsealed blocks may be lost; no invariant is broken
* Recovery procedure:
  1. Mount last checkpoint snapshot
  2. Replay append-only log
  3. Reconstruct CURRENT
* Recovery is **deterministic and idempotent**
* Segments and blocks are **never partially visible** after a crash

---

## 10. Normative Invariants

1. Sealed blocks are immutable
2. Index entries referencing blocks are immutable once visible
3. Shadowing follows strict log order
4. Replay of snapshot + log uniquely reconstructs CURRENT
5. GC cannot remove blocks or segments needed by snapshot or CURRENT
6. Tombstones shadow prior entries without deleting underlying blocks prematurely

---

## 11. Non-Goals

* Disk-level encoding (ENC-ASL-CORE-INDEX)
* Memory layout or caching
* Sharding or performance heuristics
* Federation / multi-domain semantics (handled elsewhere)
* Block packing strategies (small vs. large blocks)

---

## 12. Relationship to Other Layers

| Layer              | Responsibility                                                               |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |

---

## 13. Summary

The refined ASL-STORE-INDEX:

* Defines **block lifecycle**: creation, sealing, retention, GC
* Ensures **snapshot safety** and deterministic visibility
* Guarantees **immutable, replayable, and recoverable CURRENT**
* Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics
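
The lookup and garbage collection rules summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Segment` class, the integer `seg_id` that grows with creation order, the use of `None` as a tombstone marker, and the function names `lookup` and `gc_eligible_blocks` are all hypothetical. The sketch treats every entry in a retained segment as a block reference, including shadowed ones, which is a deliberately conservative reading of "unreferenced by any segment".

```python
from dataclasses import dataclass, field

TOMBSTONE = None  # hypothetical marker: a key mapped to None is a tombstone


@dataclass
class Segment:
    seg_id: int    # assumed to increase with creation order
    sealed: bool   # only sealed segments are visible for lookup
    # ArtifactKey -> (BlockID, offset, length) or TOMBSTONE
    entries: dict = field(default_factory=dict)


def lookup(segments, key):
    """Search sealed segments newest first; return the first match.

    Returns None when the key is absent or shadowed by a tombstone.
    """
    for seg in sorted(segments, key=lambda s: s.seg_id, reverse=True):
        if not seg.sealed:
            continue  # open segments are never visible
        if key in seg.entries:
            return seg.entries[key]
    return None


def gc_eligible_blocks(blocks, segments, snapshot_pins):
    """Return sealed blocks that are unpinned and unreferenced by any segment.

    `blocks` maps BlockID -> sealed flag (an assumption of this sketch).
    """
    referenced = {
        loc[0]
        for seg in segments
        for loc in seg.entries.values()
        if loc is not TOMBSTONE
    }
    sealed_blocks = {b for b, is_sealed in blocks.items() if is_sealed}
    return sealed_blocks - referenced - set(snapshot_pins)


# Example data (hypothetical keys and block IDs).
segments = [
    Segment(1, True, {"a": ("b1", 0, 4), "b": ("b2", 0, 8)}),
    Segment(2, True, {"a": ("b3", 0, 4), "b": TOMBSTONE}),
    Segment(3, False, {"a": ("b4", 0, 4)}),  # open: ignored by lookup
]
blocks = {"b1": True, "b2": True, "b3": True, "b4": False, "b5": True, "b6": True}
reclaimable = gc_eligible_blocks(blocks, segments, snapshot_pins={"b5"})
```

Neither function consults a bloom filter or any other acceleration structure, which mirrors the requirement that correctness be independent of such strategies; an accelerated implementation would need to return the same results as this linear scan.
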