# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Core Index (Tier1)

---

## 1. Purpose

This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.
It specifies:

* **Block lifecycle**: creation, sealing, retention, GC
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot identity and log positions** for deterministic replay
* **Append-only log semantics**
* **Lookup, visibility, and crash recovery rules**
* **Small vs large block handling**

It **does not define encoding** (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`) or semantic mapping (see ASL/1-CORE-INDEX).

---

## 2. Scope

Covers:

* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
* Packing policy for small vs large artifacts

Excludes:

* Disk-level encoding
* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
* Memory residency or caching
* Federation or PEL semantics

---

## 3. Core Concepts

### 3.1 Block

* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique).
* **Properties:**

  * Once sealed, contents never change.
  * Can be referenced by multiple artifacts.
  * May be pinned by snapshots for retention.

### 3.2 Index Segment

Segments group index entries and provide **persistence and recovery units**.

* **Open segment:** accepting new index entries, not visible for lookup.
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
* **Segment components:** header, optional bloom filter, index records, footer.
* **Segment visibility:** only after seal and log append.

### 3.3 Append-Only Log

All store-visible mutations are recorded in a **strictly ordered, append-only log**:

* Entries include:

  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT.
* Log semantics are defined in `ASL/LOG/1`.
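
As a rough illustration of the log described in Section 3.3, the following sketch models a strictly ordered, append-only log in memory. The names `RecordKind`, `LogRecord`, and `AppendOnlyLog`, the string payload, and the 0-based positions are assumptions made for this example; the normative log semantics remain those of `ASL/LOG/1`.

```python
from dataclasses import dataclass, field
from enum import Enum


class RecordKind(Enum):
    # The three entry kinds listed in Section 3.3.
    INDEX_ADD = "index_add"
    TOMBSTONE = "tombstone"
    SEGMENT_SEAL = "segment_seal"


@dataclass(frozen=True)
class LogRecord:
    kind: RecordKind
    payload: str  # hypothetical: an artifact key or a segment identifier


@dataclass
class AppendOnlyLog:
    _records: list = field(default_factory=list)

    def append(self, record: LogRecord) -> int:
        """Append a record and return its 0-based log position."""
        self._records.append(record)
        return len(self._records) - 1

    def prefix(self, position: int) -> tuple:
        """Return an immutable copy of the records at positions < position."""
        return tuple(self._records[:position])

    def __len__(self) -> int:
        return len(self._records)
```

The class exposes no update or delete operation, so positions handed out by `append` stay stable, and `prefix` returns a tuple so that callers cannot mutate the log through the returned view.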
### 3.4 Snapshot Identity and Log Position

To make CURRENT referencable and replayable, ASL-STORE-INDEX defines:

* **SnapshotID**: opaque, immutable identifier for a snapshot.
* **LogPosition**: monotonic integer position in the append-only log.
* **IndexState**: `(SnapshotID, LogPosition)`.

Deterministic replay is defined as:

```
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
```

Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.

### 3.5 Artifact Location

* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* Multi-extent locations allow a single artifact to be striped across multiple blocks.

---

## 4. Block Lifecycle Semantics

| Event              | Description                           | Semantic Guarantees                                           |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation           | Block allocated; bytes may be written | Not visible to index until sealed                             |
| Sealing            | Block is finalized and immutable      | Sealed blocks are stable and safe to reference from index     |
| Retention          | Block remains accessible              | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted                  | Only unpinned, unreachable blocks may be removed              |

Notes:

* Sealing ensures any index entry referencing the block is immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.

---

## 5. Segment Lifecycle Semantics

### 5.1 Creation

* Open segment is allocated.
* Index entries appended in log order.
* Entries are invisible until segment seal and log append.

### 5.2 Seal

* Segment is closed to append.
* Seal record is written to append-only log.
* Segment becomes visible for lookup.
* Sealed segment may be snapshot-pinned.
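
The creation and seal steps of Sections 5.1 and 5.2 can be sketched as a small in-memory state machine. This is only an illustration under stated assumptions: the `Segment` class, its method names, the tuple used as a seal record, and the use of a plain Python list as the log are all hypothetical, and durability and crash ordering are not modelled.

```python
class Segment:
    """Hypothetical in-memory model of an index segment's lifecycle."""

    def __init__(self, segment_id):
        self.segment_id = segment_id
        self.entries = []          # (artifact_key, location) pairs in append order
        self.sealed = False
        self.seal_position = None  # log position of the seal record, once written

    def append(self, artifact_key, location):
        # Only open segments accept new index entries.
        if self.sealed:
            raise ValueError("sealed segments are closed to append")
        self.entries.append((artifact_key, location))

    def seal(self, log):
        """Close the segment and record the seal in the log (a plain list here)."""
        if self.sealed:
            raise ValueError("segment already sealed")
        self.sealed = True
        log.append(("segment_seal", self.segment_id))
        self.seal_position = len(log) - 1

    def visible_at(self, current):
        """Visible only once sealed and the seal record is at or below CURRENT."""
        return self.sealed and self.seal_position <= current
```

For example, a segment that has received entries but has not been sealed reports `visible_at(...)` as false for every position; after `seal(log)` it becomes visible at any CURRENT at or beyond its seal position, and further `append` calls raise an error.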
### 5.3 Snapshot Interaction

* Snapshots capture sealed segments.
* Open segments need not survive snapshot.
* Segments below snapshot are replay anchors.

---

## 6. Visibility and Lookup Semantics

### 6.1 Visibility Rules

* An entry is visible **iff** all of the following hold:

  * The block is sealed.
  * Log record exists at position ≤ CURRENT.
  * Segment seal recorded in log.
* Entries above CURRENT or referencing unsealed blocks are invisible.

### 6.2 Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT.
2. Search segments in **reverse seal-log order** (highest seal log position first).
3. Return first matching entry.
4. Respect tombstones to shadow prior entries.

Determinism:

* Lookup results are identical across platforms given the same snapshot and log prefix.
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.

---

## 7. Snapshot Interaction

* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:

```
CURRENT = snapshot_state + replay(log)
```

Segment and block visibility rules:

| Entity               | Visible in snapshot          | Visible in CURRENT             |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block   | No                           | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log         |
| Tombstone            | Yes, if log-recorded         | Yes, shadows prior entries     |

---

## 8. Garbage Collection

Eligibility for GC:

* Segments: sealed, no references from CURRENT or snapshots.
* Blocks: unpinned, unreferenced by any segment or artifact.

Rules:

* GC is safe **only on sealed segments and blocks**.
* Must respect snapshot pins.
* Tombstones may aid in invalidating unreachable blocks.

Outcome:

* GC never violates CURRENT reconstruction.
* Blocks can be reclaimed without breaking provenance.

---

## 9. Tombstone Semantics

* Optional marker to invalidate prior mappings.
* Visibility rules identical to regular index entries.
* Used to maintain deterministic CURRENT in the face of shadowing or deletions.

---

## 10. Small vs Large Block Handling

### 10.1 Definitions

| Term              | Meaning                                                               |
| ----------------- | --------------------------------------------------------------------- |
| **Small block**   | Block containing artifact bytes below a threshold `T_small`.          |
| **Large block**   | Block containing artifact bytes ≥ `T_small`.                          |
| **Mixed segment** | Segment containing both small and large blocks (discouraged).         |
| **Packing**       | Combining multiple small artifacts into a single physical block.      |

Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.

### 10.2 Packing Rules

1. **Small blocks may be packed together** to reduce storage overhead.
2. **Large blocks are never packed with other artifacts**.
3. Mixed segments are **allowed but discouraged**; index semantics remain identical.

### 10.3 Segment Allocation Rules

1. Small blocks are allocated into segments optimized for packing efficiency.
2. Large blocks are allocated into segments optimized for sequential I/O.
3. Segment sealing and visibility rules remain unchanged.

### 10.4 Indexing and Addressing

All blocks are addressed uniformly:

```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```

Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.

### 10.5 GC and Retention

1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
2. Large blocks are reclaimed per block.

Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.

---

## 11. Crash and Recovery Semantics

* Open segments or unsealed blocks may be lost; no invariant is broken.
* Recovery procedure:

  1. Mount last checkpoint snapshot.
  2. Replay append-only log from checkpoint.
  3. Reconstruct CURRENT.
* Recovery is **deterministic and idempotent**.
* Segments and blocks are **never partially visible** after a crash.

---

## 12. Normative Invariants

1. Sealed blocks are immutable.
2. Index entries referencing blocks are immutable once visible.
3. Shadowing follows strict log order.
4. Replay of snapshot + log uniquely reconstructs CURRENT.
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.

---

## 13. Non-Goals

* Disk-level encoding (ENC-ASL-CORE-INDEX).
* Memory layout or caching.
* Sharding or performance heuristics.
* Federation / multi-domain semantics (handled elsewhere).
* Block packing strategies beyond the policy rules here.

---

## 14. Relationship to Other Layers

| Layer              | Responsibility                                                               |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |

---

## 15. Summary

The tier1 ASL-STORE-INDEX specification:

* Defines **block lifecycle** and **segment lifecycle**.
* Makes **snapshot identity and log positions** explicit for replay.
* Ensures deterministic visibility, lookup, and crash recovery.
* Formalizes GC safety and tombstone behavior.
* Adds clear **small vs large block** handling without changing core semantics.
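
As a closing illustration, the lookup rules of Section 6.2 can be sketched in a few lines. This is a minimal sketch, not a reference implementation: the `lookup` function, the `TOMBSTONE` sentinel, and the representation of a sealed segment as a `(seal_position, entries)` pair are assumptions for this example. It further assumes at most one entry per key per segment and that every referenced block is already sealed, neither of which this document states.

```python
TOMBSTONE = object()  # hypothetical sentinel marking a tombstone entry


def lookup(segments, key, current):
    """Resolve key against sealed segments whose seal position is <= current.

    segments: iterable of (seal_position, entries) pairs, where entries maps
    an artifact key to an ArtifactLocation (a list of extents) or to TOMBSTONE.
    Returns the location, or None if the key is absent or shadowed.
    """
    visible = [s for s in segments if s[0] <= current]
    # Highest seal log position first, so newer entries shadow older ones.
    for _, entries in sorted(visible, key=lambda s: s[0], reverse=True):
        if key in entries:
            value = entries[key]
            return None if value is TOMBSTONE else value
    return None
```

Because the search order depends only on seal positions and on CURRENT, two stores holding the same snapshot and log prefix return the same answer; an accelerator such as a bloom filter would only be allowed to skip segments that cannot contain the key.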