amduat-api/tier1/asl-store-index.md
2026-01-17 07:32:14 +01:00


# ASL-STORE-INDEX
### Store Semantics and Contracts for ASL Core Index (Tier1)
---
## 1. Purpose
This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.
It specifies:
* **Block lifecycle**: creation, sealing, retention, GC
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot identity and log positions** for deterministic replay
* **Append-only log semantics**
* **Lookup, visibility, and crash recovery rules**
* **Small vs large block handling**
It **does not define encoding** (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`) or semantic mapping (see ASL/1-CORE-INDEX).
**Informative references:**
* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)
---
## 2. Scope
Covers:
* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
* Packing policy for small vs large artifacts
Excludes:
* Disk-level encoding
* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
* Memory residency or caching
* Federation or PEL semantics
---
## 3. Core Concepts
### 3.1 Block
* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique).
* **Properties:**
  * Once sealed, contents never change.
  * Can be referenced by multiple artifacts.
  * May be pinned by snapshots for retention.
### 3.2 Index Segment
Segments group index entries and provide **persistence and recovery units**.
* **Open segment:** accepting new index entries, not visible for lookup.
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
* **Segment components:** header, optional bloom filter, index records, footer.
* **Segment visibility:** only after seal and log append.
### 3.3 Append-Only Log
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
* Entries include:
  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT.
* Log semantics are defined in `ASL/LOG/1`.
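`ASL/LOG/1` defines the actual log semantics. The following is only a minimal in-memory model of the ordering guarantees listed above. The class and record names are hypothetical, and numbering positions from 1 is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RecordKind(Enum):
    # Record kinds listed in section 3.3; names are illustrative only.
    INDEX_ADD = "index_add"
    TOMBSTONE = "tombstone"
    SEGMENT_SEAL = "segment_seal"


@dataclass(frozen=True)
class LogRecord:
    position: int      # LogPosition assigned at append time
    kind: RecordKind
    payload: dict


@dataclass
class AppendOnlyLog:
    _records: List[LogRecord] = field(default_factory=list)

    def append(self, kind: RecordKind, payload: dict) -> int:
        """Append a record and return its strictly increasing LogPosition."""
        position = len(self._records) + 1  # assumption: positions start at 1
        self._records.append(LogRecord(position, kind, dict(payload)))
        return position

    def read(self, up_to: int) -> List[LogRecord]:
        """Return the ordered prefix of records with position <= up_to."""
        return [r for r in self._records if r.position <= up_to]

    @property
    def head(self) -> int:
        """Position of the most recent record (0 if the log is empty)."""
        return len(self._records)
```

The model exposes no update or delete method, so existing records can only be read, never rewritten, which is the property replay depends on.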
### 3.4 Snapshot Identity and Log Position
To make CURRENT referenceable and replayable, ASL-STORE-INDEX defines:
* **SnapshotID**: opaque, immutable identifier for a snapshot.
* **LogPosition**: monotonic integer position in the append-only log.
* **IndexState**: `(SnapshotID, LogPosition)`.
Deterministic replay is defined as:
```
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
```
Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
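A minimal sketch of the replay equation follows. It assumes that index state is a map from `ArtifactKey` to location, that a tombstone is modeled as `None`, that log positions start at 1, and that each snapshot records the log position it was taken at (`base_position`, a hypothetical field). Because records are applied last-writer-wins, replaying the whole prefix `log[0:LogPosition]` over the snapshot yields the same state as replaying only the suffix after the snapshot; the sketch replays the suffix.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    position: int             # LogPosition
    key: str                  # ArtifactKey
    location: Optional[str]   # None marks a tombstone


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    base_position: int        # assumed: log position the snapshot was taken at
    state: Dict[str, Optional[str]]


def replay(state: Dict[str, Optional[str]], records: List[Record],
           start: int, end: int) -> Dict[str, Optional[str]]:
    """Apply records with start < position <= end in log order (last writer wins)."""
    result = dict(state)
    for rec in sorted(records, key=lambda r: r.position):
        if start < rec.position <= end:
            result[rec.key] = rec.location
    return result


def index_state(snapshot: Snapshot, log: List[Record],
                log_position: int) -> Dict[str, Optional[str]]:
    """Index(SnapshotID, LogPosition): snapshot state plus the replayed log suffix."""
    return replay(snapshot.state, log, snapshot.base_position, log_position)
```

Replaying from scratch, replaying the suffix, and replaying the full prefix over the snapshot all agree, which is what makes `(SnapshotID, LogPosition)` a deterministic identifier.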
### 3.5 Artifact Location
* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* Multi-extent locations allow a single artifact to be striped across multiple blocks.
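Resolving an `ArtifactLocation` is a concatenation of the byte slices its extents name, in order. The sketch below assumes blocks are available as an in-memory `BlockID -> bytes` map; `read_artifact` is a hypothetical helper, not a defined API.

```python
from typing import Dict, List, NamedTuple


class ArtifactExtent(NamedTuple):
    block_id: str
    offset: int
    length: int


ArtifactLocation = List[ArtifactExtent]


def read_artifact(blocks: Dict[str, bytes], location: ArtifactLocation) -> bytes:
    """Concatenate the byte slices named by each extent, in order."""
    parts = []
    for ext in location:
        data = blocks[ext.block_id]
        # Reject extents that fall outside the sealed block's bytes.
        if ext.offset < 0 or ext.length < 0 or ext.offset + ext.length > len(data):
            raise ValueError(f"extent {ext} is out of range for block {ext.block_id}")
        parts.append(data[ext.offset:ext.offset + ext.length])
    return b"".join(parts)
```

For example, an artifact striped as `[("B1", 6, 5), ("B2", 2, 4)]` over blocks `b"hello world"` and `b"!!tail"` reads back as `b"worldtail"`.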
---
## 4. Block Lifecycle Semantics
| Event | Description | Semantic Guarantees |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
Notes:
* Sealing guarantees that the bytes referenced by any index entry pointing into the block never change.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.
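The lifecycle table can be read as a small state machine. This is a hypothetical in-memory `Block` that enforces the rules above: writes only while open, indexing only once sealed, and deletion only for sealed blocks that are unpinned and unreachable. How pinning and reachability are computed is left to the caller.

```python
from enum import Enum, auto


class BlockState(Enum):
    OPEN = auto()      # created; bytes may still be written
    SEALED = auto()    # finalized and immutable
    DELETED = auto()   # reclaimed by GC


class Block:
    """Hypothetical in-memory block following the lifecycle table."""

    def __init__(self, block_id: str):
        self.block_id = block_id
        self.state = BlockState.OPEN
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        if self.state is not BlockState.OPEN:
            raise PermissionError("sealed blocks are immutable")
        self._buf.extend(data)

    def seal(self) -> bytes:
        if self.state is not BlockState.OPEN:
            raise RuntimeError("block already sealed or deleted")
        self.state = BlockState.SEALED
        self._buf = bytes(self._buf)  # freeze contents
        return self._buf

    def can_index(self) -> bool:
        # Only sealed blocks may be referenced from index entries.
        return self.state is BlockState.SEALED

    def collect(self, pinned: bool, reachable: bool) -> bool:
        """Delete the block only if it is sealed, unpinned and unreachable."""
        if self.state is BlockState.SEALED and not pinned and not reachable:
            self.state = BlockState.DELETED
            self._buf = b""
            return True
        return False
```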
---
## 5. Segment Lifecycle Semantics
### 5.1 Creation
* An open segment is allocated.
* Index entries are appended in log order.
* Entries remain invisible until the segment is sealed and its seal record is appended to the log.
### 5.2 Seal
* The segment is closed to further appends.
* A seal record is written to the append-only log.
* The segment becomes visible for lookup once that seal record is in the log.
* A sealed segment may be pinned by a snapshot.
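A minimal sketch of sections 5.1 and 5.2, assuming a single writer and an in-memory store. For brevity only seal records are logged here; `Store`, `Segment` and their methods are hypothetical. A segment counts as sealed, and therefore visible, only once its seal record has a log position.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    segment_id: int
    entries: List[Tuple[str, str]] = field(default_factory=list)  # (key, location)
    seal_position: Optional[int] = None  # LogPosition of the seal record

    @property
    def sealed(self) -> bool:
        return self.seal_position is not None


@dataclass
class Store:
    log: List[Tuple[str, int]] = field(default_factory=list)  # ("seal", segment_id)
    segments: List[Segment] = field(default_factory=list)

    def open_segment(self) -> Segment:
        seg = Segment(segment_id=len(self.segments) + 1)
        self.segments.append(seg)
        return seg

    def append(self, seg: Segment, key: str, location: str) -> None:
        if seg.sealed:
            raise RuntimeError("segment is sealed; append rejected")
        seg.entries.append((key, location))

    def seal(self, seg: Segment) -> int:
        """Write the seal record to the log and record its position on the segment."""
        if seg.sealed:
            raise RuntimeError("segment already sealed")
        self.log.append(("seal", seg.segment_id))
        seg.seal_position = len(self.log)  # position of the seal record
        return seg.seal_position

    def lookup_visible(self) -> List[Segment]:
        # Open segments are never visible for lookup.
        return [s for s in self.segments if s.sealed]
```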
### 5.3 Snapshot Interaction
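* Snapshots capture sealed segments only.
* Open segments are not captured by a snapshot and need not survive one.
* Segments sealed at or below the snapshot's log position serve as replay anchors; replay resumes from the log after that position.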
---
## 6. Visibility and Lookup Semantics
### 6.1 Visibility Rules
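* An index entry is visible **iff** all of the following hold:
  * The block it references is sealed.
  * Its log record exists at a log position ≤ CURRENT.
  * The seal of its containing segment is recorded in the log at a position ≤ CURRENT.
* Entries logged after CURRENT, or referencing unsealed blocks, are invisible.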
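The visibility conditions can be combined into a single predicate. This sketch treats CURRENT as a log position and assumes each entry carries the log position of its own record and of its segment's seal record (hypothetical field names). It covers regular entries only.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class IndexEntry:
    key: str
    block_id: str
    log_position: int                     # position of the entry's log record
    segment_seal_position: Optional[int]  # None while the segment is open


def is_visible(entry: IndexEntry, sealed_blocks: Set[str], current: int) -> bool:
    """All conditions from section 6.1 must hold."""
    return (
        entry.block_id in sealed_blocks                 # block is sealed
        and entry.log_position <= current               # entry logged at or before CURRENT
        and entry.segment_seal_position is not None     # segment seal is logged...
        and entry.segment_seal_position <= current      # ...at or before CURRENT
    )
```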
### 6.2 Lookup Semantics
To resolve an `ArtifactKey`:
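1. Identify all visible segments, i.e. those whose seal record is at a log position ≤ CURRENT.
2. Search segments in **reverse seal-log order** (highest seal log position first).
3. Return the first matching entry.
4. If that entry is a tombstone, the key resolves to "not found": tombstones shadow all prior entries for the key.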
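The lookup steps can be sketched as follows, assuming each sealed segment holds at most one entry per key and that a tombstone is modeled as `None`. `SealedSegment` and `lookup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SealedSegment:
    seal_position: int                 # LogPosition of the seal record
    entries: Dict[str, Optional[str]]  # ArtifactKey -> location; None = tombstone


def lookup(segments: List[SealedSegment], key: str, current: int) -> Optional[str]:
    """Resolve key as of CURRENT; None means not found or shadowed by a tombstone."""
    visible = [s for s in segments if s.seal_position <= current]
    # Newest seal first, so later entries shadow earlier ones.
    for seg in sorted(visible, key=lambda s: s.seal_position, reverse=True):
        if key in seg.entries:
            return seg.entries[key]  # first match wins, including a tombstone
    return None
```

A bloom filter would only let the loop skip segments that cannot contain the key; because the first match is still taken in the same order, the result is unchanged.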
Determinism:
* Lookup results are identical across platforms given the same snapshot and log prefix.
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
---
## 7. Snapshot Interaction
* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:
```
CURRENT = snapshot_state + replay(log)
```
Segment and block visibility rules:
| Entity | Visible in snapshot | Visible in CURRENT |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
---
## 8. Garbage Collection
Eligibility for GC:
* Segments: sealed, and not referenced by CURRENT or any snapshot.
* Blocks: unpinned, and not referenced by any segment or artifact.
Rules:
* GC is safe **only on sealed segments and blocks**.
* GC must respect snapshot pins.
* Tombstones can help identify blocks that have become unreachable, but a tombstoned block is reclaimed only once it is also unpinned and unreferenced.
Outcome:
* GC never violates CURRENT reconstruction.
* Blocks can be reclaimed without breaking provenance.
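Block eligibility can be sketched as a mark-and-sweep over CURRENT and every unexpired snapshot. This assumes each root is a `ArtifactKey -> [extent]` map with tombstoned keys already removed, and that explicit pins are supplied separately. The function names are hypothetical, and segment GC is not covered.

```python
from typing import Dict, Iterable, List, Set, Tuple

Extent = Tuple[str, int, int]  # (BlockID, offset, length)


def reachable_blocks(roots: Iterable[Dict[str, List[Extent]]]) -> Set[str]:
    """Mark phase: every block named by a live mapping in any root state."""
    marked: Set[str] = set()
    for state in roots:  # CURRENT plus every unexpired snapshot
        for extents in state.values():
            marked.update(block_id for block_id, _, _ in extents)
    return marked


def gc_candidates(sealed_blocks: Set[str], pinned: Set[str],
                  roots: Iterable[Dict[str, List[Extent]]]) -> Set[str]:
    """Sweep candidates: sealed blocks that are neither pinned nor reachable."""
    return sealed_blocks - pinned - reachable_blocks(roots)
```

Unsealed blocks never appear in `sealed_blocks`, so they are never candidates, matching the rule that GC operates only on sealed blocks.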
---
## 9. Tombstone Semantics
* A tombstone is an optional index entry that invalidates prior mappings for an `ArtifactKey`.
* Its visibility rules are identical to those of regular index entries.
* Tombstones keep CURRENT deterministic in the face of shadowing or deletions.
---
## 10. Small vs Large Block Handling
### 10.1 Definitions
| Term | Meaning |
| ----------------- | --------------------------------------------------------------------- |
| **Small block**   | Block holding artifacts whose individual size is below a threshold `T_small`. |
| **Large block**   | Block holding bytes of an artifact whose size is ≥ `T_small`.         |
| **Mixed segment** | Segment whose entries reference both small and large blocks (discouraged). |
| **Packing** | Combining multiple small artifacts into a single physical block. |
Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.
### 10.2 Packing Rules
1. **Small blocks may be packed together** to reduce storage overhead.
2. **Large blocks are never packed with other artifacts**.
3. Mixed segments are **allowed but discouraged**; index semantics remain identical.
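The packing rules can be sketched as a greedy planner. `T_SMALL` and `PACK_TARGET` are hypothetical values (the specification does not fix `T_small`), and each large artifact is given a single block here even though striping across several blocks is also allowed.

```python
from typing import Dict, List

T_SMALL = 4096        # hypothetical threshold; T_small is not fixed by the spec
PACK_TARGET = 16384   # hypothetical capacity of a packed small block


def plan_blocks(artifacts: Dict[str, int]) -> List[List[str]]:
    """Group artifact keys into blocks: small ones packed, large ones alone."""
    blocks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for key, size in sorted(artifacts.items()):  # deterministic order
        if size >= T_SMALL:
            blocks.append([key])                 # rule 2: never packed
            continue
        if current and current_size + size > PACK_TARGET:
            blocks.append(current)               # packed block full; start another
            current, current_size = [], 0
        current.append(key)                      # rule 1: small artifacts packed
        current_size += size
    if current:
        blocks.append(current)
    return blocks
```

For instance, `{"a": 100, "b": 5000, "c": 200, "d": 16000}` plans as `[["b"], ["d"], ["a", "c"]]`: the two large artifacts get their own blocks and the two small ones share one.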
### 10.3 Segment Allocation Rules
1. Small blocks are allocated into segments optimized for packing efficiency.
2. Large blocks are allocated into segments optimized for sequential I/O.
3. Segment sealing and visibility rules remain unchanged.
### 10.4 Indexing and Addressing
All blocks are addressed uniformly:
```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```
Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
### 10.5 GC and Retention
1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
2. Large blocks are reclaimed per block.
Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
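The per-block rule can be sketched as a set intersection. This assumes the store tracks which artifacts each block holds and that the set of artifacts reachable from CURRENT or any snapshot has already been computed; `reclaimable` is a hypothetical helper.

```python
from typing import Dict, Set


def reclaimable(block_contents: Dict[str, Set[str]], live: Set[str]) -> Set[str]:
    """Return blocks none of whose contained artifacts are still live.

    block_contents maps BlockID -> artifact keys stored in that block. A large
    block holds a single artifact, so the same rule reduces to per-block reclamation.
    """
    return {bid for bid, keys in block_contents.items() if not (keys & live)}
```

A packed block with even one live artifact stays, so no bytes referenced by CURRENT or a snapshot are removed.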
---
## 11. Crash and Recovery Semantics
* Open segments and unsealed blocks may be lost in a crash; since they were never visible, no invariant is broken.
* Recovery procedure:
  1. Mount the last checkpoint snapshot.
  2. Replay the append-only log from the checkpoint.
  3. Reconstruct CURRENT.
* Recovery is **deterministic and idempotent**.
* Segments and blocks are **never partially visible** after a crash.
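The recovery steps can be sketched as a pure function of the checkpoint and the durable log. This assumes the checkpoint records the log position it covers and models tombstones as `None`; `Checkpoint` and `recover` are hypothetical. Entries in open segments never reached the log, so they simply do not appear.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str, Optional[str]]  # (LogPosition, ArtifactKey, location or None)


@dataclass(frozen=True)
class Checkpoint:
    log_position: int  # assumed: log position the snapshot covers
    state: Dict[str, Optional[str]] = field(default_factory=dict)


def recover(checkpoint: Checkpoint, durable_log: List[Record]) -> Dict[str, Optional[str]]:
    """Mount the checkpoint, replay durable records after it, return CURRENT."""
    current = dict(checkpoint.state)                # 1. mount last checkpoint snapshot
    for position, key, location in durable_log:    # 2. replay from the checkpoint
        if position > checkpoint.log_position:
            current[key] = location
    return current                                  # 3. reconstructed CURRENT
```

Since `recover` mutates neither the checkpoint nor the log, a crash during recovery followed by a second attempt produces the same CURRENT, which is the idempotence property stated above.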
---
## 12. Normative Invariants
1. Sealed blocks are immutable.
2. Index entries referencing blocks are immutable once visible.
3. Shadowing follows strict log order.
4. Replay of snapshot + log uniquely reconstructs CURRENT.
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.
---
## 13. Non-Goals
* Disk-level encoding (ENC-ASL-CORE-INDEX).
* Memory layout or caching.
* Sharding or performance heuristics.
* Federation / multi-domain semantics (handled elsewhere).
* Block packing strategies beyond the policy rules here.
---
## 14. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
---
## 15. Summary
The tier1 ASL-STORE-INDEX specification:
* Defines **block lifecycle** and **segment lifecycle**.
* Makes **snapshot identity and log positions** explicit for replay.
* Ensures deterministic visibility, lookup, and crash recovery.
* Formalizes GC safety and tombstone behavior.
* Adds clear **small vs large block** handling without changing core semantics.