2026-01-17 06:29:58 +01:00
# ASL-STORE-INDEX
### Store Semantics and Contracts for ASL Core Index (Tier1)
---
## 1. Purpose
This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.
It specifies:
* **Block lifecycle**: creation, sealing, retention, GC
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot identity and log positions** for deterministic replay
* **Append-only log semantics**
* **Lookup, visibility, and crash recovery rules**
* **Small vs large block handling**
It **does not define encoding** (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`) or semantic mapping (see ASL/1-CORE-INDEX).
**Informative references:**
* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)
* `TGK/1` — TGK semantics and visibility alignment
* `TGK/1-CORE` — EdgeBody and EdgeTypeId definitions
---
## 2. Scope
Covers:
* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
* Packing policy for small vs large artifacts
Excludes:
* Disk-level encoding
* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
* Memory residency or caching
* Federation, PEL, or TGK semantics (see `TGK/1` and `TGK/1-CORE`)
---
## 3. Core Concepts
### 3.1 Block
* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique).
* **Properties:**
  * Once sealed, contents never change.
  * Can be referenced by multiple artifacts.
  * May be pinned by snapshots for retention.
  * Allocation method is implementation-defined (e.g., hash or sequence).
### 3.2 Index Segment
Segments group index entries and provide **persistence and recovery units**.
* **Open segment:** accepting new index entries, not visible for lookup.
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
* **Segment components:** header, optional bloom filter, index records, footer.
* **Segment visibility:** only after seal and log append.
### 3.3 Append-Only Log
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
* Entries include:
  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT.
* Log semantics are defined in `ASL/LOG/1`.
### 3.4 Snapshot Identity and Log Position
To make CURRENT referenceable and replayable, ASL-STORE-INDEX defines:
* **SnapshotID**: opaque, immutable identifier for a snapshot.
* **LogPosition**: monotonic integer position in the append-only log.
* **IndexState**: `(SnapshotID, LogPosition)`.
Deterministic replay is defined as:
```
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
```
Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
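The replay formula can be illustrated with a minimal Python sketch. It assumes a snapshot is a plain dictionary from `ArtifactKey` to an opaque location string, and that log records are hypothetical `("add", key, location)` or `("tombstone", key, None)` tuples; segment-seal visibility is ignored here. The names `LogRecord` and `index_at` are illustrative only and are not defined by `ASL/LOG/1`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape: (kind, artifact_key, location); kind is "add" or "tombstone".
LogRecord = Tuple[str, str, Optional[str]]


def index_at(snapshot: Dict[str, str], log: List[LogRecord], log_position: int) -> Dict[str, str]:
    """Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])."""
    index = dict(snapshot)  # copy: the pinned snapshot itself is never mutated
    for kind, key, location in log[0:log_position]:
        if kind == "add":
            index[key] = location  # later records shadow earlier ones
        elif kind == "tombstone":
            index.pop(key, None)  # a tombstone hides any prior mapping
        else:
            raise ValueError(f"unknown log record kind: {kind}")
    return index
```

Because every record in this model is a last-writer-wins set or delete applied in log order, replaying records that a snapshot already reflects leaves the result unchanged, so the same `(SnapshotID, LogPosition)` always yields the same index.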
### 3.5 Artifact Location
* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* Multi-extent locations allow a single artifact to be striped across multiple blocks.
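A short Python sketch of how an `ArtifactLocation` might be turned back into artifact bytes. It assumes sealed blocks are available in an in-memory dictionary keyed by `BlockID` (modeled as a string); `read_location` is a hypothetical helper, not a defined API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ArtifactExtent:
    block_id: str  # opaque BlockID
    offset: int
    length: int


# An ArtifactLocation is an ordered list of extents.
ArtifactLocation = List[ArtifactExtent]


def read_location(blocks: Dict[str, bytes], location: ArtifactLocation) -> bytes:
    """Concatenate the byte slices named by each extent, in order."""
    parts = []
    for ext in location:
        block = blocks[ext.block_id]
        end = ext.offset + ext.length
        if ext.offset < 0 or ext.length < 0 or end > len(block):
            raise ValueError(f"extent out of range for block {ext.block_id}")
        parts.append(block[ext.offset:end])
    return b"".join(parts)
```

For example, an artifact striped across two blocks as `[("b1", 0, 9), ("b2", 0, 2)]` over blocks `b"hello wor"` and `b"ld!!"` reads back as `b"hello world"`.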
---
## 4. PUT/GET Contract (Normative)
### 4.1 PUT Signature
```
put(artifact) -> (ArtifactKey, IndexState)
```
* `ArtifactKey` is the content identity (ASL/1-CORE-INDEX).
* `IndexState = (SnapshotID, LogPosition)` after the PUT is admitted.
### 4.2 PUT Semantics
1. **Structural registration (if applicable)**: if a structural index (SID -> DAG) exists, it MUST register the artifact and reuse existing SID entries.
2. **Materialization (if applicable)**: if the artifact is lazy, materialize deterministically to derive `ArtifactKey`.
3. **Deduplication**: look up `ArtifactKey` at CURRENT. If present, PUT MUST succeed without writing bytes or adding a new index entry.
4. **Storage**: if absent, write the bytes to one or more blocks, seal those blocks, and produce `ArtifactLocation`.
5. **Index mutation**: append an index entry mapping `ArtifactKey -> ArtifactLocation` and record visibility via log order.
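Steps 3 to 5 can be sketched with a small in-memory Python model. This is an illustration under loud assumptions, not a reference implementation: steps 1 and 2 are omitted, SHA-256 stands in for the `ArtifactKey` derivation defined by ASL/1-CORE-INDEX, each artifact goes into one fresh block that is sealed immediately, and `MemoryStore`, `put`, and the snapshot ID `"snap-0"` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


class MemoryStore:
    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}  # sealed blocks only; contents never change
        self.index: Dict[str, Tuple[str, int, int]] = {}  # ArtifactKey -> single extent
        self.log: List[Tuple[str, str]] = []  # append-only log of ("add", key)
        self.snapshot_id = "snap-0"  # hypothetical fixed snapshot

    def put(self, artifact: bytes) -> Tuple[str, Tuple[str, int]]:
        # Assumption: SHA-256 stands in for the real ArtifactKey derivation.
        key = hashlib.sha256(artifact).hexdigest()
        if key in self.index:
            # Step 3: deduplicate; succeed without writing bytes or a new entry.
            return key, (self.snapshot_id, len(self.log))
        # Step 4: write the bytes into a new block, then treat it as sealed.
        block_id = f"blk-{len(self.blocks)}"
        self.blocks[block_id] = bytes(artifact)
        # Step 5: add the index entry and record it in the log; log order fixes visibility.
        self.index[key] = (block_id, 0, len(artifact))
        self.log.append(("add", key))
        # IndexState = (SnapshotID, LogPosition) after the PUT is admitted.
        return key, (self.snapshot_id, len(self.log))
```

Repeating a PUT for identical bytes returns the same key and the same `IndexState` without touching blocks or the log, which is the idempotence guarantee stated below.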
### 4.3 PUT Guarantees
* PUT is idempotent for identical artifacts.
* No visible index entry points to mutable or missing bytes.
* Visibility follows log order and seal rules defined in this document.
### 4.4 GET Signature
```
get(ArtifactKey, IndexState?) -> bytes | NOT_FOUND
```
* `IndexState` defaults to CURRENT when omitted.
### 4.5 GET Semantics
1. Resolve `ArtifactKey -> ArtifactLocation` using `Index(SnapshotID, LogPosition)` for the requested `IndexState`.
2. If no entry exists, return `NOT_FOUND`.
3. Otherwise, read the bytes of each referenced `ArtifactExtent` `(BlockID, offset, length)` in order and return their concatenation verbatim.
GET MUST NOT mutate state or trigger materialization.
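A read-only GET with point-in-time reads could be modeled as below. This Python sketch assumes `IndexState` is reduced to a log position, keeps a hypothetical per-key version history of `(log_position, location)` pairs where `None` marks a tombstone, and uses `None` as the `NOT_FOUND` result. `ReadView` and `record` are illustrative names; `record` exists only to populate the model.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Extent = Tuple[str, int, int]  # (BlockID, offset, length)


class ReadView:
    def __init__(self, blocks: Dict[str, bytes]) -> None:
        self.blocks = blocks
        # ArtifactKey -> [(log_position, location or None for a tombstone)], ascending
        self.versions: Dict[str, List[Tuple[int, Optional[List[Extent]]]]] = {}
        self.current = 0  # CURRENT log position

    def record(self, key: str, location: Optional[List[Extent]]) -> None:
        """Writer helper: append a mapping (None = tombstone) at the next log position."""
        self.current += 1
        self.versions.setdefault(key, []).append((self.current, location))

    def get(self, key: str, position: Optional[int] = None) -> Optional[bytes]:
        """Return artifact bytes at the given log position (default CURRENT), or None."""
        pos = self.current if position is None else position
        history = self.versions.get(key, [])
        # Index of the latest version whose log position is <= pos.
        i = bisect_right([p for p, _ in history], pos)
        if i == 0 or history[i - 1][1] is None:
            return None  # NOT_FOUND: no entry, or shadowed by a tombstone
        # Read each extent verbatim and concatenate; no state is modified.
        return b"".join(self.blocks[b][o:o + n] for b, o, n in history[i - 1][1])
```

Reading at an older position simply selects an older version, so GET never needs to mutate the store to answer historical queries.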
### 4.6 Failure Semantics
* Partial writes MUST NOT become visible.
* Replay of snapshot + log after crash MUST reconstruct a valid CURRENT.
* Implementations MAY use caching, but MUST preserve determinism.
---
## 5. Block Lifecycle Semantics
| Event | Description | Semantic Guarantees |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
Notes:
* Sealing guarantees that the bytes any index entry references can never change.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.
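The lifecycle table can be read as a small state machine. The Python sketch below is a hypothetical illustration: the `BlockState` enum, the `Block` class, and the `pins` counter (standing in for references from snapshots or CURRENT) are assumptions, not a prescribed API.

```python
from enum import Enum


class BlockState(Enum):
    OPEN = "open"  # created; bytes may still be written
    SEALED = "sealed"  # immutable; safe to reference from the index
    DELETED = "deleted"  # reclaimed by GC


class Block:
    def __init__(self, block_id: str) -> None:
        self.block_id = block_id
        self.state = BlockState.OPEN
        self._data = bytearray()
        self.pins = 0  # references from snapshots or CURRENT

    def write(self, chunk: bytes) -> None:
        if self.state is not BlockState.OPEN:
            raise RuntimeError("sealed blocks are immutable")
        self._data += chunk

    def seal(self) -> bytes:
        if self.state is not BlockState.OPEN:
            raise RuntimeError("block is not open")
        self.state = BlockState.SEALED
        return bytes(self._data)  # stable contents that index entries may reference

    def collect(self) -> None:
        # GC may only remove sealed blocks that nothing pins.
        if self.state is not BlockState.SEALED or self.pins > 0:
            raise RuntimeError("block is not eligible for GC")
        self.state = BlockState.DELETED
        self._data = bytearray()
```

Rejecting writes after `seal()` and rejecting `collect()` while pinned are the two checks that carry the table's guarantees in this model.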
---
## 6. Segment Lifecycle Semantics
### 6.1 Creation
* Open segment is allocated.
* Index entries appended in log order.
* Entries are invisible until segment seal and log append.
### 6.2 Seal
* Segment is closed to append.
* Seal record is written to append-only log.
* Segment becomes visible for lookup.
* Sealed segment may be snapshot-pinned.
### 6.3 Snapshot Interaction
* Snapshots capture sealed segments.
* Open segments are not captured by snapshots and need not survive them.
* Sealed segments captured by a snapshot serve as replay anchors.
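A Python sketch of the segment lifecycle, assuming a shared in-memory list as the append-only log and 1-based log positions. `Segment`, `seal`, and `visible_entries` are hypothetical names; the point is that entries accumulate while the segment is open and become eligible for lookup only once the seal record is in the log at or below CURRENT.

```python
from typing import List, Optional, Tuple


class Segment:
    def __init__(self, segment_id: str) -> None:
        self.segment_id = segment_id
        self.entries: List[Tuple[str, str]] = []  # (ArtifactKey, location), in log order
        self.seal_position: Optional[int] = None  # log position of the seal record

    @property
    def sealed(self) -> bool:
        return self.seal_position is not None

    def append(self, key: str, location: str) -> None:
        if self.sealed:
            raise RuntimeError("sealed segments are closed to append")
        self.entries.append((key, location))

    def seal(self, log: List[Tuple[str, str]]) -> int:
        if self.sealed:
            raise RuntimeError("segment already sealed")
        log.append(("seal", self.segment_id))  # seal record goes to the append-only log
        self.seal_position = len(log)  # 1-based position of the seal record
        return self.seal_position

    def visible_entries(self, current: int) -> List[Tuple[str, str]]:
        # Nothing is visible until the seal record is at or below CURRENT.
        if self.seal_position is None or self.seal_position > current:
            return []
        return list(self.entries)
```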
---
## 7. Visibility and Lookup Semantics
### 7.1 Visibility Rules
* An entry is visible **iff** all of the following hold:
  * Every block referenced by the entry is sealed.
  * The entry's log record is at a position ≤ CURRENT.
  * The seal record of the entry's segment is in the log at a position ≤ CURRENT.
* Entries above CURRENT or referencing unsealed blocks are invisible.
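The three conditions combine into a single predicate. In this Python sketch, `EntryMeta` and `is_visible` are hypothetical, log positions are plain integers, and the segment seal is required to fall at or below CURRENT because the log is only read up to CURRENT.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class EntryMeta:
    log_position: int  # position of the entry's own log record
    segment_seal_position: Optional[int]  # None while the segment is still open
    block_ids: FrozenSet[str]  # blocks referenced by the entry's location


def is_visible(entry: EntryMeta, current: int, sealed_blocks: Set[str]) -> bool:
    return (
        entry.log_position <= current
        and entry.segment_seal_position is not None
        and entry.segment_seal_position <= current
        and entry.block_ids <= sealed_blocks  # every referenced block is sealed
    )
```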
### 7.2 Lookup Semantics
To resolve an `ArtifactKey`:
1. Identify all visible segments, i.e. those whose seal record is at a log position ≤ CURRENT.
2. Search segments in **reverse seal-log order** (highest seal log position first).
3. Return the first matching entry.
4. If that entry is a tombstone, the key resolves to `NOT_FOUND`; tombstones shadow all prior entries.
Determinism:
* Lookup results are identical across platforms given the same snapshot and log prefix.
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
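The lookup order can be sketched in Python as follows. Assumptions: each sealed segment is modeled as a `(seal_log_position, entries)` pair, entries within a segment are already collapsed to one value per key, a `None` value is a tombstone, and `None` is also returned for `NOT_FOUND`. An accelerator such as a bloom filter could only skip segments that cannot contain the key; it would not change which entry is found first.

```python
from typing import Dict, List, Optional, Tuple

# A sealed segment: (seal_log_position, {ArtifactKey: location, or None for a tombstone})
SealedSegment = Tuple[int, Dict[str, Optional[str]]]


def lookup(segments: List[SealedSegment], key: str, current: int) -> Optional[str]:
    """Return the location for key at CURRENT, or None for NOT_FOUND."""
    visible = [s for s in segments if s[0] <= current]
    # Highest seal log position first, so newer segments shadow older ones.
    for _, entries in sorted(visible, key=lambda s: s[0], reverse=True):
        if key in entries:
            return entries[key]  # a tombstone (None) shadows every older entry
    return None
```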
---
## 8. Snapshot Interaction
* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:
```
CURRENT = snapshot_state + replay(log)
```
Segment and block visibility rules:
| Entity | Visible in snapshot | Visible in CURRENT |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
---
## 9. Garbage Collection
Eligibility for GC:
* Segments: sealed, no references from CURRENT or snapshots.
* Blocks: unpinned, unreferenced by any segment or artifact.
Rules:
* GC is safe **only on sealed segments and blocks**.
* Must respect snapshot pins.
* Tombstones may aid in invalidating unreachable blocks.
* Snapshots retained for provenance or receipt verification MUST remain pinned.
Outcome:
* GC never violates CURRENT reconstruction.
* Blocks can be reclaimed without breaking provenance.
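GC eligibility for blocks can be sketched as a mark phase over every retained index state. This Python sketch assumes each retained view (CURRENT plus every pinned snapshot, including those kept for provenance or receipt verification) is given as a mapping from `ArtifactKey` to a list of `(BlockID, offset, length)` extents; `reclaimable_blocks` is a hypothetical helper. A packed block stays live as long as any single extent points into it, which matches the rule in §11.5.

```python
from typing import Dict, Iterable, List, Set, Tuple

Extent = Tuple[str, int, int]  # (BlockID, offset, length)


def reclaimable_blocks(
    sealed_blocks: Set[str],
    retained_views: Iterable[Dict[str, List[Extent]]],
) -> Set[str]:
    """Sealed blocks not referenced by CURRENT or by any pinned snapshot."""
    live: Set[str] = set()
    for view in retained_views:  # CURRENT plus every pinned snapshot
        for location in view.values():
            for block_id, _offset, _length in location:
                live.add(block_id)
    # Only sealed blocks are candidates; open blocks are never considered here.
    return sealed_blocks - live
```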
---
## 10. Tombstone Semantics
* Optional marker to invalidate prior mappings.
* Visibility rules identical to regular index entries.
* Used to maintain a deterministic CURRENT in the face of shadowing or deletions.
---
## 11. Small vs Large Block Handling
### 11.1 Definitions
| Term | Meaning |
| ----------------- | --------------------------------------------------------------------- |
| **Small block** | Block containing artifact bytes below a threshold `T_small`. |
| **Large block** | Block containing artifact bytes ≥ `T_small`. |
| **Mixed segment** | Segment containing both small and large blocks (discouraged). |
| **Packing** | Combining multiple small artifacts into a single physical block. |
| **BlockID** | Opaque identifier for a block; addressing is identical for all sizes. |
Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.
`T_small` is configurable per deployment.
### 11.2 Packing Rules
1. **Small artifacts may be packed together** into a single block to reduce storage overhead.
2. **Large blocks are never packed with other artifacts**.
3. Mixed segments are **allowed but discouraged**; implementations MAY warn when mixing occurs.
### 11.3 Segment Allocation Rules
1. Small blocks are allocated into segments optimized for packing efficiency.
2. Large blocks are allocated into segments optimized for sequential I/O.
3. Segment sealing and visibility rules remain unchanged.
### 11.4 Indexing and Addressing
All blocks are addressed uniformly:
```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```
Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
### 11.5 GC and Retention
1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
2. Large blocks are reclaimed per block.
Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
---
## 12. Crash and Recovery Semantics
* Open segments or unsealed blocks may be lost; no invariant is broken.
* Recovery procedure:
  1. Mount the last checkpoint snapshot.
  2. Replay the append-only log from the checkpoint.
  3. Reconstruct CURRENT.
* Recovery is **deterministic and idempotent**.
* Segments and blocks are **never partially visible** after a crash.
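The recovery procedure and the no-partial-visibility rule can be sketched together: replay buffers index additions per segment and commits them only when that segment's seal record is replayed. Assumptions for this Python sketch: hypothetical record shapes `("add", segment_id, key, location)` and `("seal", segment_id)`, a checkpoint given as `(log_position, index_dict)` taken only when no segment is open, and tombstones omitted for brevity. `recover` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# ("add", segment_id, key, location) or ("seal", segment_id)
Record = Tuple[str, ...]


def recover(checkpoint: Tuple[int, Dict[str, str]], log: List[Record]) -> Dict[str, str]:
    """Rebuild CURRENT from the last checkpoint plus the log suffix after it."""
    start, snapshot_index = checkpoint
    index = dict(snapshot_index)  # the checkpoint itself is left untouched
    pending: Dict[str, List[Tuple[str, str]]] = {}  # segment_id -> buffered entries
    for record in log[start:]:
        if record[0] == "add":
            _, segment_id, key, location = record
            pending.setdefault(segment_id, []).append((key, location))
        elif record[0] == "seal":
            # A seal makes the whole segment visible at once, in log order.
            for key, location in pending.pop(record[1], []):
                index[key] = location
    # Anything still pending belonged to a segment that was never sealed; drop it.
    return index
```

Since `recover` is a pure function of the checkpoint and the log, running it again after a second crash yields the same CURRENT.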
---
## 13. Normative Invariants
1. Sealed blocks are immutable.
2. Index entries referencing blocks are immutable once visible.
3. Shadowing follows strict log order.
4. Replay of snapshot + log uniquely reconstructs CURRENT.
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.
---
## 14. Non-Goals
* Disk-level encoding (ENC-ASL-CORE-INDEX).
* Memory layout or caching.
* Sharding or performance heuristics.
* Federation / multi-domain semantics (handled elsewhere).
* Block packing strategies beyond the policy rules here.
---
## 15. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
---
## 16. Summary
The tier1 ASL-STORE-INDEX specification:
* Defines **block lifecycle** and **segment lifecycle**.
* Makes **snapshot identity and log positions** explicit for replay.
* Ensures deterministic visibility, lookup, and crash recovery.
* Formalizes GC safety and tombstone behavior.
* Adds clear **small vs large block** handling without changing core semantics.