amduat-api/tier1/asl-store-index.md

# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Core Index (Tier1)

---

## 1. Purpose

This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.

It specifies:

* **Block lifecycle**: creation, sealing, retention, GC
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot identity and log positions** for deterministic replay
* **Append-only log semantics**
* **Lookup, visibility, and crash recovery rules**
* **Small vs large block handling**

It **does not define encoding** (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`) or semantic mapping (see ASL/1-CORE-INDEX).

**Informative references:**

* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)

---

## 2. Scope

Covers:

* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
* Packing policy for small vs large artifacts

Excludes:

* Disk-level encoding
* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
* Memory residency or caching
* Federation or PEL semantics

---

## 3. Core Concepts

### 3.1 Block

* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique).
* **Properties:**

  * Once sealed, contents never change.
  * Can be referenced by multiple artifacts.
  * May be pinned by snapshots for retention.

### 3.2 Index Segment

Segments group index entries and provide **persistence and recovery units**.

* **Open segment:** accepting new index entries, not visible for lookup.
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
* **Segment components:** header, optional bloom filter, index records, footer.
* **Segment visibility:** only after seal and log append.

### 3.3 Append-Only Log

All store-visible mutations are recorded in a **strictly ordered, append-only log**:

* Entries include:

  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT.
* Log semantics are defined in `ASL/LOG/1`.

### 3.4 Snapshot Identity and Log Position

To make CURRENT referenceable and replayable, ASL-STORE-INDEX defines:

* **SnapshotID**: opaque, immutable identifier for a snapshot.
* **LogPosition**: monotonic integer position in the append-only log.
* **IndexState**: `(SnapshotID, LogPosition)`.

Deterministic replay is defined as:

```
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
```

Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
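
The replay rule above can be illustrated with a minimal Python sketch. The names `IndexState`, `LogRecord`, `replay`, and `index_at` are hypothetical, the index is simplified to a dictionary from key to an opaque location string, and the log is assumed to hold only records appended after the snapshot was taken. Log semantics proper are defined in `ASL/LOG/1`, not here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class IndexState:
    """Hypothetical (SnapshotID, LogPosition) pair."""
    snapshot_id: str
    log_position: int  # number of log records to replay on top of the snapshot


@dataclass(frozen=True)
class LogRecord:
    """Simplified log record; real records are defined by ASL/LOG/1."""
    kind: str                       # "add" or "tombstone"
    key: str
    location: Optional[str] = None  # opaque location, unused for tombstones


def replay(base: Dict[str, str], records: List[LogRecord]) -> Dict[str, str]:
    """Apply records in strict log order to a copy of the base state."""
    state = dict(base)
    for rec in records:
        if rec.kind == "add":
            state[rec.key] = rec.location
        elif rec.kind == "tombstone":
            state.pop(rec.key, None)
    return state


def index_at(snapshots: Dict[str, Dict[str, str]],
             log: List[LogRecord],
             state: IndexState) -> Dict[str, str]:
    """Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])."""
    return replay(snapshots[state.snapshot_id], log[0:state.log_position])
```

Because `replay` is a pure function of the snapshot state and the log prefix, calling `index_at` twice with the same `IndexState` yields the same result, which is the property checkpointing and recovery rely on.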

### 3.5 Artifact Location

* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* Multi-extent locations allow a single artifact to be striped across multiple blocks.
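
As an illustration of how an `ArtifactLocation` resolves to bytes, the following Python sketch concatenates extents in order. The in-memory `blocks` dictionary and the function name `read_artifact` are assumptions made for the example; a real store would read from sealed blocks on disk.

```python
from typing import Dict, List, NamedTuple


class ArtifactExtent(NamedTuple):
    block_id: str  # stands in for the opaque BlockID
    offset: int
    length: int


ArtifactLocation = List[ArtifactExtent]


def read_artifact(blocks: Dict[str, bytes], location: ArtifactLocation) -> bytes:
    """Concatenate the byte slices named by each extent, in list order."""
    parts = []
    for ext in location:
        data = blocks[ext.block_id]
        end = ext.offset + ext.length
        if ext.offset < 0 or ext.length < 0 or end > len(data):
            raise ValueError(f"extent out of bounds for block {ext.block_id!r}")
        parts.append(data[ext.offset:end])
    return b"".join(parts)
```

Extent order matters: swapping two extents in the list produces different artifact bytes, so the list must be stored and compared as an ordered sequence.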

---

## 4. Block Lifecycle Semantics

| Event              | Description                           | Semantic Guarantees                                           |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation           | Block allocated; bytes may be written | Not visible to index until sealed                             |
| Sealing            | Block is finalized and immutable      | Sealed blocks are stable and safe to reference from index     |
| Retention          | Block remains accessible              | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted                  | Only unpinned, unreachable blocks may be removed              |

Notes:

* Sealing ensures any index entry referencing the block is immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.
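
The lifecycle in the table can be read as a small state machine. The sketch below is one possible in-memory model, not a prescribed implementation; `BlockState`, `Block`, and the `pinned` / `reachable` flags are hypothetical names, and how a store computes pinning and reachability is outside this example.

```python
from enum import Enum


class BlockState(Enum):
    OPEN = "open"        # allocated, bytes may be written, invisible to the index
    SEALED = "sealed"    # immutable, safe to reference from the index
    DELETED = "deleted"  # reclaimed by GC


class Block:
    def __init__(self, block_id: str):
        self.block_id = block_id
        self.state = BlockState.OPEN
        self._data = bytearray()

    def write(self, data: bytes) -> int:
        """Append bytes to an open block and return the offset they start at."""
        if self.state is not BlockState.OPEN:
            raise RuntimeError("writes are only allowed before sealing")
        offset = len(self._data)
        self._data += data
        return offset

    def seal(self) -> None:
        """Finalize the block; contents never change afterwards."""
        if self.state is not BlockState.OPEN:
            raise RuntimeError("only an open block can be sealed")
        self._data = bytes(self._data)  # freeze contents
        self.state = BlockState.SEALED

    def collect(self, pinned: bool, reachable: bool) -> None:
        """Delete the block; permitted only if sealed, unpinned, and unreachable."""
        if self.state is not BlockState.SEALED or pinned or reachable:
            raise RuntimeError("block is not eligible for GC")
        self.state = BlockState.DELETED
```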

---

## 5. Segment Lifecycle Semantics

### 5.1 Creation

* Open segment is allocated.
* Index entries appended in log order.
* Entries are invisible until segment seal and log append.

### 5.2 Seal

* Segment is closed to append.
* Seal record is written to append-only log.
* Segment becomes visible for lookup.
* Sealed segment may be snapshot-pinned.
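
A minimal sketch of the creation and seal steps, under stated assumptions: the log is a plain Python list, a record's log position is the log length after the record is appended (1-based), and `Segment`, `seal_segment`, and `is_visible` are hypothetical names.

```python
from typing import List, Optional, Tuple


class Segment:
    def __init__(self, segment_id: str):
        self.segment_id = segment_id
        self.entries: List[Tuple[str, str]] = []   # (key, opaque location)
        self.sealed = False
        self.seal_position: Optional[int] = None   # log position of the seal record

    def append(self, key: str, location: str) -> None:
        """Add an index entry; only possible while the segment is open."""
        if self.sealed:
            raise RuntimeError("segment is closed to append")
        self.entries.append((key, location))


def seal_segment(segment: Segment, log: list) -> int:
    """Close the segment and write its seal record to the append-only log."""
    if segment.sealed:
        raise RuntimeError("segment is already sealed")
    segment.sealed = True
    log.append(("seal", segment.segment_id))
    segment.seal_position = len(log)  # assumed 1-based position of the seal record
    return segment.seal_position


def is_visible(segment: Segment, current: int) -> bool:
    """A segment is visible only after seal and log append, at or below CURRENT."""
    return segment.seal_position is not None and segment.seal_position <= current
```

In this model an open segment has no `seal_position`, so no value of CURRENT can make its entries visible.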

### 5.3 Snapshot Interaction

* Snapshots capture sealed segments.
* Open segments need not survive snapshot.
* Sealed segments at or below the snapshot's log position serve as replay anchors.

---

## 6. Visibility and Lookup Semantics

### 6.1 Visibility Rules

* An entry is visible **iff** all of the following hold:

  * The block it references is sealed.
  * Its log record exists at a position ≤ CURRENT.
  * The seal of its containing segment is recorded in the log.

* Entries above CURRENT, or entries referencing unsealed blocks, are invisible.
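
The visibility rule can be expressed as a single predicate. This sketch assumes a hypothetical `EntryView` record carrying the three facts the rule depends on, and it additionally assumes that the segment seal record must itself sit at or below CURRENT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntryView:
    """Hypothetical summary of what the store knows about one index entry."""
    block_sealed: bool
    log_position: Optional[int]           # None if no log record exists yet
    segment_seal_position: Optional[int]  # None if the segment is still open


def entry_visible(entry: EntryView, current: int) -> bool:
    """Return True only when every visibility condition holds."""
    if not entry.block_sealed:
        return False
    if entry.log_position is None or entry.log_position > current:
        return False
    # Assumption: the seal record must also be within the CURRENT prefix.
    if entry.segment_seal_position is None or entry.segment_seal_position > current:
        return False
    return True
```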

### 6.2 Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT.
2. Search segments in **reverse seal-log order** (highest seal log position first).
3. Return the first matching entry.
4. If the first match is a tombstone, treat the key as absent; tombstones shadow all prior entries.

Determinism:

* Lookup results are identical across platforms given the same snapshot and log prefix.
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
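
The lookup steps can be sketched as follows. This is a simplified model, not the normative algorithm: each sealed segment is assumed to hold at most one entry per key, `TOMBSTONE` is a hypothetical sentinel, and an absent key is reported as `None`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

TOMBSTONE = object()  # hypothetical sentinel marking a tombstone entry


@dataclass
class SealedSegment:
    seal_position: int           # log position of the segment's seal record
    entries: Dict[str, object]   # key -> opaque location, or TOMBSTONE


def lookup(segments: List[SealedSegment], key: str, current: int) -> Optional[object]:
    """Resolve a key against segments visible at or below CURRENT."""
    visible = [s for s in segments if s.seal_position <= current]
    # Reverse seal-log order: highest seal position first.
    for seg in sorted(visible, key=lambda s: s.seal_position, reverse=True):
        if key in seg.entries:
            value = seg.entries[key]
            # A tombstone shadows every older entry for the same key.
            return None if value is TOMBSTONE else value
    return None
```

An accelerator such as a bloom filter would only let the loop skip segments that cannot contain the key; it must return the same result as this linear scan.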

---

## 7. Snapshot Interaction

* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:

```
CURRENT = snapshot_state + replay(log)
```

Segment and block visibility rules:

| Entity               | Visible in snapshot          | Visible in CURRENT             |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block   | No                           | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log         |
| Tombstone            | Yes, if log-recorded         | Yes, shadows prior entries     |

---

## 8. Garbage Collection

Eligibility for GC:

* Segments: sealed, no references from CURRENT or snapshots.
* Blocks: unpinned, unreferenced by any segment or artifact.

Rules:

* GC is safe **only on sealed segments and blocks**.
* Must respect snapshot pins.
* Tombstones may make blocks unreachable, and therefore eligible for GC, by shadowing the entries that referenced them.

Outcome:

* GC never violates CURRENT reconstruction.
* Blocks can be reclaimed without breaking provenance.
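
Block eligibility reduces to set arithmetic once the references are known. The sketch below assumes the store can already enumerate sealed blocks, blocks referenced from CURRENT, and the blocks pinned by each live snapshot; `gc_candidates` and its parameters are hypothetical names.

```python
from typing import Dict, Set


def gc_candidates(sealed_blocks: Set[str],
                  current_refs: Set[str],
                  snapshot_pins: Dict[str, Set[str]]) -> Set[str]:
    """Return sealed blocks that are neither referenced by CURRENT nor pinned."""
    pinned = set().union(*snapshot_pins.values())
    # Unsealed blocks are never in sealed_blocks, so they are never candidates.
    return sealed_blocks - current_refs - pinned
```

When a snapshot expires, removing its entry from `snapshot_pins` is what allows the blocks it pinned to appear in the candidate set.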

---

## 9. Tombstone Semantics

* Optional marker to invalidate prior mappings.
* Visibility rules identical to regular index entries.
* Used to maintain deterministic CURRENT in the face of shadowing or deletions.

---

## 10. Small vs Large Block Handling

### 10.1 Definitions

| Term              | Meaning                                                               |
| ----------------- | --------------------------------------------------------------------- |
| **Small block**   | Block containing artifact bytes below a threshold `T_small`.          |
| **Large block**   | Block containing artifact bytes ≥ `T_small`.                          |
| **Mixed segment** | Segment containing both small and large blocks (discouraged).         |
| **Packing**       | Combining multiple small artifacts into a single physical block.      |

Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.

### 10.2 Packing Rules

1. **Small artifacts may be packed together** into a single block to reduce storage overhead.
2. **Large blocks are never packed with other artifacts**.
3. Mixed segments are **allowed but discouraged**; index semantics remain identical.

### 10.3 Segment Allocation Rules

1. Small blocks are allocated into segments optimized for packing efficiency.
2. Large blocks are allocated into segments optimized for sequential I/O.
3. Segment sealing and visibility rules remain unchanged.

### 10.4 Indexing and Addressing

All blocks are addressed uniformly:

```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```

Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
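
To show that packing changes only where bytes land and not how they are addressed, here is a small sketch. The threshold value, the block naming scheme, and the single shared packed block are all assumptions for illustration; the specification leaves `T_small` and the packing strategy to the store.

```python
from typing import Dict, List, Tuple

T_SMALL = 8  # hypothetical threshold in bytes

Extent = Tuple[str, int, int]  # (BlockID, offset, length)


def pack(artifacts: Dict[str, bytes],
         t_small: int = T_SMALL) -> Tuple[Dict[str, bytes], Dict[str, List[Extent]]]:
    """Pack small artifacts into one block; give each large artifact its own block."""
    blocks: Dict[str, bytes] = {}
    index: Dict[str, List[Extent]] = {}
    packed = bytearray()
    packed_id = "blk-packed"
    n_large = 0
    for key, data in artifacts.items():
        if len(data) < t_small:
            # Small artifact: record its slice inside the shared block.
            index[key] = [(packed_id, len(packed), len(data))]
            packed += data
        else:
            # Large artifact: never shares a block with other artifacts.
            block_id = f"blk-large-{n_large}"
            n_large += 1
            blocks[block_id] = data
            index[key] = [(block_id, 0, len(data))]
    if packed:
        blocks[packed_id] = bytes(packed)
    return blocks, index
```

Both kinds of artifact end up with the same `(BlockID, offset, length)` addressing, so a reader does not need to know whether a block is packed.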

### 10.5 GC and Retention

1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
2. Large blocks are reclaimed per block.

Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
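
The rule for packed blocks can be sketched as a check over the artifacts each block contains. The mapping `block_contents` (block to the keys of the artifacts stored in it) and the function name are hypothetical; a large block is simply the case where that set has one member.

```python
from typing import Dict, Set


def reclaimable_blocks(block_contents: Dict[str, Set[str]],
                       reachable: Set[str]) -> Set[str]:
    """Return blocks in which no contained artifact is still reachable."""
    return {
        block_id
        for block_id, keys in block_contents.items()
        if not (keys & reachable)  # one reachable artifact keeps the whole block
    }
```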

---

## 11. Crash and Recovery Semantics

* Open segments or unsealed blocks may be lost; no invariant is broken.
* Recovery procedure:

  1. Mount last checkpoint snapshot.
  2. Replay append-only log from checkpoint.
  3. Reconstruct CURRENT.

* Recovery is **deterministic and idempotent**.
* Segments and blocks are **never partially visible** after a crash.
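
The recovery procedure can be sketched under a deliberately simple model. It assumes log records are tuples of the form `("add", segment_id, key, location)` or `("seal", segment_id)`, that entries become visible only when their segment's seal record is replayed, and that `checkpoint_position` counts the records already reflected in the checkpoint. None of these shapes come from the specification.

```python
from typing import Dict, List, Tuple


def recover(checkpoint: Dict[str, str],
            checkpoint_position: int,
            log: List[tuple]) -> Dict[str, str]:
    """Rebuild CURRENT from a checkpoint snapshot plus the log tail."""
    state = dict(checkpoint)  # step 1: mount the checkpoint (copy, never mutate)
    pending: Dict[str, List[Tuple[str, str]]] = {}
    for rec in log[checkpoint_position:]:  # step 2: replay from the checkpoint
        if rec[0] == "add":
            _, segment_id, key, location = rec
            pending.setdefault(segment_id, []).append((key, location))
        elif rec[0] == "seal":
            _, segment_id = rec
            # All entries of a segment become visible together, at its seal.
            for key, location in pending.pop(segment_id, []):
                state[key] = location
    # Entries of segments that never sealed are dropped, not partially applied.
    return state  # step 3: reconstructed CURRENT
```

Running `recover` again on the same checkpoint and log returns an equal state, since the function reads its inputs without modifying them.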

---

## 12. Normative Invariants

1. Sealed blocks are immutable.
2. Index entries referencing blocks are immutable once visible.
3. Shadowing follows strict log order.
4. Replay of snapshot + log uniquely reconstructs CURRENT.
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.

---

## 13. Non-Goals

* Disk-level encoding (ENC-ASL-CORE-INDEX).
* Memory layout or caching.
* Sharding or performance heuristics.
* Federation / multi-domain semantics (handled elsewhere).
* Block packing strategies beyond the policy rules here.

---

## 14. Relationship to Other Layers

| Layer              | Responsibility                                                               |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |

---

## 15. Summary

The tier1 ASL-STORE-INDEX specification:

* Defines **block lifecycle** and **segment lifecycle**.
* Makes **snapshot identity and log positions** explicit for replay.
* Ensures deterministic visibility, lookup, and crash recovery.
* Formalizes GC safety and tombstone behavior.
* Adds clear **small vs large block** handling without changing core semantics.