# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Index

---

## 1. Purpose

This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.

It bridges the gap between **index meaning** and **physical storage**, ensuring:

* Deterministic replay
* Snapshot-aware visibility
* Immutable block guarantees
* Idempotent recovery
* Correctness of CURRENT state

It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).

---

## 2. Scope

This specification covers:

* Index segment lifecycle
* Interaction between index and ASL blocks
* Append-only log semantics
* Snapshot integration
* Visibility and lookup rules
* Crash safety and recovery
* Garbage collection constraints

It does **not** cover:

* Disk format details
* Bloom filter algorithms
* File system specifics
* Placement heuristics beyond semantic guarantees

---

## 3. Core Concepts

### 3.1 Index Segment

A **segment** is a contiguous set of index entries written by the store.

* Open while accepting new entries
* Sealed when closed for append
* Sealed segments are immutable
* Sealed segments become **snapshot-visible only after their seal record is appended to the log**

Segments are the **unit of persistence, replay, and GC**.

---

### 3.2 ASL Block Relationship

Each index entry references a **sealed block** via:

```
ArtifactKey → (BlockID, offset, length)
```

* The store must ensure the block is sealed before the entry becomes log-visible
* Blocks are immutable after seal
* Open blocks may be abandoned without violating invariants

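
The mapping and the seal-before-visibility rule can be sketched as follows. This is a minimal in-memory illustration, not the store's actual API: `ArtifactLocation`, `Block`, `publish_entry`, and `read_artifact` are hypothetical names, and a real block would live in persistent storage rather than in a `bytearray`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArtifactLocation:
    """Where an artifact's bytes live: (BlockID, offset, length)."""
    block_id: str
    offset: int
    length: int


@dataclass
class Block:
    """Hypothetical in-memory block used only for illustration."""
    block_id: str
    data: bytearray = field(default_factory=bytearray)
    sealed: bool = False

    def append(self, payload: bytes) -> ArtifactLocation:
        # Writes are only allowed while the block is still open.
        if self.sealed:
            raise ValueError("sealed blocks are immutable")
        offset = len(self.data)
        self.data += payload
        return ArtifactLocation(self.block_id, offset, len(payload))

    def seal(self) -> None:
        self.sealed = True


def publish_entry(index: dict, key: str, loc: ArtifactLocation, block: Block) -> None:
    """Make an entry visible only if the block it references is already sealed."""
    if block.block_id != loc.block_id:
        raise ValueError("location does not reference this block")
    if not block.sealed:
        raise ValueError("block must be sealed before the entry becomes visible")
    index[key] = loc


def read_artifact(block: Block, loc: ArtifactLocation) -> bytes:
    """Resolve a location to the bytes it points at."""
    return bytes(block.data[loc.offset:loc.offset + loc.length])
```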

---

### 3.3 Append-Only Log

All store-visible mutations are recorded in a **strictly ordered, append-only log**:

* Entries include index additions, tombstones, and segment seals
* The log is durable and replayable
* The log defines visibility for changes made after the checkpoint snapshot

**CURRENT state** is derived as:

```
CURRENT = checkpoint_state + replay(log)
```
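
A minimal sketch of this derivation is shown below. The `LogRecord` shape and the `replay` function are assumptions made for illustration, and segment seal records are left out for brevity, so every record is treated as already visible. The point is that the result depends only on the checkpoint and the ordered log, which is what makes replay deterministic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record; kind is 'add' or 'tombstone'."""
    seq: int
    kind: str
    key: str
    location: Optional[tuple] = None  # (block_id, offset, length) for 'add'


def replay(checkpoint_state: dict, log: Iterable[LogRecord]) -> dict:
    """Derive CURRENT by applying log records, in log order, on top of the checkpoint."""
    current = dict(checkpoint_state)  # copy: the checkpoint itself is never mutated
    last_seq = None
    for rec in log:
        # The log is strictly ordered; reject anything that is not.
        if last_seq is not None and rec.seq <= last_seq:
            raise ValueError("log must be strictly ordered")
        last_seq = rec.seq
        if rec.kind == "add":
            current[rec.key] = rec.location
        elif rec.kind == "tombstone":
            current.pop(rec.key, None)
        else:
            raise ValueError(f"unknown record kind: {rec.kind}")
    return current
```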

---

## 4. Segment Lifecycle

### 4.1 Creation

* Open segment is allocated
* Index entries appended in log order
* Entries are invisible until segment seal and log append

### 4.2 Seal

* Segment is closed to append
* Seal record is written to append-only log
* Segment becomes visible for lookup
* Sealed segment may be snapshot-pinned

### 4.3 Snapshot Interaction

* Snapshots capture sealed segments
* Open segments need not survive snapshot
* Segments at or below the snapshot serve as replay anchors; only log records after it need replaying

### 4.4 Garbage Collection

* Only **sealed and unreachable segments** can be deleted
* GC operates at segment granularity
* GC must not break CURRENT or violate invariants

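
The lifecycle in this section can be read as a small state machine. The sketch below is one possible encoding, assuming hypothetical names (`SegmentState`, `Segment`, `gc_eligible`) and modelling the log as a plain Python list. It separates "sealed" from "visible" to make explicit that a segment only becomes visible once its seal record is in the log.

```python
from enum import Enum


class SegmentState(Enum):
    OPEN = "open"        # accepting entries, invisible to lookup
    SEALED = "sealed"    # closed for append, seal not yet logged
    VISIBLE = "visible"  # seal record is in the log


class Segment:
    def __init__(self, segment_id: str) -> None:
        self.segment_id = segment_id
        self.state = SegmentState.OPEN
        self.entries: list = []

    def append(self, entry) -> None:
        if self.state is not SegmentState.OPEN:
            raise ValueError("only open segments accept entries")
        self.entries.append(entry)

    def seal(self) -> None:
        if self.state is not SegmentState.OPEN:
            raise ValueError("segment is already sealed")
        self.state = SegmentState.SEALED

    def record_seal(self, log: list) -> None:
        """Append the seal record to the log; only then is the segment visible."""
        if self.state is not SegmentState.SEALED:
            raise ValueError("segment must be sealed before its seal is logged")
        log.append(("seal", self.segment_id))
        self.state = SegmentState.VISIBLE


def gc_eligible(segment: Segment, reachable_ids: set) -> bool:
    """A segment may be deleted only if it is sealed and unreachable."""
    is_sealed = segment.state is not SegmentState.OPEN
    return is_sealed and segment.segment_id not in reachable_ids
```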

---

## 5. Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return the first matching entry
4. Respect tombstone entries (if present)

Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.

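
The four steps can be sketched as a plain linear scan, which also serves as the reference behaviour any accelerated lookup has to match. The representation is an assumption: each visible segment is modelled as a `dict` holding at most one entry per key, and `TOMBSTONE` is a hypothetical sentinel.

```python
from typing import Optional, Sequence

TOMBSTONE = object()  # hypothetical sentinel marking an invalidated key


def lookup(key: str, visible_segments: Sequence[dict]) -> Optional[tuple]:
    """Resolve key against visible segments given in creation order (oldest first)."""
    for segment in reversed(visible_segments):  # newest first
        if key in segment:
            entry = segment[key]
            # The first match wins; a tombstone shadows every older mapping.
            return None if entry is TOMBSTONE else entry
    return None  # no visible segment mentions the key
```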

---

## 6. Visibility Guarantees

* An entry is visible **iff**:

  * The block it references is sealed
  * Its log record exists at or before CURRENT
  * The seal of its segment is recorded in the log
* Entries above CURRENT or referencing unsealed blocks are invisible

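
Expressed as a predicate, the rule is a conjunction of the three conditions. The sketch assumes that log positions are integers and that CURRENT can be identified by the position of its last applied record; `EntryStatus` and `is_visible` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryStatus:
    """Hypothetical facts the store knows about one index entry."""
    block_sealed: bool
    log_seq: int               # position of the entry's log record
    segment_seal_logged: bool  # seal of the owning segment is in the log


def is_visible(entry: EntryStatus, current_seq: int) -> bool:
    """An entry is visible iff all three conditions hold."""
    return (
        entry.block_sealed
        and entry.log_seq <= current_seq
        and entry.segment_seal_logged
    )
```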

---

## 7. Crash and Recovery Semantics

### 7.1 Crash During Open Segment

* Open segments may be lost
* Index entries may be leaked
* No sealed segment may be corrupted

### 7.2 Recovery Procedure

1. Mount latest checkpoint snapshot
2. Replay append-only log from checkpoint
3. Rebuild CURRENT
4. Resume normal operation

Recovery must be **deterministic and idempotent**.

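
One way to picture the idempotence requirement: recovery is a pure function of the checkpoint and the log, so running it again (for example after a crash during recovery itself) gives the same CURRENT. The checkpoint shape (`log_position`, `state`) and the `(seq, key, location)` record tuples below are assumptions for illustration, with `location=None` standing in for a tombstone.

```python
def recover(checkpoint: dict, log: list) -> dict:
    """Rebuild CURRENT from a checkpoint and an ordered log.

    checkpoint: {"log_position": int, "state": dict}  (hypothetical shape)
    log: ordered list of (seq, key, location); location None means tombstone
    """
    current = dict(checkpoint["state"])  # step 1: start from the checkpoint
    for seq, key, location in log:       # step 2: replay from the checkpoint
        if seq <= checkpoint["log_position"]:
            continue  # already reflected in the checkpoint
        if location is None:
            current.pop(key, None)
        else:
            current[key] = location
    return current                       # step 3: this is CURRENT
```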

---

## 8. Tombstone Semantics

* Optional: tombstones may exist to invalidate prior mappings
* Tombstones shadow prior entries with the same `ArtifactKey`
* Tombstone visibility follows same rules as regular entries

---

## 9. Invariants (Normative)

The store **must enforce**:

1. No segment visible without seal log record
2. No mutation of sealed segment or block
3. Shadowing follows log order strictly
4. Replay uniquely reconstructs CURRENT
5. GC does not remove segments referenced by snapshot or log
6. ArtifactLocation always points to immutable bytes

---

## 10. Non-Goals

ASL-STORE-INDEX does **not** define:

* Disk layout or encoding (ENC-ASL-CORE-INDEX)
* Placement heuristics (small vs. large block packing)
* Performance targets
* Memory caching strategies
* Federation or provenance mechanics

---

## 11. Relationship to Other Documents

| Layer              | Responsibility                                                       |
| ------------------ | -------------------------------------------------------------------- |
| ASL-CORE-INDEX     | Defines semantic meaning of mapping `ArtifactKey → ArtifactLocation` |
| ASL-STORE-INDEX    | Defines contracts for store to realize those semantics               |
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format                                         |

---

## 12. Summary

The store-index layer guarantees:

* Immutable, snapshot-safe segments
* Deterministic and idempotent replay
* Correct visibility semantics
* Safe crash recovery
* Garbage collection that preserves CURRENT and snapshot reachability

This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.

The following is a **refined version of ASL-STORE-INDEX** that incorporates **block lifecycle, sealing, snapshot safety, retention, and GC rules**, aligned with ASL-CORE-INDEX semantics.

---

# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Core Index (Refined)

---

## 1. Purpose

This document defines the **operational and store-level semantics** necessary to implement ASL-CORE-INDEX.

It specifies:

* **Block lifecycle**: creation, sealing, retention
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot interaction**: pinning, deterministic visibility
* **Append-only log semantics**
* **Garbage collection rules**

It **does not define encoding** (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).

---

## 2. Scope

Covers:

* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics

Excludes:

* Disk-level encoding
* Sharding strategies
* Bloom filters or acceleration structures
* Memory residency or caching
* Federation or PEL semantics

---

## 3. Core Concepts

### 3.1 Block

* **Definition:** Storage unit containing artifact bytes; immutable once sealed.
* **Identifier:** BlockID (opaque, unique)
* **Properties:**

  * Once sealed, contents never change
  * Can be referenced by multiple artifacts
  * May be pinned by snapshots for retention
* **Lifecycle Events:**

  1. Creation: block allocated but contents may still be written
  2. Sealing: block is finalized, immutable, and log-visible
  3. Retention: block remains accessible while pinned by snapshots or needed by CURRENT
  4. Garbage collection: block may be deleted if no longer referenced and unpinned

---

### 3.2 Index Segment

Segments group index entries and provide **persistence and recovery units**.

* **Open segment:** accepting new index entries, not visible for lookup
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable
* **Segment components:** header, optional bloom filter, index records, footer
* **Segment visibility:** only after seal and log append

---

### 3.3 Append-Only Log

All store operations affecting index visibility are recorded in a **strictly ordered, append-only log**:

* Entries include:

  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT
* Determinism: replay produces identical CURRENT from same snapshot and log prefix

---

## 4. Block Lifecycle Semantics

| Event              | Description                           | Semantic Guarantees                                           |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation           | Block allocated; bytes may be written | Not visible to index until sealed                             |
| Sealing            | Block is finalized and immutable      | Sealed blocks are stable and safe to reference from index     |
| Retention          | Block remains accessible              | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted                  | Only unpinned, unreachable blocks may be removed              |

**Notes:**

* Sealing ensures that any index entry referencing the block is deterministic and immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.

---

## 5. Snapshot Interaction

* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:

```
CURRENT = snapshot_state + replay(log)
```

* Segment and block visibility rules:

| Entity               | Visible in snapshot          | Visible in CURRENT             |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block   | No                           | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log         |
| Tombstone            | Yes, if log-recorded         | Yes, shadows prior entries     |

---

## 6. Index Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return first matching entry
4. Respect tombstones to shadow prior entries

Determinism:

* Lookup results are identical across platforms given the same snapshot and log prefix
* Accelerations (bloom filters, sharding, SIMD) do **not alter correctness**

---

## 7. Garbage Collection

* **Eligibility for GC:**

  * Segments: sealed, no references from CURRENT or snapshots
  * Blocks: unpinned, unreferenced by any segment or artifact
* **Rules:**

  * GC is safe **only on sealed segments and blocks**
  * Must respect snapshot pins
  * Tombstones may make blocks unreachable by shadowing the entries that reference them
* **Outcome:**

  * GC never violates CURRENT reconstruction
  * Blocks can be reclaimed without breaking provenance

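
Block eligibility amounts to a set difference: sealed blocks minus everything CURRENT references minus everything any live snapshot pins. The sketch below assumes the store can enumerate these sets; `collectable_blocks` and the `snapshot_pins` mapping (snapshot id to pinned block ids) are hypothetical.

```python
def collectable_blocks(sealed_blocks: set, current_refs: set, snapshot_pins: dict) -> set:
    """Return sealed blocks that CURRENT does not reference and no snapshot pins.

    snapshot_pins maps a snapshot id to the set of block ids that snapshot pins.
    """
    # Union of the pins of every live snapshot (empty when there are none).
    pinned = set().union(*snapshot_pins.values())
    return sealed_blocks - current_refs - pinned
```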

---

## 8. Tombstone Semantics

* Optional marker to invalidate prior mappings
* Visibility rules identical to regular index entries
* Used to maintain deterministic CURRENT in the face of shadowing or deletions

---

## 9. Crash and Recovery Semantics

* Open segments or unsealed blocks may be lost; no invariant is broken
* Recovery procedure:

  1. Mount last checkpoint snapshot
  2. Replay append-only log
  3. Reconstruct CURRENT
* Recovery is **deterministic and idempotent**
* Segments and blocks are **never partially visible** after a crash

---

## 10. Normative Invariants

1. Sealed blocks are immutable
2. Index entries referencing blocks are immutable once visible
3. Shadowing follows strict log order
4. Replay of snapshot + log uniquely reconstructs CURRENT
5. GC cannot remove blocks or segments needed by snapshot or CURRENT
6. Tombstones shadow prior entries without deleting underlying blocks prematurely

---

## 11. Non-Goals

* Disk-level encoding (ENC-ASL-CORE-INDEX)
* Memory layout or caching
* Sharding or performance heuristics
* Federation / multi-domain semantics (handled elsewhere)
* Block packing strategies (small vs large blocks)

---

## 12. Relationship to Other Layers

| Layer              | Responsibility                                                               |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |

---

## 13. Summary

The refined ASL-STORE-INDEX:

* Defines **block lifecycle**: creation, sealing, retention, GC
* Ensures **snapshot safety** and deterministic visibility
* Guarantees **immutable, replayable, and recoverable CURRENT**
* Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics