# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Index

---

## 1. Purpose

This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.

It bridges the gap between **index meaning** and **physical storage**, ensuring:

* Deterministic replay
* Snapshot-aware visibility
* Immutable block guarantees
* Idempotent recovery
* Correctness of CURRENT state

It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).

---

## 2. Scope

This specification covers:

* Index segment lifecycle
* Interaction between index and ASL blocks
* Append-only log semantics
* Snapshot integration
* Visibility and lookup rules
* Crash safety and recovery
* Garbage collection constraints

It does **not** cover:

* Disk format details
* Bloom filter algorithms
* File system specifics
* Placement heuristics beyond semantic guarantees

---

## 3. Core Concepts

### 3.1 Index Segment

A **segment** is a contiguous set of index entries written by the store.

* Open while accepting new entries
* Sealed when closed for append
* Sealed segments are immutable
* Sealed segments become **snapshot-visible only after their seal record is written to the log**

Segments are the **unit of persistence, replay, and GC**.

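
The open/sealed distinction can be pictured as a small state machine. The sketch below is illustrative only: the `Segment` class, its fields, and the shape of the seal record are assumptions made for this example, not part of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical in-memory model of an index segment."""
    segment_id: int
    entries: list = field(default_factory=list)
    sealed: bool = False

    def append(self, key: str, location: tuple) -> None:
        # Only open segments accept new entries.
        if self.sealed:
            raise ValueError("sealed segments are immutable")
        self.entries.append((key, location))

    def seal(self) -> tuple:
        # Close the segment for append and freeze its entries.
        self.sealed = True
        self.entries = tuple(self.entries)
        # Assumed record shape; the store would append this to the log.
        return ("seal", self.segment_id)


seg = Segment(segment_id=1)
seg.append("artifact-a", ("block-7", 0, 128))
seal_record = seg.seal()
```

In this model the segment only becomes a candidate for lookup once `seal_record` has been durably appended to the log; sealing alone is not enough.
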
---

### 3.2 ASL Block Relationship

Each index entry references a **sealed block** via:

```
ArtifactKey → (BlockID, offset, length)
```

* The store must ensure the block is sealed before the entry becomes log-visible
* Blocks are immutable after seal
* Open blocks may be abandoned without violating invariants

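
As a rough illustration of the mapping above, the following sketch resolves a location against an in-memory block table. The names `ArtifactLocation`, `read_artifact`, and the dictionary-based block store are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class ArtifactLocation(NamedTuple):
    block_id: str
    offset: int
    length: int


def read_artifact(blocks: dict, sealed: set, loc: ArtifactLocation) -> bytes:
    # An entry may only be resolved against a sealed (immutable) block.
    if loc.block_id not in sealed:
        raise LookupError(f"block {loc.block_id} is not sealed")
    data = blocks[loc.block_id]
    return data[loc.offset:loc.offset + loc.length]


blocks = {"b1": b"hello world", "b2": b"still being written"}
sealed = {"b1"}  # "b2" is still open
index = {"greeting": ArtifactLocation("b1", 0, 5)}
payload = read_artifact(blocks, sealed, index["greeting"])
```

Because a sealed block never changes, the bytes returned for a given location stay the same for as long as the block is retained.
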
---

### 3.3 Append-Only Log

All store-visible mutations are recorded in a **strictly ordered, append-only log**:

* Entries include index additions, tombstones, and segment seals
* Log is durable and replayable
* Log defines visibility above checkpoint snapshots

**CURRENT state** is derived as:

```
CURRENT = checkpoint_state + replay(log)
```

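
The derivation of CURRENT can be sketched as a fold over the log. This is a simplified model under stated assumptions: log records are `(kind, key, location)` tuples, state is a flat dictionary, and a tombstone is modelled by removing the key. None of these shapes are prescribed by the specification.

```python
def replay(checkpoint_state: dict, log: list) -> dict:
    """Derive CURRENT by applying log records, in order, on top of a checkpoint."""
    current = dict(checkpoint_state)  # never mutate the checkpoint itself
    for kind, key, location in log:
        if kind == "add":
            current[key] = location
        elif kind == "tombstone":
            current.pop(key, None)
        # "seal" records govern segment visibility and carry no key mapping
        # in this toy model, so they are ignored here.
    return current


checkpoint = {"a": ("b1", 0, 10)}
log = [
    ("add", "b", ("b2", 0, 4)),
    ("tombstone", "a", None),
    ("add", "a", ("b3", 8, 2)),
]
current = replay(checkpoint, log)
```

Replaying the same checkpoint and log prefix always yields the same dictionary, which is the determinism property the log is meant to provide.
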
---

## 4. Segment Lifecycle

### 4.1 Creation

* Open segment is allocated
* Index entries are appended in log order
* Entries remain invisible until the segment is sealed and the seal record is appended to the log

### 4.2 Seal

* Segment is closed to append
* Seal record is written to append-only log
* Segment becomes visible for lookup
* Sealed segment may be snapshot-pinned

### 4.3 Snapshot Interaction

* Snapshots capture sealed segments
* Open segments need not survive snapshot
* Segments below snapshot are replay anchors

### 4.4 Garbage Collection

* Only **sealed and unreachable segments** can be deleted
* GC operates at segment granularity
* GC must not break CURRENT or violate invariants

---

## 5. Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return the first matching entry
4. Respect tombstone entries (if present)

Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.

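
The four steps above can be sketched as a newest-first scan. The representation is an assumption for illustration: each visible segment is a plain dictionary, the list is ordered oldest to newest, and `TOMBSTONE` is a hypothetical sentinel.

```python
TOMBSTONE = object()  # sentinel marking an invalidated mapping


def lookup(visible_segments: list, key: str):
    """visible_segments is ordered oldest to newest; each maps key -> location."""
    for segment in reversed(visible_segments):  # newest first
        if key in segment:
            entry = segment[key]
            # The first match wins; a tombstone means "no mapping".
            return None if entry is TOMBSTONE else entry
    return None


segments = [
    {"a": ("b1", 0, 10), "b": ("b1", 10, 5)},  # oldest
    {"a": ("b2", 0, 12)},                      # shadows the older "a"
    {"b": TOMBSTONE},                          # newest: invalidates "b"
]
result_a = lookup(segments, "a")
result_b = lookup(segments, "b")
```

Here `result_a` resolves to the newer location in `b2`, while `result_b` is `None` because the tombstone in the newest segment shadows the original mapping.
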
---

## 6. Visibility Guarantees

* Entry visible **iff**:

  * The block is sealed
  * Log record exists ≤ CURRENT
  * Segment seal recorded in log
* Entries above CURRENT or referencing unsealed blocks are invisible

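
One way to read the visibility rule is as a conjunction of three checks. The sketch below assumes that log positions are integers, that CURRENT is identified by a log position `current_seq`, and that the seal record itself must also lie at or below CURRENT; the `Entry` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str
    block_id: str
    segment_id: int
    log_seq: int  # assumed: position of the entry's log record


def is_visible(entry, current_seq, sealed_blocks, segment_seal_seq):
    """segment_seal_seq maps segment_id -> log position of its seal record."""
    seal_seq = segment_seal_seq.get(entry.segment_id)
    return (
        entry.block_id in sealed_blocks      # the block is sealed
        and entry.log_seq <= current_seq     # log record exists at or below CURRENT
        and seal_seq is not None             # segment seal recorded in log
        and seal_seq <= current_seq
    )


entry = Entry(key="a", block_id="b1", segment_id=1, log_seq=5)
visible_now = is_visible(entry, 10, {"b1"}, {1: 6})
```

Failing any one of the checks makes the entry invisible, regardless of the others.
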
---

## 7. Crash and Recovery Semantics

### 7.1 Crash During Open Segment

* Open segments may be lost
* Index entries may be leaked
* No sealed segment may be corrupted

### 7.2 Recovery Procedure

1. Mount latest checkpoint snapshot
2. Replay append-only log from checkpoint
3. Rebuild CURRENT
4. Resume normal operation

Recovery must be **deterministic and idempotent**.

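
The recovery procedure can be sketched as follows, under the assumption that the checkpoint records the last log position it reflects and that every log record carries a sequence number. The `recover` function and both data shapes are hypothetical.

```python
def recover(checkpoint: dict, log: list) -> dict:
    """checkpoint = {"seq": int, "state": dict}; log = [(seq, key, location), ...]."""
    state = dict(checkpoint["state"])
    applied = checkpoint["seq"]
    for seq, key, location in sorted(log, key=lambda record: record[0]):
        if seq <= applied:
            # Already reflected in the checkpoint; skipping keeps replay idempotent.
            continue
        state[key] = location
        applied = seq
    return {"seq": applied, "state": state}


checkpoint = {"seq": 2, "state": {"a": ("b1", 0, 4)}}
log = [
    (1, "a", ("b0", 0, 4)),
    (2, "a", ("b1", 0, 4)),
    (3, "b", ("b2", 0, 8)),
    (4, "a", ("b3", 0, 2)),
]
first = recover(checkpoint, log)
second = recover(first, log)  # recovering again changes nothing
```

Running recovery a second time over the same log leaves the result unchanged, which is what idempotence requires if a crash interrupts recovery itself.
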
---

## 8. Tombstone Semantics

* Optional: tombstones may exist to invalidate prior mappings
* Tombstones shadow prior entries with the same `ArtifactKey`
* Tombstone visibility follows the same rules as regular entries

---

## 9. Invariants (Normative)

The store **must enforce**:

1. No segment visible without seal log record
2. No mutation of sealed segment or block
3. Shadowing follows log order strictly
4. Replay uniquely reconstructs CURRENT
5. GC does not remove segments referenced by snapshot or log
6. ArtifactLocation always points to immutable bytes

---

## 10. Non-Goals

ASL-STORE-INDEX does **not** define:

* Disk layout or encoding (ENC-ASL-CORE-INDEX)
* Placement heuristics (small vs. large block packing)
* Performance targets
* Memory caching strategies
* Federation or provenance mechanics

---

## 11. Relationship to Other Documents

| Layer              | Responsibility                                                       |
| ------------------ | -------------------------------------------------------------------- |
| ASL-CORE-INDEX     | Defines semantic meaning of mapping `ArtifactKey → ArtifactLocation` |
| ASL-STORE-INDEX    | Defines contracts for store to realize those semantics               |
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format                                         |

---

## 12. Summary

The store-index layer guarantees:

* Immutable, snapshot-safe segments
* Deterministic and idempotent replay
* Correct visibility semantics
* Safe crash recovery
* Garbage collection constraints

This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.

The following is a **refined version of ASL-STORE-INDEX** that incorporates **block lifecycle, sealing, snapshot safety, retention, and GC rules**, aligned with ASL-CORE-INDEX semantics.

---

# ASL-STORE-INDEX

### Store Semantics and Contracts for ASL Core Index (Refined)

---

## 1. Purpose

This document defines the **operational and store-level semantics** necessary to implement ASL-CORE-INDEX.

It specifies:

* **Block lifecycle**: creation, sealing, retention
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot interaction**: pinning, deterministic visibility
* **Append-only log semantics**
* **Garbage collection rules**

It **does not define encoding** (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).

---

## 2. Scope

Covers:

* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics

Excludes:

* Disk-level encoding
* Sharding strategies
* Bloom filters or acceleration structures
* Memory residency or caching
* Federation or PEL semantics

---

## 3. Core Concepts

### 3.1 Block

* **Definition:** Storage unit containing artifact bytes; immutable once sealed.
* **Identifier:** BlockID (opaque, unique)
* **Properties:**

  * Once sealed, contents never change
  * Can be referenced by multiple artifacts
  * May be pinned by snapshots for retention
* **Lifecycle Events:**

  1. Creation: block allocated but contents may still be written
  2. Sealing: block is finalized, immutable, and log-visible
  3. Retention: block remains accessible while pinned by snapshots or needed by CURRENT
  4. Garbage collection: block may be deleted if no longer referenced and unpinned

---

### 3.2 Index Segment

Segments group index entries and provide **persistence and recovery units**.

* **Open segment:** accepting new index entries, not visible for lookup
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable
* **Segment components:** header, optional bloom filter, index records, footer
* **Segment visibility:** only after seal and log append

---

### 3.3 Append-Only Log

All store operations affecting index visibility are recorded in a **strictly ordered, append-only log**:

* Entries include:

  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT
* Determinism: replay produces identical CURRENT from same snapshot and log prefix

---

## 4. Block Lifecycle Semantics

| Event              | Description                           | Semantic Guarantees                                           |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation           | Block allocated; bytes may be written | Not visible to index until sealed                             |
| Sealing            | Block is finalized and immutable      | Sealed blocks are stable and safe to reference from index     |
| Retention          | Block remains accessible              | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted                  | Only unpinned, unreachable blocks may be removed              |

**Notes:**

* Sealing ensures that any index entry referencing the block is deterministic and immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.

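
The lifecycle table can be read as a set of permitted state transitions. The sketch below is one possible encoding, not a prescribed one: `BlockState`, `ALLOWED`, and `transition` are hypothetical names, and retention is modelled as a guard on deletion rather than as a separate state.

```python
from enum import Enum


class BlockState(Enum):
    OPEN = "open"
    SEALED = "sealed"
    DELETED = "deleted"


ALLOWED = {
    BlockState.OPEN: {BlockState.SEALED, BlockState.DELETED},  # open blocks may be abandoned
    BlockState.SEALED: {BlockState.DELETED},                   # only through GC
    BlockState.DELETED: set(),
}


def transition(state, target, *, pinned=False, referenced=False):
    if target not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state.name} -> {target.name}")
    if state is BlockState.SEALED and (pinned or referenced):
        # Retention: pinned or referenced sealed blocks must not be removed.
        raise ValueError("retained blocks must not be removed")
    return target


state = transition(BlockState.OPEN, BlockState.SEALED)
```

Note that there is no transition from `SEALED` back to `OPEN`, which is how this model expresses immutability after seal.
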
---

## 5. Snapshot Interaction

* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:

```
CURRENT = snapshot_state + replay(log)
```

* Segment and block visibility rules:

| Entity               | Visible in snapshot          | Visible in CURRENT             |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block   | No                           | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log         |
| Tombstone            | Yes, if log-recorded         | Yes, shadows prior entries     |

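
Snapshot pinning can be sketched as a registry that remembers which sealed blocks each live snapshot captured. The `SnapshotRegistry` class and its method names are hypothetical and exist only to illustrate the pinning rule.

```python
class SnapshotRegistry:
    """Hypothetical pin tracker: a block is pinned while any live snapshot references it."""

    def __init__(self):
        self._snapshots = {}

    def take(self, name, sealed_blocks):
        # Capture an immutable view of the sealed blocks at this point in time.
        self._snapshots[name] = frozenset(sealed_blocks)

    def expire(self, name):
        self._snapshots.pop(name, None)

    def pinned(self):
        # Union over all live snapshots; empty when none remain.
        return frozenset().union(*self._snapshots.values())


registry = SnapshotRegistry()
registry.take("s1", {"b1", "b2"})
registry.take("s2", {"b2", "b3"})
pinned_before = registry.pinned()
registry.expire("s1")
pinned_after = registry.pinned()
```

Block `b2` stays pinned after `s1` expires because `s2` still references it; only `b1` becomes a GC candidate, and only if CURRENT no longer needs it.
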
---

## 6. Index Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return first matching entry
4. Respect tombstones to shadow prior entries

Determinism:

* Lookup results are identical across platforms given the same snapshot and log prefix
* Accelerations (bloom filters, sharding, SIMD) do **not alter correctness**

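
The claim that accelerations do not alter correctness can be illustrated by comparing a plain scan with a filtered one. In this sketch the per-segment filters are stand-in callables rather than real Bloom filters; the only property assumed is the one a Bloom filter provides, namely possible false positives but no false negatives. All function names are hypothetical.

```python
def lookup_plain(segments, key):
    for entries in reversed(segments):  # newest first
        if key in entries:
            return entries[key]
    return None


def lookup_filtered(segments, filters, key):
    # filters[i] may report false positives but never false negatives.
    for entries, maybe_contains in zip(reversed(segments), reversed(filters)):
        if maybe_contains(key) and key in entries:
            return entries[key]
    return None


segments = [{"a": ("b1", 0, 4)}, {"b": ("b2", 0, 8)}, {"a": ("b3", 0, 2)}]
# Stand-in filters: exact membership for the first two segments,
# and an always-"maybe" filter (all false positives) for the last.
filters = [segments[0].__contains__, segments[1].__contains__, lambda key: True]
results = {key: lookup_filtered(segments, filters, key) for key in ("a", "b", "c")}
```

The filter only lets the lookup skip segments that cannot contain the key, so both functions return the same answer for every key.
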
---

## 7. Garbage Collection

* **Eligibility for GC:**

  * Segments: sealed, no references from CURRENT or snapshots
  * Blocks: unpinned, unreferenced by any segment or artifact
* **Rules:**

  * GC is safe **only on sealed segments and blocks**
  * Must respect snapshot pins
  * Tombstones may aid in invalidating unreachable blocks
* **Outcome:**

  * GC never violates CURRENT reconstruction
  * Blocks can be reclaimed without breaking provenance

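
Block eligibility can be sketched as a set difference. The function below is a minimal illustration that assumes CURRENT is available as a mapping from key to `(BlockID, offset, length)` and that snapshot pins are given as a set of block identifiers; `collectable_blocks` is a hypothetical name.

```python
def collectable_blocks(sealed_blocks, current_index, snapshot_pins):
    """Return sealed blocks that are neither referenced by CURRENT nor pinned."""
    referenced = {location[0] for location in current_index.values()}
    return set(sealed_blocks) - referenced - set(snapshot_pins)


sealed_blocks = {"b1", "b2", "b3", "b4"}
current_index = {"a": ("b1", 0, 4), "b": ("b2", 0, 4)}
snapshot_pins = {"b3"}
garbage = collectable_blocks(sealed_blocks, current_index, snapshot_pins)
```

Only `b4` is reclaimable here: `b1` and `b2` are still referenced by CURRENT and `b3` is pinned by a snapshot. Open blocks never appear in the result because the candidate set is restricted to sealed blocks.
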
---

## 8. Tombstone Semantics

* Optional marker to invalidate prior mappings
* Visibility rules identical to regular index entries
* Used to maintain deterministic CURRENT in the face of shadowing or deletions

---

## 9. Crash and Recovery Semantics

* Open segments or unsealed blocks may be lost; no invariant is broken
* Recovery procedure:

  1. Mount last checkpoint snapshot
  2. Replay append-only log
  3. Reconstruct CURRENT
* Recovery is **deterministic and idempotent**
* Segments and blocks **never partially visible** after crash

---

## 10. Normative Invariants

1. Sealed blocks are immutable
2. Index entries referencing blocks are immutable once visible
3. Shadowing follows strict log order
4. Replay of snapshot + log uniquely reconstructs CURRENT
5. GC cannot remove blocks or segments needed by snapshot or CURRENT
6. Tombstones shadow prior entries without deleting underlying blocks prematurely

---

## 11. Non-Goals

* Disk-level encoding (ENC-ASL-CORE-INDEX)
* Memory layout or caching
* Sharding or performance heuristics
* Federation / multi-domain semantics (handled elsewhere)
* Block packing strategies (small vs large blocks)

---

## 12. Relationship to Other Layers

| Layer              | Responsibility                                                               |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |

---

## 13. Summary

The refined ASL-STORE-INDEX:

* Defines **block lifecycle**: creation, sealing, retention, GC
* Ensures **snapshot safety** and deterministic visibility
* Guarantees **immutable, replayable, and recoverable CURRENT**
* Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics