ASL-STORE-INDEX
Store Semantics and Contracts for ASL Index
1. Purpose
This document defines the store-level responsibilities and contracts required to implement the ASL-CORE-INDEX semantics.
It bridges the gap between index meaning and physical storage, ensuring:
- Deterministic replay
- Snapshot-aware visibility
- Immutable block guarantees
- Idempotent recovery
- Correctness of CURRENT state
It does not define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).
2. Scope
This specification covers:
- Index segment lifecycle
- Interaction between index and ASL blocks
- Append-only log semantics
- Snapshot integration
- Visibility and lookup rules
- Crash safety and recovery
- Garbage collection constraints
It does not cover:
- Disk format details
- Bloom filter algorithms
- File system specifics
- Placement heuristics beyond semantic guarantees
3. Core Concepts
3.1 Index Segment
A segment is a contiguous set of index entries written by the store.
- Open while accepting new entries
- Sealed when closed for append
- Sealed segments are immutable
- Sealed segments become snapshot-visible only after their seal is recorded in the log
Segments are the unit of persistence, replay, and GC.
3.2 ASL Block Relationship
Each index entry references a sealed block via:
ArtifactKey → (BlockID, offset, length)
- The store must ensure the block is sealed before the entry becomes log-visible
- Blocks are immutable after seal
- Open blocks may be abandoned without violating invariants
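The rule that a block must be sealed before any entry referencing it becomes log-visible can be illustrated with a small in-memory model. This is a minimal sketch, assuming hypothetical names (`ArtifactLocation`, `BlockTable`, `log_index_entry`) and a Python list standing in for the append-only log; it is not a normative API.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass(frozen=True)
class ArtifactLocation:
    """Hypothetical location triple: ArtifactKey -> (BlockID, offset, length)."""
    block_id: str
    offset: int
    length: int

class BlockTable:
    """Hypothetical registry recording which blocks have been sealed."""
    def __init__(self) -> None:
        self._sealed: Set[str] = set()

    def seal(self, block_id: str) -> None:
        self._sealed.add(block_id)

    def is_sealed(self, block_id: str) -> bool:
        return block_id in self._sealed

def log_index_entry(blocks: BlockTable, log: List[Tuple[str, str, ArtifactLocation]],
                    key: str, loc: ArtifactLocation) -> None:
    """Append an index addition to the log, refusing entries whose block is unsealed."""
    if not blocks.is_sealed(loc.block_id):
        raise ValueError(f"block {loc.block_id} not sealed; entry for {key} must wait")
    log.append(("add", key, loc))
```

Because an unsealed block is never referenced from the log, abandoning it after a crash leaves no dangling index entry.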
3.3 Append-Only Log
All store-visible mutations are recorded in a strictly ordered, append-only log:
- Entries include index additions, tombstones, and segment seals
- Log is durable and replayable
- The log defines visibility for all changes made after the checkpoint snapshot
CURRENT state is derived as:
CURRENT = checkpoint_state + replay(log)
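The derivation of CURRENT can be sketched as a fold over the log. This sketch assumes a simplified model in which the checkpoint state is a dictionary of ArtifactKey to location strings and each log record is a hypothetical `(op, key, location)` tuple; segment seal records are omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (op, key, location); location is None for tombstones.
LogRecord = Tuple[str, str, Optional[str]]

def replay(checkpoint_state: Dict[str, str], log: List[LogRecord]) -> Dict[str, str]:
    """Derive CURRENT by applying log records, in order, to a copy of the checkpoint."""
    current = dict(checkpoint_state)  # the checkpoint itself is never mutated
    for op, key, location in log:
        if op == "add" and location is not None:
            current[key] = location      # newer mapping shadows older ones
        elif op == "tombstone":
            current.pop(key, None)       # hide any earlier mapping for this key
        else:
            raise ValueError(f"malformed log record: {(op, key, location)}")
    return current
```

Copying the checkpoint rather than mutating it keeps replay repeatable from the same inputs.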
4. Segment Lifecycle
4.1 Creation
- Open segment is allocated
- Index entries appended in log order
- Entries are invisible until segment seal and log append
4.2 Seal
- Segment is closed to append
- Seal record is written to append-only log
- Segment becomes visible for lookup
- Sealed segment may be snapshot-pinned
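The creation and seal steps can be modeled as a two-state object whose lookup visibility is derived from the log rather than from in-memory state. This is a minimal sketch; `Segment`, `visible_segments`, and the `("seal", segment_id)` record shape are hypothetical.

```python
from typing import Dict, List, Tuple

LogRecord = Tuple[str, int]  # hypothetical seal record: ("seal", segment_id)

class Segment:
    """Hypothetical in-memory segment: appendable while open, immutable once sealed."""
    def __init__(self, seg_id: int) -> None:
        self.seg_id = seg_id
        self.entries: List[Tuple[str, str]] = []
        self.sealed = False

    def append(self, key: str, location: str) -> None:
        if self.sealed:
            raise RuntimeError(f"segment {self.seg_id} is sealed and immutable")
        self.entries.append((key, location))

    def seal(self, log: List[LogRecord]) -> None:
        """Close the segment to appends, then write its seal record to the log."""
        self.sealed = True
        log.append(("seal", self.seg_id))

def visible_segments(segments: Dict[int, Segment], log: List[LogRecord]) -> List[int]:
    """Visibility comes from the log, not the in-memory flag: seal records, in log order."""
    return [seg_id for op, seg_id in log if op == "seal" and seg_id in segments]
```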
4.3 Snapshot Interaction
- Snapshots capture sealed segments
- Open segments need not survive snapshot
- Segments sealed at or below the snapshot are replay anchors: log replay starts from them
4.4 Garbage Collection
- Only sealed and unreachable segments can be deleted
- GC operates at segment granularity
- GC must not break CURRENT or violate invariants
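Segment-granularity GC eligibility reduces to set arithmetic. This sketch assumes the store can supply the set of sealed segment IDs, the segments each snapshot pins, and the segments still reachable from CURRENT; all names are hypothetical.

```python
from typing import Iterable, Set

def gc_eligible_segments(sealed: Set[int],
                         snapshot_pins: Iterable[Set[int]],
                         reachable_from_current: Set[int]) -> Set[int]:
    """Segments that are sealed, unpinned by every snapshot, and unreachable from CURRENT."""
    pinned: Set[int] = set()
    for pins in snapshot_pins:
        pinned |= pins
    # Open segments never appear in `sealed`, so they can never be returned.
    return sealed - pinned - reachable_from_current
```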
5. Lookup Semantics
To resolve an ArtifactKey:
- Identify all visible segments ≤ CURRENT
- Search segments in reverse creation order (newest first)
- Return the first matching entry
- Respect tombstone entries (if present)
Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, but correctness must be independent of acceleration strategies.
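The lookup steps can be sketched as a reverse scan over visible segments. This sketch assumes visible segments are passed oldest first as lists of hypothetical `(key, location)` pairs, that a location of `None` marks a tombstone, and that later entries within one segment shadow earlier ones (the spec does not fix intra-segment order). It collapses tombstoned and absent keys to `None`.

```python
from typing import List, Optional, Tuple

# Hypothetical entry: (ArtifactKey, location); a location of None marks a tombstone.
Entry = Tuple[str, Optional[str]]

def lookup(visible_segments: List[List[Entry]], key: str) -> Optional[str]:
    """Resolve key over visible segments, which are passed oldest first."""
    for segment in reversed(visible_segments):       # newest segment first
        for entry_key, location in reversed(segment):  # assumed: later entries win
            if entry_key == key:
                return location  # None means a tombstone shadows older mappings
    return None                  # no mapping in any visible segment
```

Stopping at the first match is what makes a newer tombstone hide an older mapping without touching the older segment.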
6. Visibility Guarantees
An entry is visible iff:
- Its block is sealed
- Its log record exists ≤ CURRENT
- Its segment's seal is recorded in the log
Entries above CURRENT or referencing unsealed blocks are invisible.
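The visibility conditions can be written as a single predicate over positions in the log. This sketch assumes CURRENT is represented as a log position, and it additionally assumes the segment seal record must itself sit at or below CURRENT; `EntryFacts` and `is_visible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EntryFacts:
    """Hypothetical bookkeeping a store might keep per index entry."""
    block_sealed: bool
    log_position: int                    # where the entry's own log record sits
    seal_position: Optional[int] = None  # log position of its segment's seal, if any

def is_visible(entry: EntryFacts, current: int) -> bool:
    """Apply the three visibility conditions against the log position CURRENT."""
    return (
        entry.block_sealed
        and entry.log_position <= current
        and entry.seal_position is not None
        and entry.seal_position <= current
    )
```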
7. Crash and Recovery Semantics
7.1 Crash During Open Segment
- Open segments may be lost
- Index entries may be leaked
- No sealed segment may be corrupted
7.2 Recovery Procedure
- Mount latest checkpoint snapshot
- Replay append-only log from checkpoint
- Rebuild CURRENT
- Resume normal operation
Recovery must be deterministic and idempotent.
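One way to make replay from a checkpoint idempotent is to tag log records with sequence numbers and skip those already folded into the checkpoint. This is a sketch under that assumption; the `(seq, key, location)` record shape and `recover` are hypothetical, with `None` as a tombstone.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (sequence number, key, location or None for tombstone).
Record = Tuple[int, str, Optional[str]]

def recover(checkpoint: Dict[str, str], checkpoint_seq: int,
            log: List[Record]) -> Dict[str, str]:
    """Rebuild CURRENT from a checkpoint plus every log record after checkpoint_seq."""
    current = dict(checkpoint)
    for seq, key, loc in log:
        if seq <= checkpoint_seq:
            continue  # already folded into the checkpoint; skipping keeps replay idempotent
        if loc is None:
            current.pop(key, None)
        else:
            current[key] = loc
    return current
```

Running recovery twice, or from an older checkpoint, yields the same CURRENT, so an interrupted recovery can simply be restarted.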
8. Tombstone Semantics
- Optional: tombstones may exist to invalidate prior mappings
- Tombstones shadow prior entries with the same ArtifactKey
- Tombstone visibility follows the same rules as regular entries
9. Invariants (Normative)
The store must enforce:
- No segment visible without seal log record
- No mutation of sealed segment or block
- Shadowing follows log order strictly
- Replay uniquely reconstructs CURRENT
- GC does not remove segments referenced by snapshot or log
- ArtifactLocation always points to immutable bytes
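Some of these invariants can be audited mechanically over the log. This sketch checks only the seal-related ones (no entry added to a segment after its seal, no segment sealed twice) and assumes hypothetical `(op, segment_id)` records with `op` in `"add"`, `"tombstone"`, or `"seal"`.

```python
from typing import List, Set, Tuple

def check_log_invariants(log: List[Tuple[str, int]]) -> List[str]:
    """Return human-readable violations of the seal-related invariants."""
    violations: List[str] = []
    sealed: Set[int] = set()
    for pos, (op, seg) in enumerate(log):
        if op == "seal":
            if seg in sealed:
                violations.append(f"{pos}: segment {seg} sealed twice")
            sealed.add(seg)
        elif seg in sealed:
            # Any addition or tombstone after the seal would mutate a sealed segment.
            violations.append(f"{pos}: {op} into sealed segment {seg}")
    return violations
```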
10. Non-Goals
ASL-STORE-INDEX does not define:
- Disk layout or encoding (ENC-ASL-CORE-INDEX)
- Placement heuristics (small vs. large block packing)
- Performance targets
- Memory caching strategies
- Federation or provenance mechanics
11. Relationship to Other Documents
| Layer | Responsibility |
|---|---|
| ASL-CORE-INDEX | Defines semantic meaning of mapping ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |
12. Summary
The store-index layer guarantees:
- Immutable, snapshot-safe segments
- Deterministic and idempotent replay
- Correct visibility semantics
- Safe crash recovery
- Garbage collection constraints
This specification ensures that ASL-CORE-INDEX semantics are faithfully realized in the store without constraining encoding or acceleration strategies.
Here’s a fully refined version of ASL-STORE-INDEX, incorporating block lifecycle, sealing, snapshot safety, retention, and GC rules, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.
ASL-STORE-INDEX
Store Semantics and Contracts for ASL Core Index (Refined)
1. Purpose
This document defines the operational and store-level semantics necessary to implement ASL-CORE-INDEX.
It specifies:
- Block lifecycle: creation, sealing, retention
- Index segment lifecycle: creation, append, seal, visibility
- Snapshot interaction: pinning, deterministic visibility
- Append-only log semantics
- Garbage collection rules
It does not define encoding (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).
2. Scope
Covers:
- Lifecycle of blocks and index entries
- Snapshot and CURRENT consistency guarantees
- Deterministic replay and recovery
- GC and tombstone semantics
Excludes:
- Disk-level encoding
- Sharding strategies
- Bloom filters or acceleration structures
- Memory residency or caching
- Federation or PEL semantics
3. Core Concepts
3.1 Block
- Definition: Storage unit containing artifact bytes; immutable once sealed
- Identifier: BlockID (opaque, unique)
- Properties:
  - Once sealed, contents never change
  - Can be referenced by multiple artifacts
  - May be pinned by snapshots for retention
- Lifecycle Events:
  - Creation: block allocated; contents may still be written
  - Sealing: block is finalized, immutable, and log-visible
  - Retention: block remains accessible while pinned by snapshots or needed by CURRENT
  - Garbage collection: block may be deleted once no longer referenced and unpinned
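The creation and sealing events can be modeled by keeping open blocks as mutable buffers and converting them to immutable `bytes` on seal. This is a minimal in-memory sketch; `BlockStore` and its methods are hypothetical, and a real store would persist blocks rather than hold them in dictionaries.

```python
from typing import Dict

class BlockStore:
    """Hypothetical block store: mutable while open, frozen to bytes on seal."""
    def __init__(self) -> None:
        self._open: Dict[str, bytearray] = {}
        self._sealed: Dict[str, bytes] = {}

    def create(self, block_id: str) -> None:
        self._open[block_id] = bytearray()

    def write(self, block_id: str, data: bytes) -> int:
        """Append bytes to an open block and return their offset."""
        buf = self._open[block_id]  # KeyError if the block is sealed or unknown
        offset = len(buf)
        buf.extend(data)
        return offset

    def seal(self, block_id: str) -> None:
        # bytes() copies into an immutable object; the open buffer is discarded.
        self._sealed[block_id] = bytes(self._open.pop(block_id))

    def read(self, block_id: str, offset: int, length: int) -> bytes:
        """Resolve (BlockID, offset, length); only sealed blocks are readable."""
        return self._sealed[block_id][offset:offset + length]
```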
3.2 Index Segment
Segments group index entries and provide persistence and recovery units.
- Open segment: accepting new index entries, not visible for lookup
- Sealed segment: closed for append, log-visible, snapshot-pinnable
- Segment components: header, optional bloom filter, index records, footer
- Segment visibility: only after seal and log append
3.3 Append-Only Log
All store operations affecting index visibility are recorded in a strictly ordered, append-only log:
- Entries include:
  - Index additions
  - Tombstones
  - Segment seals
- The log is replayable to reconstruct CURRENT
- Determinism: replay produces an identical CURRENT from the same snapshot and log prefix
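Determinism can be checked by replaying the same log prefix and comparing an order-independent fingerprint of the result. This sketch assumes hypothetical `(key, location)` records with `None` as a tombstone, and uses SHA-256 over sorted pairs as one possible fingerprint.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # hypothetical: (key, location or None for tombstone)

def replay_prefix(snapshot: Dict[str, str], log: List[Record], prefix_len: int) -> Dict[str, str]:
    """Apply only the first prefix_len log records on top of the snapshot."""
    current = dict(snapshot)
    for key, loc in log[:prefix_len]:
        if loc is None:
            current.pop(key, None)
        else:
            current[key] = loc
    return current

def state_digest(state: Dict[str, str]) -> str:
    """Order-independent fingerprint of CURRENT: hash the sorted key/location pairs."""
    h = hashlib.sha256()
    for key, loc in sorted(state.items()):
        h.update(key.encode() + b"\x00" + loc.encode() + b"\x00")
    return h.hexdigest()
```

Two replicas that agree on the snapshot and the log prefix should report the same digest.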
4. Block Lifecycle Semantics
| Event | Description | Semantic Guarantees |
|---|---|---|
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
Notes:
- Sealing ensures that any index entry referencing the block is deterministic and immutable.
- Retention is driven by snapshot and log visibility rules.
- GC must never violate CURRENT reconstruction guarantees.
5. Snapshot Interaction
- Snapshots capture the set of sealed blocks and sealed index segments at a point in time.
- Blocks referenced by a snapshot are pinned and cannot be garbage-collected until snapshot expiration.
- CURRENT is reconstructed as:
CURRENT = snapshot_state + replay(log)
- Segment and block visibility rules:
| Entity | Visible in snapshot | Visible in CURRENT |
|---|---|---|
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
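Snapshot capture can be sketched as freezing the set of sealed segments and deriving the blocks they pin. This sketch assumes a hypothetical mapping from segment ID to the BlockIDs its entries reference; `Snapshot` and `take_snapshot` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot: frozen sets of sealed segments and the blocks they pin."""
    segments: FrozenSet[int]
    pinned_blocks: FrozenSet[str]

def take_snapshot(sealed_segments: Set[int], segment_blocks: Dict[int, Set[str]]) -> Snapshot:
    """Capture only sealed segments; open segments are never part of a snapshot."""
    segs = frozenset(sealed_segments)
    blocks = frozenset(b for s in segs for b in segment_blocks.get(s, set()))
    return Snapshot(segs, blocks)
```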
6. Index Lookup Semantics
To resolve an ArtifactKey:
- Identify all visible segments ≤ CURRENT
- Search segments in reverse creation order (newest first)
- Return first matching entry
- Respect tombstones to shadow prior entries
Determinism:
- Lookup results are identical across platforms given the same snapshot and log prefix
- Accelerations (bloom filters, sharding, SIMD) do not alter correctness
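The claim that accelerations must not alter correctness can be illustrated with a Bloom filter: it may report false positives but never false negatives, so it can only let a lookup skip segments that definitely lack the key. This is a minimal sketch with a hypothetical `TinyBloom` class and segment shape; it is not the filter format defined by ENC-ASL-CORE-INDEX.

```python
import hashlib
from typing import Iterator, List, Optional, Tuple

class TinyBloom:
    """Minimal Bloom filter: k hashed bit positions in an m-bit integer bitmap."""
    def __init__(self, m: int = 64, k: int = 3) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def may_contain(self, key: str) -> bool:
        # False is definitive; True may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(key))

Segment = List[Tuple[str, Optional[str]]]  # hypothetical: (key, location or None)

def lookup(segments: List[Segment], key: str,
           filters: Optional[List[TinyBloom]] = None) -> Optional[str]:
    """Newest-first lookup; filters only let us skip segments, never change answers."""
    for i in reversed(range(len(segments))):
        if filters is not None and not filters[i].may_contain(key):
            continue  # key is definitely absent from segment i
        for entry_key, location in reversed(segments[i]):
            if entry_key == key:
                return location
    return None
```

Any lookup should return the same result with or without `filters`; only the amount of scanning changes.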
7. Garbage Collection
- Eligibility for GC:
  - Segments: sealed, no references from CURRENT or snapshots
  - Blocks: unpinned, unreferenced by any segment or artifact
- Rules:
  - GC is safe only on sealed segments and blocks
  - GC must respect snapshot pins
  - Tombstones may aid in invalidating unreachable blocks
- Outcome:
  - GC never violates CURRENT reconstruction
  - Blocks can be reclaimed without breaking provenance
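Block eligibility can be computed after segment GC has decided which segments remain. This sketch assumes a hypothetical mapping from each remaining segment (open or sealed) to the BlockIDs it references, plus the BlockIDs each snapshot pins; `reclaimable_blocks` is an illustrative name.

```python
from typing import Dict, Iterable, Set

def reclaimable_blocks(sealed_blocks: Set[str],
                       remaining_segment_blocks: Dict[int, Set[str]],
                       snapshot_block_pins: Iterable[Set[str]]) -> Set[str]:
    """Sealed blocks that no remaining segment references and no snapshot pins."""
    referenced: Set[str] = set()
    for blocks in remaining_segment_blocks.values():
        referenced |= blocks
    pinned: Set[str] = set()
    for pins in snapshot_block_pins:
        pinned |= pins
    # Unsealed blocks are never in `sealed_blocks`, so they are never reclaimed here.
    return sealed_blocks - referenced - pinned
```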
8. Tombstone Semantics
- Optional marker to invalidate prior mappings
- Visibility rules identical to regular index entries
- Used to maintain deterministic CURRENT in face of shadowing or deletions
9. Crash and Recovery Semantics
- Open segments or unsealed blocks may be lost; no invariant is broken
- Recovery procedure:
  - Mount the last checkpoint snapshot
  - Replay the append-only log
  - Reconstruct CURRENT
- Recovery is deterministic and idempotent
- Segments and blocks are never partially visible after a crash
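The all-or-nothing visibility of segments after a crash can be sketched by applying logged additions only for segments whose seal record reached the log. This sketch assumes hypothetical `("add", seg, key, loc)` and `("seal", seg, None, None)` records and ignores tombstones for brevity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: ("add", seg, key, loc) or ("seal", seg, None, None).
Record = Tuple[str, int, Optional[str], Optional[str]]

def recover_visible(log: List[Record]) -> Dict[str, str]:
    """Apply additions only from segments whose seal record made it into the log."""
    sealed = {seg for op, seg, _, _ in log if op == "seal"}
    current: Dict[str, str] = {}
    for op, seg, key, loc in log:
        if op == "add" and seg in sealed and key is not None and loc is not None:
            current[key] = loc  # log order decides shadowing among sealed segments
    return current
```

A segment that was still open at the crash contributes nothing, so none of its entries can appear partially.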
10. Normative Invariants
- Sealed blocks are immutable
- Index entries referencing blocks are immutable once visible
- Shadowing follows strict log order
- Replay of snapshot + log uniquely reconstructs CURRENT
- GC cannot remove blocks or segments needed by snapshot or CURRENT
- Tombstones shadow prior entries without deleting underlying blocks prematurely
11. Non-Goals
- Disk-level encoding (ENC-ASL-CORE-INDEX)
- Memory layout or caching
- Sharding or performance heuristics
- Federation / multi-domain semantics (handled elsewhere)
- Block packing strategies (small vs large blocks)
12. Relationship to Other Layers
| Layer | Responsibility |
|---|---|
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
13. Summary
The refined ASL-STORE-INDEX:
- Defines block lifecycle: creation, sealing, retention, GC
- Ensures snapshot safety and deterministic visibility
- Guarantees immutable, replayable, and recoverable CURRENT
- Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics