ASL-STORE-INDEX
Store Semantics and Contracts for ASL Core Index (Tier1)
1. Purpose
This document defines the operational and store-level semantics required to implement ASL-CORE-INDEX.
It specifies:
- Block lifecycle: creation, sealing, retention, GC
- Index segment lifecycle: creation, append, seal, visibility
- Snapshot identity and log positions for deterministic replay
- Append-only log semantics
- Lookup, visibility, and crash recovery rules
- Small vs large block handling
It does not define encoding (see ENC-ASL-CORE-INDEX at tier1/enc-asl-core-index.md) or semantic mapping (see ASL/1-CORE-INDEX).
Informative references:
- ASL/SYSTEM/1: unified system view (PEL/TGK/federation alignment)
2. Scope
Covers:
- Lifecycle of blocks and index entries
- Snapshot and CURRENT consistency guarantees
- Deterministic replay and recovery
- GC and tombstone semantics
- Packing policy for small vs large artifacts
Excludes:
- Disk-level encoding
- Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
- Memory residency or caching
- Federation or PEL semantics
3. Core Concepts
3.1 Block
- Definition: Immutable storage unit containing artifact bytes.
- Identifier: BlockID (opaque, unique).
- Properties:
  - Once sealed, contents never change.
  - Can be referenced by multiple artifacts.
  - May be pinned by snapshots for retention.
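A minimal Python sketch of the block contract, assuming an in-memory store: the class `Block`, its methods, and `SealedBlockError` are hypothetical names, not part of any ASL API. It shows the property the index depends on: bytes may be appended only while the block is open, and sealing freezes them.

```python
from typing import Optional


class SealedBlockError(Exception):
    """Raised when a write is attempted on a sealed block (hypothetical)."""


class Block:
    """Hypothetical in-memory block: writable until sealed, immutable afterwards."""

    def __init__(self, block_id: str) -> None:
        self.block_id = block_id              # BlockID: opaque and unique
        self._buf = bytearray()               # staging buffer while the block is open
        self._frozen: Optional[bytes] = None  # immutable copy taken at seal time

    @property
    def sealed(self) -> bool:
        return self._frozen is not None

    def write(self, data: bytes) -> int:
        """Append bytes to an open block; return the offset they landed at."""
        if self.sealed:
            raise SealedBlockError(self.block_id)
        offset = len(self._buf)
        self._buf += data
        return offset

    def seal(self) -> None:
        """Finalize the block; sealing twice is a no-op."""
        if not self.sealed:
            self._frozen = bytes(self._buf)

    def read(self, offset: int, length: int) -> bytes:
        """Read a byte slice; only sealed blocks may be read through the index."""
        if self._frozen is None:
            raise ValueError("block is not sealed")
        return self._frozen[offset:offset + length]
```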
3.2 Index Segment
Segments group index entries and provide persistence and recovery units.
- Open segment: accepting new index entries, not visible for lookup.
- Sealed segment: closed for append, log-visible, snapshot-pinnable.
- Segment components: header, optional bloom filter, index records, footer.
- Segment visibility: only after seal and log append.
3.3 Append-Only Log
All store-visible mutations are recorded in a strictly ordered, append-only log:
- Entries include:
  - Index additions
  - Tombstones
  - Segment seals
- The log is replayable to reconstruct CURRENT.
- Log semantics are defined in ASL/LOG/1.
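The following sketch models the log as an in-memory list, assuming positions start at 0 and are assigned at append time; `AppendOnlyLog`, `LogEntry` and `EntryKind` are illustrative names only, and ASL/LOG/1 remains the authoritative definition of log semantics.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class EntryKind(Enum):
    INDEX_ADD = "index_add"
    TOMBSTONE = "tombstone"
    SEGMENT_SEAL = "segment_seal"


@dataclass(frozen=True)
class LogEntry:
    position: int                     # LogPosition assigned at append time
    kind: EntryKind
    key: Optional[str] = None         # ArtifactKey for INDEX_ADD / TOMBSTONE
    segment_id: Optional[str] = None  # set for SEGMENT_SEAL


class AppendOnlyLog:
    """Hypothetical append-only log: entries are never rewritten or reordered."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, kind: EntryKind, key: Optional[str] = None,
               segment_id: Optional[str] = None) -> int:
        """Append one entry and return its (monotonic) position."""
        pos = len(self._entries)
        self._entries.append(LogEntry(pos, kind, key, segment_id))
        return pos

    def prefix(self, end: int) -> Tuple[LogEntry, ...]:
        """Return log[0:end] as an immutable tuple for replay."""
        return tuple(self._entries[:end])

    def __len__(self) -> int:
        return len(self._entries)
```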
3.4 Snapshot Identity and Log Position
To make CURRENT referenceable and replayable, ASL-STORE-INDEX defines:
- SnapshotID: opaque, immutable identifier for a snapshot.
- LogPosition: monotonic integer position in the append-only log.
- IndexState: (SnapshotID, LogPosition).
Deterministic replay is defined as:
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
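The replay formula can be read as a left fold over a log prefix. This sketch assumes a hypothetical entry shape `(op, key, location)` with `op` either `"add"` or `"tombstone"`, and represents snapshot and index state as plain dictionaries from ArtifactKey to location, with `None` standing for a tombstoned key. `replay` and `index_at` are illustrative names.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Hypothetical log entry shape: (op, artifact_key, location).
Entry = Tuple[str, str, Optional[str]]


def replay(state: Dict[str, Optional[str]],
           entries: Iterable[Entry]) -> Dict[str, Optional[str]]:
    """Apply entries strictly in log order; later entries shadow earlier ones."""
    for op, key, location in entries:
        if op == "add":
            state[key] = location
        elif op == "tombstone":
            state[key] = None  # None marks a shadowed (tombstoned) key
        else:
            raise ValueError(f"unknown log op: {op!r}")
    return state


def index_at(snapshots: Mapping[str, Mapping[str, Optional[str]]],
             log: Tuple[Entry, ...],
             snapshot_id: str,
             log_position: int) -> Dict[str, Optional[str]]:
    """Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])."""
    state = dict(snapshots[snapshot_id])  # copy: snapshots are immutable
    return replay(state, log[:log_position])
```

Because `index_at` copies the snapshot and applies a fixed prefix in order, the same `(SnapshotID, LogPosition)` pair always yields the same state.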
3.5 Artifact Location
- ArtifactExtent: (BlockID, offset, length), identifying a byte slice within a block.
- ArtifactLocation: ordered list of ArtifactExtent values that, when concatenated, produce the artifact bytes.
- Multi-extent locations allow a single artifact to be striped across multiple blocks.
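Resolving an ArtifactLocation amounts to reading each extent and concatenating the slices in order. This sketch assumes sealed blocks are available as a dictionary from BlockID to bytes; `read_artifact` is a hypothetical helper, and rejecting out-of-range extents is an assumption rather than a documented rule.

```python
from typing import Dict, List, NamedTuple


class ArtifactExtent(NamedTuple):
    block_id: str
    offset: int
    length: int


def read_artifact(blocks: Dict[str, bytes], location: List[ArtifactExtent]) -> bytes:
    """Concatenate the byte slices named by each extent, in list order."""
    parts = []
    for ext in location:
        data = blocks[ext.block_id]
        end = ext.offset + ext.length
        if ext.offset < 0 or ext.length < 0 or end > len(data):
            raise ValueError(f"extent out of range: {ext}")
        parts.append(data[ext.offset:end])
    return b"".join(parts)
```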
4. Block Lifecycle Semantics
| Event | Description | Semantic Guarantees |
|---|---|---|
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
Notes:
- Sealing guarantees that the bytes referenced by any index entry pointing into the block never change.
- Retention is driven by snapshot and log visibility rules.
- GC must never violate CURRENT reconstruction guarantees.
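The table above can be expressed as a small state machine. In this sketch (all names hypothetical), retention is modelled not as its own state but as the `pinned` and `reachable` conditions that block the SEALED to DELETED transition; how a store computes those flags is left open.

```python
from enum import Enum


class BlockState(Enum):
    OPEN = "open"        # created; bytes may be written; not visible to the index
    SEALED = "sealed"    # finalized and immutable; safe to reference from the index
    DELETED = "deleted"  # removed by garbage collection


# Hypothetical transition table: blocks only move forward through the lifecycle.
ALLOWED = {
    BlockState.OPEN: {BlockState.SEALED},
    BlockState.SEALED: {BlockState.DELETED},
    BlockState.DELETED: set(),
}


def can_reference(state: BlockState) -> bool:
    """Index entries may point only at sealed blocks."""
    return state is BlockState.SEALED


def can_collect(state: BlockState, pinned: bool, reachable: bool) -> bool:
    """GC may delete a block only if it is sealed, unpinned and unreachable."""
    return state is BlockState.SEALED and not pinned and not reachable


def transition(state: BlockState, target: BlockState,
               pinned: bool = False, reachable: bool = False) -> BlockState:
    if target not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state.name} -> {target.name}")
    if target is BlockState.DELETED and not can_collect(state, pinned, reachable):
        raise ValueError("block is retained by a snapshot pin or by CURRENT")
    return target
```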
5. Segment Lifecycle Semantics
5.1 Creation
- Open segment is allocated.
- Index entries appended in log order.
- Entries are invisible until segment seal and log append.
5.2 Seal
- Segment is closed to append.
- Seal record is written to append-only log.
- Segment becomes visible for lookup.
- Sealed segment may be snapshot-pinned.
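A sketch of the segment lifecycle from 5.1 and 5.2, assuming the log is a plain Python list and a segment's visibility is keyed on the log position of its seal record; `Segment`, `seal` and `visible_at` are illustrative names, not a defined interface.

```python
from typing import List, Optional, Tuple


class Segment:
    """Hypothetical index segment: append while open, visible once sealed and logged."""

    def __init__(self, segment_id: str) -> None:
        self.segment_id = segment_id
        self.records: List[Tuple[str, str]] = []  # (ArtifactKey, location), in log order
        self.seal_position: Optional[int] = None  # LogPosition of the seal record

    def append(self, key: str, location: str) -> None:
        """Add an index entry; only open segments accept appends."""
        if self.seal_position is not None:
            raise RuntimeError("segment is sealed; append rejected")
        self.records.append((key, location))

    def seal(self, log: List[Tuple[str, str]]) -> int:
        """Close the segment to append and write its seal record to the log."""
        if self.seal_position is not None:
            raise RuntimeError("segment already sealed")
        log.append(("segment_seal", self.segment_id))
        self.seal_position = len(log) - 1
        return self.seal_position

    def visible_at(self, current: int) -> bool:
        """Visible for lookup iff the seal record sits at a position <= CURRENT."""
        return self.seal_position is not None and self.seal_position <= current
```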
5.3 Snapshot Interaction
- Snapshots capture sealed segments.
- Open segments need not survive a snapshot.
- Sealed segments captured by a snapshot serve as replay anchors.
6. Visibility and Lookup Semantics
6.1 Visibility Rules
- An entry is visible iff all of the following hold:
  - The referenced block is sealed.
  - Its log record exists at a position ≤ CURRENT.
  - Its segment's seal is recorded in the log.
- Entries above CURRENT or referencing unsealed blocks are invisible.
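The visibility rule can be written as a single predicate. This sketch assumes each entry carries the position of its own log record and of its segment's seal record, and reads "seal recorded in the log" as "recorded at a position ≤ CURRENT", consistent with 6.2; `EntryView` and `is_visible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntryView:
    block_sealed: bool                    # is the referenced block sealed?
    log_position: int                     # position of the entry's own log record
    segment_seal_position: Optional[int]  # position of the segment seal, if logged


def is_visible(e: EntryView, current: int) -> bool:
    """True only when all three visibility conditions hold at CURRENT."""
    return (
        e.block_sealed
        and e.log_position <= current
        and e.segment_seal_position is not None
        and e.segment_seal_position <= current
    )
```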
6.2 Lookup Semantics
To resolve an ArtifactKey:
- Identify all visible segments (seal log position ≤ CURRENT).
- Search segments in reverse seal-log order (highest seal log position first).
- Return first matching entry.
- Respect tombstones: if the first matching entry is a tombstone, it shadows all prior entries and the key resolves to no location.
Determinism:
- Lookup results are identical across platforms given the same snapshot and log prefix.
- Accelerations (bloom filters, sharding, SIMD) do not alter correctness.
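A minimal sketch of the lookup procedure, assuming each sealed segment is given as a `(seal_position, records)` pair whose `records` dictionary already holds the last entry per key within that segment; `lookup` and the `TOMBSTONE` sentinel are hypothetical names. It uses a plain reverse scan, so an acceleration such as a bloom filter could only skip segments, never change the result.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = object()  # hypothetical sentinel marking a shadowing tombstone


def lookup(segments: List[Tuple[int, Dict[str, object]]],
           key: str, current: int) -> Optional[object]:
    """Resolve key against sealed segments given as (seal_position, records) pairs."""
    visible = [s for s in segments if s[0] <= current]  # seal log position <= CURRENT
    # Highest seal log position first, so newer segments shadow older ones.
    for _, records in sorted(visible, key=lambda s: s[0], reverse=True):
        if key in records:
            hit = records[key]
            return None if hit is TOMBSTONE else hit  # first match wins
    return None
```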
7. Snapshot Interaction
- Snapshots capture the set of sealed blocks and sealed index segments at a point in time.
- Blocks referenced by a snapshot are pinned and cannot be garbage-collected until snapshot expiration.
- CURRENT is reconstructed as:
CURRENT = snapshot_state + replay(log)
Segment and block visibility rules:
| Entity | Visible in snapshot | Visible in CURRENT |
|---|---|---|
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
8. Garbage Collection
Eligibility for GC:
- Segments: sealed, no references from CURRENT or snapshots.
- Blocks: unpinned, unreferenced by any segment or artifact.
Rules:
- GC is safe only on sealed segments and blocks.
- Must respect snapshot pins.
- Tombstones may aid in invalidating unreachable blocks.
Outcome:
- GC never violates CURRENT reconstruction.
- Blocks can be reclaimed without breaking provenance.
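The eligibility rules resemble mark-and-sweep: everything referenced by CURRENT or by any live snapshot is marked, and only the remaining sealed blocks are candidates. This sketch assumes the reference sets have already been computed from visible index entries; `reclaimable_blocks` is a hypothetical helper, and unsealed blocks are simply never passed in.

```python
from typing import Dict, Set


def reclaimable_blocks(sealed_blocks: Set[str],
                       current_refs: Set[str],
                       snapshot_refs: Dict[str, Set[str]]) -> Set[str]:
    """Return sealed blocks referenced neither by CURRENT nor by any live snapshot."""
    live: Set[str] = set(current_refs)   # mark: reachable from CURRENT
    for refs in snapshot_refs.values():  # mark: pinned by each live snapshot
        live |= refs
    return sealed_blocks - live          # sweep: everything unmarked
```

Expiring a snapshot removes its entry from `snapshot_refs`, which is what releases its pins.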
9. Tombstone Semantics
- Optional marker to invalidate prior mappings.
- Visibility rules identical to regular index entries.
- Used to maintain a deterministic CURRENT in the face of shadowing or deletions.
10. Small vs Large Block Handling
10.1 Definitions
| Term | Meaning |
|---|---|
| Small block | Block containing artifact bytes below a threshold T_small. |
| Large block | Block containing artifact bytes ≥ T_small. |
| Mixed segment | Segment containing both small and large blocks (discouraged). |
| Packing | Combining multiple small artifacts into a single physical block. |
Small vs large classification is store-level only and transparent to ASL-CORE and index layers.
10.2 Packing Rules
- Small blocks may be packed together to reduce storage overhead.
- Large blocks are never packed with other artifacts.
- Mixed segments are allowed but discouraged; index semantics remain identical.
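One way to realize these rules is next-fit packing: small artifacts are appended to the current packed block until it would overflow, while every large artifact gets a dedicated block. This sketch covers block assignment only (not segment allocation); `pack`, `is_small`, `block_capacity` and the `B0`, `B1`, ... ID scheme are hypothetical, and it assumes `block_capacity >= t_small` so every small artifact fits in an empty block.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class ArtifactExtent(NamedTuple):
    block_id: str
    offset: int
    length: int


def is_small(size: int, t_small: int) -> bool:
    """Store-level classification: small iff size < T_small."""
    return size < t_small


def pack(artifacts: List[Tuple[str, bytes]], t_small: int, block_capacity: int
         ) -> Tuple[Dict[str, bytes], Dict[str, List[ArtifactExtent]]]:
    """Next-fit packing of small artifacts; large artifacts get dedicated blocks."""
    if block_capacity < t_small:
        raise ValueError("block_capacity must be >= t_small so any small artifact fits")
    blocks: Dict[str, bytearray] = {}
    locations: Dict[str, List[ArtifactExtent]] = {}
    packing: Optional[str] = None  # packed block currently being filled

    def new_block() -> str:
        block_id = f"B{len(blocks)}"
        blocks[block_id] = bytearray()
        return block_id

    for key, data in artifacts:
        if not is_small(len(data), t_small):
            bid = new_block()  # large: never shares a block with other artifacts
            blocks[bid] += data
            locations[key] = [ArtifactExtent(bid, 0, len(data))]
            continue
        if packing is None or len(blocks[packing]) + len(data) > block_capacity:
            packing = new_block()  # no packed block yet, or the current one is full
        offset = len(blocks[packing])
        blocks[packing] += data
        locations[key] = [ArtifactExtent(packing, offset, len(data))]
    return {bid: bytes(buf) for bid, buf in blocks.items()}, locations
```

The returned locations use the same `(BlockID, offset, length)` extents as any other artifact, which is why packing stays invisible to index semantics.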
10.3 Segment Allocation Rules
- Small blocks are allocated into segments optimized for packing efficiency.
- Large blocks are allocated into segments optimized for sequential I/O.
- Segment sealing and visibility rules remain unchanged.
10.4 Indexing and Addressing
All blocks are addressed uniformly:
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
Packing does not affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
10.5 GC and Retention
- Packed small blocks can be reclaimed only when all contained artifacts are unreachable.
- Large blocks are reclaimed per block.
Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
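The packed-block rule reduces to a per-block check that none of its resident artifacts is still live. This sketch assumes a hypothetical map from BlockID to the artifacts stored in it (an artifact striped across several blocks appears under each one) and a precomputed set of artifacts referenced by CURRENT or any snapshot; a large block is just the one-artifact case. `reclaimable_packed` is an illustrative name.

```python
from typing import Dict, List, Set


def reclaimable_packed(block_contents: Dict[str, List[str]],
                       live_artifacts: Set[str]) -> Set[str]:
    """A block is reclaimable only if none of the artifacts it holds is still live."""
    return {
        block_id
        for block_id, artifacts in block_contents.items()
        if not any(a in live_artifacts for a in artifacts)
    }
```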
11. Crash and Recovery Semantics
- Open segments or unsealed blocks may be lost; no invariant is broken.
- Recovery procedure:
  - Mount the last checkpoint snapshot.
  - Replay the append-only log from the checkpoint.
  - Reconstruct CURRENT.
- Recovery is deterministic and idempotent.
- Segments and blocks are never partially visible after a crash.
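The recovery procedure can be sketched as a copy of the checkpoint state followed by an in-order replay of the log suffix. This sketch uses no real store API: `recover`, the `(op, key, location)` record shape and `checkpoint_position` are assumptions, and `log` is assumed to contain only durable records from sealed segments, so lost open segments have nothing to contribute.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical durable log record: (op, artifact_key, location).
Entry = Tuple[str, str, Optional[str]]


def recover(checkpoint: Dict[str, Optional[str]], checkpoint_position: int,
            log: List[Entry]) -> Dict[str, Optional[str]]:
    """Mount the checkpoint, replay log[checkpoint_position:], and return CURRENT."""
    state = dict(checkpoint)  # copy: the mounted snapshot is never mutated
    for op, key, location in log[checkpoint_position:]:
        if op == "add":
            state[key] = location
        elif op == "tombstone":
            state[key] = None  # None marks a shadowed key
        else:
            raise ValueError(f"unknown log op: {op!r}")
    return state
```

Running it again on the same inputs returns the same state, which is the idempotence property above.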
12. Normative Invariants
- Sealed blocks are immutable.
- Index entries referencing blocks are immutable once visible.
- Shadowing follows strict log order.
- Replay of snapshot + log uniquely reconstructs CURRENT.
- GC cannot remove blocks or segments needed by snapshot or CURRENT.
- Tombstones shadow prior entries without deleting underlying blocks prematurely.
- IndexState (SnapshotID, LogPosition) uniquely identifies CURRENT.
13. Non-Goals
- Disk-level encoding (ENC-ASL-CORE-INDEX).
- Memory layout or caching.
- Sharding or performance heuristics.
- Federation / multi-domain semantics (handled elsewhere).
- Block packing strategies beyond the policy rules here.
14. Relationship to Other Layers
| Layer | Responsibility |
|---|---|
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
15. Summary
The tier1 ASL-STORE-INDEX specification:
- Defines block lifecycle and segment lifecycle.
- Makes snapshot identity and log positions explicit for replay.
- Ensures deterministic visibility, lookup, and crash recovery.
- Formalizes GC safety and tombstone behavior.
- Adds clear small vs large block handling without changing core semantics.