amduat-api/notes/asl-store-index.md
2026-01-17 00:19:49 +01:00


ASL-STORE-INDEX

Store Semantics and Contracts for ASL Index


1. Purpose

This document defines the store-level responsibilities and contracts required to implement the ASL-CORE-INDEX semantics.

It bridges the gap between index meaning and physical storage, ensuring:

  • Deterministic replay
  • Snapshot-aware visibility
  • Immutable block guarantees
  • Idempotent recovery
  • Correctness of CURRENT state

It does not define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).


2. Scope

This specification covers:

  • Index segment lifecycle
  • Interaction between index and ASL blocks
  • Append-only log semantics
  • Snapshot integration
  • Visibility and lookup rules
  • Crash safety and recovery
  • Garbage collection constraints

It does not cover:

  • Disk format details
  • Bloom filter algorithms
  • File system specifics
  • Placement heuristics beyond semantic guarantees

3. Core Concepts

3.1 Index Segment

A segment is a contiguous set of index entries written by the store.

  • Open while accepting new entries
  • Sealed when closed for append
  • Sealed segments are immutable
  • Sealed segments become snapshot-visible only after their seal record is appended to the log

Segments are the unit of persistence, replay, and GC.


3.2 ASL Block Relationship

Each index entry references a sealed block via:

ArtifactKey → (BlockID, offset, length)
  • The store must ensure the block is sealed before the entry becomes log-visible
  • Blocks are immutable after seal
  • Open blocks may be abandoned without violating invariants
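
The mapping above can be sketched as a small data structure. This is a minimal illustration, not the store's actual layout: `ArtifactLocation` is the name used later in this document, while `Block`, `read_artifact`, the string-typed `block_id`, and the in-memory `blocks` dictionary are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactLocation:
    """Where an artifact's bytes live: (BlockID, offset, length)."""
    block_id: str  # BlockID is opaque in the spec; str is an assumption
    offset: int
    length: int


@dataclass
class Block:
    """Hypothetical in-memory stand-in for an ASL block."""
    block_id: str
    data: bytes
    sealed: bool = False


def read_artifact(blocks: dict, location: ArtifactLocation) -> bytes:
    """Resolve a location to bytes, refusing unsealed blocks."""
    block = blocks[location.block_id]
    if not block.sealed:
        raise ValueError("entry references an unsealed block")
    end = location.offset + location.length
    if location.offset < 0 or location.length < 0 or end > len(block.data):
        raise ValueError("location falls outside the block")
    return block.data[location.offset:end]


blocks = {"blk-1": Block("blk-1", b"hello world", sealed=True)}
index = {"key-a": ArtifactLocation("blk-1", 6, 5)}
payload = read_artifact(blocks, index["key-a"])
```

Freezing `ArtifactLocation` mirrors the requirement that a visible entry always points to immutable bytes; the sealed check is where a real store would enforce the ordering between block seal and entry visibility.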

3.3 Append-Only Log

All store-visible mutations are recorded in a strictly ordered, append-only log:

  • Entries include index additions, tombstones, and segment seals
  • Log is durable and replayable
  • Log defines visibility above checkpoint snapshots

CURRENT state is derived as:

CURRENT = checkpoint_state + replay(log)
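
A minimal sketch of this formula, assuming log records are plain tuples and the state is a dictionary of mappings plus a set of sealed segment ids. The record shapes, the `replay` function, and the state layout are hypothetical; gating entry visibility on segment seals is deliberately left out to keep the example short.

```python
def replay(checkpoint_state: dict, log: list) -> dict:
    """Apply log records, in order, on top of a checkpoint state.

    Hypothetical record shapes:
      ("add", key, location), ("tombstone", key), ("seal", segment_id)
    """
    current = {
        "index": dict(checkpoint_state.get("index", {})),
        "sealed": set(checkpoint_state.get("sealed", ())),
    }
    for record in log:
        kind = record[0]
        if kind == "add":
            current["index"][record[1]] = record[2]  # later adds shadow earlier ones
        elif kind == "tombstone":
            current["index"].pop(record[1], None)    # invalidate the prior mapping
        elif kind == "seal":
            current["sealed"].add(record[1])
        else:
            raise ValueError(f"unknown log record kind: {kind!r}")
    return current


checkpoint = {"index": {"a": ("blk-1", 0, 4)}, "sealed": {"seg-1"}}
log = [
    ("add", "b", ("blk-2", 0, 8)),
    ("add", "a", ("blk-2", 8, 4)),  # shadows the checkpoint mapping for "a"
    ("tombstone", "b"),
    ("seal", "seg-2"),
]
current = replay(checkpoint, log)
```

Because `replay` copies the checkpoint and depends only on its inputs, calling it again with the same checkpoint and log prefix yields the same CURRENT.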

4. Segment Lifecycle

4.1 Creation

  • Open segment is allocated
  • Index entries appended in log order
  • Entries remain invisible until the segment is sealed and its seal record is appended to the log

4.2 Seal

  • Segment is closed to append
  • Seal record is written to append-only log
  • Segment becomes visible for lookup
  • Sealed segment may be snapshot-pinned
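
The creation and seal steps can be sketched as a small state machine. The `Segment` and `Store` classes, and the in-memory list standing in for the durable log, are assumptions for illustration only.

```python
class Segment:
    def __init__(self, segment_id: str) -> None:
        self.segment_id = segment_id
        self.entries: list = []
        self.sealed = False

    def append(self, key: str, location: tuple) -> None:
        if self.sealed:
            raise RuntimeError("sealed segments are immutable")
        self.entries.append((key, location))


class Store:
    def __init__(self) -> None:
        self.log: list = []      # stand-in for the durable append-only log
        self.visible: list = []  # sealed segments, in seal order

    def open_segment(self, segment_id: str) -> Segment:
        return Segment(segment_id)

    def seal(self, segment: Segment) -> None:
        if segment.sealed:
            return  # sealing twice is a no-op in this sketch
        segment.sealed = True                          # 1. close to append
        self.log.append(("seal", segment.segment_id))  # 2. write the seal record
        self.visible.append(segment)                   # 3. only now visible to lookup


store = Store()
seg = store.open_segment("seg-1")
seg.append("key-a", ("blk-1", 0, 4))
visible_before = [s.segment_id for s in store.visible]
store.seal(seg)
visible_after = [s.segment_id for s in store.visible]
```

The ordering inside `seal` is the point of the sketch: the segment is added to the visible set only after the seal record has been appended.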

4.3 Snapshot Interaction

  • Snapshots capture sealed segments
  • Open segments need not survive snapshot
  • Segments at or below a snapshot serve as replay anchors

4.4 Garbage Collection

  • Only sealed and unreachable segments can be deleted
  • GC operates at segment granularity
  • GC must not break CURRENT or violate invariants

5. Lookup Semantics

To resolve an ArtifactKey:

  1. Identify all visible segments ≤ CURRENT
  2. Search segments in reverse creation order (newest first)
  3. Return the first matching entry
  4. Respect tombstone entries (if present)

Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, but correctness must be independent of acceleration strategies.
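
The four steps can be sketched as a plain newest-first scan. Segments are modeled as dictionaries and `TOMBSTONE` is a hypothetical sentinel; neither reflects the real segment format.

```python
TOMBSTONE = object()  # hypothetical marker for a tombstone entry


def lookup(visible_segments: list, key: str):
    """Return the newest visible mapping for key, or None.

    visible_segments is ordered oldest first; each segment is a dict
    mapping key -> location or TOMBSTONE.
    """
    for segment in reversed(visible_segments):  # newest first
        if key in segment:
            entry = segment[key]
            return None if entry is TOMBSTONE else entry
    return None


segments = [
    {"a": ("blk-1", 0, 4), "b": ("blk-1", 4, 4)},  # oldest
    {"a": ("blk-2", 0, 4)},                        # shadows "a"
    {"b": TOMBSTONE},                              # newest: invalidates "b"
]
```

With these segments, `lookup(segments, "a")` resolves to the entry in the second segment, while `lookup(segments, "b")` returns `None` because the tombstone is found before the older mapping.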


6. Visibility Guarantees

  • An entry is visible if and only if:

    • The block is sealed
    • Log record exists ≤ CURRENT
    • Segment seal recorded in log
  • Entries above CURRENT or referencing unsealed blocks are invisible
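
The visibility rule can be written as a single predicate. This sketch assumes CURRENT is represented as an integer log position and that the segment seal must also fall within that prefix; `EntryRecord` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntryRecord:
    block_sealed: bool
    entry_log_pos: Optional[int]  # None if the entry has no log record yet
    seal_log_pos: Optional[int]   # None if the segment seal is not logged


def is_visible(entry: EntryRecord, current_pos: int) -> bool:
    """True only when all three visibility conditions hold."""
    if not entry.block_sealed:
        return False
    if entry.entry_log_pos is None or entry.entry_log_pos > current_pos:
        return False
    if entry.seal_log_pos is None or entry.seal_log_pos > current_pos:
        return False
    return True


visible = is_visible(EntryRecord(True, 5, 7), current_pos=10)
above_current = is_visible(EntryRecord(True, 5, 12), current_pos=10)
unsealed_block = is_visible(EntryRecord(False, 5, 7), current_pos=10)
```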


7. Crash and Recovery Semantics

7.1 Crash During Open Segment

  • Open segments may be lost
  • Index entries may be leaked
  • No sealed segment may be corrupted

7.2 Recovery Procedure

  1. Mount latest checkpoint snapshot
  2. Replay append-only log from checkpoint
  3. Rebuild CURRENT
  4. Resume normal operation

Recovery must be deterministic and idempotent.
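
One way to make recovery idempotent is to tag each log record with a sequence number and skip records the checkpoint already reflects. The sketch below assumes that scheme; the `recover` function, the `(index, last_applied_seq)` checkpoint shape, and the record tuples are illustrative, not part of the specification.

```python
def recover(checkpoint: tuple, log: list) -> tuple:
    """Rebuild CURRENT from a checkpoint and the log.

    checkpoint: (index_dict, last_applied_seq)
    log: list of (seq, kind, key, location) in strictly increasing seq order
    """
    index, position = dict(checkpoint[0]), checkpoint[1]
    for seq, kind, key, location in log:
        if seq <= position:
            continue  # already reflected in the checkpoint
        if kind == "add":
            index[key] = location
        elif kind == "tombstone":
            index.pop(key, None)
        position = seq
    return index, position


checkpoint = ({"a": ("blk-1", 0, 4)}, 10)
log = [
    (10, "add", "a", ("blk-1", 0, 4)),
    (11, "add", "b", ("blk-2", 0, 8)),
    (12, "tombstone", "a", None),
]
first = recover(checkpoint, log)
second = recover(first, log)  # recovering from the recovered state changes nothing
```

Running `recover` twice from the same checkpoint shows determinism; feeding its own result back in shows idempotence, since every record is then at or below the stored position.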


8. Tombstone Semantics

  • Optional: tombstones may exist to invalidate prior mappings
  • Tombstones shadow prior entries with the same ArtifactKey
  • Tombstone visibility follows the same rules as regular entries

9. Invariants (Normative)

The store must enforce:

  1. No segment visible without seal log record
  2. No mutation of sealed segment or block
  3. Shadowing follows log order strictly
  4. Replay uniquely reconstructs CURRENT
  5. GC does not remove segments referenced by snapshot or log
  6. ArtifactLocation always points to immutable bytes

10. Non-Goals

ASL-STORE-INDEX does not define:

  • Disk layout or encoding (ENC-ASL-CORE-INDEX)
  • Placement heuristics (small vs. large block packing)
  • Performance targets
  • Memory caching strategies
  • Federation or provenance mechanics

11. Relationship to Other Documents

| Layer | Responsibility |
| --- | --- |
| ASL-CORE-INDEX | Defines semantic meaning of mapping ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |

12. Summary

The store-index layer guarantees:

  • Immutable, snapshot-safe segments
  • Deterministic and idempotent replay
  • Correct visibility semantics
  • Safe crash recovery
  • Garbage collection constraints

This specification ensures that ASL-CORE-INDEX semantics are faithfully realized in the store without constraining encoding or acceleration strategies.

Here's a fully refined version of ASL-STORE-INDEX, incorporating block lifecycle, sealing, snapshot safety, retention, and GC rules, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.


ASL-STORE-INDEX

Store Semantics and Contracts for ASL Core Index (Refined)


1. Purpose

This document defines the operational and store-level semantics necessary to implement ASL-CORE-INDEX.

It specifies:

  • Block lifecycle: creation, sealing, retention
  • Index segment lifecycle: creation, append, seal, visibility
  • Snapshot interaction: pinning, deterministic visibility
  • Append-only log semantics
  • Garbage collection rules

It does not define encoding (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).


2. Scope

Covers:

  • Lifecycle of blocks and index entries
  • Snapshot and CURRENT consistency guarantees
  • Deterministic replay and recovery
  • GC and tombstone semantics

Excludes:

  • Disk-level encoding
  • Sharding strategies
  • Bloom filters or acceleration structures
  • Memory residency or caching
  • Federation or PEL semantics

3. Core Concepts

3.1 Block

  • Definition: Immutable storage unit containing artifact bytes.

  • Identifier: BlockID (opaque, unique)

  • Properties:

    • Once sealed, contents never change
    • Can be referenced by multiple artifacts
    • May be pinned by snapshots for retention
  • Lifecycle Events:

    1. Creation: block allocated but contents may still be written
    2. Sealing: block is finalized, immutable, and log-visible
    3. Retention: block remains accessible while pinned by snapshots or needed by CURRENT
    4. Garbage collection: block may be deleted if no longer referenced and unpinned

3.2 Index Segment

Segments group index entries and provide persistence and recovery units.

  • Open segment: accepting new index entries, not visible for lookup
  • Sealed segment: closed for append, log-visible, snapshot-pinnable
  • Segment components: header, optional bloom filter, index records, footer
  • Segment visibility: only after seal and log append

3.3 Append-Only Log

All store operations affecting index visibility are recorded in a strictly ordered, append-only log:

  • Entries include:

    • Index additions
    • Tombstones
    • Segment seals
  • Log is replayable to reconstruct CURRENT

  • Determinism: replay produces identical CURRENT from same snapshot and log prefix


4. Block Lifecycle Semantics

| Event | Description | Semantic Guarantees |
| --- | --- | --- |
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |

Notes:

  • Sealing ensures that any index entry referencing the block is deterministic and immutable.
  • Retention is driven by snapshot and log visibility rules.
  • GC must never violate CURRENT reconstruction guarantees.
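
The four lifecycle events can be sketched as state transitions on a single block. `BlockState`, the `Block` class, and the `pins` set of snapshot ids are hypothetical names chosen for this example.

```python
from enum import Enum, auto


class BlockState(Enum):
    OPEN = auto()
    SEALED = auto()
    DELETED = auto()


class Block:
    def __init__(self, block_id: str) -> None:
        self.block_id = block_id
        self.state = BlockState.OPEN
        self.data = bytearray()
        self.pins: set = set()  # ids of snapshots pinning this block

    def write(self, payload: bytes) -> int:
        """Append bytes to an open block and return the offset written at."""
        if self.state is not BlockState.OPEN:
            raise RuntimeError("only open blocks accept writes")
        offset = len(self.data)
        self.data.extend(payload)
        return offset

    def seal(self) -> None:
        if self.state is not BlockState.OPEN:
            raise RuntimeError("only open blocks can be sealed")
        self.state = BlockState.SEALED

    def can_collect(self, referenced_by_current: bool) -> bool:
        """Retention rule: sealed, unpinned, and unreachable from CURRENT."""
        return (
            self.state is BlockState.SEALED
            and not self.pins
            and not referenced_by_current
        )

    def collect(self, referenced_by_current: bool) -> None:
        if not self.can_collect(referenced_by_current):
            raise RuntimeError("block is still retained")
        self.state = BlockState.DELETED
        self.data = bytearray()


block = Block("blk-1")
first_offset = block.write(b"abcd")
second_offset = block.write(b"efgh")
block.seal()
block.pins.add("snap-1")
retained_while_pinned = not block.can_collect(referenced_by_current=False)
```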

5. Snapshot Interaction

  • Snapshots capture the set of sealed blocks and sealed index segments at a point in time.
  • Blocks referenced by a snapshot are pinned and cannot be garbage-collected until snapshot expiration.
  • CURRENT is reconstructed as:
CURRENT = snapshot_state + replay(log)
  • Segment and block visibility rules:

| Entity | Visible in snapshot | Visible in CURRENT |
| --- | --- | --- |
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |

6. Index Lookup Semantics

To resolve an ArtifactKey:

  1. Identify all visible segments ≤ CURRENT
  2. Search segments in reverse creation order (newest first)
  3. Return first matching entry
  4. Respect tombstones to shadow prior entries

Determinism:

  • Lookup results are identical across platforms given the same snapshot and log prefix
  • Accelerations (bloom filters, sharding, SIMD) do not alter correctness
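
To illustrate why an acceleration structure need not change results, the sketch below compares a plain newest-first scan with one that consults a per-segment prefilter first. `KeyFilter` is a deliberately crude hash-bucket stand-in, not a real bloom filter, and every name here is hypothetical. The only property relied on is that the filter never reports a false negative.

```python
import zlib

TOMBSTONE = object()  # hypothetical tombstone marker


class KeyFilter:
    """Hash-bucket prefilter: false positives possible, false negatives not."""

    def __init__(self, keys, buckets: int = 8) -> None:
        self.buckets = buckets
        self.bits = {self._bucket(k) for k in keys}

    def _bucket(self, key: str) -> int:
        # crc32 is stable across processes, unlike the built-in str hash
        return zlib.crc32(key.encode("utf-8")) % self.buckets

    def might_contain(self, key: str) -> bool:
        return self._bucket(key) in self.bits


def resolve(entry):
    return None if entry is TOMBSTONE else entry


def plain_lookup(segments: list, key: str):
    for entries in reversed(segments):  # newest first
        if key in entries:
            return resolve(entries[key])
    return None


def filtered_lookup(segments: list, filters: list, key: str):
    for entries, key_filter in zip(reversed(segments), reversed(filters)):
        if not key_filter.might_contain(key):
            continue  # the filter guarantees the key is absent here
        if key in entries:
            return resolve(entries[key])
    return None


segments = [
    {"a": ("blk-1", 0, 4), "b": ("blk-1", 4, 4)},
    {"a": ("blk-2", 0, 4)},
    {"b": TOMBSTONE},
]
filters = [KeyFilter(entries.keys()) for entries in segments]
probe_keys = ["a", "b", "c", "missing"]
results_match = all(
    plain_lookup(segments, k) == filtered_lookup(segments, filters, k)
    for k in probe_keys
)
```

A filter that could return false negatives would break this equivalence, which is why correctness must be argued from the scan order rather than from the accelerator.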

7. Garbage Collection

  • Eligibility for GC:

    • Segments: sealed, no references from CURRENT or snapshots
    • Blocks: unpinned, unreferenced by any segment or artifact
  • Rules:

    • GC is safe only on sealed segments and blocks
    • Must respect snapshot pins
    • Tombstones may aid in invalidating unreachable blocks
  • Outcome:

    • GC never violates CURRENT reconstruction
    • Blocks can be reclaimed without breaking provenance
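
The eligibility rules can be sketched as a set computation over segments, CURRENT, and snapshot pins. The `gc_plan` function and the dictionary shapes are assumptions for illustration. Block seal state is not modeled separately: the sketch relies on the invariant that blocks referenced only by sealed segments are themselves sealed.

```python
def gc_plan(segments: dict, current_segments: set, snapshots: dict) -> tuple:
    """Return (segment_ids, block_ids) eligible for deletion.

    segments:         segment_id -> {"sealed": bool, "blocks": set of block ids}
    current_segments: segment ids reachable from CURRENT
    snapshots:        snapshot_id -> {"segments": set, "blocks": set} (pins)
    """
    pinned_segments = set().union(*(s["segments"] for s in snapshots.values()))
    pinned_blocks = set().union(*(s["blocks"] for s in snapshots.values()))
    live_segments = current_segments | pinned_segments

    # Only sealed segments that nothing references are candidates.
    dead_segments = {
        sid for sid, seg in segments.items()
        if seg["sealed"] and sid not in live_segments
    }

    # A block stays live if pinned or referenced by any surviving segment
    # (open segments are kept conservatively).
    live_blocks = set(pinned_blocks)
    for sid, seg in segments.items():
        if sid not in dead_segments:
            live_blocks |= seg["blocks"]

    all_blocks = set().union(*(seg["blocks"] for seg in segments.values()))
    return dead_segments, all_blocks - live_blocks


segments = {
    "seg-1": {"sealed": True, "blocks": {"blk-1"}},
    "seg-2": {"sealed": True, "blocks": {"blk-1", "blk-2"}},
    "seg-3": {"sealed": True, "blocks": {"blk-3"}},
    "seg-4": {"sealed": False, "blocks": {"blk-4"}},
}
current_segments = {"seg-2"}
snapshots = {"snap-1": {"segments": {"seg-1"}, "blocks": {"blk-1"}}}
dead_segments, dead_blocks = gc_plan(segments, current_segments, snapshots)
```

In this example only `seg-3` and `blk-3` are reclaimable: `seg-1` is pinned by the snapshot, `seg-2` is part of CURRENT, and `seg-4` is still open.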

8. Tombstone Semantics

  • Optional marker to invalidate prior mappings
  • Visibility rules identical to regular index entries
  • Used to maintain deterministic CURRENT in the face of shadowing or deletions

9. Crash and Recovery Semantics

  • Open segments or unsealed blocks may be lost; no invariant is broken

  • Recovery procedure:

    1. Mount last checkpoint snapshot
    2. Replay append-only log
    3. Reconstruct CURRENT
  • Recovery is deterministic and idempotent

  • Segments and blocks are never partially visible after a crash


10. Normative Invariants

  1. Sealed blocks are immutable
  2. Index entries referencing blocks are immutable once visible
  3. Shadowing follows strict log order
  4. Replay of snapshot + log uniquely reconstructs CURRENT
  5. GC cannot remove blocks or segments needed by snapshot or CURRENT
  6. Tombstones shadow prior entries without deleting underlying blocks prematurely

11. Non-Goals

  • Disk-level encoding (ENC-ASL-CORE-INDEX)
  • Memory layout or caching
  • Sharding or performance heuristics
  • Federation / multi-domain semantics (handled elsewhere)
  • Block packing strategies (small vs large blocks)

12. Relationship to Other Layers

| Layer | Responsibility |
| --- | --- |
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |

13. Summary

The refined ASL-STORE-INDEX:

  • Defines block lifecycle: creation, sealing, retention, GC
  • Ensures snapshot safety and deterministic visibility
  • Guarantees immutable, replayable, and recoverable CURRENT
  • Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics