# ASL/STORE-INDEX/1 — Store Semantics and Contracts for ASL Core Index

Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, index, log, storage]

<!-- Source: /amduat-api/tier1/asl-store-index.md | Canonical: /amduat/tier1/asl-store-index-1.md -->

**Document ID:** `ASL/STORE-INDEX/1`
**Layer:** L1 — Store lifecycle and replay contracts (no encoding)

**Depends on (normative):**

* `ASL/1-CORE-INDEX` — semantic index model
* `ASL/LOG/1` — append-only log semantics

**Informative references:**

* `ENC/ASL-CORE-INDEX/1` — index segment encoding
* `ASL/SYSTEM/1` — unified system view (PEL/TGK/federation alignment)
* `TGK/1` — TGK semantics and visibility alignment
* `TGK/1-CORE` — EdgeBody and EdgeTypeId definitions

© 2025 Niklas Rydberg.

## License

Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.

Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.

---

## 1. Purpose

This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.

It specifies:

* **Block lifecycle**: creation, sealing, retention, GC
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot identity and log positions** for deterministic replay
* **Append-only log semantics**
* **Lookup, visibility, and crash recovery rules**
* **Small vs large block handling**

It **does not define encoding** (see `ENC/ASL-CORE-INDEX/1`) or semantic mapping (see `ASL/1-CORE-INDEX`).

---

## 2. Scope

Covers:

* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
* Packing policy for small vs large artifacts

Excludes:

* Disk-level encoding
* Sharding or acceleration strategies (see `ASL/INDEX-ACCEL/1`)
* Memory residency or caching
* Federation, PEL, or TGK semantics (see `TGK/1` and `TGK/1-CORE`)

---

## 3. Core Concepts

### 3.1 Block

* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique).
* **Properties:**

  * Once sealed, contents never change.
  * Can be referenced by multiple artifacts.
  * May be pinned by snapshots for retention.
  * Allocation method is implementation-defined (e.g., hash or sequence).

### 3.2 Index Segment

Segments group index entries and provide **persistence and recovery units**.

* **Open segment:** accepting new index entries, not visible for lookup.
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
* **Segment components:** header, optional bloom filter, index records, footer.
* **Segment visibility:** only after seal and log append.

### 3.3 Append-Only Log

All store-visible mutations are recorded in a **strictly ordered, append-only log**:

* Entries include:

  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT.
* Log semantics are defined in `ASL/LOG/1`.

### 3.4 Snapshot Identity and Log Position

To make CURRENT referenceable and replayable, ASL-STORE-INDEX defines:

* **SnapshotID**: opaque, immutable identifier for a snapshot.
* **LogPosition**: monotonic integer position in the append-only log.
* **IndexState**: `(SnapshotID, LogPosition)`.

Deterministic replay is defined as:

```
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
```

Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.

**Implementation note (determinism):** This repository interprets `LogPosition`
as the inclusive `logseq` upper bound defined by `ASL/LOG/1`, not a byte offset
into the log file. Snapshot anchors use their record `logseq` as the snapshot's
log position.

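The replay definition can be illustrated with a minimal Python sketch. It is not a reference implementation and rests on several assumptions: the hypothetical `Snapshot` type carries a materialized key-to-location map plus its anchor `logseq`, log records are modeled as hypothetical `(logseq, key, location)` tuples, and `LogPosition` is treated as an inclusive `logseq` bound, as the implementation note describes. Records at or below the anchor are assumed to be reflected in the snapshot already, so `index_at` (a hypothetical name) replays only the suffix up to `LogPosition`. Tombstones and segment seals are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical record shape: (logseq, artifact_key, location).
LogRecord = Tuple[int, str, str]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    anchor_logseq: int  # logseq of the snapshot anchor record
    index: Dict[str, str] = field(default_factory=dict)


def index_at(snapshot: Snapshot, log: List[LogRecord], log_position: int) -> Dict[str, str]:
    """Return Index(SnapshotID, LogPosition) as a fresh mapping."""
    if log_position < snapshot.anchor_logseq:
        raise ValueError("LogPosition precedes the snapshot anchor")
    state = dict(snapshot.index)  # copy: the pinned snapshot is never mutated
    # Replay strictly in logseq order with an inclusive upper bound.
    for logseq, key, location in sorted(log, key=lambda r: r[0]):
        if snapshot.anchor_logseq < logseq <= log_position:
            state[key] = location
    return state
```

Because the result depends only on the snapshot contents and the `(anchor, LogPosition]` slice of the log, any two replicas that agree on an `IndexState` compute the same index under this model.
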
### 3.5 Artifact Location

* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* Multi-extent locations allow a single artifact to be striped across multiple blocks.

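The extent model can be made concrete with a short Python sketch. It assumes sealed blocks are available as an in-memory mapping from a string `BlockID` to immutable `bytes`; `read_location` is a hypothetical helper, not an API defined by this specification. Each extent is bounds-checked, then the slices are concatenated in list order, so for example blocks `b"hello, "` and `b"xxworld!"` with extents `("b1", 0, 7)` and `("b2", 2, 6)` yield `b"hello, world!"`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ArtifactExtent:
    block_id: str
    offset: int
    length: int


# An ArtifactLocation is an ordered list of extents.
ArtifactLocation = List[ArtifactExtent]


def read_location(blocks: Dict[str, bytes], location: ArtifactLocation) -> bytes:
    """Concatenate the byte slices named by each extent, in order."""
    parts = []
    for ext in location:
        block = blocks[ext.block_id]  # KeyError if the block is missing
        if ext.offset < 0 or ext.length < 0 or ext.offset + ext.length > len(block):
            raise ValueError(f"extent out of range: {ext}")
        parts.append(block[ext.offset:ext.offset + ext.length])
    return b"".join(parts)
```
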
---

## 4. PUT/GET Contract (Normative)

### 4.1 PUT Signature

```
put(artifact) -> (ArtifactKey, IndexState)
```

* `ArtifactKey` is the content identity (`ASL/1-CORE-INDEX`).
* `IndexState = (SnapshotID, LogPosition)` after the PUT is admitted.

### 4.2 PUT Semantics

1. **Structural registration (if applicable)**: if a structural index (SID -> DAG) exists, it MUST register the artifact and reuse existing SID entries.
2. **Materialization (if applicable)**: if the artifact is lazy, materialize it deterministically to derive `ArtifactKey`.
3. **Deduplication**: look up `ArtifactKey` at CURRENT. If present, PUT MUST succeed without writing bytes or adding a new index entry.
4. **Storage**: if absent, write the bytes into one or more blocks, seal those blocks, and produce `ArtifactLocation`.
5. **Index mutation**: append an index entry mapping `ArtifactKey -> ArtifactLocation` and record its visibility via log order.

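Steps 3 to 5 can be sketched as a toy in-memory store in Python. The assumptions are for illustration only: `ArtifactKey` is taken to be the SHA-256 hex digest of the bytes (the real identity is defined by `ASL/1-CORE-INDEX`), steps 1 and 2 are skipped, each artifact occupies one freshly sealed block with a single extent, and `ToyStore` and its fields are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

Location = Tuple[str, int, int]  # single extent: (BlockID, offset, length)
IndexState = Tuple[str, int]     # (SnapshotID, LogPosition)


class ToyStore:
    """Hypothetical in-memory store illustrating PUT steps 3-5."""

    def __init__(self, snapshot_id: str = "snap-0") -> None:
        self.snapshot_id = snapshot_id
        self.sealed_blocks: Dict[str, bytes] = {}
        self.log: List[Tuple[int, str, Location]] = []  # (logseq, key, location)
        self.index: Dict[str, Location] = {}            # CURRENT view

    def state(self) -> IndexState:
        position = self.log[-1][0] if self.log else 0
        return (self.snapshot_id, position)

    def put(self, artifact: bytes) -> Tuple[str, IndexState]:
        key = hashlib.sha256(artifact).hexdigest()  # assumed key scheme
        # Step 3: deduplicate at CURRENT; no bytes written, no new entry.
        if key in self.index:
            return key, self.state()
        # Step 4: write into a new block and seal it (stored as immutable bytes).
        block_id = f"blk-{len(self.sealed_blocks)}"
        self.sealed_blocks[block_id] = bytes(artifact)
        location = (block_id, 0, len(artifact))
        # Step 5: append the index entry; its logseq fixes its visibility order.
        logseq = self.state()[1] + 1
        self.log.append((logseq, key, location))
        self.index[key] = location
        return key, self.state()
```

Calling `put(b"abc")` twice returns the same key and the same `IndexState` and leaves exactly one block and one log record, which is the idempotence required by Section 4.3.
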
### 4.3 PUT Guarantees

* PUT is idempotent for identical artifacts.
* No visible index entry points to mutable or missing bytes.
* Visibility follows log order and seal rules defined in this document.

### 4.4 GET Signature

```
get(ArtifactKey, IndexState?) -> bytes | NOT_FOUND
```

* `IndexState` defaults to CURRENT when omitted.

### 4.5 GET Semantics

1. Resolve `ArtifactKey -> ArtifactLocation` using `Index(SnapshotID, LogPosition)` for the requested `IndexState`.
2. If no entry exists, return `NOT_FOUND`.
3. Otherwise, read exactly the bytes referenced by each `(BlockID, offset, length)` extent, in order, and return their concatenation verbatim.

GET MUST NOT mutate state or trigger materialization.

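GET can likewise be sketched as a read-only Python function. For brevity it assumes the log is a list of hypothetical `(logseq, key, location)` records whose location is a list of `(BlockID, offset, length)` extents, ignores segment seals and tombstones, and uses `None` as a stand-in for `NOT_FOUND`. It resolves the key by scanning from the highest `logseq` at or below the requested position, so no index state is built or mutated.

```python
from typing import Dict, List, Optional, Tuple

Extent = Tuple[str, int, int]           # (BlockID, offset, length)
Record = Tuple[int, str, List[Extent]]  # hypothetical (logseq, key, location)
NOT_FOUND = None                        # stand-in sentinel


def get(key: str, log: List[Record], blocks: Dict[str, bytes],
        log_position: Optional[int] = None) -> Optional[bytes]:
    """Resolve key at an inclusive log position, then read its bytes verbatim."""
    if log_position is None:  # omitted IndexState: use CURRENT (latest record)
        log_position = max((r[0] for r in log), default=0)
    location = None
    # Step 1: the newest mapping at or below the position wins.
    for logseq, rec_key, rec_location in sorted(log, key=lambda r: r[0], reverse=True):
        if logseq <= log_position and rec_key == key:
            location = rec_location
            break
    if location is None:  # Step 2
        return NOT_FOUND
    # Step 3: read each extent in order and concatenate; nothing is written.
    return b"".join(blocks[b][off:off + n] for b, off, n in location)
```

Passing an older `log_position` reads the store as of that `IndexState`, which is how the optional second argument of `get` can be honored without touching CURRENT.
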
### 4.6 Failure Semantics

* Partial writes MUST NOT become visible.
* Replay of snapshot + log after a crash MUST reconstruct a valid CURRENT.
* Implementations MAY use caching, but MUST preserve determinism.

---

## 5. Block Lifecycle Semantics

| Event              | Description                           | Semantic Guarantees                                           |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation           | Block allocated; bytes may be written | Not visible to index until sealed                             |
| Sealing            | Block is finalized and immutable      | Sealed blocks are stable and safe to reference from index     |
| Retention          | Block remains accessible              | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted                  | Only unpinned, unreachable blocks may be removed              |

Notes:

* Sealing ensures any index entry referencing the block is immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.

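The lifecycle in the table can be read as a small state machine. The Python sketch below is illustrative only: `Block`, `BlockState`, the `pins` set of snapshot IDs, and the `referenced` flag are hypothetical names, and reachability is reduced to a single boolean. It enforces that writes stop at sealing, that only sealed blocks may be indexed, and that collection requires a sealed, unpinned, unreferenced block.

```python
from enum import Enum, auto
from typing import Set


class BlockState(Enum):
    OPEN = auto()
    SEALED = auto()
    COLLECTED = auto()


class Block:
    """Hypothetical block following creation -> sealing -> retention -> GC."""

    def __init__(self, block_id: str) -> None:
        self.block_id = block_id
        self.state = BlockState.OPEN
        self._buf = bytearray()
        self._sealed = b""
        self.pins: Set[str] = set()  # snapshot IDs pinning this block
        self.referenced = False      # True while CURRENT's index points here

    def write(self, data: bytes) -> int:
        """Append bytes to an open block and return their offset."""
        if self.state is not BlockState.OPEN:
            raise RuntimeError("sealed blocks are immutable")
        offset = len(self._buf)
        self._buf.extend(data)
        return offset

    def seal(self) -> bytes:
        if self.state is not BlockState.OPEN:
            raise RuntimeError("block is not open")
        self._sealed = bytes(self._buf)  # frozen copy; the buffer is dropped
        self._buf = bytearray()
        self.state = BlockState.SEALED
        return self._sealed

    def can_be_indexed(self) -> bool:
        return self.state is BlockState.SEALED  # invisible to the index until sealed

    def collect(self) -> None:
        if self.state is not BlockState.SEALED or self.pins or self.referenced:
            raise RuntimeError("only sealed, unpinned, unreachable blocks may be removed")
        self._sealed = b""
        self.state = BlockState.COLLECTED
```
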
---

## 6. Segment Lifecycle Semantics

### 6.1 Creation

* Open segment is allocated.
* Index entries appended in log order.
* Entries are invisible until segment seal and log append.

### 6.2 Seal

* Segment is closed to append.
* Seal record is written to append-only log.
* Segment becomes visible for lookup.
* Sealed segment may be snapshot-pinned.

### 6.3 Snapshot Interaction

* Snapshots capture sealed segments.
* Open segments need not survive snapshot.
* Segments below snapshot are replay anchors.

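A Python sketch of the segment lifecycle, assuming an in-memory log whose only record type here is a hypothetical `(logseq, "SEAL", segment_id)` tuple and entries modeled as `(ArtifactKey, location)` pairs; `SegmentStore` and its methods are illustrative names. Lookup-visible entries are derived from the log alone, so a segment whose seal record never reached the log contributes nothing, and a sealed segment rejects further appends.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # hypothetical (ArtifactKey, location) pair


class SegmentStore:
    """Entries in an open segment stay invisible until its seal record is logged."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, str, int]] = []  # (logseq, "SEAL", segment_id)
        self.open: Dict[int, List[Entry]] = {}
        self.sealed: Dict[int, Tuple[Entry, ...]] = {}
        self._next_id = 0

    def open_segment(self) -> int:
        seg_id = self._next_id
        self._next_id += 1
        self.open[seg_id] = []
        return seg_id

    def append(self, seg_id: int, entry: Entry) -> None:
        if seg_id not in self.open:
            raise ValueError("segment is sealed or unknown; closed to append")
        self.open[seg_id].append(entry)

    def seal(self, seg_id: int) -> int:
        self.sealed[seg_id] = tuple(self.open.pop(seg_id))  # close to append
        logseq = len(self.log) + 1
        self.log.append((logseq, "SEAL", seg_id))  # the log append publishes it
        return logseq

    def visible_entries(self) -> List[Entry]:
        # Only segments whose seal record is in the log are visible.
        return [e for _, _, seg_id in self.log for e in self.sealed[seg_id]]
```
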
---

## 7. Visibility and Lookup Semantics

### 7.1 Visibility Rules

* An index entry is visible **iff** all of the following hold:

  * Every block it references is sealed.
  * Its log record exists at a position ≤ CURRENT.
  * Its segment's seal is recorded in the log.
* Entries above CURRENT or referencing unsealed blocks are invisible.

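The three visibility conditions can be written as a single predicate. In this Python sketch `IndexEntry` and the `segment_seal_logseq` map are hypothetical, and one interpretation is assumed: the segment's seal record must itself lie at or below CURRENT, since records above CURRENT are invisible.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class IndexEntry:
    key: str
    block_ids: Tuple[str, ...]  # blocks referenced by the entry's location
    logseq: int                 # position of the entry's log record
    segment_id: int


def is_visible(entry: IndexEntry, current: int, sealed_blocks: Set[str],
               segment_seal_logseq: Dict[int, int]) -> bool:
    """True iff all three visibility conditions hold at CURRENT."""
    seal_at = segment_seal_logseq.get(entry.segment_id)
    return (
        all(b in sealed_blocks for b in entry.block_ids)  # blocks are sealed
        and entry.logseq <= current                        # record at or below CURRENT
        and seal_at is not None and seal_at <= current     # segment seal is logged
    )
```

An entry whose record sits at `logseq` 3 in a segment sealed at `logseq` 4 is therefore invisible at CURRENT = 3 and visible from CURRENT = 4 onward, provided its blocks are sealed.
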
### 7.2 Lookup Semantics

To resolve an `ArtifactKey`:

1. Identify all visible segments ≤ CURRENT.
2. Search segments in **reverse seal-log order** (highest seal log position first).
3. Stop at the first matching entry (tombstones included).
4. If that entry is a tombstone, return `NOT_FOUND`, since tombstones shadow prior entries; otherwise return the entry.

Determinism:

* Lookup results are identical across platforms given the same snapshot and log prefix.
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.

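A Python sketch of the lookup order, assuming each segment is represented as a hypothetical `(seal_logseq, {ArtifactKey: location})` pair and a tombstone as the sentinel `TOMBSTONE`. Tombstone lifts (Section 10) are deliberately left out, and `None` stands in for `NOT_FOUND`. Because the search order depends only on seal log positions, the order of the input list does not affect the result.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = object()  # hypothetical sentinel for a tombstone entry
Segment = Tuple[int, Dict[str, object]]  # (seal_logseq, {key: location or TOMBSTONE})


def lookup(key: str, segments: List[Segment], current: int) -> Optional[object]:
    # Step 1: keep only segments whose seal is at or below CURRENT.
    visible = [seg for seg in segments if seg[0] <= current]
    # Step 2: highest seal log position first.
    for _, entries in sorted(visible, key=lambda seg: seg[0], reverse=True):
        if key in entries:
            hit = entries[key]
            # Steps 3-4: first match wins; a tombstone shadows older entries.
            return None if hit is TOMBSTONE else hit
    return None  # NOT_FOUND
```
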
---

## 8. Snapshot Interaction

* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:

```
CURRENT = snapshot_state + replay(log)
```

Segment and block visibility rules:

| Entity               | Visible in snapshot          | Visible in CURRENT             |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block   | No                           | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log         |
| Tombstone            | Yes, if log-recorded         | Yes, shadows prior entries     |

---

## 9. Garbage Collection

Eligibility for GC:

* Segments: sealed, no references from CURRENT or snapshots.
* Blocks: unpinned, unreferenced by any segment or artifact.

Rules:

* GC is safe **only on sealed segments and blocks**.
* Must respect snapshot pins.
* Tombstones may aid in invalidating unreachable blocks.
* Snapshots retained for provenance or receipt verification MUST remain pinned.

Outcome:

* GC never violates CURRENT reconstruction.
* Blocks can be reclaimed without breaking provenance.

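Block eligibility can be computed with a mark step over every root that must survive. This Python sketch assumes each root (CURRENT and every pinned snapshot) is available as a hypothetical mapping from `ArtifactKey` to a list of `(BlockID, offset, length)` extents; `gc_candidates` is an illustrative name. A block referenced by any extent of any root is live, and because a packed block stays live while any one of its artifacts is reachable, this also covers the rule in Section 11.5. Segment eligibility is not modeled.

```python
from typing import Dict, Iterable, List, Set, Tuple

Extent = Tuple[str, int, int]  # (BlockID, offset, length)


def gc_candidates(sealed_blocks: Set[str],
                  roots: Iterable[Dict[str, List[Extent]]]) -> Set[str]:
    """Return sealed blocks that no root references through any extent."""
    live: Set[str] = set()
    for index in roots:  # CURRENT plus every pinned snapshot
        for location in index.values():
            live.update(block_id for block_id, _, _ in location)
    # Open blocks are excluded by construction: only sealed_blocks are considered.
    return sealed_blocks - live
```
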
---

## 10. Tombstone Semantics

* Optional marker to invalidate prior mappings.
* Visibility rules are identical to those of regular index entries.
* Used to maintain a deterministic CURRENT in the face of shadowing or deletions.
* `scope` and `reason_code` are policy metadata only; they do not affect
  shadowing order or replay determinism.
* Tombstone lifts cancel only the referenced tombstone record for the same
  artifact; other tombstones remain effective until lifted.
* Snapshot + log replay applies tombstones and lifts in `logseq` order; a lift
  that occurs after a snapshot becomes effective only when replay reaches its
  `logseq`.

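The lift rule can be sketched in Python under hypothetical record shapes: a tombstone is `(logseq, "TOMBSTONE", key, None)` and a lift is `(logseq, "LIFT", key, target_logseq)`, where `target_logseq` names the tombstone record being cancelled. `scope` and `reason_code` are omitted because they do not affect replay. The function reports which keys are shadowed by at least one unlifted tombstone at an inclusive log position; for example, with tombstones for `a` at `logseq` 1 and 2 and a lift of 1 at `logseq` 3, key `a` is still shadowed at position 3.

```python
from typing import Dict, List, Optional, Set, Tuple

# (logseq, kind, artifact_key, target_logseq); kind is "TOMBSTONE" or "LIFT"
Record = Tuple[int, str, str, Optional[int]]


def tombstoned_keys(records: List[Record], log_position: int) -> Set[str]:
    active: Dict[str, Set[int]] = {}  # key -> logseqs of unlifted tombstones
    for logseq, kind, key, target in sorted(records, key=lambda r: r[0]):
        if logseq > log_position:
            break  # later tombstones and lifts are not yet effective
        if kind == "TOMBSTONE":
            active.setdefault(key, set()).add(logseq)
        elif kind == "LIFT":
            # Cancels only the referenced tombstone, and only for the same key.
            active.get(key, set()).discard(target)
    return {key for key, tombs in active.items() if tombs}
```
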
---

## 11. Small vs Large Block Handling

### 11.1 Definitions

| Term              | Meaning                                                               |
| ----------------- | --------------------------------------------------------------------- |
| **Small block**   | Block containing artifact bytes below a threshold `T_small`.          |
| **Large block**   | Block containing artifact bytes ≥ `T_small`.                          |
| **Mixed segment** | Segment containing both small and large blocks (discouraged).         |
| **Packing**       | Combining multiple small artifacts into a single physical block.      |
| **BlockID**       | Opaque identifier for a block; addressing is identical for all sizes. |

Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.
`T_small` is configurable per deployment.

### 11.2 Packing Rules

1. **Small blocks may be packed together** to reduce storage overhead.
2. **Large blocks are never packed with other artifacts**.
3. Mixed segments are **allowed but discouraged**; implementations MAY warn when mixing occurs.

### 11.3 Segment Allocation Rules

1. Small blocks are allocated into segments optimized for packing efficiency.
2. Large blocks are allocated into segments optimized for sequential I/O.
3. Segment sealing and visibility rules remain unchanged.

### 11.4 Indexing and Addressing

All blocks are addressed uniformly:

```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```

Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.

### 11.5 GC and Retention

1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
2. Large blocks are reclaimed per block.

Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.

---

## 12. Crash and Recovery Semantics

* Open segments or unsealed blocks may be lost; no invariant is broken.
* Recovery procedure:

  1. Mount the last checkpoint snapshot.
  2. Replay the append-only log from the checkpoint.
  3. Reconstruct CURRENT.
* Recovery is **deterministic and idempotent**.
* Segments and blocks are **never partially visible** after a crash.

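The recovery procedure can be sketched in Python as a pure function, so running it twice trivially yields the same CURRENT. All record shapes and names here are hypothetical: log records are `(logseq, "ADD", segment_id, key, location)` or `(logseq, "SEAL", segment_id, None, None)`, the checkpoint is a key-to-location map assumed to be taken at a segment boundary, and an index addition becomes visible only when its segment's seal record is replayed. Additions in a segment that was still open at the crash are therefore dropped rather than partially exposed.

```python
from typing import Dict, List, Optional, Tuple

# (logseq, kind, segment_id, key, location); kind is "ADD" or "SEAL"
Record = Tuple[int, str, int, Optional[str], Optional[str]]


def recover(checkpoint: Dict[str, str], checkpoint_logseq: int,
            log: List[Record]) -> Dict[str, str]:
    current = dict(checkpoint)               # 1. mount checkpoint (copy)
    pending: Dict[int, Dict[str, str]] = {}  # additions of not-yet-sealed segments
    for logseq, kind, seg, key, loc in sorted(log, key=lambda r: r[0]):  # 2. replay
        if logseq <= checkpoint_logseq:
            continue                         # already covered by the checkpoint
        if kind == "ADD":
            pending.setdefault(seg, {})[key] = loc
        elif kind == "SEAL":
            current.update(pending.pop(seg, {}))  # publish the whole segment at once
    return current                           # 3. reconstructed CURRENT
```
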
---

## 13. Normative Invariants

1. Sealed blocks are immutable.
2. Index entries referencing blocks are immutable once visible.
3. Shadowing follows strict log order.
4. Replay of snapshot + log uniquely reconstructs CURRENT.
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.

---

## 14. Non-Goals

* Disk-level encoding (ENC-ASL-CORE-INDEX).
* Memory layout or caching.
* Sharding or performance heuristics.
* Federation / multi-domain semantics (handled elsewhere).
* Block packing strategies beyond the policy rules here.

---

## 15. Relationship to Other Layers

| Layer              | Responsibility                                                               |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |

---

## 16. Summary

The tier1 ASL-STORE-INDEX specification:

* Defines **block lifecycle** and **segment lifecycle**.
* Makes **snapshot identity and log positions** explicit for replay.
* Ensures deterministic visibility, lookup, and crash recovery.
* Formalizes GC safety and tombstone behavior.
* Adds clear **small vs large block** handling without changing core semantics.