# ASL/1-CORE-INDEX — Semantic Index Model

Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Tags: [deterministic, index, semantics]

**Document ID:** `ASL/1-CORE-INDEX`
**Layer:** L0.5 — Semantic mapping over ASL/1-CORE values (no storage / encoding / lifecycle)

**Depends on (normative):**

* `ASL/1-CORE`
* `ASL/1-STORE`

**Informative references:**

* `ASL-STORE-INDEX` — store lifecycle and replay contracts
* `ENC-ASL-CORE-INDEX` — bytes-on-disk encoding profile (`tier1/enc-asl-core-index.md`)
* `ASL/INDEX-ACCEL/1` — acceleration semantics (routing, filters, sharding)
* `ASL/LOG/1` — append-only semantic log (segment visibility)

---

## 0. Conventions

The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.

ASL/1-CORE-INDEX defines **semantic meaning only**. It does not define storage formats, on-disk encoding, or operational lifecycle. Those belong to ASL-STORE-INDEX, ASL/LOG/1, and ENC-ASL-CORE-INDEX.

---

## 1. Purpose & Non-Goals

### 1.1 Purpose

ASL/1-CORE-INDEX defines the **semantic model** for indexing artifacts:

* It specifies what it means to map an artifact identity to a byte location.
* It defines visibility, immutability, and shadowing semantics.
* It ensures deterministic lookup for a fixed snapshot and log prefix.

### 1.2 Non-goals

ASL/1-CORE-INDEX explicitly does **not** define:

* On-disk layouts, segment files, or memory representations.
* Block allocation, packing, GC, or lifecycle rules.
* Snapshot implementation details, checkpoints, or log storage.
* Performance optimizations (bloom filters, sharding, SIMD).
* Federation, provenance, or execution semantics.

---

## 2. Terminology

* **Artifact** — ASL/1 immutable value defined in ASL/1-CORE.
* **Reference** — ASL/1 content address of an Artifact (hash_id + digest).
* **StoreConfig** — `{ encoding_profile, hash_id }` fixed per StoreSnapshot (ASL/1-STORE).
* **Block** — immutable storage unit containing artifact bytes.
* **BlockID** — opaque identifier for a block.
* **ArtifactExtent** — `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation** — ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* **Snapshot** — a checkpointed StoreSnapshot (ASL/1-STORE) used as a base state.
* **Append-Only Log** — ordered sequence of index-visible mutations after a snapshot.
* **CURRENT** — effective state after replaying a log prefix on a snapshot.
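
As an informative illustration, the terms above could be modelled as immutable value types. This is a minimal sketch, not a normative representation: the field types (`int` for `hash_id` and `encoding_profile`, `bytes` for `digest` and the opaque `BlockID`) and the helper `location_length` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Reference:
    """Content address of an Artifact: hash_id + digest."""
    hash_id: int      # assumed integer identifier of the hash algorithm
    digest: bytes


@dataclass(frozen=True)
class StoreConfig:
    """Configuration fixed per StoreSnapshot."""
    encoding_profile: int  # assumed integer identifier
    hash_id: int


@dataclass(frozen=True)
class ArtifactExtent:
    """A byte slice (offset, length) inside the block named by block_id."""
    block_id: bytes   # BlockID is opaque; bytes is an assumption
    offset: int
    length: int


# An ArtifactLocation is an ordered sequence of extents; a tuple keeps it immutable.
ArtifactLocation = Tuple[ArtifactExtent, ...]


def location_length(location: ArtifactLocation) -> int:
    """Total number of artifact bytes described by the location (hypothetical helper)."""
    return sum(extent.length for extent in location)
```

Frozen dataclasses and tuples are used so that the values cannot be changed after construction, which mirrors the immutability the rest of this document relies on.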

---

## 3. Core Mapping Semantics

### 3.1 Index Mapping

The index defines a semantic mapping:

```
Reference -> ArtifactLocation
```

For any visible `Reference`, there is exactly one `ArtifactLocation` at a given CURRENT state.

### 3.2 Determinism

For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be deterministic. No nondeterministic input may affect index semantics.

### 3.3 StoreConfig Consistency

All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`.
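
The following informative sketch shows one way a digest-only physical key can coexist with a full `Reference` as the semantic key. The class `DigestKeyedIndex`, its methods, and the tuple shape of a location are hypothetical. Rejecting a mismatched `hash_id` on insert and reporting it as "not found" on lookup are assumptions of this example, not requirements stated here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


Extent = Tuple[str, int, int]        # (block_id, offset, length)
Location = Tuple[Extent, ...]


class DigestKeyedIndex:
    """Stores only digests physically; callers always pass a full Reference."""

    def __init__(self, hash_id: int) -> None:
        self._hash_id = hash_id                  # fixed by the StoreConfig
        self._by_digest: Dict[bytes, Location] = {}

    def put(self, ref: Reference, location: Location) -> None:
        if ref.hash_id != self._hash_id:
            raise ValueError("Reference hash_id does not match StoreConfig")
        self._by_digest[ref.digest] = location

    def get(self, ref: Reference) -> Optional[Location]:
        # A Reference under a different hash_id is a different semantic key,
        # so it must not match an entry that merely shares the digest bytes.
        if ref.hash_id != self._hash_id:
            return None
        return self._by_digest.get(ref.digest)
```

The `hash_id` check in `get` is what keeps the semantic key a full `Reference` even though only the digest is stored.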

---

## 4. ArtifactLocation Semantics

* An ArtifactLocation is an **ordered list** of ArtifactExtents.
* Each extent references immutable bytes within a block.
* The artifact bytes are defined by **concatenating extents in order**.
* A visible ArtifactLocation MUST be **non-empty** and MUST fully cover the artifact byte sequence with no gaps or extra bytes.
* Extents MUST have `length > 0` and MUST reference valid byte ranges within their blocks.
* Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact.
* An ArtifactLocation is valid only while all referenced blocks are retained.
* ASL/1-CORE-INDEX does not define how blocks are allocated or sealed; it only requires that referenced bytes are immutable for the lifetime of the mapping.
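
The rules above can be sketched as a single resolution function. This is an informative example under stated assumptions: blocks are modelled as an in-memory `dict` from a string BlockID to `bytes`, and the caller supplies `expected_length`, a hypothetical parameter standing in for however an implementation knows the artifact's size.

```python
from typing import Dict, List, Tuple

Extent = Tuple[str, int, int]  # (block_id, offset, length)


def read_location(
    blocks: Dict[str, bytes],
    location: List[Extent],
    expected_length: int,
) -> bytes:
    """Concatenate extents in order, enforcing the ArtifactLocation rules."""
    if not location:
        raise ValueError("ArtifactLocation must be non-empty")

    parts = []
    for block_id, offset, length in location:
        if length <= 0:
            raise ValueError("extent length must be > 0")
        block = blocks.get(block_id)
        if block is None:
            # The location is only valid while every referenced block is retained.
            raise KeyError(f"block {block_id!r} is not retained")
        if offset < 0 or offset + length > len(block):
            raise ValueError("extent is outside the block's byte range")
        parts.append(block[offset:offset + length])

    data = b"".join(parts)
    if len(data) != expected_length:
        raise ValueError("location does not cover the artifact exactly")
    return data
```

For example, with `blocks = {"a": b"hello world", "b": b"XYZ"}`, the location `[("a", 0, 5), ("b", 1, 2), ("a", 5, 6)]` refers to block `"a"` twice and resolves to `b"helloYZ world"`.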

---

## 5. Visibility Model

An index entry is **visible** at CURRENT if and only if:

1. The entry is admitted in the ordered log prefix for CURRENT.
2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules).

Visibility is binary; entries are either visible or not visible.
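
The two conditions can be expressed as a boolean predicate. The sketch below is informative and covers log entries only. It assumes zero-based integer log positions, a log prefix identified by its length, and a set of sealed block IDs as the stand-in for "immutable by store rules"; the names `Entry` and `is_visible` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Entry:
    log_position: int            # assumed zero-based position in the log
    block_ids: Tuple[str, ...]   # blocks referenced by the entry's location


def is_visible(entry: Entry, prefix_length: int, sealed_blocks: FrozenSet[str]) -> bool:
    """True only if the entry is admitted by the prefix and all its bytes are immutable."""
    admitted = entry.log_position < prefix_length
    immutable = all(block_id in sealed_blocks for block_id in entry.block_ids)
    return admitted and immutable
```

The function returns a plain `bool` with no intermediate state, matching the rule that visibility is binary.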

---

## 6. Snapshot and Log Semantics

Snapshots provide a base mapping; the append-only log defines subsequent changes.

The index state for a given CURRENT is defined as:

```
Index(CURRENT) = Index(snapshot) + replay(log_prefix)
```

Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entries are semantically equivalent once replayed.
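
A minimal informative sketch of this definition follows, assuming the index is modelled as a `dict` from a reference key to a location and each log entry is a `(reference, location)` pair. These shapes and the function name `replay` are assumptions of the example.

```python
from typing import Dict, Iterable, Tuple

Location = Tuple[Tuple[str, int, int], ...]   # ((block_id, offset, length), ...)
LogEntry = Tuple[bytes, Location]             # (reference key, location)


def replay(snapshot: Dict[bytes, Location], log_prefix: Iterable[LogEntry]) -> Dict[bytes, Location]:
    """Return Index(CURRENT) for the given snapshot and ordered log prefix."""
    current = dict(snapshot)          # copy, so the snapshot itself is untouched
    for reference, location in log_prefix:
        current[reference] = location  # entries are applied strictly in log order
    return current
```

With this model, replaying the same prefix onto its own result yields the same mapping (idempotence), and replaying `log[:k]` first and then `log[k:]` onto that result gives the same state as replaying `log` in one pass, which is the sense in which snapshot and log entries are equivalent once replayed.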

---

## 7. Immutability and Shadowing

### 7.1 Immutability

* Index entries are never mutated.
* Once visible, an entry’s meaning does not change.
* Referenced bytes are immutable for the lifetime of the entry.

### 7.2 Shadowing

* Later entries MAY shadow earlier entries with the same Reference.
* Precedence is determined solely by log order.
* Snapshot boundaries do not alter shadowing semantics.
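
Immutability and shadowing can be seen together when the index is viewed as an append-only sequence that is never edited, with lookup returning the last matching entry. The sketch below is informative; `IndexEntry` and `lookup` are hypothetical names, and the linear scan is chosen for clarity rather than speed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)  # frozen: an entry cannot be modified once created
class IndexEntry:
    reference: bytes
    location: Tuple[Tuple[str, int, int], ...]  # ((block_id, offset, length), ...)


def lookup(entries: Sequence[IndexEntry], reference: bytes) -> Optional[IndexEntry]:
    """Return the latest entry for the reference; earlier ones are shadowed."""
    for entry in reversed(entries):   # the last entry in log order wins
        if entry.reference == reference:
            return entry
    return None
```

Because precedence depends only on position in the sequence, it makes no difference whether the earlier entry came from a snapshot or from the log: the shadowed entry is still present and unchanged, it is simply no longer the answer.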

---

## 8. Tombstones (Optional)

Tombstone entries MAY be used to invalidate prior mappings.

* A tombstone shadows earlier entries for the same Reference.
* Visibility rules are identical to regular entries.
* Encoding is optional and defined by ENC-ASL-CORE-INDEX if used.
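
As an informative sketch, a tombstone can be modelled during replay as a log entry whose location is `None`. That representation and the function name `replay_with_tombstones` are assumptions of this example, not the encoding defined by ENC-ASL-CORE-INDEX. The example also assumes that a regular entry appearing after a tombstone shadows it in turn, following the log-order rule for shadowing.

```python
from typing import Dict, Iterable, Optional, Tuple

Location = Tuple[Tuple[str, int, int], ...]    # ((block_id, offset, length), ...)
LogEntry = Tuple[bytes, Optional[Location]]    # location None stands for a tombstone


def replay_with_tombstones(
    snapshot: Dict[bytes, Location],
    log_prefix: Iterable[LogEntry],
) -> Dict[bytes, Location]:
    """Replay a log prefix in which tombstones invalidate earlier mappings."""
    current = dict(snapshot)
    for reference, location in log_prefix:
        if location is None:
            current.pop(reference, None)   # tombstone: the Reference stops resolving
        else:
            current[reference] = location  # regular entry: shadows anything earlier
    return current
```

A tombstone for a Reference that has no prior mapping leaves the state unchanged in this model.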

---

## 9. Determinism Guarantees

For fixed:

* StoreConfig
* Snapshot
* Log prefix

ASL/1-CORE-INDEX guarantees:

* Deterministic lookup results
* Deterministic shadowing resolution
* Deterministic visibility

---

## 10. Normative Invariants

Conforming implementations MUST enforce:

1. No visibility without a log-admitted entry.
2. No mutation of visible index entries.
3. Referenced bytes remain immutable for the entry’s lifetime.
4. Shadowing follows strict log order.
5. Snapshot + log replay uniquely defines CURRENT.
6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun).

Violation of any invariant constitutes index corruption.

---

## 11. Relationship to Other Specifications

| Layer              | Responsibility                                    |
| ------------------ | ------------------------------------------------- |
| ASL/1-CORE         | Artifact semantics and identity                   |
| ASL/1-STORE        | StoreSnapshot and put/get logical model           |
| ASL/1-CORE-INDEX   | Semantic mapping of Reference → ArtifactLocation  |
| ASL-STORE-INDEX    | Lifecycle, replay, and visibility contracts       |
| ENC-ASL-CORE-INDEX | On-disk encoding for index segments and records   |

---

## 12. Summary

ASL/1-CORE-INDEX specifies the semantic meaning of the index:

* It maps artifact References to byte locations deterministically.
* It defines visibility and shadowing rules across snapshot + log replay.
* It guarantees immutability and deterministic lookup.

It answers one question:

> *Given a Reference and a CURRENT state, where are the bytes?*