Cleaning up index documents.

Carl Niklas Rydberg 2026-01-17 06:29:58 +01:00
parent 5a887da909
commit f2225f7a73
14 changed files with 1060 additions and 931 deletions


@@ -37,7 +37,7 @@ It does **not** define:
* Artifact or edge identity
* Snapshot semantics
* Storage lifecycle
-* Encoding details (see ENC-ASL-CORE-INDEX)
+* Encoding details (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`)
---
@@ -284,4 +284,3 @@ If you want, the **next natural step** would be to:
* Or revise **ASL-CORE-INDEX** to reference ASL-INDEX-ACCEL normatively
Both would lock the model completely.


@@ -180,7 +180,7 @@ typedef struct {
| ------------------ | ----------------------------------------------- |
| ASL-STORE | Lifecycle, visibility, snapshot enforcement |
| ASL-CORE-INDEX | Index semantics (ArtifactKey → BlockID mapping) |
-| ENC-ASL-CORE-INDEX | Exact byte layout for index segments |
+| ENC-ASL-CORE-INDEX | Exact byte layout for index segments (`tier1/enc-asl-core-index.md`) |
| ENC-ASL-LOG | Exact byte layout for append-only log |
---
@@ -199,4 +199,3 @@ You could now use it to:
If you want, I can **draft a minimal C writer/reader API** next that directly reads/writes this log, including helpers for `add_index_segment()`, `seal_segment()`, and `tombstone()` with payload serialization.
Do you want me to do that next?


@@ -1183,223 +1183,7 @@ Design GC and snapshot pruning safely
You're asking the right questions — this is already “systems-level” design, not tinkering.
Maybe this clears something up:
# ASL-STORE-INDEX
### Store Semantics and Contracts for ASL Index
---
## 1. Purpose
This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.
It bridges the gap between **index meaning** and **physical storage**, ensuring:
* Deterministic replay
* Snapshot-aware visibility
* Immutable block guarantees
* Idempotent recovery
* Correctness of CURRENT state
It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).
---
## 2. Scope
This specification covers:
* Index segment lifecycle
* Interaction between index and ASL blocks
* Append-only log semantics
* Snapshot integration
* Visibility and lookup rules
* Crash safety and recovery
* Garbage collection constraints
It does **not** cover:
* Disk format details
* Bloom filter algorithms
* File system specifics
* Placement heuristics beyond semantic guarantees
---
## 3. Core Concepts
### 3.1 Index Segment
A **segment** is a contiguous set of index entries written by the store.
* Open while accepting new entries
* Sealed when closed for append
* Sealed segments are immutable
* Sealed segments are **snapshot-visible only after log record**
Segments are the **unit of persistence, replay, and GC**.
---
### 3.2 ASL Block Relationship
Each index entry references a **sealed block** via:
`ArtifactKey → (BlockID, offset, length)`
* The store must ensure the block is sealed before the entry becomes log-visible
* Blocks are immutable after seal
* Open blocks may be abandoned without violating invariants
---
### 3.3 Append-Only Log
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
* Entries include index additions, tombstones, and segment seals
* Log is durable and replayable
* Log defines visibility above checkpoint snapshots
**CURRENT state** is derived as:
`CURRENT = checkpoint_state + replay(log)`
---
## 4. Segment Lifecycle
### 4.1 Creation
* Open segment is allocated
* Index entries appended in log order
* Entries are invisible until segment seal and log append
### 4.2 Seal
* Segment is closed to append
* Seal record is written to append-only log
* Segment becomes visible for lookup
* Sealed segment may be snapshot-pinned
### 4.3 Snapshot Interaction
* Snapshots capture sealed segments
* Open segments need not survive snapshot
* Segments below snapshot are replay anchors
### 4.4 Garbage Collection
* Only **sealed and unreachable segments** can be deleted
* GC operates at segment granularity
* GC must not break CURRENT or violate invariants
---
## 5. Lookup Semantics
To resolve an ArtifactKey:
1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return the first matching entry
4. Respect tombstone entries (if present)
Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.
---
## 6. Visibility Guarantees
* Entry visible **iff**:
  * The block is sealed
  * Log record exists ≤ CURRENT
  * Segment seal recorded in log
* Entries above CURRENT or referencing unsealed blocks are invisible
---
## 7. Crash and Recovery Semantics
### 7.1 Crash During Open Segment
* Open segments may be lost
* Index entries may be leaked
* No sealed segment may be corrupted
### 7.2 Recovery Procedure
1. Mount latest checkpoint snapshot
2. Replay append-only log from checkpoint
3. Rebuild CURRENT
4. Resume normal operation
Recovery must be **deterministic and idempotent**.
---
## 8. Tombstone Semantics
* Optional: tombstones may exist to invalidate prior mappings
* Tombstones shadow prior entries with the same ArtifactKey
* Tombstone visibility follows same rules as regular entries
---
## 9. Invariants (Normative)
The store **must enforce**:
1. No segment visible without seal log record
2. No mutation of sealed segment or block
3. Shadowing follows log order strictly
4. Replay uniquely reconstructs CURRENT
5. GC does not remove segments referenced by snapshot or log
6. ArtifactLocation always points to immutable bytes
---
## 10. Non-Goals
ASL-STORE-INDEX does **not** define:
* Disk layout or encoding (ENC-ASL-CORE-INDEX)
* Placement heuristics (small vs. large block packing)
* Performance targets
* Memory caching strategies
* Federation or provenance mechanics
---
## 11. Relationship to Other Documents
| Layer | Responsibility |
| ------------------ | -------------------------------------------------------------------- |
| ASL-CORE-INDEX | Defines semantic meaning of mapping ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |
---
## 12. Summary
The store-index layer guarantees:
* Immutable, snapshot-safe segments
* Deterministic and idempotent replay
* Correct visibility semantics
* Safe crash recovery
* Garbage collection constraints
This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.
Here's a **fully refined version of ASL-STORE-INDEX**, incorporating **block lifecycle, sealing, snapshot safety, retention, and GC rules**, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.
Canonical spec (refined, replaces earlier draft):
---


@@ -1,245 +0,0 @@
# ASL-CORE-INDEX
### Semantic Addendum to ASL-CORE
---
## 1. Purpose
This document defines the **semantic model of the ASL index**, extending ASL-CORE artifact semantics to include **mapping artifacts to storage locations**.
The ASL index provides a **deterministic, snapshot-relative mapping** from artifact identities to byte locations within **immutable storage blocks**.
It specifies **what the index means**, not:
* How the index is stored or encoded
* How blocks are allocated or packed
* Performance optimizations
* Garbage collection or memory strategies
Those are handled by:
* **ASL-STORE-INDEX** (store semantics and contracts)
* **ENC-ASL-CORE-INDEX** (bytes-on-disk encoding)
---
## 2. Scope
This document defines:
* Logical structure of index entries
* Visibility rules
* Snapshot and log interaction
* Immutability and shadowing semantics
* Determinism guarantees
* Required invariants
It does **not** define:
* On-disk formats
* Index segmentation or sharding
* Bloom filters or probabilistic structures
* Memory residency
* Performance targets
---
## 3. Terminology
* **Artifact**: An immutable sequence of bytes managed by ASL.
* **ArtifactKey**: Opaque identifier for an artifact (typically a hash).
* **Block**: Immutable storage unit containing artifact bytes.
* **BlockID**: Opaque, unique identifier for a block.
* **ArtifactLocation**: Tuple `(BlockID, offset, length)` identifying bytes within a block.
* **Snapshot**: Checkpoint capturing a consistent base state of ASL-managed storage and metadata.
* **Append-Only Log**: Strictly ordered log of index-visible mutations occurring after a snapshot.
* **CURRENT**: The effective system state obtained by replaying the append-only log on top of a checkpoint snapshot.
---
## 4. Block Semantics
ASL-CORE introduces **blocks** minimally:
1. Blocks are **existential storage atoms** for artifact bytes.
2. Each block is uniquely identified by a **BlockID**.
3. Blocks are **immutable once sealed**.
4. Addressing: `(BlockID, offset, length) → bytes`.
5. No block layout, allocation, packing, or size semantics are defined at the core level.
---
## 5. Core Semantic Mapping
The ASL index defines a mapping that is **total over visible artifact keys**:
```
ArtifactKey → ArtifactLocation
```
Semantic guarantees:
* Each visible `ArtifactKey` maps to exactly one `ArtifactLocation`.
* Mapping is **immutable once visible**.
* Mapping is **snapshot-relative**.
* Mapping is **deterministic** given `(snapshot, log prefix)`.
---
## 6. ArtifactLocation Semantics
* `block_id` references an ASL block.
* `offset` and `length` define bytes within the block.
* Only valid for the lifetime of the referenced block.
* No interpretation of bytes is implied.
---
## 7. Visibility Model
An index entry is **visible** if and only if:
1. The referenced block is sealed.
2. A corresponding log record exists.
3. The log record is ≤ CURRENT replay position.
**Consequences**:
* Entries referencing unsealed blocks are invisible.
* Entries above CURRENT are invisible.
* Visibility is binary (no gradual exposure).
---
## 8. Snapshot and Log Semantics
* Snapshots act as **checkpoints**, not full state representations.
* Index state at any time:
```
Index(CURRENT) = Index(snapshot) + replay(log)
```
* Replay is strictly ordered, deterministic, and idempotent.
* Snapshot and log entries are semantically equivalent once replayed.
---
## 9. Immutability and Shadowing
### 9.1 Immutability
* Index entries are never mutated.
* Once visible, an entry's meaning never changes.
* Blocks referenced by entries are immutable.
### 9.2 Shadowing
* Later entries may shadow earlier entries with the same `ArtifactKey`.
* Precedence is determined by log order.
* Snapshot boundaries do not alter shadowing semantics.
---
## 10. Tombstones (Optional)
* Tombstone entries are allowed to invalidate prior mappings.
* Semantics:
  * Shadows previous entries for the same `ArtifactKey`.
  * Visibility follows the same rules as regular entries.
* Existence and encoding of tombstones are optional.
---
## 11. Determinism Guarantees
For fixed:
* Snapshot
* Log prefix
* ASL configuration
* Hash algorithm
The index guarantees:
* Deterministic lookup results
* Deterministic shadowing resolution
* Deterministic visibility
No nondeterministic input may influence index semantics.
---
## 12. Separation of Concerns
* **ASL-CORE**: Defines artifact semantics and the existence of blocks as storage atoms.
* **ASL-CORE-INDEX**: Defines how artifact keys map to blocks, offsets, and lengths.
* **ASL-STORE-INDEX**: Defines lifecycle, replay, and visibility guarantees.
* **ENC-ASL-CORE-INDEX**: Defines exact bytes-on-disk representation.
Index semantics **do not** prescribe:
* Block allocation
* Packing strategies
* Performance optimizations
* Memory residency or caching
---
## 13. Normative Invariants
All conforming implementations must enforce:
1. No visibility without a log record.
2. No mutation of visible index entries.
3. No mutation of sealed blocks.
4. Shadowing follows strict log order.
5. Replay of snapshot + log uniquely defines CURRENT.
6. ArtifactLocation always resolves to immutable bytes.
Violation of any invariant constitutes index corruption.
---
## 14. Non-Goals (Explicit)
ASL-CORE-INDEX does **not** define:
* Disk layout or encoding
* Segment structure, sharding, or bloom filters
* GC policies or memory management
* Small vs. large block packing
* Federation or provenance mechanics
---
## 15. Relationship to Other Specifications
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------- |
| ASL-CORE | Defines artifact semantics and existence of blocks |
| ASL-CORE-INDEX | Defines semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Defines store contracts to realize index semantics |
| ENC-ASL-CORE-INDEX | Defines exact encoding on disk |
---
## 16. Summary
The ASL index:
* Maps artifact identities to block locations deterministically
* Is immutable once entries are visible
* Resolves visibility via snapshots + append-only log
* Supports optional tombstones
* Provides a stable substrate for store, encoding, and higher layers like PEL
It answers **exactly one question**:
> *Given an artifact identity and a point in time, where are the bytes?*
Nothing more, nothing less.


@@ -138,7 +138,7 @@ It ensures **determinism, traceability, and reproducibility** across federated d
| ASL-CORE | Blocks and artifacts remain immutable; no change |
| ASL-CORE-INDEX | Artifact → Block mapping is domain-local; published artifacts are indexed across domains |
| ASL-STORE-INDEX | Sealing, retention, and snapshot pinning apply per domain; GC respects cross-domain references |
-| ENC-ASL-CORE-INDEX | Encoding of index entries may include domain and visibility flags for federation |
+| ENC-ASL-CORE-INDEX | Encoding of index entries may include domain and visibility flags for federation (`tier1/enc-asl-core-index.md`) |
| PEL | DAG execution may include imported artifacts; determinism guaranteed per domain snapshot |
| PEL-PROV / PEL-TRACE | Maintains provenance including cross-domain artifact lineage |
@@ -155,5 +155,3 @@ The Federation Specification formalizes:
* Integration with index, store, PEL, and provenance layers
It ensures **multi-domain determinism, traceability, and reproducibility** while leaving semantics and storage-layer policies unchanged.


@@ -1,439 +0,0 @@
# ASL-STORE-INDEX
### Store Semantics and Contracts for ASL Index
---
## 1. Purpose
This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.
It bridges the gap between **index meaning** and **physical storage**, ensuring:
* Deterministic replay
* Snapshot-aware visibility
* Immutable block guarantees
* Idempotent recovery
* Correctness of CURRENT state
It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).
---
## 2. Scope
This specification covers:
* Index segment lifecycle
* Interaction between index and ASL blocks
* Append-only log semantics
* Snapshot integration
* Visibility and lookup rules
* Crash safety and recovery
* Garbage collection constraints
It does **not** cover:
* Disk format details
* Bloom filter algorithms
* File system specifics
* Placement heuristics beyond semantic guarantees
---
## 3. Core Concepts
### 3.1 Index Segment
A **segment** is a contiguous set of index entries written by the store.
* Open while accepting new entries
* Sealed when closed for append
* Sealed segments are immutable
* Sealed segments are **snapshot-visible only after log record**
Segments are the **unit of persistence, replay, and GC**.
---
### 3.2 ASL Block Relationship
Each index entry references a **sealed block** via:
```
ArtifactKey → (BlockID, offset, length)
```
* The store must ensure the block is sealed before the entry becomes log-visible
* Blocks are immutable after seal
* Open blocks may be abandoned without violating invariants
---
### 3.3 Append-Only Log
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
* Entries include index additions, tombstones, and segment seals
* Log is durable and replayable
* Log defines visibility above checkpoint snapshots
**CURRENT state** is derived as:
```
CURRENT = checkpoint_state + replay(log)
```
---
## 4. Segment Lifecycle
### 4.1 Creation
* Open segment is allocated
* Index entries appended in log order
* Entries are invisible until segment seal and log append
### 4.2 Seal
* Segment is closed to append
* Seal record is written to append-only log
* Segment becomes visible for lookup
* Sealed segment may be snapshot-pinned
### 4.3 Snapshot Interaction
* Snapshots capture sealed segments
* Open segments need not survive snapshot
* Segments below snapshot are replay anchors
### 4.4 Garbage Collection
* Only **sealed and unreachable segments** can be deleted
* GC operates at segment granularity
* GC must not break CURRENT or violate invariants
---
## 5. Lookup Semantics
To resolve an `ArtifactKey`:
1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return the first matching entry
4. Respect tombstone entries (if present)
Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.
---
## 6. Visibility Guarantees
* Entry visible **iff**:
  * The block is sealed
  * Log record exists ≤ CURRENT
  * Segment seal recorded in log
* Entries above CURRENT or referencing unsealed blocks are invisible
---
## 7. Crash and Recovery Semantics
### 7.1 Crash During Open Segment
* Open segments may be lost
* Index entries may be leaked
* No sealed segment may be corrupted
### 7.2 Recovery Procedure
1. Mount latest checkpoint snapshot
2. Replay append-only log from checkpoint
3. Rebuild CURRENT
4. Resume normal operation
Recovery must be **deterministic and idempotent**.
---
## 8. Tombstone Semantics
* Optional: tombstones may exist to invalidate prior mappings
* Tombstones shadow prior entries with the same `ArtifactKey`
* Tombstone visibility follows same rules as regular entries
---
## 9. Invariants (Normative)
The store **must enforce**:
1. No segment visible without seal log record
2. No mutation of sealed segment or block
3. Shadowing follows log order strictly
4. Replay uniquely reconstructs CURRENT
5. GC does not remove segments referenced by snapshot or log
6. ArtifactLocation always points to immutable bytes
---
## 10. Non-Goals
ASL-STORE-INDEX does **not** define:
* Disk layout or encoding (ENC-ASL-CORE-INDEX)
* Placement heuristics (small vs. large block packing)
* Performance targets
* Memory caching strategies
* Federation or provenance mechanics
---
## 11. Relationship to Other Documents
| Layer | Responsibility |
| ------------------ | -------------------------------------------------------------------- |
| ASL-CORE-INDEX | Defines semantic meaning of mapping `ArtifactKey → ArtifactLocation` |
| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |
---
## 12. Summary
The store-index layer guarantees:
* Immutable, snapshot-safe segments
* Deterministic and idempotent replay
* Correct visibility semantics
* Safe crash recovery
* Garbage collection constraints
This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.
Here's a **fully refined version of ASL-STORE-INDEX**, incorporating **block lifecycle, sealing, snapshot safety, retention, and GC rules**, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.
---
# ASL-STORE-INDEX
### Store Semantics and Contracts for ASL Core Index (Refined)
---
## 1. Purpose
This document defines the **operational and store-level semantics** necessary to implement ASL-CORE-INDEX.
It specifies:
* **Block lifecycle**: creation, sealing, retention
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot interaction**: pinning, deterministic visibility
* **Append-only log semantics**
* **Garbage collection rules**
It **does not define encoding** (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).
---
## 2. Scope
Covers:
* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
Excludes:
* Disk-level encoding
* Sharding strategies
* Bloom filters or acceleration structures
* Memory residency or caching
* Federation or PEL semantics
---
## 3. Core Concepts
### 3.1 Block
* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique)
* **Properties:**
  * Once sealed, contents never change
  * Can be referenced by multiple artifacts
  * May be pinned by snapshots for retention
* **Lifecycle Events:**
  1. Creation: block allocated but contents may still be written
  2. Sealing: block is finalized, immutable, and log-visible
  3. Retention: block remains accessible while pinned by snapshots or needed by CURRENT
  4. Garbage collection: block may be deleted if no longer referenced and unpinned
---
### 3.2 Index Segment
Segments group index entries and provide **persistence and recovery units**.
* **Open segment:** accepting new index entries, not visible for lookup
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable
* **Segment components:** header, optional bloom filter, index records, footer
* **Segment visibility:** only after seal and log append
---
### 3.3 Append-Only Log
All store operations affecting index visibility are recorded in a **strictly ordered, append-only log**:
* Entries include:
  * Index additions
  * Tombstones
  * Segment seals
* Log is replayable to reconstruct CURRENT
* Determinism: replay produces identical CURRENT from same snapshot and log prefix
---
## 4. Block Lifecycle Semantics
| Event | Description | Semantic Guarantees |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
**Notes:**
* Sealing ensures that any index entry referencing the block is deterministic and immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.
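The lifecycle table can be read as a small one-way state machine. The C sketch below is illustrative only: it assumes a hypothetical `Block` struct holding an in-memory byte buffer, a snapshot pin count and a count of visible index references. None of these names come from ASL-STORE-INDEX, and a real store would record the transitions through the append-only log rather than only in memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lifecycle states; transitions only move forward. */
typedef enum { BLOCK_OPEN, BLOCK_SEALED, BLOCK_COLLECTED } BlockState;

#define BLOCK_CAP 64 /* toy capacity for the sketch */

typedef struct {
    BlockState state;
    uint8_t    bytes[BLOCK_CAP];
    size_t     used;
    unsigned   snapshot_pins; /* snapshots that captured this block */
    unsigned   index_refs;    /* visible index entries referencing it */
} Block;

/* Creation: bytes may be appended only while the block is open. */
static bool block_write(Block *b, const void *data, size_t len) {
    if (b->state != BLOCK_OPEN || len > BLOCK_CAP - b->used)
        return false;
    memcpy(b->bytes + b->used, data, len);
    b->used += len;
    return true;
}

/* Sealing: one-way; afterwards the bytes never change. */
static bool block_seal(Block *b) {
    if (b->state != BLOCK_OPEN)
        return false;
    b->state = BLOCK_SEALED;
    return true;
}

/* An index entry may reference the block only after it is sealed. */
static bool block_add_index_ref(Block *b) {
    if (b->state != BLOCK_SEALED)
        return false;
    b->index_refs++;
    return true;
}

/* Retention vs. GC: a sealed block is kept while pinned or referenced.
 * Open blocks are never collected here; they may simply be abandoned. */
static bool block_try_collect(Block *b) {
    if (b->state != BLOCK_SEALED || b->snapshot_pins > 0 || b->index_refs > 0)
        return false;
    b->state = BLOCK_COLLECTED;
    return true;
}
```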
---
## 5. Snapshot Interaction
* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:
```
CURRENT = snapshot_state + replay(log)
```
* Segment and block visibility rules:
| Entity | Visible in snapshot | Visible in CURRENT |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
---
## 6. Index Lookup Semantics
To resolve an `ArtifactKey`:
1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order** (newest first)
3. Return first matching entry
4. Respect tombstones to shadow prior entries
Determinism:
* Lookup results are identical across platforms given the same snapshot and log prefix
* Accelerations (bloom filters, sharding, SIMD) do **not alter correctness**
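To illustrate why acceleration cannot change lookup results, here is a C sketch of a newest-first segment search with an optional per-segment Bloom filter. The 64-bit filter, the two multiplicative hash functions and the `AccelSegment` layout are hypothetical, and tombstones are omitted for brevity. The property that matters is that a Bloom filter has no false negatives, so it can only skip segments that definitely lack the key. False positives fall through to the exact scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment view: a toy 64-bit Bloom filter plus exact entries. */
typedef struct {
    uint64_t        bits;
    const uint64_t *keys;
    const uint64_t *locs;
    size_t          n;
} AccelSegment;

/* Two illustrative hashes, each yielding a bit index in 0..63. */
static unsigned h1(uint64_t k) { return (unsigned)((k * 0x9E3779B97F4A7C15ULL) >> 58); }
static unsigned h2(uint64_t k) { return (unsigned)((k * 0xC2B2AE3D27D4EB4FULL) >> 58); }

static void bloom_add(AccelSegment *s, uint64_t key) {
    s->bits |= (1ULL << h1(key)) | (1ULL << h2(key));
}

/* false means "definitely absent"; true means "maybe present". */
static bool bloom_maybe_contains(const AccelSegment *s, uint64_t key) {
    uint64_t mask = (1ULL << h1(key)) | (1ULL << h2(key));
    return (s->bits & mask) == mask;
}

/* segs[nsegs - 1] is the newest segment. With use_bloom, a segment is skipped
 * only when its filter rules the key out, so the result is the same either way. */
static bool lookup_segments(const AccelSegment *segs, size_t nsegs, uint64_t key,
                            bool use_bloom, uint64_t *loc) {
    for (size_t i = nsegs; i-- > 0;) {
        if (use_bloom && !bloom_maybe_contains(&segs[i], key))
            continue;
        for (size_t j = 0; j < segs[i].n; j++) {
            if (segs[i].keys[j] == key) {
                *loc = segs[i].locs[j];
                return true;
            }
        }
    }
    return false;
}
```

Running the same lookup with `use_bloom` set to `true` and to `false` should always give the same answer. Only the number of segments scanned differs.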
---
## 7. Garbage Collection
* **Eligibility for GC:**
  * Segments: sealed, no references from CURRENT or snapshots
  * Blocks: unpinned, unreferenced by any segment or artifact
* **Rules:**
  * GC is safe **only on sealed segments and blocks**
  * Must respect snapshot pins
  * Tombstones may aid in invalidating unreachable blocks
* **Outcome:**
  * GC never violates CURRENT reconstruction
  * Blocks can be reclaimed without breaking provenance
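One way to satisfy these GC rules is a mark-and-sweep pass at segment granularity. In the C sketch below, all type and field names are hypothetical. A segment is eligible when it is sealed and not needed by CURRENT or any snapshot. Blocks referenced by every other segment, including open ones, are marked, and only sealed, unpinned, unmarked blocks are reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 8 /* toy limits for the sketch */
#define MAX_REFS   4

typedef struct {
    bool   sealed;
    bool   needed; /* referenced by CURRENT or pinned by a snapshot */
    size_t block_ids[MAX_REFS];
    size_t nrefs;
} Segment;

typedef struct {
    bool sealed;
    bool pinned; /* pinned directly by a snapshot */
    bool collected;
} GcBlock;

/* Segments: sealed and no longer needed by CURRENT or any snapshot. */
static bool segment_gc_eligible(const Segment *s) {
    return s->sealed && !s->needed;
}

/* Mark blocks referenced by any segment that must be kept, then sweep
 * sealed, unpinned, unmarked blocks. Returns the number reclaimed. */
static size_t gc_sweep(const Segment *segs, size_t nsegs,
                       GcBlock *blocks, size_t nblocks) {
    assert(nblocks <= MAX_BLOCKS);
    bool marked[MAX_BLOCKS] = {false};
    for (size_t i = 0; i < nsegs; i++) {
        if (segment_gc_eligible(&segs[i]))
            continue; /* segment is going away; its references do not count */
        for (size_t j = 0; j < segs[i].nrefs; j++)
            marked[segs[i].block_ids[j]] = true;
    }
    size_t reclaimed = 0;
    for (size_t b = 0; b < nblocks; b++) {
        if (blocks[b].sealed && !blocks[b].pinned && !marked[b] && !blocks[b].collected) {
            blocks[b].collected = true;
            reclaimed++;
        }
    }
    return reclaimed;
}
```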
---
## 8. Tombstone Semantics
* Optional marker to invalidate prior mappings
* Visibility rules identical to regular index entries
* Used to maintain deterministic CURRENT in face of shadowing or deletions
---
## 9. Crash and Recovery Semantics
* Open segments or unsealed blocks may be lost; no invariant is broken
* Recovery procedure:
  1. Mount last checkpoint snapshot
  2. Replay append-only log
  3. Reconstruct CURRENT
* Recovery is **deterministic and idempotent**
* Segments and blocks are **never partially visible** after a crash
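The recovery steps can be sketched for the simplest case, where the log contains only segment seal records. This C example assumes a hypothetical per-record checksum so that a record torn by the crash is detected and replay stops there. A segment whose seal record never reached the log stays invisible, which is how the "never partially visible" guarantee falls out in this sketch. Because `recover` depends only on the checkpoint and the log contents, running it twice yields the same state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENTS 16 /* toy limit; segment ids index the visibility table */

typedef struct {
    uint64_t segment_id; /* segment sealed by this record */
    uint64_t checksum;   /* hypothetical integrity check for the record */
} SealRecord;

/* Hypothetical checksum; a real log would checksum the encoded record bytes. */
static uint64_t seal_checksum(uint64_t segment_id) {
    return segment_id ^ 0x9E3779B97F4A7C15ULL;
}

typedef struct {
    bool   visible[MAX_SEGMENTS];
    size_t replayed; /* number of valid records consumed */
} Recovered;

/* 1. start from the checkpoint; 2. replay seal records in order;
 * 3. stop at the first invalid record (torn tail). Re-sealing an already
 * visible segment is a no-op, so replay is idempotent. */
static Recovered recover(const bool checkpoint_visible[MAX_SEGMENTS],
                         const SealRecord *records, size_t n) {
    Recovered r;
    for (size_t i = 0; i < MAX_SEGMENTS; i++)
        r.visible[i] = checkpoint_visible[i];
    r.replayed = 0;
    for (size_t i = 0; i < n; i++) {
        const SealRecord *rec = &records[i];
        if (rec->segment_id >= MAX_SEGMENTS ||
            rec->checksum != seal_checksum(rec->segment_id))
            break; /* torn or corrupt tail: ignore everything after it */
        r.visible[rec->segment_id] = true;
        r.replayed++;
    }
    return r;
}
```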
---
## 10. Normative Invariants
1. Sealed blocks are immutable
2. Index entries referencing blocks are immutable once visible
3. Shadowing follows strict log order
4. Replay of snapshot + log uniquely reconstructs CURRENT
5. GC cannot remove blocks or segments needed by snapshot or CURRENT
6. Tombstones shadow prior entries without deleting underlying blocks prematurely
---
## 11. Non-Goals
* Disk-level encoding (ENC-ASL-CORE-INDEX)
* Memory layout or caching
* Sharding or performance heuristics
* Federation / multi-domain semantics (handled elsewhere)
* Block packing strategies (small vs large blocks)
---
## 12. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
---
## 13. Summary
The refined ASL-STORE-INDEX:
* Defines **block lifecycle**: creation, sealing, retention, GC
* Ensures **snapshot safety** and deterministic visibility
* Guarantees **immutable, replayable, and recoverable CURRENT**
* Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics


@@ -1,5 +1,7 @@
# ENC-ASL-CORE-INDEX ADDENDUM: Federation Encoding
Base spec: `tier1/enc-asl-core-index.md`
---
## 1. Purpose
@@ -109,5 +111,3 @@ This addendum updates **ENC-ASL-CORE-INDEX** to support **federation**:
* Maintains backward compatibility with legacy segments
It integrates federation metadata **without altering the underlying block or artifact encoding**, preserving deterministic execution and PEL provenance.


@@ -33,7 +33,7 @@
│ • Block sealing │
│ • Retention / GC │
│ • Small/Large packing │
-│ - ENC-ASL-CORE-INDEX │
+│ - ENC-ASL-CORE-INDEX (tier1/enc-asl-core-index.md)
│ • On-disk record layout│
│ • Domain / visibility │
└─────────────┬──────────────┘
@@ -143,5 +143,3 @@ This diagram and flow description captures:
* Deterministic reconstruction from **checkpoint + append-only log**
* Block semantics, small/large handling, and domain visibility
* Integration of **execution receipts** into artifact flows and traces


@@ -182,7 +182,7 @@ ASL-CORE-INDEX
ASL-STORE-INDEX
└─ Store lifecycle & snapshot safety
-ENC-ASL-CORE-INDEX
+ENC-ASL-CORE-INDEX (tier1/enc-asl-core-index.md)
└─ Bytes-on-disk encoding
ASL-INDEX-ACCEL ← NEW
@@ -214,4 +214,3 @@ If you want next, I can:
* Draft **ASL-INDEX-ACCEL**
* Or rewrite **ASL-CORE-INDEX with Canonical vs Routing fully integrated**

tier1/asl-core-index.md (new file, 211 additions)

@@ -0,0 +1,211 @@
# ASL/1-CORE-INDEX — Semantic Index Model
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Tags: [deterministic, index, semantics]
**Document ID:** `ASL/1-CORE-INDEX`
**Layer:** L0.5 — Semantic mapping over ASL/1-CORE values (no storage / encoding / lifecycle)
**Depends on (normative):**
* `ASL/1-CORE`
* `ASL/1-STORE`
**Informative references:**
* `ASL-STORE-INDEX` — store lifecycle and replay contracts
* `ENC-ASL-CORE-INDEX` — bytes-on-disk encoding profile (`tier1/enc-asl-core-index.md`)
* `ASL/INDEX-ACCEL/1` — acceleration semantics (routing, filters, sharding)
* `ASL/LOG/1` — append-only semantic log (segment visibility)
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/1-CORE-INDEX defines **semantic meaning only**. It does not define storage formats, on-disk encoding, or operational lifecycle. Those belong to ASL-STORE-INDEX, ASL/LOG/1, and ENC-ASL-CORE-INDEX.
---
## 1. Purpose & Non-Goals
### 1.1 Purpose
ASL/1-CORE-INDEX defines the **semantic model** for indexing artifacts:
* It specifies what it means to map an artifact identity to a byte location.
* It defines visibility, immutability, and shadowing semantics.
* It ensures deterministic lookup for a fixed snapshot and log prefix.
### 1.2 Non-goals
ASL/1-CORE-INDEX explicitly does **not** define:
* On-disk layouts, segment files, or memory representations.
* Block allocation, packing, GC, or lifecycle rules.
* Snapshot implementation details, checkpoints, or log storage.
* Performance optimizations (bloom filters, sharding, SIMD).
* Federation, provenance, or execution semantics.
---
## 2. Terminology
* **Artifact** — ASL/1 immutable value defined in ASL/1-CORE.
* **Reference** — ASL/1 content address of an Artifact (hash_id + digest).
* **StoreConfig** — `{ encoding_profile, hash_id }` fixed per StoreSnapshot (ASL/1-STORE).
* **Block** — immutable storage unit containing artifact bytes.
* **BlockID** — opaque identifier for a block.
* **ArtifactExtent** — `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation** — ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* **Snapshot** — a checkpointed StoreSnapshot (ASL/1-STORE) used as a base state.
* **Append-Only Log** — ordered sequence of index-visible mutations after a snapshot.
* **CURRENT** — effective state after replaying a log prefix on a snapshot.
---
## 3. Core Mapping Semantics
### 3.1 Index Mapping
The index defines a semantic mapping:
```
Reference -> ArtifactLocation
```
For any visible `Reference`, there is exactly one `ArtifactLocation` at a given CURRENT state.
### 3.2 Determinism
For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be deterministic. No nondeterministic input may affect index semantics.
### 3.3 StoreConfig Consistency
All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`.
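As an illustration of the digest-only optimization, the C sketch below narrows a full `Reference` to a compact index key and widens it back under a fixed StoreConfig. The 32-byte digest, the `uint32_t` field widths and the `IndexKey` type are assumptions made for the example, not definitions from ASL/1-STORE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 32 /* hypothetical digest size */

typedef struct {
    uint32_t hash_id;            /* hash algorithm identifier */
    uint8_t  digest[DIGEST_LEN]; /* content digest */
} Reference;

typedef struct {
    uint32_t encoding_profile;
    uint32_t hash_id; /* fixed for the whole StoreSnapshot */
} StoreConfig;

/* Compact stored key: digest only, since hash_id is implied by StoreConfig. */
typedef struct {
    uint8_t digest[DIGEST_LEN];
} IndexKey;

/* Narrowing is only allowed for references made under this view's hash_id. */
static bool index_key_from_reference(const StoreConfig *cfg,
                                     const Reference *ref, IndexKey *out) {
    if (ref->hash_id != cfg->hash_id)
        return false; /* not interpretable in this index view */
    memcpy(out->digest, ref->digest, DIGEST_LEN);
    return true;
}

/* Widening restores the full semantic key from the stored digest. */
static Reference reference_from_index_key(const StoreConfig *cfg,
                                          const IndexKey *key) {
    Reference ref;
    ref.hash_id = cfg->hash_id;
    memcpy(ref.digest, key->digest, DIGEST_LEN);
    return ref;
}
```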
---
## 4. ArtifactLocation Semantics
* An ArtifactLocation is an **ordered list** of ArtifactExtents.
* Each extent references immutable bytes within a block.
* The artifact bytes are defined by **concatenating extents in order**.
* A visible ArtifactLocation MUST be **non-empty** and MUST fully cover the artifact byte sequence with no gaps or extra bytes.
* Extents MUST have `length > 0` and MUST reference valid byte ranges within their blocks.
* Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact.
* An ArtifactLocation is valid only while all referenced blocks are retained.
* ASL/1-CORE-INDEX does not define how blocks are allocated or sealed; it only requires that referenced bytes are immutable for the lifetime of the mapping.
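The ArtifactLocation rules above translate into a validation pass plus an ordered concatenation. This C sketch assumes, hypothetically, that BlockIDs are dense indices into a table of sealed block sizes (for validation) and into an array of block byte pointers (for reading). A real store would resolve BlockIDs through its own lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t block_id; /* hypothetical: dense index into per-block tables */
    uint64_t offset;
    uint64_t length;
} ArtifactExtent;

/* Non-empty list, every extent non-empty and in bounds, and the summed
 * lengths equal the artifact length exactly (no gaps, no extra bytes). */
static bool artifact_location_valid(const ArtifactExtent *ext, size_t n,
                                    uint64_t artifact_len,
                                    const uint64_t *block_sizes, size_t nblocks) {
    if (n == 0)
        return false;
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        const ArtifactExtent *e = &ext[i];
        if (e->length == 0 || e->block_id >= nblocks)
            return false;
        uint64_t size = block_sizes[e->block_id];
        /* offset + length <= size, written to avoid overflow */
        if (e->offset > size || e->length > size - e->offset)
            return false;
        if (e->length > UINT64_MAX - total)
            return false; /* running total would overflow */
        total += e->length;
    }
    return total == artifact_len;
}

/* Concatenate extents in order into out (caller ensures the location is
 * valid and out is large enough). Returns the number of bytes written. */
static size_t artifact_concat(const ArtifactExtent *ext, size_t n,
                              const uint8_t *const *blocks, uint8_t *out) {
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(out + pos, blocks[ext[i].block_id] + ext[i].offset,
               (size_t)ext[i].length);
        pos += (size_t)ext[i].length;
    }
    return pos;
}
```

Repeating a BlockID, as the spec permits, needs no special handling here: each extent is resolved independently and order alone determines the result.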
---
## 5. Visibility Model
An index entry is **visible** at CURRENT if and only if:
1. The entry is admitted in the ordered log prefix for CURRENT.
2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules).
Visibility is binary; entries are either visible or not visible.
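The visibility rule is a conjunction of two conditions, sketched here as a C predicate. The `EntryState` struct, the 1-based log sequence numbers with 0 meaning "not admitted", and the `bytes_immutable` flag standing in for the store's sealing rules are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t log_seq;         /* admitting log position; 0 = never admitted */
    bool     bytes_immutable; /* e.g. the referenced block is sealed */
} EntryState;

/* Visible iff admitted within the prefix (1..current_seq) AND the
 * referenced bytes are immutable. There is no partial visibility. */
static bool entry_visible(const EntryState *e, uint64_t current_seq) {
    bool admitted = e->log_seq != 0 && e->log_seq <= current_seq;
    return admitted && e->bytes_immutable;
}
```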
---
## 6. Snapshot and Log Semantics
Snapshots provide a base mapping; the append-only log defines subsequent changes.
The index state for a given CURRENT is defined as:
```
Index(CURRENT) = Index(snapshot) + replay(log_prefix)
```
Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entries are semantically equivalent once replayed.
---
## 7. Immutability and Shadowing
### 7.1 Immutability
* Index entries are never mutated.
* Once visible, an entry's meaning does not change.
* Referenced bytes are immutable for the lifetime of the entry.
### 7.2 Shadowing
* Later entries MAY shadow earlier entries with the same Reference.
* Precedence is determined solely by log order.
* Snapshot boundaries do not alter shadowing semantics.
---
## 8. Tombstones (Optional)
Tombstone entries MAY be used to invalidate prior mappings.
* A tombstone shadows earlier entries for the same Reference.
* Visibility rules are identical to regular entries.
* If tombstones are used, their encoding is defined by ENC-ASL-CORE-INDEX.
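Sections 6 through 8 combine into one rule: start from the snapshot, apply the log prefix in order, and let the last entry for a Reference win, with tombstones hiding it. The sketch below reduces References and locations to 64-bit integers and uses a small fixed-size linear table. `IndexEntry`, `IndexView`, `index_apply`, `index_build`, and `index_lookup` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REFS 64   /* assumed capacity of this toy index view */

/* Hypothetical entry: a Reference mapped to a location, or a tombstone. */
typedef struct {
    uint64_t ref;        /* stand-in for a full Reference */
    uint64_t location;   /* stand-in for an ArtifactLocation handle */
    bool tombstone;
} IndexEntry;

typedef struct {
    IndexEntry slots[MAX_REFS];
    size_t count;
} IndexView;

/* Applies one entry; a later entry for the same Reference shadows the
 * earlier one. Returns false only if the table is full. */
bool index_apply(IndexView *v, IndexEntry e)
{
    for (size_t i = 0; i < v->count; i++) {
        if (v->slots[i].ref == e.ref) {
            v->slots[i] = e;            /* shadowing follows log order */
            return true;
        }
    }
    if (v->count == MAX_REFS)
        return false;
    v->slots[v->count++] = e;
    return true;
}

/* Index(CURRENT) = Index(snapshot) + replay(log_prefix). */
bool index_build(IndexView *out, const IndexView *snapshot,
                 const IndexEntry *entries, size_t prefix_len)
{
    *out = *snapshot;
    for (size_t i = 0; i < prefix_len; i++)
        if (!index_apply(out, entries[i]))
            return false;
    return true;
}

/* True (and *location set) if ref is visible; a tombstone hides it. */
bool index_lookup(const IndexView *v, uint64_t ref, uint64_t *location)
{
    for (size_t i = 0; i < v->count; i++) {
        if (v->slots[i].ref == ref) {
            if (v->slots[i].tombstone)
                return false;
            *location = v->slots[i].location;
            return true;
        }
    }
    return false;
}
```

Each entry overwrites the previous one for its Reference. As a result, rebuilding from the same snapshot and prefix always yields the same view, and re-applying an already replayed entry leaves the view unchanged.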
---
## 9. Determinism Guarantees
For fixed:
* StoreConfig
* Snapshot
* Log prefix
ASL/1-CORE-INDEX guarantees:
* Deterministic lookup results
* Deterministic shadowing resolution
* Deterministic visibility
---
## 10. Normative Invariants
Conforming implementations MUST enforce:
1. No visibility without a log-admitted entry.
2. No mutation of visible index entries.
3. Referenced bytes remain immutable for the entry's lifetime.
4. Shadowing follows strict log order.
5. Snapshot + log replay uniquely defines CURRENT.
6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun).
Violation of any invariant constitutes index corruption.
---
## 11. Relationship to Other Specifications
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------- |
| ASL/1-CORE | Artifact semantics and identity |
| ASL/1-STORE | StoreSnapshot and put/get logical model |
| ASL/1-CORE-INDEX | Semantic mapping of Reference → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle, replay, and visibility contracts |
| ENC-ASL-CORE-INDEX | On-disk encoding for index segments and records |
---
## 12. Summary
ASL/1-CORE-INDEX specifies the semantic meaning of the index:
* It maps artifact References to byte locations deterministically.
* It defines visibility and shadowing rules across snapshot + log replay.
* It guarantees immutability and deterministic lookup.
It answers one question:
> *Given a Reference and a CURRENT state, where are the bytes?*

tier1/asl-index-accel-1.md Normal file

@ -0,0 +1,272 @@
# ASL/INDEX-ACCEL/1 — Index Acceleration Semantics
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Tags: [deterministic, index, acceleration]
**Document ID:** `ASL/INDEX-ACCEL/1`
**Layer:** L1 — Acceleration rules over index semantics (no storage / encoding)
**Depends on (normative):**
* `ASL/1-CORE-INDEX`
**Informative references:**
* `ASL-STORE-INDEX` — store lifecycle and replay contracts
* `ENC-ASL-CORE-INDEX` — bytes-on-disk encoding profile (`tier1/enc-asl-core-index.md`)
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/INDEX-ACCEL/1 defines **acceleration semantics only**. It MUST NOT change index meaning defined by ASL/1-CORE-INDEX.
---
## 1. Purpose
ASL/INDEX-ACCEL/1 defines **acceleration mechanisms** used by ASL-based indexes, including:
* Routing keys
* Sharding
* Filters (Bloom, XOR, Ribbon, etc.)
* SIMD execution
* Hash recasting
All mechanisms defined herein are **observationally invisible** to ASL/1-CORE-INDEX semantics.
---
## 2. Scope
Applies to:
* Artifact indexes (ASL)
* Projection and graph indexes (e.g., TGK)
* Any index layered on ASL/1-CORE-INDEX semantics
Does **not** define:
* Artifact or edge identity
* Snapshot semantics
* Storage lifecycle
* Encoding details
---
## 3. Canonical Key vs Routing Key
### 3.1 Canonical Key
The **Canonical Key** uniquely identifies an indexable entity.
Examples:
* Artifact: `Reference`
* TGK Edge: `CanonicalEdgeKey`
Properties:
* Defines semantic identity
* Used for equality, shadowing, and tombstones
* Stable and immutable
* Fully compared on index match
### 3.2 Routing Key
The **Routing Key** is a **derived, advisory key** used exclusively for acceleration.
Properties:
* Derived deterministically from Canonical Key and optional attributes
* MAY be used for sharding, filters, SIMD layouts
* MUST NOT affect index semantics
* MUST be verified by full Canonical Key comparison on match
Formal rule:
```
CanonicalKey determines correctness
RoutingKey determines performance
```
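The following is a minimal sketch of this rule. A probe uses the routing key only to skip non-candidates, and it reports a hit only after the full canonical key compares equal. The following are assumptions for illustration:

* the 32-byte canonical key
* the FNV-1a routing derivation
* the `Slot`, `routing_key`, and `probe` names

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32  /* assumed canonical key size, e.g. a 256-bit digest */

typedef struct {
    uint8_t  canonical[KEY_LEN];  /* Canonical Key: identity and correctness */
    uint64_t routing;             /* Routing Key: derived, advisory only */
    uint64_t value;
} Slot;

/* Assumed routing derivation: 64-bit FNV-1a over the canonical key bytes. */
uint64_t routing_key(const uint8_t key[KEY_LEN])
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    for (size_t i = 0; i < KEY_LEN; i++) {
        h ^= key[i];
        h *= 1099511628211ULL;              /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Returns the index of the slot whose canonical key equals `key`, or -1. */
long probe(const Slot *slots, size_t n, const uint8_t key[KEY_LEN])
{
    uint64_t rk = routing_key(key);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].routing != rk)
            continue;   /* different routing key implies different canonical key */
        if (memcmp(slots[i].canonical, key, KEY_LEN) == 0)
            return (long)i;   /* only a full canonical match is a hit */
    }
    return -1;
}
```

A routing-key collision therefore costs one extra `memcmp`. It can never produce a wrong answer.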
---
## 4. Filter Semantics
### 4.1 Advisory Nature
All filters are **advisory only**.
Rules:
* False positives are permitted
* False negatives are forbidden
* Filter behavior MUST NOT affect correctness
Invariant:
```
Filter miss => key is definitely absent
Filter hit => key may be present
```
### 4.2 Filter Inputs
Filters operate over **Routing Keys**, not Canonical Keys.
A Routing Key MAY incorporate:
* Hash of Canonical Key
* Artifact type tag (if present)
* TGK edge type key
* Direction, role, or other immutable classification attributes
Absence of optional attributes MUST be encoded explicitly.
### 4.3 Filter Construction
* Filters are built only over **sealed, immutable segments**
* Filters are immutable once built
* Filter construction MUST be deterministic
* Filter state MUST be covered by segment checksums
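The advisory contract can be sketched with a classic Bloom filter built deterministically over routing keys. The following are illustrative assumptions, since ASL/INDEX-ACCEL/1 does not prescribe a filter algorithm:

* the bit count
* the number of probes
* the double-hashing scheme (`h1 + i*h2`, with `h2` from the SplitMix64 finalizer)

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u   /* assumed filter size in bits */
#define BLOOM_K    4u      /* assumed number of probes per key */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} Bloom;

/* SplitMix64 finalizer, used to derive the second hash for double hashing. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

static uint32_t bloom_bit(uint64_t routing_key, uint32_t i)
{
    uint64_t h1 = routing_key;
    uint64_t h2 = mix64(routing_key) | 1;   /* odd step, never zero */
    return (uint32_t)((h1 + i * h2) % BLOOM_BITS);
}

/* Deterministic construction: the same keys always produce the same bits. */
void bloom_build(Bloom *b, const uint64_t *keys, size_t n)
{
    memset(b->bits, 0, sizeof b->bits);
    for (size_t j = 0; j < n; j++) {
        for (uint32_t i = 0; i < BLOOM_K; i++) {
            uint32_t bit = bloom_bit(keys[j], i);
            b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
    }
}

/* false => key definitely absent; true => key may be present. */
bool bloom_maybe_contains(const Bloom *b, uint64_t routing_key)
{
    for (uint32_t i = 0; i < BLOOM_K; i++) {
        uint32_t bit = bloom_bit(routing_key, i);
        if (!(b->bits[bit / 8] & (1u << (bit % 8))))
            return false;
    }
    return true;
}
```

Every inserted key sets all of its probe bits, so a built filter can never report a false negative. A `true` result must still be confirmed by a canonical-key comparison.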
---
## 5. Sharding Semantics
### 5.1 Observational Invisibility
Sharding is a **mechanical partitioning** of the index.
Invariant:
```
LogicalIndex = union(all shards)
```
Rules:
* Shards MUST NOT affect lookup results
* Shard count and boundaries may change over time
* Rebalancing MUST preserve lookup semantics
### 5.2 Shard Assignment
Shard assignment MAY be based on:
* Hash of Canonical Key
* Routing Key
* Composite routing strategies
Shard selection MUST be deterministic per snapshot.
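One possible deterministic strategy, shown only as an assumption, is Lamping and Veach's jump consistent hash applied to the routing key. When the shard count grows from n to n+1, each key either keeps its shard or moves to the new shard n. This keeps rebalancing cheap without affecting lookup results. The `shard_for` name is hypothetical, and the function relies on IEEE 754 double arithmetic for cross-platform determinism.

```c
#include <stdint.h>

/* Jump consistent hash: maps routing_key to a shard in [0, num_shards).
 * Precondition: num_shards >= 1. */
int32_t shard_for(uint64_t routing_key, int32_t num_shards)
{
    int64_t b = -1, j = 0;
    while (j < num_shards) {
        b = j;
        routing_key = routing_key * 2862933555777941757ULL + 1;  /* LCG step */
        j = (int64_t)((double)(b + 1) *
                      ((double)(1LL << 31) / (double)((routing_key >> 33) + 1)));
    }
    return (int32_t)b;
}
```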
---
## 6. Hashing and Hash Recasting
### 6.1 Hashing
Hashes MAY be used for routing, filtering, or SIMD layout.
Hashes MUST NOT be treated as identity.
### 6.2 Hash Recasting
Hash recasting (changing hash functions or seeds) is permitted if:
1. It is deterministic
2. It does not change Canonical Keys
3. It does not affect index semantics
Recasting is equivalent to rebuilding acceleration structures.
---
## 7. SIMD Execution
SIMD operations MAY be used to:
* Evaluate filters
* Compare routing keys
* Accelerate scans
Rules:
* SIMD MUST operate only on immutable data
* SIMD MUST NOT short-circuit semantic checks
* SIMD MUST preserve deterministic behavior
---
## 8. Multi-Dimensional Routing Examples (Normative)
### 8.1 Artifact Index
* Canonical Key: `Reference`
* Routing Key components:
* `H(Reference)`
* `type_tag` (if present)
* `has_typetag`
### 8.2 TGK Edge Index
* Canonical Key: `CanonicalEdgeKey`
* Routing Key components:
* `H(CanonicalEdgeKey)`
* `edge_type_key`
* Direction or role (optional)
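For the artifact index, the components in 8.1 could be packed as in the sketch below. The following are assumptions:

* the field widths
* the `ArtifactRoutingKey` struct
* the FNV-1a stand-in for `H`

The sketch shows how an absent `type_tag` is encoded explicitly through `has_typetag`, as 4.2 requires. As a result, "no tag" and "tag value 0" produce different routing keys.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing key for the artifact index (8.1). */
typedef struct {
    uint64_t ref_hash;     /* H(Reference), advisory only */
    uint32_t type_tag;     /* meaningful only when has_typetag == 1 */
    uint8_t  has_typetag;  /* explicit presence flag */
} ArtifactRoutingKey;

/* Assumed H: 64-bit FNV-1a over the Reference bytes. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

ArtifactRoutingKey artifact_routing_key(const uint8_t *ref, size_t ref_len,
                                        bool has_tag, uint32_t tag)
{
    ArtifactRoutingKey rk;
    rk.ref_hash = fnv1a64(ref, ref_len);
    rk.has_typetag = has_tag ? 1 : 0;
    rk.type_tag = has_tag ? tag : 0;   /* normalize so absence has one encoding */
    return rk;
}

/* Field-wise comparison; memcmp would also compare struct padding bytes. */
bool routing_key_equal(ArtifactRoutingKey a, ArtifactRoutingKey b)
{
    return a.ref_hash == b.ref_hash && a.type_tag == b.type_tag &&
           a.has_typetag == b.has_typetag;
}
```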
---
## 9. Snapshot Interaction
Acceleration structures:
* MUST respect snapshot visibility rules
* MUST operate over the same sealed segments visible to the snapshot
* MUST NOT bypass tombstones or shadowing
Snapshot cuts apply **after** routing and filtering.
---
## 10. Normative Invariants
1. Canonical Keys define identity and correctness
2. Routing Keys are advisory only
3. Filters may never introduce false negatives
4. Sharding is observationally invisible
5. Hashes are not identity
6. SIMD is an execution strategy, not a semantic construct
7. All acceleration is deterministic per snapshot
---
## 11. Non-Goals
ASL/INDEX-ACCEL/1 does not define:
* Specific filter algorithms
* Memory layout
* CPU instruction selection
* Encoding formats
* Federation policies
---
## 12. Summary
ASL/INDEX-ACCEL/1 establishes a strict contract:
> All acceleration exists to make the index faster, never different.
It formalizes Canonical vs Routing keys and constrains filters, sharding, hashing, and SIMD so that correctness is preserved under all optimizations.

tier1/asl-log-1.md Normal file

@ -0,0 +1,207 @@
# ASL/LOG/1 — Append-Only Semantic Log
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Tags: [deterministic, log, snapshot]
**Document ID:** `ASL/LOG/1`
**Layer:** L1 — Domain log semantics (no transport)
**Depends on (normative):**
* `ASL-STORE-INDEX`
**Informative references:**
* `ASL/1-CORE-INDEX` — index semantics
* `ENC-ASL-LOG` — bytes-on-disk encoding profile (if defined)
* `ENC-ASL-CORE-INDEX` — index segment encoding (`tier1/enc-asl-core-index.md`)
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/LOG/1 defines **semantic log behavior**. It does not define transport, replication protocols, or storage layout.
---
## 1. Purpose
ASL/LOG/1 defines the **authoritative, append-only log** for an ASL domain.
The log records **semantic commits** that affect:
* Index segment visibility
* Tombstone policy
* Snapshot anchoring
* Optional publication metadata
The log is the **sole source of truth** for reconstructing CURRENT state.
---
## 2. Core Properties (Normative)
An ASL log MUST be:
1. Append-only
2. Strictly ordered
3. Deterministically replayable
4. Hash-chained
5. Snapshot-anchorable
6. Forward-compatible
---
## 3. Log Model
### 3.1 Log Sequence
Each record has a monotonically increasing `logseq`:
```
logseq: uint64
```
* Assigned by the domain authority
* Total order within a domain
* Never reused
### 3.2 Hash Chain
Each record commits to the previous record:
```
record_hash = H(prev_record_hash || record_type || payload)
```
This enables tamper detection, witness signing, and federation verification.
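The following sketch shows how the chain might be built and verified. A real deployment would use a cryptographic hash for `H`. Here, 64-bit FNV-1a stands in only to keep the example self-contained, and it offers no resistance to a deliberate forger. The following are also assumptions:

* the `LogRecord` layout
* the little-endian serialization of `prev_record_hash` and `record_type`
* the all-zero genesis value

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory log record. */
typedef struct {
    uint64_t logseq;
    uint32_t record_type;
    const uint8_t *payload;
    size_t payload_len;
    uint64_t record_hash;
} LogRecord;

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

static uint64_t fnv1a_update(uint64_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* record_hash = H(prev_record_hash || record_type || payload). */
uint64_t chain_hash(uint64_t prev, uint32_t type, const uint8_t *payload, size_t len)
{
    uint8_t buf[12];
    for (int i = 0; i < 8; i++) buf[i] = (uint8_t)(prev >> (8 * i));
    for (int i = 0; i < 4; i++) buf[8 + i] = (uint8_t)(type >> (8 * i));
    uint64_t h = fnv1a_update(FNV_OFFSET, buf, sizeof buf);
    return fnv1a_update(h, payload, len);
}

/* True if logseq strictly increases and every record commits to its predecessor. */
bool chain_verify(const LogRecord *records, size_t n, uint64_t genesis)
{
    uint64_t prev = genesis;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && records[i].logseq <= records[i - 1].logseq)
            return false;
        uint64_t h = chain_hash(prev, records[i].record_type,
                                records[i].payload, records[i].payload_len);
        if (h != records[i].record_hash)
            return false;
        prev = h;
    }
    return true;
}
```

Changing any earlier record changes its hash and therefore breaks every later record's commitment.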
---
## 4. Record Types (Normative)
### 4.1 SEGMENT_SEAL
Declares an index segment visible.
Semantics:
* From this `logseq` onward, the referenced segment is visible for lookup and replay.
* Segment MUST be immutable.
* All referenced blocks MUST already be sealed.
* Segment contents are not re-logged.
### 4.2 TOMBSTONE
Declares an artifact inadmissible under domain policy.
Semantics:
* Does not delete data.
* Shadows prior visibility.
* Applies from this logseq onward.
### 4.3 TOMBSTONE_LIFT
Supersedes a previous tombstone.
Semantics:
* References an earlier TOMBSTONE.
* Does not erase history.
* Only affects CURRENT at or above this logseq.
### 4.4 SNAPSHOT_ANCHOR
Binds semantic state to a snapshot.
Semantics:
* Defines a replay checkpoint.
* Permits truncation of log records below the anchor, subject to the constraints in Section 7.
### 4.5 ARTIFACT_PUBLISH (Optional)
Marks an artifact as published.
Semantics:
* Publication is domain-local.
* Federation layers may interpret this metadata.
### 4.6 ARTIFACT_UNPUBLISH (Optional)
Withdraws publication.
---
## 5. Replay Semantics (Normative)
To reconstruct CURRENT:
1. Load latest snapshot anchor (if any).
2. Initialize visible segments from that snapshot.
3. Replay all log records with `logseq > snapshot.logseq`.
4. Apply records in order:
* SEGMENT_SEAL -> add segment
* TOMBSTONE -> update policy state
* TOMBSTONE_LIFT -> override policy
* PUBLISH/UNPUBLISH -> update visibility metadata
Replay MUST be deterministic.
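The replay procedure can be sketched as a single ordered pass. ASL/LOG/1 does not assign numeric type codes. The values below are therefore assumptions, as are the `Record` and `DomainState` layouts and the fixed-size arrays. As simplifications, a TOMBSTONE_LIFT here names the artifact rather than the earlier TOMBSTONE record, and PUBLISH/UNPUBLISH handling is omitted. Unknown types are skipped as Section 8 requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numeric codes for the record types. */
enum {
    REC_SEGMENT_SEAL   = 1,
    REC_TOMBSTONE      = 2,
    REC_TOMBSTONE_LIFT = 3,
};

typedef struct {
    uint64_t logseq;
    uint32_t type;
    uint64_t arg;   /* segment id or artifact id, depending on type */
} Record;

#define MAX_ITEMS 32

typedef struct {
    uint64_t logseq;                 /* last logseq reflected in this state */
    uint64_t segments[MAX_ITEMS];    /* visible segments, in seal order */
    size_t   segment_count;
    uint64_t tombstoned[MAX_ITEMS];  /* artifacts currently inadmissible */
    size_t   tombstone_count;
} DomainState;

static long find_tombstone(const DomainState *s, uint64_t artifact)
{
    for (size_t i = 0; i < s->tombstone_count; i++)
        if (s->tombstoned[i] == artifact)
            return (long)i;
    return -1;
}

/* Applies records with logseq > snapshot->logseq, in order, to a copy of the snapshot. */
bool replay(DomainState *out, const DomainState *snapshot,
            const Record *records, size_t n)
{
    *out = *snapshot;
    for (size_t i = 0; i < n; i++) {
        const Record *r = &records[i];
        if (r->logseq <= snapshot->logseq)
            continue;                       /* already covered by the snapshot */
        switch (r->type) {
        case REC_SEGMENT_SEAL:
            if (out->segment_count == MAX_ITEMS)
                return false;
            out->segments[out->segment_count++] = r->arg;
            break;
        case REC_TOMBSTONE:
            if (find_tombstone(out, r->arg) < 0) {      /* idempotent */
                if (out->tombstone_count == MAX_ITEMS)
                    return false;
                out->tombstoned[out->tombstone_count++] = r->arg;
            }
            break;
        case REC_TOMBSTONE_LIFT: {
            long k = find_tombstone(out, r->arg);
            if (k >= 0)
                out->tombstoned[k] = out->tombstoned[--out->tombstone_count];
            break;
        }
        default:
            break;                          /* unknown type: skipped, replay continues */
        }
        out->logseq = r->logseq;
    }
    return true;
}
```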
---
## 6. Index Interaction
* Index segments contain index entries.
* The log never records individual index entries.
* Visibility is controlled solely by SEGMENT_SEAL.
* Index rebuild = scan visible segments + apply policy.
---
## 7. Garbage Collection Constraints
* A segment may be GC'd only if:
* No snapshot references it.
    * Replaying the log up to CURRENT does not require it.
* Log truncation is only safe at SNAPSHOT_ANCHOR boundaries.
---
## 8. Versioning & Extensibility
* Unknown record types MUST be skipped and MUST NOT break replay.
* Payloads are opaque outside their type.
* New record types may be added in later versions.
---
## 9. Non-Goals
ASL/LOG/1 does not define:
* Federation protocols
* Network replication
* Witness signatures
* Block-level events
* Hydration / eviction
* Execution receipts
---
## 10. Summary
ASL/LOG/1 defines the minimal semantic log needed to reconstruct CURRENT.
If it affects visibility or admissibility, it goes in the log. If it affects layout or performance, it does not.

tier1/asl-store-index.md Normal file

@ -0,0 +1,316 @@
# ASL-STORE-INDEX
### Store Semantics and Contracts for ASL Core Index (Tier1)
---
## 1. Purpose
This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.
It specifies:
* **Block lifecycle**: creation, sealing, retention, GC
* **Index segment lifecycle**: creation, append, seal, visibility
* **Snapshot identity and log positions** for deterministic replay
* **Append-only log semantics**
* **Lookup, visibility, and crash recovery rules**
* **Small vs large block handling**
It **does not define encoding** (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`) or semantic mapping (see ASL/1-CORE-INDEX).
---
## 2. Scope
Covers:
* Lifecycle of **blocks** and **index entries**
* Snapshot and CURRENT consistency guarantees
* Deterministic replay and recovery
* GC and tombstone semantics
* Packing policy for small vs large artifacts
Excludes:
* Disk-level encoding
* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
* Memory residency or caching
* Federation or PEL semantics
---
## 3. Core Concepts
### 3.1 Block
* **Definition:** Immutable storage unit containing artifact bytes.
* **Identifier:** BlockID (opaque, unique).
* **Properties:**
* Once sealed, contents never change.
* Can be referenced by multiple artifacts.
* May be pinned by snapshots for retention.
### 3.2 Index Segment
Segments group index entries and provide **persistence and recovery units**.
* **Open segment:** accepting new index entries, not visible for lookup.
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
* **Segment components:** header, optional bloom filter, index records, footer.
* **Segment visibility:** only after seal and log append.
### 3.3 Append-Only Log
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
* Entries include:
* Index additions
* Tombstones
* Segment seals
* Log is replayable to reconstruct CURRENT.
* Log semantics are defined in `ASL/LOG/1`.
### 3.4 Snapshot Identity and Log Position
To make CURRENT referencable and replayable, ASL-STORE-INDEX defines:
* **SnapshotID**: opaque, immutable identifier for a snapshot.
* **LogPosition**: monotonic integer position in the append-only log.
* **IndexState**: `(SnapshotID, LogPosition)`.
Deterministic replay is defined as:
```
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
```
Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
### 3.5 Artifact Location
* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
* Multi-extent locations allow a single artifact to be striped across multiple blocks.
---
## 4. Block Lifecycle Semantics
| Event | Description | Semantic Guarantees |
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
Notes:
* Sealing ensures any index entry referencing the block is immutable.
* Retention is driven by snapshot and log visibility rules.
* GC must **never violate CURRENT reconstruction guarantees**.
---
## 5. Segment Lifecycle Semantics
### 5.1 Creation
* Open segment is allocated.
* Index entries appended in log order.
* Entries are invisible until segment seal and log append.
### 5.2 Seal
* Segment is closed to append.
* Seal record is written to append-only log.
* Segment becomes visible for lookup.
* Sealed segment may be snapshot-pinned.
### 5.3 Snapshot Interaction
* Snapshots capture sealed segments.
* Open segments need not survive a snapshot.
* Segments below snapshot are replay anchors.
---
## 6. Visibility and Lookup Semantics
### 6.1 Visibility Rules
* Entry visible **iff**:
* The block is sealed.
* Log record exists at position ≤ CURRENT.
* Segment seal recorded in log.
* Entries above CURRENT or referencing unsealed blocks are invisible.
### 6.2 Lookup Semantics
To resolve an `ArtifactKey`:
1. Identify all visible segments ≤ CURRENT.
2. Search segments in **reverse creation order** (newest first).
3. Return first matching entry.
4. Respect tombstones to shadow prior entries.
Determinism:
* Lookup results are identical across platforms given the same snapshot and log prefix.
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
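The lookup steps can be sketched as a newest-first scan over visible segments. The sketch assumes each segment carries the log position of its seal record and an in-memory array of entries in append order, and that keys are reduced to 64-bit values. `Entry`, `Segment`, and `store_lookup` are hypothetical names. A tombstone found first ends the search with "not found", which is how it shadows older entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t key;        /* stand-in for ArtifactKey */
    uint64_t location;   /* stand-in for ArtifactLocation */
    bool tombstone;
} Entry;

typedef struct {
    uint64_t seal_pos;   /* log position of the segment's seal record */
    const Entry *entries;
    size_t count;
} Segment;

/* segs[] is in creation order (oldest first). Only segments sealed at or
 * before `current` are visible. Returns true and sets *loc on a hit. */
bool store_lookup(const Segment *segs, size_t nsegs, uint64_t current,
                  uint64_t key, uint64_t *loc)
{
    for (size_t s = nsegs; s-- > 0;) {               /* newest segment first */
        if (segs[s].seal_pos > current)
            continue;                                /* not visible at CURRENT */
        const Entry *e = segs[s].entries;
        for (size_t i = segs[s].count; i-- > 0;) {   /* later entries win */
            if (e[i].key != key)
                continue;
            if (e[i].tombstone)
                return false;                        /* shadowed by a tombstone */
            *loc = e[i].location;
            return true;
        }
    }
    return false;
}
```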
---
## 7. Snapshot Interaction
* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
* CURRENT is reconstructed as:
```
CURRENT = snapshot_state + replay(log)
```
Segment and block visibility rules:
| Entity | Visible in snapshot | Visible in CURRENT |
| -------------------- | ---------------------------- | ------------------------------ |
| Open segment/block | No | Only after seal and log append |
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
---
## 8. Garbage Collection
Eligibility for GC:
* Segments: sealed, no references from CURRENT or snapshots.
* Blocks: unpinned, unreferenced by any segment or artifact.
Rules:
* GC is safe **only on sealed segments and blocks**.
* Must respect snapshot pins.
* Tombstones may aid in invalidating unreachable blocks.
Outcome:
* GC never violates CURRENT reconstruction.
* Blocks can be reclaimed without breaking provenance.
---
## 9. Tombstone Semantics
* Optional marker to invalidate prior mappings.
* Visibility rules identical to regular index entries.
* Used to maintain a deterministic CURRENT in the face of shadowing or deletions.
---
## 10. Small vs Large Block Handling
### 10.1 Definitions
| Term | Meaning |
| ----------------- | --------------------------------------------------------------------- |
| **Small block** | Block containing artifact bytes below a threshold `T_small`. |
| **Large block** | Block containing artifact bytes ≥ `T_small`. |
| **Mixed segment** | Segment containing both small and large blocks (discouraged). |
| **Packing** | Combining multiple small artifacts into a single physical block. |
Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.
### 10.2 Packing Rules
1. **Small blocks may be packed together** to reduce storage overhead.
2. **Large blocks are never packed with other artifacts**.
3. Mixed segments are **allowed but discouraged**; index semantics remain identical.
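A simple next-fit packing policy that respects rules 1 and 2 might look like the sketch below. The threshold value, the block capacity, and the next-fit strategy are assumptions. The rules above constrain only which artifacts may share a block, not how they are placed. `assign_blocks` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define T_SMALL        4096u    /* assumed small/large threshold in bytes */
#define BLOCK_CAPACITY 65536u   /* assumed capacity of a packed small block */

/* Assigns each artifact a block number. Small artifacts (< T_SMALL) are
 * packed into the currently open small block; large artifacts always get a
 * dedicated block. Returns the number of blocks used. */
size_t assign_blocks(const uint32_t *sizes, size_t n, size_t *block_of)
{
    size_t next_block = 0;
    size_t open_small = SIZE_MAX;   /* no open small block yet */
    uint64_t open_used = 0;

    for (size_t i = 0; i < n; i++) {
        if (sizes[i] >= T_SMALL) {
            block_of[i] = next_block++;          /* large: never packed */
            continue;
        }
        if (open_small == SIZE_MAX || open_used + sizes[i] > BLOCK_CAPACITY) {
            open_small = next_block++;           /* start a new packed block */
            open_used = 0;
        }
        block_of[i] = open_small;
        open_used += sizes[i];
    }
    return next_block;
}
```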
### 10.3 Segment Allocation Rules
1. Small blocks are allocated into segments optimized for packing efficiency.
2. Large blocks are allocated into segments optimized for sequential I/O.
3. Segment sealing and visibility rules remain unchanged.
### 10.4 Indexing and Addressing
All blocks are addressed uniformly:
```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```
Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
### 10.5 GC and Retention
1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
2. Large blocks are reclaimed per block.
Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
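For both packed and dedicated blocks, the rules above reduce to one check: a block is reclaimable only when no extent stored in it belongs to a reachable artifact. The sketch assumes reachability from CURRENT and all retained snapshots has already been computed. `ArtifactExtentInfo` and `block_reclaimable` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-extent record: which block it lives in, and whether its
 * artifact is reachable from CURRENT or any retained snapshot. */
typedef struct {
    uint64_t block_id;
    bool reachable;
} ArtifactExtentInfo;

/* A packed block is freed only after all contained artifacts are unreachable;
 * a large block holds a single artifact, so the same check is per block. */
bool block_reclaimable(uint64_t block_id, const ArtifactExtentInfo *ext, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ext[i].block_id == block_id && ext[i].reachable)
            return false;
    return true;
}
```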
---
## 11. Crash and Recovery Semantics
* Open segments or unsealed blocks may be lost; no invariant is broken.
* Recovery procedure:
1. Mount last checkpoint snapshot.
2. Replay append-only log from checkpoint.
3. Reconstruct CURRENT.
* Recovery is **deterministic and idempotent**.
* Segments and blocks are **never partially visible** after a crash.
---
## 12. Normative Invariants
1. Sealed blocks are immutable.
2. Index entries referencing blocks are immutable once visible.
3. Shadowing follows strict log order.
4. Replay of snapshot + log uniquely reconstructs CURRENT.
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.
---
## 13. Non-Goals
* Disk-level encoding (ENC-ASL-CORE-INDEX).
* Memory layout or caching.
* Sharding or performance heuristics.
* Federation / multi-domain semantics (handled elsewhere).
* Block packing strategies beyond the policy rules here.
---
## 14. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
---
## 15. Summary
The tier1 ASL-STORE-INDEX specification:
* Defines **block lifecycle** and **segment lifecycle**.
* Makes **snapshot identity and log positions** explicit for replay.
* Ensures deterministic visibility, lookup, and crash recovery.
* Formalizes GC safety and tombstone behavior.
* Adds clear **small vs large block** handling without changing core semantics.


@ -8,7 +8,7 @@
This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.
It translates the **semantic model of ASL-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
It is intended for:
@ -19,8 +19,9 @@ It is intended for:
It does **not** define:
* Index semantics (see ASL-CORE-INDEX)
* Index semantics (see ASL/1-CORE-INDEX)
* Store lifecycle behavior (see ASL-STORE-INDEX)
* Acceleration semantics (see ASL/INDEX-ACCEL/1)
---
@ -49,6 +50,8 @@ Each index segment file is laid out as follows:
+------------------+
| IndexRecord[] |
+------------------+
| ExtentRecord[] |
+------------------+
| SegmentFooter |
+------------------+
```
@ -56,6 +59,7 @@ Each index segment file is laid out as follows:
* **SegmentHeader**: fixed-size, mandatory
* **BloomFilter**: optional, opaque, segment-local
* **IndexRecord[]**: array of index entries
* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
* **SegmentFooter**: fixed-size, mandatory
Offsets in the header define locations of Bloom filter and index records.
@ -81,6 +85,9 @@ typedef struct {
uint64_t bloom_offset; // File offset of bloom filter (0 if none)
uint64_t bloom_size; // Size of bloom filter (0 if none)
uint64_t extents_offset; // File offset of ExtentRecord array
uint64_t extent_count; // Total number of ExtentRecord entries
uint64_t flags; // Reserved for future use
} SegmentHeader;
#pragma pack(pop)
@ -104,9 +111,9 @@ typedef struct {
uint64_t hash_lo; // Low 64 bits
uint32_t hash_tail; // Optional tail for full hash if larger than 192 bits
uint64_t block_id; // ASL block identifier
uint32_t offset; // Offset within block
uint32_t length; // Length of artifact bytes
uint64_t extents_offset; // File offset of first ExtentRecord for this entry
uint32_t extent_count; // Number of ExtentRecord entries for this artifact
uint32_t total_length; // Total artifact length in bytes
uint32_t flags; // Optional flags (tombstone, reserved, etc.)
uint32_t reserved; // Reserved for alignment/future use
@ -117,13 +124,34 @@ typedef struct {
**Notes:**
* `hash_*` fields store the artifact key deterministically.
* `block_id` references an ASL block.
* `offset` / `length` define bytes within the block.
* `extents_offset` references the first ExtentRecord for this entry.
* `extent_count` defines how many extents to read (may be 0 for tombstones).
* `total_length` is the exact artifact size in bytes.
* Flags may indicate tombstone or other special status.
---
## 6. SegmentFooter
## 6. ExtentRecord
```c
#pragma pack(push,1)
typedef struct {
uint64_t block_id; // ASL block identifier
uint32_t offset; // Offset within block
uint32_t length; // Length of this extent
} ExtentRecord;
#pragma pack(pop)
```
**Notes:**
* Extents are concatenated in order to produce artifact bytes.
* `extent_count` MUST be > 0 for visible (non-tombstone) entries.
* `total_length` MUST equal the sum of `length` across the extents.
---
## 7. SegmentFooter
```c
#pragma pack(push,1)
@ -142,7 +170,7 @@ typedef struct {
---
## 7. Bloom Filter
## 8. Bloom Filter
* The bloom filter is **optional** and opaque to semantics.
* Its purpose is **lookup acceleration**.
@ -151,24 +179,27 @@ typedef struct {
---
## 8. Versioning and Compatibility
## 9. Versioning and Compatibility
* `version` field in header defines encoding.
* Readers must **reject unsupported versions**.
* New fields may be added in future versions only via version bump.
* Existing fields must **never change meaning**.
* Version `1` implies single-extent layout (legacy).
* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
---
## 9. Alignment and Packing
## 10. Alignment and Packing
* All structures are **packed** (no compiler padding)
* Multi-byte integers are **little-endian**
* Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
* Extents are accessed via `IndexRecord.extents_offset` relative to the file base.
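Because the structures are packed and little-endian, a portable reader can decode them byte by byte instead of casting file bytes to a struct. This avoids host-endianness and unaligned-access issues. The sketch below decodes the 16-byte `ExtentRecord` layout shown above (8-byte `block_id`, 4-byte `offset`, 4-byte `length`). It also checks the `extent_count` and `total_length` rules from the ExtentRecord notes. The `read_le*`, `decode_extent`, and `extents_consistent` helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXTENT_RECORD_SIZE 16u  /* 8 + 4 + 4 bytes, packed */

typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t read_le64(const uint8_t *p)
{
    return (uint64_t)read_le32(p) | (uint64_t)read_le32(p + 4) << 32;
}

/* Decodes extent `index` of the list starting at `extents_offset`.
 * Returns false if the record would lie outside the file buffer. */
bool decode_extent(const uint8_t *file, size_t file_len, uint64_t extents_offset,
                   uint64_t index, ExtentRecord *out)
{
    if (extents_offset > file_len)
        return false;
    uint64_t avail = (uint64_t)file_len - extents_offset;
    if (index >= avail / EXTENT_RECORD_SIZE)
        return false;
    const uint8_t *p = file + extents_offset + index * EXTENT_RECORD_SIZE;
    out->block_id = read_le64(p);
    out->offset   = read_le32(p + 8);
    out->length   = read_le32(p + 12);
    return true;
}

/* Visible entries need extent_count > 0 and total_length == sum of lengths. */
bool extents_consistent(const uint8_t *file, size_t file_len, uint64_t extents_offset,
                        uint32_t extent_count, uint32_t total_length)
{
    uint64_t sum = 0;
    ExtentRecord e;
    if (extent_count == 0)
        return false;
    for (uint32_t i = 0; i < extent_count; i++) {
        if (!decode_extent(file, file_len, extents_offset, i, &e))
            return false;
        sum += e.length;
    }
    return sum == total_length;
}
```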
---
## 10. Summary of Encoding Guarantees
## 11. Summary of Encoding Guarantees
The ENC-ASL-CORE-INDEX specification ensures:
@ -180,14 +211,13 @@ The ENC-ASL-CORE-INDEX specification ensures:
---
## 11. Relationship to Other Layers
## 12. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------- |
| ASL-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts |
| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |
This completes the stack: **semantics → store behavior → encoding**.