Cleaning up index documents.
parent 5a887da909
commit f2225f7a73
|
|
@@ -37,7 +37,7 @@ It does **not** define:
|
|||
* Artifact or edge identity
|
||||
* Snapshot semantics
|
||||
* Storage lifecycle
|
||||
* Encoding details (see ENC-ASL-CORE-INDEX)
|
||||
* Encoding details (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`)
|
||||
|
||||
---
|
||||
|
||||
|
|
@@ -284,4 +284,3 @@ If you want, the **next natural step** would be to:
|
|||
* Or revise **ASL-CORE-INDEX** to reference ASL-INDEX-ACCEL normatively
|
||||
|
||||
Both would lock the model completely.
|
||||
|
||||
|
|
|
|||
|
|
@@ -180,7 +180,7 @@ typedef struct {
|
|||
| ------------------ | ----------------------------------------------- |
|
||||
| ASL-STORE | Lifecycle, visibility, snapshot enforcement |
|
||||
| ASL-CORE-INDEX | Index semantics (ArtifactKey → BlockID mapping) |
|
||||
| ENC-ASL-CORE-INDEX | Exact byte layout for index segments |
|
||||
| ENC-ASL-CORE-INDEX | Exact byte layout for index segments (`tier1/enc-asl-core-index.md`) |
|
||||
| ENC-ASL-LOG | Exact byte layout for append-only log |
|
||||
|
||||
---
|
||||
|
|
@@ -199,4 +199,3 @@ You could now use it to:
|
|||
If you want, I can **draft a minimal C writer/reader API** next that directly reads/writes this log, including helpers for `add_index_segment()`, `seal_segment()`, and `tombstone()` with payload serialization.
|
||||
|
||||
Do you want me to do that next?
|
||||
|
||||
|
|
|
|||
218
notes/all.txt
|
|
@@ -1183,223 +1183,7 @@ Design GC and snapshot pruning safely
|
|||
|
||||
You’re asking the right questions — this is already “systems-level” design, not tinkering.
|
||||
|
||||
Maybe this clears something up:
|
||||
# ASL-STORE-INDEX
|
||||
|
||||
### Store Semantics and Contracts for ASL Index
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.
|
||||
|
||||
It bridges the gap between **index meaning** and **physical storage**, ensuring:
|
||||
|
||||
* Deterministic replay
|
||||
* Snapshot-aware visibility
|
||||
* Immutable block guarantees
|
||||
* Idempotent recovery
|
||||
* Correctness of CURRENT state
|
||||
|
||||
It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).
|
||||
|
||||
---
|
||||
|
||||
## 2. Scope
|
||||
|
||||
This specification covers:
|
||||
|
||||
* Index segment lifecycle
|
||||
* Interaction between index and ASL blocks
|
||||
* Append-only log semantics
|
||||
* Snapshot integration
|
||||
* Visibility and lookup rules
|
||||
* Crash safety and recovery
|
||||
* Garbage collection constraints
|
||||
|
||||
It does **not** cover:
|
||||
|
||||
* Disk format details
|
||||
* Bloom filter algorithms
|
||||
* File system specifics
|
||||
* Placement heuristics beyond semantic guarantees
|
||||
|
||||
---
|
||||
|
||||
## 3. Core Concepts
|
||||
|
||||
### 3.1 Index Segment
|
||||
|
||||
A **segment** is a contiguous set of index entries written by the store.
|
||||
|
||||
* Open while accepting new entries
|
||||
* Sealed when closed for append
|
||||
* Sealed segments are immutable
|
||||
* Sealed segments are **snapshot-visible only after their seal record is written to the log**
|
||||
|
||||
Segments are the **unit of persistence, replay, and GC**.
|
||||
|
||||
---
|
||||
|
||||
### 3.2 ASL Block Relationship
|
||||
|
||||
Each index entry references a **sealed block** via:
|
||||
|
||||
|
||||
ArtifactKey → (BlockID, offset, length)
|
||||
|
||||
|
||||
* The store must ensure the block is sealed before the entry becomes log-visible
|
||||
* Blocks are immutable after seal
|
||||
* Open blocks may be abandoned without violating invariants
|
||||
|
||||
---
|
||||
|
||||
### 3.3 Append-Only Log
|
||||
|
||||
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
|
||||
|
||||
* Entries include index additions, tombstones, and segment seals
|
||||
* Log is durable and replayable
|
||||
* Log defines visibility above checkpoint snapshots
|
||||
|
||||
**CURRENT state** is derived as:
|
||||
|
||||
|
||||
CURRENT = checkpoint_state + replay(log)
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 4. Segment Lifecycle
|
||||
|
||||
### 4.1 Creation
|
||||
|
||||
* Open segment is allocated
|
||||
* Index entries appended in log order
|
||||
* Entries are invisible until segment seal and log append
|
||||
|
||||
### 4.2 Seal
|
||||
|
||||
* Segment is closed to append
|
||||
* Seal record is written to append-only log
|
||||
* Segment becomes visible for lookup
|
||||
* Sealed segment may be snapshot-pinned
|
||||
|
||||
### 4.3 Snapshot Interaction
|
||||
|
||||
* Snapshots capture sealed segments
|
||||
* Open segments need not survive snapshot
|
||||
* Segments below snapshot are replay anchors
|
||||
|
||||
### 4.4 Garbage Collection
|
||||
|
||||
* Only **sealed and unreachable segments** can be deleted
|
||||
* GC operates at segment granularity
|
||||
* GC must not break CURRENT or violate invariants
|
||||
|
||||
---
|
||||
|
||||
## 5. Lookup Semantics
|
||||
|
||||
To resolve an ArtifactKey:
|
||||
|
||||
1. Identify all visible segments ≤ CURRENT
|
||||
2. Search segments in **reverse creation order** (newest first)
|
||||
3. Return the first matching entry
|
||||
4. Respect tombstone entries (if present)
|
||||
|
||||
Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.
|
||||
|
||||
---
|
||||
|
||||
## 6. Visibility Guarantees
|
||||
|
||||
* Entry visible **iff**:
|
||||
|
||||
* The block is sealed
|
||||
* Log record exists ≤ CURRENT
|
||||
* Segment seal recorded in log
|
||||
* Entries above CURRENT or referencing unsealed blocks are invisible
|
||||
|
||||
---
|
||||
|
||||
## 7. Crash and Recovery Semantics
|
||||
|
||||
### 7.1 Crash During Open Segment
|
||||
|
||||
* Open segments may be lost
|
||||
* Index entries may be leaked
|
||||
* No sealed segment may be corrupted
|
||||
|
||||
### 7.2 Recovery Procedure
|
||||
|
||||
1. Mount latest checkpoint snapshot
|
||||
2. Replay append-only log from checkpoint
|
||||
3. Rebuild CURRENT
|
||||
4. Resume normal operation
|
||||
|
||||
Recovery must be **deterministic and idempotent**.
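
The recovery procedure above can be sketched as follows. This is a minimal illustration, not the store's actual implementation: the function `recover`, the per-record sequence numbers, and the tuple layout of log records are all assumptions made for the example (the specification only requires a strictly ordered log).

```python
def recover(checkpoint_state, checkpoint_seq, log):
    """Rebuild CURRENT from a checkpoint plus the records that follow it.

    checkpoint_seq is the (assumed) sequence number of the last record
    already reflected in the checkpoint.
    """
    current = dict(checkpoint_state)  # leave the checkpoint untouched
    last_seq = checkpoint_seq
    for seq, key, location in log:
        if seq <= last_seq:
            continue  # already applied; skipping keeps replay idempotent
        current[key] = location
        last_seq = seq
    return current, last_seq


log = [
    (1, "k1", ("blk-1", 0, 4)),
    (2, "k2", ("blk-1", 4, 4)),
    (3, "k1", ("blk-2", 0, 4)),
]
checkpoint = {"k1": ("blk-1", 0, 4)}  # state as of sequence number 1
state, seq = recover(checkpoint, 1, log)
state_again, seq_again = recover(state, seq, log)  # second pass changes nothing
```

Running recovery a second time over the same log leaves the state unchanged, which is the idempotence property the section requires.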
|
||||
|
||||
---
|
||||
|
||||
## 8. Tombstone Semantics
|
||||
|
||||
* Optional: tombstones may exist to invalidate prior mappings
|
||||
* Tombstones shadow prior entries with the same ArtifactKey
|
||||
* Tombstone visibility follows the same rules as regular entries
|
||||
|
||||
---
|
||||
|
||||
## 9. Invariants (Normative)
|
||||
|
||||
The store **must enforce**:
|
||||
|
||||
1. No segment visible without seal log record
|
||||
2. No mutation of sealed segment or block
|
||||
3. Shadowing follows log order strictly
|
||||
4. Replay uniquely reconstructs CURRENT
|
||||
5. GC does not remove segments referenced by snapshot or log
|
||||
6. ArtifactLocation always points to immutable bytes
|
||||
|
||||
---
|
||||
|
||||
## 10. Non-Goals
|
||||
|
||||
ASL-STORE-INDEX does **not** define:
|
||||
|
||||
* Disk layout or encoding (ENC-ASL-CORE-INDEX)
|
||||
* Placement heuristics (small vs. large block packing)
|
||||
* Performance targets
|
||||
* Memory caching strategies
|
||||
* Federation or provenance mechanics
|
||||
|
||||
---
|
||||
|
||||
## 11. Relationship to Other Documents
|
||||
|
||||
| Layer | Responsibility |
|
||||
| ------------------ | -------------------------------------------------------------------- |
|
||||
| ASL-CORE-INDEX | Defines semantic meaning of mapping ArtifactKey → ArtifactLocation |
|
||||
| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
|
||||
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |
|
||||
|
||||
---
|
||||
|
||||
## 12. Summary
|
||||
|
||||
The store-index layer guarantees:
|
||||
|
||||
* Immutable, snapshot-safe segments
|
||||
* Deterministic and idempotent replay
|
||||
* Correct visibility semantics
|
||||
* Safe crash recovery
|
||||
* Garbage collection constraints
|
||||
|
||||
This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.
|
||||
|
||||
Here’s a **fully refined version of ASL-STORE-INDEX**, incorporating **block lifecycle, sealing, snapshot safety, retention, and GC rules**, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.
|
||||
Canonical spec (refined, replaces earlier draft):
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@@ -1,245 +0,0 @@
|
|||
# ASL-CORE-INDEX
|
||||
|
||||
### Semantic Addendum to ASL-CORE
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
This document defines the **semantic model of the ASL index**, extending ASL-CORE artifact semantics to include **mapping artifacts to storage locations**.
|
||||
|
||||
The ASL index provides a **deterministic, snapshot-relative mapping** from artifact identities to byte locations within **immutable storage blocks**.
|
||||
|
||||
It specifies **what the index means**, not:
|
||||
|
||||
* How the index is stored or encoded
|
||||
* How blocks are allocated or packed
|
||||
* Performance optimizations
|
||||
* Garbage collection or memory strategies
|
||||
|
||||
Those are handled by:
|
||||
|
||||
* **ASL-STORE-INDEX** (store semantics and contracts)
|
||||
* **ENC-ASL-CORE-INDEX** (bytes-on-disk encoding)
|
||||
|
||||
---
|
||||
|
||||
## 2. Scope
|
||||
|
||||
This document defines:
|
||||
|
||||
* Logical structure of index entries
|
||||
* Visibility rules
|
||||
* Snapshot and log interaction
|
||||
* Immutability and shadowing semantics
|
||||
* Determinism guarantees
|
||||
* Required invariants
|
||||
|
||||
It does **not** define:
|
||||
|
||||
* On-disk formats
|
||||
* Index segmentation or sharding
|
||||
* Bloom filters or probabilistic structures
|
||||
* Memory residency
|
||||
* Performance targets
|
||||
|
||||
---
|
||||
|
||||
## 3. Terminology
|
||||
|
||||
* **Artifact**: An immutable sequence of bytes managed by ASL.
|
||||
* **ArtifactKey**: Opaque identifier for an artifact (typically a hash).
|
||||
* **Block**: Immutable storage unit containing artifact bytes.
|
||||
* **BlockID**: Opaque, unique identifier for a block.
|
||||
* **ArtifactLocation**: Tuple `(BlockID, offset, length)` identifying bytes within a block.
|
||||
* **Snapshot**: Checkpoint capturing a consistent base state of ASL-managed storage and metadata.
|
||||
* **Append-Only Log**: Strictly ordered log of index-visible mutations occurring after a snapshot.
|
||||
* **CURRENT**: The effective system state obtained by replaying the append-only log on top of a checkpoint snapshot.
|
||||
|
||||
---
|
||||
|
||||
## 4. Block Semantics
|
||||
|
||||
ASL-CORE introduces **blocks** minimally:
|
||||
|
||||
1. Blocks are **existential storage atoms** for artifact bytes.
|
||||
2. Each block is uniquely identified by a **BlockID**.
|
||||
3. Blocks are **immutable once sealed**.
|
||||
4. Addressing: `(BlockID, offset, length) → bytes`.
|
||||
5. No block layout, allocation, packing, or size semantics are defined at the core level.
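
The addressing rule in item 4 can be illustrated with a small sketch. The names `ArtifactLocation`, `BlockStore`, `seal`, and `resolve` are hypothetical and not defined by ASL-CORE; blocks are modeled as in-memory `bytes` purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactLocation:
    """Hypothetical (BlockID, offset, length) tuple."""
    block_id: str
    offset: int
    length: int


class BlockStore:
    """Toy in-memory store; real block layout is out of scope at the core level."""

    def __init__(self):
        self._sealed = {}  # block_id -> immutable bytes

    def seal(self, block_id, data):
        if block_id in self._sealed:
            raise ValueError("sealed blocks are immutable")
        self._sealed[block_id] = bytes(data)

    def resolve(self, loc):
        """(BlockID, offset, length) -> bytes, for sealed blocks only."""
        block = self._sealed[loc.block_id]  # KeyError if unknown or unsealed
        if loc.offset < 0 or loc.length < 0 or loc.offset + loc.length > len(block):
            raise ValueError("location falls outside the block")
        return block[loc.offset:loc.offset + loc.length]


store = BlockStore()
store.seal("blk-1", b"hello world")
payload = store.resolve(ArtifactLocation("blk-1", 6, 5))
```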
|
||||
|
||||
---
|
||||
|
||||
## 5. Core Semantic Mapping
|
||||
|
||||
The ASL index defines a **total mapping**:
|
||||
|
||||
```
|
||||
ArtifactKey → ArtifactLocation
|
||||
```
|
||||
|
||||
Semantic guarantees:
|
||||
|
||||
* Each visible `ArtifactKey` maps to exactly one `ArtifactLocation`.
|
||||
* Mapping is **immutable once visible**.
|
||||
* Mapping is **snapshot-relative**.
|
||||
* Mapping is **deterministic** given `(snapshot, log prefix)`.
|
||||
|
||||
---
|
||||
|
||||
## 6. ArtifactLocation Semantics
|
||||
|
||||
* `block_id` references an ASL block.
|
||||
* `offset` and `length` define bytes within the block.
|
||||
* Only valid for the lifetime of the referenced block.
|
||||
* No interpretation of bytes is implied.
|
||||
|
||||
---
|
||||
|
||||
## 7. Visibility Model
|
||||
|
||||
An index entry is **visible** if and only if:
|
||||
|
||||
1. The referenced block is sealed.
|
||||
2. A corresponding log record exists.
|
||||
3. The log record is ≤ CURRENT replay position.
|
||||
|
||||
**Consequences**:
|
||||
|
||||
* Entries referencing unsealed blocks are invisible.
|
||||
* Entries above CURRENT are invisible.
|
||||
* Visibility is binary (no gradual exposure).
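
As a rough sketch, the three visibility conditions can be written as a single predicate. `LogRecord`, `is_visible`, and the integer log positions are assumptions introduced for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record: its log position and the block it references."""
    position: int
    block_id: str


def is_visible(record, sealed_blocks, current_position):
    """Binary visibility check; `record` is None when no log record exists."""
    if record is None:  # condition 2: a log record must exist
        return False
    if record.block_id not in sealed_blocks:  # condition 1: block must be sealed
        return False
    return record.position <= current_position  # condition 3: not above CURRENT


sealed = {"blk-1"}
visible = is_visible(LogRecord(3, "blk-1"), sealed, current_position=5)
above_current = is_visible(LogRecord(7, "blk-1"), sealed, current_position=5)
unsealed = is_visible(LogRecord(2, "blk-2"), sealed, current_position=5)
```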
|
||||
|
||||
---
|
||||
|
||||
## 8. Snapshot and Log Semantics
|
||||
|
||||
* Snapshots act as **checkpoints**, not full state representations.
|
||||
* Index state at any time:
|
||||
|
||||
```
|
||||
Index(CURRENT) = Index(snapshot) + replay(log)
|
||||
```
|
||||
|
||||
* Replay is strictly ordered, deterministic, and idempotent.
|
||||
* Snapshot and log entries are semantically equivalent once replayed.
|
||||
|
||||
---
|
||||
|
||||
## 9. Immutability and Shadowing
|
||||
|
||||
### 9.1 Immutability
|
||||
|
||||
* Index entries are never mutated.
|
||||
* Once visible, an entry’s meaning never changes.
|
||||
* Blocks referenced by entries are immutable.
|
||||
|
||||
### 9.2 Shadowing
|
||||
|
||||
* Later entries may shadow earlier entries with the same `ArtifactKey`.
|
||||
* Precedence is determined by log order.
|
||||
* Snapshot boundaries do not alter shadowing semantics.
|
||||
|
||||
---
|
||||
|
||||
## 10. Tombstones (Optional)
|
||||
|
||||
* Tombstone entries are allowed to invalidate prior mappings.
|
||||
* Semantics:
|
||||
|
||||
* Shadows previous entries for the same `ArtifactKey`.
|
||||
* Visibility follows the same rules as regular entries.
|
||||
* Existence and encoding of tombstones are optional.
|
||||
|
||||
---
|
||||
|
||||
## 11. Determinism Guarantees
|
||||
|
||||
For fixed:
|
||||
|
||||
* Snapshot
|
||||
* Log prefix
|
||||
* ASL configuration
|
||||
* Hash algorithm
|
||||
|
||||
The index guarantees:
|
||||
|
||||
* Deterministic lookup results
|
||||
* Deterministic shadowing resolution
|
||||
* Deterministic visibility
|
||||
|
||||
No nondeterministic input may influence index semantics.
|
||||
|
||||
---
|
||||
|
||||
## 12. Separation of Concerns
|
||||
|
||||
* **ASL-CORE**: Defines artifact semantics and the existence of blocks as storage atoms.
|
||||
* **ASL-CORE-INDEX**: Defines how artifact keys map to blocks, offsets, and lengths.
|
||||
* **ASL-STORE-INDEX**: Defines lifecycle, replay, and visibility guarantees.
|
||||
* **ENC-ASL-CORE-INDEX**: Defines exact bytes-on-disk representation.
|
||||
|
||||
Index semantics **do not** prescribe:
|
||||
|
||||
* Block allocation
|
||||
* Packing strategies
|
||||
* Performance optimizations
|
||||
* Memory residency or caching
|
||||
|
||||
---
|
||||
|
||||
## 13. Normative Invariants
|
||||
|
||||
All conforming implementations must enforce:
|
||||
|
||||
1. No visibility without a log record.
|
||||
2. No mutation of visible index entries.
|
||||
3. No mutation of sealed blocks.
|
||||
4. Shadowing follows strict log order.
|
||||
5. Replay of snapshot + log uniquely defines CURRENT.
|
||||
6. ArtifactLocation always resolves to immutable bytes.
|
||||
|
||||
Violation of any invariant constitutes index corruption.
|
||||
|
||||
---
|
||||
|
||||
## 14. Non-Goals (Explicit)
|
||||
|
||||
ASL-CORE-INDEX does **not** define:
|
||||
|
||||
* Disk layout or encoding
|
||||
* Segment structure, sharding, or bloom filters
|
||||
* GC policies or memory management
|
||||
* Small vs. large block packing
|
||||
* Federation or provenance mechanics
|
||||
|
||||
---
|
||||
|
||||
## 15. Relationship to Other Specifications
|
||||
|
||||
| Layer | Responsibility |
|
||||
| ------------------ | ---------------------------------------------------------- |
|
||||
| ASL-CORE | Defines artifact semantics and existence of blocks |
|
||||
| ASL-CORE-INDEX | Defines semantic mapping of ArtifactKey → ArtifactLocation |
|
||||
| ASL-STORE-INDEX | Defines store contracts to realize index semantics |
|
||||
| ENC-ASL-CORE-INDEX | Defines exact encoding on disk |
|
||||
|
||||
---
|
||||
|
||||
## 16. Summary
|
||||
|
||||
The ASL index:
|
||||
|
||||
* Maps artifact identities to block locations deterministically
|
||||
* Is immutable once entries are visible
|
||||
* Resolves visibility via snapshots + append-only log
|
||||
* Supports optional tombstones
|
||||
* Provides a stable substrate for store, encoding, and higher layers like PEL
|
||||
|
||||
It answers **exactly one question**:
|
||||
|
||||
> *Given an artifact identity and a point in time, where are the bytes?*
|
||||
|
||||
Nothing more, nothing less.
|
||||
|
||||
|
||||
|
|
@@ -138,7 +138,7 @@ It ensures **determinism, traceability, and reproducibility** across federated d
|
|||
| ASL-CORE | Blocks and artifacts remain immutable; no change |
|
||||
| ASL-CORE-INDEX | Artifact → Block mapping is domain-local; published artifacts are indexed across domains |
|
||||
| ASL-STORE-INDEX | Sealing, retention, and snapshot pinning apply per domain; GC respects cross-domain references |
|
||||
| ENC-ASL-CORE-INDEX | Encoding of index entries may include domain and visibility flags for federation |
|
||||
| ENC-ASL-CORE-INDEX | Encoding of index entries may include domain and visibility flags for federation (`tier1/enc-asl-core-index.md`) |
|
||||
| PEL | DAG execution may include imported artifacts; determinism guaranteed per domain snapshot |
|
||||
| PEL-PROV / PEL-TRACE | Maintains provenance including cross-domain artifact lineage |
|
||||
|
||||
|
|
@@ -155,5 +155,3 @@ The Federation Specification formalizes:
|
|||
* Integration with index, store, PEL, and provenance layers
|
||||
|
||||
It ensures **multi-domain determinism, traceability, and reproducibility** while leaving semantics and storage-layer policies unchanged.
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@@ -1,439 +0,0 @@
|
|||
# ASL-STORE-INDEX
|
||||
|
||||
### Store Semantics and Contracts for ASL Index
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
This document defines the **store-level responsibilities and contracts** required to implement the ASL-CORE-INDEX semantics.
|
||||
|
||||
It bridges the gap between **index meaning** and **physical storage**, ensuring:
|
||||
|
||||
* Deterministic replay
|
||||
* Snapshot-aware visibility
|
||||
* Immutable block guarantees
|
||||
* Idempotent recovery
|
||||
* Correctness of CURRENT state
|
||||
|
||||
It does **not** define exact encoding, memory layout, or acceleration structures (see ENC-ASL-CORE-INDEX).
|
||||
|
||||
---
|
||||
|
||||
## 2. Scope
|
||||
|
||||
This specification covers:
|
||||
|
||||
* Index segment lifecycle
|
||||
* Interaction between index and ASL blocks
|
||||
* Append-only log semantics
|
||||
* Snapshot integration
|
||||
* Visibility and lookup rules
|
||||
* Crash safety and recovery
|
||||
* Garbage collection constraints
|
||||
|
||||
It does **not** cover:
|
||||
|
||||
* Disk format details
|
||||
* Bloom filter algorithms
|
||||
* File system specifics
|
||||
* Placement heuristics beyond semantic guarantees
|
||||
|
||||
---
|
||||
|
||||
## 3. Core Concepts
|
||||
|
||||
### 3.1 Index Segment
|
||||
|
||||
A **segment** is a contiguous set of index entries written by the store.
|
||||
|
||||
* Open while accepting new entries
|
||||
* Sealed when closed for append
|
||||
* Sealed segments are immutable
|
||||
* Sealed segments are **snapshot-visible only after their seal record is written to the log**
|
||||
|
||||
Segments are the **unit of persistence, replay, and GC**.
|
||||
|
||||
---
|
||||
|
||||
### 3.2 ASL Block Relationship
|
||||
|
||||
Each index entry references a **sealed block** via:
|
||||
|
||||
```
|
||||
ArtifactKey → (BlockID, offset, length)
|
||||
```
|
||||
|
||||
* The store must ensure the block is sealed before the entry becomes log-visible
|
||||
* Blocks are immutable after seal
|
||||
* Open blocks may be abandoned without violating invariants
|
||||
|
||||
---
|
||||
|
||||
### 3.3 Append-Only Log
|
||||
|
||||
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
|
||||
|
||||
* Entries include index additions, tombstones, and segment seals
|
||||
* Log is durable and replayable
|
||||
* Log defines visibility above checkpoint snapshots
|
||||
|
||||
**CURRENT state** is derived as:
|
||||
|
||||
```
|
||||
CURRENT = checkpoint_state + replay(log)
|
||||
```
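
The derivation of CURRENT can be sketched as a fold over the log. This is an illustration under stated assumptions: the record kinds `"add"` and `"tombstone"`, the `TOMBSTONE` marker, and the tuple layout are hypothetical, and segment-seal records are left out for brevity.

```python
TOMBSTONE = None  # assumed marker for an invalidated mapping


def replay(checkpoint_state, log):
    """Apply log records in order; later records shadow earlier ones."""
    current = dict(checkpoint_state)  # never mutate the checkpoint itself
    for kind, key, location in log:
        if kind == "add":
            current[key] = location
        elif kind == "tombstone":
            current[key] = TOMBSTONE
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return current


checkpoint = {"k1": ("blk-1", 0, 4)}
log = [
    ("add", "k2", ("blk-2", 0, 8)),
    ("add", "k1", ("blk-3", 16, 4)),  # shadows the checkpoint entry for k1
    ("tombstone", "k2", None),        # invalidates k2
]
current = replay(checkpoint, log)
```

Because the function depends only on its two inputs, the same checkpoint and log prefix always yield the same CURRENT.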
|
||||
|
||||
---
|
||||
|
||||
## 4. Segment Lifecycle
|
||||
|
||||
### 4.1 Creation
|
||||
|
||||
* Open segment is allocated
|
||||
* Index entries appended in log order
|
||||
* Entries are invisible until segment seal and log append
|
||||
|
||||
### 4.2 Seal
|
||||
|
||||
* Segment is closed to append
|
||||
* Seal record is written to append-only log
|
||||
* Segment becomes visible for lookup
|
||||
* Sealed segment may be snapshot-pinned
|
||||
|
||||
### 4.3 Snapshot Interaction
|
||||
|
||||
* Snapshots capture sealed segments
|
||||
* Open segments need not survive snapshot
|
||||
* Segments below snapshot are replay anchors
|
||||
|
||||
### 4.4 Garbage Collection
|
||||
|
||||
* Only **sealed and unreachable segments** can be deleted
|
||||
* GC operates at segment granularity
|
||||
* GC must not break CURRENT or violate invariants
|
||||
|
||||
---
|
||||
|
||||
## 5. Lookup Semantics
|
||||
|
||||
To resolve an `ArtifactKey`:
|
||||
|
||||
1. Identify all visible segments ≤ CURRENT
|
||||
2. Search segments in **reverse creation order** (newest first)
|
||||
3. Return the first matching entry
|
||||
4. Respect tombstone entries (if present)
|
||||
|
||||
Lookups may use memory-mapped structures, bloom filters, sharding, or SIMD, **but correctness must be independent of acceleration strategies**.
|
||||
|
||||
---
|
||||
|
||||
## 6. Visibility Guarantees
|
||||
|
||||
* Entry visible **iff**:
|
||||
|
||||
* The block is sealed
|
||||
* Log record exists ≤ CURRENT
|
||||
* Segment seal recorded in log
|
||||
* Entries above CURRENT or referencing unsealed blocks are invisible
|
||||
|
||||
---
|
||||
|
||||
## 7. Crash and Recovery Semantics
|
||||
|
||||
### 7.1 Crash During Open Segment
|
||||
|
||||
* Open segments may be lost
|
||||
* Index entries may be leaked
|
||||
* No sealed segment may be corrupted
|
||||
|
||||
### 7.2 Recovery Procedure
|
||||
|
||||
1. Mount latest checkpoint snapshot
|
||||
2. Replay append-only log from checkpoint
|
||||
3. Rebuild CURRENT
|
||||
4. Resume normal operation
|
||||
|
||||
Recovery must be **deterministic and idempotent**.
|
||||
|
||||
---
|
||||
|
||||
## 8. Tombstone Semantics
|
||||
|
||||
* Optional: tombstones may exist to invalidate prior mappings
|
||||
* Tombstones shadow prior entries with the same `ArtifactKey`
|
||||
* Tombstone visibility follows the same rules as regular entries
|
||||
|
||||
---
|
||||
|
||||
## 9. Invariants (Normative)
|
||||
|
||||
The store **must enforce**:
|
||||
|
||||
1. No segment visible without seal log record
|
||||
2. No mutation of sealed segment or block
|
||||
3. Shadowing follows log order strictly
|
||||
4. Replay uniquely reconstructs CURRENT
|
||||
5. GC does not remove segments referenced by snapshot or log
|
||||
6. ArtifactLocation always points to immutable bytes
|
||||
|
||||
---
|
||||
|
||||
## 10. Non-Goals
|
||||
|
||||
ASL-STORE-INDEX does **not** define:
|
||||
|
||||
* Disk layout or encoding (ENC-ASL-CORE-INDEX)
|
||||
* Placement heuristics (small vs. large block packing)
|
||||
* Performance targets
|
||||
* Memory caching strategies
|
||||
* Federation or provenance mechanics
|
||||
|
||||
---
|
||||
|
||||
## 11. Relationship to Other Documents
|
||||
|
||||
| Layer | Responsibility |
|
||||
| ------------------ | -------------------------------------------------------------------- |
|
||||
| ASL-CORE-INDEX | Defines semantic meaning of mapping `ArtifactKey → ArtifactLocation` |
|
||||
| ASL-STORE-INDEX | Defines contracts for store to realize those semantics |
|
||||
| ENC-ASL-CORE-INDEX | Defines bytes-on-disk format |
|
||||
|
||||
---
|
||||
|
||||
## 12. Summary
|
||||
|
||||
The store-index layer guarantees:
|
||||
|
||||
* Immutable, snapshot-safe segments
|
||||
* Deterministic and idempotent replay
|
||||
* Correct visibility semantics
|
||||
* Safe crash recovery
|
||||
* Garbage collection constraints
|
||||
|
||||
This specification ensures that **ASL-CORE-INDEX semantics are faithfully realized in the store** without constraining encoding or acceleration strategies.
|
||||
|
||||
Here’s a **fully refined version of ASL-STORE-INDEX**, incorporating **block lifecycle, sealing, snapshot safety, retention, and GC rules**, fully aligned with ASL-CORE-INDEX semantics. This makes the store layer complete and unambiguous.
|
||||
|
||||
---
|
||||
|
||||
# ASL-STORE-INDEX
|
||||
|
||||
### Store Semantics and Contracts for ASL Core Index (Refined)
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
This document defines the **operational and store-level semantics** necessary to implement ASL-CORE-INDEX.
|
||||
|
||||
It specifies:
|
||||
|
||||
* **Block lifecycle**: creation, sealing, retention
|
||||
* **Index segment lifecycle**: creation, append, seal, visibility
|
||||
* **Snapshot interaction**: pinning, deterministic visibility
|
||||
* **Append-only log semantics**
|
||||
* **Garbage collection rules**
|
||||
|
||||
It **does not define encoding** (see ENC-ASL-CORE-INDEX) or semantic mapping (see ASL-CORE-INDEX).
|
||||
|
||||
---
|
||||
|
||||
## 2. Scope
|
||||
|
||||
Covers:
|
||||
|
||||
* Lifecycle of **blocks** and **index entries**
|
||||
* Snapshot and CURRENT consistency guarantees
|
||||
* Deterministic replay and recovery
|
||||
* GC and tombstone semantics
|
||||
|
||||
Excludes:
|
||||
|
||||
* Disk-level encoding
|
||||
* Sharding strategies
|
||||
* Bloom filters or acceleration structures
|
||||
* Memory residency or caching
|
||||
* Federation or PEL semantics
|
||||
|
||||
---
|
||||
|
||||
## 3. Core Concepts
|
||||
|
||||
### 3.1 Block
|
||||
|
||||
* **Definition:** Immutable storage unit containing artifact bytes.
|
||||
* **Identifier:** BlockID (opaque, unique)
|
||||
* **Properties:**
|
||||
|
||||
* Once sealed, contents never change
|
||||
* Can be referenced by multiple artifacts
|
||||
* May be pinned by snapshots for retention
|
||||
* **Lifecycle Events:**
|
||||
|
||||
1. Creation: block allocated but contents may still be written
|
||||
2. Sealing: block is finalized, immutable, and log-visible
|
||||
3. Retention: block remains accessible while pinned by snapshots or needed by CURRENT
|
||||
4. Garbage collection: block may be deleted if no longer referenced and unpinned
|
||||
|
||||
---
|
||||
|
||||
### 3.2 Index Segment
|
||||
|
||||
Segments group index entries and provide **persistence and recovery units**.
|
||||
|
||||
* **Open segment:** accepting new index entries, not visible for lookup
|
||||
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable
|
||||
* **Segment components:** header, optional bloom filter, index records, footer
|
||||
* **Segment visibility:** only after seal and log append
|
||||
|
||||
---
|
||||
|
||||
### 3.3 Append-Only Log
|
||||
|
||||
All store operations affecting index visibility are recorded in a **strictly ordered, append-only log**:
|
||||
|
||||
* Entries include:
|
||||
|
||||
* Index additions
|
||||
* Tombstones
|
||||
* Segment seals
|
||||
* Log is replayable to reconstruct CURRENT
|
||||
* Determinism: replay produces identical CURRENT from same snapshot and log prefix
|
||||
|
||||
---
|
||||
|
||||
## 4. Block Lifecycle Semantics
|
||||
|
||||
| Event | Description | Semantic Guarantees |
|
||||
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
|
||||
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
|
||||
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
|
||||
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
|
||||
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
|
||||
|
||||
**Notes:**
|
||||
|
||||
* Sealing ensures that any index entry referencing the block is deterministic and immutable.
|
||||
* Retention is driven by snapshot and log visibility rules.
|
||||
* GC must **never violate CURRENT reconstruction guarantees**.
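
The lifecycle table above can be read as a small state machine. The sketch below is only one possible reading: the `Block` class, its `BlockState` enum, and the `pins` set of snapshot ids are hypothetical names, not part of the specification.

```python
from enum import Enum


class BlockState(Enum):
    OPEN = "open"
    SEALED = "sealed"
    DELETED = "deleted"


class Block:
    """Hypothetical block tracking creation, sealing, retention, and GC."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.state = BlockState.OPEN
        self.data = bytearray()
        self.pins = set()  # ids of snapshots pinning this block

    def write(self, chunk):
        if self.state is not BlockState.OPEN:
            raise RuntimeError("only open blocks accept writes")
        self.data.extend(chunk)

    def seal(self):
        if self.state is not BlockState.OPEN:
            raise RuntimeError("block is not open")
        self.data = bytes(self.data)  # freeze contents
        self.state = BlockState.SEALED

    def collect(self, referenced_by_current):
        """Delete only if sealed, unpinned, and unreachable from CURRENT."""
        if self.state is not BlockState.SEALED or self.pins or referenced_by_current:
            return False
        self.state = BlockState.DELETED
        return True


block = Block("blk-1")
block.write(b"abc")
block.seal()
block.pins.add("snap-1")
retained = not block.collect(referenced_by_current=False)  # pinned, so kept
block.pins.clear()
collected = block.collect(referenced_by_current=False)
```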
|
||||
|
||||
---
|
||||
|
||||
## 5. Snapshot Interaction
|
||||
|
||||
* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
|
||||
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
|
||||
* CURRENT is reconstructed as:
|
||||
|
||||
```
|
||||
CURRENT = snapshot_state + replay(log)
|
||||
```
|
||||
|
||||
* Segment and block visibility rules:
|
||||
|
||||
| Entity | Visible in snapshot | Visible in CURRENT |
|
||||
| -------------------- | ---------------------------- | ------------------------------ |
|
||||
| Open segment/block | No | Only after seal and log append |
|
||||
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
|
||||
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
|
||||
|
||||
---
|
||||
|
||||
## 6. Index Lookup Semantics
|
||||
|
||||
To resolve an `ArtifactKey`:
|
||||
|
||||
1. Identify all visible segments ≤ CURRENT
|
||||
2. Search segments in **reverse creation order** (newest first)
|
||||
3. Return first matching entry
|
||||
4. Respect tombstones to shadow prior entries
|
||||
|
||||
Determinism:
|
||||
|
||||
* Lookup results are identical across platforms given the same snapshot and log prefix
|
||||
* Accelerations (bloom filters, sharding, SIMD) do **not alter correctness**
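
A minimal sketch of the lookup steps above, without any acceleration structure. It assumes each visible segment is modeled as a dict that already holds the latest entry per key, ordered oldest to newest; `lookup` and the `TOMBSTONE` sentinel are hypothetical names.

```python
TOMBSTONE = object()  # assumed sentinel for a tombstone entry


def lookup(key, visible_segments):
    """Search visible sealed segments newest-first and return the first match."""
    for segment in reversed(visible_segments):
        if key in segment:
            entry = segment[key]
            return None if entry is TOMBSTONE else entry
    return None  # no visible mapping for this key


segments = [
    {"k1": ("blk-1", 0, 4), "k2": ("blk-1", 4, 4)},  # oldest
    {"k1": ("blk-2", 0, 4)},                         # shadows k1
    {"k2": TOMBSTONE},                               # newest: invalidates k2
]
k1 = lookup("k1", segments)
k2 = lookup("k2", segments)
k3 = lookup("k3", segments)
```

A bloom filter or shard map could be added in front of the `key in segment` test to skip segments, but it should never change which entry is returned.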
|
||||
|
||||
---
|
||||
|
||||
## 7. Garbage Collection
|
||||
|
||||
* **Eligibility for GC:**
|
||||
|
||||
* Segments: sealed, no references from CURRENT or snapshots
|
||||
* Blocks: unpinned, unreferenced by any segment or artifact
|
||||
* **Rules:**
|
||||
|
||||
* GC is safe **only on sealed segments and blocks**
|
||||
* Must respect snapshot pins
|
||||
* Tombstones may aid in invalidating unreachable blocks
|
||||
* **Outcome:**
|
||||
|
||||
* GC never violates CURRENT reconstruction
|
||||
* Blocks can be reclaimed without breaking provenance
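
Segment eligibility can be expressed as a set difference. The sketch below is an assumption-laden illustration: `gc_eligible_segments` and its three inputs are hypothetical, and how a real store tracks references and pins is not defined here.

```python
def gc_eligible_segments(sealed_segments, current_refs, snapshot_pins):
    """Return sealed segments that neither CURRENT nor any snapshot references.

    sealed_segments: iterable of segment ids that have a seal record
    current_refs:    segment ids reachable from CURRENT
    snapshot_pins:   dict mapping snapshot id -> set of pinned segment ids
    """
    pinned = set().union(*snapshot_pins.values())
    return sorted(set(sealed_segments) - set(current_refs) - pinned)


eligible = gc_eligible_segments(
    sealed_segments=["seg-1", "seg-2", "seg-3", "seg-4"],
    current_refs={"seg-3", "seg-4"},
    snapshot_pins={"snap-a": {"seg-2"}},
)
```

Open segments never appear in `sealed_segments`, so they are never reported as eligible.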
|
||||
|
||||
---
|
||||
|
||||
## 8. Tombstone Semantics
|
||||
|
||||
* Optional marker to invalidate prior mappings
|
||||
* Visibility rules identical to regular index entries
|
||||
* Used to maintain deterministic CURRENT in the face of shadowing or deletions
|
||||
|
||||
---
|
||||
|
||||
## 9. Crash and Recovery Semantics
|
||||
|
||||
* Open segments or unsealed blocks may be lost; no invariant is broken
|
||||
* Recovery procedure:
|
||||
|
||||
1. Mount last checkpoint snapshot
|
||||
2. Replay append-only log
|
||||
3. Reconstruct CURRENT
|
||||
* Recovery is **deterministic and idempotent**
|
||||
* Segments and blocks are **never partially visible** after a crash
|
||||
|
||||
---
|
||||
|
||||
## 10. Normative Invariants
|
||||
|
||||
1. Sealed blocks are immutable
|
||||
2. Index entries referencing blocks are immutable once visible
|
||||
3. Shadowing follows strict log order
|
||||
4. Replay of snapshot + log uniquely reconstructs CURRENT
|
||||
5. GC cannot remove blocks or segments needed by snapshot or CURRENT
|
||||
6. Tombstones shadow prior entries without deleting underlying blocks prematurely
|
||||
|
||||
---
|
||||
|
||||
## 11. Non-Goals
|
||||
|
||||
* Disk-level encoding (ENC-ASL-CORE-INDEX)
|
||||
* Memory layout or caching
|
||||
* Sharding or performance heuristics
|
||||
* Federation / multi-domain semantics (handled elsewhere)
|
||||
* Block packing strategies (small vs large blocks)
|
||||
|
||||
---
|
||||
|
||||
## 12. Relationship to Other Layers
|
||||
|
||||
| Layer | Responsibility |
|
||||
| ------------------ | ---------------------------------------------------------------------------- |
|
||||
| ASL-CORE | Artifact semantics, existence of blocks, immutability |
|
||||
| ASL-CORE-INDEX | Semantic mapping of ArtifactKey → ArtifactLocation |
|
||||
| ASL-STORE-INDEX | Lifecycle and operational contracts for blocks and segments |
|
||||
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
|
||||
|
||||
---
|
||||
|
||||
## 13. Summary
|
||||
|
||||
The refined ASL-STORE-INDEX:
|
||||
|
||||
* Defines **block lifecycle**: creation, sealing, retention, GC
|
||||
* Ensures **snapshot safety** and deterministic visibility
|
||||
* Guarantees **immutable, replayable, and recoverable CURRENT**
|
||||
* Provides operational contracts to faithfully implement ASL-CORE-INDEX semantics
|
||||
|
||||
|
||||
|
|
@ -1,5 +1,7 @@
|
|||
# ENC-ASL-CORE-INDEX ADDENDUM: Federation Encoding
|
||||
|
||||
Base spec: `tier1/enc-asl-core-index.md`
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
|
@ -109,5 +111,3 @@ This addendum updates **ENC-ASL-CORE-INDEX** to support **federation**:
|
|||
* Maintains backward compatibility with legacy segments
|
||||
|
||||
It integrates federation metadata **without altering the underlying block or artifact encoding**, preserving deterministic execution and PEL provenance.
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -33,7 +33,7 @@
|
|||
│ • Block sealing │
|
||||
│ • Retention / GC │
|
||||
│ • Small/Large packing │
|
||||
│ - ENC-ASL-CORE-INDEX │
|
||||
│ - ENC-ASL-CORE-INDEX (tier1/enc-asl-core-index.md) │
|
||||
│ • On-disk record layout│
|
||||
│ • Domain / visibility │
|
||||
└─────────────┬──────────────┘
|
||||
|
|
@ -143,5 +143,3 @@ This diagram and flow description captures:
|
|||
* Deterministic reconstruction from **checkpoint + append-only log**
|
||||
* Block semantics, small/large handling, and domain visibility
|
||||
* Integration of **execution receipts** into artifact flows and traces
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -182,7 +182,7 @@ ASL-CORE-INDEX
|
|||
ASL-STORE-INDEX
|
||||
└─ Store lifecycle & snapshot safety
|
||||
|
||||
ENC-ASL-CORE-INDEX
|
||||
ENC-ASL-CORE-INDEX (tier1/enc-asl-core-index.md)
|
||||
└─ Bytes-on-disk encoding
|
||||
|
||||
ASL-INDEX-ACCEL ← NEW
|
||||
|
|
@ -214,4 +214,3 @@ If you want next, I can:
|
|||
|
||||
* Draft **ASL-INDEX-ACCEL**
|
||||
* Or rewrite **ASL-CORE-INDEX with Canonical vs Routing fully integrated**
|
||||
|
||||
|
|
|
|||
211
tier1/asl-core-index.md
Normal file
|
|
@ -0,0 +1,211 @@
|
|||
# ASL/1-CORE-INDEX — Semantic Index Model
|
||||
|
||||
Status: Draft
|
||||
Owner: Niklas Rydberg
|
||||
Version: 0.1.0
|
||||
SoT: No
|
||||
Last Updated: 2025-11-16
|
||||
Tags: [deterministic, index, semantics]
|
||||
|
||||
**Document ID:** `ASL/1-CORE-INDEX`
|
||||
**Layer:** L0.5 — Semantic mapping over ASL/1-CORE values (no storage / encoding / lifecycle)
|
||||
|
||||
**Depends on (normative):**
|
||||
|
||||
* `ASL/1-CORE`
|
||||
* `ASL/1-STORE`
|
||||
|
||||
**Informative references:**
|
||||
|
||||
* `ASL-STORE-INDEX` — store lifecycle and replay contracts
|
||||
* `ENC-ASL-CORE-INDEX` — bytes-on-disk encoding profile (`tier1/enc-asl-core-index.md`)
|
||||
* `ASL/INDEX-ACCEL/1` — acceleration semantics (routing, filters, sharding)
|
||||
* `ASL/LOG/1` — append-only semantic log (segment visibility)
|
||||
|
||||
---
|
||||
|
||||
## 0. Conventions
|
||||
|
||||
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
|
||||
|
||||
ASL/1-CORE-INDEX defines **semantic meaning only**. It does not define storage formats, on-disk encoding, or operational lifecycle. Those belong to ASL-STORE-INDEX, ASL/LOG/1, and ENC-ASL-CORE-INDEX.
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose & Non-Goals
|
||||
|
||||
### 1.1 Purpose
|
||||
|
||||
ASL/1-CORE-INDEX defines the **semantic model** for indexing artifacts:
|
||||
|
||||
* It specifies what it means to map an artifact identity to a byte location.
|
||||
* It defines visibility, immutability, and shadowing semantics.
|
||||
* It ensures deterministic lookup for a fixed snapshot and log prefix.
|
||||
|
||||
### 1.2 Non-goals
|
||||
|
||||
ASL/1-CORE-INDEX explicitly does **not** define:
|
||||
|
||||
* On-disk layouts, segment files, or memory representations.
|
||||
* Block allocation, packing, GC, or lifecycle rules.
|
||||
* Snapshot implementation details, checkpoints, or log storage.
|
||||
* Performance optimizations (bloom filters, sharding, SIMD).
|
||||
* Federation, provenance, or execution semantics.
|
||||
|
||||
---
|
||||
|
||||
## 2. Terminology
|
||||
|
||||
* **Artifact** — ASL/1 immutable value defined in ASL/1-CORE.
|
||||
* **Reference** — ASL/1 content address of an Artifact (hash_id + digest).
|
||||
* **StoreConfig** — `{ encoding_profile, hash_id }` fixed per StoreSnapshot (ASL/1-STORE).
|
||||
* **Block** — immutable storage unit containing artifact bytes.
|
||||
* **BlockID** — opaque identifier for a block.
|
||||
* **ArtifactExtent** — `(BlockID, offset, length)` identifying a byte slice within a block.
|
||||
* **ArtifactLocation** — ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
|
||||
* **Snapshot** — a checkpointed StoreSnapshot (ASL/1-STORE) used as a base state.
|
||||
* **Append-Only Log** — ordered sequence of index-visible mutations after a snapshot.
|
||||
* **CURRENT** — effective state after replaying a log prefix on a snapshot.
|
||||
|
||||
---
|
||||
|
||||
## 3. Core Mapping Semantics
|
||||
|
||||
### 3.1 Index Mapping
|
||||
|
||||
The index defines a semantic mapping:
|
||||
|
||||
```
|
||||
Reference -> ArtifactLocation
|
||||
```
|
||||
|
||||
For any visible `Reference`, there is exactly one `ArtifactLocation` at a given CURRENT state.
|
||||
|
||||
### 3.2 Determinism
|
||||
|
||||
For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be deterministic. No nondeterministic input may affect index semantics.
|
||||
|
||||
### 3.3 StoreConfig Consistency
|
||||
|
||||
All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`.
|
||||
|
||||
---
|
||||
|
||||
## 4. ArtifactLocation Semantics
|
||||
|
||||
* An ArtifactLocation is an **ordered list** of ArtifactExtents.
|
||||
* Each extent references immutable bytes within a block.
|
||||
* The artifact bytes are defined by **concatenating extents in order**.
|
||||
* A visible ArtifactLocation MUST be **non-empty** and MUST fully cover the artifact byte sequence with no gaps or extra bytes.
|
||||
* Extents MUST have `length > 0` and MUST reference valid byte ranges within their blocks.
|
||||
* Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact.
|
||||
* An ArtifactLocation is valid only while all referenced blocks are retained.
|
||||
* ASL/1-CORE-INDEX does not define how blocks are allocated or sealed; it only requires that referenced bytes are immutable for the lifetime of the mapping.
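The extent rules above can be sketched as a small resolver. This is an illustrative sketch only: the in-memory `blocks` dictionary, the string BlockIDs, and the function name `resolve_location` are assumptions made for the example, not part of the specification.

```python
from typing import Dict, List, NamedTuple


class ArtifactExtent(NamedTuple):
    block_id: str  # opaque BlockID; a string here purely for illustration
    offset: int
    length: int


def resolve_location(blocks: Dict[str, bytes],
                     location: List[ArtifactExtent]) -> bytes:
    """Concatenate extents in order, rejecting empty or out-of-range extents."""
    if not location:
        raise ValueError("ArtifactLocation must be non-empty")
    parts = []
    for ext in location:
        if ext.block_id not in blocks:
            raise KeyError(f"block {ext.block_id!r} is not retained")
        block = blocks[ext.block_id]
        if ext.length <= 0:
            raise ValueError("extent length must be > 0")
        if ext.offset < 0 or ext.offset + ext.length > len(block):
            raise ValueError("extent is outside the block's byte range")
        parts.append(block[ext.offset:ext.offset + ext.length])
    return b"".join(parts)


# The same BlockID may appear more than once; order alone defines the bytes.
blocks = {"b1": b"hello world", "b2": b"!!"}
location = [
    ArtifactExtent("b1", 0, 5),
    ArtifactExtent("b2", 0, 1),
    ArtifactExtent("b1", 5, 6),
]
artifact = resolve_location(blocks, location)
```

Whether the concatenated bytes really are the artifact (no gaps, no extra bytes) can only be confirmed against the artifact's Reference; that check is left out of this sketch.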
|
||||
|
||||
---
|
||||
|
||||
## 5. Visibility Model
|
||||
|
||||
An index entry is **visible** at CURRENT if and only if:
|
||||
|
||||
1. The entry is admitted in the ordered log prefix for CURRENT.
|
||||
2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules).
|
||||
|
||||
Visibility is binary; entries are either visible or not visible.
|
||||
|
||||
---
|
||||
|
||||
## 6. Snapshot and Log Semantics
|
||||
|
||||
Snapshots provide a base mapping; the append-only log defines subsequent changes.
|
||||
|
||||
The index state for a given CURRENT is defined as:
|
||||
|
||||
```
|
||||
Index(CURRENT) = Index(snapshot) + replay(log_prefix)
|
||||
```
|
||||
|
||||
Replay is strictly ordered, deterministic, and idempotent. Snapshot and log entries are semantically equivalent once replayed.
|
||||
|
||||
---
|
||||
|
||||
## 7. Immutability and Shadowing
|
||||
|
||||
### 7.1 Immutability
|
||||
|
||||
* Index entries are never mutated.
|
||||
* Once visible, an entry’s meaning does not change.
|
||||
* Referenced bytes are immutable for the lifetime of the entry.
|
||||
|
||||
### 7.2 Shadowing
|
||||
|
||||
* Later entries MAY shadow earlier entries with the same Reference.
|
||||
* Precedence is determined solely by log order.
|
||||
* Snapshot boundaries do not alter shadowing semantics.
|
||||
|
||||
---
|
||||
|
||||
## 8. Tombstones (Optional)
|
||||
|
||||
Tombstone entries MAY be used to invalidate prior mappings.
|
||||
|
||||
* A tombstone shadows earlier entries for the same Reference.
|
||||
* Visibility rules are identical to regular entries.
|
||||
* Encoding is optional and defined by ENC-ASL-CORE-INDEX if used.
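Snapshot-plus-log replay, shadowing, and tombstones can be illustrated together. The sketch below assumes a deliberately simplified model: References and locations are plain strings, a tombstone is represented by `None`, and the log prefix is a Python list in log order. None of these representations come from the specification.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # assumed sentinel: a tombstone maps a Reference to "no location"

# A log entry is (reference, location-or-TOMBSTONE); list order is log order.
LogEntry = Tuple[str, Optional[str]]


def replay(snapshot: Dict[str, Optional[str]],
           log_prefix: List[LogEntry]) -> Dict[str, Optional[str]]:
    """Compute Index(CURRENT) = Index(snapshot) + replay(log_prefix)."""
    state = dict(snapshot)  # copy, so the snapshot itself is never mutated
    for reference, location in log_prefix:
        state[reference] = location  # a later entry shadows any earlier one
    return state


def lookup(state: Dict[str, Optional[str]], reference: str) -> Optional[str]:
    """Return the visible location, or None if absent or tombstoned."""
    return state.get(reference)


snapshot = {"ref-a": "loc-1"}
log = [("ref-a", "loc-2"), ("ref-b", "loc-3"), ("ref-b", TOMBSTONE)]
current = replay(snapshot, log)
```

Replaying a shorter prefix (`log[:2]`) yields the state before the tombstone was admitted, which is the sense in which precedence depends solely on log order.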
|
||||
|
||||
---
|
||||
|
||||
## 9. Determinism Guarantees
|
||||
|
||||
For fixed:
|
||||
|
||||
* StoreConfig
|
||||
* Snapshot
|
||||
* Log prefix
|
||||
|
||||
ASL/1-CORE-INDEX guarantees:
|
||||
|
||||
* Deterministic lookup results
|
||||
* Deterministic shadowing resolution
|
||||
* Deterministic visibility
|
||||
|
||||
---
|
||||
|
||||
## 10. Normative Invariants
|
||||
|
||||
Conforming implementations MUST enforce:
|
||||
|
||||
1. No visibility without a log-admitted entry.
|
||||
2. No mutation of visible index entries.
|
||||
3. Referenced bytes remain immutable for the entry’s lifetime.
|
||||
4. Shadowing follows strict log order.
|
||||
5. Snapshot + log replay uniquely defines CURRENT.
|
||||
6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun).
|
||||
|
||||
Violation of any invariant constitutes index corruption.
|
||||
|
||||
---
|
||||
|
||||
## 11. Relationship to Other Specifications
|
||||
|
||||
| Layer | Responsibility |
|
||||
| ------------------ | ---------------------------------------------------------- |
|
||||
| ASL/1-CORE | Artifact semantics and identity |
|
||||
| ASL/1-STORE | StoreSnapshot and put/get logical model |
|
||||
| ASL/1-CORE-INDEX | Semantic mapping of Reference → ArtifactLocation |
|
||||
| ASL-STORE-INDEX | Lifecycle, replay, and visibility contracts |
|
||||
| ENC-ASL-CORE-INDEX | On-disk encoding for index segments and records |
|
||||
|
||||
---
|
||||
|
||||
## 12. Summary
|
||||
|
||||
ASL/1-CORE-INDEX specifies the semantic meaning of the index:
|
||||
|
||||
* It maps artifact References to byte locations deterministically.
|
||||
* It defines visibility and shadowing rules across snapshot + log replay.
|
||||
* It guarantees immutability and deterministic lookup.
|
||||
|
||||
It answers one question:
|
||||
|
||||
> *Given a Reference and a CURRENT state, where are the bytes?*
|
||||
272
tier1/asl-index-accel-1.md
Normal file
|
|
@ -0,0 +1,272 @@
|
|||
# ASL/INDEX-ACCEL/1 — Index Acceleration Semantics
|
||||
|
||||
Status: Draft
|
||||
Owner: Niklas Rydberg
|
||||
Version: 0.1.0
|
||||
SoT: No
|
||||
Last Updated: 2025-11-16
|
||||
Tags: [deterministic, index, acceleration]
|
||||
|
||||
**Document ID:** `ASL/INDEX-ACCEL/1`
|
||||
**Layer:** L1 — Acceleration rules over index semantics (no storage / encoding)
|
||||
|
||||
**Depends on (normative):**
|
||||
|
||||
* `ASL/1-CORE-INDEX`
|
||||
|
||||
**Informative references:**
|
||||
|
||||
* `ASL-STORE-INDEX` — store lifecycle and replay contracts
|
||||
* `ENC-ASL-CORE-INDEX` — bytes-on-disk encoding profile (`tier1/enc-asl-core-index.md`)
|
||||
|
||||
---
|
||||
|
||||
## 0. Conventions
|
||||
|
||||
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
|
||||
|
||||
ASL/INDEX-ACCEL/1 defines **acceleration semantics only**. It MUST NOT change index meaning defined by ASL/1-CORE-INDEX.
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
ASL/INDEX-ACCEL/1 defines **acceleration mechanisms** used by ASL-based indexes, including:
|
||||
|
||||
* Routing keys
|
||||
* Sharding
|
||||
* Filters (Bloom, XOR, Ribbon, etc.)
|
||||
* SIMD execution
|
||||
* Hash recasting
|
||||
|
||||
All mechanisms defined herein are **observationally invisible** to ASL/1-CORE-INDEX semantics.
|
||||
|
||||
---
|
||||
|
||||
## 2. Scope
|
||||
|
||||
Applies to:
|
||||
|
||||
* Artifact indexes (ASL)
|
||||
* Projection and graph indexes (e.g., TGK)
|
||||
* Any index layered on ASL/1-CORE-INDEX semantics
|
||||
|
||||
Does **not** define:
|
||||
|
||||
* Artifact or edge identity
|
||||
* Snapshot semantics
|
||||
* Storage lifecycle
|
||||
* Encoding details
|
||||
|
||||
---
|
||||
|
||||
## 3. Canonical Key vs Routing Key
|
||||
|
||||
### 3.1 Canonical Key
|
||||
|
||||
The **Canonical Key** uniquely identifies an indexable entity.
|
||||
|
||||
Examples:
|
||||
|
||||
* Artifact: `Reference`
|
||||
* TGK Edge: `CanonicalEdgeKey`
|
||||
|
||||
Properties:
|
||||
|
||||
* Defines semantic identity
|
||||
* Used for equality, shadowing, and tombstones
|
||||
* Stable and immutable
|
||||
* Fully compared on index match
|
||||
|
||||
### 3.2 Routing Key
|
||||
|
||||
The **Routing Key** is a **derived, advisory key** used exclusively for acceleration.
|
||||
|
||||
Properties:
|
||||
|
||||
* Derived deterministically from Canonical Key and optional attributes
|
||||
* MAY be used for sharding, filters, SIMD layouts
|
||||
* MUST NOT affect index semantics
|
||||
* MUST be verified by full Canonical Key comparison on match
|
||||
|
||||
Formal rule:
|
||||
|
||||
```
|
||||
CanonicalKey determines correctness
|
||||
RoutingKey determines performance
|
||||
```
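A minimal sketch of this split, assuming SHA-256 as the routing hash and a deliberately narrow 8-bit routing key so that collisions actually happen. The class name `RoutedIndex` and its methods are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def routing_key(canonical_key: bytes, bits: int = 8) -> int:
    """Derive a small advisory key; narrow on purpose so that keys collide."""
    digest = hashlib.sha256(canonical_key).digest()
    return int.from_bytes(digest[:4], "big") % (1 << bits)


class RoutedIndex:
    def __init__(self) -> None:
        self._buckets: Dict[int, List[Tuple[bytes, str]]] = {}

    def put(self, canonical_key: bytes, location: str) -> None:
        bucket = self._buckets.setdefault(routing_key(canonical_key), [])
        bucket.append((canonical_key, location))

    def get(self, canonical_key: bytes) -> Optional[str]:
        bucket = self._buckets.get(routing_key(canonical_key), [])
        # The routing key only narrows the search. A match is accepted
        # solely on a full Canonical Key comparison, newest entry first.
        for stored_key, location in reversed(bucket):
            if stored_key == canonical_key:
                return location
        return None
```

With 256 buckets and a thousand keys, most buckets hold several unrelated keys, yet every lookup still returns the right location because the bucket choice is never trusted as identity.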
|
||||
|
||||
---
|
||||
|
||||
## 4. Filter Semantics
|
||||
|
||||
### 4.1 Advisory Nature
|
||||
|
||||
All filters are **advisory only**.
|
||||
|
||||
Rules:
|
||||
|
||||
* False positives are permitted
|
||||
* False negatives are forbidden
|
||||
* Filter behavior MUST NOT affect correctness
|
||||
|
||||
Invariant:
|
||||
|
||||
```
|
||||
Filter miss => key is definitely absent
|
||||
Filter hit => key may be present
|
||||
```
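The invariant can be demonstrated with a toy Bloom filter. This is a sketch under stated assumptions: SHA-256 with a one-byte salt stands in for the hash family, and the sizes are arbitrary. ASL/INDEX-ACCEL/1 does not prescribe any particular filter algorithm.

```python
import hashlib
from typing import Dict, Iterator, Optional


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # One salted SHA-256 per hash function (an illustrative choice).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def filtered_lookup(segment: Dict[bytes, str], bloom: BloomFilter,
                    key: bytes) -> Optional[str]:
    if not bloom.might_contain(key):
        return None          # filter miss: the key is definitely absent
    return segment.get(key)  # filter hit: confirm against the real entries
```

Because every added key sets all of its bit positions, a stored key can never be reported as a miss; a false positive only costs one extra probe of the segment.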
|
||||
|
||||
### 4.2 Filter Inputs
|
||||
|
||||
Filters operate over **Routing Keys**, not Canonical Keys.
|
||||
|
||||
A Routing Key MAY incorporate:
|
||||
|
||||
* Hash of Canonical Key
|
||||
* Artifact type tag (if present)
|
||||
* TGK edge type key
|
||||
* Direction, role, or other immutable classification attributes
|
||||
|
||||
Absence of optional attributes MUST be encoded explicitly.
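One way to encode absence explicitly is a presence flag next to the optional field. The sketch below is an assumption-laden illustration: the 13-byte layout, the truncated SHA-256 hash, and the 32-bit width of `type_tag` are all invented for the example and are not defined by this document.

```python
import hashlib
import struct
from typing import Optional


def artifact_routing_key(reference: bytes,
                         type_tag: Optional[int] = None) -> bytes:
    """Pack H(Reference), a presence flag, and the type tag into 13 bytes."""
    h = hashlib.sha256(reference).digest()[:8]
    has_typetag = 1 if type_tag is not None else 0
    tag_value = type_tag if type_tag is not None else 0
    # ">8sBI": 8 hash bytes, 1 presence byte, 4-byte big-endian tag value.
    return struct.pack(">8sBI", h, has_typetag, tag_value)


untagged = artifact_routing_key(b"ref-1")
tagged_zero = artifact_routing_key(b"ref-1", type_tag=0)
```

Without the presence byte, an absent tag and a tag of value zero would produce the same routing key, which is exactly the ambiguity the rule above forbids.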
|
||||
|
||||
### 4.3 Filter Construction
|
||||
|
||||
* Filters are built only over **sealed, immutable segments**
|
||||
* Filters are immutable once built
|
||||
* Filter construction MUST be deterministic
|
||||
* Filter state MUST be covered by segment checksums
|
||||
|
||||
---
|
||||
|
||||
## 5. Sharding Semantics
|
||||
|
||||
### 5.1 Observational Invisibility
|
||||
|
||||
Sharding is a **mechanical partitioning** of the index.
|
||||
|
||||
Invariant:
|
||||
|
||||
```
|
||||
LogicalIndex = union(all shards)
|
||||
```
|
||||
|
||||
Rules:
|
||||
|
||||
* Shards MUST NOT affect lookup results
|
||||
* Shard count and boundaries MAY change over time
|
||||
* Rebalancing MUST preserve lookup semantics
|
||||
|
||||
### 5.2 Shard Assignment
|
||||
|
||||
Shard assignment MAY be based on:
|
||||
|
||||
* Hash of Canonical Key
|
||||
* Routing Key
|
||||
* Composite routing strategies
|
||||
|
||||
Shard selection MUST be deterministic per snapshot.
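The sharding invariant can be checked mechanically. This sketch assumes hash-modulo assignment over SHA-256 and plain dictionaries as shards; the function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(canonical_key: bytes, shard_count: int) -> int:
    """Deterministic assignment: same key and shard count, same shard."""
    digest = hashlib.sha256(canonical_key).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(entries: Dict[bytes, str],
                 shard_count: int) -> List[Dict[bytes, str]]:
    shards: List[Dict[bytes, str]] = [{} for _ in range(shard_count)]
    for key, location in entries.items():
        shards[shard_for(key, shard_count)][key] = location
    return shards


def sharded_lookup(shards: List[Dict[bytes, str]],
                   key: bytes) -> Optional[str]:
    return shards[shard_for(key, len(shards))].get(key)


def logical_index(shards: List[Dict[bytes, str]]) -> Dict[bytes, str]:
    """LogicalIndex = union(all shards)."""
    merged: Dict[bytes, str] = {}
    for shard in shards:
        merged.update(shard)
    return merged
```

Rebuilding with a different `shard_count` models rebalancing: the physical partitioning changes, while every lookup and the union of all shards stay the same.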
|
||||
|
||||
---
|
||||
|
||||
## 6. Hashing and Hash Recasting
|
||||
|
||||
### 6.1 Hashing
|
||||
|
||||
Hashes MAY be used for routing, filtering, or SIMD layout.
|
||||
|
||||
Hashes MUST NOT be treated as identity.
|
||||
|
||||
### 6.2 Hash Recasting
|
||||
|
||||
Hash recasting (changing hash functions or seeds) is permitted if:
|
||||
|
||||
1. It is deterministic
|
||||
2. It does not change Canonical Keys
|
||||
3. It does not affect index semantics
|
||||
|
||||
Recasting is equivalent to rebuilding acceleration structures.
|
||||
|
||||
---
|
||||
|
||||
## 7. SIMD Execution
|
||||
|
||||
SIMD operations MAY be used to:
|
||||
|
||||
* Evaluate filters
|
||||
* Compare routing keys
|
||||
* Accelerate scans
|
||||
|
||||
Rules:
|
||||
|
||||
* SIMD MUST operate only on immutable data
|
||||
* SIMD MUST NOT short-circuit semantic checks
|
||||
* SIMD MUST preserve deterministic behavior
|
||||
|
||||
---
|
||||
|
||||
## 8. Multi-Dimensional Routing Examples (Normative)
|
||||
|
||||
### 8.1 Artifact Index
|
||||
|
||||
* Canonical Key: `Reference`
|
||||
* Routing Key components:
|
||||
|
||||
* `H(Reference)`
|
||||
* `type_tag` (if present)
|
||||
* `has_typetag`
|
||||
|
||||
### 8.2 TGK Edge Index
|
||||
|
||||
* Canonical Key: `CanonicalEdgeKey`
|
||||
* Routing Key components:
|
||||
|
||||
* `H(CanonicalEdgeKey)`
|
||||
* `edge_type_key`
|
||||
* Direction or role (optional)
|
||||
|
||||
---
|
||||
|
||||
## 9. Snapshot Interaction
|
||||
|
||||
Acceleration structures:
|
||||
|
||||
* MUST respect snapshot visibility rules
|
||||
* MUST operate over the same sealed segments visible to the snapshot
|
||||
* MUST NOT bypass tombstones or shadowing
|
||||
|
||||
Snapshot cuts apply **after** routing and filtering.
|
||||
|
||||
---
|
||||
|
||||
## 10. Normative Invariants
|
||||
|
||||
1. Canonical Keys define identity and correctness
|
||||
2. Routing Keys are advisory only
|
||||
3. Filters may never introduce false negatives
|
||||
4. Sharding is observationally invisible
|
||||
5. Hashes are not identity
|
||||
6. SIMD is an execution strategy, not a semantic construct
|
||||
7. All acceleration is deterministic per snapshot
|
||||
|
||||
---
|
||||
|
||||
## 11. Non-Goals
|
||||
|
||||
ASL/INDEX-ACCEL/1 does not define:
|
||||
|
||||
* Specific filter algorithms
|
||||
* Memory layout
|
||||
* CPU instruction selection
|
||||
* Encoding formats
|
||||
* Federation policies
|
||||
|
||||
---
|
||||
|
||||
## 12. Summary
|
||||
|
||||
ASL/INDEX-ACCEL/1 establishes a strict contract:
|
||||
|
||||
> All acceleration exists to make the index faster, never different.
|
||||
|
||||
It formalizes Canonical vs Routing keys and constrains filters, sharding, hashing, and SIMD so that correctness is preserved under all optimizations.
|
||||
207
tier1/asl-log-1.md
Normal file
|
|
@ -0,0 +1,207 @@
|
|||
# ASL/LOG/1 — Append-Only Semantic Log
|
||||
|
||||
Status: Draft
|
||||
Owner: Niklas Rydberg
|
||||
Version: 0.1.0
|
||||
SoT: No
|
||||
Last Updated: 2025-11-16
|
||||
Tags: [deterministic, log, snapshot]
|
||||
|
||||
**Document ID:** `ASL/LOG/1`
|
||||
**Layer:** L1 — Domain log semantics (no transport)
|
||||
|
||||
**Depends on (normative):**
|
||||
|
||||
* `ASL-STORE-INDEX`
|
||||
|
||||
**Informative references:**
|
||||
|
||||
* `ASL/1-CORE-INDEX` — index semantics
|
||||
* `ENC-ASL-LOG` — bytes-on-disk encoding profile (if defined)
|
||||
* `ENC-ASL-CORE-INDEX` — index segment encoding (`tier1/enc-asl-core-index.md`)
|
||||
|
||||
---
|
||||
|
||||
## 0. Conventions
|
||||
|
||||
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
|
||||
|
||||
ASL/LOG/1 defines **semantic log behavior**. It does not define transport, replication protocols, or storage layout.
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
ASL/LOG/1 defines the **authoritative, append-only log** for an ASL domain.
|
||||
|
||||
The log records **semantic commits** that affect:
|
||||
|
||||
* Index segment visibility
|
||||
* Tombstone policy
|
||||
* Snapshot anchoring
|
||||
* Optional publication metadata
|
||||
|
||||
The log is the **sole source of truth** for reconstructing CURRENT state.
|
||||
|
||||
---
|
||||
|
||||
## 2. Core Properties (Normative)
|
||||
|
||||
An ASL log MUST be:
|
||||
|
||||
1. Append-only
|
||||
2. Strictly ordered
|
||||
3. Deterministically replayable
|
||||
4. Hash-chained
|
||||
5. Snapshot-anchorable
|
||||
6. Forward-compatible
|
||||
|
||||
---
|
||||
|
||||
## 3. Log Model
|
||||
|
||||
### 3.1 Log Sequence
|
||||
|
||||
Each record has a monotonically increasing `logseq`:
|
||||
|
||||
```
|
||||
logseq: uint64
|
||||
```
|
||||
|
||||
* Assigned by the domain authority
|
||||
* Total order within a domain
|
||||
* Never reused
|
||||
|
||||
### 3.2 Hash Chain
|
||||
|
||||
Each record commits to the previous record:
|
||||
|
||||
```
|
||||
record_hash = H(prev_record_hash || record_type || payload)
|
||||
```
|
||||
|
||||
This enables tamper detection, witness signing, and federation verification.
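A hash chain of this shape can be sketched as follows. Several details are assumptions made for the example and are not fixed by ASL/LOG/1: SHA-256 as `H`, an all-zero hash before the first record, a 4-byte big-endian `record_type`, and plain concatenation without length prefixes.

```python
import hashlib
from typing import List, Tuple

GENESIS = b"\x00" * 32  # assumed prev_record_hash for the first record

Record = Tuple[int, bytes]  # (record_type, payload)


def record_hash(prev_hash: bytes, record_type: int, payload: bytes) -> bytes:
    # record_hash = H(prev_record_hash || record_type || payload)
    # The 4-byte big-endian framing of record_type is an assumption.
    framed = prev_hash + record_type.to_bytes(4, "big") + payload
    return hashlib.sha256(framed).digest()


def chain(records: List[Record]) -> List[bytes]:
    """Return the record_hash of every record, in log order."""
    hashes: List[bytes] = []
    prev = GENESIS
    for record_type, payload in records:
        prev = record_hash(prev, record_type, payload)
        hashes.append(prev)
    return hashes


def verify(records: List[Record], hashes: List[bytes]) -> bool:
    """Recompute the chain and compare it with the recorded hashes."""
    return len(records) == len(hashes) and chain(records) == hashes
```

Because each hash feeds into the next, changing any record changes its own hash and every hash after it, so a verifier holding only the latest hash can detect tampering anywhere earlier in the log.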
|
||||
|
||||
---
|
||||
|
||||
## 4. Record Types (Normative)
|
||||
|
||||
### 4.1 SEGMENT_SEAL
|
||||
|
||||
Declares an index segment visible.
|
||||
|
||||
Semantics:
|
||||
|
||||
* From this `logseq` onward, the referenced segment is visible for lookup and replay.
|
||||
* Segment MUST be immutable.
|
||||
* All referenced blocks MUST already be sealed.
|
||||
* Segment contents are not re-logged.
|
||||
|
||||
### 4.2 TOMBSTONE
|
||||
|
||||
Declares an artifact inadmissible under domain policy.
|
||||
|
||||
Semantics:
|
||||
|
||||
* Does not delete data.
|
||||
* Shadows prior visibility.
|
||||
* Applies from this logseq onward.
|
||||
|
||||
### 4.3 TOMBSTONE_LIFT
|
||||
|
||||
Supersedes a previous tombstone.
|
||||
|
||||
Semantics:
|
||||
|
||||
* References an earlier TOMBSTONE.
|
||||
* Does not erase history.
|
||||
* Only affects CURRENT at or above this logseq.
|
||||
|
||||
### 4.4 SNAPSHOT_ANCHOR
|
||||
|
||||
Binds semantic state to a snapshot.
|
||||
|
||||
Semantics:
|
||||
|
||||
* Defines a replay checkpoint.
|
||||
* Enables log truncation below the anchor, subject to the garbage collection constraints in Section 7.
|
||||
|
||||
### 4.5 ARTIFACT_PUBLISH (Optional)
|
||||
|
||||
Marks an artifact as published.
|
||||
|
||||
Semantics:
|
||||
|
||||
* Publication is domain-local.
|
||||
* Federation layers may interpret this metadata.
|
||||
|
||||
### 4.6 ARTIFACT_UNPUBLISH (Optional)
|
||||
|
||||
Withdraws publication.
|
||||
|
||||
---
|
||||
|
||||
## 5. Replay Semantics (Normative)
|
||||
|
||||
To reconstruct CURRENT:
|
||||
|
||||
1. Load latest snapshot anchor (if any).
|
||||
2. Initialize visible segments from that snapshot.
|
||||
3. Replay all log records with `logseq > snapshot.logseq`.
|
||||
4. Apply records in order:
|
||||
|
||||
* SEGMENT_SEAL -> add segment
|
||||
* TOMBSTONE -> update policy state
|
||||
* TOMBSTONE_LIFT -> override policy
|
||||
* ARTIFACT_PUBLISH / ARTIFACT_UNPUBLISH -> update visibility metadata
|
||||
|
||||
Replay MUST be deterministic.
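The replay steps above can be sketched as a single pass over the log. The record shape `(logseq, record_type, subject)` and the `DomainState` fields are hypothetical simplifications: in particular, a TOMBSTONE_LIFT here names the artifact directly instead of referencing the earlier TOMBSTONE record, and only visible segments are carried over from the anchor.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Record = Tuple[int, str, str]  # (logseq, record_type, subject): assumed shape


@dataclass
class DomainState:
    visible_segments: Set[str] = field(default_factory=set)
    tombstoned: Set[str] = field(default_factory=set)
    published: Set[str] = field(default_factory=set)


def replay(log: List[Record], anchor_logseq: int,
           anchor_segments: Set[str]) -> DomainState:
    """Rebuild CURRENT from a snapshot anchor plus all later records."""
    state = DomainState(visible_segments=set(anchor_segments))
    for logseq, record_type, subject in sorted(log):  # strict logseq order
        if logseq <= anchor_logseq:
            continue  # already reflected in the snapshot
        if record_type == "SEGMENT_SEAL":
            state.visible_segments.add(subject)
        elif record_type == "TOMBSTONE":
            state.tombstoned.add(subject)
        elif record_type == "TOMBSTONE_LIFT":
            state.tombstoned.discard(subject)
        elif record_type == "ARTIFACT_PUBLISH":
            state.published.add(subject)
        elif record_type == "ARTIFACT_UNPUBLISH":
            state.published.discard(subject)
        # Any other record type is skipped, so unknown types never break replay.
    return state
```

Sorting by `logseq` before applying records makes the result independent of the order in which records were read from storage, which is one simple way to keep replay deterministic.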
|
||||
|
||||
---
|
||||
|
||||
## 6. Index Interaction
|
||||
|
||||
* Index segments contain index entries.
|
||||
* The log never records individual index entries.
|
||||
* Visibility is controlled solely by SEGMENT_SEAL.
|
||||
* Index rebuild = scan visible segments + apply policy.
|
||||
|
||||
---
|
||||
|
||||
## 7. Garbage Collection Constraints
|
||||
|
||||
* A segment may be GC'd only if:
|
||||
|
||||
* No snapshot references it.
|
||||
* No replay of the log up to CURRENT requires it.
|
||||
|
||||
* Log truncation is only safe at SNAPSHOT_ANCHOR boundaries.
|
||||
|
||||
---
|
||||
|
||||
## 8. Versioning & Extensibility
|
||||
|
||||
* Unknown record types MUST be skipped and MUST NOT break replay.
|
||||
* Payloads are opaque outside their type.
|
||||
* New record types may be added in later versions.
|
||||
|
||||
---
|
||||
|
||||
## 9. Non-Goals
|
||||
|
||||
ASL/LOG/1 does not define:
|
||||
|
||||
* Federation protocols
|
||||
* Network replication
|
||||
* Witness signatures
|
||||
* Block-level events
|
||||
* Hydration / eviction
|
||||
* Execution receipts
|
||||
|
||||
---
|
||||
|
||||
## 10. Summary
|
||||
|
||||
ASL/LOG/1 defines the minimal semantic log needed to reconstruct CURRENT.
|
||||
|
||||
If it affects visibility or admissibility, it goes in the log. If it affects layout or performance, it does not.
|
||||
316
tier1/asl-store-index.md
Normal file
|
|
@ -0,0 +1,316 @@
|
|||
# ASL-STORE-INDEX
|
||||
|
||||
### Store Semantics and Contracts for ASL Core Index (Tier1)
|
||||
|
||||
---
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
This document defines the **operational and store-level semantics** required to implement ASL-CORE-INDEX.
|
||||
|
||||
It specifies:
|
||||
|
||||
* **Block lifecycle**: creation, sealing, retention, GC
|
||||
* **Index segment lifecycle**: creation, append, seal, visibility
|
||||
* **Snapshot identity and log positions** for deterministic replay
|
||||
* **Append-only log semantics**
|
||||
* **Lookup, visibility, and crash recovery rules**
|
||||
* **Small vs large block handling**
|
||||
|
||||
It **does not define encoding** (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`) or semantic mapping (see ASL/1-CORE-INDEX).
|
||||
|
||||
---
|
||||
|
||||
## 2. Scope
|
||||
|
||||
Covers:
|
||||
|
||||
* Lifecycle of **blocks** and **index entries**
|
||||
* Snapshot and CURRENT consistency guarantees
|
||||
* Deterministic replay and recovery
|
||||
* GC and tombstone semantics
|
||||
* Packing policy for small vs large artifacts
|
||||
|
||||
Excludes:
|
||||
|
||||
* Disk-level encoding
|
||||
* Sharding or acceleration strategies (see ASL/INDEX-ACCEL/1)
|
||||
* Memory residency or caching
|
||||
* Federation or PEL semantics
|
||||
|
||||
---
|
||||
|
||||
## 3. Core Concepts
|
||||
|
||||
### 3.1 Block
|
||||
|
||||
* **Definition:** Immutable storage unit containing artifact bytes.
|
||||
* **Identifier:** BlockID (opaque, unique).
|
||||
* **Properties:**
|
||||
|
||||
* Once sealed, contents never change.
|
||||
* Can be referenced by multiple artifacts.
|
||||
* May be pinned by snapshots for retention.
|
||||
|
||||
### 3.2 Index Segment
|
||||
|
||||
Segments group index entries and provide **persistence and recovery units**.
|
||||
|
||||
* **Open segment:** accepting new index entries, not visible for lookup.
|
||||
* **Sealed segment:** closed for append, log-visible, snapshot-pinnable.
|
||||
* **Segment components:** header, optional bloom filter, index records, footer.
|
||||
* **Segment visibility:** only after seal and log append.
|
||||
|
||||
### 3.3 Append-Only Log
|
||||
|
||||
All store-visible mutations are recorded in a **strictly ordered, append-only log**:
|
||||
|
||||
* Entries include:
|
||||
|
||||
* Index additions
|
||||
* Tombstones
|
||||
* Segment seals
|
||||
* Log is replayable to reconstruct CURRENT.
|
||||
* Log semantics are defined in `ASL/LOG/1`.
|
||||
|
||||
### 3.4 Snapshot Identity and Log Position
|
||||
|
||||
To make CURRENT referencable and replayable, ASL-STORE-INDEX defines:
|
||||
|
||||
* **SnapshotID**: opaque, immutable identifier for a snapshot.
|
||||
* **LogPosition**: monotonic integer position in the append-only log.
|
||||
* **IndexState**: `(SnapshotID, LogPosition)`.
|
||||
|
||||
Deterministic replay is defined as:
|
||||
|
||||
```
|
||||
Index(SnapshotID, LogPosition) = Snapshot[SnapshotID] + replay(log[0:LogPosition])
|
||||
```
|
||||
|
||||
Snapshots and log positions are required for checkpointing, federation, and deterministic recovery.
|
||||
|
||||
### 3.5 Artifact Location
|
||||
|
||||
* **ArtifactExtent**: `(BlockID, offset, length)` identifying a byte slice within a block.
|
||||
* **ArtifactLocation**: ordered list of `ArtifactExtent` values that, when concatenated, produce the artifact bytes.
|
||||
* Multi-extent locations allow a single artifact to be striped across multiple blocks.
|
||||
|
||||
---
|
||||
|
||||
## 4. Block Lifecycle Semantics
|
||||
|
||||
| Event | Description | Semantic Guarantees |
|
||||
| ------------------ | ------------------------------------- | ------------------------------------------------------------- |
|
||||
| Creation | Block allocated; bytes may be written | Not visible to index until sealed |
|
||||
| Sealing | Block is finalized and immutable | Sealed blocks are stable and safe to reference from index |
|
||||
| Retention | Block remains accessible | Blocks referenced by snapshots or CURRENT must not be removed |
|
||||
| Garbage Collection | Block may be deleted | Only unpinned, unreachable blocks may be removed |
|
||||
|
||||
Notes:
|
||||
|
||||
* Sealing ensures any index entry referencing the block is immutable.
|
||||
* Retention is driven by snapshot and log visibility rules.
|
||||
* GC must **never violate CURRENT reconstruction guarantees**.
|
||||
|
||||
---
|
||||
|
||||
## 5. Segment Lifecycle Semantics
|
||||
|
||||
### 5.1 Creation
|
||||
|
||||
* Open segment is allocated.
|
||||
* Index entries appended in log order.
|
||||
* Entries are invisible until segment seal and log append.
|
||||
|
||||
### 5.2 Seal
|
||||
|
||||
* Segment is closed to append.
|
||||
* Seal record is written to append-only log.
|
||||
* Segment becomes visible for lookup.
|
||||
* Sealed segment may be snapshot-pinned.
|
||||
|
||||
### 5.3 Snapshot Interaction
|
||||
|
||||
* Snapshots capture sealed segments.
|
||||
* Open segments need not survive snapshot.
|
||||
* Segments below snapshot are replay anchors.
|
||||
|
||||
---
|
||||
|
||||
## 6. Visibility and Lookup Semantics
|
||||
|
||||
### 6.1 Visibility Rules
|
||||
|
||||
* Entry visible **iff**:
|
||||
|
||||
* The block is sealed.
|
||||
* Log record exists at position ≤ CURRENT.
|
||||
* Segment seal recorded in log.
|
||||
|
||||
* Entries above CURRENT or referencing unsealed blocks are invisible.
|
||||
|
||||
### 6.2 Lookup Semantics
|
||||
|
||||
To resolve an `ArtifactKey`:
|
||||
|
||||
1. Identify all visible segments ≤ CURRENT.
|
||||
2. Search segments in **reverse creation order** (newest first).
|
||||
3. Return first matching entry.
|
||||
4. Respect tombstones to shadow prior entries.
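The lookup steps above can be sketched as a newest-first scan. The sketch assumes each visible segment is a dictionary from key to location, that the list is ordered oldest to newest and already cut at CURRENT, and that a tombstone is stored as a sentinel object; all of these representations are hypothetical.

```python
from typing import Dict, List, Optional

TOMBSTONE = object()  # hypothetical marker stored in place of a location


def lookup(visible_segments: List[Dict[str, object]],
           key: str) -> Optional[object]:
    """Scan segments newest first and return the first matching entry."""
    for segment in reversed(visible_segments):
        if key in segment:
            entry = segment[key]
            # A tombstone is itself a match: it hides every older entry.
            return None if entry is TOMBSTONE else entry
    return None


segments = [
    {"k1": "loc-old", "k2": "loc-2"},  # oldest
    {"k1": "loc-new"},
    {"k2": TOMBSTONE},                 # newest
]
```

Here `lookup(segments, "k1")` resolves to the newer mapping, while `"k2"` resolves to nothing because the tombstone in the newest segment shadows the older entry. Cutting the list at an earlier position (`segments[:2]`) models a lookup at an earlier CURRENT.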
|
||||
|
||||
Determinism:
|
||||
|
||||
* Lookup results are identical across platforms given the same snapshot and log prefix.
|
||||
* Accelerations (bloom filters, sharding, SIMD) **do not alter correctness**.
|
||||
|
||||
---
|
||||
|
||||
## 7. Snapshot Interaction
|
||||
|
||||
* Snapshots capture the set of **sealed blocks** and **sealed index segments** at a point in time.
|
||||
* Blocks referenced by a snapshot are **pinned** and cannot be garbage-collected until snapshot expiration.
|
||||
* CURRENT is reconstructed as:
|
||||
|
||||
```
|
||||
CURRENT = snapshot_state + replay(log)
|
||||
```
|
||||
|
||||
Segment and block visibility rules:
|
||||
|
||||
| Entity | Visible in snapshot | Visible in CURRENT |
|
||||
| -------------------- | ---------------------------- | ------------------------------ |
|
||||
| Open segment/block | No | Only after seal and log append |
|
||||
| Sealed segment/block | Yes, if included in snapshot | Yes, replayed from log |
|
||||
| Tombstone | Yes, if log-recorded | Yes, shadows prior entries |
|
||||
|
||||
---
|
||||
|
||||
## 8. Garbage Collection
|
||||
|
||||
Eligibility for GC:
|
||||
|
||||
* Segments: sealed, no references from CURRENT or snapshots.
|
||||
* Blocks: unpinned, unreferenced by any segment or artifact.
|
||||
|
||||
Rules:
|
||||
|
||||
* GC is safe **only on sealed segments and blocks**.
|
||||
* Must respect snapshot pins.
|
||||
* Tombstones may aid in invalidating unreachable blocks.
|
||||
|
||||
Outcome:
|
||||
|
||||
* GC never violates CURRENT reconstruction.
|
||||
* Blocks can be reclaimed without breaking provenance.
|
||||
|
||||
---
|
||||
|
||||
## 9. Tombstone Semantics
|
||||
|
||||
* Optional marker to invalidate prior mappings.
|
||||
* Visibility rules identical to regular index entries.
|
||||
* Used to maintain deterministic CURRENT in the face of shadowing or deletions.
|
||||
|
||||
---
|
||||
|
||||
## 10. Small vs Large Block Handling
|
||||
|
||||
### 10.1 Definitions
|
||||
|
||||
| Term              | Meaning                                                               |
| ----------------- | --------------------------------------------------------------------- |
| **Small block**   | Block containing artifact bytes below a threshold `T_small`.          |
| **Large block**   | Block containing artifact bytes ≥ `T_small`.                          |
| **Mixed segment** | Segment containing both small and large blocks (discouraged).         |
| **Packing**       | Combining multiple small artifacts into a single physical block.      |
|
||||
|
||||
Small vs large classification is **store-level only** and transparent to ASL-CORE and index layers.
|
||||
|
||||
### 10.2 Packing Rules
|
||||
|
||||
1. **Small blocks may be packed together** to reduce storage overhead.
|
||||
2. **Large blocks are never packed with other artifacts**.
|
||||
3. Mixed segments are **allowed but discouraged**; index semantics remain identical.
|
||||
|
||||
### 10.3 Segment Allocation Rules
|
||||
|
||||
1. Small blocks are allocated into segments optimized for packing efficiency.
|
||||
2. Large blocks are allocated into segments optimized for sequential I/O.
|
||||
3. Segment sealing and visibility rules remain unchanged.
|
||||
|
||||
### 10.4 Indexing and Addressing
|
||||
|
||||
All blocks are addressed uniformly:
|
||||
|
||||
```
ArtifactExtent = (BlockID, offset, length)
ArtifactLocation = [ArtifactExtent...]
```
|
||||
|
||||
Packing does **not** affect index semantics or determinism. Multi-extent ArtifactLocations are allowed.
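To make the addressing model concrete, here is a sketch of reading an artifact by concatenating its extents in order. The in-memory `Block` array (where `block_id` is used directly as an array index) and the `read_artifact` function are assumptions for the example; a real store would resolve BlockIDs through its own lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ArtifactExtent;

/* Hypothetical in-memory block store: block_id indexes into the array. */
typedef struct {
    const uint8_t *data;
    size_t         size;
} Block;

/* Copies the extents, in order, into `out`.
 * Returns the number of bytes written, or -1 if an extent falls outside
 * its block or the output buffer is too small. */
static long read_artifact(const Block *blocks, size_t block_count,
                          const ArtifactExtent *extents, size_t extent_count,
                          uint8_t *out, size_t out_cap) {
    assert(out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < extent_count; i++) {
        const ArtifactExtent *e = &extents[i];
        if (e->block_id >= block_count) return -1;
        const Block *b = &blocks[e->block_id];
        if ((uint64_t)e->offset + e->length > (uint64_t)b->size) return -1;
        if (e->length > out_cap - written) return -1;
        memcpy(out + written, b->data + e->offset, e->length);
        written += e->length;
    }
    return (long)written;
}
```

The reader does not care whether an extent points into a packed small block or a dedicated large block, which is the sense in which packing stays invisible to index semantics.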
|
||||
|
||||
### 10.5 GC and Retention
|
||||
|
||||
1. Packed small blocks can be reclaimed only when **all contained artifacts** are unreachable.
|
||||
2. Large blocks are reclaimed per block.
|
||||
|
||||
Invariant: GC must never remove bytes still referenced by CURRENT or snapshots.
|
||||
|
||||
---
|
||||
|
||||
## 11. Crash and Recovery Semantics
|
||||
|
||||
* Open segments or unsealed blocks may be lost; no invariant is broken.
|
||||
* Recovery procedure:
|
||||
|
||||
1. Mount last checkpoint snapshot.
|
||||
2. Replay append-only log from checkpoint.
|
||||
3. Reconstruct CURRENT.
|
||||
|
||||
* Recovery is **deterministic and idempotent**.
|
||||
* Segments and blocks are **never partially visible** after a crash.
|
||||
|
||||
---
|
||||
|
||||
## 12. Normative Invariants
|
||||
|
||||
1. Sealed blocks are immutable.
|
||||
2. Index entries referencing blocks are immutable once visible.
|
||||
3. Shadowing follows strict log order.
|
||||
4. Replay of snapshot + log uniquely reconstructs CURRENT.
|
||||
5. GC cannot remove blocks or segments needed by snapshot or CURRENT.
|
||||
6. Tombstones shadow prior entries without deleting underlying blocks prematurely.
|
||||
7. IndexState `(SnapshotID, LogPosition)` uniquely identifies CURRENT.
|
||||
|
||||
---
|
||||
|
||||
## 13. Non-Goals
|
||||
|
||||
* Disk-level encoding (ENC-ASL-CORE-INDEX).
|
||||
* Memory layout or caching.
|
||||
* Sharding or performance heuristics.
|
||||
* Federation / multi-domain semantics (handled elsewhere).
|
||||
* Block packing strategies beyond the policy rules here.
|
||||
|
||||
---
|
||||
|
||||
## 14. Relationship to Other Layers
|
||||
|
||||
| Layer              | Responsibility                                                               |
| ------------------ | ---------------------------------------------------------------------------- |
| ASL-CORE           | Artifact semantics, existence of blocks, immutability                        |
| ASL-CORE-INDEX     | Semantic mapping of ArtifactKey → ArtifactLocation                           |
| ASL-STORE-INDEX    | Lifecycle and operational contracts for blocks and segments                  |
| ENC-ASL-CORE-INDEX | Bytes-on-disk layout for segments, index records, and optional bloom filters |
|
||||
|
||||
---
|
||||
|
||||
## 15. Summary
|
||||
|
||||
The tier1 ASL-STORE-INDEX specification:
|
||||
|
||||
* Defines **block lifecycle** and **segment lifecycle**.
|
||||
* Makes **snapshot identity and log positions** explicit for replay.
|
||||
* Ensures deterministic visibility, lookup, and crash recovery.
|
||||
* Formalizes GC safety and tombstone behavior.
|
||||
* Adds clear **small vs large block** handling without changing core semantics.
|
||||
|
|
@ -8,7 +8,7 @@
|
|||
|
||||
This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.
|
||||
|
||||
It translates the **semantic model of ASL-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
|
||||
It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
|
||||
|
||||
It is intended for:
|
||||
|
||||
|
|
@ -19,8 +19,9 @@ It is intended for:
|
|||
|
||||
It does **not** define:
|
||||
|
||||
* Index semantics (see ASL-CORE-INDEX)
|
||||
* Index semantics (see ASL/1-CORE-INDEX)
|
||||
* Store lifecycle behavior (see ASL-STORE-INDEX)
|
||||
* Acceleration semantics (see ASL/INDEX-ACCEL/1)
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -49,6 +50,8 @@ Each index segment file is laid out as follows:
|
|||
+------------------+
|
||||
| IndexRecord[] |
|
||||
+------------------+
|
||||
| ExtentRecord[] |
|
||||
+------------------+
|
||||
| SegmentFooter |
|
||||
+------------------+
|
||||
```
|
||||
|
|
@ -56,6 +59,7 @@ Each index segment file is laid out as follows:
|
|||
* **SegmentHeader**: fixed-size, mandatory
|
||||
* **BloomFilter**: optional, opaque, segment-local
|
||||
* **IndexRecord[]**: array of index entries
|
||||
* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
|
||||
* **SegmentFooter**: fixed-size, mandatory
|
||||
|
||||
Offsets in the header define locations of Bloom filter and index records.
|
||||
|
|
@ -81,6 +85,9 @@ typedef struct {
|
|||
uint64_t bloom_offset; // File offset of bloom filter (0 if none)
|
||||
uint64_t bloom_size; // Size of bloom filter (0 if none)
|
||||
|
||||
uint64_t extents_offset; // File offset of ExtentRecord array
|
||||
uint64_t extent_count; // Total number of ExtentRecord entries
|
||||
|
||||
uint64_t flags; // Reserved for future use
|
||||
} SegmentHeader;
|
||||
#pragma pack(pop)
|
||||
|
|
@ -104,9 +111,9 @@ typedef struct {
|
|||
uint64_t hash_lo; // Low 64 bits
|
||||
uint32_t hash_tail; // Optional tail for full hash if larger than 192 bits
|
||||
|
||||
uint64_t block_id; // ASL block identifier
|
||||
uint32_t offset; // Offset within block
|
||||
uint32_t length; // Length of artifact bytes
|
||||
uint64_t extents_offset; // File offset of first ExtentRecord for this entry
|
||||
uint32_t extent_count; // Number of ExtentRecord entries for this artifact
|
||||
uint32_t total_length; // Total artifact length in bytes
|
||||
|
||||
uint32_t flags; // Optional flags (tombstone, reserved, etc.)
|
||||
uint32_t reserved; // Reserved for alignment/future use
|
||||
|
|
@ -117,13 +124,34 @@ typedef struct {
|
|||
**Notes:**
|
||||
|
||||
* `hash_*` fields store the artifact key deterministically.
|
||||
* `block_id` references an ASL block.
|
||||
* `offset` / `length` define bytes within the block.
|
||||
* `extents_offset` references the first ExtentRecord for this entry.
|
||||
* `extent_count` defines how many extents to read (may be 0 for tombstones).
|
||||
* `total_length` is the exact artifact size in bytes.
|
||||
* Flags may indicate tombstone or other special status.
|
||||
|
||||
---
|
||||
|
||||
## 6. SegmentFooter
|
||||
## 6. ExtentRecord
|
||||
|
||||
```c
#pragma pack(push,1)
typedef struct {
    uint64_t block_id;       // ASL block identifier
    uint32_t offset;         // Offset within block
    uint32_t length;         // Length of this extent
} ExtentRecord;
#pragma pack(pop)
```
|
||||
|
||||
**Notes:**
|
||||
|
||||
* Extents are concatenated in order to produce artifact bytes.
|
||||
* `extent_count` MUST be > 0 for visible (non-tombstone) entries.
|
||||
* `total_length` MUST equal the sum of `length` across the extents.
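A reader could check the two MUST rules above before trusting an entry. The sketch below is illustrative only: the tombstone bit position (`EXAMPLE_FLAG_TOMBSTONE`) is an assumption, since the notes only say that flags "may indicate tombstone", and skipping the length check for tombstones is likewise a choice made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id; /* ASL block identifier */
    uint32_t offset;   /* Offset within block */
    uint32_t length;   /* Length of this extent */
} ExtentRecord;
#pragma pack(pop)

/* Assumption: bit 0 of IndexRecord.flags marks a tombstone. */
#define EXAMPLE_FLAG_TOMBSTONE 0x1u

/* Returns true if the extent list is consistent with the index entry. */
static bool extents_valid(uint32_t flags, uint32_t extent_count,
                          uint32_t total_length, const ExtentRecord *extents) {
    /* Tombstones may carry zero extents; nothing further is checked here. */
    if (flags & EXAMPLE_FLAG_TOMBSTONE) return true;
    /* Visible entries MUST have at least one extent. */
    if (extent_count == 0) return false;
    assert(extents != NULL);
    /* Sum in 64 bits so many large extents cannot wrap the total. */
    uint64_t sum = 0;
    for (uint32_t i = 0; i < extent_count; i++) {
        sum += extents[i].length;
    }
    return sum == total_length;
}
```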
|
||||
|
||||
---
|
||||
|
||||
## 7. SegmentFooter
|
||||
|
||||
```c
|
||||
#pragma pack(push,1)
|
||||
|
|
@ -142,7 +170,7 @@ typedef struct {
|
|||
|
||||
---
|
||||
|
||||
## 7. Bloom Filter
|
||||
## 8. Bloom Filter
|
||||
|
||||
* The bloom filter is **optional** and opaque to semantics.
|
||||
* Its purpose is **lookup acceleration**.
|
||||
|
|
@ -151,24 +179,27 @@ typedef struct {
|
|||
|
||||
---
|
||||
|
||||
## 8. Versioning and Compatibility
|
||||
## 9. Versioning and Compatibility
|
||||
|
||||
* `version` field in header defines encoding.
|
||||
* Readers must **reject unsupported versions**.
|
||||
* New fields may be added in future versions only via version bump.
|
||||
* Existing fields must **never change meaning**.
|
||||
* Version `1` implies single-extent layout (legacy).
|
||||
* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
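One way a reader might act on these rules is to map the header `version` to a layout and refuse anything it does not recognize. This is a sketch under assumptions: the `SegmentLayout` enum and function name are hypothetical, and the width of the `version` field is taken to be 32 bits because the excerpt shown here does not state it.

```c
#include <stdint.h>

/* Hypothetical layout selector derived from SegmentHeader.version. */
typedef enum {
    LAYOUT_UNSUPPORTED = 0, /* reader must reject the segment */
    LAYOUT_SINGLE_EXTENT,   /* version 1: legacy single-extent records */
    LAYOUT_EXTENT_LIST      /* version 2: ExtentRecord lists */
} SegmentLayout;

/* Maps a header version to a known layout; unknown versions are rejected. */
static SegmentLayout layout_for_version(uint32_t version) {
    switch (version) {
    case 1:  return LAYOUT_SINGLE_EXTENT;
    case 2:  return LAYOUT_EXTENT_LIST;
    default: return LAYOUT_UNSUPPORTED;
    }
}
```

Rejecting by default keeps an older reader from misinterpreting fields that a later version may add.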
|
||||
|
||||
---
|
||||
|
||||
## 9. Alignment and Packing
|
||||
## 10. Alignment and Packing
|
||||
|
||||
* All structures are **packed** (no compiler padding)
|
||||
* Multi-byte integers are **little-endian**
|
||||
* Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
|
||||
* Extents are accessed via `IndexRecord.extents_offset` relative to the file base.
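On a big-endian host, or when unaligned access is a concern, a reader can decode fields byte by byte instead of casting into the mapped file. The following sketch decodes one `ExtentRecord` at a file-base-relative offset; the helper names and the bounds-checking policy are assumptions for illustration, and the 16-byte record size follows from the packed struct (8 + 4 + 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of a decoded extent. */
typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;

#define EXTENT_RECORD_SIZE 16u /* packed on-disk size: 8 + 4 + 4 bytes */

/* Little-endian loads that work on any host byte order. */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t load_le64(const uint8_t *p) {
    return (uint64_t)load_le32(p) | ((uint64_t)load_le32(p + 4) << 32);
}

/* Decodes extent number `index` of an entry whose extent list starts at
 * `extents_offset` (relative to the file base).
 * Returns 0 on success, -1 if the record would fall outside the file. */
static int decode_extent(const uint8_t *file, size_t file_size,
                         uint64_t extents_offset, uint32_t index,
                         ExtentRecord *out) {
    assert(file != NULL && out != NULL);
    uint64_t pos = extents_offset + (uint64_t)index * EXTENT_RECORD_SIZE;
    if (pos < extents_offset) return -1; /* offset arithmetic wrapped */
    if (pos > file_size || file_size - pos < EXTENT_RECORD_SIZE) return -1;
    const uint8_t *p = file + pos;
    out->block_id = load_le64(p);
    out->offset   = load_le32(p + 8);
    out->length   = load_le32(p + 12);
    return 0;
}
```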
|
||||
|
||||
---
|
||||
|
||||
## 10. Summary of Encoding Guarantees
|
||||
## 11. Summary of Encoding Guarantees
|
||||
|
||||
The ENC-ASL-CORE-INDEX specification ensures:
|
||||
|
||||
|
|
@ -180,14 +211,13 @@ The ENC-ASL-CORE-INDEX specification ensures:
|
|||
|
||||
---
|
||||
|
||||
## 11. Relationship to Other Layers
|
||||
## 12. Relationship to Other Layers
|
||||
|
||||
| Layer | Responsibility |
|
||||
| ------------------ | ---------------------------------------------------------- |
|
||||
| ASL-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
|
||||
| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
|
||||
| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts |
|
||||
| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) |
|
||||
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |
|
||||
|
||||
This completes the stack: **semantics → store behavior → encoding**.
|
||||
|
||||
|
||||