# ASL Block Architecture & Specification
## 1. Purpose and Scope
The **Artifact Storage Layer (ASL)** is responsible for the **physical storage, layout, and retrieval of immutable artifact bytes**.
ASL operates beneath CAS and above the storage substrate (ZFS).
ASL concerns itself with:
* Efficient packaging of artifacts into blocks
* Stable block addressing
* Snapshot-safe immutability
* Storage-local optimizations
ASL does **not** define:
* Artifact identity
* Hash semantics
* Provenance
* Interpretation
* Indexing semantics
---
## 2. Core Abstractions
### 2.1 Artifact
An **artifact** is an immutable byte sequence produced or consumed by higher layers.
ASL treats artifacts as opaque bytes.

---
### 2.2 ASL Block
An **ASL block** is the smallest independently addressable, immutable unit of storage managed by ASL.
Properties:
* Identified by an **ASL Block ID**
* Contains one or more artifacts
* Written sequentially
* Immutable once sealed
* Snapshot-safe
ASL blocks are the unit of:
* Storage
* Reachability
* Garbage collection
---
### 2.3 ASL Block ID
An **ASL Block ID** is an opaque, stable identifier.
#### Invariants
* Globally unique within an ASL instance
* Never reused
* Never mutated
* Does **not** encode:
  * Artifact size
  * Placement
  * Snapshot
  * Storage topology
  * Policy decisions
#### Semantics
Block IDs identify *logical blocks*, not physical locations.
Higher layers must treat block IDs as uninterpretable tokens.

---
## 3. Addressing Model
ASL exposes a single addressing primitive:
```
(block_id, offset, length) → bytes
```
This is the **only** contract between CAS and ASL.
Notes:
* `offset` and `length` are stable for the lifetime of the block
* ASL guarantees that reads are deterministic per snapshot
* No size-class or block-kind information is exposed
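
As a rough illustration of this contract, the sketch below models a store whose only read operation takes `(block_id, offset, length)`. The names `BlockStore` and `add_sealed`, the integer block IDs, and the in-memory dictionary are assumptions made for the example; this document does not prescribe an API, an ID representation, or an implementation language. Out-of-range reads are rejected here rather than truncated, which is one possible choice, not a stated requirement.

```python
class BlockStore:
    """Hypothetical in-memory stand-in for ASL's read path."""

    def __init__(self) -> None:
        # Maps block ID to the bytes of a sealed block.
        self._sealed = {}

    def add_sealed(self, block_id: int, data: bytes) -> None:
        """Register a sealed block; a sealed ID can never be rebound."""
        if block_id in self._sealed:
            raise ValueError(f"block {block_id} is already sealed")
        self._sealed[block_id] = bytes(data)

    def read(self, block_id: int, offset: int, length: int) -> bytes:
        """The single addressing primitive: (block_id, offset, length) -> bytes."""
        if block_id not in self._sealed:
            raise KeyError(f"unknown or unsealed block: {block_id}")
        data = self._sealed[block_id]
        if offset < 0 or length < 0 or offset + length > len(data):
            raise ValueError("requested range falls outside the block")
        return data[offset:offset + length]


store = BlockStore()
store.add_sealed(7, b"hello world")
print(store.read(7, 6, 5))
```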
---
## 4. Block Allocation Model
### 4.1 Global Block Namespace
ASL maintains a **single global block namespace**.
Block IDs are allocated from a monotonically increasing sequence:
```
next_block_id := next_block_id + 1
```
Properties:
* Allocation is append-only
* Leaked IDs are permitted
* No coordination with CAS is required
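
A minimal sketch of such an allocator follows. The class name `BlockIdAllocator`, the starting value of 1, and the use of a lock are assumptions for illustration; the text above only requires that IDs increase monotonically and are never reused. An ID that is handed out and then discarded is simply leaked, which the properties above allow.

```python
import threading


class BlockIdAllocator:
    """Hypothetical append-only allocator for ASL block IDs."""

    def __init__(self, next_block_id: int = 1) -> None:
        self._next_block_id = next_block_id
        self._lock = threading.Lock()

    def allocate(self) -> int:
        """Return a fresh ID and advance the sequence; IDs are never handed out twice."""
        with self._lock:
            block_id = self._next_block_id
            self._next_block_id += 1
            return block_id


allocator = BlockIdAllocator()
first = allocator.allocate()
leaked = allocator.allocate()   # never sealed: the ID is leaked, not recycled
third = allocator.allocate()
```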
---
### 4.2 Open Blocks
At any time, ASL may maintain one or more **open blocks**.
Open blocks:
* Accept new artifact writes
* Are not visible to readers
* Are not referenced by the index
* May be abandoned on crash
---
### 4.3 Sealed Blocks
A block becomes **sealed** when:
* It reaches an internal fill threshold, or
* ASL decides to finalize it for policy reasons
Once sealed:
* No further writes are permitted
* Offsets and lengths become permanent
* The block becomes visible to CAS
* The block may be referenced by index entries
Sealed blocks are immutable forever.
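
The open-to-sealed lifecycle can be sketched as below. `OpenBlock`, its `fill_threshold` parameter, and the in-memory buffer are hypothetical; the fill threshold stands in for whatever internal policy ASL actually uses. The point of the sketch is that appends return an `(offset, length)` pair, and that sealing freezes the bytes and rejects further writes.

```python
class OpenBlock:
    """Hypothetical open block that accepts sequential writes until sealed."""

    def __init__(self, block_id: int, fill_threshold: int) -> None:
        self.block_id = block_id
        self._fill_threshold = fill_threshold
        self._buffer = bytearray()
        self._sealed = False

    def append(self, artifact: bytes) -> tuple[int, int]:
        """Write an artifact at the current end and return (offset, length)."""
        if self._sealed:
            raise RuntimeError("block is sealed; no further writes are permitted")
        offset = len(self._buffer)
        self._buffer += artifact
        return offset, len(artifact)

    def is_full(self) -> bool:
        """True once the assumed fill threshold has been reached."""
        return len(self._buffer) >= self._fill_threshold

    def seal(self) -> bytes:
        """Freeze the block and return its immutable contents."""
        self._sealed = True
        return bytes(self._buffer)


block = OpenBlock(block_id=1, fill_threshold=8)
first = block.append(b"abc")
second = block.append(b"defgh")
if block.is_full():
    sealed_bytes = block.seal()
```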

---
## 5. Packaging Policy (Non-Semantic)
ASL applies **packaging heuristics** when choosing how to place artifacts into blocks.
Examples:
* Prefer packing many small artifacts together
* Prefer isolating very large artifacts
* Where practical, avoid mixing artifacts of vastly different sizes in the same block
### Important rule
Packaging decisions are:
* Best-effort
* Local
* Replaceable
* **Not part of the ASL contract**
No higher layer may assume anything about block contents based on artifact size.
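
To make the "replaceable" nature of these heuristics concrete, here is one possible packing decision written as a plain function. The 1 MiB cutoff, the function name `choose_packing`, and the labels `"isolated"` and `"shared"` are all invented for this sketch. Nothing outside ASL would ever see this result, so swapping the rule for another changes no contract.

```python
# Hypothetical cutoff; ASL is free to tune or discard it at any time.
ISOLATE_AT_BYTES = 1 << 20


def choose_packing(artifact_size: int, isolate_at: int = ISOLATE_AT_BYTES) -> str:
    """Return "isolated" for very large artifacts and "shared" for everything else."""
    if artifact_size < 0:
        raise ValueError("artifact size cannot be negative")
    return "isolated" if artifact_size >= isolate_at else "shared"


small = choose_packing(512)          # packed alongside other small artifacts
large = choose_packing(8 << 20)      # given a block of its own
```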

---
## 6. Storage Layout and Locality
### 6.1 Single Dataset, Structured Locality
ASL stores all blocks within a **single ZFS dataset**.
Within that dataset, ASL may organize blocks into subpaths to improve locality, e.g.:
```
asl/blocks/dense/
asl/blocks/sparse/
```
These subpaths:
* Exist purely for storage optimization
* May carry ZFS property overrides
* Are not encoded into block identity
Block resolution does **not** depend on knowing which subpath was used.
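
One way to keep resolution independent of the subpath is to derive the file name from the block ID alone and probe each known subpath in turn. The sketch below assumes a hypothetical on-disk naming scheme (`<16 hex digits>.blk`) and reuses the two example subpaths from the layout above; neither the naming scheme nor the probing strategy is specified by this document.

```python
from pathlib import Path
from typing import Optional

# Example subpaths from the layout above; the set may change over time.
SUBPATHS = ("dense", "sparse")


def block_filename(block_id: int) -> str:
    """Hypothetical file name derived only from the block ID."""
    return f"{block_id:016x}.blk"


def resolve_block(blocks_root: Path, block_id: int) -> Optional[Path]:
    """Find a sealed block without the caller knowing which subpath holds it."""
    name = block_filename(block_id)
    for subpath in SUBPATHS:
        candidate = blocks_root / subpath / name
        if candidate.is_file():
            return candidate
    return None
```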

---
### 6.2 Placement Hints
At allocation time, ASL may apply **placement hints**, such as:
* Preferred directory
* Write size
* Compression preference
* Recordsize alignment
These hints:
* Affect only physical layout
* May change over time
* Do not affect block identity or correctness
---
## 7. Snapshot Semantics
ASL depends on snapshot-capable storage but does not track or interpret snapshots itself.
Rules:
* ASL blocks live inside snapshot-capable storage
* Snapshots naturally pin sealed blocks
* ASL does not encode snapshot IDs into block IDs
* CAS determines snapshot visibility
ASL guarantees:
* Deterministic reads for a given snapshot
* No mutation of sealed blocks across snapshots
---
## 8. Crash Safety and Recovery
### 8.1 Crash During Open Block
If a crash occurs:
* Open blocks may be lost or abandoned
* Block IDs allocated but not sealed may be leaked
* No sealed block may be corrupted
This is acceptable and expected.

---
### 8.2 Recovery Rules
On startup, ASL:
* Scans for sealed blocks
* Ignores or cleans up abandoned open blocks
* Resumes allocation from the next unused block ID
No global replay or rebuild is required.
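
The startup scan can be sketched as a pure function over a directory listing. The `.blk` suffix for sealed blocks, the `.open` suffix for unsealed ones, and the hexadecimal file names are assumptions made for this example only. The sketch resumes allocation above the highest ID seen in either state, so an ID leaked by an abandoned open block is not handed out again; an ID that never reached disk cannot be detected this way and would need a persisted counter.

```python
from typing import Iterable


def recover(names: Iterable[str]) -> tuple[list[int], list[str], int]:
    """Classify block files and compute the next block ID to allocate.

    Returns (sealed block IDs, abandoned open-block file names, next block ID).
    """
    sealed = []
    abandoned = []
    highest = 0
    for name in names:
        stem, _, suffix = name.rpartition(".")
        if not stem or suffix not in ("blk", "open"):
            continue  # not a block file under the assumed naming scheme
        try:
            block_id = int(stem, 16)
        except ValueError:
            continue  # stem is not a hexadecimal block ID
        highest = max(highest, block_id)
        if suffix == "blk":
            sealed.append(block_id)
        else:
            abandoned.append(name)  # candidate for cleanup, never replayed
    return sorted(sealed), abandoned, highest + 1


listing = ["0000000000000002.blk", "0000000000000001.blk", "0000000000000003.open"]
sealed_ids, to_clean, next_block_id = recover(listing)
```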

---
## 9. Garbage Collection
ASL performs garbage collection at **block granularity**.
Rules:
* A block is eligible for deletion if:
  * It is sealed, and
  * It is unreachable from all retained snapshots
* ASL does not perform partial block mutation
* Compaction (if any) rewrites artifacts into new blocks
Block deletion is irreversible.
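
The eligibility rule reduces to a set difference, sketched below. The function name `collectable_blocks` and the shape of `reachable_by_snapshot` (snapshot name mapped to the block IDs it reaches) are hypothetical; how reachability is actually computed, and by which layer, is outside the scope of this sketch. Open blocks never appear in `sealed_ids`, so they are never collected here.

```python
def collectable_blocks(
    sealed_ids: set[int],
    reachable_by_snapshot: dict[str, set[int]],
) -> set[int]:
    """Return sealed blocks that no retained snapshot can reach."""
    reachable = set().union(*reachable_by_snapshot.values())
    return sealed_ids - reachable


retained = {"snap-a": {1, 2}, "snap-b": {2, 3}}
garbage = collectable_blocks({1, 2, 3, 4}, retained)
```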

---
## 10. Non-Goals (Explicit)
ASL explicitly does **not** provide:
* Artifact identity management
* Deduplication decisions
* Provenance interpretation
* Size-class semantics
* Execution semantics
Those concerns belong to CAS, PEL, and higher layers.

---
## 11. Design Summary (Executive)
* One block namespace
* One addressing model
* One read path
* Placement is an optimization
* Immutability is absolute
* Snapshots provide safety
* Size-based packing is a courtesy, not a contract