amduat-api/notes/asl-store-index-adendum-small-vs-large-block.md
2026-01-17 00:19:49 +01:00


# ASL-STORE-INDEX ADDENDUM: Small vs Large Block Handling
---
## 1. Purpose
This addendum defines **store-level policies for handling small and large blocks** in ASL-STORE-INDEX, covering:

* Packing strategies
* Segment allocation rules
* Addressing consistency
* Determinism guarantees

It gives store implementers **concrete operational rules** while keeping the **semantic model (ASL-CORE and ASL-CORE-INDEX) unchanged**.
---
## 2. Definitions
| Term | Meaning |
| ----------------- | --------------------------------------------------------------------------------------------------- |
| **Small block**   | Block whose artifact payload is smaller than the configurable threshold `T_small`.                  |
| **Large block**   | Block whose artifact payload is at least `T_small` (size ≥ `T_small`).                              |
| **Mixed segment** | A segment containing both small and large blocks (generally avoided). |
| **Packing** | Strategy for combining multiple small artifacts into a single block. |
| **BlockID** | Opaque, unique identifier for the block. Addressing rules are identical for small and large blocks. |

**Notes:**

* Small vs large classification is **store-level only** and is transparent to ASL-CORE and the index layers.
* The **threshold `T_small`** is configurable per deployment.
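
The classification rule in the table can be sketched in a few lines. This is a minimal illustration, not part of the specification: the names `BlockClass` and `classify` are hypothetical, and sizes are assumed to be measured in bytes.

```python
from enum import Enum


class BlockClass(Enum):
    SMALL = "small"
    LARGE = "large"


def classify(artifact_len: int, t_small: int) -> BlockClass:
    """Classify a payload by size: strictly below T_small is small, otherwise large."""
    if artifact_len < 0 or t_small <= 0:
        raise ValueError("artifact_len must be >= 0 and t_small must be > 0")
    return BlockClass.SMALL if artifact_len < t_small else BlockClass.LARGE
```

Note that a payload of exactly `T_small` bytes falls on the large side, matching the `≥` in the definition above.
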
---
## 3. Packing Rules
1. **Small blocks may be packed together** to reduce storage overhead and improve I/O efficiency.
   * Multiple small artifacts can reside in a single physical block.
   * The index maps each artifact to a distinct `(BlockID, offset, length)` within the packed block.
2. **Large blocks are never packed with other artifacts**.
   * Each large artifact resides in its own block.
   * This preserves sequential access efficiency and avoids fragmentation.
3. **Mixed segments** are discouraged and **permitted only when unavoidable**.
   * The store may log a warning when mixing occurs.
   * Indexing and addressing remain consistent; artifacts retain their deterministic `(BlockID, offset, length)` mapping.
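
The packing rules can be illustrated with a small in-memory packer. This is a sketch under stated assumptions, not the store's actual algorithm: `pack_artifacts`, `max_block_bytes`, and the `blk-N` identifiers are all hypothetical (the addendum only requires that a BlockID be opaque and unique), and blocks are modeled as plain `bytes` values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


def pack_artifacts(
    artifacts: List[bytes], t_small: int, max_block_bytes: int
) -> Tuple[Dict[str, bytes], List[ArtifactLocation]]:
    """Pack small artifacts together; give every large artifact its own block."""
    blocks: Dict[str, bytes] = {}
    locations: List[ArtifactLocation] = []
    open_id: Optional[str] = None   # packed block currently being filled
    open_buf = bytearray()
    counter = 0

    def next_id() -> str:
        # Hypothetical ID scheme for illustration only.
        nonlocal counter
        block_id = f"blk-{counter}"
        counter += 1
        return block_id

    for payload in artifacts:
        if len(payload) >= t_small:
            # Rule 2: a large artifact is never packed with others.
            block_id = next_id()
            blocks[block_id] = bytes(payload)
            locations.append(ArtifactLocation(block_id, 0, len(payload)))
            continue
        # Rule 1: small artifacts share a block until it would overflow.
        if open_id is None or len(open_buf) + len(payload) > max_block_bytes:
            if open_id is not None:
                blocks[open_id] = bytes(open_buf)
            open_id = next_id()
            open_buf = bytearray()
        locations.append(ArtifactLocation(open_id, len(open_buf), len(payload)))
        open_buf += payload

    if open_id is not None:
        blocks[open_id] = bytes(open_buf)
    return blocks, locations
```

Each returned location is distinct and can be used to slice the original payload back out of its block, whether the block is packed or not.
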
---
## 4. Segment Allocation Rules
1. **Small blocks**:
   * Allocated into segments optimized for packing efficiency.
   * Segment size may be smaller than that of large-block segments to avoid wasted space.
2. **Large blocks**:
   * Allocated into segments optimized for sequential I/O.
   * Each segment may contain a single large block or a small number of large blocks.
3. **Segment sealing and visibility rules**:
   * Same as standard ASL-STORE-INDEX: segments become visible only after seal + log append.
   * Determinism and snapshot safety are unaffected by block size.
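
One way to picture these allocation rules is an allocator that keeps one open segment per block class and exposes blocks only after their segment is sealed and recorded in the log. The sketch below is an assumption-laden toy: `Segment`, `SegmentAllocator`, and the list-based `log` are hypothetical names, and real segment sizing and durability are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    segment_id: int
    kind: str                                   # "small" or "large"
    block_ids: List[str] = field(default_factory=list)
    sealed: bool = False


class SegmentAllocator:
    def __init__(self) -> None:
        self._next_id = 0
        self._open: Dict[str, Segment] = {}     # one open segment per kind
        self._segments: Dict[int, Segment] = {}
        self.log: List[int] = []                # append-only record of sealed segments

    def add_block(self, block_id: str, kind: str) -> int:
        """Route a block to the open segment for its kind, opening one if needed."""
        seg = self._open.get(kind)
        if seg is None:
            seg = Segment(self._next_id, kind)
            self._next_id += 1
            self._open[kind] = seg
            self._segments[seg.segment_id] = seg
        seg.block_ids.append(block_id)
        return seg.segment_id

    def seal(self, kind: str) -> int:
        """Seal the open segment for this kind, then append it to the log."""
        seg = self._open.pop(kind)
        seg.sealed = True
        self.log.append(seg.segment_id)
        return seg.segment_id

    def visible_blocks(self) -> List[str]:
        """Blocks become visible only through sealed, logged segments."""
        visible: List[str] = []
        for segment_id in self.log:
            visible.extend(self._segments[segment_id].block_ids)
        return visible
```

Keeping separate open segments per kind is what avoids mixed segments in the common case; the visibility rule itself does not depend on the kind.
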
---
## 5. Indexing and Addressing
* All blocks, regardless of size, are addressed uniformly:

  ```
  ArtifactLocation = (BlockID, offset, length)
  ```
* Packing small artifacts **does not affect index semantics**:
  * Each artifact retains its unique location.
  * Shadowing, tombstones, and visibility rules are identical to those for large blocks.
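
Because addressing is uniform, a reader needs no knowledge of whether a block is packed. The following sketch assumes blocks are available as an in-memory mapping from BlockID to bytes; `read_artifact` and the example block IDs are hypothetical.

```python
from typing import Dict, NamedTuple


class ArtifactLocation(NamedTuple):
    block_id: str
    offset: int
    length: int


def read_artifact(blocks: Dict[str, bytes], loc: ArtifactLocation) -> bytes:
    """Resolve a location to bytes; the same code path serves packed and unpacked blocks."""
    block = blocks[loc.block_id]
    end = loc.offset + loc.length
    if loc.offset < 0 or loc.length < 0 or end > len(block):
        raise ValueError("location lies outside the block")
    return block[loc.offset:end]


# A packed block holding two small artifacts, and a large artifact alone in its block.
example_blocks = {"packed": b"aabbb", "large": b"X" * 10}
second_small = read_artifact(example_blocks, ArtifactLocation("packed", 2, 3))
whole_large = read_artifact(example_blocks, ArtifactLocation("large", 0, 10))
```
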
---
## 6. Garbage Collection and Retention
1. **Small packed blocks**:
   * GC may reclaim blocks only when **all contained artifacts are unreachable**.
   * Tombstones and snapshot pins apply to individual artifacts within the packed block.
2. **Large blocks**:
   * GC applies per block, as usual.
   * Retention/pinning applies to the whole block.

**Invariant:** GC must never remove bytes still referenced by CURRENT or snapshots, independent of block size.
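
The reclamation rule reduces to a set difference over block IDs. In this sketch, `reachable` is assumed to already contain every location referenced by CURRENT or by any pinned snapshot; how that set is computed is outside the scope of this addendum, and `reclaimable_blocks` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple

Location = Tuple[str, int, int]   # (BlockID, offset, length)


def reclaimable_blocks(
    all_locations: Iterable[Location], reachable: Set[Location]
) -> Set[str]:
    """Return block IDs with no reachable artifact; a single live artifact keeps its block."""
    live_blocks = {block_id for block_id, _, _ in reachable}
    all_blocks = {block_id for block_id, _, _ in all_locations}
    return all_blocks - live_blocks
```

A packed block with one reachable artifact and several unreachable ones is therefore retained whole, which is the cost packing trades for lower per-artifact overhead.
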
---
## 7. Determinism Guarantees
* Deterministic behavior of index lookup, CURRENT reconstruction, and PEL execution is **unchanged** by block size or packing.
* Packing is purely an **implementation optimization** at the store layer.
* All `(BlockID, offset, length)` mappings remain deterministic per snapshot + log.
---
## 8. Configurable Parameters
* `T_small`: threshold for small vs large block classification
* Segment size for small blocks
* Segment size for large blocks
* Maximum artifacts per small packed block

These parameters may be tuned per deployment but do not change ASL-CORE semantics.
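
As an illustration, the four parameters could be grouped into a single validated configuration object. The field names and the numeric values below are placeholders chosen for the example; the addendum does not prescribe names, units, or defaults.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BlockPolicyConfig:
    t_small: int                          # small vs large threshold, assumed bytes
    small_segment_bytes: int              # segment size for small blocks
    large_segment_bytes: int              # segment size for large blocks
    max_artifacts_per_packed_block: int   # cap on artifacts in one packed block

    def __post_init__(self) -> None:
        # Every parameter must be a positive integer.
        for f in fields(self):
            if getattr(self, f.name) <= 0:
                raise ValueError(f"{f.name} must be positive")


# Placeholder values for illustration only.
example_config = BlockPolicyConfig(
    t_small=4096,
    small_segment_bytes=1 << 20,
    large_segment_bytes=1 << 26,
    max_artifacts_per_packed_block=256,
)
```
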
---
## 9. Normative Invariants
1. Artifact locations remain deterministic and immutable.
2. Packed small artifacts are individually addressable via `(BlockID, offset, length)`.
3. Large artifacts are never packed with other artifacts.
4. Segment visibility, snapshot safety, and GC rules are identical to standard store rules.
5. Mixed segments are discouraged but allowed if unavoidable; index semantics remain consistent.
---
## 10. Summary
This addendum formalizes **small vs large block handling** in the store layer:

* **Small artifacts** may be packed together to reduce overhead.
* **Large artifacts** remain separate for efficiency.
* **Addressing and index semantics remain identical** for both sizes.
* **Determinism, snapshot safety, and GC invariants are preserved**.

It provides clear operational guidance for **store implementations**, while leaving **ASL-CORE and index semantics unaltered**.