# ASL-STORE-INDEX ADDENDUM: Small vs Large Block Handling

---

## 1. Purpose

This addendum defines **store-level policies for handling small and large blocks** in ASL-STORE-INDEX, covering:

* Packing strategies
* Segment allocation rules
* Addressing consistency
* Determinism guarantees

It ensures **operational clarity** while keeping the **semantic model (ASL-CORE and ASL-CORE-INDEX) unchanged**.

---

## 2. Definitions

| Term              | Meaning                                                                                             |
| ----------------- | --------------------------------------------------------------------------------------------------- |
| **Small block**   | Block whose artifact payload is smaller than a configurable threshold `T_small`.                    |
| **Large block**   | Block whose artifact payload is at least `T_small` bytes.                                           |
| **Mixed segment** | A segment containing both small and large blocks (generally avoided).                               |
| **Packing**       | Strategy for combining multiple small artifacts into a single block.                                |
| **BlockID**       | Opaque, unique identifier for the block. Addressing rules are identical for small and large blocks. |

**Notes:**

* Small vs large classification is **store-level only**, transparent to ASL-CORE and the index layers.
* The **threshold `T_small`** is configurable per deployment.
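
The classification above can be sketched as a single comparison against the threshold. This is a minimal illustration, not part of the specification: the names `BlockClass` and `classify` are hypothetical, and the threshold is passed in explicitly because the addendum does not prescribe a value.

```python
from enum import Enum


class BlockClass(Enum):
    """Store-level size class; never exposed to ASL-CORE or the index."""
    SMALL = "small"
    LARGE = "large"


def classify(payload_len: int, t_small: int) -> BlockClass:
    """Return SMALL for payloads below t_small bytes, LARGE otherwise."""
    if payload_len < 0 or t_small <= 0:
        raise ValueError("payload_len must be >= 0 and t_small must be > 0")
    # A payload of exactly t_small bytes counts as large (the >= case).
    return BlockClass.SMALL if payload_len < t_small else BlockClass.LARGE
```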

---

## 3. Packing Rules

1. **Small blocks may be packed together** to reduce storage overhead and improve I/O efficiency.

   * Multiple small artifacts can reside in a single physical block.
   * Each artifact is mapped in the index to a distinct `(BlockID, offset, length)` within the packed block.

2. **Large blocks are never packed with other artifacts**.

   * Each large artifact resides in its own block.
   * This ensures sequential access efficiency and avoids fragmentation.

3. **Mixed segments** are discouraged and **permitted only when unavoidable**.

   * The store may emit a warning or log entry when mixing occurs.
   * Indexing and addressing remain consistent; artifacts retain their deterministic `(BlockID, offset, length)` mapping.
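
The first two packing rules can be sketched as follows. This is an illustrative sketch under stated assumptions: `Block`, `place`, and the in-memory `bytearray` storage are hypothetical names and choices, not an interface defined by ASL-STORE-INDEX.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


@dataclass
class Block:
    block_id: str
    data: bytearray = field(default_factory=bytearray)

    def append(self, payload: bytes) -> ArtifactLocation:
        """Append a payload and return where it landed inside this block."""
        loc = ArtifactLocation(self.block_id, len(self.data), len(payload))
        self.data.extend(payload)
        return loc


def place(payload: bytes, t_small: int, open_packed: Block,
          fresh_block_id: str) -> Tuple[ArtifactLocation, Optional[Block]]:
    """Pack small payloads into the open block; give large payloads their own block.

    Returns the artifact's location plus the dedicated block, if one was created.
    """
    if len(payload) < t_small:
        return open_packed.append(payload), None
    # Large payloads are never packed with other artifacts.
    dedicated = Block(fresh_block_id)
    return dedicated.append(payload), dedicated
```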

---

## 4. Segment Allocation Rules

1. **Small blocks**:

   * Allocated into segments optimized for packing efficiency.
   * Segment size may be smaller than that of large-block segments to avoid wasted space.

2. **Large blocks**:

   * Allocated into segments optimized for sequential I/O.
   * Each segment may contain a single large block or a small number of large blocks.

3. **Segment sealing and visibility rules**:

   * Same as standard ASL-STORE-INDEX: segments become visible only after seal + log append.
   * Determinism and snapshot safety are unaffected by block size.
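
A rough sketch of how a store might track these rules per segment is shown below. Everything here is an assumption made for illustration: the `Segment` class, its `kind` strings, the `sealed` and `logged` flags, and the logger name are hypothetical and do not come from ASL-STORE-INDEX.

```python
import logging
from dataclasses import dataclass, field
from typing import List

log = logging.getLogger("asl.store")  # hypothetical logger name


@dataclass
class Segment:
    kind: str                               # "small" or "large": what the segment is tuned for
    block_kinds: List[str] = field(default_factory=list)
    sealed: bool = False                    # set once the segment is sealed
    logged: bool = False                    # set once the seal is appended to the log

    def add_block(self, block_kind: str) -> None:
        if self.sealed:
            raise RuntimeError("cannot add blocks to a sealed segment")
        if block_kind != self.kind:
            # Mixing is permitted but discouraged, so surface it to operators.
            log.warning("mixed segment: %s block placed in %s segment",
                        block_kind, self.kind)
        self.block_kinds.append(block_kind)

    @property
    def is_mixed(self) -> bool:
        """True when the segment holds both small and large blocks."""
        return len(set(self.block_kinds)) > 1

    @property
    def visible(self) -> bool:
        # Visible only after seal + log append, regardless of block size.
        return self.sealed and self.logged
```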

---

## 5. Indexing and Addressing

* All blocks, regardless of size, are addressed uniformly:

```
ArtifactLocation = (BlockID, offset, length)
```

* Packing small artifacts **does not affect index semantics**:

  * Each artifact retains its unique location.
  * Shadowing, tombstones, and visibility rules are identical to those for large blocks.
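
Because the location triple is uniform, a reader needs only one code path for packed and dedicated blocks. The sketch below illustrates this with an in-memory mapping standing in for block storage; `read_artifact` and the `blocks` mapping are hypothetical names used only for this example.

```python
from typing import Mapping, NamedTuple


class ArtifactLocation(NamedTuple):
    block_id: str
    offset: int
    length: int


def read_artifact(blocks: Mapping[str, bytes], loc: ArtifactLocation) -> bytes:
    """Return the bytes at loc; the same slice serves packed and dedicated blocks."""
    block = blocks[loc.block_id]
    end = loc.offset + loc.length
    if loc.offset < 0 or loc.length < 0 or end > len(block):
        raise ValueError(f"{loc} is out of bounds for a block of {len(block)} bytes")
    return block[loc.offset:end]
```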

---

## 6. Garbage Collection and Retention

1. **Small packed blocks**:

   * GC may reclaim blocks only when **all contained artifacts are unreachable**.
   * Tombstones and snapshot pins apply to individual artifacts within the packed block.

2. **Large blocks**:

   * GC applies per block, as usual.
   * Retention/pinning applies to the whole block.
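
The reclamation rule can be sketched as a set difference over block identifiers. This is a simplified illustration: `reclaimable_blocks` is a hypothetical helper, and `reachable` is assumed to already contain every location referenced by CURRENT or by any snapshot.

```python
from typing import Iterable, NamedTuple, Set


class ArtifactLocation(NamedTuple):
    block_id: str
    offset: int
    length: int


def reclaimable_blocks(all_locations: Iterable[ArtifactLocation],
                       reachable: Iterable[ArtifactLocation]) -> Set[str]:
    """Return the blocks in which no artifact is reachable.

    A packed block with even one reachable artifact is kept whole; a large
    block holds a single artifact, so the rule reduces to per-block GC.
    """
    live_blocks = {loc.block_id for loc in reachable}
    return {loc.block_id for loc in all_locations} - live_blocks
```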

**Invariant:** GC must never remove bytes still referenced by CURRENT or snapshots, independent of block size.

---

## 7. Determinism Guarantees

* Deterministic behavior of index lookup, CURRENT reconstruction, and PEL execution is **unchanged** by block size or packing.
* Packing is purely an **implementation optimization** at the store layer.
* All `(BlockID, offset, length)` mappings remain deterministic per snapshot + log.

---

## 8. Configurable Parameters

* `T_small`: threshold for small vs large block classification
* `Segment size for small blocks`
* `Segment size for large blocks`
* `Maximum artifacts per small packed block`
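
One possible way to group these parameters is an immutable configuration object, sketched below. The class name `BlockSizePolicy` and its field names are hypothetical, and no defaults are supplied because the addendum does not specify any values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSizePolicy:
    t_small: int                          # bytes; payloads below this are small
    small_segment_size: int               # bytes
    large_segment_size: int               # bytes
    max_artifacts_per_packed_block: int   # count

    def __post_init__(self) -> None:
        # Reject non-positive values early so a bad deployment config fails fast.
        for name, value in vars(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")
```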

These parameters may be tuned per deployment but do not change ASL-CORE semantics.

---

## 9. Normative Invariants

1. Artifact locations remain deterministic and immutable.
2. Packed small artifacts are individually addressable via `(BlockID, offset, length)`.
3. Large artifacts are never packed with other artifacts.
4. Segment visibility, snapshot safety, and GC rules are identical to standard store rules.
5. Mixed segments are discouraged but allowed if unavoidable; index semantics remain consistent.

---

## 10. Summary

This addendum formalizes **small vs large block handling** in the store layer:

* **Small artifacts** may be packed together to reduce overhead.
* **Large artifacts** remain separate for efficiency.
* **Addressing and index semantics remain identical** for both sizes.
* **Determinism, snapshot safety, and GC invariants are preserved**.

It provides clear operational guidance for **store implementations**, while leaving **ASL-CORE and index semantics unaltered**.