amduat-api/notes/NiklasNotes.md
2026-01-17 00:19:49 +01:00

Designing a CAS on top of ZFS.
Writers are per shard; SIMD, Bloom filters, small + big block separation, global index, local indexes, ZFS snapshots as checkpoints, and a sequential log. The CAS stack also adds an Artifact Storage Layer (ASL) with pluggable algorithms and storage. It adds a 2-byte preamble to the payload, used for hash_id.
Also adds Program Execution Layer (PEL)
PEL defines and executes deterministic DAG programs over an array of inputs and an array of outputs (artifacts). Keeps the hot path short, but PEL execution receipts are used by other layers to project into a traceable graph kernel. Everything with determinism guaranteed per snapshot.
Handles billions of index entries.
Layers on top of PEL can include registries, dedup logic, certification, envelopes, probably also handling ...
Here's a concrete, end-to-end way to think about a
**CAS (Content-Addressable Storage) built on top of ZFS**,
keeping it system-design focused rather than academic.
---
## 1. Core goals & constraints
Before structure, clarify the implicit goals your notes suggest:
* **High write throughput** (per-shard writers, sequential logs)
* **Deduplication by hash** (CAS)
* **Fast existence checks** (Bloom filters)
* **Crash consistency** without complex WAL replay (ZFS snapshots as checkpoints)
* **Scalable index** (global → shard → local)
* **Efficient handling of small vs large objects**
ZFS already gives you:
* Copy-on-write
* Checksumming
* Snapshots
* Compression
* Atomic rename
So your CAS should **avoid re-implementing** what ZFS already does well.
---
## 2. Sharding model
### Hash → shard mapping
* Compute `H = hash(content)`
* Shard = `H[0..k]` (prefix bits)
* Each shard is **independent**
* Separate writer
* Separate log
* Separate local index
* Separate Bloom filter
This avoids cross-shard coordination.
```
hash ──► shard_id ──► shard writer
```
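As a minimal sketch of that routing step (the 32-byte hash width and `SHARD_BITS` are assumed tuning constants, not fixed by the design):

```c
#include <stdint.h>

/* Route a content hash to a shard using its leading bits.
 * SHARD_BITS = 12 gives 4096 independent shards. */
#define SHARD_BITS  12u
#define SHARD_COUNT (1u << SHARD_BITS)

static uint32_t shard_for_hash(const uint8_t hash[32]) {
    /* read the first 16 bits big-endian, keep the top SHARD_BITS */
    uint32_t prefix = ((uint32_t)hash[0] << 8) | (uint32_t)hash[1];
    return prefix >> (16u - SHARD_BITS);
}
```

Because the prefix is taken from a uniform hash, shards stay balanced with no coordination.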
---
## 3. Write path (per shard, sequential)
### 3.1 Append-only log (primary data path)
Each shard has a **sequential log file**:
```
/zfs_pool/cas/shard_XX/log.current
```
Each entry:
```
[hash][size][flags][payload]
```
* Always append
* No in-place updates
* SIMD helps here:
* Batch hashes
* Batch compression
* Batch checksum verification
ZFS likes this pattern.
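A packed C view of that entry framing could look like the following; the 32-byte hash and field widths are illustrative assumptions, not a committed format:

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  hash[32];  /* content hash of the payload */
    uint64_t size;      /* payload length in bytes */
    uint32_t flags;     /* small/large, compression codec, ... */
    /* payload bytes follow immediately after the header */
} LogEntryHeader;
#pragma pack(pop)
```

Fixed framing keeps each append a single sequential write and makes replay trivial.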
---
### 3.2 Small vs large block separation
**Small objects (e.g. < 8–16 KB)**:
* Packed directly into shard log
* Many per ZFS block
* Indexed by (log_id, offset)
**Large objects**:
* Stored as **separate ZFS files**
```
/zfs_pool/cas/shard_XX/objects/HH/HHHH...
```
* Log only stores a reference
* Benefits:
* Avoid log bloat
* ZFS compression & prefetch work better
* Easy GC
This mirrors Git's loose vs packed objects.
---
## 4. Indexing strategy
### 4.1 Local index (per shard)
Minimal, hot, compact:
```
hash → location
location = (log_id, offset) OR (object_path)
```
Implementation options:
* Embedded KV (LMDB / SQLite / custom mmap table)
* Append-only + periodic rebuild
This index is rebuilt from logs if lost.
---
### 4.2 Global index (thin routing layer)
The “global index” does **not** store full locations.
Instead:
```
hash_prefix → shard_id
```
Options:
* Static hash prefix rule (no structure at all)
* Tiny in-memory table if rebalancing is needed
This avoids a scalability bottleneck.
---
## 5. Bloom filters
Each shard maintains a **Bloom filter**:
```
Bloom[shard] : hash → maybe exists
```
Usage:
1. Hash arrives
2. Check Bloom
* **Negative** → definitely new → append
* **Positive** → consult local index
Properties:
* Lives in memory
* Periodically snapshotted to disk
* Rebuilt on startup by replaying logs
You can SIMD-accelerate Bloom probes.
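A scalar sketch of the probe and insert, using the standard double-hashing construction (bit_i = h1 + i*h2 mod m); sizes and hash inputs are illustrative:

```c
#include <stdint.h>

/* Check whether a key may be in the filter. k probe bits are derived
 * from two base hashes via double hashing. */
static int bloom_maybe_contains(const uint8_t *bits, uint64_t bit_count,
                                uint64_t h1, uint64_t h2, unsigned k) {
    for (unsigned i = 0; i < k; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % bit_count;
        if (!(bits[bit >> 3] & (1u << (bit & 7))))
            return 0;               /* definitely not present */
    }
    return 1;                       /* maybe present */
}

/* Set the same k probe bits for a key. */
static void bloom_insert(uint8_t *bits, uint64_t bit_count,
                         uint64_t h1, uint64_t h2, unsigned k) {
    for (unsigned i = 0; i < k; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % bit_count;
        bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}
```

The SIMD version computes all k probe positions in parallel lanes; the data layout stays the same.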
---
## 6. Read path
1. Compute hash
2. Route to shard
3. Check Bloom
* Negative → return “not found”
4. Lookup local index
5. Read from:
* Log offset (small)
* Object file (large)
ZFS ARC will cache hot reads automatically.
---
## 7. Checkpointing with ZFS snapshots
This is where ZFS shines.
### 7.1 Periodic checkpoint
For each shard:
1. Rotate log:
```
log.current → log.N
```
2. Flush local index & Bloom to disk
3. Take ZFS snapshot:
```
zfs snapshot pool/cas@checkpoint_123
```
That snapshot is a **crash-consistent point**:
* Logs
* Index
* Bloom
* Object files
No WAL replay logic needed.
---
### 7.2 Recovery
On restart:
1. Roll back or mount latest snapshot
2. Replay only **log.current**
3. Rebuild Bloom + index incrementally
Fast and simple.
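A sketch of step 2: walk the framing of `log.current` (`[hash][size][flags][payload]`, assuming a fixed 44-byte header) and stop at the first torn tail entry; the caller would rebuild Bloom and index from each complete entry:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Count the complete entries in an in-memory copy of log.current.
 * A torn final write (header or payload cut short) ends the scan. */
static size_t scan_log_entries(const uint8_t *buf, size_t len) {
    size_t off = 0, n = 0;
    while (len - off >= 44) {                   /* room for a header? */
        uint64_t size;
        memcpy(&size, buf + off + 32, sizeof size); /* little-endian host assumed */
        if (len - off - 44 < size)
            break;                              /* torn tail entry: stop */
        off += 44 + (size_t)size;
        n++;
    }
    return n;                                   /* entries recovered */
}
```

Because appends are sequential, everything before the torn tail is valid by construction.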
---
## 8. Garbage collection (important!)
CAS needs GC eventually.
Strategy:
* Maintain **reference counts** or reachability info outside CAS
* Periodic mark phase:
* Identify live hashes
* Sweep:
* Drop unreachable log segments
* Delete unreferenced large object files
ZFS snapshots make this safer:
* GC operates on newest generation
* Old snapshots keep data alive until released
---
## 9. SIMD & batching opportunities
Where SIMD actually helps:
* Hashing (BLAKE3 / SHA-256 batch mode)
* Bloom filter probes
* Compression (ZSTD batch APIs)
* Checksums
Less useful for:
* Random index lookups
* Metadata heavy paths
---
## 10. Why this fits ZFS well
* Append-only logs → minimal fragmentation
* Snapshots → free checkpoints
* Compression → great for small objects
* ARC → automatic read caching
* Checksums → extra corruption detection
You're building **policy and indexing**, not storage primitives.
---
## 11. Open questions (only if you want to go deeper)
If you want to refine this further, the key design pivots are:
* Expected object size distribution?
* Write-heavy vs read-heavy?
* Need for online shard rebalancing?
* Is this single-node or eventually multi-node?
Answering those changes details, not the core architecture.
This is a solid and ambitious architecture. I'll respond in **design-review mode**, not rewriting what you already have, but:
1. **Normalize the mental model**
2. **Call out strengths**
3. **Identify pressure points / risks**
4. **Suggest concrete refinements**
5. **Clarify how layers should contract with each other**
6. **List a few sharp questions that matter next**
---
## 1. Canonical Mental Model (clean layering)
What you are really building is:
```
ZFS (truth & durability)
└── CAS Core (bytes → hashes → blocks)
    ├── Storage Engine (logs, blocks, shards)
    ├── Indexing Engine (global + local, bloom, SIMD)
    ├── Snapshot / Checkpoint Engine
    └── Artifact Storage Layer (ASL)
        ├── Pluggable codecs / layouts / policies
        └── Program Execution Layer (PEL)
            ├── Deterministic DAG execution
            ├── Receipt generation
            ├── Snapshot-bound determinism
            └── Higher Logical Layers
                ├── Registries
                ├── Dedup + certification
                ├── Envelopes / provenance
                └── Hash-mapping / conflict resolution
```
**Key insight:**
Your CAS is *not* just content-addressed storage.
It is a **deterministic computation substrate over immutable data**.
That puts it closer to:
> *“Git + Nix + ZFS + Merkle execution traces”*
---
## 2. Strengths of the Design
### ✅ ZFS snapshots as checkpoints
This is extremely strong:
* Snapshot = **global consistency point**
* Enables:
* Deterministic replay
* Crash recovery without WAL complexity
* Snapshot-scoped guarantees (PEL determinism per snapshot)
This is the right foundation for *billions of entries*.
---
### ✅ Per-shard writers + sequential log
You avoid:
* Lock contention
* Write amplification
Shard-local sequential logs + snapshot sealing is the correct scaling model.
---
### ✅ Small / big block separation
Critical for:
* Tail latency
* Dedup efficiency
* Cache locality
Make sure:
* Small blocks are aggressively packed
* Big blocks avoid unnecessary hash fanout
---
### ✅ Bloom filters + SIMD
This is the only way a global index of this scale is feasible.
One warning: blooms must be **snapshot-scoped or versioned** or you will suffer false-positive accumulation over time.
---
### ✅ PEL as a deterministic DAG executor
This is the most interesting part.
You are essentially doing:
* Immutable inputs
* Deterministic execution
* Receipts as *proofs of execution*
This unlocks:
* Provenance
* Reproducibility
* Certification
* Auditability
---
## 3. Pressure Points / Risks
These are not flaws — they're where designs like this usually break if not nailed early.
---
### ⚠️ 1. Hash ID vs payload preamble (2 bytes)
You mention:
> “adds 2 bytes preamble to payload. Used for hash_id.”
Be careful:
* Any mutation of payload bytes must be:
* Explicitly modeled
* Fully deterministic
* Included in hash domain separation
**Recommendation:**
Treat the preamble as:
```
hash = H(domain || algo_id || payload)
```
Not as “payload mutation”.
Otherwise you will eventually get:
* Hash collisions across algorithms
* Ambiguous artifact identity
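A sketch of that rule, with FNV-1a standing in for the real hash function purely for illustration; the domain tag and function names are assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a update step: absorb n bytes into hash state h. */
static uint64_t fnv1a_update(uint64_t h, const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001B3ULL;      /* FNV prime */
    }
    return h;
}

/* Domain separation: feed domain tag and algo id into the hash state
 * BEFORE the payload, instead of mutating the payload bytes. */
static uint64_t hash_artifact_id(uint16_t algo_id,
                                 const uint8_t *payload, size_t len) {
    uint64_t h = 0xCBF29CE484222325ULL;             /* FNV offset basis */
    static const uint8_t domain[] = "cas-artifact-v1";
    h = fnv1a_update(h, domain, sizeof domain - 1);  /* domain in hash */
    uint8_t algo[2] = { (uint8_t)(algo_id >> 8), (uint8_t)algo_id };
    h = fnv1a_update(h, algo, 2);                    /* algo in hash */
    return fnv1a_update(h, payload, len);
}
```

Same payload, different `algo_id` → different identity, with the payload bytes untouched.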
---
### ⚠️ 2. Global index scale
Billions of entries means:
* You *cannot* do full rebuilds
* You *cannot* afford heavy compaction
You need:
* Append-only index segments
* Snapshot-scoped immutable index layers
* Periodic *merge but never rewrite*
Think: **LSM-tree, but snapshot-aligned and immutable after seal**.
---
### ⚠️ 3. Hash conflicts & algorithm agility
You explicitly mention:
> “duplicate conflicting hash…”
This is good — most systems pretend this won't happen.
You need:
* Hash namespace = `(algo, version)`
* CAS identity ≠ hash alone
* Hash is a *claim*, not truth
Higher layers must be able to say:
> “This artifact is identified by multiple hashes, with provenance.”
---
### ⚠️ 4. Determinism leaks in PEL
PEL determinism is only as strong as:
* Inputs
* Environment
* Execution semantics
Watch out for:
* Non-ordered maps
* Floating point differences
* Parallel execution ordering
* Clock / randomness leaks
**Rule:**
PEL must be:
> *Pure function of snapshot + declared inputs*
No exceptions.
---
## 4. Suggested Refinements
### 4.1 Artifact Storage Layer (ASL)
Make ASL explicitly responsible for:
* Encoding (compression, chunking)
* Layout (inline, small-block, big-block)
* Policy (retention, locality hints)
But **never identity**.
Identity must live above ASL, otherwise:
* Changing compression breaks hashes
* Migration becomes impossible
---
### 4.2 Receipts as First-Class Objects
PEL execution receipts should be:
* Immutable artifacts
* Content-addressed
* Stored in CAS
Receipt should include:
* Input artifact IDs
* Program hash
* Output artifact IDs
* Snapshot ID
* Execution parameters
This lets you:
* Project execution into a graph kernel (as you said)
* Re-verify execution later
* Build trust layers on top
---
### 4.3 Registries as Views, not Stores
Registries should be:
* Snapshot-scoped views
* Declarative mappings
* No ownership of data
Think:
```
registry(snapshot) = projection(CAS + receipts)
```
This avoids:
* Registry corruption
* Version skew
* Inconsistent dedup logic
---
## 5. Contracts Between Layers (important)
### CAS guarantees:
* Immutable storage per snapshot
* Deterministic lookup
* Algorithm-agnostic identity support
### ASL guarantees:
* Reversible encoding
* Stable layout per snapshot
* No semantic meaning
### PEL guarantees:
* Deterministic execution
* Replayable receipts
* Snapshot-bound semantics
### Upper layers guarantee:
* Policy
* Interpretation
* Certification
* Human meaning
If any layer violates its contract, debugging becomes impossible.
---
## 6. Sharp Questions That Matter Next
You don't need to answer these to me — but you **must** answer them in the design.
1. **Is PEL allowed to read CAS metadata directly, or only artifacts?**
2. **Are execution receipts themselves executable inputs?**
3. **How do you garbage-collect unreachable artifacts without breaking provenance?**
4. **Is snapshot identity global or per pool?**
5. **Can two snapshots share artifacts but diverge in interpretation?**
6. **What is the minimal thing required to reproduce an execution on another machine?**
---
## Bottom Line
You are building:
> A **deterministic, snapshot-based, content-addressed computation and provenance system**.
The architecture is coherent and powerful.
The biggest risks are **identity leakage**, **determinism erosion**, and **index lifecycle management**.
If you want, next we can:
* Formalize the **artifact identity model**
* Design the **index layout**
* Specify a **PEL execution spec**
* Or stress-test the design with failure scenarios
Just tell me where you want to go next.
Is PEL allowed to read CAS metadata directly, or only artifacts?
PEL is designed on top of ASL. ASL does not have any metadata yet. It has a configuration. PEL needs to read CURRENT snapshot from CAS/ASL metadata. Anything else?
Are execution receipts themselves executable inputs?
I guess they could be. I mean, PEL operates DAG programs composed of concat, slice, and some other primitives. But I think programs can have side effects via hooks for projections. Input to a PEL program is an artifact or another PEL program. Receipts are artifacts, so yes, they are valid input to PEL programs, but not as executables. That said, looking down into the receipt you will probably find the DAG program that was involved and use that as an executable program in the PEL program DAG.
How do you garbage-collect unreachable artifacts without breaking provenance?
I'm not sure. Tombstones?
Is snapshot identity global or per pool?
I'm not sure...
Can two snapshots share artifacts but diverge in interpretation?
What? How? That is ZFS internals then, right?
What is the minimal thing required to reproduce an execution on another machine?
It is the last checkpoint and the sequential write log.
Internally it would require the governance, trace, provenance, receipts, certificates, federation data, you name it...
Great, this clarifies a lot. I'll answer **in design-decision form**, not just yes/no, and I'll be opinionated where the architecture really needs a hard line.
---
# 1. Can PEL read CAS metadata directly, or only artifacts?
### Short answer
**PEL should read *only* a very small, explicit CAS/ASL metadata surface:**
* **Current snapshot identity**
* **PEL execution context metadata**
* **Nothing else**
### Long answer
You already made the correct move by placing **PEL on top of ASL**. That implies a strict rule:
> **PEL must not depend on physical storage metadata.**
However, **snapshot identity is not “storage metadata”** — it is *execution context*. That is a critical distinction.
### Required metadata PEL may read
PEL must be allowed to read:
1. **Snapshot ID (opaque, stable)**
* Required for determinism
* Required for receipts
* Required to tie executions to checkpoints
2. **PEL execution configuration**
* Hash algorithm set
* Primitive semantics version
* ASL configuration hash
These must be **immutable for the duration of execution**.
### Forbidden metadata
PEL must *not* read:
* Block layout
* Shard IDs
* Bloom filters
* Index internals
* Physical offsets
* ZFS object IDs
If PEL reads these, determinism collapses.
### Rule (important)
> PEL may read **only metadata that is itself snapshot-scoped and immutable**.
If ASL “has no metadata yet”, then:
* Snapshot identity becomes the **first ASL metadata**
* Treat it as a *root execution constant*
---
# 2. Are execution receipts executable inputs?
You are very close to the right mental model already.
### Clear separation you need
There are **three distinct things**:
1. **Artifact** (bytes)
2. **Program** (PEL DAG definition)
3. **Receipt** (proof that a program ran)
Receipts are artifacts, but:
> **Receipts are *data*, not programs.**
### Your instinct is correct
> “looking down into the receipt you will probably find the DAG program that was involved and use that as an executable program”
Yes — but this must be **explicit**, not implicit.
### Correct rule
* A receipt **may contain**:
* Program hash
* Program DAG encoding
* Inputs and outputs
* PEL **must not automatically execute receipt contents**
* A higher layer must *extract* a program from a receipt and submit it intentionally
This prevents:
* Accidental self-executing artifacts
* Execution smuggling
* Provenance corruption
### Design recommendation
Add a concept:
```
ProgramDescriptor = artifact + semantic tag
```
Only `ProgramDescriptor` can be executed, not raw artifacts.
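A minimal C shape for that concept might be (all names and fields are illustrative assumptions):

```c
#include <stdint.h>

/* Only a descriptor explicitly tagged as executable may be submitted
 * to PEL; a raw artifact hash alone is never enough. */
#define PD_FLAG_EXECUTABLE 0x1u

typedef struct {
    uint8_t  program_hash[32];   /* artifact holding the DAG encoding */
    uint32_t semantics_version;  /* PEL primitive semantics version */
    uint32_t flags;              /* e.g. PD_FLAG_EXECUTABLE */
} ProgramDescriptor;

static int pd_is_executable(const ProgramDescriptor *pd) {
    return (pd->flags & PD_FLAG_EXECUTABLE) != 0;
}
```

The tag is set by a higher layer when it intentionally extracts a program from a receipt, which is what prevents execution smuggling.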
---
# 3. Garbage collection without breaking provenance
You are right to hesitate here — this is hard.
### First principle
> **CAS GC must be reachability-based *per snapshot***.
You cannot GC globally without violating history.
### The only viable model
Use **tombstones + snapshot pinning**.
#### Mechanics:
* Each snapshot defines a *root set*
* Artifacts reachable from that snapshot are alive
* Unreachable artifacts become *eligible*, not deleted
* Tombstone records:
* Artifact ID
* Snapshot range where it was last reachable
Deletion happens only when:
```
artifact not reachable from any retained snapshot
```
### Provenance-safe deletion
* Receipts, certificates, and provenance graphs **pin artifacts**
* If provenance is required, the snapshot is retained
* No snapshot = no provenance guarantee
This is the unavoidable tradeoff.
### Rule
> You cannot have both infinite provenance and finite storage without policy.
Expose that policy explicitly.
---
# 4. Is snapshot identity global or per pool?
This one **must be global**.
### Why per-pool breaks things
If snapshot identity is per ZFS pool:
* Federation becomes impossible
* Receipts lose meaning across machines
* Provenance graphs fragment
### Correct model
```
SnapshotID = Hash(global_namespace || pool_id || txg || config)
```
* Opaque
* Comparable
* Serializable
* Verifiable
### ZFS is an implementation detail
ZFS snapshot names are **not sufficient** as identities.
They are *sources* of identity, not the identity itself.
---
# 5. Can two snapshots share artifacts but diverge in interpretation?
This is **not** ZFS internals — it's a semantic-layer issue.
### Yes, they absolutely can.
Example:
* Snapshot A:
* Hash algorithm = SHA256
* Program semantics v1
* Snapshot B:
* Hash algorithm = BLAKE3
* Program semantics v2
Same bytes. Same blocks.
**Different meaning.**
### Interpretation depends on:
* Hash namespace
* ASL config
* PEL semantics
* Registry rules
* Certification policy
So:
> Artifacts are immutable, **meaning is snapshot-scoped**.
This is a feature, not a bug.
---
# 6. Minimal thing required to reproduce an execution elsewhere
Your answer is *almost* correct — but missing one crucial thing.
### Not sufficient
* Last checkpoint
* Sequential write log
That reproduces **bytes**, not **meaning**.
### Minimal reproducible set
You need:
1. **Checkpoint snapshot**
2. **Sequential write log**
3. **PEL semantics version**
4. **ASL configuration**
5. **Hash algorithm registry**
6. **Execution receipts**
Without (3–5), you can replay bytes but not verify correctness.
### Governance, federation, certificates?
Those are:
* **Not required to replay**
* **Required to trust**
This distinction is important.
---
# Final Design Rules (Strong Recommendations)
1. **PEL reads snapshot identity and execution config only**
2. **Receipts are artifacts, never auto-executables**
3. **GC is snapshot-based with tombstones**
4. **Snapshot identity is global and opaque**
5. **Artifacts are bytes; interpretation is snapshot-scoped**
6. **Reproducibility requires semantics, not just storage**
---
If you want, next we can:
* Design the **receipt format**
* Define **PEL determinism rules formally**
* Specify **GC lifecycle states**
* Or design **federation across pools**
Just point to the next pressure point.
Thank you. Receipt format and PEL determinism rules are well defined. Provenance keeps snapshots held by ZFS.
The internal system has a sense of domains and federation, published vs internal provenance.
That helps a lot. Given that **receipt format and PEL determinism are already solid**, the remaining work is mostly about **making the boundaries explicit so the system stays sane at scale and across federation**. I'll focus on *what to lock down*, not re-design what you already have.
---
## 1. Snapshots held by provenance (good — but formalize it)
You said:
> provenance keeps snapshots held by ZFS
This is the **right anchor**, but it needs one explicit rule:
### Rule: Provenance is a *snapshot retention authority*
* Provenance objects must explicitly:
* **Pin snapshot IDs**
* Declare **retention intent** (internal, published, certified, etc.)
* ZFS snapshot retention must be **driven by provenance**, not by CAS reachability alone
Think of provenance as the *root of truth* for GC policy:
```
Provenance → Snapshot → Reachable Artifacts
```
This prevents:
* Accidental deletion of published artifacts
* Silent provenance loss
* Cross-domain retention leaks
---
## 2. Domains: internal vs published provenance
Your mention of:
> domains and federation, published vs internal provenance
is extremely important. This is where many systems get muddy.
### Strong recommendation: three provenance domains
#### 1. Internal domain
* Full receipts
* Full DAGs
* Debug artifacts
* Intermediate hashes
* Possibly unstable semantics
**Never federated by default.**
#### 2. Published domain
* Curated receipts
* Canonical artifacts
* Stable program semantics
* Redacted internals
This is what leaves the system.
#### 3. Certified domain
* Signed receipts
* Verified semantics
* Frozen interpretation
* Long-term retention guarantees
Each domain:
* Pins its own snapshots
* Has its own GC horizon
* Has explicit export rules
---
## 3. Federation: what actually crosses boundaries
Federation must be **artifact- and receipt-centric**, not snapshot-centric.
### What you should federate
* Artifacts (content-addressed)
* Receipts (as artifacts)
* Program descriptors
* Provenance summaries
### What you must *not* federate
* ZFS snapshots
* Physical layout
* Shard topology
* Index internals
Federation should be possible over:
```
CAS + receipts + semantics registry
```
—not over storage state.
---
## 4. Interpretation divergence is now a feature
Given domains + federation, this becomes clean:
* Same artifact hash
* Different domains
* Different meaning
This allows:
* Internal experimentation
* External stability
* Future semantic upgrades
You get **semantic versioning without data migration**.
---
## 5. Minimal invariant set (this is the “do not break” list)
As long as these hold, the system stays coherent:
1. **Artifacts are immutable**
2. **Receipts are immutable**
3. **Snapshots are pinned by provenance**
4. **PEL semantics are versioned and explicit**
5. **Domains control interpretation, not storage**
6. **Federation never depends on physical layout**
Everything else can evolve.
---
## 6. One last subtle pressure point to watch
Because:
* Receipts are artifacts
* Receipts can be inputs to PEL
* Provenance pins snapshots
You can accidentally create:
> **Self-sustaining provenance loops**
Example:
* Receipt A depends on artifact B
* Artifact B is produced by program whose receipt is A
This is logically valid but dangerous for GC and reasoning.
### Recommendation
Add a simple invariant:
> Provenance graphs must be **acyclic at the snapshot level**
Not at the artifact level — at the *snapshot pinning* level.
---
## Bottom line
At this point, your system is not missing features — it's missing **formal guardrails**.
You already have:
* Determinism
* Receipts
* Snapshots
* Federation concepts
What you need now is:
* Explicit domain boundaries
* Provenance-driven retention
* Clear export semantics
If you want, next we can:
* Formalize **domain transitions** (internal → published → certified)
* Design **federation protocols**
* Stress-test snapshot pinning under adversarial workloads
* Or walk through a full lifecycle of an artifact from creation to certification
Just say where you want to apply pressure next.
We are looking into an index for hash to block, offset, and length, where blocks are CAS/ASL-managed blocks sized to match the ZFS blocksize. The blocks and the log should be needed to reach CURRENT. Indexing needs to be done without loading too much into memory; here Bloom filters and sharding come into play.
Good, now we're at the **hard systems part** 🙂
I'll stay focused on **hash → (block, offset, length)** indexing at *billions of entries*, snapshot-aware, ZFS-aligned, and **without large memory residency**.
Ill describe:
1. **Index shape**
2. **Write path**
3. **Read path**
4. **Bloom + sharding strategy**
5. **Snapshot sealing**
6. **Why this works at scale**
No fluff.
---
## 1. Index shape (what you should actually store)
You want the minimal mapping:
```
(hash_id, snapshot_id?) → (block_id, offset, length, flags)
```
### Strong recommendation
Make the index **append-only, segmented, and snapshot-sealed**.
#### Physical layout
* **Index segment** = immutable file
* Each segment covers:
* One shard
* One snapshot range (or single snapshot)
* Segments are written sequentially
Think:
```
/index/
  shard-00/
    seg-000001.idx
    seg-000002.idx
  shard-01/
    seg-000001.idx
```
---
## 2. Sharding strategy (critical)
Shard on **hash prefix**, not snapshot.
Example:
```
shard = hash_id[0..N bits]
```
Properties:
* Uniform distribution
* Deterministic placement
* SIMD-friendly comparisons
* No rebalancing required
Shard count should be:
* Large enough to bound segment size
* Small enough to avoid too many file handles
Rule of thumb:
* 2¹⁰–2¹⁴ shards (1024–16384)
---
## 3. Write path (fast, simple, safe)
### Step-by-step
1. **Artifact written to CAS block**
* Block is ZFS-sized (good)
* Offset + length known
2. **Index entry appended**
* Written to shard-local segment
* Sequential write
* No in-place update
3. **Bloom filter updated**
* Shard-local
* Snapshot-local
4. **Snapshot seal**
* Segment becomes immutable
* Bloom filter frozen
No global coordination required.
---
## 4. Read path (minimal IO, bounded)
### Lookup algorithm
Given `hash_id` and CURRENT snapshot:
1. Compute shard from hash prefix
2. Load shard bloom filter (memory-mapped or cached)
3. If bloom says “no” → done
4. If “maybe”:
* Scan **segments newest → oldest**
* Stop on first match
Because:
* Newer segments shadow older ones
* No deletes (only tombstones if needed)
### Why this is fast
* Bloom filters kill most misses
* Segment scan is sequential
* Segments are small and shard-bounded
* OS page cache does the heavy lifting
---
## 5. Bloom filters: how to not screw this up
### Per-shard, per-segment bloom filters
Do **not** use one giant bloom.
Each index segment has:
* Its own bloom
* Sized to expected entries
* Frozen on snapshot seal
This gives:
* Predictable false-positive rate
* No saturation over time
* Easy to discard when a segment is GC'd
Memory usage stays bounded.
---
## 6. SIMD-friendly index encoding
Inside a segment:
```
[hash_prefix][hash_suffix][block_id][offset][length][flags]
```
Recommendations:
* Fixed-width fields
* Aligned records
* Sorted by hash within segment (optional but powerful)
If sorted:
* Binary search or galloping scan
* SIMD compare of hash lanes
If unsorted:
* SIMD scan still works, just more comparisons
Sorting cost is paid once at seal time.
---
## 7. Snapshot awareness (CURRENT vs history)
You said:
> The blocks and the log should be needed to reach CURRENT
Correct.
### Important rule
The index **does not need to know ZFS history**.
It only needs:
* Segment creation snapshot
* Segment seal snapshot
When resolving CURRENT:
* Ignore segments sealed after CURRENT
* Include all segments ≤ CURRENT
This is cheap:
* Just a snapshot ID comparison
* No traversal
---
## 8. Memory discipline (how this stays small)
You never load:
* Full index
* Full shard
* Full bloom universe
You load:
* Bloom for a shard (small)
* A few recent segments
* Possibly memory-map older ones
Worst-case memory:
```
O(shards × bloom_size) + O(active_segments)
```
Bounded. Predictable. Cache-friendly.
---
## 9. Tombstones (only if you really need them)
If you must support:
* Hash redefinition
* Conflict resolution
* Artifact invalidation
Then:
* Tombstone is just another index entry
* Higher precedence than data entries
* Stored in the same segment stream
No special machinery.
---
## 10. Why this design survives billions of entries
Because:
* Append-only
* No rewrites
* No global locks
* Snapshot-aligned immutability
* ZFS handles durability and caching
* Bloom filters kill negative lookups
* Shards bound working sets
This is essentially:
> **LSM-tree semantics, but snapshot-native and immutable-first**
Which matches your system philosophy perfectly.
---
## One final sharp recommendation
**Do not try to make the index “smart”.**
Make it:
* Dumb
* Immutable
* Sequential
* Snapshot-scoped
All intelligence belongs:
* In PEL
* In provenance
* In registries
---
If you want next, we can:
* Pick exact record layouts (bytes-on-disk)
* Tune bloom parameters for your scale
* Walk through crash recovery
* Stress-test CURRENT resolution under heavy churn
Say the word.
Pick exact record layouts. bytes-on-disk mapped to C structures (used for c lib. spawning tools and api-frontends)
Great — I'll be concrete and opinionated. Below is a **bytes-on-disk index format** that:
* Maps **1:1 to C structs**
* Is **SIMD-friendly**
* Is **append-only**
* Works with **sharding + bloom filters**
* Is **snapshot-aware without embedding ZFS internals**
* Scales to **billions of entries**
* Avoids pointer chasing and heap allocation
Ill give:
1. High-level layout
2. Segment file structure
3. Exact record structs
4. Bloom filter layout
5. Snapshot sealing metadata
6. Alignment + portability rules
7. Example lookup flow in C terms
---
# 1. High-level layout
Each **index segment file** is immutable after sealing.
```
+----------------------+
| SegmentHeader        |
+----------------------+
| BloomFilter          |
+----------------------+
| IndexRecord[]        |  (append-only)
+----------------------+
| SegmentFooter        |
+----------------------+
```
One segment:
* One shard
* One snapshot interval
* Sequentially written
* mmap-friendly
---
# 2. Segment header (fixed, small)
### Design goals
* Fast validation
* Versioning
* Snapshot bounds
* No pointers
```c
#pragma pack(push, 1)
typedef struct {
uint64_t magic; // e.g. 0x434153494E445831 ("CASINDX1")
uint16_t version; // format version
uint16_t shard_id; // hash prefix shard
uint32_t header_size; // bytes
uint64_t snapshot_min; // inclusive
uint64_t snapshot_max; // inclusive (== min until sealed)
uint64_t record_count; // filled at seal
uint64_t records_offset; // byte offset to IndexRecord[]
uint64_t bloom_offset; // byte offset
uint64_t bloom_size; // bytes
uint64_t flags; // sealed, sorted, has_tombstones, etc.
} SegmentHeader;
#pragma pack(pop)
```
Notes:
* `snapshot_max == snapshot_min` while writing
* On seal, snapshot_max is finalized
* Header never changes size
---
# 3. Index record layout (core of everything)
This is the **hot path**, so it must be perfect.
### Hash strategy
Split hash into:
* **Prefix** → shard
* **Suffix** → stored inline
Assume:
* Hash = 256 bits
* Shard uses first 12 bits
* Remaining 244 bits stored
### Record struct
```c
#pragma pack(push, 1)
typedef struct {
uint64_t hash_hi; // hash[12..75]
uint64_t hash_mid; // hash[76..139]
uint64_t hash_lo; // hash[140..203]
uint32_t hash_tail; // hash[204..235] (remaining bits, padded)
uint64_t block_id; // CAS/ASL block identifier
uint32_t offset; // offset within block
uint32_t length; // length in bytes
uint32_t flags; // data, tombstone, algo_id, etc.
uint32_t reserved; // alignment / future use
} IndexRecord;
#pragma pack(pop)
```
### Properties
* **64 bytes total** (cache-line aligned)
* Fixed-width
* SIMD-comparable hash lanes
* No pointers
* No varints
* Works with `memcmp`, AVX, NEON
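A minimal equality check over the fixed lanes; this sketch assumes the 32-byte hash sits at the start of the record:

```c
#include <stdint.h>
#include <string.h>

/* Compare the stored hash lanes against a 32-byte query hash.
 * memcmp compiles to wide compares; an AVX/NEON version can compare
 * all four 64-bit lanes in one or two instructions. */
static int hash_equal(const void *record, const uint8_t query[32]) {
    return memcmp(record, query, 32) == 0;
}
```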
---
# 4. Flags definition (important)
```c
#define IDX_FLAG_DATA 0x00000001
#define IDX_FLAG_TOMBSTONE 0x00000002
#define IDX_FLAG_INLINE 0x00000004 // small payload embedded elsewhere
#define IDX_FLAG_ALGO_SHIFT 8 // upper bits for hash algo id
```
Hash algorithm ID lives in flags — **not in payload**.
---
# 5. Bloom filter layout
Bloom is **segment-local**, frozen at seal.
### Header
```c
#pragma pack(push, 1)
typedef struct {
uint64_t hash_seed; // per-segment
uint32_t k; // number of hash functions
uint32_t bits_per_entry; // tuning knob
uint64_t bit_count; // total bits
} BloomHeader;
#pragma pack(pop)
```
### Bit array
Immediately follows header:
```c
uint8_t bloom_bits[];
```
Bloom filter offset and size are in `SegmentHeader`.
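As a sketch of the probe side, here is a classic double-hashing Bloom filter (the Kirsch-Mitzenmacher scheme); `h1` and `h2` stand in for two independent 64-bit hashes derived from the record hash and `hash_seed`, and `k`/`bit_count` come from `BloomHeader`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Set the k probe bits for one entry. */
static void bloom_set(uint8_t *bits, uint64_t bit_count, uint32_t k,
                      uint64_t h1, uint64_t h2) {
    for (uint32_t i = 0; i < k; ++i) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % bit_count;
        bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}

/* Probe: false means definitely absent; true means possibly present. */
static bool bloom_maybe(const uint8_t *bits, uint64_t bit_count, uint32_t k,
                        uint64_t h1, uint64_t h2) {
    for (uint32_t i = 0; i < k; ++i) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % bit_count;
        if (!(bits[bit >> 3] & (1u << (bit & 7))))
            return false;
    }
    return true;
}
```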
---
# 6. Segment footer (integrity + seal)
```c
#pragma pack(push, 1)
typedef struct {
uint64_t crc64; // over header+bloom+records
uint64_t seal_snapshot; // snapshot at seal time
uint64_t seal_time_ns; // optional
} SegmentFooter;
#pragma pack(pop)
```
Footer allows:
* Corruption detection
* Fast rejection
* Seal verification
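A sketch of the integrity check behind the footer, using a bitwise CRC-64 with the reflected ECMA polynomial; a real build would likely use a table-driven or hardware-assisted CRC:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-64 (reflected ECMA-182 polynomial, as in CRC-64/XZ).
   Slow but dependency-free; fine for a spec sketch. */
static uint64_t crc64(const uint8_t *p, size_t n) {
    uint64_t crc = ~0ull;
    for (size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xC96C5795D7870F42ull : 0);
    }
    return ~crc;
}
```

At open time, computing `crc64` over header + bloom + records and comparing it with the footer value gives the fast-rejection path.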
---
# 7. Alignment & portability rules
**Non-negotiable rules:**
1. All structs are **packed**
2. All integers are **little-endian**
3. No pointers
4. No platform-dependent types
5. All offsets are file-relative
This makes:
* mmap safe
* Cross-language bindings trivial
* Zero-copy reads possible
---
# 8. Memory-mapped usage model (C)
Typical read path:
```c
const IndexRecord *
segment_lookup(const char *path, const uint8_t *hash)
{
    const SegmentHeader *hdr = mmap_segment(path);

    if (!bloom_maybe(hdr, hash))
        return NULL;                      /* definitely not present */

    const IndexRecord *records = (const IndexRecord *)(
        (const uint8_t *)hdr + hdr->records_offset);

    /* Unsorted segment: scan newest-to-oldest so the latest entry wins. */
    for (int64_t i = (int64_t)hdr->record_count - 1; i >= 0; --i) {
        if (hash_equal(&records[i], hash)) {
            if (records[i].flags & IDX_FLAG_TOMBSTONE)
                return NULL;              /* deleted */
            return &records[i];
        }
    }
    return NULL;
}
```
With sorting:
* Binary search
* Galloping search
* SIMD blocks
---
# 9. Optional: Sorted segments (recommended)
At seal time:
* Sort records by `(hash_suffix)`
* Set `SEGMENT_FLAG_SORTED`
This gives:
* Faster lookup
* Smaller scan window
* Better cache locality
Sorting cost is amortized and paid once.
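The sorted lookup can be sketched as a lexicographic binary search over the hash lanes; the `HashSuffix` type here is illustrative, standing in for the hash lanes inside `IndexRecord`:

```c
#include <stdint.h>

typedef struct { uint64_t lane[4]; } HashSuffix;

/* Lexicographic -1/0/+1 compare over the hash lanes. */
static int suffix_cmp(const HashSuffix *a, const HashSuffix *b) {
    for (int i = 0; i < 4; ++i) {
        if (a->lane[i] != b->lane[i])
            return a->lane[i] < b->lane[i] ? -1 : 1;
    }
    return 0;
}

/* Returns the index of a matching suffix in a sorted array, or -1. */
static int64_t suffix_bsearch(const HashSuffix *recs, uint64_t n,
                              const HashSuffix *key) {
    int64_t lo = 0, hi = (int64_t)n - 1;
    while (lo <= hi) {
        int64_t mid = lo + (hi - lo) / 2;
        int c = suffix_cmp(&recs[mid], key);
        if (c == 0) return mid;
        if (c < 0)  lo = mid + 1;
        else        hi = mid - 1;
    }
    return -1;
}
```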
---
# 10. Why this layout works long-term
* Immutable
* Append-only
* Snapshot-scoped
* SIMD-friendly
* mmap-friendly
* No schema migration pain
* Easy to generate bindings (Rust, Go, Python)
This is the kind of layout that survives **decades**, not just versions.
---
## If you want next
We can:
* Tune record size vs hash width
* Design **small-block inline payload segments**
* Add **prefix-compressed variants**
* Design **cross-segment merge logic**
* Generate **actual C headers + helpers**
Tell me where to zoom in next.
I think we may have discussed block size, but not my original intent, which is that ASL artifacts of different sizes are kept apart as big and small.
Good catch — this is an important distinction, and it affects **block layout, indexing, and GC**. You're not just talking about ZFS blocksize anymore, you're talking about **artifact size classes inside ASL**, with *different physical handling*.
I'll realign the design to your **original intent**:
> **ASL artifacts of different sizes are kept apart as small and big**, even though both live on ZFS and are indexed uniformly.
---
## 1. Clarify the layers (important reset)
### CAS
* Logical identity (hash → artifact)
* Index (hash → location)
* Snapshot semantics
### ASL
* **Physical artifact storage**
* Size-class separation
* Layout & packing
* Block management
ZFS blocksize is a *substrate detail*.
ASL defines **artifact blocks**, which may map onto one or more ZFS blocks.
---
## 2. Size classes (explicit, non-negotiable)
You want at least two classes:
### Small artifacts
* Typical sizes: bytes → tens of KB
* Stored **packed**
* High density
* High fan-in
* Many per block
### Big artifacts
* Typical sizes: hundreds of KB → GB
* Stored **directly**
* Minimal fragmentation
* Streaming-friendly
Make this explicit in ASL config:
```c
typedef struct {
uint32_t small_max_size; // e.g. 64 KB
uint32_t small_block_size; // e.g. 14 MB
uint32_t big_block_size; // e.g. ZFS recordsize or multiples
} ASLConfig;
```
---
## 3. Physical layout by size class
### 3.1 Small artifact blocks
Small artifacts are packed into **ASL small blocks**.
```
+------------------------+
| SmallBlockHeader |
+------------------------+
| ArtifactEntry[] |
+------------------------+
| Free space |
+------------------------+
```
Each artifact:
* Stored contiguously
* Offset + length identifies it
* No pointer chasing
#### Header
```c
#pragma pack(push, 1)
typedef struct {
uint64_t block_id;
uint32_t used_bytes;
uint32_t artifact_count;
} SmallBlockHeader;
#pragma pack(pop)
```
Artifacts are appended sequentially.
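The append path for an open small block can be sketched as follows; buffer management and durability are elided, and the returned offset is what later lands in the index record:

```c
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
typedef struct {
    uint64_t block_id;
    uint32_t used_bytes;
    uint32_t artifact_count;
} SmallBlockHeader;
#pragma pack(pop)

/* Append one artifact to an in-memory open block.
   Returns the artifact's offset within the block, or -1 if it does
   not fit (in which case: seal this block and open a new one). */
static int64_t small_block_append(uint8_t *block, uint32_t block_size,
                                  const uint8_t *artifact, uint32_t len) {
    SmallBlockHeader *hdr = (SmallBlockHeader *)block;
    uint32_t off = (uint32_t)sizeof *hdr + hdr->used_bytes;
    if ((uint64_t)off + len > block_size)
        return -1;
    memcpy(block + off, artifact, len);
    hdr->used_bytes += len;
    hdr->artifact_count += 1;
    return off;
}
```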
---
### 3.2 Big artifact blocks
Big artifacts:
* Stored one-per-block or as extent chains
* No packing
* No fragmentation games
Options:
* Single ASL block == one artifact
* Or extent list if very large
Header example:
```c
#pragma pack(push, 1)
typedef struct {
uint64_t block_id;
uint64_t artifact_size;
uint64_t flags;
} BigBlockHeader;
#pragma pack(pop)
```
Payload follows immediately.
---
## 4. Index mapping (now corrected)
Index record must reflect **ASL-level blocks**, not ZFS blocks.
So your index record fields:
```
hash → (asl_block_id, offset, length, flags)
```
### Flags indicate size class
```c
#define IDX_FLAG_SMALL 0x00000001
#define IDX_FLAG_BIG 0x00000002
```
This tells the reader:
* Whether to go through small-block unpacking
* Or treat the block as a direct artifact
No ambiguity.
---
## 5. Why separation matters (practical wins)
### Small artifacts
* Better cache locality
* Lower metadata overhead
* Faster dedup for tiny items
* Lower index pressure
### Big artifacts
* No read amplification
* Streaming-friendly
* Better ZFS prefetch
* Avoids hot small-block churn
This matches real-world workloads extremely well.
---
## 6. GC implications (size-aware)
GC works at **ASL block granularity**:
* Big blocks:
* Drop block when unreferenced
* Small blocks:
* Use:
* Reference counting per artifact
* Or evacuate live artifacts to new blocks (copy GC)
* Or tombstone + rebuild during compaction
**Important**: small-block GC is *optional* early on.
You can defer it until fragmentation hurts.
---
## 7. Snapshot interaction (no surprises)
* Index entries are snapshot-scoped
* ASL blocks are immutable once sealed
* Small blocks fill, seal, move on
* Big blocks seal immediately
Snapshots pin blocks naturally.
---
## 8. Revised index record (final form)
Here's the **corrected** index record given size classes:
```c
#pragma pack(push, 1)
typedef struct {
    uint64_t hash_hi;
    uint64_t hash_mid;
    uint64_t hash_lo;
    uint64_t hash_tail;    // remaining suffix bits, zero-padded
    uint64_t asl_block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t flags;        // SMALL / BIG / TOMBSTONE / ALGO
    uint32_t reserved[3];  // pads record to 64 bytes
} IndexRecord;
#pragma pack(pop)
```
Same as before — but semantics are now correct.
---
## 9. One important rule to write down
> **CAS never reasons about size classes.**
> **ASL never reasons about identity.**
Size class is a *storage concern*, exposed only via flags for correct decoding.
---
## 10. Where to go next (useful follow-ups)
If you want, we can:
* Design **small-block packing format** in more detail
* Define **ASL block ID allocation**
* Design **small-block compaction / evacuation**
* Tune **size thresholds** using workload assumptions
* Design **zero-copy reads** for small artifacts
This clarification was important — your instinct was right to pause here.
Examine the model.
Size class to me was something ASL did when creating and sending blocks. It is just ASL packaging smaller items in one box, and preferably not mixing small items in with the bigger items, out of courtesy.
That clarification is important — and it actually *simplifies* the model in a good way. Let me examine it **as you intend it**, not as a generic CAS pattern.
You are **not** introducing size classes as a semantic concept.
You are introducing **courtesy-based physical packaging** inside ASL.
That's a subtle but powerful distinction.
---
## 1. Restating your intent (to check alignment)
What you are saying is effectively:
> *ASL produces blocks.
> ASL prefers to package small artifacts together and not mix them with big ones.
> Size is a heuristic for packaging, not an exposed concept.*
Key consequences:
* CAS does **not** know about size classes
* Index does **not** encode size class semantics
* PEL does **not** care
* Size affects **only how ASL fills blocks**
If this matches your intent — then the earlier “size class flags” were **too strong**.
I agree with you.
---
## 2. Correct model after correction
### What size affects
* Block fill strategy
* Block sealing policy
* Write amplification
* Cache locality
### What size does *not* affect
* Identity
* Index semantics
* Read logic
* Provenance
* Snapshot behavior
This keeps the system clean.
---
## 3. ASL block abstraction (single kind of block)
You should think in terms of **one ASL block type**, with **soft heuristics**:
```
ASL Block
├── block_id
├── payload_bytes
├── internal layout (opaque to CAS)
└── sealed or open
```
Inside ASL:
* Some blocks will naturally contain many small artifacts
* Some blocks will contain one large artifact
* CAS does not know or care
This is exactly like object packing in Git packfiles — *but snapshot-native*.
---
## 4. Index stays simple (good)
Your original index mapping remains correct and minimal:
```
hash → (asl_block_id, offset, length)
```
No flags required for size.
That's a **strong design win**:
* Fewer invariants
* Fewer migrations
* Less coupling
---
## 5. Reads remain uniform
Read path:
1. Lookup hash
2. Get `(block_id, offset, length)`
3. Ask ASL: “give me bytes”
4. ASL does whatever it needs internally
PEL and CAS never branch on size.
---
## 6. Writes: where size actually matters
Size only influences **block selection**:
* If artifact is “small enough”:
* Try current open block
* If artifact is “large”:
* Open a fresh block
* Write it
* Seal immediately
But this rule is:
* **Local**
* **Heuristic**
* **Not persistent knowledge**
You can change it later without breaking anything.
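As a sketch, the whole selection rule fits in a few lines; the names and the threshold are illustrative, and nothing here is persisted:

```c
#include <stdbool.h>
#include <stdint.h>

/* Allocation-time heuristic only: decide whether an artifact should go
   into the current open block or trigger a fresh one. */
typedef struct {
    uint64_t capacity;
    uint64_t used;
} OpenBlock;

static bool fits_open_block(const OpenBlock *b, uint64_t artifact_size,
                            uint64_t small_threshold) {
    if (artifact_size > small_threshold)
        return false;                 /* large: open a fresh block, seal it */
    return b->used + artifact_size <= b->capacity;
}
```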
---
## 7. GC implications (still clean)
Because:
* Blocks are immutable once sealed
* Index defines reachability
GC can operate purely on:
* Block reachability
* Snapshot pinning
Small-block compaction is:
* An optimization
* Entirely inside ASL
* Invisible to CAS/PEL
No semantic coupling introduced.
---
## 8. One thing to be explicit about (important)
Even though size is *not* a semantic concept, you still want:
> **ASL blocks must never be partially mutable after seal.**
Otherwise:
* Offsets become unstable
* Index breaks
* Receipts lose meaning
As long as:
* ASL blocks are append-only then sealed
* No in-place mutation after seal
You're safe.
---
## 9. Summary: model sanity check
Your corrected model is:
* ✔ Size is a **packaging courtesy**
* ✔ ASL hides size behavior
* ✔ CAS index remains minimal
* ✔ PEL is unaffected
* ✔ Future heuristics are swappable
This is **better** than a hard size-class model.
---
## 10. One gentle recommendation (optional)
Even if size is not semantic, consider storing **block hints** internally in ASL:
```c
enum ASLBlockHint {
ASL_HINT_DENSE,
ASL_HINT_SINGLETON,
};
```
This never leaves ASL, but helps:
* Compaction
* GC heuristics
* Read-ahead tuning
No one else needs to know.
---
### Bottom line
Your instinct is correct:
**size belongs to ASL behavior, not system meaning**.
The model is coherent, simpler, and more evolvable this way.
If you want, next we can:
* Stress-test this under mixed workloads
* Design ASL block ID allocation
* Walk through crash recovery of open blocks
* Or define ASL compaction strategies
Just point the spotlight.
Design ASL block ID allocation. I presume I am right that having two concrete ZFS datasets attached to the system would have an impact. What do you say?
Good question — **yes, having two concrete ZFS datasets absolutely has impact**, and if you lean into that fact instead of abstracting it away, the design gets *simpler and safer*.
I'll go step-by-step and keep it grounded in how ZFS actually behaves, while preserving your ASL abstraction.
---
# 1. What an ASL Block ID really is (first principle)
An **ASL block ID is not a physical address**.
It is a **stable, opaque capability** that:
* Identifies a sealed ASL block
* Survives snapshots
* Is valid across restarts
* Is independent of index layout
* Can be resolved via ASL alone
So:
> **Block ID ≠ offset, ≠ object number, ≠ filename**
Block ID is a *name*, not a location.
---
# 2. Why ZFS datasets matter (you're right)
ZFS datasets give you:
* Independent recordsize
* Independent compression
* Independent quotas
* Independent snapshot lifecycles
* Independent IO behavior
That maps *perfectly* to your packaging courtesy idea.
So yes: **two datasets are a feature**, not a complication.
---
# 3. Recommended dataset topology
At minimum:
```
pool/
├── asl-small/ (dense packing, many artifacts per block)
└── asl-big/ (mostly one artifact per block)
```
Optionally later:
```
├── asl-volatile/
├── asl-published/
└── asl-certified/
```
But start with two.
### Dataset tuning examples
* `asl-small`
* smaller recordsize (e.g. 64K–256K)
* aggressive compression
* `asl-big`
* larger recordsize (e.g. 1M–8M)
* streaming-friendly
ASL decides *which dataset to write to* — CAS never sees this.
---
# 4. Block ID structure (this is the key)
### Strong recommendation: **structured, but opaque**
```c
typedef struct {
uint64_t dataset_id; // small / big / future
uint64_t sequence; // monotonically increasing
} ASLBlockID;
```
Total: **128 bits** (or hash it down if you want).
### Why this works
* No global coordination
* No contention
* No reuse
* Easy sharding
* Easy federation
---
# 5. Dataset ID (small, fixed namespace)
Dataset ID is **not dynamic**.
Example:
```c
enum ASLDatasetID {
ASL_DS_SMALL = 1,
ASL_DS_BIG = 2
};
```
This ID is:
* Compiled into ASL
* Part of the block ID
* Stable forever
Changing dataset tuning does **not** change dataset ID.
---
# 6. Sequence allocation (simple and robust)
Each dataset maintains its own allocator:
```
asl-small: next_block_seq
asl-big: next_block_seq
```
Allocation rules:
* Monotonic
* Append-only
* Persisted in dataset-local metadata
* Advanced only when block is created
### On crash
* You may leak an ID
* You never reuse an ID
* That's fine
---
# 7. Mapping Block ID → storage (ASL-only)
ASL resolves block IDs like this:
```
(dataset_id, sequence)
dataset root
object path or ZFS object
```
Example mapping (implementation choice):
```
asl-small/blocks/0000/0000/00012345.blk
asl-big/blocks/0000/0000/00098765.blk
```
Or:
* ZFS object per block
* ZVOL
* File-backed objects
CAS does not care.
---
# 8. Snapshot interaction (clean)
Because:
* Blocks are immutable after seal
* Stored inside snapshot-aware datasets
Then:
* Snapshot pins blocks automatically
* Index references remain valid
* Provenance pinning works naturally
No extra machinery required.
---
# 9. Federation implications (important)
Block IDs **must not encode pool identity**.
Why:
* Federation means blocks may move
* Blocks may be replicated
* Pool names are local
If you need global uniqueness:
```
GlobalBlockID = hash(cluster_id || ASLBlockID)
```
But internally:
* `(dataset_id, sequence)` is perfect
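A sketch of the derivation; FNV-1a is used purely for illustration, and a real system would use its configured hash with a defined little-endian serialization:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, 64-bit: illustrative stand-in for the real hash. */
static uint64_t fnv1a64(const uint8_t *p, size_t n) {
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* GlobalBlockID = hash(cluster_id || ASLBlockID). Host byte order is
   used here for brevity; the format rules call for little-endian. */
static uint64_t global_block_id(uint64_t cluster_id,
                                uint64_t dataset_id, uint64_t sequence) {
    uint8_t buf[24];
    memcpy(buf,      &cluster_id, sizeof cluster_id);
    memcpy(buf + 8,  &dataset_id, sizeof dataset_id);
    memcpy(buf + 16, &sequence,   sizeof sequence);
    return fnv1a64(buf, sizeof buf);
}
```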
---
# 10. Why two datasets improves the system
You get:
* Independent IO characteristics
* Natural separation without semantics
* Easier tuning
* Less fragmentation
* Clear GC boundaries
And crucially:
> **You can change the heuristic later without breaking block identity.**
---
# 11. What *not* to do (anti-patterns)
Avoid:
* Encoding artifact size into block ID
* Encoding snapshot ID into block ID
* Encoding physical offsets
* Using filenames as identity
* Reusing block IDs
All of these break invariants later.
---
# 12. Final sanity check
Your model now looks like this:
* ASL:
* Chooses dataset
* Allocates block ID
* Writes block
* Seals block
* CAS:
* Stores `(block_id, offset, length)`
* PEL:
* Never sees block IDs
* ZFS:
* Handles durability, snapshots, caching
This is **clean, layered, and future-proof**.
---
## If you want next
We can:
* Design ASL open-block lifecycle
* Design crash recovery for partially written blocks
* Decide file vs object vs zvol per block
* Design block compaction / evacuation
* Stress-test block ID growth and limits
Just say where to drill next.
Hmm... Small and big blocks should be addressed the same way. Can they maybe share storage more closely, somehow?
Yes — and your instinct is right.
If **small and big blocks are addressed the same way**, then the earlier “dataset-per-size” framing is **too strong**. Let's correct it without losing the practical benefits.
What you actually want is:
> **One ASL block namespace, one addressing scheme, one resolution path —
> with storage *locality hints*, not semantic separation.**
That is both cleaner *and* more powerful.
---
## 1. Re-establish the invariant (this is the anchor)
**Invariant you want to preserve:**
* `ASLBlockID` is opaque
* Resolving a block ID does not require knowing whether it was “small” or “big”
* CAS and PEL never branch on block kind
* Storage choice is an ASL internal decision
So:
> **Small vs big is not part of the block's identity.**
Good. That rules out encoding size class into the ID.
---
## 2. The mistake to avoid
The mistake would be:
* Two datasets ⇒ two namespaces
* Two datasets ⇒ different resolution logic
* Two datasets ⇒ semantic leakage
You don't want that.
---
## 3. The correct model: one namespace, shared storage, hinted placement
### One block ID type
```c
typedef uint64_t ASLBlockID;
```
Monotonic, opaque, never reused.
No dataset ID.
No size bit.
No encoding tricks.
---
## 4. How blocks “share storage closer”
Instead of **separate datasets**, use **one dataset** with **structured locality**.
Example layout:
```
asl/
├── blocks/
│ ├── open/
│ ├── sealed/
│ │ ├── 00/
│ │ ├── 01/
│ │ └── ff/
```
Block ID → directory sharding only.
Now comes the important part.
---
## 5. Locality without semantic separation (the key trick)
When ASL allocates a block, it supplies **placement hints**, not identity changes.
### Examples of hints
* ZFS recordsize hint
* Compression hint
* Write pattern hint
* Expected fill density hint
These hints are **best-effort** and **local**.
They may be implemented as:
* ZFS properties on subdirectories
* ZFS object creation flags
* Different write sizes
* Different block finalization strategies
But the block ID and path resolution remain identical.
---
## 6. Concrete example (single dataset, different behavior)
```
asl/
├── blocks/
│ ├── dense/ ← preferred for small artifacts
│ ├── sparse/ ← preferred for large artifacts
```
Both live in the same dataset.
Resolution logic:
```
block_id → hash → directory → file
```
No “if small then …” at read time.
Only at **allocation time**:
* ASL *prefers* `dense/` if current block has room
* ASL *prefers* `sparse/` if artifact is large
But if tomorrow you change the heuristic:
* Old blocks still work
* IDs don't change
* Index doesn't change
That's exactly what you want.
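The uniform resolution step can be sketched as follows; `splitmix64` and the exact path scheme are illustrative, and note that placement never appears in it:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in mixer so consecutive IDs spread across directory shards. */
static uint64_t splitmix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ull;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

/* block_id -> hash -> directory -> file; no "if small then ..." branch. */
static void block_path(uint64_t block_id, char *buf, size_t n) {
    unsigned shard = (unsigned)(splitmix64(block_id) >> 56);  /* 256-way fan-out */
    snprintf(buf, n, "asl/blocks/sealed/%02x/%016llx.blk",
             shard, (unsigned long long)block_id);
}
```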
---
## 7. ZFS tuning without semantic coupling
You can still tune ZFS:
* Set different recordsize on `dense/` vs `sparse/` directories
* Set different compression
* Set different prefetch behavior
ZFS allows property inheritance at directory level.
So you get:
* Physical differentiation
* Logical uniformity
---
## 8. GC and compaction stay simple
Because:
* Blocks are immutable after seal
* Block ID resolution is uniform
GC can:
* Walk sealed blocks
* Check reachability
* Delete whole blocks
No special cases.
Compaction:
* Can evacuate artifacts into new blocks
* Old block ID becomes unreachable
* Index updates do the rest
---
## 9. Why this is better than two datasets
Two datasets:
* Create implicit semantic boundaries
* Complicate federation
* Complicate tooling
* Leak storage decisions upward
One dataset with locality hints:
* Keeps one namespace
* Keeps resolution uniform
* Keeps future flexibility
This matches your “courtesy” framing perfectly.
---
## 10. Revised ASL block ID allocation (final form)
**Allocation rule:**
* Block ID = monotonically increasing counter
* Allocation is global to ASL
* Block ID never encodes size, dataset, or policy
**Placement rule:**
* ASL chooses *where* to place the block based on heuristics
* That choice is not observable above ASL
This separation is the core design win.
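In code, the allocation rule is tiny. The sketch below keeps the counter in process memory; a real ASL would persist and crash-recover it, e.g. via the append-only log:

```c
#include <stdint.h>

/* Global monotonic allocator: IDs are never reused, leaks are fine. */
typedef struct {
    uint64_t next_block_id;
} ASLAllocator;

static uint64_t asl_alloc_block_id(ASLAllocator *a) {
    return a->next_block_id++;
}
```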
---
## 11. Sanity check against your earlier statements
* ✔ “Small and big blocks should be addressed the same way” → YES
* ✔ “Size is ASL packaging courtesy” → YES
* ✔ “Prefer not mixing small with big” → YES (heuristic)
* ✔ “No semantic exposure” → YES
The model is now internally consistent.
---
## 12. One last recommendation (tiny but important)
Document this explicitly:
> **Block placement is an optimization, not a contract.**
That sentence will save you years later.
---
If you want next, the natural follow-ups are:
* ASL open-block lifecycle and sealing
* Crash recovery for partially written blocks
* Directory / object layout details on ZFS
* How compaction rewrites index safely
Say which one you want to tackle.
Yes. Could you help me capture this model in an architecture and specification?
Absolutely. I'll help you **capture the model cleanly** in a way that is:
* Precise enough to implement
* Explicit about invariants
* Careful about what is *not* part of the contract
* Suitable as an **architecture + specification** section
I'll structure this as something you could almost drop into a design doc.
---
# ASL Block Architecture & Specification
## 1. Purpose and Scope
The **Artifact Storage Layer (ASL)** is responsible for the **physical storage, layout, and retrieval of immutable artifact bytes**.
ASL operates beneath CAS and above the storage substrate (ZFS).
ASL concerns itself with:
* Efficient packaging of artifacts into blocks
* Stable block addressing
* Snapshot-safe immutability
* Storage-local optimizations
ASL does **not** define:
* Artifact identity
* Hash semantics
* Provenance
* Interpretation
* Indexing semantics
---
## 2. Core Abstractions
### 2.1 Artifact
An **artifact** is an immutable byte sequence produced or consumed by higher layers.
ASL treats artifacts as opaque bytes.
---
### 2.2 ASL Block
An **ASL block** is the smallest independently addressable, immutable unit of storage managed by ASL.
Properties:
* Identified by an **ASL Block ID**
* Contains one or more artifacts
* Written sequentially
* Immutable once sealed
* Snapshot-safe
ASL blocks are the unit of:
* Storage
* Reachability
* Garbage collection
---
### 2.3 ASL Block ID
An **ASL Block ID** is an opaque, stable identifier.
#### Invariants
* Globally unique within an ASL instance
* Never reused
* Never mutated
* Does **not** encode:
* Artifact size
* Placement
* Snapshot
* Storage topology
* Policy decisions
#### Semantics
Block IDs identify *logical blocks*, not physical locations.
Higher layers must treat block IDs as uninterpretable tokens.
---
## 3. Addressing Model
ASL exposes a single addressing primitive:
```
(block_id, offset, length) → bytes
```
This is the **only** contract between CAS and ASL.
Notes:
* `offset` and `length` are stable for the lifetime of the block
* ASL guarantees that reads are deterministic per snapshot
* No size-class or block-kind information is exposed
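The contract can be sketched as a C interface; the toy in-memory resolver below stands in for ZFS-backed sealed blocks, and every name is illustrative:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { ASL_OK = 0, ASL_ERR_NO_BLOCK = -1, ASL_ERR_RANGE = -2 } ASLStatus;

/* Toy block table; a real ASL resolves block_id to a sealed block on ZFS. */
typedef struct {
    const uint8_t *data;
    uint32_t size;
} ToyBlock;

static ASLStatus asl_read(const ToyBlock *blocks, size_t nblocks,
                          uint64_t block_id, uint32_t offset,
                          uint32_t length, uint8_t *out) {
    if (block_id >= nblocks)
        return ASL_ERR_NO_BLOCK;
    const ToyBlock *b = &blocks[block_id];
    if ((uint64_t)offset + length > b->size)
        return ASL_ERR_RANGE;         /* offsets are stable; ranges must fit */
    memcpy(out, b->data + offset, length);
    return ASL_OK;
}
```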
---
## 4. Block Allocation Model
### 4.1 Global Block Namespace
ASL maintains a **single global block namespace**.
Block IDs are allocated from a monotonically increasing sequence:
```
next_block_id := next_block_id + 1
```
Properties:
* Allocation is append-only
* Leaked IDs are permitted
* No coordination with CAS is required
---
### 4.2 Open Blocks
At any time, ASL may maintain one or more **open blocks**.
Open blocks:
* Accept new artifact writes
* Are not visible to readers
* Are not referenced by the index
* May be abandoned on crash
---
### 4.3 Sealed Blocks
A block becomes **sealed** when:
* It reaches an internal fill threshold, or
* ASL decides to finalize it for policy reasons
Once sealed:
* No further writes are permitted
* Offsets and lengths become permanent
* The block becomes visible to CAS
* The block may be referenced by index entries
Sealed blocks are immutable forever.
---
## 5. Packaging Policy (Non-Semantic)
ASL applies **packaging heuristics** when choosing how to place artifacts into blocks.
Examples:
* Prefer packing many small artifacts together
* Prefer isolating very large artifacts
* Avoid mixing vastly different sizes when convenient
### Important rule
Packaging decisions are:
* Best-effort
* Local
* Replaceable
* **Not part of the ASL contract**
No higher layer may assume anything about block contents based on artifact size.
---
## 6. Storage Layout and Locality
### 6.1 Single Dataset, Structured Locality
ASL stores all blocks within a **single ZFS dataset**.
Within that dataset, ASL may organize blocks into subpaths to improve locality, e.g.:
```
asl/blocks/dense/
asl/blocks/sparse/
```
These subpaths:
* Exist purely for storage optimization
* May carry ZFS property overrides
* Are not encoded into block identity
Block resolution does **not** depend on knowing which subpath was used.
---
### 6.2 Placement Hints
At allocation time, ASL may apply **placement hints**, such as:
* Preferred directory
* Write size
* Compression preference
* Recordsize alignment
These hints:
* Affect only physical layout
* May change over time
* Do not affect block identity or correctness
---
## 7. Snapshot Semantics
ASL is snapshot-aware but snapshot-agnostic.
Rules:
* ASL blocks live inside snapshot-capable storage
* Snapshots naturally pin sealed blocks
* ASL does not encode snapshot IDs into block IDs
* CAS determines snapshot visibility
ASL guarantees:
* Deterministic reads for a given snapshot
* No mutation of sealed blocks across snapshots
---
## 8. Crash Safety and Recovery
### 8.1 Crash During Open Block
If a crash occurs:
* Open blocks may be lost or abandoned
* Block IDs allocated but not sealed may be leaked
* No sealed block may be corrupted
This is acceptable and expected.
---
### 8.2 Recovery Rules
On startup, ASL:
* Scans for sealed blocks
* Ignores or cleans up abandoned open blocks
* Resumes allocation from the next unused block ID
No global replay or rebuild is required.
---
## 9. Garbage Collection
ASL performs garbage collection at **block granularity**.
Rules:
* A block is eligible for deletion if:
* It is sealed, and
* It is unreachable from all retained snapshots
* ASL does not perform partial block mutation
* Compaction (if any) rewrites artifacts into new blocks
Block deletion is irreversible.
---
## 10. Non-Goals (Explicit)
ASL explicitly does **not** provide:
* Artifact identity management
* Deduplication decisions
* Provenance interpretation
* Size-class semantics
* Execution semantics
Those concerns belong to CAS, PEL, and higher layers.
---
## 11. Design Summary (Executive)
* One block namespace
* One addressing model
* One read path
* Placement is an optimization
* Immutability is absolute
* Snapshots provide safety
* Size is a courtesy, not a contract
---
If you want, next we can:
* Add **formal invariants** (for verification)
* Turn this into a **C header + comments**
* Write **pseudo-code for block lifecycle**
* Integrate this spec with the **index spec**
* Stress-test this model against adversarial failures
Just tell me how formal you want the next step to be.
The checkpoint is only a checkpoint, but the append-only write log can recover state above the checkpoint.
Good clarification — this is an **important correction to the persistence model**, and it fits very well with everything you've designed so far. Let's fold it cleanly into the architecture/spec without breaking invariants.
I'll do three things:
1. Restate the persistence model precisely
2. Adjust the ASL + CAS responsibilities
3. Capture it as spec text you can drop in
---
## 1. Correct persistence model (precise statement)
What you are saying is:
> **A snapshot (checkpoint) is not the full state.
> The system state is: checkpoint + append-only write log.**
This means:
* Snapshots are **checkpoints**, not commits
* The write log is **authoritative above the checkpoint**
* Recovery = replay log on top of checkpoint
This is *not* a traditional WAL in the database sense — it's closer to:
* Kafka log + snapshot
* LSM-tree + manifest
* Git packfiles + refs
And that's perfectly aligned with your system.
---
## 2. Reframing snapshots (important nuance)
### Snapshot semantics (revised)
A snapshot represents:
* A **stable base state**
* A **replay anchor**
* A **determinism boundary**
It does *not* necessarily represent:
* The latest committed state
* The full index
* All visible artifacts
So:
> **CURRENT = checkpoint snapshot + replayed log**
That distinction must be explicit everywhere.
---
## 3. What lives in the append-only log
The append-only log is now a **first-class system component**, not just an optimization.
The log records:
* ASL block seals
* CAS index segment creation
* Tombstones (if any)
* PEL receipts
* Registry mutations
* Provenance updates
Everything that changes system-visible state is logged.
---
## 4. ASL behavior under checkpoint + log
### 4.1 During normal operation
* ASL writes open blocks
* When a block is sealed:
* Block becomes immutable
* A **seal record is appended to the log**
* CAS may reference the block **only after** seal record exists
This guarantees:
* Crash safety
* Replayability
* Ordering
---
### 4.2 On checkpoint creation
When a checkpoint is taken:
* All prior log entries are considered “below the line”
* Snapshot captures:
* All sealed blocks
* All index segments
* All metadata up to that point
* Log continues **without reset**
You do *not* truncate the log eagerly.
---
### 4.3 On recovery
Recovery procedure:
1. Mount latest checkpoint snapshot
2. Read append-only log from checkpoint position
3. Replay entries in order:
* Re-register sealed blocks
* Re-attach index segments
* Rebuild CURRENT view
4. Resume normal operation
This is deterministic and bounded.
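The replay loop can be sketched like this; the record shapes are illustrative, and the key detail is the idempotency guard that makes re-applying a log prefix harmless:

```c
#include <stdint.h>

/* Illustrative log record; the real wire format is specified elsewhere. */
typedef enum { REC_BLOCK_SEAL = 1, REC_SEGMENT_ADD = 2 } LogRecType;

typedef struct {
    LogRecType type;
    uint64_t   id;            /* block id or segment id */
} LogRecord;

typedef struct {
    uint64_t sealed_blocks;
    uint64_t index_segments;
    uint64_t last_applied;    /* log position, for idempotent restarts */
} CurrentState;

static void replay_one(CurrentState *s, const LogRecord *r, uint64_t pos) {
    if (pos <= s->last_applied)
        return;               /* already applied: skip, not re-apply */
    switch (r->type) {
    case REC_BLOCK_SEAL:  s->sealed_blocks++;  break;
    case REC_SEGMENT_ADD: s->index_segments++; break;
    }
    s->last_applied = pos;
}
```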
---
## 5. Why this is better than “snapshot = truth”
This model gives you:
* Faster checkpoints
* Less snapshot churn
* Fewer ZFS snapshots
* Clear crash recovery semantics
* Ability to validate or audit replay
It also aligns perfectly with your **PEL determinism guarantees**.
---
## 6. Updating the ASL spec (drop-in text)
Here is a **clean spec addition** you can insert.
---
### 8. Persistence, Checkpoints, and Recovery (Revised)
#### 8.1 Checkpoints
A **checkpoint** is a ZFS snapshot that captures a consistent base state of the system.
A checkpoint:
* Is immutable
* Serves as a replay anchor
* Does not necessarily reflect the latest system state
Checkpoints are created periodically or by policy.
---
#### 8.2 Append-Only Write Log
The system maintains an **append-only write log** that records all state mutations occurring after the most recent checkpoint.
The write log includes, but is not limited to:
* ASL block seal records
* Index segment creation records
* Tombstone records
* PEL receipt registrations
* Provenance and registry updates
The log is strictly ordered and durable.
---
#### 8.3 System State Definition
The current system state (CURRENT) is defined as:
```
CURRENT = checkpoint_state + replay(write_log)
```
No component may assume that the checkpoint alone represents CURRENT.
---
#### 8.4 Recovery Procedure
On startup or failure recovery:
1. The most recent checkpoint snapshot is mounted
2. The append-only write log is read from the checkpoint position
3. All log entries are replayed deterministically
4. The system transitions to CURRENT
5. Normal operation resumes
This process must be idempotent and deterministic.
---
#### 8.5 Interaction with ASL Blocks
* ASL blocks become visible only after a corresponding log entry exists
* Blocks written but not sealed at crash time may be discarded
* Leaked block IDs are permitted
* No sealed block may be modified or removed during replay
---
## 7. Invariants this introduces (important to write down)
You should explicitly state these invariants:
1. **Nothing is visible without a log record**
2. **Checkpoints are sufficient but not complete**
3. **Replay is the source of truth above the checkpoint**
4. **Sealed blocks are immutable across replay**
5. **Replay order defines CURRENT**
These invariants keep every layer honest.
---
## 8. How this interacts with PEL determinism
This model actually *strengthens* determinism:
* PEL executions reference snapshot + log position
* Receipts are replayed, not re-executed
* Determinism is guaranteed per checkpoint + log prefix
You now have a very strong story for:
* Auditing
* Federation
* Certification
---
## 9. Final sanity check
With this clarification, your system is:
* Snapshot-based
* Log-driven
* Immutable-first
* Deterministic
* Replayable
* Auditable
This is a **very robust design**.
---
If you want next, we can:
* Specify the **log record formats**
* Define **idempotency rules for replay**
* Design **log compaction vs checkpoints**
* Stress-test crash scenarios
* Integrate this with the index segment lifecycle
Just tell me where to zoom in.
Refine into three documents: ASL-CORE-INDEX (an index addendum to the ASL-CORE semantic model), ASL-STORE-INDEX (an index add-on for store semantics and contracts), and ENC-ASL-CORE-INDEX (with the exact encoding). If this makes sense to you.
Yes — this makes **excellent sense**, and the split you propose is *exactly* the right one. You're separating:
* **What the index means**
* **How the store must behave to support it**
* **How it is encoded on disk**
That separation will save you from circular dependencies later.
Below Ill **refine the model into three documents**, with:
* Clear scope
* Explicit contracts
* Sharp non-goals
* Consistent terminology
You should be able to lift these almost verbatim into a spec repo.
---
# 1. ASL-CORE-INDEX
### *Index Addendum to ASL-CORE Semantic Model*
## 1.1 Purpose
This document defines the **semantic meaning of the ASL index**, independent of:
* Physical storage layout
* Encoding format
* ZFS specifics
* Performance optimizations
It extends **ASL-CORE** by specifying *what it means* to resolve an artifact via the index.
---
## 1.2 Scope
This document defines:
* Index concepts and semantics
* Visibility rules
* Snapshot and log interaction
* Required invariants
This document does **not** define:
* Disk formats
* Bloom filters
* Sharding
* Block placement
* Performance characteristics
---
## 1.3 Core Semantic Mapping
The ASL index defines a total mapping:
```
ArtifactKey → ArtifactLocation
```
Where:
```
ArtifactKey := hash_id (opaque, algorithm-scoped)
ArtifactLocation := (block_id, offset, length)
```
### Semantic guarantees
* The mapping is **immutable once visible**
* A mapping is **snapshot-relative**
* A mapping is **deterministic given (snapshot, log prefix)**
---
## 1.4 Visibility Model
An index entry becomes visible **iff**:
1. The referenced ASL block is sealed
2. A corresponding log record exists
3. The log record is ≤ CURRENT replay position
No other condition may make an entry visible.
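The three conditions can be expressed as a single predicate; the entry and store shapes below are illustrative assumptions, not spec types:

```python
# Hypothetical sketch of the visibility rule: an index entry is visible
# iff its block is sealed AND a log record exists AND that record is at
# or below the CURRENT replay position.

def is_visible(entry, sealed_blocks, log_positions, current_position):
    record_id = entry["log_record"]
    return (
        entry["block_id"] in sealed_blocks                # condition 1
        and record_id in log_positions                    # condition 2
        and log_positions[record_id] <= current_position  # condition 3
    )
```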
---
## 1.5 Snapshot and Log Semantics
* Snapshots are **checkpoints**, not full state
* The index state at any time is:
```
Index(CURRENT) = Index(snapshot) + replay(log)
```
* Index entries may exist:
* Below the snapshot (materialized)
* Above the snapshot (log-derived)
The semantic meaning is identical.
---
## 1.6 Immutability and Shadowing
* Index entries are never mutated
* Later entries may **shadow** earlier entries
* Shadowing is resolved by:
* Log order
* Snapshot boundary
This applies equally to:
* Data entries
* Tombstones (if present)
---
## 1.7 Non-Goals (Explicit)
ASL-CORE-INDEX does **not** define:
* How lookups are accelerated
* How memory is managed
* How blocks are grouped
* How many index entries exist
Those belong to store and encoding layers.
---
# 2. ASL-STORE-INDEX
### *Index Add-On for Store Semantics and Contracts*
## 2.1 Purpose
This document defines **store-level responsibilities and guarantees** required to *realize* the ASL-CORE-INDEX semantics efficiently and safely.
It binds the semantic model to:
* Block lifecycle
* Log ordering
* Snapshot interaction
* Recovery behavior
---
## 2.2 Store Responsibilities
The ASL store **must** provide:
1. Stable ASL block resolution
2. Append-only index materialization
3. Deterministic replay
4. Snapshot-safe visibility
---
## 2.3 Index Segments
The store organizes index entries into **segments**.
Segment properties:
* Append-only while open
* Immutable once sealed
* Snapshot-visible only after seal
* Replay-attached via log records
Segments are the unit of:
* Persistence
* Replay
* GC eligibility
---
## 2.4 Index Lifecycle
### Creation
* Index entries are appended to an open segment
* No entry is visible yet
### Seal
* Segment is sealed
* Seal is recorded in the append-only log
* Segment becomes visible for lookup
### Snapshot
* Snapshot may capture sealed segments
* Open segments are not required to survive snapshot
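A toy state machine for this lifecycle, under the assumption that the log is just an append-only Python list (class and method names are invented for illustration):

```python
# Hypothetical sketch of the segment lifecycle: append-only while open,
# immutable once sealed, visible for lookup only after the seal is logged.

class Segment:
    def __init__(self, segment_id):
        self.segment_id = segment_id
        self.entries = []
        self.sealed = False
        self.visible = False

    def append(self, entry):
        if self.sealed:
            raise RuntimeError("sealed segments are immutable")
        self.entries.append(entry)      # open: append-only, not yet visible

    def seal(self, log):
        self.sealed = True
        log.append(("segment_seal", self.segment_id))  # seal goes into the log
        self.visible = True             # visible only once the log record exists
```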
---
## 2.5 Lookup Contract
To resolve a key, the store must:
1. Identify all visible segments ≤ CURRENT
2. Search segments in **reverse creation order**
3. Return the first matching entry
4. Respect tombstones if present
The store **may**:
* Use bloom filters
* Use sharding
* Use SIMD scans
* Use memory mapping
But correctness must not depend on these.
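The contract above can be sketched as a simple scan; segments are modeled as plain dicts (oldest first) and `TOMBSTONE` is an invented sentinel:

```python
# Hypothetical sketch of the lookup contract: search visible segments in
# reverse creation order, return the first match, and let tombstones
# shadow older entries. Bloom filters and sharding would only skip
# segments early; correctness never depends on them.

TOMBSTONE = object()

def lookup(key, visible_segments):
    for segment in reversed(visible_segments):   # reverse creation order
        if key in segment:
            location = segment[key]
            return None if location is TOMBSTONE else location
    return None                                  # key not visible
```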
---
## 2.6 Crash Safety
The store guarantees:
* Sealed segments are never corrupted
* Open segments may be lost
* Leaked block IDs and segment IDs are acceptable
* Replay is idempotent
---
## 2.7 GC Interaction
The store may delete index segments **only if**:
* They are unreachable from all retained snapshots
* They are no longer referenced by replay state
GC operates at **segment granularity**, never entry granularity.
---
## 2.8 Non-Goals
ASL-STORE-INDEX does **not** define:
* On-disk layout
* Exact record sizes
* Bloom filter math
* Directory structures
Those belong to encoding.
---
# 3. ENC-ASL-CORE-INDEX
### *Encoding Specification for ASL Core Index*
## 3.1 Purpose
This document defines the **exact bytes-on-disk encoding** of index structures required to implement ASL-STORE-INDEX.
This encoding is:
* Immutable
* mmap-friendly
* Language-agnostic
* Stable across versions
---
## 3.2 Encoding Principles
* Little-endian
* Fixed-width fields
* No pointers
* No padding ambiguity
* Forward-compatible via versioning
---
## 3.3 Segment File Layout
```
[ SegmentHeader ]
[ BloomFilter ] (optional, opaque to semantics)
[ IndexRecord[] ]
[ SegmentFooter ]
```
All offsets are file-relative.
---
## 3.4 SegmentHeader (Exact)
```c
#pragma pack(push, 1)
typedef struct {
uint64_t magic;
uint16_t version;
uint16_t shard_id;
uint32_t header_size;
uint64_t snapshot_min;
uint64_t snapshot_max;
uint64_t record_count;
uint64_t records_offset;
uint64_t bloom_offset;
uint64_t bloom_size;
uint64_t flags;
} SegmentHeader;
#pragma pack(pop)
```
---
## 3.5 IndexRecord (Exact)
```c
#pragma pack(push, 1)
typedef struct {
uint64_t hash_hi;
uint64_t hash_mid;
uint64_t hash_lo;
uint32_t hash_tail;
uint64_t block_id;
uint32_t offset;
uint32_t length;
uint32_t flags;
uint32_t reserved;
} IndexRecord;
#pragma pack(pop)
```
Semantic meaning of fields is defined exclusively in ASL-CORE-INDEX.
---
## 3.6 SegmentFooter
```c
#pragma pack(push, 1)
typedef struct {
uint64_t crc64;
uint64_t seal_snapshot;
uint64_t seal_time_ns;
} SegmentFooter;
#pragma pack(pop)
```
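As a cross-check, the packed sizes of these structs can be verified with Python's `struct` module; `"<"` selects little-endian standard sizes with no padding, mirroring `#pragma pack(1)`:

```python
import struct

# Format strings mirror the C structs field-for-field.
SEGMENT_HEADER_FMT = "<QHHIQQQQQQQ"  # magic .. flags
INDEX_RECORD_FMT   = "<QQQIQIIII"    # hash_hi .. reserved
SEGMENT_FOOTER_FMT = "<QQQ"          # crc64, seal_snapshot, seal_time_ns

assert struct.calcsize(SEGMENT_HEADER_FMT) == 72  # bytes
assert struct.calcsize(INDEX_RECORD_FMT) == 52
assert struct.calcsize(SEGMENT_FOOTER_FMT) == 24
```

Asserts like these are the Python analogue of the C `static_assert`s mentioned earlier, and catch accidental layout drift across versions.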
---
## 3.7 Bloom Filter Encoding
Bloom filter layout is opaque to ASL-CORE-INDEX semantics.
ENC-ASL-CORE-INDEX requires only:
* Deterministic behavior
* Immutable representation
* Segment-local scope
---
## 3.8 Compatibility Rules
* New fields may be added only via version bump
* Existing fields must never change meaning
* Readers must reject unknown incompatible versions
---
## 4. How the Three Documents Fit Together
| Layer | Responsibility |
| ------------------ | ---------------------------- |
| ASL-CORE-INDEX | Meaning and invariants |
| ASL-STORE-INDEX | Store behavior and contracts |
| ENC-ASL-CORE-INDEX | Bytes on disk |
**No document leaks concerns upward or downward.**
---
## 5. Final Sanity Check
This split achieves:
* Semantic clarity
* Implementation freedom
* Encoding stability
* Long-term evolvability
* Federation friendliness
This is a *very strong* architecture/spec boundary.
---
If you want next, we can:
* Add **formal invariants** to ASL-CORE-INDEX
* Add **state machines** to ASL-STORE-INDEX
* Add **versioning and migration rules**
* Produce **actual C headers + static asserts**
* Walk through **replay step-by-step**
Just tell me which document you want to deepen next.
Refine ASL-CORE-INDEX.
Output full document.
Below is a **refined, self-contained, full specification** for **ASL-CORE-INDEX**.
This document is intentionally *semantic*, not operational or encoding-specific. It should remain stable even if storage, indexing strategies, or encodings evolve.
---
# ASL-CORE-INDEX
### Semantic Addendum to ASL-CORE
---
## 1. Purpose
This document defines the **semantic model of the ASL index**.
The ASL index provides a deterministic, snapshot-relative mapping from **artifact identity** to **artifact location** within ASL-managed storage blocks.
This specification answers **what the index means**, not:
* how it is implemented,
* how it is stored,
* how it is accelerated,
* or how it is encoded on disk.
Those concerns are addressed by:
* **ASL-STORE-INDEX** (store contracts and lifecycle)
* **ENC-ASL-CORE-INDEX** (exact encoding)
---
## 2. Scope
This document defines:
* The logical structure of index entries
* Visibility rules
* Snapshot and log interaction
* Immutability and shadowing semantics
* Determinism guarantees
* Required invariants
This document explicitly does **not** define:
* On-disk formats
* Index segmentation or sharding
* Bloom filters or probabilistic structures
* Memory residency requirements
* Performance characteristics
* Garbage-collection mechanisms
---
## 3. Terminology
### Artifact
An immutable sequence of bytes managed by ASL and referenced by higher layers.
### ArtifactKey
An opaque identifier for an artifact, typically a hash identifier scoped by algorithm and configuration.
### ASL Block
An immutable, independently addressable unit of storage managed by ASL.
### ArtifactLocation
A tuple identifying where an artifact's bytes reside:
```
(block_id, offset, length)
```
### Snapshot
A checkpoint capturing a consistent base state of ASL-managed storage and metadata.
### Append-Only Log
A strictly ordered log recording all index-visible mutations occurring after a snapshot.
### CURRENT
The effective system state obtained by replaying the append-only log on top of a checkpoint snapshot.
---
## 4. Core Semantic Mapping
The ASL index defines a total, deterministic mapping:
```
ArtifactKey → ArtifactLocation
```
### Semantic guarantees
* For any visible `ArtifactKey`, exactly one `ArtifactLocation` is resolved.
* The mapping is **immutable once visible**.
* The mapping is **snapshot-relative**.
* The mapping is **deterministic** for a given `(snapshot, log prefix)`.
---
## 5. ArtifactLocation Semantics
An `ArtifactLocation` identifies **bytes**, not meaning.
### Properties
* `block_id` is an opaque ASL Block ID
* `offset` and `length` are byte offsets within the block
* The tuple is valid only for the lifetime of the referenced block
* ASL guarantees that the referenced bytes are immutable
No additional structure or interpretation is implied.
---
## 6. Visibility Model
An index entry is considered **visible** if and only if **all** of the following hold:
1. The referenced ASL block is sealed
2. A corresponding log record exists
3. The log record is ≤ the CURRENT replay position
### Consequences
* Entries written but not logged are invisible
* Entries logged but referencing unsealed blocks are invalid
* Entries above the CURRENT replay position are invisible
Visibility is **binary**, not gradual.
---
## 7. Snapshot and Log Semantics
### 7.1 Checkpoints
Snapshots act as **checkpoints**, not complete state representations.
A snapshot:
* Captures a consistent base state
* Serves as a replay anchor
* Does not necessarily include the latest index state
### 7.2 Log Replay
The effective index state is defined as:
```
Index(CURRENT) = Index(snapshot) + replay(log)
```
Where replay is:
* Strictly ordered
* Deterministic
* Idempotent
Snapshot and log entries are semantically equivalent once replayed.
---
## 8. Immutability and Shadowing
### 8.1 Immutability
* Index entries are never mutated
* Once visible, an entry's meaning never changes
* Blocks referenced by entries are immutable
### 8.2 Shadowing
Later index entries may **shadow** earlier entries with the same `ArtifactKey`.
Shadowing rules:
* Later entries take precedence
* Precedence is defined by log order
* Snapshot boundaries do not change precedence rules
Shadowing is the only way mappings evolve.
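A sketch of the shadowing rule, folding entries in log order; the entry shape `(log_position, key, location)` is illustrative:

```python
# Hypothetical sketch: shadowing is strictly monotonic resolution by log
# order; for any key, the entry with the highest log position wins.

def resolve(entries):
    mapping = {}
    for log_position, key, location in sorted(entries, key=lambda e: e[0]):
        mapping[key] = location   # later entries overwrite (shadow) earlier ones
    return mapping
```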
---
## 9. Tombstones (Optional Semantic Concept)
An implementation **may** support tombstone entries.
A tombstone:
* Is an index entry with no `ArtifactLocation`
* Shadows earlier mappings
* Indicates intentional invalidation
This specification:
* Defines the semantic meaning of tombstones
* Does not require their existence
* Does not define how they are encoded or stored
---
## 10. Determinism Guarantees
For any fixed:
* Snapshot
* Log prefix
* ASL configuration
* Hash algorithm set
The ASL index guarantees:
* Deterministic lookup results
* Deterministic shadowing resolution
* Deterministic visibility
No nondeterministic input (time, IO ordering, concurrency) may affect index semantics.
---
## 11. Separation of Concerns
The ASL index is **purely declarative**.
It declares:
* What artifacts exist
* Where their bytes reside
It does **not**:
* Decide how blocks are allocated
* Decide how artifacts are packed
* Decide how data is cached
* Decide how lookups are accelerated
Those responsibilities belong to lower layers.
---
## 12. Invariants (Normative)
The following invariants MUST hold in all conforming implementations:
1. **No visibility without a log record**
2. **No mutation of visible entries**
3. **No mutation of sealed blocks**
4. **Shadowing is strictly monotonic**
5. **Snapshot + log uniquely defines index state**
6. **ArtifactLocation always resolves to immutable bytes**
Violating any invariant constitutes index corruption.
---
## 13. Non-Goals (Explicit)
ASL-CORE-INDEX does **not** define:
* Index data structures
* Segment layout
* Block placement heuristics
* Size classes
* ZFS properties
* Garbage collection policy
* Federation mechanics
These are intentionally excluded to preserve semantic stability.
---
## 14. Relationship to Other Specifications
* **ASL-CORE**
Defines artifact and block semantics.
ASL-CORE-INDEX extends it with lookup meaning.
* **ASL-STORE-INDEX**
Defines how a store must behave to realize these semantics.
* **ENC-ASL-CORE-INDEX**
Defines how index entries are encoded on disk.
No document may contradict this one.
---
## 15. Summary (Normative)
The ASL index is:
* Snapshot-relative
* Log-replay-defined
* Append-only
* Immutable
* Deterministic
* Semantically minimal
It exists to answer **exactly one question**:
> *Given an artifact identity and a point in time, where are the bytes?*
Nothing more, nothing less.
---
If you want, next we can:
* Formalize this as **axioms + derived properties**
* Add a **state-transition model**
* Cross-check it against **PEL receipts**
* Refine **tombstone semantics**
* Or proceed to refine **ASL-STORE-INDEX** with equal rigor
You say "ASL-CORE Defines artifact and block semantics.".
At the moment ASL-CORE does not define block semantics. It is purely concerned with artifact semantics. Maybe it should be concerned with blocks? Maybe blocks are index concerns?
This is a **very good catch**, and it goes to the heart of whether your layering stays clean long-term.
Short answer up front:
> **Blocks should exist in ASL-CORE, but only as an *existential storage abstraction*, not as a semantic one.**
> Index concerns *refer to* blocks, but must not *define* them.
Let me unpack that carefully and reconcile the documents.
---
## 1. Why this question matters
Right now you have:
* **ASL-CORE** → artifact semantics only
* **ASL-CORE-INDEX** → introduces `(block_id, offset, length)`
* **ASL-STORE-INDEX / ENC** → operational & encoding
The tension is obvious:
* The index *talks about blocks*
* ASL-CORE *does not define blocks*
* So where does `block_id` actually come from *conceptually*?
If blocks are *only* an index concern, you end up with:
* Index defining storage abstractions (wrong direction)
* CAS leaking storage semantics upward
* ASL becoming underspecified
So your instinct is correct to question this.
---
## 2. The correct resolution (clean layering)
### Key principle
> **ASL-CORE should define that blocks exist — but not how they are structured, allocated, or used.**
Think of blocks as **storage atoms**, not semantic units.
---
## 3. What ASL-CORE *should* define about blocks (minimal)
ASL-CORE should define **exactly four things** about blocks, and no more.
### 3.1 Existence
ASL stores artifact bytes inside **blocks**.
A block is:
* A unit of immutability
* A unit of addressability
* A unit of snapshot pinning
Nothing else.
---
### 3.2 Block identity
ASL-CORE should introduce:
```
BlockID := opaque, immutable identifier
```
Properties:
* Unique within an ASL instance
* Stable across snapshots
* Never reused
* Opaque to all layers above ASL
No layout, no encoding, no hierarchy.
---
### 3.3 Block immutability
ASL-CORE must state:
> Once a block is sealed, its contents are immutable forever.
This is foundational — index semantics depend on it, but should not define it.
---
### 3.4 Block addressability
ASL-CORE should define the *only* legal addressing primitive:
```
(BlockID, offset, length) → bytes
```
This makes blocks part of the **artifact storage contract**, not an index invention.
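The primitive reduces to a guarded slice; the dict-backed store below stands in for whatever resolution ASL actually performs (all names are illustrative):

```python
# Hypothetical sketch of the only legal addressing primitive:
# (BlockID, offset, length) -> bytes, over sealed immutable blocks.

def read(store, block_id, offset, length):
    block = store[block_id]                   # opaque BlockID resolution
    if offset + length > len(block):
        raise ValueError("address escapes block bounds")
    return block[offset:offset + length]
```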
---
## 4. What ASL-CORE should *not* define about blocks
ASL-CORE must **not** define:
* Block size
* How many artifacts per block
* How blocks are allocated
* How blocks are stored
* Whether blocks are files, objects, extents
* Any notion of “small” vs “big”
* Placement or locality
Those are **store concerns**, not core semantics.
---
## 5. Where blocks should *not* live
### ❌ Blocks should not be defined in ASL-CORE-INDEX
Because:
* Index semantics *depend on* blocks
* They must not *introduce* them
* Otherwise the index becomes a storage spec
ASL-CORE-INDEX should **reference** `BlockID`, not define it.
---
### ❌ Blocks should not be defined in ENC-ASL-CORE-INDEX
Because:
* Encoding is downstream
* You want freedom to change block representation
* Encodings must not introduce core concepts
---
## 6. Revised responsibility split (cleaned up)
Here is the corrected responsibility map:
| Document | Responsibility |
| ---------------------- | ------------------------------------------------------------------- |
| **ASL-CORE** | Artifact semantics + existence of blocks as immutable storage atoms |
| **ASL-CORE-INDEX** | Semantic mapping from artifact identity → block location |
| **ASL-STORE-INDEX** | How blocks and index entries are created, ordered, replayed |
| **ENC-ASL-CORE-INDEX** | Exact bytes-on-disk encoding |
Blocks belong **below** the index but **above** the store.
---
## 7. Minimal addition to ASL-CORE (suggested text)
You do **not** need to rewrite ASL-CORE — just add a small section.
### Suggested ASL-CORE addition
> ### Blocks
>
> ASL stores artifact bytes within immutable storage units called **blocks**.
>
> A block is identified by an opaque **BlockID**.
>
> Blocks have the following properties:
>
> * A block contains one or more artifacts
> * A block is immutable once sealed
> * A block is independently addressable
> * A block is snapshot-safe
>
> ASL exposes the following addressing primitive:
>
> ```
> (BlockID, offset, length) → bytes
> ```
>
> ASL-CORE does not define block layout, allocation strategy, size, or storage representation.
That's it. No more.
---
## 8. Why this keeps your architecture sound
With this change:
* ASL-CORE defines *what storage exists*
* ASL-CORE-INDEX defines *how artifacts are located*
* Store and encoding layers remain free to evolve
* PEL and provenance have a stable substrate
* No semantic leakage upward or downward
You preserve your original design intent:
> **Artifacts have meaning. Blocks do not. Blocks only exist to hold bytes.**
---
## 9. Final answer to your question (direct)
> *“Maybe ASL-CORE should be concerned with blocks?”*
**Yes — minimally and existentially.**
> *“Maybe blocks are index concerns?”*
**No — the index depends on blocks, but must not define them.**
This refinement strengthens the architecture rather than complicating it.
If you want, next we can:
* Patch ASL-CORE formally
* Reconcile terminology across all three docs
* Ensure PEL and provenance never depend on block semantics
* Or refine ASL-STORE-INDEX now that this boundary is crisp