Designing a CAS on top of ZFS. Writers are per shard, SIMD, Bloom filters, small + big block separation, global index, local indexes, ZFS snapshot as checkpoint and sequential log. The CAS stack also adds an artifact storage layer (ASL) with pluggable algorithms and storage. Adds a 2-byte preamble to the payload, used for hash_id. Also adds a Program Execution Layer (PEL). PEL defines and executes deterministic DAG programs over an array of inputs and an array of outputs (artifacts). Keeps the hot path short, but PEL execution receipts are used by other layers to project into a traceable graph kernel. Everything with determinism guaranteed per snapshot.
Handles billions of index entries.
Layer on top of pel can include registries, dedup logic, certification, envelopes probably also handling ...
Here’s a concrete, end-to-end way to think about a CAS (Content-Addressable Storage) built on top of ZFS, keeping it system-design focused rather than academic.
1. Core goals & constraints
Before structure, clarify the implicit goals your notes suggest:
- High write throughput (per-shard writers, sequential logs)
- Deduplication by hash (CAS)
- Fast existence checks (Bloom filters)
- Crash consistency without complex WAL replay (ZFS snapshots as checkpoints)
- Scalable index (global → shard → local)
- Efficient handling of small vs large objects
ZFS already gives you:
- Copy-on-write
- Checksumming
- Snapshots
- Compression
- Atomic rename
So your CAS should avoid re-implementing what ZFS already does well.
2. Sharding model
Hash → shard mapping
- Compute H = hash(content)
- Shard = H[0..k] (prefix bits)
- Each shard is independent:
- Separate writer
- Separate log
- Separate local index
- Separate Bloom filter
This avoids cross-shard coordination.
hash ──► shard_id ──► shard writer
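The routing rule above is small enough to show directly; the 12-bit shard width and the raw 32-byte hash type are illustrative assumptions, not fixed by the design:

```c
#include <stdint.h>

#define SHARD_BITS  12                    /* 4096 shards (illustrative) */
#define SHARD_COUNT (1u << SHARD_BITS)

/* Route a 256-bit content hash to a shard by its leading bits. */
static uint32_t shard_of(const uint8_t hash[32]) {
    /* take the first 16 bits big-endian, keep the top SHARD_BITS */
    uint32_t prefix = ((uint32_t)hash[0] << 8) | hash[1];
    return prefix >> (16 - SHARD_BITS);
}
```

Because placement is a pure function of the hash, no routing table needs to be stored or replicated.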
3. Write path (per shard, sequential)
3.1 Append-only log (primary data path)
Each shard has a sequential log file:
/zfs_pool/cas/shard_XX/log.current
Each entry:
[hash][size][flags][payload]
- Always append
- No in-place updates
- SIMD helps here:
- Batch hashes
- Batch compression
- Batch checksum verification
ZFS likes this pattern.
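A minimal sketch of how the [hash][size][flags][payload] entry could map onto a packed C header, assuming a 256-bit hash; the field names are illustrative:

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  hash[32];   /* content hash (e.g. BLAKE3 / SHA-256) */
    uint64_t size;       /* payload length in bytes */
    uint32_t flags;      /* small/large class, compression, etc. */
    /* payload bytes follow immediately in the log */
} LogEntryHeader;
#pragma pack(pop)
```

Fixed-width, packed headers let a recovery scan walk the log with simple pointer arithmetic and no parsing state.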
3.2 Small vs large block separation
Small objects (e.g. < 8–16 KB):
- Packed directly into shard log
- Many per ZFS block
- Indexed by (log_id, offset)
Large objects:
- Stored as separate ZFS files: /zfs_pool/cas/shard_XX/objects/HH/HHHH...
- Log only stores a reference

Benefits:
- Avoid log bloat
- ZFS compression & prefetch work better
- Easy GC
This mirrors Git’s loose vs packed objects.
4. Indexing strategy
4.1 Local index (per shard)
Minimal, hot, compact:
hash → location
location = (log_id, offset) OR (object_path)
Implementation options:
- Embedded KV (LMDB / SQLite / custom mmap table)
- Append-only + periodic rebuild
This index is rebuilt from logs if lost.
4.2 Global index (thin routing layer)
The “global index” does not store full locations.
Instead:
hash_prefix → shard_id
Options:
- Static hash prefix rule (no structure at all)
- Tiny in-memory table if rebalancing is needed
This avoids a scalability bottleneck.
5. Bloom filters
Each shard maintains a Bloom filter:
Bloom[shard] : hash → maybe exists
Usage:
1. Hash arrives
2. Check Bloom
   - Negative → definitely new → append
   - Positive → consult local index
Properties:
- Lives in memory
- Periodically snapshotted to disk
- Rebuilt on startup by replaying logs
You can SIMD-accelerate Bloom probes.
6. Read path
1. Compute hash
2. Route to shard
3. Check Bloom
   - Negative → return “not found”
4. Lookup local index
5. Read from:
   - Log offset (small)
   - Object file (large)
ZFS ARC will cache hot reads automatically.
7. Checkpointing with ZFS snapshots
This is where ZFS shines.
7.1 Periodic checkpoint
For each shard:
1. Rotate log: log.current → log.N
2. Flush local index & Bloom to disk
3. Take ZFS snapshot:
   zfs snapshot pool/cas@checkpoint_123
That snapshot is a crash-consistent point:
- Logs
- Index
- Bloom
- Object files
No WAL replay logic needed.
7.2 Recovery
On restart:
- Roll back or mount latest snapshot
- Replay only log.current
- Rebuild Bloom + index incrementally
Fast and simple.
8. Garbage collection (important!)
CAS needs GC eventually.
Strategy:
1. Maintain reference counts or reachability info outside CAS
2. Periodic mark phase:
   - Identify live hashes
3. Sweep:
   - Drop unreachable log segments
   - Delete unreferenced large object files
ZFS snapshots make this safer:
- GC operates on newest generation
- Old snapshots keep data alive until released
9. SIMD & batching opportunities
Where SIMD actually helps:
- Hashing (BLAKE3 / SHA-256 batch mode)
- Bloom filter probes
- Compression (ZSTD batch APIs)
- Checksums
Less useful for:
- Random index lookups
- Metadata heavy paths
10. Why this fits ZFS well
- Append-only logs → minimal fragmentation
- Snapshots → free checkpoints
- Compression → great for small objects
- ARC → automatic read caching
- Checksums → extra corruption detection
You’re building policy and indexing, not storage primitives.
11. Open questions (only if you want to go deeper)
If you want to refine this further, the key design pivots are:
- Expected object size distribution?
- Write-heavy vs read-heavy?
- Need for online shard rebalancing?
- Is this single-node or eventually multi-node?
Answering those changes details, not the core architecture.
This is a solid and ambitious architecture. I’ll respond in design-review mode, not rewriting what you already have, but:
- Normalize the mental model
- Call out strengths
- Identify pressure points / risks
- Suggest concrete refinements
- Clarify how layers should contract with each other
- List a few sharp questions that matter next
1. Canonical Mental Model (clean layering)
What you are really building is:
ZFS (truth & durability)
└── CAS Core (bytes → hashes → blocks)
├── Storage Engine (logs, blocks, shards)
├── Indexing Engine (global + local, bloom, SIMD)
├── Snapshot / Checkpoint Engine
└── Artifact Storage Layer (ASL)
└── Pluggable codecs / layouts / policies
└── Program Execution Layer (PEL)
├── Deterministic DAG execution
├── Receipt generation
└── Snapshot-bound determinism
└── Higher Logical Layers
├── Registries
├── Dedup + certification
├── Envelopes / provenance
└── Hash-mapping / conflict resolution
Key insight: Your CAS is not just content-addressed storage. It is a deterministic computation substrate over immutable data.
That puts it closer to:
“Git + Nix + ZFS + Merkle execution traces”
2. Strengths of the Design
✅ ZFS snapshots as checkpoints
This is extremely strong:
- Snapshot = global consistency point
- Enables:
  - Deterministic replay
  - Crash recovery without WAL complexity
  - Snapshot-scoped guarantees (PEL determinism per snapshot)
This is the right foundation for billions of entries.
✅ Per-shard writers + sequential log
You avoid:
- Lock contention
- Write amplification
Shard-local sequential logs + snapshot sealing is the correct scaling model.
✅ Small / big block separation
Critical for:
- Tail latency
- Dedup efficiency
- Cache locality
Make sure:
- Small blocks are aggressively packed
- Big blocks avoid unnecessary hash fanout
✅ Bloom filters + SIMD
This is the only way a global index of this scale is feasible.
One warning: blooms must be snapshot-scoped or versioned or you will suffer false-positive accumulation over time.
✅ PEL as a deterministic DAG executor
This is the most interesting part.
You are essentially doing:
- Immutable inputs
- Deterministic execution
- Receipts as proofs of execution
This unlocks:
- Provenance
- Reproducibility
- Certification
- Auditability
3. Pressure Points / Risks
These are not flaws — they’re where designs like this usually break if not nailed early.
⚠️ 1. Hash ID vs payload preamble (2 bytes)
You mention:
“adds 2 bytes preamble to payload. Used for hash_id.”
Be careful:
- Any mutation of payload bytes must be:
- Explicitly modeled
- Fully deterministic
- Included in hash domain separation
Recommendation: Treat the preamble as:
hash = H(domain || algo_id || payload)
Not as “payload mutation”.
Otherwise you will eventually get:
- Hash collisions across algorithms
- Ambiguous artifact identity
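To make the domain-separation rule concrete, here is a sketch using FNV-1a as a stand-in for the real content hash (BLAKE3 / SHA-256 in practice); the domain string and byte-feeding order are assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a stands in for the real hash; the point is that the domain
 * tag and algo id are fed INTO the hash, not prepended to the stored
 * payload bytes. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;      /* FNV-1a 64-bit prime */
    }
    return h;
}

/* hash = H(domain || algo_id || payload), per the rule above.
 * Endianness of algo_id would be pinned in a real format. */
static uint64_t hash_id(uint16_t algo_id, const void *payload, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;     /* FNV offset basis */
    h = fnv1a(h, "cas-artifact-v1", 15);    /* domain tag (assumed) */
    h = fnv1a(h, &algo_id, sizeof algo_id); /* algorithm id */
    return fnv1a(h, payload, len);          /* untouched payload */
}
```

Two algorithms hashing the same bytes now land in disjoint identity spaces, so cross-algorithm collisions cannot masquerade as the same artifact.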
⚠️ 2. Global index scale
Billions of entries means:
- You cannot do full rebuilds
- You cannot afford heavy compaction
You need:
- Append-only index segments
- Snapshot-scoped immutable index layers
- Periodic merge but never rewrite
Think: LSM-tree, but snapshot-aligned and immutable after seal.
⚠️ 3. Hash conflicts & algorithm agility
You explicitly mention:
“duplicate conflicting hash…”
This is good — most systems pretend this won’t happen.
You need:
- Hash namespace = (algo, version)
- CAS identity ≠ hash alone
- Hash is a claim, not truth
Higher layers must be able to say:
“This artifact is identified by multiple hashes, with provenance.”
⚠️ 4. Determinism leaks in PEL
PEL determinism is only as strong as:
- Inputs
- Environment
- Execution semantics
Watch out for:
- Non-ordered maps
- Floating point differences
- Parallel execution ordering
- Clock / randomness leaks
Rule: PEL must be:
Pure function of snapshot + declared inputs
No exceptions.
4. Suggested Refinements
4.1 Artifact Storage Layer (ASL)
Make ASL explicitly responsible for:
- Encoding (compression, chunking)
- Layout (inline, small-block, big-block)
- Policy (retention, locality hints)
But never identity.
Identity must live above ASL, otherwise:
- Changing compression breaks hashes
- Migration becomes impossible
4.2 Receipts as First-Class Objects
PEL execution receipts should be:
- Immutable artifacts
- Content-addressed
- Stored in CAS
Receipt should include:
- Input artifact IDs
- Program hash
- Output artifact IDs
- Snapshot ID
- Execution parameters
This lets you:
- Project execution into a graph kernel (as you said)
- Re-verify execution later
- Build trust layers on top
4.3 Registries as Views, not Stores
Registries should be:
- Snapshot-scoped views
- Declarative mappings
- No ownership of data
Think:
registry(snapshot) = projection(CAS + receipts)
This avoids:
- Registry corruption
- Version skew
- Inconsistent dedup logic
5. Contracts Between Layers (important)
CAS guarantees:
- Immutable storage per snapshot
- Deterministic lookup
- Algorithm-agnostic identity support
ASL guarantees:
- Reversible encoding
- Stable layout per snapshot
- No semantic meaning
PEL guarantees:
- Deterministic execution
- Replayable receipts
- Snapshot-bound semantics
Upper layers guarantee:
- Policy
- Interpretation
- Certification
- Human meaning
If any layer violates its contract, debugging becomes impossible.
6. Sharp Questions That Matter Next
You don’t need to answer these to me — but you must answer them in the design.
- Is PEL allowed to read CAS metadata directly, or only artifacts?
- Are execution receipts themselves executable inputs?
- How do you garbage-collect unreachable artifacts without breaking provenance?
- Is snapshot identity global or per pool?
- Can two snapshots share artifacts but diverge in interpretation?
- What is the minimal thing required to reproduce an execution on another machine?
Bottom Line
You are building:
A deterministic, snapshot-based, content-addressed computation and provenance system.
The architecture is coherent and powerful. The biggest risks are identity leakage, determinism erosion, and index lifecycle management.
If you want, next we can:
- Formalize the artifact identity model
- Design the index layout
- Specify a PEL execution spec
- Or stress-test the design with failure scenarios
Just tell me where you want to go next.
Is PEL allowed to read CAS metadata directly, or only artifacts? PEL is designed on top of ASL. ASL does not have any metadata yet. It has a configuration. PEL needs to read the CURRENT snapshot from CAS/ASL metadata. Anything else? Are execution receipts themselves executable inputs? I guess they could be. I mean, PEL operates DAG programs composed of concat, slice and some other primitives. But I think programs can have side effects on hooks for projections. Input to a PEL program is an artifact or another PEL program. Receipts are artifacts, so yes, they are valid input to PEL programs, but not as executables. That said, looking down into the receipt you will probably find the DAG program that was involved and use that as an executable program in the PEL program DAG. How do you garbage-collect unreachable artifacts without breaking provenance? I'm not sure. Tombstones? Is snapshot identity global or per pool? I'm not sure... Can two snapshots share artifacts but diverge in interpretation? What? How? That is ZFS internals then, right? What is the minimal thing required to reproduce an execution on another machine? It is the last checkpoint and the sequential write log. Internally it would require the governance, trace, provenance, receipts, certificates, federation data, you name it...
Great, this clarifies a lot. I’ll answer in design-decision form, not just yes/no, and I’ll be opinionated where the architecture really needs a hard line.
1. Can PEL read CAS metadata directly, or only artifacts?
Short answer
PEL should read only a very small, explicit CAS/ASL metadata surface:
- Current snapshot identity
- PEL execution context metadata
- Nothing else
Long answer
You already made the correct move by placing PEL on top of ASL. That implies a strict rule:
PEL must not depend on physical storage metadata.
However, snapshot identity is not “storage metadata” — it is execution context. That is a critical distinction.
Required metadata PEL may read
PEL must be allowed to read:
1. Snapshot ID (opaque, stable)
   - Required for determinism
   - Required for receipts
   - Required to tie executions to checkpoints
2. PEL execution configuration
   - Hash algorithm set
   - Primitive semantics version
   - ASL configuration hash
These must be immutable for the duration of execution.
Forbidden metadata
PEL must not read:
- Block layout
- Shard IDs
- Bloom filters
- Index internals
- Physical offsets
- ZFS object IDs
If PEL reads these, determinism collapses.
Rule (important)
PEL may read only metadata that is itself snapshot-scoped and immutable.
If ASL “has no metadata yet”, then:
- Snapshot identity becomes the first ASL metadata
- Treat it as a root execution constant
2. Are execution receipts executable inputs?
You are very close to the right mental model already.
Clear separation you need
There are three distinct things:
- Artifact (bytes)
- Program (PEL DAG definition)
- Receipt (proof that a program ran)
Receipts are artifacts, but:
Receipts are data, not programs.
Your instinct is correct
“looking down into the receipt you will probably find the DAG program that was involved and use that as an executable program”
Yes — but this must be explicit, not implicit.
Correct rule
- A receipt may contain:
  - Program hash
  - Program DAG encoding
  - Inputs and outputs
- PEL must not automatically execute receipt contents
- A higher layer must extract a program from a receipt and submit it intentionally
This prevents:
- Accidental self-executing artifacts
- Execution smuggling
- Provenance corruption
Design recommendation
Add a concept:
ProgramDescriptor = artifact + semantic tag
Only ProgramDescriptor can be executed, not raw artifacts.
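A sketch of what the descriptor could look like in the C API. The struct, tag value, and check are hypothetical names; only the rule itself (the executor accepts descriptors, never raw artifacts) comes from the design:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* An artifact reference is just a hash id; it carries no execution rights. */
typedef struct { uint8_t hash[32]; } ArtifactRef;

#define PEL_TAG_PROGRAM 0x50454C50u   /* "PELP", illustrative */

/* A ProgramDescriptor wraps an artifact with an explicit semantic tag.
 * Only values of this type are accepted by the (hypothetical) executor. */
typedef struct {
    ArtifactRef program;      /* artifact containing the DAG encoding */
    uint32_t    semantics;    /* PEL primitive-semantics version */
    uint32_t    tag;          /* must equal PEL_TAG_PROGRAM */
} ProgramDescriptor;

static bool pel_is_executable(const ProgramDescriptor *d) {
    return d != NULL && d->tag == PEL_TAG_PROGRAM;
}
```

A receipt's embedded DAG only becomes runnable once some layer deliberately wraps it in a descriptor, which is exactly the "extract and submit intentionally" step.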
3. Garbage collection without breaking provenance
You are right to hesitate here — this is hard.
First principle
CAS GC must be reachability-based per snapshot.
You cannot GC globally without violating history.
The only viable model
Use tombstones + snapshot pinning.
Mechanics:
1. Each snapshot defines a root set
2. Artifacts reachable from that snapshot are alive
3. Unreachable artifacts become eligible, not deleted
4. Tombstone records:
   - Artifact ID
   - Snapshot range where it was last reachable
Deletion happens only when:
artifact not reachable from any retained snapshot
Provenance-safe deletion
- Receipts, certificates, and provenance graphs pin artifacts
- If provenance is required, the snapshot is retained
- No snapshot = no provenance guarantee
This is the unavoidable tradeoff.
Rule
You cannot have both infinite provenance and finite storage without policy.
Expose that policy explicitly.
4. Is snapshot identity global or per pool?
This one must be global.
Why per-pool breaks things
If snapshot identity is per ZFS pool:
- Federation becomes impossible
- Receipts lose meaning across machines
- Provenance graphs fragment
Correct model
SnapshotID = Hash(global_namespace || pool_id || txg || config)
- Opaque
- Comparable
- Serializable
- Verifiable
ZFS is an implementation detail
ZFS snapshot names are not sufficient as identities. They are sources of identity, not the identity itself.
5. Can two snapshots share artifacts but diverge in interpretation?
This is not ZFS internals — it’s a semantic layer issue.
Yes, they absolutely can.
Example:
- Snapshot A:
  - Hash algorithm = SHA256
  - Program semantics v1
- Snapshot B:
  - Hash algorithm = BLAKE3
  - Program semantics v2
Same bytes. Same blocks. Different meaning.
Interpretation depends on:
- Hash namespace
- ASL config
- PEL semantics
- Registry rules
- Certification policy
So:
Artifacts are immutable, meaning is snapshot-scoped.
This is a feature, not a bug.
6. Minimal thing required to reproduce an execution elsewhere
Your answer is almost correct — but missing one crucial thing.
Not sufficient
- Last checkpoint
- Sequential write log
That reproduces bytes, not meaning.
Minimal reproducible set
You need:
- Checkpoint snapshot
- Sequential write log
- PEL semantics version
- ASL configuration
- Hash algorithm registry
- Execution receipts
Without (3–5), you can replay bytes but not verify correctness.
Governance, federation, certificates?
Those are:
- Not required to replay
- Required to trust
This distinction is important.
Final Design Rules (Strong Recommendations)
- PEL reads snapshot identity and execution config only
- Receipts are artifacts, never auto-executables
- GC is snapshot-based with tombstones
- Snapshot identity is global and opaque
- Artifacts are bytes; interpretation is snapshot-scoped
- Reproducibility requires semantics, not just storage
If you want, next we can:
- Design the receipt format
- Define PEL determinism rules formally
- Specify GC lifecycle states
- Or design federation across pools
Just point to the next pressure point. Thank you, receipt format and PEL determinism rules are well defined. Provenance keeps snapshots held by ZFS. The internal system has a sense of domains and federation, published vs internal provenance.
That helps a lot. Given that receipt format and PEL determinism are already solid, the remaining work is mostly about making the boundaries explicit so the system stays sane at scale and across federation. I’ll focus on what to lock down, not re-design what you already have.
1. Snapshots held by provenance (good — but formalize it)
You said:
provenance keeps snapshots held by ZFS
This is the right anchor, but it needs one explicit rule:
Rule: Provenance is a snapshot retention authority
- Provenance objects must explicitly:
  - Pin snapshot IDs
  - Declare retention intent (internal, published, certified, etc.)
- ZFS snapshot retention must be driven by provenance, not by CAS reachability alone
Think of provenance as the root of truth for GC policy:
Provenance → Snapshot → Reachable Artifacts
This prevents:
- Accidental deletion of published artifacts
- Silent provenance loss
- Cross-domain retention leaks
2. Domains: internal vs published provenance
Your mention of:
domains and federation, published vs internal provenance
is extremely important. This is where many systems get muddy.
Strong recommendation: three provenance domains
1. Internal domain
- Full receipts
- Full DAGs
- Debug artifacts
- Intermediate hashes
- Possibly unstable semantics
Never federated by default.
2. Published domain
- Curated receipts
- Canonical artifacts
- Stable program semantics
- Redacted internals
This is what leaves the system.
3. Certified domain
- Signed receipts
- Verified semantics
- Frozen interpretation
- Long-term retention guarantees
Each domain:
- Pins its own snapshots
- Has its own GC horizon
- Has explicit export rules
3. Federation: what actually crosses boundaries
Federation must be artifact- and receipt-centric, not snapshot-centric.
What you should federate
- Artifacts (content-addressed)
- Receipts (as artifacts)
- Program descriptors
- Provenance summaries
What you must not federate
- ZFS snapshots
- Physical layout
- Shard topology
- Index internals
Federation should be possible over:
CAS + receipts + semantics registry
—not over storage state.
4. Interpretation divergence is now a feature
Given domains + federation, this becomes clean:
- Same artifact hash
- Different domains
- Different meaning
This allows:
- Internal experimentation
- External stability
- Future semantic upgrades
You get semantic versioning without data migration.
5. Minimal invariant set (this is the “do not break” list)
As long as these hold, the system stays coherent:
- Artifacts are immutable
- Receipts are immutable
- Snapshots are pinned by provenance
- PEL semantics are versioned and explicit
- Domains control interpretation, not storage
- Federation never depends on physical layout
Everything else can evolve.
6. One last subtle pressure point to watch
Because:
- Receipts are artifacts
- Receipts can be inputs to PEL
- Provenance pins snapshots
You can accidentally create:
Self-sustaining provenance loops
Example:
- Receipt A depends on artifact B
- Artifact B is produced by program whose receipt is A
This is logically valid but dangerous for GC and reasoning.
Recommendation
Add a simple invariant:
Provenance graphs must be acyclic at the snapshot level
Not at the artifact level — at the snapshot pinning level.
Bottom line
At this point, your system is not missing features — it’s missing formal guardrails.
You already have:
- Determinism
- Receipts
- Snapshots
- Federation concepts
What you need now is:
- Explicit domain boundaries
- Provenance-driven retention
- Clear export semantics
If you want, next we can:
- Formalize domain transitions (internal → published → certified)
- Design federation protocols
- Stress-test snapshot pinning under adversarial workloads
- Or walk through a full lifecycle of an artifact from creation to certification
Just say where you want to apply pressure next. We are looking into an index for hash to block, offset and length, where blocks are CAS/ASL-managed blocks matched to the ZFS blocksize. The blocks and the log should be needed to reach CURRENT. Indexing needs to be done without loading too much into memory. Here Bloom filters and sharding come into play.
Good, now we’re at the hard systems part 🙂 I’ll stay focused on hash → (block, offset, length) indexing at billions of entries, snapshot-aware, ZFS-aligned, and without large memory residency.
I’ll describe:
- Index shape
- Write path
- Read path
- Bloom + sharding strategy
- Snapshot sealing
- Why this works at scale
No fluff.
1. Index shape (what you should actually store)
You want the minimal mapping:
(hash_id, snapshot_id?) → (block_id, offset, length, flags)
Strong recommendation
Make the index append-only, segmented, and snapshot-sealed.
Physical layout
- Index segment = immutable file
- Each segment covers:
  - One shard
  - One snapshot range (or single snapshot)
- Segments are written sequentially
Think:
/index/
shard-00/
seg-000001.idx
seg-000002.idx
shard-01/
seg-000001.idx
2. Sharding strategy (critical)
Shard on hash prefix, not snapshot.
Example:
shard = hash_id[0..N bits]
Properties:
- Uniform distribution
- Deterministic placement
- SIMD-friendly comparisons
- No rebalancing required
Shard count should be:
- Large enough to bound segment size
- Small enough to avoid too many file handles
Rule of thumb:
- 2¹⁰–2¹⁴ shards (1024–16384)
3. Write path (fast, simple, safe)
Step-by-step
1. Artifact written to CAS block
   - Block is ZFS-sized (good)
   - Offset + length known
2. Index entry appended
   - Written to shard-local segment
   - Sequential write
   - No in-place update
3. Bloom filter updated
   - Shard-local
   - Snapshot-local
4. Snapshot seal
   - Segment becomes immutable
   - Bloom filter frozen
No global coordination required.
4. Read path (minimal IO, bounded)
Lookup algorithm
Given hash_id and CURRENT snapshot:
1. Compute shard from hash prefix
2. Load shard bloom filter (memory-mapped or cached)
3. If bloom says “no” → done
4. If “maybe”:
   - Scan segments newest → oldest
   - Stop on first match
Because:
- Newer segments shadow older ones
- No deletes (only tombstones if needed)
Why this is fast
- Bloom filters kill most misses
- Segment scan is sequential
- Segments are small and shard-bounded
- OS page cache does the heavy lifting
5. Bloom filters: how to not screw this up
Per-shard, per-segment bloom filters
Do not use one giant bloom.
Each index segment has:
- Its own bloom
- Sized to expected entries
- Frozen on snapshot seal
This gives:
- Predictable false-positive rate
- No saturation over time
- Easy discard when segment GC’d
Memory usage stays bounded.
6. SIMD-friendly index encoding
Inside a segment:
[hash_prefix][hash_suffix][block_id][offset][length][flags]
Recommendations:
- Fixed-width fields
- Aligned records
- Sorted by hash within segment (optional but powerful)
If sorted:
- Binary search or galloping scan
- SIMD compare of hash lanes
If unsorted:
- SIMD scan still works, just more comparisons
Sorting cost is paid once at seal time.
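If a segment is sealed sorted, lookup inside it reduces to a binary search over fixed-width records; this sketch assumes a simplified record with a 32-byte key compared via memcmp:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  key[32];    /* stored hash bits, fixed width */
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} Rec;                   /* simplified stand-in for the index record */

/* Binary search over a sealed, sorted segment. Returns the record
 * index on a hit, -1 on a miss. */
static long rec_find(const Rec *recs, long n, const uint8_t key[32]) {
    long lo = 0, hi = n - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int c = memcmp(recs[mid].key, key, 32);
        if (c == 0) return mid;
        if (c < 0) lo = mid + 1; else hi = mid - 1;
    }
    return -1;
}
```

Fixed-width records mean the search needs no offsets table: the i-th record is just `base + i * sizeof(Rec)`.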
7. Snapshot awareness (CURRENT vs history)
You said:
The blocks and the log should be needed to reach CURRENT
Correct.
Important rule
The index does not need to know ZFS history.
It only needs:
- Segment creation snapshot
- Segment seal snapshot
When resolving CURRENT:
- Ignore segments sealed after CURRENT
- Include all segments ≤ CURRENT
This is cheap:
- Just a snapshot ID comparison
- No traversal
8. Memory discipline (how this stays small)
You never load:
- Full index
- Full shard
- Full bloom universe
You load:
- Bloom for a shard (small)
- A few recent segments
- Possibly memory-map older ones
Worst-case memory:
O(shards × bloom_size) + O(active_segments)
Bounded. Predictable. Cache-friendly.
9. Tombstones (only if you really need them)
If you must support:
- Hash redefinition
- Conflict resolution
- Artifact invalidation
Then:
- Tombstone is just another index entry
- Higher precedence than data entries
- Stored in the same segment stream
No special machinery.
10. Why this design survives billions of entries
Because:
- Append-only
- No rewrites
- No global locks
- Snapshot-aligned immutability
- ZFS handles durability and caching
- Bloom filters kill negative lookups
- Shards bound working sets
This is essentially:
LSM-tree semantics, but snapshot-native and immutable-first
Which matches your system philosophy perfectly.
One final sharp recommendation
Do not try to make the index “smart”. Make it:
- Dumb
- Immutable
- Sequential
- Snapshot-scoped
All intelligence belongs:
- In PEL
- In provenance
- In registries
If you want next, we can:
- Pick exact record layouts (bytes-on-disk)
- Tune bloom parameters for your scale
- Walk through crash recovery
- Stress-test CURRENT resolution under heavy churn
Say the word. Pick exact record layouts: bytes-on-disk mapped to C structures (used for a C lib spawning tools and API frontends).
Great — I’ll be concrete and opinionated. Below is a bytes-on-disk index format that:
- Maps 1:1 to C structs
- Is SIMD-friendly
- Is append-only
- Works with sharding + bloom filters
- Is snapshot-aware without embedding ZFS internals
- Scales to billions of entries
- Avoids pointer chasing and heap allocation
I’ll give:
- High-level layout
- Segment file structure
- Exact record structs
- Bloom filter layout
- Snapshot sealing metadata
- Alignment + portability rules
- Example lookup flow in C terms
1. High-level layout
Each index segment file is immutable after sealing.
+----------------------+
| SegmentHeader |
+----------------------+
| BloomFilter |
+----------------------+
| IndexRecord[] | (append-only)
+----------------------+
| SegmentFooter |
+----------------------+
One segment:
- One shard
- One snapshot interval
- Sequentially written
- mmap-friendly
2. Segment header (fixed, small)
Design goals
- Fast validation
- Versioning
- Snapshot bounds
- No pointers
#pragma pack(push, 1)
typedef struct {
uint64_t magic; // e.g. 0x434153494E445831 ("CASINDX1")
uint16_t version; // format version
uint16_t shard_id; // hash prefix shard
uint32_t header_size; // bytes
uint64_t snapshot_min; // inclusive
uint64_t snapshot_max; // inclusive (== min until sealed)
uint64_t record_count; // filled at seal
uint64_t records_offset; // byte offset to IndexRecord[]
uint64_t bloom_offset; // byte offset
uint64_t bloom_size; // bytes
uint64_t flags; // sealed, sorted, has_tombstones, etc.
} SegmentHeader;
#pragma pack(pop)
Notes:
- snapshot_max == snapshot_min while writing
- On seal, snapshot_max is finalized
- Header never changes size
3. Index record layout (core of everything)
This is the hot path, so it must be perfect.
Hash strategy
Split hash into:
- Prefix → shard
- Suffix → stored inline
Assume:
- Hash = 256 bits
- Shard uses first 12 bits
- Remaining 244 bits stored
Record struct
#pragma pack(push, 1)
typedef struct {
uint64_t hash_hi; // hash bits 12..75
uint64_t hash_mid; // hash bits 76..139
uint64_t hash_lo; // hash bits 140..203
uint64_t hash_tail; // hash bits 204..255 (52 bits, zero-padded)
uint64_t block_id; // CAS/ASL block identifier
uint32_t offset; // offset within block
uint32_t length; // length in bytes
uint32_t flags; // data, tombstone, algo_id, etc.
uint32_t reserved[3]; // pads record to 64 bytes / future use
} IndexRecord;
#pragma pack(pop)
Properties
- 64 bytes total (cache-line aligned)
- Fixed-width
- SIMD-comparable hash lanes
- No pointers
- No varints
- Works with memcmp, AVX, NEON
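The hash_equal used later in the lookup sketch can be a branch-free compare over the four fixed-width lanes; the helper type here is a simplified stand-in for the record's key fields:

```c
#include <stdbool.h>
#include <stdint.h>

/* Key portion of an index record: four fixed-width hash lanes. */
typedef struct {
    uint64_t hash_hi, hash_mid, hash_lo, hash_tail;
} HashLanes;

/* Branch-free equality over the four lanes; compilers typically lower
 * this to a handful of scalar or vector compares. */
static bool hash_equal(const HashLanes *a, const HashLanes *b) {
    uint64_t d = (a->hash_hi  ^ b->hash_hi)  | (a->hash_mid  ^ b->hash_mid)
               | (a->hash_lo  ^ b->hash_lo)  | (a->hash_tail ^ b->hash_tail);
    return d == 0;
}
```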
4. Flags definition (important)
#define IDX_FLAG_DATA 0x00000001
#define IDX_FLAG_TOMBSTONE 0x00000002
#define IDX_FLAG_INLINE 0x00000004 // small payload embedded elsewhere
#define IDX_FLAG_ALGO_SHIFT 8 // upper bits for hash algo id
Hash algorithm ID lives in flags — not in payload.
5. Bloom filter layout
Bloom is segment-local, frozen at seal.
Header
#pragma pack(push, 1)
typedef struct {
uint64_t hash_seed; // per-segment
uint32_t k; // number of hash functions
uint32_t bits_per_entry; // tuning knob
uint64_t bit_count; // total bits
} BloomHeader;
#pragma pack(pop)
Bit array
Immediately follows header:
uint8_t bloom_bits[];
Bloom filter offset and size are in SegmentHeader.
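A probe matching this layout could use the common double-hashing scheme (bit_i = h1 + i·h2); deriving h1/h2 from the artifact hash plus hash_seed is omitted here and is an assumption, not part of the format:

```c
#include <stdbool.h>
#include <stdint.h>

/* Set the k probe bits for one entry: bit_i = (h1 + i*h2) mod bit_count. */
static void bloom_set(uint8_t *bits, uint64_t bit_count,
                      uint32_t k, uint64_t h1, uint64_t h2) {
    for (uint32_t i = 0; i < k; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % bit_count;
        bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}

/* Probe: any clear bit means "definitely absent". */
static bool bloom_maybe_hit(const uint8_t *bits, uint64_t bit_count,
                            uint32_t k, uint64_t h1, uint64_t h2) {
    for (uint32_t i = 0; i < k; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % bit_count;
        if (!(bits[bit >> 3] & (1u << (bit & 7))))
            return false;
    }
    return true;   /* maybe present */
}
```

Because the bit array sits directly after BloomHeader in the mmap'd segment, the probe touches at most k cache lines and no heap.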
6. Segment footer (integrity + seal)
#pragma pack(push, 1)
typedef struct {
uint64_t crc64; // over header+bloom+records
uint64_t seal_snapshot; // snapshot at seal time
uint64_t seal_time_ns; // optional
} SegmentFooter;
#pragma pack(pop)
Footer allows:
- Corruption detection
- Fast rejection
- Seal verification
7. Alignment & portability rules
Non-negotiable rules:
- All structs are packed
- All integers are little-endian
- No pointers
- No platform-dependent types
- All offsets are file-relative
This makes:
- mmap safe
- Cross-language bindings trivial
- Zero-copy reads possible
8. Memory-mapped usage model (C)
Typical read path:
const SegmentHeader *hdr = mmap_segment(path);
if (!bloom_maybe(hdr, hash))
return NOT_FOUND;
const IndexRecord *records = (const IndexRecord *)(
(const uint8_t *)hdr + hdr->records_offset
);
/* newest entries shadow older ones: scan backwards */
for (int64_t i = (int64_t)hdr->record_count - 1; i >= 0; --i) {
if (hash_equal(&records[i], hash)) {
if (records[i].flags & IDX_FLAG_TOMBSTONE)
return NOT_FOUND;
*out = records[i]; /* copy the hit to the caller */
return FOUND;
}
}
return NOT_FOUND;
With sorting:
- Binary search
- Galloping search
- SIMD blocks
9. Optional: Sorted segments (recommended)
At seal time:
- Sort records by hash suffix
- Set SEGMENT_FLAG_SORTED
This gives:
- Faster lookup
- Smaller scan window
- Better cache locality
Sorting cost is amortized and paid once.
10. Why this layout works long-term
- Immutable
- Append-only
- Snapshot-scoped
- SIMD-friendly
- mmap-friendly
- No schema migration pain
- Easy to generate bindings (Rust, Go, Python)
This is the kind of layout that survives decades, not just versions.
If you want next
We can:
- Tune record size vs hash width
- Design small-block inline payload segments
- Add prefix-compressed variants
- Design cross-segment merge logic
- Generate actual C headers + helpers
Tell me where to zoom in next.
I think we may have discussed block size, but not my original intent: that ASL artifacts of different sizes are kept apart as big and small.
Good catch — this is an important distinction, and it affects block layout, indexing, and GC. You’re not just talking about ZFS blocksize anymore, you’re talking about artifact size classes inside ASL, with different physical handling.
I’ll realign the design to your original intent:
ASL artifacts of different sizes are kept apart as small and big, even though both live on ZFS and are indexed uniformly.
1. Clarify the layers (important reset)
CAS
- Logical identity (hash → artifact)
- Index (hash → location)
- Snapshot semantics
ASL
- Physical artifact storage
- Size-class separation
- Layout & packing
- Block management
ZFS blocksize is a substrate detail. ASL defines artifact blocks, which may map onto one or more ZFS blocks.
2. Size classes (explicit, non-negotiable)
You want at least two classes:
Small artifacts
- Typical sizes: bytes → tens of KB
- Stored packed
- High density
- High fan-in
- Many per block
Big artifacts
- Typical sizes: hundreds of KB → GB
- Stored directly
- Minimal fragmentation
- Streaming-friendly
Make this explicit in ASL config:
typedef struct {
uint32_t small_max_size; // e.g. 64 KB
uint32_t small_block_size; // e.g. 1–4 MB
uint32_t big_block_size; // e.g. ZFS recordsize or multiples
} ASLConfig;
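A minimal sketch of how these knobs might be consumed; the concrete threshold values are illustrative defaults, not recommendations:

```c
#include <stdint.h>
#include <stdbool.h>

/* Mirrors the ASLConfig above. */
typedef struct {
    uint32_t small_max_size;   /* e.g. 64 KB  */
    uint32_t small_block_size; /* e.g. 1-4 MB */
    uint32_t big_block_size;   /* e.g. ZFS recordsize multiples */
} ASLConfig;

/* Illustrative defaults; real values come from workload tuning. */
static ASLConfig asl_default_config(void) {
    ASLConfig cfg;
    cfg.small_max_size   = 64u * 1024;         /* 64 KB  */
    cfg.small_block_size = 4u * 1024 * 1024;   /* 4 MB   */
    cfg.big_block_size   = 16u * 1024 * 1024;  /* 16 MB  */
    return cfg;
}

/* The only decision the config drives: which packing path to take. */
static bool asl_is_small(const ASLConfig *cfg, uint64_t artifact_size) {
    return artifact_size <= cfg->small_max_size;
}
```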
3. Physical layout by size class
3.1 Small artifact blocks
Small artifacts are packed into ASL small blocks.
+------------------------+
| SmallBlockHeader |
+------------------------+
| ArtifactEntry[] |
+------------------------+
| Free space |
+------------------------+
Each artifact:
- Stored contiguously
- Offset + length identifies it
- No pointer chasing
Header
#pragma pack(push, 1)
typedef struct {
uint64_t block_id;
uint32_t used_bytes;
uint32_t artifact_count;
} SmallBlockHeader;
#pragma pack(pop)
Artifacts are appended sequentially.
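A sketch of that append path, assuming a fixed-capacity in-memory block buffer; the capacity constant is illustrative, and padding/packing pragmas are omitted for brevity:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_CAP 4096u  /* illustrative capacity, not a spec value */

/* Mirrors SmallBlockHeader above. */
typedef struct {
    uint64_t block_id;
    uint32_t used_bytes;      /* bytes consumed after the header */
    uint32_t artifact_count;
} SmallBlockHeader;

/* Append one artifact; returns its offset within the block, or -1 if
 * it does not fit (caller seals this block and opens a new one). */
static int64_t small_block_append(uint8_t *block, const void *payload,
                                  uint32_t len) {
    SmallBlockHeader *h = (SmallBlockHeader *)block;
    uint32_t off = (uint32_t)sizeof(*h) + h->used_bytes;
    if ((uint64_t)off + len > BLOCK_CAP)
        return -1;                      /* full: seal and move on */
    memcpy(block + off, payload, len);  /* contiguous, no pointers */
    h->used_bytes += len;
    h->artifact_count += 1;
    return (int64_t)off;                /* (offset, len) goes to the index */
}
```

The returned offset is exactly what the index stores, so reads need no per-block directory: `block_base + offset` is the artifact.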
3.2 Big artifact blocks
Big artifacts:
- Stored one-per-block or as extent chains
- No packing
- No fragmentation games
Options:
- Single ASL block == one artifact
- Or extent list if very large
Header example:
#pragma pack(push, 1)
typedef struct {
uint64_t block_id;
uint64_t artifact_size;
uint64_t flags;
} BigBlockHeader;
#pragma pack(pop)
Payload follows immediately.
4. Index mapping (now corrected)
Index record must reflect ASL-level blocks, not ZFS blocks.
So your index record fields:
hash → (asl_block_id, offset, length, flags)
Flags indicate size class
#define IDX_FLAG_SMALL 0x00000001
#define IDX_FLAG_BIG 0x00000002
This tells the reader:
- Whether to go through small-block unpacking
- Or treat the block as a direct artifact
No ambiguity.
5. Why separation matters (practical wins)
Small artifacts
- Better cache locality
- Lower metadata overhead
- Faster dedup for tiny items
- Lower index pressure
Big artifacts
- No read amplification
- Streaming-friendly
- Better ZFS prefetch
- Avoids hot small-block churn
This matches real-world workloads extremely well.
6. GC implications (size-aware)
GC works at ASL block granularity:
- Big blocks:
  - Drop the block when unreferenced
- Small blocks, use one of:
  - Reference counting per artifact
  - Evacuating live artifacts to new blocks (copy GC)
  - Tombstoning + rebuilding during compaction
Important: small-block GC is optional early on. You can defer it until fragmentation hurts.
7. Snapshot interaction (no surprises)
- Index entries are snapshot-scoped
- ASL blocks are immutable once sealed
- Small blocks fill, seal, move on
- Big blocks seal immediately
Snapshots pin blocks naturally.
8. Revised index record (final form)
Here’s the corrected index record given size classes:
#pragma pack(push, 1)
typedef struct {
uint64_t hash_hi;
uint64_t hash_mid;
uint64_t hash_lo;
uint32_t hash_tail;
uint64_t asl_block_id;
uint32_t offset;
uint32_t length;
uint32_t flags; // SMALL / BIG / TOMBSTONE / ALGO
uint32_t reserved;
} IndexRecord;
#pragma pack(pop)
Same as before — but semantics are now correct.
9. One important rule to write down
CAS never reasons about size classes. ASL never reasons about identity.
Size class is a storage concern, exposed only via flags for correct decoding.
10. Where to go next (useful follow-ups)
If you want, we can:
- Design small-block packing format in more detail
- Define ASL block ID allocation
- Design small-block compaction / evacuation
- Tune size thresholds using workload assumptions
- Design zero-copy reads for small artifacts
This clarification was important — your instinct was right to pause here.
Examine the model. Size class to me was something ASL did when creating and sending blocks. It is just ASL packaging smaller items in one box, and preferably not mixing small items in with the bigger items, out of courtesy.
That clarification is important — and it actually simplifies the model in a good way. Let me examine it as you intend it, not as a generic CAS pattern.
You are not introducing size classes as a semantic concept. You are introducing courtesy-based physical packaging inside ASL.
That’s a subtle but powerful distinction.
1. Restating your intent (to check alignment)
What you are saying is effectively:
ASL produces blocks. ASL prefers to package small artifacts together and not mix them with big ones. Size is a heuristic for packaging, not an exposed concept.
Key consequences:
- CAS does not know about size classes
- Index does not encode size class semantics
- PEL does not care
- Size affects only how ASL fills blocks
If this matches your intent — then the earlier “size class flags” were too strong.
I agree with you.
2. Correct model after correction
What size affects
- Block fill strategy
- Block sealing policy
- Write amplification
- Cache locality
What size does not affect
- Identity
- Index semantics
- Read logic
- Provenance
- Snapshot behavior
This keeps the system clean.
3. ASL block abstraction (single kind of block)
You should think in terms of one ASL block type, with soft heuristics:
ASL Block
├── block_id
├── payload_bytes
├── internal layout (opaque to CAS)
└── sealed or open
Inside ASL:
- Some blocks will naturally contain many small artifacts
- Some blocks will contain one large artifact
- CAS does not know or care
This is exactly like object packing in Git packfiles — but snapshot-native.
4. Index stays simple (good)
Your original index mapping remains correct and minimal:
hash → (asl_block_id, offset, length)
No flags required for size.
That’s a strong design win:
- Fewer invariants
- Fewer migrations
- Less coupling
5. Reads remain uniform
Read path:
- Lookup hash
- Get (block_id, offset, length)
- Ask ASL: “give me bytes”
- ASL does whatever it needs internally
PEL and CAS never branch on size.
6. Writes: where size actually matters
Size only influences block selection:
- If the artifact is “small enough”:
  - Try the current open block
- If the artifact is “large”:
  - Open a fresh block
  - Write it
  - Seal immediately
But this rule is:
- Local
- Heuristic
- Not persistent knowledge
You can change it later without breaking anything.
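A sketch of that allocation-time heuristic, with hypothetical writer state and thresholds. Nothing here is persisted or observable above ASL, which is exactly why the rule can change later:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical writer state: one open block at a time. The capacity
 * and threshold are local heuristics, not persisted knowledge. */
typedef struct {
    uint64_t open_block_id;
    uint32_t open_block_used;
    uint32_t block_capacity;   /* e.g. 4 MB   */
    uint32_t large_threshold;  /* e.g. 256 KB */
    uint64_t next_block_id;
} AslWriter;

/* Decide where the next artifact goes. Returns the chosen block id;
 * sets *fresh if a new block had to be opened. */
static uint64_t asl_choose_block(AslWriter *w, uint32_t artifact_len,
                                 bool *fresh) {
    bool large = artifact_len >= w->large_threshold;
    bool fits  = !large &&
                 w->open_block_used + artifact_len <= w->block_capacity;
    *fresh = !fits;
    if (!fits) {                       /* large, or current block full */
        w->open_block_id = w->next_block_id++;
        w->open_block_used = 0;
    }
    w->open_block_used += artifact_len;
    if (large)                         /* seal immediately: never reused */
        w->open_block_used = w->block_capacity;
    return w->open_block_id;
}
```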
7. GC implications (still clean)
Because:
- Blocks are immutable once sealed
- Index defines reachability
GC can operate purely on:
- Block reachability
- Snapshot pinning
Small-block compaction is:
- An optimization
- Entirely inside ASL
- Invisible to CAS/PEL
No semantic coupling introduced.
8. One thing to be explicit about (important)
Even though size is not a semantic concept, you still want:
ASL blocks must never be partially mutable after seal.
Otherwise:
- Offsets become unstable
- Index breaks
- Receipts lose meaning
As long as:
- ASL blocks are append-only then sealed
- No in-place mutation after seal
You’re safe.
9. Summary: model sanity check
Your corrected model is:
- ✔ Size is a packaging courtesy
- ✔ ASL hides size behavior
- ✔ CAS index remains minimal
- ✔ PEL is unaffected
- ✔ Future heuristics are swappable
This is better than a hard size-class model.
10. One gentle recommendation (optional)
Even if size is not semantic, consider storing block hints internally in ASL:
enum ASLBlockHint {
ASL_HINT_DENSE,
ASL_HINT_SINGLETON,
};
This never leaves ASL, but helps:
- Compaction
- GC heuristics
- Read-ahead tuning
No one else needs to know.
Bottom line
Your instinct is correct: size belongs to ASL behavior, not system meaning.
The model is coherent, simpler, and more evolvable this way.
If you want, next we can:
- Stress-test this under mixed workloads
- Design ASL block ID allocation
- Walk through crash recovery of open blocks
- Or define ASL compaction strategies
Just point the spotlight.
Design ASL block ID allocation. I presume having two concrete ZFS datasets attached to the system would have an impact. Is that right? What do you say?
Good question — yes, having two concrete ZFS datasets absolutely has impact, and if you lean into that fact instead of abstracting it away, the design gets simpler and safer.
I’ll go step-by-step and keep it grounded in how ZFS actually behaves, while preserving your ASL abstraction.
1. What an ASL Block ID really is (first principle)
An ASL block ID is not a physical address. It is a stable, opaque capability that:
- Identifies a sealed ASL block
- Survives snapshots
- Is valid across restarts
- Is independent of index layout
- Can be resolved via ASL alone
So:
Block ID ≠ offset, ≠ object number, ≠ filename
Block ID is a name, not a location.
2. Why ZFS datasets matter (you’re right)
ZFS datasets give you:
- Independent recordsize
- Independent compression
- Independent quotas
- Independent snapshot lifecycles
- Independent IO behavior
That maps perfectly to your packaging courtesy idea.
So yes: two datasets are a feature, not a complication.
3. Recommended dataset topology
At minimum:
pool/
├── asl-small/ (dense packing, many artifacts per block)
└── asl-big/ (mostly one artifact per block)
Optionally later:
├── asl-volatile/
├── asl-published/
└── asl-certified/
But start with two.
Dataset tuning examples
- asl-small
  - smaller recordsize (e.g. 64K–256K)
  - aggressive compression
- asl-big
  - larger recordsize (e.g. 1M–8M)
  - streaming-friendly
ASL decides which dataset to write to — CAS never sees this.
4. Block ID structure (this is the key)
Strong recommendation: structured, but opaque
typedef struct {
uint64_t dataset_id; // small / big / future
uint64_t sequence; // monotonically increasing
} ASLBlockID;
Total: 128 bits (or hash it down if you want).
Why this works
- No global coordination
- No contention
- No reuse
- Easy sharding
- Easy federation
5. Dataset ID (small, fixed namespace)
Dataset ID is not dynamic.
Example:
enum ASLDatasetID {
ASL_DS_SMALL = 1,
ASL_DS_BIG = 2
};
This ID is:
- Compiled into ASL
- Part of the block ID
- Stable forever
Changing dataset tuning does not change dataset ID.
6. Sequence allocation (simple and robust)
Each dataset maintains its own allocator:
asl-small: next_block_seq
asl-big: next_block_seq
Allocation rules:
- Monotonic
- Append-only
- Persisted in dataset-local metadata
- Advanced only when block is created
On crash
- You may leak an ID
- You never reuse an ID
- That’s fine
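One way to implement this allocator is batch reservation: persist a high-water mark before handing out any IDs below it, so a crash can only leak IDs, never reuse them. A sketch, with a stand-in for the fsync'd metadata write:

```c
#include <stdint.h>

/* Sketch of a crash-safe monotonic allocator. persist() stands in for
 * a durable (fsync'd) write of the high-water mark to dataset-local
 * metadata; here it just records the value so the idea is testable. */
typedef struct {
    uint64_t next;       /* next id to hand out (in memory)   */
    uint64_t reserved;   /* current reservation ceiling       */
    uint64_t batch;      /* ids reserved per metadata write   */
    uint64_t persisted;  /* stand-in for the on-disk value    */
} SeqAlloc;

static void persist(SeqAlloc *a, uint64_t v) { a->persisted = v; }

static uint64_t seq_alloc(SeqAlloc *a) {
    if (a->next == a->reserved) {
        /* Persist BEFORE handing out ids: a crash now leaks up to
         * `batch` ids but can never hand one out twice. */
        a->reserved += a->batch;
        persist(a, a->reserved);
    }
    return a->next++;
}

/* After a crash, resume strictly at the persisted mark. */
static void seq_recover(SeqAlloc *a) {
    a->next = a->reserved = a->persisted;
}
```

Batching amortizes the metadata write: one durable update per `batch` allocations, at the cost of leaking at most `batch - 1` IDs per crash.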
7. Mapping Block ID → storage (ASL-only)
ASL resolves block IDs like this:
(dataset_id, sequence)
↓
dataset root
↓
object path or ZFS object
Example mapping (implementation choice):
asl-small/blocks/0000/0000/00012345.blk
asl-big/blocks/0000/0000/00098765.blk
Or:
- ZFS object per block
- ZVOL
- File-backed objects
CAS does not care.
8. Snapshot interaction (clean)
Because:
- Blocks are immutable after seal
- Stored inside snapshot-aware datasets
Then:
- Snapshot pins blocks automatically
- Index references remain valid
- Provenance pinning works naturally
No extra machinery required.
9. Federation implications (important)
Block IDs must not encode pool identity.
Why:
- Federation means blocks may move
- Blocks may be replicated
- Pool names are local
If you need global uniqueness:
GlobalBlockID = hash(cluster_id || ASLBlockID)
But internally:
(dataset_id, sequence)is perfect
10. Why two datasets improves the system
You get:
- Independent IO characteristics
- Natural separation without semantics
- Easier tuning
- Less fragmentation
- Clear GC boundaries
And crucially:
You can change the heuristic later without breaking block identity.
11. What not to do (anti-patterns)
Avoid:
- Encoding artifact size into block ID
- Encoding snapshot ID into block ID
- Encoding physical offsets
- Using filenames as identity
- Reusing block IDs
All of these break invariants later.
12. Final sanity check
Your model now looks like this:
- ASL:
  - Chooses dataset
  - Allocates block ID
  - Writes block
  - Seals block
- CAS:
  - Stores (block_id, offset, length)
- PEL:
  - Never sees block IDs
- ZFS:
  - Handles durability, snapshots, caching
This is clean, layered, and future-proof.
If you want next
We can:
- Design ASL open-block lifecycle
- Design crash recovery for partially written blocks
- Decide file vs object vs zvol per block
- Design block compaction / evacuation
- Stress-test block ID growth and limits
Just say where to drill next.
Hmm... Small and big blocks should be addressed the same way. Can they maybe share storage closer, somehow?
Yes — and your instinct is right. If small and big blocks are addressed the same way, then the earlier “dataset-per-size” framing is too strong. Let’s correct it without losing the practical benefits.
What you actually want is:
One ASL block namespace, one addressing scheme, one resolution path — with storage locality hints, not semantic separation.
That is both cleaner and more powerful.
1. Re-establish the invariant (this is the anchor)
Invariant you want to preserve:
- ASLBlockID is opaque
- Resolving a block ID does not require knowing whether it was “small” or “big”
- CAS and PEL never branch on block kind
- Storage choice is an ASL internal decision
So:
Small vs big is not part of the block’s identity.
Good. That rules out encoding size class into the ID.
2. The mistake to avoid
The mistake would be:
- Two datasets ⇒ two namespaces
- Two datasets ⇒ different resolution logic
- Two datasets ⇒ semantic leakage
You don’t want that.
3. The correct model: one namespace, shared storage, hinted placement
One block ID type
typedef uint64_t ASLBlockID;
Monotonic, opaque, never reused.
No dataset ID. No size bit. No encoding tricks.
4. How blocks “share storage closer”
Instead of separate datasets, use one dataset with structured locality.
Example layout:
asl/
├── blocks/
│ ├── open/
│ ├── sealed/
│ │ ├── 00/
│ │ ├── 01/
│ │ └── ff/
Block ID → directory sharding only.
Now comes the important part.
5. Locality without semantic separation (the key trick)
When ASL allocates a block, it supplies placement hints, not identity changes.
Examples of hints
- ZFS recordsize hint
- Compression hint
- Write pattern hint
- Expected fill density hint
These hints are best-effort and local.
They may be implemented as:
- ZFS properties on child datasets (properties attach to datasets, not plain directories)
- Write-size and allocation choices at block creation
- Different write sizes
- Different block finalization strategies
But the block ID and path resolution remain identical.
6. Concrete example (single dataset, different behavior)
asl/
├── blocks/
│ ├── dense/ ← preferred for small artifacts
│ ├── sparse/ ← preferred for large artifacts
Both live in the same dataset.
Resolution logic:
block_id → hash → directory → file
No “if small then …” at read time.
Only at allocation time:
- ASL prefers dense/ if the current block has room
- ASL prefers sparse/ if the artifact is large
But if tomorrow you change the heuristic:
- Old blocks still work
- IDs don’t change
- Index doesn’t change
That’s exactly what you want.
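The uniform resolution chain can be sketched as a pure function of the bare block ID; the shard function (Fibonacci hashing here) and path layout are illustrative, and any stable scheme works as long as it never changes after blocks are written:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* block_id -> hash -> directory shard -> file. Note there is no
 * "dense vs sparse" branch here: placement preferences at write time
 * never leak into the read path. */
static void asl_block_path(char *buf, size_t buflen, uint64_t block_id) {
    /* Fibonacci hashing spreads consecutive ids across shards. */
    uint64_t h = block_id * 0x9e3779b97f4a7c15ULL;
    unsigned shard = (unsigned)(h >> 56);  /* top byte: 256 shards */
    snprintf(buf, buflen, "asl/blocks/sealed/%02x/%016llx.blk",
             shard, (unsigned long long)block_id);
}
```

Because the path is derived from the ID alone, changing the allocation heuristic tomorrow leaves every old block resolvable.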
7. ZFS tuning without semantic coupling
You can still tune ZFS:
- Different recordsize for dense/ vs sparse/
- Different compression
- Different prefetch behavior
One caveat: ZFS property inheritance operates on the dataset hierarchy, not on plain directories. To tune dense/ and sparse/ independently, create them as child datasets under one parent; resolution paths stay identical, so logical uniformity is preserved.
So you get:
- Physical differentiation
- Logical uniformity
8. GC and compaction stay simple
Because:
- Blocks are immutable after seal
- Block ID resolution is uniform
GC can:
- Walk sealed blocks
- Check reachability
- Delete whole blocks
No special cases.
Compaction:
- Can evacuate artifacts into new blocks
- Old block ID becomes unreachable
- Index updates do the rest
9. Why this is better than two datasets
Two datasets:
- Create implicit semantic boundaries
- Complicate federation
- Complicate tooling
- Leak storage decisions upward
One dataset with locality hints:
- Keeps one namespace
- Keeps resolution uniform
- Keeps future flexibility
This matches your “courtesy” framing perfectly.
10. Revised ASL block ID allocation (final form)
Allocation rule:
- Block ID = monotonically increasing counter
- Allocation is global to ASL
- Block ID never encodes size, dataset, or policy
Placement rule:
- ASL chooses where to place the block based on heuristics
- That choice is not observable above ASL
This separation is the core design win.
11. Sanity check against your earlier statements
- ✔ “Small and big blocks should be addressed the same way” → YES
- ✔ “Size is ASL packaging courtesy” → YES
- ✔ “Prefer not mixing small with big” → YES (heuristic)
- ✔ “No semantic exposure” → YES
The model is now internally consistent.
12. One last recommendation (tiny but important)
Document this explicitly:
Block placement is an optimization, not a contract.
That sentence will save you years later.
If you want next, the natural follow-ups are:
- ASL open-block lifecycle and sealing
- Crash recovery for partially written blocks
- Directory / object layout details on ZFS
- How compaction rewrites index safely
Say which one you want to tackle.
Yes. Could you help me capture this model in an architecture and specification?
Absolutely. I’ll help you capture the model cleanly in a way that is:
- Precise enough to implement
- Explicit about invariants
- Careful about what is not part of the contract
- Suitable as an architecture + specification section
I’ll structure this as something you could almost drop into a design doc.
ASL Block Architecture & Specification
1. Purpose and Scope
The Artifact Storage Layer (ASL) is responsible for the physical storage, layout, and retrieval of immutable artifact bytes. ASL operates beneath CAS and above the storage substrate (ZFS).
ASL concerns itself with:
- Efficient packaging of artifacts into blocks
- Stable block addressing
- Snapshot-safe immutability
- Storage-local optimizations
ASL does not define:
- Artifact identity
- Hash semantics
- Provenance
- Interpretation
- Indexing semantics
2. Core Abstractions
2.1 Artifact
An artifact is an immutable byte sequence produced or consumed by higher layers.
ASL treats artifacts as opaque bytes.
2.2 ASL Block
An ASL block is the smallest independently addressable, immutable unit of storage managed by ASL.
Properties:
- Identified by an ASL Block ID
- Contains one or more artifacts
- Written sequentially
- Immutable once sealed
- Snapshot-safe
ASL blocks are the unit of:
- Storage
- Reachability
- Garbage collection
2.3 ASL Block ID
An ASL Block ID is an opaque, stable identifier.
Invariants
- Globally unique within an ASL instance
- Never reused
- Never mutated
- Does not encode:
  - Artifact size
  - Placement
  - Snapshot
  - Storage topology
  - Policy decisions
Semantics
Block IDs identify logical blocks, not physical locations.
Higher layers must treat block IDs as uninterpretable tokens.
3. Addressing Model
ASL exposes a single addressing primitive:
(block_id, offset, length) → bytes
This is the only contract between CAS and ASL.
Notes:
- offset and length are stable for the lifetime of the block
- ASL guarantees that reads are deterministic per snapshot
- No size-class or block-kind information is exposed
4. Block Allocation Model
4.1 Global Block Namespace
ASL maintains a single global block namespace.
Block IDs are allocated from a monotonically increasing sequence:
next_block_id := next_block_id + 1
Properties:
- Allocation is append-only
- Leaked IDs are permitted
- No coordination with CAS is required
4.2 Open Blocks
At any time, ASL may maintain one or more open blocks.
Open blocks:
- Accept new artifact writes
- Are not visible to readers
- Are not referenced by the index
- May be abandoned on crash
4.3 Sealed Blocks
A block becomes sealed when:
- It reaches an internal fill threshold, or
- ASL decides to finalize it for policy reasons
Once sealed:
- No further writes are permitted
- Offsets and lengths become permanent
- The block becomes visible to CAS
- The block may be referenced by index entries
Sealed blocks are immutable forever.
5. Packaging Policy (Non-Semantic)
ASL applies packaging heuristics when choosing how to place artifacts into blocks.
Examples:
- Prefer packing many small artifacts together
- Prefer isolating very large artifacts
- Avoid mixing vastly different sizes when convenient
Important rule
Packaging decisions are:
- Best-effort
- Local
- Replaceable
- Not part of the ASL contract
No higher layer may assume anything about block contents based on artifact size.
6. Storage Layout and Locality
6.1 Single Dataset, Structured Locality
ASL stores all blocks within a single ZFS dataset subtree (one parent dataset, optionally with child datasets for property tuning).
Within that subtree, ASL may organize blocks into subpaths to improve locality, e.g.:
asl/blocks/dense/
asl/blocks/sparse/
These subpaths:
- Exist purely for storage optimization
- May be realized as child datasets in order to carry ZFS property overrides
- Are not encoded into block identity
Block resolution does not depend on knowing which subpath was used.
6.2 Placement Hints
At allocation time, ASL may apply placement hints, such as:
- Preferred directory
- Write size
- Compression preference
- Recordsize alignment
These hints:
- Affect only physical layout
- May change over time
- Do not affect block identity or correctness
7. Snapshot Semantics
ASL is snapshot-aware but snapshot-agnostic.
Rules:
- ASL blocks live inside snapshot-capable storage
- Snapshots naturally pin sealed blocks
- ASL does not encode snapshot IDs into block IDs
- CAS determines snapshot visibility
ASL guarantees:
- Deterministic reads for a given snapshot
- No mutation of sealed blocks across snapshots
8. Crash Safety and Recovery
8.1 Crash During Open Block
If a crash occurs:
- Open blocks may be lost or abandoned
- Block IDs allocated but not sealed may be leaked
- No sealed block may be corrupted
This is acceptable and expected.
8.2 Recovery Rules
On startup, ASL:
- Scans for sealed blocks
- Ignores or cleans up abandoned open blocks
- Resumes allocation from the next unused block ID
No global replay or rebuild is required.
9. Garbage Collection
ASL performs garbage collection at block granularity.
Rules:
- A block is eligible for deletion if:
  - It is sealed, and
  - It is unreachable from all retained snapshots
- ASL does not perform partial block mutation
- Compaction (if any) rewrites artifacts into new blocks
Block deletion is irreversible.
10. Non-Goals (Explicit)
ASL explicitly does not provide:
- Artifact identity management
- Deduplication decisions
- Provenance interpretation
- Size-class semantics
- Execution semantics
Those concerns belong to CAS, PEL, and higher layers.
11. Design Summary (Executive)
- One block namespace
- One addressing model
- One read path
- Placement is an optimization
- Immutability is absolute
- Snapshots provide safety
- Size is a courtesy, not a contract
If you want, next we can:
- Add formal invariants (for verification)
- Turn this into a C header + comments
- Write pseudo-code for block lifecycle
- Integrate this spec with the index spec
- Stress-test this model against adversarial failures
Just tell me how formal you want the next step to be.
The checkpoint is only a checkpoint; the append-only write log can recover state above the checkpoint.
Good clarification — this is an important correction to the persistence model, and it fits very well with everything you’ve designed so far. Let’s fold it cleanly into the architecture/spec without breaking invariants.
I’ll do three things:
- Restate the persistence model precisely
- Adjust the ASL + CAS responsibilities
- Capture it as spec text you can drop in
1. Correct persistence model (precise statement)
What you are saying is:
A snapshot (checkpoint) is not the full state. The system state is: checkpoint + append-only write log.
This means:
- Snapshots are checkpoints, not commits
- The write log is authoritative above the checkpoint
- Recovery = replay log on top of checkpoint
This is not a traditional WAL in the database sense — it’s closer to:
- Kafka log + snapshot
- LSM-tree + manifest
- Git packfiles + refs
And that’s perfectly aligned with your system.
2. Reframing snapshots (important nuance)
Snapshot semantics (revised)
A snapshot represents:
- A stable base state
- A replay anchor
- A determinism boundary
It does not necessarily represent:
- The latest committed state
- The full index
- All visible artifacts
So:
CURRENT = checkpoint snapshot + replayed log
That distinction must be explicit everywhere.
3. What lives in the append-only log
The append-only log is now a first-class system component, not just an optimization.
The log records:
- ASL block seals
- CAS index segment creation
- Tombstones (if any)
- PEL receipts
- Registry mutations
- Provenance updates
Everything that changes system-visible state is logged.
4. ASL behavior under checkpoint + log
4.1 During normal operation
- ASL writes open blocks
- When a block is sealed:
  - The block becomes immutable
  - A seal record is appended to the log
- CAS may reference the block only after the seal record exists
This guarantees:
- Crash safety
- Replayability
- Ordering
4.2 On checkpoint creation
When a checkpoint is taken:
- All prior log entries are considered “below the line”
- The snapshot captures:
  - All sealed blocks
  - All index segments
  - All metadata up to that point
- The log continues without reset
You do not truncate the log eagerly.
4.3 On recovery
Recovery procedure:
- Mount the latest checkpoint snapshot
- Read the append-only log from the checkpoint position
- Replay entries in order:
  - Re-register sealed blocks
  - Re-attach index segments
  - Rebuild the CURRENT view
- Resume normal operation
This is deterministic and bounded.
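The replay loop can be sketched as follows; the record types and the state struct are hypothetical stand-ins for the real block and segment tables, but the shape is the point: start at the checkpoint position, apply records in order, and let the replay position define CURRENT:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical log record kinds. */
typedef enum {
    LOG_BLOCK_SEAL = 1,
    LOG_SEGMENT_ATTACH = 2
} LogType;

typedef struct { LogType type; uint64_t id; } LogRecord;

/* Counters stand in for real registration tables. */
typedef struct {
    uint64_t sealed_blocks;
    uint64_t attached_segments;
    uint64_t replay_pos;   /* CURRENT = checkpoint + this prefix */
} State;

static void replay(State *s, const LogRecord *log, size_t n,
                   size_t checkpoint_pos) {
    for (size_t i = checkpoint_pos; i < n; ++i) {
        switch (log[i].type) {
        case LOG_BLOCK_SEAL:     s->sealed_blocks++;     break;
        case LOG_SEGMENT_ATTACH: s->attached_segments++; break;
        }
        s->replay_pos = i + 1;  /* monotone: defines visibility */
    }
}
```

Idempotency in the real system comes from keyed registration (re-registering an already-known block id is a no-op), which is what makes crash-during-replay safe.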
5. Why this is better than “snapshot = truth”
This model gives you:
- Faster checkpoints
- Less snapshot churn
- Fewer ZFS snapshots
- Clear crash recovery semantics
- Ability to validate or audit replay
It also aligns perfectly with your PEL determinism guarantees.
6. Updating the ASL spec (drop-in text)
Here is a clean spec addition you can insert.
8. Persistence, Checkpoints, and Recovery (Revised)
8.1 Checkpoints
A checkpoint is a ZFS snapshot that captures a consistent base state of the system.
A checkpoint:
- Is immutable
- Serves as a replay anchor
- Does not necessarily reflect the latest system state
Checkpoints are created periodically or by policy.
8.2 Append-Only Write Log
The system maintains an append-only write log that records all state mutations occurring after the most recent checkpoint.
The write log includes, but is not limited to:
- ASL block seal records
- Index segment creation records
- Tombstone records
- PEL receipt registrations
- Provenance and registry updates
The log is strictly ordered and durable.
8.3 System State Definition
The current system state (CURRENT) is defined as:
CURRENT = checkpoint_state + replay(write_log)
No component may assume that the checkpoint alone represents CURRENT.
8.4 Recovery Procedure
On startup or failure recovery:
- The most recent checkpoint snapshot is mounted
- The append-only write log is read from the checkpoint position
- All log entries are replayed deterministically
- The system transitions to CURRENT
- Normal operation resumes
This process must be idempotent and deterministic.
8.5 Interaction with ASL Blocks
- ASL blocks become visible only after a corresponding log entry exists
- Blocks written but not sealed at crash time may be discarded
- Leaked block IDs are permitted
- No sealed block may be modified or removed during replay
7. Invariants this introduces (important to write down)
You should explicitly state these invariants:
- Nothing is visible without a log record
- Checkpoints are sufficient but not complete
- Replay is the source of truth above the checkpoint
- Sealed blocks are immutable across replay
- Replay order defines CURRENT
These invariants keep every layer honest.
8. How this interacts with PEL determinism
This model actually strengthens determinism:
- PEL executions reference snapshot + log position
- Receipts are replayed, not re-executed
- Determinism is guaranteed per checkpoint + log prefix
You now have a very strong story for:
- Auditing
- Federation
- Certification
9. Final sanity check
With this clarification, your system is:
- Snapshot-based
- Log-driven
- Immutable-first
- Deterministic
- Replayable
- Auditable
This is a very robust design.
If you want next, we can:
- Specify the log record formats
- Define idempotency rules for replay
- Design log compaction vs checkpoints
- Stress-test crash scenarios
- Integrate this with the index segment lifecycle
Just tell me where to zoom in.
Refine into three documents: ASL-CORE-INDEX (index addendum to the ASL-CORE semantic model), ASL-STORE-INDEX (index add-on for store semantics and contracts), and ENC-ASL-CORE-INDEX (with the exact encoding). If this makes sense to you.
Yes — this makes excellent sense, and the split you propose is exactly the right one. You’re separating:
- What the index means
- How the store must behave to support it
- How it is encoded on disk
That separation will save you from circular dependencies later.
Below I’ll refine the model into three documents, with:
- Clear scope
- Explicit contracts
- Sharp non-goals
- Consistent terminology
You should be able to lift these almost verbatim into a spec repo.
1. ASL-CORE-INDEX
Index Addendum to ASL-CORE Semantic Model
1.1 Purpose
This document defines the semantic meaning of the ASL index, independent of:
- Physical storage layout
- Encoding format
- ZFS specifics
- Performance optimizations
It extends ASL-CORE by specifying what it means to resolve an artifact via the index.
1.2 Scope
This document defines:
- Index concepts and semantics
- Visibility rules
- Snapshot and log interaction
- Required invariants
This document does not define:
- Disk formats
- Bloom filters
- Sharding
- Block placement
- Performance characteristics
1.3 Core Semantic Mapping
The ASL index defines a total mapping:
ArtifactKey → ArtifactLocation
Where:
ArtifactKey := hash_id (opaque, algorithm-scoped)
ArtifactLocation := (block_id, offset, length)
Semantic guarantees
- The mapping is immutable once visible
- A mapping is snapshot-relative
- A mapping is deterministic given (snapshot, log prefix)
1.4 Visibility Model
An index entry becomes visible iff:
- The referenced ASL block is sealed
- A corresponding log record exists
- The log record is ≤ CURRENT replay position
No other condition may make an entry visible.
1.5 Snapshot and Log Semantics
- Snapshots are checkpoints, not full state
- The index state at any time is:
Index(CURRENT) = Index(snapshot) + replay(log)
- Index entries may exist:
  - Below the snapshot (materialized)
  - Above the snapshot (log-derived)
The semantic meaning is identical.
1.6 Immutability and Shadowing
- Index entries are never mutated
- Later entries may shadow earlier entries
- Shadowing is resolved by:
  - Log order
  - Snapshot boundary
This applies equally to:
- Data entries
- Tombstones (if present)
1.7 Non-Goals (Explicit)
ASL-CORE-INDEX does not define:
- How lookups are accelerated
- How memory is managed
- How blocks are grouped
- How many index entries exist
Those belong to store and encoding layers.
2. ASL-STORE-INDEX
Index Add-On for Store Semantics and Contracts
2.1 Purpose
This document defines store-level responsibilities and guarantees required to realize the ASL-CORE-INDEX semantics efficiently and safely.
It binds the semantic model to:
- Block lifecycle
- Log ordering
- Snapshot interaction
- Recovery behavior
2.2 Store Responsibilities
The ASL store must provide:
- Stable ASL block resolution
- Append-only index materialization
- Deterministic replay
- Snapshot-safe visibility
2.3 Index Segments
The store organizes index entries into segments.
Segment properties:
- Append-only while open
- Immutable once sealed
- Snapshot-visible only after seal
- Replay-attached via log records
Segments are the unit of:
- Persistence
- Replay
- GC eligibility
2.4 Index Lifecycle
Creation
- Index entries are appended to an open segment
- No entry is visible yet
Seal
- Segment is sealed
- Seal is recorded in the append-only log
- Segment becomes visible for lookup
Snapshot
- Snapshot may capture sealed segments
- Open segments are not required to survive snapshot
2.5 Lookup Contract
To resolve a key, the store must:
- Identify all visible segments ≤ CURRENT
- Search segments in reverse creation order
- Return the first matching entry
- Respect tombstones if present
The store may:
- Use bloom filters
- Use sharding
- Use SIMD scans
- Use memory mapping
But correctness must not depend on these.
2.6 Crash Safety
The store guarantees:
- Sealed segments are never corrupted
- Open segments may be lost
- Leaked block IDs and segment IDs are acceptable
- Replay is idempotent
2.7 GC Interaction
The store may delete index segments only if:
- They are unreachable from all retained snapshots
- They are no longer referenced by replay state
GC operates at segment granularity, never entry granularity.
2.8 Non-Goals
ASL-STORE-INDEX does not define:
- On-disk layout
- Exact record sizes
- Bloom filter math
- Directory structures
Those belong to encoding.
3. ENC-ASL-CORE-INDEX
Encoding Specification for ASL Core Index
3.1 Purpose
This document defines the exact bytes-on-disk encoding of index structures required to implement ASL-STORE-INDEX.
This encoding is:
- Immutable
- mmap-friendly
- Language-agnostic
- Stable across versions
3.2 Encoding Principles
- Little-endian
- Fixed-width fields
- No pointers
- No padding ambiguity
- Forward-compatible via versioning
3.3 Segment File Layout
[ SegmentHeader ]
[ BloomFilter ] (optional, opaque to semantics)
[ IndexRecord[] ]
[ SegmentFooter ]
All offsets are file-relative.
3.4 SegmentHeader (Exact)
#pragma pack(push, 1)
typedef struct {
uint64_t magic;
uint16_t version;
uint16_t shard_id;
uint32_t header_size;
uint64_t snapshot_min;
uint64_t snapshot_max;
uint64_t record_count;
uint64_t records_offset;
uint64_t bloom_offset;
uint64_t bloom_size;
uint64_t flags;
} SegmentHeader;
#pragma pack(pop)
3.5 IndexRecord (Exact)
#pragma pack(push, 1)
typedef struct {
uint64_t hash_hi;
uint64_t hash_mid;
uint64_t hash_lo;
uint32_t hash_tail;
uint64_t block_id;
uint32_t offset;
uint32_t length;
uint32_t flags;
uint32_t reserved;
} IndexRecord;
#pragma pack(pop)
Semantic meaning of fields is defined exclusively in ASL-CORE-INDEX.
3.6 SegmentFooter
#pragma pack(push, 1)
typedef struct {
uint64_t crc64;
uint64_t seal_snapshot;
uint64_t seal_time_ns;
} SegmentFooter;
#pragma pack(pop)
3.7 Bloom Filter Encoding
Bloom filter layout is opaque to ASL-CORE-INDEX semantics.
ENC-ASL-CORE-INDEX requires only:
- Deterministic behavior
- Immutable representation
- Segment-local scope
3.8 Compatibility Rules
- New fields may be added only via version bump
- Existing fields must never change meaning
- Readers must reject unknown incompatible versions
4. How the Three Documents Fit Together
| Layer | Responsibility |
|---|---|
| ASL-CORE-INDEX | Meaning and invariants |
| ASL-STORE-INDEX | Store behavior and contracts |
| ENC-ASL-CORE-INDEX | Bytes on disk |
No document leaks concerns upward or downward.
5. Final Sanity Check
This split achieves:
- Semantic clarity
- Implementation freedom
- Encoding stability
- Long-term evolvability
- Federation friendliness
This is a very strong architecture/spec boundary.
If you want next, we can:
- Add formal invariants to ASL-CORE-INDEX
- Add state machines to ASL-STORE-INDEX
- Add versioning and migration rules
- Produce actual C headers + static asserts
- Walk through replay step-by-step
Just tell me which document you want to deepen next.
Refine ASL-CORE-INDEX. Output full document.
Below is a refined, self-contained, full specification for ASL-CORE-INDEX. This document is intentionally semantic, not operational or encoded. It should remain stable even if storage, indexing strategies, or encodings evolve.
ASL-CORE-INDEX
Semantic Addendum to ASL-CORE
1. Purpose
This document defines the semantic model of the ASL index.
The ASL index provides a deterministic, snapshot-relative mapping from artifact identity to artifact location within ASL-managed storage blocks.
This specification answers what the index means, not:
- how it is implemented,
- how it is stored,
- how it is accelerated,
- or how it is encoded on disk.
Those concerns are addressed by:
- ASL-STORE-INDEX (store contracts and lifecycle)
- ENC-ASL-CORE-INDEX (exact encoding)
2. Scope
This document defines:
- The logical structure of index entries
- Visibility rules
- Snapshot and log interaction
- Immutability and shadowing semantics
- Determinism guarantees
- Required invariants
This document explicitly does not define:
- On-disk formats
- Index segmentation or sharding
- Bloom filters or probabilistic structures
- Memory residency requirements
- Performance characteristics
- Garbage-collection mechanisms
3. Terminology
Artifact
An immutable sequence of bytes managed by ASL and referenced by higher layers.
ArtifactKey
An opaque identifier for an artifact, typically a hash identifier scoped by algorithm and configuration.
ASL Block
An immutable, independently addressable unit of storage managed by ASL.
ArtifactLocation
A tuple identifying where an artifact’s bytes reside:
(block_id, offset, length)
Snapshot
A checkpoint capturing a consistent base state of ASL-managed storage and metadata.
Append-Only Log
A strictly ordered log recording all index-visible mutations occurring after a snapshot.
CURRENT
The effective system state obtained by replaying the append-only log on top of a checkpoint snapshot.
4. Core Semantic Mapping
The ASL index defines a total, deterministic mapping:
ArtifactKey → ArtifactLocation
Semantic guarantees
- For any visible `ArtifactKey`, exactly one `ArtifactLocation` is resolved.
- The mapping is immutable once visible.
- The mapping is snapshot-relative.
- The mapping is deterministic for a given `(snapshot, log prefix)`.
5. ArtifactLocation Semantics
An ArtifactLocation identifies bytes, not meaning.
Properties
- `block_id` is an opaque ASL Block ID
- `offset` and `length` are byte offsets within the block
- The tuple is valid only for the lifetime of the referenced block
- ASL guarantees that the referenced bytes are immutable
No additional structure or interpretation is implied.
6. Visibility Model
An index entry is considered visible if and only if all of the following hold:
- The referenced ASL block is sealed
- A corresponding log record exists
- The log record is ≤ the CURRENT replay position
Consequences
- Entries written but not logged are invisible
- Entries logged but referencing unsealed blocks are invalid
- Entries above the CURRENT replay position are invisible
Visibility is binary, not gradual.
7. Snapshot and Log Semantics
7.1 Checkpoints
Snapshots act as checkpoints, not complete state representations.
A snapshot:
- Captures a consistent base state
- Serves as a replay anchor
- Does not necessarily include the latest index state
7.2 Log Replay
The effective index state is defined as:
Index(CURRENT) = Index(snapshot) + replay(log)
Where replay is:
- Strictly ordered
- Deterministic
- Idempotent
Snapshot and log entries are semantically equivalent once replayed.
8. Immutability and Shadowing
8.1 Immutability
- Index entries are never mutated
- Once visible, an entry’s meaning never changes
- Blocks referenced by entries are immutable
8.2 Shadowing
Later index entries may shadow earlier entries with the same ArtifactKey.
Shadowing rules:
- Later entries take precedence
- Precedence is defined by log order
- Snapshot boundaries do not change precedence rules
Shadowing is the only way mappings evolve.
9. Tombstones (Optional Semantic Concept)
An implementation may support tombstone entries.
A tombstone:
- Is an index entry with no `ArtifactLocation`
- Indicates intentional invalidation
This specification:
- Defines the semantic meaning of tombstones
- Does not require their existence
- Does not define how they are encoded or stored
10. Determinism Guarantees
For any fixed:
- Snapshot
- Log prefix
- ASL configuration
- Hash algorithm set
The ASL index guarantees:
- Deterministic lookup results
- Deterministic shadowing resolution
- Deterministic visibility
No nondeterministic input (time, IO ordering, concurrency) may affect index semantics.
11. Separation of Concerns
The ASL index is purely declarative.
It declares:
- What artifacts exist
- Where their bytes reside
It does not:
- Decide how blocks are allocated
- Decide how artifacts are packed
- Decide how data is cached
- Decide how lookups are accelerated
Those responsibilities belong to lower layers.
12. Invariants (Normative)
The following invariants MUST hold in all conforming implementations:
- No visibility without a log record
- No mutation of visible entries
- No mutation of sealed blocks
- Shadowing is strictly monotonic
- Snapshot + log uniquely defines index state
- ArtifactLocation always resolves to immutable bytes
Violating any invariant constitutes index corruption.
13. Non-Goals (Explicit)
ASL-CORE-INDEX does not define:
- Index data structures
- Segment layout
- Block placement heuristics
- Size classes
- ZFS properties
- Garbage collection policy
- Federation mechanics
These are intentionally excluded to preserve semantic stability.
14. Relationship to Other Specifications
- ASL-CORE: Defines artifact and block semantics. ASL-CORE-INDEX extends it with lookup meaning.
- ASL-STORE-INDEX: Defines how a store must behave to realize these semantics.
- ENC-ASL-CORE-INDEX: Defines how index entries are encoded on disk.
No document may contradict this one.
15. Summary (Normative)
The ASL index is:
- Snapshot-relative
- Log-replay-defined
- Append-only
- Immutable
- Deterministic
- Semantically minimal
It exists to answer exactly one question:
Given an artifact identity and a point in time, where are the bytes?
Nothing more, nothing less.
If you want, next we can:
- Formalize this as axioms + derived properties
- Add a state-transition model
- Cross-check it against PEL receipts
- Refine tombstone semantics
- Or proceed to refine ASL-STORE-INDEX with equal rigor
You say "ASL-CORE Defines artifact and block semantics.". At the moment ASL-CORE does not define block semantics. It is purely concerned with artifact semantics. Maybe it should be concerned with blocks? Maybe blocks are index concerns?
This is a very good catch, and it goes to the heart of whether your layering stays clean long-term.
Short answer up front:
Blocks should exist in ASL-CORE, but only as an existential storage abstraction, not as a semantic one. Index concerns refer to blocks, but must not define them.
Let me unpack that carefully and reconcile the documents.
1. Why this question matters
Right now you have:
- ASL-CORE → artifact semantics only
- ASL-CORE-INDEX → introduces `(block_id, offset, length)`
The tension is obvious:
- The index talks about blocks
- ASL-CORE does not define blocks
- So where does `block_id` actually come from conceptually?
If blocks are only an index concern, you end up with:
- Index defining storage abstractions (wrong direction)
- CAS leaking storage semantics upward
- ASL becoming underspecified
So your instinct is correct to question this.
2. The correct resolution (clean layering)
Key principle
ASL-CORE should define that blocks exist — but not how they are structured, allocated, or used.
Think of blocks as storage atoms, not semantic units.
3. What ASL-CORE should define about blocks (minimal)
ASL-CORE should define exactly four things about blocks, and no more.
3.1 Existence
ASL stores artifact bytes inside blocks.
A block is:
- A unit of immutability
- A unit of addressability
- A unit of snapshot pinning
Nothing else.
3.2 Block identity
ASL-CORE should introduce:
BlockID := opaque, immutable identifier
Properties:
- Unique within an ASL instance
- Stable across snapshots
- Never reused
- Opaque to all layers above ASL
No layout, no encoding, no hierarchy.
3.3 Block immutability
ASL-CORE must state:
Once a block is sealed, its contents are immutable forever.
This is foundational — index semantics depend on it, but should not define it.
3.4 Block addressability
ASL-CORE should define the only legal addressing primitive:
(BlockID, offset, length) → bytes
This makes blocks part of the artifact storage contract, not an index invention.
4. What ASL-CORE should not define about blocks
ASL-CORE must not define:
- Block size
- How many artifacts per block
- How blocks are allocated
- How blocks are stored
- Whether blocks are files, objects, extents
- Any notion of “small” vs “big”
- Placement or locality
Those are store concerns, not core semantics.
5. Where blocks should not live
❌ Blocks should not be defined in ASL-CORE-INDEX
Because:
- Index semantics depend on blocks
- They must not introduce them
- Otherwise the index becomes a storage spec
ASL-CORE-INDEX should reference BlockID, not define it.
❌ Blocks should not be defined in ENC-ASL-CORE-INDEX
Because:
- Encoding is downstream
- You want freedom to change block representation
- Encodings must not introduce core concepts
6. Revised responsibility split (cleaned up)
Here is the corrected responsibility map:
| Document | Responsibility |
|---|---|
| ASL-CORE | Artifact semantics + existence of blocks as immutable storage atoms |
| ASL-CORE-INDEX | Semantic mapping from artifact identity → block location |
| ASL-STORE-INDEX | How blocks and index entries are created, ordered, replayed |
| ENC-ASL-CORE-INDEX | Exact bytes-on-disk encoding |
Blocks belong below the index but above the store.
7. Minimal addition to ASL-CORE (suggested text)
You do not need to rewrite ASL-CORE — just add a small section.
Suggested ASL-CORE addition
Blocks
ASL stores artifact bytes within immutable storage units called blocks.
A block is identified by an opaque BlockID.
Blocks have the following properties:
- A block contains one or more artifacts
- A block is immutable once sealed
- A block is independently addressable
- A block is snapshot-safe
ASL exposes the following addressing primitive:
(BlockID, offset, length) → bytes

ASL-CORE does not define block layout, allocation strategy, size, or storage representation.
That’s it. No more.
8. Why this keeps your architecture sound
With this change:
- ASL-CORE defines what storage exists
- ASL-CORE-INDEX defines how artifacts are located
- Store and encoding layers remain free to evolve
- PEL and provenance have a stable substrate
- No semantic leakage upward or downward
You preserve your original design intent:
Artifacts have meaning. Blocks do not. Blocks only exist to hold bytes.
9. Final answer to your question (direct)
“Maybe ASL-CORE should be concerned with blocks?”
Yes — minimally and existentially.
“Maybe blocks are index concerns?”
No — the index depends on blocks, but must not define them.
This refinement strengthens the architecture rather than complicating it.
If you want, next we can:
- Patch ASL-CORE formally
- Reconcile terminology across all three docs
- Ensure PEL and provenance never depend on block semantics
- Or refine ASL-STORE-INDEX now that this boundary is crisp