amduat-api/notes/on-disk layout.md
2026-01-17 00:19:49 +01:00


Good, this is the right moment to pin this down, because on-disk shape is where abstraction leaks if we're sloppy.

I'll answer in layers:

  1. What ASL actually requires on disk (minimal contract)
  2. A concrete, sane on-disk layout that fits what you already have
  3. How shards, blocks, artifacts, logs, and snapshots map
  4. How this scales from “single laptop” to “Common”
  5. Why this does not over-spec ASL-CORE

I'll stay very explicit about what is normative vs conventional.


1. What ASL actually requires on disk (the hard minimum)

ASL itself does not care about directories.

ASL-STORE / ASL-STORE-INDEX require only that the store can provide:

  1. Immutable blocks
  2. An append-only log
  3. Sealed snapshots
  4. Deterministic replay
  5. CAS addressing

So the only hard requirements are:

  • Blocks are immutable after seal
  • Index entries point to (block_id, offset, length)
  • Logs are append-only and ordered
  • Snapshots pin a prefix of state

Everything else is an implementation choice.

That means: 👉 your current “store root with shards” is already compatible.
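The minimal contract above can be written down as an interface sketch. This is illustrative only: the method names (`put_block`, `read`, `append_log`, `snapshot`) are hypothetical, not taken from ASL-STORE itself.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AslStore(Protocol):
    """Sketch of the ASL-STORE hard minimum; names are hypothetical."""

    def put_block(self, payload: bytes) -> str:
        """Seal bytes into an immutable block; return its CAS block_id."""
        ...

    def read(self, block_id: str, offset: int, length: int) -> bytes:
        """Read a slice of a sealed block (what index entries point at)."""
        ...

    def append_log(self, record: bytes) -> int:
        """Append-only and strictly ordered; return the sequence number."""
        ...

    def snapshot(self) -> str:
        """Seal a snapshot pinning the current prefix of state."""
        ...
```

Any backend that satisfies these four operations, whatever its directory shape, is a valid store.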


2. A concrete, sane on-disk layout

Here is a layout that is:

  • compatible with your current shard idea
  • works on POSIX
  • maps cleanly to ZFS later
  • easy to reason about during rescue / forensics

Store root (per domain)

/asl/
└── domains/
    └── <domain-id>/
        ├── meta/
        ├── blocks/
        ├── index/
        ├── log/
        ├── snapshots/
        └── tmp/

Everything below is domain-local.


3. Blocks (the real storage substrate)

3.1 Block storage (immutable)

blocks/
├── open/
│   └── blk_<uuid>.tmp
└── sealed/
    ├── 00/
    │   └── <blockid>.blk
    ├── 01/
    │   └── <blockid>.blk
    └── ff/
        └── <blockid>.blk

  • blockid = CAS hash
  • Sharded by prefix (first byte or two)
  • Open blocks are never visible
  • Sealed blocks are immutable

This directly matches your block + offset mental model.

Important: artifacts do not live as files. They live inside blocks.
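The open → sealed lifecycle is small enough to sketch. This assumes SHA-256 as the CAS hash and a two-hex-character shard prefix; both are conventions, not requirements (a production store would also fsync the directories):

```python
import hashlib
import os
import tempfile


def seal_block(store_root: str, payload: bytes) -> str:
    """Seal an open block: hash its bytes (CAS), then atomically
    move it from blocks/open/ into blocks/sealed/<prefix>/."""
    open_dir = os.path.join(store_root, "blocks", "open")
    os.makedirs(open_dir, exist_ok=True)

    # Write into blocks/open/ first; open blocks are never visible.
    fd, tmp_path = tempfile.mkstemp(prefix="blk_", suffix=".tmp", dir=open_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

    block_id = hashlib.sha256(payload).hexdigest()  # CAS address

    sealed_dir = os.path.join(store_root, "blocks", "sealed", block_id[:2])
    os.makedirs(sealed_dir, exist_ok=True)
    final_path = os.path.join(sealed_dir, f"{block_id}.blk")

    # rename() is atomic on POSIX within one filesystem, so a block
    # either appears fully sealed or not at all.
    os.rename(tmp_path, final_path)
    return block_id
```

After the rename the block is immutable by convention: nothing ever opens a sealed `.blk` for writing.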


4. Index (where artifacts become visible)

Your shard-based approach fits perfectly here.

index/
├── shard-000/
│   ├── segment-0001.idx
│   ├── segment-0002.idx
│   └── bloom.bin
├── shard-001/
│   └── ...
└── shard-fff/

Each segment-XXXX.idx is:

  • append-only while open
  • immutable once sealed
  • referenced by the log

Index records point to:

ArtifactKey → (BlockID, offset, length)

This is exactly ASL-STORE-INDEX.
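The record shape and the shard routing can be sketched in a few lines. The sharding scheme here (top 12 bits of a SHA-256 of the key, giving `shard-000` through `shard-fff`) is an assumption chosen to match the naming above, not something ASL-STORE-INDEX mandates:

```python
import hashlib
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One index record: where an artifact's bytes live inside a block."""
    block_id: str
    offset: int
    length: int


def shard_of(artifact_key: str, shard_bits: int = 12) -> str:
    """Route a key to one of 2**shard_bits shards (hypothetical scheme
    matching the shard-000 .. shard-fff naming above)."""
    h = hashlib.sha256(artifact_key.encode("utf-8")).digest()
    shard = int.from_bytes(h[:2], "big") >> (16 - shard_bits)
    return f"shard-{shard:03x}"


def read_artifact(block_path: str, entry: IndexEntry) -> bytes:
    """Resolve an index entry: seek into the sealed block, read the slice."""
    with open(block_path, "rb") as f:
        f.seek(entry.offset)
        return f.read(entry.length)
```

Note that the artifact itself is never a file: `read_artifact` only ever touches a sealed block.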


5. Append-only log (the spine of truth)

log/
├── log-0000000000000000.asl
├── log-0000000000001000.asl
└── current

Log records include:

  • index additions
  • tombstones
  • segment seals
  • DAM updates
  • witness rotation artifacts

Rules:

  • Logs are strictly ordered
  • Never rewritten
  • Replayable from snapshot
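An append path obeying those rules could look like this. The JSON-lines encoding, the 4096-records-per-segment split, and using `current` to record the last sequence number are all illustrative choices (the layout above may instead use `current` as a pointer to the active segment); only "append, never rewrite" is the rule:

```python
import json
import os


RECORDS_PER_SEGMENT = 4096  # illustrative; gives log-...0000, log-...1000 (hex)


def append_record(log_dir: str, record: dict) -> int:
    """Append one record to the active log segment; return its sequence
    number. Segments are only ever opened in append mode."""
    os.makedirs(log_dir, exist_ok=True)

    seq_path = os.path.join(log_dir, "current")
    seq = 0
    if os.path.exists(seq_path):
        with open(seq_path) as f:
            seq = int(f.read()) + 1

    base = (seq // RECORDS_PER_SEGMENT) * RECORDS_PER_SEGMENT
    segment = os.path.join(log_dir, f"log-{base:016x}.asl")

    line = json.dumps({"seq": seq, **record}, sort_keys=True)
    with open(segment, "a", encoding="utf-8") as f:  # append-only
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())

    with open(seq_path, "w") as f:  # remember the last sequence number
        f.write(str(seq))
    return seq
```

Replay is then trivial: read segments in filename order, records in line order.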

6. Snapshots (checkpoints, not magic)

snapshots/
├── snapshot-00000123/
│   ├── manifest.yaml
│   ├── pinned-segments.txt
│   └── pinned-blocks.txt
└── snapshot-00000124/

A snapshot is:

  • a commit point
  • a pin-set (segments + blocks)
  • a replay anchor

ZFS snapshots fit perfectly here but are not required.
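Materializing a snapshot is just writing the pin-set down. The manifest fields below are placeholders, not a defined ASL schema:

```python
import os
import time
from typing import List


def write_snapshot(store_root: str, snap_id: int,
                   segments: List[str], blocks: List[str]) -> str:
    """Create a snapshot directory: a pin-set of sealed segments and
    blocks, plus a small manifest (illustrative fields only)."""
    snap_dir = os.path.join(store_root, "snapshots", f"snapshot-{snap_id:08d}")
    os.makedirs(snap_dir)

    with open(os.path.join(snap_dir, "pinned-segments.txt"), "w") as f:
        f.write("\n".join(segments) + "\n")
    with open(os.path.join(snap_dir, "pinned-blocks.txt"), "w") as f:
        f.write("\n".join(blocks) + "\n")

    # Minimal manifest: enough to anchor replay, nothing more.
    with open(os.path.join(snap_dir, "manifest.yaml"), "w") as f:
        f.write(f"snapshot: {snap_id}\n")
        f.write(f"created: {int(time.time())}\n")
        f.write(f"segments: {len(segments)}\n")
        f.write(f"blocks: {len(blocks)}\n")
    return snap_dir
```

Everything a snapshot pins is already immutable, which is why a ZFS snapshot of the same directories is a drop-in accelerator rather than a requirement.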


7. Meta (authority, policy, identity)

meta/
├── dam/
│   └── dam-current.art
├── policy/
│   └── policy-current.art
├── keys/
│   ├── pub/
│   └── priv/
└── host/
    └── host-id

Important rule:

Meta artifacts are still artifacts.

They must:

  • be sealed
  • be indexed
  • appear in the log

The files here are cached materializations.


8. Tmp (explicitly unsafe)

tmp/
├── ingest/
├── scratch/
└── pty/

This is where:

  • asl-capture writes streams
  • open blocks live
  • crashes are allowed

Nothing here is trusted.


9. How shards fit (your current design)

You said:

“I have a store root with shards with artifacts in them.”

That maps cleanly if you reinterpret it as:

  • shards own index segments
  • artifacts are logical entries, not files
  • shard locality = lookup acceleration

So instead of:

shard/
└── artifact-file

You have:

shard/
└── index segments → blocks

This is the key mental shift you already sensed.


10. Common vs Personal vs Group (on disk)

They look identical on disk.

Differences live in:

  • DAM
  • policy
  • witness artifacts

Common

  • quorum witness
  • stricter validation
  • often read-only on most hosts

Personal

  • self-authority
  • multiple active writers allowed
  • same block/index/log structure

Group

  • hybrid policies

No special filesystem layout needed.


11. Why this is the right abstraction boundary

You said:

“I used to look at the data as artifacts. Now it seems I'm better off looking at data as blocks with artifacts in them.”

That's exactly right.

Artifacts are semantic. Blocks are physical.

This layout lets you:

  • reason in artifacts
  • store in blocks
  • snapshot blocks
  • index artifacts
  • federate semantics

without mixing concerns.


12. One sentence to lock it in

Artifacts are indexed meanings; blocks are frozen bytes; snapshots pin the relationship.


13. What you can safely implement now

Even before ASL-CORE / PEL are finished, you can implement:

  • block store
  • block sealing
  • index segments
  • append-only log
  • snapshot pinning
  • DAM storage and validation

All of that is infrastructure, not policy.


If you want next, I strongly suggest one of these:

  • walk through one artifact write end-to-end on disk
  • show how asl-capture writes into blocks
  • show how a personal domain syncs across two machines
  • map this layout directly onto ZFS datasets

You're no longer at risk of mixing projects — you're converging.