amduat-api/notes/on-disk layout.md


Good, this is the right moment to pin this down, because **on-disk shape is where abstraction leaks if we're sloppy**.
I'll answer in layers:
1. **What ASL actually requires on disk (minimal contract)**
2. **A concrete, sane on-disk layout that fits what you already have**
3. **How shards, blocks, artifacts, logs, and snapshots map**
4. **How this scales from “single laptop” to “Common”**
5. **Why this does *not* over-spec ASL-CORE**
I'll stay very explicit about what is *normative* vs *conventional*.
---
## 1. What ASL actually requires on disk (the hard minimum)
ASL itself does **not** care about directories.
ASL-STORE / ASL-STORE-INDEX require only that the store can provide:
1. **Immutable blocks**
2. **An append-only log**
3. **Sealed snapshots**
4. **Deterministic replay**
5. **CAS addressing**
So the **only hard requirements** are:
* Blocks are immutable after seal
* Index entries point to `(block_id, offset, length)`
* Logs are append-only and ordered
* Snapshots pin a prefix of state
Everything else is an implementation choice.
That means:
👉 your current “store root with shards” is already compatible.
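The hard minimum above can be written down as an interface. A minimal sketch in Python; the method names and signatures are illustrative assumptions, not part of any ASL spec:

```python
from typing import Iterator, Protocol


class BlockStore(Protocol):
    """The minimal store contract sketched above; names are illustrative."""

    def seal_block(self, payload: bytes) -> str:
        """Freeze bytes into an immutable block; return its CAS block id."""
        ...

    def read(self, block_id: str, offset: int, length: int) -> bytes:
        """Random access into a sealed block: what index entries point at."""
        ...

    def append_log(self, record: bytes) -> int:
        """Append-only and strictly ordered; return the record's sequence number."""
        ...

    def replay(self, from_seq: int) -> Iterator[bytes]:
        """Deterministic replay from a snapshot's anchor sequence onward."""
        ...
```

Any backend that satisfies this (flat files, ZFS datasets, an object store) is a valid ASL store; directories are a convenience, not a contract.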
---
## 2. Recommended on-disk layout (practical, explicit)
Here is a layout that is:
* compatible with your current shard idea
* works on POSIX
* maps cleanly to ZFS later
* easy to reason about during rescue / forensics
### Store root (per domain)
```
/asl/
└── domains/
    └── <domain-id>/
        ├── meta/
        ├── blocks/
        ├── index/
        ├── log/
        ├── snapshots/
        └── tmp/
```
Everything below is **domain-local**.
---
## 3. Blocks (the real storage substrate)
### 3.1 Block storage (immutable)
```
blocks/
├── open/
│   └── blk_<uuid>.tmp
└── sealed/
    ├── 00/
    │   └── <blockid>.blk
    ├── 01/
    │   └── <blockid>.blk
    └── ff/
        └── <blockid>.blk
```
* `blockid` = CAS hash
* Sharded by prefix (first byte or two)
* Open blocks are **never visible**
* Sealed blocks are immutable
This directly matches your **block + offset** mental model.
> Important: **artifacts do not live as files**
> They live *inside blocks*.
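The open → sealed lifecycle can be sketched as an atomic-rename dance. A minimal sketch, assuming `blake2b` as the CAS hash and POSIX rename semantics; none of these choices is mandated by ASL:

```python
import hashlib
import os
import tempfile


def seal_block(root: str, payload: bytes) -> str:
    """Write payload under open/, hash it, then atomically move it to sealed/.

    The block id is the CAS hash of the bytes; the sealed path is sharded
    by the first two hex characters of that hash, as in the tree above.
    """
    open_dir = os.path.join(root, "open")
    os.makedirs(open_dir, exist_ok=True)

    # Open blocks are never visible under sealed/ until the rename below.
    fd, tmp_path = tempfile.mkstemp(dir=open_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

    block_id = hashlib.blake2b(payload, digest_size=16).hexdigest()
    shard_dir = os.path.join(root, "sealed", block_id[:2])
    os.makedirs(shard_dir, exist_ok=True)
    final_path = os.path.join(shard_dir, block_id + ".blk")

    # rename() is atomic on POSIX: readers see either nothing or the
    # complete, immutable block, never a half-written file.
    os.rename(tmp_path, final_path)
    return block_id
```

Because the name is the content hash, re-sealing identical bytes lands on the same path: deduplication falls out of CAS for free.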
---
## 4. Index (where artifacts become visible)
Your shard-based approach fits perfectly here.
```
index/
├── shard-000/
│   ├── segment-0001.idx
│   ├── segment-0002.idx
│   └── bloom.bin
├── shard-001/
│   └── ...
└── shard-fff/
```
Each `segment-XXXX.idx` is:
* append-only while open
* immutable once sealed
* referenced by the log
Index records point to:
```
ArtifactKey → (BlockID, offset, length)
```
This is exactly ASL-STORE-INDEX.
---
## 5. Append-only log (the spine of truth)
```
log/
├── log-0000000000000000.asl
├── log-0000000000001000.asl
└── current
```
Log records include:
* index additions
* tombstones
* segment seals
* DAM updates
* witness rotation artifacts
Rules:
* Logs are strictly ordered
* Never rewritten
* Replayable from snapshot
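Those three rules fit in a few lines. An in-memory sketch; here a list stands in for the `log-*.asl` files, and the record shape is an assumption:

```python
class AppendOnlyLog:
    """Strictly ordered, never rewritten, replayable: the spine of truth."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, kind: str, body: dict) -> int:
        """Append a record; its sequence number is its position, forever."""
        seq = len(self._records)
        self._records.append({"seq": seq, "kind": kind, **body})
        return seq

    def replay(self, from_seq: int = 0):
        """Yield records at or after from_seq (e.g. a snapshot's anchor)."""
        yield from self._records[from_seq:]
```

Deterministic replay is just iteration: start from a snapshot's anchor sequence and re-apply every record in order.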
---
## 6. Snapshots (checkpoints, not magic)
```
snapshots/
├── snapshot-00000123/
│   ├── manifest.yaml
│   ├── pinned-segments.txt
│   └── pinned-blocks.txt
└── snapshot-00000124/
```
A snapshot is:
* a **commit point**
* a pin-set (segments + blocks)
* a replay anchor
ZFS snapshots fit **perfectly** here but are not required.
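Materializing a snapshot is just writing the pin-set and the replay anchor. A sketch using the file names from the tree above; the manifest fields are illustrative assumptions:

```python
import os
import tempfile


def write_snapshot(root: str, snap_no: int, log_seq: int,
                   segments: list[str], blocks: list[str]) -> str:
    """Write a snapshot directory: a commit point, a pin-set, a replay anchor."""
    snap_dir = os.path.join(root, f"snapshot-{snap_no:08d}")
    os.makedirs(snap_dir)  # fails if it exists: commit points are unique

    # The replay anchor: replaying the log from log_seq reproduces state.
    with open(os.path.join(snap_dir, "manifest.yaml"), "w") as f:
        f.write(f"snapshot: {snap_no}\nlog_seq: {log_seq}\n")

    # The pin-set: sealed segments and blocks this snapshot keeps alive.
    with open(os.path.join(snap_dir, "pinned-segments.txt"), "w") as f:
        f.write("\n".join(sorted(segments)) + "\n")
    with open(os.path.join(snap_dir, "pinned-blocks.txt"), "w") as f:
        f.write("\n".join(sorted(blocks)) + "\n")
    return snap_dir
```

Because everything pinned is already immutable, a snapshot never copies data; it only records which sealed objects must not be garbage-collected.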
---
## 7. Meta (authority, policy, identity)
```
meta/
├── dam/
│   └── dam-current.art
├── policy/
│   └── policy-current.art
├── keys/
│   ├── pub/
│   └── priv/
└── host/
    └── host-id
```
Important rule:
> **Meta artifacts are still artifacts**
They must:
* be sealed
* be indexed
* appear in the log
The files here are *cached materializations*.
---
## 8. Tmp (explicitly unsafe)
```
tmp/
├── ingest/
├── scratch/
└── pty/
```
This is where:
* asl-capture writes streams
* open blocks live
* crashes are allowed
Nothing here is trusted.
---
## 9. How shards fit (your current design)
You said:
> “I have a store root with shards with artifacts in them.”
That maps cleanly if you reinterpret it as:
* shards own **index segments**
* artifacts are logical entries, not files
* shard locality = lookup acceleration
So instead of:
```
shard/
└── artifact-file
```
You have:
```
shard/
└── index segments → blocks
```
This is the key mental shift you already sensed.
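That shift starts with deterministic key routing: a shard is chosen by hashing the artifact key, and the shard's index segments resolve the key to a block range. A sketch; the hash function and shard count are assumptions:

```python
import hashlib

NUM_SHARDS = 0x1000  # matches shard-000 .. shard-fff in the layout above


def shard_for(artifact_key: str) -> str:
    """Route an artifact key to the index shard that can resolve it.

    Shards hold index segments, not artifact files: routing a key here
    only tells you where to look up (block_id, offset, length).
    """
    digest = hashlib.blake2b(artifact_key.encode(), digest_size=8).hexdigest()
    return f"shard-{int(digest, 16) % NUM_SHARDS:03x}"
```

Because routing depends only on the key, every host in a domain computes the same shard independently, which is what makes shard locality a lookup accelerator rather than a coordination problem.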
---
## 10. Common vs Personal vs Group (on disk)
**They look identical on disk.**
Differences live in:
* DAM
* policy
* witness artifacts
### Common
* quorum witness
* stricter validation
* often read-only on most hosts
### Personal
* self-authority
* multiple active writers allowed
* same block/index/log structure
### Group
* hybrid policies
No special filesystem layout needed.
---
## 11. Why this is the right abstraction boundary
You said:
> “I used to look at the data as artifacts. Now it seems I'm better off looking at data as blocks with artifacts in them.”
That's exactly right.
**Artifacts are semantic**
**Blocks are physical**
This layout lets you:
* reason in artifacts
* store in blocks
* snapshot blocks
* index artifacts
* federate semantics
without mixing concerns.
---
## 12. One sentence to lock it in
> **Artifacts are indexed meanings; blocks are frozen bytes; snapshots pin the relationship.**
---
## 13. What you can safely implement *now*
Even before ASL-CORE / PEL are finished, you can implement:
* block store
* block sealing
* index segments
* append-only log
* snapshot pinning
* DAM storage and validation
All of that is **infrastructure**, not policy.
---
If you want a next step, I strongly suggest one of these:
* walk through **one artifact write end-to-end on disk**
* show **how asl-capture writes into blocks**
* show **how a personal domain syncs across two machines**
* map this layout directly onto **ZFS datasets**
You're no longer at risk of mixing projects; you're converging.