amduat-api/notes/on-disk layout.md
2026-01-17 00:19:49 +01:00


Good, this is the right moment to pin this down, because **on-disk shape is where abstraction leaks if we're sloppy**.
I'll answer in layers:
1. **What ASL actually requires on disk (minimal contract)**
2. **A concrete, sane on-disk layout that fits what you already have**
3. **How shards, blocks, artifacts, logs, and snapshots map**
4. **How this scales from “single laptop” to “Common”**
5. **Why this does *not* over-spec ASL-CORE**
I'll stay very explicit about what is *normative* vs *conventional*.
---
## 1. What ASL actually requires on disk (the hard minimum)
ASL itself does **not** care about directories.
ASL-STORE / ASL-STORE-INDEX require only that the store can provide:
1. **Immutable blocks**
2. **An append-only log**
3. **Sealed snapshots**
4. **Deterministic replay**
5. **CAS addressing**
So the **only hard requirements** are:
* Blocks are immutable after seal
* Index entries point to `(block_id, offset, length)`
* Logs are append-only and ordered
* Snapshots pin a prefix of state
Everything else is an implementation choice.
That means:
👉 your current “store root with shards” is already compatible.
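The minimal contract above can be sketched as a toy in-memory store. The class and method names here (`MiniStore`, `seal_block`, `read`, `append`) are illustrative, not part of any spec; the point is only that CAS blocks, `(block_id, offset, length)` reads, and an append-only log are all the store has to guarantee:

```python
import hashlib

class MiniStore:
    """Toy in-memory sketch of the minimal ASL-STORE contract:
    immutable CAS blocks, an ordered append-only log, and
    (block_id, offset, length) reads. Names are illustrative."""

    def __init__(self):
        self.blocks = {}   # block_id -> immutable bytes
        self.log = []      # strictly ordered, never rewritten

    def seal_block(self, data: bytes) -> str:
        # CAS addressing: the content hash *is* the block id
        block_id = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(block_id, bytes(data))
        return block_id

    def read(self, block_id: str, offset: int, length: int) -> bytes:
        # Artifacts are read out of sealed blocks by offset and length
        return self.blocks[block_id][offset:offset + length]

    def append(self, record: dict) -> int:
        # Append-only: returns the record's sequence number
        self.log.append(record)
        return len(self.log) - 1
```

Snapshots and deterministic replay then fall out of pinning a log prefix, which sections 5 and 6 cover.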
---
## 2. Recommended on-disk layout (practical, explicit)
Here is a layout that is:
* compatible with your current shard idea
* works on POSIX
* maps cleanly to ZFS later
* easy to reason about during rescue / forensics
### Store root (per domain)
```
/asl/
└── domains/
    └── <domain-id>/
        ├── meta/
        ├── blocks/
        ├── index/
        ├── log/
        ├── snapshots/
        └── tmp/
```
Everything below is **domain-local**.
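As a sketch, a hypothetical `init_domain` helper could materialize this skeleton. The subdirectory names follow the tree above; the helper itself is an assumption, not part of ASL:

```python
from pathlib import Path

# Subdirectories per the layout above (blocks/ split into open/ and sealed/,
# as described in section 3).
SUBDIRS = ["meta", "blocks/open", "blocks/sealed",
           "index", "log", "snapshots", "tmp"]

def init_domain(root: Path, domain_id: str) -> Path:
    """Create the domain-local directory skeleton; idempotent."""
    domain = root / "domains" / domain_id
    for sub in SUBDIRS:
        (domain / sub).mkdir(parents=True, exist_ok=True)
    return domain
```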
---
## 3. Blocks (the real storage substrate)
### 3.1 Block storage (immutable)
```
blocks/
├── open/
│   └── blk_<uuid>.tmp
└── sealed/
    ├── 00/
    │   └── <blockid>.blk
    ├── 01/
    │   └── <blockid>.blk
    └── ff/
        └── <blockid>.blk
```
* `blockid` = CAS hash
* Sharded by prefix (first byte or two)
* Open blocks are **never visible**
* Sealed blocks are immutable
This directly matches your **block + offset** mental model.
> Important: **artifacts do not live as files**
> They live *inside blocks*.
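The open-to-sealed transition can be sketched like this. `seal_block` is a hypothetical helper (it assumes `blocks/open/` already exists); the atomic rename is what keeps open blocks invisible:

```python
import hashlib
import os
import uuid
from pathlib import Path

def seal_block(blocks_dir: Path, data: bytes) -> str:
    """Write into blocks/open/, hash for the CAS id,
    then atomically rename into the sharded sealed/ tree."""
    tmp = blocks_dir / "open" / f"blk_{uuid.uuid4().hex}.tmp"
    tmp.write_bytes(data)

    block_id = hashlib.sha256(data).hexdigest()  # CAS: hash is the id
    final = blocks_dir / "sealed" / block_id[:2] / f"{block_id}.blk"
    final.parent.mkdir(parents=True, exist_ok=True)

    # Atomic on POSIX: readers never observe a half-written sealed block
    os.replace(tmp, final)
    return block_id
```

After the rename the block is immutable by convention; nothing ever writes to `sealed/` in place.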
---
## 4. Index (where artifacts become visible)
Your shard-based approach fits perfectly here.
```
index/
├── shard-000/
│   ├── segment-0001.idx
│   ├── segment-0002.idx
│   └── bloom.bin
├── shard-001/
│   └── ...
└── shard-fff/
```
Each `segment-XXXX.idx` is:
* append-only while open
* immutable once sealed
* referenced by the log
Index records point to:
```
ArtifactKey → (BlockID, offset, length)
```
This is exactly ASL-STORE-INDEX.
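A hypothetical fixed-width encoding of one such record might look like the following. The field widths (32-byte hashes, 64-bit offset, 32-bit length) are illustrative; ASL-STORE-INDEX does not dictate a wire format:

```python
import hashlib
import struct

# One index record: ArtifactKey -> (BlockID, offset, length).
# Big-endian, fixed width: 32B key + 32B block id + u64 offset + u32 length.
RECORD = struct.Struct(">32s32sQI")

def pack_entry(key: bytes, block_id: bytes, offset: int, length: int) -> bytes:
    return RECORD.pack(key, block_id, offset, length)

def unpack_entry(raw: bytes):
    return RECORD.unpack(raw)
```

Fixed-width records keep sealed segments trivially scannable during rescue or forensics, which matches the design goal in section 2.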
---
## 5. Append-only log (the spine of truth)
```
log/
├── log-0000000000000000.asl
├── log-0000000000001000.asl
└── current
```
Log records include:
* index additions
* tombstones
* segment seals
* DAM updates
* witness rotation artifacts
Rules:
* Logs are strictly ordered
* Never rewritten
* Replayable from snapshot
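These rules can be sketched with newline-delimited JSON records. The format is an assumption for illustration; only the append-only ordering and replayability matter:

```python
import json
from pathlib import Path

def append_record(log_file: Path, record: dict) -> None:
    """Append-only: records are added at the end, never rewritten."""
    with log_file.open("a") as f:
        f.write(json.dumps(record) + "\n")

def replay(log_file: Path, from_seq: int = 0):
    """Deterministic replay: yield records in order,
    optionally starting from a snapshot's anchor sequence."""
    with log_file.open() as f:
        for seq, line in enumerate(f):
            if seq >= from_seq:
                yield json.loads(line)
```

Replaying from `from_seq` is exactly what a snapshot enables: pin a prefix, then re-apply everything after it.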
---
## 6. Snapshots (checkpoints, not magic)
```
snapshots/
├── snapshot-00000123/
│   ├── manifest.yaml
│   ├── pinned-segments.txt
│   └── pinned-blocks.txt
└── snapshot-00000124/
```
A snapshot is:
* a **commit point**
* a pin-set (segments + blocks)
* a replay anchor
ZFS snapshots fit **perfectly** here but are not required.
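A snapshot write can be sketched as creating the manifest plus pin-set files shown above. `write_snapshot` and the manifest fields are assumptions, not spec:

```python
from pathlib import Path

def write_snapshot(snap_dir: Path, seq: int, log_prefix: int,
                   segments: list, blocks: list) -> Path:
    """Materialize one snapshot: a commit point (manifest)
    plus a pin-set of sealed segments and blocks."""
    d = snap_dir / f"snapshot-{seq:08d}"
    d.mkdir(parents=True, exist_ok=True)
    # Manifest records the replay anchor: everything up to log_prefix is pinned
    (d / "manifest.yaml").write_text(f"seq: {seq}\nlog_prefix: {log_prefix}\n")
    (d / "pinned-segments.txt").write_text("\n".join(segments))
    (d / "pinned-blocks.txt").write_text("\n".join(blocks))
    return d
```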
---
## 7. Meta (authority, policy, identity)
```
meta/
├── dam/
│   └── dam-current.art
├── policy/
│   └── policy-current.art
├── keys/
│   ├── pub/
│   └── priv/
└── host/
    └── host-id
```
Important rule:
> **Meta artifacts are still artifacts**
They must:
* be sealed
* be indexed
* appear in the log
The files here are *cached materializations*.
---
## 8. Tmp (explicitly unsafe)
```
tmp/
├── ingest/
├── scratch/
└── pty/
```
This is where:
* asl-capture writes streams
* open blocks live
* crashes are allowed
Nothing here is trusted.
---
## 9. How shards fit (your current design)
You said:
> “I have a store root with shards with artifacts in them.”
That maps cleanly if you reinterpret it as:
* shards own **index segments**
* artifacts are logical entries, not files
* shard locality = lookup acceleration
So instead of:
```
shard/
└── artifact-file
```
You have:
```
shard/
└── index segments → blocks
```
This is the key mental shift you already sensed.
---
## 10. Common vs Personal vs Group (on disk)
**They look identical on disk.**
Differences live in:
* DAM
* policy
* witness artifacts
### Common
* quorum witness
* stricter validation
* often read-only on most hosts
### Personal
* self-authority
* multiple active writers allowed
* same block/index/log structure
### Group
* hybrid policies
No special filesystem layout needed.
---
## 11. Why this is the right abstraction boundary
You said:
> “I used to look at the data as artifacts. Now it seems I'm better off looking at data as blocks with artifacts in them.”
That's exactly right.
**Artifacts are semantic**
**Blocks are physical**
This layout lets you:
* reason in artifacts
* store in blocks
* snapshot blocks
* index artifacts
* federate semantics
without mixing concerns.
---
## 12. One sentence to lock it in
> **Artifacts are indexed meanings; blocks are frozen bytes; snapshots pin the relationship.**
---
## 13. What you can safely implement *now*
Even before ASL-CORE / PEL are finished, you can implement:
* block store
* block sealing
* index segments
* append-only log
* snapshot pinning
* DAM storage and validation
All of that is **infrastructure**, not policy.
---
If you want next, I strongly suggest one of these:
* walk through **one artifact write end-to-end on disk**
* show **how asl-capture writes into blocks**
* show **how a personal domain syncs across two machines**
* map this layout directly onto **ZFS datasets**
You're no longer at risk of mixing projects; you're converging.