This is a formal draft of the **federation propagation rules and cross-node deterministic replay** for ASL + TGK + PEL. It ensures deterministic, snapshot-safe execution across multiple nodes while preserving artifact provenance and DAG integrity.
---
# Federation and Cross-Node Deterministic Replay Specification
---
## 1. Purpose
This specification defines:
1. How **artifacts, PERs, and TGK edges** are propagated across federation nodes.
2. How **deterministic replay** is guaranteed on remote nodes.
3. How **snapshots, shards, and logs** are used to synchronize state.
4. Rules for **conflict resolution, tombstone handling, and provenance integrity**.
---
## 2. Key Concepts
* **Node**: Independent system with access to ASL/TGK storage and PEL execution environment.
* **Federation**: Set of nodes that share artifacts, execution receipts, and TGK edges.
* **ArtifactKey**: Canonical identifier for artifacts or PERs.
* **SnapshotID**: Unique identifier of a ZFS snapshot (per pool or globally assigned).
* **Log Sequence (logseq)**: Monotonic sequence ensuring ordering for deterministic replay.
* **Execution Receipt (PER)**: Artifact describing the deterministic output of a PEL program.
---
## 3. Propagation Rules
### 3.1 Artifact & PER Propagation
1. **New artifacts or PERs** are assigned a **global canonical ArtifactKey**.
2. Each node maintains a **local shard mapping**; shard boundaries may differ per node.
3. Artifacts are propagated via **snapshot-delta sync**:
* Only artifacts **logseq > last replicated logseq** are transmitted.
* Each artifact includes:
* `ArtifactKey`
* `logseq`
* `type_tag` (optional)
* Payload checksum (hash)
4. PER artifacts are treated the same as raw artifacts but may include additional **PEL DAG metadata**.
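The snapshot-delta sync step can be sketched as a simple logseq filter. The record layout below is illustrative only (`artifact_record_t`, `select_delta`, and the 32-byte checksum width are assumptions, not part of the spec); it shows how a node selects only artifacts newer than the peer's last replicated logseq.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical wire record for artifact/PER propagation; field names
 * are illustrative, not normative. */
typedef struct {
    uint64_t artifact_key;   /* canonical ArtifactKey */
    uint64_t logseq;         /* monotonic log sequence */
    uint32_t type_tag;       /* optional type tag; 0 = untyped */
    uint8_t  checksum[32];   /* payload hash, e.g. SHA-256 */
} artifact_record_t;

/* Snapshot-delta sync: copy into `out` only records with
 * logseq > last_replicated_logseq. Returns the number selected. */
size_t select_delta(const artifact_record_t *records, size_t n,
                    uint64_t last_replicated_logseq,
                    artifact_record_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (records[i].logseq > last_replicated_logseq)
            out[k++] = records[i];
    }
    return k;
}
```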
---
### 3.2 TGK Edge Propagation
1. TGK edges reference canonical ArtifactKeys and NodeIDs.
2. Each edge includes:
* From nodes list
* To nodes list
* Edge type key
* Roles (from/to/both)
* logseq
3. Edges are propagated **incrementally**, respecting snapshot boundaries.
4. Deterministic ordering:
* Edges sorted by `(logseq, canonical_edge_id)` on transmit
* Replay nodes consume edges in the same order
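The deterministic transmit order above reduces to a single comparator. A minimal sketch (the `tgk_edge_t` layout and `cmp_edge_order` name are assumptions): any node that sorts with this comparator consumes edges in an identical sequence.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical TGK edge record; fields illustrative only. */
typedef struct {
    uint64_t canonical_edge_id;
    uint64_t logseq;
} tgk_edge_t;

/* qsort comparator implementing the (logseq, canonical_edge_id)
 * transmit order used for deterministic replay. */
int cmp_edge_order(const void *a, const void *b)
{
    const tgk_edge_t *x = (const tgk_edge_t *)a;
    const tgk_edge_t *y = (const tgk_edge_t *)b;
    if (x->logseq != y->logseq)
        return (x->logseq < y->logseq) ? -1 : 1;
    if (x->canonical_edge_id != y->canonical_edge_id)
        return (x->canonical_edge_id < y->canonical_edge_id) ? -1 : 1;
    return 0;
}
```

Comparing field-by-field (rather than subtracting `uint64_t` values) avoids wraparound and keeps the ordering total and stable across platforms.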
---
### 3.3 Snapshot and Log Management
* Each node maintains:
1. **Last applied snapshot** per federation peer
2. **Sequential write log** for artifacts and edges
* Replay on a remote node:
1. Apply artifacts and edges sequentially from log
2. Only apply artifacts **≤ target snapshot**
3. Merge multiple logs deterministically via `(logseq, canonical_id)` tie-breaker
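The deterministic log merge with the `(logseq, canonical_id)` tie-breaker and the snapshot cap can be sketched as a two-way merge. Names (`log_rec_t`, `merge_logs`) are assumptions; both inputs are assumed already sorted in canonical order.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t canonical_id;
    uint64_t logseq;
} log_rec_t;

/* True if a orders strictly before b under (logseq, canonical_id). */
static int rec_before(const log_rec_t *a, const log_rec_t *b)
{
    if (a->logseq != b->logseq) return a->logseq < b->logseq;
    return a->canonical_id < b->canonical_id;
}

/* Deterministic two-way merge of peer logs, dropping records past the
 * target snapshot's logseq bound. Returns the merged record count. */
size_t merge_logs(const log_rec_t *a, size_t na,
                  const log_rec_t *b, size_t nb,
                  uint64_t logseq_max, log_rec_t *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na || j < nb) {
        const log_rec_t *next;
        if (j >= nb || (i < na && rec_before(&a[i], &b[j])))
            next = &a[i++];
        else
            next = &b[j++];
        if (next->logseq <= logseq_max)   /* snapshot constraint */
            out[k++] = *next;
    }
    return k;
}
```

Merging more than two peer logs generalizes to a k-way merge over the same ordering; determinism only requires that every node use the identical tie-breaker.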
---
## 4. Conflict Resolution
1. **ArtifactKey collisions**:
* If hash matches existing artifact → discard duplicate
* If hash differs → flag conflict, require manual reconciliation or automated deterministic resolution
2. **TGK edge conflicts**:
* Multiple edges with same `from/to/type` but different logseq → pick latest ≤ snapshot
* Shadowed edges handled via **TombstoneShadow operator**
3. **PER replay conflicts**:
* Identical PEL DAG + identical inputs → skip execution
* Divergent inputs → log error, optionally recompute
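The ArtifactKey collision rule in 4.1 amounts to a three-way decision on (key, hash). A minimal sketch, with hypothetical names (`resolution_t`, `resolve_artifact`) and an assumed 32-byte hash:

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    RESOLVE_APPLY_NEW,          /* no collision: apply incoming artifact */
    RESOLVE_DISCARD_DUPLICATE,  /* same key, same hash: discard */
    RESOLVE_FLAG_CONFLICT       /* same key, different hash: reconcile */
} resolution_t;

/* Collision check for an incoming artifact against a local entry
 * with the same slot; field widths are illustrative. */
resolution_t resolve_artifact(uint64_t existing_key,
                              const uint8_t existing_hash[32],
                              uint64_t incoming_key,
                              const uint8_t incoming_hash[32])
{
    if (existing_key != incoming_key)
        return RESOLVE_APPLY_NEW;
    if (memcmp(existing_hash, incoming_hash, 32) == 0)
        return RESOLVE_DISCARD_DUPLICATE;
    return RESOLVE_FLAG_CONFLICT;
}
```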
---
## 5. Deterministic Replay Algorithm
```c
void FederationReplay(log_buffer_t *incoming_log, snapshot_range_t target_snapshot) {
    /* Sort the incoming log deterministically by (logseq, canonical_id). */
    qsort(incoming_log->records, incoming_log->count, sizeof(record_t),
          by_logseq_then_canonical_id);

    for (uint64_t i = 0; i < incoming_log->count; i++) {
        record_t rec = incoming_log->records[i];

        /* Skip records beyond the target snapshot. */
        if (rec.logseq > target_snapshot.logseq_max)
            continue;

        /* Apply artifact/PER payloads and TGK edges. */
        if (rec.type == ARTIFACT || rec.type == PER) {
            ApplyArtifact(rec);
        } else if (rec.type == TGK_EDGE) {
            ApplyTGKEdge(rec);
        }

        /* Shadow tombstones deterministically. */
        if (rec.is_tombstone) {
            ApplyTombstone(rec.canonical_id, rec.logseq);
        }
    }
}
```
* Guarantees **deterministic replay** across nodes.
* Uses **logseq + canonical ID ordering** for tie-breaking.
---
## 6. Shard-Local Execution
* After federation sync, **local shards** may differ.
* Execution plan operators (SegmentScan, IndexFilter, TGKTraversal) operate **on local shards**.
* Global determinism maintained by:
* Deterministic merge of shards
* Snapshot constraints
* Canonical ordering of artifacts and edges
---
## 7. Provenance and Audit
* Each node maintains:
* **Snapshot provenance table**: snapshot ID → list of applied artifacts/PERs
* **Federation log table**: peer node → last applied logseq
* Deterministic execution allows **replay and auditing**:
* Verify that `final_output` is identical across nodes
* Provenance tables ensure **full traceability**
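The audit check that `final_output` is identical across nodes can be sketched as a digest comparison over peer-reported hashes. The function name and 32-byte digest width are assumptions:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Audit sketch: given the final_output digest reported by each
 * federation peer, verify all peers agree. Returns the index of the
 * first divergent peer, or -1 if every digest matches peer 0's. */
ptrdiff_t audit_final_output(const uint8_t digests[][32], size_t n_peers)
{
    for (size_t i = 1; i < n_peers; i++) {
        if (memcmp(digests[0], digests[i], 32) != 0)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

A divergent index would then be cross-referenced against that peer's snapshot provenance table to locate the first artifact or edge where replay diverged.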
---
## 8. Multi-Node DAG Execution
1. PEL programs may span **multiple nodes**:
* Inputs and intermediate PERs propagated deterministically
* DAG nodes executed locally when all inputs are available
2. Determinism guaranteed because:
* Inputs constrained by snapshot + logseq
* Operators are deterministic
* Merge, shadowing, and projection preserve canonical ordering
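The "executed locally when all inputs are available" rule can be sketched as a readiness check against the applied-artifact log. Everything here is illustrative (`dag_node_ready`, and the toy ArtifactKey → applied-logseq lookup table, where 0 means not yet applied):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* A PEL DAG node may execute locally once every input ArtifactKey has
 * been applied at or below the node's snapshot bound.
 * `applied_logseq[key]` maps ArtifactKey -> logseq of application
 * (0 = not yet applied); a real system would use a proper index. */
bool dag_node_ready(const uint64_t *input_keys, size_t n_inputs,
                    const uint64_t *applied_logseq,
                    uint64_t snapshot_logseq_max)
{
    for (size_t i = 0; i < n_inputs; i++) {
        uint64_t seq = applied_logseq[input_keys[i]];
        if (seq == 0 || seq > snapshot_logseq_max)
            return false;  /* input missing or beyond snapshot bound */
    }
    return true;
}
```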
---
## 9. Summary
Federation and cross-node deterministic replay:
* Uses **logseq + canonical IDs** for deterministic ordering
* Supports **PER and TGK artifacts** across nodes
* Enforces **snapshot constraints**
* Enables **federated PEL program execution**
* Preserves **provenance, tombstones, and deterministic DAG evaluation**
* Compatible with SIMD/shard acceleration and ENC-ASL-TGK-INDEX memory layout
---
A natural next step is **a formal overall architecture diagram** showing:
* PEL programs
* ASL/TGK storage
* Execution plan operators
* Shard/SIMD execution
* Federation propagation and replay paths