# Unified ASL + TGK + PEL System Specification (Master Reference)

---

## 1. Introduction

This document specifies a unified system for deterministic, federated, snapshot-safe storage and execution of artifacts, execution receipts (PERs), and TGK edges. The system integrates:

* **ASL (Artifact Storage Layer)**
* **TGK (Trace Graph Kernel)**
* **PEL (Program Execution Layer)**
* **Indexing, Shard/SIMD acceleration**
* **Federation and deterministic replay**

The system supports **billions of artifacts and edges**, deterministic DAG execution, and cross-node provenance.

---

## 2. Core Concepts

| Concept      | Description                                                                                                    |
| ------------ | -------------------------------------------------------------------------------------------------------------- |
| Artifact     | Basic unit stored in ASL; may carry an optional `type_tag`, with `has_type_tag` indicating whether it is set.  |
| PER          | PEL Execution Receipt; an artifact describing the deterministic output of a PEL program.                       |
| TGK Edge     | Directed relation between artifacts/PERs. Stores `from_nodes`, `to_nodes`, `edge_type`, `roles`.               |
| Snapshot     | ZFS snapshot; defines read visibility and the deterministic execution boundary.                                |
| Logseq       | Monotonic sequence number for deterministic ordering.                                                          |
| Shard        | Subset of artifacts/edges partitioned for SIMD/parallel execution.                                             |
| Canonical ID | Unique identifier per artifact, PER, or TGK edge.                                                              |

---

## 3. ASL-CORE & ASL-STORE-INDEX

### 3.1 ASL-CORE

* Defines **artifact semantics**:

  * Optional `type_tag` (32-bit) with `has_type_tag` (8-bit toggle)
  * Artifacts are immutable once written
  * PERs are treated as artifacts

### 3.2 ASL-STORE-INDEX

* Manages **artifact blocks**, including:

  * Small vs. large blocks (packaging)
  * Block sealing, retention, snapshot safety
* Index structure:

  * **Shard-local**, supports **billion-scale lookups**
  * Bloom filters for quick membership queries
  * Sharding and SIMD acceleration for memory-efficient lookups
* Record Layout (C struct):

```c
#include <stdint.h>

/* One index record per artifact; locates its bytes inside a block. */
typedef struct {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;        /* position within the block */
    uint32_t length;
    uint32_t type_tag;      /* meaningful only when has_type_tag is set */
    uint8_t  has_type_tag;  /* 8-bit toggle for the optional type_tag */
} artifact_index_entry_t;
```
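The Bloom-filter membership check listed under the index structure could look roughly like the following. This is a minimal sketch, not the index's actual filter: the names `bloom_t`, `bloom_add`, and `bloom_maybe_contains`, the 1024-bit size, and the use of two bit positions per key are all assumptions made for illustration, and the mixing constants are borrowed from the splitmix64 finalizer. A `false` result means the key is definitely absent from the shard, so the index lookup can be skipped; a `true` result still requires the real lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u  /* hypothetical filter size; must be a power of two */

typedef struct {
    uint64_t words[BLOOM_BITS / 64];
} bloom_t;

/* 64-bit mixer using the splitmix64 finalizer constants. */
static uint64_t bloom_mix(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

static void bloom_init(bloom_t *b) {
    assert(b != NULL);
    memset(b->words, 0, sizeof b->words);
}

/* Derive two bit positions from one hash by splitting it into halves. */
static void bloom_positions(uint64_t key, uint32_t pos[2]) {
    uint64_t h = bloom_mix(key);
    pos[0] = (uint32_t)(h & (BLOOM_BITS - 1));
    pos[1] = (uint32_t)((h >> 32) & (BLOOM_BITS - 1));
}

static void bloom_add(bloom_t *b, uint64_t artifact_key) {
    uint32_t pos[2];
    bloom_positions(artifact_key, pos);
    for (int i = 0; i < 2; i++)
        b->words[pos[i] / 64] |= 1ULL << (pos[i] % 64);
}

/* false: key is definitely absent; true: key may be present. */
static bool bloom_maybe_contains(const bloom_t *b, uint64_t artifact_key) {
    uint32_t pos[2];
    bloom_positions(artifact_key, pos);
    for (int i = 0; i < 2; i++)
        if (!(b->words[pos[i] / 64] & (1ULL << (pos[i] % 64))))
            return false;
    return true;
}
```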

---

## 4. ENC-ASL-TGK-INDEX

* Defines **encoding for artifacts, PERs, and TGK edges** in storage.
* TGK edges stored as:

```c
#include <stdint.h>

/* MAX_FROM and MAX_TO are not defined in this document. */
typedef struct {
    uint64_t canonical_edge_id;
    uint64_t from_nodes[MAX_FROM];  /* source artifacts/PERs */
    uint64_t to_nodes[MAX_TO];      /* target artifacts/PERs */
    uint32_t edge_type;
    uint8_t  roles;
    uint64_t logseq;                /* monotonic ordering key */
} tgk_edge_record_t;
```

* Supports deterministic traversal, snapshot bounds, and SIMD filtering.
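To make the snapshot bound and the `edge_type` filter concrete, here is a scalar reference sketch over an array of edge records. It is an illustration under stated assumptions rather than the storage layer's real code path: `MAX_FROM` and `MAX_TO` are given placeholder values of 4 because the specification leaves them undefined, and `tgk_filter_edges` and `snapshot_max` are hypothetical names. A SIMD implementation would be expected to produce the same IDs in the same order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder capacities: the specification does not give MAX_FROM/MAX_TO. */
#define MAX_FROM 4
#define MAX_TO   4

typedef struct {
    uint64_t canonical_edge_id;
    uint64_t from_nodes[MAX_FROM];
    uint64_t to_nodes[MAX_TO];
    uint32_t edge_type;
    uint8_t  roles;
    uint64_t logseq;
} tgk_edge_record_t;

/*
 * Copy the IDs of edges that match edge_type and are visible at the
 * snapshot (logseq <= snapshot_max) into out_ids, preserving input order.
 * Returns the number of IDs written (at most out_cap).
 */
static size_t tgk_filter_edges(const tgk_edge_record_t *edges, size_t n,
                               uint32_t edge_type, uint64_t snapshot_max,
                               uint64_t *out_ids, size_t out_cap) {
    assert(edges != NULL || n == 0);
    assert(out_ids != NULL || out_cap == 0);
    size_t written = 0;
    for (size_t i = 0; i < n && written < out_cap; i++) {
        if (edges[i].edge_type == edge_type && edges[i].logseq <= snapshot_max)
            out_ids[written++] = edges[i].canonical_edge_id;
    }
    return written;
}
```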

---

## 5. PEL Integration

### 5.1 PEL Program DAG

* Deterministic DAG with:

  * Inputs: artifacts or PERs
  * Computation nodes: concat, slice, primitive ops
  * Outputs: artifacts or PERs
* Guarantees snapshot-bound determinism:

  * Inputs: `logseq ≤ snapshot_max`
  * Outputs: `logseq = max(input_logseq) + 1`
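The two logseq rules for snapshot-bound determinism can be sketched as a single check-and-compute step. The function name `pel_output_logseq` and its error convention are hypothetical, and rejecting an empty input set (or a maximum that would overflow) is an assumption the specification does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the logseq for a PEL output from its input logseqs.
 * Returns 0 and stores max(input_logseq) + 1 in *out_logseq on success.
 * Returns -1 if there are no inputs, if any input lies beyond the
 * snapshot, or if the result would overflow; *out_logseq is then unchanged.
 */
static int pel_output_logseq(const uint64_t *input_logseqs, size_t n,
                             uint64_t snapshot_max, uint64_t *out_logseq) {
    assert(out_logseq != NULL);
    if (input_logseqs == NULL || n == 0)
        return -1;
    uint64_t max_seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (input_logseqs[i] > snapshot_max)
            return -1;  /* input not visible in this snapshot */
        if (input_logseqs[i] > max_seen)
            max_seen = input_logseqs[i];
    }
    if (max_seen == UINT64_MAX)
        return -1;      /* max + 1 would wrap around */
    *out_logseq = max_seen + 1;
    return 0;
}
```

For inputs with logseqs 3, 7, and 5 under `snapshot_max = 10`, the sketch yields an output logseq of 8; an input at logseq 11 is rejected.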

### 5.2 Execution Plan Mapping

| PEL Node       | Execution Plan Operator      |
| -------------- | ---------------------------- |
| Input Artifact | SegmentScan                  |
| Concat/Slice   | Projection                   |
| TGK Projection | TGKTraversal                 |
| Aggregate      | Aggregation                  |
| PER Output     | SegmentScan (fed downstream) |

---

## 6. Execution Plan Operators

* **SegmentScan**: scan artifacts/PERs within snapshot
* **IndexFilter**: SIMD-accelerated filtering by `type_tag`, `edge_type`, role
* **Merge**: deterministic merge across shards
* **TGKTraversal**: depth-limited deterministic DAG traversal
* **Projection**: select fields
* **Aggregation**: count, sum, union
* **TombstoneShadow**: applies tombstones and ensures snapshot safety

---

## 7. Shard & SIMD Execution

* Artifacts/edges partitioned by shard
* SIMD applied per shard for filters and traversal
* Deterministic merge across shards ensures global ordering
* Buffers structured for memory alignment:

```c
#include <stdint.h>

/* snapshot_range_t is not defined in this document. */
struct shard_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t count;
    snapshot_range_t snapshot;
};
```
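The deterministic merge across shards can be illustrated with a small k-way merge. This is a sketch under assumptions: it orders results by `(logseq, canonical_id)`, borrowing the key the replay rules use, and `merge_item_t`, `shard_run_t`, and `merge_shards` are hypothetical names that do not appear in the specification. Each shard is assumed to hand over its results already sorted; ties between shards go to the lower shard index, so the output does not depend on which shard finished first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t logseq;
    uint64_t canonical_id;
} merge_item_t;                 /* hypothetical per-shard result record */

typedef struct {
    const merge_item_t *items;  /* sorted by (logseq, canonical_id) */
    size_t count;
    size_t pos;                 /* read cursor, starts at 0 */
} shard_run_t;

/* Strict ordering on (logseq, canonical_id). */
static int item_less(const merge_item_t *a, const merge_item_t *b) {
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq;
    return a->canonical_id < b->canonical_id;
}

/*
 * Merge k sorted shard runs into out (capacity out_cap).
 * Returns the number of items written.
 */
static size_t merge_shards(shard_run_t *runs, size_t k,
                           merge_item_t *out, size_t out_cap) {
    assert(runs != NULL || k == 0);
    assert(out != NULL || out_cap == 0);
    size_t written = 0;
    while (written < out_cap) {
        size_t best = k;  /* k means "no run has items left" */
        for (size_t s = 0; s < k; s++) {
            if (runs[s].pos >= runs[s].count)
                continue;
            if (best == k || item_less(&runs[s].items[runs[s].pos],
                                       &runs[best].items[runs[best].pos]))
                best = s;
        }
        if (best == k)
            break;
        out[written++] = runs[best].items[runs[best].pos++];
    }
    return written;
}
```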

---

## 8. Federation & Cross-Node Deterministic Replay

* **Propagation rules**:

  * Only new artifacts/PERs/edges (`logseq > last_applied`) are transmitted
  * Delta replication per snapshot
* **Replay rules**:

  * Sort by `(logseq, canonical_id)` for deterministic application
  * Apply tombstones/shadowing
  * Preserve snapshot boundaries
* **Conflict resolution**:

  * ArtifactKey collisions: identical hash → ignore as duplicate; differing hash → flag
  * Edge conflicts: the edge with the latest `logseq ≤ snapshot` wins
  * PER conflicts: identical inputs → skip execution
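The propagation filter and the replay ordering can be sketched together. This is an illustration only: `replay_record_t` and `prepare_replay` are hypothetical names, the record carries just the two ordering fields, and tombstone handling and conflict resolution are left out. The sketch drops everything at or below `last_applied`, then sorts the remainder by `(logseq, canonical_id)` with the standard `qsort`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t logseq;
    uint64_t canonical_id;
} replay_record_t;             /* hypothetical replication record header */

/* qsort comparator: ascending (logseq, canonical_id). */
static int replay_cmp(const void *pa, const void *pb) {
    const replay_record_t *a = pa;
    const replay_record_t *b = pb;
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq ? -1 : 1;
    if (a->canonical_id != b->canonical_id)
        return a->canonical_id < b->canonical_id ? -1 : 1;
    return 0;
}

/*
 * Keep only records with logseq > last_applied (compacting in place),
 * then sort the survivors for deterministic application.
 * Returns the number of records left to apply.
 */
static size_t prepare_replay(replay_record_t *recs, size_t n,
                             uint64_t last_applied) {
    assert(recs != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].logseq > last_applied)
            recs[kept++] = recs[i];
    }
    if (kept > 1)
        qsort(recs, kept, sizeof recs[0], replay_cmp);
    return kept;
}
```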

---

## 9. Provenance & Audit

* **Provenance table**: snapshot → artifacts/PERs applied
* **Federation log table**: peer node → last applied logseq
* **Deterministic replay** guarantees identical final outputs across nodes

---

## 10. Data Flow Summary

```
PEL DAG Inputs --> Execute PEL Program --> Generate PERs
      |                                         |
      v                                         v
   ASL/TGK Shard Buffers (SIMD-aligned, snapshot-safe)
                          |
                          v
Execution Plan Operators (SegmentScan, IndexFilter, Merge, TGKTraversal, TombstoneShadow)
                          |
                          v
      Final Output (artifacts + PERs + TGK projections)
                          |
                          v
Federation Layer (propagation & deterministic replay across nodes)
```

---

## 11. Snapshot & Log Integration

* All operations are **snapshot-bounded**.
* **ZFS snapshots** + append-only sequential logs provide:

  * Checkpointing
  * Deterministic replay
  * Garbage collection of unreachable artifacts while preserving provenance

---

## 12. Summary

This unified system specification ensures:

* **Deterministic execution** (PEL + index + TGK)
* **Snapshot-safe operations**
* **Shard/SIMD acceleration**
* **Federated, replayable, cross-node consistency**
* **Integration of PER artifacts with TGK edges**
* **Provenance and auditability at scale**