amduat-api/notes/Runtime Execution Semantics & Memory Layout for SIMD-Shard Acceleration.md


NOTE: Superseded by tier1 TGK/1 and vendor/amduat/tier1/tgk-1-core.md; retained for historical context.
This is a formal draft of the **runtime execution semantics and memory layout** for SIMD/shard acceleration of the unified execution plan over ENC-ASL-TGK-INDEX. It focuses on **parallel, deterministic, and snapshot-safe execution**.
---
# Runtime Execution Semantics & Memory Layout for SIMD/Shard Acceleration
---
## 1. Purpose
This specification defines:
* How operators in an execution plan are executed in memory
* How shards, SIMD, and filters are applied efficiently
* Determinism guarantees per snapshot
* Memory layout for index scans, filter evaluation, and traversal expansion
It is fully compatible with:
* **ENC-ASL-TGK-INDEX**
* **Merged ASL + TGK query execution plan**
* **C-struct operator definitions**
---
## 2. Memory Layout Principles
1. **Immutable segments**: Index segments are **read-only** during execution
2. **Shard-local buffers**: Each shard stores a segment of records in contiguous memory
3. **SIMD key arrays**: Routing keys, type tags, and edge type keys are stored in contiguous SIMD-aligned arrays for fast vectorized evaluation
4. **Canonical references**: artifact IDs and TGK edge IDs are stored in 64-bit aligned arrays for deterministic access
5. **Traversal buffers**: TGK traversal outputs are stored in logseq-sorted buffers to preserve determinism
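
As an illustration of principle 3, a key array can be allocated on a SIMD-friendly boundary with C11 `aligned_alloc`. This is a minimal sketch, not the allocator the index uses: the 32-byte alignment (one AVX2 register) and the helper name `alloc_key_array` are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 32u  /* assumed: 32 bytes, the width of one AVX2 register */

/* Allocate room for `count` 32-bit keys in a SIMD-aligned block.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. The padded tail also keeps a
 * full-width load of the last vector inside the allocation.
 * Overflow of count * sizeof(uint32_t) is not checked in this sketch.
 * Release the block with free(). */
static uint32_t *alloc_key_array(size_t count)
{
    size_t bytes = count * sizeof(uint32_t);
    size_t padded = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    if (padded == 0)
        padded = SIMD_ALIGN;
    return aligned_alloc(SIMD_ALIGN, padded);
}
```

The padding lanes are uninitialized, so a vectorized loop still has to mask out results past `count`.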
---
## 3. Segment Loading and Sharding
* Each index segment is **assigned to a shard** based on routing key hash
* Segment header is mapped into memory; record arrays are memory-mapped if needed
* For ASL artifacts:
```c
struct shard_asl_segment {
    uint64_t *artifact_ids;   // 64-bit canonical IDs
    uint32_t *type_tags;      // optional type tags
    uint8_t  *has_type_tag;   // flags
    uint64_t  record_count;
};
```
* For TGK edges:
```c
struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;    // canonical TGK-CORE references
    uint32_t *edge_type_keys;
    uint8_t  *has_edge_type;
    uint8_t  *roles;           // from/to/both
    uint64_t  record_count;
};
```
* **Shard-local buffers** allow **parallel SIMD evaluation** without inter-shard contention
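
Shard assignment by routing key hash could look roughly like the sketch below. The hash function is not specified in this note, so `mix64` (the splitmix64 finalizer) is a stand-in, and the 64-bit routing key width and the name `shard_for_routing_key` are assumptions.

```c
#include <stdint.h>

/* Hypothetical 64-bit mixer (the splitmix64 finalizer). The hash that is
 * actually applied to routing keys is not defined in this note. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Map a routing key to a shard index in [0, shard_count).
 * shard_count must be non-zero. The mapping depends only on the key,
 * so every run and every node assigns a segment to the same shard. */
static uint32_t shard_for_routing_key(uint64_t routing_key, uint32_t shard_count)
{
    return (uint32_t)(mix64(routing_key) % shard_count);
}
```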
---
## 4. SIMD-Accelerated Filter Evaluation
* SIMD applies vectorized comparison of:
* Artifact type tags
* Edge type keys
* Routing keys (pre-hashed)
* Example pseudo-code (AVX2):
```c
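// Pseudocode: the simd_* names are placeholders, not real intrinsics.
// A tail of fewer than SIMD_WIDTH records needs a masked or scalar pass.
for (i = 0; i < record_count; i += SIMD_WIDTH) {
    tags = simd_load(&type_tag[i]);                   // load SIMD_WIDTH type tags
    pass_mask = simd_cmp(tags, type_tag_filter);      // per-lane comparison mask
    simd_mask_store(output_buffer, pass_mask, i);     // keep passing records in order
}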
```
* Determinism is guaranteed by **maintaining the original order** after filtering (logseq ascending, with canonical ID as tie-breaker): the filter only drops records and never reorders them
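
A scalar reference for what the vectorized filter has to compute is sketched below. It is a stand-in rather than AVX2 code, and the function name and the index-list output format are assumptions. A SIMD version would produce the same list of passing indices, in the same order.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the indices of records whose type tag equals `filter` into
 * `out_idx`, preserving input order. Records without a type tag
 * (has_type_tag[i] == 0) never match. `out_idx` needs room for
 * record_count entries. Returns the number of matches. */
static size_t filter_type_tag(const uint32_t *type_tags,
                              const uint8_t *has_type_tag,
                              size_t record_count,
                              uint32_t filter,
                              size_t *out_idx)
{
    size_t n = 0;
    for (size_t i = 0; i < record_count; i++) {
        /* Branch-free compaction: always store the candidate index,
         * but advance the output cursor only when the record passes. */
        out_idx[n] = i;
        n += (size_t)(has_type_tag[i] != 0) & (size_t)(type_tags[i] == filter);
    }
    return n;
}
```

Because the loop walks records in stored order and only ever appends, the output inherits the segment's ordering without a re-sort.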
---
## 5. Traversal Buffer Semantics (TGK)
* TGKTraversal operator maintains:
```c
struct tgk_traversal_buffer {
    uint64_t *edge_ids;   // expanded edges
    uint64_t *node_ids;   // corresponding nodes
    uint32_t  depth;      // current traversal depth
    uint64_t  count;      // number of records in buffer
};
```
* Buffers are **logseq-sorted per depth** to preserve deterministic traversal
* Optional **per-shard buffers** for parallel traversal
---
## 6. Merge Operator Semantics
* Merges **multiple shard-local streams**:
```c
struct merge_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t  count;
};
```
* Merge algorithm: **deterministic heap merge**
1. Compare `logseq` ascending
2. Tie-break with canonical ID
* Ensures same output regardless of shard execution order
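
The ordering rule can be sketched as a k-way merge. Note that this example swaps the heap for a linear scan over the stream heads to stay short; a binary heap would yield the same output with better complexity. The `merge_rec` and `shard_stream` types are hypothetical: `merge_buffer` above carries no logseq column, so the sketch assumes one is available per record and that each shard stream is already sorted by (logseq, canonical ID).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record view used only for this sketch. */
struct merge_rec {
    uint64_t logseq;
    uint64_t canonical_id;
};

/* One shard-local stream, already sorted by (logseq, canonical_id). */
struct shard_stream {
    const struct merge_rec *recs;
    size_t count;
    size_t pos;   /* index of the next unread record */
};

/* Ordering rule: logseq ascending, canonical ID as tie-breaker. */
static int rec_less(const struct merge_rec *a, const struct merge_rec *b)
{
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq;
    return a->canonical_id < b->canonical_id;
}

/* Merge `nstreams` sorted streams into `out`, which must have room for
 * all records. Picks the smallest head on every step, so the cost is
 * O(total * nstreams). Returns the number of records written. */
static size_t merge_streams(struct shard_stream *streams, size_t nstreams,
                            struct merge_rec *out)
{
    size_t n = 0;
    for (;;) {
        struct shard_stream *best = NULL;
        for (size_t s = 0; s < nstreams; s++) {
            struct shard_stream *st = &streams[s];
            if (st->pos == st->count)
                continue;   /* stream exhausted */
            if (!best || rec_less(&st->recs[st->pos], &best->recs[best->pos]))
                best = st;
        }
        if (!best)
            return n;       /* all streams exhausted */
        out[n++] = best->recs[best->pos++];
    }
}
```

As long as (logseq, canonical ID) pairs are unique, the comparison never ties, so the order in which shards are listed cannot influence the result.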
---
## 7. Tombstone Shadowing
* Shadowing is **applied post-merge**:
```c
struct tombstone_state {
    uint64_t canonical_id;
    uint64_t max_logseq_seen;
    uint8_t  is_tombstoned;
};
```
* Algorithm:
1. Iterate merged buffer
2. For each canonical ID, keep only **latest logseq ≤ snapshot**
3. Drop tombstoned or overridden entries
* Deterministic and **snapshot-safe**
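
The three steps above might be sketched as follows. The `shadow_rec` type and `apply_shadowing` are hypothetical names; the sketch assumes the merged input is logseq-ascending and that a canonical ID has at most one record per logseq. It uses a quadratic look-ahead instead of a `tombstone_state` table to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical merged record used only for this sketch. */
struct shadow_rec {
    uint64_t canonical_id;
    uint64_t logseq;
    uint8_t  is_tombstone;
};

/* Copy to `out` the records visible at `snapshot`: for each canonical ID
 * only the latest record with logseq <= snapshot survives, and it is
 * dropped as well if it is a tombstone. Input must be logseq-ascending.
 * Output keeps the input order. Returns the number of records kept. */
static size_t apply_shadowing(const struct shadow_rec *in, size_t count,
                              uint64_t snapshot, struct shadow_rec *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (in[i].logseq > snapshot)
            break;  /* ascending input: nothing later is visible */
        int overridden = 0;
        /* A later visible record with the same ID shadows this one. */
        for (size_t j = i + 1; j < count && in[j].logseq <= snapshot; j++) {
            if (in[j].canonical_id == in[i].canonical_id) {
                overridden = 1;
                break;
            }
        }
        if (!overridden && !in[i].is_tombstone)
            out[n++] = in[i];
    }
    return n;
}
```

Running the same input at two snapshots shows the snapshot bound at work: a tombstone written after the snapshot has no effect on what that snapshot sees.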
---
## 8. Traversal Expansion with SIMD & Shards
* Input: TGK edge buffer, shard-local nodes
* Steps:
1. **Filter edges** using SIMD (type, role)
2. **Expand edges** to downstream nodes
3. **Append results** to depth-sorted buffer
4. Repeat, using the newly reached nodes as the next frontier, up to the requested traversal depth `d`
5. Maintain deterministic order:
* logseq ascending
* canonical edge ID tie-breaker
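
One depth step of the expansion could be sketched like this. The `tgk_edge_rec` layout, the frontier array, and the function names are assumptions made for illustration, and the scalar type check stands in for the SIMD filter. Edges are assumed to be stored in (logseq, edge ID) order already, so scanning them front to back keeps the output in that order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical edge record; the field set is assumed for illustration. */
struct tgk_edge_rec {
    uint64_t edge_id;
    uint64_t from_node;
    uint64_t to_node;
    uint32_t edge_type_key;
};

/* Linear membership test; a real frontier would likely use a set. */
static int in_frontier(const uint64_t *frontier, size_t n, uint64_t node)
{
    for (size_t i = 0; i < n; i++)
        if (frontier[i] == node)
            return 1;
    return 0;
}

/* Expand one depth: for every edge of type `edge_type` leaving a frontier
 * node, append (edge_id, to_node) to the output arrays, which must have
 * room for edge_count entries. Output order follows the stored edge order.
 * Returns the number of expanded edges. */
static size_t expand_one_depth(const struct tgk_edge_rec *edges, size_t edge_count,
                               const uint64_t *frontier, size_t frontier_count,
                               uint32_t edge_type,
                               uint64_t *out_edge_ids, uint64_t *out_node_ids)
{
    size_t n = 0;
    for (size_t i = 0; i < edge_count; i++) {
        if (edges[i].edge_type_key != edge_type)
            continue;   /* step 1: filter by edge type */
        if (!in_frontier(frontier, frontier_count, edges[i].from_node))
            continue;
        out_edge_ids[n] = edges[i].edge_id;   /* steps 2-3: expand and append */
        out_node_ids[n] = edges[i].to_node;
        n++;
    }
    return n;
}
```

Feeding `out_node_ids` back in as the next frontier repeats the step for the following depth.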
---
## 9. Projection & Aggregation Buffers
* Output buffer for projection:
```c
struct projection_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint64_t *node_ids;
    uint32_t *type_tags;
    uint64_t  count;
};
```
* Aggregation performed **in-place** or into **small accumulator structures**:
```c
struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
    // additional aggregates as needed
};
```
* Deterministic due to **logseq + canonical ID ordering**
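
A minimal sketch of accumulating into this structure, with per-shard partial results combined afterwards, is shown below. The function names are hypothetical. Unsigned integer counts and sums wrap modulo 2^64 and are independent of evaluation order; an order-sensitive aggregate such as a floating-point sum would instead have to be evaluated in the logseq + canonical ID order.

```c
#include <stddef.h>
#include <stdint.h>

struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
};

/* Fold `n` type tags into the accumulator. */
static void accumulate_type_tags(struct aggregation_accumulator *acc,
                                 const uint32_t *type_tags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        acc->count += 1;
        acc->sum_type_tag += type_tags[i];
    }
}

/* Combine a per-shard partial result into a global accumulator. */
static void combine_accumulators(struct aggregation_accumulator *dst,
                                 const struct aggregation_accumulator *src)
{
    dst->count += src->count;
    dst->sum_type_tag += src->sum_type_tag;
}
```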
---
## 10. Parallelism and SIMD Determinism
* **Shard-local parallelism** allowed
* **SIMD vectorization** allowed
* Global determinism ensured by:
1. Per-shard deterministic processing
2. Deterministic merge of shards
3. Shadowing/tombstone application post-merge
4. Logseq + canonical ID ordering preserved
* This guarantees **identical results across runs and nodes**
---
## 11. Summary of Runtime Buffers
| Buffer | Contents | Purpose |
| ------------------------- | ------------------------- | --------------------------------- |
| `shard_asl_segment` | artifact IDs, type tags | parallel scan/filter |
| `shard_tgk_segment` | edge IDs, edge type, role | parallel scan/filter |
| `tgk_traversal_buffer` | expanded edges/nodes | DAG traversal |
| `merge_buffer` | merged records | deterministic merge across shards |
| `projection_buffer` | projected fields | final output |
| `aggregation_accumulator` | counts/sums | deterministic aggregation |
| `tombstone_state` | canonical ID + logseq | shadowing |
* All buffers are **snapshot-bound and immutable where possible**
* SIMD/shard acceleration does not affect determinism
---
This completes a **full runtime execution model**, including **memory layout, SIMD/shard acceleration, traversal buffers, merge logic, tombstone handling, and projection/aggregation**.
---
Next steps could be:
1. **Formal operator pseudocode in C**, implementing SIMD and shard logic
2. **Memory-mapped index access routines** for ENC-ASL-TGK-INDEX
3. **Integration with PEL deterministic DAG execution receipts**