amduat-api/notes/Runtime Execution Semantics & Memory Layout for SIMD-Shard Acceleration.md
2026-01-17 07:37:47 +01:00


NOTE: Superseded by tier1 TGK/1 and vendor/amduat/tier1/tgk-1-core.md; retained for historical context.
This is a formal draft of the **runtime execution semantics and memory layout** for SIMD/shard acceleration of the unified execution plan over ENC-ASL-TGK-INDEX. It focuses on **parallel, deterministic, and snapshot-safe execution**.
---
# Runtime Execution Semantics & Memory Layout for SIMD/Shard Acceleration
---
## 1. Purpose
This specification defines:
* How operators in an execution plan are executed in memory
* How shards, SIMD, and filters are applied efficiently
* Determinism guarantees per snapshot
* Memory layout for index scans, filter evaluation, and traversal expansion
It is fully compatible with:
* **ENC-ASL-TGK-INDEX**
* **Merged ASL + TGK query execution plan**
* **C-struct operator definitions**
---
## 2. Memory Layout Principles
1. **Immutable segments**: Index segments are **read-only** during execution
2. **Shard-local buffers**: Each shard stores a segment of records in contiguous memory
3. **SIMD key arrays**: Routing keys, type tags, and edge type keys are stored in contiguous SIMD-aligned arrays for fast vectorized evaluation
4. **Canonical references**: artifact IDs and TGK edge IDs are stored in 64-bit aligned arrays for deterministic access
5. **Traversal buffers**: TGK traversal outputs are stored in logseq-sorted buffers to preserve determinism
---
## 3. Segment Loading and Sharding
* Each index segment is **assigned to a shard** based on routing key hash
* Segment header is mapped into memory; record arrays are memory-mapped if needed
* For ASL artifacts:
```c
struct shard_asl_segment {
    uint64_t *artifact_ids; // 64-bit canonical IDs
    uint32_t *type_tags;    // optional type tags
    uint8_t  *has_type_tag; // presence flags
    uint64_t  record_count;
};
```
* For TGK edges:
```c
struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;   // canonical TGK-CORE references
    uint32_t *edge_type_keys;
    uint8_t  *has_edge_type;
    uint8_t  *roles;          // from/to/both
    uint64_t  record_count;
};
```
* **Shard-local buffers** allow **parallel SIMD evaluation** without inter-shard contention
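The routing-key-to-shard assignment can be sketched as follows. The FNV-1a hash and the `shard_for_routing_key` helper are illustrative choices, not the hash mandated by ENC-ASL-TGK-INDEX; what matters is that the mapping is a pure function of the key, so assignment is deterministic across runs and nodes.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the raw routing key bytes (illustrative choice of hash). */
static uint64_t fnv1a64(const uint8_t *key, size_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Deterministic shard assignment: the same key always lands on the same shard. */
static uint32_t shard_for_routing_key(const uint8_t *key, size_t len,
                                      uint32_t shard_count) {
    return (uint32_t)(fnv1a64(key, len) % shard_count);
}
```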
---
## 4. SIMD-Accelerated Filter Evaluation
* SIMD applies vectorized comparison of:
* Artifact type tags
* Edge type keys
* Routing keys (pre-hashed)
* Example (AVX2 intrinsics, requires `<immintrin.h>`):
```c
// Compare 8 x 32-bit type tags per iteration; tail handling omitted.
const __m256i filter = _mm256_set1_epi32((int32_t)type_tag_filter);
for (uint64_t i = 0; i + 8 <= record_count; i += 8) {
    __m256i tags = _mm256_loadu_si256((const __m256i *)&type_tags[i]);
    __m256i eq   = _mm256_cmpeq_epi32(tags, filter);
    pass_mask[i / 8] = (uint8_t)_mm256_movemask_ps(_mm256_castsi256_ps(eq));
}
```
* Determinism guaranteed by **maintaining original order** after filtering (logseq ascending + canonical ID tie-breaker)
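A minimal sketch of the order-preserving step that follows the vector compare, assuming `pass_mask` holds one bit per record (the bit-per-record packing is an assumption of this sketch): because the mask is scanned in strictly ascending record order, the segment's logseq-ascending, canonical-ID-tie-broken order survives filtering unchanged.

```c
#include <stdint.h>

/* Scan the pass mask in record order and emit surviving record indices.
   The ascending scan preserves the original order of the segment. */
static uint64_t compact_in_order(const uint8_t *pass_mask,
                                 uint64_t record_count,
                                 uint64_t *out_indices) {
    uint64_t n = 0;
    for (uint64_t i = 0; i < record_count; i++) {
        if (pass_mask[i / 8] & (uint8_t)(1u << (i % 8)))
            out_indices[n++] = i;
    }
    return n;
}
```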
---
## 5. Traversal Buffer Semantics (TGK)
* TGKTraversal operator maintains:
```c
struct tgk_traversal_buffer {
    uint64_t *edge_ids; // expanded edges
    uint64_t *node_ids; // corresponding nodes
    uint32_t  depth;    // current traversal depth
    uint64_t  count;    // number of records in buffer
};
```
* Buffers are **logseq-sorted per depth** to preserve deterministic traversal
* Optional **per-shard buffers** for parallel traversal
---
## 6. Merge Operator Semantics
* Merges **multiple shard-local streams**:
```c
struct merge_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t  count;
};
```
* Merge algorithm: **deterministic heap merge**
1. Compare `logseq` ascending
2. Tie-break with canonical ID
* Ensures same output regardless of shard execution order
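The merge order above reduces to a single two-key comparator; `merge_key` is a hypothetical head-of-stream record shape used for illustration. Because the comparator defines a total order, the heap merge produces the same output no matter which shard's stream happens to be consumed first.

```c
#include <stdint.h>

/* Head-of-stream key for the deterministic heap merge (hypothetical shape). */
struct merge_key {
    uint64_t logseq;
    uint64_t canonical_id; /* tie-breaker */
};

/* Returns nonzero if a orders before b: logseq ascending,
   then canonical ID ascending on ties. */
static int merge_key_less(const struct merge_key *a,
                          const struct merge_key *b) {
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq;
    return a->canonical_id < b->canonical_id;
}
```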
---
## 7. Tombstone Shadowing
* Shadowing is **applied post-merge**:
```c
struct tombstone_state {
    uint64_t canonical_id;
    uint64_t max_logseq_seen;
    uint8_t  is_tombstoned;
};
```
* Algorithm:
1. Iterate merged buffer
2. For each canonical ID, keep only **latest logseq ≤ snapshot**
3. Drop tombstoned or overridden entries
* Deterministic and **snapshot-safe**
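The three steps above can be sketched as follows. `merged_record` and the quadratic shadow scan are illustrative only; a real implementation would track `tombstone_state` entries in a hash table keyed by canonical ID. Determinism follows because both the input order (post-merge) and the snapshot bound are fixed.

```c
#include <stdint.h>

/* One record of the merged stream, sorted by (logseq, canonical ID). */
struct merged_record {
    uint64_t canonical_id;
    uint64_t logseq;
    uint8_t  is_tombstone;
};

/* Keep, per canonical ID, only the latest record with logseq <= snapshot;
   drop IDs whose surviving record is a tombstone. Quadratic for clarity. */
static uint64_t apply_shadowing(const struct merged_record *in, uint64_t n,
                                uint64_t snapshot,
                                struct merged_record *out) {
    uint64_t m = 0;
    for (uint64_t i = 0; i < n; i++) {
        if (in[i].logseq > snapshot)
            continue;                      /* beyond the snapshot */
        int shadowed = 0;
        for (uint64_t j = i + 1; j < n; j++) {
            if (in[j].canonical_id == in[i].canonical_id &&
                in[j].logseq <= snapshot) {
                shadowed = 1;              /* a later in-snapshot version exists */
                break;
            }
        }
        if (!shadowed && !in[i].is_tombstone)
            out[m++] = in[i];
    }
    return m;
}
```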
---
## 8. Traversal Expansion with SIMD & Shards
* Input: TGK edge buffer, shard-local nodes
* Steps:
1. **Filter edges** using SIMD (type, role)
2. **Expand edges** to downstream nodes
3. **Append results** to depth-sorted buffer
4. Repeat for depth `d` if traversal requested
5. Maintain deterministic order:
* logseq ascending
* canonical edge ID tie-breaker
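The steps above can be sketched as one scalar expansion depth (the SIMD filter itself is shown in section 4). `edge_dst`, an auxiliary array mapping each edge record to its downstream node, is an assumption of this sketch; `shard_tgk_segment` is the section 3 layout, repeated here so the example is self-contained.

```c
#include <stdint.h>

/* Shard-local TGK segment layout (as in section 3). */
struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;
    uint32_t *edge_type_keys;
    uint8_t  *has_edge_type;
    uint8_t  *roles;
    uint64_t  record_count;
};

/* One expansion depth: filter by edge type and role, then append matching
   (edge, downstream node) pairs in input order, preserving the segment's
   logseq-ascending, canonical-ID-tie-broken ordering. */
static uint64_t expand_depth(const struct shard_tgk_segment *seg,
                             const uint64_t *edge_dst,
                             uint32_t want_type, uint8_t want_role,
                             uint64_t *out_edges, uint64_t *out_nodes) {
    uint64_t n = 0;
    for (uint64_t i = 0; i < seg->record_count; i++) {
        if (seg->has_edge_type[i] &&
            seg->edge_type_keys[i] == want_type &&
            seg->roles[i] == want_role) {
            out_edges[n] = seg->tgk_edge_ids[i];
            out_nodes[n] = edge_dst[i];
            n++;
        }
    }
    return n;
}
```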
---
## 9. Projection & Aggregation Buffers
* Output buffer for projection:
```c
struct projection_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint64_t *node_ids;
    uint32_t *type_tags;
    uint64_t  count;
};
```
* Aggregation performed **in-place** or into **small accumulator structures**:
```c
struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
    // additional aggregates as needed
};
```
* Deterministic due to **logseq + canonical ID ordering**
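The accumulator update can be sketched as a simple fold over a shard's already-ordered tags; since unsigned count/sum updates are associative and the input order is fixed by the logseq + canonical ID ordering, the result is identical across runs.

```c
#include <stdint.h>

/* Aggregation accumulator (as in section 9). */
struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
};

/* Fold one shard's (already ordered) type tags into the accumulator. */
static void accumulate(struct aggregation_accumulator *acc,
                       const uint32_t *type_tags, uint64_t count) {
    for (uint64_t i = 0; i < count; i++) {
        acc->count++;
        acc->sum_type_tag += type_tags[i];
    }
}
```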
---
## 10. Parallelism and SIMD Determinism
* **Shard-local parallelism** allowed
* **SIMD vectorization** allowed
* Global determinism ensured by:
1. Per-shard deterministic processing
2. Deterministic merge of shards
3. Shadowing/tombstone application post-merge
4. Logseq + canonical ID ordering preserved
* This guarantees **identical results across runs and nodes**
---
## 11. Summary of Runtime Buffers
| Buffer | Contents | Purpose |
| ------------------------- | ------------------------- | --------------------------------- |
| `shard_asl_segment` | artifact IDs, type tags | parallel scan/filter |
| `shard_tgk_segment` | edge IDs, edge type, role | parallel scan/filter |
| `tgk_traversal_buffer` | expanded edges/nodes | DAG traversal |
| `merge_buffer` | merged records | deterministic merge across shards |
| `projection_buffer` | projected fields | final output |
| `aggregation_accumulator` | counts/sums | deterministic aggregation |
| `tombstone_state` | canonical ID + logseq | shadowing |
* All buffers are **snapshot-bound and immutable where possible**
* SIMD/shard acceleration does not affect determinism
---
This completes the **full runtime execution model**: **memory layout, SIMD/shard acceleration, traversal buffers, merge logic, tombstone handling, and projection/aggregation**.
---
Possible next steps:
1. **Formal operator pseudocode in C**, implementing SIMD and shard logic
2. **Memory-mapped index access routines** for ENC-ASL-TGK-INDEX
3. **Integration with PEL deterministic DAG execution receipts**