Here is a formal draft of the **runtime execution semantics and memory layout** for SIMD/shard acceleration of the unified execution plan over ENC-ASL-TGK-INDEX. It focuses on **parallel, deterministic, and snapshot-safe execution**.

---
# Runtime Execution Semantics & Memory Layout for SIMD/Shard Acceleration
---
## 1. Purpose

This specification defines:

* How operators in an execution plan are executed in memory
* How shards, SIMD, and filters are applied efficiently
* Determinism guarantees per snapshot
* Memory layout for index scans, filter evaluation, and traversal expansion

It is fully compatible with:

* **ENC-ASL-TGK-INDEX**
* **Merged ASL + TGK query execution plan**
* **C-struct operator definitions**
---
## 2. Memory Layout Principles

1. **Immutable segments**: index segments are **read-only** during execution.
2. **Shard-local buffers**: each shard stores its segment records in contiguous memory.
3. **SIMD key arrays**: routing keys, type tags, and edge type keys are stored in contiguous, SIMD-aligned arrays (one array per field) for fast vectorized evaluation.
4. **Canonical references**: artifact IDs and TGK edge IDs are stored in 64-bit-aligned arrays, so every ID is a fixed-width load at a known offset.
5. **Traversal buffers**: TGK traversal outputs are stored in logseq-sorted buffers to preserve determinism.
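Principle 3 calls for SIMD-aligned key arrays. A minimal sketch using C11 `aligned_alloc`, assuming 32-byte alignment (one 256-bit AVX2 register) and padding each array up to a whole number of vector blocks; the helper name `alloc_simd_u32` is hypothetical. Pass 64 instead of 32 when targeting 512-bit registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates a zeroed uint32_t key array (type tags,
 * edge type keys, ...) whose base address is aligned to `align` bytes.
 * The byte size is rounded up to a multiple of `align`, which C11
 * aligned_alloc requires. Padding lanes are zero, so vector loops must
 * still mask out indices >= count. No overflow check on count: sketch only. */
static uint32_t *alloc_simd_u32(size_t count, size_t align, size_t *padded_count) {
    size_t bytes = count * sizeof(uint32_t);
    size_t padded = (bytes + align - 1) / align * align;
    if (padded == 0)
        padded = align;                      /* never request a zero-size block */
    uint32_t *p = aligned_alloc(align, padded);
    if (p == NULL)
        return NULL;
    memset(p, 0, padded);
    if (padded_count != NULL)
        *padded_count = padded / sizeof(uint32_t);
    return p;
}
```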
---
## 3. Segment Loading and Sharding

* Each index segment is **assigned to a shard** based on a routing-key hash.
* The segment header is mapped into memory; record arrays are memory-mapped as needed.
* For ASL artifacts:

```c
#include <stdint.h>

struct shard_asl_segment {
    uint64_t *artifact_ids;   // 64-bit canonical IDs
    uint32_t *type_tags;      // optional type tags (valid where has_type_tag[i] != 0)
    uint8_t  *has_type_tag;   // per-record presence flag for type_tags
    uint64_t  record_count;   // length of each array above
};
```

* For TGK edges:

```c
#include <stdint.h>

struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;    // canonical TGK-CORE references
    uint32_t *edge_type_keys;  // edge type (valid where has_edge_type[i] != 0)
    uint8_t  *has_edge_type;   // per-record presence flag for edge_type_keys
    uint8_t  *roles;           // from/to/both
    uint64_t  record_count;    // length of each array above
};
```

* **Shard-local buffers** allow **parallel SIMD evaluation** without inter-shard contention.
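Segment-to-shard assignment only needs a hash that every node computes identically. The sketch is an assumption, not the spec's hash: it uses the SplitMix64 finalizer as a fixed, unseeded mixer and reduces modulo the shard count. A per-process seeded hash would break cross-node agreement, which is why the constants are fixed. `mix64` and `shard_for_routing_key` are hypothetical names.

```c
#include <stdint.h>

/* SplitMix64 finalizer: a fixed, seed-free 64-bit mixing function. */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hypothetical assignment rule: same routing key -> same shard on every node.
 * shard_count must be > 0. */
static uint32_t shard_for_routing_key(uint64_t routing_key, uint32_t shard_count) {
    return (uint32_t)(mix64(routing_key) % shard_count);
}
```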
---
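## 4. SIMD-Accelerated Filter Evaluation

* SIMD applies vectorized comparisons to:

  * Artifact type tags
  * Edge type keys
  * Routing keys (pre-hashed)

* Example (AVX2 intrinsics, eight 32-bit lanes per iteration):

```c
#include <immintrin.h>  /* AVX2; compile with -mavx2 */
#include <stddef.h>
#include <stdint.h>
/* Bit j of pass_mask[i / 8] is set iff type_tags[i + j] == want. The record_count % 8 tail and has_type_tag flags are handled by scalar code. */
void type_tag_pass_masks(const uint32_t *type_tags, size_t record_count, uint32_t want, uint8_t *pass_mask) {
    for (size_t i = 0; i + 8 <= record_count; i += 8) {
        __m256i eq = _mm256_cmpeq_epi32(_mm256_loadu_si256((const __m256i *)&type_tags[i]), _mm256_set1_epi32((int)want));
        pass_mask[i / 8] = (uint8_t)_mm256_movemask_ps(_mm256_castsi256_ps(eq));
    }
}
```

* Determinism is guaranteed because filtering **preserves the input order** (logseq ascending, canonical ID as tie-breaker): passing records are emitted in ascending record index.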
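The determinism claim rests on how passing records are compacted after the compare: consuming each block's pass mask from the lowest set bit upward emits records in their original index order. This portable sketch emulates the per-block mask in scalar code so it runs on any CPU. The 8-lane block width (32-bit lanes in a 256-bit AVX2 register), honoring `has_type_tag` inside the mask, and the names `block_mask` and `filter_by_type_tag` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* assumption: 8 x 32-bit lanes, as with 256-bit AVX2 registers */

/* Portable stand-in for a vector compare + movemask: bit j is set iff record
 * base + j exists, has a type tag, and that tag equals `want`. */
static unsigned block_mask(const uint32_t *tags, const uint8_t *has_tag,
                           size_t base, size_t n, uint32_t want) {
    unsigned mask = 0;
    for (size_t j = 0; j < LANES && base + j < n; j++)
        if (has_tag[base + j] && tags[base + j] == want)
            mask |= 1u << j;
    return mask;
}

/* Order-preserving compaction: within each block, set bits are consumed from
 * lowest to highest, so passing IDs keep their input order. Returns the count. */
static size_t filter_by_type_tag(const uint32_t *tags, const uint8_t *has_tag,
                                 const uint64_t *ids, size_t n, uint32_t want,
                                 uint64_t *out) {
    size_t k = 0;
    for (size_t base = 0; base < n; base += LANES) {
        unsigned mask = block_mask(tags, has_tag, base, n, want);
        while (mask != 0) {
            unsigned j = 0;
            while (((mask >> j) & 1u) == 0)
                j++;                          /* index of the lowest set bit */
            out[k++] = ids[base + j];
            mask &= mask - 1;                 /* clear the lowest set bit */
        }
    }
    return k;
}
```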
---
## 5. Traversal Buffer Semantics (TGK)

* The TGKTraversal operator maintains:

```c
#include <stdint.h>

struct tgk_traversal_buffer {
    uint64_t *edge_ids;  // expanded edges
    uint64_t *node_ids;  // node corresponding to edge_ids[i]
    uint32_t  depth;     // current traversal depth
    uint64_t  count;     // number of records in buffer
};
```

* Buffers are **logseq-sorted per depth** to preserve deterministic traversal.
* **Per-shard buffers** are optional and enable parallel traversal.
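Sorting one depth level by (logseq, canonical edge ID) makes the level's order independent of which shard or thread produced each record. `struct tgk_traversal_buffer` has no logseq field, so this sketch assumes a hypothetical record type `traversal_rec` that carries logseq next to each edge and node ID. `qsort` is not stable, so the sketch also assumes (logseq, edge_id) pairs are unique within a level.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-record view of one traversal depth (logseq is assumed). */
struct traversal_rec {
    uint64_t logseq;
    uint64_t edge_id;
    uint64_t node_id;
};

/* Total order: logseq ascending, canonical edge ID as tie-breaker. */
static int cmp_traversal_rec(const void *pa, const void *pb) {
    const struct traversal_rec *a = pa, *b = pb;
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq ? -1 : 1;
    if (a->edge_id != b->edge_id)
        return a->edge_id < b->edge_id ? -1 : 1;
    return 0;
}

/* Sorts one depth level in place into deterministic order. */
static void sort_depth_level(struct traversal_rec *recs, size_t count) {
    qsort(recs, count, sizeof *recs, cmp_traversal_rec);
}
```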
---
## 6. Merge Operator Semantics

* Merges **multiple shard-local streams** into one buffer:

```c
#include <stdint.h>

struct merge_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t  count;
};
```

* Merge algorithm: **deterministic heap merge** over shard streams that are each already sorted by the same key:

  1. Compare `logseq` ascending.
  2. Break ties with the canonical ID.

* Because the heap always emits the smallest remaining (logseq, canonical ID) pair, the output is the same regardless of the order in which shards finish executing.
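A minimal k-way heap merge, assuming each shard stream is already sorted by (logseq, canonical ID) and that those pairs are unique across streams. `merge_buffer` has no logseq column, so the record type `rec` and the names `stream` and `heap_merge` are hypothetical. The heap stores stream indices and always pops the stream whose current record is smallest, so the output depends only on the data, not on when each shard finished.

```c
#include <stddef.h>
#include <stdint.h>

struct rec { uint64_t logseq; uint64_t id; };
struct stream { const struct rec *recs; size_t len; size_t pos; };

/* Merge key: logseq ascending, canonical ID as tie-breaker. */
static int rec_less(const struct rec *a, const struct rec *b) {
    return a->logseq != b->logseq ? a->logseq < b->logseq : a->id < b->id;
}

/* Compares two streams by their current head record. */
static int head_less(const struct stream *s, size_t x, size_t y) {
    return rec_less(&s[x].recs[s[x].pos], &s[y].recs[s[y].pos]);
}

/* Restores the min-heap property below position i. */
static void sift_down(size_t *heap, size_t n, size_t i, const struct stream *s) {
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && head_less(s, heap[l], heap[m])) m = l;
        if (r < n && head_less(s, heap[r], heap[m])) m = r;
        if (m == i) return;
        size_t t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
}

/* Merges k pre-sorted streams into out; heap needs k slots. Returns count. */
static size_t heap_merge(struct stream *s, size_t k, size_t *heap, struct rec *out) {
    size_t n = 0, w = 0;
    for (size_t i = 0; i < k; i++)
        if (s[i].pos < s[i].len) heap[n++] = i;   /* skip empty streams */
    for (size_t i = n / 2; i-- > 0;)
        sift_down(heap, n, i, s);
    while (n > 0) {
        struct stream *top = &s[heap[0]];
        out[w++] = top->recs[top->pos++];
        if (top->pos == top->len)
            heap[0] = heap[--n];                   /* stream exhausted */
        sift_down(heap, n, 0, s);
    }
    return w;
}
```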
---
## 7. Tombstone Shadowing

* Shadowing is **applied post-merge**:

```c
#include <stdint.h>

struct tombstone_state {
    uint64_t canonical_id;
    uint64_t max_logseq_seen;  // latest logseq <= snapshot seen for this ID
    uint8_t  is_tombstoned;    // 1 if that latest entry is a tombstone
};
```

* Algorithm:

  1. Iterate over the merged buffer, ignoring entries with logseq > snapshot.
  2. For each canonical ID, keep only the entry with the **latest logseq ≤ snapshot**.
  3. Drop that entry if it is a tombstone; all older entries it overrides are dropped as well.

* The result is deterministic and **snapshot-safe**.
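The algorithm in this section iterates the merged buffer with per-ID state (`tombstone_state`). This sketch swaps that for a sort-based grouping to avoid a hash table: drop entries newer than the snapshot, sort by (canonical ID, logseq), keep the last entry of each ID unless it is a tombstone, then re-sort survivors into (logseq, ID) order. The `entry` type with a per-entry `tombstone` flag and the name `apply_shadowing` are assumptions, as is uniqueness of (ID, logseq) pairs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct entry { uint64_t logseq; uint64_t id; uint8_t tombstone; };

static int by_id_then_logseq(const void *pa, const void *pb) {
    const struct entry *a = pa, *b = pb;
    if (a->id != b->id) return a->id < b->id ? -1 : 1;
    if (a->logseq != b->logseq) return a->logseq < b->logseq ? -1 : 1;
    return 0;
}

static int by_logseq_then_id(const void *pa, const void *pb) {
    const struct entry *a = pa, *b = pb;
    if (a->logseq != b->logseq) return a->logseq < b->logseq ? -1 : 1;
    if (a->id != b->id) return a->id < b->id ? -1 : 1;
    return 0;
}

/* Compacts e in place; returns the number of surviving entries. */
static size_t apply_shadowing(struct entry *e, size_t n, uint64_t snapshot) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++)                 /* ignore entries newer than snapshot */
        if (e[i].logseq <= snapshot) e[m++] = e[i];
    qsort(e, m, sizeof *e, by_id_then_logseq);     /* group versions of each ID */
    size_t k = 0;
    for (size_t i = 0; i < m; i++) {
        int last_of_id = (i + 1 == m) || (e[i + 1].id != e[i].id);
        if (last_of_id && !e[i].tombstone)         /* latest version wins */
            e[k++] = e[i];
    }
    qsort(e, k, sizeof *e, by_logseq_then_id);     /* restore merge order */
    return k;
}
```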
---
## 8. Traversal Expansion with SIMD & Shards

* Input: a TGK edge buffer and shard-local nodes
* Steps:

  1. **Filter edges** using SIMD (type, role).
  2. **Expand edges** to downstream nodes.
  3. **Append results** to the depth-sorted buffer.
  4. Repeat for each level up to depth `d` if a multi-hop traversal is requested.
  5. Maintain deterministic order:

     * logseq ascending
     * canonical edge ID as tie-breaker
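A scalar sketch of steps 1 to 4, assuming the edge buffer is already sorted by (logseq, canonical edge ID) so that a linear scan emits each level in deterministic order without a re-sort. The `edge` record, the `expand` signature, and treating `from` to `to` as the traversal direction are hypothetical; role filtering and the SIMD compare are left out to keep the control flow visible. Output arrays must hold up to `n_edges * max_depth` entries, because cyclic graphs can re-emit edges at later depths.

```c
#include <stddef.h>
#include <stdint.h>

struct edge { uint64_t logseq, edge_id, from, to; uint32_t type; };

static int contains(const uint64_t *v, size_t n, uint64_t x) {
    for (size_t i = 0; i < n; i++)
        if (v[i] == x) return 1;
    return 0;
}

/* Expands up to max_depth levels from the seed frontier along edges of
 * edge_type. Writes edge IDs and their 1-based depth to out_edges/out_depth;
 * frontier and next need cap slots. Returns the number of edges emitted. */
static size_t expand(const struct edge *edges, size_t n_edges, uint32_t edge_type,
                     uint64_t *frontier, size_t n_frontier, uint64_t *next, size_t cap,
                     unsigned max_depth, uint64_t *out_edges, unsigned *out_depth) {
    size_t emitted = 0;
    for (unsigned d = 1; d <= max_depth && n_frontier > 0; d++) {
        size_t n_next = 0;
        for (size_t i = 0; i < n_edges; i++) {
            const struct edge *e = &edges[i];
            /* step 1: filter by type and by membership of the source node */
            if (e->type != edge_type || !contains(frontier, n_frontier, e->from))
                continue;
            /* step 3: append in scan order, i.e. (logseq, edge_id) order */
            out_edges[emitted] = e->edge_id;
            out_depth[emitted++] = d;
            /* step 2: collect downstream nodes for the next level */
            if (n_next < cap && !contains(next, n_next, e->to))
                next[n_next++] = e->to;
        }
        /* step 4: the next level's nodes become the new frontier */
        for (size_t j = 0; j < n_next; j++)
            frontier[j] = next[j];
        n_frontier = n_next;
    }
    return emitted;
}
```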
---
## 9. Projection & Aggregation Buffers

* Output buffer for projection:

```c
#include <stdint.h>

struct projection_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint64_t *node_ids;
    uint32_t *type_tags;
    uint64_t  count;
};
```

* Aggregation is performed **in place** or into **small accumulator structures**:

```c
#include <stdint.h>

struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
    // additional aggregates as needed
};
```

* Aggregation is deterministic because records are consumed in **logseq + canonical ID order**.
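A sketch of per-shard accumulators combined after the parallel phase; the helper names `agg_update` and `agg_merge` are hypothetical. Unsigned 64-bit addition is associative and commutative (wraparound included), so combining shards in any order yields the same `count` and `sum_type_tag`. Floating-point aggregates lack this property and would depend on the fixed logseq + canonical ID order.

```c
#include <stddef.h>
#include <stdint.h>

struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
};

/* Folds one shard's projected type_tags column into its accumulator. */
static void agg_update(struct aggregation_accumulator *acc,
                       const uint32_t *type_tags, size_t n) {
    for (size_t i = 0; i < n; i++) {
        acc->count += 1;
        acc->sum_type_tag += type_tags[i];
    }
}

/* Combines per-shard accumulators; the order of calls does not matter. */
static void agg_merge(struct aggregation_accumulator *into,
                      const struct aggregation_accumulator *from) {
    into->count += from->count;
    into->sum_type_tag += from->sum_type_tag;
}
```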
---
## 10. Parallelism and SIMD Determinism

* **Shard-local parallelism** is allowed.
* **SIMD vectorization** is allowed.
* Global determinism is ensured by:

  1. Per-shard deterministic processing
  2. Deterministic merge of shards
  3. Shadowing/tombstone application post-merge
  4. Logseq + canonical ID ordering preserved throughout

* Together these guarantee **identical results across runs and nodes** for the same snapshot.
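One way to check the "identical results across runs and nodes" guarantee in practice is to compare a digest of the ordered output. This is not part of the spec: the sketch uses 64-bit FNV-1a over the output IDs, feeding each ID as little-endian bytes so the digest does not depend on host endianness. `result_fingerprint` is a hypothetical name. Matching digests are strong evidence, not proof, of identical output; a mismatch always signals a divergence.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the ordered IDs; order-sensitive by construction. */
static uint64_t result_fingerprint(const uint64_t *ids, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;          /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        for (int b = 0; b < 8; b++) {            /* little-endian byte order */
            h ^= (ids[i] >> (8 * b)) & 0xffu;
            h *= 0x100000001b3ULL;               /* FNV-1a 64-bit prime */
        }
    }
    return h;
}
```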
---
## 11. Summary of Runtime Buffers

| Buffer                    | Contents                  | Purpose                           |
| ------------------------- | ------------------------- | --------------------------------- |
| `shard_asl_segment`       | artifact IDs, type tags   | parallel scan/filter              |
| `shard_tgk_segment`       | edge IDs, edge type, role | parallel scan/filter              |
| `tgk_traversal_buffer`    | expanded edges/nodes      | DAG traversal                     |
| `merge_buffer`            | merged records            | deterministic merge across shards |
| `projection_buffer`       | projected fields          | final output                      |
| `aggregation_accumulator` | counts/sums               | deterministic aggregation         |
| `tombstone_state`         | canonical ID + logseq     | shadowing                         |

* All buffers are **snapshot-bound and immutable where possible**.
* SIMD/shard acceleration does not affect determinism.
---
This completes a **full runtime execution model**, including **memory layout, SIMD/shard acceleration, traversal buffers, merge logic, tombstone handling, and projection/aggregation**.

---
Next steps could be:

1. **Formal operator pseudocode in C**, implementing SIMD and shard logic
2. **Memory-mapped index access routines** for ENC-ASL-TGK-INDEX
3. **Integration with PEL deterministic DAG execution receipts**

Do you want me to draft the **operator pseudocode** next?