NOTE: Superseded by tier1 TGK/1 and vendor/amduat/tier1/tgk-1-core.md; retained for historical context.

# Runtime Execution Semantics & Memory Layout for SIMD/Shard Acceleration

This document specifies parallel, deterministic, and snapshot-safe execution of the unified execution plan over ENC-ASL-TGK-INDEX.

---

## 1. Purpose

This specification defines:

* How operators in an execution plan are executed in memory
* How shards, SIMD, and filters are applied efficiently
* Determinism guarantees per snapshot
* Memory layout for index scans, filter evaluation, and traversal expansion

It is fully compatible with:

* **ENC-ASL-TGK-INDEX**
* The **merged ASL + TGK query execution plan**
* The **C-struct operator definitions**

---

## 2. Memory Layout Principles

1. **Immutable segments**: index segments are **read-only** during execution
2. **Shard-local buffers**: each shard stores a segment of records in contiguous memory
3. **SIMD key arrays**: routing keys, type tags, and edge type keys are stored in contiguous, SIMD-aligned arrays for fast vectorized evaluation
4. **Canonical references**: artifact IDs and TGK edge IDs are stored in 64-bit-aligned arrays for deterministic access
5. **Traversal buffers**: TGK traversal outputs are stored in logseq-sorted buffers to preserve determinism

---
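As an illustrative sketch (not part of the spec), principles 3 and 4 might be realized with C11 `aligned_alloc`. The 32-byte alignment chosen here matches one AVX2 vector and is an assumption; the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: allocate a SIMD-aligned key array (principle 3).
// C11 aligned_alloc requires the size to be a multiple of the alignment,
// so the byte count is rounded up to the next 32-byte boundary.
static uint32_t *alloc_simd_key_array(size_t n) {
    size_t bytes = n * sizeof(uint32_t);
    size_t rounded = (bytes + 31) & ~(size_t)31;  // round up to 32 bytes
    return aligned_alloc(32, rounded);
}

// 64-bit canonical ID array (principle 4): malloc's natural alignment
// already satisfies 8-byte alignment for uint64_t on common platforms.
static uint64_t *alloc_canonical_id_array(size_t n) {
    return malloc(n * sizeof(uint64_t));
}
```

In a real engine these arrays would be carved out of a memory-mapped segment rather than heap-allocated, but the alignment requirement is the same.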
## 3. Segment Loading and Sharding

* Each index segment is **assigned to a shard** based on its routing key hash
* The segment header is mapped into memory; record arrays are memory-mapped if needed
* For ASL artifacts:

```c
struct shard_asl_segment {
    uint64_t *artifact_ids;   // 64-bit canonical IDs
    uint32_t *type_tags;      // optional type tags
    uint8_t  *has_type_tag;   // presence flags
    uint64_t  record_count;
};
```

* For TGK edges:

```c
struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;   // canonical TGK-CORE references
    uint32_t *edge_type_keys;
    uint8_t  *has_edge_type;
    uint8_t  *roles;          // from/to/both
    uint64_t  record_count;
};
```

* **Shard-local buffers** allow **parallel SIMD evaluation** without inter-shard contention

---

## 4. SIMD-Accelerated Filter Evaluation

* SIMD applies vectorized comparisons of:
  * Artifact type tags
  * Edge type keys
  * Routing keys (pre-hashed)
* Example (AVX2 intrinsics; assumes `record_count` is a multiple of 8 and `out_count` starts at 0):

```c
__m256i filter = _mm256_set1_epi32(type_tag_filter);
for (size_t i = 0; i < record_count; i += 8) {
    __m256i tags = _mm256_loadu_si256((const __m256i *)&type_tags[i]);
    __m256i hits = _mm256_cmpeq_epi32(tags, filter);
    uint32_t mask = (uint32_t)_mm256_movemask_ps(_mm256_castsi256_ps(hits));
    for (unsigned b = 0; b < 8; b++)      // emit survivors in original order
        if (mask & (1u << b)) output_buffer[out_count++] = i + b;
}
```

* Determinism is guaranteed by **maintaining the original order** after filtering (logseq ascending, with canonical ID as tie-breaker)

---

## 5. Traversal Buffer Semantics (TGK)

* The TGKTraversal operator maintains:

```c
struct tgk_traversal_buffer {
    uint64_t *edge_ids;   // expanded edges
    uint64_t *node_ids;   // corresponding nodes
    uint32_t  depth;      // current traversal depth
    uint64_t  count;      // number of records in buffer
};
```

* Buffers are **logseq-sorted per depth** to preserve deterministic traversal
* Optional **per-shard buffers** enable parallel traversal

---

## 6. Merge Operator Semantics

* Merges **multiple shard-local streams**:

```c
struct merge_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t  count;
};
```

* Merge algorithm: **deterministic heap merge**
  1. Compare `logseq` ascending
  2. Tie-break on canonical ID
* Ensures the same output regardless of shard execution order

---
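A minimal two-stream sketch of the deterministic merge rule above; a real engine would run a k-way heap over all shard streams, but the ordering contract is the same. The record struct and field names here are illustrative, not part of the spec.

```c
#include <stdint.h>
#include <stddef.h>

// Illustrative record: one entry from a shard-local stream.
struct rec { uint64_t logseq; uint64_t canonical_id; };

// Deterministic ordering: logseq ascending, canonical ID as tie-breaker.
static int rec_less(const struct rec *a, const struct rec *b) {
    if (a->logseq != b->logseq) return a->logseq < b->logseq;
    return a->canonical_id < b->canonical_id;
}

// Two-way merge of already-sorted shard streams. The output order is
// independent of which shard finished first. Returns records written.
static size_t merge2(const struct rec *a, size_t na,
                     const struct rec *b, size_t nb,
                     struct rec *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = rec_less(&a[i], &b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];   // drain remaining records
    while (j < nb) out[k++] = b[j++];
    return k;
}
```

Because `rec_less` is a strict total order over `(logseq, canonical_id)`, the merged sequence is identical across runs and nodes regardless of shard scheduling.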
## 7. Tombstone Shadowing

* Shadowing is **applied post-merge**:

```c
struct tombstone_state {
    uint64_t canonical_id;
    uint64_t max_logseq_seen;
    uint8_t  is_tombstoned;
};
```

* Algorithm:
  1. Iterate over the merged buffer
  2. For each canonical ID, keep only the **latest logseq ≤ snapshot**
  3. Drop tombstoned or overridden entries
* Deterministic and **snapshot-safe**

---

## 8. Traversal Expansion with SIMD & Shards

* Input: TGK edge buffer, shard-local nodes
* Steps:
  1. **Filter edges** using SIMD (by type and role)
  2. **Expand edges** to downstream nodes
  3. **Append results** to the depth-sorted buffer
  4. Repeat up to depth `d` if a multi-hop traversal is requested
  5. Maintain deterministic order:
     * logseq ascending
     * canonical edge ID tie-breaker

---

## 9. Projection & Aggregation Buffers

* Output buffer for projection:

```c
struct projection_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint64_t *node_ids;
    uint32_t *type_tags;
    uint64_t  count;
};
```

* Aggregation is performed **in-place** or into **small accumulator structures**:

```c
struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
    // additional aggregates as needed
};
```

* Deterministic due to **logseq + canonical ID ordering**

---

## 10. Parallelism and SIMD Determinism

* **Shard-local parallelism** is allowed
* **SIMD vectorization** is allowed
* Global determinism is ensured by:
  1. Per-shard deterministic processing
  2. Deterministic merge of shard streams
  3. Shadowing/tombstone application post-merge
  4. Preservation of logseq + canonical ID ordering
* This guarantees **identical results across runs and nodes**

---
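The post-merge shadowing step (Section 7) can be sketched as follows, assuming the merged buffer is already in `(logseq, canonical ID)` order. The struct and field names are illustrative; the quadratic scan is for clarity only, and a real engine would track `tombstone_state` in a hash map keyed by canonical ID.

```c
#include <stdint.h>
#include <stddef.h>

// Illustrative merged entry after the deterministic merge.
struct merged_rec {
    uint64_t canonical_id;
    uint64_t logseq;
    uint8_t  is_tombstone;   // 1 if this version deletes the entity
};

// Post-merge shadowing: for each canonical ID, keep only the latest
// version with logseq <= snapshot; drop the entity entirely if that
// version is a tombstone. Returns the number of surviving records.
static size_t apply_shadowing(const struct merged_rec *in, size_t n,
                              uint64_t snapshot, struct merged_rec *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].logseq > snapshot) continue;   // beyond the snapshot
        int shadowed = 0;                        // newer in-snapshot version?
        for (size_t j = i + 1; j < n; j++)
            if (in[j].canonical_id == in[i].canonical_id &&
                in[j].logseq <= snapshot) { shadowed = 1; break; }
        if (!shadowed && !in[i].is_tombstone)
            out[k++] = in[i];
    }
    return k;
}
```

Because shadowing runs after the deterministic merge and depends only on the snapshot bound, its output is the same regardless of shard execution order.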
## 11. Summary of Runtime Buffers

| Buffer                    | Contents                  | Purpose                           |
| ------------------------- | ------------------------- | --------------------------------- |
| `shard_asl_segment`       | artifact IDs, type tags   | parallel scan/filter              |
| `shard_tgk_segment`       | edge IDs, edge type, role | parallel scan/filter              |
| `tgk_traversal_buffer`    | expanded edges/nodes      | DAG traversal                     |
| `merge_buffer`            | merged records            | deterministic merge across shards |
| `projection_buffer`       | projected fields          | final output                      |
| `aggregation_accumulator` | counts/sums               | deterministic aggregation         |
| `tombstone_state`         | canonical ID + logseq     | shadowing                         |

* All buffers are **snapshot-bound and immutable where possible**
* SIMD/shard acceleration does not affect determinism

---

This completes the **runtime execution model**, covering **memory layout, SIMD/shard acceleration, traversal buffers, merge logic, tombstone handling, and projection/aggregation**.

Next steps:

1. **Formal operator pseudocode in C**, implementing the SIMD and shard logic
2. **Memory-mapped index access routines** for ENC-ASL-TGK-INDEX
3. **Integration with PEL deterministic DAG execution receipts**