amduat-api/notes/Runtime Execution Semantics & Memory Layout for SIMD-Shard Acceleration.md
2026-01-17 00:19:49 +01:00


This note is a formal draft of the runtime execution semantics and memory layout for SIMD/shard acceleration of the unified execution plan over ENC-ASL-TGK-INDEX. It focuses on parallel, deterministic, and snapshot-safe execution.


Runtime Execution Semantics & Memory Layout for SIMD/Shard Acceleration


1. Purpose

This specification defines:

  • How operators in an execution plan are executed in memory
  • How shards, SIMD, and filters are applied efficiently
  • Determinism guarantees per snapshot
  • Memory layout for index scans, filter evaluation, and traversal expansion

It is fully compatible with:

  • ENC-ASL-TGK-INDEX
  • Merged ASL + TGK query execution plan
  • C-struct operator definitions

2. Memory Layout Principles

  1. Immutable segments: Index segments are read-only during execution
  2. Shard-local buffers: Each shard stores a segment of records in contiguous memory
  3. SIMD key arrays: Routing keys, type tags, and edge type keys are stored in contiguous SIMD-aligned arrays for fast vectorized evaluation
  4. Canonical references: artifact IDs and TGK edge IDs are stored in 64-bit aligned arrays for deterministic access
  5. Traversal buffers: TGK traversal outputs are stored in logseq-sorted buffers to preserve determinism
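A minimal sketch of principle 3 (SIMD-aligned key arrays), assuming C11 `aligned_alloc` and 32-byte alignment for 256-bit AVX2 vectors; the helper names are illustrative, not part of the spec:

```c
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 32  /* 256-bit AVX2 vectors */

/* aligned_alloc (C11) requires the size to be a multiple of the alignment */
static size_t simd_size(size_t bytes) {
    return (bytes + SIMD_ALIGN - 1) & ~(size_t)(SIMD_ALIGN - 1);
}

/* allocate a SIMD-aligned array of 32-bit keys (type tags, edge type keys, ...) */
uint32_t *alloc_key_array(size_t count) {
    return aligned_alloc(SIMD_ALIGN, simd_size(count * sizeof(uint32_t)));
}
```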

3. Segment Loading and Sharding

  • Each index segment is assigned to a shard based on routing key hash
  • Segment header is mapped into memory; record arrays are memory-mapped if needed
  • For ASL artifacts:
```c
struct shard_asl_segment {
    uint64_t *artifact_ids;       // 64-bit canonical IDs
    uint32_t *type_tags;          // optional type tags
    uint8_t  *has_type_tag;       // presence flags for type_tags
    uint64_t  record_count;
};
```
  • For TGK edges:
```c
struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;       // canonical TGK-CORE references
    uint32_t *edge_type_keys;
    uint8_t  *has_edge_type;
    uint8_t  *roles;              // from/to/both
    uint64_t  record_count;
};
```
  • Shard-local buffers allow parallel SIMD evaluation without inter-shard contention
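The routing-key-hash shard assignment can be sketched as follows; the 64-bit finalizer shown is the MurmurHash3 `fmix64` mixer, chosen here as an assumption since the spec does not fix a hash function:

```c
#include <stdint.h>

/* MurmurHash3 fmix64 finalizer: diffuses all input bits into the output */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* deterministic shard assignment from a routing key */
uint32_t shard_for(uint64_t routing_key, uint32_t num_shards) {
    return (uint32_t)(mix64(routing_key) % num_shards);
}
```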

4. SIMD-Accelerated Filter Evaluation

  • SIMD units perform vectorized comparisons over:

    • Artifact type tags
    • Edge type keys
    • Routing keys (pre-hashed)
  • Example (AVX2 intrinsics; a sketch, with the scalar tail loop for the final `record_count % 8` records omitted):

```c
__m256i filter = _mm256_set1_epi32((int32_t)type_tag_filter);
for (uint64_t i = 0; i + 8 <= record_count; i += 8) {
    __m256i tags = _mm256_loadu_si256((const __m256i *)&type_tags[i]);
    __m256i eq   = _mm256_cmpeq_epi32(tags, filter);
    /* one result bit per record, kept in original (logseq) order */
    pass_mask[i / 8] = (uint8_t)_mm256_movemask_ps(_mm256_castsi256_ps(eq));
}
```
  • Determinism guaranteed by maintaining original order after filtering (logseq ascending + canonical ID tie-breaker)
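The order-preservation property can be checked against a scalar reference: filtering by predicate and compacting in input order leaves logseq-ascending order intact. A sketch with illustrative names:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the SIMD filter: keep records whose type tag matches,
 * compacting in input order so logseq-ascending order is preserved. */
size_t filter_compact(const uint32_t *type_tags, const uint64_t *artifact_ids,
                      size_t n, uint32_t wanted, uint64_t *out) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (type_tags[i] == wanted)
            out[m++] = artifact_ids[i];
    return m;
}
```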

5. Traversal Buffer Semantics (TGK)

  • TGKTraversal operator maintains:
```c
struct tgk_traversal_buffer {
    uint64_t *edge_ids;        // expanded edges
    uint64_t *node_ids;        // corresponding nodes
    uint32_t  depth;           // current traversal depth
    uint64_t  count;           // number of records in buffer
};
```
  • Buffers are logseq-sorted per depth to preserve deterministic traversal
  • Optional per-shard buffers for parallel traversal
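Finalizing one depth level might look like the sketch below, assuming each record carries its logseq for ordering (the struct above does not show this field; the record layout here is illustrative):

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t logseq;   /* assumed carried per record for ordering */
    uint64_t edge_id;  /* canonical edge ID, used as tie-breaker */
    uint64_t node_id;
} trav_rec;

/* order: logseq ascending, canonical edge ID tie-breaker */
static int trav_cmp(const void *pa, const void *pb) {
    const trav_rec *a = pa, *b = pb;
    if (a->logseq != b->logseq) return a->logseq < b->logseq ? -1 : 1;
    if (a->edge_id != b->edge_id) return a->edge_id < b->edge_id ? -1 : 1;
    return 0;
}

/* sort one depth level of the traversal buffer before emitting it */
void finalize_depth(trav_rec *buf, uint64_t count) {
    qsort(buf, (size_t)count, sizeof(trav_rec), trav_cmp);
}
```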

6. Merge Operator Semantics

  • Merges multiple shard-local streams:
```c
struct merge_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t  count;
};
```
  • Merge algorithm: deterministic heap merge

    1. Compare logseq ascending
    2. Tie-break with canonical ID
  • Ensures same output regardless of shard execution order
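A two-way instance of the deterministic merge; the k-way heap version generalizes the same comparator, and the record layout here is illustrative:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t logseq; uint64_t canonical_id; } mrec;

/* merge order: logseq ascending, canonical ID tie-breaker */
static int mrec_le(mrec a, mrec b) {
    if (a.logseq != b.logseq) return a.logseq < b.logseq;
    return a.canonical_id <= b.canonical_id;
}

/* deterministic merge of two shard-local streams, each already sorted */
size_t merge2(const mrec *a, size_t na, const mrec *b, size_t nb, mrec *out) {
    size_t i = 0, j = 0, m = 0;
    while (i < na && j < nb)
        out[m++] = mrec_le(a[i], b[j]) ? a[i++] : b[j++];
    while (i < na) out[m++] = a[i++];
    while (j < nb) out[m++] = b[j++];
    return m;
}
```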


7. Tombstone Shadowing

  • Shadowing is applied post-merge:
```c
struct tombstone_state {
    uint64_t canonical_id;
    uint64_t max_logseq_seen;
    uint8_t  is_tombstoned;
};
```
  • Algorithm:
    1. Iterate the merged buffer
    2. For each canonical ID, keep only the latest logseq ≤ snapshot
    3. Drop tombstoned or overridden entries
  • Deterministic and snapshot-safe
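The three steps above can be sketched as a single pass, assuming the merged buffer is grouped by canonical ID with logseq ascending within each group (field names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t canonical_id;
    uint64_t logseq;
    uint8_t  is_tombstone;
} shadow_rec;

/* Keep only the latest visible version (logseq <= snapshot) per canonical ID,
 * dropping tombstoned entries. Overridden versions are dropped implicitly
 * because only the last visible record of each group survives. */
size_t apply_shadowing(const shadow_rec *in, size_t n, uint64_t snapshot,
                       shadow_rec *out) {
    size_t m = 0, i = 0;
    while (i < n) {
        uint64_t id = in[i].canonical_id;
        shadow_rec latest;
        int visible = 0;
        for (; i < n && in[i].canonical_id == id; i++)
            if (in[i].logseq <= snapshot) { latest = in[i]; visible = 1; }
        if (visible && !latest.is_tombstone)
            out[m++] = latest;
    }
    return m;
}
```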

8. Traversal Expansion with SIMD & Shards

  • Input: TGK edge buffer, shard-local nodes
  • Steps:
    1. Filter edges using SIMD (type, role)
    2. Expand edges to downstream nodes
    3. Append results to a depth-sorted buffer
    4. Repeat at the next depth if deeper traversal is requested
    5. Maintain deterministic order: logseq ascending, canonical edge ID tie-breaker
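One expansion step (filter plus node mapping) can be sketched as below. The `edge_to_node` lookup table indexed by edge ID is an illustrative stand-in; a real engine would resolve downstream nodes through the index:

```c
#include <stddef.h>
#include <stdint.h>

#define ROLE_FROM 0x1
#define ROLE_TO   0x2

/* Filter edges by type and role, then map each passing edge to its
 * downstream node, preserving input (logseq) order. */
size_t expand_edges(const uint64_t *edge_ids, const uint32_t *edge_type_keys,
                    const uint8_t *roles, size_t n,
                    uint32_t want_type, uint8_t want_role,
                    const uint64_t *edge_to_node, uint64_t *out_nodes) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (edge_type_keys[i] == want_type && (roles[i] & want_role))
            out_nodes[m++] = edge_to_node[edge_ids[i]];
    return m;
}
```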

9. Projection & Aggregation Buffers

  • Output buffer for projection:
```c
struct projection_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint64_t *node_ids;
    uint32_t *type_tags;
    uint64_t  count;
};
```
  • Aggregation performed in-place or into small accumulator structures:
```c
struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
    // additional aggregates as needed
};
```
  • Deterministic due to logseq + canonical ID ordering
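Shard-local accumulation and cross-shard combination can be sketched as follows; because the combine step is commutative addition, the order in which shards finish cannot change the result:

```c
#include <stdint.h>

typedef struct {
    uint64_t count;
    uint64_t sum_type_tag;
} agg_acc;

/* fold one record into a shard-local accumulator */
void agg_add(agg_acc *a, uint32_t type_tag) {
    a->count++;
    a->sum_type_tag += type_tag;
}

/* combine shard-local accumulators; addition commutes, so the
 * combination order across shards cannot affect the result */
void agg_combine(agg_acc *dst, const agg_acc *src) {
    dst->count += src->count;
    dst->sum_type_tag += src->sum_type_tag;
}
```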

10. Parallelism and SIMD Determinism

  • Shard-local parallelism allowed
  • SIMD vectorization allowed
  • Global determinism ensured by:
    1. Per-shard deterministic processing
    2. Deterministic merge of shards
    3. Shadowing/tombstone application post-merge
    4. Logseq + canonical ID ordering preserved
  • This guarantees identical results across runs and nodes


11. Summary of Runtime Buffers

| Buffer | Contents | Purpose |
| --- | --- | --- |
| shard_asl_segment | artifact IDs, type tags | parallel scan/filter |
| shard_tgk_segment | edge IDs, edge type, role | parallel scan/filter |
| tgk_traversal_buffer | expanded edges/nodes | DAG traversal |
| merge_buffer | merged records | deterministic merge across shards |
| projection_buffer | projected fields | final output |
| aggregation_accumulator | counts/sums | deterministic aggregation |
| tombstone_state | canonical ID + logseq | shadowing |
  • All buffers are snapshot-bound and immutable where possible
  • SIMD/shard acceleration does not affect determinism

This completes a full runtime execution model, including memory layout, SIMD/shard acceleration, traversal buffers, merge logic, tombstone handling, and projection/aggregation.


Next steps could be:

  1. Formal operator pseudocode in C, implementing SIMD and shard logic
  2. Memory-mapped index access routines for ENC-ASL-TGK-INDEX
  3. Integration with PEL deterministic DAG execution receipts
