Absolutely — here’s a formal draft for runtime execution semantics and memory layout for SIMD/shard acceleration of the unified execution plan over ENC-ASL-TGK-INDEX. This focuses on parallel, deterministic, and snapshot-safe execution.
Runtime Execution Semantics & Memory Layout for SIMD/Shard Acceleration
1. Purpose
This specification defines:
- How operators in an execution plan are executed in memory
- How shards, SIMD, and filters are applied efficiently
- Determinism guarantees per snapshot
- Memory layout for index scans, filter evaluation, and traversal expansion
It is fully compatible with:
- ENC-ASL-TGK-INDEX
- Merged ASL + TGK query execution plan
- C-struct operator definitions
2. Memory Layout Principles
- Immutable segments: Index segments are read-only during execution
- Shard-local buffers: Each shard stores a segment of records in contiguous memory
- SIMD key arrays: Routing keys, type tags, and edge type keys are stored in contiguous SIMD-aligned arrays for fast vectorized evaluation
- Canonical references: artifact IDs and TGK edge IDs are stored in 64-bit aligned arrays for deterministic access
- Traversal buffers: TGK traversal outputs are stored in logseq-sorted buffers to preserve determinism
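The SIMD-aligned key arrays above can be carved out with C11 `aligned_alloc`; a minimal sketch, assuming 32-byte (AVX2) alignment and padding up to the SIMD width so vector loops need no scalar tail. The helper name is illustrative:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a 32-byte-aligned array of 32-bit type tags, padded up to a
 * multiple of the SIMD width so vectorized loops need no scalar tail. */
static uint32_t *alloc_tag_array(uint64_t record_count, uint64_t *padded_count) {
    const uint64_t simd_width = 8;   /* 8 x 32-bit lanes per AVX2 vector */
    uint64_t n = (record_count + simd_width - 1) / simd_width * simd_width;
    /* aligned_alloc requires the size to be a multiple of the alignment;
     * n is a multiple of 8, so n * 4 bytes is a multiple of 32. */
    uint32_t *tags = aligned_alloc(32, n * sizeof(uint32_t));
    if (tags) memset(tags, 0, n * sizeof(uint32_t));   /* zero the padding lanes */
    if (padded_count) *padded_count = n;
    return tags;
}
```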
3. Segment Loading and Sharding
- Each index segment is assigned to a shard based on routing key hash
- Segment header is mapped into memory; record arrays are memory-mapped if needed
- For ASL artifacts:
```c
struct shard_asl_segment {
    uint64_t *artifact_ids;   // 64-bit canonical IDs
    uint32_t *type_tags;      // optional type tags
    uint8_t  *has_type_tag;   // flags
    uint64_t  record_count;
};
```
- For TGK edges:
```c
struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;   // canonical TGK-CORE references
    uint32_t *edge_type_keys;
    uint8_t  *has_edge_type;
    uint8_t  *roles;          // from/to/both
    uint64_t  record_count;
};
```
- Shard-local buffers allow parallel SIMD evaluation without inter-shard contention
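The routing-key-hash shard assignment can be sketched as follows; FNV-1a is an assumed choice (any stable 64-bit hash works), and the function names are illustrative:

```c
#include <stdint.h>

/* FNV-1a hash of a routing key (assumed; any stable 64-bit hash works). */
static uint64_t fnv1a64(const uint8_t *key, uint64_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (uint64_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* A segment is owned by exactly one shard, so shard-local buffers never
 * see records for the same routing key from two different shards. */
static uint32_t shard_for_routing_key(const uint8_t *key, uint64_t len,
                                      uint32_t shard_count) {
    return (uint32_t)(fnv1a64(key, len) % shard_count);
}
```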
4. SIMD-Accelerated Filter Evaluation
- SIMD applies vectorized comparison of:
  - Artifact type tags
  - Edge type keys
  - Routing keys (pre-hashed)
- Example pseudo-code (AVX2):
```c
for (i = 0; i < record_count; i += SIMD_WIDTH) {
    simd_load(type_tag[i:i+SIMD_WIDTH]);
    simd_cmp(type_tag_filter);
    simd_mask_store(pass_mask, output_buffer);
}
```
- Determinism guaranteed by maintaining original order after filtering (logseq ascending + canonical ID tie-breaker)
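A scalar reference for the kernel above, preserving the order contract; an AVX2 build would swap the inner comparison for `_mm256_cmpeq_epi32` plus a masked compaction. Function and parameter names are illustrative:

```c
#include <stdint.h>

/* Keep every record whose type tag matches the filter, writing survivors'
 * indices in their original (logseq-ascending) order. Returns the number
 * of records that passed. */
static uint64_t filter_type_tags(const uint32_t *type_tags,
                                 const uint8_t *has_type_tag,
                                 uint64_t record_count,
                                 uint32_t type_tag_filter,
                                 uint64_t *out_indices) {
    uint64_t out = 0;
    for (uint64_t i = 0; i < record_count; i++) {
        if (has_type_tag[i] && type_tags[i] == type_tag_filter)
            out_indices[out++] = i;   /* compaction preserves input order */
    }
    return out;
}
```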
5. Traversal Buffer Semantics (TGK)
- TGKTraversal operator maintains:
```c
struct tgk_traversal_buffer {
    uint64_t *edge_ids;   // expanded edges
    uint64_t *node_ids;   // corresponding nodes
    uint32_t  depth;      // current traversal depth
    uint64_t  count;      // number of records in buffer
};
```
- Buffers are logseq-sorted per depth to preserve deterministic traversal
- Optional per-shard buffers for parallel traversal
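The per-depth logseq sort can be maintained with an ordinary sort over a (logseq, canonical edge ID) key; a sketch, assuming a per-record logseq is carried alongside the IDs (the struct above stores only IDs, so the record layout here is illustrative):

```c
#include <stdint.h>
#include <stdlib.h>

/* One traversal record: the (logseq, canonical edge ID) pair is the total
 * order the spec requires per depth. Field names are illustrative. */
typedef struct {
    uint64_t logseq;
    uint64_t edge_id;
    uint64_t node_id;
} trav_rec;

static int trav_cmp(const void *a, const void *b) {
    const trav_rec *x = a, *y = b;
    if (x->logseq != y->logseq) return x->logseq < y->logseq ? -1 : 1;
    if (x->edge_id != y->edge_id) return x->edge_id < y->edge_id ? -1 : 1;
    return 0;
}

/* Sort the records appended at the current depth so expansion order is
 * identical across runs, regardless of how shards filled the buffer. */
static void sort_depth(trav_rec *recs, uint64_t count) {
    qsort(recs, count, sizeof(trav_rec), trav_cmp);
}
```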
6. Merge Operator Semantics
- Merges multiple shard-local streams:
```c
struct merge_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t  count;
};
```
- Merge algorithm: deterministic heap merge
  - Compare logseq ascending
  - Tie-break with canonical ID
- Ensures the same output regardless of shard execution order
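The comparison rule of the heap merge can be sketched for two streams; a k-way version keys a binary heap the same way. The stream layout (parallel logseq/ID arrays) is an assumption for illustration, not part of the spec:

```c
#include <stdint.h>

/* A shard-local stream already sorted by (logseq, canonical ID). */
typedef struct {
    const uint64_t *logseq;
    const uint64_t *id;     /* canonical ID, the tie-breaker */
    uint64_t count;
} shard_stream;

/* Merge two sorted streams into one (logseq, canonical ID)-ordered output;
 * the comparison is exactly the rule a k-way heap merge would use. */
static uint64_t merge2(shard_stream a, shard_stream b,
                       uint64_t *out_logseq, uint64_t *out_id) {
    uint64_t i = 0, j = 0, o = 0;
    while (i < a.count && j < b.count) {
        int take_a = a.logseq[i] < b.logseq[j] ||
                     (a.logseq[i] == b.logseq[j] && a.id[i] <= b.id[j]);
        if (take_a) { out_logseq[o] = a.logseq[i]; out_id[o++] = a.id[i++]; }
        else        { out_logseq[o] = b.logseq[j]; out_id[o++] = b.id[j++]; }
    }
    while (i < a.count) { out_logseq[o] = a.logseq[i]; out_id[o++] = a.id[i++]; }
    while (j < b.count) { out_logseq[o] = b.logseq[j]; out_id[o++] = b.id[j++]; }
    return o;
}
```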
7. Tombstone Shadowing
- Shadowing is applied post-merge:
```c
struct tombstone_state {
    uint64_t canonical_id;
    uint64_t max_logseq_seen;
    uint8_t  is_tombstoned;
};
```
- Algorithm:
  - Iterate the merged buffer
  - For each canonical ID, keep only the latest logseq ≤ snapshot
  - Drop tombstoned or overridden entries
- Deterministic and snapshot-safe
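The shadowing pass above can be sketched as a backward walk over the merged, logseq-ascending buffer: the first record seen per canonical ID (at or below the snapshot) is the latest and wins, and a tombstone suppresses the ID entirely. The record layout and the linear seen-check are illustrative; a real kernel would use a hash set keyed by canonical ID:

```c
#include <stdint.h>

typedef struct {
    uint64_t canonical_id;
    uint64_t logseq;
    uint8_t  is_tombstone;
} merged_rec;

/* Post-merge shadowing over a buffer sorted by logseq ascending. */
static uint64_t apply_shadowing(const merged_rec *in, uint64_t n,
                                uint64_t snapshot_logseq, merged_rec *out) {
    uint64_t kept = 0;
    /* Walk newest-to-oldest; first hit per ID wins. Tombstones are kept
     * during the walk so older versions of a tombstoned ID are shadowed. */
    for (uint64_t k = n; k-- > 0; ) {
        if (in[k].logseq > snapshot_logseq) continue;   /* beyond snapshot */
        int seen = 0;
        for (uint64_t s = 0; s < kept; s++)
            if (out[s].canonical_id == in[k].canonical_id) { seen = 1; break; }
        if (!seen) out[kept++] = in[k];
    }
    /* Drop tombstones, then reverse back to logseq-ascending order. */
    uint64_t w = 0;
    for (uint64_t s = 0; s < kept; s++)
        if (!out[s].is_tombstone) out[w++] = out[s];
    for (uint64_t a = 0, b = w; a + 1 < b; a++) {
        merged_rec tmp = out[a]; out[a] = out[--b]; out[b] = tmp;
    }
    return w;
}
```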
8. Traversal Expansion with SIMD & Shards
- Input: TGK edge buffer, shard-local nodes
- Steps:
  - Filter edges using SIMD (type, role)
  - Expand edges to downstream nodes
  - Append results to depth-sorted buffer
  - Repeat for depth d if further traversal is requested
- Maintain deterministic order:
  - logseq ascending
  - canonical edge ID tie-breaker
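The expansion step can be sketched over a CSR-style adjacency; the adjacency arrays here are assumptions about shard-local layout for illustration, not part of ENC-ASL-TGK-INDEX:

```c
#include <stdint.h>

/* CSR-style shard-local adjacency (layout assumed for illustration). */
typedef struct {
    const uint64_t *adj_offsets;    /* node -> first edge index, length nodes+1 */
    const uint64_t *adj_edge_ids;
    const uint64_t *adj_dst_nodes;
    const uint32_t *adj_edge_types;
} csr_graph;

/* Expand one depth: walk the frontier, skip edges failing the type filter,
 * append survivors in frontier order. A logseq-sorted frontier therefore
 * yields a deterministic next depth. Returns the number of records emitted. */
static uint64_t expand_depth(const csr_graph *g,
                             const uint64_t *frontier, uint64_t frontier_n,
                             uint32_t edge_type_filter,
                             uint64_t *out_edges, uint64_t *out_nodes) {
    uint64_t out = 0;
    for (uint64_t f = 0; f < frontier_n; f++) {
        uint64_t node = frontier[f];
        for (uint64_t e = g->adj_offsets[node]; e < g->adj_offsets[node + 1]; e++) {
            if (g->adj_edge_types[e] != edge_type_filter) continue;
            out_edges[out] = g->adj_edge_ids[e];
            out_nodes[out] = g->adj_dst_nodes[e];
            out++;
        }
    }
    return out;
}
```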
9. Projection & Aggregation Buffers
- Output buffer for projection:
```c
struct projection_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint64_t *node_ids;
    uint32_t *type_tags;
    uint64_t  count;
};
```
- Aggregation performed in-place or into small accumulator structures:
```c
struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
    // additional aggregates as needed
};
```
- Deterministic due to logseq + canonical ID ordering
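The accumulator fold is a plain reduction; a minimal sketch matching the struct above. The sums here are order-insensitive, but folding in (logseq, canonical ID) order also keeps order-sensitive aggregates (e.g. "first"/"last") deterministic:

```c
#include <stdint.h>

typedef struct {
    uint64_t count;
    uint64_t sum_type_tag;
} aggregation_accumulator;

/* Fold one projection buffer's type tags into the accumulator. */
static void aggregate_type_tags(const uint32_t *type_tags, uint64_t count,
                                aggregation_accumulator *acc) {
    for (uint64_t i = 0; i < count; i++) {
        acc->count += 1;
        acc->sum_type_tag += type_tags[i];
    }
}
```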
10. Parallelism and SIMD Determinism
- Shard-local parallelism allowed
- SIMD vectorization allowed
- Global determinism ensured by:
  - Per-shard deterministic processing
  - Deterministic merge of shards
  - Shadowing/tombstone application post-merge
  - Logseq + canonical ID ordering preserved
- This guarantees identical results across runs and nodes
11. Summary of Runtime Buffers
| Buffer | Contents | Purpose |
|---|---|---|
| shard_asl_segment | artifact IDs, type tags | parallel scan/filter |
| shard_tgk_segment | edge IDs, edge type, role | parallel scan/filter |
| tgk_traversal_buffer | expanded edges/nodes | DAG traversal |
| merge_buffer | merged records | deterministic merge across shards |
| projection_buffer | projected fields | final output |
| aggregation_accumulator | counts/sums | deterministic aggregation |
| tombstone_state | canonical ID + logseq | shadowing |
- All buffers are snapshot-bound and immutable where possible
- SIMD/shard acceleration does not affect determinism
This completes a full runtime execution model, including memory layout, SIMD/shard acceleration, traversal buffers, merge logic, tombstone handling, and projection/aggregation.
Next steps could be:
- Formal operator pseudocode in C, implementing SIMD and shard logic
- Memory-mapped index access routines for ENC-ASL-TGK-INDEX
- Integration with PEL deterministic DAG execution receipts
Do you want me to draft the operator pseudocode next?