Absolutely — here’s a formal draft for runtime execution semantics and memory layout for SIMD/shard acceleration of the unified execution plan over ENC-ASL-TGK-INDEX. This focuses on parallel, deterministic, and snapshot-safe execution.
Runtime Execution Semantics & Memory Layout for SIMD/Shard Acceleration
1. Purpose
This specification defines:
- How operators in an execution plan are executed in memory
- How shards, SIMD, and filters are applied efficiently
- Determinism guarantees per snapshot
- Memory layout for index scans, filter evaluation, and traversal expansion
It is fully compatible with:
- ENC-ASL-TGK-INDEX
- Merged ASL + TGK query execution plan
- C-struct operator definitions
2. Memory Layout Principles
- Immutable segments: Index segments are read-only during execution
- Shard-local buffers: Each shard stores a segment of records in contiguous memory
- SIMD key arrays: Routing keys, type tags, and edge type keys are stored in contiguous SIMD-aligned arrays for fast vectorized evaluation
- Canonical references: artifact IDs and TGK edge IDs are stored in 64-bit aligned arrays for deterministic access
- Traversal buffers: TGK traversal outputs are stored in logseq-sorted buffers to preserve determinism
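The SIMD-aligned key arrays above can be carved out with C11 `aligned_alloc`; a minimal sketch, assuming 32-byte (AVX2) alignment and padding up to the SIMD width so vector loops need no scalar tail. The helper name is illustrative:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a 32-byte-aligned array of 32-bit type tags, padded up to a
 * multiple of the SIMD width so vectorized loops need no scalar tail. */
static uint32_t *alloc_tag_array(uint64_t record_count, uint64_t *padded_count) {
    const uint64_t simd_width = 8;   /* 8 x 32-bit lanes per AVX2 vector */
    uint64_t n = (record_count + simd_width - 1) / simd_width * simd_width;
    /* aligned_alloc requires the size to be a multiple of the alignment;
     * n is a multiple of 8, so n * 4 bytes is a multiple of 32. */
    uint32_t *tags = aligned_alloc(32, n * sizeof(uint32_t));
    if (tags) memset(tags, 0, n * sizeof(uint32_t));   /* zero the padding lanes */
    if (padded_count) *padded_count = n;
    return tags;
}
```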
3. Segment Loading and Sharding
- Each index segment is assigned to a shard based on routing key hash
- Segment header is mapped into memory; record arrays are memory-mapped if needed
- For ASL artifacts:
```c
struct shard_asl_segment {
    uint64_t *artifact_ids;   // 64-bit canonical IDs
    uint32_t *type_tags;      // optional type tags
    uint8_t  *has_type_tag;   // flags
    uint64_t  record_count;
};
```
- For TGK edges:
```c
struct shard_tgk_segment {
    uint64_t *tgk_edge_ids;   // canonical TGK-CORE references
    uint32_t *edge_type_keys;
    uint8_t  *has_edge_type;
    uint8_t  *roles;          // from/to/both
    uint64_t  record_count;
};
```
- Shard-local buffers allow parallel SIMD evaluation without inter-shard contention
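The routing-key-hash shard assignment can be sketched as follows; FNV-1a is an assumed choice (any stable 64-bit hash works), and the function names are illustrative:

```c
#include <stdint.h>

/* FNV-1a hash of a routing key (assumed; any stable 64-bit hash works). */
static uint64_t fnv1a64(const uint8_t *key, uint64_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (uint64_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* A segment is owned by exactly one shard, so shard-local buffers never
 * see records for the same routing key from two different shards. */
static uint32_t shard_for_routing_key(const uint8_t *key, uint64_t len,
                                      uint32_t shard_count) {
    return (uint32_t)(fnv1a64(key, len) % shard_count);
}
```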
4. SIMD-Accelerated Filter Evaluation
- SIMD applies vectorized comparison of:
  - Artifact type tags
  - Edge type keys
  - Routing keys (pre-hashed)
- Example pseudo-code (AVX2):
```c
for (i = 0; i < record_count; i += SIMD_WIDTH) {
    simd_load(type_tag[i:i+SIMD_WIDTH]);
    simd_cmp(type_tag_filter);
    simd_mask_store(pass_mask, output_buffer);
}
```
- Determinism guaranteed by maintaining original order after filtering (logseq ascending + canonical ID tie-breaker)
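A scalar reference for the kernel above, preserving the order contract; an AVX2 build would swap the inner comparison for `_mm256_cmpeq_epi32` plus a masked compaction. Function and parameter names are illustrative:

```c
#include <stdint.h>

/* Keep every record whose type tag matches the filter, writing survivors'
 * indices in their original (logseq-ascending) order. Returns the number
 * of records that passed. */
static uint64_t filter_type_tags(const uint32_t *type_tags,
                                 const uint8_t *has_type_tag,
                                 uint64_t record_count,
                                 uint32_t type_tag_filter,
                                 uint64_t *out_indices) {
    uint64_t out = 0;
    for (uint64_t i = 0; i < record_count; i++) {
        if (has_type_tag[i] && type_tags[i] == type_tag_filter)
            out_indices[out++] = i;   /* compaction preserves input order */
    }
    return out;
}
```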
5. Traversal Buffer Semantics (TGK)
- TGKTraversal operator maintains:
```c
struct tgk_traversal_buffer {
    uint64_t *edge_ids;   // expanded edges
    uint64_t *node_ids;   // corresponding nodes
    uint32_t  depth;      // current traversal depth
    uint64_t  count;      // number of records in buffer
};
```
- Buffers are logseq-sorted per depth to preserve deterministic traversal
- Optional per-shard buffers for parallel traversal
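The per-depth logseq sort can be maintained with an ordinary sort over a (logseq, canonical edge ID) key; a sketch, assuming a per-record logseq is carried alongside the IDs (the struct above stores only IDs, so the record layout here is illustrative):

```c
#include <stdint.h>
#include <stdlib.h>

/* One traversal record: the (logseq, canonical edge ID) pair is the total
 * order the spec requires per depth. Field names are illustrative. */
typedef struct {
    uint64_t logseq;
    uint64_t edge_id;
    uint64_t node_id;
} trav_rec;

static int trav_cmp(const void *a, const void *b) {
    const trav_rec *x = a, *y = b;
    if (x->logseq != y->logseq) return x->logseq < y->logseq ? -1 : 1;
    if (x->edge_id != y->edge_id) return x->edge_id < y->edge_id ? -1 : 1;
    return 0;
}

/* Sort the records appended at the current depth so expansion order is
 * identical across runs, regardless of how shards filled the buffer. */
static void sort_depth(trav_rec *recs, uint64_t count) {
    qsort(recs, count, sizeof(trav_rec), trav_cmp);
}
```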
6. Merge Operator Semantics
- Merges multiple shard-local streams:
```c
struct merge_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint32_t *type_tags;
    uint8_t  *roles;
    uint64_t  count;
};
```
- Merge algorithm: deterministic heap merge
  - Compare logseq ascending
  - Tie-break with canonical ID
- Ensures the same output regardless of shard execution order
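The comparison rule of the heap merge can be sketched for two streams; a k-way version keys a binary heap the same way. The stream layout (parallel logseq/ID arrays) is an assumption for illustration, not part of the spec:

```c
#include <stdint.h>

/* A shard-local stream already sorted by (logseq, canonical ID). */
typedef struct {
    const uint64_t *logseq;
    const uint64_t *id;     /* canonical ID, the tie-breaker */
    uint64_t count;
} shard_stream;

/* Merge two sorted streams into one (logseq, canonical ID)-ordered output;
 * the comparison is exactly the rule a k-way heap merge would use. */
static uint64_t merge2(shard_stream a, shard_stream b,
                       uint64_t *out_logseq, uint64_t *out_id) {
    uint64_t i = 0, j = 0, o = 0;
    while (i < a.count && j < b.count) {
        int take_a = a.logseq[i] < b.logseq[j] ||
                     (a.logseq[i] == b.logseq[j] && a.id[i] <= b.id[j]);
        if (take_a) { out_logseq[o] = a.logseq[i]; out_id[o++] = a.id[i++]; }
        else        { out_logseq[o] = b.logseq[j]; out_id[o++] = b.id[j++]; }
    }
    while (i < a.count) { out_logseq[o] = a.logseq[i]; out_id[o++] = a.id[i++]; }
    while (j < b.count) { out_logseq[o] = b.logseq[j]; out_id[o++] = b.id[j++]; }
    return o;
}
```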
7. Tombstone Shadowing
- Shadowing is applied post-merge:
```c
struct tombstone_state {
    uint64_t canonical_id;
    uint64_t max_logseq_seen;
    uint8_t  is_tombstoned;
};
```
- Algorithm:
  - Iterate the merged buffer
  - For each canonical ID, keep only the latest logseq ≤ snapshot
  - Drop tombstoned or overridden entries
- Deterministic and snapshot-safe
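The shadowing pass above can be sketched as a backward walk over the merged, logseq-ascending buffer: the first record seen per canonical ID (at or below the snapshot) is the latest and wins, and a tombstone suppresses the ID entirely. The record layout and the linear seen-check are illustrative; a real kernel would use a hash set keyed by canonical ID:

```c
#include <stdint.h>

typedef struct {
    uint64_t canonical_id;
    uint64_t logseq;
    uint8_t  is_tombstone;
} merged_rec;

/* Post-merge shadowing over a buffer sorted by logseq ascending. */
static uint64_t apply_shadowing(const merged_rec *in, uint64_t n,
                                uint64_t snapshot_logseq, merged_rec *out) {
    uint64_t kept = 0;
    /* Walk newest-to-oldest; first hit per ID wins. Tombstones are kept
     * during the walk so older versions of a tombstoned ID are shadowed. */
    for (uint64_t k = n; k-- > 0; ) {
        if (in[k].logseq > snapshot_logseq) continue;   /* beyond snapshot */
        int seen = 0;
        for (uint64_t s = 0; s < kept; s++)
            if (out[s].canonical_id == in[k].canonical_id) { seen = 1; break; }
        if (!seen) out[kept++] = in[k];
    }
    /* Drop tombstones, then reverse back to logseq-ascending order. */
    uint64_t w = 0;
    for (uint64_t s = 0; s < kept; s++)
        if (!out[s].is_tombstone) out[w++] = out[s];
    for (uint64_t a = 0, b = w; a + 1 < b; a++) {
        merged_rec tmp = out[a]; out[a] = out[--b]; out[b] = tmp;
    }
    return w;
}
```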
8. Traversal Expansion with SIMD & Shards
- Input: TGK edge buffer, shard-local nodes
- Steps:
  - Filter edges using SIMD (type, role)
  - Expand edges to downstream nodes
  - Append results to depth-sorted buffer
  - Repeat for depth d if further traversal is requested
- Maintain deterministic order:
  - logseq ascending
  - canonical edge ID tie-breaker
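The expansion step can be sketched over a CSR-style adjacency; the adjacency arrays here are assumptions about shard-local layout for illustration, not part of ENC-ASL-TGK-INDEX:

```c
#include <stdint.h>

/* CSR-style shard-local adjacency (layout assumed for illustration). */
typedef struct {
    const uint64_t *adj_offsets;    /* node -> first edge index, length nodes+1 */
    const uint64_t *adj_edge_ids;
    const uint64_t *adj_dst_nodes;
    const uint32_t *adj_edge_types;
} csr_graph;

/* Expand one depth: walk the frontier, skip edges failing the type filter,
 * append survivors in frontier order. A logseq-sorted frontier therefore
 * yields a deterministic next depth. Returns the number of records emitted. */
static uint64_t expand_depth(const csr_graph *g,
                             const uint64_t *frontier, uint64_t frontier_n,
                             uint32_t edge_type_filter,
                             uint64_t *out_edges, uint64_t *out_nodes) {
    uint64_t out = 0;
    for (uint64_t f = 0; f < frontier_n; f++) {
        uint64_t node = frontier[f];
        for (uint64_t e = g->adj_offsets[node]; e < g->adj_offsets[node + 1]; e++) {
            if (g->adj_edge_types[e] != edge_type_filter) continue;
            out_edges[out] = g->adj_edge_ids[e];
            out_nodes[out] = g->adj_dst_nodes[e];
            out++;
        }
    }
    return out;
}
```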
9. Projection & Aggregation Buffers
- Output buffer for projection:
```c
struct projection_buffer {
    uint64_t *artifact_ids;
    uint64_t *tgk_edge_ids;
    uint64_t *node_ids;
    uint32_t *type_tags;
    uint64_t  count;
};
```
- Aggregation performed in-place or into small accumulator structures:
```c
struct aggregation_accumulator {
    uint64_t count;
    uint64_t sum_type_tag;
    // additional aggregates as needed
};
```
- Deterministic due to logseq + canonical ID ordering
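The accumulator fold is a plain reduction; a minimal sketch matching the struct above. The sums here are order-insensitive, but folding in (logseq, canonical ID) order also keeps order-sensitive aggregates (e.g. "first"/"last") deterministic:

```c
#include <stdint.h>

typedef struct {
    uint64_t count;
    uint64_t sum_type_tag;
} aggregation_accumulator;

/* Fold one projection buffer's type tags into the accumulator. */
static void aggregate_type_tags(const uint32_t *type_tags, uint64_t count,
                                aggregation_accumulator *acc) {
    for (uint64_t i = 0; i < count; i++) {
        acc->count += 1;
        acc->sum_type_tag += type_tags[i];
    }
}
```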
10. Parallelism and SIMD Determinism
- Shard-local parallelism allowed
- SIMD vectorization allowed
- Global determinism ensured by:
  - Per-shard deterministic processing
  - Deterministic merge of shards
  - Shadowing/tombstone application post-merge
  - Logseq + canonical ID ordering preserved
- This guarantees identical results across runs and nodes
11. Summary of Runtime Buffers
| Buffer | Contents | Purpose |
|---|---|---|
| shard_asl_segment | artifact IDs, type tags | parallel scan/filter |
| shard_tgk_segment | edge IDs, edge type, role | parallel scan/filter |
| tgk_traversal_buffer | expanded edges/nodes | DAG traversal |
| merge_buffer | merged records | deterministic merge across shards |
| projection_buffer | projected fields | final output |
| aggregation_accumulator | counts/sums | deterministic aggregation |
| tombstone_state | canonical ID + logseq | shadowing |
- All buffers are snapshot-bound and immutable where possible
- SIMD/shard acceleration does not affect determinism
This completes a full runtime execution model, including memory layout, SIMD/shard acceleration, traversal buffers, merge logic, tombstone handling, and projection/aggregation.
Next steps could be:
- Formal operator pseudocode in C, implementing SIMD and shard logic
- Memory-mapped index access routines for ENC-ASL-TGK-INDEX
- Integration with PEL deterministic DAG execution receipts
Do you want me to draft the operator pseudocode next?