amduat-api/notes/Unified Execution Plan Specification (ASL + TGK).md
2026-01-17 00:19:49 +01:00


This is a formal draft of the execution plan specification for the unified ASL + TGK query execution model. It defines operators, data flow, and snapshot semantics in a deterministic, layered way.


Unified Execution Plan Specification (ASL + TGK)


1. Purpose

This specification formalizes query execution plans for:

  • ASL artifacts (ENC-ASL-CORE)
  • TGK edges (ENC-TGK-CORE)
  • Merged index references (ENC-ASL-TGK-INDEX)

Goals:

  1. Produce deterministic output for a given snapshot (a logseq cutoff)
  2. Respect tombstones and shadowing
  3. Leverage filters, sharding, and SIMD acceleration without changing results
  4. Support DAG traversals (TGK edges) and artifact projections
  5. Enable formal planning and optimization

2. Execution Plan Structure

An execution plan EP is a directed acyclic graph (DAG) of operators:

EP = { nodes: [Op1, Op2, ...], edges: [(Op1→Op2), ...] }

Node Properties

  • op_id: unique operator ID
  • op_type: see Operator Types (Section 3)
  • inputs: references to upstream operators
  • outputs: reference streams
  • constraints: optional filtering conditions
  • snapshot: logseq upper bound for visible records
  • projections: requested fields
  • traversal_depth: optional; maximum depth for TGK expansion
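
The node properties above could be held in memory roughly as follows. This is a sketch, not a normative layout: the type name `plan_node`, the `op_type` enumerators, the `MAX_INPUTS` bound, and all field widths are assumptions made for illustration. The helper checks one simple sufficient condition for the plan being a DAG, namely that every input refers to a node stored earlier in the array.

```c
#include <stdint.h>

/* Hypothetical enumerators, one per operator listed in Section 3. */
enum op_type {
    OP_SEGMENT_SCAN, OP_INDEX_FILTER, OP_MERGE, OP_PROJECTION,
    OP_TGK_TRAVERSAL, OP_AGGREGATION, OP_LIMIT_OFFSET,
    OP_SHARD_DISPATCH, OP_SIMD_FILTER, OP_TOMBSTONE_SHADOW
};

#define MAX_INPUTS 4  /* assumed bound, for illustration only */

/* Hypothetical in-memory form of one plan node. */
struct plan_node {
    uint32_t     op_id;               /* unique operator ID */
    enum op_type op_type;
    uint32_t     inputs[MAX_INPUTS];  /* op_ids of upstream operators */
    uint32_t     input_count;
    uint64_t     snapshot;            /* logseq upper bound */
    uint32_t     projections;         /* assumed: bitmask of requested fields */
    uint32_t     traversal_depth;     /* 0 when the node is not a traversal */
};

/* Returns 1 if every input of every node names a node stored earlier in
 * the array (a topological order, which implies the plan is acyclic),
 * and 0 otherwise. */
int plan_is_topologically_ordered(const struct plan_node *nodes, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        if (nodes[i].input_count > MAX_INPUTS)
            return 0;
        for (uint32_t k = 0; k < nodes[i].input_count; k++) {
            int found = 0;
            for (uint32_t j = 0; j < i; j++) {
                if (nodes[j].op_id == nodes[i].inputs[k]) {
                    found = 1;
                    break;
                }
            }
            if (!found)
                return 0;
        }
    }
    return 1;
}
```

Storing nodes in topological order is a design choice rather than a requirement of the specification; it lets an executor run operators front to back without a separate scheduling pass.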

3. Operator Types

Operator          Description
----------------  ------------------------------------------------------------
SegmentScan       Scans a segment of ENC-ASL-TGK-INDEX, applies advisory filters
IndexFilter       Applies canonical constraints (artifact type, edge type, role)
Merge             Deterministically merges multiple streams (logseq ascending, canonical key tie-breaker)
Projection        Selects output fields from index references
TGKTraversal      Expands TGK edges from node sets (depth-limited DAG traversal)
Aggregation       Performs count, sum, union, or other aggregations
LimitOffset       Applies pagination or top-N selection
ShardDispatch     Routes records from different shards in parallel, maintaining deterministic order
SIMDFilter        Parallel filter evaluation for routing keys or type tags
TombstoneShadow   Applies shadowing to remove tombstoned or overridden records

4. Operator Semantics

4.1 SegmentScan

  • Inputs: segment(s) of ENC-ASL-TGK-INDEX

  • Outputs: raw record stream

  • Steps:

    1. Select segments with logseq_min ≤ snapshot
    2. Apply advisory filters to skip records that cannot match (a filter may let non-matching records through, but must never drop a matching one)
    3. Return record references (artifact_id, tgk_edge_id)
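
The scan steps can be sketched as below. The struct layouts and function names are hypothetical; only `logseq_min`, `artifact_id`, and `tgk_edge_id` appear in the text. The per-record check `logseq <= snapshot` follows the record visibility rule in Section 6, and the advisory filter of step 2 is left out to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor. */
struct segment {
    uint64_t logseq_min;  /* smallest logseq of any record in the segment */
};

/* Hypothetical record reference produced by the scan. */
struct record_ref {
    uint64_t logseq;
    uint64_t artifact_id;
    uint64_t tgk_edge_id;
};

/* Step 1: a segment is visible when logseq_min <= snapshot. */
int segment_visible(const struct segment *s, uint64_t snapshot)
{
    return s->logseq_min <= snapshot;
}

/* Step 3 for one visible segment: copy references to records that are
 * visible at the snapshot into out (capacity n), preserving input order.
 * Returns the number of references written. */
size_t scan_segment(const struct record_ref *recs, size_t n,
                    uint64_t snapshot, struct record_ref *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].logseq <= snapshot)
            out[k++] = recs[i];
    }
    return k;
}
```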

4.2 IndexFilter

  • Inputs: raw record stream

  • Outputs: filtered stream

  • Steps:

    1. Apply canonical constraints:

      • Artifact type tag
      • Edge type key, role
      • Node IDs for TGK edges
    2. Drop tombstoned or shadowed records

  • Deterministic


4.3 Merge

  • Inputs: multiple streams

  • Outputs: merged stream

  • Sort order:

    1. logseq ascending
    2. canonical key tie-breaker
  • Deterministic, regardless of input shard order
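
A minimal sketch of this ordering, assuming the canonical key can be compared as a 64-bit integer (the text does not define its representation). `record_cmp` implements the two-level sort order, and `merge_streams` is a plain two-way merge of inputs that are each already sorted by it; both names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: only the two fields that drive the merge order. */
struct record {
    uint64_t logseq;
    uint64_t canonical_key;  /* assumed to be integer-comparable */
};

/* Total order: logseq ascending, then canonical key ascending. */
int record_cmp(const struct record *a, const struct record *b)
{
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq ? -1 : 1;
    if (a->canonical_key != b->canonical_key)
        return a->canonical_key < b->canonical_key ? -1 : 1;
    return 0;
}

/* Merges two streams that are each sorted by record_cmp into out, which
 * must have room for na + nb records. Returns the number written. */
size_t merge_streams(const struct record *a, size_t na,
                     const struct record *b, size_t nb,
                     struct record *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (record_cmp(&a[i], &b[j]) <= 0) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Because the comparison is a total order on (logseq, canonical key), swapping the two inputs yields the same output sequence, which is what makes the merge independent of shard order.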


4.4 Projection

  • Inputs: record stream

  • Outputs: projected fields

  • Steps:

    • Select requested fields (artifact_id, tgk_edge_id, node_id, type tags)
    • Preserve order

4.5 TGKTraversal

  • Inputs: node set or TGK edge references

  • Outputs: expanded TGK edge references (DAG traversal)

  • Parameters:

    • depth: max recursion depth
    • snapshot: logseq cutoff
    • direction: from/to
  • Deterministic traversal:

    • logseq ascending per edge
    • canonical key tie-breaker
  • Optional projection of downstream nodes or artifacts
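
One way to realize a depth-limited, snapshot-bounded expansion is a level-by-level walk over an edge list that is already sorted by (logseq, edge ID). The sketch below makes several simplifying assumptions that are not part of the specification: node IDs are small integers (0..63) so that a node set fits in one machine word, the edge ID doubles as the canonical key, and only the "from" direction is followed.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64  /* illustration only: a node set fits in one uint64_t */

/* Hypothetical edge record. */
struct tgk_edge {
    uint64_t logseq;
    uint32_t edge_id;  /* assumed to act as the canonical key */
    uint32_t from;
    uint32_t to;
};

/* Expands edges reachable from start within `depth` hops, ignoring edges
 * with logseq > snapshot. `edges` must be sorted by (logseq, edge_id), so
 * each level emits edges in that order. Writes edge IDs to out (capacity
 * n_edges) and returns the number written. */
size_t tgk_traverse(const struct tgk_edge *edges, size_t n_edges,
                    uint32_t start, uint32_t depth, uint64_t snapshot,
                    uint32_t *out)
{
    if (start >= MAX_NODES)
        return 0;

    uint64_t frontier = 1ULL << start;  /* nodes to expand at this level */
    uint64_t visited = frontier;        /* nodes already expanded or queued */
    size_t n_out = 0;

    for (uint32_t level = 0; level < depth && frontier != 0; level++) {
        uint64_t next = 0;
        for (size_t i = 0; i < n_edges; i++) {
            const struct tgk_edge *e = &edges[i];
            if (e->logseq > snapshot)
                continue;  /* not visible at this snapshot */
            if (e->from >= MAX_NODES || e->to >= MAX_NODES)
                continue;  /* outside the illustrative ID range */
            if ((frontier & (1ULL << e->from)) == 0)
                continue;  /* source node is not in the current level */
            out[n_out++] = e->edge_id;
            if ((visited & (1ULL << e->to)) == 0)
                next |= 1ULL << e->to;
        }
        visited |= next;
        frontier = next;
    }
    return n_out;
}
```

Each node enters the frontier at most once, so every edge is emitted at most once and an output buffer of `n_edges` entries is sufficient.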


4.6 Aggregation

  • Inputs: record stream

  • Outputs: aggregated result

  • Examples:

    • COUNT(*), UNION, SUM(type_tag)
  • Deterministic: the result depends only on the records visible at the snapshot and their upstream order, not on the order in which shards deliver them


4.7 LimitOffset

  • Inputs: record stream
  • Outputs: the requested slice of the stream (skip offset records, then return at most N)
  • Deterministic: relies on the ordering established by the upstream Merge operator
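
As a small illustration, the slice boundaries can be computed from the stream length alone. The function name and signature are hypothetical; the sketch only shows the clamping needed when the offset or limit runs past the end of the stream.

```c
#include <stddef.h>

/* Computes the slice [*start, *start + returned count) of a stream of n
 * records for the given offset and limit. An offset past the end yields
 * an empty slice. */
size_t limit_offset(size_t n, size_t offset, size_t limit, size_t *start)
{
    if (offset >= n) {
        *start = n;
        return 0;
    }
    *start = offset;
    size_t available = n - offset;
    return limit < available ? limit : available;
}
```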

4.8 ShardDispatch & SIMDFilter

  • Inputs: parallel streams from shards

  • Outputs: unified stream

  • Ensures:

    • Deterministic merge order
    • SIMD acceleration for type/tag filters
    • Filters are advisory; the exact canonical check is applied downstream
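
The relationship between an advisory filter and the exact check can be illustrated with a scalar stand-in for the SIMD path. The `tag_hint` field (here, the low 8 bits of the type tag) is an assumption invented for this sketch; the point is only that the hint may admit false positives, and the exact comparison afterwards restores the canonical result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with a canonical tag and a cheap advisory hint. */
struct record {
    uint32_t type_tag;  /* canonical type tag */
    uint8_t  tag_hint;  /* assumed hint: low 8 bits of type_tag */
};

/* Advisory check: may accept a non-matching record, but never rejects a
 * matching one. A SIMD implementation would test many hints at once. */
static int hint_may_match(const struct record *r, uint32_t want)
{
    return r->tag_hint == (uint8_t)(want & 0xFFu);
}

/* Reference behavior: exact canonical check only. */
size_t filter_exact(const struct record *in, size_t n, uint32_t want,
                    struct record *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].type_tag == want)
            out[k++] = in[i];
    }
    return k;
}

/* Accelerated path: advisory hint first, exact check on the survivors.
 * The output is identical to filter_exact. */
size_t filter_with_hint(const struct record *in, size_t n, uint32_t want,
                        struct record *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (!hint_may_match(&in[i], want))
            continue;  /* cheap rejection */
        if (in[i].type_tag == want)
            out[k++] = in[i];
    }
    return k;
}
```

For example, type tags 42 and 298 share the hint value 42, so the advisory stage passes both and only the exact check removes 298.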

4.9 TombstoneShadow

  • Inputs: record stream

  • Outputs: visible records only

  • Logic:

    • For a given canonical key (artifact or TGK edge):

      • Keep only the version with the highest logseq ≤ snapshot
      • Drop earlier (shadowed) versions, and drop the key entirely if the surviving version is a tombstone
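
The visibility rule can be sketched with a simple quadratic pass, assuming each record carries a tombstone flag and that logseq values are unique per canonical key (neither detail is stated in the text, and the struct and function names are illustrative). A real implementation would more likely exploit sorted input or a hash table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with an explicit tombstone flag. */
struct record {
    uint64_t logseq;
    uint64_t canonical_key;
    int      is_tombstone;
};

/* Copies to out (capacity n) the records visible at `snapshot`: for each
 * canonical key, the version with the highest logseq <= snapshot, unless
 * that version is a tombstone. Input order is preserved. Returns the
 * number of records written. */
size_t tombstone_shadow(const struct record *in, size_t n,
                        uint64_t snapshot, struct record *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].logseq > snapshot)
            continue;  /* written after the snapshot */

        int is_latest = 1;
        for (size_t j = 0; j < n; j++) {
            if (j != i &&
                in[j].canonical_key == in[i].canonical_key &&
                in[j].logseq <= snapshot &&
                in[j].logseq > in[i].logseq) {
                is_latest = 0;  /* shadowed by a newer visible version */
                break;
            }
        }
        if (is_latest && !in[i].is_tombstone)
            out[k++] = in[i];
    }
    return k;
}
```

Note that a tombstone written after the snapshot has no effect: the same input can show a key as live at one snapshot and removed at a later one.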

5. Data Flow Example

Query: Find all artifacts of type 42 reachable via TGK edges of type 7 from node N0, depth 2.

Execution Plan:

SegmentScan(ASL segments) 
  → IndexFilter(type_tag=42) 
  → Merge

SegmentScan(TGK segments) 
  → IndexFilter(edge_type=7, from_node=N0)
  → TGKTraversal(depth=2)
  → TombstoneShadow
  → Merge

Merge(ASL results, TGK results)
  → Projection(artifact_id, tgk_edge_id, node_id)
  → Aggregation(COUNT)

  • Each operator preserves snapshot semantics
  • Deterministic order maintained throughout

6. Snapshot and Determinism Guarantees

  1. Segment visibility: logseq_min ≤ snapshot
  2. Record visibility: logseq ≤ snapshot
  3. Merge and traversal order: logseq ascending → canonical key
  4. Filters, SIMD, and sharding are performance mechanisms only; they must not alter output
  5. Tombstones guarantee no resurrection of removed records

7. Plan Serialization (Optional)

Execution plans can be serialized for:

  • Reuse across queries
  • Federation / distributed execution
  • Deterministic replay

Serialization format:

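#include <stdint.h>

struct exec_plan {
    uint32_t plan_version;
    uint32_t operator_count;  // number of DAG nodes
    uint32_t edge_count;      // number of DAG edges (assumed field; needed to delimit edges)
    // The payload follows the header in the serialized stream, because C
    // allows only one flexible array member per struct:
    //   struct operator_def  operators[operator_count];  // DAG nodes
    //   struct operator_edge edges[edge_count];          // DAG edges
};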

  • Each operator_def references type, parameters, projections, snapshot
  • Each operator_edge references upstream → downstream operators
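
For deterministic replay across machines, the fixed-size part of a plan needs a byte layout that does not depend on the host. The sketch below encodes and decodes a 12-byte header in little-endian order. The byte order, the `plan_header` type, and the explicit `edge_count` field are all assumptions made for illustration; the specification does not fix a wire format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size plan header. */
struct plan_header {
    uint32_t plan_version;
    uint32_t operator_count;
    uint32_t edge_count;  /* assumed field */
};

#define PLAN_HEADER_SIZE 12  /* three 32-bit fields */

static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes the header to buf. Returns PLAN_HEADER_SIZE on success, or 0 if
 * the buffer is too small. */
size_t plan_header_encode(const struct plan_header *h, uint8_t *buf, size_t cap)
{
    if (cap < PLAN_HEADER_SIZE)
        return 0;
    put_u32le(buf, h->plan_version);
    put_u32le(buf + 4, h->operator_count);
    put_u32le(buf + 8, h->edge_count);
    return PLAN_HEADER_SIZE;
}

/* Reads a header from buf. Returns 0 on success, -1 if buf is too short. */
int plan_header_decode(const uint8_t *buf, size_t len, struct plan_header *h)
{
    if (len < PLAN_HEADER_SIZE)
        return -1;
    h->plan_version = get_u32le(buf);
    h->operator_count = get_u32le(buf + 4);
    h->edge_count = get_u32le(buf + 8);
    return 0;
}
```

Encoding field by field avoids relying on struct padding or host endianness, so two hosts that serialize the same plan produce identical bytes.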

8. Normative Invariants

  1. Output is deterministic for a given snapshot
  2. The identity of core objects (artifacts, edges) is authoritative only in the CORE layers
  3. Filters and acceleration are advisory only
  4. DAG traversal respects the depth limit and the snapshot
  5. Aggregation is deterministic across parallel shards
  6. Execution plan serialization preserves all operator semantics

9. Summary

This specification formalizes:

  • Operators, data flow, DAG execution
  • Snapshot safety and determinism
  • Parallel acceleration (filters, SIMD, shards)
  • Unified handling of ASL artifacts and TGK edges

It allows formal reasoning, optimization, and distributed execution while maintaining the strict separation between core semantics and index/acceleration layers.


Next steps could include:

  • C-structs for operator definitions and execution plan serialization
  • Physical operator implementations mapping to ENC-ASL-TGK-INDEX
