Carl Niklas Rydberg 1d552bd46a Added some notes that needs to be analyzed.

2026-01-17 00:19:49 +01:00


This document is a formal draft of the execution plan specification for the unified ASL + TGK query execution model. It defines operators, data flow, and snapshot semantics in a deterministic, layered way.

Unified Execution Plan Specification (ASL + TGK)

1. Purpose

This specification formalizes query execution plans for:

ASL artifacts (ENC-ASL-CORE)
TGK edges (ENC-TGK-CORE)
Merged index references (ENC-ASL-TGK-INDEX)

Goals:

Deterministic per snapshot (logseq)
Respect tombstones and shadowing
Leverage filters, sharding, SIMD acceleration
Support DAG traversals (TGK edges) and artifact projections
Enable formal planning and optimization

2. Execution Plan Structure

An execution plan EP is a directed acyclic graph (DAG) of operators:

EP = { nodes: [Op1, Op2, ...], edges: [(Op1→Op2), ...] }

Node Properties

op_id: unique operator ID
op_type: see Operator Types (Section 3)
inputs: references to upstream operators
outputs: reference streams
constraints: optional filtering conditions
snapshot: logseq limit
projections: requested fields
traversal_depth: optional for TGK expansion
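
Because a plan must be acyclic, a loader could reject malformed plans before execution. The following is a minimal sketch of such a check using Kahn's algorithm. The names `ep_edge`, `ep_is_acyclic`, and the bound `EP_MAX_OPS` are illustrative assumptions and are not defined by this specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EP_MAX_OPS 64  /* illustrative bound, not from the specification */

/* Hypothetical edge record: the upstream operator feeds the downstream one. */
struct ep_edge {
    uint32_t from_op;  /* index of the upstream operator */
    uint32_t to_op;    /* index of the downstream operator */
};

/* Returns true if the edges over op_count operators form a DAG.
 * Kahn's algorithm: repeatedly remove operators with no pending inputs. */
bool ep_is_acyclic(uint32_t op_count, const struct ep_edge *edges, size_t edge_count)
{
    uint32_t indegree[EP_MAX_OPS] = {0};
    uint32_t ready[EP_MAX_OPS];
    size_t ready_len = 0, visited = 0;

    if (op_count > EP_MAX_OPS)
        return false;
    for (size_t i = 0; i < edge_count; i++) {
        if (edges[i].from_op >= op_count || edges[i].to_op >= op_count)
            return false;  /* edge references an unknown operator */
        indegree[edges[i].to_op]++;
    }
    for (uint32_t op = 0; op < op_count; op++)
        if (indegree[op] == 0)
            ready[ready_len++] = op;

    while (ready_len > 0) {
        uint32_t op = ready[--ready_len];
        visited++;
        for (size_t i = 0; i < edge_count; i++) {
            if (edges[i].from_op == op && --indegree[edges[i].to_op] == 0)
                ready[ready_len++] = edges[i].to_op;
        }
    }
    return visited == op_count;  /* leftover operators imply a cycle */
}
```

The quadratic edge scan keeps the sketch short; a real planner would more likely index edges by upstream operator.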

3. Operator Types

Operator	Description
`SegmentScan`	Scans a segment of ENC-ASL-TGK-INDEX, applies advisory filters
`IndexFilter`	Applies canonical constraints (artifact type, edge type, role)
`Merge`	Deterministically merges multiple streams (logseq ascending, canonical key tie-breaker)
`Projection`	Selects output fields from index references
`TGKTraversal`	Expands TGK edges from node sets (depth-limited DAG traversal)
`Aggregation`	Performs count, sum, union, or other aggregations
`LimitOffset`	Applies pagination or top-N selection
`ShardDispatch`	Routes records from different shards in parallel, maintaining deterministic order
`SIMDFilter`	Parallel filter evaluation for routing keys or type tags
`TombstoneShadow`	Applies shadowing to remove tombstoned or overridden records

4. Operator Semantics

4.1 SegmentScan

Inputs: segment(s) of ENC-ASL-TGK-INDEX
Outputs: raw record stream
Steps:
1. Select segments with logseq_min ≤ snapshot
2. Apply advisory filters to skip records that cannot match (false positives are allowed; the exact canonical check happens downstream)
3. Return record references (artifact_id, tgk_edge_id)
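
The visibility part of these steps can be sketched as follows. The struct layouts and the names `index_record`, `segment`, `segment_visible`, and `segment_scan` are assumptions made for illustration; the actual ENC-ASL-TGK-INDEX encoding is defined elsewhere. Advisory filtering (step 2) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory shapes; field names are illustrative. */
struct index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint64_t tgk_edge_id;
};

struct segment {
    uint64_t logseq_min;                 /* smallest logseq in the segment */
    const struct index_record *records;
    size_t record_count;
};

/* Step 1: a segment is a candidate only if it can hold visible records. */
bool segment_visible(const struct segment *seg, uint64_t snapshot)
{
    return seg->logseq_min <= snapshot;
}

/* Steps 1 and 3: emit references to records with logseq <= snapshot.
 * Returns the number of references written to out (at most out_cap). */
size_t segment_scan(const struct segment *segs, size_t seg_count, uint64_t snapshot,
                    const struct index_record **out, size_t out_cap)
{
    size_t n = 0;
    for (size_t s = 0; s < seg_count; s++) {
        if (!segment_visible(&segs[s], snapshot))
            continue;  /* whole segment is newer than the snapshot */
        for (size_t r = 0; r < segs[s].record_count && n < out_cap; r++) {
            if (segs[s].records[r].logseq <= snapshot)
                out[n++] = &segs[s].records[r];
        }
    }
    return n;
}
```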

4.2 IndexFilter

Inputs: raw record stream
Outputs: filtered stream
Steps:
1. Apply canonical constraints:
  - Artifact type tag
  - Edge type key, role
  - Node IDs for TGK edges
2. Drop tombstoned or shadowed records
Deterministic: the same input stream and constraints always yield the same output
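
A canonical constraint check for TGK edges might look like the sketch below. The wildcard `MATCH_ANY`, the struct fields, and the function `edge_matches` are hypothetical; the specification lists the constraint kinds but not their encoding.

```c
#include <stdbool.h>
#include <stdint.h>

#define MATCH_ANY UINT32_MAX  /* hypothetical wildcard for an unset constraint */

struct tgk_edge_rec {
    uint32_t edge_type;
    uint32_t role;
    uint64_t from_node;
    uint64_t to_node;
};

struct edge_constraints {
    uint32_t edge_type;   /* MATCH_ANY accepts every edge type */
    uint32_t role;        /* MATCH_ANY accepts every role */
    bool     has_from;    /* when true, from_node must match */
    uint64_t from_node;
};

/* Exact (non-advisory) check: every set constraint must hold. */
bool edge_matches(const struct tgk_edge_rec *e, const struct edge_constraints *c)
{
    if (c->edge_type != MATCH_ANY && e->edge_type != c->edge_type) return false;
    if (c->role != MATCH_ANY && e->role != c->role) return false;
    if (c->has_from && e->from_node != c->from_node) return false;
    return true;
}
```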

4.3 Merge

Inputs: multiple streams
Outputs: merged stream
Sort order:
1. logseq ascending
2. canonical ID tie-breaker
Deterministic, regardless of input shard order
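
The sort order above amounts to a total order on records, which is what makes the merge independent of shard arrival order. A minimal sketch for two pre-sorted streams follows. The names `rec_ref`, `rec_cmp`, and `merge_two` are illustrative, and ascending canonical ID as the tie-break direction is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

struct rec_ref {
    uint64_t logseq;
    uint64_t canonical_id;
};

/* Total order: logseq ascending, then canonical ID ascending. */
int rec_cmp(const struct rec_ref *a, const struct rec_ref *b)
{
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq ? -1 : 1;
    if (a->canonical_id != b->canonical_id)
        return a->canonical_id < b->canonical_id ? -1 : 1;
    return 0;
}

/* Merges two streams that are each already sorted by rec_cmp.
 * out must have room for a_len + b_len entries. */
size_t merge_two(const struct rec_ref *a, size_t a_len,
                 const struct rec_ref *b, size_t b_len,
                 struct rec_ref *out)
{
    size_t i = 0, j = 0, n = 0;
    while (i < a_len && j < b_len)
        out[n++] = (rec_cmp(&a[i], &b[j]) <= 0) ? a[i++] : b[j++];
    while (i < a_len) out[n++] = a[i++];
    while (j < b_len) out[n++] = b[j++];
    return n;
}
```

Swapping the two inputs yields the same key sequence, since ties are resolved by the key itself and not by stream position.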

4.4 Projection

Inputs: record stream
Outputs: projected fields
Steps:
- Select requested fields (artifact_id, tgk_edge_id, node_id, type tags)
- Preserve order

4.5 TGKTraversal

Inputs: node set or TGK edge references
Outputs: expanded TGK edge references (DAG traversal)
Parameters:
- depth: max recursion depth
- snapshot: logseq cutoff
- direction: from/to
Deterministic traversal:
- logseq ascending per edge
- canonical key tie-breaker
Optional projection of downstream nodes or artifacts
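
One way to realize a depth-limited, snapshot-bounded expansion is a level-by-level breadth-first walk, sketched below for the "from" direction only. The edge layout, the capacity `TRAV_MAX`, and the function `tgk_traverse` are assumptions. The sketch also assumes the edge array is already sorted by (logseq, edge_id), so that each level is emitted in the deterministic order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRAV_MAX 128  /* illustrative capacity for nodes and emitted edges */

struct tgk_edge {
    uint64_t logseq;
    uint64_t edge_id;
    uint64_t from_node;
    uint64_t to_node;
};

static bool contains(const uint64_t *set, size_t len, uint64_t v)
{
    for (size_t i = 0; i < len; i++)
        if (set[i] == v) return true;
    return false;
}

/* Forward breadth-first expansion from start, at most depth levels deep.
 * Edges with logseq > snapshot are ignored. out must hold TRAV_MAX IDs.
 * Returns the number of edge IDs written to out. */
size_t tgk_traverse(const struct tgk_edge *edges, size_t edge_count,
                    uint64_t start, uint32_t depth, uint64_t snapshot,
                    uint64_t *out)
{
    uint64_t seen[TRAV_MAX];   /* nodes reached so far, in discovery order */
    size_t seen_len = 0, out_len = 0;
    size_t level_begin = 0;    /* seen[level_begin..level_end) is the frontier */

    seen[seen_len++] = start;
    for (uint32_t d = 0; d < depth; d++) {
        size_t level_end = seen_len;
        if (level_begin == level_end) break;  /* frontier is empty */
        for (size_t i = 0; i < edge_count; i++) {
            const struct tgk_edge *e = &edges[i];
            if (e->logseq > snapshot) continue;
            if (!contains(seen + level_begin, level_end - level_begin, e->from_node))
                continue;
            if (out_len < TRAV_MAX) out[out_len++] = e->edge_id;
            if (!contains(seen, seen_len, e->to_node) && seen_len < TRAV_MAX)
                seen[seen_len++] = e->to_node;
        }
        level_begin = level_end;
    }
    return out_len;
}
```

Each node enters the frontier once, so each edge is emitted at most once.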

4.6 Aggregation

Inputs: record stream
Outputs: aggregated result
Examples:
- COUNT(*), UNION, SUM(type_tag)
Deterministic: preserves snapshot and logseq ordering
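
For COUNT and SUM, determinism across shards can come from combining per-shard partial results with integer addition, which does not depend on the order of combination. The sketch below illustrates this; `partial_agg`, `agg_shard`, and `agg_combine` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct partial_agg {
    uint64_t count;
    uint64_t sum;
};

/* Aggregates the values of a single shard. */
struct partial_agg agg_shard(const uint64_t *values, size_t n)
{
    struct partial_agg p = {0, 0};
    for (size_t i = 0; i < n; i++) {
        p.count++;
        p.sum += values[i];
    }
    return p;
}

/* Combines two partial results; unsigned addition is commutative and
 * associative, so the shard order does not affect the final result. */
struct partial_agg agg_combine(struct partial_agg a, struct partial_agg b)
{
    struct partial_agg r = { a.count + b.count, a.sum + b.sum };
    return r;
}
```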

4.7 LimitOffset

Inputs: record stream
Outputs: the requested slice of the stream (offset applied first, then limit)
Deterministic: relies on the ordering established by the upstream Merge operator
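
The slice bounds can be computed as in the sketch below, which clamps the offset and limit to the stream length. The function name `limit_offset` and its signature are illustrative.

```c
#include <stddef.h>

/* Computes the [begin, end) window of a stream of length len for the given
 * offset and limit, clamping both to the stream bounds. */
void limit_offset(size_t len, size_t offset, size_t limit,
                  size_t *begin, size_t *end)
{
    *begin = offset < len ? offset : len;
    size_t remaining = len - *begin;
    *end = *begin + (limit < remaining ? limit : remaining);
}
```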

4.8 ShardDispatch & SIMDFilter

Inputs: parallel streams from shards
Outputs: unified stream
Ensures:
- Deterministic merge order
- SIMD acceleration for type/tag filters
- Filters are advisory; exact canonical check downstream
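
The advisory contract can be illustrated with a lossy per-segment summary: it may report false positives but never false negatives, so the exact check downstream still decides the result. The sketch below is a scalar stand-in and does not use SIMD. The 64-bit summary and all function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per (type_tag mod 64): a hypothetical, lossy segment summary. */
uint64_t tag_bit(uint32_t type_tag)
{
    return UINT64_C(1) << (type_tag % 64);
}

uint64_t build_summary(const uint32_t *tags, size_t n)
{
    uint64_t summary = 0;
    for (size_t i = 0; i < n; i++)
        summary |= tag_bit(tags[i]);
    return summary;
}

/* Advisory: false means "definitely absent", true means "maybe present". */
bool maybe_contains(uint64_t summary, uint32_t type_tag)
{
    return (summary & tag_bit(type_tag)) != 0;
}

/* Exact canonical check, run on whatever the advisory filter lets through. */
size_t count_exact(const uint32_t *tags, size_t n, uint32_t type_tag)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (tags[i] == type_tag) hits++;
    return hits;
}
```

Tags 42 and 106 share a bit in this summary, so a segment holding only tag 42 passes the advisory test for 106 and is then rejected by the exact check.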

4.9 TombstoneShadow

Inputs: record stream
Outputs: visible records only
Logic:
- For a given canonical key (artifact or TGK edge):
  - Keep only the latest version with logseq ≤ snapshot
  - Drop older (shadowed) versions; if the latest version is a tombstone, drop the key entirely
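
The shadowing logic can be sketched as follows. The `tombstone` flag, the record layout, and the function `tombstone_shadow` are assumptions. The quadratic scan is chosen for brevity, and (key, logseq) pairs are assumed to be unique.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct versioned_rec {
    uint64_t canonical_key;
    uint64_t logseq;
    bool     tombstone;   /* hypothetical flag marking a deletion record */
};

/* Writes the visible records to out (capacity >= n) and returns their count.
 * A record is visible if it is the newest version of its key at or below
 * the snapshot and is not a tombstone. Input order is preserved. */
size_t tombstone_shadow(const struct versioned_rec *in, size_t n,
                        uint64_t snapshot, struct versioned_rec *out)
{
    size_t out_len = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].logseq > snapshot) continue;       /* not yet visible */
        bool shadowed = false;
        for (size_t j = 0; j < n && !shadowed; j++) {
            shadowed = in[j].canonical_key == in[i].canonical_key &&
                       in[j].logseq <= snapshot &&
                       in[j].logseq > in[i].logseq;  /* newer version exists */
        }
        if (!shadowed && !in[i].tombstone)
            out[out_len++] = in[i];
    }
    return out_len;
}
```

A key whose newest visible version is a tombstone yields no output at that snapshot, while an earlier snapshot still sees the version that was live at the time.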

5. Data Flow Example

Query: Find all artifacts of type 42 reachable from node N0 via TGK edges of type 7, up to depth 2.

Execution Plan:

SegmentScan(ASL segments) 
  → IndexFilter(type_tag=42) 
  → Merge

SegmentScan(TGK segments) 
  → IndexFilter(edge_type=7, from_node=N0)
  → TGKTraversal(depth=2)
  → TombstoneShadow
  → Merge

Merge(ASL results, TGK results)
  → Projection(artifact_id, tgk_edge_id, node_id)
  → Aggregation(COUNT)

Each operator preserves snapshot semantics
Deterministic order maintained throughout

6. Snapshot and Determinism Guarantees

Segment visibility: logseq_min ≤ snapshot
Record visibility: logseq ≤ snapshot
Merge and traversal order: logseq ascending → canonical key
Filters, SIMD, and sharding must not alter output; they affect performance only
Tombstones guarantee no resurrection of removed records

7. Plan Serialization (Optional)

Execution plans can be serialized for:

Reuse across queries
Federation / distributed execution
Deterministic replay

Serialization format:

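struct exec_plan {
    uint32_t plan_version;
    uint32_t operator_count;
    uint32_t edge_count;      // assumed field: delimits the edge array
    // C permits only one flexible array member, so both arrays share a payload:
    // operator_count operator_def records (DAG nodes),
    // followed by edge_count operator_edge records (DAG edges).
    uint8_t  payload[];
};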

Each operator_def records the operator's type, parameters, projections, and snapshot
Each operator_edge links an upstream operator to a downstream operator
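
The specification names `operator_def` and `operator_edge` without defining their fields. The sketch below shows one possible fixed-size layout and the resulting serialized size. Every field, the `plan_header` struct, and the `projection_mask` encoding are assumptions made for illustration, not the defined format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size records; fields are assumed, not specified. */
struct operator_def {
    uint32_t op_id;
    uint32_t op_type;          /* code for SegmentScan, Merge, ... */
    uint64_t snapshot;         /* logseq limit */
    uint32_t projection_mask;  /* one bit per requested field (assumed encoding) */
    uint32_t traversal_depth;  /* 0 when the operator is not a TGKTraversal */
};

struct operator_edge {
    uint32_t from_op_id;       /* upstream operator */
    uint32_t to_op_id;         /* downstream operator */
};

struct plan_header {
    uint32_t plan_version;
    uint32_t operator_count;
    uint32_t edge_count;       /* assumed; delimits the edge array */
    uint32_t reserved;         /* keeps the records that follow 8-byte aligned */
};

/* Bytes needed for header + operators + edges laid out back to back. */
size_t plan_serialized_size(uint32_t operator_count, uint32_t edge_count)
{
    return sizeof(struct plan_header)
         + (size_t)operator_count * sizeof(struct operator_def)
         + (size_t)edge_count * sizeof(struct operator_edge);
}
```

A portable on-disk or wire format would additionally need to fix byte order and padding explicitly; this sketch relies on the compiler's native layout.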

8. Normative Invariants

Deterministic output per snapshot
Identity of core objects (artifacts, edges) is authoritative only from the CORE layers
Filters and acceleration advisory only
DAG traversal respects depth and snapshot
Aggregation deterministic across parallel shards
Execution plan serialization preserves all operator semantics

9. Summary

This specification formalizes:

Operators, data flow, DAG execution
Snapshot safety and determinism
Parallel acceleration (filters, SIMD, shards)
Unified handling of ASL artifacts and TGK edges

It allows formal reasoning, optimization, and distributed execution while maintaining the strict separation between core semantics and index/acceleration layers.

Next steps could include:

C-structs for operator definitions and execution plan serialization
Physical operator implementations mapping to ENC-ASL-TGK-INDEX

