Here’s a **formal draft of the execution plan specification** for the unified ASL + TGK query execution model. It defines operators, data flow, and snapshot semantics in a deterministic, layered way.

---

# Unified Execution Plan Specification (ASL + TGK)

---

## 1. Purpose

This specification formalizes **query execution plans** for:

* ASL artifacts (ENC-ASL-CORE)
* TGK edges (ENC-TGK-CORE)
* Merged index references (ENC-ASL-TGK-INDEX)

Goals:

1. Deterministic per snapshot (`logseq`)
2. Respect tombstones and shadowing
3. Leverage filters, sharding, SIMD acceleration
4. Support DAG traversals (TGK edges) and artifact projections
5. Enable formal planning and optimization

---

## 2. Execution Plan Structure

An execution plan `EP` is a **directed acyclic graph (DAG)** of **operators**:

```
EP = { nodes: [Op1, Op2, ...], edges: [(Op1→Op2), ...] }
```

### Node Properties

* `op_id`: unique operator ID
* `op_type`: see Operator Types (Section 3)
* `inputs`: references to upstream operators
* `outputs`: reference streams
* `constraints`: optional filtering conditions
* `snapshot`: upper `logseq` bound for visible records
* `projections`: requested fields
* `traversal_depth`: optional; maximum depth for TGK expansion
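The node properties above could be held in memory roughly as follows. This is a minimal sketch, not a layout the specification defines: `struct ep_node`, `EP_MAX_INPUTS`, and `ep_is_topologically_ordered` are hypothetical names, and the assumption that nodes are stored in topological order is one simple way to rule out cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EP_MAX_INPUTS 4  /* assumed fan-in limit, for illustration only */

/* Hypothetical in-memory operator node mirroring the listed properties. */
struct ep_node {
    uint32_t op_id;                  /* unique operator ID */
    uint32_t op_type;                /* operator type code (Section 3) */
    uint32_t inputs[EP_MAX_INPUTS];  /* op_ids of upstream operators */
    uint32_t input_count;
    uint64_t snapshot;               /* upper logseq bound */
    uint32_t traversal_depth;        /* 0 when unused */
};

/* Returns true if every input of a node refers to a node that appears
 * earlier in the array. A plan stored this way cannot contain a cycle. */
bool ep_is_topologically_ordered(const struct ep_node *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].input_count > EP_MAX_INPUTS)
            return false;
        for (uint32_t k = 0; k < nodes[i].input_count; k++) {
            bool found = false;
            for (size_t j = 0; j < i; j++) {
                if (nodes[j].op_id == nodes[i].inputs[k]) {
                    found = true;
                    break;
                }
            }
            if (!found)
                return false;
        }
    }
    return true;
}
```

A planner could run a check like this before execution and reject any plan whose `inputs` point forward or at unknown operators.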

---

## 3. Operator Types

| Operator          | Description                                                                             |
| ----------------- | --------------------------------------------------------------------------------------- |
| `SegmentScan`     | Scans a segment of ENC-ASL-TGK-INDEX, applies advisory filters                          |
| `IndexFilter`     | Applies canonical constraints (artifact type, edge type, role)                          |
| `Merge`           | Deterministically merges multiple streams (logseq ascending, canonical key tie-breaker) |
| `Projection`      | Selects output fields from index references                                             |
| `TGKTraversal`    | Expands TGK edges from node sets (depth-limited DAG traversal)                          |
| `Aggregation`     | Performs count, sum, union, or other aggregations                                       |
| `LimitOffset`     | Applies pagination or top-N selection                                                   |
| `ShardDispatch`   | Routes records from different shards in parallel, maintaining deterministic order       |
| `SIMDFilter`      | Parallel filter evaluation for routing keys or type tags                                |
| `TombstoneShadow` | Applies shadowing to remove tombstoned or overridden records                            |

---

## 4. Operator Semantics

### 4.1 SegmentScan

* Inputs: segment(s) of ENC-ASL-TGK-INDEX
* Outputs: raw record stream
* Steps:

  1. Select segments with `logseq_min ≤ snapshot`
  2. Apply **advisory filters** to skip records that cannot match (false positives are allowed; the exact check happens downstream)
  3. Return record references (`artifact_id`, `tgk_edge_id`)
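Steps 1 and 2 can be sketched as below. This is an illustration under stated assumptions: the spec does not say what form the advisory filter takes, so the per-segment `type_bitmap` (one bit per `type_tag % 64`) and the names `struct segment`, `segment_visible`, `segment_may_contain`, and `segment_scan_select` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor. */
struct segment {
    uint64_t logseq_min;   /* lowest logseq stored in the segment */
    uint64_t type_bitmap;  /* advisory: bit (type_tag % 64) set if the tag may occur */
};

/* Step 1: a segment is visible when logseq_min <= snapshot. */
bool segment_visible(const struct segment *s, uint64_t snapshot)
{
    return s->logseq_min <= snapshot;
}

/* Step 2: advisory check. It may report false positives (two tags can
 * share a bit) but never a false negative. */
bool segment_may_contain(const struct segment *s, uint32_t type_tag)
{
    return ((s->type_bitmap >> (type_tag % 64u)) & 1u) != 0;
}

/* Writes the indices of segments worth scanning into out_idx, in input
 * order, and returns how many were selected. */
size_t segment_scan_select(const struct segment *segs, size_t n,
                           uint64_t snapshot, uint32_t type_tag,
                           size_t *out_idx)
{
    size_t k = 0;
    assert(out_idx != NULL);
    for (size_t i = 0; i < n; i++) {
        if (segment_visible(&segs[i], snapshot) &&
            segment_may_contain(&segs[i], type_tag))
            out_idx[k++] = i;
    }
    return k;
}
```

Because the bitmap can only over-select, dropping it changes how much work is done but not which records survive the exact check in `IndexFilter`.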

---

### 4.2 IndexFilter

* Inputs: raw record stream
* Outputs: filtered stream
* Steps:

  1. Apply **canonical constraints**:

     * Artifact type tag
     * Edge type key, role
     * Node IDs for TGK edges
  2. Drop tombstoned or shadowed records
* Deterministic: the output depends only on the input stream and the constraints
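Step 1 amounts to an exact predicate applied to each record. The sketch below assumes a flat record with optional constraints; `struct index_record`, `struct constraints`, and both function names are hypothetical, and step 2 (tombstones and shadowing) is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index record; real field layouts come from the ENC-* specs. */
struct index_record {
    uint64_t logseq;
    uint32_t type_tag;   /* artifact type tag (artifact records) */
    uint32_t edge_type;  /* edge type key (edge records) */
    uint64_t from_node;  /* source node ID (edge records) */
};

/* Each constraint is optional; an unset constraint matches everything. */
struct constraints {
    bool has_type_tag;  uint32_t type_tag;
    bool has_edge_type; uint32_t edge_type;
    bool has_from_node; uint64_t from_node;
};

/* Exact (canonical) check of one record against all set constraints. */
bool index_filter_match(const struct index_record *r,
                        const struct constraints *c)
{
    if (c->has_type_tag && r->type_tag != c->type_tag)
        return false;
    if (c->has_edge_type && r->edge_type != c->edge_type)
        return false;
    if (c->has_from_node && r->from_node != c->from_node)
        return false;
    return true;
}

/* Copies matching records to out in input order; returns the count. */
size_t index_filter(const struct index_record *in, size_t n,
                    const struct constraints *c, struct index_record *out)
{
    size_t k = 0;
    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (index_filter_match(&in[i], c))
            out[k++] = in[i];
    }
    return k;
}
```

The filter is stable: it never reorders records, so any ordering established upstream is preserved.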

---

### 4.3 Merge

* Inputs: multiple streams
* Outputs: merged stream
* Sort order:

  1. logseq ascending
  2. canonical key tie-breaker
* Deterministic, regardless of input shard order
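The sort order above maps directly onto a comparator plus a standard two-way merge. The following is a sketch only: `struct rec`, `rec_cmp`, and `merge_two` are hypothetical names, the canonical key is assumed to fit in a `uint64_t`, and each input is assumed to be already sorted by the same order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record reduced to the two ordering fields. */
struct rec {
    uint64_t logseq;
    uint64_t canonical_id;
};

/* Orders by logseq ascending, then by canonical ID as the tie-breaker. */
int rec_cmp(const struct rec *a, const struct rec *b)
{
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq ? -1 : 1;
    if (a->canonical_id != b->canonical_id)
        return a->canonical_id < b->canonical_id ? -1 : 1;
    return 0;
}

/* Merges two sorted streams into out (capacity na + nb); returns the
 * number of records written. */
size_t merge_two(const struct rec *a, size_t na,
                 const struct rec *b, size_t nb, struct rec *out)
{
    size_t i = 0, j = 0, k = 0;
    assert(out != NULL);
    while (i < na && j < nb)
        out[k++] = (rec_cmp(&a[i], &b[j]) <= 0) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Because the comparator is a total order on `(logseq, canonical_id)`, `merge_two(a, b)` and `merge_two(b, a)` produce the same key sequence, which is what makes the result independent of shard arrival order.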

---

### 4.4 Projection

* Inputs: record stream
* Outputs: projected fields
* Steps:

  * Select requested fields (`artifact_id`, `tgk_edge_id`, `node_id`, type tags)
  * Preserve order

---

### 4.5 TGKTraversal

* Inputs: node set or TGK edge references
* Outputs: expanded TGK edge references (DAG traversal)
* Parameters:

  * `depth`: max recursion depth
  * `snapshot`: logseq cutoff
  * `direction`: follow edges from or to the input nodes
* Deterministic traversal:

  * logseq ascending per edge
  * canonical key tie-breaker
* Optional projection of downstream nodes or artifacts
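A depth-limited expansion with these parameters might look like the sketch below. It makes several assumptions that are not part of the specification: node IDs are small integers below `MAX_NODES`, the edge array is already sorted by `(logseq, edge_id)`, only the "from" direction is followed, and the names `struct tgk_edge` and `tgk_traverse` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NODES 64  /* assumed small node-ID space, for illustration */

struct tgk_edge {
    uint64_t logseq;
    uint64_t edge_id;
    uint32_t from;
    uint32_t to;
};

/* Breadth-first expansion from `start`, at most `depth` levels, using
 * only edges with logseq <= snapshot. Edge IDs are written to
 * out_edge_ids level by level; within a level they follow the sorted
 * input order. Returns the number of edge IDs written. */
size_t tgk_traverse(const struct tgk_edge *edges, size_t n_edges,
                    uint32_t start, uint32_t depth, uint64_t snapshot,
                    uint64_t *out_edge_ids, size_t out_cap)
{
    bool visited[MAX_NODES] = { false };
    uint32_t frontier[MAX_NODES], next[MAX_NODES];
    size_t n_front = 0, n_out = 0;

    assert(start < MAX_NODES);
    visited[start] = true;
    frontier[n_front++] = start;

    for (uint32_t d = 0; d < depth && n_front > 0; d++) {
        size_t n_next = 0;
        for (size_t f = 0; f < n_front; f++) {
            for (size_t e = 0; e < n_edges; e++) {
                const struct tgk_edge *ed = &edges[e];
                if (ed->from != frontier[f] || ed->logseq > snapshot)
                    continue;
                if (ed->to >= MAX_NODES)
                    continue;  /* out of range for this sketch */
                if (n_out < out_cap)
                    out_edge_ids[n_out++] = ed->edge_id;
                if (!visited[ed->to]) {
                    visited[ed->to] = true;  /* each node expands once */
                    next[n_next++] = ed->to;
                }
            }
        }
        memcpy(frontier, next, n_next * sizeof next[0]);
        n_front = n_next;
    }
    return n_out;
}
```

The linear scan over all edges per frontier node keeps the sketch short; a real operator would presumably use the index to look up outgoing edges directly.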

---

### 4.6 Aggregation

* Inputs: record stream
* Outputs: aggregated result
* Examples:

  * `COUNT(*)`, `UNION`, `SUM(type_tag)`
* Deterministic: preserves snapshot and logseq ordering

---

### 4.7 LimitOffset

* Inputs: record stream
* Outputs: top-N slice
* Deterministic: ordering from upstream merge operator
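Since ordering is inherited from the upstream merge, the operator itself reduces to a bounds-checked slice. A minimal sketch, assuming the stream is an array of `uint64_t` record references (the function name `limit_offset` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies up to `limit` records starting at `offset` from an already
 * ordered stream into out; returns the number of records copied.
 * An offset past the end yields an empty slice. */
size_t limit_offset(const uint64_t *in, size_t n,
                    size_t offset, size_t limit, uint64_t *out)
{
    assert(in != NULL || n == 0);
    if (offset >= n)
        return 0;
    size_t count = (n - offset < limit) ? n - offset : limit;
    memcpy(out, in + offset, count * sizeof in[0]);
    return count;
}
```

The slice is only as deterministic as its input order, which is why the operator is specified to sit downstream of `Merge`.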

---

### 4.8 ShardDispatch & SIMDFilter

* Inputs: parallel streams from shards
* Outputs: unified stream
* Ensures:

  * Deterministic merge order
  * SIMD acceleration for type/tag filters
  * Filters are advisory; exact canonical check downstream
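The split between an advisory filter and the exact downstream check can be illustrated as follows. This sketch uses no explicit SIMD intrinsics: the first loop is branch-free so that a compiler may auto-vectorize it. The 8-bit routing hash (`type_tag & 0xFF`) and the names `advisory_mask` and `exact_filter` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advisory stage: compare an 8-bit routing hash per record. The loop has
 * no branches or dependencies between iterations, so it is a candidate
 * for auto-vectorization. Distinct tags can share a hash, so the mask
 * may contain false positives. */
void advisory_mask(const uint8_t *tag_hash, size_t n,
                   uint8_t wanted, uint8_t *mask)
{
    assert(mask != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        mask[i] = (uint8_t)(tag_hash[i] == wanted);
}

/* Exact stage: re-check the full type tag for every candidate and write
 * the indices of true matches to out_idx in input order. */
size_t exact_filter(const uint32_t *type_tag, const uint8_t *mask,
                    size_t n, uint32_t wanted, size_t *out_idx)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (mask[i] && type_tag[i] == wanted)
            out_idx[k++] = i;
    }
    return k;
}
```

With tags `{42, 298, 7, 42}` the advisory stage also flags `298` (same low byte as `42`), and the exact stage removes it again, so the final result is the same as if the advisory stage had been skipped.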

---

### 4.9 TombstoneShadow

* Inputs: record stream
* Outputs: visible records only
* Logic:

  * For a given canonical key (artifact or TGK edge):

    * Keep only the version with the highest `logseq ≤ snapshot`
    * Remove shadowed versions, and drop the key entirely if that latest version is a tombstone
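The per-key logic can be sketched as a single pass over records grouped by key. The grouping is an assumption of this example: the input is taken to be sorted by `(key, logseq)`, and `struct vrec` and `tombstone_shadow` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical versioned record. */
struct vrec {
    uint64_t key;       /* canonical key (artifact or TGK edge) */
    uint64_t logseq;
    bool tombstone;     /* true if this version deletes the key */
};

/* Input must be sorted by (key, logseq) ascending. For each key, the
 * latest version with logseq <= snapshot wins; it is emitted only if it
 * is not a tombstone. Returns the number of records written to out. */
size_t tombstone_shadow(const struct vrec *in, size_t n,
                        uint64_t snapshot, struct vrec *out)
{
    size_t k = 0, i = 0;
    assert(in != NULL || n == 0);
    while (i < n) {
        const struct vrec *latest = NULL;
        uint64_t key = in[i].key;
        for (; i < n && in[i].key == key; i++) {
            if (in[i].logseq <= snapshot)
                latest = &in[i];  /* later group entries have higher logseq */
        }
        if (latest != NULL && !latest->tombstone)
            out[k++] = *latest;
    }
    return k;
}
```

The output of this sketch is ordered by key rather than by `logseq`, so a plan built on it would need a downstream sort or `Merge` to restore the canonical order.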

---

## 5. Data Flow Example

**Query:** Find all artifacts of type `42` reachable via TGK edges of type `7` from node `N0`, depth 2.

Execution Plan:

```
SegmentScan(ASL segments)
    → IndexFilter(type_tag=42)
    → Merge

SegmentScan(TGK segments)
    → IndexFilter(edge_type=7, from_node=N0)
    → TGKTraversal(depth=2)
    → TombstoneShadow
    → Merge

Merge(ASL results, TGK results)
    → Projection(artifact_id, tgk_edge_id, node_id)
    → Aggregation(COUNT)
```

* Each operator preserves **snapshot semantics**
* Deterministic order maintained throughout

---

## 6. Snapshot and Determinism Guarantees

1. **Segment visibility**: `logseq_min ≤ snapshot`
2. **Record visibility**: `logseq ≤ snapshot`
3. **Merge and traversal order**: logseq ascending → canonical key
4. Filters, SIMD, and sharding **cannot alter output**
5. Tombstones guarantee no resurrection of removed records

---

## 7. Plan Serialization (Optional)

Execution plans can be serialized for:

* Reuse across queries
* Federation / distributed execution
* Deterministic replay

Serialization format:
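
```c
struct exec_plan {
    uint32_t plan_version;
    uint32_t operator_count;          // number of DAG nodes
    uint32_t edge_count;              // number of DAG edges
    struct operator_def operators[];  // DAG nodes (flexible array member)
    // operator_edge edges[edge_count] follow operators[] in the serialized
    // buffer; C allows only one flexible array member, and it must be last
};
```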

* Each `operator_def` references type, parameters, projections, snapshot
* Each `operator_edge` references upstream → downstream operators
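The two record types are not defined in this specification. One possible shape, shown purely as a sketch, uses fixed-size records so that the serialized size follows from the counts alone. Every field and name below (`projection_mask`, `struct exec_plan_header`, `edge_count`, `exec_plan_serialized_size`) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size operator record. */
struct operator_def {
    uint32_t op_id;
    uint32_t op_type;          /* operator type code (Section 3) */
    uint64_t snapshot;         /* upper logseq bound */
    uint32_t projection_mask;  /* assumed: one bit per requested field */
    uint32_t traversal_depth;  /* 0 when unused */
};

/* Hypothetical DAG edge: upstream feeds downstream. */
struct operator_edge {
    uint32_t upstream_op_id;
    uint32_t downstream_op_id;
};

/* Hypothetical header placed in front of both arrays. */
struct exec_plan_header {
    uint32_t plan_version;
    uint32_t operator_count;
    uint32_t edge_count;
    uint32_t reserved;         /* keeps the header a multiple of 8 bytes */
};

/* Total bytes for header + operator array + edge array. */
size_t exec_plan_serialized_size(const struct exec_plan_header *h)
{
    assert(h != NULL);
    return sizeof *h
         + (size_t)h->operator_count * sizeof(struct operator_def)
         + (size_t)h->edge_count * sizeof(struct operator_edge);
}
```

Writing structs to disk or the wire verbatim also fixes byte order and padding, so a portable format would need to pin those down explicitly; this sketch does not address that.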

---

## 8. Normative Invariants

1. Deterministic output per snapshot
2. Core objects (artifact/edge) identity authoritative from CORE layers
3. Filters and acceleration advisory only
4. DAG traversal respects depth and snapshot
5. Aggregation deterministic across parallel shards
6. Execution plan serialization preserves all operator semantics

---

## 9. Summary

This specification formalizes:

* **Operators**, **data flow**, **DAG execution**
* **Snapshot safety** and **determinism**
* **Parallel acceleration** (filters, SIMD, shards)
* **Unified handling** of ASL artifacts and TGK edges

It allows **formal reasoning, optimization, and distributed execution** while maintaining the strict separation between core semantics and index/acceleration layers.

---

Next steps could include:

* **C-structs for operator definitions and execution plan serialization**
* **Physical operator implementations mapping to ENC-ASL-TGK-INDEX**

Do you want me to draft those C-struct definitions next?