Here’s a **formal draft of the execution plan specification** for the unified ASL + TGK query execution model. It defines operators, data flow, and snapshot semantics in a deterministic, layered way.

---

# Unified Execution Plan Specification (ASL + TGK)

---

## 1. Purpose

This specification formalizes **query execution plans** for:

* ASL artifacts (ENC-ASL-CORE)
* TGK edges (ENC-TGK-CORE)
* Merged index references (ENC-ASL-TGK-INDEX)

Goals:

1. Deterministic per snapshot (`logseq`)
2. Respect tombstones and shadowing
3. Leverage filters, sharding, SIMD acceleration
4. Support DAG traversals (TGK edges) and artifact projections
5. Enable formal planning and optimization

---

## 2. Execution Plan Structure

An execution plan `EP` is a **directed acyclic graph (DAG)** of **operators**:

```
EP = { nodes: [Op1, Op2, ...], edges: [(Op1→Op2), ...] }
```

### Node Properties

* `op_id`: unique operator ID
* `op_type`: see Operator Types (Section 3)
* `inputs`: references to upstream operators
* `outputs`: reference streams
* `constraints`: optional filtering conditions
* `snapshot`: upper `logseq` bound; records with `logseq > snapshot` are not visible
* `projections`: requested fields
* `traversal_depth`: optional; maximum depth for TGK expansion
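
Because `EP` must be acyclic, a planner would probably want to validate the edge list before execution. The following is a minimal sketch of such a check using Kahn's algorithm. The names `ep_edge`, `ep_is_acyclic`, and the `EP_MAX_OPS` bound are illustrative and not defined by this specification, and operators are assumed to be identified by their index in `nodes`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EP_MAX_OPS 64  /* illustrative bound, not part of the spec */

/* Hypothetical edge record: data flows from operator `from` to operator `to`. */
struct ep_edge { uint32_t from, to; };

/* Returns true if the edges over `n_ops` operators form a DAG.
 * Kahn's algorithm: repeatedly retire operators with no pending inputs. */
bool ep_is_acyclic(uint32_t n_ops, const struct ep_edge *edges, size_t n_edges)
{
    uint32_t indeg[EP_MAX_OPS] = {0};
    uint32_t ready[EP_MAX_OPS];
    uint32_t n_ready = 0, n_done = 0;

    if (n_ops > EP_MAX_OPS) return false;
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].from >= n_ops || edges[i].to >= n_ops) return false;
        indeg[edges[i].to]++;
    }
    for (uint32_t op = 0; op < n_ops; op++)
        if (indeg[op] == 0) ready[n_ready++] = op;

    while (n_ready > 0) {
        uint32_t op = ready[--n_ready];
        n_done++;
        for (size_t i = 0; i < n_edges; i++)
            if (edges[i].from == op && --indeg[edges[i].to] == 0)
                ready[n_ready++] = edges[i].to;
    }
    return n_done == n_ops;  /* operators left over means there is a cycle */
}
```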

---

## 3. Operator Types

| Operator          | Description                                                                             |
| ----------------- | --------------------------------------------------------------------------------------- |
| `SegmentScan`     | Scans a segment of ENC-ASL-TGK-INDEX, applies advisory filters                          |
| `IndexFilter`     | Applies canonical constraints (artifact type, edge type, role)                          |
| `Merge`           | Deterministically merges multiple streams (logseq ascending, canonical key tie-breaker) |
| `Projection`      | Selects output fields from index references                                             |
| `TGKTraversal`    | Expands TGK edges from node sets (depth-limited DAG traversal)                          |
| `Aggregation`     | Performs count, sum, union, or other aggregations                                       |
| `LimitOffset`     | Applies pagination or top-N selection                                                   |
| `ShardDispatch`   | Routes records from different shards in parallel, maintaining deterministic order       |
| `SIMDFilter`      | Parallel filter evaluation for routing keys or type tags                                |
| `TombstoneShadow` | Applies shadowing to remove tombstoned or overridden records                            |

---

## 4. Operator Semantics

### 4.1 SegmentScan

* Inputs: segment(s) of ENC-ASL-TGK-INDEX
* Outputs: raw record stream
* Steps:

  1. Select segments with `logseq_min ≤ snapshot`
  2. Apply **advisory filters** to skip records that cannot match (false positives are allowed; the exact check happens downstream)
  3. Return record references (artifact_id, tgk_edge_id)
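
Step 1 above can be sketched as a simple visibility test over segment metadata. This is only an illustration: the `segment_meta` layout and the function name `select_visible_segments` are assumptions, and only the `logseq_min ≤ snapshot` rule comes from the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor; only logseq_min is taken from the spec. */
struct segment_meta {
    uint32_t segment_id;
    uint64_t logseq_min;   /* smallest logseq stored in the segment */
};

/* Copies the ids of segments visible at `snapshot` into `out` (capacity `cap`).
 * Returns the number of ids written. Input order is preserved. */
size_t select_visible_segments(const struct segment_meta *segs, size_t n,
                               uint64_t snapshot, uint32_t *out, size_t cap)
{
    size_t k = 0;
    for (size_t i = 0; i < n && k < cap; i++)
        if (segs[i].logseq_min <= snapshot)
            out[k++] = segs[i].segment_id;
    return k;
}
```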

---

### 4.2 IndexFilter

* Inputs: raw record stream
* Outputs: filtered stream
* Steps:

  1. Apply **canonical constraints**:

     * Artifact type tag
     * Edge type key, role
     * Node IDs for TGK edges
  2. Drop tombstoned or shadowed records
* Deterministic
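
As a rough illustration of the two steps above, the exact check could be written as a pure predicate over a record. The record layout, the `MATCH_ANY` wildcard, and all identifiers below are hypothetical; the real field encodings belong to the ENC-* layers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MATCH_ANY UINT32_MAX  /* hypothetical wildcard for an unset constraint */

/* Hypothetical flattened index record; the real layout is defined elsewhere. */
struct index_record {
    bool     is_edge;     /* false: ASL artifact, true: TGK edge */
    uint32_t type_tag;    /* artifact type tag (artifacts only) */
    uint32_t edge_type;   /* edge type key (edges only) */
    uint32_t role;        /* edge role (edges only) */
    uint32_t from_node;   /* source node id (edges only) */
    bool     tombstoned;  /* record is a tombstone or has been shadowed */
};

struct index_constraints {
    uint32_t type_tag, edge_type, role, from_node;  /* MATCH_ANY = unconstrained */
};

static bool field_ok(uint32_t want, uint32_t have)
{
    return want == MATCH_ANY || want == have;
}

/* Exact (canonical) check: returns true if the record stays in the stream. */
bool index_filter_accepts(const struct index_record *r,
                          const struct index_constraints *c)
{
    if (r->tombstoned) return false;                 /* step 2 */
    if (!r->is_edge) return field_ok(c->type_tag, r->type_tag);
    return field_ok(c->edge_type, r->edge_type)      /* step 1, edge side */
        && field_ok(c->role, r->role)
        && field_ok(c->from_node, r->from_node);
}
```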

---

### 4.3 Merge

* Inputs: multiple streams
* Outputs: merged stream
* Sort order:

  1. logseq ascending
  2. canonical key as tie-breaker
* Deterministic, regardless of input shard order
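
The sort order above amounts to a total order on records, which is what makes the merge independent of shard arrival order. A minimal two-way merge sketch follows; `rec_ref`, `rec_ref_cmp`, and `merge_streams` are illustrative names, the canonical key is assumed to fit in 64 bits, and each input is assumed to be already sorted by the same comparator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record reference carrying only the two sort fields. */
struct rec_ref {
    uint64_t logseq;
    uint64_t canonical_key;
};

/* Total order from the spec: logseq ascending, then canonical key. */
int rec_ref_cmp(const struct rec_ref *a, const struct rec_ref *b)
{
    if (a->logseq != b->logseq) return a->logseq < b->logseq ? -1 : 1;
    if (a->canonical_key != b->canonical_key)
        return a->canonical_key < b->canonical_key ? -1 : 1;
    return 0;
}

/* Merges two streams that are each already sorted by rec_ref_cmp.
 * `out` must have room for na + nb entries. Returns the merged length. */
size_t merge_streams(const struct rec_ref *a, size_t na,
                     const struct rec_ref *b, size_t nb,
                     struct rec_ref *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (rec_ref_cmp(&a[i], &b[j]) <= 0) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}
```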

---

### 4.4 Projection

* Inputs: record stream
* Outputs: projected fields
* Steps:

  * Select requested fields (artifact_id, tgk_edge_id, node_id, type tags)
  * Preserve order

---

### 4.5 TGKTraversal

* Inputs: node set or TGK edge references
* Outputs: expanded TGK edge references (DAG traversal)
* Parameters:

  * `depth`: max recursion depth
  * `snapshot`: logseq cutoff
  * `direction`: from/to
* Deterministic traversal:

  * logseq ascending per edge
  * canonical key tie-breaker
* Optional projection of downstream nodes or artifacts
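
One possible reading of the traversal rules is a level-by-level expansion in which each level scans a pre-sorted edge array, so edges are emitted in sorted order within a level. The sketch below makes several assumptions that the specification does not state: it only follows the "from" direction, it emits results grouped by depth, node ids are small integers below `TGK_MAX_NODES`, and all identifiers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TGK_MAX_NODES 64  /* illustrative bound; node ids must be below it */

/* Hypothetical edge record; the array passed in is assumed to be sorted by
 * (logseq ascending, canonical key), which fixes the emission order. */
struct tgk_edge {
    uint64_t logseq;
    uint32_t from_node, to_node;
};

/* Expands outgoing edges from `start` up to `depth` hops, ignoring edges with
 * logseq > snapshot. Writes indices of emitted edges to `out` (capacity `cap`)
 * and returns how many were written. */
size_t tgk_traverse(const struct tgk_edge *edges, size_t n_edges,
                    uint32_t start, uint32_t depth, uint64_t snapshot,
                    size_t *out, size_t cap)
{
    bool frontier[TGK_MAX_NODES] = {false}, seen[TGK_MAX_NODES] = {false};
    size_t k = 0;

    if (start >= TGK_MAX_NODES) return 0;
    frontier[start] = seen[start] = true;

    for (uint32_t d = 0; d < depth; d++) {
        bool next[TGK_MAX_NODES] = {false};
        bool grew = false;
        for (size_t i = 0; i < n_edges && k < cap; i++) {
            const struct tgk_edge *e = &edges[i];
            if (e->logseq > snapshot) continue;          /* not visible */
            if (e->from_node >= TGK_MAX_NODES || e->to_node >= TGK_MAX_NODES) continue;
            if (!frontier[e->from_node]) continue;
            out[k++] = i;                                /* emit in sorted edge order */
            if (!seen[e->to_node]) {
                seen[e->to_node] = next[e->to_node] = true;
                grew = true;
            }
        }
        if (!grew) break;                                /* nothing new to expand */
        for (size_t v = 0; v < TGK_MAX_NODES; v++) frontier[v] = next[v];
    }
    return k;
}
```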

---

### 4.6 Aggregation

* Inputs: record stream
* Outputs: aggregated result
* Examples:

  * `COUNT(*)`, `UNION`, `SUM(type_tag)`
* Deterministic: preserves snapshot and logseq ordering
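
For `COUNT(*)`, determinism across shards is straightforward because partial counts can be combined in any order. The sketch below shows that one case only; `shard_partial` and `aggregate_count` are hypothetical names, and order-sensitive aggregations would need the merged stream instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-shard partial result for COUNT(*). */
struct shard_partial { uint32_t shard_id; uint64_t count; };

/* Combines per-shard partial counts. Integer addition is commutative and
 * associative, so the total does not depend on shard arrival order. */
uint64_t aggregate_count(const struct shard_partial *parts, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) total += parts[i].count;
    return total;
}
```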

---

### 4.7 LimitOffset

* Inputs: record stream
* Outputs: top-N slice
* Deterministic: relies on the ordering established by the upstream `Merge` operator
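
Since the input is already ordered, pagination reduces to computing an index window. A small sketch, assuming the stream length is known and using the hypothetical names `slice` and `limit_offset`:

```c
#include <stddef.h>

/* Half-open index range [begin, end) into an already ordered stream. */
struct slice { size_t begin, end; };

/* Computes the window for `offset`/`limit` over a stream of `n` records.
 * Out-of-range offsets yield an empty slice instead of an error. */
struct slice limit_offset(size_t n, size_t offset, size_t limit)
{
    struct slice s;
    s.begin = offset < n ? offset : n;
    s.end = (limit < n - s.begin) ? s.begin + limit : n;  /* clamp to stream end */
    return s;
}
```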

---

### 4.8 ShardDispatch & SIMDFilter

* Inputs: parallel streams from shards
* Outputs: unified stream
* Ensures:

  * Deterministic merge order
  * SIMD acceleration for type/tag filters
* Filters are advisory; the exact canonical check happens downstream (for example in `IndexFilter`)
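
The advisory/exact split can be illustrated with a scalar stand-in for a SIMD filter: a 64-bit summary that may report false positives but never false negatives, followed by an exact comparison. This is a sketch under assumptions, not the filter format of ENC-ASL-TGK-INDEX; the bit-per-`tag % 64` scheme and all function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Advisory summary: bit (tag % 64) is set for every type tag in a block.
 * Distinct tags can share a bit, so a hit only means "maybe present". */
uint64_t tag_summary(const uint32_t *tags, size_t n)
{
    uint64_t bits = 0;
    for (size_t i = 0; i < n; i++) bits |= UINT64_C(1) << (tags[i] % 64);
    return bits;
}

/* Advisory check: may return true for an absent tag, never false for a present one. */
bool maybe_contains(uint64_t summary, uint32_t tag)
{
    return ((summary >> (tag % 64)) & 1u) != 0;
}

/* Exact (canonical) count; the advisory summary is used only to skip work. */
size_t count_exact(uint64_t summary, const uint32_t *tags, size_t n, uint32_t wanted)
{
    size_t hits = 0;
    if (!maybe_contains(summary, wanted)) return 0;   /* safe: no false negatives */
    for (size_t i = 0; i < n; i++)
        if (tags[i] == wanted) hits++;
    return hits;
}
```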

---

### 4.9 TombstoneShadow

* Inputs: record stream
* Outputs: visible records only
* Logic:

  * For a given canonical key (artifact or TGK edge):

    * Keep only the version with the latest `logseq ≤ snapshot`
    * Remove shadowed versions, and remove the key entirely if that latest version is a tombstone
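
The logic above can be sketched as follows: for each canonical key, find the newest version at or below the snapshot, and emit it unless it is a tombstone. The record layout and names are hypothetical, `(canonical_key, logseq)` pairs are assumed to be unique, and the quadratic scan is chosen for brevity rather than performance.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical versioned record; `tombstone` marks a deletion entry. */
struct versioned_rec {
    uint64_t canonical_key;
    uint64_t logseq;
    bool     tombstone;
};

/* Writes the indices of records visible at `snapshot` into `out`
 * (capacity >= n) and returns how many there are. Input order is preserved. */
size_t tombstone_shadow(const struct versioned_rec *recs, size_t n,
                        uint64_t snapshot, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        bool latest = recs[i].logseq <= snapshot;
        for (size_t j = 0; latest && j < n; j++)
            if (recs[j].canonical_key == recs[i].canonical_key &&
                recs[j].logseq <= snapshot &&
                recs[j].logseq > recs[i].logseq)
                latest = false;          /* shadowed by a newer visible version */
        if (latest && !recs[i].tombstone)
            out[k++] = i;                /* newest visible version, not deleted */
    }
    return k;
}
```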

---

## 5. Data Flow Example

**Query:** Find all artifacts of type `42` reachable via TGK edges of type `7` from node `N0`, depth 2.

Execution Plan:

```
SegmentScan(ASL segments)
  → IndexFilter(type_tag=42)
  → Merge

SegmentScan(TGK segments)
  → IndexFilter(edge_type=7, from_node=N0)
  → TGKTraversal(depth=2)
  → TombstoneShadow
  → Merge

Merge(ASL results, TGK results)
  → Projection(artifact_id, tgk_edge_id, node_id)
  → Aggregation(COUNT)
```

* Each operator preserves **snapshot semantics**
* Deterministic order maintained throughout

---

## 6. Snapshot and Determinism Guarantees

1. **Segment visibility**: `logseq_min ≤ snapshot`
2. **Record visibility**: `logseq ≤ snapshot`
3. **Merge and traversal order**: logseq ascending → canonical key
4. Filters, SIMD, and sharding **cannot alter output**
5. Tombstones guarantee no resurrection of removed records

---

## 7. Plan Serialization (Optional)

Execution plans can be serialized for:

* Reuse across queries
* Federation / distributed execution
* Deterministic replay

Serialization format:

```c
#include <stdint.h>

struct exec_plan {
    uint32_t plan_version;
    uint32_t operator_count;
    uint32_t edge_count;   // assumed field: C permits only one flexible array member
    // Serialized payload follows the header, in this order:
    //   struct operator_def  operators[operator_count];   // DAG nodes
    //   struct operator_edge edges[edge_count];           // DAG edges
};
```

* Each `operator_def` references type, parameters, projections, snapshot
* Each `operator_edge` references upstream → downstream operators
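
To make the layout concrete, here is a hedged sketch of how a reader might locate the edge array in a serialized plan, assuming a header that records both counts and is followed by the operator records and then the edge records. The field lists of `operator_def` and `operator_edge`, the `reserved` padding word, and the helper names are all assumptions for illustration; the specification leaves these definitions open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size records; the spec does not define their fields here. */
struct operator_def  { uint32_t op_id, op_type; uint64_t snapshot; };
struct operator_edge { uint32_t from_op, to_op; };

/* Hypothetical header: counts first, then operators, then edges.
 * `reserved` pads the header so the records that follow stay 8-byte aligned. */
struct exec_plan_hdr { uint32_t plan_version, operator_count, edge_count, reserved; };

/* Byte offset of the edge array within a serialized plan. */
size_t plan_edges_offset(const struct exec_plan_hdr *h)
{
    return sizeof *h + (size_t)h->operator_count * sizeof(struct operator_def);
}

/* Total serialized size in bytes. */
size_t plan_serialized_size(const struct exec_plan_hdr *h)
{
    return plan_edges_offset(h) + (size_t)h->edge_count * sizeof(struct operator_edge);
}
```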

---

## 8. Normative Invariants

1. Output is deterministic per snapshot
2. Identity of core objects (artifacts, edges) is authoritative from the CORE layers
3. Filters and acceleration are advisory only
4. DAG traversal respects depth and snapshot
5. Aggregation is deterministic across parallel shards
6. Execution plan serialization preserves all operator semantics

---

## 9. Summary

This specification formalizes:

* **Operators**, **data flow**, **DAG execution**
* **Snapshot safety** and **determinism**
* **Parallel acceleration** (filters, SIMD, shards)
* **Unified handling** of ASL artifacts and TGK edges

It allows **formal reasoning, optimization, and distributed execution** while maintaining the strict separation between core semantics and index/acceleration layers.

---

Next steps could include:

* **C-structs for operator definitions and execution plan serialization**
* **Physical operator implementations mapping to ENC-ASL-TGK-INDEX**

Do you want me to draft those C-struct definitions next?