amduat-api/notes/Unified Execution Plan Specification (ASL + TGK).md

This note is a formal draft of the execution plan specification for the unified ASL + TGK query execution model. It defines operators, data flow, and snapshot semantics in a deterministic, layered way.

---

# Unified Execution Plan Specification (ASL + TGK)

---

## 1. Purpose

This specification formalizes **query execution plans** for:

* ASL artifacts (ENC-ASL-CORE)
* TGK edges (ENC-TGK-CORE)
* Merged index references (ENC-ASL-TGK-INDEX)

Goals:

1. Produce deterministic results for a given snapshot (`logseq` cutoff)
2. Respect tombstones and shadowing
3. Leverage filters, sharding, and SIMD acceleration
4. Support DAG traversals (TGK edges) and artifact projections
5. Enable formal planning and optimization

---

## 2. Execution Plan Structure

An execution plan `EP` is a **directed acyclic graph (DAG)** of **operators**:

```
EP = { nodes: [Op1, Op2, ...], edges: [(Op1→Op2), ...] }
```
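
Because a plan is required to be a DAG, a planner would probably want to reject cyclic edge lists before execution. The following is a minimal sketch of such a check using Kahn's algorithm. The names `ep_edge`, `ep_is_acyclic`, and the `EP_MAX_OPS` bound are assumptions made for illustration and are not defined by this specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EP_MAX_OPS 64  // hypothetical upper bound, chosen for this sketch

struct ep_edge {
    uint32_t from;  // index of the upstream operator
    uint32_t to;    // index of the downstream operator
};

// Returns true if the edges over n_ops operators form a DAG.
bool ep_is_acyclic(size_t n_ops, const struct ep_edge *edges, size_t n_edges)
{
    uint32_t indegree[EP_MAX_OPS] = {0};
    uint32_t ready[EP_MAX_OPS];
    size_t n_ready = 0, visited = 0;

    assert(n_ops <= EP_MAX_OPS);
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].from >= n_ops || edges[i].to >= n_ops)
            return false;  // edge refers to an unknown operator
        indegree[edges[i].to]++;
    }
    for (size_t i = 0; i < n_ops; i++)
        if (indegree[i] == 0)
            ready[n_ready++] = (uint32_t)i;

    // Repeatedly remove an operator that has no remaining inputs.
    while (n_ready > 0) {
        uint32_t op = ready[--n_ready];
        visited++;
        for (size_t i = 0; i < n_edges; i++)
            if (edges[i].from == op && --indegree[edges[i].to] == 0)
                ready[n_ready++] = edges[i].to;
    }
    return visited == n_ops;  // unvisited operators imply a cycle
}
```

For example, the edge list `{0→1, 1→2, 0→2}` over three operators passes the check, while `{0→1, 1→0}` does not.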

### Node Properties

* `op_id`: unique operator ID
* `op_type`: see Operator Types (Section 3)
* `inputs`: references to upstream operators
* `outputs`: reference streams
* `constraints`: optional filtering conditions
* `snapshot`: upper bound on `logseq`; records with a higher `logseq` are not visible
* `projections`: requested fields
* `traversal_depth`: optional; maximum depth for TGK expansion

---

## 3. Operator Types

| Operator          | Description                                                                             |
| ----------------- | --------------------------------------------------------------------------------------- |
| `SegmentScan`     | Scans a segment of ENC-ASL-TGK-INDEX, applies advisory filters                          |
| `IndexFilter`     | Applies canonical constraints (artifact type, edge type, role)                          |
| `Merge`           | Deterministically merges multiple streams (logseq ascending, canonical key tie-breaker) |
| `Projection`      | Selects output fields from index references                                             |
| `TGKTraversal`    | Expands TGK edges from node sets (depth-limited DAG traversal)                          |
| `Aggregation`     | Performs count, sum, union, or other aggregations                                       |
| `LimitOffset`     | Applies pagination or top-N selection                                                   |
| `ShardDispatch`   | Routes records from different shards in parallel, maintaining deterministic order       |
| `SIMDFilter`      | Parallel filter evaluation for routing keys or type tags                                |
| `TombstoneShadow` | Applies shadowing to remove tombstoned or overridden records                            |
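
The operator table and the node properties from Section 2 could map onto C types roughly as follows. This is only a sketch: the identifiers (`ep_op_type`, `ep_node`, `ep_op_type_name`) and the `EP_MAX_INPUTS` limit are assumptions, and the constraint, projection, and output fields are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

// One enumerator per row of the operator table.
enum ep_op_type {
    EP_OP_SEGMENT_SCAN,
    EP_OP_INDEX_FILTER,
    EP_OP_MERGE,
    EP_OP_PROJECTION,
    EP_OP_TGK_TRAVERSAL,
    EP_OP_AGGREGATION,
    EP_OP_LIMIT_OFFSET,
    EP_OP_SHARD_DISPATCH,
    EP_OP_SIMD_FILTER,
    EP_OP_TOMBSTONE_SHADOW,
    EP_OP_TYPE_COUNT  // number of operator types, not an operator itself
};

#define EP_MAX_INPUTS 4  // hypothetical fan-in limit for this sketch

// Plan node carrying a subset of the Section 2 properties.
struct ep_node {
    uint32_t        op_id;
    enum ep_op_type op_type;
    uint32_t        inputs[EP_MAX_INPUTS];  // op_id of each upstream operator
    uint32_t        input_count;
    uint64_t        snapshot;               // logseq limit
    uint32_t        traversal_depth;        // 0 when not a TGK expansion
};

// Returns the operator name as written in the table.
const char *ep_op_type_name(enum ep_op_type t)
{
    static const char *const names[EP_OP_TYPE_COUNT] = {
        "SegmentScan", "IndexFilter", "Merge", "Projection", "TGKTraversal",
        "Aggregation", "LimitOffset", "ShardDispatch", "SIMDFilter",
        "TombstoneShadow",
    };
    assert((unsigned)t < EP_OP_TYPE_COUNT);
    return names[t];
}
```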

---

## 4. Operator Semantics

### 4.1 SegmentScan

* Inputs: segment(s) of ENC-ASL-TGK-INDEX
* Outputs: raw record stream
* Steps:

  1. Select segments with `logseq_min ≤ snapshot`
  2. Apply **advisory filters** to skip records that cannot match; false positives are acceptable because exact canonical checks run downstream
  3. Return record references (artifact_id, tgk_edge_id)
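
The first two steps can be sketched as below. The segment metadata layout is an assumption: `type_bits` stands in for whatever advisory filter the index actually stores, and it is applied at segment granularity here for brevity. Only the `logseq_min ≤ snapshot` rule comes from this specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct segment_meta {
    uint64_t logseq_min;  // lowest logseq stored in the segment
    uint64_t type_bits;   // hypothetical advisory filter: bit (type_tag % 64)
                          // is set if the tag may occur in the segment
};

// Advisory check: may report a tag that is absent (false positive),
// but never misses a tag whose bit was set when the segment was written.
static bool segment_may_contain(const struct segment_meta *s, uint32_t type_tag)
{
    return (s->type_bits >> (type_tag % 64)) & 1u;
}

// Writes the indices of segments worth scanning to out (capacity n)
// and returns how many were selected. Input order is preserved.
size_t select_segments(const struct segment_meta *segs, size_t n,
                       uint64_t snapshot, uint32_t type_tag, size_t *out)
{
    size_t k = 0;

    assert(segs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (segs[i].logseq_min > snapshot)
            continue;  // step 1: segment not visible at this snapshot
        if (!segment_may_contain(&segs[i], type_tag))
            continue;  // step 2: advisory filter rules the segment out
        out[k++] = i;
    }
    return k;
}
```

Because the filter is advisory, a selected segment may still contain no matching record; the exact check belongs to `IndexFilter`.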

---

### 4.2 IndexFilter

* Inputs: raw record stream
* Outputs: filtered stream
* Steps:

  1. Apply **canonical constraints**:

     * Artifact type tag
     * Edge type key, role
     * Node IDs for TGK edges
  2. Drop tombstoned or shadowed records
* Deterministic
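
A minimal sketch of the exact filtering step follows. The record and constraint layouts are assumptions made for illustration, and only an explicit tombstone flag is handled; detecting shadowed versions requires comparing records that share a canonical key and is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct index_record {
    uint64_t logseq;
    uint64_t canonical_id;
    uint32_t type_tag;   // artifact type tag or edge type key
    uint64_t from_node;  // meaningful for TGK edges only
    bool     is_edge;
    bool     tombstone;
};

struct index_constraint {
    bool     want_edges;       // true: TGK edges, false: ASL artifacts
    uint32_t type_tag;
    bool     match_from_node;  // when true, from_node must match as well
    uint64_t from_node;
};

static bool index_filter_match(const struct index_record *r,
                               const struct index_constraint *c)
{
    if (r->tombstone)
        return false;  // step 2: drop tombstoned records
    if (r->is_edge != c->want_edges)
        return false;
    if (r->type_tag != c->type_tag)
        return false;
    if (c->match_from_node && r->from_node != c->from_node)
        return false;
    return true;
}

// Compacts recs in place, preserving input order; returns the new length.
size_t index_filter(struct index_record *recs, size_t n,
                    const struct index_constraint *c)
{
    size_t k = 0;

    assert(recs != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        if (index_filter_match(&recs[i], c))
            recs[k++] = recs[i];
    return k;
}
```

The filter is a pure function of each record and the constraint, so the output depends only on the input stream, which is what makes the operator deterministic.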

---

### 4.3 Merge

* Inputs: multiple streams
* Outputs: merged stream
* Sort order:

  1. logseq ascending
  2. canonical ID tie-breaker
* Deterministic, regardless of input shard order
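
The sort order above amounts to a total order on `(logseq, canonical ID)`. The sketch below shows a comparator for that order and a two-way merge built on it; the names `merge_record`, `merge_record_cmp`, and `merge_streams` are illustrative assumptions, and each input is assumed to be sorted already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct merge_record {
    uint64_t logseq;
    uint64_t canonical_id;
};

// Total order: logseq ascending, then canonical ID ascending.
int merge_record_cmp(const struct merge_record *a, const struct merge_record *b)
{
    if (a->logseq != b->logseq)
        return a->logseq < b->logseq ? -1 : 1;
    if (a->canonical_id != b->canonical_id)
        return a->canonical_id < b->canonical_id ? -1 : 1;
    return 0;
}

// Merges two streams that are each sorted by merge_record_cmp.
// out must have room for na + nb records; returns the number written.
size_t merge_streams(const struct merge_record *a, size_t na,
                     const struct merge_record *b, size_t nb,
                     struct merge_record *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL);
    while (i < na && j < nb)
        out[k++] = merge_record_cmp(&a[i], &b[j]) <= 0 ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Since the comparator only reports equality for records with the same `logseq` and canonical ID, swapping the two inputs yields the same output sequence.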

---

### 4.4 Projection

* Inputs: record stream
* Outputs: projected fields
* Steps:

  * Select requested fields (artifact_id, tgk_edge_id, node_id, type tags)
  * Preserve order

---

### 4.5 TGKTraversal

* Inputs: node set or TGK edge references
* Outputs: expanded TGK edge references (DAG traversal)
* Parameters:

  * `depth`: max recursion depth
  * `snapshot`: logseq cutoff
  * `direction`: from/to
* Deterministic traversal:

  * logseq ascending per edge
  * canonical key tie-breaker
* Optional projection of downstream nodes or artifacts
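
One way to realize a depth-limited, snapshot-bounded expansion is a level-by-level scan over an edge list that is already sorted by `(logseq, edge ID)`. The sketch below makes several assumptions that are not part of this specification: node IDs are small integers below `TGK_MAX_NODES`, only the from-to direction is followed, and results are grouped by depth level before the logseq order applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TGK_MAX_NODES 64  // hypothetical: node IDs are small indices here

struct tgk_edge {
    uint64_t logseq;
    uint64_t edge_id;
    uint32_t from;
    uint32_t to;
};

// edges must be sorted by (logseq, edge_id). Writes the indices of reached
// edges to out (capacity n_edges) and returns how many were written.
size_t tgk_traverse(const struct tgk_edge *edges, size_t n_edges,
                    uint32_t start, uint32_t depth, uint64_t snapshot,
                    size_t *out)
{
    bool frontier[TGK_MAX_NODES] = {false};
    bool seen[TGK_MAX_NODES] = {false};
    size_t n_out = 0;

    assert(start < TGK_MAX_NODES);
    frontier[start] = seen[start] = true;

    for (uint32_t level = 0; level < depth; level++) {
        bool next[TGK_MAX_NODES] = {false};
        bool grew = false;

        for (size_t i = 0; i < n_edges; i++) {
            const struct tgk_edge *e = &edges[i];
            if (e->logseq > snapshot)
                continue;  // not visible at this snapshot
            if (e->from >= TGK_MAX_NODES || e->to >= TGK_MAX_NODES)
                continue;  // outside the range this sketch supports
            if (!frontier[e->from])
                continue;
            out[n_out++] = i;
            if (!seen[e->to]) {  // each node is expanded at most once
                seen[e->to] = true;
                next[e->to] = true;
                grew = true;
            }
        }
        if (!grew)
            break;
        memcpy(frontier, next, sizeof frontier);
    }
    return n_out;
}
```

Each node enters the frontier at most once, so an edge is emitted at most once and the output never exceeds `n_edges` entries.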

---

### 4.6 Aggregation

* Inputs: record stream
* Outputs: aggregated result
* Examples:

  * `COUNT(*)`, `UNION`, `SUM(type_tag)`
* Deterministic: the result depends only on the snapshot and the logseq-ordered input, not on the order in which shards deliver records

---

### 4.7 LimitOffset

* Inputs: record stream
* Outputs: top-N slice
* Deterministic: relies on the ordering established by the upstream `Merge` operator
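
Slicing an already ordered stream reduces to clamping an offset and a limit against the stream length. The helper below is a small sketch of that arithmetic; the name `limit_offset` and its signature are assumptions.

```c
#include <assert.h>
#include <stddef.h>

// Computes the slice [*first, *first + count) of a stream of n records.
// Returns count; an offset past the end yields an empty slice.
size_t limit_offset(size_t n, size_t offset, size_t limit, size_t *first)
{
    assert(first != NULL);
    if (offset > n)
        offset = n;
    *first = offset;
    size_t available = n - offset;
    return limit < available ? limit : available;
}
```

For a stream of 10 records, `offset = 8` and `limit = 5` select the final two records.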

---

### 4.8 ShardDispatch & SIMDFilter

* Inputs: parallel streams from shards
* Outputs: unified stream
* Ensures:

  * Deterministic merge order
  * SIMD acceleration for type/tag filters
  * Filters are advisory; exact canonical check downstream
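
As an illustration of a filter that lends itself to SIMD evaluation, the sketch below scans a column of type tags and writes a match mask without branching in the loop body. It uses no explicit intrinsics and leaves vectorization to the compiler; the column layout and the name `tag_filter_mask` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Sets mask[i] to 1 where tags[i] == want and to 0 elsewhere.
// Returns the number of matches. The loop body is branch-free so that
// an optimizing compiler is free to vectorize it.
size_t tag_filter_mask(const uint32_t *tags, size_t n, uint32_t want,
                       uint8_t *mask)
{
    size_t hits = 0;

    assert(tags != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++) {
        mask[i] = (uint8_t)(tags[i] == want);
        hits += mask[i];
    }
    return hits;
}
```

The mask only narrows the candidate set; records that pass still go through the exact canonical check downstream.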

---

### 4.9 TombstoneShadow

* Inputs: record stream
* Outputs: visible records only
* Logic:

  * For a given canonical key (artifact or TGK edge):

    * Keep only the latest `logseq ≤ snapshot`
    * Remove shadowed/tombstoned versions
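
The logic above can be sketched as follows, assuming the input stream is sorted by `logseq` ascending so that a later position means a newer version. The record layout and the quadratic scan are simplifications for illustration; a real implementation would more likely group records by canonical key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ts_record {
    uint64_t logseq;
    uint64_t key;        // canonical key (artifact or TGK edge)
    bool     tombstone;
};

// in must be sorted by logseq ascending. Writes the visible records to out
// (capacity n) in input order and returns how many were written.
size_t tombstone_shadow(const struct ts_record *in, size_t n,
                        uint64_t snapshot, struct ts_record *out)
{
    size_t k = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i].logseq > snapshot)
            continue;  // written after the snapshot

        // A record is shadowed if a newer version of the same key
        // is also visible at this snapshot.
        bool shadowed = false;
        for (size_t j = i + 1; j < n && !shadowed; j++)
            shadowed = in[j].key == in[i].key && in[j].logseq <= snapshot;

        if (!shadowed && !in[i].tombstone)
            out[k++] = in[i];  // latest visible version and not a tombstone
    }
    return k;
}
```

A key whose latest visible version is a tombstone produces no output at all, so older versions of that key cannot reappear.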

---

## 5. Data Flow Example

**Query:** Find all artifacts of type `42` reachable via TGK edges of type `7` from node `N0`, depth 2.

Execution Plan:

```
SegmentScan(ASL segments)
  → IndexFilter(type_tag=42)
  → Merge

SegmentScan(TGK segments)
  → IndexFilter(edge_type=7, from_node=N0)
  → TGKTraversal(depth=2)
  → TombstoneShadow
  → Merge

Merge(ASL results, TGK results)
  → Projection(artifact_id, tgk_edge_id, node_id)
  → Aggregation(COUNT)
```

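* Every operator evaluates against the same snapshot, so **snapshot semantics** hold from scan to aggregation
* Deterministic order (logseq ascending, canonical key tie-breaker) is maintained throughout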

---

## 6. Snapshot and Determinism Guarantees

1. **Segment visibility**: `logseq_min ≤ snapshot`
2. **Record visibility**: `logseq ≤ snapshot`
3. **Merge and traversal order**: logseq ascending → canonical key
4. Filters, SIMD, and sharding **must not alter output**; they affect performance only
5. Tombstones guarantee that removed records are never resurrected at later snapshots

---

## 7. Plan Serialization (Optional)

Execution plans can be serialized for:

* Reuse across queries
* Federation / distributed execution
* Deterministic replay

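Serialization format (a fixed header followed by both arrays; `edge_count` is added here as an assumption, because a C struct may contain only one flexible array member and it must come last):

```c
#include <stdint.h>

struct exec_plan {
    uint32_t plan_version;
    uint32_t operator_count;  // number of DAG nodes
    uint32_t edge_count;      // number of DAG edges (assumed field)
    // Serialized payload follows the header, in this order:
    //   struct operator_def  operators[operator_count];  // DAG nodes
    //   struct operator_edge edges[edge_count];          // DAG edges
};
```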

* Each `operator_def` references type, parameters, projections, snapshot
* Each `operator_edge` references upstream → downstream operators

---

## 8. Normative Invariants

1. Output is deterministic for a given snapshot
2. The identity of core objects (artifacts, edges) is authoritative only in the CORE layers
3. Filters and acceleration are advisory only
4. DAG traversal respects the depth limit and the snapshot
5. Aggregation is deterministic across parallel shards
6. Execution plan serialization preserves all operator semantics

---

## 9. Summary

This specification formalizes:

* **Operators**, **data flow**, **DAG execution**
* **Snapshot safety** and **determinism**
* **Parallel acceleration** (filters, SIMD, shards)
* **Unified handling** of ASL artifacts and TGK edges

It allows **formal reasoning, optimization, and distributed execution** while maintaining the strict separation between core semantics and index/acceleration layers.

---

Next steps could include:

* **C-structs for operator definitions and execution plan serialization**
* **Physical operator implementations mapping to ENC-ASL-TGK-INDEX**
