NOTE: Superseded by tier1 TGK/1 and vendor/amduat/tier1/tgk-1-core.md; retained for historical context.

This is a **draft of a unified query execution model** over the merged ENC-ASL-TGK-INDEX. It treats ASL artifacts and TGK edges uniformly for queries, respects snapshots, and preserves determinism.
---
# Unified Query Execution Model (ASL + TGK)
---
## 1. Purpose
This model defines how **queries over ASL artifacts and TGK edges** are executed:
* Snapshot-safe
* Deterministic per log sequence
* Able to leverage acceleration structures (filters, routing, SIMD)
* Able to support DAG program projections and trace graph traversals

It does **not** redefine core semantics:
* ENC-ASL-CORE defines artifacts
* ENC-TGK-CORE defines edges
* ENC-ASL-TGK-INDEX defines references and acceleration
---
## 2. Query Abstraction
A **query** Q is defined as:
```
Q = {
    snapshot: S,
    constraints: C,        // filters on artifacts, edges, or nodes
    projections: P,        // select returned fields
    traversal: optional,   // TGK edge expansion
    aggregation: optional  // count, union, etc.
}
```
* **snapshot**: the log sequence cutoff
* **constraints**: logical predicate over index fields (artifact type, edge type, node ID)
* **projections**: the output columns
* **traversal**: optional TGK graph expansion
* **aggregation**: optional summarization
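
As an illustration only, the abstraction above could be written down as a pair of immutable Python dataclasses. The names `Query`, `Traversal`, `Record`, `max_depth`, and `direction` are hypothetical and do not come from any ENC-* specification; the constraint is modeled as a plain predicate over a decoded record.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Tuple

# Hypothetical stand-in for a decoded index record (artifact or edge reference).
Record = Mapping[str, Any]


@dataclass(frozen=True)
class Traversal:
    start_nodes: Tuple[str, ...]
    max_depth: Optional[int] = None   # None means unbounded expansion
    direction: str = "from"           # assumed values: "from" or "to"


@dataclass(frozen=True)
class Query:
    snapshot: int                            # log sequence cutoff S
    constraints: Callable[[Record], bool]    # exact predicate over index fields
    projections: Tuple[str, ...]             # output columns
    traversal: Optional[Traversal] = None    # optional TGK edge expansion
    aggregation: Optional[str] = None        # e.g. "count"


# A query with no traversal and no aggregation.
q = Query(
    snapshot=100,
    constraints=lambda r: r.get("type_tag") == 42,
    projections=("artifact_id",),
)
```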
---
## 3. Execution Stages
### 3.1 Index Scan
1. Determine **segments visible** for snapshot `S`
2. For each segment:
   * Use **filters** to eliminate segments/records (advisory)
   * Decode **ASL artifact references** and **TGK edge references**
   * Skip tombstoned or shadowed records
### 3.2 Constraint Evaluation
* Evaluate **canonical constraints**:
  * Artifact ID, type tag
  * Edge ID, edge type, role
  * Node ID (from/to)
* Filters are advisory; an exact check is always required
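
The "advisory filter, then exact check" rule can be sketched as follows. The 64-bit one-hash filter is an assumption made for this example, not the filter format of ENC-ASL-TGK-INDEX. The point is that a filter hit only means "maybe", so the exact comparison still decides membership.

```python
def build_bits(records):
    """Build a hypothetical 64-bit presence filter over type tags."""
    bits = 0
    for r in records:
        bits |= 1 << (r["type_tag"] % 64)
    return bits


def may_match(bits, type_tag):
    """False means definitely absent; True means possibly present."""
    return bool((bits >> (type_tag % 64)) & 1)


def select(segment_records, type_tag):
    # In a real index the filter would be precomputed and stored with the segment.
    bits = build_bits(segment_records)
    if not may_match(bits, type_tag):
        return []  # safe to skip: this filter has no false negatives
    # Exact check: removes the filter's false positives.
    return [r for r in segment_records if r["type_tag"] == type_tag]


segment = [
    {"artifact_id": "a1", "type_tag": 42},
    {"artifact_id": "a2", "type_tag": 7},
]
```

Here tag `106` collides with `42` in the filter (`106 % 64 == 42`), so the filter answers "maybe" and only the exact check rejects it.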
### 3.3 Traversal Expansion (Optional)
For TGK edges:
1. Expand edges from a set of nodes
2. Apply **snapshot constraints** to prevent including edges outside S
3. Produce DAG projections or downstream artifact IDs
### 3.4 Projection and Aggregation
* Apply **projection fields** as requested
* Optionally aggregate or reduce results
* Maintain **deterministic order** by logseq ascending, then canonical key
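
A minimal sketch of the ordering rule, assuming each record is a dict with hypothetical `logseq` and `canonical_key` fields: sort on the pair `(logseq, canonical_key)` before projecting, so the output does not depend on scan order.

```python
def project_and_order(records, fields):
    """Order by logseq ascending, then canonical key, then project the fields."""
    ordered = sorted(records, key=lambda r: (r["logseq"], r["canonical_key"]))
    return [tuple(r[f] for f in fields) for r in ordered]


def aggregate_count(rows):
    """The simplest aggregation: number of result rows."""
    return len(rows)


# Input deliberately out of order.
records = [
    {"logseq": 5, "canonical_key": b"b", "artifact_id": "x"},
    {"logseq": 3, "canonical_key": b"z", "artifact_id": "y"},
    {"logseq": 5, "canonical_key": b"a", "artifact_id": "w"},
]
rows = project_and_order(records, ["artifact_id"])
```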
---
## 4. Routing and SIMD Acceleration
* SIMD may evaluate **multiple routing keys in parallel**
* Routing keys are precomputed in ENC-ASL-TGK-INDEX optional sections
* Acceleration **cannot change semantics**
* Parallel scans **must be deterministic**: output records are ordered by logseq ascending, then canonical key
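
One way to keep a parallel scan deterministic is to sort each shard's matches locally and then merge on the same key. The sketch below uses Python threads as a stand-in for whatever parallelism (shards, SIMD lanes) an implementation uses. The shard layout and field names are assumptions, and the key `(logseq, canonical_key)` is assumed to be unique across shards.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def order_key(r):
    return (r["logseq"], r["canonical_key"])


def scan_shard(shard, predicate):
    """Filter one shard and return its matches in canonical order."""
    return sorted((r for r in shard if predicate(r)), key=order_key)


def parallel_scan(shards, predicate, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, predicate), shards))
    # Merging sorted partials on the same key yields one global order,
    # regardless of which shard finished first.
    return list(heapq.merge(*partials, key=order_key))


shards = [
    [{"logseq": 2, "canonical_key": "b"}, {"logseq": 1, "canonical_key": "a"}],
    [{"logseq": 1, "canonical_key": "c"}],
]
out = parallel_scan(shards, lambda r: True)
```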
---
## 5. Snapshot Semantics
* Segment is visible if `segment.logseq_min ≤ S`
* Record is visible if `record.logseq ≤ S`
* Tombstones shadow earlier records
* Deterministic filtering required
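
The visibility rules above can be sketched as a single pass over segments. This example assumes that shadowing is keyed by a hypothetical `key` field and that the record with the highest logseq at or below `S` wins; the real shadowing rule is defined by the index specification, not by this sketch.

```python
def visible_records(segments, snapshot):
    """Return the records visible at `snapshot`, in deterministic order."""
    latest = {}
    for seg in segments:
        if seg["logseq_min"] > snapshot:
            continue  # the whole segment is newer than S
        for rec in seg["records"]:
            if rec["logseq"] > snapshot:
                continue  # record written after S
            prev = latest.get(rec["key"])
            if prev is None or rec["logseq"] > prev["logseq"]:
                latest[rec["key"]] = rec  # later record shadows earlier one
    # A key whose latest visible record is a tombstone is not visible.
    live = [r for r in latest.values() if not r.get("tombstone", False)]
    return sorted(live, key=lambda r: (r["logseq"], r["key"]))


segments = [
    {"logseq_min": 1, "records": [{"key": "a", "logseq": 1},
                                  {"key": "b", "logseq": 2}]},
    {"logseq_min": 3, "records": [{"key": "a", "logseq": 3, "tombstone": True},
                                  {"key": "c", "logseq": 6}]},
]
```

With this data, `a` is visible at `S = 2`, hidden from `S = 3` onward by its tombstone, and `c` only appears once `S` reaches 6.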
---
## 6. Traversal Semantics (TGK edges)
* Given a set of start nodes `N_start`:
  * Fetch edges with `from[] ∩ N_start ≠ ∅` (or `to[]` depending on direction)
  * Each edge is expanded **once per logseq**
* Expansion obeys snapshot S
* Edge properties (type, role) used in filtering but not for identity
* Optional recursion depth `d` may be specified for DAG traversal
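
A breadth-first sketch of these traversal rules, forward direction only. It reads "expanded once" as "each edge record is emitted at most once, identified by a hypothetical `edge_id`", which is an interpretation rather than something the text above pins down. Edges are assumed to carry `from` and `to` node lists and a `logseq`.

```python
def traverse(edges, start_nodes, snapshot, max_depth=None):
    """Expand edges reachable from start_nodes, respecting snapshot and depth."""
    frontier = set(start_nodes)
    seen_nodes = set(start_nodes)
    expanded = set()
    out = []
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = set()
        for e in edges:
            if e["logseq"] > snapshot or e["edge_id"] in expanded:
                continue  # outside snapshot S, or already expanded
            if frontier.intersection(e["from"]):  # from[] ∩ frontier ≠ ∅
                expanded.add(e["edge_id"])
                out.append(e)
                next_frontier.update(n for n in e["to"] if n not in seen_nodes)
        seen_nodes |= next_frontier
        frontier = next_frontier
        depth += 1
    # Emit in the canonical order so the result is independent of scan order.
    return sorted(out, key=lambda e: (e["logseq"], e["edge_id"]))


edges = [
    {"edge_id": "e1", "logseq": 1, "from": ["n1"], "to": ["n2"]},
    {"edge_id": "e2", "logseq": 2, "from": ["n2"], "to": ["n3"]},
    {"edge_id": "e3", "logseq": 9, "from": ["n1"], "to": ["n4"]},
    {"edge_id": "e4", "logseq": 3, "from": ["n3"], "to": ["n5"]},
]
```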
---
## 7. Unified Query API (Conceptual)
```
result_set = query(
    snapshot=S,
    artifact_constraints={type_tag=42},
    edge_constraints={edge_type=7, role=FROM},
    start_nodes=[node1, node2],
    projections=[artifact_id, tgk_edge_id, node_id],
    traversal_depth=3,
    aggregation='count'
)
```
* Returns combined **artifact + TGK edge references**
* Traversal automatically expands TGK edges
* Aggregation and projection are deterministic
---
## 8. Determinism Guarantees
1. **Same snapshot + same constraints → identical results**
2. **Logseq ascending + canonical key tie-breaks**
3. Filters, shards, SIMD do **not affect result set**
4. Traversal expansion deterministic per DAG rules
---
## 9. Garbage Collection Safety
* Records and edges **must not be removed** if they appear in snapshot `S` or any **reachable traversal**
* Optional: **tombstone records** prevent accidental resurrection
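
A conservative collectability check could look like the sketch below. It assumes the collector knows the set of retained snapshots and a precomputed set of IDs reachable by traversal. The names `retained_snapshots`, `reachable_ids`, and `shadowed_at` are all hypothetical. A record counts as appearing in snapshot `S` when it was written at or before `S` and not yet shadowed at `S`.

```python
def is_collectable(record, retained_snapshots, reachable_ids):
    """True only if no retained snapshot or reachable traversal needs the record."""
    if record["id"] in reachable_ids:
        return False  # still reachable through some traversal
    shadowed_at = record.get("shadowed_at")  # None means never shadowed
    for s in retained_snapshots:
        written = record["logseq"] <= s
        still_live = shadowed_at is None or s < shadowed_at
        if written and still_live:
            return False  # the record appears in retained snapshot s
    return True


old = {"id": "a", "logseq": 2, "shadowed_at": 5}
live = {"id": "b", "logseq": 2}
```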
---
## 10. Federation Considerations (Non-normative)
* Domains may restrict **visibility of artifacts/edges**
* Queries may include **domain filters**
* Canonical IDs remain stable across domains
---
## 11. Summary
The unified query execution model:
* Provides a **single API over ASL + TGK references**
* Fully respects **snapshot determinism**
* Supports **DAG traversal over TGK edges**
* Leverages **ENC-ASL-TGK-INDEX acceleration**
* Guarantees **semantic consistency** without re-encoding core objects
---
Possible next steps:
* **Formal execution plan specification**: physical operators, filter pushdown, traversal operators
* **C-struct definitions for query results**, mapping directly to ENC-ASL-TGK-INDEX