2026-01-17 00:19:49 +01:00

# ENC-TGK-INDEX
### Encoding Specification for TGK Edge Index References
---
## 1. Purpose
ENC-TGK-INDEX defines the **on-disk encoding for Trace Graph Kernel (TGK) index records**, which serve as **references to TGK-CORE edges**.
* It **never encodes edge structure** (`from[]` / `to[]`)
* It supports **filters, sharding, and routing** per ASL-INDEX-ACCEL
* It maintains snapshot and log-sequence semantics for deterministic recovery
---
## 2. Layering Principle
* **TGK-CORE / ENC-TGK-CORE**: authoritative edge structure (`from[] → to[]`)
* **TGK-INDEX**: defines canonical keys, routing keys, acceleration logic
* **ENC-TGK-INDEX**: stores references to TGK-CORE edges and acceleration metadata

**Normative statement:**
> ENC-TGK-INDEX encodes only references to TGK-CORE edges and MUST NOT re-encode or reinterpret edge structure.
---
## 3. Segment Layout
Segments are **immutable** and **snapshot-bound**:
```
+-----------------------------+
| Segment Header              |
+-----------------------------+
| Routing Filters             |
+-----------------------------+
| TGK Index Records           |
+-----------------------------+
| Optional Acceleration Data  |
+-----------------------------+
| Segment Footer              |
+-----------------------------+
```
* Segments are atomic: readers never observe a partially written segment
* The footer checksum lets readers verify that a segment is complete
---
## 4. Segment Header
```c
#include <stdint.h>

struct tgk_index_segment_header {
    uint32_t magic;              // 'TGKI'
    uint16_t version;            // encoding version
    uint16_t flags;              // segment flags
    uint64_t segment_id;         // unique per dataset
    uint64_t logseq_min;         // inclusive
    uint64_t logseq_max;         // inclusive
    uint64_t record_count;       // number of index records
    uint64_t record_area_offset; // bytes from segment start
    uint64_t footer_offset;      // bytes from segment start
};
```
* `logseq_min` / `logseq_max` (both inclusive) bound the `logseq` values of the segment's records and are used to enforce snapshot visibility
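
A reader can sanity-check a header before trusting its offsets. The sketch below is hypothetical: `tgk_header_plausible`, `TGK_INDEX_MAGIC`, and the size constants are assumptions, not part of this specification. It assumes the magic `'TGKI'` compares equal to the 32-bit value `0x54474B49` (this document does not fix a byte order), that records are the 24-byte `tgk_index_record` from section 6, and that the footer is the 24-byte struct from section 10.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

#define TGK_INDEX_MAGIC       0x54474B49u /* 'T' 'G' 'K' 'I'; byte order ASSUMED */
#define TGK_INDEX_RECORD_SIZE 24u         /* sizeof(struct tgk_index_record), ASSUMED */
#define TGK_INDEX_FOOTER_SIZE 24u         /* sizeof(struct tgk_index_segment_footer) */

/* HYPOTHETICAL check run before any offset in the header is dereferenced. */
static bool tgk_header_plausible(const struct tgk_index_segment_header *h,
                                 uint64_t segment_size)
{
    assert(h != NULL);
    if (h->magic != TGK_INDEX_MAGIC)
        return false;
    if (h->logseq_min > h->logseq_max)       /* inclusive range must be ordered */
        return false;
    if (h->record_area_offset < sizeof *h)   /* filters and records follow the header */
        return false;
    if (h->footer_offset > segment_size ||
        segment_size - h->footer_offset < TGK_INDEX_FOOTER_SIZE)
        return false;                        /* footer must fit inside the segment */
    if (h->record_area_offset > h->footer_offset)
        return false;
    /* All records must fit before the footer; dividing avoids overflow. */
    if (h->record_count >
        (h->footer_offset - h->record_area_offset) / TGK_INDEX_RECORD_SIZE)
        return false;
    return true;
}
```

Rejecting a header with `logseq_min > logseq_max` keeps the inclusive range meaningful for the snapshot checks in section 9.
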
---
## 5. Routing Filters
Filters are **optional but recommended**:
```c
#include <stdint.h>

struct tgk_index_filter_header {
    uint16_t filter_type; // e.g., BLOOM, XOR, RIBBON
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;  // length of filter payload
};
```
* Filters operate on **routing keys**, not canonical edge IDs
* Routing keys may include:
  * Edge type key
  * Projection context
  * Direction or role
* False positives allowed; false negatives forbidden
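
To illustrate the advisory contract, the sketch below implements a plain Bloom filter (one of the listed `filter_type` values) probed with routing keys. The routing-key packing in `tgk_routing_key`, the splitmix64-based hash, `BLOOM_K`, and the double-hashing probe sequence are all assumptions chosen for the example; the payload is treated as a bit array of `size_bytes` bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL routing key: packs edge type key and role into 64 bits.
 * A real layout might also fold in projection context. */
static uint64_t tgk_routing_key(uint32_t edge_type_key, uint8_t role)
{
    return ((uint64_t)edge_type_key << 8) | role;
}

/* splitmix64 finalizer, used here as a stand-in hash (ASSUMPTION). */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

#define BLOOM_K 4 /* probes per key; HYPOTHETICAL tuning value */

/* i-th probe position by double hashing: (h1 + i * h2) mod nbits. */
static uint64_t bloom_bit(uint64_t key, int i, uint64_t nbits)
{
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(key ^ 0x9e3779b97f4a7c15ULL) | 1;
    return (h1 + (uint64_t)i * h2) % nbits;
}

static void bloom_add(uint8_t *bits, size_t nbytes, uint64_t key)
{
    assert(nbytes > 0);
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t b = bloom_bit(key, i, (uint64_t)nbytes * 8);
        bits[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* false: key is definitely absent, so the segment can be skipped.
 * true:  key MAY be present; the caller must still confirm tgk_edge_id. */
static bool bloom_maybe_contains(const uint8_t *bits, size_t nbytes, uint64_t key)
{
    if (nbytes == 0)
        return true; /* no filter payload: nothing can be excluded */
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t b = bloom_bit(key, i, (uint64_t)nbytes * 8);
        if (!(bits[b / 8] & (1u << (b % 8))))
            return false;
    }
    return true;
}
```

A `false` result lets a reader skip the segment; a `true` result only narrows the search, because false positives are permitted.
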
---
## 6. TGK Index Record
Each record references a **single TGK-CORE edge**:
```c
#include <stdint.h>

struct tgk_index_record {
    uint64_t logseq;        // creation log sequence
    uint64_t tgk_edge_id;   // reference to ENC-TGK-CORE edge
    uint32_t edge_type_key; // optional classification
    uint8_t  has_edge_type; // 0 or 1
    uint8_t  role;          // optional: from / to / both
    uint16_t flags;         // tombstone, reserved
};
```
* `tgk_edge_id` is the **canonical key**
* No `from[]` / `to[]` fields exist here
* Edge identity is determined **solely by the TGK-CORE edge ID**

**Flags**:

| Flag | Meaning |
| --------------------- | ----------------------- |
| `TGK_INDEX_TOMBSTONE` | Shadows previous record |
| `TGK_INDEX_RESERVED` | Future use |
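
The flags table names `TGK_INDEX_TOMBSTONE` but does not assign bit values. The helpers below assume, hypothetically, that the tombstone is bit 0 and that every other bit is reserved (`TGK_INDEX_RESERVED_MASK` is an illustrative name). They show two rules from this section: the record has a fixed 24-byte layout with no padding between fields, and `edge_type_key` is meaningful only when `has_edge_type` is 1 and never participates in identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

/* Field sizes add up to 24 bytes; natural alignment introduces no padding. */
static_assert(sizeof(struct tgk_index_record) == 24, "unexpected record padding");

#define TGK_INDEX_TOMBSTONE     0x0001u /* HYPOTHETICAL bit assignment */
#define TGK_INDEX_RESERVED_MASK 0xFFFEu /* HYPOTHETICAL: all other bits reserved */

static bool tgk_record_is_tombstone(const struct tgk_index_record *r)
{
    return (r->flags & TGK_INDEX_TOMBSTONE) != 0;
}

/* Writes the classification key only when present; identity stays tgk_edge_id. */
static bool tgk_record_edge_type(const struct tgk_index_record *r, uint32_t *out)
{
    if (r->has_edge_type != 1)
        return false;
    *out = r->edge_type_key;
    return true;
}

/* HYPOTHETICAL well-formedness check: has_edge_type is 0 or 1, reserved bits clear. */
static bool tgk_record_well_formed(const struct tgk_index_record *r)
{
    return r->has_edge_type <= 1 && (r->flags & TGK_INDEX_RESERVED_MASK) == 0;
}
```
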
---
## 7. Optional Node-Projection Records (Acceleration Only)
For node-centric queries, optional records may map each node to the edges that reference it:
```c
#include <stdint.h>

struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position; // from or to
};
```
* **Derivable from TGK-CORE edges**
* Optional; purely for acceleration
* Must not affect semantics
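
Because node-projection records are derivable from TGK-CORE edges, a writer could generate them mechanically. In this sketch `tgk_core_edge_view`, `tgk_project_edge`, and the position codes `TGK_POS_FROM` / `TGK_POS_TO` are hypothetical; the real edge encoding is defined by ENC-TGK-CORE. It also assumes each projection record reuses the `logseq` of the edge it was derived from, so it follows the same snapshot visibility rules.

```c
#include <stddef.h>
#include <stdint.h>

struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;
};

enum { TGK_POS_FROM = 0, TGK_POS_TO = 1 }; /* HYPOTHETICAL position codes */

/* HYPOTHETICAL in-memory view of one decoded TGK-CORE edge. */
struct tgk_core_edge_view {
    uint64_t edge_id;
    uint64_t logseq;
    const uint64_t *from;
    size_t from_len;
    const uint64_t *to;
    size_t to_len;
};

/* Emits one ref per endpoint. Writes at most cap refs and returns the
 * number required, so a caller can size the output buffer and retry. */
static size_t tgk_project_edge(const struct tgk_core_edge_view *e,
                               struct tgk_node_edge_ref *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < e->from_len; i++, n++)
        if (n < cap)
            out[n] = (struct tgk_node_edge_ref){ e->logseq, e->from[i], e->edge_id, TGK_POS_FROM };
    for (size_t i = 0; i < e->to_len; i++, n++)
        if (n < cap)
            out[n] = (struct tgk_node_edge_ref){ e->logseq, e->to[i], e->edge_id, TGK_POS_TO };
    return n;
}
```

Since every ref is computed from the edge alone, dropping or rebuilding these records cannot change query results, which is what "must not affect semantics" requires.
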
---
## 8. Sharding and SIMD
* Shard assignment is derived from **routing keys**, **not from index semantics**
* SIMD-optimized arrays may exist in optional acceleration sections
* Must be deterministic and immutable
* Must follow ASL-INDEX-ACCEL invariants
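
A minimal sketch of both ideas, under stated assumptions: the shard is a pure function of the routing key (here splitmix64 modulo the shard count, a hypothetical choice), so the same key always lands on the same shard; and a hypothetical structure-of-arrays acceleration section stores edge IDs contiguously so the comparison loop is branch-free, a shape compilers can often auto-vectorize. `tgk_shard_for` and `tgk_count_edge_id` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64 finalizer: a stand-in hash (ASSUMPTION, not mandated here). */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Shard depends only on the routing key and the shard count. */
static uint32_t tgk_shard_for(uint64_t routing_key, uint32_t n_shards)
{
    assert(n_shards > 0);
    return (uint32_t)(mix64(routing_key) % n_shards);
}

/* HYPOTHETICAL SoA section: edge IDs stored contiguously. The loop body has
 * no data-dependent branch, which makes it a good candidate for SIMD. */
static size_t tgk_count_edge_id(const uint64_t *edge_ids, size_t n, uint64_t edge_id)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (edge_ids[i] == edge_id);
    return hits;
}
```

Because sharding must be observationally invisible, a reader that fans out across shards still merges the results and applies the snapshot rules in section 9.
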
---
## 9. Snapshot Interaction
At snapshot `S`:
* Segment visible if `logseq_min ≤ S`
* Record visible if `logseq ≤ S`
* Tombstones shadow earlier records with the same `tgk_edge_id`

**Lookup Algorithm**:
1. Filter segments and records by snapshot
2. Evaluate routing keys against filters (advisory: a filter can only exclude candidates)
3. Confirm the canonical key match with `tgk_edge_id`
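
The lookup steps can be sketched over in-memory segment views. `tgk_segment_view`, `tgk_lookup`, and the tombstone bit value are hypothetical; step 2 is omitted because filters are advisory and never change the result. The sketch also assumes `logseq` values are unique, so "newest visible record" is well defined, and it resolves shadowing by keeping the highest visible `logseq` for the edge and returning nothing if that record is a tombstone.

```c
#include <stddef.h>
#include <stdint.h>

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

#define TGK_INDEX_TOMBSTONE 0x0001u /* HYPOTHETICAL bit value */

/* HYPOTHETICAL decoded view of one segment. */
struct tgk_segment_view {
    uint64_t logseq_min;                    /* from the segment header */
    const struct tgk_index_record *records;
    size_t record_count;
};

/* Returns the live record for edge_id at snapshot, or NULL if none is visible
 * or the newest visible record is a tombstone. */
static const struct tgk_index_record *
tgk_lookup(const struct tgk_segment_view *segs, size_t nsegs,
           uint64_t snapshot, uint64_t edge_id)
{
    const struct tgk_index_record *newest = NULL;
    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].logseq_min > snapshot)
            continue;                        /* step 1: segment not visible */
        for (size_t j = 0; j < segs[i].record_count; j++) {
            const struct tgk_index_record *r = &segs[i].records[j];
            if (r->logseq > snapshot)
                continue;                    /* step 1: record not visible */
            if (r->tgk_edge_id != edge_id)
                continue;                    /* step 3: canonical key */
            if (newest == NULL || r->logseq > newest->logseq)
                newest = r;                  /* keep the newest visible record */
        }
    }
    if (newest != NULL && (newest->flags & TGK_INDEX_TOMBSTONE))
        return NULL;                         /* tombstone shadows older records */
    return newest;
}
```
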
---
## 10. Segment Footer
```c
#include <stdint.h>

struct tgk_index_segment_footer {
    uint64_t checksum;     // covers header + filters + records
    uint64_t record_bytes; // size of record area
    uint64_t filter_bytes; // size of filter area
};
```
* Lets readers detect incomplete or corrupted segments, preserving segment atomicity
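
One way a reader might check completeness before exposing a segment is sketched below. This document does not name a checksum algorithm, so FNV-1a 64 stands in as a hypothetical placeholder. The sketch also assumes native byte order and that header, filters, and records are contiguous from offset 0, so the checksum covers bytes `[0, record_area_offset + record_bytes)`; `tgk_segment_complete` is an illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

struct tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t record_bytes;
    uint64_t filter_bytes;
};

/* HYPOTHETICAL checksum: FNV-1a 64 as a placeholder algorithm. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* A segment counts as complete only if its footer is in bounds and the
 * checksum over header + filters + records matches. */
static bool tgk_segment_complete(const uint8_t *seg, size_t len)
{
    struct tgk_index_segment_header h;
    struct tgk_index_segment_footer f;
    if (len < sizeof h)
        return false;
    memcpy(&h, seg, sizeof h);               /* assumes native byte order */
    if (h.footer_offset > len || len - h.footer_offset < sizeof f)
        return false;                        /* truncated: footer missing */
    memcpy(&f, seg + h.footer_offset, sizeof f);
    if (h.record_area_offset > h.footer_offset ||
        f.record_bytes > h.footer_offset - h.record_area_offset)
        return false;                        /* record area overruns footer */
    uint64_t covered = h.record_area_offset + f.record_bytes;
    return fnv1a64(seg, (size_t)covered) == f.checksum;
}
```
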
---
## 11. Normative Invariants
1. **Edge identity = TGK-CORE edge ID**
2. Edge Type Key is **not part of identity**
3. Filters are **advisory only**
4. Sharding is observationally invisible
5. Index records are immutable
6. Snapshot visibility strictly follows `logseq`
7. Determinism guaranteed per snapshot
---
## 12. Summary
ENC-TGK-INDEX:
* References TGK-CORE edges without re-encoding structure
* Supports snapshot-safe, deterministic lookup
* Enables filter, shard, and SIMD acceleration
* Preserves TGK-CORE semantics strictly

This design **fully respects layering** and **prevents accidental semantic duplication**, while allowing scalable, high-performance indexing.