# ENC-TGK-INDEX

### Encoding Specification for TGK Edge Index References

---

## 1. Purpose

ENC-TGK-INDEX defines the **on-disk encoding for Trace Graph Kernel (TGK) index records**, which serve as **references to TGK-CORE edges**.

* It **never encodes edge structure** (`from[]` / `to[]`)
* It supports **filters, sharding, and routing** per ASL-INDEX-ACCEL
* It maintains snapshot and log-sequence semantics for deterministic recovery

---

## 2. Layering Principle

* **TGK-CORE / ENC-TGK-CORE**: authoritative edge structure (`from[] → to[]`)
* **TGK-INDEX**: defines canonical keys, routing keys, and acceleration logic
* **ENC-TGK-INDEX**: stores references to TGK-CORE edges and acceleration metadata

**Normative statement:**

> ENC-TGK-INDEX encodes only references to TGK-CORE edges and MUST NOT re-encode or reinterpret edge structure.

---

## 3. Segment Layout

Segments are **immutable** and **snapshot-bound**:

```
+-----------------------------+
| Segment Header              |
+-----------------------------+
| Routing Filters             |
+-----------------------------+
| TGK Index Records           |
+-----------------------------+
| Optional Acceleration Data  |
+-----------------------------+
| Segment Footer              |
+-----------------------------+
```

* Segment atomicity is enforced
* The footer checksum guarantees completeness

---

## 4. Segment Header

```c
#include <stdint.h>

struct tgk_index_segment_header {
    uint32_t magic;              // 'TGKI'
    uint16_t version;            // encoding version
    uint16_t flags;              // segment flags
    uint64_t segment_id;         // unique per dataset
    uint64_t logseq_min;         // inclusive
    uint64_t logseq_max;         // inclusive
    uint64_t record_count;       // number of index records
    uint64_t record_area_offset; // bytes from segment start
    uint64_t footer_offset;      // bytes from segment start
};
```

* `logseq_min` / `logseq_max` bound the log sequences of the segment's records and enforce snapshot visibility (see section 9)

---

## 5. Routing Filters

Filters are **optional but recommended**:

```c
struct tgk_index_filter_header {
    uint16_t filter_type;  // e.g., BLOOM, XOR, RIBBON
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;   // length of filter payload
};
```

* Filters operate on **routing keys**, not canonical edge IDs
* Routing keys may include:
  * Edge type key
  * Projection context
  * Direction or role
* False positives are allowed; false negatives are forbidden

---

## 6. TGK Index Record

Each record references a **single TGK-CORE edge**:

```c
struct tgk_index_record {
    uint64_t logseq;         // creation log sequence
    uint64_t tgk_edge_id;    // reference to ENC-TGK-CORE edge
    uint32_t edge_type_key;  // optional classification
    uint8_t  has_edge_type;  // 0 or 1
    uint8_t  role;           // optional: from / to / both
    uint16_t flags;          // tombstone, reserved
};
```

* `tgk_edge_id` is the **canonical key**
* No `from[]` / `to[]` fields exist here
* Edge identity is **solely the TGK-CORE edge ID**

**Flags**:

| Flag                  | Meaning                 |
| --------------------- | ----------------------- |
| `TGK_INDEX_TOMBSTONE` | Shadows previous record |
| `TGK_INDEX_RESERVED`  | Future use              |

---

## 7. Optional Node-Projection Records (Acceleration Only)

For node-centric queries, optional records may map a node to an edge in which it participates:

```c
struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;   // from or to
};
```

* **Derivable from TGK-CORE edges**
* Optional; purely for acceleration
* Must not affect semantics

---

## 8. Sharding and SIMD

* Shard assignment uses **routing keys**, **not index semantics**
* SIMD-optimized arrays may exist in optional acceleration sections
* Acceleration data must be deterministic and immutable
* Acceleration data must follow ASL-INDEX-ACCEL invariants

---

## 9. Snapshot Interaction

At snapshot `S`:

* A segment is visible if `logseq_min ≤ S`
* A record is visible if `logseq ≤ S`
* A visible tombstone shadows earlier records with the same `tgk_edge_id`

**Lookup Algorithm**:

1. Filter segments and records by snapshot
2. Evaluate routing/filter keys (advisory)
3. Confirm canonical key match with `tgk_edge_id`

---

## 10. Segment Footer

```c
struct tgk_index_segment_footer {
    uint64_t checksum;      // covers header + filters + records
    uint64_t record_bytes;  // size of record area
    uint64_t filter_bytes;  // size of filter area
};
```

* The footer ensures atomicity and completeness

---

## 11. Normative Invariants

1. **Edge identity = TGK-CORE edge ID**
2. Edge Type Key is **not part of identity**
3. Filters are **advisory only**
4. Sharding is observationally invisible
5. Index records are immutable
6. Snapshot visibility strictly follows `logseq`
7. Determinism is guaranteed per snapshot

---

## 12. Summary

ENC-TGK-INDEX:

* References TGK-CORE edges without re-encoding structure
* Supports snapshot-safe, deterministic lookup
* Enables filter, shard, and SIMD acceleration
* Preserves TGK-CORE semantics strictly

This design **fully respects layering** and **prevents accidental semantic duplication** while allowing scalable, high-performance indexing.
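The header, record, filter-header, and footer layouts in sections 4, 5, 6, and 10 only work as an on-disk format if the C structs carry no hidden padding. The following C11 sketch, which is illustrative and not part of the normative encoding, checks that at compile time. It says nothing about byte order, which this specification does not state.

```c
#include <stddef.h>
#include <stdint.h>

struct tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

struct tgk_index_filter_header {
    uint16_t filter_type;
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;
};

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

struct tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t record_bytes;
    uint64_t filter_bytes;
};

/* Each size equals the plain sum of field sizes, i.e. no compiler padding. */
_Static_assert(sizeof(struct tgk_index_segment_header) == 56, "header: 4+2+2+6*8 bytes");
_Static_assert(offsetof(struct tgk_index_segment_header, segment_id) == 8, "no gap before segment_id");
_Static_assert(sizeof(struct tgk_index_filter_header) == 16, "filter header: 2+2+4+8 bytes");
_Static_assert(sizeof(struct tgk_index_record) == 24, "record: 8+8+4+1+1+2 bytes");
_Static_assert(offsetof(struct tgk_index_record, flags) == 22, "flags packed after role");
_Static_assert(sizeof(struct tgk_index_segment_footer) == 24, "footer: 3*8 bytes");
```

`tgk_node_edge_ref` in section 7 is deliberately left out: its fields sum to 25 bytes, so its in-memory size depends on the platform's alignment of `uint64_t`.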
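Section 5 requires routing filters to admit false positives but never false negatives. A minimal Bloom filter sketch over routing keys shows why that property holds: inserting a key sets all of its probe bits, so a later query for the same key always finds them set. The routing-key packing (`tgk_routing_key`), the SplitMix64-based mixer, the double-hashing probe sequence, and the filter size are all assumptions for illustration; the specification names `BLOOM` as a filter type but fixes none of these details.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters; the spec does not fix filter sizing or hashing. */
#define TGK_BLOOM_BITS   1024u
#define TGK_BLOOM_HASHES 4u

struct tgk_bloom {
    uint8_t bits[TGK_BLOOM_BITS / 8];
};

/* Hypothetical routing key: edge type key in the high 32 bits, role below. */
static uint64_t tgk_routing_key(uint32_t edge_type_key, uint8_t role)
{
    return ((uint64_t)edge_type_key << 32) | role;
}

/* SplitMix64 finalizer, used here as a deterministic 64-bit mixer. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Double hashing: probe i selects bit (h1 + i*h2) mod TGK_BLOOM_BITS. */
static uint32_t bloom_bit(uint64_t key, uint32_t i)
{
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(h1) | 1u;   /* force an odd step */
    return (uint32_t)((h1 + (uint64_t)i * h2) % TGK_BLOOM_BITS);
}

static void tgk_bloom_init(struct tgk_bloom *f)
{
    memset(f->bits, 0, sizeof f->bits);
}

static void tgk_bloom_add(struct tgk_bloom *f, uint64_t key)
{
    for (uint32_t i = 0; i < TGK_BLOOM_HASHES; i++) {
        uint32_t b = bloom_bit(key, i);
        f->bits[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* false: key definitely absent. true: key may be present (advisory only). */
static bool tgk_bloom_may_contain(const struct tgk_bloom *f, uint64_t key)
{
    for (uint32_t i = 0; i < TGK_BLOOM_HASHES; i++) {
        uint32_t b = bloom_bit(key, i);
        if (!(f->bits[b / 8] & (1u << (b % 8))))
            return false;
    }
    return true;
}
```

A `true` answer is only a hint: per step 3 of the lookup algorithm in section 9, the reader must still confirm the match against `tgk_edge_id`.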
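Section 7 states that node-projection records are derivable from TGK-CORE edges. The sketch below makes that concrete by emitting one `tgk_node_edge_ref` per endpoint of an edge. The `tgk_core_edge_view` type and the `TGK_POS_FROM` / `TGK_POS_TO` values are hypothetical stand-ins, since the decoded form of an ENC-TGK-CORE edge and the encoding of `position` are not defined here.

```c
#include <stddef.h>
#include <stdint.h>

struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;   /* from or to */
};

/* Assumed position encoding; the spec only says "from or to". */
enum { TGK_POS_FROM = 0, TGK_POS_TO = 1 };

/* Hypothetical read-only view of a decoded ENC-TGK-CORE edge. */
struct tgk_core_edge_view {
    uint64_t edge_id;
    uint64_t logseq;
    const uint64_t *from;
    size_t from_len;
    const uint64_t *to;
    size_t to_len;
};

/*
 * Emit one node->edge reference per endpoint. Writes at most cap entries
 * and returns the number required, so callers can size the buffer.
 */
static size_t tgk_project_edge(const struct tgk_core_edge_view *e,
                               struct tgk_node_edge_ref *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < e->from_len; i++, n++)
        if (n < cap)
            out[n] = (struct tgk_node_edge_ref){
                e->logseq, e->from[i], e->edge_id, TGK_POS_FROM};
    for (size_t i = 0; i < e->to_len; i++, n++)
        if (n < cap)
            out[n] = (struct tgk_node_edge_ref){
                e->logseq, e->to[i], e->edge_id, TGK_POS_TO};
    return n;
}
```

Because the output is a pure function of the edge, these records can be dropped and rebuilt without changing query results, which is what keeps them acceleration-only.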
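The snapshot rules and lookup algorithm in section 9 can be sketched for a single segment as follows. The sketch assumes a concrete bit value for `TGK_INDEX_TOMBSTONE` (the specification names the flag but does not assign a value), treats the newest visible record for a key as authoritative, and omits both the advisory filter step and merging across segments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value: the spec names TGK_INDEX_TOMBSTONE but does not assign it. */
#define TGK_INDEX_TOMBSTONE 0x0001u

struct tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

/* A segment can contribute records to snapshot S only if logseq_min <= S. */
static bool tgk_segment_visible(const struct tgk_index_segment_header *h,
                                uint64_t snapshot)
{
    return h->logseq_min <= snapshot;
}

/*
 * Resolve one canonical key within one segment at snapshot S.
 * Returns the newest visible record for edge_id, or NULL if the key is
 * absent or its newest visible record is a tombstone.
 */
static const struct tgk_index_record *
tgk_lookup_edge(const struct tgk_index_segment_header *h,
                const struct tgk_index_record *recs,
                uint64_t edge_id, uint64_t snapshot)
{
    if (!tgk_segment_visible(h, snapshot))
        return NULL;

    const struct tgk_index_record *best = NULL;
    for (uint64_t i = 0; i < h->record_count; i++) {
        const struct tgk_index_record *r = &recs[i];
        if (r->logseq > snapshot)          /* not visible at S */
            continue;
        if (r->tgk_edge_id != edge_id)     /* canonical key mismatch */
            continue;
        if (best == NULL || r->logseq >= best->logseq)
            best = r;                      /* keep the newest visible record */
    }
    if (best != NULL && (best->flags & TGK_INDEX_TOMBSTONE))
        return NULL;                       /* shadowed by a tombstone */
    return best;
}
```

With a record for edge 42 at `logseq` 10 and a tombstone for it at `logseq` 20, a lookup at snapshot 15 returns the first record, while a lookup at snapshot 25 reports the edge as absent.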
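Section 10 says the footer checksum covers the header, filters, and records, which lets a reader reject truncated or partially written segments. The sketch below shows one such check. FNV-1a 64 stands in for the checksum purely for illustration, since the algorithm is not specified; the sketch also assumes native byte order and that the filter area starts directly after the header, so the checksummed bytes are contiguous.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

struct tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t record_bytes;
    uint64_t filter_bytes;
};

/* FNV-1a 64-bit: a stand-in; the spec does not name the checksum algorithm. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Accept a segment only if its footer is present and the checksum matches. */
static bool tgk_segment_complete(const uint8_t *seg, size_t len)
{
    struct tgk_index_segment_header h;
    struct tgk_index_segment_footer f;

    if (len < sizeof h)
        return false;
    memcpy(&h, seg, sizeof h);

    if (h.footer_offset > len || len - h.footer_offset < sizeof f)
        return false;                          /* truncated: no full footer */
    memcpy(&f, seg + h.footer_offset, sizeof f);

    /* Assumed layout: header, then filter_bytes of filters, then records. */
    if (f.filter_bytes > len || h.record_area_offset != sizeof h + f.filter_bytes)
        return false;

    uint64_t covered = h.record_area_offset + f.record_bytes;
    if (covered < h.record_area_offset || covered > h.footer_offset)
        return false;                          /* records overrun the footer */

    return fnv1a64(seg, (size_t)covered) == f.checksum;
}
```

Any optional acceleration data between the record area and the footer falls outside the checksummed range in this sketch, matching the footer comment that the checksum covers only header, filters, and records.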