amduat-api/notes/enc-tgk-index.md
2026-01-17 07:32:14 +01:00


NOTE: Superseded by tier1/tgk-1.md (TGK/1). Kept for historical context.

ENC-TGK-INDEX

Encoding Specification for TGK Edge Index References


1. Purpose

ENC-TGK-INDEX defines the on-disk encoding for Trace Graph Kernel (TGK) index records, which serve as references to TGK-CORE edges.

  • It never encodes edge structure (from[] / to[])
  • It supports filters, sharding, and routing per ASL-INDEX-ACCEL
  • Snapshot and log-sequence semantics are maintained for deterministic recovery

2. Layering Principle

  • TGK-CORE / ENC-TGK-CORE: authoritative edge structure (from[] → to[])
  • TGK-INDEX: defines canonical keys, routing keys, acceleration logic
  • ENC-TGK-INDEX: stores references to TGK-CORE edges and acceleration metadata

Normative statement:

ENC-TGK-INDEX encodes only references to TGK-CORE edges and MUST NOT re-encode or reinterpret edge structure.


3. Segment Layout

Segments are immutable and snapshot-bound:

+-----------------------------+
| Segment Header              |
+-----------------------------+
| Routing Filters             |
+-----------------------------+
| TGK Index Records           |
+-----------------------------+
| Optional Acceleration Data  |
+-----------------------------+
| Segment Footer              |
+-----------------------------+
  • Segments are immutable once written and are published atomically
  • The footer checksum lets a reader detect an incomplete or torn segment
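As a rough illustration of the layout above, the offsets of each region can be computed by placing the regions back to back in the order shown. This is a sketch only: the type and function names (tgk_region_sizes, tgk_compute_layout) are hypothetical, and it assumes no alignment padding between regions, which this note does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, in bytes, of each region of one segment. */
typedef struct {
    uint64_t header_bytes;  /* segment header */
    uint64_t filter_bytes;  /* routing filters (may be 0) */
    uint64_t record_bytes;  /* TGK index records */
    uint64_t accel_bytes;   /* optional acceleration data (may be 0) */
    uint64_t footer_bytes;  /* segment footer */
} tgk_region_sizes;

/* Offsets of each region, measured from the start of the segment. */
typedef struct {
    uint64_t filter_offset;
    uint64_t record_area_offset;
    uint64_t accel_offset;
    uint64_t footer_offset;
    uint64_t total_bytes;
} tgk_segment_layout;

/* Adds two sizes, asserting that the sum does not overflow 64 bits. */
static uint64_t add_checked(uint64_t a, uint64_t b)
{
    assert(a <= UINT64_MAX - b);
    return a + b;
}

/* Places header, filters, records, acceleration data and footer back to
   back (assumption: no padding between regions). */
static tgk_segment_layout tgk_compute_layout(tgk_region_sizes s)
{
    tgk_segment_layout l;
    l.filter_offset      = s.header_bytes;
    l.record_area_offset = add_checked(l.filter_offset, s.filter_bytes);
    l.accel_offset       = add_checked(l.record_area_offset, s.record_bytes);
    l.footer_offset      = add_checked(l.accel_offset, s.accel_bytes);
    l.total_bytes        = add_checked(l.footer_offset, s.footer_bytes);
    return l;
}
```

With a 56-byte header, 128 bytes of filters, ten 24-byte records, no acceleration data and a 24-byte footer, the records start at offset 184 and the footer at offset 424.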

4. Segment Header

struct tgk_index_segment_header {
    uint32_t magic;              // 'TGKI'
    uint16_t version;            // encoding version
    uint16_t flags;              // segment flags
    uint64_t segment_id;         // unique per dataset
    uint64_t logseq_min;         // inclusive
    uint64_t logseq_max;         // inclusive
    uint64_t record_count;       // number of index records
    uint64_t record_area_offset; // bytes from segment start
    uint64_t footer_offset;      // bytes from segment start
};
  • logseq_min / logseq_max enforce snapshot visibility
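A reader would typically sanity-check the header before trusting any offsets. The sketch below is one possible set of checks, not a normative list. Several details are assumptions not fixed by this note: the magic is taken to be the ASCII bytes T, G, K, I decoded little-endian; TGK_RECORD_SIZE and TGK_FOOTER_SIZE are the unpadded sizes of the record and footer structs (24 bytes each); and tgk_header_is_valid is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

/* Assumption: 'TGKI' is stored as ASCII bytes T,G,K,I and decoded little-endian. */
#define TGK_INDEX_MAGIC ((uint32_t)'T' | ((uint32_t)'G' << 8) | \
                         ((uint32_t)'K' << 16) | ((uint32_t)'I' << 24))
/* Assumed on-disk sizes of one index record and of the footer. */
#define TGK_RECORD_SIZE 24u
#define TGK_FOOTER_SIZE 24u

/* Returns true if the header is self-consistent for a segment of segment_size bytes. */
static bool tgk_header_is_valid(const struct tgk_index_segment_header *h,
                                uint64_t segment_size)
{
    assert(h != NULL);
    if (h->magic != TGK_INDEX_MAGIC) return false;
    if (h->logseq_min > h->logseq_max) return false;      /* inclusive range */
    if (h->record_area_offset < sizeof *h) return false;  /* records follow the header */
    if (h->record_area_offset > h->footer_offset) return false;
    if (h->footer_offset > segment_size ||
        segment_size - h->footer_offset < TGK_FOOTER_SIZE) return false;
    /* record_count records must fit before the footer (division avoids overflow). */
    uint64_t area = h->footer_offset - h->record_area_offset;
    return h->record_count <= area / TGK_RECORD_SIZE;
}
```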

5. Routing Filters

Filters are optional but recommended:

struct tgk_index_filter_header {
    uint16_t filter_type;    // e.g., BLOOM, XOR, RIBBON
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;     // length of filter payload
};
  • Filters operate on routing keys, not canonical edge IDs
  • Routing keys may include:
      • Edge type key
      • Projection context
      • Direction or role
  • False positives allowed; false negatives forbidden
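The "no false negatives" rule is what makes a Bloom filter suitable here: every key that was added always tests positive, while an unrelated key may occasionally test positive too. The sketch below is a minimal Bloom filter over a hypothetical 64-bit routing key packed from edge type, projection context and role. The key packing, the filter size, the probe count and the splitmix64-based hashing are all illustrative assumptions, not the encoding defined by ASL-INDEX-ACCEL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS   1024u  /* hypothetical filter size, in bits */
#define BLOOM_HASHES 3u     /* hypothetical number of probes per key */

typedef struct { uint8_t bits[BLOOM_BITS / 8]; } tgk_bloom;

/* Hypothetical routing key: edge type (bits 32..63), projection (8..23), role (0..7). */
static uint64_t tgk_routing_key(uint32_t edge_type_key, uint16_t projection, uint8_t role)
{
    return ((uint64_t)edge_type_key << 32) | ((uint64_t)projection << 8) | role;
}

/* splitmix64 finalizer, used here as a 64-bit mixing hash. */
static uint64_t mix64(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ull;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

/* Sets BLOOM_HASHES bits derived from the key by double hashing. */
static void bloom_add(tgk_bloom *f, uint64_t key)
{
    uint64_t h1 = mix64(key), h2 = mix64(h1) | 1;
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = (uint32_t)((h1 + i * h2) % BLOOM_BITS);
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* May return true for a key never added (false positive);
   never returns false for a key that was added (no false negatives). */
static bool bloom_may_contain(const tgk_bloom *f, uint64_t key)
{
    uint64_t h1 = mix64(key), h2 = mix64(h1) | 1;
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = (uint32_t)((h1 + i * h2) % BLOOM_BITS);
        if (!(f->bits[bit / 8] & (1u << (bit % 8)))) return false;
    }
    return true;
}
```

A negative answer lets a reader skip the segment; a positive answer still requires the canonical tgk_edge_id check.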


6. TGK Index Record

Each record references a single TGK-CORE edge:

struct tgk_index_record {
    uint64_t logseq;           // creation log sequence
    uint64_t tgk_edge_id;      // reference to ENC-TGK-CORE edge
    uint32_t edge_type_key;    // optional classification
    uint8_t  has_edge_type;    // 0 or 1
    uint8_t  role;             // optional: from / to / both
    uint16_t flags;            // tombstone, reserved
};
  • tgk_edge_id is the canonical key
  • No from[] / to[] fields exist here
  • Edge identity is solely TGK-CORE edge ID

Flags:

  Flag                 Meaning
  TGK_INDEX_TOMBSTONE  Shadows earlier records for the same tgk_edge_id
  TGK_INDEX_RESERVED   Reserved for future use

7. Optional Node-Projection Records (Acceleration Only)

For node-centric queries, optional records may map a node ID to each TGK-CORE edge that references it:

struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position; // from or to
};
  • Derivable from TGK-CORE edges
  • Optional; purely for acceleration
  • Must not affect semantics
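Because these records are derivable from TGK-CORE edges, a builder can emit one per endpoint and sort them by node so that all references to a node are contiguous and can be found by binary search. The sketch assumes a hypothetical decoded edge view (tgk_core_edge_view) and hypothetical position values TGK_POS_FROM = 0 and TGK_POS_TO = 1; neither is defined by this note or by ENC-TGK-CORE as quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;
};

/* Hypothetical position values; the note only says "from or to". */
enum { TGK_POS_FROM = 0, TGK_POS_TO = 1 };

/* Hypothetical in-memory view of a decoded TGK-CORE edge. */
struct tgk_core_edge_view {
    uint64_t tgk_edge_id;
    uint64_t logseq;
    const uint64_t *from; size_t n_from;
    const uint64_t *to;   size_t n_to;
};

/* Emits one ref per endpoint; returns the number written (at most cap). */
static size_t tgk_project_edge(const struct tgk_core_edge_view *e,
                               struct tgk_node_edge_ref *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < e->n_from && n < cap; i++)
        out[n++] = (struct tgk_node_edge_ref){e->logseq, e->from[i], e->tgk_edge_id, TGK_POS_FROM};
    for (size_t i = 0; i < e->n_to && n < cap; i++)
        out[n++] = (struct tgk_node_edge_ref){e->logseq, e->to[i], e->tgk_edge_id, TGK_POS_TO};
    return n;
}

/* Orders refs by node_id, then edge id, then position, for use with qsort. */
static int tgk_ref_cmp(const void *a, const void *b)
{
    const struct tgk_node_edge_ref *x = a, *y = b;
    if (x->node_id != y->node_id) return x->node_id < y->node_id ? -1 : 1;
    if (x->tgk_edge_id != y->tgk_edge_id) return x->tgk_edge_id < y->tgk_edge_id ? -1 : 1;
    return (int)x->position - (int)y->position;
}
```

Since the projection is a pure function of the edges, dropping or rebuilding it cannot change query results, which is what "must not affect semantics" requires.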

8. Sharding and SIMD

  • Shard assignment is derived from routing keys and has no effect on index semantics
  • SIMD-optimized arrays may exist in optional acceleration sections
  • Acceleration sections must be deterministic and immutable
  • Acceleration sections must follow ASL-INDEX-ACCEL invariants
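Deterministic sharding only requires that the shard be a pure function of the routing key and the shard count. One way to get that, sketched below, is 64-bit FNV-1a over the key's bytes in a fixed (little-endian) order, so the result does not depend on host byte order. The choice of FNV-1a and the name tgk_shard_for are illustrative assumptions, not part of ASL-INDEX-ACCEL.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit FNV-1a over the key's 8 bytes, least significant byte first. */
static uint64_t fnv1a64_u64(uint64_t key)
{
    uint64_t h = 0xcbf29ce484222325ull;  /* FNV-1a 64-bit offset basis */
    for (int i = 0; i < 8; i++) {
        h ^= (key >> (8 * i)) & 0xFF;
        h *= 0x100000001b3ull;           /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical shard assignment: depends only on the routing key and nshards. */
static uint32_t tgk_shard_for(uint64_t routing_key, uint32_t nshards)
{
    assert(nshards > 0);
    return (uint32_t)(fnv1a64_u64(routing_key) % nshards);
}
```

Because lookups still confirm the canonical tgk_edge_id, any routing-key scheme of this kind keeps sharding observationally invisible.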

9. Snapshot Interaction

At snapshot S:

  • Segment visible if logseq_min ≤ S
  • Record visible if logseq ≤ S
  • Tombstones shadow earlier records

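Lookup algorithm:

  1. Discard segments and records that are not visible at snapshot S
  2. Evaluate routing keys against segment filters (advisory: a negative result skips the segment; a positive result proves nothing)
  3. Confirm an exact canonical key match on tgk_edge_id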
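The steps above, together with tombstone shadowing, can be sketched as a linear scan. Everything beyond the record layout is an assumption: tgk_segment_view is a hypothetical decoded segment, the filter is probed through an optional callback, "newest visible record wins" is used as the shadowing rule, and TGK_INDEX_TOMBSTONE = 0x0001 is a hypothetical bit value. A real reader would use the acceleration data instead of scanning every record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TGK_INDEX_TOMBSTONE ((uint16_t)0x0001)  /* hypothetical bit value */

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

/* Hypothetical decoded segment: visibility bound, records, optional filter probe. */
struct tgk_segment_view {
    uint64_t logseq_min;
    const struct tgk_index_record *records;
    size_t record_count;
    bool (*may_contain)(const void *filter, uint64_t routing_key); /* NULL: no filter */
    const void *filter;
};

/* Returns the newest record for edge_id visible at snapshot S, or NULL if
   there is none or the newest one is a tombstone. */
static const struct tgk_index_record *
tgk_lookup(const struct tgk_segment_view *segs, size_t nsegs,
           uint64_t edge_id, uint64_t routing_key, uint64_t S)
{
    assert(segs != NULL || nsegs == 0);
    const struct tgk_index_record *best = NULL;
    for (size_t s = 0; s < nsegs; s++) {
        const struct tgk_segment_view *seg = &segs[s];
        if (seg->logseq_min > S) continue;  /* step 1: segment not visible */
        if (seg->may_contain && !seg->may_contain(seg->filter, routing_key))
            continue;                       /* step 2: safe skip, no false negatives */
        for (size_t i = 0; i < seg->record_count; i++) {
            const struct tgk_index_record *r = &seg->records[i];
            if (r->logseq > S) continue;             /* step 1: record not visible */
            if (r->tgk_edge_id != edge_id) continue; /* step 3: canonical key */
            if (!best || r->logseq > best->logseq) best = r;
        }
    }
    if (best && (best->flags & TGK_INDEX_TOMBSTONE)) return NULL; /* shadowed */
    return best;
}
```

For example, a record for edge 100 at logseq 5 is returned at S = 10, but a tombstone for the same edge at logseq 12 hides it from S = 12 onward.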

10. Segment Footer

struct tgk_index_segment_footer {
    uint64_t checksum;        // covers header + filters + records
    uint64_t record_bytes;    // size of record area
    uint64_t filter_bytes;    // size of filter area
};
  • Ensures atomicity and completeness
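A reader can check the footer against the header before using a segment. This note does not name the checksum algorithm, so the sketch uses 64-bit FNV-1a purely as a stand-in. It also assumes that header, filters and records are contiguous from offset 0, so the checksummed span is the first record_area_offset + record_bytes bytes. tgk_checksum and tgk_footer_verify are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t record_bytes;
    uint64_t filter_bytes;
};

/* Stand-in checksum (64-bit FNV-1a); the real algorithm is not specified here. */
static uint64_t tgk_checksum(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 0x100000001b3ull; }
    return h;
}

/* seg/len: the whole segment; header_bytes: on-disk header size;
   record_area_offset/footer_offset: values taken from the segment header. */
static bool tgk_footer_verify(const uint8_t *seg, size_t len, size_t header_bytes,
                              uint64_t record_area_offset, uint64_t footer_offset,
                              const struct tgk_index_segment_footer *f)
{
    assert(seg != NULL && f != NULL);
    if (footer_offset > len || record_area_offset > footer_offset) return false;
    if (record_area_offset < header_bytes ||
        record_area_offset - header_bytes != f->filter_bytes) return false;
    if (f->record_bytes > footer_offset - record_area_offset) return false;
    /* Covered span: header + filters + records (excludes acceleration data). */
    size_t covered = (size_t)(record_area_offset + f->record_bytes);
    return tgk_checksum(seg, covered) == f->checksum;
}
```

A segment whose footer fails these checks would be treated as incomplete and ignored.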

11. Normative Invariants

  1. Edge identity = TGK-CORE edge ID
  2. Edge Type Key is not part of identity
  3. Filters are advisory only
  4. Sharding is observationally invisible
  5. Index records are immutable
  6. Snapshot visibility strictly follows logseq
  7. Determinism guaranteed per snapshot

12. Summary

ENC-TGK-INDEX:

  • References TGK-CORE edges without re-encoding structure
  • Supports snapshot-safe, deterministic lookup
  • Enables filter, shard, and SIMD acceleration
  • Preserves TGK-CORE semantics strictly

This design fully respects layering and prevents accidental semantic duplication, while allowing scalable, high-performance indexing.