# ENC-TGK-INDEX

### Encoding Specification for TGK Edge Index References

---

## 1. Purpose

ENC-TGK-INDEX defines the **on-disk encoding for Trace Graph Kernel (TGK) index records**, which serve as **references to TGK-CORE edges**.

* It **never encodes edge structure** (`from[]` / `to[]`)
* It supports **filters, sharding, and routing** per ASL-INDEX-ACCEL
* Snapshot and log-sequence semantics are maintained for deterministic recovery

---

## 2. Layering Principle

* **TGK-CORE / ENC-TGK-CORE**: authoritative edge structure (`from[] → to[]`)
* **TGK-INDEX**: defines canonical keys, routing keys, acceleration logic
* **ENC-TGK-INDEX**: stores references to TGK-CORE edges and acceleration metadata

**Normative statement:**

> ENC-TGK-INDEX encodes only references to TGK-CORE edges and MUST NOT re-encode or reinterpret edge structure.

---

## 3. Segment Layout

Segments are **immutable** and **snapshot-bound**:

```
+-----------------------------+
| Segment Header              |
+-----------------------------+
| Routing Filters             |
+-----------------------------+
| TGK Index Records           |
+-----------------------------+
| Optional Acceleration Data  |
+-----------------------------+
| Segment Footer              |
+-----------------------------+
```

* Segment atomicity is enforced
* Footer checksum guarantees completeness

---

## 4. Segment Header

```c
struct tgk_index_segment_header {
    uint32_t magic;              // 'TGKI'
    uint16_t version;            // encoding version
    uint16_t flags;              // segment flags
    uint64_t segment_id;         // unique per dataset
    uint64_t logseq_min;         // inclusive
    uint64_t logseq_max;         // inclusive
    uint64_t record_count;       // number of index records
    uint64_t record_area_offset; // bytes from segment start
    uint64_t footer_offset;      // bytes from segment start
};
```

* `logseq_min` / `logseq_max` bound the `logseq` values of the records in the segment (both inclusive) and enforce snapshot visibility
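
As an illustration of how a reader might use these fields, the following sketch checks a header for internal consistency before trusting its offsets. The byte packing of the magic value, `TGK_INDEX_VERSION_MAX`, the fixed 24-byte record size, and the function name `tgk_index_header_is_sane` are assumptions made for this example; the encoding itself does not define them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 'TGKI' packed as four ASCII bytes, most significant first. */
#define TGK_INDEX_MAGIC       0x54474B49u
/* Hypothetical: highest encoding version this reader understands. */
#define TGK_INDEX_VERSION_MAX 1u
/* Assumption: fixed-size records of 24 bytes (Section 6 layout, no padding). */
#define TGK_INDEX_RECORD_SIZE 24u

struct tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

/* 4 + 2 + 2 + 6 * 8 bytes; holds on common ABIs with 8-byte uint64_t alignment. */
static_assert(sizeof(struct tgk_index_segment_header) == 56,
              "unexpected padding in segment header");

/* Returns true if the header fields are mutually consistent for a segment
 * of segment_size bytes. Byte-order conversion is out of scope here. */
static bool tgk_index_header_is_sane(const struct tgk_index_segment_header *h,
                                     uint64_t segment_size)
{
    if (h->magic != TGK_INDEX_MAGIC)
        return false;
    if (h->version == 0 || h->version > TGK_INDEX_VERSION_MAX)
        return false;
    if (h->logseq_min > h->logseq_max)
        return false;
    /* The record area starts after the header and ends before the footer. */
    if (h->record_area_offset < sizeof *h)
        return false;
    if (h->footer_offset < h->record_area_offset)
        return false;
    if (h->footer_offset > segment_size)
        return false;
    /* All records must fit between the record area start and the footer;
     * optional acceleration data may occupy the remainder. */
    uint64_t area = h->footer_offset - h->record_area_offset;
    if (h->record_count > area / TGK_INDEX_RECORD_SIZE)
        return false;
    return true;
}
```
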

---

## 5. Routing Filters

Filters are **optional but recommended**:

```c
struct tgk_index_filter_header {
    uint16_t filter_type; // e.g., BLOOM, XOR, RIBBON
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;  // length of filter payload
};
```

* Filters operate on **routing keys**, not canonical edge IDs
* Routing keys may include:
  * Edge type key
  * Projection context
  * Direction or role
* False positives allowed; false negatives forbidden
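
To make the "no false negatives" requirement concrete, here is a minimal Bloom filter sketch over routing keys. Everything in it is illustrative: the filter size, the probe count, the `tgk_routing_key` packing (edge type key plus role), and the use of the splitmix64 mixing step as a hash are assumptions for this example, not part of the encoding. Because `tgk_bloom_add` sets exactly the bits that `tgk_bloom_may_contain` later tests, a key that was added can never be reported absent, while unrelated keys may still collide.

```c
#include <stdbool.h>
#include <stdint.h>

#define TGK_BLOOM_BITS   1024u  /* hypothetical filter size in bits */
#define TGK_BLOOM_HASHES 3u     /* hypothetical number of probes per key */

struct tgk_bloom {
    uint8_t bits[TGK_BLOOM_BITS / 8];
};

/* Mixing step from splitmix64, used here only as a convenient 64-bit hash. */
static uint64_t tgk_mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hypothetical routing key: edge type key in the upper bits, role in the low byte. */
static uint64_t tgk_routing_key(uint32_t edge_type_key, uint8_t role)
{
    return ((uint64_t)edge_type_key << 8) | role;
}

/* Bit index of the i-th probe for a key (double hashing). */
static uint32_t tgk_bloom_bit(uint64_t key, uint32_t i)
{
    uint64_t h1 = tgk_mix64(key);
    uint64_t h2 = tgk_mix64(key ^ 0x9e3779b97f4a7c15ULL) | 1u;
    return (uint32_t)((h1 + (uint64_t)i * h2) % TGK_BLOOM_BITS);
}

static void tgk_bloom_add(struct tgk_bloom *f, uint64_t key)
{
    for (uint32_t i = 0; i < TGK_BLOOM_HASHES; i++) {
        uint32_t bit = tgk_bloom_bit(key, i);
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* false means "definitely absent"; true means "possibly present". */
static bool tgk_bloom_may_contain(const struct tgk_bloom *f, uint64_t key)
{
    for (uint32_t i = 0; i < TGK_BLOOM_HASHES; i++) {
        uint32_t bit = tgk_bloom_bit(key, i);
        if ((f->bits[bit / 8] & (1u << (bit % 8))) == 0)
            return false;
    }
    return true;
}
```
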

---

## 6. TGK Index Record

Each record references a **single TGK-CORE edge**:

```c
struct tgk_index_record {
    uint64_t logseq;        // creation log sequence
    uint64_t tgk_edge_id;   // reference to ENC-TGK-CORE edge
    uint32_t edge_type_key; // optional classification
    uint8_t  has_edge_type; // 0 or 1
    uint8_t  role;          // optional: from / to / both
    uint16_t flags;         // tombstone, reserved
};
```

* `tgk_edge_id` is the **canonical key**
* No `from[]` / `to[]` fields exist here
* Edge identity is **solely TGK-CORE edge ID**

**Flags**:

| Flag                  | Meaning                 |
| --------------------- | ----------------------- |
| `TGK_INDEX_TOMBSTONE` | Shadows previous record |
| `TGK_INDEX_RESERVED`  | Future use              |
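
The sketch below shows one way a reader might interpret the `flags` and `has_edge_type` fields. The bit position chosen for `TGK_INDEX_TOMBSTONE` and both helper names are assumptions for illustration; the table above names the flags but does not assign values. The size assertion reflects the field layout shown in the struct on common ABIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit position is illustrative; the encoding only names the flag. */
#define TGK_INDEX_TOMBSTONE 0x0001u

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

/* 8 + 8 + 4 + 1 + 1 + 2 bytes, so no padding is expected. */
static_assert(sizeof(struct tgk_index_record) == 24,
              "unexpected padding in index record");

static bool tgk_index_record_is_tombstone(const struct tgk_index_record *r)
{
    return (r->flags & TGK_INDEX_TOMBSTONE) != 0;
}

/* Writes the edge type key to *out and returns true only when the record
 * carries one; edge_type_key is ignored when has_edge_type is 0. */
static bool tgk_index_record_edge_type(const struct tgk_index_record *r,
                                       uint32_t *out)
{
    if (r->has_edge_type != 1)
        return false;
    *out = r->edge_type_key;
    return true;
}
```
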

---

## 7. Optional Node-Projection Records (Acceleration Only)

For node-centric queries, optional records may map a node to the edges that reference it:

```c
struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position; // from or to
};
```

* **Derivable from TGK-CORE edges**
* Optional; purely for acceleration
* Must not affect semantics
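
Since these records are derivable from TGK-CORE edges, a builder could generate them mechanically. The sketch below assumes a hypothetical in-memory view of a decoded edge (`tgk_core_edge_view`) and an assumed `position` encoding of 0 for `from` and 1 for `to`; neither is defined by this specification. The record's `logseq` is passed in by the caller because the source of that value is not specified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of the position field. */
enum { TGK_POSITION_FROM = 0, TGK_POSITION_TO = 1 };

struct tgk_node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;
};

/* Hypothetical read-only view of an edge already decoded from TGK-CORE. */
struct tgk_core_edge_view {
    uint64_t        edge_id;
    const uint64_t *from;
    size_t          from_len;
    const uint64_t *to;
    size_t          to_len;
};

/* Emits one ref per (node, position) pair of the edge.
 * Returns the number of refs required; out is written only if cap suffices. */
static size_t tgk_project_edge(const struct tgk_core_edge_view *e,
                               uint64_t logseq,
                               struct tgk_node_edge_ref *out, size_t cap)
{
    size_t need = e->from_len + e->to_len;
    if (need > cap)
        return need;

    size_t n = 0;
    for (size_t i = 0; i < e->from_len; i++) {
        out[n++] = (struct tgk_node_edge_ref){
            logseq, e->from[i], e->edge_id, TGK_POSITION_FROM
        };
    }
    for (size_t i = 0; i < e->to_len; i++) {
        out[n++] = (struct tgk_node_edge_ref){
            logseq, e->to[i], e->edge_id, TGK_POSITION_TO
        };
    }
    return n;
}
```
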

---

## 8. Sharding and SIMD

* Shard assignment is derived from **routing keys**, **not index semantics**
* SIMD-optimized arrays may exist in optional acceleration sections
* Must be deterministic and immutable
* Must follow ASL-INDEX-ACCEL invariants
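
A deterministic shard assignment can be as simple as hashing the routing key and reducing it modulo the shard count. The sketch below is one possible scheme, not the one mandated by ASL-INDEX-ACCEL: the function names and the choice of the splitmix64 mixing step as the hash are assumptions. The only property it is meant to show is that the shard depends on the routing key alone, so the same key always lands in the same shard.

```c
#include <stdint.h>

/* Mixing step from splitmix64, used here only as a convenient 64-bit hash. */
static uint64_t tgk_mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Maps a routing key to a shard in [0, shard_count).
 * shard_count must be non-zero. No record contents are consulted. */
static uint32_t tgk_shard_for_routing_key(uint64_t routing_key,
                                          uint32_t shard_count)
{
    return (uint32_t)(tgk_mix64(routing_key) % shard_count);
}
```
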

---

## 9. Snapshot Interaction

At snapshot `S`:

* Segment visible if `logseq_min ≤ S`
* Record visible if `logseq ≤ S`
* Tombstones shadow earlier records

**Lookup Algorithm**:

1. Filter by snapshot
2. Evaluate routing/filter keys (advisory)
3. Confirm canonical key match with `tgk_edge_id`
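
The lookup steps above can be sketched as a linear scan over the records of one visible segment. This is a simplified illustration under stated assumptions: the tombstone bit position, the function names, and the tie-break rule (on equal `logseq`, the later record in the array wins) are not defined by the encoding. The routing filter of step 2 is omitted because it is advisory; in a fuller reader it would only decide whether the segment can be skipped before this scan runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TGK_INDEX_TOMBSTONE 0x0001u  /* assumed bit position */

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

/* Segment-level visibility: a segment is visible if logseq_min <= S. */
static bool tgk_segment_visible(uint64_t logseq_min, uint64_t snapshot)
{
    return logseq_min <= snapshot;
}

/* Returns the newest record for edge_id that is visible at the snapshot,
 * or NULL if there is none or the newest visible record is a tombstone. */
static const struct tgk_index_record *
tgk_index_lookup(const struct tgk_index_record *recs, size_t n,
                 uint64_t snapshot, uint64_t edge_id)
{
    const struct tgk_index_record *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct tgk_index_record *r = &recs[i];
        if (r->logseq > snapshot)        /* step 1: filter by snapshot */
            continue;
        if (r->tgk_edge_id != edge_id)   /* step 3: canonical key match */
            continue;
        if (best == NULL || r->logseq >= best->logseq)
            best = r;
    }

    /* A tombstone shadows every earlier record for the same edge. */
    if (best != NULL && (best->flags & TGK_INDEX_TOMBSTONE) != 0)
        return NULL;
    return best;
}
```
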

---

## 10. Segment Footer

```c
struct tgk_index_segment_footer {
    uint64_t checksum;     // covers header + filters + records
    uint64_t record_bytes; // size of record area
    uint64_t filter_bytes; // size of filter area
};
```

* Ensures atomicity and completeness
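
A reader can use the footer to reject truncated or partially written segments. The sketch below is illustrative only: the specification does not name a checksum algorithm, so 64-bit FNV-1a stands in for it, and the covered range is assumed to be the contiguous bytes from the segment start to the end of the record area (`record_area_offset + record_bytes`). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t record_bytes;
    uint64_t filter_bytes;
};

/* FNV-1a, 64-bit. Stand-in only: the encoding does not name an algorithm. */
static uint64_t tgk_fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Returns true if the footer lies inside the segment and its checksum
 * matches the bytes it is assumed to cover (header + filters + records). */
static bool tgk_index_segment_is_complete(const uint8_t *seg, size_t seg_len,
                                          size_t record_area_offset,
                                          size_t footer_offset)
{
    struct tgk_index_segment_footer f;

    /* The whole footer must fit inside the segment. */
    if (footer_offset > seg_len || seg_len - footer_offset < sizeof f)
        return false;
    memcpy(&f, seg + footer_offset, sizeof f);

    /* The record area must end at or before the footer. */
    if (record_area_offset > footer_offset)
        return false;
    if (f.record_bytes > footer_offset - record_area_offset)
        return false;

    size_t covered = record_area_offset + (size_t)f.record_bytes;
    return tgk_fnv1a64(seg, covered) == f.checksum;
}
```
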

---

## 11. Normative Invariants

1. **Edge identity = TGK-CORE edge ID**
2. Edge Type Key is **not part of identity**
3. Filters are **advisory only**
4. Sharding is observationally invisible
5. Index records are immutable
6. Snapshot visibility strictly follows `logseq`
7. Determinism guaranteed per snapshot

---

## 12. Summary

ENC-TGK-INDEX:

* References TGK-CORE edges without re-encoding structure
* Supports snapshot-safe, deterministic lookup
* Enables filter, shard, and SIMD acceleration
* Preserves TGK-CORE semantics strictly

This design **fully respects layering** and **prevents accidental semantic duplication**, while allowing scalable, high-performance indexing.