NOTE: Superseded by `tier1/tgk-1.md` (TGK/1). Kept for historical context.

This is a **draft of ENC-ASL-TGK-INDEX**. It merges ASL artifact indexes and TGK edge indexes while respecting the **separation of concerns** and **snapshot determinism**.
This design keeps **ENC-ASL-CORE** and **ENC-TGK-CORE** authoritative, and only merges **index references and acceleration structures**.

---
# ENC-ASL-TGK-INDEX
### Merged On-Disk Index for ASL Artifacts and TGK Edges
---
## 1. Purpose
ENC-ASL-TGK-INDEX defines a **unified on-disk index** that:
* References **ASL artifacts** (ENC-ASL-CORE)
* References **TGK edges** (ENC-TGK-CORE)
* Supports **routing keys, filters, sharding, SIMD acceleration** per ASL-INDEX-ACCEL
* Preserves **snapshot safety, log-sequence ordering, and immutability**
> Semantic data lives in the respective CORE layers; this index layer **only stores references**.
---
## 2. Layering Principle
| Layer | Responsibility |
| --------------------- | -------------------------------------------- |
| ENC-ASL-CORE | Artifact structure and type tags |
| ENC-TGK-CORE | Edge structure (`from[] → to[]`) |
| TGK-INDEX / ASL-INDEX | Canonical & routing keys, index semantics |
| ENC-ASL-TGK-INDEX | On-disk references and acceleration metadata |

**Invariant:** This index never re-encodes artifacts or edges.

---
## 3. Segment Layout
Segments are **append-only** and **snapshot-bound**:
```
+-----------------------------+
| Segment Header |
+-----------------------------+
| Routing Filters |
+-----------------------------+
| ASL Artifact Index Records |
+-----------------------------+
| TGK Edge Index Records |
+-----------------------------+
| Optional Acceleration Data |
+-----------------------------+
| Segment Footer |
+-----------------------------+
```
* Segments are atomic: a reader sees a segment either complete or not at all
* The footer checksum lets readers detect corrupted or truncated segments
---
## 4. Segment Header
```c
struct asl_tgk_index_segment_header {
    uint32_t magic;               // 'ATXI'
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t asl_record_count;
    uint64_t tgk_record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};
```
* `logseq_min` and `logseq_max` bound the log sequence numbers in the segment and drive snapshot visibility
* Separate counts for ASL and TGK entries
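
As an illustration only, a reader might run a few structural checks on a header before trusting the rest of the segment. The sketch below assumes that `'ATXI'` is stored as its four ASCII bytes packed big-endian into `magic` (this note does not fix the byte order) and that a segment always has `logseq_min <= logseq_max`. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the header above, repeated so the sketch is self-contained. */
struct asl_tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t asl_record_count;
    uint64_t tgk_record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

/* ASSUMPTION: 'A','T','X','I' packed big-endian; the byte order is not
 * specified by this note. */
#define ATXI_MAGIC_ASSUMED UINT32_C(0x41545849)

/* Hypothetical helper: cheap structural checks on a decoded header. */
static bool atxi_header_is_plausible(const struct asl_tgk_index_segment_header *h)
{
    assert(h != NULL);
    if (h->magic != ATXI_MAGIC_ASSUMED)
        return false;
    /* ASSUMPTION: the logseq range is never inverted. */
    if (h->logseq_min > h->logseq_max)
        return false;
    /* The record area must start at or before the footer. */
    if (h->record_area_offset > h->footer_offset)
        return false;
    return true;
}

/* Hypothetical helper: segment-level visibility rule from section 10. */
static bool atxi_segment_visible(const struct asl_tgk_index_segment_header *h,
                                 uint64_t snapshot)
{
    assert(h != NULL);
    return h->logseq_min <= snapshot;
}
```

These checks only reject obviously malformed headers; they do not replace the footer checksum.
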
---
## 5. Routing Filters
Filters may be **segmented by type**:
* **ASL filters**: artifact hash + type tag
* **TGK filters**: canonical edge ID + edge type key + optional role
```c
struct asl_tgk_filter_header {
    uint16_t filter_type;  // e.g., BLOOM, XOR
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;   // length of filter payload
};
```
* Filters are advisory; false positives allowed, false negatives forbidden
* Must be deterministic per snapshot
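
To make the two rules above concrete, here is a minimal Bloom filter sketch. It is not the format's filter encoding: the bit count, probe count, 64-bit key, and all `sketch_*` names are assumptions chosen for illustration. The hash is a fixed, unseeded mixer, so the same keys always set the same bits (determinism), and every inserted key is always reported as possibly present (no false negatives).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FILTER_BITS   1024u  /* hypothetical filter size in bits */
#define SKETCH_FILTER_PROBES 3u     /* hypothetical number of probes per key */

struct sketch_bloom {
    uint8_t bits[SKETCH_FILTER_BITS / 8];
};

/* Fixed 64-bit mixer (the splitmix64 finalizer). It has no seed and no
 * process-local state, so results are reproducible across runs. */
static uint64_t sketch_mix64(uint64_t x)
{
    x ^= x >> 30; x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27; x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* Bit position of the i-th probe for a key, using double hashing. */
static uint32_t sketch_probe(uint64_t key, uint32_t i)
{
    uint64_t h1 = sketch_mix64(key);
    uint64_t h2 = sketch_mix64(key ^ UINT64_C(0x9e3779b97f4a7c15)) | 1u;
    return (uint32_t)((h1 + (uint64_t)i * h2) % SKETCH_FILTER_BITS);
}

static void sketch_bloom_add(struct sketch_bloom *f, uint64_t key)
{
    assert(f != NULL);
    for (uint32_t i = 0; i < SKETCH_FILTER_PROBES; i++) {
        uint32_t bit = sketch_probe(key, i);
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns false only if the key was definitely never added. A true
 * result is advisory and must be confirmed against the index records. */
static bool sketch_bloom_may_contain(const struct sketch_bloom *f, uint64_t key)
{
    assert(f != NULL);
    for (uint32_t i = 0; i < SKETCH_FILTER_PROBES; i++) {
        uint32_t bit = sketch_probe(key, i);
        if ((f->bits[bit / 8] & (1u << (bit % 8))) == 0)
            return false;
    }
    return true;
}
```

A `false` result lets a reader skip the segment; a `true` result still requires canonical verification.
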
---
## 6. ASL Artifact Index Record
```c
struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;   // ENC-ASL-CORE reference
    uint32_t type_tag;      // optional
    uint8_t  has_type_tag;  // 0 or 1
    uint16_t flags;         // tombstone, reserved
};
```
* `artifact_id` = canonical identity
* No artifact payload here
---
## 7. TGK Edge Index Record
```c
struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;    // ENC-TGK-CORE reference
    uint32_t edge_type_key;  // optional
    uint8_t  has_edge_type;  // 0 or 1
    uint8_t  role;           // optional: from, to, or both
    uint16_t flags;          // tombstone, reserved
};
```
* `tgk_edge_id` = canonical TGK-CORE edge ID
* No node lists stored in index
---
## 8. Optional Node-Projection Records
For acceleration:
```c
struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;      // from/to node
    uint64_t tgk_edge_id;
    uint8_t  position;     // from or to
};
```
* Fully derivable from TGK-CORE edges
* Optional; purely for lookup speed
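
The claim that these records are fully derivable can be sketched as a projection from one decoded edge to one `node_edge_ref` per endpoint. The `sketch_edge_view` type, the `SKETCH_POS_*` values for `position`, and the function name are assumptions for illustration; ENC-TGK-CORE defines the real edge encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;
};

/* ASSUMPTION: numeric encoding of `position`; the note says only "from or to". */
enum { SKETCH_POS_FROM = 0, SKETCH_POS_TO = 1 };

/* Hypothetical in-memory view of a decoded ENC-TGK-CORE edge. */
struct sketch_edge_view {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    const uint64_t *from;
    size_t from_len;
    const uint64_t *to;
    size_t to_len;
};

/* Emits one ref per from[] node, then one per to[] node, in edge order.
 * Writes at most `cap` refs and returns the number of refs the edge needs,
 * so a caller can detect a too-small buffer. */
static size_t sketch_project_edge(const struct sketch_edge_view *e,
                                  struct node_edge_ref *out, size_t cap)
{
    assert(e != NULL);
    size_t n = 0;
    for (size_t i = 0; i < e->from_len; i++, n++) {
        if (n < cap)
            out[n] = (struct node_edge_ref){ e->logseq, e->from[i],
                                             e->tgk_edge_id, SKETCH_POS_FROM };
    }
    for (size_t i = 0; i < e->to_len; i++, n++) {
        if (n < cap)
            out[n] = (struct node_edge_ref){ e->logseq, e->to[i],
                                             e->tgk_edge_id, SKETCH_POS_TO };
    }
    return n;
}
```

Because the output depends only on the edge itself, the projection can be dropped and rebuilt without changing query results.
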
---
## 9. Sharding and SIMD
* Shard assignment is **routing key based** (ASL artifact or TGK edge)
* SIMD arrays may store precomputed routing keys for fast filter evaluation
* Must follow ASL-INDEX-ACCEL invariants: deterministic, immutable, snapshot-safe
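
One way to picture routing-key-based sharding is a pure function from key to shard, plus a contiguous key array that a compiler may auto-vectorize when scanned. The sketch below assumes 64-bit routing keys and a modulo mapping; neither is specified here, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed, unseeded mixer (the splitmix64 finalizer) so that shard
 * assignment is identical on every run and every machine. */
static uint64_t sketch_shard_mix64(uint64_t x)
{
    x ^= x >> 30; x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27; x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* ASSUMPTION: shard = mix(key) mod shard_count. The result depends only
 * on the routing key and the shard count. */
static uint32_t sketch_shard_for_key(uint64_t routing_key, uint32_t shard_count)
{
    assert(shard_count > 0);
    return (uint32_t)(sketch_shard_mix64(routing_key) % shard_count);
}

/* Scans a contiguous array of precomputed routing keys. The loop is
 * branch-free, which makes it a candidate for compiler vectorization. */
static size_t sketch_count_key_matches(const uint64_t *keys, size_t n,
                                       uint64_t wanted)
{
    assert(keys != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (keys[i] == wanted);
    return hits;
}
```
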
---
## 10. Snapshot Interaction
At snapshot `S`:
* Segment visible if `logseq_min ≤ S`
* ASL or TGK record visible if `logseq ≤ S`
* Tombstones shadow earlier records
* Filters may be consulted as an advisory pre-check; any hit must still pass canonical verification
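
The record-level rules can be sketched as a lookup that ignores records newer than `S`, picks the latest remaining record for an artifact, and treats a tombstone as "not present". The tombstone bit position and the function name are assumptions; this note only says that `flags` carries a tombstone marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint16_t flags;
};

/* ASSUMPTION: which bit of `flags` marks a tombstone is not defined here. */
#define SKETCH_FLAG_TOMBSTONE UINT16_C(0x0001)

/* Returns the record that makes `artifact_id` visible at `snapshot`, or
 * NULL if no record is visible or the latest visible one is a tombstone. */
static const struct asl_index_record *
sketch_lookup_at(const struct asl_index_record *recs, size_t n,
                 uint64_t artifact_id, uint64_t snapshot)
{
    assert(recs != NULL || n == 0);
    const struct asl_index_record *latest = NULL;
    for (size_t i = 0; i < n; i++) {
        const struct asl_index_record *r = &recs[i];
        if (r->artifact_id != artifact_id)
            continue;
        if (r->logseq > snapshot)   /* written after the snapshot: invisible */
            continue;
        if (latest == NULL || r->logseq >= latest->logseq)
            latest = r;
    }
    if (latest != NULL && (latest->flags & SKETCH_FLAG_TOMBSTONE))
        return NULL;                /* tombstone shadows earlier records */
    return latest;
}
```

A linear scan keeps the sketch short; a real reader would consult filters and sorted records first. The same rule applies to `tgk_index_record` with `tgk_edge_id` in place of `artifact_id`.
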
---
## 11. Segment Footer
```c
struct asl_tgk_index_segment_footer {
    uint64_t checksum;          // covers header, filters, records
    uint64_t asl_record_bytes;
    uint64_t tgk_record_bytes;
    uint64_t filter_bytes;
};
```
* Ensures atomicity and completeness
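
A reader-side completeness check might look like the sketch below. The checksum algorithm is not specified in this note, so FNV-1a (64-bit) is used purely as a stand-in, and the function names are hypothetical. `covered` is assumed to point at the bytes the checksum covers (header, filters, records).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit. A STAND-IN only: the real checksum algorithm is not
 * defined by this note. */
static uint64_t sketch_fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT64_C(0x100000001b3);            /* FNV prime */
    }
    return h;
}

/* Recomputes the checksum over the covered bytes and compares it with
 * the value stored in the footer. A mismatch means the segment is
 * corrupted or was not completely written, and it should be ignored. */
static bool sketch_segment_is_complete(const uint8_t *covered, size_t covered_len,
                                       uint64_t stored_checksum)
{
    assert(covered != NULL || covered_len == 0);
    return sketch_fnv1a64(covered, covered_len) == stored_checksum;
}
```
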
---
## 12. Normative Invariants
1. **ASL artifact identity = ENC-ASL-CORE artifact ID**
2. **TGK edge identity = ENC-TGK-CORE edge ID**
3. Edge type key and artifact type tag **do not affect canonical identity**
4. Filters are advisory only; no false negatives
5. Sharding is observationally invisible
6. Index records are immutable once written
7. Snapshot visibility strictly follows `logseq`
8. Determinism guaranteed per snapshot
---
## 13. Summary
ENC-ASL-TGK-INDEX merges ASL artifacts and TGK edges into a **single, snapshot-safe, acceleration-friendly index layer**:
* Keeps core semantics authoritative
* Enables high-performance lookups using routing, sharding, SIMD, and filters
* Preserves immutability and determinism
* Fully compatible with ASL-INDEX-ACCEL principles

This design is intended to scale to billions of references while avoiding semantic collisions between the ASL and TGK layers.

---
A possible next step is drafting a **unified query execution model** over this merged index, connecting **artifact lookups** and **TGK graph traversals** in a snapshot-safe, deterministic way.