amduat-api/notes/ENC-ASL-TGK-INDEX.md
2026-01-17 07:32:14 +01:00

NOTE: Superseded by `tier1/tgk-1.md` (TGK/1). Kept for historical context.

This is a **draft of ENC-ASL-TGK-INDEX** that merges ASL artifact indexes and TGK edge indexes while respecting the **separation of concerns** and **snapshot determinism**.

This design keeps **ENC-ASL-CORE** and **ENC-TGK-CORE** authoritative, and only merges **index references and acceleration structures**.

---
# ENC-ASL-TGK-INDEX
### Merged On-Disk Index for ASL Artifacts and TGK Edges
---
## 1. Purpose
ENC-ASL-TGK-INDEX defines a **unified on-disk index** that:
* References **ASL artifacts** (ENC-ASL-CORE)
* References **TGK edges** (ENC-TGK-CORE)
* Supports **routing keys, filters, sharding, SIMD acceleration** per ASL-INDEX-ACCEL
* Preserves **snapshot safety, log-sequence ordering, and immutability**
> Semantic data lives in the respective CORE layers; this index layer **only stores references**.
---
## 2. Layering Principle
| Layer | Responsibility |
| --------------------- | -------------------------------------------- |
| ENC-ASL-CORE | Artifact structure and type tags |
| ENC-TGK-CORE | Edge structure (`from[] → to[]`) |
| TGK-INDEX / ASL-INDEX | Canonical & routing keys, index semantics |
| ENC-ASL-TGK-INDEX | On-disk references and acceleration metadata |

**Invariant:** This index never re-encodes artifacts or edges.

---
## 3. Segment Layout
Segments are **append-only** and **snapshot-bound**:
```
+-----------------------------+
| Segment Header |
+-----------------------------+
| Routing Filters |
+-----------------------------+
| ASL Artifact Index Records |
+-----------------------------+
| TGK Edge Index Records |
+-----------------------------+
| Optional Acceleration Data |
+-----------------------------+
| Segment Footer |
+-----------------------------+
```
* Segment atomicity enforced
* Footer checksum guarantees integrity
---
## 4. Segment Header
```c
struct asl_tgk_index_segment_header {
    uint32_t magic;              // 'ATXI'
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t asl_record_count;
    uint64_t tgk_record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};
```
* `logseq_*` enforce snapshot visibility
* Separate counts for ASL and TGK entries
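
As a minimal sketch of how a reader might use these fields, the helper below rejects headers whose fields contradict each other and applies the `logseq_min ≤ S` segment rule from the Snapshot Interaction section. The function names, the numeric value chosen for the `'ATXI'` magic (ASCII bytes packed high byte first), and the specific consistency checks are assumptions for illustration; the spec does not define them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the header struct above, repeated so the sketch compiles alone. */
struct asl_tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t asl_record_count;
    uint64_t tgk_record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

/* Assumption: the magic is 'A','T','X','I' packed high byte first. */
#define ATXI_MAGIC 0x41545849u

/* Hypothetical helper: true if the header fields are mutually consistent. */
static bool atxi_header_is_sane(const struct asl_tgk_index_segment_header *h)
{
    assert(h != NULL);
    if (h->magic != ATXI_MAGIC)
        return false;
    if (h->logseq_min > h->logseq_max)
        return false;                     /* empty or inverted logseq range */
    if (h->footer_offset < h->record_area_offset)
        return false;                     /* footer must follow the records */
    return true;
}

/* Hypothetical helper: a segment can only contribute to snapshot S
 * if its smallest logseq is not beyond S. */
static bool atxi_segment_visible(const struct asl_tgk_index_segment_header *h,
                                 uint64_t snapshot)
{
    assert(h != NULL);
    return h->logseq_min <= snapshot;
}
```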
---
## 5. Routing Filters
Filters may be **segmented by type**:
* **ASL filters**: artifact hash + type tag
* **TGK filters**: canonical edge ID + edge type key + optional role
```c
struct asl_tgk_filter_header {
    uint16_t filter_type;   // e.g., BLOOM, XOR
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;    // length of filter payload
};
```
* Filters are advisory; false positives allowed, false negatives forbidden
* Must be deterministic per snapshot
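
The advisory contract (false positives allowed, false negatives forbidden) can be illustrated with a small Bloom filter. This is a sketch only: the spec does not name a hash function or a bit count, so the splitmix64-style mixer, the choice of four probes, and the treatment of a routing key as a single `uint64_t` are all assumptions. Because the hash has no seed or runtime state, the same keys always produce the same bits, which is what per-snapshot determinism requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_PROBES 4u   /* hypothetical number of bits set per key */

/* Stand-in 64-bit mixer (splitmix64 finalizer); not mandated by the spec. */
static uint64_t mix64(uint64_t x)
{
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Set the probe bits for one routing key. */
static void bloom_add(uint8_t *bits, size_t nbits, uint64_t key)
{
    assert(bits != NULL && nbits > 0);
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(key ^ 0x9e3779b97f4a7c15ULL) | 1u;  /* odd step */
    for (unsigned i = 0; i < BLOOM_PROBES; i++) {
        uint64_t bit = (h1 + i * h2) % nbits;
        bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* false: the key is definitely absent, so the segment can be skipped.
 * true:  the key may be present and must be verified against the records. */
static bool bloom_may_contain(const uint8_t *bits, size_t nbits, uint64_t key)
{
    assert(bits != NULL && nbits > 0);
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(key ^ 0x9e3779b97f4a7c15ULL) | 1u;
    for (unsigned i = 0; i < BLOOM_PROBES; i++) {
        uint64_t bit = (h1 + i * h2) % nbits;
        if ((bits[bit / 8] & (1u << (bit % 8))) == 0)
            return false;
    }
    return true;
}
```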
---
## 6. ASL Artifact Index Record
```c
struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;   // ENC-ASL-CORE reference
    uint32_t type_tag;      // optional
    uint8_t  has_type_tag;  // 0 or 1
    uint16_t flags;         // tombstone, reserved
};
```
* `artifact_id` = canonical identity
* No artifact payload here
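
Since this is an on-disk format, a writer cannot simply dump the in-memory struct: compilers typically insert a padding byte between `has_type_tag` and `flags`, and byte order varies by platform. The sketch below shows one way to encode the record field by field. The 24-byte size, little-endian byte order, and zeroed padding byte are assumptions made for illustration; the spec above does not fix any of them, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint16_t flags;
};

/* Assumed wire size: 8 + 8 + 4 + 1 + 1 (padding) + 2 bytes. */
enum { ASL_INDEX_RECORD_WIRE_SIZE = 24 };

/* Write the low nbytes of v, least significant byte first. */
static void put_le(uint8_t *dst, uint64_t v, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Read nbytes, least significant byte first. */
static uint64_t get_le(const uint8_t *src, unsigned nbytes)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < nbytes; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

static void asl_index_record_encode(const struct asl_index_record *r,
                                    uint8_t out[ASL_INDEX_RECORD_WIRE_SIZE])
{
    assert(r != NULL && out != NULL);
    put_le(out + 0, r->logseq, 8);
    put_le(out + 8, r->artifact_id, 8);
    /* An absent type tag is written as zero so equal records encode equally. */
    put_le(out + 16, r->has_type_tag ? r->type_tag : 0u, 4);
    out[20] = r->has_type_tag ? 1 : 0;
    out[21] = 0;                          /* explicit padding byte */
    put_le(out + 22, r->flags, 2);
}

static struct asl_index_record
asl_index_record_decode(const uint8_t in[ASL_INDEX_RECORD_WIRE_SIZE])
{
    assert(in != NULL);
    struct asl_index_record r;
    r.logseq       = get_le(in + 0, 8);
    r.artifact_id  = get_le(in + 8, 8);
    r.type_tag     = (uint32_t)get_le(in + 16, 4);
    r.has_type_tag = in[20];
    r.flags        = (uint16_t)get_le(in + 22, 2);
    return r;
}
```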
---
## 7. TGK Edge Index Record
```c
struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;    // ENC-TGK-CORE reference
    uint32_t edge_type_key;  // optional
    uint8_t  has_edge_type;
    uint8_t  role;           // optional from/to/both
    uint16_t flags;          // tombstone, reserved
};
```
* `tgk_edge_id` = canonical TGK-CORE edge ID
* No node lists stored in index
---
## 8. Optional Node-Projection Records
For acceleration:
```c
struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;       // from/to node
    uint64_t tgk_edge_id;
    uint8_t  position;      // from or to
};
```
* Fully derivable from TGK-CORE edges
* Optional; purely for lookup speed
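
To make "fully derivable" concrete, the sketch below expands one edge's `from[]` and `to[]` node lists into projection records. How an edge's node lists are obtained from ENC-TGK-CORE is outside this document, so the plain-array parameters, the `POS_FROM`/`POS_TO` values, and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;
};

/* Hypothetical encoding of the position field. */
enum { POS_FROM = 0, POS_TO = 1 };

/* Emit one projection record per node of the edge, from[] first, then to[].
 * Writes at most out_cap records and returns the number written. */
static size_t project_edge(uint64_t logseq, uint64_t edge_id,
                           const uint64_t *from, size_t n_from,
                           const uint64_t *to, size_t n_to,
                           struct node_edge_ref *out, size_t out_cap)
{
    assert(out != NULL || out_cap == 0);
    size_t n = 0;
    for (size_t i = 0; i < n_from && n < out_cap; i++)
        out[n++] = (struct node_edge_ref){ logseq, from[i], edge_id, POS_FROM };
    for (size_t i = 0; i < n_to && n < out_cap; i++)
        out[n++] = (struct node_edge_ref){ logseq, to[i], edge_id, POS_TO };
    return n;
}
```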
---
## 9. Sharding and SIMD
* Shard assignment is **routing key based** (ASL artifact or TGK edge)
* SIMD arrays may store precomputed routing keys for fast filter evaluation
* Must follow ASL-INDEX-ACCEL invariants: deterministic, immutable, snapshot-safe
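
A minimal sketch of routing-key-based shard assignment follows. It assumes the routing key has already been reduced to a `uint64_t` and uses a splitmix64-style mixer as a stand-in hash; neither choice comes from ASL-INDEX-ACCEL, and the function names are hypothetical. The point is only that the shard is a pure function of the key and the shard count, so the same record always lands in the same shard.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in 64-bit mixer (splitmix64 finalizer); not mandated by the spec. */
static uint64_t mix64(uint64_t x)
{
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Map a routing key to a shard in [0, shard_count).
 * No seed, clock, or global state is involved, so the result is
 * deterministic for a given key and shard count. */
static uint32_t shard_for_key(uint64_t routing_key, uint32_t shard_count)
{
    assert(shard_count > 0);
    return (uint32_t)(mix64(routing_key) % shard_count);
}
```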
---
## 10. Snapshot Interaction
At snapshot `S`:
* Segment visible if `logseq_min ≤ S`
* ASL or TGK record visible if `logseq ≤ S`
* Tombstones shadow earlier records
* Filters may be used as advisory before canonical verification
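
The record-level rules above can be sketched as a lookup that ignores records newer than the snapshot and lets the most recent remaining record win, so a tombstone hides everything written before it. The tombstone bit value and the function name are hypothetical; the spec only says that `flags` carries a tombstone marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint16_t flags;
};

/* Hypothetical bit assignment for the tombstone flag. */
#define ASL_FLAG_TOMBSTONE 0x0001u

/* Return the record for artifact_id that is live at the given snapshot,
 * or NULL if there is none or the latest visible record is a tombstone. */
static const struct asl_index_record *
asl_lookup_at(const struct asl_index_record *recs, size_t n,
              uint64_t artifact_id, uint64_t snapshot)
{
    assert(recs != NULL || n == 0);
    const struct asl_index_record *latest = NULL;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].artifact_id != artifact_id)
            continue;
        if (recs[i].logseq > snapshot)
            continue;                      /* written after the snapshot */
        if (latest == NULL || recs[i].logseq >= latest->logseq)
            latest = &recs[i];
    }
    if (latest != NULL && (latest->flags & ASL_FLAG_TOMBSTONE))
        return NULL;                       /* tombstone shadows older records */
    return latest;
}
```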
---
## 11. Segment Footer
```c
struct asl_tgk_index_segment_footer {
    uint64_t checksum;          // covers header, filters, records
    uint64_t asl_record_bytes;
    uint64_t tgk_record_bytes;
    uint64_t filter_bytes;
};
```
* Ensures atomicity and completeness
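
A reader could treat a segment as complete only when the footer agrees with the bytes that precede it. The sketch below assumes the covered region (header, filters, records) is available as one contiguous buffer and uses 64-bit FNV-1a purely as a placeholder; the spec does not name the checksum algorithm, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asl_tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t asl_record_bytes;
    uint64_t tgk_record_bytes;
    uint64_t filter_bytes;
};

/* 64-bit FNV-1a, used here only as a placeholder checksum. */
static uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;    /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;             /* FNV prime */
    }
    return h;
}

/* True if the footer's byte counts fit inside the covered region and
 * the stored checksum matches the covered bytes. */
static bool atxi_footer_matches(const struct asl_tgk_index_segment_footer *f,
                                const uint8_t *covered, size_t covered_len)
{
    assert(f != NULL && covered != NULL);
    uint64_t remaining = covered_len;
    /* Subtract step by step so the size check cannot overflow. */
    if (f->filter_bytes > remaining) return false;
    remaining -= f->filter_bytes;
    if (f->asl_record_bytes > remaining) return false;
    remaining -= f->asl_record_bytes;
    if (f->tgk_record_bytes > remaining) return false;
    return f->checksum == fnv1a64(covered, covered_len);
}
```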
---
## 12. Normative Invariants
1. **ASL artifact identity = ENC-ASL-CORE artifact ID**
2. **TGK edge identity = ENC-TGK-CORE edge ID**
3. Edge type tag and artifact type tag **do not affect canonical identity**
4. Filters are advisory only; no false negatives
5. Sharding is observationally invisible
6. Index records are immutable once written
7. Snapshot visibility strictly follows `logseq`
8. Determinism guaranteed per snapshot
---
## 13. Summary
ENC-ASL-TGK-INDEX merges ASL artifacts and TGK edges into a **single, snapshot-safe, acceleration-friendly index layer**:
* Keeps core semantics authoritative
* Enables high-performance lookups using routing, sharding, SIMD, and filters
* Preserves immutability and determinism
* Fully compatible with ASL-INDEX-ACCEL principles

This design is intended to scale to billions of references while avoiding semantic collisions between the ASL and TGK layers.

---
A possible next step is to draft a **unified query execution model** over this merged index, connecting **artifact lookups** and **TGK graph traversals** in a snapshot-safe, deterministic way.