Absolutely. Here’s a **draft for ENC-ASL-TGK-INDEX**, carefully merging ASL artifact indexes and TGK edge indexes while respecting the **separation of concerns** and **snapshot determinism**.

This design keeps **ENC-ASL-CORE** and **ENC-TGK-CORE** authoritative, and only merges **index references and acceleration structures**.

---

# ENC-ASL-TGK-INDEX

### Merged On-Disk Index for ASL Artifacts and TGK Edges

---

## 1. Purpose

ENC-ASL-TGK-INDEX defines a **unified on-disk index** that:

* References **ASL artifacts** (ENC-ASL-CORE)
* References **TGK edges** (ENC-TGK-CORE)
* Supports **routing keys, filters, sharding, and SIMD acceleration** per ASL-INDEX-ACCEL
* Preserves **snapshot safety, log-sequence ordering, and immutability**

> Semantic data lives in the respective CORE layers; this index layer **only stores references**.

---

## 2. Layering Principle

| Layer                 | Responsibility                               |
| --------------------- | -------------------------------------------- |
| ENC-ASL-CORE          | Artifact structure and type tags             |
| ENC-TGK-CORE          | Edge structure (`from[] → to[]`)             |
| TGK-INDEX / ASL-INDEX | Canonical & routing keys, index semantics    |
| ENC-ASL-TGK-INDEX     | On-disk references and acceleration metadata |

**Invariant:** This index never re-encodes artifacts or edges.

---

## 3. Segment Layout

Segments are **append-only** and **snapshot-bound**:

```
+-----------------------------+
| Segment Header              |
+-----------------------------+
| Routing Filters             |
+-----------------------------+
| ASL Artifact Index Records  |
+-----------------------------+
| TGK Edge Index Records      |
+-----------------------------+
| Optional Acceleration Data  |
+-----------------------------+
| Segment Footer              |
+-----------------------------+
```

* Segment atomicity is enforced
* The footer checksum guarantees integrity

---

## 4. Segment Header

```c
#include <stdint.h>  // fixed-width integer types used by every struct in this draft

struct asl_tgk_index_segment_header {
    uint32_t magic;               // 'ATXI'
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;          // lower logseq bound of this segment
    uint64_t logseq_max;          // upper logseq bound of this segment
    uint64_t asl_record_count;    // number of ASL artifact index records
    uint64_t tgk_record_count;    // number of TGK edge index records
    uint64_t record_area_offset;
    uint64_t footer_offset;
};
```

* `logseq_min` and `logseq_max` bound the log sequence range of the segment and are used to enforce snapshot visibility
* ASL and TGK entries are counted separately
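As a rough illustration of how a reader might sanity-check this header before trusting the rest of a segment, here is a minimal sketch in C. The `ATXI_MAGIC` value (the ASCII bytes read as a big-endian 32-bit number), the `header_is_plausible` helper, its `file_size` parameter, and the assumption that the header sits at file offset 0 are all illustrative choices, not part of the draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct asl_tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t asl_record_count;
    uint64_t tgk_record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

/* 4 + 2 + 2 + 7 * 8 bytes: no implicit padding is expected. */
static_assert(sizeof(struct asl_tgk_index_segment_header) == 64,
              "unexpected header padding");

/* Assumption: 'ATXI' is the ASCII bytes A, T, X, I read as a big-endian
 * 32-bit value. The draft does not state a byte order. */
#define ATXI_MAGIC UINT32_C(0x41545849)

/* Hypothetical check; assumes the header is stored at file offset 0. */
bool header_is_plausible(const struct asl_tgk_index_segment_header *h,
                         uint64_t file_size)
{
    if (h->magic != ATXI_MAGIC) return false;
    if (h->logseq_min > h->logseq_max) return false;
    /* Filters and records come after the header. */
    if (h->record_area_offset < sizeof *h) return false;
    /* The footer comes after the record area and inside the file. */
    if (h->footer_offset < h->record_area_offset) return false;
    if (h->footer_offset > file_size) return false;
    return true;
}
```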

---

## 5. Routing Filters

Filters may be **segmented by type**:

* **ASL filters**: artifact hash + type tag
* **TGK filters**: canonical edge ID + edge type key + optional role

```c
struct asl_tgk_filter_header {
    uint16_t filter_type;  // e.g., BLOOM, XOR
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;   // length of filter payload
};
```

* Filters are advisory: false positives are allowed, false negatives are forbidden
* Filter contents must be deterministic per snapshot
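To make the advisory contract concrete, here is a minimal Bloom filter sketch in C. Everything in it is an assumption for illustration: the `bloom` struct, the `FILTER_BITS` and `FILTER_HASHES` constants, the seedless `mix64` helper (a SplitMix64-style mixer), and the idea that each record has already been reduced to a 64-bit routing key. Because the hashing has no seed or runtime state, the same set of keys always produces the same bits, and a key that was added can never be reported as absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FILTER_BITS   1024u  /* hypothetical filter size in bits */
#define FILTER_HASHES 3u     /* hypothetical number of probes per key */

struct bloom {
    uint8_t bits[FILTER_BITS / 8];
};

/* SplitMix64-style mixer: fixed constants, no seed, so it is deterministic. */
uint64_t mix64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Set FILTER_HASHES bits derived from the key (double hashing). */
void bloom_add(struct bloom *f, uint64_t routing_key)
{
    uint64_t h1 = mix64(routing_key);
    uint64_t h2 = mix64(h1) | 1u;
    for (uint32_t i = 0; i < FILTER_HASHES; i++) {
        uint64_t bit = (h1 + i * h2) % FILTER_BITS;
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* false: the key is definitely absent. true: the key may be present and
 * must still be verified against the canonical records. */
bool bloom_may_contain(const struct bloom *f, uint64_t routing_key)
{
    uint64_t h1 = mix64(routing_key);
    uint64_t h2 = mix64(h1) | 1u;
    for (uint32_t i = 0; i < FILTER_HASHES; i++) {
        uint64_t bit = (h1 + i * h2) % FILTER_BITS;
        if (!(f->bits[bit / 8] & (1u << (bit % 8))))
            return false;
    }
    return true;
}
```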

---

## 6. ASL Artifact Index Record

```c
struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;   // ENC-ASL-CORE reference
    uint32_t type_tag;      // optional; meaningful only if has_type_tag == 1
    uint8_t  has_type_tag;  // 0 or 1
    uint16_t flags;         // tombstone, reserved
};
```

* `artifact_id` is the canonical identity
* No artifact payload is stored here

---

## 7. TGK Edge Index Record

```c
struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;    // ENC-TGK-CORE reference
    uint32_t edge_type_key;  // optional; meaningful only if has_edge_type == 1
    uint8_t  has_edge_type;  // 0 or 1
    uint8_t  role;           // optional from/to/both
    uint16_t flags;          // tombstone, reserved
};
```

* `tgk_edge_id` is the canonical TGK-CORE edge ID
* No node lists are stored in the index
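Since both record types are meant to live on disk, it can help to pin their in-memory layout at compile time. The sketch below assumes natural alignment as found on common ABIs such as x86-64 and AArch64; the draft itself does not state packing or byte-order rules. Under that assumption both records are 24 bytes, and the ASL record carries one implicit padding byte at offset 21 that a writer would need to zero for byte-identical output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint16_t flags;
};

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

/* ASL record: has_type_tag at 20, implicit padding at 21, flags at 22. */
static_assert(offsetof(struct asl_index_record, has_type_tag) == 20,
              "has_type_tag moved");
static_assert(offsetof(struct asl_index_record, flags) == 22,
              "expected one padding byte before flags");
static_assert(sizeof(struct asl_index_record) == 24,
              "unexpected ASL record size");

/* TGK record: role fills the byte that is padding in the ASL record. */
static_assert(offsetof(struct tgk_index_record, role) == 21, "role moved");
static_assert(offsetof(struct tgk_index_record, flags) == 22, "flags moved");
static_assert(sizeof(struct tgk_index_record) == 24,
              "unexpected TGK record size");
```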

---

## 8. Optional Node-Projection Records

For acceleration, a segment may also carry node-to-edge projection records:

```c
struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;      // from/to node
    uint64_t tgk_edge_id;
    uint8_t  position;     // from or to
};
```

* Fully derivable from TGK-CORE edges
* Optional; purely for lookup speed
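The claim that these records are fully derivable can be sketched as a small projection routine in C. The `tgk_edge_view` struct is a hypothetical decoded view of a TGK-CORE edge, and the `POSITION_FROM` / `POSITION_TO` values are an assumed encoding for the `position` field; neither is defined by the draft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;
};

enum { POSITION_FROM = 0, POSITION_TO = 1 };  /* hypothetical encoding */

/* Hypothetical decoded view of one TGK-CORE edge (from[] -> to[]). */
struct tgk_edge_view {
    uint64_t        tgk_edge_id;
    uint64_t        logseq;
    const uint64_t *from;
    size_t          from_len;
    const uint64_t *to;
    size_t          to_len;
};

/* Emits one ref per from/to node, from-nodes first. Writes at most
 * out_cap refs and returns how many refs the edge needs in total. */
size_t project_edge(const struct tgk_edge_view *e,
                    struct node_edge_ref *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < e->from_len; i++, n++) {
        if (n < out_cap)
            out[n] = (struct node_edge_ref){ .logseq = e->logseq,
                                             .node_id = e->from[i],
                                             .tgk_edge_id = e->tgk_edge_id,
                                             .position = POSITION_FROM };
    }
    for (size_t i = 0; i < e->to_len; i++, n++) {
        if (n < out_cap)
            out[n] = (struct node_edge_ref){ .logseq = e->logseq,
                                             .node_id = e->to[i],
                                             .tgk_edge_id = e->tgk_edge_id,
                                             .position = POSITION_TO };
    }
    return n;
}
```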

---

## 9. Sharding and SIMD

* Shard assignment is **routing-key based** (ASL artifact or TGK edge)
* SIMD arrays may store precomputed routing keys for fast filter evaluation
* Both must follow the ASL-INDEX-ACCEL invariants: deterministic, immutable, snapshot-safe
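A minimal sketch of both ideas in C, under stated assumptions: routing keys are taken to be precomputed 64-bit values, `mix64` is a seedless SplitMix64-style mixer chosen for illustration, and `shard_for_key` and `count_key_matches` are hypothetical helper names. The scan is written as a plain loop over a dense key array, a shape that compilers can often auto-vectorize; no explicit SIMD intrinsics are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64-style mixer: fixed constants, no seed, so it is deterministic. */
uint64_t mix64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* The shard depends only on the routing key and the shard count, so every
 * reader and writer derives the same placement. */
uint32_t shard_for_key(uint64_t routing_key, uint32_t shard_count)
{
    assert(shard_count > 0);
    return (uint32_t)(mix64(routing_key) % shard_count);
}

/* Counts entries equal to needle in a dense array of precomputed keys. */
size_t count_key_matches(const uint64_t *keys, size_t n, uint64_t needle)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (keys[i] == needle);
    return hits;
}
```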

---

## 10. Snapshot Interaction

At snapshot `S`:

* A segment is visible if `logseq_min ≤ S`
* An ASL or TGK record is visible if `logseq ≤ S`
* Tombstones shadow earlier records
* Filters may be consulted as an advisory pre-check before canonical verification
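The visibility rules above can be sketched as a small resolver in C. The `ASL_FLAG_TOMBSTONE` bit value and the `artifact_visible_at` helper are assumptions made for illustration; the draft names a tombstone flag but does not assign it a bit. The sketch picks the newest record at or below the snapshot and treats the artifact as live only if that record is not a tombstone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint16_t flags;
};

#define ASL_FLAG_TOMBSTONE 0x0001u  /* hypothetical bit assignment */

/* Returns true if artifact_id is live at the given snapshot.
 * Records may appear in any order. */
bool artifact_visible_at(const struct asl_index_record *recs, size_t n,
                         uint64_t artifact_id, uint64_t snapshot)
{
    bool     found = false;
    bool     tombstoned = false;
    uint64_t newest = 0;

    for (size_t i = 0; i < n; i++) {
        const struct asl_index_record *r = &recs[i];
        if (r->artifact_id != artifact_id) continue;
        if (r->logseq > snapshot) continue;      /* not yet visible at S */
        if (!found || r->logseq >= newest) {     /* later record shadows earlier */
            found = true;
            newest = r->logseq;
            tombstoned = (r->flags & ASL_FLAG_TOMBSTONE) != 0;
        }
    }
    return found && !tombstoned;
}
```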

---

## 11. Segment Footer

```c
struct asl_tgk_index_segment_footer {
    uint64_t checksum;          // covers header, filters, records
    uint64_t asl_record_bytes;
    uint64_t tgk_record_bytes;
    uint64_t filter_bytes;
};
```

* The footer ensures segment atomicity and completeness
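The draft does not name a checksum algorithm, so the sketch below uses 64-bit FNV-1a purely as a stand-in to show the shape of the check. The `fnv1a64` and `segment_checksum_ok` helpers, and the assumption that the checksum covers every byte that precedes the footer, are illustrative only; a real format would more likely pick a CRC or a cryptographic hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asl_tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t asl_record_bytes;
    uint64_t tgk_record_bytes;
    uint64_t filter_bytes;
};

/* 64-bit FNV-1a, used here only as a placeholder checksum. */
uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Assumption: `covered` holds the header, filters, and records, i.e. all
 * bytes of the segment before the footer. */
bool segment_checksum_ok(const uint8_t *covered, size_t covered_len,
                         const struct asl_tgk_index_segment_footer *footer)
{
    return fnv1a64(covered, covered_len) == footer->checksum;
}
```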

---

## 12. Normative Invariants

1. **ASL artifact identity = ENC-ASL-CORE artifact ID**
2. **TGK edge identity = ENC-TGK-CORE edge ID**
3. Edge type tag and artifact type tag **do not affect canonical identity**
4. Filters are advisory only; no false negatives
5. Sharding is observationally invisible
6. Index records are immutable once written
7. Snapshot visibility strictly follows `logseq`
8. Determinism guaranteed per snapshot

---

## 13. Summary

ENC-ASL-TGK-INDEX merges references to ASL artifacts and TGK edges into a **single, snapshot-safe, acceleration-friendly index layer**:

* Keeps core semantics authoritative
* Enables high-performance lookups using routing, sharding, SIMD, and filters
* Preserves immutability and determinism
* Fully compatible with ASL-INDEX-ACCEL principles

This design is intended to scale to billions of references while avoiding semantic collisions between the ASL and TGK layers.

---

If you want, the next step could be **drafting a unified query execution model** over this merged index, connecting **artifact lookups** and **TGK graph traversals** in a snapshot-safe, deterministic way.

Do you want me to do that next?