amduat-api/docs/archive/enc-asl-tgk-index-draft.md
2026-01-17 10:33:23 +01:00

Absolutely, here's a draft for ENC-ASL-TGK-INDEX that merges ASL artifact indexes and TGK edge indexes while respecting separation of concerns and snapshot determinism.

This design keeps ENC-ASL-CORE and ENC-TGK-CORE authoritative, and only merges index references and acceleration structures.


ENC-ASL-TGK-INDEX

Merged On-Disk Index for ASL Artifacts and TGK Edges


1. Purpose

ENC-ASL-TGK-INDEX defines a unified on-disk index that:

  • References ASL artifacts (ENC-ASL-CORE)
  • References TGK edges (ENC-TGK-CORE)
  • Supports routing keys, filters, sharding, and SIMD acceleration per ASL-INDEX-ACCEL
  • Preserves snapshot safety, log-sequence ordering, and immutability

Semantic data lives in the respective CORE layers; this index layer only stores references.


2. Layering Principle

| Layer | Responsibility |
| --- | --- |
| ENC-ASL-CORE | Artifact structure and type tags |
| ENC-TGK-CORE | Edge structure (from[] → to[]) |
| TGK-INDEX / ASL-INDEX | Canonical & routing keys, index semantics |
| ENC-ASL-TGK-INDEX | On-disk references and acceleration metadata |

Invariant: This index never re-encodes artifacts or edges.


3. Segment Layout

Segments are append-only and snapshot-bound:

+-----------------------------+
| Segment Header              |
+-----------------------------+
| Routing Filters             |
+-----------------------------+
| ASL Artifact Index Records  |
+-----------------------------+
| TGK Edge Index Records      |
+-----------------------------+
| Optional Acceleration Data  |
+-----------------------------+
| Segment Footer              |
+-----------------------------+
  • Segment atomicity enforced
  • Footer checksum guarantees integrity

4. Segment Header

struct asl_tgk_index_segment_header {
    uint32_t magic;            // 'ATXI'
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t asl_record_count;
    uint64_t tgk_record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};
  • logseq_min and logseq_max bound the log sequence numbers in the segment and drive snapshot visibility
  • Separate counts for ASL and TGK entries
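A reader would typically sanity-check the header before trusting any offsets in it. The following is a minimal sketch of such a check, not part of the draft: the numeric value of the magic (the bytes 'A' 'T' 'X' 'I' packed most-significant first), the helper name `header_is_sane`, and the specific checks are all assumptions, since the draft does not fix byte order or validation rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same fields, in the same order, as the header struct in this section. */
struct asl_tgk_index_segment_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint64_t segment_id;
    uint64_t logseq_min;
    uint64_t logseq_max;
    uint64_t asl_record_count;
    uint64_t tgk_record_count;
    uint64_t record_area_offset;
    uint64_t footer_offset;
};

/* ASSUMPTION: 'ATXI' packed as 0x41 0x54 0x58 0x49, high byte first. */
#define ATXI_MAGIC 0x41545849u

/* Hypothetical helper: reject headers whose fields contradict each other. */
static bool header_is_sane(const struct asl_tgk_index_segment_header *h,
                           uint64_t file_size)
{
    if (h->magic != ATXI_MAGIC) return false;
    if (h->logseq_min > h->logseq_max) return false;
    /* Records come after the header (and any filters). */
    if (h->record_area_offset < sizeof *h) return false;
    /* The footer comes after the record area and inside the file. */
    if (h->footer_offset < h->record_area_offset) return false;
    if (h->footer_offset > file_size) return false;
    return true;
}
```

A segment that fails this check would simply be skipped; the check says nothing about integrity, which is the footer checksum's job.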

5. Routing Filters

Filters may be segmented by type:

  • ASL filters: artifact hash + type tag
  • TGK filters: canonical edge ID + edge type key + optional role
struct asl_tgk_filter_header {
    uint16_t filter_type;  // e.g., BLOOM, XOR
    uint16_t version;
    uint32_t flags;
    uint64_t size_bytes;   // length of filter payload
};
  • Filters are advisory; false positives allowed, false negatives forbidden
  • Must be deterministic per snapshot
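To illustrate the "false positives allowed, false negatives forbidden" rule, here is a small Bloom filter sketch. It is only an example under stated assumptions: the mixing function (a splitmix64-style finalizer), the number of probes `BLOOM_K`, and the function names are hypothetical and not defined by the draft. Using fixed constants and no random seed keeps the filter bytes deterministic for a given set of keys.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_K 4 /* ASSUMPTION: number of probes per key */

/* splitmix64-style finalizer; fixed constants, so output is deterministic. */
static uint64_t mix64(uint64_t x)
{
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Set BLOOM_K bits for a routing key. nbits must be nonzero and
 * the bits array must hold at least (nbits + 7) / 8 bytes. */
static void bloom_add(uint8_t *bits, uint64_t nbits, uint64_t key)
{
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(key ^ 0x9e3779b97f4a7c15ULL) | 1;
    for (unsigned i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + i * h2) % nbits;
        bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns false only if the key was definitely never added. */
static bool bloom_may_contain(const uint8_t *bits, uint64_t nbits, uint64_t key)
{
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(key ^ 0x9e3779b97f4a7c15ULL) | 1;
    for (unsigned i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + i * h2) % nbits;
        if (!(bits[bit / 8] & (1u << (bit % 8)))) return false;
    }
    return true;
}
```

A `true` result only means "go look at the records"; the lookup still has to confirm the match against the canonical identity.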

6. ASL Artifact Index Record

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;      // ENC-ASL-CORE reference
    uint32_t type_tag;         // optional
    uint8_t  has_type_tag;     // 0 or 1
    uint16_t flags;            // tombstone, reserved
};
  • artifact_id = canonical identity
  • No artifact payload here
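On common C ABIs the struct above has an implicit padding byte between has_type_tag and flags, so copying it to disk with a raw memory write would make the format depend on the compiler and on host byte order. One way to avoid that is an explicit field-by-field encoding. The sketch below assumes little-endian fields and a 24-byte record with one reserved byte; neither choice comes from the draft, and the function names are hypothetical.

```c
#include <stdint.h>

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint16_t flags;
};

/* ASSUMPTION: 8 + 8 + 4 + 1 + 1 (reserved) + 2 bytes, little-endian. */
#define ASL_INDEX_RECORD_WIRE_SIZE 24

static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

static void asl_index_record_encode(const struct asl_index_record *r,
                                    uint8_t out[ASL_INDEX_RECORD_WIRE_SIZE])
{
    put_le(out + 0, r->logseq, 8);
    put_le(out + 8, r->artifact_id, 8);
    /* Write zero when the tag is absent so equal records encode equally. */
    put_le(out + 16, r->has_type_tag ? r->type_tag : 0, 4);
    out[20] = r->has_type_tag ? 1 : 0;
    out[21] = 0; /* reserved */
    put_le(out + 22, r->flags, 2);
}

static void asl_index_record_decode(const uint8_t in[ASL_INDEX_RECORD_WIRE_SIZE],
                                    struct asl_index_record *r)
{
    r->logseq       = get_le(in + 0, 8);
    r->artifact_id  = get_le(in + 8, 8);
    r->type_tag     = (uint32_t)get_le(in + 16, 4);
    r->has_type_tag = in[20];
    r->flags        = (uint16_t)get_le(in + 22, 2);
}
```

Zeroing the unused tag and the reserved byte means two writers given the same record produce identical bytes, which fits the determinism requirement stated elsewhere in this draft.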

7. TGK Edge Index Record

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;      // ENC-TGK-CORE reference
    uint32_t edge_type_key;    // optional
    uint8_t  has_edge_type;
    uint8_t  role;             // optional from/to/both
    uint16_t flags;            // tombstone, reserved
};
  • tgk_edge_id = canonical TGK-CORE edge ID
  • No node lists stored in index
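As an illustration of how the optional edge_type_key and role fields might be used during a scan, the sketch below filters a TGK index record against a query. The role encoding (a two-bit mask where both = from | to) and the helper `tgk_record_matches` are assumptions made for this example; the draft only says role is "optional from/to/both".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tgk_index_record {
    uint64_t logseq;
    uint64_t tgk_edge_id;
    uint32_t edge_type_key;
    uint8_t  has_edge_type;
    uint8_t  role;
    uint16_t flags;
};

/* ASSUMPTION: role is a bitmask; the draft does not define the values. */
enum {
    TGK_ROLE_NONE = 0,
    TGK_ROLE_FROM = 1,
    TGK_ROLE_TO   = 2,
    TGK_ROLE_BOTH = 3
};

/* want_type == NULL means "any edge type";
 * want_role == TGK_ROLE_NONE means "any role". */
static bool tgk_record_matches(const struct tgk_index_record *r,
                               const uint32_t *want_type,
                               uint8_t want_role)
{
    if (want_type != NULL &&
        (!r->has_edge_type || r->edge_type_key != *want_type))
        return false;
    if (want_role != TGK_ROLE_NONE && (r->role & want_role) != want_role)
        return false;
    return true;
}
```

A match here only narrows the candidate set; the node lists themselves still have to be read from the TGK-CORE edge that tgk_edge_id refers to.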

8. Optional Node-Projection Records

For acceleration:

struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;          // from/to node
    uint64_t tgk_edge_id;
    uint8_t  position;         // from or to
};
  • Fully derivable from TGK-CORE edges
  • Optional; purely for lookup speed
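Because these records are fully derivable, a builder could produce them mechanically from each edge. The sketch below shows one way to do that. The `tgk_edge_view` type stands in for however ENC-TGK-CORE exposes an edge's from[] and to[] lists, and the position values are likewise hypothetical; none of these names come from the draft.

```c
#include <stddef.h>
#include <stdint.h>

struct node_edge_ref {
    uint64_t logseq;
    uint64_t node_id;
    uint64_t tgk_edge_id;
    uint8_t  position;
};

/* ASSUMPTION: position encoding is not given in the draft. */
enum { NODE_POS_FROM = 0, NODE_POS_TO = 1 };

/* Hypothetical read-only view of a decoded TGK-CORE edge. */
struct tgk_edge_view {
    uint64_t        edge_id;
    const uint64_t *from;
    size_t          from_len;
    const uint64_t *to;
    size_t          to_len;
};

/* Emits one ref per from-node, then one per to-node.
 * Returns the number written, or 0 if out[] is too small. */
static size_t project_edge(const struct tgk_edge_view *e, uint64_t logseq,
                           struct node_edge_ref *out, size_t cap)
{
    if (e->from_len + e->to_len > cap) return 0;
    size_t n = 0;
    for (size_t i = 0; i < e->from_len; i++)
        out[n++] = (struct node_edge_ref){ logseq, e->from[i], e->edge_id, NODE_POS_FROM };
    for (size_t i = 0; i < e->to_len; i++)
        out[n++] = (struct node_edge_ref){ logseq, e->to[i], e->edge_id, NODE_POS_TO };
    return n;
}
```

Emitting refs in list order keeps the output deterministic for a given edge, so rebuilding the projection from TGK-CORE yields the same bytes.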

9. Sharding and SIMD

  • Shard assignment is routing key based (ASL artifact or TGK edge)
  • SIMD arrays may store precomputed routing keys for fast filter evaluation
  • Must follow ASL-INDEX-ACCEL invariants: deterministic, immutable, snapshot-safe
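A routing-key-based shard assignment can be as simple as a fixed hash reduced modulo the shard count. The sketch below uses a multiplicative (Fibonacci-style) hash. The constant, the use of the upper 32 bits, and the name `shard_for_key` are illustrative assumptions, not something ASL-INDEX-ACCEL is known to mandate.

```c
#include <stdint.h>

/* Maps a routing key to a shard in [0, shard_count).
 * Pure function of its inputs: no seed, no global state. */
static uint32_t shard_for_key(uint64_t routing_key, uint32_t shard_count)
{
    if (shard_count == 0) return 0; /* defensive; callers should pass >= 1 */
    uint64_t h = routing_key * 0x9E3779B97F4A7C15ULL;
    return (uint32_t)((h >> 32) % shard_count);
}
```

Since the result depends only on the key and the shard count, every reader and writer agrees on placement, and a query that checks the right shard sees the same records it would see in an unsharded index.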

10. Snapshot Interaction

At snapshot S:

  • Segment visible if logseq_min ≤ S
  • ASL or TGK record visible if logseq ≤ S
  • Tombstones shadow earlier records
  • Filters may be consulted as an advisory pre-check; hits still require canonical verification
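The visibility rules above can be sketched as a pair of small functions. This is an illustration under assumptions: the tombstone bit value `ASL_FLAG_TOMBSTONE` and both function names are hypothetical, and "the latest record at or before S wins" is one reading of "tombstones shadow earlier records".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asl_index_record {
    uint64_t logseq;
    uint64_t artifact_id;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint16_t flags;
};

#define ASL_FLAG_TOMBSTONE 0x0001u /* ASSUMPTION: bit value not in the draft */

/* A segment can contribute to snapshot S only if it starts at or before S. */
static bool segment_visible(uint64_t logseq_min, uint64_t snapshot)
{
    return logseq_min <= snapshot;
}

/* Finds the newest record for artifact_id with logseq <= snapshot.
 * The artifact is visible only if that record is not a tombstone. */
static bool artifact_visible_at(const struct asl_index_record *recs, size_t n,
                                uint64_t artifact_id, uint64_t snapshot)
{
    bool found = false, tombstoned = false;
    uint64_t best = 0;
    for (size_t i = 0; i < n; i++) {
        const struct asl_index_record *r = &recs[i];
        if (r->artifact_id != artifact_id || r->logseq > snapshot) continue;
        if (!found || r->logseq >= best) {
            found = true;
            best = r->logseq;
            tombstoned = (r->flags & ASL_FLAG_TOMBSTONE) != 0;
        }
    }
    return found && !tombstoned;
}
```

The same shape would apply to TGK records by swapping artifact_id for tgk_edge_id. A tombstone written after S has no effect at S, which is what keeps older snapshots stable.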

11. Segment Footer

struct asl_tgk_index_segment_footer {
    uint64_t checksum;        // covers header, filters, records
    uint64_t asl_record_bytes;
    uint64_t tgk_record_bytes;
    uint64_t filter_bytes;
};
  • Ensures atomicity and completeness
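The draft does not name a checksum algorithm, so the sketch below uses 64-bit FNV-1a purely as a stand-in to show the shape of the verification step. It also assumes the checksum covers every byte from the start of the segment up to the footer offset. Both the algorithm choice and the helper `footer_matches` are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asl_tgk_index_segment_footer {
    uint64_t checksum;
    uint64_t asl_record_bytes;
    uint64_t tgk_record_bytes;
    uint64_t filter_bytes;
};

/* 64-bit FNV-1a. Placeholder only: a real format would more likely
 * pick a CRC or a cryptographic hash. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* ASSUMPTION: the checksum covers bytes [0, footer_offset) of the segment. */
static bool footer_matches(const uint8_t *segment, size_t footer_offset,
                           const struct asl_tgk_index_segment_footer *f)
{
    return fnv1a64(segment, footer_offset) == f->checksum;
}
```

Under this reading, a writer would emit the footer last, and a reader would treat any segment whose footer is missing or does not match as absent rather than partially valid.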

12. Normative Invariants

  1. ASL artifact identity = ENC-ASL-CORE artifact ID
  2. TGK edge identity = ENC-TGK-CORE edge ID
  3. Edge type key and artifact type tag do not affect canonical identity
  4. Filters are advisory only; no false negatives
  5. Sharding is observationally invisible
  6. Index records are immutable once written
  7. Snapshot visibility strictly follows logseq
  8. Determinism guaranteed per snapshot

13. Summary

ENC-ASL-TGK-INDEX merges ASL artifacts and TGK edges into a single, snapshot-safe, acceleration-friendly index layer:

  • Keeps core semantics authoritative
  • Enables high-performance lookups using routing, sharding, SIMD, and filters
  • Preserves immutability and determinism
  • Fully compatible with ASL-INDEX-ACCEL principles

This design is intended to scale to billions of references while avoiding semantic collisions between the ASL and TGK layers.


If you want, the next step could be drafting a unified query execution model over this merged index, connecting artifact lookups and TGK graph traversals in a snapshot-safe, deterministic way.

Do you want me to do that next?