Absolutely, here's a formal draft of **C structure mappings and binary encoding** for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. This is meant for **C libraries, API frontends, and toolchains**, preserving memory alignment, determinism, and snapshot safety.

---

# Unified System C Struct Mapping and Binary Encoding Specification

---

## 1. Artifact & PER Records

### 1.1 Artifact Index Entry

```c
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;   // canonical ArtifactKey
    uint64_t block_id;       // CAS/ASL block ID
    uint32_t offset;         // offset within block
    uint32_t length;         // length in bytes
    uint32_t type_tag;       // optional type tag
    uint8_t  has_type_tag;   // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];    // padding so logseq starts on an 8-byte boundary
    uint64_t logseq;         // monotonic log sequence
} artifact_index_entry_t;    // 40 bytes
```

**Binary encoding** (40 bytes, fields in declaration order):

| Field        | Bytes | Notes                   |
| ------------ | ----- | ----------------------- |
| artifact_key | 8     | canonical ID            |
| block_id     | 8     | ZFS CAS block reference |
| offset       | 4     | offset in block         |
| length       | 4     | payload size            |
| type_tag     | 4     | optional type           |
| has_type_tag | 1     | 1 if `type_tag` is valid |
| reserved     | 3     | alignment padding       |
| logseq       | 8     | monotonic sequence      |

---

### 1.2 PER (PEL Execution Receipt) Record

```c
typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact; // embedded artifact info
    uint64_t  pel_program_id;  // PEL program DAG canonical ID
    uint32_t  input_count;     // number of input artifacts
    uint64_t *input_keys;      // array of ArtifactKeys (in-memory pointer, never written to disk)
    uint32_t  output_count;    // number of outputs
    uint64_t *output_keys;     // array of ArtifactKeys (in-memory pointer, never written to disk)
} per_record_t;
```

**Encoding notes**:

* Base artifact encoding is identical to `artifact_index_entry_t`
* Followed by the PEL-specific fields: `pel_program_id`, `input_count`, `input_keys[]`, `output_count`, `output_keys[]`
* `per_record_t` is the in-memory form. The two pointers are not serialized; each array is written **length-prefixed**, that is, its count followed by that many 8-byte keys
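
To make the 1.1 field table concrete, here is a minimal sketch of a byte-level encoder for one artifact index entry. It writes each field least-significant byte first, so the output does not depend on the host's byte order. The helper names (`put_u32_le`, `put_u64_le`, `artifact_entry_encode`) are hypothetical, and zeroing the `reserved` bytes is an assumption made here so that equal entries produce identical bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the struct from section 1.1 (GCC/Clang packed attribute). */
typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

#define ARTIFACT_ENTRY_WIRE_SIZE 40u
static_assert(sizeof(artifact_index_entry_t) == ARTIFACT_ENTRY_WIRE_SIZE,
              "packed entry must match the 40-byte field table");

/* Hypothetical helper: store a 32-bit value, least-significant byte first. */
static uint8_t *put_u32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Hypothetical helper: store a 64-bit value, least-significant byte first. */
static uint8_t *put_u64_le(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Encode one entry into out (at least 40 bytes); returns bytes written. */
static size_t artifact_entry_encode(const artifact_index_entry_t *e, uint8_t *out) {
    uint8_t *p = out;
    p = put_u64_le(p, e->artifact_key);
    p = put_u64_le(p, e->block_id);
    p = put_u32_le(p, e->offset);
    p = put_u32_le(p, e->length);
    p = put_u32_le(p, e->type_tag);
    *p++ = e->has_type_tag ? 1 : 0;
    *p++ = 0; *p++ = 0; *p++ = 0;   /* reserved: zeroed (assumption) */
    p = put_u64_le(p, e->logseq);
    return (size_t)(p - out);
}
```

On a little-endian host the packed struct already has this byte layout, so a plain `memcpy` would give the same result; the explicit shifts only matter when the same files must be written from a big-endian machine.
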
---

## 2. TGK Edge Records

```c
#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;     // unique edge ID
    uint64_t from_nodes[MAX_FROM];  // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];      // to node ArtifactKeys
    uint32_t from_count;            // actual number of from nodes (<= MAX_FROM)
    uint32_t to_count;              // actual number of to nodes (<= MAX_TO)
    uint32_t edge_type;             // type key
    uint8_t  roles;                 // bitmask of roles
    uint8_t  reserved[7];           // padding
    uint64_t logseq;                // log sequence
} tgk_edge_record_t;                // 292 bytes as declared
```

**Encoding notes**:

* Fixed-size arrays give every record the same size, which simplifies SIMD processing
* `from_count` / `to_count` indicate how many leading entries of each array are valid
* Deterministic ordering is preserved by sorting on the pair (`logseq`, `canonical_edge_id`)
* As declared, the packed record is 292 bytes and `logseq` starts at offset 284, which is not a multiple of 8. A 3-byte `reserved` field would give a 288-byte record with `logseq` at offset 280

---

## 3. Shard-Local Buffers

```c
typedef struct {
    uint64_t min_logseq;   // lower snapshot bound
    uint64_t max_logseq;   // upper snapshot bound
} snapshot_range_t;        // 16 bytes

typedef struct {
    artifact_index_entry_t *artifacts;  // pointer to artifact array
    tgk_edge_record_t      *edges;      // pointer to TGK edges
    uint64_t artifact_count;            // number of entries in artifacts
    uint64_t edge_count;                // number of entries in edges
    snapshot_range_t snapshot;          // snapshot bounds for this shard
} shard_buffer_t;
```

**Binary encoding**:

* Contiguous memory layout per shard for SIMD operations
* `artifact_count` and `edge_count` bound iteration over the two arrays
* `snapshot_range_t` defines `min_logseq` and `max_logseq` for snapshot safety
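
The ordering and snapshot rules above can be sketched as a small routine that keeps only the edges visible in a snapshot and then sorts them deterministically. This is an illustration under stated assumptions: the snapshot bounds are treated as inclusive, `logseq` is taken as the primary sort key with `canonical_edge_id` as tie-breaker, and the function names (`tgk_edge_cmp`, `snapshot_contains`, `tgk_edges_visible_sorted`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_FROM 16
#define MAX_TO   16

/* Mirrors the struct from section 2. */
typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;
    uint64_t from_nodes[MAX_FROM];
    uint64_t to_nodes[MAX_TO];
    uint32_t from_count;
    uint32_t to_count;
    uint32_t edge_type;
    uint8_t  roles;
    uint8_t  reserved[7];
    uint64_t logseq;
} tgk_edge_record_t;

/* Layout taken from the text: min_logseq and max_logseq, 16 bytes. */
typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;
static_assert(sizeof(snapshot_range_t) == 16, "snapshot range is 16 bytes");

/* qsort comparator: order by logseq, then by canonical_edge_id. */
static int tgk_edge_cmp(const void *a, const void *b) {
    const tgk_edge_record_t *x = a;
    const tgk_edge_record_t *y = b;
    uint64_t xl = x->logseq, yl = y->logseq;
    if (xl != yl) return xl < yl ? -1 : 1;
    uint64_t xi = x->canonical_edge_id, yi = y->canonical_edge_id;
    if (xi != yi) return xi < yi ? -1 : 1;
    return 0;
}

/* Inclusive bounds are an assumption; the text does not say. */
static int snapshot_contains(const snapshot_range_t *s, uint64_t logseq) {
    return logseq >= s->min_logseq && logseq <= s->max_logseq;
}

/* Compacts the visible edges to the front of the array, sorts that
   prefix, and returns how many edges it contains. */
static size_t tgk_edges_visible_sorted(tgk_edge_record_t *edges, size_t n,
                                       const snapshot_range_t *snap) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (snapshot_contains(snap, edges[i].logseq)) {
            if (kept != i) edges[kept] = edges[i];
            kept++;
        }
    }
    qsort(edges, kept, sizeof edges[0], tgk_edge_cmp);
    return kept;
}
```

Because the comparator never returns 0 for two distinct (`logseq`, `canonical_edge_id`) pairs, the result does not depend on `qsort` being stable, which is what makes replay order reproducible across platforms.
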
---

## 4. Execution Plan Structures

### 4.1 Operator Definition

```c
typedef enum {
    OP_SEGMENT_SCAN = 0,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t op_id;          // unique operator ID
    uint32_t type;           // operator_type_t value; fixed width because enum size is implementation-defined
    uint32_t input_count;    // number of inputs
    uint32_t output_count;   // number of outputs
    uint32_t params_length;  // length of serialized params in bytes
    uint8_t *params;         // pointer to operator parameters (in-memory only)
    uint32_t shard_id;       // shard this operator applies to
} operator_t;
```

* `params` contains **operator-specific configuration** (e.g., filter masks, edge_type keys)
* Operators are serialized sequentially in the execution plan

---

### 4.2 Execution Plan Serialization

```c
typedef struct __attribute__((packed)) {
    uint32_t plan_id;           // unique plan ID
    uint32_t operator_count;    // number of operators
    operator_t *operators;      // pointer to operator array (in-memory only)
    snapshot_range_t snapshot;  // snapshot bounds for execution
} execution_plan_t;
```

**Encoding** (the `operators` pointer itself is not written):

1. `plan_id` (4 bytes)
2. `operator_count` (4 bytes)
3. `snapshot_range_t` (min_logseq, max_logseq, 16 bytes)
4. Serialized operators (fixed-size header + variable `params`)

---

## 5. Binary Serialization Rules

1. **All serialized structures are packed** to prevent gaps (`__attribute__((packed))`)
2. **Canonical byte order**: little-endian for cross-platform compatibility
3. **Pointers** are replaced by offsets in serialized form
4. Arrays (inputs, outputs, from/to nodes) are **length-prefixed**
5. The pair (`logseq`, `canonical_id`) is used for deterministic ordering

---

## 6. Notes on SIMD / Shard Layout

* All arrays in `shard_buffer_t` are **contiguous and aligned to 64-byte boundaries** for vectorized loads
* Fixed-size arrays in `tgk_edge_record_t` simplify branchless SIMD filtering
* Serialization preserves shard boundaries for distributed execution and federation propagation
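
As an illustration of the 4.2 encoding order combined with the rules in section 5, the sketch below serializes a plan into a caller-supplied buffer. Several details are assumptions rather than part of the draft: the operator header is written as six little-endian `uint32_t` fields in the order shown, each operator's `params` bytes are placed inline directly after its header (so no explicit offset field is stored), and the structs are simplified unpacked in-memory mirrors. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { OP_SEGMENT_SCAN = 0, OP_INDEX_FILTER, OP_MERGE } operator_type_t;

typedef struct { uint64_t min_logseq, max_logseq; } snapshot_range_t;

/* Simplified in-memory mirrors of the section 4 structs (not packed). */
typedef struct {
    uint32_t op_id, type, input_count, output_count, params_length;
    const uint8_t *params;
    uint32_t shard_id;
} operator_t;

typedef struct {
    uint32_t plan_id, operator_count;
    const operator_t *operators;
    snapshot_range_t snapshot;
} execution_plan_t;

#define PLAN_HEADER_WIRE_SIZE 24u  /* 4 + 4 + 16 */
#define OP_HEADER_WIRE_SIZE   24u  /* six uint32_t fields (assumed layout) */
static_assert(sizeof(snapshot_range_t) == 16, "snapshot range is 16 bytes");

static uint8_t *put_u32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

static uint8_t *put_u64_le(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Total bytes plan_encode will write for this plan. */
static size_t plan_encoded_size(const execution_plan_t *plan) {
    size_t n = PLAN_HEADER_WIRE_SIZE;
    for (uint32_t i = 0; i < plan->operator_count; i++)
        n += OP_HEADER_WIRE_SIZE + plan->operators[i].params_length;
    return n;
}

/* Writes the plan into out; returns bytes written, or 0 if cap is too small. */
static size_t plan_encode(const execution_plan_t *plan, uint8_t *out, size_t cap) {
    if (plan_encoded_size(plan) > cap) return 0;
    uint8_t *p = out;
    p = put_u32_le(p, plan->plan_id);
    p = put_u32_le(p, plan->operator_count);
    p = put_u64_le(p, plan->snapshot.min_logseq);
    p = put_u64_le(p, plan->snapshot.max_logseq);
    for (uint32_t i = 0; i < plan->operator_count; i++) {
        const operator_t *op = &plan->operators[i];
        p = put_u32_le(p, op->op_id);
        p = put_u32_le(p, op->type);
        p = put_u32_le(p, op->input_count);
        p = put_u32_le(p, op->output_count);
        p = put_u32_le(p, op->shard_id);
        p = put_u32_le(p, op->params_length);
        if (op->params_length) memcpy(p, op->params, op->params_length);
        p += op->params_length;   /* params follow the header inline */
    }
    return (size_t)(p - out);
}
```

Putting `params_length` last in the header means a reader can parse a fixed 24 bytes and then knows exactly how many parameter bytes to consume before the next operator begins. A format that stores explicit offsets, as rule 3 describes, would instead gather all `params` blobs in a trailing region and record each blob's position in its header.
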
---

## 7. Summary

This C struct mapping and binary encoding specification:

* Covers **artifact, PER, TGK edge**, and **execution plan** structures
* Enables **snapshot-safe deterministic execution**
* Supports **SIMD/shard acceleration**
* Is ready for **C libraries, API frontends, and cross-node federation**
* Preserves **provenance, logseq ordering, and deterministic replay**

---

Next logical step could be **formalizing garbage collection and tombstone application rules** in binary form, ensuring unreachable artifacts can be removed safely while preserving determinism.

Do you want me to draft that next?