NOTE: Superseded by tier1 TGK/1 and vendor/amduat/tier1/tgk-1-core.md; retained for historical context.
This is a formal draft of the C structure mappings and binary encoding for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. It is meant for C libraries, API frontends, and toolchains, and aims to preserve memory alignment, determinism, and snapshot safety.
Unified System C Struct Mapping and Binary Encoding Specification
1. Artifact & PER Records
1.1 Artifact Index Entry
typedef struct __attribute__((packed)) {
    uint64_t artifact_key;   // canonical ArtifactKey
    uint64_t block_id;       // CAS/ASL block ID
    uint32_t offset;         // offset within block
    uint32_t length;         // length in bytes
    uint32_t type_tag;       // optional type tag
    uint8_t  has_type_tag;   // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];    // padding for 8-byte alignment
    uint64_t logseq;         // monotonic log sequence
} artifact_index_entry_t;
Binary encoding:
| Field | Bytes | Notes |
|---|---|---|
| artifact_key | 8 | canonical ID |
| block_id | 8 | ZFS CAS block reference |
| offset | 4 | offset in block |
| length | 4 | payload size |
| type_tag | 4 | optional type |
| has_type_tag | 1 | toggle |
| reserved | 3 | alignment padding |
| logseq | 8 | monotonic sequence |
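The table above adds up to 40 bytes per entry. As a minimal sketch of how that layout could be written out byte by byte, the helpers below (`put_u32_le`, `put_u64_le`, `artifact_entry_encode`, and the `ARTIFACT_ENTRY_WIRE_SIZE` constant) are hypothetical names introduced for illustration. The sketch assumes the little-endian byte order stated in section 5, and assumes reserved bytes are written as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARTIFACT_ENTRY_WIRE_SIZE 40u  /* 8+8+4+4+4+1+3+8, per the table above */

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

/* The packed struct and the table should agree on the total size. */
static_assert(sizeof(artifact_index_entry_t) == ARTIFACT_ENTRY_WIRE_SIZE,
              "packed struct must match the 40-byte layout");

/* Hypothetical helpers: store an integer least-significant byte first. */
static void put_u32_le(uint8_t *dst, uint32_t v) {
    for (int i = 0; i < 4; i++) dst[i] = (uint8_t)(v >> (8 * i));
}
static void put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++) dst[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical encoder: fills out[0..39] field by field, so the result
 * does not depend on the host's endianness. */
static void artifact_entry_encode(const artifact_index_entry_t *e,
                                  uint8_t out[ARTIFACT_ENTRY_WIRE_SIZE]) {
    put_u64_le(out + 0,  e->artifact_key);
    put_u64_le(out + 8,  e->block_id);
    put_u32_le(out + 16, e->offset);
    put_u32_le(out + 20, e->length);
    put_u32_le(out + 24, e->type_tag);
    out[28] = e->has_type_tag ? 1 : 0;   /* normalise the toggle to 0 or 1 */
    out[29] = out[30] = out[31] = 0;     /* reserved bytes as zero (assumption) */
    put_u64_le(out + 32, e->logseq);
}
```

Encoding field by field rather than copying the struct with `memcpy` keeps the output identical on big-endian hosts, at the cost of a few extra instructions per record.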
1.2 PER (PEL Execution Receipt) Record
typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact;  // embedded artifact info
    uint64_t pel_program_id;               // PEL program DAG canonical ID
    uint32_t input_count;                  // number of input artifacts
    uint64_t *input_keys;                  // array of ArtifactKeys (in-memory pointer; not serialized as-is)
    uint32_t output_count;                 // number of outputs
    uint64_t *output_keys;                 // array of ArtifactKeys (in-memory pointer; not serialized as-is)
} per_record_t;
Encoding notes:
- Base artifact encoding is identical to artifact_index_entry_t
- Followed by PEL-specific fields: pel_program_id, input_count, input_keys[], output_count, output_keys[]
- Arrays are length-prefixed for serialization
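The PEL-specific fields listed above could be serialized roughly as follows. This is a sketch under stated assumptions, not a definitive wire format: the function names (`per_tail_size`, `per_tail_encode`) are hypothetical, each length prefix is assumed to be the `uint32_t` count from the struct, and byte order is assumed to be little-endian as in section 5. The 40-byte base artifact entry would precede this tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical little-endian writers; each returns the new write position. */
static size_t put_u32_le(uint8_t *buf, size_t pos, uint32_t v) {
    for (int i = 0; i < 4; i++) buf[pos + i] = (uint8_t)(v >> (8 * i));
    return pos + 4;
}
static size_t put_u64_le(uint8_t *buf, size_t pos, uint64_t v) {
    for (int i = 0; i < 8; i++) buf[pos + i] = (uint8_t)(v >> (8 * i));
    return pos + 8;
}

/* Bytes needed for the PEL-specific tail of a PER record. */
static size_t per_tail_size(uint32_t input_count, uint32_t output_count) {
    return 8 + 4 + 8 * (size_t)input_count + 4 + 8 * (size_t)output_count;
}

/* Writes pel_program_id, then each key array prefixed by its uint32 count.
 * Returns the number of bytes written, or 0 if cap is too small. */
static size_t per_tail_encode(uint8_t *buf, size_t cap, uint64_t pel_program_id,
                              const uint64_t *input_keys, uint32_t input_count,
                              const uint64_t *output_keys, uint32_t output_count) {
    if (cap < per_tail_size(input_count, output_count)) return 0;
    size_t pos = put_u64_le(buf, 0, pel_program_id);
    pos = put_u32_le(buf, pos, input_count);
    for (uint32_t i = 0; i < input_count; i++)
        pos = put_u64_le(buf, pos, input_keys[i]);
    pos = put_u32_le(buf, pos, output_count);
    for (uint32_t i = 0; i < output_count; i++)
        pos = put_u64_le(buf, pos, output_keys[i]);
    return pos;
}
```

With two inputs and one output the tail occupies 8 + 4 + 16 + 4 + 8 = 40 bytes. The keys are written inline after their counts, so no pointer value ever reaches the serialized form.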
2. TGK Edge Records
#define MAX_FROM 16
#define MAX_TO 16
typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;     // unique edge ID
    uint64_t from_nodes[MAX_FROM];  // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];      // to node ArtifactKeys
    uint32_t from_count;            // actual number of from nodes
    uint32_t to_count;              // actual number of to nodes
    uint32_t edge_type;             // type key
    uint8_t  roles;                 // bitmask of roles
    uint8_t  reserved[7];           // padding
    uint64_t logseq;                // log sequence
} tgk_edge_record_t;
Encoding notes:
- Fixed-size array simplifies SIMD processing
- from_count / to_count indicate valid entries
- Deterministic ordering preserved by logseq + canonical_edge_id
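To make the fixed-size layout concrete, the sketch below restates the packed struct and pins down the byte offsets it implies with compile-time checks. The validator `tgk_edge_counts_valid` is a hypothetical helper, not part of the draft. The offsets follow directly from the declaration as written, assuming a GCC- or Clang-compatible compiler for `__attribute__((packed))`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;
    uint64_t from_nodes[MAX_FROM];
    uint64_t to_nodes[MAX_TO];
    uint32_t from_count;
    uint32_t to_count;
    uint32_t edge_type;
    uint8_t  roles;
    uint8_t  reserved[7];
    uint64_t logseq;
} tgk_edge_record_t;

/* Offsets implied by the packed declaration above. */
static_assert(offsetof(tgk_edge_record_t, from_nodes) == 8,   "from_nodes");
static_assert(offsetof(tgk_edge_record_t, to_nodes)   == 136, "to_nodes");
static_assert(offsetof(tgk_edge_record_t, from_count) == 264, "from_count");
static_assert(offsetof(tgk_edge_record_t, roles)      == 276, "roles");
static_assert(offsetof(tgk_edge_record_t, logseq)     == 284, "logseq");
static_assert(sizeof(tgk_edge_record_t) == 292, "record size");

/* Hypothetical validator: the counts must fit inside the fixed arrays
 * before any reader iterates over from_nodes / to_nodes. */
static int tgk_edge_counts_valid(const tgk_edge_record_t *e) {
    return e->from_count <= MAX_FROM && e->to_count <= MAX_TO;
}
```

With `reserved[7]`, `logseq` lands at byte 284 and the record is 292 bytes long, so neither is a multiple of 8. Code that needs naturally aligned 64-bit loads would have to copy the field out first or treat the record as unaligned.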
3. Shard-Local Buffers
typedef struct {
    artifact_index_entry_t *artifacts;  // pointer to artifact array
    tgk_edge_record_t *edges;           // pointer to TGK edges
    uint64_t artifact_count;
    uint64_t edge_count;
    snapshot_range_t snapshot;          // snapshot bounds for this shard
} shard_buffer_t;
Binary encoding:
- Contiguous memory layout per shard for SIMD operations
- artifact_count and edge_count used for iteration
- snapshot_range_t defines min_logseq and max_logseq for safety
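The draft uses `snapshot_range_t` without declaring it. A minimal sketch of what it might look like is given below, assuming two 64-bit bounds (which matches the 16 bytes given for it in section 4.2) and assuming both bounds are inclusive. The helper names `snapshot_contains` and `snapshot_count_visible` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: two 64-bit bounds, 16 bytes in total. */
typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

/* Hypothetical visibility test; inclusive bounds are an assumption. */
static int snapshot_contains(const snapshot_range_t *s, uint64_t logseq) {
    return logseq >= s->min_logseq && logseq <= s->max_logseq;
}

/* Counts how many logseq values in a shard-local array fall inside the
 * snapshot. Summing the comparison result avoids a branch in the loop body. */
static uint64_t snapshot_count_visible(const snapshot_range_t *s,
                                       const uint64_t *logseqs, uint64_t n) {
    uint64_t visible = 0;
    for (uint64_t i = 0; i < n; i++)
        visible += (uint64_t)snapshot_contains(s, logseqs[i]);
    return visible;
}
```

For a snapshot of [10, 20] and logseq values 5, 10, 15, 20, 21, three records are visible. A real scan would read `logseq` from each record in `shard_buffer_t` rather than from a separate array.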
4. Execution Plan Structures
4.1 Operator Definition
typedef enum {
    OP_SEGMENT_SCAN,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;
typedef struct __attribute__((packed)) {
    uint32_t op_id;          // unique operator ID
    operator_type_t type;    // operator type
    uint32_t input_count;    // number of inputs
    uint32_t output_count;   // number of outputs
    uint32_t params_length;  // length of serialized params
    uint8_t *params;         // pointer to operator parameters
    uint32_t shard_id;       // shard this operator applies to
} operator_t;
- params contains operator-specific configuration (e.g., filter masks, edge_type keys)
- Operators are serialized sequentially in the execution plan
4.2 Execution Plan Serialization
typedef struct __attribute__((packed)) {
    uint32_t plan_id;           // unique plan ID
    uint32_t operator_count;    // number of operators
    operator_t *operators;      // pointer to operator array
    snapshot_range_t snapshot;  // snapshot bounds for execution
} execution_plan_t;
Encoding:
- plan_id (4 bytes)
- operator_count (4 bytes)
- snapshot_range_t (min_logseq, max_logseq; 16 bytes)
- Serialized operators (fixed-size header + variable params)
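The first three items above form a 24-byte plan header. The round-trip sketch below shows one way to encode and decode it, under these assumptions: `plan_header_t` is a hypothetical pointer-free view of `execution_plan_t`, `snapshot_range_t` is taken to be two `uint64_t` fields, and integers are little-endian as in section 5. The operator records that follow the header are left out because the draft does not fix their field order.

```c
#include <assert.h>
#include <stdint.h>

#define PLAN_HEADER_SIZE 24u  /* 4 + 4 + 16 */

typedef struct { uint64_t min_logseq, max_logseq; } snapshot_range_t; /* assumed */

typedef struct {
    uint32_t plan_id;
    uint32_t operator_count;
    snapshot_range_t snapshot;
} plan_header_t;  /* hypothetical: execution_plan_t without the pointer */

/* Hypothetical helpers: little-endian store/load of the low nbytes of v. */
static void put_le(uint8_t *dst, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++) dst[i] = (uint8_t)(v >> (8 * i));
}
static uint64_t get_le(const uint8_t *src, int nbytes) {
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++) v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Writes the header fields in the order listed above. */
static void plan_header_encode(const plan_header_t *h,
                               uint8_t out[PLAN_HEADER_SIZE]) {
    put_le(out + 0,  h->plan_id, 4);
    put_le(out + 4,  h->operator_count, 4);
    put_le(out + 8,  h->snapshot.min_logseq, 8);
    put_le(out + 16, h->snapshot.max_logseq, 8);
}

/* Reads the same 24 bytes back into a header value. */
static plan_header_t plan_header_decode(const uint8_t in[PLAN_HEADER_SIZE]) {
    plan_header_t h;
    h.plan_id             = (uint32_t)get_le(in + 0, 4);
    h.operator_count      = (uint32_t)get_le(in + 4, 4);
    h.snapshot.min_logseq = get_le(in + 8, 8);
    h.snapshot.max_logseq = get_le(in + 16, 8);
    return h;
}
```

A decoder would read `operator_count` from this header and then walk that many serialized operators, each consisting of its fixed-size header followed by `params_length` bytes of parameters.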
5. Binary Serialization Rules
- All structures packed to prevent gaps (__attribute__((packed)))
- Canonical byte order: little-endian for cross-platform compatibility
- Pointers replaced by offsets in serialized form
- Arrays (inputs, outputs, from/to nodes) length-prefixed
- logseq + canonical_id used for deterministic ordering
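The last rule can be read as a two-level sort key. The sketch below assumes `logseq` is the primary key and the canonical ID breaks ties, which is one plausible reading of "logseq + canonical_id" and not something the draft states outright. The type `order_key_t` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sort key extracted from an artifact or edge record. */
typedef struct {
    uint64_t logseq;
    uint64_t canonical_id;
} order_key_t;

/* Total order: logseq first, canonical_id as tie-breaker (assumption).
 * Explicit comparisons avoid the overflow a subtraction-based
 * comparator would hit on 64-bit values. */
static int order_key_cmp(const void *a, const void *b) {
    const order_key_t *x = (const order_key_t *)a;
    const order_key_t *y = (const order_key_t *)b;
    if (x->logseq != y->logseq)
        return x->logseq < y->logseq ? -1 : 1;
    if (x->canonical_id != y->canonical_id)
        return x->canonical_id < y->canonical_id ? -1 : 1;
    return 0;
}

/* Sorts keys in place using the standard library's qsort. */
static void order_keys_sort(order_key_t *keys, size_t n) {
    qsort(keys, n, sizeof keys[0], order_key_cmp);
}
```

`qsort` is not guaranteed to be stable, but when every (logseq, canonical_id) pair is distinct the comparator defines a total order, so the sorted result is the same on every run and every platform.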
6. Notes on SIMD / Shard Layout
- All arrays in shard_buffer_t are contiguous and aligned to 64-byte boundaries for vectorized loads
- Fixed-size arrays in tgk_edge_record_t simplify branchless SIMD filtering
- Serialization preserves shard boundaries for distributed execution and federation propagation
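One way to obtain the 64-byte aligned arrays described above is C11's `aligned_alloc`. The sketch below is an assumption about how allocation might be done, since the draft does not name an allocator; `SHARD_ALIGN`, `round_up_64`, and `shard_array_alloc` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHARD_ALIGN 64u

/* Rounds n up to the next multiple of SHARD_ALIGN (a power of two). */
static size_t round_up_64(size_t n) {
    return (n + (SHARD_ALIGN - 1)) & ~(size_t)(SHARD_ALIGN - 1);
}

/* Hypothetical allocator: returns a zeroed array whose base address is
 * 64-byte aligned, or NULL on bad arguments or allocation failure.
 * Release the result with free(). */
static void *shard_array_alloc(size_t count, size_t elem_size) {
    if (count == 0 || elem_size == 0) return NULL;
    if (count > (SIZE_MAX - (SHARD_ALIGN - 1)) / elem_size) return NULL; /* overflow guard */
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t bytes = round_up_64(count * elem_size);
    void *p = aligned_alloc(SHARD_ALIGN, bytes);
    if (p != NULL) memset(p, 0, bytes);
    return p;
}
```

Only the base of each array is aligned this way. Because the packed records are 40 and 292 bytes long, individual elements after the first generally do not start on 64-byte boundaries, so vectorized code would need unaligned loads or a separate column-oriented copy of the fields it scans.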
7. Summary
This C struct mapping and binary encoding specification:
- Covers artifact, PER, TGK edge, and execution plan structures
- Enables snapshot-safe deterministic execution
- Supports SIMD/shard acceleration
- Is ready for C libraries, API frontends, and cross-node federation
- Preserves provenance, logseq ordering, and deterministic replay
Next logical step could be formalizing garbage collection and tombstone application rules in binary form, ensuring unreachable artifacts can be removed safely while preserving determinism.
Do you want me to draft that next?