amduat-api/notes/Unified System C Struct Mapping and Binary Encoding Specification.md
2026-01-17 00:19:49 +01:00

This is a formal draft of the C structure mappings and binary encodings for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. It targets C libraries, API frontends, and toolchains, and is designed to preserve memory alignment, determinism, and snapshot safety.


Unified System C Struct Mapping and Binary Encoding Specification


1. Artifact & PER Records

1.1 Artifact Index Entry

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;   // canonical ArtifactKey
    uint64_t block_id;       // CAS/ASL block ID
    uint32_t offset;         // offset within block
    uint32_t length;         // length in bytes
    uint32_t type_tag;       // optional type tag
    uint8_t  has_type_tag;   // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];    // padding for 8-byte alignment
    uint64_t logseq;         // monotonic log sequence
} artifact_index_entry_t;

Binary encoding:

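| Field        | Bytes | Notes                   |
|--------------|-------|-------------------------|
| artifact_key | 8     | canonical ID            |
| block_id     | 8     | ZFS CAS block reference |
| offset       | 4     | offset in block         |
| length       | 4     | payload size            |
| type_tag     | 4     | optional type           |
| has_type_tag | 1     | toggle                  |
| reserved     | 3     | alignment padding       |
| logseq       | 8     | monotonic sequence      |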
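The padding claim can be checked at compile time. The sketch below restates the section 1.1 struct and uses C11 `_Static_assert` with `offsetof` to pin each offset and the 40-byte total. `artifact_entry_make` is a hypothetical helper. Zero-filling `reserved[]` so that equal entries produce equal bytes is an assumption; the spec does not state it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout from section 1.1 (__attribute__((packed)) is a GCC/Clang extension). */
typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

/* Compile-time checks that field offsets match the byte table. */
_Static_assert(offsetof(artifact_index_entry_t, block_id) == 8, "block_id");
_Static_assert(offsetof(artifact_index_entry_t, offset) == 16, "offset");
_Static_assert(offsetof(artifact_index_entry_t, length) == 20, "length");
_Static_assert(offsetof(artifact_index_entry_t, type_tag) == 24, "type_tag");
_Static_assert(offsetof(artifact_index_entry_t, has_type_tag) == 28, "has_type_tag");
_Static_assert(offsetof(artifact_index_entry_t, reserved) == 29, "reserved");
_Static_assert(offsetof(artifact_index_entry_t, logseq) == 32, "logseq is 8-byte aligned");
_Static_assert(sizeof(artifact_index_entry_t) == 40, "entry is 40 bytes");

/* Hypothetical constructor: zeroes every byte first (including reserved[])
 * and clears type_tag when no tag is present (zero-fill is an assumption). */
static artifact_index_entry_t artifact_entry_make(uint64_t key, uint64_t block_id,
                                                  uint32_t offset, uint32_t length,
                                                  int has_tag, uint32_t tag,
                                                  uint64_t logseq) {
    artifact_index_entry_t e;
    memset(&e, 0, sizeof e);
    e.artifact_key = key;
    e.block_id = block_id;
    e.offset = offset;
    e.length = length;
    e.has_type_tag = has_tag ? 1 : 0;
    e.type_tag = has_tag ? tag : 0;
    e.logseq = logseq;
    return e;
}
```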

1.2 PER (PEL Execution Receipt) Record

typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact;  // embedded artifact info
    uint64_t pel_program_id;               // PEL program DAG canonical ID
    uint32_t input_count;                  // number of input artifacts
    uint64_t *input_keys;                  // array of ArtifactKeys
    uint32_t output_count;                 // number of outputs
    uint64_t *output_keys;                 // array of ArtifactKeys
} per_record_t;

Encoding notes:

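  • The embedded base_artifact is encoded exactly like artifact_index_entry_t (40 bytes)
  • It is followed by the PEL-specific fields: pel_program_id, input_count, input_keys[], output_count, output_keys[]
  • Arrays are length-prefixed for serialization: input_count precedes input_keys[] and output_count precedes output_keys[]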
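The following is a minimal PER serializer. It rests on three assumptions:

- The base artifact is written field by field in the order of the 1.1 table.
- All integers are little-endian (section 5).
- Each key array immediately follows its count.

The helper names (`put_u32`, `put_u64`, `per_encoded_size`, `per_encode`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact;
    uint64_t pel_program_id;
    uint32_t input_count;
    uint64_t *input_keys;
    uint32_t output_count;
    uint64_t *output_keys;
} per_record_t;

/* Little-endian writers; each returns the position just past the bytes written. */
static uint8_t *put_u32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 4;
}

static uint8_t *put_u64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 8;
}

/* 40-byte base + pel_program_id + two 4-byte counts + 8 bytes per key. */
static size_t per_encoded_size(const per_record_t *r) {
    return 40 + 8 + 4 + 4 + 8 * ((size_t)r->input_count + r->output_count);
}

/* Writes r to out, which must hold per_encoded_size(r) bytes; returns bytes written. */
static size_t per_encode(const per_record_t *r, uint8_t *out) {
    const artifact_index_entry_t *a = &r->base_artifact;
    uint8_t *p = out;
    p = put_u64(p, a->artifact_key);
    p = put_u64(p, a->block_id);
    p = put_u32(p, a->offset);
    p = put_u32(p, a->length);
    p = put_u32(p, a->type_tag);
    *p++ = a->has_type_tag;
    for (int i = 0; i < 3; i++) *p++ = a->reserved[i];
    p = put_u64(p, a->logseq);
    p = put_u64(p, r->pel_program_id);
    p = put_u32(p, r->input_count);                 /* length prefix */
    for (uint32_t i = 0; i < r->input_count; i++) p = put_u64(p, r->input_keys[i]);
    p = put_u32(p, r->output_count);                /* length prefix */
    for (uint32_t i = 0; i < r->output_count; i++) p = put_u64(p, r->output_keys[i]);
    return (size_t)(p - out);
}
```

With two inputs and one output, the record encodes to 40 + 8 + 4 + 16 + 4 + 8 = 80 bytes. Under these assumptions, input_count sits at byte offset 48.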

2. TGK Edge Records

#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;             // unique edge ID
    uint64_t from_nodes[MAX_FROM];          // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];              // to node ArtifactKeys
    uint32_t from_count;                    // actual number of from nodes
    uint32_t to_count;                      // actual number of to nodes
    uint32_t edge_type;                     // type key
    uint8_t  roles;                         // bitmask of roles
    uint8_t  reserved[3];                   // padding: logseq starts at offset 280 (8-byte aligned)
    uint64_t logseq;                        // log sequence
} tgk_edge_record_t;

Encoding notes:

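  • Fixed-size arrays (MAX_FROM and MAX_TO slots) give every record the same size, which simplifies SIMD processing
  • from_count and to_count give the number of valid entries in from_nodes and to_nodes (at most MAX_FROM and MAX_TO)
  • Deterministic ordering is defined by logseq together with canonical_edge_id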
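One way to realize this ordering is a total-order comparator for `qsort`. This sketch assumes ascending order, with canonical_edge_id breaking ties between equal logseq values. It uses a hypothetical `edge_order_key_t` that holds only the two ordering fields. Because canonical_edge_id is unique, no two distinct edges compare equal, so the unstable `qsort` still yields the same order on every run.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in holding only the ordering fields of tgk_edge_record_t. */
typedef struct {
    uint64_t logseq;
    uint64_t canonical_edge_id;
} edge_order_key_t;

/* Ascending by logseq, then by canonical_edge_id. Uses comparisons rather than
 * subtraction so large uint64_t values cannot overflow the int result. */
static int edge_order_cmp(const void *lhs, const void *rhs) {
    const edge_order_key_t *a = lhs, *b = rhs;
    if (a->logseq != b->logseq) return a->logseq < b->logseq ? -1 : 1;
    if (a->canonical_edge_id != b->canonical_edge_id)
        return a->canonical_edge_id < b->canonical_edge_id ? -1 : 1;
    return 0;
}

static void edges_sort_deterministic(edge_order_key_t *keys, size_t n) {
    qsort(keys, n, sizeof *keys, edge_order_cmp);
}
```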

3. Shard-Local Buffers

typedef struct {
    artifact_index_entry_t *artifacts; // pointer to artifact array
    tgk_edge_record_t      *edges;     // pointer to TGK edges
    uint64_t artifact_count;
    uint64_t edge_count;
    snapshot_range_t snapshot;         // snapshot bounds for this shard
} shard_buffer_t;

Binary encoding:

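  • Contiguous memory layout per shard for SIMD operations
  • artifact_count and edge_count give the number of entries in artifacts and edges
  • snapshot_range_t holds min_logseq and max_logseq; only records whose logseq falls within this range are visible to the shard's snapshot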
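The section does not define `snapshot_range_t`. The sketch below makes two assumptions: a plain two-field struct, and inclusive bounds on both ends. It counts the artifacts a shard exposes for its snapshot. The struct tag `tgk_edge_record_s` is hypothetical; it only lets the edge pointer be declared without repeating section 2.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

/* Assumed definition: the spec names only min_logseq and max_logseq. */
typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

/* Incomplete here; full layout in section 2 (tag name is hypothetical). */
typedef struct tgk_edge_record_s tgk_edge_record_t;

typedef struct {
    artifact_index_entry_t *artifacts;
    tgk_edge_record_t      *edges;
    uint64_t artifact_count;
    uint64_t edge_count;
    snapshot_range_t snapshot;
} shard_buffer_t;

/* Assumption: both bounds are inclusive. */
static int snapshot_contains(const snapshot_range_t *s, uint64_t logseq) {
    return logseq >= s->min_logseq && logseq <= s->max_logseq;
}

/* Counts artifacts in this shard whose logseq lies inside the shard's snapshot. */
static uint64_t shard_count_visible_artifacts(const shard_buffer_t *sb) {
    uint64_t visible = 0;
    for (uint64_t i = 0; i < sb->artifact_count; i++)
        visible += (uint64_t)snapshot_contains(&sb->snapshot, sb->artifacts[i].logseq);
    return visible;
}
```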

4. Execution Plan Structures

4.1 Operator Definition

typedef enum {
    OP_SEGMENT_SCAN,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t op_id;                     // unique operator ID
    operator_type_t type;               // operator type; enum size is implementation-defined, so encode as uint32_t
    uint32_t input_count;               // number of inputs
    uint32_t output_count;              // number of outputs
    uint32_t params_length;             // length of serialized params
    uint8_t  *params;                   // pointer to operator parameters
    uint32_t shard_id;                  // shard this operator applies to
} operator_t;

Encoding notes:

  • params holds operator-specific configuration (e.g., filter masks, edge_type keys); params_length gives its size in bytes
  • Operators are serialized sequentially, in the order they appear in the execution plan
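The following sketch serializes an operator under two assumptions. First, the header is the six 4-byte fields in declaration order with the `params` pointer dropped (24 bytes). Second, the `params_length` bytes of `params` follow the header directly. `type` is written explicitly as a `uint32_t`, since the size of an enum is implementation-defined. `operator_encode` and `OPERATOR_HEADER_BYTES` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { OP_SEGMENT_SCAN, OP_INDEX_FILTER, OP_MERGE, OP_TGK_TRAVERSAL,
               OP_PROJECTION, OP_AGGREGATION, OP_TOMBSTONE_SHADOW } operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t op_id;
    operator_type_t type;
    uint32_t input_count;
    uint32_t output_count;
    uint32_t params_length;
    uint8_t  *params;
    uint32_t shard_id;
} operator_t;

#define OPERATOR_HEADER_BYTES 24  /* six 4-byte fields; the params pointer is not written */

static uint8_t *put_u32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));  /* little-endian */
    return p + 4;
}

/* Writes the header, then the raw params bytes; returns total bytes written.
 * out must hold OPERATOR_HEADER_BYTES + op->params_length bytes. */
static size_t operator_encode(const operator_t *op, uint8_t *out) {
    uint8_t *p = out;
    p = put_u32(p, op->op_id);
    p = put_u32(p, (uint32_t)op->type);  /* fixed 4 bytes whatever sizeof(enum) is */
    p = put_u32(p, op->input_count);
    p = put_u32(p, op->output_count);
    p = put_u32(p, op->params_length);
    p = put_u32(p, op->shard_id);
    if (op->params_length > 0)
        memcpy(p, op->params, op->params_length);
    return OPERATOR_HEADER_BYTES + (size_t)op->params_length;
}
```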

4.2 Execution Plan Serialization

typedef struct __attribute__((packed)) {
    uint32_t plan_id;                   // unique plan ID
    uint32_t operator_count;            // number of operators
    operator_t *operators;              // pointer to operator array
    snapshot_range_t snapshot;          // snapshot bounds for execution
} execution_plan_t;

Encoding:

  1. plan_id (4 bytes)
  2. operator_count (4 bytes)
  3. snapshot_range_t (min_logseq, max_logseq, 16 bytes)
  4. Serialized operators (fixed-size header + variable params)
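The plan header in the list above is 4 + 4 + 16 = 24 bytes. The sketch below writes it, assuming the same two-field `snapshot_range_t` and little-endian integers. The operators would follow, encoded one after another. `operator_s` is a hypothetical tag used only to declare the pointer, and `plan_header_encode` is likewise hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed definition: section 4.2 gives only the two fields and 16 bytes. */
typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

typedef struct operator_s operator_t;  /* incomplete here; see section 4.1 */

typedef struct __attribute__((packed)) {
    uint32_t plan_id;
    uint32_t operator_count;
    operator_t *operators;
    snapshot_range_t snapshot;
} execution_plan_t;

#define PLAN_HEADER_BYTES 24

static uint8_t *put_u32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 4;
}

static uint8_t *put_u64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 8;
}

/* Writes the header in serialization order (not struct order); the operators
 * pointer is skipped. Returns bytes written (PLAN_HEADER_BYTES). */
static size_t plan_header_encode(const execution_plan_t *plan, uint8_t *out) {
    uint8_t *p = out;
    p = put_u32(p, plan->plan_id);
    p = put_u32(p, plan->operator_count);
    p = put_u64(p, plan->snapshot.min_logseq);
    p = put_u64(p, plan->snapshot.max_logseq);
    return (size_t)(p - out);
}
```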

5. Binary Serialization Rules

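  1. All serialized structures are packed (__attribute__((packed)), a GCC/Clang extension) so the compiler inserts no implicit padding; alignment padding is spelled out as reserved fields
  2. Canonical byte order is little-endian on every platform for cross-platform compatibility; big-endian hosts byte-swap when encoding and decoding
  3. Pointers are replaced by offsets in serialized form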
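  4. Variable-length arrays (PER inputs and outputs) are length-prefixed; TGK from/to node arrays are fixed-size, with from_count and to_count marking the valid entries
  5. logseq plus the record's canonical ID (e.g., canonical_edge_id) determines deterministic ordering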
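Rules 2 and 3 can be illustrated with host-independent helpers. Shifting bytes out one at a time yields the same little-endian bytes on any CPU, which a `memcpy` of a native integer does not. `put_le64`, `get_le64`, and `ptr_to_offset` are hypothetical names. Measuring offsets from the start of the serialized buffer is an assumption about what "offset" means here.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes v as 8 little-endian bytes regardless of host byte order. */
static void put_le64(uint8_t out[8], uint64_t v) {
    for (int i = 0; i < 8; i++) out[i] = (uint8_t)(v >> (8 * i));
}

/* Decodes 8 little-endian bytes back into a native uint64_t. */
static uint64_t get_le64(const uint8_t in[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)in[i] << (8 * i);
    return v;
}

/* Rule 3 sketch: a pointer into a serialized buffer is stored as its byte
 * offset from the buffer start (assumed convention). */
static uint64_t ptr_to_offset(const uint8_t *base, const void *ptr) {
    return (uint64_t)((const uint8_t *)ptr - base);
}
```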

6. Notes on SIMD / Shard Layout

  • The artifacts and edges arrays in shard_buffer_t are each contiguous and start on a 64-byte boundary for vectorized loads
  • Fixed-size arrays in tgk_edge_record_t simplify branchless SIMD filtering
  • Serialization preserves shard boundaries for distributed execution and federation propagation
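The sketch below illustrates the 64-byte alignment and branchless filtering described above. `alloc_aligned64` uses C11 `aligned_alloc`. `filter_edge_type` assumes edge types have been gathered into a separate contiguous `uint32_t` column. That is a hypothetical layout, since the spec's shard buffer stores whole records. The filter is a scalar loop written in branch-free style, not explicit SIMD intrinsics. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for n elements of elem_size bytes with the array start on a
 * 64-byte boundary. C11 aligned_alloc expects the size to be a multiple of the
 * alignment, so the byte count is rounded up to the next multiple of 64. */
static void *alloc_aligned64(size_t n, size_t elem_size) {
    if (elem_size != 0 && n > SIZE_MAX / elem_size) return NULL;  /* overflow */
    size_t bytes = n * elem_size;
    if (bytes > SIZE_MAX - 63) return NULL;
    size_t rounded = (bytes + 63) & ~(size_t)63;
    return aligned_alloc(64, rounded ? rounded : 64);
}

/* Branchless compaction: writes the indices of entries equal to `wanted` into
 * out_idx (capacity n) and returns the match count. Each iteration stores
 * unconditionally and only the cursor advance depends on the comparison, so
 * the loop body contains no conditional branch. */
static size_t filter_edge_type(const uint32_t *edge_types, size_t n,
                               uint32_t wanted, size_t *out_idx) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        out_idx[m] = i;
        m += (size_t)(edge_types[i] == wanted);
    }
    return m;
}
```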

7. Summary

This C struct mapping and binary encoding specification:

  • Covers artifact, PER, TGK edge, shard buffer, and execution plan structures
  • Enables snapshot-safe deterministic execution
  • Supports SIMD/shard acceleration
  • Targets C libraries, API frontends, and cross-node federation
  • Preserves provenance, logseq ordering, and deterministic replay

A logical next step is to formalize garbage collection and tombstone application rules in binary form, so that unreachable artifacts can be removed safely while preserving determinism.
