2026-01-17 07:37:47 +01:00
NOTE: Superseded by tier1 TGK/1 and vendor/amduat/tier1/tgk-1-core.md; retained for historical context.
2026-01-17 00:19:49 +01:00
Absolutely. Here's a formal draft of **C structure mappings and binary encoding** for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. It is meant for **C libraries, API frontends, and toolchains**, preserving memory alignment, determinism, and snapshot safety.
---
# Unified System C Struct Mapping and Binary Encoding Specification
---
## 1. Artifact & PER Records
### 1.1 Artifact Index Entry
```c
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;  // canonical ArtifactKey
    uint64_t block_id;      // CAS/ASL block ID
    uint32_t offset;        // offset within block
    uint32_t length;        // length in bytes
    uint32_t type_tag;      // optional type tag
    uint8_t  has_type_tag;  // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];   // padding so logseq starts on an 8-byte boundary
    uint64_t logseq;        // monotonic log sequence
} artifact_index_entry_t;
```
**Binary encoding**:
| Field | Bytes | Notes |
| ------------ | ----- | ----------------------- |
| artifact_key | 8 | canonical ID |
| block_id | 8 | ZFS CAS block reference |
| offset | 4 | offset in block |
| length | 4 | payload size |
| type_tag | 4 | optional type |
| has_type_tag | 1 | toggle |
| reserved | 3 | alignment padding |
| logseq | 8 | monotonic sequence |
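The table sums to 40 bytes, with `logseq` at offset 32. A minimal sketch of how a toolchain might pin that layout at compile time, assuming GCC or Clang (for the `packed` attribute) and C11 `_Static_assert`; the helper `artifact_has_type` is hypothetical and only illustrates the 0/1 contract of `has_type_tag`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Mirror of artifact_index_entry_t from Section 1.1.
typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

// Compile-time checks: the in-memory layout must match the table byte for byte.
_Static_assert(offsetof(artifact_index_entry_t, block_id) == 8, "block_id offset");
_Static_assert(offsetof(artifact_index_entry_t, offset) == 16, "offset offset");
_Static_assert(offsetof(artifact_index_entry_t, length) == 20, "length offset");
_Static_assert(offsetof(artifact_index_entry_t, type_tag) == 24, "type_tag offset");
_Static_assert(offsetof(artifact_index_entry_t, has_type_tag) == 28, "has_type_tag offset");
_Static_assert(offsetof(artifact_index_entry_t, logseq) == 32, "logseq must be 8-byte aligned");
_Static_assert(sizeof(artifact_index_entry_t) == 40, "entry is 40 bytes");

// Hypothetical accessor: type_tag is meaningful only when has_type_tag == 1.
static inline int artifact_has_type(const artifact_index_entry_t *e) {
    assert(e->has_type_tag <= 1);  // the flag is strictly 0 or 1
    return e->has_type_tag == 1;
}
```

If a field is added or reordered, the build fails instead of silently producing a different wire layout.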
---
### 1.2 PER (PEL Execution Receipt) Record
```c
typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact;  // embedded artifact info
    uint64_t  pel_program_id;              // PEL program DAG canonical ID
    uint32_t  input_count;                 // number of input artifacts
    uint64_t *input_keys;                  // array of ArtifactKeys (in-memory pointer only)
    uint32_t  output_count;                // number of outputs
    uint64_t *output_keys;                 // array of ArtifactKeys (in-memory pointer only)
} per_record_t;
```
**Encoding notes**:
* Base artifact encoding is identical to `artifact_index_entry_t`
* Followed by PEL-specific fields: `pel_program_id`, `input_count`, `input_keys[]`, `output_count`, `output_keys[]`
* Arrays are **length-prefixed** for serialization
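A minimal encoder sketch for a PER record, assuming the field order listed above, the little-endian byte order from Section 5, zeroed reserved bytes, and the counts themselves acting as the length prefixes. The types `artifact_fields_t` and `per_view_t` and the functions `put_le`, `per_encoded_size`, and `per_encode` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: write the low `width` bytes of v at buf[pos], little-endian.
static size_t put_le(uint8_t *buf, size_t pos, uint64_t v, int width) {
    for (int i = 0; i < width; i++)
        buf[pos + i] = (uint8_t)(v >> (8 * i));
    return pos + (size_t)width;
}

typedef struct {  // hypothetical in-memory view of artifact_index_entry_t
    uint64_t artifact_key, block_id;
    uint32_t offset, length, type_tag;
    uint8_t  has_type_tag;
    uint64_t logseq;
} artifact_fields_t;

typedef struct {  // hypothetical in-memory view of per_record_t
    artifact_fields_t base;
    uint64_t pel_program_id;
    uint32_t input_count;
    const uint64_t *input_keys;
    uint32_t output_count;
    const uint64_t *output_keys;
} per_view_t;

static size_t per_encoded_size(const per_view_t *p) {
    return 40 + 8 + 4 + 8 * (size_t)p->input_count + 4 + 8 * (size_t)p->output_count;
}

// Encodes p into buf (caller supplies at least per_encoded_size(p) bytes).
static size_t per_encode(const per_view_t *p, uint8_t *buf) {
    const artifact_fields_t *a = &p->base;
    size_t pos = 0;
    pos = put_le(buf, pos, a->artifact_key, 8);   // 40-byte base artifact
    pos = put_le(buf, pos, a->block_id, 8);
    pos = put_le(buf, pos, a->offset, 4);
    pos = put_le(buf, pos, a->length, 4);
    pos = put_le(buf, pos, a->type_tag, 4);
    pos = put_le(buf, pos, a->has_type_tag, 1);
    pos = put_le(buf, pos, 0, 3);                 // reserved bytes are zero
    pos = put_le(buf, pos, a->logseq, 8);
    pos = put_le(buf, pos, p->pel_program_id, 8);
    pos = put_le(buf, pos, p->input_count, 4);    // length prefix
    for (uint32_t i = 0; i < p->input_count; i++)
        pos = put_le(buf, pos, p->input_keys[i], 8);
    pos = put_le(buf, pos, p->output_count, 4);   // length prefix
    for (uint32_t i = 0; i < p->output_count; i++)
        pos = put_le(buf, pos, p->output_keys[i], 8);
    assert(pos == per_encoded_size(p));
    return pos;
}
```

Writing byte by byte with shifts yields the same bytes on little- and big-endian hosts, which is what makes the encoding deterministic across nodes.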
---
## 2. TGK Edge Records
```c
#include <stdint.h>

#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;     // unique edge ID
    uint64_t from_nodes[MAX_FROM];  // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];      // to node ArtifactKeys
    uint32_t from_count;            // actual number of from nodes (<= MAX_FROM)
    uint32_t to_count;              // actual number of to nodes (<= MAX_TO)
    uint32_t edge_type;             // type key
    uint8_t  roles;                 // bitmask of roles
    uint8_t  reserved[3];           // padding so logseq starts on an 8-byte boundary (offset 280)
    uint64_t logseq;                // log sequence
} tgk_edge_record_t;
```
**Encoding notes**:
* Fixed-size array simplifies SIMD processing
* `from_count` / `to_count` indicate valid entries
* Deterministic ordering: records sort by `logseq`, with `canonical_edge_id` breaking ties
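A sketch of that ordering as a `qsort` comparator. It assumes the pair (`logseq`, `canonical_edge_id`) is unique per edge, so the comparator is a total order and the result does not depend on `qsort` being unstable. `edge_order_key_t`, `edge_order_cmp`, and `edges_sort_deterministic` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical sort key extracted from tgk_edge_record_t.
typedef struct {
    uint64_t logseq;
    uint64_t canonical_edge_id;
} edge_order_key_t;

// Lexicographic order: logseq first, canonical_edge_id breaks ties.
// Explicit comparisons avoid the overflow of subtracting uint64_t values.
static int edge_order_cmp(const void *a, const void *b) {
    const edge_order_key_t *x = a, *y = b;
    if (x->logseq != y->logseq)
        return x->logseq < y->logseq ? -1 : 1;
    if (x->canonical_edge_id != y->canonical_edge_id)
        return x->canonical_edge_id < y->canonical_edge_id ? -1 : 1;
    return 0;
}

static void edges_sort_deterministic(edge_order_key_t *keys, size_t n) {
    assert(keys != NULL || n == 0);
    qsort(keys, n, sizeof *keys, edge_order_cmp);
}
```

For keys `{5, 9}`, `{3, 7}`, `{5, 2}` the sorted order is `{3, 7}`, `{5, 2}`, `{5, 9}`.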
---
## 3. Shard-Local Buffers
```c
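#include <stdint.h>

typedef struct {
    uint64_t min_logseq;  // lower snapshot bound
    uint64_t max_logseq;  // upper snapshot bound
} snapshot_range_t;

typedef struct {
    artifact_index_entry_t *artifacts;  // pointer to artifact array
    tgk_edge_record_t      *edges;      // pointer to TGK edges
    uint64_t artifact_count;
    uint64_t edge_count;
    snapshot_range_t snapshot;          // snapshot bounds for this shard
} shard_buffer_t;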
```
**Memory layout**:
* Contiguous memory layout per shard for SIMD operations
* `artifact_count` and `edge_count` bound iteration over the two arrays
* `snapshot_range_t` defines `min_logseq` and `max_logseq`; records whose `logseq` falls outside this range are not visible to queries on this shard
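A sketch of the visibility rule, assuming both bounds are inclusive (the spec does not say) and that the `logseq` values have been gathered into a contiguous column, a common layout for vectorized scans. `logseq_visible` and `count_visible` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

// Assumption: both bounds are inclusive.
static int logseq_visible(const snapshot_range_t *s, uint64_t logseq) {
    return logseq >= s->min_logseq && logseq <= s->max_logseq;
}

// Count records visible in the snapshot from a contiguous column of logseq values.
static size_t count_visible(const uint64_t *logseqs, size_t n, const snapshot_range_t *s) {
    assert(s->min_logseq <= s->max_logseq);  // well-formed range
    size_t visible = 0;
    for (size_t i = 0; i < n; i++)
        visible += (size_t)logseq_visible(s, logseqs[i]);  // accumulate the 0/1 result
    return visible;
}
```

With a range of `[10, 20]` and logseqs `5, 10, 15, 20, 25`, three records are visible.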
---
## 4. Execution Plan Structures
### 4.1 Operator Definition
```c
typedef enum {
    OP_SEGMENT_SCAN,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t op_id;          // unique operator ID
    uint32_t type;           // operator_type_t value, fixed at 4 bytes (enum size is implementation-defined)
    uint32_t input_count;    // number of inputs
    uint32_t output_count;   // number of outputs
    uint32_t params_length;  // length of serialized params
    uint8_t *params;         // pointer to operator parameters (in-memory only)
    uint32_t shard_id;       // shard this operator applies to
} operator_t;
```
* `params` contains **operator-specific configuration** (e.g., filter masks, edge_type keys)
* Operators are serialized sequentially in the execution plan
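A sketch of one operator's serialized form, assuming the header holds the six `uint32_t` fields in declaration order (24 bytes, little-endian) with the `params` bytes copied inline immediately after it in place of the pointer. The header order, `operator_view_t`, `put_le32`, and `operator_encode` are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {  // hypothetical in-memory view of operator_t
    uint32_t op_id, type, input_count, output_count, params_length;
    const uint8_t *params;
    uint32_t shard_id;
} operator_view_t;

#define OP_HEADER_SIZE 24u  // six uint32_t fields

static void put_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

// Writes header then params; returns bytes written, or 0 if cap is too small.
static size_t operator_encode(const operator_view_t *op, uint8_t *buf, size_t cap) {
    size_t need = OP_HEADER_SIZE + (size_t)op->params_length;
    if (cap < need)
        return 0;
    assert(op->params_length == 0 || op->params != NULL);
    put_le32(buf + 0, op->op_id);
    put_le32(buf + 4, op->type);
    put_le32(buf + 8, op->input_count);
    put_le32(buf + 12, op->output_count);
    put_le32(buf + 16, op->params_length);
    put_le32(buf + 20, op->shard_id);
    if (op->params_length > 0)
        memcpy(buf + OP_HEADER_SIZE, op->params, op->params_length);
    return need;
}
```

For example, an `OP_TGK_TRAVERSAL` operator (enum value 3) with three bytes of params encodes to 27 bytes; a reader walks the plan by reading `params_length` from each header to find the next operator.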
---
### 4.2 Execution Plan Serialization
```c
typedef struct __attribute__((packed)) {
    uint32_t plan_id;           // unique plan ID
    uint32_t operator_count;    // number of operators
    operator_t *operators;      // pointer to operator array (in-memory only)
    snapshot_range_t snapshot;  // snapshot bounds for execution
} execution_plan_t;
```
**Encoding**:
1. `plan_id` (4 bytes)
2. `operator_count` (4 bytes)
3. `snapshot_range_t` (min_logseq, max_logseq, 16 bytes)
4. Serialized operators (fixed-size header + variable `params`)
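Fields 1 to 3 form a 24-byte plan header. A round-trip sketch, assuming little-endian encoding per Section 5; `plan_header_t`, `plan_header_encode`, and `plan_header_decode` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {  // hypothetical flat view of the plan header
    uint32_t plan_id;
    uint32_t operator_count;
    uint64_t min_logseq, max_logseq;
} plan_header_t;

#define PLAN_HEADER_SIZE 24u

static void put_le(uint8_t *p, uint64_t v, int n) {
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

static void plan_header_encode(const plan_header_t *h, uint8_t out[PLAN_HEADER_SIZE]) {
    assert(h->min_logseq <= h->max_logseq);  // producer invariant: well-formed snapshot
    put_le(out + 0, h->plan_id, 4);
    put_le(out + 4, h->operator_count, 4);
    put_le(out + 8, h->min_logseq, 8);
    put_le(out + 16, h->max_logseq, 8);
}

static plan_header_t plan_header_decode(const uint8_t in[PLAN_HEADER_SIZE]) {
    plan_header_t h;
    h.plan_id = (uint32_t)get_le(in + 0, 4);
    h.operator_count = (uint32_t)get_le(in + 4, 4);
    h.min_logseq = get_le(in + 8, 8);
    h.max_logseq = get_le(in + 16, 8);
    return h;
}
```

Operators follow at byte 24, and `operator_count` tells the reader how many to parse.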
---
## 5. Binary Serialization Rules
1. **All structures packed** to prevent compiler-inserted gaps (`__attribute__((packed))`); any padding is explicit `reserved` bytes
2. **Canonical byte order**: little-endian for cross-platform compatibility
3. **Pointers** replaced by offsets in serialized form
4. Arrays (inputs, outputs, from/to nodes) are **length-prefixed**
5. (`logseq`, canonical ID) pairs used for deterministic ordering
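Rule 3 can be sketched as an (offset, count) reference relative to the start of a serialized image, which a reader validates before turning it back into a pointer. `serialized_ref_t`, `make_ref`, and `resolve_ref` are hypothetical names, and measuring offsets from the image start is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical (offset, count) reference that replaces a pointer in serialized form.
typedef struct {
    uint64_t offset;  // byte offset from the start of the serialized image
    uint64_t count;   // number of elements (the length prefix)
} serialized_ref_t;

// Writer side: record where an array landed relative to the image base.
static serialized_ref_t make_ref(const uint8_t *image, const uint8_t *array_start, uint64_t count) {
    assert(array_start >= image);
    serialized_ref_t r = { (uint64_t)(array_start - image), count };
    return r;
}

// Reader side: return a pointer only if the whole array lies inside the image.
static const uint8_t *resolve_ref(const uint8_t *image, size_t image_len,
                                  serialized_ref_t ref, size_t elem_size) {
    assert(elem_size > 0);
    if (ref.offset > image_len)
        return NULL;
    size_t remaining = image_len - (size_t)ref.offset;
    if (ref.count > remaining / elem_size)  // division avoids count * elem_size overflow
        return NULL;
    return image + ref.offset;
}
```

Bounds-checking at resolve time matters because images may arrive from other nodes during federation and cannot be trusted blindly.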
---
## 6. Notes on SIMD / Shard Layout
* Each array in `shard_buffer_t` is **contiguous, with its base address aligned to a 64-byte boundary** for vectorized loads
* Fixed-size arrays in `tgk_edge_record_t` simplify branchless SIMD filtering
* Serialization preserves shard boundaries for distributed execution and federation propagation
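A sketch of allocating a shard array with a 64-byte-aligned base using C11 `aligned_alloc`, which requires the size to be a multiple of the alignment, so the byte count is rounded up. Only the array base is aligned: elements whose size is not a multiple of 64 (such as the 40-byte artifact entry) still straddle cache lines. `SHARD_ALIGN` and `shard_array_alloc` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SHARD_ALIGN 64u

// Allocate count elements of elem_size bytes with a 64-byte-aligned base.
// Returns NULL on zero-sized requests, size overflow, or allocation failure.
static void *shard_array_alloc(size_t count, size_t elem_size) {
    if (count == 0 || elem_size == 0)
        return NULL;
    if (count > SIZE_MAX / elem_size)
        return NULL;  // count * elem_size would overflow
    size_t bytes = count * elem_size;
    size_t rounded = (bytes + SHARD_ALIGN - 1) / SHARD_ALIGN * SHARD_ALIGN;
    if (rounded < bytes)
        return NULL;  // overflow while rounding up
    void *p = aligned_alloc(SHARD_ALIGN, rounded);
    assert(p == NULL || (uintptr_t)p % SHARD_ALIGN == 0);
    return p;
}
```

Memory from `aligned_alloc` is released with plain `free`.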
---
## 7. Summary
This C struct mapping and binary encoding specification:
* Covers **artifact, PER, TGK edge, shard buffer**, and **execution plan** structures
* Enables **snapshot-safe deterministic execution**
* Supports **SIMD/shard acceleration**
* Provides a basis for **C libraries, API frontends, and cross-node federation**
* Preserves **provenance, logseq ordering, and deterministic replay**
---
Next logical step could be **formalizing garbage collection and tombstone application rules** in binary form, ensuring unreachable artifacts can be removed safely while preserving determinism.
Do you want me to draft that next?