amduat-api/notes/Unified System C Struct Mapping and Binary Encoding Specification.md
2026-01-17 07:37:47 +01:00

NOTE: Superseded by tier1 TGK/1 and vendor/amduat/tier1/tgk-1-core.md; retained for historical context.
This is a formal draft of **C structure mappings and binary encoding** for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. It is meant for **C libraries, API frontends, and toolchains**, and aims to preserve memory alignment, determinism, and snapshot safety.
---
# Unified System C Struct Mapping and Binary Encoding Specification
---
## 1. Artifact & PER Records
### 1.1 Artifact Index Entry
```c
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;   // canonical ArtifactKey
    uint64_t block_id;       // CAS/ASL block ID
    uint32_t offset;         // offset within block
    uint32_t length;         // length in bytes
    uint32_t type_tag;       // optional type tag
    uint8_t  has_type_tag;   // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];    // padding so logseq starts at byte offset 32
    uint64_t logseq;         // monotonic log sequence
} artifact_index_entry_t;    // 40 bytes total
```
**Binary encoding**:
| Field | Bytes | Notes |
| ------------ | ----- | ----------------------- |
| artifact_key | 8 | canonical ID |
| block_id | 8 | ZFS CAS block reference |
| offset | 4 | offset in block |
| length | 4 | payload size |
| type_tag | 4 | optional type |
| has_type_tag | 1 | toggle |
| reserved | 3 | alignment padding |
| logseq | 8 | monotonic sequence |
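The byte widths in the table add up to 40 bytes per entry. A minimal sketch of how that layout could be pinned at compile time, assuming C11 and a GCC/Clang-compatible compiler; the constant `ARTIFACT_ENTRY_SIZE` is an illustrative name, not part of the spec:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

/* Hypothetical constant: the sum of the "Bytes" column in the table. */
enum { ARTIFACT_ENTRY_SIZE = 8 + 8 + 4 + 4 + 4 + 1 + 3 + 8 };

/* Fail the build if the compiler lays the struct out differently. */
static_assert(sizeof(artifact_index_entry_t) == ARTIFACT_ENTRY_SIZE,
              "artifact_index_entry_t must be 40 bytes");
static_assert(offsetof(artifact_index_entry_t, type_tag) == 24,
              "type_tag must follow offset and length");
static_assert(offsetof(artifact_index_entry_t, logseq) == 32,
              "reserved[3] must keep logseq on an 8-byte offset");
```

Checks like these cost nothing at run time and catch accidental field reordering or a missing `packed` attribute early.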
---
### 1.2 PER (PEL Execution Receipt) Record
```c
typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact; // embedded artifact info
    uint64_t  pel_program_id;             // PEL program DAG canonical ID
    uint32_t  input_count;                // number of input artifacts
    uint64_t *input_keys;                 // array of input_count ArtifactKeys (in-memory only)
    uint32_t  output_count;               // number of outputs
    uint64_t *output_keys;                // array of output_count ArtifactKeys (in-memory only)
} per_record_t;
```
**Encoding notes**:
* Base artifact encoding is identical to `artifact_index_entry_t`
* Followed by PEL-specific fields: `pel_program_id`, `input_count`, `input_keys[]`, `output_count`, `output_keys[]`
* Arrays are **length-prefixed** for serialization
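The PEL-specific part of the encoding could be sketched as follows. This assumes little-endian byte order (per Section 5) and a 32-bit count as the length prefix; the helper names `put_u32_le`, `put_u64_le`, `per_tail_size`, and `per_encode_tail` are hypothetical, and the base artifact entry is assumed to be written separately just before this tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value in little-endian order; returns bytes written. */
static size_t put_u32_le(uint8_t *dst, uint32_t v) {
    for (int i = 0; i < 4; i++) dst[i] = (uint8_t)(v >> (8 * i));
    return 4;
}

/* Write a 64-bit value in little-endian order; returns bytes written. */
static size_t put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++) dst[i] = (uint8_t)(v >> (8 * i));
    return 8;
}

/* Hypothetical helper: encoded size of the PEL-specific fields. */
static size_t per_tail_size(uint32_t input_count, uint32_t output_count) {
    return 8 + 4 + 8 * (size_t)input_count + 4 + 8 * (size_t)output_count;
}

/* Hypothetical helper: writes pel_program_id, then each key array preceded
 * by its uint32 count. Returns bytes written, or 0 if cap is too small. */
static size_t per_encode_tail(uint8_t *dst, size_t cap,
                              uint64_t pel_program_id,
                              const uint64_t *input_keys, uint32_t input_count,
                              const uint64_t *output_keys, uint32_t output_count) {
    assert(dst != NULL);
    if (cap < per_tail_size(input_count, output_count)) return 0;

    size_t off = 0;
    off += put_u64_le(dst + off, pel_program_id);
    off += put_u32_le(dst + off, input_count);        /* length prefix */
    for (uint32_t i = 0; i < input_count; i++)
        off += put_u64_le(dst + off, input_keys[i]);
    off += put_u32_le(dst + off, output_count);       /* length prefix */
    for (uint32_t i = 0; i < output_count; i++)
        off += put_u64_le(dst + off, output_keys[i]);
    return off;
}
```

Because each array carries its own count, a reader can walk the record without the in-memory pointers.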
---
## 2. TGK Edge Records
```c
#include <stdint.h>

#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;     // unique edge ID
    uint64_t from_nodes[MAX_FROM];  // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];      // to node ArtifactKeys
    uint32_t from_count;            // actual number of from nodes
    uint32_t to_count;              // actual number of to nodes
    uint32_t edge_type;             // type key
    uint8_t  roles;                 // bitmask of roles
    uint8_t  reserved[7];           // reserved; logseq lands at packed offset 284 (not 8-byte aligned)
    uint64_t logseq;                // log sequence
} tgk_edge_record_t;                // 292 bytes total
**Encoding notes**:
* Fixed-size arrays simplify SIMD processing
* `from_count` / `to_count` indicate how many leading entries of each array are valid
* Deterministic ordering is preserved by the composite key (`logseq`, `canonical_edge_id`)
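One way to realize the deterministic ordering is a plain `qsort` comparator. This is a sketch under the assumption that edges are ordered by `logseq` first and `canonical_edge_id` as the tie-breaker; the function names `tgk_edge_cmp` and `tgk_edges_sort` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;
    uint64_t from_nodes[MAX_FROM];
    uint64_t to_nodes[MAX_TO];
    uint32_t from_count;
    uint32_t to_count;
    uint32_t edge_type;
    uint8_t  roles;
    uint8_t  reserved[7];
    uint64_t logseq;
} tgk_edge_record_t;

/* qsort comparator: ascending logseq, ties broken by canonical_edge_id.
 * Uses comparisons rather than subtraction to avoid 64-bit overflow. */
static int tgk_edge_cmp(const void *a, const void *b) {
    const tgk_edge_record_t *x = (const tgk_edge_record_t *)a;
    const tgk_edge_record_t *y = (const tgk_edge_record_t *)b;
    if (x->logseq != y->logseq)
        return (x->logseq > y->logseq) - (x->logseq < y->logseq);
    return (x->canonical_edge_id > y->canonical_edge_id) -
           (x->canonical_edge_id < y->canonical_edge_id);
}

/* Sort an edge array in place into the assumed canonical order. */
static void tgk_edges_sort(tgk_edge_record_t *edges, size_t n) {
    assert(edges != NULL || n == 0);
    if (n > 1) qsort(edges, n, sizeof *edges, tgk_edge_cmp);
}
```

Since the key pair is unique per edge, the result does not depend on whether the underlying sort is stable.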
---
## 3. Shard-Local Buffers
```c
typedef struct {
    artifact_index_entry_t *artifacts;  // pointer to artifact array
    tgk_edge_record_t      *edges;      // pointer to TGK edges
    uint64_t                artifact_count;
    uint64_t                edge_count;
    snapshot_range_t        snapshot;   // snapshot bounds for this shard
} shard_buffer_t;
```
**Binary encoding**:
* Contiguous memory layout per shard for SIMD operations
* `artifact_count` and `edge_count` used for iteration
* `snapshot_range_t` defines `min_logseq` and `max_logseq` for safety
---
## 4. Execution Plan Structures
### 4.1 Operator Definition
```c
typedef enum {
    OP_SEGMENT_SCAN,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t        op_id;          // unique operator ID
    operator_type_t type;           // operator type (enum width is implementation-defined; fix it when serializing)
    uint32_t        input_count;    // number of inputs
    uint32_t        output_count;   // number of outputs
    uint32_t        params_length;  // length of serialized params
    uint8_t        *params;         // pointer to operator parameters (in-memory only)
    uint32_t        shard_id;       // shard this operator applies to
} operator_t;
```
* `params` contains **operator-specific configuration** (e.g., filter masks, edge_type keys)
* Operators are serialized sequentially in the execution plan
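Serializing one operator might look like the sketch below. The wire order is an assumption, since this note does not fix it: the six fixed-width fields in struct order (with `type` widened to 32 bits and the `params` pointer dropped), followed by the raw `params` bytes. `OPERATOR_HEADER_SIZE`, `put_u32_le`, and `operator_encode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    OP_SEGMENT_SCAN, OP_INDEX_FILTER, OP_MERGE, OP_TGK_TRAVERSAL,
    OP_PROJECTION, OP_AGGREGATION, OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t        op_id;
    operator_type_t type;
    uint32_t        input_count;
    uint32_t        output_count;
    uint32_t        params_length;
    uint8_t        *params;
    uint32_t        shard_id;
} operator_t;

/* Assumed fixed header: six uint32 fields, no pointer. */
enum { OPERATOR_HEADER_SIZE = 6 * 4 };

static void put_u32_le(uint8_t *dst, uint32_t v) {
    for (int i = 0; i < 4; i++) dst[i] = (uint8_t)(v >> (8 * i));
}

/* Writes header + params into dst. Returns bytes written, or 0 if cap is too small. */
static size_t operator_encode(uint8_t *dst, size_t cap, const operator_t *op) {
    assert(dst != NULL && op != NULL);
    assert(op->params != NULL || op->params_length == 0);

    size_t need = OPERATOR_HEADER_SIZE + (size_t)op->params_length;
    if (cap < need) return 0;

    put_u32_le(dst + 0,  op->op_id);
    put_u32_le(dst + 4,  (uint32_t)op->type);   /* enum encoded as fixed 32 bits */
    put_u32_le(dst + 8,  op->input_count);
    put_u32_le(dst + 12, op->output_count);
    put_u32_le(dst + 16, op->params_length);
    put_u32_le(dst + 20, op->shard_id);
    if (op->params_length > 0)
        memcpy(dst + OPERATOR_HEADER_SIZE, op->params, op->params_length);
    return need;
}
```

Calling `operator_encode` repeatedly, advancing the destination by each return value, yields the sequential operator stream described above.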
---
### 4.2 Execution Plan Serialization
```c
typedef struct __attribute__((packed)) {
    uint32_t         plan_id;         // unique plan ID
    uint32_t         operator_count;  // number of operators
    operator_t      *operators;       // pointer to operator array
    snapshot_range_t snapshot;        // snapshot bounds for execution
} execution_plan_t;
```
**Encoding**:
1. `plan_id` (4 bytes)
2. `operator_count` (4 bytes)
3. `snapshot_range_t` (min_logseq, max_logseq, 16 bytes)
4. Serialized operators (fixed-size header + variable `params`)
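Items 1 to 3 form a 24-byte plan header. A small round-trip sketch, assuming little-endian order (per Section 5) and `min_logseq` before `max_logseq`; `plan_header_t`, `PLAN_HEADER_SIZE`, and the encode/decode helpers are illustrative names rather than part of the spec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PLAN_HEADER_SIZE = 4 + 4 + 16 };

/* Hypothetical decoded view of the fixed part of an execution plan. */
typedef struct {
    uint32_t plan_id;
    uint32_t operator_count;
    uint64_t min_logseq;
    uint64_t max_logseq;
} plan_header_t;

/* Store the low nbytes of v in little-endian order. */
static void put_le(uint8_t *dst, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++) dst[i] = (uint8_t)(v >> (8 * i));
}

/* Load nbytes in little-endian order. */
static uint64_t get_le(const uint8_t *src, int nbytes) {
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++) v |= (uint64_t)src[i] << (8 * i);
    return v;
}

static void plan_header_encode(uint8_t *dst, const plan_header_t *h) {
    assert(dst != NULL && h != NULL);
    put_le(dst + 0,  h->plan_id, 4);
    put_le(dst + 4,  h->operator_count, 4);
    put_le(dst + 8,  h->min_logseq, 8);
    put_le(dst + 16, h->max_logseq, 8);
}

static void plan_header_decode(plan_header_t *h, const uint8_t *src) {
    assert(h != NULL && src != NULL);
    h->plan_id        = (uint32_t)get_le(src + 0, 4);
    h->operator_count = (uint32_t)get_le(src + 4, 4);
    h->min_logseq     = get_le(src + 8, 8);
    h->max_logseq     = get_le(src + 16, 8);
}
```

The serialized operators would start at byte offset `PLAN_HEADER_SIZE`, and `operator_count` tells the reader how many to expect.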
---
## 5. Binary Serialization Rules
1. **All structures packed** to prevent gaps (`__attribute__((packed))`)
2. **Canonical byte order**: little-endian for cross-platform compatibility
3. **Pointers** replaced by offsets in serialized form
4. Arrays (inputs, outputs, from/to nodes) **length-prefixed**
5. `logseq` + `canonical_id` used for deterministic ordering
---
## 6. Notes on SIMD / Shard Layout
* All arrays in `shard_buffer_t` are **contiguous and aligned to 64-byte boundaries** for vectorized loads
* Fixed-size arrays in `tgk_edge_record_t` simplify branchless SIMD filtering
* Serialization preserves shard boundaries for distributed execution and federation propagation
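The 64-byte alignment of shard arrays could be obtained with C11 `aligned_alloc`, as in the sketch below. It assumes a C11 hosted environment (MSVC, for instance, does not provide `aligned_alloc`); `round_up_64` and `shard_array_alloc` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { SHARD_ALIGNMENT = 64 };

/* Round n up to the next multiple of 64. C11 aligned_alloc expects the
 * size to be a multiple of the alignment. */
static size_t round_up_64(size_t n) {
    return (n + (SHARD_ALIGNMENT - 1)) & ~(size_t)(SHARD_ALIGNMENT - 1);
}

/* Hypothetical helper: allocate room for count records of elem_size bytes,
 * with the array starting on a 64-byte boundary. Contents are uninitialized.
 * Returns NULL on overflow or allocation failure; release with free(). */
static void *shard_array_alloc(size_t count, size_t elem_size) {
    assert(elem_size > 0);
    if (count > (SIZE_MAX - (SHARD_ALIGNMENT - 1)) / elem_size)
        return NULL;                       /* count * elem_size would overflow */
    size_t bytes = round_up_64(count * elem_size);
    if (bytes == 0) bytes = SHARD_ALIGNMENT;  /* avoid a zero-size request */
    return aligned_alloc(SHARD_ALIGNMENT, bytes);
}
```

Only the start of each array is 64-byte aligned this way. With the packed record sizes in this note (40 and 292 bytes), individual records inside the array generally fall on smaller boundaries.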
---
## 7. Summary
This C struct mapping and binary encoding specification:
* Covers **artifact, PER, TGK edge**, and **execution plan** structures
* Enables **snapshot-safe deterministic execution**
* Supports **SIMD/shard acceleration**
* Is ready for **C libraries, API frontends, and cross-node federation**
* Preserves **provenance, logseq ordering, and deterministic replay**
---
A logical next step would be **formalizing garbage collection and tombstone application rules** in binary form, so that unreachable artifacts can be removed safely while preserving determinism.