194 lines
6.4 KiB
Markdown
This document is a formal draft of the **C structure mappings and binary encoding** for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. It is meant for **C libraries, API frontends, and toolchains**, and aims to preserve memory alignment, determinism, and snapshot safety.

---

# Unified System C Struct Mapping and Binary Encoding Specification

---

## 1. Artifact & PER Records

### 1.1 Artifact Index Entry

```c
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;   // canonical ArtifactKey
    uint64_t block_id;       // CAS/ASL block ID
    uint32_t offset;         // offset within block
    uint32_t length;         // length in bytes
    uint32_t type_tag;       // optional type tag
    uint8_t  has_type_tag;   // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];    // padding so logseq starts at offset 32 (8-byte aligned)
    uint64_t logseq;         // monotonic log sequence
} artifact_index_entry_t;    // 40 bytes total
```

**Binary encoding**:

| Field        | Bytes | Notes                   |
| ------------ | ----- | ----------------------- |
| artifact_key | 8     | canonical ID            |
| block_id     | 8     | ZFS CAS block reference |
| offset       | 4     | offset in block         |
| length       | 4     | payload size            |
| type_tag     | 4     | optional type           |
| has_type_tag | 1     | toggle                  |
| reserved     | 3     | alignment padding       |
| logseq       | 8     | monotonic sequence      |
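
The table above can be checked against the struct at compile time and turned into a byte-exact encoder. The following is a minimal sketch, not the system's actual encoder: `put_u32_le`, `put_u64_le`, and `artifact_entry_encode` are hypothetical names, and the little-endian order is taken from the serialization rules in section 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct in section 1.1, repeated so the sketch compiles alone. */
typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

/* Compile-time checks that the struct matches the byte table. */
static_assert(sizeof(artifact_index_entry_t) == 40, "entry must be 40 bytes");
static_assert(offsetof(artifact_index_entry_t, type_tag) == 24, "type_tag offset");
static_assert(offsetof(artifact_index_entry_t, logseq) == 32, "logseq offset");

/* Hypothetical helpers: write integers least-significant byte first. */
static void put_u32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static void put_u64_le(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical encoder: fills exactly 40 bytes in table order. */
void artifact_entry_encode(const artifact_index_entry_t *e, uint8_t out[40]) {
    put_u64_le(out + 0,  e->artifact_key);
    put_u64_le(out + 8,  e->block_id);
    put_u32_le(out + 16, e->offset);
    put_u32_le(out + 20, e->length);
    put_u32_le(out + 24, e->type_tag);
    out[28] = e->has_type_tag;
    out[29] = out[30] = out[31] = 0;      /* reserved bytes are written as zero */
    put_u64_le(out + 32, e->logseq);
}
```

Writing each field byte by byte keeps the output independent of host endianness, so a plain `memcpy` of the struct is only equivalent on little-endian hosts.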

---

### 1.2 PER (PEL Execution Receipt) Record

```c
typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact; // embedded artifact info
    uint64_t pel_program_id;              // PEL program DAG canonical ID
    uint32_t input_count;                 // number of input artifacts
    uint64_t *input_keys;                 // in-memory array of input_count ArtifactKeys
    uint32_t output_count;                // number of outputs
    uint64_t *output_keys;                // in-memory array of output_count ArtifactKeys
} per_record_t;
```

**Encoding notes**:

* Base artifact encoding is identical to `artifact_index_entry_t`
* It is followed by the PEL-specific fields: `pel_program_id`, `input_count`, `input_keys[]`, `output_count`, `output_keys[]`
* The `input_keys` and `output_keys` pointers exist only in memory; on serialization each array is written inline, **length-prefixed** by its count
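
As an illustration of the length-prefixed layout described in these notes, the sketch below encodes only the PEL-specific tail that follows the 40-byte base entry. The function names (`per_tail_size`, `per_tail_encode`) are hypothetical, and the field order is assumed to match the order listed above; little-endian byte order is taken from section 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: write little-endian and return the bytes written. */
static size_t put_u32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
    return 4;
}

static size_t put_u64_le(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
    return 8;
}

/* Encoded size of: pel_program_id, input_count, inputs, output_count, outputs. */
size_t per_tail_size(uint32_t input_count, uint32_t output_count) {
    return 8 + 4 + 8 * (size_t)input_count + 4 + 8 * (size_t)output_count;
}

/* Returns the number of bytes written, or 0 if the buffer is too small. */
size_t per_tail_encode(uint64_t pel_program_id,
                       const uint64_t *input_keys, uint32_t input_count,
                       const uint64_t *output_keys, uint32_t output_count,
                       uint8_t *out, size_t cap) {
    size_t need = per_tail_size(input_count, output_count);
    if (cap < need) return 0;

    size_t n = 0;
    n += put_u64_le(out + n, pel_program_id);
    n += put_u32_le(out + n, input_count);            /* length prefix */
    for (uint32_t i = 0; i < input_count; i++)
        n += put_u64_le(out + n, input_keys[i]);
    n += put_u32_le(out + n, output_count);           /* length prefix */
    for (uint32_t i = 0; i < output_count; i++)
        n += put_u64_le(out + n, output_keys[i]);

    assert(n == need);
    return n;
}
```

A reader can then walk the record without any pointers: read the count, then that many 8-byte keys.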

---

## 2. TGK Edge Records

```c
#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;      // unique edge ID
    uint64_t from_nodes[MAX_FROM];   // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];       // to node ArtifactKeys
    uint32_t from_count;             // actual number of from nodes (<= MAX_FROM)
    uint32_t to_count;               // actual number of to nodes (<= MAX_TO)
    uint32_t edge_type;              // type key
    uint8_t  roles;                  // bitmask of roles
    uint8_t  reserved[3];            // padding so logseq starts at offset 280 (8-byte aligned)
    uint64_t logseq;                 // log sequence
} tgk_edge_record_t;                 // 288 bytes total
```

**Encoding notes**:

* Fixed-size arrays give every record the same size, which simplifies SIMD processing
* `from_count` / `to_count` give the number of valid entries; slots beyond the count are ignored
* Deterministic ordering is preserved by sorting on the pair (`logseq`, `canonical_edge_id`)
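
The ordering rule can be sketched as a `qsort` comparator. This is an illustration under assumptions: `edge_order_key_t` is a hypothetical reduced view holding only the two ordering fields, and `logseq` is assumed to be the primary key with `canonical_edge_id` as the tie-breaker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced view: only the ordering fields of tgk_edge_record_t. */
typedef struct {
    uint64_t logseq;
    uint64_t canonical_edge_id;
} edge_order_key_t;

/* Compare by logseq first, then by canonical_edge_id (assumed tie-breaker).
 * Explicit comparisons avoid the overflow that subtraction would risk. */
static int edge_order_cmp(const void *a, const void *b) {
    const edge_order_key_t *x = a;
    const edge_order_key_t *y = b;
    if (x->logseq != y->logseq)
        return x->logseq < y->logseq ? -1 : 1;
    if (x->canonical_edge_id != y->canonical_edge_id)
        return x->canonical_edge_id < y->canonical_edge_id ? -1 : 1;
    return 0;
}

/* Sorts keys in place into the deterministic order. */
void edge_keys_sort(edge_order_key_t *keys, size_t n) {
    assert(keys != NULL || n == 0);
    if (n > 1)
        qsort(keys, n, sizeof keys[0], edge_order_cmp);
}
```

Because the comparator defines a total order whenever the (logseq, ID) pairs are unique, the result does not depend on whether `qsort` is stable.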

---

## 3. Shard-Local Buffers

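```c
typedef struct {
    uint64_t min_logseq;                // lower snapshot bound
    uint64_t max_logseq;                // upper snapshot bound
} snapshot_range_t;                     // 16 bytes

typedef struct {
    artifact_index_entry_t *artifacts;  // pointer to artifact array
    tgk_edge_record_t      *edges;      // pointer to TGK edges
    uint64_t artifact_count;            // number of entries in artifacts
    uint64_t edge_count;                // number of entries in edges
    snapshot_range_t snapshot;          // snapshot bounds for this shard
} shard_buffer_t;
```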

**Binary encoding**:

* Contiguous memory layout per shard for SIMD operations
* `artifact_count` and `edge_count` bound iteration over the two arrays
* `snapshot_range_t` holds `min_logseq` and `max_logseq`, which bound the log sequences the shard may read (snapshot safety)
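
A snapshot check over a shard's log sequences might look like the sketch below. It assumes both bounds are inclusive, which the text does not state, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

/* Returns 1 if logseq lies inside the snapshot, 0 otherwise.
 * Assumption: both bounds are inclusive. */
int snapshot_contains(const snapshot_range_t *s, uint64_t logseq) {
    return logseq >= s->min_logseq && logseq <= s->max_logseq;
}

/* Counts how many of the n log sequences are visible in the snapshot.
 * Adding the 0/1 result keeps the loop body free of data-dependent branches. */
size_t snapshot_count_visible(const snapshot_range_t *s,
                              const uint64_t *logseqs, size_t n) {
    assert(s != NULL && (logseqs != NULL || n == 0));
    size_t visible = 0;
    for (size_t i = 0; i < n; i++)
        visible += (size_t)snapshot_contains(s, logseqs[i]);
    return visible;
}
```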

---

## 4. Execution Plan Structures

### 4.1 Operator Definition

```c
typedef enum {
    OP_SEGMENT_SCAN,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t op_id;            // unique operator ID
    uint32_t type;             // operator_type_t value, stored as a fixed-width uint32_t
    uint32_t input_count;      // number of inputs
    uint32_t output_count;     // number of outputs
    uint32_t params_length;    // length of serialized params, in bytes
    uint8_t *params;           // in-memory pointer to operator parameters
    uint32_t shard_id;         // shard this operator applies to
} operator_t;
```

* `params` contains **operator-specific configuration** (e.g., filter masks, edge_type keys)
* Operators are serialized sequentially in the execution plan

---

### 4.2 Execution Plan Serialization

```c
typedef struct __attribute__((packed)) {
    uint32_t plan_id;            // unique plan ID
    uint32_t operator_count;     // number of operators
    operator_t *operators;       // in-memory pointer to operator array
    snapshot_range_t snapshot;   // snapshot bounds for execution
} execution_plan_t;
```

**Encoding**:

1. `plan_id` (4 bytes)
2. `operator_count` (4 bytes)
3. `snapshot_range_t` (`min_logseq`, `max_logseq`; 16 bytes)
4. Serialized operators, each a fixed-size header followed by `params_length` bytes of `params`
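
The first three items form a fixed 24-byte plan header. The sketch below encodes and decodes that header only; `plan_header_t` and the function names are hypothetical, and the operator headers are left out because their serialized field order is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLAN_HEADER_SIZE 24u
static_assert(PLAN_HEADER_SIZE == 4 + 4 + 8 + 8, "plan header is 24 bytes");

/* Hypothetical flat view of the fixed-size plan header (no pointers). */
typedef struct {
    uint32_t plan_id;
    uint32_t operator_count;
    uint64_t min_logseq;
    uint64_t max_logseq;
} plan_header_t;

/* Write the low nbytes of v, least-significant byte first. */
static void put_le(uint8_t *p, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Read nbytes as a little-endian unsigned integer. */
static uint64_t get_le(const uint8_t *p, int nbytes) {
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

void plan_header_encode(const plan_header_t *h, uint8_t out[PLAN_HEADER_SIZE]) {
    put_le(out + 0,  h->plan_id, 4);
    put_le(out + 4,  h->operator_count, 4);
    put_le(out + 8,  h->min_logseq, 8);
    put_le(out + 16, h->max_logseq, 8);
}

/* Returns 0 on success, -1 if the buffer is too short or the range is inverted. */
int plan_header_decode(const uint8_t *in, size_t len, plan_header_t *h) {
    if (len < PLAN_HEADER_SIZE) return -1;
    h->plan_id        = (uint32_t)get_le(in + 0, 4);
    h->operator_count = (uint32_t)get_le(in + 4, 4);
    h->min_logseq     = get_le(in + 8, 8);
    h->max_logseq     = get_le(in + 16, 8);
    return h->min_logseq <= h->max_logseq ? 0 : -1;
}
```

The serialized operators would start at byte 24, immediately after this header.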

---

## 5. Binary Serialization Rules

1. **All serialized structures are packed** (`__attribute__((packed))`) to prevent compiler-inserted gaps
2. **Canonical byte order**: little-endian for cross-platform compatibility
3. **Pointers** are replaced by offsets in serialized form
4. Variable-length arrays (inputs, outputs) are **length-prefixed**; the fixed-size from/to node arrays carry explicit counts
5. The pair (`logseq`, `canonical_id`) is used for deterministic ordering

---

## 6. Notes on SIMD / Shard Layout

* All arrays in `shard_buffer_t` are **contiguous, with base addresses aligned to 64-byte boundaries** for vectorized loads
* Fixed-size arrays in `tgk_edge_record_t` simplify branchless SIMD filtering
* Serialization preserves shard boundaries for distributed execution and federation propagation
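
One way to obtain a 64-byte-aligned, contiguous array is C11 `aligned_alloc`. This is a sketch under assumptions: `shard_array_alloc` is a hypothetical helper, and it requires a hosted C11 library that provides `aligned_alloc` (MSVC, for example, does not).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHARD_ALIGN 64u

/* Allocates a zeroed array of count elements whose base address is a
 * multiple of 64. Returns NULL on overflow or allocation failure.
 * The caller releases the memory with free(). */
void *shard_array_alloc(size_t count, size_t elem_size) {
    /* Reject sizes whose padded byte count would overflow size_t. */
    if (elem_size != 0 && count > (SIZE_MAX - (SHARD_ALIGN - 1)) / elem_size)
        return NULL;

    size_t bytes = count * elem_size;
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t padded = (bytes + SHARD_ALIGN - 1) / SHARD_ALIGN * SHARD_ALIGN;
    if (padded == 0)
        padded = SHARD_ALIGN;

    void *p = aligned_alloc(SHARD_ALIGN, padded);
    if (p != NULL)
        memset(p, 0, padded);
    return p;
}
```

Only the base address is aligned this way. With 40-byte artifact entries, individual records still fall at offsets that are not multiples of 64, so vector code has to account for the record stride.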

---

## 7. Summary

This C struct mapping and binary encoding specification:

* Covers **artifact, PER, TGK edge**, and **execution plan** structures
* Enables **snapshot-safe deterministic execution**
* Supports **SIMD/shard acceleration**
* Is ready for **C libraries, API frontends, and cross-node federation**
* Preserves **provenance, logseq ordering, and deterministic replay**

---

A logical next step would be **formalizing garbage collection and tombstone application rules** in binary form, so that unreachable artifacts can be removed safely while preserving determinism.