Here is a formal draft of the **C structure mappings and binary encoding** for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. It is meant for **C libraries, API frontends, and toolchains**, and it aims to preserve memory alignment, determinism, and snapshot safety.

---

# Unified System C Struct Mapping and Binary Encoding Specification

---

## 1. Artifact & PER Records

### 1.1 Artifact Index Entry

```c
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;   // canonical ArtifactKey
    uint64_t block_id;       // CAS/ASL block ID
    uint32_t offset;         // offset within block
    uint32_t length;         // length in bytes
    uint32_t type_tag;       // optional type tag
    uint8_t  has_type_tag;   // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];    // padding so logseq starts on an 8-byte boundary (offset 32)
    uint64_t logseq;         // monotonic log sequence
} artifact_index_entry_t;    // 40 bytes in total
```

**Binary encoding**:

| Field        | Bytes | Notes                   |
| ------------ | ----- | ----------------------- |
| artifact_key | 8     | canonical ID            |
| block_id     | 8     | ZFS CAS block reference |
| offset       | 4     | offset in block         |
| length       | 4     | payload size            |
| type_tag     | 4     | optional type           |
| has_type_tag | 1     | toggle                  |
| reserved     | 3     | alignment padding       |
| logseq       | 8     | monotonic sequence      |

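The field table above can be turned into an explicit encoder. The following is a minimal sketch, not the project's actual serializer: the helper names (`put_u32_le`, `put_u64_le`, `artifact_entry_encode`) are hypothetical, and zeroing `type_tag` and the reserved bytes when they carry no information is an assumption made here so that equal entries always produce equal bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARTIFACT_ENTRY_WIRE_SIZE 40u  /* 8+8+4+4+4+1+3+8, per the field table */

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

_Static_assert(sizeof(artifact_index_entry_t) == ARTIFACT_ENTRY_WIRE_SIZE,
               "packed struct must match the 40-byte wire layout");

/* Hypothetical helpers: write an integer in little-endian byte order. */
static uint8_t *put_u32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 4;
}

static uint8_t *put_u64_le(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 8;
}

/* Encode one entry field by field; returns the number of bytes written (40). */
static size_t artifact_entry_encode(const artifact_index_entry_t *e,
                                    uint8_t out[ARTIFACT_ENTRY_WIRE_SIZE]) {
    assert(e != NULL && out != NULL);
    uint8_t *p = out;
    p = put_u64_le(p, e->artifact_key);
    p = put_u64_le(p, e->block_id);
    p = put_u32_le(p, e->offset);
    p = put_u32_le(p, e->length);
    /* Assumption: an absent type tag is written as 0. */
    p = put_u32_le(p, e->has_type_tag ? e->type_tag : 0);
    *p++ = e->has_type_tag ? 1 : 0;
    *p++ = 0; *p++ = 0; *p++ = 0;   /* reserved bytes zeroed */
    p = put_u64_le(p, e->logseq);
    return (size_t)(p - out);
}
```

Writing each field explicitly, instead of copying the struct with `memcpy`, keeps the output little-endian on big-endian hosts as well.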
---

### 1.2 PER (PEL Execution Receipt) Record

```c
typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact; // embedded artifact info
    uint64_t  pel_program_id;             // PEL program DAG canonical ID
    uint32_t  input_count;                // number of input artifacts
    uint64_t *input_keys;                 // array of input_count ArtifactKeys (in-memory pointer)
    uint32_t  output_count;               // number of outputs
    uint64_t *output_keys;                // array of output_count ArtifactKeys (in-memory pointer)
} per_record_t;
```

**Encoding notes**:

* The base artifact encoding is identical to `artifact_index_entry_t`
* It is followed by the PEL-specific fields, in this order: `pel_program_id`, `input_count`, `input_keys[]`, `output_count`, `output_keys[]`
* Arrays are **length-prefixed** for serialization: the count field written before each array serves as its length prefix

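The length-prefixed layout of the PEL-specific fields can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: `per_tail_t`, `per_tail_wire_size`, and `per_tail_encode` are hypothetical names, the 40-byte base artifact is assumed to be written separately before this tail, and all integers are assumed to be little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the PEL-specific fields of per_record_t. */
typedef struct {
    uint64_t        pel_program_id;
    uint32_t        input_count;
    const uint64_t *input_keys;
    uint32_t        output_count;
    const uint64_t *output_keys;
} per_tail_t;

static uint8_t *put_u32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 4;
}

static uint8_t *put_u64_le(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 8;
}

/* Bytes needed: program id, then two (count, keys[]) groups. */
static size_t per_tail_wire_size(const per_tail_t *t) {
    return 8 + 4 + 8 * (size_t)t->input_count
             + 4 + 8 * (size_t)t->output_count;
}

/* Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t per_tail_encode(const per_tail_t *t, uint8_t *out, size_t cap) {
    assert(t != NULL && out != NULL);
    if (cap < per_tail_wire_size(t)) return 0;
    uint8_t *p = out;
    p = put_u64_le(p, t->pel_program_id);
    p = put_u32_le(p, t->input_count);            /* length prefix */
    for (uint32_t i = 0; i < t->input_count; i++)
        p = put_u64_le(p, t->input_keys[i]);
    p = put_u32_le(p, t->output_count);           /* length prefix */
    for (uint32_t i = 0; i < t->output_count; i++)
        p = put_u64_le(p, t->output_keys[i]);
    return (size_t)(p - out);
}
```

The pointers themselves are never written; only the counts and the keys they point to reach the output buffer.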
---

## 2. TGK Edge Records

```c
#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;      // unique edge ID
    uint64_t from_nodes[MAX_FROM];   // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];       // to node ArtifactKeys
    uint32_t from_count;             // actual number of from nodes
    uint32_t to_count;               // actual number of to nodes
    uint32_t edge_type;              // type key
    uint8_t  roles;                  // bitmask of roles
    uint8_t  reserved[3];            // padding so logseq starts on an 8-byte boundary (offset 280);
                                     // 7 bytes here would place it at offset 284
    uint64_t logseq;                 // log sequence
} tgk_edge_record_t;                 // 288 bytes in total
```

**Encoding notes**:

* Fixed-size arrays give every record the same size, which simplifies SIMD processing
* `from_count` / `to_count` give the number of valid entries at the start of each array; the remaining slots are unused
* Deterministic ordering is preserved by sorting on the pair (`logseq`, `canonical_edge_id`)

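One way to read the ordering rule is as a lexicographic comparison: `logseq` first, `canonical_edge_id` as the tie-breaker. The sketch below assumes that reading. To stay short it sorts a reduced stand-in type, `edge_order_key_t`, that holds only the two ordering fields; that type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced key: only the fields that define edge order. */
typedef struct {
    uint64_t logseq;
    uint64_t canonical_edge_id;
} edge_order_key_t;

/* qsort comparator: order by logseq, then by canonical_edge_id. */
static int edge_order_cmp(const void *a, const void *b) {
    const edge_order_key_t *x = a;
    const edge_order_key_t *y = b;
    if (x->logseq != y->logseq)
        return x->logseq < y->logseq ? -1 : 1;
    if (x->canonical_edge_id != y->canonical_edge_id)
        return x->canonical_edge_id < y->canonical_edge_id ? -1 : 1;
    return 0;
}

/* Sort keys in place into the deterministic order. */
static void edge_keys_sort(edge_order_key_t *keys, size_t n) {
    assert(keys != NULL || n == 0);
    if (n > 1)
        qsort(keys, n, sizeof *keys, edge_order_cmp);
}
```

Comparing with `<` instead of subtracting avoids overflow when the 64-bit values are far apart.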
---

## 3. Shard-Local Buffers

```c
typedef struct {
    uint64_t min_logseq;                // lower snapshot bound
    uint64_t max_logseq;                // upper snapshot bound
} snapshot_range_t;                     // 16 bytes

typedef struct {
    artifact_index_entry_t *artifacts;  // pointer to artifact array
    tgk_edge_record_t *edges;           // pointer to TGK edges
    uint64_t artifact_count;            // number of entries in artifacts
    uint64_t edge_count;                // number of entries in edges
    snapshot_range_t snapshot;          // snapshot bounds for this shard
} shard_buffer_t;
```

**Binary encoding**:

* Contiguous memory layout per shard for SIMD operations
* `artifact_count` and `edge_count` bound iteration over the two arrays
* `snapshot_range_t` defines `min_logseq` and `max_logseq` for snapshot safety

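A snapshot check over a shard's artifact array might look like the sketch below. It assumes that both snapshot bounds are inclusive, which the text does not state, and `shard_count_visible` is a hypothetical helper name. The loop body is written without branches so a compiler has a chance to vectorize it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

/* Count entries whose logseq lies inside the snapshot.
 * Assumption: both bounds are inclusive. */
static uint64_t shard_count_visible(const artifact_index_entry_t *artifacts,
                                    uint64_t artifact_count,
                                    snapshot_range_t snap) {
    assert(artifacts != NULL || artifact_count == 0);
    uint64_t visible = 0;
    for (uint64_t i = 0; i < artifact_count; i++) {
        uint64_t seq = artifacts[i].logseq;
        /* Each comparison yields 0 or 1; '&' combines them without a branch. */
        visible += (uint64_t)((seq >= snap.min_logseq) & (seq <= snap.max_logseq));
    }
    return visible;
}
```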
---

## 4. Execution Plan Structures

### 4.1 Operator Definition

```c
typedef enum {
    OP_SEGMENT_SCAN,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t op_id;          // unique operator ID
    uint32_t type;           // operator_type_t value, held in a fixed 32-bit field
                             // because the width of an enum is implementation-defined
    uint32_t input_count;    // number of inputs
    uint32_t output_count;   // number of outputs
    uint32_t params_length;  // length of serialized params, in bytes
    uint8_t *params;         // pointer to operator parameters (in-memory only)
    uint32_t shard_id;       // shard this operator applies to
} operator_t;
```

* `params` contains **operator-specific configuration** (e.g., filter masks, edge_type keys)
* Operators are serialized sequentially in the execution plan

---

### 4.2 Execution Plan Serialization

```c
typedef struct __attribute__((packed)) {
    uint32_t plan_id;            // unique plan ID
    uint32_t operator_count;     // number of operators
    operator_t *operators;       // pointer to operator array (in-memory only)
    snapshot_range_t snapshot;   // snapshot bounds for execution
} execution_plan_t;
```

**Encoding**:

1. `plan_id` (4 bytes)
2. `operator_count` (4 bytes)
3. `snapshot_range_t` (`min_logseq` and `max_logseq`, 8 bytes each, 16 bytes in total)
4. Serialized operators (fixed-size header + variable `params`)

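The four encoding steps can be sketched as two small encoders, one for the 24-byte plan header and one for a single operator. Several details here are assumptions, not part of the specification: the operator header is taken to be the six 32-bit fields of `operator_t` in declaration order (24 bytes) followed directly by the `params` bytes, and `operator_view_t`, `plan_header_encode`, and `operator_encode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PLAN_HEADER_WIRE_SIZE 24u  /* plan_id 4 + operator_count 4 + snapshot 16 */
#define OP_HEADER_WIRE_SIZE   24u  /* six 32-bit fields; this layout is assumed */

typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

/* Hypothetical pointer-free view of operator_t used for encoding. */
typedef struct {
    uint32_t op_id, type, input_count, output_count, params_length, shard_id;
    const uint8_t *params;
} operator_view_t;

static uint8_t *put_u32_le(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 4;
}

static uint8_t *put_u64_le(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
    return p + 8;
}

/* Steps 1-3: plan_id, operator_count, snapshot range. Returns 24. */
static size_t plan_header_encode(uint32_t plan_id, uint32_t operator_count,
                                 snapshot_range_t snap, uint8_t *out) {
    assert(out != NULL);
    uint8_t *p = out;
    p = put_u32_le(p, plan_id);
    p = put_u32_le(p, operator_count);
    p = put_u64_le(p, snap.min_logseq);
    p = put_u64_le(p, snap.max_logseq);
    return (size_t)(p - out);
}

/* Step 4, for one operator: fixed header, then params_length bytes.
 * Returns bytes written, or 0 if the buffer is too small. */
static size_t operator_encode(const operator_view_t *op, uint8_t *out, size_t cap) {
    assert(op != NULL && out != NULL);
    size_t need = OP_HEADER_WIRE_SIZE + (size_t)op->params_length;
    if (cap < need) return 0;
    uint8_t *p = out;
    p = put_u32_le(p, op->op_id);
    p = put_u32_le(p, op->type);
    p = put_u32_le(p, op->input_count);
    p = put_u32_le(p, op->output_count);
    p = put_u32_le(p, op->params_length);
    p = put_u32_le(p, op->shard_id);
    if (op->params_length > 0)
        memcpy(p, op->params, op->params_length);
    return need;
}
```

A full plan would call `plan_header_encode` once and then `operator_encode` for each operator in sequence, advancing the output pointer by each return value.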
---

## 5. Binary Serialization Rules

1. **All serialized structures are packed** to prevent compiler-inserted gaps (`__attribute__((packed))`)
2. **Canonical byte order**: little-endian for cross-platform compatibility
3. **Pointers** are replaced by offsets in serialized form
4. Arrays (inputs, outputs, from/to nodes) are **length-prefixed**
5. The pair (`logseq`, `canonical_id`) is used for deterministic ordering

---

## 6. Notes on SIMD / Shard Layout

* Each array in `shard_buffer_t` is **contiguous, with its base address aligned to a 64-byte boundary**, for vectorized loads
* Fixed-size arrays in `tgk_edge_record_t` simplify branchless SIMD filtering
* Serialization preserves shard boundaries for distributed execution and federation propagation

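A 64-byte-aligned shard array can be obtained with C11 `aligned_alloc`, as in the sketch below. The helper names `shard_array_alloc` and `shard_is_aligned` are hypothetical. The byte count is rounded up to a multiple of 64 because C11 requires the size passed to `aligned_alloc` to be a multiple of the alignment. Only the base address is aligned this way; individual records keep their natural stride.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SHARD_ALIGN 64u

/* Allocate room for `count` elements of `elem_size` bytes, with the base
 * address on a 64-byte boundary. Returns NULL on overflow or allocation
 * failure. Release the memory with free(). */
static void *shard_array_alloc(size_t elem_size, size_t count) {
    assert(elem_size > 0);
    /* Reject sizes whose rounded-up byte count would overflow size_t. */
    if (count > (SIZE_MAX - (SHARD_ALIGN - 1)) / elem_size)
        return NULL;
    size_t bytes = elem_size * count;
    size_t rounded = (bytes + SHARD_ALIGN - 1) / SHARD_ALIGN * SHARD_ALIGN;
    if (rounded == 0)
        rounded = SHARD_ALIGN;   /* avoid a zero-size request */
    return aligned_alloc(SHARD_ALIGN, rounded);
}

/* Check that a pointer sits on a 64-byte boundary. */
static int shard_is_aligned(const void *p) {
    return ((uintptr_t)p % SHARD_ALIGN) == 0;
}
```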
---

## 7. Summary

This C struct mapping and binary encoding specification:

* Covers **artifact, PER, TGK edge**, **shard buffer**, and **execution plan** structures
* Enables **snapshot-safe deterministic execution**
* Supports **SIMD/shard acceleration**
* Is intended for **C libraries, API frontends, and cross-node federation**
* Preserves **provenance, logseq ordering, and deterministic replay**

---

A logical next step would be **formalizing garbage collection and tombstone application rules** in binary form, so that unreachable artifacts can be removed safely while preserving determinism.

Do you want me to draft that next?