Absolutely — here’s a formal draft of **C structure mappings and binary encoding** for artifacts, PERs, TGK edges, shards, and execution plans in the unified ASL + TGK + PEL system. This is meant for **C libraries, API frontends, and toolchains**, preserving memory alignment, determinism, and snapshot safety.

---

# Unified System C Struct Mapping and Binary Encoding Specification

---

## 1. Artifact & PER Records

### 1.1 Artifact Index Entry

```c
#include <stdint.h>  // fixed-width integer types used throughout

typedef struct __attribute__((packed)) {
    uint64_t artifact_key;   // canonical ArtifactKey
    uint64_t block_id;       // CAS/ASL block ID
    uint32_t offset;         // offset within block
    uint32_t length;         // length in bytes
    uint32_t type_tag;       // optional type tag
    uint8_t  has_type_tag;   // 1 if type_tag is valid, 0 otherwise
    uint8_t  reserved[3];    // padding for 8-byte alignment
    uint64_t logseq;         // monotonic log sequence
} artifact_index_entry_t;
```

**Binary encoding**:

| Field        | Offset | Bytes | Notes                   |
| ------------ | ------ | ----- | ----------------------- |
| artifact_key | 0      | 8     | canonical ID            |
| block_id     | 8      | 8     | ZFS CAS block reference |
| offset       | 16     | 4     | offset in block         |
| length       | 20     | 4     | payload size            |
| type_tag     | 24     | 4     | optional type           |
| has_type_tag | 28     | 1     | toggle                  |
| reserved     | 29     | 3     | alignment padding       |
| logseq       | 32     | 8     | monotonic sequence      |

Each entry occupies 40 bytes.
---

### 1.2 PER (PEL Execution Receipt) Record

```c
typedef struct __attribute__((packed)) {
    artifact_index_entry_t base_artifact;  // embedded artifact info
    uint64_t pel_program_id;               // PEL program DAG canonical ID
    uint32_t input_count;                  // number of input artifacts
    uint64_t *input_keys;                  // in-memory only: array of input_count ArtifactKeys
    uint32_t output_count;                 // number of outputs
    uint64_t *output_keys;                 // in-memory only: array of output_count ArtifactKeys
} per_record_t;                            // pointer width is platform-dependent; never written as-is
```

**Encoding notes**:

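* Base artifact encoding is identical to `artifact_index_entry_t`
* It is followed by the PEL-specific fields, in this order: `pel_program_id`, `input_count`, `input_keys[]`, `output_count`, `output_keys[]`
* Arrays are **length-prefixed** for serialization: each count (4 bytes) is written immediately before its keys (8 bytes each), and the pointers themselves are never serialized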
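The length-prefixed layout described in the notes above can be sketched as follows. This covers only the PEL-specific tail and assumes the 40-byte base artifact has already been written in front of it. The names `per_tail_size` and `per_tail_encode` are hypothetical, and little-endian output follows the serialization rules later in this draft.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: store integers least-significant byte first. */
static void put_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Bytes needed for the tail: program ID plus two (count, keys[]) groups. */
static size_t per_tail_size(uint32_t input_count, uint32_t output_count) {
    return 8 + 4 + 8 * (size_t)input_count + 4 + 8 * (size_t)output_count;
}

/* Writes the tail into `out`; returns bytes written, or 0 if `cap` is too small. */
static size_t per_tail_encode(uint64_t pel_program_id,
                              const uint64_t *input_keys, uint32_t input_count,
                              const uint64_t *output_keys, uint32_t output_count,
                              uint8_t *out, size_t cap) {
    if (cap < per_tail_size(input_count, output_count)) return 0;

    size_t pos = 0;
    put_le64(out + pos, pel_program_id); pos += 8;

    put_le32(out + pos, input_count); pos += 4;        /* length prefix */
    for (uint32_t i = 0; i < input_count; i++) {
        put_le64(out + pos, input_keys[i]); pos += 8;
    }

    put_le32(out + pos, output_count); pos += 4;       /* length prefix */
    for (uint32_t i = 0; i < output_count; i++) {
        put_le64(out + pos, output_keys[i]); pos += 8;
    }
    return pos;
}
```

A record with two inputs and one output yields a 40-byte tail: 8 for the program ID, 4 + 16 for the inputs, and 4 + 8 for the outputs.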

---

## 2. TGK Edge Records

```c
#define MAX_FROM 16
#define MAX_TO   16

typedef struct __attribute__((packed)) {
    uint64_t canonical_edge_id;             // unique edge ID
    uint64_t from_nodes[MAX_FROM];          // from node ArtifactKeys
    uint64_t to_nodes[MAX_TO];              // to node ArtifactKeys
    uint32_t from_count;                    // actual number of from nodes
    uint32_t to_count;                      // actual number of to nodes
    uint32_t edge_type;                     // type key
    uint8_t  roles;                         // bitmask of roles
    uint8_t  reserved[3];                   // padding so logseq starts 8-byte aligned (offset 280)
    uint64_t logseq;                        // log sequence
} tgk_edge_record_t;
```

**Encoding notes**:

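* Fixed-size arrays give every record the same size, which simplifies SIMD processing
* `from_count` / `to_count` give the number of valid entries and must not exceed `MAX_FROM` / `MAX_TO`
* Deterministic ordering is preserved by the pair (`logseq`, `canonical_edge_id`): order by `logseq`, then break ties on `canonical_edge_id`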
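One reading of the ordering rule above is a lexicographic comparison: `logseq` first, with `canonical_edge_id` as the tie-breaker. The sketch below sorts with `qsort` under that assumption. The type `edge_order_key_t` and the functions `edge_order_cmp` and `edge_keys_sort` are hypothetical names; they operate on just the two ordering fields rather than on full edge records.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: the two fields of an edge record that determine its order. */
typedef struct {
    uint64_t logseq;
    uint64_t canonical_edge_id;
} edge_order_key_t;

/* qsort comparator: logseq ascending, then canonical_edge_id ascending. */
static int edge_order_cmp(const void *a, const void *b) {
    const edge_order_key_t *x = a;
    const edge_order_key_t *y = b;
    if (x->logseq != y->logseq)
        return x->logseq < y->logseq ? -1 : 1;
    if (x->canonical_edge_id != y->canonical_edge_id)
        return x->canonical_edge_id < y->canonical_edge_id ? -1 : 1;
    return 0;
}

/* Sorts keys in place into the assumed deterministic order. */
static void edge_keys_sort(edge_order_key_t *keys, size_t n) {
    qsort(keys, n, sizeof keys[0], edge_order_cmp);
}
```

The comparator compares explicitly instead of subtracting, since a difference of two 64-bit values does not fit in `int`.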

---

## 3. Shard-Local Buffers

```c
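typedef struct {
    uint64_t min_logseq;               // lower snapshot bound
    uint64_t max_logseq;               // upper snapshot bound
} snapshot_range_t;

typedef struct {
    artifact_index_entry_t *artifacts; // pointer to artifact array
    tgk_edge_record_t      *edges;     // pointer to TGK edges
    uint64_t artifact_count;           // number of entries in artifacts
    uint64_t edge_count;               // number of entries in edges
    snapshot_range_t snapshot;         // snapshot bounds for this shard
} shard_buffer_t;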
```

**Binary encoding**:

* Contiguous memory layout per shard for SIMD operations
* `artifact_count` and `edge_count` bound iteration over the two arrays
* `snapshot_range_t` defines `min_logseq` and `max_logseq`, the range of log sequences a reader of this shard may safely see
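To make the role of the snapshot bounds concrete, the sketch below counts the artifact entries whose `logseq` falls inside a snapshot range. It assumes both bounds are inclusive, which the draft does not state, and the names `snapshot_contains` and `shard_count_visible` are hypothetical.

```c
#include <stdint.h>

/* Layouts repeated from the sections above so the sketch compiles alone. */
typedef struct __attribute__((packed)) {
    uint64_t artifact_key;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t type_tag;
    uint8_t  has_type_tag;
    uint8_t  reserved[3];
    uint64_t logseq;
} artifact_index_entry_t;

typedef struct {
    uint64_t min_logseq;
    uint64_t max_logseq;
} snapshot_range_t;

/* Assumption: both bounds are inclusive. Returns 1 if visible, 0 otherwise. */
static int snapshot_contains(const snapshot_range_t *s, uint64_t logseq) {
    return logseq >= s->min_logseq && logseq <= s->max_logseq;
}

/* Counts entries visible in the snapshot, without branching in the loop body. */
static uint64_t shard_count_visible(const artifact_index_entry_t *artifacts,
                                    uint64_t artifact_count,
                                    const snapshot_range_t *s) {
    uint64_t visible = 0;
    for (uint64_t i = 0; i < artifact_count; i++)
        visible += (uint64_t)snapshot_contains(s, artifacts[i].logseq);
    return visible;
}
```

Adding the 0/1 result instead of branching keeps the loop in a form that compilers can often vectorize.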

---

## 4. Execution Plan Structures

### 4.1 Operator Definition

```c
typedef enum {
    OP_SEGMENT_SCAN,
    OP_INDEX_FILTER,
    OP_MERGE,
    OP_TGK_TRAVERSAL,
    OP_PROJECTION,
    OP_AGGREGATION,
    OP_TOMBSTONE_SHADOW
} operator_type_t;

typedef struct __attribute__((packed)) {
    uint32_t op_id;                     // unique operator ID
    uint32_t type;                      // operator_type_t value; fixed width, since enum size is implementation-defined
    uint32_t input_count;               // number of inputs
    uint32_t output_count;              // number of outputs
    uint32_t params_length;             // length of serialized params in bytes
    uint8_t  *params;                   // in-memory only: pointer to operator parameters
    uint32_t shard_id;                  // shard this operator applies to
} operator_t;
```

* `params` contains **operator-specific configuration** (e.g., filter masks, edge_type keys) and is `params_length` bytes long
* Operators are serialized sequentially, in array order, in the execution plan

---

### 4.2 Execution Plan Serialization

```c
typedef struct __attribute__((packed)) {
    uint32_t plan_id;                   // unique plan ID
    uint32_t operator_count;            // number of operators
    operator_t *operators;              // pointer to operator array
    snapshot_range_t snapshot;          // snapshot bounds for execution
} execution_plan_t;
```

**Encoding**:

1. `plan_id` (4 bytes)
2. `operator_count` (4 bytes)
3. `snapshot_range_t` (min_logseq, max_logseq, 16 bytes)
4. Serialized operators, one after another (fixed-size header + variable `params`); the `operators` pointer itself is not written
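The four encoding steps above can be sketched as a single function. The draft does not fix the field order inside the operator header, so this example assumes struct order with the pointer dropped: `op_id`, `type`, `input_count`, `output_count`, `params_length`, `shard_id`, followed by the `params` bytes. The names `op_view_t` and `plan_encode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t min_logseq; uint64_t max_logseq; } snapshot_range_t;

/* Hypothetical in-memory view of one operator; `type` holds an operator_type_t value. */
typedef struct {
    uint32_t op_id, type, input_count, output_count, params_length;
    const uint8_t *params;
    uint32_t shard_id;
} op_view_t;

enum { PLAN_HEADER_SIZE = 24, OP_HEADER_SIZE = 24 };

/* Little-endian writers; each returns the number of bytes written. */
static size_t put_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
    return 4;
}

static size_t put_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
    return 8;
}

/* Returns bytes written, or 0 if `cap` is too small for the whole plan. */
static size_t plan_encode(uint32_t plan_id, const snapshot_range_t *snap,
                          const op_view_t *ops, uint32_t op_count,
                          uint8_t *out, size_t cap) {
    size_t need = PLAN_HEADER_SIZE;
    for (uint32_t i = 0; i < op_count; i++)
        need += OP_HEADER_SIZE + (size_t)ops[i].params_length;
    if (cap < need) return 0;

    size_t pos = 0;
    pos += put_le32(out + pos, plan_id);            /* step 1 */
    pos += put_le32(out + pos, op_count);           /* step 2 */
    pos += put_le64(out + pos, snap->min_logseq);   /* step 3 */
    pos += put_le64(out + pos, snap->max_logseq);
    for (uint32_t i = 0; i < op_count; i++) {       /* step 4 */
        const op_view_t *op = &ops[i];
        pos += put_le32(out + pos, op->op_id);
        pos += put_le32(out + pos, op->type);
        pos += put_le32(out + pos, op->input_count);
        pos += put_le32(out + pos, op->output_count);
        pos += put_le32(out + pos, op->params_length);
        pos += put_le32(out + pos, op->shard_id);
        if (op->params_length != 0) {
            memcpy(out + pos, op->params, op->params_length);
            pos += op->params_length;
        }
    }
    return pos;
}
```

Because `params_length` precedes the parameter bytes, a reader can skip operators it does not understand without parsing their parameters.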

---

## 5. Binary Serialization Rules

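1. **All serialized structures are packed** to prevent gaps (`__attribute__((packed))`); `shard_buffer_t` is an in-memory container and is not itself serialized
2. **Canonical byte order**: little-endian for cross-platform compatibility, so big-endian hosts must byte-swap when reading and writing
3. **Pointers** are replaced by offsets in serialized form
4. Variable-length arrays (inputs, outputs) are **length-prefixed**; the fixed-size from/to node arrays carry their valid lengths in `from_count` / `to_count`
5. The pair (`logseq`, canonical ID) is used for deterministic ordering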
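On the reading side, rules 2 and 3 could look roughly like the sketch below: integers are rebuilt byte by byte, so the result does not depend on host endianness, and an offset is converted back to a pointer only after a bounds check. The names `get_le32`, `get_le64`, and `resolve_offset` are hypothetical, and treating offsets as relative to the start of the serialized buffer is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const uint8_t *p) {
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Reads a little-endian 64-bit value regardless of host byte order. */
static uint64_t get_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/*
 * Turns a serialized offset back into a pointer inside `base`.
 * Assumption: offsets are relative to the start of the buffer.
 * Returns NULL if [offset, offset + len) does not fit in `size` bytes.
 */
static const uint8_t *resolve_offset(const uint8_t *base, size_t size,
                                     uint64_t offset, uint64_t len) {
    if (offset > size || len > size - offset) return NULL;
    return base + offset;
}
```

The bounds check is written as `len > size - offset` rather than `offset + len > size` so that it cannot overflow on untrusted input.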

---

## 6. Notes on SIMD / Shard Layout

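* The base address of each array in `shard_buffer_t` is **aligned to a 64-byte boundary**, and records are stored contiguously, for vectorized loads. Because records are packed back to back, an individual record is 64-byte aligned only when its offset is a multiple of 64 (`artifact_index_entry_t` is 40 bytes)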
* Fixed-size arrays in `tgk_edge_record_t` simplify branchless SIMD filtering
* Serialization preserves shard boundaries for distributed execution and federation propagation
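A minimal sketch of the 64-byte alignment requirement, assuming a C11 toolchain where `aligned_alloc` is available. The function name `shard_array_alloc` and the constant `SHARD_ALIGN` are hypothetical. The size is rounded up to a multiple of the alignment because C11 requires that of `aligned_alloc`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { SHARD_ALIGN = 64 };

/*
 * Allocates room for `count` records of `elem_size` bytes, with the base
 * address aligned to 64 bytes. Returns NULL on overflow or allocation
 * failure. Release the result with free().
 */
static void *shard_array_alloc(size_t count, size_t elem_size) {
    if (elem_size != 0 && count > SIZE_MAX / elem_size) return NULL;
    size_t bytes = count * elem_size;
    if (bytes > SIZE_MAX - (SHARD_ALIGN - 1)) return NULL;

    /* Round up to a multiple of the alignment; never request zero bytes. */
    size_t rounded = (bytes + SHARD_ALIGN - 1) / SHARD_ALIGN * SHARD_ALIGN;
    if (rounded == 0) rounded = SHARD_ALIGN;
    return aligned_alloc(SHARD_ALIGN, rounded);
}
```

Only the base address is aligned this way; per-record alignment still depends on the record size.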

---

## 7. Summary

This C struct mapping and binary encoding specification:

* Covers **artifact, PER, TGK edge, shard buffer**, and **execution plan** structures
* Enables **snapshot-safe deterministic execution**
* Supports **SIMD/shard acceleration**
* Is ready for **C libraries, API frontends, and cross-node federation**
* Preserves **provenance, logseq ordering, and deterministic replay**

---

Next logical step could be **formalizing garbage collection and tombstone application rules** in binary form, ensuring unreachable artifacts can be removed safely while preserving determinism.

Do you want me to draft that next?