amduat/tier1/enc-pel-program-dag-1.md
2025-12-20 12:35:10 +01:00

735 lines
21 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# ENC/PEL-PROGRAM-DAG/1 — Canonical Encoding for DAG Programs
Status: Approved
Owner: Niklas Rydberg
Version: 0.2.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [binary-minimalism, deterministic]
<!-- Source: /amduat/docs/new/enc-pel-program-dag-1.md | Canonical: /amduat/tier1/enc-pel-program-dag-1.md -->
**Document ID:** `ENC/PEL-PROGRAM-DAG/1`
**Profile ID:** `PEL_ENC_PROGRAM_DAG_V1 = 0x0101`
**Layer:** Scheme Encoding Profile (on top of ASL/1-CORE + PEL/PROGRAM-DAG/1)
**Depends on (normative):**
* `ASL/1-CORE v0.4.x` — value model (`Artifact`, `TypeTag`, `Reference`, integers, `OctetString`)
* `ENC/ASL1-CORE v1.0.3` — canonical encoding conventions (integers, `OctetString`, streaming constraints)
* `PEL/PROGRAM-DAG/1 v0.3.1` — DAG Program scheme (`Program` / `Node` model, semantics, canonical topological order)
**Integrates with (informative):**
* `PEL/1-CORE v0.3.x` — primitive execution layer (`Exec_s` / `Exec_DAG`)
* `PEL/1-SURF v0.2.x` — store-backed execution surface
* `HASH/ASL1 v0.2.4` — reference formation over canonical encodings
* TypeTag registry (for `TYPE_TAG_PEL_PROGRAM_DAG_1`)
* Operation registries (e.g. `OPREG/PEL1-KERNEL` and param profiles)
> **Note:** The Profile ID `PEL_ENC_PROGRAM_DAG_V1` is a configuration label.
> It is **not** embedded in the encoded Program bytes. Selection of this encoding profile is done by context (scheme descriptor, store or engine configuration), not per value.
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Overview
`ENC/PEL-PROGRAM-DAG/1` defines the **canonical binary encoding** of the `Program` structure defined in `PEL/PROGRAM-DAG/1`:
```text
Program {
nodes: list<Node>
roots: list<RootRef>
}
````
and its sub-structures:
* `OperationId`
* `DagInputExternal`, `DagInputNode`, `DagInput`
* `Node`
* `RootRef`
This encoding:
* is **injective** with respect to the logical `Program` model — distinct Programs → distinct byte strings under this profile,
* is **stable and deterministic** across implementations and time,
* is **streaming-friendly** — encoders and decoders can operate in a single forward pass,
* fixes a **canonical topological ordering** for `nodes` (matching the scheme spec).
The result is used as the payload (`Artifact.bytes`) of Program Artifacts for the `PEL/PROGRAM-DAG/1` scheme, with:
```text
Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
Artifact.bytes = ProgramBytes
```
Identity of Program Artifacts is then derived via `HASH/ASL1` over `ArtifactBytes` as usual.
---
## 1. Scope & Layering
### 1.1 Purpose
This specification defines:
* The concrete binary layout of:
* `ProgramBytes`
* `NodeBytes`
* `DagInputBytes`
* `RootRefBytes`
* `OperationIdBytes`
* Canonicalization rules:
* Node ordering (canonical topological order),
* fixed field ordering,
* integer widths and encodings.
It does **not** define:
* The logical semantics of Programs (DAG evaluation, error statuses, etc.) — those are in `PEL/PROGRAM-DAG/1` and `PEL/1-CORE`.
* The ASL/1 `Artifact` or `Reference` layouts — those are in `ASL/1-CORE` and `ENC/ASL1-CORE`.
* How Programs are used in store-backed execution — that belongs to `PEL/1-SURF`.
### 1.2 Layering constraints
In line with `SUBSTRATE/STACK-OVERVIEW`:
* `ENC/PEL-PROGRAM-DAG/1` is a **scheme-specific encoding profile**.
* It MUST NOT redefine:
* `Artifact`, `TypeTag`, `Reference`, or `HashId` (from `ASL/1-CORE`),
* the logical `Program` / `Node` / `DagInput` / `RootRef` model (from `PEL/PROGRAM-DAG/1`).
* It is **storage-neutral** and **policy-neutral**.
* It defines exactly one canonical encoding for `Program` values for this scheme under the profile ID `PEL_ENC_PROGRAM_DAG_V1`.
---
## 2. Conventions
RFC 2119 terms (**MUST**, **SHOULD**, **MAY**, etc.) are normative.
### 2.1 Integer encodings
All multi-byte integers are encoded as **big-endian** (network byte order), as in `ENC/ASL1-CORE`:
* `u8` — 1 byte
* `u16` — 2 bytes
* `u32` — 4 bytes
* `u64` — 8 bytes
Only **fixed-width** integers are used in this specification.
### 2.2 Utf8String
This specification defines a canonical `Utf8String` encoding:
```text
Utf8String = length (u32) || bytes[0..length-1]
```
* `length` is the number of bytes of UTF-8 data.
* `length` MAY be zero.
* Decoders MUST validate that the byte sequence is well-formed UTF-8.
* There is no padding or terminator.
All strings in this profile (`OperationId.name`) are encoded as `Utf8String`.
### 2.3 Parameter bytes
For operation parameters, this profile uses a compact `ParamsBytes` encoding:
```text
ParamsBytes = length (u32) || bytes[0..length-1]
```
* `bytes` is an opaque blob whose interpretation is defined per operation in the operation registry.
* `length` MAY be zero (empty params).
> This differs from the general `OctetString` encoding (which uses `u64` length and is defined in `ENC/ASL1-CORE`).
> Using `u32` length here is acceptable because this structure lives *inside* the Program Artifact payload, not as an ASL/1 top-level value.
#### 2.3.1 Parameter profiles and canonicality
When a `Program` is interpreted under a concrete operation registry + parameter profile set:
* For each operation `(name, version)`:
* There MUST be a well-defined abstract parameter model (`ParamsValue`).
* There MUST be exactly one canonical encode/decode pair:
```text
encode_params_op : ParamsValue -> ParamsBytes
decode_params_op : ParamsBytes -> ParamsValue | error
```
* All conformant implementations of that operation MUST:
* decode `ParamsBytes` into the same `ParamsValue`, or
* deterministically detect decoding failures.
Kernel parameter profiles (e.g. `OPREG/PEL1-KERNEL-PARAMS/1`) MUST ensure:
* `encode_params_op` followed by `decode_params_op` round-trips exactly.
* Any `ParamsBytes` that fails `decode_params_op` MUST be treated as a **program-level validation error** (`INVALID_PROGRAM`) under `PEL/PROGRAM-DAG/1`, not as a runtime failure.
### 2.4 Lists
A list of values of some type `T` is encoded as:
```text
List<T> = count (u32) || element_0 || element_1 || ... || element_{count-1}
```
* `count` is the number of elements (MAY be zero).
* Elements are encoded in order, using the canonical encoding of `T`.
### 2.5 Encoding version field
`ProgramBytes` includes a `program_version (u16)` field:
* In this profile, `program_version` MUST be `1`.
* Any future incompatible change to the layout of `ProgramBytes` under the same profile ID MUST be reflected by a new `program_version` value (and corresponding decoder support).
* Adding fields in a backward-compatible, strictly-append-only way SHOULD be done via a new encoding profile rather than overloading `program_version = 1`.
Decoders MUST reject any `program_version` they do not implement.
---
## 3. Logical Model Reference
For convenience, the logical types from `PEL/PROGRAM-DAG/1` are restated informally (normative source remains that document):
```text
OperationId {
name: string
version: uint32
}
DagInputExternal {
input_index: uint32
}
DagInputNode {
node_id: NodeId // uint32
output_index: uint32
}
DagInput =
DagInputExternal
| DagInputNode
Node {
id: NodeId // uint32
op: OperationId
inputs: list<DagInput>
params: Params // abstract; serialized as ParamsBytes in this profile
}
RootRef {
node_id: NodeId // uint32
output_index: uint32
}
Program {
nodes: list<Node>
roots: list<RootRef>
}
NodeId = uint32
```
`PEL/PROGRAM-DAG/1` further defines:
* Structural validity rules (unique NodeIds, acyclicity, etc.).
* Canonical topological order of Nodes.
This encoding profile assumes the logical model and validity rules as given there. It **does not** re-check them at the encoding-layer; that is scheme-level responsibility.
---
## 4. Program Encoding
### 4.1 Program header and overall layout
The canonical encoding of a `Program` value is:
```text
ProgramBytes ::
program_version (u16)
node_count (u32)
nodes (NodeBytes[0..node_count-1])
root_count (u32)
roots (RootRefBytes[0..root_count-1])
```
Constraints:
* `program_version` MUST currently be `1`.
Decoders:
* MUST accept `program_version = 1`.
* MUST reject any other value as an unsupported encoding version.
* `node_count` and `root_count` are the number of elements in the corresponding lists.
* `nodes` MUST be encoded in the **canonical topological order** defined in `PEL/PROGRAM-DAG/1 §4`. Encoders MUST perform this ordering; decoders MAY rely on it but are not required to re-check DAG properties.
* `roots` MUST be encoded in the same order as the logical `Program.roots` list.
### 4.2 Node encoding
Each `Node` is encoded as:
```text
NodeBytes ::
node_id (u32)
op_name (Utf8String)
op_version (u32)
input_count (u32)
inputs (DagInputBytes[0..input_count-1])
params_len (u32)
params_bytes (byte[0..params_len-1])
```
Field meanings:
1. `node_id (u32)`
* Encodes `Node.id`.
* MUST be unique across all Nodes in a Program (scheme-level requirement).
2. `op_name (Utf8String)`
* Encodes `OperationId.name` as UTF-8.
3. `op_version (u32)`
* Encodes `OperationId.version`.
4. `input_count (u32)`
* Number of input references consumed by this Node.
5. `inputs`
* Exactly `input_count` entries, each encoded as `DagInputBytes` (see §4.3) in order.
6. `params_len (u32)` and `params_bytes`
* `params_len` = length of the operation-specific parameter blob.
* `params_bytes` is an opaque blob whose interpretation is defined by the operations parameter profile.
* A `params_len` of `0` encodes an empty parameter blob.
> **Injectivity requirement (Node)**
> For a fixed interpretation of `ParamsBytes` and `OperationId`, distinct logical `Node` values (differing in any field) MUST produce distinct `NodeBytes`.
> Given `NodeBytes` and the corresponding operation registry, a canonical decoder MUST reconstruct exactly the same logical `Node`.
### 4.3 DagInput encoding
Each `DagInput` is encoded as a tagged union:
```text
DagInputBytes ::
kind (u8)
payload(...)
```
Where `kind` is:
```text
0x00 => DagInputExternal
0x01 => DagInputNode
```
#### 4.3.1 DagInputExternal
For:
```text
DagInputExternal {
input_index: uint32
}
```
Encoding:
```text
kind = 0x00
input_index (u32)
```
`input_index` is the 0-based index into the `inputs` list passed to `Exec_DAG`.
#### 4.3.2 DagInputNode
For:
```text
DagInputNode {
node_id: NodeId // uint32
output_index: uint32
}
```
Encoding:
```text
kind = 0x01
node_id (u32)
output_index (u32)
```
`node_id` MUST refer to some `Node.id` in the same Program (scheme-level validity).
`output_index` is the 0-based index into that Nodes output list.
#### 4.3.3 Decoder behavior
Decoders MUST:
* Treat any `kind` value other than `0x00` or `0x01` as an encoding error (invalid `DagInput`).
* For `kind = 0x00`, read exactly one `u32` as `input_index`.
* For `kind = 0x01`, read exactly two `u32` values (`node_id`, `output_index`).
* Reject truncated encodings (insufficient bytes for the payload).
Structural validity of indices (e.g., `node_id` existence, output arity) is enforced at the scheme level, not the encoding layer.
### 4.4 RootRef encoding
For:
```text
RootRef {
node_id: NodeId
output_index: uint32
}
```
Encoding:
```text
RootRefBytes ::
node_id (u32)
output_index (u32)
```
* `RootRefBytes` is identical to the payload of `DagInputNode`, but without a `kind` byte.
Roots are always Node outputs, so no variant tag is needed.
* The `roots` list in `ProgramBytes` MUST encode each `RootRef` in the logical order of `Program.roots`.
Decoders MUST reject truncated entries (insufficient bytes for both `u32` values).
---
## 5. Canonicality Requirements
### 5.1 Node ordering in Program
Encoders MUST:
* encode `Program.nodes` in the **canonical topological order** defined by `PEL/PROGRAM-DAG/1 §4`:
* Dependencies appear before dependents.
* Ties are broken by smallest `NodeId` (numeric, ascending).
* ensure that the `node_count` written in `ProgramBytes` equals the number of encoded `NodeBytes`.
Decoders:
* MAY assume the encoded order corresponds to the canonical topological order.
* MAY perform additional checks (e.g., verifying acyclicity), but this is not required for basic decoding.
### 5.2 Field ordering
Field ordering in all structures is fixed and MUST NOT vary:
* `ProgramBytes``program_version`, `node_count`, `nodes…`, `root_count`, `roots…`
* `NodeBytes``node_id`, `op_name`, `op_version`, `input_count`, `inputs…`, `params_len`, `params_bytes…`
* `DagInputBytes``kind`, then variant-specific payload.
* `RootRefBytes``node_id`, `output_index`.
Any deviation MUST be treated as an encoding error.
### 5.3 Injectivity and stability
The mapping:
```text
Program -> ProgramBytes
```
defined by this profile MUST be:
* **Injective** — if `P1 != P2` as logical `Program` values (per `PEL/PROGRAM-DAG/1`), then `ProgramBytes(P1) != ProgramBytes(P2)`.
* **Stable** — the same logical `Program` MUST encode to the same `ProgramBytes` across:
* different implementations,
* platforms,
* executions,
* and times,
given the same version of this encoding profile and the same underlying operation/param profiles.
Encoders MUST NOT:
* reorder Nodes other than by the canonical topological order,
* reorder inputs within a Node,
* reorder roots,
* introduce alternative encodings for integers, strings, or params.
---
## 6. Program Artifact Binding
### 6.1 TypeTag
Program Artifacts for this scheme MUST be encoded as:
```text
Artifact {
bytes = ProgramBytes
type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
}
```
Where:
* `TYPE_TAG_PEL_PROGRAM_DAG_1` is a `TypeTag` with a concrete `tag_id` assigned in the global TypeTag registry.
This encoding profile:
* uses `TYPE_TAG_PEL_PROGRAM_DAG_1` symbolically, and
* does not assign a specific numeric `tag_id`; that is done in a registry document.
### 6.2 Identity via ASL/1-CORE and HASH/ASL1
Given `ENC/ASL1-CORE v1` as the canonical encoding for `Artifact` and some chosen ASL1 hash algorithm `H` (e.g. `HASH-ASL1-256` under `HashId = 0x0001`):
1. The canonical `ArtifactBytes` for a Program Artifact is given by `ENC/ASL1-CORE v1`:
```text
ArtifactBytes =
encode_artifact_core_v1(
Artifact{ bytes = ProgramBytes, type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1 }
)
```
2. The canonical `Reference` for that Artifact under `HashId = HID` is:
```text
digest = H(ArtifactBytes)
reference = Reference { hash_id = HID, digest = digest }
```
All conformant implementations MUST agree on:
* `ProgramBytes` for a given logical `Program`,
* `ArtifactBytes` for the Program Artifact,
* the resulting `Reference` for any fixed `(HashId, H)`.
---
## 7. Error Handling (Encoding Level)
This encoding profile defines only **structural encoding errors**. Handling of scheme-level validity errors (`INVALID_PROGRAM`, etc.) is done by `PEL/PROGRAM-DAG/1` and `PEL/1-CORE`.
Decoders MUST treat as encoding errors:
1. **Truncated fields**
* Not enough bytes to read any declared integer, string, list, or params blob.
2. **Unsupported `program_version`**
* `program_version != 1`.
3. **Invalid `DagInput.kind`**
* `kind` is not `0x00` or `0x01`.
4. **Invalid `Utf8String`**
* `op_name` bytes are not valid UTF-8.
5. **Inconsistent list lengths**
* Fewer or more `NodeBytes` than indicated by `node_count`.
* Fewer or more `RootRefBytes` than indicated by `root_count`.
These are **encoding-layer** issues. The exact error codes surfaced to callers (e.g., `ERR_PEL_ENC_INVALID`) are implementation-specific but MUST result in rejection of the Program bytes as malformed under this encoding profile.
---
## 8. Streaming and Implementation Notes
Implementations MUST be able to:
* Encode any `Program` using a single forward pass over the canonical node order:
* compute canonical topological order first (requires holding `Program` structure),
* then write fields in the order defined above.
* Decode any `ProgramBytes` sequentially:
* no backtracking or multi-pass parsing is required,
* all length prefixes appear before their content.
For very large Programs:
* Implementations MAY:
* stream Nodes one by one into an internal representation,
* stream `params_bytes` to a buffer or directly into an operation-registry decoder.
* They MUST ensure that any observable behavior (including error reporting) is independent of chunking or I/O strategy: two conformant decoders seeing the same `ProgramBytes` MUST reconstruct the same logical `Program` or the same encoding error.
---
## 9. Conformance
An implementation is **ENC/PEL-PROGRAM-DAG/1conformant** if it:
1. **Implements `ProgramBytes` encoding/decoding**
* Encodes and decodes `ProgramBytes` exactly as defined in §4.
* Treats `program_version = 1` as the only supported version.
* Treats deviations (unknown version, malformed fields) as encoding errors.
2. **Respects canonical ordering**
* When encoding `Program` values, orders `nodes` in the canonical topological order defined in `PEL/PROGRAM-DAG/1`.
* Preserves the logical order of `roots`.
3. **Uses canonical field encodings**
* Uses `u32` lengths for lists and params as specified.
* Uses `Utf8String` for operation names.
* Uses `u8` discriminants and the specified payload layout for `DagInput`.
4. **Preserves injectivity and stability**
* Ensures distinct logical Programs (per `PEL/PROGRAM-DAG/1`) produce distinct `ProgramBytes`.
* Ensures the same logical Program consistently produces the same `ProgramBytes` under this profile.
5. **Binds to Program Artifacts correctly**
* When forming Program Artifacts for `PEL/PROGRAM-DAG/1`, sets:
* `Artifact.bytes = ProgramBytes`
* `Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1`
* Uses `ENC/ASL1-CORE v1` and `HASH/ASL1` for ASL/1 identity.
Everything else — storage, transport, operation registries, traces — is outside the scope of this encoding profile, provided it does not contradict the requirements above.
---
## 10. Informative Example
> This example is non-normative and uses abbreviated hex.
> It illustrates only the field layout, not exact ASCII/UTF-8 bytes.
Consider a tiny Program:
* Nodes:
1. `N0``id = 1`
* `op = (name = "add64", version = 1)`
* `inputs = [ DagInputExternal{input_index = 0}, DagInputExternal{input_index = 1} ]`
* `params =` empty
2. `N1``id = 2`
* `op = (name = "mul64", version = 1)`
* `inputs = [ DagInputNode{node_id = 1, output_index = 0}, DagInputExternal{input_index = 2} ]`
* `params =` empty
* Roots:
* `RootRef{ node_id = 2, output_index = 0 }`
Canonical topological order:
* Node 1 has only external inputs → first.
* Node 2 depends on Node 1 → second.
ProgramBytes (pseudo-annotated):
```text
program_version = 0001 ; u16 = 1
node_count = 00000002 ; 2 nodes
; Node 0 (id = 1)
node_id = 00000001
op_name = 00000005 "add64" ; length=5, bytes 'a','d','d','6','4'
op_version = 00000001
input_count = 00000002
; input 0: external(0)
kind = 00
input_index = 00000000
; input 1: external(1)
kind = 00
input_index = 00000001
params_len = 00000000 ; empty params
; Node 1 (id = 2)
node_id = 00000002
op_name = 00000005 "mul64"
op_version = 00000001
input_count = 00000002
; input 0: node(1,0)
kind = 01
node_id = 00000001
output_index = 00000000
; input 1: external(2)
kind = 00
input_index = 00000002
params_len = 00000000 ; empty params
root_count = 00000001
; root 0: (node 2, output 0)
node_id = 00000002
output_index = 00000000
```
These bytes become `Artifact.bytes` for a Program Artifact with `type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1`. All conformant encoders under `PEL_ENC_PROGRAM_DAG_V1` MUST produce the same byte sequence for this logical Program.
---
**End of `ENC/PEL-PROGRAM-DAG/1 v0.2.0 — Canonical Encoding for DAG Programs`**
---
## Document History
* **0.2.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.