# ENC/PEL-PROGRAM-DAG/1 — Canonical Encoding for DAG Programs

Status: Approved
Owner: Niklas Rydberg
Version: 0.2.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [binary-minimalism, deterministic]

**Document ID:** `ENC/PEL-PROGRAM-DAG/1`
**Profile ID:** `PEL_ENC_PROGRAM_DAG_V1 = 0x0101`
**Layer:** Scheme Encoding Profile (on top of ASL/1-CORE + PEL/PROGRAM-DAG/1)

**Depends on (normative):**

* `ASL/1-CORE v0.4.x` — value model (`Artifact`, `TypeTag`, `Reference`, integers, `OctetString`)
* `ENC/ASL1-CORE v1.0.3` — canonical encoding conventions (integers, `OctetString`, streaming constraints)
* `PEL/PROGRAM-DAG/1 v0.3.1` — DAG Program scheme (`Program` / `Node` model, semantics, canonical topological order)

**Integrates with (informative):**

* `PEL/1-CORE v0.3.x` — primitive execution layer (`Exec_s` / `Exec_DAG`)
* `PEL/1-SURF v0.2.x` — store-backed execution surface
* `HASH/ASL1 v0.2.4` — reference formation over canonical encodings
* TypeTag registry (for `TYPE_TAG_PEL_PROGRAM_DAG_1`)
* Operation registries (e.g. `OPREG/PEL1-KERNEL` and param profiles)

> **Note:** The Profile ID `PEL_ENC_PROGRAM_DAG_V1` is a configuration label.
> It is **not** embedded in the encoded Program bytes. Selection of this encoding profile is done by context (scheme descriptor, store or engine configuration), not per value.

© 2025 Niklas Rydberg.

## License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.

---

## 0. Overview

`ENC/PEL-PROGRAM-DAG/1` defines the **canonical binary encoding** of the `Program` structure defined in `PEL/PROGRAM-DAG/1`:

```text
Program {
  nodes: list<Node>
  roots: list<RootRef>
}
```

and its sub-structures:

* `OperationId`
* `DagInputExternal`, `DagInputNode`, `DagInput`
* `Node`
* `RootRef`

This encoding:

* is **injective** with respect to the logical `Program` model — distinct Programs → distinct byte strings under this profile,
* is **stable and deterministic** across implementations and time,
* is **streaming-friendly** — encoders and decoders can operate in a single forward pass,
* fixes a **canonical topological ordering** for `nodes` (matching the scheme spec).

The result is used as the payload (`Artifact.bytes`) of Program Artifacts for the `PEL/PROGRAM-DAG/1` scheme, with:

```text
Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
Artifact.bytes    = ProgramBytes
```

Identity of Program Artifacts is then derived via `HASH/ASL1` over `ArtifactBytes` as usual.

---

## 1. Scope & Layering

### 1.1 Purpose

This specification defines:

* The concrete binary layout of:
  * `ProgramBytes`
  * `NodeBytes`
  * `DagInputBytes`
  * `RootRefBytes`
  * `OperationIdBytes`
* Canonicalization rules:
  * Node ordering (canonical topological order),
  * fixed field ordering,
  * integer widths and encodings.

It does **not** define:

* The logical semantics of Programs (DAG evaluation, error statuses, etc.) — those are in `PEL/PROGRAM-DAG/1` and `PEL/1-CORE`.
* The ASL/1 `Artifact` or `Reference` layouts — those are in `ASL/1-CORE` and `ENC/ASL1-CORE`.
* How Programs are used in store-backed execution — that belongs to `PEL/1-SURF`.

### 1.2 Layering constraints

In line with `SUBSTRATE/STACK-OVERVIEW`:

* `ENC/PEL-PROGRAM-DAG/1` is a **scheme-specific encoding profile**.
* It MUST NOT redefine:
  * `Artifact`, `TypeTag`, `Reference`, or `HashId` (from `ASL/1-CORE`),
  * the logical `Program` / `Node` / `DagInput` / `RootRef` model (from `PEL/PROGRAM-DAG/1`).
* It is **storage-neutral** and **policy-neutral**.
* It defines exactly one canonical encoding for `Program` values for this scheme under the profile ID `PEL_ENC_PROGRAM_DAG_V1`.

---

## 2. Conventions

RFC 2119 terms (**MUST**, **SHOULD**, **MAY**, etc.) are normative.

### 2.1 Integer encodings

All multi-byte integers are encoded as **big-endian** (network byte order), as in `ENC/ASL1-CORE`:

* `u8` — 1 byte
* `u16` — 2 bytes
* `u32` — 4 bytes
* `u64` — 8 bytes

Only **fixed-width** integers are used in this specification.

### 2.2 Utf8String

This specification defines a canonical `Utf8String` encoding:

```text
Utf8String = length (u32) || bytes[0..length-1]
```

* `length` is the number of bytes of UTF-8 data.
* `length` MAY be zero.
* Decoders MUST validate that the byte sequence is well-formed UTF-8.
* There is no padding or terminator.

All strings in this profile (`OperationId.name`) are encoded as `Utf8String`.

### 2.3 Parameter bytes

For operation parameters, this profile uses a compact `ParamsBytes` encoding:

```text
ParamsBytes = length (u32) || bytes[0..length-1]
```

* `bytes` is an opaque blob whose interpretation is defined per operation in the operation registry.
* `length` MAY be zero (empty params).

> This differs from the general `OctetString` encoding (which uses `u64` length and is defined in `ENC/ASL1-CORE`).
> Using `u32` length here is acceptable because this structure lives *inside* the Program Artifact payload, not as an ASL/1 top-level value.

#### 2.3.1 Parameter profiles and canonicality

When a `Program` is interpreted under a concrete operation registry + parameter profile set:

* For each operation `(name, version)`:
  * There MUST be a well-defined abstract parameter model (`ParamsValue`).
  * There MUST be exactly one canonical encode/decode pair:

    ```text
    encode_params_op : ParamsValue -> ParamsBytes
    decode_params_op : ParamsBytes -> ParamsValue | error
    ```

* All conformant implementations of that operation MUST:
  * decode `ParamsBytes` into the same `ParamsValue`, or
  * deterministically detect decoding failures.

Kernel parameter profiles (e.g. `OPREG/PEL1-KERNEL-PARAMS/1`) MUST ensure:

* `encode_params_op` followed by `decode_params_op` round-trips exactly.
* Any `ParamsBytes` that fails `decode_params_op` MUST be treated as a **program-level validation error** (`INVALID_PROGRAM`) under `PEL/PROGRAM-DAG/1`, not as a runtime failure.

### 2.4 Lists

A list of values of some type `T` is encoded as:

```text
List<T> = count (u32) || element_0 || element_1 || ... || element_{count-1}
```

* `count` is the number of elements (MAY be zero).
* Elements are encoded in order, using the canonical encoding of `T`.

### 2.5 Encoding version field

`ProgramBytes` includes a `program_version (u16)` field:

* In this profile, `program_version` MUST be `1`.
* Any future incompatible change to the layout of `ProgramBytes` under the same profile ID MUST be reflected by a new `program_version` value (and corresponding decoder support).
* Adding fields in a backward-compatible, strictly-append-only way SHOULD be done via a new encoding profile rather than overloading `program_version = 1`.

Decoders MUST reject any `program_version` they do not implement.

---

## 3. Logical Model Reference

For convenience, the logical types from `PEL/PROGRAM-DAG/1` are restated informally (normative source remains that document):

```text
OperationId {
  name:    string
  version: uint32
}

DagInputExternal {
  input_index: uint32
}

DagInputNode {
  node_id:      NodeId   // uint32
  output_index: uint32
}

DagInput = DagInputExternal | DagInputNode

Node {
  id:     NodeId         // uint32
  op:     OperationId
  inputs: list<DagInput>
  params: Params         // abstract; serialized as ParamsBytes in this profile
}

RootRef {
  node_id:      NodeId   // uint32
  output_index: uint32
}

Program {
  nodes: list<Node>
  roots: list<RootRef>
}

NodeId = uint32
```

`PEL/PROGRAM-DAG/1` further defines:

* Structural validity rules (unique NodeIds, acyclicity, etc.).
* Canonical topological order of Nodes.

This encoding profile assumes the logical model and validity rules as given there. It **does not** re-check them at the encoding layer; that is a scheme-level responsibility.

---

## 4. Program Encoding

### 4.1 Program header and overall layout

The canonical encoding of a `Program` value is:

```text
ProgramBytes ::
  program_version (u16)
  node_count      (u32)
  nodes           (NodeBytes[0..node_count-1])
  root_count      (u32)
  roots           (RootRefBytes[0..root_count-1])
```

Constraints:

* `program_version` MUST currently be `1`. Decoders:
  * MUST accept `program_version = 1`.
  * MUST reject any other value as an unsupported encoding version.
* `node_count` and `root_count` are the number of elements in the corresponding lists.
* `nodes` MUST be encoded in the **canonical topological order** defined in `PEL/PROGRAM-DAG/1 §4`. Encoders MUST perform this ordering; decoders MAY rely on it but are not required to re-check DAG properties.
* `roots` MUST be encoded in the same order as the logical `Program.roots` list.

### 4.2 Node encoding

Each `Node` is encoded as:

```text
NodeBytes ::
  node_id      (u32)
  op_name      (Utf8String)
  op_version   (u32)
  input_count  (u32)
  inputs       (DagInputBytes[0..input_count-1])
  params_len   (u32)
  params_bytes (byte[0..params_len-1])
```

Field meanings:

1. `node_id (u32)`
   * Encodes `Node.id`.
   * MUST be unique across all Nodes in a Program (scheme-level requirement).
2. `op_name (Utf8String)`
   * Encodes `OperationId.name` as UTF-8.
3. `op_version (u32)`
   * Encodes `OperationId.version`.
4. `input_count (u32)`
   * Number of input references consumed by this Node.
5. `inputs`
   * Exactly `input_count` entries, each encoded as `DagInputBytes` (see §4.3) in order.
6. `params_len (u32)` and `params_bytes`
   * `params_len` = length of the operation-specific parameter blob.
   * `params_bytes` is an opaque blob whose interpretation is defined by the operation’s parameter profile.
   * A `params_len` of `0` encodes an empty parameter blob.

> **Injectivity requirement (Node)**
> For a fixed interpretation of `ParamsBytes` and `OperationId`, distinct logical `Node` values (differing in any field) MUST produce distinct `NodeBytes`.
> Given `NodeBytes` and the corresponding operation registry, a canonical decoder MUST reconstruct exactly the same logical `Node`.

### 4.3 DagInput encoding

Each `DagInput` is encoded as a tagged union:

```text
DagInputBytes ::
  kind (u8)
  payload(...)
```

Where `kind` is:

```text
0x00 => DagInputExternal
0x01 => DagInputNode
```

#### 4.3.1 DagInputExternal

For:

```text
DagInputExternal {
  input_index: uint32
}
```

Encoding:

```text
kind        = 0x00
input_index (u32)
```

`input_index` is the 0-based index into the `inputs` list passed to `Exec_DAG`.

#### 4.3.2 DagInputNode

For:

```text
DagInputNode {
  node_id:      NodeId   // uint32
  output_index: uint32
}
```

Encoding:

```text
kind         = 0x01
node_id      (u32)
output_index (u32)
```

`node_id` MUST refer to some `Node.id` in the same Program (scheme-level validity). `output_index` is the 0-based index into that Node’s output list.

#### 4.3.3 Decoder behavior

Decoders MUST:

* Treat any `kind` value other than `0x00` or `0x01` as an encoding error (invalid `DagInput`).
* For `kind = 0x00`, read exactly one `u32` as `input_index`.
* For `kind = 0x01`, read exactly two `u32` values (`node_id`, `output_index`).
* Reject truncated encodings (insufficient bytes for the payload).

Structural validity of indices (e.g., `node_id` existence, output arity) is enforced at the scheme level, not the encoding layer.

### 4.4 RootRef encoding

For:

```text
RootRef {
  node_id:      NodeId
  output_index: uint32
}
```

Encoding:

```text
RootRefBytes ::
  node_id      (u32)
  output_index (u32)
```

* `RootRefBytes` is identical to the payload of `DagInputNode`, but without a `kind` byte. Roots are always Node outputs, so no variant tag is needed.
* The `roots` list in `ProgramBytes` MUST encode each `RootRef` in the logical order of `Program.roots`.

Decoders MUST reject truncated entries (insufficient bytes for both `u32` values).

---

## 5. Canonicality Requirements

### 5.1 Node ordering in Program

Encoders MUST:

* encode `Program.nodes` in the **canonical topological order** defined by `PEL/PROGRAM-DAG/1 §4`:
  * Dependencies appear before dependents.
  * Ties are broken by smallest `NodeId` (numeric, ascending).
* ensure that the `node_count` written in `ProgramBytes` equals the number of encoded `NodeBytes`.

Decoders:

* MAY assume the encoded order corresponds to the canonical topological order.
* MAY perform additional checks (e.g., verifying acyclicity), but this is not required for basic decoding.

### 5.2 Field ordering

Field ordering in all structures is fixed and MUST NOT vary:

* `ProgramBytes` — `program_version`, `node_count`, `nodes…`, `root_count`, `roots…`
* `NodeBytes` — `node_id`, `op_name`, `op_version`, `input_count`, `inputs…`, `params_len`, `params_bytes…`
* `DagInputBytes` — `kind`, then variant-specific payload.
* `RootRefBytes` — `node_id`, `output_index`.

Any deviation MUST be treated as an encoding error.
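The fixed field order of §5.2 and the layouts of §4 can be illustrated with a short, non-normative Python sketch. It assumes the caller has already placed `nodes` in canonical topological order and does no scheme-level validation. The dict and tuple shapes, the function names, and the `example_*` variables are hypothetical conveniences for this illustration, not part of the specification.

```python
import struct

PROGRAM_VERSION = 1  # the only program_version allowed by this profile


def encode_utf8_string(text: str) -> bytes:
    """Utf8String: u32 byte length, then the UTF-8 bytes (no terminator)."""
    data = text.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def encode_dag_input(dag_input: tuple) -> bytes:
    """DagInputBytes: u8 kind, then the variant-specific payload."""
    if dag_input[0] == "external":   # ("external", input_index)
        return struct.pack(">BI", 0x00, dag_input[1])
    if dag_input[0] == "node":       # ("node", node_id, output_index)
        return struct.pack(">BII", 0x01, dag_input[1], dag_input[2])
    raise ValueError(f"unknown DagInput variant: {dag_input[0]!r}")


def encode_node(node: dict) -> bytes:
    """NodeBytes, written strictly in the field order of section 5.2."""
    out = bytearray()
    out += struct.pack(">I", node["id"])
    out += encode_utf8_string(node["op_name"])
    out += struct.pack(">I", node["op_version"])
    out += struct.pack(">I", len(node["inputs"]))
    for dag_input in node["inputs"]:
        out += encode_dag_input(dag_input)
    out += struct.pack(">I", len(node["params"]))  # params_len
    out += node["params"]                          # opaque params_bytes
    return bytes(out)


def encode_program(nodes: list, roots: list) -> bytes:
    """ProgramBytes; assumes `nodes` is already in canonical order."""
    out = bytearray()
    out += struct.pack(">H", PROGRAM_VERSION)
    out += struct.pack(">I", len(nodes))
    for node in nodes:
        out += encode_node(node)
    out += struct.pack(">I", len(roots))
    for node_id, output_index in roots:            # RootRefBytes, no kind byte
        out += struct.pack(">II", node_id, output_index)
    return bytes(out)


# The two-node Program from the informative example in section 10.
example_nodes = [
    {"id": 1, "op_name": "add64", "op_version": 1,
     "inputs": [("external", 0), ("external", 1)], "params": b""},
    {"id": 2, "op_name": "mul64", "op_version": 1,
     "inputs": [("node", 1, 0), ("external", 2)], "params": b""},
]
example_roots = [(2, 0)]
program_bytes = encode_program(example_nodes, example_roots)
```

For the §10 Program this produces 92 bytes: 6 for the header, 35 and 39 for the two Nodes, and 12 for the root list. The `>` prefix in each `struct` format string selects big-endian byte order with no padding, which matches §2.1.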
### 5.3 Injectivity and stability

The mapping:

```text
Program -> ProgramBytes
```

defined by this profile MUST be:

* **Injective** — if `P1 != P2` as logical `Program` values (per `PEL/PROGRAM-DAG/1`), then `ProgramBytes(P1) != ProgramBytes(P2)`.
* **Stable** — the same logical `Program` MUST encode to the same `ProgramBytes` across:
  * different implementations,
  * platforms,
  * executions,
  * and times,

  given the same version of this encoding profile and the same underlying operation/param profiles.

Encoders MUST NOT:

* reorder Nodes other than by the canonical topological order,
* reorder inputs within a Node,
* reorder roots,
* introduce alternative encodings for integers, strings, or params.

---

## 6. Program Artifact Binding

### 6.1 TypeTag

Program Artifacts for this scheme MUST be encoded as:

```text
Artifact {
  bytes    = ProgramBytes
  type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
}
```

Where:

* `TYPE_TAG_PEL_PROGRAM_DAG_1` is a `TypeTag` with a concrete `tag_id` assigned in the global TypeTag registry.

This encoding profile:

* uses `TYPE_TAG_PEL_PROGRAM_DAG_1` symbolically, and
* does not assign a specific numeric `tag_id`; that is done in a registry document.

### 6.2 Identity via ASL/1-CORE and HASH/ASL1

Given `ENC/ASL1-CORE v1` as the canonical encoding for `Artifact` and some chosen ASL1 hash algorithm `H` (e.g. `HASH-ASL1-256` under `HashId = 0x0001`):

1. The canonical `ArtifactBytes` for a Program Artifact is given by `ENC/ASL1-CORE v1`:

   ```text
   ArtifactBytes = encode_artifact_core_v1(
     Artifact{
       bytes    = ProgramBytes,
       type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
     }
   )
   ```

2. The canonical `Reference` for that Artifact under `HashId = HID` is:

   ```text
   digest    = H(ArtifactBytes)
   reference = Reference { hash_id = HID, digest = digest }
   ```

All conformant implementations MUST agree on:

* `ProgramBytes` for a given logical `Program`,
* `ArtifactBytes` for the Program Artifact,
* the resulting `Reference` for any fixed `(HashId, H)`.

---

## 7. Error Handling (Encoding Level)

This encoding profile defines only **structural encoding errors**. Handling of scheme-level validity errors (`INVALID_PROGRAM`, etc.) is done by `PEL/PROGRAM-DAG/1` and `PEL/1-CORE`.

Decoders MUST treat as encoding errors:

1. **Truncated fields**
   * Not enough bytes to read any declared integer, string, list, or params blob.
2. **Unsupported `program_version`**
   * `program_version != 1`.
3. **Invalid `DagInput.kind`**
   * `kind` is not `0x00` or `0x01`.
4. **Invalid `Utf8String`**
   * `op_name` bytes are not valid UTF-8.
5. **Inconsistent list lengths**
   * Fewer or more `NodeBytes` than indicated by `node_count`.
   * Fewer or more `RootRefBytes` than indicated by `root_count`.

These are **encoding-layer** issues. The exact error codes surfaced to callers (e.g., `ERR_PEL_ENC_INVALID`) are implementation-specific but MUST result in rejection of the Program bytes as malformed under this encoding profile.

---

## 8. Streaming and Implementation Notes

Implementations MUST be able to:

* Encode any `Program` using a single forward pass over the canonical node order:
  * compute canonical topological order first (requires holding `Program` structure),
  * then write fields in the order defined above.
* Decode any `ProgramBytes` sequentially:
  * no backtracking or multi-pass parsing is required,
  * all length prefixes appear before their content.

For very large Programs:

* Implementations MAY:
  * stream Nodes one by one into an internal representation,
  * stream `params_bytes` to a buffer or directly into an operation-registry decoder.
* They MUST ensure that any observable behavior (including error reporting) is independent of chunking or I/O strategy: two conformant decoders seeing the same `ProgramBytes` MUST reconstruct the same logical `Program` or the same encoding error.

---

## 9. Conformance

An implementation is **ENC/PEL-PROGRAM-DAG/1–conformant** if it:

1. **Implements `ProgramBytes` encoding/decoding**
   * Encodes and decodes `ProgramBytes` exactly as defined in §4.
   * Treats `program_version = 1` as the only supported version.
   * Treats deviations (unknown version, malformed fields) as encoding errors.
2. **Respects canonical ordering**
   * When encoding `Program` values, orders `nodes` in the canonical topological order defined in `PEL/PROGRAM-DAG/1`.
   * Preserves the logical order of `roots`.
3. **Uses canonical field encodings**
   * Uses `u32` lengths for lists and params as specified.
   * Uses `Utf8String` for operation names.
   * Uses `u8` discriminants and the specified payload layout for `DagInput`.
4. **Preserves injectivity and stability**
   * Ensures distinct logical Programs (per `PEL/PROGRAM-DAG/1`) produce distinct `ProgramBytes`.
   * Ensures the same logical Program consistently produces the same `ProgramBytes` under this profile.
5. **Binds to Program Artifacts correctly**
   * When forming Program Artifacts for `PEL/PROGRAM-DAG/1`, sets:
     * `Artifact.bytes = ProgramBytes`
     * `Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1`
   * Uses `ENC/ASL1-CORE v1` and `HASH/ASL1` for ASL/1 identity.

Everything else — storage, transport, operation registries, traces — is outside the scope of this encoding profile, provided it does not contradict the requirements above.

---

## 10. Informative Example

> This example is non-normative and uses abbreviated hex.
> It illustrates only the field layout, not exact ASCII/UTF-8 bytes.

Consider a tiny Program:

* Nodes:
  1. `N0` — `id = 1`
     * `op = (name = "add64", version = 1)`
     * `inputs = [ DagInputExternal{input_index = 0}, DagInputExternal{input_index = 1} ]`
     * `params =` empty
  2. `N1` — `id = 2`
     * `op = (name = "mul64", version = 1)`
     * `inputs = [ DagInputNode{node_id = 1, output_index = 0}, DagInputExternal{input_index = 2} ]`
     * `params =` empty
* Roots:
  * `RootRef{ node_id = 2, output_index = 0 }`

Canonical topological order:

* Node 1 has only external inputs → first.
* Node 2 depends on Node 1 → second.

ProgramBytes (pseudo-annotated):

```text
program_version = 0001              ; u16 = 1
node_count      = 00000002          ; 2 nodes

; Node 0 (id = 1)
node_id         = 00000001
op_name         = 00000005 "add64"  ; length=5, bytes 'a','d','d','6','4'
op_version      = 00000001
input_count     = 00000002
  ; input 0: external(0)
  kind          = 00
  input_index   = 00000000
  ; input 1: external(1)
  kind          = 00
  input_index   = 00000001
params_len      = 00000000          ; empty params

; Node 1 (id = 2)
node_id         = 00000002
op_name         = 00000005 "mul64"
op_version      = 00000001
input_count     = 00000002
  ; input 0: node(1,0)
  kind          = 01
  node_id       = 00000001
  output_index  = 00000000
  ; input 1: external(2)
  kind          = 00
  input_index   = 00000002
params_len      = 00000000          ; empty params

root_count      = 00000001
  ; root 0: (node 2, output 0)
  node_id       = 00000002
  output_index  = 00000000
```

These bytes become `Artifact.bytes` for a Program Artifact with `type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1`. All conformant encoders under `PEL_ENC_PROGRAM_DAG_V1` MUST produce the same byte sequence for this logical Program.

---

**End of `ENC/PEL-PROGRAM-DAG/1 v0.2.0 — Canonical Encoding for DAG Programs`**

---

## Document History

* **0.2.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.