ENC/PEL-PROGRAM-DAG/1 — Canonical Encoding for DAG Programs
Status: Approved
Owner: Niklas Rydberg
Version: 0.2.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [binary-minimalism, deterministic]
Document ID: ENC/PEL-PROGRAM-DAG/1
Profile ID: PEL_ENC_PROGRAM_DAG_V1 = 0x0101
Layer: Scheme Encoding Profile (on top of ASL/1-CORE + PEL/PROGRAM-DAG/1)
Depends on (normative):
- ASL/1-CORE v0.4.x: value model (Artifact, TypeTag, Reference, integers, OctetString)
- ENC/ASL1-CORE v1.0.3: canonical encoding conventions (integers, OctetString, streaming constraints)
- PEL/PROGRAM-DAG/1 v0.3.1: DAG Program scheme (Program/Node model, semantics, canonical topological order)
Integrates with (informative):
- PEL/1-CORE v0.3.x: primitive execution layer (Exec_s/Exec_DAG)
- PEL/1-SURF v0.2.x: store-backed execution surface
- HASH/ASL1 v0.2.4: reference formation over canonical encodings
- TypeTag registry (for TYPE_TAG_PEL_PROGRAM_DAG_1)
- Operation registries (e.g. OPREG/PEL1-KERNEL and param profiles)
Note: The Profile ID PEL_ENC_PROGRAM_DAG_V1 is a configuration label. It is not embedded in the encoded Program bytes. Selection of this encoding profile is done by context (scheme descriptor, store or engine configuration), not per value.
© 2025 Niklas Rydberg.
License
Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.
Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.
0. Overview
ENC/PEL-PROGRAM-DAG/1 defines the canonical binary encoding of the Program structure defined in PEL/PROGRAM-DAG/1:
Program {
    nodes: list<Node>
    roots: list<RootRef>
}
and its sub-structures:
- OperationId
- DagInputExternal, DagInputNode, DagInput
- Node
- RootRef
This encoding:
- is injective with respect to the logical Program model: distinct Programs map to distinct byte strings under this profile,
- is stable and deterministic across implementations and time,
- is streaming-friendly: encoders and decoders can operate in a single forward pass,
- fixes a canonical topological ordering for nodes (matching the scheme spec).
The result is used as the payload (Artifact.bytes) of Program Artifacts for the PEL/PROGRAM-DAG/1 scheme, with:
Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
Artifact.bytes = ProgramBytes
Identity of Program Artifacts is then derived via HASH/ASL1 over ArtifactBytes as usual.
1. Scope & Layering
1.1 Purpose
This specification defines:
- The concrete binary layout of:
  - ProgramBytes
  - NodeBytes
  - DagInputBytes
  - RootRefBytes
  - OperationIdBytes
- Canonicalization rules:
  - node ordering (canonical topological order),
  - fixed field ordering,
  - integer widths and encodings.
It does not define:
- The logical semantics of Programs (DAG evaluation, error statuses, etc.): those are in PEL/PROGRAM-DAG/1 and PEL/1-CORE.
- The ASL/1 Artifact or Reference layouts: those are in ASL/1-CORE and ENC/ASL1-CORE.
- How Programs are used in store-backed execution: that belongs to PEL/1-SURF.
1.2 Layering constraints
In line with SUBSTRATE/STACK-OVERVIEW:
- ENC/PEL-PROGRAM-DAG/1 is a scheme-specific encoding profile.
- It MUST NOT redefine:
  - Artifact, TypeTag, Reference, or HashId (from ASL/1-CORE),
  - the logical Program/Node/DagInput/RootRef model (from PEL/PROGRAM-DAG/1).
- It is storage-neutral and policy-neutral.
- It defines exactly one canonical encoding for Program values for this scheme under the profile ID PEL_ENC_PROGRAM_DAG_V1.
2. Conventions
RFC 2119 terms (MUST, SHOULD, MAY, etc.) are normative.
2.1 Integer encodings
All multi-byte integers are encoded as big-endian (network byte order), as in ENC/ASL1-CORE:
- u8: 1 byte
- u16: 2 bytes
- u32: 4 bytes
- u64: 8 bytes
Only fixed-width integers are used in this specification.
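Informative sketch (non-normative): the four fixed-width big-endian integers map directly onto Python's struct module. The helper names encode_uint and decode_uint are illustrative assumptions, not names defined by this specification.

```python
import struct

# Big-endian ("network order") struct formats for the four fixed widths.
_FORMATS = {"u8": ">B", "u16": ">H", "u32": ">I", "u64": ">Q"}


def encode_uint(kind: str, value: int) -> bytes:
    """Encode value as a fixed-width big-endian unsigned integer."""
    try:
        return struct.pack(_FORMATS[kind], value)
    except struct.error as exc:
        # Raised when value is negative or too large for the width.
        raise ValueError(f"{value} does not fit in {kind}") from exc


def decode_uint(kind: str, data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, next_offset); raise ValueError on truncated input."""
    fmt = _FORMATS[kind]
    size = struct.calcsize(fmt)
    if offset + size > len(data):
        raise ValueError(f"truncated {kind} at offset {offset}")
    return struct.unpack_from(fmt, data, offset)[0], offset + size
```

For instance, encode_uint("u16", 1) yields the two bytes 00 01, which is how program_version = 1 appears on the wire.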
2.2 Utf8String
This specification defines a canonical Utf8String encoding:
Utf8String = length (u32) || bytes[0..length-1]
- length is the number of bytes of UTF-8 data.
- length MAY be zero.
- Decoders MUST validate that the byte sequence is well-formed UTF-8.
- There is no padding or terminator.
All strings in this profile (OperationId.name) are encoded as Utf8String.
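Informative sketch (non-normative): one way to encode and decode a Utf8String. The function names are illustrative assumptions. Python's strict UTF-8 codec is used for the well-formedness check; it rejects overlong forms and surrogate code points.

```python
import struct


def encode_utf8_string(text: str) -> bytes:
    """Encode text as length (u32, big-endian) followed by UTF-8 bytes."""
    raw = text.encode("utf-8")
    if len(raw) > 0xFFFFFFFF:
        raise ValueError("string too long for a u32 length prefix")
    return struct.pack(">I", len(raw)) + raw


def decode_utf8_string(data: bytes, offset: int = 0) -> tuple[str, int]:
    """Return (text, next_offset); raise ValueError on malformed input."""
    if offset + 4 > len(data):
        raise ValueError("truncated Utf8String length")
    (length,) = struct.unpack_from(">I", data, offset)
    start, end = offset + 4, offset + 4 + length
    if end > len(data):
        raise ValueError("truncated Utf8String bytes")
    try:
        text = data[start:end].decode("utf-8")  # strict validation
    except UnicodeDecodeError as exc:
        raise ValueError("Utf8String is not well-formed UTF-8") from exc
    return text, end
```

Under this sketch, the operation name "add64" encodes to 00 00 00 05 followed by the five bytes 61 64 64 36 34.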
2.3 Parameter bytes
For operation parameters, this profile uses a compact ParamsBytes encoding:
ParamsBytes = length (u32) || bytes[0..length-1]
- bytes is an opaque blob whose interpretation is defined per operation in the operation registry.
- length MAY be zero (empty params).
This differs from the general OctetString encoding (which uses a u64 length and is defined in ENC/ASL1-CORE). Using a u32 length here is acceptable because this structure lives inside the Program Artifact payload, not as an ASL/1 top-level value.
2.3.1 Parameter profiles and canonicality
When a Program is interpreted under a concrete operation registry + parameter profile set:
- For each operation (name, version):
  - There MUST be a well-defined abstract parameter model (ParamsValue).
  - There MUST be exactly one canonical encode/decode pair:
    encode_params_op : ParamsValue -> ParamsBytes
    decode_params_op : ParamsBytes -> ParamsValue | error
- All conformant implementations of that operation MUST:
  - decode ParamsBytes into the same ParamsValue, or
  - deterministically detect decoding failures.
Kernel parameter profiles (e.g. OPREG/PEL1-KERNEL-PARAMS/1) MUST ensure:
- encode_params_op followed by decode_params_op round-trips exactly.
- Any ParamsBytes that fails decode_params_op MUST be treated as a program-level validation error (INVALID_PROGRAM) under PEL/PROGRAM-DAG/1, not as a runtime failure.
2.4 Lists
A list of values of some type T is encoded as:
List<T> = count (u32) || element_0 || element_1 || ... || element_{count-1}
- count is the number of elements (MAY be zero).
- Elements are encoded in order, using the canonical encoding of T.
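Informative sketch (non-normative): a generic list encoder that writes the u32 count and then each element in order. The name encode_list and the per-element callback are illustrative assumptions.

```python
import struct
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def encode_list(items: Iterable[T], encode_item: Callable[[T], bytes]) -> bytes:
    """Encode count (u32, big-endian) followed by each element in order."""
    encoded = [encode_item(item) for item in items]
    if len(encoded) > 0xFFFFFFFF:
        raise ValueError("too many elements for a u32 count")
    return struct.pack(">I", len(encoded)) + b"".join(encoded)
```

An empty list encodes to the four bytes 00 00 00 00; the element order is never changed by the encoder.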
2.5 Encoding version field
ProgramBytes includes a program_version (u16) field:
- In this profile, program_version MUST be 1.
- Any future incompatible change to the layout of ProgramBytes under the same profile ID MUST be reflected by a new program_version value (and corresponding decoder support).
- Adding fields in a backward-compatible, strictly-append-only way SHOULD be done via a new encoding profile rather than overloading program_version = 1.
Decoders MUST reject any program_version they do not implement.
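Informative sketch (non-normative): a decoder can check the version before reading anything else. The function name and the use of ValueError are illustrative assumptions; the specification leaves concrete error codes to implementations.

```python
import struct

SUPPORTED_PROGRAM_VERSION = 1  # the only version defined by this profile


def read_program_version(data: bytes) -> int:
    """Read the leading u16 and reject anything this decoder does not implement."""
    if len(data) < 2:
        raise ValueError("truncated program_version")
    (version,) = struct.unpack_from(">H", data, 0)
    if version != SUPPORTED_PROGRAM_VERSION:
        raise ValueError(f"unsupported program_version {version}")
    return version
```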
3. Logical Model Reference
For convenience, the logical types from PEL/PROGRAM-DAG/1 are restated informally (normative source remains that document):
OperationId {
    name: string
    version: uint32
}

DagInputExternal {
    input_index: uint32
}

DagInputNode {
    node_id: NodeId // uint32
    output_index: uint32
}

DagInput =
    DagInputExternal
  | DagInputNode

Node {
    id: NodeId // uint32
    op: OperationId
    inputs: list<DagInput>
    params: Params // abstract; serialized as ParamsBytes in this profile
}

RootRef {
    node_id: NodeId // uint32
    output_index: uint32
}

Program {
    nodes: list<Node>
    roots: list<RootRef>
}

NodeId = uint32
PEL/PROGRAM-DAG/1 further defines:
- Structural validity rules (unique NodeIds, acyclicity, etc.).
- Canonical topological order of Nodes.
This encoding profile assumes the logical model and validity rules as given there. It does not re-check them at the encoding layer; that is a scheme-level responsibility.
4. Program Encoding
4.1 Program header and overall layout
The canonical encoding of a Program value is:
ProgramBytes ::
    program_version (u16)
    node_count (u32)
    nodes (NodeBytes[0..node_count-1])
    root_count (u32)
    roots (RootRefBytes[0..root_count-1])
Constraints:
- program_version MUST currently be 1. Decoders:
  - MUST accept program_version = 1.
  - MUST reject any other value as an unsupported encoding version.
- node_count and root_count are the number of elements in the corresponding lists.
- nodes MUST be encoded in the canonical topological order defined in PEL/PROGRAM-DAG/1 §4. Encoders MUST perform this ordering; decoders MAY rely on it but are not required to re-check DAG properties.
- roots MUST be encoded in the same order as the logical Program.roots list.
4.2 Node encoding
Each Node is encoded as:
NodeBytes ::
    node_id (u32)
    op_name (Utf8String)
    op_version (u32)
    input_count (u32)
    inputs (DagInputBytes[0..input_count-1])
    params_len (u32)
    params_bytes (byte[0..params_len-1])
Field meanings:
- node_id (u32)
  - Encodes Node.id.
  - MUST be unique across all Nodes in a Program (scheme-level requirement).
- op_name (Utf8String)
  - Encodes OperationId.name as UTF-8.
- op_version (u32)
  - Encodes OperationId.version.
- input_count (u32)
  - Number of input references consumed by this Node.
- inputs
  - Exactly input_count entries, each encoded as DagInputBytes (see §4.3) in order.
- params_len (u32) and params_bytes
  - params_len = length of the operation-specific parameter blob.
  - params_bytes is an opaque blob whose interpretation is defined by the operation’s parameter profile.
  - A params_len of 0 encodes an empty parameter blob.
Injectivity requirement (Node): For a fixed interpretation of ParamsBytes and OperationId, distinct logical Node values (differing in any field) MUST produce distinct NodeBytes. Given NodeBytes and the corresponding operation registry, a canonical decoder MUST reconstruct exactly the same logical Node.
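Informative sketch (non-normative): a NodeBytes encoder that writes the fields in the fixed order given above. The dataclass shapes and function names are illustrative assumptions; in particular, params_bytes is taken as already-canonical bytes produced by the operation's parameter profile.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DagInputExternal:
    input_index: int


@dataclass(frozen=True)
class DagInputNode:
    node_id: int
    output_index: int


@dataclass(frozen=True)
class Node:
    id: int
    op_name: str
    op_version: int
    inputs: tuple = ()
    params_bytes: bytes = b""  # assumed already canonical for the operation


def u32(value: int) -> bytes:
    return struct.pack(">I", value)


def encode_dag_input(inp) -> bytes:
    """Tagged union: kind byte 0x00 (external) or 0x01 (node output)."""
    if isinstance(inp, DagInputExternal):
        return b"\x00" + u32(inp.input_index)
    if isinstance(inp, DagInputNode):
        return b"\x01" + u32(inp.node_id) + u32(inp.output_index)
    raise TypeError(f"not a DagInput: {inp!r}")


def encode_node(node: Node) -> bytes:
    """Write node_id, op_name, op_version, inputs, then params, in that order."""
    name = node.op_name.encode("utf-8")
    parts = [u32(node.id), u32(len(name)), name, u32(node.op_version)]
    parts.append(u32(len(node.inputs)))
    parts.extend(encode_dag_input(i) for i in node.inputs)  # order preserved
    parts.extend([u32(len(node.params_bytes)), node.params_bytes])
    return b"".join(parts)
```

Because every variable-length field carries its own length prefix and the field order is fixed, two Nodes that differ in any field cannot produce the same bytes under this sketch.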
4.3 DagInput encoding
Each DagInput is encoded as a tagged union:
DagInputBytes ::
    kind (u8)
    payload(...)
Where kind is:
0x00 => DagInputExternal
0x01 => DagInputNode
4.3.1 DagInputExternal
For:
DagInputExternal {
    input_index: uint32
}
Encoding:
kind = 0x00
input_index (u32)
input_index is the 0-based index into the inputs list passed to Exec_DAG.
4.3.2 DagInputNode
For:
DagInputNode {
    node_id: NodeId // uint32
    output_index: uint32
}
Encoding:
kind = 0x01
node_id (u32)
output_index (u32)
node_id MUST refer to some Node.id in the same Program (scheme-level validity).
output_index is the 0-based index into that Node’s output list.
4.3.3 Decoder behavior
Decoders MUST:
- Treat any kind value other than 0x00 or 0x01 as an encoding error (invalid DagInput).
- For kind = 0x00, read exactly one u32 as input_index.
- For kind = 0x01, read exactly two u32 values (node_id, output_index).
- Reject truncated encodings (insufficient bytes for the payload).
Structural validity of indices (e.g., node_id existence, output arity) is enforced at the scheme level, not the encoding layer.
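Informative sketch (non-normative): a DagInput decoder following the rules above. The EncodingError class and the tuple-based return shape are illustrative assumptions. The decoder checks only structure; it does not check whether node_id exists.

```python
import struct


class EncodingError(ValueError):
    """Encoding-level failure (hypothetical name for this sketch)."""


def decode_dag_input(data: bytes, offset: int = 0):
    """Return ((label, *fields), next_offset) for one DagInputBytes value."""
    if offset >= len(data):
        raise EncodingError("truncated DagInput: missing kind byte")
    kind = data[offset]
    offset += 1
    if kind == 0x00:
        label, fmt = "external", ">I"   # input_index
    elif kind == 0x01:
        label, fmt = "node", ">II"      # node_id, output_index
    else:
        raise EncodingError(f"invalid DagInput kind 0x{kind:02x}")
    size = struct.calcsize(fmt)
    if offset + size > len(data):
        raise EncodingError(f"truncated DagInput payload for kind 0x{kind:02x}")
    fields = struct.unpack_from(fmt, data, offset)
    return (label, *fields), offset + size
```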
4.4 RootRef encoding
For:
RootRef {
    node_id: NodeId
    output_index: uint32
}
Encoding:
RootRefBytes ::
    node_id (u32)
    output_index (u32)
- RootRefBytes is identical to the payload of DagInputNode, but without a kind byte. Roots are always Node outputs, so no variant tag is needed.
- The roots list in ProgramBytes MUST encode each RootRef in the logical order of Program.roots.
Decoders MUST reject truncated entries (insufficient bytes for both u32 values).
5. Canonicality Requirements
5.1 Node ordering in Program
Encoders MUST:
- encode Program.nodes in the canonical topological order defined by PEL/PROGRAM-DAG/1 §4:
  - Dependencies appear before dependents.
  - Ties are broken by smallest NodeId (numeric, ascending).
- ensure that the node_count written in ProgramBytes equals the number of encoded NodeBytes.
Decoders:
- MAY assume the encoded order corresponds to the canonical topological order.
- MAY perform additional checks (e.g., verifying acyclicity), but this is not required for basic decoding.
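Informative sketch (non-normative): one plausible reading of "dependencies before dependents, ties broken by smallest NodeId" is Kahn's algorithm with a min-heap of ready nodes. The normative definition lives in PEL/PROGRAM-DAG/1 §4, which is not reproduced here, so treat this as an assumption about that algorithm rather than a restatement of it. The function name and the deps mapping are hypothetical.

```python
import heapq


def canonical_node_order(deps: dict[int, set[int]]) -> list[int]:
    """Order NodeIds so dependencies come first; smallest ready NodeId wins ties.

    deps maps each NodeId to the set of NodeIds whose outputs it reads.
    """
    remaining = {nid: len(ds) for nid, ds in deps.items()}
    dependents: dict[int, list[int]] = {nid: [] for nid in deps}
    for nid, ds in deps.items():
        for dep in ds:
            if dep not in deps:
                raise ValueError(f"node {nid} refers to unknown node {dep}")
            dependents[dep].append(nid)

    ready = [nid for nid, count in remaining.items() if count == 0]
    heapq.heapify(ready)  # min-heap: smallest NodeId is emitted first
    order: list[int] = []
    while ready:
        nid = heapq.heappop(ready)
        order.append(nid)
        for child in dependents[nid]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, child)

    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order
```

With deps = {3: set(), 1: {3}, 2: set()} this sketch yields [2, 3, 1]: nodes 2 and 3 are both ready at the start, 2 is smaller, and node 1 only becomes ready once 3 has been emitted.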
5.2 Field ordering
Field ordering in all structures is fixed and MUST NOT vary:
- ProgramBytes: program_version, node_count, nodes…, root_count, roots…
- NodeBytes: node_id, op_name, op_version, input_count, inputs…, params_len, params_bytes…
- DagInputBytes: kind, then variant-specific payload.
- RootRefBytes: node_id, output_index.
Any deviation MUST be treated as an encoding error.
5.3 Injectivity and stability
The mapping:
Program -> ProgramBytes
defined by this profile MUST be:
- Injective: if P1 != P2 as logical Program values (per PEL/PROGRAM-DAG/1), then ProgramBytes(P1) != ProgramBytes(P2).
- Stable: the same logical Program MUST encode to the same ProgramBytes across:
  - different implementations,
  - platforms,
  - executions,
  - and times,
  given the same version of this encoding profile and the same underlying operation/param profiles.
Encoders MUST NOT:
- reorder Nodes other than by the canonical topological order,
- reorder inputs within a Node,
- reorder roots,
- introduce alternative encodings for integers, strings, or params.
6. Program Artifact Binding
6.1 TypeTag
Program Artifacts for this scheme MUST be encoded as:
Artifact {
    bytes = ProgramBytes
    type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
}
Where:
- TYPE_TAG_PEL_PROGRAM_DAG_1 is a TypeTag with a concrete tag_id assigned in the global TypeTag registry.
This encoding profile:
- uses TYPE_TAG_PEL_PROGRAM_DAG_1 symbolically, and
- does not assign a specific numeric tag_id; that is done in a registry document.
6.2 Identity via ASL/1-CORE and HASH/ASL1
Given ENC/ASL1-CORE v1 as the canonical encoding for Artifact and some chosen ASL1 hash algorithm H (e.g. HASH-ASL1-256 under HashId = 0x0001):
- The canonical ArtifactBytes for a Program Artifact is given by ENC/ASL1-CORE v1:
  ArtifactBytes = encode_artifact_core_v1(
      Artifact{ bytes = ProgramBytes, type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1 }
  )
- The canonical Reference for that Artifact under HashId = HID is:
  digest = H(ArtifactBytes)
  reference = Reference { hash_id = HID, digest = digest }
All conformant implementations MUST agree on:
- ProgramBytes for a given logical Program,
- ArtifactBytes for the Program Artifact,
- the resulting Reference for any fixed (HashId, H).
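Informative sketch (non-normative): the two steps above compose as a small function that takes the artifact encoder and hash function as parameters. The real ArtifactBytes layout is defined in ENC/ASL1-CORE and the real algorithm behind a given HashId in HASH/ASL1; neither is reproduced in this document. The placeholder_* functions below are therefore stand-ins invented for illustration only (a made-up 4-byte tag prefix and SHA-256) and MUST NOT be read as the actual layout or algorithm.

```python
import hashlib
from typing import Callable, NamedTuple


class Reference(NamedTuple):
    hash_id: int
    digest: bytes


def program_reference(
    program_bytes: bytes,
    type_tag: int,
    encode_artifact: Callable[[bytes, int], bytes],
    hash_id: int,
    hash_fn: Callable[[bytes], bytes],
) -> Reference:
    """digest = H(ArtifactBytes); reference = (HID, digest)."""
    artifact_bytes = encode_artifact(program_bytes, type_tag)
    return Reference(hash_id=hash_id, digest=hash_fn(artifact_bytes))


# Stand-ins for illustration only: NOT the ENC/ASL1-CORE artifact layout
# and NOT necessarily the algorithm registered under any HashId.
def placeholder_encode_artifact(payload: bytes, type_tag: int) -> bytes:
    return type_tag.to_bytes(4, "big") + payload


def placeholder_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()
```

The point of the sketch is the data flow: identity depends on ProgramBytes only through ArtifactBytes, so any difference in ProgramBytes or in the type tag changes the digest.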
7. Error Handling (Encoding Level)
This encoding profile defines only structural encoding errors. Handling of scheme-level validity errors (INVALID_PROGRAM, etc.) is done by PEL/PROGRAM-DAG/1 and PEL/1-CORE.
Decoders MUST treat as encoding errors:
- Truncated fields
  - Not enough bytes to read any declared integer, string, list, or params blob.
- Unsupported program_version
  - program_version != 1.
- Invalid DagInput.kind
  - kind is not 0x00 or 0x01.
- Invalid Utf8String
  - op_name bytes are not valid UTF-8.
- Inconsistent list lengths
  - Fewer or more NodeBytes than indicated by node_count.
  - Fewer or more RootRefBytes than indicated by root_count.
These are encoding-layer issues. The exact error codes surfaced to callers (e.g., ERR_PEL_ENC_INVALID) are implementation-specific but MUST result in rejection of the Program bytes as malformed under this encoding profile.
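Informative sketch (non-normative): a small cursor type can centralize the truncation and UTF-8 checks so that every field read reports a structural error the same way. The names EncodingError and Reader are illustrative assumptions. Treating leftover bytes after the last root as an error is this sketch's reading of the "more than indicated" case, not an explicit rule of this profile.

```python
import struct


class EncodingError(ValueError):
    """Structural (encoding-level) failure; hypothetical name."""


class Reader:
    """Forward-only cursor over a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def take(self, n: int, what: str) -> bytes:
        end = self.pos + n
        if end > len(self.data):
            raise EncodingError(f"truncated {what} at offset {self.pos}")
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

    def u32(self, what: str) -> int:
        return struct.unpack(">I", self.take(4, what))[0]

    def utf8(self, what: str) -> str:
        raw = self.take(self.u32(what + " length"), what)
        try:
            return raw.decode("utf-8")  # strict validation
        except UnicodeDecodeError as exc:
            raise EncodingError(f"invalid UTF-8 in {what}") from exc

    def finish(self) -> None:
        # Assumption: bytes left over after the declared lists are malformed.
        if self.pos != len(self.data):
            raise EncodingError(f"{len(self.data) - self.pos} trailing bytes")
```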
8. Streaming and Implementation Notes
Implementations MUST be able to:
- Encode any Program using a single forward pass over the canonical node order:
  - compute canonical topological order first (requires holding the Program structure),
  - then write fields in the order defined above.
- Decode any ProgramBytes sequentially:
  - no backtracking or multi-pass parsing is required,
  - all length prefixes appear before their content.
For very large Programs:
- Implementations MAY:
  - stream Nodes one by one into an internal representation,
  - stream params_bytes to a buffer or directly into an operation-registry decoder.
- They MUST ensure that any observable behavior (including error reporting) is independent of chunking or I/O strategy: two conformant decoders seeing the same ProgramBytes MUST reconstruct the same logical Program or the same encoding error.
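Informative sketch (non-normative): chunking independence largely comes down to never trusting a single read call to return everything requested. The helper below loops until it has the exact byte count or hits end of stream. The names read_exact, read_params, and OneByteStream are illustrative assumptions; OneByteStream is a test double that simulates a transport delivering one byte per read.

```python
import io
import struct


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping over short reads; raise on early EOF."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise ValueError(f"truncated: wanted {n} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)


def read_params(stream) -> bytes:
    """Read params_len (u32, big-endian), then exactly that many bytes."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    return read_exact(stream, length)


class OneByteStream:
    """Test double: returns at most one byte per read call."""

    def __init__(self, data: bytes) -> None:
        self._inner = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        return self._inner.read(min(n, 1))
```

Decoding the same bytes through io.BytesIO and through OneByteStream gives the same result, and a truncated input fails in the same way in both cases.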
9. Conformance
An implementation is ENC/PEL-PROGRAM-DAG/1–conformant if it:
- Implements ProgramBytes encoding/decoding
  - Encodes and decodes ProgramBytes exactly as defined in §4.
  - Treats program_version = 1 as the only supported version.
  - Treats deviations (unknown version, malformed fields) as encoding errors.
- Respects canonical ordering
  - When encoding Program values, orders nodes in the canonical topological order defined in PEL/PROGRAM-DAG/1.
  - Preserves the logical order of roots.
- Uses canonical field encodings
  - Uses u32 lengths for lists and params as specified.
  - Uses Utf8String for operation names.
  - Uses u8 discriminants and the specified payload layout for DagInput.
- Preserves injectivity and stability
  - Ensures distinct logical Programs (per PEL/PROGRAM-DAG/1) produce distinct ProgramBytes.
  - Ensures the same logical Program consistently produces the same ProgramBytes under this profile.
- Binds to Program Artifacts correctly
  - When forming Program Artifacts for PEL/PROGRAM-DAG/1, sets:
    - Artifact.bytes = ProgramBytes
    - Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
  - Uses ENC/ASL1-CORE v1 and HASH/ASL1 for ASL/1 identity.
Everything else — storage, transport, operation registries, traces — is outside the scope of this encoding profile, provided it does not contradict the requirements above.
10. Informative Example
This example is non-normative and uses abbreviated hex. It illustrates only the field layout, not exact ASCII/UTF-8 bytes.
Consider a tiny Program:
- Nodes:
  - N0:
    - id = 1
    - op = (name = "add64", version = 1)
    - inputs = [ DagInputExternal{input_index = 0}, DagInputExternal{input_index = 1} ]
    - params = empty
  - N1:
    - id = 2
    - op = (name = "mul64", version = 1)
    - inputs = [ DagInputNode{node_id = 1, output_index = 0}, DagInputExternal{input_index = 2} ]
    - params = empty
- Roots:
  - RootRef{ node_id = 2, output_index = 0 }
Canonical topological order:
- Node 1 has only external inputs → first.
- Node 2 depends on Node 1 → second.
ProgramBytes (pseudo-annotated):
program_version = 0001 ; u16 = 1
node_count = 00000002 ; 2 nodes
; Node 0 (id = 1)
node_id = 00000001
op_name = 00000005 "add64" ; length=5, bytes 'a','d','d','6','4'
op_version = 00000001
input_count = 00000002
; input 0: external(0)
kind = 00
input_index = 00000000
; input 1: external(1)
kind = 00
input_index = 00000001
params_len = 00000000 ; empty params
; Node 1 (id = 2)
node_id = 00000002
op_name = 00000005 "mul64"
op_version = 00000001
input_count = 00000002
; input 0: node(1,0)
kind = 01
node_id = 00000001
output_index = 00000000
; input 1: external(2)
kind = 00
input_index = 00000002
params_len = 00000000 ; empty params
root_count = 00000001
; root 0: (node 2, output 0)
node_id = 00000002
output_index = 00000000
These bytes become Artifact.bytes for a Program Artifact with type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1. All conformant encoders under PEL_ENC_PROGRAM_DAG_V1 MUST produce the same byte sequence for this logical Program.
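Informative sketch (non-normative): the annotated layout above can be reproduced with a few lines of Python. All helper names are illustrative assumptions, and the nodes are passed in already sorted into canonical order (node 1, then node 2), so the sketch does not perform the topological sort itself.

```python
import struct


def u32(value: int) -> bytes:
    return struct.pack(">I", value)


def utf8_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    return u32(len(raw)) + raw


def external(input_index: int) -> bytes:
    return b"\x00" + u32(input_index)                     # kind = 0x00


def node_output(node_id: int, output_index: int) -> bytes:
    return b"\x01" + u32(node_id) + u32(output_index)     # kind = 0x01


def node(node_id, op_name, op_version, inputs, params=b"") -> bytes:
    return b"".join([
        u32(node_id), utf8_string(op_name), u32(op_version),
        u32(len(inputs)), *inputs,
        u32(len(params)), params,
    ])


def program(nodes, roots) -> bytes:
    """nodes must already be in canonical topological order."""
    return b"".join([
        struct.pack(">H", 1),                             # program_version
        u32(len(nodes)), *nodes,
        u32(len(roots)), *(u32(n) + u32(o) for n, o in roots),
    ])


program_bytes = program(
    nodes=[
        node(1, "add64", 1, [external(0), external(1)]),
        node(2, "mul64", 1, [node_output(1, 0), external(2)]),
    ],
    roots=[(2, 0)],
)
print(len(program_bytes))
print(program_bytes.hex())
```

Under this sketch the example Program occupies 92 bytes: 6 bytes of header, 35 bytes for node 1, 39 bytes for node 2, and 12 bytes for the roots list.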
End of ENC/PEL-PROGRAM-DAG/1 v0.2.0 — Canonical Encoding for DAG Programs
Document History
- 0.2.0 (2025-11-16): Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.