amduat/tier1/enc-pel-program-dag-1.md
2025-12-20 12:35:10 +01:00

21 KiB
Raw Permalink Blame History

ENC/PEL-PROGRAM-DAG/1 — Canonical Encoding for DAG Programs

Status: Approved Owner: Niklas Rydberg Version: 0.2.0 SoT: Yes Last Updated: 2025-11-16 Linked Phase Pack: N/A Tags: [binary-minimalism, deterministic]

Document ID: ENC/PEL-PROGRAM-DAG/1 Profile ID: PEL_ENC_PROGRAM_DAG_V1 = 0x0101 Layer: Scheme Encoding Profile (on top of ASL/1-CORE + PEL/PROGRAM-DAG/1)

Depends on (normative):

  • ASL/1-CORE v0.4.x — value model (Artifact, TypeTag, Reference, integers, OctetString)
  • ENC/ASL1-CORE v1.0.3 — canonical encoding conventions (integers, OctetString, streaming constraints)
  • PEL/PROGRAM-DAG/1 v0.3.1 — DAG Program scheme (Program / Node model, semantics, canonical topological order)

Integrates with (informative):

  • PEL/1-CORE v0.3.x — primitive execution layer (Exec_s / Exec_DAG)
  • PEL/1-SURF v0.2.x — store-backed execution surface
  • HASH/ASL1 v0.2.4 — reference formation over canonical encodings
  • TypeTag registry (for TYPE_TAG_PEL_PROGRAM_DAG_1)
  • Operation registries (e.g. OPREG/PEL1-KERNEL and param profiles)

Note: The Profile ID PEL_ENC_PROGRAM_DAG_V1 is a configuration label.
It is not embedded in the encoded Program bytes. Selection of this encoding profile is done by context (scheme descriptor, store or engine configuration), not per value.

© 2025 Niklas Rydberg.

License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.


0. Overview

ENC/PEL-PROGRAM-DAG/1 defines the canonical binary encoding of the Program structure defined in PEL/PROGRAM-DAG/1:

Program {
  nodes: list<Node>
  roots: list<RootRef>
}

and its sub-structures:

  • OperationId
  • DagInputExternal, DagInputNode, DagInput
  • Node
  • RootRef

This encoding:

  • is injective with respect to the logical Program model — distinct Programs → distinct byte strings under this profile,
  • is stable and deterministic across implementations and time,
  • is streaming-friendly — encoders and decoders can operate in a single forward pass,
  • fixes a canonical topological ordering for nodes (matching the scheme spec).

The result is used as the payload (Artifact.bytes) of Program Artifacts for the PEL/PROGRAM-DAG/1 scheme, with:

Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
Artifact.bytes    = ProgramBytes

Identity of Program Artifacts is then derived via HASH/ASL1 over ArtifactBytes as usual.


1. Scope & Layering

1.1 Purpose

This specification defines:

  • The concrete binary layout of:

    • ProgramBytes
    • NodeBytes
    • DagInputBytes
    • RootRefBytes
    • OperationIdBytes
  • Canonicalization rules:

    • Node ordering (canonical topological order),
    • fixed field ordering,
    • integer widths and encodings.

It does not define:

  • The logical semantics of Programs (DAG evaluation, error statuses, etc.) — those are in PEL/PROGRAM-DAG/1 and PEL/1-CORE.
  • The ASL/1 Artifact or Reference layouts — those are in ASL/1-CORE and ENC/ASL1-CORE.
  • How Programs are used in store-backed execution — that belongs to PEL/1-SURF.

1.2 Layering constraints

In line with SUBSTRATE/STACK-OVERVIEW:

  • ENC/PEL-PROGRAM-DAG/1 is a scheme-specific encoding profile.

  • It MUST NOT redefine:

    • Artifact, TypeTag, Reference, or HashId (from ASL/1-CORE),
    • the logical Program / Node / DagInput / RootRef model (from PEL/PROGRAM-DAG/1).
  • It is storage-neutral and policy-neutral.

  • It defines exactly one canonical encoding for Program values for this scheme under the profile ID PEL_ENC_PROGRAM_DAG_V1.


2. Conventions

RFC 2119 terms (MUST, SHOULD, MAY, etc.) are normative.

2.1 Integer encodings

All multi-byte integers are encoded as big-endian (network byte order), as in ENC/ASL1-CORE:

  • u8 — 1 byte
  • u16 — 2 bytes
  • u32 — 4 bytes
  • u64 — 8 bytes

Only fixed-width integers are used in this specification.

2.2 Utf8String

This specification defines a canonical Utf8String encoding:

Utf8String = length (u32) || bytes[0..length-1]
  • length is the number of bytes of UTF-8 data.
  • length MAY be zero.
  • Decoders MUST validate that the byte sequence is well-formed UTF-8.
  • There is no padding or terminator.

All strings in this profile (OperationId.name) are encoded as Utf8String.

2.3 Parameter bytes

For operation parameters, this profile uses a compact ParamsBytes encoding:

ParamsBytes = length (u32) || bytes[0..length-1]
  • bytes is an opaque blob whose interpretation is defined per operation in the operation registry.
  • length MAY be zero (empty params).

This differs from the general OctetString encoding (which uses u64 length and is defined in ENC/ASL1-CORE). Using u32 length here is acceptable because this structure lives inside the Program Artifact payload, not as an ASL/1 top-level value.

2.3.1 Parameter profiles and canonicality

When a Program is interpreted under a concrete operation registry + parameter profile set:

  • For each operation (name, version):

    • There MUST be a well-defined abstract parameter model (ParamsValue).

    • There MUST be exactly one canonical encode/decode pair:

      encode_params_op  : ParamsValue -> ParamsBytes
      decode_params_op  : ParamsBytes -> ParamsValue | error
      
  • All conformant implementations of that operation MUST:

    • decode ParamsBytes into the same ParamsValue, or
    • deterministically detect decoding failures.

Kernel parameter profiles (e.g. OPREG/PEL1-KERNEL-PARAMS/1) MUST ensure:

  • encode_params_op followed by decode_params_op round-trips exactly.
  • Any ParamsBytes that fails decode_params_op MUST be treated as a program-level validation error (INVALID_PROGRAM) under PEL/PROGRAM-DAG/1, not as a runtime failure.

2.4 Lists

A list of values of some type T is encoded as:

List<T> = count (u32) || element_0 || element_1 || ... || element_{count-1}
  • count is the number of elements (MAY be zero).
  • Elements are encoded in order, using the canonical encoding of T.

2.5 Encoding version field

ProgramBytes includes a program_version (u16) field:

  • In this profile, program_version MUST be 1.
  • Any future incompatible change to the layout of ProgramBytes under the same profile ID MUST be reflected by a new program_version value (and corresponding decoder support).
  • Adding fields in a backward-compatible, strictly-append-only way SHOULD be done via a new encoding profile rather than overloading program_version = 1.

Decoders MUST reject any program_version they do not implement.


3. Logical Model Reference

For convenience, the logical types from PEL/PROGRAM-DAG/1 are restated informally (normative source remains that document):

OperationId {
  name:    string
  version: uint32
}

DagInputExternal {
  input_index: uint32
}

DagInputNode {
  node_id:      NodeId   // uint32
  output_index: uint32
}

DagInput =
    DagInputExternal
  | DagInputNode

Node {
  id:      NodeId        // uint32
  op:      OperationId
  inputs:  list<DagInput>
  params:  Params        // abstract; serialized as ParamsBytes in this profile
}

RootRef {
  node_id:      NodeId   // uint32
  output_index: uint32
}

Program {
  nodes: list<Node>
  roots: list<RootRef>
}

NodeId = uint32

PEL/PROGRAM-DAG/1 further defines:

  • Structural validity rules (unique NodeIds, acyclicity, etc.).
  • Canonical topological order of Nodes.

This encoding profile assumes the logical model and validity rules as given there. It does not re-check them at the encoding-layer; that is scheme-level responsibility.


4. Program Encoding

4.1 Program header and overall layout

The canonical encoding of a Program value is:

ProgramBytes ::
  program_version (u16)
  node_count      (u32)
  nodes           (NodeBytes[0..node_count-1])
  root_count      (u32)
  roots           (RootRefBytes[0..root_count-1])

Constraints:

  • program_version MUST currently be 1.

    Decoders:

    • MUST accept program_version = 1.
    • MUST reject any other value as an unsupported encoding version.
  • node_count and root_count are the number of elements in the corresponding lists.

  • nodes MUST be encoded in the canonical topological order defined in PEL/PROGRAM-DAG/1 §4. Encoders MUST perform this ordering; decoders MAY rely on it but are not required to re-check DAG properties.

  • roots MUST be encoded in the same order as the logical Program.roots list.

4.2 Node encoding

Each Node is encoded as:

NodeBytes ::
  node_id       (u32)
  op_name       (Utf8String)
  op_version    (u32)
  input_count   (u32)
  inputs        (DagInputBytes[0..input_count-1])
  params_len    (u32)
  params_bytes  (byte[0..params_len-1])

Field meanings:

  1. node_id (u32)

    • Encodes Node.id.
    • MUST be unique across all Nodes in a Program (scheme-level requirement).
  2. op_name (Utf8String)

    • Encodes OperationId.name as UTF-8.
  3. op_version (u32)

    • Encodes OperationId.version.
  4. input_count (u32)

    • Number of input references consumed by this Node.
  5. inputs

    • Exactly input_count entries, each encoded as DagInputBytes (see §4.3) in order.
  6. params_len (u32) and params_bytes

    • params_len = length of the operation-specific parameter blob.
    • params_bytes is an opaque blob whose interpretation is defined by the operations parameter profile.
    • A params_len of 0 encodes an empty parameter blob.

Injectivity requirement (Node) For a fixed interpretation of ParamsBytes and OperationId, distinct logical Node values (differing in any field) MUST produce distinct NodeBytes. Given NodeBytes and the corresponding operation registry, a canonical decoder MUST reconstruct exactly the same logical Node.

4.3 DagInput encoding

Each DagInput is encoded as a tagged union:

DagInputBytes ::
  kind (u8)
  payload(...)

Where kind is:

0x00  => DagInputExternal
0x01  => DagInputNode

4.3.1 DagInputExternal

For:

DagInputExternal {
  input_index: uint32
}

Encoding:

kind         = 0x00
input_index  (u32)

input_index is the 0-based index into the inputs list passed to Exec_DAG.

4.3.2 DagInputNode

For:

DagInputNode {
  node_id:      NodeId   // uint32
  output_index: uint32
}

Encoding:

kind          = 0x01
node_id       (u32)
output_index  (u32)

node_id MUST refer to some Node.id in the same Program (scheme-level validity). output_index is the 0-based index into that Nodes output list.

4.3.3 Decoder behavior

Decoders MUST:

  • Treat any kind value other than 0x00 or 0x01 as an encoding error (invalid DagInput).
  • For kind = 0x00, read exactly one u32 as input_index.
  • For kind = 0x01, read exactly two u32 values (node_id, output_index).
  • Reject truncated encodings (insufficient bytes for the payload).

Structural validity of indices (e.g., node_id existence, output arity) is enforced at the scheme level, not the encoding layer.

4.4 RootRef encoding

For:

RootRef {
  node_id:      NodeId
  output_index: uint32
}

Encoding:

RootRefBytes ::
  node_id      (u32)
  output_index (u32)
  • RootRefBytes is identical to the payload of DagInputNode, but without a kind byte. Roots are always Node outputs, so no variant tag is needed.

  • The roots list in ProgramBytes MUST encode each RootRef in the logical order of Program.roots.

Decoders MUST reject truncated entries (insufficient bytes for both u32 values).


5. Canonicality Requirements

5.1 Node ordering in Program

Encoders MUST:

  • encode Program.nodes in the canonical topological order defined by PEL/PROGRAM-DAG/1 §4:

    • Dependencies appear before dependents.
    • Ties are broken by smallest NodeId (numeric, ascending).
  • ensure that the node_count written in ProgramBytes equals the number of encoded NodeBytes.

Decoders:

  • MAY assume the encoded order corresponds to the canonical topological order.
  • MAY perform additional checks (e.g., verifying acyclicity), but this is not required for basic decoding.

5.2 Field ordering

Field ordering in all structures is fixed and MUST NOT vary:

  • ProgramBytesprogram_version, node_count, nodes…, root_count, roots…
  • NodeBytesnode_id, op_name, op_version, input_count, inputs…, params_len, params_bytes…
  • DagInputByteskind, then variant-specific payload.
  • RootRefBytesnode_id, output_index.

Any deviation MUST be treated as an encoding error.

5.3 Injectivity and stability

The mapping:

Program -> ProgramBytes

defined by this profile MUST be:

  • Injective — if P1 != P2 as logical Program values (per PEL/PROGRAM-DAG/1), then ProgramBytes(P1) != ProgramBytes(P2).

  • Stable — the same logical Program MUST encode to the same ProgramBytes across:

    • different implementations,
    • platforms,
    • executions,
    • and times,

given the same version of this encoding profile and the same underlying operation/param profiles.

Encoders MUST NOT:

  • reorder Nodes other than by the canonical topological order,
  • reorder inputs within a Node,
  • reorder roots,
  • introduce alternative encodings for integers, strings, or params.

6. Program Artifact Binding

6.1 TypeTag

Program Artifacts for this scheme MUST be encoded as:

Artifact {
  bytes    = ProgramBytes
  type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
}

Where:

  • TYPE_TAG_PEL_PROGRAM_DAG_1 is a TypeTag with a concrete tag_id assigned in the global TypeTag registry.

This encoding profile:

  • uses TYPE_TAG_PEL_PROGRAM_DAG_1 symbolically, and
  • does not assign a specific numeric tag_id; that is done in a registry document.

6.2 Identity via ASL/1-CORE and HASH/ASL1

Given ENC/ASL1-CORE v1 as the canonical encoding for Artifact and some chosen ASL1 hash algorithm H (e.g. HASH-ASL1-256 under HashId = 0x0001):

  1. The canonical ArtifactBytes for a Program Artifact is given by ENC/ASL1-CORE v1:

    ArtifactBytes =
      encode_artifact_core_v1(
        Artifact{ bytes = ProgramBytes, type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1 }
      )
    
  2. The canonical Reference for that Artifact under HashId = HID is:

    digest    = H(ArtifactBytes)
    reference = Reference { hash_id = HID, digest = digest }
    

All conformant implementations MUST agree on:

  • ProgramBytes for a given logical Program,
  • ArtifactBytes for the Program Artifact,
  • the resulting Reference for any fixed (HashId, H).

7. Error Handling (Encoding Level)

This encoding profile defines only structural encoding errors. Handling of scheme-level validity errors (INVALID_PROGRAM, etc.) is done by PEL/PROGRAM-DAG/1 and PEL/1-CORE.

Decoders MUST treat as encoding errors:

  1. Truncated fields

    • Not enough bytes to read any declared integer, string, list, or params blob.
  2. Unsupported program_version

    • program_version != 1.
  3. Invalid DagInput.kind

    • kind is not 0x00 or 0x01.
  4. Invalid Utf8String

    • op_name bytes are not valid UTF-8.
  5. Inconsistent list lengths

    • Fewer or more NodeBytes than indicated by node_count.
    • Fewer or more RootRefBytes than indicated by root_count.

These are encoding-layer issues. The exact error codes surfaced to callers (e.g., ERR_PEL_ENC_INVALID) are implementation-specific but MUST result in rejection of the Program bytes as malformed under this encoding profile.


8. Streaming and Implementation Notes

Implementations MUST be able to:

  • Encode any Program using a single forward pass over the canonical node order:

    • compute canonical topological order first (requires holding Program structure),
    • then write fields in the order defined above.
  • Decode any ProgramBytes sequentially:

    • no backtracking or multi-pass parsing is required,
    • all length prefixes appear before their content.

For very large Programs:

  • Implementations MAY:

    • stream Nodes one by one into an internal representation,
    • stream params_bytes to a buffer or directly into an operation-registry decoder.
  • They MUST ensure that any observable behavior (including error reporting) is independent of chunking or I/O strategy: two conformant decoders seeing the same ProgramBytes MUST reconstruct the same logical Program or the same encoding error.


9. Conformance

An implementation is ENC/PEL-PROGRAM-DAG/1conformant if it:

  1. Implements ProgramBytes encoding/decoding

    • Encodes and decodes ProgramBytes exactly as defined in §4.
    • Treats program_version = 1 as the only supported version.
    • Treats deviations (unknown version, malformed fields) as encoding errors.
  2. Respects canonical ordering

    • When encoding Program values, orders nodes in the canonical topological order defined in PEL/PROGRAM-DAG/1.
    • Preserves the logical order of roots.
  3. Uses canonical field encodings

    • Uses u32 lengths for lists and params as specified.
    • Uses Utf8String for operation names.
    • Uses u8 discriminants and the specified payload layout for DagInput.
  4. Preserves injectivity and stability

    • Ensures distinct logical Programs (per PEL/PROGRAM-DAG/1) produce distinct ProgramBytes.
    • Ensures the same logical Program consistently produces the same ProgramBytes under this profile.
  5. Binds to Program Artifacts correctly

    • When forming Program Artifacts for PEL/PROGRAM-DAG/1, sets:

      • Artifact.bytes = ProgramBytes
      • Artifact.type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1
    • Uses ENC/ASL1-CORE v1 and HASH/ASL1 for ASL/1 identity.

Everything else — storage, transport, operation registries, traces — is outside the scope of this encoding profile, provided it does not contradict the requirements above.


10. Informative Example

This example is non-normative and uses abbreviated hex. It illustrates only the field layout, not exact ASCII/UTF-8 bytes.

Consider a tiny Program:

  • Nodes:

    1. N0id = 1

      • op = (name = "add64", version = 1)
      • inputs = [ DagInputExternal{input_index = 0}, DagInputExternal{input_index = 1} ]
      • params = empty
    2. N1id = 2

      • op = (name = "mul64", version = 1)
      • inputs = [ DagInputNode{node_id = 1, output_index = 0}, DagInputExternal{input_index = 2} ]
      • params = empty
  • Roots:

    • RootRef{ node_id = 2, output_index = 0 }

Canonical topological order:

  • Node 1 has only external inputs → first.
  • Node 2 depends on Node 1 → second.

ProgramBytes (pseudo-annotated):

program_version = 0001           ; u16 = 1

node_count      = 00000002       ; 2 nodes

; Node 0 (id = 1)
node_id         = 00000001
op_name         = 00000005 "add64"   ; length=5, bytes 'a','d','d','6','4'
op_version      = 00000001
input_count     = 00000002
  ; input 0: external(0)
  kind          = 00
  input_index   = 00000000
  ; input 1: external(1)
  kind          = 00
  input_index   = 00000001
params_len      = 00000000       ; empty params

; Node 1 (id = 2)
node_id         = 00000002
op_name         = 00000005 "mul64"
op_version      = 00000001
input_count     = 00000002
  ; input 0: node(1,0)
  kind          = 01
  node_id       = 00000001
  output_index  = 00000000
  ; input 1: external(2)
  kind          = 00
  input_index   = 00000002
params_len      = 00000000       ; empty params

root_count      = 00000001
  ; root 0: (node 2, output 0)
  node_id       = 00000002
  output_index  = 00000000

These bytes become Artifact.bytes for a Program Artifact with type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1. All conformant encoders under PEL_ENC_PROGRAM_DAG_V1 MUST produce the same byte sequence for this logical Program.


End of ENC/PEL-PROGRAM-DAG/1 v0.2.0 — Canonical Encoding for DAG Programs


Document History

  • 0.2.0 (2025-11-16): Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.