Carl Niklas Rydberg 4dae7a32d2 Add PEL tier1 specifications

2025-12-20 12:35:10 +01:00

24 KiB

Raw Permalink Blame History

PEL/PROGRAM-DAG/1 — DAG Program Scheme for PEL

Status: Approved Owner: Niklas Rydberg Version: 0.3.1 SoT: Yes Last Updated: 2025-11-16 Linked Phase Pack: N/A Tags: [execution, composition]

Document ID: PEL/PROGRAM-DAG/1 Layer: L1 Scheme Profile (on top of PEL/1-CORE)

Depends on (normative):

ASL/1-CORE v0.4.x — Artifact Substrate (value model)
PEL/1-CORE v0.3.x — Primitive Execution Layer (core semantics)

Integrates with (informative):

PEL/1-SURF v0.2.x — Store-backed execution surface
ENC/PEL-PROGRAM-DAG/1 (planned) — Canonical encoding for DAG programs
PEL/TRACE-DAG/1 (planned) — Node-level execution trace profile
OPREG/PEL1-KERNEL (planned) — Operation registry for DAG ops
SUBSTRATE/STACK-OVERVIEW v0.1.x — Kernel / near-core layering

License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.

0. Conventions

RFC 2119 terms (MUST, MUST NOT, SHOULD, MAY, etc.) are normative.

From ASL/1-CORE:

Artifact {
  bytes:    OctetString
  type_tag: optional TypeTag
}

TypeTag {
  tag_id: uint32
}

Reference {
  hash_id: HashId
  digest:  OctetString
}

HashId = uint16

From PEL/1-CORE (summary):

SchemeRef = Reference

ExecutionStatus = uint8
ExecutionErrorKind = uint8

ExecutionErrorSummary {
  kind:        ExecutionErrorKind
  status_code: uint32
}

DiagnosticEntry {
  code:    uint32
  message: OctetString
}

ExecutionResultValue {
  pel1_version : uint16
  status       : ExecutionStatus
  scheme_ref   : SchemeRef
  summary      : ExecutionErrorSummary
  diagnostics  : list<DiagnosticEntry>
}

PEL/1-CORE defines the following shared enums:

ExecutionStatus {
  OK                = 0
  SCHEME_UNSUPPORTED= 1
  INVALID_PROGRAM   = 2
  INVALID_INPUTS    = 3
  RUNTIME_FAILED    = 4
}

ExecutionErrorKind {
  NONE    = 0
  SCHEME  = 1
  PROGRAM = 2
  INPUTS  = 3
  RUNTIME = 4
}

For this scheme:

Exec_DAG uses only the following ExecutionStatus values:
```
OK
INVALID_PROGRAM
INVALID_INPUTS
RUNTIME_FAILED
```
(SCHEME_UNSUPPORTED is never produced by Exec_DAG; surfaces like PEL/1-SURF handle that before calling the scheme.)
It uses only the following ExecutionErrorKind values:
```
NONE
PROGRAM
INPUTS
RUNTIME
```

The status ↔ kind mapping for this scheme is:

status = OK              -> summary.kind = NONE
status = INVALID_PROGRAM -> summary.kind = PROGRAM
status = INVALID_INPUTS  -> summary.kind = INPUTS
status = RUNTIME_FAILED  -> summary.kind = RUNTIME

summary.status_code is scheme-defined but MUST be deterministic and respect:

status = OK → status_code = 0
status = INVALID_PROGRAM → status_code = 2 (canonical core code)
status = INVALID_INPUTS → status_code = 3 (canonical core code)
status = RUNTIME_FAILED → status_code MUST be non-zero and MUST NOT be 2 or 3 (operation/scheme–defined runtime code).

Exact non-zero runtime codes come from operation registries and/or a future error code profile; this spec only constrains ranges and mapping to ExecutionStatus / ExecutionErrorKind.

1. Purpose & Non-Goals

1.1 Purpose

PEL/PROGRAM-DAG/1 defines a concrete PEL scheme where:

Programs are acyclic computation graphs (DAGs) of operations (Nodes).
Nodes read from:
- external inputs (by index into the inputs list given to Exec_DAG), and
- outputs of other Nodes (by NodeId + output_index).
Programs designate root outputs as the scheme-level outputs.

This profile provides:

A precise program value model: Program, Node, DagInput, RootRef.
Structural validity rules for such programs.
A canonical topological ordering for deterministic evaluation.

A binding of PEL/1-CORE’s Exec_s to a specific function:

Exec_DAG :
  (program: Artifact, inputs: list<Artifact>, params: optional Artifact)
    -> (outputs: list<Artifact>, result: ExecutionResultValue)

Exec_DAG is pure and store-neutral; store-backed execution is handled by PEL/1-SURF.

1.2 Non-Goals

This scheme does not define:

Graph/provenance edges (TGK/1-CORE).
Store semantics (ASL/1-STORE) or surfaces (PEL/1-SURF).
Concrete binary layouts (ENC/PEL-PROGRAM-DAG/1).
Trace artifact formats (PEL/TRACE-DAG/1).
Operation registries (OPREG/PEL1-KERNEL / extensions) or specific operation semantics.

Those are separate profiles layered on top of this scheme.

2. Core Model

2.1 OperationId

Operations are identified by a logical OperationId:

OperationId {
  name:    String     // logical text identifier
  version: uint32     // monotonic per name
}

String here is a logical UTF-8 string; its concrete encoding into bytes is defined by ENC/PEL-PROGRAM-DAG/1.
For a given (name, version) within a given operation registry, semantics MUST be immutable once published. Any semantic change MUST be expressed as a new version.

At this scheme level, OperationId is opaque. A registry profile (e.g. OPREG/PEL1-KERNEL) defines a function:

OpSemantics(name, version) :
  (inputs: list<Artifact>, params: ParamsValue)
    -> Ok(list<Artifact>) | Err(status_code: uint32, diagnostics: list<DiagnosticEntry>)

2.2 Params

Each Node carries params as an opaque parameter blob:

ParamsBytes = OctetString    // raw bytes in the Program encoding
ParamsValue = scheme-defined structured value

Requirements:

For each operation (name, version):
- There MUST exist a well-defined abstract parameter model (ParamsValue).
- There MUST be exactly one canonical encode/decode between ParamsValue and ParamsBytes.
All conformant implementations of that operation MUST:
- decode the same ParamsBytes into the same ParamsValue, or
- deterministically signal a decode failure.

Static vs dynamic errors (guidance):

Static param errors (invalid structure, missing mandatory fields, invalid enums, out-of-range constants) SHOULD be treated as INVALID_PROGRAM.
Dynamic errors that depend on runtime inputs (e.g. “division by zero” because an input argument is zero, even though the param is syntactically valid) MUST be treated as RUNTIME_FAILED and surfaced via status_code and diagnostics.

This spec requires determinism and the status mapping (§0); it only gives guidance on where to draw the static/dynamic line. Parameter profiles MAY formalize this split more strictly.

The concrete placement of ParamsBytes (embedded in the Program vs separate Artifacts referenced from the Program) is defined by ENC/PEL-PROGRAM-DAG/1.

2.3 NodeId

NodeId = uint32

Constraints:

NodeId MUST be unique within a single Program.
When ordering Nodes, NodeId is compared as an unsigned integer (0 < 1 < 2 < … < 2^32 − 1).

2.4 DAG Inputs and Roots

Logical input references inside the Program:

DagInputExternal {
  input_index: uint32   // 0-based index into Exec_DAG.inputs
}

DagInputNode {
  node_id:      NodeId  // must exist in Program.nodes
  output_index: uint32  // 0-based index into that Node's outputs
}

DagInput =
    DagInputExternal
  | DagInputNode

Scheme outputs (what Exec_DAG returns as outputs) are designated by:

RootRef {
  node_id:      NodeId
  output_index: uint32
}

RootRef is interpreted like DagInputNode, but only in the final output selection step.

2.5 Node

A Node is a single operation application:

Node {
  id:      NodeId
  op:      OperationId
  inputs:  list<DagInput>
  params:  ParamsBytes
}

Constraints:

id MUST be unique within the Program.
inputs MAY be empty (nullary operations).
All DagInputNode.node_id MUST refer to some Node.id in Program.nodes.

Semantics:

When evaluated, the Node:
- resolves each DagInput to an input Artifact,
- decodes params into a ParamsValue according to op,
- applies the operation’s pure function to the input Artifacts and decoded params,
- yields zero or more output Artifacts.

The number and meaning of outputs per operation are defined in the operation registry, not here.

2.6 Program

A Program is an acyclic graph:

Program {
  nodes: list<Node>
  roots: list<RootRef>
}

Constraints (structural validity is detailed in §3):

All Node.id are unique.
All DagInputNode.node_id and RootRef.node_id refer to some Node in nodes.
The dependency graph induced by DagInputNode is a DAG (no cycles).

2.7 Program Artifact and TypeTag

At the ASL/1 layer:

A Program Artifact is an Artifact such that:
- type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1 (symbolic constant), and
- bytes encode exactly one Program value under ENC/PEL-PROGRAM-DAG/1.

ENC/PEL-PROGRAM-DAG/1 MUST:

assign a concrete numeric tag_id for TYPE_TAG_PEL_PROGRAM_DAG_1,
define an injective, deterministic, streaming-friendly encoding for Program,
ensure that all conformant engines decode/encode Programs identically.

This scheme assumes such an encoding exists and is canonical; it does not define the layout itself.

3. Structural Validity

A Program is structurally valid for this scheme if and only if all of the following hold:

Unique NodeIds

For all Node A, Node B in nodes:

A.id == B.id  ⇒  A and B are the same Node

Node references are in range
- For every DagInputNode{ node_id, ... } in any Node.inputs:
  - There exists a Node in nodes with id = node_id.
Roots are in range
- For every RootRef{ node_id, ... } in roots:
  - There exists a Node in nodes with id = node_id.
Acyclic dependency graph
- Define a directed graph G:
  - vertices: all NodeId values in nodes,
  - edge from N.id to M.id if N.inputs contains DagInputNode{ node_id = M.id, ... }.
- G MUST be acyclic (a DAG).
- Equivalently, there exists at least one topological ordering of Nodes where all dependencies of a Node appear before it.
OperationId syntax is well-formed
- Each Node.op is syntactically well-formed as:
  - a valid name string (logical UTF-8 as per ENC/PEL-PROGRAM-DAG/1), and
  - a uint32 version.
- Whether (name, version) is defined in the operation registry is an execution-time concern (§5.3).
Parameter profile is available for each Node
- For every Node in nodes:
  - The active operation/parameter registry (e.g. OPREG/PEL1-KERNEL + parameter profiles) MUST provide a parameter-decoding function for that Node’s OperationId.

3.1 Parameter decoding and INVALID_PROGRAM

Program validity depends both on graph structure and on decodability of Node params under the chosen operation registry.

Given:

a Program value,
an operation registry for this scheme, and
a set of parameter profiles,

validation MUST include parameter decoding for every Node:

For each Node in Program.nodes:
- Let op be its OperationId.
- Let params_bytes be its ParamsBytes.
- Let decode_params_op be the param-decoding function associated with op.
The engine MUST conceptually invoke decode_params_op(params_bytes) during Program validation.
If decoding fails for any Node (including any length, range, or structural violations defined by decode_params_op):
- the Program MUST be classified as structurally invalid for this scheme, and
- any execution attempt MUST yield status = INVALID_PROGRAM (independent of the inputs).

Implementations MAY:

eagerly validate all params before any node evaluation, or
lazily decode on first use of a Node’s params,

as long as the observable semantics is:

For a fixed program Artifact, either all Node params are decodable (and execution can proceed), or every call to Exec_DAG on that program yields status = INVALID_PROGRAM, regardless of inputs and params.

A Program Artifact whose bytes cannot be decoded as a Program under ENC/PEL-PROGRAM-DAG/1 is treated as “not a Program” and MUST also produce status = INVALID_PROGRAM.

4. Canonical Topological Order

4.1 Definition

Given a structurally valid Program:

Construct the dependency graph G as in §3.4.
While unplaced Nodes remain:
- Let C be the set of Nodes whose dependencies are already in the order (i.e., all DagInputNode target NodeIds are in the placed set).
- From C, choose the Node with smallest NodeId (numeric ascending).
- Append that Node to the canonical order.
- Mark it as placed, and iterate.

The resulting sequence of Nodes is the canonical topological order and is unique given the Program.

4.2 Use

All references in this spec to:

“Nodes in canonical order”, or
“evaluating Nodes in order”,

refer to this ordering.

Implementations MAY evaluate Nodes in parallel or with different internal scheduling, but the observable behavior (outputs and ExecutionResultValue) MUST be equivalent to evaluating Nodes sequentially in canonical topological order.

5. Scheme Binding to PEL/1-CORE

5.1 Scheme identity

This scheme is identified by a SchemeRef pointing to a scheme descriptor Artifact, denoted:

SchemeRef_DAG_1 : SchemeRef

The descriptor Artifact (its TypeTag, bytes, and meaning) is defined in a separate descriptor document (e.g. PEL/PROGRAM-DAG-DESC/1). All conformant engines MUST agree on:

the descriptor Artifact bytes, and
the resulting Reference (SchemeRef_DAG_1).

In PEL/1-CORE terms, this scheme binds:

SchemeRef = SchemeRef_DAG_1
Exec_s    = Exec_DAG

5.2 Exec_DAG signature

For this scheme, Exec_s is instantiated as:

Exec_DAG(
  program: Artifact,         // Program Artifact
  inputs:  list<Artifact>,   // external input Artifacts
  params:  optional Artifact // optional global parameters
) -> (outputs: list<Artifact>, result: ExecutionResultValue)

Interpretation:

program MUST be an Artifact with type_tag = TYPE_TAG_PEL_PROGRAM_DAG_1 and decodable as a Program. Otherwise, status = INVALID_PROGRAM.
inputs is the list of external inputs referenced by DagInputExternal.input_index.
params, if present, is a scheme-specific Artifact that MAY:
- be interpreted as global configuration, or
- be ignored,
depending on the scheme descriptor.

The scheme MUST be deterministic:

For fixed program.bytes, inputs list, and params.bytes, all conformant implementations of Exec_DAG MUST produce identical outputs and ExecutionResultValue.

5.3 High-level semantics of Exec_DAG

Exec_DAG MUST proceed conceptually in the following steps:

Decode and validate Program
- Attempt to decode program.bytes as a Program under ENC/PEL-PROGRAM-DAG/1.
- If decoding fails:
  - outputs = [].
  - result:
```
pel1_version        = 1
status              = INVALID_PROGRAM
scheme_ref          = SchemeRef_DAG_1
summary.kind        = PROGRAM
summary.status_code = 2
diagnostics         = [ deterministic diagnostic entries ]
```
  - Return.
- If decoding succeeds but the Program violates any structural validity rule (§3):
  - Same as above: status = INVALID_PROGRAM.
Resolve operation semantics and validate params
- For each Node.op in Program.nodes, resolve (Node.op.name, Node.op.version) against the operation registry for this scheme.
- If any OperationId used in the Program is not defined:
  - Treat the Program as invalid:
    - outputs = [].
    - result as INVALID_PROGRAM (status_code = 2) with diagnostics indicating unknown operations.
  - Return.
- For each Node:
  - Decode its ParamsBytes using the parameter profile for its OperationId.
  - If decoding fails:
    - Treat as INVALID_PROGRAM:
      - outputs = [].
      - result as INVALID_PROGRAM (status_code = 2) with appropriate diagnostics.
    - Return.
  - Otherwise, conceptually store the decoded ParamsValue for use in evaluation.
Evaluate Nodes in canonical order
- Compute the canonical topological order (§4) of Nodes.
- Maintain:
```
node_outputs : NodeId -> list<Artifact>
```
- For each Node N in canonical order:
  1. Resolve Node inputs
    - For each DagInput in N.inputs:
      - If DagInputExternal{ input_index }:
        
        If input_index >= len(inputs):
        
        This is an input error:
        
        outputs = [].
        
        result:
        
        pel1_version = 1 status = INVALID_INPUTS scheme_ref = SchemeRef_DAG_1 summary.kind = INPUTS summary.status_code = 3 diagnostics = [ deterministic diagnostic for missing input_index ]
        
        Return.
        
        Otherwise use inputs[input_index] as that input Artifact.
      - If DagInputNode{ node_id, output_index }:
        
        Let outs = node_outputs[node_id] (must exist because canonical order ensures dependencies are evaluated first).
        
        If output_index >= len(outs):
        
        This is a program error (operation output arity mismatch):
        
        outputs = [].
        
        result as INVALID_PROGRAM (status_code = 2) with suitable diagnostics.
        
        Return.
    - Collect the resolved input Artifacts into node_input_artifacts.
  2. Apply operation semantics
    - Let params_value be the decoded ParamsValue for Node N.
    - Invoke the operation semantics:
```
Op(N.op)(
  node_input_artifacts: list<Artifact>,
  params_value: ParamsValue
) -> Ok(list<Artifact>) | Err(status_code: uint32, diagnostics: list<DiagnosticEntry>)
```
    - Implementation requirements:
      - Op(N.op) MUST be pure and deterministic.
      - It MUST NOT perform I/O, consult time, or depend on mutable global state.
    - If Ok(outputs_for_node):
      - Set node_outputs[N.id] = outputs_for_node.
    - If Err(status_code, diag):
      - This is a runtime failure; Exec_DAG MUST:
        
        Set outputs = [] (this scheme is fail-fast; it does not define partial outputs).
        
        Set result:
        
        pel1_version = 1 status = RUNTIME_FAILED scheme_ref = SchemeRef_DAG_1 summary.kind = RUNTIME summary.status_code = status_code // non-zero, operation-defined diagnostics = diag // or diag plus additional scheme diagnostics
      - No further Nodes are evaluated.
      - Return.
Compute scheme outputs (success case)
- If all Nodes have been evaluated successfully (no early return):
  - Initialise outputs = [].
  - For each RootRef in Program.roots (in order):
    - Let outs = node_outputs[root.node_id].
    - If root.output_index >= len(outs):
      - Treat as INVALID_PROGRAM:
        
        outputs = [].
        
        result as in step 1 with suitable diagnostics.
      - Return.
    - Otherwise, append outs[root.output_index] to outputs.
  - When all roots resolve successfully:
    - outputs is the list of root Artifacts in Program.roots order.
    - result:
```
pel1_version        = 1
status              = OK
scheme_ref          = SchemeRef_DAG_1
summary.kind        = NONE
summary.status_code = 0
diagnostics         = []
```
Return
- Return (outputs, result).

5.4 Determinism guarantees

For fixed:

program.bytes,
inputs list (sequence and values),
params presence and params.bytes,

and given:

a fixed canonical encoding profile (ENC/PEL-PROGRAM-DAG/1),
a fixed operation registry (OPREG/PEL1-KERNEL + extension registries) for this scheme,

all conformant implementations of Exec_DAG MUST produce:

identical logical outputs (as Artifact values), and
identical ExecutionResultValue payloads.

No implementation-specific identifiers, timestamps, or environment details may appear in result or in the contents of outputs.

6. Interaction with PEL/1-SURF (Informative)

When this scheme is used with PEL/1-SURF:

The surface layer provides program_ref, input_refs, params_ref and a StoreInstance.
It resolves those references to program, inputs, params via ASL/1-STORE get.
It invokes Exec_DAG(program, inputs, params) as defined here.
It persists:
- each element of outputs via put, producing output_refs, and
- a surface ExecutionResult artifact containing ExecutionResultValue plus context.

Store-resolution failures (ERR_NOT_FOUND, ERR_INTEGRITY, ERR_UNSUPPORTED) are handled by PEL/1-SURF, which maps them to INVALID_PROGRAM / INVALID_INPUTS with store_failure metadata, without calling Exec_DAG.

Exec_DAG only sees Programs and input/param Artifacts that have successfully been loaded from the store.

7. Interaction with Trace Profiles (Informative)

A trace profile such as PEL/TRACE-DAG/1 MAY:

Capture per-node inputs, outputs, and error states.
Encode these in a TraceValue Artifact with its own TypeTag.
Specify that Exec_DAG (or the surface) produces a trace Artifact as one of the outputs, and that PEL/1-SURF exposes its Reference as trace_ref in surface results.

PEL/PROGRAM-DAG/1 itself does not mandate traces; it only ensures evaluation order and deterministic behavior so that a trace profile can be layered cleanly.

8. Conformance

An implementation is PEL/PROGRAM-DAG/1–conformant if it:

Program handling
- Correctly decodes and encodes Program values according to this spec and ENC/PEL-PROGRAM-DAG/1.
- Checks structural validity as in §3, including parameter decodability.
- Treats structurally invalid or undecodable Programs as INVALID_PROGRAM in Exec_DAG.
Scheme binding
- Recognizes SchemeRef_DAG_1 as the scheme identifier.
- Implements Exec_DAG exactly as in §5 for this scheme.
- Ensures that result.scheme_ref = SchemeRef_DAG_1 and result.pel1_version = 1.
Operation registry integration
- Uses a registry for (OperationId -> OpSemantics) where:
  - each OpSemantics is pure and deterministic,
  - param encodings are canonical and deterministic,
  - runtime errors are mapped to non-zero status_codes and deterministic diagnostics.
- Treats unknown OperationId values in a Program as INVALID_PROGRAM.
Status mapping
- Produces ExecutionResultValue.status only from:
  - OK,
  - INVALID_PROGRAM,
  - INVALID_INPUTS,
  - RUNTIME_FAILED.
- Ensures summary.kind and summary.status_code follow the mapping in §0 and §5.3.
Determinism and purity
- Satisfies PEL/1-CORE’s determinism contract:
  - For fixed inputs, all conformant implementations produce identical outputs and ExecutionResultValue.
- Does not consult time, randomness, external I/O, or mutable global state as part of Exec_DAG semantics.
Layering
- Does not depend on any particular store, graph, certification, or overlay behavior.
- Treats store resolution, provenance, certification, and overlays as responsibilities of other specs (PEL/1-SURF, TGK/1-CORE, CIL/1, FER/1, FCT/1).

End of PEL/PROGRAM-DAG/1 v0.3.1 — DAG Program Scheme for PEL

Document History

0.3.1 (2025-11-16): Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.

24 KiB Raw Permalink Blame History Unescape Escape