amduat/tier1/enc-pel-trace-dag-1.md
2025-12-20 12:35:10 +01:00


ENC/PEL-TRACE-DAG/1 — Canonical Encoding for DAG Execution Traces

Status: Approved
Owner: Niklas Rydberg
Version: 0.1.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [binary-minimalism, traceability]

Document ID: ENC/PEL-TRACE-DAG/1
Profile ID: PEL_ENC_TRACE_DAG_V1 = 0x0102
Layer: Scheme Encoding Profile (Trace)

Depends on (normative):

  • ASL/1-CORE v0.3.2 — value model (Artifact, TypeTag, Reference, integers, OctetString)
  • ENC/ASL1-CORE v1.0.3 — canonical encodings for Artifact and Reference
  • PEL/1-CORE v0.1.0 — primitive execution layer (ExecutionStatus, ExecutionErrorSummary)
  • PEL/PROGRAM-DAG/1 v0.2.0 — DAG Program scheme (Program, Node, NodeId, canonical node order)
  • PEL/TRACE-DAG/1 v0.1.0 — DAG execution trace profile (logical data model)

Integrates with (informative):

  • PEL/1-SURF v0.1.0 — store-backed execution surface
  • HASH/ASL1 v0.2.3 — ASL1 hash family for trace artifact identity
  • TypeTag registry (for TYPE_TAG_PEL_TRACE_DAG_1)

The Profile ID PEL_ENC_TRACE_DAG_V1 is a configuration label.
It is not embedded into trace payloads. Encoders and decoders select this encoding profile by context (scheme descriptor, engine/store configuration), not per value.

© 2025 Niklas Rydberg.

License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.


0. Overview

ENC/PEL-TRACE-DAG/1 defines the canonical binary encoding of the TraceDAGValue structure specified in PEL/TRACE-DAG/1, and all of its nested components:

  • TraceDAGValue
  • NodeTraceDAG
  • DiagnosticEntry

This encoding:

  • is injective — distinct logical trace values → distinct byte strings;
  • is stable and deterministic — same value → same bytes across implementations and time;
  • is streaming-friendly — encoders/decoders can operate in a single forward pass;
  • embeds ASL/1 Reference values using their canonical ReferenceBytes encoding (ENC/ASL1-CORE v1) inside length-prefixed frames.

The encoded payload (TraceDAGBytes) is used as the bytes field of a trace Artifact for the PEL/TRACE-DAG/1 profile, with:

Artifact.type_tag = TYPE_TAG_PEL_TRACE_DAG_1
Artifact.bytes    = TraceDAGBytes

Trace Artifact identity is then derived by hashing canonical ArtifactBytes with "ASL1" hash algorithms (typically HASH-ASL1-256).


1. Scope & Layering

1.1 Purpose

This specification defines:

  • The binary layout of:

    • TraceDAGBytes
    • NodeTraceDAGBytes
    • DiagnosticEntryBytes
    • An internal “encoded Reference” wrapper for use inside trace payloads
  • The canonical field ordering, integer widths, and list framing.

It does not define:

  • The logical semantics of traces — those are in PEL/TRACE-DAG/1.
  • The ASL/1 Artifact / Reference encodings — these are in ENC/ASL1-CORE v1.0.3.
  • How traces are produced or when they are enabled — that is governed by PEL/1-CORE, PEL/1-SURF, and policy profiles.

1.2 Layering constraints

In line with SUBSTRATE/STACK-OVERVIEW:

  • ENC/PEL-TRACE-DAG/1 is a scheme-specific encoding profile.

  • It MUST NOT redefine:

    • Artifact, TypeTag, Reference, HashId (ASL/1-CORE),
    • the TraceDAGValue logical structure (PEL/TRACE-DAG/1).
  • It is storage-neutral and policy-neutral.

  • It defines exactly one canonical encoding for TraceDAGValue values in this scheme.


2. Conventions

The key words MUST, MUST NOT, SHOULD, MAY, etc. are to be interpreted as described in RFC 2119.

2.1 Integer encodings

All multi-byte integers are encoded as big-endian (network byte order), as in ENC/ASL1-CORE:

  • u8 — 1 byte
  • u16 — 2 bytes
  • u32 — 4 bytes
  • u64 — 8 bytes

Only fixed-width integers are used in this specification.
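
As an illustration of the fixed-width, big-endian rule, the following minimal Python sketch maps the four integer kinds onto the standard `struct` module. The helper names `encode_uint` and `decode_uint` are hypothetical and not part of this specification.

```python
import struct
from typing import Tuple

# Big-endian, fixed-width formats matching u8/u16/u32/u64 in this section.
_FORMATS = {"u8": ">B", "u16": ">H", "u32": ">I", "u64": ">Q"}


def encode_uint(kind: str, value: int) -> bytes:
    """Encode an unsigned integer; struct.error is raised if it does not fit."""
    return struct.pack(_FORMATS[kind], value)


def decode_uint(kind: str, data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset); raise ValueError on truncated input."""
    fmt = _FORMATS[kind]
    size = struct.calcsize(fmt)
    if offset + size > len(data):
        raise ValueError(f"truncated {kind} at offset {offset}")
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + size
```

For example, `encode_uint("u16", 1)` yields the two bytes `00 01`, which is how `pel1_version = 1` appears on the wire.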

2.2 Lists

A list of values of some type T is encoded as:

List<T> =
  count   (u32)
  element_0
  element_1
  ...
  element_{count-1}

  • count is the number of elements (MAY be zero).
  • Elements are encoded in order, using the canonical encoding for T.
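
The list framing can be sketched generically in Python as below. The function names and the element codec signature (`decode_item` returning a value and the next offset) are assumptions made for this illustration only.

```python
import struct
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def encode_list(items: Sequence[T], encode_item: Callable[[T], bytes]) -> bytes:
    """Encode count (u32, big-endian) followed by each element in order."""
    parts = [struct.pack(">I", len(items))]
    parts.extend(encode_item(item) for item in items)
    return b"".join(parts)


def decode_list(
    data: bytes,
    offset: int,
    decode_item: Callable[[bytes, int], Tuple[T, int]],
) -> Tuple[List[T], int]:
    """Decode a List<T>; decode_item returns (value, next_offset)."""
    if offset + 4 > len(data):
        raise ValueError("truncated list count")
    (count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    items: List[T] = []
    for _ in range(count):
        item, offset = decode_item(data, offset)
        items.append(item)
    return items, offset


# Example element codec used only for demonstration: one u32 per element.
def encode_u32(value: int) -> bytes:
    return struct.pack(">I", value)


def decode_u32(data: bytes, offset: int) -> Tuple[int, int]:
    if offset + 4 > len(data):
        raise ValueError("truncated u32 element")
    return struct.unpack_from(">I", data, offset)[0], offset + 4
```

An empty list encodes as the four bytes `00 00 00 00`. A count that promises more elements than the input holds surfaces as an error from the element decoder.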

2.3 UTF-8 string

Utf8String is encoded as:

Utf8String =
  length (u32)
  bytes[0..length-1]

  • length is the number of bytes.
  • bytes MUST be well-formed UTF-8.
  • There is no terminator or padding.
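
A minimal Python sketch of Utf8String follows. It relies on Python's strict UTF-8 codec to reject ill-formed byte sequences; the function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_utf8_string(text: str) -> bytes:
    """Encode as u32 byte length followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")  # strict: raises on unencodable code points
    return struct.pack(">I", len(raw)) + raw


def decode_utf8_string(data: bytes, offset: int = 0) -> Tuple[str, int]:
    """Return (text, next_offset); raise ValueError on malformed input."""
    if offset + 4 > len(data):
        raise ValueError("truncated Utf8String length")
    (length,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    end = start + length
    if end > len(data):
        raise ValueError("truncated Utf8String bytes")
    try:
        text = data[start:end].decode("utf-8")  # strict decoding
    except UnicodeDecodeError as exc:
        raise ValueError("Utf8String is not well-formed UTF-8") from exc
    return text, end
```

Note that length counts bytes, not characters: `"add64"` is prefixed with `00 00 00 05`, while a single `"é"` is prefixed with `00 00 00 02`.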

2.4 Octet blob (32-bit length)

For diagnostics and other opaque fields, we use a generic blob:

Blob32 =
  length (u32)
  bytes[0..length-1]

bytes is an arbitrary OctetString; interpretation is profile-specific. length MAY be zero.

2.5 Embedded Reference

Within this encoding, Reference values are embedded using a length-prefixed wrapper over canonical ReferenceBytes from ENC/ASL1-CORE v1.0.3.

We define:

EncodedRef =
  ref_len   (u32)
  ref_bytes (byte[0..ref_len-1])  // canonical ReferenceBytes

Where:

  • ref_bytes MUST be the canonical ReferenceBytes encoding for a Reference value, as defined in ENC/ASL1-CORE v1.0.3:

    ReferenceBytes ::
      hash_id (u16)
      digest  (byte[...])  // remaining bytes
    
  • ref_len MUST be the exact length (in bytes) of ref_bytes and MUST be ≥ 2.

Decoders MUST:

  • Read ref_len (u32), then ref_bytes[0..ref_len-1].

  • Decode ref_bytes as ReferenceBytes per ENC/ASL1-CORE v1.0.3.

  • Reject encodings where:

    • ref_len < 2, or
    • ref_bytes is not a valid ReferenceBytes sequence (e.g., truncated).
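
The decoder steps above can be sketched in Python as follows. This sketch only splits ref_bytes into hash_id and digest; any further validation that ENC/ASL1-CORE requires (for example, digest length per hash_id) is out of scope here, and the function names are hypothetical.

```python
import struct
from typing import Tuple

MIN_REF_LEN = 2  # ReferenceBytes carries at least the u16 hash_id


def encode_ref(reference_bytes: bytes) -> bytes:
    """Wrap canonical ReferenceBytes in a u32 length prefix."""
    if len(reference_bytes) < MIN_REF_LEN:
        raise ValueError("ReferenceBytes must be at least 2 bytes")
    return struct.pack(">I", len(reference_bytes)) + reference_bytes


def decode_ref(data: bytes, offset: int = 0) -> Tuple[int, bytes, int]:
    """Return (hash_id, digest, next_offset); raise ValueError if invalid."""
    if offset + 4 > len(data):
        raise ValueError("truncated ref_len")
    (ref_len,) = struct.unpack_from(">I", data, offset)
    if ref_len < MIN_REF_LEN:
        raise ValueError("ref_len < 2")
    start = offset + 4
    end = start + ref_len
    if end > len(data):
        raise ValueError("truncated ref_bytes")
    (hash_id,) = struct.unpack_from(">H", data, start)
    digest = data[start + 2:end]  # remaining bytes of ReferenceBytes
    return hash_id, digest, end
```

With an illustrative reference of hash_id `0x0001` and a 32-byte digest (both values chosen only for the example), the EncodedRef is 38 bytes long and starts with `00 00 00 22`.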

2.5.1 Optional EncodedRef

For optional Reference fields, we use:

OptionalEncodedRef =
  has_ref (u8)
  [ EncodedRef ]    // only if has_ref = 0x01

  • has_ref = 0x00 → no value present; no EncodedRef follows.
  • has_ref = 0x01 → exactly one EncodedRef follows.

Other has_ref values MUST be treated as encoding errors.
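
A minimal Python sketch of the presence flag handling, with `None` standing in for an absent Reference. The function names are hypothetical, and the reference is handled as raw ReferenceBytes without further validation.

```python
import struct
from typing import Optional, Tuple


def encode_optional_ref(reference_bytes: Optional[bytes]) -> bytes:
    """Encode has_ref (u8) and, if present, the length-prefixed reference."""
    if reference_bytes is None:
        return b"\x00"
    if len(reference_bytes) < 2:
        raise ValueError("ReferenceBytes must be at least 2 bytes")
    return b"\x01" + struct.pack(">I", len(reference_bytes)) + reference_bytes


def decode_optional_ref(data: bytes, offset: int = 0) -> Tuple[Optional[bytes], int]:
    """Return (ref_bytes or None, next_offset); raise ValueError if invalid."""
    if offset >= len(data):
        raise ValueError("truncated has_ref")
    has_ref = data[offset]
    offset += 1
    if has_ref == 0x00:
        return None, offset
    if has_ref != 0x01:
        raise ValueError(f"invalid has_ref value 0x{has_ref:02x}")
    if offset + 4 > len(data):
        raise ValueError("truncated ref_len")
    (ref_len,) = struct.unpack_from(">I", data, offset)
    if ref_len < 2:
        raise ValueError("ref_len < 2")
    start = offset + 4
    end = start + ref_len
    if end > len(data):
        raise ValueError("truncated ref_bytes")
    return data[start:end], end
```

Restricting the flag to exactly two values keeps the encoding injective: there is only one way to write "absent" and one way to write "present".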


3. Logical Model Reference

This section restates the logical structures from PEL/TRACE-DAG/1 (source of truth) in condensed form.

3.1 DiagnosticEntry

DiagnosticEntry {
  code:    uint32        // diagnostic or error code
  message: OctetString   // typically UTF-8 text; interpretation is profile-specific
}

3.2 NodeTraceStatus

NodeTraceStatus = uint8

NodeTraceStatus {
  NODE_OK      = 0
  NODE_FAILED  = 1
  NODE_SKIPPED = 2
}

3.3 NodeTraceDAG

NodeTraceDAG {
  node_id:     NodeId      // uint32
  op_name:     string
  op_version:  uint32

  status:      NodeTraceStatus
  status_code: uint32      // 0 = success; non-zero = op-specific failure code

  output_refs: list<Reference>
  diagnostics: list<DiagnosticEntry>
}

3.4 TraceDAGValue

TraceDAGValue {
  pel1_version:   uint16        // MUST be 1 for this version
  scheme_ref:     Reference     // SchemeRef_DAG_1
  program_ref:    Reference     // Program Artifact reference

  status:         ExecutionStatus
  summary:        ExecutionErrorSummary {
                     kind:        ExecutionErrorKind
                     status_code: uint32
                   }

  exec_result_ref: optional Reference

  input_refs:     list<Reference>
  params_ref:     optional Reference

  node_traces:    list<NodeTraceDAG>   // one per Node in canonical node order
}

All semantics (how these fields are populated for different run outcomes) are defined in PEL/TRACE-DAG/1.


4. Encoding

4.1 DiagnosticEntry encoding

Logical:

DiagnosticEntry {
  code:    uint32
  message: OctetString
}

Canonical encoding:

DiagnosticEntryBytes ::
  code    (u32)
  message (Blob32)

Where Blob32 is as defined in §2.4.

  • code (u32) encodes the diagnostic or error code.
  • message is an opaque byte blob; profile users MAY agree to use UTF-8 here, but this encoding does not enforce it.

Decoders MUST:

  • Read code (u32).
  • Read message as Blob32.
  • Treat truncated blobs as encoding errors.
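
The DiagnosticEntryBytes layout is small enough to sketch in a few lines of Python. The message is kept as raw bytes because this encoding does not enforce UTF-8; the function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_diagnostic(code: int, message: bytes) -> bytes:
    """Encode code (u32) followed by message as Blob32 (u32 length + bytes)."""
    return struct.pack(">II", code, len(message)) + message


def decode_diagnostic(data: bytes, offset: int = 0) -> Tuple[int, bytes, int]:
    """Return (code, message, next_offset); raise ValueError if truncated."""
    if offset + 8 > len(data):
        raise ValueError("truncated diagnostic code or message length")
    code, length = struct.unpack_from(">II", data, offset)
    start = offset + 8
    end = start + length
    if end > len(data):
        raise ValueError("truncated diagnostic message")
    return code, data[start:end], end
```

For instance, code 7 with the message bytes `oops` encodes to `00 00 00 07 00 00 00 04` followed by the four message bytes.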

4.2 NodeTraceDAG encoding

Logical:

NodeTraceDAG {
  node_id:     NodeId
  op_name:     string
  op_version:  uint32

  status:      NodeTraceStatus
  status_code: uint32

  output_refs: list<Reference>
  diagnostics: list<DiagnosticEntry>
}

Canonical encoding:

NodeTraceDAGBytes ::
  node_id          (u32)
  op_name          (Utf8String)
  op_version       (u32)

  status           (u8)    // NodeTraceStatus
  status_code      (u32)

  output_ref_count (u32)
  output_refs      (EncodedRef[0..output_ref_count-1])

  diag_count       (u32)
  diagnostics      (DiagnosticEntryBytes[0..diag_count-1])

Field semantics:

  1. node_id (u32)

    • Encodes NodeTraceDAG.node_id.
  2. op_name (Utf8String)

    • Encodes NodeTraceDAG.op_name as UTF-8 (see §2.3).
  3. op_version (u32)

    • Encodes NodeTraceDAG.op_version.
  4. status (u8)

    • Encodes NodeTraceStatus:

      0x00 -> NODE_OK
      0x01 -> NODE_FAILED
      0x02 -> NODE_SKIPPED
      
    • Other values MUST be treated as encoding errors.

  5. status_code (u32)

    • MUST be:

      • 0 if status = NODE_OK or NODE_SKIPPED.
      • non-zero if status = NODE_FAILED.
    • Conformance to these rules is a semantic requirement, not a decoding requirement. Decoders MAY choose to validate and reject inconsistent encodings.

  6. output_ref_count (u32) and output_refs

    • Number of output references for this node.
    • Each entry is an EncodedRef (§2.5).
  7. diag_count (u32) and diagnostics

    • Number of diagnostic entries.
    • Each entry is encoded using DiagnosticEntryBytes (§4.1).
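
Putting the seven fields together, a NodeTraceDAGBytes encoder might look like the Python sketch below. The `NodeTrace` dataclass and function names are hypothetical; output references are assumed to be supplied as canonical ReferenceBytes. Checking the status_code rule on the encoder side is a choice made in this sketch, not something the encoding layer requires.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

NODE_OK, NODE_FAILED, NODE_SKIPPED = 0, 1, 2


@dataclass
class NodeTrace:
    node_id: int
    op_name: str
    op_version: int
    status: int
    status_code: int
    output_refs: List[bytes] = field(default_factory=list)  # ReferenceBytes
    diagnostics: List[Tuple[int, bytes]] = field(default_factory=list)  # (code, message)


def _u32(value: int) -> bytes:
    return struct.pack(">I", value)


def encode_node_trace(trace: NodeTrace) -> bytes:
    """Encode one NodeTraceDAG in the field order given above."""
    if trace.status not in (NODE_OK, NODE_FAILED, NODE_SKIPPED):
        raise ValueError("invalid NodeTraceStatus")
    # Semantic rule: status_code is non-zero exactly when the node failed.
    if (trace.status == NODE_FAILED) != (trace.status_code != 0):
        raise ValueError("status_code inconsistent with status")
    name = trace.op_name.encode("utf-8")
    parts = [
        _u32(trace.node_id),
        _u32(len(name)), name,             # op_name as Utf8String
        _u32(trace.op_version),
        struct.pack(">B", trace.status),
        _u32(trace.status_code),
        _u32(len(trace.output_refs)),      # output_ref_count
    ]
    for ref in trace.output_refs:
        if len(ref) < 2:
            raise ValueError("ReferenceBytes must be at least 2 bytes")
        parts += [_u32(len(ref)), ref]     # EncodedRef
    parts.append(_u32(len(trace.diagnostics)))  # diag_count
    for code, message in trace.diagnostics:
        parts += [_u32(code), _u32(len(message)), message]
    return b"".join(parts)
```

Both counts are always written, even when zero, so a node with no outputs and no diagnostics still ends in eight zero bytes.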

4.2.1 Canonical ordering of node_traces

In TraceDAGBytes, the node_traces list (see §4.4) MUST:

  • contain exactly one NodeTraceDAG for each Program Node,
  • appear in the canonical node order defined by PEL/PROGRAM-DAG/1 (canonical topological order with NodeId as tie-breaker).

Encoders MUST enforce this; decoders MAY assume it.


4.3 Summary and status encoding

From TraceDAGValue:

summary: ExecutionErrorSummary {
  kind:        ExecutionErrorKind
  status_code: uint32
}
status:  ExecutionStatus

These are encoded in TraceDAGBytes as:

status              (u8)   // ExecutionStatus
summary_kind        (u8)   // ExecutionErrorKind
summary_status_code (u32)

The exact value sets of ExecutionStatus and ExecutionErrorKind are defined in PEL/1-CORE and PEL/TRACE-DAG/1; this spec treats them as small enums in u8 space.

Decoders MUST:

  • Read status (u8), summary_kind (u8), summary_status_code (u32) as raw fields.
  • Treat status and summary_kind values outside the agreed ranges as encoding errors or map them to an “unknown” variant at the semantic layer.
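
These three fields occupy six bytes in total. A minimal Python sketch that reads and writes them as raw values, leaving range checks to the caller since the value sets are defined elsewhere, could be (function names are hypothetical):

```python
import struct
from typing import Tuple


def encode_status_summary(status: int, summary_kind: int, summary_status_code: int) -> bytes:
    """Pack status (u8), summary_kind (u8), summary_status_code (u32)."""
    return struct.pack(">BBI", status, summary_kind, summary_status_code)


def decode_status_summary(data: bytes, offset: int = 0) -> Tuple[int, int, int, int]:
    """Return (status, summary_kind, summary_status_code, next_offset)."""
    if offset + 6 > len(data):
        raise ValueError("truncated status/summary fields")
    status, kind, code = struct.unpack_from(">BBI", data, offset)
    return status, kind, code, offset + 6
```

The `>` prefix in the format string disables alignment padding, so the packed result is exactly 1 + 1 + 4 bytes.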

4.4 TraceDAGValue encoding

Logical:

TraceDAGValue {
  pel1_version:   uint16
  scheme_ref:     Reference
  program_ref:    Reference

  status:         ExecutionStatus
  summary:        ExecutionErrorSummary

  exec_result_ref: optional Reference

  input_refs:     list<Reference>
  params_ref:     optional Reference

  node_traces:    list<NodeTraceDAG>
}

Canonical encoding:

TraceDAGBytes ::
  pel1_version        (u16)

  scheme_ref          (EncodedRef)
  program_ref         (EncodedRef)

  status              (u8)      // ExecutionStatus
  summary_kind        (u8)      // ExecutionErrorKind
  summary_status_code (u32)     // ExecutionErrorSummary.status_code

  has_exec_result     (u8)
  [ exec_result       (EncodedRef) ]   // if has_exec_result == 0x01

  input_ref_count     (u32)
  input_refs          (EncodedRef[0..input_ref_count-1])

  has_params_ref      (u8)
  [ params_ref        (EncodedRef) ]   // if has_params_ref == 0x01

  node_trace_count    (u32)
  node_traces         (NodeTraceDAGBytes[0..node_trace_count-1])

Field semantics:

  1. pel1_version (u16)

    • MUST be 1 for traces produced under PEL/TRACE-DAG/1 v0.1.0.

    • Decoders:

      • MUST accept pel1_version = 1.
      • MUST treat other values as encoding errors for this profile revision.
  2. scheme_ref (EncodedRef)

    • Encodes the Reference to the scheme descriptor; for this profile MUST be SchemeRef_DAG_1.
  3. program_ref (EncodedRef)

    • Encodes the Reference of the Program Artifact executed in this run.
  4. status (u8), summary_kind (u8), summary_status_code (u32)

    • As described in §4.3.
  5. has_exec_result (u8) and exec_result (EncodedRef)

    • Encodes exec_result_ref : optional Reference:

      • has_exec_result = 0x00 → absent, no exec_result bytes follow.
      • has_exec_result = 0x01 → exactly one EncodedRef follows, encoding exec_result_ref.
      • Other values MUST be treated as encoding errors.
  6. input_ref_count (u32) and input_refs (EncodedRef[..])

    • Number and encoded values of the input_refs list.
    • Encodes TraceDAGValue.input_refs in order.
  7. has_params_ref (u8) and params_ref (EncodedRef)

    • Encodes params_ref : optional Reference:

      • has_params_ref = 0x00 → absent.
      • has_params_ref = 0x01 → present, encoded as EncodedRef.
      • Other values MUST be treated as encoding errors.
  8. node_trace_count (u32) and node_traces (NodeTraceDAGBytes[..])

    • Number and encoded values of node_traces.
    • Canonical requirement: encoders MUST set node_trace_count equal to the number of Nodes in the Program (for runs where at least one node is attempted), and MUST encode node traces in canonical node order (§4.2.1).
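
The top-level layout can be sketched in Python as below. This is an illustration under stated assumptions: references are passed as canonical ReferenceBytes, node traces are passed as already-encoded NodeTraceDAGBytes in canonical node order, and the function and parameter names are hypothetical. The sketch does not check that scheme_ref equals SchemeRef_DAG_1 or that node_trace_count matches the Program.

```python
import struct
from typing import List, Optional


def _u32(value: int) -> bytes:
    return struct.pack(">I", value)


def _encoded_ref(ref_bytes: bytes) -> bytes:
    if len(ref_bytes) < 2:
        raise ValueError("ReferenceBytes must be at least 2 bytes")
    return _u32(len(ref_bytes)) + ref_bytes


def _optional_ref(ref_bytes: Optional[bytes]) -> bytes:
    return b"\x00" if ref_bytes is None else b"\x01" + _encoded_ref(ref_bytes)


def encode_trace(
    *,
    scheme_ref: bytes,
    program_ref: bytes,
    status: int,
    summary_kind: int,
    summary_status_code: int,
    exec_result_ref: Optional[bytes],
    input_refs: List[bytes],
    params_ref: Optional[bytes],
    node_traces: List[bytes],
    pel1_version: int = 1,
) -> bytes:
    """Assemble TraceDAGBytes in the canonical field order."""
    if pel1_version != 1:
        raise ValueError("pel1_version MUST be 1 for this profile revision")
    parts = [
        struct.pack(">H", pel1_version),
        _encoded_ref(scheme_ref),
        _encoded_ref(program_ref),
        struct.pack(">BBI", status, summary_kind, summary_status_code),
        _optional_ref(exec_result_ref),
        _u32(len(input_refs)),
    ]
    parts += [_encoded_ref(ref) for ref in input_refs]
    parts.append(_optional_ref(params_ref))
    parts.append(_u32(len(node_traces)))
    parts += node_traces  # caller supplies canonical node order
    return b"".join(parts)
```

Keyword-only parameters are used here so that call sites cannot silently swap two references of the same type.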

5. Canonicality & Injectivity

5.1 Injectivity

The mapping:

TraceDAGValue -> TraceDAGBytes

defined by this profile MUST be injective:

  • If T1 != T2 as logical TraceDAGValue instances (per PEL/TRACE-DAG/1), then their encodings MUST differ:

    T1 != T2  ⇒  TraceDAGBytes(T1) != TraceDAGBytes(T2)
    

This is ensured by:

  • fixed field ordering and explicit presence flags,
  • deterministic list ordering,
  • inclusion of all logically relevant fields.

5.2 Stability

The same logical trace value MUST always yield the same TraceDAGBytes across:

  • implementations,
  • platforms,
  • time.

Encoders MUST NOT:

  • reorder any list elements (e.g., input_refs, node_traces, output_refs, diagnostics),
  • introduce alternative encodings for integers or strings,
  • omit or reorder fields.

5.3 Node ordering

For runs where at least one node is attempted and the Program is structurally valid:

  • node_traces MUST have exactly one entry per Program Node (per PEL/PROGRAM-DAG/1) in canonical node order.

If the Program cannot be decoded or is structurally invalid:

  • node_traces MAY be empty; if non-empty, any node entries MUST still follow canonical node order for the subset present.

6. Trace Artifact Binding

6.1 TypeTag

Trace Artifacts for this profile MUST be ASL/1 Artifacts with:

Artifact {
  bytes    = TraceDAGBytes
  type_tag = TYPE_TAG_PEL_TRACE_DAG_1
}

Where:

  • TYPE_TAG_PEL_TRACE_DAG_1 is a TypeTag with a concrete tag_id assigned in the global TypeTag registry for DAG traces.

This encoding profile:

  • Refers to TYPE_TAG_PEL_TRACE_DAG_1 symbolically.
  • Does not assign a numeric tag_id; that is handled in a separate registry.

6.2 Identity via ASL/1-CORE

With ENC/ASL1-CORE v1 and "ASL1" hashes (HASH/ASL1):

  1. Canonical ArtifactBytes for a trace Artifact:

    ArtifactBytes =
      encode_artifact_core_v1(
        Artifact{
          bytes    = TraceDAGBytes,
          type_tag = TYPE_TAG_PEL_TRACE_DAG_1
        }
      )
    
  2. Canonical Reference for the trace Artifact under some HashId = HID:

    digest    = H(ArtifactBytes)              // H from HASH/ASL1, for HID
    reference = Reference { hash_id = HID,
                            digest  = digest }
    

All conformant implementations MUST agree on:

  • TraceDAGBytes for a given logical TraceDAGValue,
  • ArtifactBytes for the resulting trace Artifact,
  • the resulting Reference for any fixed HashId and hash algorithm.

7. Error Handling (Encoding Layer)

Decoders for this profile MUST treat as encoding errors:

  1. Truncated values:

    • Any attempt to read a declared integer, length-prefixed blob (Blob32, Utf8String, EncodedRef), or list entry that runs out of bytes.
  2. Invalid pel1_version:

    • pel1_version != 1.
  3. Invalid NodeTraceStatus:

    • status not in { 0x00, 0x01, 0x02 }.
  4. Invalid optional flags:

    • has_exec_result or has_params_ref not in { 0x00, 0x01 }.
  5. Invalid EncodedRef:

    • ref_len < 2, or
    • ref_bytes cannot be decoded as ReferenceBytes (per ENC/ASL1-CORE v1.0.3).
  6. Invalid Utf8String in op_name:

    • op_name bytes not valid UTF-8.
  7. Inconsistent list counts:

    • Not enough elements following a list count (e.g. input_ref_count, node_trace_count, output_ref_count, diag_count).

Mapping from these encoding errors to external error codes (e.g. ERR_PEL_TRACE_ENC_INVALID) is implementation-specific.

Semantic inconsistencies (e.g. mismatched summary vs status) are semantic-layer issues; decoders MAY validate them but are not required to at the encoding layer.
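
One way to centralise these checks is a bounds-checked cursor that raises a single error type, as in the Python sketch below. The `TraceEncodingError` and `Reader` names, and the `read_header` helper, are hypothetical; mapping the error to an external code is left to the implementation, as stated above.

```python
import struct
from typing import Tuple


class TraceEncodingError(ValueError):
    """Hypothetical exception type for the encoding errors listed above."""


class Reader:
    """Forward-only cursor over a byte string with bounds-checked reads."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def take(self, n: int) -> bytes:
        end = self._pos + n
        if end > len(self._data):  # error 1: truncated value
            raise TraceEncodingError(f"truncated: need {n} bytes at offset {self._pos}")
        chunk = self._data[self._pos:end]
        self._pos = end
        return chunk

    def u8(self) -> int:
        return self.take(1)[0]

    def u16(self) -> int:
        return struct.unpack(">H", self.take(2))[0]

    def u32(self) -> int:
        return struct.unpack(">I", self.take(4))[0]

    def flag(self) -> bool:
        value = self.u8()
        if value not in (0x00, 0x01):  # error 4: invalid optional flag
            raise TraceEncodingError(f"invalid presence flag 0x{value:02x}")
        return value == 0x01

    def node_status(self) -> int:
        value = self.u8()
        if value not in (0x00, 0x01, 0x02):  # error 3: invalid NodeTraceStatus
            raise TraceEncodingError(f"invalid NodeTraceStatus 0x{value:02x}")
        return value

    def encoded_ref(self) -> bytes:
        ref_len = self.u32()
        if ref_len < 2:  # error 5: invalid EncodedRef
            raise TraceEncodingError("ref_len < 2")
        return self.take(ref_len)

    def utf8_string(self) -> str:
        raw = self.take(self.u32())
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError as exc:  # error 6: invalid Utf8String
            raise TraceEncodingError("op_name is not valid UTF-8") from exc


def read_header(data: bytes) -> Tuple[bytes, bytes]:
    """Read pel1_version, scheme_ref, program_ref; return the two references."""
    reader = Reader(data)
    if reader.u16() != 1:  # error 2: invalid pel1_version
        raise TraceEncodingError("unsupported pel1_version")
    return reader.encoded_ref(), reader.encoded_ref()
```

Error 7 (inconsistent list counts) needs no separate check in this design: reading the promised number of elements eventually hits `take` with too few bytes remaining.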


8. Streaming & Implementation Notes

Implementation requirements:

  • Single-pass encoding:

    • Encoders MUST be able to generate TraceDAGBytes in a single forward pass over the logical TraceDAGValue, assuming they have the structure in memory.
    • They MAY need to precompute counts or sizes (e.g., node_trace_count) before writing the corresponding prefix; this does not violate the single-pass requirement.
  • Single-pass decoding:

    • Decoders MUST be able to decode TraceDAGBytes in a single forward pass, with no backtracking.
    • All length prefixes appear before their content.

For large traces:

  • Implementations MAY:

    • stream NodeTraceDAGBytes entries to consumers as they decode,
    • stream diagnostic message blobs.
  • They MUST ensure that any observable behavior (including error reporting and any reconstructed TraceDAGValue) is independent of chunking or I/O strategy.
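
As a sketch of streaming decoding, the Python generator below yields one node trace at a time from a binary stream positioned at node_trace_count. The helper `_read_exact` loops until the requested bytes arrive, so the result does not depend on how the underlying stream chunks its reads. All names and the dictionary shape of the yielded values are assumptions for illustration.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, tolerating short reads; fail on end of input."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise ValueError("truncated input")
        buf += chunk
    return bytes(buf)


def _u32(stream: BinaryIO) -> int:
    return struct.unpack(">I", _read_exact(stream, 4))[0]


def _blob(stream: BinaryIO) -> bytes:
    return _read_exact(stream, _u32(stream))


def iter_node_traces(stream: BinaryIO) -> Iterator[Dict]:
    """Yield node traces in order; stream starts at node_trace_count."""
    node_trace_count = _u32(stream)
    for _ in range(node_trace_count):
        node_id = _u32(stream)
        try:
            op_name = _blob(stream).decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError("op_name is not valid UTF-8") from exc
        op_version = _u32(stream)
        status = _read_exact(stream, 1)[0]
        if status > 2:
            raise ValueError("invalid NodeTraceStatus")
        status_code = _u32(stream)
        output_refs = []
        for _ref_index in range(_u32(stream)):
            ref = _blob(stream)
            if len(ref) < 2:
                raise ValueError("ref_len < 2")
            output_refs.append(ref)
        diagnostics = [(_u32(stream), _blob(stream)) for _d in range(_u32(stream))]
        yield {
            "node_id": node_id, "op_name": op_name, "op_version": op_version,
            "status": status, "status_code": status_code,
            "output_refs": output_refs, "diagnostics": diagnostics,
        }


# Usage: decode a payload holding zero node traces.
assert list(iter_node_traces(io.BytesIO(b"\x00\x00\x00\x00"))) == []
```

Because each entry is yielded as soon as it is complete, a consumer can process large traces without holding every NodeTraceDAG in memory. Errors surface at the entry where they occur, regardless of read granularity.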


9. Conformance

An implementation is ENC/PEL-TRACE-DAG/1 conformant if it:

  1. Implements the encoding layout

    • Encodes and decodes TraceDAGBytes exactly as described in §4.
    • Treats pel1_version = 1 as the only supported version for this profile revision.
    • Enforces validity of discriminants and presence flags at the encoding layer.
  2. Preserves canonical ordering

    • When encoding, preserves:

      • order of input_refs,
      • canonical order of node_traces (per PEL/PROGRAM-DAG/1),
      • order of output_refs and diagnostics within each NodeTraceDAG.
  3. Uses canonical sub-encodings

    • Uses Utf8String and Blob32 exactly as in §2.
    • Uses EncodedRef as defined in §2.5, with ReferenceBytes from ENC/ASL1-CORE v1.0.3.
  4. Ensures injectivity & stability

    • Ensures distinct logical TraceDAGValues produce distinct TraceDAGBytes.
    • Ensures the same logical value always encodes to the same bytes (no configuration affecting layout).
  5. Binds to trace Artifacts correctly

    • When forming trace Artifacts for PEL/TRACE-DAG/1, sets:

      • Artifact.bytes = TraceDAGBytes
      • Artifact.type_tag = TYPE_TAG_PEL_TRACE_DAG_1
    • Uses ENC/ASL1-CORE v1 and HASH/ASL1 for identity.

Everything else — storage, transport, policy, and graph interpretation — is delegated to other specifications.


10. Informative Example

This example illustrates field layout only. Hex and values are illustrative, not normative test vectors.

Assume a simple run:

  • pel1_version = 1

  • scheme_ref = S (an ASL/1 Reference)

  • program_ref = P

  • status = OK (0)

  • summary.kind = NONE (0), summary.status_code = 0

  • exec_result_ref = R

  • Inputs: three References [I0, I1, I2]

  • No params (params_ref = absent)

  • Program has two nodes in canonical order, with traces:

    • Node 1: OK, produced one output [O0]
    • Node 2: OK, produced one output [O1]

Simplified encoding sketch:

pel1_version        = 0001               ; u16

scheme_ref          = EncodedRef(S)      ; 4-byte length + ReferenceBytes(S)
program_ref         = EncodedRef(P)

status              = 00                 ; OK
summary_kind        = 00                 ; NONE
summary_status_code = 00000000           ; status_code = 0

has_exec_result     = 01
exec_result         = EncodedRef(R)

input_ref_count     = 00000003
input_refs          = EncodedRef(I0) EncodedRef(I1) EncodedRef(I2)

has_params_ref      = 00                 ; none

node_trace_count    = 00000002

; NodeTrace #0
node_id             = 00000001
op_name             = 00000005 "add64"
op_version          = 00000001
status              = 00                 ; NODE_OK
status_code         = 00000000
output_ref_count    = 00000001
output_refs         = EncodedRef(O0)
diag_count          = 00000000

; NodeTrace #1
node_id             = 00000002
op_name             = 00000005 "mul64"
op_version          = 00000001
status              = 00                 ; NODE_OK
status_code         = 00000000
output_ref_count    = 00000001
output_refs         = EncodedRef(O1)
diag_count          = 00000000

Where each EncodedRef(X) is:

ref_len(X) (u32) || ReferenceBytes(X)

with ReferenceBytes(X) = hash_id (u16) + digest bytes per ENC/ASL1-CORE v1.

All conformant encoders MUST produce the same TraceDAGBytes for this logical trace value; all conformant decoders MUST reconstruct the same TraceDAGValue.
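
To make the layout above concrete, the following Python sketch computes the total size of TraceDAGBytes for this example. It ASSUMES every Reference has a 32-byte digest, which this document does not state; the actual digest length depends on the hash algorithm in use. The function names are hypothetical.

```python
HASH_ID_LEN = 2   # u16 hash_id, per the ReferenceBytes layout
DIGEST_LEN = 32   # ASSUMPTION for this example only: 32-byte digest
ENCODED_REF_LEN = 4 + HASH_ID_LEN + DIGEST_LEN  # ref_len prefix + ReferenceBytes


def node_trace_len(op_name: str, outputs: int, diag_message_lens=()) -> int:
    """Byte length of one NodeTraceDAGBytes entry."""
    return (
        4                                     # node_id
        + 4 + len(op_name.encode("utf-8"))    # op_name (Utf8String)
        + 4                                   # op_version
        + 1                                   # status
        + 4                                   # status_code
        + 4 + outputs * ENCODED_REF_LEN       # output_ref_count + output_refs
        + 4 + sum(4 + 4 + n for n in diag_message_lens)  # diag_count + entries
    )


def trace_len(inputs: int, has_exec_result: bool, has_params: bool, node_lens) -> int:
    """Byte length of TraceDAGBytes."""
    return (
        2                                     # pel1_version
        + 2 * ENCODED_REF_LEN                 # scheme_ref + program_ref
        + 1 + 1 + 4                           # status, summary_kind, summary_status_code
        + 1 + (ENCODED_REF_LEN if has_exec_result else 0)
        + 4 + inputs * ENCODED_REF_LEN        # input_ref_count + input_refs
        + 1 + (ENCODED_REF_LEN if has_params else 0)
        + 4 + sum(node_lens)                  # node_trace_count + node_traces
    )


nodes = [node_trace_len("add64", 1), node_trace_len("mul64", 1)]
total = trace_len(inputs=3, has_exec_result=True, has_params=False, node_lens=nodes)
print(nodes, total)  # prints [68, 68] 382
```

Under that assumption each EncodedRef takes 38 bytes, each node trace 68 bytes, and the whole payload 382 bytes.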


End of ENC/PEL-TRACE-DAG/1 v0.1.0 — Canonical Encoding for DAG Execution Traces


Document History

  • 0.1.0 (2025-11-16): Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.