# ENC/PEL-TRACE-DAG/1 — Canonical Encoding for DAG Execution Traces
Status: Approved
Owner: Niklas Rydberg
Version: 0.1.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [binary-minimalism, traceability]
<!-- Source: /amduat/docs/new/enc-pel-trace-dag-1.md | Canonical: /amduat/tier1/enc-pel-trace-dag-1.md -->
**Document ID:** `ENC/PEL-TRACE-DAG/1`
**Profile ID:** `PEL_ENC_TRACE_DAG_V1 = 0x0102`
**Layer:** Scheme Encoding Profile (Trace)
**Depends on (normative):**
* `ASL/1-CORE v0.3.2` — value model (`Artifact`, `TypeTag`, `Reference`, integers, `OctetString`)
* `ENC/ASL1-CORE v1.0.3` — canonical encodings for `Artifact` and `Reference`
* `PEL/1-CORE v0.1.0` — primitive execution layer (ExecutionStatus, ExecutionErrorSummary)
* `PEL/PROGRAM-DAG/1 v0.2.0` — DAG Program scheme (Program, Node, NodeId, canonical node order)
* `PEL/TRACE-DAG/1 v0.1.0` — DAG execution trace profile (logical data model)

**Integrates with (informative):**
* `PEL/1-SURF v0.1.0` — store-backed execution surface
* `HASH/ASL1 v0.2.3` — ASL1 hash family for trace artifact identity
* TypeTag registry (for `TYPE_TAG_PEL_TRACE_DAG_1`)
> The Profile ID `PEL_ENC_TRACE_DAG_V1` is a configuration label.
> It is **not** embedded into trace payloads. Encoders and decoders select this encoding profile by context (scheme descriptor, engine/store configuration), not per value.

© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.

---
## 0. Overview
`ENC/PEL-TRACE-DAG/1` defines the **canonical binary encoding** of the `TraceDAGValue` structure specified in `PEL/TRACE-DAG/1`, and all of its nested components:
* `TraceDAGValue`
* `NodeTraceDAG`
* `DiagnosticEntry`
This encoding:
* is **injective** — distinct logical trace values → distinct byte strings;
* is **stable and deterministic** — same value → same bytes across implementations and time;
* is **streaming-friendly** — encoders/decoders can operate in a single forward pass;
* embeds ASL/1 `Reference` values using their canonical `ReferenceBytes` encoding (`ENC/ASL1-CORE v1`) inside length-prefixed frames.
The encoded payload (`TraceDAGBytes`) is used as the `bytes` field of a trace Artifact for the `PEL/TRACE-DAG/1` profile, with:
```text
Artifact.type_tag = TYPE_TAG_PEL_TRACE_DAG_1
Artifact.bytes = TraceDAGBytes
```
Trace Artifact identity is then derived by hashing the canonical `ArtifactBytes` with an ASL1-family hash algorithm from `HASH/ASL1` (typically `HASH-ASL1-256`).

---
## 1. Scope & Layering
### 1.1 Purpose
This specification defines:
* The **binary layout** of:
  * `TraceDAGBytes`
  * `NodeTraceDAGBytes`
  * `DiagnosticEntryBytes`
  * An internal “encoded Reference” wrapper for use inside trace payloads
* The canonical **field ordering**, integer widths, and list framing.
It does **not** define:
* The logical semantics of traces — those are in `PEL/TRACE-DAG/1`.
* The ASL/1 `Artifact` / `Reference` encodings — these are in `ENC/ASL1-CORE v1.0.3`.
* How traces are produced or when they are enabled — that is governed by `PEL/1-CORE`, `PEL/1-SURF`, and policy profiles.
### 1.2 Layering constraints
In line with `SUBSTRATE/STACK-OVERVIEW`:
* `ENC/PEL-TRACE-DAG/1` is a **scheme-specific encoding profile**.
* It MUST NOT redefine:
  * `Artifact`, `TypeTag`, `Reference`, `HashId` (`ASL/1-CORE`),
  * the `TraceDAGValue` logical structure (`PEL/TRACE-DAG/1`).
* It is **storage-neutral** and **policy-neutral**.
* It defines exactly one canonical encoding for `TraceDAGValue` values in this scheme.
---
## 2. Conventions
The RFC 2119 terms **MUST**, **SHOULD**, **MAY**, etc. are normative.
### 2.1 Integer encodings
All multi-byte integers are encoded as **big-endian** (network byte order), as in `ENC/ASL1-CORE`:
* `u8` — 1 byte
* `u16` — 2 bytes
* `u32` — 4 bytes
* `u64` — 8 bytes
Only **fixed-width** integers are used in this specification.
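
As a minimal Python sketch (the helper names `encode_uint` and `decode_uint` are illustrative, not defined by this profile), the standard-library `struct` module covers all four widths; its `>` format prefix selects big-endian order with standard sizes and no padding:

```python
import struct

# ">" = big-endian, standard sizes, no alignment padding (matches §2.1).
_FORMATS = {"u8": ">B", "u16": ">H", "u32": ">I", "u64": ">Q"}


def encode_uint(kind: str, value: int) -> bytes:
    """Encode `value` as the fixed-width unsigned integer type `kind`."""
    # struct.pack raises struct.error if `value` is negative or out of range.
    return struct.pack(_FORMATS[kind], value)


def decode_uint(kind: str, buf: bytes, offset: int) -> tuple[int, int]:
    """Decode `kind` at `offset` and return (value, next_offset)."""
    fmt = _FORMATS[kind]
    size = struct.calcsize(fmt)
    if offset + size > len(buf):
        raise ValueError(f"truncated {kind} at offset {offset}")
    (value,) = struct.unpack_from(fmt, buf, offset)
    return value, offset + size
```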
### 2.2 Lists
A list of values of some type `T` is encoded as:
```text
List<T> =
count (u32)
element_0
element_1
...
element_{count-1}
```
* `count` is the number of elements (MAY be zero).
* Elements are encoded in order, using the canonical encoding for `T`.
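
The framing can be sketched generically in Python. `encode_list` and `decode_list` are hypothetical helpers; the element codec is passed in as a function, so the same framing serves every `List<T>` in this profile:

```python
import struct
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def encode_list(items: Iterable[T], encode_item: Callable[[T], bytes]) -> bytes:
    """Encode List<T>: a u32 count followed by each element, in order."""
    items = list(items)  # materialize so the count is known before the elements
    out = bytearray(struct.pack(">I", len(items)))
    for item in items:
        out += encode_item(item)
    return bytes(out)


def decode_list(buf: bytes, offset: int,
                decode_item: Callable[[bytes, int], tuple[T, int]]) -> tuple[list[T], int]:
    """Decode List<T>; `decode_item` returns (value, next_offset)."""
    if offset + 4 > len(buf):
        raise ValueError("truncated list count")
    (count,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    items = []
    for _ in range(count):
        # The element decoder is responsible for detecting truncation.
        item, offset = decode_item(buf, offset)
        items.append(item)
    return items, offset
```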
### 2.3 UTF-8 string
`Utf8String` is encoded as:
```text
Utf8String =
length (u32)
bytes[0..length-1]
```
* `length` is the number of bytes.
* `bytes` MUST be well-formed UTF-8.
* There is no terminator or padding.
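
A hedged Python sketch of `Utf8String` (function names hypothetical). It relies on Python's strict `utf-8` codec, which rejects ill-formed input such as overlong forms or encoded surrogates, to enforce the well-formedness rule on decode:

```python
import struct


def encode_utf8string(text: str) -> bytes:
    """Utf8String: u32 byte length, then the UTF-8 bytes; no terminator or padding."""
    data = text.encode("utf-8")  # raises on lone surrogates, so output is well-formed
    return struct.pack(">I", len(data)) + data


def decode_utf8string(buf: bytes, offset: int) -> tuple[str, int]:
    """Decode a Utf8String at `offset`; return (text, next_offset)."""
    if offset + 4 > len(buf):
        raise ValueError("truncated Utf8String length")
    (length,) = struct.unpack_from(">I", buf, offset)
    start, end = offset + 4, offset + 4 + length
    if end > len(buf):
        raise ValueError("truncated Utf8String bytes")
    try:
        text = bytes(buf[start:end]).decode("utf-8")  # strict error handling
    except UnicodeDecodeError as exc:
        raise ValueError("Utf8String is not well-formed UTF-8") from exc
    return text, end
```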
### 2.4 Octet blob (32-bit length)
For diagnostics and other opaque fields, we use a generic blob:
```text
Blob32 =
length (u32)
bytes[0..length-1]
```
`bytes` is an arbitrary `OctetString`; interpretation is profile-specific. `length` MAY be zero.
### 2.5 Embedded Reference
Within this encoding, `Reference` values are embedded using a length-prefixed wrapper over canonical `ReferenceBytes` from `ENC/ASL1-CORE v1.0.3`.
We define:
```text
EncodedRef =
ref_len (u32)
ref_bytes (byte[0..ref_len-1]) // canonical ReferenceBytes
```
Where:
* `ref_bytes` MUST be the canonical `ReferenceBytes` encoding for a `Reference` value, as defined in `ENC/ASL1-CORE v1.0.3`:
  ```text
  ReferenceBytes ::
  hash_id (u16)
  digest (byte[...]) // remaining bytes
  ```
* `ref_len` MUST be the exact length (in bytes) of `ref_bytes`. Since `ref_bytes` always begins with the 2-byte `hash_id`, `ref_len` MUST be ≥ 2.

Decoders MUST:
* Read `ref_len (u32)`, then `ref_bytes[0..ref_len-1]`.
* Decode `ref_bytes` as `ReferenceBytes` per `ENC/ASL1-CORE v1.0.3`.
* Reject encodings where:
  * `ref_len < 2`, or
  * `ref_bytes` is not a valid `ReferenceBytes` sequence (e.g., truncated).
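
The wrapper can be sketched in Python as follows. The `Reference` dataclass and the function names are hypothetical stand-ins for the ASL/1 types; the sketch checks only `ref_len ≥ 2` and truncation, and does not model any digest-length rules that `ENC/ASL1-CORE v1.0.3` may attach to a given `hash_id`:

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    hash_id: int   # u16
    digest: bytes  # the remaining bytes of ReferenceBytes


def encode_encoded_ref(ref: Reference) -> bytes:
    """EncodedRef: u32 ref_len, then ReferenceBytes = hash_id (u16) || digest."""
    ref_bytes = struct.pack(">H", ref.hash_id) + ref.digest
    return struct.pack(">I", len(ref_bytes)) + ref_bytes


def decode_encoded_ref(buf: bytes, offset: int) -> tuple[Reference, int]:
    """Decode an EncodedRef at `offset`; return (reference, next_offset)."""
    if offset + 4 > len(buf):
        raise ValueError("truncated ref_len")
    (ref_len,) = struct.unpack_from(">I", buf, offset)
    if ref_len < 2:
        raise ValueError("ref_len < 2: no room for hash_id")
    start, end = offset + 4, offset + 4 + ref_len
    if end > len(buf):
        raise ValueError("truncated ref_bytes")
    (hash_id,) = struct.unpack_from(">H", buf, start)
    # Assumption: per-hash_id digest-length validation is not modelled here.
    return Reference(hash_id, bytes(buf[start + 2:end])), end
```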
#### 2.5.1 Optional EncodedRef
For optional `Reference` fields, we use:
```text
OptionalEncodedRef =
has_ref (u8)
[ EncodedRef ] // only if has_ref = 0x01
```
* `has_ref = 0x00` → no value present; no `EncodedRef` follows.
* `has_ref = 0x01` → exactly one `EncodedRef` follows.
* Any other `has_ref` value MUST be treated as an encoding error.
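
A minimal Python sketch of the presence flag, treating the reference as already-encoded canonical `ReferenceBytes` (the function names are hypothetical):

```python
import struct
from typing import Optional


def encode_optional_ref(ref_bytes: Optional[bytes]) -> bytes:
    """OptionalEncodedRef; `ref_bytes` is canonical ReferenceBytes or None."""
    if ref_bytes is None:
        return b"\x00"  # has_ref = 0x00: nothing follows
    if len(ref_bytes) < 2:
        raise ValueError("ReferenceBytes must be at least 2 bytes")
    return b"\x01" + struct.pack(">I", len(ref_bytes)) + ref_bytes


def decode_optional_ref(buf: bytes, offset: int) -> tuple[Optional[bytes], int]:
    """Return (ReferenceBytes or None, next_offset)."""
    if offset >= len(buf):
        raise ValueError("truncated has_ref")
    flag = buf[offset]
    offset += 1
    if flag == 0x00:
        return None, offset
    if flag != 0x01:
        raise ValueError(f"invalid has_ref value 0x{flag:02x}")
    if offset + 4 > len(buf):
        raise ValueError("truncated ref_len")
    (ref_len,) = struct.unpack_from(">I", buf, offset)
    end = offset + 4 + ref_len
    if ref_len < 2 or end > len(buf):
        raise ValueError("invalid or truncated EncodedRef")
    return bytes(buf[offset + 4:end]), end
```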
---
## 3. Logical Model Reference
This section restates the logical structures from `PEL/TRACE-DAG/1` (source of truth) in condensed form.
### 3.1 DiagnosticEntry
```text
DiagnosticEntry {
code: uint32 // diagnostic or error code
message: OctetString // typically UTF-8 text; interpretation is profile-specific
}
```
### 3.2 NodeTraceStatus
```text
NodeTraceStatus = uint8
NodeTraceStatus {
NODE_OK = 0
NODE_FAILED = 1
NODE_SKIPPED = 2
}
```
### 3.3 NodeTraceDAG
```text
NodeTraceDAG {
node_id: NodeId // uint32
op_name: string
op_version: uint32
status: NodeTraceStatus
status_code: uint32 // 0 = success; non-zero = op-specific failure code
output_refs: list<Reference>
diagnostics: list<DiagnosticEntry>
}
```
### 3.4 TraceDAGValue
```text
TraceDAGValue {
pel1_version: uint16 // MUST be 1 for this version
scheme_ref: Reference // SchemeRef_DAG_1
program_ref: Reference // Program Artifact reference
status: ExecutionStatus
summary: ExecutionErrorSummary {
kind: ExecutionErrorKind
status_code: uint32
}
exec_result_ref: optional Reference
input_refs: list<Reference>
params_ref: optional Reference
node_traces: list<NodeTraceDAG> // one per Node in canonical node order
}
```
All semantics (how these fields are populated for different run outcomes) are defined in `PEL/TRACE-DAG/1`.

---
## 4. Encoding
### 4.1 DiagnosticEntry encoding
Logical:
```text
DiagnosticEntry {
code: uint32
message: OctetString
}
```
Canonical encoding:
```text
DiagnosticEntryBytes ::
code (u32)
message (Blob32)
```
Where `Blob32` is as defined in §2.4.
* `code (u32)` encodes the diagnostic or error code.
* `message` is an opaque byte blob; profile users MAY agree to use UTF-8 here, but this encoding does not enforce it.
Decoders MUST:
* Read `code (u32)`.
* Read `message` as `Blob32`.
* Treat truncated blobs as encoding errors.
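
A Python sketch of `DiagnosticEntryBytes` (the `DiagnosticEntry` dataclass mirrors §3.1 but is illustrative). Because `message` is a `Blob32`, arbitrary bytes round-trip unchanged and no UTF-8 check is applied:

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticEntry:
    code: int       # u32
    message: bytes  # opaque OctetString; UTF-8 is not enforced


def encode_diagnostic(entry: DiagnosticEntry) -> bytes:
    # code (u32), then message as Blob32: u32 length followed by the raw bytes.
    return struct.pack(">II", entry.code, len(entry.message)) + entry.message


def decode_diagnostic(buf: bytes, offset: int) -> tuple[DiagnosticEntry, int]:
    """Decode one DiagnosticEntryBytes; return (entry, next_offset)."""
    if offset + 8 > len(buf):
        raise ValueError("truncated DiagnosticEntry header")
    code, length = struct.unpack_from(">II", buf, offset)
    start, end = offset + 8, offset + 8 + length
    if end > len(buf):
        raise ValueError("truncated DiagnosticEntry message")
    return DiagnosticEntry(code, bytes(buf[start:end])), end
```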
---
### 4.2 NodeTraceDAG encoding
Logical:
```text
NodeTraceDAG {
node_id: NodeId
op_name: string
op_version: uint32
status: NodeTraceStatus
status_code: uint32
output_refs: list<Reference>
diagnostics: list<DiagnosticEntry>
}
```
Canonical encoding:
```text
NodeTraceDAGBytes ::
node_id (u32)
op_name (Utf8String)
op_version (u32)
status (u8) // NodeTraceStatus
status_code (u32)
output_ref_count (u32)
output_refs (EncodedRef[0..output_ref_count-1])
diag_count (u32)
diagnostics (DiagnosticEntryBytes[0..diag_count-1])
```
Field semantics:
1. `node_id (u32)`
   * Encodes `NodeTraceDAG.node_id`.
2. `op_name (Utf8String)`
   * Encodes `NodeTraceDAG.op_name` as UTF-8 (see §2.3).
3. `op_version (u32)`
   * Encodes `NodeTraceDAG.op_version`.
4. `status (u8)`
   * Encodes `NodeTraceStatus`:
     ```text
     0x00 -> NODE_OK
     0x01 -> NODE_FAILED
     0x02 -> NODE_SKIPPED
     ```
   * Other values MUST be treated as encoding errors.
5. `status_code (u32)`
   * MUST be:
     * `0` if `status = NODE_OK` or `NODE_SKIPPED`;
     * non-zero if `status = NODE_FAILED`.
   * Conformance to these rules is a semantic requirement, not a decoding requirement. Decoders MAY choose to validate and reject inconsistent encodings.
6. `output_ref_count (u32)` and `output_refs`
   * Number of output references for this node.
   * Each entry is an `EncodedRef` (§2.5).
7. `diag_count (u32)` and `diagnostics`
   * Number of diagnostic entries.
   * Each entry is encoded using `DiagnosticEntryBytes` (§4.1).
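
The field order above can be sketched as a Python encoder. The dataclass and `encode_node_trace` are hypothetical; output references are passed as canonical `ReferenceBytes` and diagnostics as `(code, message)` pairs. This sketch also chooses to enforce the `status_code` rule from field 5 at encode time, which the profile itself leaves to the semantic layer:

```python
import struct
from dataclasses import dataclass, field

NODE_OK, NODE_FAILED, NODE_SKIPPED = 0, 1, 2


@dataclass
class NodeTraceDAG:
    node_id: int
    op_name: str
    op_version: int
    status: int
    status_code: int
    output_refs: list[bytes] = field(default_factory=list)  # canonical ReferenceBytes
    diagnostics: list[tuple[int, bytes]] = field(default_factory=list)  # (code, message)


def _encoded_ref(ref_bytes: bytes) -> bytes:
    if len(ref_bytes) < 2:
        raise ValueError("ReferenceBytes must be at least 2 bytes")
    return struct.pack(">I", len(ref_bytes)) + ref_bytes


def encode_node_trace(nt: NodeTraceDAG) -> bytes:
    if nt.status not in (NODE_OK, NODE_FAILED, NODE_SKIPPED):
        raise ValueError("invalid NodeTraceStatus")
    # Semantic rule: status_code is non-zero exactly when the node failed.
    if (nt.status_code != 0) != (nt.status == NODE_FAILED):
        raise ValueError("status_code inconsistent with status")
    name = nt.op_name.encode("utf-8")
    out = bytearray(struct.pack(">I", nt.node_id))
    out += struct.pack(">I", len(name)) + name                           # op_name
    out += struct.pack(">IBI", nt.op_version, nt.status, nt.status_code)  # no padding
    out += struct.pack(">I", len(nt.output_refs))
    for ref_bytes in nt.output_refs:
        out += _encoded_ref(ref_bytes)
    out += struct.pack(">I", len(nt.diagnostics))
    for code, message in nt.diagnostics:
        out += struct.pack(">II", code, len(message)) + message
    return bytes(out)
```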
#### 4.2.1 Canonical ordering of node_traces
In `TraceDAGBytes`, the `node_traces` list (see §4.4) MUST:
* contain each `NodeTraceDAG` exactly once for each Program `Node`,
* appear in the canonical node order defined by `PEL/PROGRAM-DAG/1` (canonical topological order with `NodeId` as tie-breaker).
Encoders MUST enforce this; decoders MAY assume it.
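
One way to compute such an order, assuming it means Kahn's topological sort that always emits the smallest ready `NodeId` first. The normative definition lives in `PEL/PROGRAM-DAG/1`; the function name and the `deps` input shape are hypothetical:

```python
import heapq


def canonical_node_order(deps: dict[int, set[int]]) -> list[int]:
    """Return NodeIds in topological order, smallest ready NodeId first.

    `deps` maps every NodeId to the set of NodeIds whose outputs it consumes.
    """
    waiting = {node: set(inputs) for node, inputs in deps.items()}
    consumers: dict[int, list[int]] = {node: [] for node in deps}
    for node, inputs in deps.items():
        for producer in inputs:
            if producer not in consumers:
                raise ValueError(f"node {node} depends on unknown node {producer}")
            consumers[producer].append(node)

    ready = [node for node, inputs in waiting.items() if not inputs]
    heapq.heapify(ready)  # min-heap: NodeId acts as the tie-breaker
    order: list[int] = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for consumer in consumers[node]:
            waiting[consumer].discard(node)
            if not waiting[consumer]:
                heapq.heappush(ready, consumer)

    if len(order) != len(deps):
        raise ValueError("program graph contains a cycle")
    return order
```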

---
### 4.3 Summary and status encoding
From `TraceDAGValue`:
```text
summary: ExecutionErrorSummary {
kind: ExecutionErrorKind
status_code: uint32
}
status: ExecutionStatus
```
These are encoded in `TraceDAGBytes` as:
```text
status (u8) // ExecutionStatus
summary_kind (u8) // ExecutionErrorKind
summary_status_code (u32)
```
The exact value sets of `ExecutionStatus` and `ExecutionErrorKind` are defined in `PEL/1-CORE` and `PEL/TRACE-DAG/1`; this spec treats them as small enums in `u8` space.
Decoders MUST:
* Read `status (u8)`, `summary_kind (u8)`, `summary_status_code (u32)` as raw fields.
* Treat `status` and `summary_kind` values outside the agreed ranges as encoding errors or map them to an “unknown” variant at the semantic layer.
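
A Python sketch of reading these three fields. The value sets are supplied by the caller because this profile does not list them, and the hypothetical `strict` flag selects between the two permitted behaviors for out-of-range values:

```python
import struct
from typing import Optional


def decode_status_fields(buf: bytes, offset: int,
                         known_status: set[int], known_kinds: set[int],
                         strict: bool = True) -> tuple[Optional[int], Optional[int], int, int]:
    """Read status (u8), summary_kind (u8), summary_status_code (u32).

    Returns (status, summary_kind, summary_status_code, next_offset).
    Unknown enum values raise in strict mode; otherwise they map to None.
    """
    if offset + 6 > len(buf):
        raise ValueError("truncated status/summary fields")
    status, kind, code = struct.unpack_from(">BBI", buf, offset)
    if status not in known_status or kind not in known_kinds:
        if strict:
            raise ValueError("status or summary_kind outside the known value set")
        status = status if status in known_status else None
        kind = kind if kind in known_kinds else None
    return status, kind, code, offset + 6
```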
---
### 4.4 TraceDAGValue encoding
Logical:
```text
TraceDAGValue {
pel1_version: uint16
scheme_ref: Reference
program_ref: Reference
status: ExecutionStatus
summary: ExecutionErrorSummary
exec_result_ref: optional Reference
input_refs: list<Reference>
params_ref: optional Reference
node_traces: list<NodeTraceDAG>
}
```
Canonical encoding:
```text
TraceDAGBytes ::
pel1_version (u16)
scheme_ref (EncodedRef)
program_ref (EncodedRef)
status (u8) // ExecutionStatus
summary_kind (u8) // ExecutionErrorKind
summary_status_code (u32) // ExecutionErrorSummary.status_code
has_exec_result (u8)
[ exec_result (EncodedRef) ] // if has_exec_result == 0x01
input_ref_count (u32)
input_refs (EncodedRef[0..input_ref_count-1])
has_params_ref (u8)
[ params_ref (EncodedRef) ] // if has_params_ref == 0x01
node_trace_count (u32)
node_traces (NodeTraceDAGBytes[0..node_trace_count-1])
```
Field semantics:
1. `pel1_version (u16)`
   * MUST be `1` for traces produced under `PEL/TRACE-DAG/1 v0.1.0`.
   * Decoders:
     * MUST accept `pel1_version = 1`;
     * MUST treat other values as encoding errors for this profile revision.
2. `scheme_ref (EncodedRef)`
   * Encodes the `Reference` to the scheme descriptor; for this profile it MUST be `SchemeRef_DAG_1`.
3. `program_ref (EncodedRef)`
   * Encodes the `Reference` of the Program Artifact executed in this run.
4. `status`, `summary_kind`, `summary_status_code (u32)`
   * As described in §4.3.
5. `has_exec_result (u8)` and `exec_result (EncodedRef)`
   * Encodes `exec_result_ref : optional Reference`:
     * `has_exec_result = 0x00` → absent; no `exec_result` bytes follow.
     * `has_exec_result = 0x01` → exactly one `EncodedRef` follows, encoding `exec_result_ref`.
     * Other values MUST be treated as encoding errors.
6. `input_ref_count (u32)` and `input_refs (EncodedRef[..])`
   * Number and encoded values of the `input_refs` list.
   * Encodes `TraceDAGValue.input_refs` in order.
7. `has_params_ref (u8)` and `params_ref (EncodedRef)`
   * Encodes `params_ref : optional Reference`:
     * `has_params_ref = 0x00` → absent; no `params_ref` bytes follow.
     * `has_params_ref = 0x01` → present, encoded as `EncodedRef`.
     * Other values MUST be treated as encoding errors.
8. `node_trace_count (u32)` and `node_traces (NodeTraceDAGBytes[..])`
   * Number and encoded values of `node_traces`.
   * **Canonical requirement:** encoders MUST set `node_trace_count` equal to the number of `Node`s in the Program (for runs where at least one node is attempted), and MUST encode node traces in canonical node order (§4.2.1).
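
Putting §4.4 together, a hedged Python sketch of a `TraceDAGBytes` encoder. The dataclass and helper names are hypothetical; references are given as canonical `ReferenceBytes`, and `node_traces` as already-encoded `NodeTraceDAGBytes` that the caller has placed in canonical node order:

```python
import struct
from dataclasses import dataclass, field
from typing import Optional


def _ref(ref_bytes: bytes) -> bytes:
    # EncodedRef: u32 length prefix + canonical ReferenceBytes (at least 2 bytes).
    if len(ref_bytes) < 2:
        raise ValueError("ReferenceBytes must be at least 2 bytes")
    return struct.pack(">I", len(ref_bytes)) + ref_bytes


def _opt_ref(ref_bytes: Optional[bytes]) -> bytes:
    # Presence flag 0x00 (absent) or 0x01 followed by one EncodedRef.
    return b"\x00" if ref_bytes is None else b"\x01" + _ref(ref_bytes)


@dataclass
class TraceDAGValue:
    scheme_ref: bytes
    program_ref: bytes
    status: int
    summary_kind: int
    summary_status_code: int
    exec_result_ref: Optional[bytes] = None
    input_refs: list[bytes] = field(default_factory=list)
    params_ref: Optional[bytes] = None
    node_traces: list[bytes] = field(default_factory=list)  # pre-encoded, canonical order
    pel1_version: int = 1


def encode_trace_dag(t: TraceDAGValue) -> bytes:
    if t.pel1_version != 1:
        raise ValueError("only pel1_version = 1 is defined for this profile")
    out = bytearray(struct.pack(">H", t.pel1_version))
    out += _ref(t.scheme_ref) + _ref(t.program_ref)
    out += struct.pack(">BBI", t.status, t.summary_kind, t.summary_status_code)
    out += _opt_ref(t.exec_result_ref)
    out += struct.pack(">I", len(t.input_refs))
    for r in t.input_refs:
        out += _ref(r)
    out += _opt_ref(t.params_ref)
    out += struct.pack(">I", len(t.node_traces))
    for nt in t.node_traces:
        out += nt
    return bytes(out)
```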
---
## 5. Canonicality & Injectivity
### 5.1 Injectivity
The mapping:
```text
TraceDAGValue -> TraceDAGBytes
```
defined by this profile MUST be **injective**:
* If `T1 != T2` as logical `TraceDAGValue` instances (per `PEL/TRACE-DAG/1`), then their encodings MUST differ:
  ```text
  T1 != T2 ⇒ TraceDAGBytes(T1) != TraceDAGBytes(T2)
  ```
This is ensured by:
* fixed field ordering and explicit presence flags,
* deterministic list ordering,
* inclusion of all logically relevant fields.
### 5.2 Stability
The same logical trace value MUST always yield the same `TraceDAGBytes` across:
* implementations,
* platforms,
* time.
Encoders MUST NOT:
* reorder any list elements (e.g., `input_refs`, `node_traces`, `output_refs`, `diagnostics`),
* introduce alternative encodings for integers or strings,
* omit or reorder fields.
### 5.3 Node ordering
For runs where at least one node is attempted and the Program is structurally valid:
* `node_traces` MUST have exactly one entry per Program `Node` (per `PEL/PROGRAM-DAG/1`) in canonical node order.
If the Program cannot be decoded or is structurally invalid:
* `node_traces` MAY be empty; if non-empty, any node entries MUST still follow canonical node order for the subset present.
---
## 6. Trace Artifact Binding
### 6.1 TypeTag
Trace Artifacts for this profile MUST be ASL/1 Artifacts with:
```text
Artifact {
bytes = TraceDAGBytes
type_tag = TYPE_TAG_PEL_TRACE_DAG_1
}
```
Where:
* `TYPE_TAG_PEL_TRACE_DAG_1` is a `TypeTag` with a concrete `tag_id` assigned in the global TypeTag registry for DAG traces.
This encoding profile:
* Refers to `TYPE_TAG_PEL_TRACE_DAG_1` symbolically.
* Does not assign a numeric `tag_id`; that is handled in a separate registry.
### 6.2 Identity via ASL/1-CORE
With `ENC/ASL1-CORE v1` and `"ASL1"` hashes (`HASH/ASL1`):
1. Canonical `ArtifactBytes` for a trace Artifact:
   ```text
   ArtifactBytes =
     encode_artifact_core_v1(
       Artifact{
         bytes    = TraceDAGBytes,
         type_tag = TYPE_TAG_PEL_TRACE_DAG_1
       }
     )
   ```
2. Canonical `Reference` for the trace Artifact under some `HashId = HID`:
   ```text
   digest    = H(ArtifactBytes)   // H from HASH/ASL1, for HID
   reference = Reference { hash_id = HID,
                           digest  = digest }
   ```
All conformant implementations MUST agree on:
* `TraceDAGBytes` for a given logical `TraceDAGValue`,
* `ArtifactBytes` for the resulting trace Artifact,
* the resulting `Reference` for any fixed `HashId` and hash algorithm.
---
## 7. Error Handling (Encoding Layer)
Decoders for this profile MUST treat as **encoding errors**:
1. Truncated values:
   * Any attempt to read a declared integer, length-prefixed blob (`Blob32`, `Utf8String`, `EncodedRef`), or list entry that runs out of bytes.
2. Invalid `pel1_version`:
   * `pel1_version != 1`.
3. Invalid `NodeTraceStatus`:
   * the `status` field of a `NodeTraceDAGBytes` entry not in `{ 0x00, 0x01, 0x02 }`.
4. Invalid optional flags:
   * `has_exec_result` or `has_params_ref` not in `{ 0x00, 0x01 }`.
5. Invalid `EncodedRef`:
   * `ref_len < 2`, or
   * `ref_bytes` cannot be decoded as `ReferenceBytes` (per `ENC/ASL1-CORE v1.0.3`).
6. Invalid `Utf8String` in `op_name`:
   * `op_name` bytes not valid UTF-8.
7. Inconsistent list counts:
   * Not enough elements following a list count (e.g. `input_ref_count`, `node_trace_count`, `output_ref_count`, `diag_count`).
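
The checks above map naturally onto a bounds-checked forward cursor. This Python sketch covers one `NodeTraceDAGBytes` entry; `DecodeError`, `Reader`, `decode_node_trace`, and the dict result are hypothetical, and per-`hash_id` digest validation is again omitted:

```python
import struct


class DecodeError(ValueError):
    """Raised for any encoding error listed in §7."""


class Reader:
    """Forward-only cursor over a byte buffer; every read is bounds-checked."""

    def __init__(self, buf: bytes) -> None:
        self.buf, self.pos = buf, 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.buf):
            raise DecodeError(f"truncated: need {n} bytes at offset {self.pos}")
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def uint(self, fmt: str) -> int:
        # fmt is a big-endian struct format such as ">B", ">H" or ">I".
        return struct.unpack(fmt, self.take(struct.calcsize(fmt)))[0]

    def encoded_ref(self) -> bytes:
        ref_len = self.uint(">I")
        if ref_len < 2:
            raise DecodeError("ref_len < 2")
        return self.take(ref_len)  # assumption: digest-length checks omitted


def decode_node_trace(r: Reader) -> dict:
    """Decode one NodeTraceDAGBytes entry, enforcing the checks of §7."""
    node_id = r.uint(">I")
    try:
        op_name = r.take(r.uint(">I")).decode("utf-8")
    except UnicodeDecodeError as exc:
        raise DecodeError("op_name is not valid UTF-8") from exc
    op_version = r.uint(">I")
    status = r.uint(">B")
    if status not in (0x00, 0x01, 0x02):
        raise DecodeError(f"invalid NodeTraceStatus 0x{status:02x}")
    status_code = r.uint(">I")
    output_refs = [r.encoded_ref() for _ in range(r.uint(">I"))]
    diagnostics = []
    for _ in range(r.uint(">I")):
        code = r.uint(">I")
        diagnostics.append((code, r.take(r.uint(">I"))))
    return {"node_id": node_id, "op_name": op_name, "op_version": op_version,
            "status": status, "status_code": status_code,
            "output_refs": output_refs, "diagnostics": diagnostics}
```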
Mapping from these encoding errors to external error codes (e.g. `ERR_PEL_TRACE_ENC_INVALID`) is implementation-specific.
Semantic inconsistencies (e.g. mismatched `summary` vs `status`) are semantic-layer issues; decoders MAY validate them but are not required to at the encoding layer.

---
## 8. Streaming & Implementation Notes
Implementation requirements:
* **Single-pass encoding**:
  * Encoders MUST be able to generate `TraceDAGBytes` in a single forward pass over the logical `TraceDAGValue`, assuming the value is held in memory.
  * Encoders may need to compute counts and lengths (e.g., `node_trace_count`, `ref_len`, `Utf8String` lengths) before writing the content they prefix; all of these are derivable from the in-memory value.
* **Single-pass decoding**:
  * Decoders MUST be able to decode `TraceDAGBytes` in a single forward pass, with no backtracking.
  * This is possible because every count and length prefix appears before its content.

For large traces:
* Implementations MAY:
  * stream `NodeTraceDAGBytes` entries to consumers as they are decoded,
  * stream diagnostic message blobs.
* They MUST ensure that any observable behavior (including error reporting and any reconstructed `TraceDAGValue`) is independent of chunking or I/O strategy.
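
A Python sketch of streaming a `Blob32` payload, such as a diagnostic message, without buffering it whole (function names hypothetical). The error raised on truncation does not depend on `chunk_size`; chunks yielded before such an error do, so consumers should treat them as provisional until the generator completes, to keep observable behavior independent of chunking:

```python
import io
import struct
from typing import BinaryIO, Iterator


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, looping over short reads; raise if input ends early."""
    parts, remaining = [], n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            # Same message regardless of where the chunk boundaries fell.
            raise ValueError("truncated input")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


def stream_blob32(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a Blob32 payload in chunks of at most `chunk_size` bytes."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    while length:
        chunk = read_exact(stream, min(chunk_size, length))
        length -= len(chunk)
        yield chunk


# Usage: chunk boundaries change only how the payload is delivered, not its bytes.
payload = struct.pack(">I", 5) + b"hello"
assert b"".join(stream_blob32(io.BytesIO(payload), chunk_size=2)) == b"hello"
```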
---
## 9. Conformance
An implementation is **ENC/PEL-TRACE-DAG/1-conformant** if it:
1. **Implements the encoding layout**
   * Encodes and decodes `TraceDAGBytes` exactly as described in §4.
   * Treats `pel1_version = 1` as the only supported version for this profile revision.
   * Enforces validity of discriminants and presence flags at the encoding layer.
2. **Preserves canonical ordering**
   * When encoding, preserves:
     * the order of `input_refs`,
     * the canonical order of `node_traces` (per `PEL/PROGRAM-DAG/1`),
     * the order of `output_refs` and `diagnostics` within each `NodeTraceDAG`.
3. **Uses canonical sub-encodings**
   * Uses `Utf8String` and `Blob32` exactly as in §2.
   * Uses `EncodedRef` as defined in §2.5, with `ReferenceBytes` from `ENC/ASL1-CORE v1.0.3`.
4. **Ensures injectivity & stability**
   * Ensures distinct logical `TraceDAGValue`s produce distinct `TraceDAGBytes`.
   * Ensures the same logical value always encodes to the same bytes (no configuration affecting layout).
5. **Binds to trace Artifacts correctly**
   * When forming trace Artifacts for `PEL/TRACE-DAG/1`, sets:
     * `Artifact.bytes = TraceDAGBytes`
     * `Artifact.type_tag = TYPE_TAG_PEL_TRACE_DAG_1`
   * Uses `ENC/ASL1-CORE v1` and `HASH/ASL1` for identity.
Everything else — storage, transport, policy, and graph interpretation — is delegated to other specifications.

---
## 10. Informative Example
> This example illustrates field layout only.
> Hex and values are illustrative, not normative test vectors.
Assume a simple run:
* `pel1_version = 1`
* `scheme_ref = S` (an ASL/1 Reference)
* `program_ref = P`
* `status = OK (0)`
* `summary.kind = NONE (0)`, `summary.status_code = 0`
* `exec_result_ref = R`
* Inputs: three `Reference`s `[I0, I1, I2]`
* No params (`params_ref = absent`)
* Program has two nodes in canonical order, with traces:
  * Node 1: OK, produced one output `[O0]`
  * Node 2: OK, produced one output `[O1]`
Simplified encoding sketch:
```text
pel1_version = 0001 ; u16
scheme_ref = EncodedRef(S) ; 4-byte length + ReferenceBytes(S)
program_ref = EncodedRef(P)
status = 00 ; OK
summary_kind = 00 ; NONE
summary_status_code = 00000000 ; status_code = 0
has_exec_result = 01
exec_result = EncodedRef(R)
input_ref_count = 00000003
input_refs = EncodedRef(I0) EncodedRef(I1) EncodedRef(I2)
has_params_ref = 00 ; none
node_trace_count = 00000002
; NodeTrace #0
node_id = 00000001
op_name = 00000005 "add64"
op_version = 00000001
status = 00 ; NODE_OK
status_code = 00000000
output_ref_count = 00000001
output_refs = EncodedRef(O0)
diag_count = 00000000
; NodeTrace #1
node_id = 00000002
op_name = 00000005 "mul64"
op_version = 00000001
status = 00 ; NODE_OK
status_code = 00000000
output_ref_count = 00000001
output_refs = EncodedRef(O1)
diag_count = 00000000
```
Where each `EncodedRef(X)` is:
```text
ref_len(X) (u32) || ReferenceBytes(X)
```
with `ReferenceBytes(X)` = `hash_id (u16)` + `digest` bytes per `ENC/ASL1-CORE v1`.
All conformant encoders MUST produce the same `TraceDAGBytes` for this logical trace value; all conformant decoders MUST reconstruct the same `TraceDAGValue`.
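
The sketch above can be checked by building the bytes directly. This Python snippet assumes, purely for illustration, that every `ReferenceBytes` is `hash_id = 0x0001` followed by a 32-byte digest filled with a marker byte (so each `EncodedRef` is 38 bytes); the helpers `ref`, `encoded_ref`, and `node_trace` are hypothetical. Under that assumption each node trace is 68 bytes and the whole trace is 382 bytes:

```python
import struct


def ref(marker: int) -> bytes:
    # Hypothetical ReferenceBytes: hash_id 0x0001 + 32-byte placeholder digest.
    return struct.pack(">H", 0x0001) + bytes([marker]) * 32


def encoded_ref(ref_bytes: bytes) -> bytes:
    return struct.pack(">I", len(ref_bytes)) + ref_bytes


def node_trace(node_id: int, op_name: str, output: bytes) -> bytes:
    name = op_name.encode("utf-8")
    return (struct.pack(">I", node_id)
            + struct.pack(">I", len(name)) + name
            + struct.pack(">IBI", 1, 0, 0)          # op_version=1, NODE_OK, status_code=0
            + struct.pack(">I", 1) + encoded_ref(output)
            + struct.pack(">I", 0))                 # no diagnostics


S, P, R = ref(0x05), ref(0x06), ref(0x07)
inputs = [ref(0x10), ref(0x11), ref(0x12)]
trace = (struct.pack(">H", 1)                       # pel1_version
         + encoded_ref(S) + encoded_ref(P)
         + struct.pack(">BBI", 0, 0, 0)             # status OK, kind NONE, status_code 0
         + b"\x01" + encoded_ref(R)                 # exec_result present
         + struct.pack(">I", len(inputs)) + b"".join(encoded_ref(i) for i in inputs)
         + b"\x00"                                  # no params_ref
         + struct.pack(">I", 2)
         + node_trace(1, "add64", ref(0x20))
         + node_trace(2, "mul64", ref(0x21)))
```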

---
**End of `ENC/PEL-TRACE-DAG/1 v0.1.0 — Canonical Encoding for DAG Execution Traces`**

---
## Document History
* **0.1.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.