# ENC/ASL1-CORE v1 — Core Canonical Encoding Profile
Status: Approved
Owner: Niklas Rydberg
Version: 1.0.5
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, binary-minimalism]
<!-- Source: /amduat/docs/new/enc-asl-core.md | Canonical: /amduat/tier1/enc-asl1-core.md -->
**Document ID:** `ENC/ASL1-CORE`
**Profile ID:** `ASL_ENC_CORE_V1 = 0x0001`
**Layer:** Substrate Primitive Profile (Canonical Encoding)
**Depends on (normative):**
* **ASL/1-CORE v0.4.1** (value model: `Artifact`, `TypeTag`, `Reference`, `HashId`)
**Integrates with (cross-profile rules):**
* **HASH/ASL1 v0.2.4** (ASL1 hash family: registry of `HashId → algorithm, digest length`)
* This profile does **not** depend on HASH/ASL1 to define its layouts.
* When both profiles are implemented, additional cross-checks apply (see §4.4, §5).
**Used by (descriptive):**
* ASL/1-CORE identity semantics (canonical encodings as the basis for hashing)
* ASL/1-STORE (persistence and integrity)
* PEL/1 (execution artifacts and results)
* CIL/1, FER/1, FCT/1, OI/1 (typed envelopes, receipts, facts, overlays)
* HASH/ASL1 (interpretation and checking of `ReferenceBytes`)
> The Profile ID `ASL_ENC_CORE_V1` and this document's version are **not** encoded into `ArtifactBytes` or `ReferenceBytes`. Encoding version is selected by context (deployment, profile, or store configuration), not embedded per value.
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Overview
`ENC/ASL1-CORE v1` defines the **canonical, streaming-friendly, injective binary encoding** used across the Amduat 2.0 substrate for two core value types from ASL/1-CORE:
1. **ArtifactBytes** — canonical bytes for an ASL/1 `Artifact`
2. **ReferenceBytes** — canonical bytes for an ASL/1 `Reference`
This profile ensures:
* **Injectivity** — distinct ASL/1 values map to distinct byte strings, and each value has exactly one canonical byte string.
* **Determinism** — identical values yield identical encodings across implementations.
* **Stability** — bytes never depend on platform, locale, endianness, or environment.
* **Streaming-compatibility** — encoders, decoders, and hashers operate in forward-only mode.
`ASL_ENC_CORE_V1` is the **canonical ASL/1 encoding profile** used by the Amduat 2.0 substrate stack for:
* ASL/1 identity model (via canonical encoding + ASL1 hashing),
* the hashing substrate (HASH/ASL1),
* ASL/1-STORE persistence semantics,
* PEL/1 execution input/output artifacts,
* and canonical near-core profiles.
The encodings defined in this profile satisfy all canonical encoding requirements in `ASL/1-CORE §3.2`: injectivity, stability, determinism, explicit structure, type-sensitivity, byte-transparency, and streaming-friendliness.
---
## 1. Scope & Layering
### 1.1 Purpose
This specification defines:
* The **canonical binary layout** for `ArtifactBytes` and `ReferenceBytes`.
* Normative encoding and decoding procedures.
* How these encodings interact with the ASL1 hash family.
* Required consistency checks when HASH/ASL1 is present.
* Streaming and injectivity requirements.
### 1.2 Non-goals
This profile does **not** define:
* Any filesystem, transport, or database representation.
* Chunking or multipart strategies for large artifacts.
* Any alternative encoding families (those are separate profiles).
* Semantics of `TypeTag` values or registry rules.
* Storage layout, replication, or policy.
Those concerns belong to ASL/1-STORE, PEL/1, HASH/ASL1, and higher layers.
### 1.3 Layering constraints
In line with the substrate overview:
* `ENC/ASL1-CORE` is a **near-core substrate profile**, not a kernel primitive.
* It **MUST NOT** re-define `Artifact`, `Reference`, `TypeTag`, or `HashId`; those are defined solely by `ASL/1-CORE`.
* It is **storage-neutral** and **policy-neutral**.
* It defines exactly one canonical encoding profile: `ASL_ENC_CORE_V1`.
---
## 2. Conventions
The key words **MUST**, **SHOULD**, **MAY**, etc. follow RFC 2119.
### 2.1 Integer encodings
All multi-byte integers are encoded as **big-endian**:
* `u8` — 1 byte
* `u16` — 2 bytes
* `u32` — 4 bytes
* `u64` — 8 bytes
Only **fixed-width** integers are used.
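
As a non-normative illustration, the fixed-width big-endian integers above can be sketched with Python's standard `struct` module. The helper names `u8`, `u16`, `u32`, and `u64` are hypothetical and chosen only to mirror the type names in this section.

```python
import struct

# Big-endian, fixed-width unsigned encoders mirroring §2.1.
# struct format: ">" selects big-endian; B/H/I/Q are 1/2/4/8-byte unsigned.
# struct.pack raises struct.error when the value does not fit the width.

def u8(n: int) -> bytes:
    return struct.pack(">B", n)

def u16(n: int) -> bytes:
    return struct.pack(">H", n)

def u32(n: int) -> bytes:
    return struct.pack(">I", n)

def u64(n: int) -> bytes:
    return struct.pack(">Q", n)
```

Because the width is fixed, every value has exactly one byte representation, with no shortest-form rule to enforce.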
### 2.2 Booleans (presence flags)
Booleans used as presence flags are encoded as:
* `false` → `0x00`
* `true` → `0x01`
Booleans are only used for presence flags, never for general logical conditions.
### 2.3 OctetString
Except where explicitly overridden, an `OctetString` is encoded as:
```text
[length (u64)] [raw bytes]
```
* `length` is the number of bytes.
* `length` MAY be zero.
* There is no implicit terminator or padding.
Whenever this profile says an ASL/1 field is an `OctetString`, its canonical encoding is this `u64 + bytes` form **unless explicitly stated otherwise**.
> **Exception:** `Reference.digest` is encoded without an explicit length field; see §4.2.
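
The generic `u64 + bytes` form can be sketched as follows. This is a minimal, non-normative illustration; the function names `encode_octet_string` and `decode_octet_string` and the `offset` parameter are assumptions made for the example and do not appear in this profile.

```python
import struct
from typing import Tuple

def encode_octet_string(data: bytes) -> bytes:
    # [length (u64, big-endian)] [raw bytes]; no terminator, no padding.
    return struct.pack(">Q", len(data)) + data

def decode_octet_string(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (payload, offset just past the payload)."""
    if len(buf) - offset < 8:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">Q", buf, offset)
    start = offset + 8
    end = start + length
    if end > len(buf):
        raise ValueError("truncated payload")
    return buf[start:end], end
```

Note that this form does not apply to `Reference.digest`, which carries no length field.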
---
## 3. Artifact Encoding
### 3.1 Logical structure (from ASL/1-CORE)
From `ASL/1-CORE`:
```text
TypeTag {
    tag_id: uint32
}
Artifact {
    bytes: OctetString
    type_tag: optional TypeTag
}
```
`TypeTag` semantics (registries, meaning of tag IDs) are opaque at this layer.
### 3.2 Canonical layout: ArtifactBytes
The canonical binary layout for an `Artifact` is:
```text
+----------------------+-------------------------+---------------------------+
| has_type_tag (u8) | [type_tag (u32)] | bytes_len (u64) |
+----------------------+-------------------------+---------------------------+
| bytes (b[bytes_len]) ...
+------------------------------------------------------------------------
```
Fields:
1. **has_type_tag (u8)** — presence flag for `type_tag`
* `0x00` → no `type_tag`
* `0x01` → `type_tag` is present and follows immediately
2. **type_tag (u32)** — only present if `has_type_tag == 0x01`
* Encodes `TypeTag.tag_id` as a 32-bit unsigned integer.
3. **bytes_len (u64)**
* Length in bytes of `Artifact.bytes`.
* MAY be zero.
4. **bytes**
* Raw bytes of `Artifact.bytes` (payload).
No padding, alignment, or variant tags are introduced beyond what is explicitly described above.
### 3.3 Encoding (normative)
Let `A` be an `Artifact`. The canonical encoding function:
```text
encode_artifact_core_v1 : Artifact → ArtifactBytes
```
is defined as:
1. Emit `has_type_tag` (`u8`):
* `0x00` if `A.type_tag` is absent.
* `0x01` if `A.type_tag` is present.
2. If `A.type_tag` is present, emit `A.type_tag.tag_id` as `u32`.
3. Let `bytes_len = len(A.bytes)`; emit `bytes_len` as `u64`.
4. Emit the raw bytes of `A.bytes`.
The result is the canonical `ArtifactBytes`.
This encoding satisfies the `ASL/1-CORE §3.2` requirements: injective, stable, deterministic, explicit in structure, type-sensitive, byte-transparent, and streaming-friendly.
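
The four encoding steps can be sketched in Python as below. This is a non-normative illustration: the `Artifact` dataclass, its field name `data` (standing in for `Artifact.bytes`), and the use of `Optional[int]` for the type tag are assumptions made for the example, not definitions from ASL/1-CORE.

```python
import struct
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Artifact:
    data: bytes                     # stands in for Artifact.bytes
    type_tag: Optional[int] = None  # TypeTag.tag_id, or None when absent

def encode_artifact_core_v1(a: Artifact) -> bytes:
    out = bytearray()
    if a.type_tag is None:
        out += b"\x00"                        # step 1: has_type_tag = 0x00
    else:
        out += b"\x01"                        # step 1: has_type_tag = 0x01
        out += struct.pack(">I", a.type_tag)  # step 2: tag_id as u32
    out += struct.pack(">Q", len(a.data))     # step 3: bytes_len as u64
    out += a.data                             # step 4: raw payload
    return bytes(out)
```

A real encoder for large payloads would likely write to a stream instead of building one buffer; the in-memory form is used here only to keep the sketch short.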
### 3.4 Decoding (normative)
Given a byte slice known to contain exactly one `ArtifactBytes` value, the canonical decoding function:
```text
decode_artifact_core_v1 : ArtifactBytes → Artifact
```
is defined as:
1. Read `has_type_tag` (`u8`).
* If the value is neither `0x00` nor `0x01`, fail with an encoding error.
2. If `has_type_tag == 0x01`, read `tag_id (u32)` and construct `TypeTag{ tag_id }`.
3. Read `bytes_len (u64)`.
4. Read exactly `bytes_len` bytes; this is `bytes`.
5. Construct `Artifact{ bytes, type_tag }` where `type_tag` is either `None` or `Some(TypeTag{ tag_id })` per steps above.
Decoders MUST reject:
* Invalid presence flags (`has_type_tag` not in `{0x00, 0x01}`).
* Truncated sequences (insufficient bytes for declared lengths).
* Over-long sequences where `bytes_len` cannot be represented or allocated safely in the implementation's execution model (encoding error).
* Trailing bytes if the decoding context expects an isolated `ArtifactBytes` value.
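
A decoder following the steps and rejection rules above might look like the sketch below. It is non-normative and assumes the input is an isolated `ArtifactBytes` value held fully in memory, so trailing bytes are rejected. The `EncodingError` class and the tuple return shape are hypothetical choices for the example.

```python
import struct
from typing import Optional, Tuple

class EncodingError(ValueError):
    """Hypothetical error type for malformed ArtifactBytes."""

def decode_artifact_core_v1(buf: bytes) -> Tuple[bytes, Optional[int]]:
    """Return (bytes, tag_id or None) from an isolated ArtifactBytes value."""
    if len(buf) < 1:
        raise EncodingError("truncated: missing has_type_tag")
    flag = buf[0]
    pos = 1
    if flag not in (0x00, 0x01):
        raise EncodingError("invalid has_type_tag")
    tag_id = None
    if flag == 0x01:
        if len(buf) - pos < 4:
            raise EncodingError("truncated: type_tag")
        (tag_id,) = struct.unpack_from(">I", buf, pos)
        pos += 4
    if len(buf) - pos < 8:
        raise EncodingError("truncated: bytes_len")
    (bytes_len,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    remaining = len(buf) - pos
    if bytes_len > remaining:
        raise EncodingError("truncated payload")
    if bytes_len < remaining:
        raise EncodingError("trailing bytes after isolated value")
    return buf[pos:pos + bytes_len], tag_id
```

Comparing `bytes_len` against the bytes actually available before slicing also guards against allocating for an implausibly large declared length.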
### 3.5 Injectivity
The mapping:
```text
Artifact → ArtifactBytes
```
defined by `encode_artifact_core_v1` is **injective**:
* Each `Artifact` value has exactly one canonical byte string.
* Decoding the canonical bytes via `decode_artifact_core_v1` yields exactly that `Artifact`.
### 3.6 Streaming properties
Encoders and decoders MUST NOT require backtracking:
* The header (`has_type_tag`, optional `type_tag`, `bytes_len`) is computed and emitted/read once, in order.
* `bytes` MAY be streamed directly:
* Encoders MAY produce the payload incrementally after emitting `bytes_len`.
* Decoders MAY pass the payload through to a consumer or hasher as it is read.
Incremental hashing (e.g., computing digests over `ArtifactBytes`) MUST be possible with a single forward pass over the byte stream.
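
As a non-normative sketch of single-pass hashing, the function below feeds the header and then the payload chunks into a hasher without buffering the whole artifact. It assumes SHA-256 via Python's `hashlib` and assumes the payload length is known before streaming begins (it must be, since `bytes_len` precedes the payload). The name `hash_artifact_stream` and the `chunk_size` parameter are hypothetical.

```python
import hashlib
import io
import struct
from typing import BinaryIO, Optional

def hash_artifact_stream(payload: BinaryIO, bytes_len: int,
                         tag_id: Optional[int] = None,
                         chunk_size: int = 64 * 1024) -> bytes:
    h = hashlib.sha256()
    # Header is emitted once, in order: has_type_tag, [type_tag], bytes_len.
    if tag_id is None:
        h.update(b"\x00")
    else:
        h.update(b"\x01" + struct.pack(">I", tag_id))
    h.update(struct.pack(">Q", bytes_len))
    # Payload is consumed forward-only, one chunk at a time.
    remaining = bytes_len
    while remaining > 0:
        chunk = payload.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("payload shorter than declared bytes_len")
        h.update(chunk)
        remaining -= len(chunk)
    return h.digest()
```

The result equals the digest of the fully materialized `ArtifactBytes`, because the hasher sees the same bytes in the same order.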
---
## 4. Reference Encoding
### 4.1 Logical structure (from ASL/1-CORE)
From `ASL/1-CORE`:
```text
Reference {
    hash_id: HashId   // uint16
    digest: OctetString
}
HashId = uint16
```
For encoding purposes, `Reference.digest` is treated as a raw digest byte string, not as a generic encoded `u64 + bytes` OctetString.
### 4.2 Canonical layout: ReferenceBytes
The canonical binary layout for a `Reference` is:
```text
+----------------+---------------------------+
| hash_id (u16) | digest (b[?]) ...
+----------------+---------------------------+
```
Fields:
1. **hash_id (u16)**
* Encodes `Reference.hash_id`.
* Semantically, an element of the `HashId` space defined by ASL/1-CORE (and populated by HASH/ASL1 when present).
2. **digest**
* Raw digest bytes.
* The length of `digest` is **not encoded** explicitly in this profile.
* Digest length is determined by the decoding context:
* by the **frame boundary** of the `ReferenceBytes` value (e.g. “this message consists of exactly one `ReferenceBytes`”), or
* by an outer length-prefix in a higher-level enclosing structure.
> This layout is an explicit exception to the general `OctetString = u64 + bytes` rule. It keeps `ReferenceBytes` compact and relies on framing + the hash registry for length.
### 4.3 Encoding (normative)
Let `R` be a `Reference`. The canonical encoding function:
```text
encode_reference_core_v1 : Reference → ReferenceBytes
```
is defined as:
1. Emit `hash_id = R.hash_id` as `u16`.
2. Emit the raw bytes of `R.digest`.
When `HASH/ASL1` is implemented and the `hash_id` is known, the encoder MUST ensure:
```text
len(R.digest) == expected_digest_length(hash_id)
```
where `expected_digest_length` is taken from the HASH/ASL1 registry.
The result is the canonical `ReferenceBytes`.
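
The two encoding steps and the optional length check can be sketched as follows. This is non-normative: `EXPECTED_DIGEST_LENGTH` is a hypothetical in-memory stand-in for the HASH/ASL1 registry, populated here with only the `0x0001` entry described in §5.2.

```python
import struct

# Hypothetical stand-in for the HASH/ASL1 registry (HashId -> digest length).
EXPECTED_DIGEST_LENGTH = {0x0001: 32}

def encode_reference_core_v1(hash_id: int, digest: bytes) -> bytes:
    expected = EXPECTED_DIGEST_LENGTH.get(hash_id)
    # Length is checked only when the hash_id is known to the registry.
    if expected is not None and len(digest) != expected:
        raise ValueError("digest length does not match registry")
    # hash_id as u16, then the raw digest with no length prefix.
    return struct.pack(">H", hash_id) + digest
```

An unknown `hash_id` is encoded without a length check, since no expected length is available for it.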
### 4.4 Decoding & consistency checks (normative)
Given a byte slice known to contain exactly one `ReferenceBytes` value, the canonical decoding function:
```text
decode_reference_core_v1 : ReferenceBytes → Reference
```
is defined as:
1. Read `hash_id` as `u16`.
2. Treat **all remaining bytes in the slice** as the digest `digest`.
3. Construct `Reference{ hash_id, digest }`.
**Boundary requirement:**
Decoding contexts MUST provide explicit boundaries for `ReferenceBytes` values (e.g., via an external length-prefix or by framing the entire message as a single `ReferenceBytes` value). A decoder MUST NOT read beyond the slice that defines the `ReferenceBytes` frame.
**Cross-profile consistency with HASH/ASL1 (when present):**
If the implementation also implements `HASH/ASL1` and recognizes this `hash_id`, then:
* Let `expected_len = expected_digest_length(hash_id)` from the ASL1 registry.
* The implementation **MUST** enforce:
```text
len(digest) == expected_len
```
* Any mismatch MUST result in an encoding/integrity error.
If the implementation does **not** implement HASH/ASL1 or does not recognize the `hash_id`:
* It MAY accept the value as a structurally well-formed `Reference`.
* It MUST treat the algorithm as **unsupported** for digest recomputation or verification.
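
A decoder with the cross-profile check might be sketched as below. It is non-normative and assumes the caller has already isolated the frame, so the whole input is one `ReferenceBytes` value. `EXPECTED_DIGEST_LENGTH` is a hypothetical stand-in for the HASH/ASL1 registry, and the third element of the returned tuple (whether the algorithm is supported for verification) is an illustrative choice, not part of this profile.

```python
import struct
from typing import Tuple

# Hypothetical stand-in for the HASH/ASL1 registry (HashId -> digest length).
EXPECTED_DIGEST_LENGTH = {0x0001: 32}

def decode_reference_core_v1(frame: bytes) -> Tuple[int, bytes, bool]:
    """Return (hash_id, digest, supported) from one framed ReferenceBytes."""
    if len(frame) < 2:
        raise ValueError("truncated: hash_id")
    (hash_id,) = struct.unpack_from(">H", frame, 0)
    digest = frame[2:]  # all remaining bytes in the frame are the digest
    expected = EXPECTED_DIGEST_LENGTH.get(hash_id)
    if expected is None:
        # Structurally well-formed, but unsupported for verification.
        return hash_id, digest, False
    if len(digest) != expected:
        raise ValueError("digest length mismatch for known hash_id")
    return hash_id, digest, True
```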
### 4.5 Injectivity
The mapping:
```text
Reference → ReferenceBytes
```
defined by `encode_reference_core_v1` is **injective**:
* Each `Reference` value has exactly one canonical byte string.
* Equality of `ReferenceBytes` implies equality of the underlying `Reference` (same `hash_id`, same digest bytes).
No additional normalization is performed.
---
## 5. Hash Interactions & Canonicality
### 5.1 Canonical hashing rule
For encoding profile `ASL_ENC_CORE_V1`, the canonical rule for constructing `Reference` values from `Artifact` values is:
```text
ArtifactBytes = encode_artifact_core_v1(A)
digest = H(ArtifactBytes)
Reference = { hash_id = HID, digest = digest }
```
where:
* `A` is an `Artifact` (ASL/1-CORE),
* `H` is a hash function associated with `HID` in the ASL1 hash family,
* `HID` is a `HashId` (u16).
This is `ASL/CORE-REF-DERIVE/1` instantiated with `ASL_ENC_CORE_V1`.
> **REF-DERIVE INV/ENC/1**
> Under `ASL_ENC_CORE_V1`, any component that claims to derive `Reference` values from `Artifact` values **MUST** use this rule.
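
The derivation rule can be sketched end to end as below. This is a non-normative illustration that assumes `HID = 0x0001` and takes `H` to be SHA-256 (via `hashlib`), matching the canonical deployment described in §5.2; other deployments may bind a different algorithm. The function name `derive_reference` and the tuple return shape are hypothetical.

```python
import hashlib
import struct
from typing import Optional, Tuple

HID_SHA256 = 0x0001  # assumption: the HashId assignment from §5.2

def derive_reference(data: bytes, tag_id: Optional[int] = None) -> Tuple[int, bytes]:
    # ArtifactBytes = encode_artifact_core_v1(A)
    header = b"\x00" if tag_id is None else b"\x01" + struct.pack(">I", tag_id)
    artifact_bytes = header + struct.pack(">Q", len(data)) + data
    # digest = H(ArtifactBytes); Reference = { hash_id = HID, digest }
    return HID_SHA256, hashlib.sha256(artifact_bytes).digest()
```

Because the type tag is part of `ArtifactBytes`, two artifacts with the same payload but different tags derive different references.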
### 5.2 Default algorithm in canonical deployments
In canonical Amduat 2.0 substrate deployments (per `HASH/ASL1`):
* `HashId = 0x0001` is assigned to `HASH-ASL1-256`.
* Digest length is 32 bytes.
* `HASH-ASL1-256` is SHA-256 or semantically equivalent.
This profile does **not** force any particular `HashId` in all deployments, but:
* if a deployment adopts `HashId = 0x0001` as `HASH-ASL1-256`, then any `Reference` with `hash_id = 0x0001` **MUST** have a 32-byte digest.
### 5.3 Deterministic agreement
If two implementations:
* implement `ASL_ENC_CORE_V1`, and
* use the same hash algorithm `H` for a given `HashId`,
then for any `Artifact A` they MUST:
* produce identical `ArtifactBytes = encode_artifact_core_v1(A)`,
* produce identical `digest = H(ArtifactBytes)`,
* produce identical `Reference` and `ReferenceBytes = encode_reference_core_v1(Reference)`.
This is the determinism foundation used by ASL/1-STORE, PEL/1, FER/1, and FCT/1.
### 5.4 Identity contexts and encoding profile selection
For any context where `Reference` values are derived (e.g. a store, a PEL engine, a profile), the **encoding profile MUST be fixed and explicit**.
If a context adopts `ASL_ENC_CORE_V1`:
* All `Reference` values in that context MUST be derived via `encode_artifact_core_v1` and the canonical hashing rule (§5.1).
* The context MUST NOT mix `Reference`s derived from different canonical encoding profiles inside the same logical identity space.
This ensures that for a given `(hash_id, digest)` pair, there is a unique underlying `ArtifactBytes` and `Artifact` (modulo cryptographic collisions).
---
## 6. Examples (Non-Normative)
Hex values are shown compactly. Spaces only separate fields for readability and are not part of the encoding.
### 6.1 Artifact without type tag
Artifact:
```text
bytes = DE AD // two bytes: 0xDE, 0xAD
type_tag = none
```
Encoding:
```text
has_type_tag = 00
bytes_len = 0000000000000002
bytes = DEAD
```
Canonical `ArtifactBytes`:
```text
00 0000000000000002 DEAD
```
Digest with `HASH-ASL1-256` (SHA-256):
```text
digest = SHA-256(00 0000000000000002 DEAD)
```
Assuming `HashId = 0001` for `HASH-ASL1-256`, the `ReferenceBytes` are:
```text
hash_id = 0001
digest = <32 digest bytes>
```
Canonical `ReferenceBytes`:
```text
0001 <32 digest bytes>
```
### 6.2 Artifact with type tag & empty bytes
Artifact:
```text
bytes = "" (empty)
type_tag = TypeTag{ tag_id = 5 }
```
Encoding:
```text
has_type_tag = 01
type_tag = 00000005
bytes_len = 0000000000000000
bytes = (none)
```
Canonical `ArtifactBytes`:
```text
01 00000005 0000000000000000
```
Hashing and `ReferenceBytes` proceed as in §6.1.
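
Both examples can be reproduced with a short script. This is a non-normative sketch: the helper names `artifact_bytes` and `reference_bytes` are hypothetical, and SHA-256 with `HashId = 0x0001` is assumed as in §6.1.

```python
import hashlib
import struct
from typing import Optional

def artifact_bytes(data: bytes, tag_id: Optional[int] = None) -> bytes:
    header = b"\x00" if tag_id is None else b"\x01" + struct.pack(">I", tag_id)
    return header + struct.pack(">Q", len(data)) + data

def reference_bytes(ab: bytes) -> bytes:
    # hash_id (u16) followed by the raw 32-byte SHA-256 digest.
    return struct.pack(">H", 0x0001) + hashlib.sha256(ab).digest()

ex1 = artifact_bytes(b"\xde\xad")      # §6.1: no type tag, two payload bytes
ex2 = artifact_bytes(b"", tag_id=5)    # §6.2: type tag 5, empty payload

print(ex1.hex())
print(ex2.hex())
print(len(reference_bytes(ex1)))       # 2-byte hash_id + 32-byte digest = 34
```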
---
## 7. Conformance
An implementation conforms to `ENC/ASL1-CORE v1.0.5` if and only if it:
1. **Correctly encodes and decodes Artifacts**
* Implements `encode_artifact_core_v1` and `decode_artifact_core_v1` exactly as in §3.3 and §3.4.
* Produces and accepts only the canonical layout for `ArtifactBytes`.
* Ensures injectivity and exact round-tripping.
2. **Correctly encodes and decodes References**
* Implements `encode_reference_core_v1` and `decode_reference_core_v1` exactly as in §4.3 and §4.4.
* Produces and accepts only the canonical layout for `ReferenceBytes` (no `digest_len` field).
* When HASH/ASL1 is also implemented:
* Enforces digest-length consistency for all known `HashId`s, i.e. `len(digest) == expected_digest_length(hash_id)`.
3. **Implements canonical hashing correctly**
* Uses `ArtifactBytes` from `encode_artifact_core_v1` as the **only** input to ASL1 hash functions when deriving `Reference`s under this profile.
* Computes `Reference` via the canonical rule in §5.1.
* Does not derive `Reference`s from non-canonical or alternative encodings in contexts that claim to use `ASL_ENC_CORE_V1`.
4. **Preserves streaming-friendliness**
* Does not require backward reads or multi-pass parsing for either `ArtifactBytes` or `ReferenceBytes`.
* Supports incremental hashing and streaming of payload bytes.
* Ensures that decoding contexts provide explicit boundaries for each `ReferenceBytes` value.
5. **Respects layering and identity semantics**
* Does not re-define `Artifact`, `Reference`, `TypeTag`, or `HashId` (those come from `ASL/1-CORE`).
* Treats storage, transport, and policy as out-of-scope (delegated to ASL/1-STORE and higher profiles).
* Ensures that two logical ASL/1 values encode identically under this profile **if and only if** they are identical under ASL/1-CORE semantics.
Everything else — transport, storage layout, replication, indexing, overlays, and policy — belongs to `ASL/1-STORE`, `HASH/ASL1`, `TGK/1`, and higher profiles.
---
## Document History
* **1.0.5 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.