amduat/tier1/enc-asl1-core.md
2025-12-19 19:22:40 +01:00


ENC/ASL1-CORE v1 — Core Canonical Encoding Profile

Status: Approved
Owner: Niklas Rydberg
Version: 1.0.5
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, binary-minimalism]

Document ID: ENC/ASL1-CORE
Profile ID: ASL_ENC_CORE_V1 = 0x0001
Layer: Substrate Primitive Profile (Canonical Encoding)

Depends on (normative):

  • ASL/1-CORE v0.4.1 (value model: Artifact, TypeTag, Reference, HashId)

Integrates with (cross-profile rules):

  • HASH/ASL1 v0.2.4 (ASL1 hash family: registry of HashId → algorithm, digest length)

    • This profile does not depend on HASH/ASL1 to define its layouts.
    • When both profiles are implemented, additional cross-checks apply (see §4.4, §5).

Used by (descriptive):

  • ASL/1-CORE identity semantics (canonical encodings as the basis for hashing)
  • ASL/1-STORE (persistence and integrity)
  • PEL/1 (execution artifacts and results)
  • CIL/1, FER/1, FCT/1, OI/1 (typed envelopes, receipts, facts, overlays)
  • HASH/ASL1 (interpretation and checking of ReferenceBytes)

The Profile ID ASL_ENC_CORE_V1 and this document's version are not encoded into ArtifactBytes or ReferenceBytes. The encoding version is selected by context (deployment, profile, or store configuration), not embedded per value.

© 2025 Niklas Rydberg.

License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.


0. Overview

ENC/ASL1-CORE v1 defines the canonical, streaming-friendly, injective binary encoding used across the Amduat 2.0 substrate for two core value types from ASL/1-CORE:

  1. ArtifactBytes — canonical bytes for an ASL/1 Artifact
  2. ReferenceBytes — canonical bytes for an ASL/1 Reference

This profile ensures:

  • Injectivity — each ASL/1 value maps to exactly one byte string.
  • Determinism — identical values yield identical encodings across implementations.
  • Stability — bytes never depend on platform, locale, host endianness, or environment.
  • Streaming-compatibility — encoders, decoders, and hashers operate in forward-only mode.

ASL_ENC_CORE_V1 is the canonical ASL/1 encoding profile used by the Amduat 2.0 substrate stack for:

  • ASL/1 identity model (via canonical encoding + ASL1 hashing),
  • the hashing substrate (HASH/ASL1),
  • ASL/1-STORE persistence semantics,
  • PEL/1 execution input/output artifacts,
  • and canonical near-core profiles.

The encodings defined in this profile satisfy all canonical encoding requirements in ASL/1-CORE §3.2: injectivity, stability, determinism, explicit structure, type-sensitivity, byte-transparency, and streaming-friendliness.


1. Scope & Layering

1.1 Purpose

This specification defines:

  • The canonical binary layout for ArtifactBytes and ReferenceBytes.
  • Normative encoding and decoding procedures.
  • How these encodings interact with the ASL1 hash family.
  • Required consistency checks when HASH/ASL1 is present.
  • Streaming and injectivity requirements.

1.2 Non-goals

This profile does not define:

  • Any filesystem, transport, or database representation.
  • Chunking or multipart strategies for large artifacts.
  • Any alternative encoding families (those are separate profiles).
  • Semantics of TypeTag values or registry rules.
  • Storage layout, replication, or policy.

Those concerns belong to ASL/1-STORE, PEL/1, HASH/ASL1, and higher layers.

1.3 Layering constraints

In line with the substrate overview:

  • ENC/ASL1-CORE is a near-core substrate profile, not a kernel primitive.
  • It MUST NOT re-define Artifact, Reference, TypeTag, or HashId; those are defined solely by ASL/1-CORE.
  • It is storage-neutral and policy-neutral.
  • It defines exactly one canonical encoding profile: ASL_ENC_CORE_V1.

2. Conventions

The key words MUST, MUST NOT, SHOULD, and MAY are to be interpreted as described in RFC 2119.

2.1 Integer encodings

All multi-byte integers are encoded as big-endian:

  • u8 — 1 byte
  • u16 — 2 bytes
  • u32 — 4 bytes
  • u64 — 8 bytes

Only fixed-width integers are used.
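
As a minimal sketch of the integer rule above, the following Python helpers map each width to a big-endian format from the standard `struct` module. The names `encode_uint` and `decode_uint` are hypothetical and are not defined by this profile.

```python
import struct

# Big-endian (">"), fixed-width unsigned formats from struct's format codes.
_FORMATS = {"u8": ">B", "u16": ">H", "u32": ">I", "u64": ">Q"}


def encode_uint(kind: str, value: int) -> bytes:
    """Encode value as a fixed-width big-endian unsigned integer.

    struct.pack raises struct.error when value does not fit the width.
    """
    return struct.pack(_FORMATS[kind], value)


def decode_uint(kind: str, data: bytes) -> int:
    """Decode a fixed-width big-endian unsigned integer.

    data must be exactly the width of the integer, else struct.error is raised.
    """
    (value,) = struct.unpack(_FORMATS[kind], data)
    return value
```

Because the byte order is stated explicitly in the format string, the output does not depend on the host's native byte order.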

2.2 Booleans (presence flags)

Booleans used as presence flags are encoded as:

  • false → 0x00
  • true → 0x01

Booleans are only used for presence flags, never for general logical conditions.

2.3 OctetString

Except where explicitly overridden, an OctetString is encoded as:

[length (u64)] [raw bytes]
  • length is the number of bytes.
  • length MAY be zero.
  • There is no implicit terminator or padding.

Whenever this profile says an ASL/1 field is an OctetString, its canonical encoding is this u64 + bytes form unless explicitly stated otherwise.

Exception: Reference.digest is encoded without an explicit length field; see §4.2.
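
The general OctetString form can be sketched as follows. This is an illustration under the assumption that the whole buffer is held in memory; the function names and the `offset` parameter are hypothetical.

```python
import struct
from typing import Tuple


def encode_octet_string(data: bytes) -> bytes:
    """Emit a u64 big-endian length followed by the raw bytes."""
    return struct.pack(">Q", len(data)) + data


def decode_octet_string(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one OctetString starting at offset.

    Returns (raw bytes, offset of the first byte after the value).
    """
    if len(buf) - offset < 8:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">Q", buf, offset)
    start = offset + 8
    end = start + length
    if end > len(buf):
        raise ValueError("truncated payload")
    # No terminator or padding follows the payload.
    return buf[start:end], end
```

A zero-length value encodes as eight zero bytes and nothing else.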


3. Artifact Encoding

3.1 Logical structure (from ASL/1-CORE)

From ASL/1-CORE:

TypeTag {
  tag_id: uint32
}

Artifact {
  bytes:    OctetString
  type_tag: optional TypeTag
}

TypeTag semantics (registries, meaning of tag IDs) are opaque at this layer.

3.2 Canonical layout: ArtifactBytes

The canonical binary layout for an Artifact is:

+----------------------+-------------------------+---------------------------+
| has_type_tag (u8)    | [type_tag (u32)]        | bytes_len (u64)           |
+----------------------+-------------------------+---------------------------+
| bytes (b[bytes_len])                                               ...
+------------------------------------------------------------------------

Fields:

  1. has_type_tag (u8) — presence flag for type_tag

    • 0x00 → no type_tag
    • 0x01 → type_tag is present and follows immediately
  2. type_tag (u32) — only present if has_type_tag == 0x01

    • Encodes TypeTag.tag_id as a 32-bit unsigned integer.
  3. bytes_len (u64)

    • Length in bytes of Artifact.bytes.
    • MAY be zero.
  4. bytes

    • Raw bytes of Artifact.bytes (payload).

No padding, alignment, or variant tags are introduced beyond what is explicitly described above.

3.3 Encoding (normative)

Let A be an Artifact. The canonical encoding function:

encode_artifact_core_v1 : Artifact → ArtifactBytes

is defined as:

  1. Emit has_type_tag (u8):

    • 0x00 if A.type_tag is absent.
    • 0x01 if A.type_tag is present.
  2. If A.type_tag is present, emit A.type_tag.tag_id as u32.

  3. Let bytes_len = len(A.bytes); emit bytes_len as u64.

  4. Emit the raw bytes of A.bytes.

The result is the canonical ArtifactBytes.

This encoding satisfies the ASL/1-CORE §3.2 requirements: injective, stable, deterministic, explicit in structure, type-sensitive, byte-transparent, and streaming-friendly.
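
The four encoding steps can be sketched in Python as below. The `Artifact` dataclass is a hypothetical in-memory stand-in for the ASL/1-CORE value (its `payload` field plays the role of Artifact.bytes, and `type_tag` holds TypeTag.tag_id or None); only the byte layout is taken from this section.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    payload: bytes                   # stands in for Artifact.bytes
    type_tag: Optional[int] = None   # TypeTag.tag_id, or None when absent


def encode_artifact_core_v1(a: Artifact) -> bytes:
    out = bytearray()
    # Step 1: presence flag.
    if a.type_tag is None:
        out += b"\x00"
    else:
        out += b"\x01"
        # Step 2: tag_id as big-endian u32 (struct.error if out of range).
        out += struct.pack(">I", a.type_tag)
    # Step 3: payload length as big-endian u64.
    out += struct.pack(">Q", len(a.payload))
    # Step 4: raw payload bytes, unmodified.
    out += a.payload
    return bytes(out)
```

For example, `encode_artifact_core_v1(Artifact(b"\xde\xad"))` yields the 11 bytes `00 0000000000000002 DEAD`, matching the layout in §3.2.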

3.4 Decoding (normative)

Given a byte slice known to contain exactly one ArtifactBytes value, the canonical decoding function:

decode_artifact_core_v1 : ArtifactBytes → Artifact

is defined as:

  1. Read has_type_tag (u8).

    • If the value is neither 0x00 nor 0x01, fail with an encoding error.
  2. If has_type_tag == 0x01, read tag_id (u32) and construct TypeTag{ tag_id }.

  3. Read bytes_len (u64).

  4. Read exactly bytes_len bytes; this is bytes.

  5. Construct Artifact{ bytes, type_tag } where type_tag is either None or Some(TypeTag{ tag_id }) per steps above.

Decoders MUST reject:

  • Invalid presence flags (has_type_tag not in {0x00, 0x01}).
  • Truncated sequences (insufficient bytes for declared lengths).
  • Over-long sequences where bytes_len cannot be represented or allocated safely in the implementation's execution model (encoding error).
  • Trailing bytes if the decoding context expects an isolated ArtifactBytes value.
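
A minimal decoder sketch covering the steps and rejection rules above, assuming the input is an isolated ArtifactBytes value held fully in memory. `EncodingError` is a hypothetical error type, and the result is returned as a plain `(payload, tag_id)` tuple rather than a real ASL/1-CORE Artifact.

```python
import struct
from typing import Optional, Tuple


class EncodingError(ValueError):
    """Hypothetical error type for malformed ArtifactBytes."""


def decode_artifact_core_v1(buf: bytes) -> Tuple[bytes, Optional[int]]:
    """Decode one isolated ArtifactBytes value into (payload, tag_id or None)."""
    if len(buf) < 1:
        raise EncodingError("truncated: missing has_type_tag")
    flag = buf[0]
    pos = 1
    if flag not in (0x00, 0x01):
        raise EncodingError("invalid has_type_tag")
    tag_id = None
    if flag == 0x01:
        if len(buf) - pos < 4:
            raise EncodingError("truncated: type_tag")
        (tag_id,) = struct.unpack_from(">I", buf, pos)
        pos += 4
    if len(buf) - pos < 8:
        raise EncodingError("truncated: bytes_len")
    (bytes_len,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    remaining = len(buf) - pos
    # Comparing against the bytes actually present also rejects absurdly
    # large declared lengths before any allocation is attempted.
    if bytes_len > remaining:
        raise EncodingError("truncated: payload shorter than bytes_len")
    if bytes_len < remaining:
        raise EncodingError("trailing bytes after payload")
    return buf[pos:pos + bytes_len], tag_id
```

A streaming decoder would apply the same checks in the same order but read from a stream instead of slicing a buffer.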

3.5 Injectivity

The mapping:

Artifact → ArtifactBytes

defined by encode_artifact_core_v1 is injective:

  • Each Artifact value has exactly one canonical byte string.
  • Decoding the canonical bytes via decode_artifact_core_v1 yields exactly that Artifact.
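
The two properties can be spot-checked on a handful of values, as in the sketch below. This is an illustration, not a proof: the `(payload, tag_id)` tuple representation and the compact `encode`/`decode` helpers are assumptions made for the example, and the decoder omits error handling.

```python
import struct
from typing import Optional, Tuple

Artifact = Tuple[bytes, Optional[int]]  # (payload, tag_id or None)


def encode(a: Artifact) -> bytes:
    payload, tag_id = a
    head = b"\x00" if tag_id is None else b"\x01" + struct.pack(">I", tag_id)
    return head + struct.pack(">Q", len(payload)) + payload


def decode(buf: bytes) -> Artifact:
    # Well-formed input only; validation is omitted to keep the sketch short.
    pos, tag_id = 1, None
    if buf[0] == 0x01:
        (tag_id,) = struct.unpack_from(">I", buf, pos)
        pos += 4
    (n,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    return buf[pos:pos + n], tag_id


# Values that differ only in tag presence, tag value, or payload.
samples = [(b"", None), (b"", 0), (b"\x00", None), (b"\xde\xad", 5), (b"\xde\xad", None)]
encoded = [encode(a) for a in samples]

assert len(set(encoded)) == len(samples)        # distinct values, distinct bytes
assert [decode(e) for e in encoded] == samples  # exact round-trip
```

Note that an absent type tag and a present tag with `tag_id = 0` produce different bytes, because the presence flag is part of the encoding.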

3.6 Streaming properties

Encoders and decoders MUST NOT require backtracking:

  • The header (has_type_tag, optional type_tag, bytes_len) is computed and emitted/read once, in order.

  • bytes MAY be streamed directly:

    • Encoders MAY produce the payload incrementally after emitting bytes_len.
    • Decoders MAY pass the payload through to a consumer or hasher as it is read.

Incremental hashing (e.g., computing digests over ArtifactBytes) MUST be possible with a single forward pass over the byte stream.
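
A single forward pass can be sketched with Python's `hashlib`, feeding the header and then each payload chunk to the hasher as it arrives. The function name and parameters are hypothetical, and SHA-256 is used only as an example hash (see §5.2).

```python
import hashlib
import struct
from typing import Iterable, Optional


def hash_artifact_streaming(chunks: Iterable[bytes], total_len: int,
                            tag_id: Optional[int] = None) -> bytes:
    """Digest ArtifactBytes without materializing them in memory.

    total_len must be known before the payload starts, because bytes_len
    precedes the payload in the layout.
    """
    h = hashlib.sha256()
    # Header: presence flag, optional type_tag, bytes_len, in order.
    if tag_id is None:
        h.update(b"\x00")
    else:
        h.update(b"\x01" + struct.pack(">I", tag_id))
    h.update(struct.pack(">Q", total_len))
    # Payload: each chunk is consumed once and never revisited.
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        h.update(chunk)
    if seen != total_len:
        raise ValueError("payload length does not match declared bytes_len")
    return h.digest()
```

The result equals hashing the fully assembled ArtifactBytes in one call, regardless of how the payload is split into chunks.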


4. Reference Encoding

4.1 Logical structure (from ASL/1-CORE)

From ASL/1-CORE:

Reference {
  hash_id: HashId   // uint16
  digest:  OctetString
}

HashId = uint16

For encoding purposes, Reference.digest is treated as a raw digest byte string, not as a generic OctetString in the u64 + bytes form of §2.3.

4.2 Canonical layout: ReferenceBytes

The canonical binary layout for a Reference is:

+----------------+---------------------------+
| hash_id (u16)  | digest (b[?])           ...
+----------------+---------------------------+

Fields:

  1. hash_id (u16)

    • Encodes Reference.hash_id.
    • Semantically, an element of the HashId space defined by ASL/1-CORE (and populated by HASH/ASL1 when present).
  2. digest

    • Raw digest bytes.

    • The length of digest is not encoded explicitly in this profile.

    • Digest length is determined by the decoding context:

      • by the frame boundary of the ReferenceBytes value (e.g. “this message consists of exactly one ReferenceBytes”), or
      • by an outer length-prefix in a higher-level enclosing structure.

This layout is an explicit exception to the general OctetString = u64 + bytes rule. It keeps ReferenceBytes compact and relies on framing + the hash registry for length.

4.3 Encoding (normative)

Let R be a Reference. The canonical encoding function:

encode_reference_core_v1 : Reference → ReferenceBytes

is defined as:

  1. Emit hash_id = R.hash_id as u16.

  2. Emit the raw bytes of R.digest.

When HASH/ASL1 is implemented and the hash_id is known, the encoder MUST ensure:

len(R.digest) == expected_digest_length(hash_id)

where expected_digest_length is taken from the HASH/ASL1 registry.

The result is the canonical ReferenceBytes.
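
A minimal sketch of the reference encoder, including the length check that applies when a registry is available. `EXPECTED_DIGEST_LENGTH` is a hypothetical stand-in for the HASH/ASL1 registry; its single entry mirrors the canonical-deployment assignment described in §5.2.

```python
import struct
from typing import Mapping, Optional

# Hypothetical registry stand-in: HashId -> digest length in bytes.
EXPECTED_DIGEST_LENGTH = {0x0001: 32}


def encode_reference_core_v1(hash_id: int, digest: bytes,
                             registry: Optional[Mapping[int, int]] = None) -> bytes:
    """Emit hash_id as big-endian u16 followed by the raw digest bytes."""
    # The length check applies only when a registry is present and knows hash_id.
    if registry is not None and hash_id in registry:
        if len(digest) != registry[hash_id]:
            raise ValueError("digest length does not match registry")
    # No length prefix is written for the digest.
    return struct.pack(">H", hash_id) + digest
```

With the registry above, a 32-byte digest under `hash_id = 0x0001` encodes to 34 bytes in total.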

4.4 Decoding & consistency checks (normative)

Given a byte slice known to contain exactly one ReferenceBytes value, the canonical decoding function:

decode_reference_core_v1 : ReferenceBytes → Reference

is defined as:

  1. Read hash_id as u16.

  2. Treat all remaining bytes in the slice as the digest.

  3. Construct Reference{ hash_id, digest }.

Boundary requirement:

Decoding contexts MUST provide explicit boundaries for ReferenceBytes values (e.g., via an external length-prefix or by framing the entire message as a single ReferenceBytes value). A decoder MUST NOT read beyond the slice that defines the ReferenceBytes frame.

Cross-profile consistency with HASH/ASL1 (when present):

If the implementation also implements HASH/ASL1 and recognizes this hash_id, then:

  • Let expected_len = expected_digest_length(hash_id) from the ASL1 registry.

  • The implementation MUST enforce:

    len(digest) == expected_len
    
  • Any mismatch MUST result in an encoding/integrity error.

If the implementation does not implement HASH/ASL1 or does not recognize the hash_id:

  • It MAY accept the value as a structurally well-formed Reference.
  • It MUST treat the algorithm as unsupported for digest recomputation or verification.
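
The decoding steps and both consistency branches can be sketched as follows. The `registry` argument is a hypothetical stand-in for the HASH/ASL1 registry, and the third return value is an illustrative way to signal whether the algorithm is recognized; neither is defined by this profile.

```python
import struct
from typing import Mapping, Optional, Tuple


def decode_reference_core_v1(
    frame: bytes, registry: Optional[Mapping[int, int]] = None
) -> Tuple[int, bytes, bool]:
    """Decode one framed ReferenceBytes value.

    Returns (hash_id, digest, recognized). frame must be exactly the bytes of
    one ReferenceBytes value; the caller supplies that boundary.
    """
    if len(frame) < 2:
        raise ValueError("truncated: hash_id")
    (hash_id,) = struct.unpack_from(">H", frame, 0)
    digest = frame[2:]  # everything after hash_id, never beyond the frame
    recognized = registry is not None and hash_id in registry
    if recognized and len(digest) != registry[hash_id]:
        raise ValueError("digest length does not match registry")
    # When not recognized, the value is structurally well-formed, but the
    # caller must not attempt to recompute or verify the digest.
    return hash_id, digest, recognized
```

A caller without HASH/ASL1 passes no registry and receives `recognized = False` for every value.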

4.5 Injectivity

The mapping:

Reference → ReferenceBytes

defined by encode_reference_core_v1 is injective:

  • Each Reference value has exactly one canonical byte string.
  • Equality of ReferenceBytes implies equality of the underlying Reference (same hash_id, same digest bytes).

No additional normalization is performed.


5. Hash Interactions & Canonicality

5.1 Canonical hashing rule

For encoding profile ASL_ENC_CORE_V1, the canonical rule for constructing Reference values from Artifact values is:

ArtifactBytes = encode_artifact_core_v1(A)
digest        = H(ArtifactBytes)
Reference     = { hash_id = HID, digest = digest }

where:

  • A is an Artifact (ASL/1-CORE),
  • H is a hash function associated with HID in the ASL1 hash family,
  • HID is a HashId (u16).

This is ASL/CORE-REF-DERIVE/1 instantiated with ASL_ENC_CORE_V1.

REF-DERIVE INV/ENC/1 Under ASL_ENC_CORE_V1, any component that claims to derive Reference values from Artifact values MUST use this rule.
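
The derivation rule can be sketched end to end as below. The `HASH_FUNCTIONS` table is a hypothetical stand-in for the ASL1 hash family and assumes the canonical-deployment mapping of HashId 0x0001 to SHA-256 described in §5.2; the function names and the `(payload, tag_id)` parameters are illustrative.

```python
import hashlib
import struct
from typing import Optional, Tuple

# Hypothetical stand-in for the ASL1 hash family: HashId -> hash constructor.
HASH_FUNCTIONS = {0x0001: hashlib.sha256}


def encode_artifact_core_v1(payload: bytes, tag_id: Optional[int] = None) -> bytes:
    head = b"\x00" if tag_id is None else b"\x01" + struct.pack(">I", tag_id)
    return head + struct.pack(">Q", len(payload)) + payload


def derive_reference(payload: bytes, tag_id: Optional[int] = None,
                     hash_id: int = 0x0001) -> Tuple[int, bytes]:
    """Return (hash_id, digest) for an Artifact under ASL_ENC_CORE_V1."""
    artifact_bytes = encode_artifact_core_v1(payload, tag_id)
    # The hash input is the canonical ArtifactBytes, never the bare payload.
    digest = HASH_FUNCTIONS[hash_id](artifact_bytes).digest()
    return hash_id, digest
```

Because the type tag is part of ArtifactBytes, two Artifacts with the same payload but different tags derive different References.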

5.2 Default algorithm in canonical deployments

In canonical Amduat 2.0 substrate deployments (per HASH/ASL1):

  • HashId = 0x0001 is assigned to HASH-ASL1-256.
  • Digest length is 32 bytes.
  • HASH-ASL1-256 is SHA-256 or semantically equivalent.

This profile does not force any particular HashId in all deployments, but:

  • if a deployment adopts HashId = 0x0001 as HASH-ASL1-256, then any Reference with hash_id = 0x0001 MUST have a 32-byte digest.

5.3 Deterministic agreement

If two implementations:

  • implement ASL_ENC_CORE_V1, and
  • use the same hash algorithm H for a given HashId,

then for any Artifact A they MUST:

  • produce identical ArtifactBytes = encode_artifact_core_v1(A),
  • produce identical digest = H(ArtifactBytes),
  • produce identical Reference and ReferenceBytes = encode_reference_core_v1(Reference).

This is the determinism foundation used by ASL/1-STORE, PEL/1, FER/1, and FCT/1.

5.4 Identity contexts and encoding profile selection

For any context where Reference values are derived (e.g. a store, a PEL engine, a profile), the encoding profile MUST be fixed and explicit.

If a context adopts ASL_ENC_CORE_V1:

  • All Reference values in that context MUST be derived via encode_artifact_core_v1 and the canonical hashing rule (§5.1).
  • The context MUST NOT mix References derived from different canonical encoding profiles inside the same logical identity space.

This ensures that for a given (hash_id, digest) pair, there is a unique underlying ArtifactBytes and Artifact (modulo cryptographic collisions).


6. Examples (Non-Normative)

Hex values are shown compactly. Spaces between fields are for readability only and are not part of the encoding.

6.1 Artifact without type tag

Artifact:

bytes    = DE AD        // two bytes: 0xDE, 0xAD
type_tag = none

Encoding:

has_type_tag = 00
bytes_len    = 0000000000000002
bytes        = DEAD

Canonical ArtifactBytes:

00 0000000000000002 DEAD

Digest with HASH-ASL1-256 (SHA-256):

digest = SHA-256(00 0000000000000002 DEAD)

Assuming HashId = 0001 for HASH-ASL1-256, the ReferenceBytes are:

hash_id = 0001
digest  = <32 digest bytes>

Canonical ReferenceBytes:

0001 <32 digest bytes>

6.2 Artifact with type tag & empty bytes

Artifact:

bytes    = ""  (empty)
type_tag = TypeTag{ tag_id = 5 }

Encoding:

has_type_tag = 01
type_tag     = 00000005
bytes_len    = 0000000000000000
bytes        =  (none)

Canonical ArtifactBytes:

01 00000005 0000000000000000

Hashing and ReferenceBytes proceed as in §6.1.
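
The two examples can be reproduced with a short script such as the one below, which checks the ArtifactBytes against the hex shown above and prints the resulting ReferenceBytes. It assumes HashId 0x0001 maps to SHA-256 as in §5.2; the helper names are hypothetical.

```python
import hashlib
import struct
from typing import Optional


def artifact_bytes(payload: bytes, tag_id: Optional[int] = None) -> bytes:
    head = b"\x00" if tag_id is None else b"\x01" + struct.pack(">I", tag_id)
    return head + struct.pack(">Q", len(payload)) + payload


def reference_bytes(ab: bytes, hash_id: int = 0x0001) -> bytes:
    # Assumes hash_id 0x0001 is SHA-256; other ids are not handled here.
    return struct.pack(">H", hash_id) + hashlib.sha256(ab).digest()


# §6.1: two payload bytes, no type tag.
ex1 = artifact_bytes(bytes.fromhex("DEAD"))
assert ex1 == bytes.fromhex("00 0000000000000002 DEAD")

# §6.2: empty payload, tag_id = 5.
ex2 = artifact_bytes(b"", tag_id=5)
assert ex2 == bytes.fromhex("01 00000005 0000000000000000")

for name, ab in (("6.1", ex1), ("6.2", ex2)):
    rb = reference_bytes(ab)
    # 2 bytes of hash_id plus 32 digest bytes.
    print(name, ab.hex(), rb.hex(), len(rb))
```

Each printed ReferenceBytes value is 34 bytes long and begins with `0001`.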


7. Conformance

An implementation conforms to ENC/ASL1-CORE v1.0.5 if and only if it:

  1. Correctly encodes and decodes Artifacts

    • Implements encode_artifact_core_v1 and decode_artifact_core_v1 exactly as in §3.3 and §3.4.
    • Produces and accepts only the canonical layout for ArtifactBytes.
    • Ensures injectivity and exact round-tripping.
  2. Correctly encodes and decodes References

    • Implements encode_reference_core_v1 and decode_reference_core_v1 exactly as in §4.3 and §4.4.

    • Produces and accepts only the canonical layout for ReferenceBytes (no digest_len field).

    • When HASH/ASL1 is also implemented:

      • Enforces digest-length consistency for all known HashIds, i.e. len(digest) == expected_digest_length(hash_id).
  3. Implements canonical hashing correctly

    • Uses ArtifactBytes from encode_artifact_core_v1 as the only input to ASL1 hash functions when deriving References under this profile.
    • Computes Reference via the canonical rule in §5.1.
    • Does not derive References from non-canonical or alternative encodings in contexts that claim to use ASL_ENC_CORE_V1.
  4. Preserves streaming-friendliness

    • Does not require backward reads or multi-pass parsing for either ArtifactBytes or ReferenceBytes.
    • Supports incremental hashing and streaming of payload bytes.
    • Ensures that decoding contexts provide explicit boundaries for each ReferenceBytes value.
  5. Respects layering and identity semantics

    • Does not re-define Artifact, Reference, TypeTag, or HashId (those come from ASL/1-CORE).
    • Treats storage, transport, and policy as out-of-scope (delegated to ASL/1-STORE and higher profiles).
    • Ensures that two logical ASL/1 values encode identically under this profile if and only if they are identical under ASL/1-CORE semantics.

Everything else — transport, storage layout, replication, indexing, overlays, and policy — belongs to ASL/1-STORE, HASH/ASL1, TGK/1, and higher profiles.


Document History

  • 1.0.5 (2025-11-16): Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.