
# HASH/ASL1 — ASL1 Hash Algorithm Registry

Status: Approved
Owner: Niklas Rydberg
Version: 0.2.4
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, registry]

<!-- Source: /amduat/docs/new/hash-asl.md | Canonical: /amduat/tier1/hash-asl1.md -->

**Document ID:** `HASH/ASL1`
**Layer:** Substrate primitive profile (over ASL/1-CORE)

**Depends on (normative):**

* `ASL/1-CORE v0.4.x` — value substrate: `HashId`, `Reference`, `Artifact`, `EncodingProfileId`
* `ENC/ASL1-CORE v1.x` — canonical encoding for `Reference` (`ReferenceBytes`)

**Informative references:**

* `ASL/1-STORE v0.4.x` — content-addressable store model
* `TGK/1-CORE v0.7.x` — trace graph kernel (uses `Reference`)
* `PEL/1` — execution substrate
* `CIL/1`, `FCT/1`, `FER/1`, `OI/1` — profiles that depend on stable `Reference` semantics
* (future) `CID/1` — content identifier and domain-separation rules

© 2025 Niklas Rydberg.

## License

Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.

Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.


---

## 0. Purpose & Context

`HASH/ASL1` defines the **ASL1 hash algorithm family** for Amduat 2.0:

* assigns stable `HashId` (`uint16`) values to concrete cryptographic hash algorithms;
* defines the **mandatory** baseline algorithm `HASH-ASL1-256`;
* reserves ranges for future classical and post-quantum algorithms;
* specifies how these algorithms are used when deriving `Reference` values:

  * via `ASL/CORE-REF-DERIVE/1` in `ASL/1-CORE`, and
  * via `ENC/ASL1-CORE v1` binary encoding of `ReferenceBytes`.

This is a **substrate primitive profile**, not part of the kernel. Nevertheless:

> In Amduat 2.0, all **identity-critical** `Reference.hash_id` values used by the standard stack (ASL/1-STORE, TGK/1-CORE, PEL/1, CIL/1, FER/1, FCT/1, OI/1) MUST be interpreted according to this registry.

---

## 1. Scope

### 1.1 In scope

This specification standardizes:

1. The **ASL1 hash family**: common properties all algorithms must satisfy.

2. A **registry** from `HashId` → algorithm descriptor:

   * `HashId` (`uint16`),
   * digest length (bytes),
   * normative definition and status.

3. How these algorithms connect to:

   * `ASL/1-CORE`’s Reference derivation rule (`ASL/CORE-REF-DERIVE/1`),
   * `ENC/ASL1-CORE v1`’s `ReferenceBytes` encoding.

4. Rules for **algorithm evolution**:

   * immutability of assignments,
   * constraints for adding new algorithms.

### 1.2 Out of scope

This specification does **not** define:

* storage APIs, replication, or retention,
* execution runtimes, scheduling, or side effects,
* keyed constructions (MACs, KDFs, PRFs, etc.),
* non-cryptographic hashes,
* domain-separation rules at the CID layer (those belong in `CID/1` and/or encoding profiles),
* migration policy (it only provides primitives).

---

## 2. Terminology & Conventions

The key words **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, and **MAY** in this document are to be interpreted as described in RFC 2119.

From `ASL/1-CORE`:

* `OctetString` — finite byte sequence (`0x00–0xFF`),
* `HashId` — `uint16`, used as `Reference.hash_id`,
* `Reference` — `{ hash_id: HashId; digest: OctetString }`,
* `EncodingProfileId` — `uint16` identifying canonical encodings (e.g. `ASL_ENC_CORE_V1`),
* `ASL/CORE-REF-DERIVE/1` — normative Reference derivation rule.

From `ENC/ASL1-CORE v1` (current):

* `ReferenceBytes` — canonical encoding:

  ```text
  u16 hash_id
  digest[...]  // remaining bytes in the frame are the digest
  ```

**Note:** `Reference` carries only `hash_id` and `digest`. There is no extra “family” field on-wire. For Amduat 2.0, `HashId` values in ASL/1 contexts are **globally** interpreted using this `HASH/ASL1` registry.

---

## 3. The ASL1 Hash Family

### 3.1 Family properties

All `"ASL1"` algorithms MUST be **cryptographic hash functions** that provide:

* **Preimage resistance** – infeasible to find `x` for a given digest `d` with `H(x) = d`.
* **Second-preimage resistance** – infeasible, given `x`, to find `x' ≠ x` with `H(x') = H(x)`.
* **Collision resistance** – infeasible to find any `(x, x')`, `x ≠ x'` with `H(x) = H(x')`.

Each `"ASL1"` algorithm:

* accepts arbitrary-length `OctetString` inputs,
* produces a **fixed-length** `OctetString` digest,
* MUST support **incremental / streaming** operation:

  * a single forward-only pass over input,
  * no need to buffer entire input.

These properties allow:

* hashing large canonical encodings incrementally,
* use in streaming stores and execution engines.
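
As a minimal sketch of the streaming requirement, the snippet below hashes an input that arrives in chunks and compares the result with hashing the same bytes in one call. It uses SHA-256 from Python's `hashlib` as a stand-in, because §4.2 defines `HASH-ASL1-256` as SHA-256. The function names `digest_streaming` and `digest_one_shot` are illustrative assumptions, not part of this specification.

```python
import hashlib
from typing import Iterable


def digest_streaming(chunks: Iterable[bytes]) -> bytes:
    """Hash input delivered as successive chunks in a single forward-only pass."""
    h = hashlib.sha256()
    for chunk in chunks:
        # Only the current chunk needs to be in memory at any time.
        h.update(chunk)
    return h.digest()


def digest_one_shot(data: bytes) -> bytes:
    """Hash the whole input at once, for comparison."""
    return hashlib.sha256(data).digest()


# Chunk boundaries must not influence the digest.
assert digest_streaming([b"ab", b"", b"c"]) == digest_one_shot(b"abc")
```

The digest depends only on the concatenated bytes, so a store or execution engine can choose any chunk size.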

### 3.2 Family name and global use

* Family name: `"ASL1"`.

Within Amduat 2.0:

* all **identity-critical** `Reference.hash_id` values used by the standard stack are interpreted as entries in this `"ASL1"` registry;
* `HASH/ASL1` is therefore the **global assignment** for `HashId` in ASL/1 identity contexts.

If other hash families are used in non-ASL contexts (e.g., external APIs), they **MUST NOT** reuse `HashId` values defined here for `Reference.hash_id` in ASL/1-CORE. Identifiers for such families should either:

* live in separate fields / structures; or
* use distinct namespaces not confused with `Reference.hash_id`.

### 3.3 HashId space

`HashId` is `uint16` and appears in `Reference.hash_id` and in `ReferenceBytes.hash_id`.

This registry reserves:

* `0x0000` — **Reserved** (never a valid algorithm).
* `0x0001–0x7FFF` — classical (pre-quantum) `"ASL1"` algorithms.
* `0x8000–0xFFFF` — post-quantum or specialized `"ASL1"` algorithms.

Each algorithm has an intrinsic digest length `L` (>0 bytes), defined by its normative spec. This document does not impose an upper bound beyond “finite and practically representable in implementations.” (ENC/ASL1-CORE v1 does not carry the length explicitly; length is implied by framing and cross-checked against `L` when the algorithm is known.)
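
The range split above can be expressed as a small classifier. This is only a sketch: the function name `classify_hash_id` and the returned labels are assumptions chosen for illustration.

```python
def classify_hash_id(hash_id: int) -> str:
    """Map a HashId to the range it falls in, following the reservations above."""
    if not 0 <= hash_id <= 0xFFFF:
        raise ValueError("HashId must fit in uint16")
    if hash_id == 0x0000:
        return "reserved"  # never a valid algorithm
    if hash_id <= 0x7FFF:
        return "classical"
    return "post-quantum-or-specialized"
```

Note that the range only indicates where an identifier belongs; whether an algorithm is actually assigned to it is decided by the registry in §4.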

---

## 4. Algorithm Registry

### 4.1 Registry (v0.2.4)

The `"ASL1"` registry is a mapping:

```text
HashId (uint16) -> Algorithm descriptor
```

At version 0.2.4:

|        HashId | Name          | Digest (bytes) | Status    | Notes                                      |
| ------------: | ------------- | -------------- | --------- | ------------------------------------------ |
|    **0x0001** | HASH-ASL1-256 | 32             | MANDATORY | Canonical default for `ASL_ENC_CORE_V1`    |
|        0x0002 | HASH-ASL1-512 | 64 (reserved)  | RESERVED  | Intended classical 512-bit algorithm       |
|        0x8001 | HASH-ASL1-PQ1 | TBD            | RESERVED  | First PQ algorithm placeholder             |
| 0x8002–0x80FF | —             | varies         | RESERVED  | Reserved range for future PQ / specialized |

Only `0x0001` is defined normatively at this version; others are reserved for future assignment.
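
One possible in-memory representation of this table is sketched below. The names `AlgorithmDescriptor`, `REGISTRY`, and `lookup_usable` are hypothetical; the values are copied from the table, with `None` standing in for a digest length marked TBD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlgorithmDescriptor:
    hash_id: int
    name: str
    digest_len: Optional[int]  # None where the registry says TBD
    status: str


REGISTRY = {
    0x0001: AlgorithmDescriptor(0x0001, "HASH-ASL1-256", 32, "MANDATORY"),
    0x0002: AlgorithmDescriptor(0x0002, "HASH-ASL1-512", 64, "RESERVED"),
    0x8001: AlgorithmDescriptor(0x8001, "HASH-ASL1-PQ1", None, "RESERVED"),
}


def lookup_usable(hash_id: int) -> Optional[AlgorithmDescriptor]:
    """Return a descriptor only for algorithms that are normatively defined."""
    descriptor = REGISTRY.get(hash_id)
    if descriptor is None or descriptor.status == "RESERVED":
        # Covers unassigned IDs, named reservations, and the 0x8002-0x80FF block.
        return None
    return descriptor
```

At this registry version, `lookup_usable` returns a descriptor for `0x0001` only.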

### 4.2 HASH-ASL1-256 (mandatory)

* **Name:** `HASH-ASL1-256`
* **HashId:** `0x0001`
* **Digest length:** 32 bytes
* **Status:** MANDATORY for all Amduat 2.0–conformant implementations

#### 4.2.1 Normative definition

`HASH-ASL1-256` is **bit-for-bit identical** to SHA-256 as defined in FIPS 180-4 (or any successor that preserves SHA-256 semantics).

For all `data : OctetString`:

```text
HASH-ASL1-256(data) == SHA-256(data)
```

Any implementation whose output differs from SHA-256 for any input MUST NOT claim to implement `HASH-ASL1-256`.

`HASH-ASL1-256` MUST be deterministic and support incremental processing of input.
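
Because the definition is an exact equality with SHA-256, an implementation can be checked against well-known SHA-256 test vectors. The sketch below wraps Python's `hashlib.sha256`; the wrapper name `hash_asl1_256` is an assumption used for illustration.

```python
import hashlib


def hash_asl1_256(data: bytes) -> bytes:
    """HASH-ASL1-256 sketch: delegates directly to SHA-256."""
    return hashlib.sha256(data).digest()


# Well-known SHA-256 vectors for the empty input and for "abc".
assert hash_asl1_256(b"").hex() == (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
assert hash_asl1_256(b"abc").hex() == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)
```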

#### 4.2.2 Relationship to ASL/1-CORE & ASL_ENC_CORE_V1

`ASL/1-CORE` defines `ASL/CORE-REF-DERIVE/1`:

```text
ArtifactBytes = encode_P(A)
digest        = H(ArtifactBytes)
Reference     = { hash_id = HID, digest = digest }
```

For:

* `P = ASL_ENC_CORE_V1` (`EncodingProfileId = 0x0001`),
* `HID = 0x0001`,
* `H = HASH-ASL1-256`,

this becomes the **canonical default** Reference derivation for Amduat 2.0.

Unless a profile explicitly opts out, all identity-critical `Reference` values for Artifacts encoded under `ASL_ENC_CORE_V1` **MUST** use this `(P, H)` pair.

### 4.3 Reserved IDs

The following identifiers are reserved:

* `0x0002` — `HASH-ASL1-512`, digest length 64 bytes; classical 512-bit algorithm (e.g. SHA-512 or similar), TBD.
* `0x8001` — `HASH-ASL1-PQ1`; first post-quantum algorithm, TBD.
* `0x8002–0x80FF` — reserved block for additional post-quantum / specialized algorithms.

Implementations MUST NOT treat these IDs as usable until a future `HASH/ASL1` revision defines them normatively.

---

## 5. Interaction with ASL/1-CORE & ENC/ASL1-CORE v1

### 5.1 Reference derivation

`ASL/1-CORE` defines `ASL/CORE-REF-DERIVE/1`. `HASH/ASL1` simply supplies the `"ASL1"` algorithms and `HashId`s.

Given:

* Artifact `A`,
* encoding profile `P`,
* algorithm `H` with `HashId = HID`,

then:

```text
ArtifactBytes = encode_P(A)
digest        = H(ArtifactBytes)
Reference     = { hash_id = HID, digest = digest }
```

All ASL/1 conformant components **MUST** use this procedure for any `(EncodingProfileId, HashId)` pair they claim to support.
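
The procedure can be sketched as follows. Canonical encoding (`encode_P`) is defined elsewhere, so this sketch takes already-encoded `ArtifactBytes` as input. The names `Reference`, `HASH_FUNCS`, and `derive_reference` are illustrative assumptions, and only `0x0001` is wired up because it is the only algorithm defined at this version.

```python
import hashlib
from typing import Callable, Dict, NamedTuple


class Reference(NamedTuple):
    hash_id: int
    digest: bytes


# HashId -> hash function; only 0x0001 is normatively defined at v0.2.4.
HASH_FUNCS: Dict[int, Callable[[bytes], bytes]] = {
    0x0001: lambda data: hashlib.sha256(data).digest(),
}


def derive_reference(artifact_bytes: bytes, hash_id: int) -> Reference:
    """Apply digest = H(ArtifactBytes) and pair the digest with its HashId."""
    try:
        h = HASH_FUNCS[hash_id]
    except KeyError:
        raise ValueError(f"unsupported HashId 0x{hash_id:04X}") from None
    return Reference(hash_id=hash_id, digest=h(artifact_bytes))
```

Refusing unsupported identifiers, rather than falling back to a default, keeps the `(EncodingProfileId, HashId)` pair explicit at every call site.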

### 5.2 ReferenceBytes under ENC/ASL1-CORE v1

`ENC/ASL1-CORE v1` encodes a `Reference` as:

```text
u16 hash_id
digest[...]  // remaining bytes in the enclosing frame are the digest
```

This profile does **not** carry an explicit digest length; framing is provided by the enclosing structure (e.g., length-prefix, message boundary).

When an implementation both:

* decodes `ReferenceBytes` under `ENC/ASL1-CORE v1`, and
* implements `HASH/ASL1` and recognizes `hash_id`,

then it MUST enforce:

```text
len(digest) == canonical_digest_length(hash_id)
```

where `canonical_digest_length(hash_id)` is taken from this registry.

Any mismatch MUST be treated as an encoding / integrity error by the consumer.

If a `hash_id` is unknown (or HASH/ASL1 is not implemented), an implementation MAY still treat the bytes as a generic `Reference { hash_id, digest }`, but:

* it cannot recompute or verify the digest cryptographically, and
* higher layers MAY treat such a `Reference` as unsupported or lower-trust.
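
A decoder that applies the length check might look like the sketch below. It assumes the caller has already isolated the frame, and it assumes `hash_id` is big-endian, which is inferred from the `0001` prefix in the example of §9 rather than stated here. All names (`decode_reference_bytes`, `ReferenceDecodeError`, `CANONICAL_DIGEST_LENGTH`) are hypothetical.

```python
import struct
from typing import NamedTuple


class Reference(NamedTuple):
    hash_id: int
    digest: bytes


class ReferenceDecodeError(ValueError):
    """Raised for encoding / integrity errors in ReferenceBytes."""


# Digest lengths for algorithms this implementation recognizes.
CANONICAL_DIGEST_LENGTH = {0x0001: 32}


def decode_reference_bytes(frame: bytes) -> Reference:
    """Decode one framed ReferenceBytes value (u16 hash_id, then the digest)."""
    if len(frame) < 2:
        raise ReferenceDecodeError("frame too short for hash_id")
    (hash_id,) = struct.unpack(">H", frame[:2])  # assumed big-endian
    digest = frame[2:]
    expected = CANONICAL_DIGEST_LENGTH.get(hash_id)
    if expected is not None and len(digest) != expected:
        raise ReferenceDecodeError(
            f"hash_id 0x{hash_id:04X} expects {expected} digest bytes, got {len(digest)}"
        )
    # Unknown hash_id: accepted structurally, but not verifiable here.
    return Reference(hash_id, digest)
```

For an unknown `hash_id` the function returns the `Reference` unchanged; deciding whether to trust it is left to higher layers.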

---

## 6. Crypto Agility & Evolution

### 6.1 Immutability of assignments

Once a `HashId` is assigned to an algorithm, its:

* digest length,
* underlying construction,
* behavior on all inputs,

MUST NOT change in any way that alters output values for the **same input bytes**.

For example:

* `HashId = 0x0001` MUST always denote SHA-256 semantics; future revisions cannot redefine it as anything that changes the digest for the same input bytes (e.g. “SHA-256 plus domain separator”).

If domain separation or similar techniques are required, they MUST be expressed at the **input construction** level (e.g. in `CID/1` or encoding profiles), not by changing the hash function definition.
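
To illustrate the difference, the sketch below applies a domain tag while constructing the input and leaves the hash function untouched. The tag format (a two-byte length prefix followed by the tag) and the name `hash_with_domain` are purely hypothetical; actual rules would come from `CID/1` or an encoding profile.

```python
import hashlib


def hash_asl1_256(data: bytes) -> bytes:
    """Unmodified HASH-ASL1-256 (SHA-256)."""
    return hashlib.sha256(data).digest()


def hash_with_domain(domain_tag: bytes, payload: bytes) -> bytes:
    """Domain separation at the input level (hypothetical framing)."""
    # Length-prefixing the tag keeps (tag, payload) pairs unambiguous.
    framed = len(domain_tag).to_bytes(2, "big") + domain_tag + payload
    return hash_asl1_256(framed)


# The hash function itself still matches plain SHA-256 on the framed bytes.
assert hash_with_domain(b"", b"x") == hashlib.sha256(b"\x00\x00x").digest()
```

Because `hash_asl1_256` never changes, `HashId = 0x0001` keeps its meaning regardless of how higher layers frame their inputs.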

### 6.2 Adding new algorithms

A new `"ASL1"` algorithm MAY be added in a future `HASH/ASL1` version if and only if:

* it satisfies the family properties in §3.1;

* it has a fixed digest length `L > 0` bytes;

* its spec includes:

  * assigned `HashId`,
  * digest length,
  * normative algorithm definition (via external standard or full spec),
  * status (`MANDATORY`, `RECOMMENDED`, `OPTIONAL`, `EXPERIMENTAL`);

* it is introduced via:

  * a new `HASH/ASL1` version,
  * at least one ADR,
  * published test vectors.

Existing `HashId` assignments MUST NOT be repurposed.

### 6.3 Coexistence and migration (informative)

Higher layers can take advantage of the crypto agility of the `"ASL1"` family by:

* computing more than one `Reference` for the same Artifact (multi-hash),
* storing those in receipts, overlays, or catalogs,
* defining profile-specific policies like:

  * “from date D, compute both `HASH-ASL1-256` and `HASH-ASL1-PQ1` for all new Artifacts; prefer `0x8001` for new dependencies” (assuming a future revision has defined `HASH-ASL1-PQ1`).

`HASH/ASL1` itself:

* does not prescribe when to migrate,
* only guarantees that `HashId` mappings and algorithms are stable.

---

## 7. Conformance

An implementation is **HASH/ASL1–conformant** (v0.2.4) if:

1. **Correct HASH-ASL1-256 implementation**

   * Provides a `HASH-ASL1-256` function:

     * accepts arbitrary-length `OctetString` input,
     * returns a 32-byte `OctetString` digest,

   * matches SHA-256 exactly for all inputs,

   * behaves deterministically and supports incremental operation.

2. **Consistent Reference use with ENC/ASL1-CORE v1**

   * When encoding `ReferenceBytes`, emits:

     * `hash_id` as `u16`,
     * digest bytes equal in length to the algorithm’s canonical digest length.

   * When decoding `ReferenceBytes`:

     * for known `hash_id` values, enforces `len(digest) == canonical_digest_length(hash_id)` and treats mismatches as errors;
     * for unknown `hash_id` values, MAY accept `Reference` structurally but MUST treat the algorithm as unsupported for verification.

3. **Registry immutability**

   * Does not change the meaning of any assigned `HashId`,
   * Does not use reserved IDs as custom algorithms outside the formal registry process.

4. **Family compliance for extra algorithms**

   * For any additional `"ASL1"` algorithms claimed:

     * ensures they satisfy §3.1,
     * documents their digest length and behavior.

5. **Integration with ASL/1-CORE**

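   * Uses `ASL/CORE-REF-DERIVE/1` when deriving References in the ASL/1 context.
   * For Artifacts encoded under `ASL_ENC_CORE_V1`, derives identity-critical References with `HASH-ASL1-256` (`hash_id = 0x0001`) unless a profile explicitly opts out (§4.2.2). A profile that opts out MUST use a different `HashId`, because `0x0001` always denotes `HASH-ASL1-256` (§6.1).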

---

## 8. Security Considerations

1. **Collision risk**

   * Collisions in `HASH-ASL1-256` would be a severe substrate-level integrity issue for systems that rely only on `HashId = 0x0001`.
   * Higher layers (CIL/1, FCT/1, FER/1, OI/1, TGK/PROV-style profiles) SHOULD:

     * assume collisions are possible in principle,
     * provide detection and mitigation strategies (e.g. optional dual-hash, anomaly logging).

2. **Algorithm deprecation**

   * If `HASH-ASL1-256` is found to be cryptographically weak:

     * future specs MAY introduce a new mandatory algorithm,
     * migration strategies SHOULD be defined at profile / domain layers.

   * Existing References with `HashId = 0x0001` remain valid as historical IDs; their meaning MUST NOT be changed.

3. **Side-channel resistance**

   * Implementations SHOULD mitigate timing/cache/power side channels, especially in shared environments.
   * Prefer well-reviewed cryptographic libraries over custom implementations where possible.

4. **Non-ASL1 hash usage**

   * Systems MAY use other hash functions (e.g., for local caches, external APIs).
   * Such functions MUST NOT reuse `HashId`s defined in this registry for `Reference.hash_id`.
   * They MUST be clearly separated from ASL/1 identity semantics.

---

## 9. Example (Non-Normative)

Given:

* `EncodingProfileId = ASL_ENC_CORE_V1 (0x0001)`,
* algorithm `HASH-ASL1-256` (`HashId = 0x0001`),
* Artifact:

  ```text
  Artifact {
    bytes    = 0xDE AD
    type_tag = none
  }
  ```

Assume `ENC/ASL1-CORE v1` canonical Artifact encoding:

```text
00                 ; has_type_tag = false
0000000000000002   ; bytes_len = 2 (u64)
DEAD               ; bytes
```

Then:

1. `ArtifactBytes = encode_artifact_core_v1(Artifact)`.
2. `digest = HASH-ASL1-256(ArtifactBytes)` (SHA-256).
3. `Reference = { hash_id = 0x0001, digest = digest }`.
4. `ReferenceBytes` under `ENC/ASL1-CORE v1`:

   ```text
   0001 <32 bytes of digest>
   ```

The frame boundary (e.g., length prefix or message boundary) determines where the digest ends. A consumer that knows `hash_id = 0x0001` and implements HASH/ASL1 will:

* expect exactly 32 digest bytes,
* treat any other length as an error.

This `Reference` can be used consistently across `ASL/1-STORE`, `TGK/1-CORE`, `PEL/1`, `CIL/1`, `FER/1`, `FCT/1`, `OI/1`, with equality defined by `ASL/1-CORE`.
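
The steps above can be reproduced with the following sketch. It follows the Artifact encoding assumed in this example (a one-byte flag, a big-endian `u64` length, then the bytes) and should not be read as a full `ENC/ASL1-CORE v1` encoder. The helper name `encode_untagged_artifact` is hypothetical.

```python
import hashlib
import struct


def encode_untagged_artifact(payload: bytes) -> bytes:
    """Encode an Artifact without a type tag, per the layout assumed above."""
    has_type_tag = b"\x00"                     # has_type_tag = false
    bytes_len = struct.pack(">Q", len(payload))  # u64 length, big-endian
    return has_type_tag + bytes_len + payload


artifact_bytes = encode_untagged_artifact(bytes.fromhex("DEAD"))
assert artifact_bytes.hex() == "00" + "0000000000000002" + "dead"

digest = hashlib.sha256(artifact_bytes).digest()  # HASH-ASL1-256
reference_bytes = struct.pack(">H", 0x0001) + digest

# 2 bytes of hash_id followed by exactly 32 digest bytes.
assert len(reference_bytes) == 34
print(reference_bytes.hex())
```

Running the script prints the hexadecimal `ReferenceBytes`, which begin with `0001`.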

---

## Document History

* **0.2.4 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.