# ASL/1-CORE — Artifact Substrate Layer (Core)
Status: Approved
Owner: Niklas Rydberg
Version: 0.4.1
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, binary-minimalism]
<!-- Source: /amduat/docs/new/asl-core.md | Canonical: /amduat/tier1/asl-1-core.md -->
**Document ID:** `ASL/1-CORE`
**Layer:** L0 — Pure logical value model (no persistence / execution semantics)
**Depends on (normative):**
* None (foundational model)
**Informative references:**
* `ENC/ASL1-CORE v1.x` — canonical encoding profile (`ASL_ENC_CORE_V1`)
* `HASH/ASL1 v0.2.2` — ASL1 hash family and `HashId` assignments
* `ASL/1-STORE v0.4.0` — content-addressable store over ASL/1-CORE
* `TGK/1-CORE v0.7.0` — trace graph kernel over `Reference`
* `PEL/1` — execution substrate
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/1-CORE defines **only logical values and their equality**.
It does **not** define storage formats, protocols, runtime APIs, or policy.
Primitive logical types:
* `OctetString`: finite sequence of bytes, each in the range `0x00..0xFF`.
* `uint16`, `uint32`, `uint64`: fixed-width unsigned integers.
Binary layout, endianness, and on-wire representation come from **encoding profiles**, not from ASL/1-CORE itself.
---
## 1. Purpose & Non-Goals
### 1.1 Purpose
`ASL/1-CORE` defines the **artifact substrate** for Amduat 2.0:
* what an **Artifact** is,
* what a **TypeTag** is,
* what a **Reference** is, and
* how content-addressed identity is defined via canonical encodings and hash functions.
It aims to keep computing **sane**, meaning predictable and reproducible, by enforcing that:
* content and type hints are explicit,
* identity is precise and field-based,
* logical values are immutable,
* all higher behavior (store, execution, provenance, policy) is layered on top.
All other Amduat layers — STORE, PEL, CIL, FCT, FER, OI, TGK — must respect and build on this substrate.
### 1.2 Non-goals
ASL/1-CORE explicitly does **not** define:
* Store APIs or persistence guarantees.
* Execution runtimes, scheduling, or side-effects.
* Certificates, trust semantics, or authorization.
* Networks, transports, or wire formats.
* Compression, chunking, encryption, or indexing.
Those are defined by `ASL/1-STORE`, `PEL/1`, `CIL/1`, `FCT/1`, `FER/1`, `OI/1`, `TGK/1-CORE`, and other profiles.
---
## 2. Core Value Model
### 2.1 OctetString
```text
OctetString = finite sequence of 8-bit bytes (0x00..0xFF)
```
ASL/1-CORE does not assign any structure (e.g., text vs binary) to `OctetString`.
Structure, if any, is introduced by higher-layer semantics keyed off `TypeTag`.
---
### 2.2 TypeTag
A `TypeTag` identifies how higher layers intend to interpret an Artifact's bytes.
```text
TypeTag {
  tag_id: uint32
}
```
Properties:
* `tag_id` is opaque at this layer.
* No particular `tag_id` semantics are defined here.
* `tag_id` participates in identity: changing the tag yields a different Artifact.
#### 2.2.1 Tag ranges (conventions only)
By convention (non-normative here):
* `0x00000000..0x0FFFFFFF`: core stack / shared profiles.
* `0x10000000..0xFFFFFFFF`: extension / domain-specific tags.
Concrete registries and governance of `tag_id` live in separate documents.
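
As a quick illustration of the conventional split above, the following Python sketch classifies a `tag_id` by range. The helper name `classify_tag_id` and the labels `"core"` and `"extension"` are hypothetical and not part of any registry. Because the ranges are conventions only, nothing at this layer should depend on the result.

```python
# Conventional (non-normative) tag_id ranges from section 2.2.1.
CORE_RANGE = range(0x0000_0000, 0x1000_0000)         # 0x00000000..0x0FFFFFFF
EXTENSION_RANGE = range(0x1000_0000, 0x1_0000_0000)  # 0x10000000..0xFFFFFFFF


def classify_tag_id(tag_id: int) -> str:
    """Hypothetical helper: label a uint32 tag_id by its conventional range."""
    if tag_id in CORE_RANGE:
        return "core"
    if tag_id in EXTENSION_RANGE:
        return "extension"
    raise ValueError("tag_id must fit in uint32")
```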
---
### 2.3 Artifact
An **Artifact** is the fundamental immutable value in ASL/1:
```text
Artifact {
  bytes: OctetString
  type_tag: optional TypeTag
}
```
Properties:
* Immutable logical value.
* Two identity-sensitive dimensions:
  * `bytes`: exact content bytes.
  * `type_tag`: presence + `tag_id` if present.
> **ASL/CORE-ART-EQ/1**
> Two Artifacts `A` and `B` are identical in ASL/1-CORE iff:
>
> * `A.bytes` and `B.bytes` are byte-for-byte equal; and
> * either both have no `type_tag`, or both have a `type_tag` and `A.type_tag.tag_id == B.type_tag.tag_id`.
No encoding profile, store, or runtime may alter this equality.
> **ASL/CORE-IMMUT/1**
> Once an Artifact value is created, it is considered immutable. Any change to `bytes` or `type_tag` produces a **different** Artifact.
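
One way to model ASL/CORE-ART-EQ/1 and ASL/CORE-IMMUT/1 in Python is a pair of frozen dataclasses, sketched below. This is an illustration rather than a reference implementation: the field name `data` stands in for the spec's `bytes` field (to avoid shadowing the Python builtin), and `with_type_tag` is a hypothetical helper.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TypeTag:
    tag_id: int  # uint32, opaque at this layer


@dataclass(frozen=True)
class Artifact:
    data: bytes                         # the spec's `bytes` field
    type_tag: Optional[TypeTag] = None  # absence is itself part of identity


def with_type_tag(artifact: Artifact, tag_id: int) -> Artifact:
    """Return a new, different Artifact; the original is left untouched."""
    return replace(artifact, type_tag=TypeTag(tag_id))


untagged = Artifact(b"hello")
tagged = with_type_tag(untagged, 0x0000_0001)

assert untagged == Artifact(b"hello")                   # same bytes, both untagged
assert untagged != tagged                               # tag presence differs
assert tagged != with_type_tag(untagged, 0x0000_0002)   # tag_id differs

try:
    untagged.data = b"changed"  # in-place mutation is rejected
except FrozenInstanceError:
    pass
```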
---
### 2.4 HashId
```text
HashId = uint16
```
A `HashId` identifies a particular hash algorithm in a given family.
ASL/1-CORE itself is hash-family agnostic. The Amduat 2.0 core stack uses the "ASL1" family defined in HASH/ASL1 as the canonical family for identity-critical References.
---
### 2.5 Reference
A **Reference** is a content address for an Artifact:
```text
Reference {
  hash_id: HashId
  digest: OctetString
}
```
Interpretation:
* `hash_id` selects a hash algorithm (e.g. `HASH-ASL1-256`).
* `digest` is that algorithm's digest of a canonical encoding of some Artifact.
> **ASL/CORE-REF-EQ/1**
> Two References `R1` and `R2` are identical iff:
>
> * `R1.hash_id == R2.hash_id`, and
> * `R1.digest` and `R2.digest` are byte-for-byte equal.
No cross-`hash_id` equivalence is defined at this layer. If two different `(hash_id, digest)` pairs refer to Artifacts that happen to be “the same” in some application sense, that is strictly a higher-layer interpretation.
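
ASL/CORE-REF-EQ/1 can be sketched in Python as plain field-wise equality on a frozen dataclass. The digests below are placeholder bytes, not real hash outputs, and `0x0002` is a hypothetical `HashId` used only to show that the same digest bytes under a different `hash_id` form a different Reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    hash_id: int   # HashId (uint16)
    digest: bytes  # compared byte-for-byte


digest = bytes.fromhex("00" * 32)  # placeholder digest, not a real hash output

r1 = Reference(hash_id=0x0001, digest=digest)
r2 = Reference(hash_id=0x0001, digest=bytes(32))
r3 = Reference(hash_id=0x0002, digest=digest)  # hypothetical second HashId

assert r1 == r2                 # same hash_id, same digest bytes
assert r1 != r3                 # same digest bytes, different hash_id
assert len({r1, r2, r3}) == 2   # frozen values are usable as set/dict keys
```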
---
## 3. Encoding Profiles
ASL/1-CORE separates logical values from concrete encodings via **encoding profiles**.
### 3.1 EncodingProfileId
```text
EncodingProfileId = uint16
```
Each encoding profile (e.g. `ASL_ENC_CORE_V1`) is defined in its own document and specifies:
* canonical `ArtifactBytes` encodings;
* optionally `ReferenceBytes` encodings;
* invariants required to preserve ASL/1-CORE identity.
The baseline encoding profile in Amduat 2.0 is:
* `ASL_ENC_CORE_V1 = 0x0001` — defined in `ENC/ASL1-CORE v1.x`.
### 3.2 Profile requirements
Any encoding profile used with ASL/1-CORE MUST satisfy:
1. **Identity preservation**
   For all Artifacts `A` and `B`:
   * `A` and `B` are identical under ASL/CORE-ART-EQ/1
     ⇔ their canonical encodings under that profile are bit-identical.
2. **Injectivity**
   Distinct Artifacts MUST NOT produce identical canonical encodings.
3. **Stability and determinism**
   For any Artifact, canonical encoding:
   * MUST be stable across time and implementations,
   * MUST NOT depend on environment, clock, locale, or configuration.
4. **Explicit structure**
   Field ordering and numeric formats MUST be fixed and unambiguous.
5. **Byte transparency**
   `Artifact.bytes` MUST be encoded exactly as-is (no hidden transcoding).
6. **Streaming-friendliness**
   Canonical encodings MUST be producible and consumable in a single forward-only pass.
Encoding profiles MAY impose extra constraints (e.g. on particular `TypeTag` subsets) but MUST NOT break the above.
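
To make these requirements concrete, here is a toy encoder in Python. Its byte layout is invented for this example and is **not** the `ASL_ENC_CORE_V1` layout, which is defined in `ENC/ASL1-CORE`; the names `toy_encode` and `toy_decode` are hypothetical. The sketch uses fixed-width big-endian fields (explicit structure), copies the payload unchanged (byte transparency), reads only its arguments (determinism), and has a total inverse, which is one way to see that it is injective.

```python
import struct
from typing import Optional, Tuple


def toy_encode(data: bytes, tag_id: Optional[int] = None) -> bytes:
    """Hypothetical layout: presence byte, optional uint32 tag, uint64 length, raw bytes."""
    if tag_id is None:
        header = b"\x00"
    else:
        header = b"\x01" + struct.pack(">I", tag_id)  # fixed width, big-endian
    return header + struct.pack(">Q", len(data)) + data  # payload copied as-is


def toy_decode(encoded: bytes) -> Tuple[bytes, Optional[int]]:
    """Inverse of toy_encode; an encoder with a total inverse cannot merge two inputs."""
    flag = encoded[0]
    if flag == 0:
        tag_id, offset = None, 1
    elif flag == 1:
        (tag_id,) = struct.unpack_from(">I", encoded, 1)
        offset = 5
    else:
        raise ValueError("unknown presence byte")
    (length,) = struct.unpack_from(">Q", encoded, offset)
    data = encoded[offset + 8:]
    if len(data) != length:
        raise ValueError("length mismatch")
    return data, tag_id


# An absent tag and tag_id 0 must encode differently, as identity requires.
assert toy_encode(b"", None) != toy_encode(b"", 0)
assert toy_decode(toy_encode(b"payload", 7)) == (b"payload", 7)
```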
---
## 4. Hashing and Reference Derivation
ASL/1-CORE defines how canonical encodings and hash functions combine to produce References.
### 4.1 Canonical encoding step
Given:
* Artifact `A`,
* encoding profile `P` with canonical encoder `encode_P(A) -> ArtifactBytes`,
the canonical encoding of `A` under `P` is `encode_P(A)`, where `encode_P` MUST satisfy §3.2.
### 4.2 Reference derivation rule
Given:
* Artifact `A`,
* encoding profile `P`,
* hash algorithm `H` with:
  * `HashId = HID`,
  * fixed digest length `L` bytes,
then the Reference `R` for `A` under `(P, H)` is:
```text
ArtifactBytes = encode_P(A)
digest = H(ArtifactBytes)
Reference = { hash_id = HID, digest = digest }
```
> **ASL/CORE-REF-DERIVE/1**
> Any component that claims to derive References from Artifacts for a given `(EncodingProfileId, HashId)` **MUST** use this exact procedure.
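
The three steps of the derivation rule map directly onto code. The Python sketch below takes the encoder and the hash function as parameters, mirroring the `(P, H)` parameterization. The stand-ins are assumptions: `identity_encode` is not a real encoding profile (it ignores `type_tag` entirely), and SHA-256 is used only as a familiar example hash; this document does not state which algorithm any `HashId` denotes.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


def derive_reference(
    artifact: A,
    encode: Callable[[A], bytes],       # encode_P for profile P
    hash_fn: Callable[[bytes], bytes],  # H
    hash_id: int,                       # HID
) -> Reference:
    artifact_bytes = encode(artifact)   # ArtifactBytes = encode_P(A)
    digest = hash_fn(artifact_bytes)    # digest = H(ArtifactBytes)
    return Reference(hash_id=hash_id, digest=digest)


# Stand-ins (assumptions): a trivial "encoder" over raw bytes, and SHA-256 as H.
def identity_encode(raw: bytes) -> bytes:
    return raw


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# The HashId value here is a placeholder for the example.
ref = derive_reference(b"hello", identity_encode, sha256, hash_id=0x0001)
assert ref == derive_reference(b"hello", identity_encode, sha256, hash_id=0x0001)
```

Keeping `encode` and `hash_fn` as explicit arguments makes it hard to hash anything other than the canonical encoding by accident.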
### 4.3 Deterministic agreement lemma (informative)
If two conformant implementations share:
* the same encoding profile `P`, and
* the same hash algorithm `H` with `HashId = HID`,
then for any Artifact `A`:
* both will compute identical `ArtifactBytes`,
* both will compute identical `digest = H(ArtifactBytes)`,
* both will form identical `Reference {hash_id = HID, digest = digest}`.
This is the basis for cross-Store and cross-domain determinism in Amduat.
### 4.4 Canonical family for Amduat 2.0 (informative)
While ASL/1-CORE is conceptually family-agnostic, the **Amduat 2.0 substrate** standardizes:
* `ASL_ENC_CORE_V1` as the canonical Artifact encoding profile;
* `HASH-ASL1-256` (`HashId = 0x0001`) as the canonical default hash algorithm for identity-critical surfaces.
Other `(EncodingProfileId, HashId)` pairs are allowed but must be explicitly declared by the consuming profile or implementation.
### 4.5 Crypto agility
ASL/1-CORE supports evolution by:
* delegating algorithm definitions and `HashId` assignments to `HASH/ASL1`;
* delegating binary encodings to `ENC/*` profiles.
Higher layers MAY:
* compute multiple References for the same Artifact (multi-hash, multi-encoding),
* define migration policies,
* mark some References as “preferred” or “legacy”.
ASL/1-CORE itself:
* treats References as opaque `(hash_id, digest)` pairs;
* does not specify any relationship between different References to “the same” Artifact; the only relation it defines is equality of `(hash_id, digest)` pairs (ASL/CORE-REF-EQ/1).
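
A higher layer that wants multi-hash References might do something like the following Python sketch. The registry contents are assumptions made for the example: the pairing of `0x0001` with SHA-256 and `0x0002` with SHA-512 is hypothetical, since real assignments live in `HASH/ASL1`. At this layer the two results are simply two distinct References.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical HashId -> algorithm table; real assignments come from HASH/ASL1.
HASH_REGISTRY: Dict[int, Callable[[bytes], bytes]] = {
    0x0001: lambda b: hashlib.sha256(b).digest(),
    0x0002: lambda b: hashlib.sha512(b).digest(),
}


def derive_all(artifact_bytes: bytes) -> List[Tuple[int, bytes]]:
    """Derive one (hash_id, digest) pair per registered algorithm, in HashId order."""
    return [(hid, fn(artifact_bytes)) for hid, fn in sorted(HASH_REGISTRY.items())]


refs = derive_all(b"already-canonical-artifact-bytes")
assert refs[0] != refs[1]  # distinct References, even for the same Artifact
```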
---
## 5. Logical vs Physical Representation
### 5.1 Logical-only substrate
Artifacts and References are **logical values**.
ASL/1-CORE:
* does not care where or how they are stored;
* does not care how they are transported;
* does not assume any particular API shape.
### 5.2 Internal representation freedom
Implementations MAY represent values as:
* structs,
* slices,
* memory-mapped buffers,
* immutable trees,
* or any other structure,
so long as they can:
* emit canonical encodings for supported profiles,
* compute hashes correctly,
* respect ASL/1-CORE identity semantics.
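
As a small illustration of representation freedom, the Python sketch below hashes the same canonical bytes held in two different internal forms, one contiguous buffer and one list of chunks, and checks that both yield the same digest. SHA-256 and the function names are assumptions for the example, and the input stands for already-encoded `ArtifactBytes`; the encoding step is omitted.

```python
import hashlib
from typing import Iterable


def digest_contiguous(artifact_bytes: bytes) -> bytes:
    """Hash a single in-memory buffer in one call."""
    return hashlib.sha256(artifact_bytes).digest()


def digest_chunked(chunks: Iterable[bytes]) -> bytes:
    """Hash the same logical bytes fed incrementally, in one forward pass."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.digest()


whole = b"canonical-encoding-of-some-artifact"
pieces = [whole[:9], whole[9:20], whole[20:]]
assert digest_contiguous(whole) == digest_chunked(pieces)
```

The chunk boundaries are an internal detail; only the concatenated byte sequence affects the digest.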
### 5.3 Passing values between layers
Passing an `Artifact` or `Reference` between components means passing a **value**, not a mutable object.
Implementations:
* MAY share underlying buffers internally,
* MUST treat the logical value as immutable,
* MUST NOT let in-place mutation change a value that has already been observed as an Artifact or Reference.
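
In a language with mutable buffers, one simple way to honor the last rule is to copy on construction. The Python sketch below is a minimal illustration under that assumption; the field name `data` stands in for the spec's `bytes` field, and implementations that share buffers internally would need a different mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    data: bytes  # the spec's `bytes` field

    def __post_init__(self) -> None:
        # bytes(...) copies a mutable buffer (bytearray, memoryview) into an
        # immutable object, so later writes to the caller's buffer are not seen.
        object.__setattr__(self, "data", bytes(self.data))


buffer = bytearray(b"abc")
artifact = Artifact(buffer)
buffer[0] = ord("z")            # caller mutates its own buffer afterwards
assert artifact.data == b"abc"  # the observed Artifact value is unchanged
```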
---
## 6. Identity, Equality, and Collisions
### 6.1 Artifact identity
Restating for emphasis:
> **ASL/CORE-ART-ID/1**
> Artifact identity is purely field-based:
>
> * `bytes` equality + `type_tag` presence + `tag_id` equality (if present).
Encoding profiles and hash functions MUST preserve this identity; they MUST NOT introduce alternative notions of “the same artifact” at this layer.
### 6.2 Reference identity
> **ASL/CORE-REF-ID/1**
> Reference identity is purely:
>
> * `hash_id` equality + `digest` byte equality.
Different `(hash_id, digest)` pairs are always distinct References, even if they logically point to the same underlying Artifact as understood by some higher layer.
### 6.3 Collision assumptions
ASL/1-CORE assumes the configured hash algorithms are **cryptographically strong**: collisions are treated as extraordinary substrate failures, not supported behavior.
If two distinct Artifacts produce the same `(hash_id, digest)`:
* ASL/1-CORE itself does not define remediation;
* `ASL/1-STORE` is responsible for surfacing this as an integrity error;
* higher profiles (e.g. CIL/1, FCT/1) MAY define detection and response strategies.
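
For illustration only, the Python sketch below shows what surfacing a collision as an integrity error could look like in a store. `ToyStore`, `IntegrityError`, and `weak_hash` are hypothetical names and do not reflect the `ASL/1-STORE` API. The one-byte "hash" is deliberately broken so that a collision is easy to trigger, and `0xFFFF` is a made-up `HashId`.

```python
from typing import Callable, Dict, Tuple

Ref = Tuple[int, bytes]  # (hash_id, digest)


class IntegrityError(Exception):
    """Raised when two distinct encodings map to the same Reference."""


class ToyStore:
    def __init__(self, hash_id: int, hash_fn: Callable[[bytes], bytes]) -> None:
        self._hash_id = hash_id
        self._hash_fn = hash_fn
        self._items: Dict[Ref, bytes] = {}

    def put(self, artifact_bytes: bytes) -> Ref:
        ref = (self._hash_id, self._hash_fn(artifact_bytes))
        # Keep the first value stored under this Reference and compare against it.
        existing = self._items.setdefault(ref, artifact_bytes)
        if existing != artifact_bytes:
            raise IntegrityError(f"collision under hash_id {self._hash_id:#06x}")
        return ref


def weak_hash(data: bytes) -> bytes:
    """Deliberately broken 1-byte 'hash': byte sum modulo 256."""
    return bytes([sum(data) % 256])


store = ToyStore(hash_id=0xFFFF, hash_fn=weak_hash)
store.put(b"ab")
store.put(b"ab")      # idempotent: same bytes, same Reference
try:
    store.put(b"ba")  # different bytes, same weak digest
except IntegrityError as err:
    print("integrity error:", err)
```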
---
## 7. Relationship to Other Layers (Informative)
### 7.1 ASL/1-STORE
`ASL/1-STORE`:
* models StoreInstances as partial mappings `Reference -> Artifact`,
* parameterizes each StoreInstance by a single `StoreConfig = {encoding_profile, hash_id}`,
* uses ASL/CORE-REF-DERIVE/1 to compute References in `put`,
* respects ASL/CORE-ART-ID/1 and ASL/CORE-REF-ID/1.
STORE adds persistence, error semantics, and StoreConfig; it does not change the core value model.
### 7.2 TGK/1-CORE
`TGK/1-CORE`:
* treats `Reference` as graph nodes,
* treats specific Artifacts (EdgeArtifacts) as encodings of graph edges,
* defines a ProvenanceGraph as a projection over Artifacts and configured profiles.
TGK relies on ASL/1-CORE to ensure:
* Artifacts are immutable,
* References are stable and deterministic across implementations,
* all provenance evidence is expressed as Artifacts and References.
### 7.3 PEL/1, CIL/1, FCT/1, FER/1, OI/1
These layers:
* allocate specific `TypeTag` ranges and schemas,
* encode programs, execution traces, certificates, facts, overlays as Artifacts,
* use References consistently via ASL/CORE-REF-DERIVE/1,
* may store those values in ASL/1-STORE and expose them through TGK.
They must not override or reinterpret ASL/1-CORE equality; they build on it.
---
## 8. Conformance
An implementation is **ASL/1-CORE-conformant** if it:
1. **Implements the value types**
   * Provides logical structures for `Artifact`, `TypeTag`, and `Reference` with at least the fields described in §2.
2. **Respects Artifact and Reference equality**
   * Implements identity exactly as in ASL/CORE-ART-EQ/1 and ASL/CORE-REF-EQ/1 (and the derived ID invariants).
3. **Uses encoding profiles appropriately**
   * Uses only encoding profiles that satisfy §3.2.
   * For any encoding profile it claims to support, can produce canonical encodings for all Artifacts.
4. **Derives References correctly**
   * Derives References strictly according to ASL/CORE-REF-DERIVE/1 for the declared `(EncodingProfileId, HashId)` pair.
5. **Enforces immutability**
   * Treats Artifacts and References as immutable logical values.
   * Does not leak any mechanism that would let a consumer mutate an Artifact or Reference “in place”.
6. **Maintains separation of concerns**
   * Does not embed storage, execution, policy, or graph semantics into ASL/1-CORE constructs.
   * Leaves stores, execution engines, and graph kernels to their respective layers.
Everything else — API design, transport formats, performance characteristics, deployment topology — lies outside ASL/1-CORE and MUST be specified by separate surfaces.
---
## Document History
* **0.4.1 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.