Adding TGK specifications.

Carl Niklas Rydberg 2025-12-20 11:32:17 +01:00
parent 1e88925ece
commit 47e3ccc382
5 changed files with 4644 additions and 0 deletions

738
tier1/enc-tgk1-edge-1.md Normal file

@@ -0,0 +1,738 @@
# ENC/TGK1-EDGE/1 — Canonical Encoding for TGK EdgeArtifacts
Status: Approved
Owner: Niklas Rydberg
Version: 0.1.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [binary-minimalism, traceability]
<!-- Source: /amduat/docs/new/enc-tgk1-edge1.md | Canonical: /amduat/tier1/enc-tgk1-edge-1.md -->
**Document ID:** `ENC/TGK1-EDGE/1`
**Profile ID:** `TGK1_EDGE_ENC_V1 = 0x0201` (symbolic; concrete assignment lives in encoding-profile registry)
**Layer:** Edge Encoding Profile (on top of ASL/1-CORE + TGK/1-CORE)
**Depends on (normative):**
* `ASL/1-CORE v0.4.x` — value model (`Artifact`, `TypeTag`, `Reference`, `HashId`, identity model)
* `ENC/ASL1-CORE v1.x` — canonical encodings for `Artifact` and `Reference`
* `TGK/1-CORE v0.7.x` — trace graph kernel (`Node`, `EdgeBody`, `EdgeTypeId`, edgehood invariants)
**Integrates with (informative):**
* `HASH/ASL1 v0.2.x` — ASL1 hash family for `EdgeRef` identity
* `ASL/1-STORE v0.4.x` — content-addressable store holding EdgeArtifacts
* `SUBSTRATE/STACK-OVERVIEW v0.2.x` — stack layering discipline
* TGK type catalogs (e.g. `TGK/TYPES-CORE`) — `EdgeTypeId` semantics
* Future TGK profiles (`TGK/STORE/1`, `TGK/PROV/1`) that interpret edges
> The Profile ID `TGK1_EDGE_ENC_V1` is a configuration label.
> It is **not** embedded into edge payloads. Encoders and decoders select this encoding by context (type tag + profile configuration), not per value.
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Overview
`ENC/TGK1-EDGE/1` defines the **canonical, streaming-friendly, injective binary encoding** of the `EdgeBody` structure from `TGK/1-CORE`:
```text
EdgeBody {
  type:    EdgeTypeId   // uint32
  from:    Node[]       // Node = Reference
  to:      Node[]
  payload: Reference
}
```
and its embedding as TGK **EdgeArtifacts**:
```text
Artifact {
  bytes    = EdgeBytes                // this profile
  type_tag = TYPE_TAG_TGK1_EDGE_V1
}
```
where `EdgeBytes` is a single `OctetString` (sequence of bytes) used as `Artifact.bytes`.
Under this profile:
* `EdgeBytes` is the canonical representation of an `EdgeBody`.
* Edge identity is the ASL/1 `Reference` over the EdgeArtifact (`EdgeRef`), derived via `ENC/ASL1-CORE` + `HASH/ASL1`.
* The encoding is:
  * **Injective** — distinct `EdgeBody` values → distinct `EdgeBytes`.
  * **Deterministic & stable** — same `EdgeBody` → same `EdgeBytes` across implementations and time.
  * **Streaming-friendly** — encoders, decoders, and hashers can operate in a single forward-only pass.
In line with `TGK/1-CORE`:
* Each EdgeArtifact encodes **exactly one** logical edge (one `EdgeBody`).
* All TGK edges are represented as ordinary ASL/1 Artifacts plus their ASL `Reference` identities; this profile introduces no additional identity or node/edge ID layer.
> **Non-goal:** This profile does **not** define what any particular `EdgeTypeId` “means”, nor how graphs are stored, indexed, or traversed. Those behaviors are defined by `TGK/1-CORE`, TGK type catalogs, and higher-layer profiles.
---
## 1. Scope & Layering
### 1.1 Purpose
This specification defines:
* The **binary layout** of:
  * `EdgeBytes` — canonical encoding for `EdgeBody`.
  * `EncodedRef` — an internal wrapper for embedding ASL `Reference`s.
* Canonical field ordering and integer widths.
* How `EdgeBytes` are bound into EdgeArtifacts and converted into `EdgeRef` identity.
It does **not** define:
* TGK graph semantics or provenance algorithms (`TGK/1-CORE`, `TGK/PROV/1`).
* Store or transport APIs (`ASL/1-STORE`, deployment profiles).
* Edge-type catalogs (`TGK/TYPES-*`) or policy.
### 1.2 Layering constraints
In line with `SUBSTRATE/STACK-OVERVIEW` and `TGK/1-CORE`:
* `ENC/TGK1-EDGE/1` is a **TGK edge-encoding profile**, not a kernel primitive.
* It MUST NOT:
  * redefine `Artifact`, `Reference`, `HashId`, or `TypeTag` (from `ASL/1-CORE`);
  * redefine `Node`, `EdgeBody`, or `EdgeTypeId` (from `TGK/1-CORE`);
  * embed store, provenance, or policy semantics into its layout.
* It defines exactly one canonical encoding for `EdgeBody` values under the profile ID `TGK1_EDGE_ENC_V1`.
TGK/1-CORE sees this profile as providing a partial function:
```text
decode_edge_payload_TGK1_EDGE :
OctetString -> EdgeBody | error
```
that is:
* **partial** — may fail with an error for some inputs;
* **deterministic** — a pure function of its input bytes, with no dependence on environment or mutable state;
* **side-effect free** — decoding does not consult stores, catalogs, or policy.
Artifacts whose `type_tag` selects this profile use `decode_edge_payload_TGK1_EDGE` as their TGK edge decoder in the sense of `TGK/1-CORE §3.2`.
---
## 2. Conventions
### 2.1 RFC 2119 terms
The key words **MUST**, **MUST NOT**, **SHOULD**, **MAY**, etc. are to be interpreted as described in RFC 2119.
### 2.2 Integer encodings
All multi-byte integers are encoded as **big-endian** (network byte order), as in `ENC/ASL1-CORE`:
* `u8` — 1 byte
* `u16` — 2 bytes
* `u32` — 4 bytes
* `u64` — 8 bytes
Only fixed-width integers are used.
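
As a minimal sketch of the fixed-width big-endian integers above, Python's standard `struct` module can produce and read them. The helper names `put_u16`, `put_u32`, and `get_u32` are illustrative assumptions and are not defined by this specification.

```python
import struct

def put_u16(value: int) -> bytes:
    """Encode an unsigned 16-bit integer in big-endian order (2 bytes)."""
    return struct.pack(">H", value)

def put_u32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer in big-endian order (4 bytes)."""
    return struct.pack(">I", value)

def get_u32(buf: bytes, offset: int = 0) -> int:
    """Read a big-endian u32 at `offset`; raises struct.error if truncated."""
    (value,) = struct.unpack_from(">I", buf, offset)
    return value
```

`struct.pack` raises `struct.error` for out-of-range values, so a width violation fails at encode time instead of being silently truncated.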
### 2.3 Lists
A list of values of some type `T` is encoded as:
```text
List<T> ::
count (u32)
element_0
element_1
...
element_{count-1}
```
* `count` is the number of elements (MAY be zero).
* Elements are encoded in order using the canonical encoding of `T`.
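
The list framing above can be sketched as follows. `encode_list` and its `encode_item` parameter are hypothetical names chosen for illustration; the only behaviour taken from the text is the `u32` count followed by the elements in order.

```python
import struct
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def encode_list(items: Iterable[T], encode_item: Callable[[T], bytes]) -> bytes:
    """Encode List<T>: a big-endian u32 count, then each element in order."""
    encoded = [encode_item(item) for item in items]
    return struct.pack(">I", len(encoded)) + b"".join(encoded)
```

An empty list is encoded as the four-byte count `00000000` with no element bytes after it.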
### 2.4 Embedded Reference (`EncodedRef`)
Within `EdgeBytes`, ASL/1 `Reference` values are embedded using a length-prefixed wrapper over canonical `ReferenceBytes` from `ENC/ASL1-CORE`:
```text
EncodedRef ::
ref_len (u32)
ref_bytes (byte[0..ref_len-1]) // canonical ReferenceBytes
```
Where:
* `ref_bytes` MUST be the canonical `ReferenceBytes` encoding of some `Reference` value under `ENC/ASL1-CORE v1.x`:
```text
ReferenceBytes ::
hash_id (u16)
digest (byte[...]) // remaining bytes in the frame
```
* `ref_len` MUST be the exact byte length of `ref_bytes` and MUST be ≥ 2.
Decoders MUST:
1. Read `ref_len (u32)`.
2. Read exactly `ref_len` bytes as `ref_bytes`.
3. Decode `ref_bytes` as `ReferenceBytes` per `ENC/ASL1-CORE v1.x`.
4. Reject encodings where:
* `ref_len < 2`, or
* `ref_bytes` is not a valid `ReferenceBytes` sequence (e.g. truncated or improperly framed in its context).
If the implementation also implements `HASH/ASL1` and recognizes the decoded `hash_id`, it MUST apply any length checks required by `ENC/ASL1-CORE` / `HASH/ASL1` for that `HashId` (e.g. fixed digest length). Failures MUST be treated as encoding/integrity errors.
`EncodedRef` is purely an internal framing wrapper for this profile; it introduces no additional semantics beyond “a `Reference` encoded canonically and length-prefixed so it can be embedded in larger structures”.
This pattern mirrors `EncodedRef` from `ENC/PEL-TRACE-DAG/1` for cross-profile consistency.
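
A minimal sketch of the `EncodedRef` wrapper, assuming the caller already holds canonical `ReferenceBytes`. The names `encode_ref`, `decode_ref`, and `EncodingError` are illustrative. The per-`HashId` digest length check is deliberately omitted because the registry of digest lengths lives in `HASH/ASL1`, not in this document.

```python
import struct
from typing import Tuple

class EncodingError(ValueError):
    """Raised for any malformed EncodedRef (illustrative error type)."""

def encode_ref(ref_bytes: bytes) -> bytes:
    """Wrap canonical ReferenceBytes as ref_len (u32) followed by ref_bytes."""
    if len(ref_bytes) < 2:
        raise EncodingError("ReferenceBytes must contain at least the u16 hash_id")
    return struct.pack(">I", len(ref_bytes)) + ref_bytes

def decode_ref(buf: bytes, offset: int) -> Tuple[int, bytes, int]:
    """Decode one EncodedRef at `offset`; return (hash_id, digest, next_offset)."""
    if offset + 4 > len(buf):
        raise EncodingError("truncated ref_len")
    (ref_len,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    if ref_len < 2:
        raise EncodingError("ref_len < 2")
    if offset + ref_len > len(buf):
        raise EncodingError("truncated ref_bytes")
    ref_bytes = buf[offset:offset + ref_len]
    (hash_id,) = struct.unpack_from(">H", ref_bytes, 0)
    # The digest is whatever remains in the frame after hash_id.
    return hash_id, ref_bytes[2:], offset + ref_len
```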
### 2.5 Encoding version field (`edge_version`)
`EdgeBytes` includes an `edge_version (u16)` field:
* For `TGK1_EDGE_ENC_V1`, encoders **MUST** always write `edge_version = 1`.
* Decoders for this profile:
  * **MUST** accept `edge_version = 1`; and
  * **MUST** treat any other value as “**not this encoding**” and fail decoding.
Within this profile, `edge_version` is a **guard word**, not an evolution mechanism:
* This document will never assign any other meaning than “constant value 1” to `edge_version` for `TGK1_EDGE_ENC_V1`.
* Values other than `1` simply indicate that the bytes are not an `EdgeBytes` value for this profile.
Any incompatible change to the `EdgeBytes` layout MUST be expressed as a **new encoding profile** (e.g. `TGK1_EDGE_ENC_V2` with its own Profile ID, and almost certainly a new `TypeTag`), not by reusing this profile with `edge_version = 2`.
Append-only extensions that would change the canonical mapping from `EdgeBody` to bytes are also out of scope for this profile; they belong in new profiles. Canonical `EdgeBody → EdgeBytes` mapping for `TGK1_EDGE_ENC_V1` is fixed and permanently tied to `edge_version = 1`.
---
## 3. Logical Model Reference (from TGK/1-CORE)
> **Source of truth:** `TGK/1-CORE`.
> This section is an informative restatement; in any conflict, `TGK/1-CORE` governs.
### 3.1 Node
```text
Node := Reference // ASL/1 Reference
```
Nodes are graph vertices identified solely by their `Reference` value.
### 3.2 EdgeTypeId
```text
EdgeTypeId = uint32
```
Semantics of particular `EdgeTypeId` values are defined by TGK type catalogs and profiles, not by this document.
### 3.3 EdgeBody
```text
EdgeBody {
  type:    EdgeTypeId
  from:    Node[]      // ordered, MAY be empty
  to:      Node[]      // ordered, MAY be empty
  payload: Reference   // always present
}
```
Relevant invariant from `TGK/1-CORE`:
> **TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1**
> For a well-formed `EdgeBody`, at least one of `from` or `to` **MUST** be non-empty.
> An `EdgeBody` with both `from = []` and `to = []` is invalid and MUST NOT be produced or accepted as a TGK edge.
Other notes from `TGK/1-CORE`:
* Duplicates within `from` or `to` are allowed.
* `payload` may also appear in `from` or `to`.
* Semantics of such patterns, if any, are profile-specific.
`ENC/TGK1-EDGE/1` encodes exactly these fields and MUST NOT introduce additional logical data at the `EdgeBody` level.
---
## 4. EdgeBody Encoding
### 4.1 Overall layout: `EdgeBytes`
The canonical encoding of an `EdgeBody` under `TGK1_EDGE_ENC_V1` is a single self-contained byte sequence:
```text
EdgeBytes ::
edge_version (u16)
type_id (u32) // EdgeTypeId
from_count (u32)
from_nodes (EncodedRef[0..from_count-1])
to_count (u32)
to_nodes (EncodedRef[0..to_count-1])
payload_ref (EncodedRef)
```
`EdgeBytes` is treated as an indivisible frame. When embedded in larger structures or protocols, the enclosing layer is responsible for providing the frame boundaries (e.g. via a length-prefix or message framing).
Field roles:
1. **edge_version (u16)**
* Guard word for this encoding profile.
* For `TGK1_EDGE_ENC_V1`, encoders **MUST** set `edge_version = 1` for all values.
* Decoders for this profile:
* **MUST** accept `edge_version = 1`; and
* **MUST** treat any other value as “not a `TGK1_EDGE_ENC_V1` edge payload” and fail decoding.
`edge_version` is not a version knob for evolving `TGK1_EDGE_ENC_V1`; it is a constant sanity check to quickly reject mismatched bytes.
2. **type_id (u32)**
* Encodes `EdgeBody.type : EdgeTypeId`.
* The meaning of each `EdgeTypeId` value is external to this spec.
3. **from_count (u32)** and **from_nodes**
* `from_count` is the length of `EdgeBody.from`.
* `from_nodes` is a list of `from_count` `EncodedRef` entries, each encoding a `Node` (i.e. a `Reference`).
* Order MUST match the logical `from` list; duplicates are allowed; MAY be zero-length.
4. **to_count (u32)** and **to_nodes**
* `to_count` is the length of `EdgeBody.to`.
* `to_nodes` is a list of `to_count` `EncodedRef` entries.
* Order MUST match the logical `to` list; duplicates are allowed; MAY be zero-length.
5. **payload_ref (EncodedRef)**
* Encodes `EdgeBody.payload : Reference`.
* Always present and encoded as a single `EncodedRef`.
### 4.2 Encoding procedure (normative)
Let `E` be a logical `EdgeBody` value. The canonical encoding function:
```text
encode_edgebody_tgk1_v1 : EdgeBody -> EdgeBytes
```
is defined as:
1. Set `edge_version = 1`.
2. Emit `edge_version` as `u16`.
3. Emit `E.type` as `type_id (u32)`.
4. Let `from_count = len(E.from)`; emit `from_count (u32)`.
5. For each `Node` in `E.from` in order:
* Let `R` be that `Node` (an ASL `Reference` value).
* Encode `R` as canonical `ReferenceBytes` using `ENC/ASL1-CORE v1.x`.
* Wrap as `EncodedRef` (see §2.4) and append.
6. Let `to_count = len(E.to)`; emit `to_count (u32)`.
7. For each `Node` in `E.to` in order:
* Encode as `EncodedRef` as above and append.
8. Encode `E.payload` as canonical `ReferenceBytes`, wrap as `EncodedRef`, and append as `payload_ref`.
9. Enforce the TGK non-empty endpoint invariant at encoding time:
* If `from_count == 0` **and** `to_count == 0`, the encoder MUST fail and MUST NOT produce `EdgeBytes` for this `EdgeBody` under this profile.
> **TGK1-EDGE-NONEMPTY/ENC/1**
> Encoders for `TGK1_EDGE_ENC_V1` **MUST** reject any attempt to encode an `EdgeBody` with `from = []` and `to = []`.
> Such a value is not a well-formed TGK edge per `TGK/1-CORE` and MUST NOT be emitted as an EdgeArtifact payload.
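
The encoding procedure above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the `EdgeBody` dataclass is hypothetical, each node and the payload are assumed to be supplied as already-canonical `ReferenceBytes`, and the field is named `from_` because `from` is a Python keyword. The sketch checks the non-empty endpoint invariant before emitting anything, which yields the same result as step 9 while avoiding partial output.

```python
import struct
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class EdgeBody:
    type: int            # EdgeTypeId (u32)
    from_: List[bytes]   # each entry: canonical ReferenceBytes of a Node
    to: List[bytes]
    payload: bytes       # canonical ReferenceBytes of the payload Reference

def _encoded_ref(ref_bytes: bytes) -> bytes:
    """EncodedRef: ref_len (u32) followed by ref_bytes."""
    return struct.pack(">I", len(ref_bytes)) + ref_bytes

def encode_edgebody_tgk1_v1(e: EdgeBody) -> bytes:
    if not e.from_ and not e.to:
        raise ValueError("TGK1-EDGE-NONEMPTY/ENC/1: from and to are both empty")
    out = [struct.pack(">H", 1),              # edge_version guard word
           struct.pack(">I", e.type)]         # type_id
    out.append(struct.pack(">I", len(e.from_)))
    out.extend(_encoded_ref(r) for r in e.from_)   # order preserved
    out.append(struct.pack(">I", len(e.to)))
    out.extend(_encoded_ref(r) for r in e.to)
    out.append(_encoded_ref(e.payload))
    return b"".join(out)
```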
### 4.3 Decoding procedure (normative)
Given a byte slice known to contain exactly one `EdgeBytes` frame under this profile, the canonical decoding function:
```text
decode_edgebody_tgk1_v1 : EdgeBytes -> EdgeBody | error
```
is defined as:
1. Read `edge_version (u16)`.
* If `edge_version != 1`, fail with an encoding error (e.g. “not `TGK1_EDGE_ENC_V1`”).
2. Read `type_id (u32)`.
3. Read `from_count (u32)`.
* For `i = 0 .. from_count-1`, read and decode one `EncodedRef` as a `Reference` and append to `from_nodes`.
4. Read `to_count (u32)`.
* For `j = 0 .. to_count-1`, read and decode one `EncodedRef` and append to `to_nodes`.
5. Read `payload_ref` as a single `EncodedRef` and decode to `payload : Reference`.
6. If `from_count == 0` **and** `to_count == 0`, fail with an encoding error:
* This violates `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1` and `TGK1-EDGE-NONEMPTY/ENC/1`.
7. If the decoding context expects an isolated `EdgeBytes` value:
* After step 5 (or 6), if any unread bytes remain in the slice, the decoder MUST treat this as an encoding error (trailing data).
8. Construct and return:
```text
EdgeBody {
type = EdgeTypeId(type_id)
from = from_nodes
to = to_nodes
payload = payload
}
```
Decoders MUST additionally treat as encoding errors:
* truncated sequences (insufficient bytes for any declared field or `EncodedRef`);
* invalid `EncodedRef` encodings (see §2.4);
* any integer reads that cannot be completed because the input ends early.
`decode_edgebody_tgk1_v1` MUST be deterministic and MUST NOT depend on any external configuration beyond:
* the bytes in the `EdgeBytes` frame; and
* the static definition of `ENC/ASL1-CORE v1.x` used to decode embedded `ReferenceBytes`.
Recognition of `type_id` values (as supported or not in a given ExecutionEnvironment) is handled by `TGK/1-CORE` and the local catalog. This profile always decodes the raw `EdgeBody` structure, regardless of whether the environment later chooses to treat it as an EdgeArtifact.
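
The decoding steps above can be sketched as a single forward pass over a byte slice. The helper names and the `EncodingError` type are assumptions for illustration. Embedded references are returned as raw `ReferenceBytes` and the result is a plain `dict`; validating them against `ENC/ASL1-CORE` and `HASH/ASL1` is left out because those layouts are defined elsewhere.

```python
import struct
from typing import List, Tuple

class EncodingError(ValueError):
    """Any failure listed in the decoding procedure (illustrative type)."""

def _take(buf: bytes, pos: int, n: int) -> Tuple[bytes, int]:
    if pos + n > len(buf):
        raise EncodingError("truncated input")
    return buf[pos:pos + n], pos + n

def _uint(buf: bytes, pos: int, fmt: str) -> Tuple[int, int]:
    raw, pos = _take(buf, pos, struct.calcsize(fmt))
    return struct.unpack(fmt, raw)[0], pos

def _ref(buf: bytes, pos: int) -> Tuple[bytes, int]:
    ref_len, pos = _uint(buf, pos, ">I")
    if ref_len < 2:
        raise EncodingError("ref_len < 2")
    return _take(buf, pos, ref_len)

def _refs(buf: bytes, pos: int) -> Tuple[List[bytes], int]:
    count, pos = _uint(buf, pos, ">I")
    refs = []
    for _ in range(count):
        ref, pos = _ref(buf, pos)
        refs.append(ref)
    return refs, pos

def decode_edgebody_tgk1_v1(buf: bytes) -> dict:
    version, pos = _uint(buf, 0, ">H")
    if version != 1:
        raise EncodingError("not TGK1_EDGE_ENC_V1")
    type_id, pos = _uint(buf, pos, ">I")
    from_nodes, pos = _refs(buf, pos)
    to_nodes, pos = _refs(buf, pos)
    payload, pos = _ref(buf, pos)
    if not from_nodes and not to_nodes:
        raise EncodingError("empty endpoints")
    if pos != len(buf):
        raise EncodingError("trailing data")   # isolated-frame context
    return {"type": type_id, "from": from_nodes, "to": to_nodes, "payload": payload}
```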
---
## 5. EdgeArtifact Binding & Profile Selection
### 5.1 EdgeArtifact shape
Under this profile, EdgeArtifacts MUST be ASL/1 Artifacts of the form:
```text
Artifact {
bytes = EdgeBytes
type_tag = TYPE_TAG_TGK1_EDGE_V1
}
```
Where:
* `TYPE_TAG_TGK1_EDGE_V1` is a `TypeTag` whose concrete `tag_id`:
  * is assigned in the global TypeTag registry, and
  * is included in the environment's `EDGE_TAG_SET` when this profile is active.
ExecutionEnvironments that wish to treat such Artifacts as TGK edges MUST:
* include `TYPE_TAG_TGK1_EDGE_V1.tag_id` in their configured `EDGE_TAG_SET`; and
* register `TGK1_EDGE_ENC_V1` as the edge-encoding profile for that tag, so that `decode_edge_payload_TGK1_EDGE` is used for those Artifacts' `bytes`.
This document treats `TYPE_TAG_TGK1_EDGE_V1` symbolically and does not assign a numeric `tag_id`.
### 5.2 Integration with TGK/1-CORE's `decode_edge_payload_P`
For ExecutionEnvironments that activate `TGK1_EDGE_ENC_V1` for `TYPE_TAG_TGK1_EDGE_V1`, the corresponding `decode_edge_payload_P` function from `TGK/1-CORE §3.2` is:
```text
decode_edge_payload_TGK1_EDGE(bytes: OctetString) -> EdgeBody | error
```
defined by:
```text
decode_edgebody_tgk1_v1(bytes)
```
from §4.3.
Conformant implementations MUST:
* apply `decode_edge_payload_TGK1_EDGE` only to Artifacts whose `type_tag.tag_id` is configured to use this profile; and
* treat any decoding failure as “not a valid edge payload for this profile”.
Multi-profile behavior (e.g., co-existence with other edge encodings) is governed by `TGK/1-CORE §3.2`. In particular:
* If more than one active profile successfully decodes the same `Artifact.bytes`, all such profiles MUST decode to the same logical `EdgeBody` value.
* If two active profiles decode the same bytes to different `EdgeBody` values, the ExecutionEnvironment MUST NOT treat that Artifact as an EdgeArtifact until the conflict is resolved.
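
One way an ExecutionEnvironment might apply the two multi-profile rules is sketched below. Everything here is hypothetical: `resolve_edge`, `ProfileConflict`, and the convention that a decoder signals failure by raising `ValueError` are assumptions for illustration, and `decoders` stands for whatever set of profiles the environment has activated for the Artifact's `type_tag`.

```python
from typing import Callable, Dict, Optional

# A decoder returns an EdgeBody-like value or raises ValueError on failure.
Decoder = Callable[[bytes], object]

class ProfileConflict(Exception):
    """Two active profiles decoded the same bytes to different EdgeBody values."""

def resolve_edge(data: bytes, decoders: Dict[str, Decoder]) -> Optional[object]:
    """Return the agreed EdgeBody, None if no profile decodes, or raise on conflict."""
    results = []
    for name, decode in decoders.items():
        try:
            results.append((name, decode(data)))
        except ValueError:
            continue  # not a valid edge payload for this profile
    if not results:
        return None
    first_name, first = results[0]
    for name, body in results[1:]:
        if body != first:
            # The Artifact must not be treated as an EdgeArtifact until resolved.
            raise ProfileConflict(f"{first_name} and {name} disagree")
    return first
```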
---
## 6. EdgeRef Identity via ASL/1-CORE
Given:
* `EdgeBytes` from §4;
* an `EdgeArtifact`:
```text
A_edge = Artifact {
bytes = EdgeBytes
type_tag = TYPE_TAG_TGK1_EDGE_V1
}
```
* `ENC/ASL1-CORE v1.x` for canonical `ArtifactBytes`;
* a hash algorithm `H` with `HashId = HID` from `HASH/ASL1`,
the canonical `EdgeRef : Reference` (the edge identity) is:
```text
ArtifactBytes = encode_artifact_core_v1(A_edge)
digest = H(ArtifactBytes)
EdgeRef = Reference { hash_id = HID, digest = digest }
```
This profile does not introduce any new identity scheme. Edge identity is entirely determined by:
* the ASL/1 Artifact identity model,
* the selected encoding profile (typically `ASL_ENC_CORE_V1`), and
* the selected hash algorithm (`HASH/ASL1`).
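
The derivation above can be sketched as follows, with three loudly labelled assumptions: `artifact_bytes` is taken to be the output of `encode_artifact_core_v1`, whose layout is defined in `ENC/ASL1-CORE` and is not reproduced here; SHA-256 is only a stand-in for `H`; and any `hash_id` value passed in is a placeholder, since concrete `HashId` assignments live in `HASH/ASL1`. Only the `ReferenceBytes` layout (`hash_id (u16)` followed by the digest) is taken from §2.4.

```python
import hashlib
import struct
from typing import Callable

def sha256(data: bytes) -> bytes:
    """Stand-in for the hash algorithm H (assumption, not a HASH/ASL1 binding)."""
    return hashlib.sha256(data).digest()

def edge_ref_bytes(artifact_bytes: bytes, hash_id: int,
                   h: Callable[[bytes], bytes] = sha256) -> bytes:
    """Return ReferenceBytes for EdgeRef: hash_id (u16) followed by H(ArtifactBytes)."""
    digest = h(artifact_bytes)
    return struct.pack(">H", hash_id) + digest
```

Note that the hash input is the encoded Artifact (bytes plus type tag), not the bare `EdgeBytes`, so the same `EdgeBytes` under a different `type_tag` would yield a different `Reference`.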
---
## 7. Canonicality & Injectivity
### 7.1 Injectivity
> **TGK1-EDGE-INJECTIVE/ENC/1**
> Under `TGK1_EDGE_ENC_V1`, the mapping:
>
> ```text
> EdgeBody -> EdgeBytes
> ```
>
> MUST be injective. That is, for any two `EdgeBody` values `E1` and `E2`:
>
> ```text
> E1 != E2 ⇒ encode_edgebody_tgk1_v1(E1) != encode_edgebody_tgk1_v1(E2)
> ```
This is ensured by:
* encoding all logical fields (`type`, `from`, `to`, `payload`);
* preserving list order exactly;
* using a fixed, explicit binary layout.
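
A small spot check, not a proof, of the three properties listed above. The compact encoder `enc` is a hypothetical helper that follows the §4.1 layout, and the two references `A` and `B` are made-up three-byte values used only to show that changing the order, the from/to placement, the type, or the payload changes the bytes.

```python
import struct

def enc(type_id, from_refs, to_refs, payload):
    """Compact EdgeBytes encoder following the section 4.1 layout."""
    def ref(r):
        return struct.pack(">I", len(r)) + r
    def lst(rs):
        return struct.pack(">I", len(rs)) + b"".join(ref(r) for r in rs)
    return struct.pack(">HI", 1, type_id) + lst(from_refs) + lst(to_refs) + ref(payload)

A, B = b"\x00\x01\xaa", b"\x00\x01\xbb"   # illustrative ReferenceBytes
variants = [
    enc(1, [A, B], [], A),   # baseline
    enc(1, [B, A], [], A),   # same nodes, different order
    enc(1, [A], [B], A),     # B moved from `from` to `to`
    enc(2, [A, B], [], A),   # different type
    enc(1, [A, B], [], B),   # different payload
]
all_distinct = len(set(variants)) == len(variants)
```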
### 7.2 Stability
For the fixed profile `TGK1_EDGE_ENC_V1` (with the guard word `edge_version = 1`):
* The same logical `EdgeBody` MUST always encode to the same `EdgeBytes` across:
* implementations,
* platforms,
* executions,
* and time.
Encoders MUST NOT:
* reorder elements of `from` or `to`;
* alter integer widths or endianness;
* introduce alternative layouts for any field;
* use any `edge_version` other than `1`.
---
## 8. Error Handling (Encoding Layer)
Decoders for this profile MUST treat as **encoding errors** (to be surfaced as some error category at the API boundary):
1. **Guard word mismatch**
* `edge_version != 1`.
2. **Truncated fields**
* Not enough bytes to read any declared field (`u16`, `u32`, `EncodedRef`, list elements).
3. **Invalid `EncodedRef`**
* `ref_len < 2`; or
* `ref_bytes` is not a valid `ReferenceBytes` sequence per `ENC/ASL1-CORE v1.x`; or
* (when `HASH/ASL1` is implemented and `hash_id` is known) the digest length implied by `ref_bytes` does not match the canonical length for that `HashId`.
4. **Empty endpoints**
* `from_count == 0` **and** `to_count == 0` (violation of `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1`).
5. **Inconsistent list lengths**
* Fewer actual `EncodedRef` entries than indicated by `from_count` or `to_count`.
6. **Trailing data in isolated contexts**
* Additional bytes remaining after a full `EdgeBytes` value has been decoded, when the decoding context expects exactly one `EdgeBytes` frame.
Translating these into concrete error codes (e.g. `ERR_TGK1_EDGE_ENC_INVALID`) is implementation-specific, but MUST result in rejection of the payload as an `EdgeBytes` value under this profile.
Semantic errors about `EdgeTypeId` recognition or edge-type-specific constraints are handled by TGK catalogs and higher profiles, not at the encoding layer.
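
Since the mapping to concrete error codes is implementation-specific, the following is only one possible shape for it. The names `EdgeEncErrorKind`, `EdgeEncodingError`, and `check_guard_word` are hypothetical; the six members simply mirror the six categories listed above.

```python
from enum import Enum

class EdgeEncErrorKind(Enum):
    GUARD_WORD_MISMATCH = "edge_version != 1"
    TRUNCATED = "not enough bytes for a declared field"
    INVALID_ENCODED_REF = "ref_len < 2 or invalid ReferenceBytes"
    EMPTY_ENDPOINTS = "from_count == 0 and to_count == 0"
    INCONSISTENT_LIST_LENGTH = "fewer EncodedRef entries than declared"
    TRAILING_DATA = "bytes remain after a complete EdgeBytes frame"

class EdgeEncodingError(ValueError):
    """Single exception type carrying the category of the encoding error."""
    def __init__(self, kind: EdgeEncErrorKind, detail: str = "") -> None:
        super().__init__(f"{kind.name}: {detail or kind.value}")
        self.kind = kind

def check_guard_word(edge_version: int) -> None:
    """Reject any payload whose guard word is not the constant 1."""
    if edge_version != 1:
        raise EdgeEncodingError(EdgeEncErrorKind.GUARD_WORD_MISMATCH)
```

Keeping one exception type with a `kind` attribute lets callers reject the payload uniformly while still logging which category was hit.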
---
## 9. Streaming & Implementation Notes
Implementations MUST be able to encode and decode `EdgeBytes` in a **single forward-only pass**:
* All length prefixes (`from_count`, `to_count`, `ref_len`) precede their content.
* Decoders MUST NOT require backtracking to interpret the structure.
For large edges (many endpoints):
* Encoders MAY stream `EncodedRef` entries as they are generated.
* Decoders MAY stream `EncodedRef` entries to consumers or hashers as they are read.
Any such streaming strategy MUST be observationally equivalent to decoding the entire `EdgeBytes` into an `EdgeBody` in memory and MUST respect the canonical layout.
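
A forward-only decoder over a binary stream might look like the sketch below. The generator `stream_edge` and its `(field, value)` event shape are assumptions for illustration, and the stream is assumed to be buffered so that `read(n)` returns fewer than `n` bytes only at end of input.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

def _read(stream: BinaryIO, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated input")
    return data

def _read_ref(stream: BinaryIO) -> bytes:
    (ref_len,) = struct.unpack(">I", _read(stream, 4))
    if ref_len < 2:
        raise ValueError("ref_len < 2")
    return _read(stream, ref_len)

def stream_edge(stream: BinaryIO) -> Iterator[Tuple[str, object]]:
    """Yield ("type", id), then ("from"/"to", ref_bytes) entries, then ("payload", ref_bytes)."""
    (version,) = struct.unpack(">H", _read(stream, 2))
    if version != 1:
        raise ValueError("not TGK1_EDGE_ENC_V1")
    (type_id,) = struct.unpack(">I", _read(stream, 4))
    yield "type", type_id
    total = 0
    for field in ("from", "to"):
        (count,) = struct.unpack(">I", _read(stream, 4))
        total += count
        for _ in range(count):
            yield field, _read_ref(stream)
    yield "payload", _read_ref(stream)
    if total == 0:
        raise ValueError("empty endpoints")
    if stream.read(1):
        raise ValueError("trailing data")
```

Because validation errors can surface after some events have already been yielded, a consumer has to discard everything it received when the generator raises; otherwise the result would not be observationally equivalent to whole-frame decoding.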
---
## 10. Conformance
An implementation is **ENC/TGK1-EDGE/1-conformant** if, for `TGK1_EDGE_ENC_V1`, it:
1. **Implements canonical EdgeBody encoding/decoding**
* Implements `encode_edgebody_tgk1_v1` and `decode_edgebody_tgk1_v1` exactly as specified in §4.
* Always writes `edge_version = 1` when encoding.
* Accepts only `edge_version = 1` and treats any other value as “not this encoding”.
2. **Uses `EncodedRef` correctly**
* Embeds `Reference` values via `EncodedRef` as in §2.4.
* Uses canonical `ReferenceBytes` from `ENC/ASL1-CORE v1.x` when forming `ref_bytes`.
* Applies `HASH/ASL1` length checks for known `HashId`s when available.
3. **Enforces TGK invariants at the encoding layer**
* Rejects encodings with both `from` and `to` empty (`TGK1-EDGE-NONEMPTY/ENC/1`).
* Treats malformed payloads as encoding errors as per §8.
4. **Binds EdgeBytes into EdgeArtifacts correctly**
* When forming EdgeArtifacts, sets:
```text
Artifact.bytes = EdgeBytes
Artifact.type_tag = TYPE_TAG_TGK1_EDGE_V1
```
* Does not embed additional logical data into the Artifact beyond `EdgeBody` and `type_tag`.
5. **Derives EdgeRef identity via ASL/1-CORE**
* Uses `ENC/ASL1-CORE v1.x` and `HASH/ASL1` for identity, as in §6.
* Does not introduce alternative edge identity mechanisms at this layer.
6. **Integrates with TGK/1-CORE profile selection**
* Applies `decode_edge_payload_TGK1_EDGE` only to Artifacts whose `type_tag.tag_id` is configured for this profile.
* Respects multi-profile behavior rules from `TGK/1-CORE §3.2` when other edge encodings are also active.
7. **Preserves injectivity and stability**
* Distinct `EdgeBody` values always produce distinct `EdgeBytes`.
* The same `EdgeBody` always produces the same `EdgeBytes` under this profile.
Everything else — storage layout, access protocols, graph indexes, provenance algorithms, and edge-type semantics — is defined by `ASL/1-STORE`, `TGK/1-CORE`, TGK catalogs, and higher-layer profiles.
---
## 11. Informative Example (Sketch)
> Non-normative; values and hex are illustrative only.
Consider an edge:
```text
EdgeBody {
type = 0x00000010 // EDGE_EXECUTION (for example)
from = [N_prog, N_input]
to = [N_output]
payload = R_receipt
}
```
Where `N_prog`, `N_input`, `N_output`, and `R_receipt` are `Reference` values with canonical `ReferenceBytes`:
```text
Ref(N_prog) = ReferenceBytes(N_prog) // length = len_pg, bytes = bytes_pg
Ref(N_input) = ReferenceBytes(N_input) // length = len_in, bytes = bytes_in
Ref(N_output) = ReferenceBytes(N_output) // length = len_out, bytes = bytes_out
Ref(R_receipt) = ReferenceBytes(R_receipt) // length = len_rc, bytes = bytes_rc
```
Then `EdgeBytes` under this profile are:
```text
edge_version = 0001 ; u16 (guard word)
type_id = 00000010 ; u32
from_count = 00000002 ; 2 sources
from_nodes =
000000?? bytes_pg ... ; EncodedRef(N_prog)
000000?? bytes_in ... ; EncodedRef(N_input)
to_count = 00000001 ; 1 target
to_nodes =
000000?? bytes_out ... ; EncodedRef(N_output)
payload_ref =
000000?? bytes_rc ... ; EncodedRef(R_receipt)
```
Where each `EncodedRef(X)` is:
```text
ref_len(X) (u32) || ReferenceBytes(X)
```
These `EdgeBytes` become `Artifact.bytes` for an EdgeArtifact with `type_tag = TYPE_TAG_TGK1_EDGE_V1`. All conformant encoders MUST produce the same bytes for the same logical `EdgeBody`; all conformant decoders MUST reconstruct the same `EdgeBody` from those bytes.
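
To make the sketch concrete, the following fills the `??` placeholders with made-up references. The `hash_id` value `0x0001` and the two-byte digests are purely illustrative assumptions; real digests have the length required by the relevant `HashId` in `HASH/ASL1`.

```python
import struct

def reference_bytes(hash_id: int, digest: bytes) -> bytes:
    """ReferenceBytes: hash_id (u16) followed by the digest."""
    return struct.pack(">H", hash_id) + digest

def encoded_ref(ref_bytes: bytes) -> bytes:
    """EncodedRef: ref_len (u32) followed by ref_bytes."""
    return struct.pack(">I", len(ref_bytes)) + ref_bytes

# Illustrative references only (toy two-byte digests).
n_prog = reference_bytes(0x0001, b"\xa0\xa1")
n_input = reference_bytes(0x0001, b"\xb0\xb1")
n_output = reference_bytes(0x0001, b"\xc0\xc1")
r_receipt = reference_bytes(0x0001, b"\xd0\xd1")

edge_bytes = (
    struct.pack(">H", 1)                              # edge_version
    + struct.pack(">I", 0x00000010)                   # type_id
    + struct.pack(">I", 2)                            # from_count
    + encoded_ref(n_prog) + encoded_ref(n_input)      # from_nodes
    + struct.pack(">I", 1)                            # to_count
    + encoded_ref(n_output)                           # to_nodes
    + encoded_ref(r_receipt)                          # payload_ref
)
print(edge_bytes.hex())
```

With these toy values each `ref_len` is `00000004` and the whole frame is 46 bytes long.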
---
**End of `ENC/TGK1-EDGE/1 v0.1.0 — Canonical Encoding for TGK EdgeArtifacts` (draft).**
---
## Document History
* **0.1.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.


@@ -0,0 +1,240 @@
# OPREG/TGK-DOCGRAPH/1 — Document Graph Registry
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: Plan
Last Updated: 2025-12-01
Linked Phase Pack: PH12
Tags: [registry, tgk, docgraph]
<!-- Source: /amduat/logs/ph12/evidence/import/PH12-EV-IMPORT-001/opreg-tgk-docgraph-design-20251201.md | Canonical: /amduat/tier1/opreg-tgk-docgraph-1.md -->
**Document ID:** `OPREG/TGK-DOCGRAPH/1`
**Layer:** L1 Profile (TGK Doc Graph Registry over `TGK/1-CORE` + `ENC/TGK1-EDGE/1`)
**Depends on (normative):**
* `ASL/1-CORE v0.4.x` — `Artifact`, `Reference`, `TypeTag`, `HashId`
* `ENC/ASL1-CORE v1.x` — canonical encodings for Artifacts and References
* `HASH/ASL1 v0.2.x` — ASL1 hash family (`HASH-ASL1-256`)
* `TGK/1-CORE v0.7.x` — trace graph kernel: `Node`, `EdgeBody`, `EdgeTypeId`
* `ENC/TGK1-EDGE/1 v0.1.x` — canonical encoding for `EdgeBody` / EdgeArtifacts
* `AMDUAT-DOCID` (Tier-0) — document identity and SoT/surface model
**Integrates with (informative):**
* `TGK/STORE/1` — graph store/query profile over ASL/1-STORE + TGK
* ADR-032 and PH10/PH12 import designs (RΩ / export)
* Future doc graph consumers (assistant overlays, IDX, provenance views)
© 2025 Amduat Programme.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Purpose and Non-Goals
### 0.1 Purpose
`OPREG/TGK-DOCGRAPH/1` defines a **doc/import/navigation graph registry** for Amduat:
* It names **node concepts** (as ASL/1 Artifacts) for:
* conceptual documents (DOCID lineages),
* document versions at a given snapshot (e.g. RΩ),
* Git commits and blobs,
* Amduat SoT instances.
* It names **edge types** (`EdgeTypeId`s) that connect those concepts:
* document ↔ version, surface, SoT state,
* version ↔ Git blob/commit,
* document ↔ Amduat instance.
* It constrains how those edges are represented as EdgeArtifacts under
`ENC/TGK1-EDGE/1` and consumed via `TGK/STORE/1`.
This registry is intentionally **doc/import scoped**. Execution, fact, and
certificate edges live in their own TGK/OPREG registries and MUST NOT reuse
`EdgeTypeId` assignments from this doc graph registry.
This Tier-1 stub is the **canonical registry companion** to the PH12 design
note `PH12-EV-IMPORT-001 — Doc Graph OPREG Profile Design
(/logs/ph12/evidence/import/PH12-EV-IMPORT-001/opreg-tgk-docgraph-design-20251201.md)`,
which records design intent and sandbox experience; this document is the SoT
for the node and edge vocabulary.
### 0.2 Non-goals
This registry does **not** define:
* any storage API (`ASL/1-STORE`, `TGK/STORE/1` already cover that),
* any provenance algorithms or queries (`TGK/PROV/1` and higher layers),
* any assistant or overlay behavior (those consume this registry),
* concrete import/export profiles (ADR-032 handles those).
It only defines **concepts and edge types**; encoding and storage use existing
Tier-1 profiles.
---
## 1. Node Concepts (Informative overview)
This section summarizes node concepts; canonical encodings and type_tags are
defined in companion encoding profiles (TBD).
### 1.1 DOC_CONCEPT
Conceptual governed document identity per `AMDUAT-DOCID`:
* `identity_authority` (string),
* `lineage_id` (string),
* optional `doc_code` (string),
* optional `code_status` (e.g. `tentative`, `stable`).
There is exactly one `DOC_CONCEPT` node per `(identity_authority, lineage_id)`.
### 1.2 DOC_VERSION
Versioned SoT slice of a governed document at a snapshot commit:
* `identity_authority`, `lineage_id`, `doc_code`, `code_status`,
* `g_commit` (Git commit id),
* `sha256` (content hash of the doc bytes at `g_commit`),
* `path` (repository path at `g_commit`, e.g. `/amduat/tier0/docid.md`),
* `surface`, `sot` (SoT state) per DOCID header.
Multiple `DOC_VERSION` nodes may exist for a `DOC_CONCEPT` across commits.
### 1.3 GIT_COMMIT
Git commit metadata:
* `commit` (sha1),
* `parents` (list of parent commit ids),
* `tree` (tree id),
* `author_name`, `author_email`, `authored_at`,
* `committer_name`, `committer_email`, `committed_at`,
* summary or truncated message.
### 1.4 GIT_BLOB
Content snapshot for a single blob at `g_commit`:
* `blob_sha` (sha1),
* `sha256` (content hash),
* `size_bytes`,
* `mode` (tree mode, including exec/symlink bits),
* `path` at `g_commit`.
### 1.5 AMDUAT_INSTANCE
Descriptor for an Amduat SoT instance:
* `g_commit` (RΩ commit),
* `store_root` (SoT store root),
* `store_backend_id`,
* references to RΩ FER/1 receipts and manifests,
* optional labels (environment, hostname, etc.).
### 1.6 Helper nodes
* `SURFACE` — surface classification nodes (e.g. `tier0`, `tier1`, `phase`, `evidence`).
* `SOT_STATE` — SoT state nodes (`Yes`, `Plan`, `Ref`).
---
## 2. Edge Types (Doc Graph Domain)
`EdgeTypeId` values in this registry are reserved for doc/import/navigation
edges. Concrete numeric assignments live in the encoding/catalogue layer.
Implementations and other OPREG registries MUST treat these `EdgeTypeId`s as
belonging exclusively to the **Amduat doc graph domain**:
* the eventual allocation for this registry is expected to reserve a contiguous
`EdgeTypeId` band (informally: an `AMDUAT-DOCGRAPH` band),
* only doc/import/navigation semantics (edges in §§2.1–2.4) may occupy that
band,
* PEL execution, FER/1, CIL, FCT, and other TGK domains MUST use their own
registries and bands.
### 2.1 Identity & version edges
* `EDGE_DOC_HAS_VERSION`
`DOC_CONCEPT → DOC_VERSION` — this version belongs to this conceptual document.
* `EDGE_VERSION_OF`
`DOC_VERSION → DOC_CONCEPT` — reverse link; derivable from `EDGE_DOC_HAS_VERSION`.
* `EDGE_DOC_HAS_IDENTITY`
`DOC_VERSION → DOC_CONCEPT` — DOCID identity is attached to this version.
### 2.2 Surface & SoT edges
* `EDGE_DOC_ON_SURFACE`
`DOC_VERSION → SURFACE` — surface classification (governance/spec/phase/evidence).
* `EDGE_DOC_SOT`
`DOC_VERSION → SOT_STATE` — SoT status (`Yes`, `Plan`, `Ref`) for this version.
### 2.3 Git provenance edges
* `EDGE_VERSION_HAS_BLOB`
`DOC_VERSION → GIT_BLOB` — ties a document version to the blob at `g_commit`.
* `EDGE_VERSION_FROM_COMMIT`
`DOC_VERSION → GIT_COMMIT` — last commit that touched this path at/before the snapshot.
### 2.4 SoT instance edges
* `EDGE_DOC_MEMBER_OF_AMDUAT`
`DOC_CONCEPT → AMDUAT_INSTANCE` — this document is part of a particular Amduat instance.
---
## 3. Encoding & Store Integration (Summary)
All doc-graph edges:
* are represented as TGK `EdgeBody` values with `EdgeTypeId` from this registry,
* are encoded as EdgeArtifacts via `ENC/TGK1-EDGE/1` using `TYPE_TAG_TGK1_EDGE_V1`,
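* derive `EdgeRef` identities via `ENC/ASL1-CORE` + `HASH/ASL1` over the EdgeArtifact (the encoded Artifact, not the bare `EdgeBytes`), as in `ENC/TGK1-EDGE/1` §6,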
* live in ASL/1-STORE instances alongside other Artifacts.
Nodes (`DOC_CONCEPT`, `DOC_VERSION`, `GIT_COMMIT`, `GIT_BLOB`, `AMDUAT_INSTANCE`, etc.) are ordinary
ASL/1 Artifacts; their `Reference`s are the TGK nodes.
`TGK/STORE/1` provides query semantics over the resulting graph.
JSON overlays or other projected views (for example, PH12 doc graph sandboxes)
MAY be emitted for human navigation and experiments, but they are always
derived from the underlying node Artifacts and EdgeArtifacts governed by this
registry and `ENC/TGK1-EDGE/1`; overlays are never the source of truth for
doc graph semantics.
---
## 4. Ingest & Encoder Interaction (Informative)
Implementations are expected to:
* materialise node Artifacts per this registry (and companion encoding profiles),
* emit FER/1 receipts for ingest pipelines,
* emit an idempotent edge worklist (doc-edge queue) that references `EdgeTypeId`s
from this registry and node `Reference`s,
* use a separate encoder to turn worklist items into EdgeArtifacts using `ENC/TGK1-EDGE/1`,
writing them into ASL/1-STORE for consumption via `TGK/STORE/1`.
Details of worklist format and encoder scheduling are left to PH12/PHB01
implementation notes; this registry only fixes the conceptual node/edge space.
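As an informative sketch only, one way to make the edge worklist idempotent is to key each item by its full content, so that re-running ingest cannot enqueue the same edge twice. The `WorkItem` and `EdgeWorklist` names, the numeric `EDGE_DOC_SOT` value, and the choice of `payload` are all assumptions made for illustration; the actual worklist format is left to the PH12/PHB01 notes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int   # HashId (uint16)
    digest: bytes  # OctetString


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical worklist entry: everything an encoder needs for one edge."""
    edge_type_id: int
    from_: Tuple[Reference, ...]  # `from` is a Python keyword
    to: Tuple[Reference, ...]
    payload: Reference


class EdgeWorklist:
    """Insertion-ordered queue that ignores items it has already seen."""

    def __init__(self) -> None:
        self._items: Dict[WorkItem, None] = {}  # dicts preserve insertion order

    def enqueue(self, item: WorkItem) -> bool:
        """Return True if the item was new, False if it was a repeat."""
        if item in self._items:
            return False
        self._items[item] = None
        return True

    def pending(self) -> List[WorkItem]:
        return list(self._items)


# Placeholder values; real EdgeTypeIds come from the registry.
EDGE_DOC_SOT = 0x1001
doc_version = Reference(1, b"doc-version")
sot_state = Reference(1, b"sot-yes")

queue = EdgeWorklist()
item = WorkItem(EDGE_DOC_SOT, (doc_version,), (sot_state,), sot_state)
first = queue.enqueue(item)
second = queue.enqueue(item)  # a second ingest run emits the same item again
```

Because EdgeArtifacts are content-addressed, encoding the same item twice would also yield the same `EdgeRef`; deduplicating in the queue merely avoids the redundant encoder work.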

671
tier1/tgk-1-core.md Normal file

@ -0,0 +1,671 @@
# TGK/1-CORE — Trace Graph Kernel (Core)
Status: Approved
Owner: Niklas Rydberg
Version: 0.7.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [traceability, execution]
<!-- Source: /amduat/docs/new/tgk.md | Canonical: /amduat/tier1/tgk-1-core.md -->
**Document ID:** `TGK/1-CORE`
**Layer:** L1.5 — Logical graph kernel over ASL/1 (above ASL/1, orthogonal to PEL/1)
**Depends on (normative):**
* `ASL/1-CORE v0.3.x` — value substrate: `Artifact`, `Reference`, `TypeTag`, identity model
**Informative references:**
* `ENC/ASL1-CORE v1.0.x` — canonical encodings for ASL/1 values (`ArtifactBytes`, `ReferenceBytes`)
* `HASH/ASL1 v0.2.x` — ASL1 hash family
* `ASL/1-STORE v0.3.x` — content-addressable store semantics
* `PEL/1` — execution substrate
* `CIL/1`, `FCT/1`, `FER/1`, `OI/1` — higher-layer profiles built on top of TGK/1
* (future) `ENC/TGK1-EDGE` — canonical edge-encoding profile
* (future) `TGK/STORE/1` — graph store and query semantics
* (future) `TGK/PROV/1` — provenance and trace semantics
> **Versioning note**
> TGK/1-CORE is agnostic to minor revisions of these informative documents, provided they preserve:
>
> * the ASL/1-CORE definitions of `Artifact`, `Reference`, and `TypeTag`, and
> * the existence of canonical encodings and hash families consistent with that model.
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
### 0.1 RFC 2119 terminology
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**,
**SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** are to be
interpreted as described in RFC 2119.
### 0.2 Terms from ASL/1
This specification reuses the following terms from `ASL/1-CORE`:
* **Artifact** — immutable logical value:
```text
Artifact {
    bytes: OctetString
    type_tag: optional TypeTag
}
```
* **Reference** — content address (logical identity handle) for an Artifact:
```text
Reference {
    hash_id: HashId
    digest: OctetString
}
```
* **TypeTag** — opaque `uint32` identifying intended interpretation of an Artifact.
* **HashId** — `uint16` identifying a hash algorithm (e.g. from `HASH/ASL1`).
Where this document says **ArtifactRef**, it means an ASL/1 `Reference` that (logically) points to an `Artifact`. TGK/1-CORE does **not** assume the corresponding Artifact is present or retrievable in any particular store.
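As an informative sketch, the two ASL/1 value types can be modelled as immutable records with value equality. The Python names are illustrative assumptions: `data` stands in for the spec's `bytes` field, `type_tag` is reduced to its numeric `tag_id`, and the `hash_id` value `0x0001` is a placeholder rather than a registered `HashId`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    """Content address: which hash algorithm, and the digest it produced."""
    hash_id: int   # HashId, must fit in uint16
    digest: bytes  # OctetString

    def __post_init__(self) -> None:
        if not 0 <= self.hash_id <= 0xFFFF:
            raise ValueError("hash_id must fit in uint16")


@dataclass(frozen=True)
class Artifact:
    """Immutable logical value with an optional type tag."""
    data: bytes                     # the spec's `bytes`; renamed to avoid the builtin
    type_tag: Optional[int] = None  # TypeTag as a uint32 tag_id, when present

    def __post_init__(self) -> None:
        if self.type_tag is not None and not 0 <= self.type_tag <= 0xFFFFFFFF:
            raise ValueError("type_tag must fit in uint32")


# Two References with equal fields are the same ArtifactRef,
# regardless of where they were obtained.
a = Reference(hash_id=0x0001, digest=b"\x12\x34")
b = Reference(hash_id=0x0001, digest=b"\x12\x34")
```

Nothing in this model consults a store: a `Reference` is a usable value whether or not the Artifact it points to can be retrieved.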
### 0.3 Additional terminology
* **Node** — synonym for an ASL/1 `Reference` when used as a graph vertex.
* **EdgeBody** — the logical structure of a TGK edge (see §2.2).
* **EdgeArtifact** — an ASL/1 `Artifact` whose payload logically encodes an `EdgeBody` (see §3).
* **EdgeRef** — the ASL/1 `Reference` to an `EdgeArtifact`.
* **EdgeTypeId** — `uint32` identifying the semantic type of an edge (see §2.3).
* **ProvenanceGraph** — the logical graph derived from a set of Artifacts and TGK/1 edge semantics (see §4).
* **ExecutionEnvironment** — a concrete deployment context characterized by:
* a **logical snapshot**: a finite set of Artifacts visible at that point in time; and
* a fixed configuration of TGK-related profiles (edge encodings, type catalogs, provenance policies, etc.) “in effect” at that snapshot.
All invariants and uniqueness claims are evaluated with respect to such a finite snapshot.
> **Source-agnostic note (informative)**
> The `Artifacts` set for a snapshot may be aggregated from any combination of ASL/1-STORE instances, archives, exports, or other sources. TGK/1-CORE is indifferent to where Artifacts come from or how they are stored; it operates purely on their logical values and `Reference`s.
TGK/1-CORE defines only **logical structures** and their equality / identity semantics. Physical storage, indexes, query APIs, and provenance algorithms are defined by separate profiles.
---
## 1. Purpose, Scope & Non-Goals
### 1.1 Purpose
`TGK/1-CORE` defines the **minimal logical graph kernel over ASL/1 Artifacts**.
It provides:
* A definition of:
* **Nodes** as ASL/1 `Reference` values (ArtifactRefs); and
* **Edges** as EdgeArtifacts whose payloads decode to `EdgeBody` values.
* A way to view any snapshot of an ExecutionEnvironment (finite set of Artifacts + configured profiles) as a **ProvenanceGraph** that is a **pure projection** over:
* immutable Artifacts (including edge Artifacts), and
* published edge-type specifications and encoding profiles.
* A base vocabulary that higher profiles (PEL/1 integration, certification, facts, overlays, provenance) can use to declare:
* how they encode their relationships into edge Artifacts; and
* how provenance traces are computed as projections over the resulting graph.
In other words:
> TGK/1-CORE makes “graph over artifacts” a first-class, **purely logical** notion, with all evidence residing in ASL/1 Artifacts.
> **TGK/EDGE-AS-ARTIFACT/CORE/1**
> All TGK edges **MUST** be represented as ASL/1 Artifacts (“EdgeArtifacts”), and all references to edges **MUST** be ordinary ASL `Reference`s (“EdgeRef”). TGK/1-CORE **MUST NOT** introduce any separate identity scheme for edges.
### 1.2 Provenance kernel invariant & determinism
> **TGK/PROV-KERNEL/CORE/1**
> For any ExecutionEnvironment considered at a particular **logical snapshot** (a finite set of Artifacts and the profile set in effect at that point):
>
> * the corresponding `ProvenanceGraph` (as defined in §4) is a **pure function** of:
>
> * that Artifact set, and
> > * the profiles’ decoding / edge-derivation rules; and
> * any persisted graph indexes or materialized views are **optimizations only** and **MUST** be consistent with this projection.
> **TGK/DET/CORE/1**
> For a fixed snapshot and fixed profile set, any two TGK/1-CORE-conformant implementations **MUST** derive isomorphic `ProvenanceGraph`s (identical edge and node sets, up to set equality). No aspect of the graph may depend on wall-clock time, process identity, storage layout, or other non-declared environment state.
> **TGK/NO-OFF-GRAPH-PROV/CORE/1**
> Any relationship that is intended to participate in TGK-level provenance **MUST** be representable as:
>
> * an EdgeArtifact whose payload decodes to an `EdgeBody`, and
> * Nodes (ASL `Reference`s) in its `from` / `to` / `payload` fields.
>
> TGK/1-CORE and its profiles **MUST NOT** rely on hidden, mutable, non-Artifactual state to represent provenance-relevant relationships.
TGK/1-CORE itself does **not** define a particular provenance algorithm; that is the role of `TGK/PROV/1` and higher-layer profiles.
### 1.3 Non-goals
TGK/1-CORE explicitly does **not** define:
* Canonical binary encodings or hashing rules for edges (delegated to edge-encoding profiles such as `ENC/TGK1-EDGE` and the ASL substrate stack).
* Store APIs, physical graph storage, or indexing strategies (delegated to `TGK/STORE/1` and implementation design).
* Error codes, authorization, or transport protocols.
* A query or provenance language (delegated to `TGK/PROV/1`, overlays, or higher-level APIs).
* Global registration or semantics of particular `EdgeTypeId` values (delegated to catalogs and profiles).
TGK/1-CORE is a **logical kernel** only.
### 1.4 Layering and dependencies
TGK/1-CORE sits:
* **Above ASL/1-CORE**:
* Reuses `Artifact`, `Reference`, `TypeTag`, and identity semantics.
* Treats edge data as Artifacts; edge identities are ordinary `Reference`s.
* **Orthogonal to PEL/1**:
* MAY model PEL/1 executions as edges (via profiles).
* Does not impose runtime behavior on PEL/1 engines.
#### 1.4.1 Layering invariant with PEL/1
**TGK/PEL-LAYERING-INV/CORE/1**
* TGK/1-CORE **MUST NOT** impose additional runtime behavior or API obligations on conformant PEL/1 engines beyond those defined in `PEL/1`.
* Any TGK edges that describe PEL/1 executions **MUST** be derivable solely from stored ASL/1 Artifacts (programs, inputs, execution results, receipts) and published specifications.
* Whether a PEL/1 implementation emits edge Artifacts directly is an implementation detail and is **not** part of PEL/1 conformance.
---
## 2. Core Graph Model
### 2.1 Node
A **Node** in TGK/1-CORE is any ASL/1 `Reference`:
```text
Node := Reference // i.e., an ArtifactRef
```
Properties:
* Nodes are identified **only** by their `Reference` value.
* TGK/1-CORE does not distinguish “edge nodes” vs “data nodes”; that is a profile-level notion.
* There is no separate node ID layer; there are no node identifiers beyond `Reference`.
* The presence of a Node in the ProvenanceGraph is implied by its appearance in any TGK edge’s `from`, `to`, or `payload` fields (see §4.1).
> **Edges-over-edges note (informative)**
> Because Nodes are plain `Reference`s, they can point to any Artifact, including EdgeArtifacts. TGK/1-CORE therefore allows edges-over-edges (meta-edges that describe or govern other edges). The semantics of such patterns are determined by the profiles that define the relevant `EdgeTypeId` values.
### 2.2 EdgeBody
An **EdgeBody** is the logical content of a TGK edge:
```text
EdgeBody {
    type:    EdgeTypeId
    from:    Node[]     // ordered, MAY be empty
    to:      Node[]     // ordered, MAY be empty
    payload: Reference  // ArtifactRef, always present
}
```
Semantics and invariants:
* `type : EdgeTypeId`
Identifies the **kind** of relationship (e.g., execution, attestation, overlay mapping). Semantics of each `EdgeTypeId` are defined in separate specifications, not by TGK/1-CORE.
* `from : Node[]`
Ordered list of source nodes. MAY be empty. Order is semantically significant and part of the logical value.
* `to : Node[]`
Ordered list of target nodes. MAY be empty. Order is semantically significant.
* `payload : Reference`
A syntactically valid ASL/1 `Reference`, always present. TGK/1-CORE does **not** require that `payload` be resolvable in any particular store; existence is a deployment concern.
* **Non-emptiness constraint**
> **TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1**
> For a well-formed `EdgeBody`, at least one of `from` or `to` **MUST** be non-empty. An `EdgeBody` with `from = []` **and** `to = []` is invalid and MUST NOT be produced or accepted as a TGK edge.
> **TGK/PROV-EVIDENCE/CORE/1 (RECOMMENDED)**
> To support provenance, edge types that describe “how we got here” **SHOULD** ensure that:
>
> * `payload` references an Artifact whose content is a stable, replayable description of the relationship; and
> * the `from` and `to` node sets can, in principle, be recomputed from that payload and other Artifacts in the environment, according to the edge type’s profile.
>
> In edge types that use minimal descriptors as payload, those descriptors **SHOULD** themselves be defined such that their content is a deterministic function of the other Artifacts and parameters that define the relationship, so that edge Artifacts can always be re-derived.
**Duplicates and self-reference**
TGK/1-CORE does not forbid:
* duplicate entries within `from`,
* duplicate entries within `to`, or
* `payload` also appearing in `from` or `to`.
The semantics (if any) of such patterns are defined by the profiles that own the relevant `EdgeTypeId`. The kernel only requires that:
* `from` and `to` are ordered lists of syntactically valid ASL/1 `Reference`s; and
* they obey TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1.
TGK/1-CORE does **not** constrain how `EdgeBody` values are encoded into `Artifact.bytes`; this is the role of encoding profiles like `ENC/TGK1-EDGE`.
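As an informative sketch, the kernel-level invariants of this section can be checked with a few lines of code. The Python names are assumptions (`from_` replaces `from`, which is a keyword), and the check deliberately covers only what TGK/1-CORE requires; cardinality rules for specific edge types belong to their profiles.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class EdgeBody:
    type: int                     # EdgeTypeId (uint32)
    from_: Tuple[Reference, ...]  # ordered; `from` is a Python keyword
    to: Tuple[Reference, ...]     # ordered
    payload: Reference            # always present


def is_well_formed(body: EdgeBody) -> bool:
    """Check only the kernel-level invariants of section 2.2."""
    if not 0 <= body.type <= 0xFFFFFFFF:
        return False
    endpoints = body.from_ + body.to
    if not all(isinstance(n, Reference) for n in endpoints):
        return False
    if not isinstance(body.payload, Reference):
        return False
    # TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1: at least one side is non-empty.
    return len(endpoints) > 0


a, b, p = Reference(1, b"a"), Reference(1, b"b"), Reference(1, b"p")
ok = EdgeBody(type=7, from_=(a, b), to=(), payload=p)
swapped = EdgeBody(type=7, from_=(b, a), to=(), payload=p)  # a different edge
empty = EdgeBody(type=7, from_=(), to=(), payload=p)        # invalid
dupes = EdgeBody(type=7, from_=(a, a), to=(p,), payload=p)  # allowed by the kernel
```

Tuples are used so that order is part of equality: `ok` and `swapped` compare unequal even though they mention the same nodes.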
### 2.3 EdgeTypeId
`EdgeTypeId` identifies the semantic type of an edge:
```text
EdgeTypeId = uint32
```
Constraints:
* For any given ExecutionEnvironment snapshot, each `EdgeTypeId` that appears in TGK edges **MUST** have a single, well-defined and immutable semantics within that environment.
* TGK/1-CORE does not prescribe a global registration mechanism or reserved ranges.
* Catalogs such as `TGK/TYPES-CORE` typically bind `EdgeTypeId` values to human-readable names, owning profiles, and structural constraints (e.g. allowed cardinalities of `from` / `to`), but TGK/1-CORE does not standardize that surface.
**Unknown types**
* If an ExecutionEnvironment encounters an Artifact whose payload decodes to an `EdgeBody` whose `type` is not recognized in its configured catalogs/profile set, it **MUST** treat that Artifact as **not** forming a TGK edge for that environment:
* that Artifact does **not** qualify as an `EdgeArtifact` under §3.1; and
* it therefore contributes no edges or nodes to the ProvenanceGraph.
> **Environment-relative semantics (informative)**
> Recognition of `EdgeTypeId` values depends on the ExecutionEnvironment’s configured catalogs and profiles. As a result, the exact set of TGK edges derived from a fixed set of Artifacts may differ between environments. TGK/1-CORE considers this expected: the kernel guarantees determinism only *relative* to a given snapshot + profile set, not across all possible environments.
---
## 3. Edge Artifacts and Decoding
TGK/1-CORE uses **EdgeArtifacts** as the only concrete representation of edges.
### 3.1 EdgeArtifact definition
An **EdgeArtifact** is any ASL/1 `Artifact` that, relative to a given ExecutionEnvironment snapshot:
1. Has a `type_tag` whose `tag_id` is recognized (by the local profile set) as an edge tag; and
2. Has `bytes` that, under at least one applicable edge encoding profile, decode to a single well-formed `EdgeBody` value as defined in §2.2; and
3. Has an `EdgeBody.type` that is recognized (by the local profile set) as a supported `EdgeTypeId` for this environment (see §2.3).
Formally, for a given snapshot:
* Let `EDGE_TAG_SET` be the set of `TypeTag.tag_id` values configured as TGK edge tags.
* For each active edge encoding profile `P` in the environment:
* `P` provides a **partial** decoding function:
```text
decode_edge_payload_P : OctetString -> EdgeBody | error
```
which is a pure function of its input bytes.
> **Configuration origin note (informative)**
> `EDGE_TAG_SET` is derived from the ExecutionEnvironment’s configured TGK-related profiles and catalogs (e.g., `TGK/TYPES-CORE`, `ENC/TGK1-EDGE`), and/or from explicit deployment configuration. TGK/1-CORE does not prescribe how this configuration is stored, distributed, or governed; it only assumes that, for any snapshot, there is a well-defined set of `TypeTag.tag_id` values considered edge tags. In many deployments, one or more `TypeTag` values (e.g., a `TGK_EDGE_V1` tag) will be reserved specifically for edge Artifacts, but this is a convention, not a kernel requirement.
Then, an Artifact `A` is an EdgeArtifact iff:
* `A.type_tag` is present and `A.type_tag.tag_id ∈ EDGE_TAG_SET`; and
* there exists at least one active profile `P` such that:
```text
decode_edge_payload_P(A.bytes) = EdgeBody E // succeeds, no error
```
where `E` is a well-formed `EdgeBody` per §2.2; and
* `E.type` is recognized in the environment as a supported `EdgeTypeId` for TGK purposes (see §2.3).
Artifacts that satisfy the edge-tag and decoding constraints but whose decoded `EdgeBody.type` is not recognized as a supported `EdgeTypeId` for this environment MUST NOT be treated as EdgeArtifacts (see §2.3).
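As an informative sketch, the three edgehood conditions can be written as one predicate that returns the decoded `EdgeBody` or nothing. Everything concrete here is assumed for illustration: the tag value `TAG_EDGE`, the two edge type numbers, and the table-lookup decoder, which stands in for a real profile decoder and returns `None` where the spec says `error`. The multi-profile agreement rule of §3.2 is not checked in this sketch.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class Artifact:
    data: bytes                     # the spec's `bytes` field
    type_tag: Optional[int] = None  # tag_id, if a TypeTag is present


@dataclass(frozen=True)
class EdgeBody:
    type: int
    from_: Tuple[Reference, ...]
    to: Tuple[Reference, ...]
    payload: Reference


# A profile decoder; returning None stands in for the spec's `error` result.
Decoder = Callable[[bytes], Optional[EdgeBody]]


def as_edge_artifact(
    artifact: Artifact,
    edge_tag_set: AbstractSet[int],
    decoders: Iterable[Decoder],
    supported_types: AbstractSet[int],
) -> Optional[EdgeBody]:
    """Return the EdgeBody if `artifact` is an EdgeArtifact here, else None."""
    # Condition 1: the tag is present and configured as an edge tag.
    if artifact.type_tag is None or artifact.type_tag not in edge_tag_set:
        return None
    for decode in decoders:
        body = decode(artifact.data)
        # Condition 2: some profile decodes to a well-formed EdgeBody.
        if body is not None and (body.from_ or body.to):
            # Condition 3: the edge type is recognized in this environment.
            return body if body.type in supported_types else None
    return None


TAG_EDGE = 0x0100                 # hypothetical edge tag
EDGE_KNOWN, EDGE_UNKNOWN = 1, 2   # hypothetical EdgeTypeIds
a, p = Reference(1, b"a"), Reference(1, b"p")
table = {
    b"e1": EdgeBody(EDGE_KNOWN, (a,), (), p),
    b"e2": EdgeBody(EDGE_UNKNOWN, (a,), (), p),
}
decode = table.get  # toy decoder: a pure function of the input bytes
```

The same bytes can therefore be an edge in one environment and not in another, simply by changing `edge_tag_set` or `supported_types`.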
TGK/1-CORE does not prescribe:
* a particular `tag_id` for EdgeArtifacts; or
* a particular encoding for `EdgeBody` into `Artifact.bytes`.
Those are the responsibility of edge encoding profiles and catalogs.
> **Single-edge-per-artifact invariant (informative)**
> TGK/1-CORE assumes each EdgeArtifact encodes exactly one `EdgeBody` and thus one logical edge. Bundling multiple logical edges into a single Artifact is outside the TGK/1-CORE model and, if needed, **SHOULD** be expressed as multiple EdgeArtifacts (e.g., via an index or bundle Artifact that refers to other EdgeArtifacts).
> **Environment-relative edgehood (informative)**
> An Artifact can be an EdgeArtifact in one ExecutionEnvironment (given its profile set) and not in another. TGK/1-CORE defines edgehood relative to the configured profiles, not as an intrinsic property of the Artifact alone.
### 3.2 Edge decoding and multi-profile behavior
For each active edge encoding profile `P`:
* The function `decode_edge_payload_P` **MUST** be:
* **partial** — returns either:
* a successfully decoded `EdgeBody`, or
* an error signaling “not a valid edge payload for this profile”;
* **deterministic** — no hidden state, randomness, or external configuration affects its output.
Additional constraints:
* For Artifacts whose `type_tag.tag_id ∉ EDGE_TAG_SET`, all edge encoding profiles **MUST** treat `decode_edge_payload_P` as not applicable (always error) and **MUST NOT** attempt to reinterpret arbitrary non-edge-tag Artifacts as TGK edges.
* For Artifacts whose `type_tag.tag_id ∈ EDGE_TAG_SET`:
* It is **permitted** that some active profiles do not apply (they simply return an error).
* If more than one active profile successfully decodes `A.bytes`, then all those profiles **MUST** decode to the **same** logical `EdgeBody` value. If two active profiles decode the same Artifact to different `EdgeBody` values, the ExecutionEnvironment is misconfigured and **MUST NOT** treat that Artifact as an EdgeArtifact until the conflict is resolved.
> **TGK/EDGE-PROFILE-RECOMMEND/CORE/1 (RECOMMENDED)**
> For operational simplicity, ExecutionEnvironments **SHOULD** configure at most one active edge-encoding profile for any given edge `TypeTag.tag_id` at a time. When multiple profiles may apply to the same EdgeArtifacts (e.g., during a migration), they **MUST** be governed so that any Artifact accepted by more than one profile decodes to the same `EdgeBody`.
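
As an informative sketch, the agreement rule can be enforced by collecting every successful decode and accepting the payload only when all results are equal. The `ProfileConflict` exception and the table-lookup decoders are assumptions for illustration; a real environment might log the conflict and skip the Artifact rather than raise.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class EdgeBody:
    type: int
    from_: Tuple[Reference, ...]
    to: Tuple[Reference, ...]
    payload: Reference


Decoder = Callable[[bytes], Optional[EdgeBody]]  # None stands in for `error`


class ProfileConflict(Exception):
    """Two active profiles decoded the same bytes to different EdgeBody values."""


def resolve_edge_body(data: bytes, decoders: Iterable[Decoder]) -> Optional[EdgeBody]:
    """Return the single agreed EdgeBody, None if no profile applies."""
    results = {body for body in (decode(data) for decode in decoders) if body is not None}
    if len(results) > 1:
        # Misconfigured environment: the Artifact must not count as an edge.
        raise ProfileConflict(f"{len(results)} distinct decodings")
    return next(iter(results), None)


a, b, p = Reference(1, b"a"), Reference(1, b"b"), Reference(1, b"p")
body_1 = EdgeBody(1, (a,), (b,), p)
body_2 = EdgeBody(1, (b,), (a,), p)

old_profile = {b"x": body_1}.get
new_profile = {b"x": body_1, b"y": body_2}.get  # migration-safe superset
bad_profile = {b"x": body_2}.get                # disagrees on b"x"
```

Here `old_profile` and `new_profile` can be active together, while adding `bad_profile` makes `b"x"` unusable until the configuration is fixed.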
### 3.3 EdgeRef
An **EdgeRef** is simply the ASL/1 `Reference` to an EdgeArtifact:
```text
EdgeRef := Reference // reference to an EdgeArtifact
```
Properties:
* No new identity scheme is introduced for edges.
* The identity and equality of EdgeArtifacts and EdgeRefs are fully governed by ASL/1-CORE (canonical encoding + hashing via `ENC/ASL1-CORE` and `HASH/ASL1`).
* For a fixed canonical Artifact encoding and hash profile:
* equality of EdgeRefs is equivalent to equality of the underlying EdgeArtifacts; and
* given that the applicable encoding profile is injective, also equivalent (modulo cryptographic collision assumptions) to equality of their logical `EdgeBody` values.
> **Duplicate logical edges (informative)**
> In most deployments, a given logical edge type and encoding will produce a unique EdgeArtifact for a given `EdgeBody`, because canonical encoding + ASL hashing make that Artifact and its `Reference` unique. Distinct `EdgeRef` values that encode semantically equivalent relationships can still arise if different `TypeTag` / encoding / profile combinations are used to express the same relationship. TGK/1-CORE does not attempt to normalize such cases; higher-layer profiles MAY choose to detect or coalesce them.
> **Store interaction note (informative)**
> Any ASL/1-STORE that holds EdgeArtifacts can be used to resolve `EdgeRef` via normal `get(Reference)` semantics. TGK/1-CORE does not define a separate persistence layer for edges; they are ordinary Artifacts as far as ASL/1-STORE is concerned.
### 3.4 Relationship between EdgeArtifact and EdgeBody
For each EdgeArtifact:
```text
A_edge : Artifact
Ref_edge : Reference // derived per ASL/1-CORE
Body : EdgeBody // Body = EdgeBody(A_edge) via the unique decoding result
```
The mapping `EdgeBody(A_edge)` is determined by the environment’s active profiles and MUST obey the determinism and well-formedness constraints above.
Encoding profiles such as `ENC/TGK1-EDGE` define:
* the concrete layout of `EdgeBody` into `Artifact.bytes`; and
* how `TypeTag` values map to particular edge schemas.
---
## 4. ProvenanceGraph as Projection
### 4.1 Graph derived from Artifacts
Given:
* a finite snapshot set of Artifacts `Artifacts`; and
* a fixed set of active edge-encoding profiles and type catalogs (the **profile set**) in an ExecutionEnvironment,
the **ProvenanceGraph** induced by `Artifacts` and the profile set is the pair:
```text
ProvenanceGraph {
    Nodes: set<Node>
    Edges: set<(EdgeRef, EdgeBody)>
}
```
defined as follows:
1. **Edges**
* Let `EdgeArtifacts ⊆ Artifacts` be the subset of Artifacts that qualify as EdgeArtifacts under §3.1 and §3.2.
* For each `A_edge ∈ EdgeArtifacts`:
* Let `Ref_edge` be its ASL `Reference`.
* Let `Body = EdgeBody(A_edge)` be the decoded `EdgeBody`.
Then:
```text
Edges = { (Ref_edge, Body) | A_edge ∈ EdgeArtifacts }
```
2. **Nodes**
Nodes are all ArtifactRefs that appear anywhere in edges:
```text
Nodes = {
    n : Reference |
    ∃ (Ref_edge, Body) ∈ Edges such that
    n ∈ Body.from ∪ Body.to ∪ { Body.payload }
}
```
Clarifications:
* The `Nodes` set includes only `Reference`s that participate in at least one edge as source, target, or payload.
* Artifacts (and their References) that have no incoming or outgoing edges are **not** included in the ProvenanceGraph by TGK/1-CORE. Profiles MAY define derived views that treat all Artifacts as degree-zero nodes, but this is outside the TGK/1-CORE kernel.
* TGK/1-CORE does **not** require that every `Node` in `Nodes` correspond to an Artifact present in the `Artifacts` set. The ProvenanceGraph is a graph over the **Reference space**. Whether a given `Reference` is resolvable to an Artifact in a particular store or federation is outside this kernel and is governed by `ASL/1-STORE` and deployment policy.
> **TGK/GRAPH-PROJECTION/CORE/1**
> For a fixed snapshot set of Artifacts and a fixed profile set, the ProvenanceGraph, as defined above, is unique. Implementations MAY cache or index edges and nodes, but **MUST NOT** introduce logical edges that cannot be derived from EdgeArtifacts and the profiles in effect at that snapshot.
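
As an informative sketch, the projection is a single pass over the snapshot that keeps only EdgeArtifacts and collects the References they mention. Two pieces are stand-ins and are labelled as such: `toy_ref` hashes the tag and bytes with SHA-256 and is **not** the `ENC/ASL1-CORE` + `HASH/ASL1` derivation, and `edge_body_of` is a lookup table replacing the edgehood test of §3.1 and §3.2.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class Artifact:
    data: bytes
    type_tag: Optional[int] = None


@dataclass(frozen=True)
class EdgeBody:
    type: int
    from_: Tuple[Reference, ...]
    to: Tuple[Reference, ...]
    payload: Reference


def project(
    artifacts: Iterable[Artifact],
    ref_of: Callable[[Artifact], Reference],
    edge_body_of: Callable[[Artifact], Optional[EdgeBody]],
) -> Tuple[Set[Reference], Set[Tuple[Reference, EdgeBody]]]:
    """Derive (Nodes, Edges) from a snapshot; non-edges contribute nothing."""
    nodes: Set[Reference] = set()
    edges: Set[Tuple[Reference, EdgeBody]] = set()
    for artifact in artifacts:
        body = edge_body_of(artifact)
        if body is None:
            continue  # not an EdgeArtifact in this environment
        edges.add((ref_of(artifact), body))
        nodes.update(body.from_, body.to, (body.payload,))
    return nodes, edges


def toy_ref(artifact: Artifact) -> Reference:
    # Stand-in only: NOT the canonical ASL/1 encoding or hash profile.
    tag = b"" if artifact.type_tag is None else artifact.type_tag.to_bytes(4, "big")
    return Reference(0xFFFF, hashlib.sha256(tag + artifact.data).digest())


src = Artifact(b"source document")
out = Artifact(b"derived output")
lonely = Artifact(b"no edge mentions this artifact")
edge = Artifact(b"edge-1", type_tag=0x0100)  # hypothetical edge tag
bodies = {edge: EdgeBody(1, (toy_ref(src),), (toy_ref(out),), toy_ref(out))}

nodes, edges = project([src, out, edge, lonely], toy_ref, bodies.get)
```

The result is built from sets, so it does not depend on the order in which the snapshot is enumerated, and `lonely` never becomes a node because no edge refers to it.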
### 4.2 Informative: provenance traces
TGK/1-CORE does **not** define provenance or trace operations normatively. However, it is intended to be the substrate for:
* `TGK/PROV/1`, which defines:
* provenance policies (e.g., “which edge types participate”), and
* trace operators (e.g., backwards reachability) over `ProvenanceGraph`.
As an informative sketch, a backwards provenance operator would:
* start from a set of target `Node`s (ArtifactRefs);
* walk backwards along edges whose `EdgeTypeId` is selected by some policy; and
* stop on reaching nodes that the policy considers roots.
Any such operator **MUST**, when specified in `TGK/PROV/1`, be defined purely as a projection over `ProvenanceGraph`, consistent with `TGK/PROV-KERNEL/CORE/1`, `TGK/DET/CORE/1`, and `TGK/NO-OFF-GRAPH-PROV/CORE/1`.
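The backwards walk sketched above can be illustrated, again informatively, as reachability over the `Edges` set. This is not the `TGK/PROV/1` operator: the policy is reduced to a set of allowed `EdgeTypeId` values, roots are simply nodes with no further incoming edges of those types, and the two edge type numbers are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import AbstractSet, Dict, FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class EdgeBody:
    type: int
    from_: Tuple[Reference, ...]
    to: Tuple[Reference, ...]
    payload: Reference


def trace_back(
    edges: Iterable[Tuple[Reference, EdgeBody]],
    targets: AbstractSet[Reference],
    allowed_types: AbstractSet[int],
) -> FrozenSet[Reference]:
    """All nodes from which some target is reachable via allowed edge types."""
    # Index: target node -> source nodes of the selected edges pointing at it.
    incoming: Dict[Reference, Set[Reference]] = defaultdict(set)
    for _edge_ref, body in edges:
        if body.type in allowed_types:
            for dst in body.to:
                incoming[dst].update(body.from_)
    seen: Set[Reference] = set(targets)
    frontier = list(seen)
    while frontier:
        node = frontier.pop()
        for src in incoming.get(node, ()):
            if src not in seen:  # also guards against cycles
                seen.add(src)
                frontier.append(src)
    return frozenset(seen)


EDGE_DERIVES, EDGE_ATTESTS = 1, 2  # placeholder EdgeTypeIds
raw, mid, final, cert = (Reference(1, n) for n in (b"raw", b"mid", b"final", b"cert"))
edges = {
    (Reference(1, b"e1"), EdgeBody(EDGE_DERIVES, (raw,), (mid,), mid)),
    (Reference(1, b"e2"), EdgeBody(EDGE_DERIVES, (mid,), (final,), final)),
    (Reference(1, b"e3"), EdgeBody(EDGE_ATTESTS, (cert,), (final,), cert)),
}
```

The function reads only its arguments, which keeps it a pure projection over the graph: the same edges, targets, and policy always give the same answer.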
---
## 5. Interaction with Other Layers (Informative)
### 5.1 Interaction with PEL/1
A PEL/1 execution typically involves:
* a `Program` ArtifactRef,
* zero or more input ArtifactRefs,
* an `ExecutionResult` ArtifactRef that references output ArtifactRefs.
A profile such as `TGK/PEL/1` can define:
* a specific `EdgeTypeId` (e.g., `EDGE_EXECUTION`); and
* an edge encoding that maps PEL/1 execution payloads to an `EdgeBody`:
```text
EdgeBody.type = EDGE_EXECUTION
EdgeBody.from = [program_ref] ++ input_refs[]
EdgeBody.to = output_refs[] ++ [execution_result_ref]
EdgeBody.payload = execution_result_ref
```
Then, for each execution, an EdgeArtifact is produced (by the runtime or an ingestion tool) with:
* a TGK edge `TypeTag`, and
* a payload encoding that an edge profile (e.g., `ENC/TGK1-EDGE`) decodes to such an `EdgeBody`.
The resulting ProvenanceGraph expresses execution relationships as edges over ArtifactRefs.
TGK/1-CORE does not require PEL/1 engines to emit such edge Artifacts; they MAY be derived post hoc from stored Artifacts.
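As an informative sketch, an ingestion tool could assemble the `EdgeBody` for one execution as follows, reading the `from` and `to` lines of the mapping as ordered list concatenation. The numeric value of `EDGE_EXECUTION` and the function name are assumptions; the real assignment and any structural constraints would come from a profile such as `TGK/PEL/1`.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class EdgeBody:
    type: int
    from_: Tuple[Reference, ...]  # `from` is a Python keyword
    to: Tuple[Reference, ...]
    payload: Reference


EDGE_EXECUTION = 0x0000_0001  # hypothetical EdgeTypeId


def execution_edge(
    program_ref: Reference,
    input_refs: Sequence[Reference],
    output_refs: Sequence[Reference],
    execution_result_ref: Reference,
) -> EdgeBody:
    """Build the logical edge for one execution from stored References only."""
    return EdgeBody(
        type=EDGE_EXECUTION,
        from_=(program_ref, *input_refs),          # program first, then inputs
        to=(*output_refs, execution_result_ref),   # outputs, then the result
        payload=execution_result_ref,
    )


prog = Reference(1, b"program")
in1, in2 = Reference(1, b"input-1"), Reference(1, b"input-2")
out1 = Reference(1, b"output-1")
result = Reference(1, b"execution-result")

body = execution_edge(prog, [in1, in2], [out1], result)
```

The builder takes nothing but References that are already recoverable from stored Artifacts, so it can run long after the execution itself, which is what allows edges to be derived post hoc.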
### 5.2 Interaction with CIL/1
CIL/1 defines certificate Artifacts. A profile (e.g., `TGK/CIL/1`) can specify:
* `EdgeTypeId = EDGE_ATTESTS`.
For each certificate Artifact `cert_ref` whose subject is `subject_ref`:
```text
EdgeBody.type = EDGE_ATTESTS
EdgeBody.from = [cert_ref]
EdgeBody.to = [subject_ref]
EdgeBody.payload = cert_ref
```
EdgeArtifacts that encode these `EdgeBody` values make certificate relationships explicit in the ProvenanceGraph.
TGK/1-CORE itself does not verify signatures or policies; CIL/1 and governance profiles do.
### 5.3 Interaction with FCT/1, FER/1, OI/1
Profiles can similarly define:
* evidence-to-fact edges (e.g., `EDGE_FACT_SUPPORTS`),
* overlay mapping edges (e.g., `EDGE_OVERLAY_MAPS`),
* other domain relationships.
The common pattern is:
* define an `EdgeTypeId`;
* define how to encode a logical `EdgeBody` into an EdgeArtifact payload;
* derive the graph as in §4.1.
TGK/1-CORE itself is agnostic to those semantics.
---
## 6. Conformance
An implementation is **TGK/1-CORE-conformant** if and only if it satisfies all of the following:
1. **Node model**
* Treats any ASL/1 `Reference` as a potential Node (`Node := Reference`).
* Does not introduce a separate node identity layer for TGK purposes.
2. **Edge artifacts and decoding**
* Defines (via configuration or companion specs) which `TypeTag.tag_id` values represent TGK edge Artifacts (`EDGE_TAG_SET`).
* For each active edge encoding profile `P`, provides a partial, deterministic decoder:
```text
decode_edge_payload_P : OctetString -> EdgeBody | error
```
that:
* succeeds (returns `EdgeBody`) exactly for payloads considered valid edges under profile `P`; and
* returns an error otherwise.
* For any Artifact `A` with `A.type_tag.tag_id ∉ EDGE_TAG_SET`, all edge profiles **MUST** treat `decode_edge_payload_P` as not applicable (always error) and **MUST NOT** attempt to interpret `A.bytes` as a TGK edge payload.
* For any Artifact `A` with `A.type_tag.tag_id ∈ EDGE_TAG_SET`:
* `A` is an EdgeArtifact only if at least one active profile successfully decodes `A.bytes` to a well-formed `EdgeBody` whose `type` is recognized as a supported `EdgeTypeId` in the environment.
* If more than one active profile decodes `A.bytes` successfully, they **MUST** all decode it to the same logical `EdgeBody`. If they do not, the environment **MUST NOT** treat `A` as an EdgeArtifact until the inconsistency is resolved.
3. **EdgeBody invariants**
* Treats as well-formed only those `EdgeBody` values that satisfy §2.2:
* `from` and `to` are ordered lists of syntactically valid ASL/1 `Reference`s;
* they satisfy TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1; and
* `payload` is always a syntactically valid ASL/1 `Reference` and always present.
* Edge encoding profiles **MUST** reject payloads that would decode to an `EdgeBody` violating these invariants.
4. **Graph projection**
* Given:
* a finite snapshot set of Artifacts; and
* the configured edge tags + decoding rules (profile set),
* it can construct the ProvenanceGraph as in §4.1:
* Edge set derived from EdgeArtifacts;
* Node set derived from `from`, `to`, and `payload` fields of `EdgeBody` values.
* Any graph indexes or caches it exposes **MUST** be consistent with this projection (`TGK/GRAPH-PROJECTION/CORE/1`, `TGK/DET/CORE/1`).
5. **Immutability**
* Treats EdgeArtifacts as immutable, as required by ASL/1-CORE.
* Does not attempt to “edit” an edge in place; logical changes **MUST** be represented by new Artifacts (edge Artifacts and/or other Artifacts) rather than mutating existing ones.
6. **Layering invariant with PEL/1**
* Respects `TGK/PEL-LAYERING-INV/CORE/1`:
* Does not impose additional requirements on PEL/1 engines beyond those in `PEL/1`.
* Allows PEL/1-related edge profiles to be implemented either by the runtime or by ingestion tools, without affecting PEL/1 conformance.
7. **Profile compatibility**
* If it claims to implement specific TGK-related profiles (e.g., `TGK/PEL/1`, `TGK/CIL/1`, `TGK/PROV/1`), it **MUST**:
* interpret `EdgeTypeId` and edge payloads according to those profiles; and
* ensure that all edges defined by those profiles can be represented as EdgeArtifacts consistent with TGK/1-CORE.
Everything else — canonical encodings for `EdgeBody`, edge hashing, graph store APIs, provenance algorithms, error models — belongs to:
* edge encoding profiles (`ENC/TGK1-EDGE`),
* storage/query profiles (`TGK/STORE/1`), and
* provenance profiles (`TGK/PROV/1`) and higher semantic layers (`FCT/1`, `FER/1`, `OI/1`, etc.).
---
## 7. Evolution (Informative)
TGK/1-CORE is intended to evolve **additively**:
* New edge types are introduced by assigning new `EdgeTypeId` values in catalogs and profiles.
* New edge tags are introduced by assigning new `TypeTag.tag_id` values to EdgeArtifacts.
* New encodings are introduced by adding new edge encoding profiles and decoders.
Existing EdgeArtifacts and their decoded `EdgeBody` values:
* **MUST NOT** be retroactively reinterpreted to have different logical meaning under TGK/1-CORE; and
* **MUST** remain valid inputs to any future profile sets that claim to support their `TypeTag` and `EdgeTypeId`, subject to the multi-profile behavior rules in §3.2.
Introducing a new edge-encoding profile that begins to treat previously non-edge Artifacts (e.g., with a new `TypeTag` or a previously unused `EdgeTypeId`) as EdgeArtifacts is allowed and considered an additive extension.
It is **not** permitted to change an existing profile or catalog in a way that causes an Artifact that previously decoded to a given `EdgeBody` (under a given `(TypeTag, EdgeTypeId)` and profile set) to be decoded to a different `EdgeBody` in the same environment. Such changes **SHOULD** instead be modeled via new `TypeTag` values and/or new `EdgeTypeId` assignments.
This aligns TGK/1-CORE with the broader Amduat design principle of **“never rewrite history; evolve by addition and projection.”**
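As an informative sketch, a proposed profile change can be screened against known edge payloads before it is activated. The helper name and the table-lookup decoders are assumptions. The check is deliberately conservative: it flags a payload whose decoding changes and also one that stops decoding, even though the latter only violates the rule when the new profile set still claims to support that `TypeTag` and `EdgeTypeId`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int
    digest: bytes


@dataclass(frozen=True)
class EdgeBody:
    type: int
    from_: Tuple[Reference, ...]
    to: Tuple[Reference, ...]
    payload: Reference


Decoder = Callable[[bytes], Optional[EdgeBody]]  # None stands in for `error`


def rewritten_payloads(
    old_decode: Decoder,
    new_decode: Decoder,
    payloads: Iterable[bytes],
) -> List[bytes]:
    """Payloads that were edges before and would decode differently (or not at all)."""
    changed = []
    for data in payloads:
        before = old_decode(data)
        if before is not None and new_decode(data) != before:
            changed.append(data)
    return changed


a, b, p = Reference(1, b"a"), Reference(1, b"b"), Reference(1, b"p")
body_a = EdgeBody(1, (a,), (b,), p)
body_b = EdgeBody(1, (b,), (a,), p)

old = {b"e1": body_a}.get
additive = {b"e1": body_a, b"e2": body_b}.get  # only adds a new decoding
rewrite = {b"e1": body_b}.get                  # reinterprets an existing edge
known = [b"e1", b"e2"]
```

An empty result for `additive` and a non-empty one for `rewrite` is the distinction the evolution rule draws: new edges may appear, existing ones may not change meaning.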
---
## Document History
* **0.7.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.

1839
tier1/tgk-prov-1.md Normal file

File diff suppressed because it is too large

1156
tier1/tgk-store-1.md Normal file

File diff suppressed because it is too large