diff --git a/tier1/enc-tgk1-edge-1.md b/tier1/enc-tgk1-edge-1.md new file mode 100644 index 0000000..1c67b41 --- /dev/null +++ b/tier1/enc-tgk1-edge-1.md @@ -0,0 +1,738 @@ +# ENC/TGK1-EDGE/1 — Canonical Encoding for TGK EdgeArtifacts + +Status: Approved +Owner: Niklas Rydberg +Version: 0.1.0 +SoT: Yes +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [binary-minimalism, traceability] + + + +**Document ID:** `ENC/TGK1-EDGE/1` +**Profile ID:** `TGK1_EDGE_ENC_V1 = 0x0201` (symbolic; concrete assignment lives in encoding-profile registry) +**Layer:** Edge Encoding Profile (on top of ASL/1-CORE + TGK/1-CORE) + +**Depends on (normative):** + +* `ASL/1-CORE v0.4.x` — value model (`Artifact`, `TypeTag`, `Reference`, `HashId`, identity model) +* `ENC/ASL1-CORE v1.x` — canonical encodings for `Artifact` and `Reference` +* `TGK/1-CORE v0.7.x` — trace graph kernel (`Node`, `EdgeBody`, `EdgeTypeId`, edgehood invariants) + +**Integrates with (informative):** + +* `HASH/ASL1 v0.2.x` — ASL1 hash family for `EdgeRef` identity +* `ASL/1-STORE v0.4.x` — content-addressable store holding EdgeArtifacts +* `SUBSTRATE/STACK-OVERVIEW v0.2.x` — stack layering discipline +* TGK type catalogs (e.g. `TGK/TYPES-CORE`) — `EdgeTypeId` semantics +* Future TGK profiles (`TGK/STORE/1`, `TGK/PROV/1`) that interpret edges + +> The Profile ID `TGK1_EDGE_ENC_V1` is a configuration label. +> It is **not** embedded into edge payloads. Encoders and decoders select this encoding by context (type tag + profile configuration), not per value. + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + + +--- + +## 0. Overview + +`ENC/TGK1-EDGE/1` defines the **canonical, streaming-friendly, injective binary encoding** of the `EdgeBody` structure from `TGK/1-CORE`: + +```text +EdgeBody { + type: EdgeTypeId // uint32 + from: Node[] // Node = Reference + to: Node[] + payload: Reference +} +``` + +and its embedding as TGK **EdgeArtifacts**: + +```text +Artifact { + bytes = EdgeBytes // this profile + type_tag = TYPE_TAG_TGK1_EDGE_V1 +} +``` + +where `EdgeBytes` is a single `OctetString` (sequence of bytes) used as `Artifact.bytes`. + +Under this profile: + +* `EdgeBytes` is the canonical representation of an `EdgeBody`. +* Edge identity is the ASL/1 `Reference` over the EdgeArtifact (`EdgeRef`), derived via `ENC/ASL1-CORE` + `HASH/ASL1`. +* The encoding is: + + * **Injective** — distinct `EdgeBody` values → distinct `EdgeBytes`. + * **Deterministic & stable** — same `EdgeBody` → same `EdgeBytes` across implementations and time. + * **Streaming-friendly** — encoders, decoders, and hashers can operate in a single forward-only pass. + +In line with `TGK/1-CORE`: + +* Each EdgeArtifact encodes **exactly one** logical edge (one `EdgeBody`). +* All TGK edges are represented as ordinary ASL/1 Artifacts plus their ASL `Reference` identities; this profile introduces no additional identity or node/edge ID layer. + +> **Non-goal:** This profile does **not** define what any particular `EdgeTypeId` “means”, nor how graphs are stored, indexed, or traversed. Those behaviors are defined by `TGK/1-CORE`, TGK type catalogs, and higher-layer profiles. + +--- + +## 1. Scope & Layering + +### 1.1 Purpose + +This specification defines: + +* The **binary layout** of: + + * `EdgeBytes` — canonical encoding for `EdgeBody`. + * `EncodedRef` — an internal wrapper for embedding ASL `Reference`s. + +* Canonical field ordering and integer widths. + +* How `EdgeBytes` are bound into EdgeArtifacts and converted into `EdgeRef` identity. + +It does **not** define: + +* TGK graph semantics or provenance algorithms (`TGK/1-CORE`, `TGK/PROV/1`). +* Store or transport APIs (`ASL/1-STORE`, deployment profiles). +* Edge-type catalogs (`TGK/TYPES-*`) or policy. + +### 1.2 Layering constraints + +In line with `SUBSTRATE/STACK-OVERVIEW` and `TGK/1-CORE`: + +* `ENC/TGK1-EDGE/1` is a **TGK edge-encoding profile**, not a kernel primitive. + +* It MUST NOT: + + * redefine `Artifact`, `Reference`, `HashId`, or `TypeTag` (from `ASL/1-CORE`); + * redefine `Node`, `EdgeBody`, or `EdgeTypeId` (from `TGK/1-CORE`); + * embed store, provenance, or policy semantics into its layout. + +* It defines exactly one canonical encoding for `EdgeBody` values under the profile ID `TGK1_EDGE_ENC_V1`. + +TGK/1-CORE sees this profile as providing a partial function: + +```text +decode_edge_payload_TGK1_EDGE : + OctetString -> EdgeBody | error +``` + +that is: + +* **partial** — may fail with an error for some inputs; +* **deterministic** — a pure function of its input bytes, with no dependence on environment or mutable state; +* **side-effect free** — decoding does not consult stores, catalogs, or policy. + +Artifacts whose `type_tag` selects this profile use `decode_edge_payload_TGK1_EDGE` as their TGK edge decoder in the sense of `TGK/1-CORE §3.2`. + +--- + +## 2. Conventions + +### 2.1 RFC 2119 terms + +The key words **MUST**, **MUST NOT**, **SHOULD**, **MAY**, etc. are to be interpreted as described in RFC 2119. + +### 2.2 Integer encodings + +All multi-byte integers are encoded as **big-endian** (network byte order), as in `ENC/ASL1-CORE`: + +* `u8` — 1 byte +* `u16` — 2 bytes +* `u32` — 4 bytes +* `u64` — 8 bytes + +Only fixed-width integers are used. + +### 2.3 Lists + +A list of values of some type `T` is encoded as: + +```text +List :: + count (u32) + element_0 + element_1 + ... + element_{count-1} +``` + +* `count` is the number of elements (MAY be zero). +* Elements are encoded in order using the canonical encoding of `T`. + +### 2.4 Embedded Reference (`EncodedRef`) + +Within `EdgeBytes`, ASL/1 `Reference` values are embedded using a length-prefixed wrapper over canonical `ReferenceBytes` from `ENC/ASL1-CORE`: + +```text +EncodedRef :: + ref_len (u32) + ref_bytes (byte[0..ref_len-1]) // canonical ReferenceBytes +``` + +Where: + +* `ref_bytes` MUST be the canonical `ReferenceBytes` encoding of some `Reference` value under `ENC/ASL1-CORE v1.x`: + + ```text + ReferenceBytes :: + hash_id (u16) + digest (byte[...]) // remaining bytes in the frame + ``` + +* `ref_len` MUST be the exact byte length of `ref_bytes` and MUST be ≥ 2. + +Decoders MUST: + +1. Read `ref_len (u32)`. +2. Read exactly `ref_len` bytes as `ref_bytes`. +3. Decode `ref_bytes` as `ReferenceBytes` per `ENC/ASL1-CORE v1.x`. +4. Reject encodings where: + + * `ref_len < 2`, or + * `ref_bytes` is not a valid `ReferenceBytes` sequence (e.g. truncated or improperly framed in its context). + +If the implementation also implements `HASH/ASL1` and recognizes the decoded `hash_id`, it MUST apply any length checks required by `ENC/ASL1-CORE` / `HASH/ASL1` for that `HashId` (e.g. fixed digest length). Failures MUST be treated as encoding/integrity errors. + +`EncodedRef` is purely an internal framing wrapper for this profile; it introduces no additional semantics beyond “a `Reference` encoded canonically and length-prefixed so it can be embedded in larger structures”. + +This pattern mirrors `EncodedRef` from `ENC/PEL-TRACE-DAG/1` for cross-profile consistency. + +### 2.5 Encoding version field (`edge_version`) + +`EdgeBytes` includes an `edge_version (u16)` field: + +* For `TGK1_EDGE_ENC_V1`, encoders **MUST** always write `edge_version = 1`. +* Decoders for this profile: + + * **MUST** accept `edge_version = 1`; and + * **MUST** treat any other value as “**not this encoding**” and fail decoding. + +Within this profile, `edge_version` is a **guard word**, not an evolution mechanism: + +* This document will never assign any other meaning than “constant value 1” to `edge_version` for `TGK1_EDGE_ENC_V1`. +* Values other than `1` simply indicate that the bytes are not an `EdgeBytes` value for this profile. + +Any incompatible change to the `EdgeBytes` layout MUST be expressed as a **new encoding profile** (e.g. `TGK1_EDGE_ENC_V2` with its own Profile ID, and almost certainly a new `TypeTag`), not by reusing this profile with `edge_version = 2`. + +Append-only extensions that would change the canonical mapping from `EdgeBody` to bytes are also out of scope for this profile; they belong in new profiles. Canonical `EdgeBody → EdgeBytes` mapping for `TGK1_EDGE_ENC_V1` is fixed and permanently tied to `edge_version = 1`. + +--- + +## 3. Logical Model Reference (from TGK/1-CORE) + +> **Source of truth:** `TGK/1-CORE`. +> This section is an informative restatement; in any conflict, `TGK/1-CORE` governs. + +### 3.1 Node + +```text +Node := Reference // ASL/1 Reference +``` + +Nodes are graph vertices identified solely by their `Reference` value. + +### 3.2 EdgeTypeId + +```text +EdgeTypeId = uint32 +``` + +Semantics of particular `EdgeTypeId` values are defined by TGK type catalogs and profiles, not by this document. + +### 3.3 EdgeBody + +```text +EdgeBody { + type: EdgeTypeId + from: Node[] // ordered, MAY be empty + to: Node[] // ordered, MAY be empty + payload: Reference // always present +} +``` + +Relevant invariant from `TGK/1-CORE`: + +> **TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1** +> For a well-formed `EdgeBody`, at least one of `from` or `to` **MUST** be non-empty. +> An `EdgeBody` with both `from = []` and `to = []` is invalid and MUST NOT be produced or accepted as a TGK edge. + +Other notes from `TGK/1-CORE`: + +* Duplicates within `from` or `to` are allowed. +* `payload` may also appear in `from` or `to`. +* Semantics of such patterns, if any, are profile-specific. + +`ENC/TGK1-EDGE/1` encodes exactly these fields and MUST NOT introduce additional logical data at the `EdgeBody` level. + +--- + +## 4. EdgeBody Encoding + +### 4.1 Overall layout: `EdgeBytes` + +The canonical encoding of an `EdgeBody` under `TGK1_EDGE_ENC_V1` is a single self-contained byte sequence: + +```text +EdgeBytes :: + edge_version (u16) + type_id (u32) // EdgeTypeId + from_count (u32) + from_nodes (EncodedRef[0..from_count-1]) + to_count (u32) + to_nodes (EncodedRef[0..to_count-1]) + payload_ref (EncodedRef) +``` + +`EdgeBytes` is treated as an indivisible frame. When embedded in larger structures or protocols, the enclosing layer is responsible for providing the frame boundaries (e.g. via a length-prefix or message framing). + +Field roles: + +1. **edge_version (u16)** + + * Guard word for this encoding profile. + * For `TGK1_EDGE_ENC_V1`, encoders **MUST** set `edge_version = 1` for all values. + * Decoders for this profile: + + * **MUST** accept `edge_version = 1`; and + * **MUST** treat any other value as “not a `TGK1_EDGE_ENC_V1` edge payload” and fail decoding. + + `edge_version` is not a version knob for evolving `TGK1_EDGE_ENC_V1`; it is a constant sanity check to quickly reject mismatched bytes. + +2. **type_id (u32)** + + * Encodes `EdgeBody.type : EdgeTypeId`. + * The meaning of each `EdgeTypeId` value is external to this spec. + +3. **from_count (u32)** and **from_nodes** + + * `from_count` is the length of `EdgeBody.from`. + * `from_nodes` is a list of `from_count` `EncodedRef` entries, each encoding a `Node` (i.e. a `Reference`). + * Order MUST match the logical `from` list; duplicates are allowed; MAY be zero-length. + +4. **to_count (u32)** and **to_nodes** + + * `to_count` is the length of `EdgeBody.to`. + * `to_nodes` is a list of `to_count` `EncodedRef` entries. + * Order MUST match the logical `to` list; duplicates are allowed; MAY be zero-length. + +5. **payload_ref (EncodedRef)** + + * Encodes `EdgeBody.payload : Reference`. + * Always present and encoded as a single `EncodedRef`. + +### 4.2 Encoding procedure (normative) + +Let `E` be a logical `EdgeBody` value. The canonical encoding function: + +```text +encode_edgebody_tgk1_v1 : EdgeBody -> EdgeBytes +``` + +is defined as: + +1. Set `edge_version = 1`. + +2. Emit `edge_version` as `u16`. + +3. Emit `E.type` as `type_id (u32)`. + +4. Let `from_count = len(E.from)`; emit `from_count (u32)`. + +5. For each `Node` in `E.from` in order: + + * Let `R` be that `Node` (an ASL `Reference` value). + * Encode `R` as canonical `ReferenceBytes` using `ENC/ASL1-CORE v1.x`. + * Wrap as `EncodedRef` (see §2.4) and append. + +6. Let `to_count = len(E.to)`; emit `to_count (u32)`. + +7. For each `Node` in `E.to` in order: + + * Encode as `EncodedRef` as above and append. + +8. Encode `E.payload` as canonical `ReferenceBytes`, wrap as `EncodedRef`, and append as `payload_ref`. + +9. Enforce the TGK non-empty endpoint invariant at encoding time: + + * If `from_count == 0` **and** `to_count == 0`, the encoder MUST fail and MUST NOT produce `EdgeBytes` for this `EdgeBody` under this profile. + +> **TGK1-EDGE-NONEMPTY/ENC/1** +> Encoders for `TGK1_EDGE_ENC_V1` **MUST** reject any attempt to encode an `EdgeBody` with `from = []` and `to = []`. +> Such a value is not a well-formed TGK edge per `TGK/1-CORE` and MUST NOT be emitted as an EdgeArtifact payload. + +### 4.3 Decoding procedure (normative) + +Given a byte slice known to contain exactly one `EdgeBytes` frame under this profile, the canonical decoding function: + +```text +decode_edgebody_tgk1_v1 : EdgeBytes -> EdgeBody | error +``` + +is defined as: + +1. Read `edge_version (u16)`. + + * If `edge_version != 1`, fail with an encoding error (e.g. “not `TGK1_EDGE_ENC_V1`”). + +2. Read `type_id (u32)`. + +3. Read `from_count (u32)`. + + * For `i = 0 .. from_count-1`, read and decode one `EncodedRef` as a `Reference` and append to `from_nodes`. + +4. Read `to_count (u32)`. + + * For `j = 0 .. to_count-1`, read and decode one `EncodedRef` and append to `to_nodes`. + +5. Read `payload_ref` as a single `EncodedRef` and decode to `payload : Reference`. + +6. If `from_count == 0` **and** `to_count == 0`, fail with an encoding error: + + * This violates `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1` and `TGK1-EDGE-NONEMPTY/ENC/1`. + +7. If the decoding context expects an isolated `EdgeBytes` value: + + * After step 5 (or 6), if any unread bytes remain in the slice, the decoder MUST treat this as an encoding error (trailing data). + +8. Construct and return: + + ```text + EdgeBody { + type = EdgeTypeId(type_id) + from = from_nodes + to = to_nodes + payload = payload + } + ``` + +Decoders MUST additionally treat as encoding errors: + +* truncated sequences (insufficient bytes for any declared field or `EncodedRef`); +* invalid `EncodedRef` encodings (see §2.4); +* any integer reads that cannot be completed because the input ends early. + +`decode_edgebody_tgk1_v1` MUST be deterministic and MUST NOT depend on any external configuration beyond: + +* the bytes in the `EdgeBytes` frame; and +* the static definition of `ENC/ASL1-CORE v1.x` used to decode embedded `ReferenceBytes`. + +Recognition of `type_id` values (as supported or not in a given ExecutionEnvironment) is handled by `TGK/1-CORE` and the local catalog. This profile always decodes the raw `EdgeBody` structure, regardless of whether the environment later chooses to treat it as an EdgeArtifact. + +--- + +## 5. EdgeArtifact Binding & Profile Selection + +### 5.1 EdgeArtifact shape + +Under this profile, EdgeArtifacts MUST be ASL/1 Artifacts of the form: + +```text +Artifact { + bytes = EdgeBytes + type_tag = TYPE_TAG_TGK1_EDGE_V1 +} +``` + +Where: + +* `TYPE_TAG_TGK1_EDGE_V1` is a `TypeTag` whose concrete `tag_id`: + + * is assigned in the global TypeTag registry, and + * is included in the environment’s `EDGE_TAG_SET` when this profile is active. + +ExecutionEnvironments that wish to treat such Artifacts as TGK edges MUST: + +* include `TYPE_TAG_TGK1_EDGE_V1.tag_id` in their configured `EDGE_TAG_SET`; and +* register `TGK1_EDGE_ENC_V1` as the edge-encoding profile for that tag, so that `decode_edge_payload_TGK1_EDGE` is used for those Artifacts’ `bytes`. + +This document treats `TYPE_TAG_TGK1_EDGE_V1` symbolically and does not assign a numeric `tag_id`. + +### 5.2 Integration with TGK/1-CORE’s `decode_edge_payload_P` + +For ExecutionEnvironments that activate `TGK1_EDGE_ENC_V1` for `TYPE_TAG_TGK1_EDGE_V1`, the corresponding `decode_edge_payload_P` function from `TGK/1-CORE §3.2` is: + +```text +decode_edge_payload_TGK1_EDGE(bytes: OctetString) -> EdgeBody | error +``` + +defined by: + +```text +decode_edgebody_tgk1_v1(bytes) +``` + +from §4.3. + +Conformant implementations MUST: + +* apply `decode_edge_payload_TGK1_EDGE` only to Artifacts whose `type_tag.tag_id` is configured to use this profile; and +* treat any decoding failure as “not a valid edge payload for this profile”. + +Multi-profile behavior (e.g., co-existence with other edge encodings) is governed by `TGK/1-CORE §3.2`. In particular: + +* If more than one active profile successfully decodes the same `Artifact.bytes`, all such profiles MUST decode to the same logical `EdgeBody` value. +* If two active profiles decode the same bytes to different `EdgeBody` values, the ExecutionEnvironment MUST NOT treat that Artifact as an EdgeArtifact until the conflict is resolved. + +--- + +## 6. EdgeRef Identity via ASL/1-CORE + +Given: + +* `EdgeBytes` from §4; + +* an `EdgeArtifact`: + + ```text + A_edge = Artifact { + bytes = EdgeBytes + type_tag = TYPE_TAG_TGK1_EDGE_V1 + } + ``` + +* `ENC/ASL1-CORE v1.x` for canonical `ArtifactBytes`; + +* a hash algorithm `H` with `HashId = HID` from `HASH/ASL1`, + +the canonical `EdgeRef : Reference` (the edge identity) is: + +```text +ArtifactBytes = encode_artifact_core_v1(A_edge) +digest = H(ArtifactBytes) +EdgeRef = Reference { hash_id = HID, digest = digest } +``` + +This profile does not introduce any new identity scheme. Edge identity is entirely determined by: + +* the ASL/1 Artifact identity model, +* the selected encoding profile (typically `ASL_ENC_CORE_V1`), and +* the selected hash algorithm (`HASH/ASL1`). + +--- + +## 7. Canonicality & Injectivity + +### 7.1 Injectivity + +> **TGK1-EDGE-INJECTIVE/ENC/1** +> Under `TGK1_EDGE_ENC_V1`, the mapping: +> +> ```text +> EdgeBody -> EdgeBytes +> ``` +> +> MUST be injective. That is, for any two `EdgeBody` values `E1` and `E2`: +> +> ```text +> E1 != E2 ⇒ encode_edgebody_tgk1_v1(E1) != encode_edgebody_tgk1_v1(E2) +> ``` + +This is ensured by: + +* encoding all logical fields (`type`, `from`, `to`, `payload`); +* preserving list order exactly; +* using a fixed, explicit binary layout. + +### 7.2 Stability + +For the fixed profile `TGK1_EDGE_ENC_V1` (with the guard word `edge_version = 1`): + +* The same logical `EdgeBody` MUST always encode to the same `EdgeBytes` across: + + * implementations, + * platforms, + * executions, + * and time. + +Encoders MUST NOT: + +* reorder elements of `from` or `to`; +* alter integer widths or endianness; +* introduce alternative layouts for any field; +* use any `edge_version` other than `1`. + +--- + +## 8. Error Handling (Encoding Layer) + +Decoders for this profile MUST treat as **encoding errors** (to be surfaced as some error category at the API boundary): + +1. **Guard word mismatch** + + * `edge_version != 1`. + +2. **Truncated fields** + + * Not enough bytes to read any declared field (`u16`, `u32`, `EncodedRef`, list elements). + +3. **Invalid `EncodedRef`** + + * `ref_len < 2`; or + * `ref_bytes` is not a valid `ReferenceBytes` sequence per `ENC/ASL1-CORE v1.x`; or + * (when `HASH/ASL1` is implemented and `hash_id` is known) the digest length implied by `ref_bytes` does not match the canonical length for that `HashId`. + +4. **Empty endpoints** + + * `from_count == 0` **and** `to_count == 0` (violation of `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1`). + +5. **Inconsistent list lengths** + + * Fewer actual `EncodedRef` entries than indicated by `from_count` or `to_count`. + +6. **Trailing data in isolated contexts** + + * Additional bytes remaining after a full `EdgeBytes` value has been decoded, when the decoding context expects exactly one `EdgeBytes` frame. + +Translating these into concrete error codes (e.g. `ERR_TGK1_EDGE_ENC_INVALID`) is implementation-specific, but MUST result in rejection of the payload as an `EdgeBytes` value under this profile. + +Semantic errors about `EdgeTypeId` recognition or edge-type-specific constraints are handled by TGK catalogs and higher profiles, not at the encoding layer. + +--- + +## 9. Streaming & Implementation Notes + +Implementations MUST be able to encode and decode `EdgeBytes` in a **single forward-only pass**: + +* All length prefixes (`from_count`, `to_count`, `ref_len`) precede their content. +* Decoders MUST NOT require backtracking to interpret the structure. + +For large edges (many endpoints): + +* Encoders MAY stream `EncodedRef` entries as they are generated. +* Decoders MAY stream `EncodedRef` entries to consumers or hashers as they are read. + +Any such streaming strategy MUST be observationally equivalent to decoding the entire `EdgeBytes` into an `EdgeBody` in memory and MUST respect the canonical layout. + +--- + +## 10. Conformance + +An implementation is **ENC/TGK1-EDGE/1–conformant** if, for `TGK1_EDGE_ENC_V1`, it: + +1. **Implements canonical EdgeBody encoding/decoding** + + * Implements `encode_edgebody_tgk1_v1` and `decode_edgebody_tgk1_v1` exactly as specified in §4. + * Always writes `edge_version = 1` when encoding. + * Accepts only `edge_version = 1` and treats any other value as “not this encoding”. + +2. **Uses `EncodedRef` correctly** + + * Embeds `Reference` values via `EncodedRef` as in §2.4. + * Uses canonical `ReferenceBytes` from `ENC/ASL1-CORE v1.x` when forming `ref_bytes`. + * Applies `HASH/ASL1` length checks for known `HashId`s when available. + +3. **Enforces TGK invariants at the encoding layer** + + * Rejects encodings with both `from` and `to` empty (`TGK1-EDGE-NONEMPTY/ENC/1`). + * Treats malformed payloads as encoding errors as per §8. + +4. **Binds EdgeBytes into EdgeArtifacts correctly** + + * When forming EdgeArtifacts, sets: + + ```text + Artifact.bytes = EdgeBytes + Artifact.type_tag = TYPE_TAG_TGK1_EDGE_V1 + ``` + + * Does not embed additional logical data into the Artifact beyond `EdgeBody` and `type_tag`. + +5. **Derives EdgeRef identity via ASL/1-CORE** + + * Uses `ENC/ASL1-CORE v1` and `HASH/ASL1` for identity, as in §6. + * Does not introduce alternative edge identity mechanisms at this layer. + +6. **Integrates with TGK/1-CORE profile selection** + + * Applies `decode_edge_payload_TGK1_EDGE` only to Artifacts whose `type_tag.tag_id` is configured for this profile. + * Respects multi-profile behavior rules from `TGK/1-CORE §3.2` when other edge encodings are also active. + +7. **Preserves injectivity and stability** + + * Distinct `EdgeBody` values always produce distinct `EdgeBytes`. + * The same `EdgeBody` always produces the same `EdgeBytes` under this profile. + +Everything else — storage layout, access protocols, graph indexes, provenance algorithms, and edge-type semantics — is defined by `ASL/1-STORE`, `TGK/1-CORE`, TGK catalogs, and higher-layer profiles. + +--- + +## 11. Informative Example (Sketch) + +> Non-normative; values and hex are illustrative only. + +Consider an edge: + +```text +EdgeBody { + type = 0x00000010 // EDGE_EXECUTION (for example) + from = [N_prog, N_input] + to = [N_output] + payload = R_receipt +} +``` + +Where `N_prog`, `N_input`, `N_output`, and `R_receipt` are `Reference` values with canonical `ReferenceBytes`: + +```text +Ref(N_prog) = ReferenceBytes(N_prog) // length = len_pg, bytes = bytes_pg +Ref(N_input) = ReferenceBytes(N_input) // length = len_in, bytes = bytes_in +Ref(N_output) = ReferenceBytes(N_output) // length = len_out, bytes = bytes_out +Ref(R_receipt) = ReferenceBytes(R_receipt) // length = len_rc, bytes = bytes_rc +``` + +Then `EdgeBytes` under this profile are: + +```text +edge_version = 0001 ; u16 (guard word) + +type_id = 00000010 ; u32 + +from_count = 00000002 ; 2 sources +from_nodes = + 000000?? bytes_pg ... ; EncodedRef(N_prog) + 000000?? bytes_in ... ; EncodedRef(N_input) + +to_count = 00000001 ; 1 target +to_nodes = + 000000?? bytes_out ... ; EncodedRef(N_output) + +payload_ref = + 000000?? bytes_rc ... ; EncodedRef(R_receipt) +``` + +Where each `EncodedRef(X)` is: + +```text +ref_len(X) (u32) || ReferenceBytes(X) +``` + +These `EdgeBytes` become `Artifact.bytes` for an EdgeArtifact with `type_tag = TYPE_TAG_TGK1_EDGE_V1`. All conformant encoders MUST produce the same bytes for the same logical `EdgeBody`; all conformant decoders MUST reconstruct the same `EdgeBody` from those bytes. + +--- + +**End of `ENC/TGK1-EDGE/1 v0.1.0 — Canonical Encoding for TGK EdgeArtifacts` (draft).** + +--- + +## Document History + +* **0.1.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline. diff --git a/tier1/opreg-tgk-docgraph-1.md b/tier1/opreg-tgk-docgraph-1.md new file mode 100644 index 0000000..7073109 --- /dev/null +++ b/tier1/opreg-tgk-docgraph-1.md @@ -0,0 +1,240 @@ +# OPREG/TGK-DOCGRAPH/1 — Document Graph Registry + +Status: Draft +Owner: Architecture +Version: 0.1.0 +SoT: Plan +Last Updated: 2025-12-01 +Linked Phase Pack: PH12 +Tags: [registry, tgk, docgraph] + + + +**Document ID:** `OPREG/TGK-DOCGRAPH/1` +**Layer:** L1 Profile (TGK Doc Graph Registry over `TGK/1-CORE` + `ENC/TGK1-EDGE/1`) + +**Depends on (normative):** + +* `ASL/1-CORE v0.4.x` — `Artifact`, `Reference`, `TypeTag`, `HashId` +* `ENC/ASL1-CORE v1.x` — canonical encodings for Artifacts and References +* `HASH/ASL1 v0.2.x` — ASL1 hash family (`HASH-ASL1-256`) +* `TGK/1-CORE v0.7.x` — trace graph kernel: `Node`, `EdgeBody`, `EdgeTypeId` +* `ENC/TGK1-EDGE/1 v0.1.x` — canonical encoding for `EdgeBody` / EdgeArtifacts +* `AMDUAT-DOCID` (Tier-0) — document identity and SoT/surface model + +**Integrates with (informative):** + +* `TGK/STORE/1` — graph store/query profile over ASL/1-STORE + TGK +* ADR-032 and PH10/PH12 import designs (RΩ / export) +* Future doc graph consumers (assistant overlays, IDX, provenance views) + +© 2025 Amduat Programme. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + +--- + +## 0. Purpose and Non-Goals + +### 0.1 Purpose + +`OPREG/TGK-DOCGRAPH/1` defines a **doc/import/navigation graph registry** for Amduat: + +* It names **node concepts** (as ASL/1 Artifacts) for: + * conceptual documents (DOCID lineages), + * document versions at a given snapshot (e.g. RΩ), + * Git commits and blobs, + * Amduat SoT instances. +* It names **edge types** (`EdgeTypeId`s) that connect those concepts: + * document ↔ version, surface, SoT state, + * version ↔ Git blob/commit, + * document ↔ Amduat instance. +* It constrains how those edges are represented as EdgeArtifacts under + `ENC/TGK1-EDGE/1` and consumed via `TGK/STORE/1`. + +This registry is intentionally **doc/import scoped**. Execution, fact, and +certificate edges live in their own TGK/OPREG registries and MUST NOT reuse +`EdgeTypeId` assignments from this doc graph registry. + +This Tier-1 stub is the **canonical registry companion** to the PH12 design +note `PH12-EV-IMPORT-001 — Doc Graph OPREG Profile Design +(/logs/ph12/evidence/import/PH12-EV-IMPORT-001/opreg-tgk-docgraph-design-20251201.md)`, +which records design intent and sandbox experience; this document is the SoT +for the node and edge vocabulary. + +### 0.2 Non-goals + +This registry does **not** define: + +* any storage API (`ASL/1-STORE`, `TGK/STORE/1` already cover that), +* any provenance algorithms or queries (`TGK/PROV/1` and higher layers), +* any assistant or overlay behavior (those consume this registry), +* concrete import/export profiles (ADR-032 handles those). + +It only defines **concepts and edge types**; encoding and storage use existing +Tier-1 profiles. + +--- + +## 1. Node Concepts (Informative overview) + +This section summarizes node concepts; canonical encodings and type_tags are +defined in companion encoding profiles (TBD). + +### 1.1 DOC_CONCEPT + +Conceptual governed document identity per `AMDUAT-DOCID`: + +* `identity_authority` (string), +* `lineage_id` (string), +* optional `doc_code` (string), +* optional `code_status` (e.g. `tentative`, `stable`). + +There is exactly one `DOC_CONCEPT` node per `(identity_authority, lineage_id)`. + +### 1.2 DOC_VERSION + +Versioned SoT slice of a governed document at a snapshot commit: + +* `identity_authority`, `lineage_id`, `doc_code`, `code_status`, +* `g_commit` (Git commit id), +* `sha256` (content hash of the doc bytes at `g_commit`), +* `path` (repository path at `g_commit`, e.g. `/amduat/tier0/docid.md`), +* `surface`, `sot` (SoT state) per DOCID header. + +Multiple `DOC_VERSION` nodes may exist for a `DOC_CONCEPT` across commits. + +### 1.3 GIT_COMMIT + +Git commit metadata: + +* `commit` (sha1), +* `parents` (list of parent commit ids), +* `tree` (tree id), +* `author_name`, `author_email`, `authored_at`, +* `committer_name`, `committer_email`, `committed_at`, +* summary or truncated message. + +### 1.4 GIT_BLOB + +Content snapshot for a single blob at `g_commit`: + +* `blob_sha` (sha1), +* `sha256` (content hash), +* `size_bytes`, +* `mode` (tree mode, including exec/symlink bits), +* `path` at `g_commit`. + +### 1.5 AMDUAT_INSTANCE + +Descriptor for an Amduat SoT instance: + +* `g_commit` (RΩ commit), +* `store_root` (SoT store root), +* `store_backend_id`, +* references to RΩ FER/1 receipts and manifests, +* optional labels (environment, hostname, etc.). + +### 1.6 Helper nodes + +* `SURFACE` — surface classification nodes (e.g. `tier0`, `tier1`, `phase`, `evidence`). +* `SOT_STATE` — SoT state nodes (`Yes`, `Plan`, `Ref`). + +--- + +## 2. Edge Types (Doc Graph Domain) + +`EdgeTypeId` values in this registry are reserved for doc/import/navigation +edges. Concrete numeric assignments live in the encoding/catalogue layer. + +Implementations and other OPREG registries MUST treat these `EdgeTypeId`s as +belonging exclusively to the **Amduat doc graph domain**: + +* the eventual allocation for this registry is expected to reserve a contiguous + `EdgeTypeId` band (informally: an `AMDUAT-DOCGRAPH` band), +* only doc/import/navigation semantics (edges in §§2.1–2.4) may occupy that + band, +* PEL execution, FER/1, CIL, FCT, and other TGK domains MUST use their own + registries and bands. + +### 2.1 Identity & version edges + +* `EDGE_DOC_HAS_VERSION` + `DOC_CONCEPT → DOC_VERSION` — this version belongs to this conceptual document. + +* `EDGE_VERSION_OF` + `DOC_VERSION → DOC_CONCEPT` — reverse link; derivable from `EDGE_DOC_HAS_VERSION`. + +* `EDGE_DOC_HAS_IDENTITY` + `DOC_VERSION → DOC_CONCEPT` — DOCID identity is attached to this version. + +### 2.2 Surface & SoT edges + +* `EDGE_DOC_ON_SURFACE` + `DOC_VERSION → SURFACE` — surface classification (governance/spec/phase/evidence). + +* `EDGE_DOC_SOT` + `DOC_VERSION → SOT_STATE` — SoT status (`Yes`, `Plan`, `Ref`) for this version. + +### 2.3 Git provenance edges + +* `EDGE_VERSION_HAS_BLOB` + `DOC_VERSION → GIT_BLOB` — ties a document version to the blob at `g_commit`. + +* `EDGE_VERSION_FROM_COMMIT` + `DOC_VERSION → GIT_COMMIT` — last commit that touched this path at/before the snapshot. + +### 2.4 SoT instance edges + +* `EDGE_DOC_MEMBER_OF_AMDUAT` + `DOC_CONCEPT → AMDUAT_INSTANCE` — this document is part of a particular Amduat instance. + +--- + +## 3. Encoding & Store Integration (Summary) + +All doc-graph edges: + +* are represented as TGK `EdgeBody` values with `EdgeTypeId` from this registry, +* are encoded as EdgeArtifacts via `ENC/TGK1-EDGE/1` using `TYPE_TAG_TGK1_EDGE_V1`, +* derive `EdgeRef` identities via `HASH/ASL1` over `EdgeBytes`, +* live in ASL/1-STORE instances alongside other Artifacts. + +Nodes (`DOC_CONCEPT`, `DOC_VERSION`, `GIT_COMMIT`, `GIT_BLOB`, `AMDUAT_INSTANCE`, etc.) are ordinary +ASL/1 Artifacts; their `Reference`s are the TGK nodes. + +`TGK/STORE/1` provides query semantics over the resulting graph. + +JSON overlays or other projected views (for example, PH12 doc graph sandboxes) +MAY be emitted for human navigation and experiments, but they are always +derived from the underlying node Artifacts and EdgeArtifacts governed by this +registry and `ENC/TGK1-EDGE/1`; overlays are never the source of truth for +doc graph semantics. + +--- + +## 4. Ingest & Encoder Interaction (Informative) + +Implementations are expected to: + +* materialise node Artifacts per this registry (and companion encoding profiles), +* emit FER/1 receipts for ingest pipelines, +* emit an idempotent edge worklist (doc-edge queue) that references `EdgeTypeId`s + from this registry and node `Reference`s, +* use a separate encoder to turn worklist items into EdgeArtifacts using `ENC/TGK1-EDGE/1`, + writing them into ASL/1-STORE for consumption via `TGK/STORE/1`. + +Details of worklist format and encoder scheduling are left to PH12/PHB01 +implementation notes; this registry only fixes the conceptual node/edge space. diff --git a/tier1/tgk-1-core.md b/tier1/tgk-1-core.md new file mode 100644 index 0000000..80dc823 --- /dev/null +++ b/tier1/tgk-1-core.md @@ -0,0 +1,671 @@ + +# TGK/1-CORE — Trace Graph Kernel (Core) + +Status: Approved +Owner: Niklas Rydberg +Version: 0.7.0 +SoT: Yes +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [traceability, execution] + + + +**Document ID:** `TGK/1-CORE` +**Layer:** L1.5 — Logical graph kernel over ASL/1 (above ASL/1, orthogonal to PEL/1) + +**Depends on (normative):** + +* `ASL/1-CORE v0.3.x` — value substrate: `Artifact`, `Reference`, `TypeTag`, identity model + +**Informative references:** + +* `ENC/ASL1-CORE v1.0.x` — canonical encodings for ASL/1 values (`ArtifactBytes`, `ReferenceBytes`) +* `HASH/ASL1 v0.2.x` — ASL1 hash family +* `ASL/1-STORE v0.3.x` — content-addressable store semantics +* `PEL/1` — execution substrate +* `CIL/1`, `FCT/1`, `FER/1`, `OI/1` — higher-layer profiles built on top of TGK/1 +* (future) `ENC/TGK1-EDGE` — canonical edge-encoding profile +* (future) `TGK/STORE/1` — graph store and query semantics +* (future) `TGK/PROV/1` — provenance and trace semantics + +> **Versioning note** +> TGK/1-CORE is agnostic to minor revisions of these informative documents, provided they preserve: +> +> * the ASL/1-CORE definitions of `Artifact`, `Reference`, and `TypeTag`, and +> * the existence of canonical encodings and hash families consistent with that model. + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + + +--- + +## 0. Conventions + +### 0.1 RFC 2119 terminology + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, +**SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** are to be +interpreted as described in RFC 2119. + +### 0.2 Terms from ASL/1 + +This specification reuses the following terms from `ASL/1-CORE`: + +* **Artifact** — immutable logical value: + + ```text + Artifact { + bytes: OctetString + type_tag: optional TypeTag + } + ``` + +* **Reference** — content address (logical identity handle) for an Artifact: + + ```text + Reference { + hash_id: HashId + digest: OctetString + } + ``` + +* **TypeTag** — opaque `uint32` identifying intended interpretation of an Artifact. + +* **HashId** — `uint16` identifying a hash algorithm (e.g. from `HASH/ASL1`). + +Where this document says **ArtifactRef**, it means an ASL/1 `Reference` that (logically) points to an `Artifact`. TGK/1-CORE does **not** assume the corresponding Artifact is present or retrievable in any particular store. + +### 0.3 Additional terminology + +* **Node** — synonym for an ASL/1 `Reference` when used as a graph vertex. +* **EdgeBody** — the logical structure of a TGK edge (see §2.2). +* **EdgeArtifact** — an ASL/1 `Artifact` whose payload logically encodes an `EdgeBody` (see §3). +* **EdgeRef** — the ASL/1 `Reference` to an `EdgeArtifact`. +* **EdgeTypeId** — `uint32` identifying the semantic type of an edge (see §2.3). +* **ProvenanceGraph** — the logical graph derived from a set of Artifacts and TGK/1 edge semantics (see §4). +* **ExecutionEnvironment** — a concrete deployment context characterized by: + + * a **logical snapshot**: a finite set of Artifacts visible at that point in time; and + * a fixed configuration of TGK-related profiles (edge encodings, type catalogs, provenance policies, etc.) “in effect” at that snapshot. + +All invariants and uniqueness claims are evaluated with respect to such a finite snapshot. + +> **Source-agnostic note (informative)** +> The `Artifacts` set for a snapshot may be aggregated from any combination of ASL/1-STORE instances, archives, exports, or other sources. TGK/1-CORE is indifferent to where Artifacts come from or how they are stored; it operates purely on their logical values and `Reference`s. + +TGK/1-CORE defines only **logical structures** and their equality / identity semantics. Physical storage, indexes, query APIs, and provenance algorithms are defined by separate profiles. + +--- + +## 1. Purpose, Scope & Non-Goals + +### 1.1 Purpose + +`TGK/1-CORE` defines the **minimal logical graph kernel over ASL/1 Artifacts**. + +It provides: + +* A definition of: + + * **Nodes** as ASL/1 `Reference` values (ArtifactRefs); and + * **Edges** as EdgeArtifacts whose payloads decode to `EdgeBody` values. +* A way to view any snapshot of an ExecutionEnvironment (finite set of Artifacts + configured profiles) as a **ProvenanceGraph** that is a **pure projection** over: + + * immutable Artifacts (including edge Artifacts), and + * published edge-type specifications and encoding profiles. +* A base vocabulary that higher profiles (PEL/1 integration, certification, facts, overlays, provenance) can use to declare: + + * how they encode their relationships into edge Artifacts; and + * how provenance traces are computed as projections over the resulting graph. + +In other words: + +> TGK/1-CORE makes “graph over artifacts” a first-class, **purely logical** notion, with all evidence residing in ASL/1 Artifacts. + +> **TGK/EDGE-AS-ARTIFACT/CORE/1** +> All TGK edges **MUST** be represented as ASL/1 Artifacts (“EdgeArtifacts”), and all references to edges **MUST** be ordinary ASL `Reference`s (“EdgeRef”). TGK/1-CORE **MUST NOT** introduce any separate identity scheme for edges. + +### 1.2 Provenance kernel invariant & determinism + +> **TGK/PROV-KERNEL/CORE/1** +> For any ExecutionEnvironment considered at a particular **logical snapshot** (a finite set of Artifacts and the profile set in effect at that point): +> +> * the corresponding `ProvenanceGraph` (as defined in §4) is a **pure function** of: +> +> * that Artifact set, and +> * the profiles’ decoding / edge-derivation rules; and +> * any persisted graph indexes or materialized views are **optimizations only** and **MUST** be consistent with this projection. + +> **TGK/DET/CORE/1** +> For a fixed snapshot and fixed profile set, any two TGK/1-CORE–conformant implementations **MUST** derive isomorphic `ProvenanceGraph`s (identical edge and node sets, up to set equality). No aspect of the graph may depend on wall-clock time, process identity, storage layout, or other non-declared environment state. + +> **TGK/NO-OFF-GRAPH-PROV/CORE/1** +> Any relationship that is intended to participate in TGK-level provenance **MUST** be representable as: +> +> * an EdgeArtifact whose payload decodes to an `EdgeBody`, and +> * Nodes (ASL `Reference`s) in its `from` / `to` / `payload` fields. +> +> TGK/1-CORE and its profiles **MUST NOT** rely on hidden, mutable, non-Artifactual state to represent provenance-relevant relationships. + +TGK/1-CORE itself does **not** define a particular provenance algorithm; that is the role of `TGK/PROV/1` and higher-layer profiles. + +### 1.3 Non-goals + +TGK/1-CORE explicitly does **not** define: + +* Canonical binary encodings or hashing rules for edges (delegated to edge-encoding profiles such as `ENC/TGK1-EDGE` and the ASL substrate stack). +* Store APIs, physical graph storage, or indexing strategies (delegated to `TGK/STORE/1` and implementation design). +* Error codes, authorization, or transport protocols. +* A query or provenance language (delegated to `TGK/PROV/1`, overlays, or higher-level APIs). +* Global registration or semantics of particular `EdgeTypeId` values (delegated to catalogs and profiles). + +TGK/1-CORE is a **logical kernel** only. + +### 1.4 Layering and dependencies + +TGK/1-CORE sits: + +* **Above ASL/1-CORE**: + + * Reuses `Artifact`, `Reference`, `TypeTag`, and identity semantics. + * Treats edge data as Artifacts; edge identities are ordinary `Reference`s. + +* **Orthogonal to PEL/1**: + + * MAY model PEL/1 executions as edges (via profiles). + * Does not impose runtime behavior on PEL/1 engines. + +#### 1.4.1 Layering invariant with PEL/1 + +**TGK/PEL-LAYERING-INV/CORE/1** + +* TGK/1-CORE **MUST NOT** impose additional runtime behavior or API obligations on conformant PEL/1 engines beyond those defined in `PEL/1`. +* Any TGK edges that describe PEL/1 executions **MUST** be derivable solely from stored ASL/1 Artifacts (programs, inputs, execution results, receipts) and published specifications. +* Whether a PEL/1 implementation emits edge Artifacts directly is an implementation detail and is **not** part of PEL/1 conformance. + +--- + +## 2. Core Graph Model + +### 2.1 Node + +A **Node** in TGK/1-CORE is any ASL/1 `Reference`: + +```text +Node := Reference // i.e., an ArtifactRef +``` + +Properties: + +* Nodes are identified **only** by their `Reference` value. +* TGK/1-CORE does not distinguish “edge nodes” vs “data nodes”; that is a profile-level notion. +* There is no separate node ID layer; there are no node identifiers beyond `Reference`. +* The presence of a Node in the ProvenanceGraph is implied by its appearance in any TGK edge’s `from`, `to`, or `payload` fields (see §4.1). + +> **Edges-over-edges note (informative)** +> Because Nodes are plain `Reference`s, they can point to any Artifact, including EdgeArtifacts. TGK/1-CORE therefore allows edges-over-edges (meta-edges that describe or govern other edges). The semantics of such patterns are determined by the profiles that define the relevant `EdgeTypeId` values. + +### 2.2 EdgeBody + +An **EdgeBody** is the logical content of a TGK edge: + +```text +EdgeBody { + type: EdgeTypeId + from: Node[] // ordered, MAY be empty + to: Node[] // ordered, MAY be empty + payload: Reference // ArtifactRef, always present +} +``` + +Semantics and invariants: + +* `type : EdgeTypeId` + Identifies the **kind** of relationship (e.g., execution, attestation, overlay mapping). Semantics of each `EdgeTypeId` are defined in separate specifications, not by TGK/1-CORE. + +* `from : Node[]` + Ordered list of source nodes. MAY be empty. Order is semantically significant and part of the logical value. + +* `to : Node[]` + Ordered list of target nodes. MAY be empty. Order is semantically significant. + +* `payload : Reference` + A syntactically valid ASL/1 `Reference`, always present. TGK/1-CORE does **not** require that `payload` be resolvable in any particular store; existence is a deployment concern. + +* **Non-emptiness constraint** + + > **TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1** + > For a well-formed `EdgeBody`, at least one of `from` or `to` **MUST** be non-empty. An `EdgeBody` with `from = []` **and** `to = []` is invalid and MUST NOT be produced or accepted as a TGK edge. + +> **TGK/PROV-EVIDENCE/CORE/1 (RECOMMENDED)** +> To support provenance, edge types that describe “how we got here” **SHOULD** ensure that: +> +> * `payload` references an Artifact whose content is a stable, replayable description of the relationship; and +> * the `from` and `to` node sets can, in principle, be recomputed from that payload and other Artifacts in the environment, according to the edge type’s profile. +> +> In edge types that use minimal descriptors as payload, those descriptors **SHOULD** themselves be defined such that their content is a deterministic function of the other Artifacts and parameters that define the relationship, so that edge Artifacts can always be re-derived. + +**Duplicates and self-reference** + +TGK/1-CORE does not forbid: + +* duplicate entries within `from`, +* duplicate entries within `to`, or +* `payload` also appearing in `from` or `to`. + +The semantics (if any) of such patterns are defined by the profiles that own the relevant `EdgeTypeId`. The kernel only requires that: + +* `from` and `to` are ordered lists of syntactically valid ASL/1 `Reference`s; and +* they obey TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1. + +TGK/1-CORE does **not** constrain how `EdgeBody` values are encoded into `Artifact.bytes`; this is the role of encoding profiles like `ENC/TGK1-EDGE`. + +### 2.3 EdgeTypeId + +`EdgeTypeId` identifies the semantic type of an edge: + +```text +EdgeTypeId = uint32 +``` + +Constraints: + +* For any given ExecutionEnvironment snapshot, each `EdgeTypeId` that appears in TGK edges **MUST** have a single, well-defined and immutable semantics within that environment. +* TGK/1-CORE does not prescribe a global registration mechanism or reserved ranges. +* Catalogs such as `TGK/TYPES-CORE` typically bind `EdgeTypeId` values to human-readable names, owning profiles, and structural constraints (e.g. allowed cardinalities of `from` / `to`), but TGK/1-CORE does not standardize that surface. + +**Unknown types** + +* If an ExecutionEnvironment encounters an Artifact whose payload decodes to an `EdgeBody` whose `type` is not recognized in its configured catalogs/profile set, it **MUST** treat that Artifact as **not** forming a TGK edge for that environment: + + * that Artifact does **not** qualify as an `EdgeArtifact` under §3.1; and + * it therefore contributes no edges or nodes to the ProvenanceGraph. + +> **Environment-relative semantics (informative)** +> Recognition of `EdgeTypeId` values depends on the ExecutionEnvironment’s configured catalogs and profiles. As a result, the exact set of TGK edges derived from a fixed set of Artifacts may differ between environments. TGK/1-CORE considers this expected: the kernel guarantees determinism only *relative* to a given snapshot + profile set, not across all possible environments. + +--- + +## 3. Edge Artifacts and Decoding + +TGK/1-CORE uses **EdgeArtifacts** as the only concrete representation of edges. + +### 3.1 EdgeArtifact definition + +An **EdgeArtifact** is any ASL/1 `Artifact` that, relative to a given ExecutionEnvironment snapshot: + +1. Has a `type_tag` whose `tag_id` is recognized (by the local profile set) as an edge tag; and +2. Has `bytes` that, under at least one applicable edge encoding profile, decode to a single well-formed `EdgeBody` value as defined in §2.2; and +3. Has an `EdgeBody.type` that is recognized (by the local profile set) as a supported `EdgeTypeId` for this environment (see §2.3). + +Formally, for a given snapshot: + +* Let `EDGE_TAG_SET` be the set of `TypeTag.tag_id` values configured as TGK edge tags. +* For each active edge encoding profile `P` in the environment: + + * `P` provides a **partial** decoding function: + + ```text + decode_edge_payload_P : OctetString -> EdgeBody | error + ``` + + which is a pure function of its input bytes. + +> **Configuration origin note (informative)** +> `EDGE_TAG_SET` is derived from the ExecutionEnvironment’s configured TGK-related profiles and catalogs (e.g., `TGK/TYPES-CORE`, `ENC/TGK1-EDGE`), and/or from explicit deployment configuration. TGK/1-CORE does not prescribe how this configuration is stored, distributed, or governed; it only assumes that, for any snapshot, there is a well-defined set of `TypeTag.tag_id` values considered edge tags. In many deployments, one or more `TypeTag` values (e.g., a `TGK_EDGE_V1` tag) will be reserved specifically for edge Artifacts, but this is a convention, not a kernel requirement. + +Then, an Artifact `A` is an EdgeArtifact iff: + +* `A.type_tag` is present and `A.type_tag.tag_id ∈ EDGE_TAG_SET`; and + +* there exists at least one active profile `P` such that: + + ```text + decode_edge_payload_P(A.bytes) = EdgeBody E // succeeds, no error + ``` + + where `E` is a well-formed `EdgeBody` per §2.2; and + +* `E.type` is recognized in the environment as a supported `EdgeTypeId` for TGK purposes (see §2.3). + +Artifacts that satisfy the edge-tag and decoding constraints but whose decoded `EdgeBody.type` is not recognized as a supported `EdgeTypeId` for this environment MUST NOT be treated as EdgeArtifacts (see §2.3). + +TGK/1-CORE does not prescribe: + +* a particular `tag_id` for EdgeArtifacts; or +* a particular encoding for `EdgeBody` into `Artifact.bytes`. + +Those are the responsibility of edge encoding profiles and catalogs. + +> **Single-edge-per-artifact invariant (informative)** +> TGK/1-CORE assumes each EdgeArtifact encodes exactly one `EdgeBody` and thus one logical edge. Bundling multiple logical edges into a single Artifact is outside the TGK/1-CORE model and, if needed, **SHOULD** be expressed as multiple EdgeArtifacts (e.g., via an index or bundle Artifact that refers to other EdgeArtifacts). + +> **Environment-relative edgehood (informative)** +> An Artifact can be an EdgeArtifact in one ExecutionEnvironment (given its profile set) and not in another. TGK/1-CORE defines edgehood relative to the configured profiles, not as an intrinsic property of the Artifact alone. + +### 3.2 Edge decoding and multi-profile behavior + +For each active edge encoding profile `P`: + +* The function `decode_edge_payload_P` **MUST** be: + + * **partial** — returns either: + + * a successfully decoded `EdgeBody`, or + * an error signaling “not a valid edge payload for this profile”; + * **deterministic** — no hidden state, randomness, or external configuration affects its output. + +Additional constraints: + +* For Artifacts whose `type_tag.tag_id ∉ EDGE_TAG_SET`, all edge encoding profiles **MUST** treat `decode_edge_payload_P` as not applicable (always error) and **MUST NOT** attempt to reinterpret arbitrary non-edge-tag Artifacts as TGK edges. + +* For Artifacts whose `type_tag.tag_id ∈ EDGE_TAG_SET`: + + * It is **permitted** that some active profiles do not apply (they simply return an error). + * If more than one active profile successfully decodes `A.bytes`, then all those profiles **MUST** decode to the **same** logical `EdgeBody` value. If two active profiles decode the same Artifact to different `EdgeBody` values, the ExecutionEnvironment is misconfigured and **MUST NOT** treat that Artifact as an EdgeArtifact until the conflict is resolved. + +> **TGK/EDGE-PROFILE-RECOMMEND/CORE/1 (RECOMMENDED)** +> For operational simplicity, ExecutionEnvironments **SHOULD** configure at most one active edge-encoding profile for any given edge `TypeTag.tag_id` at a time. When multiple profiles may apply to the same EdgeArtifacts (e.g., during a migration), they **MUST** be governed so that any Artifact accepted by more than one profile decodes to the same `EdgeBody`. + +### 3.3 EdgeRef + +An **EdgeRef** is simply the ASL/1 `Reference` to an EdgeArtifact: + +```text +EdgeRef := Reference // reference to an EdgeArtifact +``` + +Properties: + +* No new identity scheme is introduced for edges. +* The identity and equality of EdgeArtifacts and EdgeRefs are fully governed by ASL/1-CORE (canonical encoding + hashing via `ENC/ASL1-CORE` and `HASH/ASL1`). +* For a fixed canonical Artifact encoding and hash profile: + + * equality of EdgeRefs is equivalent to equality of the underlying EdgeArtifacts; and + * by injective edge encodings in the applicable encoding profile, equivalent (modulo cryptographic collision assumptions) to equality of their logical `EdgeBody` values. + +> **Duplicate logical edges (informative)** +> In most deployments, a given logical edge type and encoding will produce a unique EdgeArtifact for a given `EdgeBody`, because canonical encoding + ASL hashing make that Artifact and its `Reference` unique. Distinct `EdgeRef` values that encode semantically equivalent relationships can still arise if different `TypeTag` / encoding / profile combinations are used to express the same relationship. TGK/1-CORE does not attempt to normalize such cases; higher-layer profiles MAY choose to detect or coalesce them. + +> **Store interaction note (informative)** +> Any ASL/1-STORE that holds EdgeArtifacts can be used to resolve `EdgeRef` via normal `get(Reference)` semantics. TGK/1-CORE does not define a separate persistence layer for edges; they are ordinary Artifacts as far as ASL/1-STORE is concerned. + +### 3.4 Relationship between EdgeArtifact and EdgeBody + +For each EdgeArtifact: + +```text +A_edge : Artifact +Ref_edge : Reference // derived per ASL/1-CORE +Body : EdgeBody // Body = EdgeBody(A_edge) via the unique decoding result +``` + +The mapping `EdgeBody(A_edge)` is determined by the environment’s active profiles and MUST obey the determinism and well-formedness constraints above. + +Encoding profiles such as `ENC/TGK1-EDGE` define: + +* the concrete layout of `EdgeBody` into `Artifact.bytes`; and +* how `TypeTag` values map to particular edge schemas. + +--- + +## 4. ProvenanceGraph as Projection + +### 4.1 Graph derived from Artifacts + +Given: + +* a finite snapshot set of Artifacts `Artifacts`; and +* a fixed set of active edge-encoding profiles and type catalogs (the **profile set**) in an ExecutionEnvironment, + +the **ProvenanceGraph** induced by `Artifacts` and the profile set is the pair: + +```text +ProvenanceGraph { + Nodes: set + Edges: set<(EdgeRef, EdgeBody)> +} +``` + +defined as follows: + +1. **Edges** + + * Let `EdgeArtifacts ⊆ Artifacts` be the subset of Artifacts that qualify as EdgeArtifacts under §3.1 and §3.2. + * For each `A_edge ∈ EdgeArtifacts`: + + * Let `Ref_edge` be its ASL `Reference`. + * Let `Body = EdgeBody(A_edge)` be the decoded `EdgeBody`. + + Then: + + ```text + Edges = { (Ref_edge, Body) | A_edge ∈ EdgeArtifacts } + ``` + +2. **Nodes** + + Nodes are all ArtifactRefs that appear anywhere in edges: + + ```text + Nodes = { + n : Reference | + ∃ (Ref_edge, Body) ∈ Edges such that + n ∈ Body.from ∪ Body.to ∪ { Body.payload } + } + ``` + +Clarifications: + +* The `Nodes` set includes only `Reference`s that participate in at least one edge as source, target, or payload. +* Artifacts (and their References) that have no incoming or outgoing edges are **not** included in the ProvenanceGraph by TGK/1-CORE. Profiles MAY define derived views that treat all Artifacts as degree-zero nodes, but this is outside the TGK/1-CORE kernel. +* TGK/1-CORE does **not** require that every `Node` in `Nodes` correspond to an Artifact present in the `Artifacts` set. The ProvenanceGraph is a graph over the **Reference space**. Whether a given `Reference` is resolvable to an Artifact in a particular store or federation is outside this kernel and is governed by `ASL/1-STORE` and deployment policy. + +> **TGK/GRAPH-PROJECTION/CORE/1** +> For a fixed snapshot set of Artifacts and a fixed profile set, the ProvenanceGraph, as defined above, is unique. Implementations MAY cache or index edges and nodes, but **MUST NOT** introduce logical edges that cannot be derived from EdgeArtifacts and the profiles in effect at that snapshot. + +### 4.2 Informative: provenance traces + +TGK/1-CORE does **not** define provenance or trace operations normatively. However, it is intended to be the substrate for: + +* `TGK/PROV/1`, which defines: + + * provenance policies (e.g., “which edge types participate”), and + * trace operators (e.g., backwards reachability) over `ProvenanceGraph`. + +As an informative sketch, a backwards provenance operator would: + +* start from a set of target `Node`s (ArtifactRefs); and +* walk backwards along edges whose `EdgeTypeId` are selected by some policy, +* until reaching nodes that are considered roots by that policy. + +Any such operator **MUST**, when specified in `TGK/PROV/1`, be defined purely as a projection over `ProvenanceGraph`, consistent with `TGK/PROV-KERNEL/CORE/1`, `TGK/DET/CORE/1`, and `TGK/NO-OFF-GRAPH-PROV/CORE/1`. + +--- + +## 5. Interaction with Other Layers (Informative) + +### 5.1 Interaction with PEL/1 + +A PEL/1 execution typically involves: + +* a `Program` ArtifactRef, +* zero or more input ArtifactRefs, +* an `ExecutionResult` ArtifactRef that references output ArtifactRefs. + +A profile such as `TGK/PEL/1` can define: + +* a specific `EdgeTypeId` (e.g., `EDGE_EXECUTION`); and +* an edge encoding that maps PEL/1 execution payloads to an `EdgeBody`: + + ```text + EdgeBody.type = EDGE_EXECUTION + EdgeBody.from = [program_ref] ∪ input_refs[] + EdgeBody.to = output_refs[] ∪ [execution_result_ref] + EdgeBody.payload = execution_result_ref + ``` + +Then, for each execution, an EdgeArtifact is produced (by the runtime or an ingestion tool) with: + +* a TGK edge `TypeTag`, and +* a payload encoding that an edge profile (e.g., `ENC/TGK1-EDGE`) decodes to such an `EdgeBody`. + +The resulting ProvenanceGraph expresses execution relationships as edges over ArtifactRefs. + +TGK/1-CORE does not require PEL/1 engines to emit such edge Artifacts; they MAY be derived post hoc from stored Artifacts. + +### 5.2 Interaction with CIL/1 + +CIL/1 defines certificate Artifacts. A profile (e.g., `TGK/CIL/1`) can specify: + +* `EdgeTypeId = EDGE_ATTESTS`. + +For each certificate Artifact `cert_ref` whose subject is `subject_ref`: + +```text +EdgeBody.type = EDGE_ATTESTS +EdgeBody.from = [cert_ref] +EdgeBody.to = [subject_ref] +EdgeBody.payload = cert_ref +``` + +EdgeArtifacts that encode these `EdgeBody` values make certificate relationships explicit in the ProvenanceGraph. + +TGK/1-CORE itself does not verify signatures or policies; CIL/1 and governance profiles do. + +### 5.3 Interaction with FCT/1, FER/1, OI/1 + +Profiles can similarly define: + +* evidence-to-fact edges (e.g., `EDGE_FACT_SUPPORTS`), +* overlay mapping edges (e.g., `EDGE_OVERLAY_MAPS`), +* other domain relationships. + +The common pattern is: + +* define an `EdgeTypeId`; +* define how to encode a logical `EdgeBody` into an EdgeArtifact payload; +* derive the graph as in §4.1. + +TGK/1-CORE itself is agnostic to those semantics. + +--- + +## 6. Conformance + +An implementation is **TGK/1-CORE–conformant** if and only if it satisfies all of the following: + +1. **Node model** + + * Treats any ASL/1 `Reference` as a potential Node (`Node := Reference`). + * Does not introduce a separate node identity layer for TGK purposes. + +2. **Edge artifacts and decoding** + + * Defines (via configuration or companion specs) which `TypeTag.tag_id` values represent TGK edge Artifacts (`EDGE_TAG_SET`). + + * For each active edge encoding profile `P`, provides a partial, deterministic decoder: + + ```text + decode_edge_payload_P : OctetString -> EdgeBody | error + ``` + + that: + + * succeeds (returns `EdgeBody`) exactly for payloads considered valid edges under profile `P`; and + * returns an error otherwise. + + * For any Artifact `A` with `A.type_tag.tag_id ∉ EDGE_TAG_SET`, all edge profiles **MUST** treat `decode_edge_payload_P` as not applicable (always error) and **MUST NOT** attempt to interpret `A.bytes` as a TGK edge payload. + + * For any Artifact `A` with `A.type_tag.tag_id ∈ EDGE_TAG_SET`: + + * `A` is an EdgeArtifact only if at least one active profile successfully decodes `A.bytes` to a well-formed `EdgeBody` whose `type` is recognized as a supported `EdgeTypeId` in the environment. + * If more than one active profile decodes `A.bytes` successfully, they **MUST** all decode it to the same logical `EdgeBody`. If they do not, the environment **MUST NOT** treat `A` as an EdgeArtifact until the inconsistency is resolved. + +3. **EdgeBody invariants** + + * Treats as well-formed only those `EdgeBody` values that satisfy §2.2: + + * `from` and `to` are ordered lists of syntactically valid ASL/1 `Reference`s; + * they satisfy TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1; and + * `payload` is always a syntactically valid ASL/1 `Reference` and always present. + * Edge encoding profiles **MUST** reject payloads that would decode to an `EdgeBody` violating these invariants. + +4. **Graph projection** + + * Given: + + * a finite snapshot set of Artifacts; and + * the configured edge tags + decoding rules (profile set), + * it can construct the ProvenanceGraph as in §4.1: + + * Edge set derived from EdgeArtifacts; + * Node set derived from `from`, `to`, and `payload` fields of `EdgeBody` values. + * Any graph indexes or caches it exposes **MUST** be consistent with this projection (`TGK/GRAPH-PROJECTION/CORE/1`, `TGK/DET/CORE/1`). + +5. **Immutability** + + * Treats EdgeArtifacts as immutable, as required by ASL/1-CORE. + * Does not attempt to “edit” an edge in place; logical changes **MUST** be represented by new Artifacts (edge Artifacts and/or other Artifacts) rather than mutating existing ones. + +6. **Layering invariant with PEL/1** + + * Respects `TGK/PEL-LAYERING-INV/CORE/1`: + + * Does not impose additional requirements on PEL/1 engines beyond those in `PEL/1`. + * Allows PEL/1-related edge profiles to be implemented either by the runtime or by ingestion tools, without affecting PEL/1 conformance. + +7. **Profile compatibility** + + * If it claims to implement specific TGK-related profiles (e.g., `TGK/PEL/1`, `TGK/CIL/1`, `TGK/PROV/1`), it **MUST**: + + * interpret `EdgeTypeId` and edge payloads according to those profiles; and + * ensure that all edges defined by those profiles can be represented as EdgeArtifacts consistent with TGK/1-CORE. + +Everything else — canonical encodings for `EdgeBody`, edge hashing, graph store APIs, provenance algorithms, error models — belongs to: + +* edge encoding profiles (`ENC/TGK1-EDGE`), +* storage/query profiles (`TGK/STORE/1`), and +* provenance profiles (`TGK/PROV/1`) and higher semantic layers (`FCT/1`, `FER/1`, `OI/1`, etc.). + +--- + +## 7. Evolution (Informative) + +TGK/1-CORE is intended to evolve **additively**: + +* New edge types are introduced by assigning new `EdgeTypeId` values in catalogs and profiles. +* New edge tags are introduced by assigning new `TypeTag.tag_id` values to EdgeArtifacts. +* New encodings are introduced by adding new edge encoding profiles and decoders. + +Existing EdgeArtifacts and their decoded `EdgeBody` values: + +* **MUST NOT** be retroactively reinterpreted to have different logical meaning under TGK/1-CORE; and +* **MUST** remain valid inputs to any future profile sets that claim to support their `TypeTag` and `EdgeTypeId`, subject to the multi-profile behavior rules in §3.2. + +Introducing a new edge-encoding profile that begins to treat previously non-edge Artifacts (e.g., with a new `TypeTag` or a previously unused `EdgeTypeId`) as EdgeArtifacts is allowed and considered an additive extension. + +It is **not** permitted to change an existing profile or catalog in a way that causes an Artifact that previously decoded to a given `EdgeBody` (under a given `(TypeTag, EdgeTypeId)` and profile set) to be decoded to a different `EdgeBody` in the same environment. Such changes **SHOULD** instead be modeled via new `TypeTag` values and/or new `EdgeTypeId` assignments. + +This aligns TGK/1-CORE with the broader Amduat design principle of **“never rewrite history; evolve by addition and projection.”** + +--- + +## Document History + +* **0.7.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline. diff --git a/tier1/tgk-prov-1.md b/tier1/tgk-prov-1.md new file mode 100644 index 0000000..567e8a2 --- /dev/null +++ b/tier1/tgk-prov-1.md @@ -0,0 +1,1839 @@ +# TGK/PROV/1 — Provenance & Trace Semantics over TGK/1-CORE + +Status: Approved +Owner: Niklas Rydberg +Version: 0.1.20 +SoT: Yes +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [traceability, evidence] + + + +Here’s **iteration 22** of `TGK/PROV/1`, bumped to **0.1.20**. I’m happy to treat this as **ready for approval** (no further iterations needed from my side). After that you’ll find an updated **SUBSTRATE/STACK-OVERVIEW** and a suggested **commit message**. + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + + +--- + +**Document ID:** `TGK/PROV/1` +**Layer:** L1.8 — Provenance & trace semantics over `TGK/1-CORE` + +**Depends on (normative):** + +* `TGK/1-CORE v0.7.x` — trace graph kernel: `Node`, `EdgeBody`, `EdgeTypeId`, `ProvenanceGraph` +* (via `TGK/1-CORE`) `ASL/1-CORE v0.4.x` — value substrate (`Artifact`, `Reference`, `TypeTag`) + +**Informative references:** + +* `SUBSTRATE/STACK-OVERVIEW v0.3.x` — layering and dependency discipline +* `TGK/STORE/1 v0.2.x` — graph store & query semantics (adjacency queries over `ProvenanceGraph`) +* `ENC/TGK1-EDGE/1 v0.1.x` — canonical encoding for TGK `EdgeBody` / EdgeArtifacts +* `PEL/TRACE-DAG/1 v0.1.x` — DAG execution trace profile (example provenance source) +* `CIL/1`, `FER/1`, `FCT/1`, `OI/1` — profiles that define provenance-relevant edge types + +> **Normativity note** +> `TGK/PROV/1` defines **pure functions over `ProvenanceGraph`** only. It MUST NOT assume any particular store, graph index, or query API. Implementations MAY realize these functions using `TGK/STORE/1` or other substrates, but conformance is judged against the abstract semantics in this document. + +--- + +## 0. Conventions + +### 0.1 RFC 2119 terminology + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, +**SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** are to be +interpreted as described in RFC 2119. + +### 0.2 Imported types and structures + +From `ASL/1-CORE` and `TGK/1-CORE`: + +```text +Reference { + hash_id: HashId + digest: OctetString +} + +Node := Reference + +EdgeTypeId = uint32 + +EdgeBody { + type: EdgeTypeId + from: Node[] + to: Node[] + payload: Reference +} + +ProvenanceGraph { + Nodes: set + Edges: set<(EdgeRef, EdgeBody)> +} + +EdgeRef := Reference +``` + +From `TGK/1-CORE` (informative, restated): + +* `ProvenanceGraph` is a pure projection over: + + * a finite set of Artifacts, and + * a configured set of TGK edge profiles and catalogs. + +* All edges are represented as EdgeArtifacts; `EdgeRef` is an ordinary ASL `Reference`. + +Unless otherwise stated, **sets are mathematical sets** (no duplicates, no ordering). Lists are ordered and MAY contain duplicates. + +### 0.3 Additional terminology (this document) + +`TGK/PROV/1` introduces the following logical types: + +```text +ProvDirection (u8) { + BACKWARD = 1 // walk from 'to' nodes to 'from' nodes + FORWARD = 2 // walk from 'from' nodes to 'to' nodes + BOTH = 3 // union of BACKWARD and FORWARD +} + +EdgeTypeFilter { + // If empty, match all EdgeTypeId values present in the graph. + types: list +} + +DepthLimit = optional uint32 // hop-count limit; absent = unbounded + +TraceGraph { + seeds: set + nodes: set + edges: set<(EdgeRef, EdgeBody)> +} +``` + +Additional derived types used in this version: + +```text +DepthMap = map + +ProvLayer { + depth: uint32 + nodes: set +} + +ProvLayering { + layers: list // sorted by increasing depth, no empty layers +} +``` + +We also use a shorthand for finite seed sets: + +```text +SeedSet = set // always finite under TGK/PROV/1 +``` + +A `TraceGraph` is a **subgraph view** over some `ProvenanceGraph`, annotated with the seed set from which it was computed. `DepthMap` and `ProvLayering` are depth-oriented views over the same closure. `TGK/PROV/1` does not assign additional semantics to `TraceGraph.seeds`, depths, or layers by itself; higher-layer profiles (e.g. FER/1) may do so when encoding traces into Artifacts or presenting them to users. + +Seeds are always treated as **finite sets of nodes**: + +* Semantically, a seed input is a `SeedSet`. +* Even if a concrete API accepts seeds as a list or array, duplicates are ignored and order is irrelevant. + +In all operators in this document, the seed set `S` is assumed to be a **finite `SeedSet`**. Concrete APIs MAY accept seeds in any finite representation, but semantically they are finite sets. + +### 0.4 Snapshots & finiteness assumptions + +`TGK/PROV/1` always works **relative to a single snapshot** of an ExecutionEnvironment in the sense of `TGK/1-CORE`: + +* Each snapshot has a **finite** Artifact set and a fixed TGK profile set in effect. +* The corresponding `ProvenanceGraph G` for that snapshot has: + + ```text + G.Nodes -- finite + G.Edges -- finite + ``` + +All semantics in this specification are evaluated with respect to such a finite `ProvenanceGraph G` and a finite `SeedSet S`: + +* All closures, traces, depth maps, and layerings are therefore finite by construction. +* Any mention of “for a fixed `ProvenanceGraph G`” or “for a snapshot” implicitly assumes this finiteness, as guaranteed by `TGK/1-CORE`. + +`TGK/PROV/1` does not describe how snapshots evolve over time; it only speaks about the results of provenance operators **within** a given snapshot. + +--- + +## 1. Purpose, Scope & Non-Goals + +### 1.1 Purpose + +`TGK/PROV/1` defines a **kernel provenance model** over `ProvenanceGraph`: + +* A small set of **provenance query parameters**: + + * direction (`ProvDirection`), + * edge-type selection (`EdgeTypeFilter`), + * depth bound (`DepthLimit`). + +* Canonical **closure operators** over `ProvenanceGraph`: + + * `prov_closure_nodes` — node set reachable from seeds under a given policy; + * `prov_trace` — a `TraceGraph` subgraph induced by that closure. + +* Canonical **depth-oriented views**: + + * `prov_depths` — a `DepthMap` assigning each reachable node a minimum hop-count depth; + * `prov_layers` — a `ProvLayering` partitioning reachable nodes by depth. + +These operators are specified as **pure functions** over a fixed `ProvenanceGraph` snapshot (§0.4) and a fixed parameter set, in line with TGK invariants: + +* `TGK/PROV-KERNEL/CORE/1` — provenance as a projection over edges; +* `TGK/DET/CORE/1` — determinism for a fixed snapshot + profile set; +* `TGK/NO-OFF-GRAPH-PROV/CORE/1` — no hidden, non-Artifactual provenance. + +For any `ProvenanceGraph G`, finite seed set `S : SeedSet`, and provenance parameters `Q`: + +```text +prov_closure_nodes(G, S, Q) -> set +prov_depths (G, S, Q) -> DepthMap +prov_layers (G, S, Q) -> ProvLayering +prov_trace (G, S, Q) -> TraceGraph +``` + +are pure, deterministic functions of `(G, S, Q)`. + +### 1.2 Scope + +`TGK/PROV/1` covers: + +* The **provenance parameter model** (`ProvDirection`, `EdgeTypeFilter`, `DepthLimit`). + +* The **semantics** of: + + * directional neighbors and edge incidence, + * backward / forward / bidirectional cones, + * node-depth assignment and per-layer grouping, + * the induced `TraceGraph` subgraph. + +* An explicit requirement that all provenance semantics: + + * operate solely over `ProvenanceGraph`, and + * never introduce edges or nodes not already present in that graph. + +This document is intentionally conservative. It specifies “kernel” provenance operators suitable as a foundation for: + +* execution provenance (e.g. PEL DAG traces), +* certificate and evidence provenance (CIL/1, FER/1), +* fact and overlay provenance (FCT/1, OI/1), + +without baking those domains into the kernel. + +### 1.3 Non-goals + +`TGK/PROV/1` explicitly does **not** define: + +* Store or graph store surfaces (`ASL/1-STORE`, `TGK/STORE/1`); it is purely semantic. +* Provenance **policies** (e.g. “which edge types count as causal provenance vs annotations”). +* A full **query language** (filters on payload content, fact schemas, certificate semantics). +* Minimality or optimality criteria for traces (e.g. “smallest subgraph that explains X”). +* How provenance results are encoded into Artifacts (e.g. receipts or reports) — those are `FER/1` and related specifications. + +Implementations and profiles MAY define richer operators (slicing, k-bounded paths, temporal constraints, etc.) on top of this kernel, but MUST describe them as projections over `ProvenanceGraph`. + +--- + +## 2. Provenance Parameters + +### 2.1 Edge type selection (`EdgeTypeFilter`) + +`EdgeTypeFilter` constrains which edges participate in a provenance traversal. + +```text +EdgeTypeFilter { + types: list +} +``` + +For a fixed `ProvenanceGraph G`, define the **effective edge-type set**: + +```text +EffectiveTypeSet(filter, G) = + if filter.types is empty: + { body.type | (ref, body) ∈ G.Edges } + else: + { t | t ∈ filter.types ∧ ∃ (ref, body) ∈ G.Edges, body.type = t } +``` + +Only edges whose `EdgeBody.type` belongs to `EffectiveTypeSet(filter, G)` are considered during traversal. + +Notes: + +* Repeated `EdgeTypeId` values in `filter.types` have no effect. +* If `EffectiveTypeSet(filter, G)` is empty, then no edges participate in traversal for that graph and filter; the resulting edge set is empty and the node closure consists solely of the seeds (see §3.3). +* For any two filters `filter₁`, `filter₂`, if: + + ```text + EffectiveTypeSet(filter₁, G) ⊆ EffectiveTypeSet(filter₂, G) + ``` + + then all neighbor sets, closures, and traces computed with `filter₁` are subsets of those computed with `filter₂` for the same `G`, seeds, direction, and depth limit (see §2.2, §3.4.5, and §4.6). + +`TGK/PROV/1` does not prescribe which edge types “should” be included; higher-layer provenance profiles SHOULD specify concrete filters (e.g. “execution edges only”, “facts + receipts + certificates”). + +When provenance is implemented over a `TGK/STORE/1` graph store, this `EdgeTypeFilter` has the same logical shape and semantics as `TGK/STORE/1`’s `EdgeTypeFilter`, with the understanding that `G.Edges` already reflects the environment’s supported edge types. + +### 2.2 Direction semantics (`ProvDirection`) + +`ProvDirection` determines how selected edges connect nodes during traversal. + +```text +ProvDirection (u8) { + BACKWARD = 1 + FORWARD = 2 + BOTH = 3 +} +``` + +For a given `ProvenanceGraph G`, direction `dir`, and edge-type filter `filter`, define: + +1. The **selected edge set**: + + ```text + SelectedEdges(G, filter) = + { (ref, body) ∈ G.Edges | body.type ∈ EffectiveTypeSet(filter, G) } + ``` + +2. For a node `n`, the set of **incident selected edges** (via `from`/`to` only): + + ```text + IncEdges(G, n, filter) = + { (ref, body) ∈ SelectedEdges(G, filter) | + n ∈ body.from ∪ body.to } + ``` + +3. The **directional neighbor functions** `Neighbors_dir`: + + * For `dir = BACKWARD`: + + ```text + Neighbors_BACKWARD(G, n, filter) = + { m ∈ Node | + ∃ (ref, body) ∈ SelectedEdges(G, filter), + n ∈ body.to, + m ∈ body.from + } + ``` + + * For `dir = FORWARD`: + + ```text + Neighbors_FORWARD(G, n, filter) = + { m ∈ Node | + ∃ (ref, body) ∈ SelectedEdges(G, filter), + n ∈ body.from, + m ∈ body.to + } + ``` + + * For `dir = BOTH`: + + ```text + Neighbors_BOTH(G, n, filter) = + Neighbors_BACKWARD(G, n, filter) + ∪ Neighbors_FORWARD(G, n, filter) + ``` + +All neighbor sets are mathematical sets: multiple edges connecting the same pair of nodes do not create duplicate neighbor entries. + +By the definition of `ProvenanceGraph.Nodes` in `TGK/1-CORE`, all endpoints of any edge in `G.Edges` lie in `G.Nodes`. Therefore, for all `dir`, `n`, and `filter`: + +```text +Neighbors_dir(G, n, filter) ⊆ G.Nodes +``` + +Self-loops and multi-endpoint edges behave naturally under these definitions: + +* If a selected edge has the same node in both `from` and `to`, then: + + * under `FORWARD` and `BACKWARD` that node is its own neighbor, and + * under `BOTH` it still appears only once as a neighbor. + +* If a node appears multiple times in `from` and/or `to` for a given edge, this does not change neighbor sets: neighbors are deduplicated at the set level. + +For fixed `G`, `n`, and `dir`, neighbor sets are **monotone in the effective edge-type set**: if `filter₁` and `filter₂` satisfy: + +```text +EffectiveTypeSet(filter₁, G) ⊆ EffectiveTypeSet(filter₂, G) +``` + +then: + +```text +Neighbors_dir(G, n, filter₁) ⊆ Neighbors_dir(G, n, filter₂) +``` + +These neighbor definitions are intentionally aligned with `TGK/STORE/1`’s `GraphDirection` semantics when provenance traversals are implemented via graph-store adjacency: + +* `BACKWARD` corresponds to walking along edges where the current node appears in `body.to`, towards nodes in `body.from` (similar to `GraphDirection.IN`). +* `FORWARD` corresponds to walking along edges where the current node appears in `body.from`, towards nodes in `body.to` (similar to `GraphDirection.OUT`). +* `BOTH` is the union of the two. + +#### Payload nodes + +In this version: + +* `payload` is **not** used to advance the traversal frontier. +* Payload nodes may appear in the resulting `TraceGraph.nodes` set (see §4.2–§4.3), but `Neighbors_*` never step through `payload`. + +> **TGK/PROV-PAYLOAD-NONTRAV/1** +> Kernel provenance traversal under `TGK/PROV/1` **MUST NOT** use `EdgeBody.payload` as a source of neighbors. Only `from` and `to` participate in adjacency, depth assignment, and closure. Profiles that wish to treat payload values as traversable endpoints MUST either: +> +> * express those relationships via additional edges whose endpoints lie in `from` / `to`; or +> * define profile-specific traversal rules on top of this kernel, without redefining the core neighbor functions. + +### 2.3 Depth limit (`DepthLimit`) + +`DepthLimit` is an optional non-negative integer that bounds traversal length: + +```text +DepthLimit = optional uint32 +``` + +Semantics: + +* Seeds are at depth `0`. +* Traversal proceeds in **edge hops** (as defined by `Neighbors_dir`). +* For a limit `D`: + + * nodes at depth `d` with `d < D` MAY generate new neighbors at depth `d+1`; + * nodes at depth `d = D` DO NOT generate new neighbors; + * no node has depth `> D`. + +If `DepthLimit` is absent, traversal is unbounded in depth (but still finite in its effect on `G.Nodes` because `G` is finite, see §0.4). + +Special case: + +* If `DepthLimit` is present and `D = 0`, then no seed generates neighbors at all; the closure consists exactly of the seed set `S` (subject to the empty-seed rule in §3.3). + +`DepthLimit` applies to the number of edges stepped along admissible directions, **not** to the number of nodes. If there are multiple paths from seeds to a node, its depth is the length of the shortest such path under the chosen direction and type filter. + +--- + +## 3. Closure Semantics over `ProvenanceGraph` + +This section defines `prov_closure_nodes`, the core node-set closure operator, together with the canonical depth and layering views built on the same construction. + +### 3.1 Provenance query descriptor (logical) + +For convenience, this document packages parameters into a logical `ProvQuery` type: + +```text +ProvQuery { + direction: ProvDirection + type_filter: EdgeTypeFilter + depth_limit: DepthLimit +} +``` + +Concrete APIs MAY represent queries differently (e.g. as separate arguments), but their observable behavior MUST be equivalent to using a `ProvQuery` as above. + +Unless otherwise stated, in what follows: + +* `G` is a fixed `ProvenanceGraph` for some snapshot (finite, per §0.4), +* `S : SeedSet` is a finite seed set `S ⊆ Node` (seeds need not be in `G.Nodes`), and +* `Q` is a fixed `ProvQuery`. + +### 3.2 Node-depth assignment + +Fix: + +* a `ProvenanceGraph G`, +* a finite seed set `S : SeedSet`, and +* a `ProvQuery Q`. + +Let: + +* `dir = Q.direction`, +* `filter = Q.type_filter`, +* `D = Q.depth_limit` (which may be absent). + +Define a partial function: + +```text +depth_Q(G, S, n) : Node -> uint32 or undefined +``` + +as the **minimum traversal depth** of node `n` from seeds `S` under `G` and `Q`, via the following inductive construction. + +1. **Base depth-0 assignment** + + ```text + S_0 = S + depth_Q(G, S, n) = 0 for all n ∈ S_0 + ``` + + Seeds may or may not be in `G.Nodes`. Seeds that are not in `G.Nodes`: + + * still have `depth_Q(G, S, n) = 0` defined, but + * never appear in any `Frontier_d` set below (because `Frontier_d` ranges over `G.Nodes` only), + * and therefore never generate neighbors. + +2. **Iterative step** + + For each integer `d ≥ 0`, define: + + ```text + Frontier_d = { n ∈ G.Nodes | depth_Q(G, S, n) = d } + + NewNodes_d+1 = + { m ∈ G.Nodes | + ∃ n ∈ Frontier_d, + m ∈ Neighbors_dir(G, n, filter), + depth_Q(G, S, m) is undefined + } + ``` + + * If `D` is present and `d ≥ D`, then `NewNodes_d+1 = ∅` (no further expansion). + * For each `m ∈ NewNodes_d+1`, set: + + ```text + depth_Q(G, S, m) = d + 1 + ``` + +3. **Termination** + + Since `G.Nodes` is finite (§0.4) and each node in `G.Nodes` receives at most one depth assignment, this iterative construction over `G.Nodes` terminates in finitely many steps. Nodes for which `depth_Q(G, S, n)` remains undefined and which are not seeds are not reachable from `S` under `(G, Q)`. + +Depths for seeds in `S` are defined directly in step 1 and do not affect the termination of the frontier expansion over `G.Nodes`. + +### 3.3 Node closure: `prov_closure_nodes` + +Define the **provenance node closure**: + +```text +prov_closure_nodes(G, S, Q) -> set +``` + +as the set `Reachable_Q(G, S)` given by: + +```text +Reachable_Q(G, S) = + S ∪ { n ∈ G.Nodes | depth_Q(G, S, n) is defined } +``` + +By definition: + +```text +prov_closure_nodes(G, S, Q) = Reachable_Q(G, S) +``` + +This definition induces the following special cases and properties: + +* **Seeds are always included** + + For any `G`, finite `S : SeedSet`, and `Q`: + + ```text + S ⊆ Reachable_Q(G, S) + ``` + + regardless of whether seeds belong to `G.Nodes`. + +* **Seeds not in `G.Nodes`** + + If `n ∈ S` but `n ∉ G.Nodes`, then: + + * `depth_Q(G, S, n) = 0`, + * `n ∉ Frontier_d` for all `d`, and therefore `n` never generates neighbors, + * but `n ∈ Reachable_Q(G, S)` by definition. + + By the definition of `G.Nodes` in `TGK/1-CORE`, such seeds do **not** appear as endpoints of any edges in `G.Edges`. They therefore affect only the seed/closure node sets, not the traversed portion of the graph. + +* **Empty seed set** + + If `S = ∅`, then: + + * `depth_Q(G, S, n)` is undefined for all `n ∈ G.Nodes`, and + * `Reachable_Q(G, ∅) = ∅`. + + In particular, with an empty seed set, node closure is empty and the trace has no nodes or edges (see §4.2). + +* **Edge-type filter with no matching edges** + + If `EffectiveTypeSet(Q.type_filter, G)` is empty, no edges of `G` are traversed, and: + + * `depth_Q(G, S, n)` is undefined for all `n ∈ G.Nodes \ S`, + * `Reachable_Q(G, S) = S`. + + Combined with the empty-seed rule above, when `S = ∅` and `EffectiveTypeSet(Q.type_filter, G)` is empty, the closure is empty. + +* **Depth-limit special case** + + If `Q.depth_limit` is present with `D = 0`, then no `Frontier_d` with `d ≥ 0` ever expands, so: + + ```text + Reachable_Q(G, S) = S + ``` + + for all `G` and finite `S`. This is consistent with the empty-seed and empty-filter behaviors above. + +Because both `G.Nodes` and `S` are finite (§0.4, §0.3), `Reachable_Q(G, S)` (and thus `prov_closure_nodes(G, S, Q)`) is always a finite set. + +Properties: + +* If `DepthLimit` is absent, `Reachable_Q(G, S)` is the set of all nodes reachable from `S` by paths along edges in `SelectedEdges(G, filter)`, respecting `dir`. +* If `DepthLimit = D` is present, `Reachable_Q(G, S)` consists of: + + * all seeds; and + * all nodes in `G.Nodes` reachable from `S` by paths of length ≤ `D` under `dir` and `filter`. + +Collecting the seed-related aspects: + +> **TGK/PROV-SEED-SEM/1** +> For any `ProvenanceGraph G`, finite seed set `S : SeedSet`, and query `Q`: +> +> * Seeds are always included in the closure: +> +> ```text +> S ⊆ Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) +> ``` +> +> * If `S = ∅`, then `Reachable_Q(G, S) = ∅` (no seeds ⇒ empty closure). +> +> * For any seed `n ∈ S` with `n ∉ G.Nodes`, `n` is in `Reachable_Q(G, S)` but never appears in any `Frontier_d` and never generates neighbors; such seeds do not affect which nodes in `G.Nodes` are reachable. +> +> * Changing `Q.type_filter` or `Q.depth_limit` may add or remove **non-seed** reachable nodes but MUST NOT remove seeds from the closure: for all `Q` and all `Q'` differing only in `type_filter` or `depth_limit`, `S ⊆ Reachable_Q(G, S)` and `S ⊆ Reachable_Q'(G, S)`. + +### 3.4 Closure properties and closure-operator invariant + +For this section, define an **admissible `Q`-path in `G`** from `s ∈ S` to a node `n` to be a finite sequence: + +```text + with k ≥ 0 +``` + +such that for each `i` with `0 ≤ i < k` there exists an edge `(ref, body) ∈ SelectedEdges(G, filter)` that connects `n_i` and `n_{i+1}` according to `dir`: + +* if `dir = BACKWARD`, then `n_i ∈ body.to` and `n_{i+1} ∈ body.from`; +* if `dir = FORWARD`, then `n_i ∈ body.from` and `n_{i+1} ∈ body.to`; +* if `dir = BOTH`, then either of the above holds. + +The **length** of such a path is `k` (the number of edge hops). By the definition of `Neighbors_dir`, all intermediate nodes after the first step lie in `G.Nodes`. + +For any fixed `G`: + +1. **Seed monotonicity** + + If `S₁ ⊆ S₂` (both finite `SeedSet`s), then for any `Q`: + + ```text + Reachable_Q(G, S₁) ⊆ Reachable_Q(G, S₂) + ``` + +2. **Depth-limit monotonicity** + + Let `Q` and `Q'` be identical except for `DepthLimit`: + + * If `Q.depth_limit` is absent and `Q'.depth_limit = D` is present, then: + + ```text + Reachable_Q'(G, S) ⊆ Reachable_Q(G, S) + ``` + + * If both are present and `Q.depth_limit = D₁`, `Q'.depth_limit = D₂` with `D₁ ≤ D₂`, then: + + ```text + Reachable_Q(G, S) ⊆ Reachable_Q'(G, S) + ``` + +3. **Path characterization** + + For any node `n`: + + * If `depth_Q(G, S, n) = d` is defined, then there exists at least one admissible `Q`-path of length `d` from some `s ∈ S` to `n`. + * If there exists an admissible `Q`-path of length `k` from some `s ∈ S` to `n`, then: + + ```text + depth_Q(G, S, n) is defined ∧ depth_Q(G, S, n) ≤ k + ``` + +4. **Seed-union and idempotence** + + * For any two finite seed sets `S₁`, `S₂` and fixed `Q`: + + ```text + Reachable_Q(G, S₁ ∪ S₂) + = Reachable_Q(G, S₁) ∪ Reachable_Q(G, S₂) + ``` + + and, by induction, for any finite family `{S_i}`: + + ```text + Reachable_Q(G, ⋃_i S_i) = ⋃_i Reachable_Q(G, S_i) + ``` + + * For any finite `S` and fixed `Q`, closure is idempotent as a seed operator: + + ```text + S' = Reachable_Q(G, S) + ⇒ Reachable_Q(G, S') = S' + ``` + +5. **Edge-type filter monotonicity** + + Let `Q₁` and `Q₂` be identical except for `type_filter`. If: + + ```text + EffectiveTypeSet(Q₁.type_filter, G) ⊆ EffectiveTypeSet(Q₂.type_filter, G) + ``` + + then for any finite seed set `S : SeedSet`: + + ```text + Reachable_Q₁(G, S) ⊆ Reachable_Q₂(G, S) + ``` + + That is, enlarging the set of admissible edge types (while keeping `direction` and `depth_limit` fixed) can only increase, never decrease, the closure. + +Taken together, these properties mean: + +> **TGK/PROV-CLOSURE-OP/1** +> For any fixed `ProvenanceGraph G` and `ProvQuery Q`, the mapping: +> +> ```text +> Cl_Q : P_f(Node) -> P_f(Node) +> Cl_Q(S) = Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) +> ``` +> +> where `P_f(Node)` denotes finite subsets of `Node`, is a **closure operator** on finite seed sets (extensive, monotone, and idempotent). Higher-layer provenance profiles MAY rely on these algebraic properties when reasoning about seed sets and composed provenance queries. + +### 3.5 Alternative fixpoint characterization (informative) + +For **unbounded** queries (i.e. `Q.depth_limit` is absent), `prov_closure_nodes` can be described equivalently as the **least fixpoint** of a monotone operator on node sets. + +Fix `G`, finite `S : SeedSet`, `Q` with `Q.depth_limit = absent` and let `dir = Q.direction`, `filter = Q.type_filter`. Define an operator: + +```text +F_Q(X) = + S ∪ { m ∈ G.Nodes | + ∃ n ∈ (X ∩ G.Nodes), + m ∈ Neighbors_dir(G, n, filter) + } +``` + +Then: + +* `F_Q` is monotone on the powerset of `Node` (ordered by ⊆). +* The sequence `X₀ = S`, `X₁ = F_Q(X₀)`, `X₂ = F_Q(X₁)`, … stabilizes after finitely many steps on `G.Nodes` because `G.Nodes` is finite (§0.4). +* Its least fixpoint `lfp(F_Q)` satisfies: + + ```text + lfp(F_Q) = Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) + ``` + +for unbounded queries. + +For bounded queries (with `DepthLimit` present), §3.2’s depth-based construction is normative. The fixpoint view can still be recovered by intersecting the unbounded closure with the nodes whose minimum depth is ≤ `D`, but `TGK/PROV/1` does not rely on that reformulation. + +This characterization is provided to make properties like monotonicity and idempotence easy to reason about when designing higher-level provenance policies. + +### 3.6 Depth maps and layering + +The `depth_Q` construction can be surfaced to callers as a canonical **depth map** and **layering** over the closure. + +#### 3.6.1 Operation: `prov_depths` + +Define: + +```text +prov_depths( + G: ProvenanceGraph, + S: SeedSet, // finite + Q: ProvQuery +) -> DepthMap +``` + +semantically as: + +```text +prov_depths(G, S, Q) = + { (n, d) | + depth_Q(G, S, n) = d // d is defined by §3.2 + } +``` + +Viewed as a map, its **domain** is exactly: + +```text +dom(prov_depths(G, S, Q)) = Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) +``` + +because: + +* every seed `n ∈ S` has `depth_Q(G, S, n) = 0` by definition (even if `n ∉ G.Nodes`), and +* all nodes in `G.Nodes` with a defined depth also belong to `Reachable_Q(G, S)`. + +Properties: + +* For any `n` with `n ∈ Reachable_Q(G, S)`, `prov_depths(G, S, Q)[n]` is the minimum hop-count depth from some `s ∈ S` to `n` under `G` and `Q`. +* For any `n` with `n ∉ Reachable_Q(G, S)`, `prov_depths(G, S, Q)` has no entry for `n`. +* For every seed `n ∈ S`, `prov_depths(G, S, Q)[n] = 0`. Seeds therefore always appear in the depth map at depth 0 whenever they are present in the seed set. + +Because `Reachable_Q(G, S)` is finite for finite `S` (§3.3), `prov_depths(G, S, Q)` is always a finite map. + +In particular, for any node `n` with `n ∈ G.Nodes` and `prov_depths(G, S, Q)[n] > 0`, there exists at least one selected edge `(ref, body) ∈ SelectedEdges(G, filter)` and some node `m` with: + +```text +prov_depths(G, S, Q)[m] = prov_depths(G, S, Q)[n] - 1 +``` + +such that `m` and `n` are connected by that edge according to `Q.direction`. This is the “parent along some shortest path” witness implied by §3.4. + +Implementations MAY choose any concrete representation for `DepthMap` (e.g. associative map, list of `(Node, depth)` pairs) as long as it is observable-equivalent to the mapping above. + +`prov_depths` is a **derived view**: it does not change closure semantics; it simply exposes the depth information already used to define `Reachable_Q(G, S)`. + +Collecting these facts: + +> **TGK/PROV-DEPTH-MAP/1** +> For any `ProvenanceGraph G`, finite seed set `S : SeedSet`, and query `Q`: +> +> * `dom(prov_depths(G, S, Q)) = Reachable_Q(G, S) = prov_closure_nodes(G, S, Q)`; +> * every seed `n ∈ S` has `prov_depths(G, S, Q)[n] = 0`; and +> * for every `n` in the domain, `prov_depths(G, S, Q)[n]` is the minimum hop-count length of any admissible `Q`-path from some seed in `S` to `n`. +> +> Higher-layer profiles MAY rely on this depth map as a canonical notion of “distance from seeds” under `(G, Q)`. + +#### 3.6.2 Operation: `prov_layers` + +Define: + +```text +prov_layers( + G: ProvenanceGraph, + S: SeedSet, // finite + Q: ProvQuery +) -> ProvLayering +``` + +semantically in terms of `prov_depths`: + +1. Let `DMap = prov_depths(G, S, Q)`. + +2. Let: + + ```text + Depths_Q(G, S) = { d | ∃ n, (n, d) ∈ DMap } + ``` + + i.e., the set of depths actually realized by reachable nodes. + +3. For each `d ∈ Depths_Q(G, S)`, define a layer: + + ```text + Layer(d) = ProvLayer { + depth = d + nodes = { n | (n, d) ∈ DMap } + } + ``` + +4. Define `ProvLayering.layers` to be the list: + + ```text + layers = [ Layer(d₀), Layer(d₁), ..., Layer(d_k) ] + ``` + + where `{d₀, ..., d_k} = Depths_Q(G, S)` and `d₀ < d₁ < ... < d_k` (strictly increasing). If `Depths_Q(G, S)` is empty (i.e., the closure is empty), then `layers` is the empty list. + +Requirements: + +* Each `Layer(d).nodes` MUST be non-empty by construction. + +* The union over all `Layer(d).nodes` MUST equal `Reachable_Q(G, S)`: + + ```text + ⋃_{d ∈ Depths_Q(G, S)} Layer(d).nodes + = Reachable_Q(G, S) + ``` + +* Distinct layers have disjoint node sets: + + ```text + d₁ ≠ d₂ ⇒ Layer(d₁).nodes ∩ Layer(d₂).nodes = ∅ + ``` + +Seeds therefore always appear in the unique layer with `depth = 0` whenever `S` is non-empty. + +Thus, `prov_layers` provides a canonical **partition** of the reachable node set by depth, with explicit depth labels and a deterministic ascending ordering of layers. In particular, if `S` is non-empty, seeds always appear in the unique layer with `depth = 0`. + +`ProvLayering` is often convenient for: + +* rendering trace frontiers stepwise (“distance from seed”), and +* defining layer-based transformations (e.g., trimming nodes beyond a given depth, or summarizing all nodes at a depth into a single fact). + +As with `prov_depths`, `prov_layers` is a pure derived view that does not change any closure or trace semantics elsewhere in this specification. + +> **TGK/PROV-LAYERING-PARTITION/1** +> For any `G`, finite `S : SeedSet`, and `Q`, the result of `prov_layers(G, S, Q)` partitions `Reachable_Q(G, S)` into non-empty, pairwise-disjoint depth-labeled layers whose union is exactly `Reachable_Q(G, S)`, with the layer depths strictly increasing. + +--- + +## 4. TraceGraph Semantics + +`prov_trace` lifts node closure to a concrete subgraph view. + +### 4.1 TraceGraph structure + +Recall: + +```text +TraceGraph { + seeds: set + nodes: set + edges: set<(EdgeRef, EdgeBody)> +} +``` + +A `TraceGraph T` is always interpreted **relative to some base `ProvenanceGraph G`**. This spec does not encode the association `T ↔ G`; profiles that persist traces MUST make that association explicit (e.g. via receipts in `FER/1`). + +For convenience, define for any edge set `E`: + +```text +EdgeNodes(E) = + { n ∈ Node | + ∃ (ref, body) ∈ E, + n ∈ body.from ∪ body.to ∪ { body.payload } + } +``` + +### 4.2 Operation: `prov_trace` + +Logical signature: + +```text +prov_trace( + G: ProvenanceGraph, + S: SeedSet, // finite + Q: ProvQuery +) -> TraceGraph +``` + +Semantics: + +1. **Compute node closure** + + ```text + N = prov_closure_nodes(G, S, Q) // i.e., N = Reachable_Q(G, S) + ``` + +2. **Define the trace edge set** + + Let `filter = Q.type_filter`. Define: + + ```text + E = + ⋃ { IncEdges(G, n, filter) | n ∈ N } + ``` + + That is, `E` is the union, over all closure nodes, of the selected-type edges incident (via `from` or `to`) to those nodes. + + Equivalently (and as a direct set comprehension): + + ```text + E = + { (ref, body) ∈ SelectedEdges(G, filter) | + (body.from ∩ N ≠ ∅) ∨ (body.to ∩ N ≠ ∅) + } + ``` + + Notes: + + * `payload` does **not** control inclusion; edges are included based on `from` / `to`. + * Edges whose type is not in `EffectiveTypeSet(filter, G)` are excluded, even if incident to nodes in `N`. + +3. **Define the trace node set** + + The trace node set is: + + ```text + Nodes_T = S ∪ EdgeNodes(E) + ``` + + i.e., seeds plus all nodes that appear in `from`, `to`, or `payload` for edges in `E`. + + By construction and by the properties of `depth_Q`: + + * `Reachable_Q(G, S) ⊆ Nodes_T` (see §4.3); and + * for any `n ∈ Reachable_Q(G, S) ∩ G.Nodes` with `n ∉ S`, there exists at least one edge in `E` with `n` in `body.from` or `body.to`, so `n ∈ EdgeNodes(E)`. + + Thus this definition is equivalent to the alternative view: + + ```text + Nodes_T = Reachable_Q(G, S) ∪ EdgeNodes(E) + ``` + + but is stated purely in terms of seeds and edges. + +4. **Construct the TraceGraph** + + ```text + TraceGraph T { + seeds = S + nodes = Nodes_T + edges = E + } + ``` + +### 4.3 Invariants + +For any `G`, finite `S : SeedSet`, `Q`, let `T = prov_trace(G, S, Q)`. Then: + +1. **Seed inclusion** + + ```text + T.seeds = S + S ⊆ T.nodes + ``` + +2. **Edge subset** + + ```text + T.edges ⊆ G.Edges + ``` + + No edges are introduced that are not in the base `ProvenanceGraph`. + +3. **Node closure over edges** + + All nodes incident to edges in `T.edges` are in `T.nodes` by construction: + + ```text + { n ∈ Node | + ∃ (ref, body) ∈ T.edges, + n ∈ body.from ∪ body.to ∪ { body.payload } + } ⊆ T.nodes + ``` + +4. **Trace node characterization** + + Trace nodes are exactly **seeds plus edge nodes**: + + ```text + T.nodes = S ∪ EdgeNodes(T.edges) + ``` + + In particular, every node in `T.nodes` that is not a seed appears at least once as an endpoint or payload of some edge in `T.edges`. + +5. **Reachable-node containment** + + ```text + Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) ⊆ T.nodes + ``` + + Seeds and all nodes reachable under `(G, Q)` are present in `T.nodes`. + + For any `n ∈ G.Nodes` with `n ∈ Reachable_Q(G, S)` and `n ∉ S`, there exists at least one edge `(ref, body) ∈ T.edges` with `n ∈ body.from ∪ body.to`. Nodes that are only seeds (possibly not in `G.Nodes`) MAY have degree zero in `T`. + +6. **No off-graph provenance** + + * All edges in `T.edges` are drawn from `G.Edges`. + * All nodes in `T.nodes` are seeds or nodes that appear somewhere in the `EdgeBody` of some edge in `G.Edges`. + + The operator does not invent nodes or edges that are not present in the underlying `ProvenanceGraph`. + +7. **Finiteness** + + Because `G` is finite (§0.4) and `S` is a finite `SeedSet` (§0.3), `T` is always finite: + + ```text + |T.edges| ≤ |G.Edges| + |T.nodes| ≤ |G.Nodes| + |S| + ``` + + In particular, for any finite seed set `S`, `prov_trace` cannot diverge or yield an infinite trace graph for a finite base `ProvenanceGraph`. + +Collecting 2–6: + +> **TGK/PROV-TRACE-SUBGRAPH/1** +> For any `ProvenanceGraph G`, finite seed set `S : SeedSet`, and query `Q`, `prov_trace(G, S, Q)` yields a finite `TraceGraph T` such that: +> +> * `T.edges ⊆ G.Edges`, and +> * every node in `T.nodes` is either a seed or incident (as `from`, `to`, or `payload`) to some edge in `T.edges`. +> +> No nodes or edges outside `G` appear in `T`. + +And more specifically: + +> **TGK/PROV-TRACE-NODES/1** +> For any `G`, finite `S : SeedSet`, `Q` with `T = prov_trace(G, S, Q)`: +> +> * `T.seeds = S`; +> * `T.nodes = S ∪ EdgeNodes(T.edges)`; and +> * `Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) ⊆ T.nodes`. +> +> In particular, all reachable nodes from `S` under `(G, Q)` are contained in `T.nodes`, but the trace may additionally contain nodes introduced as endpoints or payload nodes of incident edges. + +### 4.4 Direction and depth impact + +* `direction` and `depth_limit` affect `T.nodes` and `T.edges` **indirectly via** `N = Reachable_Q(G, S)`: + + * Looser direction or larger depth typically produce larger `N`, and thus larger `T.edges` and `T.nodes`. + * Stricter direction or smaller depth produce smaller `N`, and thus smaller `T.edges` and `T.nodes`. + +* Edge inclusion is **incident-based**, not path-based: + + * An edge is in `T.edges` if and only if it is: + + * of a selected type; and + * incident (via `from` or `to`) to at least one node in `N`. + + * There is no additional requirement that the edge itself lie along a shortest path. Nodes adjacent to closure nodes via such edges MAY lie at a graph-theoretic distance greater than the depth limit; they are still included in `T.nodes` because they are endpoints or payload nodes of included edges. + +Profiles that need path-structured traces (e.g., minimal explanation trees, trees restricted strictly to nodes within a depth bound) MUST define those as further projections of `TraceGraph` (e.g., pruning edges and nodes that do not belong to any admissible path), without modifying the kernel semantics here. + +### 4.5 Derived edge-closure helper (informative) + +For completeness, it is sometimes convenient to speak about the **edge closure** associated with `(G, S, Q)`: + +```text +prov_closure_edges(G, S, Q) = + { (ref, body) ∈ SelectedEdges(G, Q.type_filter) | + (body.from ∩ Reachable_Q(G, S) ≠ ∅) ∨ + (body.to ∩ Reachable_Q(G, S) ≠ ∅) + } +``` + +By inspection, for any `G`, finite `S : SeedSet`, `Q`: + +```text +prov_closure_edges(G, S, Q) = prov_trace(G, S, Q).edges +``` + +This helper is purely definitional and does not add semantics beyond §4.2; it is provided for profiles that wish to talk separately about edge-closure vs node-closure. + +### 4.6 TraceGraph closure and monotonicity (informative) + +Using the properties of `Reachable_Q` in §3.4, `prov_trace` enjoys corresponding algebraic properties. For any fixed `G`: + +1. **Seed monotonicity** + + If `S₁ ⊆ S₂` (finite `SeedSet`s), then for any `Q`: + + ```text + prov_trace(G, S₁, Q).nodes ⊆ prov_trace(G, S₂, Q).nodes + prov_trace(G, S₁, Q).edges ⊆ prov_trace(G, S₂, Q).edges + ``` + +2. **Depth-limit monotonicity** + + For `Q` and `Q'` identical except for `DepthLimit`, with `Q.depth_limit` absent and `Q'.depth_limit` present, or with `Q.depth_limit ≤ Q'.depth_limit` when both are present: + + ```text + prov_trace(G, S, Q).nodes ⊆ prov_trace(G, S, Q').nodes + prov_trace(G, S, Q).edges ⊆ prov_trace(G, S, Q').edges + ``` + +3. **Edge-type filter monotonicity** + + For `Q₁` and `Q₂` identical except for `type_filter`, if: + + ```text + EffectiveTypeSet(Q₁.type_filter, G) ⊆ EffectiveTypeSet(Q₂.type_filter, G) + ``` + + then for any finite seed set `S : SeedSet`: + + ```text + prov_trace(G, S, Q₁).nodes ⊆ prov_trace(G, S, Q₂).nodes + prov_trace(G, S, Q₁).edges ⊆ prov_trace(G, S, Q₂).edges + ``` + + That is, allowing more edge types (while keeping `direction` and `depth_limit` fixed) produces a super-trace that contains all nodes and edges of the more restricted trace. + +4. **Seed-union and idempotence** + + * For any finite seed sets `S₁`, `S₂` and fixed `Q`: + + ```text + prov_trace(G, S₁ ∪ S₂, Q).nodes + = prov_trace(G, S₁, Q).nodes ∪ prov_trace(G, S₂, Q).nodes + + prov_trace(G, S₁ ∪ S₂, Q).edges + = prov_trace(G, S₁, Q).edges ∪ prov_trace(G, S₂, Q).edges + ``` + + * For any finite `S` and fixed `Q`, using `S' = Reachable_Q(G, S)`: + + ```text + prov_trace(G, S, Q).nodes = prov_trace(G, S', Q).nodes + prov_trace(G, S, Q).edges = prov_trace(G, S', Q).edges + ``` + + I.e., using the closure as the new seed set does not change the trace. + +These properties are consequences of the definitions in §3 and §4 and are intended to support reasoning about compositional provenance operators in higher-level profiles. + +> **TGK/PROV-TRACE-MONO/1** +> For any fixed `ProvenanceGraph G` and query `Q`, the operator: +> +> ```text +> T_Q(G, S) = prov_trace(G, S, Q) +> ``` +> +> is: +> +> * **monotone in finite seed sets** (`S₁ ⊆ S₂ ⇒ T_Q(G, S₁) ⊆ T_Q(G, S₂)` on both nodes and edges), +> * **monotone in depth limits** (larger or absent `DepthLimit` yields a super-trace), and +> * **monotone in effective edge types** (larger `EffectiveTypeSet` yields a super-trace), +> +> and satisfies **seed-union** and **idempotence** with respect to node closure as in the equations above. Higher-layer profiles MAY rely on these properties when comparing or composing traces. + +### 4.7 Depth-aware trace views (informative) + +While `TraceGraph` itself does not carry depth annotations, implementations often need to present or persist depth information alongside traces (e.g., for layered UIs or explanation trees). + +For any `G`, finite `S : SeedSet`, `Q`, it is always valid to pair: + +```text +T = prov_trace (G, S, Q) +DMap = prov_depths(G, S, Q) +Layers = prov_layers(G, S, Q) +``` + +with the following relationships: + +* `dom(DMap) = Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) ⊆ T.nodes`. + +* For any `(n, d) ∈ DMap`, node `n` is in the `ProvLayer` with `depth = d`. + +* Nodes in `T.nodes \ dom(DMap)` (if any) are exactly the additional nodes introduced as endpoints or payload nodes of edges in `T.edges` that lie beyond the depth bound or are adjacent via incident-only inclusion (§4.4). Equivalently: + + ```text + T.nodes \ dom(DMap) = EdgeNodes(T.edges) \ Reachable_Q(G, S) + ``` + +Higher-layer profiles MAY define depth-annotated trace encodings that: + +* embed `DMap` or `Layers` into Artifacts; or +* derive narrative or visual structures from `ProvLayering`. + +Such profiles MUST treat `prov_depths` and `prov_layers` as **derived views** of the same kernel closure — they MUST NOT redefine depths in ways that contradict `depth_Q` (§3.2). + +--- + +## 5. Relationship to TGK/1-CORE Invariants + +`TGK/PROV/1` is designed to be consistent with, and to operationalize, the core TGK invariants. + +### 5.1 Provenance kernel invariant + +Recall from `TGK/1-CORE`: + +> **TGK/PROV-KERNEL/CORE/1** +> For any snapshot (finite Artifact set + profile set), the corresponding `ProvenanceGraph` is a pure function of that input, and any graph indexes or materialized views are optimizations only. + +`TGK/PROV/1` extends this to traces and depth views: + +> **TGK/PROV-TRACE-KERNEL/1** +> For any fixed `ProvenanceGraph G`, finite seed set `S : SeedSet`, and query `Q`, all of: +> +> * `prov_closure_nodes(G, S, Q)`, +> * `prov_depths (G, S, Q)`, +> * `prov_layers (G, S, Q)`, and +> * `prov_trace (G, S, Q)` +> +> are pure functions of `(G, S, Q)`. Any stored or cached trace or depth views are optimizations only and MUST be consistent with these definitions. + +Together with **TGK/PROV-CLOSURE-OP/1**, **TGK/PROV-SEED-SEM/1**, and **TGK/PROV-DEPTH-MAP/1**, this guarantees that higher-level provenance policies can treat `Reachable_Q` and the associated depth map as a well-behaved closure-and-distance structure over finite seed sets. + +### 5.2 Determinism + +From `TGK/1-CORE`: + +> **TGK/DET/CORE/1** +> For a fixed snapshot + profile set, any two implementations derive isomorphic `ProvenanceGraph`s. + +`TGK/PROV/1` inherits and refines this: + +> **TGK/PROV-DET/1** +> For any two `TGK/1-CORE`–conformant implementations that derive the same `ProvenanceGraph G` for a snapshot, and for any fixed finite `S : SeedSet` and `Q`, the results of: +> +> * `prov_closure_nodes(G, S, Q)`, +> * `prov_depths (G, S, Q)`, +> * `prov_layers (G, S, Q)`, and +> * `prov_trace (G, S, Q)` +> +> MUST be identical (same node set, same edge set, identical depth assignments, and identical per-depth layer partition) across implementations. + +### 5.3 No off-graph provenance + +From `TGK/1-CORE`: + +> **TGK/NO-OFF-GRAPH-PROV/CORE/1** +> Provenance-relevant relationships must be expressible as edges over `Reference`s; no hidden mutable state. + +`TGK/PROV/1` enforces: + +* All input is `ProvenanceGraph G`. +* All output edges are a subset of `G.Edges` (by **TGK/PROV-TRACE-SUBGRAPH/1**). +* All output nodes are seeds or nodes that appear somewhere in the `EdgeBody` of some edge in `G.Edges` (by **TGK/PROV-TRACE-NODES/1**). +* Depths and layers are derived solely from the adjacency structure of `G.Edges` and the finite seed set `S`, consistent with **TGK/PROV-DEPTH-MAP/1** and **TGK/PROV-LAYERING-PARTITION/1**. +* Seed behavior in closure is governed by **TGK/PROV-SEED-SEM/1**. +* Traversal uses only `from` and `to` endpoints for adjacency, not `payload`, as required by **TGK/PROV-PAYLOAD-NONTRAV/1**. + +Traversal and depth assignment never consult stores, clocks, external indexes, or non-graph metadata. + +--- + +## 6. Interaction with TGK/STORE/1 (Informative) + +Although `TGK/PROV/1` is in terms of `ProvenanceGraph` only, most implementations will obtain graph structure via a `TGK/STORE/1` `GraphStoreInstance`. + +For a fixed snapshot: + +* `G.Edges` corresponds to the set of all `GraphEdgeView { edge_ref, body }` reachable via: + + * `scan_edges` (if implemented), or + * adjacency queries plus `resolve_edge`. + +* `EdgeTypeFilter` here corresponds directly to `TGK/STORE/1`’s `EdgeTypeFilter`: + + * an empty `types` list means “all supported edge types actually present in the graph”; + * a non-empty `types` list is intersected with the instance’s supported edge types, and then with the set of types that actually appear in edges. + +* `Neighbors_BACKWARD`, `Neighbors_FORWARD`, and `Neighbors_BOTH` can be implemented via combinations of: + + * `edges_from(node, type_filter)` and + * `edges_to(node, type_filter)`, + + with `type_filter` derived from `Q.type_filter` and direction mapped to OUT/IN/BOTH as appropriate: + + * BACKWARD ≈ `edges_to` (IN) followed by neighbors in `from`, + * FORWARD ≈ `edges_from` (OUT) followed by neighbors in `to`, + * BOTH ≈ union of the two. + +Depth-based operators (`prov_depths`, `prov_layers`) can be implemented atop the same adjacency operations: + +* Traversal order and depth assignments must follow the semantics of `depth_Q` (§3.2), independent of the order in which the graph store returns edges. +* Any paging or batching at the `TGK/STORE/1` layer MUST NOT alter the resulting depth map or closure. + +An implementation that uses `TGK/STORE/1` MUST ensure that its traversals: + +* respect the identity and projection semantics of `TGK/STORE/1`; and +* produce node and edge sets, along with depth and layer assignments, that match `prov_closure_nodes`, `prov_depths`, `prov_layers`, and `prov_trace` exactly for some snapshot. + +Nothing in `TGK/PROV/1` requires or forbids specific paging, caching, or distribution strategies in the graph store; it only constrains the observable results. + +--- + +## 7. Examples (Informative Sketches) + +These examples are illustrative only and non-normative. + +### 7.1 Backward execution provenance + +Assume a profile `TGK/PEL/1` defines: + +* `EdgeTypeId = EDGE_EXECUTION`. +* For each execution: + + ```text + EdgeBody.type = EDGE_EXECUTION + EdgeBody.from = [program_ref] ∪ input_refs + EdgeBody.to = output_refs ∪ [execution_result_ref] + EdgeBody.payload = execution_result_ref + ``` + +Let: + +* `G` be a `ProvenanceGraph` that includes such edges. +* `S = { o }` where `o` is a particular output ArtifactRef. +* `Q` with: + + * `direction = BACKWARD` + * `type_filter = { EDGE_EXECUTION }` + * `depth_limit = absent` (unbounded) + +Then: + +* `prov_closure_nodes(G, {o}, Q)` contains: + + * `o` at depth `0`, + * the execution result and input/program nodes at depth `1`, + * and recursively all ancestors reachable via `EDGE_EXECUTION` edges. + +* `prov_depths(G, {o}, Q)` yields a map where, for example: + + * `prov_depths(...)[o] = 0`, + * `prov_depths(...)[execution_result] = 1`, + * `prov_depths(...)[input_i] = 1`, + * etc., with deeper ancestors at depth `2`, `3`, and so on. + +* `prov_layers(G, {o}, Q)` partitions these nodes into depth-based layers. + +* `prov_trace(G, {o}, Q)` includes: + + * all `EDGE_EXECUTION` edges incident (via `from` or `to`) to any node in this closure, and + * all endpoint and payload nodes of those edges. + +Higher layers may then further project `TraceGraph` and associated layers into an explanation tree, or annotate it with timings, certs, or UI overlays. + +### 7.2 Mixed execution + evidence provenance + +Suppose a profile `TGK/FER/1` defines: + +* `EDGE_RECEIPT_SUPPORTS` edges from receipt Artifacts to Artifacts supported by the receipts. + +A domain profile might choose: + +* a `type_filter` that includes both `EDGE_EXECUTION` and `EDGE_RECEIPT_SUPPORTS`; +* `direction = BACKWARD`, to follow from outputs back to inputs and receipts. + +`TGK/PROV/1` does not need to know what “execution” or “receipt” means; it only knows that these are edge types that define how to walk the graph, how to compute reachable nodes, and how to assign depths and layers. + +--- + +## 8. Conformance + +An implementation is **TGK/PROV/1–conformant** if, for each provenance operator it claims to implement, it satisfies all of the following: + +1. **Graph-based semantics** + + * Treats `ProvenanceGraph` as the sole graph abstraction. + * Does not introduce any off-graph nodes or edges into `TraceGraph` results or into any depth or layering views, consistent with **TGK/NO-OFF-GRAPH-PROV/CORE/1**, **TGK/PROV-TRACE-SUBGRAPH/1**, and **TGK/PROV-TRACE-NODES/1**. + +2. **Parameter handling** + + * Interprets `EdgeTypeFilter` as in §2.1 (including the empty-list “all types present in `G`” rule, and the monotonicity rule when effective edge-type sets are nested). + * Implements `ProvDirection` semantics exactly as in §2.2, including the distinction between BACKWARD, FORWARD, and BOTH, and the restriction that traversal uses `from` / `to` only (not `payload`), as required by **TGK/PROV-PAYLOAD-NONTRAV/1**. + * Interprets `DepthLimit` as a hop-count limit as in §2.3 (no expansions from nodes at depth `D`, with the `D = 0` seeds-only behavior). + * Treats seeds as **finite sets** of nodes: + + * All semantics in this document are defined on a finite `SeedSet S`, and conformance is judged relative to that finite set. + * If an implementation exposes APIs that accept seeds as lists or sequences, it MUST interpret them semantically as finite sets when computing provenance (ignoring duplicates and not depending on order). + * Implementations MUST NOT attempt to interpret or emulate infinite seed families when invoking `TGK/PROV/1` operators. + +3. **Node closure** + + * For any `G`, finite `S : SeedSet`, `Q`, computes `prov_closure_nodes(G, S, Q)` as in §3.3, i.e.: + + ```text + prov_closure_nodes(G, S, Q) = Reachable_Q(G, S) + ``` + + * May use any traversal algorithm (e.g., BFS, DFS with memoization) but MUST produce exactly the `Reachable_Q(G, S)` set. + + * Ensures that, for fixed `G` and `Q`, the mapping `S ↦ prov_closure_nodes(G, S, Q)` over finite seed sets behaves as a closure operator (extensive, monotone, idempotent), as stated in **TGK/PROV-CLOSURE-OP/1**. + + * Preserves the seed semantics in **TGK/PROV-SEED-SEM/1**: + + * seeds are always included in the closure, + * empty seed sets produce empty closures, + * seeds not in `G.Nodes` are in the closure but never expand the frontier. + + * Respects the **edge-type filter monotonicity** of §3.4.5: when two queries differ only in `type_filter` with nested effective type sets, the closure for the narrower edge set is a subset of the closure for the broader set. + +4. **Depth maps and layering** + + * If it exposes `prov_depths`, it MUST: + + * compute `depth_Q(G, S, n)` as in §3.2, and + * return a `DepthMap` exactly equal to the mapping in §3.6.1 (domain = `Reachable_Q(G, S) = prov_closure_nodes(G, S, Q)`, seeds at depth 0, values = minimum hop-count depth), consistent with **TGK/PROV-DEPTH-MAP/1**. + + * If it exposes `prov_layers`, it MUST: + + * derive `ProvLayering` from `prov_depths` as in §3.6.2, + * ensure that layers are ordered by strictly increasing `depth` and that nodes are partitioned by depth without overlap, consistent with **TGK/PROV-LAYERING-PARTITION/1**. + +5. **TraceGraph construction** + + * For any `G`, finite `S : SeedSet`, `Q`, computes `prov_trace(G, S, Q)` as in §4.2. + + * Ensures all invariants in §4.3 hold, including: + + * `T.edges ⊆ G.Edges`, + * `T.seeds = S`, + * `T.nodes = S ∪ EdgeNodes(T.edges)`, + * `Reachable_Q(G, S) = prov_closure_nodes(G, S, Q) ⊆ T.nodes`. + + * For any node `n` with `n ∈ G.Nodes`, `n ∈ Reachable_Q(G, S)`, and `prov_depths(G, S, Q)[n] > 0`, ensures that there exists at least one edge `(ref, body) ∈ T.edges` with `n ∈ body.from ∪ body.to`, so that every non-seed reachable node appears incident to at least one trace edge. + + * Ensures that its outputs respect the monotonicity and idempotence properties of §4.6, consistent with **TGK/PROV-TRACE-MONO/1** (monotone in seeds, depth limits, and effective edge types; seed-union and idempotence with respect to closure). + +6. **Determinism** + + * For fixed `G`, finite `S : SeedSet`, and `Q`, produces the same outputs across runs and across equivalent implementations, consistent with §5.2: + + * same closure node set, + * same depth assignments (if provided), + * same per-depth layering (if provided), + * same trace edge and node sets. + +7. **Layering** + + * Does not assume any particular physical storage or graph index model. + * If implemented on top of `TGK/STORE/1`, respects the semantics of that profile and does not alter its identity or projection guarantees. + +8. **Neutrality** + + * Does not bake in PEL-, CIL-, FER-, FCT-, or OI-specific semantics. + * Leaves provenance policies (which edge types to include and how to present them) to higher-level profiles and applications. + +Everything else — user-facing query languages, result serialization, UI representations, domain-specific policies, optimization strategies — is outside the scope of `TGK/PROV/1` and MAY be specified by additional profiles. + +--- + +## 9. Version History (Informative) + +**0.1.20 — 2025-11-16** + +* Marked the draft as stable for use as an Amduat 2.0 substrate baseline candidate. +* Performed only micro editorial cleanups (punctuation and phrasing) without changing any definitions, equations, or invariants. +* For any finite snapshot `G` and finite `SeedSet S`, the behavior of `prov_closure_nodes`, `prov_depths`, `prov_layers`, and `prov_trace` is **identical** to v0.1.19; this version simply records that the kernel provenance semantics are considered ready for review and adoption. + +**0.1.19 — 2025-11-16** + +* Added §0.4 (**Snapshots & finiteness assumptions**) to collect in one place the finiteness guarantees inherited from `TGK/1-CORE`: + + * each `ProvenanceGraph G` considered by this spec is the projection of a finite snapshot, with finite `G.Nodes` and `G.Edges`, and + * all semantics in `TGK/PROV/1` are defined relative to such a finite `G` and a finite `SeedSet`. + +* Cross-referenced that finiteness assumption where it is used: + + * in §3.2 and §3.3 when arguing that depth assignment and closure construction always terminate and yield finite results, and + * in §3.5 and §4.3 when stating that fixpoint iterations stabilize and `TraceGraph` is always finite. + +* These changes are **purely clarificatory**: for any finite snapshot `G` and finite seed set `S`, the behavior of `prov_closure_nodes`, `prov_depths`, `prov_layers`, and `prov_trace` is unchanged from v0.1.18. The new subsection and cross-references simply make the reliance on `TGK/1-CORE`’s finiteness assumptions explicit and easy to cite from other documents. + +**0.1.18 — 2025-11-16** + +* Introduced an explicit `SeedSet` alias in §0.3: + + ```text + SeedSet = set // always finite under TGK/PROV/1 + ``` + + and used this terminology in the core sections (§3, §4, §5, §8) to make “finite seed sets” a first-class, named concept rather than repeating “finite set of nodes” in prose. + +* Clarified in §1.1, §3.1, and the conformance section that all operators are defined over **finite seed sets** and that the closure operator in **TGK/PROV-CLOSURE-OP/1** is formally a map on `P_f(Node)` (finite subsets of `Node`), aligning notation with the already-stated finiteness assumptions from v0.1.17. + +* Slightly tightened wording around `SeedSet` in parameter handling (§8) to emphasize that: + + * implementations MUST interpret any seed input as a finite set (duplicates ignored, order irrelevant), and + * MUST NOT attempt to treat infinite seed families as valid inputs to `TGK/PROV/1` operators. + +These changes are purely **terminological and clarificatory**. For any finite seed set `S`, the behavior of `prov_closure_nodes`, `prov_depths`, `prov_layers`, and `prov_trace` is identical to v0.1.17; the new `SeedSet` alias simply makes the finiteness domain and closure-operator laws more explicit. + +**0.1.17 — 2025-11-16** + +* Made the **finiteness assumptions** for seeds explicit and wired them into the invariants: + + * In §0.3, stated that all seed sets `S` are assumed to be **finite**; seeds are now formally “finite sets of nodes” rather than arbitrary subsets of `Node`. + * In §3.1 and §3.3, clarified that the closure operator is defined for finite `S`, and noted explicitly that `Reachable_Q(G, S)` is always finite because both `G.Nodes` and `S` are finite. + * In §3.4 and §4.6, phrased the closure and trace monotonicity/idempotence laws in terms of **finite seed sets**, while preserving their algebraic form. + +* Tightened the **finiteness invariant for traces and depth maps**: + + * In §4.3.7, updated the `TraceGraph` finiteness argument to rely on both `G` and `S` being finite, and qualified the “cannot diverge” statement as applying to finite seed sets. + * In §3.6.1, noted that `prov_depths(G, S, Q)` is always a finite map because its domain is the finite closure `Reachable_Q(G, S)`. + +* Strengthened **conformance** around seed sets: + + * In §8 (parameter handling), elevated finiteness to a normative requirement: implementations MUST interpret seeds as finite sets and MUST NOT attempt to treat infinite seed families as inputs to `TGK/PROV/1` operators. + * Kept the existing “seeds are sets” rule (duplicates ignored, order irrelevant) and made it clear that this refers to finite sets. + +These changes are **clarificatory** for all realistic deployments, where seeds are always finite. For any finite seed set `S`, the behavior of `prov_closure_nodes`, `prov_depths`, `prov_layers`, and `prov_trace` is unchanged compared to v0.1.16. + +**0.1.16 — 2025-11-16** + +* Made the equivalence between the **named closure operator** and the internal `Reachable_Q` set completely explicit: + + * In §3.3, defined `prov_closure_nodes(G, S, Q)` directly as `Reachable_Q(G, S)` and stated the equality normatively, and in §3.4 and §3.5 referred to `Cl_Q(S)` as both `Reachable_Q(G, S)` and `prov_closure_nodes(G, S, Q)`. + * In §3.6.1 and §4.7, updated the depth and trace relationships to reference both `Reachable_Q(G, S)` and `prov_closure_nodes(G, S, Q)`, making it clear that depth maps and layers are always defined over the same closure set used by `prov_closure_nodes`. + +* Elevated the **“seeds are sets”** behavior from an informal remark to an explicit conformance requirement: + + * Kept the explanatory note in §0.3 that seeds are always treated as sets (duplicates ignored). + * In §8 (conformance), added a normative bullet under parameter handling: implementations whose APIs accept seeds as lists or sequences MUST interpret them semantically as sets when computing provenance, so results depend only on the set of seeds, not their multiplicity or order. + +* Minor editorial consistency cleanups: + + * Clarified in §3.6.2 that when the closure is empty, `ProvLayering.layers` is an empty list (while still forbidding empty individual layers). + * Slightly rephrased invariant statements in §3.6.1, §4.3, and §4.7 to consistently mention both `Reachable_Q(G, S)` and `prov_closure_nodes(G, S, Q)`, reducing the chance of misreading them as distinct concepts. + +These changes are **purely clarificatory** relative to v0.1.15: they do not alter the behavior of `prov_closure_nodes`, `prov_depths`, `prov_layers`, or `prov_trace` for any `(G, S, Q)`, but make the correspondence between the closure operator, its depth views, and the conformance obligations easier to see and reason about. + +**0.1.15 — 2025-11-16** + +* Clarified the behavior of **self-loops and repeated endpoints** in neighbor semantics: + + * In §2.2, made explicit that if a selected edge has the same node in both `from` and `to`, that node is its own neighbor under `FORWARD` and `BACKWARD`, and appears only once as a neighbor under `BOTH`. + * Stated that multiple occurrences of the same node within `from` and/or `to` do not change neighbor sets, since neighbors are mathematical sets. + +* Strengthened the **depth and layering explanations** around seeds: + + * In §3.6.1, highlighted that `dom(prov_depths)` is exactly the closure `Reachable_Q(G, S)` and that seeds always appear in the depth map at depth 0, even when they are not in `G.Nodes`. + * In §3.6.2, reiterated that when `S` is non-empty, seeds always appear in the unique layer with `depth = 0`, making the layering behavior around seeds precise and easy to rely on. + +* Minor editorial and consistency improvements: + + * Clarified in §0.3 that seeds are treated as **sets** of nodes (deduplicated), even if supplied as lists in concrete APIs. + * Slightly tightened the explanation in §4.2–§4.3 that trace nodes are **exactly** seeds plus nodes appearing in the included edges, while reachable nodes are guaranteed to be a subset of that node set. + * Removed a stray editorial fragment at the end of the 0.1.0 history entry, keeping the version history clean and self-contained. + +These changes are **pure clarifications** over v0.1.14. They do not alter the semantics of `prov_closure_nodes`, `prov_depths`, `prov_layers`, or `prov_trace` for any `(G, S, Q)`; they simply make certain edge cases (self-loops, duplicated endpoints, and seed behavior in depth/layer views) more explicit for implementers and profile authors. + +**0.1.14 — 2025-11-16** + +* Made the **non-traversal role of payload** fully explicit and normative: + + * In §2.2 (“Payload nodes”), introduced **TGK/PROV-PAYLOAD-NONTRAV/1**, stating that kernel provenance traversal MUST NOT use `EdgeBody.payload` as a source of neighbors and that adjacency, depth, and closure are defined purely in terms of `from` and `to`. + * Clarified that profiles wishing to traverse payload values MUST either represent those relationships via additional edges or define profile-specific traversal on top of this kernel, without altering the core neighbor functions. + +* Gave the **trace monotonicity and idempotence** properties an explicit rule name: + + * In §4.6, collected the existing seed, depth-limit, and edge-type monotonicity properties, plus the seed-union and idempotence equalities, into **TGK/PROV-TRACE-MONO/1** for easier reference from higher-layer profiles. + * Emphasized that `prov_trace` is monotone in seeds, depth limits, and effective edge types, and idempotent under reseeding with the node closure, mirroring the closure-operator law **TGK/PROV-CLOSURE-OP/1**. + +* Tightened **conformance** around these invariants: + + * In §8, required implementations to enforce the payload non-traversal rule (**TGK/PROV-PAYLOAD-NONTRAV/1**) as part of parameter handling, and to ensure that trace outputs satisfy the monotonicity and idempotence properties summarized in **TGK/PROV-TRACE-MONO/1**. + * These obligations are purely **expository**: they make explicit requirements that were already implied by the earlier definitions of `Neighbors_dir`, `Reachable_Q`, and `prov_trace`. No operational semantics changed compared to v0.1.13; the same graphs, closures, depths, and traces are computed for any given `(G, S, Q)`. + +**0.1.13 — 2025-11-16** + +* Made the **dependence on edge-type selection** explicitly monotone throughout the kernel: + + * In §2.1, clarified that `EffectiveTypeSet(filter, G)` not only determines which edges participate in traversal, but also induces a natural monotonicity: enlarging the effective edge-type set can only add neighbors, reachable nodes, and trace edges, never remove them. + * In §2.2, noted that, for fixed `G`, `n`, and `direction`, `Neighbors_dir(G, n, filter)` is monotone in `EffectiveTypeSet(filter, G)` and always returns nodes in `G.Nodes`, making the endpoint set of neighbors explicit. + * In §3.4, added an **edge-type filter monotonicity** property as item 5: when two queries differ only in `type_filter` and one effective type set is a subset of the other, the corresponding closures are nested (`Reachable_Q₁ ⊆ Reachable_Q₂`). + * In §4.6, extended the trace monotonicity discussion with an explicit **edge-type filter monotonicity** rule for `prov_trace`: allowing more edge types produces a super-trace that contains all nodes and edges of the more restricted trace. + +* Tightened **conformance** around filter behavior: + + * In §8, updated the parameter-handling and node-closure bullets to require that implementations respect the new edge-type monotonicity property (derived directly from the semantics), so that higher layers can safely depend on “more edge types ⇒ bigger or equal trace/closure” when composing provenance policies. + +* All changes are **strictly definitional**: they give names and explicit statements to monotonicity properties already implied by the original `SelectedEdges`, `Neighbors_dir`, and `Reachable_Q` definitions. The operational behavior of `prov_closure_nodes`, `prov_depths`, `prov_layers`, and `prov_trace` for any concrete `G`, `S`, `Q` is unchanged compared to v0.1.12. + +**0.1.12 — 2025-11-16** + +* Captured seed behavior as an explicit, named invariant: + + * In §3.3, introduced **TGK/PROV-SEED-SEM/1**, summarizing the seed-related rules already implied by the closure definition: + + * seeds are always included in the closure, regardless of filters or depth limits; + * an empty seed set yields an empty closure; + * seeds not in `G.Nodes` are included in the closure but never expand the frontier or affect reachability inside `G.Nodes`. + + * Clarified that changes to `EdgeTypeFilter` or `DepthLimit` may add or remove non-seed reachable nodes, but MUST NOT remove seeds from the closure. + +* Wired the new seed invariant into: + + * §5.1, so that the provenance kernel description explicitly references both closure-operator behavior and seed semantics; and + * §8 (conformance), by requiring implementations to preserve the seed semantics in **TGK/PROV-SEED-SEM/1** alongside the existing closure, depth, layering, and trace invariants. + +* Removed minor duplicated trailing text in the v0.1.0 history entry for editorial cleanliness, without changing any normative semantics. + +**0.1.11 — 2025-11-16** + +* Made the depth semantics more explicit and named the key invariants: + + * In §3.6.1, clarified that `prov_depths` always includes **all seeds**, even seeds not in `G.Nodes`, with depth 0, and that its domain is exactly `Reachable_Q(G, S)`. + * Introduced **TGK/PROV-DEPTH-MAP/1** to capture the invariant that `prov_depths`: + + * has domain equal to the closure `Reachable_Q(G, S)`, + * assigns seeds depth 0, and + * assigns each node its minimal admissible-path hop-count depth. + +* Tightened the layering definition and gave it a named invariant: + + * In §3.6.2, emphasized that `prov_layers` is a **strict partition** of `Reachable_Q(G, S)` into non-empty, disjoint depth-labeled layers whose union is exactly the reachable node set. + * Introduced **TGK/PROV-LAYERING-PARTITION/1** to make this partition property easy to cite from higher-layer profiles. + +* Updated §5.1–§5.3 and §8 to reference the new invariants, so conformance explicitly guarantees: + + * closure-operator behavior for `Reachable_Q` (**TGK/PROV-CLOSURE-OP/1**), + * canonical minimal-depth semantics for `prov_depths` (**TGK/PROV-DEPTH-MAP/1**), and + * partition semantics for `prov_layers` (**TGK/PROV-LAYERING-PARTITION/1**), + without changing any substantive behavior compared to v0.1.10. + +**0.1.10 — 2025-11-16** + +* Introduced an explicit **closure-operator invariant** for node closure: + + * In §3.4, added **TGK/PROV-CLOSURE-OP/1**, stating that for fixed `G` and `Q` the map `S ↦ Reachable_Q(G, S)` is a Kuratowski-style closure operator on subsets of `Node` (extensive, monotone, idempotent). + * This makes it easy for higher-level profiles to rely on algebraic properties of provenance closure when composing queries or policies. + +* Gave the trace subgraph and node characterization explicit **normative rule names**: + + * In §4.3, collected the existing invariants into: + + * **TGK/PROV-TRACE-SUBGRAPH/1** (trace is a finite subgraph of `G` with no invented nodes or edges), and + * **TGK/PROV-TRACE-NODES/1** (node characterization as `seeds ∪ EdgeNodes(edges)` plus reachable-node containment). + + * Wired these invariants into §5.3 and §8 so conformance is explicitly tied to them. + +* Slightly tightened the prose around the relationships between `Reachable_Q`, `TraceGraph.nodes`, and the depth/layer views: + + * Clarified in §4.7 that `T.nodes \ dom(DMap)` is exactly `EdgeNodes(T.edges) \ Reachable_Q(G, S)`, i.e. the “extra” nodes introduced purely as endpoints or payload nodes of incident edges. + * Emphasized in §5.1 that **TGK/PROV-TRACE-KERNEL/1** and **TGK/PROV-CLOSURE-OP/1** jointly guarantee a clean, graph-only kernel for higher-layer provenance semantics. + +* All changes are **strictly definitional** and do not alter the behavior of `prov_closure_nodes`, `prov_depths`, `prov_layers`, or `prov_trace` compared to v0.1.9; they only assign names to already-implied properties and link them into conformance. + +**0.1.9 — 2025-11-16** + +* Tightened the **TraceGraph node-set characterization**: + + * Defined `TraceGraph.nodes` canonically as `seeds ∪ EdgeNodes(edges)` in §4.2, and made this equality explicit as an invariant in §4.3. + * Noted that this is equivalent to the earlier `Reachable_Q(G, S) ∪ EdgeNodes(E)` view but expressed purely in terms of seeds and edges, which simplifies validation of persisted traces. + +* Strengthened the link between **depths and edges**: + + * Clarified in §3.6.1 that any node in `G.Nodes` with depth > 0 must have at least one “parent along some shortest path” edge consistent with `Q.direction`. + * Reflected this in conformance (§8) by requiring that any such node appear incident (via `from` / `to`) to at least one edge in the corresponding `TraceGraph`. + +* Refined **depth-aware trace relationships**: + + * In §4.7, made explicit that `T.nodes \ dom(prov_depths)` is exactly `EdgeNodes(T.edges) \ Reachable_Q(G, S)`, i.e. the additional nodes introduced as endpoints or payload nodes of edges that lie beyond the depth bound or appear only via incident-based inclusion. + +* Confirmed that all changes are **semantically compatible** with v0.1.8: the closure, depth, layering, and trace operators still compute the same results; the new material only sharpens equalities, invariants, and conformance obligations. + +**0.1.8 — 2025-11-16** + +* Made the **depth map** and **layering** views first-class: + + * Introduced `DepthMap`, `ProvLayer`, and `ProvLayering` types in §0.3. + * Defined a normative `prov_depths(G, S, Q)` operator (§3.6.1) whose domain is exactly `Reachable_Q(G, S)` and whose values are the minimum hop-count depths already implicit in `depth_Q`. + * Defined a normative `prov_layers(G, S, Q)` operator (§3.6.2) that partitions the reachable node set into non-empty depth-labeled layers, ordered by increasing depth. + +* Extended the **kernel invariants** to cover depth views: + + * Updated `TGK/PROV-TRACE-KERNEL/1` (§5.1) and `TGK/PROV-DET/1` (§5.2) to include `prov_depths` and `prov_layers` alongside `prov_closure_nodes` and `prov_trace`. + +* Clarified depth-aware trace usage: + + * Added §4.7 to describe how `TraceGraph`, `DepthMap`, and `ProvLayering` relate, and to note that nodes outside `dom(prov_depths)` are precisely those introduced as endpoints of incident edges beyond the depth bound or via incident-only inclusion. + +* Updated the **conformance** section (§8) so that, when implementations choose to expose depth or layering, those views MUST be derived exactly from `depth_Q` as defined in §3.2–§3.3. + +All changes were **purely definitional** and did not alter the semantics of `prov_closure_nodes` or `prov_trace` as defined in v0.1.7. + +**0.1.7 — 2025-11-16** + +* Extended the closure properties in §3.4 with an explicit **seed-union** law and **idempotence** of `Reachable_Q` when used as a seed set, making the algebraic behavior of the closure operator easier to reference from higher layers. +* Introduced a new §4.6 to collect the corresponding **TraceGraph monotonicity and idempotence** properties for `prov_trace`, showing how changes in seeds and depth limits affect both node and edge sets and how re-seeding with the closure leaves the trace unchanged. +* Kept all changes purely definitional and algebraic; no operational semantics were altered, and all behaviors remain compatible with v0.1.6. + +**0.1.6 — 2025-11-16** + +* Corrected the treatment of seeds that are not in `G.Nodes` (§3.3), making it explicit that such seeds cannot appear as endpoints of edges in `G.Edges` by the definition of `ProvenanceGraph.Nodes` in `TGK/1-CORE`, and therefore influence only the seed/closure node sets and not the traversed portion of the graph. +* Introduced an explicit definition of **admissible `Q`-paths** in §3.4 and rewrote the path characterization there to refer to these paths, tightening the connection between the inductive depth construction and the intuitive “shortest path length” interpretation. +* Performed minor editorial cleanups around the closure properties to keep the relationship between `depth_Q`, admissible paths, and BFS-style semantics precise and easy to reference from higher-level profiles. + +**0.1.5 — 2025-11-16** + +* Clarified `DepthLimit` semantics, making the `D = 0` “seeds-only” special case explicit and aligning the prose more clearly with the hop-count view already implied by the inductive depth definition. +* Tightened the description of how empty seed sets and empty `EffectiveTypeSet` interact with depth limits, ensuring the resulting closure behavior is fully spelled out and obviously consistent across cases. +* Added an explicit **finiteness** invariant for `TraceGraph`, noting that `prov_trace` always returns a finite result when `ProvenanceGraph` is finite and seed sets are finite in practice. +* Introduced an informative **fixpoint characterization** (§3.5) of unbounded closures as the least fixpoint of a monotone operator, to make reasoning about monotonicity and idempotence easier for higher-level profiles. +* Strengthened the `TGK/STORE/1` interaction discussion (§6) to state explicitly that `EdgeTypeFilter` here shares semantics with `TGK/STORE/1`’s filter, and to map `ProvDirection` to `GraphDirection`-style OUT/IN/BOTH neighbor semantics more concretely. + +**0.1.4 — 2025-11-16** + +* Made the behavior of `prov_closure_nodes` under special cases explicit and normative: + + * Seeds not in `G.Nodes` are always depth-0 but never generate neighbors; they appear only as seeds and closure nodes. + * An empty seed set yields an empty closure for any `Q`. + * If `EffectiveTypeSet(filter, G)` is empty, node closure is exactly the seed set. + +* Re-expressed the trace edge set `E` in §4.2 directly as: + + ```text + E = ⋃ { IncEdges(G, n, filter) | n ∈ N } + ``` + + and noted its equivalence to the previous comprehension, so that `SelectedEdges`, `IncEdges`, and the neighbor definitions form an obviously composable family of helpers. + +* Performed minor editorial cleanups to keep the distinction between sets, lists, and ordering explicit, and to underline that any ordering concerns live in `TGK/STORE/1`, not in this provenance kernel. + +**0.1.3 — 2025-11-16** + +* Made the role of `SelectedEdges(G, filter)` and `IncEdges(G, n, filter)` explicit and reused them consistently between closure and trace definitions, to make the composition of definitions clearer and reduce repetition. +* Clarified the inductive definition of `depth_Q` so that seeds that are not in `G.Nodes` are explicitly acknowledged as depth-0 but never appear in any `Frontier_d`, while still being included in `Reachable_Q(G, S)` (§3.2–§3.3). +* Tightened language around set semantics (no ordering, no multiplicity) and made it explicit that any ordering concerns live in `TGK/STORE/1`, not this profile. +* Slightly sharpened the mapping between `prov_trace` and the derived helper `prov_closure_edges`, and made the use of `SelectedEdges` in §4.2 structurally parallel to the node-closure definitions. + +**0.1.2 — 2025-11-16** + +* Added an explicit **closure properties** subsection (§3.4) to capture: + + * seed monotonicity (more seeds → superset of reachable nodes), + * depth-limit monotonicity (larger depth bound → superset of reachable nodes), + * and the path characterization of `depth_Q` (BFS-style minimum path length). + +* Introduced a small, purely definitional helper `prov_closure_edges(G, S, Q)` (§4.5) to make it easier for profiles to talk separately about edge-closure vs node-closure, while keeping it normatively equal to `prov_trace(...).edges`. + +* Clarified in §2.2 / §3.2 that neighbor and frontier sets are always taken over `G.Nodes`, and that seeds outside `G.Nodes` never expand the frontier but remain part of the closure and in `TraceGraph.nodes`. + +**0.1.1 — 2025-11-16** + +* Clarified `TraceGraph` construction so that: + + * `TraceGraph.edges` is the set of selected-type edges incident (via `from`/`to`) to at least one node in the closure `Reachable_Q(G, S)`. + * `TraceGraph.nodes` is the union of the closure nodes and all endpoint/payload nodes of those edges (`EdgeNodes(E)`), ensuring that all nodes incident to edges in the trace are present in `TraceGraph.nodes`. + +* Fixed the node-set definition to guarantee the “node closure over edges” invariant: + + ```text + { incident nodes of T.edges } ⊆ T.nodes + ``` + +* Made the incident-based nature of edge inclusion explicit in §4.4, and noted that nodes adjacent to closure nodes via included edges may lie beyond the depth bound but are still part of `TraceGraph` as edge endpoints. + +* Introduced helper notions `SelectedEdges(G, filter)`, `IncEdges(G, n, filter)`, and `EdgeNodes(E)` for clearer, reusable definitions. + +* Tightened the description of seeds not in `G.Nodes`: they receive depth `0`, do not expand the traversal, but remain in the closure and in `TraceGraph.nodes`. + +**0.1.0 — 2025-11-16** + +* Initial draft of `TGK/PROV/1` as a kernel provenance semantics layer over `TGK/1-CORE`. + +* Defined core parameter model: + + * `ProvDirection` (BACKWARD, FORWARD, BOTH), + * `EdgeTypeFilter` (edge-type selection), + * `DepthLimit` (hop-count bound). + +* Introduced `prov_closure_nodes` as a deterministic node-set closure operator over a fixed `ProvenanceGraph` and query. + +* Introduced `TraceGraph` and `prov_trace` as a subgraph projection induced by node closure, with: + + * incident-based edge inclusion for selected edge types, and + * payload nodes included but not traversed. + +* Aligned semantics explicitly with TGK kernel invariants: + + * `TGK/PROV-KERNEL/CORE/1`, + * `TGK/DET/CORE/1`, + * `TGK/NO-OFF-GRAPH-PROV/CORE/1`. + +* Documented the relationship with `TGK/STORE/1` (graph store) as informative only, keeping `TGK/PROV/1` store-neutral and purely semantic. + +--- + +## Document History + +* **0.1.20 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline. diff --git a/tier1/tgk-store-1.md b/tier1/tgk-store-1.md new file mode 100644 index 0000000..d7b80e8 --- /dev/null +++ b/tier1/tgk-store-1.md @@ -0,0 +1,1156 @@ +# TGK/STORE/1 — Graph Store and Query Semantics + +Status: Approved +Owner: Niklas Rydberg +Version: 0.2.4 +SoT: Yes +Last Updated: 2025-11-16 +Linked Phase Pack: N/A +Tags: [traceability, execution] + + + +**Document ID:** `TGK/STORE/1` +**Layer:** L1.7 — Graph store & query profile over ASL/1-STORE + TGK/1-CORE + +**Depends on (normative):** + +* `ASL/1-CORE v0.4.x` — value substrate: `Artifact`, `Reference`, `TypeTag`, identity model +* `ASL/1-STORE v0.4.x` — content-addressable store: `StoreInstance`, `StoreConfig`, `put/get` +* `TGK/1-CORE v0.7.x` — trace graph kernel: `Node`, `EdgeBody`, `EdgeTypeId`, `ProvenanceGraph` + +**Informative references:** + +* `SUBSTRATE/STACK-OVERVIEW v0.3.x` — layering & dependency discipline +* `ENC/ASL1-CORE v1.x` — canonical `ArtifactBytes` / `ReferenceBytes` +* `HASH/ASL1 v0.2.x` — ASL1 hash family (`HashId` registry) +* `ENC/TGK1-EDGE/1 v0.1.x` — canonical encoding for TGK `EdgeBody` / EdgeArtifacts +* `PEL/1-SURF v0.2.x` — store-backed execution surface (producer of many EdgeArtifacts) +* `CIL/1`, `FER/1`, `FCT/1`, `OI/1` — provenance and fact profiles that consume graph queries +* (future) `TGK/PROV/1` — provenance / trace operators over `ProvenanceGraph` + +> **Normativity note** +> `TGK/STORE/1` is a near-core **graph store profile**, not a kernel surface. It introduces no new identity scheme. It defines how to expose and query the `ProvenanceGraph` from `TGK/1-CORE` over Artifacts reachable via `ASL/1-STORE` or equivalent feeds, and constrains graph indexes and query semantics. + +© 2025 Niklas Rydberg. + +## License + +Except where otherwise noted, this document (text and diagrams) is licensed under +the Creative Commons Attribution 4.0 International License (CC BY 4.0). + +The identifier registries and mapping tables (e.g. TypeTag IDs, HashId +assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 +Universal (CC0) to enable unrestricted reuse in implementations and derivative +specifications. + +Code examples in this document are provided under the Apache License 2.0 unless +explicitly stated otherwise. Test vectors, where present, are dedicated to the +public domain under CC0 1.0. + + +--- + +## 0. Conventions + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, +**SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** are to be +interpreted as described in RFC 2119. + +From `ASL/1-CORE`: + +```text +Artifact { + bytes: OctetString + type_tag: optional TypeTag +} + +Reference { + hash_id: HashId // uint16 + digest: OctetString +} + +TypeTag { + tag_id: uint32 +} + +HashId = uint16 +EncodingProfileId = uint16 +```` + +From `ASL/1-STORE` (snapshot view): + +```text +StoreConfig { + encoding_profile: EncodingProfileId + hash_id: HashId +} + +// Logical view: for a snapshot +StoreInstance.M : Reference -> Artifact // 0 or 1 Artifact per Reference +``` + +From `TGK/1-CORE`: + +```text +Node := Reference + +EdgeTypeId = uint32 + +EdgeBody { + type: EdgeTypeId + from: Node[] // ordered, MAY be empty + to: Node[] // ordered, MAY be empty + payload: Reference // always present +} + +ProvenanceGraph { + Nodes: set + Edges: set<(EdgeRef, EdgeBody)> +} +``` + +In this document: + +* **ArtifactRef**, **Node**, and **EdgeRef** are all ASL/1 `Reference` values. +* **ExecutionEnvironment** is as in `TGK/1-CORE`: an abstract context with a finite Artifact set and a fixed TGK profile set “in effect” at a snapshot. + +Additional terms introduced here: + +```text +GraphStoreInstance -- logical graph store as seen by TGK/STORE/1 +GraphStoreConfig -- configuration of a GraphStoreInstance +GraphStoreSnapshot -- logical state of a GraphStoreInstance at an instant +GraphEdgeView -- (EdgeRef, EdgeBody) pair +GraphDirection -- OUT, IN, BOTH +EdgeTypeFilter -- subset of EdgeTypeId values, possibly empty (“ALL”) +PageToken -- opaque cursor for paginated scans +``` + +`TGK/STORE/1` defines **logical semantics only**. Concrete APIs (HTTP, gRPC, language bindings), index structures, physical storage, distribution, and replication are out of scope. + +--- + +## 1. Purpose, Scope & Non-Goals + +### 1.1 Purpose + +`TGK/STORE/1` defines the **graph store abstraction and basic query semantics** for the TGK trace graph over ASL/1 Artifacts. + +It provides: + +* A model of a **GraphStoreInstance** that: + + * draws its Artifacts from one or more `ASL/1-STORE` instances or equivalent sources; and + * exposes the **ProvenanceGraph** defined by `TGK/1-CORE` as a queryable graph. + +* A minimal set of **query operations** over that graph: + + * **Edge resolution** by `EdgeRef`: + + ```text + resolve_edge(EdgeRef) -> EdgeBody | error + ``` + + * **Adjacency queries** by `Node` and direction (expressed as three operations): + + ```text + edges_from(node, type_filter) -> list + edges_to(node, type_filter) -> list + edges_incident(node, type_filter) -> list + ``` + + * An optional **edge scan** surface: + + ```text + scan_edges(type_filter, page_token?) -> + (list, next_page_token?) + ``` + + * An optional **neighbor query** surface: + + ```text + neighbors(node, type_filter, direction) -> list + ``` + +* A small **error model** for graph-level queries (e.g. “ref is not an edge Artifact in this graph”). + +Goals: + +* **Projection fidelity** — all graph views MUST be consistent with the `ProvenanceGraph` projection defined by `TGK/1-CORE` for some Artifact set and TGK profile set. +* **Identity discipline** — nodes and edges are identified solely by ASL/1 `Reference` values; no new ID schemes are introduced. +* **Separation of concerns** — graph queries are separate from provenance algorithms and policy decisions (defined in `TGK/PROV/1` and higher profiles). + +### 1.2 Non-goals + +`TGK/STORE/1` explicitly does **not** define: + +* A graph query language (path expressions, joins, aggregations). +* Provenance or trace operators (e.g. “all ancestors under these edge types”) — those belong to `TGK/PROV/1`. +* Certificate, fact, or overlay semantics (`CIL/1`, `FER/1`, `FCT/1`, `OI/1`). +* How Artifacts are transported between stores or replicated. +* Concrete API shapes, authentication, authorization, multi-tenancy, quotas. +* Index layout, sharding, partitioning, or caching strategies. + +It is strictly about: + +> **TGK/STORE-SCOPE/1** +> Given a finite Artifact set and a TGK profile configuration, `TGK/STORE/1` defines what graph exists (via `TGK/1-CORE`) and how basic, identity-preserving graph queries MUST behave. + +### 1.3 Layering and dependencies + +`TGK/STORE/1` sits: + +* **Above ASL/1-STORE**: + + * Assumes Artifacts are retrievable via content-addressable stores or equivalent feeds. + * Does not change `put/get` semantics or introduce new identity notions. + +* **Above TGK/1-CORE**: + + * Reuses the `ProvenanceGraph` definition as a pure projection over Artifacts and TGK profiles. + * Does not redefine `EdgeBody` or `ProvenanceGraph`. + +* **Below provenance and fact profiles**: + + * `TGK/PROV/1`, `FER/1`, `FCT/1`, `CIL/1`, `OI/1` MAY build higher-level query and interpretation surfaces on top of `TGK/STORE/1`. + +Layering invariants: + +> **TGK/STORE-LAYERING/1** +> `TGK/STORE/1` **MUST NOT**: +> +> * introduce any new notion of identity for nodes or edges; +> * bake in scheme-specific or domain-specific semantics (e.g. PEL, CIL, FCT); +> * violate `TGK/GRAPH-PROJECTION/CORE/1` or `TGK/DET/CORE/1`. + +All graph edges seen through a GraphStoreInstance **MUST** be derivable exactly as in `TGK/1-CORE` from: + +* some finite underlying Artifact set; and +* that instance’s configured TGK profile set (edge tags, encodings, type catalogs). + +--- + +## 2. Core Graph Store Model + +### 2.1 GraphStoreInstance and GraphStoreSnapshot + +A **GraphStoreInstance** is an abstract component that, at any given logical instant, has a **GraphStoreSnapshot**. + +For a given snapshot, a GraphStoreInstance is characterized by: + +```text +GraphStoreSnapshot { + config: GraphStoreConfig + + // Logical, not necessarily materialized: + Artifacts: finite set + Provenance: ProvenanceGraph +} +``` + +Where: + +* `Artifacts` is the finite set of Artifacts this snapshot “sees” (logically reachable from its backing stores, archives, feeds, etc.) and that fall within its configured identity domains. +* `config.tgk_profiles` is the TGK profile set “in effect” at this snapshot. +* `Provenance` is the unique `ProvenanceGraph` induced by `(Artifacts, config.tgk_profiles)`, as defined in `TGK/1-CORE §4.1` and `TGK/GRAPH-PROJECTION/CORE/1`. + +A GraphStoreSnapshot is conceptually an instance of a `TGK/1-CORE` ExecutionEnvironment snapshot, plus an explicit description of which identity domains and sources contribute to its Artifact set. + +Implementations are **not required** to materialize `Artifacts` or `Provenance` explicitly. They MAY: + +* maintain incremental indexes, +* cache decoded edges, +* stream edges from backing stores, + +so long as all graph query semantics in this document are defined **as if** queries were evaluated directly over `Provenance.Edges` and `Provenance.Nodes` for some snapshot. + +A GraphStoreInstance is **logically read-only** with respect to the graph: it does not create or mutate Artifacts. It only reflects whatever Artifacts and profiles are in scope for each snapshot. Writing edges is done by writing EdgeArtifacts (and other Artifacts) into the underlying stores or feeds. + +For a given GraphStoreInstance, `GraphStoreConfig` is fixed across snapshots. Config changes (e.g. adding or removing identity domains or TGK profiles) SHOULD be represented as a new GraphStoreInstance or a clearly versioned reconfiguration, not as an in-place mutation of an existing instance’s config. + +### 2.2 GraphStoreConfig + +A **GraphStoreConfig** describes the identity and provenance view for a GraphStoreInstance: + +```text +GraphStoreConfig { + id_space: IdSpaceConfig + artifact_scope: ArtifactScope + tgk_profiles: TGKProfileSet +} +``` + +Snapshots of a given GraphStoreInstance share the same `GraphStoreConfig`; they differ only in `Artifacts` and the derived `ProvenanceGraph`. + +#### 2.2.1 Identity domains (`IdSpaceConfig`) + +```text +IdSpaceConfig { + domains: list +} + +IdentityDomain { + encoding_profile: EncodingProfileId // e.g. ASL_ENC_CORE_V1 + hash_id: HashId // e.g. 0x0001 (HASH-ASL1-256) +} +``` + +Constraints: + +* Each `IdentityDomain` MUST be compatible with `ASL/1-CORE` and with the encoding profile it claims to support. +* A GraphStoreInstance MAY support multiple identity domains, but it MUST treat each `(encoding_profile, hash_id)` pair as a distinct identity domain for Artifact resolution. It MUST NOT collapse or alias different domains at this layer. +* GraphStoreInstances MUST treat `Reference` values as opaque identities: they MUST NOT attempt to “re-derive” or reinterpret `digest` bytes under a different encoding profile or hash algorithm. + +For any `Reference ref`, the GraphStoreInstance MUST either: + +* associate `ref.hash_id` with exactly one `IdentityDomain` in `id_space.domains`; or +* treat that `hash_id` as **unsupported** for graph purposes. + +The binding from a `Reference` to an `IdentityDomain` (e.g. to a particular `StoreInstance` or federation) is an implementation concern. `TGK/STORE/1` only requires that, for all domains, the effective `Reference -> Artifact` mapping behaves like an `ASL/1-STORE`-style partial function. + +#### 2.2.2 Artifact scope (`ArtifactScope`) + +```text +ArtifactScope { + // Informative description of where Artifacts come from. + // Examples: + // - a single StoreInstance + // - a set of StoreInstances + // - a union of stores plus imported archives + description: OctetString +} +``` + +`ArtifactScope` is descriptive; `TGK/STORE/1` does not standardize how Artifacts are discovered, ingested, or kept up to date. The only semantic requirement is that, for any snapshot, there is a well-defined finite `Artifacts` set over which the `ProvenanceGraph` is computed. + +#### 2.2.3 TGK profile set (`TGKProfileSet`) + +```text +TGKProfileSet { + edge_tags: set // TypeTag.tag_id values treated as edge tags + edge_types: set // supported edge types + encodings: set // edge encoding profiles (e.g. TGK1_EDGE_ENC_V1) + // plus optional catalog/profile identifiers +} +``` + +Constraints (reflecting `TGK/1-CORE §3`): + +* For any `TypeTag.tag_id` in `edge_tags`, the profiles in `encodings` MUST define consistent decoding behavior for Artifacts with that tag. If more than one profile applies to a given Artifact, they MUST all decode it to the same `EdgeBody` value. +* Only `EdgeBody.type` values in `edge_types` are considered supported edge types for this instance. Artifacts that otherwise look like edges but whose decoded `EdgeBody.type` is not in `edge_types` MUST NOT contribute edges to `Provenance`. + +> **Environment-relative note** +> As in `TGK/1-CORE`, edgehood and edge-type semantics are relative to `TGKProfileSet`. Different GraphStoreInstances over the same physical Artifacts but with different profile sets MAY expose different edges. `TGK/STORE/1` guarantees determinism only *relative* to a given `GraphStoreConfig` and snapshot. + +### 2.3 Relationship to ASL/1-STORE + +`TGK/STORE/1` assumes that `Artifacts` are obtained from one or more sources that are, at the logical level, compatible with `ASL/1-STORE`’s model: + +```text +Reference -> Artifact // partial, immutable mapping per identity domain +``` + +A GraphStoreInstance: + +* MAY be implemented directly on top of one or more `StoreInstance`s; +* MAY draw from additional Artifact feeds (e.g. import-only archives, append-only logs); +* MUST, for each identity domain in `IdSpaceConfig.domains`, behave as if there is a pure mapping: + + ```text + resolve_artifact(ref: Reference) -> + Artifact + | ERR_NOT_FOUND + | ERR_INTEGRITY + | ERR_UNSUPPORTED + ``` + + that is consistent with `ASL/1-STORE` semantics for that domain. + +If multiple physical sources are combined for a given identity domain, the effective `resolve_artifact` mapping MUST still satisfy `ASL/1-STORE`’s constraints: for any `ref`, at most one `Artifact` value; any detection of conflicting Artifacts for the same `ref` MUST result in an integrity error surfaced to callers. + +If `ref.hash_id` does not belong to any `IdentityDomain` in `IdSpaceConfig.domains`, `resolve_artifact(ref)` MUST behave as `ERR_UNSUPPORTED` for the purposes of this profile. + +`TGK/STORE/1` does **not** define new store operations. It constrains how graph queries may use Artifact resolution and how underlying store errors propagate into graph-level errors. + +--- + +## 3. Graph Store Types & Error Model + +### 3.1 Node, EdgeRef, GraphEdgeView + +These are simply re-aliases of TGK and ASL types: + +```text +Node := Reference // TGK node +EdgeRef := Reference // reference to an EdgeArtifact +``` + +For convenience, this profile defines: + +```text +GraphEdgeView { + edge_ref : EdgeRef + body : EdgeBody +} +``` + +* Each `GraphEdgeView` represents a single edge in the `ProvenanceGraph.Edges` set. +* For any snapshot and any `GraphEdgeView { edge_ref, body }` produced by the graph store, the following MUST hold: + + ```text + resolve_edge(edge_ref) = body + ``` + + when `resolve_edge` is evaluated against the same snapshot. + +### 3.2 Direction and type filter + +Graph direction is a small enum that higher-level APIs or profiles MAY use: + +```text +GraphDirection (u8) { + OUT = 1 // edges where node ∈ EdgeBody.from + IN = 2 // edges where node ∈ EdgeBody.to + BOTH = 3 // union of OUT and IN +} +``` + +`EdgeTypeFilter` is defined as: + +```text +EdgeTypeFilter { + // If empty, match all supported edge types. + types: list +} +``` + +Matching semantics: + +* If `types` is empty, all `EdgeBody.type` values in the instance’s `tgk_profiles.edge_types` set are included. +* Otherwise, only edges with `EdgeBody.type ∈ types` are included (after intersecting with `tgk_profiles.edge_types`). Repeated entries in `types` have no additional effect. + +### 3.3 GraphStore error model + +`TGK/STORE/1` defines a minimal logical error model for graph queries: + +```text +GraphErrorCode = uint8 + +GraphErrorCode { + GS_ERR_NOT_EDGE = 1 // ref resolves to a non-edge Artifact (for this TGKProfileSet) + GS_ERR_ARTIFACT_ERROR = 2 // underlying artifact resolution error + GS_ERR_UNSUPPORTED = 3 // unsupported identity domain, hash_id, or profile + GS_ERR_INTEGRITY = 4 // TGK or encoding-level integrity failure +} +``` + +These codes are **graph-level**. Concrete APIs MAY refine or enrich them, but MUST preserve the semantics: + +1. **GS_ERR_NOT_EDGE** + + Condition: + + * An explicit `resolve_edge(ref)` call where: + + * the reference resolves to an Artifact `A`, but + * `A` is not an EdgeArtifact in this instance’s `TGKProfileSet` sense (e.g. `type_tag` not in `edge_tags`, or decoding fails with “not this profile”, or decoded type not in `edge_types`). + + In adjacency and scan queries, Artifacts that cannot be decoded as edges under `TGKProfileSet` MUST simply be excluded from results; they do **not** cause query failure. + +2. **GS_ERR_ARTIFACT_ERROR** + + Condition: + + * Underlying Artifact resolution via `resolve_artifact(ref)` yields `ERR_NOT_FOUND` or `ERR_INTEGRITY` for a `ref` the graph store is attempting to resolve in the context of `resolve_edge`. + + Logically: “artifact-level error that prevents graph semantics from being evaluated for this `ref`.” + +3. **GS_ERR_UNSUPPORTED** + + Conditions: + + * The GraphStoreInstance does not support the identity domain, `hash_id`, or edge encoding profile required to interpret the provided `Reference` or edge payload, even though the Artifact might exist; or + * `resolve_artifact(ref)` yields `ERR_UNSUPPORTED` for this `ref`’s identity domain; or + * `ref.hash_id` does not belong to any configured `IdentityDomain` in `id_space.domains`. + +4. **GS_ERR_INTEGRITY** + + Conditions include: + + * The GraphStoreInstance detects an inconsistency that violates TGK/1-CORE invariants or encoding-profile invariants, such as: + + * An Artifact previously indexed as an EdgeArtifact is now resolved to different bytes or `type_tag`. + * The same `Artifact.bytes` is decoded inconsistently by configured edge profiles. + * A decoded `EdgeBody` violates `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1`. + +Concrete APIs can choose how to surface these (exceptions, error variants, status codes), but MUST distinguish: + +* “Not an edge in this graph store” (`GS_ERR_NOT_EDGE`), +* “Underlying artifact or identity issue” (`GS_ERR_ARTIFACT_ERROR` / `GS_ERR_INTEGRITY`), +* “Graph store cannot interpret this domain/profile” (`GS_ERR_UNSUPPORTED`). + +### 3.4 GraphStoreInstance interface (logical) + +For clarity, this profile treats `PageToken` as: + +```text +PageToken := OctetString // opaque to TGK/STORE/1; structure is implementation-defined +``` + +A GraphStoreInstance that claims `TGK/STORE/1` conformance MUST, at the logical level, support at least the following operations for each snapshot: + +```text +get_config() -> GraphStoreConfig + +resolve_edge( + ref: EdgeRef +) -> EdgeBody | GraphErrorCode + +edges_from( + node: Node, + type_filter: EdgeTypeFilter +) -> list + +edges_to( + node: Node, + type_filter: EdgeTypeFilter +) -> list + +edges_incident( + node: Node, + type_filter: EdgeTypeFilter +) -> list +``` + +and MAY additionally support: + +```text +scan_edges( + type_filter: EdgeTypeFilter, + page_token: optional PageToken +) -> ( + edges: list, + next_page_token: optional PageToken +) + +neighbors( + node: Node, + type_filter: EdgeTypeFilter, + direction: GraphDirection +) -> list +``` + +Subject to the semantics specified in the rest of this document: + +* `get_config()` MUST return the `GraphStoreConfig` associated with the GraphStoreInstance (identical for all snapshots of that instance). +* `resolve_edge` MUST behave as in §4. +* `edges_from`, `edges_to`, and `edges_incident` MUST behave as in §5. +* `scan_edges`, if implemented, MUST behave as in §6. +* `neighbors`, if implemented, MUST behave as in §5.7. + +These signatures are **logical**: concrete APIs MAY group, name, or transport them differently (e.g. as RPC endpoints, methods on a class, or functions in a module), but their observable behavior MUST be equivalent to these operations applied to some well-defined snapshot. + +--- + +## 4. Edge Resolution Semantics + +### 4.1 Operation: `resolve_edge` + +Logical signature: + +```text +resolve_edge(ref: EdgeRef) -> + EdgeBody + | GS_ERR_NOT_EDGE + | GS_ERR_ARTIFACT_ERROR + | GS_ERR_UNSUPPORTED + | GS_ERR_INTEGRITY +``` + +Semantics for a given `GraphStoreSnapshot`: + +1. **Artifact resolution** + + * The graph store attempts to resolve `ref` to an `Artifact`: + + ```text + resolve_artifact(ref) -> + Artifact + | ERR_NOT_FOUND + | ERR_INTEGRITY + | ERR_UNSUPPORTED + ``` + + * If `resolve_artifact` returns `ERR_UNSUPPORTED` for this `ref`’s identity domain (e.g., unknown `(encoding_profile, hash_id)`), or if `ref.hash_id` is not supported by any configured `IdentityDomain`, the graph store MUST return `GS_ERR_UNSUPPORTED`. + + * If it returns `ERR_NOT_FOUND` or `ERR_INTEGRITY`, the graph store MUST return `GS_ERR_ARTIFACT_ERROR`. + +2. **Edgehood check and decoding** + + * If resolution succeeds with `A : Artifact`: + + * If `A.type_tag` is absent, or `A.type_tag.tag_id` is not in the instance’s `tgk_profiles.edge_tags`, the graph store MUST return `GS_ERR_NOT_EDGE`. + + * Otherwise, the graph store applies the configured edge-encoding profile(s) to `A.bytes` as per `TGK/1-CORE §3.2`: + + * If no active edge encoding profile decodes `A.bytes` successfully, the graph store MUST return `GS_ERR_NOT_EDGE`. + + * If one or more profiles decode successfully but disagree on the resulting `EdgeBody`, the graph store MUST return `GS_ERR_INTEGRITY`. + + * If decoding yields an `EdgeBody` whose `type` is not in `tgk_profiles.edge_types`, the Artifact MUST NOT be treated as an edge for this instance; `resolve_edge(ref)` MUST return `GS_ERR_NOT_EDGE`. + + * Otherwise, let `Body = EdgeBody(A)` be the unique decoded `EdgeBody`. + + * If `Body` violates `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1` (both `from` and `to` empty), the graph store MUST return `GS_ERR_INTEGRITY`. + +3. **Return** + + * If all checks succeed, `resolve_edge(ref)` MUST return `Body`. + +Determinism: + +> **TGK/STORE-RESOLVE-DET/1** +> For a fixed snapshot, `resolve_edge` is a pure function of: +> +> * the snapshot’s `Artifacts` and `config.tgk_profiles`, and +> * the input `ref`. +> +> Any conformant implementation MUST compute the same `EdgeBody` for a given `ref`, or the same error code, for that snapshot. + +### 4.2 Relationship to `ProvenanceGraph.Edges` + +For a fixed `GraphStoreSnapshot` with `ProvenanceGraph.Edges`: + +* If `resolve_edge(ref)` returns `Body`, then: + + ```text + (ref, Body) ∈ ProvenanceGraph.Edges + ``` + +* If `(ref, Body) ∈ ProvenanceGraph.Edges`, then `resolve_edge(ref)` MUST return `Body`. + +* If `resolve_edge(ref)` returns `GS_ERR_NOT_EDGE`, then `ref` does not appear as an `EdgeRef` in `ProvenanceGraph.Edges` for this snapshot. + +* If `resolve_edge(ref)` returns `GS_ERR_ARTIFACT_ERROR`, `GS_ERR_UNSUPPORTED`, or `GS_ERR_INTEGRITY`, the snapshot does not define a well-formed TGK edge for `ref`. Implementations MAY still expose such `ref` values for diagnostic or repair purposes, but MUST NOT treat them as edges in normal graph views or adjacency results. + +Adjacency and scan queries MUST operate over the well-formed edge set `ProvenanceGraph.Edges` only, and MUST silently exclude any `ref` that would lead to error in `resolve_edge`. + +--- + +## 5. Adjacency Query Semantics + +### 5.1 Operations + +`TGK/STORE/1` defines three core adjacency queries for a given snapshot: + +```text +edges_from( + node: Node, + type_filter: EdgeTypeFilter +) -> list + +edges_to( + node: Node, + type_filter: EdgeTypeFilter +) -> list + +edges_incident( + node: Node, + type_filter: EdgeTypeFilter +) -> list +``` + +All three operations: + +* are defined as pure functions over `ProvenanceGraph.Edges` for the snapshot; +* MUST return edge views in a deterministic order (see §5.5); +* MUST NOT return graph-level error codes; they return empty lists when no edges match. + +### 5.2 `edges_from` + +For a fixed `GraphStoreSnapshot` with `ProvenanceGraph.Edges`, `edges_from(node, type_filter)` MUST return a list containing each pair `(ref, body)` such that: + +```text +(ref, body) ∈ ProvenanceGraph.Edges +node ∈ body.from +body.type ∈ EffectiveTypeSet(type_filter) +``` + +exactly once, where `EffectiveTypeSet(type_filter)` is: + +* `config.tgk_profiles.edge_types` if `type_filter.types` is empty; otherwise +* `type_filter.types ∩ config.tgk_profiles.edge_types`. + +Notes: + +* `node` need not correspond to any Artifact in `Artifacts`; it is an arbitrary `Reference`. +* If no edges satisfy the predicate, the result MUST be the empty list. + +### 5.3 `edges_to` + +Similarly, `edges_to(node, type_filter)` MUST return each `(ref, body)` such that: + +```text +(ref, body) ∈ ProvenanceGraph.Edges +node ∈ body.to +body.type ∈ EffectiveTypeSet(type_filter) +``` + +exactly once. + +### 5.4 `edges_incident` + +`edges_incident(node, type_filter)` MUST return the union of: + +* all edges in `edges_from(node, type_filter)`, and +* all edges in `edges_to(node, type_filter)`, + +with duplicates removed (i.e. if an edge has `node` in both `from` and `to`, it appears only once). + +Formally, the result set is: + +```text +{ (ref, body) | + (ref, body) ∈ ProvenanceGraph.Edges + body.type ∈ EffectiveTypeSet(type_filter) + node ∈ body.from ∪ body.to +} +``` + +### 5.5 Result ordering + +For any of the adjacency operations above, the **result set** is defined as a mathematical set. To support pagination and deterministic clients, implementations MUST impose a deterministic total order before returning a list. + +For `TGK/STORE/1`, the edge ordering for a given snapshot is **normatively defined** as: + +1. For each edge `(edge_ref, body)`, form the byte sequence: + + ```text + order_key(edge_ref) = hash_id_bytes || digest_bytes + ``` + + where: + + * `hash_id_bytes` is `edge_ref.hash_id` encoded as a 2-byte unsigned integer in big-endian order; and + * `digest_bytes` is the raw `edge_ref.digest` byte sequence. + +2. Sort edges in ascending lexicographic order by `order_key(edge_ref)`. + +This ordering: + +* MUST be used for: + + * `edges_from`, `edges_to`, `edges_incident`, and + * `scan_edges` (if implemented); + +* MUST be stable for a fixed snapshot; and + +* ensures that any two conformant implementations operating on the same snapshot produce adjacency and scan lists that are identical, including order. + +This ordering is equivalent to sorting by canonical `ReferenceBytes` under `ENC/ASL1-CORE v1`. + +> **TGK/STORE-EDGE-ORDER/1** +> For a fixed snapshot, the ordering of edges returned by any `TGK/STORE/1` operation is the ascending lexicographic order of `(edge_ref.hash_id, edge_ref.digest)` as encoded above. Conformant implementations MUST NOT use any other ordering. + +### 5.6 Relationship to `ProvenanceGraph.Nodes` + +The adjacency operations are implicitly defined over `ProvenanceGraph.Nodes`: + +* If `node ∉ ProvenanceGraph.Nodes` for a snapshot, all three adjacency queries MUST return empty lists. +* If `node ∈ ProvenanceGraph.Nodes`, adjacency queries MUST enumerate exactly the edges incident to that node, subject to `type_filter`. + +`TGK/STORE/1` does not define a “node resolution” operation; the resolution of nodes to Artifacts is done via ASL/1-STORE or the underlying `resolve_artifact` mapping in the identity domains. + +### 5.7 Optional neighbor queries: `neighbors` + +Graph stores MAY expose a neighbor query helper, derived purely from the adjacency semantics: + +```text +neighbors( + node: Node, + type_filter: EdgeTypeFilter, + direction: GraphDirection +) -> list +``` + +Semantics for a given snapshot: + +1. Define `EffectiveTypeSet(type_filter)` as in §5.2. + +2. Define the incident edge set `I(node, type_filter, direction)`: + + * If `direction = OUT`: + + ```text + I = { (ref, body) | + (ref, body) ∈ ProvenanceGraph.Edges + node ∈ body.from + body.type ∈ EffectiveTypeSet(type_filter) + } + ``` + + * If `direction = IN`: + + ```text + I = { (ref, body) | + (ref, body) ∈ ProvenanceGraph.Edges + node ∈ body.to + body.type ∈ EffectiveTypeSet(type_filter) + } + ``` + + * If `direction = BOTH`: + + ```text + I = { (ref, body) | + (ref, body) ∈ ProvenanceGraph.Edges + node ∈ (body.from ∪ body.to) + body.type ∈ EffectiveTypeSet(type_filter) + } + ``` + +3. Define the neighbor set `N(node, type_filter, direction)`: + + * If `direction = OUT`: + + ```text + N = { n ∈ Node | + ∃ (ref, body) ∈ I, n ∈ body.to + } + ``` + + * If `direction = IN`: + + ```text + N = { n ∈ Node | + ∃ (ref, body) ∈ I, n ∈ body.from + } + ``` + + * If `direction = BOTH`: + + ```text + N = N_OUT ∪ N_IN + ``` + + where: + + ```text + N_OUT = { n ∈ Node | + ∃ (ref, body) ∈ ProvenanceGraph.Edges, + node ∈ body.from, + body.type ∈ EffectiveTypeSet(type_filter), + n ∈ body.to + } + + N_IN = { n ∈ Node | + ∃ (ref, body) ∈ ProvenanceGraph.Edges, + node ∈ body.to, + body.type ∈ EffectiveTypeSet(type_filter), + n ∈ body.from + } + ``` + + Notes: + + * Self-loops are included naturally by these definitions: if an edge has `node` in both `from` and `to`, then `node ∈ N` for `direction = BOTH`, and MAY appear in `N` for `OUT` and/or `IN` depending on the edge’s endpoints. + * `N` is a mathematical set: each neighbor Node appears at most once in the result. + +4. `neighbors(...)` MUST return the elements of `N` in a deterministic order defined as: + + * For each neighbor `n`, define: + + ```text + node_order_key(n) = hash_id_bytes || digest_bytes + ``` + + where: + + * `hash_id_bytes` is `n.hash_id` as `u16` big-endian, and + * `digest_bytes` is `n.digest`. + + * Sort neighbors in ascending lexicographic order by `node_order_key(n)` and return the resulting list. + +`neighbors` MUST be observationally equivalent to deriving neighbors from `ProvenanceGraph.Edges` using the definitions above. It MUST NOT introduce any new graph semantics beyond those defined here. + +> **TGK/STORE-NEIGHBORS/1 (OPTIONAL)** +> If `neighbors` is implemented, its results MUST be consistent with the semantics above and with the canonical node ordering based on `(hash_id, digest)`. + +--- + +## 6. Edge Scan Semantics (Optional) + +### 6.1 Operation: `scan_edges` + +Graph stores MAY expose a scanning operation: + +```text +scan_edges( + type_filter: EdgeTypeFilter + page_token: optional PageToken +) -> ( + edges: list + next_page_token: optional PageToken +) +``` + +`PageToken` is an opaque `OctetString` whose internal structure and interpretation are implementation-defined. + +Semantics: + +* For a fixed snapshot and `type_filter`, `scan_edges` MUST enumerate all edges `(ref, body) ∈ ProvenanceGraph.Edges` such that: + + ```text + body.type ∈ EffectiveTypeSet(type_filter) + ``` + + exactly once, when invoked repeatedly starting with `page_token = absent` and continuing with `next_page_token` until it returns `next_page_token = absent`. + +* Within each call, `edges` MUST be ordered according to the deterministic edge ordering in §5.5. + +* Pagination boundaries MUST NOT: + + * skip edges, or + * duplicate edges across pages for a given snapshot. + +* If `page_token` is absent, the scan starts from the beginning of the ordering; otherwise, it resumes from the position encoded in `page_token`. + +* If the underlying snapshot changes between pages (e.g. more Artifacts ingested): + + * `TGK/STORE/1` does not require consistency across pages; implementations MAY treat each call as operating on an implementation-defined snapshot. + * Implementations SHOULD document whether `scan_edges` is snapshot-stable across a full scan or is “fuzzy” (best-effort, eventually-consistent). + +`scan_edges` MUST NOT surface edges that are not in `ProvenanceGraph.Edges` for whatever snapshot it operates on. + +> **TGK/STORE-SCAN/1 (OPTIONAL)** +> If `scan_edges` is implemented, it MUST satisfy the enumeration and ordering semantics above for each snapshot it operates on. + +--- + +## 7. Determinism & Consistency + +### 7.1 Snapshot-relative determinism + +Fix: + +* a `GraphStoreConfig`, +* a finite Artifact set `Artifacts`, +* a `TGKProfileSet`, +* and the resulting `ProvenanceGraph` per `TGK/1-CORE`. + +For any two `TGK/STORE/1`–conformant GraphStoreInstances that: + +* use the same `GraphStoreConfig`, and +* have snapshots whose `(Artifacts, config.tgk_profiles)` pairs are equal, + +the following MUST hold for all inputs: + +* `get_config()` returns equal `GraphStoreConfig` values. + +* `resolve_edge(ref)` returns the same `EdgeBody` or the same error code. + +* `edges_from(node, type_filter)`, `edges_to(node, type_filter)`, and `edges_incident(node, type_filter)` return identical lists of `GraphEdgeView` values (same elements in the same order, per §5.5). + +* If both implement `scan_edges`, then for any fixed snapshot: + + * the union over all pages of all `edges` returned by `scan_edges` equals the set of all edges in `ProvenanceGraph.Edges` satisfying the `type_filter`, and + * the per-page lists and page boundaries are identical up to differences in the opaque representation of `PageToken`. + +* If both implement `neighbors`, then for any fixed snapshot: + + * `neighbors(node, type_filter, direction)` returns identical neighbor Node lists (same elements in the same order), consistent with §5.7. + +> **TGK/STORE-DET/1** +> For a fixed snapshot and `GraphStoreConfig`, any two `TGK/STORE/1`–conformant GraphStoreInstances expose isomorphic graphs (identical edge and node sets, with edge lists ordered as in §5.5) and identical results for all `TGK/STORE/1` operations. + +### 7.2 Projection fidelity + +> **TGK/STORE-PROJECTION/1** +> For any GraphStoreSnapshot, GraphStoreInstances MUST ensure that: +> +> * Every `(ref, body)` reachable via graph queries (resolve or adjacency) corresponds to an edge in the snapshot’s `ProvenanceGraph.Edges`. +> * No graph query introduces edges or nodes that cannot be derived from the snapshot’s `Artifacts` and `config.tgk_profiles` under `TGK/1-CORE` (§4.1). + +Persisted graph indexes and caches: + +* are **optimizations only**; +* MUST be consistent with the `ProvenanceGraph` induced by their underlying Artifact and profile sets; and +* MUST NOT introduce additional, non-Artifactual relationships as if they were TGK edges. + +### 7.3 Interaction with evolving Artifact sets + +`TGK/STORE/1` does not fix how snapshots evolve. Implementations MAY: + +* operate with explicit snapshot boundaries (e.g. versioned graph views); or +* continuously ingest new Artifacts into a moving snapshot. + +Requirements: + +* Each individual invocation of a graph operation MUST be interpreted with respect to some well-defined snapshot. +* Implementations MAY use different snapshots for different operations, but SHOULD document their consistency model (e.g. “read-your-writes” guarantees if combined with local Artifact injection). + +--- + +## 8. Interaction with Other Layers (Informative) + +### 8.1 PEL/1-SURF + +`PEL/1-SURF` typically produces: + +* programs, inputs, outputs, and surface ExecutionResult Artifacts; and +* (via TGK-aware profiles) EdgeArtifacts that represent execution relationships (e.g., edges with type `EDGE_EXECUTION`). + +A GraphStoreInstance: + +* consumes these Artifacts (from one or more `StoreInstance`s); +* derives a `ProvenanceGraph` whose edges include execution edges (and possibly trace- or receipt-related edges); and +* exposes adjacency and neighbor queries that higher layers can use to follow execution relationships backward or forward. + +`TGK/STORE/1` does not require PEL; it only provides a graph substrate that PEL-related profiles can target. + +### 8.2 CIL/1, FER/1, FCT/1, OI/1 + +Certification, evidence, fact, and overlay layers: + +* define edge types (e.g., `EDGE_ATTESTS`, `EDGE_FACT_SUPPORTS`, `EDGE_OVERLAY_MAPS`) and EdgeArtifacts encoding those relationships; +* depend on graph queries to: + + * find which certificates, receipts, or overlays relate to a given ArtifactRef; + * navigate from facts back to supporting evidence; + * traverse overlay-based navigational relationships. + +`TGK/STORE/1` provides the minimal, policy-neutral query hooks for those profiles: + +* `edges_to` / `edges_from` / `edges_incident` with `EdgeTypeFilter` selecting the relevant edge types; +* optional `neighbors` for “just give me the adjacent nodes” views; +* optional `scan_edges` for bulk indexing or analytics. + +Semantics of what those edges “mean” are left to the profiles themselves. + +### 8.3 TGK/PROV/1 (Future) + +`TGK/PROV/1` is expected to define provenance and trace operators (e.g. backwards reachability along selected edge types). It can be specified: + +* as pure functions over `ProvenanceGraph`; and/or +* as operations that make use of `TGK/STORE/1` adjacency and neighbor queries. + +`TGK/STORE/1` ensures that any such higher-level operators can rely on a common, identity-preserving graph view across implementations. + +--- + +## 9. Conformance + +An implementation is **TGK/STORE/1–conformant** if, for each GraphStoreInstance it exposes, it satisfies all of the following: + +1. **Configuration and identity domains** + + * Associates a well-defined `GraphStoreConfig` with each GraphStoreInstance, and exposes it via `get_config()`. + * For each identity domain in `IdSpaceConfig.domains`, ensures that `Reference` values in that domain are interpreted according to `ASL/1-CORE` and, when applicable, `ENC/ASL1-CORE` and `HASH/ASL1`. + * Does not alias or merge distinct `(encoding_profile, hash_id)` domains at this layer. + * Treats `Reference` values as opaque identities; does not reinterpret `digest` bytes across algorithms or encodings. + * Treats `hash_id` values outside `IdSpaceConfig.domains` as unsupported for graph purposes. + +2. **Projection fidelity** + + * For any snapshot, there exists a finite `Artifacts` set and `TGKProfileSet = config.tgk_profiles` such that: + + * the graph exposed by the GraphStoreInstance is exactly the `ProvenanceGraph` induced by `(Artifacts, TGKProfileSet)` per `TGK/1-CORE`; and + * all graph queries are logically evaluated over that `ProvenanceGraph`. + + * Does not introduce edges, nodes, or relationships that cannot be derived from Artifacts and `TGKProfileSet` according to `TGK/1-CORE`. + +3. **Correct edge resolution** + + * Implements `resolve_edge(ref)` as in §4: + + * uses Artifact resolution consistent with `ASL/1-STORE`, + * checks edgehood and decodes EdgeArtifacts according to `TGK/1-CORE §3` and the configured `TGKProfileSet`, + * returns the correct `EdgeBody` for any `EdgeRef` in `ProvenanceGraph.Edges`, + * returns graph-level errors (`GS_ERR_NOT_EDGE`, `GS_ERR_ARTIFACT_ERROR`, `GS_ERR_UNSUPPORTED`, `GS_ERR_INTEGRITY`) with the specified semantics. + +4. **Correct adjacency queries** + + * Implements `edges_from`, `edges_to`, and `edges_incident` as exact projections of `ProvenanceGraph.Edges`, as in §5. + * Uses the deterministic edge ordering defined in §5.5 for all adjacency and scan results. + * Returns empty lists (not errors) for nodes with no incident edges in the snapshot. + +5. **Optional scan semantics (if provided)** + + * If `scan_edges` is implemented, enumerates edges consistent with §6 for each snapshot it operates on. + * Ensures that pagination does not skip or duplicate edges for a given snapshot. + +6. **Optional neighbor semantics (if provided)** + + * If `neighbors` is implemented, computes neighbor Node sets exactly as in §5.7. + * Uses the deterministic node ordering derived from `(hash_id, digest)` for the returned Node list. + * Ensures that `neighbors` can be derived from the snapshot’s `ProvenanceGraph` without accessing extra, off-graph state. + +7. **Error handling and integrity** + + * Surfaces errors consistent with §3.3 and §4.1. + * Treats invariant violations (ASL identity, encoding consistency, TGK edge invariants) as `GS_ERR_INTEGRITY` or an error at least as strict; does not silently ignore such conditions while treating affected Artifacts as valid edges. + +8. **Layering and neutrality** + + * Does not embed execution, certification, policy, or fact semantics into `TGK/STORE/1` constructs. + * Leaves provenance algorithms, certificate evaluation, and fact acceptance criteria to `TGK/PROV/1`, `CIL/1`, `FER/1`, `FCT/1`, and other profiles. + * Does not introduce separate node or edge ID schemes: `Node := Reference`, `EdgeRef := Reference` remain the only identifiers. + +9. **Determinism** + + * For any fixed snapshot and `GraphStoreConfig`, conforms to the determinism requirements in §7.1. + * If it claims compatibility with other TGK-aware components, ensures that they all see the same graph (up to the normative edge ordering) for a given `(Artifacts, TGKProfileSet)`. + +Everything else — concrete APIs, schema for paging tokens, topology, caching, distribution, and operational policies — is implementation-specific and MAY be standardized by additional profiles, provided they respect the semantics and invariants defined by `TGK/STORE/1`. + +--- + +## 10. Version History (Informative) + +**0.2.4 — 2025-11-16** + +* Introduced an explicit `PageToken` type alias as an opaque `OctetString` and tightened its description in `scan_edges` semantics to make the cursor representation and opacity clearer. +* Clarified in §3.4 that `PageToken`’s structure is implementation-defined while remaining logically a position in the canonical edge ordering. +* Performed minor editorial cleanups to keep terminology consistent around identity domains, unsupported `hash_id` handling, and the separation between artifact-resolution errors and graph-level query behavior. + +**0.2.3 — 2025-11-16** + +* Added an explicit logical `GraphStoreInstance` interface (§3.4) listing required and optional operations (`get_config`, `resolve_edge`, adjacency queries, `scan_edges`, `neighbors`), clarifying that concrete APIs may wrap or transport these differently while preserving semantics. +* Required `get_config()` to return the `GraphStoreConfig` associated with a GraphStoreInstance, and extended the determinism guarantees in §7.1 to cover `get_config`. +* Updated conformance criteria to require that `GraphStoreConfig` be exposed via `get_config` and that all implemented operations behave consistently with their logical definitions. + +**0.2.2 — 2025-11-16** + +* Tightened the semantics of the optional `neighbors` helper to be fully set-based and snapshot-relative, with self-loop behavior determined purely by the formal definitions (no implementation choice), ensuring cross-implementation determinism. +* Defined `neighbors`’s neighbor sets directly in terms of `ProvenanceGraph.Edges`, and made the `BOTH` case explicitly `N_OUT ∪ N_IN`. +* Introduced a canonical node ordering for `neighbors` results based on `(hash_id, digest)`, analogous to the canonical edge ordering. +* Clarified that `neighbors` is observationally equivalent to computing neighbors from the snapshot’s `ProvenanceGraph`, and cannot introduce new semantics. + +**0.2.1 — 2025-11-16** + +* Clarified that any `GraphEdgeView { edge_ref, body }` MUST be consistent with `resolve_edge(edge_ref)` for the same snapshot. +* Introduced an optional `neighbors(node, type_filter, direction)` helper, defined purely in terms of the `ProvenanceGraph` and adjacency semantics, with deterministic node ordering analogous to edge ordering. +* Extended determinism requirements (`TGK/STORE-DET/1`) to cover `neighbors` when implemented. + +**0.2.0 — 2025-11-16** + +* Introduced explicit `GraphStoreConfig` structure (`IdSpaceConfig`, `ArtifactScope`, `TGKProfileSet`). +* Clarified identity-domain handling and unsupported `hash_id` semantics. +* Defined a normative global edge ordering (`TGK/STORE-EDGE-ORDER/1`) based on `(hash_id, digest)` for all adjacency and scan operations. +* Tightened `resolve_edge` semantics and error mapping (`GS_ERR_NOT_EDGE`, `GS_ERR_ARTIFACT_ERROR`, `GS_ERR_UNSUPPORTED`, `GS_ERR_INTEGRITY`). +* Clarified optional `scan_edges` behavior and snapshot-relative consistency. +* Strengthened determinism and projection fidelity requirements, including `TGK/STORE-DET/1` and `TGK/STORE-PROJECTION/1`. +* Added explicit conformance criteria and made layering / neutrality constraints more precise. + +**0.1.9 — 2025-11-16** + +* First consolidated draft of `TGK/STORE/1` as a graph store profile over `ASL/1-STORE` and `TGK/1-CORE`. +* Defined `GraphStoreInstance` / `GraphStoreSnapshot`, basic edge resolution and adjacency operations, and an initial error model. +* Introduced optional `scan_edges` and high-level determinism and layering requirements. + +--- + +## Document History + +* **0.2.4 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.