# TGK/STORE/1 — Graph Store and Query Semantics
Status: Approved
Owner: Niklas Rydberg
Version: 0.2.4
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [traceability, execution]
<!-- Source: /amduat/docs/new/tgk1-store.md | Canonical: /amduat/tier1/tgk-store-1.md -->
**Document ID:** `TGK/STORE/1`
**Layer:** L1.7 — Graph store & query profile over ASL/1-STORE + TGK/1-CORE
**Depends on (normative):**
* `ASL/1-CORE v0.4.x` — value substrate: `Artifact`, `Reference`, `TypeTag`, identity model
* `ASL/1-STORE v0.4.x` — content-addressable store: `StoreInstance`, `StoreConfig`, `put/get`
* `TGK/1-CORE v0.7.x` — trace graph kernel: `Node`, `EdgeBody`, `EdgeTypeId`, `ProvenanceGraph`
**Informative references:**
* `SUBSTRATE/STACK-OVERVIEW v0.3.x` — layering & dependency discipline
* `ENC/ASL1-CORE v1.x` — canonical `ArtifactBytes` / `ReferenceBytes`
* `HASH/ASL1 v0.2.x` — ASL1 hash family (`HashId` registry)
* `ENC/TGK1-EDGE/1 v0.1.x` — canonical encoding for TGK `EdgeBody` / EdgeArtifacts
* `PEL/1-SURF v0.2.x` — store-backed execution surface (producer of many EdgeArtifacts)
* `CIL/1`, `FER/1`, `FCT/1`, `OI/1` — provenance and fact profiles that consume graph queries
* (future) `TGK/PROV/1` — provenance / trace operators over `ProvenanceGraph`
> **Normativity note**
> `TGK/STORE/1` is a near-core **graph store profile**, not a kernel surface. It introduces no new identity scheme. It defines how to expose and query the `ProvenanceGraph` from `TGK/1-CORE` over Artifacts reachable via `ASL/1-STORE` or equivalent feeds, and constrains graph indexes and query semantics.
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**,
**SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** are to be
interpreted as described in RFC 2119.
From `ASL/1-CORE`:
```text
Artifact {
bytes: OctetString
type_tag: optional TypeTag
}
Reference {
hash_id: HashId // uint16
digest: OctetString
}
TypeTag {
tag_id: uint32
}
HashId = uint16
EncodingProfileId = uint16
```
From `ASL/1-STORE` (snapshot view):
```text
StoreConfig {
encoding_profile: EncodingProfileId
hash_id: HashId
}
// Logical view: for a snapshot
StoreInstance.M : Reference -> Artifact // 0 or 1 Artifact per Reference
```
From `TGK/1-CORE`:
```text
Node := Reference
EdgeTypeId = uint32
EdgeBody {
type: EdgeTypeId
from: Node[] // ordered, MAY be empty
to: Node[] // ordered, MAY be empty
payload: Reference // always present
}
ProvenanceGraph {
Nodes: set<Node>
Edges: set<(EdgeRef, EdgeBody)>
}
```
In this document:
* **ArtifactRef**, **Node**, and **EdgeRef** are all ASL/1 `Reference` values.
* **ExecutionEnvironment** is as in `TGK/1-CORE`: an abstract context with a finite Artifact set and a fixed TGK profile set “in effect” at a snapshot.
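As a reading aid, the imported types above could be modeled as immutable value types. This is a minimal sketch, not a normative binding: the Python class layout, the `data` field (standing in for `Artifact.bytes`), and the `from_` field (standing in for `EdgeBody.from`, which is a Python keyword) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reference:
    hash_id: int   # HashId (uint16)
    digest: bytes  # OctetString


@dataclass(frozen=True)
class Artifact:
    data: bytes                     # Artifact.bytes (renamed for this sketch)
    type_tag: Optional[int] = None  # TypeTag.tag_id (uint32), None when absent


# ArtifactRef, Node, and EdgeRef are all plain Reference values.
Node = Reference
EdgeRef = Reference


@dataclass(frozen=True)
class EdgeBody:
    type: int                # EdgeTypeId (uint32)
    from_: Tuple[Node, ...]  # ordered, may be empty
    to: Tuple[Node, ...]     # ordered, may be empty
    payload: Reference       # always present
```

Frozen dataclasses compare and hash by value, which matches the set-based definition of `ProvenanceGraph.Edges`: two `(EdgeRef, EdgeBody)` pairs with equal fields are the same element.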
Additional terms introduced here:
```text
GraphStoreInstance -- logical graph store as seen by TGK/STORE/1
GraphStoreConfig -- configuration of a GraphStoreInstance
GraphStoreSnapshot -- logical state of a GraphStoreInstance at an instant
GraphEdgeView -- (EdgeRef, EdgeBody) pair
GraphDirection -- OUT, IN, BOTH
EdgeTypeFilter -- subset of EdgeTypeId values, possibly empty (“ALL”)
PageToken -- opaque cursor for paginated scans
```
`TGK/STORE/1` defines **logical semantics only**. Concrete APIs (HTTP, gRPC, language bindings), index structures, physical storage, distribution, and replication are out of scope.
---
## 1. Purpose, Scope & Non-Goals
### 1.1 Purpose
`TGK/STORE/1` defines the **graph store abstraction and basic query semantics** for the TGK trace graph over ASL/1 Artifacts.
It provides:
* A model of a **GraphStoreInstance** that:
* draws its Artifacts from one or more `ASL/1-STORE` instances or equivalent sources; and
* exposes the **ProvenanceGraph** defined by `TGK/1-CORE` as a queryable graph.
* A minimal set of **query operations** over that graph:
* **Edge resolution** by `EdgeRef`:
```text
resolve_edge(EdgeRef) -> EdgeBody | error
```
* **Adjacency queries** by `Node` and direction (expressed as three operations):
```text
edges_from(node, type_filter) -> list<GraphEdgeView>
edges_to(node, type_filter) -> list<GraphEdgeView>
edges_incident(node, type_filter) -> list<GraphEdgeView>
```
* An optional **edge scan** surface:
```text
scan_edges(type_filter, page_token?) ->
(list<GraphEdgeView>, next_page_token?)
```
* An optional **neighbor query** surface:
```text
neighbors(node, type_filter, direction) -> list<Node>
```
* A small **error model** for graph-level queries (e.g. “ref is not an edge Artifact in this graph”).
Goals:
* **Projection fidelity** — all graph views MUST be consistent with the `ProvenanceGraph` projection defined by `TGK/1-CORE` for some Artifact set and TGK profile set.
* **Identity discipline** — nodes and edges are identified solely by ASL/1 `Reference` values; no new ID schemes are introduced.
* **Separation of concerns** — graph queries are separate from provenance algorithms and policy decisions (defined in `TGK/PROV/1` and higher profiles).
### 1.2 Non-goals
`TGK/STORE/1` explicitly does **not** define:
* A graph query language (path expressions, joins, aggregations).
* Provenance or trace operators (e.g. “all ancestors under these edge types”) — those belong to `TGK/PROV/1`.
* Certificate, fact, or overlay semantics (`CIL/1`, `FER/1`, `FCT/1`, `OI/1`).
* How Artifacts are transported between stores or replicated.
* Concrete API shapes, authentication, authorization, multi-tenancy, quotas.
* Index layout, sharding, partitioning, or caching strategies.
It is strictly about:
> **TGK/STORE-SCOPE/1**
> Given a finite Artifact set and a TGK profile configuration, `TGK/STORE/1` defines what graph exists (via `TGK/1-CORE`) and how basic, identity-preserving graph queries MUST behave.
### 1.3 Layering and dependencies
`TGK/STORE/1` sits:
* **Above ASL/1-STORE**:
* Assumes Artifacts are retrievable via content-addressable stores or equivalent feeds.
* Does not change `put/get` semantics or introduce new identity notions.
* **Above TGK/1-CORE**:
* Reuses the `ProvenanceGraph` definition as a pure projection over Artifacts and TGK profiles.
* Does not redefine `EdgeBody` or `ProvenanceGraph`.
* **Below provenance and fact profiles**:
* `TGK/PROV/1`, `FER/1`, `FCT/1`, `CIL/1`, `OI/1` MAY build higher-level query and interpretation surfaces on top of `TGK/STORE/1`.
Layering invariants:
> **TGK/STORE-LAYERING/1**
> `TGK/STORE/1` **MUST NOT**:
>
> * introduce any new notion of identity for nodes or edges;
> * bake in scheme-specific or domain-specific semantics (e.g. PEL, CIL, FCT);
> * violate `TGK/GRAPH-PROJECTION/CORE/1` or `TGK/DET/CORE/1`.
All graph edges seen through a GraphStoreInstance **MUST** be derivable exactly as in `TGK/1-CORE` from:
* some finite underlying Artifact set; and
* that instance's configured TGK profile set (edge tags, encodings, type catalogs).
---
## 2. Core Graph Store Model
### 2.1 GraphStoreInstance and GraphStoreSnapshot
A **GraphStoreInstance** is an abstract component that, at any given logical instant, has a **GraphStoreSnapshot**.
For a given snapshot, a GraphStoreInstance is characterized by:
```text
GraphStoreSnapshot {
config: GraphStoreConfig
// Logical, not necessarily materialized:
Artifacts: finite set<Artifact>
Provenance: ProvenanceGraph
}
```
Where:
* `Artifacts` is the finite set of Artifacts this snapshot “sees” (logically reachable from its backing stores, archives, feeds, etc.) and that fall within its configured identity domains.
* `config.tgk_profiles` is the TGK profile set “in effect” at this snapshot.
* `Provenance` is the unique `ProvenanceGraph` induced by `(Artifacts, config.tgk_profiles)`, as defined in `TGK/1-CORE §4.1` and `TGK/GRAPH-PROJECTION/CORE/1`.
A GraphStoreSnapshot is conceptually an instance of a `TGK/1-CORE` ExecutionEnvironment snapshot, plus an explicit description of which identity domains and sources contribute to its Artifact set.
Implementations are **not required** to materialize `Artifacts` or `Provenance` explicitly. They MAY:
* maintain incremental indexes,
* cache decoded edges,
* stream edges from backing stores,
so long as all graph query semantics in this document are defined **as if** queries were evaluated directly over `Provenance.Edges` and `Provenance.Nodes` for some snapshot.
A GraphStoreInstance is **logically read-only** with respect to the graph: it does not create or mutate Artifacts. It only reflects whatever Artifacts and profiles are in scope for each snapshot. Writing edges is done by writing EdgeArtifacts (and other Artifacts) into the underlying stores or feeds.
For a given GraphStoreInstance, `GraphStoreConfig` is fixed across snapshots. Config changes (e.g. adding or removing identity domains or TGK profiles) SHOULD be represented as a new GraphStoreInstance or a clearly versioned reconfiguration, not as an in-place mutation of an existing instance's config.
### 2.2 GraphStoreConfig
A **GraphStoreConfig** describes the identity and provenance view for a GraphStoreInstance:
```text
GraphStoreConfig {
id_space: IdSpaceConfig
artifact_scope: ArtifactScope
tgk_profiles: TGKProfileSet
}
```
Snapshots of a given GraphStoreInstance share the same `GraphStoreConfig`; they differ only in `Artifacts` and the derived `ProvenanceGraph`.
#### 2.2.1 Identity domains (`IdSpaceConfig`)
```text
IdSpaceConfig {
domains: list<IdentityDomain>
}
IdentityDomain {
encoding_profile: EncodingProfileId // e.g. ASL_ENC_CORE_V1
hash_id: HashId // e.g. 0x0001 (HASH-ASL1-256)
}
```
Constraints:
* Each `IdentityDomain` MUST be compatible with `ASL/1-CORE` and with the encoding profile it claims to support.
* A GraphStoreInstance MAY support multiple identity domains, but it MUST treat each `(encoding_profile, hash_id)` pair as a distinct identity domain for Artifact resolution. It MUST NOT collapse or alias different domains at this layer.
* GraphStoreInstances MUST treat `Reference` values as opaque identities: they MUST NOT attempt to “re-derive” or reinterpret `digest` bytes under a different encoding profile or hash algorithm.
For any `Reference ref`, the GraphStoreInstance MUST either:
* associate `ref.hash_id` with exactly one `IdentityDomain` in `id_space.domains`; or
* treat that `hash_id` as **unsupported** for graph purposes.
The binding from a `Reference` to an `IdentityDomain` (e.g. to a particular `StoreInstance` or federation) is an implementation concern. `TGK/STORE/1` only requires that, for all domains, the effective `Reference -> Artifact` mapping behaves like an `ASL/1-STORE`-style partial function.
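The "exactly one domain or unsupported" rule can be sketched as a lookup keyed on `hash_id`. The function name `domain_for` is hypothetical, and rejecting a configuration that binds one `hash_id` to several domains is a choice made for this sketch; the profile itself only states the per-`Reference` requirement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IdentityDomain:
    encoding_profile: int  # EncodingProfileId (uint16)
    hash_id: int           # HashId (uint16)


def domain_for(hash_id: int, domains: List[IdentityDomain]) -> Optional[IdentityDomain]:
    """Return the single domain bound to hash_id, or None if it is unsupported."""
    matches = [d for d in domains if d.hash_id == hash_id]
    if len(matches) > 1:
        # Assumption of this sketch: an ambiguous binding is a config error.
        raise ValueError(f"hash_id {hash_id:#06x} is bound to multiple domains")
    return matches[0] if matches else None
```

A `None` result corresponds to the "unsupported for graph purposes" branch; no attempt is made to reinterpret the digest under another domain.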
#### 2.2.2 Artifact scope (`ArtifactScope`)
```text
ArtifactScope {
// Informative description of where Artifacts come from.
// Examples:
// - a single StoreInstance
// - a set of StoreInstances
// - a union of stores plus imported archives
description: OctetString
}
```
`ArtifactScope` is descriptive; `TGK/STORE/1` does not standardize how Artifacts are discovered, ingested, or kept up to date. The only semantic requirement is that, for any snapshot, there is a well-defined finite `Artifacts` set over which the `ProvenanceGraph` is computed.
#### 2.2.3 TGK profile set (`TGKProfileSet`)
```text
TGKProfileSet {
edge_tags: set<uint32> // TypeTag.tag_id values treated as edge tags
edge_types: set<EdgeTypeId> // supported edge types
encodings: set<EncodingProfileId> // edge encoding profiles (e.g. TGK1_EDGE_ENC_V1)
// plus optional catalog/profile identifiers
}
```
Constraints (reflecting `TGK/1-CORE §3`):
* For any `TypeTag.tag_id` in `edge_tags`, the profiles in `encodings` MUST define consistent decoding behavior for Artifacts with that tag. If more than one profile applies to a given Artifact, they MUST all decode it to the same `EdgeBody` value.
* Only `EdgeBody.type` values in `edge_types` are considered supported edge types for this instance. Artifacts that otherwise look like edges but whose decoded `EdgeBody.type` is not in `edge_types` MUST NOT contribute edges to `Provenance`.
> **Environment-relative note**
> As in `TGK/1-CORE`, edgehood and edge-type semantics are relative to `TGKProfileSet`. Different GraphStoreInstances over the same physical Artifacts but with different profile sets MAY expose different edges. `TGK/STORE/1` guarantees determinism only *relative* to a given `GraphStoreConfig` and snapshot.
### 2.3 Relationship to ASL/1-STORE
`TGK/STORE/1` assumes that `Artifacts` are obtained from one or more sources that are, at the logical level, compatible with the `ASL/1-STORE` model:
```text
Reference -> Artifact // partial, immutable mapping per identity domain
```
A GraphStoreInstance:
* MAY be implemented directly on top of one or more `StoreInstance`s;
* MAY draw from additional Artifact feeds (e.g. import-only archives, append-only logs);
* MUST, for each identity domain in `IdSpaceConfig.domains`, behave as if there is a pure mapping:
```text
resolve_artifact(ref: Reference) ->
Artifact
| ERR_NOT_FOUND
| ERR_INTEGRITY
| ERR_UNSUPPORTED
```
that is consistent with `ASL/1-STORE` semantics for that domain.
If multiple physical sources are combined for a given identity domain, the effective `resolve_artifact` mapping MUST still satisfy the `ASL/1-STORE` constraints: for any `ref`, it yields at most one `Artifact` value, and any detection of conflicting Artifacts for the same `ref` MUST result in an integrity error surfaced to callers.
If `ref.hash_id` does not belong to any `IdentityDomain` in `IdSpaceConfig.domains`, `resolve_artifact(ref)` MUST behave as `ERR_UNSUPPORTED` for the purposes of this profile.
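A minimal sketch of an effective `resolve_artifact` over several combined sources follows. It assumes each source is an in-memory mapping, that a `Reference` is modeled as a `(hash_id, digest)` tuple and an Artifact as a `(bytes, type_tag)` tuple, and that errors are plain strings; none of these shapes come from the profile. Per-source integrity checks (for example digest verification) are deliberately left out.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

Ref = Tuple[int, bytes]            # (hash_id, digest), hypothetical shape
Art = Tuple[bytes, Optional[int]]  # (bytes, type_tag), hypothetical shape

ERR_NOT_FOUND = "ERR_NOT_FOUND"
ERR_INTEGRITY = "ERR_INTEGRITY"
ERR_UNSUPPORTED = "ERR_UNSUPPORTED"


def resolve_artifact(
    ref: Ref,
    sources: List[Dict[Ref, Art]],
    supported_hash_ids: Set[int],
) -> Union[Art, str]:
    """Resolve ref across all sources as one partial function."""
    if ref[0] not in supported_hash_ids:
        return ERR_UNSUPPORTED
    # Collect the distinct Artifact values that any source holds for ref.
    found = {src[ref] for src in sources if ref in src}
    if not found:
        return ERR_NOT_FOUND
    if len(found) > 1:
        # Two sources disagree about the same Reference.
        return ERR_INTEGRITY
    return next(iter(found))
```

Sources that hold identical copies of an Artifact collapse to a single value, so replication alone never triggers the integrity error.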
`TGK/STORE/1` does **not** define new store operations. It constrains how graph queries may use Artifact resolution and how underlying store errors propagate into graph-level errors.
---
## 3. Graph Store Types & Error Model
### 3.1 Node, EdgeRef, GraphEdgeView
These are simply aliases of TGK and ASL types:
```text
Node := Reference // TGK node
EdgeRef := Reference // reference to an EdgeArtifact
```
For convenience, this profile defines:
```text
GraphEdgeView {
edge_ref : EdgeRef
body : EdgeBody
}
```
* Each `GraphEdgeView` represents a single edge in the `ProvenanceGraph.Edges` set.
* For any snapshot and any `GraphEdgeView { edge_ref, body }` produced by the graph store, the following MUST hold:
```text
resolve_edge(edge_ref) = body
```
when `resolve_edge` is evaluated against the same snapshot.
### 3.2 Direction and type filter
Graph direction is a small enum that higher-level APIs or profiles MAY use:
```text
GraphDirection (u8) {
OUT = 1 // edges where node ∈ EdgeBody.from
IN = 2 // edges where node ∈ EdgeBody.to
BOTH = 3 // union of OUT and IN
}
```
`EdgeTypeFilter` is defined as:
```text
EdgeTypeFilter {
// If empty, match all supported edge types.
types: list<EdgeTypeId>
}
```
Matching semantics:
* If `types` is empty, all `EdgeBody.type` values in the instance's `tgk_profiles.edge_types` set are included.
* Otherwise, only edges with `EdgeBody.type ∈ types` are included (after intersecting with `tgk_profiles.edge_types`). Repeated entries in `types` have no additional effect.
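The matching rule above amounts to a small set computation. The sketch below assumes edge type IDs are plain integers and uses a hypothetical function name, `effective_type_set`.

```python
from typing import FrozenSet, Iterable


def effective_type_set(filter_types: Iterable[int], supported: FrozenSet[int]) -> FrozenSet[int]:
    """Edge types a filter admits, given the instance's supported edge_types."""
    requested = frozenset(filter_types)  # repeated entries collapse here
    if not requested:
        # Empty filter means "all supported edge types".
        return frozenset(supported)
    return requested & supported
```

Note that a non-empty filter naming only unsupported types yields the empty set, so it matches no edges, which is different from an empty filter.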
### 3.3 GraphStore error model
`TGK/STORE/1` defines a minimal logical error model for graph queries:
```text
GraphErrorCode = uint8
GraphErrorCode {
GS_ERR_NOT_EDGE = 1 // ref resolves to a non-edge Artifact (for this TGKProfileSet)
GS_ERR_ARTIFACT_ERROR = 2 // underlying artifact resolution error
GS_ERR_UNSUPPORTED = 3 // unsupported identity domain, hash_id, or profile
GS_ERR_INTEGRITY = 4 // TGK or encoding-level integrity failure
}
```
These codes are **graph-level**. Concrete APIs MAY refine or enrich them, but MUST preserve the semantics:
1. **GS_ERR_NOT_EDGE**
Condition:
* An explicit `resolve_edge(ref)` call where:
* the reference resolves to an Artifact `A`, but
* `A` is not an EdgeArtifact under this instance's `TGKProfileSet` (e.g. `type_tag` not in `edge_tags`, or decoding fails with “not this profile”, or decoded type not in `edge_types`).
In adjacency and scan queries, Artifacts that cannot be decoded as edges under `TGKProfileSet` MUST simply be excluded from results; they do **not** cause query failure.
2. **GS_ERR_ARTIFACT_ERROR**
Condition:
* Underlying Artifact resolution via `resolve_artifact(ref)` yields `ERR_NOT_FOUND` or `ERR_INTEGRITY` for a `ref` the graph store is attempting to resolve in the context of `resolve_edge`.
Logically: “artifact-level error that prevents graph semantics from being evaluated for this `ref`.”
3. **GS_ERR_UNSUPPORTED**
Conditions:
* The GraphStoreInstance does not support the identity domain, `hash_id`, or edge encoding profile required to interpret the provided `Reference` or edge payload, even though the Artifact might exist; or
* `resolve_artifact(ref)` yields `ERR_UNSUPPORTED` for the identity domain of this `ref`; or
* `ref.hash_id` does not belong to any configured `IdentityDomain` in `id_space.domains`.
4. **GS_ERR_INTEGRITY**
Conditions include:
* The GraphStoreInstance detects an inconsistency that violates TGK/1-CORE invariants or encoding-profile invariants, such as:
* An Artifact previously indexed as an EdgeArtifact is now resolved to different bytes or `type_tag`.
* The same `Artifact.bytes` is decoded inconsistently by configured edge profiles.
* A decoded `EdgeBody` violates `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1`.
Concrete APIs can choose how to surface these (exceptions, error variants, status codes), but MUST distinguish:
* “Not an edge in this graph store” (`GS_ERR_NOT_EDGE`),
* “Underlying artifact or identity issue” (`GS_ERR_ARTIFACT_ERROR` / `GS_ERR_INTEGRITY`),
* “Graph store cannot interpret this domain/profile” (`GS_ERR_UNSUPPORTED`).
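One way a language binding might carry these codes is an integer enum plus a helper that maps artifact-level errors to graph-level ones. The enum values are taken from the listing above; the helper name `from_artifact_error` and the use of strings for artifact errors are assumptions of this sketch.

```python
from enum import IntEnum


class GraphErrorCode(IntEnum):
    GS_ERR_NOT_EDGE = 1
    GS_ERR_ARTIFACT_ERROR = 2
    GS_ERR_UNSUPPORTED = 3
    GS_ERR_INTEGRITY = 4


def from_artifact_error(err: str) -> GraphErrorCode:
    """Map an artifact-level error name to its graph-level code."""
    if err == "ERR_UNSUPPORTED":
        return GraphErrorCode.GS_ERR_UNSUPPORTED
    if err in ("ERR_NOT_FOUND", "ERR_INTEGRITY"):
        # Both prevent graph semantics from being evaluated for the ref.
        return GraphErrorCode.GS_ERR_ARTIFACT_ERROR
    raise ValueError(f"unknown artifact error: {err!r}")
```

An artifact-level `ERR_INTEGRITY` maps to `GS_ERR_ARTIFACT_ERROR`, not `GS_ERR_INTEGRITY`; the latter is reserved for TGK or encoding-level inconsistencies detected by the graph store itself.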
### 3.4 GraphStoreInstance interface (logical)
For clarity, this profile treats `PageToken` as:
```text
PageToken := OctetString // opaque to TGK/STORE/1; structure is implementation-defined
```
A GraphStoreInstance that claims `TGK/STORE/1` conformance MUST, at the logical level, support at least the following operations for each snapshot:
```text
get_config() -> GraphStoreConfig
resolve_edge(
ref: EdgeRef
) -> EdgeBody | GraphErrorCode
edges_from(
node: Node,
type_filter: EdgeTypeFilter
) -> list<GraphEdgeView>
edges_to(
node: Node,
type_filter: EdgeTypeFilter
) -> list<GraphEdgeView>
edges_incident(
node: Node,
type_filter: EdgeTypeFilter
) -> list<GraphEdgeView>
```
and MAY additionally support:
```text
scan_edges(
type_filter: EdgeTypeFilter,
page_token: optional PageToken
) -> (
edges: list<GraphEdgeView>,
next_page_token: optional PageToken
)
neighbors(
node: Node,
type_filter: EdgeTypeFilter,
direction: GraphDirection
) -> list<Node>
```
Subject to the semantics specified in the rest of this document:
* `get_config()` MUST return the `GraphStoreConfig` associated with the GraphStoreInstance (identical for all snapshots of that instance).
* `resolve_edge` MUST behave as in §4.
* `edges_from`, `edges_to`, and `edges_incident` MUST behave as in §5.
* `scan_edges`, if implemented, MUST behave as in §6.
* `neighbors`, if implemented, MUST behave as in §5.7.
These signatures are **logical**: concrete APIs MAY group, name, or transport them differently (e.g. as RPC endpoints, methods on a class, or functions in a module), but their observable behavior MUST be equivalent to these operations applied to some well-defined snapshot.
---
## 4. Edge Resolution Semantics
### 4.1 Operation: `resolve_edge`
Logical signature:
```text
resolve_edge(ref: EdgeRef) ->
EdgeBody
| GS_ERR_NOT_EDGE
| GS_ERR_ARTIFACT_ERROR
| GS_ERR_UNSUPPORTED
| GS_ERR_INTEGRITY
```
Semantics for a given `GraphStoreSnapshot`:
1. **Artifact resolution**
* The graph store attempts to resolve `ref` to an `Artifact`:
```text
resolve_artifact(ref) ->
Artifact
| ERR_NOT_FOUND
| ERR_INTEGRITY
| ERR_UNSUPPORTED
```
* If `resolve_artifact` returns `ERR_UNSUPPORTED` for the identity domain of this `ref` (e.g., unknown `(encoding_profile, hash_id)`), or if `ref.hash_id` is not supported by any configured `IdentityDomain`, the graph store MUST return `GS_ERR_UNSUPPORTED`.
* If it returns `ERR_NOT_FOUND` or `ERR_INTEGRITY`, the graph store MUST return `GS_ERR_ARTIFACT_ERROR`.
2. **Edgehood check and decoding**
* If resolution succeeds with `A : Artifact`:
* If `A.type_tag` is absent, or `A.type_tag.tag_id` is not in the instance's `tgk_profiles.edge_tags`, the graph store MUST return `GS_ERR_NOT_EDGE`.
* Otherwise, the graph store applies the configured edge-encoding profile(s) to `A.bytes` as per `TGK/1-CORE §3.2`:
* If no active edge encoding profile decodes `A.bytes` successfully, the graph store MUST return `GS_ERR_NOT_EDGE`.
* If two or more profiles decode successfully but disagree on the resulting `EdgeBody`, the graph store MUST return `GS_ERR_INTEGRITY`.
* If decoding yields an `EdgeBody` whose `type` is not in `tgk_profiles.edge_types`, the Artifact MUST NOT be treated as an edge for this instance; `resolve_edge(ref)` MUST return `GS_ERR_NOT_EDGE`.
* Otherwise, let `Body = EdgeBody(A)` be the unique decoded `EdgeBody`.
* If `Body` violates `TGK/EDGE-NONEMPTY-ENDPOINT/CORE/1` (both `from` and `to` empty), the graph store MUST return `GS_ERR_INTEGRITY`.
3. **Return**
* If all checks succeed, `resolve_edge(ref)` MUST return `Body`.
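The three steps above can be sketched as a single function. This is an illustration under stated assumptions: references are `(hash_id, digest)` tuples, an `EdgeBody` is a `(type, from, to, payload)` tuple, an Artifact is a `(bytes, type_tag)` tuple, errors are strings, and each decoder is a hypothetical callable that returns `None` to signal "not this profile".

```python
from typing import Callable, Optional, Sequence, Set, Tuple, Union

Ref = Tuple[int, bytes]
Body = Tuple[int, Tuple[Ref, ...], Tuple[Ref, ...], Ref]
Art = Tuple[bytes, Optional[int]]
Decoder = Callable[[bytes], Optional[Body]]


def resolve_edge(
    ref: Ref,
    resolve_artifact: Callable[[Ref], Union[Art, str]],
    edge_tags: Set[int],
    edge_types: Set[int],
    decoders: Sequence[Decoder],
) -> Union[Body, str]:
    # Step 1: artifact resolution and error mapping.
    art = resolve_artifact(ref)
    if art == "ERR_UNSUPPORTED":
        return "GS_ERR_UNSUPPORTED"
    if art in ("ERR_NOT_FOUND", "ERR_INTEGRITY"):
        return "GS_ERR_ARTIFACT_ERROR"
    data, tag = art
    # Step 2: edgehood check and decoding.
    if tag is None or tag not in edge_tags:
        return "GS_ERR_NOT_EDGE"
    bodies = {b for b in (decode(data) for decode in decoders) if b is not None}
    if not bodies:
        return "GS_ERR_NOT_EDGE"   # no active profile decodes the bytes
    if len(bodies) > 1:
        return "GS_ERR_INTEGRITY"  # profiles disagree on the EdgeBody
    body = next(iter(bodies))
    if body[0] not in edge_types:
        return "GS_ERR_NOT_EDGE"   # unsupported edge type
    if not body[1] and not body[2]:
        return "GS_ERR_INTEGRITY"  # both endpoint lists are empty
    # Step 3: all checks passed.
    return body
```

The checks run in the order the steps list them, so a given `ref` always maps to the same outcome for a fixed snapshot.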
Determinism:
> **TGK/STORE-RESOLVE-DET/1**
> For a fixed snapshot, `resolve_edge` is a pure function of:
>
> * the snapshot's `Artifacts` and `config.tgk_profiles`, and
> * the input `ref`.
>
> Any conformant implementation MUST compute the same `EdgeBody` for a given `ref`, or the same error code, for that snapshot.
### 4.2 Relationship to `ProvenanceGraph.Edges`
For a fixed `GraphStoreSnapshot` with `ProvenanceGraph.Edges`:
* If `resolve_edge(ref)` returns `Body`, then:
```text
(ref, Body) ∈ ProvenanceGraph.Edges
```
* If `(ref, Body) ∈ ProvenanceGraph.Edges`, then `resolve_edge(ref)` MUST return `Body`.
* If `resolve_edge(ref)` returns `GS_ERR_NOT_EDGE`, then `ref` does not appear as an `EdgeRef` in `ProvenanceGraph.Edges` for this snapshot.
* If `resolve_edge(ref)` returns `GS_ERR_ARTIFACT_ERROR`, `GS_ERR_UNSUPPORTED`, or `GS_ERR_INTEGRITY`, the snapshot does not define a well-formed TGK edge for `ref`. Implementations MAY still expose such `ref` values for diagnostic or repair purposes, but MUST NOT treat them as edges in normal graph views or adjacency results.
Adjacency and scan queries MUST operate over the well-formed edge set `ProvenanceGraph.Edges` only, and MUST silently exclude any `ref` that would lead to error in `resolve_edge`.
---
## 5. Adjacency Query Semantics
### 5.1 Operations
`TGK/STORE/1` defines three core adjacency queries for a given snapshot:
```text
edges_from(
node: Node,
type_filter: EdgeTypeFilter
) -> list<GraphEdgeView>
edges_to(
node: Node,
type_filter: EdgeTypeFilter
) -> list<GraphEdgeView>
edges_incident(
node: Node,
type_filter: EdgeTypeFilter
) -> list<GraphEdgeView>
```
All three operations:
* are defined as pure functions over `ProvenanceGraph.Edges` for the snapshot;
* MUST return edge views in a deterministic order (see §5.5);
* MUST NOT return graph-level error codes; they return empty lists when no edges match.
### 5.2 `edges_from`
For a fixed `GraphStoreSnapshot` with `ProvenanceGraph.Edges`, `edges_from(node, type_filter)` MUST return a list containing each pair `(ref, body)` such that:
```text
(ref, body) ∈ ProvenanceGraph.Edges ∧
node ∈ body.from ∧
body.type ∈ EffectiveTypeSet(type_filter)
```
exactly once, where `EffectiveTypeSet(type_filter)` is:
* `config.tgk_profiles.edge_types` if `type_filter.types` is empty; otherwise
* `type_filter.types ∩ config.tgk_profiles.edge_types`.
Notes:
* `node` need not correspond to any Artifact in `Artifacts`; it is an arbitrary `Reference`.
* If no edges satisfy the predicate, the result MUST be the empty list.
### 5.3 `edges_to`
Similarly, `edges_to(node, type_filter)` MUST return each `(ref, body)` such that:
```text
(ref, body) ∈ ProvenanceGraph.Edges ∧
node ∈ body.to ∧
body.type ∈ EffectiveTypeSet(type_filter)
```
exactly once.
### 5.4 `edges_incident`
`edges_incident(node, type_filter)` MUST return the union of:
* all edges in `edges_from(node, type_filter)`, and
* all edges in `edges_to(node, type_filter)`,
with duplicates removed (i.e. if an edge has `node` in both `from` and `to`, it appears only once).
Formally, the result set is:
```text
{ (ref, body) |
(ref, body) ∈ ProvenanceGraph.Edges ∧
body.type ∈ EffectiveTypeSet(type_filter) ∧
node ∈ (body.from ∪ body.to)
}
```
### 5.5 Result ordering
For any of the adjacency operations above, the **result set** is defined as a mathematical set. To support pagination and deterministic clients, implementations MUST impose a deterministic total order before returning a list.
For `TGK/STORE/1`, the edge ordering for a given snapshot is **normatively defined** as:
1. For each edge `(edge_ref, body)`, form the byte sequence:
```text
order_key(edge_ref) = hash_id_bytes || digest_bytes
```
where:
* `hash_id_bytes` is `edge_ref.hash_id` encoded as a 2-byte unsigned integer in big-endian order; and
* `digest_bytes` is the raw `edge_ref.digest` byte sequence.
2. Sort edges in ascending lexicographic order by `order_key(edge_ref)`.
This ordering:
* MUST be used for:
* `edges_from`, `edges_to`, `edges_incident`, and
* `scan_edges` (if implemented);
* MUST be stable for a fixed snapshot; and
* ensures that any two conformant implementations operating on the same snapshot produce adjacency and scan lists that are identical, including order.
This ordering is equivalent to sorting by canonical `ReferenceBytes` under `ENC/ASL1-CORE v1`.
> **TGK/STORE-EDGE-ORDER/1**
> For a fixed snapshot, the ordering of edges returned by any `TGK/STORE/1` operation is the ascending lexicographic order of `(edge_ref.hash_id, edge_ref.digest)` as encoded above. Conformant implementations MUST NOT use any other ordering.
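The order key is straightforward to compute with a fixed-width big-endian prefix. The sketch below models a reference as a `(hash_id, digest)` pair and uses made-up sample values purely to show the resulting order.

```python
import struct


def order_key(hash_id: int, digest: bytes) -> bytes:
    """hash_id as 2-byte big-endian, followed by the raw digest bytes."""
    return struct.pack(">H", hash_id) + digest


# Illustrative references only; real digests are longer.
refs = [(0x0100, b"\x00"), (0x0002, b"\xff\xff"), (0x0002, b"\xff"), (0x0001, b"\x10")]
ordered = sorted(refs, key=lambda r: order_key(*r))
```

Big-endian encoding matters here: with a little-endian prefix, `0x0100` would encode as `00 01` and sort before `0x0001` (`01 00`), which would not match the numeric order of `hash_id`. Python compares `bytes` lexicographically, with a strict prefix sorting before any longer sequence that extends it.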
### 5.6 Relationship to `ProvenanceGraph.Nodes`
The adjacency operations are implicitly defined over `ProvenanceGraph.Nodes`:
* If `node ∉ ProvenanceGraph.Nodes` for a snapshot, all three adjacency queries MUST return empty lists.
* If `node ∈ ProvenanceGraph.Nodes`, adjacency queries MUST enumerate exactly the edges incident to that node, subject to `type_filter`.
`TGK/STORE/1` does not define a “node resolution” operation; the resolution of nodes to Artifacts is done via ASL/1-STORE or the underlying `resolve_artifact` mapping in the identity domains.
### 5.7 Optional neighbor queries: `neighbors`
Graph stores MAY expose a neighbor query helper, derived purely from the adjacency semantics:
```text
neighbors(
node: Node,
type_filter: EdgeTypeFilter,
direction: GraphDirection
) -> list<Node>
```
Semantics for a given snapshot:
1. Define `EffectiveTypeSet(type_filter)` as in §5.2.
2. Define the incident edge set `I(node, type_filter, direction)`:
* If `direction = OUT`:
```text
I = { (ref, body) |
(ref, body) ∈ ProvenanceGraph.Edges ∧
node ∈ body.from ∧
body.type ∈ EffectiveTypeSet(type_filter)
}
```
* If `direction = IN`:
```text
I = { (ref, body) |
(ref, body) ∈ ProvenanceGraph.Edges ∧
node ∈ body.to ∧
body.type ∈ EffectiveTypeSet(type_filter)
}
```
* If `direction = BOTH`:
```text
I = { (ref, body) |
(ref, body) ∈ ProvenanceGraph.Edges ∧
node ∈ (body.from ∪ body.to) ∧
body.type ∈ EffectiveTypeSet(type_filter)
}
```
3. Define the neighbor set `N(node, type_filter, direction)`:
* If `direction = OUT`:
```text
N = { n ∈ Node |
∃ (ref, body) ∈ I, n ∈ body.to
}
```
* If `direction = IN`:
```text
N = { n ∈ Node |
∃ (ref, body) ∈ I, n ∈ body.from
}
```
* If `direction = BOTH`:
```text
N = N_OUT ∪ N_IN
```
where:
```text
N_OUT = { n ∈ Node |
∃ (ref, body) ∈ ProvenanceGraph.Edges,
node ∈ body.from,
body.type ∈ EffectiveTypeSet(type_filter),
n ∈ body.to
}
N_IN = { n ∈ Node |
∃ (ref, body) ∈ ProvenanceGraph.Edges,
node ∈ body.to,
body.type ∈ EffectiveTypeSet(type_filter),
n ∈ body.from
}
```
Notes:
* Self-loops are included naturally by these definitions: if an edge has `node` in both `from` and `to`, then `node ∈ N` for `direction = BOTH`, and MAY appear in `N` for `OUT` and/or `IN` depending on the edge's endpoints.
* `N` is a mathematical set: each neighbor Node appears at most once in the result.
4. `neighbors(...)` MUST return the elements of `N` in a deterministic order defined as:
* For each neighbor `n`, define:
```text
node_order_key(n) = hash_id_bytes || digest_bytes
```
where:
* `hash_id_bytes` is `n.hash_id` as `u16` big-endian, and
* `digest_bytes` is `n.digest`.
* Sort neighbors in ascending lexicographic order by `node_order_key(n)` and return the resulting list.
`neighbors` MUST be observationally equivalent to deriving neighbors from `ProvenanceGraph.Edges` using the definitions above. It MUST NOT introduce any new graph semantics beyond those defined here.
> **TGK/STORE-NEIGHBORS/1 (OPTIONAL)**
> If `neighbors` is implemented, its results MUST be consistent with the semantics above and with the canonical node ordering based on `(hash_id, digest)`.
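A neighbor helper derived from the definitions above might look as follows. As before, the tuple shapes and the in-memory `edges` map are assumptions of this sketch, and `types` stands for the already-computed `EffectiveTypeSet(type_filter)`.

```python
import struct
from typing import Dict, List, Set, Tuple

Ref = Tuple[int, bytes]                                   # (hash_id, digest)
Body = Tuple[int, Tuple[Ref, ...], Tuple[Ref, ...], Ref]  # (type, from, to, payload)

OUT, IN, BOTH = 1, 2, 3  # GraphDirection values


def neighbors(edges: Dict[Ref, Body], node: Ref, types: Set[int], direction: int) -> List[Ref]:
    found: Set[Ref] = set()  # N is a set: each neighbor appears at most once
    for etype, frm, to, _payload in edges.values():
        if etype not in types:
            continue
        if direction in (OUT, BOTH) and node in frm:
            found.update(to)   # contributes to N_OUT
        if direction in (IN, BOTH) and node in to:
            found.update(frm)  # contributes to N_IN
    # Canonical node ordering: big-endian hash_id, then digest.
    return sorted(found, key=lambda n: struct.pack(">H", n[0]) + n[1])
```

The helper reads only the edge set, so it adds no graph semantics of its own; it is a projection of the adjacency definitions onto node lists.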
---
## 6. Edge Scan Semantics (Optional)
### 6.1 Operation: `scan_edges`
Graph stores MAY expose a scanning operation:
```text
scan_edges(
type_filter: EdgeTypeFilter,
page_token: optional PageToken
) -> (
edges: list<GraphEdgeView>,
next_page_token: optional PageToken
)
```
`PageToken` is an opaque `OctetString` whose internal structure and interpretation are implementation-defined.
Semantics:
* For a fixed snapshot and `type_filter`, `scan_edges` MUST enumerate all edges `(ref, body) ∈ ProvenanceGraph.Edges` such that:
```text
body.type ∈ EffectiveTypeSet(type_filter)
```
exactly once, when invoked repeatedly starting with `page_token = absent` and continuing with `next_page_token` until it returns `next_page_token = absent`.
* Within each call, `edges` MUST be ordered according to the deterministic edge ordering in §5.5.
* Pagination boundaries MUST NOT:
* skip edges, or
* duplicate edges across pages for a given snapshot.
* If `page_token` is absent, the scan starts from the beginning of the ordering; otherwise, it resumes from the position encoded in `page_token`.
* If the underlying snapshot changes between pages (e.g. more Artifacts ingested):
* `TGK/STORE/1` does not require consistency across pages; implementations MAY treat each call as operating on an implementation-defined snapshot.
* Implementations SHOULD document whether `scan_edges` is snapshot-stable across a full scan or is “fuzzy” (best-effort, eventually-consistent).
`scan_edges` MUST NOT surface edges that are not in `ProvenanceGraph.Edges` for whatever snapshot it operates on.
> **TGK/STORE-SCAN/1 (OPTIONAL)**
> If `scan_edges` is implemented, it MUST satisfy the enumeration and ordering semantics above for each snapshot it operates on.
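One possible pagination scheme, sketched below, uses the order key of the last returned edge as the opaque `PageToken` and resumes strictly after it. The token format, the `page_size` parameter, and the in-memory edge map are all assumptions of this sketch; the profile leaves token structure to implementations.

```python
import bisect
import struct
from typing import Dict, List, Optional, Set, Tuple

Ref = Tuple[int, bytes]                                   # (hash_id, digest)
Body = Tuple[int, Tuple[Ref, ...], Tuple[Ref, ...], Ref]  # (type, from, to, payload)


def scan_edges(
    edges: Dict[Ref, Body],
    types: Set[int],
    page_token: Optional[bytes] = None,
    page_size: int = 2,
) -> Tuple[List[Tuple[Ref, Body]], Optional[bytes]]:
    # Matching edges in canonical order, each paired with its order key.
    matching = sorted(
        ((struct.pack(">H", r[0]) + r[1], r, b) for r, b in edges.items() if b[0] in types),
        key=lambda item: item[0],
    )
    start = 0
    if page_token is not None:
        # Resume strictly after the last key handed out on the previous page.
        start = bisect.bisect_right([k for k, _, _ in matching], page_token)
    page = matching[start:start + page_size]
    has_more = start + page_size < len(matching)
    next_token = page[-1][0] if has_more else None
    return [(r, b) for _, r, b in page], next_token
```

A key-based token of this kind stays meaningful if edges are ingested between calls, since it encodes a position in the ordering and not an offset; whether a full scan is snapshot-stable still depends on the implementation's snapshot model.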
---
## 7. Determinism & Consistency
### 7.1 Snapshot-relative determinism
Fix:
* a `GraphStoreConfig`,
* a finite Artifact set `Artifacts`,
* a `TGKProfileSet`,
* and the resulting `ProvenanceGraph` per `TGK/1-CORE`.
For any two `TGK/STORE/1`-conformant GraphStoreInstances that:
* use the same `GraphStoreConfig`, and
* have snapshots whose `(Artifacts, config.tgk_profiles)` pairs are equal,
the following MUST hold for all inputs:
* `get_config()` returns equal `GraphStoreConfig` values.
* `resolve_edge(ref)` returns the same `EdgeBody` or the same error code.
* `edges_from(node, type_filter)`, `edges_to(node, type_filter)`, and `edges_incident(node, type_filter)` return identical lists of `GraphEdgeView` values (same elements in the same order, per §5.5).
* If both implement `scan_edges`, then for any fixed snapshot:
* the union over all pages of all `edges` returned by `scan_edges` equals the set of all edges in `ProvenanceGraph.Edges` satisfying the `type_filter`, and
* the per-page lists and page boundaries are identical up to differences in the opaque representation of `PageToken`.
* If both implement `neighbors`, then for any fixed snapshot:
* `neighbors(node, type_filter, direction)` returns identical neighbor Node lists (same elements in the same order), consistent with §5.7.
> **TGK/STORE-DET/1**
> For a fixed snapshot and `GraphStoreConfig`, any two `TGK/STORE/1`-conformant GraphStoreInstances expose isomorphic graphs (identical edge and node sets, with edge lists ordered as in §5.5) and identical results for all `TGK/STORE/1` operations.
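As a rough illustration of the determinism requirement, the sketch below builds two toy stores over the same edge set, one scanning linearly and one using an adjacency index, and expects identical `edges_from` results. The edge shape `(ref, type_id, sources, targets)`, the tuple form of `ref`, the string node names, and both class names are hypothetical and not taken from `TGK/1-CORE`.

```python
from collections import defaultdict

# Hypothetical edge shape: (ref, type_id, sources, targets),
# where ref = (hash_id, digest) and nodes are plain strings.


class LinearScanStore:
    """Toy store with no index: filters the full edge list on every query."""

    def __init__(self, edges):
        self._edges = list(edges)

    def edges_from(self, node, type_set):
        hits = [e for e in self._edges if node in e[2] and e[1] in type_set]
        # Assumed ordering: by the edge ref's (hash_id, digest).
        return sorted(hits, key=lambda e: e[0])


class IndexedStore:
    """Toy store with a source-node index; the index is an optimization only."""

    def __init__(self, edges):
        self._by_source = defaultdict(list)
        for e in edges:
            for src in set(e[2]):
                self._by_source[src].append(e)

    def edges_from(self, node, type_set):
        hits = [e for e in self._by_source.get(node, []) if e[1] in type_set]
        # Same ordering rule as above, so ingest order never leaks into results.
        return sorted(hits, key=lambda e: e[0])
```

Both stores apply the ordering at query time, so the order in which edges were ingested has no observable effect. Nodes without outgoing edges yield an empty list rather than an error.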
### 7.2 Projection fidelity
> **TGK/STORE-PROJECTION/1**
> For any GraphStoreSnapshot, GraphStoreInstances MUST ensure that:
>
> * Every `(ref, body)` reachable via graph queries (resolve or adjacency) corresponds to an edge in the snapshot's `ProvenanceGraph.Edges`.
> * No graph query introduces edges or nodes that cannot be derived from the snapshot's `Artifacts` and `config.tgk_profiles` under `TGK/1-CORE` (§4.1).
Persisted graph indexes and caches:
* are **optimizations only**;
* MUST be consistent with the `ProvenanceGraph` induced by their underlying Artifact and profile sets; and
* MUST NOT introduce additional, non-Artifactual relationships as if they were TGK edges.
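One way to picture the "optimizations only" rule is a consistency check that recomputes an index from the edge set and reports any difference. The sketch below is only an illustration: the edge map shape (`ref -> (type_id, sources, targets)`), the string node names, and the functions `derive_out_index` and `index_violations` are all hypothetical, and the check covers membership only, not ordering.

```python
def derive_out_index(edges):
    """Recompute source-node -> sorted edge refs directly from the edge set."""
    index = {}
    for ref, (_type_id, sources, _targets) in edges.items():
        for src in set(sources):
            index.setdefault(src, []).append(ref)
    return {node: sorted(refs) for node, refs in index.items()}


def index_violations(persisted, edges):
    """List entries where a persisted index disagrees with the derived one.

    "extra" entries are relationships the graph does not contain;
    "missing" entries are graph edges the index fails to expose.
    """
    expected = derive_out_index(edges)
    problems = []
    for node in sorted(set(persisted) | set(expected)):
        have = set(persisted.get(node, []))
        want = set(expected.get(node, []))
        for ref in sorted(have - want):
            problems.append(("extra", node, ref))
        for ref in sorted(want - have):
            problems.append(("missing", node, ref))
    return problems
```

An empty result means the persisted index exposes exactly the relationships present in the edge set; any `extra` entry would be a non-Artifactual relationship of the kind the last bullet forbids.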
### 7.3 Interaction with evolving Artifact sets
`TGK/STORE/1` does not fix how snapshots evolve. Implementations MAY:
* operate with explicit snapshot boundaries (e.g. versioned graph views); or
* continuously ingest new Artifacts into a moving snapshot.
Requirements:
* Each individual invocation of a graph operation MUST be interpreted with respect to some well-defined snapshot.
* Implementations MAY use different snapshots for different operations, but SHOULD document their consistency model (e.g. “read-your-writes” guarantees if combined with local Artifact injection).
Consistency model guidance:
* If a GraphStoreInstance represents a moving snapshot, back-to-back calls are allowed to observe different snapshots unless the implementation documents stronger guarantees.
* If an API exposes explicit snapshot views (i.e., a GraphStoreSnapshot handle), all graph operations performed through that view MUST be evaluated against the same snapshot and MUST NOT reflect later ingests/removals.
* Clients that need stable pagination, cross-operator comparisons, or multi-call invariants SHOULD use snapshot views or versioned endpoints rather than relying on repeated calls against a moving instance.
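The difference between a moving instance and an explicit snapshot view can be sketched as follows. The class names `MovingGraphStore` and `SnapshotView`, the `ingest_edge` method, and the edge representation are hypothetical. Copying the edge map is the simplest way to freeze a view; a real implementation would more likely use versioning.

```python
class SnapshotView:
    """Frozen view: every query is evaluated against the same edge set."""

    def __init__(self, edges):
        self._edges = dict(edges)  # private copy; later ingests are invisible

    def edges_from(self, node, type_set):
        hits = [(ref, body) for ref, body in self._edges.items()
                if node in body[1] and body[0] in type_set]
        # Assumed ordering: by the edge ref's (hash_id, digest).
        return sorted(hits, key=lambda pair: pair[0])


class MovingGraphStore:
    """Moving instance: each call sees whatever has been ingested so far."""

    def __init__(self):
        self._edges = {}  # ref -> (type_id, sources, targets)

    def ingest_edge(self, ref, type_id, sources, targets):
        self._edges[ref] = (type_id, tuple(sources), tuple(targets))

    def snapshot(self):
        return SnapshotView(self._edges)

    def edges_from(self, node, type_set):
        # Each invocation is still interpreted against one well-defined
        # snapshot: the state of the store at the time of the call.
        return self.snapshot().edges_from(node, type_set)
```

A client that takes `view = store.snapshot()` before a multi-call traversal keeps seeing the same edges even while ingestion continues on `store`.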
---
## 8. Interaction with Other Layers (Informative)
### 8.1 PEL/1-SURF
`PEL/1-SURF` typically produces:
* programs, inputs, outputs, and surface ExecutionResult Artifacts; and
* (via TGK-aware profiles) EdgeArtifacts that represent execution relationships (e.g., edges with type `EDGE_EXECUTION`).
A GraphStoreInstance:
* consumes these Artifacts (from one or more `StoreInstance`s);
* derives a `ProvenanceGraph` whose edges include execution edges (and possibly trace- or receipt-related edges); and
* exposes adjacency and neighbor queries that higher layers can use to follow execution relationships backward or forward.
`TGK/STORE/1` does not require PEL; it only provides a graph substrate that PEL-related profiles can target.
### 8.2 CIL/1, FER/1, FCT/1, OI/1
Certification, evidence, fact, and overlay layers:
* define edge types (e.g., `EDGE_ATTESTS`, `EDGE_FACT_SUPPORTS`, `EDGE_OVERLAY_MAPS`) and EdgeArtifacts encoding those relationships;
* depend on graph queries to:
* find which certificates, receipts, or overlays relate to a given ArtifactRef;
* navigate from facts back to supporting evidence;
* traverse overlay-based navigational relationships.
`TGK/STORE/1` provides the minimal, policy-neutral query hooks for those profiles:
* `edges_to` / `edges_from` / `edges_incident` with `EdgeTypeFilter` selecting the relevant edge types;
* optional `neighbors` for “just give me the adjacent nodes” views;
* optional `scan_edges` for bulk indexing or analytics.
Semantics of what those edges “mean” are left to the profiles themselves.
### 8.3 TGK/PROV/1 (Future)
`TGK/PROV/1` is expected to define provenance and trace operators (e.g. backwards reachability along selected edge types). It can be specified:
* as pure functions over `ProvenanceGraph`; and/or
* as operations that make use of `TGK/STORE/1` adjacency and neighbor queries.
`TGK/STORE/1` ensures that any such higher-level operators can rely on a common, identity-preserving graph view across implementations.
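As an informal illustration of the second option, backward reachability can be layered on an `edges_to`-style query alone. This is not a definition of any `TGK/PROV/1` operator. The dictionary edge shape (`ref`, `type`, `from`, `to`), the helper `make_edges_to`, and the function `backward_reachable` are assumptions made for this sketch.

```python
from collections import deque


def make_edges_to(edges):
    """Build a toy edges_to(node, type_set) query over an in-memory edge list."""
    def edges_to(node, type_set):
        hits = [e for e in edges if node in e["to"] and e["type"] in type_set]
        # Assumed ordering: by the edge ref's (hash_id, digest).
        return sorted(hits, key=lambda e: e["ref"])
    return edges_to


def backward_reachable(edges_to, start, type_set):
    """Breadth-first walk against edge direction, restricted to type_set.

    Returns the discovered nodes in visit order, excluding the start node.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for edge in edges_to(node, type_set):
            for src in edge["from"]:
                if src not in seen:
                    seen.add(src)
                    order.append(src)
                    queue.append(src)
    return order
```

Because `edges_to` returns edges in a deterministic order, the visit order is reproducible across implementations that satisfy §7.1, provided every call is evaluated against the same snapshot.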
---
## 9. Conformance
An implementation is **TGK/STORE/1-conformant** if, for each GraphStoreInstance it exposes, it satisfies all of the following:
1. **Configuration and identity domains**
* Associates a well-defined `GraphStoreConfig` with each GraphStoreInstance, and exposes it via `get_config()`.
* For each identity domain in `IdSpaceConfig.domains`, ensures that `Reference` values in that domain are interpreted according to `ASL/1-CORE` and, when applicable, `ENC/ASL1-CORE` and `HASH/ASL1`.
* Does not alias or merge distinct `(encoding_profile, hash_id)` domains at this layer.
* Treats `Reference` values as opaque identities; does not reinterpret `digest` bytes across algorithms or encodings.
* Treats `hash_id` values outside `IdSpaceConfig.domains` as unsupported for graph purposes.
2. **Projection fidelity**
* For any snapshot, there exists a finite `Artifacts` set and `TGKProfileSet = config.tgk_profiles` such that:
* the graph exposed by the GraphStoreInstance is exactly the `ProvenanceGraph` induced by `(Artifacts, TGKProfileSet)` per `TGK/1-CORE`; and
* all graph queries are logically evaluated over that `ProvenanceGraph`.
* Does not introduce edges, nodes, or relationships that cannot be derived from Artifacts and `TGKProfileSet` according to `TGK/1-CORE`.
3. **Correct edge resolution**
* Implements `resolve_edge(ref)` as in §4:
* uses Artifact resolution consistent with `ASL/1-STORE`,
* checks edgehood and decodes EdgeArtifacts according to `TGK/1-CORE §3` and the configured `TGKProfileSet`,
* returns the correct `EdgeBody` for any `EdgeRef` in `ProvenanceGraph.Edges`,
* returns graph-level errors (`GS_ERR_NOT_EDGE`, `GS_ERR_ARTIFACT_ERROR`, `GS_ERR_UNSUPPORTED`, `GS_ERR_INTEGRITY`) with the specified semantics.
4. **Correct adjacency queries**
* Implements `edges_from`, `edges_to`, and `edges_incident` as exact projections of `ProvenanceGraph.Edges`, as in §5.
* Uses the deterministic edge ordering defined in §5.5 for all adjacency and scan results.
* Returns empty lists (not errors) for nodes with no incident edges in the snapshot.
5. **Optional scan semantics (if provided)**
* If `scan_edges` is implemented, enumerates edges consistent with §6 for each snapshot it operates on.
* Ensures that pagination does not skip or duplicate edges for a given snapshot.
6. **Optional neighbor semantics (if provided)**
* If `neighbors` is implemented, computes neighbor Node sets exactly as in §5.7.
* Uses the deterministic node ordering derived from `(hash_id, digest)` for the returned Node list.
* Ensures that `neighbors` can be derived from the snapshot's `ProvenanceGraph` without accessing extra, off-graph state.
7. **Error handling and integrity**
* Surfaces errors consistent with §3.3 and §4.1.
* Treats invariant violations (ASL identity, encoding consistency, TGK edge invariants) as `GS_ERR_INTEGRITY` or an error at least as strict; does not silently ignore such conditions while treating affected Artifacts as valid edges.
8. **Layering and neutrality**
* Does not embed execution, certification, policy, or fact semantics into `TGK/STORE/1` constructs.
* Leaves provenance algorithms, certificate evaluation, and fact acceptance criteria to `TGK/PROV/1`, `CIL/1`, `FER/1`, `FCT/1`, and other profiles.
* Does not introduce separate node or edge ID schemes: `Node := Reference`, `EdgeRef := Reference` remain the only identifiers.
9. **Determinism**
* For any fixed snapshot and `GraphStoreConfig`, conforms to the determinism requirements in §7.1.
* If it claims compatibility with other TGK-aware components, ensures that they all see the same graph (up to the normative edge ordering) for a given `(Artifacts, TGKProfileSet)`.
Everything else — concrete APIs, schema for paging tokens, topology, caching, distribution, and operational policies — is implementation-specific and MAY be standardized by additional profiles, provided they respect the semantics and invariants defined by `TGK/STORE/1`.
---
## 10. Version History (Informative)
**0.2.4 — 2025-11-16**
* Introduced an explicit `PageToken` type alias as an opaque `OctetString` and tightened its description in `scan_edges` semantics to make the cursor representation and opacity clearer.
* Clarified in §3.4 that `PageToken`'s structure is implementation-defined while remaining logically a position in the canonical edge ordering.
* Performed minor editorial cleanups to keep terminology consistent around identity domains, unsupported `hash_id` handling, and the separation between artifact-resolution errors and graph-level query behavior.
**0.2.3 — 2025-11-16**
* Added an explicit logical `GraphStoreInstance` interface (§3.4) listing required and optional operations (`get_config`, `resolve_edge`, adjacency queries, `scan_edges`, `neighbors`), clarifying that concrete APIs may wrap or transport these differently while preserving semantics.
* Required `get_config()` to return the `GraphStoreConfig` associated with a GraphStoreInstance, and extended the determinism guarantees in §7.1 to cover `get_config`.
* Updated conformance criteria to require that `GraphStoreConfig` be exposed via `get_config` and that all implemented operations behave consistently with their logical definitions.
**0.2.2 — 2025-11-16**
* Tightened the semantics of the optional `neighbors` helper to be fully set-based and snapshot-relative, with self-loop behavior determined purely by the formal definitions (no implementation choice), ensuring cross-implementation determinism.
* Defined `neighbors`'s neighbor sets directly in terms of `ProvenanceGraph.Edges`, and made the `BOTH` case explicitly `N_OUT ∪ N_IN`.
* Introduced a canonical node ordering for `neighbors` results based on `(hash_id, digest)`, analogous to the canonical edge ordering.
* Clarified that `neighbors` is observationally equivalent to computing neighbors from the snapshot's `ProvenanceGraph`, and cannot introduce new semantics.
**0.2.1 — 2025-11-16**
* Clarified that any `GraphEdgeView { edge_ref, body }` MUST be consistent with `resolve_edge(edge_ref)` for the same snapshot.
* Introduced an optional `neighbors(node, type_filter, direction)` helper, defined purely in terms of the `ProvenanceGraph` and adjacency semantics, with deterministic node ordering analogous to edge ordering.
* Extended determinism requirements (`TGK/STORE-DET/1`) to cover `neighbors` when implemented.
**0.2.0 — 2025-11-16**
* Introduced explicit `GraphStoreConfig` structure (`IdSpaceConfig`, `ArtifactScope`, `TGKProfileSet`).
* Clarified identity-domain handling and unsupported `hash_id` semantics.
* Defined a normative global edge ordering (`TGK/STORE-EDGE-ORDER/1`) based on `(hash_id, digest)` for all adjacency and scan operations.
* Tightened `resolve_edge` semantics and error mapping (`GS_ERR_NOT_EDGE`, `GS_ERR_ARTIFACT_ERROR`, `GS_ERR_UNSUPPORTED`, `GS_ERR_INTEGRITY`).
* Clarified optional `scan_edges` behavior and snapshot-relative consistency.
* Strengthened determinism and projection fidelity requirements, including `TGK/STORE-DET/1` and `TGK/STORE-PROJECTION/1`.
* Added explicit conformance criteria and made layering / neutrality constraints more precise.
**0.1.9 — 2025-11-16**
* First consolidated draft of `TGK/STORE/1` as a graph store profile over `ASL/1-STORE` and `TGK/1-CORE`.
* Defined `GraphStoreInstance` / `GraphStoreSnapshot`, basic edge resolution and adjacency operations, and an initial error model.
* Introduced optional `scan_edges` and high-level determinism and layering requirements.
---
## Document History
* **0.2.4 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.