# OPREG/TGK-DOCGRAPH/1 — Document Graph Registry
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: Plan
Last Updated: 2025-12-01
Linked Phase Pack: PH12
Tags: [registry, tgk, docgraph]
<!-- Source: /amduat/logs/ph12/evidence/import/PH12-EV-IMPORT-001/opreg-tgk-docgraph-design-20251201.md | Canonical: /amduat/tier1/opreg-tgk-docgraph-1.md -->
**Document ID:** `OPREG/TGK-DOCGRAPH/1`
**Layer:** L1 Profile (TGK Doc Graph Registry over `TGK/1-CORE` + `ENC/TGK1-EDGE/1`)
**Depends on (normative):**
* `ASL/1-CORE v0.4.x` — `Artifact`, `Reference`, `TypeTag`, `HashId`
* `ENC/ASL1-CORE v1.x` — canonical encodings for Artifacts and References
* `HASH/ASL1 v0.2.x` — ASL1 hash family (`HASH-ASL1-256`)
* `TGK/1-CORE v0.7.x` — trace graph kernel: `Node`, `EdgeBody`, `EdgeTypeId`
* `ENC/TGK1-EDGE/1 v0.1.x` — canonical encoding for `EdgeBody` / EdgeArtifacts
* `AMDUAT-DOCID` (Tier-0) — document identity and SoT/surface model
**Integrates with (informative):**
* `TGK/STORE/1` — graph store/query profile over ASL/1-STORE + TGK
* ADR-032 and PH10/PH12 import designs (RΩ / export)
* Future doc graph consumers (assistant overlays, IDX, provenance views)
© 2025 Amduat Programme.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Purpose and Non-Goals
### 0.1 Purpose
`OPREG/TGK-DOCGRAPH/1` defines a **doc/import/navigation graph registry** for Amduat:
* It names **node concepts** (as ASL/1 Artifacts) for:
  * conceptual documents (DOCID lineages),
  * document versions at a given snapshot (e.g. RΩ),
  * Git commits and blobs,
  * Amduat SoT instances.
* It names **edge types** (`EdgeTypeId`s) that connect those concepts:
  * document ↔ version, surface, SoT state,
  * version ↔ Git blob/commit,
  * document ↔ Amduat instance.
* It constrains how those edges are represented as EdgeArtifacts under
  `ENC/TGK1-EDGE/1` and consumed via `TGK/STORE/1`.
This registry is intentionally **doc/import scoped**. Execution, fact, and
certificate edges live in their own TGK/OPREG registries and MUST NOT reuse
`EdgeTypeId` assignments from this doc graph registry.
This Tier-1 stub is the **canonical registry companion** to the PH12 design
note `PH12-EV-IMPORT-001 — Doc Graph OPREG Profile Design
(/logs/ph12/evidence/import/PH12-EV-IMPORT-001/opreg-tgk-docgraph-design-20251201.md)`,
which records design intent and sandbox experience; this document is the SoT
for the node and edge vocabulary.
### 0.2 Non-goals
This registry does **not** define:
* any storage API (`ASL/1-STORE`, `TGK/STORE/1` already cover that),
* any provenance algorithms or queries (`TGK/PROV/1` and higher layers),
* any assistant or overlay behavior (those consume this registry),
* concrete import/export profiles (ADR-032 handles those).
It only defines **concepts and edge types**; encoding and storage use existing
Tier-1 profiles.
---
## 1. Node Concepts (Informative overview)
This section summarizes node concepts; canonical encodings and type_tags are
defined in companion encoding profiles (TBD).
### 1.1 DOC_CONCEPT
Conceptual governed document identity per `AMDUAT-DOCID`:
* `identity_authority` (string),
* `lineage_id` (string),
* optional `doc_code` (string),
* optional `code_status` (e.g. `tentative`, `stable`).
There is exactly one `DOC_CONCEPT` node per `(identity_authority, lineage_id)`.
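
The uniqueness rule above can be illustrated with a minimal in-memory sketch. The class names `DocConcept` and `DocConceptIndex` and the keep-first policy on a repeated key are assumptions made for illustration; the canonical node encoding is left to the companion encoding profiles.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DocConcept:
    """Illustrative (non-normative) view of a DOC_CONCEPT node."""
    identity_authority: str
    lineage_id: str
    doc_code: Optional[str] = None
    code_status: Optional[str] = None  # e.g. "tentative", "stable"

    def key(self) -> Tuple[str, str]:
        # The registry fixes one node per (identity_authority, lineage_id).
        return (self.identity_authority, self.lineage_id)


class DocConceptIndex:
    """Hypothetical helper that keeps at most one DocConcept per key."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], DocConcept] = {}

    def upsert(self, concept: DocConcept) -> DocConcept:
        # Assumption: the first concept seen for a key wins; later ones
        # with the same key are not stored as separate nodes.
        return self._by_key.setdefault(concept.key(), concept)

    def __len__(self) -> int:
        return len(self._by_key)
```

With this sketch, two `upsert` calls that share `(identity_authority, lineage_id)` return the same object and leave the index at size one, even if the optional fields differ.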
### 1.2 DOC_VERSION
Versioned SoT slice of a governed document at a snapshot commit:
* `identity_authority`, `lineage_id`, `doc_code`, `code_status`,
* `g_commit` (Git commit id),
* `sha256` (content hash of the doc bytes at `g_commit`),
* `path` (repository path at `g_commit`, e.g. `/amduat/tier0/docid.md`),
* `surface`, `sot` (SoT state) per DOCID header.
Multiple `DOC_VERSION` nodes may exist for a `DOC_CONCEPT` across commits.
### 1.3 GIT_COMMIT
Git commit metadata:
* `commit` (sha1),
* `parents` (list of parent commit ids),
* `tree` (tree id),
* `author_name`, `author_email`, `authored_at`,
* `committer_name`, `committer_email`, `committed_at`,
* summary or truncated message.
### 1.4 GIT_BLOB
Content snapshot for a single blob at `g_commit`:
* `blob_sha` (sha1),
* `sha256` (content hash),
* `size_bytes`,
* `mode` (tree mode, including exec/symlink bits),
* `path` at `g_commit`.
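
As a hedged illustration of how the two hash fields differ: in a SHA-1 Git repository, a blob id is the SHA-1 of a `blob <size>\0` header followed by the content, while the sketch below assumes that `sha256` is taken over the raw content bytes only. The helper names `git_blob_sha1` and `git_blob_fields` are hypothetical.

```python
import hashlib
from typing import Dict, Union


def git_blob_sha1(content: bytes) -> str:
    """Blob id as computed by Git for SHA-1 repositories."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def git_blob_fields(content: bytes, mode: str, path: str) -> Dict[str, Union[str, int]]:
    """Collect the GIT_BLOB fields listed above for one blob (illustrative)."""
    return {
        "blob_sha": git_blob_sha1(content),
        # Assumption: sha256 covers the raw bytes, without the Git header.
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "mode": mode,
        "path": path,
    }
```

For example, `git_blob_sha1(b"")` yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known id of the empty blob.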
### 1.5 AMDUAT_INSTANCE
Descriptor for an Amduat SoT instance:
* `g_commit` (RΩ commit),
* `store_root` (SoT store root),
* `store_backend_id`,
* references to RΩ FER/1 receipts and manifests,
* optional labels (environment, hostname, etc.).
### 1.6 Helper nodes
* `SURFACE` — surface classification nodes (e.g. `tier0`, `tier1`, `phase`, `evidence`).
* `SOT_STATE` — SoT state nodes (`Yes`, `Plan`, `Ref`).
---
## 2. Edge Types (Doc Graph Domain)
`EdgeTypeId` values in this registry are reserved for doc/import/navigation
edges. Concrete numeric assignments live in the encoding/catalogue layer.
Implementations and other OPREG registries MUST treat these `EdgeTypeId`s as
belonging exclusively to the **Amduat doc graph domain**:
* the eventual allocation for this registry is expected to reserve a contiguous
`EdgeTypeId` band (informally: an `AMDUAT-DOCGRAPH` band),
* only doc/import/navigation semantics (edges in §§2.1–2.4) may occupy that
band,
* PEL execution, FER/1, CIL, FCT, and other TGK domains MUST use their own
registries and bands.
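
The band rule can be sketched as a simple range check. Numeric assignments are not fixed by this registry, so the ranges `0x1000..0x10FF` and `0x2000..0x20FF`, the band name `PEL-EXEC`, and the helper names below are purely hypothetical placeholders.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable


@dataclass(frozen=True)
class EdgeTypeBand:
    """A contiguous, inclusive range of EdgeTypeId values (illustrative)."""
    name: str
    first: int
    last: int

    def contains(self, edge_type_id: int) -> bool:
        return self.first <= edge_type_id <= self.last

    def overlaps(self, other: "EdgeTypeBand") -> bool:
        return self.first <= other.last and other.first <= self.last


def assert_disjoint(bands: Iterable[EdgeTypeBand]) -> None:
    """Raise ValueError if any two bands share an EdgeTypeId."""
    for a, b in combinations(list(bands), 2):
        if a.overlaps(b):
            raise ValueError(f"bands {a.name} and {b.name} overlap")


# Hypothetical allocations; real values live in the encoding/catalogue layer.
DOCGRAPH_BAND = EdgeTypeBand("AMDUAT-DOCGRAPH", 0x1000, 0x10FF)
PEL_BAND = EdgeTypeBand("PEL-EXEC", 0x2000, 0x20FF)
```

A catalogue tool could call `assert_disjoint([DOCGRAPH_BAND, PEL_BAND])` when registries are loaded, and `DOCGRAPH_BAND.contains(edge_type_id)` before accepting an id as a doc graph edge.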
### 2.1 Identity & version edges
* `EDGE_DOC_HAS_VERSION`
`DOC_CONCEPT → DOC_VERSION` — this version belongs to this conceptual document.
* `EDGE_VERSION_OF`
`DOC_VERSION → DOC_CONCEPT` — reverse link; derivable from `EDGE_DOC_HAS_VERSION`.
* `EDGE_DOC_HAS_IDENTITY`
`DOC_VERSION → DOC_CONCEPT` — DOCID identity is attached to this version.
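
Since `EDGE_VERSION_OF` is derivable from `EDGE_DOC_HAS_VERSION`, a consumer could compute the reverse links rather than store them. The following is a minimal sketch over an in-memory edge list; the `Edge` tuple and the use of strings for node references are assumptions for illustration only.

```python
from typing import Iterable, List, NamedTuple


class Edge(NamedTuple):
    """Illustrative edge: type name plus source and target node references."""
    edge_type: str
    src: str
    dst: str


def derive_version_of(edges: Iterable[Edge]) -> List[Edge]:
    """Flip every EDGE_DOC_HAS_VERSION edge into an EDGE_VERSION_OF edge."""
    return [
        Edge("EDGE_VERSION_OF", e.dst, e.src)
        for e in edges
        if e.edge_type == "EDGE_DOC_HAS_VERSION"
    ]
```

Edges of any other type are ignored, so the function can be run over a mixed edge list.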
### 2.2 Surface & SoT edges
* `EDGE_DOC_ON_SURFACE`
`DOC_VERSION → SURFACE` — surface classification (governance/spec/phase/evidence).
* `EDGE_DOC_SOT`
`DOC_VERSION → SOT_STATE` — SoT status (`Yes`, `Plan`, `Ref`) for this version.
### 2.3 Git provenance edges
* `EDGE_VERSION_HAS_BLOB`
`DOC_VERSION → GIT_BLOB` — ties a document version to the blob at `g_commit`.
* `EDGE_VERSION_FROM_COMMIT`
`DOC_VERSION → GIT_COMMIT` — last commit that touched this path at/before the snapshot.
### 2.4 SoT instance edges
* `EDGE_DOC_MEMBER_OF_AMDUAT`
`DOC_CONCEPT → AMDUAT_INSTANCE` — this document is part of a particular Amduat instance.
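
The endpoint signatures listed in §2 can be collected into a lookup table that an ingest tool might use to reject malformed edges. This is a sketch only: the table mirrors the arrows given above, while the string node-kind labels and the `check_endpoints` helper are hypothetical.

```python
from typing import Dict, Tuple

# (source node kind, target node kind) per edge type, as listed in section 2.
EDGE_ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "EDGE_DOC_HAS_VERSION": ("DOC_CONCEPT", "DOC_VERSION"),
    "EDGE_VERSION_OF": ("DOC_VERSION", "DOC_CONCEPT"),
    "EDGE_DOC_HAS_IDENTITY": ("DOC_VERSION", "DOC_CONCEPT"),
    "EDGE_DOC_ON_SURFACE": ("DOC_VERSION", "SURFACE"),
    "EDGE_DOC_SOT": ("DOC_VERSION", "SOT_STATE"),
    "EDGE_VERSION_HAS_BLOB": ("DOC_VERSION", "GIT_BLOB"),
    "EDGE_VERSION_FROM_COMMIT": ("DOC_VERSION", "GIT_COMMIT"),
    "EDGE_DOC_MEMBER_OF_AMDUAT": ("DOC_CONCEPT", "AMDUAT_INSTANCE"),
}


def check_endpoints(edge_type: str, src_kind: str, dst_kind: str) -> None:
    """Raise ValueError if the edge type is unknown or its endpoints mismatch."""
    expected = EDGE_ENDPOINTS.get(edge_type)
    if expected is None:
        raise ValueError(f"unknown doc-graph edge type: {edge_type}")
    if (src_kind, dst_kind) != expected:
        raise ValueError(
            f"{edge_type} expects {expected[0]} -> {expected[1]}, "
            f"got {src_kind} -> {dst_kind}"
        )
```

Keeping the table as data rather than code makes it straightforward to regenerate from the registry if edge types are added.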
---
## 3. Encoding & Store Integration (Summary)
All doc-graph edges:
* are represented as TGK `EdgeBody` values with `EdgeTypeId` from this registry,
* are encoded as EdgeArtifacts via `ENC/TGK1-EDGE/1` using `TYPE_TAG_TGK1_EDGE_V1`,
* derive `EdgeRef` identities via `HASH/ASL1` over `EdgeBytes`,
* live in ASL/1-STORE instances alongside other Artifacts.
Nodes (`DOC_CONCEPT`, `DOC_VERSION`, `GIT_COMMIT`, `GIT_BLOB`, `AMDUAT_INSTANCE`, etc.) are ordinary
ASL/1 Artifacts; their `Reference`s are the TGK nodes.
`TGK/STORE/1` provides query semantics over the resulting graph.
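
The content-addressed identity of edges can be illustrated with a deliberately simplified stand-in. Everything in this sketch is an assumption: the `EdgeBodySketch` fields, the length-prefixed byte layout (which is NOT `ENC/TGK1-EDGE/1`), and the use of SHA-256 as a placeholder for `HASH-ASL1-256`. It only shows the property that identical edge bodies yield identical references.

```python
import hashlib
from typing import NamedTuple


class EdgeBodySketch(NamedTuple):
    """Hypothetical simplification of an EdgeBody; not the TGK/1-CORE shape."""
    edge_type_id: int
    from_ref: bytes
    to_ref: bytes


def placeholder_edge_bytes(body: EdgeBodySketch) -> bytes:
    # Placeholder layout: 4-byte big-endian type id, then each reference
    # prefixed with its 4-byte length. The real layout is ENC/TGK1-EDGE/1.
    out = body.edge_type_id.to_bytes(4, "big")
    for ref in (body.from_ref, body.to_ref):
        out += len(ref).to_bytes(4, "big") + ref
    return out


def placeholder_edge_ref(edge_bytes: bytes) -> str:
    # SHA-256 is used here only as a stand-in for the HASH/ASL1 family.
    return hashlib.sha256(edge_bytes).hexdigest()
```

Because the reference depends only on the encoded bytes, writing the same edge twice produces the same `EdgeRef`, which is what lets a store deduplicate edges without extra bookkeeping.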
JSON overlays or other projected views (for example, PH12 doc graph sandboxes)
MAY be emitted for human navigation and experiments, but they are always
derived from the underlying node Artifacts and EdgeArtifacts governed by this
registry and `ENC/TGK1-EDGE/1`; overlays are never the source of truth for
doc graph semantics.
---
## 4. Ingest & Encoder Interaction (Informative)
Implementations are expected to:
* materialise node Artifacts per this registry (and companion encoding profiles),
* emit FER/1 receipts for ingest pipelines,
* emit an idempotent edge worklist (doc-edge queue) that references `EdgeTypeId`s
from this registry and node `Reference`s,
* use a separate encoder to turn worklist items into EdgeArtifacts using `ENC/TGK1-EDGE/1`,
writing them into ASL/1-STORE for consumption via `TGK/STORE/1`.
Details of worklist format and encoder scheduling are left to PH12/PHB01
implementation notes; this registry only fixes the conceptual node/edge space.
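
One possible reading of an idempotent edge worklist is sketched below: re-enqueueing the same `(edge type, from, to)` triple is a no-op, so ingest pipelines can be re-run safely. The `WorkItem` and `EdgeWorklist` names and the in-memory design are assumptions; the actual worklist format is left to the implementation notes mentioned above.

```python
from typing import List, NamedTuple, Set


class WorkItem(NamedTuple):
    """Illustrative worklist entry: an edge type plus two node References."""
    edge_type: str
    from_ref: str
    to_ref: str


class EdgeWorklist:
    """Hypothetical in-memory doc-edge queue with idempotent enqueue."""

    def __init__(self) -> None:
        self._seen: Set[WorkItem] = set()
        self._queue: List[WorkItem] = []

    def enqueue(self, item: WorkItem) -> bool:
        """Add an item once; return False if it was already enqueued."""
        if item in self._seen:
            return False
        self._seen.add(item)
        self._queue.append(item)
        return True

    def drain(self) -> List[WorkItem]:
        """Hand pending items to the encoder, in insertion order."""
        items, self._queue = self._queue, []
        return items
```

A separate encoder would then call `drain()` and turn each item into an EdgeArtifact; items drained once are remembered in `_seen`, so a repeated ingest run does not queue them again.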