# ASL/INDEXES/1 -- Index Taxonomy and Relationships

Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-01-17
Linked Phase Pack: N/A
Tags: [indexes, content, structural, materialization]

**Document ID:** `ASL/INDEXES/1`
**Layer:** L2 -- Index taxonomy (no encoding)

**Depends on (normative):**

* `ASL/1-CORE-INDEX`
* `ASL/STORE-INDEX/1`

**Informative references:**

* `ASL/SYSTEM/1`
* `TGK/1`
* `ENC/ASL-CORE-INDEX/1`

© 2025 Niklas Rydberg.

## License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.

---

## 0. Conventions

The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.

ASL/INDEXES/1 defines index roles and relationships. It does not define encodings or storage layouts.

---

## 1. Purpose

This document defines the minimal set of indexes used by ASL systems and their dependency relationships.

---

## 2. Index Taxonomy (Normative)

ASL systems use three distinct indexes:

### 2.1 Content Index

Purpose: map semantic identity to bytes.

```
ArtifactKey -> ArtifactLocation
```

Properties:

* Snapshot-relative and append-only
* Deterministic replay
* Optional tombstone shadowing

This is the ASL/1-CORE-INDEX and is the only index that governs visibility.

### 2.2 Structural Identity (SID)

SID is the canonical identity of a derivation, not of bytes.

```
SID = H(ProgramRef || Inputs[] || ParamsRef || ExecProfile)
```

Notes:

* `Inputs[]` order is canonical and stable.
* `ParamsRef` is optional; absence must be encoded explicitly in the hash.
* `ExecProfile` captures execution profile/versioning parameters (optional, but presence/absence is part of the SID).

#### 2.2.1 SID Canonicalization (Normative)

Implementations MUST canonicalize SID inputs as follows:

1. **ProgramRef** is encoded as `ReferenceBytes` (`ENC/ASL1-CORE`).
2. **Inputs[]** are ordered exactly as declared by the Program DAG inputs.
3. **ParamsRef** is encoded as:
   * `0x00` if absent, or
   * `0x01 || ReferenceBytes` if present.
4. **ExecProfile** is encoded as:
   * `0x00` if absent, or
   * `0x01 || ExecProfileBytes` if present.
5. **SID hash input** is the concatenation of the above fields with no padding.

`ExecProfileBytes` is an opaque, deterministic byte sequence defined by the execution environment. Any change in encoding or content MUST change SID.

### 2.2 Structural Index

Purpose: map structural identity to a derivation DAG node.

```
SID -> DAG node
```

Properties:

* Deterministic and rebuildable
* Does not imply materialization
* May be in-memory or persisted

### 2.3 Derivation Index

Purpose: map a materialized ArtifactKey to the set of known derivations that produce it.

```
ArtifactKey -> [DerivationRecord]
```

Where:

```
DerivationRecord = {
  SID,
  ProgramRef,
  InputRefs[],
  ParamsRef,
  ExecProfile
}
```

Properties:

* Authoritative for known derivations
* Recomputable from replay + execution, so storage is optional
* Enables dedup and semantic correlation across multiple derivations
* Multiple SIDs MAY map to the same ArtifactKey

#### 2.3.1 DerivationRecord Data Model (Normative)

Canonical fields:

```
DerivationRecord {
  sid: SID
  program_ref: Reference
  input_refs: Reference[]   // ordered as Program DAG inputs
  params_ref: Optional
  exec_profile: Optional
}
```

Rules:

* `input_refs` order MUST be preserved.
* `params_ref` and `exec_profile` MUST use explicit presence markers as defined in SID canonicalization.
* Additional metadata MAY be stored but MUST NOT affect SID or canonical equivalence.

### 2.4 Materialization Cache

Purpose: record previously materialized content for a structural identity.

```
SID -> ArtifactKey
```

Properties:

* Redundant and safe to drop
* Recomputable from DAG + content index
* Pure performance optimization

---

## 3. Dependency Rules (Normative)

Dependencies MUST follow this direction:

```
Structural Index -> Materialization Cache -> Content Index
Derivation Index -> Content Index
```

Rules:

* The Content Index MUST NOT depend on the Structural Index.
* The Structural Index MUST NOT depend on stored bytes.
* The Materialization Cache MAY depend on both.
* The Derivation Index MAY depend on the Content Index.

---

## 4. PUT/GET Interaction (Informative)

* PUT registers structure (if used), resolves to an ArtifactKey, and updates the Content Index.
* GET consults only the Content Index and reads bytes from the store.
* The Structural Index, Derivation Index, and Materialization Cache are optional optimizations for PUT.

Note: versioning relationships are modeled in TGK, not in these indexes.

---

## 5. Non-Goals

ASL/INDEXES/1 does not define:

* Encodings for any index
* Storage layout or sharding
* Query operators or traversal semantics

---

## Changelog

- 2026-01-18: Added SID canonicalization rules and DerivationRecord data model.
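
Informative illustration of the SID canonicalization rules (section 2.2.1): the sketch below shows one possible way to build the SID hash input and hash it. It is not normative and rests on assumptions that this document does not state: `H` is taken to be SHA-256, `ReferenceBytes` and `ExecProfileBytes` are treated as already-encoded opaque byte strings, and each entry of `Inputs[]` is assumed to be encoded as `ReferenceBytes` and concatenated in declared order. The names `encode_optional`, `sid_hash_input`, and `compute_sid` are hypothetical.

```python
import hashlib
from typing import Optional, Sequence

ABSENT = b"\x00"   # presence marker for an absent optional field
PRESENT = b"\x01"  # presence marker preceding a present optional field


def encode_optional(value: Optional[bytes]) -> bytes:
    """Encode an optional field as 0x00 if absent, or 0x01 || bytes if present."""
    return ABSENT if value is None else PRESENT + value


def sid_hash_input(
    program_ref: bytes,
    inputs: Sequence[bytes],
    params_ref: Optional[bytes] = None,
    exec_profile: Optional[bytes] = None,
) -> bytes:
    """Concatenate the canonical fields in order, with no padding.

    Assumption: every argument is already in its canonical byte encoding,
    and `inputs` is in the order declared by the Program DAG inputs.
    """
    return b"".join(
        [
            program_ref,
            *inputs,
            encode_optional(params_ref),
            encode_optional(exec_profile),
        ]
    )


def compute_sid(
    program_ref: bytes,
    inputs: Sequence[bytes],
    params_ref: Optional[bytes] = None,
    exec_profile: Optional[bytes] = None,
) -> bytes:
    """Hash the canonical input. SHA-256 is an assumption, not mandated here."""
    return hashlib.sha256(
        sid_hash_input(program_ref, inputs, params_ref, exec_profile)
    ).digest()


if __name__ == "__main__":
    prog, a, b = b"prog-ref", b"input-a", b"input-b"
    # Input order is part of the identity, so swapping inputs changes the SID.
    print(compute_sid(prog, [a, b]) != compute_sid(prog, [b, a]))
    # A present ParamsRef is distinguishable from an absent one.
    print(compute_sid(prog, [a, b]) != compute_sid(prog, [a, b], params_ref=b"p"))
```

Because absence is encoded with an explicit marker, an absent `ParamsRef` (`0x00`) and a present but empty one (`0x01`) yield different hash inputs in this sketch, which matches the requirement that presence or absence is part of the SID.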