ASL/INDEXES/1 -- Index Taxonomy and Relationships

Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-01-17
Linked Phase Pack: N/A
Tags: [indexes, content, structural, materialization]

Document ID: ASL/INDEXES/1
Layer: L2 -- Index taxonomy (no encoding)

Depends on (normative):

  • ASL/1-CORE-INDEX
  • ASL/STORE-INDEX/1

Informative references:

  • ASL/SYSTEM/1
  • TGK/1
  • ENC/ASL-CORE-INDEX/1

© 2025 Niklas Rydberg.

License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.


0. Conventions

The key words MUST, MUST NOT, REQUIRED, SHOULD, and MAY are to be interpreted as in RFC 2119.

ASL/INDEXES/1 defines index roles and relationships. It does not define encodings or storage layouts.


1. Purpose

This document defines the minimal set of indexes used by ASL systems and their dependency relationships.


2. Index Taxonomy (Normative)

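ASL systems use three distinct indexes (Content, Structural, and Derivation) and one optional cache (the Materialization Cache). The Structural Identity (SID) is the key type shared by the Structural Index and the Materialization Cache.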

2.1 Content Index

Purpose: map semantic identity to bytes.

ArtifactKey -> ArtifactLocation

Properties:

  • Snapshot-relative and append-only
  • Deterministic replay
  • Optional tombstone shadowing

This index is specified by ASL/1-CORE-INDEX and is the only index that governs visibility.
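
The properties above can be illustrated with a small in-memory model. This is a minimal sketch, not the ASL/1-CORE-INDEX encoding: the class names, the `ArtifactLocation` fields, the use of a log position as a snapshot, and the rule that a tombstone hides every earlier entry for the same key are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtifactLocation:
    # Hypothetical location shape; the real layout is defined elsewhere.
    block_id: int
    offset: int
    length: int


@dataclass(frozen=True)
class IndexEntry:
    key: bytes                            # ArtifactKey
    location: Optional[ArtifactLocation]  # None marks a tombstone


class ContentIndex:
    """Append-only log; a later entry shadows earlier ones for the same key."""

    def __init__(self) -> None:
        self._log: list[IndexEntry] = []

    def put(self, key: bytes, location: ArtifactLocation) -> None:
        self._log.append(IndexEntry(key, location))

    def tombstone(self, key: bytes) -> None:
        self._log.append(IndexEntry(key, None))

    def snapshot(self) -> int:
        # Assumption: a snapshot is modeled as a position in the log.
        return len(self._log)

    def get(self, key: bytes,
            snapshot: Optional[int] = None) -> Optional[ArtifactLocation]:
        # Replay only entries visible at the snapshot, newest first.
        end = len(self._log) if snapshot is None else snapshot
        for entry in reversed(self._log[:end]):
            if entry.key == key:
                return entry.location  # None if shadowed by a tombstone
        return None
```

Because entries are never rewritten, a lookup against an older snapshot keeps returning the same answer regardless of later appends, which is what makes replay deterministic in this model.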

2.2 Structural Identity (SID)

SID is the canonical identity of a derivation, not of bytes.

SID = H(ProgramRef || Inputs[] || ParamsRef || ExecProfile)

Notes:

  • Inputs[] order is canonical and stable.
  • ParamsRef is optional; its absence MUST be encoded explicitly in the hash input.
  • ExecProfile captures execution profile and versioning parameters. It is optional, but its presence or absence is part of the SID.

2.2.1 SID Canonicalization (Normative)

Implementations MUST canonicalize SID inputs as follows:

  1. ProgramRef is encoded as ReferenceBytes (ENC/ASL1-CORE).
  2. Inputs[] are ordered exactly as declared by the Program DAG inputs.
  3. ParamsRef is encoded as:
    • 0x00 if absent, or
    • 0x01 || ReferenceBytes if present.
  4. ExecProfile is encoded as:
    • 0x00 if absent, or
    • 0x01 || ExecProfileBytes if present.
  5. SID hash input is the concatenation of the above fields with no padding.

ExecProfileBytes is an opaque, deterministic byte sequence defined by the execution environment. Any change in its encoding or content MUST change the SID.
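
The canonicalization steps can be sketched as follows. Two points are assumptions rather than statements from this document: SHA-256 stands in for H, which is not fixed here, and each element of Inputs[] is taken to be an already encoded, self-delimiting ReferenceBytes value appended in declared order. The function names are hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def sid_hash_input(program_ref: bytes,
                   inputs: Sequence[bytes],
                   params_ref: Optional[bytes],
                   exec_profile: Optional[bytes]) -> bytes:
    """Build the SID hash input; all arguments are pre-encoded bytes."""
    parts = [program_ref]                 # step 1: ProgramRef as ReferenceBytes
    parts.extend(inputs)                  # step 2: declared order, never sorted
    # Steps 3 and 4: explicit presence markers for the optional fields.
    parts.append(b"\x00" if params_ref is None else b"\x01" + params_ref)
    parts.append(b"\x00" if exec_profile is None else b"\x01" + exec_profile)
    return b"".join(parts)                # step 5: plain concatenation, no padding


def compute_sid(program_ref: bytes,
                inputs: Sequence[bytes],
                params_ref: Optional[bytes] = None,
                exec_profile: Optional[bytes] = None) -> bytes:
    # Assumption: SHA-256 is used only as a stand-in for H.
    data = sid_hash_input(program_ref, inputs, params_ref, exec_profile)
    return hashlib.sha256(data).digest()
```

The presence markers mean that an absent ParamsRef contributes the single byte 0x00 to the hash input rather than nothing at all, so the absence is encoded explicitly. Swapping two inputs changes the hash input and therefore the SID.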

2.2.2 Structural Index

Purpose: map structural identity to a derivation DAG node.

SID -> DAG node

Properties:

  • Deterministic and rebuildable
  • Does not imply materialization
  • May be in-memory or persisted
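
As an illustration, a Structural Index can be modeled as a plain mapping from SID to node. This is a sketch only: the `DagNode` fields are hypothetical, since this document does not define a node shape, and the in-memory dictionary is just one of the permitted storage choices.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class DagNode:
    # Hypothetical node shape; this document does not define one.
    sid: bytes
    program_ref: bytes
    input_sids: Tuple[bytes, ...] = ()


class StructuralIndex:
    """Maps SID -> DAG node. Holds structure only, never artifact bytes."""

    def __init__(self, nodes: Iterable[DagNode] = ()) -> None:
        self._nodes: Dict[bytes, DagNode] = {}
        for node in nodes:
            self.register(node)

    def register(self, node: DagNode) -> DagNode:
        # Registering the same SID twice keeps the first node (idempotent).
        return self._nodes.setdefault(node.sid, node)

    def lookup(self, sid: bytes) -> Optional[DagNode]:
        return self._nodes.get(sid)
```

Rebuilding the index amounts to constructing a new instance from the same node declarations. Nothing in it records whether a node's output has ever been materialized.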

2.3 Derivation Index

Purpose: map a materialized ArtifactKey to the set of known derivations that produce it.

ArtifactKey -> [DerivationRecord]

Where:

DerivationRecord = { SID, ProgramRef, InputRefs[], ParamsRef, ExecProfile }

Properties:

  • Authoritative for known derivations
  • Recomputable from replay + execution, so storage is optional
  • Enables dedup and semantic correlation across multiple derivations
  • Multiple SIDs MAY map to the same ArtifactKey

2.3.1 DerivationRecord Data Model (Normative)

Canonical fields:

DerivationRecord {
  sid: SID
  program_ref: Reference
  input_refs: Reference[]   // ordered as Program DAG inputs
  params_ref: Optional<Reference>
  exec_profile: Optional<ExecProfileBytes>
}

Rules:

  • input_refs order MUST be preserved.
  • params_ref and exec_profile MUST use explicit presence markers as defined in SID canonicalization.
  • Additional metadata MAY be stored but MUST NOT affect SID or canonical equivalence.
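
A minimal sketch of the record and the index follows, assuming references and keys are handled as opaque `bytes` and that `None` stands for an absent optional field. The class and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional, Tuple


@dataclass(frozen=True)
class DerivationRecord:
    sid: bytes
    program_ref: bytes
    input_refs: Tuple[bytes, ...]         # tuple keeps Program DAG input order
    params_ref: Optional[bytes] = None    # None means absent
    exec_profile: Optional[bytes] = None  # None means absent


class DerivationIndex:
    """Maps ArtifactKey -> known derivations that produce it."""

    def __init__(self) -> None:
        self._by_key: DefaultDict[bytes, List[DerivationRecord]] = defaultdict(list)

    def record(self, artifact_key: bytes, rec: DerivationRecord) -> None:
        known = self._by_key[artifact_key]
        if rec not in known:              # skip exact duplicates
            known.append(rec)

    def derivations(self, artifact_key: bytes) -> List[DerivationRecord]:
        # Return a copy so callers cannot mutate the index.
        return list(self._by_key.get(artifact_key, []))
```

Two records with different SIDs can be filed under the same ArtifactKey, which is how this model exposes the case where distinct derivations yield identical content.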

2.4 Materialization Cache

Purpose: record previously materialized content for a structural identity.

SID -> ArtifactKey

Properties:

  • Redundant and safe to drop
  • Recomputable from DAG + content index
  • Pure performance optimization
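
The "safe to drop" property can be sketched as a lookup that falls back to recomputation. The `resolve` helper and its two callbacks are hypothetical: `is_visible` stands for a Content Index visibility check and `materialize` for whatever executes the derivation.

```python
from typing import Callable, Dict, Optional


class MaterializationCache:
    """Maps SID -> ArtifactKey. Losing entries costs time, never correctness."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, bytes] = {}

    def get(self, sid: bytes) -> Optional[bytes]:
        return self._entries.get(sid)

    def put(self, sid: bytes, artifact_key: bytes) -> None:
        self._entries[sid] = artifact_key

    def clear(self) -> None:
        self._entries.clear()


def resolve(sid: bytes,
            cache: MaterializationCache,
            is_visible: Callable[[bytes], bool],
            materialize: Callable[[bytes], bytes]) -> bytes:
    key = cache.get(sid)
    # Trust a cached key only if the Content Index still exposes it.
    if key is not None and is_visible(key):
        return key
    key = materialize(sid)  # recompute from the DAG
    cache.put(sid, key)
    return key
```

Clearing the cache changes how often `materialize` runs but not the key that `resolve` returns, provided materialization is deterministic.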

3. Dependency Rules (Normative)

Dependencies MUST follow this direction:

Structural Index -> Materialization Cache -> Content Index
Derivation Index -> Content Index

Rules:

  • The Content Index MUST NOT depend on the Structural Index.
  • The Structural Index MUST NOT depend on stored bytes.
  • The Materialization Cache MAY depend on both the Structural Index and the Content Index.
  • The Derivation Index MAY depend on the Content Index.
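
One way to keep an implementation honest about these rules is a small allow-list check over declared component dependencies. The sketch below encodes only the dependencies that the bullets explicitly permit and treats everything else as a violation. That is a stricter reading than the text requires and is an assumption of this example, as are the component names.

```python
from typing import Dict, List, Set, Tuple

# Permitted dependencies, read from the rules above (assumed exhaustive here).
ALLOWED_DEPENDENCIES: Dict[str, Set[str]] = {
    "content_index": set(),
    "structural_index": set(),
    "materialization_cache": {"structural_index", "content_index"},
    "derivation_index": {"content_index"},
}


def violations(declared: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (component, dependency) pairs that are not on the allow-list."""
    bad: List[Tuple[str, str]] = []
    for component, deps in declared.items():
        allowed = ALLOWED_DEPENDENCIES.get(component, set())
        for dep in sorted(deps - allowed):
            bad.append((component, dep))
    return bad
```

For example, declaring that `content_index` depends on `structural_index` is reported as a violation, while a cache that depends on both indexes passes.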

4. PUT/GET Interaction (Informative)

  • PUT registers structure (if used), resolves to an ArtifactKey, and updates the Content Index.
  • GET consults only the Content Index and reads bytes from the store.
  • The Structural Index, Derivation Index, and Materialization Cache are optional optimizations for PUT.
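
The interaction can be sketched with a toy in-memory store. Everything here is illustrative: the `Store` class is hypothetical, the ArtifactKey is assumed to be a SHA-256 hash of the bytes, and a list slot stands in for an ArtifactLocation.

```python
import hashlib
from typing import Dict, List, Optional


class Store:
    """Hypothetical in-memory store: Content Index plus byte storage."""

    def __init__(self) -> None:
        self.content_index: Dict[bytes, int] = {}            # ArtifactKey -> slot
        self.blocks: List[bytes] = []                        # stored bytes
        self.materialization_cache: Dict[bytes, bytes] = {}  # SID -> ArtifactKey

    def put(self, data: bytes, sid: Optional[bytes] = None) -> bytes:
        # Assumption: the ArtifactKey is a content hash of the bytes.
        key = hashlib.sha256(data).digest()
        if key not in self.content_index:
            self.blocks.append(data)
            self.content_index[key] = len(self.blocks) - 1
        if sid is not None:
            # Optional structural bookkeeping; GET never reads it.
            self.materialization_cache[sid] = key
        return key

    def get(self, key: bytes) -> Optional[bytes]:
        # GET consults only the Content Index, then reads bytes.
        slot = self.content_index.get(key)
        return None if slot is None else self.blocks[slot]
```

In this model, clearing `materialization_cache` has no effect on `get`, which mirrors the point that the structural side is an optimization for PUT only.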

Note: versioning relationships are modeled in TGK, not in these indexes.


5. Non-Goals

ASL/INDEXES/1 does not define:

  • Encodings for any index
  • Storage layout or sharding
  • Query operators or traversal semantics

Changelog

  • 2026-01-18: Added SID canonicalization rules and DerivationRecord data model.