# ASL/INDEXES/1 -- Index Taxonomy and Relationships
Status: Draft
Owner: Architecture
Version: 0.1.0
SoT: No
Last Updated: 2025-01-17
Linked Phase Pack: N/A
Tags: [indexes, content, structural, materialization]
<!-- Source: /amduat-api/tier1/asl-indexes-1.md | Canonical: /amduat/tier1/asl-indexes-1.md -->
**Document ID:** `ASL/INDEXES/1`
**Layer:** L2 -- Index taxonomy (no encoding)
**Depends on (normative):**
* `ASL/1-CORE-INDEX`
* `ASL/STORE-INDEX/1`
**Informative references:**
* `ASL/SYSTEM/1`
* `TGK/1`
* `ENC/ASL-CORE-INDEX/1`
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHOULD**, and **MAY** are to be interpreted as in RFC 2119.
ASL/INDEXES/1 defines index roles and relationships. It does not define encodings or storage layouts.
---
## 1. Purpose
This document defines the minimal set of indexes used by ASL systems and their dependency relationships.
---
## 2. Index Taxonomy (Normative)
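ASL systems use four distinct indexes (Content Index, Structural Index, Derivation Index, and Materialization Cache), two of which are keyed by a structural identity (SID):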
### 2.1 Content Index
Purpose: map semantic identity to bytes.
```
ArtifactKey -> ArtifactLocation
```
Properties:
* Snapshot-relative and append-only
* Deterministic replay
* Optional tombstone shadowing
This index is defined by `ASL/1-CORE-INDEX` and is the only index that governs visibility.
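The properties above can be illustrated with a minimal in-memory sketch. It is not the `ASL/1-CORE-INDEX` data model: the `ArtifactLocation` fields, the use of a list position as a snapshot, and the "latest entry wins" replay rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtifactLocation:
    """Hypothetical location shape: a block identifier plus a byte range."""
    block_id: int
    offset: int
    length: int


class ContentIndex:
    """Append-only log of (ArtifactKey, location); None marks a tombstone."""

    def __init__(self) -> None:
        self._log: list[tuple[bytes, Optional[ArtifactLocation]]] = []

    def put(self, key: bytes, location: ArtifactLocation) -> int:
        self._log.append((key, location))
        return len(self._log)  # log length doubles as a snapshot position

    def tombstone(self, key: bytes) -> int:
        self._log.append((key, None))
        return len(self._log)

    def lookup(self, key: bytes, snapshot: Optional[int] = None) -> Optional[ArtifactLocation]:
        """Replay the log prefix up to `snapshot`; the latest entry for a key wins."""
        end = len(self._log) if snapshot is None else snapshot
        for entry_key, location in reversed(self._log[:end]):
            if entry_key == key:
                return location  # None here means the key is shadowed
        return None
```

Because entries are never rewritten, a lookup at an older snapshot position keeps returning the same answer after later appends, which is what makes replay deterministic in this sketch.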
### 2.2 Structural Identity (SID)
SID is the canonical identity of a derivation, not of bytes.
```
SID = H(ProgramRef || Inputs[] || ParamsRef || ExecProfile)
```
Notes:
* `Inputs[]` order is canonical and stable.
* `ParamsRef` is optional; its absence is encoded explicitly in the hash input.
* `ExecProfile` captures execution profile and versioning parameters. It is
optional, but its presence or absence is part of the SID.
#### 2.2.1 SID Canonicalization (Normative)
Implementations MUST canonicalize SID inputs as follows:
1. **ProgramRef** is encoded as `ReferenceBytes` (`ENC/ASL1-CORE`).
2. **Inputs[]** are ordered exactly as declared by the Program DAG inputs.
3. **ParamsRef** is encoded as:
   * `0x00` if absent, or
   * `0x01 || ReferenceBytes` if present.
4. **ExecProfile** is encoded as:
   * `0x00` if absent, or
   * `0x01 || ExecProfileBytes` if present.
5. **SID hash input** is the concatenation of the above fields with no padding.
`ExecProfileBytes` is an opaque, deterministic byte sequence defined by the
execution environment. Any change in its encoding or content MUST change the SID.
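The canonicalization steps can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: this document does not name the hash function `H`, so SHA-256 is used as a stand-in, and every reference is assumed to arrive already encoded as `ReferenceBytes`. The function names are hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def sid_hash_input(
    program_ref: bytes,
    inputs: Sequence[bytes],
    params_ref: Optional[bytes],
    exec_profile: Optional[bytes],
) -> bytes:
    """Concatenate the SID fields in canonical order with no padding."""
    parts = [program_ref]
    # Inputs keep the order declared by the Program DAG; they are not sorted.
    parts.extend(inputs)
    # Optional fields carry an explicit presence marker.
    parts.append(b"\x00" if params_ref is None else b"\x01" + params_ref)
    parts.append(b"\x00" if exec_profile is None else b"\x01" + exec_profile)
    return b"".join(parts)


def compute_sid(
    program_ref: bytes,
    inputs: Sequence[bytes],
    params_ref: Optional[bytes] = None,
    exec_profile: Optional[bytes] = None,
) -> bytes:
    # Assumption: SHA-256 stands in for H.
    data = sid_hash_input(program_ref, inputs, params_ref, exec_profile)
    return hashlib.sha256(data).digest()
```

With the presence markers, an absent `ParamsRef` (`0x00`) and a present but empty one (`0x01`) produce different hash inputs, and swapping two inputs changes the SID.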
#### 2.2.2 Structural Index
Purpose: map structural identity to a derivation DAG node.
```
SID -> DAG node
```
Properties:
* Deterministic and rebuildable
* Does not imply materialization
* May be in-memory or persisted
### 2.3 Derivation Index
Purpose: map a materialized ArtifactKey to the set of known derivations that
produce it.
```
ArtifactKey -> [DerivationRecord]
```
Where:
```
DerivationRecord = { SID, ProgramRef, InputRefs[], ParamsRef, ExecProfile }
```
Properties:
* Authoritative for known derivations
* Recomputable from replay + execution, so storage is optional
* Enables dedup and semantic correlation across multiple derivations
* Multiple SIDs MAY map to the same ArtifactKey
#### 2.3.1 DerivationRecord Data Model (Normative)
Canonical fields:
```
DerivationRecord {
  sid: SID
  program_ref: Reference
  input_refs: Reference[]   // ordered as Program DAG inputs
  params_ref: Optional<Reference>
  exec_profile: Optional<ExecProfileBytes>
}
```
Rules:
* `input_refs` order MUST be preserved.
* `params_ref` and `exec_profile` MUST use explicit presence markers as defined
in SID canonicalization.
* Additional metadata MAY be stored but MUST NOT affect SID or canonical
equivalence.
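As a rough illustration of the data model and of the `ArtifactKey -> [DerivationRecord]` mapping, the sketch below uses plain `bytes` for keys and references. The class and method names are hypothetical, and `None` is used as the in-memory form of the explicit absence marker.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DerivationRecord:
    sid: bytes
    program_ref: bytes
    input_refs: tuple[bytes, ...]  # a tuple preserves the Program DAG input order
    params_ref: Optional[bytes] = None    # None stands for "absent"
    exec_profile: Optional[bytes] = None  # None stands for "absent"


class DerivationIndex:
    """Maps a materialized ArtifactKey to its known derivations."""

    def __init__(self) -> None:
        self._by_key: dict[bytes, list[DerivationRecord]] = defaultdict(list)

    def add(self, artifact_key: bytes, record: DerivationRecord) -> None:
        records = self._by_key[artifact_key]
        if record not in records:  # the same derivation is recorded only once
            records.append(record)

    def derivations(self, artifact_key: bytes) -> list[DerivationRecord]:
        return list(self._by_key.get(artifact_key, []))
```

Two records with different SIDs can be added under the same `ArtifactKey`, which is the case where distinct derivations produce identical bytes.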
### 2.4 Materialization Cache
Purpose: record previously materialized content for a structural identity.
```
SID -> ArtifactKey
```
Properties:
* Redundant and safe to drop
* Recomputable from DAG + content index
* Pure performance optimization
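The "safe to drop" property can be sketched as a lookup that falls back to recomputation. Everything here is hypothetical: `is_visible` stands in for a Content Index visibility check, and `materialize` stands in for re-executing the derivation identified by the SID.

```python
from typing import Callable, Optional


class MaterializationCache:
    """Redundant SID -> ArtifactKey map; losing it only costs recomputation."""

    def __init__(self) -> None:
        self._map: dict[bytes, bytes] = {}

    def get(self, sid: bytes) -> Optional[bytes]:
        return self._map.get(sid)

    def record(self, sid: bytes, artifact_key: bytes) -> None:
        self._map[sid] = artifact_key

    def drop(self) -> None:
        self._map.clear()


def resolve(
    sid: bytes,
    cache: MaterializationCache,
    is_visible: Callable[[bytes], bool],
    materialize: Callable[[bytes], bytes],
) -> bytes:
    """Return the ArtifactKey for a SID, recomputing on a miss or stale entry."""
    key = cache.get(sid)
    if key is not None and is_visible(key):
        return key  # cache hit, confirmed against the Content Index
    key = materialize(sid)  # recompute from the DAG
    cache.record(sid, key)
    return key
```

Checking a cached key against the Content Index before trusting it keeps visibility decisions in the Content Index, so a stale cache entry can slow a lookup down but cannot change its result.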
---
## 3. Dependency Rules (Normative)
Dependencies MUST follow this direction:
```
Structural Index -> Materialization Cache -> Content Index
Derivation Index -> Content Index
```
Rules:
* The Content Index MUST NOT depend on the Structural Index.
* The Structural Index MUST NOT depend on stored bytes.
* The Materialization Cache MAY depend on both the Structural Index and the Content Index.
* The Derivation Index MAY depend on the Content Index.
---
## 4. PUT/GET Interaction (Informative)
* PUT registers structure (if used), resolves to an ArtifactKey, and updates the Content Index.
* GET consults only the Content Index and reads bytes from the store.
* The Structural Index, Derivation Index, and Materialization Cache are optional
optimizations for PUT.
Note: versioning relationships are modeled in TGK, not in these indexes.
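The PUT/GET split described in this section might look like the toy model below. It is a sketch only: `TinyAsl` is a hypothetical name, the store is a single in-memory byte buffer, and deriving the ArtifactKey as the SHA-256 of the bytes is an assumption that this document does not make.

```python
import hashlib
from typing import Optional


class TinyAsl:
    """Toy store with a Content Index and an optional Materialization Cache."""

    def __init__(self) -> None:
        self.store = bytearray()
        self.content_index: dict[bytes, tuple[int, int]] = {}  # key -> (offset, length)
        self.materialization_cache: dict[bytes, bytes] = {}    # SID -> key

    def put(self, data: bytes, sid: Optional[bytes] = None) -> bytes:
        key = hashlib.sha256(data).digest()  # assumption: key derived from content
        if key not in self.content_index:
            offset = len(self.store)
            self.store += data
            self.content_index[key] = (offset, len(data))
        if sid is not None:
            # Structure is registered only when the caller supplies it.
            self.materialization_cache[sid] = key
        return key

    def get(self, key: bytes) -> Optional[bytes]:
        # GET consults the Content Index only, then reads from the store.
        location = self.content_index.get(key)
        if location is None:
            return None
        offset, length = location
        return bytes(self.store[offset:offset + length])
```

In this model, clearing `materialization_cache` has no effect on `get`, which mirrors the statement that the structural indexes are optimizations for PUT.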
---
## 5. Non-Goals
ASL/INDEXES/1 does not define:
* Encodings for any index
* Storage layout or sharding
* Query operators or traversal semantics
---
## Changelog
- 2026-01-18: Added SID canonicalization rules and DerivationRecord data model.