amduat-api/notes/multiple indexes.md
2026-01-17 00:19:49 +01:00


Yes — there are multiple indexes, and the key to keeping this clean is to separate them by semantic responsibility, exactly like your ASL-CORE-INDEX spec already does.

Below is a concrete, minimal implementation plan that fits perfectly with your current artifact.py and keeps q-bits cheap.

I'll give you:

  1. The index taxonomy (what indexes exist and why)
  2. A clean architecture (who depends on whom)
  3. Concrete Python implementations for each index
  4. How they interact during put() and get()

No hand-waving.


1 The indexes you actually need

You need three distinct indexes, not one.

A. Content Index (CID → ArtifactLocation)

This is ASL-CORE-INDEX

Purpose:

Given a semantic artifact identity, where are the bytes?

CID → (BlockID, offset, length)

Properties:

  • Snapshot-relative
  • Append-only
  • Immutable
  • Shadowable (optional)

This is the storage-facing index.


B. Structural Index (SID → DAG node)

This is your PEL / derivation index

Purpose:

Given a structure identity, what DAG node exists?

SID → Artifact (structure + children + params)

Properties:

  • In-memory or persisted
  • Deterministic
  • Rebuildable
  • Does NOT imply materialization

This index is what lets you:

  • compare derivations
  • reuse structure
  • deduplicate DAG nodes
  • avoid rebuilding identical DAGs

C. Materialization Cache (SID → CID)

This is the execution shortcut

Purpose:

If I already materialized this structure, what content did it produce?

SID → CID

Properties:

  • Fully redundant
  • Recomputable from DAG + content index
  • Safe to drop entirely
  • Huge performance win

This is the cache you already implemented — now we formalize it.


2 Dependency graph (this matters)

```
          ┌─────────────┐
          │ Structural  │
          │   Index     │  SID → DAG
          └──────┬──────┘
                 │
        materialize()
                 │
          ┌──────▼──────┐
          │ Materialize │
          │   Cache     │  SID → CID
          └──────┬──────┘
                 │
          ┌──────▼──────┐
          │ Content     │
          │   Index     │  CID → bytes
          └─────────────┘
```

Important invariant:

  • Content index never depends on DAG
  • Structural index never depends on bytes
  • Cache depends on both, but is optional

This keeps q-bits cheap.
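Since the cache is fully redundant, the invariant can be demonstrated directly: drop the cache and rebuild it from the other two indexes. A minimal sketch, using plain dicts for the two indexes and a hypothetical `materialize_cid` callback (not part of the existing codebase):

```python
from typing import Callable, Dict, Tuple

def rebuild_cache(
    structural: Dict[str, object],              # SID -> DAG node
    content: Dict[str, Tuple[str, int, int]],   # CID -> (block, offset, length)
    materialize_cid: Callable[[object], str],   # DAG node -> CID (hypothetical)
) -> Dict[str, str]:
    """Reconstruct the SID -> CID cache for every node whose bytes still exist."""
    cache: Dict[str, str] = {}
    for sid, node in structural.items():
        cid = materialize_cid(node)
        if cid in content:  # only cache entries backed by stored bytes
            cache[sid] = cid
    return cache
```

Because the cache never holds unique state, dropping it is always safe; the only cost is re-running materialization on the next miss.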


3 Concrete Python implementations

A. Content Index (ASL-CORE-INDEX)

```python
# content_index.py
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class ContentIndex:
    """
    Semantic index: CID -> ArtifactLocation
    """
    def __init__(self):
        self._index: Dict[str, ArtifactLocation] = {}

    def get(self, cid: str) -> Optional[ArtifactLocation]:
        return self._index.get(cid)

    def put(self, cid: str, loc: ArtifactLocation):
        # Immutable once visible
        if cid in self._index:
            return
        self._index[cid] = loc
```
This is your ASL-CORE-INDEX in executable form.
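A quick usage sketch of the first-writer-wins immutability rule (the class definitions are repeated from above so the snippet runs standalone):

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int

class ContentIndex:
    """Semantic index: CID -> ArtifactLocation (immutable once visible)."""
    def __init__(self):
        self._index: Dict[str, ArtifactLocation] = {}

    def get(self, cid: str) -> Optional[ArtifactLocation]:
        return self._index.get(cid)

    def put(self, cid: str, loc: ArtifactLocation):
        if cid in self._index:  # first writer wins
            return
        self._index[cid] = loc

idx = ContentIndex()
idx.put("cid-1", ArtifactLocation("block-0", 0, 128))
idx.put("cid-1", ArtifactLocation("block-9", 512, 64))  # silently ignored
# idx.get("cid-1") still resolves to block-0
```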


B. Structural Index (PEL / DAG index)

```python
# structural_index.py
from typing import Dict, Optional
from artifact import Artifact

class StructuralIndex:
    """
    SID -> Artifact (DAG node)
    """
    def __init__(self):
        self._nodes: Dict[str, Artifact] = {}

    def get(self, sid: str) -> Optional[Artifact]:
        return self._nodes.get(sid)

    def put(self, artifact: Artifact) -> Artifact:
        """
        Deduplicate DAG nodes by SID.
        """
        existing = self._nodes.get(artifact.sid)
        if existing is not None:
            return existing
        self._nodes[artifact.sid] = artifact
        return artifact
```

This ensures:

  • One DAG node per SID
  • Structural deduplication
  • Cheap comparisons
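The dedup guarantee is easy to check. A sketch assuming a minimal `Artifact` stand-in with a `sid` field (the real class lives in artifact.py):

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Artifact:  # stand-in for artifact.Artifact, for illustration only
    sid: str
    structure: str = ""

class StructuralIndex:
    def __init__(self):
        self._nodes: Dict[str, Artifact] = {}

    def put(self, artifact: Artifact) -> Artifact:
        # setdefault returns the existing node on a duplicate SID
        return self._nodes.setdefault(artifact.sid, artifact)

index = StructuralIndex()
n1 = index.put(Artifact("sid-1", "root"))
n2 = index.put(Artifact("sid-1", "root"))
# n1 is n2: one DAG node per SID, so comparisons are pointer-cheap
```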

C. Materialization Cache (formalized)

You already have this — here's the final clean version:

```python
# materialization_cache.py
from typing import Dict, Optional

class MaterializationCache:
    """
    Fully redundant cache: SID -> CID
    """
    def __init__(self):
        self._cache: Dict[str, str] = {}

    def get(self, sid: str) -> Optional[str]:
        return self._cache.get(sid)

    def put(self, sid: str, cid: str):
        self._cache[sid] = cid
```

4 How put() actually works (step-by-step)

This is the critical flow.

```
put(artifact)

1. StructuralIndex.put(artifact)
   └─ ensures DAG node exists

2. Try MaterializationCache.get(artifact.sid)
   └─ if hit → CID known, skip computation

3. If miss:
   a) Materialize DAG
   b) Compute CID
   c) Cache SID → CID

4. ContentIndex.get(CID)
   └─ if exists → done (dedup)

5. Else:
   a) Allocate block space
   b) Write bytes
   c) ContentIndex.put(CID → location)
```

Key property:

If any index already knows the answer, work stops early.

That's how you recover classical performance.
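The five steps above can be sketched end-to-end. `materialize`, `compute_cid`, and `write_block` are hypothetical stand-ins for the real DAG executor, hasher, and block allocator; plain dicts stand in for the three index classes:

```python
from typing import Callable, Dict, Optional, Tuple

def put(
    sid: str,
    dag_nodes: Dict[str, object],               # structural index: SID -> node
    cache: Dict[str, str],                      # materialization cache: SID -> CID
    content: Dict[str, Tuple[str, int, int]],   # content index: CID -> location
    materialize: Callable[[object], bytes],
    compute_cid: Callable[[bytes], str],
    write_block: Callable[[bytes], Tuple[str, int, int]],
) -> str:
    node = dag_nodes[sid]                 # 1. DAG node must exist

    cid = cache.get(sid)                  # 2. cache hit -> CID known
    data: Optional[bytes] = None
    if cid is None:
        data = materialize(node)          # 3a. materialize DAG
        cid = compute_cid(data)           # 3b. compute CID
        cache[sid] = cid                  # 3c. record the shortcut

    if cid in content:                    # 4. bytes already stored -> dedup
        return cid

    if data is None:                      # cache knew the CID, bytes were dropped
        data = materialize(node)
    content[cid] = write_block(data)      # 5. write bytes, publish location
    return cid
```

Each early return is one of the "work stops early" paths: a cache hit skips step 3 and a content hit skips step 5.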


5 Why this works for q-bits without cost

Because:

  • q-bits only affect step 3
  • steps 1, 4, 5 are identical to classical bytes

Once materialized:

  • q-bit artifacts are indistinguishable from classical artifacts
  • index sees only CIDs and byte ranges

No tax after first computation.


6 What you now have (this is important)

You now have:

| Layer | Capability |
| --- | --- |
| DAG | Explanation / provenance |
| SID | Structural identity |
| CID | Semantic identity |
| Content index | Storage truth |
| Cache | Performance |
| Snapshots + logs | Time |

This is not common infrastructure. That's why you couldn't find it pre-built.


If you want to go further, I can:

  • add snapshot + append-only log semantics to ContentIndex
  • wire these indexes into a single ASLStore
  • show federation / replication mechanics
  • show LLM → DAG → CID ingestion
  • formalize PEL traversal queries

Just point the direction.