amduat-api/notes/multiple indexes.md


Yes — there are **multiple indexes**, and the key to keeping this clean is to **separate them by semantic responsibility**, exactly like your ASL-CORE-INDEX spec already does.
Below is a **concrete, minimal implementation plan** that fits *perfectly* with your current `artifact.py` and keeps q-bits cheap.
I'll give you:
1. The **index taxonomy** (what indexes exist and why)
2. A **clean architecture** (who depends on whom)
3. **Concrete Python implementations** for each index
4. How they interact during `put()` and `get()`
No hand-waving.
---
# 1⃣ The indexes you actually need
You need **three distinct indexes**, not one.
## A. Content Index (CID → ArtifactLocation)
**This *is* ASL-CORE-INDEX**
Purpose:
> Given a semantic artifact identity, where are the bytes?
```text
CID → (BlockID, offset, length)
```
Properties:
* Snapshot-relative
* Append-only
* Immutable
* Shadowable (optional)
This is the *storage-facing* index.
---
## B. Structural Index (SID → DAG node)
**This is your PEL / derivation index**
Purpose:
> Given a structure identity, what DAG node exists?
```text
SID → Artifact (structure + children + params)
```
Properties:
* In-memory or persisted
* Deterministic
* Rebuildable
* Does NOT imply materialization
This index is what lets you:
* compare derivations
* reuse structure
* deduplicate DAG nodes
* avoid rebuilding identical DAGs
---
## C. Materialization Cache (SID → CID)
**This is the execution shortcut**
Purpose:
> If I already materialized this structure, what content did it produce?
```text
SID → CID
```
Properties:
* Fully redundant
* Recomputable from DAG + content index
* Safe to drop entirely
* Huge performance win
This is the cache you already implemented — now we formalize it.
---
# 2⃣ Dependency graph (this matters)
```text
┌──────────────┐
│  Structural  │  SID → DAG
│    Index     │
└──────┬───────┘
       │ materialize()
┌──────▼───────┐
│ Materialize  │  SID → CID
│    Cache     │
└──────┬───────┘
       │
┌──────▼───────┐
│   Content    │  CID → bytes
│    Index     │
└──────────────┘
```
**Important invariant**:
* Content index never depends on DAG
* Structural index never depends on bytes
* Cache depends on both, but is optional
This keeps q-bits cheap.
---
# 3⃣ Concrete Python implementations
## A. Content Index (ASL-CORE-INDEX)
```python
# content_index.py
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class ContentIndex:
    """
    Semantic index: CID -> ArtifactLocation
    """

    def __init__(self):
        self._index: Dict[str, ArtifactLocation] = {}

    def get(self, cid: str) -> Optional[ArtifactLocation]:
        return self._index.get(cid)

    def put(self, cid: str, loc: ArtifactLocation):
        # Immutable once visible: first write wins
        if cid in self._index:
            return
        self._index[cid] = loc
```
This is your **ASL-CORE-INDEX** in executable form.
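A quick usage sketch of the first-write-wins rule. The CIDs and block coordinates are made up for illustration, and the class is repeated inline so the snippet runs on its own:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class ContentIndex:
    # As defined above; repeated so this sketch is self-contained.
    def __init__(self):
        self._index: Dict[str, ArtifactLocation] = {}

    def get(self, cid: str) -> Optional[ArtifactLocation]:
        return self._index.get(cid)

    def put(self, cid: str, loc: ArtifactLocation):
        if cid in self._index:
            return
        self._index[cid] = loc


idx = ContentIndex()
idx.put("cid-1", ArtifactLocation("block-0", 0, 128))

# Re-putting the same CID is a no-op: the first location stays visible.
idx.put("cid-1", ArtifactLocation("block-9", 512, 64))
assert idx.get("cid-1").block_id == "block-0"
```

The no-op on conflict is what makes the index safe to replay from an append-only log: replays can never rewrite history.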
---
## B. Structural Index (PEL / DAG index)
```python
# structural_index.py
from typing import Dict, Optional

from artifact import Artifact


class StructuralIndex:
    """
    SID -> Artifact (DAG node)
    """

    def __init__(self):
        self._nodes: Dict[str, Artifact] = {}

    def get(self, sid: str) -> Optional[Artifact]:
        return self._nodes.get(sid)

    def put(self, artifact: Artifact) -> Artifact:
        """
        Deduplicate DAG nodes by SID.
        """
        existing = self._nodes.get(artifact.sid)
        if existing is not None:
            return existing
        self._nodes[artifact.sid] = artifact
        return artifact
```
This ensures:
* One DAG node per SID
* Structural deduplication
* Cheap comparisons
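To make the deduplication concrete, here is a sketch using a minimal stand-in for `Artifact` (your real class in `artifact.py` will differ); the index is repeated inline so the snippet runs on its own:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Artifact:
    # Hypothetical stand-in for artifact.Artifact: just an SID plus
    # child SIDs. Purely illustrative.
    sid: str
    children: Tuple[str, ...] = ()


class StructuralIndex:
    # As defined above; repeated so this sketch is self-contained.
    def __init__(self):
        self._nodes: Dict[str, Artifact] = {}

    def get(self, sid: str) -> Optional[Artifact]:
        return self._nodes.get(sid)

    def put(self, artifact: Artifact) -> Artifact:
        existing = self._nodes.get(artifact.sid)
        if existing is not None:
            return existing
        self._nodes[artifact.sid] = artifact
        return artifact


sidx = StructuralIndex()
a = sidx.put(Artifact("sid-leaf"))
b = sidx.put(Artifact("sid-leaf"))  # structurally identical node
assert a is b  # one DAG node per SID
```

Because `put()` returns the canonical node, callers can compare DAG subtrees with plain `is`, which is what makes structural comparisons cheap.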
---
## C. Materialization Cache (formalized)
You already have this — here's the final clean version:
```python
# materialization_cache.py
from typing import Dict, Optional


class MaterializationCache:
    """
    Fully redundant cache: SID -> CID
    """

    def __init__(self):
        self._cache: Dict[str, str] = {}

    def get(self, sid: str) -> Optional[str]:
        return self._cache.get(sid)

    def put(self, sid: str, cid: str):
        self._cache[sid] = cid
```
---
# 4⃣ How `put()` actually works (step-by-step)
This is the **critical flow**.
### `put(artifact)`
```text
1. StructuralIndex.put(artifact)
└─ ensures DAG node exists
2. Try MaterializationCache.get(artifact.sid)
└─ if hit → CID known, skip computation
3. If miss:
a) Materialize DAG
b) Compute CID
c) Cache SID → CID
4. ContentIndex.get(CID)
└─ if exists → done (dedup)
5. Else:
a) Allocate block space
b) Write bytes
c) ContentIndex.put(CID → location)
```
**Key property**:
> If *any* index already knows the answer, work stops early.
That's how you recover classical performance.
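The five steps above can be wired together in one sketch. Everything here is illustrative: `materialize` is a hypothetical hook you would supply, the CID is taken to be a SHA-256 of the materialized bytes (an assumption, not your spec), and the three indexes are condensed to plain dicts so the snippet runs standalone:

```python
import hashlib
from typing import Callable, Dict, Tuple

# Condensed stand-ins for the three indexes plus a toy block store.
structural: Dict[str, object] = {}             # SID -> DAG node
mat_cache: Dict[str, str] = {}                 # SID -> CID
content: Dict[str, Tuple[str, int, int]] = {}  # CID -> (block, offset, len)
block = bytearray()                            # append-only byte store


def put(sid: str, node: object, materialize: Callable[[object], bytes]) -> str:
    # 1. StructuralIndex.put: ensure the DAG node exists.
    structural.setdefault(sid, node)
    # 2. MaterializationCache.get: a hit means the CID is already known
    #    and (in this sketch) the bytes were already written.
    cid = mat_cache.get(sid)
    if cid is None:
        # 3. Miss: materialize the DAG, compute the CID, cache SID -> CID.
        data = materialize(node)
        cid = hashlib.sha256(data).hexdigest()
        mat_cache[sid] = cid
        # 4. ContentIndex lookup: identical content is stored only once.
        if cid not in content:
            # 5. Allocate block space, write bytes, index the location.
            offset = len(block)
            block.extend(data)
            content[cid] = ("block-0", offset, len(data))
    return cid


calls = []

def mat(node):
    calls.append(node)
    return b"hello"

c1 = put("sid-1", {"op": "emit"}, mat)
c2 = put("sid-1", {"op": "emit"}, mat)  # cache hit: no rematerialization
assert c1 == c2 and len(calls) == 1
```

The second call never touches `mat`: the cache answers at step 2 and work stops early, which is the early-exit property the flow is built around.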
---
# 5⃣ Why this works for q-bits without cost
Because:
* q-bits only affect **step 3**
* steps 1, 4, 5 are identical to classical bytes
Once materialized:
* q-bit artifacts are indistinguishable from classical artifacts
* index sees only CIDs and byte ranges
No tax after first computation.
---
# 6⃣ What you now have (this is important)
You now have:
| Layer | Capability |
| ---------------- | ------------------------ |
| DAG | Explanation / provenance |
| SID | Structural identity |
| CID | Semantic identity |
| Content index | Storage truth |
| Cache | Performance |
| Snapshots + logs | Time |
This is **not common infrastructure**.
That's why you couldn't find it pre-built.
---
## If you want next
I can:
* add **snapshot + append-only log semantics** to `ContentIndex`
* wire these indexes into a single `ASLStore`
* show **federation / replication** mechanics
* show **LLM → DAG → CID** ingestion
* formalize **PEL traversal queries**
Just point the direction.