Yes — there are multiple indexes, and the key to keeping this clean is to separate them by semantic responsibility, exactly like your ASL-CORE-INDEX spec already does.
Below is a concrete, minimal implementation plan that fits perfectly with your current artifact.py and keeps q-bits cheap.
I’ll give you:
- The index taxonomy (what indexes exist and why)
- A clean architecture (who depends on whom)
- Concrete Python implementations for each index
- How they interact during put() and get()
No hand-waving.
1️⃣ The indexes you actually need
You need three distinct indexes, not one.
A. Content Index (CID → ArtifactLocation)
This is ASL-CORE-INDEX
Purpose:
Given a semantic artifact identity, where are the bytes?
CID → (BlockID, offset, length)
Properties:
- Snapshot-relative
- Append-only
- Immutable
- Shadowable (optional)
This is the storage-facing index.
B. Structural Index (SID → DAG node)
This is your PEL / derivation index
Purpose:
Given a structure identity, what DAG node exists?
SID → Artifact (structure + children + params)
Properties:
- In-memory or persisted
- Deterministic
- Rebuildable
- Does NOT imply materialization
This index is what lets you:
- compare derivations
- reuse structure
- deduplicate DAG nodes
- avoid rebuilding identical DAGs
C. Materialization Cache (SID → CID)
This is the execution shortcut
Purpose:
If I already materialized this structure, what content did it produce?
SID → CID
Properties:
- Fully redundant
- Recomputable from DAG + content index
- Safe to drop entirely
- Huge performance win
This is the cache you already implemented — now we formalize it.
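To make the SID/CID split concrete, here is a minimal sketch. It assumes SHA-256 for both identities and a JSON encoding of the node structure; `compute_sid` and `compute_cid` are hypothetical helpers for illustration, not the hashing scheme in your artifact.py. It shows two different derivations that get different SIDs but converge on the same CID, which is why the cache maps SID → CID rather than the other way around.

```python
import hashlib
import json
from typing import Dict, List


def compute_cid(data: bytes) -> str:
    """Hypothetical CID: a hash of the materialized bytes only."""
    return hashlib.sha256(data).hexdigest()


def compute_sid(op: str, child_sids: List[str], params: Dict[str, str]) -> str:
    """Hypothetical SID: a hash of structure + children + params, never the bytes."""
    payload = json.dumps(
        {"op": op, "children": child_sids, "params": params},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Four leaf nodes, identified by their structure alone.
leaf_ab = compute_sid("literal", [], {"value": "ab"})
leaf_c = compute_sid("literal", [], {"value": "c"})
leaf_a = compute_sid("literal", [], {"value": "a"})
leaf_bc = compute_sid("literal", [], {"value": "bc"})

# Two different derivations of the same bytes.
sid_1 = compute_sid("concat", [leaf_ab, leaf_c], {})
sid_2 = compute_sid("concat", [leaf_a, leaf_bc], {})
cid_1 = compute_cid(b"ab" + b"c")
cid_2 = compute_cid(b"a" + b"bc")

print(sid_1 != sid_2)  # the structures differ
print(cid_1 == cid_2)  # the content is identical
```

Many SIDs can map to one CID, so the content index stores those bytes once no matter how many derivations produce them.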
2️⃣ Dependency graph (this matters)
┌─────────────┐
│ Structural  │
│ Index       │  SID → DAG
└──────┬──────┘
       │
 materialize()
       │
┌──────▼──────┐
│ Materialize │
│ Cache       │  SID → CID
└──────┬──────┘
       │
┌──────▼──────┐
│ Content     │
│ Index       │  CID → bytes
└─────────────┘
Important invariants:
- Content index never depends on DAG
- Structural index never depends on bytes
- Cache depends on both, but is optional
This keeps q-bits cheap.
3️⃣ Concrete Python implementations
A. Content Index (ASL-CORE-INDEX)
# content_index.py
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class ContentIndex:
    """
    Semantic index: CID -> ArtifactLocation
    """

    def __init__(self) -> None:
        self._index: Dict[str, ArtifactLocation] = {}

    def get(self, cid: str) -> Optional[ArtifactLocation]:
        return self._index.get(cid)

    def put(self, cid: str, loc: ArtifactLocation) -> None:
        # Immutable once visible: the first write wins, later writes are ignored
        if cid in self._index:
            return
        self._index[cid] = loc
This is your ASL-CORE-INDEX in executable form.
B. Structural Index (PEL / DAG index)
# structural_index.py
from typing import Dict, Optional

from artifact import Artifact


class StructuralIndex:
    """
    SID -> Artifact (DAG node)
    """

    def __init__(self) -> None:
        self._nodes: Dict[str, Artifact] = {}

    def get(self, sid: str) -> Optional[Artifact]:
        return self._nodes.get(sid)

    def put(self, artifact: Artifact) -> Artifact:
        """
        Deduplicate DAG nodes by SID.
        """
        existing = self._nodes.get(artifact.sid)
        if existing is not None:
            return existing
        self._nodes[artifact.sid] = artifact
        return artifact
This ensures:
- One DAG node per SID
- Structural deduplication
- Cheap comparisons
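A small usage sketch of that deduplication follows. The `Artifact` dataclass here is a hypothetical stand-in, since I'm only assuming your real `artifact.Artifact` exposes a `sid` attribute, and the index is condensed to `dict.setdefault`, which behaves the same as the `put()` above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for artifact.Artifact; only `sid` matters here."""
    sid: str
    op: str
    children: Tuple[str, ...] = ()


class StructuralIndex:
    """Condensed SID -> Artifact index with the same first-wins behavior."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Artifact] = {}

    def put(self, artifact: Artifact) -> Artifact:
        # setdefault stores the node only if the SID is new and always
        # returns whichever node is canonical for that SID.
        return self._nodes.setdefault(artifact.sid, artifact)


index = StructuralIndex()
first = index.put(Artifact(sid="sid-1", op="concat", children=("sid-a", "sid-b")))
second = index.put(Artifact(sid="sid-1", op="concat", children=("sid-a", "sid-b")))

# Both calls hand back the same object, so callers can compare nodes by
# identity (or by SID) instead of walking the DAG.
print(second is first)
```

Always use the node returned by `put()`, not the one you passed in. That is what keeps a single canonical object per SID.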
C. Materialization Cache (formalized)
You already have this — here’s the final clean version:
# materialization_cache.py
from typing import Dict, Optional


class MaterializationCache:
    """
    Fully redundant cache: SID -> CID
    """

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}

    def get(self, sid: str) -> Optional[str]:
        return self._cache.get(sid)

    def put(self, sid: str, cid: str) -> None:
        self._cache[sid] = cid
4️⃣ How put() actually works (step-by-step)
This is the critical flow.
put(artifact)
1. StructuralIndex.put(artifact)
   └─ ensures DAG node exists
2. Try MaterializationCache.get(artifact.sid)
   └─ if hit → CID known, skip computation
3. If miss:
   a) Materialize DAG
   b) Compute CID
   c) Cache SID → CID
4. ContentIndex.get(CID)
   └─ if exists → done (dedup)
5. Else:
   a) Allocate block space
   b) Write bytes
   c) ContentIndex.put(CID → location)
Key property:
If any index already knows the answer, work stops early.
That’s how you recover classical performance.
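The five steps above can be sketched in Python roughly as follows. Treat it as a sketch under stated assumptions, not the final store: `Store` is a hypothetical wrapper, plain dicts stand in for the three index classes, `Artifact.materialize` is an assumed callable that produces the node's bytes, the CID is assumed to be a SHA-256 of those bytes, and a single in-memory `bytearray` plays the role of block storage.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for artifact.Artifact."""
    sid: str
    materialize: Callable[[], bytes]  # assumed: returns the bytes for this node


class Store:
    """Hypothetical wrapper; dicts stand in for the three indexes."""

    def __init__(self) -> None:
        self.structural: Dict[str, Artifact] = {}       # SID -> DAG node
        self.cache: Dict[str, str] = {}                 # SID -> CID
        self.content: Dict[str, ArtifactLocation] = {}  # CID -> location
        self.block = bytearray()                        # one in-memory block
        self.materializations = 0                       # counts step-3 work

    def put(self, artifact: Artifact) -> str:
        # 1. Ensure the DAG node exists (dedup by SID).
        node = self.structural.setdefault(artifact.sid, artifact)

        # 2. Cache hit: the CID is already known, so skip computation.
        cid = self.cache.get(node.sid)
        data: Optional[bytes] = None
        if cid is None:
            # 3. Cache miss: materialize, compute the CID, cache SID -> CID.
            data = node.materialize()
            self.materializations += 1
            cid = hashlib.sha256(data).hexdigest()
            self.cache[node.sid] = cid

        # 4. Content already stored: done (dedup by CID).
        if cid in self.content:
            return cid

        # 5. Allocate space, write bytes, publish the location.
        if data is None:
            # Assumption: if the cache knew the CID but the bytes are gone,
            # rebuild them from the DAG node.
            data = node.materialize()
            self.materializations += 1
        offset = len(self.block)
        self.block.extend(data)
        self.content[cid] = ArtifactLocation("block-0", offset, len(data))
        return cid


store = Store()
a = Artifact(sid="sid-a", materialize=lambda: b"hello")
b = Artifact(sid="sid-b", materialize=lambda: b"hello")  # other structure, same bytes

cid_a = store.put(a)   # full path: materialize and write
cid_b = store.put(b)   # materializes, then stops at step 4
store.put(a)           # stops at step 2, no materialization
work_before_drop = store.materializations

store.cache.clear()    # the cache is safe to drop entirely
cid_again = store.put(a)  # recomputed from the DAG, nothing rewritten
```

In this run the bytes are written once even though two structures produce them, and clearing the cache costs one extra materialization but no extra storage.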
5️⃣ Why this works for q-bits without cost
Because:
- q-bits only affect step 3 (materialization)
- steps 1, 2, 4, and 5 are identical to the classical-bytes path
Once materialized:
- q-bit artifacts are indistinguishable from classical artifacts
- the content index sees only CIDs and byte ranges
No tax after first computation.
6️⃣ What you now have (this is important)
You now have:
| Layer | Capability |
|---|---|
| DAG | Explanation / provenance |
| SID | Structural identity |
| CID | Semantic identity |
| Content index | Storage truth |
| Cache | Performance |
| Snapshots + logs | Time |
This is not common infrastructure. That’s why you couldn’t find it pre-built.
If you want next
I can:
- add snapshot + append-only log semantics to ContentIndex
- wire these indexes into a single ASLStore
- show federation / replication mechanics
- show LLM → DAG → CID ingestion
- formalize PEL traversal queries
Just point the direction.