Yes, there are **multiple indexes**, and the key to keeping this clean is to **separate them by semantic responsibility**, exactly as your ASL-CORE-INDEX spec already does.

Below is a **concrete, minimal implementation plan** that fits *perfectly* with your current `artifact.py` and keeps q-bits cheap.

I’ll give you:

1. The **index taxonomy** (what indexes exist and why)
2. A **clean architecture** (who depends on whom)
3. **Concrete Python implementations** for each index
4. How they interact during `put()`

No hand-waving.

---

# 1️⃣ The indexes you actually need

You need **three distinct indexes**, not one.

## A. Content Index (CID → ArtifactLocation)

**This *is* ASL-CORE-INDEX**

Purpose:

> Given a semantic artifact identity, where are the bytes?

```text
CID → (BlockID, offset, length)
```

Properties:

* Snapshot-relative
* Append-only
* Immutable
* Shadowable (optional)

This is the *storage-facing* index.
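
To make the `CID → (BlockID, offset, length)` mapping concrete, here is a minimal sketch of resolving a CID to its bytes. Plain dicts stand in for the index and the block store, and `read_artifact`, the `cid-…` strings, and `block-0` are hypothetical names chosen for illustration, not part of your spec.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


def read_artifact(
    index: Dict[str, ArtifactLocation],
    blocks: Dict[str, bytes],
    cid: str,
) -> Optional[bytes]:
    """Resolve a CID to its bytes, or None if the index has no entry."""
    loc = index.get(cid)
    if loc is None:
        return None
    block = blocks[loc.block_id]
    # The location is just a byte range inside one block.
    return block[loc.offset : loc.offset + loc.length]


# Two artifacts packed back to back in a single block (illustrative data).
blocks = {"block-0": b"helloworld"}
index = {
    "cid-hello": ArtifactLocation("block-0", 0, 5),
    "cid-world": ArtifactLocation("block-0", 5, 5),
}

print(read_artifact(index, blocks, "cid-world"))    # b'world'
print(read_artifact(index, blocks, "cid-missing"))  # None
```

Note that nothing here knows how the bytes were derived; the lookup needs only the CID and the block store.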

---

## B. Structural Index (SID → DAG node)

**This is your PEL / derivation index**

Purpose:

> Given a structure identity, what DAG node exists?

```text
SID → Artifact (structure + children + params)
```

Properties:

* In-memory or persisted
* Deterministic
* Rebuildable
* Does NOT imply materialization

This index is what lets you:

* compare derivations
* reuse structure
* deduplicate DAG nodes
* avoid rebuilding identical DAGs

---

## C. Materialization Cache (SID → CID)

**This is the execution shortcut**

Purpose:

> If I already materialized this structure, what content did it produce?

```text
SID → CID
```

Properties:

* Fully redundant
* Recomputable from DAG + content index
* Safe to drop entirely
* Huge performance win

This is the cache you already implemented; now we formalize it.
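
The "fully redundant" and "safe to drop" properties can be illustrated with a small sketch. It assumes materialization is deterministic and, purely for illustration, that a CID is the SHA-256 of the materialized bytes; `cid_for`, `materialize_sum`, and `sid-sum` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict

calls = {"count": 0}  # counts how often we actually materialize


def materialize_sum() -> bytes:
    """Stand-in for an expensive, deterministic materialization."""
    calls["count"] += 1
    return str(1 + 2).encode()


def cid_for(
    sid: str,
    cache: Dict[str, str],
    materializers: Dict[str, Callable[[], bytes]],
) -> str:
    """Return the CID for a SID, consulting the cache first."""
    cid = cache.get(sid)
    if cid is None:
        # Assumed CID scheme: SHA-256 of the materialized bytes.
        cid = hashlib.sha256(materializers[sid]()).hexdigest()
        cache[sid] = cid
    return cid


materializers = {"sid-sum": materialize_sum}
cache: Dict[str, str] = {}

first = cid_for("sid-sum", cache, materializers)    # miss: materializes
again = cid_for("sid-sum", cache, materializers)    # hit: no work
cache.clear()                                       # drop the cache entirely
rebuilt = cid_for("sid-sum", cache, materializers)  # miss: recomputed

print(first == again == rebuilt)  # True
print(calls["count"])             # 2
```

Dropping the cache costs one extra materialization and changes no answers, which is what makes it an optimization rather than a source of truth.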

---

# 2️⃣ Dependency graph (this matters)

```text
        ┌─────────────┐
        │ Structural  │
        │ Index       │  SID → DAG
        └──────┬──────┘
               │
         materialize()
               │
        ┌──────▼──────┐
        │ Materialize │
        │ Cache       │  SID → CID
        └──────┬──────┘
               │
        ┌──────▼──────┐
        │ Content     │
        │ Index       │  CID → bytes
        └─────────────┘
```

**Important invariants**:

* Content index never depends on DAG
* Structural index never depends on bytes
* Cache depends on both, but is optional

This separation is what keeps q-bits cheap.

---

# 3️⃣ Concrete Python implementations

## A. Content Index (ASL-CORE-INDEX)

```python
# content_index.py
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class ContentIndex:
    """
    Semantic index: CID -> ArtifactLocation
    """
    def __init__(self):
        self._index: Dict[str, ArtifactLocation] = {}

    def get(self, cid: str) -> Optional[ArtifactLocation]:
        return self._index.get(cid)

    def put(self, cid: str, loc: ArtifactLocation):
        # Immutable once visible
        if cid in self._index:
            return
        self._index[cid] = loc
```

This is your **ASL-CORE-INDEX** in executable form.

---

## B. Structural Index (PEL / DAG index)

```python
# structural_index.py
from typing import Dict, Optional
from artifact import Artifact

class StructuralIndex:
    """
    SID -> Artifact (DAG node)
    """
    def __init__(self):
        self._nodes: Dict[str, Artifact] = {}

    def get(self, sid: str) -> Optional[Artifact]:
        return self._nodes.get(sid)

    def put(self, artifact: Artifact) -> Artifact:
        """
        Deduplicate DAG nodes by SID.
        """
        existing = self._nodes.get(artifact.sid)
        if existing is not None:
            return existing
        self._nodes[artifact.sid] = artifact
        return artifact
```

This ensures:

* One DAG node per SID
* Structural deduplication
* Cheap comparisons
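
A short sketch of what that deduplication looks like in use. The `Artifact` class below is a hypothetical stand-in, since your real `artifact.py` is not shown here, and its SID scheme (a hash over the operation, child SIDs, and sorted params) is an assumption made only so the example runs on its own.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for the Artifact class in artifact.py."""
    op: str
    children: Tuple["Artifact", ...] = ()
    params: Tuple[Tuple[str, str], ...] = ()

    @property
    def sid(self) -> str:
        # Assumed SID scheme: depends only on structure, never on bytes.
        child_sids = ",".join(c.sid for c in self.children)
        param_str = ",".join(f"{k}={v}" for k, v in sorted(self.params))
        payload = f"{self.op}|{child_sids}|{param_str}"
        return hashlib.sha256(payload.encode()).hexdigest()


class StructuralIndex:
    """Compact version of the index above: SID -> Artifact."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Artifact] = {}

    def put(self, artifact: Artifact) -> Artifact:
        # Keep the first node seen for a SID and return it on later puts.
        return self._nodes.setdefault(artifact.sid, artifact)

    def __len__(self) -> int:
        return len(self._nodes)


index = StructuralIndex()
leaf = index.put(Artifact("const", params=(("value", "1"),)))
first = index.put(Artifact("add", children=(leaf, leaf)))
second = index.put(Artifact("add", children=(leaf, leaf)))  # built independently

print(first is second)  # True
print(len(index))       # 2
```

Because `put()` hands back the canonical node, comparing two derivations reduces to an identity check, and no bytes are touched at any point.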

---

## C. Materialization Cache (formalized)

You already have this; here’s the final clean version:

```python
# materialization_cache.py
from typing import Dict, Optional

class MaterializationCache:
    """
    Fully redundant cache: SID -> CID
    """
    def __init__(self):
        self._cache: Dict[str, str] = {}

    def get(self, sid: str) -> Optional[str]:
        return self._cache.get(sid)

    def put(self, sid: str, cid: str):
        self._cache[sid] = cid
```

---

# 4️⃣ How `put()` actually works (step-by-step)

This is the **critical flow**.

### `put(artifact)`

```text
1. StructuralIndex.put(artifact)
   └─ ensures DAG node exists

2. Try MaterializationCache.get(artifact.sid)
   └─ if hit → CID known, skip computation

3. If miss:
   a) Materialize DAG
   b) Compute CID
   c) Cache SID → CID

4. ContentIndex.get(CID)
   └─ if exists → done (dedup)

5. Else:
   a) Allocate block space
   b) Write bytes
   c) ContentIndex.put(CID → location)
```

**Key property**:

> If *any* index already knows the answer, work stops early.

That’s how you recover classical performance.
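
The five steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not your actual store: `ASLStore`, `Node`, and `read()` are hypothetical names, plain dicts replace the three index classes to keep it short, the CID is assumed to be a SHA-256 of the bytes, and a single in-memory `bytearray` plays the role of block storage.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for Artifact: a SID plus a way to produce bytes."""
    sid: str
    materialize: Callable[[], bytes]


class ASLStore:
    """Hypothetical wiring of the three indexes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}                # structural index
        self.cache: Dict[str, str] = {}                 # SID -> CID
        self.content: Dict[str, ArtifactLocation] = {}  # CID -> location
        self.block = bytearray()                        # one in-memory block
        self.block_id = "block-0"

    def put(self, node: Node) -> str:
        # 1. Ensure the DAG node exists (deduplicated by SID).
        node = self.nodes.setdefault(node.sid, node)
        # 2. Cache hit: the CID is known, skip materialization.
        cid = self.cache.get(node.sid)
        data: Optional[bytes] = None
        if cid is None:
            # 3. Cache miss: materialize, compute CID, remember SID -> CID.
            data = node.materialize()
            cid = hashlib.sha256(data).hexdigest()  # assumed CID scheme
            self.cache[node.sid] = cid
        # 4. Content already stored: done (dedup).
        if cid in self.content:
            return cid
        # 5. Append the bytes and publish the location.
        if data is None:
            # Not in the flow above: cache knew the CID but bytes are absent.
            data = node.materialize()
        offset = len(self.block)
        self.block.extend(data)
        self.content[cid] = ArtifactLocation(self.block_id, offset, len(data))
        return cid

    def read(self, cid: str) -> Optional[bytes]:
        loc = self.content.get(cid)
        if loc is None:
            return None
        return bytes(self.block[loc.offset : loc.offset + loc.length])


calls = []  # records each real materialization


def expensive() -> bytes:
    calls.append("run")
    return b"result"


store = ASLStore()
cid_a = store.put(Node("sid-a", expensive))
cid_b = store.put(Node("sid-a", expensive))          # cache hit, no work
cid_c = store.put(Node("sid-b", lambda: b"result"))  # new SID, same content

print(cid_a == cid_b == cid_c)  # True
print(len(calls))               # 1
print(len(store.block))         # 6
print(store.read(cid_a))        # b'result'
```

The second `put()` stops at step 2 and the third stops at step 4, so the bytes are written exactly once even though three puts were issued.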

---

# 5️⃣ Why this works for q-bits without cost

Because:

* q-bits only affect **step 3**
* steps 1, 2, 4, and 5 are identical to the classical byte path

Once materialized:

* q-bit artifacts are indistinguishable from classical artifacts
* the index sees only CIDs and byte ranges

No tax after first computation.

---

# 6️⃣ What you now have (this is important)

You now have:

| Layer            | Capability               |
| ---------------- | ------------------------ |
| DAG              | Explanation / provenance |
| SID              | Structural identity      |
| CID              | Semantic identity        |
| Content index    | Storage truth            |
| Cache            | Performance              |
| Snapshots + logs | Time                     |

This is **not common infrastructure**.
That’s why you couldn’t find it pre-built.

---

## If you want next

I can:

* add **snapshot + append-only log semantics** to `ContentIndex`
* wire these indexes into a single `ASLStore`
* show **federation / replication** mechanics
* show **LLM → DAG → CID** ingestion
* formalize **PEL traversal queries**

Just point the direction.