You’re right to call this out 👍

What I gave you so far had the **semantic behavior**, but it stopped *just short* of the thing that actually makes it operationally usable: **explicit snapshot identity and log positions**.

Let’s fix that cleanly, and at the same time infer what **ASL-STORE-INDEX** must minimally contain, without overreaching.

---

## 1️⃣ What ASL-STORE-INDEX minimally is (inferred)

From your spec, ASL-STORE-INDEX must define:

### A. Stable snapshot identity

So you can say:

> “This index state corresponds to snapshot X”

That implies:

* SnapshotID (opaque, comparable)
* Snapshot is immutable
* Snapshot can be referenced later

### B. Log position

So you can say:

> “Replay log entries [0…N)”

That implies:

* Total ordering
* Stable positions
* Prefix replay

### C. Coordination rule

So you can say:

> “Snapshot S + log prefix L defines CURRENT”

That’s it.
No storage layout, no IO, no performance policy.

---

## 2️⃣ What was missing before

Previously:

* Snapshot existed ❌ **but had no identity**
* Log existed ❌ **but had no positions**
* CURRENT existed ❌ **but couldn’t be referenced**

So you couldn’t:

* checkpoint
* replicate
* resume
* federate
* or prove determinism

Let’s add exactly what’s needed, nothing more.

---

## 3️⃣ Updated semantic model

We make **three things explicit**:

```text
SnapshotID   = opaque identifier
LogPosition  = integer (monotonic)
IndexState   = (SnapshotID, LogPosition)
```

And the invariant becomes:

```
Index(SnapshotID, LogPosition) =
    Snapshot[SnapshotID] + replay(log[0:LogPosition])
```
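
To make the invariant concrete, here is a minimal sketch of `replay` as a pure function. It is an illustration, not part of the module below: `Entry`, `replay`, and the string values standing in for artifact locations are hypothetical names chosen for this example, and the log is assumed to be sorted by position. Calling it with different `upto` values gives the index state at each log prefix.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Entry:
    position: int                 # plays the role of LogPosition
    op: str                       # "PUT" or "TOMBSTONE"
    key: str
    value: Optional[str] = None   # stands in for an artifact location


def replay(base: Dict[str, str], log: Sequence[Entry], upto: int) -> Dict[str, str]:
    """Return `base` with every entry at position < upto applied in order."""
    state = dict(base)            # copy: the snapshot mapping is never mutated
    for e in log:
        if e.position >= upto:    # assumes the log is sorted by position
            break
        if e.op == "PUT":
            state[e.key] = e.value
        elif e.op == "TOMBSTONE":
            state.pop(e.key, None)
    return state


log = [
    Entry(0, "PUT", "a", "block-1"),
    Entry(1, "PUT", "b", "block-2"),
    Entry(2, "TOMBSTONE", "a"),
    Entry(3, "PUT", "a", "block-3"),
]

genesis: Dict[str, str] = {}
state_at_2 = replay(genesis, log, upto=2)  # both PUTs applied
state_at_3 = replay(genesis, log, upto=3)  # "a" has been tombstoned
state_at_4 = replay(genesis, log, upto=4)  # "a" points at its new location
```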

---

## 4️⃣ Updated implementation (drop-in replacement)

### `content_index.py` (snapshot + log positions)

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, List, Iterable
from enum import Enum, auto
import uuid


# ---------------------------------------------------------------------
# Core data types
# ---------------------------------------------------------------------

SnapshotID = str   # opaque; only compared for equality
LogPosition = int  # monotonic position in the log


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class LogOp(Enum):
    PUT = auto()
    TOMBSTONE = auto()


@dataclass(frozen=True)
class IndexLogEntry:
    position: LogPosition
    op: LogOp
    artifact_key: str
    # Set for PUT entries; left as None for TOMBSTONE entries.
    location: Optional[ArtifactLocation] = None


# ---------------------------------------------------------------------
# Snapshot
# ---------------------------------------------------------------------

class ContentIndexSnapshot:
    """
    Immutable snapshot of the content index.
    """
    def __init__(
        self,
        snapshot_id: SnapshotID,
        mapping: Dict[str, ArtifactLocation],
        log_position: LogPosition,
    ):
        self.snapshot_id = snapshot_id
        # First log position NOT covered by this snapshot (exclusive bound).
        self.log_position = log_position
        # Defensive copy: later changes to the caller's dict cannot leak in.
        self._mapping = dict(mapping)

    def get(self, key: str) -> Optional[ArtifactLocation]:
        return self._mapping.get(key)

    def items(self):
        return self._mapping.items()
```

---

### Append-only log with positions

```python
class ContentIndexLog:
    """
    Append-only, totally ordered log.
    """
    def __init__(self, start_position: LogPosition = 0):
        self._entries: List[IndexLogEntry] = []
        # Position the next appended entry will receive.
        self._next_position = start_position

    def append_put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.PUT,
            artifact_key=key,
            location=loc,
        ))
        return pos

    def append_tombstone(self, key: str) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.TOMBSTONE,
            artifact_key=key,
        ))
        return pos

    def entries(self) -> Iterable[IndexLogEntry]:
        # Return a read-only copy so callers cannot mutate the append-only log.
        return tuple(self._entries)

    def upto(self, position: LogPosition) -> Iterable[IndexLogEntry]:
        # Prefix replay: every entry strictly before `position`.
        return (e for e in self._entries if e.position < position)

    @property
    def tail_position(self) -> LogPosition:
        # One past the last appended entry (exclusive upper bound).
        return self._next_position
```

---

### ContentIndex with explicit CURRENT

```python
class ContentIndex:
    """
    ASL-CORE-INDEX with snapshot IDs and log positions.
    """

    def __init__(
        self,
        snapshot: Optional[ContentIndexSnapshot] = None,
        log: Optional[ContentIndexLog] = None,
    ):
        self._snapshot = snapshot or ContentIndexSnapshot(
            snapshot_id="genesis",
            mapping={},
            log_position=0,
        )
        # Positions are global: a fresh log continues where the snapshot ends.
        self._log = log or ContentIndexLog(
            start_position=self._snapshot.log_position
        )

    # -----------------------------------------------------------------
    # Lookup at CURRENT
    # -----------------------------------------------------------------

    def lookup(self, key: str) -> Optional[ArtifactLocation]:
        # Shadowing: reverse replay up to CURRENT
        for entry in reversed(list(self._log.entries())):
            if entry.artifact_key != key:
                continue
            if entry.op == LogOp.TOMBSTONE:
                return None
            return entry.location

        # No log entry shadows the key, so fall back to the snapshot.
        return self._snapshot.get(key)

    # -----------------------------------------------------------------
    # Append-only mutation
    # -----------------------------------------------------------------

    def put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        return self._log.append_put(key, loc)

    def tombstone(self, key: str) -> LogPosition:
        return self._log.append_tombstone(key)

    # -----------------------------------------------------------------
    # Snapshotting
    # -----------------------------------------------------------------

    def snapshot(self) -> ContentIndexSnapshot:
        """
        Create a new immutable snapshot representing CURRENT.
        """
        materialized: Dict[str, ArtifactLocation] = dict(self._snapshot.items())

        # Apply the log in order on top of the base snapshot.
        for entry in self._log.entries():
            if entry.op == LogOp.PUT:
                materialized[entry.artifact_key] = entry.location
            elif entry.op == LogOp.TOMBSTONE:
                materialized.pop(entry.artifact_key, None)

        snapshot_id = str(uuid.uuid4())
        snapshot = ContentIndexSnapshot(
            snapshot_id=snapshot_id,
            mapping=materialized,
            log_position=self._log.tail_position,
        )
        # The snapshot is returned, not installed; see install_snapshot().
        return snapshot

    def install_snapshot(self, snapshot: ContentIndexSnapshot) -> None:
        """
        Install a snapshot and reset log (external coordination assumed).
        """
        # Any entries still in the current log are discarded here.
        self._snapshot = snapshot
        self._log = ContentIndexLog(start_position=snapshot.log_position)

    # -----------------------------------------------------------------
    # Observability
    # -----------------------------------------------------------------

    @property
    def current_snapshot_id(self) -> SnapshotID:
        return self._snapshot.snapshot_id

    @property
    def current_log_position(self) -> LogPosition:
        return self._log.tail_position
```

---

## 5️⃣ Why this is enough (and not too much)

You now have:

* ✅ Snapshot identity
* ✅ Log positions
* ✅ Deterministic replay
* ✅ Explicit CURRENT
* ✅ Federation-ready state vectors
* ✅ Crash recovery anchor
* ✅ Replication boundary

And still:

* ❌ No IO assumptions
* ❌ No block allocation rules
* ❌ No GC
* ❌ No concurrency model

That’s *exactly* what ASL-STORE-INDEX should be.

---

## 6️⃣ How this unlocks higher layers

With `(SnapshotID, LogPosition)` you can now:

* Compare index states across nodes
* Prove two systems are in sync
* Incrementally replicate logs
* Build federated overlays
* Anchor provenance certificates
* Bind DAG derivations to storage state
* Feed stable views into ML / analytics

This is the **bridge** between your infrastructure and everything above it.

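
As a rough illustration of the first three bullets, here is a sketch of comparing two state pairs and working out which log entries a follower is missing. Everything in it is an assumption made for the example: `IndexState`, `in_sync`, and `entries_to_ship` are hypothetical names, log entries are reduced to `(position, payload)` tuples, and both nodes are assumed to share one log history (single writer). Federation merge semantics are not defined yet, so this only covers the simple leader/follower case.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IndexState:
    snapshot_id: str
    log_position: int


def in_sync(a: IndexState, b: IndexState) -> bool:
    # Equal pairs name the same state under the invariant. Unequal pairs
    # may still hold equal content; this check does not detect that.
    return a == b


def entries_to_ship(
    leader_log: List[Tuple[int, str]],
    leader: IndexState,
    follower: IndexState,
) -> List[Tuple[int, str]]:
    """Return the log entries the follower lacks, in position order."""
    if leader.snapshot_id != follower.snapshot_id:
        raise ValueError("different snapshot base; ship the snapshot first")
    if follower.log_position > leader.log_position:
        raise ValueError("follower is ahead of leader")
    return [
        (pos, payload)
        for pos, payload in leader_log
        if follower.log_position <= pos < leader.log_position
    ]


# Positions are global, so a log that starts after a snapshot taken at
# position 5 begins at 5 rather than 0.
leader_log = [(5, "put a"), (6, "put b"), (7, "tombstone a")]
leader = IndexState("snap-1", 8)
follower = IndexState("snap-1", 6)

missing = entries_to_ship(leader_log, leader, follower)
```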

---

## 7️⃣ Sanity check (design smell test)

Ask yourself:

> “Can I reconstruct CURRENT from snapshot + log prefix?”

Yes, deterministically.

> “Can I drop the cache and still recover?”

Yes.

> “Can classical bytes ignore all of this?”

Yes, the fast path still exists.

That means the design is sound.

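
If you want to see the first answer hold rather than take it on faith, a small check like the one below works. It is a sketch under simplifying assumptions: `apply` is a hypothetical helper, log entries are plain `(op, key, value)` tuples, and strings stand in for artifact locations. It takes a snapshot at every possible cut point, replays only the remaining suffix, and compares the result with a full replay from genesis.

```python
from typing import Dict, List, Optional, Tuple

# (op, key, value): op is "PUT" or "TOMBSTONE"; value stands in for a location.
Op = Tuple[str, str, Optional[str]]


def apply(base: Dict[str, str], ops: List[Op]) -> Dict[str, str]:
    """Apply ops in order on top of a copy of base."""
    state = dict(base)
    for op, key, value in ops:
        if op == "PUT":
            state[key] = value
        else:  # TOMBSTONE
            state.pop(key, None)
    return state


log: List[Op] = [
    ("PUT", "a", "block-1"),
    ("PUT", "b", "block-2"),
    ("TOMBSTONE", "a", None),
    ("PUT", "c", "block-3"),
    ("PUT", "a", "block-4"),
]

# Reference result: replay the whole log from an empty genesis snapshot.
full = apply({}, log)

# Snapshot at every cut point k, then replay only the suffix log[k:].
results = [apply(apply({}, log[:k]), log[k:]) for k in range(len(log) + 1)]
all_equal = all(r == full for r in results)
```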

---

If you want, next we can:

* define **ASL-STORE put/get contract**
* add **federation merge semantics**
* define **provenance certificates**
* or show **how LLMs consume snapshot-relative DAGs**

You’re building something real here.