You’re right to call this out 👍 What I gave you so far had the semantic behavior, but it stopped just short of the thing that actually makes it operationally usable: explicit snapshot identity and log positions.
Let’s fix that cleanly, and at the same time infer what ASL-STORE-INDEX must minimally contain — without overreaching.
1️⃣ What ASL-STORE-INDEX minimally is (inferred)
From your spec, ASL-STORE-INDEX must define:
A. Stable snapshot identity
So you can say:
“This index state corresponds to snapshot X”
That implies:
- SnapshotID (opaque, comparable)
- Snapshot is immutable
- Snapshot can be referenced later
B. Log position
So you can say:
“Replay log entries [0…N)”
That implies:
- Total ordering
- Stable positions
- Prefix replay
C. Coordination rule
So you can say:
“Snapshot S + log prefix L defines CURRENT”
That’s it. No storage layout, no IO, no performance policy.
2️⃣ What was missing before
Previously:
- Snapshot existed ❌ but had no identity
- Log existed ❌ but had no positions
- CURRENT existed ❌ but couldn’t be referenced
So you couldn’t:
- checkpoint
- replicate
- resume
- federate
- or prove determinism
Let’s add exactly what’s needed — nothing more.
3️⃣ Updated semantic model
We make three things explicit:
SnapshotID = opaque identifier
LogPosition = integer (monotonic)
IndexState = (SnapshotID, LogPosition)
And the invariant becomes:
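Index(SnapshotID, LogPosition) =
Snapshot[SnapshotID] + replay(log[Snapshot.log_position : LogPosition])
where Snapshot[SnapshotID] already reflects every log entry before its own log position, so only the entries after it are replayed.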
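To make the invariant concrete, here is a minimal sketch of prefix replay. It assumes the snapshot records the log position it was taken at, and it uses a hypothetical tuple-based entry format (position, key, value) where a value of None stands in for a tombstone. None of these names come from your spec; they are only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical minimal log entry: (position, key, value-or-None).
# A value of None plays the role of a tombstone.
Entry = Tuple[int, str, Optional[str]]


def replay(base: Dict[str, str], log: List[Entry], start: int, end: int) -> Dict[str, str]:
    """Apply entries with start <= position < end on top of a copy of base."""
    state = dict(base)
    for position, key, value in log:
        if not (start <= position < end):
            continue
        if value is None:
            state.pop(key, None)  # tombstone removes the key
        else:
            state[key] = value    # put shadows any earlier value
    return state


log: List[Entry] = [
    (0, "a", "blk1"),
    (1, "b", "blk2"),
    (2, "a", None),
    (3, "c", "blk3"),
]

# A snapshot taken at position 2 already covers entries 0 and 1.
snapshot_at_2 = replay({}, log, 0, 2)

# Snapshot + replay of [2, 4) should match a full replay of [0, 4).
from_snapshot = replay(snapshot_at_2, log, 2, 4)
from_genesis = replay({}, log, 0, 4)
assert from_snapshot == from_genesis
```

The point of the check at the end is that the same (snapshot, position) pair always yields the same mapping, regardless of whether you start from the snapshot or from an empty index.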
4️⃣ Updated implementation (drop-in replacement)
content_index.py (snapshot + log positions)
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, List, Iterable
from enum import Enum, auto
import uuid

# ---------------------------------------------------------------------
# Core data types
# ---------------------------------------------------------------------
SnapshotID = str
LogPosition = int


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class LogOp(Enum):
    PUT = auto()
    TOMBSTONE = auto()


@dataclass(frozen=True)
class IndexLogEntry:
    position: LogPosition
    op: LogOp
    artifact_key: str
    location: Optional[ArtifactLocation] = None


# ---------------------------------------------------------------------
# Snapshot
# ---------------------------------------------------------------------
class ContentIndexSnapshot:
    """
    Immutable snapshot of the content index.
    """

    def __init__(
        self,
        snapshot_id: SnapshotID,
        mapping: Dict[str, ArtifactLocation],
        log_position: LogPosition,
    ):
        self.snapshot_id = snapshot_id
        self.log_position = log_position
        self._mapping = dict(mapping)

    def get(self, key: str) -> Optional[ArtifactLocation]:
        return self._mapping.get(key)

    def items(self):
        return self._mapping.items()
Append-only log with positions
class ContentIndexLog:
    """
    Append-only, totally ordered log.
    """

    def __init__(self, start_position: LogPosition = 0):
        self._entries: List[IndexLogEntry] = []
        self._next_position = start_position

    def append_put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.PUT,
            artifact_key=key,
            location=loc,
        ))
        return pos

    def append_tombstone(self, key: str) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.TOMBSTONE,
            artifact_key=key,
        ))
        return pos

    def entries(self) -> Iterable[IndexLogEntry]:
        return self._entries

    def upto(self, position: LogPosition) -> Iterable[IndexLogEntry]:
        return (e for e in self._entries if e.position < position)

    @property
    def tail_position(self) -> LogPosition:
        return self._next_position
ContentIndex with explicit CURRENT
class ContentIndex:
    """
    ASL-CORE-INDEX with snapshot IDs and log positions.
    """

    def __init__(
        self,
        snapshot: Optional[ContentIndexSnapshot] = None,
        log: Optional[ContentIndexLog] = None,
    ):
        self._snapshot = snapshot or ContentIndexSnapshot(
            snapshot_id="genesis",
            mapping={},
            log_position=0,
        )
        self._log = log or ContentIndexLog(
            start_position=self._snapshot.log_position
        )

    # -----------------------------------------------------------------
    # Lookup at CURRENT
    # -----------------------------------------------------------------
    def lookup(self, key: str) -> Optional[ArtifactLocation]:
        # Shadowing: reverse replay up to CURRENT
        for entry in reversed(list(self._log.entries())):
            if entry.artifact_key != key:
                continue
            if entry.op == LogOp.TOMBSTONE:
                return None
            return entry.location
        return self._snapshot.get(key)

    # -----------------------------------------------------------------
    # Append-only mutation
    # -----------------------------------------------------------------
    def put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        return self._log.append_put(key, loc)

    def tombstone(self, key: str) -> LogPosition:
        return self._log.append_tombstone(key)

    # -----------------------------------------------------------------
    # Snapshotting
    # -----------------------------------------------------------------
    def snapshot(self) -> ContentIndexSnapshot:
        """
        Create a new immutable snapshot representing CURRENT.
        """
        materialized: Dict[str, ArtifactLocation] = dict(self._snapshot.items())
        for entry in self._log.entries():
            if entry.op == LogOp.PUT:
                materialized[entry.artifact_key] = entry.location
            elif entry.op == LogOp.TOMBSTONE:
                materialized.pop(entry.artifact_key, None)
        snapshot_id = str(uuid.uuid4())
        snapshot = ContentIndexSnapshot(
            snapshot_id=snapshot_id,
            mapping=materialized,
            log_position=self._log.tail_position,
        )
        return snapshot

    def install_snapshot(self, snapshot: ContentIndexSnapshot):
        """
        Install a snapshot and reset log (external coordination assumed).
        """
        # Note: the in-memory log is replaced with an empty one, so any
        # entries appended after the snapshot was taken are dropped here.
        self._snapshot = snapshot
        self._log = ContentIndexLog(start_position=snapshot.log_position)

    # -----------------------------------------------------------------
    # Observability
    # -----------------------------------------------------------------
    @property
    def current_snapshot_id(self) -> SnapshotID:
        return self._snapshot.snapshot_id

    @property
    def current_log_position(self) -> LogPosition:
        return self._log.tail_position
5️⃣ Why this is enough (and not too much)
You now have:
- ✅ Snapshot identity
- ✅ Log positions
- ✅ Deterministic replay
- ✅ Explicit CURRENT
- ✅ Federation-ready state vectors
- ✅ Crash recovery anchor
- ✅ Replication boundary
And still:
- ❌ No IO assumptions
- ❌ No block allocation rules
- ❌ No GC
- ❌ No concurrency model
That’s exactly what ASL-STORE-INDEX should be.
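As one illustration of the "crash recovery anchor" point, here is a rough sketch of persisting the (SnapshotID, LogPosition) pair as a checkpoint record. The IndexState dataclass, the JSON encoding, and the function names are all assumptions made for this example; the spec itself says nothing about serialization.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IndexState:
    # Hypothetical container for the (SnapshotID, LogPosition) pair.
    snapshot_id: str
    log_position: int


def encode_checkpoint(state: IndexState) -> str:
    """Serialize the state as JSON with sorted keys for stable output."""
    return json.dumps(asdict(state), sort_keys=True)


def decode_checkpoint(raw: str) -> IndexState:
    """Parse a checkpoint record back into an IndexState."""
    data = json.loads(raw)
    return IndexState(
        snapshot_id=str(data["snapshot_id"]),
        log_position=int(data["log_position"]),
    )


# Round trip: what was written before a crash is what gets resumed from.
before = IndexState(snapshot_id="genesis", log_position=17)
after = decode_checkpoint(encode_checkpoint(before))
assert after == before
```

On restart, a node would load the snapshot named by snapshot_id and replay log entries up to log_position. How the record is stored durably is deliberately left out, in keeping with the "no IO assumptions" line above.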
6️⃣ How this unlocks higher layers
With (SnapshotID, LogPosition) you can now:
- Compare index states across nodes
- Prove two systems are in sync
- Incrementally replicate logs
- Build federated overlays
- Anchor provenance certificates
- Bind DAG derivations to storage state
- Feed stable views into ML / analytics
This is the bridge between your infrastructure and everything above it.
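A small sketch of what "compare index states" and "incrementally replicate logs" could look like in practice. It assumes two nodes sharing the same SnapshotID also share the same log lineage, and that a mismatch in SnapshotID means falling back to a full snapshot transfer. Both of those are assumptions for this example, not rules from your spec, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexState:
    # Hypothetical container for the (SnapshotID, LogPosition) pair.
    snapshot_id: str
    log_position: int


def in_sync(a: IndexState, b: IndexState) -> bool:
    """Two nodes are in sync when both components match."""
    return a == b


def positions_to_ship(leader: IndexState, follower: IndexState) -> Optional[range]:
    """
    Return the log positions the follower is missing.

    Returns None when the snapshots differ, signalling that a snapshot
    transfer is needed before incremental replication can continue.
    """
    if leader.snapshot_id != follower.snapshot_id:
        return None
    if follower.log_position > leader.log_position:
        raise ValueError("follower is ahead of leader")
    return range(follower.log_position, leader.log_position)


leader = IndexState("snap-a", 10)
follower = IndexState("snap-a", 7)

missing = positions_to_ship(leader, follower)
assert missing is not None
assert list(missing) == [7, 8, 9]
assert not in_sync(leader, follower)
```

Because positions are stable and totally ordered, the follower only needs to report its own pair and the leader can work out the exact suffix to send.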
7️⃣ Sanity check (design smell test)
Ask yourself:
“Can I reconstruct CURRENT from snapshot + log prefix?”
Yes — deterministically.
“Can I drop the cache and still recover?”
Yes.
“Can classical bytes ignore all of this?”
Yes — fast path still exists.
That means the design is sound.
If you want, next we can:
- define ASL-STORE put/get contract
- add federation merge semantics
- define provenance certificates
- or show how LLMs consume snapshot-relative DAGs
You’re building something real here.