You’re right to call this out 👍

What I gave you so far had the **semantic behavior**, but it stopped *just short* of the thing that actually makes it operationally usable: **explicit snapshot identity and log positions**.

Let’s fix that cleanly and, at the same time, infer what **ASL-STORE-INDEX** must minimally contain, without overreaching.

---

## 1️⃣ What ASL-STORE-INDEX minimally is (inferred)

From your spec, ASL-STORE-INDEX must define:

### A. Stable snapshot identity

So you can say:

> “This index state corresponds to snapshot X”

That implies:

* SnapshotID (opaque, comparable)
* Snapshot is immutable
* Snapshot can be referenced later

### B. Log position

So you can say:

> “Replay log entries [0…N)”

That implies:

* Total ordering
* Stable positions
* Prefix replay

### C. Coordination rule

So you can say:

> “Snapshot S + log prefix L defines CURRENT”

That’s it.
No storage layout, no IO, no performance policy.

---

## 2️⃣ What was missing before

Previously:

* Snapshot existed ❌ **but had no identity**
* Log existed ❌ **but had no positions**
* CURRENT existed ❌ **but couldn’t be referenced**

So you couldn’t:

* checkpoint
* replicate
* resume
* federate
* or prove determinism

Let’s add exactly what’s needed, nothing more.

---

## 3️⃣ Updated semantic model

We make **three things explicit**:

```text
SnapshotID  = opaque identifier
LogPosition = integer (monotonic)
IndexState  = (SnapshotID, LogPosition)
```

And the invariant becomes:

```text
Index(SnapshotID, LogPosition) =
    Snapshot[SnapshotID] + replay(log[0:LogPosition])
```
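
To make the invariant concrete, here is a minimal sketch of the replay step. It is not the implementation from this answer: the `replay` function, the tuple-shaped entries, and plain strings standing in for locations are simplifications assumed purely for illustration. It also assumes that positions are global and that a snapshot already contains every entry below its own position, so only entries in the half-open range `[base_position, upto)` are applied.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified stand-ins: a location is just a string here, and a
# log entry is (position, key, location-or-None). None marks a tombstone.
Entry = Tuple[int, str, Optional[str]]


def replay(base: Dict[str, str], base_position: int,
           log: List[Entry], upto: int) -> Dict[str, str]:
    """Return base + replay(log[base_position:upto]) as a new mapping.

    Assumes `log` is already ordered by position.
    """
    state = dict(base)  # never mutate the snapshot itself
    for position, key, location in log:
        if position < base_position or position >= upto:
            continue  # already folded into the snapshot, or beyond the prefix
        if location is None:
            state.pop(key, None)   # a tombstone shadows any earlier put
        else:
            state[key] = location  # a later put shadows an earlier one
    return state


log = [(0, "a", "blk1"), (1, "b", "blk2"), (2, "a", None), (3, "a", "blk3")]

print(replay({}, 0, log, 2))  # {'a': 'blk1', 'b': 'blk2'}
print(replay({}, 0, log, 3))  # {'b': 'blk2'}
print(replay({}, 0, log, 4))  # {'b': 'blk2', 'a': 'blk3'}
```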

---

## 4️⃣ Updated implementation (drop-in replacement)

### `content_index.py` (snapshot + log positions)

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Dict, Optional, List, Iterable
from enum import Enum, auto
import uuid


# ---------------------------------------------------------------------
# Core data types
# ---------------------------------------------------------------------

SnapshotID = str
LogPosition = int


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class LogOp(Enum):
    PUT = auto()
    TOMBSTONE = auto()


@dataclass(frozen=True)
class IndexLogEntry:
    position: LogPosition
    op: LogOp
    artifact_key: str
    location: Optional[ArtifactLocation] = None


# ---------------------------------------------------------------------
# Snapshot
# ---------------------------------------------------------------------

class ContentIndexSnapshot:
    """
    Immutable snapshot of the content index.
    """
    def __init__(
        self,
        snapshot_id: SnapshotID,
        mapping: Dict[str, ArtifactLocation],
        log_position: LogPosition,
    ):
        self.snapshot_id = snapshot_id
        self.log_position = log_position
        self._mapping = dict(mapping)

    def get(self, key: str) -> Optional[ArtifactLocation]:
        return self._mapping.get(key)

    def items(self):
        return self._mapping.items()
```

---

### Append-only log with positions

```python
class ContentIndexLog:
    """
    Append-only, totally ordered log.
    """
    def __init__(self, start_position: LogPosition = 0):
        self._entries: List[IndexLogEntry] = []
        self._next_position = start_position

    def append_put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.PUT,
            artifact_key=key,
            location=loc,
        ))
        return pos

    def append_tombstone(self, key: str) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.TOMBSTONE,
            artifact_key=key,
        ))
        return pos

    def entries(self) -> Iterable[IndexLogEntry]:
        return self._entries

    def upto(self, position: LogPosition) -> Iterable[IndexLogEntry]:
        return (e for e in self._entries if e.position < position)

    @property
    def tail_position(self) -> LogPosition:
        return self._next_position
```

---

### ContentIndex with explicit CURRENT

```python
class ContentIndex:
    """
    ASL-CORE-INDEX with snapshot IDs and log positions.
    """

    def __init__(
        self,
        snapshot: Optional[ContentIndexSnapshot] = None,
        log: Optional[ContentIndexLog] = None,
    ):
        self._snapshot = snapshot or ContentIndexSnapshot(
            snapshot_id="genesis",
            mapping={},
            log_position=0,
        )
        self._log = log or ContentIndexLog(
            start_position=self._snapshot.log_position
        )

    # -----------------------------------------------------------------
    # Lookup at CURRENT
    # -----------------------------------------------------------------

    def lookup(self, key: str) -> Optional[ArtifactLocation]:
        # Shadowing: reverse replay up to CURRENT
        for entry in reversed(list(self._log.entries())):
            if entry.artifact_key != key:
                continue
            if entry.op == LogOp.TOMBSTONE:
                return None
            return entry.location

        return self._snapshot.get(key)

    # -----------------------------------------------------------------
    # Append-only mutation
    # -----------------------------------------------------------------

    def put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        return self._log.append_put(key, loc)

    def tombstone(self, key: str) -> LogPosition:
        return self._log.append_tombstone(key)

    # -----------------------------------------------------------------
    # Snapshotting
    # -----------------------------------------------------------------

    def snapshot(self) -> ContentIndexSnapshot:
        """
        Create a new immutable snapshot representing CURRENT.
        """
        materialized: Dict[str, ArtifactLocation] = dict(self._snapshot.items())

        for entry in self._log.entries():
            if entry.op == LogOp.PUT:
                materialized[entry.artifact_key] = entry.location
            elif entry.op == LogOp.TOMBSTONE:
                materialized.pop(entry.artifact_key, None)

        snapshot_id = str(uuid.uuid4())
        snapshot = ContentIndexSnapshot(
            snapshot_id=snapshot_id,
            mapping=materialized,
            log_position=self._log.tail_position,
        )
        return snapshot

    def install_snapshot(self, snapshot: ContentIndexSnapshot):
        """
        Install a snapshot and reset log (external coordination assumed).
        """
        self._snapshot = snapshot
        self._log = ContentIndexLog(start_position=snapshot.log_position)

    # -----------------------------------------------------------------
    # Observability
    # -----------------------------------------------------------------

    @property
    def current_snapshot_id(self) -> SnapshotID:
        return self._snapshot.snapshot_id

    @property
    def current_log_position(self) -> LogPosition:
        return self._log.tail_position
```

---

## 5️⃣ Why this is enough (and not too much)

You now have:

* ✅ Snapshot identity
* ✅ Log positions
* ✅ Deterministic replay
* ✅ Explicit CURRENT
* ✅ Federation-ready state vectors
* ✅ Crash recovery anchor
* ✅ Replication boundary

And still:

* ❌ No IO assumptions
* ❌ No block allocation rules
* ❌ No GC
* ❌ No concurrency model

That’s *exactly* what ASL-STORE-INDEX should be.

---

## 6️⃣ How this unlocks higher layers

With `(SnapshotID, LogPosition)` you can now:

* Compare index states across nodes
* Prove two systems are in sync
* Incrementally replicate logs
* Build federated overlays
* Anchor provenance certificates
* Bind DAG derivations to storage state
* Feed stable views into ML / analytics

This is the **bridge** between your infrastructure and everything above it.
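
As one illustration of comparing index states across nodes, the sketch below treats `(SnapshotID, LogPosition)` as a small value type and works out which log positions a lagging node still has to fetch. The names `IndexState`, `missing_range`, and `in_sync` are hypothetical, and treating states anchored to different snapshots as simply "not comparable" is an assumption; a real federation layer would need its own policy for that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IndexState:
    """Hypothetical value type for (SnapshotID, LogPosition)."""
    snapshot_id: str
    log_position: int


def missing_range(local: IndexState, remote: IndexState) -> Optional[Tuple[int, int]]:
    """Half-open range of log positions `local` still needs from `remote`.

    Returns None when the two states are anchored to different snapshots:
    under the assumption made here, positions alone are then not comparable
    and the snapshot would have to be transferred first.
    """
    if local.snapshot_id != remote.snapshot_id:
        return None
    if local.log_position >= remote.log_position:
        # Empty range: local is level with (or ahead of) remote.
        return (local.log_position, local.log_position)
    return (local.log_position, remote.log_position)


def in_sync(a: IndexState, b: IndexState) -> bool:
    """Two nodes are in sync when snapshot ID and log position both match."""
    return a == b


leader = IndexState("snap-1", 42)
follower = IndexState("snap-1", 40)

print(missing_range(follower, leader))  # (40, 42)
print(in_sync(follower, leader))        # False
```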

---

## 7️⃣ Sanity check (design smell test)

Ask yourself:

> “Can I reconstruct CURRENT from snapshot + log prefix?”

Yes, deterministically.

> “Can I drop the cache and still recover?”

Yes.

> “Can classical bytes ignore all of this?”

Yes, the fast path still exists.

That means the design is sound.
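
The first question can be turned into a quick property check. The sketch below is a toy built on stated assumptions, not part of the implementation above: entries are `(key, location)` pairs whose list index plays the role of the log position, `None` stands for a tombstone, and `apply_entries` and `check_cut_points` are hypothetical helper names. It checks that taking a snapshot at any cut point and replaying the rest gives the same mapping as replaying everything from an empty index.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model: list index = log position, None = tombstone.
Entry = Tuple[str, Optional[str]]


def apply_entries(base: Dict[str, str], entries: List[Entry]) -> Dict[str, str]:
    """Fold entries, in order, onto a copy of `base`."""
    state = dict(base)
    for key, location in entries:
        if location is None:
            state.pop(key, None)
        else:
            state[key] = location
    return state


def check_cut_points(log: List[Entry]) -> bool:
    """True if, for every cut, snapshot(log[:cut]) + replay(log[cut:])
    equals a full replay from an empty index."""
    full = apply_entries({}, log)
    return all(
        apply_entries(apply_entries({}, log[:cut]), log[cut:]) == full
        for cut in range(len(log) + 1)
    )


log = [("a", "blk1"), ("b", "blk2"), ("a", None), ("c", "blk3"), ("a", "blk4")]
print(check_cut_points(log))  # True
```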

---

If you want, next we can:

* define **ASL-STORE put/get contract**
* add **federation merge semantics**
* define **provenance certificates**
* or show **how LLMs consume snapshot-relative DAGs**

You’re building something real here.