You're right to call this out 👍
What I gave you so far had the **semantic behavior**, but it stopped *just short* of the thing that actually makes it operationally usable: **explicit snapshot identity and log positions**.
Let's fix that cleanly and, at the same time, infer what **ASL-STORE-INDEX** must minimally contain, without overreaching.
---
## 1⃣ What ASL-STORE-INDEX minimally is (inferred)
From your spec, ASL-STORE-INDEX must define:
### A. Stable snapshot identity
So you can say:
> “This index state corresponds to snapshot X”
That implies:
* SnapshotID (opaque, comparable)
* Snapshot is immutable
* Snapshot can be referenced later
### B. Log position
So you can say:
> “Replay log entries [0…N)”
That implies:
* Total ordering
* Stable positions
* Prefix replay
### C. Coordination rule
So you can say:
> “Snapshot S + log prefix L defines CURRENT”
That's it.
No storage layout, no IO, no performance policy.
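To make the three obligations concrete, here is a minimal sketch of them as a Python `typing.Protocol`, plus a toy in-memory implementation. The names `StoreIndex`, `snapshot_position`, `tail_position`, and `state_at` are hypothetical illustrations, not taken from the ASL-STORE-INDEX spec; values are plain strings rather than artifact locations, and `None` stands for a tombstone.

```python
from typing import Dict, List, Optional, Protocol, Tuple, runtime_checkable

SnapshotID = str
LogPosition = int
# Hypothetical entry shape: (position, key, value); value None means TOMBSTONE.
Entry = Tuple[LogPosition, str, Optional[str]]


@runtime_checkable
class StoreIndex(Protocol):
    # A. Stable snapshot identity: an immutable snapshot, referenced by ID.
    def snapshot_position(self, sid: SnapshotID) -> LogPosition: ...

    # B. Log position: the next position to be assigned (total order).
    def tail_position(self) -> LogPosition: ...

    # C. Coordination rule: snapshot + log prefix defines a state.
    def state_at(self, sid: SnapshotID, pos: LogPosition) -> Dict[str, str]: ...


class ToyStoreIndex:
    """In-memory toy satisfying StoreIndex; illustration only."""

    def __init__(self) -> None:
        # Each snapshot records the first log position it does NOT cover.
        self.snapshots: Dict[SnapshotID, Tuple[LogPosition, Dict[str, str]]] = {
            "genesis": (0, {})
        }
        # Never truncated and starts at 0, so list index == log position.
        self.log: List[Entry] = []

    def append(self, key: str, value: Optional[str]) -> LogPosition:
        pos = len(self.log)
        self.log.append((pos, key, value))
        return pos

    def snapshot_position(self, sid: SnapshotID) -> LogPosition:
        return self.snapshots[sid][0]

    def tail_position(self) -> LogPosition:
        return len(self.log)

    def state_at(self, sid: SnapshotID, pos: LogPosition) -> Dict[str, str]:
        start, mapping = self.snapshots[sid]
        if pos < start:
            raise ValueError("log prefix ends before the snapshot")
        state = dict(mapping)
        for _pos, key, value in self.log[start:pos]:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state
```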
---
## 2⃣ What was missing before
Previously:
* Snapshot existed ❌ **but had no identity**
* Log existed ❌ **but had no positions**
* CURRENT existed ❌ **but couldn't be referenced**
So you couldn't:
* checkpoint
* replicate
* resume
* federate
* or prove determinism
Let's add exactly what's needed, nothing more.
---
## 3⃣ Updated semantic model
We make **three things explicit**:
```text
SnapshotID = opaque identifier
LogPosition = integer (monotonic)
IndexState = (SnapshotID, LogPosition)
```
And the invariant becomes:
```
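Index(SnapshotID, LogPosition) =
    Snapshot[SnapshotID] + replay(log[Snapshot[SnapshotID].log_position : LogPosition])
```
Replay starts where the snapshot stops, because the snapshot has already folded in every entry before its own log position. For the genesis snapshot (log position 0) this is exactly the prefix `log[0:LogPosition]`.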
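A minimal sketch of the invariant as a pure function, assuming replay starts at the snapshot's own log position (which is how `ContentIndex` seeds its log from `snapshot.log_position`). `replay` and `index_at` are hypothetical helpers, and locations are plain strings for brevity. The usage lines show the property that matters: folding a prefix into a snapshot does not change the resulting state.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entry shape: (position, key, location); location None means TOMBSTONE.
Entry = Tuple[int, str, Optional[str]]


def replay(base: Dict[str, str], entries: List[Entry]) -> Dict[str, str]:
    """Fold log entries, in position order, over a copy of base."""
    state = dict(base)
    for _pos, key, loc in entries:
        if loc is None:
            state.pop(key, None)  # TOMBSTONE shadows earlier PUTs
        else:
            state[key] = loc      # later PUT shadows earlier PUT
    return state


def index_at(snapshot: Dict[str, str], snapshot_pos: int,
             log: List[Entry], pos: int) -> Dict[str, str]:
    """Index(S, P) = Snapshot[S] + replay(log[S.log_position : P])."""
    if pos < snapshot_pos:
        raise ValueError("requested position precedes the snapshot")
    return replay(snapshot, [e for e in log if snapshot_pos <= e[0] < pos])


# Usage: snapshot after position 2, then replay only the suffix.
log = [(0, "a", "blk1"), (1, "b", "blk2"), (2, "a", None), (3, "c", "blk3")]
snap = replay({}, log[:2])  # snapshot covering positions [0, 2)
assert index_at(snap, 2, log, 4) == index_at({}, 0, log, 4)
```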
---
## 4⃣ Updated implementation (drop-in replacement)
### `content_index.py` (snapshot + log positions)
```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, ItemsView, Iterable, List, Optional

# ---------------------------------------------------------------------
# Core data types
# ---------------------------------------------------------------------

SnapshotID = str    # opaque, comparable identifier
LogPosition = int   # monotonic position in the index log


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class LogOp(Enum):
    PUT = auto()
    TOMBSTONE = auto()


@dataclass(frozen=True)
class IndexLogEntry:
    position: LogPosition
    op: LogOp
    artifact_key: str
    location: Optional[ArtifactLocation] = None  # set only for PUT


# ---------------------------------------------------------------------
# Snapshot
# ---------------------------------------------------------------------

class ContentIndexSnapshot:
    """
    Immutable snapshot of the content index.

    log_position is the first log position NOT folded into this snapshot:
    entries [0, log_position) are already materialized in the mapping.
    """

    def __init__(
        self,
        snapshot_id: SnapshotID,
        mapping: Dict[str, ArtifactLocation],
        log_position: LogPosition,
    ):
        self.snapshot_id = snapshot_id
        self.log_position = log_position
        # Defensive copy: later changes to the caller's dict do not leak in.
        self._mapping = dict(mapping)

    def get(self, key: str) -> Optional[ArtifactLocation]:
        return self._mapping.get(key)

    def items(self) -> ItemsView[str, ArtifactLocation]:
        return self._mapping.items()
```
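`ContentIndexSnapshot` is immutable by convention: the defensive copy protects it from the caller's dict, but `_mapping` itself is still a mutable `dict`. If you want the runtime to enforce read-only access, one option (an alternative, not what the code above does) is to wrap the copy in `types.MappingProxyType`. `FrozenSnapshot` is a hypothetical name, and values are strings here instead of `ArtifactLocation`.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class FrozenSnapshot:
    """Hypothetical snapshot variant whose mapping is read-only at runtime."""
    snapshot_id: str
    log_position: int
    mapping: Mapping[str, str]

    def __post_init__(self) -> None:
        # Copy first, then wrap in a read-only view. frozen=True blocks normal
        # attribute assignment, so object.__setattr__ is used once here.
        object.__setattr__(self, "mapping", MappingProxyType(dict(self.mapping)))

    def get(self, key: str) -> Optional[str]:
        return self.mapping.get(key)
```

Writes through `snap.mapping[...]` raise `TypeError`, and reassigning `snap.mapping` raises `dataclasses.FrozenInstanceError`.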
---
### Append-only log with positions
```python
class ContentIndexLog:
    """
    Append-only, totally ordered log.

    Positions are dense and monotonic, starting at start_position.
    """

    def __init__(self, start_position: LogPosition = 0):
        self._entries: List[IndexLogEntry] = []
        self._next_position = start_position

    def append_put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.PUT,
            artifact_key=key,
            location=loc,
        ))
        return pos

    def append_tombstone(self, key: str) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.TOMBSTONE,
            artifact_key=key,
        ))
        return pos

    def entries(self) -> Iterable[IndexLogEntry]:
        return self._entries

    def upto(self, position: LogPosition) -> Iterable[IndexLogEntry]:
        # Prefix replay: entries strictly before `position`.
        return (e for e in self._entries if e.position < position)

    @property
    def tail_position(self) -> LogPosition:
        # Next position to be assigned (one past the last entry).
        return self._next_position
```
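Because `ContentIndexLog` assigns positions densely (each append takes `_next_position` and increments it), a prefix is just a position bound. A sketch of two hypothetical helpers built on that property: `check_dense` validates a log received from elsewhere (disk, a peer) before it is replayed, and `prefix` returns what `upto` yields, using a slice instead of a full scan. Both work on plain sequences and are assumptions, not part of the code above.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def check_dense(positions: Sequence[int], start: int) -> None:
    """Raise ValueError unless positions are start, start+1, start+2, ..."""
    for offset, pos in enumerate(positions):
        if pos != start + offset:
            raise ValueError(
                f"position {pos} at offset {offset}; expected {start + offset}"
            )


def prefix(entries: Sequence[T], start: int, position: int) -> Sequence[T]:
    """Entries with position < `position`, assuming dense positions from `start`.

    Same result as ContentIndexLog.upto(), but a slice instead of a scan.
    """
    return entries[: max(0, position - start)]
```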
---
### ContentIndex with explicit CURRENT
```python
class ContentIndex:
    """
    ASL-CORE-INDEX with snapshot IDs and log positions.
    """

    def __init__(
        self,
        snapshot: Optional[ContentIndexSnapshot] = None,
        log: Optional[ContentIndexLog] = None,
    ):
        self._snapshot = snapshot or ContentIndexSnapshot(
            snapshot_id="genesis",
            mapping={},
            log_position=0,
        )
        # The log continues exactly where the snapshot stops.
        self._log = log or ContentIndexLog(
            start_position=self._snapshot.log_position
        )

    # -----------------------------------------------------------------
    # Lookup at CURRENT
    # -----------------------------------------------------------------

    def lookup(self, key: str) -> Optional[ArtifactLocation]:
        # Shadowing: scan the log newest-first; the latest entry for the
        # key wins. If the log never mentions it, fall back to the snapshot.
        for entry in reversed(list(self._log.entries())):
            if entry.artifact_key != key:
                continue
            if entry.op == LogOp.TOMBSTONE:
                return None
            return entry.location
        return self._snapshot.get(key)

    # -----------------------------------------------------------------
    # Append-only mutation
    # -----------------------------------------------------------------

    def put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        return self._log.append_put(key, loc)

    def tombstone(self, key: str) -> LogPosition:
        return self._log.append_tombstone(key)

    # -----------------------------------------------------------------
    # Snapshotting
    # -----------------------------------------------------------------

    def snapshot(self) -> ContentIndexSnapshot:
        """
        Create a new immutable snapshot representing CURRENT.

        The snapshot is returned, not installed; call install_snapshot()
        to make it the new base.
        """
        materialized: Dict[str, ArtifactLocation] = dict(self._snapshot.items())
        # Forward replay in position order: later entries overwrite earlier ones.
        for entry in self._log.entries():
            if entry.op == LogOp.PUT:
                materialized[entry.artifact_key] = entry.location
            elif entry.op == LogOp.TOMBSTONE:
                materialized.pop(entry.artifact_key, None)
        snapshot_id = str(uuid.uuid4())
        snapshot = ContentIndexSnapshot(
            snapshot_id=snapshot_id,
            mapping=materialized,
            log_position=self._log.tail_position,
        )
        return snapshot

    def install_snapshot(self, snapshot: ContentIndexSnapshot) -> None:
        """
        Install a snapshot and reset the log to start at its log_position.

        Any current log entries are dropped, so the caller must ensure none
        were appended after the snapshot was taken (external coordination
        assumed).
        """
        self._snapshot = snapshot
        self._log = ContentIndexLog(start_position=snapshot.log_position)

    # -----------------------------------------------------------------
    # Observability
    # -----------------------------------------------------------------

    @property
    def current_snapshot_id(self) -> SnapshotID:
        return self._snapshot.snapshot_id

    @property
    def current_log_position(self) -> LogPosition:
        return self._log.tail_position
```
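`snapshot()` mints IDs with `uuid.uuid4()`, so two nodes that materialize the same state still get different SnapshotIDs. If IDs need to be comparable across nodes, one option (an assumption, not something the spec requires) is to derive the ID from a canonical encoding of the state. `content_snapshot_id` is a hypothetical helper; locations are `(block_id, offset, length)` tuples mirroring `ArtifactLocation`.

```python
import hashlib
import json
from typing import Dict, Tuple

# Hypothetical location shape mirroring ArtifactLocation: (block_id, offset, length).
Location = Tuple[str, int, int]


def content_snapshot_id(mapping: Dict[str, Location], log_position: int) -> str:
    """Derive a snapshot ID from the materialized state (sketch, not spec)."""
    # Canonical encoding: sorted keys (recursively) and no whitespace, so
    # equal states produce identical bytes regardless of insertion order.
    payload = json.dumps(
        {
            "log_position": log_position,
            "mapping": {k: list(v) for k, v in mapping.items()},
        },
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Hashing `log_position` together with the mapping keeps the ID aligned with the `(SnapshotID, LogPosition)` pair; the trade-off is hashing the whole mapping on every snapshot.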
---
## 5⃣ Why this is enough (and not too much)
You now have:
* ✅ Snapshot identity
* ✅ Log positions
* ✅ Deterministic replay
* ✅ Explicit CURRENT
* ✅ Federation-ready state vectors
* ✅ Crash recovery anchor
* ✅ Replication boundary
And still:
* ❌ No IO assumptions
* ❌ No block allocation rules
* ❌ No GC
* ❌ No concurrency model
That's *exactly* what ASL-STORE-INDEX should be.
---
## 6⃣ How this unlocks higher layers
With `(SnapshotID, LogPosition)` you can now:
* Compare index states across nodes
* Check whether two systems are in sync (same snapshot, same log position, same log history)
* Incrementally replicate logs
* Build federated overlays
* Anchor provenance certificates
* Bind DAG derivations to storage state
* Feed stable views into ML / analytics
This is the **bridge** between your infrastructure and everything above it.
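As one example of what the pair enables, here is a minimal sketch of incremental log replication: the follower remembers the next position it expects, and the leader ships only the entries from that position on. `Replica` and `entries_since` are hypothetical names; entries are plain `(position, key, location)` tuples with `None` for a tombstone, and transport, retries, and snapshot transfer are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical entry shape: (position, key, location); location None means TOMBSTONE.
Entry = Tuple[int, str, Optional[str]]


@dataclass
class Replica:
    """Hypothetical follower: materialized state plus the next expected position."""
    state: Dict[str, str] = field(default_factory=dict)
    next_position: int = 0

    def apply(self, entries: List[Entry]) -> None:
        for pos, key, loc in entries:
            # Refuse gaps and reordering: replication must preserve total order.
            if pos != self.next_position:
                raise ValueError(f"gap: expected {self.next_position}, got {pos}")
            if loc is None:
                self.state.pop(key, None)
            else:
                self.state[key] = loc
            self.next_position = pos + 1


def entries_since(log: List[Entry], position: int) -> List[Entry]:
    """Leader side: ship only the suffix the follower has not seen yet."""
    return [e for e in log if e[0] >= position]
```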
---
## 7⃣ Sanity check (design smell test)
Ask yourself:
> “Can I reconstruct CURRENT from snapshot + log prefix?”
Yes — deterministically.
> “Can I drop the cache and still recover?”
Yes.
> “Can classical bytes ignore all of this?”
Yes, the fast path still exists.
That means the design is sound.
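The first question can be turned into an executable check. A sketch, assuming a seeded random history of PUTs and TOMBSTONEs over a few keys: for every possible cut point, a snapshot of the prefix plus a replay of the suffix must equal a full replay from genesis. `replay` and `smell_test` are hypothetical helpers, not part of `content_index.py`.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical entry shape: (position, key, location); location None means TOMBSTONE.
Entry = Tuple[int, str, Optional[str]]


def replay(base: Dict[str, str], entries: List[Entry]) -> Dict[str, str]:
    """Fold entries in position order over a copy of base."""
    state = dict(base)
    for _pos, key, loc in entries:
        if loc is None:
            state.pop(key, None)
        else:
            state[key] = loc
    return state


def smell_test(seed: int, n_ops: int = 200) -> bool:
    """Random history; every snapshot cut must agree with a full replay."""
    rng = random.Random(seed)  # seeded, so each run is reproducible
    log: List[Entry] = []
    for pos in range(n_ops):
        key = rng.choice("abcde")
        loc = None if rng.random() < 0.3 else f"blk{pos}"
        log.append((pos, key, loc))
    full = replay({}, log)  # genesis + replay(log[0:N])
    for cut in range(n_ops + 1):
        snapshot = replay({}, log[:cut])         # snapshot covering [0, cut)
        if replay(snapshot, log[cut:]) != full:  # snapshot + suffix replay
            return False
    return True
```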
---
If you want, next we can:
* define **ASL-STORE put/get contract**
* add **federation merge semantics**
* define **provenance certificates**
* or show **how LLMs consume snapshot-relative DAGs**
You're building something real here.