amduat-api/notes/add snapshot IDs + log positions.md
2026-01-17 00:19:49 +01:00


You're right to call this out 👍

What I gave you so far had the **semantic behavior**, but it stopped *just short* of the thing that actually makes it operationally usable: **explicit snapshot identity and log positions**.

Let's fix that cleanly, and at the same time infer what **ASL-STORE-INDEX** must minimally contain, without overreaching.

---
## 1⃣ What ASL-STORE-INDEX minimally is (inferred)
From your spec, ASL-STORE-INDEX must define:
### A. Stable snapshot identity
So you can say:
> “This index state corresponds to snapshot X”

That implies:
* SnapshotID (opaque, comparable)
* Snapshot is immutable
* Snapshot can be referenced later
### B. Log position
So you can say:
> “Replay log entries [0…N)”

That implies:
* Total ordering
* Stable positions
* Prefix replay
### C. Coordination rule
So you can say:
> “Snapshot S + log prefix L defines CURRENT”

That's it.
No storage layout, no IO, no performance policy.

---
## 2⃣ What was missing before
Previously:
* Snapshot existed ❌ **but had no identity**
* Log existed ❌ **but had no positions**
* CURRENT existed ❌ **but couldn't be referenced**

So you couldn't:
* checkpoint
* replicate
* resume
* federate
* or prove determinism

Let's add exactly what's needed, and nothing more.

---
## 3⃣ Updated semantic model
We make **three things explicit**:
```text
SnapshotID = opaque identifier
LogPosition = integer (monotonic)
IndexState = (SnapshotID, LogPosition)
```
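And the invariant becomes (positions are absolute, so replay starts at the position the snapshot already covers; for the genesis snapshot that position is 0):
```text
Index(SnapshotID, LogPosition) =
    Snapshot[SnapshotID] + replay(log[Snapshot.log_position : LogPosition])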
```
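To make the invariant concrete, here is a minimal sketch of replay as a pure function. It is not the implementation below: the `Entry` tuple, the `replay` helper, and the string locations are simplified stand-ins I'm assuming for illustration. It shows that replaying the full log from genesis and replaying only the suffix on top of a snapshot taken at position 2 arrive at the same state.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified entry: (op, key, location).
# location is None for tombstones. These names are illustrative only.
Entry = Tuple[str, str, Optional[str]]


def replay(base: Dict[str, str], log: List[Entry], start: int, stop: int) -> Dict[str, str]:
    """Apply log[start:stop], in order, on top of a copy of `base`."""
    state = dict(base)
    for op, key, loc in log[start:stop]:
        if op == "PUT" and loc is not None:
            state[key] = loc
        elif op == "TOMBSTONE":
            state.pop(key, None)
    return state


log: List[Entry] = [
    ("PUT", "a", "block-1"),
    ("PUT", "b", "block-2"),
    ("TOMBSTONE", "a", None),
    ("PUT", "a", "block-3"),
]

genesis: Dict[str, str] = {}

# State at two different log positions, both replayed from genesis.
state_at_2 = replay(genesis, log, 0, 2)
state_at_4 = replay(genesis, log, 0, 4)

# Treat state_at_2 as a snapshot covering positions [0, 2) and replay
# only the suffix [2, 4) on top of it.
via_snapshot = replay(state_at_2, log, 2, 4)
```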
---
## 4⃣ Updated implementation (drop-in replacement)
### `content_index.py` (snapshot + log positions)
```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, List, Iterable
from enum import Enum, auto
import uuid

# ---------------------------------------------------------------------
# Core data types
# ---------------------------------------------------------------------

SnapshotID = str
LogPosition = int


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class LogOp(Enum):
    PUT = auto()
    TOMBSTONE = auto()


@dataclass(frozen=True)
class IndexLogEntry:
    position: LogPosition
    op: LogOp
    artifact_key: str
    location: Optional[ArtifactLocation] = None


# ---------------------------------------------------------------------
# Snapshot
# ---------------------------------------------------------------------

class ContentIndexSnapshot:
    """
    Immutable snapshot of the content index.
    """

    def __init__(
        self,
        snapshot_id: SnapshotID,
        mapping: Dict[str, ArtifactLocation],
        log_position: LogPosition,
    ):
        self.snapshot_id = snapshot_id
        self.log_position = log_position
        self._mapping = dict(mapping)

    def get(self, key: str) -> Optional[ArtifactLocation]:
        return self._mapping.get(key)

    def items(self):
        return self._mapping.items()
```
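A side note on "immutable": the snapshot above is immutable by convention, because it copies the mapping and exposes no mutators. If you want the interpreter to enforce that, one option is a read-only view. The sketch below is an assumption on my part, not something the spec requires; `freeze` is a hypothetical helper and the string locations are placeholders.

```python
from types import MappingProxyType
from typing import Dict, Mapping


def freeze(mapping: Dict[str, str]) -> Mapping[str, str]:
    """Copy the mapping, then wrap the copy in a read-only view."""
    return MappingProxyType(dict(mapping))


source = {"a": "block-1"}
frozen = freeze(source)

# Later changes to the source dict do not leak into the frozen copy.
source["a"] = "block-2"

# Writes through the view are rejected with a TypeError.
try:
    frozen["a"] = "block-9"  # type: ignore[index]
    write_rejected = False
except TypeError:
    write_rejected = True
```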
---
### Append-only log with positions
```python
class ContentIndexLog:
    """
    Append-only, totally ordered log.
    """

    def __init__(self, start_position: LogPosition = 0):
        self._entries: List[IndexLogEntry] = []
        self._next_position = start_position

    def append_put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.PUT,
            artifact_key=key,
            location=loc,
        ))
        return pos

    def append_tombstone(self, key: str) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.TOMBSTONE,
            artifact_key=key,
        ))
        return pos

    def entries(self) -> Iterable[IndexLogEntry]:
        return self._entries

    def upto(self, position: LogPosition) -> Iterable[IndexLogEntry]:
        return (e for e in self._entries if e.position < position)

    @property
    def tail_position(self) -> LogPosition:
        return self._next_position
```
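One detail worth seeing in action: positions are absolute, so a log created after a snapshot keeps counting from the snapshot's position rather than restarting at 0. The sketch below uses `MiniLog` and `Entry`, which are cut-down stand-ins I'm assuming for illustration (they mirror the shape of `ContentIndexLog` but are not it), with a start position of 100.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical simplified log entry."""
    position: int
    op: str
    key: str
    location: Optional[str] = None


class MiniLog:
    """Cut-down stand-in for an append-only log with absolute positions."""

    def __init__(self, start_position: int = 0):
        self.entries: List[Entry] = []
        self.next_position = start_position

    def append(self, op: str, key: str, location: Optional[str] = None) -> int:
        pos = self.next_position
        self.next_position += 1
        self.entries.append(Entry(pos, op, key, location))
        return pos

    def upto(self, position: int) -> List[Entry]:
        # Half-open prefix: every entry strictly before `position`.
        return [e for e in self.entries if e.position < position]


# A log that continues after a snapshot taken at position 100.
log = MiniLog(start_position=100)
first = log.append("PUT", "a", "block-7")
second = log.append("TOMBSTONE", "a")

# The prefix up to (not including) position 101 holds only the first entry.
prefix_positions = [e.position for e in log.upto(101)]
```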
---
### ContentIndex with explicit CURRENT
```python
class ContentIndex:
    """
    ASL-CORE-INDEX with snapshot IDs and log positions.
    """

    def __init__(
        self,
        snapshot: Optional[ContentIndexSnapshot] = None,
        log: Optional[ContentIndexLog] = None,
    ):
        self._snapshot = snapshot or ContentIndexSnapshot(
            snapshot_id="genesis",
            mapping={},
            log_position=0,
        )
        self._log = log or ContentIndexLog(
            start_position=self._snapshot.log_position
        )

    # -----------------------------------------------------------------
    # Lookup at CURRENT
    # -----------------------------------------------------------------

    def lookup(self, key: str) -> Optional[ArtifactLocation]:
        # Shadowing: reverse replay up to CURRENT
        for entry in reversed(list(self._log.entries())):
            if entry.artifact_key != key:
                continue
            if entry.op == LogOp.TOMBSTONE:
                return None
            return entry.location
        return self._snapshot.get(key)

    # -----------------------------------------------------------------
    # Append-only mutation
    # -----------------------------------------------------------------

    def put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        return self._log.append_put(key, loc)

    def tombstone(self, key: str) -> LogPosition:
        return self._log.append_tombstone(key)

    # -----------------------------------------------------------------
    # Snapshotting
    # -----------------------------------------------------------------

    def snapshot(self) -> ContentIndexSnapshot:
        """
        Create a new immutable snapshot representing CURRENT.
        """
        materialized: Dict[str, ArtifactLocation] = dict(self._snapshot.items())
        for entry in self._log.entries():
            if entry.op == LogOp.PUT:
                materialized[entry.artifact_key] = entry.location
            elif entry.op == LogOp.TOMBSTONE:
                materialized.pop(entry.artifact_key, None)
        snapshot_id = str(uuid.uuid4())
        snapshot = ContentIndexSnapshot(
            snapshot_id=snapshot_id,
            mapping=materialized,
            log_position=self._log.tail_position,
        )
        return snapshot

    def install_snapshot(self, snapshot: ContentIndexSnapshot):
        """
        Install a snapshot and reset log (external coordination assumed).
        """
        self._snapshot = snapshot
        self._log = ContentIndexLog(start_position=snapshot.log_position)

    # -----------------------------------------------------------------
    # Observability
    # -----------------------------------------------------------------

    @property
    def current_snapshot_id(self) -> SnapshotID:
        return self._snapshot.snapshot_id

    @property
    def current_log_position(self) -> LogPosition:
        return self._log.tail_position
```
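The `lookup` path never materializes the index; it relies on shadowing, where the newest log entry for a key wins and the snapshot is only the fallback. As a quick check that this agrees with full materialization, here is a small sketch. The tuple-based entries and both helper functions are simplified stand-ins I'm assuming for illustration, not the classes above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified entry: (op, key, location); None for tombstones.
Entry = Tuple[str, str, Optional[str]]


def lookup_by_shadowing(snapshot: Dict[str, str], log: List[Entry], key: str) -> Optional[str]:
    """Scan the log newest-first; fall back to the snapshot if no entry matches."""
    for op, entry_key, loc in reversed(log):
        if entry_key == key:
            return None if op == "TOMBSTONE" else loc
    return snapshot.get(key)


def materialize(snapshot: Dict[str, str], log: List[Entry]) -> Dict[str, str]:
    """Apply every log entry, oldest-first, on top of a copy of the snapshot."""
    state = dict(snapshot)
    for op, key, loc in log:
        if op == "PUT" and loc is not None:
            state[key] = loc
        elif op == "TOMBSTONE":
            state.pop(key, None)
    return state


snapshot = {"a": "block-1", "b": "block-2"}
log: List[Entry] = [
    ("TOMBSTONE", "a", None),   # hides the snapshot's value for "a"
    ("PUT", "c", "block-3"),    # new key, absent from the snapshot
    ("PUT", "b", "block-4"),    # shadows the snapshot's value for "b"
]

full = materialize(snapshot, log)
keys = ["a", "b", "c", "missing"]

# Both strategies should give the same answer for every key.
agree = all(lookup_by_shadowing(snapshot, log, k) == full.get(k) for k in keys)
```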
---
## 5⃣ Why this is enough (and not too much)
You now have:
* ✅ Snapshot identity
* ✅ Log positions
* ✅ Deterministic replay
* ✅ Explicit CURRENT
* ✅ Federation-ready state vectors
* ✅ Crash recovery anchor
* ✅ Replication boundary

And still:
* ❌ No IO assumptions
* ❌ No block allocation rules
* ❌ No GC
* ❌ No concurrency model

That's *exactly* what ASL-STORE-INDEX should be.

---
## 6⃣ How this unlocks higher layers
With `(SnapshotID, LogPosition)` you can now:
* Compare index states across nodes
* Prove two systems are in sync
* Incrementally replicate logs
* Build federated overlays
* Anchor provenance certificates
* Bind DAG derivations to storage state
* Feed stable views into ML / analytics
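
To illustrate the first three bullets, here is a rough sketch of comparing two state vectors and working out which log positions a lagging node still needs. Everything named here (`IndexState`, `positions_to_ship`) is hypothetical, and it leans on two assumptions the spec would have to guarantee: equal snapshot IDs mean identical snapshot contents, and both nodes follow the same log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexState:
    """Hypothetical state vector: (SnapshotID, LogPosition)."""
    snapshot_id: str
    log_position: int


def positions_to_ship(leader: IndexState, follower: IndexState) -> Optional[range]:
    """
    Log positions the follower is missing, as a half-open range.
    Returns None when the nodes are anchored on different snapshots,
    in which case a snapshot transfer would have to come first.
    """
    if leader.snapshot_id != follower.snapshot_id:
        return None
    stop = max(follower.log_position, leader.log_position)
    return range(follower.log_position, stop)


leader = IndexState("snap-1", 5)
follower = IndexState("snap-1", 3)
stranger = IndexState("snap-2", 5)

# Two nodes are in sync exactly when their state vectors are equal.
in_sync = leader == IndexState("snap-1", 5)

# Same snapshot, follower behind: ship the missing suffix.
missing = list(positions_to_ship(leader, follower) or [])

# Different snapshot: incremental replication does not apply.
needs_snapshot = positions_to_ship(leader, stranger) is None
```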

This is the **bridge** between your infrastructure and everything above it.

---
## 7⃣ Sanity check (design smell test)
Ask yourself:

> “Can I reconstruct CURRENT from snapshot + log prefix?”

Yes, deterministically.

> “Can I drop the cache and still recover?”

Yes.

> “Can classical bytes ignore all of this?”

Yes, the fast path still exists.

That means the design is sound.

---
If you want, next we can:
* define **ASL-STORE put/get contract**
* add **federation merge semantics**
* define **provenance certificates**
* or show **how LLMs consume snapshot-relative DAGs**

You're building something real here.