amduat-api/notes/add snapshot IDs + log positions.md

You’re right to call this out 👍
What I gave you so far had the **semantic behavior**, but it stopped *just short* of the thing that actually makes it operationally usable: **explicit snapshot identity and log positions**.

Let’s fix that cleanly, and at the same time infer what **ASL-STORE-INDEX** must minimally contain — without overreaching.

---

## 1️⃣ What ASL-STORE-INDEX minimally is (inferred)

From your spec, ASL-STORE-INDEX must define:

### A. Stable snapshot identity

So you can say:

> “This index state corresponds to snapshot X”

That implies:

* SnapshotID (opaque, comparable)
* Snapshot is immutable
* Snapshot can be referenced later

### B. Log position

So you can say:

> “Replay log entries [0…N)”

That implies:

* Total ordering
* Stable positions
* Prefix replay

### C. Coordination rule

So you can say:

> “Snapshot S + log prefix L defines CURRENT”

That’s it.
No storage layout, no IO, no performance policy.

---

## 2️⃣ What was missing before

Previously:

* Snapshot existed ❌ **but had no identity**
* Log existed ❌ **but had no positions**
* CURRENT existed ❌ **but couldn’t be referenced**

So you couldn’t:

* checkpoint
* replicate
* resume
* federate
* or prove determinism

Let’s add exactly what’s needed — nothing more.

---

## 3️⃣ Updated semantic model

We make **three things explicit**:

```text
SnapshotID = opaque identifier
LogPosition = integer (monotonic)
IndexState = (SnapshotID, LogPosition)
```

And the invariant becomes:

```text
Index(SnapshotID, LogPosition) =
    Snapshot[SnapshotID] + replay(log[P:LogPosition])
    where P = the log position recorded in Snapshot[SnapshotID]
```
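
As a rough illustration of the invariant, here is a minimal sketch that treats replay as a pure function. The `Entry` tuple and the `materialize` helper are hypothetical stand-ins, not part of the spec. The sketch assumes the snapshot remembers the log position it was taken at (as `ContentIndexSnapshot.log_position` does in the implementation further down), so replay starts there:

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical, simplified log entry: (position, key, value).
# value=None marks a tombstone.
Entry = Tuple[int, str, Optional[str]]


def materialize(
    snapshot: Dict[str, str],
    snapshot_position: int,
    log: Iterable[Entry],
    log_position: int,
) -> Dict[str, str]:
    """Return snapshot + replay of log entries in [snapshot_position, log_position)."""
    state = dict(snapshot)  # never mutate the snapshot itself
    for position, key, value in log:
        if not (snapshot_position <= position < log_position):
            continue  # outside the replayed range
        if value is None:
            state.pop(key, None)  # tombstone removes the key
        else:
            state[key] = value  # put shadows any earlier value
    return state


log = [(0, "a", "blk1"), (1, "b", "blk2"), (2, "a", None), (3, "c", "blk3")]

state_at_2 = materialize({}, 0, log, 2)
state_at_4 = materialize({}, 0, log, 4)
# Snapshotting at position 2 and replaying the rest gives the same result.
resumed = materialize(state_at_2, 2, log, 4)
```

Here `resumed` equals `state_at_4`, which is the property that makes `(SnapshotID, LogPosition)` a usable name for an index state.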

---

## 4️⃣ Updated implementation (drop-in replacement)

### `content_index.py` (snapshot + log positions)

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Dict, Optional, List, Iterable
from enum import Enum, auto
import uuid


# ---------------------------------------------------------------------
# Core data types
# ---------------------------------------------------------------------

SnapshotID = str
LogPosition = int


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class LogOp(Enum):
    PUT = auto()
    TOMBSTONE = auto()


@dataclass(frozen=True)
class IndexLogEntry:
    position: LogPosition
    op: LogOp
    artifact_key: str
    location: Optional[ArtifactLocation] = None


# ---------------------------------------------------------------------
# Snapshot
# ---------------------------------------------------------------------

class ContentIndexSnapshot:
    """
    Immutable snapshot of the content index.
    """
    def __init__(
        self,
        snapshot_id: SnapshotID,
        mapping: Dict[str, ArtifactLocation],
        log_position: LogPosition,
    ):
        self.snapshot_id = snapshot_id
        self.log_position = log_position
        self._mapping = dict(mapping)

    def get(self, key: str) -> Optional[ArtifactLocation]:
        return self._mapping.get(key)

    def items(self):
        return self._mapping.items()
```

---

### Append-only log with positions

```python
class ContentIndexLog:
    """
    Append-only, totally ordered log.
    """
    def __init__(self, start_position: LogPosition = 0):
        self._entries: List[IndexLogEntry] = []
        self._next_position = start_position

    def append_put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.PUT,
            artifact_key=key,
            location=loc,
        ))
        return pos

    def append_tombstone(self, key: str) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.TOMBSTONE,
            artifact_key=key,
        ))
        return pos

    def entries(self) -> Iterable[IndexLogEntry]:
        return self._entries

    def upto(self, position: LogPosition) -> Iterable[IndexLogEntry]:
        return (e for e in self._entries if e.position < position)

    @property
    def tail_position(self) -> LogPosition:
        return self._next_position
```
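
Because `append_put` and `append_tombstone` hand out consecutive positions, the entry at position `p` always sits at list index `p - start_position`. A hedged sketch of how a range lookup could use that arithmetic instead of the linear filter in `upto` (the `log_slice` helper is hypothetical, and entries are plain strings to keep it short):

```python
from typing import List, Sequence


def log_slice(
    entries: Sequence[str],
    start_position: int,
    lo: int,
    hi: int,
) -> List[str]:
    """Return the entries at positions [lo, hi) of a log whose first entry
    sits at `start_position`."""
    tail = start_position + len(entries)
    if not (start_position <= lo <= hi <= tail):
        raise ValueError(f"range [{lo}, {hi}) outside log [{start_position}, {tail})")
    # Contiguous positions: position p is stored at index p - start_position.
    return list(entries[lo - start_position : hi - start_position])


# A log that was restarted after a snapshot taken at position 100.
entries = ["put a", "put b", "tombstone a"]
prefix = log_slice(entries, 100, 100, 102)  # entries at positions 100 and 101
suffix = log_slice(entries, 100, 102, 103)  # entry at position 102
```

Rejecting out-of-range requests explicitly is a design choice of this sketch: asking for a position the log no longer holds (for example, one absorbed into a snapshot) is reported rather than silently answered with an empty result.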

---

### ContentIndex with explicit CURRENT

```python
class ContentIndex:
    """
    ASL-CORE-INDEX with snapshot IDs and log positions.
    """

    def __init__(
        self,
        snapshot: Optional[ContentIndexSnapshot] = None,
        log: Optional[ContentIndexLog] = None,
    ):
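        # Compare against None explicitly: `x or default` would silently
        # replace a falsy-but-valid argument (e.g. if __len__ is added later).
        self._snapshot = snapshot if snapshot is not None else ContentIndexSnapshot(
            snapshot_id="genesis",
            mapping={},
            log_position=0,
        )
        self._log = log if log is not None else ContentIndexLog(
            start_position=self._snapshot.log_position
        )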

    # -----------------------------------------------------------------
    # Lookup at CURRENT
    # -----------------------------------------------------------------

    def lookup(self, key: str) -> Optional[ArtifactLocation]:
        # Shadowing: reverse replay up to CURRENT
        for entry in reversed(list(self._log.entries())):
            if entry.artifact_key != key:
                continue
            if entry.op == LogOp.TOMBSTONE:
                return None
            return entry.location

        return self._snapshot.get(key)

    # -----------------------------------------------------------------
    # Append-only mutation
    # -----------------------------------------------------------------

    def put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        return self._log.append_put(key, loc)

    def tombstone(self, key: str) -> LogPosition:
        return self._log.append_tombstone(key)

    # -----------------------------------------------------------------
    # Snapshotting
    # -----------------------------------------------------------------

    def snapshot(self) -> ContentIndexSnapshot:
        """
        Create a new immutable snapshot representing CURRENT.
        """
        materialized: Dict[str, ArtifactLocation] = dict(self._snapshot.items())

        for entry in self._log.entries():
            if entry.op == LogOp.PUT:
                materialized[entry.artifact_key] = entry.location
            elif entry.op == LogOp.TOMBSTONE:
                materialized.pop(entry.artifact_key, None)

        snapshot_id = str(uuid.uuid4())
        snapshot = ContentIndexSnapshot(
            snapshot_id=snapshot_id,
            mapping=materialized,
            log_position=self._log.tail_position,
        )
        return snapshot

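    def install_snapshot(self, snapshot: ContentIndexSnapshot) -> None:
        """
        Install a snapshot and reset the log (external coordination assumed).

        Note: the log is replaced by an empty one, so any entry appended
        after `snapshot` was taken is discarded. The caller must ensure
        no such entries exist.
        """
        self._snapshot = snapshot
        self._log = ContentIndexLog(start_position=snapshot.log_position)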

    # -----------------------------------------------------------------
    # Observability
    # -----------------------------------------------------------------

    @property
    def current_snapshot_id(self) -> SnapshotID:
        return self._snapshot.snapshot_id

    @property
    def current_log_position(self) -> LogPosition:
        return self._log.tail_position
```
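
`install_snapshot` as written starts a fresh, empty log, so any entry appended between taking the snapshot and installing it is dropped. If that matters in your setup, one possible variant keeps the unabsorbed suffix. This is only a sketch with a hypothetical `Entry` type and `compact` helper, not something the spec requires:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    position: int
    key: str
    value: Optional[str]  # None marks a tombstone


def compact(
    base: Dict[str, str],
    entries: List[Entry],
    snapshot_position: int,
) -> Tuple[Dict[str, str], List[Entry]]:
    """Fold entries below `snapshot_position` into a new mapping and return
    it together with the entries that must stay in the log."""
    folded = dict(base)
    retained: List[Entry] = []
    for e in entries:
        if e.position >= snapshot_position:
            retained.append(e)       # not covered by the snapshot: keep
        elif e.value is None:
            folded.pop(e.key, None)  # tombstone absorbed into the snapshot
        else:
            folded[e.key] = e.value  # put absorbed into the snapshot
    return folded, retained


entries = [
    Entry(0, "a", "blk1"),
    Entry(1, "b", "blk2"),
    Entry(2, "a", None),  # appended after the snapshot point used below
]
mapping, pending = compact({}, entries, snapshot_position=2)
```

After the call, `mapping` holds the state at position 2 and `pending` still carries the tombstone at position 2, so a lookup over snapshot plus log sees the same CURRENT as before.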

---

## 5️⃣ Why this is enough (and not too much)

You now have:

* ✅ Snapshot identity
* ✅ Log positions
* ✅ Deterministic replay
* ✅ Explicit CURRENT
* ✅ Federation-ready state vectors
* ✅ Crash recovery anchor
* ✅ Replication boundary

And still:

* ❌ No IO assumptions
* ❌ No block allocation rules
* ❌ No GC
* ❌ No concurrency model

That’s *exactly* what ASL-STORE-INDEX should be.

---

## 6️⃣ How this unlocks higher layers

With `(SnapshotID, LogPosition)` you can now:

* Compare index states across nodes
* Check whether two systems are in sync (given a shared log and shared snapshot IDs)
* Incrementally replicate logs
* Build federated overlays
* Anchor provenance certificates
* Bind DAG derivations to storage state
* Feed stable views into ML / analytics
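
To make the first few bullets concrete, here is a small sketch of comparing state pairs between nodes. The `IndexState` tuple and `entries_behind` helper are hypothetical, and the sketch assumes both nodes follow the same log; the pair alone does not guarantee that, and because snapshot IDs are opaque, states on different snapshots are treated as not directly comparable:

```python
from typing import NamedTuple, Optional


class IndexState(NamedTuple):
    snapshot_id: str
    log_position: int


def entries_behind(local: IndexState, remote: IndexState) -> Optional[int]:
    """How many log entries `local` must fetch to reach `remote`.

    Returns None when the two states sit on different snapshots, since an
    opaque ID gives no way to relate them without extra information.
    """
    if local.snapshot_id != remote.snapshot_id:
        return None
    return max(0, remote.log_position - local.log_position)


leader = IndexState("snap-7", 120)
follower = IndexState("snap-7", 112)
stranger = IndexState("snap-3", 120)

lag = entries_behind(follower, leader)      # 8
in_sync = entries_behind(leader, leader)    # 0
unknown = entries_behind(stranger, leader)  # None
```

A follower with a non-zero lag only needs the log entries from its own position up to the leader's, which is what makes incremental replication possible.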

This is the **bridge** between your infrastructure and everything above it.

---

## 7️⃣ Sanity check (design smell test)

Ask yourself:

> “Can I reconstruct CURRENT from snapshot + log prefix?”

Yes — deterministically.

> “Can I drop the cache and still recover?”

Yes.

> “Can classical bytes ignore all of this?”

Yes — fast path still exists.

That means the design is sound.
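
The first question can also be checked mechanically. A minimal sketch, assuming a simplified model where an operation is a `(key, value)` pair and `None` stands for a tombstone (the `replay` and `check_all_cut_points` names are hypothetical):

```python
import random
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, Optional[str]]  # (key, value); value None = tombstone


def replay(state: Dict[str, str], ops: List[Op]) -> Dict[str, str]:
    """Apply ops in order to a copy of `state`."""
    out = dict(state)
    for key, value in ops:
        if value is None:
            out.pop(key, None)
        else:
            out[key] = value
    return out


def check_all_cut_points(ops: List[Op]) -> bool:
    """Snapshot at every position k, replay the rest, compare with a full replay."""
    expected = replay({}, ops)
    for k in range(len(ops) + 1):
        snapshot = replay({}, ops[:k])  # state captured at position k
        if replay(snapshot, ops[k:]) != expected:
            return False
    return True


rng = random.Random(0)  # fixed seed so the run is repeatable
ops: List[Op] = [
    (rng.choice("abcd"), rng.choice([None, "blk1", "blk2"]))
    for _ in range(50)
]
ok = check_all_cut_points(ops)
```

If a check like this ever fails for the real implementation, the snapshot and the log have drifted apart and the `(SnapshotID, LogPosition)` pair no longer names a unique state.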

---

If you want, next we can:

* define **ASL-STORE put/get contract**
* add **federation merge semantics**
* define **provenance certificates**
* or show **how LLMs consume snapshot-relative DAGs**

You’re building something real here.