amduat-api/notes/add snapshot IDs + log positions.md
2026-01-17 00:19:49 +01:00


You're right to call this out 👍 What I gave you so far had the semantic behavior, but it stopped just short of what makes it operationally usable: explicit snapshot identity and log positions.

Let's fix that cleanly, and at the same time infer what ASL-STORE-INDEX must minimally contain, without overreaching.


1 What ASL-STORE-INDEX minimally is (inferred)

From your spec, ASL-STORE-INDEX must define:

A. Stable snapshot identity

So you can say:

“This index state corresponds to snapshot X”

That implies:

  • SnapshotID (opaque, comparable)
  • Snapshot is immutable
  • Snapshot can be referenced later

B. Log position

So you can say:

“Replay log entries [0…N)”

That implies:

  • Total ordering
  • Stable positions
  • Prefix replay

C. Coordination rule

So you can say:

“Snapshot S + log prefix L defines CURRENT”

That's it. No storage layout, no IO, no performance policy.


2 What was missing before

Previously:

  • Snapshot existed but had no identity
  • Log existed but had no positions
  • CURRENT existed but couldn't be referenced

So you couldn't:

  • checkpoint
  • replicate
  • resume
  • federate
  • or prove determinism

Let's add exactly what's needed, nothing more.


3 Updated semantic model

We make three things explicit:

SnapshotID = opaque identifier
LogPosition = integer (monotonic)
IndexState = (SnapshotID, LogPosition)

And the invariant becomes:

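Index(SnapshotID, LogPosition) =
    Snapshot[SnapshotID] + replay(log[S:LogPosition])

where S is the log position recorded in Snapshot[SnapshotID] (0 for the genesis snapshot), so entries already folded into the snapshot are not replayed again.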
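
To make the invariant concrete, here is a minimal sketch. It assumes a snapshot is just a key-to-location mapping plus the log position it was taken at. `Entry` and `index_at` are hypothetical names used only for illustration, and locations are plain strings to keep it short; none of this is part of the implementation below.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Entry:
    # Hypothetical minimal entry: value=None stands in for a TOMBSTONE.
    position: int
    key: str
    value: Optional[str]


def index_at(snapshot: Dict[str, str], snapshot_position: int,
             log: Iterable[Entry], position: int) -> Dict[str, str]:
    """Snapshot plus replay of entries in [snapshot_position, position)."""
    state = dict(snapshot)  # copy, so the snapshot itself stays untouched
    for e in sorted(log, key=lambda entry: entry.position):
        if not (snapshot_position <= e.position < position):
            continue
        if e.value is None:
            state.pop(e.key, None)  # tombstone removes the key
        else:
            state[e.key] = e.value  # put shadows any earlier value
    return state


# Snapshot taken at log position 2; the log continues from there.
snapshot = {"a": "block-1", "b": "block-2"}
log = [Entry(2, "a", None), Entry(3, "c", "block-3")]

state_at_2 = index_at(snapshot, 2, log, position=2)  # nothing replayed
state_at_3 = index_at(snapshot, 2, log, position=3)  # "a" tombstoned
state_at_4 = index_at(snapshot, 2, log, position=4)  # "c" added
```

The half-open range means a position names "everything strictly before this point", which is why the tail position of an empty log equals the snapshot's own position.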

4 Updated implementation (drop-in replacement)

content_index.py (snapshot + log positions)

from __future__ import annotations
from dataclasses import dataclass
from typing import Dict, Optional, List, Iterable
from enum import Enum, auto
import uuid


# ---------------------------------------------------------------------
# Core data types
# ---------------------------------------------------------------------

SnapshotID = str
LogPosition = int


@dataclass(frozen=True)
class ArtifactLocation:
    block_id: str
    offset: int
    length: int


class LogOp(Enum):
    PUT = auto()
    TOMBSTONE = auto()


@dataclass(frozen=True)
class IndexLogEntry:
    position: LogPosition
    op: LogOp
    artifact_key: str
    location: Optional[ArtifactLocation] = None


# ---------------------------------------------------------------------
# Snapshot
# ---------------------------------------------------------------------

class ContentIndexSnapshot:
    """
    Immutable snapshot of the content index.
    """
    def __init__(
        self,
        snapshot_id: SnapshotID,
        mapping: Dict[str, ArtifactLocation],
        log_position: LogPosition,
    ):
        self.snapshot_id = snapshot_id
        self.log_position = log_position
        self._mapping = dict(mapping)

    def get(self, key: str) -> Optional[ArtifactLocation]:
        return self._mapping.get(key)

    def items(self):
        return self._mapping.items()

Append-only log with positions

class ContentIndexLog:
    """
    Append-only, totally ordered log.
    """
    def __init__(self, start_position: LogPosition = 0):
        self._entries: List[IndexLogEntry] = []
        self._next_position = start_position

    def append_put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.PUT,
            artifact_key=key,
            location=loc,
        ))
        return pos

    def append_tombstone(self, key: str) -> LogPosition:
        pos = self._next_position
        self._next_position += 1
        self._entries.append(IndexLogEntry(
            position=pos,
            op=LogOp.TOMBSTONE,
            artifact_key=key,
        ))
        return pos

    def entries(self) -> Iterable[IndexLogEntry]:
        return self._entries

    def upto(self, position: LogPosition) -> Iterable[IndexLogEntry]:
        return (e for e in self._entries if e.position < position)

    @property
    def tail_position(self) -> LogPosition:
        return self._next_position

ContentIndex with explicit CURRENT

class ContentIndex:
    """
    ASL-CORE-INDEX with snapshot IDs and log positions.
    """

    def __init__(
        self,
        snapshot: Optional[ContentIndexSnapshot] = None,
        log: Optional[ContentIndexLog] = None,
    ):
        self._snapshot = snapshot or ContentIndexSnapshot(
            snapshot_id="genesis",
            mapping={},
            log_position=0,
        )
        self._log = log or ContentIndexLog(
            start_position=self._snapshot.log_position
        )

    # -----------------------------------------------------------------
    # Lookup at CURRENT
    # -----------------------------------------------------------------

    def lookup(self, key: str) -> Optional[ArtifactLocation]:
        # Shadowing: reverse replay up to CURRENT
        for entry in reversed(list(self._log.entries())):
            if entry.artifact_key != key:
                continue
            if entry.op == LogOp.TOMBSTONE:
                return None
            return entry.location

        return self._snapshot.get(key)

    # -----------------------------------------------------------------
    # Append-only mutation
    # -----------------------------------------------------------------

    def put(self, key: str, loc: ArtifactLocation) -> LogPosition:
        return self._log.append_put(key, loc)

    def tombstone(self, key: str) -> LogPosition:
        return self._log.append_tombstone(key)

    # -----------------------------------------------------------------
    # Snapshotting
    # -----------------------------------------------------------------

    def snapshot(self) -> ContentIndexSnapshot:
        """
        Create a new immutable snapshot representing CURRENT.
        """
        materialized: Dict[str, ArtifactLocation] = dict(self._snapshot.items())

        for entry in self._log.entries():
            if entry.op == LogOp.PUT:
                materialized[entry.artifact_key] = entry.location
            elif entry.op == LogOp.TOMBSTONE:
                materialized.pop(entry.artifact_key, None)

        snapshot_id = str(uuid.uuid4())
        snapshot = ContentIndexSnapshot(
            snapshot_id=snapshot_id,
            mapping=materialized,
            log_position=self._log.tail_position,
        )
        return snapshot

    def install_snapshot(self, snapshot: ContentIndexSnapshot):
        """
        Install a snapshot and reset log (external coordination assumed).
        """
        self._snapshot = snapshot
        self._log = ContentIndexLog(start_position=snapshot.log_position)

    # -----------------------------------------------------------------
    # Observability
    # -----------------------------------------------------------------

    @property
    def current_snapshot_id(self) -> SnapshotID:
        return self._snapshot.snapshot_id

    @property
    def current_log_position(self) -> LogPosition:
        return self._log.tail_position

5 Why this is enough (and not too much)

You now have:

  • Snapshot identity
  • Log positions
  • Deterministic replay
  • Explicit CURRENT
  • Federation-ready state vectors
  • Crash recovery anchor
  • Replication boundary

And still:

  • No IO assumptions
  • No block allocation rules
  • No GC
  • No concurrency model

That's exactly what ASL-STORE-INDEX should be.


6 How this unlocks higher layers

With (SnapshotID, LogPosition) you can now:

  • Compare index states across nodes
  • Prove two systems are in sync
  • Incrementally replicate logs
  • Build federated overlays
  • Anchor provenance certificates
  • Bind DAG derivations to storage state
  • Feed stable views into ML / analytics

This is the bridge between your infrastructure and everything above it.
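
As a rough sketch of the first three bullets, here is one way two nodes could compare state vectors and work out which log entries to ship. `IndexState`, `compare`, and `entries_to_ship` are hypothetical helpers, not part of the spec. The sketch also makes a conservative assumption: log positions are only compared when both sides name the same snapshot, since snapshot IDs are opaque.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IndexState:
    snapshot_id: str
    log_position: int


def compare(local: IndexState, remote: IndexState) -> str:
    """Classify the remote state relative to the local one."""
    if local.snapshot_id != remote.snapshot_id:
        # Assumption: opaque IDs carry no ordering, so the nodes would
        # first have to agree on a common snapshot.
        return "different-snapshot"
    if remote.log_position == local.log_position:
        return "in-sync"
    if remote.log_position < local.log_position:
        return "remote-behind"
    return "remote-ahead"


def entries_to_ship(log: List[Tuple[int, str]],
                    follower_position: int) -> List[Tuple[int, str]]:
    """Entries the follower has not applied yet (position >= its tail)."""
    return [entry for entry in log if entry[0] >= follower_position]


# Leader took snapshot "snap-1" at position 2 and has appended up to 5.
leader = IndexState("snap-1", 5)
follower = IndexState("snap-1", 3)
leader_log = [(2, "put x"), (3, "put y"), (4, "tombstone x")]

status = compare(leader, follower)
missing = entries_to_ship(leader_log, follower.log_position)
```

Because positions are stable, the follower only ever asks for a suffix; it never needs the leader to diff full index contents.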


7 Sanity check (design smell test)

Ask yourself:

“Can I reconstruct CURRENT from snapshot + log prefix?”

Yes, deterministically.

“Can I drop the cache and still recover?”

Yes.

“Can classical bytes ignore all of this?”

Yes, the fast path still exists.

That means the design is sound.
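
If you want to turn the first question into an actual check, one option is to compare digests of the materialized state. This is only a sketch under assumptions: the entry shape, `replay`, and `state_digest` are hypothetical, and hashing canonical JSON with SHA-256 is my choice for the example, not something the spec prescribes.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple

# Hypothetical entry shape: (position, key, value); value=None is a tombstone.
LogEntry = Tuple[int, str, Optional[str]]


def replay(base: Dict[str, str], entries: List[LogEntry]) -> Dict[str, str]:
    state = dict(base)
    for _pos, key, value in sorted(entries, key=lambda e: e[0]):
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


def state_digest(state: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so dict insertion order cannot leak in.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


full_log: List[LogEntry] = [
    (0, "a", "block-1"),
    (1, "b", "block-2"),
    (2, "a", None),
    (3, "c", "block-3"),
]

# Node A replays everything from the empty genesis state.
node_a = replay({}, full_log)

# Node B starts from a snapshot taken at position 2 and replays the rest.
snapshot_at_2 = replay({}, full_log[:2])
node_b = replay(snapshot_at_2, full_log[2:])

same = state_digest(node_a) == state_digest(node_b)
```

If `same` ever comes out false for the same log prefix, the replay is not deterministic and the (SnapshotID, LogPosition) pair cannot be trusted as a state reference.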


If you want, next we can:

  • define ASL-STORE put/get contract
  • add federation merge semantics
  • define provenance certificates
  • or show how LLMs consume snapshot-relative DAGs

You're building something real here.