amduat-api/notes/ASL-STORE put-get contract.md

Great — this is the **right moment** to define it, because everything underneath (artifacts, DAGs, indexes, snapshots, logs) is now solid.
What follows is a **clean, minimal, *normative* ASL-STORE put/get contract** that:
* Sits *between* ASL-CORE / ASL-CORE-INDEX and any concrete storage engine
* Is compatible with your snapshot + log semantics
* Does **not** assume classical vs quantum
* Makes caching, deduplication, and replay possible
* Avoids over-specifying performance or layout
Think of this as **the membrane between semantics and mechanics**.
---
# ASL-STORE — Put/Get Contract (Normative)
## 1. Purpose
ASL-STORE defines the **operational contract** by which:
* Artifacts are **materialized and stored**
* Artifact content becomes **visible** via the ASL-CORE-INDEX
* Stored content is **retrieved deterministically**
ASL-STORE answers exactly two questions:
> **PUT**: How does an artifact become stored and indexed?
> **GET**: How are bytes retrieved once indexed?
Nothing more.
---
## 2. Scope
ASL-STORE defines:
* The **PUT lifecycle**
* The **GET lifecycle**
* Required interactions with:
  * Content Index (ASL-CORE-INDEX)
  * Structural DAG
  * Materialization cache
* Visibility and determinism rules
ASL-STORE does **not** define:
* Block allocation strategy
* File layout
* IO APIs
* Concurrency primitives
* Caching policies
* Garbage collection
* Replication mechanics
---
## 3. Actors and Dependencies
ASL-STORE operates in the presence of:
* **Artifact DAG** (SID-addressed)
* **Materialization Cache** (`SID → CID`, optional)
* **Content Index** (`CID → ArtifactLocation`)
* **Block Store** (opaque byte storage)
* **Snapshot + Log** (for index visibility)
ASL-STORE **must not** bypass the Content Index.
---
## 4. PUT Contract
### 4.1 PUT Signature (Semantic)
```
put(artifact) → (CID, IndexState)
```
Where:
* `artifact` is an ASL artifact (possibly lazy, possibly quantum)
* `CID` is the semantic content identity
* `IndexState = (SnapshotID, LogPosition)` after the put
---
### 4.2 PUT Semantics (Step-by-step)
The following steps are **logically ordered**.
An implementation may optimize, but may not violate the semantics.
---
#### Step 1 — Structural registration (mandatory)
* The artifact **must** be registered in the Structural Index (SID → DAG).
* If an identical SID already exists, it **must be reused**.
> This guarantees derivation identity independent of storage.
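
A minimal sketch of Step 1, under stated assumptions: the note does not say how a SID is computed, so the `Node` shape, the `compute_sid` function, and the use of SHA-256 over the operation name and child SIDs are all hypothetical. The only point illustrated is that registering an identical structure twice reuses the existing entry.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical DAG node: an operation name plus the SIDs of its inputs."""
    op: str
    inputs: Tuple[str, ...] = ()


def compute_sid(node: Node) -> str:
    # Assumption: SID = SHA-256 over the op name and the ordered child SIDs.
    h = hashlib.sha256()
    h.update(node.op.encode("utf-8"))
    for child_sid in node.inputs:
        h.update(b"\x00" + child_sid.encode("ascii"))
    return h.hexdigest()


class StructuralIndex:
    """In-memory stand-in for the Structural Index (SID -> DAG node)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def register(self, node: Node) -> str:
        sid = compute_sid(node)
        # Reuse an existing registration instead of overwriting it.
        self._nodes.setdefault(sid, node)
        return sid

    def __len__(self) -> int:
        return len(self._nodes)
```

Because the SID here depends only on structure, it can be computed before any bytes exist, which is what keeps derivation identity independent of storage.
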
---
#### Step 2 — CID resolution (lazy, cache-aware)
* If `(SID → CID)` exists in the Materialization Cache:
  * Use it.
* Otherwise:
  * Materialize the artifact DAG
  * Compute the CID
  * Cache `(SID → CID)`
> Materialization may recursively invoke child artifacts.
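
Step 2 can be sketched as a cache-aware lookup. This is an illustration only: `resolve_cid` and the `materialize` callback are hypothetical names, and hashing the materialized bytes with SHA-256 is an assumption, since the note leaves the hash algorithm open.

```python
import hashlib
from typing import Callable, Dict


def resolve_cid(
    sid: str,
    materialize: Callable[[str], bytes],
    cache: Dict[str, str],
) -> str:
    """Return the CID for `sid`, materializing only on a cache miss."""
    cached = cache.get(sid)
    if cached is not None:
        return cached  # cache hit: no materialization happens
    data = materialize(sid)  # may recursively materialize child artifacts
    cid = hashlib.sha256(data).hexdigest()  # assumption: CID = SHA-256(bytes)
    cache[sid] = cid
    return cid
```

The cache is optional in the contract, so passing an empty dict on every call would still be correct, only slower.
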
---
#### Step 3 — Deduplication check (mandatory)
* Look up `CID` in the Content Index at CURRENT.
* If an entry exists:
  * **No bytes are written**
  * **No new index entry is required**
  * PUT completes successfully
> This is **global deduplication**.
---
#### Step 4 — Physical storage (conditional)
If no existing entry exists:
* Bytes corresponding to `CID` **must be written** to a block
* A concrete `ArtifactLocation` is produced:
```
ArtifactLocation = Sequence[BlockSlice]
BlockSlice = (BlockID, offset, length)
```
No assumptions are made about block layout.
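
To make the `ArtifactLocation` shape concrete, here is a small sketch of a block store whose writes may span several blocks. The `BlockStore` class, its fixed `block_size`, and the append-only fill strategy are assumptions chosen for the example; the contract itself does not prescribe any layout.

```python
from typing import List, NamedTuple, Sequence


class BlockSlice(NamedTuple):
    block_id: int
    offset: int
    length: int


class BlockStore:
    """Hypothetical in-memory block store with fixed-capacity blocks."""

    def __init__(self, block_size: int = 8) -> None:
        self.block_size = block_size
        self.blocks: List[bytearray] = [bytearray()]

    def write(self, data: bytes) -> List[BlockSlice]:
        """Append `data`, returning the slices that make up its location."""
        location: List[BlockSlice] = []
        pos = 0
        while pos < len(data):
            block = self.blocks[-1]
            free = self.block_size - len(block)
            if free == 0:
                # Current block is full: open a new one and retry.
                self.blocks.append(bytearray())
                continue
            chunk = data[pos:pos + free]
            location.append(
                BlockSlice(len(self.blocks) - 1, len(block), len(chunk))
            )
            block.extend(chunk)
            pos += len(chunk)
        return location

    def read(self, location: Sequence[BlockSlice]) -> bytes:
        """Concatenate the bytes of each slice, in order."""
        return b"".join(
            bytes(self.blocks[b][o:o + n]) for b, o, n in location
        )
```

With `block_size=8`, writing five bytes and then seven bytes yields a second location made of two slices, one finishing block 0 and one starting block 1.
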
---
#### Step 5 — Index mutation (mandatory)
* Append a **PUT log entry** to the Content Index:
```
CID → ArtifactLocation
```
* The entry is **not visible** until its log position is ≤ the CURRENT log position.
> This is the *only* moment storage becomes visible.
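
The split between appending an entry and making it visible can be sketched as follows. `ContentIndex`, `append_put`, and `advance` are hypothetical names, log positions are assumed to be 1-based, and a linear scan stands in for whatever lookup structure a real index would use.

```python
from typing import List, Optional, Tuple

Location = Tuple[Tuple[int, int, int], ...]  # sequence of (BlockID, offset, length)


class ContentIndex:
    """Append-only PUT log with a CURRENT position that gates visibility."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, Location]] = []
        self.current = 0  # entries at positions 1..current are visible

    def append_put(self, cid: str, location: Location) -> int:
        """Append a PUT entry and return its 1-based log position."""
        self.log.append((cid, location))
        return len(self.log)

    def advance(self, position: int) -> None:
        """Move CURRENT forward; it never moves backwards."""
        self.current = max(self.current, min(position, len(self.log)))

    def lookup(self, cid: str) -> Optional[Location]:
        """Resolve a CID using only the visible log prefix."""
        for entry_cid, location in self.log[: self.current]:
            if entry_cid == cid:
                return location
        return None
```

An entry that has been appended but lies beyond CURRENT behaves, for readers, as if it did not exist.
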
---
### 4.3 PUT Guarantees
After successful PUT:
* The artifact's CID:
  * Is stable
  * Is retrievable
  * Will resolve to immutable bytes
* The Content Index state:
  * Advances monotonically
  * Is replayable
* Repeating PUT with the same artifact:
  * Is idempotent
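
The idempotence guarantee, together with the deduplication check in Step 3, can be illustrated with a deliberately reduced store. This sketch makes several simplifications that are not in the contract: `put` takes already-materialized bytes instead of an artifact, there is a single block with ID 0, the CID is assumed to be SHA-256 of the bytes, and the returned index state is only the log position.

```python
import hashlib
from typing import Dict, List, Tuple

BlockSlice = Tuple[int, int, int]  # (BlockID, offset, length)


class TinyStore:
    """Reduced PUT path: dedup check, conditional write, log append."""

    def __init__(self) -> None:
        self.blob = bytearray()                       # the only block (ID 0)
        self.log: List[Tuple[str, BlockSlice]] = []   # PUT log
        self.index: Dict[str, BlockSlice] = {}        # view derived from the log

    def put(self, data: bytes) -> Tuple[str, int]:
        cid = hashlib.sha256(data).hexdigest()  # assumption: SHA-256 CID
        if cid not in self.index:
            # First time this content is seen: write bytes, then log the entry.
            location = (0, len(self.blob), len(data))
            self.blob.extend(data)
            self.log.append((cid, location))
            self.index[cid] = location
        # On a repeat PUT nothing is written and the log does not advance.
        return cid, len(self.log)
```

Calling `put` twice with the same bytes returns the same CID and the same log position, and leaves the block contents unchanged.
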
---
## 5. GET Contract
### 5.1 GET Signature (Semantic)
```
get(CID, IndexState?) → bytes | NOT_FOUND
```
Where:
* `CID` is the content identity
* `IndexState` is optional:
  * Defaults to CURRENT
  * May specify `(SnapshotID, LogPosition)`
---
### 5.2 GET Semantics
1. Resolve `CID → ArtifactLocation` using:
```
Index(snapshot, log_prefix)
```
2. If no entry exists:
   * Return `NOT_FOUND`
3. Otherwise:
   * For each `BlockSlice` in the `ArtifactLocation`, read exactly `length` bytes from `(BlockID, offset)`
   * Return the bytes **verbatim**, concatenated in sequence order
No interpretation is applied.
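
A minimal sketch of the GET path, assuming the index view for the requested state has already been resolved into a plain mapping and that a location may consist of several slices, as in the `ArtifactLocation` definition. The function signature, the `blocks` mapping, and returning `None` for `NOT_FOUND` are choices made for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

BlockSlice = Tuple[int, int, int]  # (BlockID, offset, length)


def get(
    cid: str,
    index_view: Dict[str, Sequence[BlockSlice]],
    blocks: Dict[int, bytes],
) -> Optional[bytes]:
    """Return the stored bytes for `cid`, or None to signal NOT_FOUND."""
    location = index_view.get(cid)
    if location is None:
        return None
    parts: List[bytes] = []
    for block_id, offset, length in location:
        chunk = blocks[block_id][offset:offset + length]
        if len(chunk) != length:
            # A visible entry must never point at missing bytes.
            raise IOError(f"short read in block {block_id}")
        parts.append(chunk)
    return b"".join(parts)  # verbatim, no decoding or interpretation
```

Note that the function only reads from its arguments: it never materializes an artifact and never mutates the index or the blocks.
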
---
### 5.3 GET Guarantees
* Returned bytes are:
  * Immutable
  * Deterministic
  * Content-addressed
* GET never triggers materialization
* GET never mutates state
---
## 6. Visibility Rules
An index entry is visible **if and only if**:
1. The referenced block is sealed
2. The log entry position ≤ CURRENT log position
3. The snapshot + log prefix includes the entry
ASL-STORE must respect these rules strictly.
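
The three conditions can be read as a single predicate. In this sketch the `LogEntry` fields and the `view_position` parameter (the end of the log prefix a reader has selected) are hypothetical; the snapshot component of the reader's view is left out to keep the example small.

```python
from typing import NamedTuple, Set


class LogEntry(NamedTuple):
    position: int   # position of the entry in the index log
    cid: str
    block_id: int   # block referenced by the entry


def is_visible(
    entry: LogEntry,
    sealed_blocks: Set[int],
    current_position: int,
    view_position: int,
) -> bool:
    """True only if all three visibility conditions hold."""
    return (
        entry.block_id in sealed_blocks            # 1. block is sealed
        and entry.position <= current_position     # 2. at or before CURRENT
        and entry.position <= view_position        # 3. inside the reader's prefix
    )
```

Failing any one condition is enough to hide the entry, which is what makes the rule an "if and only if".
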
---
## 7. Failure Semantics (Minimal)
ASL-STORE must guarantee:
* No visible index entry points to missing or mutable bytes
* Partial writes must not become visible
* Replaying snapshot + log after crash yields a valid index
No stronger guarantees are required at this level.
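
Recovery by replay can be sketched as a pure function of the snapshot and a log prefix. The `replay` name and the dictionary representation are assumptions, as is the rule that the first entry for a CID wins, which follows from PUT not appending a second entry for content that is already indexed.

```python
from typing import Dict, List, Tuple

Location = Tuple[Tuple[int, int, int], ...]  # sequence of (BlockID, offset, length)


def replay(
    snapshot: Dict[str, Location],
    log: List[Tuple[str, Location]],
    log_position: int,
) -> Dict[str, Location]:
    """Rebuild the index view from a snapshot plus the first `log_position` entries."""
    index = dict(snapshot)  # copy, so the snapshot itself is never mutated
    for cid, location in log[:log_position]:
        # Assumption: the first entry for a CID wins; later duplicates are ignored.
        index.setdefault(cid, location)
    return index
```

Because the result depends only on its inputs, replaying the same snapshot and prefix after a crash yields the same index every time, and entries beyond the chosen position are simply absent.
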
---
## 8. Determinism Contract
For fixed:
* SnapshotID
* LogPosition
* Artifact DAG
* Hash algorithms
ASL-STORE guarantees:
* Same CID
* Same ArtifactLocation resolution
* Same GET results
This is the foundation for:
* Federation
* Replication
* Provenance
* Certification
---
## 9. Relationship to q-bits (Explicit)
ASL-STORE:
* Is **agnostic** to q-bits
* Sees only:
  * CIDs
  * Bytes
* Does not care how CIDs were derived
Quantum artifacts affect **materialization**, not storage semantics.
---
## 10. Summary (Mental Model)
You can summarize ASL-STORE like this:
> **PUT** = “Make this artifact real, once, forever”
> **GET** = “Given identity and time, give me the bytes”
Everything else is someone else's problem, by design.
---
If you want, next we can:
* define **crash consistency boundaries**
* add **federated PUT/GET**
* define **certificates binding DAG → CID → Snapshot**
* show **how LLM outputs become artifacts**
* or formalize **overlay indexes** (concepts, domains, ontologies)
This layer is now solid enough to build *anything* above it.