# ASL/1-STORE — Content-Addressable Store (Core)
Status: Approved
Owner: Niklas Rydberg
Version: 0.4.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, import-export]
<!-- Source: /amduat/docs/new/asl-store.md | Canonical: /amduat/tier1/asl-1-store.md -->
**Document ID:** ASL/1-STORE
**Layer:** L0.5 — Store model over ASL/1-CORE (above value model, below execution/provenance)
**Depends on (normative):**
* `ASL/1-CORE v0.3.x` — value substrate: `Artifact`, `Reference`, `TypeTag`, identity model
**Informative references:**
* `ENC/ASL1-CORE v1.0.x` — canonical encodings for `Artifact` / `Reference` (e.g. `ASL_ENC_CORE_V1`)
* `HASH/ASL1 v0.2.x` — ASL1 hash family (`HashId`, e.g. `HASH-ASL1-256`)
* `TGK/1-CORE v0.7.x` — trace graph kernel over `Reference`
* `PEL/1` — execution substrate (uses ASL/1-STORE for I/O)
* `CIL/1`, `FCT/1`, `FER/1`, `OI/1` — profiles that rely on content-addressable storage
> **Versioning note**
> ASL/1-STORE is agnostic to minor revisions of these informative documents, provided they preserve:
>
> * the ASL/1-CORE definitions of `Artifact`, `Reference`, `TypeTag`, and identity, and
> * the existence of at least one canonical encoding and hash configuration usable for reference derivation.
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 0. Conventions
### 0.1 RFC 2119 terminology
The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**,
**SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** are to be
interpreted as described in RFC 2119.
### 0.2 Terms from ASL/1-CORE
This specification reuses the following concepts from `ASL/1-CORE`:
* **Artifact**
```text
Artifact {
  bytes: OctetString
  type_tag: optional TypeTag
}
```
* **Reference**
```text
Reference {
  hash_id: HashId
  digest: OctetString
}
```
* **TypeTag** — `uint32` hint for intended interpretation of `Artifact.bytes`.
* **HashId** — `uint16` identifying a hash algorithm (e.g. in `HASH/ASL1`).
Where this document says **ArtifactRef**, it means an ASL/1 `Reference` that logically identifies an `Artifact` under the identity rules of `ASL/1-CORE`.
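As an illustration only, the two value shapes above could be modeled as immutable records. The sketch below is not part of ASL/1-CORE: the class layout, the field name `data` (standing in for `Artifact.bytes`), and the range checks are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

UINT16_MAX = 0xFFFF
UINT32_MAX = 0xFFFFFFFF


@dataclass(frozen=True)
class Artifact:
    """Immutable value: octets plus an optional uint32 TypeTag."""
    data: bytes                     # stands in for Artifact.bytes (assumed name)
    type_tag: Optional[int] = None  # None models an absent TypeTag

    def __post_init__(self) -> None:
        if self.type_tag is not None and not 0 <= self.type_tag <= UINT32_MAX:
            raise ValueError("type_tag must fit in uint32")


@dataclass(frozen=True)
class Reference:
    """Immutable identifier: a uint16 HashId plus the digest octets."""
    hash_id: int
    digest: bytes

    def __post_init__(self) -> None:
        if not 0 <= self.hash_id <= UINT16_MAX:
            raise ValueError("hash_id must fit in uint16")


# Frozen dataclasses compare field by field and are hashable, so a
# Reference can be used directly as a dictionary key.
untagged = Artifact(b"hello")
tagged = Artifact(b"hello", type_tag=7)
print(untagged == Artifact(b"hello"))  # True
print(untagged == tagged)              # False
```

Field-wise equality is used here so that an absent `type_tag` and a present one never compare equal, even when the bytes match.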
### 0.3 Additional terminology
* **StoreInstance** — an abstract content-addressable store implementing ASL/1-STORE semantics.
* **StoreConfig** — the identity-related configuration of a StoreInstance (see §2.2).
* **StoreSnapshot** — the logical state of a StoreInstance at some instant: a finite mapping from `Reference` to `Artifact`, plus its fixed `StoreConfig`.
* **ExecutionEnvironment** — any deployment context (process, node, cluster) that hosts one or more StoreInstances; used only informatively.
ASL/1-STORE defines **logical semantics** only. Physical representation (files, DB rows, object stores), indexing, and transport are out of scope.
---
## 1. Purpose, Scope & Non-Goals
### 1.1 Purpose
`ASL/1-STORE` defines the **minimal content-addressable store model** over `ASL/1-CORE` values.
It provides:
* The notion of a **StoreInstance** as a partial mapping:
```text
Reference -> Artifact // zero or one Artifact per Reference
```
* The semantics of two core operations:
  * `put(Artifact) -> Reference`
  * `get(Reference) -> Artifact | error`
* A small, logical **error model** at the store boundary.
The goals are:
* **Determinism:** same Artifact, same configuration ⇒ same Reference and same store behavior.
* **Immutability:** once a Reference is associated with an Artifact, that association does not change.
* **Separation of concerns:** ASL/1-STORE defines logical behavior; physical storage and APIs are separate concerns.
> **STORE/CORE-MINIMAL/1**
> ASL/1-STORE **MUST** remain a thin logical layer over ASL/1-CORE: it defines what a content-addressable store *is* and how `put`/`get` behave; it **MUST NOT** embed higher-level concepts such as execution, provenance, or policy.
### 1.2 Non-goals
ASL/1-STORE explicitly does **not** define:
* Concrete APIs (HTTP, gRPC, language-specific interfaces).
* Authentication, authorization, tenancy, or quotas.
* Replication, redundancy, durability, retention, or garbage collection policies.
* Chunking, compression, encryption, or indexing strategies.
* Network discovery, routing, or federation protocols.
These are the responsibility of higher-layer specifications, implementation profiles, and operational policy.
---
## 2. Core Store Model
### 2.1 StoreInstance as a partial mapping
At any given StoreSnapshot, a StoreInstance can be viewed as a partial function:
```text
StoreSnapshot.M : Reference -> Artifact // 0 or 1 Artifact per Reference
```
Properties:
* For any given `ref`, `StoreSnapshot.M(ref)` is either:
  * undefined (no stored Artifact), or
  * a single `Artifact` value.
* There are no duplicate or conflicting mappings in a single snapshot.
ASL/1-STORE does not specify how snapshots are implemented (MVCC, copy-on-write, etc.). It only constrains the logical mapping at any instant.
### 2.2 StoreConfig (identity configuration)
Each StoreInstance has a **StoreConfig** that determines how References are derived:
```text
StoreConfig {
  encoding_profile: EncodingProfileId // e.g. ASL_ENC_CORE_V1
  hash_id: HashId                     // e.g. 0x0001 (HASH-ASL1-256)
}
```
Constraints:
* `encoding_profile` MUST name a canonical encoding profile for `Artifact` (e.g. `ASL_ENC_CORE_V1` from `ENC/ASL1-CORE`).
* `hash_id` MUST identify a fixed hash algorithm (e.g. `HASH-ASL1-256`) whose behavior is stable as per `HASH/ASL1`.
> **STORE/CONFIG-FIXED/CORE/1**
> For a given StoreSnapshot, `StoreConfig.encoding_profile` and `StoreConfig.hash_id` are fixed. All `put` and `get` operations in that snapshot are interpreted relative to that configuration.
Implementations MAY support multiple configurations (e.g. separate namespaces per profile), but each StoreInstance, as seen through ASL/1-STORE, is always parameterised by a single `StoreConfig`.
### 2.3 Relationship to ASL/1-CORE identity
ASL/1-CORE defines how a `Reference` is derived from an `Artifact` given an encoding profile and hash algorithm:
```text
ArtifactBytes = encode_P(Artifact)
digest = H(ArtifactBytes)
Reference = { hash_id = HID, digest = digest }
```
ASL/1-STORE **reuses** this rule unchanged. For a StoreInstance with `StoreConfig` = `{ encoding_profile = P, hash_id = HID }`:
* `put` MUST derive `Reference` values exactly via the ASL/1-CORE rule for `(P, HID)`.
* `get` MUST respect the mapping semantics defined in §3.2.
ASL/1-STORE does **not** introduce any new notion of identity beyond ASL/1-CORE.
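The derivation rule can be sketched in a few lines. This is a hedged illustration, not the normative procedure: `encode_stand_in` is a made-up injective encoding and is not the `ASL_ENC_CORE_V1` layout, and SHA-256 is used only as a placeholder for the algorithm that `HASH/ASL1` actually assigns to the configured `hash_id`.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StoreConfig:
    encoding_profile: str  # e.g. "ASL_ENC_CORE_V1"
    hash_id: int           # e.g. 0x0001


def encode_stand_in(data: bytes, type_tag: Optional[int]) -> bytes:
    """Illustrative injective encoding; NOT the ASL_ENC_CORE_V1 layout."""
    if type_tag is None:
        header = b"\x00"                                 # tag absent
    else:
        header = b"\x01" + type_tag.to_bytes(4, "big")   # tag present
    return header + len(data).to_bytes(8, "big") + data


def derive_reference(config: StoreConfig, data: bytes,
                     type_tag: Optional[int] = None) -> Tuple[int, bytes]:
    """Return (hash_id, digest) following the encode-then-hash rule."""
    artifact_bytes = encode_stand_in(data, type_tag)
    # Assumption: SHA-256 stands in for the hash H selected by hash_id.
    digest = hashlib.sha256(artifact_bytes).digest()
    return (config.hash_id, digest)


config = StoreConfig(encoding_profile="ASL_ENC_CORE_V1", hash_id=0x0001)
ref_a = derive_reference(config, b"payload")
ref_b = derive_reference(config, b"payload")
print(ref_a == ref_b)  # True: same Artifact and config give the same Reference
```

Because the presence and value of `type_tag` feed into the encoded bytes, two Artifacts with equal bytes but different tags derive different References in this sketch.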
---
## 3. Store Operations
ASL/1-STORE defines two mandatory logical operations:
* `put(Artifact) -> Reference`
* `get(Reference) -> Artifact | error`
Concrete APIs MUST be semantically equivalent to these.
### 3.1 `put(Artifact) -> Reference`
**Logical signature:**
```text
put(artifact: Artifact) -> Reference | error
```
Let the StoreInstance have `StoreConfig`:
* `P = encoding_profile`
* `HID = hash_id`
* `H =` hash algorithm associated with `HID`
**Semantics:**
1. Compute the canonical encoding of `artifact` under `P`:
```text
ArtifactBytes = encode_P(artifact)
```
2. Compute the Reference under `(P, H)` as per ASL/1-CORE:
```text
digest = H(ArtifactBytes)
reference = Reference { hash_id = HID, digest = digest }
```
3. Consider the current StoreSnapshot mapping `M`:
   * If `M(reference)` is **undefined** (no existing Artifact stored under `reference`):
     * Logically define `M'(reference) = artifact`.
     * All other mappings remain unchanged.
   * If `M(reference) = artifact'` is **defined**:
     * If `artifact'` is **identical** to `artifact` in the ASL/1-CORE sense (same bytes, same type_tag status and value), then:
       * `M' = M` (no logical change).
     * If `artifact'` is **not** identical to `artifact`, this is a **collision**:
       the store **MUST NOT** silently replace or merge the artifacts. It MUST treat this as an integrity error (see §4).
4. Return `reference` (or the appropriate error in the collision case).
> **STORE/PUT-IDEMP/CORE/1**
> For a given StoreInstance and `StoreConfig`, repeated calls to `put` with identical `Artifact` inputs **MUST** always return the same `Reference`, and **MUST NOT** change the logical mapping after the first successful insertion.
> **STORE/PUT-NO-ALIAS/CORE/1**
> A StoreInstance **MUST NOT** associate two non-identical Artifacts with the same `Reference` (under its `StoreConfig`). Any attempt to do so **MUST** result in an integrity error.
Implementations MAY optimize by:
* caching `ArtifactBytes`,
* deduplicating storage,
* or short-circuiting `put` if a `Reference` is known to exist.
These do not affect the logical semantics.
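The `put` steps above can be sketched as a small in-memory store. Everything here is an assumption for illustration: the `MemoryStore` class, the tuple representation of `Artifact` and `Reference`, the stand-in encoding, and SHA-256 as the default hash. The hash function is injectable only so that the collision branch can be exercised with a deliberately weak hash.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

Artifact = Tuple[bytes, Optional[int]]  # (bytes, type_tag)
Reference = Tuple[int, bytes]           # (hash_id, digest)


class IntegrityError(Exception):
    """Models ERR_INTEGRITY."""


def encode_stand_in(artifact: Artifact) -> bytes:
    """Illustrative injective encoding; NOT the ASL_ENC_CORE_V1 layout."""
    data, type_tag = artifact
    header = b"\x00" if type_tag is None else b"\x01" + type_tag.to_bytes(4, "big")
    return header + len(data).to_bytes(8, "big") + data


def sha256_digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MemoryStore:
    def __init__(self, hash_id: int = 0x0001,
                 hash_fn: Callable[[bytes], bytes] = sha256_digest) -> None:
        self.hash_id = hash_id
        self._hash_fn = hash_fn
        self.mapping: Dict[Reference, Artifact] = {}

    def put(self, artifact: Artifact) -> Reference:
        # Steps 1-2: canonical bytes, then digest, then Reference.
        ref = (self.hash_id, self._hash_fn(encode_stand_in(artifact)))
        # Step 3: consult the current mapping.
        existing = self.mapping.get(ref)
        if existing is None:
            self.mapping[ref] = artifact   # first insertion
        elif existing != artifact:
            # Collision: never replace or merge, surface an integrity error.
            raise IntegrityError("different Artifact already stored under this Reference")
        # Step 4: identical re-insertions fall through without any change.
        return ref


store = MemoryStore()
first = store.put((b"abc", None))
second = store.put((b"abc", None))
print(first == second, len(store.mapping))  # True 1
```

The comparison `existing != artifact` relies on tuple equality, which matches "same bytes, same type_tag status and value" for this representation.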
### 3.2 `get(Reference) -> Artifact | error`
**Logical signature:**
```text
get(ref: Reference) -> Artifact | error
```
Let `M` be the current StoreSnapshot mapping.
**Semantics:**
* If `M(ref)` is **defined** (there is a stored Artifact `A`):
  * `get(ref)` MUST return an Artifact identical to `A` in the ASL/1-CORE sense.
* If `M(ref)` is **undefined**:
  * `get(ref)` MUST fail with `ERR_NOT_FOUND` (see §4.1).
> **STORE/GET-PURE/CORE/1**
> For a fixed StoreSnapshot and `ref`:
>
> * If `M(ref)` is defined as `A`, repeated `get(ref)` calls **MUST** return Artifacts identical to `A`.
> * If `M(ref)` is undefined, repeated `get(ref)` calls **MUST** consistently return `ERR_NOT_FOUND`, unless the mapping is changed by a subsequent `put` or administrative import.
ASL/1-STORE does **not** require that `get` recompute and re-verify the digest on every access. It does require that:
* implementations maintain internal invariants so that `M(ref)` remains consistent with the ASL/1-CORE identity rule for the configured `StoreConfig`; and
* any detected inconsistencies are treated as integrity errors (see §4.1).
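A minimal sketch of `get` over a dictionary-backed snapshot follows. The optional `verify` flag illustrates one way an implementation might re-check the identity invariant on read; the spec does not require this, and the flag, the exception names, the stand-in encoding, and the use of SHA-256 are all assumptions of this example.

```python
import hashlib
from typing import Dict, Optional, Tuple

Artifact = Tuple[bytes, Optional[int]]  # (bytes, type_tag)
Reference = Tuple[int, bytes]           # (hash_id, digest)


class NotFoundError(Exception):
    """Models ERR_NOT_FOUND."""


class IntegrityError(Exception):
    """Models ERR_INTEGRITY."""


def encode_stand_in(artifact: Artifact) -> bytes:
    """Illustrative injective encoding; NOT the ASL_ENC_CORE_V1 layout."""
    data, type_tag = artifact
    header = b"\x00" if type_tag is None else b"\x01" + type_tag.to_bytes(4, "big")
    return header + len(data).to_bytes(8, "big") + data


def get(mapping: Dict[Reference, Artifact], ref: Reference,
        verify: bool = False) -> Artifact:
    artifact = mapping.get(ref)
    if artifact is None:
        raise NotFoundError(ref)           # M(ref) is undefined
    if verify:
        # Optional re-verification: recompute the digest and compare.
        digest = hashlib.sha256(encode_stand_in(artifact)).digest()
        if digest != ref[1]:
            raise IntegrityError("stored Artifact no longer matches its Reference")
    return artifact                        # M(ref) is defined


artifact = (b"abc", None)
ref = (0x0001, hashlib.sha256(encode_stand_in(artifact)).digest())
snapshot = {ref: artifact}
print(get(snapshot, ref, verify=True) == artifact)  # True
```

Skipping verification keeps reads cheap; enabling it trades hashing cost for earlier detection of corruption.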
### 3.3 Deletion, GC, and administrative changes (informative)
ASL/1-STORE does not standardize deletion or garbage collection, but logically:
* Removing a mapping `ref -> Artifact` transforms `M(ref)` from defined to undefined.
* After such removal, `get(ref)` must return `ERR_NOT_FOUND`.
How and when such changes occur (manual deletion, GC, retention policies) is up to higher layers and operational policy.
---
## 4. Error Model
ASL/1-STORE defines a minimal logical error model. Concrete APIs may map these to exceptions, status codes, or error variants, but MUST preserve their semantics.
### 4.1 Core error categories
1. **Not Found — `ERR_NOT_FOUND`**
Condition:
* `get(ref)` is invoked and `M(ref)` is undefined in the current StoreSnapshot.
2. **Integrity Error — `ERR_INTEGRITY`**
Conditions include (non-exhaustive):
* A `put` would associate a `Reference` with an `Artifact` different from the `Artifact` already stored under that `Reference` (violating `STORE/PUT-NO-ALIAS/CORE/1`).
* An internal invariant check reveals that for some `(ref, A)` in `M`, the canonical encoding and hash under `StoreConfig` no longer produce `ref` for `A` (corruption or misconfiguration).
Behavior on integrity errors (e.g. fail-fast vs quarantine) is implementation and policy dependent, but:
> **STORE/INTEGRITY-NO-SILENCE/CORE/1**
> A StoreInstance **MUST NOT** silently accept or mask conditions that violate ASL/1 identity invariants. Such conditions **MUST** manifest as `ERR_INTEGRITY` (or an error that is semantically at least as strong) at the store's API or monitoring boundary.
3. **Unsupported Identity Configuration — `ERR_UNSUPPORTED`**
Conditions (examples):
* A `put` or internal operation requires computing a `Reference` using an `encoding_profile` or `hash_id` that the StoreInstance does not implement.
* A `get` is invoked with a `Reference.hash_id` that the StoreInstance is not configured to support, and the StoreInstance's policy is to reject such references rather than treating them as potentially unmapped.
Implementations MAY choose to:
* Accept unknown `hash_id` values but always treat them as “possibly unmapped” (effectively `ERR_NOT_FOUND` if no mapping exists); or
* Reject them explicitly as `ERR_UNSUPPORTED`.
ASL/1-STORE does not standardize I/O failures, timeouts, or auth errors; those are part of concrete API and deployment design.
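One way a concrete API might surface these categories is as an enumeration, with the unknown-`hash_id` policy made explicit. The sketch below is illustrative only: the `StoreError` enum, the `get_outcome` helper, and its `reject_unknown` parameter are hypothetical names, not part of this specification.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple

Artifact = Tuple[bytes, Optional[int]]  # (bytes, type_tag)
Reference = Tuple[int, bytes]           # (hash_id, digest)


class StoreError(Enum):
    NOT_FOUND = "ERR_NOT_FOUND"
    INTEGRITY = "ERR_INTEGRITY"
    UNSUPPORTED = "ERR_UNSUPPORTED"


def get_outcome(mapping: Dict[Reference, Artifact], ref: Reference,
                supported_hash_ids: Set[int],
                reject_unknown: bool) -> Optional[StoreError]:
    """Classify a lookup: None means the Reference resolves to an Artifact."""
    hash_id, _digest = ref
    if hash_id not in supported_hash_ids and reject_unknown:
        # Policy A: reject unknown hash_id values explicitly.
        return StoreError.UNSUPPORTED
    if ref not in mapping:
        # Policy B lands here for unknown hash_id values with no mapping.
        return StoreError.NOT_FOUND
    return None


unknown = (0x7FFF, b"\x00" * 32)  # hypothetical unsupported hash_id
print(get_outcome({}, unknown, {0x0001}, reject_unknown=True))   # StoreError.UNSUPPORTED
print(get_outcome({}, unknown, {0x0001}, reject_unknown=False))  # StoreError.NOT_FOUND
```

Whichever policy is chosen, it should be applied consistently so that callers can distinguish a missing Artifact from an unsupported configuration.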
---
## 5. Locality & Data Movement (Informative)
ASL/1-STORE is a logical model, but real systems must also account for the cost of moving data. To keep deployments efficient, implementations are encouraged to follow a **data movement minimization principle**.
### 5.1 Within a StoreInstance
Within a single StoreInstance and its **co-located** consumers (e.g. PEL/1 engines, TGK/1 ingestors, CIL/1 logic in the same process):
* Implementations **SHOULD** avoid copying Artifact bytes unnecessarily.
* They MAY:
* Represent `Artifact.bytes` as immutable views over underlying buffers (e.g. memory-mapped files, shared segments).
* Pass those views directly to co-located components instead of serializing/deserializing repeatedly.
* Delay or avoid materializing full `ArtifactBytes` unless required.
These optimizations are invisible at the ASL/1-STORE level as long as:
* returned `Artifact`s satisfy ASL/1-CORE equality; and
* `put`/`get` semantics remain as defined.
### 5.2 Across StoreInstances (inter-store transfer)
ASL/1-STORE does not define a transfer protocol, but the **logical meaning** of transferring an Artifact from `S_src` to `S_dst` is:
1. `artifact = S_src.get(ref_src)`
2. If the `get` fails with `ERR_NOT_FOUND`, the transfer fails with `ERR_NOT_FOUND`.
3. Otherwise, `ref_dst = S_dst.put(artifact)`
4. Return `ref_dst`.
If `S_src` and `S_dst` share the same `StoreConfig` (`encoding_profile` and `hash_id`), then:
* For any `Artifact`, `ref_dst` MUST equal `ref_src`.
If they differ (e.g. different hash or encoding), then:
* `ref_dst` MAY differ from `ref_src` while still identifying an Artifact identical in the ASL/1 sense.
* Higher layers (e.g. overlays, provenance profiles) MAY track both references.
Implementations **SHOULD** send only necessary data (canonical bytes or equivalent) and deduplicate at the destination by `Reference`.
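The logical transfer can be sketched as a `get` followed by a `put`. This is an illustration under stated assumptions: the `MemoryStore` class and stand-in encoding are hypothetical, SHA-256 and SHA-512 merely play the role of two different hash algorithms, and the identifier `0x0002` is invented for the example rather than taken from `HASH/ASL1`.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

Artifact = Tuple[bytes, Optional[int]]  # (bytes, type_tag)
Reference = Tuple[int, bytes]           # (hash_id, digest)


class NotFoundError(Exception):
    """Models ERR_NOT_FOUND."""


class IntegrityError(Exception):
    """Models ERR_INTEGRITY."""


def encode_stand_in(artifact: Artifact) -> bytes:
    """Illustrative injective encoding; NOT the ASL_ENC_CORE_V1 layout."""
    data, type_tag = artifact
    header = b"\x00" if type_tag is None else b"\x01" + type_tag.to_bytes(4, "big")
    return header + len(data).to_bytes(8, "big") + data


class MemoryStore:
    def __init__(self, hash_id: int, hash_fn: Callable[[bytes], bytes]) -> None:
        self.hash_id = hash_id
        self._hash_fn = hash_fn
        self.mapping: Dict[Reference, Artifact] = {}

    def put(self, artifact: Artifact) -> Reference:
        ref = (self.hash_id, self._hash_fn(encode_stand_in(artifact)))
        existing = self.mapping.setdefault(ref, artifact)
        if existing != artifact:
            raise IntegrityError("collision under this Reference")
        return ref

    def get(self, ref: Reference) -> Artifact:
        if ref not in self.mapping:
            raise NotFoundError(ref)
        return self.mapping[ref]


def transfer(src: MemoryStore, dst: MemoryStore, ref_src: Reference) -> Reference:
    artifact = src.get(ref_src)  # NotFoundError propagates to the caller
    return dst.put(artifact)     # destination derives its own Reference


def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()


src = MemoryStore(0x0001, sha256)
same_config = MemoryStore(0x0001, sha256)
ref_src = src.put((b"abc", None))
print(transfer(src, same_config, ref_src) == ref_src)  # True
```

When the destination uses a different hash, the returned Reference differs but still resolves to an identical Artifact, which is why higher layers may want to record both.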
---
## 6. Interaction with Other Layers (Informative)
### 6.1 PEL/1 (Primitive Execution Layer)
PEL/1 typically:
* Uses a co-located StoreInstance for:
* fetching input Artifacts by `Reference` (`get`), and
* persisting outputs and ExecutionResults (`put`).
Given ASL/1-STORE semantics:
* PEL/1 can rely on `get(ref)` to be pure and deterministic for a fixed snapshot.
* PEL/1 can rely on `put(artifact)` to be idempotent and to provide a stable `Reference` used elsewhere (e.g. in TGK edges, receipts, or facts).
ASL/1-STORE does not constrain PEL/1 scheduling, side effects, or execution policies.
### 6.2 TGK/1-CORE (Trace Graph Kernel)
TGK/1-CORE treats StoreInstances as one of many possible sources of Artifacts:
* EdgeArtifacts and other provenance-relevant Artifacts may be stored in ASL/1-STORE.
* TGK/1-CORE then builds a `ProvenanceGraph` over their `Reference`s.
ASL/1-STORE provides:
* stable `put`/`get` semantics for resolving `Reference -> Artifact`;
* immutability guarantees that underpin TGK's projection invariants.
### 6.3 CIL/1, FCT/1, FER/1, OI/1
Certification, transaction, and overlay layers:
* Use `put` to persist certificate Artifacts, fact Artifacts, evidence bundles, overlay records, etc.
* Use `get` to resolve `Reference`s when verifying proofs, reconstructing receipts, or answering queries.
They rely on ASL/1-STORE to:
* maintain consistent mappings for `Reference -> Artifact`;
* avoid silent collisions;
* distinguish `ERR_NOT_FOUND` vs `ERR_INTEGRITY` vs `ERR_UNSUPPORTED` at the storage boundary.
---
## 7. Conformance
An implementation is **ASL/1-STORE-conformant** if, for each StoreInstance it exposes, it satisfies all of the following:
1. **StoreConfig correctness**
* Associates a well-defined `StoreConfig` (`encoding_profile`, `hash_id`) with each StoreInstance.
* Uses that configuration consistently for all `put` and internal identity-related operations in a StoreSnapshot.
2. **Correct `put` semantics**
* Implements `put(Artifact)` as in §3.1:
* derives `Reference` via ASL/1-CORE's canonical encoding and hashing rule for its `StoreConfig`;
* ensures `STORE/PUT-IDEMP/CORE/1` and `STORE/PUT-NO-ALIAS/CORE/1`.
3. **Correct `get` semantics**
* Implements `get(Reference)` as in §3.2:
* if a mapping exists, returns an Artifact identical (ASL/1-CORE equality) to the stored value;
* if no mapping exists, returns `ERR_NOT_FOUND`.
* Guarantees `STORE/GET-PURE/CORE/1` for any fixed StoreSnapshot.
4. **Integrity handling**
* Detects and surfaces integrity violations as `ERR_INTEGRITY` (or stricter), consistent with `STORE/INTEGRITY-NO-SILENCE/CORE/1`.
* Does not silently accept collisions or identity-breaking inconsistencies.
5. **Identity preservation**
* Ensures that any `(ref, artifact)` mapping established by `put` is consistent with ASL/1-CORE's definition of `Reference` for the configured `StoreConfig`.
* Does not introduce alternate identity notions (e.g. “object IDs”, “paths”) that override or replace `Reference` at this layer.
6. **Separation of logical semantics from implementation**
* Treats physical layout, caching, chunking, and replication as internal concerns that do not alter the logical `put`/`get` behavior.
* Does not require clients to know about file paths, DB keys, or internal topologies for correctness.
7. **Profile compatibility (if claimed)**
* If the implementation claims compatibility with specific encoding profiles (e.g. `ENC/ASL1-CORE v1`) and hash families (`HASH/ASL1`), it actually implements them according to those specifications.
* Any additional surfaces (e.g. “multi-profile stores”, “multi-hash stores”) are documented as separate layers or profiles and do not violate the core semantics above.
Everything else — transport design, API shape, performance characteristics, distribution, and operational policies — lies outside ASL/1-STORE and may be specified by separate documents and implementation guides.
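Some of the observable criteria above lend themselves to black-box checks. The following is a rough sketch of such a check, not an official conformance suite: `check_core_behavior` and `DictStore` are hypothetical names, the store under test is assumed to expose `put` and `get` methods and to raise a caller-supplied exception type for `ERR_NOT_FOUND`, and SHA-256 over the raw bytes is only a placeholder for the real encoding and hash.

```python
import hashlib
from typing import Any, Dict, Optional, Tuple, Type

Artifact = Tuple[bytes, Optional[int]]  # (bytes, type_tag)
Reference = Tuple[int, bytes]           # (hash_id, digest)


def check_core_behavior(store: Any, artifact: Artifact, absent_ref: Reference,
                        not_found_exc: Type[Exception]) -> bool:
    """Return True if the store passes a few black-box put/get checks."""
    ref1 = store.put(artifact)
    ref2 = store.put(artifact)
    if ref1 != ref2:                       # STORE/PUT-IDEMP/CORE/1
        return False
    if store.get(ref1) != artifact:        # get returns the stored value
        return False
    if store.get(ref1) != store.get(ref1): # STORE/GET-PURE/CORE/1
        return False
    for _ in range(2):                     # absent refs fail consistently
        try:
            store.get(absent_ref)
        except not_found_exc:
            continue
        return False
    return True


class DictStore:
    """Tiny demo store; placeholder hashing over tag byte(s) plus bytes."""

    def __init__(self) -> None:
        self.mapping: Dict[Reference, Artifact] = {}

    def put(self, artifact: Artifact) -> Reference:
        data, tag = artifact
        prefix = b"\x00" if tag is None else b"\x01" + tag.to_bytes(4, "big")
        ref = (0x0001, hashlib.sha256(prefix + data).digest())
        if self.mapping.setdefault(ref, artifact) != artifact:
            raise RuntimeError("collision")  # stands in for ERR_INTEGRITY
        return ref

    def get(self, ref: Reference) -> Artifact:
        return self.mapping[ref]             # KeyError stands in for ERR_NOT_FOUND


print(check_core_behavior(DictStore(), (b"abc", None), (0x0001, b"absent"), KeyError))  # True
```

Checks like these cover only externally visible behavior; criteria such as integrity handling under corruption need fault injection or inspection of the implementation.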
---
## 8. Evolution (Informative)
ASL/1-STORE is intended to evolve **additively**:
* New encoding profiles (`EncodingProfileId`s) and hash algorithms (`HashId`s) can be introduced by `ENC/ASL1-CORE` and `HASH/ASL1` without changing ASL/1-STORE.
* New store-level profiles (e.g. “sharded store”, “append-only store”, “multi-profile store”) can be defined as long as they respect the core semantics of `put`/`get`.
ASL/1-STORE itself MUST NOT be changed in a way that:
* alters the meaning of existing `StoreConfig` combinations; or
* permits a conformant StoreInstance to associate two different Artifacts with the same `Reference` under the same configuration.
Such changes would be considered a new major surface (e.g. `ASL/2-STORE`), not an evolution of `ASL/1-STORE`.
This aligns with the broader Amduat principle:
> **Evolve by addition and explicit versioning; never rewrite identity or history.**
---
## Document History
* **0.4.0 (2025-11-16):** Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.