Carl Niklas Rydberg b47b914224 Scaffold C layout and ASL registry model

2025-12-19 19:22:40 +01:00

ASL/1-STORE — Content-Addressable Store (Core)

Status: Approved
Owner: Niklas Rydberg
Version: 0.4.0
SoT: Yes
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [deterministic, import-export]

Document ID: ASL/1-STORE
Layer: L0.5 — Store model over ASL/1-CORE (above value model, below execution/provenance)

Depends on (normative):

ASL/1-CORE v0.3.x — value substrate: Artifact, Reference, TypeTag, identity model

Informative references:

ENC/ASL1-CORE v1.0.x — canonical encodings for Artifact / Reference (e.g. ASL_ENC_CORE_V1)
HASH/ASL1 v0.2.x — ASL1 hash family (HashId, e.g. HASH-ASL1-256)
TGK/1-CORE v0.7.x — trace graph kernel over Reference
PEL/1 — execution substrate (uses ASL/1-STORE for I/O)
CIL/1, FCT/1, FER/1, OI/1 — profiles that rely on content-addressable storage

Versioning note: ASL/1-STORE is agnostic to minor revisions of these informative documents, provided they preserve:

the ASL/1-CORE definitions of Artifact, Reference, TypeTag, and identity, and

the existence of at least one canonical encoding and hash configuration usable for reference derivation.

License

Except where otherwise noted, this document (text and diagrams) is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId assignments, EdgeTypeId tables) are additionally made available under CC0 1.0 Universal (CC0) to enable unrestricted reuse in implementations and derivative specifications.

Code examples in this document are provided under the Apache License 2.0 unless explicitly stated otherwise. Test vectors, where present, are dedicated to the public domain under CC0 1.0.

0. Conventions

0.1 RFC 2119 terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL are to be interpreted as described in RFC 2119.

0.2 Terms from ASL/1-CORE

This specification reuses the following concepts from ASL/1-CORE:

Artifact

Artifact {
  bytes:    OctetString
  type_tag: optional TypeTag
}

Reference

Reference {
  hash_id: HashId
  digest:  OctetString
}

TypeTag — uint32 hint for intended interpretation of Artifact.bytes.
HashId — uint16 identifying a hash algorithm (e.g. in HASH/ASL1).

Where this document says ArtifactRef, it means an ASL/1 Reference that logically identifies an Artifact under the identity rules of ASL/1-CORE.
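As a minimal sketch of these two value types, the following Python models them as immutable dataclasses. The field name `data` (standing in for Artifact.bytes) and the range checks are illustrative assumptions, not part of ASL/1-CORE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    data: bytes                      # corresponds to Artifact.bytes
    type_tag: Optional[int] = None   # uint32 hint; None means "no TypeTag"

    def __post_init__(self) -> None:
        if self.type_tag is not None and not 0 <= self.type_tag <= 0xFFFFFFFF:
            raise ValueError("type_tag must fit in a uint32")


@dataclass(frozen=True)
class Reference:
    hash_id: int   # uint16 HashId
    digest: bytes

    def __post_init__(self) -> None:
        if not 0 <= self.hash_id <= 0xFFFF:
            raise ValueError("hash_id must fit in a uint16")


# An absent type_tag and a type_tag of 0 are different Artifacts.
untagged = Artifact(b"hello")
tagged = Artifact(b"hello", type_tag=0)
```

Frozen dataclasses compare field by field, which matches the "same bytes, same type_tag status and value" notion of identity used later in this document.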

0.3 Additional terminology

StoreInstance — an abstract content-addressable store implementing ASL/1-STORE semantics.
StoreConfig — the identity-related configuration of a StoreInstance (see §2.2).
StoreSnapshot — the logical state of a StoreInstance at some instant: a finite mapping from Reference to Artifact, plus its fixed StoreConfig.
ExecutionEnvironment — any deployment context (process, node, cluster) that hosts one or more StoreInstances; used only informatively.

ASL/1-STORE defines logical semantics only. Physical representation (files, DB rows, object stores), indexing, and transport are out of scope.

1. Purpose, Scope & Non-Goals

1.1 Purpose

ASL/1-STORE defines the minimal content-addressable store model over ASL/1-CORE values.

It provides:

The notion of a StoreInstance as a partial mapping:

Reference -> Artifact   // zero or one Artifact per Reference

The semantics of two core operations:
- put(Artifact) -> Reference
- get(Reference) -> Artifact | error
A small, logical error model at the store boundary.

The goals are:

Determinism: same Artifact, same configuration ⇒ same Reference and same store behavior.
Immutability: once a Reference is associated with an Artifact, that association does not change.
Separation of concerns: ASL/1-STORE defines logical behavior; physical storage and APIs are separate concerns.

STORE/CORE-MINIMAL/1 ASL/1-STORE MUST remain a thin logical layer over ASL/1-CORE: it defines what a content-addressable store is and how put/get behave; it MUST NOT embed higher-level concepts such as execution, provenance, or policy.

1.2 Non-goals

ASL/1-STORE explicitly does not define:

Concrete APIs (HTTP, gRPC, language-specific interfaces).
Authentication, authorization, tenancy, or quotas.
Replication, redundancy, durability, retention, or garbage collection policies.
Chunking, compression, encryption, or indexing strategies.
Network discovery, routing, or federation protocols.

These are the responsibility of higher-layer specifications, implementation profiles, and operational policy.

2. Core Store Model

2.1 StoreInstance as a partial mapping

At any given StoreSnapshot, a StoreInstance can be viewed as a partial function:

StoreSnapshot.M : Reference -> Artifact   // 0 or 1 Artifact per Reference

Properties:

For any given ref, StoreSnapshot.M(ref) is either:
- undefined (no stored Artifact), or
- a single Artifact value.
There are no duplicate or conflicting mappings in a single snapshot.

ASL/1-STORE does not specify how snapshots are implemented (MVCC, copy-on-write, etc.). It only constrains the logical mapping at any instant.
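One way to picture a snapshot, purely as an illustration, is a read-only copy of a dictionary. In the sketch below a Reference is modelled as a `(hash_id, digest)` tuple and an Artifact as a `(bytes, type_tag)` tuple; the digests are placeholders, and the copy-on-snapshot approach is an assumption rather than anything the specification requires.

```python
from types import MappingProxyType

# Hypothetical stand-ins: a Reference is a (hash_id, digest) tuple and an
# Artifact is a (bytes, type_tag) tuple. The digests here are placeholders.
live_mapping = {(0x0001, b"\xaa"): (b"hello", None)}

# Freeze a read-only copy: this plays the role of StoreSnapshot.M.
snapshot = MappingProxyType(dict(live_mapping))

# A later insertion changes the live store but not the earlier snapshot.
live_mapping[(0x0001, b"\xbb")] = (b"world", None)

present = snapshot.get((0x0001, b"\xaa"))   # a single Artifact value
absent = snapshot.get((0x0001, b"\xbb"))    # None models "undefined"
```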

2.2 StoreConfig (identity configuration)

Each StoreInstance has a StoreConfig that determines how References are derived:

StoreConfig {
  encoding_profile: EncodingProfileId   // e.g. ASL_ENC_CORE_V1
  hash_id:          HashId             // e.g. 0x0001 (HASH-ASL1-256)
}

Constraints:

encoding_profile MUST name a canonical encoding profile for Artifact (e.g. ASL_ENC_CORE_V1 from ENC/ASL1-CORE).
hash_id MUST identify a fixed hash algorithm (e.g. HASH-ASL1-256) whose behavior is stable as per HASH/ASL1.

STORE/CONFIG-FIXED/CORE/1 For a given StoreSnapshot, StoreConfig.encoding_profile and StoreConfig.hash_id are fixed. All put and get operations in that snapshot are interpreted relative to that configuration.

Implementations MAY support multiple configurations (e.g. separate namespaces per profile), but each StoreInstance, as seen through ASL/1-STORE, is always parameterized by a single StoreConfig.
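A hedged sketch of the "separate namespaces" idea: each distinct StoreConfig keys its own logical mapping, so every StoreInstance still sees exactly one configuration. The helper `store_instance_for`, the string form of the profile id, and the HashId `0x0002` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StoreConfig:
    encoding_profile: str   # e.g. "ASL_ENC_CORE_V1"
    hash_id: int            # e.g. 0x0001


# One logical mapping per configuration; keys and values are placeholders.
Mapping = Dict[Tuple[int, bytes], Tuple[bytes, Optional[int]]]
namespaces: Dict[StoreConfig, Mapping] = {}


def store_instance_for(config: StoreConfig) -> Mapping:
    """Return the mapping of the StoreInstance parameterized by config."""
    return namespaces.setdefault(config, {})


cfg_a = StoreConfig("ASL_ENC_CORE_V1", 0x0001)
cfg_b = StoreConfig("ASL_ENC_CORE_V1", 0x0002)  # 0x0002 is a made-up HashId
```

Because the dataclass is frozen, a configuration cannot be mutated after a StoreInstance has been created for it.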

2.3 Relationship to ASL/1-CORE identity

ASL/1-CORE defines how a Reference is derived from an Artifact given an encoding profile and hash algorithm:

ArtifactBytes = encode_P(Artifact)
digest        = H(ArtifactBytes)
Reference     = { hash_id = HID, digest = digest }

ASL/1-STORE reuses this rule unchanged. For a StoreInstance with StoreConfig = { encoding_profile = P, hash_id = HID }:

put MUST derive Reference values exactly via the ASL/1-CORE rule for (P, HID).
get MUST respect the mapping semantics defined in §3.2.

ASL/1-STORE does not introduce any new notion of identity beyond ASL/1-CORE.
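The derivation rule can be sketched as follows. Two loud assumptions: `encode_demo` is a made-up encoding used only to make the example deterministic (it is not the ASL_ENC_CORE_V1 layout), and SHA-256 is used as a stand-in hash, since this document does not state which algorithm HASH-ASL1-256 denotes.

```python
import hashlib
import struct
from typing import Optional, Tuple

HASH_ID_DEMO = 0x0001  # reuses the example HashId from section 2.2


def encode_demo(data: bytes, type_tag: Optional[int]) -> bytes:
    """Hypothetical canonical encoding; NOT the real ASL_ENC_CORE_V1 layout."""
    if type_tag is None:
        header = b"\x00"                              # no TypeTag present
    else:
        header = b"\x01" + struct.pack(">I", type_tag)  # uint32, big-endian
    return header + struct.pack(">Q", len(data)) + data


def derive_reference(data: bytes, type_tag: Optional[int] = None) -> Tuple[int, bytes]:
    artifact_bytes = encode_demo(data, type_tag)       # ArtifactBytes = encode_P(Artifact)
    digest = hashlib.sha256(artifact_bytes).digest()   # digest = H(ArtifactBytes), stand-in H
    return (HASH_ID_DEMO, digest)                      # Reference = {hash_id, digest}
```

Because the type_tag status is part of the encoded bytes, an untagged Artifact and one tagged with 0 derive different References even when their payloads match.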

3. Store Operations

ASL/1-STORE defines two mandatory logical operations:

put(Artifact) -> Reference
get(Reference) -> Artifact | error

Concrete APIs MUST be semantically equivalent to these.

3.1 `put(Artifact) -> Reference`

Logical signature:

put(artifact: Artifact) -> Reference | error

Let the StoreInstance have StoreConfig:

P = encoding_profile
HID = hash_id
H = hash algorithm associated with HID

Semantics:

Compute the canonical encoding of artifact under P:
```
ArtifactBytes = encode_P(artifact)
```

Compute the Reference under (P, H) as per ASL/1-CORE:

digest    = H(ArtifactBytes)
reference = Reference { hash_id = HID, digest = digest }

Consider the current StoreSnapshot mapping M:
- If M(reference) is undefined (no existing Artifact stored under reference):
  - Logically define M'(reference) = artifact.
  - All other mappings remain unchanged.
- If M(reference) = artifact' is defined:
  - If artifact' is identical to artifact in the ASL/1-CORE sense (same bytes, same type_tag status and value), then:
    - M' = M (no logical change).
  - If artifact' is not identical to artifact, this is a collision: the store MUST NOT silently replace or merge the artifacts. It MUST treat this as an integrity error (see §4).
Return reference (or the appropriate error in the collision case).

STORE/PUT-IDEMP/CORE/1 For a given StoreInstance and StoreConfig, repeated calls to put with identical Artifact inputs MUST always return the same Reference, and MUST NOT change the logical mapping after the first successful insertion.

STORE/PUT-NO-ALIAS/CORE/1 A StoreInstance MUST NOT associate two non-identical Artifacts with the same Reference (under its StoreConfig). Any attempt to do so MUST result in an integrity error.

Implementations MAY optimize by:

caching ArtifactBytes,
deduplicating storage,
or short-circuiting put if a Reference is known to exist.

These do not affect the logical semantics.
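The put steps above can be sketched as an in-memory store. `DemoStore`, `IntegrityError`, `encode_demo`, and `sha256_digest` are hypothetical names; the encoding is made up and SHA-256 is only a stand-in hash. The hash function is injectable so that a collision can be forced for demonstration.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

Artifact = Tuple[bytes, Optional[int]]   # (bytes, type_tag)
Reference = Tuple[int, bytes]            # (hash_id, digest)


class IntegrityError(Exception):
    """Models ERR_INTEGRITY."""


def encode_demo(artifact: Artifact) -> bytes:
    """Hypothetical canonical encoding, not the real ASL_ENC_CORE_V1."""
    data, tag = artifact
    header = b"\x00" if tag is None else b"\x01" + tag.to_bytes(4, "big")
    return header + len(data).to_bytes(8, "big") + data


def sha256_digest(encoded: bytes) -> bytes:
    return hashlib.sha256(encoded).digest()   # stand-in for the configured hash


class DemoStore:
    def __init__(self, hash_id: int = 0x0001,
                 hash_fn: Callable[[bytes], bytes] = sha256_digest) -> None:
        self.hash_id = hash_id
        self.hash_fn = hash_fn
        self.mapping: Dict[Reference, Artifact] = {}

    def put(self, artifact: Artifact) -> Reference:
        ref = (self.hash_id, self.hash_fn(encode_demo(artifact)))
        existing = self.mapping.get(ref)
        if existing is None:
            self.mapping[ref] = artifact      # M'(reference) = artifact
        elif existing != artifact:
            # Collision: never replace or merge, surface an integrity error.
            raise IntegrityError("reference already maps to a different Artifact")
        return ref                            # identical Artifact: M' = M


store = DemoStore()
first = store.put((b"hello", None))
second = store.put((b"hello", None))   # idempotent: same Reference, no change
```

Passing a deliberately weak `hash_fn` (for example one that returns a constant digest) makes the second put of a different Artifact raise `IntegrityError` while the original mapping stays intact.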

3.2 `get(Reference) -> Artifact | error`

Logical signature:

get(ref: Reference) -> Artifact | error

Let M be the current StoreSnapshot mapping.

Semantics:

If M(ref) is defined (there is a stored Artifact A):
- get(ref) MUST return an Artifact identical to A in the ASL/1-CORE sense.
If M(ref) is undefined:
- get(ref) MUST fail with ERR_NOT_FOUND (see §4.1).

STORE/GET-PURE/CORE/1 For a fixed StoreSnapshot and ref:

If M(ref) is defined as A, repeated get(ref) calls MUST return Artifacts identical to A.

If M(ref) is undefined, repeated get(ref) calls MUST consistently return ERR_NOT_FOUND, unless the mapping is changed by a subsequent put or administrative import.

ASL/1-STORE does not require that get recompute and re-verify the digest on every access. It does require that:

implementations maintain internal invariants so that M(ref) remains consistent with the ASL/1-CORE identity rule for the configured StoreConfig; and
any detected inconsistencies are treated as integrity errors (see §4.1).
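A minimal sketch of get over a plain dictionary follows. The optional `verify` flag illustrates that re-checking the digest on read is permitted but not required; the flag, the exception names, and the made-up encoding with SHA-256 as a stand-in hash are all assumptions.

```python
import hashlib
from typing import Dict, Optional, Tuple

Artifact = Tuple[bytes, Optional[int]]   # (bytes, type_tag)
Reference = Tuple[int, bytes]            # (hash_id, digest)
HASH_ID = 0x0001


class NotFoundError(Exception):
    """Models ERR_NOT_FOUND."""


class IntegrityError(Exception):
    """Models ERR_INTEGRITY."""


def derive_reference(artifact: Artifact) -> Reference:
    """Hypothetical encoding plus SHA-256 as a stand-in hash."""
    data, tag = artifact
    header = b"\x00" if tag is None else b"\x01" + tag.to_bytes(4, "big")
    encoded = header + len(data).to_bytes(8, "big") + data
    return (HASH_ID, hashlib.sha256(encoded).digest())


def get(mapping: Dict[Reference, Artifact], ref: Reference,
        verify: bool = False) -> Artifact:
    if ref not in mapping:
        raise NotFoundError(ref)              # M(ref) is undefined
    artifact = mapping[ref]
    if verify and derive_reference(artifact) != ref:
        raise IntegrityError(ref)             # detected inconsistency
    return artifact


stored = (b"hello", 7)
mapping = {derive_reference(stored): stored}
fetched = get(mapping, derive_reference(stored), verify=True)
```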

3.3 Deletion, GC, and administrative changes (informative)

ASL/1-STORE does not standardize deletion or garbage collection, but logically:

Removing a mapping ref -> Artifact transforms M(ref) from defined to undefined.
After such removal, get(ref) must return ERR_NOT_FOUND.

How and when such changes occur (manual deletion, GC, retention policies) is up to higher layers and operational policy.

4. Error Model

ASL/1-STORE defines a minimal logical error model. Concrete APIs MAY map these to exceptions, status codes, or error variants, but MUST preserve their semantics.

4.1 Core error categories

Not Found — ERR_NOT_FOUND

Condition:
- get(ref) is invoked and M(ref) is undefined in the current StoreSnapshot.

Integrity Error — ERR_INTEGRITY

Conditions include (non-exhaustive):
- A put would associate a Reference with an Artifact different from the Artifact already stored under that Reference (violating STORE/PUT-NO-ALIAS/CORE/1).
- An internal invariant check reveals that for some (ref, A) in M, the canonical encoding and hash under StoreConfig no longer produce ref for A (corruption or misconfiguration).

Behavior on integrity errors (e.g. fail-fast vs quarantine) is implementation and policy dependent, but:

STORE/INTEGRITY-NO-SILENCE/CORE/1 A StoreInstance MUST NOT silently accept or mask conditions that violate ASL/1 identity invariants. Such conditions MUST manifest as ERR_INTEGRITY (or an error that is semantically at least as strong) at the store’s API or monitoring boundary.

Unsupported Identity Configuration — ERR_UNSUPPORTED

Conditions (examples):
- A put or internal operation requires computing a Reference using an encoding_profile or hash_id that the StoreInstance does not implement.
- A get is invoked with a Reference.hash_id that the StoreInstance is not configured to support, and the StoreInstance’s policy is to reject such references rather than treating them as potentially unmapped.

Implementations MAY choose to:
- Accept unknown hash_id values but always treat them as “possibly unmapped” (effectively ERR_NOT_FOUND if no mapping exists); or
- Reject them explicitly as ERR_UNSUPPORTED.

ASL/1-STORE does not standardize I/O failures, timeouts, or auth errors; those are part of concrete API and deployment design.
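The two permitted policies for an unknown hash_id on get can be sketched as a single lookup function. The enum, the function name `lookup`, and the `reject_unknown_hash` flag are hypothetical; a real API might use exceptions or status codes instead of returning an error value.

```python
from enum import Enum
from typing import Dict, Optional, Tuple, Union

Artifact = Tuple[bytes, Optional[int]]   # (bytes, type_tag)
Reference = Tuple[int, bytes]            # (hash_id, digest)


class StoreError(Enum):
    ERR_NOT_FOUND = "ERR_NOT_FOUND"
    ERR_INTEGRITY = "ERR_INTEGRITY"
    ERR_UNSUPPORTED = "ERR_UNSUPPORTED"


def lookup(mapping: Dict[Reference, Artifact], ref: Reference,
           configured_hash_id: int,
           reject_unknown_hash: bool) -> Union[Artifact, StoreError]:
    hash_id, _digest = ref
    if hash_id != configured_hash_id and reject_unknown_hash:
        return StoreError.ERR_UNSUPPORTED     # explicit rejection policy
    # Lenient policy: an unknown hash_id is simply "possibly unmapped".
    return mapping.get(ref, StoreError.ERR_NOT_FOUND)


mapping = {(0x0001, b"\xaa"): (b"hello", None)}
foreign_ref = (0x0002, b"\xaa")   # 0x0002 is a made-up, unconfigured HashId
strict = lookup(mapping, foreign_ref, 0x0001, reject_unknown_hash=True)
lenient = lookup(mapping, foreign_ref, 0x0001, reject_unknown_hash=False)
```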

5. Locality & Data Movement (Informative)

ASL/1-STORE is a logical model, but real systems must also account for the cost of moving data. Implementations are encouraged to follow a data movement minimization principle: avoid copying or transferring Artifact bytes unless it is required.

5.1 Within a StoreInstance

Within a single StoreInstance and its co-located consumers (e.g. PEL/1 engines, TGK/1 ingestors, CIL/1 logic in the same process):

Implementations SHOULD avoid copying Artifact bytes unnecessarily.
They MAY:
- Represent Artifact.bytes as immutable views over underlying buffers (e.g. memory-mapped files, shared segments).
- Pass those views directly to co-located components instead of serializing/deserializing repeatedly.
- Delay or avoid materializing full ArtifactBytes unless required.

These optimizations are invisible at the ASL/1-STORE level as long as:

returned Artifacts satisfy ASL/1-CORE equality; and
put/get semantics remain as defined.
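As an illustration of the "immutable view" idea, Python's `memoryview` exposes a slice of a backing buffer without copying it. The buffer contents and offsets below are invented for the example; a real store might back the view with a memory-mapped file instead.

```python
import hashlib

# Stand-in for a larger backing buffer (e.g. a memory-mapped segment file).
segment = b"headerPAYLOADtrailer"

# Slicing a memoryview does not copy the underlying bytes.
artifact_view = memoryview(segment)[6:13]

# A co-located consumer can hash or compare the view directly.
digest_from_view = hashlib.sha256(artifact_view).hexdigest()
digest_from_copy = hashlib.sha256(b"PAYLOAD").hexdigest()
```

A view over a `bytes` object is read-only, so consumers cannot alter the stored Artifact through it.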

5.2 Across StoreInstances (inter-store transfer)

ASL/1-STORE does not define a transfer protocol, but the logical meaning of transferring an Artifact from S_src to S_dst is:

artifact = S_src.get(ref_src)
If artifact is ERR_NOT_FOUND, transfer fails with ERR_NOT_FOUND.
Otherwise, ref_dst = S_dst.put(artifact)
Return ref_dst.

If S_src and S_dst share the same StoreConfig (encoding_profile and hash_id), then:

For any Artifact, ref_dst MUST equal ref_src.

If they differ (e.g. different hash or encoding), then:

ref_dst MAY differ from ref_src while still identifying an Artifact identical in the ASL/1 sense.
Higher layers (e.g. overlays, provenance profiles) MAY track both references.

Implementations SHOULD send only necessary data (canonical bytes or equivalent) and deduplicate at the destination by Reference.
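The logical transfer can be sketched with two in-memory stores. `DemoStore`, `transfer`, and the exception names are hypothetical; SHA-256 and SHA-512 from `hashlib` are stand-ins for two different configured hashes, and `0x0002` is a made-up HashId.

```python
import hashlib
from typing import Dict, Optional, Tuple

Artifact = Tuple[bytes, Optional[int]]   # (bytes, type_tag)
Reference = Tuple[int, bytes]            # (hash_id, digest)


class NotFoundError(Exception):
    """Models ERR_NOT_FOUND."""


class IntegrityError(Exception):
    """Models ERR_INTEGRITY."""


class DemoStore:
    def __init__(self, hash_id: int, hash_name: str) -> None:
        self.hash_id = hash_id
        self.hash_name = hash_name            # a hashlib algorithm name
        self.mapping: Dict[Reference, Artifact] = {}

    def put(self, artifact: Artifact) -> Reference:
        data, tag = artifact
        header = b"\x00" if tag is None else b"\x01" + tag.to_bytes(4, "big")
        encoded = header + len(data).to_bytes(8, "big") + data  # made-up encoding
        ref = (self.hash_id, hashlib.new(self.hash_name, encoded).digest())
        if self.mapping.setdefault(ref, artifact) != artifact:
            raise IntegrityError(ref)
        return ref

    def get(self, ref: Reference) -> Artifact:
        if ref not in self.mapping:
            raise NotFoundError(ref)
        return self.mapping[ref]


def transfer(src: DemoStore, dst: DemoStore, ref_src: Reference) -> Reference:
    artifact = src.get(ref_src)   # propagates NotFoundError if unmapped
    return dst.put(artifact)      # destination deduplicates by Reference


src = DemoStore(0x0001, "sha256")
same_config = DemoStore(0x0001, "sha256")
other_config = DemoStore(0x0002, "sha512")
ref_src = src.put((b"hello", None))
ref_same = transfer(src, same_config, ref_src)
ref_other = transfer(src, other_config, ref_src)
```

With matching configurations the destination Reference equals the source Reference; with a different hash it differs, yet both resolve to an identical Artifact.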

6. Interaction with Other Layers (Informative)

6.1 PEL/1 (Primitive Execution Layer)

PEL/1 typically:

Uses a co-located StoreInstance for:
- fetching input Artifacts by Reference (get), and
- persisting outputs and ExecutionResults (put).

Given ASL/1-STORE semantics:

PEL/1 can rely on get(ref) to be pure and deterministic for a fixed snapshot.
PEL/1 can rely on put(artifact) to be idempotent and to provide a stable Reference used elsewhere (e.g. in TGK edges, receipts, or facts).

ASL/1-STORE does not constrain PEL/1 scheduling, side effects, or execution policies.

6.2 TGK/1-CORE (Trace Graph Kernel)

TGK/1-CORE treats StoreInstances as one of many possible sources of Artifacts:

EdgeArtifacts and other provenance-relevant Artifacts may be stored in ASL/1-STORE.
TGK/1-CORE then builds a ProvenanceGraph over their References.

ASL/1-STORE provides:

stable put/get semantics for resolving Reference -> Artifact;
immutability guarantees that underpin TGK’s projection invariants.

6.3 CIL/1, FCT/1, FER/1, OI/1

Certification, transaction, and overlay layers:

Use put to persist certificate Artifacts, fact Artifacts, evidence bundles, overlay records, etc.
Use get to resolve References when verifying proofs, reconstructing receipts, or answering queries.

They rely on ASL/1-STORE to:

maintain consistent mappings for Reference -> Artifact;
avoid silent collisions;
distinguish ERR_NOT_FOUND vs ERR_INTEGRITY vs ERR_UNSUPPORTED at the storage boundary.

7. Conformance

An implementation is ASL/1-STORE–conformant if, for each StoreInstance it exposes, it satisfies all of the following:

StoreConfig correctness
- Associates a well-defined StoreConfig (encoding_profile, hash_id) with each StoreInstance.
- Uses that configuration consistently for all put and internal identity-related operations in a StoreSnapshot.
Correct put semantics
- Implements put(Artifact) as in §3.1:
  - derives Reference via ASL/1-CORE’s canonical encoding and hashing rule for its StoreConfig;
  - ensures STORE/PUT-IDEMP/CORE/1 and STORE/PUT-NO-ALIAS/CORE/1.
Correct get semantics
- Implements get(Reference) as in §3.2:
  - if a mapping exists, returns an Artifact identical (ASL/1-CORE equality) to the stored value;
  - if no mapping exists, returns ERR_NOT_FOUND.
- Guarantees STORE/GET-PURE/CORE/1 for any fixed StoreSnapshot.
Integrity handling
- Detects and surfaces integrity violations as ERR_INTEGRITY (or stricter), consistent with STORE/INTEGRITY-NO-SILENCE/CORE/1.
- Does not silently accept collisions or identity-breaking inconsistencies.
Identity preservation
- Ensures that any (ref, artifact) mapping established by put is consistent with ASL/1-CORE’s definition of Reference for the configured StoreConfig.
- Does not introduce alternate identity notions (e.g. “object IDs”, “paths”) that override or replace Reference at this layer.
Separation of logical semantics from implementation
- Treats physical layout, caching, chunking, and replication as internal concerns that do not alter the logical put/get behavior.
- Does not require clients to know about file paths, DB keys, or internal topologies for correctness.
Profile compatibility (if claimed)
- If the implementation claims compatibility with specific encoding profiles (e.g. ENC/ASL1-CORE v1) and hash families (HASH/ASL1), it actually implements them according to those specifications.
- Any additional surfaces (e.g. “multi-profile stores”, “multi-hash stores”) are documented as separate layers or profiles and do not violate the core semantics above.

Everything else — transport design, API shape, performance characteristics, distribution, and operational policies — lies outside ASL/1-STORE and may be specified by separate documents and implementation guides.

8. Evolution (Informative)

ASL/1-STORE is intended to evolve additively:

New encoding profiles (EncodingProfileIds) and hash algorithms (HashIds) can be introduced by ENC/ASL1-CORE and HASH/ASL1 without changing ASL/1-STORE.
New store-level profiles (e.g. “sharded store”, “append-only store”, “multi-profile store”) can be defined as long as they respect the core semantics of put/get.

ASL/1-STORE itself MUST NOT be changed in a way that:

alters the meaning of existing StoreConfig combinations; or
permits a conformant StoreInstance to associate two different Artifacts with the same Reference under the same configuration.

Such changes would be considered a new major surface (e.g. ASL/2-STORE), not an evolution of ASL/1-STORE.

This aligns with the broader Amduat principle:

Evolve by addition and explicit versioning; never rewrite identity or history.

Document History

0.4.0 (2025-11-16): Registered as Tier-1 spec and aligned to the Amduat 2.0 substrate baseline.
