amduat-api/notes/ASL-INDEX-ACCEL.md


This is a **formal draft of ASL-INDEX-ACCEL**. It locks down the acceleration model (filters, sharding, SIMD, routing keys) while *explicitly protecting semantics*, and it is written to be normative.

---
# ASL-INDEX-ACCEL
### Index Acceleration, Routing, and Filtering Semantics
---
## 1. Purpose
ASL-INDEX-ACCEL defines the **acceleration mechanisms** used by ASL-based indexes, including:
* Routing keys
* Sharding
* Filters (Bloom, XOR, Ribbon, etc.)
* SIMD execution
* Hash recasting

This document **explicitly separates correctness from performance**.

> All mechanisms defined herein are **observationally invisible** to the semantic index defined by ASL-CORE-INDEX.
---
## 2. Scope
This specification applies to:
* Artifact indexes (ASL)
* Projection and graph indexes (e.g., TGK)
* Any index layered on ASL-CORE-INDEX semantics

It does **not** define:
* Artifact or edge identity
* Snapshot semantics
* Storage lifecycle
* Encoding details (see ENC-ASL-CORE-INDEX)
---
## 3. Canonical Key vs Routing Key
### 3.1 Canonical Key
The **Canonical Key** uniquely identifies an indexable entity.
Examples:
* Artifact: `ArtifactKey`
* TGK Edge: `CanonicalEdgeKey`

Properties:
* Defines semantic identity
* Used for equality, shadowing, and tombstones
* Stable and immutable
* Fully compared on index match
---
### 3.2 Routing Key
The **Routing Key** is a **derived, advisory key** used exclusively for acceleration.
Properties:
* Derived deterministically from canonical key and optional attributes
* May be used for:
  * Sharding
  * Filter construction
  * SIMD-friendly layouts
* MUST NOT affect index semantics
* MUST be verified by full canonical key comparison on match

Formal rule:
```
CanonicalKey determines correctness
RoutingKey determines performance
```
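The rule above can be sketched as a lookup that uses the routing key only to pick candidates and then confirms each one by full canonical key comparison. This is a minimal illustration, not the ASL implementation: `routing_key`, `BucketIndex`, and the deliberately narrow 8-bit key width are assumptions chosen so that collisions are guaranteed to occur.

```python
import hashlib
from typing import Optional


def routing_key(canonical_key: bytes, bits: int = 8) -> int:
    """Derive a narrow routing key (hypothetical width) from the canonical key."""
    digest = hashlib.sha256(canonical_key).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


class BucketIndex:
    """Toy index: the routing key picks a bucket, the canonical key decides the match."""

    def __init__(self) -> None:
        self._buckets = {}  # routing key -> list of (canonical key, value)

    def put(self, canonical_key: bytes, value: str) -> None:
        # No shadowing or tombstones here; this sketch only shows routing.
        bucket = self._buckets.setdefault(routing_key(canonical_key), [])
        bucket.append((canonical_key, value))

    def get(self, canonical_key: bytes) -> Optional[str]:
        for stored_key, value in self._buckets.get(routing_key(canonical_key), []):
            if stored_key == canonical_key:  # full canonical comparison
                return value
        return None


index = BucketIndex()
for i in range(1000):  # 1000 keys in at most 256 buckets: collisions are certain
    index.put(f"artifact-{i}".encode(), f"v{i}")

print(index.get(b"artifact-7"))  # v7
print(index.get(b"missing"))     # None
```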
---
## 4. Filter Semantics
### 4.1 Advisory Nature
All filters are **advisory only**.
Rules:
* False positives are permitted
* False negatives are forbidden
* Filter behavior MUST NOT affect correctness

Formal invariant:
```
Filter miss ⇒ key is definitely absent
Filter hit ⇒ key may be present
```
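As a rough illustration of this invariant, the sketch below uses a toy Bloom filter. The class `TinyBloom`, its parameters, and the `lookup` helper are hypothetical, and for brevity the key bytes stand in for the routing key. The filter is deliberately undersized so that false positives are likely, yet lookups stay correct because every hit is confirmed against the segment.

```python
import hashlib
from typing import Iterator, Optional


class TinyBloom:
    """Toy Bloom filter over routing-key bytes (illustrative parameters)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, routing_key: bytes) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + routing_key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, routing_key: bytes) -> None:
        for pos in self._positions(routing_key):
            self.bits |= 1 << pos

    def might_contain(self, routing_key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(routing_key))


def lookup(segment: dict, bloom: TinyBloom, key: bytes) -> Optional[str]:
    if not bloom.might_contain(key):
        return None          # filter miss: key is definitely absent
    return segment.get(key)  # filter hit: confirm against the segment itself


segment = {f"k{i}".encode(): f"v{i}" for i in range(200)}
bloom = TinyBloom(num_bits=64)  # undersized on purpose to provoke false positives
for key in segment:
    bloom.add(key)
```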
---
### 4.2 Filter Inputs
Filters operate over **Routing Keys**, not Canonical Keys.
A Routing Key MAY incorporate:
* Hash of Canonical Key
* Artifact type tag (`type_tag`, `has_typetag`)
* TGK edge type key
* Direction, role, or other immutable classification attributes
Absence of optional attributes MUST be encoded explicitly.
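One way to encode absence explicitly is to always emit a presence flag alongside a fixed-width tag slot. The byte layout below is invented for illustration and is not the encoding defined by ENC-ASL-CORE-INDEX; the function name, the 8-byte hash prefix, and the assumption that `type_tag` fits in 32 unsigned bits are all hypothetical.

```python
import hashlib
import struct
from typing import Optional


def artifact_routing_key(artifact_key: bytes, type_tag: Optional[int]) -> bytes:
    """Build routing-key bytes: hash prefix, has_typetag flag, then type_tag."""
    key_hash = hashlib.sha256(artifact_key).digest()[:8]
    has_typetag = 1 if type_tag is not None else 0
    tag_value = type_tag if type_tag is not None else 0
    # The flag keeps "no tag" distinct from "tag with value 0".
    return key_hash + struct.pack(">BI", has_typetag, tag_value)


untagged = artifact_routing_key(b"artifact-1", None)
tagged_zero = artifact_routing_key(b"artifact-1", 0)
print(untagged != tagged_zero)  # True
```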
---
### 4.3 Filter Construction
* Filters are built only over **sealed, immutable segments**
* Filters are immutable once built
* Filter construction MUST be deterministic
* Filter state MUST be covered by segment checksums
---
## 5. Sharding Semantics
### 5.1 Observational Invisibility
Sharding is a **mechanical partitioning** of the index.
Invariant:
```
LogicalIndex = union of all shards
```
Rules:
* Shards MUST NOT affect lookup results
* Shard count and boundaries MAY change over time
* Rebalancing MUST preserve lookup semantics
---
### 5.2 Shard Assignment
Shard assignment MAY be based on:
* Hash of Canonical Key
* Routing Key
* Composite routing strategies
Shard selection MUST be deterministic per snapshot.
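The sharding rules can be sketched with a hash-based assignment and a rebalance that changes the shard count without changing any lookup result. `shard_for`, `ShardedIndex`, and `rebalance` are hypothetical names, and hashing the canonical key with SHA-256 is just one of the permitted strategies. Python's built-in `hash()` is avoided because it is randomized per process for `bytes`, which would break determinism.

```python
import hashlib


def shard_for(canonical_key: bytes, shard_count: int) -> int:
    """Deterministic shard assignment from a hash of the canonical key."""
    digest = hashlib.sha256(canonical_key).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedIndex:
    def __init__(self, shard_count: int) -> None:
        self.shard_count = shard_count
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key: bytes, value) -> None:
        self.shards[shard_for(key, self.shard_count)][key] = value

    def get(self, key: bytes):
        return self.shards[shard_for(key, self.shard_count)].get(key)

    def logical_view(self) -> dict:
        """The logical index is the union of all shards."""
        merged = {}
        for shard in self.shards:
            merged.update(shard)
        return merged


def rebalance(index: ShardedIndex, new_shard_count: int) -> ShardedIndex:
    """Repartition into a new shard count; lookup results must not change."""
    rebuilt = ShardedIndex(new_shard_count)
    for key, value in index.logical_view().items():
        rebuilt.put(key, value)
    return rebuilt


before = ShardedIndex(4)
for i in range(100):
    before.put(f"k{i}".encode(), i)
after = rebalance(before, 7)
print(before.logical_view() == after.logical_view())  # True
```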
---
## 6. Hashing and Hash Recasting
### 6.1 Hashing
Hashes MAY be used for:
* Routing
* Filtering
* SIMD layout

Hashes MUST NOT be treated as identity.

---
### 6.2 Hash Recasting
Hash recasting (changing hash functions or seeds) is permitted only if all of the following hold:
1. It is deterministic
2. It does not change Canonical Keys
3. It does not affect index semantics
Recasting is equivalent to rebuilding acceleration structures.
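A small sketch of that equivalence, under assumed names: `seeded_hash` uses keyed BLAKE2b as a stand-in for whatever seeded hash an implementation chooses, and `build_buckets` rebuilds a bucket layout from the same canonical entries. Changing the seed moves entries between buckets, but every lookup returns the same result because matches are still decided by the canonical key.

```python
import hashlib


def seeded_hash(canonical_key: bytes, seed: int) -> int:
    """Seeded hash (keyed BLAKE2b is an illustrative choice, not mandated)."""
    h = hashlib.blake2b(canonical_key, digest_size=8, key=seed.to_bytes(8, "big"))
    return int.from_bytes(h.digest(), "big")


def build_buckets(entries: dict, seed: int, bucket_count: int = 16) -> list:
    """Rebuild the acceleration structure from canonical entries."""
    buckets = [[] for _ in range(bucket_count)]
    for key in sorted(entries):  # sorted so the layout is deterministic
        buckets[seeded_hash(key, seed) % bucket_count].append((key, entries[key]))
    return buckets


def lookup(buckets: list, seed: int, key: bytes):
    for stored_key, value in buckets[seeded_hash(key, seed) % len(buckets)]:
        if stored_key == key:  # canonical comparison, independent of the seed
            return value
    return None


entries = {f"k{i}".encode(): i for i in range(50)}
old_layout = build_buckets(entries, seed=1)
new_layout = build_buckets(entries, seed=2)  # the recast

same_results = all(
    lookup(old_layout, 1, k) == lookup(new_layout, 2, k) for k in entries
)
print(same_results)  # True
```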
---
## 7. SIMD Execution
SIMD operations MAY be used to:
* Evaluate filters
* Compare routing keys
* Accelerate scans

Rules:
* SIMD MUST operate only on immutable data
* SIMD MUST NOT short-circuit semantic checks
* SIMD MUST preserve deterministic behavior
---
## 8. Multi-Dimensional Routing Examples (Normative)
### 8.1 Artifact Index
* Canonical Key: `ArtifactKey`
* Routing Key components:
  * `H(ArtifactKey)`
  * `type_tag` (if present)
  * `has_typetag`
---
### 8.2 TGK Edge Index
* Canonical Key: `CanonicalEdgeKey`
* Routing Key components:
  * `H(CanonicalEdgeKey)`
  * `edge_type_key`
  * Direction or role (optional)
---
## 9. Snapshot Interaction
Acceleration structures:
* MUST respect snapshot visibility rules
* MUST operate over the same sealed segments visible to the snapshot
* MUST NOT bypass tombstones or shadowing
Snapshot cuts apply **after** routing and filtering.
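The ordering can be illustrated with a toy model. Everything here is an assumption made for the sketch: snapshots are modeled as a maximum visible segment sequence number, `SealedSegment` and `fingerprint` are hypothetical, and the per-segment filter is a coarse fingerprint set standing in for a real filter. Snapshot semantics themselves are defined elsewhere, not by this document.

```python
import hashlib

TOMBSTONE = object()  # marker for a deleted key


def fingerprint(key: bytes) -> int:
    """Coarse 4-bit fingerprint: false positives possible, no false negatives."""
    return hashlib.sha256(key).digest()[0] % 16


class SealedSegment:
    def __init__(self, seq: int, entries: dict) -> None:
        self.seq = seq
        self.entries = dict(entries)  # canonical key -> value or TOMBSTONE
        self.filter = frozenset(fingerprint(k) for k in self.entries)


def lookup(segments: list, snapshot_seq: int, key: bytes):
    # 1. Filtering: drop segments that definitely do not contain the key.
    candidates = [s for s in segments if fingerprint(key) in s.filter]
    # 2. Snapshot cut: keep only segments visible to this snapshot.
    visible = [s for s in candidates if s.seq <= snapshot_seq]
    # 3. Shadowing and tombstones: the newest visible entry wins.
    for segment in sorted(visible, key=lambda s: s.seq, reverse=True):
        if key in segment.entries:
            value = segment.entries[key]
            return None if value is TOMBSTONE else value
    return None


segments = [
    SealedSegment(1, {b"a": "v1", b"b": "b1"}),
    SealedSegment(2, {b"a": "v2"}),        # shadows the earlier value of a
    SealedSegment(3, {b"a": TOMBSTONE}),   # deletes a
]
print(lookup(segments, 1, b"a"))  # v1
print(lookup(segments, 2, b"a"))  # v2
print(lookup(segments, 3, b"a"))  # None
```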
---
## 10. Normative Invariants
1. Canonical Keys define identity and correctness
2. Routing Keys are advisory only
3. Filters MUST NOT introduce false negatives
4. Sharding is observationally invisible
5. Hashes are not identity
6. SIMD is an execution strategy, not a semantic construct
7. All acceleration is deterministic per snapshot
---
## 11. Non-Goals
This specification does not define:
* Specific filter algorithms
* Memory layout
* CPU instruction selection
* Encoding formats
* Federation policies
---
## 12. Summary
ASL-INDEX-ACCEL establishes a **strict contract**:
> *All acceleration exists to make the index faster, never different.*

By formalizing Canonical vs Routing keys and explicitly constraining filters, sharding, hashing, and SIMD, this document ensures that:
* Performance optimizations can scale to very large indexes without changing lookup results
* Semantics remain stable and provable
* Future optimizations cannot silently break correctness
---