This is a **formal draft of ASL-INDEX-ACCEL**. It locks down the acceleration model (filters, sharding, SIMD, routing keys) while *explicitly protecting semantics*, and it is intended to be crisp, normative, and future-proof.

---

# ASL-INDEX-ACCEL

### Index Acceleration, Routing, and Filtering Semantics

---

## 1. Purpose

ASL-INDEX-ACCEL defines the **acceleration mechanisms** used by ASL-based indexes, including:

* Routing keys
* Sharding
* Filters (Bloom, XOR, Ribbon, etc.)
* SIMD execution
* Hash recasting

This document **explicitly separates correctness from performance**.

> All mechanisms defined herein are **observationally invisible** to the semantic index defined by ASL-CORE-INDEX.

---

## 2. Scope

This specification applies to:

* Artifact indexes (ASL)
* Projection and graph indexes (e.g., TGK)
* Any index layered on ASL-CORE-INDEX semantics

It does **not** define:

* Artifact or edge identity
* Snapshot semantics
* Storage lifecycle
* Encoding details (see ENC-ASL-CORE-INDEX at `tier1/enc-asl-core-index.md`)

---

## 3. Canonical Key vs Routing Key

### 3.1 Canonical Key

The **Canonical Key** uniquely identifies an indexable entity.

Examples:

* Artifact: `ArtifactKey`
* TGK Edge: `CanonicalEdgeKey`

Properties:

* Defines semantic identity
* Used for equality, shadowing, and tombstones
* Stable and immutable
* Fully compared on index match

---

### 3.2 Routing Key

The **Routing Key** is a **derived, advisory key** used exclusively for acceleration.

Properties:

* Derived deterministically from canonical key and optional attributes
* May be used for:
  * Sharding
  * Filter construction
  * SIMD-friendly layouts
* MUST NOT affect index semantics
* MUST be verified by full canonical key comparison on match

Formal rule:

```
CanonicalKey determines correctness
RoutingKey determines performance
```

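
The rule above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `Segment` whose entries are bucketed by a deliberately narrow routing key, so that collisions are common. The names `routing_key` and `Segment` and the choice of the first SHA-256 byte are illustrative assumptions, not part of this specification. The routing key only narrows the candidate set; the full canonical key comparison decides the match.

```python
import hashlib
from typing import Optional


def routing_key(canonical_key: bytes) -> int:
    """Derive an advisory routing key (assumption: first byte of SHA-256)."""
    return hashlib.sha256(canonical_key).digest()[0]


class Segment:
    """Hypothetical segment: entries bucketed by routing key."""

    def __init__(self) -> None:
        self._buckets: dict[int, list[tuple[bytes, str]]] = {}

    def put(self, canonical_key: bytes, value: str) -> None:
        bucket = self._buckets.setdefault(routing_key(canonical_key), [])
        bucket.append((canonical_key, value))

    def get(self, canonical_key: bytes) -> Optional[str]:
        # The routing key only selects candidates (performance).
        candidates = self._buckets.get(routing_key(canonical_key), [])
        for stored_key, value in candidates:
            # The full canonical key comparison decides the match (correctness).
            if stored_key == canonical_key:
                return value
        return None
```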

---

## 4. Filter Semantics

### 4.1 Advisory Nature

All filters are **advisory only**.

Rules:

* False positives are permitted
* False negatives are forbidden
* Filter behavior MUST NOT affect correctness

Formal invariant:

```
Filter miss ⇒ key is definitely absent
Filter hit ⇒ key may be present
```

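
As a rough illustration of this invariant, the sketch below uses a tiny Bloom filter as one possible filter type. The class `BloomFilter`, the function `lookup`, the bit count, and the SHA-256-based hashing are all assumptions made for the example. For brevity the same bytes serve as filter input and as dictionary key, whereas this specification feeds Routing Keys to filters. A miss lets the lookup skip the data; a hit still has to be confirmed against the real entries, so the result always equals a plain lookup.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits  # assumed to be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def lookup(filt: BloomFilter, entries: dict, key: bytes):
    if not filt.may_contain(key):
        return None  # miss: definitely absent, skip the data entirely
    return entries.get(key)  # hit: may be present, so check the real entries
```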

---

### 4.2 Filter Inputs

Filters operate over **Routing Keys**, not Canonical Keys.

A Routing Key MAY incorporate:

* Hash of Canonical Key
* Artifact type tag (`type_tag`, `has_typetag`)
* TGK edge type key
* Direction, role, or other immutable classification attributes

Absence of optional attributes MUST be encoded explicitly.

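
One way to read the explicit-absence requirement is sketched below in Python. The byte layout (8-byte hash prefix, 1-byte `has_typetag` flag, 4-byte big-endian `type_tag`), the function name `artifact_routing_key`, and the 32-bit tag width are assumptions for illustration only; the actual encoding is defined by ENC-ASL-CORE-INDEX. The point is that an absent tag and a tag whose value is zero produce different routing keys.

```python
import hashlib
import struct
from typing import Optional


def artifact_routing_key(artifact_key: bytes, type_tag: Optional[int] = None) -> bytes:
    """Encode a routing key with an explicit presence flag for type_tag.

    Assumed layout: 8-byte SHA-256 prefix, 1-byte has_typetag flag,
    4-byte big-endian type_tag (written as zero when absent).
    """
    key_hash = hashlib.sha256(artifact_key).digest()[:8]
    has_typetag = type_tag is not None
    tag_value = type_tag if has_typetag else 0
    # ">BI" packs without padding: 1 flag byte followed by 4 tag bytes.
    return key_hash + struct.pack(">BI", int(has_typetag), tag_value)
```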

---

### 4.3 Filter Construction

* Filters are built only over **sealed, immutable segments**
* Filters are immutable once built
* Filter construction MUST be deterministic
* Filter state MUST be covered by segment checksums

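
A minimal Python sketch of these construction rules follows. It assumes a single-hash bit filter and CRC32 as the segment checksum; `build_filter`, `seal_segment`, and `verify_segment` are hypothetical helpers, and a real segment format would differ. The filter depends only on the set of routing keys, not on their order, and the checksum is computed over payload and filter together so that filter corruption is detected.

```python
import hashlib
import zlib


def build_filter(routing_keys: list[bytes], num_bits: int = 256) -> bytes:
    """Build a one-hash bit filter; the result ignores input order and duplicates."""
    bits = bytearray(num_bits // 8)
    for key in routing_keys:
        pos = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % num_bits
        bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)  # immutable once built


def seal_segment(records: list[bytes], routing_keys: list[bytes]) -> dict:
    """Seal a segment: payload and filter are frozen and checksummed together."""
    payload = b"".join(records)
    filter_bytes = build_filter(routing_keys)
    checksum = zlib.crc32(payload + filter_bytes)
    return {"payload": payload, "filter": filter_bytes, "checksum": checksum}


def verify_segment(segment: dict) -> bool:
    """Recompute the checksum over payload and filter state."""
    return zlib.crc32(segment["payload"] + segment["filter"]) == segment["checksum"]
```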

---

## 5. Sharding Semantics

### 5.1 Observational Invisibility

Sharding is a **mechanical partitioning** of the index.

Invariant:

```
LogicalIndex = ⋃ all shards
```

Rules:

* Shards MUST NOT affect lookup results
* Shard count and boundaries MAY change over time
* Rebalancing MUST preserve lookup semantics

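
The invariant can be checked mechanically. The Python sketch below models the logical index as a plain dictionary and rebuilds it under several shard counts; `shard_of`, `build_shards`, `lookup`, and `logical_index` are hypothetical names, and hashing the canonical key with SHA-256 is just one of the assignment strategies this document permits. Whatever the shard count, the union of the shards equals the logical index and every lookup returns the same result.

```python
import hashlib


def shard_of(canonical_key: bytes, shard_count: int) -> int:
    digest = hashlib.sha256(canonical_key).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(index: dict[bytes, str], shard_count: int) -> list[dict[bytes, str]]:
    """Partition (or rebalance) the logical index into shard_count shards."""
    shards: list[dict[bytes, str]] = [{} for _ in range(shard_count)]
    for key, value in index.items():
        shards[shard_of(key, shard_count)][key] = value
    return shards


def lookup(shards: list[dict[bytes, str]], key: bytes):
    """Route to one shard, then match on the full canonical key."""
    return shards[shard_of(key, len(shards))].get(key)


def logical_index(shards: list[dict[bytes, str]]) -> dict[bytes, str]:
    """Union of all shards."""
    merged: dict[bytes, str] = {}
    for shard in shards:
        merged.update(shard)
    return merged
```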

---

### 5.2 Shard Assignment

Shard assignment MAY be based on:

* Hash of Canonical Key
* Routing Key
* Composite routing strategies

Shard selection MUST be deterministic per snapshot.

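
A possible reading of "deterministic per snapshot" is sketched below in Python, assuming that each snapshot pins one immutable shard layout. `ShardLayout`, `select_shard`, the snapshot ids, and the seeds are hypothetical. Shard selection is a pure function of the layout and the routing key, so every reader of the same snapshot picks the same shard, while a later snapshot is free to use a different layout. The sketch hashes with SHA-256 rather than Python's built-in `hash()`, which is salted per process for `bytes` and `str` by default and would give different readers different answers.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardLayout:
    """Hypothetical per-snapshot shard configuration."""

    shard_count: int
    seed: bytes


def select_shard(layout: ShardLayout, routing_key: bytes) -> int:
    """Pure function of (layout, routing key): no clocks, randomness, or load state."""
    digest = hashlib.sha256(layout.seed + routing_key).digest()
    return int.from_bytes(digest[:8], "big") % layout.shard_count


# Assumption: each snapshot id maps to exactly one immutable layout.
layouts = {
    41: ShardLayout(shard_count=4, seed=b"seed-a"),
    42: ShardLayout(shard_count=16, seed=b"seed-b"),
}
```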

---

## 6. Hashing and Hash Recasting

### 6.1 Hashing

Hashes MAY be used for:

* Routing
* Filtering
* SIMD layout

Hashes MUST NOT be treated as identity.

---

### 6.2 Hash Recasting

Hash recasting (changing hash functions or seeds) is permitted if all of the following hold:

1. It is deterministic
2. It does not change Canonical Keys
3. It does not affect index semantics

Recasting is equivalent to rebuilding acceleration structures.

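
The equivalence can be sketched in Python as follows, assuming a seeded hash built on keyed BLAKE2b from the standard library. `seeded_hash`, `build_buckets`, `lookup`, and the bucket count are hypothetical. Recasting here means rebuilding the buckets from the unchanged canonical entries under a new seed; bucket placement may change, but every lookup still ends in a canonical key match and returns the same result.

```python
import hashlib


def seeded_hash(seed: bytes, canonical_key: bytes) -> int:
    """Keyed BLAKE2b, truncated to 64 bits; the seed selects the hash variant."""
    digest = hashlib.blake2b(canonical_key, key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_buckets(entries: dict[bytes, str], seed: bytes, bucket_count: int = 8):
    """Rebuild the acceleration structure from canonical entries under a seed."""
    buckets: list[dict[bytes, str]] = [{} for _ in range(bucket_count)]
    for key, value in entries.items():
        buckets[seeded_hash(seed, key) % bucket_count][key] = value
    return buckets


def lookup(buckets, seed: bytes, key: bytes):
    """The hash picks a bucket; the canonical key decides the match."""
    return buckets[seeded_hash(seed, key) % len(buckets)].get(key)
```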

---

## 7. SIMD Execution

SIMD operations MAY be used to:

* Evaluate filters
* Compare routing keys
* Accelerate scans

Rules:

* SIMD MUST operate only on immutable data
* SIMD MUST NOT short-circuit semantic checks
* SIMD MUST preserve deterministic behavior

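
The structure of a SIMD-accelerated scan can be emulated in plain Python, without any real vector instructions. In the sketch below, `LANES`, `compare_lanes`, and `scan` are hypothetical names, and the routing keys are small integers chosen for illustration. The lane-wise compare produces a bitmask of candidate positions; each candidate still goes through the full canonical key comparison, so the vector step never decides a match on its own.

```python
LANES = 8  # assumption: 8 routing keys per emulated vector


def compare_lanes(routing_keys: list[int], probe: int) -> int:
    """Emulate one SIMD compare: bit i is set when lane i equals the probe."""
    mask = 0
    for lane, rk in enumerate(routing_keys):
        mask |= (rk == probe) << lane
    return mask


def scan(
    routing_keys: list[int],
    canonical_keys: list[bytes],
    probe_rk: int,
    probe_key: bytes,
) -> list[int]:
    """Return positions whose canonical key equals probe_key."""
    matches = []
    for base in range(0, len(routing_keys), LANES):
        mask = compare_lanes(routing_keys[base:base + LANES], probe_rk)
        lane = 0
        while mask:
            # A set bit is only a candidate; the canonical comparison decides.
            if mask & 1 and canonical_keys[base + lane] == probe_key:
                matches.append(base + lane)
            mask >>= 1
            lane += 1
    return matches
```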

---

## 8. Multi-Dimensional Routing Examples (Normative)

### 8.1 Artifact Index

* Canonical Key: `ArtifactKey`
* Routing Key components:
  * `H(ArtifactKey)`
  * `type_tag` (if present)
  * `has_typetag`

---

### 8.2 TGK Edge Index

* Canonical Key: `CanonicalEdgeKey`
* Routing Key components:
  * `H(CanonicalEdgeKey)`
  * `edge_type_key`
  * Direction or role (optional)

---

## 9. Snapshot Interaction

Acceleration structures:

* MUST respect snapshot visibility rules
* MUST operate over the same sealed segments visible to the snapshot
* MUST NOT bypass tombstones or shadowing

Snapshot cuts apply **after** routing and filtering.

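
The ordering can be sketched in Python under a deliberately simple model, which is an assumption of this example rather than the snapshot semantics of ASL-CORE-INDEX: a snapshot is a cut at a log sequence number, newer segments shadow older ones, and a tombstone hides older values. `SealedSegment`, `TOMBSTONE`, and `lookup` are hypothetical, and a `frozenset` of routing keys stands in for a real filter. Routing and filtering only pick candidates; visibility, shadowing, and tombstones are resolved afterwards on canonical keys.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # sentinel marking a deleted key


@dataclass
class SealedSegment:
    seq: int  # position in the log; higher is newer
    routing_keys: frozenset  # stand-in for a filter (no false negatives)
    records: dict = field(default_factory=dict)  # canonical key -> value or TOMBSTONE


def lookup(segments, snapshot_seq, canonical_key, routing_key):
    # 1. Routing and filtering: purely advisory candidate selection.
    candidates = [s for s in segments if routing_key in s.routing_keys]
    # 2. Snapshot cut: drop segments the snapshot cannot see.
    visible = [s for s in candidates if s.seq <= snapshot_seq]
    # 3. Semantics: the newest visible record wins; a tombstone hides older values.
    for segment in sorted(visible, key=lambda s: s.seq, reverse=True):
        if canonical_key in segment.records:
            value = segment.records[canonical_key]
            return None if value is TOMBSTONE else value
    return None
```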

---

## 10. Normative Invariants

1. Canonical Keys define identity and correctness
2. Routing Keys are advisory only
3. Filters MUST NOT introduce false negatives
4. Sharding is observationally invisible
5. Hashes are not identity
6. SIMD is an execution strategy, not a semantic construct
7. All acceleration is deterministic per snapshot

---

## 11. Non-Goals

This specification does not define:

* Specific filter algorithms
* Memory layout
* CPU instruction selection
* Encoding formats
* Federation policies

---

## 12. Summary

ASL-INDEX-ACCEL establishes a **strict contract**:

> *All acceleration exists to make the index faster, never different.*

By formalizing Canonical vs Routing keys and explicitly constraining filters, sharding, hashing, and SIMD, this document ensures that:

* Performance optimizations scale to billions of entries
* Semantics remain stable and provable
* Future optimizations cannot silently break correctness

---

The **next natural step** would be to:

* Draft **TGK-INDEX** using this exact framework
* Or revise **ASL-CORE-INDEX** to reference ASL-INDEX-ACCEL normatively

Both would lock the model completely.