Below is a **formal draft of ASL-INDEX-ACCEL**, written to lock down the acceleration model (filters, sharding, SIMD, routing keys) while *explicitly protecting semantics*. This document is intentionally crisp, normative, and future-proof.

---

# ASL-INDEX-ACCEL

### Index Acceleration, Routing, and Filtering Semantics

---

## 1. Purpose

ASL-INDEX-ACCEL defines the **acceleration mechanisms** used by ASL-based indexes, including:

* Routing keys
* Sharding
* Filters (Bloom, XOR, Ribbon, etc.)
* SIMD execution
* Hash recasting

This document **explicitly separates correctness from performance**.

> All mechanisms defined herein are **observationally invisible** to the semantic index defined by ASL-CORE-INDEX.

---

## 2. Scope

This specification applies to:

* Artifact indexes (ASL)
* Projection and graph indexes (e.g., TGK)
* Any index layered on ASL-CORE-INDEX semantics

It does **not** define:

* Artifact or edge identity
* Snapshot semantics
* Storage lifecycle
* Encoding details (see ENC-ASL-CORE-INDEX)

---

## 3. Canonical Key vs Routing Key

### 3.1 Canonical Key

The **Canonical Key** uniquely identifies an indexable entity.

Examples:

* Artifact: `ArtifactKey`
* TGK Edge: `CanonicalEdgeKey`

Properties:

* Defines semantic identity
* Used for equality, shadowing, and tombstones
* Stable and immutable
* Fully compared on index match

---

### 3.2 Routing Key

The **Routing Key** is a **derived, advisory key** used exclusively for acceleration.

Properties:

* Derived deterministically from the Canonical Key and optional attributes
* May be used for:

  * Sharding
  * Filter construction
  * SIMD-friendly layouts
* MUST NOT affect index semantics
* A Routing Key match MUST be verified by full Canonical Key comparison

Formal rule:

```
CanonicalKey determines correctness
RoutingKey determines performance
```
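
A non-normative sketch of this rule in Python, assuming a hypothetical in-memory `Index` whose buckets are addressed by a deliberately narrow routing key. The names `Entry`, `Index`, and `routing_key`, and the choice of SHA-256, are illustrative assumptions and not part of this specification. Routing key collisions are frequent here by design, yet lookups stay correct because the canonical key comparison decides every match.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    canonical_key: bytes  # semantic identity
    value: str


def routing_key(canonical_key: bytes, bits: int = 8) -> int:
    """Derive a narrow, advisory routing key (hypothetical derivation)."""
    digest = hashlib.sha256(canonical_key).digest()
    return int.from_bytes(digest[:4], "big") % (1 << bits)


class Index:
    def __init__(self) -> None:
        self._buckets = {}  # routing key -> list of entries

    def put(self, key: bytes, value: str) -> None:
        self._buckets.setdefault(routing_key(key), []).append(Entry(key, value))

    def get(self, key: bytes) -> Optional[str]:
        # The routing key only narrows the candidate set ...
        for entry in reversed(self._buckets.get(routing_key(key), [])):
            # ... the full canonical key comparison decides the match.
            if entry.canonical_key == key:
                return entry.value
        return None
```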

---

## 4. Filter Semantics

### 4.1 Advisory Nature

All filters are **advisory only**.

Rules:

* False positives are permitted
* False negatives are forbidden
* Filter behavior MUST NOT affect correctness

Formal invariant:

```
Filter miss ⇒ key is definitely absent
Filter hit  ⇒ key may be present
```
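
The invariant can be illustrated with a minimal Bloom filter, one of the filter families named in Section 1. This is a non-normative sketch: the class `BloomFilter`, the helper `lookup`, the sizing defaults, and the use of SHA-256 are assumptions made for the example, and keys are treated directly as routing key bytes. A miss skips the segment; a hit falls through to the segment, which remains the authority.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, routing_key: bytes):
        # One bit position per hash function; the index byte acts as a seed.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + routing_key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, routing_key: bytes) -> None:
        for pos in self._positions(routing_key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, routing_key: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(routing_key)
        )


def lookup(segment: dict, bloom: BloomFilter, key: bytes):
    if not bloom.might_contain(key):
        return None          # filter miss: key is definitely absent
    return segment.get(key)  # filter hit: key may be present, segment decides
```

Because every added key sets all of its bit positions, `might_contain` cannot return `False` for a stored key, so `lookup` returns the same result as consulting the segment directly.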

---

### 4.2 Filter Inputs

Filters operate over **Routing Keys**, not Canonical Keys.

A Routing Key MAY incorporate:

* Hash of Canonical Key
* Artifact type tag (`type_tag`, `has_typetag`)
* TGK edge type key
* Direction, role, or other immutable classification attributes

Absence of optional attributes MUST be encoded explicitly.
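
One way to read the explicit-absence rule, sketched in Python. The byte layout below (8-byte hash, 1-byte presence flag, 4-byte tag) and the function name `artifact_routing_key` are hypothetical; actual encodings are out of scope here (see ENC-ASL-CORE-INDEX). The point of the sketch is that an artifact without a type tag and an artifact with `type_tag = 0` must not produce the same routing key.

```python
import hashlib
import struct
from typing import Optional


def artifact_routing_key(canonical_key: bytes, type_tag: Optional[int] = None) -> bytes:
    """Build a fixed-width routing key (hypothetical layout, 13 bytes)."""
    key_hash = hashlib.sha256(canonical_key).digest()[:8]
    has_typetag = type_tag is not None
    # The presence flag is always written, so "no tag" and "tag 0" differ.
    tag_field = struct.pack(">BI", int(has_typetag), type_tag if has_typetag else 0)
    return key_hash + tag_field
```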

---

### 4.3 Filter Construction

* Filters are built only over **sealed, immutable segments**
* Filters are immutable once built
* Filter construction MUST be deterministic
* Filter state MUST be covered by segment checksums
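
A non-normative sketch of these four rules together, assuming a hypothetical segment represented as a plain dictionary and a single-hash bit filter. The functions `build_filter`, `seal_segment`, and `verify_segment`, the record framing, and SHA-256 as the checksum are assumptions for illustration only. The filter is built once at seal time, its bytes depend only on the set of keys, and the checksum covers payload and filter together.

```python
import hashlib


def build_filter(routing_keys, num_bits: int = 256) -> bytes:
    """Single-hash bit filter; OR-ing bits makes the result order-independent."""
    bits = bytearray(num_bits // 8)
    for rk in routing_keys:
        pos = int.from_bytes(hashlib.sha256(rk).digest()[:8], "big") % num_bits
        bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)  # immutable once built


def seal_segment(records: dict) -> dict:
    # Sorting fixes the payload layout regardless of insertion order.
    payload = b"".join(k + b"\x00" + v for k, v in sorted(records.items()))
    filt = build_filter(records.keys())
    checksum = hashlib.sha256(payload + filt).hexdigest()
    return {"payload": payload, "filter": filt, "checksum": checksum}


def verify_segment(segment: dict) -> bool:
    expected = hashlib.sha256(segment["payload"] + segment["filter"]).hexdigest()
    return expected == segment["checksum"]
```

Under these assumptions a corrupted filter is detected by `verify_segment` before it can cause a false negative.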

---

## 5. Sharding Semantics

### 5.1 Observational Invisibility

Sharding is a **mechanical partitioning** of the index.

Invariant:

```
LogicalIndex = ⋃ all shards
```

Rules:

* Shards MUST NOT affect lookup results
* Shard count and boundaries MAY change over time
* Rebalancing MUST preserve lookup semantics
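
The invariant and rules can be checked mechanically. The sketch below is non-normative and assumes a hypothetical hash-modulo partitioning over in-memory dictionaries; `shard_of`, `build_shards`, `lookup`, and `logical_index` are illustrative names. Rebuilding the shards with a different count stands in for rebalancing.

```python
import hashlib


def shard_of(key: bytes, shard_count: int) -> int:
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(entries: dict, shard_count: int) -> list:
    shards = [dict() for _ in range(shard_count)]
    for key, value in entries.items():
        shards[shard_of(key, shard_count)][key] = value
    return shards


def lookup(shards: list, key: bytes):
    # Only the owning shard is consulted; the result must not depend on layout.
    return shards[shard_of(key, len(shards))].get(key)


def logical_index(shards: list) -> dict:
    # LogicalIndex = union of all shards
    merged = {}
    for shard in shards:
        merged.update(shard)
    return merged
```

For any shard count, `logical_index(build_shards(entries, n))` equals `entries`, and `lookup` agrees with `entries.get`.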

---

### 5.2 Shard Assignment

Shard assignment MAY be based on:

* Hash of Canonical Key
* Routing Key
* Composite routing strategies

Shard selection MUST be deterministic per snapshot.
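
A non-normative sketch of a composite strategy, assuming each snapshot pins a hypothetical `ShardLayout` (shard count plus seed). The structure, the field names, and the 4-byte encoding of `edge_type_key` are assumptions for illustration. Selection is a pure function of the layout and the key material, so repeating it within one snapshot always yields the same shard, while a later snapshot may carry a different layout.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardLayout:
    """Shard configuration pinned by one snapshot (hypothetical structure)."""
    shard_count: int
    seed: bytes


def select_shard(layout: ShardLayout, edge_type_key: int, canonical_key: bytes) -> int:
    # Composite strategy: mix an immutable attribute with the canonical key.
    material = layout.seed + edge_type_key.to_bytes(4, "big") + canonical_key
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[:8], "big") % layout.shard_count
```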

---

## 6. Hashing and Hash Recasting

### 6.1 Hashing

Hashes MAY be used for:

* Routing
* Filtering
* SIMD layout

Hashes MUST NOT be treated as identity.

---

### 6.2 Hash Recasting

Hash recasting (changing hash functions or seeds) is permitted if:

1. It is deterministic
2. It does not change Canonical Keys
3. It does not affect index semantics

Recasting is equivalent to rebuilding acceleration structures.
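
A non-normative sketch of recasting as a rebuild, assuming a hypothetical `AcceleratedIndex` that keeps canonical entries separate from seeded hash buckets. The class, the `seeded_hash` helper, and the bucket count are illustrative assumptions. Changing the seed discards and rebuilds the buckets; the canonical entries, and therefore every lookup result, are untouched.

```python
import hashlib


def seeded_hash(key: bytes, seed: int) -> int:
    digest = hashlib.sha256(seed.to_bytes(8, "big") + key).digest()
    return int.from_bytes(digest[:8], "big")


class AcceleratedIndex:
    def __init__(self, entries: dict, seed: int, bucket_count: int = 16) -> None:
        self._entries = dict(entries)  # canonical keys are never rewritten
        self._bucket_count = bucket_count
        self._rebuild(seed)

    def _rebuild(self, seed: int) -> None:
        self.seed = seed
        self.buckets = [[] for _ in range(self._bucket_count)]
        for key in self._entries:
            self.buckets[seeded_hash(key, seed) % self._bucket_count].append(key)

    def recast(self, new_seed: int) -> None:
        # Recasting only rebuilds the acceleration structure.
        self._rebuild(new_seed)

    def get(self, key: bytes):
        bucket = self.buckets[seeded_hash(key, self.seed) % self._bucket_count]
        # Membership is decided by comparing full keys, not hashes.
        return self._entries[key] if key in bucket else None
```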

---

## 7. SIMD Execution

SIMD operations MAY be used to:

* Evaluate filters
* Compare routing keys
* Accelerate scans

Rules:

* SIMD MUST operate only on immutable data
* SIMD MUST NOT short-circuit semantic checks
* SIMD MUST preserve deterministic behavior
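
The second rule is the easiest to violate, so a non-normative sketch may help. Python has no direct SIMD access, so `lane_compare` below is a scalar stand-in for a single vector compare over a block of fingerprints; `LANES`, `fingerprint`, and `scan` are hypothetical names, and the 1-byte fingerprint is chosen to force collisions. A lane hit only nominates a candidate; the canonical key comparison still runs for every candidate.

```python
import hashlib

LANES = 8  # hypothetical vector width


def fingerprint(key: bytes) -> int:
    """1-byte routing fingerprint; collisions are expected."""
    return hashlib.sha256(key).digest()[0]


def lane_compare(block, probe: int):
    """Stand-in for one vector compare: a match mask for a block of lanes."""
    return [fp == probe for fp in block]


def scan(keys, probe_key: bytes):
    fingerprints = [fingerprint(k) for k in keys]  # immutable input data
    probe = fingerprint(probe_key)
    for start in range(0, len(keys), LANES):
        mask = lane_compare(fingerprints[start:start + LANES], probe)
        for lane, hit in enumerate(mask):
            # A lane hit is only a candidate; the canonical key decides.
            if hit and keys[start + lane] == probe_key:
                return start + lane
    return None
```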

---

## 8. Multi-Dimensional Routing Examples (Normative)

### 8.1 Artifact Index

* Canonical Key: `ArtifactKey`
* Routing Key components:

  * `H(ArtifactKey)`
  * `type_tag` (if present)
  * `has_typetag`

---

### 8.2 TGK Edge Index

* Canonical Key: `CanonicalEdgeKey`
* Routing Key components:

  * `H(CanonicalEdgeKey)`
  * `edge_type_key`
  * Direction or role (optional)

---

## 9. Snapshot Interaction

Acceleration structures:

* MUST respect snapshot visibility rules
* MUST operate over the same sealed segments visible to the snapshot
* MUST NOT bypass tombstones or shadowing

Snapshot cuts apply **after** routing and filtering.
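
A non-normative sketch of the ordering, assuming hypothetical segments with a seal sequence number and an exact key set standing in for a real filter. `Segment`, `TOMBSTONE`, and `lookup` are illustrative names, and snapshot semantics themselves are defined elsewhere. Filtering narrows the candidates first, the snapshot cut is applied next, and shadowing and tombstones are resolved on what remains.

```python
TOMBSTONE = object()  # marker for a deleted key


class Segment:
    def __init__(self, seq: int, records: dict) -> None:
        self.seq = seq                  # seal order; higher is newer
        self.records = dict(records)
        # Stand-in filter: exact key set, so it has no false negatives.
        # Tombstoned keys are included, otherwise deletes could be bypassed.
        self.filter_keys = frozenset(self.records)


def lookup(segments, key, snapshot_seq: int):
    # 1. Filtering: keep segments whose filter admits the key.
    candidates = [s for s in segments if key in s.filter_keys]
    # 2. Snapshot cut: drop segments sealed after the snapshot.
    visible = [s for s in candidates if s.seq <= snapshot_seq]
    # 3. Shadowing: the newest visible record wins; a tombstone hides older ones.
    for segment in sorted(visible, key=lambda s: s.seq, reverse=True):
        if key in segment.records:
            value = segment.records[key]
            return None if value is TOMBSTONE else value
    return None
```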

---

## 10. Normative Invariants

1. Canonical Keys define identity and correctness
2. Routing Keys are advisory only
3. Filters may never introduce false negatives
4. Sharding is observationally invisible
5. Hashes are not identity
6. SIMD is an execution strategy, not a semantic construct
7. All acceleration is deterministic per snapshot

---

## 11. Non-Goals

This specification does not define:

* Specific filter algorithms
* Memory layout
* CPU instruction selection
* Encoding formats
* Federation policies

---

## 12. Summary

ASL-INDEX-ACCEL establishes a **strict contract**:

> *All acceleration exists to make the index faster, never different.*

By formalizing Canonical vs Routing keys and explicitly constraining filters, sharding, hashing, and SIMD, this document ensures that:

* Performance optimizations scale to billions of entries
* Semantics remain stable and provable
* Future optimizations cannot silently break correctness

---

If you want, the **next natural step** would be to:

* Draft **TGK-INDEX** using this exact framework
* Or revise **ASL-CORE-INDEX** to reference ASL-INDEX-ACCEL normatively

Both would lock the model completely.