ASL-INDEX-ACCEL
Index Acceleration, Routing, and Filtering Semantics
1. Purpose
ASL-INDEX-ACCEL defines the acceleration mechanisms used by ASL-based indexes, including:
- Routing keys
- Sharding
- Filters (Bloom, XOR, Ribbon, etc.)
- SIMD execution
- Hash recasting
This document explicitly separates correctness from performance.
All mechanisms defined herein are observationally invisible to the semantic index defined by ASL-CORE-INDEX.
2. Scope
This specification applies to:
- Artifact indexes (ASL)
- Projection and graph indexes (e.g., TGK)
- Any index layered on ASL-CORE-INDEX semantics
It does not define:
- Artifact or edge identity
- Snapshot semantics
- Storage lifecycle
- Encoding details (see ENC-ASL-CORE-INDEX at tier1/enc-asl-core-index.md)
3. Canonical Key vs Routing Key
3.1 Canonical Key
The Canonical Key uniquely identifies an indexable entity.
Examples:
- Artifact: ArtifactKey
- TGK Edge: CanonicalEdgeKey
Properties:
- Defines semantic identity
- Used for equality, shadowing, and tombstones
- Stable and immutable
- Fully compared on index match
3.2 Routing Key
The Routing Key is a derived, advisory key used exclusively for acceleration.
Properties:
- Derived deterministically from canonical key and optional attributes
- May be used for:
  - Sharding
  - Filter construction
  - SIMD-friendly layouts
- MUST NOT affect index semantics
- MUST be verified by full canonical key comparison on match
Formal rule:
CanonicalKey determines correctness
RoutingKey determines performance
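Illustrative, non-normative sketch of the rule above. It assumes a hypothetical `ArtifactKey` wrapping a digest and a deliberately narrow 8-bit routing key so that collisions are easy to provoke; neither the hash choice nor the `Segment` class comes from this specification. The point is only that a routing-key match selects candidates, and the full canonical comparison decides the result.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactKey:
    """Hypothetical canonical key: the full artifact digest."""
    digest: bytes


def routing_key(key: ArtifactKey) -> int:
    """Derive an 8-bit advisory routing key (assumed width, chosen to force collisions)."""
    return hashlib.blake2b(key.digest, digest_size=1).digest()[0]


class Segment:
    """Toy segment: entries are grouped by routing key for fast candidate selection."""

    def __init__(self):
        self._buckets = {}

    def put(self, key: ArtifactKey, value: str) -> None:
        self._buckets.setdefault(routing_key(key), []).append((key, value))

    def get(self, key: ArtifactKey):
        # The routing key narrows the search; it never decides the answer.
        for candidate, value in self._buckets.get(routing_key(key), []):
            if candidate == key:  # full canonical key comparison
                return value
        return None
```

A key that collides on the routing key but differs in its canonical bytes falls through the comparison and is reported absent, so the routing key width can change without changing any lookup result.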
4. Filter Semantics
4.1 Advisory Nature
All filters are advisory only.
Rules:
- False positives are permitted
- False negatives are forbidden
- Filter behavior MUST NOT affect correctness
Formal invariant:
Filter miss ⇒ key is definitely absent
Filter hit ⇒ key may be present
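Illustrative, non-normative sketch of the invariant above using a Bloom filter. This specification does not mandate any filter algorithm; the bit count, hash count, and the use of salted BLAKE2b are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over routing keys given as bytes."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, routing_key: bytes):
        # One salted hash per probe position; the salt is the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                routing_key, digest_size=8, salt=i.to_bytes(8, "big")
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, routing_key: bytes) -> None:
        for pos in self._positions(routing_key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, routing_key: bytes) -> bool:
        # False => definitely absent. True => possibly present, caller must verify.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(routing_key)
        )
```

Every bit set by `add` is checked by `might_contain`, so an inserted key can never miss. A hit proves nothing on its own and still requires the canonical comparison.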
4.2 Filter Inputs
Filters operate over Routing Keys, not Canonical Keys.
A Routing Key MAY incorporate:
- Hash of Canonical Key
- Artifact type tag (type_tag, has_typetag)
- TGK edge type key
- Direction, role, or other immutable classification attributes
Absence of optional attributes MUST be encoded explicitly.
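Illustrative, non-normative sketch of explicit absence encoding. The byte layout (8-byte truncated SHA-256, one presence byte, 4-byte big-endian tag) is an assumption for the example and is not the encoding defined by ENC-ASL-CORE-INDEX.

```python
import hashlib
import struct
from typing import Optional


def encode_routing_key(canonical_key: bytes, type_tag: Optional[int]) -> bytes:
    """Build routing key bytes with an explicit has_typetag flag (assumed layout)."""
    key_hash = hashlib.sha256(canonical_key).digest()[:8]
    has_typetag = 0 if type_tag is None else 1
    tag_value = 0 if type_tag is None else type_tag
    # ">BI": one unsigned byte for the flag, then a 4-byte big-endian tag.
    return key_hash + struct.pack(">BI", has_typetag, tag_value)
```

Without the flag, an artifact with no type tag and an artifact whose tag is 0 would produce identical routing key bytes. With it, `encode_routing_key(k, None)` and `encode_routing_key(k, 0)` differ in the presence byte.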
4.3 Filter Construction
- Filters are built only over sealed, immutable segments
- Filters are immutable once built
- Filter construction MUST be deterministic
- Filter state MUST be covered by segment checksums
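Illustrative, non-normative sketch of the construction rules above. It assumes a single-hash bitmap filter and a CRC32 checksum purely for brevity; the specification names neither, and the `seal_segment` and `verify_segment` helpers are hypothetical.

```python
import hashlib
import zlib


def build_filter(routing_keys, num_bits: int = 256) -> bytes:
    """Build a bitmap filter; the result depends only on the set of keys, not their order."""
    bits = bytearray(num_bits // 8)
    for rk in routing_keys:
        pos = int.from_bytes(hashlib.sha256(rk).digest()[:8], "big") % num_bits
        bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)  # immutable once built


def seal_segment(records: bytes, routing_keys) -> dict:
    """Seal a segment: the checksum covers the records and the filter state together."""
    filt = build_filter(routing_keys)
    return {"records": records, "filter": filt, "checksum": zlib.crc32(records + filt)}


def verify_segment(segment: dict) -> bool:
    """Recompute the checksum over records plus filter and compare."""
    expected = zlib.crc32(segment["records"] + segment["filter"])
    return expected == segment["checksum"]
```

Because the checksum spans the filter bytes, a corrupted filter is detected before it can produce a false negative.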
5. Sharding Semantics
5.1 Observational Invisibility
Sharding is a mechanical partitioning of the index.
Invariant:
LogicalIndex = ⋃ all shards
Rules:
- Shards MUST NOT affect lookup results
- Shard count and boundaries MAY change over time
- Rebalancing MUST preserve lookup semantics
5.2 Shard Assignment
Shard assignment MAY be based on:
- Hash of Canonical Key
- Routing Key
- Composite routing strategies
Shard selection MUST be deterministic per snapshot.
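Illustrative, non-normative sketch of observational invisibility. It assumes hash-modulo assignment over the Canonical Key, which is only one of the strategies permitted above; the function names are hypothetical.

```python
import hashlib


def shard_of(canonical_key: bytes, shard_count: int) -> int:
    """Deterministic shard assignment from a hash of the canonical key."""
    digest = hashlib.sha256(canonical_key).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def build_shards(entries: dict, shard_count: int) -> list:
    """Partition the logical index into shard_count dictionaries."""
    shards = [dict() for _ in range(shard_count)]
    for key, value in entries.items():
        shards[shard_of(key, shard_count)][key] = value
    return shards


def lookup(shards: list, canonical_key: bytes):
    """Route to one shard, then match on the full canonical key."""
    return shards[shard_of(canonical_key, len(shards))].get(canonical_key)
```

Rebuilding the same entries with 1, 4, or 7 shards yields the same union and the same answer for every lookup, which is the property rebalancing has to preserve.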
6. Hashing and Hash Recasting
6.1 Hashing
Hashes MAY be used for:
- Routing
- Filtering
- SIMD layout
Hashes MUST NOT be treated as identity.
6.2 Hash Recasting
Hash recasting (changing hash functions or seeds) is permitted if:
- It is deterministic
- It does not change Canonical Keys
- It does not affect index semantics
Recasting is equivalent to rebuilding acceleration structures.
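Illustrative, non-normative sketch of recasting as a rebuild. The seeded BLAKE2b hash, the bucket layout, and the `AcceleratedIndex` class are assumptions for the example.

```python
import hashlib


def seeded_hash(key: bytes, seed: int) -> int:
    """Hash a canonical key under a given seed (seed must fit in 8 bytes)."""
    digest = hashlib.blake2b(key, digest_size=8, salt=seed.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


class AcceleratedIndex:
    def __init__(self, entries: dict, seed: int, bucket_count: int = 8):
        self.entries = dict(entries)  # canonical keys and values; recasting never touches these
        self.seed = seed
        self.bucket_count = bucket_count
        self._rebuild()

    def _rebuild(self) -> None:
        # Acceleration structure only: buckets of canonical keys grouped by seeded hash.
        self.buckets = [[] for _ in range(self.bucket_count)]
        for key in self.entries:
            self.buckets[seeded_hash(key, self.seed) % self.bucket_count].append(key)

    def recast(self, new_seed: int) -> None:
        """Change the hash seed and rebuild the buckets from the unchanged entries."""
        self.seed = new_seed
        self._rebuild()

    def get(self, key: bytes):
        bucket = self.buckets[seeded_hash(key, self.seed) % self.bucket_count]
        for candidate in bucket:
            if candidate == key:  # full canonical comparison
                return self.entries[candidate]
        return None
```

Calling `recast` moves keys between buckets but leaves every `get` result unchanged, since the canonical entries are the only source of answers.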
7. SIMD Execution
SIMD operations MAY be used to:
- Evaluate filters
- Compare routing keys
- Accelerate scans
Rules:
- SIMD MUST operate only on immutable data
- SIMD MUST NOT short-circuit semantic checks
- SIMD MUST preserve deterministic behavior
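Illustrative, non-normative sketch of the second rule. Python has no direct SIMD access, so `lane_compare` is a scalar stand-in for a lane-wise compare that yields a hit mask; the function names and the parallel-array layout are assumptions.

```python
def lane_compare(routing_lanes, probe):
    """Scalar emulation of a lane-wise equality compare: one boolean per lane."""
    return [lane == probe for lane in routing_lanes]


def batch_lookup(routing_lanes, canonical_keys, values, probe_routing, probe_canonical):
    """Use the mask to pick candidates, then run the full canonical check on each."""
    mask = lane_compare(routing_lanes, probe_routing)
    for i, hit in enumerate(mask):
        # A mask hit is only a candidate; the semantic check is never skipped.
        if hit and canonical_keys[i] == probe_canonical:
            return values[i]
    return None
```

When two lanes share a routing key, the mask marks both, and only the canonical comparison selects the correct entry or reports absence.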
8. Multi-Dimensional Routing Examples (Normative)
8.1 Artifact Index
- Canonical Key: ArtifactKey
- Routing Key components:
  - H(ArtifactKey)
  - type_tag (if present)
  - has_typetag
8.2 TGK Edge Index
- Canonical Key: CanonicalEdgeKey
- Routing Key components:
  - H(CanonicalEdgeKey)
  - edge_type_key
  - Direction or role (optional)
9. Snapshot Interaction
Acceleration structures:
- MUST respect snapshot visibility rules
- MUST operate over the same sealed segments visible to the snapshot
- MUST NOT bypass tombstones or shadowing
Snapshot cuts apply after routing and filtering.
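Illustrative, non-normative sketch of the ordering above. Snapshot semantics are out of scope for this document, so the model here is an assumption: each sealed segment carries a hypothetical sequence number, a snapshot sees segments up to its own sequence number, and the newest visible record for a key shadows older ones.

```python
import hashlib

TOMBSTONE = object()  # sentinel marking a deleted key


def _filter_slot(key: bytes) -> int:
    return hashlib.sha256(key).digest()[0]


class SealedSegment:
    def __init__(self, seq: int, records: dict):
        self.seq = seq
        self.records = dict(records)
        # Stand-in filter: set of one-byte hashes of the keys in this segment.
        self.filter = frozenset(_filter_slot(k) for k in self.records)

    def may_contain(self, key: bytes) -> bool:
        return _filter_slot(key) in self.filter


def lookup(segments, key: bytes, snapshot_seq: int):
    # 1. Routing and filtering select candidate segments.
    candidates = [s for s in segments if s.may_contain(key)]
    # 2. The snapshot cut is applied afterwards.
    visible = [s for s in candidates if s.seq <= snapshot_seq]
    # 3. Newest visible record wins; a tombstone shadows older values.
    for seg in sorted(visible, key=lambda s: s.seq, reverse=True):
        if key in seg.records:
            value = seg.records[key]
            return None if value is TOMBSTONE else value
    return None
```

The filter only prunes segments that cannot contain the key, so a tombstone in a visible segment is always reached before any older value it shadows.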
10. Normative Invariants
- Canonical Keys define identity and correctness
- Routing Keys are advisory only
- Filters may never introduce false negatives
- Sharding is observationally invisible
- Hashes are not identity
- SIMD is an execution strategy, not a semantic construct
- All acceleration is deterministic per snapshot
11. Non-Goals
This specification does not define:
- Specific filter algorithms
- Memory layout
- CPU instruction selection
- Encoding formats
- Federation policies
12. Summary
ASL-INDEX-ACCEL establishes a strict contract:
All acceleration exists to make the index faster, never different.
By formalizing Canonical vs Routing keys and explicitly constraining filters, sharding, hashing, and SIMD, this document ensures that:
- Performance optimizations scale to billions of entries
- Semantics remain stable and provable
- Future optimizations cannot silently break correctness