# ENC/ASL-CORE-INDEX/1 — Encoding Specification for ASL Core Index

Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [encoding, index, deterministic]

<!-- Source: /amduat-api/tier1/enc-asl-core-index.md | Canonical: /amduat/tier1/enc-asl-core-index-1.md -->

**Document ID:** `ENC/ASL-CORE-INDEX/1`
**Layer:** Index Encoding Profile (on top of ASL/1-CORE-INDEX + ASL/STORE-INDEX/1)

**Depends on (normative):**

* `ASL/1-CORE-INDEX`: semantic index model
* `ASL/STORE-INDEX/1`: store lifecycle and replay contracts

**Informative references:**

* `ASL/LOG/1`: append-only log semantics

© 2025 Niklas Rydberg.

## License

Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).

The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.

Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.

---

## 1. Purpose

This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.

It translates the **semantic model of ASL/1-CORE-INDEX** and the **store contracts of ASL/STORE-INDEX/1** into a deterministic **bytes-on-disk layout**.
Variable-length digest requirements are defined in ASL/1-CORE-INDEX (`tier1/asl-core-index.md`).
This document incorporates the federation encoding addendum.

It is intended for:

* C libraries
* Tools
* API frontends
* Memory-mapped access

It does **not** define:

* Index semantics (see ASL/1-CORE-INDEX)
* Store lifecycle behavior (see ASL/STORE-INDEX/1)
* Acceleration semantics (see ASL/INDEX-ACCEL/1)
* TGK edge semantics or encodings (see `TGK/1` and `TGK/1-CORE`)
* Federation semantics (see federation/domain policy layers)

---

## 2. Encoding Principles

1. **Little-endian** representation
2. **Fixed-width fields** for deterministic access
3. **No pointers or references**; all offsets are file-relative
4. **Packed structures**; no compiler-introduced padding
5. **Forward compatibility** via version field
6. **CRC or checksum protection** for corruption detection
7. **Federation metadata** embedded in index records for deterministic cross-domain replay

All multi-byte integers are little-endian unless explicitly noted.

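The little-endian rule can be honored independently of host byte order by assembling integers from individual bytes. The following is a minimal sketch; the helper names `asl_load_le16`, `asl_load_le32`, and `asl_load_le64` are illustrative assumptions and are not defined by this specification.

```c
#include <stdint.h>

/* Illustrative helpers (names are assumptions, not part of the spec).
 * Each one reads a little-endian integer from a byte buffer, so the
 * result is identical on little-endian and big-endian hosts. */
static inline uint16_t asl_load_le16(const uint8_t *p) {
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

static inline uint32_t asl_load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static inline uint64_t asl_load_le64(const uint8_t *p) {
    return (uint64_t)asl_load_le32(p) |
           ((uint64_t)asl_load_le32(p + 4) << 32);
}
```

Decoding through helpers of this kind avoids unaligned loads and keeps the reader portable; a little-endian host that maps the file directly can skip them.
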
---

## 3. Segment Layout

Each index segment file is laid out as follows:

```
+------------------+
| SegmentHeader    |
+------------------+
| BloomFilter[]    | (optional, opaque to semantics)
+------------------+
| IndexRecord[]    |
+------------------+
| DigestBytes[]    |
+------------------+
| ExtentRecord[]   |
+------------------+
| SegmentFooter    |
+------------------+
```

Boxed sketch:

```
┌───────────────────────┐
│ SegmentHeader         │
├───────────────────────┤
│ BloomFilter[] (opt)   │
├───────────────────────┤
│ IndexRecord[]         │
├───────────────────────┤
│ DigestBytes[]         │
├───────────────────────┤
│ ExtentRecord[]        │
├───────────────────────┤
│ SegmentFooter         │
└───────────────────────┘
```

* **SegmentHeader**: fixed-size, mandatory
* **BloomFilter**: optional, opaque, segment-local
* **IndexRecord[]**: array of index entries
* **DigestBytes[]**: concatenated digest bytes referenced by IndexRecord
* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
* **SegmentFooter**: fixed-size, mandatory

Offsets in the header define the locations of the Bloom filter, the index records, the digest bytes, and the extent records.

### 3.1 Fixed Constants and Sizes

**Magic bytes (SegmentHeader.magic):** `ASLIDX03`

* ASCII bytes: `0x41 0x53 0x4c 0x49 0x44 0x58 0x30 0x33`
* Little-endian uint64 value: `0x33305844494c5341`

**Current encoding version:** `3`

**Fixed struct sizes (bytes):**

* `SegmentHeader`: 112
* `IndexRecord`: 48
* `ExtentRecord`: 16
* `SegmentFooter`: 24

**Section packing (no gaps):**

* `records_offset = header_size + bloom_size`
* `digests_offset = records_offset + (record_count * sizeof(IndexRecord))`
* `extents_offset = digests_offset + digests_size`
* `SegmentFooter` starts at `extents_offset + (extent_count * sizeof(ExtentRecord))`

All offsets MUST be file-relative, 8-byte aligned, and point to their respective arrays exactly as above.

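The packing rules above can be turned into a layout calculator that also catches arithmetic overflow. This is a sketch under stated assumptions: `asl_layout`, `asl_layout_compute`, and the `ASL_*_SIZE` constants are hypothetical names, and the 8-byte alignment rule is interpreted here as applying to the three section offsets stored in the header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed sizes from section 3.1 (constant names are illustrative). */
enum {
    ASL_HEADER_SIZE        = 112,
    ASL_INDEX_RECORD_SIZE  = 48,
    ASL_EXTENT_RECORD_SIZE = 16,
    ASL_FOOTER_SIZE        = 24
};

/* Hypothetical helper type holding the derived section offsets. */
typedef struct {
    uint64_t records_offset;
    uint64_t digests_offset;
    uint64_t extents_offset;
    uint64_t footer_offset;
    uint64_t file_size;
} asl_layout;

/* a + b, failing instead of wrapping around. */
static bool add_u64(uint64_t a, uint64_t b, uint64_t *out) {
    if (a > UINT64_MAX - b) return false;
    *out = a + b;
    return true;
}

/* a * b, failing instead of wrapping around. */
static bool mul_u64(uint64_t a, uint64_t b, uint64_t *out) {
    if (b != 0 && a > UINT64_MAX / b) return false;
    *out = a * b;
    return true;
}

/* Derives the gap-free layout; returns false on overflow or misalignment. */
static bool asl_layout_compute(uint64_t bloom_size, uint64_t record_count,
                               uint64_t digests_size, uint64_t extent_count,
                               asl_layout *out) {
    uint64_t records_bytes, extents_bytes;
    if (!mul_u64(record_count, ASL_INDEX_RECORD_SIZE, &records_bytes)) return false;
    if (!mul_u64(extent_count, ASL_EXTENT_RECORD_SIZE, &extents_bytes)) return false;
    if (!add_u64(ASL_HEADER_SIZE, bloom_size, &out->records_offset)) return false;
    if (!add_u64(out->records_offset, records_bytes, &out->digests_offset)) return false;
    if (!add_u64(out->digests_offset, digests_size, &out->extents_offset)) return false;
    if (!add_u64(out->extents_offset, extents_bytes, &out->footer_offset)) return false;
    if (!add_u64(out->footer_offset, ASL_FOOTER_SIZE, &out->file_size)) return false;
    /* Assumed reading of the alignment rule: section offsets are 8-byte aligned. */
    return ((out->records_offset | out->digests_offset | out->extents_offset) % 8) == 0;
}
```

A reader could compare the computed offsets with the values stored in the header and reject the segment on any difference; a writer could use the same function to pad `bloom_size` and `digests_size` to multiples of 8 before emitting the file.
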
### 3.2 Federation Defaults

This encoding integrates federation metadata into segments and records.

Legacy segments without federation fields MUST be treated as:

* `segment_domain_id = local`
* `segment_visibility = internal`
* `domain_id = local`
* `visibility = internal`
* `has_cross_domain_source = 0`
* `cross_domain_source = 0`

**Handling rules:**

* Encoders for version 3 MUST write explicit federation fields in both
  `SegmentHeader` and `IndexRecord`; these fields are not optional in v3.
* Decoders MUST accept older versions that omit federation fields and apply the
  defaults above.
* Decoders MUST reject v3 segments if federation fields are missing, malformed,
  or contain out-of-range values (e.g., `visibility` not in {0,1} or
  `has_cross_domain_source` not in {0,1}).

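The defaults and range checks above might be expressed as follows. This is only a sketch: `asl_fed_fields` and both function names are hypothetical, and because this document does not assign a numeric value to the `local` domain, the caller passes it in as `local_domain_id`.

```c
#include <stdbool.h>
#include <stdint.h>

enum { ASL_VIS_INTERNAL = 0, ASL_VIS_PUBLISHED = 1 };

/* Hypothetical decoded view of the per-record federation fields. */
typedef struct {
    uint32_t domain_id;
    uint8_t  visibility;
    uint8_t  has_cross_domain_source;
    uint32_t cross_domain_source;
} asl_fed_fields;

/* Defaults applied to records from pre-v3 segments. The value of "local"
 * is an assumption supplied by the caller, not a constant from the spec. */
static asl_fed_fields asl_fed_legacy_defaults(uint32_t local_domain_id) {
    asl_fed_fields f = { local_domain_id, ASL_VIS_INTERNAL, 0, 0 };
    return f;
}

/* Range checks a v3 decoder applies before admitting a record. */
static bool asl_fed_fields_valid(const asl_fed_fields *f) {
    if (f->visibility > 1) return false;
    if (f->has_cross_domain_source > 1) return false;
    /* cross_domain_source must be 0 when no source is recorded (section 5). */
    if (f->has_cross_domain_source == 0 && f->cross_domain_source != 0) return false;
    return true;
}
```
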
---

## 4. SegmentHeader

```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t magic;              // Unique magic number identifying segment file type
    uint16_t version;            // Encoding version
    uint16_t shard_id;           // Optional shard identifier
    uint32_t header_size;        // Total size of header including fields below

    uint64_t snapshot_min;       // Minimum snapshot ID for which segment entries are valid
    uint64_t snapshot_max;       // Maximum snapshot ID

    uint64_t record_count;       // Number of index entries
    uint64_t records_offset;     // File offset of IndexRecord array

    uint64_t bloom_offset;       // File offset of bloom filter (0 if none)
    uint64_t bloom_size;         // Size of bloom filter (0 if none)

    uint64_t digests_offset;     // File offset of DigestBytes array
    uint64_t digests_size;       // Total size in bytes of DigestBytes

    uint64_t extents_offset;     // File offset of ExtentRecord array
    uint64_t extent_count;       // Total number of ExtentRecord entries

    uint32_t segment_domain_id;  // Domain owning this segment
    uint8_t  segment_visibility; // 0 = internal, 1 = published
    uint8_t  federation_version; // 0 if unused
    uint16_t reserved0;          // Reserved (must be 0)

    uint64_t flags;              // Segment flags (must be 0 in version 3)
} SegmentHeader;
#pragma pack(pop)
```

**Notes:**

* `magic` ensures the reader validates the segment type.
* `version` allows forward-compatible extension.
* `snapshot_min` / `snapshot_max` are reserved for future use and carry no visibility semantics in version 3.
* `segment_domain_id` identifies the owning domain for all records in this segment.
* `segment_visibility` MUST be the maximum visibility of all records in the segment.
* `federation_version` MUST be `0` unless a future federation encoding version is defined.
* `reserved0` MUST be `0`.
* `header_size` MUST be `112`.
* `flags` MUST be `0`. Readers MUST reject non-zero values.

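The fixed-value rules in the notes above can be checked directly on the raw header bytes. The sketch below is illustrative only: `rd_le` and `asl_header_v3_ok` are hypothetical names, and the byte offsets in the comments are derived from the packed `SegmentHeader` layout rather than stated by the specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t rd_le(const uint8_t *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns true if the first 112 bytes pass the fixed-value checks for a
 * version 3 header. Offsets are derived from the packed struct layout. */
static bool asl_header_v3_ok(const uint8_t *buf, size_t len) {
    if (len < 112) return false;
    if (memcmp(buf, "ASLIDX03", 8) != 0) return false; /* magic, offset 0 */
    if (rd_le(buf + 8, 2) != 3) return false;          /* version, offset 8 */
    if (rd_le(buf + 12, 4) != 112) return false;       /* header_size, offset 12 */
    if (buf[100] > 1) return false;                    /* segment_visibility, offset 100 */
    if (buf[101] != 0) return false;                   /* federation_version, offset 101 */
    if (rd_le(buf + 102, 2) != 0) return false;        /* reserved0, offset 102 */
    if (rd_le(buf + 104, 8) != 0) return false;        /* flags, offset 104 */
    return true;
}
```

A complete reader would continue with the offset and size checks from section 3.1 and the CRC check from section 7; this function covers only the constants.
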
---

## 5. IndexRecord

```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint32_t hash_id;                 // Hash algorithm identifier
    uint16_t digest_len;              // Digest length in bytes
    uint16_t reserved0;               // Reserved for alignment/future use
    uint64_t digest_offset;           // File offset of digest bytes for this entry

    uint64_t extents_offset;          // File offset of first ExtentRecord for this entry
    uint32_t extent_count;            // Number of ExtentRecord entries for this artifact
    uint32_t total_length;            // Total artifact length in bytes

    uint32_t domain_id;               // Domain identifier for this artifact
    uint8_t  visibility;              // 0 = internal, 1 = published
    uint8_t  has_cross_domain_source; // 0 or 1
    uint16_t reserved1;               // Reserved (must be 0)

    uint32_t cross_domain_source;     // Source domain if imported (valid if has_cross_domain_source=1)
    uint32_t flags;                   // Optional flags (tombstone, reserved, etc.)
} IndexRecord;
#pragma pack(pop)
```

**Notes:**

* `hash_id` + `digest_len` + `digest_offset` store the artifact key deterministically.
* `digest_len` MUST be explicit in the encoding and MUST match the length implied by `hash_id` and StoreConfig.
* `digest_offset` MUST be within `[digests_offset, digests_offset + digests_size)`.
* `extents_offset` references the first ExtentRecord for this entry.
* `extent_count` defines how many extents to read (may be 0 for tombstones; see ASL/1-CORE-INDEX in `tier1/asl-core-index.md`).
* `total_length` is the exact artifact size in bytes.
* `flags` carries status bits; version 3 defines only the tombstone bit (see 5.1).
* `domain_id` MUST be present and stable across replay.
* `visibility` MUST be `0` or `1`.
* `has_cross_domain_source` MUST be `0` or `1`.
* `cross_domain_source` MUST be `0` when `has_cross_domain_source=0`.
* `reserved0` and `reserved1` MUST be `0`.

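The range rule for `digest_offset` can be checked without risking integer wrap-around. The following is a minimal sketch; `asl_digest_in_range` is a hypothetical name, and the additional requirement that the whole digest (not only its first byte) fits inside the DigestBytes section is an assumption that goes slightly beyond the wording above.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the digest starts inside
 * [digests_offset, digests_offset + digests_size) and, as an added
 * assumption, ends inside it as well. */
static bool asl_digest_in_range(uint64_t digest_offset, uint16_t digest_len,
                                uint64_t digests_offset, uint64_t digests_size) {
    /* Reject headers whose section end would overflow 64 bits. */
    if (digests_size > UINT64_MAX - digests_offset) return false;
    uint64_t end = digests_offset + digests_size;
    if (digest_offset < digests_offset || digest_offset >= end) return false;
    /* end - digest_offset cannot underflow after the check above. */
    return (uint64_t)digest_len <= end - digest_offset;
}
```

Checking that `digest_len` matches the length implied by `hash_id` needs the hash registry and StoreConfig, which are outside this document, so it is not shown.
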
### 5.1 IndexRecord Flags

```
IDX_FLAG_TOMBSTONE = 0x00000001
```

* If `IDX_FLAG_TOMBSTONE` is set, then `extent_count`, `total_length`, and `extents_offset` MUST be `0`.
* All other bits are reserved and MUST be `0`. Readers MUST reject unknown flag bits.
* Tombstones MUST retain valid `domain_id` and `visibility` to ensure domain-local shadowing.

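The flag rules can be sketched as a single predicate. `IDX_FLAG_KNOWN_MASK` and `asl_record_flags_ok` are hypothetical names introduced for illustration; the final branch applies the section 6 rule that non-tombstone entries carry at least one extent.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDX_FLAG_TOMBSTONE  0x00000001u
/* Hypothetical helper mask: every flag bit defined in version 3. */
#define IDX_FLAG_KNOWN_MASK IDX_FLAG_TOMBSTONE

/* Validates the flags of one IndexRecord against its extent fields. */
static bool asl_record_flags_ok(uint32_t flags, uint64_t extents_offset,
                                uint32_t extent_count, uint32_t total_length) {
    /* Unknown (reserved) bits must cause rejection. */
    if (flags & ~IDX_FLAG_KNOWN_MASK) return false;
    /* A tombstone carries no extents and no length. */
    if (flags & IDX_FLAG_TOMBSTONE)
        return extents_offset == 0 && extent_count == 0 && total_length == 0;
    /* Non-tombstone entries need at least one extent (section 6). */
    return extent_count > 0;
}
```
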
---

## 6. ExtentRecord

```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id; // ASL block identifier
    uint32_t offset;   // Offset within block
    uint32_t length;   // Length of this extent
} ExtentRecord;
#pragma pack(pop)
```

**Notes:**

* Extents are concatenated in order to produce artifact bytes.
* `extent_count` MUST be > 0 for visible (non-tombstone) entries.
* `total_length` MUST equal the sum of `length` across the extents.
* `offset` and `length` MUST describe a contiguous slice within the referenced block.

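The length rule for a non-tombstone entry can be checked as sketched below. `asl_extents_match_length` is a hypothetical name; the struct is repeated only so that the example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id; /* ASL block identifier */
    uint32_t offset;   /* Offset within block */
    uint32_t length;   /* Length of this extent */
} ExtentRecord;
#pragma pack(pop)

/* True if a non-tombstone entry has at least one extent and the extent
 * lengths add up to total_length. */
static bool asl_extents_match_length(const ExtentRecord *ext,
                                     uint32_t extent_count,
                                     uint32_t total_length) {
    if (extent_count == 0) return false;
    /* A 64-bit accumulator cannot overflow: at most (2^32 - 1) extents of
     * at most (2^32 - 1) bytes each. */
    uint64_t sum = 0;
    for (uint32_t i = 0; i < extent_count; i++) sum += ext[i].length;
    return sum == total_length;
}
```

Whether `offset + length` stays inside the referenced block depends on the block size, which this encoding does not carry, so that check belongs to the layer that resolves `block_id`.
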
---

## 7. SegmentFooter

```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t crc64;         // CRC over header + bloom filter + index records + digest bytes + extents
    uint64_t seal_snapshot; // Snapshot ID when segment was sealed
    uint64_t seal_time_ns;  // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)
```

**Notes:**

* CRC ensures corruption detection during reads, covering all segment contents except the footer.
* Seal information allows deterministic reconstruction of CURRENT state.

**Implementation note:** The segment file bytes are hashed for log sealing as
defined in `ENC/ASL-LOG/1`. The hash covers the footer as written, so sealing
must occur after the footer is finalized.

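The coverage rule for `crc64` (every byte before the footer) can be illustrated as follows. Important assumption: this document does not name a CRC-64 variant, so the sketch uses CRC-64/XZ purely as a stand-in; a conforming implementation must use whatever variant the governing specification selects. `crc64_xz` and `asl_segment_crc_ok` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { ASL_FOOTER_SIZE = 24 };

/* ASSUMPTION: CRC-64/XZ (reflected polynomial 0xC96C5795D7870F42, initial
 * value and final XOR all ones). The spec does not name the variant. */
static uint64_t crc64_xz(const uint8_t *data, size_t len) {
    uint64_t crc = ~UINT64_C(0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^
                  (UINT64_C(0xC96C5795D7870F42) & (UINT64_C(0) - (crc & 1)));
    }
    return ~crc;
}

/* Compares footer.crc64 (the first 8 footer bytes, little-endian) with the
 * CRC of bytes [0, len - 24), i.e. everything except the footer. */
static bool asl_segment_crc_ok(const uint8_t *file, size_t len) {
    if (len < ASL_FOOTER_SIZE) return false;
    size_t payload = len - ASL_FOOTER_SIZE;
    uint64_t stored = 0;
    for (int i = 0; i < 8; i++)
        stored |= (uint64_t)file[payload + i] << (8 * i);
    return crc64_xz(file, payload) == stored;
}
```

The bitwise loop favors clarity over speed; a table-driven implementation of the same variant returns identical values. For the ASCII input `123456789`, CRC-64/XZ yields `0x995DC9BBDF1939FA`.
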
---

## 8. DigestBytes

* Digest bytes are concatenated in a single byte array.
* Each IndexRecord references its digest via `digest_offset` and `digest_len`.
* The digest bytes MUST be immutable once the segment is sealed.

---

## 9. Bloom Filter

* The bloom filter is **optional** and opaque to semantics.
* Its purpose is **lookup acceleration**.
* Must be deterministic: same entries → same bloom representation.
* Segment-local only; no global assumptions.

---

## 10. Versioning and Compatibility

* The `version` field in the header defines the encoding.
* Readers must **reject unsupported versions**.
* New fields may be added in future versions only via version bump.
* Existing fields must **never change meaning**.
* Version `1` implies single-extent layout (legacy).
* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
* Version `3` introduces variable-length digest bytes with `hash_id` and `digest_offset`.
* Version `3` also integrates federation metadata in segment headers and index records.

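A reader can make the version history explicit with a small capability table. This is a sketch only: `asl_version_caps` and `asl_version_lookup` are hypothetical names, and the byte layouts of versions 1 and 2 are not described in this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability summary derived from the version list above. */
typedef struct {
    bool extent_lists;     /* v2 and later: ExtentRecord lists */
    bool variable_digests; /* v3: hash_id + digest_offset */
    bool federation;       /* v3: explicit federation fields */
} asl_version_caps;

/* Fills caps for a known version; returns false for unsupported versions,
 * which the reader must reject. */
static bool asl_version_lookup(uint16_t version, asl_version_caps *caps) {
    switch (version) {
    case 1: *caps = (asl_version_caps){ false, false, false }; return true;
    case 2: *caps = (asl_version_caps){ true,  false, false }; return true;
    case 3: *caps = (asl_version_caps){ true,  true,  true  }; return true;
    default: return false;
    }
}
```

When `federation` is false, the decoder applies the defaults from section 3.2 instead of reading federation fields.
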
### 10.1 Federation Compatibility Rules

* Legacy segments without federation fields are treated as local/internal (see 3.2).
* Tombstones MUST NOT shadow artifacts from other domains; domain matching is required.

### 10.2 Error Handling (Normative)

Readers MUST treat malformed segment files as invalid and MUST reject them.
Examples include (non-exhaustive):

* Incorrect magic/version/header size
* Offsets not aligned or not pointing to the expected arrays
* Out-of-range lengths or overflows in size calculations
* CRC mismatch for the segment payload
* Invalid federation fields or flag bits

Rejected segments MUST NOT be admitted for lookup or replay. Implementations MAY
surface diagnostic errors, but MUST NOT attempt partial salvage.

---

## 11. Alignment and Packing

* All structures are **packed** (no compiler padding)
* Multi-byte integers are **little-endian**
* Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
* Extents are accessed via `IndexRecord.extents_offset` relative to the file base.

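Direct indexing into a mapped segment might look like the sketch below. It assumes a little-endian host and a header that has already been validated; `asl_record_at` is a hypothetical name, and the record is copied out with `memcpy` so the code does not depend on the alignment of the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push,1)
typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    uint16_t reserved0;
    uint64_t digest_offset;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t domain_id;
    uint8_t  visibility;
    uint8_t  has_cross_domain_source;
    uint16_t reserved1;
    uint32_t cross_domain_source;
    uint32_t flags;
} IndexRecord;
#pragma pack(pop)

/* Compile-time check that packing produced the size from section 3.1. */
static_assert(sizeof(IndexRecord) == 48, "IndexRecord must be 48 bytes");

/* Copies record i out of a mapped segment. Multi-byte fields are only
 * meaningful as-is on a little-endian host (assumption). */
static bool asl_record_at(const uint8_t *base, size_t file_size,
                          uint64_t records_offset, uint64_t record_count,
                          uint64_t i, IndexRecord *out) {
    if (i >= record_count) return false;
    if (records_offset > file_size) return false;
    /* Record i fits only if (i + 1) * 48 bytes are available. */
    if (i >= (file_size - records_offset) / sizeof *out) return false;
    memcpy(out, base + records_offset + i * sizeof *out, sizeof *out);
    return true;
}
```

On a big-endian host the same bounds logic applies, but each field would need an explicit little-endian load instead of the struct copy.
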
---

## 12. Summary of Encoding Guarantees

The ENC/ASL-CORE-INDEX/1 specification ensures:

1. **Deterministic layout** across platforms
2. **Direct mapping from semantic model** (ArtifactKey → ArtifactLocation)
3. **Immutability of sealed segments**
4. **Integrity validation** via CRC
5. **Forward-compatible extensibility**

### 12.1 Error Mapping (Informative)

Decoding failures (invalid magic/version, malformed offsets, CRC mismatch,
invalid federation fields) MUST be surfaced to callers as decode errors. The
exact error codes are implementation-specific; examples include
`ERR_ASL_INDEX_ENC_INVALID`, `ERR_ASL_INDEX_CRC_MISMATCH`, or a generic
`ERR_INTEGRITY`. Encoders/decoders MUST NOT treat malformed segments as valid
or partially recoverable.

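One possible shape for such a mapping is sketched below. The two error names come from the text above; their numeric values, the `asl_failure` classification, and `asl_map_failure` are hypothetical and implementation-specific.

```c
/* Error names taken from the text; numeric values are placeholders. */
typedef enum {
    ASL_OK = 0,
    ERR_ASL_INDEX_ENC_INVALID = 1,
    ERR_ASL_INDEX_CRC_MISMATCH = 2
} asl_decode_status;

/* Hypothetical internal classification of what the decoder found. */
typedef enum {
    ASL_FAIL_NONE,
    ASL_FAIL_MAGIC,
    ASL_FAIL_VERSION,
    ASL_FAIL_OFFSETS,
    ASL_FAIL_CRC,
    ASL_FAIL_FEDERATION
} asl_failure;

/* Maps a decoder finding to the status returned to the caller. Every
 * failure becomes a hard error; there is no "partially valid" result. */
static asl_decode_status asl_map_failure(asl_failure f) {
    switch (f) {
    case ASL_FAIL_NONE: return ASL_OK;
    case ASL_FAIL_CRC:  return ERR_ASL_INDEX_CRC_MISMATCH;
    default:            return ERR_ASL_INDEX_ENC_INVALID;
    }
}
```
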
---

## 13. Relationship to Other Layers

| Layer                | Responsibility                                             |
| -------------------- | ---------------------------------------------------------- |
| ASL/1-CORE-INDEX     | Defines semantic meaning of artifact → location mapping    |
| ASL/STORE-INDEX/1    | Defines lifecycle, visibility, and replay contracts        |
| ASL/INDEX-ACCEL/1    | Defines routing, filters, sharding (observationally inert) |
| ENC/ASL-CORE-INDEX/1 | Defines exact bytes-on-disk format for segment persistence |

This completes the stack: **semantics → store behavior → encoding**.