# ENC-ASL-CORE-INDEX

### Encoding Specification for ASL Core Index

---

## 1. Purpose

This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.

It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
Variable-length digest requirements are defined in ASL/1-CORE-INDEX (`tier1/asl-core-index.md`).
This document incorporates the federation encoding addendum.

It is intended for:

* C libraries
* Tools
* API frontends
* Memory-mapped access

It does **not** define:

* Index semantics (see ASL/1-CORE-INDEX)
* Store lifecycle behavior (see ASL-STORE-INDEX)
* Acceleration semantics (see ASL/INDEX-ACCEL/1)
* TGK edge semantics or encodings (see `TGK/1` and `TGK/1-CORE`)
* Federation semantics (see federation/domain policy layers)

---

## 2. Encoding Principles

1. **Little-endian** representation
2. **Fixed-width fields** for deterministic access
3. **No pointers or references**; all offsets are file-relative
4. **Packed structures**; no compiler-introduced padding
5. **Forward compatibility** via version field
6. **CRC or checksum protection** for corruption detection
7. **Federation metadata** embedded in index records for deterministic cross-domain replay

All multi-byte integers are little-endian unless explicitly noted.

---

## 3. Segment Layout

Each index segment file is laid out as follows:

```
+------------------+
| SegmentHeader    |
+------------------+
| BloomFilter[]    | (optional, opaque to semantics)
+------------------+
| IndexRecord[]    |
+------------------+
| DigestBytes[]    |
+------------------+
| ExtentRecord[]   |
+------------------+
| SegmentFooter    |
+------------------+
```

* **SegmentHeader**: fixed-size, mandatory
* **BloomFilter**: optional, opaque, segment-local
* **IndexRecord[]**: array of index entries
* **DigestBytes[]**: concatenated digest bytes referenced by IndexRecord
* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
* **SegmentFooter**: fixed-size, mandatory

Offsets in the header define locations of Bloom filter and index records.

### 3.1 Fixed Constants and Sizes

**Magic bytes (SegmentHeader.magic):** `ASLIDX03`

* ASCII bytes: `0x41 0x53 0x4c 0x49 0x44 0x58 0x30 0x33`
* Little-endian uint64 value: `0x33305844494c5341`

**Current encoding version:** `3`

**Fixed struct sizes (bytes):**

* `SegmentHeader`: 112
* `IndexRecord`: 48
* `ExtentRecord`: 16
* `SegmentFooter`: 24

**Section packing (no gaps):**

* `records_offset = header_size + bloom_size`
* `digests_offset = records_offset + (record_count * sizeof(IndexRecord))`
* `extents_offset = digests_offset + digests_size`
* `SegmentFooter` starts at `extents_offset + (extent_count * sizeof(ExtentRecord))`

All offsets MUST be file-relative, 8-byte aligned, and point to their respective arrays exactly as above.

### 3.2 Federation Defaults

This encoding integrates federation metadata into segments and records.
Legacy segments without federation fields MUST be treated as:

* `segment_domain_id = local`
* `segment_visibility = internal`
* `domain_id = local`
* `visibility = internal`
* `has_cross_domain_source = 0`
* `cross_domain_source = 0`

---

## 4. SegmentHeader

```c
#pragma pack(push,1)
typedef struct {
    uint64_t magic;              // Unique magic number identifying segment file type
    uint16_t version;            // Encoding version
    uint16_t shard_id;           // Optional shard identifier
    uint32_t header_size;        // Total size of header including fields below

    uint64_t snapshot_min;       // Minimum snapshot ID for which segment entries are valid
    uint64_t snapshot_max;       // Maximum snapshot ID

    uint64_t record_count;       // Number of index entries
    uint64_t records_offset;     // File offset of IndexRecord array

    uint64_t bloom_offset;       // File offset of bloom filter (0 if none)
    uint64_t bloom_size;         // Size of bloom filter (0 if none)

    uint64_t digests_offset;     // File offset of DigestBytes array
    uint64_t digests_size;       // Total size in bytes of DigestBytes

    uint64_t extents_offset;     // File offset of ExtentRecord array
    uint64_t extent_count;       // Total number of ExtentRecord entries

    uint32_t segment_domain_id;  // Domain owning this segment
    uint8_t  segment_visibility; // 0 = internal, 1 = published
    uint8_t  federation_version; // 0 if unused
    uint16_t reserved0;          // Reserved (must be 0)

    uint64_t flags;              // Segment flags (must be 0 in version 3)
} SegmentHeader;
#pragma pack(pop)
```

**Notes:**

* `magic` ensures the reader validates the segment type.
* `version` allows forward-compatible extension.
* `snapshot_min` / `snapshot_max` are reserved for future use and carry no visibility semantics in version 3.
* `segment_domain_id` identifies the owning domain for all records in this segment.
* `segment_visibility` MUST be the maximum visibility of all records in the segment.
* `federation_version` MUST be `0` unless a future federation encoding version is defined.
* `reserved0` MUST be `0`.
* `header_size` MUST be `112`.
* `flags` MUST be `0`. Readers MUST reject non-zero values.

---

## 5. IndexRecord

```c
#pragma pack(push,1)
typedef struct {
    uint32_t hash_id;                 // Hash algorithm identifier
    uint16_t digest_len;              // Digest length in bytes
    uint16_t reserved0;               // Reserved for alignment/future use
    uint64_t digest_offset;           // File offset of digest bytes for this entry

    uint64_t extents_offset;          // File offset of first ExtentRecord for this entry
    uint32_t extent_count;            // Number of ExtentRecord entries for this artifact
    uint32_t total_length;            // Total artifact length in bytes

    uint32_t domain_id;               // Domain identifier for this artifact
    uint8_t  visibility;              // 0 = internal, 1 = published
    uint8_t  has_cross_domain_source; // 0 or 1
    uint16_t reserved1;               // Reserved (must be 0)
    uint32_t cross_domain_source;     // Source domain if imported (valid if has_cross_domain_source=1)

    uint32_t flags;                   // Optional flags (tombstone, reserved, etc.)
} IndexRecord;
#pragma pack(pop)
```

**Notes:**

* `hash_id` + `digest_len` + `digest_offset` store the artifact key deterministically.
* `digest_len` MUST be explicit in the encoding and MUST match the length implied by `hash_id` and StoreConfig.
* `digest_offset` MUST be within `[digests_offset, digests_offset + digests_size)`.
* `extents_offset` references the first ExtentRecord for this entry.
* `extent_count` defines how many extents to read (may be 0 for tombstones; see ASL/1-CORE-INDEX in `tier1/asl-core-index.md`).
* `total_length` is the exact artifact size in bytes.
* Flags may indicate tombstone or other special status.
* `domain_id` MUST be present and stable across replay.
* `visibility` MUST be `0` or `1`.
* `has_cross_domain_source` MUST be `0` or `1`.
* `cross_domain_source` MUST be `0` when `has_cross_domain_source=0`.
* `reserved0` and `reserved1` MUST be `0`.

### 5.1 IndexRecord Flags

```
IDX_FLAG_TOMBSTONE = 0x00000001
```

* If `IDX_FLAG_TOMBSTONE` is set, then `extent_count`, `total_length`, and `extents_offset` MUST be `0`.
* All other bits are reserved and MUST be `0`. Readers MUST reject unknown flag bits.
* Tombstones MUST retain valid `domain_id` and `visibility` to ensure domain-local shadowing.

---

## 6. ExtentRecord

```c
#pragma pack(push,1)
typedef struct {
    uint64_t block_id; // ASL block identifier
    uint32_t offset;   // Offset within block
    uint32_t length;   // Length of this extent
} ExtentRecord;
#pragma pack(pop)
```

**Notes:**

* Extents are concatenated in order to produce artifact bytes.
* `extent_count` MUST be > 0 for visible (non-tombstone) entries.
* `total_length` MUST equal the sum of `length` across the extents.
* `offset` and `length` MUST describe a contiguous slice within the referenced block.

---

## 7. SegmentFooter

```c
#pragma pack(push,1)
typedef struct {
    uint64_t crc64;         // CRC over header + bloom filter + index records + digest bytes + extents
    uint64_t seal_snapshot; // Snapshot ID when segment was sealed
    uint64_t seal_time_ns;  // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)
```

**Notes:**

* CRC ensures corruption detection during reads, covering all segment contents except the footer.
* Seal information allows deterministic reconstruction of CURRENT state.

---

## 8. DigestBytes

* Digest bytes are concatenated in a single byte array.
* Each IndexRecord references its digest via `digest_offset` and `digest_len`.
* The digest bytes MUST be immutable once the segment is sealed.

---

## 9. Bloom Filter

* The bloom filter is **optional** and opaque to semantics.
* Its purpose is **lookup acceleration**.
* Must be deterministic: same entries → same bloom representation.
* Segment-local only; no global assumptions.

---

## 10. Versioning and Compatibility

* `version` field in header defines encoding.
* Readers must **reject unsupported versions**.
* New fields may be added in future versions only via version bump.
* Existing fields must **never change meaning**.
* Version `1` implies single-extent layout (legacy).
* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
* Version `3` introduces variable-length digest bytes with `hash_id` and `digest_offset`.
* Version `3` also integrates federation metadata in segment headers and index records.

### 10.1 Federation Compatibility Rules

* Legacy segments without federation fields are treated as local/internal (see 3.2).
* Tombstones MUST NOT shadow artifacts from other domains; domain matching is required.

---

## 11. Alignment and Packing

* All structures are **packed** (no compiler padding)
* Multi-byte integers are **little-endian**
* Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
* Extents are accessed via `IndexRecord.extents_offset` relative to the file base.

---

## 12. Summary of Encoding Guarantees

The ENC-ASL-CORE-INDEX specification ensures:

1. **Deterministic layout** across platforms
2. **Direct mapping from semantic model** (ArtifactKey → ArtifactLocation)
3. **Immutability of sealed segments**
4. **Integrity validation** via CRC
5. **Forward-compatible extensibility**

---

## 13. Relationship to Other Layers

| Layer              | Responsibility                                             |
| ------------------ | ---------------------------------------------------------- |
| ASL/1-CORE-INDEX   | Defines semantic meaning of artifact → location mapping    |
| ASL-STORE-INDEX    | Defines lifecycle, visibility, and replay contracts        |
| ASL/INDEX-ACCEL/1  | Defines routing, filters, sharding (observationally inert) |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |

This completes the stack: **semantics → store behavior → encoding**.
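
As a non-normative illustration of the magic value and the section packing rules in Section 3.1, the sketch below decodes the magic from raw bytes and derives the section offsets from the header counts. The names `asl_load_le64`, `asl_magic_ok`, `asl_layout`, and `asl_compute_layout` are hypothetical helpers introduced only for this example and are not part of the specification. The sketch assumes that the 8-byte alignment rule applies to the three section offsets, and it works on plain integers rather than packed structs so that it does not depend on host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fixed struct sizes in bytes, taken from Section 3.1. */
enum {
    ASL_HEADER_SIZE = 112,
    ASL_RECORD_SIZE = 48,
    ASL_EXTENT_SIZE = 16,
    ASL_FOOTER_SIZE = 24
};

/* Hypothetical helper: decode a little-endian uint64 on any host. */
uint64_t asl_load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: check the first 8 bytes of a segment file. */
bool asl_magic_ok(const uint8_t *file_start)
{
    return memcmp(file_start, "ASLIDX03", 8) == 0;
}

/* Hypothetical container for the offsets derived in Section 3.1. */
typedef struct {
    uint64_t records_offset;
    uint64_t digests_offset;
    uint64_t extents_offset;
    uint64_t footer_offset;
    uint64_t file_size;
} asl_layout;

/* Add two uint64 values; returns false if the sum wrapped around. */
static bool add_u64(uint64_t a, uint64_t b, uint64_t *sum)
{
    *sum = a + b;
    return *sum >= a;
}

/* Derive section offsets with no gaps; fails on overflow or if a
 * section offset is not 8-byte aligned (assumed reading of 3.1). */
bool asl_compute_layout(uint64_t bloom_size, uint64_t record_count,
                        uint64_t digests_size, uint64_t extent_count,
                        asl_layout *out)
{
    if (record_count > UINT64_MAX / ASL_RECORD_SIZE ||
        extent_count > UINT64_MAX / ASL_EXTENT_SIZE)
        return false;

    uint64_t records_bytes = record_count * ASL_RECORD_SIZE;
    uint64_t extents_bytes = extent_count * ASL_EXTENT_SIZE;
    asl_layout l;

    if (!add_u64(ASL_HEADER_SIZE, bloom_size, &l.records_offset) ||
        !add_u64(l.records_offset, records_bytes, &l.digests_offset) ||
        !add_u64(l.digests_offset, digests_size, &l.extents_offset) ||
        !add_u64(l.extents_offset, extents_bytes, &l.footer_offset) ||
        !add_u64(l.footer_offset, ASL_FOOTER_SIZE, &l.file_size))
        return false;

    /* Any low bit set in any offset means it is misaligned. */
    if (((l.records_offset | l.digests_offset | l.extents_offset) & 7u) != 0)
        return false;

    *out = l;
    return true;
}
```

For a segment with no bloom filter, 2 records, 64 digest bytes, and 3 extents, this yields `records_offset = 112`, `digests_offset = 208`, `extents_offset = 272`, a footer at 320, and a total file size of 344 bytes. A reader could compare these derived values against the offsets stored in the header and reject the segment on any mismatch.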