ENC-ASL-CORE-INDEX
Encoding Specification for ASL Core Index
1. Purpose
This document defines the exact encoding of ASL index segments and records for storage and interoperability.
It translates the semantic model of ASL/1-CORE-INDEX and the store contracts of ASL-STORE-INDEX into a deterministic bytes-on-disk layout.
Variable-length digest requirements are defined in ASL/1-CORE-INDEX (tier1/asl-core-index.md).
This document incorporates the federation encoding addendum.
It is intended for:
- C libraries
- Tools
- API frontends
- Memory-mapped access
It does not define:
- Index semantics (see ASL/1-CORE-INDEX)
- Store lifecycle behavior (see ASL-STORE-INDEX)
- Acceleration semantics (see ASL/INDEX-ACCEL/1)
- TGK edge semantics or encodings (see TGK/1 and TGK/1-CORE)
- Federation semantics (see federation/domain policy layers)
2. Encoding Principles
- Little-endian representation
- Fixed-width fields for deterministic access
- No pointers or references; all offsets are file-relative
- Packed structures; no compiler-introduced padding
- Forward compatibility via version field
- CRC or checksum protection for corruption detection
- Federation metadata embedded in index records for deterministic cross-domain replay
All multi-byte integers are little-endian unless explicitly noted.
3. Segment Layout
Each index segment file is laid out as follows:
```
+------------------+
| SegmentHeader    |
+------------------+
| BloomFilter[]    |  (optional, opaque to semantics)
+------------------+
| IndexRecord[]    |
+------------------+
| DigestBytes[]    |
+------------------+
| ExtentRecord[]   |
+------------------+
| SegmentFooter    |
+------------------+
```
- SegmentHeader: fixed-size, mandatory
- BloomFilter: optional, opaque, segment-local
- IndexRecord[]: array of index entries
- DigestBytes[]: concatenated digest bytes referenced by IndexRecord
- ExtentRecord[]: concatenated extent lists referenced by IndexRecord
- SegmentFooter: fixed-size, mandatory
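Offsets in the header locate the Bloom filter, the IndexRecord array, the DigestBytes array, and the ExtentRecord array. The SegmentFooter follows the last extent (see 3.1).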
3.1 Fixed Constants and Sizes
Magic bytes (`SegmentHeader.magic`): `ASLIDX03`
- ASCII bytes: `0x41 0x53 0x4c 0x49 0x44 0x58 0x30 0x33`
- Little-endian `uint64` value: `0x33305844494c5341`
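The magic is defined as a little-endian `uint64`. A reader can therefore either compare the eight raw bytes or decode them portably and compare the result against the constant. The following is a minimal sketch of the second approach that relies only on the values above. `load_u64_le`, `asl_magic_ok`, and `ASL_SEG_MAGIC_U64` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* "ASLIDX03" read as a little-endian uint64 (section 3.1). */
#define ASL_SEG_MAGIC_U64 UINT64_C(0x33305844494c5341)

/* Decodes 8 bytes as a little-endian uint64, whatever the host byte order. */
static uint64_t load_u64_le(const uint8_t b[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Returns 1 if buf starts with the segment magic, 0 otherwise. */
static int asl_magic_ok(const uint8_t *buf, size_t len) {
    return len >= 8 && load_u64_le(buf) == ASL_SEG_MAGIC_U64;
}
```

Decoding byte by byte produces the same value on big-endian and little-endian hosts, so the check does not depend on host byte order.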
Current encoding version: 3
Fixed struct sizes (bytes):
- SegmentHeader: 112
- IndexRecord: 48
- ExtentRecord: 16
- SegmentFooter: 24
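These sizes can be pinned at compile time. The sketch below copies the IndexRecord and ExtentRecord definitions from sections 5 and 6 and checks them with C11 `_Static_assert` and `offsetof`. Note that `#pragma pack` is a compiler extension (supported by GCC, Clang, and MSVC), not ISO C. The choice of which fields to check is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies of the section 5 and 6 structs, packed with no padding. */
#pragma pack(push, 1)
typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    uint16_t reserved0;
    uint64_t digest_offset;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t domain_id;
    uint8_t  visibility;
    uint8_t  has_cross_domain_source;
    uint16_t reserved1;
    uint32_t cross_domain_source;
    uint32_t flags;
} IndexRecord;

typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;
#pragma pack(pop)

/* Compile-time checks: the build fails if the layout drifts. */
_Static_assert(sizeof(IndexRecord) == 48, "IndexRecord must be 48 bytes");
_Static_assert(offsetof(IndexRecord, digest_offset) == 8, "digest_offset at byte 8");
_Static_assert(offsetof(IndexRecord, extents_offset) == 16, "extents_offset at byte 16");
_Static_assert(offsetof(IndexRecord, domain_id) == 32, "domain_id at byte 32");
_Static_assert(offsetof(IndexRecord, flags) == 44, "flags at byte 44");
_Static_assert(sizeof(ExtentRecord) == 16, "ExtentRecord must be 16 bytes");
```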
Section packing (no gaps):
- `records_offset = header_size + bloom_size`
- `digests_offset = records_offset + (record_count * sizeof(IndexRecord))`
- `extents_offset = digests_offset + digests_size`
- `SegmentFooter` starts at `extents_offset + (extent_count * sizeof(ExtentRecord))`
All offsets MUST be file-relative, 8-byte aligned, and point to their respective arrays exactly as above.
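A reader can check the packing rules before trusting any offset. This sketch makes several assumptions. `SectionInfo` and `sections_packed_ok` are hypothetical, and the fields are assumed to be already decoded from the header. `bloom_offset` is taken to equal `header_size` whenever `bloom_size` is non-zero, because the bloom filter is the first section after the header. The footer is assumed to end the file, as in the layout diagram. Because the header and record sizes are multiples of 8, the alignment rule also effectively requires `bloom_size` and `digests_size` to be multiples of 8.

```c
#include <stdint.h>

/* Fixed sizes from section 3.1. */
#define HDR_SIZE    UINT64_C(112)
#define REC_SIZE    UINT64_C(48)
#define EXT_SIZE    UINT64_C(16)
#define FOOTER_SIZE UINT64_C(24)

/* Placement fields, already decoded from a SegmentHeader (hypothetical type). */
typedef struct {
    uint64_t bloom_offset, bloom_size;
    uint64_t record_count, records_offset;
    uint64_t digests_offset, digests_size;
    uint64_t extent_count, extents_offset;
} SectionInfo;

/* Returns 1 if sections are packed back to back as in section 3.1, every
 * section offset is 8-byte aligned, and the footer ends exactly at file_size. */
static int sections_packed_ok(const SectionInfo *s, uint64_t file_size) {
    /* Bound each size by the file size first, keeping the sums below far
     * from overflow for any realistic file size. */
    if (s->bloom_size > file_size || s->digests_size > file_size ||
        s->record_count > file_size / REC_SIZE || s->extent_count > file_size / EXT_SIZE)
        return 0;
    if (s->bloom_offset != (s->bloom_size ? HDR_SIZE : 0)) return 0;
    if (s->records_offset != HDR_SIZE + s->bloom_size) return 0;
    if (s->digests_offset != s->records_offset + s->record_count * REC_SIZE) return 0;
    if (s->extents_offset != s->digests_offset + s->digests_size) return 0;
    if ((s->records_offset | s->digests_offset | s->extents_offset) % 8 != 0) return 0;
    return s->extents_offset + s->extent_count * EXT_SIZE + FOOTER_SIZE == file_size;
}
```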
3.2 Federation Defaults
This encoding integrates federation metadata into segments and records.
Legacy segments without federation fields MUST be treated as:
- `segment_domain_id` = local
- `segment_visibility` = internal
- `domain_id` = local
- `visibility` = internal
- `has_cross_domain_source` = 0
- `cross_domain_source` = 0
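These defaults can be applied when a segment is loaded. This is a minimal sketch with two assumptions. First, a reader detects legacy segments by `version < 3`, since version 3 is where federation fields were introduced (section 10). Second, the caller supplies the numeric id it uses for "local", because this document does not fix one. `FederationView` and `apply_legacy_federation_defaults` are hypothetical names.

```c
#include <stdint.h>

#define VIS_INTERNAL 0u  /* 0 = internal, per the header and record field comments */

/* Federation fields as a reader might hold them in memory (hypothetical type). */
typedef struct {
    uint32_t segment_domain_id;
    uint8_t  segment_visibility;
    uint32_t domain_id;
    uint8_t  visibility;
    uint8_t  has_cross_domain_source;
    uint32_t cross_domain_source;
} FederationView;

/* Fills in the section 3.2 defaults for pre-version-3 segments. Version 3
 * segments carry the fields explicitly and are left untouched. */
static void apply_legacy_federation_defaults(FederationView *f, uint16_t version,
                                             uint32_t local_domain_id) {
    if (version >= 3)
        return;
    f->segment_domain_id = local_domain_id;
    f->segment_visibility = VIS_INTERNAL;
    f->domain_id = local_domain_id;
    f->visibility = VIS_INTERNAL;
    f->has_cross_domain_source = 0;
    f->cross_domain_source = 0;
}
```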
4. SegmentHeader
```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t magic;              // "ASLIDX03" (see 3.1); identifies the segment file type
    uint16_t version;            // Encoding version
    uint16_t shard_id;           // Optional shard identifier
    uint32_t header_size;        // Size of this header in bytes (MUST be 112)
    uint64_t snapshot_min;       // Minimum snapshot ID (reserved; no visibility semantics in version 3)
    uint64_t snapshot_max;       // Maximum snapshot ID (reserved; no visibility semantics in version 3)
    uint64_t record_count;       // Number of index entries
    uint64_t records_offset;     // File offset of IndexRecord array
    uint64_t bloom_offset;       // File offset of bloom filter (0 if none)
    uint64_t bloom_size;         // Size of bloom filter in bytes (0 if none)
    uint64_t digests_offset;     // File offset of DigestBytes array
    uint64_t digests_size;       // Total size in bytes of DigestBytes
    uint64_t extents_offset;     // File offset of ExtentRecord array
    uint64_t extent_count;       // Total number of ExtentRecord entries
    uint32_t segment_domain_id;  // Domain owning this segment
    uint8_t  segment_visibility; // 0 = internal, 1 = published
    uint8_t  federation_version; // 0 if unused
    uint16_t reserved0;          // Reserved (must be 0)
    uint64_t flags;              // Segment flags (must be 0 in version 3)
} SegmentHeader;
#pragma pack(pop)
```
Notes:
- `magic` ensures the reader validates the segment type.
- `version` allows forward-compatible extension.
- `snapshot_min`/`snapshot_max` are reserved for future use and carry no visibility semantics in version 3.
- `segment_domain_id` identifies the owning domain for all records in this segment.
- `segment_visibility` MUST be the maximum visibility of all records in the segment.
- `federation_version` MUST be `0` unless a future federation encoding version is defined.
- `reserved0` MUST be `0`.
- `header_size` MUST be `112`.
- `flags` MUST be `0`. Readers MUST reject non-zero values.
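The MUST rules above translate directly into load-time checks. Below is a minimal sketch for a reader that supports only version 3. It decodes fields from raw bytes at the offsets implied by the packed struct. `rd_le`, `check_segment_header`, and its negative return codes are hypothetical. The sketch also rejects `segment_visibility` values above 1, which follows from records being limited to `0` or `1`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian load of n bytes (n <= 8), independent of host byte order. */
static uint64_t rd_le(const uint8_t *p, unsigned n) {
    uint64_t v = 0;
    while (n--)
        v = (v << 8) | p[n];
    return v;
}

/* Byte offsets inside the 112-byte packed SegmentHeader (section 4). */
enum {
    H_VERSION = 8, H_HEADER_SIZE = 12, H_SEG_VISIBILITY = 100,
    H_FED_VERSION = 101, H_RESERVED0 = 102, H_FLAGS = 104, H_SIZE = 112
};

/* Returns 0 if the header is valid, else a negative code naming the first
 * failed check (the code values are illustrative, not part of the encoding). */
static int check_segment_header(const uint8_t *buf, size_t len) {
    if (len < H_SIZE) return -1;
    if (memcmp(buf, "ASLIDX03", 8) != 0) return -2;        /* magic */
    if (rd_le(buf + H_VERSION, 2) != 3) return -3;         /* unsupported version */
    if (rd_le(buf + H_HEADER_SIZE, 4) != H_SIZE) return -4;
    if (buf[H_SEG_VISIBILITY] > 1) return -5;              /* 0 internal, 1 published */
    if (buf[H_FED_VERSION] != 0) return -6;
    if (rd_le(buf + H_RESERVED0, 2) != 0) return -7;
    if (rd_le(buf + H_FLAGS, 8) != 0) return -8;           /* non-zero flags: reject */
    return 0;
}
```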
5. IndexRecord
```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint32_t hash_id;                 // Hash algorithm identifier
    uint16_t digest_len;              // Digest length in bytes
    uint16_t reserved0;               // Reserved for alignment/future use (must be 0)
    uint64_t digest_offset;           // File offset of digest bytes for this entry
    uint64_t extents_offset;          // File offset of first ExtentRecord for this entry
    uint32_t extent_count;            // Number of ExtentRecord entries for this artifact
    uint32_t total_length;            // Total artifact length in bytes
    uint32_t domain_id;               // Domain identifier for this artifact
    uint8_t  visibility;              // 0 = internal, 1 = published
    uint8_t  has_cross_domain_source; // 0 or 1
    uint16_t reserved1;               // Reserved (must be 0)
    uint32_t cross_domain_source;     // Source domain if imported (valid if has_cross_domain_source=1)
    uint32_t flags;                   // Record flags, e.g. tombstone (see 5.1)
} IndexRecord;
#pragma pack(pop)
```
Notes:
- `hash_id` + `digest_len` + `digest_offset` store the artifact key deterministically.
- `digest_len` MUST be explicit in the encoding and MUST match the length implied by `hash_id` and StoreConfig.
- `digest_offset` MUST be within `[digests_offset, digests_offset + digests_size)`.
- `extents_offset` references the first ExtentRecord for this entry.
- `extent_count` defines how many extents to read (MUST be 0 for tombstones, see 5.1; see also ASL/1-CORE-INDEX in tier1/asl-core-index.md).
- `total_length` is the exact artifact size in bytes.
- Flags indicate tombstone status (see 5.1); all other bits are currently reserved.
- `domain_id` MUST be present and stable across replay.
- `visibility` MUST be `0` or `1`.
- `has_cross_domain_source` MUST be `0` or `1`.
- `cross_domain_source` MUST be `0` when `has_cross_domain_source=0`.
- `reserved0` and `reserved1` MUST be `0`.
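The record-level rules can be checked once per entry. In this sketch, `RecordFields` and `record_fields_ok` are hypothetical. `expected_digest_len` stands in for the `hash_id`/StoreConfig lookup, which this document does not define. The check is deliberately stricter than the note: it requires the whole digest, not only its first byte, to lie inside DigestBytes.

```c
#include <stdint.h>

/* Subset of IndexRecord fields checked here, already decoded from disk. */
typedef struct {
    uint16_t digest_len, reserved0, reserved1;
    uint64_t digest_offset;
    uint8_t  visibility, has_cross_domain_source;
    uint32_t cross_domain_source;
} RecordFields;

/* Returns 1 if r satisfies the IndexRecord notes, 0 otherwise. */
static int record_fields_ok(const RecordFields *r, uint64_t digests_offset,
                            uint64_t digests_size, uint16_t expected_digest_len) {
    if (r->digest_len != expected_digest_len) return 0;
    if (r->digest_offset < digests_offset) return 0;
    uint64_t rel = r->digest_offset - digests_offset;
    /* Stricter than the note: [digest_offset, digest_offset + digest_len)
     * must fit entirely inside DigestBytes. */
    if (rel >= digests_size || r->digest_len > digests_size - rel) return 0;
    if (r->visibility > 1 || r->has_cross_domain_source > 1) return 0;
    if (!r->has_cross_domain_source && r->cross_domain_source != 0) return 0;
    return r->reserved0 == 0 && r->reserved1 == 0;
}
```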
5.1 IndexRecord Flags
IDX_FLAG_TOMBSTONE = 0x00000001
- If `IDX_FLAG_TOMBSTONE` is set, then `extent_count`, `total_length`, and `extents_offset` MUST be `0`.
- All other bits are reserved and MUST be `0`. Readers MUST reject unknown flag bits.
- Tombstones MUST retain valid `domain_id` and `visibility` to ensure domain-local shadowing.
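The flag rules combine with the extent rule from section 6 into a small predicate. This is a sketch: `record_flags_ok` and `IDX_FLAGS_KNOWN` are hypothetical names, and only `IDX_FLAG_TOMBSTONE` comes from this section.

```c
#include <stdint.h>

#define IDX_FLAG_TOMBSTONE UINT32_C(0x00000001)
#define IDX_FLAGS_KNOWN    IDX_FLAG_TOMBSTONE  /* every other bit is reserved */

/* Returns 1 if an entry's flags satisfy section 5.1 (and the section 6 rule
 * that live entries have at least one extent), 0 if a reader must reject it. */
static int record_flags_ok(uint32_t flags, uint32_t extent_count,
                           uint32_t total_length, uint64_t extents_offset) {
    if (flags & ~IDX_FLAGS_KNOWN)
        return 0;                               /* unknown bit set */
    if (flags & IDX_FLAG_TOMBSTONE)
        return extent_count == 0 && total_length == 0 && extents_offset == 0;
    return extent_count > 0;                    /* live entry */
}
```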
6. ExtentRecord
```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id; // ASL block identifier
    uint32_t offset;   // Byte offset within the block
    uint32_t length;   // Length of this extent in bytes
} ExtentRecord;
#pragma pack(pop)
```
Notes:
- Extents are concatenated in order to produce artifact bytes.
- `extent_count` MUST be > 0 for visible (non-tombstone) entries.
- `total_length` MUST equal the sum of `length` across the extents.
- `offset` and `length` MUST describe a contiguous slice within the referenced block.
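Below is a minimal validation sketch for a live entry's extent list. `Extent`, `extents_ok`, and the single `block_size` parameter are assumptions. This document does not say how a reader learns the size of a referenced block, so a uniform size is used here for illustration. The length sum is accumulated in 64 bits so that a corrupt list cannot wrap around to a matching 32-bit `total_length`.

```c
#include <stdint.h>

/* Decoded ExtentRecord (hypothetical in-memory mirror of section 6). */
typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} Extent;

/* Returns 1 if a live entry's extent list is consistent with total_length and
 * each slice fits in a block of block_size bytes (uniform size assumed). */
static int extents_ok(const Extent *e, uint32_t n, uint32_t total_length,
                      uint64_t block_size) {
    if (n == 0)
        return 0;                                /* live entries need extents */
    uint64_t sum = 0;                            /* 64-bit: cannot wrap */
    for (uint32_t i = 0; i < n; i++) {
        if ((uint64_t)e[i].offset + e[i].length > block_size)
            return 0;                            /* slice leaves the block */
        sum += e[i].length;
    }
    return sum == total_length;
}
```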
7. SegmentFooter
```c
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t crc64;         // CRC over header + bloom filter + index records + digest bytes + extents
    uint64_t seal_snapshot; // Snapshot ID when segment was sealed
    uint64_t seal_time_ns;  // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)
```
Notes:
- The CRC enables detection of accidental corruption during reads and covers all segment contents except the footer.
- Seal information allows deterministic reconstruction of CURRENT state.
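This document does not name a CRC-64 variant. The sketch below assumes CRC-64/XZ (ECMA-182 polynomial, reflected, with initial value and final XOR all ones) purely for illustration. It also assumes the footer occupies the last 24 bytes of the segment, as in the layout of section 3, so the CRC covers every byte before it. `crc64_xz` and `segment_crc_ok` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-64/XZ: ECMA-182 polynomial, reflected, init and xorout all ones.
 * ASSUMPTION: the variant is not named by this specification. */
static uint64_t crc64_xz(const uint8_t *p, size_t n) {
    uint64_t crc = ~UINT64_C(0);
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (UINT64_C(0xC96C5795D7870F42) & (0 - (crc & 1)));
    }
    return ~crc;
}

/* Verifies an in-memory segment image: the footer is assumed to be the last
 * 24 bytes, and its first field (crc64, little-endian) covers all bytes before it. */
static int segment_crc_ok(const uint8_t *seg, size_t len) {
    if (len < 112 + 24) return 0;   /* header + footer at minimum */
    size_t footer = len - 24;
    uint64_t stored = 0;
    for (int i = 7; i >= 0; i--)
        stored = (stored << 8) | seg[footer + i];
    return crc64_xz(seg, footer) == stored;
}
```

For the standard check input `"123456789"`, CRC-64/XZ yields `0x995DC9BBDF1939FA`, which is a quick way to confirm that an implementation of the assumed variant is correct.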
8. DigestBytes
- Digest bytes are concatenated in a single byte array.
- Each IndexRecord references its digest via `digest_offset` and `digest_len`.
- The digest bytes MUST be immutable once the segment is sealed.
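A lookup by key compares the hash algorithm, the digest length, and the referenced bytes. This sketch makes three assumptions. `KeyRef` holds fields already decoded from an IndexRecord. `key_matches` is a hypothetical helper. The segment is available as one contiguous buffer, for example a read-only memory mapping, which also respects the immutability rule.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Key fields already decoded from one IndexRecord (hypothetical type). */
typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    uint64_t digest_offset;  /* file-relative, like every offset here */
} KeyRef;

/* Returns 1 if record r stores exactly (hash_id, digest[0..len)).
 * base is the start of the segment; the digest must fit in DigestBytes. */
static int key_matches(const uint8_t *base, uint64_t digests_offset, uint64_t digests_size,
                       const KeyRef *r, uint32_t hash_id, const uint8_t *digest, size_t len) {
    if (r->hash_id != hash_id || r->digest_len != len) return 0;
    if (r->digest_offset < digests_offset) return 0;
    uint64_t rel = r->digest_offset - digests_offset;
    if (rel >= digests_size || r->digest_len > digests_size - rel) return 0;
    return memcmp(base + r->digest_offset, digest, len) == 0;
}
```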
9. Bloom Filter
- The bloom filter is optional and opaque to semantics.
- Its purpose is lookup acceleration.
- Must be deterministic: same entries → same bloom representation.
- Segment-local only; no global assumptions.
10. Versioning and Compatibility
- The `version` field in the header defines the encoding.
- Readers must reject unsupported versions.
- New fields may be added in future versions only via a version bump.
- Existing fields must never change meaning.
- Version `1` implies a single-extent layout (legacy).
- Version `2` introduces `ExtentRecord` lists and `extents_offset`/`extent_count`.
- Version `3` introduces variable-length digest bytes with `hash_id` and `digest_offset`.
- Version `3` also integrates federation metadata in segment headers and index records.
10.1 Federation Compatibility Rules
- Legacy segments without federation fields are treated as local/internal (see 3.2).
- Tombstones MUST NOT shadow artifacts from other domains; domain matching is required.
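The following is a sketch of the domain-matching rule. It assumes that shadowing means an exact artifact-key match (same `hash_id`, `digest_len`, and digest bytes). It also assumes the caller has already established which entry is newer, since ordering belongs to ASL-STORE-INDEX. `RecView` and `tombstone_shadows` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IDX_FLAG_TOMBSTONE UINT32_C(0x00000001)

/* Decoded view of the fields that matter for shadowing (hypothetical type). */
typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    const uint8_t *digest;  /* points into the segment's DigestBytes */
    uint32_t domain_id;
    uint32_t flags;
} RecView;

/* True if tombstone `tomb` hides the older entry `older`: the artifact key
 * must match AND both records must belong to the same domain. */
static bool tombstone_shadows(const RecView *tomb, const RecView *older) {
    if (!(tomb->flags & IDX_FLAG_TOMBSTONE)) return false;
    if (tomb->domain_id != older->domain_id) return false;  /* never cross domains */
    return tomb->hash_id == older->hash_id &&
           tomb->digest_len == older->digest_len &&
           memcmp(tomb->digest, older->digest, tomb->digest_len) == 0;
}
```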
11. Alignment and Packing
- All structures are packed (no compiler padding)
- Multi-byte integers are little-endian
- Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
- Extents are accessed via `IndexRecord.extents_offset`, relative to the file base.
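The following is a sketch of indexed access into a mapped segment. Locating record `i` needs only `records_offset` and the 48-byte record size. Decoding the fields byte by byte, rather than casting to the packed struct, keeps the reader correct on big-endian hosts as well. `ExtentRef` and `record_extents` are hypothetical, and the buffer could come from `mmap` or any other read.

```c
#include <stddef.h>
#include <stdint.h>

#define ASL_REC_SIZE 48u

/* Little-endian load of n bytes (n <= 8). */
static uint64_t ld_le(const uint8_t *p, unsigned n) {
    uint64_t v = 0;
    while (n--)
        v = (v << 8) | p[n];
    return v;
}

/* Where one entry's extents live (hypothetical type). */
typedef struct {
    uint64_t extents_offset;
    uint32_t extent_count;
} ExtentRef;

/* Reads IndexRecord i from a segment image of map_len bytes at base.
 * Inside the packed 48-byte record, extents_offset is at byte 16 and
 * extent_count at byte 24 (section 5). Returns 0 on success, -1 if out of range. */
static int record_extents(const uint8_t *base, size_t map_len, uint64_t records_offset,
                          uint64_t record_count, uint64_t i, ExtentRef *out) {
    if (i >= record_count || records_offset > map_len) return -1;
    if ((map_len - records_offset) / ASL_REC_SIZE <= i) return -1;  /* record i must fit */
    const uint8_t *r = base + records_offset + i * ASL_REC_SIZE;
    out->extents_offset = ld_le(r + 16, 8);
    out->extent_count = (uint32_t)ld_le(r + 24, 4);
    return 0;
}
```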
12. Summary of Encoding Guarantees
The ENC-ASL-CORE-INDEX specification ensures:
- Deterministic layout across platforms
- Direct mapping from semantic model (ArtifactKey → ArtifactLocation)
- Immutability of sealed segments
- Integrity validation via CRC
- Forward-compatible extensibility
13. Relationship to Other Layers
| Layer | Responsibility |
|---|---|
| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts |
| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |
This completes the stack: semantics → store behavior → encoding.