ENC-ASL-CORE-INDEX
Encoding Specification for ASL Core Index
1. Purpose
This document defines the exact encoding of ASL index segments and records for storage and interoperability.
It translates the semantic model of ASL/1-CORE-INDEX and store contracts of ASL-STORE-INDEX into a deterministic bytes-on-disk layout.
It is intended for:
- C libraries
- Tools
- API frontends
- Memory-mapped access
It does not define:
- Index semantics (see ASL/1-CORE-INDEX)
- Store lifecycle behavior (see ASL-STORE-INDEX)
- Acceleration semantics (see ASL/INDEX-ACCEL/1)
2. Encoding Principles
- Little-endian representation
- Fixed-width fields for deterministic access
- No pointers or references; all offsets are file-relative
- Packed structures; no compiler-introduced padding
- Forward compatibility via version field
- CRC or checksum protection for corruption detection
All multi-byte integers are little-endian unless explicitly noted.
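As an illustration of the little-endian rule, the following is a minimal sketch of byte-wise decode and encode helpers that behave the same on any host byte order. The names asl_rd_le16, asl_rd_le32, asl_rd_le64 and asl_wr_le64 are hypothetical and not part of this specification.

```c
#include <stdint.h>

/* Hypothetical helpers: decode little-endian integers one byte at a time,
 * so the result does not depend on the byte order of the host. */
static inline uint16_t asl_rd_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static inline uint32_t asl_rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static inline uint64_t asl_rd_le64(const uint8_t *p) {
    return (uint64_t)asl_rd_le32(p) | ((uint64_t)asl_rd_le32(p + 4) << 32);
}

/* Encode a 64-bit value as eight little-endian bytes. */
static inline void asl_wr_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}
```

A reader on a little-endian host could map the packed structs directly instead; helpers of this kind are mainly useful for tools that must also run on big-endian machines.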
3. Segment Layout
Each index segment file is laid out as follows:
+------------------+
| SegmentHeader |
+------------------+
| BloomFilter[] | (optional, opaque to semantics)
+------------------+
| IndexRecord[] |
+------------------+
| ExtentRecord[] |
+------------------+
| SegmentFooter |
+------------------+
- SegmentHeader: fixed-size, mandatory
- BloomFilter: optional, opaque, segment-local
- IndexRecord[]: array of index entries
- ExtentRecord[]: concatenated extent lists referenced by IndexRecord
- SegmentFooter: fixed-size, mandatory
Offsets in the header give the file locations of the Bloom filter, the IndexRecord array, and the ExtentRecord array.
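Before trusting header offsets, a reader will typically want to confirm that every region fits inside the file. The sketch below shows one way to do that. It assumes the header starts at file offset 0, the footer occupies the last bytes of the file, and the record sizes are the packed sizes of the structs defined in sections 4 to 7 (88, 52, 16 and 24 bytes). The type asl_layout and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packed sizes implied by the struct definitions in sections 4 to 7. */
enum {
    ASL_HEADER_SIZE = 88,
    ASL_INDEX_RECORD_SIZE = 52,
    ASL_EXTENT_RECORD_SIZE = 16,
    ASL_FOOTER_SIZE = 24
};

/* Hypothetical: the SegmentHeader fields that locate regions, already decoded. */
typedef struct {
    uint64_t record_count, records_offset;
    uint64_t bloom_offset, bloom_size;
    uint64_t extents_offset, extent_count;
} asl_layout;

/* True if [off, off + count * elem) lies inside [lo, hi), with no overflow. */
static bool asl_region_ok(uint64_t off, uint64_t count, uint64_t elem,
                          uint64_t lo, uint64_t hi) {
    if (off < lo || off > hi)
        return false;
    return count <= (hi - off) / elem;
}

/* Assumption: header at offset 0, footer in the final ASL_FOOTER_SIZE bytes. */
static bool asl_layout_ok(const asl_layout *l, uint64_t file_size) {
    if (file_size < ASL_HEADER_SIZE + ASL_FOOTER_SIZE)
        return false;
    uint64_t lo = ASL_HEADER_SIZE;
    uint64_t hi = file_size - ASL_FOOTER_SIZE;

    if (l->bloom_offset == 0) {
        if (l->bloom_size != 0)          /* "0 if none" applies to both fields */
            return false;
    } else if (!asl_region_ok(l->bloom_offset, l->bloom_size, 1, lo, hi)) {
        return false;
    }
    return asl_region_ok(l->records_offset, l->record_count,
                         ASL_INDEX_RECORD_SIZE, lo, hi) &&
           asl_region_ok(l->extents_offset, l->extent_count,
                         ASL_EXTENT_RECORD_SIZE, lo, hi);
}
```

This check only bounds each region. It does not verify that regions appear in the order drawn above or that they do not overlap; a stricter reader could add those checks.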
4. SegmentHeader
#pragma pack(push,1)
typedef struct {
uint64_t magic; // Unique magic number identifying segment file type
uint16_t version; // Encoding version
uint16_t shard_id; // Optional shard identifier
uint32_t header_size; // Total size of header including fields below
uint64_t snapshot_min; // Minimum snapshot ID for which segment entries are valid
uint64_t snapshot_max; // Maximum snapshot ID
uint64_t record_count; // Number of index entries
uint64_t records_offset; // File offset of IndexRecord array
uint64_t bloom_offset; // File offset of bloom filter (0 if none)
uint64_t bloom_size; // Size of bloom filter (0 if none)
uint64_t extents_offset; // File offset of ExtentRecord array
uint64_t extent_count; // Total number of ExtentRecord entries
uint64_t flags; // Reserved for future use
} SegmentHeader;
#pragma pack(pop)
Notes:
- magic ensures the reader validates the segment type.
- version allows forward-compatible extension.
- snapshot_min/snapshot_max define visibility semantics.
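The packed layout above comes to 88 bytes, which a reader can pin down at compile time. The following sketch adds a basic plausibility check on a header that has already been read on a little-endian host. The magic value is a placeholder (this document does not state the real constant), and the rules that header_size may exceed the struct size and that snapshot_min must not exceed snapshot_max are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t magic;
    uint16_t version;
    uint16_t shard_id;
    uint32_t header_size;
    uint64_t snapshot_min;
    uint64_t snapshot_max;
    uint64_t record_count;
    uint64_t records_offset;
    uint64_t bloom_offset;
    uint64_t bloom_size;
    uint64_t extents_offset;
    uint64_t extent_count;
    uint64_t flags;
} SegmentHeader;
#pragma pack(pop)

/* 8 + 2 + 2 + 4 + 9 * 8 = 88 bytes when no padding is inserted. */
static_assert(sizeof(SegmentHeader) == 88, "SegmentHeader must pack to 88 bytes");

/* Placeholder only: the real magic number is not given in this document. */
#define ASL_INDEX_MAGIC_EXAMPLE 0x0123456789ABCDEFull

/* Hypothetical sanity check; assumes the header was read on a little-endian host. */
static bool asl_header_plausible(const SegmentHeader *h) {
    return h->magic == ASL_INDEX_MAGIC_EXAMPLE &&
           h->header_size >= sizeof(SegmentHeader) &&  /* assumed: may grow later */
           h->snapshot_min <= h->snapshot_max;         /* assumed ordering */
}
```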
5. IndexRecord
#pragma pack(push,1)
typedef struct {
uint64_t hash_hi; // High 64 bits of artifact hash
uint64_t hash_mid; // Middle 64 bits
uint64_t hash_lo; // Low 64 bits
uint32_t hash_tail; // Optional tail for full hash if larger than 192 bits
uint64_t extents_offset; // File offset of first ExtentRecord for this entry
uint32_t extent_count; // Number of ExtentRecord entries for this artifact
uint32_t total_length; // Total artifact length in bytes
uint32_t flags; // Optional flags (tombstone, reserved, etc.)
uint32_t reserved; // Reserved for alignment/future use
} IndexRecord;
#pragma pack(pop)
Notes:
- hash_* fields store the artifact key deterministically.
- extents_offset references the first ExtentRecord for this entry.
- extent_count defines how many extents to read (may be 0 for tombstones).
- total_length is the exact artifact size in bytes.
- Flags may indicate tombstone or other special status.
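To show how the hash_* fields act as the lookup key, here is a minimal sketch that compares a record against an in-memory key. The type asl_key_example and the function asl_record_matches are hypothetical; the sketch also assumes all four hash fields participate in equality, which this document does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi;
    uint64_t hash_mid;
    uint64_t hash_lo;
    uint32_t hash_tail;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t flags;
    uint32_t reserved;
} IndexRecord;
#pragma pack(pop)

/* 3 * 8 + 4 + 8 + 4 * 4 = 52 bytes when packed. */
static_assert(sizeof(IndexRecord) == 52, "IndexRecord must pack to 52 bytes");

/* Hypothetical in-memory form of an artifact key. */
typedef struct {
    uint64_t hi, mid, lo;
    uint32_t tail;
} asl_key_example;

/* Assumption: a record matches only if all four hash fields are equal. */
static bool asl_record_matches(const IndexRecord *r, const asl_key_example *k) {
    return r->hash_hi == k->hi && r->hash_mid == k->mid &&
           r->hash_lo == k->lo && r->hash_tail == k->tail;
}
```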
6. ExtentRecord
#pragma pack(push,1)
typedef struct {
uint64_t block_id; // ASL block identifier
uint32_t offset; // Offset within block
uint32_t length; // Length of this extent
} ExtentRecord;
#pragma pack(pop)
Notes:
- Extents are concatenated in order to produce artifact bytes.
- extent_count MUST be > 0 for visible (non-tombstone) entries.
- total_length MUST equal the sum of length across the extents.
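The two MUST rules above can be checked mechanically. The sketch below is one possible validator. The tombstone bit value is a placeholder, since this document does not assign flag bits, and the length rule is applied to every entry, tombstones included, which is a literal reading rather than something the text confirms.

```c
#include <stdbool.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;
#pragma pack(pop)

/* Placeholder bit: flag values are not assigned in this document. */
#define ASL_FLAG_TOMBSTONE_EXAMPLE 0x1u

/* Hypothetical validator for one index entry and its extent list. */
static bool asl_extents_consistent(const ExtentRecord *ext, uint32_t extent_count,
                                   uint32_t total_length, uint32_t flags) {
    bool tombstone = (flags & ASL_FLAG_TOMBSTONE_EXAMPLE) != 0;
    if (!tombstone && extent_count == 0)
        return false;                 /* visible entries need at least one extent */

    /* 64-bit accumulator: (2^32 - 1) extents of (2^32 - 1) bytes cannot overflow. */
    uint64_t sum = 0;
    for (uint32_t i = 0; i < extent_count; i++)
        sum += ext[i].length;
    return sum == total_length;       /* applied to all entries (literal reading) */
}
```

For example, two extents of 100 and 28 bytes are consistent with a total_length of 128 and inconsistent with any other value.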
7. SegmentFooter
#pragma pack(push,1)
typedef struct {
uint64_t crc64; // CRC over header + records + bloom filter
uint64_t seal_snapshot; // Snapshot ID when segment was sealed
uint64_t seal_time_ns; // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)
Notes:
- CRC ensures corruption detection during reads.
- Seal information allows deterministic reconstruction of CURRENT state.
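This document does not name the CRC-64 polynomial or parameters. Purely as an illustration, the sketch below assumes the CRC-64/XZ variant (reflected polynomial 0xC96C5795D7870F42, initial value and final XOR of all ones) and uses a simple bitwise loop rather than a lookup table. The function name asl_crc64_update is hypothetical, and a real implementation must use whatever variant the format actually fixes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: CRC-64/XZ parameters; the spec text does not specify a variant. */
#define ASL_CRC64_POLY_REFLECTED 0xC96C5795D7870F42ull

/* Feed `len` bytes into a running CRC. Start with crc = 0; pass the previous
 * return value to continue over a further byte range. */
static uint64_t asl_crc64_update(uint64_t crc, const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    crc = ~crc;                                   /* undo final XOR / apply init */
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, else zero */
            uint64_t mask = (uint64_t)0 - (crc & 1);
            crc = (crc >> 1) ^ (ASL_CRC64_POLY_REFLECTED & mask);
        }
    }
    return ~crc;
}
```

Because the function is incremental, a writer could feed the header, the records and the bloom filter (the regions listed in the crc64 field comment) one after another and store the final value in the footer. Under the assumed parameters, the ASCII string "123456789" yields 0x995DC9BBDF1939FA, the standard check value for CRC-64/XZ.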
8. Bloom Filter
- The bloom filter is optional and opaque to semantics.
- Its purpose is lookup acceleration.
- Must be deterministic: same entries → same bloom representation.
- Segment-local only; no global assumptions.
9. Versioning and Compatibility
- The version field in the header defines the encoding.
- Readers must reject unsupported versions.
- New fields may be added in future versions only via version bump.
- Existing fields must never change meaning.
- Version 1 implies single-extent layout (legacy).
- Version 2 introduces ExtentRecord lists and extents_offset/extent_count.
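A reader might turn the rules above into a small version gate. In this sketch the enum and function names are hypothetical, and treating version 1 as readable-but-legacy (rather than rejecting it) is a choice made for illustration only.

```c
#include <stdint.h>

/* Hypothetical result codes for a version check. */
typedef enum {
    ASL_VER_EXTENT_LISTS,     /* version 2: ExtentRecord lists are present */
    ASL_VER_LEGACY_SINGLE,    /* version 1: legacy single-extent layout */
    ASL_VER_UNSUPPORTED       /* anything else: the reader must reject it */
} asl_version_status;

static asl_version_status asl_check_version(uint16_t version) {
    switch (version) {
    case 2:  return ASL_VER_EXTENT_LISTS;
    case 1:  return ASL_VER_LEGACY_SINGLE;
    default: return ASL_VER_UNSUPPORTED;
    }
}
```

A caller would stop processing the segment on ASL_VER_UNSUPPORTED before interpreting any other header field beyond magic and version.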
10. Alignment and Packing
- All structures are packed (no compiler padding)
- Multi-byte integers are little-endian
- Memory-mapped readers can directly index IndexRecord[] using records_offset.
- Extents are accessed via IndexRecord.extents_offset relative to the file base.
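The direct-indexing access pattern can be sketched as follows. The code assumes a little-endian host, a mapping whose base pointer corresponds to file offset 0, and offsets that have already been validated against the file size. The function names asl_record_at and asl_record_extents are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi, hash_mid, hash_lo;
    uint32_t hash_tail;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t flags;
    uint32_t reserved;
} IndexRecord;

typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;
#pragma pack(pop)

static_assert(sizeof(IndexRecord) == 52, "IndexRecord must pack to 52 bytes");
static_assert(sizeof(ExtentRecord) == 16, "ExtentRecord must pack to 16 bytes");

/* Return record i of the IndexRecord array, or NULL if i is out of range.
 * `base` is the start of the mapped file (hypothetical helper). */
static const IndexRecord *asl_record_at(const uint8_t *base, uint64_t records_offset,
                                        uint64_t record_count, uint64_t i) {
    if (i >= record_count)
        return NULL;
    return (const IndexRecord *)(base + records_offset) + i;
}

/* Return the first ExtentRecord of an entry, or NULL if it has none.
 * extents_offset is taken relative to the file base. */
static const ExtentRecord *asl_record_extents(const uint8_t *base,
                                              const IndexRecord *r) {
    if (r->extent_count == 0)
        return NULL;
    return (const ExtentRecord *)(base + r->extents_offset);
}
```

Because the structs are packed, the compiler treats them as byte-aligned, so record i can sit at any offset (52 is not a multiple of 8) without an explicit copy.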
11. Summary of Encoding Guarantees
The ENC-ASL-CORE-INDEX specification ensures:
- Deterministic layout across platforms
- Direct mapping from semantic model (ArtifactKey → ArtifactLocation)
- Immutability of sealed segments
- Integrity validation via CRC
- Forward-compatible extensibility
12. Relationship to Other Layers
| Layer | Responsibility |
|---|---|
| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts |
| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |
This completes the stack: semantics → store behavior → encoding.