amduat-api/notes/enc-asl-core-index.md
2026-01-17 00:19:49 +01:00

ENC-ASL-CORE-INDEX

Encoding Specification for ASL Core Index


1. Purpose

This document defines the exact encoding of ASL index segments and records for storage and interoperability.

It translates the semantic model of ASL-CORE-INDEX and the store contracts of ASL-STORE-INDEX into a deterministic bytes-on-disk layout.

It is intended for:

  • C libraries
  • Tools
  • API frontends
  • Memory-mapped access

It does not define:

  • Index semantics (see ASL-CORE-INDEX)
  • Store lifecycle behavior (see ASL-STORE-INDEX)

2. Encoding Principles

  1. Little-endian representation
  2. Fixed-width fields for deterministic access
  3. No pointers or references; all offsets are file-relative
  4. Packed structures; no compiler-introduced padding
  5. Forward compatibility via version field
  6. CRC-64 protection for corruption detection (see SegmentFooter)

All multi-byte integers are little-endian unless explicitly noted.
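
As an illustration of principles 1 and 2, the sketch below decodes fixed-width little-endian integers byte by byte, so the result does not depend on host endianness. The helper names (rd_le16, rd_le32, rd_le64, wr_le64) are hypothetical and not part of this specification.

```c
#include <stdint.h>

/* Hypothetical helpers, not defined by the spec: decode little-endian
 * integers one byte at a time so the result is host-independent. */
static uint16_t rd_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

static uint64_t rd_le64(const uint8_t *p) {
    return (uint64_t)rd_le32(p) | ((uint64_t)rd_le32(p + 4) << 32);
}

/* Encode a 64-bit value as 8 little-endian bytes. */
static void wr_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}
```

Readers that only target little-endian hosts could instead copy bytes straight into the packed structures; the byte-wise form is shown because it is also valid on big-endian machines.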


3. Segment Layout

Each index segment file is laid out as follows:

+------------------+
| SegmentHeader    |
+------------------+
| BloomFilter[]    | (optional, opaque to semantics)
+------------------+
| IndexRecord[]    |
+------------------+
| SegmentFooter    |
+------------------+

  • SegmentHeader: fixed-size, mandatory
  • BloomFilter: optional, opaque, segment-local
  • IndexRecord[]: array of index entries
  • SegmentFooter: fixed-size, mandatory

The header's bloom_offset and records_offset fields give the file offsets of the Bloom filter and the IndexRecord array.
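
A reader may want to sanity-check these offsets against the file size before touching any region. The sketch below is one way to do that; the SegmentLayout struct and layout_is_sane function are hypothetical, the record and footer sizes are the packed sizes of the structures defined in this document, and the rule that regions must not overlap and must appear in diagram order is an assumption rather than something this specification states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packed sizes of IndexRecord and SegmentFooter as defined in this document. */
enum { RECORD_SIZE = 52, FOOTER_SIZE = 24 };

/* Hypothetical summary of the values a reader needs for the check. */
typedef struct {
    uint64_t file_size;
    uint64_t header_size;
    uint64_t bloom_offset, bloom_size;
    uint64_t records_offset, record_count;
} SegmentLayout;

/* Assumption: regions appear in diagram order and do not overlap.
 * Gaps between regions are tolerated. All arithmetic avoids overflow. */
static bool layout_is_sane(const SegmentLayout *l) {
    if (l->file_size < FOOTER_SIZE) return false;
    uint64_t footer_off = l->file_size - FOOTER_SIZE;
    uint64_t cursor = l->header_size;          /* first byte after the header */
    if (cursor > footer_off) return false;

    if (l->bloom_size != 0) {
        if (l->bloom_offset < cursor || l->bloom_offset > footer_off) return false;
        if (l->bloom_size > footer_off - l->bloom_offset) return false;
        cursor = l->bloom_offset + l->bloom_size;
    }

    if (l->records_offset < cursor || l->records_offset > footer_off) return false;
    if (l->record_count > (footer_off - l->records_offset) / RECORD_SIZE) return false;
    return true;
}
```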


4. SegmentHeader

#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t magic;           // Unique magic number identifying segment file type
    uint16_t version;         // Encoding version
    uint16_t shard_id;        // Optional shard identifier
    uint32_t header_size;     // Total size of header including fields below

    uint64_t snapshot_min;    // Minimum snapshot ID for which segment entries are valid
    uint64_t snapshot_max;    // Maximum snapshot ID

    uint64_t record_count;    // Number of index entries
    uint64_t records_offset;  // File offset of IndexRecord array

    uint64_t bloom_offset;    // File offset of bloom filter (0 if none)
    uint64_t bloom_size;      // Size of bloom filter (0 if none)

    uint64_t flags;           // Reserved for future use
} SegmentHeader;
#pragma pack(pop)

Notes:

  • magic ensures the reader validates the segment type.
  • version allows forward-compatible extension.
  • snapshot_min / snapshot_max define visibility semantics.
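
With 1-byte packing the header above occupies 72 bytes, which a build can verify at compile time. The sketch below adds such checks and a minimal loader. The magic value ASL_INDEX_MAGIC_EXAMPLE is a placeholder (this document does not define the real constant), header_load and its return codes are hypothetical, and the memcpy-based load assumes a little-endian host.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push,1)
typedef struct {
    uint64_t magic;
    uint16_t version;
    uint16_t shard_id;
    uint32_t header_size;
    uint64_t snapshot_min;
    uint64_t snapshot_max;
    uint64_t record_count;
    uint64_t records_offset;
    uint64_t bloom_offset;
    uint64_t bloom_size;
    uint64_t flags;
} SegmentHeader;
#pragma pack(pop)

/* Compile-time checks that the compiler honoured the packed layout. */
static_assert(sizeof(SegmentHeader) == 72, "packed SegmentHeader is 72 bytes");
static_assert(offsetof(SegmentHeader, snapshot_min) == 16, "snapshot_min at 16");
static_assert(offsetof(SegmentHeader, records_offset) == 40, "records_offset at 40");
static_assert(offsetof(SegmentHeader, flags) == 64, "flags at 64");

/* Placeholder only: the real magic number is not given in this document. */
#define ASL_INDEX_MAGIC_EXAMPLE 0x0123456789ABCDEFULL

/* Returns 0 on success, a negative code on failure.
 * Assumes a little-endian host, so the raw bytes can be copied as-is. */
static int header_load(SegmentHeader *out, const uint8_t *buf, size_t len) {
    if (len < sizeof *out) return -1;                 /* truncated file */
    memcpy(out, buf, sizeof *out);
    if (out->magic != ASL_INDEX_MAGIC_EXAMPLE) return -2;  /* wrong file type */
    if (out->header_size < sizeof *out || out->header_size > len) return -3;
    return 0;
}
```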

5. IndexRecord

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi;     // High 64 bits of artifact hash
    uint64_t hash_mid;    // Middle 64 bits
    uint64_t hash_lo;     // Low 64 bits
    uint32_t hash_tail;   // Optional tail for full hash if larger than 192 bits

    uint64_t block_id;    // ASL block identifier
    uint32_t offset;      // Offset within block
    uint32_t length;      // Length of artifact bytes

    uint32_t flags;       // Optional flags (tombstone, reserved, etc.)
    uint32_t reserved;    // Reserved for alignment/future use
} IndexRecord;
#pragma pack(pop)

Notes:

  • hash_* fields store the artifact key deterministically.
  • block_id references an ASL block.
  • offset / length define bytes within the block.
  • Flags may indicate tombstone or other special status.
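
Because the record is packed, the key occupies its first 28 bytes and the whole record is 52 bytes. The sketch below uses that to compare keys with a single memcmp and to scan an array for a key. The flag bit REC_FLAG_TOMBSTONE_EXAMPLE and the helpers key_equal, is_tombstone and find_first are hypothetical; actual flag assignments and lookup semantics belong to ASL-CORE-INDEX.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi;
    uint64_t hash_mid;
    uint64_t hash_lo;
    uint32_t hash_tail;
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
    uint32_t flags;
    uint32_t reserved;
} IndexRecord;
#pragma pack(pop)

static_assert(sizeof(IndexRecord) == 52, "packed IndexRecord is 52 bytes");
static_assert(offsetof(IndexRecord, block_id) == 28, "key is the first 28 bytes");

/* Hypothetical bit assignment; the spec does not define flag values. */
#define REC_FLAG_TOMBSTONE_EXAMPLE 0x1u

/* Keys are equal when all hash_* fields match. With no padding this is
 * a byte comparison of the record prefix. */
static int key_equal(const IndexRecord *a, const IndexRecord *b) {
    return memcmp(a, b, offsetof(IndexRecord, block_id)) == 0;
}

static int is_tombstone(const IndexRecord *r) {
    return (r->flags & REC_FLAG_TOMBSTONE_EXAMPLE) != 0;
}

/* Linear scan; returns the index of the first record whose key matches,
 * or -1 if there is none. No ordering of records is assumed. */
static int64_t find_first(const IndexRecord *recs, uint64_t count,
                          const IndexRecord *key) {
    for (uint64_t i = 0; i < count; i++)
        if (key_equal(&recs[i], key)) return (int64_t)i;
    return -1;
}
```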

6. SegmentFooter

#pragma pack(push,1)
typedef struct {
    uint64_t crc64;          // CRC over header + records + bloom filter
    uint64_t seal_snapshot;  // Snapshot ID when segment was sealed
    uint64_t seal_time_ns;   // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)

Notes:

  • CRC ensures corruption detection during reads.
  • Seal information allows deterministic reconstruction of CURRENT state.
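
This document does not name the CRC-64 variant or the exact order in which regions are fed to it. As a minimal sketch, the code below assumes the CRC-64/XZ parameters (reflected ECMA-182 polynomial, all-ones initial value and final XOR) and exposes a streaming interface so that header, records and Bloom filter can be fed one region at a time. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: CRC-64/XZ. Reflected form of the ECMA-182 polynomial. */
#define CRC64_XZ_POLY 0xC96C5795D7870F42ULL

static uint64_t crc64_init(void) {
    return ~(uint64_t)0;
}

/* Bitwise (table-free) update; slow but easy to verify. */
static uint64_t crc64_update(uint64_t crc, const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (CRC64_XZ_POLY & ((uint64_t)0 - (crc & 1)));
    }
    return crc;
}

static uint64_t crc64_final(uint64_t crc) {
    return ~crc;
}
```

Under these assumptions a writer would call crc64_update once per covered region and store crc64_final of the result in the footer; a reader repeats the same sequence and compares.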

7. Bloom Filter

  • The bloom filter is optional and opaque to semantics.
  • Its purpose is lookup acceleration.
  • Must be deterministic: same entries → same bloom representation.
  • Segment-local only; no global assumptions.
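
The Bloom filter's internal format is deliberately left opaque, so the following is only an illustration of the determinism requirement, not a proposed encoding. It derives probe positions purely from key bits (no random seed, no host-dependent state), so the same set of entries yields the same bytes regardless of insertion order. BLOOM_BYTES, BLOOM_PROBES and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters chosen for the example only. */
enum { BLOOM_BYTES = 64, BLOOM_PROBES = 4 };

/* Double hashing: probe i lands on (h1 + i * h2) mod number_of_bits.
 * h1 and h2 come straight from the artifact hash, so nothing random
 * or process-specific enters the filter. */
static void bloom_add(uint8_t bits[BLOOM_BYTES],
                      uint64_t hash_hi, uint64_t hash_lo) {
    uint64_t h2 = hash_lo | 1;   /* force an odd stride */
    for (uint64_t i = 0; i < BLOOM_PROBES; i++) {
        uint64_t bit = (hash_hi + i * h2) % (BLOOM_BYTES * 8u);
        bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}

/* Returns 0 if the key is definitely absent, 1 if it may be present. */
static int bloom_maybe_contains(const uint8_t bits[BLOOM_BYTES],
                                uint64_t hash_hi, uint64_t hash_lo) {
    uint64_t h2 = hash_lo | 1;
    for (uint64_t i = 0; i < BLOOM_PROBES; i++) {
        uint64_t bit = (hash_hi + i * h2) % (BLOOM_BYTES * 8u);
        if (!(bits[bit >> 3] & (1u << (bit & 7)))) return 0;
    }
    return 1;
}
```

Since insertion only ORs bits into the array, the final bytes depend on the set of keys and not on the order in which they were added.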

8. Versioning and Compatibility

  • The version field in the header identifies the encoding version.
  • Readers must reject unsupported versions.
  • New fields may be added in future versions only via version bump.
  • Existing fields must never change meaning.
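
The rejection rule can be applied before any other field is interpreted, because version sits at a fixed position (byte offset 8 of the packed header, directly after the 8-byte magic). The sketch below reads just those two bytes. ENC_VERSION_SUPPORTED_EXAMPLE is a placeholder value, and check_version with its return codes is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder: the set of versions a given reader supports is its own choice. */
enum { ENC_VERSION_SUPPORTED_EXAMPLE = 1 };

/* Byte offset of the version field in the packed SegmentHeader. */
enum { VERSION_OFFSET = 8 };

/* Returns 0 and stores the version if it is supported,
 * -1 if the buffer is too short, -2 if the version must be rejected. */
static int check_version(const uint8_t *file, size_t len, uint16_t *out) {
    if (len < VERSION_OFFSET + 2) return -1;
    uint16_t v = (uint16_t)(file[VERSION_OFFSET] | (file[VERSION_OFFSET + 1] << 8));
    if (v != ENC_VERSION_SUPPORTED_EXAMPLE) return -2;
    *out = v;
    return 0;
}
```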

9. Alignment and Packing

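  • All structures are packed (no compiler padding): SegmentHeader is 72 bytes, IndexRecord is 52 bytes, and SegmentFooter is 24 bytes.
  • Multi-byte integers are little-endian.
  • Memory-mapped readers can directly index IndexRecord[] using records_offset: record i starts at records_offset + i * sizeof(IndexRecord).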
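
A minimal sketch of that address computation over a mapped file follows. It treats the mapping as raw bytes, checks bounds without overflowing, and decodes block_id byte by byte so that it makes no alignment or host-endianness assumptions (52-byte records are not naturally 8-byte aligned). The names record_at and record_block_id are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed IndexRecord size and the offset of block_id within it. */
enum { INDEX_RECORD_SIZE = 52, BLOCK_ID_OFFSET = 28 };

/* Returns a pointer to record i inside the mapping, or NULL if the index
 * or the header values would lead outside the file. */
static const uint8_t *record_at(const uint8_t *base, uint64_t file_size,
                                uint64_t records_offset, uint64_t record_count,
                                uint64_t i) {
    if (i >= record_count) return NULL;
    if (records_offset > file_size) return NULL;
    if (record_count > (file_size - records_offset) / INDEX_RECORD_SIZE) return NULL;
    return base + records_offset + i * INDEX_RECORD_SIZE;
}

/* Decode the little-endian block_id field of a record. */
static uint64_t record_block_id(const uint8_t *rec) {
    uint64_t v = 0;
    for (int b = 7; b >= 0; b--)
        v = (v << 8) | rec[BLOCK_ID_OFFSET + b];
    return v;
}
```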

10. Summary of Encoding Guarantees

The ENC-ASL-CORE-INDEX specification ensures:

  1. Deterministic layout across platforms
  2. Direct mapping from semantic model (ArtifactKey → ArtifactLocation)
  3. Immutability of sealed segments
  4. Integrity validation via CRC
  5. Forward-compatible extensibility

11. Relationship to Other Layers

Layer               Responsibility
------------------  ----------------------------------------------------------
ASL-CORE-INDEX      Defines semantic meaning of artifact → location mapping
ASL-STORE-INDEX     Defines lifecycle, visibility, and replay contracts
ENC-ASL-CORE-INDEX  Defines exact bytes-on-disk format for segment persistence

This completes the stack: semantics → store behavior → encoding.