niklas/amduat-api

Carl Niklas Rydberg f2225f7a73 sceaning up index documents.

2026-01-17 06:29:58 +01:00

ENC-ASL-CORE-INDEX

Encoding Specification for ASL Core Index

1. Purpose

This document defines the exact encoding of ASL index segments and records for storage and interoperability.

It translates the semantic model of ASL/1-CORE-INDEX and store contracts of ASL-STORE-INDEX into a deterministic bytes-on-disk layout.

It is intended for:

C libraries
Tools
API frontends
Memory-mapped access

It does not define:

Index semantics (see ASL/1-CORE-INDEX)
Store lifecycle behavior (see ASL-STORE-INDEX)
Acceleration semantics (see ASL/INDEX-ACCEL/1)

2. Encoding Principles

Little-endian representation
Fixed-width fields for deterministic access
No pointers or references; all offsets are file-relative
Packed structures; no compiler-introduced padding
Forward compatibility via version field
CRC or checksum protection for corruption detection

All multi-byte integers are little-endian unless explicitly noted.
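
As an illustration of the little-endian rule, the following is a minimal sketch of byte-order-independent field decoding. The helper names (`load_le16`, `load_le32`, `load_le64`) are hypothetical and not part of this specification; a reader on a little-endian host that maps the packed structs directly would not need them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helpers: decode little-endian integers from raw bytes.
// They never reinterpret memory, so they behave the same on any host.
static uint16_t load_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--) {
        v = (v << 8) | p[i];   // most significant byte is stored last
    }
    return v;
}

static uint64_t load_le64(const uint8_t *p)
{
    assert(p != NULL);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}
```

For example, the `version` field sits at byte offset 8 of the segment file (directly after the 8-byte `magic`), so a portable reader could obtain it with `load_le16(file_bytes + 8)`.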

3. Segment Layout

Each index segment file is laid out as follows:

+------------------+
| SegmentHeader    |
+------------------+
| BloomFilter[]    | (optional, opaque to semantics)
+------------------+
| IndexRecord[]    |
+------------------+
| ExtentRecord[]   |
+------------------+
| SegmentFooter    |
+------------------+

SegmentHeader: fixed-size, mandatory
BloomFilter: optional, opaque, segment-local
IndexRecord[]: array of index entries
ExtentRecord[]: concatenated extent lists referenced by IndexRecord
SegmentFooter: fixed-size, mandatory

Offsets in the header give the file locations of the Bloom filter (bloom_offset), the IndexRecord array (records_offset), and the ExtentRecord array (extents_offset).

4. SegmentHeader

#pragma pack(push,1)
typedef struct {
    uint64_t magic;           // Unique magic number identifying segment file type
    uint16_t version;         // Encoding version
    uint16_t shard_id;        // Optional shard identifier
    uint32_t header_size;     // Total size of header including fields below

    uint64_t snapshot_min;    // Minimum snapshot ID for which segment entries are valid
    uint64_t snapshot_max;    // Maximum snapshot ID

    uint64_t record_count;    // Number of index entries
    uint64_t records_offset;  // File offset of IndexRecord array

    uint64_t bloom_offset;    // File offset of bloom filter (0 if none)
    uint64_t bloom_size;      // Size of bloom filter (0 if none)

    uint64_t extents_offset;  // File offset of ExtentRecord array
    uint64_t extent_count;    // Total number of ExtentRecord entries

    uint64_t flags;           // Reserved for future use
} SegmentHeader;
#pragma pack(pop)

Notes:

magic ensures the reader validates the segment type.
version allows forward-compatible extension.
snapshot_min / snapshot_max define visibility semantics.
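
The sketch below shows one way a reader might sanity-check a version 2 header before trusting its offsets. It is not a normative validation procedure: the function names, the choice of checks (including `snapshot_min <= snapshot_max`), and the caller-supplied `expected_magic` are assumptions, since this document does not state the magic value. It also assumes a little-endian host so the packed struct can be read in place. The record sizes 52 and 16 follow from the packed IndexRecord and ExtentRecord layouts in sections 5 and 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t magic;
    uint16_t version;
    uint16_t shard_id;
    uint32_t header_size;
    uint64_t snapshot_min;
    uint64_t snapshot_max;
    uint64_t record_count;
    uint64_t records_offset;
    uint64_t bloom_offset;
    uint64_t bloom_size;
    uint64_t extents_offset;
    uint64_t extent_count;
    uint64_t flags;
} SegmentHeader;
#pragma pack(pop)

// 8 + 2 + 2 + 4 + 9 * 8 = 88 bytes when packed.
_Static_assert(sizeof(SegmentHeader) == 88, "SegmentHeader must be 88 bytes");

#define INDEX_RECORD_SIZE  52u  // packed size of IndexRecord (section 5)
#define EXTENT_RECORD_SIZE 16u  // packed size of ExtentRecord (section 6)

// True if count elements of elem_size bytes starting at offset lie inside
// a file of file_size bytes. Written to avoid 64-bit overflow.
static bool region_fits(uint64_t offset, uint64_t count,
                        uint64_t elem_size, uint64_t file_size)
{
    if (offset > file_size) return false;
    return count <= (file_size - offset) / elem_size;
}

// Hypothetical plausibility check for a version 2 header.
// expected_magic is supplied by the caller; its value is not defined here.
static bool segment_header_plausible(const SegmentHeader *h,
                                     uint64_t expected_magic,
                                     uint64_t file_size)
{
    if (h->magic != expected_magic) return false;
    if (h->version != 2) return false;  // this sketch handles version 2 only
    if (h->header_size < sizeof(SegmentHeader) || h->header_size > file_size)
        return false;
    if (h->snapshot_min > h->snapshot_max) return false;  // assumed invariant
    if (!region_fits(h->records_offset, h->record_count,
                     INDEX_RECORD_SIZE, file_size)) return false;
    if (!region_fits(h->extents_offset, h->extent_count,
                     EXTENT_RECORD_SIZE, file_size)) return false;
    if (h->bloom_offset == 0) return h->bloom_size == 0;  // "0 if none"
    return region_fits(h->bloom_offset, h->bloom_size, 1, file_size);
}
```

Checking every region against the file size up front means later record and extent accesses can rely on the header without repeating the bounds arithmetic.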

5. IndexRecord

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi;     // High 64 bits of artifact hash
    uint64_t hash_mid;    // Middle 64 bits
    uint64_t hash_lo;     // Low 64 bits
    uint32_t hash_tail;   // Optional tail for full hash if larger than 192 bits

    uint64_t extents_offset;  // File offset of first ExtentRecord for this entry
    uint32_t extent_count;    // Number of ExtentRecord entries for this artifact
    uint32_t total_length;    // Total artifact length in bytes

    uint32_t flags;       // Optional flags (tombstone, reserved, etc.)
    uint32_t reserved;    // Reserved for alignment/future use
} IndexRecord;
#pragma pack(pop)

Notes:

hash_* fields store the artifact key deterministically.
extents_offset references the first ExtentRecord for this entry.
extent_count defines how many extents to read (may be 0 for tombstones).
total_length is the exact artifact size in bytes.
Flags may indicate tombstone or other special status.
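
To make the key fields concrete, here is a minimal lookup sketch over an in-memory IndexRecord array. It uses a linear scan because this document does not state any ordering of records within a segment. The function names and the tombstone bit value `EXAMPLE_FLAG_TOMBSTONE` are hypothetical; the actual flag assignments are not defined in this section, and how multiple matches are resolved is a matter for ASL/1-CORE-INDEX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi;
    uint64_t hash_mid;
    uint64_t hash_lo;
    uint32_t hash_tail;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t flags;
    uint32_t reserved;
} IndexRecord;
#pragma pack(pop)

// 3 * 8 + 4 + 8 + 4 + 4 + 4 + 4 = 52 bytes when packed.
_Static_assert(sizeof(IndexRecord) == 52, "IndexRecord must be 52 bytes");

// Hypothetical flag bit; the real bit assignment is not given in this document.
#define EXAMPLE_FLAG_TOMBSTONE 0x1u

static bool record_is_tombstone(const IndexRecord *r)
{
    return (r->flags & EXAMPLE_FLAG_TOMBSTONE) != 0;
}

// Return the first record whose four hash fields equal the key, or NULL.
static const IndexRecord *find_record(const IndexRecord *records, uint64_t count,
                                      uint64_t hi, uint64_t mid,
                                      uint64_t lo, uint32_t tail)
{
    for (uint64_t i = 0; i < count; i++) {
        const IndexRecord *r = &records[i];
        if (r->hash_hi == hi && r->hash_mid == mid &&
            r->hash_lo == lo && r->hash_tail == tail) {
            return r;
        }
    }
    return NULL;
}
```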

6. ExtentRecord

#pragma pack(push,1)
typedef struct {
    uint64_t block_id;    // ASL block identifier
    uint32_t offset;      // Offset within block
    uint32_t length;      // Length of this extent
} ExtentRecord;
#pragma pack(pop)

Notes:

Extents are concatenated in order to produce artifact bytes.
extent_count MUST be > 0 for visible (non-tombstone) entries.
total_length MUST equal the sum of length across the extents.
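
The two MUST rules above can be checked mechanically. The following is a small sketch for visible (non-tombstone) entries; the function name `visible_entry_extents_valid` is hypothetical. The sum is accumulated in 64 bits so that many 32-bit lengths cannot wrap around before the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;
#pragma pack(pop)

_Static_assert(sizeof(ExtentRecord) == 16, "ExtentRecord must be 16 bytes");

// Check the invariants for a visible entry:
//   extent_count > 0, and total_length == sum of extent lengths.
static bool visible_entry_extents_valid(const ExtentRecord *extents,
                                        uint32_t extent_count,
                                        uint32_t total_length)
{
    if (extent_count == 0) return false;

    uint64_t sum = 0;  // 64-bit accumulator avoids 32-bit overflow
    for (uint32_t i = 0; i < extent_count; i++) {
        sum += extents[i].length;
    }
    return sum == total_length;
}
```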

7. SegmentFooter

#pragma pack(push,1)
typedef struct {
    uint64_t crc64;          // CRC over header + records + bloom filter
    uint64_t seal_snapshot;  // Snapshot ID when segment was sealed
    uint64_t seal_time_ns;   // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)

Notes:

CRC ensures corruption detection during reads.
Seal information allows deterministic reconstruction of CURRENT state.
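
This document does not name the CRC-64 variant, so the sketch below is only an illustration of how a chained CRC over several file regions could be computed. It assumes the ECMA-182 polynomial with a zero initial value, no bit reflection, and no final XOR; the real parameters must come from the implementation or a later revision of this specification. The name `crc64_update` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed polynomial (ECMA-182). Not confirmed by this specification.
#define EXAMPLE_CRC64_POLY 0x42F0E1EBA9EA3693ULL

// Bitwise, MSB-first CRC-64 update. Pass 0 as the starting value, then feed
// each covered region in file order, passing the previous result back in.
static uint64_t crc64_update(uint64_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t)data[i] << 56;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000000000000000ULL) {
                crc = (crc << 1) ^ EXAMPLE_CRC64_POLY;
            } else {
                crc <<= 1;
            }
        }
    }
    return crc;
}
```

Because the update function can be chained, a reader could feed the header, the records, and the bloom filter one region after another and compare the final value with the footer's `crc64` field. A table-driven variant would be faster; the bitwise form is used here only for brevity.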

8. Bloom Filter

The bloom filter is optional and opaque to semantics.
Its purpose is lookup acceleration.
It MUST be deterministic: the same entries → the same bloom representation.
It is segment-local only; readers must make no global assumptions about it.

9. Versioning and Compatibility

The version field in the header identifies the encoding version.
Readers must reject unsupported versions.
New fields may be added in future versions only via version bump.
Existing fields must never change meaning.
Version 1 implies single-extent layout (legacy).
Version 2 introduces ExtentRecord lists and extents_offset / extent_count.
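
A reader following these rules might dispatch on the version field as in the sketch below. The enum and function names are hypothetical; only the meanings of versions 1 and 2 come from this section.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical classification of the header's version field.
typedef enum {
    LAYOUT_UNSUPPORTED   = 0,  // reader must reject the segment
    LAYOUT_SINGLE_EXTENT = 1,  // version 1: legacy single-extent layout
    LAYOUT_EXTENT_LIST   = 2   // version 2: ExtentRecord lists
} SegmentLayout;

static SegmentLayout layout_for_version(uint16_t version)
{
    switch (version) {
    case 1:  return LAYOUT_SINGLE_EXTENT;
    case 2:  return LAYOUT_EXTENT_LIST;
    default: return LAYOUT_UNSUPPORTED;  // unknown versions are rejected
    }
}
```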

10. Alignment and Packing

All structures are packed (no compiler padding)
Multi-byte integers are little-endian
Memory-mapped readers can directly index IndexRecord[] using records_offset.
Extents are accessed via IndexRecord.extents_offset relative to the file base.
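
The access pattern described above could look like the following sketch, where `base` points at the first byte of a mapped segment file. The function names `record_at` and `extents_of` are hypothetical, and the code assumes a little-endian host so the packed structs can be read in place. Both functions bounds-check against the file size and return NULL rather than reading outside the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi;
    uint64_t hash_mid;
    uint64_t hash_lo;
    uint32_t hash_tail;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t flags;
    uint32_t reserved;
} IndexRecord;

typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;
#pragma pack(pop)

// Return a pointer to the i-th IndexRecord, or NULL if it is out of range.
static const IndexRecord *record_at(const uint8_t *base, uint64_t file_size,
                                    uint64_t records_offset,
                                    uint64_t record_count, uint64_t i)
{
    if (i >= record_count) return NULL;
    if (records_offset > file_size) return NULL;
    if (record_count > (file_size - records_offset) / sizeof(IndexRecord))
        return NULL;
    return (const IndexRecord *)(base + records_offset) + i;
}

// Return the first ExtentRecord of a record, or NULL if it has none
// or if its extent list would extend past the end of the file.
static const ExtentRecord *extents_of(const uint8_t *base, uint64_t file_size,
                                      const IndexRecord *r)
{
    if (r->extent_count == 0) return NULL;
    if (r->extents_offset > file_size) return NULL;
    if (r->extent_count > (file_size - r->extents_offset) / sizeof(ExtentRecord))
        return NULL;
    return (const ExtentRecord *)(base + r->extents_offset);
}
```

Because the structures are packed, the compiler treats them as byte-aligned, so the returned pointers are usable even when the offsets in the file are not multiples of 8.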

11. Summary of Encoding Guarantees

The ENC-ASL-CORE-INDEX specification ensures:

Deterministic layout across platforms
Direct mapping from semantic model (ArtifactKey → ArtifactLocation)
Immutability of sealed segments
Integrity validation via CRC
Forward-compatible extensibility

12. Relationship to Other Layers

| Layer | Responsibility |
|---|---|
| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts |
| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |

This completes the stack: semantics → store behavior → encoding.
