# ENC-ASL-CORE-INDEX
### Encoding Specification for ASL Core Index
---
## 1. Purpose
This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.
It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
It is intended for:
* C libraries
* Tools
* API frontends
* Memory-mapped access

It does **not** define:
* Index semantics (see ASL/1-CORE-INDEX)
* Store lifecycle behavior (see ASL-STORE-INDEX)
* Acceleration semantics (see ASL/INDEX-ACCEL/1)
---
## 2. Encoding Principles
1. **Little-endian** representation
2. **Fixed-width fields** for deterministic access
3. **No pointers or references**; all offsets are file-relative
4. **Packed structures**; no compiler-introduced padding
5. **Forward compatibility** via version field
6. **CRC or checksum protection** for corruption detection
All multi-byte integers are little-endian unless explicitly noted.
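
As a minimal sketch of the little-endian rule, the helpers below decode and encode fixed-width integers byte by byte, so the result does not depend on the host's byte order. The names `load_le16`, `load_le32`, `load_le64`, and `store_le64` are hypothetical and are not defined by this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not part of the spec): little-endian access
 * to a byte buffer that works on any host byte order. */
static inline uint16_t load_le16(const uint8_t *p) {
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

static inline uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static inline uint64_t load_le64(const uint8_t *p) {
    /* Low 32 bits come first in the file, high 32 bits follow. */
    return (uint64_t)load_le32(p) | ((uint64_t)load_le32(p + 4) << 32);
}

static inline void store_le64(uint8_t *p, uint64_t v) {
    for (size_t i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));   /* least significant byte first */
}
```

On a little-endian host a plain `memcpy` into a packed struct gives the same values; helpers like these only matter when the reader may run on a big-endian machine.
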
---
## 3. Segment Layout
Each index segment file is laid out as follows:
```
+------------------+
| SegmentHeader |
+------------------+
| BloomFilter[] | (optional, opaque to semantics)
+------------------+
| IndexRecord[] |
+------------------+
| ExtentRecord[] |
+------------------+
| SegmentFooter |
+------------------+
```
* **SegmentHeader**: fixed-size, mandatory
* **BloomFilter**: optional, opaque, segment-local
* **IndexRecord[]**: array of index entries
* **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
* **SegmentFooter**: fixed-size, mandatory
Offsets in the header locate the Bloom filter, the `IndexRecord` array, and the `ExtentRecord` array.
---
## 4. SegmentHeader
```c
#pragma pack(push,1)
typedef struct {
uint64_t magic; // Unique magic number identifying segment file type
uint16_t version; // Encoding version
uint16_t shard_id; // Optional shard identifier
uint32_t header_size; // Total size of header including fields below
uint64_t snapshot_min; // Minimum snapshot ID for which segment entries are valid
uint64_t snapshot_max; // Maximum snapshot ID
uint64_t record_count; // Number of index entries
uint64_t records_offset; // File offset of IndexRecord array
uint64_t bloom_offset; // File offset of bloom filter (0 if none)
uint64_t bloom_size; // Size of bloom filter (0 if none)
uint64_t extents_offset; // File offset of ExtentRecord array
uint64_t extent_count; // Total number of ExtentRecord entries
uint64_t flags; // Reserved for future use
} SegmentHeader;
#pragma pack(pop)
```
**Notes:**
* `magic` lets a reader confirm that the file is an index segment before parsing further.
* `version` allows forward-compatible extension.
* `snapshot_min` / `snapshot_max` bound the range of snapshot IDs for which the segment's entries are valid.
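
The sketch below shows one way a reader might sanity-check a version 2 header before trusting its offsets. It rests on several assumptions that this document does not state: the magic value `ASL_INDEX_MAGIC_PLACEHOLDER` is invented for illustration, the function name and error codes are hypothetical, the host is little-endian, and the ordering check on the snapshot range and the pairing check on the bloom fields are this sketch's reading of the field comments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push,1)
typedef struct {
    uint64_t magic;
    uint16_t version;
    uint16_t shard_id;
    uint32_t header_size;
    uint64_t snapshot_min;
    uint64_t snapshot_max;
    uint64_t record_count;
    uint64_t records_offset;
    uint64_t bloom_offset;
    uint64_t bloom_size;
    uint64_t extents_offset;
    uint64_t extent_count;
    uint64_t flags;
} SegmentHeader;
#pragma pack(pop)

/* With packing, the fields above add up to 88 bytes. */
static_assert(sizeof(SegmentHeader) == 88, "SegmentHeader must be 88 bytes");

/* Placeholder only: the real magic value is not given in this document. */
#define ASL_INDEX_MAGIC_PLACEHOLDER 0x0123456789ABCDEFull

/* Returns 0 if the header looks structurally sane, a negative code otherwise.
 * Assumes a little-endian host, so a byte copy yields correct field values. */
static inline int segment_header_check(const uint8_t *file, uint64_t file_size,
                                       SegmentHeader *out)
{
    if (file_size < sizeof(SegmentHeader)) return -1;      /* truncated file */
    memcpy(out, file, sizeof *out);                         /* no unaligned reads */
    if (out->magic != ASL_INDEX_MAGIC_PLACEHOLDER) return -2;
    if (out->header_size < sizeof(SegmentHeader) ||
        out->header_size > file_size) return -3;            /* implausible size */
    if (out->snapshot_min > out->snapshot_max) return -4;   /* assumed ordering */
    if ((out->bloom_offset == 0) != (out->bloom_size == 0)) return -5;
    return 0;
}
```
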
---
## 5. IndexRecord
```c
#pragma pack(push,1)
typedef struct {
uint64_t hash_hi; // High 64 bits of artifact hash
uint64_t hash_mid; // Middle 64 bits
uint64_t hash_lo; // Low 64 bits
uint32_t hash_tail; // Optional tail for full hash if larger than 192 bits
uint64_t extents_offset; // File offset of first ExtentRecord for this entry
uint32_t extent_count; // Number of ExtentRecord entries for this artifact
uint32_t total_length; // Total artifact length in bytes
uint32_t flags; // Optional flags (tombstone, reserved, etc.)
uint32_t reserved; // Reserved for alignment/future use
} IndexRecord;
#pragma pack(pop)
```
**Notes:**
* `hash_*` fields store the artifact key deterministically.
* `extents_offset` references the first ExtentRecord for this entry.
* `extent_count` defines how many extents to read (may be 0 for tombstones).
* `total_length` is the exact artifact size in bytes.
* Flags may indicate tombstone or other special status.
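
To illustrate how the key fields and flags might be used, here is a small sketch of key equality and a linear lookup over an `IndexRecord` array. The tombstone bit `INDEX_FLAG_TOMBSTONE_ASSUMED` is an assumption (this document does not assign flag bits), and the helper names are hypothetical. A linear scan is used because this document does not say whether records are sorted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi;
    uint64_t hash_mid;
    uint64_t hash_lo;
    uint32_t hash_tail;
    uint64_t extents_offset;
    uint32_t extent_count;
    uint32_t total_length;
    uint32_t flags;
    uint32_t reserved;
} IndexRecord;
#pragma pack(pop)

/* With packing, the fields above add up to 52 bytes. */
static_assert(sizeof(IndexRecord) == 52, "IndexRecord must be 52 bytes");

/* Assumed bit position; the spec only says flags MAY mark tombstones. */
#define INDEX_FLAG_TOMBSTONE_ASSUMED 0x1u

static inline int index_key_equal(const IndexRecord *a, const IndexRecord *b) {
    return a->hash_hi == b->hash_hi && a->hash_mid == b->hash_mid &&
           a->hash_lo == b->hash_lo && a->hash_tail == b->hash_tail;
}

static inline int index_record_is_tombstone(const IndexRecord *r) {
    return (r->flags & INDEX_FLAG_TOMBSTONE_ASSUMED) != 0;
}

/* Returns the first record whose key matches `key`, or NULL if none does. */
static inline const IndexRecord *index_find(const IndexRecord *recs,
                                            uint64_t count,
                                            const IndexRecord *key)
{
    for (uint64_t i = 0; i < count; i++)
        if (index_key_equal(&recs[i], key))
            return &recs[i];
    return NULL;
}
```
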
---
## 6. ExtentRecord
```c
#pragma pack(push,1)
typedef struct {
uint64_t block_id; // ASL block identifier
uint32_t offset; // Offset within block
uint32_t length; // Length of this extent
} ExtentRecord;
#pragma pack(pop)
```
**Notes:**
* Extents are concatenated in order to produce artifact bytes.
* `extent_count` MUST be > 0 for visible (non-tombstone) entries.
* `total_length` MUST equal the sum of `length` across the extents.
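
The two MUST rules above can be checked mechanically. The sketch below is one possible validator; `extents_consistent` is a hypothetical name, and applying the sum rule to tombstones as well as visible entries is this sketch's interpretation, since the notes do not say how `total_length` is set for tombstones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#pragma pack(push,1)
typedef struct {
    uint64_t block_id;
    uint32_t offset;
    uint32_t length;
} ExtentRecord;
#pragma pack(pop)

static_assert(sizeof(ExtentRecord) == 16, "ExtentRecord must be 16 bytes");

/* Checks the extent list of a single index entry:
 *  - visible (non-tombstone) entries need at least one extent;
 *  - the extent lengths must add up to total_length. */
static inline bool extents_consistent(const ExtentRecord *ext,
                                      uint32_t extent_count,
                                      uint32_t total_length,
                                      bool is_tombstone)
{
    if (!is_tombstone && extent_count == 0)
        return false;

    uint64_t sum = 0;                 /* 64-bit so 32-bit lengths cannot wrap */
    for (uint32_t i = 0; i < extent_count; i++)
        sum += ext[i].length;

    return sum == total_length;
}
```
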
---
## 7. SegmentFooter
```c
#pragma pack(push,1)
typedef struct {
uint64_t crc64; // CRC over header + records + bloom filter
uint64_t seal_snapshot; // Snapshot ID when segment was sealed
uint64_t seal_time_ns; // High-resolution seal timestamp
} SegmentFooter;
#pragma pack(pop)
```
**Notes:**
* CRC ensures corruption detection during reads.
* Seal information allows deterministic reconstruction of CURRENT state.
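
This document does not name the CRC-64 variant or the exact byte ranges fed into it. Purely as an illustration, the sketch below implements the widely used CRC-64/XZ parameters (reflected ECMA-182 polynomial, all-ones initial value and final XOR) with an incremental interface, so a caller could feed the header, records, and bloom filter regions one after another. Both the choice of variant and the name `crc64_xz_update` are assumptions, not part of this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: CRC-64/XZ parameters; the spec does not fix the variant.
 * Start with crc = 0 and pass the previous return value to continue
 * a running checksum over several regions. */
static inline uint64_t crc64_xz_update(uint64_t crc, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    crc = ~crc;                                   /* undo final XOR / apply init */
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Reflected polynomial; mask is all ones when the low bit is set. */
            uint64_t mask = (uint64_t)0 - (crc & 1);
            crc = (crc >> 1) ^ (0xC96C5795D7870F42ull & mask);
        }
    }
    return ~crc;                                  /* final XOR */
}
```

A bitwise loop keeps the sketch short; a table-driven version would be the usual choice for large segments.
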
---
## 8. Bloom Filter
* The bloom filter is **optional** and opaque to semantics.
* Its purpose is **lookup acceleration**.
* Must be deterministic: same entries → same bloom representation.
* Segment-local only; no global assumptions.
---
## 9. Versioning and Compatibility
* The `version` field in the header identifies the encoding.
* Readers must **reject unsupported versions**.
* New fields may be added in future versions only via version bump.
* Existing fields must **never change meaning**.
* Version `1` implies single-extent layout (legacy).
* Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
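
A reader could turn these rules into a small version gate, sketched below. The enum and function names are hypothetical; only the meaning of versions `1` and `2` and the reject-by-default behavior come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the header's version field. */
typedef enum {
    SEGMENT_LAYOUT_UNSUPPORTED   = 0,  /* reader must reject the segment */
    SEGMENT_LAYOUT_SINGLE_EXTENT = 1,  /* version 1: legacy layout */
    SEGMENT_LAYOUT_EXTENT_LIST   = 2   /* version 2: ExtentRecord lists */
} SegmentLayout;

static inline SegmentLayout segment_layout_for_version(uint16_t version)
{
    switch (version) {
    case 1:  return SEGMENT_LAYOUT_SINGLE_EXTENT;
    case 2:  return SEGMENT_LAYOUT_EXTENT_LIST;
    default: return SEGMENT_LAYOUT_UNSUPPORTED;   /* unknown or future version */
    }
}
```
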
---
## 10. Alignment and Packing
* All structures are **packed** (no compiler padding)
* Multi-byte integers are **little-endian**
* Memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
* Extents are accessed via `IndexRecord.extents_offset` relative to the file base.
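
Because the structures are packed, a 52-byte `IndexRecord` is not naturally aligned inside a mapped file. The sketch below copies records and extents out with `memcpy` and bounds-checks every access against the file size. The function names are hypothetical, and the code assumes a little-endian host so that copied bytes match the field values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push,1)
typedef struct {
    uint64_t hash_hi, hash_mid, hash_lo;
    uint32_t hash_tail;
    uint64_t extents_offset;
    uint32_t extent_count, total_length, flags, reserved;
} IndexRecord;

typedef struct {
    uint64_t block_id;
    uint32_t offset, length;
} ExtentRecord;
#pragma pack(pop)

/* Copies record `i` of IndexRecord[] out of the mapped file.
 * Returns false if the index or the byte range is out of bounds. */
static inline bool read_index_record(const uint8_t *base, uint64_t file_size,
                                     uint64_t records_offset, uint64_t record_count,
                                     uint64_t i, IndexRecord *out)
{
    if (i >= record_count || records_offset > file_size) return false;
    uint64_t avail = (file_size - records_offset) / sizeof(IndexRecord);
    if (i >= avail) return false;              /* also rules out overflow below */
    memcpy(out, base + records_offset + i * sizeof(IndexRecord), sizeof *out);
    return true;
}

/* Copies extent `k` of the given record; extents_offset is file-relative. */
static inline bool read_extent(const uint8_t *base, uint64_t file_size,
                               const IndexRecord *rec, uint32_t k, ExtentRecord *out)
{
    if (k >= rec->extent_count || rec->extents_offset > file_size) return false;
    uint64_t avail = (file_size - rec->extents_offset) / sizeof(ExtentRecord);
    if (k >= avail) return false;
    memcpy(out, base + rec->extents_offset + (uint64_t)k * sizeof(ExtentRecord),
           sizeof *out);
    return true;
}
```
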
---
## 11. Summary of Encoding Guarantees
The ENC-ASL-CORE-INDEX specification ensures:
1. **Deterministic layout** across platforms
2. **Direct mapping from semantic model** (ArtifactKey → ArtifactLocation)
3. **Immutability of sealed segments**
4. **Integrity validation** via CRC
5. **Forward-compatible extensibility**
---
## 12. Relationship to Other Layers
| Layer | Responsibility |
| ------------------ | ---------------------------------------------------------- |
| ASL/1-CORE-INDEX | Defines semantic meaning of artifact → location mapping |
| ASL-STORE-INDEX | Defines lifecycle, visibility, and replay contracts |
| ASL/INDEX-ACCEL/1 | Defines routing, filters, sharding (observationally inert) |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |

This completes the stack: **semantics → store behavior → encoding**.