# ENC-ASL-CORE-INDEX

### Encoding Specification for ASL Core Index

---

## 1. Purpose

This document defines the **exact encoding of ASL index segments** and their records for storage and interoperability.

It translates the **semantic model of ASL-CORE-INDEX** and the **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.

It is intended for:

* C libraries
* Tools
* API frontends
* Memory-mapped access

It does **not** define:

* Index semantics (see ASL-CORE-INDEX)
* Store lifecycle behavior (see ASL-STORE-INDEX)

---

## 2. Encoding Principles

1. **Little-endian** representation
2. **Fixed-width fields** for deterministic access
3. **No in-memory pointers**; all references are file-relative byte offsets
4. **Packed structures**; no compiler-introduced padding
5. **Forward compatibility** via the `version` field
6. **CRC-64 protection** (stored in the footer) for corruption detection

All multi-byte integers are little-endian unless explicitly noted.
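
Because every field is little-endian and packed, a reader can decode values from raw bytes without depending on host byte order or alignment. The following is a minimal C99 sketch; the `asl_load_*` names are hypothetical helpers, not part of this specification. For example, the bytes `01 02 03 04` decode to `0x04030201` with `asl_load_u32le`.

```c
#include <stdint.h>

/* Assemble integers from bytes, least significant byte first.
 * Byte-wise loads give the same result on any host and never
 * perform an unaligned multi-byte read. */
static uint16_t asl_load_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t asl_load_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t asl_load_u64le(const uint8_t *p) {
    /* The low 32-bit word comes first in little-endian order. */
    return (uint64_t)asl_load_u32le(p) | ((uint64_t)asl_load_u32le(p + 4) << 32);
}
```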

---

## 3. Segment Layout

Each index segment file is laid out as follows:

```
+------------------+
| SegmentHeader    |
+------------------+
| BloomFilter[]    |  (optional, opaque to semantics)
+------------------+
| IndexRecord[]    |
+------------------+
| SegmentFooter    |
+------------------+
```

* **SegmentHeader**: fixed-size, mandatory
* **BloomFilter**: optional, opaque, segment-local
* **IndexRecord[]**: array of index entries
* **SegmentFooter**: fixed-size, mandatory

The header fields `bloom_offset` / `bloom_size` and `records_offset` / `record_count` locate the Bloom filter and the index record array.
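
A reader should confirm that these offsets follow the order shown in the diagram before touching any region. The sketch below assumes, beyond what the header states, that the footer occupies the last `sizeof(SegmentFooter)` = 24 bytes of the file, that `bloom_size` is measured in bytes, and that records sit at the packed 52-byte stride; `asl_layout_ok` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASL_FOOTER_SIZE 24u  /* packed sizeof(SegmentFooter): three uint64_t */
#define ASL_RECORD_SIZE 52u  /* packed sizeof(IndexRecord) */

/* Returns true if the header's offsets fit the Section 3 order:
 * header, optional Bloom filter, record array, footer at end of file. */
static bool asl_layout_ok(uint64_t file_size, uint32_t header_size,
                          uint64_t bloom_offset, uint64_t bloom_size,
                          uint64_t records_offset, uint64_t record_count) {
    if (file_size < (uint64_t)header_size + ASL_FOOTER_SIZE)
        return false;
    const uint64_t footer_offset = file_size - ASL_FOOTER_SIZE;
    uint64_t cursor = header_size;  /* first byte not yet claimed by a region */

    if (bloom_offset != 0 || bloom_size != 0) {  /* both 0 means "no filter" */
        if (bloom_offset < cursor || bloom_offset > footer_offset ||
            bloom_size > footer_offset - bloom_offset)
            return false;
        cursor = bloom_offset + bloom_size;
    }
    if (records_offset < cursor || records_offset > footer_offset)
        return false;
    /* Compare by division so record_count * ASL_RECORD_SIZE cannot overflow. */
    return record_count <= (footer_offset - records_offset) / ASL_RECORD_SIZE;
}
```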

---

## 4. SegmentHeader

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint64_t magic;          // Magic number identifying the segment file type
    uint16_t version;        // Encoding version
    uint16_t shard_id;       // Optional shard identifier
    uint32_t header_size;    // Total header size in bytes, including all fields

    uint64_t snapshot_min;   // Minimum snapshot ID for which segment entries are valid
    uint64_t snapshot_max;   // Maximum snapshot ID for which segment entries are valid

    uint64_t record_count;   // Number of index entries
    uint64_t records_offset; // File offset of the IndexRecord array

    uint64_t bloom_offset;   // File offset of the Bloom filter (0 if none)
    uint64_t bloom_size;     // Size of the Bloom filter (0 if none)

    uint64_t flags;          // Reserved for future use
} SegmentHeader;
#pragma pack(pop)
```
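
A reader can validate a header straight from the mapped bytes. This sketch decodes fields at their packed offsets (`magic` at 0, `version` at 8, `header_size` at 12, `snapshot_min` at 16, `snapshot_max` at 24; 72 bytes in total), so it does not depend on host byte order. The spec gives no concrete magic value, so the caller passes one in; the status codes, the single supported version, and the `snapshot_min <= snapshot_max` check are assumptions of this sketch, and `asl_check_header` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; the spec defines none. */
typedef enum {
    ASL_HDR_OK = 0,
    ASL_HDR_TRUNCATED,
    ASL_HDR_BAD_MAGIC,
    ASL_HDR_BAD_VERSION,
    ASL_HDR_BAD_SIZE,
    ASL_HDR_BAD_SNAPSHOTS
} AslHeaderStatus;

#define ASL_HEADER_V1_SIZE 72u  /* packed sizeof(SegmentHeader) */

/* Little-endian load of n bytes (n <= 8), independent of host byte order. */
static uint64_t le_load(const uint8_t *p, unsigned n) {
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

static AslHeaderStatus asl_check_header(const uint8_t *buf, size_t len,
                                        uint64_t expected_magic,
                                        uint16_t supported_version) {
    if (len < ASL_HEADER_V1_SIZE)
        return ASL_HDR_TRUNCATED;
    if (le_load(buf + 0, 8) != expected_magic)
        return ASL_HDR_BAD_MAGIC;
    if (le_load(buf + 8, 2) != supported_version)  /* reject unsupported versions */
        return ASL_HDR_BAD_VERSION;
    uint64_t header_size = le_load(buf + 12, 4);
    if (header_size < ASL_HEADER_V1_SIZE || header_size > len)
        return ASL_HDR_BAD_SIZE;
    if (le_load(buf + 16, 8) > le_load(buf + 24, 8))  /* snapshot_min > snapshot_max */
        return ASL_HDR_BAD_SNAPSHOTS;
    return ASL_HDR_OK;
}
```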

**Notes:**

* `magic` lets a reader verify the segment type before parsing further.
* `version` allows forward-compatible extension.
* `snapshot_min` / `snapshot_max` define visibility semantics.

---

## 5. IndexRecord

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint64_t hash_hi;    // High 64 bits of artifact hash
    uint64_t hash_mid;   // Middle 64 bits
    uint64_t hash_lo;    // Low 64 bits
    uint32_t hash_tail;  // Optional tail for hashes longer than 192 bits

    uint64_t block_id;   // ASL block identifier
    uint32_t offset;     // Offset within block
    uint32_t length;     // Length of artifact bytes

    uint32_t flags;      // Optional flags (tombstone, reserved, etc.)
    uint32_t reserved;   // Reserved for alignment/future use
} IndexRecord;
#pragma pack(pop)
```
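
Resolving an artifact key to its location amounts to finding the matching record. A minimal sketch, assuming records are decoded from raw bytes at the packed 52-byte stride (`hash_tail` at offset 24, `block_id` at 28, `offset` at 36, `length` at 40, `flags` at 44). The spec does not say whether records are sorted or which bit marks a tombstone, so this uses a linear scan, treats the first match as authoritative, and uses a hypothetical `ASL_FLAG_TOMBSTONE` value of `0x1`; `AslKey`, `AslLocation`, and `asl_lookup` are likewise hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ASL_RECORD_SIZE 52u      /* packed sizeof(IndexRecord) */
#define ASL_FLAG_TOMBSTONE 0x1u  /* hypothetical bit; the spec leaves values open */

typedef struct { uint64_t hi, mid, lo; uint32_t tail; } AslKey;
typedef struct { uint64_t block_id; uint32_t offset, length; } AslLocation;

/* Little-endian load of n bytes (n <= 8), independent of host byte order. */
static uint64_t le_load(const uint8_t *p, unsigned n) {
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Scans count records; returns true and fills *out if the first record with
 * this key is live, false if the key is absent or tombstoned. */
static bool asl_lookup(const uint8_t *records, uint64_t count,
                       const AslKey *key, AslLocation *out) {
    for (uint64_t i = 0; i < count; i++) {
        const uint8_t *r = records + i * ASL_RECORD_SIZE;
        if (le_load(r, 8) != key->hi || le_load(r + 8, 8) != key->mid ||
            le_load(r + 16, 8) != key->lo || le_load(r + 24, 4) != key->tail)
            continue;
        if (le_load(r + 44, 4) & ASL_FLAG_TOMBSTONE)
            return false;  /* key present but deleted */
        out->block_id = le_load(r + 28, 8);
        out->offset = (uint32_t)le_load(r + 36, 4);
        out->length = (uint32_t)le_load(r + 40, 4);
        return true;
    }
    return false;
}
```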

**Notes:**

* `hash_*` fields store the artifact key deterministically.
* `block_id` references an ASL block.
* `offset` / `length` define the artifact's byte range within the block.
* `flags` may mark a tombstone or other special status.

---

## 6. SegmentFooter

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint64_t crc64;         // CRC over header + records + Bloom filter
    uint64_t seal_snapshot; // Snapshot ID when the segment was sealed
    uint64_t seal_time_ns;  // High-resolution seal timestamp (nanoseconds)
} SegmentFooter;
#pragma pack(pop)
```
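
The spec does not name a CRC-64 variant or fix the order in which regions are covered. As an assumption, the sketch below uses the CRC-64/XZ parameters (ECMA-182 polynomial, reflected, initial value and final XOR all ones) and covers every byte before the footer, i.e. header, Bloom filter, and records in file order. `asl_crc64_update` and `asl_footer_crc_ok` are hypothetical names; passing a previous result back in continues the checksum, so regions can also be fed one at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the ECMA-182 polynomial (CRC-64/XZ parameters). */
#define ASL_CRC64_POLY 0xC96C5795D7870F42ull

/* Bitwise CRC-64/XZ. Start with crc = 0; feeding the returned value back
 * in extends the checksum over further bytes. */
static uint64_t asl_crc64_update(uint64_t crc, const void *data, size_t len) {
    const uint8_t *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (ASL_CRC64_POLY & (0 - (crc & 1)));
    }
    return ~crc;
}

/* Checks the stored crc64 against every byte preceding the 24-byte footer. */
static int asl_footer_crc_ok(const uint8_t *file, size_t file_size,
                             uint64_t stored_crc) {
    if (file_size < 24)
        return 0;
    return asl_crc64_update(0, file, file_size - 24) == stored_crc;
}
```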

**Notes:**

* The CRC enables corruption detection during reads.
* Seal information allows deterministic reconstruction of CURRENT state.

---

## 7. Bloom Filter

* The Bloom filter is **optional** and opaque to semantics.
* Its purpose is **lookup acceleration**.
* It must be deterministic: the same entries produce the same Bloom representation.
* It is segment-local only; no global assumptions are made.
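
Because artifact keys are already hashes, a filter can derive its probe positions directly from them, and setting bits with bitwise OR makes the result independent of insertion order, which satisfies the determinism rule above. This is a sketch under assumptions: the probe count, the double-hashing scheme (`hash_hi + i * hash_lo`, with `hash_lo` forced odd), and LSB-first bit order within each byte are all hypothetical, since the spec keeps the filter opaque. `m_bits` would be `bloom_size * 8` if `bloom_size` is in bytes, and the bit array must start zeroed.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASL_BLOOM_PROBES 4u  /* hypothetical probe count */

/* Probe i via double hashing: (h1 + i * h2) mod m, with h2 forced odd.
 * Unsigned wrap-around is well defined, so the position is deterministic. */
static uint64_t bloom_bit(uint64_t h1, uint64_t h2, unsigned i, uint64_t m_bits) {
    return (h1 + (uint64_t)i * (h2 | 1)) % m_bits;
}

/* Sets the probe bits for one key; OR-ing makes insertion order irrelevant. */
static void asl_bloom_add(uint8_t *bits, uint64_t m_bits,
                          uint64_t hash_hi, uint64_t hash_lo) {
    for (unsigned i = 0; i < ASL_BLOOM_PROBES; i++) {
        uint64_t b = bloom_bit(hash_hi, hash_lo, i, m_bits);
        bits[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* false means "definitely absent"; true means "possibly present". */
static bool asl_bloom_maybe_contains(const uint8_t *bits, uint64_t m_bits,
                                     uint64_t hash_hi, uint64_t hash_lo) {
    for (unsigned i = 0; i < ASL_BLOOM_PROBES; i++) {
        uint64_t b = bloom_bit(hash_hi, hash_lo, i, m_bits);
        if (!(bits[b / 8] & (1u << (b % 8))))
            return false;
    }
    return true;
}
```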

---

## 8. Versioning and Compatibility

* The header's `version` field defines the encoding.
* Readers must **reject unsupported versions**.
* New fields may be added only in a future version, via a version bump.
* Existing fields must **never change meaning**.

---

## 9. Alignment and Packing

* All structures are **packed** (no compiler padding).
* Multi-byte integers are **little-endian**.
* On little-endian hosts, memory-mapped readers can directly index `IndexRecord[]` using `records_offset`.
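
Packing can be checked at compile time. With no padding, `IndexRecord` is 3 × 8 + 4 + 8 + 4 × 4 = 52 bytes, so records sit at a 52-byte stride and `block_id` lands at offset 28, which is not 8-byte aligned. The sketch below, assuming a C11 compiler that supports `#pragma pack`, asserts that layout and copies a record out of a mapped buffer with `memcpy` to avoid unaligned loads; the field values are only correct on a little-endian host. `asl_record_at` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
typedef struct {  /* same field order as Section 5 */
    uint64_t hash_hi, hash_mid, hash_lo;
    uint32_t hash_tail;
    uint64_t block_id;
    uint32_t offset, length, flags, reserved;
} IndexRecord;
#pragma pack(pop)

/* Fail the build if the compiler inserted any padding. */
_Static_assert(sizeof(IndexRecord) == 52, "IndexRecord must be 52 bytes");
_Static_assert(offsetof(IndexRecord, block_id) == 28, "block_id follows hash_tail");

/* Copies record i out of a memory-mapped segment starting at base.
 * memcpy tolerates the misaligned 52-byte stride. */
static IndexRecord asl_record_at(const uint8_t *base, uint64_t records_offset,
                                 uint64_t i) {
    IndexRecord rec;
    memcpy(&rec, base + records_offset + i * sizeof(IndexRecord), sizeof rec);
    return rec;
}
```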

---

## 10. Summary of Encoding Guarantees

The ENC-ASL-CORE-INDEX specification ensures:

1. **Deterministic layout** across platforms
2. **Direct mapping from the semantic model** (ArtifactKey → ArtifactLocation)
3. **Immutability of sealed segments**
4. **Integrity validation** via CRC
5. **Forward-compatible extensibility**

---

## 11. Relationship to Other Layers

| Layer              | Responsibility                                             |
| ------------------ | ---------------------------------------------------------- |
| ASL-CORE-INDEX     | Defines semantic meaning of artifact → location mapping    |
| ASL-STORE-INDEX    | Defines lifecycle, visibility, and replay contracts        |
| ENC-ASL-CORE-INDEX | Defines exact bytes-on-disk format for segment persistence |

This completes the stack: **semantics → store behavior → encoding**.