Refine index specs for variable digests and visibility

Carl Niklas Rydberg 2026-01-17 07:05:11 +01:00
parent f2225f7a73
commit c2000cb6d7
3 changed files with 38 additions and 20 deletions


```diff
@@ -87,7 +87,7 @@ For a fixed `{StoreConfig, Snapshot, LogPrefix}`, lookup results MUST be determi
 ### 3.3 StoreConfig Consistency
-All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`.
+All references in an index view are interpreted under a fixed StoreConfig. Implementations MAY store only the digest portion in the index when `hash_id` is fixed by StoreConfig, but the semantic key is always a full `Reference`. Encoding profiles MUST allow variable-length digests; the digest length MUST be either explicit in the encoding or derivable from `hash_id` and StoreConfig.
 ---
```
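The rule that the semantic key is always a full `Reference`, even when an index stores only digests, can be shown with a small C sketch. It uses a hypothetical in-memory `Reference` struct, a hypothetical `ASL_MAX_DIGEST` bound, and the hypothetical helpers `reference_equal` and `reference_from_digest`. None of these names come from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on digest size; the spec requires variable-length
 * digests but does not fix a maximum. */
#define ASL_MAX_DIGEST 64

/* Hypothetical in-memory form of a full Reference: algorithm plus digest. */
typedef struct {
    uint32_t hash_id;                /* hash algorithm identifier */
    uint16_t digest_len;             /* explicit digest length in bytes */
    uint8_t  digest[ASL_MAX_DIGEST]; /* only the first digest_len bytes are used */
} Reference;

/* Two references name the same key only if algorithm, length and bytes match. */
static bool reference_equal(const Reference *a, const Reference *b) {
    assert(a->digest_len <= ASL_MAX_DIGEST && b->digest_len <= ASL_MAX_DIGEST);
    return a->hash_id == b->hash_id &&
           a->digest_len == b->digest_len &&
           memcmp(a->digest, b->digest, a->digest_len) == 0;
}

/* When StoreConfig fixes hash_id, an index may persist only the digest.
 * Rebuilding the semantic key re-attaches the configured hash_id.
 * Assumption: empty or oversized digests are rejected. */
static bool reference_from_digest(uint32_t config_hash_id,
                                  const uint8_t *digest, uint16_t len,
                                  Reference *out) {
    if (len == 0 || len > ASL_MAX_DIGEST)
        return false;
    out->hash_id = config_hash_id;
    out->digest_len = len;
    memcpy(out->digest, digest, len);
    return true;
}
```

Comparing `digest_len` before the bytes matters once digests are variable-length. Two digests can share a prefix and still be different keys.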
```diff
@@ -97,6 +97,7 @@ All references in an index view are interpreted under a fixed StoreConfig. Imple
 * Each extent references immutable bytes within a block.
 * The artifact bytes are defined by **concatenating extents in order**.
 * A visible ArtifactLocation MUST be **non-empty** and MUST fully cover the artifact byte sequence with no gaps or extra bytes.
+* Tombstone entries are visible but MUST have no ArtifactLocation; they only shadow prior entries.
 * Extents MUST have `length > 0` and MUST reference valid byte ranges within their blocks.
 * Extents MAY refer to the same BlockID multiple times, but the ordered concatenation MUST be deterministic and exact.
 * An ArtifactLocation is valid only while all referenced blocks are retained.
```
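A reader enforcing the ArtifactLocation rules above might check each entry roughly as follows. This is a minimal sketch under stated assumptions. The `Extent` layout, the block-size table indexed by `block_id`, and `artifact_location_valid` are all hypothetical, since the excerpt does not show the real `ExtentRecord` layout or any block API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent shape; the real ExtentRecord layout is not shown here. */
typedef struct {
    uint64_t block_id; /* index into block_sizes below (simplification) */
    uint64_t offset;   /* byte offset within the block */
    uint32_t length;   /* byte count; must be > 0 */
} Extent;

/* Checks the ArtifactLocation rules for one entry. block_sizes[i] is the
 * size of retained block i (hypothetical lookup table). The same block may
 * appear in several extents; order is the caller's array order. */
static bool artifact_location_valid(bool is_tombstone,
                                    const Extent *extents, size_t count,
                                    uint64_t total_length,
                                    const uint64_t *block_sizes,
                                    size_t block_count) {
    if (is_tombstone)
        return count == 0;  /* tombstones have no ArtifactLocation */
    if (count == 0)
        return false;       /* visible locations must be non-empty */
    assert(extents != NULL && block_sizes != NULL);

    uint64_t covered = 0;
    for (size_t i = 0; i < count; i++) {
        const Extent *e = &extents[i];
        if (e->length == 0 || e->block_id >= block_count)
            return false;
        uint64_t bsize = block_sizes[e->block_id];
        /* offset + length must stay inside the block (overflow-safe form) */
        if (e->offset > bsize || e->length > bsize - e->offset)
            return false;
        if (e->length > UINT64_MAX - covered)
            return false;
        covered += e->length;
    }
    /* byte-exact: concatenated extents equal the artifact, no gap or overrun */
    return covered == total_length;
}
```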
```diff
@@ -108,7 +109,7 @@ All references in an index view are interpreted under a fixed StoreConfig. Imple
 An index entry is **visible** at CURRENT if and only if:
-1. The entry is admitted in the ordered log prefix for CURRENT.
+1. The entry is contained in a sealed segment whose seal record is admitted in the ordered log prefix for CURRENT (or anchored in the snapshot).
 2. The referenced bytes are immutable (e.g., the underlying block is sealed by store rules).
 Visibility is binary; entries are either visible or not visible.
```
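The revised visibility rule reduces to a two-part predicate. The sketch below makes two hypothetical choices that are not spec definitions. First, a reader tracks per-segment seal metadata in a `SegmentSealInfo` struct. Second, the ordered log prefix for CURRENT is modeled as all log positions below `current_prefix_len`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-segment metadata a reader might track. */
typedef struct {
    bool     snapshot_anchored; /* sealed segment recorded in the snapshot */
    bool     seal_logged;       /* a seal record exists in the log */
    uint64_t seal_log_pos;      /* log position of that seal record */
} SegmentSealInfo;

/* Rule 1: the entry's segment is snapshot-anchored, or its seal record lies
 * in the log prefix for CURRENT (positions < current_prefix_len).
 * Rule 2: the referenced bytes are immutable (supplied by the caller). */
static bool entry_visible(const SegmentSealInfo *seg,
                          uint64_t current_prefix_len,
                          bool bytes_immutable) {
    bool sealed_and_admitted =
        seg->snapshot_anchored ||
        (seg->seal_logged && seg->seal_log_pos < current_prefix_len);
    return sealed_and_admitted && bytes_immutable; /* binary result only */
}
```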
```diff
@@ -117,7 +118,7 @@ Visibility is binary; entries are either visible or not visible.
 ## 6. Snapshot and Log Semantics
-Snapshots provide a base mapping; the append-only log defines subsequent changes.
+Snapshots provide a base mapping of sealed segments; the append-only log admits later segment seals and policy records that define subsequent changes.
 The index state for a given CURRENT is defined as:
@@ -175,12 +176,12 @@ ASL/1-CORE-INDEX guarantees:
 Conforming implementations MUST enforce:
-1. No visibility without a log-admitted entry.
+1. No visibility without a sealed segment whose seal record is log-admitted (or snapshot-anchored).
 2. No mutation of visible index entries.
 3. Referenced bytes remain immutable for the entry's lifetime.
 4. Shadowing follows strict log order.
 5. Snapshot + log replay uniquely defines CURRENT.
-6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun).
+6. Visible ArtifactLocations are non-empty and byte-exact (no gaps, no overrun), except for tombstones which have no ArtifactLocation.
 Violation of any invariant constitutes index corruption.
```


```diff
@@ -154,7 +154,7 @@ Notes:
 To resolve an `ArtifactKey`:
 1. Identify all visible segments ≤ CURRENT.
-2. Search segments in **reverse creation order** (newest first).
+2. Search segments in **reverse seal-log order** (highest seal log position first).
 3. Return first matching entry.
 4. Respect tombstones to shadow prior entries.
```
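The lookup steps can be sketched in C as below. Keys and ArtifactLocations are simplified to `uint64_t` values. `Entry`, `Segment`, `LookupResult`, and `resolve` are hypothetical names. The point is the ordering by seal log position and the way a tombstone ends the search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified types: a key is reduced to a uint64_t here. */
typedef struct {
    uint64_t key;
    bool     tombstone;
    uint64_t value; /* stands in for the ArtifactLocation */
} Entry;

typedef struct {
    uint64_t     seal_log_pos; /* log position of the segment's seal record */
    const Entry *entries;
    size_t       count;
} Segment;

typedef enum { LOOKUP_FOUND, LOOKUP_DELETED, LOOKUP_MISSING } LookupResult;

/* qsort comparator: highest seal log position first. */
static int by_seal_pos_desc(const void *a, const void *b) {
    uint64_t pa = (*(const Segment *const *)a)->seal_log_pos;
    uint64_t pb = (*(const Segment *const *)b)->seal_log_pos;
    return (pa < pb) - (pa > pb);
}

/* `visible` must already contain only segments visible at CURRENT (step 1);
 * the pointer array is reordered in place. Assumes at most one entry per
 * key within a segment. */
static LookupResult resolve(const Segment **visible, size_t n,
                            uint64_t key, uint64_t *value_out) {
    qsort(visible, n, sizeof *visible, by_seal_pos_desc);   /* step 2 */
    for (size_t s = 0; s < n; s++) {
        for (size_t i = 0; i < visible[s]->count; i++) {
            const Entry *e = &visible[s]->entries[i];
            if (e->key != key)
                continue;
            if (e->tombstone)
                return LOOKUP_DELETED;                       /* step 4 */
            *value_out = e->value;                           /* step 3 */
            return LOOKUP_FOUND;
        }
    }
    return LOOKUP_MISSING;
}
```

Ordering by seal log position rather than creation time means a segment that was created early but sealed late still shadows segments sealed before it.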


```diff
@@ -9,6 +9,7 @@
 This document defines the **exact encoding of ASL index segments** and records for storage and interoperability.
 It translates the **semantic model of ASL/1-CORE-INDEX** and **store contracts of ASL-STORE-INDEX** into a deterministic **bytes-on-disk layout**.
+Variable-length digest requirements are defined in ASL/1-CORE-INDEX (`tier1/asl-core-index.md`).
 It is intended for:
@@ -50,6 +51,8 @@ Each index segment file is laid out as follows:
 +------------------+
 | IndexRecord[] |
 +------------------+
+| DigestBytes[] |
++------------------+
 | ExtentRecord[] |
 +------------------+
 | SegmentFooter |
@@ -59,6 +62,7 @@ Each index segment file is laid out as follows:
 * **SegmentHeader**: fixed-size, mandatory
 * **BloomFilter**: optional, opaque, segment-local
 * **IndexRecord[]**: array of index entries
+* **DigestBytes[]**: concatenated digest bytes referenced by IndexRecord
 * **ExtentRecord[]**: concatenated extent lists referenced by IndexRecord
 * **SegmentFooter**: fixed-size, mandatory
@@ -85,6 +89,9 @@ typedef struct {
 uint64_t bloom_offset; // File offset of bloom filter (0 if none)
 uint64_t bloom_size; // Size of bloom filter (0 if none)
+uint64_t digests_offset; // File offset of DigestBytes array
+uint64_t digests_size; // Total size in bytes of DigestBytes
 uint64_t extents_offset; // File offset of ExtentRecord array
 uint64_t extent_count; // Total number of ExtentRecord entries
```
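With `digests_offset` and `digests_size` in the header, a reader can bounds-check every section before trusting any record. The minimal sketch below assumes a hypothetical `HeaderRanges` view of the header fields shown above. The header, footer, and ExtentRecord sizes are supplied by the caller, because the excerpt does not state them. The sketch does not check whether sections overlap each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the SegmentHeader range fields shown above. */
typedef struct {
    uint64_t bloom_offset, bloom_size;
    uint64_t digests_offset, digests_size;
    uint64_t extents_offset, extent_count;
} HeaderRanges;

/* True if [off, off + len) lies inside [lo, hi), written to avoid overflow. */
static bool range_within(uint64_t off, uint64_t len, uint64_t lo, uint64_t hi) {
    return off >= lo && off <= hi && len <= hi - off;
}

/* Sections must sit between the end of the header and the start of the
 * footer. Overlap between sections is not checked here. */
static bool header_ranges_valid(const HeaderRanges *h, uint64_t file_size,
                                uint64_t header_size, uint64_t footer_size,
                                uint64_t extent_record_size) {
    assert(extent_record_size > 0);
    if (header_size > file_size || footer_size > file_size - header_size)
        return false;
    uint64_t lo = header_size, hi = file_size - footer_size;

    /* The bloom filter is optional: offset and size are both 0 when absent. */
    if (!(h->bloom_offset == 0 && h->bloom_size == 0) &&
        !range_within(h->bloom_offset, h->bloom_size, lo, hi))
        return false;
    if (!range_within(h->digests_offset, h->digests_size, lo, hi))
        return false;
    if (h->extent_count > UINT64_MAX / extent_record_size)
        return false;
    return range_within(h->extents_offset,
                        h->extent_count * extent_record_size, lo, hi);
}
```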
```diff
@@ -97,7 +104,7 @@ typedef struct {
 * `magic` ensures the reader validates the segment type.
 * `version` allows forward-compatible extension.
-* `snapshot_min` / `snapshot_max` define visibility semantics.
+* `snapshot_min` / `snapshot_max` are reserved for future use and carry no visibility semantics in version 3.
 ---
@@ -106,10 +113,10 @@ typedef struct {
 ```c
 #pragma pack(push,1)
 typedef struct {
-uint64_t hash_hi; // High 64 bits of artifact hash
-uint64_t hash_mid; // Middle 64 bits
-uint64_t hash_lo; // Low 64 bits
-uint32_t hash_tail; // Optional tail for full hash if larger than 192 bits
+uint32_t hash_id; // Hash algorithm identifier
+uint16_t digest_len; // Digest length in bytes
+uint16_t reserved0; // Reserved for alignment/future use
+uint64_t digest_offset; // File offset of digest bytes for this entry
 uint64_t extents_offset; // File offset of first ExtentRecord for this entry
 uint32_t extent_count; // Number of ExtentRecord entries for this artifact
@@ -123,9 +130,10 @@ typedef struct {
 **Notes:**
-* `hash_*` fields store the artifact key deterministically.
+* `hash_id` + `digest_len` + `digest_offset` store the artifact key deterministically.
+* `digest_len` MUST be explicit in the encoding and MUST match the length implied by `hash_id` and StoreConfig.
 * `extents_offset` references the first ExtentRecord for this entry.
-* `extent_count` defines how many extents to read (may be 0 for tombstones).
+* `extent_count` defines how many extents to read (may be 0 for tombstones; see ASL/1-CORE-INDEX in `tier1/asl-core-index.md`).
 * `total_length` is the exact artifact size in bytes.
 * Flags may indicate tombstone or other special status.
```
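The new notes require `digest_len` to be explicit and consistent with `hash_id`, and each digest to lie inside the DigestBytes region. A sketch of that per-record check follows. The `hash_id` values 1 and 2 and their lengths (32 and 64 bytes) are placeholders, because the excerpt does not define a hash registry. `expected_digest_len` and `record_digest_valid` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: expected digest length for a hash_id under the active
 * StoreConfig. The ids and lengths below are placeholders, not a registry.
 * Returns 0 for an unknown hash_id. */
static uint16_t expected_digest_len(uint32_t hash_id) {
    switch (hash_id) {
    case 1:  return 32;
    case 2:  return 64;
    default: return 0;
    }
}

/* Validates one IndexRecord's digest reference against the DigestBytes
 * region [digests_offset, digests_offset + digests_size) from the header.
 * digest_offset is a file offset, as in the IndexRecord comment. */
static bool record_digest_valid(uint32_t hash_id, uint16_t digest_len,
                                uint64_t digest_offset,
                                uint64_t digests_offset, uint64_t digests_size) {
    uint16_t want = expected_digest_len(hash_id);
    if (want == 0 || digest_len != want)
        return false;
    if (digest_offset < digests_offset)
        return false;
    uint64_t rel = digest_offset - digests_offset;
    return rel <= digests_size && digest_len <= digests_size - rel;
}
```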
```diff
@@ -156,7 +164,7 @@ typedef struct {
 ```c
 #pragma pack(push,1)
 typedef struct {
-uint64_t crc64; // CRC over header + records + bloom filter
+uint64_t crc64; // CRC over header + bloom filter + index records + digest bytes + extents
 uint64_t seal_snapshot; // Snapshot ID when segment was sealed
 uint64_t seal_time_ns; // High-resolution seal timestamp
 } SegmentFooter;
```
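The footer CRC now covers every byte before the footer. The excerpt does not name a CRC-64 variant, so this sketch assumes CRC-64/XZ: the reflected ECMA-182 polynomial, with initial value and final XOR of all ones. `crc64_xz`, `read_le64`, and `segment_crc_ok` are hypothetical helpers. The sketch relies on the footer being packed to 24 bytes and stored little-endian, as the packing rules require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Footer as shown in the spec; packed so it occupies exactly 24 bytes. */
#pragma pack(push, 1)
typedef struct {
    uint64_t crc64;
    uint64_t seal_snapshot;
    uint64_t seal_time_ns;
} SegmentFooter;
#pragma pack(pop)

_Static_assert(sizeof(SegmentFooter) == 24, "SegmentFooter must be 24 bytes");

/* Assumption: CRC-64/XZ (reflected poly 0xC96C5795D7870F42, init and
 * final XOR all ones). Bitwise for clarity, not speed. */
static uint64_t crc64_xz(const uint8_t *data, size_t len) {
    uint64_t crc = ~(uint64_t)0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xC96C5795D7870F42ULL : crc >> 1;
    }
    return ~crc;
}

/* Little-endian decode, independent of host byte order. */
static uint64_t read_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The stored crc64 is the first field of the trailing footer; the CRC
 * covers every byte before the footer. */
static bool segment_crc_ok(const uint8_t *seg, size_t len) {
    if (len < sizeof(SegmentFooter))
        return false;
    size_t body = len - sizeof(SegmentFooter);
    return crc64_xz(seg, body) == read_le64(seg + body);
}
```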
```diff
@@ -165,12 +173,20 @@ typedef struct {
 **Notes:**
-* CRC ensures corruption detection during reads.
+* CRC ensures corruption detection during reads, covering all segment contents except the footer.
 * Seal information allows deterministic reconstruction of CURRENT state.
 ---
-## 8. Bloom Filter
+## 8. DigestBytes
+* Digest bytes are concatenated in a single byte array.
+* Each IndexRecord references its digest via `digest_offset` and `digest_len`.
+* The digest bytes MUST be immutable once the segment is sealed.
+---
+## 9. Bloom Filter
 * The bloom filter is **optional** and opaque to semantics.
 * Its purpose is **lookup acceleration**.
@@ -179,7 +195,7 @@ typedef struct {
 ---
-## 9. Versioning and Compatibility
+## 10. Versioning and Compatibility
 * `version` field in header defines encoding.
 * Readers must **reject unsupported versions**.
@@ -187,10 +203,11 @@ typedef struct {
 * Existing fields must **never change meaning**.
 * Version `1` implies single-extent layout (legacy).
 * Version `2` introduces `ExtentRecord` lists and `extents_offset` / `extent_count`.
+* Version `3` introduces variable-length digest bytes with `hash_id` and `digest_offset`.
 ---
-## 10. Alignment and Packing
+## 11. Alignment and Packing
 * All structures are **packed** (no compiler padding)
 * Multi-byte integers are **little-endian**
```
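The versioning and packing rules imply two small reader obligations. A reader must reject versions outside the supported range, and it must decode integers as little-endian regardless of host byte order. In this sketch the version field is assumed to be a `uint32_t`. `SegmentFeatures`, `segment_features`, and `read_le32` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reader capabilities derived from the version list above. */
typedef struct {
    bool multi_extent;     /* version >= 2: ExtentRecord lists */
    bool variable_digests; /* version >= 3: DigestBytes + hash_id */
} SegmentFeatures;

/* Readers must reject unsupported versions; this sketch accepts 1..3 only. */
static bool segment_features(uint32_t version, SegmentFeatures *out) {
    if (version < 1 || version > 3)
        return false;
    out->multi_extent = version >= 2;
    out->variable_digests = version >= 3;
    return true;
}

/* Decodes a little-endian u32 byte by byte, independent of host byte order. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

Decoding byte by byte, instead of casting a buffer pointer to a packed struct, sidesteps both host endianness and unaligned-access concerns.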
```diff
@@ -199,7 +216,7 @@ typedef struct {
 ---
-## 11. Summary of Encoding Guarantees
+## 12. Summary of Encoding Guarantees
 The ENC-ASL-CORE-INDEX specification ensures:
@@ -211,7 +228,7 @@ The ENC-ASL-CORE-INDEX specification ensures:
 ---
-## 12. Relationship to Other Layers
+## 13. Relationship to Other Layers
 | Layer | Responsibility |
 | ------------------ | ---------------------------------------------------------- |
```