# ENC/ASL-LOG/1 — Encoding Specification for ASL Append-Only Log
Status: Draft
Owner: Niklas Rydberg
Version: 0.1.0
SoT: No
Last Updated: 2025-11-16
Linked Phase Pack: N/A
Tags: [encoding, log, deterministic]
<!-- Source: /amduat-api/tier1/enc-asl-log.md | Canonical: /amduat/tier1/enc-asl-log-1.md -->
**Document ID:** `ENC/ASL-LOG/1`
**Layer:** Log Encoding Profile (on top of ASL/LOG/1)
**Depends on (normative):**
* `ASL/LOG/1` — semantic log behavior and replay rules
**Informative references:**
* `ASL/STORE-INDEX/1` — store lifecycle and replay contracts
© 2025 Niklas Rydberg.
## License
Except where otherwise noted, this document (text and diagrams) is licensed under
the Creative Commons Attribution 4.0 International License (CC BY 4.0).
The identifier registries and mapping tables (e.g. TypeTag IDs, HashId
assignments, EdgeTypeId tables) are additionally made available under CC0 1.0
Universal (CC0) to enable unrestricted reuse in implementations and derivative
specifications.
Code examples in this document are provided under the Apache License 2.0 unless
explicitly stated otherwise. Test vectors, where present, are dedicated to the
public domain under CC0 1.0.
---
## 1. Purpose
This document defines the **exact encoding** of the ASL append-only log.
It translates **ASL/LOG/1** semantics into a deterministic **bytes-on-disk** format.
It does **not** define log semantics (see `ASL/LOG/1`).
---
## 2. Encoding Principles
1. **Little-endian** integers
2. **Packed structures** (no compiler padding)
3. **Forward-compatible** versioning via header fields
4. **Deterministic serialization**: identical log content -> identical bytes
5. **Hash-chained integrity** as defined by ASL/LOG/1
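
Principles 1 and 2 mean an implementation cannot rely on the host's native integer layout or on struct casts. A minimal sketch of byte-wise helpers follows; the names `put_le32`, `put_le64`, `get_le32`, and `get_le64` are illustrative and not part of this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value as 4 little-endian bytes, independent of host byte order. */
static void put_le32(uint8_t *out, uint32_t v) {
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Write a 64-bit value as 8 little-endian bytes. */
static void put_le64(uint8_t *out, uint64_t v) {
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Read 4 little-endian bytes into a 32-bit value. */
static uint32_t get_le32(const uint8_t *in) {
    uint32_t v = 0;
    assert(in != NULL);
    for (size_t i = 0; i < 4; i++) {
        v |= (uint32_t)in[i] << (8 * i);
    }
    return v;
}

/* Read 8 little-endian bytes into a 64-bit value. */
static uint64_t get_le64(const uint8_t *in) {
    uint64_t v = 0;
    assert(in != NULL);
    for (size_t i = 0; i < 8; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}
```

Serializing field by field through helpers like these also sidesteps compiler padding, so the same bytes are produced on every platform.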
---
## 3. Log File Layout
```
+----------------+
| LogHeader |
+----------------+
| LogRecord[] |
+----------------+
```
Boxed sketch:
```
┌───────────────────────┐
│ LogHeader │
├───────────────────────┤
│ LogRecord[] │
└───────────────────────┘
```
* **LogHeader**: fixed-size, mandatory, begins file
* **LogRecord[]**: append-only entries, variable number
---
## 4. LogHeader
```c
#pragma pack(push,1)
typedef struct {
uint64_t magic; // "ASLLOG01"
uint32_t version; // Encoding version (1)
uint32_t header_size; // Total header bytes including this struct
uint64_t flags; // Reserved, must be zero for v1
} LogHeader;
#pragma pack(pop)
```
Notes:
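* `magic` is the ASCII string `ASLLOG01`, stored in file order as bytes `0x41 0x53 0x4c 0x4c 0x4f 0x47 0x30 0x31`; read as a little-endian `uint64_t`, that is `0x3130474F4C4C5341`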
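* `version` identifies the encoding version (`1` for this document) and allows forward compatibility; readers reject logs with a bad version (Section 5)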
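
A header check might look like the sketch below. It assumes that a v1 reader accepts a `header_size` larger than the 24-byte struct and resumes reading at that offset; this document does not state that rule explicitly. The function name `asl_log_check_header` and the constants are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ASL_LOG_HEADER_MIN 24u /* 8 + 4 + 4 + 8 bytes, packed */

static const uint8_t ASL_LOG_MAGIC[8] = {0x41, 0x53, 0x4c, 0x4c,
                                         0x4f, 0x47, 0x30, 0x31};

static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const uint8_t *p) {
    return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}

/* Returns the offset of the first record (header_size) on success,
 * or 0 if the header is malformed and the log must be rejected. */
static size_t asl_log_check_header(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < ASL_LOG_HEADER_MIN) return 0;            /* truncated header */
    if (memcmp(buf, ASL_LOG_MAGIC, 8) != 0) return 0;  /* bad magic */
    if (get_le32(buf + 8) != 1u) return 0;             /* unsupported version */
    uint32_t header_size = get_le32(buf + 12);
    /* Assumption: larger headers are tolerated, smaller ones are not. */
    if (header_size < ASL_LOG_HEADER_MIN || header_size > len) return 0;
    if (get_le64(buf + 16) != 0) return 0;             /* flags must be zero in v1 */
    return header_size;
}
```

Returning the record offset rather than a boolean lets the caller start replay at `header_size` without hard-coding the struct size.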
---
## 5. LogRecord Envelope
Each record is encoded as:
```c
#pragma pack(push,1)
typedef struct {
uint64_t logseq; // Monotonic sequence number
uint32_t record_type; // Record type tag
uint32_t payload_len; // Payload byte length
uint8_t payload[payload_len]; // Variable length; layout notation, not compilable C
uint8_t record_hash[32]; // Hash-chained integrity (SHA-256)
} LogRecord;
#pragma pack(pop)
```
Hash chain rule (normative):
```
record_hash = H(prev_record_hash || logseq || record_type || payload_len || payload)
```
* `prev_record_hash` is the previous record's `record_hash`
* For the first record, `prev_record_hash` is 32 bytes of zero
* `H` is SHA-256 for v1
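
The input to `H` can be assembled as in the sketch below. It assumes the integer fields are hashed in their on-disk little-endian encoding, which this document implies through Section 2 but does not state for the hash input. The function name `asl_log_hash_preimage` is illustrative, and the SHA-256 computation itself is left to whatever implementation the reader already uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ASL_HASH_LEN 32u
#define ASL_PREIMAGE_FIXED (ASL_HASH_LEN + 8u + 4u + 4u) /* 48 bytes before payload */

/* Fills `out` with
 *   prev_record_hash || logseq || record_type || payload_len || payload
 * and returns the number of bytes written, or 0 if `out_cap` is too small.
 * Assumption: integers are hashed as little-endian bytes, as on disk. */
static size_t asl_log_hash_preimage(const uint8_t prev_hash[32],
                                    uint64_t logseq, uint32_t record_type,
                                    const uint8_t *payload, uint32_t payload_len,
                                    uint8_t *out, size_t out_cap) {
    assert(prev_hash != NULL && out != NULL);
    /* Overflow-safe capacity check. */
    if (out_cap < ASL_PREIMAGE_FIXED || out_cap - ASL_PREIMAGE_FIXED < payload_len) {
        return 0;
    }
    memcpy(out, prev_hash, ASL_HASH_LEN);
    for (size_t i = 0; i < 8; i++) out[32 + i] = (uint8_t)(logseq >> (8 * i));
    for (size_t i = 0; i < 4; i++) out[40 + i] = (uint8_t)(record_type >> (8 * i));
    for (size_t i = 0; i < 4; i++) out[44 + i] = (uint8_t)(payload_len >> (8 * i));
    if (payload_len > 0) {
        memcpy(out + ASL_PREIMAGE_FIXED, payload, payload_len);
    }
    return (size_t)ASL_PREIMAGE_FIXED + payload_len;
}
```

For the first record, the caller passes 32 zero bytes as `prev_hash`; for every later record it passes the previous record's `record_hash`.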
Readers MUST skip unknown `record_type` values using `payload_len` and MUST
continue replay without failure.
**Error handling (normative):**
* Malformed log headers or records (bad magic/version, truncated payload,
invalid `payload_len`, hash-chain mismatch) MUST cause the log to be rejected
for replay.
* Unknown `record_type` values are the only exception: they MUST be skipped
using `payload_len` and MUST NOT break replay determinism.
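
The two rules above can be sketched as a record walker over the bytes that follow the header. Truncation is fatal, while an unknown `record_type` is stepped over using `payload_len`. Hash-chain verification is deliberately omitted here, and the names `RecordView`, `asl_log_read_record`, and `asl_log_walk` are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASL_REC_PREFIX 16u /* logseq (8) + record_type (4) + payload_len (4) */
#define ASL_REC_HASH 32u

typedef struct {
    uint64_t logseq;
    uint32_t record_type;
    uint32_t payload_len;
    const uint8_t *payload;     /* points into the caller's buffer */
    const uint8_t *record_hash; /* 32 bytes, points into the caller's buffer */
} RecordView;

/* Read an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t rd_le(const uint8_t *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns the number of bytes consumed, or 0 if the record is truncated. */
static size_t asl_log_read_record(const uint8_t *buf, size_t len, RecordView *rv) {
    assert(buf != NULL && rv != NULL);
    if (len < ASL_REC_PREFIX + ASL_REC_HASH) return 0;
    rv->logseq = rd_le(buf, 8);
    rv->record_type = (uint32_t)rd_le(buf + 8, 4);
    rv->payload_len = (uint32_t)rd_le(buf + 12, 4);
    if (len - (ASL_REC_PREFIX + ASL_REC_HASH) < rv->payload_len) return 0;
    rv->payload = buf + ASL_REC_PREFIX;
    rv->record_hash = rv->payload + rv->payload_len;
    return (size_t)ASL_REC_PREFIX + rv->payload_len + ASL_REC_HASH;
}

static int is_known_type(uint32_t t) {
    return t == 0x01 || t == 0x10 || t == 0x11 || t == 0x20 || t == 0x30 || t == 0x31;
}

/* Walks every record in `buf` (the bytes after the header).
 * Returns 0 on success, -1 if the log is malformed and must be rejected. */
static int asl_log_walk(const uint8_t *buf, size_t len, size_t *known, size_t *skipped) {
    size_t off = 0;
    assert(known != NULL && skipped != NULL);
    *known = 0;
    *skipped = 0;
    while (off < len) {
        RecordView rv;
        size_t n = asl_log_read_record(buf + off, len - off, &rv);
        if (n == 0) return -1; /* truncated record: fatal */
        if (is_known_type(rv.record_type)) (*known)++;
        else (*skipped)++;     /* unknown type: skip, keep replaying */
        off += n;
    }
    return 0;
}
```

A full reader would also recompute each `record_hash` inside the loop and fail on the first mismatch.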
---
## 6. Record Type IDs (v1)
These type IDs bind the ASL/LOG/1 semantics to bytes-on-disk:
| Type ID | Record Type |
| ------- | ------------------ |
| 0x01 | SEGMENT_SEAL |
| 0x10 | TOMBSTONE |
| 0x11 | TOMBSTONE_LIFT |
| 0x20 | SNAPSHOT_ANCHOR |
| 0x30 | ARTIFACT_PUBLISH |
| 0x31 | ARTIFACT_UNPUBLISH |
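
One way to carry this table into code is an enum plus a classifier, sketched below. The identifiers `AslRecordType`, the `ASL_REC_*` constants, and `asl_record_type_known` are illustrative names, not defined by this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* v1 record type IDs, mirroring the table in this section. */
typedef enum {
    ASL_REC_SEGMENT_SEAL = 0x01,
    ASL_REC_TOMBSTONE = 0x10,
    ASL_REC_TOMBSTONE_LIFT = 0x11,
    ASL_REC_SNAPSHOT_ANCHOR = 0x20,
    ASL_REC_ARTIFACT_PUBLISH = 0x30,
    ASL_REC_ARTIFACT_UNPUBLISH = 0x31
} AslRecordType;

/* Returns 1 and sets *out if `raw` is a v1 record type.
 * Returns 0 for unknown types, which a reader skips rather than rejects. */
static int asl_record_type_known(uint32_t raw, AslRecordType *out) {
    assert(out != NULL);
    switch (raw) {
    case ASL_REC_SEGMENT_SEAL:
    case ASL_REC_TOMBSTONE:
    case ASL_REC_TOMBSTONE_LIFT:
    case ASL_REC_SNAPSHOT_ANCHOR:
    case ASL_REC_ARTIFACT_PUBLISH:
    case ASL_REC_ARTIFACT_UNPUBLISH:
        *out = (AslRecordType)raw;
        return 1;
    default:
        return 0;
    }
}
```

Keeping the raw `uint32_t` until it has been classified avoids storing an out-of-range value in the enum.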
---
## 6.1 Payload Schemas (v1)
All payloads are little-endian and packed. Variable-length fields are encoded
inline and accounted for by `payload_len`.
### 6.1.1 ArtifactRef
```c
#pragma pack(push,1)
typedef struct {
uint32_t hash_id; // Hash algorithm identifier
uint16_t digest_len; // Digest length in bytes
uint16_t reserved0; // Must be 0
uint8_t digest[digest_len]; // Variable length; layout notation, not compilable C
} ArtifactRef;
#pragma pack(pop)
```
Notes:
* `digest_len` MUST be > 0.
* If StoreConfig fixes the hash, `digest_len` MUST match that hash's length.
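
A decoder for this structure could be sketched as follows. It enforces `digest_len > 0` and `reserved0 == 0`, and leaves the StoreConfig length check to the caller because that configuration is outside this document. `ArtifactRefView` and `asl_artifact_ref_decode` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASL_ARTIFACT_REF_FIXED 8u /* hash_id (4) + digest_len (2) + reserved0 (2) */

typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    const uint8_t *digest; /* points into the caller's buffer */
} ArtifactRefView;

/* Returns the number of bytes consumed (8 + digest_len),
 * or 0 if the reference is malformed or truncated. */
static size_t asl_artifact_ref_decode(const uint8_t *buf, size_t len,
                                      ArtifactRefView *out) {
    assert(buf != NULL && out != NULL);
    if (len < ASL_ARTIFACT_REF_FIXED) return 0;
    uint32_t hash_id = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                       ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    uint16_t digest_len = (uint16_t)(buf[4] | (buf[5] << 8));
    uint16_t reserved0 = (uint16_t)(buf[6] | (buf[7] << 8));
    if (digest_len == 0) return 0; /* digest_len MUST be > 0 */
    if (reserved0 != 0) return 0;  /* reserved0 must be 0 */
    if (len - ASL_ARTIFACT_REF_FIXED < digest_len) return 0; /* truncated digest */
    out->hash_id = hash_id;
    out->digest_len = digest_len;
    out->digest = buf + ASL_ARTIFACT_REF_FIXED;
    return (size_t)ASL_ARTIFACT_REF_FIXED + digest_len;
}
```

Because the function reports how many bytes it consumed, payloads that embed an `ArtifactRef` can locate their trailing fields at that offset.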
### 6.1.2 SEGMENT_SEAL (Type 0x01)
```c
#pragma pack(push,1)
typedef struct {
uint64_t segment_id; // Store-local segment identifier
uint8_t segment_hash[32]; // SHA-256 over the segment file bytes
} SegmentSealPayload;
#pragma pack(pop)
```
**Implementation note (segment identity):** In this repository, `segment_id` is
allocated when a segment is created (before writing records) and persisted via
store metadata (e.g., filename or catalog). The `segment_hash` is computed over
the exact on-disk segment bytes including header, records, digest bytes,
extents, and footer; the hash is taken after the footer is written so the seal
commits to the footer metadata.
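
Since `SegmentSealPayload` has no variable-length fields, its encoded size is 40 bytes. The sketch below decodes it and treats any other `payload_len` as malformed; that strictness is an assumption, as is the name `asl_segment_seal_decode`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ASL_SEGMENT_SEAL_LEN 40u /* segment_id (8) + segment_hash (32) */

typedef struct {
    uint64_t segment_id;
    uint8_t segment_hash[32];
} SegmentSeal;

/* Decodes a SEGMENT_SEAL payload. Returns 0 on success, -1 if malformed.
 * Assumption: a v1 payload is exactly 40 bytes, with no trailing data. */
static int asl_segment_seal_decode(const uint8_t *payload, uint32_t payload_len,
                                   SegmentSeal *out) {
    assert(payload != NULL && out != NULL);
    if (payload_len != ASL_SEGMENT_SEAL_LEN) return -1;
    out->segment_id = 0;
    for (size_t i = 0; i < 8; i++) {
        out->segment_id |= (uint64_t)payload[i] << (8 * i); /* little-endian */
    }
    memcpy(out->segment_hash, payload + 8, sizeof out->segment_hash);
    return 0;
}
```

Comparing the decoded `segment_hash` against a fresh SHA-256 of the segment file is a separate step and is not shown.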
### 6.1.3 TOMBSTONE (Type 0x10)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
uint32_t scope; // Opaque to ASL/LOG/1
uint32_t reason_code; // Opaque to ASL/LOG/1
} TombstonePayload;
#pragma pack(pop)
```
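
Because the embedded `ArtifactRef` is variable-length, `scope` and `reason_code` sit at offset `8 + digest_len` rather than at a fixed position. A minimal encoder sketch follows; the function name `asl_tombstone_encode` and its parameter list are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the low n bytes of v in little-endian order. */
static void put_le(uint8_t *out, uint64_t v, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = (uint8_t)(v >> (8 * i));
}

/* Encodes a TOMBSTONE payload into `out`.
 * Returns the payload length (8 + digest_len + 8), or 0 on invalid input
 * or insufficient capacity. */
static size_t asl_tombstone_encode(uint32_t hash_id,
                                   const uint8_t *digest, uint16_t digest_len,
                                   uint32_t scope, uint32_t reason_code,
                                   uint8_t *out, size_t cap) {
    size_t need = 8u + (size_t)digest_len + 8u;
    assert(digest != NULL && out != NULL);
    if (digest_len == 0 || cap < need) return 0;
    put_le(out, hash_id, 4);                     /* ArtifactRef.hash_id */
    put_le(out + 4, digest_len, 2);              /* ArtifactRef.digest_len */
    put_le(out + 6, 0, 2);                       /* ArtifactRef.reserved0 = 0 */
    memcpy(out + 8, digest, digest_len);         /* ArtifactRef.digest */
    put_le(out + 8 + digest_len, scope, 4);      /* scope follows the digest */
    put_le(out + 12 + digest_len, reason_code, 4);
    return need;
}
```

The returned length is the value to store in the record's `payload_len` field.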
### 6.1.4 TOMBSTONE_LIFT (Type 0x11)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
uint64_t tombstone_logseq; // logseq of the tombstone being lifted
} TombstoneLiftPayload;
#pragma pack(pop)
```
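
Decoding runs in the opposite direction: read the `ArtifactRef` prefix, then find `tombstone_logseq` immediately after the digest. The sketch below assumes the payload must be exactly `8 + digest_len + 8` bytes, which is stricter than this document requires. `TombstoneLiftView` and `asl_tombstone_lift_decode` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t hash_id;
    uint16_t digest_len;
    const uint8_t *digest;     /* points into the caller's buffer */
    uint64_t tombstone_logseq; /* logseq of the tombstone being lifted */
} TombstoneLiftView;

/* Decodes a TOMBSTONE_LIFT payload. Returns 0 on success, -1 if malformed.
 * Assumption: no trailing bytes are allowed after tombstone_logseq. */
static int asl_tombstone_lift_decode(const uint8_t *p, uint32_t payload_len,
                                     TombstoneLiftView *out) {
    assert(p != NULL && out != NULL);
    if (payload_len < 8u) return -1; /* too short for the ArtifactRef prefix */
    uint16_t digest_len = (uint16_t)(p[4] | (p[5] << 8));
    uint16_t reserved0 = (uint16_t)(p[6] | (p[7] << 8));
    if (digest_len == 0 || reserved0 != 0) return -1;
    if (payload_len != 8u + digest_len + 8u) return -1;
    out->hash_id = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    out->digest_len = digest_len;
    out->digest = p + 8;
    const uint8_t *q = p + 8 + digest_len; /* first byte after the digest */
    out->tombstone_logseq = 0;
    for (size_t i = 0; i < 8; i++) {
        out->tombstone_logseq |= (uint64_t)q[i] << (8 * i);
    }
    return 0;
}
```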
### 6.1.5 SNAPSHOT_ANCHOR (Type 0x20)
```c
#pragma pack(push,1)
typedef struct {
uint64_t snapshot_id;
uint8_t root_hash[32]; // Hash of snapshot-visible state
} SnapshotAnchorPayload;
#pragma pack(pop)
```
### 6.1.6 ARTIFACT_PUBLISH (Type 0x30)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
} ArtifactPublishPayload;
#pragma pack(pop)
```
### 6.1.7 ARTIFACT_UNPUBLISH (Type 0x31)
```c
#pragma pack(push,1)
typedef struct {
ArtifactRef artifact;
} ArtifactUnpublishPayload;
#pragma pack(pop)
```
---
## 7. Versioning Rules
* `version = 1` for this specification.
* New record types MAY be added without bumping the version, since readers skip unknown `record_type` values (Section 5).
* Layout changes to `LogHeader` or `LogRecord` require a new version.
---
## 7.1 Error Mapping (Informative)
Decoding failures (invalid magic/version, truncated records, invalid payload
lengths, hash-chain mismatches) are surfaced to callers as decode errors; the
normative rejection rule is in Section 5. The exact error codes are
implementation-specific; examples include `ERR_ASL_LOG_ENC_INVALID`,
`ERR_ASL_LOG_HASH_MISMATCH`, or a generic `ERR_INTEGRITY`. Unknown record types
are not errors and are skipped.
## 7.2 Conformance Checklist (Informative)
* Reject logs with invalid magic/version or truncated records.
* Enforce hash-chain validation across all records.
* Skip unknown record types using `payload_len` without breaking replay.
* Treat malformed payload lengths as fatal decode errors.
## 8. Relationship to Other Layers
| Layer              | Responsibility                                |
| ------------------ | --------------------------------------------- |
| ASL/LOG/1          | Semantic log behavior and replay rules        |
| ASL-STORE-INDEX    | Store lifecycle and snapshot/log contracts    |
| ENC-ASL-LOG        | Exact byte layout for log encoding (this doc) |
| ENC-ASL-CORE-INDEX | Exact byte layout for index segments          |