# Release Projector

`release_projector.py` rebuilds serving projections (JanusGraph + Elasticsearch) from a lakehouse release manifest.

## What it does

1. Loads a release manifest JSON (or a `releases_v2` row containing `manifest_json`).
2. Resolves Nessie tag/ref from the manifest (or `--nessie-ref`).
3. Reads the concept Iceberg table from that ref through Spark + Iceberg + Nessie.
4. Upserts each concept into JanusGraph and Elasticsearch.

`release_projector.py` now accepts both concept-shaped rows and document-shaped rows.
For docs tables, it auto-detects typical columns:

- name: `canonical_name|title|name|subject`
- id: `concept_id|doc_id|document_id|id|uuid`
- summary text: `summary|description|abstract|content|text|body`

## Prerequisites

- Python deps: `python-dotenv`, `httpx`, `gremlinpython`, `pyspark`
- Spark/Iceberg/Nessie jars (default package coordinates are baked into script)
- Network access to:
  - Nessie API (example: `http://lakehouse-core:19120/api/v2`)
  - MinIO S3 endpoint (example: `http://lakehouse-core:9000`)
  - JanusGraph Gremlin endpoint
  - Elasticsearch endpoint

## Recommended isolated env

Do not install projector dependencies into system Python.

## Preferred: existing spark container on lakehouse-core

This reuses your existing `spark` container and Spark properties file.

Standard command (frozen):

```bash
./run-projector-standard.sh
```

Run by release name (no manifest path):

```bash
./run-projector-standard.sh --release-name rel_2026-02-14_docs-v1
```

Standard dry-run:

```bash
./run-projector-standard.sh --dry-run
```

Copy files to host:

```bash
rsync -av --delete /home/niklas/projects/jecio/ lakehouse-core.rakeroots.lan:/tmp/jecio/
```

Run dry-run projection inside `spark` container:

```bash
ssh lakehouse-core.rakeroots.lan 'cd /tmp/jecio && ./run-projector-via-spark-container.sh ./manifests/rel_2026-02-14_docs-v1.json lake.db1.docs --dry-run es'
```

Run publish projection (writes Janus/ES):

```bash
ssh lakehouse-core.rakeroots.lan 'cd /tmp/jecio && ./run-projector-standard.sh'
```

`run-projector-via-spark-container.sh` uses:

- container: `spark` (override with `SPARK_CONTAINER_NAME`)
- properties file: `/opt/lakehouse/spark-conf/lakehouse-spark-defaults.conf` (override with `SPARK_PROPS`)
- Spark packages: Iceberg + Nessie extensions (override with `SPARK_PACKAGES`)
- arg4 `targets`: `es|gremlin|both` (default `both`)
- arg5 `release_name`: optional; if set, loads manifest from `releases_v2`

Direct projector usage:

```bash
python3 release_projector.py --release-name rel_2026-02-14_docs-v1 --concept-table lake.db1.docs --targets es --dry-run
python3 release_projector.py --release-name rel_2026-02-14_docs-v1 --concept-table lake.db1.docs --targets both
python3 release_projector.py --manifest-file manifests/rel_2026-02-14_docs-v1.json --concept-table lake.db1.docs --targets es --dry-run
python3 release_projector.py --manifest-file manifests/rel_2026-02-14_docs-v1.json --concept-table lake.db1.docs --targets both
```

Local setup (fallback):

```bash
./setup_local_env.sh .venv-projector
source .venv-projector/bin/activate
```

Remote setup (fallback, venv on `lakehouse-core`):

```bash
scp release_projector.py requirements-projector.txt manifests/rel_2026-02-14_docs-v1.json lakehouse-core.rakeroots.lan:/tmp/
ssh lakehouse-core.rakeroots.lan 'python3 -m venv /tmp/jecio-projector-venv && /tmp/jecio-projector-venv/bin/pip install --upgrade pip && /tmp/jecio-projector-venv/bin/pip install -r /tmp/requirements-projector.txt'
```

## Required env vars (example)

```bash
export NESSIE_URI=http://lakehouse-core:19120/api/v2
export NESSIE_WAREHOUSE=s3a://lakehouse/warehouse
export S3_ENDPOINT=http://lakehouse-core:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export GREMLIN_URL=ws://janus.rakeroots.lan:8182/gremlin
export ES_URL=http://janus.rakeroots.lan:9200
export ES_INDEX=concepts
```

## Run

```bash
/tmp/jecio-projector-venv/bin/python /tmp/release_projector.py \
  --manifest-file /tmp/rel_2026-02-14_docs-v1.json \
  --concept-table lake.db1.docs \
  --dry-run
```

Or local:

```bash
python3 release_projector.py \
  --manifest-file /path/to/release.json \
  --concept-table lake.db1.concepts
```

If the manifest has a Nessie tag in fields like `nessie.tag`, you can omit `--nessie-ref`.

Dry run:

```bash
python3 release_projector.py \
  --manifest-file /path/to/release.json \
  --concept-table lake.db1.concepts \
  --dry-run
```
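
For reference, the column auto-detection for docs tables listed under "What it does" can be pictured roughly as below. This is a minimal sketch, not the script's actual code: the helper names `pick_column` and `detect_columns` are hypothetical, and both the "first match in listed order wins" rule and the case-insensitive comparison are assumptions. Only the candidate column names come from this README.

```python
from typing import Dict, Iterable, Optional

# Candidate column names, copied from the lists under "What it does".
# Trying them in listed order is an assumption of this sketch.
NAME_COLUMNS = ("canonical_name", "title", "name", "subject")
ID_COLUMNS = ("concept_id", "doc_id", "document_id", "id", "uuid")
SUMMARY_COLUMNS = ("summary", "description", "abstract", "content", "text", "body")


def pick_column(available: Iterable[str], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate found in `available`, or None (hypothetical helper)."""
    # Case-insensitive matching is an assumption, not documented behavior.
    by_lower = {col.lower(): col for col in available}
    for candidate in candidates:
        if candidate in by_lower:
            # Return the column name as it is actually spelled in the table.
            return by_lower[candidate]
    return None


def detect_columns(available: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each logical field (name, id, summary) to a detected column name."""
    cols = list(available)
    return {
        "name": pick_column(cols, NAME_COLUMNS),
        "id": pick_column(cols, ID_COLUMNS),
        "summary": pick_column(cols, SUMMARY_COLUMNS),
    }


if __name__ == "__main__":
    # Example with a document-shaped table.
    print(detect_columns(["doc_id", "Title", "body", "created_at"]))
```

With a Spark DataFrame `df` read from the table, the input would be `df.columns`. A `None` value in the result means no candidate matched for that field; how the real script handles that case is not described here.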