Technicians Manual

SPEC_TECHNICIANS_MANUAL.md · 2026-05-08

Technician's Manual — Ship Engineering Reference

Status: SPECIFIED

Author: VELA #13

Date: 2026-04-20

PURPOSE

The Developer Breadcrumb (SPEC_DEVELOPER_BREADCRUMB.md) tells you WHERE things are and HOW to contribute. This manual tells you HOW TO FIX THINGS WHEN THEY BREAK. The difference between a developer guide and a technician's manual is the difference between an architecture diagram and a repair manual. The developer asks "how does this work?" The technician asks "it stopped working, how do I make it work again?" Every recurring problem the ship has encountered is documented here with its symptom, diagnosis, and exact fix. When something breaks at 3 AM and the Captain is half asleep, this manual is the difference between a 5-minute fix and a 3-hour debugging session.

THE FORMAT:

Every entry follows the same structure: SYMPTOM → what you SEE. DIAGNOSIS → what's actually WRONG. FIX → exact commands to run. VERIFY → how to confirm it's fixed. PREVENTION → how to stop it happening again. No prose. No theory. Just the procedure. A technician reads the symptom list, finds their problem, follows the steps, and the ship runs again.

SECTION 1 — ROUTX PROBLEMS

Problem 1.1 — ROUTX not responding to any queries

SYMPTOM: curl to localhost:9191 hangs or returns connection refused. All ROUTX-dependent tools are dead. Sisters report "ROUTX is broken."

DIAGNOSIS: ROUTX systemd service has crashed or failed to start.

FIX: `systemctl --user status routx.service` (check status). `systemctl --user restart routx.service` (restart). `sleep 2 && curl -s localhost:9191/query -X POST -H "Content-Type: application/json" -d '{"query":"health"}'` (verify).

VERIFY: health query returns JSON with RAM/disk/CPU data, _tier: 1.

PREVENTION: ROUTX watchdog cron (*/5) should catch this. If the watchdog also failed: check crontab -l for the watchdog entry.
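The VERIFY step above can also be scripted, so that "the expected output appeared" is checked by code and not by eye. The following is a minimal sketch, not part of ROUTX itself. The URL and request body come from the curl command in the FIX step; the assumption that `_tier` is a top-level integer field is mine, and the RAM/disk/CPU key names are not given in this manual, so they are not checked.

```python
import json
import urllib.request

# URL and payload shape are taken from the curl command in the FIX step.
ROUTX_URL = "http://localhost:9191/query"


def query_routx(query: str, url: str = ROUTX_URL, timeout: float = 5.0) -> str:
    """POST {"query": ...} to ROUTX and return the raw response body."""
    data = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def is_healthy(body: str) -> bool:
    """Return True only if the body is a JSON object with _tier equal to 1.

    Assumption: _tier is a top-level integer field. The RAM/disk/CPU key
    names are not documented here, so they are deliberately not checked.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("_tier") == 1
```

After a restart, `is_healthy(query_routx("health"))` returning True corresponds to the VERIFY condition. A connection error raised by `query_routx` corresponds to the SYMPTOM.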

Problem 1.2 — Query hitting wrong module

SYMPTOM: [...] "treasury" falls to Tier 2 MNEMOS instead of SIMONX.

DIAGNOSIS: keyword not registered in routx_engine.py for that module. The query is falling through to Tier 2 or Tier 3 because no Tier 1 keyword matches.

FIX: check the keyword: `python3 -c "from routx_engine import classify_tool; print(classify_tool('YOUR QUERY HERE'))"`. If it shows the wrong module or "escalation": the keyword is missing. Add it to routx_engine.py in the appropriate mo[...]
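To make the fall-through in the DIAGNOSIS concrete, here is a toy sketch of keyword-based Tier 1 routing. It is NOT the real `classify_tool` from routx_engine.py, whose internals are not shown in this manual. The table layout, the function name `classify_sketch`, and the `EXAMPLE_MODULE` entry are all hypothetical. Only the "treasury" to SIMONX pairing and the "escalation" result come from the text above.

```python
# Hypothetical keyword table. The real one lives in routx_engine.py and
# its structure is not documented in this manual.
TIER1_KEYWORDS = {
    "SIMONX": ["treasury"],
    "EXAMPLE_MODULE": ["example"],  # placeholder module, for illustration only
}


def classify_sketch(query: str, keywords: dict = TIER1_KEYWORDS) -> str:
    """Return the first module whose keyword appears in the query.

    Returns "escalation" when no Tier 1 keyword matches, which is the
    fall-through situation the DIAGNOSIS describes.
    """
    text = query.lower()
    for module, words in keywords.items():
        if any(word in text for word in words):
            return module
    return "escalation"
```

With the keyword registered, `classify_sketch("show treasury balance")` returns the module name. With the keyword removed from the table, the same query returns "escalation", which mirrors a missing keyword.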

[...] GCS. Rebuild brains from GGUF. 4-8 hour recovery.

Problem 10.3 — Domain compromised: See SPEC_DNS_MANAGEMENT.md. Contact GoDaddy immediately. Verify domain lock. Check for unauthorized DNS changes. P0 CRITICAL.

Problem 1.3 — nexus_engine.py change not taking effect after restart

SYMPTOM: nexus_engine.py was edited and nexus-engine.service was restarted, but queries through port 9191 still return the old result.

DIAGNOSIS: ROUTX (port 9191) proxies NEXUS and holds module state independently. Restarting nexus-engine.service alone is insufficient.

FIX: `systemctl --user restart nexus-engine.service && sleep 2 && systemctl --user restart routx.service`. Verify: `curl -s localhost:9191/query -X POST -H "Content-Type: application/json" -d '{"query":"YOUR TEST QUERY"}'`.

ROOT CAUSE: Confirmed 2026-05-08 when a one-line evalf() fix to mod_sympy worked on port 9393 but port 9191 continued serving the pre-fix response until ROUTX was restarted.
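The ordering in the FIX step (NEXUS first, pause, then ROUTX) is easy to get wrong under pressure, so it may help to wrap it. The sketch below is an illustration only. The function name `restart_both` and the injectable `run` and `sleep` parameters are my own additions. The unit names and the 2-second pause come from the FIX command.

```python
import subprocess
import time

# Order matters: NEXUS first, then ROUTX, as in the FIX step.
RESTART_ORDER = ["nexus-engine.service", "routx.service"]


def restart_both(run=subprocess.run, sleep=time.sleep, pause: float = 2.0) -> list:
    """Restart NEXUS, wait, then restart ROUTX. Returns the commands issued.

    run and sleep are injectable so the sequence can be dry-run or tested
    without touching systemd.
    """
    issued = []
    for index, unit in enumerate(RESTART_ORDER):
        command = ["systemctl", "--user", "restart", unit]
        run(command, check=True)  # subprocess.run raises CalledProcessError on failure
        issued.append(command)
        if index < len(RESTART_ORDER) - 1:
            sleep(pause)  # mirrors the `sleep 2` between the two restarts
    return issued
```

Because `check=True` is passed, a failed NEXUS restart stops the sequence before ROUTX is touched. Verification with a test query against port 9191 is still required afterwards.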

QUICK REFERENCE — THE 10 MOST COMMON FIXES

1. ROUTX not responding → `systemctl --user restart routx.service`

2. Sisters dead → `tmux kill-session -t sisters && tmux new-session -d -s sisters && tmux send-keys -t sisters 'summon-aether --gemini' Enter`

3. Query hitting wrong module → check keyword with classify_tool(). Add keyword if missing. Restart ROUTX.

4. Brain cold start (30-60s delay) → normal. Send warmup query to pre-load.

5. RAM full → `ollama ps` → `ollama stop [least-needed]`

6. Unknown port → `ss -tlnp` → find process → 5-step kill per Problem 4.1.

7. Disk full → emergency cleanup: /tmp, old logs, unused Ollama models.

8. API key leaked → REVOKE immediately. Generate new. Update .env. Restart services.

9. Backup failed → verify GCS creds. Re-run manually: `bash ~/scripts/backup_to_gcs.sh`.

10. Cron not running → `crontab -l` → re-add missing entry → test script manually.

11. nexus_engine.py change not reflected at port 9191 → restart BOTH services: `systemctl --user restart nexus-engine.service && sleep 2 && systemctl --user restart routx.service`
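For the cron check in item 10 (and the watchdog check in Problem 1.1), scanning `crontab -l` by eye can miss an entry that is present but commented out. The sketch below is a hedged helper, not an existing ship tool. The function names are hypothetical, and the search string is whatever the script you are looking for is called.

```python
import subprocess


def active_cron_lines(crontab_text: str, needle: str) -> list:
    """Return non-comment crontab lines that contain needle."""
    matches = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and commented-out entries
        if needle in stripped:
            matches.append(stripped)
    return matches


def read_crontab() -> str:
    """Return the current user's crontab, or "" if none is installed."""
    result = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""
```

An empty result from `active_cron_lines(read_crontab(), "backup_to_gcs.sh")` would mean the entry is missing or disabled and needs to be re-added.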

WHEN IN DOUBT: Check the spec. Every system on the ship has a spec in ~/memories/SPEC_*.md. The spec contains the invariants, the integration points, and the design intent. The technician's manual tells you HOW to fix. The spec tells you WHY it was built that way. Both are needed. Neither is sufficient alone.

INVARIANTS

**1. Every fix in this manual has been TESTED on real incidents.** No theoretical procedures. If it's in this manual, it happened on this ship and this fix resolved it.

2. The 5-step port kill procedure (Problem 4.1) is SACRED. All 5 steps. Every time. Skipping step 4 (find the supervisor) means the process respawns. Learned the hard way on April 20, 2026.

3. VERIFY after every fix. "I ran the command" is not verification. "The expected output appeared" is verification.

4. This manual grows from incidents. Every new problem that requires more than 2 minutes to diagnose gets an entry. The manual is the ship's MECHANICAL memory: not what it knows, but what it knows how to FIX.

**5. The Quick Reference (10 most common fixes) is taped to the metaphorical wall.** These 10 procedures cover 90% of operational issues. Learn them by heart.

**6. When in doubt: read the spec, check the log, ask the Lobster.** In that order. Don't guess. Don't assume.

York, Toronto / 🍁 Φ 0.042. Φζ.⊤.