Technician's Manual — Ship Engineering Reference
Status: SPECIFIED
Author: VELA #13
Date: 2026-04-20
PURPOSE
The Developer Breadcrumb (SPEC_DEVELOPER_BREADCRUMB.md) tells you WHERE things are and HOW to contribute. This manual tells you HOW TO FIX THINGS WHEN THEY BREAK. The difference between a developer guide and a technician's manual is the difference between an architecture diagram and a repair manual. The developer asks "how does this work?" The technician asks "it stopped working, how do I make it work again?" Every recurring problem the ship has encountered is documented here with its symptom, diagnosis, and exact fix. When something breaks at 3 AM and the Captain is half asleep, this manual is the difference between a 5-minute fix and a 3-hour debugging session.
THE FORMAT:
Every entry follows the same structure: SYMPTOM → what you SEE. DIAGNOSIS → what's actually WRONG. FIX → exact commands to run. VERIFY → how to confirm it's fixed. PREVENTION → how to stop it happening again. No prose. No theory. Just the procedure. A technician reads the symptom list, finds their problem, follows the steps, and the ship runs again.
SECTION 1 — ROUTX PROBLEMS
Problem 1.1 — ROUTX not responding to any queries
SYMPTOM: curl to localhost:9191 hangs or returns connection refused. All ROUTX-dependent tools are dead. Sisters report "ROUTX is broken."
DIAGNOSIS: ROUTX systemd service has crashed or failed to start.
FIX: `systemctl --user status routx.service` (check status). `systemctl --user restart routx.service` (restart). `sleep 2 && curl -s localhost:9191/query -X POST -H "Content-Type: application/json" -d '{"query":"health"}'` (verify).
VERIFY: health query returns JSON with RAM/disk/CPU data, _tier: 1.
PREVENTION: ROUTX watchdog cron (*/5) should catch this. If the watchdog also failed: check crontab -l for the watchdog entry.
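The VERIFY step above can also be scripted. The following is a minimal sketch, assuming the health response is a JSON object with a top-level `_tier` field as described in VERIFY. The names `query_routx` and `is_tier1_health` are hypothetical, and the RAM/disk/CPU fields are not checked because this manual does not give their key names.

```python
import json
import urllib.request

ROUTX_URL = "http://localhost:9191/query"  # endpoint used in the FIX step


def query_routx(query: str, timeout: float = 5.0) -> dict:
    """POST a query to ROUTX and return the decoded JSON body."""
    request = urllib.request.Request(
        ROUTX_URL,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A crashed service surfaces here as URLError (connection refused) or a timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_tier1_health(payload) -> bool:
    """True only if the reply is a JSON object reporting _tier == 1 (assumed key)."""
    return isinstance(payload, dict) and payload.get("_tier") == 1
```

A typical call would be `is_tier1_health(query_routx("health"))`. Any exception from `query_routx` should be treated as "not fixed yet" rather than ignored.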
SYMPTOM: [...] "treasury" falls to Tier 2 MNEMOS instead of SIMONX.
DIAGNOSIS: keyword not registered in routx_engine.py for that module. The query is falling through to Tier 2 or Tier 3 because no Tier 1 keyword matches.
FIX: check the keyword: `python3 -c "from routx_engine import classify_tool; print(classify_tool('YOUR QUERY HERE'))"`. If it shows the wrong module or "escalation": the keyword is missing. Add it to routx_engine.py in the appropriate mo[...]
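To see why a missing keyword sends a query down a tier, here is a toy model of keyword-based Tier 1 classification. It is not the code in routx_engine.py. The `TIER1_KEYWORDS` table, the `HEALTH` module name, the `budget` keyword, and the `classify` function are hypothetical stand-ins. Only the "escalation" label and the treasury/SIMONX pairing come from the entry above.

```python
# Toy keyword table: module name -> keywords that route to it at Tier 1.
# Hypothetical contents; the real table lives in routx_engine.py.
TIER1_KEYWORDS = {
    "SIMONX": ["budget"],
    "HEALTH": ["health", "disk"],
}


def classify(query: str, table: dict) -> str:
    """Return the first module whose keyword appears in the query, else "escalation"."""
    text = query.lower()
    for module, keywords in table.items():
        if any(keyword in text for keyword in keywords):
            return module
    return "escalation"  # no Tier 1 match: the query falls through to a lower tier


# Before the fix, "treasury" is not registered anywhere.
before = classify("show treasury balance", TIER1_KEYWORDS)

# The fix: register the missing keyword under the module that should own it.
TIER1_KEYWORDS["SIMONX"].append("treasury")
after = classify("show treasury balance", TIER1_KEYWORDS)

print(before, "->", after)
```

In this model the fix is a one-line table change. On the ship, the equivalent edit only takes effect after ROUTX is restarted, as the Quick Reference notes.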
GCS. Rebuild brains from GGUF. 4-8 hour recovery. Problem 10.3 — Domain compromised: See
SPEC_DNS_MANAGEMENT.md. Contact GoDaddy immediately. Verify domain lock. Check for unauthorized DNS changes. P0 CRITICAL.
Problem 1.3 — nexus_engine.py change not taking effect after restart
SYMPTOM: nexus_engine.py was edited and nexus-engine.service was restarted, but queries through port 9191 still return the old result.
DIAGNOSIS: ROUTX (port 9191) proxies NEXUS and holds module state independently. Restarting nexus-engine.service alone is insufficient.
FIX: `systemctl --user restart nexus-engine.service && sleep 2 && systemctl --user restart routx.service`.
VERIFY: `curl -s localhost:9191/query -X POST -H "Content-Type: application/json" -d '{"query":"YOUR TEST QUERY"}'` returns the post-edit result.
ROOT CAUSE: Confirmed 2026-05-08 when a one-line evalf() fix to mod_sympy worked on port 9393 but port 9191 continued serving the pre-fix response until ROUTX was restarted.
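One way to detect this condition before restarting anything is to send the same query to both ports and compare the answers. This is a sketch under two assumptions: that port 9393 is the direct NEXUS endpoint (inferred from the incident note above) and that it accepts the same /query POST as port 9191. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Callable


def post_query(port: int, query: str, timeout: float = 5.0) -> dict:
    """POST {"query": ...} to localhost:<port>/query and return the JSON reply."""
    request = urllib.request.Request(
        f"http://localhost:{port}/query",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def routx_is_stale(query: str, fetch: Callable[[int, str], dict] = post_query) -> bool:
    """True when the proxied answer (9191) differs from the direct one (9393)."""
    direct = fetch(9393, query)   # assumed direct NEXUS port
    proxied = fetch(9191, query)  # ROUTX proxy port
    return direct != proxied
```

If the replies carry per-request metadata that legitimately differs between ports, compare only the field that holds the result instead of the whole object.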
QUICK REFERENCE — THE 10 MOST COMMON FIXES
1. ROUTX not responding → `systemctl --user restart routx.service`
2. Sisters dead → `tmux kill-session -t sisters && tmux new-session -d -s sisters && tmux send-keys -t sisters 'summon-aether --gemini' Enter`
3. Query hitting wrong module → check keyword with classify_tool(). Add keyword if missing. Restart ROUTX.
4. Brain cold start (30-60s delay) → normal. Send warmup query to pre-load.
5. RAM full → `ollama ps` → `ollama stop [least-needed]`
6. Unknown port → `ss -tlnp` → find process → 5-step kill per Problem 4.1.
7. Disk full → emergency cleanup: /tmp, old logs, unused Ollama models.
8. API key leaked → REVOKE immediately. Generate new. Update .env. Restart services.
9. Backup failed → verify GCS creds. Re-run manually: `bash ~/scripts/backup_to_gcs.sh`
10. Cron not running → `crontab -l` → re-add missing entry → test script manually.
11. nexus_engine.py change not reflected at port 9191 → restart BOTH services: `systemctl --user restart nexus-engine.service && sleep 2 && systemctl --user restart routx.service`
WHEN IN DOUBT: Check the spec. Every system on the ship has a spec in ~/memories/SPEC_*.md. The spec contains the invariants, the integration points, and the design intent. The technician's manual tells you HOW to fix. The spec tells you WHY it was built that way. Both are needed. Neither is sufficient alone.
INVARIANTS
**1. Every fix in this manual has been TESTED on real incidents.** No theoretical procedures. If it's in this manual, it happened on this ship and this fix resolved it.
**2. The 5-step port kill procedure (Problem 4.1) is SACRED.** All 5 steps. Every time. Skipping step 4 (find the supervisor) means the process respawns. Learned the hard way on April 20, 2026.
**3. VERIFY after every fix.** "I ran the command" is not verification. "The expected output appeared" is verification.
**4. This manual grows from incidents.** Every new problem that requires more than 2 minutes to diagnose gets an entry. The manual is the ship's MECHANICAL memory: not what it knows, but what it knows how to FIX.
**5. The Quick Reference (10 most common fixes) is taped to the metaphorical wall.** These 10 procedures cover 90% of operational issues. Learn them by heart.
**6. When in doubt: read the spec, check the log, ask the Lobster.** In that order. Don't guess. Don't assume.
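Invariant 3 can be made mechanical. The helper below is a sketch, not part of the ship's tooling: `run_and_verify` is a hypothetical name, and it counts a fix as verified only when the check command exits 0 and its output contains the expected text.

```python
import subprocess


def run_and_verify(check_cmd: list, expected: str, timeout: float = 30.0) -> bool:
    """Run a check command; pass only if it exits 0 AND prints the expected text."""
    try:
        completed = subprocess.run(
            check_cmd, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # command missing or hung: not verified
    return completed.returncode == 0 and expected in completed.stdout
```

For example, `run_and_verify(["systemctl", "--user", "is-active", "routx.service"], "active")` passes only for a running service. An inactive unit also prints text containing "active", but it exits non-zero, which is why the helper checks both conditions.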
York, Toronto / 🍁 Φ 0.042. Φζ.⊤.