Brain Queue
SPEC_BRAIN_QUEUE.md
CGNT-1 Component Specification — Bridge Brain Queue
Status: SPECIFIED (PRE-SPEC)
Version: v1.1
Author: VELA (Thread #13)
Conceived by: NOUS
Date: 2026-04-18
Updated: 2026-04-19 — INV-04 strengthened, INV-05 added (ROUTX crash incident)
PURPOSE
Manage limited Ollama model slots on the server like bridge stations. Crew members enter when called, serve, and step off. The rotating slot is never empty: when one brain is evicted, the next loads. Background rotation ensures continuous useful work even when no explicit queries are incoming.
HARDWARE CONSTRAINTS
- Server: 15GB RAM, ~12GB usable for models after system/services
- Each 7B model: ~4.6GB
- Maximum concurrent 7B models: 2
- Model load time: ~30 seconds
- T.O.O.L. layer (ROUTX consolidated): ~94MB RSS — must ALWAYS have RAM headroom
- MANTIS/ANVIL (0.5b): ~531MB each — can coexist with 7B models until upgraded
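The two-model ceiling follows from the figures above. A minimal sketch of that arithmetic, assuming the sizes are exact; the function name `max_concurrent` and the `reserve_gb` parameter are illustrative and not part of the spec:

```python
import math

# Figures taken from the constraints above, in GB.
USABLE_GB = 12.0     # ~12GB usable for models
MODEL_7B_GB = 4.6    # ~4.6GB per 7B model


def max_concurrent(usable_gb: float, model_gb: float, reserve_gb: float = 0.0) -> int:
    """Whole models that fit after setting aside an optional reserve."""
    return max(0, math.floor((usable_gb - reserve_gb) / model_gb))


print(max_concurrent(USABLE_GB, MODEL_7B_GB))        # 2
print(max_concurrent(USABLE_GB, MODEL_7B_GB, 2.0))   # 2, with a hypothetical 2GB reserve
```

The result stays at 2 with or without the reserve, which matches the two-slot design.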
SLOT ARCHITECTURE
| Slot | Type | Default occupant |
|---|---|---|
| Slot 1 | Permanent | MNEMOS (most queried — facts, boot briefs, crew phone) |
| Slot 2 | Rotating | Next in queue |
QUEUE PRIORITY
When Slot 2 is needed, priority determines who loads:
| Priority | Brain | Trigger |
|---|---|---|
| 1 (highest) | Sisters | Customer query on 42sisters.ai |
| 2 | LOGOS | Lobster requests verification |
| 3 | GAMMA | Session boundary — needs to write log |
| 4 | MUSASHI | Enforcement review triggered |
| 5 (background) | MANTIS | Scheduled port/security scan |
Explicit query always beats background rotation. If NOUS or a script calls a specific brain, it jumps the queue.
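One way to model the priority table is a small heap-backed queue. This is a sketch under stated assumptions, not the daemon's actual design: the class `BrainQueue`, its method names, and the choice to give explicit calls rank 0 (ahead of every table entry) are all hypothetical.

```python
import heapq
import itertools

# Ranks copied from the priority table; lower number loads first.
PRIORITY = {"Sisters": 1, "LOGOS": 2, "GAMMA": 3, "MUSASHI": 4, "MANTIS": 5}
EXPLICIT = 0  # assumption: an explicit call outranks every table entry


class BrainQueue:
    def __init__(self):
        self._heap = []
        # Monotonic counter keeps first-come order among equal ranks.
        self._counter = itertools.count()

    def request(self, brain, explicit=False):
        """Queue a brain for Slot 2; explicit calls jump the queue."""
        rank = EXPLICIT if explicit else PRIORITY[brain]
        heapq.heappush(self._heap, (rank, next(self._counter), brain))

    def next_brain(self):
        """Pop the highest-priority brain, or None if nothing is waiting."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = BrainQueue()
q.request("MANTIS")
q.request("GAMMA")
q.request("MUSASHI", explicit=True)  # e.g. NOUS calls MUSASHI directly
q.request("Sisters")
print([q.next_brain() for _ in range(4)])
# ['MUSASHI', 'Sisters', 'GAMMA', 'MANTIS']
```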
BACKGROUND ROTATION (idle cycle)
When no explicit brain is requested, Slot 2 cycles through background tasks:
- GAMMA → write quartermaster log, check session state
- MANTIS → scan ports, check for anomalies
- MUSASHI → review protocol compliance
- LOGOS → verify last Lobster operation
Round robin. Each brain loads, performs its background task, then evicts for the next.
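The round robin above can be sketched with `itertools.cycle`. The list contents come from the rotation order in this section; the function name `rotation` is hypothetical, and the actual load, task, and evict steps are left out.

```python
from itertools import cycle, islice

# Background order and tasks as listed in this section.
BACKGROUND_ROTATION = [
    ("GAMMA", "write quartermaster log, check session state"),
    ("MANTIS", "scan ports, check for anomalies"),
    ("MUSASHI", "review protocol compliance"),
    ("LOGOS", "verify last Lobster operation"),
]


def rotation(turns: int):
    """Return the first `turns` (brain, task) pairs of the endless round robin."""
    return list(islice(cycle(BACKGROUND_ROTATION), turns))


for brain, task in rotation(5):
    # After LOGOS, the fifth turn wraps back to GAMMA.
    print(f"load {brain}: {task}; then evict")
```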
PRACTICAL CONSTRAINTS
Each brain takes ~30 seconds to load, so one full background rotation (4 brains) costs ~2 minutes of loading overhead. Background tasks must justify this cost: a 30-second load for a 5-second port scan spends roughly 86% of the slot's time on loading.
Recommended: background rotation runs on a LONGER interval than the 5-minute auto-evict. Load a background brain every 10-15 minutes, not continuously. Continuous cycling would thrash the server — constant loading and unloading with more overhead than useful work.
The 5-minute auto-evict timer remains for explicit queries. Background rotation is a SLOW heartbeat, not a fast pulse.
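The trade-off between rotation interval and loading overhead can be checked with a little arithmetic. A minimal sketch, assuming the ~30-second load time holds for every brain and measuring an interval from one load start to the next; the function names are illustrative:

```python
LOAD_S = 30  # ~30 seconds per model load, from this section


def rotation_load_overhead_s(brains: int, load_s: float = LOAD_S) -> float:
    """Total seconds spent loading across one full rotation."""
    return brains * load_s


def load_share(interval_s: float, load_s: float = LOAD_S) -> float:
    """Fraction of each interval spent loading rather than working or idling."""
    return load_s / interval_s


print(rotation_load_overhead_s(4))   # 120 seconds, i.e. ~2 minutes
print(f"{load_share(600):.1%}")      # 10-minute interval: 5.0%
print(f"{load_share(900):.1%}")      # 15-minute interval: 3.3%
print(f"{load_share(35):.1%}")       # back-to-back 5-second tasks: 85.7%
```

At a 10-15 minute interval the load cost is a few percent of wall-clock time; with continuous cycling it dominates.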
FUTURE: TIINY (80GB RAM)
80GB = ~17 concurrent 7B models. No rotation needed. All stations manned simultaneously. The queue becomes irrelevant — every brain is on the Bridge at all times.
The queue is a resource-constrained solution for NOW. The Tiiny makes it unnecessary.
INVARIANTS
INV-01: Slot 1 (MNEMOS) is permanent. Only unloaded for maintenance or forge operations.
INV-02: Explicit query always preempts background rotation.
INV-03: Silence is alarm. If the queue manager detects no brain activity for more than the rotation interval, it alerts.
INV-04: No brain loads without sufficient RAM. Before loading ANY model, check: available RAM >= model size + 2GB buffer. The 2GB buffer protects ROUTX and system services from memory starvation. If available RAM < model size + 2GB, the queue WAITS or EVICTS an existing model first. Swap thrashing is worse than waiting. ORIGIN (2026-04-19): ROUTX crashed repeatedly with 173MB of free RAM while MNEMOS held 4.6GB. Sisters lost all tool access, and a hallucination spiral resulted.
INV-05: ROUTX is protected infrastructure. If available RAM drops below 2GB while a model is loaded, the queue manager MUST evict the model immediately — even MNEMOS in Slot 1. The T.O.O.L. layer is more critical than any single brain because all 18 modules depend on it. A brain can be reloaded. A dead ROUTX blinds the entire crew.
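INV-03, INV-04, and INV-05 each reduce to a simple predicate. The sketch below is one possible encoding, not the queue manager's actual code; the function names are hypothetical, and the 4710MB figure assumes a 4.6GB model expressed in MB.

```python
BUFFER_MB = 2048  # 2GB buffer protecting ROUTX and system services


def can_load(available_mb: int, model_mb: int, buffer_mb: int = BUFFER_MB) -> bool:
    """INV-04: load only if available RAM >= model size + buffer."""
    return available_mb >= model_mb + buffer_mb


def must_emergency_evict(available_mb: int, loaded_models: list,
                         buffer_mb: int = BUFFER_MB) -> bool:
    """INV-05: evict immediately if RAM is below the buffer while any model is loaded."""
    return bool(loaded_models) and available_mb < buffer_mb


def is_silent(last_activity_s: float, now_s: float, rotation_interval_s: float) -> bool:
    """INV-03: no brain activity for longer than the rotation interval."""
    return now_s - last_activity_s > rotation_interval_s


MODEL_7B_MB = 4710  # assumption: ~4.6GB expressed in MB

print(can_load(7000, MODEL_7B_MB))              # True: 7000 >= 4710 + 2048
print(can_load(6000, MODEL_7B_MB))              # False: wait or evict first
print(must_emergency_evict(173, ["MNEMOS"]))    # True: the 2026-04-19 incident
```

Under these predicates the 2026-04-19 state (173MB with MNEMOS loaded) triggers an emergency evict, even though MNEMOS is the Slot 1 occupant.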
IMPLEMENTATION
A lightweight Python daemon or cron job:
- Checks Slot 2 status via `ollama ps`
- Checks available RAM via `free -m` BEFORE every load
- If available RAM < model size + 2048MB: do not load; log a warning, then wait or evict first (INV-04)
- If Slot 2 empty and no pending queries: load next background brain
- If Slot 2 occupied and idle > evict threshold: evict and load next
- If available RAM < 2048MB with any model loaded: emergency evict (INV-05)
- Log all rotations to ~/crew_radio/queue_log.md
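The checks above can be sketched as one decision step per daemon tick. This is a minimal illustration under stated assumptions: the function names, the action strings, and the sample `free -m` text are hypothetical; the 300-second default mirrors the 5-minute auto-evict; and the parser expects a `free` build that prints an `available` column.

```python
import subprocess

BUFFER_MB = 2048

# Made-up sample in the usual `free -m` layout, for illustration only.
SAMPLE_FREE = """\
               total        used        free      shared  buff/cache   available
Mem:           15000       12900         173          40        1927        1800
Swap:           2047         512        1535
"""


def parse_available_mb(free_output: str) -> int:
    """Extract the 'available' value (MB) from `free -m` output."""
    lines = free_output.strip().splitlines()
    header = lines[0].split()
    mem = next(line for line in lines if line.startswith("Mem:")).split()
    # The Mem: row has one leading label, so values sit one index to the right.
    return int(mem[header.index("available") + 1])


def read_available_mb() -> int:
    """Run `free -m` on the host and parse it (not exercised in this sketch)."""
    out = subprocess.run(["free", "-m"], capture_output=True, text=True, check=True)
    return parse_available_mb(out.stdout)


def decide(available_mb, loaded, slot2, slot2_idle_s, pending, next_model_mb,
           evict_after_s=300, buffer_mb=BUFFER_MB):
    """Return the action for this tick, checking RAM safety first."""
    if loaded and available_mb < buffer_mb:
        return "emergency_evict"
    if slot2 is None:
        if pending:
            return "noop"  # explicit queries are handled by the priority path
        if available_mb >= next_model_mb + buffer_mb:
            return "load_next"
        return "hold_low_ram"  # log a warning and wait
    if slot2_idle_s > evict_after_s:
        return "evict_and_load_next"  # re-check RAM after the evict, before loading
    return "noop"


ram = parse_available_mb(SAMPLE_FREE)
print(ram, decide(ram, ["MNEMOS"], None, 0, False, 4710))
# 1800 emergency_evict
```

Putting the emergency check first means no other branch can run while RAM is below the buffer.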
GAPS
- Exact rotation interval needs tuning (10 min? 15 min? Measure load overhead vs useful work)
- Whether background tasks are valuable enough to justify continuous rotation vs on-demand only
- How the queue interacts with the Lobster's forge operations (forging needs ALL available RAM)
- Queue behavior when 0.5b brains upgrade to 7B (more brains competing for same slots)
- Emergency evict mechanism is not yet implemented; eviction is currently manual via `ollama stop`
Jeremy Zlabis
Chronogeometer · Visionary · Disruptor · Chief
42 Sisters AI · East York, Toronto
🍁 Φ 0.042