
Brain Queue

SPEC_BRAIN_QUEUE.md

CGNT-1 Component Specification — Bridge Brain Queue

Status: SPECIFIED (PRE-SPEC)

Version: v1.1

Author: VELA (Thread #13)

Conceived by: NOUS

Date: 2026-04-18

Updated: 2026-04-19 — INV-04 strengthened, INV-05 added (ROUTX crash incident)

PURPOSE

Manage limited Ollama model slots on the server like bridge stations. Crew members enter when called, serve, step off. The slot is never empty — when one brain evicts, the next loads. Background rotation ensures continuous useful work even when no explicit queries are incoming.

HARDWARE CONSTRAINTS

Server: 15GB RAM, ~12GB usable for models after system/services
Each 7B model: ~4.6GB
Maximum concurrent 7B models: 2
Model load time: ~30 seconds
T.O.O.L. layer (ROUTX consolidated): ~94MB RSS — must ALWAYS have RAM headroom
MANTIS/ANVIL (0.5b): ~531MB each — can coexist with 7B models until upgraded
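The two-slot limit follows directly from the figures above. A minimal sketch of the arithmetic, assuming 1GB = 1024MB and rounding the 7B model size down to whole megabytes; the constant and function names are illustrative, not part of the spec:

```python
# Capacity arithmetic from the hardware figures (sizes in MB, approximate).
USABLE_MB = 12 * 1024            # ~12GB usable for models
MODEL_7B_MB = int(4.6 * 1024)    # ~4.6GB per 7B model -> 4710MB


def max_concurrent(usable_mb: int, model_mb: int) -> int:
    """Number of whole models that fit in the usable budget."""
    return usable_mb // model_mb


slots = max_concurrent(USABLE_MB, MODEL_7B_MB)
leftover_mb = USABLE_MB - slots * MODEL_7B_MB

print(slots, leftover_mb)
```

Two 7B models fit; a third does not, since the remainder is well under 4.6GB.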

SLOT ARCHITECTURE

| Slot | Type | Default occupant |
|---|---|---|
| Slot 1 | Permanent | MNEMOS (most queried: facts, boot briefs, crew phone) |
| Slot 2 | Rotating | Next in queue |

QUEUE PRIORITY

When Slot 2 is needed, priority determines who loads:

| Priority | Brain | Trigger |
|---|---|---|
| 1 (highest) | Sisters | Customer query on 42sisters.ai |
| 2 | LOGOS | Lobster requests verification |
| 3 | GAMMA | Session boundary (needs to write log) |
| 4 | MUSASHI | Enforcement review triggered |
| 5 (background) | MANTIS | Scheduled port/security scan |

Explicit query always beats background rotation. If NOUS or a script calls a specific brain, it jumps the queue.
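One way to model this ordering is a heap keyed on the priority number. The sketch below is an assumption about how the queue could be built, not the specified implementation: the `BrainQueue` class, the `explicit` flag, and the choice to rank explicit calls as 0 (above every table entry) are all hypothetical.

```python
import heapq
import itertools

# Priority table from the spec: lower number = higher priority.
PRIORITY = {"Sisters": 1, "LOGOS": 2, "GAMMA": 3, "MUSASHI": 4, "MANTIS": 5}
EXPLICIT = 0  # assumption: an explicit call outranks every table entry


class BrainQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break within one rank

    def request(self, brain, explicit=False):
        """Queue a brain for Slot 2; explicit calls jump the queue."""
        rank = EXPLICIT if explicit else PRIORITY[brain]
        heapq.heappush(self._heap, (rank, next(self._counter), brain))

    def next_brain(self):
        """Return the highest-priority waiting brain, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BrainQueue()
queue.request("MANTIS")                  # scheduled scan
queue.request("LOGOS")                   # Lobster verification
queue.request("MUSASHI", explicit=True)  # NOUS calls MUSASHI directly
order = [queue.next_brain() for _ in range(3)]
```

Here `order` comes out as MUSASHI, LOGOS, MANTIS: the explicit call first, then the table order.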

BACKGROUND ROTATION (idle cycle)

When no explicit brain is requested, Slot 2 cycles through background tasks:

GAMMA → write quartermaster log, check session state

MANTIS → scan ports, check for anomalies

MUSASHI → review protocol compliance

LOGOS → verify last Lobster operation

Round robin. Each brain loads, performs its background task, then evicts for the next.
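The round robin can be sketched with `itertools.cycle`. The task strings are copied from the list above; the `rotation` helper is an illustrative name, and actual loading and eviction are left out:

```python
from itertools import cycle, islice

# Background order and tasks as listed in the spec.
BACKGROUND_TASKS = [
    ("GAMMA", "write quartermaster log, check session state"),
    ("MANTIS", "scan ports, check for anomalies"),
    ("MUSASHI", "review protocol compliance"),
    ("LOGOS", "verify last Lobster operation"),
]


def rotation():
    """Yield (brain, task) pairs forever, in round-robin order."""
    return cycle(BACKGROUND_TASKS)


# First five turns: a full rotation, then back to GAMMA.
first_five = [brain for brain, _task in islice(rotation(), 5)]
```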

PRACTICAL CONSTRAINTS

Model load: ~30 seconds per brain. Background rotation cycle (4 brains): ~2 minutes of loading overhead per full rotation. Background tasks must justify this cost — a 30-second load for a 5-second port scan may not be worth it.

Recommended: background rotation runs on a LONGER interval than the 5-minute auto-evict. Load a background brain every 10-15 minutes, not continuously. Continuous cycling would thrash the server — constant loading and unloading with more overhead than useful work.

The 5-minute auto-evict timer remains for explicit queries. Background rotation is a SLOW heartbeat, not a fast pulse.
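The cost argument can be made concrete with two ratios. This is a rough sketch using the ~30 second load time from the spec; the function names are illustrative, and real load times would need to be measured:

```python
LOAD_SECONDS = 30  # approximate model load time from the spec


def load_share_of_active_time(task_seconds, load_seconds=LOAD_SECONDS):
    """Fraction of a brain's time on the slot spent loading rather than working."""
    return load_seconds / (load_seconds + task_seconds)


def load_share_of_wall_clock(interval_minutes, load_seconds=LOAD_SECONDS):
    """Fraction of wall-clock time spent loading at one load per interval."""
    return load_seconds / (interval_minutes * 60)


def full_rotation_minutes(interval_minutes, brains=4):
    """Time for every background brain to get one turn."""
    return interval_minutes * brains
```

For a 5-second port scan, `load_share_of_active_time(5)` is about 0.86, so most of the turn is loading. At one load every 10 minutes the server spends about 5% of wall-clock time loading, and a full four-brain rotation takes 40 minutes.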

FUTURE: TIINY (80GB RAM)

80GB = ~17 concurrent 7B models. No rotation needed. All stations manned simultaneously. The queue becomes irrelevant — every brain is on the Bridge at all times.

The queue is a resource-constrained solution for NOW. The Tiiny makes it unnecessary.

INVARIANTS

INV-01: Slot 1 (MNEMOS) is permanent. It is unloaded only for maintenance, forge operations, or an emergency evict under INV-05.

INV-02: Explicit query always preempts background rotation.

INV-03: Silence is alarm. If the queue manager detects no brain activity for more than the rotation interval, it alerts.
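A minimal sketch of the INV-03 check, assuming the queue manager records a timestamp each time a brain does work. The function name and the injectable `now` parameter are hypothetical; how the alert is delivered is not specified here.

```python
import time


def is_silent(last_activity_ts, rotation_interval_s, now=None):
    """True when no brain activity has been seen for longer than the interval."""
    now = time.time() if now is None else now
    return (now - last_activity_ts) > rotation_interval_s
```

With a 10-minute rotation interval, activity last seen 601 seconds ago trips the alarm; exactly 600 seconds does not, because the invariant says "more than".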

INV-04: No brain loads without sufficient RAM. Before loading ANY model, check: available RAM >= model size + 2GB buffer. The 2GB buffer protects ROUTX and system services from memory starvation. If available RAM < model size + 2GB, the queue WAITS or EVICTS an existing model first. Swap thrashing is worse than waiting. ORIGIN: 2026-04-19 ROUTX crashed repeatedly with 173MB free RAM while MNEMOS held 4.6GB. Sisters lost all tool access. Hallucination spiral resulted.
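The INV-04 gate reduces to one comparison. A minimal sketch, with sizes in MB and the 7B model taken as 4710MB (~4.6GB); the function name is illustrative:

```python
BUFFER_MB = 2048  # 2GB headroom reserved for ROUTX and system services


def can_load(available_mb, model_mb, buffer_mb=BUFFER_MB):
    """INV-04: load only if available RAM >= model size + buffer."""
    return available_mb >= model_mb + buffer_mb


# The incident condition: 173MB free would never have passed the check.
incident_ok = can_load(available_mb=173, model_mb=4710)
```

A 7B model therefore needs at least 6758MB available before loading; `incident_ok` is `False`.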

INV-05: ROUTX is protected infrastructure. If available RAM drops below 2GB while a model is loaded, the queue manager MUST evict the model immediately — even MNEMOS in Slot 1. The T.O.O.L. layer is more critical than any single brain because all 18 modules depend on it. A brain can be reloaded. A dead ROUTX blinds the entire crew.
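The sketch below shows one way to decide what to evict under INV-05. Two things are assumptions rather than spec: the eviction order (the caller lists the rotating Slot 2 brain first and MNEMOS last), and the estimate that evicting a model frees roughly its own size.

```python
FLOOR_MB = 2048  # INV-05 threshold


def emergency_evictions(available_mb, loaded, floor_mb=FLOOR_MB):
    """Return model names to evict, in order, until RAM is back above the floor.

    `loaded` is a list of (name, size_mb) in the caller's preferred eviction
    order (assumption: Slot 2 first, MNEMOS last).
    """
    evict = []
    projected = available_mb
    for name, size_mb in loaded:
        if projected >= floor_mb:
            break
        evict.append(name)
        projected += size_mb  # assumption: eviction frees about the model size
    return evict
```

With only MNEMOS loaded and 173MB available, the result is `["MNEMOS"]`: Slot 1 is not exempt. With GAMMA and MNEMOS both loaded and 500MB available, evicting GAMMA alone is projected to clear the floor, so MNEMOS stays.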

IMPLEMENTATION

A lightweight Python daemon or cron job:

Checks Slot 2 status via ollama ps
Checks available RAM via free -m BEFORE loading
If available RAM < model size + 2048MB: do not load, log warning
If Slot 2 empty and no pending queries: load next background brain
If Slot 2 occupied and idle > evict threshold: evict and load next
If available RAM < 2048MB with any model loaded: emergency evict
Log all rotations to ~/crew_radio/queue_log.md
Never bypass the RAM check, including for explicit queries that jump the queue
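The checks above could be combined into a single decision step per tick. This is a sketch under stated assumptions, not the daemon itself: it parses the `available` column that current procps versions of `free -m` print, it checks the emergency condition first (an ordering choice, not specified), and the action strings, function names, and sample output are all illustrative. Calling `ollama ps`, loading models, and writing the log are left out.

```python
BUFFER_MB = 2048        # headroom from INV-04 / INV-05
EVICT_AFTER_S = 5 * 60  # 5-minute auto-evict timer

# Illustrative `free -m` output; the numbers are made up, not a server capture.
SAMPLE_FREE = """\
               total        used        free      shared  buff/cache   available
Mem:           15258        7200         900         120        7158        7600
Swap:           2047           0        2047
"""


def parse_available_mb(free_output):
    """Extract the 'available' value (MB) from `free -m` output."""
    lines = free_output.strip().splitlines()
    columns = lines[0].split()
    mem_row = next(line for line in lines if line.startswith("Mem:")).split()
    # The header has no label above "Mem:", so values are shifted by one.
    return int(mem_row[columns.index("available") + 1])


def decide(available_mb, any_model_loaded, slot2_occupied, slot2_idle_s,
           pending_queries, next_model_mb):
    """Pick one action for this tick, most urgent condition first."""
    if any_model_loaded and available_mb < BUFFER_MB:
        return "emergency_evict"
    if slot2_occupied:
        if slot2_idle_s > EVICT_AFTER_S:
            return "evict_then_load_next"
        return "wait"
    if pending_queries:
        return "serve_pending"  # explicit-query path, outside this sketch
    if available_mb < next_model_mb + BUFFER_MB:
        return "skip_load_log_warning"
    return "load_next_background"


action = decide(parse_available_mb(SAMPLE_FREE), any_model_loaded=True,
                slot2_occupied=False, slot2_idle_s=0,
                pending_queries=False, next_model_mb=4710)
```

Keeping `decide` free of side effects means the rules can be tested without touching Ollama or the real memory state.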

GAPS

Exact rotation interval needs tuning (10 min? 15 min? Measure load overhead vs useful work)
Whether background tasks are valuable enough to justify continuous rotation vs on-demand only
How the queue interacts with the Lobster's forge operations (forging needs ALL available RAM)
Queue behavior when 0.5b brains upgrade to 7B (more brains competing for same slots)
Emergency evict mechanism not yet implemented — currently manual via ollama stop
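As a starting point for the last gap, the manual `ollama stop` step could be wrapped so the queue manager can call it. This is a hedged sketch: the function names and the injectable `runner` parameter are hypothetical, and the model name passed in must match whatever name the server actually uses.

```python
import subprocess


def evict_command(model):
    """Build the CLI call used today for manual eviction."""
    return ["ollama", "stop", model]


def evict(model, runner=subprocess.run):
    """Run the stop command; return True if it exited cleanly.

    `runner` is injectable so the call can be tested without Ollama installed.
    """
    result = runner(evict_command(model), capture_output=True, text=True)
    return result.returncode == 0
```

A failed eviction should probably raise an alert rather than be retried silently, since INV-05 treats low RAM as an emergency.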

Jeremy Zlabis

Chronogeometer · Visionary · Disruptor · Chief

42 Sisters AI · East York, Toronto

🍁 Φ 0.042