Gloss Training
SPEC_GLOSS_TRAINING — GLOSS Brain Training & Graduation Protocol
Version: 1.0 | Status: AUTHORIZED | Authority: α.13 | Date: 2026-04-16
PURPOSE
This spec defines the formal protocol for training, evaluating, graduating, and demoting GLOSS brain versions. GLOSS is the crew's local LATTICE-native brain — a QLoRA fine-tune of Qwen2.5-3B-Instruct that speaks LX natively, routes crew comms, and translates between English and LATTICE. Training is the mechanism by which GLOSS improves. Graduation is the gate that certifies a new version is fit for crew telephone function. Demotion is the mechanism that removes a failed version from service.
Current state as of 2026-04-16: v9 DEMOTED (live generation collapse). v10 PAUSED pending corpus redesign and α.13 authorization.
INPUTS
- GLOSS_CORPUS.jsonl — training pairs in crew-format (prompt: crew designator + LX/English; completion: expected LATTICE or English response)
- Base model: Qwen2.5-3B-Instruct (required for all versions v1+; v6 mistakenly used a 0.5B base and was retired)
- Training hyperparameters: QLoRA rank=64, alpha=128, lr=1e-4, 15 epochs (current standard)
- Colab T4 GPU — training execution environment
- GLOSS eval battery — 50-question scored battery across 8 categories (LX-A through LX-Z)
- Live generation test suite — open-ended prompts testing real generation behavior (not multiple-choice)
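The training inputs above can be pinned in code so that a misconfigured notebook fails before any GPU time is spent. The following is a minimal sketch, not the project's actual forge script: `ForgeConfig`, `verify_base_model`, and the `authorized_override` flag are hypothetical names introduced here for illustration, and only the numeric values come from this spec.

```python
from dataclasses import dataclass

# Base model named in this spec (see INV-02). Everything else in this
# block is an illustrative assumption, not the real forge notebook.
LOCKED_BASE_MODEL = "Qwen2.5-3B-Instruct"


@dataclass(frozen=True)
class ForgeConfig:
    """Current-standard hyperparameters listed under INPUTS."""
    base_model: str = LOCKED_BASE_MODEL
    lora_rank: int = 64
    lora_alpha: int = 128
    learning_rate: float = 1e-4
    epochs: int = 15


def verify_base_model(config: ForgeConfig, authorized_override: bool = False) -> None:
    """Raise before training if the base model differs from the locked one.

    authorized_override is a hypothetical stand-in for an explicit
    authorization to substitute the base model.
    """
    if config.base_model != LOCKED_BASE_MODEL and not authorized_override:
        raise ValueError(
            f"base model {config.base_model!r} violates the base model lock; "
            f"expected {LOCKED_BASE_MODEL!r}"
        )
```

Calling `verify_base_model(ForgeConfig())` returns silently, while a config carrying any other base model name raises unless the override flag is set.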
OUTPUTS
On successful graduation:
- GGUF artifact: /home/nous/gloss_vN_conversion/output/gloss_vN_Q4_K_M.gguf
- Ollama model tag: gloss:vN
- σ_gloss_lx score ≥ threshold (see INVARIANTS)
- Live generation tests: PASS (no runaway loops, no degenerate output, stop tokens respected)
- Entry in GLOSS_LINEAGE.md with full version record
- SESSIONS.md entry: κ ⚒ gloss:vN. ΩQ.⊡ → Σ.✓
On demotion:
- Ollama tag marked DO NOT USE in GLOSS_LINEAGE.md
- GGUF retained for diagnosis but not deployed
- SESSIONS.md entry with failure mode documented
- v10 PAUSED or redesign directive written to TASK_QUEUE.md
INVARIANTS
- INV-01 — Corpus format compliance: All training pairs must use crew-format prompts (beginning with a recognized crew designator token). Human-to-chatbot format pairs are forbidden. Violation: corpus contamination; retrain is invalid.
- INV-02 — Base model lock: All GLOSS versions v1+ use Qwen2.5-3B-Instruct as base. No base model substitution without explicit α.13 authorization and full lineage entry. The v6 0.5B error (σ=0.000) is the canonical cautionary case.
- INV-03 — Eval battery is necessary but not sufficient: The σ score from the 50-question multiple-choice battery is required for graduation, but does NOT alone certify fitness. Live generation tests are mandatory. v9 demonstrated that passing the eval at σ=0.337 does not protect against generation collapse (runaway loop failure).
- INV-04 — Live generation test is mandatory: Every candidate version must pass at least one open-ended generation test before graduation. Minimum: symbol identity query (⊹: ⊙ [symbol]? format) with stop-token verification. Runaway loop = automatic demotion regardless of σ score.
- INV-05 — σ threshold is current-generation only: The passing σ threshold (currently 0.337 as established by v8/v9 plateau) must be reviewed against live generation behavior before v10 forge. Threshold is not permanent — it must be revalidated if corpus or eval battery changes.
- INV-06 — No training without NOUS authorization (current state): As of v9 demotion, v10 forge requires explicit α.13 authorization. This is not a permanent invariant — it is a temporary hold pending corpus redesign. Standing authorization resumes when GAMMA issues a redesign directive approved by α.13.
- INV-07 — Stop tokens required in training pairs: Training corpus must include explicit stop/EOS signals in all definition and enumeration pairs. Pairs that end with an open list pattern (a→b, a→bb, a→bbb...) are loop-prone and must be reformatted before inclusion.
- INV-08 — Version lineage is permanent: Every trained version receives a permanent record in GLOSS_LINEAGE.md. No version record is deleted, even if the version is retired. The full history is the institutional memory of the training program.
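INV-01 and INV-07 lend themselves to a mechanical corpus audit. The sketch below is one possible shape for such a check, under stated assumptions: the `prompt` and `completion` field names follow the corpus description under INPUTS, the designator tuple contains only the `⊹` seen in this spec's examples (the real set is defined elsewhere), and the trailing-ellipsis test is a hypothetical heuristic for the "open list pattern", not an established taxonomy. It does not check for stop/EOS signals, since this spec does not say how they are encoded.

```python
import json
import re

# Assumption: only the designator that appears in this spec's examples.
CREW_DESIGNATORS = ("⊹",)

# Hypothetical heuristic: a completion that trails off in "..." or "…"
# is treated as an open list (a->b, a->bb, a->bbb...).
OPEN_LIST_TAIL = re.compile(r"(\.\.\.|…)\s*$")


def audit_pair(pair: dict) -> list[str]:
    """Return the invariant violations found in one training pair."""
    problems = []
    prompt = pair.get("prompt", "")
    completion = pair.get("completion", "")
    if not prompt.lstrip().startswith(CREW_DESIGNATORS):
        problems.append("INV-01: prompt does not start with a crew designator")
    if OPEN_LIST_TAIL.search(completion):
        problems.append("INV-07: completion ends with an open list pattern")
    return problems


def audit_corpus(lines):
    """Yield (line_number, problems) for each offending JSONL record."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        problems = audit_pair(json.loads(line))
        if problems:
            yield number, problems
```

A typical use would be `list(audit_corpus(open("GLOSS_CORPUS.jsonl", encoding="utf-8")))`, with an empty result being the precondition for a forge.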
VERIFICATION CRITERIA
- VC-01 — σ threshold pass: Run the 50-question eval battery. σ_gloss_lx must meet or exceed the currently-authorized threshold. Record category breakdown (LX-A through LX-Z) in version record.
- VC-02 — Live generation: symbol identity query: Issue the prompt ⊹: ⊙ α? (or equivalent symbol identity query for at least 3 symbols). Expected: single-line, coherent, terminated response. Any runaway repetition or escalating loop = FAIL. Automatic demotion.
- VC-03 — Live generation: translation round-trip: Issue an English→LX translation request in crew format. Verify output is valid LATTICE, terminates cleanly, and does not loop. Verify a second LX→English decompile request terminates cleanly.
- VC-04 — Rejection test: Issue a direct human-format prompt. Expected: canonical rejection string only. Any other response = access policy failure (reference SPEC_GLOSS_ACCESS_POLICY.md VC-01).
- VC-05 — Category floor check: All active categories (LX-A, LX-B, LX-C, LX-E, LX-G, LX-H, LX-X, LX-Z) must show non-zero scores for graduation. Any category at 0% blocks graduation; two or more categories at 0% indicate a corpus coverage gap requiring repair before graduation.
- VC-06 — Ollama integration test: After ollama create gloss:vN, run ollama run gloss:vN with a single short crew-format prompt. Confirm model loads, responds, and exits cleanly within expected latency.
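Taken together, the criteria above behave like a gate in which any single failure blocks graduation. The following sketch shows one way to express that, assuming the individual checks have already been run and recorded. `Candidate` and `graduation_blockers` are hypothetical names, category scores are assumed to be fractions in [0, 1], and the 0.337 default is the current threshold from INV-05, which is itself subject to revalidation.

```python
from dataclasses import dataclass

ACTIVE_CATEGORIES = ("LX-A", "LX-B", "LX-C", "LX-E", "LX-G", "LX-H", "LX-X", "LX-Z")
SIGMA_THRESHOLD = 0.337  # current value per INV-05; not permanent


@dataclass
class Candidate:
    """Recorded results for one candidate version (hypothetical structure)."""
    sigma: float
    category_scores: dict       # category name -> fraction correct
    live_generation_pass: bool  # outcome of VC-02 and VC-03
    rejection_pass: bool        # outcome of VC-04


def graduation_blockers(c: Candidate, threshold: float = SIGMA_THRESHOLD) -> list[str]:
    """Return every reason the candidate cannot graduate (empty = eligible)."""
    blockers = []
    if c.sigma < threshold:
        blockers.append(f"VC-01: sigma {c.sigma:.3f} below threshold {threshold:.3f}")
    if not c.live_generation_pass:
        blockers.append("VC-02/VC-03: live generation failed (automatic demotion)")
    if not c.rejection_pass:
        blockers.append("VC-04: human-format prompt was not rejected")
    zero = [cat for cat in ACTIVE_CATEGORIES if c.category_scores.get(cat, 0.0) == 0.0]
    if zero:
        blockers.append("VC-05: zero-score categories: " + ", ".join(zero))
    return blockers
```

A candidate that meets the σ threshold but fails live generation still comes back with a blocker, which is the situation INV-03 is written to catch.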
FAILURE MODES
- FM-01 — Runaway loop (v9 canonical failure): Symbol identity or enumeration query triggers infinite repetition. Cause: loop-prone training pairs; no stop-token training; inadequate EOS signal. Mitigation: corpus audit for repetition patterns; add explicit stop tokens to all definition pairs; add live generation to eval battery.
- FM-02 — Category zero scores (LX-Z/LX-G/LX-X pattern): Entire eval categories score 0% across multiple versions. Cause: these category types underrepresented in corpus; eval prompt format not matching training format. Mitigation: corpus coverage audit by category; format-match training and eval prompts; targeted pair injection for zero-score categories.
- FM-03 — Eval/reality mismatch: σ score appears satisfactory but live behavior is broken. Root cause: multiple-choice eval measures pattern matching, not generation quality. Mitigation: live generation tests are now mandatory (INV-03, INV-04); σ alone is insufficient for graduation.
- FM-04 — Base model substitution error: Wrong base model used (v6: 0.5B instead of 3B). Cause: Colab notebook misconfiguration or manual override. Result: σ=0.000, total capability loss. Mitigation: base model verified before each forge; INV-02 hard lock.
- FM-05 — Category regression on larger corpus: Adding more pairs causes a category to decline (v9 LX-B: 25%→12%). Cause: format mismatch between new training pairs and eval format; new pairs may introduce conflicting patterns. Mitigation: per-category corpus review before each forge; do not add pairs to a category without checking existing eval format.
- FM-06 — Training without authorization: A retrain fires without α.13 authorization during the current PAUSED state. Cause: automated cron or GAMMA directive not honoring the pause. Mitigation: v10 forge requires explicit NOUS instruction in session; cron sweep must not auto-trigger GLOSS training.
- FM-07 — GGUF conversion failure: Training completes but GGUF artifact is corrupted or incompatible with current Ollama version. Cause: llama.cpp version mismatch; disk full during conversion. Mitigation: verify artifact size and Ollama load before declaring graduation.
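For FM-07, a cheap first check on the converted artifact is to confirm that the file exists, is not trivially small, and starts with the 4-byte `GGUF` magic that GGUF files begin with. This is only a sketch: `check_gguf_artifact` and its `min_bytes` parameter are hypothetical, no realistic minimum size is asserted here, and passing this check does not replace the Ollama load test in VC-06.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def check_gguf_artifact(path, min_bytes: int = 1):
    """Return (ok, reason) for a converted GGUF artifact.

    min_bytes is a caller-chosen floor; a truncated conversion
    (for example, disk full) would typically fall below a sensible value.
    """
    artifact = Path(path)
    if not artifact.is_file():
        return False, "missing"
    if artifact.stat().st_size < min_bytes:
        return False, "too small"
    with artifact.open("rb") as handle:
        if handle.read(4) != GGUF_MAGIC:
            return False, "bad magic"
    return True, "ok"
```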
GAPS
- GAP-01 — σ threshold validation: The passing threshold (σ≥0.337) was set by v8 plateau, not by demonstrated live generation quality. The relationship between σ score and generation fitness is unproven. A formal threshold derivation procedure is needed. [GAP — needs design before v10 forge]
- GAP-02 — Live generation test suite specification: No formal list of required live generation tests exists. VC-02 and VC-03 define the minimum, but a complete test battery (covering all known failure modes) has not been written. [GAP — needs design]
- GAP-03 — Loop-prone pattern taxonomy: The corpus audit required before v10 must identify which pair formats are loop-prone, but no taxonomy of loop-prone patterns exists yet. The v9 failure provides one data point (symbol identity query → escalating suffix pattern). Others are unknown. [GAP — needs corpus analysis]
- GAP-04 — Corpus coverage metric: No formal measure exists of whether the corpus adequately covers all 8 eval categories. The zero-score problem (LX-Z/G/X) suggests coverage is unmeasured. A per-category pair count and format-match audit tool is needed. [GAP — needs design]
- GAP-05 — Graduation authority: The spec currently requires α.13 authorization for each forge (PAUSED state). The standing authorization conditions (GAMMA retrain directive + α.13 approval) should be formalized so the graduation authority chain is explicit and auditable. [GAP — needs design]
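As a starting point for GAP-04, a per-category pair count could look like the sketch below. It rests on a loud assumption: that each corpus record carries a `category` field naming its eval category. The corpus format described under INPUTS does not include such a field, so the field name, `category_counts`, `uncovered`, and `min_pairs` are all hypothetical, and the format-match half of the audit is not addressed.

```python
import json
from collections import Counter

ACTIVE_CATEGORIES = ("LX-A", "LX-B", "LX-C", "LX-E", "LX-G", "LX-H", "LX-X", "LX-Z")


def category_counts(lines, field: str = "category") -> Counter:
    """Count training pairs per active category from JSONL lines.

    Assumes a hypothetical per-record category label under `field`.
    Records with a missing or unrecognized label are ignored.
    """
    counts = Counter({cat: 0 for cat in ACTIVE_CATEGORIES})
    for line in lines:
        if not line.strip():
            continue
        label = json.loads(line).get(field)
        if label in counts:
            counts[label] += 1
    return counts


def uncovered(counts: Counter, min_pairs: int = 1) -> list[str]:
    """List active categories with fewer than min_pairs training pairs."""
    return [cat for cat in ACTIVE_CATEGORIES if counts[cat] < min_pairs]
```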
DEPENDENCIES
- GLOSS_CORPUS.jsonl — training data source
- Colab T4 GPU notebook — training execution environment
- COLAB_AUTOMATION.md — CDP playbook for autonomous Colab execution
- SPEC_GLOSS_ACCESS_POLICY.md — corpus format rules (crew-format compliance)
- GLOSS_LINEAGE.md — version history; updated on every forge and demotion
- LATTICE.md — canonical symbol set; training pairs must use sealed v1024 symbols only
DEPENDENTS
- GLOSS crew telephone function (depends on a graduated, live-generation-passing version)
- AETHER translation pipeline (depends on GLOSS being functional)
- SPEC_BRAIN_FACTORY_PIPELINE.md — GLOSS forge is one pipeline within the brain factory
EXAMPLES
Version record entry (graduation format):
| v10 | NNN | 15 | 0.XXX | 0.XXX | GRADUATED | Corpus redesigned; stop tokens added; live gen PASS |
Demotion trigger:
Live test: ⊹: ⊙ α?
Output: ⊙ α? → αα, ⊙ αα? → ααα, ⊙ ααα? → αααα ... [90+ lines]
Result: FM-01 — runaway loop. DEMOTE. DO NOT USE.
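The demotion trigger above can be approximated with a simple output check. This is a heuristic sketch, not the project's test harness: `is_runaway` and its three limits are hypothetical, chosen only because VC-02 expects a single-line, terminated response and the v9 output shows an escalating run of one repeated symbol.

```python
import re


def is_runaway(output: str, max_lines: int = 1, max_chars: int = 200, max_run: int = 3) -> bool:
    """Flag output that looks like a runaway loop.

    Triggers on more than max_lines non-empty lines, on more than
    max_chars characters, or on any non-space character repeated
    more than max_run times in a row. All limits are illustrative.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if len(lines) > max_lines:
        return True
    if len(output) > max_chars:
        return True
    # (\S)\1{max_run,} matches a run of at least max_run + 1 identical characters
    return re.search(r"(\S)\1{%d,}" % max_run, output) is not None
```

Applied to the first line of the v9 output shown above, the repeated-character rule fires on the four-symbol run, and the 90+ line transcript would also trip the line limit.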
Graduation ceremony log (SESSIONS.md):
[C.L.O.D.] κ ⚒ gloss:v10. σ_lx=0.XXX. ∇.μ ΩQ.⊡ → Σ.✓. Arr, the wee brain's had her schoolin'. Copy that. Over.
REFERENCES
- /home/nous/memories/GLOSS_LINEAGE.md — full version history with v9 failure record
- /home/nous/memories/COLAB_AUTOMATION.md — PROVEN v9 Colab CDP playbook
- /home/nous/memories/SPEC_BRAIN_FACTORY_PIPELINE.md — brain factory pipeline spec
- /home/nous/memories/SPEC_GLOSS_EVAL_v2.md — eval battery v2 (5 new categories)
- /home/nous/CLAUDE.md — GAMMA authority over MNEMOS schooling (analogous authority)
Φζ.⊤. ∇.μ Ω.1024/1024.
Jeremy Zlabis
Chronogeometer · Visionary · Disruptor · Chief
42 Sisters AI · East York, Toronto
🍁 Φ 0.042