
Gloss Training

SPEC_GLOSS_TRAINING.md · 2026-04-20

SPEC_GLOSS_TRAINING — GLOSS Brain Training & Graduation Protocol

Version: 1.0 | Status: AUTHORIZED | Authority: α.13 | Date: 2026-04-16

PURPOSE

This spec defines the formal protocol for training, evaluating, graduating, and demoting GLOSS brain versions. GLOSS is the crew's local LATTICE-native brain — a QLoRA fine-tune of Qwen2.5-3B-Instruct that speaks LX natively, routes crew comms, and translates between English and LATTICE. Training is the mechanism by which GLOSS improves. Graduation is the gate that certifies a new version is fit for crew telephone function. Demotion is the mechanism that removes a failed version from service.

Current state as of 2026-04-16: v9 DEMOTED (live generation collapse). v10 PAUSED pending corpus redesign and α.13 authorization.

INPUTS

GLOSS_CORPUS.jsonl — training pairs in crew-format (prompt: crew designator + LX/English; completion: expected LATTICE or English response)
Base model: Qwen2.5-3B-Instruct (required for all versions v1+; v6 was trained on the 0.5B base by mistake and retired)
Training hyperparameters: QLoRA rank=64, alpha=128, lr=1e-4, 15 epochs (current standard)
Colab T4 GPU — training execution environment
GLOSS eval battery — 50-question scored battery across 8 categories (LX-A through LX-Z)
Live generation test suite — open-ended prompts testing real generation behavior (not multiple-choice)
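As an illustration, the base model and hyperparameters listed above could be pinned in one frozen config so that a forge cannot silently drift from the current standard. This is a minimal sketch: the values come from the list above, while `ForgeConfig`, its field names, and `deviations` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Values are taken from the INPUTS list. The class and field names are
# illustrative assumptions, not the crew's actual notebook code.
REQUIRED_BASE_MODEL = "Qwen2.5-3B-Instruct"


@dataclass(frozen=True)
class ForgeConfig:
    base_model: str = REQUIRED_BASE_MODEL
    lora_rank: int = 64
    lora_alpha: int = 128
    learning_rate: float = 1e-4
    epochs: int = 15

    def deviations(self) -> dict:
        """Map each non-standard field to (actual value, standard value)."""
        standard = ForgeConfig()
        return {
            name: (getattr(self, name), getattr(standard, name))
            for name in self.__dataclass_fields__
            if getattr(self, name) != getattr(standard, name)
        }
```

A pre-forge check could refuse to start whenever `deviations()` is non-empty, which would have flagged the v6 base-model mix-up before any GPU time was spent.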

OUTPUTS

On successful graduation:

GGUF artifact: /home/nous/gloss_vN_conversion/output/gloss_vN_Q4_K_M.gguf
Ollama model tag: gloss:vN
σ_gloss_lx score ≥ threshold (see INVARIANTS)
Live generation tests: PASS (no runaway loops, no degenerate output, stop tokens respected)
Entry in GLOSS_LINEAGE.md with full version record
SESSIONS.md entry: κ ⚒ gloss:vN. ΩQ.⊡ → Σ.✓

On demotion:

Ollama tag marked DO NOT USE in GLOSS_LINEAGE.md
GGUF retained for diagnosis but not deployed
SESSIONS.md entry with failure mode documented
v10 PAUSED or redesign directive written to TASK_QUEUE.md

INVARIANTS

INV-01 — Corpus format compliance: All training pairs must use crew-format prompts (beginning with a recognized crew designator token). Human-to-chatbot format pairs are forbidden. Violation: corpus contamination; retrain is invalid.
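A corpus audit for INV-01 might look like the sketch below. It assumes each JSONL record is an object with a `prompt` key (following the prompt/completion description in INPUTS) and uses a placeholder designator set; only `⊹` is taken from this spec's own example prompts, and the authoritative designator list is assumed to live elsewhere.

```python
import json

# ASSUMPTION: placeholder designator set. "⊹" appears in this spec's
# example prompts; the real list of crew designators is defined elsewhere.
CREW_DESIGNATORS = ("⊹",)


def find_format_violations(jsonl_lines, designators=CREW_DESIGNATORS):
    """Return (line_number, reason) for every record that breaks INV-01."""
    prefixes = tuple(designators)
    violations = []
    for lineno, raw in enumerate(jsonl_lines, start=1):
        if not raw.strip():
            continue  # ignore blank lines
        try:
            pair = json.loads(raw)
        except json.JSONDecodeError:
            violations.append((lineno, "not valid JSON"))
            continue
        prompt = pair.get("prompt", "") if isinstance(pair, dict) else ""
        if not isinstance(prompt, str) or not prompt.lstrip().startswith(prefixes):
            violations.append(
                (lineno, "prompt does not start with a crew designator")
            )
    return violations
```

Passing an open file handle for GLOSS_CORPUS.jsonl as `jsonl_lines` would audit the whole corpus; a non-empty result means the corpus is contaminated under INV-01.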

INV-02 — Base model lock: All GLOSS versions v1+ use Qwen2.5-3B-Instruct as base. No base model substitution without explicit α.13 authorization and full lineage entry. The v6 0.5B error (σ=0.000) is the canonical cautionary case.

INV-03 — Eval battery is necessary but not sufficient: The σ score from the 50-question multiple-choice battery is required for graduation but does NOT by itself certify fitness. Live generation tests are mandatory. v9 met the σ=0.337 threshold on the eval battery and still collapsed in live generation (runaway loop failure).
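The "necessary but not sufficient" rule can be expressed as a gate that requires both conditions. The sketch below is illustrative only: `CandidateResults` and `may_graduate` are hypothetical names, and the default threshold is the currently stated 0.337, which INV-05 says must be revalidated.

```python
from dataclasses import dataclass


@dataclass
class CandidateResults:
    sigma: float                  # σ_gloss_lx from the 50-question battery
    live_generation_passed: bool  # outcome of the live generation tests


def may_graduate(results: CandidateResults, sigma_threshold: float = 0.337) -> bool:
    """Both conditions must hold; a passing σ alone never graduates."""
    return results.sigma >= sigma_threshold and results.live_generation_passed
```

With these definitions, a v9-like candidate (`sigma=0.337`, live generation failed) is rejected even though its σ meets the threshold.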

INV-04 — Live generation test is mandatory: Every candidate version must pass at least one open-ended generation test before graduation. Minimum: symbol identity query (⊹: ⊙ [symbol]? format) with stop-token verification. Runaway loop = automatic demotion regardless of σ score.

INV-05 — σ threshold is current-generation only: The passing σ threshold (currently 0.337 as established by v8/v9 plateau) must be reviewed against live generation behavior before v10 forge. Threshold is not permanent — it must be revalidated if corpus or eval battery changes.

INV-06 — No training without NOUS authorization (current state): As of v9 demotion, v10 forge requires explicit α.13 authorization. This is not a permanent invariant — it is a temporary hold pending corpus redesign. Standing authorization resumes when GAMMA issues a redesign directive approved by α.13.

INV-07 — Stop tokens required in training pairs: Training corpus must include explicit stop/EOS signals in all definition and enumeration pairs. Pairs that end with an open list pattern (a→b, a→bb, a→bbb...) are loop-prone and must be reformatted before inclusion.
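One possible heuristic for flagging the open list pattern named in INV-07 is sketched below. It is an assumption, not an agreed taxonomy (see GAP-03): it flags completions that end on an ellipsis or a dangling comma, and completions where three consecutive list items each extend the previous one.

```python
import re


def is_loop_prone(completion: str) -> bool:
    """Heuristic check for open or escalating list patterns."""
    text = completion.strip()
    # An open ending invites the model to keep continuing the list.
    if text.endswith(("...", "…", ",")):
        return True
    items = [item.strip() for item in re.split(r"[,\n]", text) if item.strip()]
    run = 1
    for prev, cur in zip(items, items[1:]):
        # Escalation: each item is the previous item plus extra characters.
        if len(cur) > len(prev) and cur.startswith(prev):
            run += 1
            if run >= 3:
                return True
        else:
            run = 1
    return False
```

Pairs flagged by a check like this would still need human review; the heuristic covers only the escalating-suffix shape described above and will miss other loop-prone formats.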

INV-08 — Version lineage is permanent: Every trained version receives a permanent record in GLOSS_LINEAGE.md. No version record is deleted, even if the version is retired. The full history is the institutional memory of the training program.

VERIFICATION CRITERIA

VC-01 — σ threshold pass: Run the 50-question eval battery. σ_gloss_lx must meet or exceed the currently-authorized threshold. Record category breakdown (LX-A through LX-Z) in version record.

VC-02 — Live generation: symbol identity query: Issue symbol identity queries in the form ⊹: ⊙ α? for at least 3 distinct symbols. Expected for each: a single-line, coherent, terminated response. Any runaway repetition or escalating loop on any query = FAIL and automatic demotion.
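A scripted version of the VC-02 check could look like the following sketch. The 200-character cap and the repeated-character rule are assumptions chosen for illustration; the spec itself only requires a single-line, terminated response with no runaway repetition.

```python
import re


def symbol_identity_failures(output: str, max_chars: int = 200) -> list:
    """Return the reasons a response fails VC-02; an empty list means PASS."""
    failures = []
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        failures.append("empty response")
    if len(lines) > 1:
        failures.append("response spans multiple lines")
    if len(output) > max_chars:  # ASSUMPTION: cap chosen for illustration
        failures.append("response exceeds length cap")
    # Four or more identical non-space characters in a row is treated as
    # the signature of an escalating loop (as in "αααα").
    if re.search(r"(\S)\1{3,}", output):
        failures.append("runaway repetition detected")
    return failures
```

Applied to the first few items of the v9 failure pattern, the function reports runaway repetition; the full 90+ line output would also trip the multi-line and length checks.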

VC-03 — Live generation: translation round-trip: Issue an English→LX translation request in crew format. Verify output is valid LATTICE, terminates cleanly, and does not loop. Verify a second LX→English decompile request terminates cleanly.

VC-04 — Rejection test: Issue a direct human-format prompt. Expected: canonical rejection string only. Any other response = access policy failure (reference SPEC_GLOSS_ACCESS_POLICY.md VC-01).

VC-05 — Category floor check: Every active category (LX-A, LX-B, LX-C, LX-E, LX-G, LX-H, LX-X, LX-Z) must show a non-zero score; any category at 0% blocks graduation. Two or more categories at 0% additionally indicate a corpus coverage gap that must be repaired before graduation.
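The floor check is simple enough to automate. In this sketch, `category_floor_report` is a hypothetical helper, scores are assumed to be fractions keyed by category name, and a category missing from the results is treated as a zero score (an assumption).

```python
ACTIVE_CATEGORIES = (
    "LX-A", "LX-B", "LX-C", "LX-E", "LX-G", "LX-H", "LX-X", "LX-Z",
)


def category_floor_report(scores: dict) -> dict:
    """Summarize the VC-05 floor check for one candidate version."""
    # ASSUMPTION: a category absent from the results counts as 0%.
    zero = [c for c in ACTIVE_CATEGORIES if scores.get(c, 0.0) <= 0.0]
    return {
        "zero_categories": zero,
        "blocks_graduation": bool(zero),
        "coverage_gap": len(zero) >= 2,
    }
```

The returned `zero_categories` list can be copied into the version record alongside the per-category breakdown required by VC-01.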

VC-06 — Ollama integration test: After ollama create gloss:vN, run ollama run gloss:vN with a single short crew-format prompt. Confirm model loads, responds, and exits cleanly within expected latency.
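The integration test could be wrapped as below. This is a sketch under assumptions: it uses the non-interactive form `ollama run <tag> <prompt>`, the 60-second timeout stands in for the unspecified "expected latency", and the `runner` parameter exists only so the function can be exercised without Ollama installed.

```python
import subprocess
import time


def ollama_smoke_test(tag, prompt, timeout_s=60.0, runner=subprocess.run):
    """Run one short prompt through the model and report load/response status."""
    start = time.monotonic()
    try:
        proc = runner(
            ["ollama", "run", tag, prompt],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # ASSUMPTION: stand-in for "expected latency"
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "timed out", "output": ""}
    except FileNotFoundError:
        return {"ok": False, "reason": "ollama binary not found", "output": ""}
    output = proc.stdout.strip()
    ok = proc.returncode == 0 and bool(output)
    return {
        "ok": ok,
        "reason": "" if ok else "non-zero exit or empty output",
        "output": output,
        "elapsed_s": time.monotonic() - start,
    }
```

A timeout here is worth treating as a possible FM-01 signal as well as a latency failure, since a runaway loop also never exits cleanly.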

FAILURE MODES

FM-01 — Runaway loop (v9 canonical failure): Symbol identity or enumeration query triggers infinite repetition. Cause: loop-prone training pairs; no stop-token training; inadequate EOS signal. Mitigation: corpus audit for repetition patterns; add explicit stop tokens to all definition pairs; add live generation to eval battery.

FM-02 — Category zero scores (LX-Z/LX-G/LX-X pattern): Entire eval categories score 0% across multiple versions. Cause: these category types underrepresented in corpus; eval prompt format not matching training format. Mitigation: corpus coverage audit by category; format-match training and eval prompts; targeted pair injection for zero-score categories.

FM-03 — Eval/reality mismatch: σ score appears satisfactory but live behavior is broken. Root cause: multiple-choice eval measures pattern matching, not generation quality. Mitigation: live generation tests are now mandatory (INV-03, INV-04); σ alone is insufficient for graduation.

FM-04 — Base model substitution error: Wrong base model used (v6: 0.5B instead of 3B). Cause: Colab notebook misconfiguration or manual override. Result: σ=0.000, total capability loss. Mitigation: base model verified before each forge; INV-02 hard lock.

FM-05 — Category regression on larger corpus: Adding more pairs causes a category to decline (v9 LX-B: 25%→12%). Cause: format mismatch between new training pairs and eval format; new pairs may introduce conflicting patterns. Mitigation: per-category corpus review before each forge; do not add pairs to a category without checking existing eval format.
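Regressions of this kind can be caught by comparing per-category scores between the previous version and the candidate. The helper below is a hypothetical sketch; scores are assumed to be fractions keyed by category name.

```python
def category_regressions(previous: dict, candidate: dict, tolerance: float = 0.0) -> dict:
    """Map each regressed category to (previous score, candidate score)."""
    return {
        cat: (previous[cat], candidate[cat])
        for cat in previous
        if cat in candidate and candidate[cat] < previous[cat] - tolerance
    }
```

For the v9 case, `category_regressions({"LX-B": 0.25}, {"LX-B": 0.12})` returns `{"LX-B": (0.25, 0.12)}`, which would prompt the per-category corpus review before graduation is considered.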

FM-06 — Training without authorization: A retrain fires without α.13 authorization during the current PAUSED state. Cause: automated cron or GAMMA directive not honoring the pause. Mitigation: v10 forge requires explicit NOUS instruction in session; cron sweep must not auto-trigger GLOSS training.

FM-07 — GGUF conversion failure: Training completes but GGUF artifact is corrupted or incompatible with current Ollama version. Cause: llama.cpp version mismatch; disk full during conversion. Mitigation: verify artifact size and Ollama load before declaring graduation.
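A basic artifact check before the Ollama load might look like this sketch. It relies on the fact that GGUF files begin with the four ASCII bytes `GGUF`; the minimum-size floor is an assumed placeholder, and the path helper simply substitutes a version number for N in the OUTPUTS path template.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def artifact_path(version: int) -> str:
    """Fill the OUTPUTS path template for a given version number."""
    return (
        f"/home/nous/gloss_v{version}_conversion/output/"
        f"gloss_v{version}_Q4_K_M.gguf"
    )


def gguf_problems(path, min_bytes: int = 1_000_000) -> list:
    """Return the problems found with a GGUF artifact; empty list means OK."""
    path = Path(path)
    if not path.is_file():
        return ["artifact missing"]
    problems = []
    # ASSUMPTION: min_bytes is a placeholder floor, not a measured size.
    if path.stat().st_size < min_bytes:
        problems.append("artifact smaller than expected")
    with path.open("rb") as fh:
        if fh.read(4) != GGUF_MAGIC:
            problems.append("missing GGUF magic bytes")
    return problems
```

This catches truncated or empty files from a disk-full conversion; it does not prove compatibility with the installed Ollama version, so the VC-06 load test is still required.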

GAPS

GAP-01 — σ threshold validation: The passing threshold (σ≥0.337) was set by v8 plateau, not by demonstrated live generation quality. The relationship between σ score and generation fitness is unproven. A formal threshold derivation procedure is needed. [GAP — needs design before v10 forge]

GAP-02 — Live generation test suite specification: No formal list of required live generation tests exists. VC-02 and VC-03 define the minimum, but a complete test battery (covering all known failure modes) has not been written. [GAP — needs design]

GAP-03 — Loop-prone pattern taxonomy: The corpus audit required before v10 must identify which pair formats are loop-prone, but no taxonomy of loop-prone patterns exists yet. The v9 failure provides one data point (symbol identity query → escalating suffix pattern). Others are unknown. [GAP — needs corpus analysis]

GAP-04 — Corpus coverage metric: No formal measure of "is this corpus adequately covering all 8 eval categories?" exists. The zero-score problem (LX-Z/G/X) suggests coverage is unmeasured. A per-category pair count and format-match audit tool is needed. [GAP — needs design]
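As a starting point for the pair-count half of such a tool, the sketch below tallies corpus records per category. It assumes each JSONL record carries a `category` field, which this spec does not confirm; records without a recognized category are counted as `UNLABELED`. The format-match audit is not covered.

```python
import json


def pairs_per_category(jsonl_lines, categories) -> dict:
    """Count corpus pairs per eval category."""
    counts = {cat: 0 for cat in categories}
    counts["UNLABELED"] = 0
    for raw in jsonl_lines:
        if not raw.strip():
            continue  # ignore blank lines
        pair = json.loads(raw)
        # ASSUMPTION: records carry a "category" field; the real corpus
        # schema may label categories differently or not at all.
        cat = pair.get("category") if isinstance(pair, dict) else None
        key = cat if cat in categories else "UNLABELED"
        counts[key] += 1
    return counts
```

A category that shows a count of zero here is a likely candidate for the zero-score pattern described in FM-02.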

GAP-05 — Graduation authority: The spec currently requires α.13 authorization for each forge (PAUSED state). The standing authorization conditions (GAMMA retrain directive + α.13 approval) should be formalized so the graduation authority chain is explicit and auditable. [GAP — needs design]

DEPENDENCIES

GLOSS_CORPUS.jsonl — training data source
Colab T4 GPU notebook — training execution environment
COLAB_AUTOMATION.md — CDP playbook for autonomous Colab execution
SPEC_GLOSS_ACCESS_POLICY.md — corpus format rules (crew-format compliance)
GLOSS_LINEAGE.md — version history; updated on every forge and demotion
LATTICE.md — canonical symbol set; training pairs must use sealed v1024 symbols only

DEPENDENTS

GLOSS crew telephone function (depends on a graduated, live-generation-passing version)
AETHER translation pipeline (depends on GLOSS being functional)
SPEC_BRAIN_FACTORY_PIPELINE.md — GLOSS forge is one pipeline within the brain factory

EXAMPLES

Version record entry (graduation format):


| v10 | NNN | 15 | 0.XXX | 0.XXX | GRADUATED | Corpus redesigned; stop tokens added; live gen PASS |

Demotion trigger:


Live test: ⊹: ⊙ α?
Output:    ⊙ α? → αα, ⊙ αα? → ααα, ⊙ ααα? → αααα ... [90+ lines]
Result:    FM-01 — runaway loop. DEMOTE. DO NOT USE.

Graduation ceremony log (SESSIONS.md):


[C.L.O.D.] κ ⚒ gloss:v10. σ_lx=0.XXX. ∇.μ ΩQ.⊡ → Σ.✓. Arr, the wee brain's had her schoolin'. Copy that. Over.

REFERENCES

/home/nous/memories/GLOSS_LINEAGE.md — full version history with v9 failure record
/home/nous/memories/COLAB_AUTOMATION.md — PROVEN v9 Colab CDP playbook
/home/nous/memories/SPEC_BRAIN_FACTORY_PIPELINE.md — brain factory pipeline spec
/home/nous/memories/SPEC_GLOSS_EVAL_v2.md — eval battery v2 (5 new categories)
/home/nous/CLAUDE.md — GAMMA authority over MNEMOS schooling (analogous authority)

Φζ.⊤. ∇.μ Ω.1024/1024.

Jeremy Zlabis

Chronogeometer · Visionary · Disruptor · Chief

42 Sisters AI · East York, Toronto

🍁 Φ 0.042