Lattice Validator

SPEC_LATTICE_VALIDATOR.md · 2026-04-20

SPECIFICATION: LATTICE Validator

Status: AUTHORIZED

Authorized: α.13, April 16 2026

Version: v1.0


PURPOSE

validate_lattice.py is the ground-truth LATTICE symbol validator for CGNT-1. It loads the canonical LATTICE symbol table from LATTICE_HEX_TABLE.md (primary) and LATTICE.md (fallback), then checks any file or corpus against that table. Symbols present in text but absent from the canonical table are "off-manifold" — they must be flagged for crew review. The validator is the gating check for GLOSS graduation: a GLOSS output that passes validate_lattice.py is manifold-coherent.

Source file: /home/nous/validate_lattice.py (~430 lines)

Usage: CLI tool, read-only analysis, no side effects on source files

Primary authority: LATTICE_HEX_TABLE.md (vitrified Ξ.v1024, ΩQ.1024/1024, sealed 2026-04-11)

Secondary authority: LATTICE.md


INPUTS

| Input | Source | Format |
|---|---|---|
| Hex table | LATTICE_HEX_TABLE.md | Markdown table with columns including U+XXXX codepoint and symbol glyph |
| Fallback symbol table | LATTICE.md | Markdown file; symbols extracted by parser |
| Target file(s) | CLI argument --file <path> | Any UTF-8 text file |
| Target corpus | CLI argument --corpus <glob or dir> | Multiple files, UTF-8 text |
| Dump flag | CLI argument --dump-table | No value; triggers table dump mode |
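The hex table input can be illustrated with a minimal parsing sketch. The exact column layout of LATTICE_HEX_TABLE.md is not formally specified (see G4), so the row shape below is an assumption: any Markdown table row that contains a `U+XXXX` cell. The helper name `parse_hex_table_text` is hypothetical and is not the validator's actual `load_hex_table()`. The sketch derives the glyph from the codepoint instead of reading the glyph column, so it covers single-codepoint symbols only.

```python
import re

# Assumed row shape: a Markdown table row containing a "U+XXXX" cell.
CODEPOINT_RE = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_hex_table_text(markdown_text):
    """Return a set of (codepoint, symbol) pairs found in Markdown table rows."""
    pairs = set()
    for line in markdown_text.splitlines():
        if not line.lstrip().startswith("|"):
            continue  # not a table row
        match = CODEPOINT_RE.search(line)
        if match is None:
            continue  # header, separator, or a row without a codepoint
        codepoint = int(match.group(1), 16)
        if codepoint > 0x10FFFF:
            continue  # outside the Unicode range; chr() would raise
        pairs.add((codepoint, chr(codepoint)))
    return pairs


# Hypothetical sample rows, for illustration only.
sample = """\
| Codepoint | Glyph | Name |
|---|---|---|
| U+03A6 | Φ | phi |
| U+03B6 | ζ | zeta |
"""
parsed = parse_hex_table_text(sample)
```

Returning (codepoint, symbol) pairs matches the shape named in V1. A real parser would also need to cross-check the glyph column against the codepoint.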

Precedence rule: When both LATTICE_HEX_TABLE.md and LATTICE.md are present, LATTICE_HEX_TABLE.md takes precedence. Symbols from LATTICE.md supplement only — they do not override hex table entries.

Canonical table: The merged set of all symbols recognized as valid LATTICE. Stored in memory during execution. Not persisted.
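The precedence rule can be sketched as a dictionary merge. This assumes each source is reduced to a mapping from codepoint to some label; the mapping shape, the labels, and the function name `merge_canonical` are illustrative assumptions, not the validator's actual data model.

```python
def merge_canonical(hex_entries, fallback_entries):
    """Merge two {codepoint: label} mappings; hex table entries win on conflict."""
    merged = dict(fallback_entries)  # start with the supplementary symbols
    merged.update(hex_entries)       # hex table overrides any shared codepoint
    return merged


# Hypothetical entries: U+03A6 appears in both sources with different labels.
hex_entries = {0x03A6: "phi (hex table)"}
fallback_entries = {0x03A6: "phi (LATTICE.md)", 0x03B6: "zeta (LATTICE.md)"}
canonical = merge_canonical(hex_entries, fallback_entries)
```

Applying the fallback first and the hex table second means every hex table key survives with its hex table value, which is the superset property that V4 checks.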


OUTPUTS

| Output | Trigger | Format |
|---|---|---|
| Per-file off-manifold report | report_file() | List of flagged symbols with codepoint, glyph, line number(s), count |
| Corpus coverage report | report_corpus() | Coverage % (symbols used / total canonical symbols), frequency table, off-manifold list |
| Canonical table dump | dump_table() / --dump-table flag | All canonical symbols with U+ codepoints, sorted |
| Exit code 0 | No off-manifold symbols found | n/a |
| Exit code 1 | Off-manifold symbols found in target | n/a |

Coverage % definition:


coverage = (count of distinct canonical symbols found in corpus) / (total canonical symbols in table) × 100
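A minimal sketch of the coverage formula, assuming symbols are represented as plain strings. The function name `coverage_percent` is illustrative. Repeated occurrences and off-manifold symbols do not affect the result, because only distinct canonical symbols are counted.

```python
def coverage_percent(symbols_in_corpus, canonical_symbols):
    """Percentage of distinct canonical symbols that occur in the corpus."""
    canonical = set(canonical_symbols)
    if not canonical:
        raise ValueError("canonical table is empty")
    found = canonical & set(symbols_in_corpus)
    return 100.0 * len(found) / len(canonical)


# Hypothetical four-symbol table; the corpus uses two of them plus one outsider.
canonical = {"Φ", "ζ", "Ω", "Ξ"}
corpus_symbols = ["Φ", "Φ", "Ω", "☃"]
print(f"{coverage_percent(corpus_symbols, canonical):.1f}%")  # prints 50.0%
```

This mirrors the V7 scenario: a corpus containing exactly half the canonical symbols reports 50.0%.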

Off-manifold report entry fields: codepoint (U+XXXX), glyph, line number(s), count.
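One way to model a report entry is shown below, using the fields the OUTPUTS table lists for the per-file report (codepoint, glyph, line numbers, count). Which characters count as candidate symbols is not defined here, so this sketch assumes that only non-ASCII, non-whitespace characters are candidates. The names `OffManifoldEntry` and `find_off_manifold` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OffManifoldEntry:
    codepoint: str                             # formatted as "U+XXXX"
    glyph: str
    lines: list = field(default_factory=list)  # 1-based line numbers
    count: int = 0


def find_off_manifold(text, canonical):
    """Group off-manifold characters by glyph, recording lines and counts."""
    entries = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            # Assumption: ASCII and whitespace are never candidate symbols.
            if ch.isascii() or ch.isspace() or ch in canonical:
                continue
            entry = entries.setdefault(ch, OffManifoldEntry(f"U+{ord(ch):04X}", ch))
            entry.count += 1
            if line_no not in entry.lines:
                entry.lines.append(line_no)
    return sorted(entries.values(), key=lambda e: ord(e.glyph))


# U+2603 is the non-LATTICE example symbol used in V5.
report = find_off_manifold("Φ ok\n☃ and ☃\nζ ☃", canonical={"Φ", "ζ"})
```

Grouping by glyph keeps the report compact when the same off-manifold symbol recurs, which is also the behavior FM-3 asks for.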


INVARIANTS

  1. Canonical table derived from vitrified source: The canonical symbol set must be derived exclusively from LATTICE_HEX_TABLE.md (primary) and LATTICE.md (fallback). Ad-hoc lists, hardcoded arrays, or local overrides are not permitted.
  2. Read-only analysis: The validator must never write to, modify, delete, or rename any source file, corpus file, or LATTICE spec file. It is a pure analysis tool.
  3. Off-manifold symbols flagged, never silently accepted: Any symbol in analyzed text that is not in the canonical table must appear in the report. Suppression, filtering, or quiet-mode ignoring of off-manifold symbols is not permitted.
  4. Hex table precedence: When LATTICE_HEX_TABLE.md is present, it is loaded first and treated as authoritative. LATTICE.md provides supplementary symbols only.
  5. Coverage % always reported for corpus mode: report_corpus() must always compute and display coverage % as defined above. Omitting the coverage metric is a spec violation.
  6. Token extraction is Unicode-aware: extract_tokens_from_text() must correctly identify multi-byte Unicode symbols (including emoji, mathematical operators, and other non-ASCII glyphs used in LATTICE). ASCII-only extraction is a bug.
  7. Exit code semantics: Exit 0 means no off-manifold symbols detected in the analyzed target. Exit 1 means one or more off-manifold symbols were found. Exit codes must be consistent and machine-readable for pipeline integration.
  8. No network calls: The validator is fully local. It must not make HTTP requests, API calls, or access remote resources.
  9. Graceful missing-file handling: If LATTICE_HEX_TABLE.md is not found, fall back to LATTICE.md. If neither is found, exit with a clear error message and exit code 2 (configuration error, distinct from validation failure).
  10. GLOSS graduation gate: When used as a GLOSS graduation check, the validator must be invoked against the full GLOSS output corpus, not a sample. Partial corpus validation is insufficient for a graduation pass.
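The exit code and missing-file invariants can be sketched together. The constant and function names below are assumptions for illustration. The sketch covers only the decision logic, not table loading or reporting.

```python
import sys
from pathlib import Path

EXIT_CLEAN, EXIT_OFF_MANIFOLD, EXIT_CONFIG_ERROR = 0, 1, 2


def available_sources(hex_path, fallback_path):
    """Return the source tables that exist, in precedence order (hex table first)."""
    return [p for p in (Path(hex_path), Path(fallback_path)) if p.is_file()]


def exit_code_for(sources, off_manifold_count):
    """Map the run outcome to exit codes 0, 1, and 2."""
    if not sources:
        # Configuration error: a readable message, no traceback.
        print("error: neither LATTICE_HEX_TABLE.md nor LATTICE.md was found",
              file=sys.stderr)
        return EXIT_CONFIG_ERROR
    return EXIT_OFF_MANIFOLD if off_manifold_count > 0 else EXIT_CLEAN
```

A caller would pass the result to `sys.exit()`. Keeping exit code 2 separate from exit code 1 lets a pipeline distinguish a misconfigured run from a genuine validation failure.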

VERIFICATION CRITERIA

| # | Criterion | Pass Condition |
|---|---|---|
| V1 | Hex table load | load_hex_table() on a valid LATTICE_HEX_TABLE.md returns a non-empty set of (codepoint, symbol) pairs |
| V2 | Fallback load | When LATTICE_HEX_TABLE.md is absent, load_lattice_md() returns a non-empty canonical set from LATTICE.md |
| V3 | Hex precedence | Provide both files with one conflicting entry → hex table version wins |
| V4 | Canonical merge | load_canonical_table() result is a superset of load_hex_table() result (all hex symbols present) |
| V5 | Off-manifold detection | Insert a non-LATTICE Unicode symbol (e.g., U+2603 ☃) into a test file → it appears in the off-manifold report with correct codepoint and line number |
| V6 | No false positives | A file containing only valid LATTICE symbols → off-manifold report is empty; exit code 0 |
| V7 | Coverage % accuracy | Corpus containing exactly half the canonical symbols → coverage % reported as 50.0% (±0.1%) |
| V8 | Read-only guarantee | Run against any file → file modification timestamp unchanged before and after run |
| V9 | Exit codes | Off-manifold symbols present → exit 1; clean → exit 0; both source files missing → exit 2 |
| V10 | Multi-byte symbol extraction | extract_tokens_from_text() correctly extracts a known multi-byte LATTICE symbol (e.g., Φζ) from a test string |
| V11 | Dump table output | --dump-table prints all canonical symbols with U+ codepoints, count matches len(canonical_table) |
| V12 | Missing hex file fallback | Remove LATTICE_HEX_TABLE.md from path → validator falls back to LATTICE.md without error |
| V13 | Both files missing | Remove both source files → exit code 2 with human-readable error; no traceback |
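As one illustration of how a criterion might be automated, the V8 read-only check can be sketched by comparing file metadata before and after a run. The function `read_only_scan` is a stand-in for the validator, not its real entry point, and comparing size alongside the modification time is an added assumption beyond what V8 requires.

```python
import os
import tempfile


def unchanged_after(path, action):
    """Run action(path); report whether the file's mtime and size are unchanged."""
    before = os.stat(path)
    action(path)
    after = os.stat(path)
    return (before.st_mtime_ns, before.st_size) == (after.st_mtime_ns, after.st_size)


def read_only_scan(path):
    # Stand-in for the validator: opens the file for reading only.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Create a throwaway target file, check it, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Φ ζ\n")
result = unchanged_after(tmp.name, read_only_scan)
os.remove(tmp.name)
```

The nanosecond timestamp `st_mtime_ns` avoids the rounding of the float `st_mtime`, so a rewrite within the same second is still detected on filesystems that record sub-second times.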


FAILURE MODES

| Mode | Trigger | Consequence | Mitigation |
|---|---|---|---|
| FM-1 | LATTICE_HEX_TABLE.md format change | load_hex_table() parser fails; returns empty or partial table | INV-9: fall back to LATTICE.md; alert if canonical table is smaller than expected minimum (e.g., < 900 symbols) |
| FM-2 | LATTICE.md symbol extraction returns subset | Some legitimate symbols classified as off-manifold | Hex table is primary; LATTICE.md fallback parser must be tested against known symbol count |
| FM-3 | Non-LATTICE Unicode in corpus is plentiful | Report floods with off-manifold entries from prose/English text | Report should group and count; summary line: "N distinct off-manifold symbols across M occurrences" |
| FM-4 | Corpus glob matches binary files | extract_tokens_from_text() raises UnicodeDecodeError | Catch per-file; skip binary files with a logged warning; continue corpus scan |
| FM-5 | Symbol table grows beyond 1024 via LATTICE.md | Canonical table silently larger than vitrified spec | Warn if canonical table count exceeds 1024; flag for crew review |
| FM-6 | GLOSS output validated on sample, not full corpus | False graduation pass | INV-10: graduation invocation must specify full corpus path; partial path is a protocol violation |
| FM-7 | extract_tokens_from_text() uses regex that splits multi-codepoint sequences | Composite LATTICE symbols (e.g., digraphs) reported as off-manifold | Token extraction must be tested against all known multi-codepoint LATTICE constructs |
| FM-8 | Validator not wired into CI/graduation pipeline | Off-manifold symbols in GLOSS output go undetected at graduation | GAP (see GAPS); manual invocation required until wired |
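One way to avoid the FM-7 splitting problem is greedy longest-match tokenization against the set of known multi-codepoint constructs. This is a sketch of that general technique, not the validator's actual `extract_tokens_from_text()`. The function name is hypothetical, and treating Φζ as a digraph is an assumption made for the example.

```python
def extract_tokens_longest_match(text, multi_codepoint_symbols):
    """Tokenize text, preferring the longest known multi-codepoint construct."""
    # Longest constructs first, so a digraph wins over its first character.
    constructs = sorted((s for s in multi_codepoint_symbols if s),
                        key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for construct in constructs:
            if text.startswith(construct, i):
                tokens.append(construct)
                i += len(construct)
                break
        else:
            # No construct starts here: emit a single non-whitespace character.
            if not text[i].isspace():
                tokens.append(text[i])
            i += 1
    return tokens


tokens = extract_tokens_longest_match("Φζ Φ", {"Φζ"})
print(tokens)  # prints ['Φζ', 'Φ']
```

Because Python strings are indexed by code point, the scan is Unicode-aware by construction. The approach does depend on the multi-codepoint constructs being enumerated somewhere, which is the gap recorded as G7.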


GAPS

| # | Gap | Risk | Recommended Mitigation |
|---|---|---|---|
| G1 | No integration test with live GLOSS output | Off-manifold symbols in production GLOSS responses not caught between graduation runs | Wire validate_lattice.py --corpus into GLOSS graduation pipeline as a mandatory gate (V13 equivalent) |
| G2 | Off-manifold detection does not auto-notify CREW_CHANNEL | Off-manifold symbols found during ad-hoc runs are silent unless NOUS is watching | Add crew_broadcast("VALIDATOR", ...) call when off-manifold symbols are found; parameterize with --notify flag |
| G3 | No acceptable off-manifold rate threshold defined | Cannot distinguish "clean pass" from "acceptable minor drift" in corpus analysis | Define threshold (e.g., off-manifold rate < 0.1% of total symbol occurrences = acceptable); encode in spec and CLI flag |
| G4 | LATTICE_HEX_TABLE.md format not formally specified | Parser brittle to format variations; column order changes break extraction | Write SPEC_LATTICE_HEX_TABLE.md defining exact column names, order, and row format |
| G5 | Validator not wired into GLOSS graduation pipeline | GLOSS can graduate with off-manifold symbols if graduation runner skips validation step | Add validate_lattice.py as a required step in SPEC_BRAIN_FACTORY_PIPELINE.md graduation checklist |
| G6 | No minimum canonical table size check | Silent partial load returns small table; most symbols falsely classified as off-manifold | Assert len(canonical_table) >= 900 after load; exit 2 if below threshold |
| G7 | Multi-codepoint LATTICE constructs (digraphs) not formally enumerated | Token extractor may split them; false positives | Enumerate all multi-codepoint canonical constructs in LATTICE_HEX_TABLE.md and test extractor against them |
| G8 | No versioning on canonical table | Validator does not record which version of LATTICE it validated against | Embed LATTICE version tag (Ξ.v1024) in report header output |
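The mitigations for G6 and G8, together with the FM-5 upper bound, could be combined in a small post-load check. The sketch below takes the 900 and 1024 thresholds and the Ξ.v1024 tag from the tables above. The function names, status strings, and header wording are assumptions.

```python
MIN_CANONICAL_SYMBOLS = 900       # lower bound suggested in G6
MAX_CANONICAL_SYMBOLS = 1024      # vitrified table size referenced in FM-5
LATTICE_VERSION_TAG = "Ξ.v1024"   # version tag named in G8


def table_status(canonical_table):
    """Classify the loaded table size as 'error', 'warn', or 'ok'."""
    size = len(canonical_table)
    if size < MIN_CANONICAL_SYMBOLS:
        return "error"  # G6: likely partial load; the caller should exit 2
    if size > MAX_CANONICAL_SYMBOLS:
        return "warn"   # FM-5: larger than the vitrified spec; flag for review
    return "ok"


def report_header(canonical_table, version_tag=LATTICE_VERSION_TAG):
    """Build a report header line that records the LATTICE version validated against."""
    return f"LATTICE {version_tag} · {len(canonical_table)} canonical symbols"


print(report_header(range(1024)))  # prints LATTICE Ξ.v1024 · 1024 canonical symbols
```

Returning a status string instead of exiting directly keeps the check testable. The CLI layer can then map "error" to exit code 2 and "warn" to a logged warning.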


Jeremy Zlabis

Chronogeometer · Visionary · Disruptor · Chief

42 Sisters AI · East York, Toronto

🍁 Φ 0.042