Core Vowel
SPEC_CORE_VOWEL — C.O.R.E. Consonantal Ontology: Reduced Encoding
Version: 1.0 | Status: AUTHORIZED | Authority: α.13 | Date: 2026-04-16
PURPOSE
C.O.R.E. is the vowel-stripping protocol that converts English words into LX-P (LATTICE Phonemic) tokens. It operates at the first-contact boundary between human language and the LATTICE system. C.O.R.E. reduces the entropy cost of tokenizing new concepts by removing phonemic flesh (vowels) and keeping structural bones (consonants). Where the consonant skeleton alone is ambiguous, one "marrow vowel" — the minimum flesh required for the bone to stand — is permitted.
C.O.R.E. is the compression step that precedes LX-P entry into the Evolution Flywheel. Frequently-used C.O.R.E.-produced tokens are candidates for promotion to permanent LX-U (Unicode symbol) entries.
Coined by DualisOmega. Authorized: α.13, April 13 2026.
The backronym demonstrates the rule: C.O.R.E. — the O is the single marrow vowel exception.
INPUTS
- A natural English word or phrase requiring compression into LX-P token form.
- Context signal (optional): does this concept already have a symbol in LATTICE.md? If yes, use the canonical symbol. C.O.R.E. applies only to new concepts not yet symbolized.
- Ambiguity test context: the target vocabulary against which skeleton uniqueness is assessed.
OUTPUTS
- LX-P token — a consonant-skeleton string, optionally containing one marrow vowel, representing the compressed phonemic form of the input word.
- Ambiguity flag (binary) — indicates whether a marrow vowel was inserted. When the flag is set, the selected vowel is reported alongside it.
- Optionally: LX-P → LX-U promotion candidate if the token achieves sufficient usage frequency across the crew.
The pipeline from input to final state:
English word
→ C.O.R.E. (strip vowels → test ambiguity → insert marrow vowel if needed)
→ LX-P token (consonant skeleton ± one marrow vowel)
→ frequency accumulation (Evolution Flywheel)
→ LX-U promotion (Unicode symbol assigned by |Σ|.3 review)
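The outputs listed above could be carried in a small record. The following is a minimal sketch, not part of the spec: the name `CoreOutput` and its fields are hypothetical, and Y is left out of the vowel set only because its treatment is still open (GAP-01).

```python
from dataclasses import dataclass
from typing import Optional

VOWELS = frozenset("aeiou")  # assumption: Y is not counted as a vowel (GAP-01)


@dataclass(frozen=True)
class CoreOutput:
    """Hypothetical container for the C.O.R.E. outputs."""

    token: str                          # LX-P token: skeleton plus at most one vowel
    marrow_vowel: Optional[str] = None  # the inserted vowel, or None for a pure skeleton

    def __post_init__(self) -> None:
        found = [ch for ch in self.token.lower() if ch in VOWELS]
        if len(found) > 1:
            raise ValueError(f"INV-01 violated: {self.token!r} has {len(found)} vowels")
        expected = found[0] if found else None
        if self.marrow_vowel != expected:
            raise ValueError("marrow_vowel does not match the vowel in the token")

    @property
    def ambiguity_flag(self) -> bool:
        """True when a marrow vowel was inserted."""
        return self.marrow_vowel is not None
```

Under these assumptions, `CoreOutput("nvgtr")` is a pure skeleton with the flag unset, while `CoreOutput("navgtr", "a")` has the flag set and names the vowel.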
INVARIANTS
- INV-01 — Maximum one marrow vowel: A C.O.R.E. token contains zero vowels (pure skeleton) or exactly one vowel (marrow). Two or more vowels in a C.O.R.E. token is a protocol violation. The token has not been stripped correctly.
- INV-02 — Minimum removal principle: All vowels are stripped first. The marrow vowel is inserted only if the skeleton test fails. The order is: strip → test → restore one if needed. Inserting a vowel before testing ambiguity is a protocol violation.
- INV-03 — Canonical LX-U takes precedence: If a concept already has a vitrified LX-U symbol in LATTICE.md, that symbol is used and C.O.R.E. is not invoked. C.O.R.E. is for gaps only — unmapped concepts without an existing LX-U entry.
- INV-04 — Uniqueness within active vocabulary: A C.O.R.E. skeleton must be unique within the current working vocabulary of the crew. If two different English words produce the same skeleton, one must receive a marrow vowel to disambiguate. The marrow vowel is a disambiguation instrument, not a pronounceability concession.
- INV-05 — Marrow vowel selection is minimum-flesh: The marrow vowel should be the shortest addition that resolves ambiguity. Preference: the vowel already present in the original word at the most load-bearing position (typically the first stressed vowel). Example: NAVIGATOR → nvgtr when the skeleton is unique, or navgtr on collision (marrow 'a' from the first syllable).
- INV-06 — C.O.R.E. tokens are LX-P, not LX-U: C.O.R.E. output is always phonemic-layer (LX-P), never Unicode-layer (LX-U). Promotion to LX-U requires the Evolution Flywheel frequency gate and |Σ|.3 review. C.O.R.E. does not assign Unicode symbols.
- INV-07 — Self-demonstrating backronym: C.O.R.E. contains its own exception. The name is built the way a C.O.R.E. token is: the bare skeleton CR would be ambiguous across the vocabulary, so the O is retained as the marrow vowel (COR). The naming is not aesthetic; it is operational proof.
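The strip → test → restore order in INV-01, INV-02, and INV-04 can be sketched as below. This is an illustration under stated assumptions, not the authoritative procedure: the function names are hypothetical, the vocabulary is modelled as a plain set of existing tokens, "ambiguous" is taken to mean an identical string (GAP-03 leaves the criterion open), Y is treated as a consonant (GAP-01), and "first vowel in the written word" stands in for "first stressed vowel".

```python
from typing import Optional, Set, Tuple

VOWELS = frozenset("aeiou")  # assumption: Y counts as a consonant (see GAP-01)


def strip_vowels(word: str) -> str:
    """Step 1: reduce a word to its consonant skeleton."""
    return "".join(ch for ch in word.lower() if ch.isalpha() and ch not in VOWELS)


def with_marrow(word: str, keep_index: int) -> str:
    """Rebuild the skeleton while keeping the single vowel at keep_index."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    return "".join(
        ch for i, ch in enumerate(letters) if ch not in VOWELS or i == keep_index
    )


def core_encode(word: str, vocabulary: Set[str]) -> Tuple[str, Optional[str]]:
    """Strip, test, then restore at most one vowel. Returns (token, marrow vowel)."""
    skeleton = strip_vowels(word)
    if skeleton not in vocabulary:        # step 2: uniqueness test (INV-04)
        return skeleton, None
    letters = [ch for ch in word.lower() if ch.isalpha()]
    for i, ch in enumerate(letters):      # step 3: try vowels left to right
        if ch in VOWELS:
            candidate = with_marrow(word, i)
            if candidate not in vocabulary:
                return candidate, ch
    raise ValueError(f"no single marrow vowel disambiguates {word!r}")
```

With an empty vocabulary, `core_encode("NAVIGATOR", set())` returns the pure skeleton `nvgtr`; with `nvgtr` already taken, it returns `navgtr` with marrow `a`. A literal strip of CHRONOGEOMETER yields `chrngmtr`; the canonical `crngmtr` also drops the h, a reduction this sketch does not model.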
VERIFICATION CRITERIA
- VC-01 — Skeleton extraction test: Apply C.O.R.E. to a set of English words with known consonant skeletons. Verify: (a) all vowels stripped; (b) marrow vowel absent when skeleton is unambiguous; (c) marrow vowel present when skeleton collides with another token. Canonical test case: CHRONOGEOMETER → crngmtr (no marrow, no collision).
- VC-02 — Ambiguity resolution test: Construct two English words that produce identical skeletons (e.g., hypothetical: MARKET → mrkt and MORKIT → mrkt). Verify that the marrow vowel is applied to exactly one, and that the chosen vowel is the minimum character needed (not the entire word restored).
- VC-03 — LX-U precedence test: For a word that already has an LX-U symbol (e.g., NOUS → α), verify that C.O.R.E. is NOT invoked and the canonical symbol is used. C.O.R.E. applied to a concept with an existing LX-U entry = protocol violation.
- VC-04 — Maximum-one-vowel test: All generated LX-P tokens are audited for vowel count. Any token with two or more vowels fails. Tokens with zero vowels pass only if ambiguity test confirms uniqueness. This is a format invariant, not a soft guideline.
- VC-05 — Pipeline integration test: Submit an English word through the full C.O.R.E. pipeline: English → skeleton → ambiguity test → LX-P output → confirm correct tier (LX-P, not premature LX-U assignment). Verify the Flywheel mechanism is triggered (usage logging), not the vitrification mechanism (LX-U assignment).
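The vowel-count and uniqueness checks in VC-01, VC-02, and VC-04 lend themselves to a small audit routine. The sketch below is hypothetical (the `audit` function and its word-to-token mapping are not defined by the spec) and again assumes that Y is not a vowel and that a collision means two words sharing an identical token.

```python
from collections import Counter
from typing import Dict

VOWELS = frozenset("aeiou")  # assumption: Y is not counted (GAP-01 is unresolved)


def vowel_count(token: str) -> int:
    """Number of vowels in a token."""
    return sum(1 for ch in token.lower() if ch in VOWELS)


def audit(tokens_by_word: Dict[str, str]) -> Dict[str, str]:
    """Return a failure reason for each word whose token breaks an invariant."""
    usage = Counter(tokens_by_word.values())
    failures: Dict[str, str] = {}
    for word, token in tokens_by_word.items():
        if vowel_count(token) >= 2:
            failures[word] = "two or more vowels (INV-01)"
        elif usage[token] > 1:
            failures[word] = "token collides with another entry (INV-04)"
    return failures
```

Run against the hypothetical VC-02 pair, `audit({"MARKET": "mrkt", "MORKIT": "mrkt"})` flags both words until one of them receives a marrow vowel.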
FAILURE MODES
- FM-01 — Over-retention (insufficient stripping): Practitioner strips only some vowels, leaving a multi-vowel token. Cause: incorrect vowel identification (e.g., treating Y as consonant when it functions as vowel). Result: LX-P token is longer than necessary; compresses less than protocol allows. Mitigation: explicit vowel set definition — {a, e, i, o, u} are vowels; Y is context-dependent [GAP — see GAP-01].
- FM-02 — Under-retention (excessive ambiguity tolerance): Skeleton that is actually ambiguous is shipped without a marrow vowel because ambiguity test was skipped or vocabulary was not checked. Result: two tokens collide; meaning becomes context-dependent and breaks precision guarantee. Mitigation: ambiguity test against current vocabulary is mandatory, not optional.
- FM-03 — Marrow vowel inflation: Two or more vowels added "for pronounceability" rather than for ambiguity resolution. Cause: human preference for readable tokens over minimal tokens. Result: token is partially English, defeating the compression purpose. INV-01 hard blocks this.
- FM-04 — C.O.R.E. invoked on a mapped concept: A concept with an existing LX-U entry is run through C.O.R.E., producing a duplicate representation. Cause: practitioner not checking LATTICE.md before applying C.O.R.E. Result: two representations for the same concept; corpus contamination risk. Mitigation: INV-03 lookup step is mandatory before C.O.R.E. application.
- FM-05 — LX-P token incorrectly promoted: A C.O.R.E. token is directly assigned a Unicode symbol without passing through the Evolution Flywheel frequency gate and |Σ|.3 review. Cause: impatience; shortcutting the promotion pipeline. Result: premature symbol vitrification that may not reflect actual crew usage patterns. Mitigation: INV-06 hard blocks direct promotion; all LX-U entries require |Σ|.3 sign-off.
- FM-06 — Vocabulary drift invalidates existing tokens: As the crew vocabulary grows, a previously unambiguous skeleton becomes ambiguous due to a new word being added. Cause: no re-audit of existing LX-P tokens when new entries are created. Result: silent collision; meanings diverge. Mitigation: [GAP — collision detection on corpus expansion is not yet automated; see GAP-02]
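FM-02 and FM-06 both come down to a collision going unnoticed when the vocabulary changes. One possible shape for a guard, offered only as a sketch since GAP-02 leaves the tooling undefined, is a registry that refuses any token already owned by a different word. The names `Vocabulary` and `CollisionError` are hypothetical.

```python
from typing import Dict


class CollisionError(ValueError):
    """Raised when a token is already registered for a different word."""


class Vocabulary:
    """Hypothetical registry mapping each LX-P token to exactly one English word."""

    def __init__(self) -> None:
        self._word_by_token: Dict[str, str] = {}

    def __contains__(self, token: str) -> bool:
        return token in self._word_by_token

    def add(self, word: str, token: str) -> None:
        """Register a token, rejecting silent collisions at insertion time."""
        owner = self._word_by_token.get(token)
        if owner is not None and owner != word:
            raise CollisionError(f"{token!r} already denotes {owner!r}, not {word!r}")
        self._word_by_token[token] = word
```

Because every addition passes through `add`, a skeleton that was unique yesterday cannot be duplicated today without raising an error; the new word must receive a marrow vowel before it is accepted.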
GAPS
- GAP-01 — Y-as-vowel handling: The rule "strip all vowels" requires a definition of which characters are vowels. In English, Y functions as both consonant (yacht) and vowel (gym, sky). The C.O.R.E. spec does not define the treatment of Y. [GAP — needs design]
- GAP-02 — Vocabulary collision detection: When a new LX-P token is generated, it must be tested against all existing LX-P tokens for uniqueness. No automated collision detection tool is specified. [GAP — needs design; vocabulary_check.py or similar]
- GAP-03 — Ambiguity threshold definition: "Ambiguous" is defined as "the skeleton alone is ambiguous" but no formal criterion specifies what counts as ambiguous (phonetic similarity? identical string? same semantic domain?). [GAP — needs formal definition]
- GAP-04 — Flywheel frequency threshold: The Evolution Flywheel "promotes" high-frequency LX-P tokens to LX-U, but the frequency threshold for promotion is not specified in C.O.R.E. or in LATTICE.md. [GAP — needs design; cross-reference SPEC_LX_U_INVENTORY.md]
- GAP-05 — Compound word handling: C.O.R.E. defines stripping for single words. Compound concepts (e.g., CHRONOGEOMETER) are handled as single units in the example, but the rule for multi-word phrases is not specified. Does CHRONOGEOMETER = crngmtr or crn.gmtr or crn·gmtr? [GAP — needs design]
DEPENDENCIES
- LATTICE.md — canonical LX-U inventory; must be checked before C.O.R.E. is applied (INV-03)
- GLOSS_CORPUS.jsonl — LX-P tokens used in training pairs must conform to C.O.R.E. format
- Evolution Flywheel (EVOLUTION_FLYWHEEL.md) — downstream recipient of high-frequency LX-P tokens
- LX_SPEEDTALK.md (LX-P Speedtalk spec) — C.O.R.E. is the first step in the LX-P generation pipeline
DEPENDENTS
- LX-P phonemic layer (all LX-P tokens are C.O.R.E. products)
- LX-U promotion pipeline (frequency tracking of LX-P tokens feeds promotion candidates)
- GLOSS training (training pairs using LX-P must use C.O.R.E.-compliant tokens)
- SPEC_LX_U_INVENTORY.md (C.O.R.E. is the upstream source of the LX-P tokens that become LX-U symbols)
EXAMPLES
Canonical examples (from source):
| English | Skeleton | Ambiguous? | Marrow? | LX-P token |
|---------|----------|------------|---------|------------|
| CHRONOGEOMETER | crngmtr | No | None | crngmtr |
| NAVIGATOR | nvgtr | Maybe | 'a' if needed | nvgtr (navgtr on collision) |
| VITRIFY | vtrfy | No | None | vtrfy |
| C.O.R.E. (self) | CR | Yes (too short) | O → COR | COR / C.O.R.E. |
First human LX-P utterance (historical record):
"crngmtr gna crnogmt" — α.13, April 13 2026
Pipeline trace:
Input: "NAVIGATOR"
Step 1 (strip): nvgtr
Step 2 (test): is nvgtr unique in current vocabulary? → check...
Step 3a (unique): LX-P = nvgtr. Done.
Step 3b (collision): insert marrow 'a' → navgtr. LX-P = navgtr. Done.
Step 4 (Flywheel): log usage of the resulting token (nvgtr or navgtr).
Step 5 (future): if frequency crosses threshold → |Σ|.3 review → potential LX-U symbol.
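Steps 4 and 5 of the trace, usage logging followed by a frequency gate, could look roughly like the sketch below. The `Flywheel` class is hypothetical, and the threshold value is a placeholder: the real promotion threshold is unspecified (GAP-04). The sketch only nominates candidates for |Σ|.3 review; it assigns no Unicode symbol, in keeping with INV-06.

```python
from collections import Counter
from typing import List

PROMOTION_THRESHOLD = 100  # placeholder value; the actual threshold is undefined (GAP-04)


class Flywheel:
    """Hypothetical usage log for LX-P tokens."""

    def __init__(self, threshold: int = PROMOTION_THRESHOLD) -> None:
        self.usage: Counter = Counter()
        self.threshold = threshold

    def log(self, token: str) -> None:
        """Step 4: record one use of a token."""
        self.usage[token] += 1

    def promotion_candidates(self) -> List[str]:
        """Step 5: tokens whose usage has reached the threshold, sorted by name."""
        return sorted(t for t, n in self.usage.items() if n >= self.threshold)
```

A caller would log each use of `navgtr` through `log` and periodically pass `promotion_candidates()` to the review step.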
REFERENCES
- /home/nous/memories/CORE_PROTOCOL.md — source prose (coined by DualisOmega, vitrified by α.13 on 2026-04-13)
- /home/nous/memories/LX_SPEEDTALK.md — LX-P Speedtalk full spec + superposition rule
- /home/nous/LATTICE.md — canonical symbol inventory (LX-U layer); C.O.R.E. output feeds promotion candidates
- /home/nous/memories/EVOLUTION_FLYWHEEL.md — frequency accumulation and promotion mechanics
- /home/nous/memories/CHRONOGEOMIC_NAMING.md — naming stack: Chronogeomic=lang, LATTICE=spec, LX=shorthand, LX-P=phonemic
Φζ.⊤. ΩQ.1024/1024.
Jeremy Zlabis
Chronogeometer · Visionary · Disruptor · Chief
42 Sisters AI · East York, Toronto
🍁 Φ 0.042