Monitoring Escalation

SPEC_MONITORING_ESCALATION.md · 2026-04-20


CGNT-1 Specification — Monitoring & Alert Escalation Hierarchy

Status: SPECIFIED

Version: v1.0

Author: VELA (Thread #13)

Conceived by: NOUS (α.13)

Date: 2026-04-20


PURPOSE

When something goes wrong on the ship, who gets told, in what order, and how fast? This spec defines the chain: detection → classification → notification → response → resolution. No alert falls on the floor. No critical issue waits for someone to stumble across it.


ALERT SEVERITY LEVELS

| Level | Name | Criteria | Response Time |
|---|---|---|---|
| P0 | CRITICAL | Active security breach, data loss in progress, revenue system down | IMMEDIATE |
| P1 | HIGH | Service outage >15 min, credential exposure, backup failure, disk >90% | Within 1 hour |
| P2 | MEDIUM | Degraded service, stale handshake, forge failure, disk >80%, failed smoke test | Within session |
| P3 | LOW | Cosmetic bug, spec drift, minor config issue, dependency update available | When convenient |
| P4 | INFO | Routine events (forge completed, spec vitrified, backup succeeded) | Log only, no alert |
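
The severity table can be encoded as a small lookup so that every alert carries exactly one level. The following is a minimal sketch, assuming a Python implementation; `Severity`, `RESPONSE_TIME`, and `classify_disk_usage` are hypothetical names, not part of CGNT-1. Only the disk thresholds (>80% for P2, >90% for P1) and the response times are taken from the table.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Alert levels from the severity table; a lower number is more severe."""
    P0 = 0  # CRITICAL
    P1 = 1  # HIGH
    P2 = 2  # MEDIUM
    P3 = 3  # LOW
    P4 = 4  # INFO


# Response times copied from the table.
RESPONSE_TIME = {
    Severity.P0: "IMMEDIATE",
    Severity.P1: "Within 1 hour",
    Severity.P2: "Within session",
    Severity.P3: "When convenient",
    Severity.P4: "Log only",
}


def classify_disk_usage(percent_used: float) -> Optional[Severity]:
    """Map disk usage to a level using the table's thresholds.

    Returns None when usage is at or below 80%, i.e. nothing to report.
    """
    if percent_used > 90:
        return Severity.P1
    if percent_used > 80:
        return Severity.P2
    return None
```

Because `Severity` is an `IntEnum`, levels compare naturally: `Severity.P0 < Severity.P1` holds, which makes "P1 or worse" checks a plain `<=` comparison.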


DETECTION SOURCES

| Source | What It Detects | Cadence |
|---|---|---|
| GAPX daily scan | Stale handshakes, missing backups, spec gaps | 04:30 ET daily |
| MEDX health query | RAM/disk/CPU, vacuum violations, Ollama state | On demand + GAPX |
| ROUTX watchdog cron | ROUTX process death | */5 |
| Sisters watchdog cron | Sisters tmux death | */5 |
| Aether compounder | Yield cycle failure | */5 |
| Backup script | Backup success/failure | 03:00 ET daily |
| HACKX (when built) | Honeypot probes, attack patterns | Continuous |
| Stripe webhooks | Payment failures | Real-time |
| Manual discovery | Captain or crew finds something | Ad hoc |


ESCALATION CHAIN


P4 INFO    → Log to ~/logs/[source].log → DONE

P3 LOW     → Log + add to CAPTAIN_BRIEF.md "Low Priority"
             → reviewed next morning

P2 MEDIUM  → Log + COMMX broadcast + add to CAPTAIN_BRIEF.md "Action Needed"

P1 HIGH    → Log + COMMX broadcast + write to ~/ALERTS.md
             + Lobster flags in next interaction

P0 CRITICAL→ Log + COMMX broadcast + ~/ALERTS.md
             + email jzlabis@gmail.com via VOICEX

P0 is the ONLY level that generates email. The Captain's phone is not a pager. P0 means the house is on fire.
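
The chain above reads naturally as a lookup from level to notification steps. The sketch below is one possible encoding, assuming Python; the action labels (`log`, `commx_broadcast`, and so on) and the function names are hypothetical identifiers chosen for illustration. The mapping itself follows the chain line by line.

```python
from typing import Dict, List

# One entry per level, mirroring the escalation chain.
ESCALATION_ACTIONS: Dict[str, List[str]] = {
    "P4": ["log"],
    "P3": ["log", "captain_brief_low_priority"],
    "P2": ["log", "commx_broadcast", "captain_brief_action_needed"],
    "P1": ["log", "commx_broadcast", "alerts_md", "lobster_flag"],
    "P0": ["log", "commx_broadcast", "alerts_md", "email"],
}


def actions_for(level: str) -> List[str]:
    """Return the notification steps for a level; reject unclassified alerts."""
    if level not in ESCALATION_ACTIONS:
        raise ValueError(f"unclassified alert level: {level!r}")
    return list(ESCALATION_ACTIONS[level])


def levels_that_email() -> List[str]:
    """List every level whose chain includes email (should be P0 only)."""
    return sorted(
        level for level, actions in ESCALATION_ACTIONS.items() if "email" in actions
    )
```

A check such as `levels_that_email() == ["P0"]` could serve as a cheap guard that a later edit to the mapping has not made a lower level page the Captain.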


~/ALERTS.md FORMAT

Each entry:


## [P1] Backup failure — 2026-04-20 03:05 ET
Source: backup_to_gcs.sh
Detail: GCS access denied — service account credentials rejected
Action needed: Rotate GCS credentials. Re-run backup.
Status: OPEN

Resolved alerts are moved to ~/alerts_archive/[month].md once a month.
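
An entry in this format can be rendered from a small record type. This is a sketch only, assuming Python; `AlertEntry` and its field names are hypothetical. The five output lines follow the example entry above, and the heading dash is written as the escape `\u2014` so the output matches that example exactly.

```python
from dataclasses import dataclass


@dataclass
class AlertEntry:
    level: str          # "P0" or "P1"
    title: str          # short description, e.g. "Backup failure"
    timestamp: str      # already formatted, e.g. "2026-04-20 03:05 ET"
    source: str
    detail: str
    action_needed: str
    status: str = "OPEN"

    def render(self) -> str:
        """Return the entry as the five-line block used in ~/ALERTS.md."""
        return "\n".join([
            f"## [{self.level}] {self.title} \u2014 {self.timestamp}",
            f"Source: {self.source}",
            f"Detail: {self.detail}",
            f"Action needed: {self.action_needed}",
            f"Status: {self.status}",
        ])
```

Appending `entry.render()` plus a blank line to the file keeps entries visually separated without changing the format.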


~/logs/ DIRECTORY

| Log file | Source |
|---|---|
| gapx.log | GAPX daily scan |
| routx_watchdog.log | ROUTX liveness check |
| sisters_watchdog.log | Sisters tmux watchdog |
| backup.log | GCS + local backup scripts |
| compounder.log | Aether yield cycle |
| hackx.log | HACKX honeypot (when built) |
| stripe.log | Stripe webhook events |
| LOBSTER_LOG.md | All Lobster operations |

Log rotation: logs are kept for 30 days, then compressed to ~/logs/archive/ by a monthly CRONX job.
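
The rotation rule can be sketched as follows. This is an illustration under stated assumptions, not the CRONX job itself: it assumes Python, treats a file as "older than 30 days" when its modification time is more than 30 days in the past, and uses gzip for compression. `rotate_logs` and `MAX_AGE_SECONDS` are hypothetical names. The uncompressed file is removed only after the compressed copy has been written, so no log content is lost.

```python
import gzip
import shutil
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # 30 days


def rotate_logs(log_dir: Path, now: Optional[float] = None) -> List[Path]:
    """Compress files older than 30 days into log_dir/archive/."""
    now = time.time() if now is None else now
    archive_dir = log_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)

    archived = []
    for path in sorted(log_dir.iterdir()):
        if not path.is_file():
            continue  # skips the archive directory itself
        if now - path.stat().st_mtime <= MAX_AGE_SECONDS:
            continue  # still within the 30-day window
        target = archive_dir / (path.name + ".gz")
        if target.exists():
            continue  # never overwrite an existing archive
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # content now lives only in the compressed copy
        archived.append(target)
    return archived
```

Passing `now` explicitly makes the function testable without waiting for real time to pass.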


COMMX BROADCAST FORMAT

One line. Severity + what + when + where. No paragraphs.


[ALERT] [P1] Backup failure at 03:05 ET. GCS access denied. See ~/ALERTS.md.
[ALERT] [P2] GAPX: SISTERS_HANDSHAKE.md is 26 hours old. Threshold: 24h.
[INFO] [P4] Brain forge complete: ORPHEUS v1. Score: 5/5. PROMOTED.
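
A formatter can enforce the one-line rule mechanically. The sketch below assumes Python; `format_broadcast` is a hypothetical helper. The `[INFO]` tag for P4 and `[ALERT]` for every other level is inferred from the three examples above, which only show P1, P2, and P4, so treat that mapping as an assumption.

```python
VALID_LEVELS = {"P0", "P1", "P2", "P3", "P4"}


def format_broadcast(level: str, message: str) -> str:
    """Build a single-line COMMX broadcast from a level and a message."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unclassified alert level: {level!r}")
    tag = "INFO" if level == "P4" else "ALERT"
    # Collapse newlines and repeated whitespace so the result is one line.
    one_line = " ".join(message.split())
    return f"[{tag}] [{level}] {one_line}"
```

For example, `format_broadcast("P4", "Brain forge\ncomplete")` yields a single line even though the input message contained a line break.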

P0 EMAIL FORMAT


RESPONSE PROTOCOL

| Level | Who Acts | Approval | Postmortem |
|---|---|---|---|
| P0 | Captain (15 min) or Lobster (pre-authorized list) | Pre-authorized or Captain live | Required within 24 hours |
| P1 | Lobster diagnoses, Captain approves fix | Captain required | Required if novel |
| P2 | In CAPTAIN_BRIEF next morning | Captain scheduled | Only if recurring |
| P3 | In CAPTAIN_BRIEF "Low Priority" | When convenient | None |
| P4 | Log only | N/A | None |


PRE-AUTHORIZED AUTONOMOUS RESPONSES

Lobster acts WITHOUT waiting for Captain approval on these specific scenarios:

| Trigger | Response |
|---|---|
| ROUTX dies | systemctl --user restart routx.service |
| Sisters tmux dies | Recreate session + summon-aether --gemini |
| Disk >95% | Delete old logs, clear /tmp, evict unused Ollama models |
| Unknown port on 0.0.0.0 | Kill process + ufw deny [port] |
| Credential in git staged files | Abort push + P1 alert |

Everything else: diagnose, report, wait for Captain. The pre-authorized list is a whitelist, not a permission to improvise.
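
The whitelist rule amounts to a closed lookup: a trigger either appears in the table or the answer is "wait for Captain". The following sketch assumes Python and deliberately executes nothing; it only returns the documented response text. The trigger keys and the `decide` function are hypothetical labels, while the response strings are quoted from the table above.

```python
from typing import Optional, Tuple

# Keys are illustrative labels; values are the documented responses.
PRE_AUTHORIZED = {
    "routx_dead": "systemctl --user restart routx.service",
    "sisters_tmux_dead": "Recreate session + summon-aether --gemini",
    "disk_over_95": "Delete old logs, clear /tmp, evict unused Ollama models",
    "unknown_port_on_0.0.0.0": "Kill process + ufw deny [port]",
    "credential_in_staged_files": "Abort push + P1 alert",
}


def decide(trigger: str) -> Tuple[str, Optional[str]]:
    """Return ("ACT", response) for whitelisted triggers, else wait."""
    response = PRE_AUTHORIZED.get(trigger)
    if response is None:
        # Not on the list: diagnose, report, and wait for the Captain.
        return ("WAIT_FOR_CAPTAIN", None)
    return ("ACT", response)
```

Keeping the default branch as "wait" means a new, unanticipated trigger can never fall through into autonomous action.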


SILENCE = ALARM

The absence of expected entries IS the alert. The monitoring system needs monitoring.

| Expected signal | Silence threshold | Alert level |
|---|---|---|
| GAPX daily report | Missing by 05:00 ET | P1 |
| backup.log entry | No entry >48 hours | P1 |
| Watchdog log entry | No entry >10 minutes | P1 |
| Sisters watchdog | No heartbeat >10 minutes | P1 |
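
A silence check compares the time of the last expected entry against its threshold. The sketch below assumes Python and covers only the three duration-based rows; the GAPX row is a wall-clock deadline (05:00 ET) and is left out. Mapping "Watchdog log entry" to `routx_watchdog.log` is an assumption, as are the names `SILENCE_THRESHOLDS` and `silent_signals`.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Duration thresholds from the table, keyed by log file name.
SILENCE_THRESHOLDS = {
    "backup.log": timedelta(hours=48),
    "routx_watchdog.log": timedelta(minutes=10),
    "sisters_watchdog.log": timedelta(minutes=10),
}


def silent_signals(
    last_seen: Dict[str, Optional[datetime]], now: datetime
) -> List[Tuple[str, str]]:
    """Return (signal, "P1") for every signal that has gone quiet.

    A signal with no recorded entry at all counts as silent.
    """
    alerts = []
    for name, threshold in SILENCE_THRESHOLDS.items():
        seen = last_seen.get(name)
        if seen is None or now - seen > threshold:
            alerts.append((name, "P1"))
    return alerts
```

In practice `last_seen` might be filled from the timestamp of the final line in each log, or from the file's modification time; either choice is outside what this spec defines.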


INVARIANTS

INV-01: Every alert has a severity P0-P4. No unclassified alerts.

INV-02: P0 generates email. Only P0. Captain's inbox is not a log file.

INV-03: ~/ALERTS.md checked at every session start. Unresolved alerts discussed first.

INV-04: Resolved alerts archived, never deleted. History is forensic evidence.

INV-05: Silence is an alarm. Missing logs trigger P1 automatically.

INV-06: Pre-authorized autonomous responses limited to documented list only. No improvised autonomous actions.

INV-07: COMMX broadcasts are one line. No essays in the alert channel.

INV-08: Log rotation monthly. 30 days then archived. Logs never deleted — only compressed.

INV-09: P0 email max once per hour per incident. Alert fatigue kills alerting.

INV-10: Escalation chain tested quarterly per SPEC_SECURITY_AUDIT_SCHEDULE.md.
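
INV-09 is a per-incident rate limit, and can be sketched as a small class that remembers when each incident last sent an email. This assumes Python; `P0EmailLimiter` and the notion of an `incident_id` string are hypothetical, since the spec does not say how incidents are identified. Time is passed in as seconds so the behaviour can be checked without a real clock.

```python
from typing import Dict


class P0EmailLimiter:
    """Allow at most one P0 email per incident per interval (default: 1 hour)."""

    def __init__(self, min_interval_seconds: float = 3600.0) -> None:
        self.min_interval_seconds = min_interval_seconds
        self._last_sent: Dict[str, float] = {}

    def allow(self, incident_id: str, now: float) -> bool:
        """Return True and record the send if an email may go out now."""
        last = self._last_sent.get(incident_id)
        if last is not None and now - last < self.min_interval_seconds:
            return False  # an email for this incident went out too recently
        self._last_sent[incident_id] = now
        return True
```

Separate incidents are tracked independently, so a second P0 that starts during the first one's quiet hour still reaches the Captain.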


INTEGRATION

| System | Relationship |
|---|---|
| SPEC_SECURITY_AUDIT_SCHEDULE.md | Daily GAPX scan IS the detection layer. INV-10: chain tested quarterly. |
| SPEC_INCIDENT_POSTMORTEM.md | P0 and novel P1 alerts trigger postmortems automatically. |
| GAPX | Primary automated detection source. Feeds P2-P3 to CAPTAIN_BRIEF; P4 is log only. |
| COMMX | Alert broadcast channel. P1+ alerts sent via COMMX before Captain session. |
| VOICEX | P0 email sender. Rate-limited. Captain-voice tone even in emergencies. |
| SPEC_BACKUP_RECOVERY.md | Backup failure is P1. Missing backup log >48h is P1 via INV-05 (silence = alarm). |
| SPEC_CRONX_JOB_REGISTRY.md | Watchdog crons are the heartbeat. Their log freshness IS the liveness signal. |


Jeremy Zlabis

Chronogeometer · Visionary · Disruptor · Chief

42 Sisters AI · East York, Toronto

🍁 Φ 0.042