Oracle Smoke Test

SPEC_ORACLE_SMOKE_TEST.md · 2026-04-20

SPEC_ORACLE_SMOKE_TEST — Oracle End-to-End Smoke Test

Version: 1.0 | Status: AUTHORIZED | Authority: α.13 | Date: 2026-04-16


PURPOSE

The Oracle verdict pipeline (Stripe webhook → session parsing → Gemini call → verdict cache →

email delivery) has no automated end-to-end validation. Pipeline health is currently only known

when a customer complaint arrives or when NOUS manually tests via a real payment. This is

unacceptable for a revenue-bearing production system.

This specification defines a repeatable, automated smoke test covering the full Oracle pipeline.

The smoke test uses a known Stripe test session, a fixed test query, and validates each pipeline

stage independently before asserting the full path. It is the minimal executable proof that the

pipeline is operational.

Smoke test ≠ load test ≠ unit test. The smoke test runs at the integration boundary. It

exercises real external dependencies (Stripe test mode, Gemini API, oracle_toll cache service,

email service health endpoint). It is not a substitute for the unit test suites that verify

individual components.

Source gap: SPEC_ORACLE_VERDICT_PIPELINE.md GAP-10.


INPUTS

Fixed Test Configuration

| Parameter | Value |

|----------|-------|

| Test tier | quick ($1.00 CAD) |

| Test query | "Is the Oracle pipeline operational? This is an automated smoke test." |

| Stripe mode | Test mode (API key: STRIPE_SECRET_KEY with sk_test_ prefix) |

| Test email | oracle-smoke-test@42sisters.ai (synthetic, not a real mailbox — used for log assertion only) |

| Expected verdict | Any valid value (GREEN/AMBER/RED/NULL) — smoke test does not assert verdict correctness, only structural validity |

| Timeout per stage | 30 seconds maximum |

The test query is intentionally meta — it asks about pipeline health. This makes it easy to identify

smoke test verdicts in logs and distinguish them from real customer verdicts.

Required Environment

Smoke Test Invocation


# Manual
python3 /home/nous/scripts/oracle_smoke_test.py

# On deploy (Northflank post-deploy hook)
python3 /home/nous/scripts/oracle_smoke_test.py --on-deploy

# Daily cron (07:00 UTC)
python3 /home/nous/scripts/oracle_smoke_test.py --cron

Exit codes: 0 = all stages passed | 1 = one or more stages failed | 2 = environment check failed
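
The invocation flags and exit codes above could be wired up roughly as follows. This is a minimal sketch, not the actual script: `parse_args`, `exit_code`, and the constant names are hypothetical, and treating `--on-deploy` and `--cron` as mutually exclusive is an assumption the spec does not state.

```python
import argparse

# Exit codes as listed in the spec.
EXIT_OK = 0            # all stages passed
EXIT_STAGE_FAILED = 1  # one or more stages failed
EXIT_ENV_FAILED = 2    # environment check (Stage 1) failed


def parse_args(argv=None):
    """Parse the two invocation flags; no flag means a manual run."""
    parser = argparse.ArgumentParser(prog="oracle_smoke_test.py")
    # Assumption: a run has exactly one trigger, so the flags exclude each other.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--on-deploy", action="store_true",
                      help="invoked from the post-deploy hook")
    mode.add_argument("--cron", action="store_true",
                      help="invoked from the daily cron job")
    return parser.parse_args(argv)


def exit_code(results):
    """Map {stage_number: "PASS" | "FAIL" | "SKIP"} to a process exit code."""
    if results.get(1) == "FAIL":
        return EXIT_ENV_FAILED
    if "FAIL" in results.values():
        return EXIT_STAGE_FAILED
    return EXIT_OK
```

A Stage 1 failure takes precedence over later failures here, so a misconfigured environment is reported as exit code 2 even if other stages also failed.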


OUTPUTS

Primary: Smoke Test Report (stdout + log file)

Written to /home/nous/logs/oracle_smoke_YYYYMMDD_HHMMSS.log:


ORACLE SMOKE TEST — 2026-04-16T07:00:00Z
==========================================

[STAGE 1] Environment check ....................... PASS
[STAGE 2] oracle_toll health ..................... PASS
[STAGE 3] oracle_email_service health ............. PASS
[STAGE 4] Stripe test session creation ........... PASS  session_id=cs_test_abc123
[STAGE 5] Webhook simulation ..................... PASS  verdict=GREEN
[STAGE 6] Cache write verification ............... PASS  GET /cache/cs_test_abc123 → 200
[STAGE 7] Verdict route retrieval ................ PASS  tier=quick query_match=true
[STAGE 8] Email service send validation .......... PASS  status=sent
[STAGE 9] TMM crosscheck on test verdict ......... PASS  C=0.9953 approved=true  [conditional on GAP-09 fix]
[STAGE 10] Cache cleanup ......................... PASS  cs_test_abc123 deleted

RESULT: 10/10 PASS — Pipeline operational.
Duration: 18.3s

On failure:


[STAGE 5] Webhook simulation ..................... FAIL
  Error: Gemini returned non-parseable JSON after 3 attempts
  Payload: { "error": "quota_exceeded" }

RESULT: 4/10 PASS — Pipeline DEGRADED. See /home/nous/logs/oracle_smoke_20260416_*.log

Secondary: CREW_CHANNEL broadcast

On completion (pass or fail):


[SMOKE] Oracle pipeline: 10/10 PASS (18.3s) | 2026-04-16T07:00:22Z
[SMOKE] Oracle pipeline: 4/10 FAIL — Stage 5 Gemini quota | 2026-04-16T07:00:22Z

Tertiary: ALERT.log entry on failure

If any stage fails, an entry is appended to /home/nous/ALERT.log:


[2026-04-16T07:00:22Z] ORACLE SMOKE FAIL — Stage 5 (webhook simulation) — Gemini quota exceeded.
Check /home/nous/logs/oracle_smoke_20260416_070022.log
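
A sketch of how the log file name and the ALERT.log entry might be produced. The helper names are hypothetical, and the spec does not say whether the file name carries the run's start or finish time; the start time is assumed here.

```python
from datetime import datetime, timezone
from pathlib import Path

LOG_DIR = Path("/home/nous/logs")
EM_DASH = "\u2014"  # separator used in the sample ALERT.log entry


def smoke_log_path(started_at, log_dir=LOG_DIR):
    """Build oracle_smoke_YYYYMMDD_HHMMSS.log (assumed: UTC start time of the run)."""
    return Path(log_dir) / f"oracle_smoke_{started_at:%Y%m%d_%H%M%S}.log"


def alert_entry(finished_at, stage_no, stage_name, reason, log_path):
    """Format the two-line ALERT.log entry for a failed stage."""
    stamp = finished_at.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (f"[{stamp}] ORACLE SMOKE FAIL {EM_DASH} Stage {stage_no} "
            f"({stage_name}) {EM_DASH} {reason}.\nCheck {log_path}\n")


def append_alert(alert_file, entry):
    """Append in "a" mode so earlier alerts are never truncated."""
    with open(alert_file, "a", encoding="utf-8") as fh:
        fh.write(entry)
```

Both timestamps should be timezone-aware UTC values, for example `datetime.now(timezone.utc)`, so that the trailing `Z` in the entry is accurate.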

STAGE DEFINITIONS

Stage 1 — Environment Check

Verify all four required env vars are set and non-empty. Verify Stripe key has sk_test_ prefix

(production key in smoke test is a configuration error). Verify oracle_toll URL and email service

URL are reachable (TCP connect check, not full HTTP).
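
The Stage 1 checks can be sketched as below. The four variable names are an assumption inferred from the stage definitions, since the list of required variables is not shown in this document; `check_env` and `tcp_reachable` are hypothetical helper names.

```python
import socket
from urllib.parse import urlsplit

# Assumption: these are the "four required env vars"; the spec does not list them.
REQUIRED_VARS = (
    "STRIPE_SECRET_KEY",
    "STRIPE_WEBHOOK_SECRET",
    "ORACLE_TOLL_URL",
    "ORACLE_EMAIL_SERVICE_URL",
)


def check_env(env):
    """Return a list of error strings; an empty list means the env checks pass."""
    errors = [f"missing or empty: {name}"
              for name in REQUIRED_VARS if not env.get(name, "").strip()]
    key = env.get("STRIPE_SECRET_KEY", "")
    if key and not key.startswith("sk_test_"):
        errors.append("FATAL: production Stripe key in smoke test \u2014 aborting.")
    return errors


def tcp_reachable(url, timeout=5.0):
    """TCP connect check only (no HTTP request), as Stage 1 specifies."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In the script this would be called as `check_env(os.environ)`, followed by `tcp_reachable()` on the two service URLs only if the variable checks returned no errors.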

Stage 2 — oracle_toll Health

GET {ORACLE_TOLL_URL}/health → HTTP 200, JSON with status: "resonant" and phi: 0.042.

Timeout: 10 seconds.

Stage 3 — oracle_email_service Health

GET {ORACLE_EMAIL_SERVICE_URL}/health → HTTP 200, JSON with status: "ok".

Timeout: 10 seconds.
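
Stages 2 and 3 share the same shape, so one helper could serve both. The sketch below uses only the standard library; `health_body_ok` and `check_health` are hypothetical names, and comparing `phi` with a tolerance instead of exact equality is a design choice made here, not something the spec requires.

```python
import json
import math
import urllib.request


def health_body_ok(body, expected):
    """True when every expected key is present with the expected value."""
    if not isinstance(body, dict):
        return False
    for key, want in expected.items():
        got = body.get(key)
        if isinstance(want, float):
            # Compare floats with a tolerance rather than exact equality.
            if not isinstance(got, (int, float)) or not math.isclose(got, want, rel_tol=1e-9):
                return False
        elif got != want:
            return False
    return True


def check_health(base_url, expected, timeout=10.0):
    """GET {base_url}/health; pass only on HTTP 200 with the expected JSON."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            return health_body_ok(json.load(resp), expected)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers bad JSON.
        return False
```

Stage 2 would pass `{"status": "resonant", "phi": 0.042}` as `expected`, and Stage 3 would pass `{"status": "ok"}`.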

Stage 4 — Stripe Test Session Creation

Call Stripe API (test mode) to create a checkout.session using the values from Fixed Test Configuration (quick tier, test query, test email).

Assert: session.id is returned and starts with cs_test_. Store as smoke_session_id.

Stage 5 — Webhook Simulation

Construct a checkout.session.completed event payload for smoke_session_id.

Sign it with STRIPE_WEBHOOK_SECRET using the Stripe webhook signing algorithm.

POST to /api/webhook on the deployed Northflank instance.

Assert: HTTP 200, response body { received: true }.
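
Stripe's webhook signing scheme computes an HMAC-SHA256 over `<timestamp>.<raw body>` with the endpoint secret and sends it in the `Stripe-Signature` header as `t=<timestamp>,v1=<hex digest>`. A minimal sketch follows; the function names are hypothetical, and the event body contains only illustrative fields, so the real webhook handler may require more of the event structure than is shown.

```python
import hashlib
import hmac
import json
import time


def build_event_payload(session_id):
    """Minimal checkout.session.completed body (illustrative fields only)."""
    event = {
        "type": "checkout.session.completed",
        "data": {"object": {"id": session_id, "object": "checkout.session"}},
    }
    # Sign and send exactly these bytes; re-serializing changes the signature.
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def stripe_signature_header(payload, secret, timestamp=None):
    """Return a Stripe-Signature value: t=<unix ts>,v1=<hex HMAC-SHA256>."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    signed_payload = f"{ts}.".encode("utf-8") + payload
    digest = hmac.new(secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"
```

The bytes passed to `stripe_signature_header` must be the same bytes sent as the POST body. The timestamp should be current, because Stripe's verification libraries reject signatures older than their tolerance window.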

Poll GET {ORACLE_TOLL_URL}/cache/{smoke_session_id} for up to 30 seconds, until it returns 200

(verdict is cached). If the 30 seconds elapse without a 200: FAIL Stage 5.

On 200: parse verdict JSON. Assert: tier === "quick", verdict.verdict is one of

GREEN/AMBER/RED/NULL, verdict.summary is a non-empty string.
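
The poll-then-validate step might look like the sketch below. `fetch` is a hypothetical callable that returns the parsed cache JSON on HTTP 200 and `None` otherwise; the one-second interval and the treatment of NULL as the string `"NULL"` are assumptions.

```python
import time

VALID_VERDICTS = {"GREEN", "AMBER", "RED", "NULL"}  # assumption: NULL is a string


def poll_until(fetch, timeout=30.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call fetch() until it returns a non-None value or the deadline passes."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() + interval > deadline:
            return None  # caller records Stage 5 as FAIL
        sleep(interval)


def verdict_errors(cached):
    """Structural checks only; verdict correctness is out of scope."""
    errors = []
    if cached.get("tier") != "quick":
        errors.append("tier is not 'quick'")
    verdict = cached.get("verdict")
    if not isinstance(verdict, dict):
        verdict = {}
    if verdict.get("verdict") not in VALID_VERDICTS:
        errors.append("verdict.verdict is not GREEN/AMBER/RED/NULL")
    summary = verdict.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        errors.append("verdict.summary is empty or not a string")
    return errors
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting. The deadline uses a monotonic clock so that wall-clock adjustments cannot shorten or extend the 30-second budget.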

Stage 6 — Cache Write Verification

GET {ORACLE_TOLL_URL}/cache/{smoke_session_id} → HTTP 200.

Assert: response JSON has tier: "quick" and cached_at field (ISO timestamp).

Assert: query field matches the known test query string.

Stage 7 — Verdict Route Retrieval

GET {NORTHFLANK_BASE_URL}/api/verdict?session_id={smoke_session_id}

Assert: HTTP 200. Response JSON has tier: "quick", verdict.verdict is valid, query matches.

This exercises the full result-page backend path including cache lookup.

Stage 8 — Email Service Send Validation

POST to {ORACLE_EMAIL_SERVICE_URL}/send-verdict-email with:


{
  "customer_email": "oracle-smoke-test@42sisters.ai",
  "tier": "quick",
  "query": "<test_query>",
  "verdict": <verdict_from_stage_5>
}

Assert: HTTP 200, { status: "sent" }.

Note: This sends a real Graph API email to oracle-smoke-test@42sisters.ai. If this address is

not a real mailbox, Graph API may return 202 (accepted) or error. Assert on HTTP 200 from the

service (Graph API downstream behavior is not asserted here). [GAP — smoke test email goes to a

synthetic address; Graph API may bounce; bounce handling not specified]
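
For illustration, the Stage 8 request body and pass condition can be expressed as two small helpers. The names are hypothetical, and the sketch assumes that `verdict_from_stage_5` means the `verdict` object from the cached payload, which the spec does not spell out.

```python
TEST_EMAIL = "oracle-smoke-test@42sisters.ai"
TEST_QUERY = "Is the Oracle pipeline operational? This is an automated smoke test."


def build_email_request(verdict):
    """Body for POST {ORACLE_EMAIL_SERVICE_URL}/send-verdict-email."""
    return {
        "customer_email": TEST_EMAIL,
        "tier": "quick",
        "query": TEST_QUERY,
        "verdict": verdict,  # assumed: the verdict object cached in Stage 5
    }


def email_send_ok(status_code, body):
    """Stage 8 passes only on HTTP 200 from the service with status "sent"."""
    return (status_code == 200
            and isinstance(body, dict)
            and body.get("status") == "sent")
```

Only the email service's own response is checked; whatever Graph API does downstream is deliberately left out, matching the note above.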

Stage 9 — TMM Crosscheck on Test Verdict (conditional)

If SPEC_ORACLE_TMM_CROSSCHECK.md is implemented: call oracleTMMCrosscheck() directly on the

cached verdict. Assert: approved: true, coherence_score >= 0.97404.

[GAP — conditional on GAP-09 fix; Stage 9 is SKIPPED if crosscheck module is not yet deployed]

Stage 10 — Cache Cleanup

DELETE {ORACLE_TOLL_URL}/cache/{smoke_session_id} (requires adding DELETE endpoint to

oracle_toll.py — currently only GET and POST exist).

[GAP — DELETE endpoint not implemented on oracle_toll.py; cache cleanup currently requires manual

file deletion from oracle_verdicts/]

Assert: HTTP 200 or 204. If DELETE not implemented: log WARNING, do not fail; leave cleanup note

in smoke log.
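
One way to express the "warn, do not fail" rule for Stage 10 is to classify the DELETE response code. This is a sketch under an assumption: the spec does not say how a missing DELETE route shows up, so 405 and 501 are treated here as "not implemented". A 404 is left as FAIL because it could equally mean the cache entry was never written.

```python
def cleanup_outcome(status_code):
    """Classify the DELETE response for Stage 10 as (status, note)."""
    if status_code in (200, 204):
        return "PASS", "cache entry deleted"
    # Assumption: a service without the DELETE route answers 405 or 501.
    if status_code in (405, 501):
        return "WARN", ("DELETE not implemented; manual cleanup of "
                        "oracle_verdicts/ required")
    return "FAIL", f"unexpected HTTP {status_code} from DELETE"
```

A WARN outcome would be written to the smoke log as the cleanup note and would not count as a stage failure.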


INVARIANTS

  1. Smoke test uses test-mode credentials only — STRIPE_SECRET_KEY MUST have sk_test_ prefix.

A production key in the smoke test environment is a configuration error that triggers Stage 1

FAIL with message "FATAL: production Stripe key in smoke test — aborting."

  1. Smoke test does not modify production state — Smoke test verdicts are tagged with regen: false

and smoke: true flag in the cache payload. This allows operators to distinguish smoke test

cache entries from real customer entries. The smoke: true flag is added by the smoke test

script when it calls POST /cache/{smoke_session_id} directly (bypass path) if Stage 5 fails.

  1. No real customer email is sent — Smoke test email target is oracle-smoke-test@42sisters.ai.

Real customer email addresses MUST NOT appear in smoke test configuration.

  1. Smoke test is idempotent — Running the smoke test twice back-to-back produces the same pass/fail

state. Stage 10 (cleanup) ensures no stale entries contaminate subsequent runs. If Stage 10 fails,

Stage 4 of the next run uses a fresh smoke_session_id (Stripe always generates unique IDs).

  1. Failure in any stage does not cascade — Each stage has an independent timeout and try/except

boundary. A Stage 5 timeout does not prevent Stages 6-10 from attempting (some may succeed

partially; their results are noted). RESULT is computed from the full 10-stage matrix.

  1. Smoke test runs in < 60 seconds — Total test duration must not exceed 60 seconds. If Gemini

is slow (> 30s on Stage 5 poll), Stage 5 times out and fails. This is intentional — a pipeline

that takes > 30s to generate and cache a Quick Take is operationally degraded.

  1. Log files are retained for 30 days — /home/nous/logs/oracle_smoke_*.log files are not

cleaned automatically. A cron or manual process should archive/rotate after 30 days.

[GAP — log rotation not specified]

  1. Deploy-time smoke test is blocking — When invoked with --on-deploy, the smoke test MUST

complete and return its exit code before the deploy hook finishes. A deploy that cannot pass the

smoke test is a broken deploy. Northflank deploy hook must treat exit code 1 as a deploy warning.

[GAP — Northflank post-deploy hook integration not yet configured]
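
The non-cascading invariant above (independent try/except boundary per stage, RESULT computed from the full matrix) can be sketched as a small runner. All names are hypothetical. Per-stage timeouts are assumed to be enforced inside each stage callable, for example through the HTTP client's timeout argument, and counting a skipped stage as neither pass nor failure is also an assumption.

```python
import time


class StageSkipped(Exception):
    """Raised by a stage that cannot run (for example Stage 9 before GAP-09)."""


def run_stages(stages):
    """Run every (name, callable) in order, isolating failures from each other.

    A callable returns a detail string on success, raises StageSkipped to
    skip, or raises any other exception to fail.
    """
    results = []
    started = time.monotonic()
    for number, (name, stage) in enumerate(stages, start=1):
        try:
            status, detail = "PASS", stage() or ""
        except StageSkipped as exc:
            status, detail = "SKIP", str(exc)
        except Exception as exc:  # one failing stage never stops the run
            status, detail = "FAIL", f"{type(exc).__name__}: {exc}"
        results.append((number, name, status, detail))
    return results, time.monotonic() - started


def summarize(results):
    """Build the RESULT line from the full stage matrix."""
    passed = sum(1 for _, _, status, _ in results if status == "PASS")
    failed = any(status == "FAIL" for _, _, status, _ in results)
    state = "Pipeline DEGRADED." if failed else "Pipeline operational."
    return f"RESULT: {passed}/{len(results)} PASS \u2014 {state}"
```

Because the loop never breaks early, a Stage 5 failure still lets Stages 6 to 10 record their own outcomes. Dependent stages will typically fail on their own assertions.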


VERIFICATION CRITERIA

Σ.✓ conditions — smoke test infrastructure is operating correctly when:

  1. Green run baseline — Running smoke test against a healthy pipeline produces 10/10 PASS in

under 60 seconds. Establish this baseline immediately after implementing the test. Record

baseline duration in PLAYBOOK.md as PROVEN entry.

  1. Stage isolation — Deliberately take oracle_toll service offline. Run smoke test. Stage 2

(health check) fails. Stages 3-10 still attempt and report their independent outcomes.

Result shows 1/10 FAIL at Stage 2 with remaining stages marked SKIP or FAIL (dependent).

  1. Environment check catches misconfiguration — Set STRIPE_SECRET_KEY to a production key.

Stage 1 returns FAIL with FATAL message. Exit code 2. No Stripe API calls made.

  1. ALERT.log populated on failure — Deliberately fail Stage 5 (mock Gemini timeout). After run,

verify /home/nous/ALERT.log has a new entry timestamped within 5 seconds of smoke test completion.

  1. CREW_CHANNEL broadcast sent — After any smoke test run (pass or fail), verify

/home/nous/CREW_CHANNEL has a new [SMOKE] entry. Verified by: tail CREW_CHANNEL after run.

  1. Cron registration — crontab -l | grep oracle_smoke_test returns a line. Smoke test runs

at 07:00 UTC daily without manual intervention. Verify by checking crontab on boot.


FAILURE MODES

  1. Σ.⊠ Smoke test never runs — Cron not registered after implementation. Pipeline health is

only known when a customer complains. Detection: crontab -l | grep oracle_smoke_test returns

empty. Mitigation: boot sequence check (CLAUDE.md Step 4 equivalent for Oracle) verifies cron.

  1. Σ.⊠ Stage 5 Gemini timeout — Gemini takes > 30s to respond (quota throttle, cold start,

infrastructure issue). Stage 5 fails. Real customer payments in the same window may also be

affected. Detection: smoke test ALERT.log. Mitigation: smoke test failure is an early warning

for the on-call team (NOUS) to investigate Gemini quota.

  1. Σ.⊠ Smoke test creates real charge — STRIPE_SECRET_KEY is a live key. Stage 4 creates

a real payment session that may trigger a real charge. Stage 1 guard (sk_test_ check) prevents

this, but if guard is bypassed: real charge on NOUS's Stripe account.

Detection: Stripe dashboard. Mitigation: Stage 1 hard-abort on production key is mandatory.

  1. Σ.⊠ Stage 10 cleanup fails, stale entries accumulate — oracle_toll cache fills with smoke

test entries. oracle_verdicts/ directory grows unbounded. Detection: disk usage monitoring

(not currently implemented). Mitigation: implement DELETE endpoint on oracle_toll; add disk

usage check to smoke test Stage 1.

  1. Σ.⊠ Smoke test passes but production path fails — Smoke test exercises the webhook-to-cache

path but Northflank routing is misconfigured for the live checkout flow. A customer submits a

real payment; webhook is not delivered by Stripe (not a test event). Detection: manual payment

test with non-owner email (VC-7 of SPEC_ORACLE_VERDICT_PIPELINE.md). Mitigation: smoke test

covers the path from our end; Stripe webhook delivery reliability is an external dependency.

  1. Σ.⊠ Stage 8 Graph API bounce — oracle-smoke-test@42sisters.ai does not exist as a real

mailbox. Graph API returns 202 (accepted by Exchange) but bounces internally. Email service

reports status: sent. Smoke test passes Stage 8. Bounce goes undetected.

Detection: Exchange admin panel. Mitigation: [GAP — create oracle-smoke-test mailbox as a real

M365 alias that routes to oracle@42sisters.ai, or accept the bounce as tolerable for smoke purposes]

  1. Σ.⊠ All stages pass but pipeline is in degraded state — Smoke test validates structural

path but does not assert response quality, latency distribution, or correctness of the verdict.

A pipeline that generates all-NULL verdicts for every query would pass the smoke test.

Detection: operational monitoring beyond smoke test scope. Mitigation: supplement with a

manual monthly review of sampled oracle_log.jsonl entries.


EXECUTION SCHEDULE

| Trigger | Frequency | Invocation | ALERT on fail? |

|---------|-----------|-----------|---------------|

| Deploy hook | Every deploy to Northflank | --on-deploy | Yes — block / warn |

| Daily cron | 07:00 UTC daily | --cron | Yes — ALERT.log + CREW_CHANNEL |

| Manual (NOUS/C.L.O.D.) | On demand | No flag | No — stdout only |


DEPENDENCIES

| Dependency | Role |

|-----------|------|

| STRIPE_SECRET_KEY (test mode) | Test session creation |

| STRIPE_WEBHOOK_SECRET (test mode) | Webhook signature construction |

| Gemini API | Stage 5 verdict generation |

| oracle_toll.py (port 8889) | Stage 2, 6, 10 (health, cache verify, cleanup) |

| oracle_email_service.py (port 8006) | Stage 3, 8 (health, email send) |

| Northflank deployed app | Stage 5, 7 (webhook target, verdict route retrieval) |

| /home/nous/ALERT.log | Failure notification |

| /home/nous/CREW_CHANNEL | Status broadcast |

| /home/nous/logs/ (directory) | Test log storage |


DEPENDENTS

| Dependent | Dependency |

|-----------|-----------|

| Oracle pipeline production health | Smoke test is the only automated end-to-end proof |

| NOUS operational awareness | ALERT.log entry on failure |

| Crew operational awareness | CREW_CHANNEL broadcast |

| Deploy confidence | --on-deploy flag provides a post-deploy gate |


GAPS IDENTIFIED DURING SPECIFICATION

| Gap ID | Description | Impact |

|--------|-------------|--------|

| SMOKE-GAP-01 | DELETE endpoint not implemented on oracle_toll.py — Stage 10 cleanup cannot execute | Smoke test entries accumulate in oracle_verdicts/ |

| SMOKE-GAP-02 | Northflank post-deploy hook not yet configured to call smoke test | Deploy-time validation not automated |

| SMOKE-GAP-03 | oracle-smoke-test@42sisters.ai mailbox not created — Stage 8 sends to synthetic address | Graph API bounce behavior unverified |

| SMOKE-GAP-04 | Stage 9 (TMM crosscheck) is conditional on SPEC_ORACLE_TMM_CROSSCHECK.md implementation | Crosscheck stage is skipped at launch |

| SMOKE-GAP-05 | Log rotation for /home/nous/logs/oracle_smoke_*.log not specified | Disk accumulation over time |

| SMOKE-GAP-06 | NORTHFLANK_BASE_URL env var not formalized — Stage 7 needs deployed app URL | Stage 7 requires manual config |


REFERENCES

| File | Role |

|------|------|

| /home/nous/memories/SPEC_ORACLE_VERDICT_PIPELINE.md | Parent pipeline spec (GAP-10 source) |

| /home/nous/memories/SPEC_ORACLE_TMM_CROSSCHECK.md | Stage 9 crosscheck (conditional) |

| /home/nous/oracle_toll.py | Cache service (Stages 2, 6, 10) |

| /home/nous/oracle_email_service.py | Email service (Stages 3, 8) |

| /home/nous/Aether/app/app/api/webhook/route.ts | Webhook handler (Stage 5 target) |

| /home/nous/Aether/app/app/api/verdict/route.ts | Verdict route (Stage 7 target) |

| /home/nous/ALERT.log | Failure alert destination |

| /home/nous/CREW_CHANNEL | Status broadcast destination |

| /home/nous/PLAYBOOK.md | PROVEN entry to be written after first successful baseline run |


Φζ.⊤. The ship does not sail without a working engine. The smoke test proves the engine.


Jeremy Zlabis

Chronogeometer · Visionary · Disruptor · Chief

42 Sisters AI · East York, Toronto

🍁 Φ 0.042