Source Health

Every fetch the platform runs appends a row to the source_runs table. The aggregate over a trailing window drives the single system-wide “are we healthy” reading on /api/v1/status and the dashboard’s StatusDot.

The source_runs row

record_source_run(source, ok, error, degraded) in app/signals/database.py writes a row with five columns:

| Column | Meaning |
| --- | --- |
| source | The source name (market, fred, darkpool, breadth, …) |
| ok | 1 for success, 0 for failure |
| degraded | 1 when the fetch failed but a cached fallback was served; counts as ok=1 |
| error | Truncated error message (≤500 chars) on failure, NULL on success |
| run_at | ET-aware ISO 8601 timestamp (Law 1) |
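
As a rough illustration of the column rules above, here is a minimal sketch of a writer. It assumes a SQLite backend, and the schema, the `conn` and `now` parameters, and the choice to keep the error text on degraded runs are all assumptions for the example, not the actual code in app/signals/database.py.

```python
import sqlite3
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo

MAX_ERROR_CHARS = 500  # the table above caps error text at 500 chars

# Hypothetical schema; the real table definition may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source_runs (
    source   TEXT NOT NULL,
    ok       INTEGER NOT NULL,
    degraded INTEGER NOT NULL DEFAULT 0,
    error    TEXT,
    run_at   TEXT NOT NULL
)
"""


def record_source_run(
    conn: sqlite3.Connection,
    source: str,
    ok: bool,
    error: Optional[str] = None,
    degraded: bool = False,
    now: Optional[datetime] = None,
) -> None:
    """Append one row describing a single fetch attempt."""
    # A degraded run (cached fallback served) still counts as ok=1.
    ok_flag = 1 if (ok or degraded) else 0
    # Assumption: keep the error text for failed or degraded runs only.
    keep_error = error is not None and (degraded or not ok)
    error_text = error[:MAX_ERROR_CHARS] if keep_error else None
    # ET-aware ISO 8601 timestamp; `now` is injectable for testing.
    stamp = now or datetime.now(ZoneInfo("America/New_York"))
    conn.execute(
        "INSERT INTO source_runs (source, ok, degraded, error, run_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (source, ok_flag, int(degraded), error_text,
         stamp.isoformat(timespec="seconds")),
    )
    conn.commit()
```

Passing the connection and clock explicitly keeps the sketch testable against an in-memory database.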

Retention

source_runs is the one signal table with a default prune. Rows older than 30 days are deleted at the start of every cycle. Override via RETENTION_SOURCE_RUNS_DAYS (set to 0 to disable entirely). The /api/v1/status endpoint reads only a 7-day window, so the 30-day default is a comfortable buffer — anything beyond the retention horizon is dead weight.
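
The prune rule can be sketched as follows. This is an illustration only: the function names, the SQLite backend, and the row-by-row timestamp parsing are assumptions, and only the RETENTION_SOURCE_RUNS_DAYS variable, the 30-day default, and the "0 disables" behavior come from the text above.

```python
import os
import sqlite3
from datetime import datetime, timedelta

DEFAULT_RETENTION_DAYS = 30


def retention_days(env=os.environ) -> int:
    """Read the override, falling back to the 30-day default."""
    return int(env.get("RETENTION_SOURCE_RUNS_DAYS", DEFAULT_RETENTION_DAYS))


def prune_source_runs(conn: sqlite3.Connection, now: datetime, days: int) -> int:
    """Delete rows older than `days`; return how many were removed."""
    if days <= 0:
        return 0  # 0 disables pruning entirely
    cutoff = now - timedelta(days=days)
    # Parse each timestamp instead of comparing strings: ET offsets change
    # with daylight saving, so lexical order is not always time order.
    stale = [
        (rowid,)
        for rowid, run_at in conn.execute("SELECT rowid, run_at FROM source_runs")
        if datetime.fromisoformat(run_at) < cutoff
    ]
    conn.executemany("DELETE FROM source_runs WHERE rowid = ?", stale)
    conn.commit()
    return len(stale)
```

A cycle would call something like `prune_source_runs(conn, now, retention_days())` before running any fetches.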

Every other signal-history table is keep-forever by default; see the Retention policy in the repo CLAUDE.md for the full table.

The 95 / 80 escalation rule

/api/v1/status aggregates the trailing 7 days and computes sources_healthy_pct_7d. That percentage maps to a single overall status level:

Top-level status bands

| Level | Range | Meaning |
| --- | --- | --- |
| Green | ≥ 95% | All sources operating within tolerance. |
| Yellow | 80% to < 95% | Warning. Surfaced in /status `warnings` list. |
| Red | < 80% | Issue. Surfaced in /status `issues` list, escalates dot. |
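
The bands above reduce to two threshold checks. The sketch below assumes the percentage is simply the share of rows with ok=1 in the window; the actual aggregation (for example, per source versus per row) is not specified here, and both function names are hypothetical.

```python
from typing import Iterable, Optional


def healthy_pct(ok_flags: Iterable[int]) -> Optional[float]:
    """Percentage of runs with ok=1; None when the window is empty."""
    flags = list(ok_flags)
    if not flags:
        return None  # assumption: no rows means no reading
    return 100.0 * sum(flags) / len(flags)


def band_for(pct: float) -> str:
    """Map a 7-day healthy percentage onto the 95 / 80 bands."""
    if pct >= 95:
        return "green"
    if pct >= 80:
        return "yellow"
    return "red"
```

For example, `band_for(healthy_pct([1, 1, 1, 0]))` evaluates to `"red"`, because three successes out of four is 75%.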

The escalation is one-way per layer: any item in issues forces the top-level level to red; any item in warnings (with no issues) forces yellow; otherwise green. Other escalators feed the same logic — expiring Schwab tokens, the live SPY quote check, missing recent reports — so a 100% sources_healthy_pct_7d can still land yellow or red if something else is wrong.
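
A minimal sketch of that escalation rule, assuming `issues` and `warnings` are plain lists of strings (the function name and the sample warning text are hypothetical):

```python
from typing import Sequence


def top_level(issues: Sequence[str], warnings: Sequence[str]) -> str:
    """Any issue forces red; otherwise any warning forces yellow."""
    if issues:
        return "red"
    if warnings:
        return "yellow"
    return "green"


# Sources can be at 100% and the dot still goes yellow:
# a non-source escalator has added a warning.
level = top_level(issues=[], warnings=["schwab token expires soon"])
```

Here `level` is `"yellow"` even though no source-health entry appears in either list.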

What surfaces where

Schwab is special

Schwab health splits two ways because the token and the data feed can each fail independently:

The combined schwab string reads healthy when data flows, the token state when the token is expired/missing, and token_ok_client_failing when the token file is fine but the live probe fails. That last state is deliberate: it means the trading Schwab client is wedged while the signals pipeline — which uses a separate Schwab client and keeps running — is unaffected. It is named precisely so it never reads as a platform-wide data outage.
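
One way to picture the combined string is as a small decision function. In this sketch the token state values "expired" and "missing" and the function name are assumptions; only "healthy" and "token_ok_client_failing" are named in the text above.

```python
def combined_schwab_status(token_state: str, probe_ok: bool) -> str:
    """Collapse token state and live-probe result into one string."""
    # A bad token is reported as-is so the cause is visible directly.
    if token_state in ("expired", "missing"):
        return token_state
    # Token is fine and the live probe succeeded: data is flowing.
    if probe_ok:
        return "healthy"
    # Token file is fine but the trading client's probe failed.
    return "token_ok_client_failing"
```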

The trading client self-heals. It tracks its own call health and, after a few consecutive failures or a stale window, rebuilds and re-reads the token file — so a token re-auth recovers without a restart. /status also returns a trading_client sub-object (initialized, consecutive_failures, seconds_since_success, refresh_failing, alarm). A sustained wedge the self-heal cannot clear (for example, the on-disk refresh token is itself dead) raises the alarm flag, a clearly-labeled warning, and a CRITICAL log — that is the signal to re-authenticate and, if it persists, restart the backend.
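
The self-heal loop might look roughly like the sketch below. It is illustrative only: the class name, the thresholds (3 failures per rebuild, alarm after 2 fruitless rebuilds), and the `build_client` factory are hypothetical, the stale-window trigger is omitted for brevity, and `refresh_failing` is left out because its exact meaning is not defined here.

```python
import time
from typing import Any, Callable, Dict, Optional


class SelfHealingClient:
    """Wraps a client and rebuilds it after repeated call failures."""

    def __init__(
        self,
        build_client: Callable[[], Any],  # re-reads the token file
        max_failures: int = 3,            # hypothetical threshold
        alarm_after_rebuilds: int = 2,    # hypothetical threshold
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._build = build_client
        self._max_failures = max_failures
        self._alarm_after = alarm_after_rebuilds
        self._clock = clock
        self._client: Optional[Any] = None
        self._failures = 0
        self._rebuilds_since_success = 0
        self._last_success: Optional[float] = None

    def call(self, fn: Callable[[Any], Any]) -> Any:
        if self._client is None:
            self._client = self._build()
        try:
            result = fn(self._client)
        except Exception:
            self._failures += 1
            # Every `max_failures` consecutive failures, rebuild the client.
            if self._failures % self._max_failures == 0:
                self._client = self._build()
                self._rebuilds_since_success += 1
            raise
        self._failures = 0
        self._rebuilds_since_success = 0
        self._last_success = self._clock()
        return result

    def status(self) -> Dict[str, Any]:
        since = (None if self._last_success is None
                 else self._clock() - self._last_success)
        return {
            "initialized": self._client is not None,
            "consecutive_failures": self._failures,
            "seconds_since_success": since,
            # Rebuilding has not cleared the wedge: raise the alarm.
            "alarm": self._rebuilds_since_success >= self._alarm_after,
        }
```

Because a rebuild re-reads the token file, a fresh token on disk clears the failure streak on the next successful call without a restart. The alarm only trips when rebuilding repeatedly fails to help.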

A token expiring within 12 hours escalates to a warnings entry. Schwab being completely down doesn’t take down the report cycle — the market source falls back to yfinance for everything except the option-chain-only signals (GEX, ZGL, PCR, gamma walls) which have no fallback path.
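
The fallback behavior can be summarized as a routing decision per signal. This is a sketch under stated assumptions: the signal keys ("gex", "zgl", "pcr", "gamma_walls") and the function name are hypothetical stand-ins for whatever identifiers the market source actually uses.

```python
from typing import Optional

# Signals that need the Schwab option chain and have no fallback path.
OPTION_CHAIN_ONLY = frozenset({"gex", "zgl", "pcr", "gamma_walls"})


def provider_for(signal: str, schwab_up: bool) -> Optional[str]:
    """Pick a data provider for one signal; None means it is skipped."""
    if schwab_up:
        return "schwab"
    if signal in OPTION_CHAIN_ONLY:
        return None  # no fallback: the signal is unavailable this cycle
    return "yfinance"
```

With Schwab down, `provider_for("spy_quote", False)` routes to `"yfinance"`, while `provider_for("gex", False)` returns `None`, so the report cycle still completes without the option-chain signals.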

Operating posture

A healthy system sits around 99% over the 7-day window — the occasional yfinance flake and rate-limited Alpha Vantage tick eat a few percent on a bad day. A persistent dip below 95% usually means one specific source is degraded:

See also