Source Health

Last verified: 2026-05-20

Every fetch the platform runs appends a row to the source_runs table. The aggregate over a trailing window drives the single system-wide “are we healthy” reading on /api/v1/status and the dashboard’s StatusDot.

The `source_runs` row

record_source_run(source, ok, error, degraded) in app/signals/database.py writes a row with five columns (the four arguments plus a run_at timestamp):

Column	Meaning
`source`	The source name (`market`, `fred`, `darkpool`, `breadth`, …)
`ok`	`1` for success, `0` for failure
`degraded`	`1` when the fetch failed but a cached fallback was served — counts as `ok=1`
`error`	Truncated error message (≤500 chars) on failure, `NULL` on success
`run_at`	ET-aware ISO 8601 timestamp (Law 1)
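
A minimal sketch of what such a writer might look like, assuming a SQLite backing store. Only the function name, its four parameters, and the column semantics come from this page; the `conn` and `now` parameters, the table DDL, and the `America/New_York` zone name are assumptions made for illustration, and the real code in app/signals/database.py may differ.

```python
import sqlite3
from datetime import datetime
from zoneinfo import ZoneInfo

MAX_ERROR_CHARS = 500  # documented truncation limit for the error column

# Hypothetical DDL: the real schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source_runs (
    source   TEXT NOT NULL,
    ok       INTEGER NOT NULL,
    degraded INTEGER NOT NULL DEFAULT 0,
    error    TEXT,
    run_at   TEXT NOT NULL
)
"""


def record_source_run(conn, source, ok, error=None, degraded=False, now=None):
    """Append one row to source_runs (conn and now are illustrative extras)."""
    if now is None:
        # ET-aware timestamp; assumes "ET" means the America/New_York zone.
        now = datetime.now(ZoneInfo("America/New_York"))
    # A degraded run served a cached fallback, so it is stored as ok=1.
    ok_flag = 1 if (ok or degraded) else 0
    if ok and not degraded:
        error_text = None  # NULL on plain success
    else:
        # Assumption: a degraded row keeps its (truncated) error text.
        error_text = None if error is None else str(error)[:MAX_ERROR_CHARS]
    conn.execute(
        "INSERT INTO source_runs (source, ok, degraded, error, run_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (source, ok_flag, 1 if degraded else 0, error_text, now.isoformat()),
    )
    conn.commit()
```

Passing `now` explicitly keeps the function testable without depending on the wall clock or on time-zone data being installed.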

Retention

source_runs is the one signal table with a default prune. Rows older than 30 days are deleted at the start of every cycle. Override via RETENTION_SOURCE_RUNS_DAYS (set to 0 to disable entirely). The /api/v1/status endpoint reads only a 7-day window, so the 30-day default is a comfortable buffer — anything beyond the retention horizon is dead weight.
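
The prune step could be sketched roughly as follows. The environment variable name, the 30-day default, and the "0 disables" rule come from this page; the function names, the `env` and `now` parameters, and the use of SQLite are assumptions. Timestamps are parsed rather than compared as strings, because ET offsets change across daylight-saving boundaries.

```python
import os
import sqlite3
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 30


def retention_days(env=None):
    """Read RETENTION_SOURCE_RUNS_DAYS, falling back to the 30-day default."""
    env = os.environ if env is None else env
    raw = env.get("RETENTION_SOURCE_RUNS_DAYS")
    return DEFAULT_RETENTION_DAYS if raw in (None, "") else int(raw)


def prune_source_runs(conn, now=None, env=None):
    """Delete rows older than the retention horizon; return how many were removed."""
    days = retention_days(env)
    if days <= 0:
        return 0  # 0 disables pruning (negative values are treated the same here)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Collect stale rowids first, then delete, so the read cursor is finished.
    stale = [
        (rowid,)
        for rowid, run_at in conn.execute("SELECT rowid, run_at FROM source_runs")
        if datetime.fromisoformat(run_at) < cutoff
    ]
    conn.executemany("DELETE FROM source_runs WHERE rowid = ?", stale)
    conn.commit()
    return len(stale)
```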

Every other signal-history table is keep-forever by default; see the Retention policy in the repo CLAUDE.md for the full table.

The 95 / 80 escalation rule

/api/v1/status aggregates the trailing 7 days and computes sources_healthy_pct_7d. That percentage maps to a single overall status level:
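
As a rough illustration of the aggregate, the sketch below assumes that sources_healthy_pct_7d is simply the share of runs with ok=1 inside the window. This page does not spell out the exact formula, so treat that definition, the `healthy_pct` name, and the input shape as assumptions.

```python
from datetime import datetime, timedelta, timezone


def healthy_pct(rows, now, window_days=7):
    """rows: iterable of (ok, run_at_iso). Returns percent ok, or None if no runs."""
    cutoff = now - timedelta(days=window_days)
    # Keep only runs inside the trailing window; degraded runs already carry ok=1.
    recent = [ok for ok, run_at in rows if datetime.fromisoformat(run_at) >= cutoff]
    if not recent:
        return None  # no data in the window: nothing to report
    return round(100.0 * sum(recent) / len(recent), 1)
```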

Top-level status bands

Level	Range	Meaning
Green	≥ 95%	All sources operating within tolerance.
Yellow	80% to < 95%	Warning. Surfaced in the /status `warnings` list.
Red	< 80%	Issue. Surfaced in the /status `issues` list; escalates the dot.

The escalation is one-way per layer: any item in issues forces the top-level level to red; any item in warnings (with no issues) forces yellow; otherwise green. Other escalators feed the same logic — expiring brokerage data-session tokens, the live SPY quote check, missing recent reports — so a 100% sources_healthy_pct_7d can still land yellow or red if something else is wrong.
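
The banding and escalation rules can be sketched in a few lines. The thresholds and the issues/warnings precedence come from this page; the function names and message strings are hypothetical.

```python
def pct_band(pct):
    """Map a 7-day healthy percentage to its band."""
    if pct >= 95:
        return "green"
    if pct >= 80:
        return "yellow"
    return "red"


def top_level(issues, warnings):
    """Any issue forces red; otherwise any warning forces yellow; otherwise green."""
    if issues:
        return "red"
    if warnings:
        return "yellow"
    return "green"


def evaluate(pct, extra_issues=(), extra_warnings=()):
    """Combine the source-health band with other escalators (tokens, probes, ...)."""
    issues = list(extra_issues)
    warnings = list(extra_warnings)
    band = pct_band(pct)
    if band == "red":
        issues.append(f"sources_healthy_pct_7d is {pct:.1f}% (below 80%)")
    elif band == "yellow":
        warnings.append(f"sources_healthy_pct_7d is {pct:.1f}% (below 95%)")
    return {"level": top_level(issues, warnings), "issues": issues, "warnings": warnings}
```

Note that `evaluate(100.0, extra_warnings=["token expiring"])` still yields yellow, which is the behavior described above for a fully healthy source set with another escalator firing.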

What surfaces where

GET /api/v1/status — top-level health reading. Returns level, brokerage data-session status, sources_healthy_pct_7d, total_source_runs_7d, degraded_runs_7d, and a per-source source_failures_7d breakdown.
GET /api/v1/source-health — per-source detail: today’s request count, failure count, remaining rate-limit budget (where applicable). No source currently carries a daily budget cap — the third-party news-sentiment feed’s 25-req/day budget retired with that source (2026-06, DOCTRINE D22).
GET /api/v1/source-health/trends — rolling per-source reliability series over a configurable window (1–180 days).
Dashboard StatusDot — polls /api/v1/status and renders the same green/yellow/red band as a dot in the dashboard header. Hover for the most recent percentage.
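
A small client for the top-level reading might look like the sketch below. The endpoint path and field names are taken from the list above; the flat JSON shape, the base URL, and the `fetch_status` and `summarize` helpers are assumptions.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url, opener=urlopen):
    """GET /api/v1/status and decode the JSON body (opener is injectable for tests)."""
    with opener(f"{base_url}/api/v1/status") as resp:
        return json.load(resp)


def summarize(status):
    """Render the headline numbers as one line, similar to a StatusDot hover."""
    level = status.get("level", "unknown")
    pct = status.get("sources_healthy_pct_7d")
    total = status.get("total_source_runs_7d")
    degraded = status.get("degraded_runs_7d")
    return f"{level}: {pct}% healthy over {total} runs ({degraded} degraded)"
```

Usage would be along the lines of `print(summarize(fetch_status("http://localhost:8000")))`, where the host and port are placeholders for wherever the backend is served.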

The brokerage data session is special

Brokerage data-session health is reported as two separate fields, because the token and the data feed can each fail independently:

token_status (values include `unreadable`): based on the on-disk refresh token state.
data_status (values include `unknown`): based on a live SPY quote probe against the trading client at request time.

The combined brokerage status reads healthy when data flows, the token state when the token is expired/missing, and token_ok_client_failing when the token file is fine but the live probe fails. That last state is deliberate: it means the trading brokerage client is wedged while the signals pipeline — which uses a separate client and keeps running — is unaffected. It is named precisely so it never reads as a platform-wide data outage.
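
One way to express that combination is sketched below. The result names healthy and token_ok_client_failing and the token states expired, missing, and unreadable appear on this page; the literal `"ok"` values and the treatment of a non-ok probe (including `unknown`) as "not flowing" are assumptions.

```python
def combined_brokerage_status(token_status, data_status):
    """Fold token_status and data_status into one brokerage status string."""
    if data_status == "ok":
        return "healthy"  # data flows, so the session is healthy
    if token_status == "ok":
        # Token file is fine but the live probe is not: the trading client is
        # wedged, which is distinct from a platform-wide data outage.
        return "token_ok_client_failing"
    # Otherwise report the token state itself (expired, missing, unreadable, ...).
    return token_status
```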

The trading client self-heals. It tracks its own call health and, after a few consecutive failures or a stale window, rebuilds and re-reads the token file — so a token re-auth recovers without a restart. /status also returns a trading_client sub-object (initialized, consecutive_failures, seconds_since_success, refresh_failing, alarm). A sustained wedge the self-heal cannot clear (for example, the on-disk refresh token is itself dead) raises the alarm flag, a clearly-labeled warning, and a CRITICAL log — that is the signal to re-authenticate and, if it persists, restart the backend.
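
The self-heal loop could be sketched as a thin wrapper like the one below. The reported field names mirror the trading_client sub-object; the class name, the thresholds, the `build_client` factory, and the rule for when the alarm trips and clears are all assumptions, not the platform's actual values.

```python
import logging
import time

log = logging.getLogger("trading_client")


class SelfHealingClient:
    """Rebuild the wrapped client after repeated failures or a stale window."""

    def __init__(self, build_client, max_failures=3, stale_after=300.0,
                 alarm_after_rebuilds=2, clock=time.monotonic):
        self._build = build_client  # hypothetical factory that re-reads the token file
        self._max_failures = max_failures
        self._stale_after = stale_after
        self._alarm_after = alarm_after_rebuilds
        self._clock = clock
        self._client = None
        self._last_success = None
        self._rebuilds_since_success = 0
        self.consecutive_failures = 0
        self.refresh_failing = False
        self.alarm = False

    def _should_rebuild(self):
        if self._client is None:
            return True
        if self.consecutive_failures >= self._max_failures:
            return True
        return (self._last_success is not None
                and self._clock() - self._last_success > self._stale_after)

    def _rebuild(self):
        try:
            self._client = self._build()
            self.refresh_failing = False
        except Exception:
            self._client = None
            self.refresh_failing = True
        self._rebuilds_since_success += 1
        # Too many rebuilds without a success: self-heal is not clearing the wedge.
        if self._rebuilds_since_success > self._alarm_after and not self.alarm:
            self.alarm = True
            log.critical("trading client wedged; re-authenticate the brokerage token")

    def call(self, fn):
        """Run fn(client), tracking call health and rebuilding when needed."""
        if self._should_rebuild():
            self._rebuild()
        try:
            if self._client is None:
                raise RuntimeError("trading client unavailable")
            result = fn(self._client)
        except Exception:
            self.consecutive_failures += 1
            raise
        self.consecutive_failures = 0
        self._rebuilds_since_success = 0
        self._last_success = self._clock()
        self.alarm = False
        return result

    def status(self):
        since = None if self._last_success is None else self._clock() - self._last_success
        return {
            "initialized": self._client is not None,
            "consecutive_failures": self.consecutive_failures,
            "seconds_since_success": since,
            "refresh_failing": self.refresh_failing,
            "alarm": self.alarm,
        }
```

Because the factory is invoked again on every rebuild, a fresh token written to disk is picked up on the next call without restarting the process.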

A token expiring within 12 hours escalates to a warnings entry. The brokerage feed going down doesn’t take down the report cycle — the market source falls back to the public market-data feed for everything except the option-chain-only signals (GEX, ZGL, PCR, gamma walls) which have no fallback path.
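
Both rules in this paragraph are simple enough to sketch directly. The 12-hour window and the list of option-chain-only signals come from the text; the function names, the signal identifier strings, and the `"brokerage"` and `"public_feed"` labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EXPIRY_WARNING_WINDOW = timedelta(hours=12)

# Signals that need the option chain and therefore have no fallback path.
OPTION_CHAIN_ONLY = {"GEX", "ZGL", "PCR", "gamma_walls"}


def token_expiry_warning(expires_at, now):
    """Return a warnings entry if the token expires within 12 hours, else None."""
    remaining = expires_at - now
    if remaining <= timedelta(0):
        return None  # already expired: reported through the token state instead
    if remaining <= EXPIRY_WARNING_WINDOW:
        hours = remaining.total_seconds() / 3600
        return f"brokerage token expires in {hours:.1f}h"
    return None


def signal_source(signal, brokerage_up):
    """Pick where a signal's data comes from; None means it cannot be computed."""
    if brokerage_up:
        return "brokerage"
    if signal in OPTION_CHAIN_ONLY:
        return None
    return "public_feed"
```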

Operating posture

A healthy system sits around 99% over the 7-day window — the occasional market-data-feed flake eats a few percent on a bad day. A persistent dip below 95% usually means one specific source is degraded:

Check source_failures_7d — which source is the loudest contributor?
Check /api/v1/source-health for that source — is the budget exhausted, the upstream down, or the token expired?
Cross-reference against the source’s release calendar — a weekly source (cot, aaii) only fires a handful of times per week, so a single failure has outsized impact on its 7d percentage.
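
The first and last checks above lend themselves to a quick script. The sketch assumes source_failures_7d is a mapping of source name to failure count, which this page does not confirm; the helper names are hypothetical.

```python
def loudest_failures(source_failures_7d, top=3):
    """Rank sources by failure count, highest first, dropping clean sources."""
    ranked = sorted(source_failures_7d.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, count) for name, count in ranked[:top] if count > 0]


def per_source_pct(failures, runs):
    """Healthy percentage for one source over the window."""
    return round(100.0 * (runs - failures) / runs, 1)
```

The arithmetic shows why low-frequency sources need the calendar cross-check: one failure in 5 runs gives `per_source_pct(1, 5) == 80.0`, while one failure in 500 runs gives `per_source_pct(1, 500) == 99.8`.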